Concentration without moments

Wassily Hoeffding (1914-1991)

This post presents a concentration inequality for self-normalized sums that requires no moment assumptions. It is due to Bradley Efron (1969), and I learned it from Laëtitia Comminges.

Symmetric laws. Recall that a probability distribution is symmetric when \( {X} \) and \( {-X} \) have the same distribution for any random variable \( {X} \) following this distribution. In this case, if moreover the distribution has no atom at \( {0} \), then \( {\varepsilon=\mathrm{sign}(X)} \) and \( {|X|} \) are independent and \( {\varepsilon} \) follows the symmetric Rademacher distribution: \( {\mathbb{P}(\varepsilon=\pm1)=1/2} \).

Concentration. Let \( {X_1,\ldots,X_n} \) be independent real random variables with symmetric law and without atom at \( {0} \). Then for any real \( {r>0} \),

\[ \mathbb{P}\left(\frac{X_1+\cdots+X_n}{\sqrt{X_1^2+\cdots+X_n^2}}\geq r\right) \leq\mathrm{e}^{-\frac{r^2}{2}}. \]

Note that this holds without any moment assumption on the random variables.
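As a rough numerical illustration of the absence of moment assumptions, here is a minimal Monte Carlo sketch with standard Cauchy variables, which are symmetric, have no atom at \( {0} \), and have no mean. The function names, the sample size \( {n=10} \), the number of trials, and the seed are arbitrary choices made for this sketch, not part of the original argument.

```python
import math
import random


def self_normalized_sum(xs):
    """Return (x_1 + ... + x_n) / sqrt(x_1^2 + ... + x_n^2)."""
    return sum(xs) / math.sqrt(sum(x * x for x in xs))


def standard_cauchy(rng):
    """Sample a standard Cauchy variable by inverting its distribution function."""
    return math.tan(math.pi * (rng.random() - 0.5))


def empirical_tail(r, n=10, trials=20000, seed=0):
    """Monte Carlo estimate of P(self-normalized sum >= r) for iid Cauchy samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [standard_cauchy(rng) for _ in range(n)]
        if self_normalized_sum(xs) >= r:
            hits += 1
    return hits / trials


# Compare the empirical tail with the bound exp(-r^2 / 2) for a few values of r.
for r in (0.5, 1.0, 2.0, 3.0):
    bound = math.exp(-r * r / 2)
    print(f"r={r}: empirical={empirical_tail(r):.4f}, bound={bound:.4f}")
```

The empirical frequencies should stay below the bound up to Monte Carlo error, even though the Cauchy distribution has no finite moment of order one.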

Proof. Thanks to the independence and symmetry assumptions, the random variables \( {\varepsilon_1=\mathrm{sign}(X_1),\ldots,\varepsilon_n=\mathrm{sign}(X_n)} \) are iid, follow the symmetric Rademacher distribution, and are independent of \( {|X_1|,\ldots,|X_n|} \). Now by conditioning we get

\[ \mathbb{P}\left(\frac{X_1+\cdots+X_n}{\sqrt{X_1^2+\cdots+X_n^2}}\geq r\right) =\mathbb{E}(\varphi_r(|X_1|,\ldots,|X_n|)) \]

where \( {\varphi_r(c_1,\ldots,c_n)=\mathbb{P}((\varepsilon_1c_1+\cdots+\varepsilon_nc_n)/\sqrt{c_1^2+\cdots+c_n^2}\geq r)} \). We can assume that \( {c_i>0} \) since \( {\mathbb{P}(X_i=0)=0} \). It remains to use the Hoeffding inequality, which states that if \( {Z_1,\ldots,Z_n} \) are independent centered and bounded real random variables then for any real \( {r>0} \),

\[ \mathbb{P}\left(Z_1+\cdots+Z_n\geq r\right) \leq\exp\left(-\frac{2r^2}{\mathrm{osc}(Z_1)^2+\cdots+\mathrm{osc}(Z_n)^2}\right), \]

where \( {\mathrm{osc}(Z)=\max(Z)-\min(Z)} \). Here we use it with, for any \( {i=1,\ldots,n} \),

\[ Z_i=\frac{c_i}{\sqrt{c_1^2+\cdots+c_n^2}}\varepsilon_i \quad\text{for which}\quad \mathrm{osc}(Z_i)^2=\frac{4c_i^2}{c_1^2+\cdots+c_n^2}. \]

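Indeed, since \( {\mathrm{osc}(Z_1)^2+\cdots+\mathrm{osc}(Z_n)^2=4} \), this gives \( { \varphi_r(c_1,\ldots,c_n) =\mathbb{P}\left(Z_1+\cdots+Z_n\geq r\right) \leq\mathrm{e}^{-\frac{2r^2}{4}} =\mathrm{e}^{-\frac{r^2}{2}}} \), and it remains to take the expectation with respect to \( {|X_1|,\ldots,|X_n|} \).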
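For small \( {n} \), the conditional probability \( {\varphi_r(c_1,\ldots,c_n)} \) can be computed exactly by enumerating the \( {2^n} \) sign patterns, which gives a direct check of the Hoeffding step. The sketch below does this; the function name `phi` and the weights are illustrative choices, not taken from the original argument.

```python
import itertools
import math


def phi(r, c):
    """Exact P((eps_1 c_1 + ... + eps_n c_n) / ||c||_2 >= r) for iid Rademacher signs."""
    norm = math.sqrt(sum(ci * ci for ci in c))
    patterns = list(itertools.product((-1, 1), repeat=len(c)))
    favorable = sum(
        1
        for eps in patterns
        if sum(e * ci for e, ci in zip(eps, c)) / norm >= r
    )
    return favorable / len(patterns)


c = (1.0, 2.0, 3.0, 4.0)  # arbitrary positive weights, chosen for illustration
for r in (0.5, 1.0, 1.5):
    # Print r, the exact probability, and the bound exp(-r^2 / 2).
    print(r, phi(r, c), math.exp(-r * r / 2))
```

The enumeration costs \( {2^n} \) operations, so it is only meant as a sanity check for a handful of weights.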

Probabilistic interpretation. When \( {X_1,\ldots,X_n} \) are iid, symmetric, and in \( {L^2} \), their mean is zero and their variance is, say, \( {\sigma^2>0} \). The law of large numbers gives \( {\sqrt{X_1^2+\cdots+X_n^2}=\sqrt{n}(\sigma+o_{n\rightarrow\infty}(1))} \) almost surely. Therefore, by the central limit theorem and Slutsky's lemma, we get \( {(X_1+\cdots+X_n)/\sqrt{X_1^2+\cdots+X_n^2}\overset{\text{law}}{\longrightarrow}\mathcal{N}(0,1)} \) as \( {n\rightarrow\infty} \).

Geometric interpretation. If \( {X_1,\ldots,X_n} \) are iid standard Gaussian, then

\[ \frac{X_1+\cdots+X_n}{\sqrt{X_1^2+\cdots+X_n^2}} =\langle U_n,\theta_n\rangle \]

where

\[ U_n=\sqrt{n}\frac{(X_1,\ldots,X_n)}{\sqrt{X_1^2+\cdots+X_n^2}} \quad\text{and}\quad \theta_n=\frac{(1,\ldots,1)}{\sqrt{n}}. \]

The random vector \( {U_n} \) is uniformly distributed on the sphere of \( {\mathbb{R}^n} \) of radius \( {\sqrt{n}} \), while the vector \( {\theta_n} \) belongs to the unit sphere. Note that \( {\langle U_n,\theta_n\rangle} \) has the law of the sum of the coordinates of a row or column of a uniform random orthogonal matrix.

Relation to Studentization. The result above can be related to the Studentized version of the empirical mean. Indeed, if one defines the empirical mean and the empirical variance

\[ \overline{X}_n=\frac{X_1+\cdots+X_n}{n} \quad\text{and}\quad \widehat\sigma^2_n=\frac{(X_1-\overline{X}_n)^2+\cdots+(X_n-\overline{X}_n)^2}{n-1} \]

then using

\[ (n-1)\widehat\sigma_n^2 =X_1^2+\cdots+X_n^2-\frac{(X_1+\cdots+X_n)^2}{n} \]

we get, for any \( {r\geq0} \), after some algebra,

\[ \left\{\sqrt{n}\frac{\overline{X}_n}{\widehat\sigma_n}\geq r\right\} =\left\{\frac{X_1+\cdots+X_n}{\sqrt{X_1^2+\cdots+X_n^2}} \geq r\sqrt{\frac{n}{n-1+r^2}}\right\}. \]
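To spell out the algebra, write \( {S_n} \) for the self-normalized sum \( {(X_1+\cdots+X_n)/\sqrt{X_1^2+\cdots+X_n^2}} \) (a notation introduced here only for convenience), and assume that \( {\widehat\sigma_n>0} \), in other words that the \( {X_i} \) are not all equal, so that \( {S_n^2<n} \). The formula for \( {(n-1)\widehat\sigma_n^2} \) reads \( {(n-1)\widehat\sigma_n^2=(X_1^2+\cdots+X_n^2)(1-S_n^2/n)} \), hence

\[ \sqrt{n}\frac{\overline{X}_n}{\widehat\sigma_n} =\frac{(X_1+\cdots+X_n)/\sqrt{n}}{\sqrt{(X_1^2+\cdots+X_n^2)(1-S_n^2/n)/(n-1)}} =S_n\sqrt{\frac{n-1}{n-S_n^2}}. \]

Since \( {r\geq0} \), the left hand side is \( {\geq r} \) if and only if \( {S_n\geq0} \) and

\[ (n-1)S_n^2\geq r^2(n-S_n^2), \quad\text{that is}\quad S_n\geq r\sqrt{\frac{n}{n-1+r^2}}. \]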

It follows then from the concentration inequality above that if \( {X_1,\ldots,X_n} \) are independent, with symmetric law without atom at \( {0} \), then for any \( {r\geq0} \),

\[ \mathbb{P}\left(\sqrt{n}\frac{\overline{X}_n}{\widehat\sigma_n}\geq r\right) \leq\exp\left(-\frac{nr^2}{2(n-1+r^2)}\right). \]
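The link between the Studentized statistic and the self-normalized sum can be checked numerically on a small data set. The sketch below is only an illustration: the helper names and the data are made up for this purpose, and `statistics.stdev` is used because it divides by \( {n-1} \), like \( {\widehat\sigma_n} \).

```python
import math
import statistics


def t_statistic(xs):
    """sqrt(n) * empirical mean / empirical standard deviation (n - 1 normalization)."""
    return math.sqrt(len(xs)) * statistics.mean(xs) / statistics.stdev(xs)


def self_normalized_sum(xs):
    """Return (x_1 + ... + x_n) / sqrt(x_1^2 + ... + x_n^2)."""
    return sum(xs) / math.sqrt(sum(x * x for x in xs))


def t_from_self_normalized(s, n):
    """Map the self-normalized sum s (with s^2 < n) to the Studentized statistic."""
    return s * math.sqrt((n - 1) / (n - s * s))


def studentized_bound(r, n):
    """Right-hand side exp(-n r^2 / (2 (n - 1 + r^2))) of the inequality."""
    return math.exp(-n * r * r / (2 * (n - 1 + r * r)))


xs = [0.3, -1.2, 2.5, 0.7, -0.4]  # arbitrary illustrative data
n = len(xs)
# The two values printed on this line should agree up to rounding.
print(t_statistic(xs), t_from_self_normalized(self_normalized_sum(xs), n))
# The bound for a few thresholds r.
for r in (1.0, 2.0, 4.0):
    print(r, studentized_bound(r, n))
```

For fixed \( {n} \), the exponent \( {nr^2/(2(n-1+r^2))} \) increases to \( {n/2} \) as \( {r\rightarrow\infty} \), so the bound decreases to \( {\mathrm{e}^{-n/2}} \) and not to \( {0} \).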

If \( {X_1,\ldots,X_n} \) are iid standard Gaussian then \( {\sqrt{n}\,\overline{X}_n\sim\mathcal{N}(0,1)} \) and \( {(n-1)\widehat{\sigma}^2_n\sim\chi^2(n-1)} \) are independent, and the ratio \( {\sqrt{n}\overline{X}_n/\widehat\sigma_n} \) follows the Student \( {t(n-1)} \) law, with density proportional to \( {t\mapsto 1/(1+t^2/(n-1))^{n/2}} \), which is in particular heavy tailed.

Further reading.

Bradley Efron, Student's t-Test Under Symmetry Conditions, Journal of the American Statistical Association 64(328) 1278-1302 (1969)

Sergey Bobkov and Friedrich Götze, Concentration inequalities and limit theorems for randomized sums, Probability Theory and Related Fields 137(1-2) 49-81 (2007)

Qi-Man Shao and Qiying Wang, Self-normalized limit theorems: a survey (includes Cramér large deviations), Probability Surveys 10 69-93 (2013)
