When the CLT fails… Sparsity and localization – Libres pensées d'un mathématicien ordinaire

This post discusses some basic aspects of the Central Limit Theorem (CLT) in relation to the notions of localization and sparsity. Let \( {G\sim\mathcal{N}(0,1)} \) and \( {(X_n)_{n\geq1}} \) be a sequence of independent real random variables with, for every \( {n\geq1} \),

\[ \mathbb{E}(X_n)=0\quad\text{and}\quad \sigma_n^2:=\mathbb{E}(X_n^2) \]

Let us define

\[ S_n:=\frac{X_1+\cdots+X_n}{s_n} \quad\text{where}\quad s_n^2:=\mathrm{Var}(X_1+\cdots+X_n)=\sigma_1^2+\cdots+\sigma_n^2. \]

The Lindeberg CLT states that if, for every \( {\varepsilon>0} \),

\[ \lim_{n\rightarrow\infty}\frac{1}{s_n^2}\sum_{k=1}^n \mathbb{E}(X_k^2\mathbf{1}_{\{|X_k|>\varepsilon s_n\}})=0 \]

then \( {(S_n)_{n\geq1}} \) converges in distribution to the standard Gaussian \( {\mathcal{N}(0,1)} \), in other words,

\[ \lim_{n\rightarrow\infty}\mathbb{P}(S_n\leq x)=\mathbb{P}(G\leq x) \quad\text{for all }x\in\mathbb{R}. \]
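The convergence statement above can be probed numerically. The following is only a minimal sketch under assumptions that are not part of the theorem: the variables are taken of the form \( {X_k=\sigma_kU_k} \) with \( {U_k} \) uniform on \( {[-\sqrt{3},\sqrt{3}]} \) (mean \( {0} \), variance \( {1} \)), and the function names are hypothetical. It compares a Monte Carlo estimate of \( {\mathbb{P}(S_n\leq x)} \) with \( {\mathbb{P}(G\leq x)} \).

```python
import math
import random


def standard_normal_cdf(x):
    """P(G <= x) for G ~ N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sample_S_n(sigmas, rng):
    """Draw one realization of S_n = (X_1 + ... + X_n) / s_n.

    Assumption (for illustration only): X_k = sigma_k * U_k with U_k
    uniform on [-sqrt(3), sqrt(3)], so E(U_k) = 0 and E(U_k^2) = 1.
    """
    half_width = math.sqrt(3.0)
    s_n = math.sqrt(sum(s * s for s in sigmas))
    total = sum(s * rng.uniform(-half_width, half_width) for s in sigmas)
    return total / s_n


def empirical_cdf_at(x, sigmas, trials, seed=0):
    """Monte Carlo estimate of P(S_n <= x) from `trials` independent draws."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if sample_S_n(sigmas, rng) <= x)
    return hits / trials


if __name__ == "__main__":
    sigmas = [1.0] * 50  # equal variances: the classical i.i.d. setting
    for x in (-1.0, 0.0, 1.0):
        print(x, empirical_cdf_at(x, sigmas, trials=2000), standard_normal_cdf(x))
```

With equal variances the two columns should be close, up to Monte Carlo noise of order \( {1/\sqrt{\text{trials}}} \).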

Moreover, the Feller criterion states that if

\[ \lim_{n\rightarrow\infty}\max_{1\leq k\leq n}\frac{\sigma_k^2}{s_n^2}=0 \]

then the Lindeberg condition is necessary and sufficient for the convergence of \( {(S_n)_{n\geq1}} \) in distribution to the standard Gaussian \( {\mathcal{N}(0,1)} \). The Feller condition means that each single variance \( {\sigma_k^2} \) represents an asymptotically negligible portion of the total variance \( {s_n^2} \), as \( {n} \) goes to infinity. In other words, the total variance is spread as \( {n} \) goes to infinity.
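To make the Feller and Lindeberg quantities concrete, here is a small sketch under a simplifying assumption that is not in the statements above: \( {X_k=\sigma_k\varepsilon_k} \) where the \( {\varepsilon_k} \) are independent symmetric random signs. Then \( {|X_k|=\sigma_k} \) almost surely, so that \( {\mathbb{E}(X_k^2\mathbf{1}_{\{|X_k|>\varepsilon s_n\}})=\sigma_k^2\mathbf{1}_{\{\sigma_k>\varepsilon s_n\}}} \) and both quantities can be computed exactly from the vector \( {(\sigma_1,\ldots,\sigma_n)} \). The function names are hypothetical.

```python
import math


def feller_ratio(sigmas):
    """max_k sigma_k^2 / s_n^2, the quantity in the Feller condition."""
    s2 = sum(s * s for s in sigmas)
    return max(s * s for s in sigmas) / s2


def lindeberg_sum(sigmas, eps):
    """Lindeberg sum (1/s_n^2) * sum_k E(X_k^2 1{|X_k| > eps * s_n}).

    Assumption (for illustration only): X_k = sigma_k * (random sign),
    so |X_k| = sigma_k and each expectation is sigma_k^2 or 0.
    """
    s2 = sum(s * s for s in sigmas)
    s_n = math.sqrt(s2)
    return sum(s * s for s in sigmas if abs(s) > eps * s_n) / s2


if __name__ == "__main__":
    flat = [1.0] * 100                              # variance spread evenly
    geometric = [2.0 ** -k for k in range(1, 51)]   # variance concentrated on the first terms
    for name, sigmas in (("flat", flat), ("geometric", geometric)):
        print(name, feller_ratio(sigmas), lindeberg_sum(sigmas, eps=0.5))
```

For the flat sequence both quantities are small, while for the geometric sequence the first term alone carries about three quarters of the total variance and neither quantity goes to zero.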

On the contrary, and quite intuitively, one can guess that if \( {(\sigma_n)_{n\geq1}} \) is localized then \( {S_n} \) is very close to the sum of few terms for arbitrarily large \( {n} \), and the CLT may fail due to a lack of averaging (homogenization). Of course, if the sequence \( {(\sigma_n)_{n\geq1}} \) is sparse (extreme localization!) i.e. \( {\mathrm{Card}\{n\geq1:\sigma_n\neq0\}<\infty} \), then the CLT fails in general, since \( {S_n} \) no longer depends on \( {n} \) for large enough \( {n} \). Beyond sparsity, let us look for a more subtle example for which one can check immediately from scratch that the CLT fails. Let us pick a sequence \( {(\sigma_n)_{n\geq1}} \) of positive real numbers, and a sequence \( {(U_n)_{n\geq1}} \) of bounded i.i.d. random variables on \( {[-c,c]} \) with mean \( {0} \) and variance \( {1} \). If we define the random variable \( {X_k:=\sigma_kU_k} \) then

\[ \mathbb{E}(X_k)=0 \quad\text{and}\quad \mathbb{E}(X_k^2)=\sigma_k^2 \]

and

\[ c^{-1}|S_n|\leq \frac{\left\Vert(\sigma_1,\ldots,\sigma_n)\right\Vert_1} {\left\Vert(\sigma_1,\ldots,\sigma_n)\right\Vert_2} =:\rho_n. \]

Now, if \( {(\rho_n)_{n\geq1}} \) is bounded then \( {(S_n)_{n\geq1}} \) is bounded, and thus the CLT fails, since the Gaussian limit has unbounded support. The norms-ratio \( {\rho_n} \) measures the delocalization of the vector \( {(\sigma_1,\ldots,\sigma_n)} \). Note that \( {(\rho_n)_{n\geq1}} \) is bounded if \( {(\sigma_n)_{n\geq1}} \) grows too fast or decays too fast. For instance, \( {(\rho_n)_{n\geq1}} \) is bounded if \( {(\sigma_n)_{n\geq1}\in\ell^1} \). On the other hand, since \( {s_n^2=s_{n-1}^2+\sigma_n^2} \), the Cauchy-Schwarz inequality gives
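The behavior of the norms-ratio \( {\rho_n} \) is easy to explore numerically. The sketch below is only an illustration: it assumes that the \( {U_k} \) are independent symmetric random signs (so \( {c=1} \)), and the function names are hypothetical. It computes \( {\rho_n} \) for a flat, a fast decaying, and a fast growing sequence, and checks the bound \( {|S_n|\leq c\rho_n} \) on a few samples.

```python
import math
import random


def norms_ratio(sigmas):
    """rho_n = ||sigma||_1 / ||sigma||_2 for the vector (sigma_1, ..., sigma_n)."""
    l1 = sum(abs(s) for s in sigmas)
    l2 = math.sqrt(sum(s * s for s in sigmas))
    return l1 / l2


def sample_S_n_signs(sigmas, rng):
    """One draw of S_n when X_k = sigma_k * U_k.

    Assumption (for illustration only): U_k is a symmetric random sign,
    which is bounded by c = 1 with mean 0 and variance 1.
    """
    s_n = math.sqrt(sum(s * s for s in sigmas))
    return sum(s * rng.choice((-1.0, 1.0)) for s in sigmas) / s_n


if __name__ == "__main__":
    rng = random.Random(0)
    examples = {
        "flat": [1.0] * 100,                            # rho_n = sqrt(n), unbounded in n
        "decaying": [2.0 ** -k for k in range(1, 51)],  # in l^1, rho_n stays bounded
        "growing": [2.0 ** k for k in range(1, 31)],    # grows fast, rho_n stays bounded
    }
    for name, sigmas in examples.items():
        rho = norms_ratio(sigmas)
        worst = max(abs(sample_S_n_signs(sigmas, rng)) for _ in range(200))
        print(name, rho, worst)  # worst never exceeds rho since c = 1
```

For both geometric sequences \( {\rho_n} \) stays close to \( {\sqrt{3}} \), so \( {|S_n|} \) never exceeds roughly \( {1.74} \), which is incompatible with a Gaussian limit.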

\[ \rho_n \leq \frac{1}{s_n}\sum_{k=1}^{n-1}\sigma_k +\frac{\sigma_n}{s_n} \leq \sqrt{n-1}\,\frac{s_{n-1}}{\sigma_n}+1 \]

and \( {(\rho_n)_{n\geq1}} \) is bounded e.g. if \( {\sigma_n=s_{n-1}\sqrt{n}} \), since then \( {\rho_n\leq\sqrt{(n-1)/n}+1\leq2} \). Delocalization control is an essential aspect of the CLT. The Berry-Esseen theorem, which constitutes a quantitative CLT, also involves a norms-ratio measuring localization: if \( {(X_n)_{n\geq1}} \) are independent real random variables with

\[ \mathbb{E}(X_n)=0 \quad\text{and}\quad \sigma^2_n:=\mathbb{E}(X_n^2) \quad\text{and}\quad \tau_n^3:=\mathbb{E}(|X_n|^3) \]

and if \( {V_n:=(X_1,\ldots,X_n)} \) and \( {S_n} \) is defined from \( {(X_n)_{n\geq1}} \) as before then for all \( {n\geq1} \),

\[ \sup_{x\in\mathbb{R}} \left|\mathbb{P}\left(S_n\leq x\right)-\mathbb{P}(G\leq x)\right| \leq 6\frac{\mathbb{E}(\left\Vert V_n\right\Vert_3^3)} {\mathbb{E}(\left\Vert V_n\right\Vert_2^2)^{3/2}} = 6\frac{\tau_1^3+\cdots+\tau_n^3}{(\sigma_1^2+\cdots+\sigma_n^2)^{3/2}}. \]
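The right-hand side of this bound can be evaluated directly once the moments are known. The following sketch assumes, for illustration only, that \( {X_k=\sigma_k\varepsilon_k} \) with independent symmetric random signs \( {\varepsilon_k} \), so that \( {\tau_k=\sigma_k} \). It compares a flat sequence with the localized choice \( {\sigma_n=s_{n-1}\sqrt{n}} \) considered earlier, started from \( {\sigma_1=1} \) (an arbitrary initialization). The function names are hypothetical.

```python
import math


def berry_esseen_bound(sigmas):
    """6 * (tau_1^3 + ... + tau_n^3) / (sigma_1^2 + ... + sigma_n^2)^(3/2).

    Assumption (for illustration only): X_k = sigma_k * (random sign),
    so that tau_k^3 = E|X_k|^3 = sigma_k^3.
    """
    third = sum(abs(s) ** 3 for s in sigmas)
    second = sum(s * s for s in sigmas)
    return 6.0 * third / second ** 1.5


def recursive_sigmas(n):
    """sigma_1 = 1 (arbitrary start), then sigma_k = s_{k-1} * sqrt(k) for k >= 2."""
    sigmas = [1.0]
    s2 = 1.0  # running value of s_{k-1}^2
    for k in range(2, n + 1):
        sigma_k = math.sqrt(s2 * k)
        sigmas.append(sigma_k)
        s2 += sigma_k ** 2
    return sigmas


if __name__ == "__main__":
    print(berry_esseen_bound([1.0] * 10000))        # equals 6 / sqrt(n), small for large n
    print(berry_esseen_bound(recursive_sigmas(20)))  # stays close to 6, hence uninformative
```

In the flat case the bound decays like \( {6/\sqrt{n}} \), while in the localized case the last term dominates both sums and the bound stays of order \( {6} \), which says nothing since the left-hand side is at most \( {1} \).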

You may take a look at the recent work of Klartag and Sodin on the role of delocalization in the Berry-Esseen theorem.

Measuring (de)localization with norms-ratios is a classical trick in mathematics and physics. It plays a role for instance for eigenvectors in the formalization of the Anderson localization phenomenon for random Schrödinger operators, and in the recent work of Erdős, Schlein, Ramírez, Yau, Tao and Vu on the universality of eigenvalue spacings for models of random matrices. The norms-ratio is also related to embeddings in the local theory of Banach spaces.

This post is inspired by a question asked by my friend Sébastien Blachère.
