This tiny back to basics post is devoted to a couple of bits of Probability and Statistics.

The central limit theorem cannot hold in probability. Let ${{(X_n)}_{n\geq1}}$ be iid real random variables with zero mean and unit variance. The central limit theorem (CLT) states that as ${n\rightarrow\infty}$,

$Z_n=\frac{X_1+\cdots+X_n}{\sqrt{n}} \overset{\text{law}}{\longrightarrow}\mathcal{N}(0,1).$

A question frequently asked by good students is whether one can replace the convergence in law by the (stronger) convergence in probability. The answer is negative, and in particular the convergence cannot hold almost surely or in ${L^p}$ either. Let us examine why. Recall that convergence in probability is stable under linear combinations and under subsequence extraction.

We proceed by contradiction. Suppose that ${Z_n\rightarrow Z_\infty}$ in probability. Then necessarily ${Z_\infty\sim\mathcal{N}(0,1)}$, since convergence in probability implies convergence in law. Now, on the one hand, ${Z_{2n}-Z_n\rightarrow0}$ in probability. On the other hand,

$Z_{2n}-Z_n =\frac{1-\sqrt{2}}{\sqrt{2}}Z_n+\frac{X_{n+1}+\cdots+X_{2n}}{\sqrt{2n}} =\frac{1-\sqrt{2}}{\sqrt{2}}Z_n+\frac{1}{\sqrt{2}}Z_n'.$

But ${Z_n'}$ is an independent copy of ${Z_n}$. Thus the CLT used twice gives ${Z_{2n}-Z_n\overset{\text{law}}{\longrightarrow}\mathcal{N}(0,\sigma^2)}$ with ${\sigma^2=(1-\sqrt{2})^2/2+1/2=2-\sqrt{2}\neq0}$, hence the contradiction.
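The non-convergence can be seen numerically. Here is a quick Monte Carlo sanity check, a minimal sketch using NumPy, with standard Gaussian ${X_i}$ chosen for concreteness (any zero-mean unit-variance law would do): the variance of ${Z_{2n}-Z_n}$ is exactly ${2-\sqrt{2}}$ for every ${n}$, so it cannot vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1_000, 10_000

# One sample path X_1, ..., X_{2n} per trial; standard normals satisfy
# the zero-mean, unit-variance assumptions of the CLT above.
x = rng.standard_normal((trials, 2 * n))
z_n = x[:, :n].sum(axis=1) / np.sqrt(n)
z_2n = x.sum(axis=1) / np.sqrt(2 * n)

diff = z_2n - z_n
# If Z_n converged in probability, this variance would have to be small;
# instead it stays close to 2 - sqrt(2) ≈ 0.586, as computed above.
print(diff.var())
print(np.mean(np.abs(diff) > 0.5))  # stays bounded away from 0
```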

Alternative proof. Set ${S_n=X_1+\cdots+X_n}$, and observe that

$\frac{S_{2n}-S_n}{\sqrt{n}}=\sqrt{2}Z_{2n}-Z_n.$

Now, if the CLT held in probability, the right hand side would converge in probability to ${\sqrt{2}Z_\infty-Z_\infty}$, which follows the law ${\mathcal{N}(0,(\sqrt{2}-1)^2)}$. On the other hand, since ${S_{2n}-S_n}$ has the law of ${S_n}$, by the CLT the left hand side converges in law to ${Z_\infty\sim\mathcal{N}(0,1)}$, hence the contradiction. This “reversed” proof was kindly suggested by Michel Ledoux.

Intermezzo: Slutsky lemma. The Slutsky lemma asserts that if

$X_n\overset{\text{law}}{\longrightarrow} X \quad\text{and}\quad Y_n\overset{\text{law}}{\longrightarrow} c$

with ${c}$ constant, then

$(X_n,Y_n)\overset{\text{law}}{\longrightarrow}(X,c),$

and in particular, ${f(X_n,Y_n)\overset{\text{law}}{\longrightarrow} f(X,c)}$ for every continuous ${f}$.
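A small simulation illustrates the lemma; this is a minimal sketch in which the choices of ${X_n}$, ${Y_n}$, and ${f(x,y)=xy}$ are arbitrary: centered uniform summands give ${X_n\overset{\text{law}}{\longrightarrow}\mathcal{N}(0,1)}$ by the CLT, while a shifted sample mean gives ${Y_n\rightarrow c}$, so the product should be close in law to ${c\,\mathcal{N}(0,1)=\mathcal{N}(0,c^2)}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials, c = 2_000, 10_000, 3.0

u = rng.uniform(-1, 1, (trials, n))     # iid, mean 0, variance 1/3
x_n = u.sum(axis=1) / np.sqrt(n / 3)    # CLT: X_n -> N(0,1) in law
y_n = c + u.mean(axis=1)                # LLN: Y_n -> c in probability

# Slutsky with f(x, y) = x * y: the product converges in law to c * N(0,1).
s = x_n * y_n
print(s.mean(), s.var())  # mean near 0, variance near c^2 = 9
```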

Let us prove it. Since ${Y_n\overset{\text{law}}{\longrightarrow} c}$ with ${c}$ constant, we have ${Y_n\rightarrow c}$ in probability. Moreover, for all ${t\in\mathbb{R}}$ the function ${y\mapsto \mathrm{e}^{ity}}$ is uniformly continuous on ${\mathbb{R}}$, so for all ${s,t\in\mathbb{R}}$ and all ${\varepsilon>0}$ there exists ${\eta>0}$ such that for large enough ${n}$,

$\begin{array}{rcl} |\mathbb{E}(\mathrm{e}^{isX_n+itY_n})-\mathbb{E}(\mathrm{e}^{isX_n+itc})| &\leq&\mathbb{E}(|\mathrm{e}^{itY_n}-\mathrm{e}^{itc}|\mathbf{1}_{|Y_n-c|\leq\eta})+2\mathbb{P}(|Y_n-c|>\eta)\\ &\leq& \varepsilon+2\varepsilon. \end{array}$

Alternatively we can use the Lipschitz property instead of the uniform continuity:

$\begin{array}{rcl} |\mathbb{E}(\mathrm{e}^{isX_n+itY_n})-\mathbb{E}(\mathrm{e}^{isX_n+itc})| &\leq&\mathbb{E}(\left|\mathrm{e}^{itY_n}-\mathrm{e}^{itc}\right|\mathbf{1}_{|Y_n-c|\leq\eta})+2\mathbb{P}(|Y_n-c|>\eta)\\ &\leq& |t|\eta+2\varepsilon. \end{array}$

On the other hand, since ${X_n\overset{\text{law}}{\longrightarrow}X}$, we have, for all ${s,t\in\mathbb{R}}$, as ${n\rightarrow\infty}$,

$\mathbb{E}(\mathrm{e}^{isX_n+itc})=\mathrm{e}^{itc}\mathbb{E}(\mathrm{e}^{isX_n}) \longrightarrow \mathrm{e}^{itc}\mathbb{E}(\mathrm{e}^{isX}) =\mathbb{E}(\mathrm{e}^{isX+itc}).$
Combining the two displays shows that ${\mathbb{E}(\mathrm{e}^{isX_n+itY_n})\rightarrow\mathbb{E}(\mathrm{e}^{isX+itc})}$ for all ${s,t\in\mathbb{R}}$, and the Lévy continuity theorem then yields ${(X_n,Y_n)\overset{\text{law}}{\longrightarrow}(X,c)}$.

The delta-method. Bizarrely, this basic result, very useful in Statistics, appears to be unknown to many young probabilists. Suppose that as ${n\rightarrow\infty}$,

$a_n(Z_n-b_n)\overset{\text{law}}{\longrightarrow}L,$

where ${{(Z_n)}_{n\geq1}}$ is a sequence of real random variables, ${L}$ a probability distribution, and ${{(a_n)}_{n\geq1}}$ and ${{(b_n)}_{n\geq1}}$ deterministic sequences such that ${a_n\rightarrow\infty}$ and ${b_n\rightarrow b}$. Then for any ${\mathcal{C}^1}$ function ${f:\mathbb{R}\rightarrow\mathbb{R}}$ such that ${f'(b)\neq0}$, we have

$\frac{a_n}{f'(b)}(f(Z_n)-f(b_n))\overset{\text{law}}{\longrightarrow}L.$

The typical usage in Statistics is for the fluctuations of estimators, say for ${a_n(Z_n-b_n)=\sqrt{n}(\widehat{\theta}_n-\theta)}$. Note that the rate in ${n}$ and the fluctuation law are not modified! Let us give a proof. By a Taylor formula, or here simply the mean value theorem,

$f(Z_n)-f(b_n)=f'(W_n)(Z_n-b_n)$

where ${W_n}$ is a random variable lying between ${b_n}$ and ${Z_n}$. Since ${a_n\rightarrow\infty}$, the Slutsky lemma gives ${Z_n-b_n\rightarrow0}$ in law, and thus in probability since the limit is deterministic. As a consequence ${W_n-b_n\rightarrow0}$ in probability and thus ${W_n\rightarrow b}$ in probability. The continuity of ${f'}$ at the point ${b}$ gives ${f'(W_n)\rightarrow f'(b)}$ in probability, hence ${f'(W_n)/f'(b)\rightarrow1}$ in probability, and again by the Slutsky lemma,

$\frac{a_n}{f'(b)}(f(Z_n)-f(b_n)) =\frac{f'(W_n)}{f'(b)}a_n(Z_n-b_n) \overset{\text{law}}{\longrightarrow}L.$
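As a concrete check of the statistical usage, here is a minimal sketch where the choices are arbitrary illustrations: ${\widehat{\theta}_n}$ is the sample mean of ${n}$ Bernoulli(${p}$) variables, so ${\sqrt{n}(\widehat{\theta}_n-p)\overset{\text{law}}{\longrightarrow}\mathcal{N}(0,p(1-p))}$, and we apply the delta-method with ${f(x)=x^2}$, ${f'(p)=2p}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials, p = 5_000, 20_000, 0.3

# theta_hat = sample mean of n Bernoulli(p) variables, one per trial;
# sqrt(n) * (theta_hat - p) -> N(0, p(1-p)) by the CLT.
theta_hat = rng.binomial(n, p, trials) / n

# Delta-method with f(x) = x^2, f'(p) = 2p:
# sqrt(n) * (theta_hat^2 - p^2) / (2p) -> N(0, p(1-p)) as well,
# with the same rate sqrt(n) and the same limiting law.
w = np.sqrt(n) * (theta_hat**2 - p**2) / (2 * p)
print(w.var())  # near p(1-p) = 0.21
```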

If ${f'(b)=0}$ then one has to use a higher-order Taylor formula, and the rate and the fluctuation law are deformed by a power. Namely, suppose that ${f^{(1)}(b)=\cdots=f^{(r-1)}(b)=0}$ while ${f^{(r)}(b)\neq0}$; then, denoting by ${L_r}$ the push-forward of ${L}$ under ${x\mapsto x^r}$, we get

$\frac{a_n^rr!}{f^{(r)}(b)}(f(Z_n)-f(b_n)) \overset{\text{law}}{\longrightarrow}L_r.$
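The degenerate case can also be checked numerically; in this minimal sketch the choices are again arbitrary: ${Z_n}$ is the mean of ${n}$ standard Gaussians (so ${a_n=\sqrt{n}}$, ${b_n=b=0}$, ${L=\mathcal{N}(0,1)}$) and ${f(x)=x^2}$, for which ${f'(0)=0}$, ${f''(0)=2}$, so ${r=2}$ and ${L_2}$ is the law of ${\mathcal{N}(0,1)^2}$, i.e. ${\chi^2(1)}$, with mean ${1}$ and variance ${2}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 1_000, 10_000

# Z_n = mean of n iid N(0,1): sqrt(n) * (Z_n - 0) -> N(0,1).
z_n = rng.standard_normal((trials, n)).mean(axis=1)

# f(x) = x^2 with f'(0) = 0, f''(0) = 2, hence r = 2 and
# (a_n^2 * 2! / f''(0)) * (f(Z_n) - f(0)) = n * Z_n^2 -> chi^2(1).
v = n * z_n**2
print(v.mean(), v.var())  # near 1 and 2, the chi^2(1) moments
```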

The delta-method can of course be generalized to sequences of random vectors, etc.

Last Updated on 2018-01-26

