# Libres pensées d'un mathématicien ordinaire Posts

Sean O’Rourke pointed out on December 30, 2017 that a notation should be corrected in the statement of Lemma A.1 in the probability survey Around the circular law (2012) that I wrote years ago in collaboration with Charles Bordenave.

Indeed the definition of ${\sigma^2}$ should be corrected to

$\sigma^2 :=\min_{1\leq i,j\leq n}\mathrm{Var}(X_{ij}\mid|X_{ij}|\leq a)>0.$

It was erroneously written

$\sigma^2 :=\min_{1\leq i,j\leq n}\mathrm{Var}(X_{ij}\mathbf{1}_{|X_{ij}|\leq a})>0.$

Let us take this occasion to go back to basics on conditional variance and the variance of a truncation. Let ${X}$ be a real random variable on ${(\Omega,\mathcal{F},\mathbb{P})}$ and ${A\in\mathcal{F}}$ be an event. First, the real number ${\mathbb{E}(X\mid A)=\mathbb{E}(X\mid\mathbf{1}_A=1)}$ is not the random variable ${\mathbb{E}(X\mid\mathbf{1}_A)}$. We have

$\mathbb{E}(X\mid\mathbf{1}_A) =\underbrace{\frac{\mathbb{E}(X\mathbf{1}_A)}{\mathbb{P}(A)}}_{\mathbb{E}(X\mid A)}\mathbf{1}_A +\underbrace{\frac{\mathbb{E}(X\mathbf{1}_{A^c})}{\mathbb{P}(A^c)}}_{\mathbb{E}(X\mid A^c)}\mathbf{1}_{A^c}.$

Note that this formula still makes sense when ${\mathbb{P}(A)=0}$ or ${\mathbb{P}(A)=1}$, provided that the term whose indicator vanishes almost surely is read as zero.

The quantity ${\mathbb{E}(X\mid A)}$ makes sense only if ${\mathbb{P}(A)>0}$, and in this case, the conditional variance of ${X}$ given the event ${A}$ is the real number given by

$\begin{array}{rcl} \mathrm{Var}(X\mid A) &=&\mathbb{E}((X-\mathbb{E}(X\mid A))^2\mid A)\\ &=&\mathbb{E}(X^2\mid A)-\mathbb{E}(X\mid A)^2\\ &=&\frac{\mathbb{E}(X^2\mathbf{1}_A)}{\mathbb{P}(A)} -\frac{\mathbb{E}(X\mathbf{1}_A)^2}{\mathbb{P}(A)^2}\\ &=& \frac{\mathbb{E}(X^2\mathbf{1}_A)\mathbb{P}(A)-\mathbb{E}(X\mathbf{1}_A)^2}{\mathbb{P}(A)^2}\\ &=&\mathbb{E}_A(X^2)-\mathbb{E}_A(X)^2=:\mathrm{Var}_A(X) \end{array}$

where ${\mathbb{E}_A}$ is the expectation with respect to the probability measure with density ${\mathbf{1}_A/\mathbb{P}(A)}$ with respect to ${\mathbb{P}}$. In particular, by the Cauchy–Schwarz inequality,

$\mathrm{Var}(X\mid A) \geq 0$

with equality if and only if ${X\mathbf{1}_A}$ and ${\mathbf{1}_A}$ are collinear, in other words if and only if ${X}$ is almost surely constant on ${A}$.

Of course ${\mathrm{Var}(X\mid A)=0}$ if ${X}$ is constant. However ${\mathrm{Var}(X\mid A)}$ may vanish for a non-constant ${X}$. Indeed if ${a>0}$, ${A=\{|X|\leq a\}}$, and ${X\sim\frac{1}{2}\delta_{a/2}+\frac{1}{2}\delta_{2a}}$, then ${X\mid A}$ is constant and equal to ${a/2}$. In this example, since ${X\mathbf{1}_A}$ is not constant, this also shows that one cannot lower bound ${\mathrm{Var}(X\mid A)}$ with the variance of the truncation

$\mathrm{Var}(X\mathbf{1}_A)=\mathbb{E}(X^2\mathbf{1}_A)-\mathbb{E}(X\mathbf{1}_A)^2.$
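The two-point example can be checked by hand or, as a minimal sketch, with exact rational arithmetic. The helper name `conditional_and_truncated_variance` and the choice ${a=1}$ are illustrative assumptions, not part of the survey.

```python
from fractions import Fraction

def conditional_and_truncated_variance(law, a):
    """law maps each value of X to its probability; the event is A = {|X| <= a}."""
    p_a = sum(p for x, p in law.items() if abs(x) <= a)         # P(A)
    m1 = sum(x * p for x, p in law.items() if abs(x) <= a)      # E(X 1_A)
    m2 = sum(x * x * p for x, p in law.items() if abs(x) <= a)  # E(X^2 1_A)
    var_cond = m2 / p_a - (m1 / p_a) ** 2   # Var(X | A)
    var_trunc = m2 - m1 ** 2                # Var(X 1_A)
    return var_cond, var_trunc

# X ~ (1/2) delta_{a/2} + (1/2) delta_{2a} with the illustrative choice a = 1.
a = Fraction(1)
law = {a / 2: Fraction(1, 2), 2 * a: Fraction(1, 2)}
var_cond, var_trunc = conditional_and_truncated_variance(law, a)
print(var_cond, var_trunc)
```

Here the conditional variance is ${0}$ while the variance of the truncation is ${a^2/16}$, so the two definitions of ${\sigma^2}$ genuinely differ.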

Another notable correction. Mylène Maïda pointed out to me on February 27, 2018 that at the bottom of page 14, just before the statement

$\lim_{n\rightarrow\infty}\sup_{z\in C}|n\varphi_{n,1}(\sqrt{n}z)-\pi^{-1}\mathbf{1}_{[0,1]}(|z|)|=0$

the compact set ${C}$ must be taken in ${\{z\in\mathbb{C}:|z|\neq1\}}$ and not in the whole complex plane ${\mathbb{C}}$. Indeed, when ${|z|=1}$, ${n\varphi_{n,1}(\sqrt{n}z)}$ tends as ${n\rightarrow\infty}$ to ${(2\pi)^{-1}}$, that is to half of ${\pi^{-1}}$, and not to ${\pi^{-1}}$, see for instance this former post for a one-formula proof based on the central limit theorem for Poisson random variables. Anyway this is really not surprising since a sequence of continuous functions cannot converge uniformly to a discontinuous function.
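The boundary value can be probed numerically. The sketch below assumes the usual formula ${n\varphi_{n,1}(\sqrt{n}z)=\pi^{-1}\mathrm{e}^{-n|z|^2}\sum_{k<n}(n|z|^2)^k/k!}$, which is not restated in this post. At ${|z|=1}$ the factor after ${\pi^{-1}}$ is ${\mathbb{P}(N_n<n)}$ with ${N_n}$ a Poisson random variable of mean ${n}$. The function name `poisson_cdf_below` is hypothetical.

```python
import math

def poisson_cdf_below(n):
    """Return P(N < n) for N ~ Poisson(n), i.e. e^{-n} * sum_{k<n} n^k / k!."""
    # Each term is computed through logarithms so that large n does not overflow.
    return sum(math.exp(-n + k * math.log(n) - math.lgamma(k + 1)) for k in range(n))

for n in (10, 100, 1000):
    p = poisson_cdf_below(n)
    # Under the assumed formula, p / pi is the value of the density on the unit circle.
    print(n, p, p / math.pi)
```

The probabilities approach ${1/2}$ from below, which under the assumed formula places the boundary value of the density strictly between ${0}$ and ${\pi^{-1}}$.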

This tiny back-to-basics post is devoted to a couple of bits of Probability and Statistics.

The central limit theorem cannot hold in probability. Let ${{(X_n)}_{n\geq1}}$ be iid real random variables with zero mean and unit variance. The central limit theorem (CLT) states that as ${n\rightarrow\infty}$,

$Z_n=\frac{X_1+\cdots+X_n}{\sqrt{n}} \overset{\text{law}}{\longrightarrow}\mathcal{N}(0,1).$

Good students frequently ask whether one can replace the convergence in law by the (stronger) convergence in probability. The answer is negative, and in particular the convergence cannot hold almost surely or in ${L^p}$. Let us examine why. Recall that convergence in probability is stable under linear combinations and under subsequence extraction.

We proceed by contradiction. Suppose that ${Z_n\rightarrow Z_\infty}$ in probability. Then necessarily ${Z_\infty\sim\mathcal{N}(0,1)}$. Now, on the one hand, ${Z_{2n}-Z_n\rightarrow0}$ in probability. On the other hand,

$Z_{2n}-Z_n =\frac{1-\sqrt{2}}{\sqrt{2}}Z_n+\frac{X_{n+1}+\cdots+X_{2n}}{\sqrt{2n}} =\frac{1-\sqrt{2}}{\sqrt{2}}Z_n+\frac{1}{\sqrt{2}}Z_n'.$

But ${Z_n'}$ is an independent copy of ${Z_n}$. Thus the CLT used twice gives ${Z_{2n}-Z_n\overset{\text{law}}{\longrightarrow}\mathcal{N}(0,\sigma^2)}$ with ${\sigma^2=(1-\sqrt{2})^2/2+1/2=2-\sqrt{2}\neq0}$, hence the contradiction.
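A quick simulation illustrates that ${Z_{2n}-Z_n}$ does not shrink as ${n}$ grows. This is only a sketch under assumptions: the ${X_k}$ are taken to be random signs (zero mean, unit variance), and the sample sizes, seed, and function names are arbitrary choices.

```python
import math
import random

def rademacher_sum(n, rng):
    """Sum of n iid signs (+1 or -1 with probability 1/2 each)."""
    # The number of one-bits among n random bits is Binomial(n, 1/2).
    ones = bin(rng.getrandbits(n)).count("1")
    return 2 * ones - n

def sample_difference(n, rng):
    """One draw of Z_{2n} - Z_n built from the same X_1, ..., X_{2n}."""
    s_n = rademacher_sum(n, rng)
    s_2n = s_n + rademacher_sum(n, rng)  # add X_{n+1} + ... + X_{2n}
    return s_2n / math.sqrt(2 * n) - s_n / math.sqrt(n)

rng = random.Random(0)
n, trials = 1000, 20000
draws = [sample_difference(n, rng) for _ in range(trials)]
mean = sum(draws) / trials
variance = sum((d - mean) ** 2 for d in draws) / trials
print(variance, 2 - math.sqrt(2))
```

The empirical variance stays close to ${2-\sqrt{2}}$ whatever the value of ${n}$, in line with the computation of ${\sigma^2}$.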

Alternative proof. Set ${S_n=X_1+\cdots+X_n}$, and observe that

$\frac{S_{2n}-S_n}{\sqrt{n}}=\sqrt{2}Z_{2n}-Z_n.$

Now, if the CLT held in probability, the right hand side would converge in probability to ${\sqrt{2}Z_\infty-Z_\infty}$ which follows the law ${\mathcal{N}(0,(\sqrt{2}-1)^2)}$. On the other hand, since ${S_{2n}-S_n}$ has the law of ${S_n}$, by the CLT, the left hand side converges in law towards ${\mathcal{N}(0,1)}$, the law of ${Z_\infty}$, hence the contradiction. This “reversed” proof was kindly suggested by Michel Ledoux.

Intermezzo: Slutsky lemma. The Slutsky lemma asserts that if

$X_n\overset{\text{law}}{\longrightarrow} X \quad\text{and}\quad Y_n\overset{\text{law}}{\longrightarrow} c$

with ${c}$ constant, then

$(X_n,Y_n)\overset{\text{law}}{\longrightarrow}(X,c),$

and in particular, ${f(X_n,Y_n)\overset{\text{law}}{\longrightarrow} f(X,c)}$ for every continuous ${f}$.

Let us prove it. Since ${Y_n\overset{\text{law}}{\longrightarrow} c}$ and ${c}$ is constant, we have ${Y_n\rightarrow c}$ in probability, and since for all ${t\in\mathbb{R}}$, the function ${y\mapsto \mathrm{e}^{ity}}$ is uniformly continuous on ${\mathbb{R}}$, we have that for all ${s,t\in\mathbb{R}}$ and all ${\varepsilon>0}$, there exists ${\eta>0}$ such that for large enough ${n}$,

$\begin{array}{rcl} |\mathbb{E}(\mathrm{e}^{isX_n+itY_n})-\mathbb{E}(\mathrm{e}^{isX_n+itc})| &\leq&\mathbb{E}(|\mathrm{e}^{itY_n}-\mathrm{e}^{itc}|\mathbf{1}_{|Y_n-c|\leq\eta})+2\mathbb{P}(|Y_n-c|>\eta)\\ &\leq& \varepsilon+2\varepsilon. \end{array}$

Alternatively we can use the Lipschitz property instead of the uniform continuity:

$\begin{array}{rcl} |\mathbb{E}(\mathrm{e}^{isX_n+itY_n})-\mathbb{E}(\mathrm{e}^{isX_n+itc})| &\leq&\mathbb{E}(\left|\mathrm{e}^{itY_n}-\mathrm{e}^{itc}\right|\mathbf{1}_{|Y_n-c|\leq\eta})+2\mathbb{P}(|Y_n-c|>\eta)\\ &\leq& |t|\eta+2\varepsilon. \end{array}$

On the other hand, since ${X_n\overset{\text{law}}{\longrightarrow}X}$, we have, for all ${s,t\in\mathbb{R}}$, as ${n\rightarrow\infty}$,

$\mathbb{E}(\mathrm{e}^{isX_n+itc})=\mathrm{e}^{itc}\mathbb{E}(\mathrm{e}^{isX_n}) \longrightarrow \mathrm{e}^{itc}\mathbb{E}(\mathrm{e}^{isX}) =\mathbb{E}(\mathrm{e}^{isX+itc}).$

The delta-method. Bizarrely this basic result, very useful in Statistics, appears to be unknown to many young probabilists. Suppose that as ${n\rightarrow\infty}$,

$a_n(Z_n-b_n)\overset{\text{law}}{\longrightarrow}L,$

where ${{(Z_n)}_{n\geq1}}$ is a sequence of real random variables, ${L}$ a probability distribution, and ${{(a_n)}_{n\geq1}}$ and ${{(b_n)}_{n\geq1}}$ deterministic sequences such that ${a_n\rightarrow\infty}$ and ${b_n\rightarrow b}$. Then for any ${\mathcal{C}^1}$ function ${f:\mathbb{R}\rightarrow\mathbb{R}}$ such that ${f'(b)\neq0}$, we have

$\frac{a_n}{f'(b)}(f(Z_n)-f(b_n))\overset{\text{law}}{\longrightarrow}L.$

The typical usage in Statistics is for the fluctuations of estimators, say with ${a_n(Z_n-b_n)=\sqrt{n}(\widehat{\theta}_n-\theta)}$. Note that the rate in ${n}$ and the fluctuation law are not modified! Let us give a proof. By a Taylor formula, or here simply the mean value theorem,

$f(Z_n)-f(b_n)=f'(W_n)(Z_n-b_n)$

where ${W_n}$ is a random variable lying between ${b_n}$ and ${Z_n}$. Since ${a_n\rightarrow\infty}$, the Slutsky lemma gives ${Z_n-b_n\rightarrow0}$ in law, and thus in probability since the limit is deterministic. As a consequence ${W_n-b_n\rightarrow0}$ in probability and thus ${W_n\rightarrow b}$ in probability. The continuity of ${f'}$ at ${b}$ provides ${f'(W_n)\rightarrow f'(b)}$ in probability, hence ${f'(W_n)/f'(b)\rightarrow1}$ in probability, and again by the Slutsky lemma,

$\frac{a_n}{f'(b)}(f(Z_n)-f(b_n)) =\frac{f'(W_n)}{f'(b)}a_n(Z_n-b_n) \overset{\text{law}}{\longrightarrow}L.$
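As an illustration, here is a small simulation sketch of the delta-method. The setting is an assumption chosen for convenience: ${Z_n}$ is the empirical mean of ${n}$ iid exponential variables of mean ${\theta=1}$ (so ${a_n=\sqrt{n}}$, ${b_n=b=1}$, ${L=\mathcal{N}(0,1)}$) and ${f(x)=x^2}$, for which ${f'(1)=2}$. The function name, seed, and sample sizes are arbitrary.

```python
import math
import random

def delta_method_draw(n, rng):
    """One draw of sqrt(n) * (f(mean) - f(1)) / f'(1) with f(x) = x^2."""
    mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return math.sqrt(n) * (mean ** 2 - 1.0) / 2.0

rng = random.Random(0)
n, trials = 100, 4000
draws = [delta_method_draw(n, rng) for _ in range(trials)]
m = sum(draws) / trials
v = sum((d - m) ** 2 for d in draws) / trials
print(m, v)
```

The empirical mean and variance should be close to ${0}$ and ${1}$, the parameters of the limiting law ${L}$, even though ${f}$ is nonlinear.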

If ${f'(b)=0}$ then one has to use a higher order Taylor formula, and the rate and fluctuation will be deformed by a power. Namely, suppose that ${f^{(1)}(b)=\cdots=f^{(r-1)}(b)=0}$ while ${f^{(r)}(b)\neq0}$, then, denoting by ${L_r}$ the push forward of ${L}$ by ${x\mapsto x^r}$, we get

$\frac{a_n^rr!}{f^{(r)}(b)}(f(Z_n)-f(b_n)) \overset{\text{law}}{\longrightarrow}L_r.$

The delta-method can of course be generalized to sequences of random vectors, etc.

I ended up putting away my Origine Tuxedo in favor of a less fragile bike, better suited to my new commute. So here is my new bicycle. A relatively low weight, fat tires, hydraulic disc brakes, a classic 2×10 speed drivetrain, luggage racks. A delight.

On Friday I was at the Institut Henri-Poincaré for the monthly talks of MEGA (Matrices Et Graphes Aléatoires). Did you know that this institute owes its existence in particular to the efforts of Émile Borel (mathematician and politician, no doubt a very inspiring figure for Cédric Villani) as well as to the funding of the Rockefeller Foundation and of Edmond de Rothschild?

The morning tutorial lecture was given by Laurent Ménard, and dealt with analytic combinatorics à la Philippe Flajolet. The aim is typically to obtain counting formulas, notably asymptotic ones, by using the arsenal of complex analysis (contour integrals, the saddle-point method, …) starting from functional combinatorial identities on generating functions. According to Laurent, an excellent reference is the book Analytic combinatorics.

The first afternoon talk was given by the energetic Yan Fyodorov and dealt with the article arXiv:1710.04699 on explicit formulas for statistics related to eigenvectors of the real and complex Ginibre Gaussian matrix models. The second was given by Alice Guionnet and dealt with the study of discrete gas models with variable beta, in connection for instance with the article arXiv:1705.05527. In both cases, the virtuosity and the technical arsenal are impressive. Both speakers are world-leading figures.

It seems that in the field of random matrices, most of the simple and approachable questions have already been explored. Should we expect waves of simplifying works in the future? This is what one can wish for the subject. Some think that to survive the test of time, mathematics needs to be simple and deep, and that this results from a slow collective digestion. Beyond this field, it is striking to observe the growing place of technical sophistication in current mathematics. One sometimes starts to have doubts.

In any case, these three talks were fascinating and inspiring!