# Libres pensées d'un mathématicien ordinaire Posts

Together with my colleagues Olivier Guédon, Guillaume Lecué, and Alain Pajor of Marne-la-Vallée, I am organizing a one-week workshop on Probability & Geometry in High Dimensions.

The aim of this workshop is to reflect on recent developments in Probability and Geometry in High Dimensions with emphasis on interactions with other fields of mathematics such as compressed sensing, sparse statistical problems, random matrices, and empirical processes.

The workshop will take place on the campus of Université Paris-Est Marne-la-Vallée, May 17-21, 2010. It turns out that Michel Talagrand will give a talk on this occasion.

This post discusses some basic aspects of the Central Limit Theorem (CLT) in relation with the notions of localization and sparsity. Let ${G\sim\mathcal{N}(0,1)}$ and ${(X_n)_{n\geq1}}$ be a sequence of independent real random variables with, for every ${n\geq1}$,

$\mathbb{E}(X_n)=0\quad\text{and}\quad \sigma_n^2:=\mathbb{E}(X_n^2)<\infty.$

Let us define

$S_n:=\frac{X_1+\cdots+X_n}{s_n} \quad\text{where}\quad s_n^2:=\mathrm{Var}(X_1+\cdots+X_n)=\sigma_1^2+\cdots+\sigma_n^2.$

The Lindeberg CLT states that if, for every ${\varepsilon>0}$,

$\lim_{n\rightarrow\infty}\frac{1}{s_n^2}\sum_{k=1}^n \mathbb{E}(X_k^2\mathbf{1}_{\{|X_k|>\varepsilon s_n\}})=0$

then ${(S_n)_{n\geq1}}$ converges in distribution to the standard Gaussian ${\mathcal{N}(0,1)}$, in other words,

$\lim_{n\rightarrow\infty}\mathbb{P}(S_n\leq x)=\mathbb{P}(G\leq x) \quad\text{for all }x\in\mathbb{R}.$

Moreover, the Feller criterion states that if

$\lim_{n\rightarrow\infty}\max_{1\leq k\leq n}\frac{\sigma_k^2}{s_n^2}=0$

then the Lindeberg condition is necessary and sufficient for the convergence of ${(S_n)_{n\geq1}}$ in distribution to the standard Gaussian ${\mathcal{N}(0,1)}$. The Feller condition means that each single variance ${\sigma_k^2}$ represents an asymptotically negligible portion of the total variance ${s_n^2}$ as ${n}$ goes to infinity. In other words, the total variance is asymptotically spread out over the whole sequence.
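As a quick numerical illustration (a sketch in Python rather than Octave; the choices of ${n}$ and of the uniform law are of course arbitrary), one can simulate ${S_n}$ for i.i.d. uniform variables, for which the Feller and Lindeberg conditions clearly hold, and compare with the Gaussian:

```python
import math
import random

random.seed(0)

def S(n):
    # X_k uniform on [-sqrt(3), sqrt(3)]: mean 0, variance 1, hence s_n^2 = n
    c = math.sqrt(3.0)
    return sum(random.uniform(-c, c) for _ in range(n)) / math.sqrt(n)

n, trials = 100, 10000
samples = [S(n) for _ in range(trials)]
mean = sum(samples) / trials
var = sum(x * x for x in samples) / trials - mean ** 2
frac_neg = sum(x <= 0.0 for x in samples) / trials  # should approach P(G <= 0) = 1/2
```

With these parameters the empirical mean, variance, and ${\mathbb{P}(S_n\leq0)}$ come out close to ${0}$, ${1}$, and ${1/2}$, as expected.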

Conversely, and quite intuitively, one can guess that if ${(\sigma_n)_{n\geq1}}$ is localized then ${S_n}$ is very close to a sum of few terms for arbitrarily large ${n}$, and the CLT may fail due to a lack of averaging (homogenization). Of course, if the sequence ${(\sigma_n)_{n\geq1}}$ is sparse (extreme localization!), i.e. ${\mathrm{Card}\{n\geq1:\sigma_n\neq0\}<\infty}$, then the CLT fails. Beyond sparsity, let us seek a more subtle example for which one can check immediately, from scratch, that the CLT fails. Let us pick a sequence ${(\sigma_n)_{n\geq1}}$ of positive real numbers and a sequence ${(U_n)_{n\geq1}}$ of bounded i.i.d. random variables on ${[-c,c]}$ with mean ${0}$ and variance ${1}$. If we define the random variable ${X_k:=\sigma_kU_k}$ then

$\mathbb{E}(X_k)=0 \quad\text{and}\quad \mathbb{E}(X_k^2)=\sigma_k^2$

and

$c^{-1}|S_n|\leq \frac{\left\Vert(\sigma_1,\ldots,\sigma_n)\right\Vert_1} {\left\Vert(\sigma_1,\ldots,\sigma_n)\right\Vert_2} =:\rho_n.$
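The norms-ratio ${\rho_n}$ is easy to compute numerically. Here is a quick sketch in Python contrasting a localized profile ${\sigma_k=2^{-k}}$, which lies in ${\ell^1}$ and gives a bounded ${\rho_n}$ (increasing to ${\sqrt{3}}$), with the flat profile ${\sigma_k=1}$, for which ${\rho_n=\sqrt{n}}$ diverges:

```python
import math

def rho(sigmas):
    # norms-ratio ||sigma||_1 / ||sigma||_2 of a finite variance profile
    return sum(sigmas) / math.sqrt(sum(s * s for s in sigmas))

# localized profile sigma_k = 2^{-k} (in l^1): rho_n increases to sqrt(3)
loc = [rho([2.0 ** (-k) for k in range(1, n + 1)]) for n in (5, 20, 50)]

# flat profile sigma_k = 1: rho_n = sqrt(n), unbounded
flat = [rho([1.0] * n) for n in (5, 20, 50)]
```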

Now, if ${(\rho_n)_{n\geq1}}$ is bounded then ${(S_n)_{n\geq1}}$ is bounded and thus the CLT fails. The norms-ratio ${\rho_n}$ measures the delocalization of the vector ${(\sigma_1,\ldots,\sigma_n)}$. Note that ${(\rho_n)_{n\geq1}}$ is bounded if ${(\sigma_n)_{n\geq1}}$ grows too fast or decays too fast. For instance, ${(\rho_n)_{n\geq1}}$ is bounded if ${(\sigma_n)_{n\geq1}\in\ell^1}$. On the other hand, since ${s_n^2=s_{n-1}^2+\sigma_n^2}$, the Cauchy-Schwarz inequality gives

$\rho_n \leq \frac{1}{s_n}\sum_{k=1}^{n-1}\sigma_k +\frac{\sigma_n}{s_n} \leq \sqrt{n-1}\,\frac{s_{n-1}}{\sigma_n}+1$

and ${(\rho_n)_{n\geq1}}$ is bounded e.g. if ${\sigma_n=s_{n-1}\sqrt{n}}$. Delocalization control is an essential aspect of the CLT. The Berry-Esseen theorem, which constitutes a quantitative CLT, also involves a norms-ratio measuring localization: if ${(X_n)_{n\geq1}}$ are independent real random variables with

$\mathbb{E}(X_n)=0 \quad\text{and}\quad \sigma^2_n:=\mathbb{E}(X_n^2) \quad\text{and}\quad \tau_n^3:=\mathbb{E}(|X_n|^3)$

and if ${V_n:=(X_1,\ldots,X_n)}$ and ${S_n}$ is defined from ${(X_n)_{n\geq1}}$ as before then for all ${n\geq1}$,

$\sup_{x\in\mathbb{R}} \left|\mathbb{P}\left(S_n\leq x\right)-\mathbb{P}(G\leq x)\right| \leq 6\frac{\mathbb{E}(\left\Vert V_n\right\Vert_3^3)} {\mathbb{E}(\left\Vert V_n\right\Vert_2^2)^{3/2}} = 6\frac{\tau_1^3+\cdots+\tau_n^3}{(\sigma_1^2+\cdots+\sigma_n^2)^{3/2}}.$
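For i.i.d. Rademacher variables (${X_k=\pm1}$ with probability ${1/2}$, so that ${\sigma_k=\tau_k=1}$) the right-hand side is ${6/\sqrt{n}}$, and the left-hand side can be computed exactly from the binomial law. A quick sketch in Python:

```python
import math

def phi(x):
    # standard Gaussian cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_rademacher(n):
    # S_n = (2B - n)/sqrt(n) with B ~ Binomial(n, 1/2): exact Kolmogorov
    # distance between the law of S_n and the standard Gaussian
    dist, acc = 0.0, 0.0
    for k in range(n + 1):
        x = (2 * k - n) / math.sqrt(n)
        p = phi(x)
        dist = max(dist, abs(acc - p))        # just below the atom x
        acc += math.comb(n, k) * 0.5 ** n
        dist = max(dist, abs(acc - p))        # at the atom x
    return dist

n = 100
dist = kolmogorov_rademacher(n)   # about 0.04 for n = 100
bound = 6.0 / math.sqrt(n)        # Berry-Esseen bound: 6 / sqrt(n) here
```

The actual distance is roughly fifteen times smaller than the bound here, consistent with the fact that the constant ${6}$ is far from optimal.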

You may take a look at the recent work of Klartag and Sodin on the role of delocalization in the Berry-Esséen theorem.

Measuring (de)localization with norms-ratios is a classical trick in mathematics and physics. It plays a role, for instance, for eigenvectors in the formalization of the Anderson localization phenomenon for random Schrödinger operators, and in the recent work of Erdős, Schlein, Ramírez, Yau, Tao and Vu on the universality of eigenvalue spacings for models of random matrices. The norms-ratio is also related to embeddings in the local theory of Banach spaces.

This post is inspired by a question asked by my friend Sébastien Blachère.

How does one quickly simulate a positive real random variable $X$ with a polynomial tail, say $\mathbb{P}(X>t)\sim (1+t)^{-\alpha}$ as $t\to\infty$, for some prescribed $\alpha>0$? We may use an already coded function provided by our favorite computational software. Quoting my former PhD advisor, you just have to press the button rand or something similar. But how to proceed from scratch using uniform random numbers? The inversion method suggests setting $X=U^{-1/\alpha}-1$ where $U$ is a uniform random variable on $[0,1]$. The law of $X$ is a Pareto law with probability density function $t\in[0,\infty)\mapsto \alpha(1+t)^{-(1+\alpha)}$. It is amusing to remark that the fractional part of $X$ has probability density function

$$t\in[0,1]\mapsto\alpha\sum_{n=1}^\infty\frac{1}{(n+t)^{1+\alpha}}=\alpha\zeta(\alpha+1,t+1)$$

where $\zeta$ is the Hurwitz zeta function. The integer part of $X$ follows a Zipf-like law.
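Here is the inversion method from scratch, as a sketch in Python rather than Octave. Note that the sampler is in fact exact: $\mathbb{P}(X>t)=(1+t)^{-\alpha}$ for every $t\geq0$, not only asymptotically.

```python
import random

random.seed(1)

def pareto(alpha):
    # inversion method: U uniform on (0, 1] gives X = U^(-1/alpha) - 1,
    # whose tail is exactly P(X > t) = (1 + t)^(-alpha)
    u = 1.0 - random.random()   # lands in (0, 1], avoiding u = 0 exactly
    return u ** (-1.0 / alpha) - 1.0

alpha, n = 2.0, 100000
xs = [pareto(alpha) for _ in range(n)]
tail = sum(x > 1.0 for x in xs) / n   # exact value: (1 + 1)^(-2) = 1/4
zero = sum(x < 1.0 for x in xs) / n   # P(integer part = 0) = 1 - 2^(-2) = 3/4
```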

To simulate an $n$-sample of $X$ with GNU-Octave one can use rand(1,n).^(-1/alpha)-1. The problem is that on computers, real numbers are replaced by floats: there is a quantization limit. Small values of $\alpha$ produce a lack of precision for large values, which are frequent for heavy tails…

Regarding the algorithm, it is not wise to use the inversion method to simulate a law on an infinite set (recall that there is no uniform law on such sets). To simulate $X$, one should prefer an excess packing method as in the Marsaglia & Tsang Ziggurat algorithm. This alternative approach has the advantage of being faster and a bit more precise (though the code is heavier). It is implemented for the exponential, gamma, and Gaussian distributions in the rande, randg, randn functions of GNU-Octave 3.2. If you read French, you may take a look at the first chapter of my pedagogical book with Bernard Bercu. This approach is an instance of a general meta-principle in Computer Science: one can speed up algorithms by using static pre-computations related to the rigid structure of the problem. The fastest algorithm is not necessarily the shortest in terms of number of lines of code.
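The Ziggurat algorithm itself is too long to reproduce here, but the meta-principle is easy to illustrate on a simpler example: alias sampling of a discrete law (Walker's method, in Vose's variant), where a one-time $O(n)$ pre-computation buys $O(1)$ cost per sample afterwards. A sketch in Python; the target law $[0.5,0.3,0.2]$ is of course arbitrary.

```python
import random

def build_alias(p):
    # one-time O(n) pre-computation (Vose's variant of Walker's alias method)
    n = len(p)
    scaled = [x * n for x in p]
    small = [i for i, x in enumerate(scaled) if x < 1.0]
    large = [i for i, x in enumerate(scaled) if x >= 1.0]
    prob, alias = [1.0] * n, list(range(n))
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias

def draw(prob, alias):
    # O(1) per sample: one uniform index, one coin flip, no search
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

random.seed(2)
target = [0.5, 0.3, 0.2]
prob, alias = build_alias(target)
n = 100000
counts = [0] * len(target)
for _ in range(n):
    counts[draw(prob, alias)] += 1
freqs = [c / n for c in counts]
```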

If you are interested in the simulation of $\alpha$-stable distributions, $0<\alpha\leq 2$, you may take a look at the Chambers-Mallows-Stuck exact algorithm, which generalizes the well-known Box-Muller algorithm for Gaussians (and also suffers from the quantization limit).
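For the symmetric case, the Chambers-Mallows-Stuck recipe is short enough to sketch in Python (formula recalled from memory, so double-check against the original paper). Two sanity checks: for $\alpha=2$ it yields a centered Gaussian with variance $2$ in this parametrization, and for $\alpha=1$ the standard Cauchy.

```python
import math
import random

random.seed(3)

def symmetric_stable(alpha):
    # Chambers-Mallows-Stuck, symmetric case: V uniform on (-pi/2, pi/2),
    # W standard exponential, V and W independent
    V = random.uniform(-math.pi / 2.0, math.pi / 2.0)
    W = random.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(V)   # standard Cauchy
    a = math.sin(alpha * V) / math.cos(V) ** (1.0 / alpha)
    b = (math.cos((1.0 - alpha) * V) / W) ** ((1.0 - alpha) / alpha)
    return a * b

# alpha = 2: centered Gaussian with variance 2 (second moment check)
xs = [symmetric_stable(2.0) for _ in range(200000)]
var = sum(x * x for x in xs) / len(xs)

# alpha = 1: standard Cauchy, for which P(|Y| <= 1) = 1/2
ys = [symmetric_stable(1.0) for _ in range(200000)]
frac = sum(abs(y) <= 1.0 for y in ys) / len(ys)
```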

For true (unpredictable) randomness, take a look at random.org and its GNU-R interface 😉

The logarithmic potential is a classical object of potential theory, intimately connected with the two-dimensional Laplacian. It also appears in free probability theory via the free entropy, and in partial differential equations, e.g. Patlak-Keller-Segel models. This post concerns only its usage for the spectra of non-Hermitian random matrices.

Let ${\mathcal{P}(\mathbb{C})}$ be the set of probability measures on ${\mathbb{C}}$ which integrate ${\log\left|\cdot\right|}$ in a neighborhood of infinity. For every ${\mu\in\mathcal{P}(\mathbb{C})}$, the logarithmic potential ${U_\mu}$ of ${\mu}$ on ${\mathbb{C}}$ is the function

$U_\mu:\mathbb{C}\rightarrow(-\infty,+\infty]$

defined for every ${z\in\mathbb{C}}$ by

$U_\mu(z)=-\int_{\mathbb{C}}\!\log|z-z'|\,\mu(dz') =-(\log\left|\cdot\right|*\mu)(z). \ \ \ \ \ (1)$

This integral is well defined in ${(-\infty,+\infty]}$ thanks to the assumption ${\mu\in\mathcal{P}(\mathbb{C})}$. Note that ${U_\mu(z)=+\infty}$ if ${\mu}$ has an atom at the point ${z}$. For the circular law ${\mathcal{U}_\sigma}$ of density

$(\sigma^2\pi)^{-1}\mathbf{1}_{\{z\in\mathbb{C}:|z|\leq\sigma\}}$

we have, for all ${z\in\mathbb{C}}$, the remarkable formula

$U_{\mathcal{U}_\sigma}(z)= \begin{cases} -\log|z| & \text{if } |z|>\sigma, \\ -\log(\sigma)-\frac{1}{2}(|z\sigma^{-1}|^2-1) & \text{if } |z|\leq\sigma, \end{cases} \ \ \ \ \ (2)$

see e.g. the book of Saff and Totik. Let ${\mathcal{D}'(\mathbb{C})}$ be the set of Schwartz-Sobolev distributions (generalized functions). Since ${\log\left|\cdot\right|}$ is Lebesgue locally integrable on ${\mathbb{C}}$, one can check by using the Fubini theorem that ${U_\mu}$ is Lebesgue locally integrable on ${\mathbb{C}}$. In particular,

$U_\mu<\infty$

Lebesgue almost everywhere (a.e.), and

$U_\mu\in\mathcal{D}'(\mathbb{C}).$
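As an aside, the explicit formula (2) for the potential of the circular law can be checked by a quick Monte Carlo experiment, sketched here in Python. The exact values used below are $-\log|z|$ outside the disk and $-\log(\sigma)-\frac{1}{2}(|z\sigma^{-1}|^2-1)$ inside; in particular, the value at the center is $-\log(\sigma)+\frac{1}{2}$.

```python
import math
import random

random.seed(4)

def sample_disk(sigma):
    # uniform random point in the disk of radius sigma: r = sigma * sqrt(U)
    r = sigma * math.sqrt(1.0 - random.random())   # avoids r = 0 exactly
    t = random.uniform(0.0, 2.0 * math.pi)
    return complex(r * math.cos(t), r * math.sin(t))

def U_mc(z, sigma, n=100000):
    # Monte Carlo estimate of U(z) = -E log|z - Z|, Z uniform on the disk
    return -sum(math.log(abs(z - sample_disk(sigma))) for _ in range(n)) / n

out = U_mc(2.0 + 0.0j, 1.0)   # exact value: -log(2)
ctr = U_mc(0.0 + 0.0j, 1.0)   # exact value: 1/2
```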

Since ${\log\left|\cdot\right|}$ is the fundamental solution of the Laplace equation in ${\mathbb{C}}$, we have, in ${\mathcal{D}'(\mathbb{C})}$,

$\Delta U_\mu=-2\pi\mu. \ \ \ \ \ (3)$

In other words, for every smooth and compactly supported “test function” ${\varphi:\mathbb{C}\rightarrow\mathbb{R}}$,

$\int_{\mathbb{C}}\!\Delta\varphi(z)U_\mu(z)\,dz =-2\pi\int_{\mathbb{C}}\!\varphi(z)\,\mu(dz).$

The identity (3) means that ${\mu\mapsto U_\mu}$ is the distributional Green potential of the Laplacian in ${\mathbb{C}}$. The function ${U_\mu}$ contains enough information to recover ${\mu}$:

Lemma 1 (Unicity) For every ${\mu,\nu\in\mathcal{P}(\mathbb{C})}$, if ${U_\mu=U_\nu}$ a.e. then ${\mu=\nu}$.

Proof: Since ${U_\mu=U_\nu}$ in ${\mathcal{D}'(\mathbb{C})}$, we get ${\Delta U_\mu=\Delta U_\nu}$ in ${\mathcal{D}'(\mathbb{C})}$. Now (3) gives ${\mu=\nu}$ in ${\mathcal{D}'(\mathbb{C})}$, and thus ${\mu=\nu}$ as measures since ${\mu}$ and ${\nu}$ are Radon measures. ☐

Let ${A}$ be an ${n\times n}$ complex matrix. We define the discrete probability measure on ${\mathbb{C}}$

$\mu_A:=\frac{1}{n}\sum_{k=1}^n\delta_{\lambda_k(A)}$

where ${\lambda_1(A),\ldots,\lambda_n(A)}$ are the eigenvalues of ${A}$, i.e. the roots in ${\mathbb{C}}$ of its characteristic polynomial ${P_A(z):=\det(A-zI)}$. We also define the discrete probability measure on ${[0,\infty)}$

$\nu_A:=\frac{1}{n}\sum_{k=1}^n\delta_{s_k(A)}$

where ${s_1(A),\ldots,s_n(A)}$ are the singular values of ${A}$, i.e. the eigenvalues of the positive semidefinite Hermitian matrix ${\sqrt{AA^*}}$. We have now

$U_{\mu_A}(z) =-\int_{\mathbb{C}}\!\log\left| z'-z\right|\,\mu_A(dz') =-\frac{1}{n}\log\left|\det(A-zI)\right| =-\frac{1}{n}\log\left| P_A(z)\right|$

for every ${z\in\mathbb{C}\setminus\{\lambda_1(A),\ldots,\lambda_n(A)\}}$. We have also the alternative expression

$U_{\mu_A}(z) =-\frac{1}{n}\log\det(\sqrt{(A-zI)(A-zI)^*}) =-\int_0^\infty\!\log(t)\,\nu_{A-zI}(dt). \ \ \ \ \ (4)$
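The identity (4) can be checked by hand on a toy example, say the ${2\times2}$ nilpotent matrix ${A}$ with ${A_{12}=1}$ and all other entries ${0}$: then ${\det(A-zI)=z^2}$, the trace and determinant of ${(A-zI)(A-zI)^*}$ are computed explicitly below, and both sides of (4) equal ${-\log|z|}$. A sketch in Python:

```python
import math

# toy example: A = [[0, 1], [0, 0]], so that det(A - zI) = z^2 and
# (A - zI)(A - zI)^* = [[|z|^2 + 1, -conj(z)], [-z, |z|^2]]
z = 1.0 + 1.0j

# eigenvalue side of (4): -(1/2) log|det(A - zI)|
lhs = -0.5 * math.log(abs(z * z))

# singular value side of (4): the s_k are the square roots of the
# eigenvalues of (A - zI)(A - zI)^*, obtained from its trace and determinant
tr = 2.0 * abs(z) ** 2 + 1.0
dt = abs(z) ** 4
disc = math.sqrt(tr * tr - 4.0 * dt)
s1 = math.sqrt((tr + disc) / 2.0)
s2 = math.sqrt((tr - disc) / 2.0)
rhs = -0.5 * (math.log(s1) + math.log(s2))
```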

The identity (4) bridges the eigenvalues with the singular values, and is at the heart of the following lemma, which allows one to deduce the convergence of ${\mu_A}$ from that of ${\nu_{A-zI}}$. The strength of this Hermitization lies in the fact that, contrary to the eigenvalues, the singular values can be controlled with the entries of the matrix. The price paid here is the introduction of the auxiliary variable ${z}$ and of the uniform integrability. We recall that on a Borel measurable space ${(E,\mathcal{E})}$, we say that a Borel function ${f:E\rightarrow\mathbb{R}}$ is uniformly integrable for a sequence of probability measures ${(\eta_n)_{n\geq1}}$ on ${E}$ when

$\lim_{t\rightarrow\infty}\varlimsup_{n\rightarrow\infty}\int_{\{|f|>t\}}\!|f|\,d\eta_n=0. \ \ \ \ \ (5)$

We will use this property as follows: if ${(\eta_n)_{n\geq1}}$ converges weakly to ${\eta}$ and ${f}$ is continuous and uniformly integrable for ${(\eta_n)_{n\geq1}}$, then ${f}$ is ${\eta}$-integrable and ${\lim_{n\rightarrow\infty}\int\!f\,d\eta_n=\int\!f\,d\eta}$. The idea of using Hermitization goes back at least to Girko. However, Theorem 2 and Lemma 3 below are inspired from the approach of Tao and Vu.

Theorem 2 (Girko Hermitization) Let ${(A_n)_{n\geq1}}$ be a sequence of complex random matrices where ${A_n}$ is ${n\times n}$ for every ${n\geq1}$, defined on a common probability space. Suppose that for a.a. ${z\in\mathbb{C}}$, there exists a probability measure ${\nu_z}$ on ${[0,\infty)}$ such that a.s.

• (i) ${(\nu_{A_n-zI})_{n\geq1}}$ converges weakly to ${\nu_z}$ as ${n\rightarrow\infty}$
• (ii) ${\log(\cdot)}$ is uniformly integrable for ${\left(\nu_{A_n-zI}\right)_{n\geq1}}$

Then there exists a probability measure ${\mu\in\mathcal{P}(\mathbb{C})}$ such that

• (j) a.s. ${(\mu_{A_n})_{n\geq1}}$ converges weakly to ${\mu}$ as ${n\rightarrow\infty}$
• (jj) for a.a. ${z\in\mathbb{C}}$,

$U_\mu(z)=-\int_0^\infty\!\log(t)\,\nu_z(dt).$

Moreover, if ${(A_n)_{n\geq1}}$ is deterministic, then the statements hold without the “a.s.”

Proof: Let ${z}$ and ${\omega}$ be such that (i-ii) hold. For every ${1\leq k\leq n}$, define

$a_{n,k}:=|\lambda_k(A_n-zI)| \quad\text{and}\quad b_{n,k}:=s_k(A_n-zI)$

and set ${\nu:=\nu_z}$. Note that ${\mu_{A_n-zI}=\mu_{A_n}*\delta_{-z}}$. Thanks to the Weyl inequalities and to the assumptions (i-ii), one can use Lemma 3 below, which gives that ${(\mu_{A_n})_{n\geq1}}$ is tight, that ${\log\left| z-\cdot\right|}$ is uniformly integrable for ${(\mu_{A_n})_{n\geq1}}$, and that

$\lim_{n\rightarrow\infty}U_{\mu_{A_n}}(z)=-\int_0^\infty\!\log(t)\,\nu_z(dt)=:U(z).$

Consequently, a.s. ${\mu\in\mathcal{P}(\mathbb{C})}$ and ${U_\mu=U}$ a.e. for every adherence value ${\mu}$ of ${(\mu_{A_n})_{n\geq1}}$. Now, since ${U}$ does not depend on ${\mu}$, by Lemma 1, a.s. ${\left(\mu_{A_n}\right)_{n\geq1}}$ has a unique adherence value ${\mu}$, and since ${(\mu_{A_n})_{n\geq1}}$ is tight, ${(\mu_{A_n})_{n\geq1}}$ converges weakly to ${\mu}$ by the Prohorov theorem. Finally, by (3), ${\mu}$ is deterministic since ${U}$ is deterministic, and (j-jj) hold. ☐

The following lemma is in a way the skeleton of the Girko Hermitization of Theorem 2. It essentially states a propagation of uniform logarithmic integrability between two triangular arrays, provided that a logarithmic majorization holds between them. See arXiv:0808.1502v2 for a proof.

Lemma 3 (Logarithmic majorization and uniform integrability) Let ${(a_{n,k})_{1\leq k\leq n}}$ and ${(b_{n,k})_{1\leq k\leq n}}$ be two triangular arrays in ${[0,\infty)}$. Define the discrete probability measures

$\mu_n:=\frac{1}{n}\sum_{k=1}^n\delta_{a_{n,k}} \quad\text{and}\quad \nu_n:=\frac{1}{n}\sum_{k=1}^n\delta_{b_{n,k}}.$

If the following properties hold

• (i) ${a_{n,1}\geq\cdots\geq a_{n,n}}$ and ${b_{n,1}\geq\cdots\geq b_{n,n}}$ for ${n\gg1}$,
• (ii) ${\prod_{i=1}^k a_{n,i} \leq \prod_{i=1}^k b_{n,i}}$ for every ${1\leq k\leq n}$ for ${n\gg1}$,
• (iii) ${\prod_{i=k}^n b_{n,i} \leq \prod_{i=k}^n a_{n,i}}$ for every ${1\leq k\leq n}$ for ${n\gg1}$,
• (iv) ${(\nu_n)_{n\geq1}}$ converges weakly to some probability measure ${\nu}$ as ${n\rightarrow\infty}$,
• (v) ${\log(\cdot)}$ is uniformly integrable for ${(\nu_n)_{n\geq1}}$,

then

• (j) ${(\mu_n)_{n\geq1}}$ is tight,
• (jj) ${\log(\cdot)}$ is uniformly integrable for ${(\mu_n)_{n\geq1}}$,
• (jjj) we have, as ${n\rightarrow\infty}$,

$\int_0^\infty\!\log(t)\,\mu_n(dt) =\int_0^\infty\!\log(t)\,\nu_n(dt)\rightarrow\int_0^\infty\!\log(t)\,\nu(dt),$

and in particular, for every adherence value ${\mu}$ of ${(\mu_n)_{n\geq1}}$,

$\int_0^\infty\!\log(t)\,\mu(dt)=\int_0^\infty\!\log(t)\,\nu(dt).$

The logarithmic potential is related to the Cauchy-Stieltjes transform of ${\mu}$ via

$S_\mu(z) :=\int_{\mathbb{C}}\!\frac{1}{z'-z}\,\mu(dz') =(\partial_x-i\partial_y)U_\mu(z) \quad\text{and thus}\quad (\partial_x+i\partial_y)S_\mu=-2\pi\mu$

in ${\mathcal{D}'(\mathbb{C})}$. The term “logarithmic potential” comes from the fact that ${U_\mu}$ is the electrostatic potential of ${\mu}$ viewed as a distribution of charges in ${\mathbb{C}\equiv\mathbb{R}^2}$. The logarithmic energy

$\mathcal{E}(\mu) :=\int_{\mathbb{C}}\!U_\mu(z)\,\mu(dz) =-\int_{\mathbb{C}}\int_{\mathbb{C}}\!\log\left| z-z'\right|\,\mu(dz)\mu(dz')$

is up to a sign the Voiculescu free entropy of ${\mu}$ in free probability theory. The circular law ${\mathcal{U}_\sigma}$ minimizes ${\mu\mapsto\mathcal{E}(\mu)}$ under a second moment constraint. In the spirit of (4) and beyond matrices, the Brown spectral measure of a nonnormal bounded operator ${a}$ is

$\mu_a:=(4\pi)^{-1}\Delta\int_0^\infty\!\log(t)\,\nu_{a-zI}(dt)$

where ${\nu_{a-zI}}$ is the spectral distribution of the self-adjoint operator ${(a-zI)(a-zI)^*}$. Due to the logarithm, the Brown spectral measure ${\mu_a}$ depends discontinuously on the ${*}$-moments of ${a}$. For random matrices, this problem is circumvented in the Girko Hermitization by requiring uniform integrability, which turns out to be a.s. satisfied for random matrices with i.i.d. entries with finite positive variance.
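To conclude, the relation ${S_\mu=(\partial_x-i\partial_y)U_\mu}$ mentioned above can be checked by finite differences on the simplest example, a Dirac mass, for which ${U_{\delta_{w_0}}(z)=-\log|z-w_0|}$ and ${S_{\delta_{w_0}}(z)=(w_0-z)^{-1}}$. A sketch in Python; the location of the mass and the test point are arbitrary.

```python
import math

w0 = 0.3 + 0.4j   # arbitrary location of the Dirac mass

def U(z):
    # logarithmic potential of delta_{w0}
    return -math.log(abs(z - w0))

def S(z):
    # Cauchy-Stieltjes transform of delta_{w0}
    return 1.0 / (w0 - z)

z, h = 2.0 - 1.0j, 1e-6
dx = (U(z + h) - U(z - h)) / (2.0 * h)
dy = (U(z + 1j * h) - U(z - 1j * h)) / (2.0 * h)
approx = dx - 1j * dy   # central-difference approximation of (d_x - i d_y) U
exact = S(z)
```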
