Month: July 2013

Star-fish. My next-door colleague Olivier Guédon is an expert in high dimensional convex geometry. He likes to picture convex bodies as a sort of starfish. This habit is motivated by the fact that in high dimension, the extremal points of a convex body can be really far away from the center, further than most of the other points of the boundary. For instance, one may think about the cube ${\frac{1}{2}[-1,1]^n}$ in ${\mathbb{R}^n}$ of unit volume, for which the extremal points ${\frac{1}{2}(\pm 1,\ldots,\pm 1)}$ have length ${\frac{1}{2}\sqrt{n}}$, which becomes huge in high dimension, while the cube edge length remains equal to ${1}$. This leads to the natural question of the location of most of the mass of the convex body: is it close to the center or close to the extremal points? An answer is provided by the central limit theorem (probabilistic asymptotic geometric analysis!).

Thin-shell. A random vector ${X}$ of ${\mathbb{R}^n}$ follows the uniform law on ${\frac{1}{2}[-1,1]^n}$ iff its coordinates ${X_1,\ldots,X_n}$ are i.i.d. and follow the uniform law on ${\frac{1}{2}[-1,1]}$, for which

$m_2:=\mathbb{E}(X_i^2)=\frac{1}{12}, \quad m_4:=\mathbb{E}(X_i^4)=\frac{1}{80}.$

In particular, ${X_1^2,\ldots,X_n^2}$ have mean ${m_2}$ and variance ${\sigma^2=m_4-m_2^2=\frac{1}{180}}$. For every ${r>0}$, let us consider the thin shell (or ring)
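These moment values are easy to verify numerically. Here is a small Monte Carlo sanity check (an illustrative sketch, not part of the original argument):

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo check of the moments of the uniform law on [-1/2, 1/2]
x = rng.uniform(-0.5, 0.5, size=1_000_000)
m2 = np.mean(x**2)    # should be close to 1/12 ≈ 0.08333
m4 = np.mean(x**4)    # should be close to 1/80 = 0.0125
sigma2 = m4 - m2**2   # should be close to 1/180 ≈ 0.00556

print(m2, m4, sigma2)
```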

$\mathcal{R}_n(r) :=\left\{x\in\mathbb{R}^n:-r\leq\frac{\left\Vert x\right\Vert_2^2-nm_2}{\sqrt{n\sigma^2}}\leq r\right\}.$

Then, thanks to the central limit theorem, we have

$\mathrm{Vol}\left(\frac{1}{2}[-1,1]^n\cap\mathcal{R}_n(r)\right) =\mathbb{P}((X_1,\ldots,X_n)\in\mathcal{R}_n(r)) \underset{n\rightarrow\infty}{\longrightarrow} \int_{-r}^r\!\frac{e^{-\frac{1}{2}t^2}}{\sqrt{2\pi}}\,dt.$

Taking ${r}$ large enough so that the right-hand side is close to ${1}$ (every statistician knows that we get ${0.95}$ for ${r=1.96}$), this indicates that when ${n\gg1}$ most of the volume of the cube is contained in a thin shell at radial position ${\sqrt{nm_2}}$. It turns out that this high dimensional phenomenon is not specific to the cube (${\ell^\infty}$ ball) and remains valid for any convex body (exercise: consider the ${\ell^2}$ ball and the ${\ell^1}$ ball). This is also at the heart of the central limit theorem for convex bodies (references on the subject are given in this previous post).
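One can see the thin-shell phenomenon in a quick simulation (a sketch for illustration; the dimension and sample size below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1_000        # dimension of the cube (1/2)[-1,1]^n
samples = 5_000  # number of uniform random points in the cube
m2, sigma2 = 1/12, 1/180

X = rng.uniform(-0.5, 0.5, size=(samples, n))
# normalized squared norm, as in the definition of the thin shell R_n(r)
Z = (np.sum(X**2, axis=1) - n * m2) / np.sqrt(n * sigma2)

r = 1.96
frac = np.mean(np.abs(Z) <= r)  # fraction of the volume inside R_n(r)
print(frac)                     # close to 0.95 by the central limit theorem
```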

Law of Large Numbers. In some sense, log-concave distributions are geometric generalizations of product distributions with sub-exponential tails. A quick way to keep the thin-shell phenomenon in mind is to think about a random vector ${X}$ with i.i.d. coordinates. The Law of Large Numbers states that when ${n\gg1}$, almost surely, the norm ${\left\Vert X\right\Vert=(X_1^2+\cdots+X_n^2)^{1/2}}$ is close to ${\sqrt{nm}}$ where ${m:=\mathbb{E}(X_1^2)}$. Hence most of the mass of ${X}$ is in a thin shell at radius ${\sqrt{nm}}$, a nice high dimensional phenomenon.

Further reading. You may read the survey Concentration phenomena in high dimensional geometry, by Olivier Guédon (2014).

This summer, an important page turns for me: on September 1st I will leave Université Paris-Est Marne-la-Vallée to join Université Paris-Dauphine. I will keep excellent memories of these four years spent at Marne. It takes only 20 minutes by RER to reach the campus (Noisy-Champs) from Nation. The laboratory is well endowed, the colleagues are friendly and feel at home there, the scientific level is good, the environment is pleasant, and the site is rich in mathematics and computer science. I was able to practice there, with energy and pleasure, the profession of university professor in all its dimensions: administration, coordination, supervision, teaching, research, etc.

I will always admire the exceptional scientific and human qualities of Alain Pajor, who contributed greatly to making me feel at home in Marne-la-Vallée. My great regret is not having managed to convince my probabilist colleagues to develop probability theory in a broad sense, taking the scientific environment into account, beyond historical parochial divides. This counted in my decision to accept the transfer offer from Paris-Dauphine. At Dauphine, the challenges are different but just as exciting.

Marne and Dauphine are relatively young universities of modest size, offering, among other things, programs in financial and actuarial mathematics. The similarity stops there. The student populations differ: Marne plays a social role in a rurban area east of Paris, while Dauphine practices selective admission in an urban setting in western Paris.

The organizational model of Paris-Dauphine may represent the future of French universities: convergence of the statutes of universities and grandes écoles, selective admission, tuition fees, etc. In doing so, Dauphine implements certain aspects of Laurent Schwartz's program, in line with international standards. Will the French academic left manage to get past its symbolic rigidities and finally reinvent a system that never stops aging? Let's meet again in thirty years!

Convergence in law to a constant. Let ${{(X_n)}_{n\geq1}}$ be a sequence of random variables defined on a common probability space ${(\Omega,\mathcal{A},\mathbb{P})}$, and taking their values in a metric space ${(E,d)}$ equipped with its Borel sigma-field. It is well known that if ${{(X_n)}_{n\geq1}}$ converges in law as ${n\rightarrow\infty}$ to some Dirac mass ${\delta_c}$ with ${c\in E}$ then the convergence holds in probability. Actually, since the limit is constant, we do not need to define the random variables on a common probability space: the property depends only on the laws ${{(\mu_n)}_{n\geq1}}$ of the random variables ${{(X_n)}_{n\geq1}}$:

$\forall\varepsilon>0,\quad \mathbb{P}(d(X_n,c)>\varepsilon) =\mu_n(B(c,\varepsilon)^c) \underset{n\rightarrow\infty}{\longrightarrow} 0.$

Note that starting from ${{(\mu_n)}_{n\geq1}}$, it is always possible to construct a sequence ${{(X_n)}_{n\geq1}}$ of random variables defined on a common probability space such that ${X_n\sim\mu_n}$ for every ${n\geq1}$. When

$\forall \varepsilon>0,\quad \sum_n\mu_n(B(c,\varepsilon)^c) =\sum_n\mathbb{P}(d(X_n,c)>\varepsilon)<\infty$

then thanks to the first Borel-Cantelli lemma, we get that ${{(X_n)}_{n\geq1}}$ converges almost surely to ${c}$ as ${n\rightarrow\infty}$. These observations lead naturally to the strange concept of universal convergence, which is a sort of almost sure convergence towards a constant for sequences of distributions!

Universal convergence. Let ${{(\mu_n)}_{n\geq1}}$ be a sequence of probability distributions on a metric space ${(E,d)}$, and let ${c\in E}$. We say that ${{(\mu_n)}_{n\geq1}}$ converges universally to ${c}$ as ${n\rightarrow\infty}$ when for every sequence of random variables ${{(X_n)}_{n\geq1}}$ defined on a common probability space with ${X_n\sim\mu_n}$ for every ${n}$, we have that ${{(X_n)}_{n\geq1}}$ converges almost surely to ${c}$ as ${n\rightarrow\infty}$.

Note that ${{(\mu_n)}_{n\geq1}}$ are the marginal laws of the sequence ${{(X_n)}_{n\geq1}}$, which say nothing about the dependence structure.

Thanks to the first Borel-Cantelli lemma, a sufficient condition to get universal convergence is to have

$\forall \varepsilon>0,\quad \sum_n\mu_n(B(c,\varepsilon)^c)<\infty.$

This condition is actually also necessary, since we may take the ${{(X_n)}_{n\geq1}}$ independent and use the second Borel-Cantelli lemma.
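The contrast between the summable and non-summable cases can be seen in a simulation. Below is an illustrative sketch with independent Bernoulli variables ${X_n\sim\mathrm{Ber}(p_n)}$ and target ${c=0}$: for any ${0<\varepsilon<1}$, an exceedance ${d(X_n,0)>\varepsilon}$ is exactly the event ${\{X_n=1\}}$, so with ${p_n=1/n^2}$ (summable) there are almost surely finitely many exceedances, while with ${p_n=1/n}$ (divergent series) there are almost surely infinitely many (their count grows like ${\log N}$):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

u = rng.uniform(size=N)
n = np.arange(1, N + 1)

# X_n = 1 with probability p_n, X_n = 0 otherwise, independently in n.
# Count the exceedances {X_n = 1} among the first N indices.
exceed_summable = np.sum(u < 1 / n**2)  # sum p_n < ∞: finitely many, a.s.
exceed_divergent = np.sum(u < 1 / n)    # sum p_n = ∞: grows like log N

print(exceed_summable, exceed_divergent)
```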

Who cares? Universal convergence is present in high dimensional phenomena in which a concentration of measure holds around a law of large numbers. This is typically the case, for instance, in asymptotic geometric analysis, in randomized combinatorial optimization, in random matrix theory, and in statistical physics. For instance, in the Wigner semicircle theorem, thanks to the exponential concentration of the empirical spectral distribution around its mean (coming from the Azuma-Hoeffding concentration inequality), we have universal convergence. This means that no matter how we construct the sequence of random matrices, only the law at each step matters.

How about the law of large numbers itself? It is well known that if ${{(X_n)}_{n\geq1}}$ are independent centered random variables with bounded fourth moment, then ${S_n:=\frac{1}{n}(X_1+\cdots+X_n)\rightarrow0}$ as ${n\rightarrow\infty}$ almost surely. Moreover, the convergence is universal. However, the convergence is no longer universal if we weaken the moment condition too much. Still about the law of large numbers, one may also remark that in the Glivenko-Cantelli theorem, the convergence is universal thanks to the Dvoretzky-Kiefer-Wolfowitz concentration inequality.
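The Glivenko-Cantelli case can be illustrated numerically. The sketch below (with arbitrary choices of ${n}$, ${\varepsilon}$, and number of trials) estimates ${\mathbb{P}(\sup_t|F_n(t)-F(t)|>\varepsilon)}$ for Uniform(0,1) samples and compares it with the Dvoretzky-Kiefer-Wolfowitz bound ${2e^{-2n\varepsilon^2}}$, which is summable in ${n}$ and hence yields universal convergence:

```python
import numpy as np

rng = np.random.default_rng(0)

def ecdf_sup_distance(sample):
    """Sup distance between the empirical CDF of a Uniform(0,1) sample
    and the true CDF F(t) = t (Kolmogorov-Smirnov statistic)."""
    n = len(sample)
    s = np.sort(sample)
    grid = np.arange(1, n + 1) / n
    # the sup over t is attained at the jump points of the empirical CDF
    return max(np.max(grid - s), np.max(s - (grid - 1 / n)))

n, eps, trials = 1_000, 0.05, 500
stats = [ecdf_sup_distance(rng.uniform(size=n)) for _ in range(trials)]
freq = np.mean([d > eps for d in stats])   # empirical exceedance frequency
dkw = 2 * np.exp(-2 * n * eps**2)          # DKW bound, here 2 e^{-5} ≈ 0.013

print(freq, dkw)
```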

Note. The term universal was suggested to us by Yan Doumerc. The notion of universal convergence is so natural that it might be present with several other names here and there. Charles Bordenave pointed out that this notion is called complete convergence (abridged into c.c.) in the book of Yukich on Euclidean combinatorial optimization. Jean-René Chazottes pointed out that complete convergence is discussed in the book by Gut on Probability, where it is explained that the concept goes back to Hsu, Robbins, and Erdős (~1947).