December 2018 – Libres pensées d'un mathématicien ordinaire

This tiny post is about a basic characterization of Gaussian distributions.

The theorem. A random vector of dimension two or more has independent components and is rotationally invariant if and only if its components are independent, Gaussian, centered, with the same variance.

In other words, for all $n\geq2$, a probability measure on $\mathbb{R}^n$ is at the same time a product measure and rotationally invariant if and only if it is a Gaussian distribution $\mathcal{N}(0,\sigma^2I_n)$ for some $\sigma\geq0$.

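Note that this does not work for $n=1$: any symmetric law on $\mathbb{R}$, for instance the uniform law on $[-1,1]$, is trivially product and invariant under $x\mapsto\pm x$. In a sense it is a purely multivariate phenomenon.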

A proof. For all $\sigma\geq0$, the Gaussian distribution $\mathcal{N}(0,\sigma^2I_n)$ is a product measure and is rotationally invariant, and if $\sigma>0$, its density is, denoting $|x|:=\sqrt{x_1^2+\cdots+x_n^2}$, $$x\in\mathbb{R}^n\mapsto\exp\Bigl(-\frac{|x|^2}{2\sigma^2}-n\log\sqrt{2\pi\sigma^2}\Bigr).$$ Conversely, suppose that $\mu$ is a rotationally invariant product probability distribution on $\mathbb{R}^n$. We can assume without loss of generality that it has a smooth positive density $f:\mathbb{R}^n\to(0,\infty)$, since otherwise we can consider the probability measure $\mu*\mathcal{N}(0,\varepsilon I_n)$ for $\varepsilon>0$, which is also product and rotationally invariant; once this regularized measure is known to be Gaussian, dividing its characteristic function by $\mathrm{e}^{-\varepsilon|t|^2/2}$ shows that $\mu$ itself is Gaussian. By rotational invariance, $\log f(x)=g(|x|^2)$, and thus $$\partial_i\log f(x)=2g'(|x|^2)x_i.$$ On the other hand, since $\mu$ is product, with identical factors by rotational invariance, we have $\log f (x)=h(x_1)+\cdots+h(x_n)$ and thus $$\partial_i\log f (x)=h'(x_i).$$ Hence $\partial_i\log f(x)$, which depends on $|x|$ via $g'(|x|^2)$, depends only on $x_i$. Since $n\geq2$, we can fix $x_i\neq0$ and let the other coordinates vary, so that $|x|^2$ ranges over $[x_i^2,\infty)$ while $g'(|x|^2)=h'(x_i)/(2x_i)$ stays fixed; it follows that $g'$ is constant. Therefore there exist $a,b\in\mathbb{R}$ such that $g(u)=au+b$ for all $u\geq0$, and thus $f(x)=\mathrm{e}^{a|x|^2+b}$ for all $x\in\mathbb{R}^n$. Since $f$ is a probability density, $a<0$ and $\mathrm{e}^b=(\pi/|a|)^{-n/2}$, which is the density of $\mathcal{N}(0,\sigma^2I_n)$ with $\sigma^2=1/(2|a|)$.
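As a small numerical illustration of the tension between the product structure and rotational invariance, the following sketch compares two product log-densities on $\mathbb{R}^2$ at two points of equal Euclidean norm. The function names and the choice of the Laplace law as a non-Gaussian example are assumptions made only for this illustration.

```python
import math

def log_density_gauss(x, sigma=1.0):
    """Log-density of N(0, sigma^2 I_n) at the point x (a sequence of floats)."""
    n = len(x)
    r2 = sum(xi * xi for xi in x)
    return -r2 / (2 * sigma**2) - n * math.log(math.sqrt(2 * math.pi * sigma**2))

def log_density_laplace(x):
    """Log-density of a product of standard Laplace laws, each with density exp(-|t|)/2."""
    return sum(-abs(xi) - math.log(2) for xi in x)

def rotate(x, theta):
    """Rotate a point of R^2 by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

x = (1.5, 0.0)
y = rotate(x, math.pi / 4)  # same Euclidean norm as x, different direction

# A rotationally invariant density takes the same value at x and y.
gauss_gap = abs(log_density_gauss(x) - log_density_gauss(y))
laplace_gap = abs(log_density_laplace(x) - log_density_laplace(y))
print(gauss_gap, laplace_gap)
```

The Gaussian gap vanishes up to rounding, while the Laplace product density, although product, takes different values at two points of the same norm and is therefore not rotationally invariant.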

Another proof (for the converse). Suppose that $X_1,\ldots,X_n$ are $n\geq2$ independent random variables, such that the random vector $(X_1,\ldots,X_n)$ is rotationally invariant. Since for all $1\leq i\neq j\leq n$, there exists a rotation $R$ such that $Re_i=\pm e_j$, it follows that $X_1,\ldots,X_n$ are independent and identically distributed, with a common law $\nu$ which is symmetric. This reduces the problem to showing that $\nu$ is Gaussian, and it therefore suffices to solve the problem for $n=2$. Now, for all $\theta\in\mathbb{R}$, the rotation of angle $\theta$ in $\mathrm{span}\{e_1,e_2\}$ gives that $\cos(\theta)X_1-\sin(\theta)X_2$ has the law of $X_1$. This indicates that $\nu$ is (symmetric) stable. More precisely, denoting by $\varphi$ its characteristic function, and using the independence, we obtain $\varphi(\cos(\theta)t)\varphi(-\sin(\theta)t)=\varphi(t)$ for all $t\in\mathbb{R}$. Using the expression of the characteristic function of symmetric stable distributions, this leads to the Gaussianity of $\nu$. Note that, without using stability, this also shows that all the cumulants of order $\neq2$ vanish when $\nu$ has all its moments finite. This alternative proof without regularization is inspired by the idea of reduction to $n=2$ due to Dinh-Toan Nguyen, PhD student, communicated by my colleague Laure Dumaz.
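To make the last step concrete: for a symmetric stable law with characteristic function $\varphi(t)=\mathrm{e}^{-c|t|^\alpha}$, $0<\alpha\leq2$, $c>0$, the identity above reads $|\cos(\theta)|^\alpha+|\sin(\theta)|^\alpha=1$ for all $\theta$, and taking $\theta=\pi/4$ gives $2^{1-\alpha/2}=1$, that is $\alpha=2$. The sketch below checks this numerically on a grid; the helper names and the grid are assumptions chosen for illustration, and the parametrization $c=1$, $\alpha=2$ corresponds to a Gaussian law of variance $2$.

```python
import math

def phi_stable(t, alpha, c=1.0):
    """Characteristic function exp(-c |t|^alpha) of a symmetric stable law."""
    return math.exp(-c * abs(t) ** alpha)

def max_defect(alpha, thetas, ts):
    """Largest value of |phi(cos(theta) t) phi(-sin(theta) t) - phi(t)| over the grid."""
    worst = 0.0
    for theta in thetas:
        for t in ts:
            lhs = (phi_stable(math.cos(theta) * t, alpha)
                   * phi_stable(-math.sin(theta) * t, alpha))
            worst = max(worst, abs(lhs - phi_stable(t, alpha)))
    return worst

thetas = [k * math.pi / 12 for k in range(24)]  # angles covering a full turn
ts = [0.25 * k for k in range(1, 13)]           # a few positive values of t

defect_gauss = max_defect(2.0, thetas, ts)   # alpha = 2: Gaussian case
defect_cauchy = max_defect(1.0, thetas, ts)  # alpha = 1: Cauchy case
print(defect_gauss, defect_cauchy)
```

The defect is zero up to rounding for $\alpha=2$ and clearly positive for $\alpha=1$, in agreement with the fact that only the Gaussian member of the symmetric stable family satisfies the identity.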

History. The first proof above is roughly the reasoning followed by James Clerk Maxwell (1831–1879) to derive the distribution of velocities in an ideal gas at equilibrium. In his case $n=3$, and the distribution is known in statistical physics as the Maxwellian distribution. This was a source of inspiration for Ludwig Boltzmann (1844–1906) for the derivation of his kinetic evolution equation and his H-theorem about entropy. The argument was apparently known before Maxwell, for instance to John Herschel (1792–1871), who used it in 1850 in his commentaries on the work of Adolphe Quetelet (1796–1874) in social statistics and probabilities. It was maybe also known to Carl Friedrich Gauss (1777–1855) himself.

Characterizations. This characterization of Gaussian laws among product distributions using invariance by the action of transformations (rotations) leads to the same characterization for the heat semi-group and for the Laplacian operator. There are of course other remarkable characterizations of the Gaussian, for instance as being an eigenvector of the Fourier transform, and also, following Boltzmann, as being the maximum entropy distribution at fixed variance.

Maxwell characterization for unitary invariant random matrices. A random $n\times n$ Hermitian matrix has at the same time independent entries and a law invariant by conjugacy with respect to unitary matrices if and only if it has a Gaussian law with density of the form $$H\mapsto\exp(a\mathrm{Tr}(H^2)+b\mathrm{Tr}(H)+c).$$ Note that the unitary invariance implies that the density depends only on the spectrum and is actually a symmetric function of the eigenvalues. A complete solution can be found for instance in Madan Lal Mehta's book on random matrices (Theorem 2.6.3), which attributes the result to Charles E. Porter and Norbert Rosenzweig (~1960). It is related to a lemma due to Hermann Weyl: all the invariants of an $n\times n$ matrix $H$ under non-singular similarity transformations $H\mapsto AHA^{-1}$ can be expressed in terms of traces of the first $n$ powers of $H$. The assumption about the independence of entries kills all powers above $2$.
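The easy direction can be checked by hand. For a Hermitian matrix $H$, expanding the traces in terms of the entries on and above the diagonal gives $$\mathrm{Tr}(H^2)=\sum_{i,j}H_{ij}H_{ji}=\sum_{i,j}|H_{ij}|^2=\sum_{i}H_{ii}^2+2\sum_{i<j}\bigl((\Re H_{ij})^2+(\Im H_{ij})^2\bigr),\qquad\mathrm{Tr}(H)=\sum_{i}H_{ii},$$ so that $$\exp(a\mathrm{Tr}(H^2)+b\mathrm{Tr}(H)+c)=\mathrm{e}^{c}\prod_{i}\mathrm{e}^{aH_{ii}^2+bH_{ii}}\prod_{i<j}\mathrm{e}^{2a(\Re H_{ij})^2}\mathrm{e}^{2a(\Im H_{ij})^2}.$$ The density thus factorizes over the real coordinates $H_{ii}$, $\Re H_{ij}$, $\Im H_{ij}$, $i<j$, which are therefore independent Gaussian random variables, and integrability requires $a<0$. The unitary invariance follows from $\mathrm{Tr}((UHU^*)^k)=\mathrm{Tr}(UH^kU^*)=\mathrm{Tr}(H^k)$ for every unitary matrix $U$ and every $k\geq1$.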

Letac observation. It is not difficult to show that if $X$ is a random vector of $\mathbb{R}^n$, $n\geq1$, with independent centered Gaussian components of the same positive variance, then $\mathbb{P}(X=0)=0$ and $X/|X|$ is uniformly distributed on the sphere. Conversely, it was shown by my former teacher and colleague Gérard Letac (1940–) that if a random vector $X$ of $\mathbb{R}^n$, $n\geq3$, has independent components and is such that $\mathbb{P}(X=0)=0$ and $X/|X|$ is uniformly distributed on the sphere, then $X$ is Gaussian and in particular its components are Gaussian with zero mean and the same positive variance. Moreover there are counterexamples for $n=1$ and $n=2$. When $n\geq3$, this result of Letac implies the Maxwell theorem.
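The direct part can be illustrated by simulation. The sketch below normalizes standard Gaussian vectors of $\mathbb{R}^3$ and compares the empirical first and second moments of the coordinates of $X/|X|$ with the values $0$ and $1/n$ expected for the uniform law on the unit sphere. The function name, the seed and the sample size are arbitrary assumptions, and matching two moments is of course only a consistency check, not a proof of uniformity.

```python
import math
import random

def uniform_direction(n, rng):
    """Sample X/|X| where X has i.i.d. standard Gaussian components in R^n."""
    while True:
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        if norm > 0.0:  # the event X = 0 has probability zero
            return [xi / norm for xi in x]

rng = random.Random(2018)  # fixed seed, chosen arbitrarily
n, samples = 3, 20000
points = [uniform_direction(n, rng) for _ in range(samples)]

# For the uniform law on the unit sphere of R^n, each coordinate has
# mean 0 and second moment 1/n (the squared coordinates sum to 1).
first_moments = [sum(p[i] for p in points) / samples for i in range(n)]
second_moments = [sum(p[i] ** 2 for p in points) / samples for i in range(n)]
print(first_moments, second_moments)
```

Replacing `rng.gauss` by another symmetric law, for instance a uniform law on $[-1,1]$, is an easy way to see the empirical distribution of the direction change, although these two moments alone may not detect it.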

Lévy observation.

It is known that the kinetic theory of gases rests on Maxwell's law: if one picks at random a molecule of a homogeneous gaseous mass (by which I mean that all the molecules are of the same nature), the three components of its velocity are three independent Gaussian random variables. The justification of this law, in Boltzmann's classical treatise, is terribly complicated. Two simpler explanations had been given, but in my opinion they are worthless.

One is based on an exact mathematical fact: if the magnitude of the velocity is independent of its direction, and if its three components are independent, then it can only be Maxwell's law. But there is a priori no reason to believe in this independence; if for example the speed rarely exceeds one thousand meters per second, and we observe a horizontal velocity of 980 meters per second, may we not think that the motion is almost horizontal and that consequently the vertical component is small? It would then not be independent of the horizontal component.

On the other hand, if a gaseous mass contains n molecules, one can regard the 3n components of their velocities as a point of a Euclidean space of dimension 3n. By the kinetic energy theorem (théorème des forces vives), this point lies on a sphere of that space and, assuming the probability to be uniformly distributed on this surface, one obtains Maxwell's law, all the more exactly as n is larger. Émile Borel, who seems to have been the first to make this remark, saw in it an explanation of the role of this law. I do not share this view. I see no reason to consider two equal elements of this sphere as equally probable if one implies the concentration of almost all the energy on a single molecule, while the other implies that it is, on the contrary, evenly distributed.

I then thought of using the reversibility of the laws of mechanics. According to these laws, if a motion of the particles of a gas is possible, the reverse motion, obtained by running time backwards, is also compatible with the laws of collision. There are, however, irreversible phenomena in nature. But it has long been known (since Gibbs, I believe) that, for phenomena governed by the laws of mechanics, this impossibility of the reverse motion is not absolute. It is only a matter of a very improbable phenomenon. Thus, let us connect a tank filled with nitrogen and a tank filled with oxygen; the two gases will mix. They will no longer be able to separate. Yet the reversed motion of the molecules is possible; but if their initial positions and velocities are chosen at random, the number of centuries one would have to wait for this to have an appreciable chance of happening is so large that one would need (even for rather small gaseous masses) many billions of digits to write it down. Naturally, when two gases are mixed, there is a transitional period of greater or lesser length. But one can readily accept that an equilibrium state is eventually established, in which the velocities are distributed according to a well-determined law, and the distribution then obviously remains the same if one runs time backwards. If one considers the collisions of two molecules, treated for simplicity as elastic spheres, one can then say that the number of collisions of a given kind (the kind being defined by the orientation of the plane tangent to the molecules at the point of contact and by the velocities of the two molecules) is approximately equal to the number of collisions of the opposite kind (obtained by exchanging the normal components of the velocities of the two molecules). This point granted, Maxwell's law follows easily. This is what I showed in the last chapter of my 1925 book.

Am I the first to have indicated this method, which is commonly taught today? I dare not assert it. What is certain is that it was hardly known in 1925. I have had physics professors tell me that they had understood the kinetic theory of gases only thanks to me. I add that, having sent my book to the illustrious Italian mathematician Levi-Civita, I received a letter from him telling me that (of course) it was the chapter on the kinetic theory of gases that had interested him most. This is what made me think that my method must be new.

Paul Lévy (1886–1971), Quelques aspects de la pensée d'un mathématicien, 1970.

Further reading.

John Frederick William Herschel. Quetelet on probabilities. Edinburgh Rev., 92, 1–57 (1850).

James Clerk Maxwell. Illustrations of the dynamical theory of gases. Philosophical Magazine, 4th Series, 19: 390–393 (1860).

Robert Robson, Timon Mehrling, and Jens Osterhoff. Great moments in kinetic theory: 150 years of Maxwell's (other) equations. European Journal of Physics 38(6) (2017).

Balázs Gyenis. Maxwell and the normal distribution: A colored story of probability, independence, and tendency toward equilibrium. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 57: 53–65 (2017).

Paul Lévy. Quelques aspects de la pensée d'un mathématicien. Albert Blanchard (1970).

William Feller. An Introduction to Probability Theory and its Applications, Vol. II, section III.4. Wiley (1971).

Norbert Rosenzweig and Charles E. Porter. Repulsion of energy levels in complex atomic spectra. Phys. Rev., 120: 1698–1714 (1960).

Madan Lal Mehta. Random Matrices. Elsevier (2004).

Hermann Weyl. The Classical Groups: Their Invariants and Representations. Princeton University Press (1966).

Gérard Letac. Isotropy and sphericity: Some characterisations of the normal distribution. The Annals of Statistics, 9(2): 408–417 (1981).


Maxwell characterization of Gaussian distributions