
Libres pensées d'un mathématicien ordinaire Posts

Wasserstein distance between two Gaussians

The \( {W_2} \) Wasserstein coupling distance between two probability measures \( {\mu} \) and \( {\nu} \) on \( {\mathbb{R}^n} \) is

\[ W_2(\mu;\nu):=\inf\mathbb{E}(\Vert X-Y\Vert_2^2)^{1/2} \]

where the infimum runs over all random vectors \( {(X,Y)} \) of \( {\mathbb{R}^n\times\mathbb{R}^n} \) with \( {X\sim\mu} \) and \( {Y\sim\nu} \). It turns out that we have the following nice formula for \( {d:=W_2(\mathcal{N}(m_1,\Sigma_1);\mathcal{N}(m_2,\Sigma_2))} \):

\[ d^2=\Vert m_1-m_2\Vert_2^2 +\mathrm{Tr}(\Sigma_1+\Sigma_2-2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}). \ \ \ \ \ (1) \]

This formula interested several authors including Givens and Shortt, Knott and Smith, Olkin and Pukelsheim, and Dowson and Landau. Note in particular that we have

\[ \mathrm{Tr}((\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2})= \mathrm{Tr}((\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2})^{1/2}). \]

In the commutative case where \( {\Sigma_1\Sigma_2=\Sigma_2\Sigma_1} \), the formula (1) boils down simply to

\[ W_2(\mathcal{N}(m_1,\Sigma_1);\mathcal{N}(m_2,\Sigma_2))^2 =\Vert m_1-m_2\Vert_2^2 +\Vert\Sigma_1^{1/2}-\Sigma_2^{1/2}\Vert_{\mathrm{Frobenius}}^2. \]
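For readers who want to play with formula (1) numerically, here is a minimal sketch in Python with NumPy. The function names `psd_sqrt` and `w2_gaussian_squared` are made up for this illustration, and the matrix square root is computed through an eigendecomposition, which assumes the covariance matrices are symmetric positive semidefinite. The example uses diagonal (hence commuting) covariances, so the result can be compared with the Frobenius form.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)  # clip tiny negative eigenvalues due to rounding
    return (V * np.sqrt(w)) @ V.T  # V diag(sqrt(w)) V^T

def w2_gaussian_squared(m1, S1, m2, S2):
    """Squared W_2 distance between N(m1, S1) and N(m2, S2), following formula (1)."""
    r1 = psd_sqrt(S1)
    cross = psd_sqrt(r1 @ S2 @ r1)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))

# Commuting (diagonal) example: formula (1) should agree with the Frobenius form.
m1, m2 = np.array([0.0, 0.0]), np.array([3.0, 4.0])
S1, S2 = np.diag([1.0, 4.0]), np.diag([9.0, 16.0])
d2 = w2_gaussian_squared(m1, S1, m2, S2)
frob = float(np.sum((m1 - m2) ** 2) + np.sum((psd_sqrt(S1) - psd_sqrt(S2)) ** 2))
print(d2, frob)
```

Both printed numbers should be close to 33: the means contribute 25 and the covariances contribute 8.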

To prove (1), one can first reduce to the centered case \( {m_1=m_2=0} \). Next, if \( {(X,Y)} \) is a random vector (Gaussian or not) of \( {\mathbb{R}^n\times\mathbb{R}^n} \) with covariance matrix

\[ \Gamma= \begin{pmatrix} \Sigma_1 & C\\ C^\top&\Sigma_2 \end{pmatrix} \]

then the quantity

\[ \mathbb{E}(\Vert X-Y\Vert_2^2)=\mathrm{Tr}(\Sigma_1+\Sigma_2-2C) \]

depends only on \( {\Gamma} \). Also, when \( {\mu=\mathcal{N}(0,\Sigma_1)} \) and \( {\nu=\mathcal{N}(0,\Sigma_2)} \), one can restrict the infimum which defines \( {W_2} \) to run over Gaussian laws \( {\mathcal{N}(0,\Gamma)} \) on \( {\mathbb{R}^n\times\mathbb{R}^n} \) with covariance matrix \( {\Gamma} \) structured as above. The sole constraint on \( {C} \) is the Schur complement constraint:

\[ \Sigma_1-C\Sigma_2^{-1}C^\top\succeq0. \]

The minimization of the function

\[ C\mapsto-2\mathrm{Tr}(C) \]

under the constraint above leads to (1). A detailed proof is given by Givens and Shortt. Alternatively, one may find an optimal transportation map, as Knott and Smith did. It turns out that \( {\mathcal{N}(m_2,\Sigma_2)} \) is the image law of \( {\mathcal{N}(m_1,\Sigma_1)} \) under the affine map

\[ x\mapsto m_2+A(x-m_1) \]

where

\[ A=\Sigma_1^{-1/2}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\Sigma_1^{-1/2}=A^\top. \]

To check that this maps \( {\mathcal{N}(m_1,\Sigma_1)} \) to \( {\mathcal{N}(m_2,\Sigma_2)} \), say in the case \( {m_1=m_2=0} \) for simplicity, one may define the random column vectors \( {X\sim\mathcal{N}(0,\Sigma_1)} \) and \( {Y=AX} \), which is again a centered Gaussian, and write

\[ \begin{array}{rcl} \mathbb{E}(YY^\top) &=& A \mathbb{E}(XX^\top) A^\top\\ &=& \Sigma_1^{-1/2}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2} (\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\Sigma_1^{-1/2}\\ &=& \Sigma_2. \end{array} \]

To check that the map is optimal, one may use

\[ \begin{array}{rcl} \mathbb{E}(\|X-Y\|_2^2) &=&\mathbb{E}(\|X\|_2^2)+\mathbb{E}(\|Y\|_2^2)-2\mathbb{E}(\left<X,Y\right>) \\ &=&\mathrm{Tr}(\Sigma_1)+\mathrm{Tr}(\Sigma_2)-2\mathbb{E}(\left<X,AX\right>)\\ &=&\mathrm{Tr}(\Sigma_1)+\mathrm{Tr}(\Sigma_2)-2\mathrm{Tr}(\Sigma_1A) \end{array} \]

and observe that by the cyclic property of the trace,

\[ \mathrm{Tr}(\Sigma_1 A) =\mathrm{Tr}((\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}). \]
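The three algebraic facts used above (symmetry of \( {A} \), the identity \( {A\Sigma_1A^\top=\Sigma_2} \), and the trace identity) can be checked numerically. The sketch below assumes NumPy and positive definite covariances; the helper names `psd_sqrt` and `transport_matrix` and the random test matrices are introduced only for this illustration.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def transport_matrix(S1, S2):
    """A = S1^{-1/2} (S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2}, for S1 positive definite."""
    r1 = psd_sqrt(S1)
    r1_inv = np.linalg.inv(r1)
    return r1_inv @ psd_sqrt(r1 @ S2 @ r1) @ r1_inv

# Two random positive definite covariance matrices (arbitrary test data).
rng = np.random.default_rng(0)
n = 4
G1, G2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
S1 = G1 @ G1.T + np.eye(n)
S2 = G2 @ G2.T + np.eye(n)

A = transport_matrix(S1, S2)
r1 = psd_sqrt(S1)
is_symmetric = np.allclose(A, A.T)
pushes_forward = np.allclose(A @ S1 @ A.T, S2)  # covariance of Y = AX
trace_ok = np.isclose(np.trace(S1 @ A), np.trace(psd_sqrt(r1 @ S2 @ r1)))
print(is_symmetric, pushes_forward, trace_ok)
```

This only confirms the identities up to floating point error on one random instance; it is not a substitute for the proof.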

The generalizations to elliptic families of distributions and to infinite dimensional Hilbert spaces are probably easy. Some more “geometric” properties of Gaussians with respect to such distances were studied more recently by Takatsu and by Takatsu and Yokota.


Exponential mixtures of exponentials are Pareto

Did you know that if [latex]X\sim\mathcal{E}(\lambda)[/latex] and [latex]\mathcal{L}(Y\,\vert\,X=x)=\mathcal{E}(x)[/latex] for all [latex]x>0[/latex] then [latex]Y[/latex] follows a Pareto distribution with probability density function [latex]x\mapsto \lambda/(\lambda+x)^2[/latex] on [latex]\mathbb{R}_+[/latex]? Funny!
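A quick way to convince oneself: integrating out [latex]X[/latex] gives [latex]\int_0^\infty xe^{-xy}\,\lambda e^{-\lambda x}\,dx=\lambda/(\lambda+y)^2[/latex], which is the normalized form of the density, and the corresponding cumulative distribution function is [latex]y\mapsto y/(\lambda+y)[/latex]. The sketch below compares this with a Monte Carlo simulation using only the Python standard library. The function names, the value of [latex]\lambda[/latex], and the sample size are arbitrary choices for this illustration.

```python
import random

def sample_mixture(lam, size, seed=0):
    """Draw Y where X ~ Exp(lam) and Y | X = x ~ Exp(x)."""
    rng = random.Random(seed)
    out = []
    while len(out) < size:
        x = rng.expovariate(lam)
        if x > 0.0:  # guard against the (essentially impossible) draw x == 0
            out.append(rng.expovariate(x))
    return out

def pareto_cdf(y, lam):
    """CDF obtained by integrating lam / (lam + t)^2 over [0, y]."""
    return y / (lam + y)

lam = 2.0
ys = sample_mixture(lam, size=100_000)
for y in (0.5, 2.0, 10.0):
    empirical = sum(v <= y for v in ys) / len(ys)
    print(y, round(empirical, 3), round(pareto_cdf(y, lam), 3))
```

The empirical and theoretical columns should agree up to Monte Carlo error, which is of order [latex]10^{-3}[/latex] for this sample size.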

Consider now the kinetic diffusion process [latex](X_t,Y_t)_{t\geq0}[/latex] on [latex]\mathbb{R}^2[/latex] where

[latex]\displaystyle\begin{cases}dX_t&=dB_t-s(X_t)\lambda dt\\dY_t&=dW_t-s(Y_t)|X_t|dt\end{cases}[/latex]

where [latex](B_t)_{t\geq0}[/latex] and [latex](W_t)_{t\geq0}[/latex] are independent standard Brownian motions and [latex]s[/latex] is the sign function… Can you guess the invariant measure and control the speed of convergence?
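To experiment before guessing, one may simulate the process. Here is a minimal Euler-Maruyama sketch using only the Python standard library. The step size, the time horizon, the starting point [latex](0,0)[/latex] and the function names are assumptions made for this illustration, and the scheme is only a crude approximation since the drift is discontinuous. It does not answer the question; it only produces sample paths whose long-run histogram can be inspected.

```python
import math
import random

def sign(u):
    """Sign function s, with s(0) = 0."""
    return (u > 0) - (u < 0)

def simulate(lam, T=10.0, dt=1e-3, seed=0):
    """Euler-Maruyama path of (X_t, Y_t) started from (0, 0)."""
    rng = random.Random(seed)
    sd = math.sqrt(dt)  # standard deviation of a Brownian increment
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(int(round(T / dt))):
        # Both drifts are evaluated at the state at the start of the step.
        x_new = x - sign(x) * lam * dt + sd * rng.gauss(0.0, 1.0)
        y_new = y - sign(y) * abs(x) * dt + sd * rng.gauss(0.0, 1.0)
        x, y = x_new, y_new
        path.append((x, y))
    return path

path = simulate(lam=1.0)
print(len(path), path[-1])
```

Running many independent paths, or one long path, and plotting the empirical law of [latex]Y_t[/latex] is a natural next step.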



The Portmanteau theorem gives several statements equivalent to narrow convergence, i.e. the weak convergence of probability measures with respect to bounded continuous functions. I wonder if Portmanteau was a mathematician or if this name is just due to the fact that the theorem is a portmanteau for several statements.


Eigenvectors universality for random matrices

Let $latex (X_{jk})_{j,k\geq1}$ be an infinite table of i.i.d. complex random variables and set $latex X:=(X_{jk})_{1\leq j,k\leq n}$. If $latex X_{11}$ is Gaussian then $latex X$ belongs to the so-called Ginibre Ensemble. Consider the random unitary matrices $latex U$ and $latex V$ such that $latex X=UDV$ where $latex D=\mathrm{diag}(s_1,\ldots,s_n)$ and where $latex s_1,\ldots,s_n$ are the singular values of $latex X$, i.e. the eigenvalues of $latex \sqrt{XX^*}$. When $latex X_{11}$ is Gaussian, the law of $latex X$ is rotationally invariant, and the matrices $latex U$ and $latex V$ are distributed according to the Haar law on the unitary group $latex \mathbb{U}_n$. The Gaussian version of the Marchenko-Pastur theorem tells us that with probability one, the counting probability distribution of the singular values, appropriately scaled, tends weakly to the quarter-circular law as $latex n\to\infty$.
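The quarter-circular limit is easy to observe numerically. The sketch below assumes NumPy, standard complex Gaussian entries with $latex \mathbb{E}|X_{11}|^2=1$, and the scaling $latex s_k/\sqrt{n}$, for which the limiting density is $latex x\mapsto\frac{1}{\pi}\sqrt{4-x^2}$ on $latex [0,2]$. The function names and the matrix size are choices made for this illustration.

```python
import numpy as np

def ginibre(n, rng):
    """n x n matrix with i.i.d. standard complex Gaussian entries (E|X_jk|^2 = 1)."""
    return (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)

def quarter_circle_cdf(x):
    """CDF of the quarter-circular law with density sqrt(4 - x^2) / pi on [0, 2]."""
    x = np.clip(x, 0.0, 2.0)
    return (x * np.sqrt(4.0 - x**2) / 2.0 + 2.0 * np.arcsin(x / 2.0)) / np.pi

rng = np.random.default_rng(0)
n = 300
s = np.linalg.svd(ginibre(n, rng), compute_uv=False) / np.sqrt(n)  # scaled singular values

# Compare the empirical CDF of the scaled singular values with the limit on a grid.
grid = np.linspace(0.0, 2.0, 21)
empirical = np.array([(s <= x).mean() for x in grid])
max_gap = float(np.max(np.abs(empirical - quarter_circle_cdf(grid))))
print(max_gap)
```

The printed gap should be small and should shrink as $latex n$ grows. Replacing `ginibre` by a matrix with non-Gaussian entries of mean zero and unit variance is a simple way to see the universality discussed next.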

The Marchenko-Pastur theorem is universal in the sense that it holds with the same limit beyond the Gaussian case provided that $latex X_{11}$ has moments identical to the Gaussian up to order 2. One can ask if a similar statement holds for the eigenvectors, i.e. for the matrices $latex U$ and $latex V$. Are they asymptotically Haar distributed? For instance, one may ask if $latex W_2(\mathcal{L}(U),\mathrm{Haar}(\mathbb{U}_n))$ tends to zero as $latex n\to\infty$, where $latex W_2$ is the Wasserstein coupling distance. The choice of distance is important. One may consider many other distances including for instance the Fourier distance $latex \sup_g|\Phi_\mu(g)-\Phi_\nu(g)|$ where $latex \Phi_\mu$ denotes the Fourier transform of $latex \mu$ (characteristic function). A weakened version of this statement consists in asking whether linear functionals of $latex U$ and $latex V$ behave asymptotically as Brownian bridges. Indeed, it is well known that linear functionals of the Haar law on the unitary group behave asymptotically like this. Silverstein has done some work in this direction. Of course, one can ask the same question for the eigenvectors in the Girko circular law and in the Wigner theorem. One can guess that a finite fourth moment assumption on $latex X_{11}$ is needed, otherwise the top of the spectrum will blow up and the corresponding eigenvectors will maybe localize.

If you do not trust me, just do simulations or… computations! There is here potentially a whole line of research, sparsely explored for the moment. If you like free probability, you may ask if $latex U'XV'$ is close to $latex X$ when $latex U'$ and $latex V'$ are Haar distributed and independent of $latex X$.
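Here is one possible starting point for such simulations, assuming NumPy. It samples a Haar unitary matrix through the QR factorization of a Ginibre matrix with the usual phase correction on the diagonal of $latex R$, and compares a very crude statistic, the average of $latex n^2|U_{jk}|^4$, between a Haar matrix and the left singular vectors of a non-Gaussian matrix. This statistic is a choice made for this illustration: it ignores the phase ambiguity of singular vectors and says nothing about $latex W_2$, but it detects localization. For a Haar unitary its expectation is $latex 2n/(n+1)$.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar unitary via QR of a complex Ginibre matrix, with phase correction."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # multiply column j of q by the phase of r[j, j]

def mean_scaled_fourth_moment(u):
    """Average of n^2 |U_jk|^4 over all entries; insensitive to column phases."""
    n = u.shape[0]
    return float(np.mean((n * np.abs(u) ** 2) ** 2))

rng = np.random.default_rng(0)
n = 200
# Non-Gaussian entries with mean 0 and E|X_jk|^2 = 1: (+-1 +- i) / sqrt(2).
x = (rng.choice([-1.0, 1.0], size=(n, n)) + 1j * rng.choice([-1.0, 1.0], size=(n, n))) / np.sqrt(2.0)
u_svd, _, _ = np.linalg.svd(x)
u_haar = haar_unitary(n, rng)

stat_svd = mean_scaled_fourth_moment(u_svd)
stat_haar = mean_scaled_fourth_moment(u_haar)
print(stat_svd, stat_haar, 2 * n / (n + 1))
```

Heavier-tailed entries, or finer statistics such as linear functionals of $latex U$, are natural variations to try.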

There is some literature on the behavior of eigenvectors of deterministic matrices under perturbations of the entries of the matrix, see e.g. the book of Bhatia (ch. VII). Among many results, if $latex A$ and $latex B$ are two invertible $latex n\times n$ complex matrices with respective unitary factors $latex U_A$ and $latex U_B$ in their polar factorizations then for any unitarily invariant norm $latex \left\Vert\cdot\right\Vert$ we have

$latex \displaystyle\left\Vert U_A-U_B\right\Vert\leq 2\frac{\left\Vert A-B\right\Vert}{\left\Vert A^{-1}\right\Vert^{-1}+\left\Vert B^{-1}\right\Vert^{-1}}.$
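The inequality can be tested numerically on examples. The sketch below assumes NumPy and takes the operator norm for $latex \left\Vert\cdot\right\Vert$ throughout, so that $latex \left\Vert A^{-1}\right\Vert^{-1}$ is the smallest singular value of $latex A$. The function names and the random test matrices are choices made for this illustration.

```python
import numpy as np

def polar_unitary(a):
    """Unitary factor of the polar decomposition a = U P, computed from the SVD."""
    w, _, vh = np.linalg.svd(a)
    return w @ vh

def op_norm(a):
    """Operator (spectral) norm, i.e. the largest singular value."""
    return float(np.linalg.norm(a, 2))

def bound_sides(a, b):
    """Left and right hand sides of the perturbation bound, in operator norm."""
    lhs = op_norm(polar_unitary(a) - polar_unitary(b))
    smin_a = np.linalg.svd(a, compute_uv=False)[-1]  # equals 1 / ||a^{-1}||
    smin_b = np.linalg.svd(b, compute_uv=False)[-1]
    rhs = 2.0 * op_norm(a - b) / (smin_a + smin_b)
    return lhs, rhs

rng = np.random.default_rng(0)
n = 5
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
b = a + 0.01 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
lhs, rhs = bound_sides(a, b)
print(lhs, rhs, lhs <= rhs)
```

When $latex A$ is nearly singular the right hand side becomes large, which is one way to see why the unitary factor can be very sensitive to perturbations.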

The eigenvectors are more sensitive than the bulk of the spectrum to perturbations on $latex X$, and one may understand this by remembering that for a normal matrix, they are arg-suprema while the eigenvalues are suprema. Also, one can guess that the asymptotic uniformization of the eigenvectors may even be sensitive to the skewness of the law of $latex X_{11}$.

It is well known that the $latex k$-dimensional projection of the uniform law on the sphere of $latex \mathbb{R}^n$ of radius $latex \sqrt{n}$ tends to the Gaussian law as $latex n\to\infty$. By viewing $latex \mathbb{U}_n$ as a bunch of exchangeable spheres, one can guess that the Haar law on the unitary group, appropriately scaled, will converge in some sense to the Brownian sheet bridge as the dimension tends to infinity. Recent addition to this post: this was proved in a paper by Donati-Martin and Rouault! We conjecture that this result is universal for the eigenvector matrix of random matrices with i.i.d. entries and moments identical to the Gaussian moments up to order $latex 4$.
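The Gaussian limit of low-dimensional projections of the sphere can be observed with a few lines of code. The sketch below assumes NumPy, takes $latex k=1$, and samples the uniform law on the sphere by normalizing standard Gaussian vectors. The function name and the sample sizes are choices made for this illustration. For the first coordinate, the second moment is exactly $latex 1$ and the fourth moment is $latex 3n/(n+2)$, which tends to the Gaussian value $latex 3$.

```python
import numpy as np

def uniform_on_sphere(n, size, rng):
    """Points uniform on the sphere of R^n of radius sqrt(n), via normalized Gaussians."""
    g = rng.standard_normal((size, n))
    return np.sqrt(n) * g / np.linalg.norm(g, axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, size = 200, 5000
points = uniform_on_sphere(n, size, rng)
first = points[:, 0]  # 1-dimensional projection onto the first coordinate

second_moment = float(np.mean(first**2))
fourth_moment = float(np.mean(first**4))
print(second_moment, fourth_moment, 3 * n / (n + 2))
```

Comparing a histogram of `first` with the standard Gaussian density for increasing $latex n$ gives a more visual check.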

The uniformization of the eigenvectors of random matrices is related to their delocalization, a phenomenon recently investigated by Erdős, Schlein, Ramírez, Yau, Tao, Vu, as a byproduct of their analysis of the universality of local statistics of the spectrum. This is in sharp contrast with the well known Anderson localization phenomenon in mathematical physics for random Schrödinger operators.

The unitary group $latex \mathbb{U}_n$ is a purely $latex \ell^2$ object. Its $latex \ell^1$ analogue is the Birkhoff polytope of doubly stochastic matrices, also known as the transportation polytope, assignment polytope, or perfect matching polytope, but this is another story…

This post benefited from discussions with Charles Bordenave and Florent Benaych-Georges.
