Press "Enter" to skip to content

Wasserstein distance between two Gaussians

The \( {W_2} \) Wasserstein coupling distance between two probability measures \( {\mu} \) and \( {\nu} \) on \( {\mathbb{R}^n} \) is

\[ W_2(\mu;\nu):=\inf\mathbb{E}(\Vert X-Y\Vert_2^2)^{1/2} \]

where the infimum runs over all random vectors \( {(X,Y)} \) of \( {\mathbb{R}^n\times\mathbb{R}^n} \) with \( {X\sim\mu} \) and \( {Y\sim\nu} \). It turns out that we have the following nice formula for \( {d:=W_2(\mathcal{N}(m_1,\Sigma_1);\mathcal{N}(m_2,\Sigma_2))} \):

\[ d^2=\Vert m_1-m_2\Vert_2^2 +\mathrm{Tr}(\Sigma_1+\Sigma_2-2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}). \ \ \ \ \ (1) \]

This formula interested several authors including Givens and Shortt, Knott and Smith, Olkin and Pukelsheim, and Dowson and Landau. Note in particular that we have

\[ \mathrm{Tr}((\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2})= \mathrm{Tr}((\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2})^{1/2}). \]

In the commutative case where \( {\Sigma_1\Sigma_2=\Sigma_2\Sigma_1} \), the formula (1) boils down simply to

\[ W_2(\mathcal{N}(m_1,\Sigma_1);\mathcal{N}(m_2,\Sigma_2))^2 =\Vert m_1-m_2\Vert_2^2 +\Vert\Sigma_1^{1/2}-\Sigma_2^{1/2}\Vert_{Frobenius}^2. \]

To prove (1), one can first reduce to the centered case \( {m_1=m_2=0} \). Next, if \( {(X,Y)} \) is a random vector (Gaussian or not) of \( {\mathbb{R}^n\times\mathbb{R}^n} \) with covariance matrix

\[ \Gamma= \begin{pmatrix} \Sigma_1 & C\\ C^\top&\Sigma_2 \end{pmatrix} \]

then the quantity

\[ \mathbb{E}(\Vert X-Y\Vert_2^2)=\mathrm{Tr}(\Sigma_1+\Sigma_2-2C) \]

depends only on \( {\Gamma} \). Also, when \( {\mu=\mathcal{N}(0,\Sigma_1)} \) and \( {\nu=\mathcal{N}(0,\Sigma_2)} \), one can restrict the infimum which defines \( {W_2} \) to run over Gaussian laws \( {\mathcal{N}(0,\Gamma)} \) on \( {\mathbb{R}^n\times\mathbb{R}^n} \) with covariance matrix \( {\Gamma} \) structured as above. The sole constrain on \( {C} \) is the Schur complement constraint:

\[ \Sigma_1-C\Sigma_2^{-1}C^\top\succeq0. \]

The minimization of the function

\[ C\mapsto-2\mathrm{Tr}(C) \]

under the constraint above leads to (1). A detailed proof is given by Givens and Shortt. Alternatively, one may find an optimal transportation map as Knott and Smith. It turns out that \( {\mathcal{N}(m_2,\Sigma_2)} \) is the image law of \( {\mathcal{N}(m_1,\Sigma_1)} \) with the linear map

\[ x\mapsto m_2+A(x-m_1) \]


\[ A=\Sigma_1^{-1/2}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\Sigma_1^{-1/2}=A^\top. \]

To check that this maps \( {\mathcal{N}(m_1,\Sigma_1)} \) to \( {\mathcal{N}(m_2,\Sigma_2)} \), say in the case \( {m_1=m_2=0} \) for simplicity, one may define the random column vectors \( {X\sim\mathcal{N}(m_1,\Sigma_1)} \) and \( {Y=AX} \) and write

\[ \begin{array}{rcl} \mathbb{E}(YY^\top) &=& A \mathbb{E}(XX^\top) A^\top\\ &=& \Sigma_1^{-1/2}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2} (\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\Sigma_1^{-1/2}\\ &=& \Sigma_2. \end{array} \]

To check that the map is optimal, one may use,

\[ \begin{array}{rcl} \mathbb{E}(\|X-Y\|_2^2) &=&\mathbb{E}(\|X\|_2^2)+\mathbb{E}(\|Y\|_2^2)-2\mathbb{E}(\left<X,Y\right>) \\ &=&\mathrm{Tr}(\Sigma_1)+\mathrm{Tr}(\Sigma_2)-2\mathbb{E}(\left<X,AX\right>)\\ &=&\mathrm{Tr}(\Sigma_1)+\mathrm{Tr}(\Sigma_2)-2\mathrm{Tr}(\Sigma_1A) \end{array} \]

and observe that by the cyclic property of the trace,

\[ \mathrm{Tr}(\Sigma_1 A) =\mathrm{Tr}((\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}). \]

The generalizations to elliptic families of distributions and to infinite dimensional Hilbert spaces is probably easy. Some more “geometric” properties of Gaussians with respect to such distances where studied more recently by Takastu and Takastu and Yokota.


  1. Djalil Chafaï 2011-11-29

    I’ve just corrected a typo pointed out by Pierre-André Zitt. Thanks, Mr PAZ!

  2. Pierre-André Zitt 2011-12-28

    You’re welcome, Mr DjaC!

    For the record, the generalizations to elliptically symmetric distributions and infinite dimensional Hilbert spaces may be found in this paper by Gelbrich.

    Mr PAZ

  3. Gentil 2012-10-12

    Voila justement la formule que je cherchais. Merci!

  4. abhishek 2013-11-18

    can you please tell me what does “tr” stand for ?

  5. Djalil Chafaï 2013-11-18

    Tr is a standard notation for Trace, the sum of the diagonal terms.

  6. Djalil Chafaï 2013-12-16

    Fixed a bug (pointed out by Nicolas F. by email) in the formula of the optimal transportation map.

  7. Djalil Chafaï 2013-12-16

    Note that this distance is also known as the Fréchet or Mallows or Kantorovitch distance in certain communities.

  8. Djalil Chafaï 2014-10-28

    It seems that the expression of the W2 distance between two Gaussian laws is called the Bure metric. See for instance the article arXiv:1410.6883 by Peter Forrester and Mario Kieburg entitled “Relating the Bures measure to the Cauchy two-matrix model”.

  9. Luc, Chen 2018-01-16


    Just curious: Is that possible to get the W_2 distance of two centered Gaussian vectors with covariance matrices K, C in terms of operator norm of C – K ?

    Thanks in advance.

    Best, Luc

Leave a Reply

Your email address will not be published. Required fields are marked *