
About probability metrics

Richard Mansfield Dudley (1938 – 2020)

This post is about a few distances or divergences between probability measures on the same space, by default $\mathbb{R}^n$, or even $\mathbb{R}$. They generate useful but different topologies.

The first one is familiar in Statistics:
\[
\chi^2(\nu\mid\mu)
=\mathrm{Var}_\mu\Bigl(\frac{d\nu}{d\mu}\Bigr)
=\Bigl\|\frac{d\nu}{d\mu}-1\Bigr\|_{L^2(\mu)}^2.
\]
The second one is also known as the Kullback-Leibler information divergence or relative entropy:
\[
\mathrm{Kullback}(\nu\mid\mu)
=\int\log\frac{d\nu}{d\mu}\,d\nu
=\int\frac{d\nu}{d\mu}\log\frac{d\nu}{d\mu}\,d\mu.
\]
The third one is the Fisher information, which can be seen as a logarithmic Sobolev norm:
\[
\mathrm{Fisher}(\nu\mid\mu)
=\int\Bigl|\nabla\log\frac{d\nu}{d\mu}\Bigr|^2\,d\nu
=4\int\Bigl|\nabla\sqrt{\frac{d\nu}{d\mu}}\Bigr|^2\,d\mu.
\]
The fourth one is the order 2 Monge-Kantorovich-Wasserstein Euclidean coupling distance:
\[
\mathrm{Wasserstein}^2(\mu,\nu)
=\inf_{(X_\mu,X_\nu)}\mathbb{E}\Bigl(\frac12|X_\mu-X_\nu|^2\Bigr)
=\sup_{f,g}\Bigl(\int f\,d\mu-\int g\,d\nu\Bigr)
\]
where the infimum runs over all couples $(X_\mu,X_\nu)$ of random variables on the product space with marginal laws $\mu$ and $\nu$, and where the supremum runs over all bounded and Lipschitz $f$ and $g$ such that $f(x)-g(y)\leq\frac12|x-y|^2$ for all $x,y$. The optimal $f$ is given by the infimum convolution $f(x)=\inf_y\bigl(g(y)+\frac12|x-y|^2\bigr)$, which is the Hopf-Lax solution at unit time of the Hamilton-Jacobi equation $\partial_t f_t+\frac12|\nabla_x f_t|^2=0$ with $f_0=g$. The passage from an infimum formulation to a supremum formulation is an instance of the Kantorovich-Rubinstein duality.
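As an illustration of the coupling formulation, here is a minimal Monte Carlo sketch on $\mathbb{R}$ (assuming NumPy, and two hypothetical Gaussian samples standing in for $\mu$ and $\nu$): for empirical measures on the real line with the quadratic cost, the optimal coupling is the monotone one, obtained by sorting both samples.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=10_000)  # sample from mu (hypothetical choice)
y = rng.normal(1.0, 2.0, size=10_000)  # sample from nu (hypothetical choice)

# On the real line, the optimal coupling for the quadratic cost is the
# monotone (quantile) coupling: pair the sorted samples.
w2 = np.mean(0.5 * (np.sort(x) - np.sort(y)) ** 2)
print(w2)  # Monte Carlo estimate of inf E(|X_mu - X_nu|^2 / 2)
```

For one-dimensional Gaussians the exact value is $\frac12\bigl((m_1-m_2)^2+(\sigma_1-\sigma_2)^2\bigr)$, here $1$, which the estimate should approach.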
The fifth one is the total variation distance:
\[
\|\mu-\nu\|_{\mathrm{TV}}
=\inf_{(X_\mu,X_\nu)}\mathbb{E}\bigl(\mathbf{1}_{X_\mu\neq X_\nu}\bigr)
=\sup_{\|f\|_\infty\leq1}\frac12\Bigl(\int f\,d\mu-\int f\,d\nu\Bigr)
=\sup_{A}|\nu(A)-\mu(A)|
=\frac12\|\varphi_\mu-\varphi_\nu\|_{L^1(\lambda)}
\]
where $\varphi_\mu$ and $\varphi_\nu$ are the densities of $\mu$ and $\nu$ with respect to any common reference measure $\lambda$, when such a measure exists. The first expression states that it can be seen as a Monge-Kantorovich-Wasserstein coupling distance of order 1 for the atomic distance on the underlying space. The total variation distance makes no difference between small differences and big differences, in contrast with the Euclidean Wasserstein distance introduced previously. The sixth and last one is the Hellinger distance:
\[
\mathrm{Hellinger}^2(\mu,\nu)
=\frac12\bigl\|\sqrt{\varphi_\mu}-\sqrt{\varphi_\nu}\bigr\|_{L^2(\lambda)}^2
\]
which turns out to be essentially equivalent to total variation, while allowing explicit formulas for tensor products and Gaussians.
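To make the first definitions concrete, here is a minimal sketch (assuming NumPy and two hypothetical probability vectors on a three-point space, playing the roles of $\mu$ and $\nu$ with densities with respect to the counting measure) computing the $\chi^2$, Kullback, total variation, and Hellinger quantities; Fisher and Wasserstein need additional structure (gradients, couplings) and are left aside here.

```python
import numpy as np

# Hypothetical probability vectors on a common three-point space.
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.4, 0.4, 0.2])

f = nu / mu                                                   # density dnu/dmu
chi2       = np.sum(mu * (f - 1.0) ** 2)                      # chi^2(nu | mu) = Var_mu(dnu/dmu)
kullback   = np.sum(nu * np.log(f))                           # Kullback(nu | mu)
tv         = 0.5 * np.sum(np.abs(mu - nu))                    # ||mu - nu||_TV
hellinger2 = 0.5 * np.sum((np.sqrt(mu) - np.sqrt(nu)) ** 2)   # Hellinger^2(mu, nu)

print(chi2, kullback, tv, hellinger2)
```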

Some universal comparisons.

\[
\|\mu-\nu\|_{\mathrm{TV}}^2\leq2\,\mathrm{Kullback}(\nu\mid\mu),
\qquad
2\,\mathrm{Hellinger}^2(\mu,\nu)\leq\mathrm{Kullback}(\nu\mid\mu),
\]
\[
\mathrm{Kullback}(\nu\mid\mu)\leq2\,\chi(\nu\mid\mu)+\chi^2(\nu\mid\mu),
\]
\[
\mathrm{Hellinger}^2(\mu,\nu)\leq\|\mu-\nu\|_{\mathrm{TV}}\leq\mathrm{Hellinger}(\mu,\nu)\sqrt{2-\mathrm{Hellinger}^2(\mu,\nu)},
\]
where $\chi:=\sqrt{\chi^2}$ and $\mathrm{Hellinger}:=\sqrt{\mathrm{Hellinger}^2}$.
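These comparisons are easy to test numerically; the following sketch (assuming NumPy, with random Dirichlet probability vectors as stand-ins for $\mu$ and $\nu$ on a five-point space) checks each inequality on many random pairs.

```python
import numpy as np

rng = np.random.default_rng(1)

def divergences(mu, nu):
    f = nu / mu
    chi2 = np.sum(mu * (f - 1.0) ** 2)
    kl   = np.sum(nu * np.log(f))
    tv   = 0.5 * np.sum(np.abs(mu - nu))
    h2   = 0.5 * np.sum((np.sqrt(mu) - np.sqrt(nu)) ** 2)
    return chi2, kl, tv, h2

for _ in range(10_000):
    mu, nu = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
    chi2, kl, tv, h2 = divergences(mu, nu)
    assert tv ** 2 <= 2 * kl + 1e-12
    assert 2 * h2 <= kl + 1e-12
    assert kl <= 2 * np.sqrt(chi2) + chi2 + 1e-12
    assert h2 <= tv + 1e-12 and tv <= np.sqrt(h2 * (2 - h2)) + 1e-12
```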

Contraction. From the variational formulas, if $\mathrm{dist}\in\{\mathrm{TV},\mathrm{Kullback},\chi^2\}$ then
\[
\mathrm{dist}(\nu f^{-1}\mid\mu f^{-1})\leq\mathrm{dist}(\nu\mid\mu).
\]
In the same spirit, if $f:\mathbb{R}^n\to\mathbb{R}^k$, then
\[
\mathrm{Wasserstein}(\mu f^{-1},\nu f^{-1})\leq\|f\|_{\mathrm{Lip}}\,\mathrm{Wasserstein}(\mu,\nu).
\]
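For instance, here is a hedged sketch (assuming NumPy, with a hypothetical map $f$ that merges states of a six-point space pairwise) of the contraction for Kullback under a push-forward.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, nu = rng.dirichlet(np.ones(6)), rng.dirichlet(np.ones(6))

# A deterministic map f from {0,...,5} onto {0,1,2}; the push-forwards
# mu f^{-1} and nu f^{-1} sum the masses over each fiber.
labels = np.array([0, 0, 1, 1, 2, 2])
mu_f = np.array([mu[labels == k].sum() for k in range(3)])
nu_f = np.array([nu[labels == k].sum() for k in range(3)])

def kullback(b, a):               # Kullback(b | a) = sum b log(b/a)
    return np.sum(b * np.log(b / a))

print(kullback(nu_f, mu_f) <= kullback(nu, mu))   # True: dist decreases under f
```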

Tensorisation.
\[
\mathrm{Hellinger}^2\Bigl(\otimes_{i=1}^n\mu_i,\otimes_{i=1}^n\nu_i\Bigr)
=1-\prod_{i=1}^n\bigl(1-\mathrm{Hellinger}^2(\mu_i,\nu_i)\bigr)
\]
\[
\mathrm{Kullback}\Bigl(\otimes_{i=1}^n\nu_i\;\Bigm|\;\otimes_{i=1}^n\mu_i\Bigr)
=\sum_{i=1}^n\mathrm{Kullback}(\nu_i\mid\mu_i)
\]
\[
\chi^2\Bigl(\otimes_{i=1}^n\mu_i\;\Bigm|\;\otimes_{i=1}^n\nu_i\Bigr)
=-1+\prod_{i=1}^n\bigl(\chi^2(\mu_i\mid\nu_i)+1\bigr)
\]
\[
\mathrm{Fisher}\Bigl(\otimes_{i=1}^n\nu_i\;\Bigm|\;\otimes_{i=1}^n\mu_i\Bigr)
=\sum_{i=1}^n\mathrm{Fisher}(\nu_i\mid\mu_i)
\]
\[
\mathrm{Wasserstein}^2\Bigl(\otimes_{i=1}^n\mu_i,\otimes_{i=1}^n\nu_i\Bigr)
=\sum_{i=1}^n\mathrm{Wasserstein}^2(\mu_i,\nu_i)
\]
\[
\max_{1\leq i\leq n}\|\mu_i-\nu_i\|_{\mathrm{TV}}
\leq\Bigl\|\otimes_{i=1}^n\mu_i-\otimes_{i=1}^n\nu_i\Bigr\|_{\mathrm{TV}}
\leq\sum_{i=1}^n\|\mu_i-\nu_i\|_{\mathrm{TV}}
\]
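The product formulas for Hellinger and Kullback can be verified on small discrete factors; here is a hedged sketch assuming NumPy, with hypothetical Dirichlet-distributed factors on three- and four-point spaces.

```python
import numpy as np

rng = np.random.default_rng(3)
mu1, nu1 = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
mu2, nu2 = rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4))

# Product measures on the product space, flattened into probability vectors.
mu12 = np.outer(mu1, mu2).ravel()
nu12 = np.outer(nu1, nu2).ravel()

def hellinger2(a, b):
    return 0.5 * np.sum((np.sqrt(a) - np.sqrt(b)) ** 2)

def kullback(b, a):               # Kullback(b | a)
    return np.sum(b * np.log(b / a))

print(np.isclose(hellinger2(mu12, nu12),
                 1 - (1 - hellinger2(mu1, nu1)) * (1 - hellinger2(mu2, nu2))))
print(np.isclose(kullback(nu12, mu12),
                 kullback(nu1, mu1) + kullback(nu2, mu2)))
```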

Monotonicity. If ${(X_t)}_{t\geq0}$ is a continuous-time Markov process, ergodic with unique invariant probability measure $\mu$, then, denoting $\mu_t=\mathrm{Law}(X_t)$,
\[
\mathrm{dist}(\mu_t\mid\mu)\ \underset{t\to\infty}{\searrow}\ 0
\]
provided that $\mathrm{dist}$ is convex. Actually, if $\nu\ll\mu$, then
\[
\mathrm{dist}(\nu\mid\mu)=\int\Phi\Bigl(\frac{d\nu}{d\mu}\Bigr)\,d\mu
\quad\text{where}\quad
\Phi(u)=
\begin{cases}
u^2-1 & \text{if }\mathrm{dist}=\chi^2\\
u\log(u) & \text{if }\mathrm{dist}=\mathrm{Kullback}\\
\frac12|u-1| & \text{if }\mathrm{dist}=\mathrm{TV}\\
\frac12(1-\sqrt{u})^2 & \text{if }\mathrm{dist}=\mathrm{Hellinger}^2
\end{cases}
\]
In the case of Fisher and Wasserstein, the monotonicity in time is not always true, but it always holds for the overdamped Langevin diffusion solving the stochastic differential equation
\[
dX_t=\sqrt{2}\,dB_t-\nabla V(X_t)\,dt
\]
provided that the potential $V:\mathbb{R}^d\to\mathbb{R}$ is convex, but this is not obvious at all!
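As a toy illustration of the monotonicity, here is a hedged discrete-time analogue (not the Langevin diffusion itself, assuming NumPy): for any ergodic stochastic matrix $P$ with invariant law $\pi$, the convexity of $u\mapsto u\log u$ makes $\mathrm{Kullback}(\mu_t\mid\pi)$ non-increasing along the iterations.

```python
import numpy as np

rng = np.random.default_rng(4)

# A hypothetical ergodic transition matrix P (all entries positive) and its
# invariant probability measure pi, the left eigenvector for the eigenvalue 1.
P = rng.dirichlet(np.ones(4), size=4)
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
pi = pi / pi.sum()

mu_t = np.array([1.0, 0.0, 0.0, 0.0])   # start from a Dirac mass
for t in range(10):
    supp = mu_t > 0
    kl = np.sum(mu_t[supp] * np.log(mu_t[supp] / pi[supp]))
    print(t, kl)                        # non-increasing in t
    mu_t = mu_t @ P                     # one step of the chain: mu_{t+1} = mu_t P
```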

There are of course plenty of other distances and divergences between probability measures, such as the Lévy-Prokhorov metric, the Fortet-Mourier or Dudley bounded-Lipschitz metric, the Kolmogorov-Smirnov metric, etc.

Further reading.

  • Gibbs, Alison L. and Su, Francis Edward
    On choosing and bounding probability metrics
    International Statistical Review / Revue Internationale de Statistique 70(3) 419-435 (2002)
  • Pollard, David
    A user's guide to measure theoretic probability
    Cambridge University Press (2002)
  • Rachev, Svetlozar Todorov
    Probability Metrics and the Stability of Stochastic Models
    Wiley (1991)
