Loading [MathJax]/jax/output/CommonHTML/jax.js
Press "Enter" to skip to content

About the Hellinger distance

Ernst David Hellinger (1883 - 1950)
Ernst David Hellinger (1883 - 1950)

This tiny post is devoted to the Hellinger distance and affinity.

Hellinger. Let μ and ν be probability measures with respective densities f and g with respect to the Lebesgue measure λ on Rd. Their Hellinger distance is

H(μ,ν)=fgL2(λ)=((fg)2dλ)1/2.

This is well defined since f and g belong to L2(λ). The Hellinger affinity is

A(μ,ν)=fgdλ,H(μ,ν)2=22A(μ,ν).

This gives H(μ,ν)2[0,2], A(μ,ν)[0,1], and the tensor product formula

H(μn,νn)2=22A(μn,νn)=22A(μ,ν)n=22(1H(μ,ν)22)n.

Note that H(μ,ν)2=2 iff μ and ν have disjoint supports.

Note that if μν then limnH(μn,νn)=2, a high dimensional phenomenon.

We could also take the following polarized definition

Hellinger2(μ,ν)=22dμdνdν=22dνdμdμ

which reveals a freeness with respect to the reference measure. This shows also that the Hellinger distance is a Φ-entropy, namely

Hellinger2(μ,ν)=Φ(f)dμΦ(fdμ)

where Φ:=2 and f:=dν/dμ.

The notions of Hellinger distance and affinity pass to discrete distributions by replacing the Lebesgue measure λ by the counting measure. The Hellinger distance is a special case of the Lp version f1/pg1/pLp(λ) available for arbitrary p1. This is useful in asymptotic statistics, and we refer to the textbooks listed below.

Relation to total variation distance. The Hellinger distance is equivalent topologically and close metrically to the total variation distance, in the sense that

H2(μ,ν)2μνTVH(μ,ν)4H(μ,ν)22H(μ,ν)

where

μνTV=supA|μ(A)ν(A)|=12|fg|dλ.

Indeed, the first inequality comes from the following elementary observation

(ab)2=a+b2aba+b2(ab)=|ab|,

valid for all a,b0, while the second inequality comes from

|ab|=|a2b2|=|ab|(a+b)

whiche gives, thanks to the Cauchy-Schwarz inequality,

|fg|dλH(μ,ν)(f+g)2dλ=H(μ,ν)2+2A(μ,ν).

Gaussian explicit formula. The Hellinger distance (or affinity) between two Gaussian distributions can be computed explicitly, just like the square Wasserstein distance and the Kullback-Leibler divergence or relative entropy. Namely

A(N(m1,σ21),N(m2,σ22))=2σ1σ2σ21+σ22exp((m1m2)24(σ21+σ22)),

equal to 1 iff (m1,σ1)=(m2,σ2). By using the tensor product formula, we have also

A(N(m1,σ21)n,N(m2,σ22)n)=(2σ1σ2σ21+σ22)n/2exp(n(m1m2)24(σ21+σ22)).

Here is a general matrix'' formula for Gaussians on Rd, d1, with Δm=m2m1,

A(N(m1,Σ1),N(m2,Σ2))=det(Σ1Σ2)1/4det(Σ1+Σ22)1/2exp(Δm,(Σ1+Σ2)1Δm)4),

see for instance Pardo's book, page 51, for a computation.

The Hellinger affinity is also known as the Bhattacharyya coefficient, and enters the definition of the Bhattacharyya distance (μ,ν)logA(μ,ν).

Application to long time behavior of Ornstein-Uhlenbeck. Let (Bt)t0 be an n-dimensional standard Brownian motion and let (Xxt)t0 be the Ornstein-Uhlenbeck process solution of the stochastic differential equation

X0=x,dXxt=2dBtXxtdt

where xRn. By plugging this equation into the identity d(etXxt)=etdXxt+etXxtdt we get the Mehler formula (the variance comes from the Wiener integral)

Xxt=xet+2t0estdBsN(xet,(1e2t)In)tN(0,In).

It follows in particular that for all x,yRn an t>0

12H2(Law(Xxt),Law(Xyt))=1exp(|xy|2e2t1e2t).

Moreover, denoting μt=Law(Xxt) and μ=N(0,In), it follows that

H(μt,μ)2=22(21e2t2e2t)1/2exp(|x|2e2t4(2e2t)).

This quantity tends to 0 as t. If |x|2=x21++x2ncn then this happens, as n is large, near the critical value t=12log(n), for which e2t=1/n. More information about cutoffs phenomena for Ornstein-Uhlenbeck and diffusions is available in the papers below.

Further reading

One Comment

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Syntax · Style · .