
A reverse Pinsker inequality

Solomon Kullback (1907 – 1994)

This tiny post is devoted to a cute reverse Pinsker inequality, extracted from a recent work of my colleague Justin Salez. This reverse Pinsker inequality is at the heart of his investigation of the cutoff phenomenon, in total variation distance, for Markov chains, which we do not discuss here.

Pinsker inequality. If $\mu$ and $\nu$ are probability measures on the same probability space, it reads
$$2\|\mu-\nu\|_{\mathrm{TV}}^2\leq\mathrm{Entropy}(\nu\mid\mu)$$
where
$$\mathrm{Entropy}(\nu\mid\mu):=\int\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\log\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\,\mathrm{d}\mu$$
is the Kullback-Leibler divergence or relative entropy of $\nu$ with respect to $\mu$, and where
$$\|\mu-\nu\|_{\mathrm{TV}}:=\sup_A|\mu(A)-\nu(A)|$$
is the total variation distance between $\mu$ and $\nu$. By convention $\mathrm{Entropy}(\nu\mid\mu)=+\infty$ if $\frac{\mathrm{d}\nu}{\mathrm{d}\mu}$ does not exist, and in this case the inequality is useless. The constant $2$ is optimal, as one can check on Bernoulli distributions. The inequality was first proved by Mark S. Pinsker (1925 – 2003) with another constant, and later on and independently by Solomon Kullback (1907 – 1994) and Imre Csiszár (1938 – ). Let us now give a quick proof, due to Johannes H. B. Kemperman (1924 – 2011). Using $\|\mathbf{1}_A-\mathbf{1}_{A^c}\|_\infty\leq1$, the $L^1$-$L^\infty$ duality, the convexity inequality
$$3(u-1)^2\leq(2u+4)(u\log u-u+1),\quad u\geq0,$$
and the Cauchy-Schwarz inequality, we get, with $\alpha:=\alpha_{\nu\mid\mu}:=\frac{\mathrm{d}\nu}{\mathrm{d}\mu}$,
$$2\|\mu-\nu\|_{\mathrm{TV}}\leq\sup_{\|f\|_\infty\leq1}\int f\,\mathrm{d}(\mu-\nu)=\int|\alpha-1|\,\mathrm{d}\mu\leq\int\sqrt{\frac{2\alpha+4}{3}}\sqrt{\alpha\log\alpha-\alpha+1}\,\mathrm{d}\mu\leq\sqrt{\int\frac{2\alpha+4}{3}\,\mathrm{d}\mu}\sqrt{\int(\alpha\log\alpha-\alpha+1)\,\mathrm{d}\mu}=\sqrt{2}\sqrt{\int\alpha\log\alpha\,\mathrm{d}\mu}=\sqrt{2\,\mathrm{Entropy}(\nu\mid\mu)}.$$
Note that $\|\cdot\|_{\mathrm{TV}}$ takes its values in $[0,1]$ while $\mathrm{Entropy}$ takes its values in $[0,+\infty]$. Indeed if $\mathrm{d}\nu=\frac{\mathbf{1}_A}{\mu(A)}\mathrm{d}\mu$, then $\|\mu-\nu\|_{\mathrm{TV}}=1-\mu(A)$ while $\mathrm{Entropy}(\nu\mid\mu)=\log\frac{1}{\mu(A)}\to+\infty$ as $\mu(A)\to0$. Also we cannot expect a direct reverse Pinsker inequality.
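
As a quick numerical sanity check, not part of Kemperman's argument, here is a small Python sketch (the helper kl_bernoulli is mine) verifying Pinsker's inequality on a few pairs of Bernoulli distributions; taking $q$ close to $p=1/2$ also illustrates that the constant $2$ cannot be improved.

import math

def kl_bernoulli(q, p):
    # Entropy(nu|mu) for nu = Bernoulli(q) and mu = Bernoulli(p)
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(q, p) + term(1 - q, 1 - p)

for p, q in [(0.5, 0.6), (0.5, 0.51), (0.2, 0.25), (0.1, 0.9)]:
    tv = abs(p - q)                  # total variation distance between Bernoulli(p) and Bernoulli(q)
    kl = kl_bernoulli(q, p)
    assert 2 * tv**2 <= kl + 1e-12   # Pinsker inequality
    print(f"p={p}, q={q}: 2*TV^2 = {2 * tv**2:.6f} <= Entropy = {kl:.6f}")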

Varentropy. The relative entropy can be seen as the mean of a logarithm:
$$\mathrm{Entropy}(\nu\mid\mu)=\int\beta_{\nu\mid\mu}\,\mathrm{d}\nu\quad\text{where}\quad\beta_{\nu\mid\mu}:=\log\frac{\mathrm{d}\nu}{\mathrm{d}\mu}.$$
The variance of $\beta_{\nu\mid\mu}$, or varentropy, is
$$\mathrm{Varentropy}(\nu\mid\mu):=\mathrm{Var}_\nu(\beta_{\nu\mid\mu})=\int(\beta_{\nu\mid\mu}-\mathrm{Entropy}(\nu\mid\mu))^2\,\mathrm{d}\nu=\int\beta_{\nu\mid\mu}^2\,\mathrm{d}\nu-\mathrm{Entropy}(\nu\mid\mu)^2.$$
If $\mathrm{d}\nu=\frac{\mathbf{1}_A}{\mu(A)}\mathrm{d}\mu$ then $\beta_{\nu\mid\mu}$ is constant on $A=\mathrm{supp}(\nu)$ and $\mathrm{Varentropy}(\nu\mid\mu)=0$.
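
To make these definitions concrete, here is a minimal Python sketch (the finite measures mu and nu below are my own toy example) computing the relative entropy and the varentropy, and checking that the varentropy vanishes for the conditioned measure $\mathrm{d}\nu=\frac{\mathbf{1}_A}{\mu(A)}\mathrm{d}\mu$.

import math

def entropy_and_varentropy(nu, mu):
    # relative entropy and varentropy of nu with respect to mu, both given as dicts on a finite set
    beta = {x: math.log(nu[x] / mu[x]) for x in nu if nu[x] > 0}   # beta = log(dnu/dmu) on supp(nu)
    ent = sum(nu[x] * beta[x] for x in beta)                       # mean of beta under nu
    var = sum(nu[x] * (beta[x] - ent) ** 2 for x in beta)          # variance of beta under nu
    return ent, var

mu = {1: 0.2, 2: 0.3, 3: 0.5}
A = {1, 2}
mu_A = sum(mu[x] for x in A)
nu = {x: (mu[x] / mu_A if x in A else 0.0) for x in mu}            # dnu = 1_A / mu(A) dmu

ent, var = entropy_and_varentropy(nu, mu)
print(ent, math.log(1 / mu_A), var)                                # entropy = log(1/mu(A)), varentropy = 0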

A reverse Pinsker inequality. It reads
$$\mathrm{Entropy}(\nu\mid\mu)\leq\frac{1+\sqrt{\mathrm{Varentropy}(\nu\mid\mu)}}{1-\|\mu-\nu\|_{\mathrm{TV}}}.$$
In particular, for all $\varepsilon\in(0,1)$,
$$\|\mu-\nu\|_{\mathrm{TV}}\leq\varepsilon\quad\Longrightarrow\quad\mathrm{Entropy}(\nu\mid\mu)\leq\frac{1+\sqrt{\mathrm{Varentropy}(\nu\mid\mu)}}{1-\varepsilon}.$$
Let us give a proof. Let us set $\varepsilon:=\|\mu-\nu\|_{\mathrm{TV}}$. Consider the event
$$A:=\Big\{\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\geq\mathrm{e}^{\theta}\Big\}\quad\text{where}\quad\theta:=\mathrm{Entropy}(\nu\mid\mu)-\frac{\sqrt{\mathrm{Varentropy}(\nu\mid\mu)}}{1-\varepsilon}.$$
We may assume that $\varepsilon>0$ and $\theta\geq0$, since otherwise the desired inequality is immediate. The Chebyshev inequality gives
$$\nu(A^c)=\nu(\beta_{\nu\mid\mu}<\theta)=\nu\Big(\beta_{\nu\mid\mu}-\mathrm{Entropy}(\nu\mid\mu)<-\frac{\sqrt{\mathrm{Varentropy}(\nu\mid\mu)}}{1-\varepsilon}\Big)\leq(1-\varepsilon)^2,$$
hence $\nu(A)\geq1-(1-\varepsilon)^2$. On the other hand, the definition of $A$ gives
$$\mu(A)\leq\mathrm{e}^{-\theta}\nu(A).$$
These two inequalities together give
$$\nu(A)-\mu(A)\geq(1-(1-\varepsilon)^2)(1-\mathrm{e}^{-\theta}).$$
Now since $\|\mu-\nu\|_{\mathrm{TV}}=\varepsilon$, we get $\varepsilon\geq(1-(1-\varepsilon)^2)(1-\mathrm{e}^{-\theta})=\varepsilon(2-\varepsilon)(1-\mathrm{e}^{-\theta})$, in other words
$$\theta\leq\log\Big(1+\frac{1}{1-\varepsilon}\Big).$$
Finally, the convexity inequality $\log(1+u)\leq u$ gives $\theta\leq\frac{1}{1-\varepsilon}$, that is, $\mathrm{Entropy}(\nu\mid\mu)\leq\frac{1+\sqrt{\mathrm{Varentropy}(\nu\mid\mu)}}{1-\varepsilon}$, as expected.
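
The following Python sketch, again mine and not part of the proof, checks this reverse Pinsker inequality on random probability vectors over a finite set, the total variation distance being computed as half of the $\ell^1$ distance.

import math, random

random.seed(0)

def random_probability_vector(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

for _ in range(1000):
    mu = random_probability_vector(5)
    nu = random_probability_vector(5)
    beta = [math.log(nu[i] / mu[i]) for i in range(5)]         # beta = log(dnu/dmu)
    ent = sum(nu[i] * beta[i] for i in range(5))               # Entropy(nu|mu)
    var = sum(nu[i] * (beta[i] - ent) ** 2 for i in range(5))  # Varentropy(nu|mu)
    tv = 0.5 * sum(abs(mu[i] - nu[i]) for i in range(5))       # total variation distance
    assert ent <= (1 + math.sqrt(var)) / (1 - tv) + 1e-9       # reverse Pinsker inequality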

If $\mathrm{d}\nu=\frac{\mathbf{1}_A}{\mu(A)}\mathrm{d}\mu$ then $\mathrm{Entropy}(\nu\mid\mu)=\log\frac{1}{\mu(A)}$, $\mathrm{Varentropy}(\nu\mid\mu)=0$, and $\|\mu-\nu\|_{\mathrm{TV}}=1-\mu(A)$, so that the reverse Pinsker inequality above reads $\mu(A)\log\mu(A)\geq-1$.

Fisher information and Poincaré inequality. The logarithmic Fisher information is defined by
$$\mathrm{Fisher}(\nu\mid\mu):=\int|\nabla\beta_{\nu\mid\mu}|^2\,\mathrm{d}\nu=\int\frac{|\nabla\frac{\mathrm{d}\nu}{\mathrm{d}\mu}|^2}{\frac{\mathrm{d}\nu}{\mathrm{d}\mu}}\,\mathrm{d}\mu.$$
Also we have, denoting $c_{\mathrm{PI}}(\nu)$ the Poincaré inequality constant of $\nu$:
$$\mathrm{Varentropy}(\nu\mid\mu)=\mathrm{Var}_\nu(\beta_{\nu\mid\mu})\leq c_{\mathrm{PI}}(\nu)\int|\nabla\beta_{\nu\mid\mu}|^2\,\mathrm{d}\nu=c_{\mathrm{PI}}(\nu)\,\mathrm{Fisher}(\nu\mid\mu).$$
Combined with the reverse Pinsker inequality above, this gives
$$\mathrm{Entropy}(\nu\mid\mu)\leq\frac{1+\sqrt{c_{\mathrm{PI}}(\nu)\,\mathrm{Fisher}(\nu\mid\mu)}}{1-\|\mu-\nu\|_{\mathrm{TV}}}.$$
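
For a concrete illustration (a Gaussian example of my own choosing, not from the post), take $\mu=\mathcal{N}(0,1)$ and $\nu=\mathcal{N}(m,1)$: then $\beta_{\nu\mid\mu}(x)=mx-\frac{m^2}{2}$, so $\mathrm{Entropy}(\nu\mid\mu)=\frac{m^2}{2}$, $\mathrm{Fisher}(\nu\mid\mu)=m^2$, $c_{\mathrm{PI}}(\nu)=1$, and $\mathrm{Varentropy}(\nu\mid\mu)=m^2$, the Poincaré inequality being an equality here since $\beta_{\nu\mid\mu}$ is affine; moreover $\|\mu-\nu\|_{\mathrm{TV}}=2\Phi(\frac{m}{2})-1$. The short Python sketch below checks the resulting bound numerically.

import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for m in [0.5, 1.0, 2.0, 4.0]:
    entropy = m**2 / 2                     # Entropy(nu|mu) for mu = N(0,1), nu = N(m,1)
    fisher = m**2                          # |grad beta|^2 = m^2 is constant
    c_pi = 1.0                             # Poincaré constant of N(m,1)
    tv = 2 * std_normal_cdf(m / 2) - 1     # total variation distance between N(0,1) and N(m,1)
    bound = (1 + math.sqrt(c_pi * fisher)) / (1 - tv)
    assert entropy <= bound
    print(f"m={m}: Entropy = {entropy:.3f} <= {bound:.3f}")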

Note. There are plenty of alternate or reverse Pinsker inequalities. For instance the Bretagnolle-Huber inequality, used for hypothesis testing in Statistics, reads
$$\|\mu-\nu\|_{\mathrm{TV}}^2\leq1-\mathrm{e}^{-\mathrm{Entropy}(\nu\mid\mu)},$$
and has the advantage over Pinsker of being informative even when $\mathrm{Entropy}(\nu\mid\mu)\geq2$.
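
To compare the two bounds on $\|\mu-\nu\|_{\mathrm{TV}}$ (a quick sketch of mine): the Pinsker bound $\sqrt{\mathrm{Entropy}(\nu\mid\mu)/2}$ exceeds $1$ as soon as the entropy reaches $2$, while the Bretagnolle-Huber bound $\sqrt{1-\mathrm{e}^{-\mathrm{Entropy}(\nu\mid\mu)}}$ always stays below $1$.

import math

for ent in [0.5, 1.0, 2.0, 5.0, 10.0]:
    pinsker_bound = math.sqrt(ent / 2)        # from 2*TV^2 <= Entropy
    bh_bound = math.sqrt(1 - math.exp(-ent))  # from TV^2 <= 1 - exp(-Entropy)
    print(f"Entropy = {ent}: Pinsker bound = {pinsker_bound:.3f}, Bretagnolle-Huber bound = {bh_bound:.3f}")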


Justin Salez (1984 – )