Press "Enter" to skip to content

A reverse Pinsker inequality

Solomon Kullback (1907 – 1994)

This tiny post is devoted to a cute reverse Pinsker inequality, extracted from a recent work of my colleague Justin Salez. This reverse Pinsker inequality is at the heart of his investigation of the cutoff phenomenon, in total variation distance, for Markov chains, which we do not discuss here.

Pinsker inequality. If $\mu$ and $\nu$ are probability measures on the same probability space, it reads
$$2\|\mu-\nu\|_{\mathrm{TV}}^2\leq\mathrm{Entropy}(\nu\mid\mu)$$
where
$$\mathrm{Entropy}(\nu\mid\mu):=\int\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\log\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\mathrm{d}\mu$$
is the Kullback-Leibler divergence or relative entropy of $\nu$ with respect to $\mu$, and where
$$\|\mu-\nu\|_{\mathrm{TV}}:=\sup_A|\mu(A)-\nu(A)|$$
is the total variation distance between $\mu$ and $\nu$. By convention $\mathrm{Entropy}(\nu\mid\mu)=+\infty$ if $\frac{\mathrm{d}\nu}{\mathrm{d}\mu}$ does not exist, and in this case the inequality is useless. The constant $2$ is optimal, as one can check on Bernoulli distributions (see the example below). The inequality was first proved by Mark S. Pinsker (1925 – 2003) with another constant, and later on and independently by Solomon Kullback (1907 – 1994) and Imre Csiszár (1938 – ). Let us now give a quick proof, due to Johannes H. B. Kemperman (1924 – 2011). Using $\|\mathbf{1}_A-\mathbf{1}_{A^c}\|_\infty\leq1$, the $L^1$-$L^\infty$ duality, the convexity inequality
$$3(u-1)^2\leq(2u+4)(u\log u-u+1),\quad u\geq0,$$
and the Cauchy-Schwarz inequality, we get, with $\alpha:=\alpha_{\nu\mid\mu}:=\frac{\mathrm{d}\nu}{\mathrm{d}\mu}$,
$$2\|\mu-\nu\|_{\mathrm{TV}}\leq\sup_{\|f\|_\infty\leq1}\int f\,\mathrm{d}(\mu-\nu)=\int|\alpha-1|\mathrm{d}\mu\leq\int\sqrt{\frac{2\alpha+4}{3}}\sqrt{\alpha\log\alpha-\alpha+1}\,\mathrm{d}\mu\leq\sqrt{\int\frac{2\alpha+4}{3}\mathrm{d}\mu}\sqrt{\int(\alpha\log\alpha-\alpha+1)\mathrm{d}\mu}=\sqrt{2}\sqrt{\int\alpha\log\alpha\,\mathrm{d}\mu}=\sqrt{2\,\mathrm{Entropy}(\nu\mid\mu)}.$$
Note that $\|\cdot\|_{\mathrm{TV}}$ takes its values in $[0,1]$ while $\mathrm{Entropy}$ takes its values in $[0,+\infty]$. Indeed if $\mathrm{d}\nu=\frac{\mathbf{1}_A}{\mu(A)}\mathrm{d}\mu$, then $\|\mu-\nu\|_{\mathrm{TV}}=1-\mu(A)$ while $\mathrm{Entropy}(\nu\mid\mu)=-\log\mu(A)\to+\infty$ as $\mu(A)\to0$. Also we cannot expect a direct reverse Pinsker inequality.

Varentropy. The relative entropy can be seen as the mean, under $\nu$, of a logarithm:
$$\mathrm{Entropy}(\nu\mid\mu)=\int\beta_{\nu\mid\mu}\mathrm{d}\nu\quad\text{where}\quad\beta_{\nu\mid\mu}:=\log\frac{\mathrm{d}\nu}{\mathrm{d}\mu}.$$
The variance of $\beta_{\nu\mid\mu}$ under $\nu$, or varentropy, is
$$\mathrm{Varentropy}(\nu\mid\mu):=\mathrm{Var}_\nu(\beta_{\nu\mid\mu})=\int(\beta_{\nu\mid\mu}-\mathrm{Entropy}(\nu\mid\mu))^2\mathrm{d}\nu=\int\beta_{\nu\mid\mu}^2\mathrm{d}\nu-\mathrm{Entropy}(\nu\mid\mu)^2.$$
If $\mathrm{d}\nu=\frac{\mathbf{1}_A}{\mu(A)}\mathrm{d}\mu$ then $\beta_{\nu\mid\mu}$ is constant on $A=\mathrm{supp}(\nu)$ and $\mathrm{Varentropy}(\nu\mid\mu)=0$.
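For a simple Gaussian illustration of these two quantities, one can take for instance $\mu:=\mathcal{N}(0,1)$ and $\nu:=\mathcal{N}(m,1)$ with $m\in\mathbb{R}$: then
$$\beta_{\nu\mid\mu}(x)=mx-\frac{m^2}{2},\quad\text{hence}\quad\mathrm{Entropy}(\nu\mid\mu)=\frac{m^2}{2}\quad\text{and}\quad\mathrm{Varentropy}(\nu\mid\mu)=m^2,$$
since $\beta_{\nu\mid\mu}$ is affine with slope $m$ and $\nu$ has unit variance.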

A reverse Pinsker inequality. It reads
$$\mathrm{Entropy}(\nu\mid\mu)\leq\frac{1+\sqrt{\mathrm{Varentropy}(\nu\mid\mu)}}{1-\|\mu-\nu\|_{\mathrm{TV}}}.$$
In particular, for all $\varepsilon\in(0,1)$,
$$\|\mu-\nu\|_{\mathrm{TV}}\leq\varepsilon\quad\Longrightarrow\quad\mathrm{Entropy}(\nu\mid\mu)\leq\frac{1+\sqrt{\mathrm{Varentropy}(\nu\mid\mu)}}{1-\varepsilon}.$$
Let us give a proof. Let us set $\varepsilon:=\|\mu-\nu\|_{\mathrm{TV}}$. Consider the event
$$A:=\Big\{\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\geq\mathrm{e}^{\theta}\Big\}\quad\text{where}\quad\theta:=\mathrm{Entropy}(\nu\mid\mu)-\frac{\sqrt{\mathrm{Varentropy}(\nu\mid\mu)}}{1-\varepsilon}.$$
The Chebyshev inequality gives
$$\nu(A^c)=\nu(\beta_{\nu\mid\mu}<\theta)=\nu\Big(\beta_{\nu\mid\mu}-\mathrm{Entropy}(\nu\mid\mu)<-\frac{\sqrt{\mathrm{Varentropy}(\nu\mid\mu)}}{1-\varepsilon}\Big)\leq(1-\varepsilon)^2,$$
hence $\nu(A)\geq1-(1-\varepsilon)^2$. On the other hand, the definition of $A$ gives
$$\mu(A)\leq\mathrm{e}^{-\theta}\nu(A).$$
These two inequalities together give
$$\nu(A)-\mu(A)\geq(1-(1-\varepsilon)^2)(1-\mathrm{e}^{-\theta}).$$
Now since $\|\mu-\nu\|_{\mathrm{TV}}=\varepsilon$ and $1-(1-\varepsilon)^2=\varepsilon(2-\varepsilon)$, we get $\varepsilon\geq\varepsilon(2-\varepsilon)(1-\mathrm{e}^{-\theta})$, in other words
$$\theta\leq\log\frac{2-\varepsilon}{1-\varepsilon}=\log\Big(1+\frac{1}{1-\varepsilon}\Big).$$
Finally, the convexity inequality $\log(1+u)\leq u$ gives $\theta\leq\frac{1}{1-\varepsilon}$, in other words $\mathrm{Entropy}(\nu\mid\mu)\leq\frac{1+\sqrt{\mathrm{Varentropy}(\nu\mid\mu)}}{1-\varepsilon}$, as expected.

If $\mathrm{d}\nu=\frac{\mathbf{1}_A}{\mu(A)}\mathrm{d}\mu$ then $\mathrm{Entropy}(\nu\mid\mu)=-\log\mu(A)$, $\mathrm{Varentropy}(\nu\mid\mu)=0$, and $\|\mu-\nu\|_{\mathrm{TV}}=1-\mu(A)$, so the reverse Pinsker inequality above reads $-\mu(A)\log\mu(A)\leq1$.
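In the Gaussian example $\mu=\mathcal{N}(0,1)$ and $\nu=\mathcal{N}(m,1)$ considered above, with $m>0$, a classical computation gives $\|\mu-\nu\|_{\mathrm{TV}}=2\Phi(\frac{m}{2})-1$ where $\Phi$ is the standard Gaussian cumulative distribution function, and the reverse Pinsker inequality above reads
$$\frac{m^2}{2}\leq\frac{1+m}{2(1-\Phi(\frac{m}{2}))}.$$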

Fisher information and Poincaré inequality. The logarithmic Fisher information is defined by
$$\mathrm{Fisher}(\nu\mid\mu):=\int|\nabla\beta_{\nu\mid\mu}|^2\mathrm{d}\nu=\int\frac{|\nabla\frac{\mathrm{d}\nu}{\mathrm{d}\mu}|^2}{\frac{\mathrm{d}\nu}{\mathrm{d}\mu}}\mathrm{d}\mu.$$
Also we have, denoting $c_{\mathrm{PI}}(\nu)$ the Poincaré inequality constant of $\nu$:
$$\mathrm{Varentropy}(\nu\mid\mu)=\mathrm{Var}_\nu(\beta_{\nu\mid\mu})\leq c_{\mathrm{PI}}(\nu)\int|\nabla\beta_{\nu\mid\mu}|^2\mathrm{d}\nu=c_{\mathrm{PI}}(\nu)\mathrm{Fisher}(\nu\mid\mu).$$
Combined with the reverse Pinsker inequality above, this gives
$$\mathrm{Entropy}(\nu\mid\mu)\leq\frac{1+\sqrt{c_{\mathrm{PI}}(\nu)\mathrm{Fisher}(\nu\mid\mu)}}{1-\|\mu-\nu\|_{\mathrm{TV}}}.$$
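Still in the Gaussian example $\mu=\mathcal{N}(0,1)$ and $\nu=\mathcal{N}(m,1)$, the Gaussian Poincaré inequality gives $c_{\mathrm{PI}}(\nu)=1$, while $\nabla\beta_{\nu\mid\mu}=m$ gives $\mathrm{Fisher}(\nu\mid\mu)=m^2$, so that the bound $\mathrm{Varentropy}(\nu\mid\mu)\leq c_{\mathrm{PI}}(\nu)\mathrm{Fisher}(\nu\mid\mu)$ is in fact an equality in this case.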

Note. There are plenty of alternate or reverse Pinsker inequalities. For instance the Bretagnolle-Huber inequality, used for hypothesis testing in Statistics, reads
$$\|\mu-\nu\|_{\mathrm{TV}}^2\leq1-\mathrm{e}^{-\mathrm{Entropy}(\nu\mid\mu)},$$
and has the advantage over Pinsker of being informative even when $\mathrm{Entropy}(\nu\mid\mu)\geq2$.
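On the indicator example $\mathrm{d}\nu=\frac{\mathbf{1}_A}{\mu(A)}\mathrm{d}\mu$ used above, for instance, the Bretagnolle-Huber inequality reads $(1-\mu(A))^2\leq1-\mu(A)$ and stays informative as $\mu(A)\to0$, whereas the Pinsker inequality only gives $\|\mu-\nu\|_{\mathrm{TV}}\leq\sqrt{-\frac{1}{2}\log\mu(A)}$, which is useless as soon as $\mu(A)\leq\mathrm{e}^{-2}$.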


Justin Salez (1984 – )