{"id":18270,"date":"2023-08-24T16:00:15","date_gmt":"2023-08-24T14:00:15","guid":{"rendered":"https:\/\/djalil.chafai.net\/blog\/?p=18270"},"modified":"2024-02-14T10:01:42","modified_gmt":"2024-02-14T09:01:42","slug":"a-reverse-pinsker-inequality","status":"publish","type":"post","link":"https:\/\/djalil.chafai.net\/blog\/2023\/08\/24\/a-reverse-pinsker-inequality\/","title":{"rendered":"A reverse Pinsker inequality"},"content":{"rendered":"<figure id=\"attachment_18326\" aria-describedby=\"caption-attachment-18326\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/Solomon_Kullback\"><img loading=\"lazy\" class=\"size-full wp-image-18326\" src=\"http:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2023\/08\/Kullback.jpg\" alt=\"Photo of Solomon Kullback (1907 \u2013 1994)\" width=\"300\" height=\"353\" srcset=\"https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2023\/08\/Kullback.jpg 300w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2023\/08\/Kullback-255x300.jpg 255w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-18326\" class=\"wp-caption-text\">Solomon Kullback (1907 \u2013 1994)<\/figcaption><\/figure>\n<p style=\"text-align: justify;\">This tiny post is devoted to a cute reverse Pinsker inequality, extracted from a recent work of my colleague Justin Salez. 
This reverse Pinsker inequality is at the heart of his investigation of the cutoff phenomenon, in total variation distance, for Markov chains, which we do not discuss here.<\/p>\n<p style=\"text-align: justify;\"><strong>Pinsker inequality.<\/strong> If $\\mu$ and $\\nu$ are probability measures on the same probability space, it reads<br \/>\n\\[<br \/>\n2\\|\\mu-\\nu\\|_{\\mathrm{TV}}^2\\leq\\mathrm{Entropy}(\\nu\\mid\\mu)<br \/>\n\\] where<br \/>\n\\[<br \/>\n\\mathrm{Entropy}(\\nu\\mid\\mu):=\\int\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}\\log\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}\\mathrm{d}\\mu<br \/>\n\\] is the Kullback-Leibler divergence or relative entropy of $\\nu$ with respect to $\\mu$, and where<br \/>\n\\[<br \/>\n\\|\\mu-\\nu\\|_{\\mathrm{TV}}:=\\sup_A|\\mu(A)-\\nu(A)|<br \/>\n\\] is the total variation distance between $\\mu$ and $\\nu$. By convention $\\mathrm{Entropy}(\\nu\\mid\\mu)=+\\infty$ if $\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}$ does not exist, and in this case the inequality is useless. The constant $2$ is optimal, as one can check on Bernoulli distributions. The inequality was first proved by Mark S. Pinsker (1925 \u2013 2003) with another constant, and later on, independently, by Solomon Kullback (1907 \u2013 1994) and Imre Csisz\u00e1r (1938 \u2013 ). Let us now give a quick proof, due to Johannes H. B. Kemperman (1924 \u2013 2011). 
Using $\\|\\mathrm{1}_A-\\mathrm{1}_{A^c}\\|_\\infty\\leq1$, the $L^1-L^\\infty$ duality, the convexity inequality \\[ 3(u-1)^2\\leq(2u+4)(u\\log u-u+1),\\quad u\\geq0,\\] and the Cauchy-Schwarz inequality, we get, with $\\alpha:=\\alpha_{\\nu\\mid\\mu}:=\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}$, \\begin{align*}<br \/>\n2\\|\\mu-\\nu\\|_{\\mathrm{TV}}<br \/>\n&amp;\\leq\\sup_{\\|f\\|_\\infty\\leq1}\\int f\\mathrm{d}(\\mu-\\nu)\\\\<br \/>\n&amp;=\\int|\\alpha-1|\\mathrm{d}\\mu\\\\<br \/>\n&amp;\\leq\\int\\sqrt{\\frac{2\\alpha+4}{3}}\\sqrt{\\alpha\\log\\alpha-\\alpha+1}\\mathrm{d}\\mu\\\\<br \/>\n&amp;\\leq\\sqrt{\\int\\frac{2\\alpha+4}{3}\\mathrm{d}\\mu}\\sqrt{\\int(\\alpha\\log\\alpha-\\alpha+1)\\mathrm{d}\\mu}\\\\<br \/>\n&amp;=\\sqrt{2\\int\\alpha\\log\\alpha\\mathrm{d}\\mu}\\\\<br \/>\n&amp;=\\sqrt{2\\mathrm{Entropy}(\\nu\\mid\\mu)},<br \/>\n\\end{align*} where the penultimate equality uses $\\int\\alpha\\mathrm{d}\\mu=1$ twice. Note that $\\left\\|\\cdot\\right\\|_{\\mathrm{TV}}$ takes its values in $[0,1]$ while $\\mathrm{Entropy}$ takes its values in $[0,+\\infty]$. Indeed if $\\mathrm{d}\\nu=\\frac{\\mathrm{1}_A}{\\mu(A)}\\mathrm{d}\\mu$, then $\\|\\mu-\\nu\\|_{\\mathrm{TV}}=1-\\mu(A)$ while $\\mathrm{Entropy}(\\nu\\mid\\mu)=-\\log\\mu(A)\\to+\\infty$ as $\\mu(A)\\to0$. 
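These quantities are elementary to check numerically. As an illustration (an addition, not part of the original argument), the following Python sketch verifies the convexity inequality used in the proof and the Pinsker bound on a small grid of Bernoulli distributions; the parameter values are arbitrary illustrative choices.

```python
# Numerical sanity checks (illustration only; values are arbitrary choices).
import math

def tv_bernoulli(p, q):
    # Total variation distance between Bernoulli(p) and Bernoulli(q).
    return abs(p - q)

def kl_bernoulli(p, q):
    # Relative entropy Entropy(Bernoulli(p) | Bernoulli(q)), with 0 < q < 1.
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(p, q) + term(1 - p, 1 - q)

# Kemperman's convexity inequality: 3(u-1)^2 <= (2u+4)(u log u - u + 1).
for i in range(1, 500):
    u = 0.01 * i
    assert 3 * (u - 1) ** 2 <= (2 * u + 4) * (u * math.log(u) - u + 1) + 1e-12

# Pinsker: 2 TV^2 <= Entropy, with nu = Bernoulli(p) and mu = Bernoulli(q).
for p in (0.1, 0.3, 0.5, 0.7):
    for q in (0.2, 0.5, 0.8):
        assert 2 * tv_bernoulli(p, q) ** 2 <= kl_bernoulli(p, q) + 1e-12

# Sharpness of the constant 2: nearby Bernoulli laws nearly saturate it.
ratio = kl_bernoulli(0.5, 0.501) / (2 * tv_bernoulli(0.5, 0.501) ** 2)
assert abs(ratio - 1) < 1e-2
```

The last check illustrates the optimality of the constant $2$: for nearby Bernoulli distributions, the ratio of the two sides of the Pinsker inequality is close to $1$.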
Hence we cannot expect a direct reverse Pinsker inequality, bounding the relative entropy by the total variation distance alone.<\/p>\n<p style=\"text-align: justify;\"><strong>Varentropy.<\/strong> The relative entropy can be seen as the mean of a logarithm:<br \/>\n\\[<br \/>\n\\mathrm{Entropy}(\\nu\\mid\\mu)=\\int\\beta_{\\nu\\mid\\mu}\\mathrm{d}\\nu<br \/>\n\\quad\\text{where}\\quad<br \/>\n\\beta_{\\nu\\mid\\mu}:=\\log\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}.<br \/>\n\\] The variance of $\\beta_{\\nu\\mid\\mu}$ under $\\nu$, known as the varentropy, is<br \/>\n\\begin{align*}<br \/>\n\\mathrm{Varentropy}(\\nu\\mid\\mu)<br \/>\n&amp;:=<br \/>\n\\mathrm{Var}_\\nu(\\beta_{\\nu\\mid\\mu})\\\\<br \/>\n&amp;=\\int(\\beta_{\\nu\\mid\\mu}-\\mathrm{Entropy}(\\nu\\mid\\mu))^2\\mathrm{d}\\nu\\\\<br \/>\n&amp;=\\int\\beta_{\\nu\\mid\\mu}^2\\mathrm{d}\\nu-\\mathrm{Entropy}(\\nu\\mid\\mu)^2.<br \/>\n\\end{align*} If $\\mathrm{d}\\nu=\\frac{\\mathrm{1}_A}{\\mu(A)}\\mathrm{d}\\mu$ then $\\beta_{\\nu\\mid\\mu}$ is constant on $A=\\mathrm{supp}(\\nu)$ and $\\mathrm{Varentropy}(\\nu\\mid\\mu)=0$.<\/p>\n<p style=\"text-align: justify;\"><strong>A reverse Pinsker inequality.<\/strong> It reads<br \/>\n\\[<br \/>\n\\mathrm{Entropy}(\\nu\\mid\\mu)<br \/>\n\\leq\\frac{1+\\sqrt{\\mathrm{Varentropy}(\\nu\\mid\\mu)}}{1-\\|\\mu-\\nu\\|_{\\mathrm{TV}}}.<br \/>\n\\] In particular, for all $\\varepsilon\\in(0,1)$,<br \/>\n\\[<br \/>\n\\|\\mu-\\nu\\|_{\\mathrm{TV}}\\leq\\varepsilon<br \/>\n\\quad\\Longrightarrow\\quad<br \/>\n\\mathrm{Entropy}(\\nu\\mid\\mu)<br \/>\n\\leq\\frac{1+\\sqrt{\\mathrm{Varentropy}(\\nu\\mid\\mu)}}{1-\\varepsilon}.<br \/>\n\\] Let us give a proof. Set $\\varepsilon:=\\|\\mu-\\nu\\|_{\\mathrm{TV}}$. 
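Before the proof, here is a quick numerical sanity check of the stated inequality (an illustrative Python sketch, not part of the proof; the probability vectors below are arbitrary choices).

```python
# Numerical sanity check of the reverse Pinsker inequality on a finite set
# (illustration only; the probability vectors below are arbitrary choices).
import math

def reverse_pinsker_gap(nu, mu):
    # nu, mu: probability vectors on a common finite set, with mu > 0
    # everywhere so that dnu/dmu exists. Returns RHS - LHS, expected >= 0.
    tv = sum(abs(m - n) for m, n in zip(mu, nu)) / 2        # total variation
    support = [(n, math.log(n / m)) for n, m in zip(nu, mu) if n > 0]
    ent = sum(n * b for n, b in support)                    # relative entropy
    varent = sum(n * (b - ent) ** 2 for n, b in support)    # varentropy
    return (1 + math.sqrt(varent)) / (1 - tv) - ent

assert reverse_pinsker_gap([0.5, 0.3, 0.2], [0.2, 0.3, 0.5]) > 0
assert reverse_pinsker_gap([0.9, 0.1, 0.0], [1/3, 1/3, 1/3]) > 0
# Indicator case d(nu) = 1_A d(mu) / mu(A): varentropy vanishes and the
# inequality reduces to -mu(A) log mu(A) <= 1.
assert reverse_pinsker_gap([0.5, 0.5, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]) > 0
```

We now proceed with the proof.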
Consider the event<br \/>\n\\[<br \/>\nA:=\\Bigl\\{\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}\\geq\\mathrm{e}^\\theta\\Bigr\\}<br \/>\n\\quad\\text{where}\\quad<br \/>\n\\theta:=\\mathrm{Entropy}(\\nu\\mid\\mu)-\\frac{\\sqrt{\\mathrm{Varentropy}(\\nu\\mid\\mu)}}{1-\\varepsilon}.<br \/>\n\\] We may assume that $\\varepsilon\\in(0,1)$ and $\\theta>0$, the desired bound being trivial otherwise. The Chebyshev inequality gives \\begin{align*}<br \/>\n\\nu(A^c)<br \/>\n&amp;=\\nu(\\beta_{\\nu\\mid\\mu}&lt;\\theta)\\\\<br \/>\n&amp;=\\nu\\Bigl(\\beta_{\\nu\\mid\\mu}-\\mathrm{Entropy}(\\nu\\mid\\mu)&lt;-\\frac{\\sqrt{\\mathrm{Varentropy}(\\nu\\mid\\mu)}}{1-\\varepsilon}\\Bigr)\\\\<br \/>\n&amp;\\leq(1-\\varepsilon)^2,<br \/>\n\\end{align*} hence $\\nu(A)\\geq1-(1-\\varepsilon)^2$. On the other hand, the definition of $A$ gives<br \/>\n\\[<br \/>\n\\mu(A)\\leq\\mathrm{e}^{-\\theta}\\nu(A).<br \/>\n\\] These two inequalities together give<br \/>\n\\[<br \/>\n\\nu(A)-\\mu(A)\\geq(1-(1-\\varepsilon)^2)(1-\\mathrm{e}^{-\\theta}).<br \/>\n\\] Now since $\\|\\mu-\\nu\\|_{\\mathrm{TV}}=\\varepsilon$, we get $\\varepsilon\\geq(1-(1-\\varepsilon)^2)(1-\\mathrm{e}^{-\\theta})$, in other words<br \/>\n\\[<br \/>\n\\theta\\leq\\log\\Bigl(1+\\frac{1}{1-\\varepsilon}\\Bigr).<br \/>\n\\] Finally, the elementary inequality $\\log(1+u)\\leq u$ gives $\\theta\\leq1\/(1-\\varepsilon)$, as expected.<\/p>\n<p style=\"text-align: justify;\">If $\\mathrm{d}\\nu=\\frac{\\mathrm{1}_A}{\\mu(A)}\\mathrm{d}\\mu$ then the reverse Pinsker inequality above reads $-\\mu(A)\\log\\mu(A)\\leq1$.<\/p>\n<p style=\"text-align: justify;\"><strong>Fisher information and Poincar\u00e9 inequality.<\/strong> The logarithmic Fisher information is defined by<br \/>\n\\[<br \/>\n\\mathrm{Fisher}(\\nu\\mid\\mu)<br \/>\n:=\\int|\\nabla\\beta_{\\nu\\mid\\mu}|^2\\mathrm{d}\\nu<br \/>\n=\\int\\frac{|\\nabla\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}|^2}{\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}}\\mathrm{d}\\mu.<br \/>\n\\] Also, denoting by $c_{\\mathrm{PI}}(\\nu)$ the Poincar\u00e9 constant of $\\nu$, we have:<br 
\/>\n\\[<br \/>\n\\mathrm{Varentropy}(\\nu\\mid\\mu)<br \/>\n=<br \/>\n\\mathrm{Var}_\\nu(\\beta_{\\nu\\mid\\mu})<br \/>\n\\leq c_{\\mathrm{PI}}(\\nu)\\int|\\nabla\\beta_{\\nu\\mid\\mu}|^2\\mathrm{d}\\nu<br \/>\n=c_{\\mathrm{PI}}(\\nu)\\mathrm{Fisher}(\\nu\\mid\\mu).<br \/>\n\\] Combined with the reverse Pinsker inequality above, this gives<br \/>\n\\[<br \/>\n\\mathrm{Entropy}(\\nu\\mid\\mu)<br \/>\n\\leq\\frac{1+\\sqrt{c_{\\mathrm{PI}}(\\nu)\\mathrm{Fisher}(\\nu\\mid\\mu)}}{1-\\|\\mu-\\nu\\|_{\\mathrm{TV}}}.<br \/>\n\\]<\/p>\n<p style=\"text-align: justify;\"><strong>Note.<\/strong> There are plenty of alternative or reverse Pinsker inequalities. For instance, the Bretagnolle-Huber inequality, used for hypothesis testing in Statistics, reads \\[\\|\\mu-\\nu\\|_{\\mathrm{TV}}^2\\leq1-\\mathrm{e}^{-\\mathrm{Entropy}(\\nu\\mid\\mu)},\\] and has the advantage over Pinsker of being informative even when $\\mathrm{Entropy}(\\nu\\mid\\mu)\\geq2$, a regime in which the Pinsker bound $\\|\\mu-\\nu\\|_{\\mathrm{TV}}\\leq\\sqrt{\\mathrm{Entropy}(\\nu\\mid\\mu)\/2}$ is vacuous since $\\|\\mu-\\nu\\|_{\\mathrm{TV}}\\leq1$ always.<\/p>\n<p style=\"text-align: justify;\"><strong>Further reading.<\/strong><\/p>\n<ul>\n<li>Johannes H. B. 
Kemperman.<br \/>\n<a href=\"https:\/\/doi.org\/10.1007\/BFB0079123\">On the optimum rate of transmitting information<\/a> (in particular pages 2174\u20132175)<br \/>\nAnnals of Mathematical Statistics, 40:2156\u20132177 (1969)<\/li>\n<li>Justin Salez.<br \/>\n<a href=\"https:\/\/doi.org\/10.4171\/jems\/1348\">Cutoff for non-negatively curved Markov chains<\/a> (in particular Lemma 8)<br \/>\nTo appear in Journal of the European Mathematical Society (2023)<br \/>\nSee also <a href=\"https:\/\/arxiv.org\/abs\/2102.05597\">arXiv:2102.05597<\/a><\/li>\n<\/ul>\n<figure id=\"attachment_18271\" aria-describedby=\"caption-attachment-18271\" style=\"width: 130px\" class=\"wp-caption aligncenter\"><a href=\"\/scripts\/search.php?q=Justin+Salez+mathematics\"><img loading=\"lazy\" class=\"size-full wp-image-18271\" src=\"http:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2023\/08\/Justin_Salez.jpg\" alt=\"Photo of Justin Salez (1984 \u2013 )\" width=\"130\" height=\"154\" \/><\/a><figcaption id=\"caption-attachment-18271\" class=\"wp-caption-text\">Justin Salez (1984 \u2013 )<\/figcaption><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>This tiny post is devoted to a cute reverse Pinsker inequality, extracted from a recent work of my colleague Justin Salez. 
This reverse Pinsker inequality&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/djalil.chafai.net\/blog\/2023\/08\/24\/a-reverse-pinsker-inequality\/\">Continue reading<span class=\"screen-reader-text\">A reverse Pinsker inequality<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":1102},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/18270"}],"collection":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/comments?post=18270"}],"version-history":[{"count":114,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/18270\/revisions"}],"predecessor-version":[{"id":19226,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/18270\/revisions\/19226"}],"wp:attachment":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/media?parent=18270"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/categories?post=18270"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/tags?post=18270"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}