{"id":12095,"date":"2020-01-22T15:02:44","date_gmt":"2020-01-22T14:02:44","guid":{"rendered":"http:\/\/djalil.chafai.net\/blog\/?p=12095"},"modified":"2022-03-18T16:58:46","modified_gmt":"2022-03-18T15:58:46","slug":"about-the-hellinger-distance","status":"publish","type":"post","link":"https:\/\/djalil.chafai.net\/blog\/2020\/01\/22\/about-the-hellinger-distance\/","title":{"rendered":"About the Hellinger distance"},"content":{"rendered":"<figure id=\"attachment_12096\" aria-describedby=\"caption-attachment-12096\" style=\"width: 247px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/Ernst_Hellinger\"><img loading=\"lazy\" class=\"wp-image-12096 size-medium\" src=\"http:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2020\/01\/Hellinger-247x300.jpeg\" alt=\"Ernst David Hellinger (1883 - 1950)\" width=\"247\" height=\"300\" srcset=\"https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2020\/01\/Hellinger-247x300.jpeg 247w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2020\/01\/Hellinger-123x150.jpeg 123w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2020\/01\/Hellinger.jpeg 268w\" sizes=\"(max-width: 247px) 100vw, 247px\" \/><\/a><figcaption id=\"caption-attachment-12096\" class=\"wp-caption-text\">Ernst David Hellinger (1883 - 1950)<\/figcaption><\/figure>\n<p style=\"text-align: justify;\">This tiny post is devoted to the Hellinger distance and affinity.<\/p>\n<p style=\"text-align: justify;\"><b>Hellinger.<\/b> Let \\( {\\mu} \\) and \\( {\\nu} \\) be probability measures with respective densities \\( {f} \\) and \\( {g} \\) with respect to the Lebesgue measure \\( {\\lambda} \\) on \\( {\\mathbb{R}^d} \\). Their <b>Hellinger distance<\/b> is<\/p>\n<p style=\"text-align: center;\">\\[ \\mathrm{H}(\\mu,\\nu) ={\\Vert\\sqrt{f}-\\sqrt{g}\\Vert}_{\\mathrm{L}^2(\\lambda)} =\\Bigr(\\int(\\sqrt{f}-\\sqrt{g})^2\\mathrm{d}\\lambda\\Bigr)^{1\/2}. 
\\]<\/p>\n<p style=\"text-align: justify;\">This is well defined since \\( {\\sqrt{f}} \\) and \\( {\\sqrt{g}} \\) belong to \\( {\\mathrm{L}^2(\\lambda)} \\). The <b>Hellinger affinity<\/b> is<\/p>\n<p style=\"text-align: center;\">\\[ \\mathrm{A}(\\mu,\\nu) =\\int\\sqrt{fg}\\mathrm{d}\\lambda, \\quad \\mathrm{H}(\\mu,\\nu)^2 =2-2A(\\mu,\\nu). \\]<\/p>\n<p style=\"text-align: justify;\">This gives \\( {H(\\mu,\\nu)^2\\in[0,2]} \\), \\( {A(\\mu,\\nu)\\in[0,1]} \\), and the <b>tensor product formula<\/b><\/p>\n<p style=\"text-align: center;\">\\[ \\mathrm{H}(\\mu^{\\otimes n},\\nu^{\\otimes n})^2 =2-2A(\\mu^{\\otimes n},\\nu^{\\otimes n}) =2-2A(\\mu,\\nu)^n =2-2\\left(1-\\frac{\\mathrm{H}(\\mu,\\nu)^2}{2}\\right)^n. \\]<\/p>\n<p style=\"text-align: justify;\">Note that \\( {\\mathrm{H}(\\mu,\\nu)^2=2} \\) iff \\( {\\mu} \\) and \\( {\\nu} \\) have disjoint supports.<\/p>\n<p style=\"text-align: justify;\">Note that if \\( {\\mu\\neq\\nu} \\) then \\( {\\lim_{n\\rightarrow\\infty}\\mathrm{H}(\\mu^{\\otimes n},\\nu^{\\otimes n})=2} \\), a high dimensional phenomenon.<\/p>\n<p style=\"text-align: justify;\">We could also take the following polarized definition<\/p>\n<p style=\"text-align: center;\">\\[ \\mathrm{Hellinger}^2(\\mu,\\nu)=2-2\\int\\sqrt{\\frac{\\mathrm{d}\\mu}{\\mathrm{d}\\nu}}\\mathrm{d}\\nu=2-2\\int\\sqrt{\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}}\\mathrm{d}\\mu \\]<\/p>\n<p style=\"text-align: justify;\">which reveals a freeness with respect to the reference measure. 
This also shows that the squared Hellinger distance is a \\( {\\Phi} \\)-entropy, namely<\/p>\n<p style=\"text-align: center;\">\\[ \\mathrm{Hellinger}^2(\\mu,\\nu) =\\int\\Phi(f)\\mathrm{d}\\mu-\\Phi\\Bigr(\\int f\\mathrm{d}\\mu\\Bigr) \\]<\/p>\n<p style=\"text-align: justify;\">where \\( {\\Phi:=-2\\sqrt{\\bullet}} \\) and \\( {f:=\\mathrm{d}\\nu\/\\mathrm{d}\\mu} \\).<\/p>\n<p style=\"text-align: justify;\">The notions of Hellinger distance and affinity pass to discrete distributions by replacing the Lebesgue measure \\( {\\lambda} \\) by the counting measure. The Hellinger distance is a special case of the \\( {\\mathrm{L}^p} \\) version \\( {\\Vert f^{1\/p}-g^{1\/p}\\Vert_{\\mathrm{L}^p(\\lambda)}} \\) available for arbitrary \\( {p\\geq1} \\). This is useful in asymptotic statistics, and we refer to the textbooks listed below.<\/p>\n<p style=\"text-align: justify;\"><b>Relation to total variation distance.<\/b> The Hellinger distance is topologically equivalent and metrically comparable to the total variation distance, in the sense that<\/p>\n<p style=\"text-align: center;\">\\[ \\mathrm{H}^2(\\mu,\\nu) \\leq2\\left\\Vert\\mu-\\nu\\right\\Vert_{\\mathrm{TV}} \\leq\\mathrm{H}(\\mu,\\nu)\\sqrt{4-\\mathrm{H}(\\mu,\\nu)^2} \\leq2\\mathrm{H}(\\mu,\\nu) \\]<\/p>\n<p style=\"text-align: justify;\">where<\/p>\n<p style=\"text-align: center;\">\\[ \\left\\Vert\\mu-\\nu\\right\\Vert_{\\mathrm{TV}} =\\sup_A|\\mu(A)-\\nu(A)| =\\frac{1}{2}\\int|f-g|\\mathrm{d}\\lambda. 
\\]<\/p>\n<p style=\"text-align: justify;\">Indeed, the first inequality comes from the following elementary observation<\/p>\n<p style=\"text-align: center;\">\\[ (\\sqrt{a}-\\sqrt{b})^2 =a+b-2\\sqrt{ab} \\leq a+b-2(a\\wedge b) =|a-b|, \\]<\/p>\n<p style=\"text-align: justify;\">valid for all \\( {a,b\\geq0} \\), while the second inequality comes from<\/p>\n<p style=\"text-align: center;\">\\[ |a-b|=|\\sqrt{a}^2-\\sqrt{b}^2|=|\\sqrt{a}-\\sqrt{b}|(\\sqrt{a}+\\sqrt{b}) \\]<\/p>\n<p style=\"text-align: justify;\">whiche gives, thanks to the Cauchy-Schwarz inequality,<\/p>\n<p style=\"text-align: center;\">\\[ \\int|f-g|\\mathrm{d}\\lambda \\leq\\mathrm{H}(\\mu,\\nu)\\sqrt{\\int(\\sqrt{f}+\\sqrt{g})^2\\mathrm{d}\\lambda} =\\mathrm{H}(\\mu,\\nu)\\sqrt{2+2A(\\mu,\\nu)}. \\]<\/p>\n<p style=\"text-align: justify;\"><b>Gaussian explicit formula.<\/b> The Hellinger distance (or affinity) between two Gaussian distributions can be computed explicitly, just like the <a href= \"\/blog\/2010\/04\/30\/wasserstein-distance-between-two-gaussians\/\">square Wasserstein distance<\/a> and the Kullback-Leibler divergence or relative entropy. Namely<\/p>\n<p style=\"text-align: center;\">\\[ \\mathrm{A}(\\mathcal{N}(m_1,\\sigma_1^2),\\mathcal{N}(m_2,\\sigma_2^2)) =\\sqrt{2\\frac{\\sigma_1\\sigma_2}{\\sigma_1^2+\\sigma_2^2}} \\exp\\Bigr(-\\frac{(m_1-m_2)^2}{4(\\sigma_1^2+\\sigma_2^2)}\\Bigr), \\]<\/p>\n<p style=\"text-align: justify;\">equal to \\( {1} \\) iff \\( {(m_1,\\sigma_1)=(m_2,\\sigma_2)} \\). By using the tensor product formula, we have also<\/p>\n<p style=\"text-align: center;\">\\[ \\mathrm{A}(\\mathcal{N}(m_1,\\sigma_1^2)^n,\\mathcal{N}(m_2,\\sigma_2^2)^n) =\\Bigr(2\\frac{\\sigma_1\\sigma_2}{\\sigma_1^2+\\sigma_2^2}\\Bigr)^{n\/2} \\exp\\Bigr(-n\\frac{(m_1-m_2)^2}{4(\\sigma_1^2+\\sigma_2^2)}\\Bigr). 
\\]<\/p>\n<p style=\"text-align: justify;\">Here is a general ``matrix'' formula for Gaussians on \\( {\\mathbb{R}^d} \\), \\( {d\\geq1} \\), with \\( {\\Delta m=m_2-m_1} \\),<\/p>\n<p style=\"text-align: center;\">\\[ \\mathrm{A}(\\mathcal{N}(m_1,\\Sigma_1),\\mathcal{N}(m_2,\\Sigma_2)) =\\frac{\\det(\\Sigma_1\\Sigma_2)^{1\/4}}{\\det(\\frac{\\Sigma_1+\\Sigma_2}{2})^{1\/2}} \\exp\\Bigr(-\\frac{\\langle\\Delta m,(\\Sigma_1+\\Sigma_2)^{-1}\\Delta m)\\rangle}{4}\\Bigr), \\]<\/p>\n<p style=\"text-align: justify;\">see for instance Pardo's book, page 51, for a computation.<\/p>\n<p style=\"text-align: justify;\">The Hellinger affinity is also known as the <b>Bhattacharyya coefficient<\/b>, and enters the definition of the <b>Bhattacharyya distance<\/b> \\( {(\\mu,\\nu)\\mapsto-\\log\\mathrm{A}(\\mu,\\nu)} \\).<\/p>\n<p style=\"text-align: justify;\"><b>Application to long time behavior of Ornstein-Uhlenbeck.<\/b> Let \\( {{(B_t)}_{t\\geq0}} \\) be an \\( {n} \\)-dimensional standard Brownian motion and let \\( {{(X^x_t)}_{t\\geq0}} \\) be the Ornstein-Uhlenbeck process solution of the stochastic differential equation<\/p>\n<p style=\"text-align: center;\">\\[ X_0=x,\\quad \\mathrm{d}X^x_t=\\sqrt{2}\\mathrm{d}B_t-X^x_t\\mathrm{d}t \\]<\/p>\n<p style=\"text-align: justify;\">where \\( {x\\in\\mathbb{R}^n} \\). By plugging this equation into the identity \\( {\\mathrm{d}(\\mathrm{e}^tX^x_t)=\\mathrm{e}^t\\mathrm{d}X^x_t+\\mathrm{e}^tX^x_t\\mathrm{d}t} \\) we get the Mehler formula (the variance comes from the Wiener integral)<\/p>\n<p style=\"text-align: center;\">\\[ X^x_t=x\\mathrm{e}^{-t}+\\sqrt{2}\\int_0^t\\mathrm{e}^{s-t}\\mathrm{d}B_s \\sim \\mathcal{N}(x\\mathrm{e}^{-t},(1-\\mathrm{e}^{-2t})I_n) \\underset{t\\rightarrow\\infty}{\\longrightarrow} \\mathcal{N}(0,I_n). 
\\]<\/p>\n<p style=\"text-align: justify;\">It follows in particular that for all \\( {x,y\\in\\mathbb{R}^n} \\) an \\( {t&gt;0} \\)<\/p>\n<p style=\"text-align: center;\">\\[ \\frac{1}{2}\\mathrm{H}^2(\\mathrm{Law}(X^x_t),\\mathrm{Law}(X^y_t)) =1-\\exp\\Bigr(-\\frac{|x-y|^2\\mathrm{e}^{-2t}}{1-\\mathrm{e}^{-2t}}\\Bigr). \\]<\/p>\n<p style=\"text-align: justify;\">Moreover, denoting \\( {\\mu_t=\\mathrm{Law}(X^x_t)} \\) and \\( {\\mu_\\infty=\\mathcal{N}(0,I_n)} \\), it follows that<\/p>\n<p style=\"text-align: center;\">\\[ \\mathrm{H}(\\mu_t,\\mu_\\infty)^2 =2-2\\Bigr(2\\frac{\\sqrt{1-\\mathrm{e}^{-2t}}}{2-\\mathrm{e}^{-2t}}\\Bigr)^{1\/2} \\exp\\Bigr(-\\frac{|x|^2\\mathrm{e}^{-2t}}{4(2-\\mathrm{e}^{-2t})}\\Bigr). \\]<\/p>\n<p style=\"text-align: justify;\">This quantity tends to \\( {0} \\) as \\( {t\\rightarrow\\infty} \\). If \\( {|x|^2=x_1^2+\\cdots+x_n^2\\sim cn} \\) then this happens, as \\( {n} \\) is large, near the critical value \\( {t=\\frac{1}{2}\\log(n)} \\), for which \\( {\\mathrm{e}^{-2t}=1\/n} \\). 
More information about the cutoff phenomenon for Ornstein-Uhlenbeck and other diffusions is available in the papers listed below.<\/p>\n<p style=\"text-align: justify;\"><b>Further reading<\/b><\/p>\n<ul>\n<li>David Pollard<br \/> <a href= \"https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=1873379\">A user's guide to measure theoretic probability<\/a><br \/> Cambridge University Press (2002)<\/li>\n<li>B\u00e9atrice Lachaud<br \/> <a href= \"https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=2203823\">Cut-off and hitting times of a sample of Ornstein-Uhlenbeck processes and its average<\/a><br \/> Journal of Applied Probability 42(4) 1069-1080 (2005)<\/li>\n<li>Laurent Saloff-Coste<br \/> <a href= \"https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=1306030\">Precise estimates on the rate at which certain diffusions tend to equilibrium<\/a><br \/> Mathematische Zeitschrift 217(4) 641-677 (1994)<\/li>\n<li>Aad van der Vaart<br \/> <a href= \"https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=1652247\">Asymptotic statistics<\/a><br \/> Cambridge University Press (1998)<\/li>\n<li>Ildar Abdullovich Ibragimov and Rafail Zalmanovich Khasminskii<br \/> <a href= \"https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=620321\">Statistical estimation: Asymptotic theory<\/a><br \/> Springer (1981)<\/li>\n<li>Leandro Pardo<br \/> <a href= \"https:\/\/mathscinet.ams.org\/mathscinet-getitem?mr=2183173\">Statistical Inference Based on Divergence Measures<\/a><br \/> Chapman &amp; Hall (2006)<\/li>\n<li>Djalil Chafa\u00ef and Florent Malrieu<br \/> <a href=\"https:\/\/hal.archives-ouvertes.fr\/hal-01897577\">Recueil de mod\u00e8les al\u00e9atoires<\/a><br \/> Springer (2015)<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>This tiny post is devoted to the Hellinger distance and affinity. Hellinger. 
Let \\( {\\mu} \\) and \\( {\\nu} \\) be probability measures with respective&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/djalil.chafai.net\/blog\/2020\/01\/22\/about-the-hellinger-distance\/\">Continue reading<span class=\"screen-reader-text\">About the Hellinger distance<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":4698},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/12095"}],"collection":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/comments?post=12095"}],"version-history":[{"count":20,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/12095\/revisions"}],"predecessor-version":[{"id":15818,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/12095\/revisions\/15818"}],"wp:attachment":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/media?parent=12095"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/categories?post=12095"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/tags?post=12095"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}