{"id":14799,"date":"2021-02-22T14:03:38","date_gmt":"2021-02-22T13:03:38","guid":{"rendered":"https:\/\/djalil.chafai.net\/blog\/?p=14799"},"modified":"2022-04-24T19:58:21","modified_gmt":"2022-04-24T17:58:21","slug":"fisher-information-between-two-gaussians","status":"publish","type":"post","link":"https:\/\/djalil.chafai.net\/blog\/2021\/02\/22\/fisher-information-between-two-gaussians\/","title":{"rendered":"Fisher information between two Gaussians"},"content":{"rendered":"<figure id=\"attachment_14800\" aria-describedby=\"caption-attachment-14800\" style=\"width: 230px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/Ronald_Fisher\"><img loading=\"lazy\" class=\"wp-image-14800 size-medium\" src=\"http:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2021\/02\/fisher1951-230x300.jpg\" alt=\"Photo of Ronald Aylmer Fisher\" width=\"230\" height=\"300\" srcset=\"https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2021\/02\/fisher1951-230x300.jpg 230w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2021\/02\/fisher1951-788x1030.jpg 788w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2021\/02\/fisher1951-768x1004.jpg 768w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2021\/02\/fisher1951-1175x1536.jpg 1175w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2021\/02\/fisher1951.jpg 1400w\" sizes=\"(max-width: 230px) 100vw, 230px\" \/><\/a><figcaption id=\"caption-attachment-14800\" class=\"wp-caption-text\">Ronald A. Fisher (1890 - 1962) in 1951.<\/figcaption><\/figure>\n<p style=\"text-align: justify;\"><b>Fisher information.<\/b> The Fisher information or divergence of a positive Borel measure measure \\( {\\nu} \\) with respect to another one \\( {\\mu} \\) on the same space is<\/p>\n<p style=\"text-align: center;\">\\[ \\mathrm{Fisher}(\\nu\\mid\\mu) =\\int\\left|\\nabla\\log\\textstyle\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}\\right|^2\\mathrm{d}\\nu =\\int\\frac{|\\nabla\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}|^2}{\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}}\\mathrm{d}\\mu =4\\int\\left|\\nabla\\sqrt{\\textstyle\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}}\\right|^2\\mathrm{d}\\mu %=4\\int\\left|\\nabla\\sqrt{\\varphi}\\right|^2\\mathrm{d}\\mu %=\\int_{\\{\\varphi&gt;0\\}}\\frac{\\left|\\nabla \\varphi\\right|^2}{\\varphi}\\mathrm{d}\\mu \\in[0,+\\infty] \\]<\/p>\n<p style=\"text-align: justify;\">if \\( {\\nu} \\) is absolutey continuous with respect to \\( {\\mu} \\), and \\( {\\mathrm{Fisher}(\\nu\\mid\\mu)=+\\infty} \\) otherwise.<\/p>\n<p style=\"text-align: justify;\">It plays a role in the analysis and geometry of statistics, information, partial differential equations, and Markov diffusion stochastic processes. 
It is named after [Ronald Aylmer Fisher (1890 – 1962)](https://en.wikipedia.org/wiki/Ronald_Fisher), a British scientist who is also the Fisher of many other objects and concepts, including for instance:

- the [Fisher information of a statistical model](https://en.wikipedia.org/wiki/Fisher_information) and the [Fisher information metric](https://en.wikipedia.org/wiki/Fisher_information_metric),
- the [Fisher exact statistical test](https://en.wikipedia.org/wiki/Fisher's_exact_test),
- the [Fisher F probability distribution](https://en.wikipedia.org/wiki/F-distribution),
- the [Fisher–Tippett extreme value probability distribution](https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution),
- the [Fisher–Kolmogorov–Petrovsky–Piskunov (FKPP) partial differential equation](https://en.wikipedia.org/wiki/Fisher's_equation),
- the [Wright–Fisher model](https://en.wikipedia.org/wiki/Genetic_drift#Wright-Fisher_model) for genetic drift,
- the [Fisher principle](https://en.wikipedia.org/wiki/Fisher's_principle) in evolutionary biology.

However, he should not be confused with, for instance:

- [Irving Fisher (1867 – 1947)](https://en.wikipedia.org/wiki/Irving_Fisher) and the [Fisher equation in mathematical finance](https://en.wikipedia.org/wiki/Fisher_equation),
- [Michael Fisher (1931 – 2021)](https://en.wikipedia.org/wiki/Michael_Fisher), known for his contributions to equilibrium statistical mechanics,
- [Ernst Sigismund Fischer (1875 – 1954)](https://en.wikipedia.org/wiki/Ernst_Sigismund_Fischer), related to the [Courant–Fischer–Weyl minimax variational formulas for eigenvalues](https://en.wikipedia.org/wiki/Min-max_theorem) and to the [Riesz–Fischer theorem](https://en.wikipedia.org/wiki/Riesz%E2%80%93Fischer_theorem).

Let us denote \( {|x|=\sqrt{x_1^2+\cdots+x_n^2}} \) and \( {x\cdot y=x_1y_1+\cdots+x_ny_n} \) for all \( {x,y\in\mathbb{R}^n} \).

**Explicit formula for Gaussians.** For all \( {n\geq1} \), all vectors \( {m_1,m_2\in\mathbb{R}^n} \), and all \( {n\times n} \) covariance matrices \( {\Sigma_1} \) and \( {\Sigma_2} \), we have

\[ \mathrm{Fisher}(\mathcal{N}(m_1,\Sigma_1)\mid\mathcal{N}(m_2,\Sigma_2)) =|\Sigma_2^{-1}(m_1-m_2)|^2+\mathrm{Tr}(\Sigma_2^{-2}\Sigma_1-2\Sigma_2^{-1}+\Sigma_1^{-1}). \]

When \( {\Sigma_1} \) and \( {\Sigma_2} \) commute, this reduces to the following formula, closer to the univariate case:

\[ \mathrm{Fisher}(\mathcal{N}(m_1,\Sigma_1)\mid\mathcal{N}(m_2,\Sigma_2)) =|\Sigma_2^{-1}(m_1-m_2)|^2+\mathrm{Tr}(\Sigma_2^{-2}(\Sigma_2-\Sigma_1)^2\Sigma_1^{-1}). \]

In the univariate case, this reads, for all \( {m_1,m_2\in\mathbb{R}} \) and \( {\sigma_1^2,\sigma_2^2\in(0,\infty)} \),

\[ \mathrm{Fisher}(\mathcal{N}(m_1,\sigma_1^2)\mid\mathcal{N}(m_2,\sigma_2^2)) =\frac{(m_1-m_2)^2}{\sigma_2^4}+\frac{(\sigma_2^2-\sigma_1^2)^2}{\sigma_1^2\sigma_2^4}. \]
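As a sanity check, not part of the original post, the closed formula can be compared with a direct numerical integration of the definition in the univariate case. A minimal sketch, assuming NumPy and SciPy are available (the function name is mine):

```python
import numpy as np
from scipy.integrate import quad

def fisher_gaussians(m1, S1, m2, S2):
    """Closed-form Fisher(N(m1,S1) | N(m2,S2)) from the formula above."""
    m1, m2 = np.atleast_1d(m1).astype(float), np.atleast_1d(m2).astype(float)
    S1, S2 = np.atleast_2d(S1).astype(float), np.atleast_2d(S2).astype(float)
    iS1, iS2 = np.linalg.inv(S1), np.linalg.inv(S2)
    d = iS2 @ (m1 - m2)
    return float(d @ d + np.trace(iS2 @ iS2 @ S1 - 2 * iS2 + iS1))

# Univariate check against the defining integral
# Fisher = int (d/dx log(f1/f2))^2 f1 dx with fi the N(mi, si^2) density.
m1, s1, m2, s2 = 0.3, 0.8, -0.5, 1.2
f1 = lambda x: np.exp(-(x - m1)**2 / (2 * s1**2)) / (s1 * np.sqrt(2 * np.pi))
# d/dx log(f1/f2) = -(x-m1)/s1^2 + (x-m2)/s2^2
score = lambda x: -(x - m1) / s1**2 + (x - m2) / s2**2
num, _ = quad(lambda x: score(x)**2 * f1(x), -np.inf, np.inf)
print(num, fisher_gaussians(m1, s1**2, m2, s2**2))  # both approx 0.7909
```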
**A proof.** If \( {X\sim\mathcal{N}(m,\Sigma)} \) then, for all \( {1\leq i,j\leq n} \),

\[ \mathbb{E}(X_iX_j)=\Sigma_{ij}+m_im_j, \]

hence, for all \( {n\times n} \) symmetric matrices \( {A} \) and \( {B} \),

\[ \begin{array}{rcl} \mathbb{E}(AX\cdot BX) &=&\mathbb{E}\sum_{i,j,k=1}^nA_{ij}X_jB_{ik}X_k\\ &=&\sum_{i,j,k=1}^nA_{ij}B_{ik}\mathbb{E}(X_jX_k)\\ &=&\sum_{i,j,k=1}^nA_{ij}B_{ik}(\Sigma_{jk}+m_jm_k)\\ &=&\mathrm{Trace}(A\Sigma B)+Am\cdot Bm, \end{array} \]

and thus, for all \( {n} \)-dimensional vectors \( {a} \) and \( {b} \),

\[ \begin{array}{rcl} \mathbb{E}(A(X-a)\cdot B(X-b)) &=&\mathbb{E}(AX\cdot BX)+A(m-a)\cdot B(m-b)-Am\cdot Bm\\ &=&\mathrm{Trace}(A\Sigma B)+A(m-a)\cdot B(m-b). \end{array} \]

Now, denoting \( {\Gamma_i=\mathcal{N}(m_i,\Sigma_i)} \), \( {q_i(x)=\Sigma_i^{-1}(x-m_i)\cdot(x-m_i)} \), and \( {|\Sigma_i|=\det(\Sigma_i)} \), we get

\[ \begin{array}{rcl} \mathrm{Fisher}(\Gamma_1\mid\Gamma_2) &=&\displaystyle4\frac{\sqrt{|\Sigma_2|}}{\sqrt{|\Sigma_1|}}\int\Bigl|\nabla\mathrm{e}^{-\frac{q_1(x)}{4}+\frac{q_2(x)}{4}}\Bigr|^2\frac{\mathrm{e}^{-\frac{q_2(x)}{2}}}{\sqrt{(2\pi)^n|\Sigma_2|}}\mathrm{d}x\\ &=&\displaystyle\int|\Sigma_2^{-1}(x-m_2)-\Sigma_1^{-1}(x-m_1)|^2\frac{\mathrm{e}^{-\frac{q_1(x)}{2}}}{\sqrt{(2\pi)^n|\Sigma_1|}}\mathrm{d}x\\ &=&\displaystyle\int\Bigl(|\Sigma_2^{-1}(x-m_2)|^2\\ &&\qquad-2\Sigma_2^{-1}(x-m_2)\cdot\Sigma_1^{-1}(x-m_1)\\ &&\qquad+|\Sigma_1^{-1}(x-m_1)|^2\Bigr)\frac{\mathrm{e}^{-\frac{q_1(x)}{2}}}{\sqrt{(2\pi)^n|\Sigma_1|}}\mathrm{d}x\\ &=&\mathrm{Trace}(\Sigma_2^{-1}\Sigma_1\Sigma_2^{-1})+|\Sigma_2^{-1}(m_1-m_2)|^2-2\mathrm{Trace}(\Sigma_2^{-1})+\mathrm{Trace}(\Sigma_1^{-1})\\ &=&\mathrm{Trace}(\Sigma_2^{-2}\Sigma_1-2\Sigma_2^{-1}+\Sigma_1^{-1})+|\Sigma_2^{-1}(m_1-m_2)|^2, \end{array} \]

where the fourth equality uses the moment identities above with \( {X\sim\Gamma_1} \). The formula when \( {\Sigma_1\Sigma_2=\Sigma_2\Sigma_1} \) follows immediately.
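The quadratic-form identity used in the last step can be checked by simulation. A minimal sketch, not from the original post, assuming NumPy (all names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
# Random symmetric matrices A, B, a random covariance Sigma, and vectors m, a, b.
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2
L = rng.standard_normal((n, n)); Sigma = L @ L.T + n * np.eye(n)
m, a, b = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)

# Monte Carlo estimate of E[A(X-a) . B(X-b)] for X ~ N(m, Sigma).
X = rng.multivariate_normal(m, Sigma, size=1_000_000)
mc = np.mean(np.einsum('ki,ki->k', (X - a) @ A.T, (X - b) @ B.T))

# Closed form: Tr(A Sigma B) + A(m-a) . B(m-b).
exact = np.trace(A @ Sigma @ B) + (A @ (m - a)) @ (B @ (m - b))
print(mc, exact)  # agree up to Monte Carlo error
```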
**Other distances.** Recall that the [Hellinger](https://en.wikipedia.org/wiki/Ernst_Hellinger) distance between probability measures \( {\mu} \) and \( {\nu} \) with densities \( {f_\mu} \) and \( {f_\nu} \) with respect to the same reference measure \( {\lambda} \) is

\[ \mathrm{Hellinger}(\mu,\nu) =\Bigl(\int(\sqrt{f_\mu}-\sqrt{f_\nu})^2\mathrm{d}\lambda\Bigr)^{1/2} =\Bigl(2-2\int\sqrt{f_\mu f_\nu}\mathrm{d}\lambda\Bigr)^{1/2} \in[0,\sqrt{2}]. \]

This quantity does not depend on the choice of \( {\lambda} \).

The \( {\chi^2} \) divergence (inappropriately called a distance) is defined by

\[ \chi^2(\nu\mid\mu) =\mathrm{Var}_\mu\Bigl(\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\Bigr) =\Bigl\|\frac{\mathrm{d}\nu}{\mathrm{d}\mu}-1\Bigr\|_{\mathrm{L}^2(\mu)}^2 =\Bigl\|\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\Bigr\|_{\mathrm{L}^2(\mu)}^2-1 \in[0,+\infty]. \]

The [Kullback](https://en.wikipedia.org/wiki/Solomon_Kullback)–[Leibler](https://en.wikipedia.org/wiki/Richard_Leibler) divergence or relative entropy is defined by

\[ \mathrm{Kullback}(\nu\mid\mu) =\int\log{\textstyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}}\mathrm{d}\nu =\int{\textstyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}} \log{\textstyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}}\mathrm{d}\mu \in[0,+\infty] \]

if \( {\nu} \) is absolutely continuous with respect to \( {\mu} \), and \( {\mathrm{Kullback}(\nu\mid\mu)=+\infty} \) otherwise.

The [Wasserstein](https://en.wikipedia.org/wiki/Leonid_Vaser%C5%A1te%C4%ADn)–[Kantorovich](https://en.wikipedia.org/wiki/Leonid_Kantorovich)–[Monge](https://en.wikipedia.org/wiki/Gaspard_Monge) transportation distance of order \( {2} \), with respect to the underlying Euclidean distance, is defined for all probability measures \( {\mu} \) and \( {\nu} \) on \( {\mathbb{R}^n} \) by

\[ \mathrm{Wasserstein}(\mu,\nu)=\Bigl(\inf_{(X,Y)}\mathbb{E}(\left|X-Y\right|^2)\Bigr)^{1/2} \in[0,+\infty] \]

where the infimum runs over all couples \( {(X,Y)} \) with \( {X\sim\mu} \) and \( {Y\sim\nu} \).

Now, for all \( {n\geq1} \), \( {m_1,m_2\in\mathbb{R}^n} \), and all \( {n\times n} \) covariance matrices \( {\Sigma_1,\Sigma_2} \), denoting

\[ \Gamma_1=\mathcal{N}(m_1,\Sigma_1) \quad\mbox{and}\quad \Gamma_2=\mathcal{N}(m_2,\Sigma_2), \]

we have, with \( {m=m_1-m_2} \),

\[ \begin{array}{rcl} \mathrm{Hellinger}^2(\Gamma_1,\Gamma_2) &=&2-2\frac{\det(\Sigma_1\Sigma_2)^{1/4}}{\det(\frac{\Sigma_1+\Sigma_2}{2})^{1/2}}\exp\Bigl(-\frac{1}{4}(\Sigma_1+\Sigma_2)^{-1}m\cdot m\Bigr)\\ \chi^2(\Gamma_1\mid\Gamma_2) &=&\sqrt{\frac{|\Sigma_2|}{|2\Sigma_1-\Sigma_1^2\Sigma_2^{-1}|}}\exp\Bigl((2\Sigma_2-\Sigma_1)^{-1}m\cdot m\Bigr)-1\\ 2\,\mathrm{Kullback}(\Gamma_1\mid\Gamma_2) &=&\Sigma_2^{-1}m\cdot m+\mathrm{Tr}(\Sigma_2^{-1}\Sigma_1-\mathrm{I}_n)+\log\det(\Sigma_2\Sigma_1^{-1})\\ \mathrm{Fisher}(\Gamma_1\mid\Gamma_2) &=&|\Sigma_2^{-1}m|^2+\mathrm{Tr}(\Sigma_2^{-2}\Sigma_1-2\Sigma_2^{-1}+\Sigma_1^{-1})\\ \mathrm{Wasserstein}^2(\Gamma_1,\Gamma_2) &=&|m|^2+\mathrm{Tr}\Bigl(\Sigma_1+\Sigma_2-2\sqrt{\sqrt{\Sigma_1}\Sigma_2\sqrt{\Sigma_1}}\Bigr), \end{array} \]

where the \( {\chi^2} \) formula is valid when \( {2\Sigma_2-\Sigma_1} \) is positive definite, and \( {\chi^2(\Gamma_1\mid\Gamma_2)=+\infty} \) otherwise. If \( {\Sigma_1} \) and \( {\Sigma_2} \) commute, \( {\Sigma_1\Sigma_2=\Sigma_2\Sigma_1} \), then we find the simpler formulas

\[ \begin{array}{rcl} \mathrm{Fisher}(\Gamma_1\mid\Gamma_2) &=&|\Sigma_2^{-1}(m_1-m_2)|^2+\mathrm{Tr}(\Sigma_2^{-2}(\Sigma_2-\Sigma_1)^2\Sigma_1^{-1})\\ \mathrm{Wasserstein}^2(\Gamma_1,\Gamma_2) &=&|m_1-m_2|^2+\mathrm{Tr}((\sqrt{\Sigma_1}-\sqrt{\Sigma_2})^2). \end{array} \]
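For convenience, here is a sketch, not from the original post, implementing the five closed-form expressions above, assuming NumPy and SciPy; all function names are mine, and the \( {\chi^2} \) one returns nan (with a warning) when \( {2\Sigma_2-\Sigma_1} \) fails to be positive definite:

```python
import numpy as np
from scipy.linalg import sqrtm

def hellinger2(m1, S1, m2, S2):
    """Squared Hellinger distance between N(m1,S1) and N(m2,S2)."""
    m, S = m1 - m2, (S1 + S2) / 2
    pref = np.linalg.det(S1 @ S2) ** 0.25 / np.linalg.det(S) ** 0.5
    return 2 - 2 * pref * np.exp(-m @ np.linalg.inv(S1 + S2) @ m / 4)

def chi2(m1, S1, m2, S2):
    """chi^2 divergence of N(m1,S1) w.r.t. N(m2,S2); finite iff 2*S2 - S1 > 0."""
    m = m1 - m2
    pref = np.sqrt(np.linalg.det(S2)
                   / np.linalg.det(2 * S1 - S1 @ S1 @ np.linalg.inv(S2)))
    return pref * np.exp(m @ np.linalg.inv(2 * S2 - S1) @ m) - 1

def kullback(m1, S1, m2, S2):
    """Kullback-Leibler divergence of N(m1,S1) w.r.t. N(m2,S2)."""
    m, iS2, n = m1 - m2, np.linalg.inv(S2), len(m1)
    return (m @ iS2 @ m + np.trace(iS2 @ S1) - n
            + np.log(np.linalg.det(S2) / np.linalg.det(S1))) / 2

def fisher(m1, S1, m2, S2):
    """Fisher divergence of N(m1,S1) w.r.t. N(m2,S2)."""
    iS1, iS2 = np.linalg.inv(S1), np.linalg.inv(S2)
    d = iS2 @ (m1 - m2)
    return d @ d + np.trace(iS2 @ iS2 @ S1 - 2 * iS2 + iS1)

def wasserstein2(m1, S1, m2, S2):
    """Squared Wasserstein-2 distance between N(m1,S1) and N(m2,S2)."""
    r1 = sqrtm(S1)
    m = m1 - m2
    return m @ m + np.trace(S1 + S2 - 2 * sqrtm(r1 @ S2 @ r1)).real

# Demo in dimension 2; chi2 is finite here since 2*S2 - S1 > 0.
m1, m2 = np.array([1.0, 0.0]), np.array([0.0, 0.0])
S1, S2 = np.diag([1.0, 1.5]), np.diag([2.0, 1.0])
for f in (hellinger2, chi2, kullback, fisher, wasserstein2):
    print(f.__name__, f(m1, S1, m2, S2))
```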
style=\"text-align: justify;\">and if \\( {\\Sigma_1} \\) and \\( {\\Sigma_2} \\) commute, \\( {\\Sigma_1\\Sigma_2=\\Sigma_2\\Sigma_1} \\), then we find the simpler formulas<\/p>\n<p style=\"text-align: center;\">\\[ \\begin{array}{rcl} \\mathrm{Fisher}(\\Gamma_1\\mid\\Gamma_2) &=&|\\Sigma_2^{-1}(m_1-m_2)|^2+\\mathrm{Tr}(\\Sigma_2^{-2}(\\Sigma_2-\\Sigma_1)^2\\Sigma_1^{-1})\\\\ \\mathrm{Wasserstein}^2(\\Gamma_1,\\Gamma_2) &=&|m_1-m_2|^2+\\mathrm{Tr}((\\sqrt{\\Sigma_1}-\\sqrt{\\Sigma_2})^2). \\end{array} \\]<\/p>\n<p style=\"text-align: justify;\"><b>Fisher as an infinitesimal Kullback.<\/b> The <em><a href= \"https:\/\/en.wikipedia.org\/wiki\/Ludwig_Boltzmann\">Boltzmann<\/a>--<a href=\"https:\/\/en.wikipedia.org\/wiki\/Claude_Shannon\">Shannon<\/a> entropy<\/em> is in a sense the opposite of the Kullback divergence with respect to the Lebesgue measure \\( {\\lambda} \\), namely<\/p>\n<p style=\"text-align: center;\">\\[ \\mathrm{Entropy}(\\mu) =-\\int\\frac{\\mathrm{d}\\mu}{\\mathrm{d}\\lambda} \\log\\frac{\\mathrm{d}\\mu}{\\mathrm{d}\\lambda}\\mathrm{d}\\lambda =\\mathrm{Kullback}(\\mu\\mid\\lambda). \\]<\/p>\n<p style=\"text-align: justify;\">It was discovered by <em><a href= \"https:\/\/en.wikipedia.org\/wiki\/Nicolaas_Govert_de_Bruijn\">Nicolaas Govert de Bruijn<\/a><\/em> (1918 -- 2012) that the Fisher information appears as the differential version of the entropy under Gaussian noise. More precisely, it states that if \\( {X} \\) is a random vector of \\( {\\mathbb{R}^n} \\) with finite entropy and if \\( {Z\\sim\\mathcal{N}(0,I_n)} \\) then<\/p>\n<p style=\"text-align: center;\">\\[ \\frac{\\mathrm{d}}{\\mathrm{d}t}\\Bigr\\vert_{t=0} \\mathrm{Entropy}(\\mathrm{Law}(X+\\sqrt{t}Z)\\mid\\lambda) =-\\mathrm{Fisher}(\\mathrm{Law}(X)\\mid\\lambda). \\]<\/p>\n<p style=\"text-align: justify;\">In other words, if \\( {\\mu_t} \\) is the law at time \\( {t} \\) of an \\( {n} \\)-dimensional Brownian motion started from a random initial condition \\( {X} \\) then<\/p>\n<p style=\"text-align: center;\">\\[ \\frac{\\mathrm{d}}{\\mathrm{d}t}\\Bigr\\vert_{t=0} \\mathrm{Entropy}(\\mu_t\\mid\\lambda) =-\\mathrm{Fisher}(\\mu_0\\mid\\lambda). \\]<\/p>\n<p style=\"text-align: justify;\">The Lebesgue measure is the invariant (and reversible) measure of Brownian motion. More generally, let us consider the stochastic differential equation<\/p>\n<p style=\"text-align: center;\">\\[ \\mathrm{d}X_t=\\sqrt{2}\\mathrm{d}B_t-\\nabla V(X_t)\\mathrm{d}t \\]<\/p>\n<p style=\"text-align: justify;\">on \\( {\\mathbb{R}^n} \\), where \\( {V:\\mathbb{R}^n\\mapsto\\mathbb{R}} \\) is \\( {\\mathcal{C}^2} \\) and where \\( {{(B_t)}_{t\\geq0}} \\) is a standard Brownian motion. If we assume that \\( {V-\\frac{\\rho}{2}\\left|\\cdot\\right|^2} \\) is convex for some \\( {\\rho\\in\\mathbb{R}} \\) then it admits a solution \\( {{(X_t)}_{t\\geq0}} \\) known as the overdamped Langevin process, which is a Markov diffusion process. If we further assume that \\( {\\mathrm{e}^{-V}} \\) is integrable with respect to the Lebesgue measure, then the probability measure \\( {\\mu} \\) with density proportional to \\( {\\mathrm{e}^{-V}} \\) is invariant and reversible. 
**Ornstein–Uhlenbeck.** If \( {{(X_t^x)}_{t\geq0}} \) is an \( {n} \)-dimensional Ornstein–Uhlenbeck process, solution of the stochastic differential equation

\[ X_0^x=x\in\mathbb{R}^n, \quad\mathrm{d}X^x_t=\sqrt{2}\mathrm{d}B_t-X^x_t\mathrm{d}t, \]

where \( {{(B_t)}_{t\geq0}} \) is a standard \( {n} \)-dimensional Brownian motion, then the invariant law is \( {\gamma=\mathcal{N}(0,I_n)} \) and the Mehler formula reads

\[ X^x_t=x\mathrm{e}^{-t}+\sqrt{2}\int_0^t\mathrm{e}^{s-t}\mathrm{d}B_s\sim\mathcal{N}(x\mathrm{e}^{-t},(1-\mathrm{e}^{-2t})I_n), \]

and the explicit formula for the Fisher information between Gaussians gives

\[ \mathrm{Fisher}(\mathrm{Law}(X^x_t)\mid\gamma) =\mathrm{Fisher}(\mathcal{N}(x\mathrm{e}^{-t},(1-\mathrm{e}^{-2t})I_n)\mid\gamma) =|x|^2\mathrm{e}^{-2t}+n\frac{\mathrm{e}^{-4t}}{1-\mathrm{e}^{-2t}}. \]

**Log-Sobolev inequality.** The optimal log-Sobolev inequality for \( {\mu=\mathcal{N}(0,\mathrm{I}_n)} \) reads

\[ \mathrm{Kullback}(\nu\mid\mu) \leq\frac{1}{2}\mathrm{Fisher}(\nu\mid\mu) \]

for all probability measures \( {\nu} \) on \( {\mathbb{R}^n} \), and equality is achieved when \( {\log\frac{\mathrm{d}\nu}{\mathrm{d}\mu}} \) is affine, namely when \( {\nu=\mathcal{N}(m,\mathrm{I}_n)} \) for some \( {m\in\mathbb{R}^n} \). By using the Gaussian formulas above for Kullback and Fisher, this log-Sobolev inequality boils down, when \( {\nu=\mathcal{N}(m,\Sigma_1)} \), to

\[ \log\det(\Sigma_1^{-1})\leq\mathrm{Tr}(\Sigma_1^{-1}-\mathrm{I}_n). \]

Applying \( {\log(x)\leq x-1} \) to the eigenvalues of \( {\Sigma_1^{-1}} \) shows that this is a matrix version of \( {\log(x)\leq x-1} \), nothing else.
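Here is a small numerical illustration of the inequality, not from the original post, for random Gaussian \( {\nu} \) and \( {\mu=\mathcal{N}(0,I_n)} \), using the closed-form Gaussian expressions above (a sketch assuming NumPy; names are mine):

```python
import numpy as np

# Check Kullback(nu|mu) <= Fisher(nu|mu)/2 for mu = N(0, I_n)
# and random Gaussian nu = N(m, S), using the closed forms above.
rng = np.random.default_rng(1)
n = 4
for _ in range(5):
    m = rng.standard_normal(n)
    L = rng.standard_normal((n, n))
    S = L @ L.T + 0.1 * np.eye(n)  # random covariance of nu
    kl = (m @ m + np.trace(S) - n - np.log(np.linalg.det(S))) / 2
    fi = m @ m + np.trace(S - 2 * np.eye(n) + np.linalg.inv(S))
    assert kl <= fi / 2 + 1e-12
    print(f"Kullback = {kl:.4f} <= Fisher/2 = {fi / 2:.4f}")
```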
**Note.** This post was written while working on:

- [Universal cutoff for Dyson Ornstein Uhlenbeck process](http://arxiv.org/abs/2107.14452) by [Boursier](https://djalil.chafai.net/scripts/search.php?q=Jeanne+Boursier+mathematics), [Chafaï](https://djalil.chafai.net/), and [Labbé](https://djalil.chafai.net/scripts/search.php?q=Cyril+Labb%C3%A9+mathematics) (2021).

**Further reading.**

- [About the Hellinger distance](https://djalil.chafai.net/blog/2020/01/22/about-the-hellinger-distance/) (on this blog)
- [Wasserstein distance between two Gaussians](https://djalil.chafai.net/blog/2010/04/30/wasserstein-distance-between-two-gaussians/) (on this blog)
- [Aspects of the Ornstein-Uhlenbeck process](https://djalil.chafai.net/blog/2016/02/13/aspects-of-the-ornstein-uhlenbeck-process/) (on this blog)
- [Probability metrics and the stability of stochastic models](https://zbmath.org/?q=an:00049698) by Svetlozar T. Rachev (1991)
- [On choosing and bounding probability metrics](https://zbmath.org/?q=an:02124714) by Alison L. Gibbs and Francis Edward Su (2002)
- [Statistical inference based on divergence measures](https://mathscinet.ams.org/mathscinet-getitem?mr=2183173) by Leandro Pardo (2006)