{"id":15957,"date":"2022-04-02T23:04:06","date_gmt":"2022-04-02T21:04:06","guid":{"rendered":"https:\/\/djalil.chafai.net\/blog\/?p=15957"},"modified":"2022-04-14T12:48:17","modified_gmt":"2022-04-14T10:48:17","slug":"boltzmann-gibbs-entropic-variational-principle","status":"publish","type":"post","link":"https:\/\/djalil.chafai.net\/blog\/2022\/04\/02\/boltzmann-gibbs-entropic-variational-principle\/","title":{"rendered":"Boltzmann-Gibbs entropic variational principle"},"content":{"rendered":"<figure id=\"attachment_16015\" aria-describedby=\"caption-attachment-16015\" style=\"width: 223px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/Nicolas_L%C3%A9onard_Sadi_Carnot\"><img loading=\"lazy\" src=\"http:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2022\/04\/Sadi_Carnot-223x300.jpeg\" alt=\"Nicolas L\u00e9onard Sadi Carnot (1796 - 1932)\" width=\"223\" height=\"300\" class=\"size-medium wp-image-16015\" srcset=\"https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2022\/04\/Sadi_Carnot-223x300.jpeg 223w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2022\/04\/Sadi_Carnot-766x1030.jpeg 766w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2022\/04\/Sadi_Carnot-768x1033.jpeg 768w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2022\/04\/Sadi_Carnot.jpeg 956w\" sizes=\"(max-width: 223px) 100vw, 223px\" \/><\/a><figcaption id=\"caption-attachment-16015\" class=\"wp-caption-text\">Nicolas L\u00e9onard Sadi Carnot (1796 - 1932), an \u00c9variste Galois of Physics.<\/figcaption><\/figure>\n<p style=\"text-align: justify;\">The aim of this short post is to explain why the maximum entropy principle could be better seen as a minimum relative entropy principle, in other words an entropic projection.<\/p>\n<p style=\"text-align: justify;\"><strong>Relative entropy.<\/strong> Let $\\lambda$ be a reference measure on some measurable space $E$. The <strong>relative entropy<\/strong> with respect to $\\lambda$ is defined for every measure $\\mu$ on $E$ with density $\\mathrm{d}\\mu\/\\mathrm{d}\\lambda$ by $$\\mathrm{H}(\\mu\\mid\\lambda):=\\int\\frac{\\mathrm{d}\\mu}{\\mathrm{d}\\lambda}\\log\\frac{\\mathrm{d}\\mu}{\\mathrm{d}\\lambda}\\mathrm{d}\\lambda.$$ If the integral is not well defined, we could simply set $\\mathrm{H}(\\mu\\mid\\lambda):=+\\infty$.\n<\/p>\n<ul>\n<li style=\"text-align: justify;\">An important case is when $\\lambda$ is a probability measure. In this case $\\mathrm{H}$ becomes the <strong>Kullback-Leibler divergence<\/strong>, and the Jensen inequality for the strictly convex function $u\\mapsto u\\log(u)$ indicates then that $\\mathrm{H}(\\mu\\mid\\lambda)\\geq0$ with equality if and only if $\\mu=\\lambda$.<\/li>\n<li style=\"text-align: justify;\">Another important case is when $\\lambda$ is the Lebesgue measure on $\\mathbb{R}^n$ or the counting measure on a discrete set, then $$-\\mathrm{H}(\\mu\\mid\\lambda)$$ is the <strong>Boltzmann-Shannon entropy of $\\mu$<\/strong>. 
<p style=\"text-align: justify;\"><strong>Boltzmann-Gibbs probability measures.<\/strong> Such a probability measure $\mu_{V,\beta}$ takes the form $$\mathrm{d}\mu_{V,\beta}:=\frac{\mathrm{e}^{-\beta V}}{Z_{V,\beta}}\mathrm{d}\lambda$$ where $V:E\to(-\infty,+\infty]$, $\beta\in[0,+\infty)$, and $$Z_{V,\beta}:=\int\mathrm{e}^{-\beta V}\mathrm{d}\lambda<\infty$$ is the normalizing factor. The larger $\beta$ is, the more $\mu_{V,\beta}$ puts its probability mass on the regions where $V$ is low. The corresponding asymptotic analysis, known as the <strong>Laplace method<\/strong>, states that as $\beta\to\infty$ the probability measure $\mu_{V,\beta}$ concentrates on the minimizers of $V$.<\/p>\n<p style=\"text-align: justify;\">The mean of $V$, or $V$-moment, of $\mu_{V,\beta}$ reads<br \/>\n$$<br \/>\n\int V\mathrm{d}\mu_{V,\beta}<br \/>\n=-\frac{1}{\beta}\mathrm{H}(\mu_{V,\beta}\mid\lambda)-\frac{1}{\beta}\log Z_{V,\beta}.<br \/>\n$$<br \/>\nIn thermodynamics $-\frac{1}{\beta}\log Z_{V,\beta}$ appears as a <strong>Helmholtz free energy<\/strong> since it is equal to $\int V\mathrm{d}\mu_{V,\beta}$ (mean energy) minus $\frac{1}{\beta}\times(-\mathrm{H}(\mu_{V,\beta}\mid\lambda))$ (temperature times entropy).<\/p>\n<p style=\"text-align: justify;\">When $\beta$ ranges from $-\infty$ to $\infty$, the $V$-moment of $\mu_{V,\beta}$ decreases from $\sup V$ down to $\inf V$, since $$\partial_\beta\int V\mathrm{d}\mu_{V,\beta}=\Bigl(\int V\mathrm{d}\mu_{V,\beta}\Bigr)^2-\int V^2\mathrm{d}\mu_{V,\beta}\leq0,$$ the right-hand side being minus the variance of $V$ under $\mu_{V,\beta}$. If $\lambda(E)<\infty$ then $\mu_{V,0}=\frac{1}{\lambda(E)}\lambda$ and its $V$-moment is $\frac{1}{\lambda(E)}\int V\mathrm{d}\lambda$.<\/p>\n<p style=\"text-align: justify;\"><strong>Variational principle.<\/strong> Let $\beta\geq0$ be such that $Z_{V,\beta}<\infty$ and $c:=\int V\mathrm{d}\mu_{V,\beta}<\infty$. Then, among all the probability measures $\mu$ on $E$ with the same $V$-moment as $\mu_{V,\beta}$, the relative entropy $\mathrm{H}(\mu\mid\lambda)$ is minimized by the Boltzmann-Gibbs measure $\mu_{V,\beta}$. In other words,$$\min_{\int V\mathrm{d}\mu=c}\mathrm{H}(\mu\mid\lambda)=\mathrm{H}(\mu_{V,\beta}\mid\lambda).$$<\/p>\n<p style=\"text-align: justify;\">Indeed we have $$\begin{align*}\mathrm{H}(\mu\mid\lambda)-\mathrm{H}(\mu_{V,\beta}\mid\lambda)&=\int\log\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}\mathrm{d}\mu-\int\log\frac{\mathrm{d}\mu_{V,\beta}}{\mathrm{d}\lambda}\mathrm{d}\mu_{V,\beta}\\&=\int\log\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}\mathrm{d}\mu+\int(\log(Z_{V,\beta})+\beta V)\mathrm{d}\mu_{V,\beta}\\&=\int\log\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}\mathrm{d}\mu+\int(\log(Z_{V,\beta})+\beta V)\mathrm{d}\mu\\&=\int\log\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}\mathrm{d}\mu-\int\log\frac{\mathrm{d}\mu_{V,\beta}}{\mathrm{d}\lambda}\mathrm{d}\mu\\&=\mathrm{H}(\mu\mid\mu_{V,\beta})\geq0\end{align*}$$ with equality if and only if $\mu=\mu_{V,\beta}$. The crucial point is that $\mu$ and $\mu_{V,\beta}$, being probability measures with the same $V$-moment, integrate identically every test function of the form $a+bV$, where $a,b$ are arbitrary real constants.<\/p>\n
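<p style=\"text-align: justify;\">To see the variational principle at work, here is a minimal numerical sketch in Python with NumPy; the finite state space $E=\{0,\dots,N\}$ with $\lambda$ the counting measure, the choice $V(x)=x$, and the value $c=5$ are all illustrative assumptions. The script tunes $\beta$ by bisection so that $\int V\mathrm{d}\mu_{V,\beta}=c$, then verifies on random competitors $\mu$ with the same mass and $V$-moment that $\mathrm{H}(\mu\mid\lambda)\geq\mathrm{H}(\mu_{V,\beta}\mid\lambda)$.<\/p>\n<pre><code>
import numpy as np

rng = np.random.default_rng(0)
N, c = 20, 5.0
V = np.arange(N + 1, dtype=float)  # V(x) = x on E = {0, ..., N}

def gibbs(beta):
    # Boltzmann-Gibbs measure e^{-beta V} / Z_{V,beta} w.r.t. counting measure
    w = np.exp(-beta * V)
    return w / w.sum()

def H(mu):
    # H(mu | counting measure) = sum_x mu(x) log mu(x), with 0 log 0 := 0
    return np.sum(np.where(mu > 0, mu * np.log(np.where(mu > 0, mu, 1.0)), 0.0))

# Bisection on beta: the V-moment is decreasing in beta (derivative formula above).
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if gibbs(mid) @ V > c else (lo, mid)
mu_star = gibbs(0.5 * (lo + hi))

# Competitors with the same mass and V-moment: perturb mu_star along the
# null space of the constraint matrix A (rows: total mass, V-moment).
A = np.vstack([np.ones(N + 1), V])
P = np.eye(N + 1) - np.linalg.pinv(A) @ A  # orthogonal projector onto ker(A)
for _ in range(5):
    d = P @ rng.standard_normal(N + 1)
    eps = 0.9 * np.min(mu_star[d < 0] / -d[d < 0])  # keep the competitor >= 0
    print(H(mu_star + eps * d) >= H(mu_star))       # prints True five times
<\/code><\/pre>\n<p style=\"text-align: justify;\">The competitors are built by perturbing $\mu_{V,\beta}$ inside the null space of the two linear constraints, which keeps the mass and the $V$-moment exactly fixed; any such feasible perturbation can only increase the relative entropy, by the identity $\mathrm{H}(\mu\mid\lambda)-\mathrm{H}(\mu_{V,\beta}\mid\lambda)=\mathrm{H}(\mu\mid\mu_{V,\beta})\geq0$ above.<\/p>\n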
<ul>\n<li style=\"text-align: justify;\">When $\lambda$ is the Lebesgue measure on $\mathbb{R}^n$ or the counting measure on a discrete set, we recover the usual <strong>maximum Boltzmann-Shannon entropy principle<\/strong> $$\max_{\int V\mathrm{d}\mu=c}-\mathrm{H}(\mu\mid\lambda)=-\mathrm{H}(\mu_{V,\beta}\mid\lambda).$$ In particular, Gaussians maximize the Boltzmann-Shannon entropy under a variance constraint (take for $V$ a quadratic form), while uniform measures maximize the Boltzmann-Shannon entropy under a support constraint (take $V$ constant on a set of finite $\lambda$-measure, and $+\infty$ elsewhere); a numerical sketch of the Gaussian case follows this list. Maximum entropy is thus minimum relative entropy with respect to the Lebesgue or counting measure: a way to find, among the probability measures satisfying a given moment constraint, the closest one to the Lebesgue or counting measure.<\/li>\n<li style=\"text-align: justify;\">When $\lambda$ is a probability measure, we recover the fact that the Boltzmann-Gibbs measures realize the <strong>projection or least Kullback-Leibler divergence<\/strong> of $\lambda$ on the set of probability measures with a given $V$-moment. This is the <strong>Csisz\u00e1r $\mathrm{I}$-projection<\/strong>.<\/li>\n<li style=\"text-align: justify;\">There are other interesting applications, for instance when $\lambda$ is a Poisson point process.<\/li>\n<\/ul>\n
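<p style=\"text-align: justify;\">Here is, as announced in the first item, a minimal numerical sketch in Python with NumPy of the Gaussian case: it compares the Boltzmann-Shannon entropy of the standard Gaussian with that of two other unit-variance densities (Laplace and uniform, an arbitrary pair of competitors), all computed by a plain Riemann sum.<\/p>\n<pre><code>
import numpy as np

xs = np.linspace(-40.0, 40.0, 400001)
dx = xs[1] - xs[0]

# Boltzmann-Shannon entropy -int p log p dx by a Riemann sum, 0 log 0 := 0.
def entropy(p):
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    return -np.sum(plogp) * dx

# Three probability densities on R, all with variance 1.
gaussian = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)
laplace = np.exp(-np.sqrt(2) * np.abs(xs)) / np.sqrt(2)
uniform = np.where(np.abs(xs) <= np.sqrt(3), 1 / (2 * np.sqrt(3)), 0.0)

print(entropy(gaussian))  # ~ 0.5*log(2*pi*e) = 1.4189..., the largest
print(entropy(laplace))   # ~ 1 + 0.5*log(2)  = 1.3466...
print(entropy(uniform))   # ~ log(2*sqrt(3))  = 1.2425...
<\/code><\/pre>\n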
<p style=\"text-align: justify;\"><strong>Note.<\/strong> The concept of <em>maximum entropy<\/em> was studied notably by<\/p>\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Rudolf_Clausius\">Rudolf Julius Emanuel Clausius (1822 - 1888)<\/a><\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Ludwig_Boltzmann\">Ludwig Boltzmann (1844 - 1906)<\/a><\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Hermann_von_Helmholtz\">Hermann von Helmholtz (1821 - 1894)<\/a><\/li>\n<li><a href=\"https:\/\/fr.wikipedia.org\/wiki\/Willard_Gibbs\">Josiah Willard Gibbs (1839 - 1903)<\/a><\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Claude_Shannon\">Claude Elwood Shannon (1916 - 2001)<\/a><\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Solomon_Kullback\">Solomon Kullback (1907 - 1994)<\/a><\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Richard_Leibler\">Richard Leibler (1914 - 2003)<\/a><\/li>\n<\/ul>\n<p style=\"text-align:justify;\">and by <a href=\"https:\/\/en.wikipedia.org\/wiki\/Edwin_Thompson_Jaynes\">Edwin Thompson Jaynes (1922 - 1998)<\/a> in relation with thermodynamics, statistical physics, statistical mechanics, information theory, and Bayesian statistics. The concept of $\mathrm{I}$-projection or minimum relative entropy was studied notably by <a href=\"https:\/\/en.wikipedia.org\/wiki\/Imre_Csisz%C3%A1r\">Imre Csisz\u00e1r (1938 - )<\/a>.<\/p>\n<p style=\"text-align: justify;\"><strong>Related.<\/strong><\/p>\n<ul>\n<li>On this blog<br \/>\n<a href=\"https:\/\/djalil.chafai.net\/blog\/2015\/03\/16\/entropy-ubiquity\/\">Entropy ubiquity<\/a><br \/>\nLPMO (2015)<\/li>\n<li>\nOlivier Darrigol, Roger Balian, Christian Maes, F\u00e9lix Ritort, Thibault Damour<br \/>\n<a href=\"http:\/\/www.bourbaphy.fr\/decembre2003.html\">L'entropie<\/a><br \/>\n<a href=\"http:\/\/www.bourbaphy.fr\/\">S\u00e9minaire Poincar\u00e9 or Bourbaphy, a Bourbaki of Physics<\/a> (2003)\n<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The aim of this short post is to explain why the maximum entropy principle could be better seen as a minimum relative entropy principle, in&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/djalil.chafai.net\/blog\/2022\/04\/02\/boltzmann-gibbs-entropic-variational-principle\/\">Continue reading<span class=\"screen-reader-text\">Boltzmann-Gibbs entropic variational principle<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":1575},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/15957"}],"collection":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/comments?post=15957"}],"version-history":[{"count":103,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/15957\/revisions"}],"predecessor-version":[{"id":16067,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/15957\/revisions\/16067"}],"wp:attachment":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/media?parent=15957"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/categories?post=15957"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/tags?post=15957"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}