{"id":19986,"date":"2024-04-06T15:12:35","date_gmt":"2024-04-06T13:12:35","guid":{"rendered":"https:\/\/djalil.chafai.net\/blog\/?p=19986"},"modified":"2024-04-11T07:44:47","modified_gmt":"2024-04-11T05:44:47","slug":"a-few-words-about-entropy","status":"publish","type":"post","link":"https:\/\/djalil.chafai.net\/blog\/2024\/04\/06\/a-few-words-about-entropy\/","title":{"rendered":"A few words about entropy"},"content":{"rendered":"<figure id=\"attachment_16015\" aria-describedby=\"caption-attachment-16015\" style=\"width: 223px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2022\/04\/Sadi_Carnot.jpeg\"><img loading=\"lazy\" src=\"http:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2022\/04\/Sadi_Carnot-223x300.jpeg\" alt=\"Nicolas L\u00e9onard Sadi Carnot (1796 - 1932)\" width=\"223\" height=\"300\" class=\"size-medium wp-image-16015\" srcset=\"https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2022\/04\/Sadi_Carnot-223x300.jpeg 223w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2022\/04\/Sadi_Carnot-766x1030.jpeg 766w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2022\/04\/Sadi_Carnot-768x1033.jpeg 768w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2022\/04\/Sadi_Carnot.jpeg 956w\" sizes=\"(max-width: 223px) 100vw, 223px\" \/><\/a><figcaption id=\"caption-attachment-16015\" class=\"wp-caption-text\">Nicolas L\u00e9onard Sadi Carnot (1796 - 1932) A romantic figure behind entropy<\/figcaption><\/figure>\n<p style=\"text-align: justify;\">Why and how entropy emerges in basic mathematics? This tiny post aims to provide some answers. We have already tried something in this spirit in a <a href=\"https:\/\/djalil.chafai.net\/blog\/2015\/03\/16\/entropy-ubiquity\/\">previous post<\/a> almost ten years ago.<\/p>\n<p style=\"text-align: justify;\"><strong>Combinatorics.<\/strong> Asymptotic analysis of the multinomial coefficient $\\binom{n}{n_1,\\ldots,n_r}:=\\frac{n!}{n_1!\\cdots n_r!}$ :<br \/>\n\\[<br \/>\n\\frac{1}{n}\\log\\binom{n}{n_1,\\ldots,n_r}<br \/>\n\\xrightarrow[n=n_1+\\cdots+n_r\\to\\infty]{\\nu_i=\\frac{n_i}{n}\\to p_i}<br \/>\n\\mathrm{S}(p):=-\\sum_{i=1}^rp_i\\log(p_i).<br \/>\n\\] Recall that if $A=\\{a_1,\\ldots,a_r\\}$ is a finite set of cardinal $r$ and $n=n_1+\\cdots+n_r$ then<br \/>\n\\[<br \/>\n\\mathrm{Card}\\Bigr\\{(x_1,\\ldots,x_n)\\in A^n:\\forall 1\\leq i\\leq r,\\sum_{k=1}^n\\mathbf{1}_{x_k=a_i}=n_i\\Bigr\\}=\\binom{n}{n_1,\\ldots,n_r}.<br \/>\n\\] The multinomial coefficient can be interpreted as the number of microstates $(x_1,\\ldots,x_n)$ compatible with the macrostate $(n_1,\\ldots,n_r)$, while the quantity $\\mathrm{S}(p)$ appears as a normalized asymptotic measure of additive degrees of freedom or disorder. This is already in the work of Ludwig Eduard Boltzmann (1844 -- 1906) in kinetic gas theory at the origins of statistical physics. The quantity $\\mathrm{S}(p)$ is also the one used by Claude Elwood Shannon (1916 -- 2001) in information and communication theory as the average length of optimal lossless coding. 
**Probability.** If $X_1,\ldots,X_n$ are independent and identically distributed random variables of law $\mu$ on a finite set or alphabet $A=\{a_1,\ldots,a_r\}$, then for all $x_1,\ldots,x_n\in A$, denoting $\nu_i:=\frac{1}{n}\sum_{k=1}^n\mathbf{1}_{x_k=a_i}$ the empirical frequencies,
\begin{align*}
\mathbb{P}((X_1,\ldots,X_n)=(x_1,\ldots,x_n))
&=\prod_{i=1}^r\mu_i^{\sum_{k=1}^n\mathbf{1}_{x_k=a_i}}
=\prod_{i=1}^r\mu_i^{n\nu_i}
=\mathrm{e}^{n\sum_{i=1}^r\nu_i\log\mu_i}\\
&=\mathrm{e}^{-n(\mathrm{S}(\nu)+\mathrm{H}(\nu\mid\mu))},
\end{align*}
a remarkable identity where $\mathrm{S}(\nu)$ is the Boltzmann-Shannon entropy considered before, and where $\mathrm{H}(\nu\mid\mu)$ is a new quantity known as the Kullback-Leibler divergence or relative entropy:
\[
\mathrm{S}(\nu):=-\sum_{i=1}^r\nu_i\log\nu_i=-\int f(x)\log f(x)\mathrm{d}x
\]
where $f$ is the density of $\nu$ with respect to the counting measure $\mathrm{d}x$, and
\[
\mathrm{H}(\nu\mid\mu):=\sum_{i=1}^r\nu_i\log\frac{\nu_i}{\mu_i}
=\sum_{i=1}^r\frac{\nu_i}{\mu_i}\log\frac{\nu_i}{\mu_i}\mu_i
=\int\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\log\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\mathrm{d}\mu.
\]
This comes from information theory, after Solomon Kullback (1907 - 1994) and Richard Leibler (1914 - 2003). Here $\mathrm{S}(\nu)$ measures the combinatorics on $x_1,\ldots,x_n$ at prescribed frequencies $\nu$, while $\mathrm{H}(\nu\mid\mu)$ measures the cost or energy of the deviation of $\nu$ from the actual distribution $\mu$. This is a Boltzmann-Gibbsification of the probability $\mathbb{P}((X_1,\ldots,X_n)=(x_1,\ldots,x_n))$, see below, leading via the Laplace method to the large deviations principle of Ivan Nikolaevich Sanov (1919 - 1968). The Jensen inequality for the strictly convex function $u\mapsto u\log(u)$ gives
\[
\mathrm{H}(\nu\mid\mu)\geq0\quad\text{with equality iff}\quad\nu=\mu.
\]
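As a sanity check (an illustration, not in the original post), here is a minimal Python verification of the identity $\mathbb{P}((X_1,\ldots,X_n)=(x_1,\ldots,x_n))=\mathrm{e}^{-n(\mathrm{S}(\nu)+\mathrm{H}(\nu\mid\mu))}$ on a hand-picked sequence over a three-letter alphabet; the law $\mu$ and the sequence are arbitrary choices.

```python
# Minimal sketch (not from the post): check that
#   P((X_1,...,X_n) = (x_1,...,x_n)) = exp(-n (S(nu) + H(nu | mu)))
# for one explicit sequence over a 3-letter alphabet.
from math import log, exp
from collections import Counter

mu = {"a": 0.5, "b": 0.3, "c": 0.2}          # law of each X_k (arbitrary)
x = list("aabacbacba")                       # an observed sequence, n = 10
n = len(x)

nu = {s: c / n for s, c in Counter(x).items()}          # empirical frequencies

S = -sum(q * log(q) for q in nu.values())               # entropy S(nu)
H = sum(q * log(q / mu[s]) for s, q in nu.items())      # relative entropy H(nu | mu)

prob = 1.0
for s in x:
    prob *= mu[s]                                       # direct product of probabilities

print(prob, exp(-n * (S + H)))   # the two numbers coincide (up to rounding)
```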
**Statistics.** If $Y_1,\ldots,Y_n$ are independent and identically distributed random variables of law $\mu^{(\theta)}$ belonging to a parametric family $(\mu^{(\theta)})_{\theta\in\Theta}$ on a finite set $A$, then, following Ronald Aylmer Fisher (1890 - 1962), the likelihood of the data $(x_1,\ldots,x_n)\in A^n$ is
\[
\ell_{x_1,\ldots,x_n}(\theta):=\mathbb{P}(Y_1=x_1,\ldots,Y_n=x_n)
=\prod_{i=1}^n\mu^{(\theta)}_{x_i}.
\]
It can also be seen as the likelihood of $\theta$ with respect to $x_1,\ldots,x_n$. This dual point of view leads to the following: if $X_1,\ldots,X_n$ is an observed sample of $\mu^{(\theta_*)}$ with $\theta_*$ unknown, then the maximum likelihood estimator of $\theta_*$ is
\[
\widehat{\theta}_n:=\arg\max_{\theta\in\Theta}\ell_{X_1,\ldots,X_n}(\theta)
=\arg\max_{\theta\in\Theta}\Bigl(\frac{1}{n}\log\ell_{X_1,\ldots,X_n}(\theta)\Bigr).
\]
The asymptotic analysis via the law of large numbers reveals entropy as an asymptotic contrast:
\begin{align*}
\frac{1}{n}\log\ell_{X_1,\ldots,X_n}(\theta)
&=\frac{1}{n}\sum_{i=1}^n\log\mu^{(\theta)}_{X_i}\\
&\xrightarrow[n\to\infty]{\mathrm{a.s.}}
\sum_{k=1}^r\mu^{(\theta_*)}_k\log\mu^{(\theta)}_k
=\underbrace{-\mathrm{S}(\mu^{(\theta_*)})}_{\text{const}}-\mathrm{H}(\mu^{(\theta_*)}\mid\mu^{(\theta)}).
\end{align*}
In particular, maximizing the likelihood amounts asymptotically to minimizing $\theta\mapsto\mathrm{H}(\mu^{(\theta_*)}\mid\mu^{(\theta)})$.

**Analysis.** The entropy appears naturally as a derivative of the $L^p$ norm of $f\geq0$, as follows:
\[
\partial_p\|f\|_p^p
=\partial_p\int f^p\mathrm{d}\mu
=\partial_p\int \mathrm{e}^{p\log(f)}\mathrm{d}\mu
=\int f^p\log(f)\mathrm{d}\mu
=\frac{1}{p}\int f^p\log(f^p)\mathrm{d}\mu.
\]
This is at the heart of the theorem of Leonard Gross (1931 - ) relating the hypercontractivity of Markov semigroups to the logarithmic Sobolev inequality for the invariant measure. This can also be used to extract from the convolution inequalities of William Henry Young (1863 - 1942) certain entropic uncertainty principles.

**Boltzmann-Gibbs measures, variational characterizations, and Helmholtz free energy.** We take $V:A\to\mathbb{R}$, interpreted as an energy. Maximizing $\mu\mapsto\mathrm{S}(\mu)$ under the constraint of average energy $\int V\mathrm{d}\mu=v$ gives the maximizer
\[
\mu_\beta
:=\frac{1}{Z_\beta}\mathrm{e}^{-\beta V}\mathrm{d}x
\quad\text{where}\quad
Z_\beta:=\int\mathrm{e}^{-\beta V}\mathrm{d}x.
\]
We use integrals instead of sums to lighten the notation; here $\mathrm{d}x$ stands for the counting measure on $A$. The parameter $\beta>0$, interpreted as an inverse temperature, is dictated by $v$. Such a probability distribution $\mu_\beta$ is known as a Boltzmann-Gibbs distribution, after Ludwig Eduard Boltzmann (1844 - 1906) and Josiah Willard Gibbs (1839 - 1903). We have a variational characterization as a maximum entropy at fixed average energy:
\[
\int V\mathrm{d}\mu=\int V\mathrm{d}\mu_\beta
\quad\Rightarrow\quad
\mathrm{S}(\mu_\beta)-\mathrm{S}(\mu)
=\mathrm{H}(\mu\mid\mu_\beta).
\]
There is a dual point of view in which, instead of fixing the average energy, we fix the inverse temperature $\beta$ and introduce the free energy of Hermann von Helmholtz (1821 - 1894):
\[
\mathrm{F}(\mu):=\int V\mathrm{d}\mu-\frac{1}{\beta}\mathrm{S}(\mu).
\]
This can be seen as a Joseph-Louis Lagrange (1736 - 1813) point of view in which the constraint is added to the functional. We have
\[
\mathrm{F}(\mu_\beta)=-\frac{1}{\beta}\log(Z_\beta)
\quad\text{since}\quad
\mathrm{S}(\mu_\beta)=\beta\int V\mathrm{d}\mu_\beta+\log Z_\beta.
\]
We then have a new variational characterization as a minimum free energy at fixed temperature:
\[
\mathrm{F}(\mu)-\mathrm{F}(\mu_\beta)=\frac{1}{\beta}\mathrm{H}(\mu\mid\mu_\beta).
\]
This explains why $\mathrm{H}$ is often called free energy.
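Here is a small numerical sketch (an illustration, not part of the post) of this last identity on a four-point set; the energy $V$, the inverse temperature $\beta$, and the random competitor $\mu$ are arbitrary choices.

```python
# Minimal sketch (not from the post): on a finite set, check that
#   F(mu) - F(mu_beta) = H(mu | mu_beta) / beta,
# where F(mu) = E_mu[V] - S(mu)/beta and mu_beta is proportional to exp(-beta V).
import numpy as np

rng = np.random.default_rng(0)
V = np.array([0.3, 1.7, 0.9, 2.4])         # an arbitrary energy on a 4-point set
beta = 1.5                                 # an arbitrary inverse temperature

w = np.exp(-beta * V)
mu_beta = w / w.sum()                      # Boltzmann-Gibbs measure

def S(p):
    return -np.sum(p * np.log(p))          # entropy

def F(p):
    return np.sum(p * V) - S(p) / beta     # Helmholtz free energy

def H(p, q):
    return np.sum(p * np.log(p / q))       # relative entropy

mu = rng.dirichlet(np.ones(4))             # a random competitor measure
print(F(mu) - F(mu_beta), H(mu, mu_beta) / beta)   # the two values coincide
```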
**Legendre transform.** The relative entropy $\nu\mapsto\mathrm{H}(\nu\mid\mu)$ is the Legendre transform of the log-Laplace transform, in the sense that
\[
\sup_g\Bigl\{\int g\mathrm{d}\nu-\log\int\mathrm{e}^g\mathrm{d}\mu\Bigr\}=\mathrm{H}(\nu\mid\mu).
\]
Indeed, for all $h$ such that $\int\mathrm{e}^h\mathrm{d}\mu=1$, by the Jensen inequality, with $f:=\frac{\mathrm{d}\nu}{\mathrm{d}\mu}$,
\begin{align*}
\int h\mathrm{d}\nu
&=\int f\log(f)\mathrm{d}\mu+\int\log\frac{\mathrm{e}^h}{f}f\mathrm{d}\mu\\
&\leq\int f\log(f)\mathrm{d}\mu+\log\int\mathrm{e}^h\mathrm{d}\mu
=\int f\log(f)\mathrm{d}\mu=\mathrm{H}(\nu\mid\mu),
\end{align*}
and equality is achieved for $h=\log f$. It remains to reparametrize with $h=g-\log\int\mathrm{e}^g\mathrm{d}\mu$. Conversely, the Legendre transform of the relative entropy is the log-Laplace transform:
\[
\sup_{\nu}\Bigl\{\int g\mathrm{d}\nu-\mathrm{H}(\nu\mid\mu)\Bigr\}=\log\int\mathrm{e}^g\mathrm{d}\mu.
\]
This is an instance of convex duality for the convex functional $\nu\mapsto\mathrm{H}(\nu\mid\mu)$.

The same story holds for $-\mathrm{S}$, which is convex as a function of the Lebesgue density of its argument.

**Heat equation.** The heat equation $\partial_tf_t=\Delta f_t$ is the gradient flow of the entropy:
\[
\partial_t\int f_t\log(f_t)\mathrm{d}x=-\int\frac{\|\nabla f_t\|^2}{f_t}\mathrm{d}x,
\]
where we have used an integration by parts; the right-hand side is minus the Fisher information. In other words, the entropy is a Lyapunov function for the heat equation seen as an infinite dimensional ODE.
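To illustrate this dissipation identity numerically (a sketch, not in the original post), one can discretize the one-dimensional periodic heat equation with an explicit Euler scheme and compare the discrete entropy decay rate with minus the discrete Fisher information; the grid size, time step, and initial profile below are arbitrary choices.

```python
# Minimal sketch (not from the post): 1D periodic heat equation by explicit Euler,
# checking that d/dt of E(t) = ∫ f log f dx is close to -∫ |f'|^2 / f dx.
import numpy as np

L, N = 2 * np.pi, 256
dx = L / N
dt = 1e-5                                   # small enough for stability (dt < dx^2 / 2)
x = np.arange(N) * dx
f = 1.0 + 0.5 * np.cos(x)                   # a positive initial profile

def entropy(f):
    return np.sum(f * np.log(f)) * dx

def fisher(f):
    grad = (np.roll(f, -1) - np.roll(f, 1)) / (2 * dx)   # centered first derivative
    return np.sum(grad ** 2 / f) * dx

def step(f):
    lap = (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / dx ** 2   # discrete Laplacian
    return f + dt * lap                     # explicit Euler step of the heat flow

for _ in range(1000):                       # let the profile smooth out a bit
    f = step(f)

rate = (entropy(step(f)) - entropy(f)) / dt # discrete time derivative of the entropy
print(rate, -fisher(f))                     # the two values nearly coincide
```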
**Further reading.**

- [Boltzmann-Gibbs entropic variational principle](https://djalil.chafai.net/blog/2022/04/02/boltzmann-gibbs-entropic-variational-principle/), on this blog (2022)
- [Entropy ubiquity](https://djalil.chafai.net/blog/2015/03/16/entropy-ubiquity/), on this blog (2015)
- [Bosons and fermions](https://djalil.chafai.net/blog/2012/04/26/bosons-and-fermions/), on this blog (2012)