{"id":8203,"date":"2015-03-16T23:55:07","date_gmt":"2015-03-16T22:55:07","guid":{"rendered":"http:\/\/djalil.chafai.net\/blog\/?p=8203"},"modified":"2022-04-03T13:19:54","modified_gmt":"2022-04-03T11:19:54","slug":"entropy-ubiquity","status":"publish","type":"post","link":"https:\/\/djalil.chafai.net\/blog\/2015\/03\/16\/entropy-ubiquity\/","title":{"rendered":"Entropy ubiquity"},"content":{"rendered":"<figure id=\"attachment_8205\" aria-describedby=\"caption-attachment-8205\" style=\"width: 237px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/en.wikipedia.org\/wiki\/Ludwig_Boltzmann\"><img loading=\"lazy\" class=\"wp-image-8205 size-medium\" src=\"http:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2015\/03\/Boltzmann-237x300.jpg\" alt=\"Ludwig Boltzmann (1844 - 1906)\" width=\"237\" height=\"300\" srcset=\"https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2015\/03\/Boltzmann-237x300.jpg 237w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2015\/03\/Boltzmann-118x150.jpg 118w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2015\/03\/Boltzmann.jpg 257w\" sizes=\"(max-width: 237px) 100vw, 237px\" \/><\/a><figcaption id=\"caption-attachment-8205\" class=\"wp-caption-text\">Ludwig Boltzmann (1844 - 1906)<\/figcaption><\/figure>\n<p style=\"text-align: justify;\">Recently a friend of mine asked about finding a good reason to explain the presence of the Boltzmann-Shannon entropy here and there in mathematics. Well, a vague answer is to simply say that the logarithm is already in many places, waiting for a nice interpretation. 
A bit less vaguely, here are some concrete fundamental formulas involving the Boltzmann-Shannon entropy \\( {\\mathcal{S}} \\) also denoted \\( {-H} \\).<\/p>\n<p style=\"text-align: justify;\"><b>Combinatorics.<\/b> If \\( {n=n_1+\\cdots+n_r} \\) and \\( {\\lim_{n\\rightarrow\\infty}\\frac{(n_1,\\ldots,n_r)}{n}=(p_1,\\ldots,p_r)} \\) then<\/p>\n<p style=\"text-align: center;\">\\[ \\frac{1}{n}\\log\\binom{n}{n_1,\\ldots,n_r} =\\frac{1}{n}\\log\\frac{n!}{n_1!\\cdots n_r!} \\underset{n\\rightarrow\\infty}{\\longrightarrow} -\\sum_{k=1}^r p_k\\log(p_k)=:\\mathcal{S}(p_1,\\ldots,p_r). \\]<\/p>\n<p style=\"text-align: justify;\">Also \\( {\\binom{n}{n_1,\\ldots,n_r}\\approx e^{-nH(p_1,\\ldots,p_r)}} \\) when \\( {n\\gg1} \\) and \\( {\\frac{(n_1,\\ldots,n_r)}{n}\\approx (p_1,\\ldots,p_r)} \\). Wonderful!<\/p>\n<p style=\"text-align: justify;\"><b>Volumetrics.<\/b> In terms of microstates and macrostates we also have<\/p>\n<p style=\"text-align: center;\">\\[ \\inf_{\\varepsilon&gt;0} \\varlimsup_{n\\rightarrow\\infty} \\frac{1}{n} \\log\\left| \\left\\{ f:\\{1,\\ldots,n\\}\\rightarrow\\{1,\\ldots,r\\}: \\max_{1\\leq k\\leq r}\\left|\\frac{|f^{-1}(k)|}{n}-p_k\\right|&lt;\\varepsilon \\right\\}\\right| =\\mathcal{S}(p_1,\\ldots,p_r). 
\\]<\/p>\n<p style=\"text-align: justify;\">This formula can be related to the <a href= \"\/scripts\/search.php?q=Sanov+large+deviation+principle\">Sanov Large Deviations Principle<\/a>, some sort of refinement of the <a href= \"\/scripts\/search.php?q=strong+law+of+large+numbers\">strong Law of Large Numbers<\/a>.<\/p>\n<p style=\"text-align: justify;\"><b>Maximization.<\/b> If \\( {\\displaystyle\\int\\!V(x)\\,f(x)\\,dx=\\int\\!V(x)f_\\beta(x)\\,dx} \\) with \\( {f_\\beta(x)=\\frac{e^{-\\beta V(x)}}{Z_\\beta}} \\) then<\/p>\n<p style=\"text-align: center;\">\\[ \\mathcal{S}(f_\\beta) - \\mathcal{S}(f) =\\int\\!\\frac{f}{f_\\beta}\\log\\frac{f}{f_\\beta}f_\\beta\\,dx \\geq\\left(\\int\\!\\frac{f}{f_\\beta}f_\\beta\\,dx\\right)\\log\\left(\\int\\!\\frac{f}{f_\\beta}f_\\beta\\,dx\\right)=0. \\]<\/p>\n<p style=\"text-align: justify;\">This formula plays an important role in <a href= \"\/scripts\/search.php?q=statistical+physics\">statistical physics<\/a> and in <a href=\"\/scripts\/search.php?q=Bayesian+statistics\">Bayesian statistics<\/a>.<\/p>\n<p style=\"text-align: justify;\"><b>Legendre transform is log-Laplace.<\/b> If \\( {\\mu} \\) and \\( {\\nu} \\) are probability measures then<\/p>\n<p style=\"text-align: center;\">\\[ \\sup_{\\substack{g\\geq0\\\\\\int g\\mathrm{d}\\mu=1}} \\Bigr\\{\\int fg\\mathrm{d}\\mu-\\int f\\log f\\mathrm{d}\\mu\\Bigr\\} =\\log\\int\\mathrm{e}^f\\mathrm{d}\\mu \\quad\\text{where}\\quad f:=\\frac{\\mathrm{d}\\nu}{\\mathrm{d}\\mu}. \\]<\/p>\n<p style=\"text-align: justify;\">and the reverse identity by convex duality<\/p>\n<p style=\"text-align: center;\">\\[ \\int f\\log f\\mathrm{d}\\mu =\\sup_{\\int g\\mathrm{d}\\mu=1} \\Bigr\\{\\int fg\\mathrm{d}\\mu-\\log\\int\\mathrm{e}^g\\mathrm{d}\\mu\\Bigr\\}. \\]<\/p>\n<p style=\"text-align: justify;\">Involved notably in <b>large deviations theory<\/b> and <b>functional inequalities<\/b>.<\/p>\n<p style=\"text-align: justify;\"><b>Likelihood.<\/b> If \\( {X_1,X_2,\\ldots} \\) are i.i.d. 
r.v. on \\( {\\mathbb{R}^d} \\) with density \\( {f} \\) then<\/p>\n<p style=\"text-align: center;\">\\[ L(f;X_1,\\ldots,X_n) =\\frac{1}{n}\\log(f(X_1)\\cdots f(X_n)) \\overset{a.s.}{\\underset{n\\rightarrow\\infty}{\\longrightarrow}} \\int\\!f\\log(f)\\,dx=:-\\mathcal{S}(f). \\]<\/p>\n<p style=\"text-align: justify;\">This formula makes it possible to reinterpret the <a href= \"\/scripts\/search.php?q=maximum+likelihood+estimator\">maximum likelihood estimator<\/a> as a <a href= \"\/scripts\/search.php?q=minimum+contrast+estimator\">minimum contrast estimator<\/a> for the <a href= \"\/scripts\/search.php?q=Kullback-Leibler+divergence\">Kullback-Leibler divergence<\/a> or relative entropy. It is also at the heart of <a href=\"\/scripts\/search.php?q=Shannon+coding+theorem\">Shannon coding theorems<\/a> in information theory.<\/p>\n<p style=\"text-align: justify;\"><b>\\( {L^p} \\) norms.<\/b> If \\( {f\\geq0} \\) then<\/p>\n<p style=\"text-align: center;\">\\[ \\partial_{p=1}\\left\\Vert f\\right\\Vert_p^p =\\partial_{p=1}\\int\\!e^{p\\log(f)}\\,dx =\\int\\!f\\log(f)\\,dx =-\\mathcal{S}(f). \\]<\/p>\n<p style=\"text-align: justify;\">This formula is at the heart of a famous theorem of <a href= \"\/scripts\/search.php?q=Leonard+Gross+Mathematics\">Leonard Gross<\/a> which relates the <a href= \"\/scripts\/search.php?q=hypercontractivity\">hypercontractivity<\/a> of <a href= \"\/scripts\/search.php?q=ergodic+Markov+semigroups\">ergodic Markov semigroups<\/a> to a <a href= \"\/scripts\/search.php?q=logarithmic+Sobolev+inequality\">logarithmic Sobolev inequality<\/a> for the invariant measure of the semigroup.<\/p>\n<p style=\"text-align: justify;\"><b>Fisher information.<\/b> If \\( {\\partial_t f_t(x)=\\Delta f_t(x)} \\) then, by integration by parts,<\/p>\n<p style=\"text-align: center;\">\\[ \\partial_t\\mathcal{S}(f_t) =-\\int\\!\\log(f_t)\\,\\Delta f_t\\,dx =\\int\\!\\frac{\\left|\\nabla f_t\\right|^2}{f_t}\\,dx =\\mathcal{F}(f_t). 
\\]<\/p>\n<p style=\"text-align: justify;\">This formula, attribued to <a href= \"http:\/\/en.wikipedia.org\/wiki\/Nicolaas_Govert_de_Bruijn\">de Bruijn<\/a>, is at the heart of the analysis and geometry of <a href=\"\/scripts\/search.php?q=heat+kernel\">heat kernels<\/a>, <a href=\"\/scripts\/search.php?q=diffusion+processes\">diffusion processes<\/a>, and <a href= \"\/scripts\/search.php?q=gradient+flows\">gradient flows<\/a> in partial differential equations.<\/p>\n<figure id=\"attachment_8206\" aria-describedby=\"caption-attachment-8206\" style=\"width: 212px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/en.wikipedia.org\/wiki\/Claude_Shannon\"><img loading=\"lazy\" class=\"wp-image-8206 size-medium\" src=\"http:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2015\/03\/Shannon-212x300.jpg\" alt=\"Claude Shannon (1916 - 2001)\" width=\"212\" height=\"300\" srcset=\"https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2015\/03\/Shannon-212x300.jpg 212w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2015\/03\/Shannon-106x150.jpg 106w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2015\/03\/Shannon.jpg 216w\" sizes=\"(max-width: 212px) 100vw, 212px\" \/><\/a><figcaption id=\"caption-attachment-8206\" class=\"wp-caption-text\">Claude Shannon (1916 - 2001)<\/figcaption><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Recently a friend of mine asked about finding a good reason to explain the presence of the Boltzmann-Shannon entropy here and there in mathematics. 
Well,&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/djalil.chafai.net\/blog\/2015\/03\/16\/entropy-ubiquity\/\">Continue reading<span class=\"screen-reader-text\">Entropy ubiquity<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":348},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/8203"}],"collection":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/comments?post=8203"}],"version-history":[{"count":18,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/8203\/revisions"}],"predecessor-version":[{"id":16040,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/8203\/revisions\/16040"}],"wp:attachment":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/media?parent=8203"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/categories?post=8203"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/tags?post=8203"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
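The combinatorics formula of the post states that \( \frac{1}{n}\log\binom{n}{n_1,\ldots,n_r} \to \mathcal{S}(p_1,\ldots,p_r) \). As a minimal numerical sketch, not part of the post, this convergence can be checked in Python; the helper names `log_multinomial` and `entropy` are illustrative choices, not from the source.

```python
# Numerical check (illustrative, not from the post): the normalized log
# of a multinomial coefficient approaches the Boltzmann-Shannon entropy
# S(p_1, ..., p_r) = -sum p_k log p_k as n grows.
from math import lgamma, log

def log_multinomial(counts):
    """log of n! / (n_1! ... n_r!) via log-gamma, with n = sum(counts)."""
    n = sum(counts)
    return lgamma(n + 1) - sum(lgamma(k + 1) for k in counts)

def entropy(p):
    """Boltzmann-Shannon entropy S(p) = -sum p_k log p_k (natural log)."""
    return -sum(q * log(q) for q in p if q > 0)

p = (0.5, 0.3, 0.2)
for n in (100, 10_000, 1_000_000):
    counts = [int(n * q) for q in p]   # n_k / n ~ p_k
    counts[0] += n - sum(counts)       # force the counts to sum to n
    print(n, log_multinomial(counts) / n, entropy(p))
```

As `n` increases the first printed value approaches `entropy(p)`, with a Stirling-type error of order \( \log(n)/n \), which is the quantitative content of the displayed limit.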