
Libres pensées d'un mathématicien ordinaire Posts

Mellin transform and Riesz potentials

Robert Hjalmar Mellin (1854 – 1933)

Even if the Mellin transform is just a deformed Fourier or Laplace transform, it plays a pleasant role in all the mathematics involving powers and integral transforms, such as Dirichlet series in number theory, Riesz power kernels in potential theory, mathematical statistics, etc. In particular, the famous Meijer G-function is defined as the inverse Mellin transform of ratios of products of Gamma functions. This tiny post is about an application of the Mellin transform to Riesz integral formulas, taken from an article by Bartłomiej Dyda, Alexey Kuznetsov, and Mateusz Kwaśnicki.

The Mellin transform and its inverse. Quoting Davies’s book on integral transforms (chapter 12), we recall that the Fourier transform pair may be written in the form \begin{align}
A(\theta)&:=\int_{\mathbb{R}}a(t)\mathrm{e}^{\mathrm{i}\theta t}\mathrm{d}t,\quad\alpha<\Im\theta<\beta,\\
a(t)&=\frac{1}{2\pi}\int_{\mathrm{i}c+\mathbb{R}}A(\theta)\mathrm{e}^{-\mathrm{i}\theta t}\mathrm{d}\theta,\quad\alpha<c<\beta.
\end{align} The Mellin transform and its inverse follow if we introduce the variable change \begin{equation}
z=\mathrm{i}\theta,\quad x=\mathrm{e}^t,\quad f(x)=a(\log(x)),
\end{equation} so that we obtain the reciprocal pair of integral transforms, for $f:(0,+\infty)\to\mathbb{R}$, \begin{align}
F(z)&:=\int_0^\infty f(x)x^{z-1}\mathrm{d}x,\quad\alpha<\Re z<\beta,\\
f(x)&=\frac{1}{2\pi\mathrm{i}}\int_{c+\mathrm{i}\mathbb{R}}F(z)x^{-z}\mathrm{d}z,\quad\alpha<c<\beta.
\end{align} These are the Mellin transform and the Mellin inversion formula. The integral defining the transform normally exists only in the strip $\alpha<\Re(z)<\beta$; therefore the inversion contour must be placed in this strip.
For convenience we also denote by $\mathcal{M}f=F$ the Mellin transform of $f$.

The Mellin transform of $x\mapsto\mathrm{e}^{-x}$ is the Euler $\Gamma$ function. Its poles are $0,-1,-2,-3,\ldots$ In the same spirit, the Mellin transform of $x\mapsto(1-x)_+^{b-1}$ at point $z$ is $$\int_0^1x^{z-1}(1-x)^{b-1}\mathrm{d}x=\mathrm{Beta}(z,b).$$Recall the definition of the Euler Beta function $\mathrm{Beta}(a,b):=\int_0^1t^{a-1}(1-t)^{b-1}\mathrm{d}t=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$.
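
The two examples above can be checked numerically. The following is a minimal sketch, restricted to real $z$, that uses the same change of variable $x=\mathrm{e}^t$ as in the derivation and a plain trapezoid rule. The helper names `mellin` and `beta`, the truncation window, and the number of steps are illustrative choices and not part of the original post.

```python
import math

def mellin(f, z, t_min=-40.0, t_max=8.0, n=20000):
    """Approximate F(z) = int_0^inf f(x) x^(z-1) dx for a real z.

    With x = exp(t) the integral becomes int_R f(e^t) e^(t z) dt,
    which is discretized by the trapezoid rule on [t_min, t_max].
    """
    h = (t_max - t_min) / n
    total = 0.0
    for k in range(n + 1):
        t = t_min + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(math.exp(t)) * math.exp(t * z)
    return h * total

def beta(a, b):
    """Euler Beta function expressed with Gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

# Mellin transform of exp(-x), compared with the Euler Gamma function.
print(mellin(lambda x: math.exp(-x), 2.5), math.gamma(2.5))
# Mellin transform of (1 - x)_+^(b-1) with b = 3, compared with Beta(z, b).
print(mellin(lambda x: max(1.0 - x, 0.0) ** 2, 2.0), beta(2.0, 3.0))
```

The window $[-40,8]$ is adequate here only because both integrands decay quickly in the variable $t$; other functions may need a different truncation.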

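Lemma (Riesz potential of radial functions). Suppose that \[
x\in\mathbb{R}^d\mapsto f(x)=\varphi(|x|^2)
\] where $\varphi:(0,+\infty)\to\mathbb{R}$ is given as the absolutely convergent inverse Mellin transform \[
\varphi(r)=\frac{1}{2\pi\mathrm{i}}\int_{\lambda+\mathrm{i}\mathbb{R}}\mathcal{M}\varphi(z)r^{-z}\mathrm{d}z\quad
\text{for some }\lambda\in\mathbb{R}.
\] If $0<\alpha<2\lambda<d$ then the Riesz potential $(\left|\cdot\right|^{-(d-\alpha)}*f)(x)$ is well defined for $x\neq0$, and \[
(\left|\cdot\right|^{-(d-\alpha)}*f)(x)=\psi(|x|^2)
\] where \[
\psi(r):=\frac{1}{2\pi\mathrm{i}}\int_{\lambda-\frac{\alpha}{2}+\mathrm{i}\mathbb{R}}\pi^{\frac{d}{2}}\frac{\Gamma(\frac{\alpha}{2})}{\Gamma(\frac{d-\alpha}{2})}\frac{\Gamma(z)\Gamma(\frac{d-\alpha}{2}-z)}{\Gamma(\frac{\alpha}{2}+z)\Gamma(\frac{d}{2}-z)}\mathcal{M}\varphi\bigl(z+\tfrac{\alpha}{2}\bigr)r^{-z}\mathrm{d}z.
\] In other words, the Mellin transform of $\psi$ satisfies \[
\mathcal{M}\psi(z)=\pi^{\frac{d}{2}}\frac{\Gamma(\frac{\alpha}{2})}{\Gamma(\frac{d-\alpha}{2})}\frac{\Gamma(z)\Gamma(\frac{d-\alpha}{2}-z)}{\Gamma(\frac{\alpha}{2}+z)\Gamma(\frac{d}{2}-z)}\mathcal{M}\varphi\bigl(z+\tfrac{\alpha}{2}\bigr),\quad\Re z=\lambda-\tfrac{\alpha}{2}.
\]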

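Proof of Lemma (Riesz potential of radial functions). The Lemma is Proposition 2 in Dyda-Kuznetsov-Kwaśnicki with $V\equiv1$ and $l=0$. The idea is to use the inverse Mellin transform of $\varphi$ to reduce the problem, via the Fubini theorem, to the Riesz potential of inverse powers of the norm, which is immediate from the semigroup property of the Riesz kernel. Namely, following eq. (1.1.12) in Landkof’s book or eq. (8) p. 118 in Stein’s book, on $\mathbb{R}^d$, the semigroup property for Riesz kernels reads, for all $\alpha,\beta\in\mathbb{C}$ such that $\Re\alpha,\Re\beta>0$ and $\Re\alpha+\Re\beta<d$, \begin{equation}\left|\cdot\right|^{-(d-\alpha)}*\left|\cdot\right|^{-(d-\beta)}
=\pi^{\frac{d}{2}}\frac{\Gamma(\frac{\alpha}{2})\Gamma(\frac{\beta}{2})\Gamma(\frac{d-\alpha-\beta}{2})}{\Gamma(\frac{d-\alpha}{2})\Gamma(\frac{d-\beta}{2})\Gamma(\frac{\alpha+\beta}{2})}\left|\cdot\right|^{-(d-(\alpha+\beta))}.
\end{equation}Now, by the inverse Mellin transform of $\varphi$, the Fubini theorem, and the semigroup property used with $\beta=d-2z$, \begin{align}
(\left|\cdot\right|^{-(d-\alpha)}*f)(x)
&=\frac{1}{2\pi\mathrm{i}}\int_{\lambda+\mathrm{i}\mathbb{R}}\mathcal{M}\varphi(z)(\left|\cdot\right|^{-(d-\alpha)}*\left|\cdot\right|^{-2z})(x)\mathrm{d}z\\
&=\frac{1}{2\pi\mathrm{i}}\int_{\lambda+\mathrm{i}\mathbb{R}}\pi^{\frac{d}{2}}\frac{\Gamma(\frac{\alpha}{2})\Gamma(\frac{d}{2}-z)\Gamma(z-\frac{\alpha}{2})}{\Gamma(\frac{d-\alpha}{2})\Gamma(z)\Gamma(\frac{d+\alpha}{2}-z)}\mathcal{M}\varphi(z)|x|^{-2(z-\frac{\alpha}{2})}\mathrm{d}z,
\end{align} and the change of variable $z\mapsto z+\frac{\alpha}{2}$ gives the announced formula.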

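Theorem (Riesz integral formula). Here is a Riesz integral formula mentioned in the Appendix of Landkof’s book, in eq. (1.6) of C.-Saff-Womersley, and in Remark 1 of Dyda-Kuznetsov-Kwaśnicki. Let $d\geq1$ and $\max(0,d-2)<s<d$. Define, on $\mathbb{R}^d$,\[f:=(1-\left|\cdot\right|^2)_+^{\frac{s-d}{2}}.
\] Then, for all $x\in\mathbb{R}^d$ such that $0<|x|<1$, \[
(\left|\cdot\right|^{-s}*f)(x)
=\int_{|y|<1}\frac{(1-|y|^2)^{\frac{s-d}{2}}}{|x-y|^{s}}\mathrm{d}y
=\frac{\pi^{\frac{d}{2}}\Gamma(\frac{d-s}{2})\Gamma(1-\frac{d-s}{2})}{\Gamma(\frac{d}{2})}
=\frac{\pi^{\frac{d}{2}+1}}{\Gamma(\frac{d}{2})\sin(\pi\frac{d-s}{2})}.
\] The last equality simply comes from the Euler reflection formula $\Gamma(z)\Gamma(1-z)=\frac{\pi}{\sin(\pi z)}$.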
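
In dimension $d=1$ the Riesz integral formula states that $\int_{-1}^1|x-y|^{-s}(1-y^2)^{\frac{s-1}{2}}\mathrm{d}y=\frac{\pi^{3/2}}{\Gamma(\frac{1}{2})\sin(\pi\frac{1-s}{2})}=\frac{\pi}{\cos(\pi s/2)}$ for $0<s<1$ and $|x|<1$, whatever the value of $x$. The following sketch checks this numerically. It splits the integral at the singularity $y=x$ and uses a tanh-sinh (double exponential) quadrature on each piece, which copes with the algebraic endpoint singularities. The names `tanh_sinh` and `riesz_1d` and the quadrature parameters are assumptions made for this illustration.

```python
import math

def tanh_sinh(g, T=4.0, h=0.02):
    """Integrate g(v, w) over v in (0, 1), where w = 1 - v.

    Uses v = 1 / (1 + exp(-pi sinh t)) and the trapezoid rule in t.
    Both v and w are passed to g so that factors singular at either
    endpoint can be evaluated without cancellation.
    """
    n = int(round(T / h))
    total = 0.0
    for k in range(-n, n + 1):
        t = k * h
        u = math.pi * math.sinh(t)
        v = 1.0 / (1.0 + math.exp(-u))
        w = 1.0 / (1.0 + math.exp(u))
        # dv/dt = pi cosh(t) v (1 - v)
        total += g(v, w) * math.pi * math.cosh(t) * v * w
    return h * total

def riesz_1d(x, s):
    """int_{-1}^{1} |x - y|^(-s) (1 - y^2)^((s - 1)/2) dy for |x| < 1, 0 < s < 1."""
    e = (s - 1.0) / 2.0

    def right(v, w):  # y = x + (1 - x) v runs over (x, 1)
        y_minus_x = (1.0 - x) * v
        one_minus_y = (1.0 - x) * w
        one_plus_y = 1.0 + x + (1.0 - x) * v
        return (1.0 - x) * y_minus_x ** (-s) * (one_minus_y * one_plus_y) ** e

    def left(v, w):  # y = x - (1 + x) v runs over (-1, x)
        x_minus_y = (1.0 + x) * v
        one_plus_y = (1.0 + x) * w
        one_minus_y = 1.0 - x + (1.0 + x) * v
        return (1.0 + x) * x_minus_y ** (-s) * (one_plus_y * one_minus_y) ** e

    return tanh_sinh(right) + tanh_sinh(left)

s = 0.5
for x in (0.0, 0.3, 0.7):
    print(x, riesz_1d(x, s), math.pi / math.cos(math.pi * s / 2.0))
```

The printed values should not depend on $x$, which is the constancy of the potential on the ball expressed by the theorem.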

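Proof of the Riesz integral formula. The theorem is a special case of Corollary 4 in Dyda-Kuznetsov-Kwaśnicki, namely with $V\equiv1$, $l=0$, $\alpha=s-d$, $\delta=d$, $\rho = \sigma = -\frac{d-s}{2}$, $d-2<s<d$ (which is equivalent to $\sigma>-1$). Let us give the proof extracted from there. With $\varphi(r):=(1-r)_+^{\frac{s-d}{2}}$, we have \begin{equation}
\mathcal{M}\varphi(z)=\mathrm{Beta}\bigl(z,1-\tfrac{d-s}{2}\bigr)=\frac{\Gamma(z)\Gamma(1-\frac{d-s}{2})}{\Gamma(z+1-\frac{d-s}{2})},\quad\Re z>0,
\end{equation}and by the Lemma above, applied to the kernel $\left|\cdot\right|^{-s}$, in other words with $d-s$ in place of the parameter $\alpha$ of the Lemma, \begin{align}
\mathcal{M}\psi(z)
&=\pi^{\frac{d}{2}}\frac{\Gamma(\frac{d-s}{2})}{\Gamma(\frac{s}{2})}\frac{\Gamma(z)\Gamma(\frac{s}{2}-z)}{\Gamma(\frac{d-s}{2}+z)\Gamma(\frac{d}{2}-z)}\mathcal{M}\varphi\bigl(z+\tfrac{d-s}{2}\bigr)\\
&=\pi^{\frac{d}{2}}\frac{\Gamma(\frac{d-s}{2})\Gamma(1-\frac{d-s}{2})}{\Gamma(\frac{s}{2})}\frac{\Gamma(z)\Gamma(\frac{s}{2}-z)}{\Gamma(\frac{d}{2}-z)\Gamma(1+z)}.
\end{align} Now, if the vertical line $\lambda+\mathrm{i}\mathbb{R}$ separates the poles of $z\mapsto\Gamma(z)$ and of $z\mapsto\Gamma(\frac{s}{2}-z)$, then, for $0<x<1$, by the residue formula, \begin{align}
\frac{1}{2\pi\mathrm{i}}\int_{\lambda+\mathrm{i}\mathbb{R}}\frac{\Gamma(z)\Gamma(\frac{s}{2}-z)}{\Gamma(\frac{d}{2}-z)\Gamma(1+z)}x^{-z}\mathrm{d}z
&=\sum_{k=0}^\infty\mathrm{Residue}_{z=-k}\Bigl(\frac{\Gamma(z)\Gamma(\frac{s}{2}-z)}{\Gamma(\frac{d}{2}-z)\Gamma(1+z)}x^{-z}\Bigr)\\
&=\sum_{k=0}^\infty\frac{\Gamma(\frac{s}{2}+k)}{\Gamma(\frac{d}{2}+k)\Gamma(-k+1)}\frac{(-x)^{k}}{k!} =\frac{\Gamma(\frac{s}{2})}{\Gamma(\frac{d}{2})}.
\end{align} Only the term $k=0$ remains since $1/\Gamma(-k+1)=0$ for $k\geq1$. Multiplying by the prefactor $\pi^{\frac{d}{2}}\Gamma(\frac{d-s}{2})\Gamma(1-\frac{d-s}{2})/\Gamma(\frac{s}{2})$ gives $\psi\equiv\pi^{\frac{d}{2}}\Gamma(\frac{d-s}{2})\Gamma(1-\frac{d-s}{2})/\Gamma(\frac{d}{2})$ on $(0,1)$.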

Recall that the $\Gamma$ function has no zeros. Indeed, a zero leads via $\Gamma(z)=(z-1)\Gamma(z-1)$ to infinitely many zeros to the left, and then via $\Gamma(z)\Gamma(1-z)=\frac{\pi}{\sin(\pi z)}$ to infinitely many poles to the right, which contradicts the analyticity of $\Gamma$ on $\Re z>0$.

Recall also that the $\Gamma$ function is meromorphic on the complex plane, its poles are the non-positive integers, and are simple. Moreover, using $(z+n)\Gamma(z)=\frac{\Gamma(z+n+1)}{z(z+1)\cdots(z+n-1)}$ we get $$\mathrm{Residue}_{z=-n}(\Gamma(z)):=\lim_{z\to-n}(z-(-n))\Gamma(z)=\frac{(-1)^n}{n!}.$$
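
The residue formula can be observed numerically by evaluating $(z+n)\Gamma(z)$ slightly to the right of the pole $z=-n$. This is only a sketch: the function name `gamma_residue` and the offset `eps` are illustrative choices, and the approximation error is of order `eps`.

```python
import math

def gamma_residue(n, eps=1e-7):
    """Approximate the residue of Gamma at z = -n by (z + n) Gamma(z) at z = -n + eps."""
    return eps * math.gamma(-n + eps)

for n in range(6):
    exact = (-1) ** n / math.factorial(n)
    print(n, gamma_residue(n), exact)
```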

Meijer G-functions. A key point of the proof above is the computation of the inverse Mellin transform of a certain ratio of products of Gamma functions (Mellin transform of $\psi$). This is actually the definition of Meijer G-functions. If for example $$f(x):=(1-|x|^2)_+^\sigma\ {}_2F_1(a,b;c;1-|x|^2)=\varphi(|x|^2)$$ where $\varphi(r)=(1-r)_+^\sigma\ {}_2F_1(a,b;c;1-r)$, then it is possible, by using the same method as above, to express $\varphi$ as a Meijer G-function, and to deduce that the Riesz potential of $f$ on the unit ball is equal to another Meijer G-function, which reduces to a hypergeometric function in certain cases. This is explored in Dyda-Kuznetsov-Kwaśnicki and references therein.

Goody: Mellin transform of Gauss hypergeometric function. Recall the definition $${}_2F_1(a,b;c;z):=\sum_{n=0}^\infty\frac{(a)_n(b)_n}{(c)_n}\frac{z^n}{n!}$$ where $(a)_n:=a(a+1)\cdots(a+n-1)$ if $n\geq1$ and $(a)_0:=1$, in other words $(a)_n=\Gamma(a+n)/\Gamma(a)$. Now, if $f(x):={}_2F_1(a,b;c;-x)$ then \[\mathcal{M}f(z)
=\frac{\Gamma(c)}{\Gamma(a)\Gamma(b)}\frac{\Gamma(z)\Gamma(a-z)\Gamma(b-z)}{\Gamma(c-z)}.\] To see it, we can start from the Euler integral formula (which can be proved by a series expansion of $(1+xt)^{-b}$ using the Newton binomial series $(1-z)^{-\alpha}=\sum_{n=0}^\infty(\alpha)_n\frac{z^n}{n!}$): \[{}_2F_1(a,b;c;-x) =\frac{\Gamma(c)}{\Gamma(a)\Gamma(c-a)} \int_0^1t^{a-1}(1-t)^{c-a-1}(1+xt)^{-b}\mathrm{d}t. \] This mixture representation gives, by the Fubini theorem, with $g_{t,b}(x):=(1+xt)^{-b}$, \[
\mathcal{M}f(z)=\frac{\Gamma(c)}{\Gamma(a)\Gamma(c-a)}\int_0^1t^{a-1}(1-t)^{c-a-1}\mathcal{M}g_{t,b}(z)\mathrm{d}t.
\] But by using the change of variable $y=(1+xt)^{-1}$ we get \[
\mathcal{M}g_{t,b}(z)
=\int_0^\infty x^{z-1}g_{t,b}(x)\mathrm{d}x
=t^{-z}\int_0^1y^{b-z-1}(1-y)^{z-1}\mathrm{d}y
=t^{-z}\mathrm{Beta}(b-z,z)
=t^{-z}\frac{\Gamma(z)\Gamma(b-z)}{\Gamma(b)}.
\] It remains to note that $\int_0^1t^{a-1}(1-t)^{c-a-1}t^{-z}\mathrm{d}t=\mathrm{Beta}(c-a,a-z)$ and \[
\frac{\Gamma(c)}{\Gamma(a)\Gamma(c-a)}\frac{\Gamma(c-a)\Gamma(a-z)}{\Gamma(c-z)}\frac{\Gamma(z)\Gamma(b-z)}{\Gamma(b)}
=\frac{\Gamma(c)}{\Gamma(a)\Gamma(b)}\frac{\Gamma(z)\Gamma(a-z)\Gamma(b-z)}{\Gamma(c-z)}.
\] Alternatively, we could compute the inverse Mellin transform by using the residue formula.
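
The key step of this computation is the Mellin transform of $x\mapsto(1+xt)^{-b}$, which the change of variable $y=(1+xt)^{-1}$ turns into $t^{-z}\Gamma(z)\Gamma(b-z)/\Gamma(b)$ for $0<\Re z<\Re b$. Here is a minimal numerical check for real parameters, using the substitution $x=\mathrm{e}^\tau$ and a trapezoid rule. The function name `mellin_power`, the truncation `tau_max`, and the sample values of $t$, $b$, $z$ are illustrative assumptions.

```python
import math

def mellin_power(t, b, z, tau_max=40.0, n=16000):
    """Trapezoid approximation of int_0^inf x^(z-1) (1 + x t)^(-b) dx, for 0 < z < b.

    The substitution x = exp(tau) maps the integral to the whole real line.
    """
    h = 2.0 * tau_max / n
    total = 0.0
    for k in range(n + 1):
        tau = -tau_max + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(tau * z) * (1.0 + t * math.exp(tau)) ** (-b)
    return h * total

t, b, z = 2.0, 4.0, 1.5
closed_form = t ** (-z) * math.gamma(z) * math.gamma(b - z) / math.gamma(b)
print(mellin_power(t, b, z), closed_form)
```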

Motivation. The original proof by Riesz of the integral formula, largely geometric, is also given in the Appendix of Landkof’s book and of C.-Saff-Womersley. We were happy to locate a relatively short analytic proof, due to Dyda-Kuznetsov-Kwaśnicki, which is the subject of this post.



From Mao Zedong to Xi Jinping

Chinese President Xi Jinping, front row center, stands with his cadres during the Communist song at the closing ceremony for the 19th Party Congress at the Great Hall of the People in Beijing on Oct. 24, 2017. (AP Photo/Ng Han Guan)

List of courses taken by a third-year student from a major Chinese university in 2022:

  • Abstract Algebra
  • Advanced linear Algebra
  • Appreciation of Symphony Music
  • Complex Analysis
  • Differential Geometry
  • English for Academic Purposes : Research Paper Writing
  • English for Academic Purposes : Spoken Communication
  • Experience of Manufacturing Engineering
  • Foundation of Vocal Music
  • Game Theory
  • How To Start a Startup – Face To Face with Famous Entrepreneurs
  • Ideological Moral and Legal Education
  • Introduction to Mao Zedong Thought and Socialism with Chinese Characteristics
  • Introduction to Mao Zedong Thoughts and Theoretical System of Socialism with Chinese Characteristic
  • Introduction to the Social Service of College Students
  • Introduction to Xi Jinping Thought on Socialism with Chinese Characteristics for a New Era
  • Mathematical Analysis
  • Math from Examples
  • Measures and Integrals
  • Military Skills
  • Military Theory
  • Ordinary Differential Equations
  • Outline of Modern Chinese History
  • Physical Education
  • Physics for Scientists and Engineers
  • Principle of Marxist Philosophy
  • Probability and Statistics
  • Probability Theory
  • Programming Fundamentals
  • Situation and Policy
  • Swimming Competency Test
  • Sympathetic of Drama
  • The Practice of C++ Programming

Unexpected phenomena for equilibrium measures

Marcel Riesz (1886 – 1969)

This post is about Riesz energy problems, a subject that I like to explore with Edward B. Saff (Vanderbilt University, USA) and Robert S. Womersley (UNSW Sydney, Australia).

Riesz kernel. For $-2<s<d$, the Riesz $s$-kernel in $\mathbb{R}^d$ is $$
K_s:=\begin{cases}
\displaystyle\frac{1}{s\left|\cdot\right|^{s}} & \text{if } s\neq0\\[1em]
\displaystyle-\log\left|\cdot\right| & \text{if } s=0
\end{cases}
$$ We recover the Coulomb or Newton kernel when $s=d-2$. This definition of the $s$-kernel allows us to pass from $K_s$ to $K_0$ by removing the $1/s$ singularity at $s=0$, namely, for $x\neq0$, $$-\log|x|=\lim_{\underset{s\neq0}{s\to0}}\frac{|x|^{-s}-1}{s-0}=\lim_{\underset {s\neq0}{s\to0}}\Bigl(\frac{1}{s|x|^s}-\frac{1}{s}\Bigr).$$
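
The removal of the $1/s$ singularity is easy to observe numerically. The sketch below evaluates the kernel as a function of the distance $r=|x|$; the function names `riesz_kernel` and `renormalized` are chosen here for illustration.

```python
import math

def riesz_kernel(r, s):
    """K_s at distance r > 0: 1 / (s r^s) if s != 0, and -log(r) if s == 0."""
    if s == 0:
        return -math.log(r)
    return 1.0 / (s * r ** s)

def renormalized(r, s):
    """K_s(r) - 1/s for s != 0, which approaches K_0(r) = -log(r) as s -> 0."""
    return riesz_kernel(r, s) - 1.0 / s

r = 2.0
for s in (0.1, 0.01, 0.001, -0.001):
    print(s, renormalized(r, s), riesz_kernel(r, 0))
```

The limit holds from both sides of $s=0$, which is why negative values of $s$ appear in the loop.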

Riesz energy. For $-2<s<d$, the Riesz energy of a probability measure $\mu$ on $\mathbb{R}^d$ is $$
\mathrm{I}_s(\mu):=\iint K_s(x-y)\mathrm{d}\mu(x)\mathrm{d}\mu(y)
$$ The Riesz energy is strictly convex and lower semi-continuous for the weak convergence of probability measures with respect to continuous and bounded test functions. This convexity is related to the Bochner positivity of $K_s$, which is a nice observation from harmonic analysis.

Equilibrium measure. The equilibrium measure on a ball $B_R:=\{x\in\mathbb{R}^d:|x|\leq R\}$ is $$\mu_{\mathrm{eq}}
=\arg\min_{\substack{\mu\\\mathrm{supp}(\mu)\subset B_R}}\mathrm{I}_s(\mu).$$

Riesz original problem (1938). Equilibrium measure on $B_R$ when $d\geq2$: $$
\mu_{\mathrm{eq}}=\begin{cases}
\sigma_R & \text{if $-2<s\leq d-2$}\\[1em]
\frac{\mathbf{1}_{B_R}}{(R^2-|x|^2)^{\frac{d-s}{2}}}\mathrm{d}x &
\text{if $0\leq d-2<s<d$}
\end{cases}
$$ where $\sigma_R$ is the uniform distribution on the sphere $\{x\in\mathbb{R}^d:|x|=R\}$ of radius $R$, and where the density in the second case is understood up to its normalizing constant.

The proof relies on the following integral formula for the variational characterization: $$
\int_{|y|\leq R}
\frac{|x-y|^{-s}}{(R^2-|y|^2)^{\frac{d-s}{2}}}\mathrm{d} y
=\frac{\pi^{\frac{d}{2}+1}}{\Gamma(\frac{d}{2})\sin(\frac{d-s}{2}\pi)},\quad
x\in B_R
$$ The Riesz proof of this integral formula involves in turn a Kelvin transform and a reduction to the planar case. It can be found in detail in the Appendix of the book by Landkof (1972), and also with even more details in our 2022 JMAA article. A generalization (and a new proof) was published in 2017 by Dyda, Kuznetsov, and Kwaśnicki by using Fourier analysis.

The result expresses a threshold phenomenon: the support condenses on a sphere when $s$ passes the critical value $d-2$ (Coulomb). Our main finding is that this Riesz problem admits a full space extension in which we replace the ball support constraint with an external field. We show that a new threshold phenomenon occurs, related to the strength of the external field.

External field equilibrium problem. The energy with external field $V$ on $\mathbb{R}^d$ is defined by $$\mathrm{I}(\mu)=\mathrm{I}_{s,V}(\mu):=\iint\left[K_s(x-y)+V(x)+V(y)\right]\mathrm{d}\mu(x)\mathrm{d}\mu(y)$$
and the associated equilibrium measure is $$\mu_{\mathrm{eq}}=\arg\min_{\mu}\mathrm{I}(\mu).$$ The Frostman or Euler-Lagrange variational characterization of $\mu=\mu_{\mathrm{eq}}$ reads $$K_s*\mu+V
\begin{cases}
=c& \text{quasi-everywhere on }\mathrm{supp}(\mu)\\
\geq c&\text{quasi-everywhere outside }\mathrm{supp}(\mu)
\end{cases}$$ Quasi-everywhere means except on a set that cannot carry a probability measure of finite energy. By taking $V=\infty\mathbf{1}_{B_R^c}$ we recover the Riesz problem on the ball mentioned previously.

Coulomb case: $s=d-2$. The kernel $K_{d-2}$ is a Laplace fundamental solution: $$
\Delta K_{d-2}\overset{\mathcal{D}'}{=}-c_d\delta_0,\quad\text{with}\quad c_d=|\mathbb{S}^{d-1}|.
$$Also, restricted to the interior of $\mathrm{supp}(\mu_{\mathrm{eq}})$, $$
\mu_{\mathrm{eq}}\overset{\mathcal{D}'}{=}\frac{\Delta V}{c_d}
$$In particular, if $V=\left|\cdot\right|^\alpha$, $\alpha>0$, then $$\mu_{\mathrm{eq}}=\frac{\alpha(\alpha+d-2)}{c_d}\left|\cdot\right|^{\alpha-2}\mathbf{1}_{B_R}\mathrm{d}x
\quad\text{with}\quad R=\bigl(\frac{1}{\alpha}\bigr)^{\frac{1}{d-2+\alpha}}.$$ The proof relies crucially on the local nature of the Laplacian.
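
The value of $R$ can be understood as a mass normalization. Assuming that the equilibrium measure is $\Delta V/c_d$ restricted to the ball $B_R$, with $\Delta\left|\cdot\right|^\alpha=\alpha(\alpha+d-2)\left|\cdot\right|^{\alpha-2}$, its total mass in polar coordinates is $\alpha(\alpha+d-2)\int_0^Rr^{\alpha+d-3}\mathrm{d}r=\alpha R^{\alpha+d-2}$, which equals $1$ exactly for the stated $R$. The sketch below checks this with a midpoint rule; the function names and the number of steps are illustrative choices.

```python
import math

def coulomb_radius(d, alpha):
    """Radius R = (1 / alpha)^(1 / (d - 2 + alpha)) of the support ball."""
    return (1.0 / alpha) ** (1.0 / (d - 2 + alpha))

def total_mass(d, alpha, n=20000):
    """Midpoint rule for the mass of (Delta V / c_d) 1_{B_R} dx with V = |x|^alpha.

    In polar coordinates the sphere area c_d cancels between the density
    and the volume element c_d r^(d-1) dr.
    """
    R = coulomb_radius(d, alpha)
    h = R / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h
        total += alpha * (alpha + d - 2) * r ** (alpha - 2) * r ** (d - 1)
    return h * total

for d, alpha in ((3, 2.0), (3, 4.0), (5, 3.0)):
    print(d, alpha, coulomb_radius(d, alpha), total_mass(d, alpha))
```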

At this point we observe that the formula $$\Delta K_u=-c_{d,u}K_{u+2},\quad c_{d,u}:=(u+2)(d-2-u)$$ suggests applying $\Delta$ iteratively to reach the case $s=d-2n$ for an arbitrary positive integer $n$.

Findings for the iterated Coulomb case $s=d-2n, n=1,2,3,\ldots$. Restricted to the interior of $\mathrm{supp}(\mu_{\mathrm{eq}})$, in the sense of distributions, $$
\mu_{\mathrm{eq}}=\frac{\Delta^nV}{C_{d,n}},\quad\text{with}\quad C_{d,n}:=(-1)^{n-1}c_d\prod_{k=0}^{n-2}c_{d,d-2n+2k}.
$$ In particular: if $s=d-4$ and $V=\left|\cdot\right|^\alpha$, $\alpha\geq2$, then $C_{d,2}=-2(d-2)c_d<0$ while $\Delta^2V=\alpha(\alpha+d-2)(\alpha-2)(\alpha+d-4)\left|\cdot\right|^{\alpha-4}\geq0$ and thus $\mu_{\mathrm{eq}}$ is necessarily singular! Actually the case $s=d-4$ can be analyzed completely, and this analysis reveals the singularity when $\alpha\geq2$ as well as a threshold condensation to this singular support when $\alpha$ reaches the critical value $2$.

Findings when $s=d-4$. Suppose that $V=\gamma\left|\cdot\right|^\alpha$, $\gamma>0,  \alpha>0$.

  • Let $d\geq4$ and $s=d-4\geq0$.
    • If $\alpha\geq2$ then $\mu_{\mathrm{eq}}=\sigma_R$ (indeed it is singular!) for an explicit radius $R$.
    • If $0<\alpha<2$ then (mixture!) $$\mu_{\mathrm{eq}}=\beta fm_d+(1-\beta)\sigma_R$$ for an explicit weight $\beta$, density $f$, and radius $R$.
  • Let $d=3$ and $s=d-4=-1$ (non-singular kernel!).
    • If $0<\alpha<1$, then $\mu_{\mathrm{eq}}$ does not exist (blowup)
    • If $\alpha=1$ and $\gamma\geq1$, then $\mu_{\mathrm{eq}}=\delta_0$ (collapse).
    • If $\alpha>1$, then $\mu_{\mathrm{eq}}$ is as above (mixture).

In contrast, there is no threshold condensation phenomenon when $s=d-3$.

Findings when $s=d-3$. Suppose that $V=\gamma\left|\cdot\right|^\alpha$, $\gamma>0, \alpha>0$.

  • If $s=d-3$ and $\alpha=2$ then $$\mu_{\mathrm{eq}}
    =\frac{\Gamma(\frac{s+4}{2})}{\pi^{\frac{s+4}{2}}R^{s+2}}\frac{\mathbf{1}_{B_R}}{\sqrt{R^2-|x|^2}}
    \mathrm{d}x$$ where $$R=\Bigl(\frac{\sqrt{\pi}}{4\gamma}\frac{\Gamma(\frac{s+4}{2})}{\Gamma(\frac{s+5}{2})}\Bigr)^{\frac{1}{s+2}}$$
  • This is also $\mu_{\mathrm{eq}}$ for $s=d-1$ on $B_R$ with this $R$.

Methods of proof.

  • Frostman or Euler-Lagrange variational characterization
  • Applying Laplacian on support of $\mu_{\mathrm{eq}}$
  • Rotational invariance and maximum principle
  • Dimensional reduction with Funk-Hecke formula
  • Orthogonal polynomials expansions
  • Integral formulas and special functions


  • Super-harmonic kernel and sub-harmonic external field
  • Non-locality of fractional Laplacian

Selected Open Problems.

  • When $s=d-3$ with $\alpha\neq2$, we conjecture that the support of the equilibrium measure is a ball if $0<\alpha<2$ and a full dimensional shell (annulus) if $\alpha>2$
  • When $s=d-6$, it could be that the support of the equilibrium measure is disconnected
  • Other norms in kernel and external field

Marcel Riesz (1886 – 1969) is the younger brother of Frigyes Riesz (1880 – 1956). I do not know whether Naoum Samoilovitch Landkof (1915 – 2004) ever met Marcel Riesz in person. Landkof was a student of Mikhaïl Alekseïevitch Lavrentiev (1900 – 1980), who gave his name to the Lavrentiev phenomenon in the calculus of variations. Landkof was an expert in potential theory. He advised Vladimir Alexandrovich Marchenko (1922 – ), famous notably for his findings on random operators and matrices with his student Leonid Pastur (1937 – ).


Naoum Samoilovitch Landkof (1915 – 2004)

Boltzmann-Gibbs entropic variational principle

Nicolas Léonard Sadi Carnot (1796 – 1832), an Évariste Galois of Physics.

The aim of this short post is to explain why the maximum entropy principle could be better seen as a minimum relative entropy principle, in other words an entropic projection.

Relative entropy. Let $\lambda$ be a reference measure on some measurable space $E$. The relative entropy with respect to $\lambda$ is defined for every measure $\mu$ on $E$ with density $\mathrm{d}\mu/\mathrm{d}\lambda$ by $$\mathrm{H}(\mu\mid\lambda):=\int\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}\log\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}\mathrm{d}\lambda.$$ If the integral is not well defined, we could simply set $\mathrm{H}(\mu\mid\lambda):=+\infty$.

  • An important case is when $\lambda$ is a probability measure. In this case $\mathrm{H}$ becomes the Kullback-Leibler divergence, and the Jensen inequality for the strictly convex function $u\mapsto u\log(u)$ indicates then that $\mathrm{H}(\mu\mid\lambda)\geq0$ with equality if and only if $\mu=\lambda$.
  • Another important case is when $\lambda$ is the Lebesgue measure on $\mathbb{R}^n$ or the counting measure on a discrete set: then $$-\mathrm{H}(\mu\mid\lambda)$$ is the Boltzmann-Shannon entropy of $\mu$. Beware that when $E=\mathbb{R}^n$, this entropy takes its values in the whole $(-\infty,+\infty)$ since for every scale factor $\sigma>0$, denoting by $\mu_\sigma$ the push forward of $\mu$ by the dilation $x\mapsto\sigma x$, we have $$\mathrm{H}(\mu_\sigma\mid\lambda)=\mathrm{H}(\mu\mid\lambda)-n\log \sigma.$$
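
The dilation formula $\mathrm{H}(\mu_\sigma\mid\lambda)=\mathrm{H}(\mu\mid\lambda)-n\log\sigma$ can be illustrated in dimension $n=1$ with a standard Gaussian, for which $\mathrm{H}(\mu\mid\lambda)=-\frac{1}{2}\log(2\pi\mathrm{e})$. The sketch below uses a midpoint rule on a truncated interval; the function names, the truncation at 12 standard deviations, and the choice $\sigma=3$ are assumptions made for the illustration.

```python
import math

def relative_entropy(density, a, b, n=20000):
    """Midpoint rule for int f log f dx over [a, b] (lambda = Lebesgue measure on R)."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        f = density(a + (k + 0.5) * h)
        if f > 0.0:
            total += f * math.log(f)
    return h * total

def gaussian(x):
    """Density of the standard Gaussian law."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def dilated(density, sigma):
    """Density of the push forward by the dilation x -> sigma x."""
    return lambda x: density(x / sigma) / sigma

sigma = 3.0
h_base = relative_entropy(gaussian, -12.0, 12.0)
h_dilated = relative_entropy(dilated(gaussian, sigma), -12.0 * sigma, 12.0 * sigma)
print(h_base, -0.5 * math.log(2.0 * math.pi * math.e))
print(h_dilated - h_base, -math.log(sigma))
```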

Boltzmann-Gibbs probability measures. Such a probability measure $\mu_{V,\beta}$ takes the form $$\mathrm{d}\mu_{V,\beta}:=\frac{\mathrm{e}^{-\beta V}}{Z_{V,\beta}}\mathrm{d}\lambda$$ where $V:E\to(-\infty,+\infty]$, $\beta\in[0,+\infty)$, and $$Z_{V,\beta}:=\int\mathrm{e}^{-\beta V}\mathrm{d}\lambda<\infty$$ is the normalizing factor. The larger $\beta$ is, the more $\mu_{V,\beta}$ puts its probability mass on the regions where $V$ is low. The corresponding asymptotic analysis, known as the Laplace method, states that as $\beta\to\infty$ the probability measure $\mu_{V,\beta}$ concentrates on the minimizers of $V$.
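
The concentration on the minimizers of $V$ is easy to observe on a finite set $E$ with $\lambda$ the counting measure. The following sketch is only an illustration: the function name `gibbs` and the sample values of $V$ and $\beta$ are arbitrary choices.

```python
import math

def gibbs(V, beta):
    """Boltzmann-Gibbs weights exp(-beta V) / Z on a finite set (lambda = counting measure)."""
    v_min = min(V)
    # Shifting V by its minimum leaves the measure unchanged and avoids underflow of Z.
    w = [math.exp(-beta * (v - v_min)) for v in V]
    Z = sum(w)
    return [x / Z for x in w]

V = [0.0, 0.5, 0.5, 2.0, 3.0]
for beta in (0.0, 1.0, 10.0, 100.0):
    print(beta, [round(p, 4) for p in gibbs(V, beta)])
```

For $\beta=0$ the measure is uniform, while for large $\beta$ almost all the mass sits on the unique minimizer of $V$.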

The mean of $V$ or $V$-moment of $\mu_{V,\beta}$ reads $$
\int V\mathrm{d}\mu_{V,\beta}
=-\frac{1}{\beta}\mathrm{H}(\mu_{V,\beta}\mid\lambda)-\frac{1}{\beta}\log Z_{V,\beta}.
$$
In thermodynamics $-\frac{1}{\beta}\log Z_{V,\beta}$ appears as a Helmholtz free energy since it is equal to $\int V\mathrm{d}\mu_{V,\beta}$ (mean energy) minus $\frac{1}{\beta}\times(-\mathrm{H}(\mu_{V,\beta}\mid\lambda))$ (temperature times entropy).
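
The identity $\int V\mathrm{d}\mu_{V,\beta}=-\frac{1}{\beta}\mathrm{H}(\mu_{V,\beta}\mid\lambda)-\frac{1}{\beta}\log Z_{V,\beta}$ follows from $\log\frac{\mathrm{d}\mu_{V,\beta}}{\mathrm{d}\lambda}=-\beta V-\log Z_{V,\beta}$. Here is a small check on a finite set with the counting measure as $\lambda$. The function names and the sample values are illustrative assumptions.

```python
import math

def gibbs_with_Z(V, beta):
    """Return the Boltzmann-Gibbs weights and the normalizing factor Z on a finite set."""
    w = [math.exp(-beta * v) for v in V]
    Z = sum(w)
    return [x / Z for x in w], Z

def mean(V, p):
    """V-moment of the probability vector p."""
    return sum(v * q for v, q in zip(V, p))

def rel_entropy_counting(p):
    """H(mu | counting measure) = sum p log p."""
    return sum(q * math.log(q) for q in p if q > 0.0)

V = [0.0, 1.0, 1.5, 4.0]
beta = 0.7
p, Z = gibbs_with_Z(V, beta)

mean_energy = mean(V, p)
entropy = -rel_entropy_counting(p)       # Boltzmann-Shannon entropy
free_energy = -math.log(Z) / beta        # Helmholtz free energy
print(mean_energy, free_energy + entropy / beta)
```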

When $\beta$ ranges from $-\infty$ to $\infty$, the $V$-moment of $\mu_{V,\beta}$ ranges from $\sup V$ down to $\inf V$, and $$\partial_\beta\int V\mathrm{d}\mu_{V,\beta}=\Bigl(\int V\mathrm{d}\mu_{V,\beta}\Bigr)^2-\int V^2\mathrm{d}\mu_{V,\beta}\leq0.$$ If $\lambda(E)<\infty$ then $\mu_{V,0}=\frac{1}{\lambda(E)}\lambda$ and its $V$-moment is $\frac{1}{\lambda(E)}\int V\mathrm{d}\lambda$.
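
The derivative formula says that the $V$-moment decreases in $\beta$ at a rate equal to the variance of $V$ under $\mu_{V,\beta}$. The sketch below compares a central finite difference with minus the variance on a finite set, and looks at the two extreme regimes. The function names, the step `h`, and the sample values are illustrative assumptions.

```python
import math

def gibbs(V, beta):
    """Boltzmann-Gibbs weights on a finite set, for any real beta."""
    m = max(-beta * v for v in V)                # shift for numerical stability
    w = [math.exp(-beta * v - m) for v in V]
    Z = sum(w)
    return [x / Z for x in w]

def v_moment(V, beta):
    """Mean of V under the Boltzmann-Gibbs measure."""
    return sum(v * q for v, q in zip(V, gibbs(V, beta)))

def v_variance(V, beta):
    """Variance of V under the Boltzmann-Gibbs measure."""
    p = gibbs(V, beta)
    m1 = sum(v * q for v, q in zip(V, p))
    m2 = sum(v * v * q for v, q in zip(V, p))
    return m2 - m1 * m1

V = [0.0, 1.0, 1.5, 4.0]
beta, h = 0.7, 1e-5
finite_difference = (v_moment(V, beta + h) - v_moment(V, beta - h)) / (2.0 * h)
print(finite_difference, -v_variance(V, beta))
print(v_moment(V, -50.0), v_moment(V, 50.0))     # close to max(V) and min(V)
```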

Variational principle. Let $\beta\geq0$ be such that $Z_{V,\beta}<\infty$ and $c:=\int V\mathrm{d}\mu_{V,\beta}<\infty$. Then, among all the probability measures $\mu$ on $E$ with the same $V$-moment as $\mu_{V,\beta}$, the relative entropy $\mathrm{H}(\mu\mid\lambda)$ is minimized by the Boltzmann-Gibbs measure $\mu_{V,\beta}$. In other words,$$\min_{\int V\mathrm{d}\mu=c}\mathrm{H}(\mu\mid\lambda)=\mathrm{H}(\mu_{V,\beta}\mid\lambda).$$

Indeed we have $$\begin{align*}\mathrm{H}(\mu\mid\lambda)-\mathrm{H}(\mu_{V,\beta}\mid\lambda)&=\int\log\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}\mathrm{d}\mu-\int\log\frac{\mathrm{d}\mu_{V,\beta}}{\mathrm{d}\lambda}\mathrm{d}\mu_{V,\beta}\\&=\int\log\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}\mathrm{d}\mu+\int(\log(Z_{V,\beta})+\beta V)\mathrm{d}\mu_{V,\beta}\\&=\int\log\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}\mathrm{d}\mu+\int(\log(Z_{V,\beta})+\beta V)\mathrm{d}\mu\\&=\int\log\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}\mathrm{d}\mu-\int\log\frac{\mathrm{d}\mu_{V,\beta}}{\mathrm{d}\lambda}\mathrm{d}\mu\\&=\mathrm{H}(\mu\mid\mu_{V,\beta})\geq0\end{align*}$$ with equality if and only if $\mu=\mu_{V,\beta}$. The crucial point is that $\mu$ and $\mu_{V,\beta}$ are equal on test functions of the form $a+bV$ where $a,b$ are arbitrary real constants, by assumption.
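
This computation can be replayed on the three-point set $E=\{0,1,2\}$ with $V(x)=x$ and $\lambda$ the counting measure. Adding a multiple of the vector $(1,-2,1)$ to $\mu_{V,\beta}$ changes neither the total mass nor the $V$-moment, so it produces competitors for the minimization. The sketch checks that the entropy gap coincides with $\mathrm{H}(\mu\mid\mu_{V,\beta})$ and is positive; all names and numerical values are illustrative choices.

```python
import math

def gibbs(V, beta):
    """Boltzmann-Gibbs weights on a finite set (lambda = counting measure)."""
    w = [math.exp(-beta * v) for v in V]
    Z = sum(w)
    return [x / Z for x in w]

def rel_entropy_counting(p):
    """H(mu | counting measure) = sum p log p."""
    return sum(q * math.log(q) for q in p if q > 0.0)

def kullback_leibler(p, q):
    """H(p | q) = sum p log(p / q) for probability vectors with q > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

V = [0.0, 1.0, 2.0]
p_star = gibbs(V, 0.8)
direction = [1.0, -2.0, 1.0]   # preserves both the total mass and the V-moment

gaps = []
for t in (-0.05, 0.02, 0.1):
    p = [a + t * b for a, b in zip(p_star, direction)]
    gap = rel_entropy_counting(p) - rel_entropy_counting(p_star)
    gaps.append((gap, kullback_leibler(p, p_star)))
    print(t, gap, kullback_leibler(p, p_star))
```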

  • When $\lambda$ is the Lebesgue measure on $\mathbb{R}^n$ or the counting measure on a discrete set, we recover the usual maximum Boltzmann-Shannon entropy principle $$\max_{\int V\mathrm{d}\mu=c}-\mathrm{H}(\mu\mid\lambda)=-\mathrm{H}(\mu_{V,\beta}\mid\lambda).$$In particular, Gaussians maximize the Boltzmann-Shannon entropy under variance constraint (take for $V$ a quadratic form), while the uniform measures maximize the Boltzmann-Shannon entropy under support constraint (take $V$ constant on a set of finite measure for $\lambda$, and infinity elsewhere). Maximum entropy is minimum relative entropy with respect to Lebesgue or counting measure, a way to find, among the probability measures with a moment constraint, the closest to the Lebesgue or counting measure.
  • When $\lambda$ is a probability measure, then we recover the fact that the Boltzmann-Gibbs measures realize the projection or least Kullback-Leibler divergence of $\lambda$ on the set of probability measures with a given $V$-moment. This is the Csiszár $\mathrm{I}$-projection.
  • There are other interesting applications, for instance when $\lambda$ is a Poisson point process.

Note. The concept of maximum entropy was studied notably by Edwin Thompson Jaynes (1922 – 1998) in relation with thermodynamics, statistical physics, statistical mechanics, information theory, and Bayesian statistics. The concept of I-projection or minimum relative entropy was studied notably by Imre Csiszár (1938 – ).

