
Month: March 2026

Convexity of entropies and integral representation

Photo of Constantin Carathéodory
Constantin Carathéodory (1873-1950), German mathematician of Greek origin. He started as an engineer, and contributed to many fields including measure theory, complex analysis, and convex geometry.

This post is about the structure of the set of functions $\Phi$ for which the $\Phi$-entropy is convex. It is a follow-up to a previous post, published one year ago, about variance and entropy. We connect this structure to a theorem by Stieltjes and Nevanlinna on an integral representation of certain smooth functions, which is the subject of another previous post.

$\Phi$-entropy and convexity. Let us consider a convex and $\mathcal{C}^4$ function $\Phi:I\to\mathbb{R}$ on an open interval $I\subset\mathbb{R}$. For every probability space $(\Omega,\mathcal{A},\mu)$, the following set is convex: \[ L^\Phi(\mu)=\{f:\Omega\to I\mid f\in L^1(\mu),\Phi(f)\in L^1(\mu)\} \] The $\Phi$-entropy of $f\in L^\Phi(\mu)$ is defined by \[ \mathrm{Ent}_\mu^\Phi(f)=\int\Phi(f)\mathrm{d}\mu-\Phi\Bigl(\int f\mathrm{d}\mu\Bigr). \] Moreover the following conditions are equivalent:

  • $f\mapsto\mathrm{Ent}_\mu^\Phi(f)$ is convex for all $(\Omega,\mathcal{A},\mu)$.
  • $(u,v)\mapsto\Phi''(u)v^2$ is convex on $I\times\mathbb{R}$.
  • either $\Phi$ is affine, or $\Phi'' > 0$ and $1/\Phi''$ is concave (in other words $\Phi''\Phi''''\geq 2\Phi'''^2$).

We denote by $\mathcal{K}$ the set of such $\Phi$. For simplicity, we take from now on $I=\mathbb{R}_+=(0,+\infty)$.
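As a quick numerical illustration of the first condition, here is a minimal sketch on a two-point probability space. The helper names `phi_entropy` and `midpoint_gap`, as well as the test vectors, are illustrative choices and not part of the post. It compares $\Phi(u)=u\log(u)$, which belongs to $\mathcal{K}$, with $\Phi(u)=u^3$, which is convex but for which $(u,v)\mapsto\Phi''(u)v^2=6uv^2$ is not convex.

```python
import math

def phi_entropy(phi, f, mu):
    """Phi-entropy of f on a finite probability space with weights mu."""
    mean_f = sum(m * x for m, x in zip(mu, f))
    return sum(m * phi(x) for m, x in zip(mu, f)) - phi(mean_f)

def midpoint_gap(phi, f, g, mu):
    """Average of the two entropies minus the entropy of the midpoint.

    Convexity of the entropy functional forces this gap to be >= 0.
    """
    mid = [(x + y) / 2 for x, y in zip(f, g)]
    avg = (phi_entropy(phi, f, mu) + phi_entropy(phi, g, mu)) / 2
    return avg - phi_entropy(phi, mid, mu)

# Uniform measure on two points and two positive test functions (arbitrary choices).
mu = [0.5, 0.5]
f, g = [0.1, 1.0], [0.5, 1.0]

gap_ulogu = midpoint_gap(lambda u: u * math.log(u), f, g, mu)
gap_cube = midpoint_gap(lambda u: u ** 3, f, g, mu)
print(gap_ulogu, gap_cube)
```

With these values the gap is positive for $u\log(u)$ and negative for $u^3$, so the $u^3$-entropy fails the midpoint inequality on this pair. A single pair can only disprove convexity, it cannot prove it.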

Examples. The set $\mathcal{K}$ is convex, and contains all the affine functions. Basic non-affine examples of elements of $\mathcal{K}$ are

  • $\Phi(u)=u\log(u)$
  • $\Phi(u)=u^p$, $1 < p\leq 2$.

The set $\mathcal{K}$ is a convex cone in the sense that it is stable under linear combinations with non-negative coefficients. Since it contains all affine functions, we have $u\mapsto\frac{u^p-u}{p-1}\in\mathcal{K}$ for all $1 < p\leq 2$, and we recover $u\mapsto u\log(u)$ as $p\to1$.
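The examples above can be checked against the criterion $\Phi''\Phi''''\geq 2\Phi'''^2$ with a few lines of code. The sketch below uses hand-computed derivatives; the function names are illustrative. For $\Phi(u)=u^p$ a direct computation gives $\Phi''\Phi''''-2\Phi'''^2=(p(p-1))^2(2-p)(p-1)u^{2p-6}$, which is non-negative exactly when $1\leq p\leq 2$.

```python
def q_power(p, u):
    """Q = phi*phi'' - 2*phi'^2 for phi = Phi'' with Phi(u) = u**p."""
    phi = p * (p - 1) * u ** (p - 2)
    d1 = p * (p - 1) * (p - 2) * u ** (p - 3)
    d2 = p * (p - 1) * (p - 2) * (p - 3) * u ** (p - 4)
    return phi * d2 - 2 * d1 ** 2

def q_power_closed_form(p, u):
    """Hand-derived closed form of the same quantity."""
    return (p * (p - 1)) ** 2 * (2 - p) * (p - 1) * u ** (2 * p - 6)

def q_ulogu(u):
    """Same quantity for Phi(u) = u*log(u), where phi(u) = 1/u."""
    phi, d1, d2 = 1 / u, -1 / u ** 2, 2 / u ** 3
    return phi * d2 - 2 * d1 ** 2

for u in (0.5, 1.0, 3.0):
    print(u, q_power(1.5, u), q_power(2.0, u), q_power(3.0, u), q_ulogu(u))
```

The quantity is positive for $p=1.5$, zero for $p=2$ and for $u\log(u)$, and negative for $p=3$, which is outside the admissible range.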

The case $u\mapsto u^p$, $1 < p<2$ is not extreme in the sense that it is a mixture (convex conic combination) of affine functions and shifts of $u\mapsto u\log(u)$, more precisely \[ u^p=c_p\int_0^\infty t^{p-2}\varphi_t(u)\mathrm{d}t \] where \begin{align*} c_p&=\frac{p(p-1)\sin(\pi(p-1))}{\pi}\\ \varphi_t(u)&=(u+t)\log(u+t)\underbrace{-(1+\log(t))u-t\log(t)}_{\text{affine}} \end{align*} Indeed, this comes from the Stieltjes identity, \[ \int_0^\infty\frac{t^{a-1}}{u+t}\mathrm{d}t=\frac{\pi}{\sin(\pi a)}u^{a-1}, \quad 0 < a<1,\quad u>0, \] which boils down to an Euler Beta integral via the substitutions $s=t/u$, $x=1/(1+s)$ as \begin{align*} \int_0^\infty\frac{t^{a-1}}{u+t}\mathrm{d}t &=u^{a-1}\int_0^\infty\frac{s^{a-1}}{1+s}\mathrm{d}s\\ &=u^{a-1}\int_0^1(1-x)^{a-1}x^{-a}\mathrm{d}x =u^{a-1}\frac{\Gamma(a)\Gamma(1-a)}{\Gamma(1)} =u^{a-1}\frac{\pi}{\sin(\pi a)} \end{align*} where the last equality follows from the Euler reflection formula.
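The Stieltjes identity is easy to test numerically. The sketch below is only a sanity check under stated assumptions: it uses the substitution $t=\mathrm{e}^x$, which turns the integral into $\int_{-\infty}^{\infty}\frac{\mathrm{e}^{ax}}{u+\mathrm{e}^x}\mathrm{d}x$, and a plain trapezoidal rule on a truncated window. The window half-width and step are arbitrary choices, and the function names are illustrative.

```python
import math

def stieltjes_lhs(a, u, half_width=80.0, step=0.01):
    """Trapezoidal approximation of int_0^inf t^(a-1)/(u+t) dt after t = exp(x)."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        x = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(a * x) / (u + math.exp(x))
    return total * step

def stieltjes_rhs(a, u):
    """Right-hand side pi / sin(pi a) * u^(a-1)."""
    return math.pi / math.sin(math.pi * a) * u ** (a - 1)

def c_p(p):
    """Normalizing constant c_p = p (p-1) sin(pi (p-1)) / pi."""
    return p * (p - 1) * math.sin(math.pi * (p - 1)) / math.pi

for a in (0.25, 0.5, 0.75):
    for u in (0.5, 2.0):
        print(a, u, stieltjes_lhs(a, u), stieltjes_rhs(a, u))
```

Taking $a=p-1$, the same routine confirms the second derivative of the mixture formula: `c_p(p) * stieltjes_lhs(p - 1, u)` should be close to $p(p-1)u^{p-2}$.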

We could ask about the extremality of $u\mapsto u\log(u)$ and $u\mapsto u^2$, and some sort of integral representation of elements of $\mathcal{K}$ as mixtures of extreme points.

Extremality. The vector space of affine functions $\mathcal{A}=\{u\mapsto au+b:a,b\in\mathbb{R}\}$ is contained in $\mathcal{K}$. Thus $\mathcal{K}$ has no extreme points: indeed, for all $f\in\mathcal{K}$ and $\ell\in\mathcal{A}$, we have $f=(f+\ell)+(-\ell)$, which is the sum of two elements of $\mathcal{K}$. A natural way to remove the affine part is to work with the second derivative, namely to consider the convex cone \begin{align*} \mathcal{K}'' &=\{\varphi=\Phi'':\Phi\in\mathcal{K}\}\\ &=\{0\}\cup\{\varphi\in\mathcal{C}^2(\mathbb{R}_+):\varphi > 0,\varphi\varphi''-2\varphi'^2\geq0\}. \end{align*}

Extremality ODE. For all $\varphi\in\mathcal{K}''\setminus\{0\}$, \[ \varphi\text{ is extreme in }\mathcal{K}''\quad\text{iff}\quad Q:=\varphi\varphi''-2\varphi'^2=0. \] Here extreme means that every decomposition $\varphi=\varphi_1+\varphi_2$ with $\varphi_1,\varphi_2\in\mathcal{K}''$ is trivial, in the sense that $\varphi_1$ and $\varphi_2$ are collinear with $\varphi$. Proof. Suppose that $Q\not\equiv0$. Since $Q\geq0$, we have $Q(u_0) > 0$ for some $u_0$. Choose a compact interval $J$ and constants $m,\beta > 0$ with $u_0\in J$ and $Q\geq m$ and $\varphi\geq \beta$ on $J$. Let $\eta\in C^2_c(J)$, $\eta\neq0$, namely a $C^2$ function with compact support in the interior of $J$. Then \[ \varphi=\tfrac{1}{2}\varphi_++\tfrac{1}{2}\varphi_- \quad\text{where}\quad \varphi_\pm:=\varphi\pm \varepsilon\eta, \quad\text{for all $\varepsilon > 0$}. \] If $\varepsilon < \frac{\beta}{2\|\eta\|_\infty}$, we have $\varphi_\pm\geq \beta/2 > 0$ on $J$, while $\varphi_\pm=\varphi$ outside $J$. Moreover \[ Q_{\pm} :=\varphi_\pm\varphi_\pm''-2\varphi_\pm'^2 =Q \pm \varepsilon L + \varepsilon^2R, \] where $L:=\varphi\eta''+\varphi''\eta-4\varphi'\eta'$ and $R:=\eta\eta''-2\eta'^2$ on $J$. Then, on $J$, \[ Q_\pm\geq m-\varepsilon L_0-\varepsilon^2R_0 \quad\text{where}\quad L_0:=\|L\|_{\infty,J}\quad\text{and}\quad R_0:=\|R\|_{\infty,J}. \] Thus, for small enough $\varepsilon > 0$, $Q_\pm\geq 0$ on $J$. Outside $J$, we have $\eta=0$ and $Q_\pm=Q\geq 0$. Hence $\varphi_\pm\in\mathcal{K}''$ for small enough $\varepsilon > 0$, and therefore $\frac{1}{2}\varphi_\pm\in\mathcal{K}''$ as well, since $\mathcal{K}''$ is a cone.

Let us show that this decomposition is not trivial, namely that $\varphi_\pm$ are not collinear with $\varphi$. Suppose $\varphi_+=c\varphi$ for some $c\geq 0$. Outside $J$ we have $\varphi_+=\varphi$, hence $\varphi=c\varphi$ outside $J$. Since $\varphi > 0$, this forces $c=1$, hence $\varphi_+=\varphi$ everywhere and $\varepsilon\eta\equiv 0$, a contradiction. Thus $\varphi_+$ (and likewise $\varphi_-$) is not collinear with $\varphi$, and the decomposition $\varphi=\frac{1}{2}\varphi_-+\frac{1}{2}\varphi_+$ is not trivial.

Hence $Q\neq0$ implies that $\varphi$ is not extreme (equivalently, if $\varphi$ is extreme then $Q=0$).

Conversely, suppose that $Q=0$, and $\varphi=\varphi_1+\varphi_2$ with $\varphi_1,\varphi_2\in\mathcal{K}''$. Let us define the bivariate functions $F(u,v):=\varphi(u)v^2$ and $F_i(u,v):=\varphi_i(u)v^2$. Then \[ \nabla^2F=\nabla^2F_1+\nabla^2F_2,\quad \nabla^2F_i\succeq 0. \] Since $Q=0$, for all $v\neq 0$, the Hessian matrix \[ \nabla^2F(u,v)=\begin{pmatrix}\varphi''(u)v^2&2\varphi'(u)v\\2\varphi'(u)v&2\varphi(u)\end{pmatrix} \] has determinant $0$ and $(2,2)$ entry $2\varphi(u) > 0$, hence has rank $1$.

Fix $(u,v)$ with $v\neq 0$ and set $A=\nabla^{2}F_{1}(u,v)$ and $B=\nabla^{2}F_{2}(u,v)$. Then $A,B\succeq 0$ and $A+B$ has rank $1$. For positive semidefinite matrices, $\mathrm{range}(A)\subseteq \mathrm{range}(A+B)$ and $\mathrm{range}(B)\subseteq \mathrm{range}(A+B)$, so the ranges of $A$ and $B$ lie in the same one-dimensional subspace. Hence there exists $\theta_{u,v}\in[0,1]$ such that \[ A=\theta_{u,v}(A+B). \] Comparing the $(2,2)$ entries gives $\varphi_1(u)=\theta_{u,v}\varphi(u)$, so $\theta_{u,v}$ depends only on $u$, say $\theta_{u,v}=\theta_u$. Comparing the $(1,2)$ entries gives $\varphi_1'(u)=\theta_u\varphi'(u)$. Since $\theta=\varphi_1/\varphi$ is differentiable and $\varphi_1=\theta\varphi$, differentiating yields $\varphi_1'=\theta'\varphi+\theta \varphi'$, hence $\theta'(u)\varphi(u)=0$ and $\theta'(u)=0$. Hence $\theta$ is constant. Finally $\varphi_1=\theta\varphi$ and $\varphi_2=(1-\theta)\varphi$, hence $\varphi_1$ and $\varphi_2$ are collinear with $\varphi$, thus $\varphi$ is extreme.
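The rank-one property used in the converse direction can be observed numerically: the determinant of the Hessian of $(u,v)\mapsto\varphi(u)v^2$ equals $2v^2Q(u)$. Here is a minimal sketch with hand-coded derivatives; the shift `t = 0.5` and the comparison function $\varphi(u)=u^{-1/2}$ are arbitrary illustrative choices.

```python
def hessian_F(phi, dphi, d2phi, u, v):
    """Hessian of F(u, v) = phi(u) * v**2 as a 2x2 nested list."""
    return [[d2phi(u) * v ** 2, 2 * dphi(u) * v],
            [2 * dphi(u) * v, 2 * phi(u)]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

t = 0.5  # hypothetical shift, any t >= 0 would do

# phi(u) = 1/(u+t): solves Q = 0, so the Hessian should be singular.
extreme = (lambda u: 1 / (u + t),
           lambda u: -1 / (u + t) ** 2,
           lambda u: 2 / (u + t) ** 3)

# phi(u) = u**(-1/2): belongs to K'' with Q(u) = u**(-3)/4 > 0.
power = (lambda u: u ** -0.5,
         lambda u: -0.5 * u ** -1.5,
         lambda u: 0.75 * u ** -2.5)

u, v = 1.3, 0.7
print(det2(hessian_F(*extreme, u, v)), det2(hessian_F(*power, u, v)))
```

The first determinant vanishes up to rounding, while the second is strictly positive, in line with $\det\nabla^2F=2v^2Q$.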

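Solving the extremality ODE $Q=0$. Let $\varphi\in\mathcal{K}''\setminus\{0\}$. Assume that $Q=0$. The function $\psi:=1/\varphi$ satisfies $\psi'=-\varphi'/\varphi^2$ and \[ \psi''=-\frac{\varphi''}{\varphi^2}+\frac{2\varphi'^2}{\varphi^3}=-\frac{Q}{\varphi^3}=0, \] hence $\psi(u)=\alpha u+\beta$ for some constants $\alpha,\beta$, and \[ \varphi(u)=\frac{1}{\alpha u+\beta}\quad\text{with}\quad\alpha u+\beta > 0\text{ on }\mathbb{R}_{+}. \] The positivity constraint forces $\alpha\geq0$ and $\beta\geq0$, not both zero. Distinguishing the cases $\alpha=0$ (with $c=1/\beta$) and $\alpha > 0$ (with $c=1/\alpha$ and $t=\beta/\alpha$), this yields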

  • $\varphi(u)\equiv c$ with $c > 0$
  • $\varphi(u)=\frac{c}{u+t}$ with $c > 0$ and $t\geq0$, so that $u+t > 0$ on $\mathbb{R}_+$.

Back to $\mathcal{K}$ and $\Phi$, this gives, recalling that $\varphi=\Phi''$,

  • If $\Phi''(u)=c$, then $\Phi(u)=\frac{c}{2}u^2+\text{affine}$.
  • If $\Phi''(u)=\frac{c}{u+t}$, then $\Phi(u)=c(u+t)\log(u+t)+\text{affine}$.

Note that if we define $H(x):=x\log x-x$, $x > 0$, then for all $t\geq0$, the function \begin{align*} K_t(u)&:=H(u+t)-H(1+t)-(u-1)H'(1+t)\\ &=(u+t)\log(u+t)\underbrace{-(u+t)\log(1+t)-u+1}_{\mathrm{affine}}\\ &= (u+t)\log\frac{u+t}{1+t}-(u-1), \end{align*} $u > 0$, satisfies $K_t(1)=K_t'(1)=0$ and \[ K_t''(u)=\frac{1}{u+t}, \quad K_{t}'''(u)=-\frac{1}{(u+t)^2}, \quad K_{t}''''(u)=\frac{2}{(u+t)^3}. \]
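The normalization $K_t(1)=K_t'(1)=0$ and the formula $K_t''(u)=1/(u+t)$ can be double-checked with finite differences. This is a rough sketch: the step sizes and the helper names are arbitrary illustrative choices.

```python
import math

def K(t, u):
    """K_t(u) = (u+t) log((u+t)/(1+t)) - (u-1), for u > 0 and t >= 0."""
    return (u + t) * math.log((u + t) / (1 + t)) - (u - 1)

def first_difference(f, u, h=1e-4):
    """Central finite-difference approximation of f'(u)."""
    return (f(u + h) - f(u - h)) / (2 * h)

def second_difference(f, u, h=1e-3):
    """Central finite-difference approximation of f''(u)."""
    return (f(u + h) - 2 * f(u) + f(u - h)) / h ** 2

for t in (0.0, 0.5, 3.0):
    k = lambda u, t=t: K(t, u)
    print(t, k(1.0), first_difference(k, 1.0),
          second_difference(k, 2.0), 1 / (2.0 + t))
```

Since $K_t$ is convex with $K_t(1)=K_t'(1)=0$, it is also non-negative on $\mathbb{R}_+$, which can be spot-checked with the same function.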

Stieltjes-Nevanlinna classical theorem. This theorem (recalled here without proof) states that for all $\varphi:(0,\infty)\to[0,\infty)$, the following two properties are equivalent:

  • There exist a constant $c\geq0$ and a positive Borel measure $\nu$ on $[0,\infty)$ such that \[ \int\frac{1}{1+t}\mathrm{d}\nu(t) < \infty \quad\text{and}\quad \varphi(u)=c+\int\frac{1}{u+t}\mathrm{d}\nu(t)\quad\text{for all }u>0. \]
  • $\varphi$ extends to a holomorphic function on $\mathbb{C}\setminus(-\infty,0]$, satisfies \[ \varphi(x)\geq0\text{ for }x > 0,\quad \Im \varphi(z)\leq 0\text{ for }\Im z>0, \quad\text{and } \lim_{x\to+\infty} \varphi(x)\text{ exists in }[0,\infty). \]

In either case, $c=\lim_{x\to+\infty} \varphi(x)$ and $\nu$ is uniquely determined by $\varphi$.

We say that such a $\varphi$ is a Stieltjes function. Such a function is always $C^{\infty}$ on $(0,\infty)$ and completely monotone: $(-1)^{n}\varphi^{(n)}(u)\geq 0$ for all $n\geq 0$ and $u > 0$.

Integral representation of Stieltjes extremes. If $\Phi$ is $\mathcal{C}^4(\mathbb{R}_+)$ and $\varphi=\Phi''$ is a Stieltjes function, with integral representation \[ \varphi(u)=c+\int\frac{1}{u+t}\mathrm{d}\nu(t), \quad u > 0, \] then $\Phi\in\mathcal{K}$ and there exists an affine function $\ell$ such that for all $u > 0$, \[ \Phi(u)=\ell(u)+R(u) \quad\text{with}\quad R(u):=\frac{c}{2}u^2+\int K_t(u)\mathrm{d}\nu(t). \]

Proof. Since $K_t''(u)=1/(u+t)$, we have \[ \frac{\mathrm{d}^2}{\mathrm{d}u^2}\left(\frac{c}{2}u^2+\int K_{t}(u)\mathrm{d}\nu(t)\right) =c+\int\frac{1}{u+t}\mathrm{d}\nu(t)=\varphi(u). \] The differentiation under the integral is valid by dominated convergence thanks to the integrability properties of $\nu$, more precisely $K_t''(u)=1/(u+t)\leq\max(1,1/u)/(1+t)$ and $1/(u+t)^k\leq u^{1-k}/(u+t)$ for $k=2$ and $k=3$. Choosing the affine function $\ell(u)=(\Phi(1)-R(1))+(\Phi'(1)-R'(1))(u-1)$ yields the identity. Finally, for $u > 0$, we have \[ \varphi'(u)=-\int\frac{1}{(u+t)^2}\mathrm{d}\nu(t), \quad\text{and}\quad \varphi''(u)=2\int\frac{1}{(u+t)^3}\mathrm{d}\nu(t). \] By Cauchy-Schwarz in $L^2(\nu)$ applied to $(u+t)^{-1/2}$ and $(u+t)^{-3/2}$, \[ \left(\int\frac{1}{(u+t)^{2}}\mathrm{d}\nu(t)\right)^2 \le \left(\int\frac{1}{u+t}\mathrm{d}\nu(t)\right) \left(\int\frac{1}{(u+t)^3}\mathrm{d}\nu(t)\right), \] therefore \[ \varphi(u)\varphi''(u)-2\varphi'(u)^2 \geq 2c\int\frac{1}{(u+t)^3}\mathrm{d}\nu(t)\geq 0 \] so $(u,v)\mapsto\varphi(u)v^2$ is convex and $\Phi\in\mathcal{K}$.
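The final inequality of the proof can be illustrated with a discrete measure $\nu=\sum_i w_i\delta_{t_i}$, for which all integrals become finite sums. The sketch below is only an illustration: the atoms, the weights, and the constant $c$ are arbitrary choices, and the function names are hypothetical.

```python
def stieltjes_derivs(c, atoms, u):
    """phi, phi', phi'' for phi(u) = c + sum_i w_i / (u + t_i).

    atoms is a list of pairs (t_i, w_i) with t_i >= 0 and w_i >= 0.
    """
    phi = c + sum(w / (u + t) for t, w in atoms)
    d1 = -sum(w / (u + t) ** 2 for t, w in atoms)
    d2 = 2 * sum(w / (u + t) ** 3 for t, w in atoms)
    return phi, d1, d2

def q_and_lower_bound(c, atoms, u):
    """Return Q = phi*phi'' - 2*phi'^2 and the bound 2c * sum_i w_i/(u+t_i)^3."""
    phi, d1, d2 = stieltjes_derivs(c, atoms, u)
    return phi * d2 - 2 * d1 ** 2, c * d2

atoms = [(0.0, 1.0), (1.0, 2.0), (5.0, 0.5)]  # arbitrary (t_i, w_i)
for u in (0.1, 1.0, 10.0):
    print(u, q_and_lower_bound(0.3, atoms, u))
```

With a single atom and $c=0$ the quantity $Q$ vanishes, which is the extreme case $\varphi(u)=w/(u+t)$. With several atoms the Cauchy-Schwarz inequality is strict and $Q > 0$.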

Counterexamples. For a non-affine $\Phi$, the condition of being in $\mathcal{K}$ is equivalent to concavity of $\psi:=1/\Phi''$ on $\mathbb{R}_+$, which is strictly weaker than the Stieltjes assumption (analytic continuation plus a half-plane sign condition). In other words, beyond Stieltjes functions, there exist non-affine elements of $\mathcal{K}$ that are not mixtures of shifts of $u\mapsto u\log(u)$. Let us give explicit counterexamples.

Our first counterexample is $C^{\infty}$. Let $h\in C^\infty(\mathbb{R}_+)$, $h\neq0$, $h\geq0$, with compact support included in the interval $(1,2)$. Fix $\varepsilon > 0$ and define \[ g(u):=1+u-\varepsilon\int_0^u(u-s)h(s)\mathrm{d}s. \] Then $g\in C^\infty(\mathbb{R}_+)$ and $g$ is concave since $g''(u)=-\varepsilon h(u)\leq 0$. Moreover $g > 0$ on $\mathbb{R}_+$ for $\varepsilon$ small enough: for instance, if $\varepsilon\int_0^\infty h(s)\mathrm{d}s\leq 1$ then $g'(u)=1-\varepsilon\int_0^uh(s)\mathrm{d}s\geq0$, hence $g\geq g(0^+)=1$. Next we define \[ \varphi(u)=\Phi''(u)=\frac{1}{g(u)}, \] and we choose $\Phi(1)$ and $\Phi'(1)$ arbitrarily, integrating twice to obtain $\Phi\in C^\infty(\mathbb{R}_+)$. Now, since $1/\Phi''=g$ is concave, $\Phi\in\mathcal{K}$. However $g''$ vanishes identically on $(0,a)$, $a:=\inf(\mathrm{supp}\,h)\in(1,2)$, but is not identically zero on $(a,2)$, therefore $g$ is not real-analytic at $u=a$. Indeed, since $g''\equiv0$ on $(0,a)$, all derivatives of $g''$ at $a$ vanish. If $g''$ were real-analytic at $a$, it would vanish in a neighborhood of $a$, contradicting the definition of $a$. In particular, $\Phi''=1/g$ cannot be a Stieltjes function, since Stieltjes functions extend holomorphically to $\mathbb{C}\setminus(-\infty,0]$ and are therefore real-analytic on $(0,\infty)$.

Our second counterexample is real-analytic. Let $h(u)=(1+u)^2\mathrm{e}^{-u}$, $u\geq0$, which is real-analytic and non-negative. Let $p > 0$ and $\varepsilon > 0$ and define \[ g(u)=1+pu-\varepsilon\int_{0}^{u}(u-s)h(s)\mathrm{d}s. \] Then $g$ is real-analytic on $\mathbb{R}_+$ and $g''(u)=-\varepsilon h(u)\leq 0$ so $g$ is concave. Moreover, \[ \int_0^\infty h(s)\mathrm{d}s = \int_0^\infty(1+s)^2\mathrm{e}^{-s}\mathrm{d}s = 5, \] so for any choice of parameters satisfying $\varepsilon< p/5$ we have, for all $u > 0$, \[ g(u)\geq 1+pu-\varepsilon u\int_0^\infty h(s)\mathrm{d}s = 1 + (p-5\varepsilon)u>0. \] Define $\varphi(u)=\Phi''(u)=1/g(u)$, $u > 0$, and integrate twice. Then $\Phi$ is real-analytic and belongs to $\mathcal{K}$ since $1/\Phi''=g$ is concave. Let us show now that $\Phi''$ is not a Stieltjes function for a concrete choice of parameters. Suppose that $\varphi$ is a Stieltjes function. Then it is completely monotone, in particular $\varphi'''(u)\leq 0$ for all $u > 0$. We have, for any smooth positive $g$, \[ \varphi'''(u)=\frac{-g(u)^2g'''(u)+6g(u)g'(u)g''(u)-6(g'(u))^3}{g(u)^4}. \] As $u\searrow0$ we have $g(0^+)=1$, $g'(0^+)=p$, $g''(0^+)=-\varepsilon h(0)=-\varepsilon$, and \[ g'''(0^+)=-\varepsilon h'(0^+), \quad\text{where}\quad h'(0^+)=\left.\frac{\mathrm{d}}{\mathrm{d}u}\bigl((1+u)^2\mathrm{e}^{-u}\bigr)\right|_{u=0}=1, \] so $g'''(0^+)=-\varepsilon$. Plugging into the numerator yields \[ \varphi'''(0^+)g(0^+)^4 = \varepsilon(1-6p)-6p^3. \] The right-hand side is $ > 0$ for instance if $p=10^{-2}$ and $\varepsilon=10^{-3}$, a choice compatible with $\varepsilon < p/5$, for which it equals $9.34\times10^{-4}$. Hence $\varphi'''(0^+) > 0$. By continuity of $\varphi'''$ as $u\searrow0$, $\varphi'''(u) > 0$ for all $u > 0$ small enough, contradicting complete monotonicity. In particular, $\Phi$ cannot admit the shifted-$u\log u$ representation.
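The sign of $\varphi'''$ near $0$ can be confirmed numerically for $p=10^{-2}$ and $\varepsilon=10^{-3}$. The sketch below relies on a closed form for $G(u):=\int_0^u(u-s)(1+s)^2\mathrm{e}^{-s}\mathrm{d}s=5u-11+\mathrm{e}^{-u}(u^2+6u+11)$, which is a hand computation not stated in the post and worth re-deriving before reuse. It is consistent with $G(0)=G'(0)=0$ and $G''=h$. The function names are illustrative.

```python
import math

P, EPS = 1e-2, 1e-3  # parameter choice from the text; note EPS < P / 5

def g_derivs(u, p=P, eps=EPS):
    """Return g, g', g'', g''' at u for g = 1 + p*u - eps*G(u).

    G(u) = int_0^u (u-s)(1+s)^2 e^{-s} ds, using a hand-derived closed form
    (an assumption of this sketch, not a formula from the post).
    """
    e = math.exp(-u)
    G = 5 * u - 11 + e * (u * u + 6 * u + 11)
    G1 = 5 - e * (u * u + 4 * u + 5)   # G'(u)
    h = (1 + u) ** 2 * e               # G''(u) = h(u)
    h1 = (1 - u * u) * e               # h'(u)
    return 1 + p * u - eps * G, p - eps * G1, -eps * h, -eps * h1

def phi_third(u, p=P, eps=EPS):
    """Third derivative of phi = 1/g, from the quotient formula in the text."""
    g, g1, g2, g3 = g_derivs(u, p, eps)
    return (-g * g * g3 + 6 * g * g1 * g2 - 6 * g1 ** 3) / g ** 4

print(phi_third(0.0), EPS * (1 - 6 * P) - 6 * P ** 3)
print([g_derivs(u)[0] for u in (0.1, 1.0, 10.0, 100.0)])
```

The first line prints two matching positive values, so $\varphi'''$ is positive near $0$, and the second line shows that $g$ stays positive on a few sample points, as guaranteed by $\varepsilon < p/5$.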

Personal. The idea of studying the integral representation of the convex cone of $\Phi$-entropies in relation with the extremes $u\mapsto u^2$ and $u\mapsto u\log(u)$ is already suggested in my 2004 (J. Kyoto) and 2006 (ESAIM) papers. It is the motivation of this post.

Further reading. The notion of $\Phi$-entropy was generalized to matrices and operators using traces by Joel Tropp and Richard Y. Chen (2014), and a study of the associated convex cone was then conducted by Frank Hansen and Zhihua Zhang (2015).

  • On this blog
    Herglotz and Nevanlinna integral representation
    (2026-01-23)
  • On this blog
    About variance and entropy
    (2025-01-25)
  • Joel A. Tropp and Richard Yuhua Chen
    Subadditivity of matrix $\varphi$-entropy and concentration of random matrices
    Electronic Journal of Probability 19(27) 1-30 (2014)
  • Frank Hansen and Zhihua Zhang
    Characterisation of matrix entropies
    Letters in Mathematical Physics 105(10) 1399-1411 (2015)