
This post is about the structure of the set of $\Phi$ for which the $\Phi$-entropy is convex. It is a follow-up to a previous post, published one year ago, about variance and entropy. We connect it to a theorem of Stieltjes and Nevanlinna on the integral representation of certain holomorphic functions, which is the subject of another previous post.
$\Phi$-entropy and convexity. Let us consider a convex and $\mathcal{C}^4$ function $\Phi:I\to\mathbb{R}$ on an open interval $I\subset\mathbb{R}$. For every probability space $(\Omega,\mathcal{A},\mu)$, the following set is convex: \[ L^\Phi(\mu)=\{f:\Omega\to I\mid f\in L^1(\mu),\Phi(f)\in L^1(\mu)\} \] The $\Phi$-entropy of $f\in L^\Phi(\mu)$ is defined by \[ \mathrm{Ent}_\mu^\Phi(f)=\int\Phi(f)\mathrm{d}\mu-\Phi\Bigl(\int f\mathrm{d}\mu\Bigr). \] Moreover the following conditions are equivalent:
- $f\mapsto\mathrm{Ent}_\mu^\Phi(f)$ is convex for all $(\Omega,\mathcal{A},\mu)$.
- $(u,v)\mapsto\Phi''(u)v^2$ is convex on $I\times\mathbb{R}$.
- either $\Phi$ is affine, or $\Phi'' > 0$ and $1/\Phi''$ is concave (in other words $\Phi''\Phi''''\geq 2\Phi'''^2$).
We denote by $\mathcal{K}$ the set of such $\Phi$. For simplicity, we take from now on $I=\mathbb{R}_+=(0,+\infty)$.
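The first condition can be probed numerically. The following Python sketch is ours, not taken from the cited papers: it computes $\mathrm{Ent}^\Phi_\mu$ for the uniform measure $\mu$ on two points and evaluates the midpoint gap $\frac12(\mathrm{Ent}(f+sg)+\mathrm{Ent}(f-sg))-\mathrm{Ent}(f)$ along one line. The helper names `phi_entropy` and `midpoint_gap`, and the choice of $f$, $g$, $s$, are illustrative assumptions. A negative gap certifies non-convexity, while a positive gap on a single line proves nothing.

```python
import math

def phi_entropy(Phi, values, weights):
    """Ent^Phi_mu(f) = sum_i w_i Phi(f_i) - Phi(sum_i w_i f_i) for a discrete mu."""
    mean = sum(w * x for w, x in zip(weights, values))
    return sum(w * Phi(x) for w, x in zip(weights, values)) - Phi(mean)

def midpoint_gap(Phi, f, g, s, weights):
    """(Ent(f + s g) + Ent(f - s g)) / 2 - Ent(f); it is >= 0 whenever Ent is convex."""
    f_plus = [a + s * b for a, b in zip(f, g)]
    f_minus = [a - s * b for a, b in zip(f, g)]
    average = 0.5 * (phi_entropy(Phi, f_plus, weights) + phi_entropy(Phi, f_minus, weights))
    return average - phi_entropy(Phi, f, weights)

weights = [0.5, 0.5]            # uniform probability measure on two points
f, g, s = [1.0, 10.0], [1.0, 0.0], 0.5

candidates = {
    "u log u": lambda u: u * math.log(u),
    "u^2": lambda u: u * u,
    "u^3": lambda u: u ** 3,
}
gaps = {name: midpoint_gap(Phi, f, g, s, weights) for name, Phi in candidates.items()}
print(gaps)
```

For $\Phi(u)=u^3$, for which $1/\Phi''(u)=1/(6u)$ is convex rather than concave, the gap is negative. Along this line the entropy is a cubic polynomial in $s$ with second derivative $6\bigl(\int fg^2\mathrm{d}\mu-(\int f\mathrm{d}\mu)(\int g\mathrm{d}\mu)^2\bigr)=-5.25$ at $s=0$, so the gap equals $-5.25\,s^2/2=-0.65625$. For $u^2$ the gap is $s^2\mathrm{Var}_\mu(g)=0.0625$.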
Examples. The set $\mathcal{K}$ is convex, and contains all the affine functions. Basic non-affine examples of elements of $\mathcal{K}$ are
- $\Phi(u)=u\log(u)$
- $\Phi(u)=u^p$, $1 < p\leq 2$.
The set $\mathcal{K}$ is a convex cone in the sense that it is stable under linear combinations with non-negative coefficients (indeed $\Phi\mapsto\mathrm{Ent}^\Phi_\mu$ is linear). Since it contains all affine functions, we have $u\mapsto\frac{u^p-u}{p-1}\in\mathcal{K}$ for all $1 < p\leq 2$, and we recover $u\mapsto u\log(u)$ in the limit $p\to1$.
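The third condition lends itself to a quick numerical test. This sketch is our own, with hypothetical helper names, and relies on finite differences on a grid, so it is a heuristic rather than a proof. It checks whether $1/\Phi''$ has non-positive second differences for $\Phi(u)=u^p$ and for $\Phi(u)=u\log(u)$.

```python
def inverse_is_concave(phi2, grid, h=1e-2, tol=1e-9):
    """Heuristic test of the third condition: does psi = 1/phi2 have non-positive
    second differences at every grid point? (phi2 plays the role of Phi''.)"""
    psi = lambda u: 1.0 / phi2(u)
    return all(psi(u + h) - 2.0 * psi(u) + psi(u - h) <= tol for u in grid)

def power_phi2(p):
    # Phi(u) = u^p  gives  Phi''(u) = p (p - 1) u^(p - 2)
    return lambda u: p * (p - 1) * u ** (p - 2)

grid = [0.1 + 0.05 * k for k in range(200)]          # points of [0.1, 10.05]
results = {p: inverse_is_concave(power_phi2(p), grid) for p in (1.2, 1.5, 2.0, 2.5, 3.0)}
results["u log u"] = inverse_is_concave(lambda u: 1.0 / u, grid)   # Phi'' = 1/u
print(results)
```

One expects `True` for $p\in\{1.2,1.5,2\}$ and for $u\log(u)$, and `False` for $p\in\{2.5,3\}$, since $1/\Phi''(u)=u^{2-p}/(p(p-1))$ is concave exactly when $0\leq 2-p\leq 1$.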
For $1 < p<2$, the function $u\mapsto u^p$ is not extreme, in the sense that it is a mixture (conic combination with non-negative weights) of affine functions and shifts of $u\mapsto u\log(u)$. More precisely, \[ u^p=c_p\int_0^\infty t^{p-2}\varphi_t(u)\mathrm{d}t \] where \begin{align*} c_p&=\frac{p(p-1)\sin(\pi(p-1))}{\pi}\\ \varphi_t(u)&=(u+t)\log(u+t)\underbrace{-(1+\log(t))u-t\log(t)}_{\text{affine}} \end{align*} Indeed, both sides tend to $0$ as $u\to0^+$, together with their first derivative, and $\varphi_t''(u)=1/(u+t)$, so it suffices to check that the second derivatives agree, namely $c_p\int_0^\infty\frac{t^{p-2}}{u+t}\mathrm{d}t=p(p-1)u^{p-2}$. This comes from the Stieltjes identity with $a=p-1$, \[ \int_0^\infty\frac{t^{a-1}}{u+t}\mathrm{d}t=\frac{\pi}{\sin(\pi a)}u^{a-1}, \quad 0 < a<1,\quad u>0, \] which boils down to an Euler Beta integral via the substitutions $s=t/u$, $x=1/(1+s)$ as \begin{align*} \int_0^\infty\frac{t^{a-1}}{u+t}\mathrm{d}t &=u^{a-1}\int_0^\infty\frac{s^{a-1}}{1+s}\mathrm{d}s\\ &=u^{a-1}\int_0^1(1-x)^{a-1}x^{-a}\mathrm{d}x =u^{a-1}\frac{\Gamma(a)\Gamma(1-a)}{\Gamma(1)} =u^{a-1}\frac{\pi}{\sin(\pi a)} \end{align*} where the last equality follows from the Euler reflection formula.
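Both identities can be checked numerically. The Python sketch below is an illustration under our own choices, not part of the original argument. It substitutes $t=\mathrm{e}^x$ and applies the trapezoidal rule on the truncated range $x\in[-90,90]$, which is very accurate for such smooth integrands with exponentially decaying tails. To avoid cancellation for large $t$, it uses the equivalent form $\varphi_t(u)=(u+t)\log(1+u/t)-u=t\,\bigl((1+s)\log(1+s)-s\bigr)$ with $s=u/t$, together with a Taylor expansion for small $s$. The truncation range and step size are assumptions tuned for $0.3\leq a\leq0.7$ and $1.5\leq p\leq1.7$.

```python
import math

def integrate_half_line(f, x_min=-90.0, x_max=90.0, step=0.05):
    """Approximate int_0^oo f(t) dt: substitute t = e^x, then trapezoidal rule in x."""
    n = int(round((x_max - x_min) / step))
    total = 0.0
    for k in range(n + 1):
        t = math.exp(x_min + k * step)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(t) * t          # dt = t dx
    return total * step

def stieltjes_lhs(a, u):
    # int_0^oo t^(a-1) / (u + t) dt, for 0 < a < 1 and u > 0
    return integrate_half_line(lambda t: t ** (a - 1) / (u + t))

def stieltjes_rhs(a, u):
    return math.pi / math.sin(math.pi * a) * u ** (a - 1)

def log_kernel(s):
    """(1 + s) log(1 + s) - s, using its Taylor series for small s to avoid cancellation."""
    if s < 1e-3:
        return s ** 2 / 2 - s ** 3 / 6 + s ** 4 / 12 - s ** 5 / 20
    return (1 + s) * math.log1p(s) - s

def phi_t(u, t):
    # (u+t)log(u+t) - (1+log t)u - t log t, rewritten as t * log_kernel(u / t)
    return t * log_kernel(u / t)

def power_via_mixture(p, u):
    # c_p * int_0^oo t^(p-2) phi_t(u) dt, which should equal u^p for 1 < p < 2
    c_p = p * (p - 1) * math.sin(math.pi * (p - 1)) / math.pi
    return c_p * integrate_half_line(lambda t: t ** (p - 2) * phi_t(u, t))

print(stieltjes_lhs(0.5, 2.0), stieltjes_rhs(0.5, 2.0))
print(power_via_mixture(1.5, 2.0), 2.0 ** 1.5)
```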
It is natural to ask whether $u\mapsto u\log(u)$ and $u\mapsto u^2$ are extreme, and whether the elements of $\mathcal{K}$ admit an integral representation as mixtures of extreme elements.
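Extremality. The vector space of affine functions $\mathcal{A}=\{u\mapsto au+b:a,b\in\mathbb{R}\}$ is contained in $\mathcal{K}$. Recall that a nonzero element $x$ of a convex cone $C$ is called extreme when any decomposition $x=x_1+x_2$ with $x_1,x_2\in C$ forces $x_1$ and $x_2$ to be non-negative multiples of $x$ (in other words, $x$ spans an extreme ray of $C$). Thus $\mathcal{K}$ has no extreme elements: for all $\Phi\in\mathcal{K}$ and all $\ell\in\mathcal{A}$ not proportional to $\Phi$, we have $\Phi=(\Phi+\ell)+(-\ell)$, a sum of two elements of $\mathcal{K}$ where $-\ell$ is not a multiple of $\Phi$. A natural way to remove the affine part is to work with the second derivative, namely to consider the convex cone \begin{align*} \mathcal{K}'' &=\{\varphi=\Phi'':\Phi\in\mathcal{K}\}\\ &=\{0\}\cup\{\varphi\in\mathcal{C}^2(\mathbb{R}_+):\varphi > 0,\varphi\varphi''-2\varphi'^2\geq0\}. \end{align*}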
Extremality ODE. For all $\varphi\in\mathcal{K}''\setminus\{0\}$, \[ \varphi\text{ is extreme in }\mathcal{K}''\quad\text{iff}\quad Q:=\varphi\varphi''-2\varphi'^2=0. \] Proof. Suppose that $Q\neq0$. Since $Q\geq0$, we have $Q(u_0) > 0$ for some $u_0$. By continuity, choose a compact interval $J$ with non-empty interior and constants $m,\beta > 0$ such that $u_0\in J$, $Q\geq m$ and $\varphi\geq \beta$ on $J$. Let $\eta$ be a $\mathcal{C}^2$ function supported in $J$, with $\eta\neq0$. Then \[ \varphi=\tfrac{1}{2}\varphi_++\tfrac{1}{2}\varphi_- \quad\text{where}\quad \varphi_\pm:=\varphi\pm \varepsilon\eta, \quad\text{for all $\varepsilon > 0$}. \] If $\varepsilon < \frac{\beta}{2\|\eta\|_\infty}$, we have $\varphi_\pm\geq \beta/2 > 0$ on $J$, while $\varphi_\pm=\varphi$ outside $J$. Moreover \[ Q_{\pm} :=\varphi_\pm\varphi_\pm''-2\varphi_\pm'^2 =Q \pm \varepsilon L + \varepsilon^2R, \] where $L:=\varphi\eta''+\varphi''\eta-4\varphi'\eta'$ and $R:=\eta\eta''-2\eta'^2$ on $J$. Then, on $J$, \[ Q_\pm\geq m-\varepsilon L_0-\varepsilon^2R_0 \quad\text{where}\quad L_0:=\|L\|_{\infty,J}\quad\text{and}\quad R_0:=\|R\|_{\infty,J}. \] Thus, for small enough $\varepsilon > 0$, $Q_\pm\geq 0$ on $J$. Outside $J$, we have $\eta=0$ and $Q_\pm=Q\geq 0$. Hence $\varphi_\pm\in\mathcal{K}''$ for small enough $\varepsilon > 0$, and therefore $\frac{1}{2}\varphi_\pm\in\mathcal{K}''$ since $\mathcal{K}''$ is a cone.
Let us show that $\varphi_\pm$ is not colinear to $\varphi$. Suppose that $\varphi_+=c\varphi$ for some $c\geq 0$. Outside $J$ we have $\varphi_+=\varphi$, hence $\varphi=c\varphi$ outside $J$. Since $\varphi > 0$, this forces $c=1$, hence $\varphi_+=\varphi$ everywhere and $\varepsilon\eta\equiv 0$, a contradiction. Thus $\varphi_+$ is not colinear to $\varphi$, and neither is $\varphi_-=2\varphi-\varphi_+$, so the decomposition $\varphi=\frac{1}{2}\varphi_-+\frac{1}{2}\varphi_+$ is not trivial.
Hence $Q\neq0$ implies that $\varphi$ is not extreme (equivalently, if $\varphi$ is extreme then $Q=0$).
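The perturbation argument can be watched on a grid. In this sketch, which is ours, the bump $\eta(u)=(1-r^2)^3$ with $r=(u-1)/0.5$, the interval $J=[0.5,1.5]$ and the value $\varepsilon=0.002$ are arbitrary choices, and the helper names are hypothetical. We compute $\min_J Q_\pm$ for $\varphi(u)=u^{-1/2}$, which is $\Phi''$ for $\Phi(u)=\frac43u^{3/2}$ and satisfies $Q(u)=u^{-3}/4>0$, and for $\varphi(u)=1/u$, which satisfies $Q=0$.

```python
def bump(u, u0=1.0, delta=0.5):
    """C^2 bump eta(u) = (1 - r^2)^3, r = (u - u0)/delta, supported in [u0 - delta, u0 + delta].
    Returns the triple (eta, eta', eta'')."""
    r = (u - u0) / delta
    if abs(r) >= 1.0:
        return 0.0, 0.0, 0.0
    w = 1.0 - r * r
    return w ** 3, -6.0 * r * w ** 2 / delta, -6.0 * w * (1.0 - 5.0 * r * r) / delta ** 2

def q_perturbed(phi_derivs, u, eps, sign):
    """Q_pm = phi_pm phi_pm'' - 2 phi_pm'^2 for phi_pm = phi + sign * eps * eta."""
    f0, f1, f2 = phi_derivs(u)
    e0, e1, e2 = bump(u)
    g0, g1, g2 = f0 + sign * eps * e0, f1 + sign * eps * e1, f2 + sign * eps * e2
    return g0 * g2 - 2.0 * g1 * g1

# phi(u) = u^(-1/2) = Phi'' for Phi(u) = (4/3) u^(3/2): here Q(u) = u^(-3) / 4 > 0
inv_sqrt = lambda u: (u ** -0.5, -0.5 * u ** -1.5, 0.75 * u ** -2.5)
# phi(u) = 1/u = Phi'' for Phi(u) = u log u: here Q = 0
inv = lambda u: (1.0 / u, -1.0 / u ** 2, 2.0 / u ** 3)

grid = [0.5 + 0.001 * k for k in range(1001)]   # discretizes J = [0.5, 1.5]; outside J, Q_pm = Q

def min_q(phi_derivs, eps):
    return min(q_perturbed(phi_derivs, u, eps, sign) for u in grid for sign in (1, -1))

print(min_q(inv_sqrt, 0.002), min_q(inv, 0.002))
```

For $u^{-1/2}$ both perturbed functions stay in $\mathcal{K}''$ on the grid, whereas for $1/u$ one of $Q_\pm$ becomes negative near $u=1$: when $Q=0$, the first-order term $\pm\varepsilon L$ dominates. The smallness of $\varepsilon$ matters: with these choices, $\varepsilon=0.01$ already makes $Q_-$ negative near $u=1.4$ for $\varphi(u)=u^{-1/2}$. This is only an illustration, not a proof of extremality.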
Conversely, suppose that $Q=0$, and $\varphi=\varphi_1+\varphi_2$ with $\varphi_1,\varphi_2\in\mathcal{K}''$. Let us define the bivariate functions $F(u,v):=\varphi(u)v^2$ and $F_i(u,v):=\varphi_i(u)v^2$. Then \[ \nabla^2F=\nabla^2F_1+\nabla^2F_2,\quad \nabla^2F_i\succeq 0, \] the latter since $F_i$ is convex. For all $v\neq 0$, the Hessian matrix \[ \nabla^2F(u,v)=\begin{pmatrix}\varphi''(u)v^2&2\varphi'(u)v\\2\varphi'(u)v&2\varphi(u)\end{pmatrix} \] has determinant $2v^2Q(u)=0$ and $(2,2)$ entry $2\varphi(u) > 0$, hence has rank $1$.
Fix $(u,v)$ with $v\neq 0$ and set $A=\nabla^{2}F_{1}(u,v)$ and $B=\nabla^{2}F_{2}(u,v)$. Then $A,B\succeq 0$ and $A+B$ has rank $1$. For positive semidefinite matrices, $\ker(A+B)=\ker(A)\cap\ker(B)$, hence $\mathrm{range}(A)\subseteq \mathrm{range}(A+B)$ and $\mathrm{range}(B)\subseteq \mathrm{range}(A+B)$. Thus $A=aww^\top$ and $B=bww^\top$ for a vector $w$ spanning $\mathrm{range}(A+B)$ and some $a,b\geq0$ with $a+b>0$, and $\theta_{u,v}:=a/(a+b)\in[0,1]$ satisfies \[ A=\theta_{u,v}(A+B). \] Comparing the $(2,2)$ entries gives $\varphi_1(u)=\theta_{u,v}\varphi(u)$, so $\theta_{u,v}$ depends only on $u$, say $\theta_{u,v}=\theta(u)$, and $\theta=\varphi_1/\varphi$ is $\mathcal{C}^2$. Comparing the $(1,2)$ entries gives $\varphi_1'(u)=\theta(u)\varphi'(u)$. On the other hand, differentiating $\varphi_1=\theta\varphi$ yields $\varphi_1'=\theta'\varphi+\theta \varphi'$, hence $\theta'(u)\varphi(u)=0$, and $\theta'(u)=0$ since $\varphi(u)>0$. Since $\mathbb{R}_+$ is connected, $\theta$ is constant. Finally $\varphi_1=\theta\varphi$ and $\varphi_2=(1-\theta)\varphi$ are non-negative multiples of $\varphi$, thus $\varphi$ is extreme.
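In this converse direction the key fact is $\det\nabla^2F(u,v)=2v^2Q(u)$. The following sketch, with hypothetical helper names and an arbitrary point $(u,v)=(1.7,-0.8)$ and shift $t=0.3$, computes the eigenvalues of $\nabla^2F$ for $\varphi(u)=1/(u+t)$, where $Q=0$, and for $\varphi(u)=u^{-1/2}$, where $Q>0$.

```python
import math

def hessian_F(phi_derivs, u, v):
    """Hessian of F(u, v) = phi(u) v^2, given phi_derivs(u) = (phi, phi', phi'')."""
    f, f1, f2 = phi_derivs(u)
    return [[f2 * v * v, 2.0 * f1 * v],
            [2.0 * f1 * v, 2.0 * f]]

def eigenvalues_sym2(m):
    """Eigenvalues of a symmetric 2x2 matrix, in increasing order."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean, rad = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    return mean - rad, mean + rad

t = 0.3
shifted_log = lambda u: (1.0 / (u + t), -1.0 / (u + t) ** 2, 2.0 / (u + t) ** 3)   # Q = 0
inv_sqrt = lambda u: (u ** -0.5, -0.5 * u ** -1.5, 0.75 * u ** -2.5)               # Q > 0

for name, phi in [("1/(u+t)", shifted_log), ("u^(-1/2)", inv_sqrt)]:
    smallest, largest = eigenvalues_sym2(hessian_F(phi, 1.7, -0.8))
    print(name, smallest, largest)
```

The smallest eigenvalue is zero up to rounding in the first case (rank $1$) and positive in the second case (rank $2$), as predicted by the determinant formula.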
Solving the extremality ODE $Q=0$. Let $\varphi\in\mathcal{K}''\setminus\{0\}$. Assume that $Q=0$. Then the function $\psi=1/\varphi$ satisfies $\psi''(u)=-Q(u)/\varphi(u)^3=0$, hence $\psi(u)=\alpha u+\beta$ for some $\alpha,\beta\in\mathbb{R}$, and \[ \varphi(u)=\frac{1}{\alpha u+\beta}\quad\text{with}\quad\alpha u+\beta > 0\text{ on }\mathbb{R}_{+}, \] which forces $\alpha\geq0$ and $\beta\geq0$, not both zero. This yields two families:
- if $\alpha=0$: $\varphi(u)\equiv c$ with $c=1/\beta > 0$;
- if $\alpha>0$: $\varphi(u)=\frac{c}{u+t}$ with $c=1/\alpha > 0$ and $t=\beta/\alpha\geq 0$.
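Conversely, both families satisfy $Q=0$, hence they are exactly the extreme elements of $\mathcal{K}''$. Going back to $\mathcal{K}$ and $\Phi$, and recalling that $\varphi=\Phi''$, this gives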
- If $\Phi''(u)=c$, then $\Phi(u)=\frac{c}{2}u^2+\text{affine}$.
- If $\Phi''(u)=\frac{c}{u+t}$, then $\Phi(u)=c(u+t)\log(u+t)+\text{affine}$.
Note that if we define $H(x):=x\log x-x$, $x > 0$, then for all $t\geq0$, the function \begin{align*} K_t(u)&:=H(u+t)-H(1+t)-(u-1)H'(1+t)\\ &=(u+t)\log(u+t)\underbrace{-(u+t)\log(1+t)-u+1}_{\mathrm{affine}}\\ &= (u+t)\log\frac{u+t}{1+t}-(u-1), \end{align*} $u > 0$, satisfies $K_t(1)=K_t'(1)=0$ and \[ K_t''(u)=\frac{1}{u+t}, \quad K_{t}'''(u)=-\frac{1}{(u+t)^2}, \quad K_{t}''''(u)=\frac{2}{(u+t)^3}. \]
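Here is a finite-difference sanity check of these formulas, based on the closed form $K_t(u)=(u+t)\log\frac{u+t}{1+t}-(u-1)$. This is an illustrative sketch; the value $t=0.7$, the step $h=10^{-4}$, the sample points and the helper names are arbitrary.

```python
import math

def K(t, u):
    """K_t(u) = (u + t) log((u + t) / (1 + t)) - (u - 1), for t >= 0 and u > 0."""
    return (u + t) * math.log((u + t) / (1 + t)) - (u - 1)

def d1(f, u, h=1e-4):
    return (f(u + h) - f(u - h)) / (2 * h)            # central first difference

def d2(f, u, h=1e-4):
    return (f(u + h) - 2 * f(u) + f(u - h)) / h ** 2  # central second difference

t = 0.7
Kt = lambda u: K(t, u)
print(Kt(1.0), d1(Kt, 1.0))                           # both (numerically) zero
for u in (0.2, 1.0, 3.5):
    print(u, d2(Kt, u), 1 / (u + t))                  # K_t''(u) versus 1/(u+t)
```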
Stieltjes-Nevanlinna classical theorem. This theorem (recalled here without proof) states that for all $\varphi:(0,\infty)\to[0,\infty)$, the following two properties are equivalent:
- There exist a constant $c\geq0$ and a positive Borel measure $\nu$ on $[0,\infty)$ such that \[ \int\frac{1}{1+t}\mathrm{d}\nu(t) < \infty \quad\text{and}\quad \varphi(u)=c+\int\frac{1}{u+t}\mathrm{d}\nu(t)\quad\text{for all }u>0. \]
- $\varphi$ extends to a holomorphic function on $\mathbb{C}\setminus(-\infty,0]$, satisfies \[ \varphi(x)\geq0\text{ for }x > 0,\quad \Im \varphi(z)\leq 0\text{ for }\Im z>0, \quad\text{and } \lim_{x\to+\infty} \varphi(x)\text{ exists in }[0,\infty). \]
In either case, $c=\lim_{x\to+\infty} \varphi(x)$ and $\nu$ is uniquely determined by $\varphi$.
We say that such a $\varphi$ is a Stieltjes function. Such a function is always $C^{\infty}$ on $(0,\infty)$ and completely monotone: $(-1)^{n}\varphi^{(n)}(u)\geq 0$ for all $n\geq 0$ and $u > 0$.
Integral representation in the Stieltjes case. If $\Phi\in\mathcal{C}^4(\mathbb{R}_+)$ and $\varphi=\Phi''$ is a Stieltjes function, with integral representation \[ \varphi(u)=c+\int\frac{1}{u+t}\mathrm{d}\nu(t), \quad u > 0, \] then $\Phi\in\mathcal{K}$ and there exists an affine function $\ell$ such that for all $u > 0$, \[ \Phi(u)=\ell(u)+R(u) \quad\text{with}\quad R(u):=\frac{c}{2}u^2+\int K_t(u)\mathrm{d}\nu(t). \] In other words, up to an affine function, $\Phi$ is a mixture of the extreme functions $u\mapsto u^2$ and $K_t$, $t\geq0$.
Proof. Since $K_t''(u)=1/(u+t)$, we have \[ \frac{\mathrm{d}^2}{\mathrm{d}u^2}\left(\frac{c}{2}u^2+\int K_{t}(u)\mathrm{d}\nu(t)\right) =c+\int\frac{1}{u+t}\mathrm{d}\nu(t)=\varphi(u). \] The integrals are finite and the differentiation under the integral sign is valid by dominated convergence, locally uniformly in $u$, thanks to the integrability condition $\int\frac{1}{1+t}\mathrm{d}\nu(t)<\infty$. More precisely, $K_t''(u)=1/(u+t)\leq\max(1,1/u)/(1+t)$, $1/(u+t)^k\leq u^{1-k}/(u+t)$ for $k=2$ and $k=3$, and, since $K_t(1)=K_t'(1)=0$, \[ 0\leq K_t(u)=\int_1^u(u-s)K_t''(s)\mathrm{d}s\leq\frac{(u-1)^2}{2}\frac{\max(1,1/u)}{1+t}. \] Since $\Phi-R$ has zero second derivative, choosing the affine function $\ell(u)=(\Phi(1)-R(1))+(\Phi'(1)-R'(1))(u-1)$ yields the identity. Finally, for $u > 0$, we have \[ \varphi'(u)=-\int\frac{1}{(u+t)^2}\mathrm{d}\nu(t), \quad\text{and}\quad \varphi''(u)=2\int\frac{1}{(u+t)^3}\mathrm{d}\nu(t). \] By Cauchy-Schwarz in $L^2(\nu)$ applied to $(u+t)^{-1/2}$ and $(u+t)^{-3/2}$, \[ \left(\int\frac{1}{(u+t)^{2}}\mathrm{d}\nu(t)\right)^2 \le \left(\int\frac{1}{u+t}\mathrm{d}\nu(t)\right) \left(\int\frac{1}{(u+t)^3}\mathrm{d}\nu(t)\right), \] therefore \[ \varphi(u)\varphi''(u)-2\varphi'(u)^2 \geq 2c\int\frac{1}{(u+t)^3}\mathrm{d}\nu(t)\geq 0 \] so $(u,v)\mapsto\varphi(u)v^2$ is convex. Since moreover $\varphi>0$ unless $c=0$ and $\nu=0$, in which case $\Phi$ is affine, we get $\Phi\in\mathcal{K}$.
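The two computations of this proof can be checked numerically when $\nu$ is finitely supported, $\nu=\sum_iw_i\delta_{t_i}$, an assumption made only for this sketch (the atoms, the weights, $c=0.4$ and the helper names are arbitrary). The code compares $Q=\varphi\varphi''-2\varphi'^2$ with the lower bound $2c\int(u+t)^{-3}\mathrm{d}\nu(t)$, and checks by finite differences that $R''=\varphi$.

```python
import math

def K(t, u):
    """K_t(u) = (u + t) log((u + t) / (1 + t)) - (u - 1), for t >= 0, u > 0."""
    return (u + t) * math.log((u + t) / (1 + t)) - (u - 1)

def stieltjes_data(c, atoms, u):
    """For phi(u) = c + sum_i w_i / (u + t_i), atoms = [(t_i, w_i)], return
    (phi, phi', phi'', A3) where A3 = sum_i w_i / (u + t_i)^3."""
    a1 = sum(w / (u + t) for t, w in atoms)
    a2 = sum(w / (u + t) ** 2 for t, w in atoms)
    a3 = sum(w / (u + t) ** 3 for t, w in atoms)
    return c + a1, -a2, 2.0 * a3, a3

def R(c, atoms, u):
    """R(u) = (c/2) u^2 + sum_i w_i K_{t_i}(u), that is Phi up to an affine function."""
    return 0.5 * c * u * u + sum(w * K(t, u) for t, w in atoms)

c = 0.4
atoms = [(0.0, 1.0), (0.5, 2.0), (3.0, 0.7)]   # hypothetical nu = delta_0 + 2 delta_0.5 + 0.7 delta_3
h = 1e-4
checks = []
for u in (0.3, 1.0, 4.0):
    phi, dphi, d2phi, a3 = stieltjes_data(c, atoms, u)
    Q = phi * d2phi - 2.0 * dphi ** 2
    R2 = (R(c, atoms, u + h) - 2.0 * R(c, atoms, u) + R(c, atoms, u - h)) / h ** 2
    checks.append((u, Q - 2.0 * c * a3, R2 - phi))
print(checks)   # middle entries >= 0 (Cauchy-Schwarz), last entries close to 0 (R'' = phi)
```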
Counterexamples. For a non-affine $\Phi$, membership in $\mathcal{K}$ amounts to $\Phi''>0$ together with the concavity of $\psi:=1/\Phi''$ on $\mathbb{R}_+$. This condition is strictly weaker than the Stieltjes assumption on $\Phi''$ (analytic continuation plus a half-plane sign condition). In other words, beyond Stieltjes functions, there exist non-affine elements of $\mathcal{K}$ that are not, up to an affine function, mixtures of $u\mapsto u^2$ and of shifts of $u\mapsto u\log(u)$. Let us give explicit counterexamples.
Our first counterexample is $C^{\infty}$. Let $h\in C^\infty(\mathbb{R}_+)$, $h\neq0$, $h\geq0$, with compact support included in the interval $(1,2)$, and let $\|h\|_1:=\int_0^\infty h(s)\mathrm{d}s$. Fix $\varepsilon > 0$ and define \[ g(u):=1+u-\varepsilon\int_0^u(u-s)h(s)\mathrm{d}s. \] Then $g\in C^\infty(\mathbb{R}_+)$ and $g$ is concave since $g''(u)=-\varepsilon h(u)\leq 0$. Moreover, since $\int_0^u(u-s)h(s)\mathrm{d}s\leq u\|h\|_1$, we have $g(u)\geq1+(1-\varepsilon\|h\|_1)u>0$ on $\mathbb{R}_+$ as soon as $\varepsilon<1/\|h\|_1$. Next we define \[ \varphi(u)=\Phi''(u)=\frac{1}{g(u)}, \] and we choose $\Phi(1)$ and $\Phi'(1)$ arbitrarily, integrating twice to obtain $\Phi\in C^\infty(\mathbb{R}_+)$. Now, since $1/\Phi''=g$ is concave, $\Phi\in\mathcal{K}$. However $g''$ vanishes identically on $(0,a)$, where $a:=\inf(\mathrm{supp}\,h)\in(1,2)$, but not identically on $(a,2)$, therefore $g$ is not real-analytic at $u=a$. Indeed, since $g''\equiv0$ on $(0,a)$, all derivatives of $g''$ at $a$ vanish. If $g''$ were real-analytic at $a$, it would vanish in a neighborhood of $a$, contradicting the definition of $a$. It follows that $\Phi''=1/g$ is not real-analytic at $a$ either, since otherwise $g=1/\Phi''$ would be, as the inverse of a positive real-analytic function. In particular, $\Phi''$ cannot be a Stieltjes function, since Stieltjes functions extend holomorphically to $\mathbb{C}\setminus(-\infty,0]$ and are therefore real-analytic on $(0,\infty)$.
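The positivity step can be illustrated with the standard bump $h(s)=\exp\bigl(-1/((s-1)(2-s))\bigr)$ on $(1,2)$, a choice we make for this sketch. Since $\int_0^u(u-s)h(s)\mathrm{d}s\leq u\|h\|_1$ with $\|h\|_1=\int_0^\infty h(s)\mathrm{d}s$, any $\varepsilon<1/\|h\|_1$ gives $g(u)\geq1+(1-\varepsilon\|h\|_1)u>0$. The code takes $\varepsilon=0.9/\|h\|_1$ and computes the integrals with a midpoint rule; the helper names are hypothetical.

```python
import math

def h(s):
    """Smooth bump exp(-1 / ((s - 1)(2 - s))) on (1, 2), zero elsewhere (one possible choice of h)."""
    if s <= 1.0 or s >= 2.0:
        return 0.0
    return math.exp(-1.0 / ((s - 1.0) * (2.0 - s)))

def midpoint(f, a, b, n=2000):
    """Composite midpoint rule for int_a^b f."""
    step = (b - a) / n
    return step * sum(f(a + (k + 0.5) * step) for k in range(n))

mass = midpoint(h, 1.0, 2.0)       # ||h||_1
eps = 0.9 / mass                   # any 0 < eps < 1/||h||_1 would do

def g(u):
    """g(u) = 1 + u - eps * int_0^u (u - s) h(s) ds; h vanishes outside (1, 2)."""
    upper = min(u, 2.0)
    if upper <= 1.0:
        return 1.0 + u             # g'' = -eps h = 0 on (0, 1]: g is affine there
    return 1.0 + u - eps * midpoint(lambda s: (u - s) * h(s), 1.0, upper)

values = [(0.1 * k, g(0.1 * k)) for k in range(101)]
print(mass, min(v for _, v in values))
```

On $(0,1]$ the function $g$ is exactly $1+u$, and on $[2,\infty)$ it is affine with slope $1-\varepsilon\|h\|_1=0.1$, so the concavity of $g$ only acts on $(1,2)$.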
Our second counterexample is real-analytic. Let $h(u)=(1+u)^2\mathrm{e}^{-u}$, $u\geq0$, which is real-analytic and non-negative. Let $p > 0$ and $\varepsilon > 0$ and define \[ g(u)=1+pu-\varepsilon\int_{0}^{u}(u-s)h(s)\mathrm{d}s. \] Then $g$ is real-analytic on $\mathbb{R}_+$ and $g''(u)=-\varepsilon h(u)\leq 0$, so $g$ is concave. Moreover, \[ \int_0^\infty h(s)\mathrm{d}s = \int_0^\infty(1+s)^2\mathrm{e}^{-s}\mathrm{d}s = 5, \] so for any choice of parameters satisfying $\varepsilon< p/5$ we have, for all $u > 0$, \[ g(u)\geq 1+pu-\varepsilon u\int_0^\infty h(s)\mathrm{d}s = 1 + (p-5\varepsilon)u>0. \] Define $\varphi(u)=\Phi''(u)=1/g(u)$, $u > 0$, and integrate twice. Then $\Phi$ is real-analytic and belongs to $\mathcal{K}$ since $1/\Phi''=g$ is concave. Let us show now that $\Phi''$ is not a Stieltjes function for a concrete choice of parameters. Suppose that $\varphi$ is a Stieltjes function. Then it is completely monotone, in particular $\varphi'''(u)\leq 0$ for all $u > 0$. Now, for any smooth positive $g$, \[ \varphi'''(u)=\frac{-g(u)^2g'''(u)+6g(u)g'(u)g''(u)-6(g'(u))^3}{g(u)^4}. \] As $u\searrow0$ we have $g(0^+)=1$, $g'(0^+)=p$, $g''(0^+)=-\varepsilon h(0)=-\varepsilon$, and $g'''(0^+)=-\varepsilon h'(0)=-\varepsilon$ since \[ h'(0)=\left.\frac{\mathrm{d}}{\mathrm{d}u}\bigl((1+u)^2\mathrm{e}^{-u}\bigr)\right|_{u=0}=1. \] Plugging these values into the formula yields \[ \varphi'''(0^+) = \varepsilon(1-6p)-6p^3. \] For instance, if $p=10^{-2}$ and $\varepsilon=10^{-3}$, which satisfy $\varepsilon<p/5$, then $\varphi'''(0^+)=9.4\cdot10^{-4}-6\cdot10^{-6}>0$. By continuity of $\varphi'''$ as $u\searrow0$, $\varphi'''(u) > 0$ for all $u > 0$ small enough, contradicting complete monotonicity. In particular, $\Phi$ cannot admit the integral representation above as a mixture of $u\mapsto u^2$ and of shifts of $u\mapsto u\log(u)$.
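The sign computation at $u=0^+$ can be double-checked in code. This sketch, with our own helper names, uses the closed form $\int_0^u(u-s)(1+s)^2\mathrm{e}^{-s}\mathrm{d}s=5u-11+\mathrm{e}^{-u}(u^2+6u+11)$, obtained by integrating $h$ twice from $0$, together with $h'(u)=(1+u)(1-u)\mathrm{e}^{-u}$, and evaluates $\varphi'''$ with the formula above for $p=10^{-2}$ and $\varepsilon=10^{-3}$.

```python
import math

def g_derivs(u, p, eps):
    """g, g', g'', g''' for g(u) = 1 + p u - eps * int_0^u (u - s)(1 + s)^2 e^{-s} ds.

    Closed form used: int_0^u (u - s)(1 + s)^2 e^{-s} ds = 5u - 11 + e^{-u}(u^2 + 6u + 11).
    """
    e = math.exp(-u)
    g = 1 + p * u - eps * (5 * u - 11 + e * (u * u + 6 * u + 11))
    g1 = p - eps * (5 - e * (u * u + 4 * u + 5))
    g2 = -eps * (1 + u) ** 2 * e                 # = -eps h(u)
    g3 = -eps * (1 + u) * (1 - u) * e            # = -eps h'(u)
    return g, g1, g2, g3

def phi_third(u, p, eps):
    """Third derivative of phi = 1/g, via the formula for phi''' in the text."""
    g, g1, g2, g3 = g_derivs(u, p, eps)
    return (-g * g * g3 + 6 * g * g1 * g2 - 6 * g1 ** 3) / g ** 4

p, eps = 1e-2, 1e-3
print(phi_third(0.0, p, eps), eps * (1 - 6 * p) - 6 * p ** 3)   # both equal 9.34e-4 up to rounding
print([phi_third(u, p, eps) > 0 for u in (0.01, 0.1, 0.5)])
```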
Personal. The idea of studying the integral representation of the convex cone of $\Phi$-entropies in relation with the extremes $u\mapsto u^2$ and $u\mapsto u\log(u)$ was already suggested in my 2004 (J. Kyoto) and 2006 (ESAIM) papers. It is the motivation for this post.
Further reading. The notion of $\Phi$-entropy was generalized to matrices and operators using traces by Joel Tropp and Richard Y. Chen (2014), and a study of the associated convex cone was then conducted by Frank Hansen and Zhihua Zhang (2015).
- On this blog: Herglotz and Nevanlinna integral representation (2026-01-23)
- On this blog: About variance and entropy (2025-01-25)
- Joel A. Tropp and Richard Yuhua Chen, Subadditivity of matrix $\varphi$-entropy and concentration of random matrices, Electronic Journal of Probability 19(27) 1-30 (2014)
- Frank Hansen and Zhihua Zhang, Characterisation of matrix entropies, Letters in Mathematical Physics 105(10) 1399-1411 (2015)