
About the Jensen inequality

Johan Jensen (mathematician)

The idea of writing this tiny post came after a coffee break discussion with MAP432 colleagues. It is all about the Jensen inequality, one of my favorite inequalities, which connects convexity and probability in a rigid way. Equality cases and strict convexity are not considered here.

Jensen inequality. In its simplest form, the Jensen inequality states that if \( {\varphi:\mathbb{R}\rightarrow\mathbb{R}} \) is a convex function and if \( {X} \) is a real random variable such that \( {X} \) and \( {\varphi(X)} \) are integrable, then

\[ \varphi(\mathbb{E}(X))\leq\mathbb{E}(\varphi(X)). \]
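As a quick sanity check, here is a small Python sketch that computes both sides exactly for a finitely supported random variable. The law of \( {X} \) and the three convex functions are arbitrary choices made for illustration only.

```python
import math


def expectation(values, probs, g=lambda x: x):
    """Exact expectation of g(X) for a finitely supported X."""
    return sum(p * g(x) for x, p in zip(values, probs))


# Hypothetical law of X: it takes these values with these probabilities.
values = [-2.0, 0.0, 1.0, 3.0]
probs = [0.1, 0.4, 0.3, 0.2]

# A few convex functions phi : R -> R.
convex_functions = {
    "exp": math.exp,
    "abs": abs,
    "square": lambda x: x * x,
}

mean = expectation(values, probs)
for name, phi in convex_functions.items():
    lhs = phi(mean)                          # phi(E(X))
    rhs = expectation(values, probs, phi)    # E(phi(X))
    print(f"{name}: phi(E(X)) = {lhs:.4f} <= E(phi(X)) = {rhs:.4f}")
    assert lhs <= rhs
```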

Geometric proof. The function \( {\varphi} \) is convex iff it is equal to the upper envelope of all its affine lower bounds (of course, only those touching the graph of \( {\varphi} \) are relevant, which leads to the Legendre transform and convex duality, but we do not need these subtleties here). Namely

\[ \varphi=\sup_{f\in\mathcal{F}}f \]

where \( {\mathcal{F}} \) is the family of all affine functions \( {f} \) such that \( {f\leq\varphi} \). But for any affine function \( {f} \), by linearity of the expectation,

\[ f(\mathbb{E}(X))=\mathbb{E}(f(X)), \]

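and therefore, since \( {\mathbb{E}(f(X))\leq\mathbb{E}(\sup_{g\in\mathcal{F}}g(X))} \) for every \( {f\in\mathcal{F}} \),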

\[ \varphi(\mathbb{E}(X)) =\sup_{f\in\mathcal{F}}f(\mathbb{E}(X)) =\sup_{f\in\mathcal{F}}\mathbb{E}(f(X)) \leq\mathbb{E}(\sup_{f\in\mathcal{F}}f(X)) =\mathbb{E}(\varphi(X)). \]
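The envelope argument can be mimicked numerically. The sketch below is our own illustration, with \( {\varphi=\exp} \) and a toy finitely supported law chosen arbitrarily. It keeps only the tangent lines \( {x\mapsto e^t+e^t(x-t)} \) at a finite grid of contact points \( {t} \), which is a finite subfamily of \( {\mathcal{F}} \), checks that each of them commutes with the expectation, and evaluates the three quantities of the chain above.

```python
import math


def tangent(t):
    """Tangent line of exp at t, an affine minorant: x -> e^t + e^t (x - t)."""
    a = math.exp(t)
    return lambda x: a + a * (x - t)


# Finite subfamily of F (an approximation: the true family is infinite).
family = [tangent(k / 10) for k in range(-50, 51)]


def envelope(x):
    """Upper envelope of the finite family; approximates exp(x) from below."""
    return max(f(x) for f in family)


# Hypothetical finitely supported law of X.
values = [-2.0, 0.0, 1.0, 3.0]
probs = [0.1, 0.4, 0.3, 0.2]


def E(g):
    """Exact expectation of g(X)."""
    return sum(p * g(x) for x, p in zip(values, probs))


mean = E(lambda x: x)

# Affine functions commute with the expectation: f(E(X)) = E(f(X)).
for f in family:
    assert math.isclose(f(mean), E(f), rel_tol=1e-12, abs_tol=1e-9)

lhs = envelope(mean)              # sup_f f(E(X))
mid = max(E(f) for f in family)   # sup_f E(f(X))
rhs = E(envelope)                 # E(sup_f f(X))
print(lhs, mid, rhs)
print(math.exp(mean), E(math.exp))  # phi(E(X)) and E(phi(X))
```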

This proof can be extended to the case where \( {\varphi} \) may take the value \( {+\infty} \), by reduction to the convex set \( {\{\varphi<\infty\}} \). It can also be extended to the multivariate case where \( {\varphi} \) is defined on \( {\mathbb{R}^d} \) and \( {X} \) is a random vector of \( {\mathbb{R}^d} \), by taking affine forms (i.e. affine hyperplanes) instead.
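In the multivariate case, a convenient example is the Euclidean norm on \( {\mathbb{R}^2} \), which is the supremum of the linear forms \( {x\mapsto\langle u,x\rangle} \) over unit vectors \( {u} \). The following sketch assumes that 720 equally spaced directions form a good enough finite subfamily; it checks this envelope approximation and the inequality \( {|\mathbb{E}(X)|\leq\mathbb{E}(|X|)} \) for a hypothetical finitely supported random vector.

```python
import math

# Unit vectors u_k at angles 2*pi*k/720: a finite family of linear forms <u, x>.
directions = [
    (math.cos(2 * math.pi * k / 720), math.sin(2 * math.pi * k / 720))
    for k in range(720)
]


def envelope(x):
    """max_k <u_k, x>, a lower approximation of the Euclidean norm of x."""
    return max(u[0] * x[0] + u[1] * x[1] for u in directions)


# Hypothetical finitely supported random vector X in R^2.
points = [(1.0, 0.0), (-1.0, 2.0), (0.5, -1.5)]
probs = [0.5, 0.25, 0.25]

mean = tuple(sum(p * x[i] for x, p in zip(points, probs)) for i in range(2))
lhs = math.hypot(*mean)                                       # |E(X)|
rhs = sum(p * math.hypot(*x) for x, p in zip(points, probs))  # E(|X|)
print(lhs, rhs, envelope(mean))
```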

Probabilistic proof. We have already seen this in a previous post. Let \( {{(X_n)}_{n\geq1}} \) be i.i.d. copies of \( {X} \). Since \( {\varphi} \) is convex, for any \( {n\geq1} \),

\[ \varphi\left(\frac{X_1+\cdots+X_n}{n}\right) \leq\frac{\varphi(X_1)+\cdots+\varphi(X_n)}{n}. \]

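Now, since \( {X} \) and \( {\varphi(X)} \) are integrable, the strong law of large numbers applies to both sides: almost surely, the argument of \( {\varphi} \) on the left-hand side tends to \( {\mathbb{E}(X)} \), and the right-hand side tends to \( {\mathbb{E}(\varphi(X))} \). It remains to pass to the limit for an \( {\omega} \) lying in the intersection of these two almost sure events (which has probability one, hence is necessarily not empty!). For the left-hand side, we also need the continuity of \( {\varphi} \), which follows from convexity.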
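The probabilistic proof can be watched in action with a simulation. The sketch below takes \( {X} \) exponential with parameter \( {1} \) and \( {\varphi(x)=x^2} \), so that the limits are \( {\varphi(\mathbb{E}(X))=1} \) and \( {\mathbb{E}(\varphi(X))=2} \); the sample size and the seed are arbitrary choices. The finite-\( {n} \) convexity inequality holds for every \( {n} \), while the two sides only approach their limits as \( {n} \) grows.

```python
import random


def phi(x):
    return x * x


def jensen_pairs(samples):
    """For n = 1, 2, ...: (phi of the empirical mean, empirical mean of phi)."""
    total = total_phi = 0.0
    pairs = []
    for n, x in enumerate(samples, start=1):
        total += x
        total_phi += phi(x)
        pairs.append((phi(total / n), total_phi / n))
    return pairs


rng = random.Random(2024)  # arbitrary fixed seed
samples = [rng.expovariate(1.0) for _ in range(200_000)]
pairs = jensen_pairs(samples)

# The finite-n convexity inequality holds for every n (up to rounding).
assert all(left <= right * (1 + 1e-9) for left, right in pairs)
left, right = pairs[-1]
print(left, right)  # should be close to phi(E(X)) = 1 and E(phi(X)) = 2
```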

Here again, the proof can be extended to the case where \( {\varphi} \) may take the value \( {+\infty} \), and to the multivariate case. This proof is very quick, but relies on the strong law of large numbers, which is a non-trivial result. One can use the weak law of large numbers instead, but the proof is then less beautiful. More generally, the law of large numbers is only used to produce an empirical probability measure which converges to the law of \( {X} \), and the randomness is a nuisance, not a feature.
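To illustrate that randomness is not needed, one may replace the i.i.d. sample by a deterministic one, for instance the quantiles \( {F^{-1}((k-1/2)/n)} \), \( {1\leq k\leq n} \), of the law of \( {X} \), whose empirical measure also converges to that law. This choice of points is ours, one option among many. The sketch uses the exponential law of parameter \( {1} \), for which \( {F^{-1}(u)=-\log(1-u)} \), and \( {\varphi(x)=x^2} \).

```python
import math


def exp1_quantile(u):
    """Quantile function of the exponential law of parameter 1: -log(1 - u)."""
    return -math.log1p(-u)


def quantile_sample(n, quantile):
    """Deterministic points quantile((k - 1/2)/n), k = 1, ..., n."""
    return [quantile((k - 0.5) / n) for k in range(1, n + 1)]


n = 100_000
xs = quantile_sample(n, exp1_quantile)
mean = sum(xs) / n                     # approximates E(X) = 1
mean_phi = sum(x * x for x in xs) / n  # approximates E(X^2) = 2
print(mean ** 2, mean_phi)             # Jensen for the uniform measure on xs
```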

Integrability. Let \( {X} \) be a real random variable and let \( {\varphi:\mathbb{R}\rightarrow\mathbb{R}} \) be a convex function. It turns out that if \( {X} \) is integrable then \( {\mathbb{E}(\varphi(X))} \) makes sense. Conversely, if \( {\varphi(X)} \) is integrable and \( {\varphi} \) is not constant then \( {\mathbb{E}(X)} \) makes sense.

To see it, let us assume that \( {X} \) is integrable, and let us show that \( {\varphi(X)_-} \) is integrable, in other words that \( {\mathbb{E}(\varphi(X))} \) always makes sense in \( {\mathbb{R}\cup\{+\infty\}} \). Since \( {\varphi} \) is convex, there exists an affine function \( {f} \) (possibly constant) such that \( {f\leq\varphi} \). Since \( {X} \) is integrable and \( {f} \) is affine, it follows that \( {f(X)} \) is integrable. Therefore, \( {\varphi(X)} \) admits an integrable lower bound. Let us define

\[ U:=f(X)\leq\varphi(X)=:V. \]

We have

\[ V_-=-V\mathbf{1}_{V\leq0}\leq -U\mathbf{1}_{V\leq0}\leq|U| \]

and therefore \( {V_-} \) is integrable, and thus \( {\mathbb{E}(V)} \) always has a meaning in \( {\mathbb{R}\cup\{+\infty\}} \).
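The value \( {+\infty} \) can indeed occur. Take \( {X} \) with \( {\mathbb{P}(X=k)=c/k^3} \) for \( {k\geq1} \), where \( {c=1/\zeta(3)} \), and \( {\varphi(x)=x^2} \). Then \( {\mathbb{E}(X)=c\pi^2/6<\infty} \) while \( {\mathbb{E}(X^2)=c\sum_{k\geq1}1/k=+\infty} \). This example is ours. The sketch below computes truncated sums up to \( {N} \): the first stabilizes while the second keeps growing like \( {c\log N} \).

```python
import math

# Normalising constant c = 1/zeta(3); zeta(3) is approximated by a partial sum
# (10^6 terms leave an error of order 1e-12, enough for this illustration).
zeta3 = sum(1.0 / k**3 for k in range(1, 1_000_001))
c = 1.0 / zeta3


def truncated_moments(N):
    """Partial sums over k <= N of E(X) and E(X^2), where P(X = k) = c / k^3."""
    m1 = sum(c / k**2 for k in range(1, N + 1))  # k * P(X = k) = c / k^2
    m2 = sum(c / k for k in range(1, N + 1))     # k^2 * P(X = k) = c / k
    return m1, m2


limit_m1 = c * math.pi**2 / 6  # exact value of E(X)
for N in (10**3, 10**4, 10**5):
    m1, m2 = truncated_moments(N)
    print(N, m1, m2)
```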

Conversely, let us assume that \( {\varphi(X)} \) is integrable and that \( {\varphi} \) is not constant. Since \( {\varphi} \) is convex and non-constant, we have \( {\lim_{x\rightarrow-\infty}\varphi(x)=+\infty} \) or \( {\lim_{x\rightarrow+\infty}\varphi(x)=+\infty} \). Let us show that

  1. \( {X_+} \) is integrable when \( {\lim_{x\rightarrow+\infty}\varphi(x)=+\infty} \) (example: \( {\varphi(x)=e^x} \));
  2. \( {X_-} \) is integrable when \( {\lim_{x\rightarrow-\infty}\varphi(x)=+\infty} \) (example: \( {\varphi(x)=e^{-x}} \));
  3. \( {X} \) is integrable when \( {\lim_{|x|\rightarrow+\infty}\varphi(x)=+\infty} \) (example: \( {\varphi(x)=|x|^p} \), \( {p\geq1} \)).

The last statement means \( {\mathrm{L}^{\varphi}\subset\mathrm{L}^1} \) when \( {\varphi} \) is convex with \( {\lim_{|x|\rightarrow+\infty}\varphi(x)=+\infty} \), where \( {\mathrm{L}^{\varphi}} \) is the set of random variables \( {X} \) such that \( {\varphi(X)} \) is integrable. Obviously, one may deduce the integrability of \( {X} \) from the integrability of \( {\varphi(X)} \) when \( {\varphi} \) is affine, except if \( {\varphi} \) is constant. More generally, if \( {\varphi} \) is convex and not constant, then there exists a non-constant affine function \( {f(x)=ax+b} \), \( {a\neq0} \), such that \( {f\leq\varphi} \), and in particular \( {aX+b\leq\varphi(X)} \). If \( {f} \) is increasing, i.e. \( {a>0} \) (such an \( {f} \) exists if and only if \( {\lim_{x\rightarrow+\infty}\varphi(x)=+\infty} \)), then \( {V:=(\varphi(X)-b)/a} \) is an integrable upper bound of \( {X} \), and thus \( {X_+} \) is integrable because

\[ X_+=X\mathbf{1}_{X\geq0}\leq V\mathbf{1}_{X\geq0}\leq|V|. \]

If \( {f} \) is decreasing, i.e. \( {a<0} \) (such an \( {f} \) exists if and only if \( {\lim_{x\rightarrow-\infty}\varphi(x)=+\infty} \)), then the same \( {V:=(\varphi(X)-b)/a} \) is an integrable lower bound of \( {X} \), since dividing by \( {a<0} \) reverses the inequality, and thus \( {X_-} \) is integrable because

\[ X_-=-X\mathbf{1}_{X\leq0}\leq-V\mathbf{1}_{X\leq0}\leq|V|. \]
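In the one-sided cases, only one-sided integrability can be expected. For instance, with \( {\varphi(x)=e^x} \) and \( {\mathbb{P}(X=-k)=c/k^2} \) for \( {k\geq1} \), where \( {c=6/\pi^2} \), the variable \( {e^X} \) is bounded by \( {e^{-1}} \), hence integrable, and \( {X_+=0} \), but \( {\mathbb{E}(X_-)=c\sum_{k\geq1}1/k=+\infty} \). This example is ours. The sketch checks the increasing affine lower bound \( {1+x\leq e^x} \) (the tangent at \( {0} \)) on the support and compares truncated sums.

```python
import math

c = 6 / math.pi**2  # normalising constant, since sum_{k>=1} 1/k^2 = pi^2/6
# Support of X: the points -k, k >= 1, with P(X = -k) = c / k^2.

# Affine lower bound of phi = exp: its tangent at 0, f(x) = 1 + x, is increasing,
# so V = exp(X) - 1 is an upper bound of X (and X_+ = 0 here anyway).
assert all(-k <= math.exp(-k) - 1 for k in range(1, 1000))


def partial_sums(N):
    """Partial sums over k <= N of E(exp(X)) and E(X_-)."""
    e_exp = sum(c * math.exp(-k) / k**2 for k in range(1, N + 1))
    e_neg = sum(c / k for k in range(1, N + 1))  # k * P(X = -k) = c / k
    return e_exp, e_neg


for N in (10**3, 10**4, 10**5):
    print(N, *partial_sums(N))
```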

Remark. Suppose that \( {X} \) is integrable and that we do not know if \( {\varphi(X)} \) is integrable or not. For every deterministic threshold \( {n\geq0} \), the function \( {\varphi_n=\max(-n,\varphi)} \) is convex as a maximum of convex functions. Since \( {\varphi_n\geq-n} \), the quantity \( {\mathbb{E}(\varphi_n(X))} \) makes sense in \( {[-n,+\infty]} \), and we obtain, say by using the envelope approach above,

\[ \varphi(\mathbb{E}(X)) \leq\varphi_n(\mathbb{E}(X)) \leq\mathbb{E}(\varphi_n(X)). \]

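We have \( {\varphi_n=\varphi_+-\min(n,\varphi_-)} \). Thus, if \( {\varphi_+(X)} \) is integrable, then \( {\mathbb{E}(\min(n,\varphi_-(X)))=\mathbb{E}(\varphi_+(X))-\mathbb{E}(\varphi_n(X))\leq\mathbb{E}(\varphi_+(X))-\varphi(\mathbb{E}(X))} \), and letting \( {n\rightarrow\infty} \), by monotone convergence, \( {\mathbb{E}(\varphi_-(X))\leq\mathbb{E}(\varphi_+(X))-\varphi(\mathbb{E}(X))<\infty} \), so that \( {\varphi(X)} \) is integrable.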
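The truncation identity and the chain of inequalities of this remark are easy to check numerically. In the sketch below, \( {\varphi(x)=x^2-3} \) and the three-point law of \( {X} \) are hypothetical choices, made so that \( {\varphi} \) takes negative values and the truncation at level \( {-n} \) is active.

```python
def phi(x):
    """A hypothetical convex function taking negative values."""
    return x * x - 3.0


def phi_n(x, n):
    """Truncation max(-n, phi), convex as a maximum of convex functions."""
    return max(-n, phi(x))


def pos(y):
    return max(y, 0.0)


def neg(y):
    return max(-y, 0.0)


# Identity phi_n = phi_+ - min(n, phi_-), checked on a grid of points.
grid = [k / 100 for k in range(-500, 501)]
for n in (0, 1, 2, 5):
    assert all(abs(phi_n(x, n) - (pos(phi(x)) - min(n, neg(phi(x))))) < 1e-12
               for x in grid)

# Hypothetical three-point law of X.
values = [-1.0, 0.0, 2.0]
probs = [0.25, 0.5, 0.25]
mean = sum(p * x for x, p in zip(values, probs))


def E_phi_n(n):
    """Exact expectation of phi_n(X)."""
    return sum(p * phi_n(x, n) for x, p in zip(values, probs))


for n in (0, 1, 2, 5):
    # phi(E(X)) <= phi_n(E(X)) <= E(phi_n(X))
    print(n, phi(mean), phi_n(mean, n), E_phi_n(n))
    assert phi(mean) <= phi_n(mean, n) <= E_phi_n(n)
```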
