
Libres pensées d'un mathématicien ordinaire

Log-Sobolev and Bakry-Émery

Leonard Gross (1931 -)

This post gathers the rough notes I prepared for a long informal talk given on January 3, 2023, at Paris-Dauphine, on log-Sobolev inequalities and the Bakry-Émery criterion. Many aspects are already covered in the Master 2 lecture notes written with Joseph Lehec.

The concept of logarithmic Sobolev inequality (LSI) was forged by Leonard Gross (1931 -) in 1975, as a reformulation of the hypercontractivity of a Markov semigroup. More precisely, if $(P_t)_{t\geq0}=(\mathrm{e}^{tL})_{t\geq0}$ is a Markov semigroup with infinitesimal generator $L$ and invariant probability measure $\mu$, and if $L$ is a diffusion or $\mu$ is reversible, then for every constant $c>0$,
\[
\|P_t(f)\|_{1+(p-1)\mathrm{e}^{4t/c}}\leq\|f\|_p,\quad \forall t\geq0,\forall p\geq1,\forall f,
\] if and only if
\[
\int f^2\log(f^2)\mathrm{d}\mu
\leq -c\int fLf\mathrm{d}\mu+\int f^2\mathrm{d}\mu\log\int f^2\mathrm{d}\mu\,\quad\forall f.
\] The name hypercontractivity comes from the fact that $1+(p-1)\mathrm{e}^{4t/c}>p$ if $t>0$ and $p>1$. The name LSI comes from an analogy with classical Sobolev inequalities. The logarithm in the LSI comes from hypercontractivity, as the derivative of the $L^p$ norm with respect to $p$:
\[
\partial_p\int |f|^p\mathrm{d}\mu=\int |f|^p\log|f|\mathrm{d}\mu.
\] The assumption of being a diffusion or reversible allows one to transform the LSI into a $p$-homogeneous statement by a simple power change of function. The hypercontractivity inequality is an equality at time $t=0$, and the LSI is its infinitesimal version, via $L=\partial_{t=0}P_t$. Conversely, the infinitesimal statement propagates to all times $t$, thanks to the semigroup (Markov) property of $(P_t)_{t\geq0}$ and the invariance of $\mu$.
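To see the derivative identity at work, here is a minimal numerical sketch on a finite probability space; the weights `mu` and values `f` are arbitrary illustrative choices (with $f\neq0$ so that $\log|f|$ is defined), not data from the talk. It compares a central finite difference in $p$ with the closed form $\int|f|^p\log|f|\mathrm{d}\mu$:

```python
import math

# Hypothetical finite probability space: weights mu and a test function f (f != 0).
mu = [0.2, 0.5, 0.3]
f = [0.5, 1.3, -2.0]

def moment(p):
    """p -> integral of |f|^p d(mu)."""
    return sum(w * abs(v) ** p for w, v in zip(mu, f))

def moment_derivative(p):
    """Closed form of the p-derivative: integral of |f|^p log|f| d(mu)."""
    return sum(w * abs(v) ** p * math.log(abs(v)) for w, v in zip(mu, f))

p, h = 2.0, 1e-6
finite_diff = (moment(p + h) - moment(p - h)) / (2 * h)   # central difference in p
print(finite_diff, moment_derivative(p))
```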

Still in the reversible Markovian context, and for diffusions, the LSI is also equivalent to the sub-exponential decay in time of the relative entropy along the dynamics, namely, denoting $f_t:=P_t(f)$ for $f\geq0$ with $\int f\mathrm{d}\mu=1$,
$$
\int f_t\log(f_t)\mathrm{d}\mu
\leq\mathrm{e}^{-4t/c}\int f_0\log(f_0)\mathrm{d}\mu,\quad\forall f\geq0,\forall t\geq0.
$$ The Boltzmannian H-theorem for the (linear) evolution equation $\partial_tf_t=Lf_t$ reads
$$\partial_t\int f_t\log(f_t)\mathrm{d}\mu=\int\log(f_t)Lf_t\mathrm{d}\mu\leq0$$ (the extra term $\int Lf_t\mathrm{d}\mu$ vanishes by invariance), while the sub-exponential decay above is the corresponding Cercignani theorem. In this context, beyond monotonicity, the Bakry-Émery criterion provides convexity in time, as well as exponential decay via the Grönwall lemma. It is this connection with kinetic theory and the Boltzmann PDE that brought Cédric Villani to the field in the late 1990s.

We know nowadays that the LSI is linked with information theory, functional analysis, analysis on manifolds, statistical mechanics, harmonic analysis, analysis of PDE, stochastic processes, free probability, high dimensional probability, high dimensional statistics, among other fields. Inspired by Cédric Villani, a historical note by Michel Ledoux gathers more than fifteen proofs of the Gaussian LSI.

Leonard Gross should not be confused with the physicist David J. Gross (1941 — ), the theoretical physicist Eugene P. Gross (1926 — 1991), the mathematicians Benedict Hyman Gross (1950 — ) and Mark Gross (1965 – his son!), among other big Gross.

The Gaussian LSI had already been studied: in its Lebesgue form in 1959 by the information theorist Aart Johannes Stam (1929 - 2020), in 1969 by the mathematical physicist Paul Gerard Federbush (1934 - ), an academic grandson of Enrico Fermi and the academic grandfather of Roland Bauerschmidt, and in 1975 by William G. Faris (1939 - ).

In the 1980s, the LSI and related functional inequalities were also studied by Paul-André Meyer and Dominique Bakry, Michel Émery, as well as Michel Ledoux, Daniel Stroock, Oscar Rothaus, … In the 1990s, came Laurent Miclo, William Beckner, Laurent Saloff-Coste, Persi Diaconis, …, but also, for statistical mechanics, Horng-Tzer Yau, Boguslaw Zegarlinski, Fabio Martinelli, Thierry Bodineau, Bernard Helffer, … In the 2000s, came Sergey Bobkov, Cédric Villani, Liming Wu, Patrick Cattiaux, Arnaud Guillin,…

Paul Gerard Federbush (1934 -)

Variance, entropies, Fisher.
\begin{align*}
\mathrm{Var}_{\mu}(f)&=\int f^2\mathrm{d}\mu-\left(\int f\mathrm{d}\mu\right)^2=\mathrm{Var}(f(X))\quad\text{where $X\sim\mu$}\\
\mathrm{Ent}_{\mu} (f)&=\int f \log f\mathrm{d}\mu - \left( \int f\mathrm{d}\mu \right)\log \left( \int f\mathrm{d}\mu \right),\quad f\geq0\\
\mathrm{Ent}_{\mu}^\Phi(f)&=\int\Phi(f)\mathrm{d}\mu-\Phi\Bigl(\int f\mathrm{d}\mu\Bigr) =\mathbb{E}(\Phi(f(X)))-\Phi(\mathbb{E}(f(X)))\\
\Phi(u)&=u^2, u\in\mathbb{R}\quad\text{gives}\quad\mathrm{Ent}^\Phi_\mu=\mathrm{Var}_\mu\\
\Phi(u)&=u\log(u), u\geq0\quad\text{gives}\quad\mathrm{Ent}^\Phi_\mu=\mathrm{Ent}_\mu
\end{align*} To make the entropy $2$-homogeneous like the variance, one considers $\mathrm{Ent}_{\mu}(f^2)$.
Kullback-Leibler divergence (relative entropy) : $\mathrm{H}(\nu\mid\mu):=\mathrm{Ent}_\mu(f)=\int f\log f\mathrm{d}\mu\geq0$, where $\mathrm{d}\nu:=f\mathrm{d}\mu$ is a probability measure
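On a finite state space all these quantities reduce to finite sums. The following sketch, assuming a hypothetical three-point space and the convention $0\log0=0$, evaluates $\mathrm{Ent}^\Phi_\mu$ for $\Phi(u)=u^2$ and $\Phi(u)=u\log(u)$, and checks that the Kullback-Leibler divergence of $\mathrm{d}\nu=g\mathrm{d}\mu$ with respect to $\mu$ coincides with $\mathrm{Ent}_\mu(g)$ when $\int g\mathrm{d}\mu=1$:

```python
import math

def ent_phi(phi, f, mu):
    """Ent^Phi_mu(f) = sum mu Phi(f) - Phi(sum mu f) on a finite space."""
    mean = sum(w * v for w, v in zip(mu, f))
    return sum(w * phi(v) for w, v in zip(mu, f)) - phi(mean)

def square(u):
    return u * u

def u_log_u(u):
    return u * math.log(u) if u > 0 else 0.0   # convention 0 log 0 = 0

mu = [0.25, 0.25, 0.5]      # hypothetical reference probability
f = [1.0, 3.0, 0.5]         # hypothetical nonnegative test function

var = ent_phi(square, f, mu)     # Var_mu(f)
ent = ent_phi(u_log_u, f, mu)    # Ent_mu(f)

# Normalize f into a probability density g = dnu/dmu and compute the KL divergence directly.
mass = sum(w * v for w, v in zip(mu, f))
g = [v / mass for v in f]
nu = [w * v for w, v in zip(mu, g)]
kl = sum(n * math.log(n / w) for n, w in zip(nu, mu) if n > 0)
print(var, ent, kl, ent_phi(u_log_u, g, mu))
```

Note that $\mathrm{Ent}_\mu$ is $1$-homogeneous, $\mathrm{Ent}_\mu(\lambda f)=\lambda\mathrm{Ent}_\mu(f)$ for $\lambda>0$, so normalizing $f$ only rescales the entropy.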
On $\mathbb{R}^n$ relation to statistical physics/mechanics :

  • Boltzmann (or Shannon) entropy and Boltzmann-Gibbs probability measure
    \[
    \mathrm{S}(\nu):=-\int\frac{\mathrm{d}\nu}{\mathrm{d}x}\log\frac{\mathrm{d}\nu}{\mathrm{d}x}\mathrm{d}x
    \quad\text{and}\quad
    \mu_\beta:=\frac{1}{Z_\beta}\mathrm{e}^{-\beta
    V}\mathrm{d}x
    \]
  • Maximum entropy at fixed average energy
    \[
    \mathrm{S}(\mu_\beta)-\mathrm{S}(\nu)=\mathrm{H}(\nu\mid\mu_\beta)\geq0 \quad\text{if}\quad \int V\mathrm{d}\mu_\beta=\int V\mathrm{d}\nu
    \]
  • Minimum Helmholtz free energy via penalization
    \begin{align*}
    \mathrm{F}(\nu)&:=\int V\mathrm{d}\nu-\frac{1}{\beta}\mathrm{S}(\nu)\\
    \mathrm{F}(\nu)-\mathrm{F}(\mu_\beta) &= \frac{1}{\beta}\mathrm{H}(\nu\mid\mu_\beta)\geq0\\
    \mathrm{F}(\mu_\beta)&=-\frac{1}{\beta}\log Z_\beta.
    \end{align*}
  • Fisher information (statistics, information theory) $\mathrm{d}\nu=f\mathrm{d}\mu$,
    \[
    \mathrm{I}(\nu\mid\mu)
    =\int\frac{|\nabla f|^2}{f}\mathrm{d}\mu
    =\int|\nabla\log f|^2\mathrm{d}\nu.
    \]
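The Gibbs identities above survive verbatim in a discrete analogue. The sketch below assumes a finite state space with counting measure in place of $\mathrm{d}x$; the energies `V`, the inverse temperature `beta`, and the trial law `nu` are illustrative values. It checks $\mathrm{F}(\nu)-\mathrm{F}(\mu_\beta)=\frac{1}{\beta}\mathrm{H}(\nu\mid\mu_\beta)$ and $\mathrm{F}(\mu_\beta)=-\frac{1}{\beta}\log Z_\beta$:

```python
import math

V = [0.0, 0.7, 1.5, 3.0]        # illustrative energy levels on a 4-point space
beta = 2.0                      # illustrative inverse temperature

Z = sum(math.exp(-beta * v) for v in V)
mu_beta = [math.exp(-beta * v) / Z for v in V]    # Boltzmann-Gibbs law

def S(p):
    """Boltzmann-Shannon entropy with respect to counting measure."""
    return -sum(q * math.log(q) for q in p if q > 0)

def F(p):
    """Helmholtz free energy: mean energy minus S / beta."""
    return sum(q * v for q, v in zip(p, V)) - S(p) / beta

def H(p, m):
    """Relative entropy H(p | m)."""
    return sum(q * math.log(q / r) for q, r in zip(p, m) if q > 0)

nu = [0.4, 0.3, 0.2, 0.1]        # an arbitrary trial probability vector
gap = F(nu) - F(mu_beta)
print(gap, H(nu, mu_beta) / beta, F(mu_beta), -math.log(Z) / beta)
```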

Poincaré and log-Sobolev inequalities.
Here on $\mathbb{R}^n$. For a class $\mathcal{F}$ of test functions $\mathbb{R}^n\to\mathbb{R}$, $\exists c<\infty$, $\forall f\in\mathcal{F}$,
\begin{align*}
\mathrm{Var}_\mu (f)
&\leq c_{\mathrm{PI}}\int|\nabla f|^2\mathrm{d}\mu\\
\mathrm{Ent}_\mu(f^2)
&\leq c_{\mathrm{LSI}}\int|\nabla f|^2\mathrm{d}\mu\\
\mathrm{Ent}_\mu(f)
&\leq \frac{c_{\mathrm{LSI}}}{4}\int\frac{|\nabla f|^2}{f}\mathrm{d}\mu=\frac{c_{\mathrm{LSI}}}{4}\int|\nabla \log(f)|^2f\mathrm{d}\mu=c_{\mathrm{LSI}}\int|\nabla\sqrt{f}|^2\mathrm{d}\mu\\
\mathrm{H}(\nu\mid\mu)
&\leq\frac{c_{\mathrm{LSI}}}{4}\mathrm{I}(\nu\mid\mu)
\quad\text{(Villani notation, above is Bakry-Ledoux)}
\end{align*}

  • The best (optimal) constant is the smallest one; it is infinite if no finite constant works, and it depends on $\mu$ and $\mathcal{F}$
  • Right-hand side. The term $\int|\nabla f|^2\mathrm{d}\mu$ is often called energy by Bakry-Ledoux. This matches potential theory (Coulomb energy, via the squared electric field, "carré du champ électrique"), Riemannian geometry (geodesics), and quantum mechanics. Bakry, Émery, and Ledoux, not especially versed in kinetic theory, information theory, or quantum mechanics, were however interested in geometric functional analysis, statistical mechanics, and some aspects of non-kinetic statistical physics.
  • Beyond $\mathbb{R}^n$ : replace $\nabla$ by analogues: discrete gradient, Malliavin, etc.
  • Discrete space. There is no chain rule, hence no equivalence between the $1$-homogeneous and $2$-homogeneous forms of the LSI; this leads to several modified LSIs, which can sometimes be compared by restricting the class $\mathcal{F}$ of test functions.
  • Markov. If $\mu$ is the invariant law of a Markov process with generator $L$, then replace $\int|\nabla f|^2\mathrm{d}\mu$ by
    \[
    -\int fLf\mathrm{d}\mu\quad(\text{Dirichlet form})
    \] Conversely, if one can interpret the right-hand side of the LSI as a Dirichlet form (the quadratic-form analogue of an unbounded linear operator), then this leads to an operator that one can try to interpret as a Markov generator. For instance
    \[
    \int|\nabla f|^2\mathrm{d}x=-\int f\Delta f\mathrm{d}x
    \] is typically associated, via integration by parts, with the Laplacian, hence with Brownian motion, with boundary conditions related to the class of test functions $\mathcal{F}$ used for LSI (or PI).
  • Linearisation. $\mathrm{Ent}_\mu((1+\varepsilon f)^2)=2\varepsilon^2\mathrm{Var}_\mu(f)+o(\varepsilon^2)$ and $\int|\nabla(1+\varepsilon f)|^2\mathrm{d}\mu=\varepsilon^2\int|\nabla f|^2\mathrm{d}\mu$ give
    \[
    c_{\mathrm{PI}}\leq\frac{1}{2}c_{\mathrm{LSI}}
    \] Since PI is simpler than LSI and necessary for it, always try to prove PI first.
  • The optimal constants in PI and LSI are not always achieved by some $f$ (Rothaus alternative)
  • PI $\Leftrightarrow$ spectral gap of Laplacian type operator, eigenvalues
  • Functional inequalities. PI and LSI, Sobolev functional spaces embeddings.
    $$\begin{align*}\mathbb{E}_\mu(\Phi(f))&\leq \Phi(\mathbb{E}_\mu(f))+c\mathbb{E}_\mu(|\nabla f|^2)\\
    \Phi(f)\in L^1(\mu)&\Leftarrow f\in L^1(\mu)\text{ and }|\nabla f|^2\in L^1(\mu)\end{align*}$$
  • Perturbation. If $\mathrm{d}\mu_B=\mathrm{e}^B\mathrm{d}\mu$ then $c_{\mathrm{LSI}}(\mu_B)\leq\mathrm{e}^{\|B\|_{\mathrm{osc}}}c_{\mathrm{LSI}}(\mu)$ (Holley-Stroock)
  • One dimensional case. Characterization by Hardy type inequalities (Muckenhoupt) : probability on $\mathbb{R}$ with density $\propto\mathrm{e}^{-c|x|^\alpha}$ satisfies PI if $\alpha\geq1$ and LSI if $\alpha\geq2$.
  • Disconnected support. $c_{\mathrm{PI}}(\mu)=c_{\mathrm{LSI}}(\mu)=\infty$ if the support of $\mu$ is not connected:
    take a non-constant $f$ which is constant on each connected component, so that the right-hand side vanishes while the left-hand side does not
  • Probabilistic functional analysis, analysis and geometry of Markov operators, geometric functional analysis. Not always related to Markov/PDE/Dynamics.
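The linearization can be checked numerically: on a hypothetical finite probability space, the ratio $\mathrm{Ent}_\mu((1+\varepsilon f)^2)/\varepsilon^2$ should approach $2\mathrm{Var}_\mu(f)$ as $\varepsilon\to0$, which is what yields $c_{\mathrm{PI}}\leq\frac{1}{2}c_{\mathrm{LSI}}$ once compared with $\int|\nabla(1+\varepsilon f)|^2\mathrm{d}\mu=\varepsilon^2\int|\nabla f|^2\mathrm{d}\mu$. A minimal sketch, with arbitrary weights and values:

```python
import math

# Hypothetical finite probability space and test function.
mu = [0.1, 0.4, 0.3, 0.2]
f = [1.0, -0.5, 2.0, 0.3]

def mean(g):
    return sum(w * v for w, v in zip(mu, g))

def ent(g):
    """Ent_mu(g) for a positive function g."""
    m = mean(g)
    return mean([v * math.log(v) for v in g]) - m * math.log(m)

var_f = mean([v * v for v in f]) - mean(f) ** 2
ratios = {}
for eps in (1e-1, 1e-2, 1e-3):
    g = [(1 + eps * v) ** 2 for v in f]       # (1 + eps f)^2 > 0 for these eps
    ratios[eps] = ent(g) / eps**2             # should approach 2 Var_mu(f)
    print(eps, ratios[eps], 2 * var_f)
```

The error of the ratio is of order $\varepsilon$, coming from the third-order terms of the expansion.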

Concentration of measure for Lipschitz functions. LSI is a (sub-)Gaussian statement, in which $c_{\mathrm{LSI}}$ plays the role of the norm of the covariance matrix.

  • Laplace transform of $F:\mathbb{R}^n\to\mathbb{R}$ for $\mu$, sub-Gaussian bound from LSI:
    \[
    L(\theta):=\int\mathrm{e}^{\theta F}\mathrm{d}\mu, \quad \log
    L(\theta)\leq\theta^2\frac{c_{\mathrm{LSI}}\Vert F\Vert_{\mathrm{Lip}}^2}{4}+\theta\int
    F\mathrm{d}\mu,\quad\forall \theta\in\mathbb{R}
    \]
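  • Proof (Herbst): taking $f=\mathrm{e}^{\theta F}$ in the $1$-homogeneous form $\mathrm{Ent}_\mu(f)\leq\frac{c_{\mathrm{LSI}}}{4}\int\frac{|\nabla f|^2}{f}\mathrm{d}\mu$ of the LSI for $\mu$, and using $|\nabla F|\leq\Vert F\Vert_{\mathrm{Lip}}$, gives
    \[
    \theta L'(\theta)-L(\theta)\log
    L(\theta)\leq\frac{c_{\mathrm{LSI}}\Vert F\Vert_{\mathrm{Lip}}^2}{4}\theta^2L(\theta),
    \quad L(0)=1.
    \] Hence $K(\theta):=\frac{1}{\theta}\log L(\theta)$ satisfies $K'(\theta)\leq\frac{c_{\mathrm{LSI}}\Vert F\Vert_{\mathrm{Lip}}^2}{4}$ and $K(0^+)=\int F\mathrm{d}\mu$, which integrates into the bound above for $\theta>0$ (apply it to $-F$ for $\theta<0$).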
  • By the Markov inequality for $\mathrm{e}^{\theta(F-\int F\mathrm{d}\mu)}$, optimized in $\theta$ (Chernoff), for all $Z\sim\mu$, $r\geq0$,
    \[
    \mathbb{P}(|F(Z)-\mathbb{E}F(Z)|\geq r)\leq2\exp\left(-\frac{r^2}{c_{\mathrm{LSI}}\Vert F\Vert_{\mathrm{Lip}}^2}\right).
    \]
  • For $f\geq0$ with $\int f\mathrm{d}\mu=1$, $\mathrm{Ent}_\mu$ is the Legendre transform of the log-Laplace transform in the sense that
    \[
    \mathrm{Ent}_\mu(f)
    =\sup_g\Bigl\{\int\!fg\,\mathrm{d}\mu
    -\log\int\mathrm{e}^g\,\mathrm{d}\mu\Bigr\},
    \] and conversely (convex duality)
    \[
    \sup_{g\geq0\atop\int g\,\mathrm{d}\mu=1}\Bigl\{\int fg\,\mathrm{d}\mu-\mathrm{Ent}_\mu(g)\Bigr\}
    =\log\int\mathrm{e}^f\,\mathrm{d}\mu.
    \] $\to$ Concentration of measure, transportation of measure, large deviations,
    $\to$ Hopf-Lax infimum convolution solution of Hamilton-Jacobi equations.
  • Consequence : no LSI for the exponential law or the Poisson law (their tails are not sub-Gaussian), $\to$ modified inequalities.
  • Roughly, LSI = sub-Gaussian tails at $\infty$, smoothness, and connected support.
    Sub-Gaussian concentration implies LSI if the curvature is bounded from below (Wang)
  • If $X_1,\dots,X_N$ are iid $\sim\mu$ and $f:\mathbb{R}^n\to\mathbb{R}$ is Lipschitz, then, for all $r\geq0$,
    \[
    \mathbb{P}\left(
    \left|\frac{f(X_1)+\cdots+f(X_N)}{N}-\mathbb{E}f(X_1)\right|\geq r\right)
    \leq 2\exp\left(-\frac{Nr^2}{c_{\mathrm{LSI}}(\mu^{\otimes N})\Vert
    f\Vert_{\mathrm{Lip}}^2}\right).
    \] Actually, tensorization below gives: $c_{\mathrm{LSI}}(\mu^{\otimes N})\leq c_{\mathrm{LSI}}(\mu)$.
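For the standard Gaussian, $c_{\mathrm{LSI}}=2$, so the Herbst bound reads $\log\int\mathrm{e}^{\theta F}\mathrm{d}\mu\leq\frac{\theta^2}{2}\Vert F\Vert_{\mathrm{Lip}}^2+\theta\int F\mathrm{d}\mu$. The sketch below checks it for the illustrative $1$-Lipschitz choice $F(x)=|x|$ on $\mathbb{R}$, with a plain trapezoidal quadrature (the truncation to $[-12,12]$ and the grid size are assumptions, ample here), together with the resulting two-sided tail bound, computed exactly via `math.erf` and `math.erfc`:

```python
import math

C_LSI = 2.0   # optimal LSI constant of the standard Gaussian N(0,1)
LIP = 1.0     # F(x) = |x| is 1-Lipschitz (illustrative choice of F)

def gauss_expect(g, a=-12.0, b=12.0, n=24000):
    """Trapezoidal approximation of E g(X) for X ~ N(0,1); the grid contains 0."""
    h = (b - a) / n
    total = 0.0
    for k in range(n + 1):
        x = a + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * g(x) * math.exp(-x * x / 2)
    return total * h / math.sqrt(2 * math.pi)

mean_F = math.sqrt(2 / math.pi)   # E|X| for X ~ N(0,1)
rows = []
for theta in (-2.0, -0.5, 0.0, 0.5, 2.0):
    log_laplace = math.log(gauss_expect(lambda x: math.exp(theta * abs(x))))
    herbst = theta**2 * C_LSI * LIP**2 / 4 + theta * mean_F
    rows.append((theta, log_laplace, herbst))
    print(theta, log_laplace, herbst)

def two_sided_tail(r):
    """P(| |X| - E|X| | >= r) for X ~ N(0,1), computed exactly."""
    upper = math.erfc((mean_F + r) / math.sqrt(2))                         # P(|X| >= E|X| + r)
    lower = math.erf((mean_F - r) / math.sqrt(2)) if r < mean_F else 0.0   # P(|X| <= E|X| - r)
    return upper + lower

for r in (0.5, 1.0, 2.0, 3.0):
    print(r, two_sided_tail(r), 2 * math.exp(-r * r / (C_LSI * LIP**2)))
```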
Sergey Bobkov (1961 -- )

Tensorization of PI/LSI via sub-additivity of entropies and additivity of gradient.

  • All these are equivalent for a convex $\Phi$, and valid for $\Phi(u)=u^2$ and $\Phi(u)=u\log(u)$ :
    1. Convexity : $(u,v)\mapsto\Phi''(u)v^2$ is convex
    2. Jensen sub-commutation : $\forall\mu_1,\mu_2$, $\forall f$, (just Cauchy-Schwarz for PI!)
      \[
      \mathrm{Ent}^\Phi_{\mu_2}(\mathbb{E}_{\mu_1}(f))\leq\mathbb{E}_{\mu_1}(\mathrm{Ent}^\Phi_{\mu_2}(f))
      \]
    3. Sub-additivity : $\forall n$, $\forall\mu_1,\ldots,\mu_n$, $\forall f$,
      \[
      \mathrm{Ent}^\Phi_{\mu_1\otimes\cdots\otimes\mu_n}(f)
      \leq\sum_{i=1}^n\mathbb{E}_{\mu_1\otimes\cdots\otimes\mu_n}(\mathrm{Ent}^\Phi_{\mu_i}(f))
      \]
    4. Functional convexity : $\forall\mu$,
      \[
      f\mapsto\mathrm{Ent}^\Phi_\mu(f)
      \quad\text{is convex}
      \]
    5. Variational formula : $\forall \mu$, $\forall f$,
      \[
      \mathrm{Ent}^\Phi_\mu(f)=\sup_g\{\mathbb{E}_\mu((\Phi'(g)-\Phi'(\mathbb{E}_\mu
      g))(f-g))+\mathrm{Ent}^\Phi_\mu(g)\}
      \]

    Heritage of Boltzmann, Shannon, Bobkov, Latała-Oleszkiewicz, among others.

  • PI/LSI tensorization (dimension free) via $\mathrm{Ent}^\Phi$ sub-additivity and $|\nabla f|^2$ additivity
    \begin{align*}
    c_{\mathrm{PI}}(\mu_1\otimes\cdots\otimes\mu_n)
    &\leq\max_{1\leq i\leq n}c_{\mathrm{PI}}(\mu_i)
    \quad\text{and}\quad
    c_{\mathrm{LSI}}(\mu_1\otimes\cdots\otimes\mu_n)
    \leq\max_{1\leq i\leq n}c_{\mathrm{LSI}}(\mu_i)\\
    c_{\mathrm{PI}}(\mu^{\otimes n})
    &\leq c_{\mathrm{PI}}(\mu)
    \quad\text{and}\quad
    c_{\mathrm{LSI}}(\mu^{\otimes n})
    \leq c_{\mathrm{LSI}}(\mu)
    \end{align*} The same holds for $\Phi(u)=u^p$ with $1\leq p\leq 2$ (Beckner inequalities): Beckner, Latała-Oleszkiewicz, Arnold-Markowich-Toscani-Unterreiter, Dolbeault, etc.
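Sub-additivity is easy to test on a product of two finite spaces. The sketch below, with hypothetical marginals `mu1`, `mu2` and a random positive $f(x_1,x_2)$ (seeded for reproducibility), compares $\mathrm{Ent}^\Phi_{\mu_1\otimes\mu_2}(f)$ with the sum of the averaged one-coordinate entropies, for $\Phi(u)=u\log(u)$ and for $\Phi(u)=u^2$ (the latter being the Efron-Stein inequality):

```python
import math
import random

def ent_phi(phi, vals, weights):
    """Ent^Phi(f) = sum w Phi(f) - Phi(sum w f) for a finite probability vector w."""
    m = sum(w * v for w, v in zip(weights, vals))
    return sum(w * phi(v) for w, v in zip(weights, vals)) - phi(m)

def u_log_u(u):
    return u * math.log(u)

def square(u):
    return u * u

random.seed(0)
mu1 = [0.2, 0.3, 0.5]                              # hypothetical marginals
mu2 = [0.1, 0.2, 0.3, 0.4]
f = [[random.uniform(0.1, 5.0) for _ in mu2] for _ in mu1]   # f(x1, x2) > 0

def subadditivity(phi):
    """Return (Ent^Phi under mu1 x mu2, sum of averaged one-coordinate Ent^Phi)."""
    pairs = [(i, j) for i in range(len(mu1)) for j in range(len(mu2))]
    lhs = ent_phi(phi, [f[i][j] for i, j in pairs], [mu1[i] * mu2[j] for i, j in pairs])
    # entropy in x1 (x2 frozen), averaged over x2
    in_x1 = sum(mu2[j] * ent_phi(phi, [f[i][j] for i in range(len(mu1))], mu1)
                for j in range(len(mu2)))
    # entropy in x2 (x1 frozen), averaged over x1
    in_x2 = sum(mu1[i] * ent_phi(phi, f[i], mu2) for i in range(len(mu1)))
    return lhs, in_x1 + in_x2

for phi in (u_log_u, square):
    print(phi.__name__, subadditivity(phi))
```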

From $\{-1,1\}$ to Gaussians via tensorization and the CLT.

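  • On the two-point space $\{\pm1\}$ with uniform measure, and $|\nabla f|^2$ replaced by $\bigl(\frac{f(1)-f(-1)}{2}\bigr)^2$, everything is elementary; set $a=f(-1)$, $b=f(1)$. Poincaré is then an equality, $\mathrm{Var}(f)=\frac{(a-b)^2}{4}$, and the LSI with constant $2$ reads: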
    \[
    \frac{a^2\log(a^2)+b^2\log(b^2)}{2}-\frac{a^2+b^2}{2}\log\frac{a^2+b^2}{2}
    \leq\frac{(a-b)^2}{2}.
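    \] By homogeneity (take $a^2+b^2=2$ and $u=a^2$), and since replacing $f$ by $|f|$ does not increase the right-hand side (so that $a,b\geq0$), this reduces to the even simpler inequality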
    \[
    u\log(u)+(2-u)\log(2-u)\leq (\sqrt{u}-\sqrt{2-u})^2,\quad 0\leq u\leq 2.
    \]
  • From uniform on cube $\{-1,1\}^n$ to Gaussian on $\mathbb{R}$ via CLT
    and tensorization $\frac{x_1+\cdots+x_n}{\sqrt{n}}$
    \begin{align*}
    c_{\mathrm{LSI}}(\mathcal{N}(0,1))
    &=2\quad\text{achieved by $f(x)=\mathrm{e}^{\lambda x}$}\\
    c_{\mathrm{PI}}(\mathcal{N}(0,1))
    &=1\quad \text{achieved by $f(x)=\lambda
    x$, also via orthogonal Hermite
    polynomials}\\
    c_{\mathrm{PI}}(\mathcal{N}(0,1))
    &=\frac{1}{2}c_{\mathrm{LSI}}(\mathcal{N}(0,1))
    \end{align*}
  • By tensorization again: for all $n\geq1$,
    \begin{align*}
    c_{\mathrm{LSI}}(\mathcal{N}(0,I_n))
    &=c_{\mathrm{LSI}}(\mathcal{N}(0,1)^{\otimes n})=2\quad\text{achieved by
    $f(x)=\mathrm{e}^{\lambda\cdot x}$}\\
    c_{\mathrm{PI}}(\mathcal{N}(0,I_n))
    &=1\quad\text{achieved by
    $f(x)=\lambda\cdot x$}
    \end{align*}
  • Dimension-free : Wiener measure, loop spaces, Lie groups.
  • Brascamp-Lieb and log-concavity
    \begin{align*}
    \mathrm{Ent}_{\mathcal{N}(m,K)}(f^2)
    &\leq2\mathbb{E}_{\mathcal{N}(m,K)}(\langle
    K\nabla f,\nabla
    f\rangle)\leq2\lambda_{\max}(K)\mathbb{E}_{\mathcal{N}(m,K)}(|\nabla
    f|^2)\\
    \mathrm{Var}_{\mathrm{e}^{-V}}(f)
    &\leq\mathbb{E}_{\mathrm{e}^{-V}}(\langle
    (\mathrm{Hess}V)^{-1}\nabla f,\nabla
    f\rangle)\leq\frac{1}{\rho}\mathbb{E}_{\mathrm{e}^{-V}}(|\nabla
    f|^2)\quad\text{if $\mathrm{Hess}V(x)\geq\rho I_n$ $\forall x$}\\
    c_{\mathrm{LSI}}
    &\approx\|\mathrm{Cov}\|\quad(\text{Gaussian analogy})
    \end{align*} See also Hörmander and Helffer-Sjöstrand, as well as the KLS conjecture.
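The reduced two-point inequality $u\log(u)+(2-u)\log(2-u)\leq(\sqrt{u}-\sqrt{2-u})^2$ can at least be sanity-checked on a fine grid of $u\in[0,2]$. This is a numerical check, not a proof: equality holds at $u=1$, and a Taylor expansion shows that the difference of the two sides behaves like $-(u-1)^4/12$ nearby.

```python
import math

def xlogx(x):
    """x log x with the convention 0 log 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def lhs(u):
    return xlogx(u) + xlogx(2 - u)

def rhs(u):
    return (math.sqrt(u) - math.sqrt(2 - u)) ** 2

grid = [2 * k / 10000 for k in range(10001)]
worst = max(lhs(u) - rhs(u) for u in grid)   # should be <= 0 (up to rounding)
print(worst)
```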
Cédric Villani (1973 -- )

Markov and Langevin.

  • ${(X_t)}_{t\geq0}$, state space $E$
  • Markov semi-group $P_t f(x)=\mathbb{E}(f(X_t)\mid X_0=x)$
    \[
    P_0=\mathrm{id},\quad
    P_{s+t}=P_s\circ P_t,\quad
    P_t \mathbf 1= \mathbf 1,\quad
    f\geq 0\Rightarrow P_t f \geq 0.
    \]
  • Infinitesimal generator (BM: $L=\Delta$, OU: $L=\Delta-x\cdot\nabla$)
    \[
    P_t=\mathrm{e}^{tL},\quad \partial_tP_t=LP_t=P_tL
    \]
  • Left operator on $\mu$ and right operator on $f$ :
    \[
    \mu P_t f=\mathbb{E}(f(X_t)),\quad X_0\sim\mu.
    \]
  • Invariance : if $X_0\sim\mu\Rightarrow X_t\sim\mu$ $\forall t$,
    \[
    \mu P_t=\mu\quad\forall t,\quad \mu L=0
    \]
  • Reversibility : $X_0\sim\mu\Rightarrow
    {(X_s)}_{s\in[0,t]}\overset{\mathrm{d}}{=}{(X_{t-s})}_{s\in[0,t]}$ $\forall
    t$.
    \begin{eqnarray*}
    \mu_t
    &=&\mathrm{Law}(X_t)=\mu_0P_t,\quad
    f_t=\frac{\mathrm{d}\mu_t}{\mathrm{d}\mu}=P_tf_0\\
    \partial_tf_t
    &=&Lf_t\quad\text{(Fokker-Planck, Chapman-Kolmogorov)}
    \end{eqnarray*}
  • Reversibility & integration by parts via $P_t=\mathrm{id}+tL+o(t)$ ($L$ self-adjoint in $L^2(\mu)$)
    \[
    \int fLg\mathrm{d}\mu=\int gLf\mathrm{d}\mu, \quad \forall f,g
    \]
  • Diffusion property (replaces IBP for $P_t$ and implies IBP under $\mu$)
    \[
    L\phi(f)=\phi'(f)Lf+\phi''(f)|\nabla f|^2
    \] Integration by parts when diffusion
    \[
    -\int fLg\mathrm{d}\mu
    =\int\nabla f\cdot\nabla g\mathrm{d}\mu\quad\text{(integration by parts)}
    \]
  • (Overdamped) Langevin reversible diffusion process for potential $V:\mathbb{R}^d\to\mathbb{R}$
    \begin{align*}
    X_t&=X_0-\int_0^t\nabla V(X_s)\mathrm{d}s+\sqrt{2}B_t\quad(\text{ODE
    with noise})\\
    L&=\Delta-\nabla V\cdot\nabla,\quad \mu\propto\mathrm{e}^{-V}
    \end{align*} We focus on Langevin for simplicity in the sequel.
    Gaussian case (Ornstein-Uhlenbeck): $V=\rho\frac{\left|\cdot\right|^2}{2}$, $\rho\geq0$, $\rho=0$ for BM
    Many things work for general Markov, some aspects are specific to diffusions.
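The overdamped Langevin dynamics is straightforward to simulate. The following sketch uses an Euler-Maruyama discretization (an assumption of this illustration; the time step introduces a small bias in the invariant law) in the Gaussian case $V(x)=\rho\frac{x^2}{2}$ in dimension one, and checks that a cloud of particles started at $x=3$ relaxes to $\mu=\mathcal{N}(0,1/\rho)$. It relies on NumPy's `default_rng`:

```python
import numpy as np

def langevin_euler(grad_V, x0, dt, n_steps, rng):
    """Euler-Maruyama for dX_t = -grad V(X_t) dt + sqrt(2) dB_t, one coordinate per particle."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - grad_V(x) * dt + np.sqrt(2 * dt) * rng.standard_normal(x.shape)
    return x

rho = 2.0                                    # Ornstein-Uhlenbeck case V(x) = rho x^2 / 2
rng = np.random.default_rng(0)
samples = langevin_euler(lambda x: rho * x, x0=np.full(20000, 3.0),
                         dt=1e-2, n_steps=1000, rng=rng)
print(samples.mean(), samples.var(), 1 / rho)   # invariant law N(0, 1/rho)
```

For this linear drift the discrete chain has stationary variance $\frac{1}{\rho(1-\rho\,\mathrm{d}t/2)}$, so the bias is of order $\mathrm{d}t$.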

Entropy decay and Markov LSI.

  • Monotonicity (second part with diffusion IBP since Langevin)
    \[
    \partial_t\mathrm{Ent}^\Phi_\mu(P_tf)
    =\int\Phi'(P_tf)LP_tf\mathrm{d}\mu\leq0
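    \quad\Bigl(=-\int\Phi''(P_tf)|\nabla P_tf|^2\mathrm{d}\mu\Bigr).
    \] For the inequality: by Jensen, $\Phi(P_tf)-P_t(\Phi(f))\leq0$, with equality at $t=0$, hence $\Phi'(f)Lf-L\Phi(f)\leq0$ by differentiating at $t=0$; then integrate with respect to $\mu$ and use invariance, $\int L\Phi(f)\mathrm{d}\mu=0$.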
  • Markov LSI (the second form requires a Markov diffusion)
    \[
    \mathrm{Ent}^\Phi_\mu(f)
    \leq-\frac{c_{\mathrm{LSI}}}{4}\int\Phi'(f)Lf\mathrm{d}\mu
    \quad\Bigl(=\frac{c_{\mathrm{LSI}}}{4}\int\Phi''(f)|\nabla f|^2\mathrm{d}\mu\Bigr).
    \]
  • Sub-exponential decay (à la Cercignani) via de Bruijn and Grönwall
    \[
    c_{\mathrm{LSI}}\leq c
    \quad\text{iff}\quad
    \mathrm{Ent}^\Phi_\mu(P_tf)
    \leq\mathrm{e}^{-4t/c}\mathrm{Ent}^\Phi_\mu(f),\ \forall t,\forall f
    \]
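In the Ornstein-Uhlenbeck case everything is explicit when the initial law is Gaussian, which gives a closed-form check of the decay $\mathrm{Ent}_\mu(f_t)\leq\mathrm{e}^{-4t/c}\mathrm{Ent}_\mu(f_0)$ with $c=c_{\mathrm{LSI}}(\mathcal{N}(0,1/\rho))=2/\rho$. The sketch assumes dimension one, $V(x)=\rho x^2/2$, the illustrative initial law $\mathcal{N}(2,0.1)$, and the Gaussian formula $\mathrm{H}(\mathcal{N}(a,v)\mid\mathcal{N}(0,1/\rho))=\frac{1}{2}(\rho v+\rho a^2-1-\log(\rho v))$:

```python
import math

rho = 1.5          # illustrative: V(x) = rho x^2 / 2, so mu = N(0, 1/rho) and c_LSI = 2 / rho
c_lsi = 2 / rho

def rel_entropy(a, v):
    """H(N(a, v) | N(0, 1/rho)) in dimension one."""
    w = rho * v
    return 0.5 * (w + rho * a * a - 1.0 - math.log(w))

def ou_law(a, v, t):
    """Mean and variance at time t of dX = -rho X dt + sqrt(2) dB started from N(a, v)."""
    e = math.exp(-2 * rho * t)
    return a * math.exp(-rho * t), v * e + (1 - e) / rho

a0, v0 = 2.0, 0.1
h0 = rel_entropy(a0, v0)
history = []
for t in (0.1, 0.5, 1.0, 3.0):
    h_t = rel_entropy(*ou_law(a0, v0, t))
    bound = math.exp(-4 * t / c_lsi) * h0
    history.append((t, h_t, bound))
    print(t, h_t, bound)
```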

Gross hypercontractivity.

  • Contractivity : $\forall p\in[1,\infty]$, $\forall t\geq0$, $\forall
    f$, $\|P_t(f)\|_p\leq\|f\|_p$.
  • Hypercontractivity (Gross theorem on characterization via LSI)
    If $\mu$ is reversible, or if $\mu$ is invariant and the Markov process is a diffusion:
    \[
    c_{\mathrm{LSI}}(\mu)\leq c
    \quad\Leftrightarrow\quad
    \|P_tf\|_{1+(p-1)\mathrm{e}^{4t/c}}\leq\|f\|_p\quad\forall f,\forall t.
    \] Note : Markov version of LSI.
  • Proof : $\displaystyle\partial_p\|f\|_p^p=\int f^p\log(f)\mathrm{d}\mu$ for $f\geq0$,
    $\partial_tP_t=LP_t=P_tL$, $L^p$-$L^1$ via reversibility or diffusion.
    Note : Gross forged the LSI concept here
    Note : LSI is linearization of hypercontractivity
    Note : historically Nelson showed hypercontractivity for OU without LSI
    Note : hypercontractivity for Rademacher $\{\pm1\}$ and discrete LSI : Bonami-Beckner.
    Note : combinatorial aspects of entropy rarely play a role in this universe
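For the Ornstein-Uhlenbeck semigroup ($c=2$), exponentials $f(x)=\mathrm{e}^{\lambda x}$ turn Gross's inequality into an equality, which shows that the exponent $1+(p-1)\mathrm{e}^{2t}$ cannot be improved. The sketch below checks this numerically in dimension one, computing $P_t$ via the Mehler formula $P_tf(x)=\mathbb{E}f(\mathrm{e}^{-t}x+\sqrt{1-\mathrm{e}^{-2t}}Z)$ and Gaussian expectations via NumPy's Gauss-Hermite rule `numpy.polynomial.hermite_e.hermegauss`; the parameter values are illustrative:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

nodes, weights = hermegauss(80)               # weight function exp(-x^2 / 2)
weights = weights / np.sqrt(2 * np.pi)        # so that E g(Z) ~ sum(weights * g(nodes))

def gauss_expect(values):
    return float(np.dot(weights, values))

def ou_semigroup(f, t):
    """Mehler formula: P_t f(x) = E f(e^{-t} x + sqrt(1 - e^{-2t}) Z), Z ~ N(0,1)."""
    a, b = np.exp(-t), np.sqrt(1 - np.exp(-2 * t))
    return lambda x: np.array([gauss_expect(f(a * xi + b * nodes)) for xi in np.atleast_1d(x)])

def lp_norm(g, p):
    """L^p(N(0,1)) norm of a positive function g."""
    return gauss_expect(g(nodes) ** p) ** (1 / p)

lam, p, t = 0.5, 2.0, 0.3                     # illustrative parameters
f = lambda x: np.exp(lam * x)
q_gross = 1 + (p - 1) * np.exp(2 * t)         # Gross exponent with c = 2
Ptf = ou_semigroup(f, t)
ratio_at_q = lp_norm(Ptf, q_gross) / lp_norm(f, p)          # equals 1 for exponentials
ratio_above = lp_norm(Ptf, q_gross + 0.5) / lp_norm(f, p)   # > 1: larger exponents fail
print(ratio_at_q, ratio_above)
```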
Michel Émery (1949 — )

Entropy convexity along dynamics and Bakry-Émery $\Gamma_2$.

  • Langevin : $L=\Delta-\nabla V\cdot\nabla$, $\mu\propto\mathrm{e}^{-V}$, $\mu_t=\mu_0P_t$, $f_t=\mathrm{d}\mu_t/\mathrm{d}\mu$.
  • We have (mix of Villani Boltzmannian notation and Bakry-Ledoux notation)
    \begin{align*}
    \mathrm{I}(\nu\mid\mu)
    &=\int\Gamma(\log f)\mathrm{d}\nu,
    \quad f=\frac{\mathrm{d}\nu}{\mathrm{d}\mu},\\
    \partial_t\mathrm{I}(\mu_t\mid\mu)
    &=
    \partial_t\int\Gamma(\log f_t)\mathrm{d}\mu_t =-2\int\Gamma_2(\log
    f_t)\mathrm{d}\mu_t
    \end{align*} where
    \[
    \Gamma(f):=|\nabla f|^2
    \quad\text{and}\quad
    \Gamma_2(f):=\mathrm{Tr}((\mathrm{Hess}f)^2)+\mathrm{Hess}(V)\nabla
    f\cdot\nabla f.
    \] Here reversible IBP is used to remove all instances of $L$ and replace them by $\nabla$
  • Decay and convexity along dynamics (de Bruijn identity and Stam inequality)
    \begin{align*}
    \partial_t\mathrm{H}(\mu_t\mid\mu)
    &=-\mathrm{I}(\mu_t\mid\mu)\leq0\ \forall t\quad(\text{H-theorem})\\
    \partial_t^2\mathrm{H}(\mu_t\mid\mu)
    &=2\int\Gamma_2(\log f_t)\mathrm{d}\mu_t\geq0\ \forall t
    \quad(\text{Cercignani theorem})
    \end{align*} $\Gamma_2\geq0$ when $V$ is convex ($\mu$ log-concave), including $V$ constant (Lebesgue)
    de Bruijn and Stam correspond to Lebesgue measure (which is Gaussian)
  • Bakry-Émery $\Gamma_2$ criterion: $\rho\geq0$,
    \[
    \Gamma_2(f)\geq\rho\Gamma(f)\ \forall f
    \quad\Leftrightarrow\quad
    \mathrm{Hess}V(x)\geq\rho I_n\ \forall x.
    \]
  • Grönwall (motivation of $\Gamma_2$, with OU as the reference model, where equality holds):
    \begin{align*}
    \partial_t\mathrm{I}(\mu_t\mid\mu)
    &\leq-2\rho\mathrm{I}(\mu_t\mid\mu)\ \forall t\\
    \mathrm{I}(\mu_t\mid\mu)
    &\leq\mathrm{e}^{-2\rho t}
    \mathrm{I}(\mu_0\mid\mu)\ \forall t
    \end{align*}
  • LSI for uniformly log-concave $\mu$ ($\rho>0$), optimal for Gaussians
    \begin{align*}
    \mathrm{H}(\mu_0\mid\mu)
    &=\int_0^\infty\mathrm{I}(\mu_t\mid\mu)\mathrm{d}t\\
    &\leq\Bigr(\int_0^\infty\mathrm{e}^{-2\rho
    t}\mathrm{d}t\Bigr)\mathrm{I}(\mu_0\mid\mu)\\
    &=\frac{1}{2\rho}\mathrm{I}(\mu_0\mid\mu).
    \end{align*}
  • Exponential decay
    \[
    \mathrm{H}(\mu_t\mid\mu)
    \leq\mathrm{e}^{-2\rho t}
    \mathrm{H}(\mu_0\mid\mu)\ \forall t.
    \]
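Gaussian initial laws again give explicit formulas, now for the Fisher information: for $\mathcal{N}(a,v)$ relative to $\mu=\mathcal{N}(0,1/\rho)$ one finds $\mathrm{I}=\rho^2a^2+(\rho v-1)^2/v$. The sketch below (dimension one, illustrative parameters) checks the de Bruijn identity $\partial_t\mathrm{H}=-\mathrm{I}$ by finite differences, the LSI $\mathrm{H}\leq\frac{1}{2\rho}\mathrm{I}$, and the decay $\mathrm{I}(\mu_t\mid\mu)\leq\mathrm{e}^{-2\rho t}\mathrm{I}(\mu_0\mid\mu)$:

```python
import math

rho = 1.5   # illustrative Hess V = rho, so mu = N(0, 1/rho)

def rel_entropy(a, v):
    """H(N(a, v) | N(0, 1/rho))."""
    w = rho * v
    return 0.5 * (w + rho * a * a - 1.0 - math.log(w))

def fisher(a, v):
    """I(N(a, v) | N(0, 1/rho)); here grad log f(x) = rho x - (x - a) / v."""
    return rho**2 * a * a + (rho * v - 1.0) ** 2 / v

def ou_law(a, v, t):
    """Law N(mean, var) at time t of the Langevin/OU dynamics started from N(a, v)."""
    e = math.exp(-2 * rho * t)
    return a * math.exp(-rho * t), v * e + (1 - e) / rho

a0, v0 = 1.0, 4.0
checks = []
for t in (0.05, 0.2, 1.0, 2.5):
    a_t, v_t = ou_law(a0, v0, t)
    h = 1e-5
    # de Bruijn identity: d/dt H(mu_t | mu) = -I(mu_t | mu), via central differences
    dH = (rel_entropy(*ou_law(a0, v0, t + h)) - rel_entropy(*ou_law(a0, v0, t - h))) / (2 * h)
    checks.append({
        "debruijn_error": abs(dH + fisher(a_t, v_t)),
        "lsi_gap": fisher(a_t, v_t) / (2 * rho) - rel_entropy(a_t, v_t),            # >= 0
        "decay_gap": math.exp(-2 * rho * t) * fisher(a0, v0) - fisher(a_t, v_t),   # >= 0
    })
print(checks)
```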
Dominique Bakry (1954 - )

Bakry-Émery : Langevin.

  • Let us show, for $L=\Delta-\nabla V\cdot\nabla$, how $\Gamma_2$ emerges and gives a local PI.
    It comes from the infinitesimal form at $t=0$, semigroup interpolation, and Grönwall:
    \begin{align*}
    P_t(f^2)-P_t(f)^2
    &=\alpha(t)-\alpha(0)=\int_0^t\alpha'(s)\mathrm{d}s\\
    \alpha(s)&=P_s((P_{t-s}f)^2)\\
    \alpha'(s)&=2P_s(\Gamma P_{t-s}f)\\
    \alpha''(s)&=4P_s(\Gamma_2(P_{t-s}f)).
    \end{align*} If $\Gamma_2\geq\rho\Gamma$ then $\alpha''\geq2\rho\alpha'$, hence by Grönwall $\alpha'(t)\geq\mathrm{e}^{2\rho t}\alpha'(0)$, and since $\alpha'(0)=2\Gamma P_tf$ and $\alpha'(t)=2P_t\Gamma f$, we get (for O.-U., this also follows from the commutation given by the Mehler formula)
    \[
    \Gamma P_tf\leq\mathrm{e}^{-2\rho t}P_t\Gamma f.
    \] Used with $t-s$ in place of $t$, this gives in turn
    \[
    \alpha'(s)=2P_s(\Gamma P_{t-s}f)\leq2\mathrm{e}^{-2\rho(t-s)}P_sP_{t-s}\Gamma f=2\mathrm{e}^{-2\rho(t-s)}P_t\Gamma f
    \] where the semigroup no longer involves $s$, hence
    \[
    \mathrm{Var}_{P_t}(f) =\alpha(t)-\alpha(0)\leq2\frac{1-\mathrm{e}^{-2\rho t}}{2\rho}P_t\Gamma f.
    \]
  • Bakry-Ledoux interpolation : all these are equivalent for $L=\Delta-\nabla V\cdot\nabla$
    1. $\mathrm{Ent}^\Phi_{P_t}(f)\leq\frac{1-\mathrm{e}^{-2\rho
      t}}{2\rho}P_t(\Phi''(f)|\nabla f|^2)$, $\forall t>0,\forall f$
    2. $P_t(f^2)-P_t(f)^2\leq\frac{1-\mathrm{e}^{-2\rho
      t}}{\rho}P_t(|\nabla f|^2)$
    3. $P_t(f\log f)-P_t(f)\log P_t(f)\leq\frac{1-\mathrm{e}^{-2\rho
      t}}{2\rho}P_t(\frac{|\nabla f|^2}{f})$
    4. $I(P_tf)\leq P_t\sqrt{I(f)^2+\frac{1-\mathrm{e}^{-2\rho
      t}}{\rho}|\nabla f|^2}$, $I:=F'\circ F^{-1}$, $F:=\mathbb{P}(\mathcal{N}(0,1)\leq\cdot)$
    5. $|\nabla P_tf|\leq\mathrm{e}^{-\rho t}P_t|\nabla f|$
    6. $|\nabla P_tf|^2\leq\mathrm{e}^{-2\rho t}P_t(|\nabla f|^2)$
    7. $\mathrm{Hess}V(x)\geq\rho I_n$, $\forall x$
  • Note:
    • Interpretation of sub-commutation via curvature/trajectories
      For Langevin $\nabla L=L\nabla-(\mathrm{Hess}V)\nabla$, which reduces to $L\nabla-\rho\nabla$ when $\mathrm{Hess}V=\rho I_n$ (Ornstein-Uhlenbeck)
      Bochner-Lichnerowicz-Weitzenböck in Riemannian geometry
    • We speak about the $\Gamma_2$ criterion, or Bakry-Émery criterion
    • PI : weak sub-commutation is enough, no diffusion property is needed.
    • These equivalences fail beyond diffusions, for instance on discrete spaces,
      except PI, which does not really need the diffusion property
      Some adaptations can be done for Poisson and Lévy processes
    • Curvature-Dimension inequality : $\mathrm{CD}(\rho,m)$ $\Gamma_2(f)\geq\rho\Gamma(f)+\frac{1}{m}(L f)^2$.
    • The $\Gamma_2$ criterion is $\mathrm{CD}(\rho,\infty)$
    • On a Riemannian manifold add $\mathrm{Ric}(\nabla f,\nabla f)$ to $\Gamma_2$
      Bakry-Émery tensor, Perelman on Poincaré conjecture
  • LSI for $P_t$ (local LSI) via the diffusion property: $P_t(\cdot)(x)=\mu_t$, $\mu_0=\delta_x$, with $g:=P_{t-s}f$,
    \begin{align*}
    \mathrm{Ent}^\Phi_{P_t}(f)
    &=\int_0^t\partial_sP_s(\Phi(P_{t-s}f))\mathrm{d}s\\
    &=\int_0^tP_s(L(\Phi(g))-\Phi'(g)Lg)\mathrm{d}s\\
    &=\int_0^tP_s(\Phi''(g)|\nabla g|^2)\mathrm{d}s\\
    &\leq\int_0^t\mathrm{e}^{-2\rho(t-s)}P_s(\Phi''(P_{t-s}f)(P_{t-s}|\nabla f|)^2)\mathrm{d}s\\
    &\leq\int_0^t\mathrm{e}^{-2\rho(t-s)}P_s(P_{t-s}(\Phi''(f)|\nabla
    f|^2))\mathrm{d}s\\
    &= P_t(\Phi''(f)|\nabla f|^2)\int_0^t\mathrm{e}^{-2\rho(t-s)}\mathrm{d}s\\
    &=\frac{1-\mathrm{e}^{-2\rho t}}{2\rho}
    P_t(\Phi''(f)|\nabla f|^2).
    \end{align*} The first inequality uses the sub-commutation $|\nabla P_{t-s}f|\leq\mathrm{e}^{-\rho(t-s)}P_{t-s}|\nabla f|$, and the second the convexity of $(u,v)\mapsto\Phi''(u)v^2$ with Jensen for $P_{t-s}$. When $t\to\infty$ (with $\rho>0$), we recover the inequality for the invariant measure $\mu=P_\infty(\cdot)(x)$, $\forall x$.
    The diffusion property plays for $P_t$ the role played by IBP for $\mu$
    As IBP does, it allows one to remove $L$ and replace it by $\nabla$, i.e. by $\Gamma$
  • LSI for $\mu$ via semigroup interpolation (Bakry-Émery method, IBP for diffusion)
    \begin{align*}
    \mathrm{Ent}^\Phi_\mu(f)
    &=-\mathbb{E}_\mu\int_0^\infty\partial_t\Phi(P_tf)\mathrm{d}t\\
    &=-\mathbb{E}_\mu\int_0^\infty\Phi'(P_tf)LP_tf\mathrm{d}t\\
    &=\int_0^\infty\mathbb{E}_\mu(\Phi''(P_tf)|\nabla P_tf|^2)\mathrm{d}t.
    \end{align*} Bakry-Ledoux semigroup interpolation proof of LSI via sub-commutation
    \begin{align*}
    |\nabla P_tf|&\leq\mathrm{e}^{-\rho t}P_t|\nabla f|\\
    \mathrm{Ent}^\Phi_\mu(f)
    &=\int_0^\infty\mathbb{E}_\mu(\Phi''(P_tf)|\nabla P_tf|^2)\mathrm{d}t\\
    &\leq\int_0^\infty\mathrm{e}^{-2\rho t}\mathbb{E}_\mu(\Phi''(P_tf)(P_t|\nabla f|)^2)\mathrm{d}t\\
    &\leq\int_0^\infty\mathrm{e}^{-2\rho t}\mathbb{E}_\mu(P_t(\Phi''(f)|\nabla
    f|^2))\mathrm{d}t\\
    &=\mathbb{E}_\mu(\Phi''(f)|\nabla
    f|^2)\int_0^\infty\mathrm{e}^{-2\rho t}\mathrm{d}t\\
    &=\frac{1}{2\rho}\mathbb{E}_\mu(\Phi''(f)|\nabla f|^2).
    \end{align*}
  • PI, spectral gap, integrated $\Gamma_2$ criterion (no diffusion, robust to discrete spaces):
    \[
    \mathbb{E}_\mu(\Gamma_2f)\geq\rho\mathbb{E}_\mu(\Gamma f)
    \quad\Leftrightarrow\quad
    c_{\mathrm{PI}}\leq\frac{1}{\rho}
    \quad\Leftrightarrow\quad
    \mathrm{SpectralGap}(-L)\geq\rho.
    \]
  • Alternative via mass transportation : Caffarelli contraction theorem
    \begin{align*}
    \mathrm{d}\mu
    &=\mathrm{e}^{-V}\mathrm{d}x, \quad\mathrm{d}\nu =\mathrm{e}^{-W}\mathrm{d}x\\
    \mathrm{Hess}V &\leq AI_n , \quad \mathrm{Hess}W \geq BI_n.
    \end{align*} Following Caffarelli, the maximum principle for the Monge-Ampère equation implies that the Brenier transport map $\nabla \phi$ pushing $\mu$ forward to $\nu$ is Lipschitz with $$\|\nabla\phi\|_{\mathrm{Lip}}\leq\sqrt{A/B}$$ This works very well on $\mathbb{R}^n$ to get the LSI for uniformly log-concave measures
    Does not work very well on manifolds
    Requires knowledge of Gaussian inequalities
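In the Ornstein-Uhlenbeck case $\rho=1$, the local Poincaré inequality $P_t(f^2)-(P_tf)^2\leq(1-\mathrm{e}^{-2t})P_t(|\nabla f|^2)$ and the commutation $\nabla P_tf=\mathrm{e}^{-t}P_t\nabla f$ behind the sub-commutation can be checked pointwise. The sketch assumes dimension one, the illustrative test function $f=\sin$, the Mehler formula for $P_t$, and NumPy's Gauss-Hermite rule `hermegauss` for the Gaussian expectations:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

nodes, weights = hermegauss(60)
weights = weights / np.sqrt(2 * np.pi)          # E g(Z) ~ sum(weights * g(nodes))

def P(t, g, x):
    """OU semigroup with rho = 1 via Mehler: P_t g(x) = E g(e^{-t} x + sqrt(1 - e^{-2t}) Z)."""
    return float(np.dot(weights, g(np.exp(-t) * x + np.sqrt(1 - np.exp(-2 * t)) * nodes)))

f, df = np.sin, np.cos                           # illustrative test function and derivative
results = []
for t in (0.1, 1.0, 5.0):
    for x in (-2.0, 0.0, 1.3):
        local_var = P(t, lambda y: f(y) ** 2, x) - P(t, f, x) ** 2
        bound = (1 - np.exp(-2 * t)) * P(t, lambda y: df(y) ** 2, x)   # (1 - e^{-2 rho t}) / rho
        h = 1e-5
        grad = (P(t, f, x + h) - P(t, f, x - h)) / (2 * h)             # d/dx P_t f
        commuted = np.exp(-t) * P(t, df, x)                            # e^{-t} P_t f'
        results.append((local_var, bound, grad, commuted))
print(results)
```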
Michel Ledoux (1958 -)

Bakry-Émery : abstract Markov.

  • Abstract Markov setting (Bakry-Ledoux)
    \begin{align*}
    P_t&=\mathrm{e}^{tL}\\
    \Gamma(f,g)&=\frac{1}{2}(L(fg)-fLg-gLf)\quad(\text{carré du champ})\\
    \Gamma_2(f,g)&=\frac{1}{2}(L\Gamma(f,g)-\Gamma(f,Lg)-\Gamma(g,Lf))\\
    -\int fLg\mathrm{d}\mu
    &=\int\Gamma(f,g)\mathrm{d}\mu\\
    L\phi(f)&=\phi'(f)Lf+\phi''(f)\Gamma(f)\\
    \Gamma P_tf&\leq\mathrm{e}^{-2\rho t}P_t\Gamma f\\
    \sqrt{\Gamma P_tf}&\leq\mathrm{e}^{-\rho t}P_t\sqrt{\Gamma f}\\
    \mathrm{Ent}^\Phi_{P_t}(f)&\leq\frac{1-\mathrm{e}^{-2\rho t}}{2\rho}
    P_t(\Phi''(f)\Gamma(f))\\
    \Gamma_2(f)&\geq\rho\Gamma(f)+\frac{1}{m}(Lf)^2
    \end{align*}
  • Problem of choosing a suitable algebra $\mathcal{A}$ of functions to make things rigorous
  • All in all, the Bakry-Émery-Ledoux approach consists in commutations and positivity, the latter coming essentially from squares and convexity. In some sense, it is a rigid or algebraic-geometric side of probabilistic functional analysis and differential calculus.
  • Time non-homogeneous Markov, $L_t$, $\rho_t$, $\int_0^t\rho(s)\mathrm{d}s$ : Collet-Malrieu, but still linear, $\neq$ Polchinski renormalization (Bauerschmidt-Bodineau), which can be seen as an instance of a Schrödinger bridge, a multiscale Bakry-Émery criterion, see Shenfeld. What is called renormalization group is often nothing else but a semigroup interpolation.
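The abstract definitions also make sense on a finite state space, where the diffusion chain rule fails but $\Gamma$ and the Dirichlet form are still defined. The sketch below builds a hypothetical reversible birth-death generator with Metropolis-type rates (an illustrative choice) for a given law $\mu$, and checks $\mu L=0$, the integration by parts $-\int fLg\mathrm{d}\mu=\int\Gamma(f,g)\mathrm{d}\mu$, and the formula $\Gamma(f)(x)=\frac{1}{2}\sum_yL(x,y)(f(y)-f(x))^2\geq0$:

```python
import numpy as np

# Hypothetical reversible birth-death chain on {0,...,4} with target law mu
mu = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
n = len(mu)
Q = np.zeros((n, n))                         # generator matrix: (L f)(x) = sum_y Q[x, y] f(y)
for i in range(n - 1):
    # Metropolis-type rates, which satisfy detailed balance mu_i Q[i, j] = mu_j Q[j, i]
    Q[i, i + 1] = min(1.0, mu[i + 1] / mu[i])
    Q[i + 1, i] = min(1.0, mu[i] / mu[i + 1])
np.fill_diagonal(Q, -Q.sum(axis=1))          # rows sum to zero: L 1 = 0

def L(f):
    return Q @ f

def gamma(f, g):
    """Carre du champ Gamma(f, g) = (L(fg) - f Lg - g Lf) / 2."""
    return 0.5 * (L(f * g) - f * L(g) - g * L(f))

f = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
g = np.array([0.3, 0.1, -1.0, 2.0, 1.5])

dirichlet = -mu @ (f * L(g))                 # -int f Lg dmu
energy = mu @ gamma(f, g)                    # int Gamma(f, g) dmu
# On a finite state space, Gamma(f, f)(x) = (1/2) sum_y Q[x, y] (f(y) - f(x))^2
explicit = 0.5 * (Q * (f[None, :] - f[:, None]) ** 2).sum(axis=1)
print(dirichlet, energy, gamma(f, f), explicit)
```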

Related functional inequalities.

  • LSI is linearization of Bobkov functional Gaussian isoperimetry (Beckner)
  • LSI is projection of Sobolev on high dimensional spheres (Beckner)
  • LSI is connected to Talagrand transportation inequalities
    (Bobkov-Götze, Otto-Villani, Bobkov-Gentil-Ledoux, etc)
  • LSI connected to Nash inequalities and Li-Yau parabolic Harnack inequalities
  • LSI for Gaussian is Shannon-Stam inequality for Lebesgue (information theory)

Statistical mechanics and beyond product measures.

  • Integrated $\Gamma_2$ criterion: $c_{\mathrm{PI}}(\mu)\leq\frac{1}{\rho}$ $\Leftrightarrow$ $\mathbb{E}_\mu(\Gamma_2(f))\geq\rho\mathbb{E}_\mu(\Gamma(f))$ $\forall f$
  • Only sufficient integral criteria for LSI
  • $c_{\mathrm{LSI}}(\mu)<\infty$ if $\mathrm{d}\mu(x)=\mathrm{e}^{-V(x)}\mathrm{d}x$ with $V$ uniformly convex at infinity (Bodineau-Helffer)
  • PI/LSI for spin systems (discrete or continuous) Glauber or Kawasaki dynamics
    \[
    \frac{\mathrm{e}^{-V(x)}}{Z}\mathrm{d}x,\quad\mathbb{R}^\Lambda,\quad
    V(x)=\sum_iU(x_i)+\sum_{i\sim j}W(x_i,x_j).
    \] Control of correlations.
    Perturbative approaches.
    High dimensional convexification.
    Conditionings (martingale decomposition).
    (Lu-)Yau(-Landim), Zegarlinski, Martinelli, Bodineau-Helffer, Ledoux, etc.
Aart Johannes Stam (1929 – 2020)

Further reading.

  • C. Ané, S. Blachère, D. Chafaï, P. Fougères, I. Gentil, F. Malrieu, C. Roberto, and G. Scheffer.
    Sur les inégalités de Sobolev logarithmiques
    Panoramas et Synthèses 10 Société Mathématique de France (2000)
  • D. Bakry and M. Émery
    Diffusions hypercontractives
    Séminaire de probabilités XIX, Université de Strasbourg 1983/84, Lecture Notes in Mathematics 1123, 177-206 (1985)
  • D. Bakry, I. Gentil, and M. Ledoux.
    Analysis and geometry of Markov diffusion operators
    Grundlehren Math. Wiss. 348, Springer (2014)
  • D. Bakry and M. Ledoux
    Lévy-Gromov’s isoperimetric inequality for an infinite dimensional diffusion generator
    Invent. Math. 123(2):259-281 (1996)
  • D. Bakry and M. Ledoux
    A logarithmic Sobolev form of the Li-Yau parabolic inequality
    Rev. Mat. Iberoam. 22(2):683-702 (2006)
  • D. Bakry, M. Ledoux, and L. Saloff-Coste
    Markov semigroups at Saint-Flour
    Reprint (2012) of lectures originally published in the Lecture Notes in Mathematics volumes 1581 (1994), 1648 (1996) and 1665 (1997).
  • R. Bauerschmidt and T. Bodineau
    Log-Sobolev inequality for the continuum sine-Gordon model
    Commun. Pure Appl. Math. 74(10):2064-2113 (2021)
  • D. Chafaï
    Binomial-Poisson entropic inequalities and the M/M/$\infty$ queue
    ESAIM, Probab. Stat. 10:317–339 (2006)
  • D. Chafaï
    From Boltzmann to random matrices and beyond
    Ann. Fac. Sci. Toulouse, Math. 6 24(4):641–689 (2015)
  • J.-F. Collet and F. Malrieu
    Logarithmic Sobolev inequalities for inhomogeneous Markov semigroups
    ESAIM, Probab. Stat. 12:492–504 (2008)
  • E. B. Davies, L. Gross, and B. Simon.
    Hypercontractivity: a bibliographic review
    Ideas and methods in quantum and statistical physics
    Oslo, 1988 370–389, Cambridge Univ. Press (1992)
  • J.-D. Deuschel and D. W. Stroock
    Large deviations
    Academic Press, revised edition (1989)
  • W. G. Faris
    Product spaces and Nelson’s inequality
    Helv. Phys. Acta 48(5/6):721–730 (1975)
  • P. Federbush
    Partially alternate derivation of a result of Nelson
    J. Math. Phys. 10:50–52 (1969)
  • L. Gross
    Logarithmic Sobolev inequalities
    Am. J. Math. 97(4):1061–1083 (1975)
  • L. Gross
    Logarithmic Sobolev inequalities and contractivity properties of semigroups
    Dirichlet Forms, Varenna, 1992, Lecture Notes in Math. 1563, 54–88, Springer (1993)
  • L. Gross
    Hypercontractivity, logarithmic Sobolev inequalities, and applications: a survey of surveys.
    Diffusion, Quantum Theory, and Radically Elementary Mathematics
    Math. Notes 47, 45–73. Princeton Univ. Press (2006)
  • A. Guionnet and B. Zegarlinski
    Lectures on logarithmic Sobolev inequalities
    Séminaire de probabilités XXXVI, 1-134. Springer (2003)
  • B. Helffer
    Semiclassical analysis, Witten Laplacians, and statistical mechanics
    World Scientific (2002)
  • E. P. Hsu
    Stochastic analysis on manifolds
    Grad. Stud. Math. 38 American Mathematical Society (2002)
  • B. Klartag and J. Lehec
    Bourgain’s slicing problem and KLS isoperimetry up to polylog
    Geom. Funct. Anal. 32(5):1134–1159 (2022)
  • M. Ledoux
    Concentration of measure and logarithmic Sobolev inequalities
    Séminaire de probabilités XXXIII, pages 120-216. Springer (1999)
  • M. Ledoux
    The geometry of Markov diffusion generators
    Ann. Fac. Sci. Toulouse, Math. (6) 9(2):305–366 (2000)
  • M. Ledoux
    Logarithmic Sobolev inequalities for unbounded spin systems revisited
    Séminaire de Probabilités XXXV, 167-194. Springer (2001)
  • M. Ledoux
    Heat flows, geometric and functional inequalities
    Proceedings of the International Congress of Mathematicians (ICM 2014), Seoul, Korea, August 13-21 (2014) Vol. IV: Invited lectures pages 117-135.
  • M. Ledoux
    More than fifteen proofs of the logarithmic Sobolev inequality
    Historical note available on personal webpage
  • M. Ledoux
    Curvature-Dimension
    Historical note available on personal webpage
  • F. Martinelli
    Lectures on Glauber dynamics for discrete spin models.
    Lectures on probability theory and statistics. Ecole d’été de Probabilités de Saint-Flour XXVII-1997 July 7–23, 1997, pages 93-191. Springer (1999)
  • R. Montenegro and P. Tetali
    Mathematical aspects of mixing times in Markov chains
    Found. Trends Theor. Comput. Sci. 1(3):237–354 (2005)
  • G. Royer
    An initiation to logarithmic Sobolev inequalities
    SMF/AMS Texts Monogr. 14 American Mathematical Society and Société Mathématique de France (2007)
  • L. Saloff-Coste
    Aspects of Sobolev-type inequalities
    Lond. Math. Soc. Lect. Note Ser. 289 Cambridge University Press (2002)
  • Y. Shenfeld
    Exact renormalization groups and transportation of measures
    arXiv:2205.01642
  • A. J. Stam
    Some inequalities satisfied by the quantities of information of Fisher and Shannon
    Inf. Control 2:101-112 (1959)
  • C. Villani
    Optimal transport. Old and new
    Grundlehren Math. Wiss. 338 Springer (2009)
  • F.-Y. Wang
    Analysis for diffusion processes on Riemannian manifolds
    Adv. Ser. Stat. Sci. Appl. Probab. 18 World Scientific (2014)
William G. Faris (1939 -)