
Cutoff for high-dimensional curved diffusions

Photo: Michel Ledoux (1958 --), a great explorer of Markov semigroups and Gaussian analysis.

This post is inspired by the last part of a recent work [CF], in collaboration with Max Fathi, about the cutoff phenomenon for curved diffusions in high dimension.

Diffusion. Let ${(X_t)}_{t\geq0}$ be the solution of the stochastic differential equation \[ \mathrm{d}X_t = -\nabla V(X_t)\mathrm{d}t + \sqrt{2}\mathrm{d}B_t \] where ${(B_t)}_{t\geq0}$ is a standard Brownian motion in $\mathbb{R}^d$, $V:\mathbb{R}^d\to\mathbb{R}$ is strictly convex and $\mathcal{C}^2$ with $\lim_{|x|\to\infty}V(x)=+\infty$, and $\left|\cdot\right|$ is the Euclidean norm of $\mathbb{R}^d$. In statistical physics, this drift-diffusion is also known as an overdamped Langevin process with potential $V$. By adding a constant to $V$, we can assume without loss of generality that $\mu_V:=\mathrm{e}^{-V}$, namely \[ \mathrm{d}\mu_V(x)=\mathrm{e}^{-V(x)}\mathrm{d}x, \] is a probability measure. It is the unique invariant law of the process, and it is moreover reversible. The associated infinitesimal generator is the linear differential operator \[ \mathrm{L} = \Delta - \nabla V \cdot \nabla \] acting on smooth functions. It is symmetric in $L^2(\mu_V)$, and its kernel is the set of constant functions. Moreover, its spectrum is included in $(-\infty,-\lambda_1]\cup\{0\}$, for some $\lambda_1 > 0$ called the spectral gap of $\mathrm{L}$. When $V(x)=\frac{\rho}{2}|x|^2$ for some $\rho > 0$, then $X$ is the Ornstein-Uhlenbeck (OU) process, $\mu_V$ is Gaussian, $\lambda_1=\rho$, and $\mathrm{Hess}(V)(x)=\rho \mathrm{Id}$ for all $x\in\mathbb{R}^d$.
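
To fix ideas, here is a minimal simulation sketch of this diffusion via an Euler--Maruyama discretization; the step size, the time horizon, and the specific $\rho$-convex potential used below are my illustrative assumptions, not taken from [CF].

```python
# A minimal Euler--Maruyama sketch of dX_t = -grad V(X_t) dt + sqrt(2) dB_t.
# The potential V(x) = (rho/2)|x|^2 + (1/4) sum_i x_i^4 is an illustrative
# rho-convex choice, not from the post.
import numpy as np

rng = np.random.default_rng(0)
d, rho = 10, 1.0

def grad_V(x):
    # gradient of V(x) = (rho/2)|x|^2 + (1/4) sum_i x_i^4
    return rho * x + x**3

def langevin(x0, t_max, dt=1e-3):
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(int(t_max / dt)):
        x += -grad_V(x) * dt + np.sqrt(2 * dt) * rng.standard_normal(d)
    return x

# One trajectory, started far away from the bulk of mu_V.
print(langevin(np.full(d, 5.0), t_max=10.0))
```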

When $V$ is $\rho$-convex for some $\rho > 0$, namely if $V-\frac{\rho}{2}\left|\cdot\right|^2$ is convex, then the spectrum of $-\mathrm{L}$ is discrete, the spectral gap is an eigenvalue, and $\mathrm{Hess}(V)(x)\geq\rho\mathrm{Id}$ for all $x\in\mathbb{R}^d$.

Trend to equilibrium. Let us denote $\mu_t:=\mathrm{Law}(X_t)$. We know that for every initial law $\mu_0$, \[ \mu_t \xrightarrow[t\to\infty]{\mathrm{d}} \mu_V. \]

Functional inequalities. In order to quantify this trend to equilibrium, we use the total variation distance, the Kantorovich--Wasserstein quadratic cost distance, the Kullback--Leibler relative entropy, and the Fisher information. Recall the formulas \begin{eqnarray*} \mathrm{d}_{\mathrm{TV}}(\nu,\mu) &=&\inf_{\substack{(U,U')\\U\sim\nu,U'\sim\mu}}\mathbb{P}(U\neq U')\\ \mathrm{W}_2(\nu,\mu) &=&\inf_{\substack{(U,U')\\U\sim\nu,U'\sim\mu}}\sqrt{\mathbb{E}(|U-U'|^2)}\\ \mathrm{H}(\nu\mid\mu) &=&\displaystyle\int\log\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\mathrm{d}\nu\\ \mathrm{I}(\nu\mid\mu) &=&\displaystyle\int\Bigl|\nabla\log\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\Bigr|^2\mathrm{d}\nu \end{eqnarray*} where the infima run over the couplings of $\nu$ and $\mu$. They take their values in $[0,+\infty]$, but $\mathrm{d}_{\mathrm{TV}}\leq1$. They are comparable, either generically or under certain conditions, and these comparisons are known as functional inequalities. The simplest and best known is the Pinsker or Csiszár--Kullback inequality \[ \mathrm{d}_{\mathrm{TV}}(\nu,\mu)^2\leq 2\mathrm{H}(\nu\mid\mu), \] valid for arbitrary probability measures $\nu$ and $\mu$. Well-known comparisons or functional inequalities involving $\mathrm{W}_2(\cdot,\mu_V)$, $\mathrm{H}(\cdot\mid\mu_V)$, and $\mathrm{I}(\cdot\mid\mu_V)$ are available by using the $\rho$-convexity of $V$, see [BGL]. For instance, we have the Talagrand inequality \[ \frac{\rho}{2}\mathrm{W}_2(\nu,\mu_V)^2\leq\mathrm{H}(\nu\mid\mu_V) \] as well as the logarithmic Sobolev inequality \[ 2\rho\mathrm{H}(\nu\mid\mu_V)\leq\mathrm{I}(\nu\mid\mu_V). \] By linearization, the latter implies a Poincaré inequality, which is equivalent to $\lambda_1\geq\rho$.
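
As a toy illustration of these inequalities, one can check them numerically in the explicit case of a one-dimensional Gaussian mean shift $\nu=\mathcal{N}(m,s^2)$ versus $\mu_V=\mathcal{N}(0,s^2)$, for which $\rho=1/s^2$ and all four quantities have closed forms (mean shifts actually saturate Talagrand and log-Sobolev); the values of $m$ and $s$ below are arbitrary choices of mine.

```python
# Toy numerical check of Pinsker, Talagrand and log-Sobolev for a Gaussian
# mean shift nu = N(m, s^2) versus mu_V = N(0, s^2), where rho = 1/s^2 and
# all four quantities are explicit.
from scipy.stats import norm

m, s = 1.3, 0.7
rho = 1 / s**2

H = m**2 / (2 * s**2)                    # relative entropy H(nu | mu_V)
TV = 2 * norm.cdf(abs(m) / (2 * s)) - 1  # total variation distance
W2 = abs(m)                              # quadratic Wasserstein distance
I = m**2 / s**4                          # Fisher information I(nu | mu_V)

assert TV**2 <= 2 * H                  # Pinsker / Csiszar--Kullback
assert rho / 2 * W2**2 <= H + 1e-12    # Talagrand (equality for mean shifts)
assert 2 * rho * H <= I + 1e-12        # log-Sobolev (equality for mean shifts)
print(f"TV={TV:.4f}  W2={W2:.4f}  H={H:.4f}  I={I:.4f}")
```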

Cutoff phenomenon. Let $S\subset\mathcal{P}(\mathbb{R}^d)$ be an arbitrary non-empty set of probability measures. Let $\eta\in(0,1)$ be an arbitrary fixed threshold which does not depend on $d$. Suppose that there exists a positive constant $\rho$, possibly depending on $d$, such that for all $x\in\mathbb{R}^d$, \[ \mathrm{Hess}(V)(x)\geq\rho\mathrm{Id}, \] and that the following curvature product condition holds: \[ \lim_{d\to\infty}\rho T=+\infty \quad\text{where}\quad T := \inf\bigl\{t\geq0:\sup_{\mu_0\in S} \mathrm{d}_{\mathrm{TV}}(\mu_t, \mu_V) \leq\eta\bigr\}. \] Then there is cutoff at critical time $T$ in the sense that for all fixed $\varepsilon\in(0,1)$, \begin{eqnarray*} \lim_{d\to\infty} \sup_{\mu_0\in S} \mathrm{d}_{\mathrm{TV}}(\mu_{t_d},\mu_V) &=& \begin{cases} 1 & \text{if $t_d=(1-\varepsilon)T$}\\ 0 & \text{if $t_d=(1+\varepsilon)T$} \end{cases}\\ \lim_{d\to\infty} \sup_{\mu_0\in S} \mathrm{I}(\mu_{t_d}\mid\mu_V) &=& \begin{cases} +\infty & \text{if $t_d=(1-\varepsilon)T$}\\ 0 & \text{if $t_d=(1+\varepsilon)T$} \end{cases}\\ \lim_{d\to\infty} \sup_{\mu_0\in S} \mathrm{H}(\mu_{t_d}\mid\mu_V) &=& \begin{cases} +\infty & \text{if $t_d=(1-\varepsilon)T$}\\ 0 & \text{if $t_d=(1+\varepsilon)T$} \end{cases}\\ \lim_{d\to\infty} \sup_{\mu_0\in S} \mathrm{W}_2(\mu_{t_d},\mu_V) &=& \begin{cases} +\infty & \text{if $t_d=(1-\varepsilon)T$}\\ 0 & \text{if $t_d=(1+\varepsilon)T$} \end{cases}. \end{eqnarray*} It is a high-dimensional phenomenon. Note that $X$, $V$, $S$, $\rho$, and $T$ all depend on $d$.
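
To see what the theorem says in the simplest case, take the OU process ($V=\frac{\rho}{2}|x|^2$) started from a Dirac mass $\delta_{x_0}$ with $|x_0|\leq c\sqrt{d}$: then $\mu_t$ is Gaussian, $\mathrm{H}(\mu_t\mid\mu_V)$ is explicit, and the abrupt transition occurs around $T\sim\frac{\log d}{2\rho}$. The sketch below is my illustration, not code from [CF].

```python
# Entropic cutoff for OU (V = rho|x|^2/2): from mu_0 = delta_{x0}, mu_t is
# N(x0 e^{-rho t}, (1 - e^{-2 rho t})/rho Id), H(mu_t | mu_V) is explicit,
# and the sup over |x0| <= c sqrt(d) is attained at |x0| = c sqrt(d).
import numpy as np

def sup_entropy(t, d, rho=1.0, c=1.0):
    u = np.exp(-2 * rho * t)  # e^{-2 rho t}
    return d * ((-u - np.log1p(-u)) / 2 + (rho * c**2 / 2) * u)

rho = 1.0
for d in [10, 100, 1000, 10000]:
    T = np.log(d) / (2 * rho)  # candidate cutoff time
    row = [sup_entropy(s * T, d, rho) for s in (0.5, 1.0, 1.5)]
    print(d, ["%.3g" % v for v in row])  # blow-up / order one / vanishing
```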

This mathematical formulation is a way to express the abrupt transition at critical time $T$ from the maximum value to the minimum value of the distance or divergence.

Proof. The Bakry--Émery version of the Lichnerowicz inequality gives $\lambda_1\geq\rho$, thus \begin{equation} \lim_{d\to\infty}\lambda_1 T=+\infty, \end{equation} which is the Peres product condition in Corollary 1 of [S], hence the cutoff in total variation distance. It remains to prove cutoff for the other cases. Let us start with the relative entropy lower bound. The Pinsker or Csiszár--Kullback inequality gives \begin{equation} \varliminf_{d\to\infty} \sup_{\mu_0\in S} \mathrm{H}(\mu_{(1-\frac{\varepsilon}{2})T}\mid\mu_V) \geq\frac{\eta^2}{2}. \end{equation} On the other hand, by the Bakry--Émery curvature theorem, for all $t'\geq t\geq0$, \[ \mathrm{H}(\mu_{t'}\mid\mu_V)\leq\mathrm{e}^{-2\rho(t'-t)}\mathrm{H}(\mu_t\mid\mu_V). \] Taking $t' = (1-\frac{\varepsilon}{2})T$, $t = (1-\varepsilon)T$, and using $\lim_{d\to\infty}\rho T=+\infty$, we get \begin{equation} \varliminf_{d\to\infty}\sup_{\mu_0\in S}\mathrm{H}(\mu_{(1-\varepsilon)T}\mid\mu_V) \geq\mathrm{e}^{\varepsilon\varliminf_{d\to\infty}\rho T} \frac{\eta^2}{2}=+\infty. \end{equation} For the upper bound, a careful reading of the proof of Theorem 1 in [S] shows that \begin{equation} \varlimsup_{d\to\infty} \sup_{\mu_0\in S}\mathrm{H}\bigl(\mu_{(1+\frac{\varepsilon}{2})T}\mid\mu_V\bigr) \leq C_\varepsilon < \infty. \end{equation} Using the exponential decay of the relative entropy and $\lim_{d\to\infty}\rho T=+\infty$, we get \begin{equation} \varlimsup_{d\to\infty}\sup_{\mu_0\in S}\mathrm{H}(\mu_{(1+\varepsilon)T}\mid\mu_V) \leq\mathrm{e}^{-\varepsilon\varliminf_{d\to\infty}\rho T} C_\varepsilon =0. \end{equation}
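
As a sanity check of the exponential decay used twice above, in the explicit one-dimensional OU case one can compare $\mathrm{H}(\mu_{t'}\mid\mu_V)$ with $\mathrm{e}^{-2\rho(t'-t)}\mathrm{H}(\mu_t\mid\mu_V)$ using the closed form of the entropy; this toy verification (with arbitrary $\rho$, $x_0$) is mine, not part of the actual proof.

```python
# Toy check of H(mu_{t'} | mu_V) <= e^{-2 rho (t'-t)} H(mu_t | mu_V) for the
# 1D OU process started from delta_{x0}, using the closed-form entropy.
import numpy as np

rho, x0 = 1.0, 3.0

def H(t):
    u = np.exp(-2 * rho * t)
    return (-u - np.log1p(-u)) / 2 + (rho * x0**2 / 2) * u

for t, tp in [(0.5, 1.0), (1.0, 2.0), (2.0, 4.0)]:
    assert H(tp) <= np.exp(-2 * rho * (tp - t)) * H(t)
print("exponential entropy decay holds on the sampled times")
```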

For Wasserstein distance, the upper bound comes from the one for relative entropy via the Talagrand inequality $\rho\mathrm{W}_2(\mu_t,\mu_V)^2 \leq 2\mathrm{H}(\mu_t\mid\mu_V)$, while the lower bound comes from the Wasserstein regularization inequality (see [BGL]) \[ \mathrm{H}(\mu_t\mid\mu_V) \leq \frac{\rho\mathrm{e}^{-2\rho t}}{1-\mathrm{e}^{-2\rho t}}\mathrm{W}_2(\mu_0, \mu_V)^2 \leq\frac{1}{2t}\mathrm{W}_2(\mu_0, \mu_V)^2, \] used with $t = \varepsilon T$, combined with the Markov semigroup property.
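
Again in the explicit one-dimensional OU case started from $\delta_{x_0}$, one has $\mathrm{W}_2(\delta_{x_0},\mu_V)^2=x_0^2+\frac{1}{\rho}$, so the regularization inequality can be checked directly; a toy verification of mine, with arbitrary parameter values.

```python
# Toy check of H(mu_t | mu_V) <= rho e^{-2 rho t}/(1 - e^{-2 rho t}) W2^2
#                             <= W2^2 / (2t)
# for the 1D OU process from delta_{x0}, with W2^2 = W2(delta_{x0}, mu_V)^2
# = x0^2 + 1/rho (variance plus squared mean of mu_V seen from x0).
import numpy as np

rho, x0 = 1.0, 3.0
W2_sq = x0**2 + 1 / rho

def H(t):
    u = np.exp(-2 * rho * t)
    return (-u - np.log1p(-u)) / 2 + (rho * x0**2 / 2) * u

for t in [0.05, 0.1, 0.5, 1.0, 2.0]:
    u = np.exp(-2 * rho * t)
    assert H(t) <= rho * u / (1 - u) * W2_sq <= W2_sq / (2 * t)
print("Wasserstein regularization holds on the sampled times")
```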

Finally, for the Fisher information, the lower bound comes from the one for the relative entropy via the logarithmic Sobolev inequality $2\rho\mathrm{H}(\mu_t\mid\mu_V) \leq \mathrm{I}(\mu_t\mid \mu_V)$, while to upper bound $\mathrm{I}(\mu_{t_1}\mid\mu_V)$, we write, for all $0 < t_0 < t_1$, \[ \mathrm{H}(\mu_{t_0}\mid\mu_V)-\mathrm{H}(\mu_{t_1}\mid\mu_V) =\int_{t_0}^{t_1}\mathrm{I}(\mu_s\mid\mu_V)\mathrm{d}s \geq(t_1-t_0)\mathrm{I}(\mu_{t_1}\mid\mu_V) \] where we have used the monotonicity of $\mathrm{I}$. Combining the exponential decay of the relative entropy on $[1,t_0]$ with the Wasserstein regularization inequality at time $1$, this gives, when $t_0 > 1$, the regularization \[ \mathrm{I}(\mu_{t_1}\mid\mu_V) \leq\frac{\mathrm{H}(\mu_{t_0}\mid\mu_V)}{t_1-t_0} \leq\frac{\mathrm{e}^{-2\rho(t_0-1)}}{2(t_1-t_0)}\mathrm{W}_2(\mu_0, \mu_V)^2. \] This proof blends arguments from [CSC], [S], and [CF]. It is inspired by what is done in [BCL], with a simpler regularization procedure.
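
One more explicit sanity check: for the one-dimensional OU process, both $\mathrm{H}(\mu_t\mid\mu_V)$ and $\mathrm{I}(\mu_t\mid\mu_V)$ have closed forms satisfying $\partial_t\mathrm{H}=-\mathrm{I}$, the map $t\mapsto\mathrm{I}(\mu_t\mid\mu_V)$ is indeed decreasing, and the displayed inequality can be verified numerically; a toy verification of mine.

```python
# Toy check of H(mu_{t0}) - H(mu_{t1}) = int_{t0}^{t1} I ds >= (t1-t0) I(mu_{t1})
# for the 1D OU process from delta_{x0}, using closed forms of H and I
# (one can verify dH/dt = -I on these formulas).
import numpy as np

rho, x0 = 1.0, 3.0

def H(t):
    u = np.exp(-2 * rho * t)
    return (-u - np.log1p(-u)) / 2 + (rho * x0**2 / 2) * u

def I(t):
    u = np.exp(-2 * rho * t)
    return rho**2 * x0**2 * u + rho * u**2 / (1 - u)

for t0, t1 in [(0.5, 1.0), (1.0, 3.0), (2.0, 2.5)]:
    assert (t1 - t0) * I(t1) <= H(t0) - H(t1)
print("Fisher information regularization holds on the sampled times")
```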

Comments. When $\mu_0$ is a Dirac mass, then $\mathrm{H}(\mu_0\mid\mu_V)=+\infty$ and $\mathrm{I}(\mu_0\mid\mu_V)=+\infty$, which is not the case for $\mathrm{W}_2(\mu_0,\mu_V)$, hence the use of short-time regularization.

The $\rho$-convexity of $V$ is used to ensure several remarkable properties:

  • the exponential decay of $t\mapsto\mathrm{H}(\mu_t\mid\mu_V)$,
  • the monotonicity of $t\mapsto\mathrm{I}(\mu_t\mid\mu_V)$ and $t\mapsto\mathrm{W}_2(\mu_t,\mu_V)$,
  • the regularization of $\mathrm{H}(\mu_t\mid\mu_V)$ by $\mathrm{W}_2(\mu_0,\mu_V)$,
  • the Talagrand and log-Sobolev inequalities.

The monotonicity of $t\mapsto\mathrm{H}(\mu_t\mid\mu_V)$ and of $t\mapsto\mathrm{d}_{\mathrm{TV}}(\mu_t,\mu_V)$ is a general Markovian fact that relies neither on the convexity of $V$ nor on the diffusion nature of $X$.

This approach does not provide the value of the mixing time $T$.

Rigidity. Following [CF], if the process is rigid, in the sense that $\lambda_1=\rho$, then, by taking \[ S=B(m_V,c\sqrt{d})\quad\text{or}\quad S=m_V+[-c,c]^d, \] where $m_V$ is the mean of $\mu_V$ and $c > 0$ is an arbitrary constant, we get \[ T\asymp\frac{\log(d)}{2\rho}. \] A remarkable example of a rigid process beyond Ornstein-Uhlenbeck is given by \[ V(x)=\frac{\rho}{2}|x|^2+W(x), \quad x\in\mathbb{R}^d, \] where $\rho > 0$ and $W:\mathbb{R}^d\to\mathbb{R}$ is convex and translation invariant in the direction $(1,\ldots,1)\in\mathbb{R}^d$, namely for all $u\in\mathbb{R}$ and all $x\in\mathbb{R}^d$, $W(x+u(1,\ldots,1))=W(x)$. This is the case for example when, for some convex even function $h:\mathbb{R}\to\mathbb{R}$, \[ W(x)=\sum_{i < j}h(x_i-x_j),\quad x\in\mathbb{R}^d. \] If $\pi$ and $\pi^\perp$ are the orthogonal projections onto $\mathbb{R}(1,\ldots,1)$ and its orthogonal complement, respectively, then $|x|^2=|\pi(x)|^2+|\pi^\perp(x)|^2$, while the translation invariance of $W$ in the direction $(1,\ldots,1)$ gives $W(x)=W(\pi(x)+\pi^\perp(x))=W(\pi^\perp(x))$, therefore \[ \mathrm{e}^{-V(x)} =\mathrm{e}^{-\frac{\rho}{2}|\pi(x)|^2}\mathrm{e}^{-(W(\pi^\perp(x))+\frac{\rho}{2}|\pi^\perp(x)|^2)}, \] which means that $\mu_V$ is, up to a rotation, a product measure, and splits into a one-dimensional Gaussian factor $\mathcal{N}(0,\frac{1}{\rho})$ and a log-concave factor with a $\rho$-convex potential.
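
The rigidity can be seen concretely: the translation invariance gives $\nabla W(x)\cdot(1,\ldots,1)=0$, so the affine function $f(x)=\frac{x_1+\cdots+x_d}{\sqrt{d}}$ satisfies $\mathrm{L}f=-\rho f$, hence $\lambda_1\leq\rho$, while Bakry--Émery gives $\lambda_1\geq\rho$. The snippet below checks $\mathrm{L}f=-\rho f$ numerically for the illustrative choice $h(u)=\frac{u^4}{4}$, an assumption of mine.

```python
# Check that f(x) = (x_1 + ... + x_d)/sqrt(d) satisfies L f = -rho f for
# V(x) = rho|x|^2/2 + sum_{i<j} h(x_i - x_j) with h(u) = u^4/4 (illustrative):
# Delta f = 0, and sum_k d_k W = sum_{k,j} (x_k - x_j)^3 = 0 by antisymmetry.
import numpy as np

rng = np.random.default_rng(1)
d, rho = 5, 2.0

def grad_V(x):
    diffs = x[:, None] - x[None, :]          # matrix of x_k - x_j
    return rho * x + (diffs**3).sum(axis=1)  # h'(u) = u^3

x = rng.standard_normal(d)
f = x.sum() / np.sqrt(d)
Lf = -grad_V(x) @ np.full(d, 1 / np.sqrt(d))  # L f = -grad V . grad f
assert np.isclose(Lf, -rho * f)
print("L f = -rho f at a random point, so lambda_1 <= rho")
```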

This covers as a special degenerate case the Dyson-OU process studied in [CL,CF], obtained with \[ h(x)= \begin{cases} -\beta\log(x) & \text{if $x > 0$}\\ +\infty & \text{if $x\leq0$} \end{cases} \] for an arbitrary $\beta\geq0$, the degeneracy amounting to defining the Dyson-OU process on the convex domain $\{x\in\mathbb{R}^d:x_1 > \cdots>x_d\}$ instead of on the whole $\mathbb{R}^d$, in order to exploit convexity.

The proof in [CF] relies on upper and lower bounds on the Wasserstein distance to equilibrium. The upper bound comes from curvature, while the lower bound comes from the OU factor obtained by projection on the affine eigenfunctions provided by rigidity. The passage to the other distances and divergences is done via functional inequalities, again using curvature.

Geometry. These cutoff and rigidity estimates extend, beyond the Euclidean space, to positively curved diffusions on Riemannian manifolds, see [CF] for more information.

Open questions. How about cutoff via stability beyond rigidity?

Further reading.

  • [SC] Laurent Saloff-Coste
    Precise estimates on the rate at which certain diffusions tend to equilibrium
    Mathematische Zeitschrift (1994)
  • [CSC] Guan-Yu Chen and Laurent Saloff-Coste
    The cutoff phenomenon for ergodic Markov processes
    Electronic Journal of Probability (2008)
    See also https://djalil.chafai.net/blog/2024/01/27/cutoff-for-markov-processes/
  • [BGL] Dominique Bakry, Ivan Gentil, and Michel Ledoux
    Analysis and geometry of Markov diffusion operators
    Springer (2014)
  • [CZ] Xu Cheng and Detang Zhou
    Eigenvalues of the drifted Laplacian on complete metric measure spaces
    Communications in Contemporary Mathematics (2017)
  • [DPF] Guido De Philippis and Alessio Figalli
    Rigidity and stability of Caffarelli's log-concave perturbation theorem
    Nonlinear Analysis Theory Methods and Applications (2017)
  • [CL] Djalil Chafaï and Joseph Lehec
    On Poincaré and logarithmic Sobolev inequalities for a class of singular Gibbs measures
    Geometric aspects of functional analysis. Vol. I
    Lecture Notes in Mathematics, Springer (2020)
  • [BCL] Jeanne Boursier, Djalil Chafaï, and Cyril Labbé
    Universal cutoff for Dyson Ornstein Uhlenbeck process
    Probability Theory and Related Fields (2023)
  • [CF] Djalil Chafaï and Max Fathi
    On cutoff via rigidity for high dimensional curved diffusions
    arXiv:2412.15969v2 (2024)
  • [S] Justin Salez
    Cutoff for non-negatively curved diffusions
    arXiv:2501.01304v1 (2025)

Group photo of an ANR Conviviality meeting in Lyon, June 2025, including Justin Salez, Max Fathi, Ivan Gentil, and Joseph Lehec.