This post is about diffusions leaving invariant a given probability measure. For simplicity, we focus on a probability measure \( {\mu} \) on \( {\mathbb{R}^d} \), \( {d\geq1} \), given by
\[ \mu(\mathrm{d}x)=\varphi(x)\mathrm{d}x, \]
where \( {\varphi:\mathbb{R}^d\rightarrow\mathbb{R}_+} \) is smooth with \( {\Vert\varphi\Vert_{L^1(\mathrm{d}x)}=1} \). In statistical physics, this measure is a Boltzmann-Gibbs measure and takes the form
\[ \varphi(x)=\frac{\mathrm{e}^{-\beta U(x)}}{Z} \]
where \( {Z} \) is the normalizing constant (“partition function”), \( {U:\mathbb{R}^d\rightarrow\overline{\mathbb{R}}} \) is smooth (“energy” or “potential”), and \( {\beta\geq0} \) is a factor (“inverse temperature”).
Gradient diffusion. The most well known diffusion process \( {X={(X_t)}_{t\geq0}} \) leaving invariant \( {\mu} \) is the solution of the stochastic differential equation
\[ \mathrm{d}X_t=\sqrt{2}\mathrm{d}B_t+(\nabla\log\varphi)(X_t)\mathrm{d}t, \]
where \( {{(B_t)}_{t\geq0}} \) is a standard Brownian motion on \( {\mathbb{R}^d} \). Equivalently
\[ \mathrm{d}X_t=\sqrt{2}\mathrm{d}B_t-\beta\nabla U(X_t)\mathrm{d}t \]
which is a noisy version of the ordinary differential equation \( {x_t’=-\beta\nabla U(x_t)} \). We suppose that the solution of the stochastic differential equation does not explode in finite time, which is guaranteed when \( {-\log\varphi=\beta U} \) is nice enough, say with a second derivative bounded below by some real number. When the probability measure \( {\mu} \) is a Gaussian law with mean zero, \( {X} \) is an Ornstein-Uhlenbeck process.
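As an illustration, here is a minimal Euler-Maruyama discretization of this gradient diffusion, written in Python; the double-well potential \( {U(x)=\frac{1}{4}(x^2-1)^2} \), the value \( {\beta=1} \), and the discretization parameters are arbitrary choices made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 1.0

def grad_U(x):
    # gradient of the (hypothetical) double-well potential U(x) = (x^2 - 1)^2 / 4
    return x * (x**2 - 1.0)

def euler_maruyama(x0, dt=1e-3, n_steps=200_000):
    """Euler-Maruyama scheme for dX_t = sqrt(2) dB_t - beta * grad U(X_t) dt."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        noise = np.sqrt(2.0 * dt) * rng.standard_normal()
        x[k + 1] = x[k] - beta * grad_U(x[k]) * dt + noise
    return x

# The empirical distribution of the trajectory approximates mu with density
# proportional to exp(-beta * U); here it concentrates around x = -1 and x = 1.
traj = euler_maruyama(x0=0.0)
print(traj.mean(), traj.var())
```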
The infinitesimal generator of the process is the diffusion operator
\[ Lf=\Delta f+\langle\nabla\log\varphi,\nabla f\rangle. \]
In other words
\[ Lf=\Delta f-\beta\nabla U\cdot\nabla f. \]
Let us check that \( {\mu} \) is invariant in the sense that \( {X_0\sim\mu} \) implies \( {X_t\sim\mu} \) for any \( {t\geq0} \). Since
\[ Lf(x)=\partial_{t=0}\mathrm{E}(f(X_t)\mid X_0=x), \]
it suffices to show that for any smooth and compactly supported test function \( {f} \),
\[ \int Lf(x)\mathrm{d}\mu(x)=0. \]
Indeed an integration by parts with respect to the Lebesgue measure gives
\[ \int\Delta f\,\mathrm{d}\mu = \int\Delta f(x)\,\varphi(x)\mathrm{d}x =-\int\langle\nabla f(x),\nabla\varphi(x)\rangle\mathrm{d}x =-\int\langle\nabla f,\nabla\log\varphi\rangle\mathrm{d}\mu, \]
so that the two terms of \( {Lf} \) cancel out and \( {\int Lf\,\mathrm{d}\mu=0} \).
In fact \( {\mu} \) is reversible in the sense that for any \( {f,g} \),
\[ \int fLg\mathrm{d}\mu =-\int\langle\nabla f,\nabla g\rangle\mathrm{d}\mu =\int gLf\mathrm{d}\mu. \]
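As a quick numerical sanity check of this identity, here is a small Monte Carlo computation in Python for the one-dimensional standard Gaussian case \( {\varphi(x)=(2\pi)^{-1/2}\mathrm{e}^{-x^2/2}} \), for which \( {Lh=h''-xh'} \); the test functions \( {f=g=x^2} \) are arbitrary illustrative choices, and all three estimates should be close to \( {-4} \).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)  # samples from mu = N(0, 1), phi(x) ∝ exp(-x^2/2)

# Test functions (arbitrary choices) f(x) = g(x) = x^2, with Lh = h'' - x h'.
f, g = x**2, x**2
Lf = 2.0 - 2.0 * x**2   # L applied to x -> x^2
Lg = 2.0 - 2.0 * x**2

print(np.mean(f * Lg))               # ≈ -4 : Monte Carlo estimate of ∫ f Lg dmu
print(-np.mean((2 * x) * (2 * x)))   # ≈ -4 : estimate of -∫ <grad f, grad g> dmu
print(np.mean(g * Lf))               # ≈ -4 : estimate of ∫ g Lf dmu
```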
Playing with speed. For any \( {\alpha>0} \) the process \( {X’=(X_{\alpha t})_{t\geq0}} \) obtained from \( {X} \) by a linear time change also leaves \( {\mu} \) invariant. It solves the equation
\[ \mathrm{d}X’_t = \sqrt{2\alpha}\mathrm{d}B_t-\alpha\beta\nabla U(X’_t)\mathrm{d}t. \]
Its semigroup is \( {\mathrm{e}^{\alpha tL}} \) where \( {Lf=\Delta f-\beta\nabla U\cdot\nabla f} \). Taking \( {\alpha=1/\beta} \) gives
\[ \mathrm{d}X’_t = \sqrt{\frac{2}{\beta}}\mathrm{d}B_t-\nabla U(X’_t)\mathrm{d}t \]
which allows one to see the temperature \( {1/\beta} \) as the variance parameter of the Brownian noise term, which is perfectly coherent with the “inverse temperature” terminology for \( {\beta} \).
Divergence free perturbation. Are there other diffusions leaving invariant \( {\mu} \)? The answer is YES. Namely if \( {F:\mathbb{R}^d\rightarrow\mathbb{R}^d} \) is smooth with
\[ \mathrm{div}(\varphi F)=0 \]
then \( {\mu} \) is left invariant by the process \( {Y=(Y_t)_{t\geq0}} \) solution of
\[ \mathrm{d}Y_t =\sqrt{2}\mathrm{d}B_t+(\nabla\log\varphi)(Y_t)\mathrm{d}t+F(Y_t)\mathrm{d}t. \]
Indeed, the infinitesimal generator of \( {Y} \) is
\[ Lf=\Delta f+\langle\nabla\log\varphi,\nabla f\rangle+\langle F,\nabla f\rangle. \]
It suffices to check that for any smooth and compactly supported test function \( {f} \),
\[ \displaystyle\int\langle F,\nabla f\rangle\mathrm{d}\mu=0. \]
Indeed, by integration by parts, we have, using the assumption \( {\mathrm{div}(\varphi F)=0} \),
\[ \int\langle F,\nabla f\rangle\mathrm{d}\mu =\int\langle \varphi F,\nabla f\rangle\mathrm{d}x =-\int\mathrm{div}(\varphi F)f\mathrm{d}x =0. \]
A geometric interpretation of the condition \( {\mathrm{div}(\varphi F)=0} \) is given by the identity
\[ \mathrm{div}(\varphi F) =\sum_{i=1}^d\partial_i(\varphi F_i) =\langle\nabla\varphi,F\rangle+\varphi\mathrm{div}F. \]
Indeed if \( {F} \) is such that \( {\mathrm{div}F=0} \) and \( {F\perp\nabla\varphi} \) then \( {\mathrm{div}(\varphi F)=0} \). For the stochastic process \( {Y} \), this means that the effect of \( {F} \) is a drift along the level sets of \( {\varphi} \) (or equivalently of the potential \( {-\frac{1}{\beta}\log\varphi=U} \)). Intuitively this drift helps the process reach its equilibrium more quickly, even when the initial process \( {X} \) is a purely Gaussian Ornstein-Uhlenbeck process, by providing additional spatial exploration. The drawback is the loss of reversibility: since \( {F} \) is not a gradient in general, \( {\mu} \) is in general no longer reversible for \( {Y} \).
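A standard recipe to manufacture such an \( {F} \) is to take \( {F=J\nabla\log\varphi} \) with \( {J} \) a constant skew-symmetric matrix: then \( {\mathrm{div}F=0} \) because the Hessian of \( {\log\varphi} \) is symmetric, and \( {\langle\nabla\varphi,F\rangle=\varphi\langle\nabla\log\varphi,J\nabla\log\varphi\rangle=0} \) by skew-symmetry. Below is a minimal simulation sketch in Python for a two-dimensional standard Gaussian target, so that \( {\nabla\log\varphi(x)=-x} \); the rotation matrix and discretization parameters are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-dimensional standard Gaussian target: grad log phi(x) = -x.
J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])  # constant skew-symmetric matrix

def drift(y):
    grad_log_phi = -y
    F = J @ grad_log_phi  # divergence-free perturbation: div(phi * F) = 0
    return grad_log_phi + F

def simulate(y0, dt=1e-2, n_steps=100_000):
    """Euler-Maruyama scheme for dY_t = sqrt(2) dB_t + (grad log phi + F)(Y_t) dt."""
    y = np.array(y0, dtype=float)
    samples = np.empty((n_steps, 2))
    for k in range(n_steps):
        noise = np.sqrt(2.0 * dt) * rng.standard_normal(2)
        y = y + drift(y) * dt + noise
        samples[k] = y
    return samples

samples = simulate([2.0, 0.0])
print(samples.mean(axis=0), samples.var(axis=0))  # ≈ (0, 0) and (1, 1) for N(0, I_2)
```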
The long time numerical approximation of the process \( {X} \), or more generally \( {Y} \), can be used as a proxy for the simulation of its invariant measure \( {\mu} \). For Markov chain Monte Carlo (MCMC) practitioners, the concrete problem is then to select \( {F} \) so as to be sure to improve the speed of convergence to equilibrium.
Non-reversibility may improve the speed of convergence, by going beyond standard diffusivity. One of the simplest instances of this phenomenon is the asymmetric random walk on the discrete circle, which is not reversible but converges to its equilibrium (the uniform distribution) faster than the symmetric random walk (which is reversible). Many other examples are provided for instance by Piecewise Deterministic Markov Processes (PDMP), which are typically not reversible.
Hybrid or Hamiltonian Monte Carlo. There is another way to produce a diffusion which converges to the equilibrium \( {\mu} \) faster than \( {X} \). Roughly speaking, the idea is to add more dimensions to the space, connect them to the dynamics, and use them to better explore the space. An implementation of this vague idea is provided by Hybrid or Hamiltonian Monte Carlo (HMC) methods. Let us briefly describe it.
We introduce an auxiliary smooth density \( {\psi:\mathbb{R}^d\rightarrow\mathbb{R}_+} \) with \( {\Vert\psi\Vert_{L^1(\mathrm{d}x)}=1} \). Now let \( {{(X_t,V_t)}_{t\geq0}} \) be the diffusion process on \( {\mathbb{R}^d\times\mathbb{R}^d} \) solution of
\[ \begin{cases} \mathrm{d} X_t & =-\nabla\log\psi(V_t)\mathrm{d}t\\ \mathrm{d} V_t & =\nabla\log\varphi(X_t)\mathrm{d} t+\nabla\log\psi(V_t)\mathrm{d} t+\sqrt{2}\mathrm{d} B_t \end{cases} \]
where \( {{(B_t)}_{t\geq0}} \) is a standard Brownian motion on \( {\mathbb{R}^d} \). In the case where the auxiliary density \( {\psi} \) is a standard Gaussian, say \( {\psi(v)=(2\pi)^{-\frac{d}{2}}\exp(-\frac{1}{2}|v|^2)} \), then
\[ V_t=\frac{\mathrm{d}X_t}{\mathrm{d}t}, \]
and in this case \( {X_t} \) and \( {V_t} \) can be interpreted as respectively the position and the velocity of a point in \( {\mathbb{R}^d} \) at time \( {t} \) (in this kinetic context, the stochastic differential equation above is known as a “Langevin dynamics”). It turns out that the invariant measure of the process \( {(X,V)} \) is the product measure
\[ \eta=\mu\otimes\nu \quad\text{where}\quad \nu(\mathrm{d}v)=\psi(v)\mathrm{d}v \]
in other words
\[ \eta(\mathrm{d}x,\mathrm{d}v)=\varphi(x)\psi(v)\mathrm{d}x\mathrm{d}v. \]
To check the invariance of \( {\eta} \) for the process \( {(X,V)} \), it suffices to show that for any smooth and compactly supported test function \( {f:\mathbb{R}^d\times\mathbb{R}^d\rightarrow\mathbb{R}} \),
\[ \int Lf\mathrm{d}\eta =0 \]
where \( {L} \) is the infinitesimal generator of \( {(X,V)} \) given by
\[ Lf= \underbrace{\Delta_vf+\nabla\log\psi(v)\cdot\nabla_vf}_{L_1}+\underbrace{\nabla \log\varphi(x)\cdot\nabla_vf-\nabla\log\psi(v)\cdot\nabla_x f}_{L_2}. \]
Indeed, for each fixed \( {x} \), an integration by parts in \( {v} \) (the same computation as for the gradient diffusion above, with \( {\psi} \) in place of \( {\varphi} \)) gives
\[ \int L_1f\mathrm{d}\eta =\int\left(\int L_1f\,\psi(v)\mathrm{d}v\right)\varphi(x)\mathrm{d}x =0, \]
while by integration by parts again
\[ \begin{array}{rcl} \displaystyle\int L_2f\mathrm{d}\eta &=&\displaystyle\int\nabla\log\varphi(x)\cdot\left(\int(\nabla_vf)\psi(v)\mathrm{d}v\right)\varphi(x)\mathrm{d}x\\ &&\displaystyle-\int\nabla\log\psi(v)\cdot\left(\int(\nabla_xf)\varphi(x)\mathrm{d}x\right)\psi(v)\mathrm{d}v\\ &=&\displaystyle-\int\nabla\log\varphi(x)\cdot\nabla\log\psi(v)f(x,v)\varphi(x)\psi(v)\mathrm{d}x\mathrm{d}v\\ &&\displaystyle+\int\nabla\log\psi(v)\cdot\nabla\log\varphi(x)f(x,v)\varphi(x)\psi(v)\mathrm{d}x\mathrm{d}v\\ &=&0. \end{array} \]
If we write \( {\psi(v)=\frac{\mathrm{e}^{-\beta W(v)}}{Z’}} \) then
\[ \eta(\mathrm{d}x,\mathrm{d}v) =\varphi(x)\psi(v)\mathrm{d}x\mathrm{d}v =\frac{\mathrm{e}^{-\beta H(x,v)}}{Z”}\mathrm{d}x\mathrm{d}v \]
where \( {Z”=ZZ’} \) and where \( {H(x,v)=U(x)+W(v)} \) is the “Hamiltonian”. HMC methods are very useful in practice, in association with efficient numerical schemes for approximating the solution of the stochastic differential equation satisfied by \( {(X,V)} \).
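As a rough illustration of such a scheme, here is an Euler-Maruyama discretization in Python of the kinetic Langevin dynamics above, in the one-dimensional case where both \( {\varphi} \) and \( {\psi} \) are standard Gaussian densities, so that \( {\nabla\log\varphi(x)=-x} \) and \( {\nabla\log\psi(v)=-v} \); more sophisticated splitting schemes are used in practice, and the numerical parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

def kinetic_langevin(x0, v0, dt=1e-2, n_steps=200_000):
    """Euler-Maruyama scheme for
        dX_t = V_t dt
        dV_t = -X_t dt - V_t dt + sqrt(2) dB_t,
    i.e. the system above with grad log phi(x) = -x and grad log psi(v) = -v."""
    x, v = x0, v0
    xs = np.empty(n_steps)
    for k in range(n_steps):
        noise = np.sqrt(2.0 * dt) * rng.standard_normal()
        x, v = x + v * dt, v + (-x - v) * dt + noise
        xs[k] = x
    return xs

xs = kinetic_langevin(x0=0.0, v0=0.0)
print(xs.mean(), xs.var())  # the X-marginal should be close to the target N(0, 1)
```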
Further reading. There are plenty of good references, such as the recent survey by Tony Lelièvre and Gabriel Stoltz entitled Partial differential equations and stochastic methods in molecular dynamics (Acta Numerica 25 (2016), 681-880).
You may also take a look at the talk by Pierre Monmarché during the Journée algorithmes stochastiques à Paris-Dauphine.