
Libres pensées d'un mathématicien ordinaire Posts

Fisher information between two Gaussians

Ronald A. Fisher (1890 – 1962) in 1951.

Fisher information. The Fisher information or divergence of a positive Borel measure \( {\nu} \) with respect to another one \( {\mu} \) on the same space is

\[ \mathrm{Fisher}(\nu\mid\mu) =\int\left|\nabla\log\textstyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\right|^2\mathrm{d}\nu =\int\frac{|\nabla\frac{\mathrm{d}\nu}{\mathrm{d}\mu}|^2}{\frac{\mathrm{d}\nu}{\mathrm{d}\mu}}\mathrm{d}\mu =4\int\left|\nabla\sqrt{\textstyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}}\right|^2\mathrm{d}\mu \in[0,+\infty] \]

if \( {\nu} \) is absolutely continuous with respect to \( {\mu} \), and \( {\mathrm{Fisher}(\nu\mid\mu)=+\infty} \) otherwise.

It plays a role in the analysis and geometry of statistics, information, partial differential equations, and Markov diffusion stochastic processes. It is named after Ronald Aylmer Fisher (1890 – 1962), a British scientist who is also the Fisher of many other objects and concepts including for instance:

However, he should not be confused with for instance:

Let us denote \( {|x|=\sqrt{x_1^2+\cdots+x_n^2}} \) and \( {x\cdot y=x_1y_1+\cdots+x_ny_n} \) for all \( {x,y\in\mathbb{R}^n} \).

Explicit formula for Gaussians. For all \( {n\geq1} \), all vectors \( {m_1,m_2\in\mathbb{R}^n} \), and all \( {n\times n} \) covariance matrices \( {\Sigma_1} \) and \( {\Sigma_2} \), we have

\[ \mathrm{Fisher}(\mathcal{N}(m_1,\Sigma_1)\mid\mathcal{N}(m_2,\Sigma_2)) =|\Sigma_2^{-1}(m_1-m_2)|^2+\mathrm{Tr}(\Sigma_2^{-2}\Sigma_1-2\Sigma_2^{-1}+\Sigma_1^{-1}). \]

When \( {\Sigma_1} \) and \( {\Sigma_2} \) commute, this reduces to the following, closer to the univariate case,

\[ \mathrm{Fisher}(\mathcal{N}(m_1,\Sigma_1)\mid\mathcal{N}(m_2,\Sigma_2)) =|\Sigma_2^{-1}(m_1-m_2)|^2+\mathrm{Tr}(\Sigma_2^{-2}(\Sigma_2-\Sigma_1)^2\Sigma_1^{-1}). \]

In the univariate case, this reads, for all \( {m_1,m_2\in\mathbb{R}} \) and \( {\sigma_1^2,\sigma_2^2\in(0,\infty)} \),

\[ \mathrm{Fisher}(\mathcal{N}(m_1,\sigma_1^2)\mid\mathcal{N}(m_2,\sigma_2^2)) =\frac{(m_1-m_2)^2}{\sigma_2^4}+\frac{(\sigma_2^2-\sigma_1^2)^2}{\sigma_1^2\sigma_2^4}. \]
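A numerical sanity check. The following sketch compares the \( {n=1} \) case of the multivariate formula, in which the mean term is \( {|\sigma_2^{-2}(m_1-m_2)|^2=(m_1-m_2)^2/\sigma_2^4} \), with a direct midpoint-rule quadrature of \( {\int|(\log(f_1/f_2))'|^2f_1\mathrm{d}x} \). The function names, the integration window, and the number of steps are illustrative choices, not part of the statement above.

```python
import math


def fisher_gauss_1d(m1, v1, m2, v2):
    """Closed form for Fisher(N(m1, v1) | N(m2, v2)) with variances v1, v2 > 0."""
    return (m1 - m2) ** 2 / v2 ** 2 + (v2 - v1) ** 2 / (v1 * v2 ** 2)


def fisher_quadrature(m1, v1, m2, v2, half_width=12.0, steps=20_000):
    """Midpoint rule for the integral of |d/dx log(f1/f2)|^2 against f1.

    The window is m1 +/- half_width standard deviations of the first Gaussian
    (an arbitrary choice, wide enough for the tails to be negligible).
    """
    s1 = math.sqrt(v1)
    lo = m1 - half_width * s1
    h = 2 * half_width * s1 / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        # gradient of log(f1/f2) at x
        score = (x - m2) / v2 - (x - m1) / v1
        f1 = math.exp(-(x - m1) ** 2 / (2 * v1)) / math.sqrt(2 * math.pi * v1)
        total += score ** 2 * f1 * h
    return total


closed = fisher_gauss_1d(1.0, 2.0, -0.5, 0.5)
numeric = fisher_quadrature(1.0, 2.0, -0.5, 0.5)
print(closed, numeric)
```

With these parameters the closed form is \( {9+4.5=13.5} \), and the quadrature should agree with it up to discretization error.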

A proof. If \( {X\sim\mathcal{N}(m,\Sigma)} \) then, for all \( {1\leq i,j\leq n} \),

\[ \mathbb{E}(X_iX_j)=\Sigma_{ij}+m_im_j, \]

hence, for all \( {n\times n} \) symmetric matrices \( {A} \) and \( {B} \),

\[ \begin{array}{rcl} \mathbb{E}(AX\cdot BX) &=&\mathbb{E}\sum_{i,j,k=1}^nA_{ij}X_jB_{ik}X_k\\ &=&\sum_{i,j,k=1}^nA_{ij}B_{ik}\mathbb{E}(X_jX_k)\\ &=&\sum_{i,j,k=1}^nA_{ij}B_{ik}(\Sigma_{jk}+m_jm_k)\\ &=&\mathrm{Trace}(A\Sigma B)+Am\cdot Bm, \end{array} \]

and thus for all \( {n} \)-dimensional vectors \( {a} \) and \( {b} \),

\[ \begin{array}{rcl} \mathbb{E}(A(X-a)\cdot B(X-b)) &=&\mathbb{E}(AX\cdot BX)+A(m-a)\cdot B(m-b)-Am\cdot Bm\\ &=&\mathrm{Trace}(A\Sigma B)+A(m-a)\cdot B(m-b). \end{array} \]

Now, using the notation \( {\Gamma_i=\mathcal{N}(m_i,\Sigma_i)} \), \( {q_i(x)=\Sigma_i^{-1}(x-m_i)\cdot(x-m_i)} \), and \( {|\Sigma_i|=\det(\Sigma_i)} \), and applying the previous identity with \( {X\sim\Gamma_1} \),

\[ \begin{array}{rcl} \mathrm{Fisher}(\Gamma_1\mid\Gamma_2) &=&\displaystyle4\frac{\sqrt{|\Sigma_2|}}{\sqrt{|\Sigma_1|}}\int\Bigl|\nabla\mathrm{e}^{-\frac{q_1(x)}{4}+\frac{q_2(x)}{4}}\Bigr|^2\frac{\mathrm{e}^{-\frac{q_2(x)}{2}}}{\sqrt{(2\pi)^n|\Sigma_2|}}\mathrm{d}x\\ &=&\displaystyle\int|\Sigma_2^{-1}(x-m_2)-\Sigma_1^{-1}(x-m_1)|^2\frac{\mathrm{e}^{-\frac{q_1(x)}{2}}}{\sqrt{(2\pi)^n|\Sigma_1|}}\mathrm{d}x\\ &=&\displaystyle\int(|\Sigma_2^{-1}(x-m_2)|^2\\ &&\qquad-2\Sigma_2^{-1}(x-m_2)\cdot\Sigma_1^{-1}(x-m_1)\\ &&\qquad+|\Sigma_1^{-1}(x-m_1)|^2)\frac{\mathrm{e}^{-\frac{q_1(x)}{2}}}{\sqrt{(2\pi)^n|\Sigma_1|}}\mathrm{d}x\\ &=&\mathrm{Trace}(\Sigma_2^{-1}\Sigma_1\Sigma_2^{-1})+|\Sigma_2^{-1}(m_1-m_2)|^2-2\mathrm{Trace}(\Sigma_2^{-1})+\mathrm{Trace}(\Sigma_1^{-1})\\ &=&\mathrm{Trace}(\Sigma_2^{-2}\Sigma_1-2\Sigma_2^{-1}+\Sigma_1^{-1})+|\Sigma_2^{-1}(m_1-m_2)|^2. \end{array} \]

The formula when \( {\Sigma_1\Sigma_2=\Sigma_2\Sigma_1} \) follows immediately.

Other distances. Recall that the Hellinger distance between probability measures \( {\mu} \) and \( {\nu} \) with densities \( {f_\mu} \) and \( {f_\nu} \) with respect to the same reference measure \( {\lambda} \) is

\[ \mathrm{Hellinger}(\mu,\nu) =\Bigr(\int(\sqrt{f_\mu}-\sqrt{f_\nu})^2\mathrm{d}\lambda\Bigr)^{1/2} =\Bigr(2-2\int\sqrt{f_\mu f_\nu}\mathrm{d}\lambda\Bigr)^{1/2} \in[0,\sqrt{2}]. \]

This quantity does not depend on the choice of \( {\lambda} \).

The Kullback–Leibler divergence or relative entropy is defined by

\[ \mathrm{Kullback}(\nu\mid\mu) =\int\log{\textstyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}}\mathrm{d}\nu =\int{\textstyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}} \log{\textstyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}}\mathrm{d}\mu \in[0,+\infty] \ \ \ \ \ (1) \]

if \( {\nu} \) is absolutely continuous with respect to \( {\mu} \), and \( {\mathrm{Kullback}(\nu\mid\mu)=+\infty} \) otherwise.

The Wasserstein–Kantorovich–Monge transportation distance of order \( {2} \) and with respect to the underlying Euclidean distance is defined for all probability measures \( {\mu} \) and \( {\nu} \) on \( {\mathbb{R}^n} \) by

\[ \mathrm{Wasserstein}(\mu,\nu)=\Bigr(\inf_{(X,Y)}\mathbb{E}(\left|X-Y\right|^2)\Bigr)^{1/2} \in[0,+\infty] \ \ \ \ \ (2) \]

where the inf runs over all couples \( {(X,Y)} \) with \( {X\sim\mu} \) and \( {Y\sim\nu} \).

Now, for all \( {n\geq1} \), \( {m_1,m_2\in\mathbb{R}^n} \), and all \( {n\times n} \) covariance matrices \( {\Sigma_1,\Sigma_2} \), denoting

\[ \Gamma_1=\mathcal{N}(m_1,\Sigma_1) \quad\mbox{and}\quad \Gamma_2=\mathcal{N}(m_2,\Sigma_2), \]

we have

\[ \begin{array}{rcl} \mathrm{Hellinger}^2(\Gamma_1,\Gamma_2) &=&2-2\frac{\det(\Sigma_1\Sigma_2)^{1/4}}{\det(\frac{\Sigma_1+\Sigma_2}{2})^{1/2}}\mathrm{exp}\Bigr(-\frac{1}{4}(\Sigma_1+\Sigma_2)^{-1}(m_2-m_1)\cdot(m_2-m_1)\Bigr),\\ 2\mathrm{Kullback}(\Gamma_1\mid\Gamma_2) &=&\Sigma_2^{-1}(m_1-m_2)\cdot(m_1-m_2)+\mathrm{Tr}(\Sigma_2^{-1}\Sigma_1-\mathrm{Id})+\log\det(\Sigma_2\Sigma_1^{-1}),\\ \mathrm{Fisher}(\Gamma_1\mid\Gamma_2) &=&|\Sigma_2^{-1}(m_1-m_2)|^2+\mathrm{Tr}(\Sigma_2^{-2}\Sigma_1-2\Sigma_2^{-1}+\Sigma_1^{-1}),\\ \mathrm{Wasserstein}^2(\Gamma_1,\Gamma_2) &=&|m_1-m_2|^2+\mathrm{Tr}\Bigr(\Sigma_1+\Sigma_2-2\sqrt{\sqrt{\Sigma_1}\Sigma_2\sqrt{\Sigma_1}}\Bigr), \end{array} \]

and if \( {\Sigma_1} \) and \( {\Sigma_2} \) commute, \( {\Sigma_1\Sigma_2=\Sigma_2\Sigma_1} \), then we find the simpler formulas

\[ \begin{array}{rcl} \mathrm{Fisher}(\Gamma_1\mid\Gamma_2) &=&|\Sigma_2^{-1}(m_1-m_2)|^2+\mathrm{Tr}(\Sigma_2^{-2}(\Sigma_2-\Sigma_1)^2\Sigma_1^{-1})\\ \mathrm{Wasserstein}^2(\Gamma_1,\Gamma_2) &=&|m_1-m_2|^2+\mathrm{Tr}((\sqrt{\Sigma_1}-\sqrt{\Sigma_2})^2). \end{array} \]
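The four formulas in dimension one. As a rough illustration, here is a sketch of the four quantities above specialized to \( {n=1} \), with variances \( {v_i=\sigma_i^2} \). The function names and the argument order are my own conventions for this example.

```python
import math


def hellinger_sq(m1, v1, m2, v2):
    """Squared Hellinger distance between N(m1, v1) and N(m2, v2)."""
    affinity = (v1 * v2) ** 0.25 / math.sqrt((v1 + v2) / 2)
    affinity *= math.exp(-(m1 - m2) ** 2 / (4 * (v1 + v2)))
    return 2 - 2 * affinity


def kullback(m1, v1, m2, v2):
    """Kullback(N(m1, v1) | N(m2, v2)); note the factor 1/2."""
    return 0.5 * ((m1 - m2) ** 2 / v2 + v1 / v2 - 1 + math.log(v2 / v1))


def fisher(m1, v1, m2, v2):
    """Fisher(N(m1, v1) | N(m2, v2)), n = 1 case of the trace formula."""
    return (m1 - m2) ** 2 / v2 ** 2 + v1 / v2 ** 2 - 2 / v2 + 1 / v1


def wasserstein_sq(m1, v1, m2, v2):
    """Squared Wasserstein distance of order 2 (scalars always commute)."""
    return (m1 - m2) ** 2 + (math.sqrt(v1) - math.sqrt(v2)) ** 2


for name, f in [("Hellinger^2", hellinger_sq), ("Kullback", kullback),
                ("Fisher", fisher), ("Wasserstein^2", wasserstein_sq)]:
    # symmetric quantities give the same value in both orders
    print(name, f(0.0, 1.0, 3.0, 4.0), f(3.0, 4.0, 0.0, 1.0))
```

The Hellinger and Wasserstein quantities are symmetric in the two Gaussians, while Kullback and Fisher are not, which the printed pairs make visible.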

Fisher as an infinitesimal Kullback. The Boltzmann–Shannon entropy is in a sense the opposite of the Kullback divergence with respect to the Lebesgue measure \( {\lambda} \), namely

\[ \mathrm{Entropy}(\mu) =-\int\frac{\mathrm{d}\mu}{\mathrm{d}\lambda} \log\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}\mathrm{d}\lambda =-\mathrm{Kullback}(\mu\mid\lambda). \]

It was discovered by Nicolaas Govert de Bruijn (1918 – 2012) that the Fisher information appears as the differential version of the entropy under Gaussian noise. More precisely, the de Bruijn identity states that if \( {X} \) is a random vector of \( {\mathbb{R}^n} \) with finite entropy and finite Fisher information, and if \( {Z\sim\mathcal{N}(0,I_n)} \) is independent of \( {X} \), then

\[ \frac{\mathrm{d}}{\mathrm{d}t}\Bigr\vert_{t=0} \mathrm{Entropy}(\mathrm{Law}(X+\sqrt{t}Z)) =\frac{1}{2}\mathrm{Fisher}(\mathrm{Law}(X)\mid\lambda). \]

In other words, if \( {\mu_t} \) is the law at time \( {t} \) of a standard \( {n} \)-dimensional Brownian motion started from a random initial condition \( {X} \) then

\[ \frac{\mathrm{d}}{\mathrm{d}t}\Bigr\vert_{t=0} \mathrm{Entropy}(\mu_t) =\frac{1}{2}\mathrm{Fisher}(\mu_0\mid\lambda). \]

The factor \( {\frac{1}{2}} \) reflects the generator \( {\frac{1}{2}\Delta} \) of standard Brownian motion, and the sign is positive because the entropy is the opposite of a Kullback divergence.

The Lebesgue measure is the invariant (and reversible) measure of Brownian motion. More generally, let us consider the stochastic differential equation

\[ \mathrm{d}X_t=\sqrt{2}\mathrm{d}B_t-\nabla V(X_t)\mathrm{d}t \]

on \( {\mathbb{R}^n} \), where \( {V:\mathbb{R}^n\rightarrow\mathbb{R}} \) is \( {\mathcal{C}^2} \) and where \( {{(B_t)}_{t\geq0}} \) is a standard Brownian motion. If we assume that \( {V-\frac{\rho}{2}\left|\cdot\right|^2} \) is convex for some \( {\rho\in\mathbb{R}} \) then this equation admits a solution \( {{(X_t)}_{t\geq0}} \) known as the overdamped Langevin process, which is a Markov diffusion process. If we further assume that \( {\mathrm{e}^{-V}} \) is integrable with respect to the Lebesgue measure, then the probability measure \( {\mu} \) with density proportional to \( {\mathrm{e}^{-V}} \) is invariant and reversible. Now, denoting \( {\mu_t=\mathrm{Law}(X_t)} \), the analogue of the de Bruijn identity reads, for all \( {t\geq0} \),

\[ \frac{\mathrm{d}}{\mathrm{d}t} \mathrm{Kullback}(\mu_t\mid\mu) =-\mathrm{Fisher}(\mu_t\mid\mu) \]

but this requires that \( {\mu_0} \) is chosen in such a way that \( {t\mapsto\mathrm{Kullback}(\mu_t\mid\mu)} \) is well defined and differentiable. This condition is easily checked in the example of the Ornstein–Uhlenbeck process which corresponds to \( {V=\frac{1}{2}\left|\cdot\right|^2} \) and for which \( {\mu=\mathcal{N}(0,I_n)} \).

Ornstein–Uhlenbeck. If \( {{(X_t^x)}_{t\geq0}} \) is an \( {n} \)-dimensional Ornstein–Uhlenbeck process solution of the stochastic differential equation

\[ X_0^x=x\in\mathbb{R}^n, \quad\mathrm{d}X^x_t=\sqrt{2}\mathrm{d}B_t-X^x_t\mathrm{d}t \]

where \( {{(B_t)}_{t\geq0}} \) is a standard \( {n} \)-dimensional Brownian motion, then the invariant law is \( {\gamma=\mathcal{N}(0,I_n)} \) and the Mehler formula reads

\[ X^x_t=x\mathrm{e}^{-t}+\sqrt{2}\int_0^t\mathrm{e}^{s-t}\mathrm{d}B_s\sim\mathcal{N}(x\mathrm{e}^{-t},(1-\mathrm{e}^{-2t})I_n), \]

and the explicit formula for the Fisher information for Gaussians gives

\[ \mathrm{Fisher}(\mathrm{Law}(X^x_t)\mid\gamma) =\mathrm{Fisher}(\mathcal{N}(x\mathrm{e}^{-t},(1-\mathrm{e}^{-2t})I_n)\mid\gamma) =|x|^2\mathrm{e}^{-2t}+n\frac{\mathrm{e}^{-4t}}{1-\mathrm{e}^{-2t}}. \]
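Checking the identity numerically. For the Ornstein–Uhlenbeck process everything is explicit, so the relation \( {\frac{\mathrm{d}}{\mathrm{d}t}\mathrm{Kullback}(\mu_t\mid\gamma)=-\mathrm{Fisher}(\mu_t\mid\gamma)} \) can be tested with a finite difference. The sketch below does this in dimension \( {n=1} \) for \( {t>0} \), using the Gaussian formulas for Kullback and Fisher with \( {\Gamma_2=\mathcal{N}(0,1)} \); the helper names and the step size are assumptions made for this example only.

```python
import math


def ou_law(x, t):
    """Mean and variance of the one-dimensional law N(x e^{-t}, 1 - e^{-2t})."""
    return x * math.exp(-t), 1.0 - math.exp(-2.0 * t)


def kullback_to_std(m, v):
    """Kullback(N(m, v) | N(0, 1))."""
    return 0.5 * (m * m + v - 1.0 - math.log(v))


def fisher_to_std(m, v):
    """Fisher(N(m, v) | N(0, 1))."""
    return m * m + (1.0 - v) ** 2 / v


def de_bruijn_sides(x, t, h=1e-5):
    """Central finite difference of t -> Kullback, and minus Fisher at time t."""
    k_plus = kullback_to_std(*ou_law(x, t + h))
    k_minus = kullback_to_std(*ou_law(x, t - h))
    return (k_plus - k_minus) / (2.0 * h), -fisher_to_std(*ou_law(x, t))


lhs, rhs = de_bruijn_sides(x=1.5, t=0.7)
print(lhs, rhs)
```

The two printed numbers should coincide up to the finite-difference error. At \( {t=0} \) the law is a Dirac mass and both sides are infinite, which is why the check is restricted to \( {t>0} \).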



Back to basics: the Dubins-Schwarz theorem

Lester Eli Dubins (1920-2010)

The Dubins-Schwarz theorem is an important result of stochastic calculus. It states essentially that continuous local martingales, and in particular continuous martingales, are time-changed Brownian motions. It is named after the American mathematician Lester Dubins (1920-2010), and the Israeli mathematician and statistician Gideon E. Schwarz (1933-2007) who is also at the origin of the Bayesian information criterion (BIC) in statistics. He is neither the famous German mathematician Karl Hermann Amandus Schwarz (1843-1921) nor the famous French mathematician Laurent Schwartz (1915-2002). The Dubins-Schwarz theorem was also discovered independently by the Russian mathematician K.È. Dambis, who apparently published a single article, in Russian, in 1965, the same year as the paper by Dubins and Schwarz.

Dubins-Schwarz theorem. Let \( {M} \) be a continuous local martingale with respect to a filtration \( {{(\mathcal{F}_t)}_{t\geq0}} \), such that \( {M_0=0} \) and \( {\langle M\rangle_\infty=\infty} \) almost surely. For all \( {t\geq0} \), let

\[ T_t=\inf\{s\geq0:\langle M\rangle_s>t\}=\langle M\rangle_t^{-1} \]

be the generalized inverse of the non-decreasing process \( {\langle M\rangle} \) issued from \( {0} \). Then

  1. \( {B={(M_{\langle M\rangle_t^{-1}})}_{t\geq0}} \) is a Brownian motion with respect to the filtration \( {{(\mathcal{F}_{T_t})}_{t\geq0}} \)
  2. \( {{(B_{\langle M\rangle_t})}_{t\geq0}={(M_t)}_{t\geq0}} \).

For instance, if \( {M=\alpha W} \) where \( {\alpha>0} \) is a constant and \( {W} \) is a Brownian motion issued from the origin, then for all \( {t\geq0} \) we have \( {\langle M\rangle_t=\alpha^2t} \) and \( {T_t=\alpha^{-2}t} \), and the process

\[ B={(M_{T_t})}_{t\geq0}={(\alpha W_{\alpha^{-2}t})}_{t\geq0} \]

is a Brownian motion with respect to \( {{(\mathcal{F}_{\alpha^{-2}t})}_{t\geq0}} \). In this example, the change of time is deterministic, but in general, it is random, for instance if \( {M_t=\int_0^tW_s\mathrm{d}W_s} \) where \( {{(W_t)}_{t\geq0}} \) is a Brownian motion then \( {\langle M\rangle_t=\int_0^tW_s^2\mathrm{d}s} \) which is random.

Flatness lemma. Since \( {\langle M\rangle} \) can be flat on an interval, the map \( {t\mapsto T_t} \) can be discontinuous. But this does not contradict the continuity of \( {t\mapsto M_{T_t}} \). Indeed, the flatness lemma states that \( {M} \) and \( {\langle M\rangle} \) are constant on the same intervals in the sense that almost surely, for all \( {0\leq a<b} \),

\[ \forall t\in[a,b], M_t=M_a \quad\text{if and only if}\quad \langle M\rangle_b=\langle M\rangle_a. \]

Proof of the flatness lemma. Since \( {M} \) and \( {\langle M\rangle} \) are continuous, it suffices to show that for all \( {0\leq a\leq b} \), almost surely,

\[ \{\forall t\in[a,b]:M_t=M_a\}=\{\langle M\rangle_b=\langle M\rangle_a\}. \]

The inclusion \( {\subset} \) comes from the approximation of the quadratic variation \( {\langle M\rangle=[M]} \). Let us prove the converse. To this end, we consider the continuous local martingale \( {{(N_t)}_{t\geq0}={(M_t-M_{t\wedge a})}_{t\geq0}} \). We have

\[ \langle N\rangle =\langle M\rangle-2\langle M,M^a\rangle+\langle M^a\rangle =\langle M\rangle-2\langle M\rangle^a+\langle M\rangle^a =\langle M\rangle-\langle M\rangle^a. \]

For all \( {\varepsilon>0} \), we set the stopping time \( {T_\varepsilon=\inf\{t\geq0:\langle N\rangle_t>\varepsilon\}} \). The continuous local martingale \( {N^{T_\varepsilon}} \) satisfies \( {N^{T_\varepsilon}_0=0} \) and \( {\langle N^{T_\varepsilon}\rangle_\infty=\langle N\rangle_{T_\varepsilon}\leq\varepsilon} \). It follows that \( {N^{T_\varepsilon}} \) is a martingale bounded in \( {\mathrm{L}^2} \), and for all \( {t\geq0} \),

\[ \mathbb{E}(N^2_{t\wedge T_\varepsilon}) =\mathbb{E}(\langle N\rangle_{t\wedge T_\varepsilon}) \leq\varepsilon. \]

Let us define the event \( {A=\{\langle M\rangle_b=\langle M\rangle_a\}} \). Then \( {A\subset\{T_\varepsilon\geq b\}} \) and, for all \( {t\in[a,b]} \),

\[ \mathbb{E}(\mathbf{1}_AN^2_t) =\mathbb{E}(\mathbf{1}_AN^2_{t\wedge T_\varepsilon}) \leq\mathbb{E}(N^2_{t\wedge T_\varepsilon}) \leq\varepsilon. \]

By sending \( {\varepsilon} \) to \( {0} \) we obtain \( {\mathbb{E}(\mathbf{1}_AN^2_t)=0} \) and thus \( {N_t=0} \) almost surely on \( {A} \). This ends the proof of the flatness lemma, which is of independent interest.

Proof of the Dubins-Schwarz theorem. For all \( {t\geq0} \), the random variable \( {T_t} \) is a stopping time with respect to \( {{(\mathcal{F}_u)}_{u\geq0}} \), and \( {s\mapsto T_s} \) is non-decreasing. It follows that for all \( {0\leq s\leq t} \), \( {\mathcal{F}_{T_s}\subset\mathcal{F}_{T_t}} \), and thus \( {{(\mathcal{F}_{T_u})}_{u\geq0}} \) is a filtration. Moreover for all \( {t\geq0} \), \( {T_t} \) is a stopping time for the filtration \( {{(\mathcal{F}_{T_u})}_{u\geq0}} \). We have \( {T_t<\infty} \) for all \( {t\geq0} \) on the almost sure event \( {\{\langle M\rangle_\infty=\infty\}} \). By construction \( {{(T_t)}_{t\geq0}} \) is right continuous, non-decreasing (and thus with left limits), and adapted with respect to \( {{(\mathcal{F}_{T_t})}_{t\geq0}} \). Since \( {M} \) is continuous, \( {B={(M_{T_t})}_{t\geq0}} \) is right continuous with left limits. Moreover, for all \( {t\geq0} \),

\[ B_{t^-}=\lim_{s\underset{<}{\rightarrow}t}B_s=M_{T_{t^-}}. \]

By the flatness lemma, almost surely \( {B_{t^-}=B_t} \) for all \( {t\geq0} \), hence \( {B} \) is continuous.

Let us show that \( {B} \) is a Brownian motion for \( {{(\mathcal{F}_{T_t})}_{t\geq0}} \). For all \( {n\geq0} \), \( {M^{T_n}} \) is a continuous local martingale issued from the origin and \( {\langle M^{T_n}\rangle_\infty=\langle M\rangle_{T_n}=n} \) almost surely. It follows that for all \( {n\geq0} \), the processes

\[ M^{T_n} \quad\mbox{and}\quad (M^{T_n})^2-\langle M\rangle^{T_n} \]

are uniformly integrable martingales. Now, for all \( {0\leq s\leq t\leq n} \), and by the Doob stopping theorem for uniformly integrable martingales, using \( {T_s\leq T_t\leq T_n} \),

\[ \mathbb{E}(B_t\mid\mathcal{F}_{T_s}) =\mathbb{E}(M^{T_n}_{T_t}\mid\mathcal{F}_{T_s}) =M^{T_n}_{T_s} =M_{T_n\wedge T_s} =B_{s} \]

and similarly, using additionally the property \( {\langle M\rangle^{T_n}_{T_t}=\langle M\rangle_{T_n\wedge T_t}=\langle M\rangle_{T_t}=t} \),

\[ \mathbb{E}(B_t^2-t\mid\mathcal{F}_{T_s}) =\mathbb{E}((M^{T_n}_{T_t})^2-\langle M^{T_n}\rangle_{T_t}\mid\mathcal{F}_{T_s}) =(M^{T_n}_{T_s})^2-\langle M^{T_n}\rangle_{T_s} =B_s^2-s. \]

Thus \( {B} \) and \( {{(B_t^2-t)}_{t\geq0}} \) are martingales with respect to the filtration \( {{(\mathcal{F}_{T_t})}_{t\geq0}} \). It follows now from the Lévy characterization that \( {B} \) is a Brownian motion for \( {{(\mathcal{F}_{T_t})}_{t\geq0}} \).

Let us show that \( {M=B_{\langle M\rangle}} \). By definition of \( {B} \), almost surely, for all \( {t\geq0} \),

\[ B_{\langle M\rangle_t}=M_{T_{\langle M\rangle_t}}. \]

Now \( {T_{\langle M\rangle_t^-}\leq t\leq T_{\langle M\rangle_t}} \) and, by continuity, \( {\langle M\rangle} \) takes the same value at \( {t} \) and at \( {T_{\langle M\rangle_t}} \). Hence \( {\langle M\rangle} \) is constant on \( {[t,T_{\langle M\rangle_t}]} \), and the flatness lemma gives \( {M_t=M_{T_{\langle M\rangle_t}}} \) for all \( {t\geq0} \) almost surely. In other words, using the definition of \( {B} \), this means that almost surely, for all \( {t\geq0} \),

\[ M_t=M_{T_{\langle M\rangle_t}}=B_{\langle M\rangle_t}. \]

This ends the proof of the Dubins-Schwarz theorem.

Warnings about the Dubins-Schwarz theorem.

  • The Dubins-Schwarz theorem does not state that \( {B_{\langle M\rangle}=M} \) for a Brownian motion \( {B} \) with respect to the filtration for which \( {M} \) is a local martingale.
  • The Dubins-Schwarz theorem is not valid for semi-martingales.

Ornstein-Uhlenbeck process. For an arbitrary \( {x\in\mathbb{R}} \), let us consider the Ornstein-Uhlenbeck process \( {{(Z_t)}_{t\geq0}} \) issued from \( {x} \) and given for all \( {t\geq0} \) by

\[ Z_t=x\mathrm{e}^{-t}+\mathrm{e}^{-t}M_t \quad\mbox{where}\quad M_t=\sqrt{2}\int_0^t\mathrm{e}^s\mathrm{d}B_s \]

where \( {B={(B_t)}_{t\geq0}} \) is a Brownian motion in \( {\mathbb{R}} \) with respect to \( {{(\mathcal{F}_t)}_{t\geq0}} \). The process \( {{(Z_t)}_{t\geq0}} \) is the unique square integrable continuous semi-martingale solution of the stochastic differential equation \( {Z_t=x+\sqrt{2}B_t-\int_0^tZ_s\mathrm{d}s} \), \( {t\geq0} \).

The process \( {{(M_t)}_{t\geq0}} \) is Gaussian and for all \( {t\geq0} \), \( {M_t\sim\mathcal{N}(0,\langle M\rangle_t)} \) (Wiener integral) with \( {\langle M\rangle_t=\int_0^t(\sqrt{2}\mathrm{e}^{s})^2\mathrm{d}s=\mathrm{e}^{2t}-1} \). Hence, for all \( {t\geq0} \), we have the equality in law

\[ Z_t\overset{\mathrm{d}}{=}x\mathrm{e}^{-t}+\mathrm{e}^{-t}B_{\mathrm{e}^{2t}-1}. \]

The processes \( {{(Z_t)}_{t\geq0}} \) and \( {{(x\mathrm{e}^{-t}+\mathrm{e}^{-t}B_{\mathrm{e}^{2t}-1})}_{t\geq0}} \) have the same one-dimensional marginal distributions, but they are not equal since the second is not adapted to \( {{(\mathcal{F}_t)}_{t\geq0}} \).

However, since \( {{(M_t)}_{t\geq0}} \) is a continuous local martingale with respect to \( {{(\mathcal{F}_t)}_{t\geq0}} \) for which \( {M_0=0} \) and \( {\langle M\rangle_\infty=\infty} \), the Dubins-Schwarz theorem states that there exists a Brownian motion \( {{(W_t)}_{t\geq0}} \) with respect to \( {{(\mathcal{F}_{T_t})}_{t\geq0}} \) where

\[ T_t =\inf\{s\geq0:\langle M\rangle_s>t\} =\frac{\log(t+1)}{2} \]

such that

\[ {(Z_t)}_{t\geq0} = {(x\mathrm{e}^{-t}+\mathrm{e}^{-t}W_{\langle M\rangle_t})}_{t\geq0} = {(x\mathrm{e}^{-t}+\mathrm{e}^{-t}W_{\mathrm{e}^{2t}-1})}_{t\geq0}. \]
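The time change, numerically. The generalized inverse \( {T_t=\inf\{s\geq0:\langle M\rangle_s>t\}} \) can be computed by bisection whenever \( {\langle M\rangle} \) is continuous and increasing, and compared here with the closed form \( {\frac{1}{2}\log(t+1)} \). This is only a deterministic sketch of the time change for this specific example, not a simulation of the processes; the function names and the tolerance are mine.

```python
import math


def bracket(s):
    """Quadratic variation <M>_s = e^{2s} - 1 of M_t = sqrt(2) int_0^t e^s dB_s."""
    return math.exp(2.0 * s) - 1.0


def time_change(t, tol=1e-12):
    """T_t = inf{s >= 0 : <M>_s > t}, located by bisection on s."""
    lo, hi = 0.0, 1.0
    # enlarge the search window until <M>_hi exceeds t
    while bracket(hi) <= t:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if bracket(mid) > t:
            hi = mid
        else:
            lo = mid
    return hi


for t in (0.0, 0.5, 3.0, 10.0):
    print(t, time_change(t), math.log(t + 1.0) / 2.0)
```

Each printed row should show two nearly identical values, and composing in the other order, \( {\langle M\rangle_{T_t}=t} \), gives a second consistency check.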

About Gideon E. Schwarz. Born 1933 in Salzburg, Austria. Escaped in 1938, after the Anschluss, to Palestine, today Israel. M.Sc. in Mathematics at the Hebrew University, Jerusalem in 1956. Ph.D. in Mathematical Statistics at Columbia University in 1961. Research fellowships: Miller Institute 1964-66, Institute for Advanced Studies on Mt. Scopus 1975-76. Visiting appointments: Stanford University, Tel Aviv University, University of California in Berkeley. Since 1961, Fellow of the Institute of Mathematical Statistics. Presently, Professor of Statistics at the Hebrew University. Taken from his paper The dark side of the Moebius strip, Amer. Math. Monthly 97 (1990), no. 10, 890-897.

Gideon E. Schwarz (1933-2007)

An exactly solvable model

Rodney James Baxter (1940 – )

This post is about a model of statistical physics which consists in a probability measure on \( {\mathbb{R}^n} \) modeling \( {n} \) one-dimensional unit charge particles subject to Coulomb pair repulsion and to attraction with respect to a background of opposite charge. It is a one-dimensional Coulomb gas (not a log-gas!) confined by the potential generated by a charged background, a special case of the jellium model of Eugene Paul Wigner (1938). In the case of a uniform background, it is related to a conditioned Gaussian distribution. It was already observed by Rodney James Baxter (1963) that this model is exactly solvable. This exact solvability can be seen as the one-dimensional analogue of the exact solvability discovered by Eric Kostlan (1992) in the case of the two-dimensional Coulomb gas describing the spectrum of Ginibre random matrices.

One-dimensional Wigner jellium or Coulomb gas. The electrostatic potential at point \( {x\in\mathbb{R}} \) generated by a one-dimensional unit charge located at the origin is given by \( {g(x)=-|x|} \). By the principle of superposition, the electrostatic potential generated at point \( {x\in\mathbb{R}} \) by a distribution of charges \( {\mu} \) on \( {\mathbb{R}} \) is given by

\[ U_{\mu}(x)=(g*\mu)(x)=-\int|x-y|\mu(\mathrm{d}y) \]

and the electric field by \( {E_\mu=-U_{\mu}'=-g'*\mu=\mathrm{sign}*\mu} \). The derivative of \( {g} \) in the sense of distributions is the step function \( {g'=\mathbf{1}_{(-\infty,0)}-\mathbf{1}_{(0,+\infty)}=-\mathrm{sign}} \), which is an element of \( {\mathrm{L}^\infty} \) defined almost everywhere, while the second derivative of \( {g} \) in the sense of Schwartz distributions is a multiple of the Dirac mass at zero, \( {g''=-2\delta_0} \). In particular \( {g} \) is, up to a multiplicative constant, the fundamental solution of the Poisson equation and we can recover \( {\mu} \) from its potential by

\[ U''_{\mu}=g''*\mu=-2\delta_0*\mu=-2\mu. \]

The self-interaction energy of the distribution of charges \( {\mu} \) is

\[ \mathcal{E}(\mu) =\frac{1}{2}\iint g(x-y)\mu(\mathrm{d}x)\mu(\mathrm{d}y) =\frac{1}{2}\int U_\mu\mathrm{d}\mu. \]

Let us consider now \( {n\geq1} \) one-dimensional unit charges at positions \( {x_1,\ldots,x_n} \), lying in a positive background of total charge \( {\alpha>0} \) smeared according to a probability measure \( {\rho} \) on \( {\mathbb{R}} \) with finite Coulomb energy \( {\mathcal{E}(\rho)} \). The total potential energy of the system is

\[ H_n(x_1,\ldots,x_n) = -\sum_{i<j} |x_i-x_j| -\alpha\sum_{i=1}^nU_{\rho}(x_i) \]

up to the additive constant \( {\alpha^2\mathcal{E}(\rho)} \). The system is charge neutral when \( {\alpha=n} \). Following Wigner (1938), let us define now the Boltzmann-Gibbs probability measure \( {P_n} \) over all the possible configurations at inverse temperature \( {\beta>0} \) by

\[ \mathrm{d}P_n(x_1,\ldots,x_n) =\frac{\mathrm{e}^{-\beta H_n(x_1,\ldots,x_n)}}{Z_n} \mathrm{d}x_1\cdots\mathrm{d}x_n \]

where

\[ Z_n=\int_{\mathbb{R}^n}\mathrm{e}^{-\beta H_n(x_1,\ldots,x_n)} \mathrm{d}x_1\cdots\mathrm{d}x_n. \]

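It can be checked that \( {Z_n<\infty} \) if and only if \( {\alpha>n-1} \): when the right-most particle goes to \( {+\infty} \), the energy \( {H_n} \) grows like \( {\alpha-(n-1)} \) times its position. Note that \( {P_n} \) is a one-dimensional Coulomb gas with external field associated to the potential \( {V=-\frac{\alpha}{n}U_\rho} \).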

Baxter exact solvability for uniform backgrounds. The model is exactly solvable. Indeed, following Baxter (1963), we have the combinatorial identity

\[ -\sum_{i<j}|x_i-x_j| =\sum_{i<j}(x_{(j)}-x_{(i)}) =\sum_{k=1}^n (2k-n-1) x_{(k)}, \]

where \( {x_{(n)}\leq\cdots\leq x_{(1)}} \) is the reordering of \( {x_1,\ldots,x_n} \); in particular,

\[ x_{(n)}=\min_{1\leq i\leq n}x_i \quad\text{and}\quad x_{(1)}=\max_{1\leq i\leq n}x_i, \]

which allows us to rewrite the potential energy as

\[ H_n(x_1,\dots,x_n) = \sum_{k=1}^n \Bigl((2k-n-1)x_{(k)}-\alpha U_\rho(x_{(k)})\Bigr). \]
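Baxter's identity on an example. The combinatorial identity above is easy to test by brute force: the sketch below compares the double sum over pairs with the weighted sum over the decreasing reordering \( {x_{(1)}\geq\cdots\geq x_{(n)}} \). The function names are illustrative.

```python
def pair_energy(xs):
    """Left-hand side: minus the sum of |x_i - x_j| over pairs i < j."""
    n = len(xs)
    return -sum(abs(xs[i] - xs[j]) for i in range(n) for j in range(i + 1, n))


def baxter_sum(xs):
    """Right-hand side: sum of (2k - n - 1) x_(k) with x_(1) the largest point."""
    n = len(xs)
    ordered = sorted(xs, reverse=True)  # x_(1) >= x_(2) >= ... >= x_(n)
    return sum((2 * k - n - 1) * x for k, x in enumerate(ordered, start=1))


points = [3, 1, 2]
print(pair_energy(points), baxter_sum(points))  # both equal -4
```

The gain is computational as well as conceptual: the right-hand side costs a sort, \( {O(n\log n)} \), instead of the \( {O(n^2)} \) pairs of the left-hand side.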

We assume now that \( {\rho} \) is the uniform law on an interval \( {[a,b]} \). Then, for all \( {x\in\mathbb{R}} \),

\[ -U_\rho(x) =\frac{1}{b-a}\int_a^b|x-y|\mathrm{d}y =\begin{cases} \displaystyle\left|x-\frac{a+b}{2}\right| &\mbox{if }x\not\in[a,b]\\ \displaystyle\frac{\left(x-\frac{a+b}{2}\right)^2+\frac{(b-a)^2}{4}}{b-a} &\mbox{if }x\in[a,b] \end{cases}. \]
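The potential of a uniform background. As a quick check of the piecewise formula, the following sketch evaluates \( {-U_\rho(x)} \) in closed form and by a midpoint rule on \( {[a,b]} \). The function names and the number of quadrature steps are choices made for this example.

```python
def minus_u_uniform(x, a, b):
    """Closed form of -U_rho(x) when rho is the uniform law on [a, b]."""
    c = (a + b) / 2
    if x < a or x > b:
        return abs(x - c)  # affine outside the support
    return ((x - c) ** 2 + (b - a) ** 2 / 4) / (b - a)  # quadratic inside


def minus_u_quadrature(x, a, b, steps=20_000):
    """Midpoint rule for (1/(b-a)) * integral of |x - y| over [a, b]."""
    h = (b - a) / steps
    return sum(abs(x - (a + (k + 0.5) * h)) for k in range(steps)) * h / (b - a)


for x in (-2.0, -1.0, 0.5, 1.0, 3.0):
    print(x, minus_u_uniform(x, -1.0, 2.0), minus_u_quadrature(x, -1.0, 2.0))
```

The two branches match at \( {x=a} \) and \( {x=b} \), where both give \( {(b-a)/2} \), so the potential is continuous across the edge of the background.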

The potential \( {V=-\frac{\alpha}{n}U_\rho} \) then behaves quadratically on \( {[a,b]} \) and is affine outside \( {[a,b]} \). Conditioned on all the particles lying inside \( {[a,b]} \), it is possible to interpret \( {P_n} \) as a conditioned Gaussian law. Indeed, using Baxter’s identity, if \( {\{x_1,\ldots,x_n\}\subset[a,b]} \) then

\[ H_n(x_1,\ldots,x_n) = \sum_{k=1}^n(2k-n-1)x_{(k)} +\frac{\alpha}{b-a}\sum_{i=1}^n\Bigr(x_{(i)}-\frac{a+b}{2}\Bigr)^2+\frac{n\alpha(b-a)}{4}. \]

This formula shows then that \( {X_n\sim P_n} \) is conditionally Gaussian in the sense that

\[ \mathrm{Law}\Bigr((X_{(n)},\ldots,X_{(1)})\bigm\vert \{X_1,\ldots,X_n\}\subset[a,b]\Bigr) \]

\[ \qquad =\mathrm{Law}\Bigr((Y_n,\ldots,Y_1)\bigm\vert a\leq Y_n\leq\cdots\leq Y_1\leq b\Bigr) \]

where \( {Y_1,\ldots,Y_n} \) are independent real Gaussian random variables with

\[ \mathbb{E}Y_k=\frac{a+b}{2}+\frac{b-a}{2\alpha}\left(n+1-2k \right) \quad\text{and}\quad \mathbb{E}((Y_k-\mathbb{E}Y_k)^2)=\frac{b-a}{2\alpha\beta}. \]

This was already observed by Baxter. Now if we consider the limit \( {a\rightarrow-\infty, b\rightarrow \infty} \) with \( {\alpha/(b-a) \rightarrow c > 0} \), then \( {P_n} \) can be interpreted as a Coulomb gas for which the potential is quadratic everywhere, namely \( {V=\frac{c}{2n}\left|\cdot\right|^2} \). This can also be seen as a jellium with a background equal to a multiple of Lebesgue measure on the whole of \( {\mathbb{R}} \). Under the scaling \( {x_i=\sqrt{n}y_i} \), this limiting case matches the model studied by Abhishek Dhar, Anupam Kundu, Satya N. Majumdar, Sanjib Sabhapandit, and Grégory Schehr (2018). This Coulomb gas model with quadratic external field in one dimension is analogous to the complex Ginibre ensemble which is a Coulomb gas in two dimensions.
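Completing the square. The conditional Gaussian representation with means \( {\mathbb{E}Y_k} \) and common variance \( {\frac{b-a}{2\alpha\beta}} \) amounts to saying that, for configurations inside \( {[a,b]} \), the quantity \( {\beta H_n} \) and the Gaussian exponent \( {\sum_k(x_{(k)}-\mathbb{E}Y_k)^2/(2\mathrm{Var}(Y_k))} \) differ by a constant that does not depend on the configuration. The sketch below checks this on two configurations; the names and the sample values are hypothetical.

```python
def energy_inside(xs, a, b, alpha):
    """H_n for a configuration inside [a, b] with uniform background on [a, b]."""
    n = len(xs)
    c = (a + b) / 2
    ordered = sorted(xs, reverse=True)  # x_(1) >= ... >= x_(n)
    linear = sum((2 * k - n - 1) * x for k, x in enumerate(ordered, start=1))
    quadratic = alpha / (b - a) * sum((x - c) ** 2 for x in ordered)
    return linear + quadratic + n * alpha * (b - a) / 4


def gaussian_exponent(xs, a, b, alpha, beta):
    """Sum over k of (x_(k) - E Y_k)^2 / (2 Var Y_k)."""
    n = len(xs)
    c = (a + b) / 2
    var = (b - a) / (2 * alpha * beta)
    ordered = sorted(xs, reverse=True)
    total = 0.0
    for k, x in enumerate(ordered, start=1):
        mean_k = c + (b - a) / (2 * alpha) * (n + 1 - 2 * k)
        total += (x - mean_k) ** 2 / (2 * var)
    return total


a, b, alpha, beta = -1.0, 2.0, 2.5, 1.3
gaps = []
for config in ([0.1, 1.5, -0.7], [1.9, 0.0, 0.4]):
    gaps.append(beta * energy_inside(config, a, b, alpha)
                - gaussian_exponent(config, a, b, alpha, beta))
print(gaps)  # the two entries agree up to rounding
```

Note that the means \( {\mathbb{E}Y_k} \) may fall outside \( {[a,b]} \) when \( {\alpha} \) is small, which does not affect the identity since it is only used on the conditioned event.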

Scale invariance. The model \( {P_n} \) has a scale invariance which comes from the homogeneity of the one-dimensional Coulomb kernel \( {g} \). Indeed, if \( {\mathrm{dil}_\sigma(\mu)} \) denotes the law of the random vector \( {\sigma X} \) when \( {X\sim\mu} \), then, for all \( {\sigma>0} \), dropping the \( {n} \) subscript,

\[ \mathrm{dil}_\sigma(P^{\alpha,\beta,\rho}) =P^{\alpha,\frac{\beta}{\sigma},\mathrm{dil}_\sigma(\rho)}. \]

In other words, if \( {X_n\sim P^{\alpha,\beta,\rho}} \) then

\[ \sigma X_n\sim P^{\alpha,\frac{\beta}{\sigma},\mathrm{dil}_\sigma(\rho)}. \]

This property is useful in the asymptotic analysis of the model as \( {n\rightarrow\infty} \), and reveals the special role played by \( {\alpha} \) as a shape parameter. Here the inverse temperature \( {\beta} \) is a scale parameter, in contrast with the situation for log-gases.
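Homogeneity of the energy. The scale invariance boils down to the fact that the energy is homogeneous of degree one: writing \( {H^{\rho}} \) for the energy with background \( {\rho} \), we have \( {H^{\mathrm{dil}_\sigma(\rho)}(\sigma x)=\sigma H^{\rho}(x)} \), so that \( {\beta H^{\rho}(x)=\frac{\beta}{\sigma}H^{\mathrm{dil}_\sigma(\rho)}(\sigma x)} \). Here is a sketch checking this for a uniform background on \( {[a,b]} \), whose dilation is the uniform law on \( {[\sigma a,\sigma b]} \); the helper names and sample values are assumptions of this example.

```python
def minus_u_uniform(x, a, b):
    """-U_rho(x) for rho uniform on [a, b]."""
    c = (a + b) / 2
    if x < a or x > b:
        return abs(x - c)
    return ((x - c) ** 2 + (b - a) ** 2 / 4) / (b - a)


def energy(xs, a, b, alpha):
    """H_n = -sum_{i<j} |x_i - x_j| - alpha * sum_i U_rho(x_i), rho uniform on [a, b]."""
    n = len(xs)
    pairs = -sum(abs(xs[i] - xs[j]) for i in range(n) for j in range(i + 1, n))
    return pairs + alpha * sum(minus_u_uniform(x, a, b) for x in xs)


xs = [-3.0, 0.2, 1.1, 5.0]  # some points inside [a, b], some outside
a, b, alpha, sigma = -1.0, 2.0, 3.5, 2.5
dilated = energy([sigma * x for x in xs], sigma * a, sigma * b, alpha)
print(dilated, sigma * energy(xs, a, b, alpha))
```

The same computation with a logarithmic kernel would produce an additive constant instead of a multiplicative factor, which is one way to see why \( {\beta} \) is not a scale parameter for log-gases.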

Asymptotic analysis. An asymptotic analysis of \( {P_n} \) as \( {n\rightarrow\infty} \) is conducted in a joint work arXiv:2012.04633 with David García-Zelada and Paul Jung, for general backgrounds. This is a continuation of our previous work devoted to two-dimensional jelliums. We study one-dimensional Wigner jelliums, not necessarily charge neutral, for which the unit charges are allowed to exist beyond the support of the background. The model can be seen as a one-dimensional Coulomb gas (not a log-gas!) in which the external field is generated by a smeared background on an interval. We first observe that the system exists iff the total background charge is greater than the number of unit charges minus one. Moreover we obtain a Rényi-type probabilistic representation for the order statistics of the particle system beyond the support of the background. Furthermore, for various backgrounds, we show convergence to point processes, at the edge of the support of the background. In particular, this provides asymptotic analysis of the fluctuations of the right-most particle. Our analysis reveals that these fluctuations are not universal, in the sense that depending on the background, the tails range anywhere from exponential to Gaussian-like behavior, including for instance Tracy-Widom-like behavior.

One Dimensional Models. Excerpt from Baxter’s book on exactly solved models (1982):
One-dimensional models can be solved if they have finite-range, decaying exponential, or Coulomb interactions. As guides to critical phenomena, such models with short-range two-particle forces (including exponentially decaying forces) have a serious disadvantage: they do not have a phase transition at a non-zero temperature (van Hove, 1950; Lieb and Mattis, 1966). The Coulomb systems also do not have a phase transition, (Lenard, 1961; Baxter, 1963, 1964 and 1965), though the one-dimensional electron gas has long-range order at all temperatures (Kunz, 1974).
Of the one-dimensional models, only the nearest-neighbour Ising model (Ising, 1925; Kramers and Wannier, 1941) will be considered in this book. It provides a simple introduction to the transfer matrix technique that will be used for the more difficult two-dimensional models. Although it does not have a phase transition for non-zero temperature, the correlation length does become infinite at H = T = 0, so in a sense this is a ‘critical point’ and the scaling hypothesis can be tested near it.
A one-dimensional system can have a phase transition if the interactions involve infinitely many particles, as in the cluster interaction model (Fisher and Felderhof, 1970; Fisher, 1972). It can also have a phase transition if the interactions become infinitely long-ranged, but then the system really belongs to the following class of ‘infinite-dimensional’ models.

A famous book by R. J. Baxter

Vice-presidency for digital affairs: 2017-2020

My term as vice-president in charge of digital affairs at Université Paris-Dauphine ended this month. Over these four very full years, 2017-2020, I worked with two successive chief digital officers, four successive directors general of services (two of them interim), a president followed by a provisional administrator, and a wide variety of political and administrative leaders. To paraphrase Nietzsche, what does not kill you makes you stronger! All things considered, it takes time, perhaps a year or two, to begin to understand in depth the ins and outs of an organization such as a university.

"Le numérique" ("digital" in English) is the name given for some years now in the French-speaking world to computing in the broad sense, ranging from the technologies themselves to the humanities and social sciences, and encompassing administration, teaching, and research alike. Being in charge of digital affairs pushes you to keep, as far as possible, your feet on the ground and your head in the stars: a strategic vision together with a pragmatic attention to concrete matters. I rather enjoyed the difficult challenge of this balancing act. The state of digital affairs at Paris-Dauphine in 2020 bears little resemblance to that of 2016, but much remains to be done.

These years as vice-president really allowed me to get to know the university and the people in it better. Needless to say, one often sweats under the suit (*) in this great social theatre. Among the myriad experiences, I still remember with a smile meetings with union representatives during which I was plainly regarded as a member of management with evil intentions. I also remember a session of the board of directors during which, after my presentation on the state of digital affairs, an elected student representative took the floor and began with a "We millennials…".

(*) a suit that I never wore, which did not necessarily help my case.

Dauphine is a small university, with few teaching departments and research laboratories, and a particularly compact campus. Even so, people know each other rather little, and everyone lives in their own microcosm. One must therefore constantly encourage people to take into account the realities and constraints of others in order to give meaning to collective choices. Loud-mouthed ingratitude nonetheless remains fairly widespread. Everyone complains, students, administrative staff, teachers, researchers, generally about the others, sometimes always about the same ones. Digital technology can be a source of suffering as much as a scapegoat used to mask mediocrity. But there is also a social hierarchy: students and administrative staff are the ones who bear the brunt, while faculty members are free, for better and for worse. Yet there are wonderful people in every category. Each category has its own social hierarchy, its weighty history, and its difficulties, and it is sometimes among the others that one finds a kindred spirit.

In 2016, a large number of Dauphine users from all communities were exasperated by the lack of availability, reliability, and security of the email solution in use at the time (Partage). Just as many deplored how poorly the IT tools communicated with one another. For all these reasons, moving to a suite of cloud-based digital services appeared to be the best course of action. Two solutions were available on the market: Microsoft O365 and Google G Suite. Since the administration and a good share of the students and faculty used Microsoft Office as standard, Microsoft O365 appeared to be the most appropriate choice, avoiding a difficult if not impossible change-management effort. It was not always easy to defend this choice, and we sometimes doubted this bold move. In retrospect, the experience currently under way at Airbus, which chose G Suite, leaves no doubt on the matter. Not only did we make the right choice in the end, we were also lucky, since Microsoft Teams later appeared within O365 and is a major element of the organization's digital transformation, which began well before the Covid-19 crisis. It is true that Microsoft has a more negative image than Apple or Google in certain communities, an image inherited from a bygone era. Today's Microsoft has bet on the cloud, rivals the other GAFAM in dynamism, contributes to the Linux kernel, has acquired GitHub, etc. The vision of the new CEO Satya Nadella over the past ten years or so has a lot to do with it. That said, Microsoft is no more virtuous than the other GAFAM. The overwhelming hegemony of the United States and Asia over the digital industry is a problem, for hardware, software, and networks as well as for services, with or without the cloud.
It is legitimate to regret Europe's lack of economic vision in this area, but it is not by turning our backs on modernity at Dauphine that we will solve this problem of European policy. By providing a suite of high-quality, professional digital services, with data protected by contract, Dauphine curbs the massive proliferation of third-party online services that are falsely free and make users pay with their private data and metadata. This scourge, with its whiff of paradox, is ravaging many universities and research organizations in France.

Office 365 weighs relatively little on the budget given what it brings. But unlike most other French universities, Dauphine develops or adapts software solutions for its specific needs: paperless applications and adjunct-lecturer files, a research database, customer relationship management, … All this custom work is expensive, and the gap to the ideal is still very large. In digital matters, Dauphine suffers from having the ambition of the private sector, the constraints of the public sector, and a typically academic disorder, starting with that of the faculty. Here as elsewhere, the main lever for the institution's digital transformation is not the quality of the WiFi network or the video projectors, but rather the digital literacy of the organization's employees, and first and foremost that of its managers and leaders.

It is sometimes useful to think of digital affairs as a bouquet of symmetries: digital for administration and administration of digital, digital for teaching and teaching of digital, digital for research and research on digital. In digital matters, the 2017-2020 term largely consisted in introducing a little more method, rigor, and quality. As far as the digital transformation of the organization is concerned, it will always remain true that digitizing disorder produces digital disorder. Digital affairs were the subject of the first regulation of the term, notably through the creation of a digital master plan. The transformation of the information systems department into a digital department came with a new vision more oriented toward services, uses, and local digital support. In parallel, the creation of the cross-disciplinary program Dauphine numérique made it possible to strengthen digital on the teaching and research sides, in step with PSL's PRAIRIE institute, and to strengthen the development of relations with companies around organization sciences and digital technology. Two "cross-disciplinary" professorships in data science were created and filled within the Dauphine numérique program, which is not negligible at Dauphine's scale.

The new term continues and consolidates the one now ending, with in particular a stronger alignment with PSL, the new campus project taken into account, and a more proactive implementation of digital technology in the degree programs.

Some related posts:
