
Libres pensées d'un mathématicien ordinaire Posts

Activism

Every time a celebrity passes away, it is an opportunity to take the time for a rediscovery. The interview above of Claire Brétecher (1940 – 2020) is interesting. We learn from it (2:22), among other things, that she personally had little taste for activism, too hard-line for her liking, with its dogmatic and exclusive truth. I have been rather of this opinion since adolescence, and it is a pleasure for me to identify a kindred spirit! This has not prevented me from taking part in committed activities, for example in favor of free software. These commitments often led me to rub shoulders with comrades whose activism was too religious for my taste. There is, in all intellectual matters, a path made of an aesthetics of the absolute, which puts forward a truth, the truth, a definitive statics that I ultimately find deadly. I prefer the dynamics of permanent dialectics, very much alive. But is dynamics not merely a perpetual reinvention of statics, a statics of the second kind so to speak? The world of mathematics is interesting in this respect: many mathematicians think they are discovering definitive truths, pure and hard, for eternity, and very often only accomplish a human work, imperfect, dated, ephemeral, like Nicolas Bourbaki. In spite of everything, the ferment of history lets fragments of eternity float to the surface, in motion.

The clash of opposites always takes place somewhere, between social groups, between individuals, or inside each mind. It seems to be a matter of place and play, and the choice of an individual stance always raises questions in the theatrical context of confrontation. This may lead for example to defending the weakest of the moment, provided one is more of a defense lawyer than a prosecutor at heart, and without losing sight of the fact that the weakest is worth no more than the strongest. This relates just as well to an existentialist absurd as to the thought of an Emil Cioran (1911 – 1995).


Uniformization by dilation and truncation

John Wilder Tukey (1915 – 2000)

Histogram of a sample of fractional parts of standard Gaussians. It is remarkably close to uniform!

This tiny post is devoted to a phenomenon regarding dilation/truncation/rounding. This can be traced back to Aurel Wintner (1933), P. Kosulajeff (1937), and John Wilder Tukey (1938).

For any real number \( {x} \), let us denote its fractional part by \( {\langle x\rangle=x-\lfloor x\rfloor\in[0,1)} \).

Note that then \( {\langle 0\rangle=0} \) while \( {\langle x\rangle=1-\langle|x|\rangle} \) when \( {x<0} \) and \( {x\not\in\mathbb{Z}} \), thus \( {\langle x\rangle+\langle -x\rangle=\mathbf{1}_{x\not\in\mathbb{Z}}} \).
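As a quick sanity check of these identities, here is a minimal Python sketch. The helper name `frac` is mine and purely illustrative. Beware that `math.modf` keeps the sign of its argument, so it does not return \( {\langle x\rangle} \) when \( {x<0} \).

```python
import math

def frac(x: float) -> float:
    """Fractional part <x> = x - floor(x), always in [0, 1)."""
    return x - math.floor(x)

# Values chosen to be exactly representable in binary floating point.
print(frac(0.0))                  # <0>
print(frac(-0.25))                # equals 1 - <0.25>
print(frac(2.5) + frac(-2.5))     # non-integer x: the two parts sum to 1
print(frac(3.0) + frac(-3.0))     # integer x: the two parts sum to 0
print(math.modf(-0.25)[0])        # sign-preserving variant, not <x>
```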

Following Kosulajeff, let \( {X} \) be a real random variable with cumulative distribution function \( {F} \) which is absolutely continuous. Then the fractional part of the dilation of \( {X} \) by a factor \( {\sigma} \) tends in law, as \( {\sigma\rightarrow\infty} \), to the uniform distribution on \( {[0,1]} \). In other words

\[ \langle\sigma X\rangle :=\sigma X-\lfloor \sigma X\rfloor \underset{\sigma\rightarrow\infty}{\overset{\text{law}}{\longrightarrow}}\mathrm{Uniform}([0,1]). \]

Actually if \( {X\sim\mathcal{N}(0,1)} \) and \( {\sigma=1} \) then \( {\langle X\rangle} \) is already close to the uniform!
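To get a feeling for this claim, one can simulate. The sketch below is only an illustration: the function name, the number of bins, and the seed are my own choices, and it relies on the standard library generator `random.gauss`. With \( {\sigma=1} \) the ten bin frequencies are indistinguishable from \( {1/10} \) at this sample size, while with \( {\sigma=0.1} \) they are far from it.

```python
import math
import random

def fractional_part_histogram(sigma: float, n_samples: int, n_bins: int, seed: int = 0) -> list[float]:
    """Empirical frequencies of <sigma * X> over n_bins equal bins, X ~ N(0, 1)."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_samples):
        y = sigma * rng.gauss(0.0, 1.0)
        u = y - math.floor(y)  # fractional part, in [0, 1) up to rounding
        # min(...) guards against u being rounded to exactly 1.0
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    return [c / n_samples for c in counts]

if __name__ == "__main__":
    for sigma in (0.1, 1.0):
        freqs = fractional_part_histogram(sigma, 100_000, 10)
        print(sigma, [round(f, 3) for f in freqs])
```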

We could find the result intuitive: the law of the dilated random variable \( {\sigma X} \) is close, as the factor \( {\sigma} \) gets large, to the Lebesgue measure on \( {\mathbb{R}} \), and the image measure of it via the canonical projection on \( {[0,1]} \) is uniform. What is unclear is the precise role of local regularity and behavior at infinity of the distribution of \( {X} \).

A proof. Recall that \( {F} \) is absolutely continuous: its derivative \( {F'} \) exists almost everywhere, \( {F'\in\mathrm{L}^1(\mathbb{R},\mathrm{d}x)} \), and \( {F(\beta)-F(\alpha)=\int_\alpha^\beta F'(x)\mathrm{d}x} \) for all \( {\alpha\leq\beta} \). Following Kosulajeff, we start by observing that for every interval \( {[a,b]\subset[0,1)} \),

\[ \begin{array}{rcl} \mathbb{P}(\langle\sigma X\rangle\in [a,b]) &=&\sum_{n\in\mathbb{Z}}\mathbb{P}(\sigma X\in[n+a,n+b])\\ &=&\sum_{n\in\mathbb{Z}}\Bigr(F\Bigr(\frac{n+b}{\sigma}\Bigr)-F\Bigr(\frac{n+a}{\sigma}\Bigr)\Bigr)\\ &=&\sum_{n\in\mathbb{Z}}\int_a^b\frac{1}{\sigma}F'\Bigr(\frac{n+x}{\sigma}\Bigr)\mathrm{d}x. \end{array} \]

Now, if \( {[a,b]} \) and \( {[a',b']} \) are sub-intervals of \( {[0,1)} \) of the same length \( {b-a=b'-a'} \),

\[ \int_{a'}^{b'}\frac{1}{\sigma}F'\Bigr(\frac{n+x}{\sigma}\Bigr)\mathrm{d}x =\int_a^{b'-a'+a}\frac{1}{\sigma}F'\Bigr(\frac{n+x-a+a'}{\sigma}\Bigr)\mathrm{d}x =\int_a^{b}\frac{1}{\sigma}F'\Bigr(\frac{n+x-a+a'}{\sigma}\Bigr)\mathrm{d}x, \]

and thus

\[ \begin{array}{rcl} \Bigr| \int_a^b\frac{1}{\sigma}F'\Bigr(\frac{n+x}{\sigma}\Bigr)\mathrm{d}x -\int_{a'}^{b'}\frac{1}{\sigma}F'\Bigr(\frac{n+x}{\sigma}\Bigr)\mathrm{d}x \Bigr| &\leq&\int_a^b\frac{1}{\sigma}\Bigr|F'\Bigr(\frac{n+x}{\sigma}\Bigr)-F'\Bigr(\frac{n+x-a+a'}{\sigma}\Bigr)\Bigr|\mathrm{d}x\\ &=&\int_{\frac{n+a}{\sigma}}^{\frac{n+b}{\sigma}}\Bigr|F'(x)-F'\Bigr(x+\frac{a'-a}{\sigma}\Bigr)\Bigr|\mathrm{d}x. \end{array} \]

Using the fact that the intervals \( {\Bigr[\frac{n+a}{\sigma},\frac{n+b}{\sigma}\Bigr]} \), \( {n\in\mathbb{Z}} \), are pairwise disjoint, we obtain

\[ \begin{array}{rcl} \Bigr|\mathbb{P}(\langle\sigma X\rangle\in [a,b])- \mathbb{P}(\langle\sigma X\rangle\in [a',b'])\Bigr| &\leq& \sum_{n\in\mathbb{Z}}\int_{\frac{n+a}{\sigma}}^{\frac{n+b}{\sigma}}\Bigr|F'(x)-F'\Bigr(x+\frac{a'-a}{\sigma}\Bigr)\Bigr|\mathrm{d}x\\ &\leq&\int_{-\infty}^{+\infty}\Bigr|F'(x)-F'\Bigr(x+\frac{a'-a}{\sigma}\Bigr)\Bigr|\mathrm{d}x\\ &=&\Bigr\|F'-F'\Bigr(\cdot+\frac{a'-a}{\sigma}\Bigr)\Bigr\|_{\mathrm{L}^1(\mathbb{R},\mathrm{d}x)}, \end{array} \]

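which goes to zero as \( {\sigma\rightarrow\infty} \) because translations are continuous in \( {\mathrm{L}^1(\mathbb{R},\mathrm{d}x)} \). Hence sub-intervals of equal length carry asymptotically the same mass. Splitting \( {[0,1)} \) into \( {k} \) intervals of length \( {1/k} \) shows that each of them gets a mass tending to \( {1/k} \), which identifies the limit as the uniform law.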
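Here is a worked example, which is mine and not taken from Kosulajeff, where everything is explicit. Suppose that \( {X} \) follows the exponential law of parameter \( {1} \), so that \( {F(x)=(1-\mathrm{e}^{-x})\mathbf{1}_{x\geq0}} \) is absolutely continuous. Only the terms with \( {n\geq0} \) contribute to the sum over \( {n} \), and a geometric series gives, for every \( {[a,b]\subset[0,1)} \),

\[ \mathbb{P}(\langle\sigma X\rangle\in[a,b]) =\sum_{n\geq0}\Bigr(\mathrm{e}^{-\frac{n+a}{\sigma}}-\mathrm{e}^{-\frac{n+b}{\sigma}}\Bigr) =\frac{\mathrm{e}^{-a/\sigma}-\mathrm{e}^{-b/\sigma}}{1-\mathrm{e}^{-1/\sigma}} \underset{\sigma\rightarrow\infty}{\longrightarrow}b-a, \]

since the numerator is \( {(b-a)/\sigma+O(1/\sigma^2)} \) while the denominator is \( {1/\sigma+O(1/\sigma^2)} \).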

Densities. Still following Kosulajeff, if \( {X} \) has a continuous density \( {f} \) with respect to the Lebesgue measure such that \( {f(x)} \) decreases as \( {|x|} \) increases, for \( {|x|} \) large enough, then the density of \( {\langle\sigma X\rangle} \) tends pointwise to \( {1} \) as \( {\sigma\rightarrow\infty} \).

To see it, we start by observing that the density of \( {\langle\sigma X\rangle} \) is

\[ x\in[0,1]\mapsto\sum_{n\in\mathbb{Z}}\frac{1}{\sigma}f\Bigr(\frac{n+x}{\sigma}\Bigr). \]

Let us fix an arbitrary \( {x\in[0,1]} \) and \( {\varepsilon>0} \). Let \( {A>0} \) be large enough such that

\[ \int_{-\sigma A}^{\sigma A}\frac{1}{\sigma}f\Bigr(\frac{y}{\sigma}\Bigr)\mathrm{d}y =\int_{-A}^Af(y)\mathrm{d}y>1-\varepsilon. \]

Now we observe that

\[ \begin{array}{rcl} \Bigr|\sum_{n\in\mathbb{Z}\cap[-\sigma A,\sigma A]}\frac{1}{\sigma}f\Bigr(\frac{n+x}{\sigma}\Bigr)-\int_{-\sigma A}^{\sigma A}\frac{1}{\sigma}f\Bigr(\frac{y}{\sigma}\Bigr)\mathrm{d}y\Bigr| &\leq&\int_{-\sigma A}^{\sigma A}\frac{1}{\sigma}\Bigr|f\Bigr(\frac{\lfloor y\rfloor+x}{\sigma}\Bigr)-f\Bigr(\frac{y}{\sigma}\Bigr)\Bigr|\mathrm{d}y\\ &=&\int_{-A}^{A}\Bigr|f\Bigr(\frac{\lfloor \sigma z\rfloor+x}{\sigma}\Bigr)-f(z)\Bigr|\mathrm{d}z, \end{array} \]

and this last quantity goes to \( {0} \) as \( {\sigma\rightarrow\infty} \) by the uniform continuity of \( {f} \) on a compact neighborhood of \( {[-A,A]} \), which follows from the continuity of \( {f} \) by Heine's theorem. It follows that, for \( {\sigma} \) large enough,

\[ \Bigr|\sum_{n\in\mathbb{Z}\cap[-\sigma A,\sigma A]}\frac{1}{\sigma}f\Bigr(\frac{n+x}{\sigma}\Bigr)-1\Bigr|\leq2\varepsilon. \]

On the other hand, since \( {f(x)} \) decreases when \( {|x|} \) increases, for large enough \( {|x|} \), we have, for a large enough \( {A>0} \),

\[ \begin{array}{rcl} \sum_{n\in\mathbb{Z}\cap[-\sigma A,\sigma A]^c}\frac{1}{\sigma}f\Bigr(\frac{n+x}{\sigma}\Bigr) &\leq& \sum_{n\in\mathbb{Z}\cap[-\sigma A,\sigma A]^c}\int_{n-1}^n\frac{1}{\sigma}f\Bigr(\frac{y+x}{\sigma}\Bigr)\mathrm{d}y\\ &\leq&\int_{|y|>\sigma A-1}\frac{1}{\sigma}f\Bigr(\frac{y+x}{\sigma}\Bigr)\mathrm{d}y\\ &=&\int_{|z|>A-\frac{1}{\sigma}}f\Bigr(z+\frac{x}{\sigma}\Bigr)\mathrm{d}z \end{array} \]

which is \( {\leq\varepsilon} \) for \( {A} \) and \( {\sigma} \) large enough. Finally we obtain, as expected,

\[ \Bigr|\sum_{n\in\mathbb{Z}}\frac{1}{\sigma}f\Bigr(\frac{n+x}{\sigma}\Bigr)-1\Bigr|\leq3\varepsilon. \]
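The pointwise convergence of the density can be observed numerically. The sketch below is an illustration under my own choices: it takes the Laplace density \( {f(x)=\frac{1}{2}\mathrm{e}^{-|x|}} \), which is continuous and decreasing in \( {|x|} \), and it truncates the sum over \( {n} \), which is an approximation justified here by the exponential tails. The function names and the `tail` parameter are hypothetical.

```python
import math

def laplace_density(x: float) -> float:
    """Density of the standard Laplace law, continuous and decreasing in |x|."""
    return 0.5 * math.exp(-abs(x))

def wrapped_density(f, x: float, sigma: float, tail: float = 50.0) -> float:
    """Truncated version of sum_n f((n + x) / sigma) / sigma.

    Terms with |n| > tail * sigma + 1 are dropped. For the Laplace density
    each dropped term is below exp(-tail) / 2 before the division by sigma.
    """
    n_max = math.ceil(tail * sigma) + 1
    return sum(f((n + x) / sigma) for n in range(-n_max, n_max + 1)) / sigma

if __name__ == "__main__":
    for sigma in (0.5, 5.0, 50.0):
        print(sigma, [wrapped_density(laplace_density, x, sigma) for x in (0.0, 0.25, 0.5)])
```

The printed values get closer to \( {1} \) as \( {\sigma} \) grows, but much more slowly than in the Gaussian case.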

Necessary and sufficient condition via Fourier analysis. Following Tukey, for any real random variable \( {X} \), we have that \( {Y=\langle\sigma X\rangle} \) tends in law as \( {\sigma\rightarrow\infty} \) to the uniform law on \( {[0,1]} \) if and only if the Fourier transform or characteristic function \( {\varphi_X} \) of \( {X} \) tends to zero at infinity.

If the cumulative distribution function (cdf) of \( {X} \) is absolutely continuous then the Riemann-Lebesgue lemma gives that \( {\varphi_X} \) vanishes at infinity, and we recover the result of Kosulajeff. The result of Tukey is however strictly stronger, since there exist distributions that are singular with respect to the Lebesgue measure, so that the cdf is not absolutely continuous, while the Fourier transform still vanishes at infinity.

Let us give the proof of Tukey. For every real random variable \( {Z} \) and all \( {t\in\mathbb{R}} \),

\[ \begin{array}{rcl} \varphi_Z(t) &=&\mathbb{E}(\mathrm{e}^{\mathrm{i}tZ})\\ &=&\sum_{n\in\mathbb{Z}}\mathbb{E}(\mathrm{e}^{\mathrm{i}tZ}\mathbf{1}_{Z\in[n,n+1)})\\ &=&\sum_{n\in\mathbb{Z}}\mathbb{E}(\mathrm{e}^{\mathrm{i}t(n+\langle Z\rangle)}\mathbf{1}_{Z\in[n,n+1)}) \\ &=&\mathbb{E}(\mathrm{e}^{\mathrm{i}t\langle Z\rangle}\sum_{n\in\mathbb{Z}}\mathrm{e}^{\mathrm{i}tn}\mathbf{1}_{Z\in[n,n+1)}). \end{array} \]

But the sum in the right hand side is constant \( {=1} \) if \( {t=2\pi k\in2\pi\mathbb{Z}} \). Hence, with \( {Z=\sigma X} \),

\[ \varphi_{\sigma X}(2\pi k)=\varphi_{\langle\sigma X\rangle}(2\pi k), k\in\mathbb{Z}. \]

Note also that

\[ \varphi_{\sigma X}(2\pi k)=\varphi_X(2\pi k\sigma), k\in\mathbb{Z}. \]

On the other hand, if \( {U} \) is uniformly distributed on \( {[0,1]} \) then for all \( {t\in\mathbb{R}} \),

\[ \varphi_U(t)=\mathbb{E}(\mathrm{e}^{\mathrm{i}tU})=\frac{\mathrm{e}^{\mathrm{i}t}-1}{\mathrm{i}t}\mathbf{1}_{t\neq0}+\mathbf{1}_{t=0} \]

and in particular,

\[ \varphi_U(2\pi k)=\mathbf{1}_{k=0}, k\in\mathbb{Z}. \]

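Now, by a Fourier coefficients version of the Paul Lévy theorem, \( {\langle\sigma X\rangle\rightarrow U} \) in law as \( {\sigma\rightarrow\infty} \) if and only if \( {\varphi_{\langle\sigma X\rangle}\rightarrow\varphi_U} \) as \( {\sigma\rightarrow\infty} \) pointwise on \( {2\pi\mathbb{Z}} \). Since \( {\varphi_{\langle\sigma X\rangle}(2\pi k)=\varphi_X(2\pi k\sigma)} \) and \( {\varphi_U(2\pi k)=\mathbf{1}_{k=0}} \), this holds if and only if \( {\varphi_X(2\pi k\sigma)\rightarrow0} \) as \( {\sigma\rightarrow\infty} \) for every \( {k\neq0} \), in other words if and only if \( {\varphi_X(t)\rightarrow0} \) as \( {|t|\rightarrow\infty} \).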
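The key identity \( {\varphi_{\sigma X}(2\pi k)=\varphi_{\langle\sigma X\rangle}(2\pi k)} \) holds sample by sample, so it can be checked on simulated data. The sketch below is an illustration with my own function names, a Gaussian \( {X} \), and an arbitrary seed. It also evaluates \( {\varphi_X(2\pi k\sigma)=\mathrm{e}^{-2\pi^2k^2\sigma^2}} \), which uses the standard Gaussian characteristic function \( {\varphi_X(t)=\mathrm{e}^{-t^2/2}} \).

```python
import cmath
import math
import random

def empirical_cf(samples: list[float], t: float) -> complex:
    """Empirical characteristic function: average of exp(i t z) over the sample."""
    return sum(cmath.exp(1j * t * z) for z in samples) / len(samples)

def gaussian_coefficient(k: int, sigma: float) -> float:
    """phi_X(2 pi k sigma) for X ~ N(0, 1), using phi_X(t) = exp(-t^2 / 2)."""
    return math.exp(-2.0 * (math.pi * k * sigma) ** 2)

rng = random.Random(0)
sigma = 0.2
dilated = [sigma * rng.gauss(0.0, 1.0) for _ in range(20_000)]
fractional = [z - math.floor(z) for z in dilated]

for k in (1, 2):
    t = 2.0 * math.pi * k
    # The two empirical values agree up to rounding; the last number is what they estimate.
    print(k, empirical_cf(dilated, t), empirical_cf(fractional, t), gaussian_coefficient(k, sigma))
```

For \( {\sigma=1} \) the first coefficient \( {\mathrm{e}^{-2\pi^2}} \) is already of order \( {10^{-9}} \), which explains why \( {\langle X\rangle} \) looks uniform for a standard Gaussian.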

Weyl equipartition theorem. The Weyl equipartition theorem in analytic number theory is a sort of deterministic version. More precisely, a real sequence \( {{(x_n)}_n} \) is uniformly distributed modulo \( {1} \) in the sense of Weyl when for any interval \( {I\subset [0,1]} \) of length \( {|I|} \),

\[ \lim_{n\rightarrow\infty}\frac{\mathrm{Card}\{k\in\{1,\ldots,n\}:\langle x_k\rangle\in I\}}{n}=|I|. \]

Now the famous equipartition or equidistribution theorem proved by Hermann Weyl in 1916 states that this is equivalent to saying that for all \( {m\in\mathbb{Z}} \), \( {m\neq0} \),

\[ \lim_{n\rightarrow\infty}\frac{1}{n}\sum_{k=1}^n\mathrm{e}^{2\pi\mathrm{i}mx_k}=0. \]

A proof may consist in approximating the indicator of \( {I} \) by trigonometric polynomials.
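Both sides of the Weyl criterion are easy to compute on an example. The sketch below uses the sequence \( {x_k=k\sqrt{2}} \), a classical example that I choose for illustration. The function names are hypothetical. The exponential sums are geometric sums, so their averages are of order \( {1/n} \).

```python
import cmath
import math

def weyl_sum(xs: list[float], m: int) -> complex:
    """Average of exp(2 pi i m x_k) over the finite sequence xs."""
    return sum(cmath.exp(2j * math.pi * m * x) for x in xs) / len(xs)

def interval_frequency(xs: list[float], a: float, b: float) -> float:
    """Fraction of the terms whose fractional part lies in [a, b)."""
    hits = sum(1 for x in xs if a <= x - math.floor(x) < b)
    return hits / len(xs)

n = 10_000
xs = [k * math.sqrt(2.0) for k in range(1, n + 1)]  # x_k = k * sqrt(2)

for m in (1, 2, 3):
    print(m, abs(weyl_sum(xs, m)))        # small averages of exponentials
print(interval_frequency(xs, 0.2, 0.5))   # close to the interval length 0.3
```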

Jacobi theta function. In the special case \( {X\sim\mathcal{N}(0,1)} \) the density of \( {\langle\sigma X\rangle} \) at point \( {x\in[0,1]} \) is given by

\[ \frac{1}{\sqrt{2\pi}\sigma} \sum_{n\in\mathbb{Z}}\mathrm{e}^{-\frac{(n+x)^2}{2\sigma^2}} =\frac{\mathrm{e}^{-\frac{x^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma} \sum_{n\in\mathbb{Z}}\mathrm{e}^{-\frac{n^2}{2\sigma^2}+\frac{nx}{\sigma^2}} =\frac{\mathrm{e}^{-\frac{x^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma} \vartheta(-\mathrm{i}x/(2\pi\sigma^2),\mathrm{i}/(2\pi\sigma^2)) \]

where

\[ \vartheta(z,\tau) =\sum_{n\in\mathbb{Z}} \mathrm{e}^{\pi\mathrm{i}n^2\tau+2\pi\mathrm{i}nz} =\sum_{n\in\mathbb{Z}}(\mathrm{e}^{\pi\mathrm{i}\tau})^{n^2}\cos(2\pi nz) \]

is the Jacobi theta function, a special function that plays a role in analytic number theory. Note that the Greek letter \( {\vartheta} \) is a variant of \( {\theta} \), obtained in LaTeX with the command \vartheta.
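The density of \( {\langle\sigma X\rangle} \) in the Gaussian case can be evaluated in two ways: as the sum of Gaussian bumps above, or as a Fourier series with coefficients \( {\varphi_X(2\pi k\sigma)=\mathrm{e}^{-2\pi^2k^2\sigma^2}} \), namely \( {1+2\sum_{k\geq1}\mathrm{e}^{-2\pi^2k^2\sigma^2}\cos(2\pi kx)} \). The sketch below compares the two. The function names and truncation levels are my own choices, and the truncations are only reasonable for moderate values of \( {\sigma} \), say between \( {0.1} \) and \( {5} \).

```python
import math

def density_direct(x: float, sigma: float, n_max: int = 50) -> float:
    """Density of <sigma X> at x as a truncated sum of Gaussian bumps."""
    total = sum(math.exp(-((n + x) ** 2) / (2.0 * sigma**2)) for n in range(-n_max, n_max + 1))
    return total / (math.sqrt(2.0 * math.pi) * sigma)

def density_fourier(x: float, sigma: float, k_max: int = 50) -> float:
    """Same density as a truncated Fourier series with coefficients phi_X(2 pi k sigma)."""
    return 1.0 + 2.0 * sum(
        math.exp(-2.0 * (math.pi * k * sigma) ** 2) * math.cos(2.0 * math.pi * k * x)
        for k in range(1, k_max + 1)
    )

for sigma in (0.3, 1.0):
    print(sigma, density_direct(0.25, sigma), density_fourier(0.25, sigma))
```

The first sum converges fast when \( {\sigma} \) is small and the second when \( {\sigma} \) is large, which is the practical content of the modular transformation of the theta function.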

This is the path followed by Wintner, in relation with a rather sketchy Fourier analysis. Actually the main subject of the 1933 paper by Wintner was to show that the Gaussian law is the unique stable distribution with finite variance. His analysis led him at the end of his short paper to consider another problem: the uniformization of the fractional part of a Gaussian with a variance blowing up! Wintner was fond of probability theory and number theory.

Central Limit Theorem on Compact Lie Groups. More generally, if \( {X_1,X_2,\ldots} \) are independent and identically distributed real random variables of mean \( {m} \) and variance \( {\sigma^2} \), then, by the central limit phenomenon, \( {X_1+\cdots+X_n} \) will be close to \( {\mathcal{N}(nm,n\sigma^2)} \). Also, from the Wintner-Kosulajeff-Tukey phenomenon, for any \( {a>0} \), \( {(X_1+\cdots+X_n)\mod a=\Bigr(\sum_{i=1}^n(X_i\mod a)\Bigr)\mod a} \) will probably converge in law as \( {n\rightarrow\infty} \) towards the uniform law on \( {[0,a]} \). The case \( {a=2\pi} \) corresponds to the unit circle. This suggests a central limit theorem for triangular arrays on compact Lie groups, a notion of Gaussian on Lie groups via Fourier analysis, which gives the Haar measure in the compact case. A dilation factor is probably not necessary in the compact case.

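Rounding. The rounding of \( {x} \) at scale \( {\sigma^{-1}} \) takes the form \( {\sigma^{-1}\lfloor\sigma x\rfloor} \), and the rounding error is \( {x-\sigma^{-1}\lfloor\sigma x\rfloor=\sigma^{-1}\langle\sigma x\rangle} \). The Wintner-Kosulajeff-Tukey phenomenon states that, for a random input \( {X} \), the rescaled rounding error \( {\langle\sigma X\rangle} \) becomes uniform on \( {[0,1]} \) as \( {\sigma\rightarrow\infty} \).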

Personal. I encountered this phenomenon fifteen years ago while working on a statistical problem with my colleague Didier Concordet. At that time, I was not aware of the literature, so I had to rediscover everything, in particular the role of absolute continuity. Fun! Recently, by accident, I encountered this striking phenomenon again in relation with a research project for master students about a Monte Carlo algorithm for stochastic billiards. I discussed the problem a little bit with my colleague Marc Hoffmann, who also knew this phenomenon from the community of the general theory of stochastic processes and mathematical finance.

Bibliography.

It seems that P. Kosulajeff, also written P. A. Kozulyaev, or Petr Alekseevich Kozulyaev, was a former doctoral student of Andrey Kolmogorov, who saved him from Stalin's Great Purge. This information comes from Laurent Mazliak (via my friend Arnaud Guyader), who reads Russian and knows the history of probability theory.


Publications

News from the front of French mathematical publications:


About the Hellinger distance

Ernst David Hellinger (1883 – 1950)

This tiny post is devoted to the Hellinger distance and affinity.

Hellinger. Let \( {\mu} \) and \( {\nu} \) be probability measures with respective densities \( {f} \) and \( {g} \) with respect to the Lebesgue measure \( {\lambda} \) on \( {\mathbb{R}^d} \). Their Hellinger distance is

\[ \mathrm{H}(\mu,\nu) ={\Vert\sqrt{f}-\sqrt{g}\Vert}_{\mathrm{L}^2(\lambda)} =\Bigr(\int(\sqrt{f}-\sqrt{g})^2\mathrm{d}\lambda\Bigr)^{1/2}. \]

This is well defined since \( {\sqrt{f}} \) and \( {\sqrt{g}} \) belong to \( {\mathrm{L}^2(\lambda)} \). The Hellinger affinity is

\[ \mathrm{A}(\mu,\nu) =\int\sqrt{fg}\mathrm{d}\lambda, \quad \mathrm{H}(\mu,\nu)^2 =2-2\mathrm{A}(\mu,\nu). \]

This gives \( {\mathrm{H}(\mu,\nu)^2\in[0,2]} \), \( {\mathrm{A}(\mu,\nu)\in[0,1]} \), and the tensor product formula

\[ \mathrm{H}(\mu^{\otimes n},\nu^{\otimes n})^2 =2-2\mathrm{A}(\mu^{\otimes n},\nu^{\otimes n}) =2-2\mathrm{A}(\mu,\nu)^n =2-2\left(1-\frac{\mathrm{H}(\mu,\nu)^2}{2}\right)^n. \]

Note that \( {\mathrm{H}(\mu,\nu)^2=2} \) iff \( {\mu} \) and \( {\nu} \) have disjoint supports.

Note that if \( {\mu\neq\nu} \) then \( {\mathrm{A}(\mu,\nu)<1} \), hence \( {\lim_{n\rightarrow\infty}\mathrm{H}(\mu^{\otimes n},\nu^{\otimes n})^2=2} \), a high dimensional phenomenon.

The notions of Hellinger distance and affinity pass to discrete distributions by replacing the Lebesgue measure \( {\lambda} \) by the counting measure. The Hellinger distance above is a special case of the \( {\mathrm{L}^p} \) version \( {\Vert f^{1/p}-g^{1/p}\Vert_{\mathrm{L}^p(\lambda)}} \) available for arbitrary \( {p\geq1} \). This is useful in asymptotic statistics, and we refer to the textbooks listed below.
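For discrete distributions everything is a finite sum, which makes the identity \( {\mathrm{H}^2=2-2\mathrm{A}} \) and the tensor product formula easy to check by brute force. The sketch below is an illustration: the function names and the two probability vectors are arbitrary choices of mine.

```python
import math
from itertools import product

def affinity(p: list[float], q: list[float]) -> float:
    """Hellinger affinity A = sum_i sqrt(p_i q_i) of two discrete laws."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def hellinger_sq(p: list[float], q: list[float]) -> float:
    """Squared Hellinger distance H^2 = sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def tensor_power(p: list[float], n: int) -> list[float]:
    """Law of n independent copies, as a flat list over the product space."""
    return [math.prod(w) for w in product(p, repeat=n)]

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
print(hellinger_sq(p, q), 2.0 - 2.0 * affinity(p, q))   # two ways to get H^2
for n in (1, 2, 5):
    pn, qn = tensor_power(p, n), tensor_power(q, n)
    print(n, affinity(pn, qn), affinity(p, q) ** n)     # tensor product formula
```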

Relation to total variation distance. The Hellinger distance is equivalent topologically and close metrically to the total variation distance, in the sense that

\[ \mathrm{H}^2(\mu,\nu) \leq\left\Vert\mu-\nu\right\Vert_{\mathrm{TV}} \leq\mathrm{H}(\mu,\nu)\sqrt{4-\mathrm{H}(\mu,\nu)^2} \leq2\mathrm{H}(\mu,\nu) \]

where

\[ \left\Vert\mu-\nu\right\Vert_{\mathrm{TV}} =2\sup_A|\mu(A)-\nu(A)| =\int|f-g|\mathrm{d}\lambda. \]

Indeed, the first inequality comes from the following elementary observation

\[ (\sqrt{a}-\sqrt{b})^2 =a+b-2\sqrt{ab} \leq a+b-2(a\wedge b) =|a-b|, \]

valid for all \( {a,b\geq0} \), while the second inequality comes from

\[ |a-b|=|\sqrt{a}^2-\sqrt{b}^2|=|\sqrt{a}-\sqrt{b}|(\sqrt{a}+\sqrt{b}) \]

which gives, thanks to the Cauchy-Schwarz inequality,

\[ \int|f-g|\mathrm{d}\lambda \leq\mathrm{H}(\mu,\nu)\sqrt{\int(\sqrt{f}+\sqrt{g})^2\mathrm{d}\lambda} =\mathrm{H}(\mu,\nu)\sqrt{2+2A(\mu,\nu)}. \]
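The chain of inequalities can be checked on discrete laws, where the role of \( {\int|f-g|\mathrm{d}\lambda} \) is played by \( {\sum_i|p_i-q_i|} \). The sketch below is only an illustration with hypothetical function names and arbitrary probability vectors. The second example has disjoint supports and saturates the first two inequalities.

```python
import math

def hellinger(p: list[float], q: list[float]) -> float:
    """Hellinger distance between two discrete laws (counting measure)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def l1_distance(p: list[float], q: list[float]) -> float:
    """The quantity sum_i |p_i - q_i|, discrete analogue of the integral of |f - g|."""
    return sum(abs(a - b) for a, b in zip(p, q))

def sandwich(p: list[float], q: list[float]) -> tuple[float, float, float, float]:
    """The four terms of the chain H^2 <= L1 <= H sqrt(4 - H^2) <= 2 H."""
    h = hellinger(p, q)
    return h**2, l1_distance(p, q), h * math.sqrt(4.0 - h**2), 2.0 * h

print(sandwich([0.5, 0.3, 0.2], [0.2, 0.2, 0.6]))
print(sandwich([1.0, 0.0], [0.0, 1.0]))   # disjoint supports
```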

Gaussian explicit formula. The Hellinger distance (or affinity) between two Gaussian distributions can be computed explicitly, just like the square Wasserstein distance and the Kullback-Leibler divergence or relative entropy. Namely

\[ \mathrm{A}(\mathcal{N}(m_1,\sigma_1^2),\mathcal{N}(m_2,\sigma_2^2)) =\sqrt{2\frac{\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}} \exp\Bigr(-\frac{(m_1-m_2)^2}{4(\sigma_1^2+\sigma_2^2)}\Bigr), \]

equal to \( {1} \) iff \( {(m_1,\sigma_1)=(m_2,\sigma_2)} \). By using the tensor product formula, we have also

\[ \mathrm{A}(\mathcal{N}(m_1,\sigma_1^2)^{\otimes n},\mathcal{N}(m_2,\sigma_2^2)^{\otimes n}) =\Bigr(2\frac{\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}\Bigr)^{n/2} \exp\Bigr(-n\frac{(m_1-m_2)^2}{4(\sigma_1^2+\sigma_2^2)}\Bigr). \]
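The one-dimensional closed formula can be compared with a direct numerical integration of \( {\sqrt{fg}} \). The sketch below is an illustration: the function names, the midpoint rule, the integration window of twelve standard deviations, and the test parameters are all my own choices.

```python
import math

def gaussian_pdf(x: float, m: float, s: float) -> float:
    """Density of N(m, s^2) at x."""
    return math.exp(-((x - m) ** 2) / (2.0 * s**2)) / (s * math.sqrt(2.0 * math.pi))

def affinity_closed_form(m1: float, s1: float, m2: float, s2: float) -> float:
    """Closed-form Hellinger affinity between N(m1, s1^2) and N(m2, s2^2)."""
    v = s1**2 + s2**2
    return math.sqrt(2.0 * s1 * s2 / v) * math.exp(-((m1 - m2) ** 2) / (4.0 * v))

def affinity_numeric(m1: float, s1: float, m2: float, s2: float, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of sqrt(f g) on a wide window."""
    lo = min(m1, m2) - 12.0 * max(s1, s2)
    hi = max(m1, m2) + 12.0 * max(s1, s2)
    h = (hi - lo) / steps
    return h * sum(
        math.sqrt(gaussian_pdf(lo + (i + 0.5) * h, m1, s1) * gaussian_pdf(lo + (i + 0.5) * h, m2, s2))
        for i in range(steps)
    )

print(affinity_closed_form(0.0, 1.0, 1.0, 2.0), affinity_numeric(0.0, 1.0, 1.0, 2.0))
print(affinity_closed_form(0.0, 1.0, 1.0, 2.0) ** 3)   # affinity of the 3-fold tensor powers
```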

Here is a general “matrix” formula for Gaussians on \( {\mathbb{R}^d} \), \( {d\geq1} \), with \( {\Delta m=m_2-m_1} \),

\[ \mathrm{A}(\mathcal{N}(m_1,\Sigma_1),\mathcal{N}(m_2,\Sigma_2)) =\frac{\det(\Sigma_1\Sigma_2)^{1/4}}{\det(\frac{\Sigma_1+\Sigma_2}{2})^{1/2}} \exp\Bigr(-\frac{\langle\Delta m,(\Sigma_1+\Sigma_2)^{-1}\Delta m\rangle}{4}\Bigr). \]

The Hellinger affinity is also known as the Bhattacharyya coefficient, and enters the definition of the Bhattacharyya distance \( {(\mu,\nu)\mapsto-\log\mathrm{A}(\mu,\nu)} \).

Application to long time behavior of Ornstein-Uhlenbeck. Let \( {{(B_t)}_{t\geq0}} \) be an \( {n} \)-dimensional standard Brownian motion and let \( {{(X^x_t)}_{t\geq0}} \) be the Ornstein-Uhlenbeck process solution of the stochastic differential equation

\[ X^x_0=x,\quad \mathrm{d}X^x_t=\sqrt{2}\mathrm{d}B_t-X^x_t\mathrm{d}t \]

where \( {x\in\mathbb{R}^n} \). By plugging this equation into the identity \( {\mathrm{d}(\mathrm{e}^tX^x_t)=\mathrm{e}^t\mathrm{d}X^x_t+\mathrm{e}^tX^x_t\mathrm{d}t} \) we get the Mehler formula (the variance comes from the Wiener integral)

\[ X^x_t=x\mathrm{e}^{-t}+\sqrt{2}\int_0^t\mathrm{e}^{s-t}\mathrm{d}B_s \sim \mathcal{N}(x\mathrm{e}^{-t},(1-\mathrm{e}^{-2t})I_n) \underset{t\rightarrow\infty}{\longrightarrow} \mathcal{N}(0,I_n). \]

It follows in particular, from the matrix formula with \( {\Sigma_1=\Sigma_2=(1-\mathrm{e}^{-2t})I_n} \), that for all \( {x,y\in\mathbb{R}^n} \) and \( {t>0} \)

\[ \frac{1}{2}\mathrm{H}^2(\mathrm{Law}(X^x_t),\mathrm{Law}(X^y_t)) =1-\exp\Bigr(-\frac{|x-y|^2\mathrm{e}^{-2t}}{8(1-\mathrm{e}^{-2t})}\Bigr). \]

Moreover, denoting \( {\mu_t=\mathrm{Law}(X^x_t)} \) and \( {\mu_\infty=\mathcal{N}(0,I_n)} \), it follows that

\[ \mathrm{H}(\mu_t,\mu_\infty)^2 =2-2\Bigr(2\frac{\sqrt{1-\mathrm{e}^{-2t}}}{2-\mathrm{e}^{-2t}}\Bigr)^{n/2} \exp\Bigr(-\frac{|x|^2\mathrm{e}^{-2t}}{4(2-\mathrm{e}^{-2t})}\Bigr). \]

This quantity tends to \( {0} \) as \( {t\rightarrow\infty} \). If \( {|x|^2=x_1^2+\cdots+x_n^2\sim cn} \) then, when \( {n} \) is large, this happens near the critical value \( {t=\frac{1}{2}\log(n)} \), for which \( {\mathrm{e}^{-2t}=1/n} \). More information about cutoff phenomena for Ornstein-Uhlenbeck and diffusions is available in the papers below.
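The abrupt transition near \( {t=\frac{1}{2}\log(n)} \) is visible numerically. The sketch below evaluates \( {\mathrm{H}(\mu_t,\mu_\infty)^2} \) from the matrix formula for Gaussians with \( {\Sigma_1=(1-\mathrm{e}^{-2t})I_n} \) and \( {\Sigma_2=I_n} \), for which the determinant factor is \( {\bigl(2\sqrt{1-\mathrm{e}^{-2t}}/(2-\mathrm{e}^{-2t})\bigr)^{n/2}} \). The function name, the choice \( {|x|^2=n} \) (that is \( {c=1} \)), and the value of \( {n} \) are assumptions of mine for illustration.

```python
import math

def hellinger_sq_to_equilibrium(t: float, n: int, x_norm_sq: float) -> float:
    """H^2 between N(x e^{-t}, (1 - e^{-2t}) I_n) and N(0, I_n), for t > 0.

    Uses the matrix formula for the affinity of two Gaussians, with the
    determinant factor evaluated in logarithmic scale for numerical accuracy.
    """
    s = math.exp(-2.0 * t)
    # log of det(S1 S2)^{1/4} / det((S1 + S2) / 2)^{1/2} with S1 = (1 - s) I_n, S2 = I_n
    log_det_factor = 0.5 * n * (0.5 * math.log1p(-s) - math.log1p(-0.5 * s))
    # <x e^{-t}, (S1 + S2)^{-1} x e^{-t}> / 4 with S1 + S2 = (2 - s) I_n
    mean_term = x_norm_sq * s / (4.0 * (2.0 - s))
    return 2.0 - 2.0 * math.exp(log_det_factor - mean_term)

n = 10**6
t_c = 0.5 * math.log(n)   # critical time from the text
for shift in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(shift, hellinger_sq_to_equilibrium(t_c + shift, n, float(n)))
```

The printed values drop from nearly \( {2} \) to nearly \( {0} \) within a window of a few units of time around the critical value.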

Further reading
