
This post is about a model of statistical physics consisting of a probability measure on ${\mathbb{R}^n}$ modeling ${n}$ one-dimensional unit charge particles subject to Coulomb pair repulsion and to attraction by a background of opposite charge. It is a one-dimensional Coulomb gas (not a log-gas!) confined by the potential generated by a charged background, a special case of the jellium model of Eugene Paul Wigner (1938). In the case of a uniform background, it is related to a conditioned Gaussian distribution. It was already observed by Rodney James Baxter (1963) that this model is exactly solvable. This exact solvability can be seen as the one-dimensional analogue of the exact solvability discovered by Eric Kostlan (1992) in the case of the two-dimensional Coulomb gas describing the spectrum of Ginibre random matrices.

One-dimensional Wigner jellium or Coulomb gas. The electrostatic potential at point ${x\in\mathbb{R}}$ generated by a one-dimensional unit charge located at the origin is given by ${g(x)=-|x|}$. By the principle of superposition, the electrostatic potential generated at point ${x\in\mathbb{R}}$ by a distribution of charges ${\mu}$ on ${\mathbb{R}}$ is given by

$U_{\mu}(x)=(g*\mu)(x)=-\int|x-y|\mu(\mathrm{d}y)$

and the electric field by ${E_\mu=-U_{\mu}'=-g'*\mu=\mathrm{sign}*\mu}$. The derivative of ${g}$ in the sense of distributions is the step function ${g'=\mathbf{1}_{(-\infty,0)}-\mathbf{1}_{(0,+\infty)}}$, which is an element of ${\mathrm{L}^\infty}$ defined almost everywhere, while the second derivative of ${g}$ in the sense of Schwartz distributions is a Dirac mass at zero ${g''=-2\delta_0}$. In particular ${g}$ is, up to a factor, the fundamental solution of the Poisson equation, and we can recover ${\mu}$ from its potential by

$U”_{\mu}=g”*\mu=-2\delta_0*\mu=-2\mu.$

The self-interaction energy of the distribution of charges ${\mu}$ is

$\mathcal{E}(\mu) =\frac{1}{2}\iint g(x-y)\mu(\mathrm{d}x)\mu(\mathrm{d}y) =\frac{1}{2}\int U_\mu\mathrm{d}\mu.$

Let us consider now ${n\geq1}$ one-dimensional unit charges at positions ${x_1,\ldots,x_n}$, lying in a positive background of total charge ${\alpha>0}$ smeared according to a probability measure ${\rho}$ on ${\mathbb{R}}$ with finite Coulomb energy ${\mathcal{E}(\rho)}$. The total potential energy of the system is

$H_n(x_1,\ldots,x_n) = -\sum_{i<j} |x_i-x_j| -\alpha\sum_{i=1}^nU_{\rho}(x_i)$

up to the additive constant ${\alpha^2\mathcal{E}(\rho)}$. The system is charge neutral when ${\alpha=n}$. Following Wigner (1938), let us now define the Boltzmann-Gibbs probability measure ${P_n}$ over all possible configurations at inverse temperature ${\beta>0}$ by

$\mathrm{d}P_n(x_1,\ldots,x_n) =\frac{\mathrm{e}^{-\beta H_n(x_1,\ldots,x_n)}}{Z_n} \mathrm{d}x_1\cdots\mathrm{d}x_n$

where

$Z_n=\int_{\mathbb{R}^n}\mathrm{e}^{-\beta H_n(x_1,\ldots,x_n)} \mathrm{d}x_1\cdots\mathrm{d}x_n.$

It can be checked that ${Z_n<\infty}$ if and only if ${\alpha>n-1}$. Note that ${P_n}$ is a one-dimensional Coulomb gas with external field associated with the potential ${V=-\frac{\alpha}{n}U_\rho}$.

Baxter exact solvability for uniform backgrounds. The model is exactly solvable. Indeed, following Baxter (1963), we have the combinatorial identity

$-\sum_{i < j} |x_i - x_j| =\sum_{i<j}(x_{(j)}-x_{(i)}) =\sum_{k=1}^n (2k-n-1) x_{(k)},$

where ${x_{(n)}\leq\cdots\leq x_{(1)}}$ is the reordering of ${x_1,\ldots,x_n}$; in particular,

$x_{(n)}=\min_{1\leq i\leq n}x_i \quad\text{and}\quad x_{(1)}=\max_{1\leq i\leq n}x_i,$

which allows us to rewrite the potential energy as

$H_n(x_1,\dots,x_n) = \sum_{k=1}^n \Bigl((2k-n-1)x_{(k)}-\alpha U_\rho(x_{(k)})\Bigr).$
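Baxter's identity is easy to check numerically. Here is a minimal Python sketch (not part of the original derivation; the sample points are arbitrary) comparing the pairwise sum with its ordered reformulation:

```python
import random

# Numerical check of Baxter's identity:
# -sum_{i<j} |x_i - x_j| = sum_k (2k - n - 1) x_(k),
# with the convention x_(n) <= ... <= x_(1), i.e. x_(1) is the maximum.
random.seed(0)
n = 7
x = [random.uniform(-3.0, 3.0) for _ in range(n)]

lhs = -sum(abs(x[i] - x[j]) for i in range(n) for j in range(i + 1, n))

xs = sorted(x, reverse=True)  # xs[k-1] = x_(k), decreasing order
rhs = sum((2 * k - n - 1) * xs[k - 1] for k in range(1, n + 1))

gap = abs(lhs - rhs)  # should vanish up to rounding
```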

We assume now that ${\rho}$ is the uniform law on an interval ${[a,b]}$. Then, for all ${x\in\mathbb{R}}$,

$-U_\rho(x) =\frac{1}{b-a}\int_a^b|x-y|\mathrm{d}y =\begin{cases} \displaystyle\left|x-\frac{a+b}{2}\right| &\mbox{if }x\not\in[a,b]\\ \displaystyle\frac{\left(x-\frac{a+b}{2}\right)^2+\frac{(b-a)^2}{4}}{b-a} &\mbox{if }x\in[a,b] \end{cases}.$
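This closed form can be checked against numerical quadrature. A small Python sketch (illustrative only; the midpoint rule and grid size are arbitrary choices):

```python
# Quadrature check of the closed form for -U_rho with rho uniform on [a, b].
a, b = -1.0, 2.0
c = (a + b) / 2

def minus_U(x):
    # Closed form from the post: affine outside [a, b], quadratic inside.
    if x < a or x > b:
        return abs(x - c)
    return ((x - c) ** 2 + (b - a) ** 2 / 4) / (b - a)

def minus_U_num(x, m=100000):
    # Midpoint-rule approximation of (1/(b-a)) * int_a^b |x - y| dy.
    h = (b - a) / m
    return sum(abs(x - (a + (j + 0.5) * h)) for j in range(m)) * h / (b - a)

max_err = max(abs(minus_U(x) - minus_U_num(x)) for x in (-2.5, 0.3, 1.9, 4.0))
```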

The potential ${V=-\frac{\alpha}{n}U_\rho}$ then behaves quadratically on ${[a,b]}$ and is affine outside ${[a,b]}$. Conditioned on all the particles lying inside ${[a,b]}$, it is possible to interpret ${P_n}$ as a conditioned Gaussian law. Indeed, using Baxter’s identity, if ${\{x_1,\ldots,x_n\}\subset[a,b]}$ then

$H_n(x_1,\ldots,x_n) = \sum_{k=1}^n(2k-n-1)x_{(k)} +\frac{\alpha}{b-a}\sum_{i=1}^n\Bigr(x_{(i)}-\frac{a+b}{2}\Bigr)^2+\frac{n\alpha(b-a)}{4}.$

This formula then shows that ${X_n\sim P_n}$ is conditionally Gaussian in the sense that

$\mathrm{Law}\Bigr((X_{(n)},\ldots,X_{(1)})\bigm\vert \{X_1,\ldots,X_n\}\subset[a,b]\Bigr)$

$\qquad =\mathrm{Law}\Bigr((Y_n,\ldots,Y_1)\bigm\vert a\leq Y_n\leq\cdots\leq Y_1\leq b\Bigr)$

where ${Y_1,\ldots,Y_n}$ are independent real Gaussian random variables with

$\mathbb{E}Y_k=\frac{a+b}{2}+\frac{b-a}{2\alpha}\left(n+1-2k \right) \quad\text{and}\quad \mathbb{E}((Y_k-\mathbb{E}Y_k)^2)=\frac{b-a}{2\alpha\beta}.$
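The means and variances come from completing the square in the per-particle energy. The following sketch (illustrative; the values of ${a,b,\alpha,n}$ are arbitrary) checks that each per-particle term is a shifted quadratic:

```python
# Completing-the-square check behind the conditional Gaussian representation:
# (2k-n-1)x + (alpha/(b-a))(x-c)^2 equals, up to a constant in x,
# (alpha/(b-a))(x - m_k)^2 with m_k = c + ((b-a)/(2 alpha))(n+1-2k).
a, b, alpha, n = -1.0, 3.0, 5.0, 4
c = (a + b) / 2

def diff(k, x):
    m_k = c + (b - a) / (2 * alpha) * (n + 1 - 2 * k)
    energy = (2 * k - n - 1) * x + alpha / (b - a) * (x - c) ** 2
    return energy - alpha / (b - a) * (x - m_k) ** 2

# For each k the difference must not depend on x, so exponentiating the
# energy gives a Gaussian density with mean m_k and variance (b-a)/(2 alpha beta).
pts = (-2.0, 0.5, 4.0)
spread = max(max(diff(k, x) for x in pts) - min(diff(k, x) for x in pts)
             for k in range(1, n + 1))
```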

This was already observed by Baxter. Now if we consider the limit ${a\rightarrow-\infty, b\rightarrow \infty}$ with ${\alpha/(b-a) \rightarrow c > 0}$, then ${P_n}$ can be interpreted as a Coulomb gas for which the potential is quadratic everywhere, namely ${V=\frac{c}{2n}\left|\cdot\right|^2}$. This can also be seen as a jellium with a background equal to a multiple of Lebesgue measure on the whole of ${\mathbb{R}}$. Under the scaling ${x_i=\sqrt{n}y_i}$, this limiting case matches the model studied by Abhishek Dhar, Anupam Kundu, Satya N. Majumdar, Sanjib Sabhapandit, and Grégory Schehr (2018). This Coulomb gas model with quadratic external field in one dimension is analogous to the complex Ginibre ensemble which is a Coulomb gas in two dimensions.

Scale invariance. The model ${P_n}$ has a scale invariance which comes from the homogeneity of the one-dimensional Coulomb kernel ${g}$. Indeed, if ${\mathrm{dil}_\sigma(\mu)}$ denotes the law of the random vector ${\sigma X}$ when ${X\sim\mu}$, then, for all ${\sigma>0}$, dropping the ${n}$ subscript,

$\mathrm{dil}_\sigma(P^{\alpha,\beta,\rho}) =P^{\alpha,\frac{\beta}{\sigma},\mathrm{dil}_\sigma(\rho)}.$

In other words, if ${X_n\sim P^{\alpha,\beta,\rho}}$ then

$\sigma X_n\sim P^{\alpha,\frac{\beta}{\sigma},\mathrm{dil}_\sigma(\rho)}.$

This property is useful in the asymptotic analysis of the model as ${n\rightarrow\infty}$, and reveals the special role played by ${\alpha}$ as a shape parameter. Here the inverse temperature ${\beta}$ is a scale parameter, in contrast with the situation for log-gases.
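The mechanism behind this invariance is the 1-homogeneity of ${g}$, which gives ${U_{\mathrm{dil}_\sigma(\rho)}(\sigma x)=\sigma U_\rho(x)}$. A quick Python sketch (illustrative; it reuses the uniform-background closed form, and the numerical values are arbitrary):

```python
# Homogeneity check behind the scale invariance: for rho uniform on [a, b],
# dil_sigma(rho) is uniform on [sigma*a, sigma*b] and
# U_{dil_sigma(rho)}(sigma*x) = sigma * U_rho(x), since g is 1-homogeneous.
def minus_U(x, a, b):
    c = (a + b) / 2
    if x < a or x > b:
        return abs(x - c)
    return ((x - c) ** 2 + (b - a) ** 2 / 4) / (b - a)

a, b, sigma = -1.0, 2.0, 3.5
max_err = max(abs(minus_U(sigma * x, sigma * a, sigma * b)
                  - sigma * minus_U(x, a, b))
              for x in (-2.0, 0.25, 1.5, 3.0))
```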

Asymptotic analysis. An asymptotic analysis of ${P_n}$ as ${n\rightarrow\infty}$ is conducted in a joint work arXiv:2012.04633 with David García-Zelada and Paul Jung, for general backgrounds. This is a continuation of our previous work devoted to two-dimensional jelliums. We study one-dimensional Wigner jelliums, not necessarily charge neutral, for which the unit charges are allowed to exist beyond the support of the background. The model can be seen as a one-dimensional Coulomb gas (not a log-gas!) in which the external field is generated by a smeared background on an interval. We first observe that the system exists if and only if the total background charge is greater than the number of unit charges minus one. Moreover we obtain a Rényi-type probabilistic representation for the order statistics of the particle system beyond the support of the background. Furthermore, for various backgrounds, we show convergence to point processes at the edge of the support of the background. In particular, this provides an asymptotic analysis of the fluctuations of the right-most particle. Our analysis reveals that these fluctuations are not universal, in the sense that, depending on the background, the tails range anywhere from exponential to Gaussian-like behavior, including for instance Tracy-Widom-like behavior.

One Dimensional Models. Excerpt from Baxter’s book on exactly solved models (1982):
One-dimensional models can be solved if they have finite-range, decaying exponential, or Coulomb interactions. As guides to critical phenomena, such models with short-range two-particle forces (including exponentially decaying forces) have a serious disadvantage: they do not have a phase transition at a non-zero temperature (van Hove, 1950; Lieb and Mattis, 1966). The Coulomb systems also do not have a phase transition (Lenard, 1961; Baxter, 1963, 1964 and 1965), though the one-dimensional electron gas has long-range order at all temperatures (Kunz, 1974).
Of the one-dimensional models, only the nearest-neighbour Ising model (Ising, 1925; Kramers and Wannier, 1941) will be considered in this book. It provides a simple introduction to the transfer matrix technique that will be used for the more difficult two-dimensional models. Although it does not have a phase transition for non-zero temperature, the correlation length does become infinite at H = T = 0, so in a sense this is a ‘critical point’ and the scaling hypothesis can be tested near it.
A one-dimensional system can have a phase transition if the interactions involve infinitely many particles, as in the cluster interaction model (Fisher and Felderhof, 1970; Fisher, 1972). It can also have a phase transition if the interactions become infinitely long-ranged, but then the system really belongs to the following class of ‘infinite-dimensional’ models.
“.

My term as vice-president for digital affairs at Université Paris-Dauphine came to an end this month. During these four busy years, 2017-2020, I worked with two successive digital directors, four successive directors general of services, two of them interim, a president and then a provisional administrator, and a wide variety of political and administrative officials. To paraphrase Nietzsche, what does not kill you makes you stronger! All things considered, it takes time, perhaps a year or two, to begin to understand in depth the ins and outs of an organization such as a university.

"Le numérique", called "digital" in English, is the name given in recent years in the French-speaking world to information technology in the broad sense, ranging from the technologies themselves to the humanities and social sciences, and covering administration, teaching, and research alike. Being in charge of digital affairs encourages you to keep, as much as possible, your feet on the ground and your head in the stars: a strategic vision together with a pragmatic handling of concrete matters. I rather enjoyed this difficult balancing act. The state of digital affairs at Paris-Dauphine in 2020 bears little resemblance to that of 2016, but much remains to be done.

These years as vice-president really allowed me to get to know the university, and those who populate it, better. Needless to say, one often sweats under the costume (*) in this great social theater. Among the myriad of experiences, I still remember with a smile meetings with union representatives during which I was clearly taken for a member of the employers' side with evil intentions. I also remember a session of the board of directors during which, after my presentation on the state of digital affairs, an elected student representative began his remarks with "We millennials...".

(*) A costume I never actually wore, which did not necessarily help my case.

Dauphine is a small university, with a limited number of teaching departments and research laboratories, and a particularly compact campus. Even so, people know each other rather little; everyone lives in their own microcosm. One must therefore constantly encourage everyone to take into account the realities and constraints of others, in order to give meaning to collective choices. Loud ingratitude nevertheless remains fairly widespread. Everybody complains, students, administrative staff, teachers, researchers, generally about others, sometimes always about the same ones. Digital technology can be as much a source of suffering as a scapegoat used to mask mediocrity. But there is also a social hierarchy: students and administrative staff endure the most, while faculty members are free, for better and for worse. Yet there are wonderful people in every category. Each category has its own social hierarchy, its weighty history, and its difficulties, and it is sometimes among the others that one finds a kindred spirit.

In 2016, a great many Dauphine users from all communities were exasperated by the lack of availability, reliability, and security of the solution then used for email (Partage). Just as many deplored the lack of interoperability between the IT tools. For all these reasons, moving to a bundle of cloud-based digital services appeared to be the best thing to do. Two solutions were available on the market: Microsoft O365 and Google G Suite. Given that the administration and a good share of the students and faculty used Microsoft Office as standard, Microsoft O365 appeared to be the most appropriate choice, to avoid a difficult, even impossible, change management. It has not always been easy to defend this choice, and we sometimes doubted this bold move. In retrospect, the experience currently unfolding at Airbus, which chose G Suite, leaves no doubt on the matter. Not only did we make the right choice in the end, but we were also lucky, because Microsoft Teams later appeared within O365 and is a major element of the digital transformation of the organization, begun well before the Covid-19 crisis. It is true that Microsoft has a more negative image than Apple or Google in certain communities, inherited from a bygone era. Today's Microsoft has bet on the cloud, rivals the other GAFAM in dynamism, contributes to the Linux kernel, has acquired GitHub, and so on. The vision of Satya Nadella, its CEO for the past decade or so, has a lot to do with it. That being said, Microsoft is no more virtuous than the other GAFAM. The crushing hegemony of the United States and Asia over the digital industry is a problem, for hardware, software, and networks as well as for services, with or without the cloud.
It is legitimate to regret Europe's lack of economic vision in this area, but turning our back on modernity at Dauphine will not solve this problem of European policy. By providing a professional, high-quality bundle of digital services, and by protecting data contractually, Dauphine curbs the massive proliferation of third-party online services that are free only in appearance and make users pay with their private data and metadata. This scourge, with its whiff of paradox, is ravaging many universities and research institutions in France.

Office 365 weighs relatively little on the budget given what it provides. But unlike most other French universities, Dauphine develops or adapts software solutions for its specific needs: paperless applications and adjunct-teacher files, a research database, customer relationship management, ... All this tailor-made software is expensive, and the gap between the current state and the ideal remains very large. In digital matters, Dauphine suffers from having the ambition of the private sector, the constraints of the public sector, and a typically academic disorder, starting with that of the faculty. Here as elsewhere, the main lever for the digital transformation of the institution is not the quality of the WiFi network or of the video projectors, but rather the digital skills of the organization's staff, and first and foremost those of its managers and leaders.

It is sometimes useful to think of the digital domain as a bundle of symmetries: digital for administration and administration of digital, digital for teaching and teaching of digital, digital for research and research on digital. In digital matters, the 2017-2020 term largely consisted in introducing a bit more method, rigor, and quality. Regarding the digital transformation of the organization, it will always remain true that digitizing disorder produces digital disorder. Digital affairs were the subject of the first regulation of the term, notably through the creation of a digital master plan. The transformation of the information systems directorate into a digital directorate came with a new vision, more oriented toward services, usage, and on-site digital support. In parallel, the creation of the cross-cutting program Dauphine numérique made it possible to strengthen the digital component of teaching and research, in line with the PRAIRIE institute of PSL, and to develop relations with companies around the sciences of organizations and of digital technology. Two "cross-cutting" professorships in data science were created and filled under the Dauphine numérique program, which is far from negligible at the scale of Dauphine.

The new term is one of continuity with, and consolidation of, the one now ending, notably with a stronger coherence with PSL, the new campus project, and a more determined deployment of digital technology in degree programs.

A few related posts:

This tiny post is about the ubiquity of Gaussian distributions. Here are three reasons.

Geometry. A random vector of dimension two or more has independent components and is rotationally invariant if and only if its components are Gaussian, centered, and with the same variance.

In other words, for all $n\geq2$, a probability measure on $\mathbb{R}^n$ is at the same time a product measure and spherically symmetric if and only if it is a Gaussian law $\mathcal{N}(0,\sigma^2I_n)$ for some $\sigma\geq0$.

Note that this does not work for $n=1$. In a sense it is a purely multivariate phenomenon. It is remarkable that the statement does not involve any condition on moments or integrability.

This observation goes back at least to James Clerk Maxwell (1831 – 1879) in physics and kinetic theory. Such a characterization by means of invariance under a group of transformations is typical of geometry, or in a sense of Felix Klein's algebraization of geometry. A matrix version of this phenomenon provides a characterization of Gaussian ensembles of random matrices by means of unitary invariance and independence of the entries. More can be found in a previous post.

Optimization. Among the probability measures with fixed variance and finite entropy, the maximum entropy is reached by Gaussian distributions. In other words, if we denote by $\mathcal{S}(f)=-\int f(x)\log(f(x))\mathrm{d}x$ the entropy of a density $f:\mathbb{R}^n\to\mathbb{R}_+$, and if

$\int V(x)f(x)\mathrm{d}x=\int V(x)f_\beta(x)\mathrm{d}x \quad\mbox{with}\quad f_\beta(x)=\frac{\mathrm{e}^{-\beta V(x)}}{Z_\beta}\mbox{ and }V(x)=|x|^2,$

then the Jensen inequality for the strictly convex function $u\geq0\mapsto u\log(u)$ gives

$\mathcal{S}(f_\beta) - \mathcal{S}(f) =\int\frac{f}{f_\beta}\log\frac{f}{f_\beta}f_\beta\mathrm{d}x \geq\left(\int\frac{f}{f_\beta}f_\beta\mathrm{d}x\right)\log\left(\int\frac{f}{f_\beta}f_\beta\mathrm{d}x\right)=0,$

with equality if and only if $f=f_\beta$. This works beyond Gaussians, for non-quadratic $V$'s.

This observation goes back at least to Ludwig Boltzmann (1844 – 1906) in kinetic theory. Nowadays it is an essential fact in statistical physics, statistical mechanics, and Bayesian statistics. More can be found in a previous post; see also this one.
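The maximum entropy property can be illustrated with closed-form entropies. A minimal Python sketch (the common variance is an arbitrary choice) compares the Gaussian with a uniform law of the same variance:

```python
import math

# Among densities with the same variance s2, the Gaussian entropy
# (1/2) log(2 pi e s2) exceeds the entropy log(2L) of the uniform law
# on [-L, L] with L = sqrt(3 s2) (so that L^2/3 = s2).
s2 = 1.7
h_gauss = 0.5 * math.log(2 * math.pi * math.e * s2)
h_unif = math.log(2 * math.sqrt(3 * s2))
gap = h_gauss - h_unif  # = (1/2) log(pi e / 6) > 0, independent of s2
```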

Probability. The univariate Gaussian distribution is the unique stable distribution with finite variance. In other words, if $X$ is a real random variable with finite variance such that every linear combination of independent copies $X_1,\ldots,X_n$ of $X$ is distributed as a dilation of $X$, namely

$$\mathrm{Law}(a_1X_1+\cdots+a_nX_n)=\mathrm{Law}\Bigr(\sqrt{a_1^2+\cdots+a_n^2}X\Bigr),$$

then $X$ is Gaussian. As a consequence of this stability by convolution, if we sum up $n$ independent copies of a square integrable random variable and scale the sum by $\sqrt{n}$ to fix the variance, then the limit as $n\to\infty$ can only be Gaussian. This is the central limit phenomenon. This observation goes back, in a simple combinatorial form, at least to Abraham de Moivre (1667 – 1754) via

$\binom{n}{k}p^k(1-p)^{n-k} \simeq \frac{1}{\sqrt{2 \pi np(1-p)}}\mathrm{e}^{-\frac{(k-np)^2}{2np(1-p)}}.$

A more general version states that if $X_1,\ldots,X_n$ are independent centered random vectors of $\mathbb{R}^d$ with the same covariance matrix $\Sigma$, for instance the increments of a random walk, then

$$\mathrm{Law}\Bigr(\frac{X_1+\cdots+X_n}{\sqrt{n}}\Bigr)\underset{n\to\infty}{\longrightarrow}\mathcal{N}(0,\Sigma).$$

The phenomenon holds as soon as the variance of the sum is well spread among the elements of the sum, in order to keep stability by convolution. More can be found in a previous post.
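The de Moivre approximation can be checked directly. A Python sketch (illustrative; $n$, $p$, and the window around the mean are arbitrary choices):

```python
import math

# de Moivre: binomial(n, p) pmf at k versus the Gaussian density with the
# same mean n*p and variance n*p*(1-p).
n, p = 200, 0.3

def binom_pmf(k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def gauss_density(k):
    v = n * p * (1 - p)
    return math.exp(-(k - n * p) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

# Relative error near the mean n*p = 60, within about 1.5 standard deviations.
rel_err = max(abs(binom_pmf(k) - gauss_density(k)) / binom_pmf(k)
              for k in range(50, 71))
```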

Note that the discrete time is of the order of the square of the space variable. A time-space scaling leads, from random walks, via the central limit phenomenon, to Brownian motion and the heat equation$$\partial_t p(t,x)=\Delta_x p(t,x)\quad\mbox{where}\quad p(t,x)=(4\pi t)^{-\frac{d}{2}}\mathrm{e}^{-\frac{|x|^2}{4t}}.$$ The Laplacian differential operator appears in this scaling limit via a second order Taylor expansion as the infinitesimal effect of the increments, and has the invariances of the Gaussian.

Actually the Gaussian density is invariant under the Fourier transform in the sense that $$(2\pi)^{-\frac{d}{2}}\int_{\mathbb{R}^d}\mathrm{e}^{-\frac{|x|^2}{2}}\mathrm{e}^{\mathrm{i}\langle \theta,x\rangle}\mathrm{d}x=\mathrm{e}^{-\frac{|\theta|^2}{2}},$$ and this allows one to solve the heat equation, as observed by Joseph Fourier (1768 — 1830). The Fourier transform is also useful for the analysis of the stability by convolution, the central limit phenomenon, and Brownian motion, as observed by Paul Lévy (1886 — 1971).

There are multiple links between all these aspects. For instance the Boltzmann observation is related to the Boltzmann equation, to the Maxwell observation, and to the heat equation.

This post is devoted to a sub-Gaussian tail bound and exponential square integrability for local martingales, taken from my master course on stochastic calculus.

Sub-Gaussian tail bound and exponential square integrability for local martingales. Let ${M={(M_t)}_{t\geq0}}$ be a continuous local martingale started at the origin. Then for all ${t,K,r\geq0}$,

$\mathbb{P}\Bigr(\sup_{s\in[0,t]}|M_s|\geq r, \langle M\rangle_t\leq K\Bigr) \leq2\mathrm{e}^{-\frac{r^2}{2K}},$

and in particular, if ${\langle M\rangle_t\leq Ct}$ then

$\mathbb{P}\Bigr(\sup_{s\in[0,t]}|M_s|\geq r\Bigr) \leq2\mathrm{e}^{-\frac{r^2}{2Ct}}$

and, for all ${\alpha<\frac{1}{2Ct}}$,

$\mathbb{E}\Bigr(\mathrm{e}^{\alpha\sup_{s\in[0,t]}|M_s|^2}\Bigr)<\infty.$

The condition ${\langle M\rangle_t\leq Ct}$ is a comparison to Brownian motion for which equality holds.

Proof. For all ${\lambda,t\geq0}$, the Doléans-Dade exponential

$X^\lambda ={\Bigr(\mathrm{e}^{\lambda M_t-\frac{\lambda^2}{2}\langle M\rangle_t}\Bigr)}_{t\geq0}$

is a positive super-martingale with ${X^\lambda_0=1}$ and ${\mathbb{E}X^\lambda_t\leq1}$ for all ${t\geq0}$. For all ${t,\lambda,r,K\geq0}$, by using the maximal inequality for the super-martingale ${X^\lambda}$ in the last step,

$\begin{array}{rcl} \mathbb{P}\Bigr(\langle M\rangle_t\leq K,\sup_{0\leq s\leq t}M_s\geq r\Bigr) &\leq&\mathbb{P}\Bigr(\langle M\rangle_t\leq K,\sup_{0\leq s\leq t}X^\lambda_s\geq\mathrm{e}^{\lambda r-\frac{\lambda^2}{2}K}\Bigr) \\ &\leq&\mathbb{P}\Bigr(\sup_{0\leq s\leq t}X^\lambda_s\geq\mathrm{e}^{\lambda r-\frac{\lambda^2}{2}K}\Bigr)\\ &\leq&\mathbb{E}(X^\lambda_0)\mathrm{e}^{-\lambda r+\frac{\lambda^2}{2}K} =\mathrm{e}^{-\lambda r+\frac{\lambda^2}{2}K}. \end{array}$

Taking ${\lambda=r/K}$ gives

$\mathbb{P}\Bigr(\langle M\rangle_t\leq K,\sup_{0\leq s\leq t}M_s\geq r\Bigr) \leq\mathrm{e}^{-\frac{r^2}{2K}}.$

The same reasoning for ${-M}$ instead of ${M}$ provides (note that ${\langle -M\rangle=\langle M\rangle}$ obviously)

$\mathbb{P}\Bigr(\langle M\rangle_t\leq K,\sup_{0\leq s\leq t}(-M_s)\geq r\Bigr) \leq\mathrm{e}^{-\frac{r^2}{2K}}.$

The union bound (hence the prefactor ${2}$) gives then the first desired inequality. The exponential square integrability comes from the usual link between tail bound and integrability, namely if ${X=\sup_{s\in[0,t]}|M_s|}$, ${U(x)=\mathrm{e}^{\alpha x^2}}$, ${\alpha<\frac{1}{2Ct}}$, then, by Fubini-Tonelli,

$\begin{array}{rcl} \mathbb{E}(U(X)) &=&\mathbb{E}\Bigr(\int_0^XU'(x)\mathrm{d}x\Bigr)\\ &=&\mathbb{E}\Bigr(\int_0^\infty\mathbf{1}_{x\leq X}U'(x)\mathrm{d}x\Bigr)\\ &=&\int_0^\infty U'(x)\mathbb{P}(X\geq x)\mathrm{d}x\\ &\leq&\int_0^\infty4\alpha x\mathrm{e}^{\alpha x^2}\mathrm{e}^{-\frac{x^2}{2Ct}}\mathrm{d}x <\infty. \end{array}$
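The tail bound can be illustrated by Monte Carlo simulation on Brownian motion itself, for which ${\langle M\rangle_t=t}$, so that ${C=1}$. A Python sketch (the discretization step, sample size, and threshold are arbitrary choices of this illustration):

```python
import math
import random

# Monte Carlo illustration of the sub-Gaussian tail bound for Brownian motion
# on [0, t]: P(sup_{s<=t} |B_s| >= r) <= 2 exp(-r^2 / (2 t)).
random.seed(42)
t, r, n_steps, n_paths = 1.0, 2.0, 300, 2000
dt = t / n_steps
count = 0
for _ in range(n_paths):
    b, sup_abs = 0.0, 0.0
    for _ in range(n_steps):
        b += random.gauss(0.0, math.sqrt(dt))  # Brownian increment
        sup_abs = max(sup_abs, abs(b))
    if sup_abs >= r:
        count += 1
empirical = count / n_paths
bound = 2 * math.exp(-r ** 2 / (2 * t))  # about 0.27
```

The empirical frequency is well below the bound, as expected: the inequality is far from sharp for Brownian motion at this threshold.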

Doob maximal inequality for super-martingales. If ${M}$ is a continuous super-martingale, then for all ${t\geq0}$ and ${\lambda>0}$, denoting ${M^-=\max(0,-M)}$,

$\mathbb{P}\Bigr(\max_{s\in[0,t]}|M_s|\geq\lambda\Bigr) \leq\frac{\mathbb{E}(M_0)+2\mathbb{E}(M^-_t)}{\lambda}.$

In particular when ${M}$ is non-negative then ${\mathbb{E}(M^-_t)=0}$ and the upper bound is ${\frac{\mathbb{E}(M_0)}{\lambda}}$.

Proof. Let us define the bounded stopping time

$T=t\wedge \inf\{s\in[0,t]:M_s\geq \lambda\}.$

We have ${M_T\in\mathrm{L}^1}$ since ${|M_T|\leq\max(|M_0|,|M_t|,\lambda)}$. By the Doob stopping theorem for the sub-martingale ${-M}$ and the bounded stopping times ${0}$ and ${T}$ that satisfy ${M_0\in\mathrm{L}^1}$ and ${M_T\in\mathrm{L}^1}$, we get

$\mathbb{E}(M_0) \geq\mathbb{E}(M_T) \geq \lambda\mathbb{P}(\max_{s\in[0,t]}M_s\geq \lambda) +\mathbb{E}(M_t\mathbf{1}_{\max_{s\in[0,t]}M_s<\lambda})$

hence, recalling that ${M^-=\max(-M,0)}$,

$\lambda\mathbb{P}(\max_{s\in[0,t]}M_s\geq \lambda) \leq \mathbb{E}(M_0)+\mathbb{E}(M^-_t).$

This produces the desired inequality when ${M}$ is non-negative. For the general case, we observe that the Jensen inequality for the nondecreasing convex function ${u\in\mathbb{R}\mapsto\max(u,0)}$ and the sub-martingale ${-M}$ shows that ${M^-}$ is a sub-martingale. Thus, by the Doob maximal inequality for non-negative sub-martingales,

$\lambda\mathbb{P}(\max_{s\in[0,t]}M^-_s\geq \lambda) \leq\mathbb{E}(M^-_t).$

Finally, putting both inequalities together gives

$\lambda\mathbb{P}(\max_{s\in[0,t]}|M_s|\geq \lambda) \leq \lambda\mathbb{P}(\max_{s\in[0,t]}M_s\geq \lambda) +\lambda\mathbb{P}(\max_{s\in[0,t]}M^-_s\geq \lambda) \leq\mathbb{E}(M_0)+2\mathbb{E}(M^-_t).$

Doob maximal inequalities. Let ${M}$ be a continuous process.

1. If ${M}$ is a martingale or a non-negative sub-martingale then for all ${p\geq1}$, ${t\geq0}$, ${\lambda>0}$,

$\mathbb{P}\Bigr(\max_{s\in[0,t]}|M_s|\geq\lambda\Bigr) \leq\frac{\mathbb{E}(|M_t|^p)}{\lambda^p}.$

2. If ${M}$ is a martingale then for all ${p>1}$ and ${t\geq0}$,

$\mathbb{E}\Bigr(\max_{s\in[0,t]}|M_s|^p\Bigr) \leq\Bigr(\frac{p}{p-1}\Bigr)^p\mathbb{E}(|M_t|^p)$

in other words

$\Bigr\|\max_{s\in[0,t]}|M_s|\Bigr\|_p\leq\frac{p}{p-1}\|M_t\|_p.$

In particular if ${M_t\in\mathrm{L}^p}$ then ${M^*_t=\max_{s\in[0,t]}|M_s|\in\mathrm{L}^p}$.

Comments. This inequality makes it possible to control the tail of the supremum by the moment at the terminal time. It is a continuous-time martingale version of the simpler Kolmogorov maximal inequality for sums of independent random variables. Note that ${q=1/(1-1/p)=p/(p-1)}$ is the Hölder conjugate of ${p}$, namely ${1/p+1/q=1}$. The inequality is often used with ${p=2}$, for which ${(p/(p-1))^p=4}$.

Proof. We can always assume that the right hand side is finite, otherwise the inequalities are trivial.

1. If ${M}$ is a martingale, then by the Jensen inequality for the convex function ${u\in\mathbb{R}\mapsto |u|^p}$, the process ${|M|^p}$ is a sub-martingale. Similarly, if ${M}$ is a non-negative sub-martingale then, since ${u\in[0,+\infty)\mapsto u^p}$ is convex and non-decreasing, it follows that ${M^p=|M|^p}$ is a sub-martingale. Therefore in all cases ${{(|M_s|^p)}_{s\in[0,t]}}$ is a sub-martingale. Let us define the bounded stopping time

$T=t\wedge \inf\{s\geq0:|M_s|\geq\lambda\}.$

Note that ${|M_T|\leq\max(|M_0|,\lambda)}$ and thus ${M_T\in\mathrm{L}^1}$. The Doob stopping theorem for the sub-martingale ${|M|^p}$ and the bounded stopping times ${T}$ and ${t}$ that satisfy ${T\leq t}$ gives

$\mathbb{E}(|M_T|^p)\leq\mathbb{E}(|M_t|^p).$

On the other hand the definition of ${T}$ gives

$|M_T|^p \geq\lambda^p\mathbf{1}_{\max_{s\in[0,t]}|M_s|\geq\lambda} +|M_t|^p\mathbf{1}_{\max_{s\in[0,t]}|M_s|<\lambda} \geq\lambda^p\mathbf{1}_{\max_{s\in[0,t]}|M_s|\geq\lambda}.$

It remains to combine these inequalities to get the desired result.

2. If we introduce for all ${n\geq1}$ the “localization” stopping time

$T_n=t\wedge\inf\{s\geq0:|M_s|\geq n\},$

then the desired inequality for the bounded sub-martingale ${{(|M_{s\wedge T_n}|)}_{s\in[0,t]}}$ would give

$\mathbb{E}(\max_{s\in[0,t]}|M_{s\wedge T_n}|^p) \leq\left(\frac{p}{p-1}\right)^p\mathbb{E}(|M_t|^p),$

and the desired result for ${{(M_s)}_{s\in[0,t]}}$ would then follow by the monotone convergence theorem. Thus this shows that we can assume without loss of generality that ${{(|M_s|)}_{s\in[0,t]}}$ is bounded, in particular that ${\mathbb{E}(\max_{s\in[0,t]}|M_s|^p)<\infty}$. This is a martingale localization argument! The previous proof gives

$\mathbb{P}(\max_{s\in[0,t]}|M_s|\geq\lambda) \leq\frac{\mathbb{E}(|M_t|\mathbf{1}_{\max_{s\in[0,t]}|M_s|\geq\lambda})}{\lambda}$

for all ${\lambda>0}$, and thus

$\int_0^\infty\lambda^{p-1} \mathbb{P}(\max_{s\in[0,t]}|M_s|\geq\lambda)\mathrm{d}\lambda \leq \int_0^\infty\lambda^{p-2} \mathbb{E}(|M_t|\mathbf{1}_{\max_{s\in[0,t]}|M_s|\geq\lambda})\mathrm{d}\lambda.$

Now the Fubini-Tonelli theorem gives

$\int_0^\infty\lambda^{p-1}\mathbb{P}(\max_{s\in[0,t]}|M_s|\geq\lambda)\mathrm{d}\lambda =\mathbb{E}\int_0^{\max_{s\in[0,t]}|M_s|}\lambda^{p-1}\mathrm{d}\lambda =\frac{1}{p}\mathbb{E}(\max_{s\in[0,t]}|M_s|^p),$

and similarly (here we need ${p>1}$)

$\int_0^\infty\lambda^{p-2}\mathbb{E}(|M_t|\mathbf{1}_{\max_{s\in[0,t]}|M_s|\geq\lambda})\mathrm{d}\lambda =\frac{1}{p-1}\mathbb{E}(|M_t|\max_{s\in[0,t]}|M_s|^{p-1}).$

Combining all this gives

$\mathbb{E}(\max_{s\in[0,t]}|M_s|^p) \leq\frac{p}{p-1} \mathbb{E}(|M_t|\max_{s\in[0,t]}|M_s|^{p-1}).$

But since the Hölder inequality gives

$\mathbb{E}(|M_t|\max_{s\in[0,t]}|M_s|^{p-1}) \leq\mathbb{E}(|M_t|^p)^{1/p}\mathbb{E}(\max_{s\in[0,t]}|M_s|^p)^{\frac{p-1}{p}},$

we obtain

$\mathbb{E}(\max_{s\in[0,t]}|M_s|^p) \leq\frac{p}{p-1}\mathbb{E}(|M_t|^p)^{1/p}\mathbb{E}(\max_{s\in[0,t]}|M_s|^p)^{\frac{p-1}{p}}.$

Consequently, since ${\mathbb{E}(\max_{s\in[0,t]}|M_s|^p)<\infty}$, we obtain the desired inequality.
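The inequality with ${p=2}$ is easy to probe by Monte Carlo on the simplest martingale. A Python sketch (illustrative; the horizon and the sample size are arbitrary choices):

```python
import random

# Monte Carlo illustration of the Doob L^2 inequality for the simple random
# walk martingale S_k: E(max_{k<=n} S_k^2) <= 4 E(S_n^2) = 4 n, since the
# constant is (p/(p-1))^p = 4 for p = 2 and E(S_n^2) = n.
random.seed(1)
n, n_paths = 100, 5000
acc = 0.0
for _ in range(n_paths):
    s, max_sq = 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))
        max_sq = max(max_sq, s * s)
    acc += max_sq
mean_max_sq = acc / n_paths  # Doob guarantees at most 4 * n = 400
```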

Doob stopping theorem for sub-martingales. If ${M}$ is a continuous sub-martingale and ${S}$ and ${T}$ are bounded stopping times such that ${S\leq T,\quad M_S\in\mathrm{L}^1,\quad M_T\in\mathrm{L}^1}$, then

$\mathbb{E}(M_S)\leq\mathbb{E}(M_T).$

Proof. We proceed as in the proof of the Doob stopping theorem for martingales, by assuming first that ${S}$ and ${T}$ take their values in the finite set ${\{t_1,\ldots,t_n\}}$ where ${t_1<\cdots<t_n}$. In this case ${M_T}$ and ${M_S}$ are in ${\mathrm{L}^1}$ automatically. The inequality ${S\leq T}$ gives ${\mathbf{1}_{S\geq t}\leq\mathbf{1}_{T\geq t}}$ for all ${t}$. Using this fact and the sub-martingale property of ${M}$, we get

$\begin{array}{rcl} \mathbb{E}(M_S) &=&\mathbb{E}(M_0) +\mathbb{E}\Big(\sum_{k=1}^n\overbrace{\mathbb{E}(M_{t_k}-M_{t_{k-1}}\mid\mathcal{F}_{t_{k-1}})}^{\geq0}\mathbf{1}_{S\geq t_k}\Big)\\ &\leq&\mathbb{E}(M_0) +\mathbb{E}\Big(\sum_{k=1}^n\mathbb{E}(M_{t_k}-M_{t_{k-1}}\mid\mathcal{F}_{t_{k-1}})\mathbf{1}_{T\geq t_k}\Big)\\ &=&\mathbb{E}(M_T). \end{array}$

More generally, when ${S}$ and ${T}$ are arbitrary bounded stopping times satisfying ${S\leq T}$, we proceed by approximation as in the proof of the Doob stopping theorem for martingales, using again the sub-martingale nature of ${M}$ to get uniform integrability.

Doob stopping theorem for martingales. If ${M}$ is a continuous martingale and ${T:\Omega\rightarrow[0,+\infty]}$ is a stopping time then ${{(M_{t\wedge T})}_{t\geq0}}$ is a martingale: for all ${t\geq0}$ and ${s\in[0,t]}$, we have

$M_{t\wedge T}\in\mathrm{L}^1 \quad\text{and}\quad \mathbb{E}(M_{t\wedge T}\mid\mathcal{F}_s)=M_{s\wedge T}.$

Moreover, if ${T}$ is bounded, or if ${T}$ is almost surely finite and ${{(M_{t\wedge T})}_{t\geq0}}$ is uniformly integrable (for instance dominated by an integrable random variable), then

$M_T\in\mathrm{L}^1 \quad\text{and}\quad \mathbb{E}(M_T)=\mathbb{E}(M_0).$

Comments. The most important part of the theorem is that ${{(M_{t\wedge T})}_{t\geq0}}$ is a martingale. We always have ${\lim_{t\rightarrow\infty}M_{T\wedge t}\mathbf{1}_{T<\infty}=M_T\mathbf{1}_{T<\infty}}$ almost surely. When ${T<\infty}$ almost surely, we can use what we know about ${M}$ and ${T}$ to deduce, by monotone or dominated convergence, that this convergence also holds in ${\mathrm{L}^1}$, giving ${\mathbb{E}(M_T)=\mathbb{E}(\lim_{t\rightarrow\infty}M_{t\wedge T})=\lim_{t\rightarrow\infty}\mathbb{E}(M_{t\wedge T})=\mathbb{E}(M_0)}$.

The theorem states that this is automatically the case when ${T}$ is bounded or when ${M^T}$ is uniformly integrable. Furthermore, if ${M^T}$ is uniformly integrable then it can be shown that ${M_\infty}$ exists, giving a sense to ${M_T}$ even on ${\{T=\infty\}}$, and then ${\mathbb{E}(M_T)=\mathbb{E}(M_0)}$.
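As a numerical sketch of the bounded case, not part of the statement, take for ${M}$ a simple symmetric random walk (a discrete-time analogue of a continuous martingale) and the bounded stopping time ${T=\tau\wedge C}$, where ${\tau}$ is the first exit time of ${(-a,a)}$; all parameters below are arbitrary.

```python
import numpy as np

# Optional stopping check: M = simple symmetric random walk, E(M_0) = 0,
# T = min(tau, C) with tau the first exit time of (-a, a). Since T is
# bounded, the Doob stopping theorem gives E(M_T) = E(M_0) = 0.
rng = np.random.default_rng(1)
a, C, n_paths = 5, 100, 50000
steps = rng.choice([-1, 1], size=(n_paths, C))
M = np.cumsum(steps, axis=1)
exited = np.abs(M) >= a                    # has the walk left (-a, a) yet?
T = np.where(exited.any(axis=1), exited.argmax(axis=1), C - 1)
M_T = M[np.arange(n_paths), T]             # the stopped value M_T, path by path
print(np.mean(M_T))                        # close to 0, up to Monte Carlo error
```

Here ${|M_T|\leq a}$, so the Monte Carlo error on ${\mathbb{E}(M_T)}$ is of order ${a/\sqrt{n}}$ for ${n}$ paths.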

Proof. Let us assume first that ${T}$ takes a finite number of values ${t_1<\cdots<t_n}$. Let us show that ${M_T\in\mathrm{L}^1}$ and ${\mathbb{E}(M_T)=\mathbb{E}(M_0)}$. We have ${M_T=\sum_{k=1}^nM_{t_k}\mathbf{1}_{T=t_k}\in\mathrm{L}^1}$, and moreover, using

$\{T\geq t_k\}=(\cup_{i=1}^{k-1}\{T=t_i\})^c\in\mathcal{F}_{t_{k-1}},$

and the martingale property ${\mathbb{E}(M_{t_k}-M_{t_{k-1}}\mid\mathcal{F}_{t_{k-1}})=0}$, for all ${k}$, we get

$\mathbb{E}(M_T) =\mathbb{E}(M_0) +\mathbb{E}\Big(\sum_{k=1}^n\mathbb{E}(M_{t_k}-M_{t_{k-1}}\mid\mathcal{F}_{t_{k-1}})\mathbf{1}_{T\geq t_k}\Big) =\mathbb{E}(M_0).$

Suppose now that ${T}$ takes an infinite number of values but is bounded by some constant ${C}$. For all ${n\geq1}$, we approximate ${T}$ by the piecewise constant random variable (a discretization of ${[0,C]}$)

$T_n=C\mathbf{1}_{T=C}+\sum_{k=1}^{n}t_{k}\mathbf{1}_{t_{k-1}\leq T<t_{k}} \quad\text{where}\quad t_k=t_{n,k}=C\frac{k}{n}.$

This is a stopping time: ${T_n}$ takes its values in the finite set ${\{t_1,\ldots,t_n\}}$, so that ${\{T_n\leq t\}=\cup_{k:t_k\leq t}\{T_n=t_k\}}$ for all ${t\geq0}$, and it suffices to check that ${\{T_n=t_k\}\in\mathcal{F}_{t_k}}$ for all ${1\leq k\leq n}$. Now ${\{T_n=t_n\}=\{T=C\}\cup\{t_{n-1}\leq T<C\}=\{T<t_{n-1}\}^c\in\mathcal{F}_{t_{n-1}}}$, while

$\{T_n=t_k\} =\{T<t_{k-1}\}^c\cap\{T<t_{k}\}\in\mathcal{F}_{t_{k}}$

if ${1\leq k\leq n-1}$, where we used the fact that for all ${t\geq0}$,

$\{T=t\}=\{T\leq t\}\cap\{T<t\}^c=\{T\leq t\}\cap\bigcap_{r=1}^\infty\{T>t-1/r\}\in\mathcal{F}_t.$

Since ${T_n}$ takes a finite number of values, the previous step gives ${\mathbb{E}(M_{T_n})=\mathbb{E}(M_0)}$. On the other hand, almost surely, ${T_n\rightarrow T}$ as ${n\rightarrow\infty}$. Since ${M}$ is continuous, it follows that almost surely ${M_{T_n}\rightarrow M_T}$ as ${n\rightarrow\infty}$. Let us show now that ${{(M_{T_n})}_{n\geq1}}$ is uniformly integrable. Since for all ${n\geq0}$, ${T_n}$ takes its values in a finite set ${t_1<\cdots<t_{m_n}\leq C}$, the martingale property and the Jensen inequality give, for all ${R>0}$,

$\begin{array}{rcl} \mathbb{E}(|M_C|\mathbf{1}_{|M_{T_n}|\geq R}) &=&\sum_k\mathbb{E}(|M_C|\mathbf{1}_{|M_{t_k}|\geq R,T_n=t_k})\\ &=&\sum_k\mathbb{E}(\mathbb{E}(|M_C|\mid\mathcal{F}_{t_k})\mathbf{1}_{|M_{t_k}|\geq R,T_n=t_k})\\ &\geq&\sum_k\mathbb{E}(|\mathbb{E}(M_C\mid\mathcal{F}_{t_k})|\mathbf{1}_{|M_{t_k}|\geq R,T_n=t_k})\\ &=&\sum_k\mathbb{E}(|M_{t_k}|\mathbf{1}_{|M_{t_k}|\geq R,T_n=t_k})\\ &=&\mathbb{E}(|M_{T_n}|\mathbf{1}_{|M_{T_n}|\geq R}). \end{array}$

Now ${M}$ is continuous and thus almost surely bounded on ${[0,C]}$, and ${M_C\in\mathrm{L}^1}$; hence, by dominated convergence,

$\sup_n\mathbb{E}(|M_{T_n}|\mathbf{1}_{|M_{T_n}|\geq R}) \leq\mathbb{E}(|M_C|\mathbf{1}_{\sup_{s\in[0,C]}|M_s|\geq R}) \underset{R\rightarrow\infty}{\longrightarrow}0.$

Therefore ${{(M_{T_n})}_{n\geq0}}$ is uniformly integrable. As a consequence

$\overset{\mathrm{a.s.}}{\lim_{n\rightarrow\infty}}M_{T_n}=M_T\in\mathrm{L}^1 \quad\text{and}\quad \mathbb{E}(M_T)=\lim_{n\rightarrow\infty}\mathbb{E}(M_{T_n})=\mathbb{E}(M_0).$
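The discretization ${T_n}$ used above can be sketched numerically: it rounds ${T}$ up to the grid ${\{Ck/n:1\leq k\leq n\}}$, so that ${T\leq T_n\leq T+C/n}$, which gives ${T_n\rightarrow T}$. The helper below is hypothetical, for illustration only.

```python
import numpy as np

# Discretization T_n of a bounded stopping time T in [0, C]:
# T_n = t_k on {t_{k-1} <= T < t_k} with t_k = C k / n, and T_n = C on {T = C}.
# Then T <= T_n <= T + C/n, hence T_n -> T as n -> infinity.
def discretize(T, C, n):
    grid = C * np.arange(n + 1) / n                # t_0 = 0, ..., t_n = C
    k = np.searchsorted(grid, T, side="right")     # first index with t_k > T
    return np.where(T == C, C, grid[np.minimum(k, n)])

C = 1.0
T = np.array([0.0, 0.31, 0.5, 0.99, 1.0])          # sample values of T
for n in (4, 16, 64):
    Tn = discretize(T, C, n)
    assert np.all((T <= Tn) & (Tn <= T + C / n))   # T_n rounds up, within C/n
```

The measurability argument of the proof is of course not visible here; the sketch only checks the pointwise bounds behind ${T_n\rightarrow T}$.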

Let us suppose now that ${T}$ is an arbitrary stopping time. For all ${0\leq s\leq t}$ and ${A\in\mathcal{F}_s}$, the random variable ${S=s\mathbf{1}_A+t\mathbf{1}_{A^c}}$ is a (bounded) stopping time, and what precedes, applied to the bounded stopping time ${t\wedge T\wedge S}$, gives ${M_{t\wedge T\wedge S}\in\mathrm{L}^1}$ and ${\mathbb{E}(M_{t\wedge T\wedge S})=\mathbb{E}(M_0)}$. Now, using the definition of ${S}$, we have

$\mathbb{E}(M_0) =\mathbb{E}(M_{t\wedge T\wedge S}) =\mathbb{E}(\mathbf{1}_AM_{s\wedge T}) +\mathbb{E}(\mathbf{1}_{A^c}M_{t\wedge T}) =\mathbb{E}(\mathbf{1}_A(M_{s\wedge T}-M_{t\wedge T}))+\mathbb{E}(M_{t\wedge T}).$

Since ${\mathbb{E}(M_{t\wedge T})=\mathbb{E}(M_0)}$ (take ${A=\varnothing}$ above), we get the martingale property for ${{(M_{t\wedge T})}_{t\geq0}}$, namely

$\mathbb{E}((M_{t\wedge T}-M_{s\wedge T})\mathbf{1}_A)=0.$

Finally, suppose that ${T<\infty}$ almost surely and ${{(M_{t\wedge T})}_{t\geq0}}$ is uniformly integrable. The random variable ${M_T}$ is well defined and ${\lim_{t\rightarrow\infty}M_{t\wedge T}=M_T}$ almost surely because ${M}$ is continuous. Furthermore, since ${{(M_{t\wedge T})}_{t\geq0}}$ is uniformly integrable, it follows that ${M_T\in\mathrm{L}^1}$ and ${\lim_{t\rightarrow\infty}M_{t\wedge T}=M_T}$ in ${\mathrm{L}^1}$. In particular ${\mathbb{E}(M_0)\underset{\forall t}{=}\mathbb{E}(M_{t\wedge T})=\lim_{t\rightarrow\infty}\mathbb{E}(M_{t\wedge T})=\mathbb{E}(M_T)}$.
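A cautionary numerical sketch of a standard counterexample, not taken from the proof above: for the simple symmetric random walk ${M}$ and ${T=\inf\{k:M_k=1\}}$, we have ${T<\infty}$ almost surely and ${M_T=1}$, so ${\mathbb{E}(M_T)=1\neq0=\mathbb{E}(M_0)}$; the stopped martingale is not uniformly integrable, and no hypothesis of the theorem applies.

```python
import numpy as np

# T = inf{k : M_k = 1} for the simple symmetric random walk: T is finite
# almost surely, yet E(M_T) = 1 != 0 = E(M_0), because (M_{t ^ T})_t is
# not uniformly integrable. The simulation truncates the horizon and
# checks that nearly every path does reach 1, each contributing M_T = 1.
rng = np.random.default_rng(2)
horizon, n_paths = 20000, 500
steps = rng.choice([-1, 1], size=(n_paths, horizon))
M = np.cumsum(steps, axis=1)
hit = (M == 1).any(axis=1)       # did the path reach 1 before the horizon?
print(hit.mean())                # close to 1: almost all paths hit level 1
```

The small fraction of paths that miss level ${1}$ before the horizon is of order ${1/\sqrt{\text{horizon}}}$, reflecting that ${T}$ is finite but not integrable.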
