# Libres pensées d'un mathématicien ordinaire Posts

10. Le département de mathématiques et applications (DMA) de l’École normale supérieure de Paris est à la fois un laboratoire de recherche et un département d’enseignement. Ce département a une particularité rare : aucun poste d’enseignant-chercheur n’y est permanent, et la durée maximale d’affectation est de dix ans. Outre les administratifs, doctorants, et post-doctorants, les membres sont essentiellement des chercheurs du CNRS et des enseignants-chercheurs mis à disposition par les universités parisiennes. Cette particularité du DMA est due au mathématicien Georges Poitou (1926 – 1989) qui a dirigé et transformé l’établissement.

U. Dans ce petit établissement, les effectifs des départements sont assez réduits. Au DMA, les élèves proviennent, pour la plupart, des classes préparatoires aux grandes écoles, mais aussi, dans une moindre mesure, de l’étranger et des universités françaises. En particulier, chaque année, quelques étudiants de licence de mathématiques sont recrutés à l’issue du Concours normalien étudiant. Ce concours étudiant mérite d’être connu, tout comme celui de l’ÉNS Paris-Saclay, de l’ÉNS Lyon, et de l’École Polytechnique (qui recrute beaucoup à Dauphine !). Les lignes bougent le long de l’opposition franco-française entre filières sélectives et universités.

If you believe that the completion and right-continuity of filtrations are typical abstract non sense of the general theory of stochastic processes, useless and obscure, you are maybe missing something interesting. Contrary to discrete time or space stochastic processes, continuous time and space stochastic processes lead naturally to measurability issues, when considering for instance natural objects such as running suprema or stopping times.

Negligible sets and completeness. In a probability space ${(\Omega,\mathcal{F},\mathbb{P})}$, we say that ${A\subset\Omega}$ is negligible when there exists ${A’\in\mathcal{F}}$ with ${A\subset A’}$ and ${\mathbb{P}(A’)=0}$. We say that the ${(\Omega,\mathcal{F},\mathbb{P})}$ is complete when ${\mathcal{F}}$ contains the negligible subsets of ${\Omega}$. A filtration ${{(\mathcal{F}_t)}_{t\in\mathbb{R}_+}}$ on ${(\Omega,\mathcal{F},\mathbb{P})}$ is complete when ${\mathcal{F}_0}$ contains the negligible elements of ${\mathcal{F}}$.

Completeness emerges naturally via almost sure events which are complement of negligible sets (as for running supremum below) as well as via projections of measurable sets of product spaces (as for hitting times of Borel sets below) .

We say that a process ${{(X_t)}_{t\in\mathbb{R}_+}}$ taking values in a topological space equipped with its Borel ${\sigma}$-field is continuous when it has almost surely continuous trajectories. This is the case for instance of Brownian motion constructed from random series.

Measurability of running supremum from completeness. Let ${{(X_t)}_{t\in\mathbb{R}_+}}$ be continuous, defined on a probability space ${(\Omega,\mathcal{F},\mathbb{P})}$, and taking values in a topological space ${E}$ equipped with its Borel ${\sigma}$-field ${\mathcal{E}}$. Let ${f:E\rightarrow\mathbb{R}}$ be a measurable function.

• If ${(\Omega,\mathcal{F},\mathbb{P})}$ is complete then ${\sup_{s\in[0,t]}f(X_s)}$ is measurable for all ${t\in\mathbb{R}_+}$.
• If ${X}$ is adapted for a complete ${{(\mathcal{F}_t)}_{t\in\mathbb{R}_+}}$ then ${{(\sup_{s\in[0,t]}f(X_s))}_{t\in\mathbb{R}_+}}$ is adapted.

Proof. Let ${\Omega’\in\mathcal{F}}$ be an a.s. event on which ${X}$ is continuous. Set ${S_t=\sup_{s\in[0,t]}f(X_s)}$.

• We have ${\Omega’\in\mathcal{F}}$. Next, for all ${t\in\mathbb{R}_+}$ and ${A\in\mathcal{E}}$, we have

$\Omega’\cap\{S_t\in A\} =\Omega’\cap \bigr\{\sup_{s\in[0,t]\cap\mathbb{Q}}f(X_s)\in A\bigr\}\in\mathcal{F},$

while ${(\Omega\setminus\Omega’)\cap\{S_t\in A\}\subset\Omega\setminus\Omega’}$ is negligible and thus in ${\mathcal{F}}$ by completeness.

• Same argument as before with ${\mathcal{F}_t}$ instead of ${\mathcal{F}}$.

Universal completeness. The notion of completeness is relative to the probability measure ${\mathbb{P}}$. There is also a notion of universal completeness, that does not depend on the probability measure, see Dellacherie and Meyer 1978. This is not that useful in probability.

Stopping times. A map ${T:\Omega\rightarrow[0,+\infty]}$ is a stopping time for ${{(\mathcal{F}_t)}_{t\in\mathbb{R}_+}}$ when

$\{T\leq t\}\in\mathcal{F}_t$

for all ${t\in\mathbb{R}_+}$. Contrary to discrete time filtrations, the notion of stopping times for continuous time filtrations leads naturally to the notions of complete filtration and right continuous filtration. This is visible notably with hitting times as follows.

Hitting times as archetypal examples of stopping times. Let ${X={(X_t)}_{t\in\mathbb{R}_+}}$ be a continuous and adapted process defined on ${(\Omega,\mathcal{F},\mathbb{P})}$ with respect to a complete filtration ${{(\mathcal{F}_t)}_{t\in\mathbb{R}_+}}$, and taking its values in a metric space ${G}$ equipped with its Borel ${\sigma}$-field. Then, for all closed subset ${A\subset G}$, the hitting time ${T_A:\Omega\rightarrow[0,+\infty]}$ of ${A}$, given by

$T_A=\inf\{t\in\mathbb{R}_+:X_t\in A\},$

with convention ${\inf\varnothing=+\infty}$, is a stopping time.

Proof. Let ${\Omega’}$ be the a.s. event on which ${X}$ is continuous. On ${\Omega’}$, since ${X}$ is continuous and ${A}$ is closed, we have ${\{t\in\mathbb{R}_+:X_t\in A\}=\{t\in\mathbb{R}_+:\mathrm{dist}(X_t,A)=0\}}$, the map ${t\in\mathbb{R}_+\mapsto\mathrm{dist}(X_t,A)}$ is continuous, and the ${\inf}$ in the definition of ${T_A}$ is a ${\min}$. Now, since ${X}$ is adapted, we have, for all ${t\in\mathbb{R}_+}$,

$\Omega’\cap\{T_A\leq t\} =\Omega’\cap\bigcap_{s\in[0,t]\cap\mathbb{Q}}\{X_s\in A\} \in\mathcal{F}_t,$

where we have also used ${\Omega’\in\mathcal{F}_t}$ for all ${t\in\mathbb{R}_+}$ since ${{(\mathcal{F}_t)}_{t\in\mathbb{R}_+}}$ is complete. Moreover ${(\Omega\setminus\Omega’)\cap\{T_A\leq t\}\subset\Omega\setminus\Omega’}$ is negligible, and thus in ${\mathcal{F}_t}$ by completeness.

Right-continuity. A filtration ${{(\mathcal{F}_t)}_{t\in\mathbb{R}_+}}$ is right-continuous if for all ${t\in\mathbb{R}_+}$ we have

$\mathcal{F}_t=\mathcal{F}_{t^+} \quad\text{where}\quad \mathcal{F}_{t+} =\bigcap_{\varepsilon>0}\mathcal{F}_{t+\varepsilon} =\bigcap_{s>t}\mathcal{F}_s.$

Alternative definition of stopping times. If ${T:\Omega\rightarrow[0,+\infty]}$ is a stopping time with respect to a filtration ${{(\mathcal{F}_t)}_{t\in\mathbb{R}_+}}$ then

$\{T<t\}\in\mathcal{F}_t$

for all ${t\in\mathbb{R}_+}$. Conversely this property implies that ${T}$ is a stopping time when the filtration is right-continuous. Indeed, if ${T}$ is a stopping time then for all ${t\in\mathbb{R}_+}$ we have

$\{T<t\} =\bigcup_{n=1}^\infty\{T\leq t-{\textstyle\frac{1}{n}}\} \in\mathcal{F}_t,$

Conversely ${\{T\leq t\}\in\cap_{s>t}\mathcal{F}_s=\mathcal{F}_{t+}}$ since for all ${s>t}$,

$\{T\leq t\} =\bigcap_{n=1}^\infty\{T<(t+{\textstyle\frac{1}{n}})\wedge s\} \in\mathcal{F}_{s}.$

Note that if ${T}$ is a stopping time then ${\{T=t\}=\{T\leq t\}\cap\{T<t\}^c\in\mathcal{F}_t}$.

Progressively measurable processes. Recall that a process ${{(X_t)}_{t\in\mathbb{R}_+}}$ defined on a probability space ${(\Omega,\mathcal{F},\mathbb{P})}$ is progressively measurable for ${{(\mathcal{F}_t)}_{t\in\mathbb{R}_+}}$ when for all ${t\in\mathbb{R}_+}$ the map ${(\omega,s)\in\Omega\times[0,t]\mapsto X_s(\omega)}$ is measurable for ${\mathcal{F}_t\otimes\mathcal{B}_{[0,t]}}$. Example of progressively measurable processes include adapted right-continuous processes.

Hitting time of Borel sets. Let ${X={(X_t)}_{t\in\mathbb{R}_+}}$ be a progressively measurable process defined on a probability space ${(\Omega,\mathcal{F},\mathbb{P})}$ equipped with a right continuous and complete filtration ${{(\mathcal{F}_t)}_{t\in\mathbb{R}_+}}$, and taking its values in a measurable space ${G}$. Then for all measurable subset ${A\subset G}$, the hitting time ${T_A:\Omega\rightarrow[0,+\infty]}$ defined by

$T_A=\inf\{t\in\mathbb{R}_+:X_t\in A\},$

with convention ${\inf\varnothing=+\infty}$, is a stopping time.

Proof. The debut ${D_B}$ of any ${B\in\mathcal{F}\otimes\mathcal{B}(\mathbb{R}_+)}$ is defined for all ${\omega\in\Omega}$ by

$D_B(\omega)=\inf\{t\in\mathbb{R}_+:(\omega,t)\in B\}\in[0,+\infty].$

If ${B}$ is progressive, then ${D_B}$ is a stopping time (this is known as the debut theorem). Indeed, for all ${t\in\mathbb{R}_+}$ the set ${\{D_B<t\}}$ is then the projection on ${\Omega}$ of

$C=\{s\in[0,t):(\omega,s)\in B\},$

which belongs to ${\mathcal{B}(\mathbb{R}_+)\otimes\mathcal{F}_t}$ since ${B}$ is progressive. Since the filtration is complete, this projection belongs to ${\mathcal{F}_t}$, see for instance Dellacherie and Meyer Theorem IV.50 page 116. Now ${\{D_B<t\}\in\mathcal{F}_t}$ for all ${t\in\mathbb{R}_+}$ implies that ${D_B}$ is a stopping time since the filtration is right continuous. Finally it remains to note that

$T_A=D_B\quad\text{with}\quad B=\{(\omega,t):X_t\in A\},$

which is progressive as pre-image of ${\mathbb{R}_+\times A}$ by ${(\omega,t)\mapsto X_t(\omega)}$ (${X}$ is progressive).

A famous mistake. This is related to a famous mistake made by Henri Lebesgue (1875 — 1941) on the measurability of projections of measurable sets in product spaces, that motivated Nikolai Luzin (1883 — 1950) and his student Mikhail Suslin (1894 — 1919) to forge the concept of analytic set and descriptive set theory.

[…] Sans que le terme de tribu soit utilisé à l’époque, il semblait à Borel qu’aucune opération de l’analyse ne ferait jamais sortir de la tribu borélienne. C’était aussi l’avis de Lebesgue, et il avait cru le démontrer en 1905. Rarement erreur a été plus fructueuse. Au début de l’année 1917, les Comptes rendus publient deux notes des Russes Nicolas Lusin et M. Ya. Souslin. […] La projection d’un borélien n’est pas nécessairement un borélien. L’analyse classique force donc à sortir de la tribu borélienne. Entre la tribu de Borel et celle de Lebesgue se trouve la tribu de Lusin, constituée par les ensembles que Lusin appelle analytiques et qui sont des images continues de boréliens. Au cours des années 1920 s’est développée à Moscou une école mathématique extrêmement brillante, dont Lusin a été le fondateur. Ainsi Lebesgue et son œuvre ont été beaucoup mieux connus à Moscou qu’ils ne l’étaient en France. Hongrie, Pologne et Russie ont été les foyers de rayonnement de la pensée de Lebesgue et de son héritage. […]

Jean-Pierre Kahane (2001)

Canonical filtration. It is customary to assume that the underlying filtration is right-continuous and complete. For a given filtration ${{(\mathcal{F}_t)}_{t\in\mathbb{R}_+}}$, it is always possible to consider its completion ${(\sigma_t)}_{t\in\mathbb{R}_+}={(\sigma(\mathcal{N}\cup\mathcal{F}_t))}_{t\in\mathbb{R}_+}$ where ${\mathcal{N}}$ is the collection of negligible elements of ${\mathcal{F}}$. It is also customary to consider the right-continuous version ${{(\sigma_{t+})}_{t\in\mathbb{R}_+}}$, called the canonical filtration. A process is always adapted with respect to the canonical filtration constructed from its completed natural filtration.

Subtleties about righ-continuity of filtrations. The natural filtration of a right-continuous process is not right-continuous in general, indeed a counter example is given by ${X_t=tZ}$ for all ${t\in\mathbb{R}_+}$ where ${Z}$ is a non-constant random variable. Indeed, we have ${\sigma(X_0)=\{\varnothing,\Omega\}}$ while ${\sigma(X_{0+\varepsilon}:\varepsilon>0)=\sigma(Z)\neq\sigma(X_0)}$. However it can be shown that the completion of the natural filtration of a Feller Markov process, including all Lévy processes and in particular Brownian motion, is always right-continuous.

The Sinkhorn factorization of a square matrix $A$ with positive entries takes the form $$A = D_1 S D_2$$ where $D_1$ and $D_2$ are diagonal matrices with positive diagonal entries and where $S$ is a doubly stochastic matrix meaning that it has entries in $[0,1]$ and each row and column sums up to one. It is named after Richard Dennis Sinkhorn (1934 – 1995) who worked on the subject in the 1960s. A natural algorithm for the numerical approximation of this factorization consists in iteratively normalize each row and each column, and this is referred to as the Sinkhorn-Knopp algorithm. Its convergence is fast. Actually this is a special case of the iterative proportional fitting (IPF) algorithm, classic in probability and statistics, which goes back at least to the analysis of the contingency tables in the 1920s, and which was reinvented many times, see for instance the article by Friedrich Pukelsheim and references therein. It is in particular well known that $S$ is the projection of $A$, with respect to the Kullback-Leibler divergence, on the set of doubly stochastic matrices (Birkhoff polytope). The Sinkhorn factorization and its algorithm became fashionable in the domain of computational optimal transportation due to a relation with entropic regularization. It is also considered in the domain of quantum information theory.

Here is a simple implementation using the Julia programming language.

using LinearAlgebra # for Diagonal
#
function sinkhorn(A, tol = 1E-6, maxiter = 1E6)
# Sinkhorn factorization A = D1 S D2 where D1 and D2 are diagonal and S doubly
# stochastic using Sinkhorn-Knopp algorithm which consists in iterative rows
# and columns sums normalizations. This code is not optimized.
m , n = size(A,1), size(A,2)
D1, D2 = ones(1,n), ones(1,n)
S = A
iter, err = 0, +Inf
while ((err > tol) && (iter < maxiter))
RS = vec(sum(S, dims = 2)) # column vector of rows sums
D1 = D1 .* RS
S = Diagonal(1 ./ RS) * S
CS = vec(sum(S, dims = 1)) # row vector of columns sums
D2 = CS .* D2
S = S * Diagonal(1 ./ CS)
iter += 1
err = norm(RS .- 1) + norm(CS .- 1)
end
return S, Diagonal(D1), Diagonal(D2), iter
end # function


Circular law. Suppose now that $A$ is a random matrix, $n\times n$, with independent and identically distributed positive entries of mean $m$ and variance $\sigma^2$, for instance following the uniform or the exponential distribution. Let $S$ be the random doubly stochastic matrix provided by the Sinkhorn factorization. It is tempting to conjecture that the empirical spectral distribution of $$\frac{m\sqrt{n}}{\sigma}S$$ converges weakly as $n\to\infty$ to the uniform distribution on the unit disc of the complex plane, with a single outlier equal to $\frac{m\sqrt{n}}{\sigma}$. The circular law for $S$ is inherited from the circular law for $A$ and the law of large numbers. This can be easily guessed from the first iteration of the Sinkhorn-Knopp algorithm, that reminds the circular law analysis of random Markov matrices inspired from the Dirichlet Markov Ensemble and its decomposition. The circular law for $A$ is a universal high dimensional phenomenon that holds as soon as the variance of the entries is finite. One can also explore the case of heavy tails, and wonder if the circular law for $S$ still remains true!

The following Julia code produces the graphics at the top and the bottom of this post.

using Printf # for @printf
using Plots # for scatter and plot
using LinearAlgebra # for LinRange

function sinkhornplot(A,m,sigma,filename)
S, D1, D2, iter = sinkhorn(A)
@printf("%s\n",filename)
@printf(" ‖A-D1 S D2‖ = %s\n",norm(A-D1*S*D2))
@printf(" ‖RowsSums(S)-1‖ = %s\n",norm(sum(S,dims=2).-1))
@printf(" ‖ColsSums(S)-1‖ = %s\n",norm(sum(S,dims=1).-1))
@printf(" SK iterations = %d\n",iter)
spec = eigvals(S) * m * sqrt(size(S,1)) / sigma
maxval, maxloc = findmax(abs.(spec))
deleteat!(spec, maxloc)
scatter(real(spec),imag(spec),aspect_ratio=:equal,legend=false)
x = LinRange(-1,1,100)
plot!(x,sqrt.(1 .- x.^2),linewidth=2,linecolor=:steelblue)
plot!(x,-sqrt.(1 .- x.^2),linewidth=2,linecolor=:steelblue)
savefig(filename)
end #function

sinkhornplot(rand(500,500),1/2,1/sqrt(12),"sinkhorn-unif.png")
sinkhornplot(-log.(rand(500,500)),1,1,"sinkhorn-expo.png")
sinkhornplot(abs.(randn(500,500)./randn(500,500)),1,1,"sinkhorn-heavy.png")


Example of program output.

sinkhorn-unif.png
‖A-D1 S D2‖ = 6.704194265149758e-14
‖RowsSums(S)-1‖ = 8.209123420903459e-11
‖ColsSums(S)-1‖ = 3.326966272203901e-15
SK iterations = 4
sinkhorn-expo.png
‖A-D1 S D2‖ = 1.9956579227110527e-13
‖RowsSums(S)-1‖ = 5.155302149020719e-11
‖ColsSums(S)-1‖ = 3.3454392974351074e-15
SK iterations = 5
sinkhorn-heavy.png
‖A-D1 S D2‖ = 8.423054036449791e-10
‖RowsSums(S)-1‖ = 4.5152766460217607e-7
‖ColsSums(S)-1‖ = 3.8778423131653425e-15
SK iterations = 126