
Month: November 2019

About convergence of random variables


Suppose that we would like to describe mathematically the convergence of a sequence ${(X_n)}_n$ of random variables towards a limiting random variable $X_\infty$, as $n\to\infty$. We have to select a notion of convergence. If we decide to use almost sure convergence, we need to define all the $X_n$’s as well as the limit $X_\infty$ on a common probability space in order to give a meaning to $$\mathbb{P}(\lim_{n\to\infty}X_n=X_\infty)=1.$$ This means that we need to couple the random variables. If we decide to use convergence in probability or in $L^p$, we have to define, for all $n$, both $X_n$ and $X_\infty$ on the same probability space in order to give a meaning to $\mathbb{P}(|X_n-X_\infty|>\varepsilon)$ and $\mathbb{E}(|X_n-X_\infty|^p)$ respectively, and therefore we end up defining all the $X_n$’s as well as $X_\infty$ on a common probability space. However, if we decide to use convergence in law (i.e. in distribution), then we do not need to define the random variables on a common probability space at all.

In the special case where $X_\infty$ is deterministic, convergence in probability or in $L^p$ no longer requires defining the random variables on the same probability space. However, almost sure convergence still requires a common probability space. Moreover, if we require that the almost sure convergence holds regardless of the way we define the random variables on the same probability space (i.e. for arbitrary couplings), then we end up with the important notion of complete convergence, which is equivalent, thanks to the Borel-Cantelli lemmas, to a summable convergence in probability, namely $\sum_n\mathbb{P}(|X_n-X_\infty|>\varepsilon)<\infty$ for all $\varepsilon>0$. Note that when the limit is deterministic, we also know that convergence in law is equivalent to convergence in probability. Moreover, we know in general from the Borel-Cantelli lemma that a summable convergence in probability implies almost sure convergence. Furthermore, convergence in probability easily becomes summable under moment conditions.
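
To make the last two statements concrete, here is a short sketch of the standard computation, written for a deterministic limit $c$ (the symbols $c$, $p$, and $k$ are notation introduced here for illustration). By the Markov inequality, for all $\varepsilon>0$ and $p>0$, $$\sum_n\mathbb{P}(|X_n-c|>\varepsilon)\leq\frac{1}{\varepsilon^p}\sum_n\mathbb{E}(|X_n-c|^p),$$ so the convergence in probability is summable as soon as $\sum_n\mathbb{E}(|X_n-c|^p)<\infty$ for some $p>0$. In turn, if the left hand side is finite for all $\varepsilon>0$, then the first Borel-Cantelli lemma gives, for all integers $k\geq1$, $$\mathbb{P}\Bigl(\limsup_{n\to\infty}\{|X_n-c|>1/k\}\Bigr)=0,$$ and taking the union over $k$ gives $\mathbb{P}(\lim_{n\to\infty}X_n=c)=1$. This argument only uses the marginal laws of the $X_n$’s, which is why the conclusion holds for arbitrary couplings.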

Following Hsu & Robbins, if we consider $X_n=\frac{1}{n}(Z_1+\cdots+Z_n)$ where $Z_1,\ldots,Z_n$ are independent copies of some $Z$ of mean $m$, then the sequence ${(X_n)}_n$ converges completely towards $m$ as soon as $Z$ has a finite second moment, and this condition is almost necessary. This sheds an interesting light on the law of large numbers for triangular arrays.
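
As an illustration of how moments produce summability, here is the classical computation under the stronger assumption of a finite fourth moment (this is not the Hsu & Robbins argument, which only needs a second moment; the notation $\sigma^2=\mathbb{E}((Z-m)^2)$ and $\mu_4=\mathbb{E}((Z-m)^4)$ is introduced here). Expanding the fourth power and using independence and centering, only the terms with all indices equal or with indices equal in pairs survive, hence $$\mathbb{E}((X_n-m)^4)=\frac{n\mu_4+3n(n-1)\sigma^4}{n^4}\leq\frac{\mu_4+3\sigma^4}{n^2}.$$ The Markov inequality then gives, for all $\varepsilon>0$, $$\sum_n\mathbb{P}(|X_n-m|>\varepsilon)\leq\frac{\mu_4+3\sigma^4}{\varepsilon^4}\sum_n\frac{1}{n^2}<\infty,$$ which is complete convergence. By contrast, the Chebyshev inequality with the second moment alone only gives the bound $\sigma^2/(n\varepsilon^2)$, which is not summable.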

Some people refuse to consider almost sure convergence as a true mode of convergence, in the sense that it is not associated with a metric, contrary to the other modes of convergence. In some sense, it appears as a critical notion in the law of large numbers, when we lower the concentration, typically via integrability (moment conditions). Of course there are plenty of concrete situations, for instance with martingales, in which the coupling is in fact imposed and for which almost sure convergence towards a non-constant random variable holds very naturally. Famous examples are Pólya urns and Galton-Watson branching processes. The Marchenko-Pastur theorem in random matrix theory provides an example of a natural coupling with a limiting object which is deterministic, and the convergence is complete via concentration of measure, provided that the ingredients have enough finite moments.

Note. The idea of writing this tiny post came from a discussion with my friend Adrien Hardy.


Annals of Mathematics: probability and statistics

Georges de la Tour – Les joueurs de dés, vers 1640.

Recently, during a coffee break, a discussion emerged about the presence of probability and statistics in top journals such as Annals of Mathematics, Acta Mathematica, Inventiones Mathematicae, or Journal of the AMS. The question is of interest from the point of view of the sociology and history of science. Let us use the Primary and Secondary Mathematics Subject Classification (MSC) codes of each article in order to detect Probability (60x) or Statistics (62x). Here is the data from MathSciNet/zbMath:

  • Annals of Mathematics published 4464 papers in total from 1938 to 2019.
    Among them, 76 (1.7%) have Primary MSC 60x [PDF]
    Among them, 112 (2.5%) have Primary or Secondary MSC 60x [PDF]
    Moreover only 2 have Primary or Secondary MSC 62x [PDF]
  • Acta Mathematica published 1297 papers in total from 1938 to 2017.
    Among them, 44 (3.4%) have Primary MSC 60x [PDF]
    Among them, 63 (4.9%) have Primary or Secondary MSC 60x [PDF]
    Moreover only 4 have Primary or Secondary MSC 62x [PDF]
  • Inventiones Mathematicae published 4311 papers in total from 1966 to 2019.
    Among them, 52 (1.2%) have Primary MSC 60x [PDF]
    Among them, 95 (2.2%) have Primary or Secondary MSC 60x [PDF]
    Moreover only 2 have Primary or Secondary MSC 62x [PDF]
  • Journal of the AMS published 963 papers in total from 1988 to 2019.
    Among them, 28 (2.9%) have Primary MSC 60x [PDF]
    Among them, 49 (5.1%) have Primary or Secondary MSC 60x [PDF]
    Moreover only 5 have Primary or Secondary MSC 62x [PDF]
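
The percentages above can be rechecked from the raw counts. The following is a minimal sketch in Python; the dictionary `journals` and the helper `share` are names introduced here for illustration, and the counts are simply copied from the list above.

```python
# Counts copied from the list above, per journal:
# (total papers, primary 60x, primary or secondary 60x, primary or secondary 62x)
journals = {
    "Annals of Mathematics": (4464, 76, 112, 2),
    "Acta Mathematica": (1297, 44, 63, 4),
    "Inventiones Mathematicae": (4311, 52, 95, 2),
    "Journal of the AMS": (963, 28, 49, 5),
}


def share(count, total):
    """Percentage of `count` within `total`, rounded to one decimal."""
    return round(100 * count / total, 1)


# For each journal: (primary 60x %, primary or secondary 60x %, 62x %).
shares = {
    name: tuple(share(count, total) for count in counts)
    for name, (total, *counts) in journals.items()
}

for name, (p60, ps60, ps62) in shares.items():
    print(f"{name}: primary 60x {p60}%, any 60x {ps60}%, any 62x {ps62}%")
```

The same helper also gives the share of 62x papers, which the list only reports as raw counts: it stays at or below about half a percent in all four journals.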

The presence of probability is low, while the one of statistics is microscopic. A scandal.

AO(P|S). Annals of Probability (AOP) and Annals of Statistics (AOS) were founded only in 1973.

1938. Annals of Mathematics is historically American whereas Acta Mathematica is European. They started respectively in 1892 and 1882. According to MathSciNet, it seems that the first article classified 60x in these journals was published in 1938. The MSC itself was introduced at the end of the thirties, and many articles published before 1940 are not classified in MathSciNet at the time of writing. Note that N. Wiener published in the twenties while A. N. Kolmogorov published in the thirties.

Why. The phenomenon probably has multiple explanations. Among them we could mention for instance the possible effects of utilitarianism and anti-utilitarianism in the mathematical elite, in particular during the fifties and sixties, and the possible overweight of some kind of “snobbish pure mathematics or mathematicians” in the boards of top journals. We could also see AOP and AOS as some sort of mathematical ghettos and think about self-censorship. We could moreover think about generational effects. Finally we have to keep in mind that some probability papers were published without any primary or secondary 60x code, such as for instance this one or that one.

Here is some additional data provided by MathSciNet for Annals of Mathematics:

MSC | Subject | Papers
- | Other (includes unclassified papers before 1940) | 1626
14 | Algebraic geometry | 319
57 | Manifolds and cell complexes | 298
20 | Group theory and generalizations | 292
11 | Number theory | 288
53 | Differential geometry | 262
46 | Functional analysis | 207
58 | Global analysis, analysis on manifolds | 203
32 | Several complex variables and analytic spaces | 174
55 | Algebraic topology | 172
10 | Number theory | 159
22 | Topological groups, Lie groups | 149
30 | Functions of a complex variable | 135
35 | Partial differential equations | 126
42 | Harmonic analysis on Euclidean spaces | 111
37 | Dynamical systems and ergodic theory | 107
60 | Probability theory and stochastic processes | 76
54 | General topology | 51
12 | Field theory and polynomials | 40
49 | Calculus of variations and optimal control; optimization | 39
17 | Nonassociative rings and algebras | 35
47 | Operator theory | 35
02 | Logic and foundations | 33
52 | Convex and discrete geometry | 32
34 | Ordinary differential equations | 31
81 | Quantum theory | 26
28 | Measure and integration | 25
16 | Associative rings and algebras | 22
03 | Mathematical logic and foundations | 21
13 | Commutative algebra | 20
40 | Sequences, series, summability | 20
18 | Category theory; homological algebra | 19
83 | Relativity and gravitational theory | 19
31 | Potential theory | 18
43 | Abstract harmonic analysis | 17
82 | Statistical mechanics, structure of matter | 13
90 | Operations research, mathematical programming | 12
06 | Order, lattices, ordered algebraic structures | 11
41 | Approximations and expansions | 11
15 | Linear and multilinear algebra; matrix theory | 10
33 | Special functions | 10
26 | Real functions | 9
45 | Integral equations | 7
76 | Fluid mechanics | 7
04 | Set theory | 6
39 | Difference and functional equations | 6
70 | Mechanics of particles and systems | 6
68 | Computer science | 3
44 | Integral transforms, operational calculus | 2
73 | Mechanics of solids | 2
01 | History and biography | 1
80 | Classical thermodynamics, heat transfer | 1
94 | Information and communication, circuits | 1

The same for the last three years:

MSC | Subject | Papers
11 | Number theory | 29
14 | Algebraic geometry | 20
53 | Differential geometry | 17
35 | Partial differential equations | 10
37 | Dynamical systems and ergodic theory | 8
20 | Group theory and generalizations | 6
58 | Global analysis, analysis on manifolds | 6
03 | Mathematical logic and foundations | 5
57 | Manifolds and cell complexes | 5
13 | Commutative algebra | 4
22 | Topological groups, Lie groups | 4
32 | Several complex variables and analytic spaces | 4
60 | Probability theory and stochastic processes | 4
42 | Harmonic analysis on Euclidean spaces | 3
49 | Calculus of variations and optimal control; optimization | 3
52 | Convex and discrete geometry | 3
55 | Algebraic topology | 3
28 | Measure and integration | 2
30 | Functions of a complex variable | 2
46 | Functional analysis | 2
83 | Relativity and gravitational theory | 2
06 | Order, lattices, ordered algebraic structures | 1
43 | Abstract harmonic analysis | 1
47 | Operator theory | 1

[Figures: graphics for Annals of Mathematics, Acta Mathematica, Inventiones Mathematicae, and Journal of the AMS.]

JMPA. We could think that a journal such as Journal de mathématiques pures et appliquées, founded in 1836, is at the same time relatively prestigious, generalist, and more open to applied mathematics in general and to probability and statistics in particular. Here is the data for all MSC codes, taken from MathSciNet. We see an obvious overweight of partial differential equations. Meanwhile, the situation of probability is better than in the four journals above, while the presence of statistics is still microscopic.

MSC | Subject | Papers
35 | Partial differential equations | 801
58 | Global analysis, analysis on manifolds | 138
53 | Differential geometry | 136
49 | Calculus of variations and optimal control; optimization | 107
46 | Functional analysis | 86
76 | Fluid mechanics | 69
14 | Algebraic geometry | 61
60 | Probability theory and stochastic processes | 60
93 | Systems theory; control | 58
30 | Functions of a complex variable | 57
47 | Operator theory | 55
32 | Several complex variables and analytic spaces | 53
34 | Ordinary differential equations | 43
37 | Dynamical systems and ergodic theory | 42
42 | Harmonic analysis on Euclidean spaces | 35
74 | Mechanics of deformable solids | 34
31 | Potential theory | 32
81 | Quantum theory | 24
82 | Statistical mechanics, structure of matter | 23
22 | Topological groups, Lie groups | 21
73 | Mechanics of solids | 21
20 | Group theory and generalizations | 19
83 | Relativity and gravitational theory | 19
11 | Number theory | 16
45 | Integral equations | 16
28 | Measure and integration | 15
10 | Number theory | 11
17 | Nonassociative rings and algebras | 10
26 | Real functions | 10
54 | General topology | 10
65 | Numerical analysis | 10
78 | Optics, electromagnetic theory | 10
33 | Special functions | 9
92 | Biology and other natural sciences | 8
43 | Abstract harmonic analysis | 7
91 | Game theory, economics, social and behavioral sciences | 7
55 | Algebraic topology | 6
02 | Logic and foundations | 5
12 | Field theory and polynomials | 5
15 | Linear and multilinear algebra; matrix theory | 5
41 | Approximations and expansions | 5
16 | Associative rings and algebras | 4
44 | Integral transforms, operational calculus | 4
52 | Convex and discrete geometry | 4
57 | Manifolds and cell complexes | 4
70 | Mechanics of particles and systems | 4
40 | Sequences, series, summability | 3
85 | Astronomy and astrophysics | 3
13 | Commutative algebra | 2
39 | Difference and functional equations | 2
80 | Classical thermodynamics, heat transfer | 2
90 | Operations research, mathematical programming | 2
94 | Information and communication, circuits | 2
18 | Category theory; homological algebra | 1

The same for the last three years:

MSC | Subject | Papers
35 | Partial differential equations | 154
49 | Calculus of variations and optimal control; optimization | 25
53 | Differential geometry | 19
14 | Algebraic geometry | 16
93 | Systems theory; control | 16
58 | Global analysis, analysis on manifolds | 12
42 | Harmonic analysis on Euclidean spaces | 8
37 | Dynamical systems and ergodic theory | 7
60 | Probability theory and stochastic processes | 7
76 | Fluid mechanics | 7
31 | Potential theory | 6
32 | Several complex variables and analytic spaces | 6
34 | Ordinary differential equations | 5
46 | Functional analysis | 5
81 | Quantum theory | 5
11 | Number theory | 4
47 | Operator theory | 4
74 | Mechanics of deformable solids | 4
28 | Measure and integration | 2
30 | Functions of a complex variable | 2
43 | Abstract harmonic analysis | 2
82 | Statistical mechanics, structure of matter | 2
83 | Relativity and gravitational theory | 2
92 | Biology and other natural sciences | 2
12 | Field theory and polynomials | 1
15 | Linear and multilinear algebra; matrix theory | 1
20 | Group theory and generalizations | 1
26 | Real functions | 1
45 | Integral equations | 1
65 | Numerical analysis | 1
70 | Mechanics of particles and systems | 1
78 | Optics, electromagnetic theory | 1
91 | Game theory, economics, social and behavioral sciences | 1
94 | Information and communication, circuits | 1

CPAM. Finally, here is the same data for Communications on Pure and Applied Mathematics. This journal, established in 1948, is truly open to applied mathematics in general and to probability theory in particular. However, the presence of statistics is still extremely low.

MSC | Subject | Papers
35 | Partial differential equations | 898
76 | Fluid mechanics | 234
58 | Global analysis, analysis on manifolds | 182
60 | Probability theory and stochastic processes | 177
53 | Differential geometry | 97
65 | Numerical analysis | 92
82 | Statistical mechanics, structure of matter | 92
34 | Ordinary differential equations | 85
47 | Operator theory | 65
49 | Calculus of variations and optimal control; optimization | 64
37 | Dynamical systems and ergodic theory | 58
78 | Optics, electromagnetic theory | 58
20 | Group theory and generalizations | 49
46 | Functional analysis | 48
81 | Quantum theory | 43
10 | Number theory | 39
73 | Mechanics of solids | 37
30 | Functions of a complex variable | 29
32 | Several complex variables and analytic spaces | 24
57 | Manifolds and cell complexes | 24
11 | Number theory | 23
74 | Mechanics of deformable solids | 23
94 | Information and communication, circuits | 22
42 | Harmonic analysis on Euclidean spaces | 20
03 | Mathematical logic and foundations | 15
31 | Potential theory | 15
45 | Integral equations | 15
14 | Algebraic geometry | 14
55 | Algebraic topology | 14
70 | Mechanics of particles and systems | 14
83 | Relativity and gravitational theory | 14
92 | Biology and other natural sciences | 14
01 | History and biography | 13
15 | Linear and multilinear algebra; matrix theory | 13
52 | Convex and discrete geometry | 13
44 | Integral transforms, operational calculus | 12
22 | Topological groups, Lie groups | 11
26 | Real functions | 9
80 | Classical thermodynamics, heat transfer | 9
85 | Astronomy and astrophysics | 8
43 | Abstract harmonic analysis | 7
12 | Field theory and polynomials | 6
28 | Measure and integration | 6
90 | Operations research, mathematical programming | 6
39 | Difference and functional equations | 5
68 | Computer science | 5
93 | Systems theory; control | 5
41 | Approximations and expansions | 4
91 | Game theory, economics, social and behavioral sciences | 4
02 | Logic and foundations | 3
17 | Nonassociative rings and algebras | 3
33 | Special functions | 3
16 | Associative rings and algebras | 2
40 | Sequences, series, summability | 2
54 | General topology | 2

Probability space


If you say “Let $X$ be a random variable, bla bla bla, and let $Y$ be another random variable independent of $X$…“, then you might be in trouble because $X$ is defined on some uncontrolled and implicit probability space $(\Omega,\mathcal{A},\mathbb{P})$ and this space is not necessarily large enough to allow the definition of $Y$. The definition of $Y$ may require the enlargement of the initial probability space. This implicitly and sneakily breaks the flow of the mathematical reasoning. Of course this is not a problem in general, and we are often interested in (joint) distributions rather than in probability spaces. But this may sometimes produce serious bugs. The funny thing is that this is done silently everywhere and many are not aware of the danger.

Regarding probability space glitches, another common subtlety is the misuse of the Skorokhod representation theorem. This nice theorem states that if $(X_n)$ is a sequence of random variables taking values in, say, a metric space and such that $X_n\to X$ in law, then there exists a probability space $\Omega^*$ carrying $(X^*_n)$ and $X^*$, such that $X^*_n$ has the law of $X_n$ for all $n$ and $X^*$ has the law of $X$, and $X^*_n\to X^*$ almost surely. This theorem is dangerous because it does not control the law of the sequence $(X^*_n)$ itself, in other words the correlations between the $X^*_n$. Its proof plays with these correlations in order to produce almost sure convergence! In particular $(X_1,\ldots,X_n)$ and $(X^*_1,\ldots,X^*_n)$ do not have the same law in general when $n>1$. Moreover, even if the initial $X_n$ are independent, the $X^*_n$ are not independent in general. It is customary to say that if you prove something with the Skorokhod representation theorem, then it is likely that either your statement is wrong or you can find another proof.
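
On the real line, a classical way to obtain such a representation is the quantile coupling $X^*_n=F_n^{-1}(U)$ with a single uniform random variable $U$. The following Python sketch illustrates the loss of independence on a toy example chosen here for illustration (it is not taken from the text above): $X_n$ is exponential with rate $1+1/n$, so that $X_n\to X$ in law with $X$ exponential of rate $1$. All function and variable names are hypothetical.

```python
import math
import random


def quantile_exp(u, rate):
    """Quantile function (inverse CDF) of the exponential law with the given rate."""
    return -math.log(1.0 - u) / rate


def rate_of(n):
    """Illustrative choice: X_n has rate 1 + 1/n, so X_n -> Exp(1) in law."""
    return 1.0 + 1.0 / n


def correlation(xs, ys):
    """Empirical Pearson correlation of two samples of equal length."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)  # fixed seed for reproducibility
size = 20000
us = [rng.random() for _ in range(size)]

# Skorokhod copies: X*_n = F_n^{-1}(U), all built from the same uniform U.
star_1 = [quantile_exp(u, rate_of(1)) for u in us]
star_2 = [quantile_exp(u, rate_of(2)) for u in us]
star_limit = [quantile_exp(u, 1.0) for u in us]

# Independent versions of X_1 and X_2, built from fresh uniforms.
indep_1 = [quantile_exp(rng.random(), rate_of(1)) for _ in range(size)]
indep_2 = [quantile_exp(rng.random(), rate_of(2)) for _ in range(size)]


def max_gap(n):
    """Largest |X*_n - X*| over the simulated sample; here it equals max X* / (n + 1)."""
    return max(abs(quantile_exp(u, rate_of(n)) - x) for u, x in zip(us, star_limit))


corr_coupled = correlation(star_1, star_2)
corr_indep = correlation(indep_1, indep_2)
print("correlation of coupled copies:", corr_coupled)
print("correlation of independent copies:", corr_indep)
print("max gap to the limit for n = 10, 1000:", max_gap(10), max_gap(1000))
```

In this toy case $X^*_n=\frac{n}{n+1}X^*$, so the coupled copies converge pointwise but are deterministic functions of one another, whereas independent versions with the same marginal laws are uncorrelated.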

Note. The idea behind the proof of the Skorokhod representation theorem is that the proximity of distributions implies the existence of a coupling close to the diagonal. For instance it can be easily checked that if $\mu$ and $\nu$ are probability measures on, say, $\mathbb{Z}$ then $$\mathrm{d}_{\mathrm{TV}}(\mu,\nu)=\inf_{(X,Y)}\mathbb{P}(X\neq Y)$$ where $\mathrm{d}_{\mathrm{TV}}(\mu,\nu)=\sup_A|\mu(A)-\nu(A)|$ is the total variation distance and where the inf runs over all couples of random variables $(X,Y)$ with $X\sim\mu$ and $Y\sim\nu$.
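
The coupling formula can be checked by hand on finitely supported laws. The lower bound is immediate: for any coupling and any set $A$, $\mu(A)-\nu(A)=\mathbb{P}(X\in A)-\mathbb{P}(Y\in A)\leq\mathbb{P}(X\neq Y)$. The sketch below, in Python with exact rational arithmetic, builds one coupling that attains the bound, assuming the normalization $\mathrm{d}_{\mathrm{TV}}(\mu,\nu)=\frac{1}{2}\sum_k|\mu(k)-\nu(k)|$. The function names and the two example laws are hypothetical choices made here.

```python
from fractions import Fraction


def total_variation(mu, nu):
    """Total variation distance, computed as half the l1 distance of the weights."""
    support = set(mu) | set(nu)
    return sum(abs(mu.get(k, 0) - nu.get(k, 0)) for k in support) / 2


def maximal_coupling(mu, nu):
    """Joint law of (X, Y) with marginals mu and nu that puts mass
    min(mu(k), nu(k)) on each diagonal point (k, k)."""
    support = sorted(set(mu) | set(nu))
    d = total_variation(mu, nu)
    joint = {}
    for j in support:
        for k in support:
            if j == k:
                mass = min(mu.get(j, 0), nu.get(j, 0))
            elif d > 0:
                # Spread the excess of mu at j over the points where nu exceeds mu.
                excess_mu = max(mu.get(j, 0) - nu.get(j, 0), 0)
                excess_nu = max(nu.get(k, 0) - mu.get(k, 0), 0)
                mass = excess_mu * excess_nu / d
            else:
                mass = 0
            if mass:
                joint[(j, k)] = mass
    return joint


def marginals(joint):
    """Marginal laws of X and Y under the joint law."""
    law_x, law_y = {}, {}
    for (j, k), mass in joint.items():
        law_x[j] = law_x.get(j, 0) + mass
        law_y[k] = law_y.get(k, 0) + mass
    return law_x, law_y


# Two example laws on a finite subset of Z.
mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}

joint = maximal_coupling(mu, nu)
law_x, law_y = marginals(joint)
p_differ = sum(mass for (j, k), mass in joint.items() if j != k)

print("d_TV =", total_variation(mu, nu))
print("P(X != Y) under the coupling =", p_differ)
```

For these two laws the distance is $1/2$, and the coupling keeps $X=Y$ on the common mass while moving the excess mass of $\mu$ at $0$ and $1$ onto the point $2$.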

Note. The idea of writing this micro-post came from a discussion with a PhD student.
