Convergence in law to a constant. Let ${{(X_n)}_{n\geq1}}$ be a sequence of random variables defined on a common probability space ${(\Omega,\mathcal{A},\mathbb{P})}$, and taking their values in a metric space ${(E,d)}$ equipped with its Borel sigma-field. It is well known that if ${{(X_n)}_{n\geq1}}$ converges in law as ${n\rightarrow\infty}$ to some Dirac mass ${\delta_c}$ with ${c\in E}$, then the convergence holds in probability. Actually, since the limit is constant, we do not need to define the random variables on a common probability space: the property depends only on the laws ${{(\mu_n)}_{n\geq1}}$ of the random variables ${{(X_n)}_{n\geq1}}$:

$\forall\varepsilon>0,\quad \mathbb{P}(d(X_n,c)>\varepsilon) =\mu_n(B(c,\varepsilon)^c) \underset{n\rightarrow\infty}{\longrightarrow} 0.$
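As a minimal numerical sketch (the Gaussian laws below are an illustrative choice, not from the text), take ${\mu_n=\mathcal{N}(c,1/n)}$, which converges to ${\delta_c}$, and estimate ${\mu_n(B(c,\varepsilon)^c)}$ by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
c, eps = 2.0, 0.1

# Illustrative laws: mu_n = N(c, 1/n), which converge to the Dirac
# mass delta_c.  We estimate mu_n(B(c, eps)^c) by Monte Carlo.
probs = []
for n in [10, 100, 1000, 10000]:
    samples = rng.normal(loc=c, scale=1.0 / np.sqrt(n), size=100_000)
    probs.append(np.mean(np.abs(samples - c) > eps))
    print(f"n = {n:5d}   P(d(X_n, c) > eps) ~ {probs[-1]:.4f}")
```

The estimated deviation probabilities decrease towards zero, which is exactly convergence in probability of ${X_n}$ to ${c}$.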

Note that starting from ${{(\mu_n)}_{n\geq1}}$, it is always possible to construct a sequence ${{(X_n)}_{n\geq1}}$ of random variables defined on a common probability space such that ${X_n\sim\mu_n}$ for every ${n\geq1}$. When

$\forall \varepsilon>0,\quad \sum_n\mu_n(B(c,\varepsilon)^c) =\sum_n\mathbb{P}(d(X_n,c)>\varepsilon)<\infty$

then thanks to the first Borel-Cantelli lemma, we get that ${{(X_n)}_{n\geq1}}$ converges almost surely to ${c}$ as ${n\rightarrow\infty}$. These observations lead naturally to the strange concept of universal convergence, which is a sort of almost sure convergence towards a constant for sequences of distributions!
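The Borel-Cantelli step above can be simulated with an illustrative choice of laws (not from the text): ${\mathbb{P}(X_n=1)=1/n^2}$ and ${X_n=0}$ otherwise, so that the series converges for every ${\varepsilon\in(0,1)}$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, paths = 5000, 200

# Illustrative laws: X_n = 1 with probability 1/n^2, else X_n = 0,
# so sum_n P(d(X_n, 0) > eps) = sum_n 1/n^2 < oo for eps in (0, 1).
n = np.arange(1, N + 1)
exceed = rng.random((paths, N)) < 1.0 / n**2

# First Borel-Cantelli lemma: almost surely only finitely many
# exceedances per path (simulated here with independent X_n, but the
# conclusion depends only on the marginal laws, not on the coupling).
counts = exceed.sum(axis=1)
print("exceedances per path: max =", counts.max(), " mean =", counts.mean())
```

Every path shows a small, bounded number of exceedances, so ${X_n\rightarrow0}$ along each path from some random index on.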

Universal convergence. Let ${{(\mu_n)}_{n\geq1}}$ be a sequence of probability distributions on a metric space ${(E,d)}$, and let ${c\in E}$. We say that ${{(\mu_n)}_{n\geq1}}$ converges universally to ${c}$ as ${n\rightarrow\infty}$ when for every sequence of random variables ${{(X_n)}_{n\geq1}}$ defined on a common probability space with ${X_n\sim\mu_n}$ for every ${n}$, the sequence ${{(X_n)}_{n\geq1}}$ converges almost surely to ${c}$ as ${n\rightarrow\infty}$.

Note that ${{(\mu_n)}_{n\geq1}}$ are only the marginal laws of the sequence ${{(X_n)}_{n\geq1}}$; they say nothing about the dependence structure.

Thanks to the first Borel-Cantelli lemma, a sufficient condition to get universal convergence is to have

$\forall \varepsilon>0,\quad \sum_n\mu_n(B(c,\varepsilon)^c)<\infty.$

This condition is actually also necessary: we may construct the ${{(X_n)}_{n\geq1}}$ to be independent, and if the series diverges for some ${\varepsilon>0}$, then the second Borel-Cantelli lemma gives ${d(X_n,c)>\varepsilon}$ infinitely often almost surely, ruling out almost sure convergence to ${c}$.
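The necessity can also be seen numerically with an illustrative divergent choice (not from the text): ${\mathbb{P}(X_n=1)=1/n}$ and ${X_n=0}$ otherwise, with the independent coupling:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000

# Illustrative divergent case: X_n = 1 with probability 1/n, else 0,
# so sum_n mu_n(B(0, eps)^c) = sum_n 1/n = oo for eps in (0, 1).
n = np.arange(1, N + 1)
x = rng.random(N) < 1.0 / n   # independent coupling

# Second Borel-Cantelli lemma: X_n = 1 infinitely often almost surely,
# so X_n does not converge to 0 -- no universal convergence here.
hits = np.flatnonzero(x) + 1   # indices n with X_n = 1
print("exceedances:", hits.size, "  last one at n =", hits[-1])
```

Exceedances keep occurring at arbitrarily large indices (their expected number up to ${N}$ grows like ${\log N}$), so this independent sequence does not converge almost surely.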

Who cares? Universal convergence is present in high dimensional phenomena in which a concentration of measure holds around a law of large numbers. This is typically the case in asymptotic geometric analysis, in randomized combinatorial optimization, in random matrix theory, and in statistical physics. For instance, in the Wigner semicircle theorem, the exponential concentration of the empirical spectral distribution around its mean (coming from the Azuma-Hoeffding concentration inequality) yields universal convergence. This means that no matter how we construct the sequence of random matrices, all that matters is the law at each step.
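A quick numerical illustration of this concentration (an illustrative sketch, not from the text): the second moment of the empirical spectral distribution of a normalized Wigner matrix concentrates around ${1}$, the second moment of the semicircle law on ${[-2,2]}$, already for moderate sizes:

```python
import numpy as np

rng = np.random.default_rng(3)

def esd_second_moment(n, rng):
    """Second moment of the empirical spectral distribution of a
    symmetric Wigner matrix with centered entries of variance 1/n."""
    a = rng.normal(size=(n, n))
    m = (a + a.T) / np.sqrt(2.0 * n)      # var(m_ij) = 1/n off-diagonal
    eig = np.linalg.eigvalsh(m)
    return np.mean(eig**2)

# The semicircle law on [-2, 2] has second moment 1; the ESD second
# moment concentrates sharply around it as n grows, so only the law
# at each size matters, not the coupling between sizes.
for n in [50, 200, 800]:
    print(n, round(esd_second_moment(n, rng), 3))
```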

How about the law of large numbers itself? It is well known that if ${{(X_n)}_{n\geq1}}$ are independent centered random variables with bounded fourth moment, then ${S_n:=\frac{1}{n}(X_1+\cdots+X_n)\rightarrow0}$ almost surely as ${n\rightarrow\infty}$. Moreover, the convergence is universal. However, the convergence is no longer universal if we weaken the moment condition too much. Still about the law of large numbers, one may also remark that in the Glivenko-Cantelli theorem, the convergence is universal thanks to the Dvoretzky-Kiefer-Wolfowitz concentration inequality.
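The classical fourth-moment argument makes this explicit (a sketch, writing ${M:=\sup_n\mathbb{E}[X_n^4]<\infty}$, so that ${\mathbb{E}[X_n^2]\leq\sqrt{M}}$ by the Cauchy-Schwarz inequality). By independence and centering, every cross term with a lone factor vanishes, and

$\mathbb{E}[(X_1+\cdots+X_n)^4] =\sum_{i=1}^n\mathbb{E}[X_i^4] +3\sum_{i\neq j}\mathbb{E}[X_i^2]\,\mathbb{E}[X_j^2] \leq nM+3n^2M\leq 4Mn^2,$

so that ${\mathbb{P}(|S_n|>\varepsilon) \leq\varepsilon^{-4}\mathbb{E}[S_n^4] \leq 4M\varepsilon^{-4}n^{-2}}$, which is summable in ${n}$: the sufficient condition above, applied to the laws of ${S_n}$, gives universal convergence.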

Note. The term universal was suggested to us by Yan Doumerc. The notion of universal convergence is so natural that it might well appear under other names here and there. Charles Bordenave pointed out that this notion is called complete convergence (abbreviated c.c.) in the book of Yukich on Euclidean combinatorial optimization. Jean-René Chazottes pointed out that complete convergence is discussed in the book by Gut on Probability, where it is explained that the concept goes back to Hsu, Robbins, and Erdős (~1947).