
This post is about a Central Limit Theorem (CLT) involving cumulants, published in 1931 by René Maurice Fréchet and James Alexander Shohat. It is simple and useful.
Cumulants. If $X$ is a real random variable with finite moments of all orders, its cumulants ${(C_k(X))}_{k\geq1}$ are given by the expansion \[ t\mapsto\log\mathbb{E}(\mathrm{e}^{\mathrm{i}tX})=\sum_{k=1}^\infty\frac{\mathrm{i}^kt^k}{k!}C_k(X) \] valid in a neighborhood of $0$. In other words, $C_k(X)=(-\mathrm{i})^k\partial^k_{t=0}\log\varphi_X(t)$, where $\varphi_X$ is the characteristic function of $X$. In particular \[ C_1(X)=\mathbb{E}(X) \quad\text{and}\quad C_2(X)=\mathrm{Var}(X)=\mathbb{E}(X^2)-\mathbb{E}(X)^2. \] More generally, for every $k$, a universal polynomial depending only on $k$ expresses $C_k(X)$ in terms of the moments of $X$ up to order $k$, and conversely. The cumulants are often more convenient than the moments thanks to their behavior under translation and dilation and their additivity with respect to independence, namely \[ C_k(\lambda X)=\lambda^kC_k(X),\quad C_k(X+c)=C_k(X)+c\mathbb{1}_{k=1}, \] and \[ C_k(X+Y)=C_k(X)+C_k(Y) \quad\text{if $X$ and $Y$ are independent.} \] We have $X\sim\mathcal{N}(0,1)$ if and only if $C_1(X)=0$, $C_2(X)=1$, and $C_k(X)=0$ for all $k\geq3$.
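As a quick numerical sanity check, here is a minimal Python sketch (the helper name `cumulants_from_moments` is ours) recovering cumulants from raw moments via the standard recursion $m_k=\sum_{j=1}^{k}\binom{k-1}{j-1}C_jm_{k-j}$, with the convention $m_0=1$:

```python
from math import comb

def cumulants_from_moments(m):
    # m[k-1] holds the raw moment E(X^k); returns [C_1, ..., C_n],
    # inverting the recursion m_k = sum_{j=1}^{k} binom(k-1, j-1) C_j m_{k-j}
    # (convention m_0 = 1).
    c = []
    for k in range(1, len(m) + 1):
        ck = m[k - 1] - sum(comb(k - 1, j - 1) * c[j - 1] * m[k - j - 1]
                            for j in range(1, k))
        c.append(ck)
    return c

# N(0,1): moments 0, 1, 0, 3, 0, 15 give cumulants 0, 1, 0, 0, 0, 0
print(cumulants_from_moments([0, 1, 0, 3, 0, 15]))
# Poisson(1): moments 1, 2, 5, 15 (Bell numbers) give cumulants 1, 1, 1, 1
print(cumulants_from_moments([1, 2, 5, 15]))
```

The Gaussian and Poisson examples recover the two facts used in this post: the standard Gaussian has all cumulants zero beyond the second, while the Poisson has all cumulants equal to its mean.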
The Fréchet-Shohat theorem: a CLT from the cumulants. Let ${(X_n)}_{n\geq1}$ be a sequence of real random variables with finite moments of all orders and positive variance, such that \[ \lim_{n\to\infty}\frac{C_k(X_n)}{\sqrt{C_2(X_n)}^k}=0 \quad\text{for all $k\geq3$}. \] Then the following CLT holds: \[ \frac{X_n-\mathbb{E}(X_n)}{\sqrt{\mathrm{Var}(X_n)}} \xrightarrow[n\to\infty]{\mathrm{d}} \mathcal{N}(0,1). \]
About the condition. It is satisfied in the usual iid case $X_n=Z_1+\cdots+Z_n$ where ${(Z_n)}_{n\geq1}$ are iid with positive variance and finite moments of all orders. Indeed, by additivity, $C_k(X_n)=nC_k(Z_1)$, hence \[ C_2(X_n)=\mathrm{Var}(Z_1)n\xrightarrow[n\to\infty]{}+\infty \] while for $k\geq3$, \[ \frac{C_k(X_n)}{\sqrt{C_2(X_n)}^k} =\frac{C_k(Z_1)}{\sqrt{C_2(Z_1)}^k}n^{1-\frac{k}{2}} \xrightarrow[n\to\infty]{} 0. \] Note that the condition on the cumulants is also satisfied when the variance blows up while the higher-order cumulants remain bounded as $n\to\infty$, more precisely \[ \lim_{n\to\infty}C_2(X_n)=+\infty \quad\text{while}\quad \sup_{n\geq1}|C_k(X_n)| < \infty \quad\text{for all $k\geq3$}, \] a situation encountered for instance for certain linear statistics of point processes.
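To see the decay concretely, take $Z_1\sim\mathrm{Exp}(1)$, for which $C_k(Z_1)=(k-1)!$ (a standard fact); the normalized cumulants of the iid sum are then explicit, as in this small Python sketch (the helper name is ours):

```python
from math import factorial

# For Z ~ Exp(1), C_k(Z) = (k-1)!, so for the iid sum X_n = Z_1 + ... + Z_n
# the normalized cumulant is C_k(X_n) / C_2(X_n)^(k/2) = (k-1)! * n^(1 - k/2),
# which tends to 0 for every k >= 3 as n grows.
def normalized_cumulant(k, n):
    return factorial(k - 1) * n ** (1 - k / 2)

for k in (3, 4, 5):
    print(k, [normalized_cumulant(k, n) for n in (10, 1000, 100000)])
```

The larger $k$ is, the faster the normalized cumulant vanishes, in agreement with the exponent $1-\frac{k}{2}$.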
Usefulness. For certain stochastic models, it is possible to express the cumulants using combinatorics, paving the way to the Fréchet-Shohat theorem. This is for instance the case for determinantal point processes, as observed initially by Costin and Lebowitz, and further studied notably by Soshnikov, and by Rider and Virág.
Proof of the theorem. If we define \[ Y_n=\frac{X_n-\mathbb{E}(X_n)}{\sqrt{\mathrm{Var}(X_n)}}=\frac{X_n-C_1(X_n)}{\sqrt{C_2(X_n)}}, \] then we get, by the basic translation-scaling properties of cumulants, \[ C_k(Y_n) =\begin{cases} 0 & \text{if $k=1$} \\ 1 & \text{if $k=2$} \\ \displaystyle\frac{C_k(X_n)}{\sqrt{C_2(X_n)}^k} & \text{if $k\geq3$} \end{cases}. \] Thus $Y_n\to\mathcal{N}(0,1)$ as $n\to\infty$ in the sense of cumulants, and therefore also in the sense of moments, since each moment is a universal polynomial in the cumulants. Now the standard normal $\mathcal{N}(0,1)$ is characterized by its moments, by the Carleman condition, and therefore the convergence in the sense of moments implies the convergence in distribution, see below.
The Carleman condition on characterization by the moments. Let $\mu$ be a probability measure on $\mathbb{R}$ with finite moments of all orders: $\mathbb{R}[X]\subset L^1(\mu)$. We denote them \[ m_k=\int x^k\mathrm{d}\mu(x),\quad k\geq1. \] If (Carleman condition) \[ \sum_k m_{2k}^{-\frac{1}{2k}}=\infty, \] then $\mu$ is the unique probability measure on $\mathbb{R}$ with moments ${(m_k)}_{k\geq1}$, in other words, it is characterized by its moments.
We say then that the Hamburger moment problem, which is the moment problem on the real line, is determinate. Actually the Carleman condition is essentially a condition from complex analysis that implies the quasi-analyticity of the Fourier transform. See for instance [Feller, p. 222].
The Carleman condition is obviously satisfied if $\mu$ has compact support, since in this case \[ m_{2k}\leq C^{2k} \] for some constant $C$. Beyond compact support, the Carleman condition is satisfied by, say, $\mathcal{N}(0,1)$, since in this case \[ m_{2k}=(2k-1)!!=\frac{(2k)!}{2^kk!}\sim_{k\to\infty}\sqrt{2}\Bigl(\frac{2k}{\mathrm{e}}\Bigr)^k. \] However the log-normal distribution, which is the law of $\mathrm{e}^Z$ when $Z\sim\mathcal{N}(0,1)$, has such a heavy tail that it does not satisfy the Carleman condition. Indeed, in this case \[ m_{2k}=\mathrm{e}^{2k^2}, \] so that $m_{2k}^{-\frac{1}{2k}}=\mathrm{e}^{-k}$ is summable. It can also be shown that the log-normal is not characterized by its moments. Actually a probability measure may fail to satisfy the Carleman condition while being characterized by its moments: the condition is sufficient but not necessary.
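Numerically, a brief Python sketch comparing the Carleman terms $m_{2k}^{-\frac{1}{2k}}$ of the Gaussian and of the log-normal (for the log-normal the term is exactly $\mathrm{e}^{-k}$, computed directly to avoid overflowing $\mathrm{e}^{2k^2}$):

```python
from math import exp, factorial

def gauss_m2k(k):
    # (2k-1)!! = (2k)! / (2^k k!), the even moments of N(0,1)
    return factorial(2 * k) // (2 ** k * factorial(k))

# Carleman terms m_{2k}^{-1/(2k)}: of order sqrt(e/(2k)) for the Gaussian,
# so the series diverges like sum 1/sqrt(k); exactly e^{-k} for the
# log-normal, so the series converges (geometric).
gauss_terms = [gauss_m2k(k) ** (-1 / (2 * k)) for k in range(1, 50)]
lognormal_terms = [exp(-k) for k in range(1, 50)]

print(sum(gauss_terms))      # keeps growing as more terms are added
print(sum(lognormal_terms))  # close to 1/(e-1), about 0.582
```

The first partial sum keeps growing without bound as the truncation increases, while the second is already essentially at its limit.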
From convergence of moments to narrow convergence. Let ${(\mu_n)}_n$ and $\mu$ be probability measures on $\mathbb{R}$ with finite moments of all orders, denoted $m_{n,k}$ and $m_k$ respectively. If $\mu$ is characterized by its moments (for instance if it satisfies the Carleman condition) and if \[ \lim_{n\to\infty}m_{n,k}=m_k\quad\text{for all $k$}, \] then \[ \mu_n\to\mu\quad\text{narrowly as}\quad n\to\infty, \] in other words weakly with respect to bounded continuous test functions.
Indeed, since the sequence of second moments ${(m_{n,2})}_n$ is bounded, the sequence ${(\mu_n)}_n$ is tight, by the Markov inequality. Therefore, by the Prohorov theorem, it is sequentially relatively compact for the narrow convergence, and it suffices to show that all narrowly converging sub-sequences have the same limit. Suppose then that a sub-sequence ${(\mu_{\varphi(n)})}_n$ converges narrowly towards a probability measure $\nu$. For every fixed $k$, the boundedness of ${(m_{n,2k})}_n$ makes the functions $x\mapsto x^{k'}$, $k'<2k$, uniformly integrable for ${(\mu_{\varphi(n)})}_n$; it follows that $\nu$ admits a finite moment of order $k'$, denoted $M_{k'}$, and that $\lim_{n\to\infty}m_{\varphi(n),k'}=M_{k'}$. But $M_{k'}=m_{k'}$ for all $k'$, and thus $\nu=\mu$ since $\mu$ is characterized by its moments. This is what is written in [Billingsley, Th. 30.2] for instance.
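As an illustration (a toy example of ours), the moments of $\mathrm{Binomial}(n,\frac{1}{n})$ converge to those of $\mathrm{Poisson}(1)$, which is characterized by its moments, so the theorem yields narrow convergence; the sketch below observes this directly on the distributions via the total variation distance:

```python
from math import comb, exp, factorial

# Total variation distance between Binomial(n, 1/n) and Poisson(1),
# computed by summing |pmf differences| over a support large enough
# that both tails beyond it are negligible.
def tv_binom_poisson(n, support=60):
    p = 1 / n
    tv = 0.0
    for j in range(support):
        b = comb(n, j) * p ** j * (1 - p) ** (n - j) if j <= n else 0.0
        q = exp(-1) / factorial(j)
        tv += abs(b - q)
    return tv / 2

print(tv_binom_poisson(10), tv_binom_poisson(1000))
```

The distance decays roughly like $1/n$, consistent with the classical Poisson approximation bound $\mathrm{TV}\leq np^2$.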
Convergence in the sense of moments and Wasserstein distances. Let $\mathcal{P}_k$ be the set of probability measures with finite moments up to order $k$. Recall that the Wasserstein distance of order $k$ on $\mathcal{P}_k$ is defined for all $\mu,\nu\in\mathcal{P}_k$ by \[ \mathrm{W}_k(\mu,\nu)=\inf\mathbb{E}(|X-Y|^k)^{1/k} \] where the infimum runs over all pairs of random variables $(X,Y)$ with $X\sim\mu$ and $Y\sim\nu$. It turns out that for all $k$, if ${(\mu_n)}_n$ and $\mu$ are in $\mathcal{P}_k$, then \[ \mathrm{W}_k(\mu_n,\mu)\xrightarrow[n\to\infty]{}0 \] if and only if $\mu_n\to\mu$ narrowly as well as in the sense of moments up to order $k$.
It follows that if ${(\mu_n)}_n$ and $\mu$ are probability measures on $\mathbb{R}$ with finite moments of all orders, in other words in $\cap_k\mathcal{P}_k$, and if $\mu$ is characterized by its moments, then \[ \mu_n\to\mu\quad\text{in the sense of moments} \] if and only if \[ \mathrm{W}_k(\mu_n,\mu)\xrightarrow[n\to\infty]{}0\quad\text{for all $k$.} \]
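On the real line the infimum defining $\mathrm{W}_k$ is attained by the quantile coupling, which pairs the two measures through their quantile functions; for two empirical measures of the same size this amounts to sorting. A small NumPy sketch (the helper name `wasserstein_k` is ours):

```python
import numpy as np

def wasserstein_k(x, y, k):
    # On R, the optimal coupling sorts both samples (quantile coupling):
    # for two samples of the same size m, W_k of the empirical measures is
    # ( (1/m) * sum_i |x_(i) - y_(i)|^k )^(1/k).
    xs, ys = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    return float((np.abs(xs - ys) ** k).mean() ** (1 / k))

x = [0.0, 1.0, 2.0, 5.0]
# a translation by c moves every quantile by c, so W_k equals c for every k
print(wasserstein_k(x, [v + 3.0 for v in x], 2))
```

The translation example reflects the general fact that $\mathrm{W}_k(\mu,\mu(\cdot-c))=|c|$ for every order $k$.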
Further reading.
- Patrick Billingsley, Probability and measure, John Wiley and Sons, third edition (1995)
- William Feller, An introduction to probability theory and its applications, Volume II, John Wiley and Sons, second edition (1971)
- René Maurice Fréchet and James Alexander Shohat, A proof of the generalized second-limit theorem in the theory of probability, Transactions of the American Mathematical Society 33:533-543 (1931)
- Ovidiu Costin and Joel L. Lebowitz, Gaussian fluctuations in random matrices, Physical Review Letters 75:69-72 (1995)
- Alexander Soshnikov, Gaussian limits for determinantal random point fields, The Annals of Probability 30:171-181 (2002)
- Brian Rider and Bálint Virág, The noise in the circular law and the Gaussian free field, International Mathematics Research Notices rnm006 (2007)
- Djalil Chafaï, A few moments with the problem of moments, on this blog (2010)
- Djalil Chafaï, Random projections, marginals, and moments, unpublished expository notes (2007)
About Fréchet, Shohat, and Carleman. René Maurice Fréchet (1878 - 1973) was a famous French mathematician, mostly known to students for his contributions to general topology. But he also played an important role in the development of probability theory and mathematical statistics, on the theoretical side as well as on the practical side! James Alexander Shohat (1886 - 1944) was a Russian-American mathematician, famous for his work on the moment problem, notably through his monograph with Jacob David Tamarkin (1888 - 1945). Torsten Carleman (1892 - 1949) was a Swedish mathematician, one of the most influential of his time, best known for his work in classical analysis. He delivered the Peccot Lectures at the Collège de France in Paris in 1922-1923 on quasi-analytic functions, in French. He discovered the mean ergodic theorem independently of John von Neumann. He also worked on partial differential equations, including the Boltzmann equation, and served as director of the Mittag-Leffler Institute for more than two decades. He should not be confused with Fritz David Carlson (1888 - 1952) or Lennart Axel Edvard Carleson (1928 - ), two other prominent Swedish mathematicians.