Recueil de modèles aléatoires

September 21st, 2015

The summer was studious. Following the referee reports received in July, my old friend Florent Malrieu and I worked on an even more polished version of our Recueil de modèles aléatoires (PDF). The book now comprises 27 chapters and 450 pages. All the chapters have been revised, and summaries of background material have been added at the end of the book.

Categories: Books, Probability

Enigmas of chance

September 17th, 2015

Mark Kac, Enigmas of chance

I had the great pleasure to read once again recently the autobiography of Mark Kac (1914 – 1984) entitled Enigmas of chance, published initially in 1985, shortly after the death of the author. Mark Kac, born Marek Kac, was a famous Polish-American mathematician, and more precisely a great probabilist and mathematical physicist of the twentieth century. He was born into a Polish family in eastern Europe, and grew up as a young mathematician under the supervision of Hugo Steinhaus (1887 – 1972), a great figure of the Polish school of functional analysis in Lwów at that time. He moved to America after his PhD and became famous. His autobiography gives interesting information about life and research during his time. The description of his quest, with Hugo Steinhaus, for the understanding of the notion of stochastic independence is truly fascinating. Mark Kac had a remarkable mind, most of the time attracted by simple problems and deep solutions. He was in a way a problem solver without being a heavy technician, with a keen sense of elegance. He was attracted by mathematical physics, and among other things he produced a rigorous formalization of the notion of phase transition. He is known for many things, including the Ten Martini Problem and the question Can one hear the shape of a drum? His autobiography contains several anecdotes, like this one about George Uhlenbeck (1900 – 1988):

Uhlenbeck’s attitude to Wiener’s work was brutally pragmatic and it is summarized at the end of footnote 9 in his paper (written jointly with Ming Chen Wang) “On the Theory of the Brownian Motion II”: the authors are aware of the fact that in the mathematical literature, especially in papers by N. Wiener, J. L. Doob, and others [cf. for instance Doob (Annals of Mathematics 43, 351 (1942)) also for further references], the notion of a random (or stochastic) process has been defined in a much more refined way. This allows [us], for instance, to determine in certain cases the probability that the random function y(t) is of bounded variation or continuous or differentiable, etc. However it seems to us that these investigations have not helped in the solution of problems of direct physical interest and we will therefore not try to give an account of them.

In America, Mark Kac worked notably at Cornell University and at the Rockefeller University. For my old fellows:

When I went to the Rockefeller in 1961 I was almost forty-seven years old, an age at which one is theoretically way past one’s mathematical prime. By and large it is true, for mathematics and, even more, physics are young men’s games. And yet some of what I consider to be my best work was done during my years at the Rockefeller when I was by accepted reckoning an old man.

Another interesting autobiography of a great Polish-American mathematician & physicist is the one of Stanislaw Ulam (1909 – 1984) entitled Adventures of a Mathematician. Ulam and Kac belong to the same generation and they knew each other.

Categories: Mathematicians

About the central multinomial coefficient

August 29th, 2015

James Stirling (1692-1770)

We have already listed many basic aspects of the multinomial law in a former post. This post is devoted to an additional remark. Recall that the multinomial law

\[ \mathcal{M}(n,p_1\delta_{e_1}+\cdots+p_d\delta_{e_d}) \]

where \( {e_1,\ldots,e_d} \) is the canonical basis of \( {\mathbb{R}^d} \) is given by

\[ (p_1\delta_{e_1}+\cdots+p_d\delta_{e_d})^{*n} =\sum_{\substack{(n_1,\ldots,n_d)\in\mathbb{N}^d\\n_1+\cdots+n_d=n}} \binom{n}{n_1,\ldots,n_d}p_1^{n_1}\cdots p_d^{n_d}\delta_{(n_1,\ldots,n_d)} \]

where

\[ \binom{n}{n_1,\ldots,n_d} =\frac{n!}{n_1!\cdots n_d!} \]

is the multinomial coefficient. This distribution encodes \( {n} \) rolls of a die with \( {d} \) faces.
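Here is a quick numerical illustration (a minimal sketch in Python, assuming NumPy is available; the helper multinomial_pmf is our own name): roll a fair die with \( {d} \) faces \( {n} \) times and compare the empirical frequency of a given count vector with the formula above.

    import math
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 6, 3
    p = np.full(d, 1.0 / d)  # fair die with d faces

    def multinomial_pmf(counts, p):
        # multinomial coefficient times p_1^{n_1} ... p_d^{n_d}
        coef = math.factorial(sum(counts))
        for k in counts:
            coef //= math.factorial(k)  # exact integer division at each step
        return coef * float(np.prod(np.asarray(p, dtype=float) ** counts))

    counts = (2, 2, 2)
    samples = rng.multinomial(n, p, size=200_000)
    empirical = np.mean(np.all(samples == counts, axis=1))
    print(empirical, multinomial_pmf(counts, p))  # both close to 90/729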

Maximality of the central multinomial coefficient. It turns out that the map

\[ (n_1,\ldots,n_d)\mapsto\binom{n}{n_1,\ldots,n_d} \]

is maximized when \( {n_1,\ldots,n_d} \) are all equal, or more precisely, due to discreteness, as close to each other as possible. In other words, the central multinomial coefficient is the largest possible multinomial coefficient: for any \( {d\geq1} \) and \( {n\geq1} \),

\[ m_{n,d} :=\max_{\substack{(n_1,\ldots,n_d)\in\mathbb{N}^d\\n_1+\cdots+n_d=n}} \frac{n!}{n_1!\cdots n_d!} = \frac{n!}{q!^{d-r}(q+1)!^r}, \]

where \( {q} \) and \( {r} \) are the quotient and the remainder of the Euclidean division of \( {n} \) by \( {d} \):

\[ n=qd+r, \quad 0\leq r<d, \]

and in particular, if \( {n=qd} \) then

\[ m_{n,d}=m_{qd,d}=\frac{n!}{q!^d}. \]

Namely, if \( {n_1+\cdots+n_d=n} \) and \( {n_1<q} \), then necessarily \( {n_k\geq q+1} \) for some \( {1\leq k\leq d} \) (otherwise the total sum would be at most \( {n_1+(d-1)q<qd\leq n} \)), and then, taking for instance \( {k=2} \) to keep things simple,

\[ \frac{n!}{n_1!\cdots n_d!} <\frac{n_2}{n_1+1}\frac{n!}{n_1!\cdots n_d!} =\frac{n!}{(n_1+1)!(n_2-1)!n_3!\cdots n_d!}. \]

This type of monotonicity argument shows that the maximum \( {m_{n,d}} \) is achieved when \( {q\leq n_k\leq q+1} \) for every \( {1\leq k\leq d} \), for instance by \( {n_1=\cdots=n_r=q+1} \) and \( {n_{r+1}=\cdots=n_d=q} \), hence the result.
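The closed formula is easy to test (a sketch in Python, standard library only; m_closed and m_brute are our own names): brute-force maximization over all decompositions of \( {n} \) into \( {d} \) nonnegative integers against \( {n!/(q!^{d-r}(q+1)!^r)} \).

    import math
    from itertools import product

    def m_closed(n, d):
        q, r = divmod(n, d)  # Euclidean division n = qd + r
        return math.factorial(n) // (math.factorial(q) ** (d - r)
                                     * math.factorial(q + 1) ** r)

    def m_brute(n, d):
        best = 0
        for parts in product(range(n + 1), repeat=d):
            if sum(parts) == n:
                coef = math.factorial(n)
                for k in parts:
                    coef //= math.factorial(k)
                best = max(best, coef)
        return best

    for n, d in [(7, 3), (9, 3), (10, 4)]:
        assert m_closed(n, d) == m_brute(n, d)
    print("closed formula matches brute force")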

Application to simple random walk. The simple symmetric random walk \( {{(X_n)}_{n\geq0}} \) on \( {\mathbb{Z}^d} \) is intimately associated to the multinomial law. Namely, for every \( {n\geq0} \), the law of \( {X_n-X_0} \) is the image law of the multinomial law

\[ \mathcal{M} \left(n,\sum_{x\in\{\pm e_1,\ldots,\pm e_d\}}\frac{1}{2d}\delta_x\right) \]

by the map (we roll \( {n} \) times a die with \( {2d} \) faces which are \( {\pm e_1,\ldots,\pm e_d} \))

\[ (x_1,\ldots,x_{2d})\mapsto (x_1-x_2,\ldots,x_{2d-1}-x_{2d}). \]

The sequence \( {{(X_n)}_{n\geq0}} \) is an auto-regressive process, but also a homogeneous Markov chain with state space \( {\mathbb{Z}^d} \) and transition kernel \( {P(x,y)=\mathbb{P}(X_1=y\,|\,X_0=x)} \). The chain is irreducible and, by parity, has period \( {2} \). Moreover there is translation invariance: \( {P(x,y)=P(0,y-x)} \). It follows that the chain is recurrent if and only if the series \( {\sum_{n\geq1}P^{2n}(0,0)} \) is divergent, and transient if and only if this series is convergent. From the multinomial formula,

\[ \begin{array}{rcl} P^{2n}(0,0) &=&\frac{1}{(2d)^{2n}} \sum_{\substack{n_1+\cdots+n_{2d}=2n\\n_1=n_2,\ldots,n_{2d-1}=n_{2d}}} \binom{2n}{n_1,\ldots,n_{2d}}\\ &=&\frac{1}{(2d)^{2n}}\sum_{n_1+\cdots+n_d=n} \frac{(2n)!}{n_1!^2\cdots n_d!^2}. \end{array} \]

Case \( {d=1} \). In this case the sum in the formula for \( {P^{2n}(0,0)} \) boils down to the central binomial coefficient \( {\binom{2n}{n}} \), and the Stirling formula \( {n!\sim \sqrt{2\pi n}(n/e)^n} \) gives

\[ P^{2n}(0,0) =\frac{1}{2^{2n}}\frac{(2n)!}{n!^2} \sim \frac{1}{\sqrt{\pi n}}, \]

which is not summable in \( {n} \), and therefore the chain is recurrent.

Case \( {d=2} \). The Vandermonde convolution identity for binomial coefficients

\[ \binom{n+m}{r}=\sum_{k=0}^r\binom{n}{k}\binom{m}{r-k} \]

gives

\[ \begin{array}{rcl} P^{2n}(0,0) &=&\frac{1}{4^{2n}}\sum_{n_1+n_2=n}\frac{(2n)!}{n_1!^2n_2!^2}\\ &=&\frac{(2n)!}{4^{2n}n!^2} \sum_{n_1+n_2=n}\left(\frac{n!}{n_1!n_2!}\right)^2\\ &=&\frac{(2n)!}{4^{2n}n!^2}\sum_{k=0}^n\binom{n}{k}^2 \\ &=&\frac{(2n)!}{4^{2n}n!^2}\frac{(2n)!}{n!^2} \underset{n\rightarrow\infty}{\sim}\frac{1}{\pi n} \end{array} \]

which is not summable and thus the chain is recurrent. Note the appearance of the squared central binomial coefficient \( {\binom{2n}{n}^2} \) in the last formula.
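This identity is easy to check numerically (a sketch in Python, standard library only): the double sum, the closed form \( {\binom{2n}{n}^2/4^{2n}} \), and the asymptotic equivalent \( {1/(\pi n)} \) should agree.

    import math

    def p2n_sum(n):
        # double sum over n1 + n2 = n of (2n)! / (n1!^2 n2!^2), divided by 4^(2n)
        total = sum(math.factorial(2 * n)
                    // (math.factorial(n1) ** 2 * math.factorial(n - n1) ** 2)
                    for n1 in range(n + 1))
        return total / 4 ** (2 * n)

    def p2n_closed(n):
        return math.comb(2 * n, n) ** 2 / 4 ** (2 * n)

    for n in [5, 20, 80]:
        print(n, p2n_sum(n), p2n_closed(n), 1 / (math.pi * n))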

Case \( {d=3} \). The multinomial formula gives

\[ \begin{array}{rcl} P^{2n}(0,0) &=&\frac{1}{6^{2n}}\sum_{n_1+n_2+n_3=n}\frac{(2n)!}{n_1!^2n_2!^2n_3!^2}\\ &=&\frac{(2n)!}{2^{2n}3^nn!^2}\sum_{n_1+n_2+n_3=n}\binom{n}{n_1, n_2, n_3}^2 \left(\frac{1}{3}\right)^{n}. \end{array} \]

Now if \( {n=3m} \) then \( {\binom{n}{n_1, n_2, n_3}\leq \binom{n}{m, m, m}} \) and thus, thanks to the multinomial formula of size \( {n} \) and parameter \( {(1/3,1/3,1/3)} \), we get, for \( {n=3m} \),

\[ P^{2n}(0,0) \leq \frac{(2n)!}{2^{2n}3^nn!^2}\binom{n}{m, m, m} \sim \frac{1}{2}\left(\frac{3}{\pi n}\right)^{3/2} =:\frac{c}{n^{3/2}}, \]

and therefore

\[ \sum_mP^{6m}(0,0)<\infty. \]

Since \( {P^{6(m+1)}(0,0)\geq(1/6)^{3-k}P^{6m+2k}(0,0)} \) for \( {k=1,2} \) (a return to the origin in \( {6m+2k} \) steps followed by \( {3-k} \) immediate back-and-forth moves, each of probability \( {1/6} \), produces a return in \( {6(m+1)} \) steps), we get

\[ \sum_nP^{2n}(0,0)=\sum_{k=0,1,2}\sum_mP^{6m+2k}(0,0)<\infty \]

and therefore the chain is transient.
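Numerically (a sketch in Python along the same lines), the exact values of \( {P^{2n}(0,0)} \) computed from the multinomial sum decay like \( {n^{-3/2}} \), consistent with summability and hence with transience.

    import math

    def p2n_d3(n):
        total = 0
        for n1 in range(n + 1):
            for n2 in range(n - n1 + 1):
                n3 = n - n1 - n2
                total += math.factorial(2 * n) // (math.factorial(n1) ** 2
                                                   * math.factorial(n2) ** 2
                                                   * math.factorial(n3) ** 2)
        return total / 6 ** (2 * n)

    for n in [5, 10, 20, 40]:
        # n^{3/2} * P^{2n}(0,0) should stabilize to a positive constant
        print(n, p2n_d3(n), n ** 1.5 * p2n_d3(n))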

Case \( {d\geq3} \). We have

\[ \begin{array}{rcl} P^{2n}(0,0) &=&\frac{1}{(2d)^{2n}} \sum_{n_1+\cdots+n_d=n}\frac{(2n)!}{n_1!^2\cdots n_d!^2}\\ &=&\frac{(2n)!}{n!^2(4d)^n} \sum_{n_1+\cdots+n_d=n} \left(\frac{n!}{n_1!\cdots n_d!}\right)^2\left(\frac{1}{d}\right)^n. \end{array} \]

Bounding one of the two multinomial factors by its maximal value \( {m_n} \) and using the multinomial formula, we get

\[ \begin{array}{rcl} P^{2n}(0,0) &\leq& \frac{(2n)!m_n}{n!^2(4d)^n} \sum_{n_1+\cdots+n_d=n}\frac{n!}{n_1!\cdots n_d!}\left(\frac{1}{d}\right)^n\\ &=&\frac{(2n)!m_n}{n!^2(4d)^n} \sim \frac{m_n}{d^n\sqrt{\pi n}} \end{array} \]

where \( {m_n=m_{n,d}} \) is the maximal value of the multinomial coefficient

\[ \binom{n}{n_1,\ldots,n_d}=\frac{n!}{n_1!\cdots n_d!} \]

over the set

\[ \{(n_1,\ldots,n_d)\in\mathbb{N}^d:n_1+\cdots+n_d=n\}. \]

Since \( {m_{n+1}/m_n=(n+1)/(q+1)\leq d} \), the sequence \( {{(m_n/d^n)}_{n\geq1}} \) is non-increasing, and thus

\[ \begin{array}{rcl} \sum_n\frac{m_n}{d^n\sqrt{n}} &=&\sum_q\sum_{n=qd}^{(q+1)d-1}\frac{m_n}{d^n\sqrt{n}}\\ &\leq& \sum_q\frac{1}{\sqrt{q}}\sum_{n=qd}^{(q+1)d-1}\frac{m_n}{d^n}\\ &\leq& \sum_q\frac{dm_{qd}}{d^{qd}\sqrt{q}}. \end{array} \]

Now the value of \( {m_{qd}=m_{qd,d}} \) and the Stirling formula give, as \( {q\rightarrow\infty} \),

\[ \frac{m_{qd}}{d^{qd}\sqrt{q}} =\frac{(dq)!}{d^{dq}q!^d\sqrt{q}} \sim \frac{\sqrt{d}}{(2\pi)^{(d-1)/2}q^{d/2}}. \]

But \( {d\geq3} \) and thus \( {\sum_qq^{-d/2}<\infty} \), which gives

\[ \sum_nP^{2n}(0,0)<\infty, \]

and therefore the chain is transient.
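The key asymptotics can be checked in a few lines (a sketch in Python, using the log-Gamma function to avoid overflow; log_lhs and log_rhs are our own names): the ratio of \( {m_{qd}/(d^{qd}\sqrt{q})} \) to \( {\sqrt{d}\,(2\pi)^{-(d-1)/2}q^{-d/2}} \) should tend to \( {1} \).

    import math

    def log_lhs(q, d):
        # log of (dq)! / (d^{dq} q!^d sqrt(q)), via lgamma(n + 1) = log(n!)
        return (math.lgamma(d * q + 1) - d * q * math.log(d)
                - d * math.lgamma(q + 1) - 0.5 * math.log(q))

    def log_rhs(q, d):
        return (0.5 * math.log(d) - 0.5 * (d - 1) * math.log(2 * math.pi)
                - 0.5 * d * math.log(q))

    for q in [10, 100, 1000]:
        print(q, math.exp(log_lhs(q, 3) - log_rhs(q, 3)))  # tends to 1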

Categories: Combinatorics, Probability

Back to basics: exchangeability

July 5th, 2015

Bruno de Finetti (1906 – 1985)

This post is devoted to some aspects of exchangeability, an important concept in probability theory, explored brilliantly by the mathematician Bruno de Finetti (1906 – 1985), who was a probabilist, a statistician, and an actuary, a combination well suited to my teaching at Paris-Dauphine, a university centered on decision making.

The law \( {P_n} \) of a random vector \( {X=(X_1,\ldots,X_n)} \) on a product space \( {\mathbb{S}^n} \) is exchangeable when it is invariant by permutation of the coordinates:

\[ \forall f:\mathbb{S}^n\rightarrow\mathbb{R},\quad \forall\sigma\in\Sigma_n,\quad \int\!f\,dP_n=\int\!f_\sigma\,dP_n \]

where \( {f_\sigma(x):=f(x_\sigma)} \) and \( {x_\sigma:=(x_{\sigma(1)},\ldots,x_{\sigma(n)})} \), and where \( {\Sigma_n} \) is the symmetric group formed by all bijective functions from \( {\{1,\ldots,n\}} \) to itself. It is customary to say that \( {(X_1,\ldots,X_n)} \) is a finite exchangeable (random) sequence.
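On a finite state space, this definition can be tested directly (a toy sketch in Python; is_exchangeable is our own helper): a law on \( {\mathbb{S}^n} \) given by its probability mass function is exchangeable when the mass function is invariant under every permutation of the coordinates.

    from itertools import permutations

    def is_exchangeable(pmf):
        # pmf: dict mapping n-tuples over a finite S to their probabilities;
        # tuples absent from the dict carry zero mass.
        return all(pmf[x] == pmf.get(tuple(x[i] for i in sigma), 0.0)
                   for x in pmf
                   for sigma in permutations(range(len(x))))

    # Two fair coins conditioned to disagree: exchangeable, yet not a product law.
    pmf = {(0, 1): 0.5, (1, 0): 0.5}
    print(is_exchangeable(pmf))  # True

This little example is exchangeable but is not even a mixture of product laws: any such mixture would give \( {\mathbb{P}(X_1=X_2=1)=\int\!p^2\,dR(p)>0} \) as soon as \( {\mathbb{P}(X_1=1)=\int\!p\,dR(p)=1/2} \), which foreshadows the failure of the Hewitt-Savage theorem for finite sequences discussed below.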

Mixtures of product measures. Obviously, a sequence of iid random variables always forms an exchangeable vector. More generally, if \( {X} \) is a random vector of \( {\mathbb{S}^n} \) and if there exists a sigma-field \( {\mathcal{A}} \) such that the components \( {X_1,\ldots,X_n} \) of \( {X} \) are conditionally iid given \( {\mathcal{A}} \), then the law of \( {X} \) is exchangeable. The Hewitt-Savage theorem (see below) provides a converse for infinite sequences.

In terms of distributions, any convex combination (mixture!) of product distributions is an exchangeable distribution. More precisely, if \( {\Pi_n} \) is a probability measure on the convex set of product probability measures on \( {\mathbb{S}^n} \), then the probability measure \( {P_n} \) on \( {\mathbb{S}^n} \) defined below is exchangeable:

\[ \int\!f\,dP_n =\int\!\left(\int\!f\,d\pi\right) \,d\Pi_n(\pi), \quad f:\mathbb{S}^n\rightarrow\mathbb{R}. \]

Conversely, can we approximate an exchangeable distribution \( {P_n} \) on \( {\mathbb{S}^n} \) by a mixture of product distributions? The answer is positive as expressed by the Diaconis-Freedman theorem below. Namely, we start by noting that for every test function \( {f:\mathbb{S}^n\rightarrow\mathbb{R}} \), the exchangeability of \( {P_n} \) gives

\[ \int\!f\,dP_n =\frac{1}{n!}\sum_{\sigma\in\Sigma_n}\int\!f_\sigma\,dP_n =\int\!\frac{1}{n!}\sum_{\sigma\in\Sigma_n}f_\sigma\,dP_n =\int\!\left(\int\!f\,d\mu\right)\,dP_n(x) \]

where \( {\mu} \) is the discrete probability measure given by

\[ \mu:=\frac{1}{n!}\sum_{\sigma\in\Sigma_n}\delta_{(x_{\sigma(1)},\ldots,x_{\sigma(n)})}. \]

Now, in order to approximate \( {\mu} \) by a product measure, we drop the bijectivity constraint which defines \( {\Sigma_n} \), and we replace \( {\frac{1}{n!}\sum_{\sigma\in\Sigma_n}} \) by \( {\frac{1}{n^n}\sum_{\sigma\in\Sigma_n^{\#}}} \) where \( {\Sigma_n^{\#}} \) is the set of all functions from \( {\{1,\ldots,n\}} \) to itself. This leads to approximating \( {\mu} \) by

\[ \mu^{\#}:=\frac{1}{n^n}\sum_{\sigma\in\Sigma_n^{\#}} \delta_{(x_{\sigma(1)},\ldots,x_{\sigma(n)})} =\left(\frac{1}{n}\sum_{k=1}^n\delta_{x_k}\right)^{\otimes n} =:\mu_n^{\otimes n} \]

and then approximate \( {P_n} \) by the mixture \( {P_n^{\#}} \) of product measures defined by

\[ \int\!f\,dP_n^{\#} :=\int\!\left(\int\!f\,d\mu^{\#}\right)\,dP_n =\int\!\left(\int\!f\,d\mu_n^{\otimes n}\right)\,dP_n. \]

If we denote by \( {R_n} \) the law of the empirical measure \( {\mu_n} \) under \( {P_n} \) then

\[ \int\!f\,dP_n^{\#} =\int\!\left(\int\!f\,d\rho^{\otimes n}\right)\,dR_n(\rho). \]

If we denote by \( {\Pi_n} \) the law of the product measure \( {\mu_n^{\otimes n}} \) under \( {P_n} \) then

\[ \int\!f\,dP_n^{\#} =\int\!\left(\int\!f\,d\pi\right)\,d\Pi_n(\pi). \]

Obviously, if \( {\Pi_n} \) is a Dirac mass then \( {P_n^{\#}} \) is a product measure.

The Diaconis-Freedman theorem states that \( {P_n^{\#}} \) is a good approximation of \( {P_n} \).

The Diaconis-Freedman theorem. Recall that the total variation distance between two probability measures \( {P} \) and \( {Q} \) is defined by

\[ \left\Vert P-Q\right\Vert_{\mathrm{TV}} :=\frac{1}{2}\sup_{\left\Vert f\right\Vert_\infty\leq1} \left(\int\!f\,dP-\int\!f\,dQ\right) =\sup_{A}(P(A)-Q(A))\in[0,1]. \]

Let \( {P_n} \) be an exchangeable law on \( {\mathbb{S}^n} \), and let \( {P_n^{\#}} \) be the mixture of product distributions constructed from \( {P_n} \) as above. For every \( {1\leq k\leq n} \), if \( {P_{n,k}} \) and \( {P_{n,k}^{\#}} \) denote the \( {k} \)-dimensional marginal distributions of \( {P_n} \) and \( {P_n^{\#}} \) respectively, then

\[ \left\Vert P_{n,k}-P_{n,k}^{\#}\right\Vert_{\mathrm{TV}} \leq\frac{k(k-1)}{n}. \]

This is known as the Diaconis-Freedman inequality or theorem.

  • If \( {k(k-1)\geq n} \) (roughly, if \( {k>\sqrt{n}} \)) then the right hand side is at least \( {1} \) and the inequality says nothing;
  • For \( {k=1} \) the right hand side vanishes and this corresponds to the fact that

    \[ P_{n,1}=P_{n,1}^{\#}=\mathbb{E}_{P_n}(\mu_n) \]

    where \( {\mu_n=\frac{1}{n}\sum_{k=1}^n\delta_{x_k}} \) since for every test function \( {f:\mathbb{S}\rightarrow\mathbb{R}} \),

    \[ \int\!f\,dP_{n,1} =\frac{1}{n}\int\!\left(\sum_{k=1}^nf(x_k)\right)\,dP_n(x) =\int\!\left(\int\!f\,d\mu_n\right)\,dP_n; \]

    For the two dimensional marginal, we have

    \[ dP_{n,2}^{\#}(x_1,x_2) =\frac{n-1}{n}dP_{n,2}(x_1,x_2) +\frac{1}{n}\mathbf{1}_{x_1=x_2}\,dP_{n,1}(x_1). \]

    The \( {k} \)-dimensional marginal \( {P_{n,k}^{\#}} \) satisfies a similar formula for every \( {k} \).

To prove the Diaconis-Freedman theorem (or inequality), we start from the formulas that we have already obtained: for a test function \( {f:\mathbb{S}^k\rightarrow\mathbb{R}} \),

\[ \int\!f\,dP_{n,k} =\int\!\left(\int\!f\,d\mu_k\right)\,dP_n, \quad\text{and}\quad \int\!f\,dP_{n,k}^{\#} =\int\!\left(\int\!f\,d\mu_k^{\#}\right)\,dP_n \]

where \( {\mu_k} \) and \( {\mu_k^{\#}} \) are the \( {k} \)-dimensional marginal distributions of

\[ \mu=\frac{1}{n!}\sum_{\sigma\in\Sigma_n}\delta_{x_\sigma} \quad\text{and}\quad \mu^{\#}=\frac{1}{n^n}\sum_{\sigma\in\Sigma_n^{\#}}\delta_{x_\sigma}. \]

Splitting the \( {n^k} \) tuples \( {(j_1,\ldots,j_k)\in\{1,\ldots,n\}^k} \) which index the atoms of \( {\mu_k^{\#}} \) into the injective ones, which produce exactly \( {\mu_k} \), and the remaining ones, we get

\[ \mu_k^{\#} =\frac{n(n-1)\cdots(n-k+1)}{n^k}\mu_k+\nu_k \]

where \( {\nu_k} \) is a positive measure, and therefore

\[ \mu_k-\mu_k^{\#}=\left(1-\frac{n(n-1)\cdots(n-k+1)}{n^k}\right)\mu_k-\nu_k. \]

But since the total mass of \( {\nu_k} \) is given by

\[ \nu_k(\mathbb{S}^k)=1-\frac{n(n-1)\cdots(n-k+1)}{n^k}, \]

it follows then that

\[ \left\Vert\mu_k-\mu_k^{\#}\right\Vert_{\mathrm{TV}} \leq \frac{1}{2}\times2\left(1-\frac{n(n-1)\cdots(n-k+1)}{n^k}\right). \]

To complete the proof of the Diaconis-Freedman theorem, it remains to integrate this bound with respect to \( {P_n} \) and to observe that

\[ \frac{n(n-1)\cdots(n-k+1)}{n^k} \geq\frac{(n-k+1)^k}{n^k} =\left(1-\frac{k-1}{n}\right)^k \geq1-\frac{k(k-1)}{n}. \]

The difference in nature between \( {\mu_k} \) and \( {\mu_k^{\#}} \) can be naturally understood in combinatorial terms. Namely \( {\mu_k} \) encodes the sampling of \( {k} \) balls from an urn containing \( {n} \) balls, without replacement, while \( {\mu_k^{\#}} \) does it with replacement.
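This combinatorial picture gives a direct numerical check of the bound (a sketch in Python over an urn with \( {n} \) distinct balls; tv_without_vs_with is our own helper): the total variation distance between \( {k} \) ordered draws without replacement and \( {k} \) draws with replacement equals \( {1-n(n-1)\cdots(n-k+1)/n^k} \), which is at most \( {k(k-1)/n} \).

    from itertools import product
    from math import factorial

    def tv_without_vs_with(n, k):
        p_without = factorial(n - k) / factorial(n)  # each injective k-tuple
        p_with = 1 / n ** k                          # each k-tuple
        tv = 0.0
        for t in product(range(n), repeat=k):
            q = p_without if len(set(t)) == k else 0.0
            tv += abs(q - p_with)
        return tv / 2

    for n, k in [(10, 2), (10, 3), (30, 3)]:
        print(n, k, tv_without_vs_with(n, k), "<=", k * (k - 1) / n)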

When \( {\mathbb{S}} \) is finite, the upper bound can be improved as follows, for some universal constant \( {C} \):

\[ \left\Vert P_{n,k}-P_{n,k}^{\#}\right\Vert_{\mathrm{TV}} \leq\frac{k\min(C|\mathbb{S}|,k-1)}{n}. \]

The Hewitt-Savage theorem. Let \( {X=(X_1,X_2,\ldots)} \) be an infinite sequence of random variables on \( {\mathbb{S}} \). We say that \( {X} \) is exchangeable when its law \( {P} \) is invariant by all permutations of coordinates with finite support, that is, by all bijective functions from \( {\mathbb{N}=\{1,2,\ldots\}} \) to itself for which all but a finite number of elements are fixed points. The law \( {P} \) is a law on the infinite product \( {\mathbb{S}^{\mathbb{N}}} \).

The Hewitt-Savage theorem states that if \( {P} \) is an exchangeable probability distribution on \( {\mathbb{S}^{\mathbb{N}}} \), then there exists a unique probability measure \( {R} \) on the set of probability measures on \( {\mathbb{S}} \), such that for every \( {n\in\mathbb{N}} \), the \( {n} \)-dimensional marginal \( {P_n} \) of \( {P} \) satisfies, for every test function \( {f:\mathbb{S}^n\rightarrow\mathbb{R}} \),

\[ \int\!f\,dP_n=\int\!\left(\int\!f\,d\rho^{\otimes n}\right)\,dR(\rho). \]

In terms of random variables, this means that there exists a sigma-field \( {\mathcal{A}} \) such that the components \( {X_1,X_2,\ldots} \) of \( {X} \) are conditionally iid given \( {\mathcal{A}} \). The Hewitt-Savage theorem can be proved by using the Diaconis-Freedman theorem: it suffices to show that the probability measure \( {R_n} \), which is the law of the empirical measure \( {\mu_n} \) under \( {P_n} \), converges weakly as \( {n\rightarrow\infty} \) to some probability measure \( {R} \) on the set of probability measures on \( {\mathbb{S}} \) (this can be done easily when \( {\mathbb{S}} \) is compact, otherwise one can use the method of moments).

The Hewitt-Savage theorem is false in general for finite sequences: the law of certain finite dimensional exchangeable random vectors is not a mixture of product laws. This is consistent with the fact that the Diaconis-Freedman theorem gives nothing for \( {k=n} \) (even if \( {\mathbb{S}} \) is finite!).

The Hewitt-Savage theorem for \( {\mathbb{S}=\{0,1\}} \) is known as the de Finetti theorem. The de Finetti theorem can be stated as follows: if \( {(X_1,X_2,\ldots)} \) is an infinite exchangeable random sequence of zeros and ones, then there exists a probability distribution \( {R} \) on \( {[0,1]} \) such that for every \( {n\geq1} \) and every \( {x_1,\ldots,x_n\in\{0,1\}} \),

\[ \mathbb{P}(X_1=x_1,\ldots,X_n=x_n) =\int\!p^{x_1+\cdots+x_n}(1-p)^{n-(x_1+\cdots+x_n)}\,dR(p). \]

A proof with martingales can be found for instance in the book of Achim Klenke (see below).
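For a concrete illustration (a simulation sketch in Python with NumPy, taking \( {R} \) uniform on \( {[0,1]} \)): drawing \( {p\sim R} \) and then \( {n} \) iid Bernoulli bits of parameter \( {p} \), the probability of any fixed pattern with \( {s} \) ones is the Beta integral \( {\int_0^1 p^s(1-p)^{n-s}\,dp=s!(n-s)!/(n+1)!} \), whatever the order of the bits; note that no binomial coefficient appears, since the pattern is fixed.

    import math
    import numpy as np

    rng = np.random.default_rng(1)
    n, pattern = 4, (1, 0, 1, 1)  # a fixed pattern with s = 3 ones
    s = sum(pattern)

    exact = math.factorial(s) * math.factorial(n - s) / math.factorial(n + 1)

    trials = 400_000
    p = rng.random(trials)                                     # p ~ R uniform on [0, 1]
    bits = (rng.random((trials, n)) < p[:, None]).astype(int)  # n iid Bernoulli(p) bits
    empirical = np.mean(np.all(bits == pattern, axis=1))
    print(empirical, exact)  # both close to 3! 1! / 5! = 0.05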

From exchangeability to asymptotic independence. Let \( {P_n} \) be an exchangeable probability measure on \( {\mathbb{S}^n} \) and suppose that under \( {P_n} \) the random empirical measure \( {\mu_n} \) tends weakly as \( {n\rightarrow\infty} \) to a non-random probability measure \( {\mu_\infty} \) on \( {\mathbb{S}} \):

\[ \mu_n\underset{n\rightarrow\infty}{\longrightarrow}\mu_\infty. \]

This is typically the case for the Coulomb gases that appear in unitary invariant random matrix ensembles for instance (in this case \( {\mathbb{S}} \) is typically \( {\mathbb{R}} \) or \( {\mathbb{C}} \) and \( {\mu_\infty} \) is continuous with a Lebesgue density). For every \( {k\leq n} \) we may write

\[ \left\Vert P_{n,k}-\mu_\infty^{\otimes k}\right\Vert_\mathrm{TV} \leq \left\Vert P_{n,k}-P_{n,k}^{\#}\right\Vert_\mathrm{TV} + \left\Vert P_{n,k}^{\#}-\mu_\infty^{\otimes k}\right\Vert_\mathrm{TV}. \]

The first term on the right hand side is controlled by \( {k(k-1)/n} \) thanks to the Diaconis-Freedman theorem, which is meaningful when \( {k<\sqrt{n}} \). In order to control the second term on the right hand side, one may try to write

\[ \left\Vert P_{n,k}^{\#}-\mu_\infty^{\otimes k}\right\Vert_\mathrm{TV} \leq \mathbb{E}_{P_n}(\left\Vert \mu_n^{\otimes k}-\mu_\infty^{\otimes k}\right\Vert_\mathrm{TV}) \leq k\,\mathbb{E}_{P_n}(\left\Vert \mu_n-\mu_\infty\right\Vert_\mathrm{TV}). \]

Unfortunately the right hand side is equal to \( {1} \) when \( {\mu_\infty} \) is continuous, since \( {\mu_n} \) is discrete and a discrete measure and a continuous measure are mutually singular. This is due to the total variation distance. A way to circumvent this problem is to replace the total variation distance by, say, a Wasserstein distance. Actually one can pass from one to the other up to some loss, provided regularity or moment assumptions, and this was extensively used by Maxime Hauray and Stéphane Mischler for instance (see the reference below). One may also try to replace \( {\mathbb{E}_{P_n}\mathrm{dist}(\mu_n,\mu_\infty)} \) by \( {\mathrm{dist}(\mathbb{E}_{P_n}\mu_n,\mu_\infty)=\mathrm{dist}(P_{n,1},\mu_\infty)} \).

The weak convergence of random empirical measures of exchangeable particles when the number of particles tends to infinity is known to be equivalent to the fact that two particles taken randomly become asymptotically independent. This phenomenon is known as Tanaka’s lemma in certain communities (see the course of Alain-Sol Sznitman and the survey by Sylvie Méléard below).

Example. My favorite example of a finite exchangeable sequence is given by the standard Dirichlet distribution: \( {X} \) is uniformly distributed on the \( {\ell_n^1} \) sphere

\[ \{(x_1,\ldots,x_n)\in\mathbb{R}_+^n:x_1+\cdots+x_n=1\}. \]

One can realize \( {X} \) as follows:

\[ (X_1,\ldots,X_n)=\frac{(Z_1,\ldots,Z_n)}{Z_1+\cdots+Z_n} \]

where \( {Z_1,\ldots,Z_n} \) are iid with standard exponential law \( {\mathcal{E}} \) of density \( {x\mapsto e^{-x}\mathbf{1}_{\mathbb{R}_+}(x)} \). Thanks to the law of large numbers, \( {Z_1+\cdots+Z_n=n(1+o_{n\rightarrow\infty}(1))} \) almost surely, and therefore for every fixed \( {k\geq1} \) the random vector \( {n(X_1,\ldots,X_k)} \) tends in law as \( {n\rightarrow\infty} \) to the product probability measure \( {\mathcal{E}^{\otimes k}} \). If \( {P_n} \) denotes the law of \( {X} \) then the law of \( {n(X_1,\ldots,X_k)} \) is a dilation of the marginal distribution \( {P_{n,k}} \). This is a remarkable asymptotic independence property! There is an analogue for \( {\ell_n^p} \) spheres, \( {p\geq1} \); in particular for \( {p=2} \) the exponential law is replaced by the Gaussian.
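This asymptotic independence is easy to see in simulation (a sketch in Python with NumPy): realizing \( {X} \) through normalized iid exponentials as above, the tail function of \( {nX_1} \) approaches \( {t\mapsto e^{-t}} \) for large \( {n} \).

    import numpy as np

    rng = np.random.default_rng(2)
    n, trials = 1000, 100_000
    Z = rng.exponential(size=(trials, n))  # iid standard exponentials
    X1 = Z[:, 0] / Z.sum(axis=1)           # first coordinate of X on the simplex

    # Compare the tail P(n X_1 > t) with exp(-t), the standard exponential tail.
    for t in [0.5, 1.0, 2.0]:
        print(t, np.mean(n * X1 > t), np.exp(-t))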

Other striking examples of finite exchangeability are provided by the Ginibre Coulomb gas, and by the standard Pólya urn, but let us say that it is another story!

Further reading. The literature on exchangeability is huge and crosses several parts of mathematics (see the references below). The term “de Finetti theorem” is used for various statements connecting exchangeability with mixtures of product distributions.

Note. This post benefited from discussions with François Bolley and Joaquin Fontbona.

Categories: Analysis, Probability