About the central multinomial coefficient

August 29th, 2015
James Stirling (1692 – 1770)

We have already listed many basic aspects of the multinomial law in a former post. This post is devoted to an additional remark. Recall that the multinomial law

\[ \mathcal{M}(n,p_1\delta_{e_1}+\cdots+p_d\delta_{e_d}) \]

where \( {e_1,\ldots,e_d} \) is the canonical basis of \( {\mathbb{R}^d} \), is given by

\[ (p_1\delta_{e_1}+\cdots+p_d\delta_{e_d})^{*n} =\sum_{\substack{(n_1,\ldots,n_d)\in\mathbb{N}^d\\n_1+\cdots+n_d=n}} \binom{n}{n_1,\ldots,n_d}p_1^{n_1}\cdots p_d^{n_d}\delta_{(n_1,\ldots,n_d)} \]

where

\[ \binom{n}{n_1,\ldots,n_d} =\frac{n!}{n_1!\cdots n_d!} \]

is the multinomial coefficient. This distribution encodes \( {n} \) throws of a die with \( {d} \) faces.
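
As a quick illustration, here is a small Python sketch (ours, not from the original post; all helper names are made up) checking by brute-force enumeration that the \( {n} \)-fold convolution above has exactly the multinomial probability mass function:

```python
from itertools import product
from math import factorial, prod
from collections import Counter

def multinomial_pmf(counts, p):
    """Multinomial weight n!/(n_1! ... n_d!) p_1^{n_1} ... p_d^{n_d}."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(k) for k in counts)
    return coef * prod(pi**ki for pi, ki in zip(p, counts))

def convolved_law(n, p):
    """Law of the sum of n iid throws of a die with faces e_1, ..., e_d."""
    d = len(p)
    law = Counter()
    for throws in product(range(d), repeat=n):
        counts = tuple(throws.count(i) for i in range(d))
        law[counts] += prod(p[i] for i in throws)
    return law

p = (0.5, 0.3, 0.2)
for counts, weight in convolved_law(4, p).items():
    assert abs(weight - multinomial_pmf(counts, p)) < 1e-12
```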

Maximality of the central multinomial coefficient. It turns out that the map

\[ (n_1,\ldots,n_d)\mapsto\binom{n}{n_1,\ldots,n_d} \]

is maximized when \( {n_1,\ldots,n_d} \) are all equal, or more precisely as close to each other as possible given the integrality constraint. In other words, the central multinomial coefficient is the largest possible multinomial coefficient: for any \( {d\geq1} \) and \( {n\geq1} \),

\[ m_{n,d} :=\max_{\substack{(n_1,\ldots,n_d)\in\mathbb{N}^d\\n_1+\cdots+n_d=n}} \frac{n!}{n_1!\cdots n_d!} = \frac{n!}{q!^{d-r}(q+1)!^r}. \]

where \( {q} \) and \( {r} \) are respectively the quotient and the remainder of the Euclidean division of \( {n} \) by \( {d} \):

\[ n=qd+r, \quad 0\leq r<d, \]

and in particular, if \( {n=qd} \) (that is, \( {r=0} \)) then

\[ m_{n,d}=m_{qd,d}=\frac{n!}{q!^d}. \]

Indeed, if \( {n_1+\cdots+n_d=n} \) and \( {n_1<q} \), then necessarily \( {n_k\geq q+1} \) for some \( {1\leq k\leq d} \), and then, taking for instance \( {k=2} \) to keep notation simple,

\[ \frac{n!}{n_1!\cdots n_d!} <\frac{n_2}{n_1+1}\frac{n!}{n_1!\cdots n_d!} =\frac{n!}{(n_1+1)!(n_2-1)!n_3!\cdots n_d!}. \]

This type of monotonicity argument shows that the maximum \( {m_{n,d}} \) is achieved when \( {q\leq n_k\leq q+1} \) for every \( {1\leq k\leq d} \), for instance by \( {n_1=\cdots=n_r=q+1} \) and \( {n_{r+1}=\cdots=n_d=q} \), hence the result.
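
As a sanity check of the closed formula for \( {m_{n,d}} \), here is a short Python sketch (ours; the function names are arbitrary) comparing it with brute-force maximization over all compositions of \( {n} \):

```python
from itertools import product
from math import factorial, prod

def m_closed(n, d):
    """Closed formula for the maximal multinomial coefficient m_{n,d}."""
    q, r = divmod(n, d)                     # Euclidean division: n = q d + r
    return factorial(n) // (factorial(q)**(d - r) * factorial(q + 1)**r)

def m_brute(n, d):
    """Brute-force maximum over all (n_1, ..., n_d) with n_1 + ... + n_d = n."""
    return max(
        factorial(n) // prod(factorial(k) for k in ns)
        for ns in product(range(n + 1), repeat=d)
        if sum(ns) == n
    )

assert all(m_closed(n, d) == m_brute(n, d) for n in range(1, 9) for d in range(1, 4))
```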

Application to simple random walk. The simple symmetric random walk \( {{(X_n)}_{n\geq0}} \) on \( {\mathbb{Z}^d} \) is intimately related to the multinomial law. Namely, for every \( {n\geq0} \), the law of \( {X_n-X_0} \) is the image (push-forward) of the multinomial law

\[ \mathcal{M} \left(n,\sum_{x\in\{\pm e_1,\ldots,\pm e_d\}}\frac{1}{2d}\delta_x\right) \]

by the map (we throw \( {n} \) times a die with \( {2d} \) faces labeled \( {\pm e_1,\ldots,\pm e_d} \))

\[ (x_1,\ldots,x_{2d})\mapsto (x_1-x_2,\ldots,x_{2d-1}-x_{2d}). \]

The sequence \( {{(X_n)}_{n\geq0}} \) is an auto-regressive process, but also a homogeneous Markov chain with state space \( {\mathbb{Z}^d} \) and transition kernel \( {P(x,y)=\mathbb{P}(X_1=y\,|\,X_0=x)} \). The chain is irreducible and has period \( {2} \) by parity. Moreover it is translation invariant: \( {P(x,y)=P(0,y-x)} \). It follows that the chain is recurrent if and only if the series \( {\sum_{n\geq1}P^{2n}(0,0)} \) diverges. From the multinomial formula

\[ \begin{array}{rcl} P^{2n}(0,0) &=&\frac{1}{(2d)^{2n}} \sum_{\substack{n_1+\cdots+n_{2d}=2n\\n_1=n_2,\ldots,n_{2d-1}=n_{2d}}} \binom{2n}{n_1,\ldots,n_{2d}}\\ &=&\frac{1}{(2d)^{2n}}\sum_{n_1+\cdots+n_d=n} \frac{(2n)!}{n_1!^2\cdots n_d!^2}. \end{array} \]
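
Before specializing the dimension, here is a small Python sketch (ours, purely illustrative) checking the formula above against a brute-force enumeration of nearest-neighbor paths, for small \( {n} \) and \( {d} \):

```python
from itertools import product
from math import factorial

def p_return(n, d):
    """P^{2n}(0,0) via the sum over (n_1, ..., n_d) with n_1 + ... + n_d = n."""
    total = 0
    for ns in product(range(n + 1), repeat=d):
        if sum(ns) == n:
            coef = factorial(2 * n)
            for k in ns:
                coef //= factorial(k)**2
            total += coef
    return total / (2 * d)**(2 * n)

def p_return_brute(n, d):
    """Same quantity by enumerating all (2d)^(2n) nearest-neighbor paths."""
    steps = [(i, s) for i in range(d) for s in (+1, -1)]
    closed = 0
    for path in product(steps, repeat=2 * n):
        pos = [0] * d
        for i, s in path:
            pos[i] += s
        closed += pos == [0] * d
    return closed / (2 * d)**(2 * n)

assert abs(p_return(2, 2) - p_return_brute(2, 2)) < 1e-12   # d = 2, 2n = 4
assert abs(p_return(2, 1) - 6 / 16) < 1e-12                 # binom(4,2)/2^4
```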

Case \( {d=1} \). In this case the sum in the formula for \( {P^{2n}(0,0)} \) boils down to the central binomial coefficient \( {\binom{2n}{n}} \), and the Stirling formula \( {n!\sim \sqrt{2\pi n}(n/e)^n} \) gives

\[ P^{2n}(0,0) =\frac{1}{2^{2n}}\frac{(2n)!}{n!^2} \sim \frac{1}{\sqrt{\pi n}}, \]

which is not summable in \( {n} \), and therefore the chain is recurrent.

Case \( {d=2} \). The Vandermonde convolution identity for binomial coefficients

\[ \binom{n+m}{r}=\sum_{k=0}^r\binom{n}{k}\binom{m}{r-k} \]

gives

\[ \begin{array}{rcl} P^{2n}(0,0) &=&\frac{1}{4^{2n}}\sum_{n_1+n_2=n}\frac{(2n)!}{n_1!^2n_2!^2}\\ &=&\frac{(2n)!}{4^{2n}n!^2} \sum_{n_1+n_2=n}\left(\frac{n!}{n_1!n_2!}\right)^2\\ &=&\frac{(2n)!}{4^{2n}n!^2}\sum_{k=0}^n\binom{n}{k}^2 \\ &=&\frac{(2n)!}{4^{2n}n!^2}\frac{(2n)!}{n!^2} \underset{n\rightarrow\infty}{\sim}\frac{1}{\pi n} \end{array} \]

which is not summable and thus the chain is recurrent. Note the appearance of the squared central binomial coefficient \( {\binom{2n}{n}^2} \) in the last formula.

Case \( {d=3} \). The multinomial formula gives

\[ \begin{array}{rcl} P^{2n}(0,0) &=&\frac{1}{6^{2n}}\sum_{n_1+n_2+n_3=n}\frac{(2n)!}{n_1!^2n_2!^2n_3!^2}\\ &=&\frac{(2n)!}{2^{2n}3^nn!^2}\sum_{n_1+n_2+n_3=n}\binom{n}{n_1, n_2, n_3}^2 \left(\frac{1}{3}\right)^{n}. \end{array} \]

Now if \( {n=3m} \) then \( {\binom{n}{n_1, n_2, n_3}\leq \binom{n}{m, m, m}} \) and thus, thanks to the multinomial formula of size \( {n} \) and parameter \( {(1/3,1/3,1/3)} \), we get, for \( {n=3m} \),

\[ P^{2n}(0,0) \leq \frac{(2n)!}{2^{2n}3^nn!^2}\binom{n}{m, m, m} \sim \frac{1}{2}\left(\frac{3}{\pi n}\right)^{3/2} =\frac{c}{n^{3/2}}, \]

thus

\[ \sum_mP^{6m}(0,0)<\infty. \]

Since \( {P^{6m+2k}(0,0)\,P^{6-2k}(0,0)\leq P^{6(m+1)}(0,0)} \) and \( {P^{6-2k}(0,0)\geq 6^{-(3-k)}} \) for \( {k=1,2} \) (because \( {P^{2}(0,0)=1/6} \)), we have \( {P^{6m+2k}(0,0)\leq 6^{3-k}P^{6(m+1)}(0,0)} \), and we get

\[ \sum_nP^{2n}(0,0)=\sum_{k=0,1,2}\sum_mP^{6m+2k}(0,0)<\infty \]

and therefore the chain is transient.

Case \( {d\geq3} \). We have

\[ \begin{array}{rcl} P^{2n}(0,0) &=&\frac{1}{(2d)^{2n}} \sum_{n_1+\cdots+n_d=n}\frac{(2n)!}{n_1!^2\cdots n_d!^2}\\ &=&\frac{(2n)!}{n!^2(4d)^n} \sum_{n_1+\cdots+n_d=n} \left(\frac{n!}{n_1!\cdots n_d!}\right)^2\left(\frac{1}{d}\right)^n. \end{array} \]

Bounding one of the two multinomial factors by its maximum and using the multinomial theorem, we get

\[ \begin{array}{rcl} P^{2n}(0,0) &\leq& \frac{(2n)!m_n}{n!^2(4d)^n} \sum_{n_1+\cdots+n_d=n}\frac{n!}{n_1!\cdots n_d!}\left(\frac{1}{d}\right)^n\\ &=&\frac{(2n)!m_n}{n!^2(4d)^n} \sim \frac{m_n}{d^n\sqrt{\pi n}} \end{array} \]

where \( {m_n=m_{n,d}} \) is the maximal value of the multinomial coefficient

\[ \binom{n}{n_1,\ldots,n_d}=\frac{n!}{n_1!\cdots n_d!} \]

on

\[ \{(n_1,\ldots,n_d)\in\mathbb{N}^d:n_1+\cdots+n_d=n\}. \]

Since \( {m_{n+1}/m_n=(n+1)/(q+1)\leq d} \), the sequence \( {{(m_n/d^n)}_{n\geq1}} \) is non-increasing, and thus

\[ \begin{array}{rcl} \sum_n\frac{m_n}{d^n\sqrt{n}} &=&\sum_q\sum_{n=qd}^{(q+1)d-1}\frac{m_n}{d^n\sqrt{n}}\\ &\leq& \sum_q\frac{1}{\sqrt{q}}\sum_{n=qd}^{(q+1)d-1}\frac{m_n}{d^n}\\ &\leq& \sum_q\frac{dm_{qd}}{d^{qd}\sqrt{q}}. \end{array} \]

Now the value of \( {m_{qd}} \) and the Stirling formula give, as \( {q\rightarrow\infty} \),

\[ \frac{m_{qd}}{d^{qd}\sqrt{q}} =\frac{(dq)!}{d^{dq}q!^d\sqrt{q}} \sim \frac{\sqrt{d}}{(2\pi)^{(d-1)/2}q^{d/2}}. \]

But \( {d\geq3} \) and thus \( {\sum_qq^{-d/2}<\infty} \), which gives

\[ \sum_nP^{2n}(0,0)<\infty, \]

and therefore the chain is transient.

Categories: Combinatorics, Probability

Back to basics: exchangeability

July 5th, 2015
Bruno de Finetti (1906 – 1985)

This post is devoted to some aspects of exchangeability, an important concept in probability theory, explored brilliantly by the mathematician Bruno de Finetti (1906 – 1985), who was a probabilist, a statistician, and an actuary, a profile that fits nicely with my teaching at Paris-Dauphine, a university centered on decision making.

The law \( {P_n} \) of a random vector \( {X=(X_1,\ldots,X_n)} \) on a product space \( {\mathbb{S}^n} \) is exchangeable when it is invariant under permutations of the coordinates:

\[ \forall f:\mathbb{S}^n\rightarrow\mathbb{R},\quad \forall\sigma\in\Sigma_n,\quad \int\!f\,dP_n=\int\!f_\sigma\,dP_n \]

where \( {f_\sigma(x):=f(x_\sigma)} \) and \( {x_\sigma:=(x_{\sigma(1)},\ldots,x_{\sigma(n)})} \), and where \( {\Sigma_n} \) is the symmetric group formed by all bijective functions from \( {\{1,\ldots,n\}} \) to itself. It is customary to say that \( {(X_1,\ldots,X_n)} \) is a finite exchangeable (random) sequence.

Mixtures of product measures. Obviously, a sequence of iid random variables always forms an exchangeable vector. More generally, if \( {X} \) is a random vector of \( {\mathbb{S}^n} \) and if there exists a sigma-field \( {\mathcal{A}} \) such that the components \( {X_1,\ldots,X_n} \) of \( {X} \) are conditionally iid given \( {\mathcal{A}} \), then the law of \( {X} \) is exchangeable. The Hewitt-Savage theorem (see below) provides a converse for infinite sequences.

In terms of distributions, any convex combination (mixture!) of product distributions is an exchangeable distribution. More precisely, if \( {\Pi_n} \) is a probability measure on the convex set of product probability measures on \( {\mathbb{S}^n} \), then the probability measure \( {P_n} \) on \( {\mathbb{S}^n} \) defined below is exchangeable:

\[ \int\!f\,dP_n =\int\!\left(\int\!f\,d\pi\right) \,d\Pi_n(\pi), \quad f:\mathbb{S}^n\rightarrow\mathbb{R}. \]

Conversely, can we approximate an exchangeable distribution \( {P_n} \) on \( {\mathbb{S}^n} \) by a mixture of product distributions? The answer is positive as expressed by the Diaconis-Freedman theorem below. Namely, we start by noting that for every test function \( {f:\mathbb{S}^n\rightarrow\mathbb{R}} \), the exchangeability of \( {P_n} \) gives

\[ \int\!f\,dP_n =\frac{1}{n!}\sum_{\sigma\in\Sigma_n}\int\!f_\sigma\,dP_n =\int\!\frac{1}{n!}\sum_{\sigma\in\Sigma_n}f_\sigma\,dP_n =\int\!\left(\int\!f\,d\mu\right)\,dP_n(x) \]

where \( {\mu} \) is the discrete probability measure given by

\[ \mu:=\frac{1}{n!}\sum_{\sigma\in\Sigma_n}\delta_{(x_{\sigma(1)},\ldots,x_{\sigma(n)})}. \]

Now, in order to approximate \( {\mu} \) by a product measure, we drop the bijective constraint which defines \( {\Sigma_n} \), and we replace \( {\frac{1}{n!}\sum_{\sigma\in\Sigma_n}} \) by \( {\frac{1}{n^n}\sum_{\sigma\in\Sigma_n^{\#}}} \) where \( {\Sigma_n^{\#}} \) is the set of all functions from \( {\{1,\ldots,n\}} \) to itself. This leads to approximate \( {\mu} \) by

\[ \mu^{\#}:=\frac{1}{n^n}\sum_{\sigma\in\Sigma_n^{\#}} \delta_{(x_{\sigma(1)},\ldots,x_{\sigma(n)})} =\left(\frac{1}{n}\sum_{k=1}^n\delta_{x_k}\right)^{\otimes n} =:\mu_n^{\otimes n} \]

and then approximate \( {P_n} \) by the mixture \( {P_n^{\#}} \) of product measures defined by

\[ \int\!f\,dP_n^{\#} :=\int\!\left(\int\!f\,d\mu^{\#}\right)\,dP_n =\int\!\left(\int\!f\,d\mu_n^{\otimes n}\right)\,dP_n. \]
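
The identity \( {\mu^{\#}=\mu_n^{\otimes n}} \) used above can be checked mechanically. Here is a tiny Python enumeration (ours; the sample x is an arbitrary toy choice) over all \( {n^n} \) maps, confirming that the uniform mixture of Dirac masses at \( {x_\sigma} \) is the \( {n} \)-fold product of the empirical measure:

```python
from itertools import product
from math import prod
from collections import Counter

x = ("a", "b", "b")                        # a toy sample with a tie, n = 3
n = len(x)

lhs = Counter()
for sigma in product(range(n), repeat=n):  # all n^n maps from {1,...,n} to itself
    lhs[tuple(x[i] for i in sigma)] += 1 / n**n

emp = Counter(x)                           # counts, i.e. n times the empirical measure
rhs = {t: prod(emp[c] / n for c in t) for t in lhs}

assert all(abs(lhs[t] - rhs[t]) < 1e-12 for t in lhs)
```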

If we denote by \( {R_n} \) the law of the empirical measure \( {\mu_n} \) under \( {P_n} \) then

\[ \int\!f\,dP_n^{\#} =\int\!\left(\int\!f\,d\rho^{\otimes n}\right)\,dR_n(\rho). \]

If we denote by \( {\Pi_n} \) the law of the product measure \( {\mu_n^{\otimes n}} \) under \( {P_n} \) then

\[ \int\!f\,dP_n^{\#} =\int\!\left(\int\!f\,d\pi\right)\,d\Pi_n(\pi). \]

Obviously, if \( {\Pi_n} \) is a Dirac mass at some product measure \( {\pi} \), then \( {P_n^{\#}=\pi} \) is a product measure.

The Diaconis-Freedman theorem states that \( {P_n^{\#}} \) is a good approximation of \( {P_n} \).

The Diaconis-Freedman theorem. Recall that the total variation distance between two probability measures \( {P} \) and \( {Q} \) is defined by

\[ \left\Vert P-Q\right\Vert_{\mathrm{TV}} :=\frac{1}{2}\sup_{\left\Vert f\right\Vert_\infty\leq1} \left(\int\!f\,dP-\int\!f\,dQ\right) =\sup_{A}(P(A)-Q(A))\in[0,1]. \]

Let \( {P_n} \) be an exchangeable law on \( {\mathbb{S}^n} \), and let \( {P_n^{\#}} \) be the mixture of product distributions constructed from \( {P_n} \) as above. For every \( {1\leq k\leq n} \), if \( {P_{n,k}} \) and \( {P_{n,k}^{\#}} \) denote the \( {k} \)-dimensional marginal distribution of \( {P_n} \) and \( {P_n^{\#}} \) respectively, then

\[ \left\Vert P_{n,k}-P_{n,k}^{\#}\right\Vert_{\mathrm{TV}} \leq\frac{k(k-1)}{n}. \]

This is known as the Diaconis-Freedman inequality or theorem.

  • If \( {k>\sqrt{n}} \) then the right hand side exceeds \( {1} \) and the inequality says nothing;
  • For \( {k=1} \) the right hand side vanishes and this corresponds to the fact that

    \[ P_{n,1}=P_{n,1}^{\#}=\mathbb{E}_{P_n}(\mu_n) \]

    where \( {\mu_n=\frac{1}{n}\sum_{k=1}^n\delta_{x_k}} \) since for every test function \( {f:\mathbb{S}\rightarrow\mathbb{R}} \),

    \[ \int\!f\,dP_{n,1} =\frac{1}{n}\int\!\left(\sum_{k=1}^nf(x_k)\right)\,dP_n(x) =\int\!\left(\int\!f\,d\mu_n\right)\,dP_n; \]

  • For the two-dimensional marginal, we have

    \[ dP_{n,2}^{\#}(x_1,x_2) =\frac{n-1}{n}dP_{n,2}(x_1,x_2) +\frac{1}{n}\delta_{x_1}(dx_2)\,dP_{n,1}(x_1), \]

    and the \( {k} \)-dimensional marginal \( {P_{n,k}^{\#}} \) satisfies a similar formula for every \( {k} \).

To prove the Diaconis-Freedman theorem (or inequality), we start from the formulas that we have already obtained: for a test function \( {f:\mathbb{S}^k\rightarrow\mathbb{R}} \),

\[ \int\!f\,dP_{n,k} =\int\!\left(\int\!f\,d\mu_k\right)\,dP_n, \quad\text{and}\quad \int\!f\,dP_{n,k}^{\#} =\int\!\left(\int\!f\,d\mu_k^{\#}\right)\,dP_n \]

where \( {\mu_k} \) and \( {\mu_k^{\#}} \) are the \( {k} \)-dimensional marginal distributions of

\[ \mu=\frac{1}{n!}\sum_{\sigma\in\Sigma_n}\delta_{x_\sigma} \quad\text{and}\quad \mu^{\#}=\frac{1}{n^n}\sum_{\sigma\in\Sigma_n^{\#}}\delta_{x_\sigma}. \]

Now

\[ \mu_k^{\#} =\frac{n(n-1)\cdots(n-k+1)}{n^k}\mu_k+\nu_k \]

where \( {\nu_k} \) is a positive measure, and therefore

\[ \mu_k-\mu_k^{\#}=\left(1-\frac{n(n-1)\cdots(n-k+1)}{n^k}\right)\mu_k-\nu_k. \]

But since the total mass of \( {\nu_k} \) is given by

\[ \nu_k(\mathbb{S}^k)=1-\frac{n(n-1)\cdots(n-k+1)}{n^k}, \]

it follows then that

\[ \left\Vert\mu_k-\mu_k^{\#}\right\Vert_{\mathrm{TV}} \leq \frac{1}{2}\times2\left(1-\frac{n(n-1)\cdots(n-k+1)}{n^k}\right). \]

To complete the proof of the Diaconis-Freedman theorem, it remains to observe that

\[ \frac{n(n-1)\cdots(n-k+1)}{n^k} \geq\frac{(n-k+1)^k}{n^k} =\left(1-\frac{k-1}{n}\right)^k \geq1-\frac{k(k-1)}{n}. \]

The difference in nature between \( {\mu_k} \) and \( {\mu_k^{\#}} \) can be naturally understood in combinatorial terms. Namely \( {\mu_k} \) encodes the sampling of \( {k} \) balls from an urn containing \( {n} \) balls, without replacement, while \( {\mu_k^{\#}} \) does it with replacement.
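
This urn interpretation can be checked numerically. The sketch below (our code, with made-up names) computes the exact total variation distance between the two sampling schemes for small urns and compares it with the bound \( {k(k-1)/n} \):

```python
from itertools import product

def tv_sampling(n, k):
    """Exact TV distance between ordered sampling of k among n labeled balls,
    without replacement versus with replacement (brute-force enumeration)."""
    distinct = 1
    for j in range(k):
        distinct *= n - j        # number of ordered k-tuples with distinct entries
    tv = 0.0
    for t in product(range(n), repeat=k):
        p_without = 1 / distinct if len(set(t)) == k else 0.0
        tv += abs(p_without - 1 / n**k)
    return tv / 2

for n, k in [(6, 2), (6, 3), (8, 3)]:
    assert tv_sampling(n, k) <= k * (k - 1) / n
print(tv_sampling(8, 3), 3 * 2 / 8)   # exact distance versus the bound k(k-1)/n
```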

When \( {\mathbb{S}} \) is finite, the upper bound can be improved as follows, for some universal constant \( {C} \):

\[ \left\Vert P_{n,k}-P_{n,k}^{\#}\right\Vert_{\mathrm{TV}} \leq\frac{k\min(C|\mathbb{S}|,k-1)}{n}. \]

The Hewitt-Savage theorem. Let \( {X=(X_1,X_2,\ldots)} \) be an infinite sequence of random variables on \( {\mathbb{S}} \). We say that \( {X} \) is exchangeable when its law \( {P} \) is invariant under all permutations of coordinates with finite support, that is, bijections from \( {\mathbb{N}=\{1,2,\ldots\}} \) to itself for which all but a finite number of elements are fixed points. The law \( {P} \) is a law on the infinite product \( {\mathbb{S}^{\mathbb{N}}} \).

The Hewitt-Savage theorem states that if \( {P} \) is an exchangeable probability distribution on \( {\mathbb{S}^{\mathbb{N}}} \), then there exists a unique probability measure \( {R} \) on the set of probability measures on \( {\mathbb{S}} \), such that for every \( {n\in\mathbb{N}} \), the \( {n} \)-dimensional marginal \( {P_n} \) of \( {P} \) satisfies, for every test function \( {f:\mathbb{S}^n\rightarrow\mathbb{R}} \),

\[ \int\!f\,dP_n=\int\!\left(\int\!f\,d\rho^{\otimes n}\right)\,dR(\rho). \]

In terms of random variables, this means that there exists a sigma-field \( {\mathcal{A}} \) such that the components \( {X_1,X_2,\ldots} \) of \( {X} \) are conditionally iid given \( {\mathcal{A}} \). The Hewitt-Savage theorem can be proved by using the Diaconis-Freedman theorem: it suffices to show that the probability measure \( {R_n} \), the law of the empirical measure \( {\mu_n} \) under \( {P_n} \), converges weakly as \( {n\rightarrow\infty} \) to some probability measure \( {R} \) on the set of probability measures on \( {\mathbb{S}} \) (this can be done easily when \( {\mathbb{S}} \) is compact; otherwise one can use the method of moments).

The Hewitt-Savage theorem is false in general for finite sequences: the law of some finite exchangeable random vectors is not a mixture of product laws. This is consistent with the fact that the Diaconis-Freedman theorem gives nothing for \( {k=n} \) (even when \( {\mathbb{S}} \) is finite!).

The Hewitt-Savage theorem for \( {\mathbb{S}=\{0,1\}} \) is known as the de Finetti theorem. The de Finetti theorem can be stated as follows: if \( {(X_1,X_2,\ldots)} \) is an infinite exchangeable random sequence of zeros and ones, then there exists a probability distribution \( {R} \) on \( {[0,1]} \) such that for every \( {n\geq1} \) and every \( {x_1,\ldots,x_n\in\{0,1\}} \),

\[ \mathbb{P}(X_1=x_1,\ldots,X_n=x_n) =\int\!p^{x_1+\cdots+x_n}(1-p)^{n-(x_1+\cdots+x_n)}\,dR(p). \]
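
Here is a small Python sketch (ours; the mixing law R is an arbitrary toy choice) checking that a mixture of Bernoulli product laws assigns to a word a probability that depends on it only through the number of ones, in agreement with the display above:

```python
from itertools import product
from math import prod

R = {0.1: 0.25, 0.5: 0.5, 0.9: 0.25}   # a toy mixing law R on [0,1] (assumption)

def prob(word):
    """P(X_1 = x_1, ..., X_n = x_n) for the mixture of Bernoulli product laws."""
    return sum(w * prod(p if x == 1 else 1 - p for x in word) for p, w in R.items())

n = 4
for word in product((0, 1), repeat=n):
    s = sum(word)
    mixture = sum(w * p**s * (1 - p)**(n - s) for p, w in R.items())
    assert abs(prob(word) - mixture) < 1e-15   # depends on the word only through s
```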

A proof with martingales can be found for instance in the book of Achim Klenke (see below).

From exchangeability to asymptotic independence. Let \( {P_n} \) be an exchangeable probability measure on \( {\mathbb{S}^n} \) and suppose that under \( {P_n} \) the random empirical measure \( {\mu_n} \) tends weakly as \( {n\rightarrow\infty} \) to a non-random probability measure \( {\mu_\infty} \) on \( {\mathbb{S}} \):

\[ \mu_n\underset{n\rightarrow\infty}{\longrightarrow}\mu_\infty. \]

This is typically the case for the Coulomb gases that appear in unitary invariant random matrix ensembles for instance (in this case \( {\mathbb{S}} \) is typically \( {\mathbb{R}} \) or \( {\mathbb{C}} \) and \( {\mu_\infty} \) is continuous with a Lebesgue density). For every \( {k\leq n} \) we may write

\[ \left\Vert P_{n,k}-\mu_\infty^{\otimes k}\right\Vert_\mathrm{TV} \leq \left\Vert P_{n,k}-P_{n,k}^{\#}\right\Vert_\mathrm{TV} + \left\Vert P_{n,k}^{\#}-\mu_\infty^{\otimes k}\right\Vert_\mathrm{TV}. \]

The first term in the right hand side can be controlled, thanks to the Diaconis-Freedman theorem, by \( {k(k-1)/n} \), which is meaningful when \( {k<\sqrt{n}} \). In order to control the second term in the right hand side, one may try to write

\[ \left\Vert P_{n,k}^{\#}-\mu_\infty^{\otimes k}\right\Vert_\mathrm{TV} \leq \mathbb{E}_{P_n}(\left\Vert \mu_n^{\otimes k}-\mu_\infty^{\otimes k}\right\Vert_\mathrm{TV}) \leq k\,\mathbb{E}_{P_n}(\left\Vert \mu_n-\mu_\infty\right\Vert_\mathrm{TV}). \]

Unfortunately the expectation in the right hand side is equal to \( {1} \) when \( {\mu_\infty} \) is continuous, since \( {\mu_n} \) is discrete. This is an artifact of the total variation distance. A way to circumvent the problem is to replace the total variation distance by, say, a Wasserstein distance. One can actually pass from one to the other up to some loss, provided some regularity or moment bounds, and this was extensively used by Maxime Hauray and Stéphane Mischler for instance (see the reference below). One may also try to replace \( {\mathbb{E}_{P_n}\mathrm{dist}(\mu_n,\mu_\infty)} \) by \( {\mathrm{dist}(\mathbb{E}_{P_n}\mu_n,\mu_\infty)=\mathrm{dist}(P_{n,1},\mu_\infty)} \).

The weak convergence of random empirical measures of exchangeable particles when the number of particles tends to infinity is known to be equivalent to the fact that two particles taken randomly become asymptotically independent. This phenomenon is known as Tanaka’s lemma in certain communities (see the course of Alain-Sol Sznitman and the survey by Sylvie Méléard below).

Example. My favorite example of a finite exchangeable sequence is given by the standard Dirichlet distribution. Namely, \( {X} \) is uniformly distributed on the part of the \( {\ell_n^1} \) sphere lying in \( {\mathbb{R}_+^n} \) (a simplex):

\[ \{(x_1,\ldots,x_n)\in\mathbb{R}_+^n:x_1+\cdots+x_n=1\}. \]

One can realize \( {X} \) as follows:

\[ (X_1,\ldots,X_n)=\frac{(Z_1,\ldots,Z_n)}{Z_1+\cdots+Z_n} \]

where \( {Z_1,\ldots,Z_n} \) are iid with standard exponential law \( {\mathcal{E}} \) of density \( {x\mapsto e^{-x}\mathbf{1}_{\mathbb{R}_+}(x)} \). Thanks to the law of large numbers, \( {Z_1+\cdots+Z_n=n(1+o_{n\rightarrow\infty}(1))} \) almost surely, and therefore for every fixed \( {k\geq1} \) the random vector \( {n(X_1,\ldots,X_k)} \) tends in law as \( {n\rightarrow\infty} \) to the product probability measure \( {\mathcal{E}^{\otimes k}} \). If \( {P_n} \) denotes the law of \( {X} \), then the law of \( {n(X_1,\ldots,X_k)} \) is a dilation of the marginal distribution \( {P_{n,k}} \). This is a remarkable asymptotic independence property! There is an analogue for \( {\ell_n^p} \) spheres, \( {p\geq1} \), and in particular for \( {p=2} \) the exponential law is replaced by the Gaussian.
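
This asymptotic independence is easy to observe by simulation. The sketch below (ours; the sample sizes are arbitrary) realizes \( {X} \) from iid exponentials and compares the empirical behavior of \( {nX_1} \) with the standard exponential law:

```python
import random
from math import exp

random.seed(42)
n, trials = 200, 5000
vals = []
for _ in range(trials):
    z = [random.expovariate(1.0) for _ in range(n)]   # iid standard exponentials
    vals.append(n * z[0] / sum(z))                    # n X_1 where X = Z/(Z_1+...+Z_n)

print(sum(vals) / trials)                             # ~ 1, the mean of Exp(1)
print(sum(v > 1 for v in vals) / trials, exp(-1))     # tail at 1: ~ e^{-1} = 0.3678...
```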

Other striking examples of finite exchangeability are provided by the Ginibre Coulomb gas, and by the standard Pólya urn, but let us say that it is another story!

Further reading. The literature on exchangeability is huge and crosses several parts of mathematics (see the references below). The term “de Finetti theorem” is used for various statements connecting exchangeability with mixtures of product distributions.

Note. This post benefited from discussions with François Bolley and Joaquin Fontbona.

Categories: Analysis, Probability

Nous sommes tous des faussaires (we are all forgers)

June 27th, 2015

Guy Ribes - Autoportrait d'un faussaire

“As with any painting, every fake requires research. A painter, whether a forger or not, is all his life a student in the school of art. I did not make just one Matisse or one Renoir: I made each of them in ten copies, until I understood their nature and their meaning. One cannot create a work of such importance at the first attempt. One makes mistakes, destroys, starts over…” Guy Ribes (1948 – ) in Autoportrait d'un faussaire, p. 120, 2015.

Reading Guy Ribes's self-portrait is fascinating! One finds in it the story of a novelistic life and an unvarnished description of the art market, but also, along the pages, a refreshingly candid reflection on creation and originality. Can artists be sorted into creators and interpreters? Reality is much more complex, in painting as in music or in mathematics. In a certain way, mathematicians are forgers too, very often creating works in the manner of others, just as Guy Ribes did.

“What, after all, is a painter? A collector who wants to build a collection by making himself the paintings he likes in others' work. That is how I begin, and then it becomes something else. … It is because one does not manage to imitate a master that one does something original.” Pablo Picasso (1881 – 1973), in Daniel-Henry Kahnweiler, Huit entretiens avec Picasso, Le Point, Mulhouse, n°XLII, Oct. 1952, p. 22-30.

Categories: Mathematicians, Society

Kirchhoff determinant and Markov chain tree formulas

June 21st, 2015
Gustav Robert Kirchhoff (1824 – 1887)

This post is devoted to the Gustav Kirchhoff formula which expresses the invariant measure of an irreducible finite Markov chain in terms of spanning trees. Many of us have already encountered the name of Gustav Kirchhoff in Physics classes when studying electricity.

Let \( {X={(X_t)}_{t\geq0}} \) be a continuous time Markov chain with finite state space \( {V=\{1,\ldots,n\}} \), and with infinitesimal generator or Q-matrix \( {Q} \), meaning that

\[ Q_{i,j}=\partial_{t=0^+}\mathbb{P}(X_t=j|X_0=i), \quad \mathbb{P}(X_t=j|X_0=i)=\left(e^{tQ}\right)_{i,j}. \]

The off-diagonal entries of \( {Q} \) are non-negative and the rows of \( {Q} \) have zero sum:

\[ \forall i,j\in V, \quad i\neq j, \quad Q_{i,j}\geq0, \quad Q_{i,i}=-\sum_{k\neq i}Q_{i,k}. \]

The skeleton graph \( {G=(V,E)} \) of \( {Q} \) is the oriented graph defined by

\[ (i,j)\in E\quad\mathrm{iff}\quad Q_{i,j}>0. \]

Let us assume that \( {Q} \) is irreducible, which means that \( {G} \) is strongly connected. The Perron-Frobenius theorem then states that there exists a unique invariant probability measure; in other words, \( {\mathrm{ker}(Q^\top)} \) contains, up to scaling, a unique vector with positive coordinates. We will provide an explicit formula for it!

Since the rows of \( {Q} \) have zero sum and \( {Q} \) is irreducible, the kernel of \( {Q} \) is spanned by the all-ones vector, and it follows that for every \( {i,j\in V} \),

\[ (\mathrm{cof}(-Q))_{i,j}:=(-1)^{i+j}\det(-Q^{(i,j)})=\det(-Q^{(i,i)})=:\mu_i. \]

In other words, the cofactor matrix of \( {-Q} \) has constant rows (identical columns). Now, by expanding the determinant along an arbitrary column \( {j\in V} \), we get

\[ 0=\det(-Q)=-\sum_{i=1}^nQ_{i,j}(-1)^{i+j}\det(-Q^{(i,j)}) =-\sum_{i\in V}Q_{i,j}\mu_i \]

which shows that the vector \( {{(\mu_i)}_{i\in V}} \) is invariant for \( {Q} \). The fact that \( {{(\mu_i)}_{i\in V}} \) has positive coordinates follows from the Kirchhoff formula

\[ \mu_i:=\det(-Q^{(i,i)})=\sum_{T\in\mathcal{T}_i}\prod_{(j,k)\in T}Q_{j,k} \]

where the sum runs over the set \( {\mathcal{T}_i} \) of spanning trees of \( {G} \) rooted at \( {i} \). Recall that a spanning tree or covering tree of \( {G} \) rooted at \( {i\in V} \) is a sub-graph of \( {G} \) with vertex set \( {V} \) which is a tree rooted at \( {i} \). Note that \( {\mathcal{T}_i} \) is non-empty since \( {G} \) is connected. Note also that \( {\mathcal{T}_1,\ldots,\mathcal{T}_n} \) do not necessarily have the same cardinality, due to the fact that the edges are oriented: a covering tree rooted at \( {i} \) contains every \( {j} \) as a vertex, but not necessarily as a possible root, because of the orientation of the arrows.
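
Here is a small numerical sketch (ours, using numpy; the random \( {Q} \) is a toy example) of the cofactor formula: the vector \( {\mu_i=\det(-Q^{(i,i)})} \) is positive and invariant:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
Q = rng.uniform(0.1, 1.0, (n, n))        # positive off-diagonal entries: irreducible
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))      # rows of Q sum to zero

mu = np.array([np.linalg.det(np.delete(np.delete(-Q, i, 0), i, 1)) for i in range(n)])

assert np.allclose(mu @ Q, 0.0)          # invariance: sum_i mu_i Q_{i,j} = 0
assert (mu > 0).all()                    # positivity, as given by the Kirchhoff formula
print(mu / mu.sum())                     # the invariant probability measure
```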

To prove the Kirchhoff formula, say in the symmetric case, one can first label from \( {1} \) to \( {m} \) the edges of \( {G} \), each pair \( {\{i,j\}} \) being counted once, and define an \( {n\times m} \) matrix \( {L} \), called the incidence matrix, as follows: if \( {\{i,j\}} \) is the \( {k} \)-th edge of \( {G} \) then \( {L_{i,k}=Q_{i,j}^{1/2}} \), \( {L_{j,k}=-Q_{i,j}^{1/2}} \), and \( {L_{\cdot,k}=0} \) otherwise. Now

\[ -Q=LL^\top. \]

We also have \( {(-Q)^{(i,i)}=L^{(i)}(L^{(i)})^\top} \), where \( {L^{(i)}} \) stands for \( {L} \) with its \( {i} \)-th row removed. Next we use the Cauchy-Binet formula

\[ \det(-Q^{(i,i)}) =\sum_{s}\det(L^{(i,s)})^2 \]

where \( {s} \) runs over the subsets of columns of cardinality \( {m-(n-1)} \), and \( {L^{(i,s)}} \) stands for \( {L} \) deprived of its \( {i} \)-th row and of the columns in \( {s} \). Every \( {s} \) thus selects \( {n-1} \) remaining edges of \( {G} \). It can then be shown that these edges form a spanning tree iff \( {\det(L^{(i,s)})\neq0} \).

Random walk. Suppose now that \( {Q_{i,j}\in\{0,1\}} \) for every \( {i,j\in V} \) with \( {i\neq j} \). Then \( {X} \) is the nearest neighbor random walk on \( {G} \), the diagonal of \( {-Q} \) is formed by the degree sequence of \( {G} \), and \( {Q} \) is the Laplacian \( {\Delta_G} \) of \( {G} \). In this case \( {Q_{j,k}=1} \) for every \( {(j,k)\in E} \), and thus the Kirchhoff formula above boils down to

\[ \forall i\in V,\quad \mu_i=|\mathcal{T}_i|. \]

Suppose now that \( {Q} \) is additionally symmetric, meaning that \( {G} \) is non-oriented in the sense that \( {(j,k)\in E} \) iff \( {(k,j)\in E} \). In this case the invariant probability measure of \( {Q} \) is the uniform law on \( {V} \). Due to the absence of orientation, the sets \( {\mathcal{T}_1,\ldots,\mathcal{T}_n} \) have the same cardinality \( {\kappa} \), which is the number of spanning trees of \( {G} \) (without rooting). We have

\[ \forall i,j\in V,\quad \mu_i=(\mathrm{cof}(-Q))_{i,j}=(-1)^{i+j}\det(-Q^{(i,j)}) =|\mathcal{T}_i|=\kappa. \]

In other words, the cofactor matrix of the Laplacian of a non-oriented graph has constant entries, equal to the number of spanning trees of the graph.

Let us connect \( {\kappa} \) with the eigenvalues of the symmetric matrix \( {-Q} \), numbered as

\[ 0=\lambda_1<\lambda_2\leq\cdots\leq \lambda_n. \]

Here \( {\lambda_1} \) is simple since \( {Q} \) is irreducible. The characteristic polynomial of \( {-Q} \) is

\[ P_{-Q}(x)=\det(xI_n+Q)=x(x-\lambda_2)\cdots(x-\lambda_n). \]

By looking at the coefficient of \( {x} \) in this polynomial we get (Jacobi formula)

\[ (-1)^{n-1}\lambda_2\cdots\lambda_n =(-1)^{n-1}\mathrm{Tr}(\mathrm{cof}(-Q)) \]

and since \( {\mathrm{Tr}(\mathrm{cof}(-Q))=\mu_1+\cdots+\mu_n=n\kappa} \), this finally gives the remarkable Kirchhoff formula

\[ \lambda_2\cdots\lambda_n=n\kappa. \]

(Note that the left hand side is thus necessarily an integer!) Let us show how to recover from this formula the so-called Cayley formula, which states that the complete graph \( {K_n} \) possesses \( {n^{n-2}} \) covering trees. In this case \( {G=K_n} \) and \( {-Q=nI-J} \) where \( {J} \) is the \( {n\times n} \) matrix full of ones. The matrix \( {-J} \) has rank \( {1} \), its eigenvalues are \( {0} \) with multiplicity \( {n-1} \) and \( {-n} \) with multiplicity \( {1} \), and thus the eigenvalues of \( {-Q} \) are \( {0} \) with multiplicity \( {1} \) and \( {n} \) with multiplicity \( {n-1} \). Hence \( {\lambda_2=\cdots=\lambda_n=n} \) and finally \( {\kappa=n^{n-2}} \).
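
A short numerical check of the Cayley formula (our sketch, using numpy):

```python
import numpy as np

for n in range(2, 8):
    L = n * np.eye(n) - np.ones((n, n))   # -Q = n I - J for the complete graph K_n
    kappa = np.linalg.det(L[1:, 1:])      # any principal cofactor counts spanning trees
    assert round(kappa) == n**(n - 2)
```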

Wilson algorithm. David Wilson invented an exact algorithm to generate a spanning tree of \( {G} \) distributed according to the uniform distribution on the set of spanning trees of \( {G} \). Starting from vertex \( {1} \), the algorithm consists in setting \( {A_1=\{1\}} \), and simulating a path of the simple random walk on \( {G} \) started from vertex \( {2} \) until it reaches \( {A_1} \). Then \( {A_2} \) is the union of \( {A_1} \) and of the loop-erased version of this path. If \( {A_2=V} \), the algorithm stops. If not, we repeat the procedure, starting the path this time from the smallest vertex not in \( {A_2} \) and stopping it when it reaches \( {A_2} \), and so on. The Wilson algorithm can be naturally related to the Green function of the random walk on \( {G} \) with Dirichlet boundary conditions on a subset of \( {V} \).
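
Here is a possible implementation sketch of the Wilson algorithm in Python (ours, in the cycle-popping formulation; the adjacency-list format is an arbitrary choice). Overwriting the exit of each vertex during the walk performs the loop erasure implicitly, since only the last exit survives:

```python
import random

def wilson(adj, root=0):
    """Uniform random spanning tree of a connected non-oriented graph, given by
    adjacency lists; returns the list of (vertex, parent) edges, rooted at `root`."""
    in_tree = {root}
    parent = {}
    for start in sorted(adj):
        u = start
        while u not in in_tree:                 # random walk until the tree is hit;
            parent[u] = random.choice(adj[u])   # overwriting the exit of u at each
            u = parent[u]                       # visit erases the loops
        u = start
        while u not in in_tree:                 # retrace the loop-erased path
            in_tree.add(u)
            u = parent[u]
    return sorted(parent.items())

adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # the 4-cycle: 4 spanning trees
print(wilson(adj))
```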

Markov matrix tree theorem. The Kirchhoff formula provides an exact and non-asymptotic formula for the invariant probability measure of a finite Markov chain (this is sometimes referred to as the Kirchhoff Markov matrix tree theorem). This is remarkable, and constitutes an alternative to the asymptotic formula

\[ \forall i,j\in V,\quad \mu_j=\left(\lim_{m\rightarrow\infty}\frac{1}{m}\sum_{k=0}^{m-1} e^{kQ}\right)_{i,j}. \]
