Back to basics – Multinomial law – Libres pensées d'un mathématicien ordinaire

This post is devoted to elementary combinatorial aspects of the multinomial law. We start with the Boltzmann picture, which provides a nice interpretation of both the multinomial coefficient, the number of terms in the multinomial formula, and the corresponding sum. By the way, if you really like Physics, you may also read the previous post about bosons an fermions.

Boltzmann gas. Let us consider a system of \( {n} \) distinguishable particles, each of them being in one of the \( {r} \) possible energy levels. Let us denote by \( {n_i} \) the number of particles being in level \( {i} \). The macroscopic state of the system is formed by the sequence of numbers \( {(n_1,\ldots,n_r)} \) and we have \( {n_1+\cdots+n_r=n} \).

Number of macroscopic states. The number of possible macroscopic states is

\[ \mathrm{card}\{(n_1,\ldots,n_r)\in\{0,\ldots,n\}^r:n_1+\cdots+n_r=n\} =\binom{n+r-1}{r-1} =\binom{n+r-1}{n}. \]

This counting corresponds also to the…

number of terms in the multinomial formula
\[ (a_1+\cdots+a_r)^n=\sum_{\substack{(n_1,\ldots,n_r)\in\{0,\ldots,n\}^r\\n_1+\cdots+n_r=n}} \frac{n!}{n_1!\cdots n_r!}a_1^{n_1}\cdots a_r^{n_r}; \]
number of possible results in the extraction with replacement of \( {n} \) balls from an urn containing \( {r} \) distinguishable balls;
number of ways to throw \( {n} \) indistinguishable balls into \( {r} \) distinguishable boxes.

To derive the formula given above, we may use for instance the last interpretation and align the \( {n} \) balls between two vertical sticks. We may then add \( {r-1} \) additional sticks in order to delimit the \( {r} \) boxes. There are \( {n+1} \) ways to position the first additional stick, \( {n+2} \) ways for the second additional stick, etc, and \( {n+r-1} \) ways to position the \( {(r-1)} \)-th additional stick. Since there are \( {(r-1)!} \) ways to permute these sticks, we obtain at the end \( {\frac{(n+1)\cdots(n+r-1)}{(r-1)!}=\binom{n+r-1}{r-1}=\binom{n+r-1}{n}} \), as expected.

Number of miscroscopic states. For a fixed macroscopic state \( {(n_1,\ldots,n_r)} \) with \( {0\leq n_1,\ldots,n_r\leq n} \) and \( {n_1+\cdots+n_r=n} \), the number of microscopic states corresponds to the degree of freedom of the system, and is denoted \( {\binom{n}{n_1,\ldots,n_r}} \). Since the particles are distinguishable, we have

\[ \binom{n}{n_1,\ldots,n_r} :=\mathrm{card}\left\{(b_1,\ldots,b_n)\in\{1,\ldots,r\}^n: \forall i\in\{1,\ldots,r\}, \mathrm{card}\{1\leq k\leq n:b_k=i\}=n_i \right\}. \]

There are \( {\binom{n}{n_1}} \) ways to select the particles which are in level \( {1} \), \( {\binom{n-n_1}{n_2}} \) ways to select the particles which are in level \( {2} \), etc, and thus, using the fact that \( {n_r=n-n_1-\cdots -n_{r-1}} \),

\[ \binom{n}{n_1,\ldots,n_r} =\binom{n}{n_1}\binom{n-n_1}{n_2}\cdots\binom{n-n_1-\cdots -n_{r-1}}{n_r} =\frac{n!}{n_1!\cdots n_r!}. \]

This is the multinomial coefficient! The total number of miscroscopic states can be obtained by summing over all macroscopic states the corresponding number of microscopic states, which gives, thanks to the multinomial formula:

\[ \sum_{\substack{(n_1,\ldots,n_r)\in\{0,\ldots,n\}^r\\n_1+\cdots+n_r=n}}\frac{n!}{n_1!\cdots n_r!}=(\underbrace{1+\cdots+1}_{r\mbox{ times}})^n=r^n. \]

This number is not a surprise: \( {n} \) distinguishable particles each in \( {r} \) possible levels.

Boltzmann-Shannon entropy. If we let \( {n\rightarrow\infty} \) in such a way that \( {n_i/n\rightarrow p_i} \) for every \( {1\leq i\leq r} \), then \( {p_1+\cdots+p_r=1} \) and thanks to the Stirling formula (\( {n!\sim\sqrt{2\pi n}(n/e)^n} \)), the asymptotic logarithmic degree of freedom per particle is

\[ H(p_1,\ldots,p_r):=\lim_{\substack{n\rightarrow\infty\\(n_1,\ldots,n_r)/r\rightarrow(p_1,\ldots,p_r)}} \frac{\ln\binom{n}{n_1,\ldots,n_r}}{n} =\sum_{i=1}^rp_i\ln\frac{1}{p_i}. \]

In other words, when \( {n\rightarrow\infty} \) and \( {(n_1/n,\ldots,n_r/n)\rightarrow(p_1,\ldots,p_r)} \), we have

\[ \binom{n}{n_1,\ldots,n_r} = e^{nH\left(p_1,\ldots,p_r\right)+o(1)}. \]

The quantity \( {H(p_1,\ldots,p_r)} \) is the Boltzmann-Shannon entropy of the discrete distribution \( {(p_1,\ldots,p_r)} \). In Shannon information theory, \( {H(p_1,\ldots,p_r)} \) is the mean information per symbol in a random message written in an alphabet of \( {r} \) symbols of frequencies \( {p_1,\ldots,p_r} \). Note that both Boltzmann and Shannon have a double n in their names, a sort of combinatorial signature! The function \( {H} \) defined on the simplex

\[ \{(p_1,\ldots,p_r)\in[0,1]^r:p_1+\cdots+p_r=1\} \]

is strictly concave since \( {t\in[0,1]\mapsto t\ln(t)} \) is strictly convex. It achieves its minimum \( {0} \) at the extremal points of the simplex (\( {p} \) is then a Dirac mass, and this corresponds to certainty), and its maximum \( {\ln(n)/n} \) at the center of the simplex (\( {p} \) is then the uniform law \( {p=(1/n,\ldots,1/n)} \), and this corresponds to maximum uncertainty).

Let us fix a real number \( {m\geq1} \). Over the discrete laws \( {(p_1,p_2,\ldots)} \) on the integers \( {\{1,2,\ldots\}} \) with mean \( {m} \), the entropy \( {H} \) is maximized by the geometric distribution of mean \( {m} \). To see it, one may use Lagrange multipliers with the constraints:

\[ \sum_ip_i=1, \sum_iip_i=m, 0\leq p_i\leq1 \]

which gives \( {\partial_iH(P)=-\ln(p_i)-1=\alpha+\beta i} \) leading to \( {p_i=e^{-(1+\alpha)}e^{-\beta i}} \).

The multinomial law from a dice. Let us consider a dice with \( {r} \) faces and let us denote by \( {p_1,\ldots,p_r} \) the probabilities of the faces. We throw this dice \( {n} \) times and we encode the results into a sequence \( {X_1,\ldots,X_n} \) of independent and identically distributed random variables taking values in \( {\{e_1,\ldots,e_r\}} \) where \( {e_1,\ldots,e_r} \) is the canonical basis of \( {\mathbb{R}^r} \), with law \( {(p_1,\ldots,p_r)} \) meaning that \( {\mathbb{P}(X_k=e_i)=p_i} \) for every \( {1\leq i\leq r} \) and every \( {1\leq k\leq n} \). The whole result of the \( {n} \) throws is summarized by \( {S:=X_1+\cdots+X_n} \) which is a random variable taking its values in

\[ \{(n_1,\ldots,n_r)\in\{0,\ldots,n\}^r:n_1+\cdots+n_r=n\}\subset\mathbb{R}^r. \]

The law of \( {S} \) is given by counting (microstates of a macrostate: we have to place the \( {n_1} \) faces of type \( {1} \), \( {n_2} \) faces of type \( {2} \), etc, among the \( {n} \) trials)

\[ \begin{array}{rcl} \mathbb{P}(S=(n_1,\ldots,n_r)) &=& \sum_{\substack{(b_1,\ldots,b_n)\in\{1,\ldots,r\}^n\\\forall i,\mathrm{card}\{k:b_k=i\}=n_i}} \mathbb{P}(X_1=b_1,\ldots,X_n=b_n)\\ &=&\sum_{\substack{(b_1,\ldots,b_n)\in\{1,\ldots,r\}^n\\\forall i,\mathrm{card}\{k:b_k=i\}=n_i}} p_1^{n_1}\cdots p_r^{n_r}\\ &=&p_1^{n_1}\cdots p_r^{n_r}\sum_{\substack{(b_1,\ldots,b_n)\in\{1,\ldots,r\}^n\\\forall i,\mathrm{card}\{k:b_k=i\}=n_i}}1\\ &=&p_1^{n_1}\cdots p_r^{n_r}\binom{n}{n_1}\binom{n-n_1}{n_2}\cdots\binom{n-(n_1+\cdots+n_{r-1})}{n_r}\\ &=&p_1^{n_1}\cdots p_r^{n_r}\frac{n!}{n_1!\cdots n_r!}. \end{array} \]

This is the multinomial law of size \( {n} \) and parameter \( {(p_1,\ldots,p_r)} \). In summary:

\[ \mathrm{Law}(S) =\left(\sum_{i=1}^rp_i\delta_{e_i}\right)^{*n} =\sum_{\substack{(n_1,\ldots,n_r)\in\{0,\ldots,n\}^r\\n_1+\cdots+n_r=n}} \frac{n!}{n_1!\cdots n_r!}p_1^{n_1}\cdots p_r^{n_r}\delta_{(n_1,\ldots,n_r)}. \]

This is the multinomial formula in the commutative algebra of Dirac masses equipped with the addition of measures and with the convolution product.

To connect the dice picture with the Boltzmann picture, take the correspondence \( {n} \) dice throws = \( {n} \) particles, and \( {r} \) faces = \( {r} \) energy levels.

Multinomial law from hypergeometric law. Consider an urn containing \( {N=N_1+\cdots+N_r} \) balls with \( {N_1} \) indistinguishable balls of color \( {1} \), \( {N_2} \) indistinguishable balls of color \( {2} \), etc (the colors are different). We extract \( {n} \) balls from this urn in a single step. There are \( {\binom{N}{n}} \) ways to do this. Now for every \( {0\leq n_1,\ldots,n_r\leq n} \) with \( {n_1+\cdots+n_r=n} \), there are exactly \( {\binom{N_1}{n_1}\cdots\binom{N_r}{n_r}} \) ways to extract \( {n} \) balls in a single step with \( {n_1} \) balls of color \( {1} \), \( {n_2} \) balls of color \( {2} \), etc. The corresponding probability coming from the uniform distribution on all extractions is then

\[ \frac{\binom{N_1}{n_1}\cdots\binom{N_r}{n_r}}{\binom{N}{n}}. \]

Note that we recover for free the Vandermonde binomial convolution formula:

\[ \sum_{\substack{0\leq n_1\leq N_1,\ldots,0\leq n_r\leq N_r\\n_1+\cdots+n_r=n}} \binom{N_1}{n_1}\cdots\binom{N_r}{n_r} =\binom{N}{n}. \]

Suppose now that \( {N\rightarrow\infty} \) in such a way that \( {N_i/N\rightarrow p_i} \) for every \( {1\leq i\leq r} \). We have then \( {p_1+\cdots+p_r=1} \), and, according to the Stirling formula (\( {n!\sim\sqrt{2\pi n}(n/e)^n} \)),

\[ \lim_{\substack{N\rightarrow\infty\\(N_1,\ldots,N_r)/N\rightarrow (p_1,\ldots,p_r)}} \frac{\binom{N_1}{n_1}\cdots\binom{N_r}{n_r}}{\binom{N}{n}} =\frac{n!}{n_1!\cdots n_r!}p_1^{n_1}\cdots p_r^{n_r}. \]

Note that we recover for free the multinomial formula by summing over

\[ \{(n_1,\ldots,n_r)\in\{0,\ldots,n\}^r:n_1+\cdots+n_r=n\}. \]

This is indeed the multinomial law of size \( {n} \) and parameter \( {(p_1,\ldots,p_r)} \). The moral is that a sample of size \( {n} \) without replacement in a giant urn of size \( {N\gg n} \) is equivalent to \( {n} \) independent samples with replacement. This is remarkable: it allows to deduce a non-uniform law (multinomial) from a uniform law (urn sampling).

Multinomial law from Poisson laws. Let \( {X_1,\ldots,X_r} \) be independent Poisson random variables with mean \( {\lambda_1,\ldots,\lambda_r} \) respectively. Then, for every integer \( {n\geq0} \), conditional on the event \( {\{X_1+\cdots+X_r=n\}} \), the random vector \( {X} \) follows the multinomial law of size \( {n} \) and parameter \( {(p_1,\ldots,p_r)} \) where

\[ p_i :=\frac{\lambda_i}{\lambda_1+\cdots+\lambda_r}. \]

Namely, since \( {S_r:=X_1+\cdots+X_r} \) is Poisson with mean \( {\lambda:=\lambda_1+\cdots+\lambda_r} \), we have \( {\mathbb{P}(S_r=n)=e^{-\lambda}\frac{\lambda^n}{n!}} \) and thus for any \( {0\leq n_1,\ldots,n_r\leq k} \) such that \( {n_1+\cdots+n_r=k} \),

\[ \mathbb{P}(X_1=n_1,\ldots,X_r=n_r|S_r=n) =\frac{\mathbb{P}(X_1=n_1,\ldots,X_r=n_r)}{\mathbb{P}(S_r=n)} =\cdots=\frac{n!}{n_1!\cdots{}n_r!}p_1^{n_1}\cdots p_r^{n_r}. \]

This property is the analogue of a very similar property for Dirichlet laws (one has to replace Poisson by Gamma laws). It allows also to relate Galton-Watson and Fisher-Wright processes.

Stability of multinomial by grouping. If \( {X} \) is a random vector of \( {\mathbb{N}^r} \) following the multinomial law of size \( {n} \) and parameter \( {(p_1,\ldots,p_r)} \) then for every partition \( {\{1,\ldots,r\}=I_1\cup\cdots\cup I_d} \) the random vector

\[ \left(\sum_{i\in I_1}X_i,\ldots,\sum_{i\in I_d}X_i\right) \]

of \( {\mathbb{N}^d} \) follows the multinomial law of size \( {n} \) and parameter

\[ \left(\sum_{i\in I_1}p_i,\ldots,\sum_{i\in I_d}p_i\right). \]

This property is the analogue of a very similar property for Dirichlet laws.

Multinomial and binomial. If \( {X} \) is a random vector of \( {\mathbb{N}^r} \) following the multinomial law of size \( {n} \) and parameter \( {(p_1,\ldots,p_r)} \) then the components \( {X_1,\ldots,X_r} \) of \( {X} \) follow a binomial law of size \( {n} \) and respective parameter \( {p_1,\ldots,p_d} \) (they are not independent, and their sum is \( {n} \)). For any non-empty \( {I\subset\{1,\ldots,r\}} \), the random variable \( {\sum_{i\in I}X_i} \) follow a binomial law of size \( {n} \) and parameter \( {\sum_{i\in I}p_i} \). This property is the analogue of a very similar property for Dirichlet laws (one has to replace binomial by Beta laws).

Back to basics – Multinomial law

Some other posts: