This post is devoted to elementary combinatorial aspects of the multinomial law. We start with the Boltzmann picture, which provides a nice interpretation of both the multinomial coefficient, the number of terms in the multinomial formula, and the corresponding sum. By the way, if you really like Physics, you may also read the previous post about bosons an fermions.
Boltzmann gas. Let us consider a system of n distinguishable particles, each of them being in one of the r possible energy levels. Let us denote by ni the number of particles being in level i. The macroscopic state of the system is formed by the sequence of numbers (n1,…,nr) and we have n1+⋯+nr=n.
Number of macroscopic states. The number of possible macroscopic states is
card{(n1,…,nr)∈{0,…,n}r:n1+⋯+nr=n}=(n+r−1r−1)=(n+r−1n).
This counting corresponds also to the...
- number of terms in the multinomial formula
(a1+⋯+ar)n=∑(n1,…,nr)∈{0,…,n}rn1+⋯+nr=nn!n1!⋯nr!an11⋯anrr;
- number of possible results in the extraction with replacement of n balls from an urn containing r distinguishable balls;
- number of ways to throw n indistinguishable balls into r distinguishable boxes.
To derive the formula given above, we may use for instance the last interpretation and align the n balls between two vertical sticks. We may then add r−1 additional sticks in order to delimit the r boxes. There are n+1 ways to position the first additional stick, n+2 ways for the second additional stick, etc, and n+r−1 ways to position the (r−1)-th additional stick. Since there are (r−1)! ways to permute these sticks, we obtain at the end (n+1)⋯(n+r−1)(r−1)!=(n+r−1r−1)=(n+r−1n), as expected.
Number of miscroscopic states. For a fixed macroscopic state (n1,…,nr) with 0≤n1,…,nr≤n and n1+⋯+nr=n, the number of microscopic states corresponds to the degree of freedom of the system, and is denoted (nn1,…,nr). Since the particles are distinguishable, we have
(nn1,…,nr):=card{(b1,…,bn)∈{1,…,r}n:∀i∈{1,…,r},card{1≤k≤n:bk=i}=ni}.
There are (nn1) ways to select the particles which are in level 1, (n−n1n2) ways to select the particles which are in level 2, etc, and thus, using the fact that nr=n−n1−⋯−nr−1,
(nn1,…,nr)=(nn1)(n−n1n2)⋯(n−n1−⋯−nr−1nr)=n!n1!⋯nr!.
This is the multinomial coefficient! The total number of miscroscopic states can be obtained by summing over all macroscopic states the corresponding number of microscopic states, which gives, thanks to the multinomial formula:
∑(n1,…,nr)∈{0,…,n}rn1+⋯+nr=nn!n1!⋯nr!=(1+⋯+1⏟r times)n=rn.
This number is not a surprise: n distinguishable particles each in r possible levels.
Boltzmann-Shannon entropy. If we let n→∞ in such a way that ni/n→pi for every 1≤i≤r, then p1+⋯+pr=1 and thanks to the Stirling formula (n!∼√2πn(n/e)n), the asymptotic logarithmic degree of freedom per particle is
H(p1,…,pr):=limn→∞(n1,…,nr)/r→(p1,…,pr)ln(nn1,…,nr)n=r∑i=1piln1pi.
In other words, when n→∞ and (n1/n,…,nr/n)→(p1,…,pr), we have
(nn1,…,nr)=enH(p1,…,pr)+o(1).
The quantity H(p1,…,pr) is the Boltzmann-Shannon entropy of the discrete distribution (p1,…,pr). In Shannon information theory, H(p1,…,pr) is the mean information per symbol in a random message written in an alphabet of r symbols of frequencies p1,…,pr. Note that both Boltzmann and Shannon have a double n in their names, a sort of combinatorial signature! The function H defined on the simplex
{(p1,…,pr)∈[0,1]r:p1+⋯+pr=1}
is strictly concave since t∈[0,1]↦tln(t) is strictly convex. It achieves its minimum 0 at the extremal points of the simplex (p is then a Dirac mass, and this corresponds to certainty), and its maximum ln(n)/n at the center of the simplex (p is then the uniform law p=(1/n,…,1/n), and this corresponds to maximum uncertainty).
Let us fix a real number m≥1. Over the discrete laws (p1,p2,…) on the integers {1,2,…} with mean m, the entropy H is maximized by the geometric distribution of mean m. To see it, one may use Lagrange multipliers with the constraints:
∑ipi=1,∑iipi=m,0≤pi≤1
which gives ∂iH(P)=−ln(pi)−1=α+βi leading to pi=e−(1+α)e−βi.
The multinomial law from a dice. Let us consider a dice with r faces and let us denote by p1,…,pr the probabilities of the faces. We throw this dice n times and we encode the results into a sequence X1,…,Xn of independent and identically distributed random variables taking values in {e1,…,er} where e1,…,er is the canonical basis of Rr, with law (p1,…,pr) meaning that P(Xk=ei)=pi for every 1≤i≤r and every 1≤k≤n. The whole result of the n throws is summarized by S:=X1+⋯+Xn which is a random variable taking its values in
{(n1,…,nr)∈{0,…,n}r:n1+⋯+nr=n}⊂Rr.
The law of S is given by counting (microstates of a macrostate: we have to place the n1 faces of type 1, n2 faces of type 2, etc, among the n trials)
P(S=(n1,…,nr))=∑(b1,…,bn)∈{1,…,r}n∀i,card{k:bk=i}=niP(X1=b1,…,Xn=bn)=∑(b1,…,bn)∈{1,…,r}n∀i,card{k:bk=i}=nipn11⋯pnrr=pn11⋯pnrr∑(b1,…,bn)∈{1,…,r}n∀i,card{k:bk=i}=ni1=pn11⋯pnrr(nn1)(n−n1n2)⋯(n−(n1+⋯+nr−1)nr)=pn11⋯pnrrn!n1!⋯nr!.
This is the multinomial law of size n and parameter (p1,…,pr). In summary:
Law(S)=(r∑i=1piδei)∗n=∑(n1,…,nr)∈{0,…,n}rn1+⋯+nr=nn!n1!⋯nr!pn11⋯pnrrδ(n1,…,nr).
This is the multinomial formula in the commutative algebra of Dirac masses equipped with the addition of measures and with the convolution product.
To connect the dice picture with the Boltzmann picture, take the correspondence n dice throws = n particles, and r faces = r energy levels.
Multinomial law from hypergeometric law. Consider an urn containing N=N1+⋯+Nr balls with N1 indistinguishable balls of color 1, N2 indistinguishable balls of color 2, etc (the colors are different). We extract n balls from this urn in a single step. There are (Nn) ways to do this. Now for every 0≤n1,…,nr≤n with n1+⋯+nr=n, there are exactly (N1n1)⋯(Nrnr) ways to extract n balls in a single step with n1 balls of color 1, n2 balls of color 2, etc. The corresponding probability coming from the uniform distribution on all extractions is then
(N1n1)⋯(Nrnr)(Nn).
Note that we recover for free the Vandermonde binomial convolution formula:
∑0≤n1≤N1,…,0≤nr≤Nrn1+⋯+nr=n(N1n1)⋯(Nrnr)=(Nn).
Suppose now that N→∞ in such a way that Ni/N→pi for every 1≤i≤r. We have then p1+⋯+pr=1, and, according to the Stirling formula (n!∼√2πn(n/e)n),
limN→∞(N1,…,Nr)/N→(p1,…,pr)(N1n1)⋯(Nrnr)(Nn)=n!n1!⋯nr!pn11⋯pnrr.
Note that we recover for free the multinomial formula by summing over
{(n1,…,nr)∈{0,…,n}r:n1+⋯+nr=n}.
This is indeed the multinomial law of size n and parameter (p1,…,pr). The moral is that a sample of size n without replacement in a giant urn of size N≫n is equivalent to n independent samples with replacement. This is remarkable: it allows to deduce a non-uniform law (multinomial) from a uniform law (urn sampling).
Multinomial law from Poisson laws. Let X1,…,Xr be independent Poisson random variables with mean λ1,…,λr respectively. Then, for every integer n≥0, conditional on the event {X1+⋯+Xr=n}, the random vector X follows the multinomial law of size n and parameter (p1,…,pr) where
pi:=λiλ1+⋯+λr.
Namely, since Sr:=X1+⋯+Xr is Poisson with mean λ:=λ1+⋯+λr, we have P(Sr=n)=e−λλnn! and thus for any 0≤n1,…,nr≤k such that n1+⋯+nr=k,
P(X1=n1,…,Xr=nr|Sr=n)=P(X1=n1,…,Xr=nr)P(Sr=n)=⋯=n!n1!⋯nr!pn11⋯pnrr.
This property is the analogue of a very similar property for Dirichlet laws (one has to replace Poisson by Gamma laws). It allows also to relate Galton-Watson and Fisher-Wright processes.
Stability of multinomial by grouping. If X is a random vector of Nr following the multinomial law of size n and parameter (p1,…,pr) then for every partition {1,…,r}=I1∪⋯∪Id the random vector
(∑i∈I1Xi,…,∑i∈IdXi)
of Nd follows the multinomial law of size n and parameter
(∑i∈I1pi,…,∑i∈Idpi).
This property is the analogue of a very similar property for Dirichlet laws.
Multinomial and binomial. If X is a random vector of Nr following the multinomial law of size n and parameter (p1,…,pr) then the components X1,…,Xr of X follow a binomial law of size n and respective parameter p1,…,pd (they are not independent, and their sum is n). For any non-empty I⊂{1,…,r}, the random variable ∑i∈IXi follow a binomial law of size n and parameter ∑i∈Ipi. This property is the analogue of a very similar property for Dirichlet laws (one has to replace binomial by Beta laws).