Processing math: 100%
Press "Enter" to skip to content

Back to basics - Multinomial law

Dice

This post is devoted to elementary combinatorial aspects of the multinomial law. We start with the Boltzmann picture, which provides a nice interpretation of both the multinomial coefficient, the number of terms in the multinomial formula, and the corresponding sum. By the way, if you really like Physics, you may also read the previous post about bosons an fermions.

Boltzmann gas. Let us consider a system of n distinguishable particles, each of them being in one of the r possible energy levels. Let us denote by ni the number of particles being in level i. The macroscopic state of the system is formed by the sequence of numbers (n1,,nr) and we have n1++nr=n.

Number of macroscopic states. The number of possible macroscopic states is

card{(n1,,nr){0,,n}r:n1++nr=n}=(n+r1r1)=(n+r1n).

This counting corresponds also to the...

  1. number of terms in the multinomial formula

    (a1++ar)n=(n1,,nr){0,,n}rn1++nr=nn!n1!nr!an11anrr;

  2. number of possible results in the extraction with replacement of n balls from an urn containing r distinguishable balls;
  3. number of ways to throw n indistinguishable balls into r distinguishable boxes.

To derive the formula given above, we may use for instance the last interpretation and align the n balls between two vertical sticks. We may then add r1 additional sticks in order to delimit the r boxes. There are n+1 ways to position the first additional stick, n+2 ways for the second additional stick, etc, and n+r1 ways to position the (r1)-th additional stick. Since there are (r1)! ways to permute these sticks, we obtain at the end (n+1)(n+r1)(r1)!=(n+r1r1)=(n+r1n), as expected.

Number of miscroscopic states. For a fixed macroscopic state (n1,,nr) with 0n1,,nrn and n1++nr=n, the number of microscopic states corresponds to the degree of freedom of the system, and is denoted (nn1,,nr). Since the particles are distinguishable, we have

(nn1,,nr):=card{(b1,,bn){1,,r}n:i{1,,r},card{1kn:bk=i}=ni}.

There are (nn1) ways to select the particles which are in level 1, (nn1n2) ways to select the particles which are in level 2, etc, and thus, using the fact that nr=nn1nr1,

(nn1,,nr)=(nn1)(nn1n2)(nn1nr1nr)=n!n1!nr!.

This is the multinomial coefficient! The total number of miscroscopic states can be obtained by summing over all macroscopic states the corresponding number of microscopic states, which gives, thanks to the multinomial formula:

(n1,,nr){0,,n}rn1++nr=nn!n1!nr!=(1++1r times)n=rn.

This number is not a surprise: n distinguishable particles each in r possible levels.

Boltzmann-Shannon entropy. If we let n in such a way that ni/npi for every 1ir, then p1++pr=1 and thanks to the Stirling formula (n!2πn(n/e)n), the asymptotic logarithmic degree of freedom per particle is

H(p1,,pr):=limn(n1,,nr)/r(p1,,pr)ln(nn1,,nr)n=ri=1piln1pi.

In other words, when n and (n1/n,,nr/n)(p1,,pr), we have

(nn1,,nr)=enH(p1,,pr)+o(1).

The quantity H(p1,,pr) is the Boltzmann-Shannon entropy of the discrete distribution (p1,,pr). In Shannon information theory, H(p1,,pr) is the mean information per symbol in a random message written in an alphabet of r symbols of frequencies p1,,pr. Note that both Boltzmann and Shannon have a double n in their names, a sort of combinatorial signature! The function H defined on the simplex

{(p1,,pr)[0,1]r:p1++pr=1}

is strictly concave since t[0,1]tln(t) is strictly convex. It achieves its minimum 0 at the extremal points of the simplex (p is then a Dirac mass, and this corresponds to certainty), and its maximum ln(n)/n at the center of the simplex (p is then the uniform law p=(1/n,,1/n), and this corresponds to maximum uncertainty).

Let us fix a real number m1. Over the discrete laws (p1,p2,) on the integers {1,2,} with mean m, the entropy H is maximized by the geometric distribution of mean m. To see it, one may use Lagrange multipliers with the constraints:

ipi=1,iipi=m,0pi1

which gives iH(P)=ln(pi)1=α+βi leading to pi=e(1+α)eβi.

The multinomial law from a dice. Let us consider a dice with r faces and let us denote by p1,,pr the probabilities of the faces. We throw this dice n times and we encode the results into a sequence X1,,Xn of independent and identically distributed random variables taking values in {e1,,er} where e1,,er is the canonical basis of Rr, with law (p1,,pr) meaning that P(Xk=ei)=pi for every 1ir and every 1kn. The whole result of the n throws is summarized by S:=X1++Xn which is a random variable taking its values in

{(n1,,nr){0,,n}r:n1++nr=n}Rr.

The law of S is given by counting (microstates of a macrostate: we have to place the n1 faces of type 1, n2 faces of type 2, etc, among the n trials)

P(S=(n1,,nr))=(b1,,bn){1,,r}ni,card{k:bk=i}=niP(X1=b1,,Xn=bn)=(b1,,bn){1,,r}ni,card{k:bk=i}=nipn11pnrr=pn11pnrr(b1,,bn){1,,r}ni,card{k:bk=i}=ni1=pn11pnrr(nn1)(nn1n2)(n(n1++nr1)nr)=pn11pnrrn!n1!nr!.

This is the multinomial law of size n and parameter (p1,,pr). In summary:

Law(S)=(ri=1piδei)n=(n1,,nr){0,,n}rn1++nr=nn!n1!nr!pn11pnrrδ(n1,,nr).

This is the multinomial formula in the commutative algebra of Dirac masses equipped with the addition of measures and with the convolution product.

To connect the dice picture with the Boltzmann picture, take the correspondence n dice throws = n particles, and r faces = r energy levels.

Multinomial law from hypergeometric law. Consider an urn containing N=N1++Nr balls with N1 indistinguishable balls of color 1, N2 indistinguishable balls of color 2, etc (the colors are different). We extract n balls from this urn in a single step. There are (Nn) ways to do this. Now for every 0n1,,nrn with n1++nr=n, there are exactly (N1n1)(Nrnr) ways to extract n balls in a single step with n1 balls of color 1, n2 balls of color 2, etc. The corresponding probability coming from the uniform distribution on all extractions is then

(N1n1)(Nrnr)(Nn).

Note that we recover for free the Vandermonde binomial convolution formula:

0n1N1,,0nrNrn1++nr=n(N1n1)(Nrnr)=(Nn).

Suppose now that N in such a way that Ni/Npi for every 1ir. We have then p1++pr=1, and, according to the Stirling formula (n!2πn(n/e)n),

limN(N1,,Nr)/N(p1,,pr)(N1n1)(Nrnr)(Nn)=n!n1!nr!pn11pnrr.

Note that we recover for free the multinomial formula by summing over

{(n1,,nr){0,,n}r:n1++nr=n}.

This is indeed the multinomial law of size n and parameter (p1,,pr). The moral is that a sample of size n without replacement in a giant urn of size Nn is equivalent to n independent samples with replacement. This is remarkable: it allows to deduce a non-uniform law (multinomial) from a uniform law (urn sampling).

Multinomial law from Poisson laws. Let X1,,Xr be independent Poisson random variables with mean λ1,,λr respectively. Then, for every integer n0, conditional on the event {X1++Xr=n}, the random vector X follows the multinomial law of size n and parameter (p1,,pr) where

pi:=λiλ1++λr.

Namely, since Sr:=X1++Xr is Poisson with mean λ:=λ1++λr, we have P(Sr=n)=eλλnn! and thus for any 0n1,,nrk such that n1++nr=k,

P(X1=n1,,Xr=nr|Sr=n)=P(X1=n1,,Xr=nr)P(Sr=n)==n!n1!nr!pn11pnrr.

This property is the analogue of a very similar property for Dirichlet laws (one has to replace Poisson by Gamma laws). It allows also to relate Galton-Watson and Fisher-Wright processes.

Stability of multinomial by grouping. If X is a random vector of Nr following the multinomial law of size n and parameter (p1,,pr) then for every partition {1,,r}=I1Id the random vector

(iI1Xi,,iIdXi)

of Nd follows the multinomial law of size n and parameter

(iI1pi,,iIdpi).

This property is the analogue of a very similar property for Dirichlet laws.

Multinomial and binomial. If X is a random vector of Nr following the multinomial law of size n and parameter (p1,,pr) then the components X1,,Xr of X follow a binomial law of size n and respective parameter p1,,pd (they are not independent, and their sum is n). For any non-empty I{1,,r}, the random variable iIXi follow a binomial law of size n and parameter iIpi. This property is the analogue of a very similar property for Dirichlet laws (one has to replace binomial by Beta laws).

    Leave a Reply

    Your email address will not be published.

    This site uses Akismet to reduce spam. Learn how your comment data is processed.

    Syntax · Style · .