Month: May 2013

This post is devoted to elementary combinatorial aspects of the multinomial law. We start with the Boltzmann picture, which provides a nice interpretation of both the multinomial coefficient, the number of terms in the multinomial formula, and the corresponding sum. By the way, if you really like Physics, you may also read the previous post about bosons an fermions.

Boltzmann gas. Let us consider a system of ${n}$ distinguishable particles, each of them being in one of the ${r}$ possible energy levels. Let us denote by ${n_i}$ the number of particles being in level ${i}$. The macroscopic state of the system is formed by the sequence of numbers ${(n_1,\ldots,n_r)}$ and we have ${n_1+\cdots+n_r=n}$.

Number of macroscopic states. The number of possible macroscopic states is

$\mathrm{card}\{(n_1,\ldots,n_r)\in\{0,\ldots,n\}^r:n_1+\cdots+n_r=n\} =\binom{n+r-1}{r-1} =\binom{n+r-1}{n}.$

This counting corresponds also to the…

1. number of terms in the multinomial formula

$(a_1+\cdots+a_r)^n=\sum_{\substack{(n_1,\ldots,n_r)\in\{0,\ldots,n\}^r\\n_1+\cdots+n_r=n}} \frac{n!}{n_1!\cdots n_r!}a_1^{n_1}\cdots a_r^{n_r};$

2. number of possible results in the extraction with replacement of ${n}$ balls from an urn containing ${r}$ distinguishable balls;
3. number of ways to throw ${n}$ indistinguishable balls into ${r}$ distinguishable boxes.

To derive the formula given above, we may use for instance the last interpretation and align the ${n}$ balls between two vertical sticks. We may then add ${r-1}$ additional sticks in order to delimit the ${r}$ boxes. There are ${n+1}$ ways to position the first additional stick, ${n+2}$ ways for the second additional stick, etc, and ${n+r-1}$ ways to position the ${(r-1)}$-th additional stick. Since there are ${(r-1)!}$ ways to permute these sticks, we obtain at the end ${\frac{(n+1)\cdots(n+r-1)}{(r-1)!}=\binom{n+r-1}{r-1}=\binom{n+r-1}{n}}$, as expected.

Number of miscroscopic states. For a fixed macroscopic state ${(n_1,\ldots,n_r)}$ with ${0\leq n_1,\ldots,n_r\leq n}$ and ${n_1+\cdots+n_r=n}$, the number of microscopic states corresponds to the degree of freedom of the system, and is denoted ${\binom{n}{n_1,\ldots,n_r}}$. Since the particles are distinguishable, we have

$\binom{n}{n_1,\ldots,n_r} :=\mathrm{card}\left\{(b_1,\ldots,b_n)\in\{1,\ldots,r\}^n: \forall i\in\{1,\ldots,r\}, \mathrm{card}\{1\leq k\leq n:b_k=i\}=n_i \right\}.$

There are ${\binom{n}{n_1}}$ ways to select the particles which are in level ${1}$, ${\binom{n-n_1}{n_2}}$ ways to select the particles which are in level ${2}$, etc, and thus, using the fact that ${n_r=n-n_1-\cdots -n_{r-1}}$,

$\binom{n}{n_1,\ldots,n_r} =\binom{n}{n_1}\binom{n-n_1}{n_2}\cdots\binom{n-n_1-\cdots -n_{r-1}}{n_r} =\frac{n!}{n_1!\cdots n_r!}.$

This is the multinomial coefficient! The total number of miscroscopic states can be obtained by summing over all macroscopic states the corresponding number of microscopic states, which gives, thanks to the multinomial formula:

$\sum_{\substack{(n_1,\ldots,n_r)\in\{0,\ldots,n\}^r\\n_1+\cdots+n_r=n}}\frac{n!}{n_1!\cdots n_r!}=(\underbrace{1+\cdots+1}_{r\mbox{ times}})^n=r^n.$

This number is not a surprise: ${n}$ distinguishable particles each in ${r}$ possible levels.

Boltzmann-Shannon entropy. If we let ${n\rightarrow\infty}$ in such a way that ${n_i/n\rightarrow p_i}$ for every ${1\leq i\leq r}$, then ${p_1+\cdots+p_r=1}$ and thanks to the Stirling formula (${n!\sim\sqrt{2\pi n}(n/e)^n}$), the asymptotic logarithmic degree of freedom per particle is

$H(p_1,\ldots,p_r):=\lim_{\substack{n\rightarrow\infty\$n_1,\ldots,n_r)/r\rightarrow(p_1,\ldots,p_r)}} \frac{\ln\binom{n}{n_1,\ldots,n_r}}{n} =\sum_{i=1}^rp_i\ln\frac{1}{p_i}.$ In other words, when \( {n\rightarrow\infty}$ and ${(n_1/n,\ldots,n_r/n)\rightarrow(p_1,\ldots,p_r)}$, we have

$\binom{n}{n_1,\ldots,n_r} = e^{nH\left(p_1,\ldots,p_r\right)+o(1)}.$

The quantity ${H(p_1,\ldots,p_r)}$ is the Boltzmann-Shannon entropy of the discrete distribution ${(p_1,\ldots,p_r)}$. In Shannon information theory, ${H(p_1,\ldots,p_r)}$ is the mean information per symbol in a random message written in an alphabet of ${r}$ symbols of frequencies ${p_1,\ldots,p_r}$. Note that both Boltzmann and Shannon have a double n in their names, a sort of combinatorial signature! The function ${H}$ defined on the simplex

$\{(p_1,\ldots,p_r)\in[0,1]^r:p_1+\cdots+p_r=1\}$

is strictly concave since ${t\in[0,1]\mapsto t\ln(t)}$ is strictly convex. It achieves its minimum ${0}$ at the extremal points of the simplex (${p}$ is then a Dirac mass, and this corresponds to certainty), and its maximum ${\ln(n)/n}$ at the center of the simplex (${p}$ is then the uniform law ${p=(1/n,\ldots,1/n)}$, and this corresponds to maximum uncertainty).

Let us fix a real number ${m\geq1}$. Over the discrete laws ${(p_1,p_2,\ldots)}$ on the integers ${\{1,2,\ldots\}}$ with mean ${m}$, the entropy ${H}$ is maximized by the geometric distribution of mean ${m}$. To see it, one may use Lagrange multipliers with the constraints:

$\sum_ip_i=1, \sum_iip_i=m, 0\leq p_i\leq1$

which gives ${\partial_iH(P)=-\ln(p_i)-1=\alpha+\beta i}$ leading to ${p_i=e^{-(1+\alpha)}e^{-\beta i}}$.

The multinomial law from a dice. Let us consider a dice with ${r}$ faces and let us denote by ${p_1,\ldots,p_r}$ the probabilities of the faces. We throw this dice ${n}$ times and we encode the results into a sequence ${X_1,\ldots,X_n}$ of independent and identically distributed random variables taking values in ${\{e_1,\ldots,e_r\}}$ where ${e_1,\ldots,e_r}$ is the canonical basis of ${\mathbb{R}^r}$, with law ${(p_1,\ldots,p_r)}$ meaning that ${\mathbb{P}(X_k=e_i)=p_i}$ for every ${1\leq i\leq r}$ and every ${1\leq k\leq n}$. The whole result of the ${n}$ throws is summarized by ${S:=X_1+\cdots+X_n}$ which is a random variable taking its values in

$\{(n_1,\ldots,n_r)\in\{0,\ldots,n\}^r:n_1+\cdots+n_r=n\}\subset\mathbb{R}^r.$

The law of ${S}$ is given by counting (microstates of a macrostate: we have to place the ${n_1}$ faces of type ${1}$, ${n_2}$ faces of type ${2}$, etc, among the ${n}$ trials)

$\begin{array}{rcl} \mathbb{P}(S=(n_1,\ldots,n_r)) &=& \sum_{\substack{(b_1,\ldots,b_n)\in\{1,\ldots,r\}^n\\\forall i,\mathrm{card}\{k:b_k=i\}=n_i}} \mathbb{P}(X_1=b_1,\ldots,X_n=b_n)\\ &=&\sum_{\substack{(b_1,\ldots,b_n)\in\{1,\ldots,r\}^n\\\forall i,\mathrm{card}\{k:b_k=i\}=n_i}} p_1^{n_1}\cdots p_r^{n_r}\\ &=&p_1^{n_1}\cdots p_r^{n_r}\sum_{\substack{(b_1,\ldots,b_n)\in\{1,\ldots,r\}^n\\\forall i,\mathrm{card}\{k:b_k=i\}=n_i}}1\\ &=&p_1^{n_1}\cdots p_r^{n_r}\binom{n}{n_1}\binom{n-n_1}{n_2}\cdots\binom{n-(n_1+\cdots+n_{r-1})}{n_r}\\ &=&p_1^{n_1}\cdots p_r^{n_r}\frac{n!}{n_1!\cdots n_r!}. \end{array}$

This is the multinomial law of size ${n}$ and parameter ${(p_1,\ldots,p_r)}$. In summary:

$\mathrm{Law}(S) =\left(\sum_{i=1}^rp_i\delta_{e_i}\right)^{*n} =\sum_{\substack{(n_1,\ldots,n_r)\in\{0,\ldots,n\}^r\\n_1+\cdots+n_r=n}} \frac{n!}{n_1!\cdots n_r!}p_1^{n_1}\cdots p_r^{n_r}\delta_{(n_1,\ldots,n_r)}.$

This is the multinomial formula in the commutative algebra of Dirac masses equipped with the addition of measures and with the convolution product.

To connect the dice picture with the Boltzmann picture, take the correspondence ${n}$ dice throws = ${n}$ particles, and ${r}$ faces = ${r}$ energy levels.

Multinomial law from hypergeometric law. Consider an urn containing ${N=N_1+\cdots+N_r}$ balls with ${N_1}$ indistinguishable balls of color ${1}$, ${N_2}$ indistinguishable balls of color ${2}$, etc (the colors are different). We extract ${n}$ balls from this urn in a single step. There are ${\binom{N}{n}}$ ways to do this. Now for every ${0\leq n_1,\ldots,n_r\leq n}$ with ${n_1+\cdots+n_r=n}$, there are exactly ${\binom{N_1}{n_1}\cdots\binom{N_r}{n_r}}$ ways to extract ${n}$ balls in a single step with ${n_1}$ balls of color ${1}$, ${n_2}$ balls of color ${2}$, etc. The corresponding probability coming from the uniform distribution on all extractions is then

$\frac{\binom{N_1}{n_1}\cdots\binom{N_r}{n_r}}{\binom{N}{n}}.$

$\sum_{\substack{0\leq n_1\leq N_1,\ldots,0\leq n_r\leq N_r\\n_1+\cdots+n_r=n}} \binom{N_1}{n_1}\cdots\binom{N_r}{n_r} =\binom{N}{n}.$

Suppose now that ${N\rightarrow\infty}$ in such a way that ${N_i/N\rightarrow p_i}$ for every ${1\leq i\leq r}$. We have then ${p_1+\cdots+p_r=1}$, and, according to the Stirling formula (${n!\sim\sqrt{2\pi n}(n/e)^n}$),

$\lim_{\substack{N\rightarrow\infty\$N_1,\ldots,N_r)/N\rightarrow (p_1,\ldots,p_r)}} \frac{\binom{N_1}{n_1}\cdots\binom{N_r}{n_r}}{\binom{N}{n}} =\frac{n!}{n_1!\cdots n_r!}p_1^{n_1}\cdots p_r^{n_r}.$ Note that we recover for free the multinomial formula by summing over $\{(n_1,\ldots,n_r)\in\{0,\ldots,n\}^r:n_1+\cdots+n_r=n\}.$ This is indeed the multinomial law of size \( {n}$ and parameter ${(p_1,\ldots,p_r)}$. The moral is that a sample of size ${n}$ without replacement in a giant urn of size ${N\gg n}$ is equivalent to ${n}$ independent samples with replacement. This is remarkable: it allows to deduce a non-uniform law (multinomial) from a uniform law (urn sampling).

Multinomial law from Poisson laws. Let ${X_1,\ldots,X_r}$ be independent Poisson random variables with mean ${\lambda_1,\ldots,\lambda_r}$ respectively. Then, for every integer ${n\geq0}$, conditional on the event ${\{X_1+\cdots+X_r=n\}}$, the random vector ${X}$ follows the multinomial law of size ${n}$ and parameter ${(p_1,\ldots,p_r)}$ where

$p_i :=\frac{\lambda_i}{\lambda_1+\cdots+\lambda_r}.$

Namely, since ${S_r:=X_1+\cdots+X_r}$ is Poisson with mean ${\lambda:=\lambda_1+\cdots+\lambda_r}$, we have ${\mathbb{P}(S_r=n)=e^{-\lambda}\frac{\lambda^n}{n!}}$ and thus for any ${0\leq n_1,\ldots,n_r\leq k}$ such that ${n_1+\cdots+n_r=k}$,

$\mathbb{P}(X_1=n_1,\ldots,X_r=n_r|S_r=n) =\frac{\mathbb{P}(X_1=n_1,\ldots,X_r=n_r)}{\mathbb{P}(S_r=n)} =\cdots=\frac{n!}{n_1!\cdots{}n_r!}p_1^{n_1}\cdots p_r^{n_r}.$

This property is the analogue of a very similar property for Dirichlet laws (one has to replace Poisson by Gamma laws). It allows also to relate Galton-Watson and Fisher-Wright processes.

Stability of multinomial by grouping. If ${X}$ is a random vector of ${\mathbb{N}^r}$ following the multinomial law of size ${n}$ and parameter ${(p_1,\ldots,p_r)}$ then for every partition ${\{1,\ldots,r\}=I_1\cup\cdots\cup I_d}$ the random vector

$\left(\sum_{i\in I_1}X_i,\ldots,\sum_{i\in I_d}X_i\right)$

of ${\mathbb{N}^d}$ follows the multinomial law of size ${n}$ and parameter

$\left(\sum_{i\in I_1}p_i,\ldots,\sum_{i\in I_d}p_i\right).$

This property is the analogue of a very similar property for Dirichlet laws.

Multinomial and binomial. If ${X}$ is a random vector of ${\mathbb{N}^r}$ following the multinomial law of size ${n}$ and parameter ${(p_1,\ldots,p_r)}$ then the components ${X_1,\ldots,X_r}$ of ${X}$ follow a binomial law of size ${n}$ and respective parameter ${p_1,\ldots,p_d}$ (they are not independent, and their sum is ${n}$). For any non-empty ${I\subset\{1,\ldots,r\}}$, the random variable ${\sum_{i\in I}X_i}$ follow a binomial law of size ${n}$ and parameter ${\sum_{i\in I}p_i}$. This property is the analogue of a very similar property for Dirichlet laws (one has to replace binomial by Beta laws).

« … Que diriez-vous d’ordinateur ? C’est un mot correctement formé, qui se trouve même dans le Littré comme adjectif désignant Dieu qui met de l’ordre dans le monde. Un mot de ce genre a l’avantage de donner aisément un verbe ordiner, un nom d’action ordination. L’inconvénient est que ordination désigne une cérémonie religieuse ; mais les deux champs de signification (religion et comptabilité) sont si éloignés et la cérémonie d’ordination connue, je crois, de si peu de personnes que l’inconvénient est peut-être mineur. … D’ailleurs votre machine serait ordinateur (et non ordination) et ce mot est tout à fait sorti de l’usage théologique. … »

Jacques Perret, professeur de philologie latine à la Sorbonne. Lettre à IBM France, 1955.

Source: « La théorie de l’information », roman de Aurélien Bellanger, Gallimard, 2012.

J’ai mis à jour très récemment deux documents pédagogiques sur ma page d’enseignement :

Il est très plaisant d’enseigner les probabilités. Comme vous le savez sans doute, dans les lycées français, la nouvelle seconde (2010), la nouvelle première (2011), la nouvelle terminale (2012), la nouvelle sup (2013), et la nouvelle spé (2014) font la part belle aux probabilités ainsi qu’à une statistique plus descriptive qu’inférientielle. L’intention est bonne mais la mise en œuvre ne fait pas l’unanimité. Il est bien sûr difficile de faire des choix et des compromis pour mieux vivre avec son temps. Il est aussi inquiétant de remettre en cause les habitudes. Il est même terrifiant de devoir enseigner quelque chose jamais vraiment appris ou compris. Une formation continue massive est nécessaire. Rendez-vous dans quelques années pour un bilan. Le programme de terminale C de 1972 contenait un brin de probabilités, y compris l’inégalité de Tchebycheff et la loi faible des grands nombres ! Ah, le-niveau-baisse ! (depuis Euclide).

Recently, a friend of mine, who is also a mathematician, received a new laptop equipped with Ubuntu, a version of the Linux operating system based on Debian. He asked then about the best available free LaTeX editor under Linux. There is no canonical answer to this question. I like to use Emacs and sometimes vim (old Unix habits). Many other editors have a LaTeX plugin such as gedit, geany, jedit, etc. There is also a bunch of LaTeX specific editors such as texworks, latexila, texmaker, kile, etc. For people not so familiar with Unix,  I suggest to try gummi (for people familiar with Apple Macs) or texstudio (for people familiar with Microsoft Windows). People familiar with Scientific Workplace or Scientific Word may try lyx. This software universe is evolving so rapidly that this post will become obsolete very soon.

I take this opportunity to advertise xournal, written by the mathematician Denis Auroux. This program has nothing to do with LaTeX, but  is very useful : it allows to annotate PDF files!