# Month: October 2010 Let ${X_1,X_2,\ldots}$ be i.i.d. random variables uniformly distributed on ${\{1,\ldots,r\}}$. The classical coupon collector problem consists in looking at the random time

$T=\min\{n\geq1:\{X_1,\ldots,X_n\}=\{1,\ldots,r\}\}.$

Since we have ${T=G_1+\cdots+G_r}$ where ${G_1,\ldots,G_r}$ are independent geometric random variables on ${\{1,2,\ldots\}}$ with ${\mathbb{E}(G_i)=r/(r-i+1)}$, we obtain

$\mathbb{E}(T)=r\sum_{i=1}^r\frac{1}{i}=r(\log(r)+\gamma+o_{r\rightarrow\infty}(1)).$

One may compute similarly the variance and obtain a deviation bound around the mean via Chebyshev’s inequality (exercise!). A simple reasoning shows that for any ${n\geq r}$,

$\mathbb{P}(T=n)=\frac{r!}{r^n}S(n-1,r-1)$

where ${S(n-1,r-1)}$ is the Stirling number of the second kind (the number of ways to partition ${n-1}$ objects into ${r-1}$ non empty blocs). It is difficult to get the tail of ${T}$ from this formula.

Another possibility is to write, for every ${n\geq r}$,

$\mathbb{P}(T>n) = \mathbb{P}(E_{n,1}\cup\cdots\cup E_{n,r}) \quad\text{where}\quad E_{n,i} = \{X_1\neq i,\ldots,X_n\neq i\}.$

Note that if ${1\leq i_1,\ldots,i_k\leq r}$ are distincs, then with ${R:=\{1,\ldots,r\}\setminus\{i_1,\ldots,i_k\}}$,

$\mathbb{P}(E_{n,i_1}\cap\cdots\cap E_{n,i_k}) =\mathbb{P}(X_1\in R)\cdots\mathbb{P}(X_n\in R) =\left(\frac{r-k}{r}\right)^n=\left(1-\frac{k}{r}\right)^n.$

Now, by the inclusion-exclusion principle (also known as the Poincaré sieve), for every ${n\geq r }$,

$\mathbb{P}(T>n) = \sum_{k=1}^r(-1)^{k-1}\binom{r}{k}\left(1-\frac{k}{r}\right)^n.$

Unfortunately, the signs are alternating. Actually, if we write

$\mathbb{P}(T>n)\leq \sum_{i=1}^r \mathbb{P}(E_{n,i})$

and if we note that

$\mathbb{P}(E_{n,i})=\left(1-\frac{1}{r}\right)^n\leq e^{-n/r}$

then, taking ${n=\lceil t r\log(r)\rceil}$, we obtain that for all real ${t>0}$,

$\mathbb{P}(T> \lceil tr\log(r)\rceil) \leq r^{-t+1}.$

It turns out that the asymptotic fluctuations of ${T}$ are Gumbel.

Theorem 1 (Gumbel fluctuations) We have

$\frac{T-r\log(r)}{r} \underset{r\rightarrow\infty}{\overset{\mathrm{law}}{\longrightarrow}} G$

where ${G}$ is the Gumbel law on ${\mathbb{R}}$ with density ${t\mapsto e^{-e^{-t}-t}}$.

Proof: Let us fix ${t\in\mathbb{R}}$ and let us show that

$\lim_{r\rightarrow\infty}\mathbb{P}(T> r\log(r)+tr)=S(t)=1-e^{-e^{-t}}.$

Fix ${t\in\mathbb{R}}$ and take ${r}$ large enough such that ${r\log(r)+tr>r}$. Set ${n_{t,r}=r\log(r)+tr}$ if ${r\log(r)+tr\in\mathbb{N}}$ and ${n_{t,r}=\lceil r\log(r)+tr\rceil}$ if not. We already know that

$\mathbb{P}(T> r\log(r)+tr) = \sum_{k=1}^r(-1)^{k-1}\binom{r}{k} \left(1-\frac{k}{r}\right)^{n_{t,r}}.$

Now, since ${\binom{r}{k}\leq r^kk!}$ and ${(1-u)\leq e^{-u}}$ for all ${u\geq0}$, we have

$\binom{r}{k}\left(1-\frac{k}{r}\right)^{n_{t,r}} \overset{\leq}{\underset{r\rightarrow\infty}{\longrightarrow}} \frac{e^{-kt}}{k!}.$

By dominated convergence, we obtain finally

$\lim_{r\rightarrow\infty} \sum_{k=1}^r(-1)^{k-1}\binom{r}{k} \left(1-\frac{k}{r}\right)^{n_{t,r}} =\sum_{k=1}^\infty(-1)^{k-1}\frac{e^{-kt}}{k!}=S(t).$

$\Box$

Actually, the coupon collector is a special instance of a more general model. Namely, if ${(X_n)_{n\geq0}}$ is a sequence of random variables, not necessarily independent or of the same law, taking their values in a finite set ${E}$, then we define the cover time by

$T=\inf\{n\geq0:\{X_0,\ldots,X_n\}=E\}.$

The cover time of finite Markov chains or random walks on finite graphs was extensively studied. For recent results, see e.g. Ding, Lee, Peres (2010).

Syntax · Style · .