
Month: October 2010

Back to basics: coupon collector

Let \( {X_1,X_2,\ldots} \) be i.i.d. random variables uniformly distributed on \( {\{1,\ldots,r\}} \). The classical coupon collector problem consists in looking at the random time

\[ T=\min\{n\geq1:\{X_1,\ldots,X_n\}=\{1,\ldots,r\}\}. \]

Since we have \( {T=G_1+\cdots+G_r} \) where \( {G_1,\ldots,G_r} \) are independent geometric random variables on \( {\{1,2,\ldots\}} \) with \( {\mathbb{E}(G_i)=r/(r-i+1)} \), we obtain

\[ \mathbb{E}(T)=r\sum_{i=1}^r\frac{1}{i}=r(\log(r)+\gamma+o_{r\rightarrow\infty}(1)). \]
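As a sanity check, here is a minimal Monte Carlo sketch in Python (assuming NumPy; the helper name coupon_time is chosen here for illustration) comparing the empirical mean of \( {T} \) with \( {r\sum_{i=1}^r1/i} \) and with the asymptotic value \( {r(\log(r)+\gamma)} \):

```python
import numpy as np

def coupon_time(r, rng):
    """Number of uniform draws from {1,...,r} needed to see every coupon."""
    seen, n = set(), 0
    while len(seen) < r:
        seen.add(rng.integers(r))
        n += 1
    return n

rng = np.random.default_rng(0)
r = 50
samples = [coupon_time(r, rng) for _ in range(5000)]
harmonic = sum(1.0 / i for i in range(1, r + 1))
print("empirical mean   :", np.mean(samples))
print("r * H_r          :", r * harmonic)
print("r(log(r) + gamma):", r * (np.log(r) + np.euler_gamma))
```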

One may compute similarly the variance and obtain a deviation bound around the mean via Chebyshev’s inequality (exercise!). A simple reasoning shows that for any \( {n\geq r} \),

\[ \mathbb{P}(T=n)=\frac{r!}{r^n}S(n-1,r-1) \]

where \( {S(n-1,r-1)} \) is the Stirling number of the second kind (the number of ways to partition \( {n-1} \) objects into \( {r-1} \) nonempty blocks). It is difficult to extract the tail behavior of \( {T} \) from this formula.

Another possibility is to write, for every \( {n\geq r} \),

\[ \mathbb{P}(T>n) = \mathbb{P}(E_{n,1}\cup\cdots\cup E_{n,r}) \quad\text{where}\quad E_{n,i} = \{X_1\neq i,\ldots,X_n\neq i\}. \]

Note that if \( {1\leq i_1,\ldots,i_k\leq r} \) are distinct, then with \( {R:=\{1,\ldots,r\}\setminus\{i_1,\ldots,i_k\}} \),

\[ \mathbb{P}(E_{n,i_1}\cap\cdots\cap E_{n,i_k}) =\mathbb{P}(X_1\in R)\cdots\mathbb{P}(X_n\in R) =\left(\frac{r-k}{r}\right)^n=\left(1-\frac{k}{r}\right)^n. \]

Now, by the inclusion-exclusion principle (also known as the Poincaré sieve), for every \( {n\geq r } \),

\[ \mathbb{P}(T>n) = \sum_{k=1}^r(-1)^{k-1}\binom{r}{k}\left(1-\frac{k}{r}\right)^n. \]

Unfortunately, the signs alternate, which makes this formula delicate to use directly. On the other hand, if we write the union bound

\[ \mathbb{P}(T>n)\leq \sum_{i=1}^r \mathbb{P}(E_{n,i}) \]

and if we note that

\[ \mathbb{P}(E_{n,i})=\left(1-\frac{1}{r}\right)^n\leq e^{-n/r} \]

then, taking \( {n=\lceil t r\log(r)\rceil} \), we obtain that for all real \( {t>0} \),

\[ \mathbb{P}(T> \lceil tr\log(r)\rceil) \leq r^{-t+1}. \]
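To see how sharp this crude bound is, one can evaluate the exact alternating sum numerically; here is a small sketch in plain Python (the function name tail_exact is just a label chosen here) comparing \( {\mathbb{P}(T>\lceil tr\log(r)\rceil)} \) with \( {r^{-t+1}} \):

```python
from math import comb, ceil, log

def tail_exact(n, r):
    """P(T > n) from the inclusion-exclusion formula above."""
    return sum((-1) ** (k - 1) * comb(r, k) * (1 - k / r) ** n for k in range(1, r + 1))

r, t = 20, 2.0
n = ceil(t * r * log(r))
print("exact P(T > n):", tail_exact(n, r))   # alternating sum, fine for moderate r
print("union bound   :", r ** (1 - t))       # the crude bound r^(1-t)
```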

It turns out that the asymptotic fluctuations of \( {T} \) are Gumbel.

Theorem 1 (Gumbel fluctuations) We have

\[ \frac{T-r\log(r)}{r} \underset{r\rightarrow\infty}{\overset{\mathrm{law}}{\longrightarrow}} G \]

where \( {G} \) is the Gumbel law on \( {\mathbb{R}} \) with density \( {t\mapsto e^{-e^{-t}-t}} \).

Proof: Let us fix \( {t\in\mathbb{R}} \) and let us show that

\[ \lim_{r\rightarrow\infty}\mathbb{P}(T> r\log(r)+tr)=S(t)=1-e^{-e^{-t}}. \]

Fix \( {t\in\mathbb{R}} \) and take \( {r} \) large enough such that \( {r\log(r)+tr>r} \). Set \( {n_{t,r}=r\log(r)+tr} \) if \( {r\log(r)+tr\in\mathbb{N}} \) and \( {n_{t,r}=\lceil r\log(r)+tr\rceil} \) if not. We already know that

\[ \mathbb{P}(T> r\log(r)+tr) = \sum_{k=1}^r(-1)^{k-1}\binom{r}{k} \left(1-\frac{k}{r}\right)^{n_{t,r}}. \]

Now, since \( {\binom{r}{k}\leq \frac{r^k}{k!}} \) and \( {1-u\leq e^{-u}} \) for all \( {u\geq0} \), we have, for every fixed \( {k\geq1} \),

\[ \binom{r}{k}\left(1-\frac{k}{r}\right)^{n_{t,r}} \leq \frac{r^k}{k!}\,e^{-kn_{t,r}/r} \leq \frac{e^{-kt}}{k!} \quad\text{and}\quad \binom{r}{k}\left(1-\frac{k}{r}\right)^{n_{t,r}} \underset{r\rightarrow\infty}{\longrightarrow} \frac{e^{-kt}}{k!}. \]

By dominated convergence, we obtain finally

\[ \lim_{r\rightarrow\infty} \sum_{k=1}^r(-1)^{k-1}\binom{r}{k} \left(1-\frac{k}{r}\right)^{n_{t,r}} =\sum_{k=1}^\infty(-1)^{k-1}\frac{e^{-kt}}{k!}=S(t). \]

\( \Box \)
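One can also visualize the theorem by simulation. Here is a hedged sketch (Python with NumPy, reusing a coupon_time sampler as in the sketch above) comparing the empirical tail of \( {(T-r\log(r))/r} \) with the Gumbel survival function \( {S(t)=1-e^{-e^{-t}}} \):

```python
import numpy as np

def coupon_time(r, rng):
    seen, n = set(), 0
    while len(seen) < r:
        seen.add(rng.integers(r))
        n += 1
    return n

rng = np.random.default_rng(1)
r = 200
T = np.array([coupon_time(r, rng) for _ in range(4000)])
Z = (T - r * np.log(r)) / r                    # rescaled fluctuations

for t in [-1.0, 0.0, 1.0, 2.0]:
    empirical = np.mean(Z > t)
    gumbel = 1 - np.exp(-np.exp(-t))           # S(t) = 1 - exp(-exp(-t))
    print(f"t = {t:+.1f}   empirical tail {empirical:.3f}   Gumbel tail {gumbel:.3f}")
```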

Actually, the coupon collector is a special instance of a more general model. Namely, if \( {(X_n)_{n\geq0}} \) is a sequence of random variables, not necessarily independent or of the same law, taking their values in a finite set \( {E} \), then we define the cover time by

\[ T=\inf\{n\geq0:\{X_0,\ldots,X_n\}=E\}. \]

The cover time of finite Markov chains and of random walks on finite graphs has been studied extensively. For recent results, see e.g. Ding, Lee, Peres (2010).


Moments of solitude

A few months ago, a second-year talk ended, quite rightly, with the convergence of a random variable to the normal law. When the jury asked in which sense the convergence held, the answer was "from left to right". It is true that in mathematics, as in politics, convergences from right to left are rare. More recently, in a first-year course, after detailing the proof of

\[\lim_{x\to0}\frac{\sin(x)}{x}=1\]

with an elementary geometric argument, someone sitting in the front row, very attentive, asked "can't we just cancel the \(x\) in this fraction?". How sad to have to explain that this boldness was in vain (even though it would make sense on a Taylor expansion…). These ordinary mishaps make one reflect on one's own weaknesses, and on the relativity of matters of the mind. One thinks with a smile of Pierre Desproges's humor: "A psychotic is someone who firmly believes that 2 and 2 make 5, and who is perfectly satisfied with it. A neurotic is someone who knows perfectly well that 2 and 2 make 4, and it makes him sick!".


Back to basics: total variation distance

Today, part of my teaching concerned basic properties of the total variation on discrete spaces. Let \( {E} \) be a possibly infinite countable set equipped with its discrete topology and \( {\sigma} \)-field. Let \( {\mathcal{P}} \) be the set of probability measures on \( {E} \). We equip \( {\mathcal{P}} \) with the total variation distance defined for every \( {\mu,\nu\in\mathcal{P}} \) by

\[ d_{TV}(\mu,\nu)=\sup_{A\subset{E}}|\mu(A)-\nu(A)|. \]

It takes its values in \( {[0,1]} \) and \( {1} \) is achieved when \( {\mu} \) and \( {\nu} \) have disjoint supports.

Theorem 1 (Alternative expressions) For every \( {\mu,\nu\in\mathcal{P}} \) we have

\[ d_{TV}(\mu,\nu) =\frac{1}{2}\sup_{f:E\rightarrow[-1,1]}\left|\int\!f\,d\mu-\int\!f\,d\nu\right| =\frac{1}{2}\sum_{x\in E}|\mu(x)-\nu(x)| \]

Moreover, the supremum in the original definition of \( {d_{TV}} \) is achieved for the set

\[ A_*=\{x\in E:\mu(x)\geq\nu(x)\} \]

while the supremum in the functional expression of \( {d_{TV}} \) is achieved for

\[ f=\mathbf{1}_{A_*}-\mathbf{1}_{A_*^c}. \]

Proof: The second equality follows from the inequality

\[ \left|\int\!f\,d\mu-\int\!f\,d\nu\right| \leq \sum_{x\in E}|f(x)||\mu(x)-\nu(x)| \leq \sup_{x\in E}|f(x)|\sum_{x\in E}|\mu(x)-\nu(x)| \]

which is saturated for \( {f=\mathbf{1}_{A_*}-\mathbf{1}_{A_*^c}} \). For the first equality, we write

\[ |\mu(A)-\nu(A)| =\frac{1}{2}\left|\int\!f_A\,d\mu-\int\!f_A\,d\nu\right| \]

where \( {f_A=\mathbf{1}_A-\mathbf{1}_{A^c}} \), which gives

\[ |\mu(A)-\nu(A)| \leq \frac{1}{2}\sup_{f:E\rightarrow[-1,1]}\left|\int\!f\,d\mu-\int\!f\,d\nu\right| = \frac{1}{2}\sum_{x\in E}|\mu(x)-\nu(x)| \]

which is saturated for \( {A=A_*} \) since

\[ 2|\mu(A_*)-\nu(A_*)|=\sum_{x\in A_*}|\mu(x)-\nu(x)|+\sum_{x\in A_*^c}|\mu(x)-\nu(x)|. \]

\( \Box \)
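For concreteness, here is a small numerical sketch (Python with NumPy, on a toy finite set) checking that the three expressions of Theorem 1 coincide, using the extremal set \( {A_*} \) and the extremal function \( {\mathbf{1}_{A_*}-\mathbf{1}_{A_*^c}} \):

```python
import numpy as np

rng = np.random.default_rng(2)
mu = rng.random(6); mu /= mu.sum()     # a probability vector on E = {0,...,5}
nu = rng.random(6); nu /= nu.sum()

d_half_l1 = 0.5 * np.abs(mu - nu).sum()          # half L1 norm

A_star = mu >= nu                                # extremal set A*
d_sets = abs(mu[A_star].sum() - nu[A_star].sum())

f = np.where(A_star, 1.0, -1.0)                  # extremal function 1_{A*} - 1_{A*^c}
d_fun = 0.5 * abs((f * mu).sum() - (f * nu).sum())

print(d_half_l1, d_sets, d_fun)                  # the three values coincide
```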

Theorem 2 (Equivalent criteria for convergence in law) Let \( {(X_n)_{n\in\mathbb{N}}} \) be a sequence of random variables on \( {E} \) and let \( {\mu_n} \) be the law of \( {X_n} \). For any \( {\mu} \) in \( {\mathcal{P}} \), the following properties are equivalent:

  1. \( {(X_n)_{n\in\mathbb{N}}} \) converges in law to \( {\mu} \)
  2. \( {\lim_{n\rightarrow\infty}\int\!f\,d\mu_n=\int\!f\,d\mu} \) for all bounded \( {f:E\rightarrow\mathbb{R}} \)
  3. \( {\lim_{n\rightarrow\infty}\mu_n(x)=\mu(x)} \) for all \( {x\in E} \)
  4. \( {\lim_{n\rightarrow\infty}d_{TV}(\mu_n,\mu)=0} \)

Proof: To deduce 1. from 4. it suffices to use the functional variational expression of \( {d_{TV}} \). The properties 1. and 2. are equivalent since every \( {f:E\rightarrow\mathbb{R}} \) is continuous for the discrete topology. To deduce 3. from 2. one can take \( {f=\mathbf{1}_{\{x\}}} \). To deduce 4. from 3. we start by observing that for an arbitrary \( {A\subset E} \),

\[ \sum_{x\in E}|\mu_n(x)-\mu(x)| =\sum_{x\in A}|\mu_n(x)-\mu(x)| +\sum_{x\in A^c}|\mu_n(x)-\mu(x)|. \]

Thanks to 3. for every \( {\varepsilon'>0} \) there exists an integer \( {N=N(A,\varepsilon')} \) such that the first term of the right hand side is bounded above by \( {\mathrm{card}(A)\varepsilon'} \) for all \( {n\geq N} \). For the second term of the right hand side, we write

\[ \sum_{x\in A^c}|\mu_n(x)-\mu(x)| \leq \sum_{x\in A^c}\mu_n(x)+\sum_{x\in A^c}\mu(x). \]

Since we have

\[ \sum_{x\in A^c}\mu_n(x) =\sum_{x\in A}\mu(x)-\sum_{x\in A}\mu_n(x)+\sum_{x\in A^c}\mu(x) \]

we obtain

\[ \sum_{x\in A^c}|\mu_n(x)-\mu(x)| \leq \sum_{x\in A}|\mu_n(x)-\mu(x)|+2\sum_{x\in A^c}\mu(x). \]

Since \( {\mu\in\mathcal{P}} \), for any \( {\varepsilon''>0} \), we can select \( {A} \) finite such that \( {\mu(A^c)\leq\varepsilon''} \). Combining these bounds with this choice of \( {A} \), then letting \( {n\rightarrow\infty} \) and \( {\varepsilon'\rightarrow0} \), we obtain \( {\limsup_{n\rightarrow\infty}\sum_{x\in E}|\mu_n(x)-\mu(x)|\leq2\varepsilon''} \), which gives 4. since \( {\varepsilon''} \) is arbitrary. \( \Box \)

Theorem 3 (Yet another expression and the extremal case) For every \( {\mu,\nu\in\mathcal{P}} \) we have

\[ d_{TV}(\mu,\nu)=1-\sum_{x\in E}(\mu(x)\wedge\nu(x)). \]

In particular, \( {d_{TV}(\mu,\nu)=1} \) if and only if \( {\mu} \) and \( {\nu} \) have disjoint supports.

Proof: It suffices to write

\[ \sum_{x\in E}(\mu(x)\wedge\nu(x)) =\frac{1}{2}\sum_{x\in E}(\mu(x)+\nu(x)-|\mu(x)-\nu(x)|)=1-d_{TV}(\mu,\nu). \]

\( \Box \)

Theorem 4 (Coupling) For every \( {\mu,\nu\in\mathcal{P}} \) we have

\[ d_{TV}(\mu,\nu) =\inf_{(X,Y)}\mathbb{P}(X\neq Y) \]

where the infimum runs over all couples \( {(X,Y)} \) of random variables on \( {E\times E} \) with marginal laws \( {\mu} \) and \( {\nu} \). Moreover, there exists a couple of this type for which the equality is achieved.

Proof: Let \( {(X,Y)} \) be a couple of random variables on \( {E\times E} \) with marginal laws \( {\mu} \) and \( {\nu} \). Since \( {\mathbb{P}(X=x,Y=x)\leq \mu(x)\wedge\nu(x)} \) for every \( {x\in E} \) we have

\[ 1-d_{TV}(\mu,\nu)=\sum_{x\in E}(\mu(x)\wedge\nu(x)) \geq \sum_{x\in E}\mathbb{P}(X=x,Y=x) =\mathbb{P}(X=Y). \]

Therefore, it suffices now to construct a couple \( {(X,Y)} \) for which the equality is achieved. Define

\[ p=1-d_{TV}(\mu,\nu)\in[0,1]. \]

Case where \( {p=0} \). Then \( {d_{TV}(\mu,\nu)=1} \) and the supports of \( {\mu} \) and \( {\nu} \) are disjoint. This gives \( {\sum_{x\in E}\mu(x)\nu(x)=0} \). We consider in this case \( {(X,Y)} \) where \( {X\sim\mu} \) and \( {Y\sim\nu} \) are independent:

\[ \mathbb{P}(X=Y)=\sum_{x\in E}\mu(x)\nu(x)=0. \]

Case where \( {p=1} \). Then \( {d_{TV}(\mu,\nu)=0} \) and thus \( {\mu=\nu} \). We can take \( {(X,X)} \) where \( {X\sim\mu} \).

Case where \( {0<p<1} \). Let \( {(U,V,W)} \) be a triple of random variables with laws

\[ p^{-1}(\mu\wedge\nu),\quad (1-p)^{-1}(\mu-(\mu\wedge\nu)),\quad (1-p)^{-1}(\nu-(\mu\wedge\nu)) \]

(recall that \( {p=\sum_{x\in E}(\mu(x)\wedge\nu(x))} \)). Let \( {B} \) be a Bernoulli random variable, independent of \( {(U,V,W)} \), such that \( {\mathbb{P}(B=1)=1-\mathbb{P}(B=0)=p} \). If we define \( {(X,Y)=(U,U)} \) if \( {B=1} \) while \( {(X,Y)=(V,W)} \) if \( {B=0} \), then \( {X\sim\mu} \) and \( {Y\sim\nu} \), and since the laws of \( {V} \) and \( {W} \) have disjoint supports, we have \( {\mathbb{P}(V=W)=0} \), and thus

\[ \mathbb{P}(X=Y)=\mathbb{P}(B=1)=p. \]

\( \Box \)
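The construction in the proof is easy to implement. Here is a sketch (Python with NumPy, assuming the generic case \( {0<p<1} \), which holds almost surely for the random vectors drawn below) sampling the coupling and checking empirically that \( {\mathbb{P}(X\neq Y)=d_{TV}(\mu,\nu)} \):

```python
import numpy as np

rng = np.random.default_rng(3)
E = np.arange(6)
mu = rng.random(6); mu /= mu.sum()
nu = rng.random(6); nu /= nu.sum()

m = np.minimum(mu, nu)
p = m.sum()                                   # p = 1 - d_TV(mu, nu), here 0 < p < 1

def sample_coupling(size):
    """Sample (X, Y) with X ~ mu and Y ~ nu following the proof: mix U with the pair (V, W)."""
    B = rng.random(size) < p
    U = rng.choice(E, size=size, p=m / p)                 # law p^{-1} (mu ∧ nu)
    V = rng.choice(E, size=size, p=(mu - m) / (1 - p))    # law (1-p)^{-1} (mu - mu ∧ nu)
    W = rng.choice(E, size=size, p=(nu - m) / (1 - p))    # law (1-p)^{-1} (nu - mu ∧ nu)
    return np.where(B, U, V), np.where(B, U, W)

X, Y = sample_coupling(200_000)
print("empirical P(X != Y):", np.mean(X != Y))
print("d_TV(mu, nu)       :", 0.5 * np.abs(mu - nu).sum())
```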

Since \( {\mathbb{P}(X\neq Y)=\mathbb{E}(d(X,Y))} \) for the atomic distance \( {d(x,y)=1_{x\neq y}} \) we have

\[ d_{TV}(\mu,\nu)=\min_{\pi\in\Pi(\mu,\nu)}\int_{E\times E}\!d(x,y)\,d\pi(x,y) \]

where \( {\Pi(\mu,\nu)} \) is the convex set of probability measures on \( {E\times E} \) with marginal laws \( {\mu} \) and \( {\nu} \). From this point of view, \( {d_{TV}} \) is the Wasserstein \( {W_1} \) distance on \( {\mathcal{P}} \) associated to the atomic distance on \( {E} \).
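This optimal transport reading can also be checked directly on a small space by solving the linear program over couplings with the atomic cost. A sketch assuming SciPy (scipy.optimize.linprog), where the coupling is stored as a flattened \( {k\times k} \) matrix:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
k = 5
mu = rng.random(k); mu /= mu.sum()
nu = rng.random(k); nu /= nu.sum()

cost = (1 - np.eye(k)).ravel()                # atomic cost c(x, y) = 1_{x != y}, flattened

# marginal constraints on the coupling pi, stored row-major as pi[x * k + y]
A_eq = np.zeros((2 * k, k * k))
for x in range(k):
    A_eq[x, x * k:(x + 1) * k] = 1.0          # sum_y pi(x, y) = mu(x)
for y in range(k):
    A_eq[k + y, y::k] = 1.0                   # sum_x pi(x, y) = nu(y)
b_eq = np.concatenate([mu, nu])

res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print("min over couplings of E[d(X,Y)]:", res.fun)
print("d_TV(mu, nu)                   :", 0.5 * np.abs(mu - nu).sum())
```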


Few words on Poisson approximation

Lucien Le Cam

This post concerns a nice elementary inequality regarding Poisson approximation, due to Lucien Le Cam (1924 – 2000). The total variation distance between two probability measures \( {\mu} \) and \( {\nu} \) on \( {\mathbb{N}} \) is given by

\[ d_{TV}(\mu,\nu) =\sup_{A\subset\mathbb{N}}|\mu(A)-\nu(A)| =\frac{1}{2}\sum_{k\in\mathbb{N}}|\mu_k-\nu_k|. \]

The weak convergence of probability measures on \( {\mathbb{N}} \) is equivalent to the convergence in total variation. On uncountable spaces, convergence in total variation is in general stronger. If \( {\mu_1,\ldots,\mu_n} \) and \( {\nu_1,\ldots,\nu_n} \) are probability measures on \( {\mathbb{N}} \) then by induction we get the sub-additive bound

\[ d_{TV}(\mu_1*\cdots*\mu_n,\nu_1*\cdots*\nu_n) \leq d_{TV}(\mu_1,\nu_1)+\cdots+d_{TV}(\mu_n,\nu_n). \]

Le Cam's inequality bounds the total variation distance between the law of a sum of independent Bernoulli random variables and the Poisson law of the same mean. In what follows, \( {\beta_p} \) is the Bernoulli distribution \( {p\delta_1+(1-p)\delta_0} \) with \( {0\leq p\leq 1} \), and \( {\pi_{\lambda}} \) is the Poisson law of mean \( {\lambda} \). The elementary bound \( {d_{TV}(\beta_p,\pi_p)=1-(1-p)-pe^{-p}=p(1-e^{-p})\leq p^2} \), used with the sub-additive bound above, leads to Le Cam's inequality:

Theorem 1 (Le Cam’s inequality)

\[ d_{TV}(\beta_{p_1}*\cdots*\beta_{p_n},\pi_{p_1+\cdots+p_n}) \leq p_1^2+\cdots+p_n^2. \]
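In the binomial case \( {p_1=\cdots=p_n=p} \) the bound can be checked numerically; here is a sketch assuming SciPy for the binomial and Poisson probability mass functions (the sum is truncated at \( {n} \), where the Poisson tail is negligible for the parameters below):

```python
import numpy as np
from scipy.stats import binom, poisson

def dtv_binom_poisson(n, p):
    """Half L1 distance between Binomial(n, p) and Poisson(np), summed over {0,...,n}."""
    ks = np.arange(n + 1)
    return 0.5 * np.abs(binom.pmf(ks, n, p) - poisson.pmf(ks, n * p)).sum()

for n, p in [(10, 0.1), (100, 0.05), (1000, 0.01)]:
    print(n, p, dtv_binom_poisson(n, p), "<=", n * p ** 2)   # Le Cam bound n p^2
```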

Le Cam’s inequality generalizes the Poisson approximation of the binomial distribution since if \( {p_1=\cdots=p_n=p} \) then \( {\beta_{p_1}*\cdots*\beta_{p_n}=\beta_p^{*n}=\mathrm{Binom}(n,p)} \). More generally, using Le Cam’s inequality with the \( {L^2-L^1-L^\infty} \) bound \( {p_1^2+\cdots+p_n^2\leq (p_1+\cdots+p_n)\max_{1\leq i\leq n}p_i} \) immediately yields the so-called law of small numbers: if we have

  • convergence of the mean: \( {\lambda:=\sum_{n=1}^\infty p_n<\infty} \)
  • dispersion of the bits: \( {\lim_{n\rightarrow\infty}\max_{1\leq i\leq n}p_i=0} \)

then \( {\beta_{p_1}*\cdots*\beta_{p_n}} \) converges weakly as \( {n\rightarrow\infty} \) to \( {\pi_\lambda} \). The convergence also holds in the Wasserstein distance \( {W_1} \), i.e. weak convergence plus convergence of the first moment. In the same spirit, we have Chen’s inequality:

Theorem 2 (Chen’s inequality)

\[ d_{TV}(\beta_{p_1}*\cdots*\beta_{p_n},\pi_{p_1+\cdots+p_n}) \leq (1-e^{-(p_1+\cdots+p_n)})\frac{p_1^2+\cdots+p_n^2}{p_1+\cdots+p_n}. \]

The inequality above can be found in the works of Barbour and Eagleson, based on the work of Louis Chen, a former student of Charles Stein. The proof relies on Poisson integration by parts, following the method developed by Stein for the Gaussian approximation in the central limit theorem. For more details, one may take a look at the paper by Steele and the book by Barbour, Holst, and Janson. You may also read the recent article by Chen and Roellin. Chen’s inequality is stronger than Le Cam’s inequality since \( {1-e^{-u} \leq u} \) for all \( {u\geq0} \).

Chen’s inequality is useful even in situations where the mean \( {p_1+\cdots+p_n} \) of \( {\beta_{p_1}*\cdots*\beta_{p_n}} \) does not converge as \( {n\rightarrow\infty} \). Namely, if for instance \( {p_n=1/n} \) for every \( {n\geq1} \) then the right hand side in Chen’s inequality is \( {O(1/\log(n))} \) and thus \( {\beta_{p_1}*\cdots*\beta_{p_n}} \) gets closer and closer to a Poisson law of mean \( {O(\log(n))} \) as \( {n} \) increases. This is helpful for instance for the study of various aspects of large random graphs, and for the derivation of the asymptotic normality of the number of occupied tables in the Chinese restaurant process, a famous model of random partitions.
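For this harmonic example, the law of \( {\beta_{p_1}*\cdots*\beta_{p_n}} \) can be computed by direct convolution. Here is a sketch (Python with NumPy and SciPy; the Poisson tail beyond \( {n} \) is negligible for the truncation used) comparing the true total variation distance with the Le Cam and Chen bounds:

```python
import numpy as np
from scipy.stats import poisson

def bernoulli_convolution(ps):
    """Law of the sum of independent Bernoulli(p_i), computed by direct convolution."""
    law = np.array([1.0])
    for p in ps:
        law = np.convolve(law, [1.0 - p, p])
    return law

n = 200
ps = [1.0 / i for i in range(1, n + 1)]
law = bernoulli_convolution(ps)                  # exact law of beta_{p_1} * ... * beta_{p_n}
lam = sum(ps)                                    # harmonic number H_n ~ log(n)
ks = np.arange(law.size)                         # support {0, ..., n}

dtv = 0.5 * np.abs(law - poisson.pmf(ks, lam)).sum()
lecam = sum(p ** 2 for p in ps)                         # Le Cam bound: sum p_i^2, does not vanish
chen = (1 - np.exp(-lam)) * lecam / lam                 # Chen bound: O(1 / log(n))
print(f"d_TV = {dtv:.4f}   Le Cam = {lecam:.4f}   Chen = {chen:.4f}")
```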

Various versions of Chen’s inequality are available, where the total variation is replaced by the relative entropy or by a Wasserstein distance. The presence of a ratio of norms in the right hand side of Chen’s inequality is not surprising: such ratios quantify sparsity or localization, as for instance in Berry-Esseen theorems.
