{"id":937,"date":"2010-10-20T00:38:05","date_gmt":"2010-10-19T22:38:05","guid":{"rendered":"http:\/\/djalil.chafai.net\/blog\/?p=937"},"modified":"2020-06-22T15:04:53","modified_gmt":"2020-06-22T13:04:53","slug":"back-to-basics-coupon-collector","status":"publish","type":"post","link":"https:\/\/djalil.chafai.net\/blog\/2010\/10\/20\/back-to-basics-coupon-collector\/","title":{"rendered":"Back to basics - Coupon collector"},"content":{"rendered":"<p style=\"text-align: center;\"><a href=\"http:\/\/en.wikipedia.org\/wiki\/House_of_Stuart\"><img loading=\"lazy\" class=\"aligncenter size-medium wp-image-946\" title=\"Collection of stamps: House of Stuart \" src=\"\/blog\/wp-content\/uploads\/2010\/10\/coupon_collector_stamps_stuarts-300x169.gif\" alt=\"\" width=\"300\" height=\"169\" srcset=\"https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2010\/10\/coupon_collector_stamps_stuarts-300x169.gif 300w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2010\/10\/coupon_collector_stamps_stuarts.gif 944w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">Let \\( {X_1,X_2,\\ldots} \\) be i.i.d. random variables uniformly distributed on \\( {\\{1,\\ldots,r\\}} \\). The classical <em>coupon collector problem<\/em> consists in looking at the random time<\/p>\n<p style=\"text-align: center;\">\\[ T=\\min\\{n\\geq1:\\{X_1,\\ldots,X_n\\}=\\{1,\\ldots,r\\}\\}. \\]<\/p>\n<p style=\"text-align: justify;\">Since we have \\( {T=G_1+\\cdots+G_r} \\) where \\( {G_1,\\ldots,G_r} \\) are independent geometric random variables on \\( {\\{1,2,\\ldots\\}} \\) with \\( {\\mathbb{E}(G_i)=r\/(r-i+1)} \\), we obtain<\/p>\n<p style=\"text-align: center;\">\\[ \\mathbb{E}(T)=r\\sum_{i=1}^r\\frac{1}{i}=r(\\log(r)+\\gamma+o_{r\\rightarrow\\infty}(1)). \\]<\/p>\n<p style=\"text-align: justify;\">One may compute similarly the variance and obtain a deviation bound around the mean via Chebyshev's inequality (exercise!). A simple reasoning shows that for any \\( {n\\geq r} \\),<\/p>\n<p style=\"text-align: center;\">\\[ \\mathbb{P}(T=n)=\\frac{r!}{r^n}S(n-1,r-1) \\]<\/p>\n<p style=\"text-align: justify;\">where \\( {S(n-1,r-1)} \\) is the Stirling number of the second kind (the number of ways to partition \\( {n-1} \\) objects into \\( {r-1} \\) non empty blocs). It is difficult to get the tail of \\( {T} \\) from this formula.<\/p>\n<p style=\"text-align: justify;\">Another possibility is to write, for every \\( {n\\geq r} \\),<\/p>\n<p style=\"text-align: center;\">\\[ \\mathbb{P}(T&gt;n) = \\mathbb{P}(E_{n,1}\\cup\\cdots\\cup E_{n,r}) \\quad\\text{where}\\quad E_{n,i} = \\{X_1\\neq i,\\ldots,X_n\\neq i\\}. \\]<\/p>\n<p style=\"text-align: justify;\">Note that if \\( {1\\leq i_1,\\ldots,i_k\\leq r} \\) are distincs, then with \\( {R:=\\{1,\\ldots,r\\}\\setminus\\{i_1,\\ldots,i_k\\}} \\),<\/p>\n<p style=\"text-align: center;\">\\[ \\mathbb{P}(E_{n,i_1}\\cap\\cdots\\cap E_{n,i_k}) =\\mathbb{P}(X_1\\in R)\\cdots\\mathbb{P}(X_n\\in R) =\\left(\\frac{r-k}{r}\\right)^n=\\left(1-\\frac{k}{r}\\right)^n. \\]<\/p>\n<p style=\"text-align: justify;\">Now, by the inclusion-exclusion principle (also known as the Poincar&eacute; sieve), for every \\( {n\\geq r } \\),<\/p>\n<p style=\"text-align: center;\">\\[ \\mathbb{P}(T&gt;n) = \\sum_{k=1}^r(-1)^{k-1}\\binom{r}{k}\\left(1-\\frac{k}{r}\\right)^n. \\]<\/p>\n<p style=\"text-align: justify;\">Unfortunately, the signs are alternating. Actually, if we write<\/p>\n<p style=\"text-align: center;\">\\[ \\mathbb{P}(T&gt;n)\\leq \\sum_{i=1}^r \\mathbb{P}(E_{n,i}) \\]<\/p>\n<p style=\"text-align: justify;\">and if we note that<\/p>\n<p style=\"text-align: center;\">\\[ \\mathbb{P}(E_{n,i})=\\left(1-\\frac{1}{r}\\right)^n\\leq e^{-n\/r} \\]<\/p>\n<p style=\"text-align: justify;\">then, taking \\( {n=\\lceil t r\\log(r)\\rceil} \\), we obtain that for all real \\( {t&gt;0} \\),<\/p>\n<p style=\"text-align: center;\">\\[ \\mathbb{P}(T&gt; \\lceil tr\\log(r)\\rceil) \\leq r^{-t+1}. \\]<\/p>\n<p style=\"text-align: justify;\">It turns out that the asymptotic fluctuations of \\( {T} \\) are Gumbel.<\/p>\n<blockquote style=\"background: white; border: solid thick #e4e5e7; text-align: justify; padding-left: 1em;\"><p><b>Theorem 1 (Gumbel fluctuations)<\/b> <em>We have<\/em> <\/p>\n<p style=\"text-align: center;\"><em>\\[ \\frac{T-r\\log(r)}{r} \\underset{r\\rightarrow\\infty}{\\overset{\\mathrm{law}}{\\longrightarrow}} G \\]<\/em><\/p>\n<p> <em>where \\( {G} \\) is the Gumbel law on \\( {\\mathbb{R}} \\) with density \\( {t\\mapsto e^{-e^{-t}-t}} \\).<\/em><\/p><\/blockquote>\n<p style=\"text-align: justify;\"><em>Proof:<\/em> Let us fix \\( {t\\in\\mathbb{R}} \\) and let us show that<\/p>\n<p style=\"text-align: center;\">\\[ \\lim_{r\\rightarrow\\infty}\\mathbb{P}(T&gt; r\\log(r)+tr)=S(t)=1-e^{-e^{-t}}. \\]<\/p>\n<p style=\"text-align: justify;\">Fix \\( {t\\in\\mathbb{R}} \\) and take \\( {r} \\) large enough such that \\( {r\\log(r)+tr&gt;r} \\). Set \\( {n_{t,r}=r\\log(r)+tr} \\) if \\( {r\\log(r)+tr\\in\\mathbb{N}} \\) and \\( {n_{t,r}=\\lceil r\\log(r)+tr\\rceil} \\) if not. We already know that<\/p>\n<p style=\"text-align: center;\">\\[ \\mathbb{P}(T&gt; r\\log(r)+tr) = \\sum_{k=1}^r(-1)^{k-1}\\binom{r}{k} \\left(1-\\frac{k}{r}\\right)^{n_{t,r}}. \\]<\/p>\n<p style=\"text-align: justify;\">Now, since \\( {\\binom{r}{k}\\leq r^kk!} \\) and \\( {(1-u)\\leq e^{-u}} \\) for all \\( {u\\geq0} \\), we have<\/p>\n<p style=\"text-align: center;\">\\[ \\binom{r}{k}\\left(1-\\frac{k}{r}\\right)^{n_{t,r}} \\overset{\\leq}{\\underset{r\\rightarrow\\infty}{\\longrightarrow}} \\frac{e^{-kt}}{k!}. \\]<\/p>\n<p style=\"text-align: justify;\">By dominated convergence, we obtain finally<\/p>\n<p style=\"text-align: center;\">\\[ \\lim_{r\\rightarrow\\infty} \\sum_{k=1}^r(-1)^{k-1}\\binom{r}{k} \\left(1-\\frac{k}{r}\\right)^{n_{t,r}} =\\sum_{k=1}^\\infty(-1)^{k-1}\\frac{e^{-kt}}{k!}=S(t). \\]<\/p>\n<p style=\"text-align: justify;\">$\\Box$<\/p>\n<ul>\n<li>The classical coupon collector problem is considered in e.g. <a href=\"http:\/\/www.ams.org\/mathscinet-getitem?mr=228020\">Feller (volume I)<\/a><\/li>\n<li>Another alternating signs formula for the tail can be found in e.g. <a href= \"http:\/\/www.ams.org\/mathscinet-getitem?mr=2480787\">Fulman (2009)<\/a><\/li>\n<li>The proof of the Gumbel fluctuation is taken from <a href= \"http:\/\/www.ams.org\/mathscinet-getitem?mr=1344451\">Motwani and Raghavan (section 3.2)<\/a><\/li>\n<li>The non uniform coupon collector is nicely studied by e.g. <a href=\"http:\/\/www.ams.org\/mathscinet-getitem?mr=959649\">Holst (1986)<\/a><\/li>\n<\/ul>\n<p style=\"text-align: justify;\">Actually, the coupon collector is a special instance of a more general model. Namely, if \\( {(X_n)_{n\\geq0}} \\) is a sequence of random variables, not necessarily independent or of the same law, taking their values in a finite set \\( {E} \\), then we define the <em>cover time<\/em> by<\/p>\n<p style=\"text-align: center;\">\\[ T=\\inf\\{n\\geq0:\\{X_0,\\ldots,X_n\\}=E\\}. \\]<\/p>\n<p style=\"text-align: justify;\">The cover time of finite Markov chains or random walks on finite graphs was extensively studied. For recent results, see e.g. <a href=\"http:\/\/arxiv.org\/abs\/1004.4371\">Ding, Lee, Peres (2010)<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Let \\( {X_1,X_2,\\ldots} \\) be i.i.d. random variables uniformly distributed on \\( {\\{1,\\ldots,r\\}} \\). The classical coupon collector problem consists in looking at the random&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/djalil.chafai.net\/blog\/2010\/10\/20\/back-to-basics-coupon-collector\/\">Continue reading<span class=\"screen-reader-text\">Back to basics - Coupon collector<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":424},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/937"}],"collection":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/comments?post=937"}],"version-history":[{"count":35,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/937\/revisions"}],"predecessor-version":[{"id":13812,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/937\/revisions\/13812"}],"wp:attachment":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/media?parent=937"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/categories?post=937"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/tags?post=937"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}