This post is devoted to Pólya urns, probably the simplest stochastic model of reinforcement. At time \( {n=0} \), we prepare an urn containing \( {a>0} \) azure balls and \( {b>0} \) black balls. To produce the configuration of the urn at time \( {n=1} \), we draw a ball at random from the urn, and then we put it back together with a new ball of the same color. We repeat this mechanism independently to produce the configuration of the urn at any time \( {n\in\mathbb{N}} \).
Let \( {M_n\in[0,1]} \) be the proportion of azure balls in the urn at time \( {n\in\mathbb{N}} \). At time \( {n\in\mathbb{N}} \), the urn contains \( {a+b+n} \) balls, and \( {(a+b+n)M_n} \) azure balls. Conditional on \( {M_n} \), an azure ball is added to the urn at time \( {n+1} \) with probability \( {M_n} \). Hence
\[ M_0=\frac{a}{a+b} \quad\text{and}\quad M_{n+1}=\frac{(a+b+n)M_n+\mathbf{1}_{\{U_{n+1}\leq M_n\}}}{a+b+n+1} \]
where \( {{(U_n)}_{n\geq1}} \) are iid uniform random variables on \( {[0,1]} \), and the event \( {\{U_{n+1}\leq M_n\}} \) corresponds to adding an azure ball. The stochastic recursive sequence \( {{(M_n)}_{n\geq0}} \) is a non-homogeneous Markov chain with state space \( {[0,1]} \), and also a martingale.
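To make the recursion concrete, here is a minimal Python sketch of it; the function name polya_proportion and the use of Python's random module are our own choices, not part of the model.

```python
import random

def polya_proportion(a, b, n_steps, rng=None):
    """Simulate M_n by iterating the recursion above n_steps times."""
    rng = rng or random.Random()
    m = a / (a + b)              # M_0 = a/(a+b)
    total = a + b                # number of balls currently in the urn
    for _ in range(n_steps):
        if rng.random() <= m:    # event {U_{n+1} <= M_n}: an azure ball is added
            m = (total * m + 1) / (total + 1)
        else:
            m = total * m / (total + 1)
        total += 1
    return m
```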
Let us encode the result of the \( {n} \)th drawing by a random variable \( {X_n} \) with values in \( {\{\alpha,\beta\}} \), where \( {\alpha} \) and \( {\beta} \) indicate respectively that the drawn ball is azure or black. Let \( {Y_n} \) and \( {Z_n} \) be respectively the numbers of azure and black balls added to the urn after the first \( {n} \) drawings. At time \( {n} \), the urn contains
\[ a+Y_n+b+Z_n=a+b+n \]
balls, more precisely the number of azure balls is
\[ a+Y_n=a+\sum_{k=1}^n\mathbf{1}_{\{X_k=\alpha\}}=(a+b+n)M_n \]
while the number of black balls is
\[ b+Z_n=b+\sum_{k=1}^n\mathbf{1}_{\{X_k=\beta\}}. \]
We have \( {Y_n+Z_n=n} \). The sequence \( {{(X_n,Y_n,Z_n)}_{n\geq1}} \) satisfies the stochastic recursion
\[ (X_{n+1},Y_{n+1},Z_{n+1}) = \left\{ \begin{array}{rl} (\alpha,Y_n+1,Z_n) & \mbox{if } U_{n+1}\leq\frac{a+Y_n}{a+b+n},\\ (\beta,Y_n,Z_n+1) & \mbox{if } U_{n+1}>\frac{a+Y_n}{a+b+n}, \end{array} \right. \]
with the convention \( {Y_0=0} \) and \( {Z_0=0} \) (the value of \( {X_0} \) plays no role).
Martingale. The sequence \( {{(M_n)}_{n\geq0}} \) is a martingale taking its values in \( {[0,1]} \) for the filtration \( {{(\mathcal{F}_n)}_{n\geq0}} \) defined by \( {\mathcal{F}_n=\sigma(U_1,\ldots,U_n)} \). Indeed \( {M_n\in L^1(\mathcal{F}_n)} \) and
\[ \mathbb{E}(M_{n+1}\,|\,\mathcal{F}_n) =\mathbb{E}(M_{n+1}\,|\,M_n) =\frac{(a+b+n)M_n+M_n}{a+b+n+1} =M_n. \]
In particular we have the following conservation law: for all \( {n\in\mathbb{N}} \),
\[ \mathbb{E}(M_n)=\mathbb{E}(M_0)=\frac{a}{a+b}. \]
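A quick Monte Carlo check of this conservation law, reusing the polya_proportion sketch above (the parameter values are arbitrary):

```python
import random

# Monte Carlo check of the conservation law E(M_n) = a/(a+b).
a, b, n, trials = 2, 3, 100, 20000
rng = random.Random(42)
estimate = sum(polya_proportion(a, b, n, rng) for _ in range(trials)) / trials
print(estimate, a / (a + b))  # the two printed values should be close
```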
It follows from martingale limit theorems (the martingale is uniformly bounded and thus uniformly integrable!) that there exists a random variable \( {M_\infty} \) on \( {[0,1]} \) such that
\[ \lim_{n\rightarrow\infty}M_n = M_\infty \]
almost surely and in \( {L^p} \) for any \( {p\geq1} \). In particular \( {\mathbb{E}(M_\infty)=\mathbb{E}(M_0)=\frac{a}{a+b}} \).
Limiting distribution. The random variable \( {M_\infty} \) follows the Beta distribution on \( {[0,1]} \) with parameter \( {(a,b)} \) and density
\[ u\in[0,1]\mapsto\frac{u^{a-1}(1-u)^{b-1}}{\mathrm{Beta}(a,b)} \quad\text{where}\quad \mathrm{Beta}(a,b):=\int_0^1\!p^{a-1}(1-p)^{b-1}\,dp. \]
In particular if \( {a=b=1} \) then \( {M_\infty} \) follows the uniform distribution on \( {[0,1]} \).
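Before the proof, here is an empirical sanity check, assuming the polya_proportion sketch above: for large \( {n} \), the first two moments of \( {M_n} \) should approach those of the Beta distribution, namely \( {a/(a+b)} \) and \( {ab/((a+b)^2(a+b+1))} \).

```python
import random

# Compare the first two empirical moments of M_n (large n) with those of Beta(a,b).
a, b, n, trials = 2, 3, 2000, 5000
rng = random.Random(1)
samples = [polya_proportion(a, b, n, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, a / (a + b))                               # mean of Beta(a,b)
print(var, a * b / ((a + b) ** 2 * (a + b + 1)))       # variance of Beta(a,b)
```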
To see this, let us first recall that
\[ \mathrm{Beta}(a,b) =\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} \quad\text{where}\quad \Gamma(x):=\int_0^\infty\!t^{x-1}e^{-t}\,dt. \]
For any \( {c>0} \) and any integer \( {k\geq0} \) we set
\[ c^{(k)}=c(c+1)\cdots(c+k-1)=\frac{(c+k-1)!}{(c-1)!}=\frac{\Gamma(c+k)}{\Gamma(c)}. \]
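As a small numerical sanity check of the last identity (the helper rising_factorial is ours):

```python
import math

def rising_factorial(c, k):
    """c^{(k)} = c (c+1) ... (c+k-1), with the convention c^{(0)} = 1."""
    out = 1.0
    for j in range(k):
        out *= c + j
    return out

# Check the identity c^{(k)} = Gamma(c+k)/Gamma(c), via log-gamma to avoid
# overflow; c = 2.5 and k = 6 are arbitrary.
c, k = 2.5, 6
print(rising_factorial(c, k), math.exp(math.lgamma(c + k) - math.lgamma(c)))
```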
Now for any \( {x_1,\ldots,x_n} \) in \( {\{0,1\}} \), we have, denoting \( {k=x_1+\cdots+x_n} \),
\[ \mathbb{P}(\mathbf{1}_{\{X_1=\alpha\}}=x_1,\ldots,\mathbf{1}_{\{X_n=\alpha\}}=x_n) =\frac{a^{(k)}b^{(n-k)}}{(a+b)^{(n)}}. \]
This probability is invariant by permutation of \( {x_1,\ldots,x_n} \): this means that the distribution of the random vector \( {(\mathbf{1}_{\{X_1=\alpha\}},\ldots,\mathbf{1}_{\{X_n=\alpha\}})} \) is exchangeable. Hence the random variable
\[ Y_n=\sum_{k=1}^n\mathbf{1}_{\{X_k=\alpha\}} \]
satisfies, for any \( {k\in\{0,1,\ldots,n\}} \),
\[ \begin{array}{rcl} \mathbb{P}(Y_n=k) &=&\binom{n}{k}\frac{a^{(k)}b^{(n-k)}}{(a+b)^{(n)}}\\ &=&\binom{n}{k} \frac{\Gamma(a+k)\Gamma(b+n-k)\Gamma(a+b)}{\Gamma(a)\Gamma(b)\Gamma(a+b+n)}\\ &=&\binom{n}{k} \frac{\mathrm{Beta}(a+k,b+n-k)}{\mathrm{Beta}(a,b)}\\ &=&\int_0^1\! \binom{n}{k}p^{k}(1-p)^{n-k}\frac{p^{a-1}(1-p)^{b-1}}{\mathrm{Beta}(a,b)}\,dp. \end{array} \]
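Numerically, this exchangeability computation can be cross-checked by comparing the empirical law of \( {Y_n} \) with the Beta-binomial formula; the helpers below (beta_binomial_pmf, simulate_Y) are ours, and the log-gamma trick simply avoids overflow.

```python
import math
import random

def beta_binomial_pmf(n, k, a, b):
    """P(Y_n = k) = C(n,k) Beta(a+k, b+n-k) / Beta(a,b), via log-gamma."""
    logp = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + math.lgamma(a + k) + math.lgamma(b + n - k) + math.lgamma(a + b)
            - math.lgamma(a) - math.lgamma(b) - math.lgamma(a + b + n))
    return math.exp(logp)

def simulate_Y(a, b, n, rng):
    """Number of azure draws among the first n, by the urn recursion."""
    y = 0
    for i in range(n):
        if rng.random() <= (a + y) / (a + b + i):
            y += 1
    return y

# Compare empirical frequencies of Y_n with the Beta-binomial pmf.
a, b, n, trials = 2, 3, 10, 50000
rng = random.Random(7)
counts = [0] * (n + 1)
for _ in range(trials):
    counts[simulate_Y(a, b, n, rng)] += 1
for k in range(n + 1):
    print(k, round(counts[k] / trials, 4), round(beta_binomial_pmf(n, k, a, b), 4))
```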
We say that \( {Y_n} \) follows the Beta-binomial distribution, which is a mixture of binomial laws of size \( {n} \) with a parameter \( {p} \) which follows the Beta distribution of parameter \( {(a,b)} \). We have \( {M_n=(a+Y_n)/(a+b+n)} \) with \( {Y_0=0} \). When \( {a=b=1} \), the formula for the law of \( {Y_n} \) indicates that \( {Y_n} \) is uniform on \( {\{0,1,\ldots,n\}} \), and thus \( {M_n} \) is uniform on \( {\{1/(n+2),\ldots,(n+1)/(n+2)\}} \), which implies that \( {M_\infty} \) is uniform on \( {[0,1]} \). In the general case, for any \( {t\in[0,1]} \),
\[ \begin{array}{rcl} \mathbb{P}(M_\infty\leq t) &=&\lim_{n\rightarrow\infty}\mathbb{P}(Y_n\leq(a+b+n)t-a)\\ &=&\cdots =\frac{1}{\mathrm{Beta}(a,b)}\int_0^t\!u^{a-1}(1-u)^{b-1}\,du. \end{array} \]
The computation is based on the so-called Beta-binomial correspondence, which states that if \( {U_1,\ldots,U_n} \) are iid and uniform on \( {[0,1]} \) and if \( {U_{(1)}\leq\cdots\leq U_{(n)}} \) is their non-decreasing reordering, then for any \( {k=1,\ldots,n} \) the random variable \( {U_{(k)}} \) follows the Beta distribution on \( {[0,1]} \) with parameter \( {(k,n-k+1)} \), and, denoting \( {S_n=\mathbf{1}_{\{U_1\leq p\}}+\cdots+\mathbf{1}_{\{U_n\leq p\}}} \), which follows the binomial distribution of size \( {n} \) and parameter \( {p} \),
\[ \mathbb{P}(S_n\geq k) =\mathbb{P}(\mathbf{1}_{\{U_1\leq p\}}+\cdots+\mathbf{1}_{\{U_n\leq p\}}\geq k) =\mathbb{P}(U_{(k)}\leq p). \]
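The correspondence is in fact a pathwise identity: on every sample, at least \( {k} \) of the uniforms fall below \( {p} \) exactly when the \( {k} \)th smallest does. The short check below (our own illustration) makes this visible.

```python
import random

# Pathwise check: the events {S_n >= k} and {U_(k) <= p} coincide sample by
# sample, which is precisely why the identity above holds.
rng = random.Random(3)
n, k, p, trials = 10, 4, 0.35, 10000
agree = 0
for _ in range(trials):
    us = sorted(rng.random() for _ in range(n))
    s_n_ge_k = sum(u <= p for u in us) >= k   # at least k uniforms below p
    order_stat_le_p = us[k - 1] <= p          # k-th smallest below p
    agree += (s_n_ge_k == order_stat_le_p)
print(agree, "agreements out of", trials)     # always equal to trials
```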
From discrete to continuous. The results remain valid when \( {a} \) and \( {b} \) are positive reals. This motivates the use of Beta and Gamma functions instead of factorials. In this case we may use an interval partition instead of an urn.
General urns and sampling schemes. We generalize the reinforcement mechanism as follows: we fix an integer \( {r\geq-1} \), and, at each drawing, we put back into the urn \( {1+r} \) balls of the drawn color.
- if \( {r=1} \) then we recover the Pólya urn studied above for which \( {Y_n} \) follows the Beta-binomial distribution;
- if \( {r=0} \) then we recover a sampling with replacement and \( {Y_n} \) then follows the binomial distribution;
- if \( {r=-1} \) then we recover a sampling without replacement and \( {Y_n} \) follows the hypergeometric distribution;
- more generally, for any \( {r\geq-1} \) and \( {k\in\{0,1,\ldots,n\}} \), we have
\[ \mathbb{P}(Y_n=k)=\binom{n}{k}\frac{a^{(r,k)}\,b^{(r,n-k)}}{(a+b)^{(r,n)}} \quad\text{where}\quad c^{(r,k)}:=c(c+r)\cdots(c+(k-1)r) \]
and when \( {r\geq1} \), \( {M_\infty} \) follows the Beta distribution of parameter \( {(a/r,b/r)} \) (see the sketch after this list).
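Here is a sketch checking this unified formula against simulation in the sampling-without-replacement case \( {r=-1} \), where the law of \( {Y_n} \) is hypergeometric; the helper names are ours and the parameters arbitrary.

```python
import math
import random

def general_pmf(n, k, a, b, r):
    """P(Y_n = k) = C(n,k) a^{(r,k)} b^{(r,n-k)} / (a+b)^{(r,n)}."""
    def rf(c, j):  # c^{(r,j)} = c (c+r) ... (c+(j-1)r)
        out = 1.0
        for i in range(j):
            out *= c + i * r
        return out
    return math.comb(n, k) * rf(a, k) * rf(b, n - k) / rf(a + b, n)

def simulate_general(a, b, n, r, rng):
    """Draw n times, putting back 1+r balls of the drawn color each time."""
    y, z = 0, 0
    for _ in range(n):
        if rng.random() <= (a + r * y) / (a + b + r * (y + z)):
            y += 1
        else:
            z += 1
    return y

# Check against sampling without replacement (r = -1, hypergeometric law).
a, b, n, r, trials = 6, 4, 5, -1, 50000
rng = random.Random(11)
counts = [0] * (n + 1)
for _ in range(trials):
    counts[simulate_general(a, b, n, r, rng)] += 1
for k in range(n + 1):
    print(k, round(counts[k] / trials, 4), round(general_pmf(n, k, a, b, r), 4))
```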
It is also possible to consider an arbitrary number of colors, producing a model that includes as a particular case the sampling model of the multivariate hypergeometric distribution. It is also possible to use a replacement matrix as follows: fix a matrix
\[ A={(A_{i,j})}_{1\leq i,j\leq k}\in\mathcal{M}_{k}(\mathbb{N}) \]
and consider the generalized Pólya urn with \( {k} \) colors defined as follows: we draw a ball at random, we denote by \( {i} \) its color, and then we replace in the urn \( {A_{i,j}} \) balls of color \( {j} \) for any \( {j=1,\ldots,k} \). The standard Pólya urn corresponds to \( {k=2} \) and \( {A=2I_2} \).
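A minimal sketch of this replacement-matrix mechanism, under the convention that the drawn ball is removed and then \( {A_{i,j}} \) balls of color \( {j} \) are put in (so that \( {A=2I_2} \) indeed gives the standard urn); matrix_urn is our own helper name.

```python
import random

def matrix_urn(counts, A, n_steps, rng=None):
    """Urn with replacement matrix A: draw a ball with probability proportional
    to the counts, remove it, then put in A[i][j] balls of color j for each j."""
    rng = rng or random.Random()
    counts = list(counts)
    for _ in range(n_steps):
        u = rng.random() * sum(counts)
        i, acc = 0, counts[0]
        while u > acc:                    # pick color i proportionally to counts
            i += 1
            acc += counts[i]
        counts[i] -= 1                    # the drawn ball is taken out ...
        for j, add_j in enumerate(A[i]):
            counts[j] += add_j            # ... and A[i][j] balls of color j go in
    return counts

# With k = 2 and A = 2 I_2 this reproduces the standard Polya urn above.
print(matrix_urn([1, 1], [[2, 0], [0, 2]], 100, random.Random(5)))
```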
Rubin’s theorem on generalized Pólya urns. We consider two non-decreasing functions
\[ S_\alpha,S_\beta:\mathbb{N}\rightarrow\mathbb{R}_+ \]
such that \( {S_\alpha(0)>0} \) and \( {S_\beta(0)>0} \). We construct a random recursive sequence \( {{(X_n)}_{n\geq1}} \) taking its values in \( {\{\alpha,\beta\}} \) as follows: for any \( {n\in\mathbb{N}} \), conditional on \( {X_1,\ldots,X_n} \), denoting \( {Y_n=\sum_{k=1}^n\mathbf{1}_{\{X_k=\alpha\}}} \) and \( {Z_n=n-Y_n} \), we set
\[ (X_{n+1},Y_{n+1},Z_{n+1})= \left\{ \begin{array}{rl} (\alpha,Y_n+1,Z_n) & \text{if } U_{n+1}\leq \frac{S_\alpha(Y_n)}{S_\alpha(Y_n)+S_\beta(Z_n)}, \\ (\beta,Y_n,Z_n+1) & \text{if } U_{n+1}>\frac{S_\alpha(Y_n)}{S_\alpha(Y_n)+S_\beta(Z_n)}, \end{array} \right. \]
where \( {{(U_n)}_{n\geq1}} \) are iid uniform random variables on \( {[0,1]} \). Such a sequence \( {{(X_n)}_{n\geq1}} \) encodes the drawings of a generalized Pólya urn. If \( {(S_\alpha(0),S_\beta(0))=(A_0,B_0)} \) is deterministic and \( {S_\alpha(n)=S_\beta(n)=r(n)} \) for any \( {n>0} \), then we say that we have
- no reinforcement when \( {r} \) is constant (sampling with replacement);
- linear reinforcement when \( {r} \) is affine (standard Pólya urn);
- geometric reinforcement when \( {r} \) is a power function (a simulation sketch follows this list).
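Here is a short sketch of the recursion above for arbitrary reinforcement functions, with quadratic reinforcement as an example (the helper name rubin_sequence is ours):

```python
import random

def rubin_sequence(S_alpha, S_beta, n_steps, rng=None):
    """Simulate (X_n) from the recursion above, for reinforcement functions
    S_alpha, S_beta given as callables on {0, 1, 2, ...}."""
    rng = rng or random.Random()
    xs, y, z = [], 0, 0
    for _ in range(n_steps):
        if rng.random() <= S_alpha(y) / (S_alpha(y) + S_beta(z)):
            xs.append("alpha")
            y += 1
        else:
            xs.append("beta")
            z += 1
    return xs

# Example: quadratic reinforcement r(n) = (n+1)^2 for both colors; one color
# typically ends up monopolizing the draws quite fast.
xs = rubin_sequence(lambda n: (n + 1) ** 2, lambda n: (n + 1) ** 2, 60, random.Random(9))
print("".join("a" if x == "alpha" else "b" for x in xs))
```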
To study the general case, we introduce the probability \( {p_\alpha} \) (respectively \( {p_\beta} \)) that the sequence \( {{(X_n)}_{n\geq1}} \) contains only finitely many \( {\beta} \) (respectively \( {\alpha} \)), in other words only \( {\alpha} \) (respectively \( {\beta} \)) for \( {n} \) large enough. These probabilities are given by
\[ p_\alpha =\mathbb{P}\left(\cup_n\cap_{k\geq n}\{X_k=\alpha\}\right) \quad\text{and}\quad p_\beta =\mathbb{P}\left(\cup_n\cap_{k\geq n}\{X_k=\beta\}\right). \]
We have
\[ p_\alpha+p_\beta\leq1. \]
The following numbers in \( {[0,\infty]} \) will play a crucial role:
\[ \varphi_\alpha=\sum_{n=0}^\infty\frac{1}{S_\alpha(n)} \quad\text{and}\quad \varphi_\beta=\sum_{n=0}^\infty\frac{1}{S_\beta(n)}. \]
We are now ready to state Rubin’s theorem: \( {p_\alpha,p_\beta} \) are linked to \( {\varphi_\alpha,\varphi_\beta} \) via
\[ \begin{array}{c||c|c} & \varphi_\beta<\infty & \varphi_\beta=\infty \\\hline\hline \varphi_\alpha<\infty & p_\alpha>0, p_\beta>0, p_\alpha+p_\beta=1 & p_\alpha=1 (\Rightarrow p_\beta=0)\\\hline \varphi_\alpha=\infty & p_\beta=1 (\Rightarrow p_\alpha=0) & p_\alpha=0, p_\beta=0\\ \end{array} \]
In particular the triviality of \( {p_\alpha} \) and \( {p_\beta} \) depends only on the asymptotic behavior of the reinforcement functions \( {S_\alpha} \) and \( {S_\beta} \), and the critical cases are related to the Riemann condition. In the case of absence of reinforcement (\( {S_\alpha} \) and \( {S_\beta} \) are constant) and in the case of linear reinforcement (\( {S_\alpha} \) and \( {S_\beta} \) are affine) we have \( {\varphi_\alpha=\varphi_\beta=\infty} \), trivially in the first case and since the harmonic series diverges in the second. As soon as the reinforcement grows fast enough to make \( {\sum_n1/r(n)} \) converge, we get \( {\varphi_\alpha<\infty} \) and \( {\varphi_\beta<\infty} \): this is the case in particular for the quadratic reinforcement and more generally for the geometric reinforcement with exponent larger than one (Riemann condition).
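The theorem can be probed numerically with a crude finite-horizon proxy: declare that a color has fixated when the other color stops appearing during the second half of a long run. This is a heuristic of our own, not an exact computation of \( {p_\alpha} \) and \( {p_\beta} \).

```python
import random

def fixation_frequencies(S_alpha, S_beta, horizon, trials, rng=None):
    """Crude finite-horizon proxy for (p_alpha, p_beta): a run counts as a
    fixation of a color when the other color no longer appears in the second
    half of the trajectory. A heuristic, not an exact computation."""
    rng = rng or random.Random()
    fix_a = fix_b = 0
    for _ in range(trials):
        y = z = 0
        last_a = last_b = -1
        for step in range(horizon):
            if rng.random() <= S_alpha(y) / (S_alpha(y) + S_beta(z)):
                y, last_a = y + 1, step
            else:
                z, last_b = z + 1, step
        if last_b < horizon // 2:
            fix_a += 1
        elif last_a < horizon // 2:
            fix_b += 1
    return fix_a / trials, fix_b / trials

# Quadratic reinforcement: phi_alpha and phi_beta are finite, so the theorem
# predicts p_alpha > 0, p_beta > 0, p_alpha + p_beta = 1; by symmetry the two
# frequencies below should both be close to 1/2.
print(fixation_frequencies(lambda n: (n + 1) ** 2, lambda n: (n + 1) ** 2, 2000, 1000))
```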
Proof of Rubin’s theorem. Let \( {{(E^\alpha_n)}_{n\geq0}} \) and \( {{(E^\beta_n)}_{n\geq0}} \) be two independent sequences of independent exponential random variables with, for any \( {n\geq0} \),
\[ \mathbb{E}(E^\alpha_n)=\frac{1}{S_\alpha(n)} \quad\text{and}\quad \mathbb{E}(E^\beta_n)=\frac{1}{S_\beta(n)} \]
Now let us define the random sets
\[ \mathcal{A}=\left\{\sum_{k=0}^n E^\alpha_k:n\geq0\right\} \quad\text{and}\quad \mathcal{B}=\left\{\sum_{k=0}^n E^\beta_k:n\geq0\right\} \quad\text{and}\quad \mathcal{G}=\mathcal{A}\cup\mathcal{B}. \]
Let \( {\xi_0<\xi_1<\cdots} \) be the reordering of the elements of \( {\mathcal{G}} \). We consider the random sequence \( {{(X_n’)}_{n\geq1}} \) taking values in \( {\{\alpha,\beta\}} \) defined by \( {X_n’=\alpha} \) if \( {\xi_{n-1}\in\mathcal{A}} \) and \( {X_n’=\beta} \) if \( {\xi_{n-1}\in\mathcal{B}} \). The sequences \( {{(X_n’)}_{n\geq1}} \) and \( {{(X_n)}_{n\geq1}} \) have the same distribution, and this follows from the properties of exponential distributions (including the lack of memory). Namely, let us examine the equality in distribution of \( {X_1} \) and \( {X’_1} \). If \( {U} \) and \( {V} \) are two independent exponential random variables with respective means \( {1/u} \) and \( {1/v} \) then \( {\mathbb{P}(U<V)=u/(u+v)} \) and \( {\mathbb{P}(V<U)=v/(u+v)} \), which gives
\[ \mathbb{P}(X_1’=\alpha) =\mathbb{P}(\xi_0\in\mathcal{A}) =\mathbb{P}(E^\alpha_0\leq E^\beta_0) =\frac{S_\alpha(0)}{S_\alpha(0)+S_\beta(0)} =\mathbb{P}(X_1=\alpha). \]
The same idea gives (tedious!) the equality in distribution of \( {{(X_n)}_{n\geq1}} \) and \( {{(X_n’)}_{n\geq1}} \). Next, by the zero-one law for the sum of independent exponential random variables,
\[ p:=\mathbb{P}\left(\sum_{n=0}^\infty E^\alpha_n<\infty\right)\in\{0,1\}, \]
with \( {p=1} \) if and only if \( {\varphi_\alpha<\infty} \). The same property holds for the sequence \( {{(E^\beta_n)}_{n\geq0}} \) with \( {\varphi_\beta} \). On the other hand, since the color whose exponential clock has the smaller total sum eventually monopolizes the merged enumeration, we have the formulas
\[ p_\alpha =\mathbb{P}\left(\sum_{n=0}^\infty E^\alpha_n<\sum_{n=0}^\infty E^\beta_n\right) \quad\text{and}\quad p_\beta =\mathbb{P}\left(\sum_{n=0}^\infty E^\beta_n<\sum_{n=0}^\infty E^\alpha_n\right). \]
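To illustrate the embedding, here is a sketch that builds the merged enumeration from simulated exponential clocks and checks the first-draw identity above; the standard-urn weights \( {S_\alpha(n)=2+n} \) and \( {S_\beta(n)=3+n} \) are an arbitrary example of ours.

```python
import random

def embedded_sequence(S_alpha, S_beta, n_steps, rng=None):
    """Rubin's embedding: merge the partial sums of two independent sequences of
    exponential variables with means 1/S_alpha(n) and 1/S_beta(n); the colors of
    the merged points, in order, are distributed like the urn sequence (X_n)."""
    rng = rng or random.Random()
    points = []
    for label, S in (("alpha", S_alpha), ("beta", S_beta)):
        t = 0.0
        for n in range(n_steps):
            t += rng.expovariate(S(n))   # exponential with mean 1/S(n)
            points.append((t, label))
    points.sort()                        # the reordering xi_0 < xi_1 < ...
    return [label for _, label in points[:n_steps]]

# Check of P(X'_1 = alpha) = S_alpha(0)/(S_alpha(0)+S_beta(0)).
rng = random.Random(13)
trials = 20000
freq = sum(embedded_sequence(lambda n: 2 + n, lambda n: 3 + n, 1, rng)[0] == "alpha"
           for _ in range(trials)) / trials
print(freq, 2 / (2 + 3))
```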
Notes and further reading. The content of this post is extracted and translated from a book written with Florent Malrieu in French. Pólya urns are named after the Hungarian mathematician George Pólya. We refer to the books by Hosam Mahmoud, by Norman Johnson and Samuel Kotz, and to the survey articles by Samuel Kotz and Narayanaswamy Balakrishnan, and by Robin Pemantle. Rubin’s theorem can be found in the appendix of a paper by Burgess Davis (suggested to us by Amine Asselah). Pólya urns are fundamental models of reinforcement, and they play an important role in learning theory. They are also connected to a class of stochastic algorithms such as the Robbins-Monro stochastic approximation algorithm, which is linked with the theory of dynamical systems and attractors. The concept of martingale is a central tool in the stochastic approach to game theory and strategies.