Recently, I had to quickly teach some basic knowledge about martingales. What should we know, at minimum, about martingales? Well, numerous things, indeed.
Martingales are generalizations of sums of independent random variables (random walks) in which each new increment is, in a sense, conditionally orthogonal to the past increments. Indeed, a sequence of random variables \( {M={(M_n)}_{n\geq0}} \) is a martingale with respect to a filtration \( {\mathcal{F}={(\mathcal{F}_n)}_{n\geq0}} \) when for any \( {n\geq0} \), \( {M_n\in L^1(\mathcal{F}_n)} \), and
\[ \mathbb{E}(M_{n+1}-M_n|\mathcal{F}_n)=0. \]
If so then we have the conservation law of the mean:
\[ \mathbb{E}(M_0)=\mathbb{E}(M_1)=\cdots \]
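For the record, this conservation law follows from the tower property of conditional expectation: for any \( {n\geq0} \),
\[ \mathbb{E}(M_{n+1})=\mathbb{E}(\mathbb{E}(M_{n+1}|\mathcal{F}_n))=\mathbb{E}(M_n). \]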
If \( {M_n\in L^2} \) for every \( {n\geq0} \) then for any \( {n\geq0} \),
\[ M_{n+1}-M_n\perp L^2(\mathcal{F}_n). \]
Basic examples of martingales (guess the filtration):
- Sum. \( {S_0=0} \), \( {S_n=X_1+\cdots+X_n} \), \( {{(X_n)}_{n\geq1}} \) independent with mean \( {m} \)
\[ M_n=S_n-nm; \]
- Product. \( {P_0=1} \), \( {P_n=X_1\cdots X_n} \), \( {{(X_n)}_{n\geq1}} \) independent, \( {\geq0} \) with mean \( {m} \)
\[ M_n=\frac{P_n}{m^n}; \]
- Weights. \( {S_{-1}=0} \), \( {{(S_n)}_{n\geq0}} \) martingale, \( {H_n} \) bounded and \( {\mathcal{F}_{n-1}} \)-measurable
\[ M_n=\sum_{k=0}^n H_k(S_k-S_{k-1}); \]
- Squared sum. \( {S_0=0} \), \( {S_n=X_1+\cdots+X_n} \), \( {{(X_n)}_{n\geq1}} \) iid mean \( {0} \) variance \( {\sigma^2} \)
\[ M_n=S_n^2-n\sigma^2; \]
- Exponential. \( {S_0=0} \), \( {S_n=X_1+\cdots+X_n} \), \( {{(X_n)}_{n\geq1}} \) iid Rademacher \( {p\delta_1+q\delta_{-1}} \)
\[ M_n=\left(\frac{q}{p}\right)^{S_n}; \]
More generally if \( {S_0=x} \), \( {S_n=x+X_1+\cdots+X_n} \), \( {\alpha>0} \), \( {\varphi(\alpha)=\log(\mathbb{E}(e^{-\alpha X_1}))} \)
\[ M_n=\exp\left(-\alpha S_n-n\varphi(\alpha)\right); \]
- Wright-Fisher model. \( {X_0\in\{0,1,\ldots,N\}} \), \( {\mathrm{Law}(X_{n+1}|X_n,\ldots,X_0)=\mathrm{Binom}(N,X_n/N)} \)
\[ M_n=X_n; \]
- Pólya urn model. \( {{(U_n)}_{n\geq0}} \) iid uniform on \( {[0,1]} \), \( {(A_0,B_0)=(1,1)} \),
\[ (A_{n+1},B_{n+1})=(A_n+\mathbf{1}_{U_n\leq M_n},B_n+\mathbf{1}_{U_n>M_n}), \]
\[ M_n=\frac{A_n}{A_n+B_n}; \]
- Galton-Watson process. \( {{(Z_n)}_{n\geq0}} \) Galton-Watson, \( {m=\mathbb{E}(Z_1)} \), \( {\mathbb{E}(\rho^{Z_1})=\rho\leq1} \),
\[ M_n=\rho^{Z_n}; \]
and
\[ M_n=\frac{Z_n}{m^n}; \]
- Markov chain. A Markov chain \( {{(X_n)}_{n\geq0}} \) on a countable state space \( {E} \), and \( {f:E\rightarrow\mathbb{R}} \) bounded and harmonic for the generator \( {L} \) of \( {X} \) (meaning that \( {Lf=0} \));
\[ M_n=f(X_n)-f(X_0). \]
This last example subsumes some of the examples above under a single concept. We have also discussed some relations to Markov chains in a previous post.
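As a quick sanity check, here is a minimal simulation sketch (hypothetical code, not from the original course) of the Pólya urn example, illustrating the conservation law of the mean:

```python
import numpy as np

rng = np.random.default_rng(0)

def polya_fraction(n_steps, n_runs=10_000):
    """Return samples of M_n = A_n/(A_n+B_n) after n_steps draws of the urn."""
    A = np.ones(n_runs)  # one ball of each color at the start: (A_0, B_0) = (1, 1)
    B = np.ones(n_runs)
    for _ in range(n_steps):
        M = A / (A + B)
        U = rng.random(n_runs)   # the iid uniforms U_n of the recursion
        A += (U <= M)            # add a ball of the first color with probability M_n
        B += (U > M)             # otherwise add a ball of the second color
    return A / (A + B)

# Conservation law of the mean: E(M_n) = E(M_0) = 1/2 for every n.
for n in (1, 10, 100):
    print(n, polya_fraction(n).mean())   # all close to 0.5
```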
If \( {T} \) is a stopping time for \( {\mathcal{F}} \) taking its values in \( {\{0,1,\ldots\}\cup\{\infty\}} \) then the optional stopping theorem states that the stopped process
\[ {(M_{\min(n,T)})}_{n\geq0} \]
is again a martingale, and in particular by the conservation law, for any \( {n\geq0} \),
\[ \mathbb{E}(M_0)=\mathbb{E}(M_{\min(n,T)}). \]
Now, since \( {\min(n,T)\rightarrow T} \) as \( {n\rightarrow\infty} \) on \( {\{T<\infty\}} \), we get \( {M_{\min(n,T)}\rightarrow M_T} \) almost surely on \( {\{T<\infty\}} \). In order to deduce that
\[ \mathbb{E}(M_0)=\mathbb{E}(M_T), \]
we need \( {T<\infty} \) almost surely and a way to pass from almost sure convergence to convergence of expectations. It works for example by dominated convergence if \( {T<\infty} \) almost surely and \( {\sup_{n\geq0}|M_n|\in L^1} \), which in particular implies that \( {M} \) is uniformly integrable. It also works if \( {T\in L^\infty} \), or if \( {T\in L^1} \) and \( {M} \) has uniformly bounded oscillations in the sense that \( {\sup_{n\geq0}|M_{n+1}-M_n|\in L^\infty} \).
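These conditions are not mere technicalities. For the simple symmetric random walk \( {M} \) started at \( {0} \) and \( {T=\inf\{n\geq0:M_n=1\}} \), we have \( {T<\infty} \) almost surely and \( {M_T=1} \), so \( {\mathbb{E}(M_T)=1\neq0=\mathbb{E}(M_0)} \); here \( {T\notin L^1} \) and the stopped martingale is not uniformly integrable. A minimal simulation sketch (hypothetical code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simple symmetric random walk M, stopped when it first hits 1.
runs, n = 20_000, 2_000
M = np.zeros(runs)
stopped = np.zeros(runs, dtype=bool)
for _ in range(n):
    step = rng.choice([-1.0, 1.0], size=runs)
    M = np.where(stopped, M, M + step)   # freeze the runs that already hit 1
    stopped |= (M >= 1)

print(M.mean())        # about 0: E(M_{min(n,T)}) = E(M_0) = 0 for every n
print(stopped.mean())  # close to 1: T < infinity almost surely, with M_T = 1
```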
The optional stopping theorem is useful to extract information about a stopping time. For instance, let \( {X={(X_n)}_{n\geq1}} \) be iid Bernoulli \( {p\delta_1+q\delta_0} \) with \( {p>0} \), and let
\[ T=\inf\{n\geq3:X_n=X_{n-1}=X_{n-2}=1\} \]
be the first time of occurrence of the sequence \( {111} \) in \( {X} \). Let \( {S_0=0} \) and
\[ S_{n+1}=(S_n+1)\frac{X_{n+1}}{p}. \]
Then the random sequence \( {M={(M_n)}_{n\geq0}} \) defined by \( {M_n=S_n-n} \) for any \( {n\geq0} \) is a martingale, since \( {\mathbb{E}(S_{n+1}|\mathcal{F}_n)=(S_n+1)\mathbb{E}(X_{n+1})/p=S_n+1} \). Now, the stopping time \( {T} \) is almost surely finite since \( {T\leq3T'} \) with \( {T'=\inf\{n\geq1:(X_{3n-2},X_{3n-1},X_{3n})=(1,1,1)\}} \), and \( {T'} \) follows the geometric law of parameter \( {p^3>0} \) (coin tossing game). Next, by using the optional stopping theorem and then the monotone and dominated convergence theorems, we obtain
\[ \mathbb{E}(S_T)=\mathbb{E}(T). \]
On the other hand we have by definition of \( {S} \) and \( {T} \) that
\[ S_T=\frac{1}{p}\left(1+\frac{1}{p}\left(1+\frac{1}{p}\right)\right). \]
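Combining the last two displays gives \( {\mathbb{E}(T)=p^{-1}+p^{-2}+p^{-3}} \), which equals \( {14} \) when \( {p=1/2} \). Here is a minimal Monte Carlo sketch (hypothetical code) checking this numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

def hitting_time_111(p):
    """First index n with X_{n-2} = X_{n-1} = X_n = 1, for iid Bernoulli(p) bits."""
    run, n = 0, 0
    while run < 3:
        n += 1
        run = run + 1 if rng.random() < p else 0   # length of the current run of 1s
    return n

p = 0.5
times = [hitting_time_111(p) for _ in range(100_000)]
print(np.mean(times))             # close to 14
print(1/p + 1/p**2 + 1/p**3)      # exact value from the martingale: 14.0
```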
In this example, the martingale is purely instrumental, and provides at the end a formula for \( {\mathbb{E}(T)} \). By using the same approach for the Gambler’s ruin problem, one can use the squared sum martingale in order to get the expectation of the ruin time, and even the exponential martingale in order to get the Laplace transform of the ruin time!
Martingales are a stochastic analogue of constant sequences: they are constant in expectation. Similarly, we may define sub-martingales, which are non-decreasing in expectation (they are below = sub a limit), and super-martingales, which are non-increasing in expectation (they are above = super a limit). Indeed, if a sequence of random variables \( {M={(M_n)}_{n\geq0}} \) satisfies \( {M_n\in L^1(\mathcal{F}_n)} \) for every \( {n\geq0} \), then it is
- a sub-martingale when for every \( {n\geq0} \),
\[ \mathbb{E}(M_{n+1}-M_n|\mathcal{F}_n)\geq0. \]
- a super-martingale when for every \( {n\geq0} \),
\[ \mathbb{E}(M_{n+1}-M_n|\mathcal{F}_n)\leq0. \]
By the Jensen inequality, if \( {M} \) is a martingale such that \( {M_n\in L^2} \) for every \( {n\geq0} \), then \( {{(M_n^2)}_{n\geq0}} \) is a sub-martingale. If one defines \( {\left<M\right>={(\left<M\right>_n)}_{n\geq0}} \) by \( {\left<M\right>_0=0} \) and
\[ \left<M\right>_{n+1}-\left<M\right>_n =\mathbb{E}((M_{n+1}-M_n)^2|\mathcal{F}_n) =\mathbb{E}(M_{n+1}^2-M_n^2|\mathcal{F}_n) \]
then \( {{(M_n^2-\left<M\right>_n)}_{n\geq0}} \) is a martingale. The process \( {\left<M\right>} \) is called the increasing process of \( {M} \) or the quadratic variation of \( {M} \). It is also called the compensator of the sub-martingale \( {{(M_n^2)}_{n\geq0}} \). This process \( {\left<M\right>} \) is non-negative, non-decreasing, predictable, and integrable. For example, if \( {M_n=X_1+\cdots+X_n} \) where \( {{(X_n)}_{n\geq1}} \) are iid with mean \( {0} \) and variance \( {\sigma^2} \), then \( {\left<M\right>_n=n\sigma^2} \).
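For the record, the two expressions above for the increments of \( {\left<M\right>} \) agree because the conditional cross term vanishes:
\[ \mathbb{E}((M_{n+1}-M_n)^2|\mathcal{F}_n) =\mathbb{E}(M_{n+1}^2|\mathcal{F}_n)-2M_n\mathbb{E}(M_{n+1}|\mathcal{F}_n)+M_n^2 =\mathbb{E}(M_{n+1}^2-M_n^2|\mathcal{F}_n), \]
using \( {\mathbb{E}(M_{n+1}|\mathcal{F}_n)=M_n} \).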
Given that martingales are generalizations of sums of independent centered random variables, one may ask about extensions of the Doob maximal inequality, the law of large numbers, and the central limit theorem.
The Doob maximal inequality for martingales makes it possible to control the maximum of the process by its terminal value. It states that if \( {M} \) is a non-negative martingale bounded in \( {L^p} \), \( {p>1} \), then
\[ \left\Vert \sup_{0\leq k\leq n}M_k\right\Vert_p \leq \frac{p}{p-1}\left\Vert M_n\right\Vert_p. \]
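For a quick numerical check, here is a minimal sketch (hypothetical code), with \( {p=2} \) and a positive product martingale built from iid lognormal factors of mean \( {1} \):

```python
import numpy as np

rng = np.random.default_rng(0)

# Positive martingale: M_k = product of k iid lognormal factors with mean 1,
# since E(exp(N(-sigma^2/2, sigma^2))) = 1.
runs, n, sigma = 100_000, 50, 0.1
G = rng.normal(-sigma**2 / 2, sigma, size=(runs, n))
M = np.exp(np.cumsum(G, axis=1))          # row i holds M_1, ..., M_n for run i

p = 2.0
lhs = np.mean(np.max(M, axis=1) ** p) ** (1 / p)          # ||max_{k<=n} M_k||_p
rhs = (p / (p - 1)) * np.mean(M[:, -1] ** p) ** (1 / p)   # (p/(p-1)) ||M_n||_p
print(lhs, "<=", rhs)                     # the Doob bound holds, with room to spare
```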
The basic theorems of convergence for martingales are the following:
- If \( {M} \) is a martingale bounded in \( {L^p} \), with \( {p>1} \), meaning that \( {\sup_{n\geq0}\mathbb{E}(|M_n|^p)<\infty} \), then there exists a random variable \( {M_\infty\in L^p} \) such that
\[ \lim_{n\rightarrow\infty}M_n=M_\infty \]
almost surely and in \( {L^p} \). Here the almost sure convergence comes from the Doob maximal inequality for martingales and the Borel-Cantelli lemma, while the \( {L^p} \) convergence comes from the Cauchy criterion or from uniform integrability, which follows in turn from the boundedness in \( {L^p} \) with \( {p>1} \). Furthermore, we have \( {M_n=\mathbb{E}(M_\infty|\mathcal{F}_n)} \);
- If \( {M} \) is a sub-martingale such that \( {{(M_n^+)}_{n\geq0}} \) is bounded in \( {L^1} \), meaning that \( {\sup_{n\geq0}\mathbb{E}(\max(0,M_n))<\infty} \), then there exists a random variable \( {M_\infty\in L^1} \) such that
\[ \lim_{n\rightarrow\infty}M_n=M_\infty \]
almost surely. In particular, this holds if \( {M} \) is a non-negative martingale, since in this case \( {\mathbb{E}(\max(0,M_n))=\mathbb{E}(M_n)=\mathbb{E}(M_0)} \). Boundedness in \( {L^1} \) is not enough to ensure uniform integrability, and the convergence holds in \( {L^1} \) if and only if \( {M} \) is uniformly integrable (counterexample: the critical Galton-Watson process, which tends almost surely to \( {0} \) while keeping a mean equal to \( {1} \); see the sketch below). Other remarkable special cases are when \( {M} \) is a sub-martingale bounded above by a constant, or a super-martingale bounded below by a constant.
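Here is a minimal simulation sketch (hypothetical code) of this counterexample, with Poisson(1) offspring, for which \( {m=1} \) so that \( {Z_n} \) is itself a non-negative martingale:

```python
import numpy as np

rng = np.random.default_rng(0)

# Critical Galton-Watson process with Poisson(1) offspring: m = E(Z_1) = 1.
runs, n = 100_000, 200
Z = np.ones(runs, dtype=np.int64)
for _ in range(n):
    # The sum of Z_n iid Poisson(1) offspring counts is Poisson(Z_n),
    # and Poisson(0) = 0, so extinction is absorbing.
    Z = rng.poisson(Z)

print(Z.mean())          # still about 1: E(Z_n) = E(Z_0) = 1 for every n
print((Z == 0).mean())   # yet close to 1: Z_n -> 0 almost surely (extinction)
```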
The law of large numbers for martingales states that if \( {M} \) is a martingale such that \( {M_n\in L^2} \) for any \( {n\geq0} \), and if \( {\left<M\right>_\infty=\lim_{n\rightarrow\infty}\left<M\right>_n} \) (the limit always exists in \( {[0,\infty]} \) by monotonicity), then there exists a random variable \( {M_\infty} \), which belongs to \( {L^2} \) when \( {\mathbb{E}(\left<M\right>_\infty)<\infty} \), such that
- on \( {\{\left<M\right>_\infty<\infty\}} \), we have, almost surely,
\[ \lim_{n\rightarrow\infty} M_n=M_\infty; \]
- on \( {\{\left<M\right>_\infty=\infty\}} \), we have, almost surely,
\[ \lim_{n\rightarrow\infty}\frac{M_n}{\left<M\right>_n}=0, \]
and various refinements are possible here, such as, almost surely, for any \( {\varepsilon>0} \),
\[ \frac{M_n^2}{\left<M\right>_n}=o((\log(\left<M\right>_n))^{1+\varepsilon}). \]
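For instance, in the classical setting \( {M_n=X_1+\cdots+X_n} \) with \( {{(X_n)}_{n\geq1}} \) iid, centered, with variance \( {\sigma^2>0} \), we have \( {\left<M\right>_n=n\sigma^2\rightarrow\infty} \), and the second statement reduces to the strong law of large numbers:
\[ \lim_{n\rightarrow\infty}\frac{M_n}{n}=0\quad\text{almost surely}. \]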
The central limit theorem for martingales states that if \( {M} \) is a martingale such that \( {M_n\in L^2} \) for any \( {n\geq0} \) and if \( {{(a_n)}_{n\geq0}} \) is a deterministic sequence such that
- \( {a_n\geq0} \) for any \( {n\geq0} \) and \( {a_n\nearrow\infty} \) as \( {n\rightarrow\infty} \);
- there exists a real number \( {\ell\geq0} \) such that
\[ \frac{\left<M\right>_n}{a_n}\overset{\mathbb{P}}{\underset{n\rightarrow\infty}{\longrightarrow}}\ell; \]
- the so-called Lindeberg condition is satisfied: for any \( {\varepsilon>0} \),
\[ \frac{1}{a_n}\sum_{k=1}^n\mathbb{E}(|M_{k}-M_{k-1}|^2\mathbf{1}_{|M_{k}-M_{k-1}|\geq\varepsilon\sqrt{a_n}}|\mathcal{F}_{k-1}) \overset{\mathbb{P}}{\underset{n\rightarrow\infty}{\longrightarrow}}0; \]
then we have
\[ \frac{M_n}{\sqrt{a_n}} \overset{\mathrm{law}}{\underset{n\rightarrow\infty}{\longrightarrow}} \mathcal{N}(0,\ell), \]
and moreover if \( {\ell>0} \) then
\[ \sqrt{a_n}\frac{M_n}{\left<M\right>_n} \overset{\mathrm{law}}{\underset{n\rightarrow\infty}{\longrightarrow}} \mathcal{N}\left(0,\ell^{-1}\right). \]
The Lindeberg condition expresses the fact that some sort of global variance at time \( {n} \) is well spread across the increments of the martingale. You may take a look at a former post about such aspects in the case of sums of independent random variables (sparsity and localization). One can satisfy the Lindeberg condition by checking the so-called Lyapunov condition, which is a bit simpler: for some real \( {p>2} \),
\[ \frac{1}{a_n^{p/2}}\sum_{k=1}^n\mathbb{E}\left(|M_{k}-M_{k-1}|^p|\mathcal{F}_{k-1}\right) \overset{\mathbb{P}}{\underset{n\rightarrow\infty}{\longrightarrow}}0. \]
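Indeed, on the event \( {\{|M_k-M_{k-1}|\geq\varepsilon\sqrt{a_n}\}} \) we have \( {1\leq(|M_k-M_{k-1}|/(\varepsilon\sqrt{a_n}))^{p-2}} \), hence
\[ \frac{1}{a_n}\sum_{k=1}^n\mathbb{E}(|M_{k}-M_{k-1}|^2\mathbf{1}_{|M_{k}-M_{k-1}|\geq\varepsilon\sqrt{a_n}}|\mathcal{F}_{k-1}) \leq\frac{\varepsilon^{2-p}}{a_n^{p/2}}\sum_{k=1}^n\mathbb{E}(|M_{k}-M_{k-1}|^p|\mathcal{F}_{k-1}). \]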
The central limit theorem for martingales provides many examples of Gaussian fluctuations with a speed other than \( {\sqrt{n}} \). One may think for example of the number of tables in the Chinese restaurant process, for which \( {a_n=\theta\log(n)} \), as illustrated below.
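For illustration, here is a minimal simulation sketch (hypothetical code) for this last example: in the Chinese restaurant process with parameter \( {\theta} \), customer \( {i+1} \) opens a new table with probability \( {\theta/(\theta+i)} \), independently of the others, so the number of tables \( {K_n} \) is a sum of independent Bernoulli variables, and \( {(K_n-\mathbb{E}(K_n))/\sqrt{\theta\log(n)}} \) is approximately standard Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)

# Number of tables K_n in a Chinese restaurant process with parameter theta:
# customer i+1 opens a new table with probability theta/(theta+i).
theta, n, runs = 1.0, 10_000, 10_000
K = np.zeros(runs)
mean_K = 0.0
for i in range(n):
    p = theta / (theta + i)
    K += rng.random(runs) < p   # one independent Bernoulli(p) per run
    mean_K += p                 # E(K_n) = sum of the success probabilities

# Martingale CLT scaling: (K_n - E(K_n)) / sqrt(theta*log(n)) ~ N(0, 1).
Z = (K - mean_K) / np.sqrt(theta * np.log(n))
print(Z.mean(), Z.std())        # close to 0 and (roughly) 1
```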
Many aspects of this post are inspired by my teaching at École Polytechnique, and by the 2007 book co-authored with Bernard Bercu (in French).
Further reading.