Press "Enter" to skip to content

Libres pensées d'un mathématicien ordinaire Posts

Simulating heavy tails

How to quickly simulate a positive real random variable $latex X$ with a polynomial tail, say $latex \mathbb{P}(X>t)\sim (1+t)^{-\alpha}$ as $latex t\to\infty$, for some prescribed $latex \alpha>0$. We may use an already coded function provided by  our favorite computational software. Quoting my former PhD advisor, you just have to press the button rand or something similar. But how to proceed from scratch using uniform random numbers? The inversion method suggests to set $latex X=U^{-1/\alpha}-1$ where $latex U$ is a uniform random variable on $latex [0,1].$ The law of $latex X$ is Pareto with probability density function $latex t\in[0,\infty)\mapsto a(1+t)^{-(1+a)}$.  It is amusing to remark that the fractional part of $latex X$ has probability density function

$latex \displaystyle t\in[0,1]\mapsto\alpha\sum_{n=1}^\infty\frac{1}{(n+t)^{1+\alpha}}=\alpha\zeta(\alpha+1,t+1)$

where $latex \zeta$ is the Hurwitz Zeta function. The integer part of $latex X$ follows a Zipf like law.

To simulate an $latex n$-sample of $latex X$ with GNU-Octave one can use rand(1,n).^(-1/alpha)-1. The problem is that on computers, real numbers are replaced by  floats: there is a quantification limit.  Small values of $latex \alpha$ produce a lack of precision for large values, which are frequent for heavy tails…

Regarding the algorithm, it is not wise to use the inversion method to simulate a law on an infinite set (recall that there is no uniform law of such sets).  To simulate $latex X$, one should prefer an excess packing method as in Marsaglia & Tsang Ziggurat algorithm. This alternative approach has the advantage of being faster and a bit more precise (the code is heavier). It is implemented for exponential, Gaussian, and gamma distributions in the  rande, randg, randn functions of GNU-Octave 3.2. If you read French, you may take a look at the first chapter of my pedagogical book with Bernard Bercu. This approach constitutes an instance of  a general meta principle in Computer Science which states that one can speed up algorithms by using static pre-computations, related to the rigid structure of the problem. The fastest algorithm is not necessarily the shortest algorithm in terms of number of code lines.

If you are interested in the simulation of $latex \alpha$-stable distributions, $latex 0<\alpha\leq 2$, you may take a look at the Chambers-Mallows-Stuck exact algorithm which generalizes the well known Box-Muller algorithm for Gaussians (suffers from the quantification limit).

For true (unpredictable) randomness, take a look at and its GNU-R interface 😉

Leave a Comment

Logarithmic potential and Hermitization

The logarithmic potential is a classical object of potential theory intimately connected with the two dimensional Laplacian. It appears also in free probability theory via the free entropy, and in partial differential equations e.g. Patlak-Keller-Segel models. This post concerns only it usage for the spectra of non Hermitian random matrices.

Let \( {\mathcal{P}(\mathbb{C})} \) be the set of probability measures on \( {\mathbb{C}} \) which integrate \( {\log\left|\cdot\right|} \) in a neighborhood of infinity. For every \( {\mu\in\mathcal{P}(\mathbb{C})} \), the logarithmic potential \( {U_\mu} \) of \( {\mu} \) on \( {\mathbb{C}} \) is the function

\[ U_\mu:\mathbb{C}\rightarrow(-\infty,+\infty] \]

defined for every \( {z\in\mathbb{C}} \) by

\[ U_\mu(z)=-\int_{\mathbb{C}}\!\log|z-z’|\,\mu(dz’) =-(\log\left|\cdot\right|*\mu)(z). \ \ \ \ \ (1) \]

This integral in well defined in \( {(-\infty,+\infty]} \) thanks to the assumption \( {\mu\in\mathcal{P}(\mathbb{C})} \). Note that \( {U_\mu(z)=+\infty} \) if \( {\mu} \) has an atom at point \( {z} \). For the circular law \( {\mathcal{U}_\sigma} \) of density

\[ (\sigma^2\pi)^{-1}\mathbf{1}_{\{z\in\mathbb{C}:|z|\leq\sigma\}} \]

we have, for all \( {z\in\mathbb{C}} \), the remarkable formula

\[ U_{\mathcal{U}_\sigma}(z)= \begin{cases} \log(\sigma)-\log|z\sigma^{-1}| & \text{if } |z|>\sigma, \\ \log(\sigma)-\frac{1}{2}(|z\sigma^{-1}|^2-1) & \text{if } |z|\leq\sigma, \end{cases} \ \ \ \ \ (2) \]

see e.g. the book of Saff and Totik. Let \( {\mathcal{D}'(\mathbb{C})} \) be the set of Schwartz-Sobolev distributions (generalized functions). Since \( {\log\left|\cdot\right|} \) is Lebesgue locally integrable on \( {\mathbb{C}} \), one can check by using the Fubini theorem that \( {U_\mu} \) is Lebesgue locally integrable on \( {\mathbb{C}} \). In particular,

\[ U_\mu<\infty \]

Lebesgue almost everywhere (a.e.), and

\[ U_\mu\in\mathcal{D}'(\mathbb{C}). \]

Since \( {\log\left|\cdot\right|} \) is the fundamental solution of the Laplace equation in \( {\mathbb{C}} \), we have, in \( {\mathcal{D}'(\mathbb{C})} \),

\[ \Delta U_\mu=-2\pi\mu. \ \ \ \ \ (3) \]

In other words, for every smooth and compactly supported “test function” \( {\varphi:\mathbb{C}\rightarrow\mathbb{R}} \),

\[ \int_{\mathbb{C}}\!\Delta\varphi(z)U_\mu(z)\,dz =-2\pi\int_{\mathbb{C}}\!\varphi(z)\,\mu(dz). \]

The identity (3) means that \( {\mu\mapsto U_\mu} \) in the distributional Green potential of the Laplacian in \( {\mathbb{C}} \). The function \( {U_\mu} \) contains enough information to recover \( {\mu} \):

Lemma 1 (Unicity) For every \( {\mu,\nu\in\mathcal{P}(\mathbb{C})} \), if \( {U_\mu=U_\nu} \) a.e. then \( {\mu=\nu} \).

Proof: Since \( {U_\mu=U_\nu} \) in \( {\mathcal{D}'(\mathbb{C})} \), we get \( {\Delta U_\mu=\Delta U_\nu} \) in \( {\mathcal{D}'(\mathbb{C})} \). Now (3) gives \( {\mu=\nu} \) in \( {\mathcal{D}'(\mathbb{C})} \), and thus \( {\mu=\nu} \) as measures since \( {\mu} \) and \( {\nu} \) are Radon measures. ☐

Let \( {A} \) be an \( {n\times n} \) complex matrix. We define the discrete probability measure on \( {\mathbb{C}} \)

\[ \mu_A:=\frac{1}{n}\sum_{k=1}^n\delta_{\lambda_k(A)} \]

where \( {\lambda_1(A),\ldots,\lambda_n(A)} \) are the eigenvalues of \( {A} \), i.e. the roots in \( {\mathbb{C}} \) of its characteristic polynomial \( {P_A(z):=\det(A-zI)} \). We also define the discrete probability measure on \( {[0,\infty)} \)

\[ \nu_A:=\frac{1}{n}\sum_{k=1}^n\delta_{s_k(A)} \]

where \( {s_1(A),\ldots,s_n(A)} \) are the singular values of \( {A} \), i.e. the eigenvalues of the positive semidefinite Hermitian matrix \( {\sqrt{AA^*}} \). We have now

\[ U_{\mu_A}(z) =-\int_{\mathbb{C}}\!\log\left| z’-z\right|\,\mu_A(dz’) =-\frac{1}{n}\log\left|\det(A-zI)\right| =-\frac{1}{n}\log\left| P_A(z)\right| \]

for every \( {z\in\mathbb{C}\setminus\{\lambda_1(A),\ldots,\lambda_n(A)\}} \). We have also the alternative expression

\[ U_{\mu_A}(z) =-\frac{1}{n}\log\det(\sqrt{(A-zI)(A-zI)^*}) =-\int_0^\infty\!\log(t)\,\nu_{A-zI}(dt). \ \ \ \ \ (4) \]

The identity (4) bridges the eigenvalues with the singular values, and is at the heart of the following lemma, which allows to deduce the convergence of \( {\mu_A} \) from the one of \( {\nu_{A-zI}} \). The strength of this Hermitization lies in the fact that in contrary to the eigenvalues, one can control the singular values with the entries of the matrix. The price payed here is the introduction of the auxiliary variable \( {z} \) and the uniform integrability. We recall that on a Borel measurable space \( {(E,\mathcal{E})} \), we say that a Borel function \( {f:E\rightarrow\mathbb{R}} \) is uniformly integrable for a sequence of probability measures \( {(\eta_n)_{n\geq1}} \) on \( {E} \) when

\[ \lim_{t\rightarrow\infty}\varlimsup_{n\rightarrow\infty}\int_{\{|f|>t\}}\!|f|\,d\eta_n=0. \ \ \ \ \ (5) \]

We will use this property as follows: if \( {(\eta_n)_{n\geq1}} \) converges weakly to \( {\eta} \) and \( {f} \) is continuous and uniformly integrable for \( {(\eta_n)_{n\geq1}} \) then \( {f} \) is \( {\eta} \)-integrable and \( {\lim_{n\rightarrow\infty}\int\!f\,d\eta_n=\int\!f\,\eta} \). The idea of using Hermitization goes back at least to Girko. However, theorem 2 and lemma 3 below are inspired from the approach of Tao and Vu.

Theorem 2 (Girko Hermitization) Let \( {(A_n)_{n\geq1}} \) be a sequence of complex random matrices where \( {A_n} \) is \( {n\times n} \) for every \( {n\geq1} \), defined on a common probability space. Suppose that for a.a. \( {z\in\mathbb{C}} \), there exists a probability measure \( {\nu_z} \) on \( {[0,\infty)} \) such that a.s.

  • (i) \( {(\nu_{A_n-zI})_{n\geq1}} \) converges weakly to \( {\nu_z} \) as \( {n\rightarrow\infty} \)
  • (ii) \( {\log(\cdot)} \) is uniformly integrable for \( {\left(\nu_{A_n-zI}\right)_{n\geq1}} \)

Then there exists a probability measure \( {\mu\in\mathcal{P}(\mathbb{C})} \) such that

  • (j) a.s. \( {(\mu_{A_n})_{n\geq1}} \) converges weakly to \( {\mu} \) as \( {n\rightarrow\infty} \)
  • (jj) for a.a. \( {z\in\mathbb{C}} \),

    \[ U_\mu(z)=-\int_0^\infty\!\log(t)\,\nu_z(dt). \]

Moreover, if \( {(A_n)_{n\geq1}} \) is deterministic, then the statements hold without the “a.s.”

Proof: Let \( {z} \) and \( {\omega} \) be such that (i-ii) hold. For every \( {1\leq k\leq n} \), define

\[ a_{n,k}:=|\lambda_k(A_n-zI)| \quad\text{and}\quad b_{n,k}:=s_k(A_n-zI) \]

and set \( {\nu:=\nu_z} \). Note that \( {\mu_{A_n-zI}=\mu_{A_n}*\delta_{-z}} \). Thanks to the Weyl inequalities and to the assumptions (i-ii), one can use lemma 3 below, which gives that \( {(\mu_{A_n})_{n\geq1}} \) is tight, that \( {\log\left| z-\cdot\right|} \) is uniformly integrable for \( {(\mu_{A_n})_{n\geq1}} \), and that

\[ \lim_{n\rightarrow\infty}U_{\mu_{A_n}}(z)=-\int_0^\infty\!\log(t)\,\nu_z(dt)=:U(z). \]

Consequently, a.s. \( {\mu\in\mathcal{P}(\mathbb{C})} \) and \( {U_\mu=U} \) a.e. for every adherence value \( {\mu} \) of \( {(\mu_{A_n})_{n\geq1}} \). Now, since \( {U} \) does not depend on \( {\mu} \), by lemma 1, a.s. \( {\left(\mu_{A_n}\right)_{n\geq1}} \) has a unique adherence value \( {\mu} \), and since \( {(\mu_n)_{n\geq1}} \) is tight, \( {(\mu_{A_n})_{n\geq1}} \) converges weakly to \( {\mu} \) by the Prohorov theorem. Finally, by (3), \( {\mu} \) is deterministic since \( {U} \) is deterministic, and (j-jj) hold. ☐

The following lemma is in a way the skeleton of the Girko Hermitization of theorem 2. It states essentially a propagation of a uniform logarithmic integrability for a couple of triangular arrays, provided that a logarithmic majorization holds between the arrays. See arXiv:0808.1502v2 for a proof.

Lemma 3 (Logarithmic majorization and uniform integrability) Let \( {(a_{n,k})_{1\leq k\leq n}} \) and \( {(b_{n,k})_{1\leq k\leq n}} \) be two triangular arrays in \( {[0,\infty)} \). Define the discrete probability measures

\[ \mu_n:=\frac{1}{n}\sum_{k=1}^n\delta_{a_{n,k}} \quad\text{and}\quad \nu_n:=\frac{1}{n}\sum_{k=1}^n\delta_{b_{n,k}}. \]

If the following properties hold

  • (i) \( {a_{n,1}\geq\cdots\geq a_{n,n}} \) and \( {b_{n,1}\geq\cdots\geq b_{n,n}} \) for \( {n\gg1} \),
  • (ii) \( {\prod_{i=1}^k a_{n,i} \leq \prod_{i=1}^k b_{n,i}} \) for every \( {1\leq k\leq n} \) for \( {n\gg1} \),
  • (iii) \( {\prod_{i=k}^n b_{n,i} \leq \prod_{i=k}^n a_{n,i}} \) for every \( {1\leq k\leq n} \) for \( {n\gg1} \),
  • (iv) \( {(\nu_n)_{n\geq1}} \) converges weakly to some probability measure \( {\nu} \) as \( {n\rightarrow\infty} \),
  • (v) \( {\log(\cdot)} \) is uniformly integrable for \( {(\nu_n)_{n\geq1}} \),


  • (j) \( {(\mu_n)_{n\geq1}} \) is tight,
  • (jj) \( {\log(\cdot)} \) is uniformly integrable for \( {(\mu_n)_{n\geq1}} \),
  • (jjj) we have, as \( {n\rightarrow\infty} \),

    \[ \int_0^\infty\!\log(t)\,\mu_n(dt) =\int_0^\infty\!\log(t)\,\nu_n(dt)\rightarrow\int_0^\infty\!\log(t)\,\nu(dt), \]

and in particular, for every adherence value \( {\mu} \) of \( {(\mu_n)_{n\geq1}} \),

\[ \int_0^\infty\!\log(t)\,\mu(dt)=\int_0^\infty\!\log(t)\,\nu(dt). \]

The logarithmic potential is related to the Cauchy-Stieltjes transform of \( {\mu} \) via

\[ S_\mu(z) :=\int_{\mathbb{C}}\!\frac{1}{z’-z}\,\mu(dz’) =(\partial_x-i\partial_y)U_\mu(z) \quad\text{and thus}\quad (\partial_x+i\partial_y)S_\mu=-2\pi\mu \]

in \( {\mathcal{D}'(\mathbb{C})} \). The term “logarithmic potential” comes from the fact that \( {U_\mu} \) is the electrostatic potential of \( {\mu} \) viewed as a distribution of charges in \( {\mathbb{C}\equiv\mathbb{R}^2} \). The logarithmic energy

\[ \mathcal{E}(\mu) :=\int_{\mathbb{C}}\!U_\mu(z)\,\mu(dz) =-\int_{\mathbb{C}}\int_{\mathbb{C}}\!\log\left| z-z’\right|\,\mu(dz)\mu(dz’) \]

is up to a sign the Voiculescu free entropy of \( {\mu} \) in free probability theory. The circular law \( {\mathcal{U}_\sigma} \) minimizes \( {\mu\mapsto\mathcal{E}(\mu)} \) under a second moment constraint. In the spirit of (4) and beyond matrices, the Brown spectral measure of a nonnormal bounded operator \( {a} \) is

\[ \mu_a:=(-4\pi)^{-1}\Delta\int_0^\infty\!\log(t)\,\nu_{a-zI}(dt) \]

where \( {\nu_{a-zI}} \) is the spectral distribution of the self-adjoint operator \( {(a-zI)(a-zI)^*} \). Due to the logarithm, the Brown spectral measure \( {\mu_a} \) depends discontinuously on the \( {*} \)-moments of \( {a.} \) For random matrices, this problem is circumvented in the Girko Hermitization by requiring a uniform integrability, which turns out to be a.s. satisfied for random matrices with i.i.d. entries with finite positive variance.

Leave a Comment

Circular law theorem for random Markov matrices

I have uploaded on arXiv a paper written with Charles Bordenave and Pietro Caputo entitled Circular Law Theorem for Random Markov Matrices. We provide, probably for the first time, an almost sure  circular law theorem for a model of random matrices with dependent entries and finite positive variance. This work leads us to formulate some open problems of variable difficulty, listed at the end of the introduction. Feel free to solve them!

To be a bit more precise, let $latex (X_{jk})_{jk\geq1}$ be i.i.d. nonnegative random variables with bounded density, mean $latex m$, and finite positive variance $latex \sigma^2$. Let $latex M$ be the $latex n\times n$ random Markov matrix with i.i.d. rows defined by $latex M_{jk}=X_{jk}/(X_{j1}+\cdots+X_{jn})$.  Let $latex \lambda_1,\ldots,\lambda_n$ be the eigenvalues of $latex \sqrt{n}M$ i.e. the roots in $latex \mathbb{C}$ of its characteristic polynomial. Our main result states that with probability one, the counting probability measure $latex \frac{1}{n}\delta_{\lambda_1}+\cdots+\frac{1}{n}\delta_{\lambda_n}$ converges weakly as $latex n\to\infty$ to the uniform law on the disk $latex \{z\in\mathbb{C}:|z|\leq m^{-1}\sigma\}$ (circular law).

In particular, when $latex X_{11}$ follows an exponential law, the random matrix $latex M$ belongs to the Dirichlet Markov Ensemble of random stochastic matrices, and our main result gives a positive answer to the question raised in a previous paper.

The matrix $latex M$ is Markov: its entries belong to $latex [0,1]$ and each row sums up to $latex 1$. It has equally distributed dependent entries. However, the rows of $latex M$ are i.i.d. and follow an exchangeable law on $latex \mathbb{R}^n$ supported by the simplex. Since $latex \sigma<\infty$, the uniform law of large numbers of Bai and Yin states that almost surely

$latex \displaystyle\max_{1\leq i\leq n}\left|X_{i1}+\cdots+X_{in}-nm\right|=o(n)$.

This suggests that $latex m^{-1}\sqrt{n}M$ is approximately equal to $latex n^{-1/2}X$ for $latex n$ large enough. One can then expect that almost surely the counting probability measure of the singular values of $latex \sqrt{n}M$ will converge as $latex n\to\infty$ to  a quartercircular law while the counting probability measure of the eigenvalues of $latex \sqrt{n}M$ will converges to a circular law. We show that this heuristics is valid. There is however a complexity gap between these two statements due to the fact that for nonnormal operators such as $latex M$, the eigenvalues are less stable than the singular values under perturbations (here the least singular value enters the scene).

The proof of the main result is inspired from the Tao and Vu version of the Girko Hermitization via logarithmic potentials. More precisely, the starting point is the following Hermitization identity valid for any $latex n\times n$ complex matrix $latex A$ and any $latex z$ outside the spectrum of $latex A$:

$latex \displaystyle \frac{1}{n}\sum_{k=1}^n\log|\lambda_k(A)-z|=\frac{1}{n}\sum_{k=1}^n\log(\lambda_k(\sqrt{(A-zI)(A-zI)^*})).$

The left hand side is the logarithmic potential at point $latex z$ of the counting probability measure of the eigenvalues of $latex A$. The right hand side is the integral of $latex \log(\cdot)$ for the counting probability measure of the singular values of $latex A-zI$. Thanks to the Girko Hermitization theorem, the problem boils down then to show that for Lebesgue almost all $latex z\in\mathbb{C}$, almost surely,

  1. the counting probability measure $latex \nu_{\sqrt{n}M-zI}$ of the singular values of $latex \sqrt{n}M-zI$ tends weakly as $latex n\to\infty$ to a deterministic probability measure $latex \nu_z$ related to the circular law
  2. the function $latex \log(\cdot)$ is uniformly integrable for the sequence $latex (\nu_{\sqrt{n}M-zI})_{n\geq1}$

For the first statement, we use a result of Dozier and Silverstein, which is based on Hermitian techniques: truncation, centralization,  recursion for the Cauchy-Stieltjes transform (trace of the resolvent) via Sherman-Morrison bloc inversion, producing finally a cubic equation for the Cauchy-Stieltjes transform of the limiting law.

The second statement  involves as essential ingredients a Talagrand concentration inequality (for the control of the distance of rows to the span of other rows), and a polynomial control of the operator norm of the resolvent (i.e. the least singular value of $latex \sqrt{n}M-zI$). The bounded density assumption is purely technical and comes from the way we control the operator norm of the resolvent. A possible route for the removal of this assumption is to control the least singular value for a certain kind of matrices with independent rows.

1 Comment

Deux questions entre statistique et calcul stochastique

Qu’elle est la meilleure manière de simuler la loi de la variable aléatoire \( {Z=\int_0^1\!f(t)\,dB_t} \) où \( {f:[0,1]\rightarrow\mathbb{R}} \) est une fonction déterministe sympathique et où \( {(B_t)_{t\geq0}} \) est un mouvement brownien standard issu de \( {0} \) ?

Soit \( {(\mathcal{F}_t)_{t\geq0}} \) la filtration naturelle de \( {(B_t)_{t\geq0}} \) définie par \( {\mathcal{F}_t=\sigma(B_s)_{0\leq s\leq t}} \) pour tout \( {t\geq0} \). Soit \( {0\leq t_1\leq \cdots\leq t_r\leq 1} \) une suite de temps. Considérons une fonction aléatoire \( {F:[0,1]\times \Omega\rightarrow\mathbb{R}} \) de la forme

\[ F(t,\omega)=\sum_{i=1}^{r-1} E_i(\omega) \mathbf{1}_{[t_i,t_{i+1})} \]

où pour tout \( {1\leq i\leq r-1} \) la variable aléatoire \( {E_i} \) est \( {\mathcal{F}_{t_i}} \) mesurable. On dispose alors de la formule stochastique suivante :

\[ I(F)=\int_0^1\!F(t,\cdot)\,dB_t=\sum_{i=1}^{r-1}E_i(B_{t_{i+1}}-B_{t_i}). \]

Cette formule permet de définir par un procédé d’approximation des intégrales stochastiques plus générales pour des fonctions \( {F} \) plus complexes, cf. par exemple Oksendal pour une introduction accessible, et Revuz & Yor, Karatzas & Shreve, et Chung & Williams pour aller plus loin. Lorsque les variables aléatoires \( {(E_i)_{1\leq i\leq n}} \) sont déterministes, la variable aléatoire \( {I(F)} \) est gaussienne centrée et sa variance se calcule directement par bilinéarité. Revenons à la question posée. La fonction \( {F} \) est déterministe (\( {F(\cdot,\omega)=f} \)) et la variable aléatoire \( {Z} \) est gaussienne centrée. La variance de \( {Z} \) se calcule au moyen de l’isométrie d’Itô :

\[ \mathbb{E}[Z^2] =\mathbb{E}\left[\left(\int_0^1\!f(t)\,dB_t\right)^2\right] =\mathbb{E}\left[\int_0^1\!f(t)^2\,dt\right] =\int_0^1\!f(t)^2\,dt. \]

Pour simuler \( {Z} \), il suffit donc de pré-calculer ou d’approcher par une méthode numérique l’intégrale déterministe \( {\iota=\int_0^1\!f(t)^2\,dt} \) puis de simuler la loi gaussienne \( {\mathcal{N}(0,\iota)} \). Pour s’en convaincre, lorsque \( {f=\mathbf{1}_{[0,t]}} \) on trouve \( {\iota=t} \) et donc \( {Z\sim\mathcal{N}(0,t)} \), ce qui est bien normal car pour cette fonction \( {f} \) la variable aléatoire \( {Z} \) vaut \( {B_t} \).

On peut également voir l’intégrale stochastique \( {Z} \) comme la valeur au temps \( {1} \) de la solution \( {(X_t)_{0\leq t\leq 1}} \) de l’équation différentielle stochastique (EDS) très simple \( {dX_t=f(t)dB_t} \) avec condition initiale \( {X_0=0} \). Ainsi, toute méthode de simulation approchée de cette EDS permet en particulier de simuler de manière approchée \( {Z} \). Une méthode classique pour les EDS : les schémas d’Euler, cf. par exemple Kloeden & Platen, Kloeden & Platen, ou Iacus. Cependant, pour la question posée, l’EDS est si simple que la méthode explicite gaussienne évoquée précédemment est préférable. Les schémas d’Euler permettent de simuler de manière approchée la solution d’EDS plus générales de la forme

\[ dX_t=f(t,X_t)dB_t+g(t,X_t)dt \]

dont la solution n’est pas forcément gaussienne (dès lors que \( {f(t,x)} \) dépend de \( {x} \) où que \( {g(t,x)} \) n’est pas affine en \( {x} \). Lorsque \( {f(t,x)=\alpha} \) et \( {g(t,x)=-\beta x} \) on obtient un processus gaussien d’Ornstein-Uhlenbeck) :

\[ X_1=\int_0^1\!f(t,X_t)\,dB_t + \int_0^1\!g(t,X_t)\,dt. \]

Ces méthodes numériques sont utilisées en finance mathématique, cf. par exemple Karatzas & Shreve. En statistique, les intégrales stochastiques gaussiennes apparaissent par exemple dans des théorèmes limites du type TCL, cf. également les problèmes statistiques liés aux processus de diffusion, cf. par exemple Kutoyants.

Les schémas d’Euler pour les EDS peuvent être poussés à des ordres plus élevés. Le schéma de Milstein par exemple correspond à une approximation à l’ordre deux, peu utilisée toutefois en pratique. On distingue l’erreur d’approximation forte de l’erreur d’approximation faible (i.e. en espérance, qui fait gagner un ordre de puissance \( {n^{1/2}} \) où \( {n} \) est le nombre de pas de discrétisation du temps). Pour ces aspects, on pourra consulter par exemple Kloeden & Platen, Kloeden & Platen, et Bouleau & Lépingle. Des approches plus originales figurent dans Gaines & Lyons, Gaines & Lyons, Lyons & Victoir, Pagès.

Sot \( {(B_t)_{t\geq0}} \) un mouvement brownien standard issu de \( {0} \) et \( {f:\mathbb{R}_+\rightarrow\mathbb{R}} \) une fonction lisse. Soit \( {(X_t)_{t\geq0}} \) la solution de l’équation différentielle stochastique (EDS) suivante : \( {dX_t=dB_t+f(t)dt} \), \( {X_0=0} \). Qu’elle est la densité de la loi de \( {(X_t)_{t\geq0}} \) par rapport à \( {(B_t)_{t\geq0}} \), sur l’espace des trajectoires ? Comment s’exprime la \( {\log} \)-vraisemblance ?

On a \( {X_t=\int_0^t\!f(s)\,ds+B_t\sim\mathcal{N}(\int_0^t\!f(s)\,ds,t)} \) pour tout \( {t\geq0} \). Le processus \( {(X_t)_{t\geq0}} \) est gaussien, de même fonction de covariance que \( {(B_t)_{t\geq0}} \). La densité de la loi de \( {(X_t)_{t\geq0}} \) par rapport à \( {(B_t)_{t\geq0}} \), sur l’espace des trajectoires est donnée par la formule de Cameron-Martin-Girsanov, cf. Oksendal (théorème 8.6.4 page 162 et exemple 8.6.5 page 164). Plus précisément, soit \( {H} \) une fonction qui dépend de la trajectoire sur l’intervalle de temps \( {[0,t]} \). Par exemple, une fonction de la forme \( {H(x_{[0,t]}) = \mathbf{1}_{x_{s_1} \in A_1}\cdots\mathbf{1}_{x_{s_n} \in A_n}} \) où les \( {s_i} \) sont fixés dans \( {[0,t]} \) et où les \( {A_i} \) sont des boréliens de \( {\mathbb{R}} \). Alors on a

\[ E\left[H(X_{[0,t]})\right] = E\left[H(W_{[0,t]}) Q((W_{[0,t]})\right] \]

\[ \log Q(W_{[0,t]}) := \int_0^t\!f(s)\,dB_s +\frac{1}{2}\int_0^t\!f(s)^2 ds. \]

La fonction \( {Q} \) est la densité recherchée par rapport à la loi de \( {(B_t)_{t\geq0}} \) sur \( {[0,t]} \). Comme la motivation de la question est de calculer des vraisemblances, on souhaite surtout exprimer \( {Q} \) en terme de \( {(X_t)_{t\geq0}} \) plutôt qu’en terme de \( {(B_t)_{t\geq0}} \). Or l’EDS donne immédiatement

\[ \log Q = \int_0^t\!f(s)\,dX_s – \frac{1}{2}\int_0^t\!\!f^2(s)\,ds, \]

qui est la \( {\log} \)-vraisemblance recherchée. Au passage, quand \( {f} \) est constante et égale à un réel \( {\theta} \), on trouve bien que le maximum de vraisemblance en \( {\theta} \) vaut \( {X_t/t} \). \`A présent, pour deux fonctions \( {f} \) et \( {g} \) et une trajectoire observée \( {X_{[0,t]}} \), le logarithme du rapport de vraisemblance vaut :

\[ \int_0^t\!(f-g)(s)\,dX_s – \frac{1}{2}\int_0^2\!(f^2-g^2)(s)\,ds. \]

On peut également exprimer \( {\log(Q)} \) encore plus simplement car la dérive \( {f(t)dt} \) dans l’EDS considérée est déterministe. Plus précisément, la formule d’Itô donne la formule d’intégration par parties suivante :

\[ \int_0^t\!f(s)\,dB_s = f(t) B_t – \int_0^t\!B_s f'(s)\,ds, \]

cf. Oksendal (théorème 4.1.5 page 46), et en utilisant l’EDS, la \( {\log} \)-vraisemblance \( {\log(Q)} \) s’écrit :

\[ f(t) X_t – \int_0^t\!X_s\,f'(s)\,ds – \frac{1}{2}\int_0^t\!f^2(s)\,ds. \]

Remarque à retenir : si \( {\mathcal{W}} \) is la mesure de Wiener standard et \( {\mathcal{W}_f} \) la mesure de Wiener translatée par une fonction \( {f} \), alors \( {\mathcal{W}_f} \) est absolument continue par rapport à \( {\mathcal{W}} \) si et seulement si \( {f} \) appartient à l’espace de Cameron-Martin, et la densité est donnée par la formule de Girsanov.

Ce billet est inspiré de questions posées par Anne-Laure Fougères et Béatrice Laurent-Bonneau.

Leave a Comment