Press "Enter" to skip to content

Libres pensées d'un mathématicien ordinaire Posts

Simulating heavy tails

How to quickly simulate a positive real random variable $X$ with a polynomial tail, say $\mathbb{P}(X>t)\sim (1+t)^{-\alpha}$ as $t\to\infty$, for some prescribed $\alpha>0$. We may use an already coded function provided by  our favorite computational software. Quoting my former PhD advisor, you just have to press the button rand or something similar. But how to proceed from scratch using uniform random numbers? The inversion method suggests to set $X=U^{-1/\alpha}-1$ where $U$ is a uniform random variable on $[0,1].$ The law of $X$ is Pareto with probability density function $t\in[0,\infty)\mapsto a(1+t)^{-(1+a)}$.  It is amusing to remark that the fractional part of $X$ has probability density function

$$t\in[0,1]\mapsto\alpha\sum_{n=1}^\infty\frac{1}{(n+t)^{1+\alpha}}=\alpha\zeta(\alpha+1,t+1)$$

where $\zeta$ is the Hurwitz Zeta function. The integer part of $X$ follows a Zipf like law.

To simulate an $n$-sample of $X$ with GNU-Octave one can use rand(1,n).^(-1/alpha)-1. The problem is that on computers, real numbers are replaced by  floats: there is a quantification limit.  Small values of $\alpha$ produce a lack of precision for large values, which are frequent for heavy tails…

Regarding the algorithm, it is not wise to use the inversion method to simulate a law on an infinite set (recall that there is no uniform law of such sets).  To simulate $X$, one should prefer an excess packing method as in Marsaglia & Tsang Ziggurat algorithm. This alternative approach has the advantage of being faster and a bit more precise (the code is heavier). It is implemented for exponential, Gaussian, and gamma distributions in the  rande, randg, randn functions of GNU-Octave 3.2. If you read French, you may take a look at the first chapter of my pedagogical book with Bernard Bercu. This approach constitutes an instance of  a general meta principle in Computer Science which states that one can speed up algorithms by using static pre-computations, related to the rigid structure of the problem. The fastest algorithm is not necessarily the shortest algorithm in terms of number of code lines.

If you are interested in the simulation of $\alpha$-stable distributions, $0<\alpha\leq 2$, you may take a look at the Chambers-Mallows-Stuck exact algorithm which generalizes the well known Box-Muller algorithm for Gaussians (suffers from the quantification limit).

For true (unpredictable) randomness, take a look at random.org and its GNU-R interface 😉

Leave a Comment
Syntax · Style · .