Press "Enter" to skip to content

Month: November 2014

About random generators of geometric distribution

Dice

Recently, a friend of mine, Arnaud Guyader, discovered by accident during a teaching session that the geometric distribution generator of GNU R, implemented in the rgeom function, is sub-optimal. The command help rgeom says rgeom uses the derivation as an exponential mixture of Poissons, see Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York. Page 480. Moreover, page 480 of this book says nothing about the geometric distribution.

/* Source code of rgeom.c in latest version of GNU R (2014-11-15).
*  Mathlib : A C Library of Special Functions
*  Copyright (C) 1998 Ross Ihaka and the R Core Team.
*  Copyright (C) 2000 The R Core Team
*
*  This program is free software; you can redistribute it and/or modify
*  it under the terms of the GNU General Public License as published by
*  the Free Software Foundation; either version 2 of the License, or
*  (at your option) any later version.
*
*  This program is distributed in the hope that it will be useful,
*  but WITHOUT ANY WARRANTY; without even the implied warranty of
*  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
*  GNU General Public License for more details.
*
*  You should have received a copy of the GNU General Public License
*  along with this program; if not, a copy is available at
*  http://www.r-project.org/Licenses/
*
*  SYNOPSIS
*
*    #include <Rmath.h>
*    double rgeom(double p);
*
*  DESCRIPTION
*
*    Random variates from the geometric distribution.
*
*  NOTES
*
*    We generate lambda as exponential with scale parameter
*    p / (1 – p).  Return a Poisson deviate with mean lambda.
*
*  REFERENCE
*
*    Devroye, L. (1986).
*    Non-Uniform Random Variate Generation.
*    New York: Springer-Verlag.
*    Page 480.
*/

#include “nmath.h”

double rgeom(double p)
{
if (!R_FINITE(p) || p <= 0 || p > 1) ML_ERR_return_NAN;

return rpois(exp_rand() * ((1 – p) / p));
}

The fact that the geometric distribution is an exponential mixture of Poisson is a special case of the more general fact that the negative binomial distribution, i.e. the convolution power of the geometric distribution also known as the Pascal distribution, is a gamma mixture of Poisson. Namely if \( {X\sim\mathrm{Gamma}(n,\ell)} \) with \( {\ell=p/(1-p)} \), and if \( {Y|X\sim\mathrm{Poisson}(X)} \) then \( {Y\sim\mathrm{NegativeBinomial}(n,p)} \), since for every \( {k\geq n} \), \( {\mathbb{P}(Y=k)} \) writes

\[ \int_0^\infty\!e^{-\lambda} \frac{\lambda^k}{k!} \frac{\ell^n\lambda^{n-1}}{\Gamma(n)}e^{-\lambda\ell}\,d\lambda =\frac{\ell^n}{k!\Gamma(n)} \int_0^\infty\!\lambda^{n+k-1}e^{-\lambda(\ell+1)}\,d\lambda =\frac{\Gamma(n+k)}{k!\Gamma(n)}(1-p)^kp^n. \]

For \( {n=1} \) we recover the geometric distribution. This mixture representation is useful for the simulation of negative binomial distributions with high integer parameter, but is not the best choice for the simulation of the special case of the geometric distribution.

The simulation of the geometric distribution is better done by using the so-called inversion method (generalized inverse of the cdf), which boils down to take the integer part of the standard exponential distribution generator: ceil(log(rand())/log(1-p)) since if \( {X\sim\mathrm{Exponential}(\lambda)} \) with \( {\lambda=-\log(1-p)} \) then

\[ \mathbb{P}(k\leq X\leq k+1) =e^{-k\lambda}-e^{-(k+1)\lambda} =e^{-\lambda k}(1-e^{-\lambda}) =(1-p)^kp. \]

Patch. Here is a possible patch to the rgeom.c of GNU R.

31,32c31
<  *    We generate lambda as exponential with scale parameter
<  *    p / (1 – p).  Return a Poisson deviate with mean lambda.

>  *    returns 1+ integer part of exponential with scale parameter -1 / log(1 – p).
39c38
<  *    Page 480.

>  *    Section X.2 page 498.
48c47
<     return rpois(exp_rand() * ((1 – p) / p));

>     return ceil(log(rand()) / log(1 – p));

Further reading. The simulation of the geometric distribution is discussed in Section X.2 page 498 of the book of Devroye: the inversion method is considered, but no words on the exponential mixture of Poisson. The simulation of the negative binomial distribution is discussed in Section X.4.7 page 543, and the Gamma mixture of Poisson is considered.

Other software. The random generator for the geometric distribution proposed presently by the GNU Scientific Library, by the GNU Octave, and by the Scilab software are based on the inversion method. The Distributions.jl package for the Julia software also uses the inversion method. The good news with open source software is that even if the help does not mention the algorithm that is used, you can always check the source code.

Leave a Comment
Syntax · Style · .