# Libres pensées d'un mathématicien ordinaire Posts

This short post is devoted to a couple of Julia programs.

Gaussian Unitary Ensemble. It is the Boltzmann-Gibbs measure with density proportional to
$$(x_1,\ldots,x_N)\in\mathbb{R}^N\mapsto\mathrm{e}^{-N\sum_{i=1}^N x_i^2}\prod_{i<j}(x_i-x_j)^2.$$
It is the distribution of the eigenvalues of a random $N\times N$ Hermitian matrix distributed according to the Gaussian probability measure with density proportional to
$$H\mapsto\mathrm{e}^{-N\mathrm{Tr}(H^2)}.$$
It is a famous exactly solvable model of mathematical physics. There is a nice formula for the mean empirical spectral distribution $\mathbb{E}\mu_N$ where $\mu_N=\frac{1}{N}\sum_{i=1}^N\delta_{x_i}$, namely
$$x\in\mathrm{R}\mapsto\frac{\mathrm{e}^{-\frac{N}{2}x^2}}{\sqrt{2\pi N}}\sum_{\ell=0}^{N-1}P_\ell^2(\sqrt{N}x)$$
where ${(P_\ell)}_{\ell\geq0}$ are the Hermite polynomials which are the orthonormal polynomials for the standard Gaussian distribution $\mathcal{N}(0,1)$. The computation of the Laplace transform and a subtle reasonning, see this paper, reveal that it converges as $N\to\infty$ towards the the Wigner semicircle distribution with density with respect to the Lebesgue measure
$$x\in\mathbb{R}\mapsto\frac{\sqrt{4-x^2}}{2\pi}\mathbf{1}_{x\in[-2,2]}.$$
Here is a nice plot followed by the Julia code used to produce it.

function normalized_hermite_polynomials_numeric(n,x)
if size(x,2) != 1
error("Second argument must be column vector.")
elseif n == 0
return ones(length(x),1)
elseif n == 1
return x
else
P = [ ones(length(x),1) x ]
for k = 1:n-1
P = [ P  x.*P[:,k+1]/sqrt(k+1)-P[:,k]*sqrt(1/(1+1/k)) ]
end
return P # matrix with columns P_0(x),...,P_n(x)
end
end #function

function gue_numeric(n,x)
if size(x,2) != 1
error("Second argument must be column vector.")
else
y = sqrt(n)*x
p = normalized_hermite_polynomials_numeric(n-1,y).^2
return exp(-y.^2/2) .* sum(p,2) / sqrt(2*pi*n)
end
end #function

using Plots
pyplot()
N = 8
X = collect(-3:.01:3)
Y = gue_numeric(N,X)
function semicircle(x)
if (abs(x)>2) return 0 else return sqrt(4-x^2)/(2*pi) end
end # function
Y = [Y [semicircle(x) for x in X]]
plot(X,Y,
title = @sprintf("GUE mean Empirical Spectral Distribution beta=2 N=%i",N),
titlefont = font("Times", 10),
label = ["GUE mean ESD" "SemiCircle"],
lw = 2,
linestyle = [:dash :solid],
linecolor = [:darkblue :darkred],
aspect_ratio = 2*pi,
grid = false,
border = false)
gui()
#savefig("gue.svg")
savefig("gue.png")


Here is, just for fun, a way to produce symbolically Hermite polynomials.

# Pkg.add("Polynomials") # http://juliamath.github.io/Polynomials.jl/latest/
using Polynomials

function unnormalized_hermite_polynomial_symbolic(n)
if n == 0
return Poly([1])
elseif n == 1
return Poly([0,1])
else
P = [Poly([1]) Poly([0,1])]
for k = 1:n-1
P = [ P Poly([0,1])*P[k+1]-k*P[k] ]
end
return P # better than just P[n+1]
end
end #function


Complex Ginibre Ensemble. It is the Boltzmann-Gibbs measure with density proportional to
$$(x_1,\ldots,x_N)\in\mathbb{C}^N\mapsto\mathrm{e}^{-N\sum_{i=1}^N|x_i|^2}\prod_{i<j}|x_i-x_j|^2.$$
It is the distribution of the eigenvalues of a random $N\times N$ complex matrix distributed according to the Gaussian probability measure with density proportional to
$$M\mapsto\mathrm{e}^{-N\mathrm{Tr}(MM^*)}$$
where $M^*=\overline{M}^\top$. Yet another exactly solvable model of mathematical physics, with a nice formula for the mean empirical spectral distribution $\mathbb{E}\mu_N$ where $\mu_N=\frac{1}{N}\sum_{i=1}^N\delta_{x_i}$, given by
$$z\in\mathbb{C}\mapsto\frac{\mathrm{e}^{-N|z|^2}}{\pi}\sum_{\ell=0}^{N-1}\frac{|\sqrt{N}z|^{2\ell}}{\ell!}$$
which is the analogue of the one given above for the Gaussian Unitary Ensemble. Using induction and integration by parts, it turns out that this density can be rewritten as
$$z\in\mathbb{C}\mapsto\int_{N|z|^2}^\infty\frac{u^{N-1}\mathrm{e}^{-u}}{\pi(N-1)!}\mathrm{d}u =\frac{\Gamma(N,N|z|^2)}{\pi}$$
where $\Gamma$ is the normalized incomplete Gamma function and where we used the identity
$$\mathrm{e}^{-r}\sum_{\ell=0}^n\frac{r^\ell}{\ell!}=\frac{1}{n!}\int_r^\infty u^n\mathrm{e}^{-u}\mathrm{d}u.$$
Note that the function $t\mapsto 1-\Gamma(N,t)$ is the cumulative distribution function of the Gamma distribution with shape parameter $N$ and scale parameter $1$. As a consequence, denoting $X_1,\ldots,X_N$ i.i.d. exponential random variables of unit mean, the density can be written as
$$z\in\mathbb{C}\mapsto\frac{1}{\pi}\mathbb{P}\left(\frac{X_1+\cdots+X_N}{N}\geq|z|^2\right).$$
Now the law of large numbers implies that this density converges pointwise as $N\to\infty$ towards the density of the uniform distribution on the unit disk of the complex plane
$$z\in\mathbb{C}\mapsto\frac{\mathbf{1}_{|z|\leq1}}{\pi}.$$
One can also use i.i.d. Poisson random variables $Y_1,\ldots,Y_N$ of mean $|z|^2$, giving
$$z\in\mathbb{C}\mapsto\frac{1}{\pi}\mathbb{P}\left(\frac{Y_1+\cdots+Y_N}{N}<1\right),$$
see this former post. Here are the plots with respect to $|z|$, followed by the Julia code.

# Pkg.add("Distributions") # https://juliastats.github.io/Distributions.jl
using Plots
using Distributions
N = 8
X = collect(0:.01:3)
Y = ccdf(Gamma(N,1),N*X.^2)/pi
function unit(x)
if (x>1) return 0 else return 1/pi end
end # function
Y = [Y [unit(x) for x in X]]
pyplot()
plot(X,Y,
title = @sprintf("Ginibre mean Empirical Spectral Distribution beta=2 N=%i",N),
titlefont = font("Times", 10),
label = ["Ginibre mean ESD" "Equilibrium"],
lw = 2,
linestyle = [:dash :solid],
linecolor = [:darkblue :darkred],
grid = false,
border = false)
gui()
#savefig("ginibre.svg")
savefig("ginibre.png")

Further reading. The Introducing Julia wikibook on Wikibooks.

This post presents an inequality for self-normalized sums without moment assumptions.

Symmetric laws. Recall that a probability distribution is symmetric when ${X}$ and ${-X}$ are equally distributed if ${X}$ is a random variable following this distribution. In this case ${\varepsilon=\mathrm{sign}(X)}$ and ${|X|}$ are independent and ${\varepsilon}$ follows a symmetric Rademacher distribution: ${\mathbb{P}(\varepsilon=\pm1)=1/2}$.

Concentration. Let ${X_1,\ldots,X_n}$ be independent real random variables with symmetric law and without atom at ${0}$. Then for any real ${r>0}$,

$\mathbb{P}\left(\frac{X_1+\cdots+X_n}{\sqrt{X_1^2+\cdots+X_n^2}}\geq r\right) \leq\mathrm{e}^{-\frac{r^2}{2}}.$

Note that this is available without any moment assumption on the random variables.

Proof. Thanks to the independence and symmetry assumptions, the random variables ${\varepsilon_1=\mathrm{sign}(X_1),\ldots,\varepsilon_n=\mathrm{sign}(X_n)}$ are iid, follow the symmetric Rademacher distribution, and are independent of ${|X_1|,\ldots,|X_n|}$. Now by conditioning we get

$\mathbb{P}\left(\frac{X_1+\cdots+X_n}{\sqrt{X_1^2+\cdots+X_n^2}}\geq r\right) =\mathbb{E}(\varphi_r(|X_1|,\ldots,|X_n|))$

where ${\varphi_r(c_1,\ldots,c_n)=\mathbb{P}((\varepsilon_1c_1+\cdots+\varepsilon_nc_n)/\sqrt{c_1^2+\cdots+c_n^2}\geq r)}$. We can assume that ${c_i>0}$ since ${\mathbb{P}(X_i=0)=0}$. It remains to use the Hoeffding inequality, which states that if ${Z_1,\ldots,Z_n}$ are independent centered and bounded real random variables then for any real ${r>0}$,

$\mathbb{P}\left(Z_1+\cdots+Z_n\geq r\right) \leq\exp\left(-\frac{2r^2}{\mathrm{osc}(Z_1)^2+\cdots+\mathrm{osc}(Z_n)^2}\right).$

where ${\mathrm{osc}(Z)=\max(Z)-\min(Z)}$. Here we use it with, for any ${i=1,\ldots,n}$,

$Z_i=\frac{c_i}{\sqrt{c_1^2+\cdots+c_n^2}}\varepsilon_i \quad\text{for which}\quad \mathrm{osc}(Z_i)^2=\frac{4c_i^2}{c_1^2+\cdots+c_n^2}.$

Indeed this gives ${ \varphi_r(c_1,\ldots,c_n) =\mathbb{P}\left(Z_1+\cdots+Z_n\geq r\right) \leq\mathrm{e}^{-\frac{r^2}{2}}}$.

Probabilistic interpretation. When ${X_1,\ldots,X_n}$ are iid and in ${L^2}$, then their mean is zero, and their variance is say ${\sigma^2>0}$. The law of large numbers gives ${\sqrt{X_1^2+\cdots+X_n^2}=\sqrt{n}(\sigma+o_{n\rightarrow\infty}(1))}$ almost surely. Therefore by the central limit theorem and Slutsky’s lemma we get ${(X_1+\cdots+X_n)/\sqrt{X_1^2+\cdots+X_n^2}\overset{\text{law}}{\longrightarrow}\mathcal{N}(0,1)}$ as ${n\rightarrow\infty}$.

Geometric interpretation. If ${X_1,\ldots,X_n}$ are iid standard Gaussian, then

$\frac{X_1+\cdots+X_n}{\sqrt{X_1^2+\cdots+X_n^2}} =\langle U_n,\theta_n\rangle$

where

$U_n=\sqrt{n}\frac{(X_1,\ldots,X_n)}{\sqrt{X_1^2+\cdots+X_n^2}} \quad\text{and}\quad \theta_n=\frac{(1,\ldots,1)}{\sqrt{n}}.$

The random vector ${U_n}$ is uniformly distributed on the sphere of ${\mathbb{R}^n}$ of radius ${\sqrt{n}}$, while the vector ${\theta_n}$ belongs to the unit sphere. Note that ${\langle U_n,\theta_n\rangle}$ is the law of the sum of the coordinates of a row or column of a uniform random orthogonal matrix.

Relation to Studentization. The result above can be related to the Studentized version of the empirical mean. Indeed, if one defined the empirical mean and the empirical variance

$\overline{X}_n=\frac{X_1+\cdots+X_n}{n} \quad\text{and}\quad \widehat\sigma^2_n=\frac{(X_1-\overline{X}_n)^2+\cdots+(X_n-\overline{X}_n)^2}{n-1}$

then using

$(n-1)\widehat\sigma_n^2 =X_1^2+\cdots+X_n^2-\frac{(X_1+\cdots+X_n)^2}{n}$

we get, for any ${r\geq0}$, after some algebra,

$\left\{\sqrt{n}\frac{\overline{X}_n}{\widehat\sigma_n}\geq r\right\} =\left\{\frac{X_1+\cdots+X_n}{\sqrt{X_1^2+\cdots+X_n^2}} \geq r\sqrt{\frac{n}{n-1+r^2}}\right\}.$

It follows then from the concentration inequality above that if ${X_1,\ldots,X_n}$ are independent, with symmetric law without atom at ${0}$, then for any ${r\geq0}$,

$\mathbb{P}\left(\sqrt{n}\frac{\overline{X}_n}{\widehat\sigma_n}\geq r\right) \leq\exp\left(-\frac{nr^2}{2(n-1+r^2)}\right).$

If ${X_1,\ldots,X_n}$ are iid centered Gaussian then ${\overline{X}_n\sim\mathcal{N}(0,1)}$ and ${\widehat{\sigma}^2_n\sim\chi^2(n-1)}$ are independent and their ratio ${\sqrt{n}\overline{X}_n/\widehat\sigma_n}$ follows the Student ${t(n-1)}$ law, of density proportional to ${x\mapsto 1/(1+t^2/(n-1))^{n/2}}$, which is in particular heavy tailed.

Extension and further reading. On this subject, one may take a look for instance at the article by Sergey Bobkov and Friedrich Götze entitled Concentration inequalities and limit theorems for randomized sums (2007). In another direction, it is also possible to show that a Cramér large deviation principle for self-normalized sums is available without any moment assumption, and we refer for instance to the probability survey by Qi-Man Shao and Qiying Wang entitled Self-normalized limit theorems: a survey (2013).

Many thanks to the mysterious L.

Sean O’Rourke pointed out on December 30, 2017 that a notation should be corrected in the statement of Lemma A.1 in the probability survey Around the circular law (2012) that I wrote years ago in collaboration with Charles Bordenave.

Indeed the definition of ${\sigma^2}$ should be corrected to

$\sigma^2 :=\min_{1\leq i,j\leq n}\mathrm{Var}(X_{ij}\mid|X_{i,j}|\leq a)>0.$

It was erroneously written

$\sigma^2 :=\min_{1\leq i,j\leq n}\mathrm{Var}(X_{ij}\mathbf{1}_{|X_{i,j}|\leq a})>0.$

Let us take this occasion for a back to basics about conditional variance and variance of truncation. Let ${X}$ be a real random variable on ${(\Omega,\mathcal{F},\mathbb{P})}$ and ${A\in\mathcal{F}}$ be an event. First the real number ${\mathbb{E}(X\mid A)=\mathbb{E}(X\mid\mathbf{1}_A=1)}$ is not the random variable ${\mathbb{E}(X\mid\mathbf{1}_A)}$. We have

$\mathbb{E}(X\mid\mathbf{1}_A) =\underbrace{\frac{\mathbb{E}(X\mathbf{1}_A)}{\mathbb{P}(A)}}_{\mathbb{E}(X\mid A)}\mathbf{1}_A +\underbrace{\frac{\mathbb{E}(X\mathbf{1}_{A^c})}{\mathbb{P}(A^c)}}_{\mathbb{E}(X\mid A^c)}\mathbf{1}_{A^c}.$

Note that this formula still makes sense when ${\mathbb{P}(A)=0}$ or ${\mathbb{P}(A)=1}$.

The quantity ${\mathbb{E}(X\mid A)}$ makes sense only if ${\mathbb{P}(A)>0}$, and in this case, the conditional variance of ${X}$ given the event ${A}$ is the real number given by

$\begin{array}{rcl} \mathrm{Var}(X\mid A) &=&\mathbb{E}((X-\mathbb{E}(X\mid A))^2\mid A)\\ &=&\mathbb{E}(X^2\mid A)-\mathbb{E}(X\mid A)^2\\ &=&\frac{\mathbb{E}(X^2\mathbf{1}_A)}{\mathbb{P}(A)} -\frac{\mathbb{E}(X\mathbf{1}_A)^2}{\mathbb{P}(A)^2}\\ &=& \frac{\mathbb{E}(X^2\mathbf{1}_A)\mathbb{P}(A)-\mathbb{E}(X\mathbf{1}_A)^2}{\mathbb{P}(A)^2}\\ &=&\mathbb{E}_A(X^2)-\mathbb{E}_A(X)^2=:\mathrm{Var}_A(X) \end{array}$

where ${\mathbb{E}_A}$ is the expectation with respect to the probability measure with density ${\mathbf{1}_A/\mathbb{P}(A)}$ with respect to ${\mathbb{P}}$. In particular, by the Cauchy–Schwarz inequality,

$\mathrm{Var}(X\mid A) \geq 0$

with equality if and only if ${X}$ and ${\mathbf{1}_A}$ are colinear.

Of course ${\mathrm{Var}(X\mid A)=0}$ if ${X}$ is constant. However ${\mathrm{Var}(X\mid A)}$ may vanish for a non-constant ${X}$. Indeed if ${A=\{|X|\leq a\}}$ and if ${X\sim\frac{1}{2}\delta_{a/2}+\frac{1}{2}\delta_{2a}}$ then ${X\mid A}$ is constant and equal to ${a/2}$. In this example, since ${X\mathbf{1}_A}$ is not a constant, this shows also that one cannot lower bound ${\mathrm{Var}(X\mid A)}$ with the variance of the truncation

$\mathrm{Var}(X\mathbf{1}_A)=\mathbb{E}(X^2\mathbf{1}_A)-\mathbb{E}(X\mathbf{1}_A)^2.$

Another notable correction. Mylène Maïda pointed out to me on February 27 2018 that at the bottom of page 14, just before the statement

$\sup_{z\in C}|n\varphi_{n,1}(\sqrt{n}z)-\pi^{-1}\mathbf{1}_{[0,1]}(|z|)|=0$

the compact set ${C}$ must be taken in ${\{z\in\mathbb{C}:|z|\neq1\}}$ and not on the whole complex plane ${\mathbb{C}}$. Indeed, when ${|z|=1}$, ${n\varphi_{n,1}(\sqrt{n}z)}$ tends as ${n\rightarrow\infty}$ to ${1/2}$, and not to ${\pi^{-1}}$, see for instance this former post for a one formula proof based on the central limit theorem for Poisson random variables. Anyway this is really not surprising since a sequence of continuous functions cannot converge uniformly to a discontinuous function.

This tiny back to basics post is devoted to a couple of bits of Probability and Statistics.

The central limit theorem cannot hold in probability. Let ${{(X_n)}_{n\geq1}}$ be iid real random variables with zero mean and unit variance. The central limit theorem (CLT) states that as ${n\rightarrow\infty}$,

$Z_n=\frac{X_1+\cdots+X_n}{\sqrt{n}} \overset{\text{law}}{\longrightarrow}\mathcal{N}(0,1).$

A frequently asked question by good students is to know if one can replace the convergence in law by the (stronger) convergence in probability. The answer is negative, and in particular the convergence cannot hold almost surely or in ${L^p}$. Let us examine why. Recall that the convergence in probability is stable by linear combinations and by subsequence extraction.

We proceed by contradiction. Suppose that ${Z_n\rightarrow Z_\infty}$ in probability. Then necessarily ${Z_\infty\sim\mathcal{N}(0,1)}$. Now, on the one hand, ${Z_{2n}-Z_n\rightarrow0}$ in probability. On the other hand,

$Z_{2n}-Z_n =\frac{1-\sqrt{2}}{\sqrt{2}}Z_n+\frac{X_{n+1}\cdots+X_{2n}}{\sqrt{2n}} =\frac{1-\sqrt{2}}{\sqrt{2}}Z_n+\frac{1}{\sqrt{2}}Z_n’.$

But ${Z_n’}$ is an independent copy of ${Z_n}$. Thus the CLT used twice gives ${Z_{2n}-Z_n\overset{\text{law}}{\longrightarrow}\mathcal{N}(0,\sigma^2)}$ with ${\sigma^2=(1-\sqrt{2})^2/2+1/2=2-\sqrt{2}\neq0}$, hence the contradiction.

Alternative proof. Set ${S_n=X_1+\cdots+X_n}$, and observe that

$\frac{S_{2n}-S_n}{\sqrt{n}}=\sqrt{2}Z_{2n}-Z_n.$

Now, if the CLT was in probability, the right hand side would converge in probability to ${\sqrt{2}Z_\infty-Z_\infty}$ which follows the law ${\mathcal{N}(0,(\sqrt{2}-1)^2)}$. On the other hand, since ${S_{2n}-S_n}$ has the law of ${S_n}$, by the CLT, the left hand side converges in law towards ${Z_\infty\sim\mathcal{N}(0,1)}$, hence the contradiction. This “reversed” proof was kindly suggested by Michel Ledoux.

Intermezzo: Slutsky lemma. The Slutsky lemma asserts that if

$X_n\overset{\text{law}}{\longrightarrow} X \quad\text{and}\quad Y_n\overset{\text{law}}{\longrightarrow} c$

with ${c}$ constant, then

$(X_n,Y_n)\overset{\text{law}}{\longrightarrow}(X,c),$

and in particular, ${f(X_n,Y_n)\overset{\text{law}}{\longrightarrow} f(X,c)}$ for every continuous ${f}$.

Let us prove it. Since ${Y_n\overset{\text{law}}{\longrightarrow} c}$ and ${c}$ is constant, we have ${Y_n\rightarrow c}$ in probability, and since for all ${t\in\mathbb{R}}$, the function ${y\mapsto \mathrm{e}^{ity}}$ is uniformly continuous on ${\mathbb{R}}$, we have that for all ${s,t\in\mathbb{R}}$ and all ${\varepsilon>0}$, there exists ${\eta>0}$ such that for large enough ${n}$,

$\begin{array}{rcl} |\mathbb{E}(\mathrm{e}^{isX_n+itY_n})-\mathbb{E}(\mathrm{e}^{isX_n+itc})| &\leq&\mathbb{E}(|\mathrm{e}^{itY_n}-\mathrm{e}^{itc}|\mathbf{1}_{|Y_n-c|\leq\eta})+2\mathbb{P}(|Y_n-c|>\eta)\\ &\leq& \varepsilon+2\varepsilon. \end{array}$

Alternatively we can use the Lipschitz property instead of the uniform continuity:

$\begin{array}{rcl} |\mathbb{E}(\mathrm{e}^{isX_n+itY_n})-\mathbb{E}(\mathrm{e}^{isX_n+itc})| &\leq&\mathbb{E}(\left|\mathrm{e}^{itY_n}-\mathrm{e}^{itc}\right|\mathbf{1}_{|Y_n-c|\leq\eta})+2\mathbb{P}(|Y_n-c|>\eta)\\ &\leq& |t|\eta+2\varepsilon. \end{array}$

On the other hand, since ${X_n\overset{\text{law}}{\longrightarrow}X}$, we have, for all ${s,t\in\mathbb{R}}$, as ${n\rightarrow\infty}$,

$\mathbb{E}(\mathrm{e}^{isX_n+itc})=\mathrm{e}^{itc}\mathbb{E}(\mathrm{e}^{isX_n}) \longrightarrow \mathrm{e}^{itc}\mathbb{E}(\mathrm{e}^{isX}) =\mathbb{E}(\mathrm{e}^{isX+itc}).$

The delta-method. Bizarrely this basic result, very useful in Statistics, appears to be unknown to many young probabilists. Suppose that as ${n\rightarrow\infty}$,

$a_n(Z_n-b_n)\overset{\text{law}}{\longrightarrow}L,$

where ${{(Z_n)}_{n\geq1}}$ is a sequence of real random variables, ${L}$ a probability distribution, and ${{(a_n)}_{n\geq1}}$ and ${{(b_n)}_{n\geq1}}$ deterministic sequences such that ${a_n\rightarrow\infty}$ and ${b_n\rightarrow b}$. Then for any ${\mathcal{C}^1}$ function ${f:\mathbb{R}\rightarrow\mathbb{R}}$ such that ${f'(b)\neq0}$, we have

$\frac{a_n}{f'(b)}(f(Z_n)-f(b_n))\overset{\text{law}}{\longrightarrow}L.$

The typical usage in Statistics is for the fluctuations of estimators say for ${a_n(Z_n-b_n)=\sqrt{n}(\widehat{\theta}_n-\theta)}$. Note that the rate in ${n}$ and the fluctuation law are not modified! Let us give a proof. By a Taylor formula or here the mean value theorem,

$f(Z_n)-f(b_n)=f'(W_n)(Z_n-b_n)$

where ${W_n}$ is a random variable lying between ${b_n}$ and ${Z_n}$. Since ${a_n\rightarrow\infty}$, the Slutsky lemma gives ${Z_n-b_n\rightarrow0}$ in law, and thus in probability since the limit is deterministic. As a consequence ${W_n-b_n\rightarrow0}$ in probability and thus ${W_n\rightarrow b}$ in probability. The continuity of ${f’}$ at point ${b}$ provides ${f'(W_n)\rightarrow f'(b)}$ in probability, hence ${f'(W_n)/f'(b)\rightarrow1}$ in probability, and again by Slutsky lemma,

$\frac{a_n}{f'(b)}(f(Z_n)-f(b_n)) =\frac{f'(W_n)}{f'(b)}a_n(Z_n-b_n) \overset{\text{law}}{\longrightarrow}L.$

If ${f'(b)=0}$ then one has to use a higher order Taylor formula, and the rate and fluctuation will be deformed by a power. Namely, suppose that ${f^{(1)}(b)=\cdots=f^{(r-1)}(b)=0}$ while ${f^{(r)}(b)\neq0}$, then, denoting ${L_r}$ the push forward of ${L}$ by ${x\mapsto x^r}$, we get

$\frac{a_n^rr!}{f^{(r)}(b)}(f(Z_n)-f(b_n)) \overset{\text{law}}{\longrightarrow}L_r.$

The delta-method can be of course generalized to sequences of random vectors, etc.