Recall that a random ${n\times n}$ Hermitian matrix ${M}$ belongs to the Gaussian Unitary Ensemble (GUE) when it admits a density proportional to

$M\mapsto \exp\left(-\frac{n}{2}\mathrm{Tr}(M^2)\right).$

If ${\lambda_1,\ldots,\lambda_n}$ are the eigenvalues of ${M}$ then the Wigner theorem gives

$\mathbb{E}\left(\frac{\mathrm{Card}\{1\leq i\leq n:\lambda_i\in I\}}{n}\right) \underset{n\rightarrow\infty}{\longrightarrow} \mu_{\mathrm{SC}}(I)$

for every interval ${I\subset\mathbb{R}}$, where ${\mu_{\mathrm{SC}}}$ is the semicircle distribution with density ${\frac{1}{2\pi}\sqrt{4-x^2}\mathbf{1}_{[-2,2]}(x)}$. In other words, the averaged empirical spectral distribution

$\mu_n :=\mathbb{E}\left(\frac{1}{n}\sum_{i=1}^n\delta_{\lambda_i}\right)$

tends weakly to ${\mu_{\mathrm{SC}}}$ as ${n\rightarrow\infty}$. There are many proofs of this result. We present in this post a simple analytic proof due to Pastur.
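The theorem is easy to test numerically. Below is a minimal NumPy sketch; the sampler is our own choice of normalization matching the density above (diagonal entries of variance ${1/n}$, off-diagonal entries with ${\mathbb{E}|M_{pq}|^2=1/n}$), and it compares the proportion of eigenvalues in ${I=[-1,1]}$ with ${\mu_{\mathrm{SC}}([-1,1])=\frac{1}{2\pi}(\sqrt{3}+\frac{2\pi}{3})\approx0.609}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Sample a GUE matrix with density proportional to exp(-n/2 Tr(M^2)):
# diagonal entries have variance 1/n, off-diagonal entries E|M_pq|^2 = 1/n.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = (A + A.conj().T) / (2 * np.sqrt(n))

eigs = np.linalg.eigvalsh(M)

# Fraction of eigenvalues in [-1, 1] versus the semicircle mass of [-1, 1].
frac = np.mean((eigs >= -1) & (eigs <= 1))
sc_mass = (np.sqrt(3) + 2 * np.pi / 3) / (2 * np.pi)
print(frac, sc_mass)
```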

Cauchy-Stieltjes transform. Set ${\mathbb{C}_+=\{z\in\mathbb{C}:\mathfrak{I}z>0\}}$. The Cauchy-Stieltjes transform ${s_\mu:\mathbb{C}_+\rightarrow\mathbb{C}}$ of a probability measure ${\mu}$ on ${\mathbb{R}}$ is given by

$s_\mu(z):=\int\!\frac{1}{\lambda-z}\,d\mu(\lambda).$

The knowledge of ${s_\mu}$ on ${\mathbb{C}_+}$ (and even on a small ball of ${\mathbb{C}_+}$, by analyticity) suffices to characterize ${\mu}$. We always have the universal bound, for any ${z\in\mathbb{C}_+}$:

$|s_\mu(z)|\leq\frac{1}{\mathfrak{I}z}.$

Furthermore, if ${{(\mu_n)}}$ is a sequence of probability measures on ${\mathbb{R}}$ such that ${s_{\mu_n}(z)\rightarrow s(z)}$ as ${n\rightarrow\infty}$ for every ${z\in\mathbb{C}_+}$ then there exists a unique probability measure ${\mu}$ on ${\mathbb{R}}$ such that ${s_\mu=s}$ and such that ${\mu_n\rightarrow\mu}$ weakly as ${n\rightarrow\infty}$.
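Both the definition and the universal bound are easy to check numerically. A small sketch for the two-point measure ${\mu=\frac{1}{2}(\delta_{-1}+\delta_1)}$, an example measure of our own choosing, whose transform is ${s_\mu(z)=z/(1-z^2)}$:

```python
import numpy as np

# Cauchy-Stieltjes transform of a discrete measure sum_i w_i * delta_{a_i}.
def stieltjes(atoms, weights, z):
    return np.sum(weights / (atoms - z))

atoms = np.array([-1.0, 1.0])
weights = np.array([0.5, 0.5])

for z in [1j, 0.5 + 2j, -1.0 + 0.3j]:
    s = stieltjes(atoms, weights, z)
    # Closed form for this measure: s(z) = z / (1 - z^2).
    assert np.isclose(s, z / (1 - z**2))
    # Universal bound |s(z)| <= 1 / Im(z).
    assert abs(s) <= 1 / z.imag + 1e-12
```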

Our problem thus reduces to showing that ${s_{\mu_n}(z)\rightarrow s_{\mu_{\mathrm{SC}}}(z)}$ as ${n\rightarrow\infty}$ for every ${z\in\mathbb{C}_+}$.

Resolvent. The resolvent ${G}$ of a Hermitian matrix ${H}$ at point ${z\in\mathbb{C}_+}$ is

$G(z)=(H-zI)^{-1}.$

The map ${z\mapsto G(z)}$ is holomorphic on ${\mathbb{C}_+}$ and

$\left\Vert G(z)\right\Vert_{2\rightarrow2}\leq \frac{1}{\mathfrak{I}z}.$

Denoting ${\tau:=\frac{1}{n}\mathrm{Tr}}$ and ${\mu:=\frac{1}{n}\sum_{k=1}^n\delta_{\lambda_k(H)}}$, we have

$s_\mu(z)=\tau(G(z)).$
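This identity, together with the operator-norm bound above, can be verified directly on any Hermitian matrix; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Any Hermitian matrix will do for these identities.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (A + A.conj().T) / 2
lam = np.linalg.eigvalsh(H)

z = 0.7 + 1.5j
G = np.linalg.inv(H - z * np.eye(n))

# tau(G(z)) equals the Stieltjes transform of the empirical spectral measure.
assert np.isclose(np.trace(G) / n, np.mean(1 / (lam - z)))
# Operator-norm bound ||G(z)|| <= 1/Im(z).
assert np.linalg.norm(G, 2) <= 1 / z.imag + 1e-12
```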

The resolvent identity states that if ${G_1}$ and ${G_2}$ are the resolvents at point ${z}$ of two Hermitian matrices ${H_1}$ and ${H_2}$ then

$G_2-G_1=-G_1(H_2-H_1)G_2.$

Used with ${H_1=M}$ and ${H_2=0}$, whose resolvent at point ${z}$ is ${-z^{-1}I}$, this gives, after taking expectations and multiplying both sides by ${z}$, and denoting by ${G}$ the resolvent of ${M}$ at point ${z}$,

$-I-z\mathbb{E}(G)=-\mathbb{E}(GM).$
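Note that the underlying matrix identity holds deterministically, before taking expectations: ${G(M-zI)=I}$ rearranges to ${-I-zG=-GM}$. A quick NumPy check (the sampler normalization is our choice; any Hermitian matrix works):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = (A + A.conj().T) / (2 * np.sqrt(n))  # GUE-normalized Hermitian matrix

z = 1j
G = np.linalg.inv(M - z * np.eye(n))

# -I - zG = -GM holds entrywise, exactly (up to float error).
assert np.allclose(-np.eye(n) - z * G, -G @ M)
```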

From this identity, we will obtain an equation satisfied by ${s_{\mu_n}=\mathbb{E}(\tau(G))}$.

Integration by parts. For a random vector ${X}$ with density ${e^{-V}}$ on ${\mathbb{R}^d}$ and a smooth ${F:\mathbb{R}^d\rightarrow\mathbb{C}}$, an integration by parts gives

$\mathbb{E}(F(X)\nabla V(X))=\mathbb{E}(\nabla F(X))$

as soon as ${e^{-V(x)}F(x)\rightarrow0}$ as ${\left\Vert x\right\Vert\rightarrow\infty}$. In particular, for every ${1\leq p,q\leq n}$, by denoting ${\partial_{M_{pq}}:=\frac{1}{2}(\partial_{\mathfrak{R} M_{pq}}-i\partial_{\mathfrak{I} M_{pq}})}$ if ${p\neq q}$ (and the usual partial derivative if ${p=q}$),

$\mathbb{E}(F(M)M_{pq}) =\frac{1}{n}\mathbb{E}(\partial_{M_{qp}}F(M)).$
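In dimension one, with ${V(x)=x^2/2}$ (standard Gaussian), the formula reads ${\mathbb{E}(F(X)X)=\mathbb{E}(F'(X))}$, the Gaussian integration by parts (Stein) identity. A Monte Carlo sketch with the test function ${F=\sin}$, for which both sides equal ${e^{-1/2}}$:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal(10**6)

# E(F(X) X) = E(F'(X)) for F = sin: both sides equal exp(-1/2).
lhs = np.mean(X * np.sin(X))
rhs = np.mean(np.cos(X))
print(lhs, rhs, np.exp(-0.5))
```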

Let us examine the case ${F(M)=G_{ab}}$ with ${1\leq a,b\leq n}$. The resolvent identity used twice gives

$\begin{array}{rcl} G_2-G_1 &=&-G_1(M_2-M_1)G_2 \\ &=&-G_1(M_2-M_1)(G_1-G_1(M_2-M_1)G_2)\\ &=&-G_1(M_2-M_1)G_1+o(\left\Vert M_2-M_1\right\Vert) \end{array}$

which gives in particular that for every ${1\leq a,b,p,q\leq n}$,

$\partial_{M_{pq}}G_{ab}=-G_{ap}G_{qb}.$
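This derivative formula is the entrywise version of the first-order expansion ${dG=-G(dM)G}$, which can be checked by finite differences in a Hermitian direction ${\Delta}$ (the matrix sizes and the direction below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = (A + A.conj().T) / (2 * np.sqrt(n))
z = 0.3 + 1j
I = np.eye(n)
G = np.linalg.inv(M - z * I)

# Hermitian perturbation in the direction of entry (p, q).
p, q, h = 2, 5, 1e-7
Delta = np.zeros((n, n))
Delta[p, q] = 1.0
Delta[q, p] = 1.0

Gh = np.linalg.inv(M + h * Delta - z * I)
fd = (Gh - G) / h        # finite-difference directional derivative
exact = -G @ Delta @ G   # first-order term -G (dM) G

assert np.allclose(fd, exact, atol=1e-5)
```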

As a consequence, using the integration by parts formula, we get, for every ${1\leq p\leq n}$,

$\mathbb{E}(GM)_{pp} =\sum_{q=1}^n\mathbb{E}(G_{pq}M_{qp}) =\frac{1}{n}\sum_{q=1}^n\mathbb{E}(\partial_{M_{pq}}G_{pq}) =-\frac{1}{n}\sum_{q=1}^n\mathbb{E}(G_{pp}G_{qq}) =-\mathbb{E}(G_{pp}\tau(G)).$

Taking normalized traces in the identity ${-I-z\mathbb{E}(G)=-\mathbb{E}(GM)}$, we obtain

$-1-zs_{\mu_n}(z)=\mathbb{E}(\tau(G)^2).$
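This equation is exact at every finite ${n}$, so it can be tested by Monte Carlo; the sample sizes below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(5)
n, N = 100, 300
z = 2j

tau_G = np.empty(N, dtype=complex)
for k in range(N):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    M = (A + A.conj().T) / (2 * np.sqrt(n))
    G = np.linalg.inv(M - z * np.eye(n))
    tau_G[k] = np.trace(G) / n

lhs = -1 - z * tau_G.mean()  # -1 - z s_{mu_n}(z)
rhs = (tau_G**2).mean()      # E(tau(G)^2)
print(lhs, rhs)
```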

Poincaré inequality. To compare ${\mathbb{E}(\tau(G)^2)}$ on the right-hand side with ${\mathbb{E}(\tau(G))^2=s_{\mu_n}(z)^2}$, we may use concentration, or alternatively, following Pastur, a Poincaré inequality. Namely, we set ${f(M):=\tau(G)}$ and write

$\left|\mathbb{E}(\tau(G)^2)-(\mathbb{E}\tau(G))^2\right| = \left|\mathbb{E}((\tau(G)-\mathbb{E}\tau(G))^2)\right| \leq \mathbb{E}(\left|\tau(G)-\mathbb{E}\tau(G)\right|^2) =\mathrm{Var}(f(M)).$

Now ${M}$, viewed as a Gaussian vector of its ${n^2}$ real coordinates, has a covariance matrix ${C}$ satisfying ${\left\Vert C\right\Vert_{2\rightarrow2}=\mathcal{O}(n^{-1})}$. Thus, the Poincaré inequality for the law of ${M}$ and the function ${f}$ gives

$\mathrm{Var}(f(M)) \leq \mathcal{O}(n^{-1})\mathbb{E}(\left\Vert \nabla f(M)\right\Vert_2^2).$

But

$\partial_{M_{pq}}\tau(G) =\frac{1}{n}\sum_a\partial_{M_{pq}}G_{aa} =-\frac{1}{n}\sum_aG_{ap}G_{qa} =-\frac{1}{n}(G^2)_{qp}$

and therefore

$\left\Vert \nabla f\right\Vert_2^2 =\sum_{p,q}\left|\partial_{M_{pq}}\tau(G)\right|^2 =\frac{1}{n^2}\left\Vert G^2\right\Vert_{\mathrm{HS}}^2 \leq \frac{1}{n}\left\Vert G^2\right\Vert_{2\rightarrow2}^2 \leq \frac{1}{n}\left\Vert G\right\Vert_{2\rightarrow2}^4 \leq \frac{1}{n(\mathfrak{I} z)^4},$

where we used ${\left\Vert A\right\Vert_{\mathrm{HS}}^2\leq n\left\Vert A\right\Vert_{2\rightarrow2}^2}$.

Putting all together, this gives finally

$\left|s_{\mu_n}(z)^2+zs_{\mu_n}(z)+1\right| = \mathcal{O}\left(\frac{1}{n^2(\mathfrak{I} z)^4}\right) \underset{n\rightarrow\infty}{\longrightarrow}0.$

Conclusion. The equation ${s^2+zs+1=0}$ in ${s}$ has the two roots ${s_\pm(z):=\frac{1}{2}(-z\pm\sqrt{z^2-4})}$. Since ${\mathfrak{I}s_\mu(z)=\int\frac{\mathfrak{I}z}{|\lambda-z|^2}\,d\mu(\lambda)>0}$ for every ${z\in\mathbb{C}_+}$, any accumulation point of the bounded sequence ${(s_{\mu_n}(z))_n}$ must be the unique root with positive imaginary part. This gives ${\lim_{n\rightarrow\infty}s_{\mu_n}(z)=\frac{1}{2}(-z+\sqrt{z^2-4})}$ for every ${z\in\mathbb{C}_+}$, with the branch of the square root making the imaginary part of the root positive, which is the Cauchy-Stieltjes transform of ${\mu_{\mathrm{SC}}}$. We conclude that ${\mu_n}$ converges weakly to ${\mu_{\mathrm{SC}}}$ as ${n\rightarrow\infty}$. One may drop the expectation in ${\mu_n}$ and obtain almost sure weak convergence by using concentration and the first Borel-Cantelli lemma.
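The conclusion can be illustrated numerically: for a single large GUE sample, the empirical Stieltjes transform is already close to the root of ${s^2+zs+1=0}$ with positive imaginary part. A sketch (the test points ${z}$ are arbitrary choices in ${\mathbb{C}_+}$):

```python
import numpy as np

def s_sc(z):
    # Root of s^2 + z s + 1 = 0 with Im(s) > 0: the transform of mu_SC.
    roots = np.roots([1, z, 1])
    return roots[roots.imag > 0][0]

rng = np.random.default_rng(6)
n = 1000
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = (A + A.conj().T) / (2 * np.sqrt(n))
lam = np.linalg.eigvalsh(M)

for z in [1j, 1 + 0.5j, -1.5 + 0.5j]:
    s_n = np.mean(1 / (lam - z))  # empirical Stieltjes transform at z
    assert abs(s_n - s_sc(z)) < 0.05
```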

Further reading. This method is quite powerful and can be extended to many models of random matrices, including Wigner matrices (not necessarily Gaussian), as explained in the book by Pastur and Shcherbina.