# Libres pensées d'un mathématicien ordinaire Posts

This post discusses some basic aspects of the Central Limit Theorem (CLT) in relation with the notions of localization and sparsity. Let ${G\sim\mathcal{N}(0,1)}$ and ${(X_n)_{n\geq1}}$ be a sequence of independent real random variables with, for every ${n\geq1}$,

$\mathbb{E}(X_n)=0\quad\text{and}\quad \sigma_n^2:=\mathbb{E}(X_n^2)<\infty.$

Let us define

$S_n:=\frac{X_1+\cdots+X_n}{s_n} \quad\text{where}\quad s_n^2:=\mathrm{Var}(X_1+\cdots+X_n)=\sigma_1^2+\cdots+\sigma_n^2.$

The Lindeberg CLT states that if, for every ${\varepsilon>0}$,

$\lim_{n\rightarrow\infty}\frac{1}{s_n^2}\sum_{k=1}^n \mathbb{E}(X_k^2\mathbf{1}_{\{|X_k|>\varepsilon s_n\}})=0$

then ${(S_n)_{n\geq1}}$ converges in distribution to the standard Gaussian ${\mathcal{N}(0,1)}$, in other words,

$\lim_{n\rightarrow\infty}\mathbb{P}(S_n\leq x)=\mathbb{P}(G\leq x) \quad\text{for all }x\in\mathbb{R}.$

Moreover, the Feller criterion states that if

$\lim_{n\rightarrow\infty}\max_{1\leq k\leq n}\frac{\sigma_k^2}{s_n^2}=0$

then the Lindeberg condition is necessary and sufficient for the convergence of ${(S_n)_{n\geq1}}$ in distribution to the standard Gaussian ${\mathcal{N}(0,1)}$. The Feller condition means that each single variance ${\sigma_k^2}$ represents an asymptotically negligible portion of the total variance ${s_n^2}$, as ${n}$ goes to infinity. In other words, the total variance is spread as ${n}$ goes to infinity.
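As a quick numerical illustration (a Python/NumPy sketch rather than GNU-Octave; the sample sizes and seed are mine), take bounded i.i.d. summands, uniform on ${[-\sqrt{3},\sqrt{3}]}$, so that ${s_n=\sqrt{n}}$ and the Lindeberg condition holds trivially: the indicators vanish as soon as ${\varepsilon s_n>\sqrt{3}}$.

```python
import math

import numpy as np

rng = np.random.default_rng(0)

# i.i.d. uniform on [-sqrt(3), sqrt(3)]: mean 0, variance 1, hence s_n = sqrt(n).
# The summands are bounded, so the Lindeberg condition holds trivially.
n, trials = 500, 10000
X = rng.uniform(-math.sqrt(3.0), math.sqrt(3.0), size=(trials, n))
S = X.sum(axis=1) / math.sqrt(n)

# Empirical CDF of S_n against the standard Gaussian CDF Phi at a few points.
for x in (-1.0, 0.0, 1.0):
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    print(f"x = {x:+.1f}: empirical {(S <= x).mean():.4f} vs Phi(x) {phi:.4f}")
```

The empirical probabilities already agree with ${\mathbb{P}(G\leq x)}$ to about two digits for these sample sizes.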

Conversely, and quite intuitively, one can guess that if ${(\sigma_n)_{n\geq1}}$ is localized then ${S_n}$ stays close to a sum of few terms for arbitrarily large ${n}$, and the CLT may fail due to a lack of averaging (homogenization). Of course, if the sequence ${(\sigma_n)_{n\geq1}}$ is sparse (extreme localization!), i.e. ${\mathrm{Card}\{n\geq1:\sigma_n\neq0\}<\infty}$, then the CLT fails. Beyond sparsity, let us seek a more subtle example for which one can check immediately from scratch that the CLT fails. Let us pick a sequence ${(\sigma_n)_{n\geq1}}$ of positive real numbers, and a sequence ${(U_n)_{n\geq1}}$ of bounded i.i.d. random variables on ${[-c,c]}$ with mean ${0}$ and variance ${1}$. If we define the random variable ${X_k:=\sigma_kU_k}$ then

$\mathbb{E}(X_k)=0 \quad\text{and}\quad \mathbb{E}(X_k^2)=\sigma_k^2$

and

$c^{-1}|S_n|\leq \frac{\left\Vert(\sigma_1,\ldots,\sigma_n)\right\Vert_1} {\left\Vert(\sigma_1,\ldots,\sigma_n)\right\Vert_2} =:\rho_n.$
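The norms-ratio ${\rho_n}$ is easy to explore numerically. A minimal Python/NumPy sketch (the helper function rho is mine), comparing a flat profile ${\sigma_k=1}$, for which ${\rho_n=\sqrt{n}}$, with the localized profile ${\sigma_k=2^{-k}}$, for which ${\rho_n}$ stays below ${\sqrt{3}}$:

```python
import numpy as np

def rho(sigma):
    """Norms-ratio ||sigma||_1 / ||sigma||_2 of a vector of standard deviations."""
    sigma = np.asarray(sigma, dtype=float)
    return sigma.sum() / np.sqrt((sigma ** 2).sum())

for n in (10, 100, 1000):
    flat = rho(np.ones(n))                      # sigma_k = 1: rho_n = sqrt(n)
    local = rho(2.0 ** -np.arange(1, n + 1))    # sigma_k = 2^-k: rho_n < sqrt(3)
    print(f"n = {n}: flat {flat:.3f}, localized {local:.3f}")
```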

Now, if ${(\rho_n)_{n\geq1}}$ is bounded then ${(S_n)_{n\geq1}}$ is bounded and thus the CLT fails. The norms-ratio ${\rho_n}$ measures the delocalization of the vector ${(\sigma_1,\ldots,\sigma_n)}$. Note that ${(\rho_n)_{n\geq1}}$ is bounded if ${(\sigma_n)_{n\geq1}}$ grows too fast or decays too fast. For instance, ${(\rho_n)_{n\geq1}}$ is bounded if ${(\sigma_n)_{n\geq1}\in\ell^1}$. On the other hand, since ${s_n^2=s_{n-1}^2+\sigma_n^2}$, the Cauchy-Schwarz inequality gives

$\rho_n \leq \frac{1}{s_n}\sum_{k=1}^{n-1}\sigma_k +\frac{\sigma_n}{s_n} \leq \sqrt{n-1}\,\frac{s_{n-1}}{\sigma_n}+1$

and ${(\rho_n)_{n\geq1}}$ is bounded e.g. if ${\sigma_n=s_{n-1}\sqrt{n}}$. The delocalization control is an essential aspect of the CLT. The Berry-Esséen theorem, which constitutes a quantitative CLT, also involves a norms-ratio measuring localization: if ${(X_n)_{n\geq1}}$ are independent real random variables with

$\mathbb{E}(X_n)=0 \quad\text{and}\quad \sigma^2_n:=\mathbb{E}(X_n^2) \quad\text{and}\quad \tau_n^3:=\mathbb{E}(|X_n|^3)$

and if ${V_n:=(X_1,\ldots,X_n)}$ and ${S_n}$ is defined from ${(X_n)_{n\geq1}}$ as before then for all ${n\geq1}$,

$\sup_{x\in\mathbb{R}} \left|\mathbb{P}\left(S_n\leq x\right)-\mathbb{P}(G\leq x)\right| \leq 6\frac{\mathbb{E}(\left\Vert V_n\right\Vert_3^3)} {\mathbb{E}(\left\Vert V_n\right\Vert_2^2)^{3/2}} = 6\frac{\tau_1^3+\cdots+\tau_n^3}{(\sigma_1^2+\cdots+\sigma_n^2)^{3/2}}.$
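For a concrete instance of this bound, take Rademacher summands, uniform on ${\{-1,1\}}$, so that ${\sigma_k=\tau_k=1}$ and the right hand side reads ${6/\sqrt{n}}$. A Python/NumPy sketch (sample sizes and seed are mine) comparing the empirical sup-distance with the bound:

```python
import math

import numpy as np

rng = np.random.default_rng(1)

# Rademacher summands: sigma_k = tau_k = 1, so the bound reads 6 / sqrt(n).
n, trials = 100, 50000
S = rng.choice([-1.0, 1.0], size=(trials, n)).sum(axis=1) / math.sqrt(n)

# Empirical CDF of S_n against Phi on a grid, and the sup of the gap.
xs = np.linspace(-3.0, 3.0, 601)
Phi = np.array([0.5 * (1.0 + math.erf(x / math.sqrt(2.0))) for x in xs])
emp = (S[None, :] <= xs[:, None]).mean(axis=1)
sup_dist = float(np.abs(emp - Phi).max())

bound = 6.0 / math.sqrt(n)
print(f"empirical sup distance {sup_dist:.4f} <= Berry-Esseen bound {bound:.4f}")
```

The actual distance is an order of magnitude below the bound here, which is expected: the constant ${6}$ is far from optimal.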

You may take a look at the recent work of Klartag and Sodin on the role of delocalization in the Berry-Esséen theorem.

Measuring (de)localization with norms-ratios is a classical trick in mathematics and physics. It plays a role, for instance, for eigenvectors in the formalization of the Anderson localization phenomenon for random Schrödinger operators, and in the recent work of Erdős, Schlein, Ramírez, Yau, Tao and Vu on the universality of eigenvalue spacings for models of random matrices. The norms-ratio is also related to embeddings in the local theory of Banach spaces.

This post is inspired from a question asked by my friend Sébastien Blachère.

How to quickly simulate a positive real random variable $latex X$ with a polynomial tail, say $latex \mathbb{P}(X>t)\sim (1+t)^{-\alpha}$ as $latex t\to\infty$, for some prescribed $latex \alpha>0$? We may use an already coded function provided by our favorite computational software. Quoting my former PhD advisor, you just have to press the button rand or something similar. But how to proceed from scratch using uniform random numbers? The inversion method suggests to set $latex X=U^{-1/\alpha}-1$ where $latex U$ is a uniform random variable on $latex [0,1]$. The law of $latex X$ is Pareto with probability density function $latex t\in[0,\infty)\mapsto \alpha(1+t)^{-(1+\alpha)}$. It is amusing to remark that the fractional part of $latex X$ has probability density function

$latex \displaystyle t\in[0,1]\mapsto\alpha\sum_{n=1}^\infty\frac{1}{(n+t)^{1+\alpha}}=\alpha\zeta(\alpha+1,t+1)$

where $latex \zeta$ is the Hurwitz zeta function. The integer part of $latex X$ follows a Zipf-like law.
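Here is a Python/NumPy sketch of the inversion method above (sample size and seed are mine), together with a check that the fractional-part density integrates to one, which also follows by telescoping since $latex \int_0^1\alpha(n+t)^{-(1+\alpha)}dt=n^{-\alpha}-(n+1)^{-\alpha}$:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 1.5

# Inversion method: if U is uniform on [0,1] then X = U^(-1/alpha) - 1
# satisfies P(X > t) = (1 + t)^(-alpha) exactly.
U = rng.random(200000)
X = U ** (-1.0 / alpha) - 1.0
for t in (1.0, 3.0, 10.0):
    print(f"t = {t}: empirical tail {(X > t).mean():.4f}, exact {(1.0 + t) ** -alpha:.4f}")

# Density of the fractional part: alpha * sum_{n>=1} (n+t)^(-(1+alpha)) on [0,1].
# Its integral telescopes to 1; check with a truncated series and a trapezoid rule.
grid = np.linspace(0.0, 1.0, 1001)
terms = np.arange(1, 5001)[:, None]
dens = alpha * ((terms + grid[None, :]) ** -(1.0 + alpha)).sum(axis=0)
h = grid[1] - grid[0]
integral = float(((dens[:-1] + dens[1:]) * 0.5 * h).sum())
print("integral of fractional-part density:", integral)
```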

To simulate an $latex n$-sample of $latex X$ with GNU-Octave one can use rand(1,n).^(-1/alpha)-1. The problem is that on computers, real numbers are replaced by floats: there is a quantization limit. Small values of $latex \alpha$ produce a lack of precision for large values, which are frequent for heavy tails…

Regarding the algorithm, it is not wise to use the inversion method to simulate a law on an infinite set (recall that there is no uniform law on such sets). To simulate $latex X$, one should prefer a rejection method with precomputed tables, as in the Marsaglia & Tsang Ziggurat algorithm. This alternative approach has the advantage of being faster and a bit more precise (though the code is heavier). It is implemented for the exponential, Gaussian, and gamma distributions in the rande, randn, randg functions of GNU-Octave 3.2. If you read French, you may take a look at the first chapter of my pedagogical book with Bernard Bercu. This approach constitutes an instance of a general meta-principle in Computer Science which states that one can speed up algorithms by using static pre-computations, related to the rigid structure of the problem. The fastest algorithm is not necessarily the shortest algorithm in terms of number of lines of code.

If you are interested in the simulation of $latex \alpha$-stable distributions, $latex 0<\alpha\leq 2$, you may take a look at the Chambers-Mallows-Stuck exact algorithm, which generalizes the well known Box-Muller algorithm for Gaussians (and also suffers from the quantization limit).
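For the symmetric case $latex \beta=0$, the Chambers-Mallows-Stuck recipe takes $latex V$ uniform on $latex (-\pi/2,\pi/2)$ and $latex W$ exponential of unit mean, and combines them as below. A Python/NumPy sketch (the function name stable_sym is mine); for $latex \alpha=2$ the formula collapses to $latex 2\sqrt{W}\sin(V)$, a Gaussian $latex \mathcal{N}(0,2)$, which is the Box-Muller mechanism in disguise:

```python
import numpy as np

def stable_sym(alpha, size, rng):
    """Symmetric alpha-stable sample via the Chambers-Mallows-Stuck (beta = 0) formula."""
    V = rng.uniform(-np.pi / 2.0, np.pi / 2.0, size)  # uniform angle
    W = rng.exponential(1.0, size)                    # unit-mean exponential
    if alpha == 1.0:
        return np.tan(V)                              # Cauchy case
    return (np.sin(alpha * V) / np.cos(V) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * V) / W) ** ((1.0 - alpha) / alpha))

rng = np.random.default_rng(3)
# Sanity check: alpha = 2 gives a centered Gaussian of variance 2.
Z = stable_sym(2.0, 200000, rng)
print(f"std for alpha = 2: {Z.std():.3f} (expected sqrt(2) ~ 1.414)")
```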

For true (unpredictable) randomness, take a look at random.org and its GNU-R interface 😉

The logarithmic potential is a classical object of potential theory, intimately connected with the two-dimensional Laplacian. It also appears in free probability theory via the free entropy, and in partial differential equations, e.g. Patlak-Keller-Segel models. This post concerns only its use for the spectra of non-Hermitian random matrices.

Let ${\mathcal{P}(\mathbb{C})}$ be the set of probability measures on ${\mathbb{C}}$ which integrate ${\log\left|\cdot\right|}$ in a neighborhood of infinity. For every ${\mu\in\mathcal{P}(\mathbb{C})}$, the logarithmic potential ${U_\mu}$ of ${\mu}$ on ${\mathbb{C}}$ is the function

$U_\mu:\mathbb{C}\rightarrow(-\infty,+\infty]$

defined for every ${z\in\mathbb{C}}$ by

$U_\mu(z)=-\int_{\mathbb{C}}\!\log|z-z'|\,\mu(dz') =-(\log\left|\cdot\right|*\mu)(z). \ \ \ \ \ (1)$

This integral is well defined in ${(-\infty,+\infty]}$ thanks to the assumption ${\mu\in\mathcal{P}(\mathbb{C})}$. Note that ${U_\mu(z)=+\infty}$ if ${\mu}$ has an atom at the point ${z}$. For the circular law ${\mathcal{U}_\sigma}$ of density

$(\sigma^2\pi)^{-1}\mathbf{1}_{\{z\in\mathbb{C}:|z|\leq\sigma\}}$

we have, for all ${z\in\mathbb{C}}$, the remarkable formula

$U_{\mathcal{U}_\sigma}(z)= \begin{cases} \log(\sigma^{-1})-\log|z\sigma^{-1}| & \text{if } |z|>\sigma, \\ \log(\sigma^{-1})-\frac{1}{2}(|z\sigma^{-1}|^2-1) & \text{if } |z|\leq\sigma, \end{cases} \ \ \ \ \ (2)$

see e.g. the book of Saff and Totik. Let ${\mathcal{D}'(\mathbb{C})}$ be the set of Schwartz-Sobolev distributions (generalized functions). Since ${\log\left|\cdot\right|}$ is Lebesgue locally integrable on ${\mathbb{C}}$, one can check by using the Fubini theorem that ${U_\mu}$ is Lebesgue locally integrable on ${\mathbb{C}}$. In particular,

$U_\mu<\infty$

Lebesgue almost everywhere (a.e.), and

$U_\mu\in\mathcal{D}'(\mathbb{C}).$

Since ${\log\left|\cdot\right|}$ is the fundamental solution of the Laplace equation in ${\mathbb{C}}$, we have, in ${\mathcal{D}'(\mathbb{C})}$,

$\Delta U_\mu=-2\pi\mu. \ \ \ \ \ (3)$

In other words, for every smooth and compactly supported “test function” ${\varphi:\mathbb{C}\rightarrow\mathbb{R}}$,

$\int_{\mathbb{C}}\!\Delta\varphi(z)U_\mu(z)\,dz =-2\pi\int_{\mathbb{C}}\!\varphi(z)\,\mu(dz).$
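As a sanity check of (3), take ${\mu=\delta_0}$, so that ${U_\mu=-\log\left|\cdot\right|}$ and the identity above predicts ${\int_{\mathbb{C}}\Delta\varphi(z)U_\mu(z)\,dz=-2\pi\varphi(0)}$. With the radial (rapidly decaying rather than compactly supported) test function ${\varphi(z)=e^{-|z|^2}}$, polar coordinates reduce the left hand side to a one-dimensional integral, easy to evaluate numerically in Python/NumPy (the grid parameters are mine):

```python
import numpy as np

# For mu = delta_0, U_mu = -log|.| and (3) predicts
#   int_C Delta(phi)(z) * U_mu(z) dz = -2 * pi * phi(0).
# With the radial phi(z) = exp(-|z|^2): phi(0) = 1 and
#   Delta(phi)(z) = (4|z|^2 - 4) exp(-|z|^2),
# so in polar coordinates the integral is 2*pi * int_0^oo (4r^2-4) e^{-r^2} (-log r) r dr.
r = np.linspace(1e-6, 10.0, 400001)
h = r[1] - r[0]
integrand = (4.0 * r ** 2 - 4.0) * np.exp(-r ** 2) * (-np.log(r)) * r
lhs = 2.0 * np.pi * float(((integrand[:-1] + integrand[1:]) * 0.5 * h).sum())
print(lhs, "vs", -2.0 * np.pi)
```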

The identity (3) means that ${\mu\mapsto U_\mu}$ is the distributional Green potential of the Laplacian in ${\mathbb{C}}$. The function ${U_\mu}$ contains enough information to recover ${\mu}$:

Lemma 1 (Unicity) For every ${\mu,\nu\in\mathcal{P}(\mathbb{C})}$, if ${U_\mu=U_\nu}$ a.e. then ${\mu=\nu}$.

Proof: Since ${U_\mu=U_\nu}$ in ${\mathcal{D}'(\mathbb{C})}$, we get ${\Delta U_\mu=\Delta U_\nu}$ in ${\mathcal{D}'(\mathbb{C})}$. Now (3) gives ${\mu=\nu}$ in ${\mathcal{D}'(\mathbb{C})}$, and thus ${\mu=\nu}$ as measures since ${\mu}$ and ${\nu}$ are Radon measures. ☐
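Before turning to matrices, formula (2) itself can be checked by Monte Carlo: sample the circular law with ${\sigma=1}$ by rejection from the square, and average ${-\log|z-\cdot|}$. A Python/NumPy sketch (sample size, seed and the name U_mc are mine):

```python
import numpy as np

rng = np.random.default_rng(4)

# Uniform sample on the unit disk (sigma = 1) by rejection from the square.
P = rng.uniform(-1.0, 1.0, size=(400000, 2))
P = P[(P ** 2).sum(axis=1) <= 1.0]
Zs = P[:, 0] + 1j * P[:, 1]

def U_mc(z):
    """Monte Carlo value of the logarithmic potential of the circular law at z."""
    return -np.log(np.abs(z - Zs)).mean()

# Formula (2) with sigma = 1: -log|z| outside the disk, (1 - |z|^2)/2 inside.
for z in (2.0 + 0.0j, 0.5 + 0.0j):
    closed = -np.log(abs(z)) if abs(z) > 1.0 else 0.5 * (1.0 - abs(z) ** 2)
    print(f"z = {z}: Monte Carlo {U_mc(z):.4f}, closed form {closed:.4f}")
```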

Let ${A}$ be an ${n\times n}$ complex matrix. We define the discrete probability measure on ${\mathbb{C}}$

$\mu_A:=\frac{1}{n}\sum_{k=1}^n\delta_{\lambda_k(A)}$

where ${\lambda_1(A),\ldots,\lambda_n(A)}$ are the eigenvalues of ${A}$, i.e. the roots in ${\mathbb{C}}$ of its characteristic polynomial ${P_A(z):=\det(A-zI)}$. We also define the discrete probability measure on ${[0,\infty)}$

$\nu_A:=\frac{1}{n}\sum_{k=1}^n\delta_{s_k(A)}$

where ${s_1(A),\ldots,s_n(A)}$ are the singular values of ${A}$, i.e. the eigenvalues of the positive semidefinite Hermitian matrix ${\sqrt{AA^*}}$. We have now

$U_{\mu_A}(z) =-\int_{\mathbb{C}}\!\log\left| z'-z\right|\,\mu_A(dz') =-\frac{1}{n}\log\left|\det(A-zI)\right| =-\frac{1}{n}\log\left| P_A(z)\right|$

for every ${z\in\mathbb{C}\setminus\{\lambda_1(A),\ldots,\lambda_n(A)\}}$. We have also the alternative expression

$U_{\mu_A}(z) =-\frac{1}{n}\log\det(\sqrt{(A-zI)(A-zI)^*}) =-\int_0^\infty\!\log(t)\,\nu_{A-zI}(dt). \ \ \ \ \ (4)$
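The identity (4) is easy to test numerically: for a random ${A}$, the logarithmic potential computed from the eigenvalues coincides, up to rounding, with the integral of ${\log}$ against the singular values of ${A-zI}$. A Python/NumPy sketch (matrix size, the point ${z}$ and the seed are mine):

```python
import numpy as np

rng = np.random.default_rng(5)
n, z = 8, 0.3 + 0.7j
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

lam = np.linalg.eigvals(A)                               # eigenvalues of A
s = np.linalg.svd(A - z * np.eye(n), compute_uv=False)   # singular values of A - zI

U_eig = -np.log(np.abs(lam - z)).mean()  # -(1/n) log|P_A(z)| via the eigenvalues
U_sv = -np.log(s).mean()                 # -int log(t) nu_{A-zI}(dt) via the singular values
print(U_eig, U_sv)
```

Both quantities are ${-\frac{1}{n}\log|\det(A-zI)|}$, since ${|\det(A-zI)|=\prod_k|\lambda_k(A)-z|=\prod_k s_k(A-zI)}$.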

The identity (4) bridges the eigenvalues with the singular values, and is at the heart of the following lemma, which allows one to deduce the convergence of ${\mu_A}$ from that of ${\nu_{A-zI}}$. The strength of this Hermitization lies in the fact that, contrary to the eigenvalues, one can control the singular values with the entries of the matrix. The price paid here is the introduction of the auxiliary variable ${z}$ and the uniform integrability. We recall that on a Borel measurable space ${(E,\mathcal{E})}$, we say that a Borel function ${f:E\rightarrow\mathbb{R}}$ is uniformly integrable for a sequence of probability measures ${(\eta_n)_{n\geq1}}$ on ${E}$ when

$\lim_{t\rightarrow\infty}\varlimsup_{n\rightarrow\infty}\int_{\{|f|>t\}}\!|f|\,d\eta_n=0. \ \ \ \ \ (5)$

We will use this property as follows: if ${(\eta_n)_{n\geq1}}$ converges weakly to ${\eta}$ and ${f}$ is continuous and uniformly integrable for ${(\eta_n)_{n\geq1}}$, then ${f}$ is ${\eta}$-integrable and ${\lim_{n\rightarrow\infty}\int\!f\,d\eta_n=\int\!f\,d\eta}$. The idea of using Hermitization goes back at least to Girko. However, Theorem 2 and Lemma 3 below are inspired from the approach of Tao and Vu.

Theorem 2 (Girko Hermitization) Let ${(A_n)_{n\geq1}}$ be a sequence of complex random matrices where ${A_n}$ is ${n\times n}$ for every ${n\geq1}$, defined on a common probability space. Suppose that for a.a. ${z\in\mathbb{C}}$, there exists a probability measure ${\nu_z}$ on ${[0,\infty)}$ such that a.s.

• (i) ${(\nu_{A_n-zI})_{n\geq1}}$ converges weakly to ${\nu_z}$ as ${n\rightarrow\infty}$
• (ii) ${\log(\cdot)}$ is uniformly integrable for ${\left(\nu_{A_n-zI}\right)_{n\geq1}}$

Then there exists a probability measure ${\mu\in\mathcal{P}(\mathbb{C})}$ such that

• (j) a.s. ${(\mu_{A_n})_{n\geq1}}$ converges weakly to ${\mu}$ as ${n\rightarrow\infty}$
• (jj) for a.a. ${z\in\mathbb{C}}$,

$U_\mu(z)=-\int_0^\infty\!\log(t)\,\nu_z(dt).$

Moreover, if ${(A_n)_{n\geq1}}$ is deterministic, then the statements hold without the “a.s.”

Proof: Let ${z}$ and ${\omega}$ be such that (i-ii) hold. For every ${1\leq k\leq n}$, define

$a_{n,k}:=|\lambda_k(A_n-zI)| \quad\text{and}\quad b_{n,k}:=s_k(A_n-zI)$

and set ${\nu:=\nu_z}$. Note that ${\mu_{A_n-zI}=\mu_{A_n}*\delta_{-z}}$. Thanks to the Weyl inequalities and to the assumptions (i-ii), one can use Lemma 3 below, which gives that ${(\mu_{A_n})_{n\geq1}}$ is tight, that ${\log\left| z-\cdot\right|}$ is uniformly integrable for ${(\mu_{A_n})_{n\geq1}}$, and that

$\lim_{n\rightarrow\infty}U_{\mu_{A_n}}(z)=-\int_0^\infty\!\log(t)\,\nu_z(dt)=:U(z).$

Consequently, a.s. ${\mu\in\mathcal{P}(\mathbb{C})}$ and ${U_\mu=U}$ a.e. for every adherence value ${\mu}$ of ${(\mu_{A_n})_{n\geq1}}$. Now, since ${U}$ does not depend on ${\mu}$, by Lemma 1, a.s. ${\left(\mu_{A_n}\right)_{n\geq1}}$ has a unique adherence value ${\mu}$, and since ${(\mu_{A_n})_{n\geq1}}$ is tight, ${(\mu_{A_n})_{n\geq1}}$ converges weakly to ${\mu}$ by the Prohorov theorem. Finally, by (3), ${\mu}$ is deterministic since ${U}$ is deterministic, and (j-jj) hold. ☐

The following lemma is in a way the skeleton of the Girko Hermitization of Theorem 2. It essentially states a propagation of uniform logarithmic integrability for a pair of triangular arrays, provided that a logarithmic majorization holds between the arrays. See arXiv:0808.1502v2 for a proof.

Lemma 3 (Logarithmic majorization and uniform integrability) Let ${(a_{n,k})_{1\leq k\leq n}}$ and ${(b_{n,k})_{1\leq k\leq n}}$ be two triangular arrays in ${[0,\infty)}$. Define the discrete probability measures

$\mu_n:=\frac{1}{n}\sum_{k=1}^n\delta_{a_{n,k}} \quad\text{and}\quad \nu_n:=\frac{1}{n}\sum_{k=1}^n\delta_{b_{n,k}}.$

If the following properties hold

• (i) ${a_{n,1}\geq\cdots\geq a_{n,n}}$ and ${b_{n,1}\geq\cdots\geq b_{n,n}}$ for ${n\gg1}$,
• (ii) ${\prod_{i=1}^k a_{n,i} \leq \prod_{i=1}^k b_{n,i}}$ for every ${1\leq k\leq n}$ for ${n\gg1}$,
• (iii) ${\prod_{i=k}^n b_{n,i} \leq \prod_{i=k}^n a_{n,i}}$ for every ${1\leq k\leq n}$ for ${n\gg1}$,
• (iv) ${(\nu_n)_{n\geq1}}$ converges weakly to some probability measure ${\nu}$ as ${n\rightarrow\infty}$,
• (v) ${\log(\cdot)}$ is uniformly integrable for ${(\nu_n)_{n\geq1}}$,

then

• (j) ${(\mu_n)_{n\geq1}}$ is tight,
• (jj) ${\log(\cdot)}$ is uniformly integrable for ${(\mu_n)_{n\geq1}}$,
• (jjj) we have, as ${n\rightarrow\infty}$,

$\int_0^\infty\!\log(t)\,\mu_n(dt) =\int_0^\infty\!\log(t)\,\nu_n(dt)\rightarrow\int_0^\infty\!\log(t)\,\nu(dt),$

and in particular, for every adherence value ${\mu}$ of ${(\mu_n)_{n\geq1}}$,

$\int_0^\infty\!\log(t)\,\mu(dt)=\int_0^\infty\!\log(t)\,\nu(dt).$

The logarithmic potential is related to the Cauchy-Stieltjes transform of ${\mu}$ via

$S_\mu(z) :=\int_{\mathbb{C}}\!\frac{1}{z'-z}\,\mu(dz') =(\partial_x-i\partial_y)U_\mu(z) \quad\text{and thus}\quad (\partial_x+i\partial_y)S_\mu=-2\pi\mu$

in ${\mathcal{D}'(\mathbb{C})}$. The term “logarithmic potential” comes from the fact that ${U_\mu}$ is the electrostatic potential of ${\mu}$ viewed as a distribution of charges in ${\mathbb{C}\equiv\mathbb{R}^2}$. The logarithmic energy

$\mathcal{E}(\mu) :=\int_{\mathbb{C}}\!U_\mu(z)\,\mu(dz) =-\int_{\mathbb{C}}\int_{\mathbb{C}}\!\log\left| z-z’\right|\,\mu(dz)\mu(dz’)$

is up to a sign the Voiculescu free entropy of ${\mu}$ in free probability theory. The circular law ${\mathcal{U}_\sigma}$ minimizes ${\mu\mapsto\mathcal{E}(\mu)}$ under a second moment constraint. In the spirit of (4) and beyond matrices, the Brown spectral measure of a nonnormal bounded operator ${a}$ is

$\mu_a:=(4\pi)^{-1}\Delta\int_0^\infty\!\log(t)\,\nu_{a-zI}(dt)$

where ${\nu_{a-zI}}$ is the spectral distribution of the self-adjoint operator ${(a-zI)(a-zI)^*}$. Due to the logarithm, the Brown spectral measure ${\mu_a}$ depends discontinuously on the ${*}$-moments of ${a.}$ For random matrices, this problem is circumvented in the Girko Hermitization by requiring a uniform integrability, which turns out to be a.s. satisfied for random matrices with i.i.d. entries with finite positive variance.

I have uploaded on arXiv a paper written with Charles Bordenave and Pietro Caputo entitled Circular Law Theorem for Random Markov Matrices. We provide, probably for the first time, an almost sure  circular law theorem for a model of random matrices with dependent entries and finite positive variance. This work leads us to formulate some open problems of variable difficulty, listed at the end of the introduction. Feel free to solve them!

To be a bit more precise, let $latex (X_{jk})_{j,k\geq1}$ be i.i.d. nonnegative random variables with bounded density, mean $latex m$, and finite positive variance $latex \sigma^2$. Let $latex M$ be the $latex n\times n$ random Markov matrix with i.i.d. rows defined by $latex M_{jk}=X_{jk}/(X_{j1}+\cdots+X_{jn})$. Let $latex \lambda_1,\ldots,\lambda_n$ be the eigenvalues of $latex \sqrt{n}M$, i.e. the roots in $latex \mathbb{C}$ of its characteristic polynomial. Our main result states that with probability one, the counting probability measure $latex \frac{1}{n}\delta_{\lambda_1}+\cdots+\frac{1}{n}\delta_{\lambda_n}$ converges weakly as $latex n\to\infty$ to the uniform law on the disk $latex \{z\in\mathbb{C}:|z|\leq m^{-1}\sigma\}$ (circular law).
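A quick simulation makes the statement visible. Here is a Python/NumPy sketch with exponential entries, so that $latex m=\sigma=1$ and the limiting disk is the unit disk (the matrix size and seed are mine); note that each row of $latex M$ sums to one, so $latex \sqrt{n}M$ always carries the trivial outlier eigenvalue $latex \sqrt{n}$, while the bulk fills the disk:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300

# Exponential entries: m = sigma = 1, so the limiting support is the unit disk.
X = rng.exponential(1.0, size=(n, n))
M = X / X.sum(axis=1, keepdims=True)     # random Markov matrix with i.i.d. rows

mod = np.sort(np.abs(np.linalg.eigvals(np.sqrt(n) * M)))

# Row sums of M equal 1, so sqrt(n)*M has the trivial eigenvalue sqrt(n) exactly;
# the remaining n - 1 eigenvalues should spread over the unit disk.
print("trivial eigenvalue:", mod[-1], "vs sqrt(n) =", np.sqrt(n))
print("largest bulk modulus:", mod[-2])
print("bulk fraction inside radius 1.05:", (mod[:-1] <= 1.05).mean())
```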

In particular, when $latex X_{11}$ follows an exponential law, the random matrix $latex M$ belongs to the Dirichlet Markov Ensemble of random stochastic matrices, and our main result gives a positive answer to the question raised in a previous paper.

The matrix $latex M$ is Markov: its entries belong to $latex [0,1]$ and each row sums up to $latex 1$. It has equally distributed dependent entries. However, the rows of $latex M$ are i.i.d. and follow an exchangeable law on $latex \mathbb{R}^n$ supported by the simplex. Since $latex \sigma<\infty$, the uniform law of large numbers of Bai and Yin states that almost surely

$latex \displaystyle\max_{1\leq i\leq n}\left|X_{i1}+\cdots+X_{in}-nm\right|=o(n)$.

This suggests that $latex \sqrt{n}M$ is approximately equal to $latex m^{-1}n^{-1/2}X$ for $latex n$ large enough. One can then expect that almost surely the counting probability measure of the singular values of $latex \sqrt{n}M$ converges as $latex n\to\infty$ to a quarter-circular law, while the counting probability measure of the eigenvalues of $latex \sqrt{n}M$ converges to a circular law. We show that this heuristic is valid. There is however a complexity gap between these two statements, due to the fact that for nonnormal operators such as $latex M$, the eigenvalues are less stable than the singular values under perturbations (here the least singular value enters the scene).

The proof of the main result is inspired from the Tao and Vu version of the Girko Hermitization via logarithmic potentials. More precisely, the starting point is the following Hermitization identity valid for any $latex n\times n$ complex matrix $latex A$ and any $latex z$ outside the spectrum of $latex A$:

$latex \displaystyle \frac{1}{n}\sum_{k=1}^n\log|\lambda_k(A)-z|=\frac{1}{n}\sum_{k=1}^n\log(\lambda_k(\sqrt{(A-zI)(A-zI)^*})).$

The left hand side is the logarithmic potential at the point $latex z$ of the counting probability measure of the eigenvalues of $latex A$. The right hand side is the integral of $latex \log(\cdot)$ with respect to the counting probability measure of the singular values of $latex A-zI$. Thanks to the Girko Hermitization theorem, the problem then boils down to showing that for Lebesgue almost all $latex z\in\mathbb{C}$, almost surely,

1. the counting probability measure $latex \nu_{\sqrt{n}M-zI}$ of the singular values of $latex \sqrt{n}M-zI$ tends weakly as $latex n\to\infty$ to a deterministic probability measure $latex \nu_z$ related to the circular law
2. the function $latex \log(\cdot)$ is uniformly integrable for the sequence $latex (\nu_{\sqrt{n}M-zI})_{n\geq1}$

For the first statement, we use a result of Dozier and Silverstein, which is based on Hermitian techniques: truncation, centralization, and a recursion for the Cauchy-Stieltjes transform (trace of the resolvent) via Sherman-Morrison block inversion, producing finally a cubic equation for the Cauchy-Stieltjes transform of the limiting law.

The second statement involves as essential ingredients a Talagrand concentration inequality (for the control of the distance from a row to the span of the other rows), and a polynomial control of the operator norm of the resolvent (equivalently, of the least singular value of $latex \sqrt{n}M-zI$). The bounded density assumption is purely technical and comes from the way we control the operator norm of the resolvent. A possible route for the removal of this assumption is to control the least singular value for a certain class of matrices with independent rows.