Informal aspects of free probability

Dan-Virgil Voiculescu (photo by G. Bergman)

This post gives some few bits of free probability theory. The content is mostly taken from the expository text arXiv:1405.1003. Free probability theory was forged in the 1980's by Dan-Virgil Voiculescu, while working on isomorphism problems in von Neumann operator algebras of free groups. Voiculescu discovered later in the 1990's that free probability is the algebraic structure that emerges from the asymptotic global spectral analysis of random matrix models as the dimension tends to infinity. Free probability theory comes among other things with algebraic analogues of the central limit theorem and the Boltzmann entropy. The term ``free'' in ``free probability theory'' and in ``free entropy'' comes from the free group (see below), and has no relation with the term ``free'' in the Helmholtz free energy which comes from thermodynamics (available work obtainable at constant temperature).

Algebraic probability space. Let \( {\mathcal{A}} \) be an algebra over \( {\mathbb{C}} \), with unity \( {id} \), equipped with an involution \( {a\mapsto a^*} \) and a normalized linear form \( {\tau:\mathcal{A}\rightarrow\mathbb{C}} \) such that \( {\tau(ab)=\tau(ba)} \), \( {\tau(id)=1} \), and \( {\tau(aa^*)\geq0} \). A basic non commutative example is given by the algebra of square complex matrices: \( {\mathcal{A}=\mathcal{M}_n(\mathbb{C})} \), \( {id=I_n} \), \( {a^*=\bar{a}^\top} \), \( {\tau(a)=\frac{1}{n}\mathrm{Tr}(a)} \), for which \( {\tau} \) appears as an expectation with respect to the empirical spectral distribution: denoting \( {\lambda_1(a),\ldots,\lambda_n(a)\in\mathbb{C}} \) the eigenvalues of \( {a} \), we have

\[ \tau(a)=\frac{1}{n}\sum_{k=1}^n\lambda_k(a)=\int\!x\,d\mu_a(x) \quad\text{where}\quad \mu_a:=\frac{1}{n}\sum_{k=1}^n\delta_{\lambda_k(a)}. \]

If \( {a=a^*} \) (we say that \( {a} \) is real or Hermitian) then \( {\mu_a} \) is supported in \( {\mathbb{R}} \) and is fully characterized by the collection of moments \( {\tau(a^m)} \), \( {m\geq0} \), which can be seen as a sort of algebraic distribution of \( {a} \). Beyond this example, by analogy with classical probability theory, we may see the elements of \( {\mathcal{A}} \) as algebraic analogues of bounded random variables, and \( {\tau} \) as an algebraic analogue of an expectation, and \( {\tau(a^m)} \) as the analogue of the \( {m} \)-th moment of \( {a} \). We say that \( {a\in\mathcal{A}} \) has mean \( {\tau(a)} \) and variance \( {\tau((a-\tau(a))^2)=\tau(a^2)-\tau(a)^2} \), and that \( {a} \) is centered when \( {\tau(a)=0} \). The \( {*} \)-law of \( {a} \) is the collection of mixed moments of \( {a} \) and \( {a^*} \) called the \( {*} \)-moments:

\[ \tau(b_1\cdots b_m)\text{ where } b_1,\ldots,b_m\in\{a,a^*\}\text{ and } m\geq1. \]

In contrast with classical probability, the product of algebraic variables may be non commutative. When \( {a\in\mathcal{A}} \) is real \( {a^*=a} \), then the \( {*} \)-law of \( {a} \) boils down to the moments: \( {\tau(a^m)} \), \( {m\geq0} \). In classical probability theory, the law of a real bounded random variable \( {X} \) is characterized by its moments \( {\mathbb{E}(X^m)} \), \( {m\geq0} \), thanks to the (Stone-)Weierstrass theorem. When the bounded variable is not real and takes its values in \( {\mathbb{C}} \) then we need the mixed moments \( {\mathbb{E}(X^m\bar{X}^n)} \), \( {m,n\geq0} \).

The so-called Gelfand-Naimark-Segal (GNS) construction shows that any algebraic probability space can be realized as a subset of the algebra of bounded operators on a Hilbert space. Using then the spectral theorem, this shows that any compactly supported probability measure is the \( {*} \)-law of some algebraic random variable.

Hermitization and Brown spectral measure. One can connect \( {*} \)-law and spectrum even for non real elements. Namely, if \( {a\in\mathcal{M}_n(\mathbb{C})} \) and if \( {\mu_a:=\frac{1}{n}\sum_{k=1}^n\delta_{\lambda_k(a)}} \) is its empirical spectral distribution in \( {\mathbb{C}} \), then, for any \( {z\not\in\{\lambda_1(a),\ldots,\lambda_n(a)\}} \),

\[ \begin{array}{rcl} \frac{1}{2}\tau(\log((a-zid)(a-zid)^*)) &=&\frac{1}{n}\log\left|\det(a-zI_n)\right| \\ &=&\int\!\log\left|z-\lambda\right|\,d\mu_a(\lambda) \\ &=&(\log\left|z-\cdot\right|*\mu_a)(z) \\ &=&:-U_{\mu_a}(z). \end{array} \]

The quantity \( {U_{\mu_a}(z)} \) is the logarithmic potential at point \( {z\in\mathbb{C}} \) of the probability measure \( {\mu_a} \). Since \( {-\frac{1}{2\pi}\log\left|\cdot\right|} \) is the so-called fundamental solution of the Laplace equation in dimension \( {2} \), it follows that in the sense of Schwartz distributions,

\[ \mu_a=\frac{1}{2\pi}\Delta U_{\mu_a}. \]

Following Brown, beyond the matrix case \( {\mathcal{A}=\mathcal{M}_n(\mathbb{C})} \), this suggest to define the spectral measure of an algebraic variable \( {a\in\mathcal{A}} \) in an abstract algebra \( {\mathcal{A}} \) as being the probability measure \( {\mu_a} \) on \( {\mathbb{C}} \) given by

\[ \mu_a:=-\frac{1}{\pi}\Delta\tau(\log((a-zid)(a-zid)^*)) \]

where here again \( {\Delta=4\partial\bar{\partial}} \) is the two-dimensional Laplacian acting on \( {z} \), as soon as we know how to define the operator \( {\log((a-zid)(a-zid)^*)} \) for every \( {z} \) such that \( {a-zid} \) is invertible. This makes sense for instance if \( {\mathcal{A}} \) is a von Neumann algebra of bounded operators on a Hilbert space, since one may define \( {\log(b)} \) if \( {b^*=b} \) by using functional calculus. The moral of the story is that the \( {*} \)-law of \( {a} \) determines the \( {*} \)-law of the Hermitian element \( {(a-zid)(a-zid)^*} \) for every \( {z\in\mathbb{C}} \), which in turn determines the Brown spectral measure \( {\mu_a} \). This strategy is known as Hermitization.

Freeness. The notion of freeness is an algebraic analogue of independence. In classical probability theory, a collection of \( {\sigma} \)-field are independent if the product of bounded random variables is centered as soon as the factors are centered and measurable with respect to different \( {\sigma} \)-fields. We say that \( {\mathcal{B}\subset\mathcal{A}} \) is a sub-algebra of \( {\mathcal{A}} \) when it is stable by the algebra operations, by the involution, and contains the unity \( {id} \). By analogy with classical probability, we say that the collection \( {{(\mathcal{A}_i)}_{i\in I}} \) of sub-algebras of \( {\mathcal{A}} \) are free when for any integer \( {m\geq1} \), \( {i_1,\ldots,i_m\in I} \), and \( {a_1\in\mathcal{A}_{i_1},\ldots,a_m\in\mathcal{A}_{i_m}} \), we have

\[ \tau(a_1\cdots a_m)=0 \]

as soon as \( {\tau(a_1)=\cdots=\tau(a_m)=0} \) and \( {i_1\neq \cdots\neq i_m} \) (only consecutive indices are required to be different). We say that \( {{(a_i)}_{i\in I}\subset\mathcal{A}} \) are free when the sub-algebras that they generate are free. If for instance \( {a,b\in\mathcal{A}} \) are free and centered, then \( {\tau(ab)=0} \), and \( {\tau(abab)=0} \). Note that in classical probability, the analogue of this last expression will never be zero if \( {a} \) and \( {b} \) are not zero, due to the commutation relation \( {abab=a^2b^2} \).

Can we find examples of free matrices in \( {\mathcal{M}_n(\mathbb{C})} \)? Actually, this will not give exciting answers. Freeness is more suited for infinite dimensional operators. It turns out that the definition and the name of freeness come from a fundamental infinite dimensional example constructed from the free group. More precisely, let \( {F_n} \) be the free group with \( {2n} \) generators (letters and anti-letters) \( {g_1^{\pm1},\ldots,g_n^{\pm1}} \) with \( {n\geq2} \) (for \( {n=1} \), \( {F_n=\mathbb{Z}} \) is commutative). The term ``free'' comes from the fact that \( {F_n} \) is the free product of \( {n} \) copies of \( {\mathbb{Z}} \), without any additional relation. Let \( {\varnothing} \) be the neutral element of \( {F_n} \) (empty string). Let \( {A_n} \) be the associated free algebra identified with \( {\ell_\mathbb{C}^2(F_n)} \). Each element of \( {A_n} \) can be seen as a complex measure of the form \( {\sum_{w\in F_n}c_w\delta_w} \), and \( {{(\delta_w)}_{w\in F_n}} \) is the canonical basis: \( {\langle\delta_w,\delta_{w'}\rangle=\mathbf{1}_{w=w'}} \). The product on \( {A_n} \) is the convolution of measures on \( {F_n} \):

\[ \left(\sum_{w\in F_n}c_w\delta_w\right)\left(\sum_{w\in F_n}c'_w\delta_w\right) =\sum_{w\in F_n}\left(\sum_{v\in F_n}c_vc'_{v^{-1}w}\right)\delta_w =\sum_{w\in F_n}(c*c')_w\delta_w. \]

Now, let \( {\mathcal{A}} \) be the algebra over \( {\mathbb{C}} \) of linear operators from \( {\ell_\mathbb{C}^2(F_n)} \) to itself. The product in \( {\mathcal{A}} \) is the composition of operators. The involution \( {*} \) in \( {\mathcal{A}} \) is the transposition-conjugacy of operators. The identity operator is denoted \( {id} \). We consider the linear form \( {\tau:\mathcal{A}\rightarrow\mathbb{C}} \) defined for every \( {a\in\mathcal{A}} \) by

\[ \tau(a)=\langle a\delta_\varnothing,\delta_\varnothing\rangle. \]

For every \( {w\in F_n} \), let \( {u_w\in\mathcal{A}} \) be the left translation operator defined by \( {u_w(\delta_v)=\delta_{wv}} \). Then

\[ u_w u_{w'}=u_{ww'},\quad (u_w)^*=u_{w^{-1}},\quad u_\varnothing=id, \]

and therefore \( {u_w u_w^*=u_w^* u_w=id} \) (we say that \( {u_w} \) is unitary). We have \( {\tau(u_w)=\mathbf{1}_{w=\varnothing}} \). Let us show that the sub-algebras \( {\mathcal{A}_1,\ldots,\mathcal{A}_n} \) generated by \( {u_{g_1},\ldots,u_{g_n}} \) are free. Each \( {\mathcal{A}_i} \) consists in linear combinations of \( {u_{g_i^r}} \) with \( {r\in\mathbb{Z}} \), and centering forces \( {r=0} \). For every \( {w_1,\ldots,w_m\in F_n} \), we have

\[ \tau(u_{w_1}\cdots u_{w_m}) =\tau(u_{w_1\cdots w_m}) =\mathbf{1}_{w_1\cdots w_m=\varnothing}. \]

Let us consider the case \( {w_j=g_{i_j}^{r_j}\in\mathcal{A}_{i_j}} \) with \( {r_j\in\mathbb{Z}} \), for every \( {1\leq j\leq m} \). Either \( {r_j=0} \) and \( {w_j=\varnothing} \) or \( {r_j\neq0} \) and \( {\tau(w_j)=0} \). Let us assume that \( {r_1\neq0,\ldots,r_n\neq0} \), which implies \( {\tau(u_{w_1})=\cdots=\tau(u_{w_n})=0} \). Now since \( {G_n} \) is a tree, it does not have cycles, and thus if we follow a path starting from the root \( {\varnothing} \) then we cannot go back to the root if we never go back locally along the path (this is due to the absence of cycles). Consequently, if additionally \( {i_1\neq\cdots\neq i_m} \) (i.e. two consecutive terms are different), then we have necessarily \( {w_1\cdots w_m\neq\varnothing} \), and therefore \( {\tau(u_{w_1}\cdots u_{w_m})=0} \). From this observation one can conclude that \( {\mathcal{A}_1,\ldots,\mathcal{A}_n} \) are free.

Beyond the example of the free group: if \( {G_1,\ldots,G_n} \) are groups and \( {G} \) their free product, then the algebras generated by \( {\{u_g:g\in G_1\},\ldots,\{u_g:g\in G_n\}} \) are always free in the one generated by \( {\{u_g:g\in G\}} \).

Conceptual dictionary between classical probability and free probability. The first is commutative while the second is typically non commutative. Free probability is the algebraic structure that emerges from the asymptotic analysis, over the dimension, of the empirical spectral distribution of unitary invariant random matrices. The term free comes from the algebra of linear operators over the free algebra of the free group, an example in which the concept of freeness emerges naturally, in relation with the symmetric random walk on the infinite regular tree of even degree \( {\geq 4} \) (which is the Cayley graph of a non-commutative free group).

Classical probability	Free probability
Bounded r.v. \( {X} \) on \( {\mathbb{C}} \)	Algebra element \( {a\in\mathcal{A}} \)
\( {\mathbb{E}(X^m\bar{X}^{n})} \)	\( {\tau(b_1\cdots b_m)} \), \( {b\in\{a,a^*\}^m} \)
Law = Moments	Law = \( {*} \)-moments
\( {X} \) is real	\( {a=a^*} \)
Independence	Freeness
Classical convolution \( {*} \)	Free convolution \( {\boxplus} \)
Gaussian law (with CLT)	Semicircle law (with CLT)
Boltzmann entropy \( {\mathcal{S}} \)	Voiculescu entropy \( {\chi} \)

Law of free couples and free convolution. In classical probability theory, the law of a couple of independent random variables is fully characterized by the couple of laws of the variables. In free probability theory, the \( {*} \)-law of the couple \( {(a,b)} \) is the collection of mixed moments in \( {a,a^*,b,b^*} \). If \( {a,b} \) are free, then one can compute the \( {*} \)-law of the couple \( {(a,b)} \) by using the \( {*} \)-law of \( {a} \) and \( {b} \), thanks to the centering trick. For instance, in order to compute \( {\tau(ab)} \), we may write using freeness \( {0=\tau((a-\tau(a))(b-\tau(b)))=\tau(ab)-\tau(a)\tau(b)} \) to get \( {\tau(ab)=\tau(a)\tau(b)} \). As a consequence, one can show that the \( {*} \)-law of a couple of free algebraic variables is fully characterized by the couple of \( {*} \)-laws of the variables. This works for arbitrary vectors of algebraic variables.

In classical probability theory, the law of the sum of two independent random variables is given by the convolution of the law of the variables. In free probability theory, the \( {*} \)-law of the sum \( {a+b} \) of two free algebraic variables \( {a,b\in\mathcal{A}} \) is given by the so-called free convolution \( {\mathrm{dist}(a)\boxplus\mathrm{dist}(b)} \) of the \( {*} \)-law of \( {a} \) and \( {b} \), which can be defined using the \( {*} \)-law of the couple \( {(a,b)} \). Given an at most countable family of compactly supported probability measures on \( {\mathbb{R}} \), it can be show that one can always construct an algebraic probability space containing free algebraic variables admitting these probability measures as their \( {*} \)-distributions. The free convolution \( {\boxplus} \) of probability measures is associative. We have so far at hand an algebraic framework, called free probability theory, in which the concepts of algebra elements, trace, \( {*} \)-law, freeness, and free convolution are the analogue of the concepts of bounded random variables, expectation, law, independence, and convolution of classical probability theory. Do we have a CLT, and an analogue of the Gaussian? The answer is positive.

Free CLT and semicircle law. It is natural to define the convergence in \( {*} \)-law, denoted \( {\overset{*}{\rightarrow}} \), as being the convergence of all \( {*} \)-moments. The Voiculescu free CLT states that if \( {a_1,a_2,\ldots\in\mathcal{A}} \) are free, real \( {a_i=a_i^*} \), with same \( {*} \)-law, zero mean \( {\tau(a_i)=0} \), unit variance \( {\tau(a_i^2)=1} \), then

\[ s_n:=\frac{a_1+\cdots+a_n}{\sqrt{n}} \overset{*}{\underset{n\rightarrow\infty}{\longrightarrow}} \frac{\sqrt{4-x^2}\mathbf{1}_{[-2,2]}}{2\pi}dx. \]

The limiting \( {*} \)-law is given by the moments of the semicircle law on \( {[-2,2]} \), which are \( {0} \) for odd moments and the Catalan numbers \( {{(C_m)}_{m\geq0}} \) for even moments: for every \( {m\geq0} \),

\[ \int_{-2}^2\!x^{2m+1}\frac{\sqrt{4-x^2}}{2\pi}dx=0 \quad\text{and}\quad \int_{-2}^2\!x^{2m}\frac{\sqrt{4-x^2}}{2\pi}dx =C_m:=\frac{1}{1+m}\binom{2m}{m}. \]

The semicircle law is also known as the Wigner distribution in random matrix theory and the Sato-Tate distribution in number theory. An algebraic variable \( {b\in\mathcal{A}} \) has semicircle \( {*} \)-law when it is real \( {b=b^*} \) and \( {\tau(b^{2m+1})=0} \) and \( {\tau(b^{2m})=C_m} \) for every \( {m\geq0} \). The proof of the free CLT consists in computing the moments of \( {s_n} \) using freeness. This reveals three type of terms: terms which are zero at fixed \( {n} \) thanks to freeness and centering, terms having zero contribution asymptotically as \( {n\rightarrow\infty} \), and terms which survive at the limit, and which involve only the second moment of the \( {a_i} \)'s. Beyond freeness, there exists other notions of algebraic independence, allowing to compute moments, and leading to a CLT with other limiting distributions, including the Bernoulli distribution and the arcsine distribution.

As for the classical CLT, the first two moments are conserved along the free CLT: \( {\tau(s_n)=0} \) and \( {\tau(s_n^2)=1} \) for all \( {n\geq1} \). The semicircle \( {*} \)-law is the free analogue of the Gaussian distribution of classical probability. The semicircle \( {*} \)-law is stable by free convolution: if \( {a_1,\ldots,a_n} \) are free with semicircle \( {*} \)-law then the \( {*} \)-law of \( {a_1+\cdots+a_n} \) is also semicircle and its second moment is the sum of the second moments. In particular \( {s_n=(a_1+\cdots+a_n)/\sqrt{n}} \) is semicircle, just like the Gaussian case in the CLT of classical probability! If \( {\mu_1,\ldots,\mu_n} \) are semicircle laws then their free convolution \( {\mu_1\boxplus\cdots\boxplus\mu_n} \) is also a semicircle law and its variance is the sum of the variances.

Random walks and CLT. Let us reconsider the free group \( {F_n} \). The algebraic variables

\[ a_1=\frac{u_{g_1}+u_{g_1^{-1}}}{\sqrt{2}},\ldots, a_n=\frac{u_{g_n}+u_{g_n^{-1}}}{\sqrt{2}} \]

are free, real, centered, with unit variance. Let us define

\[ a=\sum_{i=1}^n(u_{g_i}+u_{g_i}^{-1})=\sqrt{2}\sum_{i=1}^n a_i \quad\text{and}\quad p =\frac{a}{2n} =\frac{a_1+\cdots+a_n}{\sqrt{2}n}. \]

Then \( {a} \) is the adjacency operator of the Cayley graph \( {G_n} \) of the free group \( {F_n} \), which is \( {2n} \)-regular, without cycles (a tree!), rooted at \( {\varnothing} \). Moreover, \( {\langle p\delta_v,\delta_w\rangle=\frac{1}{2n}\mathbf{1}_{wv^{-1}\in S}} \) where \( {S=\{g_1^{\pm1},\ldots,g_n^{\pm}\}} \), and thus \( {p} \) is the transition kernel of the simple random walk on \( {G_n} \). For every integer \( {m\geq0} \), the quantity \( {\tau(a^m)=\langle a^m\delta_\varnothing,\delta_\varnothing\rangle} \) is the number of paths of length \( {m} \) in \( {G_n} \) starting and ending at the root \( {\varnothing} \). From the Kesten-McKay theorem, for every integer \( {m\geq0} \),

\[ \tau(a^m) =\langle a^m\delta_\varnothing,\delta_\varnothing\rangle =\int\!x^m\,d\mu_{d}(x), \]

where \( {\mu_{d}} \) is the Kesten-McKay distribution with parameter \( {d=2n} \), given by

\[ d\mu_{d}(x) :=\frac{d\sqrt{4(d-1)-x^2}}{2\pi(d^2-x^2)} \mathbf{1}_{[-2\sqrt{d-1},2\sqrt{d-1}]}(x)dx. \]

By parity we have \( {\tau(a^{2m+1})=0} \) for every \( {m\geq0} \). When \( {n=1} \) then \( {d=2} \) and \( {G_2} \) is the Cayley graph of \( {F_1=\mathbb{Z}} \), the corresponding Kesten-McKay law \( {\mu_{2}} \) is the arcsine law on \( {[-2,2]} \),

\[ \langle a^{2m}\delta_\varnothing,\delta_\varnothing\rangle =\int\!x^{2m}\,d\mu_{2}(x) =\int_{-2}^2\frac{x^{2m}}{\pi\sqrt{4-x^2}}dx =\binom{2m}{m}, \]

and we recover the fact that the number of paths of length \( {2m} \) in \( {\mathbb{Z}} \) starting and ending at the root \( {\varnothing} \) (which is the origin \( {0} \)) is given by the central binomial coefficient. The binomial combinatorics is due to the commutativity of \( {F_1=\mathbb{Z}} \). At the opposite side of degree, when \( {d=2n\rightarrow\infty} \) then \( {\mu_{d}} \), scaled by \( {(d-1)^{-1/2}} \), tends to the semicircle law on \( {[-2,2]} \):

\[ \lim_{d\rightarrow\infty} \frac{\langle a^{2m}\delta_\varnothing,\delta_\varnothing\rangle}{(d-1)^m} =\lim_{d\rightarrow\infty} \int\!\left(\frac{x}{\sqrt{d-1}}\right)^{2m}\!\!\!\!d\mu_{d}(x) =\int_{-2}^2 y^{2m}\frac{\sqrt{4-y^2}}{2\pi}dy =C_m:=\frac{1}{1+m}\binom{2m}{m}. \]

As a consequence, we have, thanks to \( {(d-1)^m\sim_{n\rightarrow\infty}d^m=(2n)^m=(\sqrt{2n})^{2m}} \),

\[ \tau\left(\left(\frac{a_1+\cdots+a_n}{\sqrt{n}}\right)^{2m}\right) =\frac{\tau(a^{2m})}{(2n)^m} = \frac{\langle a^{2m}\delta_\varnothing,\delta_\varnothing\rangle}{d^m} \underset{n\rightarrow\infty}{\longrightarrow} C_m. \]

This is nothing else but a free CLT for the triangular array \( {{((a_1,\ldots,a_n))}_{n\geq1}} \)! The free CLT is the algebraic structure that emerges from the asymptotic analysis, as \( {n\rightarrow\infty} \), of the combinatorics of loop paths of the simple random walk on the Cayley graph \( {G_n} \) of the free group \( {F_n} \), and more generally on the \( {d} \)-regular infinite graph without cycles (a tree!) as the degree \( {d} \) tends to infinity.

We have \( {a=b_1+\cdots+b_n} \) where \( {b_i=\sqrt{2}a_i=u_{g_i}+u_{g_i^{-1}}} \). But for any \( {m\in\mathbb{N}} \) we have \( {\tau(b_i)^{2m+1}=0} \) and \( {\tau(b_i^{2m})=\sum_{r=1}^{2m}\binom{2m}{r}\tau(u_{g_i^{2(r-m)}})=\binom{2m}{m}} \), and therefore the \( {*} \)-law of \( {b_i} \) is the arcsine law \( {\mu_2} \), and thanks to the freeness of \( {b_1,\ldots,b_n} \) we obtain

\[ \mu_d=\underbrace{\mu_{2}\boxplus\cdots\boxplus\mu_{2}}_{d\text{ times}}. \]

The Kesten-McKay distribution appears for regular trees of even degree (Cayley graph of free groups) in the doctoral thesis of Harry Kesten published in 1959. It seems that no connections were made at that time with the contemporary works of Eugene Wigner on random matrices published in the 1950's. In 1981, Brendan McKay showed that these distributions appear as the limiting empirical spectral distributions of the adjacency matrices of sequences of (random) graphs which are asymptotically regular and without cycles (trees!). He does not cite Kesten and Wigner.

Free entropy. Inspired by the Boltzmann H-Theorem view of Shannon on the CLT of classical probability theory, one may ask if there exists, in free probability theory, a free entropy functional, maximized by the semicircle law at fixed second moment, and which is monotonic along the free CLT. We will see that the answer is positive. Let us consider a real algebraic variable \( {a\in\mathcal{A}} \), \( {a^*=a} \), such that there exists a probability measure \( {\mu_a} \) on \( {\mathbb{R}} \) such that for every integer \( {m\geq0} \),

\[ \tau(a^m)=\int\!x^m\,d\mu_a(x). \]

Inspired from the micro-macro construction of the Boltzmann entropy, one may consider an approximation at the level of the moments of the algebraic variable \( {a} \) (which is in general infinite dimensional) by Hermitian matrices (which are finite dimensional). Namely, following Voiculescu, for every real numbers \( {\varepsilon>0} \) and \( {R>0} \) and integers \( {m\geq0} \) and \( {d\geq1} \), let \( {\Gamma_R(a;m;d,\varepsilon)} \) be the relatively compact set of Hermitian matrices \( {h\in\mathcal{M}_d(\mathbb{C})} \) such that \( {\left\Vert h\right\Vert\leq R} \) and \( {\max_{0\leq k\leq m}\left|\tau(a^k)-\frac{1}{d}\mathrm{Tr}(h^k)\right|\leq\varepsilon} \). The volume \( {\left|\Gamma_R(a;m,d,\varepsilon)\right|} \) measures the degree of freedom of the approximation of the algebraic variable \( {a} \) by matrices, and is the analogue of the cardinal (which was multinomial) in the combinatorial construction of the Boltzmann entropy. We find

\[ \chi(a) :=\sup_{R>0}\inf_{m\in\mathbb{N}}\inf_{\varepsilon>0} \varlimsup_{d\rightarrow\infty} \left(\frac{1}{d^2}\log\left|\Gamma_R(a;m,d,\varepsilon)\right|+\frac{\log(d)}{2}\right) =\iint\!\log\left|x-y\right|\,d\mu_a(x)d\mu_a(y). \]

This quantity depends only on \( {\mu_a} \) and is also denoted \( {\chi(\mu_a)} \). It is a quadratic form in \( {\mu_a} \). It is the Voiculescu entropy functional (a survey is available). When \( {\mu} \) is a probability measure on \( {\mathbb{C}} \), we will still denote

\[ \chi(\mu):=\iint\!\log\left|x-y\right|\,d\mu(x)d\mu(y). \]

However, this is not necessarily the entropy of an algebraic variable when \( {\mu} \) is not supported in \( {\mathbb{R}} \). The Voiculescu free entropy should not be confused with the von Neumann entropy in quantum probability defined by \( {S(a)=-\tau(a\log(a))} \), which was considered by Elliot Lieb. For some models of random graphs, one can imitate Voiculescu and forge some sort of graphical Boltzmann entropy, which can be related to a large deviations rate functional, see the work of Bordenave and Caputo and references therein.

The semicircle law is for the free entropy the analogue of the Gaussian law for the Boltzmann entropy. The semicircle law on \( {[-2,2]} \) is the unique law that maximizes the Voiculescu entropy \( {\chi} \) among the laws on \( {\mathbb{R}} \) with second moment equal to \( {1} \):

\[ \arg\max\left\{\chi(\mu):\mathrm{supp}(\mu)\subset\mathbb{R},\int\!x^2\,d\mu(x)=1\right\} =\frac{\sqrt{4-x^2}\mathbf{1}_{[-2,2]}(x)}{2\pi}dx. \]

How about laws on \( {\mathbb{C}} \) instead of \( {\mathbb{R}} \)? The uniform law on the unit disc is the unique law that maximizes the functional \( {\chi} \) among the set of laws on \( {\mathbb{C}} \) with second moment (mean squared modulus) equal to \( {1} \) (here \( {z=x+iy} \) and \( {dz=dxdy} \)):

\[ \arg\max\left\{\chi(\mu):\mathrm{supp}(\mu)\subset\mathbb{C},\int\!|z|^2\,d\mu(z)=1\right\} =\frac{\mathbf{1}_{\{z\in\mathbb{C}:|z|=1\}}}{\pi}dz, \]

Under the uniform law on the unit disc, the real and the imaginary parts follow the semicircle law on \( {[-1,1]} \), and are not independent. If we say that an algebraic variable \( {c\in\mathcal{A}} \) is circular when its \( {*} \)-law is the uniform law on the unit disc of \( {\mathbb{C}} \), then, if \( {s_1,s_2\in\mathcal{A}} \) are free with \( {*} \)-law equal to the semicircle law on \( {[-2,2]} \), then \( {\frac{s_1+is_2}{\sqrt{2}}} \) is circular (here \( {i=(0,1)\in\mathbb{C}} \) is such that \( {i^2=-1} \)).

It turns out that the Voiculescu free entropy \( {\chi} \) is monotonic along the Voiculescu free CLT:

\[ \chi(a)=\chi(s_1)\leq\cdots\leq\chi(s_n)\leq\chi(s_{n+1})\leq\cdots \underset{n\rightarrow\infty}{\nearrow} \max\chi(s) \]

where still \( {s_n=n^{-1/2}(a_1+\cdots+a_n)} \) and where \( {s} \) is an semicircle algebraic variable. Dimitri Shlyakhtenko considered this problem in two papers. He gave a proof of this remarkable fact, based on a free Fisher information functional (which is a Hilbert transform), that captures simultaneously the classical and the free CLT. The Boltzmann-Shannon H-Theorem interpretation of the CLT is thus remarkably valid in classical probability theory, and in free probability theory.

A famous theorem of Eugene Wigner. Let \( {H} \) be a random \( {n\times n} \) Hermitian matrix belonging to the Gaussian Unitary Ensemble (GUE). This means that \( {H} \) has density proportional to

\[ e^{-\frac{n}{4}\mathrm{Tr}(H^2)} =e^{-\frac{n}{4}\sum_{1\leq j\leq 1}^n H_{jj}^2-\frac{n}{2}\sum_{1\leq j<k\leq n}|H_{jk}|^2}. \]

The entries of \( {H} \) are Gaussian, centered, independent, with variance \( {2/n} \) on the diagonal and \( {1/n} \) outside. Let \( {\mu_n=\frac{1}{n}\sum_{j=1}^n\delta_{\lambda_{j}(H)}} \) be the empirical spectral distribution of \( {H} \), which is a random discrete probability measure on \( {\mathbb{R}} \). For every \( {m\geq0} \), the mean \( {m} \)-th moment of \( {\mu_n} \) is given by

\[ \mathbb{E}\int\!x^m\,d\mu_n(x) =\frac{1}{n}\sum_{j=1}^n\lambda_{j}^m =\frac{\mathbb{E}\mathrm{Tr}(H^m)}{n} =\sum_{i_1,\ldots,i_m}\frac{\mathbb{E}(H_{i_1i_2}\cdots H_{i_{m-1}i_m}H_{i_mi_1})}{n}. \]

In particular, the first two moments of \( {\mathbb{E}\mu_n} \) satisfy to

\[ \mathbb{E}\int\!x\,d\mu_n(x)=\frac{\mathbb{E}(\mathrm{Tr}(H))}{n} =\frac{\mathbb{E}\sum_{j=1}^n H_{jj}}{n}=0 \]

and

\[ \mathbb{E}\int\!x^2\,d\mu_n(x)=\frac{\mathbb{E}(\mathrm{Tr}(H^2))}{n} =\mathbb{E}\frac{\sum_{j,k=1}^n |H_{jk}|^2}{n} =\frac{n(2/n)+(n^2-n)(1/n)}{n} \underset{n\rightarrow\infty}{\longrightarrow}1. \]

More generally, for any \( {m>2} \), the computation of the limiting \( {m} \)-th moment of \( {\mathbb{E}\mu_n} \) boils down to the combinatorics of the paths \( {i_1\rightarrow i_2\rightarrow\cdots\rightarrow i_m} \). It can be shown that the surviving terms as \( {n\rightarrow\infty} \) correspond to paths forming a tree and passing exactly zero or two times per each edge. This gives finally, for every integer \( {m\geq0} \), denoting \( {C_m} \) the \( {m} \)-th Catalan number,

\[ \lim_{n\rightarrow\infty}\mathbb{E}\int\!x^{2m+1}\,d\mu_n(x)=0 \quad\text{and}\quad \lim_{n\rightarrow\infty}\mathbb{E}\int\!x^{2m}\,d\mu_n(x)=C_m. \]

This means that \( {\mathbb{E}\mu_n} \) tends as \( {n\rightarrow\infty} \) in the sense of moments to the semicircle law on \( {[-2,2]} \) (which has unit variance). Just like the CLT, the result is actually universal, in the sense that it remains true if one drops the Gaussian distribution assumption of the entries. This is the famous Wigner theorem, in its modern general form, named after Eugene Paul Wigner (1902 -- 1995) who proved a version of it in the 1950's. The GUE case is in fact exactly solvable: one can compute the density of the eigenvalues, which turns out to be proportional to

\[ \prod_{j=1}^ne^{-\frac{n}{4}\lambda_{j}^2}\prod_{j<k}(\lambda_j-\lambda_k)^2 =\exp \left(-\frac{n}{4}\sum_{j=1}^n\lambda_j^2 -\sum_{j\neq k}\log\frac{1}{\left|\lambda_j-\lambda_k\right|}\right). \]

The logarithmic repulsion is the Coulomb repulsion in dimension \( {2} \). Also, this suggest to interpret the eigenvalues \( {\lambda_1,\ldots,\lambda_n} \) as a Coulomb gas of two-dimensional charged particles forced to stay in a one dimensional ramp (the real line) and experiencing a confinement by a quadratic potential. These formulas allow to deduce the semicircle limit of the one-point correlation (density of \( {\mathbb{E}\mu_n} \)), by using various methods, such as orthogonal polynomials, or large deviations theory.

Asymptotic freeness of unitary invariant random matrices. If \( {A} \) and \( {B} \) are two Hermitian \( {n\times n} \) matrices, then the spectrum of \( {A+B} \) depend not only on the spectrum of \( {A} \) and the spectrum of \( {B} \), but also on the eigenvectors of \( {A} \) and \( {B} \). Even if \( {A} \) and \( {B} \) commute, then they admit the same eigenvectors and the spectrum of \( {A+B} \) is the sum of the spectra of \( {A} \) and \( {B} \), but this depends on the way we label the eigenvalues, which depends in turn on the eigenvectors. Now if \( {A} \) and \( {B} \) are two independent random Hermitian matrices, there is no reason to believe that the empirical spectral distribution \( {\mu_{A+B}} \) of \( {A+B} \) depend only on the empirical spectral distributions \( {\mu_A} \) and \( {\mu_B} \) of \( {A} \) and \( {B} \). Let \( {A} \) be a \( {n\times n} \) random Hermitian matrix in the GUE, normalized such that \( {\mu_A} \) has a mean second moment equal to \( {1} \). Then the Wigner theorem says that \( {\mathbb{E}\mu_A} \) tend in the sense of moments, as \( {n\rightarrow\infty} \), to the semicircle law of unit variance. If \( {B} \) is an independent copy of \( {A} \), then, thanks to the convolution of Gaussian laws, \( {A+B} \) is identical in law to \( {\sqrt{2}A} \), and thus \( {\mathbb{E}\mu_{A+B}} \) tend, in the sense of moments, as \( {n\rightarrow\infty} \), to the semicircle law of variance \( {2} \). Then, thanks to the free convolution of semicircle laws, we have

\[ \mathbb{E}\mu_{A+B}-\mathbb{E}\mu_A\boxplus\mathbb{E}\mu_B \underset{n\rightarrow\infty}{\overset{*}{\longrightarrow}}0, \]

where \( {\overset{*}{\rightarrow}} \) denotes the convergence of moments. Voiculescu has established that this asymptotic freeness phenomenon remains actually true beyond the GUE case, provided that the eigenspaces of the two matrices are randomly decoupled using a random unitary conjugation. For example, let \( {A} \) and \( {B} \) two \( {n\times n} \) Hermitian matrices such that \( {\mu_A\rightarrow\mu_a} \) and \( {\mu_B\rightarrow\mu_b} \) in the sense of moments as \( {n\rightarrow\infty} \), where \( {\mu_a} \) and \( {\mu_b} \) are two compactly supported laws on \( {\mathbb{R}} \). Let \( {U} \) and \( {V} \) be independent random unitary matrices uniformly distributed on the unitary group (we say Haar unitary). Then

\[ \mathbb{E}\mu_{UAU^*+VBV^*} \underset{n\rightarrow\infty}{\overset{*}{\longrightarrow}} \mu_a\boxplus\mu_b. \]

This asymptotic freeness reveals that free probability is the algebraic structure that emerges from asymptotic analysis of large dimensional unitary invariant models of random matrices.

Since the functional \( {\chi} \) is maximized by the uniform law on the unit disc, one may ask about an analogue of the Wigner theorem for non-Hermitian random matrices. The answer is positive. For our purposes, we will focus on a special ensemble of random matrices, introduced by Jean Ginibre in the 1960's, for which he computed explicitly the density of the eigenvalues. Note by the way that Jean Ginibre is also famous for Fortuin-Kasteleyn-Ginibre (FKG) correlation inequality, and for scattering theory for nonlinear Schrödinger equations.

Complex Ginibre Ensemble and the circular law. One of the most fascinating result in the asymptotic analysis of large dimensional random matrices is the circular law for the complex Ginibre ensemble, which can be proved using the Voiculescu functional \( {\chi} \) (maximized at fixed second moment by uniform law on unit disc). A simple model of random matrix is the Ginibre model:

\[ G= \begin{pmatrix} G_{11}&\cdots&G_{1n}\\ \vdots & \vdots & \vdots\\ G_{n1} & \cdots & G_{nn} \end{pmatrix} \]

where \( {{(G_{jk})}_{1\leq j,k\leq n}} \) are i.i.d. random variables on \( {\mathbb{C}} \), with \( {\Re G_{jk},\Im G_{jk}} \) of Gaussian law of mean \( {0} \) and variance \( {1/(2n)} \). In particular, \( {\mathbb{E}(|G_{jk}|^2)=1/n} \). The density of \( {G} \) is proportional to

\[ \prod_{j,k=1}^ne^{-n\left|G_{jk}\right|^2} = e^{-\sum_{j,k=1}^n n\left|G_{jk}\right|^2} = e^{-n\mathrm{Tr}(GG^*)}. \]

This shows that the law of \( {G} \) is unitary invariant, meaning that \( {UGU^*} \) and \( {G} \) have the same law, for every unitary matrix \( {U} \). Can we compute the law of the eigenvalues of \( {G} \)? Let \( {G=UTU^*} \) be the Schur unitary triangularization of \( {G} \). Here \( {T=D+N} \) where \( {D} \) is diagonal and \( {N} \) is upper triangular with null diagonal. In particular \( {N} \) is nilpotent, \( {T} \) is upper triangular, and the diagonal of \( {D} \) is formed with the eigenvalues \( {\lambda_1,\ldots,\lambda_n} \) in \( {\mathbb{C}} \) of \( {G} \). The Jacobian of the change of variable \( {G\mapsto (U,D,N)} \) is proportional to \( {\prod_{1\leq j<k\leq n}\left|\lambda_j-\lambda_k\right|^2} \) (for simplicity, we neglect here delicate problems related to the non-uniqueness of the Schur unitary decomposition). On the other hand,

\[ \mathrm{Tr}(GG^*) =\mathrm{Tr}(DD^*)+\mathrm{Tr}(DN^*)+\mathrm{Tr}(ND^*)+\mathrm{Tr}(NN^*) =\mathrm{Tr}(DD^*). \]

This allows to integrate out the \( {U,N} \) variables. The law of the eigenvalues is then proportional to

\[ e^{-n\sum_{j=1}^n\left|\lambda_j\right|^2}\prod_{1\leq j<k\leq n}\left|\lambda_j-\lambda_k\right|^2. \]

This defines a determinantal process on \( {\mathbb{C}} \): the complex Ginibre ensemble. In order to interpret the law of the eigenvalues as a Boltzmann measure, we put the Vandermonde determinant inside the exponential:

\[ e^{-n\sum_j\left|\lambda_j\right|^2+2\sum_{j<k}\log\left|\lambda_j-\lambda_k\right|}. \]

If we encode the eigenvalues by the empirical measure \( {\mu_n:=\frac{1}{n}\sum_{j=1}^n\delta_{\lambda_j}} \), this takes the form

\[ e^{-n^2 \mathcal{I}(\mu_n)} \]

where the ``energy'' \( {\mathcal{I}(\mu_n)} \) of the configuration \( {\mu_n} \) is defined via

\[ \mathcal{I}(\mu):= \int\!\left|z\right|^2\,d\mu(z) +\iint_{\neq}\!\log\frac{1}{\left|z-z'\right|}\,d\mu(z)d\mu(z'). \]

This suggests to interpret the eigenvalues \( {\lambda_1,\ldots,\lambda_n} \) of \( {G} \) as Coulomb gas of two-dimensional charged particles, confined by a an external field (quadratic potential) and subject to pair Coulomb repulsion. Note that \( {-\mathcal{I}} \) can also be seen as a penalized Voiculescu functional. Minimizing a penalized functional is equivalent to minimizing without penalty but under constraint (Lagrange). Presently, if \( {\mathcal{M}} \) is the set of probability measures on \( {\mathbb{C}} \) then \( {\inf_{\mathcal{M}}\mathcal{I}>-\infty} \) and the infimum is achieved at a unique probability measure \( {\mu_*} \), which is the uniform law on the unit disc of \( {\mathbb{C}} \).

How does the random discrete probability measure \( {\mu_n} \) behave as \( {n\rightarrow\infty} \)? Following Hiai and Petz one may adopt a large deviations approach. Let \( {\mathcal{M}} \) be the set of probability measures on \( {\mathbb{C}} \). One may show that the functional \( {\mathcal{I}:\mathcal{M}\rightarrow\mathbb{R}\cup\{+\infty\}} \) is lower semi continuous for the topology of narrow convergence, is strictly convex, and has compact level sets. Let us consider a distance compatible with the topology. It can be shown that for every ball \( {B} \) for this distance,

\[ \mathbb{P}(\mu_n\in B) \approx \exp\left(-n^2\inf_{B}(\mathcal{I}-\inf_{\mathcal{M}}\mathcal{I})\right) \]

Now either \( {\mu_*\in B} \) and in this case \( {\mathbb{P}(\mu_n\in B)\approx 1} \), or \( {\mu_*\not\in B} \) and in this case \( {\mathbb{P}(\mu_n\in B)\rightarrow0} \) exponentially fast. Actually, the first Borel-Cantelli lemma allows to deduce that almost surely

\[ \lim_{n\rightarrow\infty}\mu_n = \mu_* = \arg\inf \mathcal{I} = \frac{\mathbf{1}_{\{z\in\mathbb{C}:\left|z\right|\leq1\}}}{\pi}dz, \]

where \( {z=x+iy} \) and \( {dz=dxdy} \). This phenomenon is known as the circular law. If one starts with a Hermitian random Gaussian matrix -- the Gaussian Unitary Ensemble (GUE) -- then the same analysis is available, and produces a convergence to the semicircle law on \( {[-2,2]} \).

The circular law is universal, in the sense that it remains valid if one drops the Gaussian assumption of the entries of the matrix, while keeping the i.i.d. structure and the \( {1/n} \) variance. This was the subject of a long series of works by Girko, Bai, Tao and Vu, among others, and was the subject of several posts. Another way to go beyond the Gaussian case is to start from the Coulomb gas and to replace the quadratic confining potential \( {\left|\cdot\right|^2} \) by a more general potential \( {V:\mathbb{C}\rightarrow\mathbb{R}} \), not necessarily radial. This type of generalization was studied by Saff and Totik, Hiai and Petz, Adrien Hardy, among others.

Beyond random matrices, how about the empirical measure of random particles in \( {\mathbb{R}^d} \) with Coulomb type singular repulsion and external field confinement? Is there an analogue of the circular law phenomenon? Does the ball replace the disc? The answer is positive.

Further reading.

Some other posts: