Press "Enter" to skip to content

Libres pensées d'un mathématicien ordinaire Posts

Orthogonal polynomials and Jacobi matrices

There are deep and beautiful links between orthogonal polynomials, symmetric tridiagonal matrices, one dimensional Schrödinger operators, and the Hamburger moments problem, see e.g. Van Assche, Szeg Ho, Chihara, Dette and Studden. The relationships with birth-and-death processes on \( {\mathbb{Z}_+} \) are considered in Karlin and Mc Gregor, Karlin and Mc Gregor, Van Assche, Dette and Studden.

Fix a prescribed bounded sequence \( {(\gamma_n)_{n\geq0}} \) of positive reals and define the symmetric tridiagonal matrix

\[ \Gamma= \begin{pmatrix} 0 & \gamma_1 \\ \gamma_1 & 0 & \gamma_2 \\ & \gamma_2 & 0 & \gamma_3 \\ && \ddots & \ddots & \ddots \\ &&& \gamma_{n-2} & 0 & \gamma_{n-1} \\ &&&& \gamma_{n-1} & 0 \end{pmatrix} \ \ \ \ \ (1) \]

with eigenvalues \( {(\lambda_{n,k})_{1\leq k\leq n}} \) and empirical spectral distribution

\[ \mu_{\Gamma}=\frac{1}{n}\sum_{k=1}^n\delta_{\lambda_{n,k}}. \]

The characteristic polynomial \( {P_n} \) of \( {\Gamma} \) is the last term of the second order recursion

\[ P_{k+1}(x)=xP_k(x)-\gamma_k^2P_{k-1}(x) \]

for every \( {0\leq k\leq n-1} \), with \( {P_{-1}\equiv0} \) and \( {P_0\equiv1} \). From the classical theory, \( {(P_n)_{n\geq0}} \) is the sequence of orthogonal polynomials for a compactly supported probability measure \( {\nu} \) on \( {\mathbb{R}} \). Namely, for every \( {i,j\geq0} \),

\[ \int_{\mathbb{R}}\!P_i(x)P_j(x)\,\nu(dx)=\delta_{i,j}. \]

The law \( {\nu} \) is unique and solves a Hamburger moments problem. The matrix \( {\Gamma} \) and the law \( {\nu} \) are known respectively as the Jacobi matrix and the spectral measure of \( {(P_n)_{n\geq0}} \). The roots \( {(\lambda_{n,k})_{1\leq k\leq n}} \) of \( {P_n} \) are real, distinct, belong to the support of \( {\nu} \), and there exists one and only one root of \( {P_{n+1}} \) between two roots of \( {P_n} \) (interlacement). The Cauchy-Stieltjes transform of \( {\nu} \) is given by the continuous fraction

\[ \int_{\mathbb{R}}\!\frac{1}{z-x}\,\nu(dx) =\cfrac{1}{z-\cfrac{\gamma_1^2}{z-\cfrac{\gamma_2^2}{z-\ddots}}} \]

for all \( {z\in\mathbb{C}_+.} \) The \( {n^\text{th}} \) approximant of this continuous fraction is the Cauchy-Stieltjes transform of the Gauss-Christoffel quadrature \( {\nu_n} \) of \( {\nu} \). In other words, \( {\nu_n} \) is the discrete law on \( {\mathbb{R}} \) with atoms \( {(\lambda_{n,k})_{1\leq k\leq n}} \) which is equal to \( {\nu} \) as a linear form on all polynomials of degree \( {\leq n} \). If \( {(v_{n,k})_{1\leq k\leq n}} \) are the orthogonal eigenvectors of the symmetric matrix \( {\Gamma} \) associated to the eigenvalues \( {(\lambda_{n,k})_{1\leq k\leq n}} \), then from Proposition 1.1.9 in Hiai and Petz,

\[ \nu_n=\frac{1}{n}\sum_{k=1}^n\left<v_{n,k},e_1\right>^2\delta_{\lambda_{n,k}} \]


\[ \nu_n\underset{n\rightarrow\infty}{\longrightarrow}\nu \]

where the convergence is the weak convergence for continuous bounded functions and all polynomials (Wasserstein topology). It turns out that \( {\nu_n} \) is the so called distribution measure at vector \( {e_1} \) of the self–adjoint operator \( {\Gamma} \), while \( {\nu} \) is the distribution measure at vector \( {e_1} \) of the self–adjoint operator

\[ \overline{\Gamma}:\ell^2(\mathbb{N})\rightarrow\ell^2(\mathbb{N}) \]

where \( {\mathbb{N}=\{1,2,\ldots\}} \), see e.g. Example 1.1.1 and Proposition 1.1.9 in Hiai and Petz. Here \( {\overline{\Gamma}} \) is the operator which coincides with \( {\Gamma} \) on \( {\ell^2(\{1,\ldots,n\})} \) for every \( {n\geq1} \). In other words, for every \( {k\geq0} \),

\[ \left<\overline{\Gamma}^k e_1,e_1\right>_{\ell^2(\mathbb{N})} =\lim_{n\rightarrow\infty}\left<\Gamma^k e_1,e_1\right>_{\ell^2(\{1,\ldots,n\})} =\lim_{n\rightarrow\infty}\int_{\mathbb{R}}\!x^k\,\nu_n(dx) =\int_{\mathbb{R}}\!x^k\,\nu(dx). \]

One should not confuse \( {\nu_n} \) with \( {\mu_\Gamma} \). In general, \( {\mu_\Gamma} \) does not converge weakly to \( {\nu} \). Actually, a famous theorem by Erdös & Turán states that if \( {\nu} \) is supported on \( {[-1,+1]} \) with almost-everywhere positive density, then \( {\mu_\Gamma} \) tends weakly as \( {n\rightarrow\infty} \) to the arc–sine distribution on \( {[-1,+1]} \), see e.g. pages 24–25 in Van Assche. In fact the convergence holds for all moments.

In the special case where \( {\gamma_n=1} \) for every \( {n\geq1} \), then \( {\nu} \) turns out to be a Wigner semi–circle law on \( {[-1,+1]} \) and \( {(P_n)_{n\geq0}} \) are the Jacobi polynomials with parameters \( {\alpha=\beta=1/2} \), also known as the Chebyshev polynomials of the second kind. From the Erdős-Turán theorem, we get then a simple example for which \( {\nu} \) (Wigner semi–circle law on \( {[-1,+1]} \)) is not the weak limit \( {\mu} \) of \( {\mu_\Gamma} \) (arc–sine law on \( {[-1,+1]} \)). On the other hand, it is well known that the arc–sine law on \( {[-1,+1]} \) is a fixed point of the map \( {\nu\mapsto\mu} \) and \( {(P_n)_{n\geq0}} \) are in this case the Chebyshev polynomials of the first kind.

Recall that the recursive relations of the Chebyshev polynomials of the first and second kind differ only on the first coefficients. It is quite natural to consider the empirical measure of the roots \( {\mu_{Q_n}} \) associated to the sequence of random orthogonal polynomials \( {(Q_n)_{n\geq0}} \) defined by the recursive relation

\[ Q_{n+1}(x)=xQ_n(x)-W_nQ_{n-1}(x) \]

for every \( {n\geq1} \) with initial conditions \( {Q_0\equiv1} \) and \( {Q_1(x)=x} \), where this time \( {(W_n)_{n\geq1}} \) are i.i.d. positive random variables. For every \( {n\geq1} \), the random polynomial \( {Q_n} \) is the characteristic polynomial of the random matrix

\[ \begin{pmatrix} 0 & \sqrt{W_1} \\ \sqrt{W_1} & 0 & \sqrt{W_2} \\ &\sqrt{W_2} & 0 & \sqrt{W_3} \\ &&\ddots & \ddots &\ddots \\ &&&\sqrt{W_{n-2}}&0&\sqrt{W_{n-1}} \\ &&&&\sqrt{W_{n-1}}&0 \end{pmatrix}. \]

This matrix is the so called Jacobi matrix of the sequence \( {(Q_n)_{n\geq0}} \), and its empirical spectral distribution is \( {\mu_{Q_n}} \). Since this matrix is symmetric tridiagonal with i.i.d. entries, the method of moments allows to show that a.s. \( {\mu_{Q_n}} \) tends as \( {n\rightarrow\infty} \) to a non-random probability measure which depends on the common law of the \( {W_i} \)’s via its moments. Here again, the limit of \( {\mu_{Q_n}} \) as \( {n\rightarrow\infty} \) does not coincide in general with the (random) law for which \( {(Q_n)_{n\geq0}} \) are orthogonal.

Note: the content of this post is taken from arXiv 0811.1097v1, but was removed from the lightweight published version arXiv 0811.1097v2.

Leave a Comment

Localization of eigenvectors: heavy tailed random matrices

We measure the localization of an eigenvector $latex v\in\mathbb{R}^n$, $latex v\neq0$, by the norm ratio $latex \sqrt{n}\left\Vert v\right\Vert_\infty \left\Vert v\right\Vert_2^{-1}$.

In what follows, the matrix $latex X$ is $latex n\times n$, with i.i.d. entries and $latex X_{11}=\varepsilon (U^{-1/\alpha}-1)$ where $latex \varepsilon$ and $latex U$ are independent with $latex \varepsilon$ symmetric Rademacher (random sign) and $latex U$ uniform on $latex [0,1]$. For every $latex t>0$, we have $latex \displaystyle \mathbb{P}(|X_{11}|>t)=(1+t)^{-\alpha}$.

Each of the following plots corresponds to a single realization of the matrix. In each plot, each small red cirlce corresponds to a couple (eigenvalue, eigenvector). A logarithmic scale is used for the eigenvalues, while the localization is used for the eigenvectors.  The notion of “bulk of the spectrum” is unclear when $latex \alpha<2$. Beware that we should not 100% trust numerics when dealing with heavy tailed random matrices (no truncation here).

Symmetric model: eigenvectors localization versus eigenvalues (Octave code)

Full iid model: eigenvectors localization versus singular values (Octave code)

Note: This post is motivated by an animated coffee break discussion with Alice Guionnet and Charles Bordenave during a conference on random matrices (Paris, June 1-4, 2010). You may forge numerous conjectures from these computer simulations. Another interesting problem is the limiting behavior of the eigenvalues or singular values spacings when the dimension goes to infinity, and its dependency over the tail index $latex \alpha$.

Recent addition to this post: Sandrine Péché pointed out that conjectures have been made by the physicists Bouchaud and Cizeau in their paper Theory of Lévy matrices published in Phys. Rev. E 50, 1810–1822 (1994).

1 Comment

Test problems

Suppose that $latex X_1,\ldots,X_n$ and $latex Y_1,\ldots,Y_n$ are two independent samples following a Gaussian law on $latex \mathbb{R}^p$ with zero mean and unknown invertible covariances $latex A$ and $latex B$. We consider the situation where $latex p$ is much larger than $latex n$ but we assume that $latex A^{-1}$ and $latex B^{-1}$ are sparse (conditional independence on the components). How can we test efficiently if $latex A=B$? Same question if the sparsity structure is assumed in $latex A$ and $latex B$ rather than on their inverse (independence instead of conditional independence).

Suppose that $latex (X_1,Y_1),\ldots,(X_n,Y_n)$ is a sample drawn from some unknown law on $latex \mathbb{R}^2.$ How can we efficiently test if the marginal laws are identical? More generally, suppose that $latex \ldots,Z_{-1}, Z_0, Z_1,\ldots$ is a time series. How can we test if the series is stationary? In other words, how can we test a nonparametric linear structure from the observation of a sample of an unknown law? Random projections?

For these questions coming from applications, we seek for a concrete usable answer…

This post is inspired from (separate) informal discussions with Georges Oppenheim and Didier Concordet. My friend and colleague Christophe Giraud told me that he is working with Nicolas Verzelen and Fanny Villers on the first question.


Large deviations for random matrices


I am not fond of the large deviations industry, but I like very much the Sanov theorem associated to the law of large numbers expressed on empirical distributions. The rate function in this large deviation principle is the Kullback-Leibler relative entropy.  It turns out that a very pleasant Sanov type theorem exists for certain models of random matrices. Namely, let $latex V:\mathbb{R}\to\mathbb{R}$ be a continuous function such that $latex \exp(-V)$ is Lebesgue integrable and  $latex V$ grows faster than the logarithm at infinity. For instance, one may take $latex V(x)=x^2/2$. Let $latex H_n$ be a random $latex n\times n$ Hermitian matrix with Lebesgue density proportional to

$latex \displaystyle H\mapsto \exp\left(-n\mathrm{Tr}(V(H))\right).$

Let $latex \lambda_{n,1},\ldots,\lambda_{n,n}$ be the eigenvalues of $latex H_n$. Their ordering  does not matter here. These eigenvalues are real since $latex H_n$ is Hermitian.  The unitary invariance of the law of $latex H_n$ allows to show that the law of $latex \lambda_{n,1},\ldots,\lambda_{n,n}$ , quotiented by the action of the symmetric group, has Lebesgue density proportional to

$latex \displaystyle (\lambda_{n,1},\ldots,\lambda_{n,n})\mapsto \exp\left(-\sum_{k=1}^n nV(\lambda_{n,k})\right)\prod_{i\neq j}|\lambda_{n,i}-\lambda_{n,j}|^2$

which can be rewritten as

$latex \displaystyle (\lambda_{n,1},\ldots,\lambda_{n,n})\mapsto \exp\left(-\sum_{k=1}^n nV(\lambda_{n,k})+\frac{1}{2}\sum_{i\neq j}\log|\lambda_{n,i}-\lambda_{n,j}|\right).$

The Vandermonde determinant comes from the Jacobian of the diagonalization seen as a change of variable (integration over the eigenvectors), and can also be seen as the discriminant (resultant of $latex P, P’)$ of the characteristic polynomial $latex P$. Let us consider the random counting probability measure of these eigenvalues:

$latex \displaystyle L_n:=\frac{1}{n}\sum_{k=1}^n\delta_{\lambda_{n,k}}$

which is a random element of the convex set $latex \mathcal{P}$ of probability measures on $latex \mathbb{R}$. A very nice result due to Ben Arous and Guionnet states that for the topology of the narrow convergence, the sequence $latex (L_n)_{n\geq1}$ satisfies to a large deviation principle with speed $latex n^2$ and good rate function $latex \Phi$ given up to an additive constant by

$latex \displaystyle \mu\in\mathcal{P}\mapsto \Phi(\mu):=\int\!V\,d\mu-\int\!\int\!\log|x-y|\,d\mu(x)d\mu(y)$.

In other words, we have for every nice set $latex A\subset\mathcal{P}$, as $latex n\to\infty$,

$latex \displaystyle \mathbb{P}(L_n\in A)\approx \exp\left(-n^2\inf_{\mu\in A}\Phi(\mu)\right).$

The second term in the definition of $latex \Phi(\mu)$ is the logarithmic energy of $latex \mu$ (minus the Voiculescu free entropy). When $latex V(x)=x^2/2$, then $latex H_n$ belongs to the Gaussian Unitary Ensemble and  the Wigner semicircle law is a maximum of this entropy under a second moment constraint, see e.g. the book by Saff and Totik. In particular, since the proof does not involve the underlying almost sure convergence theorem (in contrary to the Cramer theorem which involves the law of large numbers), the large deviation principle of Ben Arous and Guionnet yields via the first Borel-Cantelli lemma a new proof of the Wigner theorem.

In my opinion (I have a mathematical physicist soul!) the large deviation principle of Ben Arous and Guionnet is one of the nicest results in random matrix theory. It explains the appearance of the Wigner semicircle law in the Wigner theorem via a maximum entropy or minimum energy under moments constraints. Unfortunately, the result is only available for unitary invariant ensembles and does not cover the case of non Gaussian Wigner matrices with i.i.d. entries, for which the Wigner theorem is still valid. It is tempting to seek for a version of this large deviation principle for such non Gaussian Wigner matrices. The answer is not known. It is not clear that the sole finite positive variance assumption is enough to ensure that the rate function is the one of the Gaussian Unitary Ensemble. It is probable that the rate function will depend on the law of the entries. However, the arg-infimum of this function is still the Wigner semicircle law.

The proof of Ben Arous and Guionnet relies crucially on the explicit knowledge of the law of the spectrum, due to the unitary invariance of the model (roughly, if one puts the Vandermonde determinant into the exponential as a potential, it looks like a discrete Voiculescu entropy). Ben Arous and Zeitouni have used essentially the same method in order to establish a large deviation principle for the non-Hermitian version of the model, yielding a new proof of the circular law theorem for the Ginibre Ensemble.

It could be nice to connect these large deviations principles with transport inequalities. For connections between these two universes, take a look at the recent survey by Gozlan and Leonard.

Syntax · Style · Tracking & Privacy.