Press "Enter" to skip to content

Category: Uncategorized

Spectral gap concentration in high dimension

London tube logo "Mind the gap!"

This short post is devoted to a very natural and simple random diffusion operator on the Euclidean space for which the spectral gap exhibits high-dimensional concentration.

Diffusion operator. Let $\Sigma$ be a $d\times d$ positive definite real symmetric matrix, and ${(X_t)}_{t\geq0}$ be the simplest Markov diffusion on $\mathbb{R}^d$ for which the Gaussian $\mu=\mathcal{N}(0,\Sigma)$ is invariant. It is the overdamped Langevin or Ornstein-Uhlenbeck process solving the stochastic differential equation \[ X_t =X_0+\sqrt{2}B_t-\int_0^t\nabla V(X_s)\mathrm{d}s =X_0+\sqrt{2}B_t-\int_0^t\Sigma^{-1}X_s\mathrm{d}s \] where $B$ is a standard Brownian motion on $\mathbb{R}^d$, and $V(x)=\frac{1}{2}\langle\Sigma^{-1}x,x\rangle$. The process is ergodic: $X_t\to\mu$ in law as $t\to\infty$ regardless of $X_0$, and more generally exponentially ergodic: for all $X_0$ and $t\geq0$, denoting by $f_t$ the density of $\mathrm{Law}(X_t)$ with respect to $\mu$, we have \[ \int(f_t-1)^2\mathrm{d}\mu \leq\mathrm{e}^{-2\gamma t} \int(f_0-1)^2\mathrm{d}\mu \] where $\gamma > 0$ is the spectral gap of the linear differential operator \[ A=\Delta-\nabla V\cdot\nabla=\Delta-\Sigma^{-1}x\cdot\nabla \] seen as an unbounded self-adjoint operator on $L^2(\mu)$. It is the infinitesimal generator of the Markov semigroup of the process. The term spectral gap comes from the fact that \[ \mathrm{spec}(-A)\subset\{0\}\cup[\gamma,+\infty). \] By using the evolution equation $\partial_t f_t=Af_t$, the Grönwall lemma, and an integration by parts, the exponential decay above is equivalent to the functional inequality \[ \gamma\mathrm{Var}_\mu(f)\leq\mathbb{E}_\mu(f(-A)f)=\int|\nabla f|^2\mathrm{d}\mu. \] By the change of function $f=g(\Sigma^{-1/2}\cdot)$ and the optimal Poincaré inequality for $\mathcal{N}(0,I_d)$, \[ \gamma=\lambda_{\min}(\Sigma^{-1})=(\lambda_{\max}(\Sigma))^{-1}=(\|\Sigma\|_{\mathrm{op.}})^{-1}. \] If $v\in\mathbb{R}^d$, $v\neq0$, is an eigenvector of $\Sigma^{-1}$ associated to $\lambda_{\min}(\Sigma^{-1})$, then the deformed multivariate Hermite polynomial $f(x)=x\cdot v$ satisfies, since $\Delta f=0$, $-Af(x)=\Sigma^{-1}x\cdot v=\gamma f(x)$, and is therefore an eigenfunction of $-A$ associated to $\gamma$.
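
To illustrate the exponential decay and the formula $\gamma=(\lambda_{\max}(\Sigma))^{-1}$, here is a minimal numerical sketch in Python (assuming NumPy; the matrix $\Sigma$, the step size, and the Monte Carlo sizes are arbitrary illustration choices, not from the post). It runs an Euler-Maruyama discretization of the SDE started from $\mu$ and checks that the stationary correlation of the eigenfunction $f(x)=v\cdot x$ decays like $\mathrm{e}^{-\gamma t}$.

    # Sketch: decay rate of the slow eigenfunction f(x) = v.x equals the spectral gap.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 5
    M = rng.standard_normal((d, d))
    Sigma = M @ M.T / d + np.eye(d)            # an arbitrary positive definite Sigma
    Sigma_inv = np.linalg.inv(Sigma)
    eigval, eigvec = np.linalg.eigh(Sigma)     # ascending eigenvalues
    gamma = 1.0 / eigval[-1]                   # spectral gap = 1 / lambda_max(Sigma)
    v = eigvec[:, -1]                          # eigenvector of Sigma^{-1} for lambda_min(Sigma^{-1})

    dt, n_steps, n_paths = 1e-3, 2000, 5000    # time horizon t = 2
    t = n_steps * dt
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n_paths)   # start at stationarity
    X0 = X.copy()
    for _ in range(n_steps):
        # Euler-Maruyama step for dX = -Sigma^{-1} X dt + sqrt(2) dB
        X = X - (X @ Sigma_inv) * dt + np.sqrt(2 * dt) * rng.standard_normal((n_paths, d))

    f0, ft = X0 @ v, X @ v
    print("empirical correlation decay:", np.mean(f0 * ft) / np.mean(f0 ** 2))
    print("predicted exp(-gamma*t)    :", np.exp(-gamma * t))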

Disorder and concentration. We now make the matrix $\Sigma$ random, independent of the initial condition $X_0$ and of the driving Brownian motion $B$. The process $X$ is then a diffusion in random environment, a disordered overdamped Langevin or Ornstein-Uhlenbeck diffusion. A simple and natural way to make $\Sigma$ random is to take it to be the empirical covariance matrix of a sample $Z_1,\ldots,Z_n$ made of $n$ independent copies of a random vector of $\mathbb{R}^d$ of law $\mathcal{N}(0,I_d)$. More precisely, \[ \Sigma=\frac{1}{n}\sum_{i=1}^nZ_i\otimes Z_i=\frac{1}{n}ZZ^\top \] where $Z$ is the $d\times n$ matrix with columns $Z_1,\ldots,Z_n$. We have \[ \mathbb{E}\Sigma=I_d. \] The random matrix $\Sigma$ is symmetric and positive semi-definite, and almost surely positive definite when $n\geq d$. Its law is known as the Wishart distribution, with parameters $n$, $d$, and $I_d$. In the high-dimensional regime \[ n,d\to\infty\quad\text{with}\quad \frac{d}{n}\to\rho\in(0,\infty), \] a famous result of random matrix theory states that \[ \lambda_{\max}(\Sigma)\xrightarrow[]{\mathrm{a.s.}}(1+\sqrt{\rho})^2. \] Moreover, denoting by $\mathrm{TW}$ the Tracy-Widom distribution of parameter $\beta=1$, \[ c_\rho n^{2/3}(\lambda_{\max}(\Sigma)-(1+\sqrt{\rho})^2) \xrightarrow[]{\mathrm{law}} \mathrm{TW} \] where $c_\rho=\rho^{1/6}/(1+\sqrt{\rho})^{4/3}$. By the delta method, this gives a concentration of the spectral gap $\gamma=(\lambda_{\max}(\Sigma))^{-1}$ of the random diffusion operator $A$ in high dimension: \[ \gamma\approx\frac{1}{(1+\sqrt{\rho})^2}-\frac{\mathrm{TW}}{c_\rho(1+\sqrt{\rho})^4n^{2/3}}. \]
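
A quick Monte Carlo sketch of this concentration, in Python with NumPy (the sizes $n$, $d$ and the number of repetitions are arbitrary illustration choices):

    # Sketch: largest eigenvalue of the empirical covariance vs the bulk edge (1+sqrt(rho))^2.
    import numpy as np

    rng = np.random.default_rng(1)
    n, d, reps = 400, 200, 200
    rho = d / n
    edge = (1 + np.sqrt(rho)) ** 2
    c_rho = rho ** (1 / 6) / (1 + np.sqrt(rho)) ** (4 / 3)

    lam_max = np.empty(reps)
    for r in range(reps):
        Z = rng.standard_normal((d, n))
        lam_max[r] = np.linalg.eigvalsh(Z @ Z.T / n)[-1]   # eigvalsh sorts ascending

    print("mean lambda_max           :", lam_max.mean())
    print("bulk edge (1+sqrt(rho))^2 :", edge)
    # the centered and rescaled quantity should be roughly TW1-distributed (mean about -1.2)
    print("mean of c_rho*n^(2/3)*(lambda_max - edge):",
          (c_rho * n ** (2 / 3) * (lam_max - edge)).mean())
    print("mean spectral gap         :", (1 / lam_max).mean(), "vs", 1 / edge)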

Inverse disorder. Alternatively, we could put the Wishart disorder directly on the quadratic form $V(x)=\frac{1}{2}\Sigma^{-1}x\cdot x$, in other words on $\Sigma^{-1}$ instead of on $\Sigma$. In this case, the spectral gap is directly and simply the $\lambda_{\min}$ of the Wishart random matrix, leading, when $\rho < 1$, to the high-dimensional concentration \[ \gamma\approx(1-\sqrt{\rho})^2-\frac{\mathrm{TW}}{c'_\rho n^{2/3}} \quad\text{where}\quad c'_\rho=\frac{\rho^{1/6}}{(1-\sqrt{\rho})^{4/3}}. \] The case $\rho=1$ corresponds to a hard edge, and gives rise to another type of fluctuation, while the case $\rho > 1$ gives $\gamma=0$ as soon as $d>n$, since the Wishart matrix has then rank at most $n<d$.
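
The same kind of sketch for the smallest eigenvalue, under the same assumptions as above:

    # Sketch: smallest Wishart eigenvalue, i.e. the spectral gap with the disorder on Sigma^{-1}.
    import numpy as np

    rng = np.random.default_rng(2)
    for n, d in [(400, 100), (400, 300), (400, 500)]:
        Z = rng.standard_normal((d, n))
        lam_min = np.linalg.eigvalsh(Z @ Z.T / n)[0]
        print(f"n={n}, d={d}, rho={d/n:.2f}: lambda_min={lam_min:.4f}, "
              f"(1-sqrt(rho))^2={(1 - np.sqrt(d/n))**2:.4f}")
    # For d > n the matrix has rank at most n, so lambda_min is exactly 0
    # (up to numerical noise): the spectral gap vanishes.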

Universality. These high-dimensional behaviors are universal, in the sense that they remain the same if $Z$ is no longer Gaussian but still has i.i.d. coordinates of mean zero and unit variance. However, for the $\lambda_{\max}$, we also have to assume that the coordinates of $Z$ have a finite fourth moment, otherwise the fluctuation is heavy-tailed instead of Tracy-Widom.
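
As a quick sanity check of this universality, here is the same experiment with Rademacher entries instead of Gaussian ones (a NumPy sketch, sizes arbitrary):

    # Sketch: the edge (1+sqrt(rho))^2 is unchanged for centered unit-variance Rademacher entries.
    import numpy as np

    rng = np.random.default_rng(3)
    n, d = 2000, 1000
    Z = rng.choice([-1.0, 1.0], size=(d, n))      # mean 0, variance 1, all moments finite
    lam_max = np.linalg.eigvalsh(Z @ Z.T / n)[-1]
    print(lam_max, "vs", (1 + np.sqrt(d / n)) ** 2)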

Random Schrödinger operator. Let us consider the linear isometry \[ T:L^2(\mu)\to L^2(\mathrm{d}x),\quad T(f)=f\sqrt{\varphi}, \] where $\varphi$ is the density of $\mu$ with respect to the Lebesgue measure. Then some algebra gives \[ T\circ A\circ T^{-1} =\Delta+W \] where the multiplicative potential is given by \[ W(x)=-\tfrac{1}{4}|\nabla V(x)|^2+\tfrac{1}{2}\Delta V(x) =-\tfrac{1}{4}|\Sigma^{-1}x|^2+\tfrac{1}{2}\mathrm{Tr}(\Sigma^{-1}). \] This is a real Schrödinger operator, with potential $W$. Since $W$ is a quadratic form up to an additive constant, the operator is a (deformed and real) quantum harmonic oscillator. It has the same spectrum as $A$, while its eigenfunctions are damped versions of those of $A$, namely multiplied by $\sqrt{\varphi}$. This passage from $L^2(\mu)$ to $L^2(\mathrm{d}x)$ is known as the ground state transform.
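
The identity $T\circ A\circ T^{-1}=\Delta+W$ is easy to check symbolically; here is a one-dimensional sketch with SymPy (the use of SymPy and the specific quadratic $V$ below are illustration choices, not from the post):

    # Sketch of the ground state transform in dimension 1: conjugating
    # A = d^2/dx^2 - V' d/dx by multiplication with exp(-V/2) (the normalization
    # constant of sqrt(phi) cancels) gives d^2/dx^2 + W with W = -V'^2/4 + V''/2.
    import sympy as sp

    x, s2 = sp.symbols('x sigma2', positive=True)
    g = sp.Function('g')
    V = x ** 2 / (2 * s2)                                    # V(x) = x^2 / (2*sigma^2)

    A = lambda h: sp.diff(h, x, 2) - sp.diff(V, x) * sp.diff(h, x)
    conj = sp.exp(-V / 2) * A(sp.exp(V / 2) * g(x))          # (T o A o T^{-1}) applied to g
    W = -sp.Rational(1, 4) * sp.diff(V, x) ** 2 + sp.Rational(1, 2) * sp.diff(V, x, 2)
    print(sp.simplify(conj - (sp.diff(g(x), x, 2) + W * g(x))))   # -> 0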

Explicit formulas. For the sake of clarity and completeness, let us compute the full eigendecomposition of $A$. Pick $f:\mathbb{R}^d\to\mathbb{R}$ and $O\in\mathbb{O}_d(\mathbb{R})$. For every $1\leq i\leq d$, \[ \partial_if(Ox)=\sum_{j=1}^d(\partial_jf)(Ox)O_{ji} \quad\text{and}\quad \partial_{ii}^2f(Ox)=\sum_{j,k=1}^d(\partial_{jk}^2f)(Ox)O_{ji}O_{ki}. \] But $OO^\top=I_d$, hence $\Delta f(Ox)=(\Delta f)(Ox)$, expressing the rotational invariance of $\Delta$. We also get $\nabla f(Ox)=O^\top(\nabla f)(Ox)$. Thus \begin{align*} Af(Ox) &=(\Delta f)(Ox)-\Sigma^{-1}x\cdot O^\top(\nabla f)(Ox)\\ &=(\Delta f)(Ox)-O\Sigma^{-1}O^\top Ox\cdot(\nabla f)(Ox)\\ &=(Bf)(Ox) \end{align*} where \[ B=\Delta-Dx\cdot\nabla \quad\text{and}\quad D=O\Sigma^{-1}O^\top. \] It follows that $-Bf=\lambda f$ iff $-Ag=\lambda g$ where $g(x)=f(Ox)$. Let us select $O$ such that $D=\mathrm{Diag}(s^2_1,\ldots,s^2_d)$. Then we have the direct sum \[ B=\Delta-Dx\cdot\nabla=\sum_{i=1}^dB_{s_i} \quad\text{where}\quad B_{s_i}=\partial_{ii}^2-s_i^2x_i\partial_i. \] Now we observe that for all $f:\mathbb{R}\to\mathbb{R}$, $a > 0$, $s > 0$, \[ B_s(f(ax)) =a^2f''(ax)-s^2xaf'(ax) =a^2(B_{\frac{s}{a}}f)(ax). \] It follows that $-B_1f=\lambda f$ iff $-B_sg=\lambda s^2g$ where $g(x)=f(sx)$.
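
The scaling identity at the end of this paragraph can be verified symbolically on a concrete smooth test function; a small SymPy sketch (the test function is an arbitrary choice):

    # Sketch: check B_s(f(a x)) = a^2 (B_{s/a} f)(a x) on a concrete smooth f.
    import sympy as sp

    x, a, s = sp.symbols('x a s', positive=True)
    f = sp.exp(sp.sin(x))                                            # arbitrary smooth test function
    B = lambda h, t: sp.diff(h, x, 2) - t ** 2 * x * sp.diff(h, x)   # B_t = d^2/dx^2 - t^2 x d/dx
    lhs = B(f.subs(x, a * x), s)                                     # B_s applied to x -> f(a x)
    rhs = a ** 2 * B(f, s / a).subs(x, a * x)                        # a^2 (B_{s/a} f)(a x)
    print(sp.simplify(lhs - rhs))                                    # -> 0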

But in $L^2(\mathcal{N}(0,1))$, we have $\mathrm{spec}(-B_1)=\mathbb{N}=\{0,1,2,\ldots\}$ and the eigenfunctions are the Hermite polynomials ${(H_k)}_{k\in\mathbb{N}}$, orthonormal with respect to the standard Gaussian $\mathcal{N}(0,1)$. Putting everything together, we obtain finally that \[ \mathrm{spec}(-A)=\{k_1s_1^2+\cdots+k_ds_d^2:k\in\mathbb{N}^d\} =s_1^2\mathbb{N}+\cdots+s_d^2\mathbb{N} \] where $s_1^2,\ldots,s_d^2$ are the eigenvalues of $\Sigma^{-1}$, while the subspace of eigenfunctions associated to an eigenvalue $\lambda\in\mathrm{spec}(-A)$ is spanned by the rotated and scaled multivariate Hermite polynomials given, for an arbitrary $k\in\mathbb{N}^d$ such that $\lambda=k_1s_1^2+\cdots+k_ds_d^2$, by \[ g(x)=P(Ox) \quad\text{where}\quad P(x)=\prod_{i=1}^dH_{k_i}(s_ix_i) \] and where the rows of $O$ are orthonormal eigenvectors of $\Sigma^{-1}$, in other words $\Sigma^{-1}=O^\top\mathrm{Diag}(s_1^2,\ldots,s_d^2)O$.
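
Here is a small exact SymPy check of this eigendecomposition on a $2\times2$ example (the matrix $\Sigma^{-1}$ and the multi-index $k$ below are arbitrary illustration choices; the Hermite polynomials are taken unnormalized, which does not affect the eigenvalue equation):

    # Sketch: Sigma^{-1} = [[2,1],[1,2]] has eigenvalues s1^2 = 1 and s2^2 = 3;
    # the rotated product of Hermite polynomials g(x) = P(Ox) with k = (2,1)
    # should satisfy -A g = (2*1 + 1*3) g = 5 g.
    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    Sinv = sp.Matrix([[2, 1], [1, 2]])
    O = sp.Matrix([[1, -1], [1, 1]]) / sp.sqrt(2)      # rows = eigenvectors, O*Sinv*O.T = diag(1,3)
    s1, s2 = 1, sp.sqrt(3)

    He = {1: lambda u: u, 2: lambda u: u ** 2 - 1}     # Hermite polynomials H_1, H_2 for N(0,1)
    y1, y2 = O * sp.Matrix([x1, x2])                   # y = O x
    g = He[2](s1 * y1) * He[1](s2 * y2)                # k = (2, 1)

    grad = sp.Matrix([sp.diff(g, x1), sp.diff(g, x2)])
    Ag = sp.diff(g, x1, 2) + sp.diff(g, x2, 2) - (Sinv * sp.Matrix([x1, x2])).dot(grad)
    print(sp.simplify(-Ag - 5 * g))                    # -> 0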

We could conduct the previous analysis directly on the Schrödinger operator, by using the rotational invariance of the Laplacian, the fact that $O^\top$ preserves the Euclidean norm, and the cyclic property of the trace: \begin{align*} (\Delta+W)(f(Ox)) &=(\Delta f)(Ox)+\bigl(-\tfrac{1}{4}|DOx|^2+\tfrac{1}{2}\mathrm{Tr}D\bigr)f(Ox)\\ &=\Bigl(\sum_{i=1}^d\bigl(\partial_{ii}^2-\tfrac{1}{4}D_{ii}^2x_i^2+\tfrac{1}{2}D_{ii}\bigr)f\Bigr)(Ox). \end{align*}

Further reading.

  • Edelman, Alan
    The distribution and moments of the smallest eigenvalue of a random matrix of Wishart type
    Linear Algebra and its Applications 159 (1991) 55-80
  • Edelman, Alan
    Eigenvalues and Condition Numbers of Random Matrices
    SIAM Journal on Matrix Analysis and Applications 9(4) (1988)
  • Johnstone, Iain M.
    On the Distribution of the Largest Eigenvalue in Principal Components Analysis
    The Annals of Statistics 29(2) (2001) 295-327
  • Lee, Ji Oon and Schnelli, Kevin
    Tracy-Widom distribution for the largest eigenvalue of real sample covariance matrices with general population
    Annals of Applied Probability 26(6) (2016) 3786-3839
  • Auffinger, Antonio, Ben Arous, Gérard, and Péché, Sandrine
    Poisson convergence for the largest eigenvalues of heavy tailed random matrices
    Annales de l’Institut Henri Poincaré - Probabilités et Statistiques 45(3) (2009) 589-610