Bartlett decomposition and other factorizations

Maurice Stevenson Bartlett — Maurice S. Bartlett (1910 - 2002)

The study of random matrices benefits from matrix factorizations involving a lower number of independent random variables. This post is devoted to the famous Bartlett decomposition of Wishart random matrices. This beautiful result was published in 1933 by the British statistician Maurice Stevenson Bartlett (1910 - 2002). This post is also the occasion to consider briefly some other decompositions of random matrices, due to Silverstein, Trotter, Dumitriu and Edelman, and Krishnapur and Virág.

Bartlett decomposition of Wishart matrices. In Statistics, the classical Bartlett decomposition provides the distribution of the factor in the Cholesky decomposition of an empirical covariance matrix following the Wishart distribution.

Let us denote by \( {\mathcal{S}_m^+} \) the set of \( {m\times m} \) symmetric matrices with non-negative spectrum. Let \( {X_1,\ldots,X_n} \) be iid random vectors of \( {\mathbb{R}^m} \) following the Gaussian law \( {\mathcal{N}(\mu,\Sigma)} \) with mean \( {\mu\in\mathbb{R}^m} \) and covariance matrix \( {\Sigma\in\mathcal{S}_m^+} \):

\[ \mu:=\mathbb{E}(X_1)\in\mathbb{R}^m \]

and

\[ \Sigma:=\mathbb{E}(X_1\otimes X_1)-\mu\otimes\mu =\mathbb{E}((X_1-\mu)\otimes(X_1-\mu))\in\mathcal{S}_m^+. \]

The random \( {m\times m} \) matrix

\[ S_n:=\sum_{k=1}^n X_k\otimes X_k \]

takes its values in \( {\mathcal{S}_m^+} \), and its law is known as the Wishart distribution

\[ \mathcal{W}_m(\Sigma,n), \]

named after the Scottish statistician John Wishart (1878 - 1956). It is a natural matrix extension of the chi-square distribution (case \( {m=1} \)). If \( {n\geq m} \) and \( {\det(\Sigma)>0} \) then it can be shown that it admits a density given by

\[ S\mapsto \frac{1}{Z_n}(\det S)^{\frac{n-m-1}{2}} \exp\left(-\frac{1}{2}\mathrm{Tr}(\Sigma^{-1}S)\right) \mathbf{1}_{\mathcal{S}_m^+}(S) \]

or alternatively

\[ S\mapsto \frac{1}{Z_n} \exp\left(-\frac{1}{2}\mathrm{Tr}(\Sigma^{-1}S) +\frac{n-m-1}{2}\log\det S\right) \mathbf{1}_{\mathcal{S}_m^+}(S) \]

where \( {Z_n} \) is the normalizing constant given by

\[ Z_n=\det(\Sigma)^{\frac{n}{2}} 2^{\frac{mn}{2}} \pi^{\frac{m(m-1)}{4}} \Gamma\left(\frac{1}{2}\right)\cdots\Gamma\left(\frac{m}{2}\right). \]

Recall that if \( {S\in\mathcal{S}_m^+} \) then one can construct an \( {m\times m} \) upper triangular matrix \( {T} \) with non-negative diagonal such that

\[ H=T^\top T. \]

This is called the Cholesky decomposition or factorization of \( {S} \), named after the French engineer André-Louis René Cholesky (1875 - 1918). It is unique when \( {S} \) is invertible. Note that the matrix \( {T^\top} \) is lower triangular, and the Cholesky factorization is a special case of the so-called \( {LU} \) factorization, where \( {L} \) and \( {U} \) stand for ``lower'' and ``upper'' respectively.

The Bartlett decomposition is as follows: if \( {S_n\sim\mathcal{W}_m(I_m,n)} \) with \( {n\geq m} \) then almost surely \( {\det S_n>0} \) and in the Cholesky factorization

\[ S_n=T^\top T, \]

the random variables \( {{(T_{jk})}_{1\leq j\leq k\leq m}} \) are independent with \( {T_{jj}^2\sim\chi^2_{n-j+1}} \) for every \( {j=1,\ldots,m} \) and \( {T_{jk}\sim\mathcal{N}(0,1)} \) for every \( {1\leq j<k\leq m} \), in other words

\[ T= \begin{pmatrix} T_{11} & \cdots & T_{1m} \\ \vdots & \ddots & \vdots \\ 0 & \cdots & T_{mm} \end{pmatrix} \quad\text{with}\quad T_{jk} \sim \left\{ \begin{array}{cl} \chi_{n-j+1} & \text{if }j=k,\\ \mathcal{N}(0,1) & \text{if }j<k. \end{array} \right. \]

Indeed, this follows immediately from the formula for the density of \( {\mathcal{W}_m(I_m,n)} \) by the change of variables \( {S=T^\top T} \). Namely, using the fact that

\[ \mathrm{Tr}(S)=\mathrm{Tr}(TT^\top)=\sum_{1\leq j\leq k\leq m}T_{jk}^2 \quad\text{and}\quad \det S=(\det T)^2=\prod_{j=1}^m T_{jj}^2, \]

and

\[ dS=2^m\prod_{j=1}^m T_{jj}^{m-j+1}dT =2^m\prod_{j=1}^mT_{jj}^{m-j+1}\prod_{1\leq j\leq k\leq m}dT_{j,k} \]

we obtain that the density of \( {T} \) is

\[ T\mapsto \prod_{1\leq j<k\leq m} \frac{e^{-\frac{1}{2}T_{jk}^2}}{\sqrt{2\pi}} \times \prod_{j=1}^m \frac{(T_{jj}^2)^{\frac{n-j+1}{2}-1}e^{-\frac{1}{2}T_{jj}^2}} {2^{\frac{n-j+1}{2}}\Gamma\left(\frac{n-j+1}{2}\right)} \]

Statistical application of Bartlett decomposition.

Let us assume here that \( {n\geq2} \). If we introduce the empirical mean

\[ \widehat{\mu}_n:=\frac{1}{n}\sum_{k=1}^nX_k \]

then one can show that

\[ \sum_{k=1}^n(X_k-\widehat\mu_n)\otimes(X_k-\widehat\mu_n) \sim\mathcal{W}_m(\Sigma,n-1). \]

If we introduce the empirical covariance matrix

\[ \widehat{\Sigma}_n :=\frac{1}{n}\sum_{k=1}^n(X_k-\widehat\mu_n)\otimes(X_k-\widehat\mu_n) =\frac{S_n}{n}-\widehat\mu_n\otimes\widehat\mu_n \]

then it can be checked that the modified empirical covariance matrix

\[ \widetilde{\Sigma}_n :=\frac{1}{n-1}\sum_{k=1}^n(X_k-\widehat\mu_n)\otimes(X_k-\widehat\mu_n) =\frac{n}{n-1}\widehat{\Sigma}_n =\frac{S_n}{n-1}-\frac{n}{n-1}\widehat\mu_n\otimes\widehat\mu_n \]

is an unbiased estimator of \( {\Sigma} \) in the sense that

\[ \mathbb{E}(\widetilde{\Sigma}_n)=\Sigma. \]

The quantity \( {\det\Sigma} \) is the so-called generalized variance of the sample. It can be estimated with \( {\det\widetilde\Sigma_n} \). Now if \( {n-1\geq m} \) and \( {\det\Sigma>0} \) then

\[ (n-1)\widetilde{\Sigma}_n=S_n\sim\mathcal{W}_m(\Sigma,n-1) \]

and

\[ (n-1)\Sigma^{-1/2}\widetilde{\Sigma}_n\Sigma^{-1/2}\sim\mathcal{W}_m(I_m,n-1) \]

and the Bartlett decomposition gives

\[ (n-1)^{m}\frac{\det\widetilde{\Sigma}_n}{\det\Sigma} =\prod_{j=1}^m T_{jj} \]

where \( {{(T_{jj})}_{1\leq j\leq m}} \) are independent with \( {T_{jj}^2\sim\chi^2_{n-j+1}} \) for any \( {j=1,\ldots,m} \). We have

\[ \log\frac{\det\widetilde\Sigma_n}{\det\Sigma}-m\log(n-1) =\sum_{j=1}^m\log(T_{jj}^2). \]

Writing \( {T_{jj}^2=\sum_{k=1}^{n-j+1}Z_{jk}^2} \) where \( {{(Z_{jk})}_{j\geq1,k\geq1}} \) are iid \( {\mathcal{N}(0,1)} \), we have \( {\mathbb{E}(Z_{jk}^2)=1} \) and \( {\mathrm{Var}(Z_{jk}^2)=3-1^2=2} \), and thus, for any fixed \( {j} \), by the Central Limit Theorem,

\[ \sqrt{\frac{n-j+1}{2}}\left(\frac{T_{jj}^2}{n-j+1}-1\right) \underset{n\rightarrow\infty}{\overset{d}{\longrightarrow}} \mathcal{N}(0,1). \]

It follows by the so-called delta-method that

\[ \sqrt{\frac{n-j+1}{2}} (\log(T_{jj}^2)-\log(n-j+1)) \underset{n\rightarrow\infty}{\overset{d}{\longrightarrow}} \mathcal{N}(0,1). \]

It follows by summing \( {j} \) over \( {1,\ldots,m} \) and using independence that

\[ \sqrt{\frac{n-j+1}{2}} \sum_{j=1}^m(\log(T_{jj}^2)-\log(n-j+1)) \underset{n\rightarrow\infty}{\overset{d}{\longrightarrow}} \mathcal{N}(0,m). \]

This gives immediately and finally the following remarkable fact:

\[ \sqrt{\frac{n-1}{2m}} \log\frac{\det\widetilde{\Sigma}_n}{\det\Sigma} \underset{n\rightarrow\infty}{\overset{d}{\longrightarrow}} \mathcal{N}(0,1). \]

Silverstein decomposition of Wishart matrices.

Let \( {X_1,\ldots,X_n} \) be iid random vectors of \( {\mathbb{R}^m} \) following the Gaussian law \( {\mathcal{N}(0,I_n)} \), with \( {n\geq m\geq 1} \). Let \( {\mathbb{X}} \) be the \( {m\times n} \) random matrix with columns \( {X_1,\ldots,X_n} \), then

\[ S_n=\mathbb{X}\mathbb{X}^\top\sim\mathcal{W}_m(I_m,n). \]

The square root of the eigenvalues of \( {S} \) are the singular values of \( {\mathbb{X}} \). Following Silverstein, apparently independely of Bartlett, using for instance the Gram-Schmidt algorithm, it can be shown that one can construct a couple of random orthogonal matrices \( {O_m} \) (\( {m\times m} \)) and \( {O_n} \) (\( {n\times n} \)) such that

\[ B=O_m\mathbb{X}O_n^\top \]

is a random \( {m\times n} \) lower triangular bidiagonal rectangular matrix with independent entries such that \( {B_{j,j}^2\sim\chi^2_{n-j+1}} \) for any \( {j=1,\ldots,m} \) and \( {B_{j,j-1}^2\sim\chi^2_{m-j+1}} \) for any \( {j=2,\ldots,m-1} \). In other words

\[ B= \begin{pmatrix} B_{1,1} & 0 & & 0 & 0 & \cdots & 0\\ B_{2,1} & \ddots & 0 & & 0 & \cdots & 0\\ & \ddots & \ddots & 0 & 0 & \cdots & 0\\ 0 & & B_{m,m-1} & B_{m,m} & 0 & \cdots & 0 \end{pmatrix} \quad\text{with}\quad B_{jk} \sim \left\{ \begin{array}{cl} \chi_{n-j+1} & \text{if }j=k,\\ \chi_{m-j+1} & \text{if }j=k+1. \end{array} \right. \]

It follows that \( {\mathbb{X}} \) and \( {B} \) have the same singular values. Following Silverstein, this allows to show, when \( {m,n\rightarrow\infty} \) with \( {\lim_{m,n\rightarrow\infty}m/n=\rho\in(0,1)} \), using the Gershgorin theorem, that almost surely, the smallest singular value of \( {\frac{1}{\sqrt{n}}\mathbb{X}} \) tends to the left edge \( {1-\sqrt{\rho}} \) of the Marchenko-Pastur distribution.

Trotter decomposition of matrices from the GUE.

Let \( {\mathcal{H}_m} \) be the set of \( {m\times m} \) Hermitian matrices, equipped with the Lebesgue measure. Let \( {G} \) be the ``Gaussian'' random variable with density proportional to

\[ H\in\mathcal{H}_m\mapsto e^{-\frac{1}{2}\mathrm{Tr}(H^2)}. \]

We say that the random matrix \( {G} \) belongs to the Gaussian Unitary Ensemble. Since

\[ \mathrm{Tr}(H^2)=\sum_{j=1}^mH_{jj}^2 +2\sum_{1\leq j<k\leq m}(\Re H_{jk})^2 +2\sum_{1\leq j<k\leq m}(\Im H_{jk})^2 \]

it follows that the random variables

\[ \{G_{jj}:1\leq j\leq m;\sqrt{2}\Re G_{jk}:1\leq j<k\leq m;\sqrt{2}\Im G_{jk}:1\leq j<k\leq m\} \]

are iid with law \( {\mathcal{N}(0,1)} \). Following Trotter, apparently independently of Bartlett and Silverstein, using for instance a product of reflections, it can be shown that one can construct a random unitary matrix \( {U} \) (\( {m\times m} \)) such that

\[ T=UGU^* \]

is a random \( {m\times m} \) tridiagonal symmetrix matrix with \( {{(T_{j,k})}_{1\leq j\leq k\leq m}} \) independent, such that \( {\sqrt{2}T_{j,j}\sim\mathcal{N}(0,2)} \) for any \( {j=1,\ldots,m} \) and \( {2T_{j,j-1}^2\sim\chi_{2(m-j+1)}^2} \) for any \( {j=2,\ldots,m} \). In other words

\[ T= \begin{pmatrix} T_{1,1} & T_{2,1} & & 0\\ T_{2,1} & \ddots & \ddots &\\ & \ddots & \ddots & T_{m-1,m}\\ 0 & & T_{m-1,m} & T_{m,m} \end{pmatrix} \quad\text{with}\quad \sqrt{2}T_{jk} \sim \left\{ \begin{array}{cl} \mathcal{N}(0,2) & \text{if }j=k,\\ \chi_{2(m-j+1)} & \text{if }j=k+1. \end{array} \right. \]

The random matrices \( {G} \) and \( {T} \) have the same spectrum. Following Trotter, as \( {m\rightarrow\infty} \), the tridiagonal matrix \( {\frac{1}{\sqrt{m}}T} \) matches a Toeplitz matrix studied by Kac, Murdock, and Szegő in the 1950s in (KMS), and this leads to an alternative proof of the Wigner theorem for the GUE.

Note that the algorithm used by Silverstein to construct the orthogonal matrices and the algorithm used by Trotter to construct the unitary matrix are some versions of the famous and tremendously useful QR algorithm, Arnoldi algorithm, or Lanczos algorithm in matrix numerical analysis.

Dumitriu-Edelman decomposition of matrices from Beta Ensembles.

Inspired by the previous work of Trotter, Dumitriu and Edelman have shown that for any real \( {\beta>0} \), if \( {T} \) is the random \( {m\times m} \) tridiagonal symmetrix matrix with \( {{(T_{j,k})}_{1\leq j\leq k\leq m}} \) independent, such that \( {\sqrt{2}T_{j,j}\sim\mathcal{N}(0,2)} \) for any \( {j=1,\ldots,m} \) and \( {2T_{j,j-1}^2\sim\chi_{\beta(m-j+1)}^2} \) for any \( {j=2,\ldots,m} \). In other words

\[ T= \begin{pmatrix} T_{1,1} & T_{2,1} & & 0\\ T_{2,1} & \ddots & \ddots &\\ & \ddots & \ddots & T_{m-1,m}\\ 0 & & T_{m-1,m} & T_{m,m} \end{pmatrix} \quad\text{with}\quad \sqrt{2}T_{jk} \sim \left\{ \begin{array}{cl} \mathcal{N}(0,2) & \text{if }j=k,\\ \chi_{\beta(m-j+1)} & \text{if }j=k+1. \end{array} \right. \]

then the eigenvalues of \( {T} \) are distributed according to the Coulomb gas on \( {\mathbb{R}} \) with density proportional to

\[ (\lambda_1,\ldots,\lambda_n) \mapsto \exp\left(-\frac{1}{2}\sum_{i=1}^m\lambda_i^2\right) \prod_{j<k}\left|\lambda_j-\lambda_k\right|^\beta \]

This gas is referred to as the \( {\beta} \)-Hermite Ensemble. For \( {\beta=2} \) we get the GUE.

Inspired by the previous work of Silverstein, Dumitriu and Edelman have also shown that for any reals \( {a>0} \) and \( {\beta>0} \), if \( {B} \) is the random \( {m\times m} \) bidiagonal lower triangular matrix with \( {{(B_{j,k})}_{1\leq j,k\leq m}} \) independent, such that \( {T_{j,j}^2\sim\chi_{2a-\beta(j-1)}} \) for any \( {j=1,\ldots,m} \) and \( {T_{j,j-1}^2\sim\chi_{\beta(m-j+1)}^2} \) for any \( {j=2,\ldots,m} \). In other words

\[ B= \begin{pmatrix} B_{1,1} & 0 & & 0\\ B_{2,1} & \ddots & \ddots &\\ & \ddots & \ddots & 0\\ 0 & & B_{m-1,m} & B_{m,m} \end{pmatrix} \quad\text{with}\quad B_{jk} \sim \left\{ \begin{array}{cl} \chi_{2a-\beta(j-1)} & \text{if }j=k,\\ \chi_{\beta(m-j+1)} & \text{if }j=k+1. \end{array} \right. \]

then the singular values of \( {B} \) are distributed according to the Coulomb gas on \( {\mathbb{R}_+} \) with density proportional to

\[ (\lambda_1,\ldots,\lambda_n) \mapsto \prod_{i=1}^m\lambda_i^{a-(1+\frac{\beta}{2}(m-1))} \exp\left(-\frac{1}{2}\sum_{i=1}^m\lambda_i\right) \prod_{j<k}\left|\lambda_j-\lambda_k\right|^\beta \]

This gas is referred to as the \( {\beta} \)-Laguerre Ensemble. For \( {\beta=2} \) we get the density of the eigenvalues of random matrices following the Wishart distribution.

Krishnapur and Virág decomposition of Ginibre Ensemble.

Inspired by the approach of Dumitriu and Edelman, Khrishnapur and Virág have shows that for any \( {\beta\in\{1,2\}} \), if \( {T} \) is the random \( {m\times m} \) with \( {{(T_{j,k})}_{1\leq j,k\leq m}} \) independent, such that \( {T_{j,k}=0} \) for any \( {1\leq j,k\leq m} \) with \( {k>j+1} \) (it is a so-called Hessenberg matrix), and \( {\beta T_{j,j+1}^2\sim\chi_{\beta j}} \) for any \( {j=1,\ldots,m-1} \), and \( {T_{j,k}\sim\mathcal{N}} \) for any \( {1\leq j\leq k\leq m} \), where \( {\mathcal{N}=\mathbb{N}(0,1)} \) on \( {\mathbb{R}} \) if \( {\beta=1} \) while \( {\mathcal{N}=\mathcal{N}(0,\frac{1}{2}I_2)} \) on \( {\mathbb{C}} \) if \( {\beta=2} \). In other words

\[ T= \begin{pmatrix} T_{1,1} & T_{1,2} & & 0\\ T_{2,1} & \ddots & \ddots &\\ \vdots & \ddots & \ddots & T_{m-1,m}\\ T_{1,m} & \cdots & T_{m-1,m} & T_{m,m} \end{pmatrix} \quad\text{with}\quad T_{jk} \sim \left\{ \begin{array}{cl} \beta^{-1/2}\chi_{\beta j} & \text{if }k=j+1,\\ \mathcal{N} & \text{if }j\leq k. \end{array} \right. \]

then the eigenvalues of \( {B} \) are distributed as the eigenvalues of a random \( {m\times m} \) matrix with iid entries of law \( {\mathcal{N}} \) (the so-called real or complex Ginibre ensemble). When \( {\beta=2} \) these eigenvalues are distributed according to the Coulomb gas on \( {\mathbb{C}} \) with density proportional to

\[ (\lambda_1,\ldots,\lambda_n) \mapsto \exp\left(-\frac{1}{2}\sum_{i=1}^m|\lambda_i|^2\right) \prod_{j<k}\left|\lambda_j-\lambda_k\right|^2, \]

which is the analogue on \( {\mathbb{C}} \) of the formula for the GUE already mentioned above.

Thanks. This post benefited in part from unexpected discussions with Bernard Bercu and Laëtitia Comminges.

Further reading.

Bartlett, M.S. On the theory of statistical regression, Proceedings of the Royal Society of Edinburgh vol. 53 (1933) pp. 260-283. zbMATH [The year 1933 was not yet scanned by the publisher, so I asked the library: PDF!]
Anderson, T. W. An Introduction to Multivariate Statistical Analysis, Wiley (1958) [notably pages 256-257]; MR1990662
Deheuvels, P. Modèle linéaire et analyse de la variance Notes de cours de l'UPMC (2012) [notably Section 2.1.6];
Muirhead, R. J. Aspects of Multivariate Statistical Theory Wiley (1982) [notably Theorem 4.2.14 page 99]; MR652932
Silverstein, J. W. The smallest eigenvalue of a large dimensional Wishart matrix Annals of Probab. 13:1364-1368 (1985). MR806232
Trotter, H. F. Eigenvalue Distributions of Large Her mitian Matrices; Wigner's semi- circle law and a theorem of Kac, Murdock, and Szegő, Advances in Mathematics 54:67-82 (1984) MR761763
Dumitriu I. and Edelman, A. Matrix Models for Beta Ensembles, J. Math. Phys. 43 (2002) no. 11 pp. 5830--5847 arXiv:math-ph/0206043 MR1936554 [note that they cite the previous works of Trotter and of Silverstein]
Rouault, A. Asymptotic behavior of random determinants in the Laguerre, Gram and Jacobi ensembles Latin American Journal of Probability and Mathematical Statistics (ALEA) 3 (2007) 181-230 arXiv:0607767 MR2365642
Krishnapur, M. and Virág, B. The Ginibre ensemble and Gaussian analytic functions, International Mathematics Research Notices 2014.6 (2014): 1441-1464 arXiv:1112.2457 MR3180597
Forrester, P. J. Log-gases and random matrices London Mathematical Society Monographs Series 34 Princeton University Press (2010) xiv+791 pp. MR2641363

Some other posts: