Three convex and compact sets of matrices – Libres pensées d'un mathématicien ordinaire

This post is devoted to few convex and compact sets of matrices that I like.

The set \( {\mathcal{C}_n} \) of correlation matrices. A real \( {n\times n} \) matrix \( {C} \) is a correlation matrix when \( {C} \) is symmetric, semidefinite positive, with unit diagonal. This means that

\[ C_{ii}=1, \quad C_{ji}=C_{ji},\quad \left<x,Cx\right>\geq0 \]

for every \( {1\leq i,j\leq n} \) and for every \( {x\in\mathbb{R}^n} \). In particular, we have \( {C_{ij}=C_{ji}\in[-1,1]} \) for every \( {1\leq i,j\leq n} \). The set \( {\mathcal{C}_n} \) of all correlation matrices is convex and compact. It contains the identity matrix \( {I_n} \) but is not stable by the matrix product. However, it is stable by the Schur-Hadamard entrywise product! It is the intersection of the cone of symmetric positive semidefinite matrices \( {\mathcal{S}_n^+} \) with the affine subspace of equations \( {S_{11}=\cdots=S_{nn}=1} \). A closer look at \( {\mathcal{C}_3} \) gives, by simply computing the principal minors,

\[ \begin{pmatrix} 1 & a & c \\ a & 1 & b \\ c & b & 1 \end{pmatrix} \in\mathcal{C}_3 \quad\text{iff}\quad a,b,c\in[-1,1] \quad\text{and}\quad a^2+b^2+c^2\leq 1+2abc. \]

This shows the ellipsoidal geometry of \( {\mathcal{C}_n} \), coming from the spectral constraint, and related to the Schur complement. The set \( {\mathcal{C}_n} \) is not a polytope of \( {\mathbb{R}^{n\times n}} \) (i.e. is not the intersection of a finite collection of half spaces). Some authors use the name elliptope, an example of spectrahedron (set of matrices defined by linear matrix inequalities). The extremal points of \( {\mathcal{C}_n} \) were studied by Ycart while the facial structure was explored by Laurent and Poljak. A useful parameterization of \( {\mathcal{C}_n} \) using covariance graphs was developed by Kurowicka and Cooke.

The uniform probability measure on \( {\mathcal{C}_n} \) is the normalized trace of the Lebesgue measure. Its simulation was studied by many people, and one can take a look at the works of Joe and of Lewandowski, Kurowicka, and Joe, where the volume of \( {\mathcal{C}_n} \) is also computed in \( {\mathbb{R}^{n(n-1)/2}} \):

\[ \mathrm{Volume}(\mathcal{C}_n) = \pi^{(n^2-1)/4}\frac{\prod_{k=1}^{(n-1)/2}\Gamma(2k)}{2^{(n-1)^2/4}\Gamma^{n-1}((n+1)/2)} \]

if \( {n} \) is odd, while if \( {n} \) is even:

\[ \mathrm{Volume}(\mathcal{C}_n) = \pi^{n(n-2)/4}\frac{2^{(3n^2-4n)/4}\Gamma^n(n/2)\prod_{k=1}^{(n-2)/2}\Gamma(2k)}{\Gamma^{n-1}(n)}. \]

If \( {S} \) is a symmetric \( {n\times n} \) positive semidefinite matrix then the matrix \( {C} \) defined by

\[ C_{ij}=\frac{S_{ij}}{\sqrt{S_{ii}S_{jj}}} \]

for every \( {1\leq i,j\leq n} \) belongs to \( {\mathcal{C}_n} \). If \( {C} \) is a random element of \( {\mathcal{C}_n} \) following the uniform law, then it is known that the off-diagonal entries of \( {C} \), which are correlated, follow the same law, namely the Beta law of parameter \( {(n/2,n/2)} \) on \( {[-1,1]} \) (the density of the law of \( {C} \) is also computable using a covariance graph representation, leading to the volumetric formulas above). This allows to show that if \( {S} \) is the empirical covariance matrix of a standard Gaussian sample in \( {\mathbb{R}^n} \) then the random correlation matrix constructed from \( {S} \) (which is the empirical correlation matrix) does not follow the uniform law on \( {\mathcal{C}_n} \). One may ask whether or not the asymptotic spectral properties of a uniform random element of \( {\mathcal{C}_n} \), as \( {n\rightarrow\infty} \), are identical to the ones of Gaussian empirical covariance matrices (Marchenko-Pastur quarter circular law). I have not tested this numerically by simulation.

The set \( {\mathcal{M}_n} \) of stochastic matrices. A real \( {n\times n} \) matrix \( {M} \) is a stochastic matrix (we say also a Markov matrix) when all its entries are non-negative and all rows sum up to one. This means that \( {M_{ij}\geq0} \) and \( {M_{i1}+\cdots+M_{in}=1} \) for every \( {1\leq i,j\leq n} \). In particular, \( {M_{ij}\in[0,1]} \) for all \( {1\leq i,j\leq n} \), and every row belongs to the non-negative portion of the \( {\ell^1_n} \) unit sphere. The name Markov comes from the fact that such matrices are exactly the transition matrices of finite Markov chains on a state space of cardinal \( {n} \). The set \( {\mathcal{M}_n} \) of all \( {n\times n} \) Markov matrices is convex and compact. It is a polytope of \( {\mathbb{R}^{n\times n}} \) (i.e. intersection of half spaces) with \( {n(n-1)} \) degrees of freedom. The set \( {\mathcal{M}_n} \) contains the identity matrix \( {I_n} \) and is stable by matrix product (it is thus a semigroup for this product).

If \( {E} \) is a \( {n\times n} \) matrix with non-negative entries then the \( {n\times n} \) matrix \( {M} \), defined by

\[ M_{ij}=\frac{E_{ij}}{E_{i1}+\cdots+E_{in}} \]

for every \( {1\leq i,j\leq n} \), belongs to \( {\mathcal{M}_n} \). If the entries of \( {E} \) are i.i.d. exponential random variables then the Markov matrix \( {M} \) constructed from \( {E} \) follows the uniform law on \( {\mathcal{M}_n} \) which is the normalized trace of the Lebesgue measure. This kind of random matrices is called the Dirichlet Markov Ensemble since the rows of such random matrices are i.i.d. and follow the Dirichlet distribution with parameter \( {(1,\ldots,1)} \). Note that the entries of this matrix are identically distributed and follow a Beta law, and are correlated on each row and independent across rows. The law of large numbers gives \( {E_{i1}+\cdots+E_{in}=n(1+o(1))} \) and this suggests that the asymptotic spectral properties of such a random \( {M} \) as \( {n\rightarrow\infty} \) are identical to the ones of \( {E} \) (Marchenko-Pastur quarter circular law and Girko circular law), see cb-pc-dc.

Stochastic group. Let us consider the set of all \( {n\times n} \) real matrices such that \( {M} \) is invertible and \( {M_{i1}+\cdots+M_{in}=1} \) for all \( {1\leq i\leq n} \) (note that the entries of \( {M} \) are allowed to take negative values). This set is actually a group for the matrix product, sometimes called the stochastic group, and we refer to Poole for more details. This set is neither convex nor compact and I ignore what are its asymptotic spectral properties if equipped with a probability measure (notion of uniform law via constrained maximum entropy?).

The set \( {\mathcal{B}_n} \) of doubly stochastic matrices. A real \( {n\times n} \) matrix \( {B} \) is doubly stochastic (we say also bistochastic) if \( {B} \) and \( {B^\top} \) are both stochastic matrices. This means that \( {B_{i1}+\cdots+B_{in}=B_{i1}+\cdots+B_{in}=1} \) for any \( {1\leq i\leq n} \). In particular, \( {B_{ij}\in[0,1]} \) for every \( {1\leq i,j\leq n} \). The set \( {\mathcal{B}_n} \) is convex and compact, and forms a polytope of \( {\mathbb{R}^{n\times n}} \) with \( {(n-1)^2} \) degrees of freedom. It was shown by Canfield and McKay that in \( {\mathbb{R}^{(n-1)\times(n-1)}} \),

\[ \mathrm{Volume}(\mathcal{B}_n) =\frac{\exp\left(\frac{1}{3}+n^2+o(1)\right)}{(2\pi)^{n-1/2}n^{(n-1)^2}n^{n-1}}. \]

The Birkhoff polytope was remarkably studied by Barvinok, and a discrete version (contingency tables) by Barvinok and Hartigan.

The set \( {\mathcal{B}_n} \) contains the identity matrix and is stable by the matrix product (semigroup!). The matrix with all entries equal to \( {1/n} \) belongs to \( {\mathcal{B}_n} \) and is called sometimes the Wedderburn matrix. The van der Waerden conjecture, solved by Falikman, states that the Wedderburn matrix achieves the minimum of the permanent on \( {\mathcal{B}_n} \). The set \( {\mathcal{B}_n} \) is referred to as the Birkhoff polytope (that is why we use the notation \( {\mathcal{B}_n} \)). A celebrated theorem of Birkhoff and von Neumann states that \( {\mathcal{B}_n} \) is the convex envelope of permutation matrices. It can be shown, using the Minkowski-Carathéodory theorem, that each element of \( {\mathcal{B}_n} \) is the convex combination of at most \( {(n-1)^2+1} \) permutation matrices (which depend on the particular element). The set \( {\mathcal{B}_n} \) is also known as the assignment polytope or as the transportation polytope, since each element corresponds to a transportation map between \( {n} \) unit sources and \( {n} \) unit targets. One should not confuse \( {\mathcal{B}_n} \) with the permutahedron. The set \( {\mathcal{B}_n} \) plays a role in diverse areas and can be used for instance for the proof of the Hoffman-Wielandt inequality in matrix analysis.

The uniform law \( {\mathcal{L}_n} \) on \( {\mathcal{B}_n} \) is the normalized trace of the Lebesgue measure. The law \( {\mathcal{L}_n} \) has identical marginal laws, and is invariant by permutation of rows and of columns but is not exchangeable (i.e. invariant by any permutation of the entries). Let us consider the random walk on \( {\mathcal{B}_n} \) obtained at each step by selecting randomly two rows and two columns and transferring a random amount of mass between the corresponding four entries. This is a Markov chain which admits \( {\mathcal{L}_n} \) as an invariant law. Its speed of convergence was studied by Diaconis. More recently, Chatterjee, Diaconis, and Sly have shown that many asymptotic properties of random matrices drawn from \( {\mathcal{L}_n} \) are identical to the ones of random matrices with i.i.d. exponential entries. In particular, the entries are close to the exponential distribution of mean \( {1/n} \) and the Marchenko-Pastur quarter circular law holds true. It is conjectured that the Girko circular law is also true (we ignore how to adapt the Tao and Vu proof).

Unistochastic matrices. If \( {U} \) is \( {n\times n} \) unitary then the matrix \( {(|U_{ij}|^2)_{1\leq i,j\leq n}} \) belongs to \( {\mathcal{B}_n} \). Such matrices are sometimes called unistochastic by some physicists. However, it turns out that many elements of \( {\mathcal{B}_n} \) are not unistochastic.

Tridiagonal doubly stochastic matrices. The subset of tridiagonal elements of \( {\mathcal{B}_n} \) corresponds to birth and death processes on a finite state space of cardinal \( {n} \) which are symmetric with respect to the uniform law. The uniform distribution on this remarkable subset was studied by Diaconis and Wood.

Permutation matrices. Recall that \( {P} \) is an \( {n\times n} \) permutation matrix when there exists a permutation \( {\sigma} \) of \( {\{1,\ldots,n\}} \) such that \( {P_{ij}=\mathbf{1}_{i=\sigma(j)}} \) for every \( {1\leq i,j\leq n} \). The set \( {\mathcal{P}_n} \) of \( {n\times n} \) permutation matrices is the set of extremal points of the convex polytope \( {\mathcal{B}_n} \). The set \( {\mathcal{P}_n} \) is a finite sub-group of \( {\mathrm{GL}_n} \) homomorphic to the permutation group \( {\Sigma_n} \). The spectrum of a permutation matrix \( {P} \) associated to \( {\sigma\in\Sigma_n} \) is actually a very nice object which is related to its cycles decomposition. Namely, if \( {\sigma} \) has exactly \( {1\leq r\leq n} \) cycles, with lengths \( {\ell_1,\ldots,\ell_r} \), then \( {P} \) has the spectrum of the bloc diagonal matrix

\[ B_1\oplus\cdots\oplus B_r \]

where \( {B_k} \) is \( {\ell_k\times\ell_k} \) of the form (the cycle is a right shift on \( {\mathbb{Z}/\ell_k\mathbb{Z}} \))

\[ \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & 0 & \cdots & 0 \end{pmatrix}. \]

The spectrum of \( {B_k} \) is given by the solutions of the equation \( {\lambda^{\ell(k)}=(-1)^{\ell(k)}} \) (roots of \( {\pm} \) unity). The spectrum of \( {P} \) is the union of the spectra of \( {B_1,\ldots,B_r} \). In particular, the spectrum of \( {P} \) is included in the unit circle \( {\{z\in\mathbb{C};|z|=1\}} \) and depends only on \( {c_1,\ldots,c_n} \) where \( {c_k} \) is the number of cycles of length \( {k} \) of \( {\sigma} \). Note that \( {c_1} \) is the number of fixed points of \( {\sigma} \), and \( {c_1+2c_2+\cdots+nc_n=n} \). Now if \( {P} \) is uniformly distributed on \( {\mathcal{P}_n} \) then \( {\sigma} \) is uniformly distributed on \( {\Sigma_n} \) and this allows to describe the spectrum of \( {P} \) via an Ewens distribution with parameter \( {\theta=1} \), which gives rise as \( {n\rightarrow\infty} \) to a Poisson-Dirichlet distribution. The spectrum of uniform permutation matrices was studied for instance by Wieand, by Hambly, Keevash, O'Connell, and Stark, and more recently by Ben Arous and Dang. We ignore how to exploit these results in order to study the spectrum of the element of \( {\mathcal{B}_n} \).

Note. I have learned the existence of the set of correlation matrices from Gérard Letac, an eclectic mind who was one of my teachers years ago.

Some other posts:

5 Comments

Florent Benaych-Georges 2011-11-11
Intéressant !
Djalil Chafaï 2012-05-08
The circular law for the Birkhoff polytope (doubly stochastic matrices) was recently proved by Hoi H. Nguyen.
Werner Hürlimann 2013-10-10
The volume of the convex set of correlation matrices has been calculated with another method in W. Hürlimann (2012), Positive semi-definite correlation matrices: recursive algorithmic generation and volume measure. Pure Mathematical Sciences 1(3), 137-149.
Corey O'Meara 2014-04-04
Related work: Corey O'Mearaa & Rajesh Pereirab, Self-dual maps and symmetric bistochastic matrices, Linear and Multilinear Algebra 61(1) 23-34 (2013).
Djalil Chafaï 2014-04-11
Here are two interesting pages on mathoverflow:
- Extreme points of transportation polytope
- Generalizations of the Birkhoff-von Neumann theorem

This site uses Akismet to reduce spam. Learn how your comment data is processed.