Press "Enter" to skip to content

Libres pensées d'un mathématicien ordinaire Posts

Schur complement and geometry of positive definite matrices

This post concerns the geometry of the convex cone of positive definite symmetric matrices. Let us denote by \( {\mathcal{S}_n} \) the set of \( {n\times n} \) real symmetric matrices, and by \( {\mathcal{S}_n^+} \) the subset of positive definite matrices. Let us fix some \( {K\in\mathcal{S}_n} \) and write

\[ K:= \begin{pmatrix} A & W \\ W^\top & S \end{pmatrix} \]

with \( {A\in\mathcal{S}_{n-m}} \) and \( {S\in\mathcal{S}_{m}} \) for some fixed \( {1\leq m<n} \). If \( {A\in\mathcal{S}_{n-m}^+} \) then the Schur complement of \( {A} \) with respect to \( {K} \) is the matrix

\[ T:=S-W^\top A^{-1}W\in\mathcal{S}_m. \]

It is well known that

\[ K\in\mathcal{S}_n^+ \quad\text{iff}\quad A\in\mathcal{S}_{n-m}^+ \quad\text{and}\quad T\in\mathcal{S}_m^+, \]

and moreover, \( {K^{-1}} \) admits the block decomposition

\[ K^{-1} = \begin{pmatrix} I_n & -A^{-1}W \\ 0 & I_m \\ \end{pmatrix} \begin{pmatrix} A^{-1} & 0 \\ 0 & T^{-1} \\ \end{pmatrix} \begin{pmatrix} I_n & 0 \\ -W^\top A^{-1} & I_m \\ \end{pmatrix}, \]

see for instance Theorem 7.7.6 page 472 in Horn and Johnson, Matrix analysis, Cambridge University Press, 1990. In terms of Gaussian random vectors, if \( {X=(Y,Z)} \) is a Gaussian random vector of \( {\mathbb{R}^n=\mathbb{R}^{n-m}\times\mathbb{R}^m} \) with \( {1\leq m<n} \), with covariance matrix \( {K} \), then

\[ Y\sim\mathcal{N}(0,A) \quad\text{and}\quad Z\sim\mathcal{N}(0,S) \quad\text{and}\quad \mathcal{L}(Z\,\vert\,Y)=\mathcal{N}(W^\top A^{-1}Y,T). \]

More generally, for a non empty \( {I\subset\{1,\ldots,n\}} \), we have, with \( {J:=I^c} \),

\[ X_I\sim\mathcal{N}(0,K_{I,I}) \quad\text{and}\quad X_{J}\sim\mathcal{N}(0,K_{J,J}) \]

and the Schur complement of \( {K_{J,J}} \) in \( {K} \) is \( {K_{I,I}-K_{I,J}(K_{J,J})^{-1}K_{J,I}} \) and

\[ \mathcal{L}(X_I\,\vert\,X_{J}) \sim\mathcal{N}(K_{I,J}(K_{J,J})^{-1}X_{J},\,K_{I,I}-K_{I,J}(K_{J,J})^{-1}K_{J,I}). \]

Moreover, from the block inverse formula we get

\[ (K^{-1})_{I,I} = (K_{I,I}-K_{I,J}(K_{J,J})^{-1}K_{J,I})^{-1}. \]

The components of \( {X_I} \) are conditionally independent given the remaining components iff \( {((K^{-1})_{I,I})^{-1}} \) is diagonal, i.e. iff \( {(K^{-1})_{I,I}} \) is diagonal. In particular, if \( {I=\{i,j\}} \) with \( {i\neq j} \) then \( {X_i} \) and \( {X_j} \) are conditionally independent given the remaining components iif \( {(K^{-1})_{i,j}=0} \).

The extremal elements of the closed convex cone \( {\overline{\mathcal{S}_n^+}} \) are the unit rank matrices \( {\{uu^\top: u\in\mathbb{R}^n\}} \). Every \( {A\in\overline{\mathcal{S}_n^+}} \) can be written as \( {A=\lambda_1 u_1u_1^\top+\cdots+\lambda_n u_nu_n^\top} \) where \( {\lambda_1,\ldots,\lambda_n} \) and \( {u_1,\ldots,u_n} \) denote the spectrum and the eigenvectors of \( {A} \) respectively. This is the Carathéodory local parametrization of the convex cone \( {\overline{\mathcal{S}_n^+}} \) with \( {n} \) Carathéodory conic rays \( {\lambda_1 u_1u_1^\top,\ldots,\lambda_r u_nu_n^\top} \).

For any \( {K\in\mathcal{S}_n} \), any \( {1\leq i\leq j\leq n} \), and any \( {x\in\mathbb{R}} \), let \( {\xi_{i,j}(K,x)} \) be the element of \( {\mathcal{S}_n} \) obtained from \( {K} \) by replacing \( {K_{i,j}} \) and \( {K_{j,i}} \) by \( {x} \). In particular,

\[ K=\xi_{i,j}(K,K_{i,j}). \]

Now for which values of \( {x} \) the matrix \( {\xi_{i,j}(K,x)} \) belongs to \( {\mathcal{S}_n^+} \)? The answers is as follows.

Theorem 1 Set \( {K\in\mathcal{S}_n^+} \), \( {n\geq2} \), \( {1\leq i\leq j\leq n} \), and \( {x\in\mathbb{R}} \).

  1. Off-diagonal. If \( {i<j} \) then

    \[ \xi_{i,j}(K,x)\in\mathcal{S}_n^+ \quad\text{iff}\quad \delta_{i,j}^-(K)<x<\delta_{i,j}^+(K) \]

    where \( {\delta_{1,2}^{\pm}(K):=\pm\sqrt{K_{1,1}K_{2,2}}} \) if \( {n=2} \), while if \( {n>2} \),

    \[ \delta_{i,j}^\pm(K):= \frac{-u^\top v\pm\left((u^\top v)^2- (A^{-1})_{i,i}(v^\top Bv-K_{j,j})\right)^{1/2}} {(A^{-1})_{i,i}}. \]

    with

    \[ A:=K_{-j,-j},\quad B:=(A^{-1})_{-i,-i},\quad u:=A_{-i,i},\quad v:=K_{-\{i,j\},j}. \]

  2. Diagonal.

    \[ \xi_{i,i}(K,x)\in\mathcal{S}_n^+ \quad\text{iff}\quad x > \delta_{i,i}(K), \]

    where

    \[ \delta_{i,i}(K):=w^\top W^{-1}w \]

    with

    \[ W:=K_{-i,-i}\quad w:=K_{-i,i}. \]

In particular, \( {\delta_{i,i}(K)<K_{i,i}} \) for any \( {1\leq i\leq n} \) and \( {\delta_{i,j}^-(K)< K_{i,j}<\delta_{i,j}^+(K)} \) for any \( {1\leq i<j\leq n} \). Moreover, \( {\xi_{i,j}(K,0)\in\mathcal{S}_n^+} \) iff \( {v^\top Bv\leq K_{j,j}} \).

Proof: We prove the first part for \( {n>2} \) and leave the remaining statements to the reader. The matrix \( {A} \) is non singular since \( {K} \) is non singular. Moreover, \( {K\in\mathcal{S}_n^+} \) implies \( {A^{-1}\in\mathcal{S}_n^+} \), and in particular \( {(A^{-1})_{i,i}>0} \). The set

\[ \{x\in\mathbb{R}:\xi_{i,j}(K,x)\in\mathcal{S}_n^+\} \]

is non empty and convex (an interval), as the intersection of an affine subspace with a convex cone, both containing \( {K=\xi_{i,j}(K,K_{i,j})} \). For any \( {x\in\mathbb{R}} \),

\[ (\xi_{i,j}(K,x))_{-j,-j}=K_{-j,-j}=A. \]

By considering the Schur complement, the matrix \( {\xi_{i,j}(K,x)} \) belongs to \( {\mathcal{S}_n^+} \) iff

\[ \sum_{1\leq k,k’\leq n} z_k (A^{-1})_{k,k’} z_{k’} <K_{j,j}, \]

where \( {z:=(\xi_{i,j}(K,x))_{-j,j}} \). By rewriting \( {z} \) in terms of \( {K} \), we get the equation

\[ (A^{-1})_{i,i}\,x^2+(2u^\top v)\,x+v^\top Bv-K_{j,j}<0. \]

Recall that \( {(A^{-1})_{i,i}>0} \). Next, since \( {K=\xi_{i,j}(K,K_{i,j})} \) belongs to \( {\mathcal{S}_n^+} \), the equation admits \( {x=K_{i,j}} \) as a solution, and thus the discriminant

\[ (u^\top v)^2- (A^{-1})_{i,i}(v^\top Bv-K_{j,j}) \]

is positive. This leads to the non empty interval \( {(\delta_{i,j}^-(K),\delta_{i,j}^+(K))} \). ☐

Let \( {K\in\mathcal{S}_n^+} \) with \( {n>2} \), and consider the notations of theorem 1. For any \( {1\leq i<j\leq n} \), we define the matrices \( {\xi_{i,j}^-(K)} \) and \( {\xi_{i,j}^+(K)} \) by

\[ \xi_{i,j}^\pm(K):=\xi_{i,j}(K,\delta_{i,j}^\pm(K)). \]

Similarly, for any \( {1\leq i\leq n} \), we define

\[ \xi_{i,i}(K):=\xi_{i,i}(K,\delta_{i,i}(K)). \]

The elements of

\[ \mathcal{E}(K):= \{\xi_{i,j}^\pm(K);1\leq i< j\leq n\} \cup\{\xi_{i,i}(K);1\leq i\leq n\} \]

are not of full rank and belong to the boundary of \( {\mathcal{S}_n^+} \). If \( {E_1,\ldots,E_r\in\mathcal{E}(K)} \), and \( {\lambda_1>0,\ldots,\lambda_r>0} \), then \( {\lambda_1 E_1+\cdots+\lambda_r E_r\in\mathcal{S}_n^+} \). These sums construct from \( {K} \) an open convex cone, included in \( {\mathcal{S}_n^+} \), and containing \( {K} \).

Leave a Comment
Syntax · Style · .