Press "Enter" to skip to content

Category: Uncategorized

Schur complement and geometry of positive definite matrices

This post concerns the geometry of the convex cone of positive definite symmetric matrices. Let us denote by \( {\mathcal{S}_n} \) the set of \( {n\times n} \) real symmetric matrices, and by \( {\mathcal{S}_n^+} \) the subset of positive definite matrices. Let us fix some \( {K\in\mathcal{S}_n} \) and write

\[ K:= \begin{pmatrix} A & W \\ W^\top & S \end{pmatrix} \]

with \( {A\in\mathcal{S}_{n-m}} \) and \( {S\in\mathcal{S}_{m}} \) for some fixed \( {1\leq m<n} \). If \( {A\in\mathcal{S}_{n-m}^+} \) then the Schur complement of \( {A} \) with respect to \( {K} \) is the matrix

\[ T:=S-W^\top A^{-1}W\in\mathcal{S}_m. \]

It is well known that

\[ K\in\mathcal{S}_n^+ \quad\text{iff}\quad A\in\mathcal{S}_{n-m}^+ \quad\text{and}\quad T\in\mathcal{S}_m^+, \]

and moreover, \( {K^{-1}} \) admits the block decomposition

\[ K^{-1} = \begin{pmatrix} I_n & -A^{-1}W \\ 0 & I_m \\ \end{pmatrix} \begin{pmatrix} A^{-1} & 0 \\ 0 & T^{-1} \\ \end{pmatrix} \begin{pmatrix} I_n & 0 \\ -W^\top A^{-1} & I_m \\ \end{pmatrix}, \]

see for instance Theorem 7.7.6 page 472 in Horn and Johnson, Matrix analysis, Cambridge University Press, 1990. In terms of Gaussian random vectors, if \( {X=(Y,Z)} \) is a Gaussian random vector of \( {\mathbb{R}^n=\mathbb{R}^{n-m}\times\mathbb{R}^m} \) with \( {1\leq m<n} \), with covariance matrix \( {K} \), then

\[ Y\sim\mathcal{N}(0,A) \quad\text{and}\quad Z\sim\mathcal{N}(0,S) \quad\text{and}\quad \mathcal{L}(Z\,\vert\,Y)=\mathcal{N}(W^\top A^{-1}Y,T). \]

More generally, for a non empty \( {I\subset\{1,\ldots,n\}} \), we have, with \( {J:=I^c} \),

\[ X_I\sim\mathcal{N}(0,K_{I,I}) \quad\text{and}\quad X_{J}\sim\mathcal{N}(0,K_{J,J}) \]

and the Schur complement of \( {K_{J,J}} \) in \( {K} \) is \( {K_{I,I}-K_{I,J}(K_{J,J})^{-1}K_{J,I}} \) and

\[ \mathcal{L}(X_I\,\vert\,X_{J}) \sim\mathcal{N}(K_{I,J}(K_{J,J})^{-1}X_{J},\,K_{I,I}-K_{I,J}(K_{J,J})^{-1}K_{J,I}). \]

Moreover, from the block inverse formula we get

\[ (K^{-1})_{I,I} = (K_{I,I}-K_{I,J}(K_{J,J})^{-1}K_{J,I})^{-1}. \]

The components of \( {X_I} \) are conditionally independent given the remaining components iff \( {((K^{-1})_{I,I})^{-1}} \) is diagonal, i.e. iff \( {(K^{-1})_{I,I}} \) is diagonal. In particular, if \( {I=\{i,j\}} \) with \( {i\neq j} \) then \( {X_i} \) and \( {X_j} \) are conditionally independent given the remaining components iif \( {(K^{-1})_{i,j}=0} \).

The extremal elements of the closed convex cone \( {\overline{\mathcal{S}_n^+}} \) are the unit rank matrices \( {\{uu^\top: u\in\mathbb{R}^n\}} \). Every \( {A\in\overline{\mathcal{S}_n^+}} \) can be written as \( {A=\lambda_1 u_1u_1^\top+\cdots+\lambda_n u_nu_n^\top} \) where \( {\lambda_1,\ldots,\lambda_n} \) and \( {u_1,\ldots,u_n} \) denote the spectrum and the eigenvectors of \( {A} \) respectively. This is the Carathéodory local parametrization of the convex cone \( {\overline{\mathcal{S}_n^+}} \) with \( {n} \) Carathéodory conic rays \( {\lambda_1 u_1u_1^\top,\ldots,\lambda_r u_nu_n^\top} \).

For any \( {K\in\mathcal{S}_n} \), any \( {1\leq i\leq j\leq n} \), and any \( {x\in\mathbb{R}} \), let \( {\xi_{i,j}(K,x)} \) be the element of \( {\mathcal{S}_n} \) obtained from \( {K} \) by replacing \( {K_{i,j}} \) and \( {K_{j,i}} \) by \( {x} \). In particular,

\[ K=\xi_{i,j}(K,K_{i,j}). \]

Now for which values of \( {x} \) the matrix \( {\xi_{i,j}(K,x)} \) belongs to \( {\mathcal{S}_n^+} \)? The answers is as follows.

Theorem 1 Set \( {K\in\mathcal{S}_n^+} \), \( {n\geq2} \), \( {1\leq i\leq j\leq n} \), and \( {x\in\mathbb{R}} \).

  1. Off-diagonal. If \( {i<j} \) then

    \[ \xi_{i,j}(K,x)\in\mathcal{S}_n^+ \quad\text{iff}\quad \delta_{i,j}^-(K)<x<\delta_{i,j}^+(K) \]

    where \( {\delta_{1,2}^{\pm}(K):=\pm\sqrt{K_{1,1}K_{2,2}}} \) if \( {n=2} \), while if \( {n>2} \),

    \[ \delta_{i,j}^\pm(K):= \frac{-u^\top v\pm\left((u^\top v)^2- (A^{-1})_{i,i}(v^\top Bv-K_{j,j})\right)^{1/2}} {(A^{-1})_{i,i}}. \]

    with

    \[ A:=K_{-j,-j},\quad B:=(A^{-1})_{-i,-i},\quad u:=A_{-i,i},\quad v:=K_{-\{i,j\},j}. \]

  2. Diagonal.

    \[ \xi_{i,i}(K,x)\in\mathcal{S}_n^+ \quad\text{iff}\quad x > \delta_{i,i}(K), \]

    where

    \[ \delta_{i,i}(K):=w^\top W^{-1}w \]

    with

    \[ W:=K_{-i,-i}\quad w:=K_{-i,i}. \]

In particular, \( {\delta_{i,i}(K)<K_{i,i}} \) for any \( {1\leq i\leq n} \) and \( {\delta_{i,j}^-(K)< K_{i,j}<\delta_{i,j}^+(K)} \) for any \( {1\leq i<j\leq n} \). Moreover, \( {\xi_{i,j}(K,0)\in\mathcal{S}_n^+} \) iff \( {v^\top Bv\leq K_{j,j}} \).

Proof: We prove the first part for \( {n>2} \) and leave the remaining statements to the reader. The matrix \( {A} \) is non singular since \( {K} \) is non singular. Moreover, \( {K\in\mathcal{S}_n^+} \) implies \( {A^{-1}\in\mathcal{S}_n^+} \), and in particular \( {(A^{-1})_{i,i}>0} \). The set

\[ \{x\in\mathbb{R}:\xi_{i,j}(K,x)\in\mathcal{S}_n^+\} \]

is non empty and convex (an interval), as the intersection of an affine subspace with a convex cone, both containing \( {K=\xi_{i,j}(K,K_{i,j})} \). For any \( {x\in\mathbb{R}} \),

\[ (\xi_{i,j}(K,x))_{-j,-j}=K_{-j,-j}=A. \]

By considering the Schur complement, the matrix \( {\xi_{i,j}(K,x)} \) belongs to \( {\mathcal{S}_n^+} \) iff

\[ \sum_{1\leq k,k’\leq n} z_k (A^{-1})_{k,k’} z_{k’} <K_{j,j}, \]

where \( {z:=(\xi_{i,j}(K,x))_{-j,j}} \). By rewriting \( {z} \) in terms of \( {K} \), we get the equation

\[ (A^{-1})_{i,i}\,x^2+(2u^\top v)\,x+v^\top Bv-K_{j,j}<0. \]

Recall that \( {(A^{-1})_{i,i}>0} \). Next, since \( {K=\xi_{i,j}(K,K_{i,j})} \) belongs to \( {\mathcal{S}_n^+} \), the equation admits \( {x=K_{i,j}} \) as a solution, and thus the discriminant

\[ (u^\top v)^2- (A^{-1})_{i,i}(v^\top Bv-K_{j,j}) \]

is positive. This leads to the non empty interval \( {(\delta_{i,j}^-(K),\delta_{i,j}^+(K))} \). ☐

Let \( {K\in\mathcal{S}_n^+} \) with \( {n>2} \), and consider the notations of theorem 1. For any \( {1\leq i<j\leq n} \), we define the matrices \( {\xi_{i,j}^-(K)} \) and \( {\xi_{i,j}^+(K)} \) by

\[ \xi_{i,j}^\pm(K):=\xi_{i,j}(K,\delta_{i,j}^\pm(K)). \]

Similarly, for any \( {1\leq i\leq n} \), we define

\[ \xi_{i,i}(K):=\xi_{i,i}(K,\delta_{i,i}(K)). \]

The elements of

\[ \mathcal{E}(K):= \{\xi_{i,j}^\pm(K);1\leq i< j\leq n\} \cup\{\xi_{i,i}(K);1\leq i\leq n\} \]

are not of full rank and belong to the boundary of \( {\mathcal{S}_n^+} \). If \( {E_1,\ldots,E_r\in\mathcal{E}(K)} \), and \( {\lambda_1>0,\ldots,\lambda_r>0} \), then \( {\lambda_1 E_1+\cdots+\lambda_r E_r\in\mathcal{S}_n^+} \). These sums construct from \( {K} \) an open convex cone, included in \( {\mathcal{S}_n^+} \), and containing \( {K} \).

Leave a Comment
Syntax · Style · .