Month: August 2010

This post concerns the geometry of the convex cone of positive definite symmetric matrices. Let us denote by ${\mathcal{S}_n}$ the set of ${n\times n}$ real symmetric matrices, and by ${\mathcal{S}_n^+}$ the subset of positive definite matrices. Let us fix some ${K\in\mathcal{S}_n}$ and write

$K:= \begin{pmatrix} A & W \\ W^\top & S \end{pmatrix}$

with ${A\in\mathcal{S}_{n-m}}$ and ${S\in\mathcal{S}_{m}}$ for some fixed ${1\leq m<n}$. If ${A\in\mathcal{S}_{n-m}^+}$ then the Schur complement of ${A}$ with respect to ${K}$ is the matrix

$T:=S-W^\top A^{-1}W\in\mathcal{S}_m.$

It is well known that

$K\in\mathcal{S}_n^+ \quad\text{iff}\quad A\in\mathcal{S}_{n-m}^+ \quad\text{and}\quad T\in\mathcal{S}_m^+,$

and moreover, ${K^{-1}}$ admits the block decomposition

$K^{-1} = \begin{pmatrix} I_n & -A^{-1}W \\ 0 & I_m \\ \end{pmatrix} \begin{pmatrix} A^{-1} & 0 \\ 0 & T^{-1} \\ \end{pmatrix} \begin{pmatrix} I_n & 0 \\ -W^\top A^{-1} & I_m \\ \end{pmatrix},$

see for instance Theorem 7.7.6 page 472 in Horn and Johnson, Matrix analysis, Cambridge University Press, 1990. In terms of Gaussian random vectors, if ${X=(Y,Z)}$ is a Gaussian random vector of ${\mathbb{R}^n=\mathbb{R}^{n-m}\times\mathbb{R}^m}$ with ${1\leq m<n}$, with covariance matrix ${K}$, then

$Y\sim\mathcal{N}(0,A) \quad\text{and}\quad Z\sim\mathcal{N}(0,S) \quad\text{and}\quad \mathcal{L}(Z\,\vert\,Y)=\mathcal{N}(W^\top A^{-1}Y,T).$

More generally, for a non empty ${I\subset\{1,\ldots,n\}}$, we have, with ${J:=I^c}$,

$X_I\sim\mathcal{N}(0,K_{I,I}) \quad\text{and}\quad X_{J}\sim\mathcal{N}(0,K_{J,J})$

and the Schur complement of ${K_{J,J}}$ in ${K}$ is ${K_{I,I}-K_{I,J}(K_{J,J})^{-1}K_{J,I}}$ and

$\mathcal{L}(X_I\,\vert\,X_{J}) \sim\mathcal{N}(K_{I,J}(K_{J,J})^{-1}X_{J},\,K_{I,I}-K_{I,J}(K_{J,J})^{-1}K_{J,I}).$

Moreover, from the block inverse formula we get

$(K^{-1})_{I,I} = (K_{I,I}-K_{I,J}(K_{J,J})^{-1}K_{J,I})^{-1}.$

The components of ${X_I}$ are conditionally independent given the remaining components iff ${((K^{-1})_{I,I})^{-1}}$ is diagonal, i.e. iff ${(K^{-1})_{I,I}}$ is diagonal. In particular, if ${I=\{i,j\}}$ with ${i\neq j}$ then ${X_i}$ and ${X_j}$ are conditionally independent given the remaining components iif ${(K^{-1})_{i,j}=0}$.

The extremal elements of the closed convex cone ${\overline{\mathcal{S}_n^+}}$ are the unit rank matrices ${\{uu^\top: u\in\mathbb{R}^n\}}$. Every ${A\in\overline{\mathcal{S}_n^+}}$ can be written as ${A=\lambda_1 u_1u_1^\top+\cdots+\lambda_n u_nu_n^\top}$ where ${\lambda_1,\ldots,\lambda_n}$ and ${u_1,\ldots,u_n}$ denote the spectrum and the eigenvectors of ${A}$ respectively. This is the Carathéodory local parametrization of the convex cone ${\overline{\mathcal{S}_n^+}}$ with ${n}$ Carathéodory conic rays ${\lambda_1 u_1u_1^\top,\ldots,\lambda_r u_nu_n^\top}$.

For any ${K\in\mathcal{S}_n}$, any ${1\leq i\leq j\leq n}$, and any ${x\in\mathbb{R}}$, let ${\xi_{i,j}(K,x)}$ be the element of ${\mathcal{S}_n}$ obtained from ${K}$ by replacing ${K_{i,j}}$ and ${K_{j,i}}$ by ${x}$. In particular,

$K=\xi_{i,j}(K,K_{i,j}).$

Now for which values of ${x}$ the matrix ${\xi_{i,j}(K,x)}$ belongs to ${\mathcal{S}_n^+}$? The answers is as follows.

Theorem 1 Set ${K\in\mathcal{S}_n^+}$, ${n\geq2}$, ${1\leq i\leq j\leq n}$, and ${x\in\mathbb{R}}$.

1. Off-diagonal. If ${i<j}$ then

$\xi_{i,j}(K,x)\in\mathcal{S}_n^+ \quad\text{iff}\quad \delta_{i,j}^-(K)<x<\delta_{i,j}^+(K)$

where ${\delta_{1,2}^{\pm}(K):=\pm\sqrt{K_{1,1}K_{2,2}}}$ if ${n=2}$, while if ${n>2}$,

$\delta_{i,j}^\pm(K):= \frac{-u^\top v\pm\left((u^\top v)^2- (A^{-1})_{i,i}(v^\top Bv-K_{j,j})\right)^{1/2}} {(A^{-1})_{i,i}}.$

with

$A:=K_{-j,-j},\quad B:=(A^{-1})_{-i,-i},\quad u:=A_{-i,i},\quad v:=K_{-\{i,j\},j}.$

2. Diagonal.

$\xi_{i,i}(K,x)\in\mathcal{S}_n^+ \quad\text{iff}\quad x > \delta_{i,i}(K),$

where

$\delta_{i,i}(K):=w^\top W^{-1}w$

with

$W:=K_{-i,-i}\quad w:=K_{-i,i}.$

In particular, ${\delta_{i,i}(K)<K_{i,i}}$ for any ${1\leq i\leq n}$ and ${\delta_{i,j}^-(K)< K_{i,j}<\delta_{i,j}^+(K)}$ for any ${1\leq i<j\leq n}$. Moreover, ${\xi_{i,j}(K,0)\in\mathcal{S}_n^+}$ iff ${v^\top Bv\leq K_{j,j}}$.

Proof: We prove the first part for ${n>2}$ and leave the remaining statements to the reader. The matrix ${A}$ is non singular since ${K}$ is non singular. Moreover, ${K\in\mathcal{S}_n^+}$ implies ${A^{-1}\in\mathcal{S}_n^+}$, and in particular ${(A^{-1})_{i,i}>0}$. The set

$\{x\in\mathbb{R}:\xi_{i,j}(K,x)\in\mathcal{S}_n^+\}$

is non empty and convex (an interval), as the intersection of an affine subspace with a convex cone, both containing ${K=\xi_{i,j}(K,K_{i,j})}$. For any ${x\in\mathbb{R}}$,

$(\xi_{i,j}(K,x))_{-j,-j}=K_{-j,-j}=A.$

By considering the Schur complement, the matrix ${\xi_{i,j}(K,x)}$ belongs to ${\mathcal{S}_n^+}$ iff

$\sum_{1\leq k,k’\leq n} z_k (A^{-1})_{k,k’} z_{k’} <K_{j,j},$

where ${z:=(\xi_{i,j}(K,x))_{-j,j}}$. By rewriting ${z}$ in terms of ${K}$, we get the equation

$(A^{-1})_{i,i}\,x^2+(2u^\top v)\,x+v^\top Bv-K_{j,j}<0.$

Recall that ${(A^{-1})_{i,i}>0}$. Next, since ${K=\xi_{i,j}(K,K_{i,j})}$ belongs to ${\mathcal{S}_n^+}$, the equation admits ${x=K_{i,j}}$ as a solution, and thus the discriminant

$(u^\top v)^2- (A^{-1})_{i,i}(v^\top Bv-K_{j,j})$

is positive. This leads to the non empty interval ${(\delta_{i,j}^-(K),\delta_{i,j}^+(K))}$. ☐

Let ${K\in\mathcal{S}_n^+}$ with ${n>2}$, and consider the notations of theorem 1. For any ${1\leq i<j\leq n}$, we define the matrices ${\xi_{i,j}^-(K)}$ and ${\xi_{i,j}^+(K)}$ by

$\xi_{i,j}^\pm(K):=\xi_{i,j}(K,\delta_{i,j}^\pm(K)).$

Similarly, for any ${1\leq i\leq n}$, we define

$\xi_{i,i}(K):=\xi_{i,i}(K,\delta_{i,i}(K)).$

The elements of

$\mathcal{E}(K):= \{\xi_{i,j}^\pm(K);1\leq i< j\leq n\} \cup\{\xi_{i,i}(K);1\leq i\leq n\}$

are not of full rank and belong to the boundary of ${\mathcal{S}_n^+}$. If ${E_1,\ldots,E_r\in\mathcal{E}(K)}$, and ${\lambda_1>0,\ldots,\lambda_r>0}$, then ${\lambda_1 E_1+\cdots+\lambda_r E_r\in\mathcal{S}_n^+}$. These sums construct from ${K}$ an open convex cone, included in ${\mathcal{S}_n^+}$, and containing ${K}$.

$\begin{pmatrix} 6 & 10 & 13 & * & * & * \\ 0 & 4 & -2 & 0 & * & * \\ 5 & 7 & 4 & -3 & * & * \\ * & -1 & 2 & 1 & -1 & * \\ * & * & * & 6 & 2 & * \\ * & * & * & * & * & 5 \end{pmatrix} \overset{?}{\rightarrow} \begin{pmatrix} 6 & 10 & 13 & 4 & 5 & 4 \\ 0 & 4 & -2 & 0 & 2 & 6 \\ 5 & 7 & 4 & -3 & 5 & 1 \\ -2 & -1 & 2 & 1 & -1 & 2 \\ 2 & 5 & -1 & 6 & 2 & 8 \\ -3 & 1 & 2 & 0 & 0 & 5 \end{pmatrix}$

If ${Z}$ is a Gaussian random vector of ${\mathbb{R}^n}$ with covariance matrix ${K}$, then for every non empty ${I\subset\{1,\ldots,n\}}$, the random vector ${Z_I:=(Z_i:i\in I)}$ of ${\mathbb{R}^I}$ is Gaussian, with covariance ${K_{I,I}}$.

Conversely, suppose that we prescribe the Gaussian marginal distributions of ${Z_{I_1},\ldots,Z_{I_r}}$ for non empty subsets ${I_1,\ldots,I_r\subset\{1,\ldots,n\}}$ with the natural compatibility conditions in case of overlaps. Can we find a Gaussian random vector ${Z}$ on ${\mathbb{R}^n}$ with such prescribed marginals? The problem reduces to the construction of a covariance matrix with prescribed diagonal blocs with possible overlaps: we prescribe the entries with index ${(i,j)\in I_1\times I_1\cup\cdots\cup I_r\times I_r}$.

In linear algebra, this is a matrix completion problem, more precisely a positive definite matrix completion problem. Among all possible solutions, we may try to maximize the entropy. Recall that the Boltzmann-Shannon entropy ${H(f)}$ of a probability density function ${f}$ on ${\mathbb{R}^n}$ is

$H(f):=-\int\!f(x)\,\log(f(x))\,dx.$

For a Gaussian distribution, we obtain

$H(\mathcal{N}(m,K))=\frac{n}{2}\log(2\pi e)+\log(\det(K)).$

Therefore, looking for a maximum entropy solution for our Gaussian problem with marginal constraints reduces to looking for a maximum-determinant matrix completion.

Let ${\mathcal{S}_n^+}$ be the set of ${n\times n}$ symmetric positive definite matrices. A very well known inequality attributed to Hadamard states that for every ${K\in\mathcal{S}_n^+}$,

$\det(K)\leq K_{1,1}\cdots K_{n,n}$

with equality if and only if ${K}$ is diagonal. This can be rephrased as the solution of a matrix completion problem: among the elements of ${\mathcal{S}_n^+}$ with prescribed diagonal, the diagonal matrix is the unique maximizer of the determinant. More generally, if one prescribes a subset of the ${n(n+1)/2}$ entries, can we complete the remaining entries in order to obtain an element of ${\mathcal{S}_n^+}$? Is there a unique solution with maximum determinant? How to compute it?

The prescribed entries can be though of as a weighted graph ${G=(V,E,w)}$, in other words as a map ${w:E\rightarrow\mathbb{R}}$ where ${E\subset\{\{i,j\}:i,j\in V\}}$ and ${V=\{1,\ldots,n\}}$. We may now define the set of matrix completions as

$\mathcal{C}_G:=\{K\in\mathcal{S}_n^+:K_{i,j}=w_{\{i,j\}}\ \forall \{i,j\}\in E\}.$

Each element of ${\mathcal{C}_G}$ is a complete weighted graphs extending ${(V,E,w)}$. Note that ${\{i\}\in E}$ means that ${G}$ prescribes the diagonal entry ${(i,i)}$ in the matrix.

Theorem 1 (Max-det matrix completions) If ${\{i\}\in E}$ for all ${i\in V}$ and if ${\mathcal{C}_G}$ is not empty, then there exists a unique ${K_*\in\mathcal{C}_G}$ such that

$\det(K_*)= \max_{K\in\mathcal{C}_G}\det(K).$

Moreover, ${K_*}$ is the unique element of ${\mathcal{C}_G}$ such that

$(K_*^{-1})_{i,j}=0 \quad\text{for all}\quad \{i,j\}\not\in E.$

Proof: The non empty convex set ${\mathcal{C}_G}$ is compact since ${\{i\}\in E}$ for all ${i\in V}$. Now the first statement follows from the strict concavity of the functional ${K\in\mathcal{S}_n^+\mapsto\log(\det(K))}$. The second statement follows from the first order conditions of optimality:

$K^{-1}=(\nabla\log\det)(K) \in \mathrm{span}\{e^{(i,j)}:\{i,j\}\in E\}$

where ${e^{(i,j)}}$ is the ${n\times n}$ symmetric matrix with ${1}$ at positions ${(i,j)}$ and ${(j,i)}$ and ${0}$ elsewhere. The gradient is relative to the Hilbert space of symmetric ${n\times n}$ matrices equipped with the scalar product ${A\cdot B:=\mathrm{Tr}(AB)}$. ☐

If there exists ${i\in V}$ with ${\{i\}\not\in E}$ and if ${\mathcal{C}_G}$ is not empty, then one can show that ${\mathcal{C}_G}$ is not compact and the function ${\det}$ is unbounded on ${\mathcal{C}_G}$. However, one can force compactness by adding the constraint ${\mathrm{Tr}(K)\leq \tau}$ or the constraint ${\mathrm{Tr}(K^2)\leq\tau}$ for some fixed real number ${\tau>0}$, and derive a result similar to theorem 1, see Grone, Johnson, Sá, Wolkowicz, Positive definite completions of partial Hermitian matrices, Linear Algebra Appl. 58 (1984), pp. 109-124.

Theorem 1 does not provide a condition on ${(V,E,w)}$ ensuring that ${\mathcal{C}_G}$ is not empty. An answer is given by the following result. We recall that a cycle of length ${r\geq1}$ is a sequence ${i_1,\ldots,i_{r+1}\in V}$ such that ${\{i_k,i_{k+1}\}\in E}$ for all ${1\leq k\leq r}$ and ${i_{r+1}=i_1}$. A cycle is irreducible if it does not contain a cycle of smaller length.

Theorem 2 (Existence of matrix completions) Suppose that for every clique of ${(V,E)}$ the symmetric matrix associated to this clique and constructed from ${w}$ is positive definite. Then ${\mathcal{C}_G}$ is not empty iff ${(V,E)}$ does not have irreducible cycles of length ${>3}$.

One can find a proof of theorem 2 in Grone, Johnson, Sá, Wolkowicz, Positive definite completions of partial Hermitian matrices, Linear Algebra Appl. 58 (1984), pp. 109-124. The remarkable feature of theorem 2 is that it reduces the existence of a matrix completion to a purely graphical condition. Graphs without irreducible cycles of lengths ${>3}$ are known as chordal graphs or triangulated graphs. Two examples: unions of complete graphs (i.e. the adjacency matrix is full block diagonal up to relabeling of the vertices), and band diagonal graphs (i.e. the adjacency matrix is full band diagonal up to relabeling of the vertices).

An ordering of the vertices is a bijection ${\ell:V\rightarrow V}$. We say that ${\ell}$ is a perfect elimination ordering of ${(V,E)}$ when for every ${i\in V}$, the set

$\{j\in V:\{i,j\}\in E,\ell(i)<\ell(j)\}$

is a clique of ${(V,E)}$. This is equivalent to say that ${(V,E)}$ is chordal. This leads to a recursive algorithm similar to the Gaussian elimination process for the construction of the solution ${K_*}$ in theorem 1, see e.g. Grone, Johnson, Sá, Wolkowicz, Positive definite completions of partial Hermitian matrices, Linear Algebra Appl. 58 (1984), pp. 109-124 and Dahl, Vandenberghe, Roychowdhury, Covariance selection for nonchordal graphs via chordal embedding, Optim. Methods Softw. 23 (2008), no. 4, 501-520. For more references on matrix completions and max-det problems, see e.g. Vandenberg, Boyd, Wu, Determinant maximization with linear matrix inequalities constraints, SIAM J. Matrix Anal. Appl. 19 (1998), no. 2, 499-533.

For a Gaussian random vector ${Z}$ with covariance matrix ${K\in\mathcal{S}_n^+}$ and for every ${i\neq j\in\{1,\ldots,n\}}$, we have ${K^{-1}_{i,j}=0}$ if and only if ${Z_i}$ and ${Z_j}$ are independent conditionally on the remaining components of ${Z}$. This is easily seen from the Schur complement formula

$(K^{-1})_{I,I}=(K_{I,I}-K_{I,J}(K_{J,J})^{-1}K_{J,I})^{-1}$

for every non empty and disjoint ${I,J\subset\{1,\ldots,n\}}$. The Gaussian formulation of theorem 1 is a rather old result of statistics, see e.g. Dempster, Covariance Selection, Biometrics 28 (1972), no. 1, pp. 157-175. It constitutes a basic element of graphical models, see e.g. the books Whittaker, Graphical models in applied multivariate statistics, Wiley, 1990 and Lauritzen, Graphical models, Oxford University Press, 1996. The algorithm for the computation of ${K_*}$ can be seen as a flavor of the Iterated (or Iterative) Proportional Fitting (IPF) algorithm developed at the origin for contingency tables. The IPF for Gaussian distributions is studied for instance in Cramer, Conditional iterative proportional fitting for Gaussian distributions, J. Multivariate Anal. 65 (1998), no. 2, 261-276.

I traveled to  Austria this summer, for vacations. It was impossible to avoid visiting the Zentralfriedhof of Vienna, where one can contemplate the grave of a great viennese scientist: Ludwig Boltzmann. His ideas in mathematical physics were amazingly original, beautiful, and deep.  He understood the power of melting probability and partial differential equations before the rigorous developments of these fields in mathematics and physics. His elementary derivation of the entropy formula from the combinatorics of microstates is also striking,  years before the formulation of the quantum hypothesis by Max Planck and the concept of information by Claude Shannon. Unfortunately, his ideas were not well received at that time, and he eventually committed suicide in Italy after a severe depression. Nevertheless, his work had an enormous impact on science. In 2010, Cédric Villani received a Fields Medal partly for his study of the Boltzmann equation of kinetic theory of gases. You may take a look at his paper on the Cercignani conjecture and the more recent paper by Desvillettes, Mouhot, and Villani.

In the nineteenth century, probability theory was not yet developed, and stochastic models were called statistical models. That is probably why we say statistical mechanics and statistical physics.  You may take a look at the pretty interesting small old book by Paul and Tatiana Ehrenfest.

Syntax · Style · Tracking & Privacy.