Press "Enter" to skip to content

Libres pensées d'un mathématicien ordinaire

Phi-entropies

I have updated my brief survey article in the EVOL ANR encyclopedia on $latex \Phi$-entropies and $latex \Phi$-Sobolev inequalities.


Sherman inverse problem for Markov matrices

Let $latex X$ be an $latex n\times n$ complex matrix. The eigenvalues $latex \lambda_1(X), \ldots, \lambda_n(X)$ of $latex X$ are the roots in $latex \mathbb{C}$ of its characteristic polynomial. We label them in such a way that

$latex \displaystyle |\lambda_1(X)|\geq\cdots\geq|\lambda_n(X)|$

with growing phases. The spectral radius of $latex X$ is $latex \rho(X):=|\lambda_1(X)|$. The singular values

$latex \displaystyle s_1(X)\geq\cdots\geq s_n(X)$

of $latex X$ are the eigenvalues of the positive semi-definite Hermitian matrix $latex \sqrt{XX^*}$ where $latex X^*$ is the conjugate-transpose of $latex X$. Thus $latex s_k(X)=\sqrt{\lambda_k(XX^*)}$. The operator norm of $latex X$ is $latex \Vert X\Vert_{2\to2}=s_1(X)$. Geometrically, the matrix $latex X$ maps the unit sphere to an ellipsoid, and the half-lengths of the principal axes of this ellipsoid are exactly the singular values of $latex X$.
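As a quick illustration, here is a minimal NumPy sketch (my own, not from the original post; the random matrix and the seed are arbitrary) checking numerically that $latex s_k(X)=\sqrt{\lambda_k(XX^*)}$ and that the operator norm is $latex s_1(X)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

# Eigenvalues of X, sorted by decreasing modulus.
lam = np.linalg.eigvals(X)
lam = lam[np.argsort(-np.abs(lam))]

# Singular values of X, returned by the SVD in decreasing order.
s = np.linalg.svd(X, compute_uv=False)

# s_k(X) = sqrt(lambda_k(X X*)), where X* is the conjugate-transpose.
s_check = np.sqrt(np.sort(np.linalg.eigvalsh(X @ X.conj().T))[::-1])
assert np.allclose(s, s_check)

# The operator norm of X is s_1(X).
assert np.isclose(np.linalg.norm(X, 2), s[0])
print("spectral radius:", np.abs(lam[0]), "operator norm:", s[0])
```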

H. Weyl showed in 1949 that for every $latex k\in\{1,\ldots,n\}$, we have

$latex \displaystyle \prod_{i=k}^ns_i(X)\leq\prod_{i=k}^n|\lambda_i(X)| \quad\text{and}\quad \prod_{i=1}^k|\lambda_i(X)|\leq\prod_{i=1}^ks_i(X). $

This expresses a two-way logarithmic majorization. Note that we recover the radius/norm bound $latex \rho(X)\leq\Vert X\Vert_{2\to2}$. Note also that one may deduce the second family of Weyl inequalities from the first one by applying it to the inverse of $latex X+\varepsilon I$ (and conversely). Note moreover that

$latex \displaystyle\prod_{i=1}^n|\lambda_i(X)|=\prod_{i=1}^ns_i(X)=|\det(X)|$.
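Continuing the sketch above (it reuses X, lam, s, and n defined there), one can check the two families of Weyl inequalities and the determinant identity numerically; the tolerance 1e-8 is an arbitrary guard against floating-point noise.

```python
# Weyl log-majorization: compare sums of logarithms for numerical stability.
for k in range(1, n + 1):
    assert np.sum(np.log(s[k - 1:])) <= np.sum(np.log(np.abs(lam[k - 1:]))) + 1e-8
    assert np.sum(np.log(np.abs(lam[:k]))) <= np.sum(np.log(s[:k])) + 1e-8

# The full products of the moduli of the eigenvalues and of the singular
# values both coincide with |det(X)|.
assert np.isclose(np.prod(np.abs(lam)), np.abs(np.linalg.det(X)))
assert np.isclose(np.prod(s), np.abs(np.linalg.det(X)))
```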

In 1954, A. Horn showed that any sequences of numbers in $latex \mathbb{C}$ and $latex \mathbb{R}_+$ satisfying the Weyl inequalities and the ordering used above are actually the eigenvalues and singular values of some complex matrix. This solves positively a problem formulated by S. Sherman.

We say that $latex X$ is Markov (or stochastic) if $latex X$ has entries in $latex [0,1]$ and each row sums up to $latex 1$. The spectrum of $latex X$ is then included in the unit disc of the complex plane, contains $latex 1$, and is symmetric with respect to the real axis. With the Weyl-Sherman-Horn inverse problem in mind, one can ask about the possible set of singular values of a Markov matrix $latex M$ with prescribed eigenvalues. One may also reverse the question. One has to take into account the additional symmetries due to the periods. Actually, instead of the singular values of $latex M$, it is maybe more natural, from a Markovian point of view, to consider the spectrum of $latex \sqrt{MM^{*\mu}}$ where $latex M^{*\mu}$ is the adjoint of $latex M$ in $latex \ell^2(\mu)$ and $latex \mu$ is the invariant probability measure of $latex M$ (when $latex M$ is irreducible). This remark leads to many other questions.
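To make the last remark concrete, here is a hedged sketch (my own toy example, not from the post) computing the adjoint $latex M^{*\mu}$ of $latex M$ in $latex \ell^2(\mu)$, which is $latex D^{-1}M^\top D$ with $latex D=\mathrm{diag}(\mu)$, and then the spectrum of $latex \sqrt{MM^{*\mu}}$; the $latex 3\times3$ chain is arbitrary but irreducible.

```python
import numpy as np

M = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])

# Invariant probability measure: left eigenvector of M for the eigenvalue 1.
w, V = np.linalg.eig(M.T)
mu = np.real(V[:, np.argmin(np.abs(w - 1))])
mu /= mu.sum()

# Adjoint of M in l^2(mu): M*(i,j) = mu(j) M(j,i) / mu(i).
D = np.diag(mu)
M_adj = np.linalg.inv(D) @ M.T @ D

# M M_adj is self-adjoint in l^2(mu), with real nonnegative spectrum;
# its square roots play the role of "Markovian singular values".
sv_mu = np.sqrt(np.sort(np.real(np.linalg.eigvals(M @ M_adj)))[::-1])
print("eigenvalues of M:          ", np.linalg.eigvals(M))
print("spectrum of sqrt(M M^*mu): ", sv_mu)
```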

The content of this post is inspired from informal discussions with Laurent Miclo during the EVOL workshop held in Hammamet in May 2010.


Spectrum of a Markov matrix

Let $latex M$ be an $latex n\times n$ Markov matrix (i.e. a stochastic matrix). This means that $latex M$ has entries in $latex [0,1]$ and each row sums up to $latex 1$. Let $latex \lambda_1,\ldots,\lambda_n$ be the roots in $latex \mathbb{C}$ of its characteristic polynomial. Let us order them in such a way that $latex |\lambda_1|\geq\cdots\geq|\lambda_n|$ with growing phases. It is not difficult to show that $latex \{\lambda_1,\ldots,\lambda_n\}\subset\{z\in\mathbb{C}:|z|\leq1\}$, that $latex \lambda_1=1$, and that the spectrum of $latex M$ is symmetric with respect to the real axis. A good reference for the spectrum of Markov matrices is the book of Seneta.

Suppose from now on that $latex M$ is irreducible. Then the eigenspace associated to $latex \lambda_1$ has dimension $latex 1$ and $latex M$ admits a unique invariant probability measure $latex \mu$. One can show that the spectrum of $latex M$ is invariant under the rotation of angle $latex 2\pi/d$ where $latex d\in\{1,2,\ldots\}$ is called the period of $latex M$. In particular, there are exactly $latex d$ eigenvalues of unit modulus, and if $latex M$ is reversible, then its spectrum is real and $latex -1$ belongs to the spectrum if and only if $latex d=2$.
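A tiny numerical illustration of the period (my own examples, not from the post): the deterministic 2-cycle has spectrum $latex \{1,-1\}$, invariant under the rotation of angle $latex \pi$, whereas an arbitrary aperiodic chain has a single eigenvalue of unit modulus.

```python
import numpy as np

# Deterministic 2-cycle: irreducible with period d = 2.
M2 = np.array([[0.0, 1.0],
               [1.0, 0.0]])
print(np.linalg.eigvals(M2))        # {1, -1}: invariant under z -> -z

# An aperiodic chain (d = 1): only lambda_1 = 1 has unit modulus.
M = np.array([[0.9, 0.1],
              [0.5, 0.5]])
lam = np.linalg.eigvals(M)
print(lam, np.abs(lam))             # eigenvalues 1 and 0.4
```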

From now on, we suppose additionally that $latex M$ is aperiodic: $latex d=1$. Then $latex |\lambda_2|<1$. In this case, the spectral gap $latex 1-|\lambda_2|$ describes the worst exponential decay to the equilibrium $latex \mu$ in $latex \ell^2(\mu)$ for a Markov chain of kernel $latex M$. This corresponds to controlling the decay of the powers of $latex M$ towards their limit by using the spectral radius of $latex M$ on the hyperplane of mean-zero functions, which is $latex |\lambda_2|$.
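For instance (a hedged sketch with an arbitrary $latex 2\times2$ aperiodic chain whose eigenvalues are $latex 1$ and $latex 0.4$), the total variation distance to $latex \mu$ decays at the geometric rate $latex |\lambda_2|^k$:

```python
import numpy as np

M = np.array([[0.9, 0.1],
              [0.5, 0.5]])          # eigenvalues 1 and 0.4
mu = np.array([5 / 6, 1 / 6])       # invariant measure: mu M = mu
nu = np.array([0.0, 1.0])           # arbitrary initial law
for k in [1, 5, 10, 20]:
    tv = 0.5 * np.abs(nu @ np.linalg.matrix_power(M, k) - mu).sum()
    print(k, tv, 0.4 ** k)          # both columns decay at the same rate
```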

Beyond the spectral gap, one may ask about some dynamical interpretation of the remaining eigenvalues. If $latex M$ is reversible, then $latex M$ is self-adjoint in $latex \ell^2(\mu)$ and $latex 1-|\lambda_k|$ for $latex \lambda_k\not\in\{0,1\}$ describes the worst exponential decay to the equilibrium $latex \mu$ in $latex \ell^2(\mu)$ when the initial law of the chain is orthogonal to the eigenspaces of $latex M^\top$ associated to the eigenvalues of higher modulus. One may also think about the Chen and Saloff-Coste criterion for the $latex \ell^2$ cutoff of reversible Markov chains. Beyond reversibility, this simple interpretation remains valid if $latex M$ is normal, i.e. if $latex MM^*=M^*M$ where $latex M^*$ is the adjoint of $latex M$ in $latex \ell^2(\mu)$.

The empirical spectral distribution of $latex M$ is the discrete probability measure

$latex \displaystyle\nu:=\frac{1}{n}\delta_{\lambda_1}+\cdots+\frac{1}{n}\delta_{\lambda_n}.$

For every $latex k\in\{0,1,2,\ldots\}$, we have the global spectral identity

$latex \displaystyle \int\!z^k\,d\nu(z)=\frac{1}{n}\sum_{i=1}^n\lambda_i^k=\frac{1}{n}\mathrm{Tr}(M^k)=\frac{1}{n}\sum_{i=1}^nM^k_{i,i}=\frac{1}{n}\sum_{i=1}^np(i,k)$

where $latex p(i,k):=M^k_{i,i}$ is the probability of a loop of length $latex k$ rooted at $latex i$ for a Markov chain of kernel $latex M$. This gives a global interpretation of the spectrum: the moments of the empirical spectral distribution of $latex M$ correspond to the mean loop probabilities for the chain.
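Here is a minimal numerical check of this identity on an arbitrary $latex 3\times3$ chain (my own sketch, not from the post):

```python
import numpy as np

M = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])
n = M.shape[0]
lam = np.linalg.eigvals(M)
for k in range(5):
    moment = np.real(np.mean(lam ** k))                      # (1/n) sum_i lambda_i^k
    mean_loop = np.trace(np.linalg.matrix_power(M, k)) / n   # (1/n) sum_i p(i,k)
    assert np.isclose(moment, mean_loop)
print("moments of nu = mean loop probabilities")
```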

Let $latex f:\{1,\ldots,n\}\to\mathbb{R}$ be some observable and $latex (X_k)_{k\geq0}$ be a Markov chain on $latex \{1,\ldots,n\}$ with kernel $latex M$. The ergodic theorem (strong law of large numbers for additive functionals) states that regardless of the initial law of the chain, we have

$latex \displaystyle \frac{f(X_0)+\cdots+f(X_{N-1})}{N}\overset{\text{a.s.}}{\underset{N\to\infty}{\longrightarrow}}\mu(f):=\int\!f\,d\mu=\sum_{x=1}^nf(x)\mu(x).$

Let $latex g:\{1,\ldots,n\}\to\mathbb{R}$ be such that $latex (M-I)g=f-\mu(f)$. Note that $latex L:=M-I$ is the generator of $latex M$ and $latex g$ solves the Poisson equation. Then the central limit theorem states that

$latex \displaystyle \sqrt{N}\left(\frac{f(X_0)+\cdots+f(X_{N-1})}{N}-\mu(f)\right)\overset{\text{law}}{\underset{N\to\infty}{\longrightarrow}}\mathcal{N}(0,\sigma(f)^2)$

where $latex \sigma(f)^2:=\mu(g^2)-\mu((Mg)^2)$. Since $latex \mu((Mg)^2)\geq\mu(Mg)^2=\mu(g)^2$ by the Cauchy-Schwarz inequality and the invariance of $latex \mu$, we get $latex \sigma(f)^2\leq\mu(g^2)-\mu(g)^2$. Thus, if $latex M$ is reversible then

$latex \displaystyle\sigma(f)^2\leq \frac{\mu(f^2)-\mu(f)^2}{\mathrm{dist}(1,\{\lambda_2,\ldots,\lambda_n\})^2}.$

Note that when $latex M$ and $latex \mu$ are not known, one may estimate $latex \sigma(f)^2$ empirically. See e.g. the work of Trevezas and Limnios and references therein.
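To put the last displays together, here is a hedged simulation sketch (the small chain and the observable are my own arbitrary choices): solve the Poisson equation, run the chain, and compare the ergodic average with $latex \mu(f)$, together with the CLT variance.

```python
import numpy as np

rng = np.random.default_rng(1)
M = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])
f = np.array([1.0, -1.0, 2.0])      # arbitrary observable
n_states = M.shape[0]

# Invariant probability measure mu (left eigenvector for the eigenvalue 1).
w, V = np.linalg.eig(M.T)
mu = np.real(V[:, np.argmin(np.abs(w - 1))])
mu /= mu.sum()
mu_f = mu @ f

# Solve the Poisson equation (M - I) g = f - mu(f); the linear system is
# singular but consistent, and lstsq returns a valid solution.
g = np.linalg.lstsq(M - np.eye(n_states), f - mu_f, rcond=None)[0]
sigma2 = mu @ g**2 - mu @ (M @ g)**2   # CLT variance mu(g^2) - mu((Mg)^2)

# Simulate the chain and form the ergodic average.
N = 200_000
x, S = 0, 0.0
for _ in range(N):
    S += f[x]
    x = rng.choice(n_states, p=M[x])
print("ergodic average:", S / N, " mu(f):", mu_f, " sigma^2:", sigma2)
```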


An algorithm

The algorithm (several instances may run in parallel):

  1. be curious and open-minded; retain concepts, beyond techniques
  2. when you catch a new idea (for instance in a seminar) try it on all your problems
  3. if your imagination fails to produce a promising problem, return to first step
  4. if the problem is already solved by others, albeit in scattered places, write down an expository synthesis
  5. if not, never hesitate to learn new techniques to solve your problem
  6. work hard, generalize, specialize, follow your intuition, use simulations
  7. if the problem is still too difficult, seek complementary friends
  8. if the problem is still unsolved, keep it in mind for future use
  9. it is time to take vacations… maybe
  10. return to first step (and never forget to eat and to sleep 😉)

For a poor human like me, this algorithm is more a paradigm than a reality… 🙂

This reminds me of a famous quotation from Paul Halmos: "Don't just read it; fight it! Ask your own questions, look for your own examples, discover your own proofs. Is the hypothesis necessary? Is the converse true? What happens in the classical special case? What about the degenerate cases? Where does the proof use the hypothesis?"
