The Hoffman-Wielandt inequality is one of my favorite spectral inequalities in linear algebra. Before stating it, let us recall some basic concepts and objects.

Normal matrices. A matrix ${A\in\mathcal{M}_{n,n}(\mathbb{C})}$ is normal when ${AA^*=A^*A}$, where ${A^*}$ is its conjugate-transpose. This is equivalent to say that there exists a unitary matrix ${U}$ such that ${UAU^*}$ is diagonal (and the diagonal elements are precisely the eigenvalues of ${A}$). Every Hermitian and every unitary matrix is normal. A nilpotent matrix is never normal.

Trace norm. The trace norm on ${\mathcal{M}_{n,n}(\mathbb{C})}$ is defined for every ${A\in\mathcal{M}_{n,n}(\mathbb{C})}$ by

$\left\Vert A\right\Vert^2 :=\mathrm{Tr}(AA^*) =\mathrm{Tr}(A^*A) =\sum_{1\leq i,j\leq n} |A_{i,j}|^2.$

This norm is also known as the Frobenius norm, the Schur norm, or the Hilbert-Schmidt norm. As for the operator norm, the trace norm is unitary invariant, in the sense that ${\left\Vert UAV\right\Vert=\left\Vert A\right\Vert}$ for any ${n\times n}$ unitary matrices ${U}$ and ${V}$ and any ${A\in\mathcal{M}_{n,n}(\mathbb{C})}$. The advantage of the trace norm over the operator norm lies in its convenient expression in terms of the matrix entries. Moreover, this norm is Hilbertian for the Hermitian product ${\left<A,B\right>=\mathrm{Tr}(AB^*)}$.

Hoffman-Wielandt inequality. It states that if ${A,B\in\mathcal{M}_{n,n}(\mathbb{C})}$ are normal, with respective eigenvalues ${\lambda_1(A),\ldots,\lambda_n(A)}$ and ${\lambda_1(B),\ldots,\lambda_n(B)}$, then, denoting by ${\Pi}$ the permutation group of ${\{1,\ldots,n\}}$,

$\min_{\pi\in\Pi}\sum_{i=1}^n\left|\lambda_i(A)-\lambda_{\pi(i)}(B)\right|^2 \leq\left\Vert A-B\right\Vert^2.$

In other words, the optimal matching of the spectra of ${A}$ and ${B}$ beats the trace norm. Let us give a proof of the Hoffman-Wielandt inequality. We have

$C:=UAU^*=\mathrm{diag}(\lambda_1(A),\ldots,\lambda_n(A))$

and

$D:=VBV^*=\mathrm{diag}(\lambda_1(B),\ldots,\lambda_n(B))$

for some ${n\times n}$ unitary matrices ${U}$ and ${V}$. By unitary invariance, we have

$\left\Vert A-B\right\Vert^2 =\left\Vert U^*CU-V^*DV\right\Vert^2 =\left\Vert CUV^*-UV^*D\right\Vert^2 =\left\Vert CW-WD\right\Vert^2$

where ${W:=UV^*}$. This gives, denoting ${P:=(|W_{i,j}|^2)_{1\leq i,j\leq n}}$,

$\left\Vert A-B\right\Vert^2 =\sum_{i,j=1}^n\left|(CW)_{i,j}-(WD)_{i,j}\right|^2 =\sum_{i,j=1}^nP_{i,j}\left|\lambda_i(A)-\lambda_j(B)\right|^2.$

The expression above is linear in ${P}$. Moreover, since ${W}$ is unitary, the matrix ${P}$ is doubly stochastic. If ${\mathcal{P}}$ is the set of ${n\times n}$ doubly stochastic matrices then

$\left\Vert A-B\right\Vert^2 \geq\inf_{Q\in\mathcal{P}}\Phi(Q) \quad\text{where}\quad \Phi(Q):=\sum_{i,j=1}^nQ_{i,j}\left|\lambda_i(A)-\lambda_j(B)\right|^2.$

But ${\Phi}$ is linear and ${\mathcal{P}}$ is convex and compact, and thus the infimum above is achieved for some extremal point ${Q}$ of ${\mathcal{P}}$. Now a celebrated theorem due to Birkhoff and von Neumann states that the extremal points of ${\mathcal{P}}$ are exactly the permutation matrices. Recall that ${P\in\mathcal{P}}$ is a permutation matrix when for a permutation ${\pi\in\Pi}$ where ${\Pi}$ is the symmetric group, we have ${P_{i,j}=\delta_{\pi(i),j}}$ for every ${i,j\in\{1,\ldots,n\}}$. This gives the final step of the proof:

$\left\Vert A-B\right\Vert^2 \geq \min_{\pi\in\Pi}\sum_{i=1}^n\left|\lambda_i(A)-\lambda_{\pi(i)}(B)\right|^2.$

Coupling distance. Recall that the coupling distance between two probability measures ${\eta_1,\eta_2}$ on ${\mathbb{C}}$ with finite second moment is defined by

$d(\eta_1,\eta_2):=\inf\sqrt{\mathbb{E}(|X_1-X_2|^2)}$

where the infimum runs over the set of couple of random variables ${(X_1,X_2)}$ on ${\mathbb{C}\times\mathbb{C}}$ with ${X_1\sim\eta_1}$ and ${X_2\sim\eta_2}$. In the case where ${\eta_1=\frac{1}{n}\sum_{i=1}^n\delta_{a_i}}$ and ${\eta_2=\frac{1}{n}\sum_{i=1}^n\delta_{b_i}}$ where ${(a_i)_{1\leq i\leq n}}$ and ${(b_i)_{1\leq i\leq n}}$ are sequences in ${\mathbb{C}}$, we have, using the Birkhoff and von Neuman theorem on doubly stochastic matrices,

$d(\eta_1,\eta_2)^2= \frac{1}{n}\min_{\pi\in\Pi}\sum_{i=1}^n\left|a_i-b_{\pi(i)}\right|^2.$

This allows to rewrite the Hoffman-Wielandt inequality as

$nd^2(\mu_A,\mu_B)\leq \left\Vert A-B\right\Vert^2$

where

$\mu_A:=\frac{1}{n}\sum_{i=1}^n\delta_{\lambda_i(A)} \quad\text{and}\quad \mu_B:=\frac{1}{n}\sum_{i=1}^n\delta_{\lambda_i(B)}.$

Note that if ${(a_i)_{1\leq i\leq n}}$ and ${(b_i)_{1\leq i\leq n}}$ are in ${\mathbb{R}}$ then

$\min_{\pi\in\Pi}\sum_{i=1}^n(a_i-b_{\pi(i)})^2 =\sum_{i=1}^n(a_{(i)}-b_{(i)})^2.$

where ${a_{(1)}\leq\cdots\leq a_{(n)}}$ and ${b_{(1)}\leq\cdots\leq b_{(n)}}$ are the monotone reordering of both sequences. To see it, we may assume that the first sequence is already ordered as ${a_1\leq\cdots\leq a_n}$, and we observe that if ${b_i>b_{i+1}}$ for some ${i}$, then

$\begin{array}{rcl} (a_i-b_i)^2+(a_{i+1}-b_{i+1})^2 &=&(a_i-b_{i+1})^2+(a_{i+1}-b_i)^2+2(b_i-b_{i+1})(a_{i+1}-a_i)\\ &\geq& (a_i-b_{i+1})^2+(a_{i+1}-b_i)^2, \end{array}$

which allows to reorder the second sequence by using successive transpositions.

Note. Helmut Wielandt (1910 – 2001) devoted his doctoral thesis and habilitation to permutation groups. He has a book on finite permutation groups. From 1941 he did research on meteorology, cryptology and aerodynamics. In 1942 he was attached to the Kaiser Wilhelm Institute and the Aerodynamics Research Institute at Göttingen. He said “… I had to work on vibration problems. I am indebted to that time for valuable discoveries: on the one hand the applicability of abstract tools to the solution of concrete problems, on the other hand, the – for a pure mathematician – unexpected difficulty and unaccustomed responsibility of numerical evaluation. It was a matter of estimating eigenvalues of non-self-adjoint differential equations and matrices.” Source: MacTutor.

Note. Alan Hoffman (1924 – ) is also famous for the Hoffman-Singleton graph.

One Comment

1. Minghua 2013-01-09

I’d like to know something about analogous result for trace norm or Schattern p norm.