The Hoffman-Wielandt inequality is one of my favorite spectral inequalities in linear algebra. Before stating it, let us recall some basic concepts and objects.
Normal matrices. A matrix $A\in\mathcal{M}_{n,n}(\mathbb{C})$ is normal when $AA^*=A^*A$, where $A^*$ is its conjugate-transpose. This is equivalent to saying that there exists a unitary matrix $U$ such that $UAU^*$ is diagonal (and the diagonal entries are then precisely the eigenvalues of $A$). Every Hermitian matrix and every unitary matrix is normal. A nonzero nilpotent matrix is never normal.
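As a small illustration (a NumPy sketch, not part of the original argument; the helper name `is_normal` is chosen here for convenience), normality can be tested directly from the definition:

```python
import numpy as np

def is_normal(A, tol=1e-12):
    """Check A A* = A* A up to a numerical tolerance."""
    return np.allclose(A @ A.conj().T, A.conj().T @ A, atol=tol)

H = np.array([[2.0, 1j], [-1j, 3.0]])                                    # Hermitian, hence normal
U = np.linalg.qr(np.random.randn(3, 3) + 1j * np.random.randn(3, 3))[0]  # unitary, hence normal
N = np.array([[0.0, 1.0], [0.0, 0.0]])                                   # nonzero nilpotent, not normal

print(is_normal(H), is_normal(U), is_normal(N))                          # True True False
```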
Trace norm. The trace norm on $\mathcal{M}_{n,n}(\mathbb{C})$ is defined for every $A\in\mathcal{M}_{n,n}(\mathbb{C})$ by
$$\|A\|^2 := \mathrm{Tr}(AA^*) = \mathrm{Tr}(A^*A) = \sum_{1\le i,j\le n} |A_{i,j}|^2.$$
This is also known as the Frobenius, Schur, or Hilbert-Schmidt norm. Like the operator norm $A\mapsto\|A\|_{2\to2}=\max_{\|x\|=1}\|Ax\|$, the trace norm is unitarily invariant, namely $\|UAV\|=\|A\|$ for any $n\times n$ unitary matrices $U$ and $V$ and any $A\in\mathcal{M}_{n,n}(\mathbb{C})$. The advantage of the trace norm over the operator norm lies in its convenient expression in terms of the matrix entries. Moreover, this norm is Hilbertian, being induced by the Hermitian inner product $\langle A,B\rangle=\mathrm{Tr}(AB^*)$.
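The three expressions of the trace norm and its unitary invariance are easy to check numerically; here is a hedged NumPy sketch (the random matrices and QR-generated unitaries are only illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# Three equivalent expressions of the squared trace (Frobenius) norm.
by_entries = np.sum(np.abs(A) ** 2)
by_trace   = np.trace(A @ A.conj().T).real
by_numpy   = np.linalg.norm(A, 'fro') ** 2
print(np.allclose([by_entries, by_trace], by_numpy))                     # True

# Unitary invariance: ||U A V|| = ||A|| for unitary U and V (obtained here via QR).
U = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))[0]
V = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))[0]
print(np.isclose(np.linalg.norm(U @ A @ V, 'fro'), np.linalg.norm(A, 'fro')))  # True
```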
Hoffman-Wielandt inequality. It states that if $A,B\in\mathcal{M}_{n,n}(\mathbb{C})$ are normal, with respective eigenvalues $\lambda_1(A),\ldots,\lambda_n(A)$ and $\lambda_1(B),\ldots,\lambda_n(B)$, then, denoting by $S_n$ the permutation group of $\{1,\ldots,n\}$,
$$\min_{\sigma\in S_n}\sum_{i=1}^n |\lambda_i(A)-\lambda_{\sigma(i)}(B)|^2 \le \|A-B\|^2.$$
In other words, the cost of the optimal matching of the spectra of $A$ and $B$ is controlled by the trace norm of $A-B$. Let us give the original proof of Hoffman and Wielandt (1953). We have
$$C := UAU^* = \mathrm{diag}(\lambda_1(A),\ldots,\lambda_n(A))$$
and
$$D := VBV^* = \mathrm{diag}(\lambda_1(B),\ldots,\lambda_n(B))$$
for some $n\times n$ unitary matrices $U$ and $V$. By unitary invariance, we have
$$\|A-B\|^2 = \|U^*CU - V^*DV\|^2 = \|CUV^* - UV^*D\|^2 = \|CW - WD\|^2$$
where $W := UV^*$. This gives, denoting $P := (|W_{i,j}|^2)_{1\le i,j\le n}$,
$$\|A-B\|^2 = \sum_{i,j=1}^n |(CW)_{i,j} - (WD)_{i,j}|^2 = \sum_{i,j=1}^n P_{i,j}\,|\lambda_i(A) - \lambda_j(B)|^2.$$
The expression above is linear in $P$. Moreover, since $W$ is unitary, the matrix $P$ is doubly stochastic. If $\mathcal{P}$ denotes the set of $n\times n$ doubly stochastic matrices, then
$$\|A-B\|^2 \ge \inf_{Q\in\mathcal{P}} \Phi(Q) \quad\text{where}\quad \Phi(Q) := \sum_{i,j=1}^n Q_{i,j}\,|\lambda_i(A)-\lambda_j(B)|^2.$$
But $\Phi$ is linear and $\mathcal{P}$ is convex and compact, and thus the infimum above is achieved at some extremal point of $\mathcal{P}$. Now a celebrated theorem due to Birkhoff and von Neumann states that the extremal points of $\mathcal{P}$ are exactly the permutation matrices. Recall that $Q\in\mathcal{P}$ is a permutation matrix when $Q_{i,j}=\delta_{\sigma(i),j}$ for every $i,j\in\{1,\ldots,n\}$, for some permutation $\sigma\in S_n$. This gives the final step of the proof:
$$\|A-B\|^2 \ge \min_{\sigma\in S_n}\sum_{i=1}^n |\lambda_i(A)-\lambda_{\sigma(i)}(B)|^2.$$
The same proof provides, without much effort, an upper bound, namely
$$\max_{\sigma\in S_n}\sum_{i=1}^n |\lambda_i(A)-\lambda_{\sigma(i)}(B)|^2 \;\ge\; \|A-B\|^2 \;\ge\; \min_{\sigma\in S_n}\sum_{i=1}^n |\lambda_i(A)-\lambda_{\sigma(i)}(B)|^2.$$
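As a numerical sanity check (a sketch, assuming SciPy is available), the optimal and worst matchings of the spectra can be computed as assignment problems via scipy.optimize.linear_sum_assignment and compared with the squared trace norm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
n = 6

def random_normal_matrix(n):
    """Unitary conjugation of a random complex diagonal matrix, hence normal."""
    U = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))[0]
    d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    return U @ np.diag(d) @ U.conj().T

A, B = random_normal_matrix(n), random_normal_matrix(n)
la, lb = np.linalg.eigvals(A), np.linalg.eigvals(B)

# The best and worst matchings of the two spectra are assignment problems.
cost = np.abs(la[:, None] - lb[None, :]) ** 2
best  = cost[linear_sum_assignment(cost)].sum()                  # min over permutations
worst = cost[linear_sum_assignment(cost, maximize=True)].sum()   # max over permutations
gap = np.linalg.norm(A - B, 'fro') ** 2

print(best <= gap + 1e-10, gap <= worst + 1e-10)                 # True True
```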
Coupling distance. Recall that the coupling distance between two probability measures $\eta_1,\eta_2$ on $\mathbb{C}$ with finite second moment is defined by
$$d(\eta_1,\eta_2) := \inf\sqrt{\mathbb{E}(|X_1-X_2|^2)}$$
where the infimum runs over the set of couples of random variables $(X_1,X_2)$ on $\mathbb{C}\times\mathbb{C}$ with $X_1\sim\eta_1$ and $X_2\sim\eta_2$. In the case where $\eta_1=\frac{1}{n}\sum_{i=1}^n\delta_{a_i}$ and $\eta_2=\frac{1}{n}\sum_{i=1}^n\delta_{b_i}$ where $(a_i)_{1\le i\le n}$ and $(b_i)_{1\le i\le n}$ are sequences in $\mathbb{C}$, we have, using the Birkhoff and von Neumann theorem on doubly stochastic matrices,
$$d(\eta_1,\eta_2)^2 = \frac{1}{n}\min_{\sigma\in S_n}\sum_{i=1}^n |a_i-b_{\sigma(i)}|^2.$$
This allows one to rewrite the Hoffman-Wielandt inequality as
$$n\,d^2(\mu_A,\mu_B) \le \|A-B\|^2$$
where
$$\mu_A := \frac{1}{n}\sum_{i=1}^n \delta_{\lambda_i(A)} \quad\text{and}\quad \mu_B := \frac{1}{n}\sum_{i=1}^n \delta_{\lambda_i(B)}.$$
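Here is a small sketch of this reformulation (the function coupling_distance is a hypothetical helper introduced here, not part of the text), checking $n\,d^2(\mu_A,\mu_B)\le\|A-B\|^2$ on Hermitian examples:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def coupling_distance(a, b):
    """Coupling distance between (1/n) sum_i delta_{a_i} and (1/n) sum_i delta_{b_i}."""
    a, b = np.asarray(a), np.asarray(b)
    cost = np.abs(a[:, None] - b[None, :]) ** 2
    return np.sqrt(cost[linear_sum_assignment(cost)].mean())

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2     # Hermitian, hence normal
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

d = coupling_distance(np.linalg.eigvalsh(A), np.linalg.eigvalsh(B))
print(n * d ** 2 <= np.linalg.norm(A - B, 'fro') ** 2 + 1e-12)   # True
```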
Note that if $A$ and $B$ are Hermitian then the eigenvalues $a_i:=\lambda_i(A)$ and $b_i:=\lambda_i(B)$, $1\le i\le n$, are real and
$$\min_{\sigma\in S_n}\sum_{i=1}^n (a_i-b_{\sigma(i)})^2 = \sum_{i=1}^n (a_{(i)}-b_{(i)})^2,$$
where $a_{(1)}\le\cdots\le a_{(n)}$ and $b_{(1)}\le\cdots\le b_{(n)}$ are the monotone reorderings of the two sequences. To see this, we may assume that the first sequence is already ordered as $a_1\le\cdots\le a_n$, and observe that if $b_i>b_{i+1}$ for some $i$, then
$$(a_i-b_i)^2+(a_{i+1}-b_{i+1})^2 = (a_i-b_{i+1})^2+(a_{i+1}-b_i)^2+2(b_i-b_{i+1})(a_{i+1}-a_i) \ge (a_i-b_{i+1})^2+(a_{i+1}-b_i)^2,$$
which allows one to reorder the second sequence by successive transpositions without increasing the cost.
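For real sequences, the optimality of the monotone matching can be checked by brute force on a small example (an illustrative sketch only):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(3)
a, b = rng.standard_normal(5), rng.standard_normal(5)

# Brute force over all permutations versus the monotone (sorted) matching.
brute = min(np.sum((a - b[list(p)]) ** 2) for p in permutations(range(5)))
monotone = np.sum((np.sort(a) - np.sort(b)) ** 2)
print(np.isclose(brute, monotone))                               # True
```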
Singular values. The Hoffman-Wielandt inequality is also available for singular values of rectangular matrices. Recall that the singular value decomposition (SVD) of a rectangular matrix $A\in\mathcal{M}_{m,n}(\mathbb{C})$ provides a couple of unitary matrices $U$ and $V$ of sizes $m\times m$ and $n\times n$ such that the $m\times n$ matrix $D=U^*AV$ has zero off-diagonal entries and satisfies $D_{11}\ge\cdots\ge D_{rr}\ge0$ where $r=\min(m,n)$. The non-negative real numbers $D_{11},\ldots,D_{rr}$ are the singular values of $A$, denoted $s_1(A),\ldots,s_r(A)$. The largest singular value is nothing else but the operator norm $s_1(A)=\max_{\|x\|=1}\|Ax\|=\|A\|_{2\to2}$. Note that $\|A\|^2=\mathrm{Tr}(AA^*)=s_1^2(A)+\cdots+s_r^2(A)$. The matrices $\sqrt{AA^*}$ and $\sqrt{A^*A}$ have the same spectrum $s_1(A),\ldots,s_r(A)$ up to the multiplicity of $0$. There is also a nice linearization: the eigenvalues of the $(m+n)\times(m+n)$ Hermitian matrix
$$\tilde{A} = \begin{pmatrix} 0 & A\\ A^* & 0\end{pmatrix}$$
are given by $-s_1(A),\ldots,-s_r(A),0,\ldots,0,s_r(A),\ldots,s_1(A)$ where $0,\ldots,0$ is a sequence of $m+n-2r=\max(m,n)-\min(m,n)$ zeros. The matrices $\sqrt{AA^*}$ and $\tilde{A}$ are Hermitizations associated with the singular value decomposition.
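A quick numerical check of this linearization (a sketch assuming NumPy; the block matrix is the Hermitization $\tilde{A}$ described above):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 5
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

s = np.linalg.svd(A, compute_uv=False)         # s_1 >= ... >= s_r with r = min(m, n)
H = np.block([[np.zeros((m, m)), A],           # the Hermitization of A
              [A.conj().T, np.zeros((n, n))]])

spectrum = np.linalg.eigvalsh(H)               # ascending order
expected = np.sort(np.concatenate([-s, np.zeros(m + n - 2 * len(s)), s]))
print(np.allclose(spectrum, expected))         # True
```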
We are now ready to state the Hoffman-Wielandt inequality for singular values of rectangular matrices. If $A,B\in\mathcal{M}_{m,n}(\mathbb{C})$ have respective singular values $s_1(A)\ge\cdots\ge s_r(A)$ and $s_1(B)\ge\cdots\ge s_r(B)$, where $r=\min(m,n)$, then
$$\sum_{i=1}^r (s_i(A)-s_i(B))^2 = \min_{\sigma\in S_r}\sum_{i=1}^r (s_i(A)-s_{\sigma(i)}(B))^2 \le \|A-B\|^2.$$
This follows immediately from the Hoffman-Wielandt inequality for the $(m+n)\times(m+n)$ Hermitian matrices $\tilde{A}$ and $\tilde{B}$ obtained by the linear Hermitization introduced above: indeed $\|\tilde{A}-\tilde{B}\|^2=2\|A-B\|^2$, while the monotone matching of the symmetrized spectra costs $2\sum_{i=1}^r(s_i(A)-s_i(B))^2$.
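And a direct check of the singular value version on random rectangular matrices (again only an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 4, 6
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
B = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

sA = np.linalg.svd(A, compute_uv=False)        # decreasing order
sB = np.linalg.svd(B, compute_uv=False)
print(np.sum((sA - sB) ** 2) <= np.linalg.norm(A - B, 'fro') ** 2 + 1e-12)  # True
```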
Non-normal matrices. As noticed by Hoffman and Wielandt, the inequality may not hold for non-normal matrices. Namely, take for instance
$$A=\begin{pmatrix}0&1&0&\cdots&0\\0&0&1&\cdots&0\\\vdots&&&\ddots&\vdots\\0&0&0&\cdots&1\\0&0&0&\cdots&0\end{pmatrix}\quad\text{and}\quad B=\begin{pmatrix}0&1&0&\cdots&0\\0&0&1&\cdots&0\\\vdots&&&\ddots&\vdots\\0&0&0&\cdots&1\\\varepsilon_n&0&0&\cdots&0\end{pmatrix}$$
in $\mathcal{M}_{n,n}(\mathbb{R})$, where say $\varepsilon_n=1/n$. The matrix $A$ is nilpotent, and $A-B$ has rank one and small norm: $\|A-B\|^2=\varepsilon_n^2\to0$. On the other hand $\lambda_1(A)=\cdots=\lambda_n(A)=0$, while $B^n=\varepsilon_n I_n$, which gives $\lambda_k(B)=\varepsilon_n^{1/n}e^{2i\pi k/n}$ for all $1\le k\le n$, and therefore, for any $\sigma\in S_n$, $\sum_{k=1}^n|\lambda_k(A)-\lambda_{\sigma(k)}(B)|^2=n\varepsilon_n^{2/n}=ne^{-2\log(n)/n}\to\infty$, contradicting the inequality. Hoffman and Wielandt gave the $2\times2$ counterexample
$$A=\begin{pmatrix}0&0\\0&4\end{pmatrix}\quad\text{and}\quad B=\begin{pmatrix}-1&-1\\+1&+1\end{pmatrix},$$
where $A$ is normal, $B$ is not, $\|A-B\|^2=12$, and $\sum_{k=1}^2(\lambda_k(A)-\lambda_{\sigma(k)}(B))^2=16$ for every $\sigma\in S_2$.
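Both counterexamples are easy to reproduce numerically; here is the $2\times2$ one (a sketch reusing the assignment solver from above):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

A = np.array([[0.0, 0.0], [0.0, 4.0]])        # normal (diagonal), eigenvalues 0 and 4
B = np.array([[-1.0, -1.0], [1.0, 1.0]])      # non-normal, nilpotent: eigenvalues 0 and 0

la, lb = np.linalg.eigvals(A), np.linalg.eigvals(B)
cost = np.abs(la[:, None] - lb[None, :]) ** 2
best = cost[linear_sum_assignment(cost)].sum()

print(best, np.linalg.norm(A - B, 'fro') ** 2)  # about 16.0 versus 12.0: the inequality fails
```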
Infinite dimensions. The Hoffman-Wielandt inequality remains valid for infinite dimensional Hermitian or Hilbert-Schmidt operators, provided that the sequences of eigenvalues are defined in a suitable way. The proof can proceed by reduction to the finite dimensional case. We refer to the work of Bhatia and Elsner cited below for details and references.
p-norms. Set $p\ge1$. The Hoffman-Wielandt inequality for Hermitian matrices can be extended to $p$-norms. More precisely, it can be shown that for any Hermitian matrices $A,B\in\mathcal{M}_{n,n}(\mathbb{C})$ with respective eigenvalues $\lambda_1(A)\ge\cdots\ge\lambda_n(A)$ and $\lambda_1(B)\ge\cdots\ge\lambda_n(B)$, we have
$$\sum_{i=1}^n|\lambda_i(A)-\lambda_i(B)|^p\le\|A-B\|_p^p$$
where $\|\cdot\|_p=(s_1(\cdot)^p+\cdots+s_n(\cdot)^p)^{1/p}$ is the Schatten $p$-norm. This follows from a generalization by Lidskii and Wielandt of the Courant-Fischer minimax variational formulas to cumulative sums of ordered eigenvalues of Hermitian matrices, related to the concept of majorization. The extension beyond the Hermitian case to normal matrices is problematic, and we refer to the work of Bhatia and Elsner cited below for details and references.
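A numerical check of the Schatten $p$-norm version for Hermitian matrices (a sketch; the value $p=3$ is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 5, 3.0
A = rng.standard_normal((n, n)); A = (A + A.T) / 2       # Hermitian
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

lhs = np.sum(np.abs(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(B)) ** p)
schatten_p = np.sum(np.linalg.svd(A - B, compute_uv=False) ** p)   # ||A - B||_p^p
print(lhs <= schatten_p + 1e-10)                                   # True
```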
Related inequalities. There are plenty of inequalities on the variation of the spectrum under perturbation. For instance the Weyl interlacing inequalities state that if $A,B\in\mathcal{M}_{n,n}(\mathbb{C})$ are Hermitian with respective eigenvalues $\lambda_1(A)\ge\cdots\ge\lambda_n(A)$ and $\lambda_1(B)\ge\cdots\ge\lambda_n(B)$, and if $\lambda_1(A+B)\ge\cdots\ge\lambda_n(A+B)$ are the eigenvalues of $A+B$, then $\lambda_{i+j-1}(A+B)\le\lambda_i(A)+\lambda_j(B)$ for any $1\le i,j\le n$ with $i+j-1\le n$. These inequalities follow from the Courant-Fischer minimax variational formulas for the eigenvalues of Hermitian matrices. From Weyl's inequalities we get in particular that
$$\max_{1\le i\le n}|\lambda_i(A)-\lambda_i(B)| \le \|A-B\|_{2\to2}.$$
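Weyl's inequalities and this consequence can be checked numerically as follows (an illustrative sketch with random Hermitian matrices):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

lA  = np.linalg.eigvalsh(A)[::-1]       # decreasing order
lB  = np.linalg.eigvalsh(B)[::-1]
lAB = np.linalg.eigvalsh(A + B)[::-1]

# Weyl: lambda_{i+j-1}(A+B) <= lambda_i(A) + lambda_j(B) whenever i + j - 1 <= n.
weyl = all(lAB[i + j - 2] <= lA[i - 1] + lB[j - 1] + 1e-10
           for i in range(1, n + 1) for j in range(1, n + 1) if i + j - 1 <= n)
# Consequence: max_i |lambda_i(A) - lambda_i(B)| <= ||A - B||_{2->2}.
pert = np.max(np.abs(lA - lB)) <= np.linalg.norm(A - B, 2) + 1e-10
print(weyl, pert)                       # True True
```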
In the same spirit, and beyond Hermitian matrices, the Bauer-Fike inequality states that if $A\in\mathcal{M}_{n,n}(\mathbb{C})$ is diagonalizable, say $A=PDP^{-1}$ with $D$ diagonal, and if $B\in\mathcal{M}_{n,n}(\mathbb{C})$, then, for any eigenvalue $\mu$ of $B$, there exists an eigenvalue $\lambda$ of $A$ such that
$$|\lambda-\mu| \le \kappa(P)\,\|A-B\|_{2\to2}$$
where $\kappa(P)=\|P\|_{2\to2}\|P^{-1}\|_{2\to2}=\frac{s_1(P)}{s_n(P)}$ is the condition number of $P$. When $\mu$ is not an eigenvalue of $A$ (otherwise there is nothing to prove), it follows from
$$0 = \det(B-\mu I_n) = \det(D-\mu I_n)\,\det\big((D-\mu I_n)^{-1}P^{-1}(B-A)P + I_n\big).$$
Indeed, the second determinant must vanish, so that $\|(D-\mu I_n)^{-1}P^{-1}(B-A)P\|_{2\to2}\ge1$, while $\|(D-\mu I_n)^{-1}\|_{2\to2}=1/\min_i|\lambda_i(A)-\mu|$.
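Finally, a small check of the Bauer-Fike bound (a sketch; the matrix $A$ below is only generically diagonalizable, and the perturbation size 0.01 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5
A = rng.standard_normal((n, n))                 # generically diagonalizable
B = A + 0.01 * rng.standard_normal((n, n))

evals_A, P = np.linalg.eig(A)                   # A = P D P^{-1}
bound = np.linalg.cond(P, 2) * np.linalg.norm(A - B, 2)   # kappa(P) * ||A - B||_{2->2}

# Every eigenvalue of B lies within the Bauer-Fike bound of some eigenvalue of A.
ok = all(np.min(np.abs(evals_A - mu)) <= bound + 1e-12 for mu in np.linalg.eigvals(B))
print(ok)                                       # True
```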
Further reading. Here are some interesting references.
- Alan Hoffman and Helmut Wielandt. The variation of the spectrum of a normal matrix. Duke Math. J. 20, (1953). 37-39.
- Rajendra Bhatia and Ludwig Elsner. The Hoffman-Wielandt inequality in infinite dimensions. Proc. Indian Acad. Sci. Math. Sci. 104 (1994), no. 3, 483-494.
- Roger Horn and Charles Johnson. Matrix analysis. Second edition. Cambridge University Press, Cambridge, 2013. xviii+643 pp. ISBN: 978-0-521-54823-6
- Roger Horn and Charles Johnson. Topics in matrix analysis. Corrected reprint of the 1991 original. Cambridge University Press, Cambridge, 1994. viii+607 pp. ISBN: 0-521-46713-6
- Rajendra Bhatia. Positive definite matrices. [2015] paperback edition of the 2007 original. Princeton Series in Applied Mathematics. Princeton University Press, Princeton, NJ, 2007. ix+254 pp. ISBN: 978-0-691-16825-8; 978-0-400-82778-7
- Rajendra Bhatia. Matrix analysis. Graduate Texts in Mathematics, 169. Springer-Verlag, New York, 1997. xii+347 pp. ISBN: 0-387-94846-5
- Rajendra Bhatia. Perturbation bounds for matrix eigenvalues. Reprint of the 1987 original. Classics in Applied Mathematics, 53. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2007. xvi+191 pp. ISBN: 978-0-898716-31-3
- Tosio Kato. Perturbation theory for linear operators. Reprint of the 1980 edition. Classics in Mathematics. Springer-Verlag, Berlin, 1995. xxii+619 pp. ISBN: 3-540-58661-X
Note. Alan Hoffman (1924 - ) is also famous for the Hoffman-Singleton graph.
Note. Helmut Wielandt (1910 - 2001) devoted his doctoral thesis and habilitation to permutation groups. He wrote a book on finite permutation groups. From 1941 he did research on meteorology, cryptology and aerodynamics. In 1942 he was attached to the Kaiser Wilhelm Institute and the Aerodynamics Research Institute at Göttingen. He said: "... I had to work on vibration problems. I am indebted to that time for valuable discoveries: on the one hand the applicability of abstract tools to the solution of concrete problems, on the other hand, the - for a pure mathematician - unexpected difficulty and unaccustomed responsibility of numerical evaluation. It was a matter of estimating eigenvalues of non-self-adjoint differential equations and matrices." Source: MacTutor.
I'd like to know something about analogous results for the trace norm or the Schatten p-norm.
I've just added some extensions and comments, including something about p-norm versions.