# Month: February 2012

Curieusement, les universitaires français parlent relativement peu de l’accord commercial anti contrefaçon ACTA signé par plusieurs États récemment.  Il s’agit pourtant d’un grand danger, pesant, au nom du profit, sur toute la production culturelle non marchande. Les marchands n’ont jamais été aussi puissants, et ils ne cessent d’imposer leurs dogmes aux démocraties.

À travers les âges, les marchands ont joué un rôle de passeurs entre les peuples, favorisant la diffusion de tout ce qui fait les cultures humaines, religions comprises. Mais depuis la fin du siècle dernier, l’apparition d’Internet et des outils de télécommunication numériques a permis un contact plus direct inédit entre les producteurs de culture et d’information et leurs récepteurs. Cette révolution électronique menace certains marchands comme les éditeurs, mais en suscite d’autres comme les fournisseurs d’accès. Le rôle culturel des marchands est moins important aujourd’hui, et leurs pratiques en deviennent socialement  nuisibles.

C’est dans ce contexte que Richard Matthew Stallman (rms) a visité la France récemment. Son discours est toujours aussi absolu, polarisant. Sa pensée libertaire de gauche bien mûrie fait parfois sourire mais ne manque pas de charme. Pour rms, les œuvres numérisées gagnent à être copiées, pourvu que le nom du créateur soit conservé. Selon lui, les créateurs devraient être rémunérés directement par ceux qui apprécient leurs œuvres, et c’est ceux qui donnent qui décident combien donner(*). La copie permet la grande diffusion, et le réseau permet la rémunération directe.  Internet permet de développer cette liaison directe entre les créateurs et leur public, tandis que des marchands, rendus  inutiles, font tout pour tuer dans l’œuf cette révolution. D’autres marchands, plus avisés, tentent de construire leur activité lucrative autour de cette révolution, avec tous les excès dont la cupidité est responsable.

Il va sans dire que rms considère que les pratiques des sociétés comme Apple, Facebook, Google, et autres very-big-brothers numériques sont néfastes. Cependant, Google par exemple est contre ACTA et participe au développement du logiciel libre (noyau Linux, Python, …).

(*) c’est par exemple le mode de financement de Wikipédia, un grand succès d’Internet.

Many mathematicians believe that in concrete numerical applications such as image, sound, and video compression, wavelets have superseded the more classical Fourier transform. This is not really the case, unfortunately, for various reasons, ranging from non competitive implementations to poor standardizations. A symptomatic example is given by the JPEG image format for lossy compression. Our .jpg or .jpeg files are based on the following algorithm: the image is partitioned into square blocs, and on each bloc, a spatial discrete Fourier transform is computed (actually a Discrete Cosine Transform, DCT), a frequency filter is then applied to kill hardly visible frequencies (this is the lossy part), and finally a lossless algorithm (Huffman) is used to compress the result. Regarding sound compression, our .mp3 music files (actually MPEG-1 MPEG-2 audio layer 3) are based on similar ideas. At the end of the past century, a new JPEG format called JPEG2000 was designed, involving an algorithm in which the discrete Fourier transform is replaced by a discrete wavelet transform and in which the lossless final step is performed with more efficient entropic coding. JPEG2000 is not the sole image format based on wavelets, and one can mention the more specialized ECW format for instance. Sadly, JPEG2000 has not reached the expected success, and very few people are using .jp2|.jpx files.

Regarding wavelets for video compression, the situation is not really better. One may read the blog post dated 2010-02-26 by Jason Garrett-Glaser, an x264 developer. Recall that x264 is a free software library used in the VideoLAN projet (VLC media player!) to encode H.264 video streams (MPEG-4 AVC). In particular, he ends his post by saying:

« JPEG2000 is a classic example of wavelet failure: despite having more advanced entropy coding, being designed much later than JPEG, being much more computationally intensive, and having much better PSNR, comparisons have consistently shown it to be visually worse than JPEG at sane filesizes.  By comparison, H.264′s intra coding, when used for still image compression, can beat JPEG by a factor of 2 or more (I’ll make a post on this later).  With the various advancements in DCT intra coding since H.264, I suspect that a state-of-the-art DCT compressor could win by an even larger factor. Despite the promised benefits of wavelets, a wavelet encoder even close to competitive with x264 has yet to be created.  With some tests even showing Dirac losing to Theora in visual comparisons, it’s clear that many problems remain to be solved before wavelets can eliminate the ugliness of block-based transforms once and for all. »

This shows the long path from seductive theory to concrete applications. We will see in the forthcoming decades if the wavelets paradigm will beat optimized Fourier transforms. Data compression is a rapidly evolving and quite active domain lying between applied mathematics, computer science, and information technology. Finally, the described phenomenon is quite common in applied mathematics, and is not specific to wavelets. Other instances are for example Model Selection in Statistics and Compressive Sensing in Information Technology.

Note: Every practitioner knows that comparing algorithms on datasets is a tricky task. You may also read How to cheat on video encoder comparisons, by J. Garrett-Glaser, dated 2010-06-21.

I had the pleasure to upload recently on arXiv and on HAL a collaborative work with Charles Bordenave and Pietro Caputo, entitled Spectrum of Markov generators on sparse random graphs.

Let ${X=(X_{ij})_{1\leq i,j\leq n}}$ be a random matrix in ${\mathbb{C}}$ whose entries are i.i.d. with mean ${m}$, covariance matrix ${K=\mathrm{Cov}(\Re X_{11},\Im X_{11})}$, and variance ${\mathrm{Tr}(K)=1}$. The sparse regime is obtained by allowing the law of ${X_{11}}$ to depend on ${n}$ with ${\mathrm{Tr}(K)\rightarrow0}$ as ${n\rightarrow\infty}$, the basic example being the adjacency matrix of Erdös-Rényi random graphs. The analysis of the sparse regime was promoted by Charles Bordenave, who has a pretty Hungarian mathematical soul. However, to simplify the exposition, this blog post restricts to the non sparse regime, which captures most of the rigid algebraic-geometric structure: we thus assume that the law of ${X_{11}}$ does not depend on ${n}$. We consider the random matrix defined by

$L=X-D$

where ${D}$ is the diagonal matrix obtained from the row sums of ${X}$, namely ${D_{ii}=\sum_{k=1}^nX_{ik}}$. If ${X}$ is interpreted as the adjacency matrix of a weighted oriented graph, then ${L}$ is the associated Laplacian matrix, with zero row sums. In particular, if the weights ${X_{ij}}$ take values in ${[0,\infty)}$, then ${L}$ is the infinitesimal generator of the continuous time random walk on that graph, and properties of the spectrum of ${L}$ can be used to study its long-time behavior. Clearly, ${L}$ has non independent entries but independent rows. A related model is obtained by considering the stochastic matrix ${P=D^{-1}X}$, which corresponds to discrete time random walk (considered in arXiv:0808.1502). In order to analyze the spectrum of ${L}$, it is more convenient to introduce the following affine transformation of ${L}$:

$M =\frac{L+nmI}{\sqrt{n}} =\frac{X}{\sqrt{n}}-\frac{D-nmI}{\sqrt{n}}.$

By the central limit theorem, the distribution of ${n^{-1/2}(D_{ii}-nm)}$ converges to the Gaussian law ${\mathcal{N}(0,K)}$. Combined with the circular law for ${n^{-1/2}X}$, this suggests the interpretation of the spectral distribution of ${M}$, in the limit ${n\rightarrow\infty}$, as an additive Gaussian deformation of the circular law. In a sense, our model is a non-Hermitian analogue of a model already studied by Bryc, Dembo, and Jiang years ago with the method of moments.

Basic notations and concepts. Recall that if ${A}$ is an ${n\times n}$ matrix, we denote by ${\lambda_1(A),\ldots,\lambda_n(A)}$ its eigenvalues, i.e. the roots in ${\mathbb{C}}$ of its characteristic polynomial. We label them in such a way that ${|\lambda_1(A)|\geq\cdots\geq|\lambda_n(A)|}$. We denote by ${s_1(A),\ldots,s_n(A)}$ the singular values of ${A}$, i.e. the eigenvalues of the Hermitian positive semidefinite matrix ${| A |= \sqrt{A^*A}}$, labeled so that ${s_1(A)\geq\cdots \geq s_n(A) \geq 0}$. The operator norm of ${A}$ is ${\| A \|= s_1(A)}$ while the spectral radius is ${|\lambda_1(A)|}$. We define the discrete probability measures

$\mu_A=\frac{1}{n}\sum_{k=1}^n\delta_{\lambda_k(A)} \quad\text{and}\quad \nu_A=\mu_{|A|} =\frac{1}{n}\sum_{k=1}^n\delta_{s_k(A)}.$

In the sequel, ${G}$ is a Gaussian random variable on ${\mathbb{R}^2\cong\mathbb{C}}$ with law ${\mathcal{N}(0,K)}$ i.e. mean ${0}$ and covariance matrix ${K}$. This law has a Lebesgue density on ${\mathbb{R}^2}$ if and only if ${K}$ is invertible, given by ${z=(x,y)\mapsto(2\pi\sqrt{\det(K)})^{-1}\exp(-\frac{1}{2}<(x,y)^{\top}K^{-1}(x,y)>)}$. Note that ${K}$ is not invertible when ${X_{11}}$ is supported in ${\mathbb{R}}$.

Singular values of shifts. Our first result concerns the singular values of shifts of the matrix ${M}$, a useful proxy to the eigenvalues. It states that for every ${z\in\mathbb{C}}$, there exists a probability measure ${\nu_z}$ on ${\mathbb{R}_+}$ which depends only on ${z}$ and ${K}$ such that with probability one,

$\nu_{M-zI} \underset{n\rightarrow\infty}{\longrightarrow} \nu_z.$

Moreover, the limiting law ${\nu_z}$ is characterized as follows: its symmetrization ${\check \nu_z}$ is the unique symmetric probability measure on ${\mathbb{R}}$ with Cauchy-Stieltjes transform satisfying, for every ${\eta\in\mathbb{C}_+=\{z\in\mathbb{C}:\Im(z)>0\}}$,

$S_{\check \nu_z}(\eta) =\int_{\mathbb{C}}\!\frac{1}{z-\eta}\,d\check\nu(z) =\mathbb{E}\left(\frac{S_{\check \nu_z}(\eta)+\eta } {|G-z|^2-(\eta+ S_{\check \nu_z}(\eta))^2}\right).$

It is in fact classical to express the Cauchy-Stieltjes transform of the limiting singular values distribution as a fixed point of a non linear equation, which comes from a recursion on the trace of the resolvent exploiting the recursive structure of the model. The real difficulty in the proof of the result above lies in the fact the entries of ${L}$ are dependent (but asymptotically independent).

Eigenvalues convergence. The next result concerns the eigenvalues of ${M}$:

$\mu_{M} \underset{n\rightarrow\infty}{\longrightarrow} \mu$

where ${\mu}$ is the probability measure on ${\mathbb{C}}$ defined by

$\mu=\frac{1}{2\pi}\Delta\int_0^\infty\!\log(t)\,d\nu_z(t),$

where the Laplacian ${\Delta=\partial_z\partial_{\overline z}=\partial_x^2+\partial_y^2}$ is taken in the sense of Schwartz-Sobolev distributions in the space ${\mathcal{D}'(\mathbb{R}^2)}$. The limiting distribution ${\mu}$ is independent of the mean ${m}$ of ${X_{11}}$, and this is rather natural since shifting the entries produces a deterministic rank one perturbation. As in other known circumstances, a rank one additive perturbation produces essentially a single outlier, and therefore does not affect the limiting spectral distribution. Our proof of the convergence to ${\mu}$ is inspired from the logarithmic potential approach developed by Tao and Vu for the standard circular law (see also arXiv:1109.3343). As usual, the main difficulty lies in the control of the small singular values of shifts ${M-zI}$, in particular the norm of the resolvent. We solve this difficulty by using essentially the techniques developed by Rudelson and Vershynin and others.

Rigid analysis of the limit. To obtain further properties of ${\mu}$, we turn to a flavor of free probability extended to possibly unbounded operators to interpret ${n^{-1/2}(D-nmI)}$ as ${n\rightarrow\infty}$. Following Brown and Haagerup and Schultz, one can define a large operator ${\star}$-algebra in which each element ${a}$ has a Brown spectral measure denoted ${\mu_a}$, that is the probability measure on ${\mathbb{C}}$ given by

$\mu_a=\Delta\int_0^\infty\!\log(s)\,d\nu_{|a-z|}(s)$

where ${|b|=\sqrt{bb^*}}$ (i.e. the square root of the self-adjoint operator ${bb^*}$). If ${a}$ is normal (i.e. ${aa^*=a^*a}$) then its Brown measure ${\mu_a}$ coincides with its usual spectral measure. Now let ${c}$ and ${g}$ be ${\star}$-free operators with ${c}$ circular, and ${g}$ normal (i.e. ${gg^*=g^*g}$) with spectral measure equal to the Gaussian law ${\mathcal{N}(0,K)}$. Then we are able to show that

$\nu_z = \mu_{|c + g – z|} \quad \text{and} \quad \mu = \mu_{c+g}.$

Having identified the limit law ${\mu}$, we obtain some additional information on it. Namely, we use the concept of subordination developed by Biane and Voiculescu, which allows to show that the support of ${\mu}$ is given by

$\mathrm{Supp}(\mu) = \left\{z \in \mathbb{C} : \mathbb{E}\left(\frac{1}{|G-z|^2}\right)\geq 1\right\}.$

Moreover, there exists a unique function ${f : \mathrm{Supp} (\mu) \rightarrow [0,1]}$, which is ${C^\infty}$ in the interior of ${\mathrm{Supp} (\mu)}$, such that for all ${z\in \mathrm{Supp} (\mu)}$,

$\mathbb{E}\left[\frac{1}{|G-z|^2 + f(z)^2}\right]=1.$

Moreover ${\mu}$ is absolutely continuous in ${\mathbb{C}}$ with density given by

$z\mapsto \frac{1}{\pi} f(z)^2 \mathbb{E}\left[\Phi(G,z)\right] + \frac{1}{\pi} \frac{\left[\mathbb{E}\left[(G-z)\Phi(G,z)\right]\right|^2}{\mathbb{E}\left[\Phi(G,z)\right]}$

where

$\Phi(w,z):=\frac{1}{(|w-z|^2 + f(z)^2)^2}.$

It can be seen that ${\mu}$ is rotationally invariant when ${K}$ is a multiple of the identity, while this is not the case if ${X_{11}}$ is supported in ${\mathbb{R}}$, in which case ${K_{22}=K_{12}=K_{21}=0}$ (in this case ${G}$ does not have a density on ${\mathbb{C}}$ since ${K}$ is not invertible). Note also that the support of ${\mu}$ is unbounded since it contains the support of ${\mathcal{N}(0,K)}$, and thus ${\mathrm{Supp}(\mu)=\mathbb{C}}$ if ${K}$ is invertible. If ${K}$ is not invertible, it can be checked that the boundary of ${\mathrm{Supp}(\mu)}$ is

$\left\{z \in \mathbb{C} : \mathbb{E}\left(\frac{1}{|G-z|^2}\right) = 1\right\}$

On this set, ${f(z) = 0}$, but the formula for the density above shows that the density does not vanish there. This phenomenon, not unusual for Brown measures, occurs for the circular law and more generally for ${R}$-diagonal operators, see Haagerup and Larsen. Our formula above for the density is slightly more explicit than the formulas given in Biane and Lehner. The subordination formula that we use can also be used to compute more general Brown measures of the form ${\mu_{a + c}}$ with ${a, c}$ ${\star}$-free and ${c}$ circular.

Spectrum localization. The convergence of ${\mu_M}$ suggests that the bulk of the spectrum of ${L}$ is concentrated around the value ${-mn}$ in a two dimensional window of width ${\sqrt{n}}$. Actually, it is possible to localize more precisely the support of the spectrum, by controlling the extremal eigenvalues of ${L}$. Recall that ${L}$ has always the trivial eigenvalue ${0}$. We define for convenience the centered matrices ${\underline X = X-\mathbb{E}X}$ and ${\underline D=D-\mathbb{E}D}$ and ${\underline L=L-\mathbb{E}L=\underline X-\underline D}$. If ${J}$ stands for the ${n\times n}$ matrix with all entries equal to ${1}$, then

$\mathbb{E}L = L-\underline L= mJ – mnI.$

Now the idea is that if ${\mathbb{E}(|X_{11}^4|)<\infty}$ then by Bai and Yin theorem, the operator norm of ${\underline X}$ is ${\sqrt{n}\,(2+o(1))}$. On the other hand, from the central limit theorem one expects that the operator norm and the spectral radius of the diagonal matrix ${\underline D}$ are of order ${\sqrt{2n\log(n)}\,(1+o(1))}$ as for maximum of i.i.d. Gaussian random variables. We show indeed that if ${X_{11}}$ is supported in ${\mathbb{R}_+}$ and if ${\mathbb{E}(|X_{11}|^4)<\infty}$ then with probability one, for ${n \gg1}$, every eigenvalue ${ \underline \lambda}$ of ${\underline L}$ satisfies

$|\Re \underline \lambda| \leq \sqrt{2 n\log(n)}\,(1+o(1)) \quad\text{and}\quad |\Im \underline \lambda| \leq \sqrt{n}(2+o(1)) .$

Moreover, with probability one, for ${n \gg 1}$, every eigenvalue ${\lambda\neq 0}$ of ${L}$ satisfies

$|\Re \lambda + m n| \leq \sqrt{2 n \log(n)}\,(1+o(1)) \quad\text{and}\quad |\Im \lambda| \leq \sqrt{n}(2+o(1)).$

Our proof is simple and relies on classical perturbative methods in matrix analysis: refined Gershgorin and the Bauer-Fike theorems. If one defines a spectral gap ${\kappa}$ of the Markov generator ${L}$ as the minimum of ${|\Re \lambda|}$ for ${\lambda\neq 0}$ in the spectrum of ${L}$, then it follows that a.s.

$\kappa\geq mn – \sqrt{2n\log(n)}\,(1+o(1)).$

Invariant measure. We turn to the properties of the invariant measure of ${L}$. If ${X_{11}}$ is supported in ${\mathbb{R}_+}$ and if ${L}$ is irreducible, then from the Perron-Frobenius theorem, the kernel of ${L}$ has dimension ${1}$ and there is a unique vector ${\Pi\in(0,1)^n}$ such that ${L^\top\Pi = 0}$ and ${\sum_{i=1}^n\Pi_i =1}$. The vector ${\Pi}$ is the invariant measure of the Markov process with infinitesimal generator ${L}$. Actually, we show that a.s. for ${n \gg 1}$, the Markov generator ${L}$ is irreducible and

$\left\Vert\Pi – U_n\right\Vert_1 = \mathcal{O} \left(\sqrt{\frac{\log(n)}{n}}\right)=o(1).$

where ${U_n = \frac1n(1 ,\ldots,1)^\top}$ is the uniform probability distribution on the finite set ${\{1, \ldots , n\}}$ and where ${\left\Vert\cdot\right\Vert_1}$ is the total variation norm. Our proof relies on the remarkable Sylvester determinant theorem which states that if ${A\in\mathcal{M}_{p,q}}$ and ${B\in\mathcal{M}_{q,p}}$ are two rectangular matrices with swapped dimensions then

$\det(I_p+AB)=\det(I_q+BA).$

To understand it, recall that ${AB}$ and ${BA}$ have the same spectrum up to the multiplicity of the eigenvalue zero, and thus, their characteristic polynomials in ${z}$ are identical up to a multiplication by a power of ${z}$, which gives the result taking ${z=1}$. In particular, this formula allows to pass from a high dimensional problem to a one dimensional problem.

Interpolation. Our results for the matrix ${L=X-D}$ can be extended with minor modifications to the case of the matrix ${L_{(t)}=X-tD}$, where ${t\in\mathbb{R}}$ provided the law ${\mathcal{N}(0,K)}$ characterizing our limiting spectral distributions is replaced by ${\mathcal{N}(0,t^2K)}$. This gives back the circular law for ${t=0}$. One may also interpolate between the Gaussian and the circular laws by considering ${(1-t)X-tD}$ with ${t\in[0,1]}$ or other parametrizations.

Open problems.

• Almost sure convergence. The mode of convergence of spectral distributions is the weak convergence in probability. We believe that this can be upgraded to almost sure weak convergence, but this requires stronger bounds on the smallest singular values of ${M-zI}$ (i.e. norm of the resolvent). This is not a problem if ${X_{11}}$ has a bounded density.
• Sparsity. We are able to show that the results remain essentially available in the sparse case in which the law of ${X_{11}}$ depends on ${n}$. However, our treatment is not optimal. Note that an optimal answer in the sparse case is still pending even for the circular law model.
• Heavy tails. A different model for random Markov generators is obtained when the law of ${X_{11}}$ has heavy tails, with e.g. infinite first moment. In this context, we refer to arXiv:1006.1713 and to arXiv:1109.3343 for the spectral analysis of non-Hermitian matrices with i.i.d. entries, and to arXiv:0903.3528 for the case of reversible Markov transition matrices. It is natural to expect that, in contrast with the cases considered here, there is no asymptotic independence of the matrices ${X}$ and ${D}$ in the heavy tailed case.
• Spectral edge and spectral gap. Concerning the localization of the spectrum, it seems natural to conjecture the asymptotic behavior ${\kappa= mn – \sqrt{2n\log(n)}\,(1+o(1))}$ for the spectral gap, but we do not have a proof of the corresponding upper bound. In the same spirit, we believe that with probability one, with ${\underline L=L-\mathbb{E}L}$,

$\lim_{n\rightarrow\infty} \frac{s_1(\underline L)}{\sqrt{2 n\log(n)}} =\lim_{n\rightarrow\infty}\frac{|\lambda_1(\underline L)|}{\sqrt{2 n\log(n)}}=1,$

which contrasts with the behavior of ${\underline X}$ for which ${s_1/|\lambda_1|\rightarrow2}$ as ${n\rightarrow\infty}$.

In principle, a mathematician can communicate to the others his own work using solely public free access Internet academic sites (either personal or global). A famous example is the work of the Fields Medalist Grisha Perelman on Thurston’s geometrization conjecture published on arXiv.org. Even if arXiv.org is moderated, it does not provide a peer reviewing process. For historical and sociological reasons, most professional mathematicians publish their results in peer reviewed mathematical journals. Many of these journals are handled by capitalistic companies such as Elsevier, Springer, Taylor and Francis, etc. The aim of these companies is to make profit, and some of them are not willing to make knowledge freely accessible. Basically, the mathematical publications rely on three pillars: science, money, and human comedy:

• Science. Mathematicians use peer reviewing in order to improve science quality and to reduce pollution. Peer reviewing is the first step in the long transformation of live mathematics into “dead” mathematics which can be considered as true and certified. This does not mean that peer reviewing is good for everything. Mathematicians need also media and space to discuss ideas. Internet has good tools for such live interactions, such as MathOverflow, a sort of modern version of the venerable sci.math usenet newsgroup. Actually, MathOverflow implements some kind of peer rating, while keeping freedom.
• Money. Everything has a cost. Non free journals are actually paied by readers via their institution. Freely accessible journals or sites have also a cost, but the circuit of money is different. For instance, the cost of arXiv.org is supported mostly by North American and European Universities and Research Centers. Other circuits of money are possible. For instance, some journals are freely accessible but the authors have to pay at publication time (some private companies such as Springer propose this scheme to the authors of non free journals in order to make their papers freely accessible). Many prestigious journals are handled by capitalistic companies, asking for high prices to make profit. On the other hand, being handled by an academic institution does not guarantee a low price.
• Human comedy. Mathematicians are first of all humans, members of an academic community. The academic community needs a way to measure the merits of each individual in order to decide recognition (and thus promotion and funding). In general, recognition is gained by obtaining good or influential results, published in respected journals. Publishing in respected journals can be seen as a way to obtain a label of quality, a sort of delegation of evaluation. The respect or prestige of a journal comes from its history, its editorial board, and its content. The editorial and peer reviewing job of most peer reviewed mathematical journals (free or not) is done for free by mathematicians (they gain social recognition).

Many aspects are not specific to mathematics. Ideally, one can imagine the introduction of editorial labels in arXiv.org, attributed by editorial boards to papers at author request and after peer reviewing (these boards may leave completely capitalistic publishers). One can also imagine to make arXiv.org less static by adding on top of it a dynamic collaborative stack-exchange structure such as MathOverflow, allowing the discussion of each paper. These ideas are floating around for a while. Unfortunately, discussing these matters with many colleagues and with the Editors of certain leading journals reveals that the mathematical community is a bit conservative. However, it is remarkable that some few scientific leaders such as the Fields Medalist Tim Gowers are promoting such a revolution.