{"id":21403,"date":"2025-02-14T17:47:13","date_gmt":"2025-02-14T16:47:13","guid":{"rendered":"https:\/\/djalil.chafai.net\/blog\/?p=21403"},"modified":"2025-03-15T09:25:51","modified_gmt":"2025-03-15T08:25:51","slug":"back-to-basics-johnson-lindenstrauss-lemma","status":"publish","type":"post","link":"https:\/\/djalil.chafai.net\/blog\/2025\/02\/14\/back-to-basics-johnson-lindenstrauss-lemma\/","title":{"rendered":"Back to basics : Johnson-Lindenstrauss lemma"},"content":{"rendered":"<figure id=\"attachment_21404\" aria-describedby=\"caption-attachment-21404\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/Joram_Lindenstrauss\"><img loading=\"lazy\" src=\"http:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2025\/02\/J.Lindenstrauss-300x205.jpeg\" alt=\"Photo of Joram Lindenstrauss (1936 - 2012)\" width=\"300\" height=\"205\" class=\"size-medium wp-image-21404\" srcset=\"https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2025\/02\/J.Lindenstrauss-300x205.jpeg 300w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2025\/02\/J.Lindenstrauss.jpeg 400w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-21404\" class=\"wp-caption-text\">Joram Lindenstrauss (1936 - 2012)<\/figcaption><\/figure>\n<p style=\"text-align:justify;\">This short post is about the Johnson-Lindenstrauss lemma for dimension reduction. It is an emblematic basic result of high-dimensional probability or asymptotic geometric analysis. It provides a low dimensional quasi-isometric embedding. Its proof involves a random matrix and the concentration of measure phenomenon.<\/p>\n<p style=\"text-align:justify;\"><strong>Johnson-Lindenstrauss lemma for dimension reduction.<\/strong> Let us denote by $|v|$ the Euclidean norm of $v$. 
For all $n,N\geq1$, all finite $S\subset\mathbb{R}^n$ of cardinality $N$, all $0 < \varepsilon&lt;1$, and all $k > \frac{24}{\varepsilon^2}\log(N)$, there exists a $k\times n$ real matrix $A$ such that for all $x,y\in S$, $x\neq y$, \[   \sqrt{1-\varepsilon}\leq\frac{|Ax-Ay|}{|x-y|}\leq\sqrt{1+\varepsilon}. \] In other words, $N$ points of $\mathbb{R}^n$ can be embedded quasi-isometrically into $\mathbb{R}^{O(\log(N))}$, a dimension that does not depend on $n$.<\/p>\n<p style=\"text-align:justify;\"><strong>Proof.<\/strong> The desired property can be rewritten as \[   ||A(x-y)|^2-|x-y|^2|\leq\varepsilon|x-y|^2. \] If $R_1,\ldots,R_k$ are the rows of $\sqrt{k}A$, then \[   |A(x-y)|^2=\frac{1}{k}\sum_{i=1}^k\langle R_i,x-y\rangle^2. \] The idea now is that if $R_1,\ldots,R_k$ are taken at random, i.i.d. of law $\mathcal{N}(0,I_n)$, then $\langle R_i,x-y\rangle\sim\mathcal{N}(0,|x-y|^2)$, making $|A(x-y)|^2$ an average of $k$ i.i.d. random variables, each distributed as $|x-y|^2$ times a $\chi^2$ with one degree of freedom, hence of mean $|x-y|^2$. This suggests using the concentration of measure phenomenon around the mean $|x-y|^2$, for small deviations up to $\pm\varepsilon|x-y|^2$.<\/p>\n<p style=\"text-align:justify;\">Let $Z_1,\ldots,Z_k$ be i.i.d. of law $\mathcal{N}(0,\sigma^2)$. For all $\theta < \frac{1}{2\sigma^2}$, \[   \mathbb{E}(\mathrm{e}^{\theta(Z_i^2-\sigma^2)})   =\frac{\mathrm{e}^{-\theta\sigma^2}}{\sqrt{2\pi}\sigma}   \int\mathrm{e}^{\theta u^2-\frac{u^2}{2\sigma^2}}\mathrm{d}u   =\frac{\mathrm{e}^{-\theta\sigma^2}}{\sqrt{1-2\theta\sigma^2}}. \] The Markov inequality gives, for all $\varepsilon > 0$ and all $0 < \theta<\frac{1}{2\sigma^2}$, \[   \mathbb{P}\Bigl(\frac{1}{k}\sum_{i=1}^k(Z_i^2-\sigma^2)\geq \varepsilon\sigma^2\Bigr)   \leq\mathrm{e}^{-k(\theta\sigma^2(\varepsilon+1)+\log\sqrt{1-2\theta\sigma^2})}. 
\] Taking $\theta=\frac{\varepsilon}{2\sigma^2(1+\varepsilon)} < \frac{1}{2\sigma^2}$ and using $\varepsilon-\log(1+\varepsilon)\geq\frac{\varepsilon^2}{4}$ for $0 < \varepsilon&lt;1$ gives \[   \mathbb{P}\Bigl(\frac{1}{k}\sum_{i=1}^k(Z_i^2-\sigma^2)\geq \varepsilon\sigma^2\Bigr)   \leq\mathrm{e}^{-\frac{k}{8}\varepsilon^2}. \] The same method applied to $-(Z_i^2-\sigma^2)$, this time with $\theta=\frac{\varepsilon}{2\sigma^2(1-\varepsilon)}$ and $-\varepsilon-\log(1-\varepsilon)\geq\frac{\varepsilon^2}{4}$ for $0 < \varepsilon&lt;1$, gives \[   \mathbb{P}\Bigl(-\frac{1}{k}\sum_{i=1}^k(Z_i^2-\sigma^2)\geq \varepsilon\sigma^2\Bigr)   \leq\mathrm{e}^{-\frac{k}{8}\varepsilon^2}. \] Finally, the union bound gives the following $\sigma$-free upper bound, for all $0 < \varepsilon&lt;1$: \[   \mathbb{P}\Bigl(\bigl|\frac{1}{k}\sum_{i=1}^kZ_i^2-\sigma^2\bigr|\geq\varepsilon\sigma^2\Bigr)   \leq2\mathrm{e}^{-\frac{k}{8}\varepsilon^2}. \] Now recall that $R_1,\ldots,R_k$ are i.i.d. $\mathcal{N}(0,I_n)$. Thus, for all $x\in\mathbb{R}^n$, $|Ax|^2=\frac{1}{k}\sum_{i=1}^k\langle R_i,x\rangle^2$ has the same law as $\frac{1}{k}\sum_{i=1}^kZ_i^2$ with $\sigma^2=|x|^2$, hence, for all $0 < \varepsilon&lt;1$, \[   \mathbb{P}\Bigl(|\Delta(x)|\geq\varepsilon|x|^2\Bigr)   \leq2\mathrm{e}^{-\frac{k}{8}\varepsilon^2}   \quad\text{where}\quad   \Delta(x):=|Ax|^2-|x|^2. \] This bound is $x$-free. Thus, by the union bound, for all $0 < \varepsilon&lt;1$, \begin{align*}   \mathbb{P}\bigl(\exists x,y\in S:|\Delta(x-y)|>\varepsilon|x-y|^2\bigr)   &\leq N^2\max_{x,y\in S}\mathbb{P}\bigl(|\Delta(x-y)|>\varepsilon|x-y|^2\bigr)\\   &\leq2N^2\mathrm{e}^{-\frac{k}{8}\varepsilon^2}. \end{align*} If $k > \frac{8\log(2N^2)}{\varepsilon^2}$, then this bound is strictly smaller than $1$, hence there exists a realization $A(\omega)$ of $A$ such that $||A(x-y)|^2-|x-y|^2|\leq\varepsilon|x-y|^2$ for all $x,y\in S$. Since $8\log(2N^2)=16\log(N)+8\log(2)\leq24\log(N)$ as soon as $N\geq2$, the condition $k > \frac{24}{\varepsilon^2}\log(N)$ of the statement suffices, the case $N=1$ being trivial. 
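The argument above is easy to try numerically. Here is a minimal Python sketch, assuming NumPy is available; the helper names <code>jl_embed<\/code> and <code>max_distortion<\/code> are illustrative, not from the original. It draws the Gaussian matrix $A$, measures the worst pairwise distortion over $S$, and redraws until the bound holds.

```python
import numpy as np

def max_distortion(S, T):
    """Largest relative error | |Ax-Ay|^2 / |x-y|^2 - 1 | over all pairs
    of distinct points, for original points S and embedded points T."""
    D = np.sum((S[:, None, :] - S[None, :, :]) ** 2, axis=-1)  # |x-y|^2
    E = np.sum((T[:, None, :] - T[None, :, :]) ** 2, axis=-1)  # |Ax-Ay|^2
    iu = np.triu_indices(len(S), k=1)  # each unordered pair once
    return np.max(np.abs(E[iu] / D[iu] - 1.0))

def jl_embed(S, eps, seed=None):
    """Monte Carlo version of the lemma: draw A with i.i.d. N(0, 1/k)
    entries (so the rows of sqrt(k) A are N(0, I_n)) and redraw until
    the distortion bound eps holds on the point set S (N rows in R^n)."""
    rng = np.random.default_rng(seed)
    N, n = S.shape
    k = int(np.ceil(24 * np.log(N) / eps**2))  # dimension from the lemma
    while True:
        A = rng.standard_normal((k, n)) / np.sqrt(k)
        T = S @ A.T  # embedded points, rows Ax for x in S
        if max_distortion(S, T) <= eps:
            return A, T
```

For instance, $N=30$ points in $\mathbb{R}^{500}$ with $\varepsilon=1\/2$ land in dimension $k=327 < 500$ while all pairwise distances are preserved within the factors $\sqrt{1\pm\varepsilon}$.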
This probabilistic method becomes constructive via Monte Carlo: draw $A$ at random, check the distortion condition on $S$, and redraw until it holds. The probability of success at each draw, at least $1-2N^2\mathrm{e}^{-\frac{k}{8}\varepsilon^2}$, decreases when $\varepsilon$ or $k$ decreases, or when $N$ increases, as expected.<\/p>\n<p style=\"text-align:justify;\"><strong>Comments.<\/strong> What matters is the concentration of $\langle R_i,x\rangle^2$ around its mean $|x|^2$, for small deviations on both sides of the mean. The result remains true, up to the value of the constant, if we replace the Gaussian law $\mathcal{N}(0,1)$ by the uniform law on $\{\pm1\}$ (Rademacher law) for the entries of $\sqrt{k}A$. This can be written in terms of $\psi$-norms, see [CGLP, Lemma 1.3.1 p. 34].<\/p>\n<p style=\"text-align:justify;\">The rows of $A$ can also be chosen uniformly distributed on the sphere, which is the historical choice made in [JL]. Recent extensions are presented in [PV].<\/p>\n<p style=\"text-align:justify;\">The matrix $A$ does not depend on the data $S$, contrary to other dimension reduction methods such as Principal Component Analysis (PCA) or Singular Value Decomposition (SVD).<\/p>\n<p style=\"text-align:justify;\">\n<blockquote><p>   <em>Joram\u2019s and my mathematical collaborations ranged from nonseparable Banach spaces to the geometry of finite metric spaces. To Joram these were not very different. He viewed his migrations from topic to topic as natural. Our early research was in the linear world, although even in the 1970s Joram tried to interest me in the nonlinear geometry of Banach spaces. I knew well his landmark 1964 paper, in which he laid out a blueprint for what nonlinear Banach space theory should be, but I thought I had no intuition for the topic. That changed in 1981 during our sabbaticals. Marcus and Pisier, as a consequence of their work on stochastic processes, proved a seemingly unrelated result on the extension of Lipschitz mappings from finite subsets of Lp, 1 < p < 2, into a Hilbert space. 
Their theorem suggested a general result, where Lp is     replaced by a general Banach space, but because of the nature of their     proof, they did not get a result even for the Banach space L1. Joram and I     realized that we could solve the problem if we could prove a dimension     reduction lemma in Hilbert space. After formulating the lemma (now called     the Johnson-Lindenstrauss Lemma) we proved it in fifteen minutes. Still,     we knew it was a neat result because it not only allowed us to solve the     problem of Marcus and Pisier but also eliminated the \u201ccurse of     dimensionality\u201d in certain high-dimensional pattern recognition problems.     Of course, we had no idea that this lemma would become the most quoted     result of either of us, having 1,000+ references according to Google     Scholar, more than three times the references for all of our other     research articles combined, and getting 134,000 hits when Googling     \u201cJohnson- Lindenstrauss lemma.\u201d Joram's appreciation of the J-L Lemma is     revealed by looking at the list of his selected publications that Joram     drew up in the year before his death when he knew that the end was near;     that is, the paper containing the lemma is not among the twenty-six     articles he selected! Actually, I was not surprised by that; Joram put a     premium on difficulty and was not very     comfortable with the attention the J-L Lemma received.<\/em><br \/>   William B. Johnson, excerpt from [JS]. <\/p><\/blockquote>\n<p style=\"text-align:justify;\"><strong>Further reading.<\/strong><\/p>\n<p style=\"text-align:justify;\">\n<ul>\n<li><strong>[JL]<\/strong> William Buhmann Johnson and Joram Lindenstrauss<br \/>   Extensions of Lipschitz mappings into a Hilbert space<br \/>   Conference in modern analysis and probability (New Haven, Conn., 1982)<br \/>   Contemp. Math., vol. 26, Amer. Math. Soc., 1984, p. 189-206. 
<\/li>\n<li><strong>[CGLP]<\/strong> Djalil Chafa\u00ef, Olivier Gu\u00e9don, Guillaume Lecu\u00e9, and Alain Pajor<br \/>   Interactions between compressed sensing, random matrices, and high   dim. geometry<br \/>   Panoramas et Synth\u00e8ses 37, Soci\u00e9t\u00e9 Math\u00e9matique de France (SMF), (2012) 182p. <\/li>\n<li><strong>[JS]<\/strong> Compiled by William B. Johnson and Gideon Schechtman<br \/>   Joram Lindenstrauss, in Memoriam<br \/>   Notices of the American Mathematical Society 26(1) (2015) <\/li>\n<li><strong>[PV]<\/strong> Yaniv Plan and Roman Vershynin<br \/>   Random matrices acting on sets: independent columns<br \/>   Preprint arXiv:2502.16827 <\/li>\n<\/ul>\n<figure id=\"attachment_21405\" aria-describedby=\"caption-attachment-21405\" style=\"width: 221px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/William_B._Johnson_(mathematician)\"><img loading=\"lazy\" src=\"http:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2025\/02\/W.B.Johnson-221x300.png\" alt=\"Photo of William (Bill) Buhmann Johnson (1944 - )\" width=\"221\" height=\"300\" class=\"size-medium wp-image-21405\" srcset=\"https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2025\/02\/W.B.Johnson-221x300.png 221w, https:\/\/djalil.chafai.net\/blog\/wp-content\/uploads\/2025\/02\/W.B.Johnson.png 225w\" sizes=\"(max-width: 221px) 100vw, 221px\" \/><\/a><figcaption id=\"caption-attachment-21405\" class=\"wp-caption-text\">William (Bill) Buhmann Johnson (1944 - )<\/figcaption><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>This short post is about the Johnson-Lindenstrauss lemma for dimension reduction. It is an emblematic basic result of high-dimensional probability or asymptotic geometric analysis. 
It&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/djalil.chafai.net\/blog\/2025\/02\/14\/back-to-basics-johnson-lindenstrauss-lemma\/\">Continue reading<span class=\"screen-reader-text\">Back to basics : Johnson-Lindenstrauss lemma<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":1092},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/21403"}],"collection":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/comments?post=21403"}],"version-history":[{"count":29,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/21403\/revisions"}],"predecessor-version":[{"id":21586,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/21403\/revisions\/21586"}],"wp:attachment":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/media?parent=21403"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/categories?post=21403"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/tags?post=21403"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}