
Kantorovich invented Wasserstein distances

Leonid Vitaliyevich Kantorovich (1912 – 1986)

Wasserstein metrics are useful in many areas of pure and applied mathematics. The simplest Wasserstein distance is defined on the set of probability measures on $\mathbb{R}^d$ with finite first moment by

$$W(\mu,\nu)=\inf_{(X,Y)}\mathbb{E}(|X-Y|)$$

where the infimum runs over all couples $(X,Y)$ of random variables with $X\sim\mu$ and $Y\sim\nu$. Equivalently, in a more analytic formulation

$$W(\mu,\nu)=\inf_{\pi}\int|x-y|\,\mathrm{d}\pi(x,y)$$

where the infimum runs over the convex set of probability measures $\pi$ on the product space $\mathbb{R}^d\times\mathbb{R}^d$ with marginal distributions $\mu$ and $\nu$. This is the infimum of a linear function of $\pi$ over a convex set. This leads to the Kantorovich-Rubinstein dual formulation

$$W(\mu,\nu)=\sup_{f}\int f\mathrm{d}(\mu-\nu)$$

where the supremum runs over the set of Lipschitz functions $f:\mathbb{R}^d\to\mathbb{R}$ with

$$\left\Vert f\right\Vert_{\mathrm{Lip}}:=\sup_{x\neq y}\frac{|f(x)-f(y)|}{|x-y|}\leq1.$$
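
To make the coupling and dual formulations concrete, here is a minimal Python sketch on the real line; it is an illustration, not something taken from Vershik's article. It assumes two empirical measures with the same number of equally weighted atoms. In that case the infimum over couplings is attained at a permutation of the atoms (Birkhoff's theorem), and in dimension one at the monotone matching of the sorted samples. The function names `w1_sorted`, `w1_bruteforce` and `dual_value` are made up for this example.

```python
from itertools import permutations


def w1_sorted(xs, ys):
    """W(mu, nu) for uniform empirical measures on the line (sorted matching)."""
    assert len(xs) == len(ys), "this sketch assumes equally many atoms"
    n = len(xs)
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / n


def w1_bruteforce(xs, ys):
    """Same quantity, minimizing the mean cost |x - y| over all permutations."""
    assert len(xs) == len(ys), "this sketch assumes equally many atoms"
    n = len(xs)
    return min(
        sum(abs(x - ys[s[i]]) for i, x in enumerate(xs)) / n
        for s in permutations(range(n))
    )


def dual_value(f, xs, ys):
    """Integral of f against mu - nu for the two empirical measures."""
    return sum(map(f, xs)) / len(xs) - sum(map(f, ys)) / len(ys)


xs = [0.0, 1.0, 3.0]  # atoms of mu, each with weight 1/3
ys = [5.0, 2.0, 2.0]  # atoms of nu, each with weight 1/3

primal = w1_sorted(xs, ys)                       # infimum over couplings
brute = w1_bruteforce(xs, ys)                    # exhaustive check of the infimum
dual = dual_value(lambda x: -x, xs, ys)          # a 1-Lipschitz test function
lower = dual_value(lambda x: abs(x - 2), xs, ys) # another 1-Lipschitz function
```

In this example all the mass moves to the right, so the 1-Lipschitz function $f(x)=-x$ attains the supremum in the dual formulation and `dual` agrees with `primal`, while $f(x)=|x-2|$ only gives a lower bound.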

The Wasserstein metric is so useful that it has many other names, depending on the context, such as transportation distance, Mallows distance, earth mover’s distance, etc.

Very recently, Michel Ledoux brought to my attention an expository article by Anatoly Vershik entitled Long History of the Monge-Kantorovich Transportation Problem, about the history of these metrics and their naming. According to this article, we should definitely say Kantorovich metrics instead of Wasserstein metrics. Here are some excerpts from Vershik’s article:

In the 1939 booklet and subsequently, LV Kantorovich (LV hereinafter) singled out the transportation problem from other problems of linear programming. Soon after, he began writing together with his disciple M. K. Gavurin on a special method for solving a linear transport problem—the potential method [4]. It is an implementation of the general method of duality, and the visual interpretation leads immediately to the analogy with the theory of fluid dynamics and flows in networks, which was later much developed. The article with Gavurin was addressed to transport engineers and planners, but it was rejected by several serious journals in the field and remained unpublished for almost 10 years. Not waiting for its publication, LV wrote his ‘‘On mass transfer’’ [3]. I want to say about this work the same that I said about the booklet. This is a classic in all respects: it contains a profound idea that goes beyond those examples studied previously, it is brief and self-contained, one feels that there is nothing more to be said, just as there is nothing to be added to the second law of Newton, and finally, it includes a program of future research, one that was followed at first very slowly but proceeds especially quickly today.

About the end of the 1950s, LV’s main work on the subject (not only economics) became known in the West, more so than in the USSR. Since that time, some scholars in the West speak of ‘‘the Monge–Kantorovich transport problem,’’ which seems to me pretty fair terminology.

Just this … as a function of two probability measures on the compact … should be called the Kantorovich metric on the product space.

It is especially ironic to find the Kantorovich metric called the Vasershtein metric. Leonid Vasershtein is a famous mathematician specializing in algebraic K-theory and other areas of algebra and analysis, and my good friend—and he is absolutely not guilty of this distortion of terminology, which occurs primarily in Western literature. It so happened that my colleague and friend R. L. Dobrushin, head of the laboratory where L. N. Vasershtein worked, with understandable enthusiasm spread the word mostly among probabilists and statistical physicists about the ‘‘new’’ metric and its spectacular applications. I spoke to Dobrushin in 1975 and told him that what he called the Vasershtein metric in the report is the Kantorovich metric. After some discussion, he agreed fully and even said so in one of his later works. But it was too late, the wrong name stuck. Vasershtein’s interesting article [13] was very brief (it seems to me that few people who refer to it have looked at it), it does contain in passing a definition of the Kantorovich metric and applies it to the behavior of Markov fields. But there is no definition of power metrics, although in the literature [14] those are also called Vasershtein metrics. Undoubtedly, the work of Vasershtein is worthy of mention in this connection, but I think we should restore the correct terminology out of respect for L. V. Kantorovich, to the teachers and pioneers in our science.

[3]   Kantorovich, L. V., 1942: On translocation of masses. USSR AS Doklady. New Serie. vol. 37, 7–8, 227–229 (in Russian). [English translation: J. Math. Sci., 133, 4 (2006), 1381–1382.]
[4]   Kantorovich, L. V., and Gavurin, M. K., 1949: Application of mathematical methods to problems of analysis of freight flows. Problems of raising the efficiency of transport performance, Moscow-Leningrad, 110–138 (in Russian).

[13] Vasershtein, L. N., 1969: Markov processes over denumerable products of spaces describing large system of automata. Problems Inform. Transmission, 5, 3, 47–52.
[14] Villani, C., 2006: Optimal Transport, Old and New, Springer, 635 pp.

Naming is a delicate matter. What science really needs is standard names. The rest is more social and psychological, but has an impact on science too! Do you know Stigler’s law of eponymy?

Note. Leonid Vasershtein is still alive.

Note. It seems that Leonid V. Kantorovich used the notation W in his early works!


  1. Mark Meckes 2016-10-20

    You might be interested to see Barry Simon’s take on the issue of naming, starting on p. 3 here.

  2. Djalil Chafaï 2016-10-20

    Thanks, Mark. I agree with Barry Simon in this text, in particular when he says … “I have resisted the temptation of some text writers to rename things to set the record straight. For example, there is a small group who have attempted to replace “WKB approximation” by “Liouville–Green approximation”, with valid historical justification (see the Notes to Section 15.5 of Part 2B). But if I gave a talk and said I was about to use the Liouville–Green approximation, I’d get blank stares from many who would instantly know what I meant by the WKB approximation. And, of course, those who try to change the name also know what WKB is! Names are mainly for shorthand, not history.”…

  3. Nicolas Rougerie 2016-11-01

    A useful reminder! Thanks for the reference to Vershik’s article. One can find a similar discussion in the video lectures of Pierre-Louis Lions at the Collège de France (year 2007-2008 I believe), in which he suggests using “Monge-Kantorovitch-Wasserstein metrics” to make everybody happy.

    He also adds “En recherche, rien ne se perd, tout se retrouve.” and demonstrates the validity of the principle by essentially rediscovering the Diaconis-Freedman proof of de Finetti’s theorem during the course of the lectures.
