Wasserstein metrics are used in many areas of pure and applied mathematics. The simplest Wasserstein distance is defined on the set of probability measures on $\mathbb{R}^d$ with finite first moment by

$$W(\mu,\nu)=\inf_{(X,Y)}\mathbb{E}(|X-Y|)$$

where the infimum runs over all pairs $(X,Y)$ of random variables with $X\sim\mu$ and $Y\sim\nu$. Equivalently, in a more analytic formulation,

$$W(\mu,\nu)=\inf_{\pi}\int\!|x-y|\pi(\mathrm{d}x,\mathrm{d}y)$$

where the infimum runs over the convex set of probability measures on the product space $\mathbb{R}^d\times\mathbb{R}^d$ with marginal distributions $\mu$ and $\nu$. This is the infimum of a linear function of $\pi$ over a convex set. This leads to the Kantorovich-Rubinstein dual formulation

$$W(\mu,\nu)=\sup_{f}\int f\mathrm{d}(\mu-\nu)$$

where the supremum runs over the set of Lipschitz functions $f:\mathbb{R}^d\to\mathbb{R}$ with

$$\left\Vert f\right\Vert_{\mathrm{Lip}}:=\sup_{x\neq y}\frac{|f(x)-f(y)|}{|x-y|}\leq1.$$
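To make the two formulations concrete, here is a minimal Python sketch for the special case $d=1$, where $\mu$ and $\nu$ are uniform empirical measures on two samples of the same size. In that case, matching the sorted samples is an optimal coupling, which the sketch cross-checks by brute force over all permutations. The function names `w1_sorted`, `w1_bruteforce` and `dual_value` and the sample values are illustrative assumptions, not taken from any library or from Vershik's article.

```python
from itertools import permutations


def w1_sorted(xs, ys):
    """W between uniform empirical measures on two equal-size 1D samples.

    Uses the monotone coupling: the i-th smallest x is paired with the
    i-th smallest y (valid in dimension one only).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def w1_bruteforce(xs, ys):
    """Same quantity, minimizing the transport cost over all permutations."""
    n = len(xs)
    return min(
        sum(abs(x - y) for x, y in zip(xs, perm)) for perm in permutations(ys)
    ) / n


def dual_value(f, xs, ys):
    """Integral of f against (mu - nu) for the two empirical measures."""
    return sum(f(x) for x in xs) / len(xs) - sum(f(y) for y in ys) / len(ys)


# Hypothetical sample values chosen for illustration.
xs = [0.0, 1.0, 3.0]
ys = [5.0, 2.0, 2.5]

primal = w1_sorted(xs, ys)
print(primal, w1_bruteforce(xs, ys))

# Any 1-Lipschitz f gives a lower bound on W (dual formulation).
for f in (lambda t: t, lambda t: -t, abs):
    print(dual_value(f, xs, ys) <= primal + 1e-12)
```

The brute-force check is only feasible for very small samples, since it enumerates all $n!$ permutations. It is there to illustrate the primal infimum, not as a practical method.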

The Wasserstein metric is so useful that it has many other names, depending on the context, such as transportation distance, Mallows distance, earth mover’s distance, etc.

Very recently, Michel Ledoux brought to my attention an expository article by Anatoly Vershik entitled *Long History of the Monge-Kantorovich Transportation Problem* about the history of these metrics and their naming. According to this article, we should definitely say **Kantorovich metrics** instead of **Wasserstein metrics**. Here are some excerpts from Vershik’s article:

…

In the 1939 booklet and subsequently, L. V. Kantorovich (LV hereinafter) singled out the transportation problem from other problems of linear programming. Soon after, he began writing together with his disciple M. K. Gavurin on a special method for solving a linear transport problem—the potential method [4]. It is an implementation of the general method of duality, and the visual interpretation leads immediately to the analogy with the theory of fluid dynamics and flows in networks, which was later much developed. The article with Gavurin was addressed to transport engineers and planners, but it was rejected by several serious journals in the field and remained unpublished for almost 10 years. Not waiting for its publication, LV wrote his “On mass transfer” [3]. I want to say about this work the same that I said about the booklet. This is a classic in all respects: it contains a profound idea that goes beyond those examples studied previously, it is brief and self-contained, one feels that there is nothing more to be said, just as there is nothing to be added to the second law of Newton, and finally, it includes a program of future research, one that was followed at first very slowly but proceeds especially quickly today.

…

About the end of the 1950s, LV’s main work on the subject (not only economics) became known in the West, more so than in the USSR. Since that time, some scholars in the West speak of “the Monge–Kantorovich transport problem,” which seems to me pretty fair terminology.

…

Just this … as a function of two probability measures on the compact … should be called the Kantorovich metric on the product space.

…

It is especially ironic to find the Kantorovich metric called the Vasershtein metric. Leonid Vasershtein is a famous mathematician specializing in algebraic K-theory and other areas of algebra and analysis, and my good friend—and he is absolutely not guilty of this distortion of terminology, which occurs primarily in Western literature. It so happened that my colleague and friend R. L. Dobrushin, head of the laboratory where L. N. Vasershtein worked, with understandable enthusiasm spread the word mostly among probabilists and statistical physicists about the “new” metric and its spectacular applications. I spoke to Dobrushin in 1975 and told him that what he called the Vasershtein metric in the report is the Kantorovich metric. After some discussion, he agreed fully and even said so in one of his later works. But it was too late, the wrong name stuck. Vasershtein’s interesting article [13] was very brief (it seems to me that few people who refer to it have looked at it), it does contain in passing a definition of the Kantorovich metric and applies it to the behavior of Markov fields. But there is no definition of power metrics, although in the literature [14] those are also called Vasershtein metrics. Undoubtedly, the work of Vasershtein is worthy of mention in this connection, but I think we should restore the correct terminology out of respect for L. V. Kantorovich, to the teachers and pioneers in our science.

…

[3] Kantorovich, L. V., 1942: On translocation of masses. USSR AS Doklady. New Serie. vol. 37, 7–8, 227–229 (in Russian). [English translation: J. Math. Sci., 133, 4 (2006), 1381–1382.]

[4] Kantorovich, L. V., and Gavurin, M. K., 1949: Application of mathematical methods to problems of analysis of freight flows. Problems of raising the efficiency of transport performance, Moscow-Leningrad, 110–138 (in Russian)…

[13] Vasershtein, L. N., 1969: Markov processes over denumerable products of spaces describing large system of automata, Problems Inform. Transmission, 5, 3, 47–52.

[14] Villani, C., 2006: Optimal Transport, Old and New, Springer, 635 pp.

…

Naming is a delicate matter. What science really needs is standard names. The rest is more social and psychological, but has an impact on science too! Do you know Stigler’s law of eponymy?

**Note.** Leonid Vasershtein is still alive.

**Note.** It seems that Leonid V. Kantorovich used the notation W in his early works!

You might be interested to see Barry Simon’s take on the issue of naming, starting on p. 3 here.

Thanks, Mark. I agree with Barry Simon in this text, in particular when he says … “I have resisted the temptation of some text writers to rename things to set the record straight. For example, there is a small group who have attempted to replace “WKB approximation” by “Liouville–Green approximation”, with valid historical justification (see the Notes to Section 15.5 of Part 2B). But if I gave a talk and said I was about to use the Liouville–Green approximation, I’d get blank stares from many who would instantly know what I meant by the WKB approximation. And, of course, those who try to change the name also know what WKB is! Names are mainly for shorthand, not history.” … A useful reminder!

Thanks for the reference to Vershik’s article. One can find a similar discussion in the video lectures of Pierre-Louis Lions at the Collège de France (year 2007-2008 I believe) wherein he suggests using “Monge-Kantorovitch-Wasserstein metrics” to make everybody happy.

He also adds “En recherche, rien ne se perd, tout se retrouve.” (“In research, nothing is lost, everything is found again.”) and demonstrates the validity of the principle by essentially rediscovering the Diaconis-Freedman proof of de Finetti’s theorem during the course of the lectures.

Thanks, Nicolas. I was not able to locate the correct video on the website of the Collège de France – there are so many – but at least this is clearly mentioned in the lecture notes, page 3 of Équations aux dérivées partielles et applications.