{"id":861,"date":"2010-10-14T23:00:38","date_gmt":"2010-10-14T21:00:38","guid":{"rendered":"http:\/\/djalil.chafai.net\/blog\/?p=861"},"modified":"2020-06-22T15:07:06","modified_gmt":"2020-06-22T13:07:06","slug":"back-to-basics-total-variation-distance","status":"publish","type":"post","link":"https:\/\/djalil.chafai.net\/blog\/2010\/10\/14\/back-to-basics-total-variation-distance\/","title":{"rendered":"Back to basics - Total variation distance"},"content":{"rendered":"<p><a href=\"\/blog\/wp-content\/uploads\/2010\/10\/chalk.gif\"><img loading=\"lazy\" class=\"aligncenter size-full wp-image-881\" title=\"Teaching\" src=\"\/blog\/wp-content\/uploads\/2010\/10\/chalk.gif\" alt=\"\" width=\"80\" height=\"60\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">Today, part of my teaching concerned basic properties of the total variation on discrete spaces. Let \\( {E} \\) be a possibly infinite <strong>countable<\/strong> set equipped with its discrete topology and \\( {\\sigma} \\)-field. Let \\( {\\mathcal{P}} \\) be the set of probability measures on \\( {E} \\). We equip \\( {\\mathcal{P}} \\) with the <em>total variation distance<\/em> defined for every \\( {\\mu,\\nu\\in\\mathcal{P}} \\) by<\/p>\n<p style=\"text-align: center;\">\\[ d_{TV}(\\mu,\\nu)=\\sup_{A\\subset{E}}|\\mu(A)-\\nu(A)|. \\]<\/p>\n<p style=\"text-align: justify;\">It takes its values in \\( {[0,1]} \\) and \\( {1} \\) is achieved when \\( {\\mu} \\) and \\( {\\nu} \\) have disjoint supports.<\/p>\n<blockquote style=\"background: white; border: solid thick #e4e5e7; text-align: justify; padding-left: 1em;\"><p><strong>Theorem 1 (Alternative expressions)<\/strong> <em>For every \\( {\\mu,\\nu\\in\\mathcal{P}} \\) we have<\/em><\/p>\n<p style=\"text-align: center;\"><em>\\[ d_{TV}(\\mu,\\nu) =\\frac{1}{2}\\sup_{f:E\\rightarrow[-1,1]}\\left|\\int\\!f\\,d\\mu-\\int\\!f\\,d\\nu\\right| =\\frac{1}{2}\\sum_{x\\in E}|\\mu(x)-\\nu(x)| \\]<\/em><\/p>\n<p><em>Moreover, the supremum in the original definition of \\( {d_{TV}} \\) is achieved for the set<\/em><\/p>\n<p style=\"text-align: center;\"><em>\\[ A_*=\\{x\\in E:\\mu(x)\\geq\\nu(x)\\} \\]<\/em><\/p>\n<p><em>while the supremum in the functional expression of \\( {d_{TV}} \\) is achieved for<\/em><\/p>\n<p style=\"text-align: center;\"><em>\\[ f=\\mathbf{1}_{A_*}-\\mathbf{1}_{A_*^c}. \\]<\/em><\/p>\n<\/blockquote>\n<p style=\"text-align: justify;\"><em>Proof:<\/em> The second equality follows from the inequality<\/p>\n<p style=\"text-align: center;\">\\[ \\left|\\int\\!f\\,d\\mu-\\int\\!f\\,d\\nu\\right| \\leq \\sum_{x\\in E}|f(x)||\\mu(x)-\\nu(x)| \\leq \\sup_{x\\in E}|f(x)|\\sum_{x\\in E}|\\mu(x)-\\nu(x)| \\]<\/p>\n<p style=\"text-align: justify;\">which is saturated for \\( {f=\\mathbf{1}_{A_*}-\\mathbf{1}_{A_*^c}} \\). For the first equality, we write<\/p>\n<p style=\"text-align: center;\">\\[ |\\mu(A)-\\nu(A)| =\\frac{1}{2}\\left|\\int\\!f_A\\,d\\mu-\\int\\,f_A\\,d\\nu\\right| \\]<\/p>\n<p style=\"text-align: justify;\">where \\( {f=\\mathbf{1}_A-\\mathbf{1}_{A^c}} \\), which gives<\/p>\n<p style=\"text-align: center;\">\\[ |\\mu(A)-\\nu(A)| \\leq \\sup_{f:E\\rightarrow[-1,1]}\\left|\\int\\!f\\,d\\mu-\\int\\!f\\,d\\nu\\right| = \\frac{1}{2}\\sum_{x\\in E}|\\mu(x)-\\nu(x)| \\]<\/p>\n<p style=\"text-align: justify;\">which is saturated for \\( {A=A_*} \\) since<\/p>\n<p style=\"text-align: center;\">\\[ 2|\\mu(A_*)-\\nu(A_*)|=\\sum_{x\\in A_*}|\\mu(x)-\\nu(x)|+\\sum_{x\\in A_*^c}|\\mu(x)-\\nu(x)|. \\]<\/p>\n<p style=\"text-align: justify;\">$\\Box$<\/p>\n<blockquote style=\"background: white; border: solid thick #e4e5e7; text-align: justify; padding-left: 1em;\"><p><strong>Theorem 2 (Equivalent criteria for convergence in law)<\/strong> <em>Let \\( {(X_n)_{n\\in\\mathbb{N}}} \\) be a sequence of random variables on \\( {E} \\) and let \\( {\\mu_n} \\) be the law of \\( {X_n} \\). For any \\( {\\mu} \\) in \\( {\\mathcal{P}} \\), the following properties are equivalent:<\/em><\/p>\n<ol>\n<li><em>\\( {(X_n)_{n\\in\\mathbb{N}}} \\) converges in law to \\( {\\mu} \\)<\/em><\/li>\n<li><em>\\( {\\lim_{n\\rightarrow\\infty}\\int\\!f\\,d\\mu_n=\\int\\!f\\,d\\mu} \\) for all bounded \\( {f:E\\rightarrow\\mathbb{R}} \\)<\/em><\/li>\n<li><em>\\( {\\lim_{n\\rightarrow\\infty}\\mu_n(x)=\\mu(x)} \\) for all \\( {x\\in E} \\)<\/em><\/li>\n<li><em>\\( {\\lim_{n\\rightarrow\\infty}d_{TV}(\\mu_n,\\mu)=0} \\)<\/em><\/li>\n<\/ol>\n<\/blockquote>\n<p style=\"text-align: justify;\"><em>Proof:<\/em> To deduce 1. from 4. it suffices to use the functional variational expression of \\( {d_{TV}} \\). The properties 1. and 2. are equivalent since every \\( {f:E\\rightarrow\\mathbb{R}} \\) is continuous for the discrete topology. To deduce 3. from 2. one can take \\( {f=\\mathbf{1}_{\\{x\\}}} \\). To deduce 4. from 3. we start by observing that for an arbitrary \\( {A\\subset E} \\),<\/p>\n<p style=\"text-align: center;\">\\[ \\sum_{x\\in E}|\\mu_n(x)-\\mu(x)| =\\sum_{x\\in A}|\\mu_n(x)-\\mu(x)| +\\sum_{x\\in A^c}|\\mu_n(x)-\\mu(x)|. \\]<\/p>\n<p style=\"text-align: justify;\">Thanks to 3. for every \\( {\\varepsilon'&gt;0} \\) there exists an integer \\( {N=N(A,\\varepsilon')} \\) such that the first term of the right hand side is bounded above by \\( {\\mathrm{card}(A)\\varepsilon'} \\) for all \\( {n\\geq N} \\). For the second term of the right hand side, we write<\/p>\n<p style=\"text-align: center;\">\\[ \\sum_{x\\in A^c}|\\mu_n(x)-\\mu(x)| \\leq \\sum_{x\\in A^c}\\mu_n(x)+\\sum_{x\\in A^c}\\mu(x). \\]<\/p>\n<p style=\"text-align: justify;\">Since we have<\/p>\n<p style=\"text-align: center;\">\\[ \\sum_{x\\in A^c}\\mu_n(x) =\\sum_{x\\in A}\\mu(x)-\\sum_{x\\in A}\\mu_n(x)+\\sum_{x\\in A^c}\\mu(x) \\]<\/p>\n<p style=\"text-align: justify;\">we obtain<\/p>\n<p style=\"text-align: center;\">\\[ \\sum_{x\\in A^c}|\\mu_n(x)-\\mu(x)| \\leq \\sum_{x\\in A}|\\mu_n(x)-\\mu(x)|+2\\sum_{x\\in A^c}\\mu(x). \\]<\/p>\n<p style=\"text-align: justify;\">Since \\( {\\nu\\in\\mathcal{P}} \\), for any \\( {\\varepsilon''&gt;0} \\), we can select \\( {A} \\) finite such that \\( {\\mu(A^c)\\leq\\varepsilon''} \\).$\\Box$<\/p>\n<blockquote style=\"background: white; border: solid thick #e4e5e7; text-align: justify; padding-left: 1em;\"><p><strong>Theorem 3 (Yet another expression and the extremal case)<\/strong> <em>For every \\( {\\mu,\\nu\\in\\mathcal{P}} \\) we have<\/em><\/p>\n<p style=\"text-align: center;\"><em>\\[ d_{TV}(\\mu,\\nu)=1-\\sum_{x\\in E}(\\mu(x)\\wedge\\nu(x)). \\]<\/em><\/p>\n<p><em>In particular, \\( {d_{TV}(\\mu,\\nu)=1} \\) if and only if \\( {\\mu} \\) and \\( {\\nu} \\) have disjoint supports.<\/em><\/p><\/blockquote>\n<p style=\"text-align: justify;\"><em>Proof:<\/em> It suffices to write<\/p>\n<p style=\"text-align: center;\">\\[ \\sum_{x\\in E}(\\mu(x)\\wedge\\nu(x)) =\\frac{1}{2}\\sum_{x\\in E}(\\mu(x)+\\nu(x)-|\\mu(x)-\\nu(x)|)=1-d_{TV}(\\mu,\\nu). \\]<\/p>\n<p style=\"text-align: justify;\">$\\Box$<\/p>\n<blockquote style=\"background: white; border: solid thick #e4e5e7; text-align: justify; padding-left: 1em;\"><p><strong>Theorem 4 (Coupling)<\/strong> <em>For every \\( {\\mu,\\nu\\in\\mathcal{P}} \\) we have<\/em><\/p>\n<p style=\"text-align: center;\"><em>\\[ d_{TV}(\\mu,\\nu) =\\inf_{(X,Y)}\\mathbb{P}(X\\neq Y) \\]<\/em><\/p>\n<p><em>where the infimum runs over all the couples of random variables on \\( {E\\times E} \\) with marginal laws \\( {\\mu} \\) and \\( {\\nu} \\). Moreover, there exists a couple of this type for which the equality is achieved.<\/em><\/p><\/blockquote>\n<p style=\"text-align: justify;\"><em>Proof:<\/em> Let \\( {(X,Y)} \\) be a couple of random variables on \\( {E\\times E} \\) with marginal laws \\( {\\mu} \\) nd \\( {\\nu} \\). Since \\( {\\mathbb{P}(X=x,Y=x)\\leq \\mu(x)\\wedge\\nu(x)} \\) for every \\( {x\\in E} \\) we have<\/p>\n<p style=\"text-align: center;\">\\[ 1-d_{TV}(\\mu,\\nu)=\\sum_{x\\in E}(\\mu(x)\\wedge\\nu(x)) \\geq \\sum_{x\\in E}\\mathbb{P}(X=x,Y=x) =\\mathbb{P}(X=Y). \\]<\/p>\n<p style=\"text-align: justify;\">Therefore, it suffices now to construct a couple \\( {(X,Y)} \\) for which the equality is achieved. Define<\/p>\n<p style=\"text-align: center;\">\\[ p=1-d_{TV}(\\mu,\\nu)\\in[0,1]. \\]<\/p>\n<p style=\"text-align: justify;\"><strong>Case where<\/strong> \\( {p=0} \\). Then \\( {d_{TV}(\\mu,\\nu)=1} \\) and the supports of \\( {\\mu} \\) et \\( {\\nu} \\) are disjoints. This gives \\( {\\sum_{x\\in E}\\mu(x)\\nu(x)=0} \\). We consider in this case \\( {(X,Y)} \\) where \\( {X\\sim\\mu} \\) and \\( {Y\\sim\\nu} \\) are independent:<\/p>\n<p style=\"text-align: center;\">\\[ \\mathbb{P}(X=Y)=\\sum_{x\\in E}\\mu(x)\\nu(x)=0. \\]<\/p>\n<p style=\"text-align: justify;\"><strong>Case where<\/strong> \\( {p=1} \\). Then \\( {d_{TV}(\\mu,\\nu)=0} \\) and thus \\( {\\mu=\\nu} \\). We can take \\( {(X,X)} \\) where \\( {X\\sim\\mu} \\).<\/p>\n<p style=\"text-align: justify;\"><strong>Case where<\/strong> \\( {0&lt;p&lt;1} \\). Let \\( {(U,V,W)} \\) be a triple of random variables with laws<\/p>\n<p style=\"text-align: center;\">\\[ p^{-1}(\\mu\\wedge\\nu),\\quad (1-p)^{-1}(\\mu-(\\mu\\wedge\\nu)),\\quad (1-p)^{-1}(\\nu-(\\mu\\wedge\\nu)) \\]<\/p>\n<p style=\"text-align: justify;\">(recall that \\( {p=\\sum_{x\\in E}(\\mu(x)\\wedge\\nu(x))} \\)). Let \\( {B} \\) be a Bernoulli random variable, independent of \\( {(U,V,W)} \\), such that \\( {\\mathbb{P}(B=1)=1-\\mathbb{P}(B=0)=p} \\). If we define \\( {(X,Y)=(U,U)} \\) if \\( {B=1} \\) while \\( {(X,Y)=(V,W)} \\) if \\( {B=0} \\), then \\( {X\\sim\\mu} \\) and \\( {Y\\sim\\nu} \\), and since the laws of \\( {V} \\) and \\( {W} \\) have disjoint supports, we have \\( {\\mathbb{P}(V=W)=0} \\), and thus<\/p>\n<p style=\"text-align: center;\">\\[ \\mathbb{P}(X=Y)=\\mathbb{P}(B=1)=p. \\]<\/p>\n<p style=\"text-align: justify;\">$\\Box$<\/p>\n<p style=\"text-align: justify;\">Since \\( {\\mathbb{P}(X\\neq Y)=\\mathbb{E}(d(X,Y))} \\) for the atomic distance \\( {d(x,y)=1_{x\\neq y}} \\) we have<\/p>\n<p style=\"text-align: center;\">\\[ d_{TV}(\\mu,\\nu)=\\min_{\\pi\\in\\Pi(\\mu,\\nu)}\\int_{E\\times E}\\!d(x,y)\\,d\\pi(x,y) \\]<\/p>\n<p style=\"text-align: justify;\">where \\( {\\Pi(\\mu,\\nu)} \\) is the convex set of probability measures on \\( {E\\times E} \\) with marginal laws \\( {\\mu} \\) and \\( {\\nu} \\). From this point of view, \\( {d_{TV}} \\) is the Wasserstein \\( {W_1} \\) distance on \\( {\\mathcal{P}} \\) associated to the atomic distance on \\( {E} \\).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today, part of my teaching concerned basic properties of the total variation on discrete spaces. Let \\( {E} \\) be a possibly infinite countable set&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/djalil.chafai.net\/blog\/2010\/10\/14\/back-to-basics-total-variation-distance\/\">Continue reading<span class=\"screen-reader-text\">Back to basics - Total variation distance<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":4112},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/861"}],"collection":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/comments?post=861"}],"version-history":[{"count":20,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/861\/revisions"}],"predecessor-version":[{"id":13818,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/posts\/861\/revisions\/13818"}],"wp:attachment":[{"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/media?parent=861"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/categories?post=861"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/djalil.chafai.net\/blog\/wp-json\/wp\/v2\/tags?post=861"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}