
Concentration for projected distributions

Sergey Germanovich Bobkov (1961 - ), a great explorer of Gaussianity, isoperimetry, and concentration

This post is devoted to a concentration inequality of Lipschitz functions for a class of projected probability distributions on the unit sphere of $\mathbb{R}^n$, $n\geq2$, $$\mathbb{S}^{n-1}=\Bigl\{x\in\mathbb{R}^n:|x|:=\sqrt{x_1^2+\cdots+x_n^2}=1\Bigr\}.$$ We take this opportunity to recall various aspects of concentration for Gaussians.

Concentration. Let us consider a random vector $X$ of $\mathbb{R}^n$, $n\geq2$, and its Euclidean norm $|X|:=\sqrt{X_1^2+\cdots+X_n^2}$. Suppose that for all $F:\mathbb{R}^n\to\mathbb{R}$ Lipschitz, the law of the random variable $F(X)$ has sub-Gaussian concentration, in the sense that for all $r\geq0$, $$\mathbb{P}(|F(X)-\mathbb{E}F(X)|\geq r)\leq c\exp\Bigl(-\frac{r^2}{2C\|F\|_{\mathrm{Lip.}}^2}\Bigr),$$ for some constants $c,C>0$. Then for all $F:\mathbb{R}^n\to\mathbb{R}$ Lipschitz and $r\geq\sigma\|F\|_{\mathrm{Lip.}}$, $$\mathbb{P}\Bigl(\Bigl|F\Bigl(\frac{X}{|X|}\Bigr)-\mathbb{E}F\Bigl(\frac{X}{|X|}\Bigr)\Bigr|\geq r\Bigr)\leq2c\exp\Bigl(-\frac{\mu^2}{8C}\Bigl(\frac{r}{\|F\|_{\mathrm{Lip.}}}-\sigma\Bigr)^2\Bigr)$$ where $$\mu:=\mathbb{E}|X|\quad\text{and}\quad\sigma:=\mathbb{E}\Bigl|\frac{|X|}{\mu}-1\Bigr|.$$ The quantity $$\|F\|_{\mathrm{Lip.}}:=\sup_{x,y\in\mathbb{R}^n:x\neq y}\frac{|F(x)-F(y)|}{|x-y|}$$ is the Lipschitz norm of $F$ with respect to the Euclidean norm on $\mathbb{R}^n$, and not with respect to the geodesic distance on $\mathbb{S}^{n-1}$.

The statement is dilation invariant, in the sense that if we replace $X$ by $\lambda X$ for some constant $\lambda>0$, the left hand side is invariant, and the right hand side is also invariant, since $\mu$ and $C$ are replaced by $\lambda\mu$ and $\lambda^2C$ while $\sigma$ is invariant.

Since $X/|X|$ takes its values in the unit sphere $\mathbb{S}^{n-1}$, which has diameter $2$, we have $$\Bigl|F\Bigl(\frac{X}{|X|}\Bigr)-\mathbb{E}F\Bigl(\frac{X}{|X|}\Bigr)\Bigr|\leq2\|F\|_{\mathrm{Lip.}},$$ so that $\mathbb{P}(|F(\frac{X}{|X|})-\mathbb{E}F(\frac{X}{|X|})|\geq r)=0$ if $r>2\|F\|_{\mathrm{Lip.}}$; as a consequence, the concentration inequality is useless when $r>2\|F\|_{\mathrm{Lip.}}$.

Projected normal laws. Consider the Gaussian case $X\sim\mathcal{N}(m,\Sigma)$, $m\in\mathbb{R}^n$, $\Sigma\in\mathrm{Sym}^+_{n\times n}(\mathbb{R})$. Then the sub-Gaussian concentration of Lipschitz functions holds with $$c=2\quad\text{and}\quad C=\|\Sigma\|_{\mathrm{op.}}=\max_{|x|=1}\langle\Sigma x,x\rangle.$$ The law of $X/|X|$ is known as the projected normal distribution on $\mathbb{S}^{n-1}$.
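
In practice, sampling the projected normal distribution is immediate: normalize Gaussian samples. Here is a minimal sketch, assuming numpy is available; the values of m and Sigma below are arbitrary, chosen just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m = np.array([1.0, 0.0, 0.0])
Sigma = np.diag([1.0, 2.0, 0.5])  # any symmetric positive semi-definite matrix

X = rng.multivariate_normal(m, Sigma, size=10_000)
U = X / np.linalg.norm(X, axis=1, keepdims=True)  # projected normal samples on S^{n-1}

print(np.allclose(np.linalg.norm(U, axis=1), 1.0))  # True: the samples lie on the sphere
```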

Let us further specialize to the isotropic case $(m,\Sigma)=(0,I_n)$. Then the law of $X/|X|$ is uniform, moreover $\|\Sigma\|_{\mathrm{op.}}=1$, $X_1,\ldots,X_n$ are i.i.d. $\mathcal{N}(0,1)$, and $$\mu=\mathbb{E}(\chi(n))=\sqrt{2}\,\frac{\Gamma(\frac{n+1}{2})}{\Gamma(\frac{n}{2})}\underset{n\to\infty}{\sim}\sqrt{n}\quad\text{while}\quad\sigma\underset{n\to\infty}{\sim}\frac{1}{\sqrt{\pi n}},$$ and these asymptotic estimates correspond to the LLN and the CLT for $X_1^2+\cdots+X_n^2$ and express the thin-shell phenomenon for the isotropic log-concave distribution $\mathcal{N}(0,I_n)$.
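
These asymptotics for $\mu$ and $\sigma$ are easy to check numerically; the following Monte Carlo sketch (sample sizes are arbitrary) compares the exact formula for $\mu$ with $\sqrt{n}$, and a simulated $\sigma$ with $1/\sqrt{\pi n}$.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    # exact mean of the chi(n) law via log-gamma, to avoid overflow
    mu = math.sqrt(2) * math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2))
    norms = np.sqrt(rng.chisquare(n, size=200_000))  # samples of |X|, X ~ N(0, I_n)
    sigma = np.mean(np.abs(norms / mu - 1))
    print(f"n={n:5d}  mu={mu:.4f} ~ sqrt(n)={math.sqrt(n):.4f}  "
          f"sigma={sigma:.5f} ~ 1/sqrt(pi n)={1/math.sqrt(math.pi*n):.5f}")
```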

Proof. The map $x\mapsto x/|x|$ is $1$-Lipschitz outside the unit ball, but is not Lipschitz at the origin. Instead of trying to compose functions, the idea is that since the real random variable $|X|$ is a Lipschitz function of $X$, it concentrates around its mean $\mu$, making $X/|X|$ close to $X/\mu$, which is a dilation of $X$, and which concentrates in turn! Let us follow this idea.

The concentration inequality, used with the $1$-Lipschitz function $|\cdot|$, gives, for all $\rho>0$, $$\mathbb{P}(||X|-\mu|\geq\rho\mu)\leq c\exp\Bigl(-\frac{\rho^2\mu^2}{2C}\Bigr).$$ Next, to concentrate $F(X/|X|)$, we note that by replacing $r$ with $r/\|F\|_{\mathrm{Lip.}}$, we can assume without loss of generality that $\|F\|_{\mathrm{Lip.}}=1$. Now, on the event $A_\rho:=\{||X|-\mu|<\rho\mu\}$, $$\Bigl|F\Bigl(\frac{X}{|X|}\Bigr)-F\Bigl(\frac{X}{\mu}\Bigr)\Bigr|\leq\Bigl|\frac{X}{|X|}-\frac{X}{\mu}\Bigr|=\frac{|X|\,||X|-\mu|}{|X|\,\mu}=\frac{||X|-\mu|}{\mu}\leq\rho.$$ Moreover, we have $$\Bigl|\mathbb{E}F\Bigl(\frac{X}{|X|}\Bigr)-\mathbb{E}F\Bigl(\frac{X}{\mu}\Bigr)\Bigr|\leq\mathbb{E}\Bigl|F\Bigl(\frac{X}{|X|}\Bigr)-F\Bigl(\frac{X}{\mu}\Bigr)\Bigr|\leq\mathbb{E}\frac{||X|-\mu|}{\mu}=\sigma.$$ Hence, on the event $A_\rho$, $$\Bigl|F\Bigl(\frac{X}{|X|}\Bigr)-\mathbb{E}F\Bigl(\frac{X}{|X|}\Bigr)\Bigr|\leq\Bigl|F\Bigl(\frac{X}{\mu}\Bigr)-\mathbb{E}F\Bigl(\frac{X}{\mu}\Bigr)\Bigr|+(\rho+\sigma).$$ Therefore, for all $r\geq\rho+\sigma$, using the concentration for $F(X/\mu)$, whose Lipschitz norm is $1/\mu$, $$\mathbb{P}\Bigl(\Bigl|F\Bigl(\frac{X}{|X|}\Bigr)-\mathbb{E}F\Bigl(\frac{X}{|X|}\Bigr)\Bigr|\geq r\Bigr)\leq\mathbb{P}(A_\rho^c)+\mathbb{P}\Bigl(\Bigl|F\Bigl(\frac{X}{\mu}\Bigr)-\mathbb{E}F\Bigl(\frac{X}{\mu}\Bigr)\Bigr|\geq r-(\rho+\sigma)\Bigr)\leq c\exp\Bigl(-\frac{\rho^2\mu^2}{2C}\Bigr)+c\exp\Bigl(-\frac{(r-(\rho+\sigma))^2\mu^2}{2C}\Bigr).$$ It remains to select or optimize over $\rho$. Let us take $\rho=r-(\rho+\sigma)$, which gives $\rho=\frac{1}{2}(r-\sigma)$, which satisfies $r\geq\rho+\sigma$. This gives, for all $r\geq\sigma$, $$\mathbb{P}\Bigl(\Bigl|F\Bigl(\frac{X}{|X|}\Bigr)-\mathbb{E}F\Bigl(\frac{X}{|X|}\Bigr)\Bigr|\geq r\Bigr)\leq2c\exp\Bigl(-\frac{\mu^2(r-\sigma)^2}{8C}\Bigr).$$
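
As a sanity check of the final bound, here is a Monte Carlo sketch in the isotropic Gaussian case, where $c=2$ and $C=1$, with the $1$-Lipschitz test function $F(x)=x_1$; the dimension, radii, and sample size are arbitrary choices.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, N = 200, 100_000
X = rng.standard_normal((N, n))
norms = np.linalg.norm(X, axis=1)
U1 = X[:, 0] / norms  # F(X/|X|) with the 1-Lipschitz F(x) = x_1

mu = math.sqrt(2) * math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2))
sigma = np.mean(np.abs(norms / mu - 1))

for r in (0.3, 0.4, 0.5):  # the bound requires r >= sigma
    empirical = np.mean(np.abs(U1 - U1.mean()) >= r)
    bound = 2 * 2 * math.exp(-mu**2 * (r - sigma) ** 2 / 8)  # 2c exp(-mu^2(r-sigma)^2/(8C))
    print(f"r={r}: empirical {empirical:.2e} <= bound {bound:.2e}")
```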

Uniform case. If the law of $X$ is rotationally invariant, for instance when $X\sim\mathcal{N}(0,I_n)$, then $U:=X/|X|$ follows the uniform distribution on $\mathbb{S}^{n-1}$, and the concentration inequality of Milman and Schechtman states that for all $F:\mathbb{S}^{n-1}\to\mathbb{R}$ Lipschitz and $r\geq0$, $$\mathbb{P}(|F(U)-\mathbb{E}F(U)|\geq r)\leq2\exp\Bigl(-\frac{nr^2}{2\|F\|_{\mathrm{Lip.}}^2}\Bigr),$$ where the Lipschitz constant is with respect to the geodesic distance on $\mathbb{S}^{n-1}$.
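
The dimension dependence of this bound can be observed empirically. The sketch below uses the test function $F(u)=\max_iu_i$, which is $1$-Lipschitz for the geodesic distance since the geodesic distance dominates the Euclidean one; the dimensions and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 0.2
for n in (10, 50, 250):
    X = rng.standard_normal((100_000, n))
    U = X / np.linalg.norm(X, axis=1, keepdims=True)  # uniform samples on S^{n-1}
    FU = U.max(axis=1)                                # F(u) = max_i u_i
    empirical = np.mean(np.abs(FU - FU.mean()) >= r)
    print(f"n={n:4d}: empirical {empirical:.2e} <= {2*np.exp(-n*r**2/2):.2e}")
```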

Covariance representation for the Gaussian. Let us consider $X\sim\mathcal{N}(0,I_n)$ in $\mathbb{R}^n$. Then, for all $F,G:\mathbb{R}^n\to\mathbb{R}$ such that $\||\nabla F|\|_\infty<\infty$ and $\||\nabla G|\|_\infty<\infty$, the following elementary covariance representation holds: $$\mathbb{E}F(X)G(X)-\mathbb{E}F(X)\mathbb{E}G(X)=\int_0^1\mathbb{E}\langle(\nabla F)(X_\alpha),(\nabla G)(Y_\alpha)\rangle\,\mathrm{d}\alpha$$ where, for all $\alpha\in[0,1]$, $$\begin{pmatrix}X_\alpha\\Y_\alpha\end{pmatrix}\sim\mathcal{N}\Bigl(0,\begin{pmatrix}I_n&\alpha I_n\\\alpha I_n&I_n\end{pmatrix}\Bigr).$$ Equivalently, we could use $(X_\alpha,Y_\alpha):=(X,\alpha X+\sqrt{1-\alpha^2}\,Y)$ with $Y$ an independent copy of $X$ and $\alpha=\mathrm{e}^{-t}$, which is the Mehler formula for the Ornstein-Uhlenbeck process.
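
The identity can be tested numerically. Here is a one-dimensional Monte Carlo sketch with $F(x)=\sin(x)$ and $G(x)=\sin(2x)$, for which both sides can be computed in closed form and equal $(\mathrm{e}^{-1/2}-\mathrm{e}^{-9/2})/2\approx0.2977$; the sample size and the grid in $\alpha$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
X, Y = rng.standard_normal(N), rng.standard_normal(N)

# left hand side: Cov(F(X), G(X)) with F = sin and G = sin(2.)
lhs = np.mean(np.sin(X) * np.sin(2 * X)) - np.mean(np.sin(X)) * np.mean(np.sin(2 * X))

# right hand side: trapezoidal rule in alpha, with (X_a, Y_a) = (X, aX + sqrt(1-a^2)Y)
alphas = np.linspace(0.0, 1.0, 101)
vals = np.array([np.mean(np.cos(X) * 2 * np.cos(2 * (a * X + np.sqrt(1 - a**2) * Y)))
                 for a in alphas])
rhs = (vals.sum() - (vals[0] + vals[-1]) / 2) * (alphas[1] - alphas[0])

print(f"lhs={lhs:.4f}  rhs={rhs:.4f}  exact={(np.exp(-0.5) - np.exp(-4.5))/2:.4f}")
```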

Following [H], this can be proved by interpolation as $\alpha$ runs over $[0,1]$. We could use the Ornstein-Uhlenbeck semigroup, but there is something simpler. Indeed, $X_1=Y_1$ has the law of $X$, while $X_0$ and $Y_0$ are independent and both have the law of $X$. Thus $$\mathbb{E}F(X)G(X)-\mathbb{E}F(X)\mathbb{E}G(X)=\mathbb{E}F(X_1)G(Y_1)-\mathbb{E}F(X_0)G(Y_0)=\int_0^1\partial_\alpha\mathbb{E}F(X_\alpha)G(Y_\alpha)\,\mathrm{d}\alpha.$$ By approximation and bilinearity, it suffices to consider the case of trigonometric monomials, namely characteristic functions: $F(x)=\mathrm{e}^{\mathrm{i}\langle u,x\rangle}$ and $G(x)=\mathrm{e}^{\mathrm{i}\langle v,x\rangle}$, $u,v\in\mathbb{R}^n$. In this case $$\mathbb{E}F(X_\alpha)G(Y_\alpha)=\exp\Bigl(-\frac{1}{2}(|u|^2+2\alpha\langle u,v\rangle+|v|^2)\Bigr).$$ Now it simply remains to note, using $\nabla F(x)=\mathrm{i}u\,\mathrm{e}^{\mathrm{i}\langle u,x\rangle}$ and $\nabla G(x)=\mathrm{i}v\,\mathrm{e}^{\mathrm{i}\langle v,x\rangle}$, that $$\partial_\alpha\mathbb{E}F(X_\alpha)G(Y_\alpha)=-\langle u,v\rangle\,\mathbb{E}F(X_\alpha)G(Y_\alpha)=\mathbb{E}\langle\nabla F(X_\alpha),\nabla G(Y_\alpha)\rangle.$$ Denoting $f_\alpha(x,y)$ the density of $(X_\alpha,Y_\alpha)$, this writes, using integration by parts, $$\partial_\alpha f_\alpha(x,y)=\sum_{k=1}^n\partial^2_{x_k,y_k}f_\alpha(x,y).$$
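
The key derivative identity on trigonometric monomials can also be verified symbolically; here is a small sketch assuming sympy is available, with the symbols u2, v2, uv standing for $|u|^2$, $|v|^2$, $\langle u,v\rangle$.

```python
import sympy as sp

alpha, u2, v2, uv = sp.symbols('alpha u2 v2 uv')
E = sp.exp(-(u2 + 2 * alpha * uv + v2) / 2)  # E F(X_alpha) G(Y_alpha)
assert sp.simplify(sp.diff(E, alpha) + uv * E) == 0
print("verified: d/dalpha E F(X_a)G(Y_a) = -<u,v> E F(X_a)G(Y_a)")
```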

Concentration from covariance representation. Following [BGH, H], the covariance representation implies sub-Gaussian concentration of Lipschitz functions. Let $F:\mathbb{R}^n\to\mathbb{R}$ be such that $\||\nabla F|\|_\infty\leq1$ and $\mathbb{E}F(X)=0$. The covariance representation used for $F$ and $G=\mathrm{e}^{\theta F}$, $\theta\geq0$, gives, using the fact that $Y_\alpha$ has the law of $X$ for all $\alpha$, $$\mathbb{E}F(X)\mathrm{e}^{\theta F(X)}=\theta\int_0^1\mathbb{E}\langle\nabla F(X_\alpha),\nabla F(Y_\alpha)\rangle\mathrm{e}^{\theta F(Y_\alpha)}\,\mathrm{d}\alpha\leq\theta\int_0^1\mathbb{E}|\nabla F(X_\alpha)||\nabla F(Y_\alpha)|\mathrm{e}^{\theta F(Y_\alpha)}\,\mathrm{d}\alpha\leq\theta\int_0^1\mathbb{E}\mathrm{e}^{\theta F(Y_\alpha)}\,\mathrm{d}\alpha=\theta\,\mathbb{E}\mathrm{e}^{\theta F(X)}.$$ Introducing the Laplace transform $L(\theta):=\mathbb{E}\mathrm{e}^{\theta F(X)}$, this is the differential inequality $$L'(\theta)\leq\theta L(\theta),\quad\text{hence}\quad\log\mathbb{E}\mathrm{e}^{\theta F}\leq\frac{\theta^2}{2}.$$ By the exponential Markov inequality, for all $r\geq0$, $$\mathbb{P}(F(X)\geq r)\leq\inf_{\theta>0}\mathrm{e}^{-\theta r}\mathbb{E}\mathrm{e}^{\theta F}\leq\exp\Bigl(-\frac{r^2}{2}\Bigr).$$ Finally, by translation, dilation, and approximation with the Rademacher theorem on Lipschitz functions, if $X\sim\mathcal{N}(m,\Sigma)$ then for all $F:\mathbb{R}^n\to\mathbb{R}$ Lipschitz and all $r\geq0$, $$\mathbb{P}(|F(X)-\mathbb{E}F(X)|\geq r)\leq2\exp\Bigl(-\frac{r^2}{2\|F(\sqrt{\Sigma}\,\cdot)\|_{\mathrm{Lip.}}^2}\Bigr)\leq2\exp\Bigl(-\frac{r^2}{2\|\Sigma\|_{\mathrm{op.}}\|F\|_{\mathrm{Lip.}}^2}\Bigr).$$ In the sequel, for simplicity, we formulate the results for $\mathcal{N}(0,I_n)$ only.
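
Here is a quick empirical illustration of the resulting sub-Gaussian bound, with the $1$-Lipschitz function $F(x)=|x|$; the dimension and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 20, 1_000_000
normX = np.linalg.norm(rng.standard_normal((N, n)), axis=1)  # F(X) = |X|, 1-Lipschitz
Y = normX - normX.mean()
for r in (0.5, 1.0, 2.0, 3.0):
    print(f"r={r}: {np.mean(np.abs(Y) >= r):.2e} <= {2*np.exp(-r**2/2):.2e}")
```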

Finer tail from Pisier inequality. Recall the Mills bound for $Z\sim\mathcal{N}(0,1)$ and $r>0$: $$\mathbb{P}(Z\geq r)=\int_r^\infty\frac{\mathrm{e}^{-\frac{x^2}{2}}}{\sqrt{2\pi}}\,\mathrm{d}x\leq\int_r^\infty\frac{x}{r}\frac{\mathrm{e}^{-\frac{x^2}{2}}}{\sqrt{2\pi}}\,\mathrm{d}x=\frac{1}{\sqrt{2\pi}\,r}\int_r^\infty-\bigl(\mathrm{e}^{-\frac{x^2}{2}}\bigr)'\,\mathrm{d}x=\frac{\mathrm{e}^{-\frac{r^2}{2}}}{\sqrt{2\pi}\,r},$$ a quantitative estimate that catches the erfc asymptotics $\mathbb{P}(Z\geq r)\underset{r\to\infty}{\sim}\frac{\mathrm{e}^{-\frac{r^2}{2}}}{\sqrt{2\pi}\,r}$.
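
Numerically, the Mills bound is remarkably sharp as $r$ grows; the following sketch uses only the standard library, via $Q(r)=\frac{1}{2}\mathrm{erfc}(r/\sqrt{2})$.

```python
import math

for r in (1.0, 2.0, 4.0, 8.0):
    Q = math.erfc(r / math.sqrt(2)) / 2                         # exact Gaussian tail
    mills = math.exp(-r**2 / 2) / (math.sqrt(2 * math.pi) * r)  # Mills bound
    print(f"r={r}: Q(r)={Q:.3e} <= {mills:.3e}  ratio={Q/mills:.4f}")
```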

Following [BGH, H], if say $X\sim\mathcal{N}(0,I_n)$, then for all $F:\mathbb{R}^n\to\mathbb{R}$ with $\|F\|_{\mathrm{Lip.}}\leq1$ and $r>0$, $$\mathbb{P}(|F(X)-\mathbb{E}F(X)|\geq r)\leq\sqrt{\frac{\pi}{2}}\,\frac{\mathrm{e}^{-\frac{r^2}{2}}}{r},$$ generalizing the Mills bound with a worse constant.

Let us give a proof following [BGH]. Set $Y=F(X)-\mathbb{E}F(X)$. By smoothing, we can assume that $F$ is smooth and $Y$ has positive density $p$. The covariance representation above gives, for all bounded, non-decreasing, and piecewise differentiable $U:\mathbb{R}\to\mathbb{R}$, that $\mathbb{E}YU(Y)\leq\mathbb{E}U'(Y)$. Specializing to the piecewise affine function $U(x)=\min((x-r)_+,\varepsilon)$, we get $$\mathbb{E}Y(Y-r)\mathbf{1}_{r\leq Y\leq r+\varepsilon}+\varepsilon\,\mathbb{E}Y\mathbf{1}_{Y\geq r+\varepsilon}\leq\mathbb{P}(Y\geq r)-\mathbb{P}(Y\geq r+\varepsilon).$$ Dividing by $\varepsilon$ and sending $\varepsilon$ to $0$ gives, for all $r>0$, $$m(r):=\mathbb{E}Y\mathbf{1}_{Y\geq r}\leq p(r).$$ Since $m(r)=\int_r^\infty xp(x)\,\mathrm{d}x$ and $m'(r)=-rp(r)$, we have obtained the differential inequality $m(r)\leq-m'(r)/r$. Therefore $r\mapsto\log m(r)+\frac{1}{2}r^2$ is non-increasing. It follows that $r\geq0\mapsto\mathrm{e}^{\frac{r^2}{2}}\mathbb{E}Y\mathbf{1}_{Y\geq r}=m(r)\mathrm{e}^{\frac{r^2}{2}}$ is non-increasing. Hence, for all $r>0$, $$\mathbb{P}(Y\geq r)\leq\frac{m(r)}{r}\leq\mathbb{E}Y\mathbf{1}_{Y\geq0}\,\frac{\mathrm{e}^{-\frac{r^2}{2}}}{r}.$$ It remains to use the Pisier concentration inequality below to get $\mathbb{E}|Y|\leq\sqrt{\pi/2}\,\||\nabla F|\|_\infty$, to note that $\mathbb{E}Y\mathbf{1}_{Y\geq0}=\frac{1}{2}\mathbb{E}|Y|$ since $\mathbb{E}Y=0$, and to use the same estimate for $-Y$ to get the two-sided bound.
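
For comparison with the sub-Gaussian bound $2\mathrm{e}^{-r^2/2}$, the finer bound wins as soon as $r>\sqrt{\pi/8}\approx0.63$, as this quick numeric sketch illustrates.

```python
import math

for r in (0.5, 1.0, 2.0, 4.0):
    sub_gaussian = 2 * math.exp(-r**2 / 2)
    finer = math.sqrt(math.pi / 2) * math.exp(-r**2 / 2) / r
    print(f"r={r}: 2exp(-r^2/2)={sub_gaussian:.3e}  finer={finer:.3e}")
```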

Pisier concentration inequality with Maurey version of the proof. Following [P, Theorem 2.2], if say $X\sim\mathcal{N}(0,I_n)$ then for all $\Phi:\mathbb{R}\to\mathbb{R}$ convex and $F:\mathbb{R}^n\to\mathbb{R}$, $$\mathbb{E}\Phi(F(X)-\mathbb{E}F(X))\leq\mathbb{E}\Phi(F(X)-F(X'))\leq\mathbb{E}\Phi\Bigl(\frac{\pi}{2}\langle\nabla F(X),X'\rangle\Bigr)$$ where $X'$ is an independent copy of $X$. Examples include $\Phi(x)=\mathrm{e}^{\theta x}$ and $\Phi(x)=|x|$.

The first inequality comes from the Jensen inequality with respect to the expectation over $X'$. To prove the second inequality, let us follow Pisier and Maurey. Set $X_\alpha:=\alpha X+\sqrt{1-\alpha^2}\,X'$ and $X'_\alpha:=\sqrt{1-\alpha^2}\,X-\alpha X'$, $\alpha\in[0,1]$. Then $$\partial_\alpha X_\alpha=\frac{1}{\sqrt{1-\alpha^2}}X'_\alpha,\quad\text{and}\quad F(X)-F(X')=F(X_1)-F(X_0)=\int_0^1\langle\nabla F(X_\alpha),X'_\alpha\rangle\frac{\mathrm{d}\alpha}{\sqrt{1-\alpha^2}}.$$ The Jensen inequality for $\Phi$ and the arcsine law on $[0,1]$ of density $f(\alpha)=\frac{2}{\pi\sqrt{1-\alpha^2}}$ gives $$\Phi(F(X)-F(X'))\leq\int_0^1\Phi\Bigl(\frac{\pi}{2}\langle\nabla F(X_\alpha),X'_\alpha\rangle\Bigr)f(\alpha)\,\mathrm{d}\alpha.$$ By the Fubini-Tonelli theorem, $$\mathbb{E}\Phi(F(X)-F(X'))\leq\int_0^1\mathbb{E}\Phi\Bigl(\frac{\pi}{2}\langle\nabla F(X_\alpha),X'_\alpha\rangle\Bigr)f(\alpha)\,\mathrm{d}\alpha.$$ Now $(X_\alpha,X'_\alpha)$ is the image of the standard Gaussian vector $(X',X)=(X_0,X'_0)$ by a rotation (of angle $\arccos\alpha$) and since the standard Gaussian law is rotationally invariant, the law of $(X_\alpha,X'_\alpha)$ does not depend on $\alpha$. Hence the expectation under the integral is a constant function of $\alpha$, and this gives finally the desired Pisier concentration inequality.

In the case $\Phi(x)=|x|$, we get, using Fubini-Tonelli or conditioning over $X'$, $$\mathbb{E}|F(X)-\mathbb{E}F(X)|\leq\frac{\pi}{2}\mathbb{E}|\langle\nabla F(X),X'\rangle|=\frac{\pi}{2}\mathbb{E}_X\mathbb{E}_{X'}|\langle\nabla F(X),X'\rangle|.$$ Now, using the Cauchy-Schwarz inequality to upper bound the right hand side would be too rough. Instead, we note that at fixed $X$, by rotational invariance, $\langle\nabla F(X),X'\rangle$ has the law of $|\nabla F(X)|\langle e_1,X'\rangle$, namely the law of $|\nabla F(X)|Z$ with $Z\sim\mathcal{N}(0,1)$. Therefore, since $\mathbb{E}|Z|=\sqrt{2/\pi}$, we get $$\mathbb{E}|F(X)-\mathbb{E}F(X)|\leq\frac{\pi}{2}\mathbb{E}(|\nabla F(X)|)\mathbb{E}(|Z|)=\sqrt{\frac{\pi}{2}}\,\mathbb{E}|\nabla F(X)|.$$
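
This $\mathrm{L}^1$ bound can be checked by simulation; here is a sketch with $F=|\cdot|$, for which $|\nabla F|=1$ almost everywhere, so the right hand side is just $\sqrt{\pi/2}\approx1.25$. By the thin-shell asymptotics above, the left hand side stays around $1/\sqrt{\pi}\approx0.56$ for large $n$.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
for n in (2, 10, 100):
    normX = np.linalg.norm(rng.standard_normal((500_000, n)), axis=1)  # F(X) = |X|
    lhs = np.mean(np.abs(normX - normX.mean()))
    print(f"n={n:3d}: E|F(X)-EF(X)| = {lhs:.4f} <= sqrt(pi/2) = {math.sqrt(math.pi/2):.4f}")
```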

Note that only the rotational invariance of the Gaussian is used in these Pisier-Maurey proofs. In particular, the results remain available for rotationally invariant distributions such as multivariate Barenblatt/Student/Cauchy distributions, see for instance [BV].

It is natural to ask about the optimality of the constant $\frac{\pi}{2}$ in the Pisier concentration inequality. In the affine case $F(x)=\langle a,x\rangle+b$, $|a|=1$, we have $F(X)-F(X')\sim\mathcal{N}(0,2)$ and $\langle\nabla F(X),X'\rangle\sim\mathcal{N}(0,1)$, hence the optimal constant for affine $F$ is $\sqrt{2}<\frac{\pi}{2}$.

Finer tail via stochastic calculus, following Ibragimov, Sudakov, and Tsirelson. Maurey is known in probabilistic functional analysis for proofs using stochastic calculus. He was not the first. As a matter of fact, let us examine an argument giving finer concentration due to Ibragimov, Sudakov, and Tsirelson in [IST, Theorem 1 and Corollaries 1 and 2 p. 25-27]: if $X\sim\mathcal{N}(0,I_n)$ and $F:\mathbb{R}^n\to\mathbb{R}$ with $\||\nabla F|\|_\infty\leq1$, then the random variable $F(X)-\mathbb{E}F(X)$ has the law of $B_T$ where $B$ is a standard real Brownian motion and $T$ a stopping time such that $T\leq1$.

As a consequence, for all $r\geq0$, by the reflection principle, $$\mathbb{P}(F(X)-\mathbb{E}F(X)\geq r)=\mathbb{P}(B_T\geq r)\leq\mathbb{P}\Bigl(\sup_{0\leq t\leq1}B_t\geq r\Bigr)=\mathbb{P}(|B_1|\geq r)=2Q(r)$$ where $Q$ is the tail of the standard Gaussian $\mathcal{N}(0,1)$. It follows that for all $r\geq0$, $$\mathbb{P}(|F(X)-\mathbb{E}F(X)|\geq r)\leq4Q(r).$$ Recall the Mills bound and the erfc asymptotics $$Q(r)=\int_r^\infty\frac{\mathrm{e}^{-\frac{x^2}{2}}}{\sqrt{2\pi}}\,\mathrm{d}x\leq\frac{\mathrm{e}^{-\frac{r^2}{2}}}{r\sqrt{2\pi}}\quad\text{and}\quad Q(r)=\frac{1}{2}\mathrm{erfc}\Bigl(\frac{r}{\sqrt{2}}\Bigr)\underset{r\to+\infty}{\sim}\frac{\mathrm{e}^{-\frac{r^2}{2}}}{r\sqrt{2\pi}}\Bigl(1-\frac{1}{r^2}+\frac{3}{r^4}-\cdots\Bigr).$$ Back to the construction of $T$ such that $F(X)-\mathbb{E}F(X)$ and $B_T$ are identical in law: it is not a surprise to get something of this kind, having in mind Skorokhod representation theorems; what is remarkable is to get it with $T\leq1$. Let $(W_s)_{s\in[0,1]}$ be an $n$-dimensional standard Brownian motion, and let us consider the martingale $$M_s=\mathbb{E}(F(W_1)\mid\mathcal{F}_s),\quad s\in[0,1].$$ Then $M_0=\mathbb{E}F(X)$, while $M_1$ has the law of $F(X)$. By the Dubins-Schwarz theorem, there exists a real Brownian motion $B$ such that $(B_{\langle M\rangle_s})_{0\leq s\leq1}$ and $(M_s-M_0)_{0\leq s\leq1}$ have the same law; in particular $M_1-M_0$, which has the law of $F(X)-\mathbb{E}F(X)$, has the law of $B_T$ with $T:=\langle M\rangle_1$.

It remains to show that $\langle M\rangle_1\leq1$. Let $(P_s)_{s\in[0,1]}$ be the heat or Markov semigroup of $W$ defined by $P_s(f)(x)=\mathbb{E}(f(W_s)\mid W_0=x)$. Then, by the Markov property, we get $$M_s=\mathbb{E}(F(W_1)\mid W_s)=P_{1-s}(F)(W_s).$$ Next, the Itô formula gives $$M_t=M_0+\int_0^t\nabla P_{1-s}(F)(W_s)\cdot\mathrm{d}W_s,\quad\text{hence}\quad\langle M\rangle_t=\int_0^t|\nabla P_{1-s}(F)|^2(W_s)\,\mathrm{d}s.$$ Now $\nabla P_{1-s}(F)=P_{1-s}(\nabla F)$, thus $|\nabla P_{1-s}(F)|\leq P_{1-s}|\nabla F|$, hence $|\nabla P_{1-s}(F)|\leq1$.
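
To make the construction tangible, here is a simulation sketch in dimension $1$ with $F=\tanh$ (so $|F'|\leq1$), computing $P_{1-s}F'$ by Gauss-Hermite quadrature and checking $\langle M\rangle_1\leq1$ along Euler-discretized Brownian paths; the step count and quadrature order are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
nodes, weights = np.polynomial.hermite.hermgauss(40)  # physicists' Gauss-Hermite rule

def heat_of_Fprime(x, t):
    # P_t F'(x) = E[F'(x + sqrt(t) Z)] with Z ~ N(0,1) and F' = 1/cosh^2
    z = np.sqrt(2 * t) * nodes  # change of variables for the weight e^{-y^2}
    return np.sum(weights / np.cosh(x + z) ** 2) / np.sqrt(np.pi)

steps, dt = 1000, 1.0 / 1000
for path in range(5):
    W, qv = 0.0, 0.0
    for k in range(steps):
        qv += heat_of_Fprime(W, 1 - k * dt) ** 2 * dt  # Riemann sum for <M>_1
        W += np.sqrt(dt) * rng.standard_normal()       # Euler step of Brownian motion
    print(f"path {path}: <M>_1 = {qv:.4f} <= 1")
```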

Note that $\mathbb{E}(|F(X)-\mathbb{E}F(X)|^2)=\mathbb{E}\langle M\rangle_1\leq1$, hence $\mathbb{E}|F(X)-\mathbb{E}F(X)|\leq1$, a better bound than the $\sqrt{\pi/2}$ obtained from the Pisier inequality above. Moreover the affine case $F(x)=\langle a,x\rangle+b$, $|a|=1$, for which $F(X)-\mathbb{E}F(X)\sim\mathcal{N}(0,1)$, shows that the second moment bound is in fact optimal.
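
Finally, a Monte Carlo sketch of these last bounds: for $1$-Lipschitz $F$, the second moment of $F(X)-\mathbb{E}F(X)$ stays below $1$, with equality reached in the affine case.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1_000_000, 10))
for name, FX in [("affine <a,x>, |a|=1", X[:, 0]),
                 ("norm |x|", np.linalg.norm(X, axis=1))]:
    Y = FX - FX.mean()
    print(f"{name}: E[Y^2] = {np.mean(Y**2):.4f}  E|Y| = {np.mean(np.abs(Y)):.4f}")
```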

Further comments. The concentration of Lipschitz functions for the uniform distribution on $\mathbb{S}^{n-1}$ dates back at least to Milman and Schechtman, as a consequence of the Lévy isoperimetric inequality for the uniform distribution on the sphere. In high dimension, as $n\to\infty$, it gives the concentration of Lipschitz functions for the standard normal distribution, which can also be deduced from the Gaussian isoperimetric inequality of Sudakov and Tsirelson, or from the logarithmic Sobolev inequality of Gross via the Herbst argument, or from the Talagrand transportation inequality, or from the infimum convolution inequality of Maurey. As we have explained, finer versions can be deduced, following Ibragimov, Sudakov, and Tsirelson [IST], by using stochastic calculus, or by using the concentration inequality of Pisier [P]. Following Bobkov, Götze, and Houdré [BGH, H], it can also be deduced quickly from a covariance representation put forward by Houdré and Pérez-Abreu [HPA], which can be seen as the Ornstein-Uhlenbeck case of the Hörmander-Helffer-Sjöstrand covariance representation formula. A spherical version is considered by Bobkov and Duggal in [BD]. Finer tail bounds are also considered by Aubrun, Jenkinson, and Szarek in [AJS].

The concentration of measure phenomenon is related to the notion of observable diameter developed by Mikhaïl Gromov in metric geometry, see for instance [G,S].

The idea of writing this post came after a question asked by Anthony Nguyen, a PhD student in signal processing at INRIA and École normale supérieure Paris-Saclay.

Further reading.

  • [Lé] Paul Lévy
    Problèmes concrets d'analyse fonctionnelle
    Gauthier-Villars, 1951
  • [MS] Vitali Davidovich Milman, Gideon Schechtman, with an appendix by Mikhaïl Gromov
    Asymptotic theory of finite dimensional normed spaces
    Springer, 1986
  • [G] Mikhaïl Gromov
    Metric structures for Riemannian and non-Riemannian spaces
    With appendices by Mikhaïl Katz, Pierre Pansu and Stephen Semmes
    Translated from the 1981 French original by Sean Michael Bates
    Reprint of the 2001 English edition. Birkhäuser 2007. xx+585pp
    Including the famous chapter 3½ on concentration of measure
  • [B] Christer Borell
    The Brunn-Minkowski inequality in Gauss space
    Inventiones mathematicae, 1975
  • [S] Takashi Shioya
    Metric Measure Geometry: Gromov's Theory of Convergence and Concentration of Metrics and Measures
    IRMA Lectures in Mathematics & Theoretical Physics, EMS 2016
  • [IST] Ildar Abdulovich Ibragimov, Vladimir Nikolaevich Sudakov, and Boris Semyonovich Tsirelson
    Norms of Gaussian sample functions
    Proceedings of the Third Japan-USSR Symposium on Probability Theory
    Lecture Notes in Math. 550, pp. 20-41. Springer (1976)
  • [ST] Vladimir Nikolaevich Sudakov, Boris Semyonovich Tsirelson
    Extremal properties of half-spaces for spherically invariant measures
    Journal of Soviet Mathematics, 1978
  • [P] Gilles Pisier
    Probabilistic methods in the geometry of Banach spaces
    Probability and Analysis, Lecture Notes in Math. 1206, pp. 167-241. Springer (1986)
  • [M] Bernard Maurey
    Some deviations inequalities
    Geometric & Functional Analysis (GAFA), 1(2), 188-197 (1991)
  • [T1] Michel Talagrand
    A new isoperimetric theorem and its application to concentration of measure phenomena
    Geometric & Functional Analysis (GAFA), 1(2), 211–223 (1991)
  • [T2] Michel Talagrand
    Transportation cost for Gaussian and other product measures
    Geometric & Functional Analysis (GAFA), 6(3), 587–600 (1996)
  • [HPA] Christian Houdré and Victor Pérez-Abreu
    Covariance identities and inequalities for functionals on Wiener and Poisson spaces
    Ann. Probab., 23, 400-419 (1995)
  • [BG] Sergey Germanovich Bobkov, Friedrich Götze
    Exponential integrability and transportation cost related to logarithmic Sobolev inequalities
    Journal of Functional Analysis, 163(1), 1–28 (1999)
  • [BGH] Sergey G. Bobkov, Friedrich Götze, and Christian Houdré
    On Gaussian and Bernoulli covariance representations
    Bernoulli 7 (2001), 439-451
  • [Le] Michel Ledoux
    The concentration of measure phenomenon
    AMS, 2001
  • [BV] Sergey G. Bobkov and Bruno Volzone
    On Gilles Pisier's approach to Gaussian concentration, isoperimetry, and Poincaré-type inequalities
    http://arxiv.org/abs/2311.03506
  • [H] Christian Houdré
    Covariance representation and an elementary proof of the Gaussian concentration inequality
    https://arXiv.org/abs/2410.06937
  • [AJS] Guillaume Aubrun, Justin Jenkinson, Stanislaw J. Szarek
    Optimal constants in concentration inequalities on the sphere and in the Gauss space
    https://arXiv.org/abs/2406.13581
  • [BD] Sergey G. Bobkov and Devraj Duggal
    Spherical covariance representations
    https://arXiv.org/abs/2403.19089
Christian Houdré