Some few moments with the problem of moments – Libres pensées d'un mathématicien ordinaire

\[ \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}\!x^{2n}e^{-\frac{x^2}{2}}\,dx =\frac{(2n)!}{n!2^n}. \]

This post is an invitation to the moments problem, taken from old notes. Let \( {P} \) be a probability distribution on \( {\mathbb{R}} \). The sequence of absolute moments \( {(M_n)} \) of \( {P} \) is given for every \( {n\in\mathbb{N}} \) by

\[ M_n=\int_{\mathbb{R}}\!|x|^n\,dP(x)\in[0,\infty]. \]

When \( {M_n<\infty} \), the associated moment \( {m_n} \) is given by

\[ m_n=\int_{\mathbb{R}}\!x^n\,dP(x), \]

and we have \( {|m_n|\leq M_n} \). The sequence of moments \( {(m_n)} \) of \( {P} \) is well defined if and only if \( {P} \) has finite absolute moments, in other words if and only if \( {\mathbb{R}[X]\subset\mathrm{L}^1(P)} \). Notice that \( {(M_n)=(m_n)} \) when \( {P} \) is supported in \( {\mathbb{R}_+} \). We say that a probability distribution \( {P} \) on \( {\mathbb{R}} \) (resp. \( {\mathbb{R}_+} \)) is characterized by its moments if and only if \( {P} \) is the unique probability distribution on \( {\mathbb{R}} \) (resp. \( {\mathbb{R}_+} \)) with sequence of moments \( {(m_n)} \). Moments problems go back probably to Tchebychev, Markov, and Stieltjes, and can be subdivided into several subproblems including:

existence. under which condition a sequence of real numbers \( {(m_n)} \) is the sequence of moments of a probability distribution?
uniqueness. under which condition a probability distribution is characterized by its moments?
structure. how to describe the convex set of all probability distributions sharing the same sequence of moments?

Notice that existence and uniqueness problems are in general sensitive to additional constraints. For the moments problem, uniqueness on \( {\mathbb{R}_+} \) does not imply uniqueness on \( {\mathbb{R}} \), whereas existence on \( {\mathbb{R}} \) does not imply existence on \( {\mathbb{R}_+} \). Stieltjes studied moments problems on \( {\mathbb{R}_+} \). He obtained a necessary and sufficient condition for existence, and studied uniqueness. His methods involve for instance continued fractions. Later, Hamburger continued the work of Stieltjes and studied moments problems on \( {\mathbb{R}} \). Actually, moments problems were studied by many people including among others Marcel Riesz, Krein, Hausdorff, Hamburger, and Carleman. Nowadays, there is less activity around moments problems, but one may read Simon and also Diaconis and Freedman for a revival.

1. Existence

The moments problem is a linear problem, with a positivity constraint. The following result is due to Hamburger.

Theorem 1 (Hamburger) A sequence \( {(m_n)} \) of real numbers with \( {m_0=1} \) is the sequence of moments of a probability distribution on the real line if and only if the infinite Hankel matrix
\[ H= \begin{pmatrix} m_0 & m_1 & m_2 & \cdots \\ m_1 & m_2 & m_3 & \cdots \\ m_2 & m_3 & m_4 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} \]
is positive semidefinite: \( {\sum_{n,n'}m_{n+n'}u_n\overline{u}_{n'}\geq0} \) for any sequence \( {(u_n)} \) of complex numbers such that \( {u_n=0} \) except for finitely many values of \( {n} \).

Notice that \( {H_{n,n'}=m_{n+n'}} \) and that the condition on \( {H} \) means that every finite principal submatrix of \( {H} \) is positive semidefinite. There exists a necessary and sufficient condition in terms of matrices for the Stieltjes moment problem. The reader will find more details on Stieltjes and Hamburger moments problems in Shohat and Tamarkin, Akhiezer, and Krein and Nudel’man.
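To make the Hankel condition concrete, here is a minimal sketch in Python that tests a finite truncation of \( {H} \) with exact rational arithmetic. The helper names (`hankel`, `det`, `leading_minors`, `gaussian_moment`) and the sequence `bad` are illustrative assumptions, not taken from the notes. The test used is Sylvester's criterion: strictly positive leading principal minors certify that the truncation is positive definite, while a negative minor shows that the sequence cannot be a moment sequence. Only finitely many moments are examined, so this is a necessary check and not a proof of existence.

```python
from fractions import Fraction
from math import factorial

def hankel(moments, size):
    """Truncated Hankel matrix H[i][j] = m_{i+j} for 0 <= i, j < size."""
    return [[Fraction(moments[i + j]) for j in range(size)] for i in range(size)]

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in matrix]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def leading_minors(moments, size):
    """Determinants of the top-left k x k blocks, for k = 1..size."""
    return [det(hankel(moments, k)) for k in range(1, size + 1)]

def gaussian_moment(n):
    """Standard Gaussian moments: 0 for odd n, (2k)!/(k! 2^k) for n = 2k."""
    if n % 2:
        return 0
    k = n // 2
    return factorial(2 * k) // (factorial(k) * 2 ** k)

gauss = [gaussian_moment(n) for n in range(7)]  # m_0, ..., m_6
bad = [1, 0, 1, 0, Fraction(1, 2), 0, 1]        # hypothetical: m_4 < m_2^2

print(leading_minors(gauss, 4))  # all strictly positive
print(leading_minors(bad, 3))    # the last minor is negative
```

The sequence `bad` fails because any probability distribution satisfies \( {m_4\geq m_2^2} \) by the Cauchy-Schwarz inequality, and the \( {3\times3} \) minor detects exactly this violation.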

2. Uniqueness

One can derive sufficient conditions for uniqueness in terms of Hankel determinants. We give in the sequel some aspects of a more general approach based on quasi-analytic functions. In the sequel, and unless explicitly mentioned, we consider uniqueness on \( {\mathbb{R}} \), not on \( {\mathbb{R}_+} \). Following Hausdorff, the density of the polynomials for the uniform topology (Weierstrass-Bernstein theorem) implies that any probability distribution \( {P} \) supported in a compact interval \( {[a,b]\subset\mathbb{R}} \) is characterized by its moments.

Remark 1 (A counter-example by C.C. Heyde) The log-normal distribution is not characterized by its moments. Recall that a real random variable \( {X} \) follows the log-normal distribution if and only if \( {\log(X)} \) follows the standard Gaussian distribution. The associated Lebesgue density is \( {f:x\mapsto (2\pi)^{-1/2}x^{-1}\exp(-(\log(x))^2/2)\mathbf{1}_{\mathbb{R}_+}(x)} \). Now, for any fixed real number \( {a\in[-1,+1]} \), let us consider the Lebesgue density \( {f_a} \) defined by \( {f_a(x)=f(x)(1+a\sin(2\pi\log(x)))} \) for every \( {x>0} \) and \( {f_a(x)=0} \) otherwise. It turns out that \( {f} \) and \( {f_a} \) share the same sequence of moments, see for instance page 227 in Feller.
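The claim of Remark 1 can be checked numerically. The sketch below is an illustration under stated assumptions, not Heyde's or Feller's argument: it substitutes \( {y=\log(x)} \), so that \( {x^n f_a(x)\,dx} \) becomes \( {e^{ny}\,(2\pi)^{-1/2}e^{-y^2/2}(1+a\sin(2\pi y))\,dy} \), divides by the log-normal moment \( {e^{n^2/2}} \) to avoid overflow, and integrates with the trapezoidal rule. The function name `scaled_moment` and the parameters `half_width` and `steps` are hypothetical choices.

```python
import math

def scaled_moment(n, a, half_width=12.0, steps=24000):
    """n-th moment of f_a divided by exp(n^2 / 2), after substituting y = log(x)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = n - half_width + i * h
        # exp(n*y - y^2/2) / exp(n^2/2) = exp(-(y - n)^2 / 2): a Gaussian centered at n
        gauss = math.exp(-0.5 * (y - n) ** 2) / math.sqrt(2.0 * math.pi)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end-point weights
        total += weight * gauss * (1.0 + a * math.sin(2.0 * math.pi * y))
    return total * h

for n in range(5):
    print(n, [scaled_moment(n, a) for a in (-1.0, 0.0, 0.5, 1.0)])
```

Each printed value should be numerically close to 1 whatever the value of \( {a} \), which reflects the fact that the perturbation \( {a\sin(2\pi\log(x))} \) does not change any moment of integer order: the shift \( {y\mapsto y+n} \) leaves \( {\sin(2\pi y)} \) invariant and the remaining integrand is odd.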

The following result is due to Tchakaloff, see also Bayer and Teichmann.

Theorem 2 (Tchakaloff, Bayer and Teichmann) Let \( {P} \) be a probability distribution on \( {\mathbb{R}^d} \) with some finite absolute moments: \( {\max_{1\leq k\leq n}M_k<\infty} \) for some integer \( {n\geq1} \). Let \( {\mathbb{R}_n[X]} \) be the vector space of polynomial functions of total degree less than or equal to \( {n} \). Then there exists a finitely supported probability distribution \( {P_k=p_1\delta_{x_1}+\cdots+p_k\delta_{x_k}} \) with \( {\mathrm{supp}(P_k)=\{x_1,\dots,x_k\}\subset\mathrm{supp}(P)} \) and \( {k\leq\mathrm{Dim}\mathbb{R}_n[X]} \) such that for every \( {f\in\mathbb{R}_n[X]} \),
\[ \int_{\mathbb{R}^d}\!f(x)\,dP(x) =\int_{\mathbb{R}^d}\!f(x)\,dP_k(x)=\sum_{i=1}^k p_if(x_i). \]

Proof: The proof uses basic separation properties of convex sets, the extremal points theorem of Minkowski-Carathéodory, and the transformation of the measure into a measure on the space of polynomials of bounded degree. This kind of result is also referred to as quadrature or cubature formulas. See also Putinar and Curto and Fialkow. There is a link between Hamburger moments problems and the density of polynomials in Lebesgue spaces, see for instance Stoyanov, Bakan, Bakan, Berg, Fialkow and Petrovic, and Stochel, Szafraniec, Putinar, Vasilescu. ☐
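As a one-dimensional illustration of Theorem 2, take for \( {P} \) the standard Gaussian distribution and \( {n=5} \). The classical three-point Gauss-Hermite rule gives a measure \( {P_3} \) with atoms at \( {-\sqrt{3},0,\sqrt{3}} \) and weights \( {1/6,2/3,1/6} \), which matches the Gaussian moments up to order 5, with \( {k=3\leq\mathrm{Dim}\mathbb{R}_5[X]=6} \). The sketch below only checks this particular example numerically; the names `nodes`, `weights`, `discrete_moment`, and `gaussian_moment` are illustrative.

```python
import math

# Atoms and weights of the three-point measure P_3 (illustrative example).
nodes = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
weights = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]

def discrete_moment(n):
    """n-th moment of P_3 = sum_i p_i * delta_{x_i}."""
    return sum(p * x ** n for p, x in zip(weights, nodes))

def gaussian_moment(n):
    """Standard Gaussian moments: 0 for odd n, (2k)!/(k! 2^k) for n = 2k."""
    if n % 2:
        return 0.0
    k = n // 2
    return math.factorial(2 * k) / (math.factorial(k) * 2 ** k)

for n in range(7):
    print(n, discrete_moment(n), gaussian_moment(n))
```

The two columns agree up to rounding for \( {n\leq5} \) and differ at \( {n=6} \), so \( {P_3} \) integrates polynomials of degree at most 5 exactly like the Gaussian, and no further.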

Theorem 3 (Analyticity of the Fourier transform and the moments problem) Let \( {P} \) be a probability distribution on \( {\mathbb{R}} \) with well defined moments \( {(m_n)} \) and Fourier transform \( {\varphi_P} \). The following propositions are equivalent.
\( {\varphi_P} \) is analytic on a neighborhood of the origin;
\( {\varphi_P} \) is analytic on \( {\mathbb{R}} \);
\( {\varlimsup_n \left(\frac{1}{n!}|m_n|\right)^{\frac{1}{n}}<\infty} \).
Moreover, if they hold true, then \( {P} \) is characterized by its moments \( {(m_n)} \). It is the case in particular when \( {P} \) is compactly supported or when
\[ \varlimsup_n \frac{1}{n}|m_n|^{\frac{1}{n}}<\infty. \]

Proof: For every \( {n} \), we have \( {M_n<\infty} \), and thus \( {\varphi_P} \) is \( {n} \) times differentiable on \( {\mathbb{R}} \). Moreover \( {\varphi^{(n)}_P} \) is continuous on \( {\mathbb{R}} \) and for every \( {t\in\mathbb{R}} \),

\[ \varphi^{(n)}_P(t)=\int_{\mathbb{R}}\!(ix)^ne^{itx}\,dP(x). \]

In particular, \( {\varphi_P^{(n)}(0)=i^n m_n} \), and the Taylor series of \( {\varphi_P} \) at the origin is determined by the sequence \( {(m_n)} \). Recall that the radius of convergence \( {r} \) of the power series \( {\sum_{n}a_n z^n} \) associated to the sequence of complex numbers \( {(a_n)} \) is given by the Hadamard formula

\[ r^{-1}=\varlimsup_n |a_n|^{\frac{1}{n}}, \]

and consequently, 1\( {\Leftrightarrow} \)3 (just take \( {a_n=i^n m_n/n!} \)). On the other hand, for any \( {n\in\mathbb{N}} \) and any \( {s,t\in\mathbb{R}} \), we have

\[ \left|e^{isx} \left(e^{itx}-1-\frac{itx}{1!}-\cdots-\frac{(itx)^{n-1}}{(n-1)!}\right)\right| \leq \frac{|tx|^n}{n!}, \]

see for instance p. 512 and 514 in Feller, and thus, for any \( {n\in\mathbb{N}} \) and any \( {s,t\in\mathbb{R}} \),

\[ \left|\varphi_P(s+t)-\varphi_P(s)-\frac{t}{1!}\varphi_P'(s)-\cdots-\frac{t^{n-1}}{(n-1)!}\varphi^{(n-1)}_P(s)\right| \leq M_n \frac{|t|^n}{n!}, \]

which implies 1 \( {\Leftrightarrow} \) 2. By the Stirling formula, if \( {\varlimsup_n \frac{1}{n}|m_n|^{\frac{1}{n}}<\infty} \) then condition 3 holds true. If \( {P} \) is compactly supported, then \( {\sup_n |m_n|^{\frac{1}{n}} <\infty} \) and thus condition 3 holds true. Suppose now that conditions 1–3 hold true. From condition 2, the analytic continuation principle states that \( {\varphi_P} \) admits a maximal simply connected analytic continuation to a neighborhood of \( {\mathbb{R}} \) in \( {\mathbb{C}} \), which is thus holomorphic. Next, the sequence of moments \( {(m_n)} \) uniquely characterizes the Taylor series at the origin, and thus uniquely characterizes the analytic continuation of \( {\varphi_P} \) by virtue of the isolated zeros theorem. In particular, the sequence of moments characterizes the function \( {\varphi_P} \) on \( {\mathbb{R}} \), and thus \( {P} \) by injectivity of the Fourier transform. ☐
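For the standard Gaussian distribution everything in Theorem 3 is explicit, which allows a quick numerical sanity check. The sketch below, with illustrative names `gaussian_moment`, `taylor_char_function`, and `hadamard_ratio`, sums the Taylor series \( {\sum_n (it)^n m_n/n!} \) built from the moments alone and compares it with the known Fourier transform \( {e^{-t^2/2}} \). It also evaluates the quantity \( {(|m_n|/n!)^{1/n}} \) of condition 3 for a few even \( {n} \). The truncation at 60 terms is an arbitrary assumption that is adequate for small \( {|t|} \).

```python
import math

def gaussian_moment(n):
    """Standard Gaussian moments: 0 for odd n, (2k)!/(k! 2^k) for n = 2k."""
    if n % 2:
        return 0
    k = n // 2
    return math.factorial(2 * k) // (math.factorial(k) * 2 ** k)

def taylor_char_function(t, terms=60):
    """Partial sum of sum_n (i t)^n m_n / n!, using the moments only."""
    total = 0j
    for n in range(terms):
        total += (1j * t) ** n * gaussian_moment(n) / math.factorial(n)
    return total

def hadamard_ratio(n):
    """(|m_n| / n!)^(1/n), the quantity appearing in condition 3."""
    return (gaussian_moment(n) / math.factorial(n)) ** (1.0 / n)

for t in (0.0, 0.5, 1.0, 2.0):
    print(t, taylor_char_function(t), math.exp(-t * t / 2.0))
print([hadamard_ratio(n) for n in (2, 10, 40)])
```

The ratios decrease with \( {n} \), which is consistent with an infinite radius of convergence for the Gaussian case.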

It turns out that a probability distribution on \( {\mathbb{R}} \) can be characterized by its moments without having an analytic Fourier transform. Actually, in view of the moments problem, the main useful property here regarding analyticity is that an analytic function on \( {\mathbb{R}} \) is uniquely determined by its value and the values of all its derivatives at the origin. Quasi-analytic functions have this property. These functions were introduced by Borel and Hadamard, and were later brilliantly studied by Denjoy and Carleman, see for instance the memoir of Carleman, or chapter 19 in Rudin and section 4.2 in Bélisle, Massé, and Ransford. The Carleman condition appearing below is strictly weaker than the Hadamard condition.

For any sequence of positive real numbers \( {(c_n)} \) and any bounded interval \( {[a,b]\subset\mathbb{R}} \), we denote by \( {\mathcal{C}([a,b],(c_n))} \) the class of infinitely differentiable functions \( {f:[a,b]\subset\mathbb{R}\rightarrow\mathbb{C}} \) such that

\[ \sup_{[a,b]}|f^{(n)}|\leq r^n c_n \]

for any \( {n\in\mathbb{N}} \) and for some positive real constant \( {r} \) which may depend on \( {f} \). The Hadamard problem consists in finding conditions on \( {(c_n)} \) such that any couple of functions \( {f} \) and \( {g} \) in \( {\mathcal{C}([a,b],(c_n))} \) that are equal together with all their derivatives at some fixed point of \( {[a,b]} \) are equal on the whole interval \( {[a,b]} \). Such functions are called quasi-analytic. The analytic functions on \( {[a,b]} \) correspond to the class \( {\mathcal{C}([a,b],(n!))} \).

Theorem 4 (Denjoy-Carleman characterization of quasi-analyticity) For any sequence of positive real numbers \( {(c_n)} \) and any bounded interval \( {[a,b]\subset\mathbb{R}} \), the class \( {\mathcal{C}([a,b],(c_n))} \) is quasi-analytic if and only if
\[ \sum_{n=1}^\infty \left(\inf_{k\geq n}|c_k|^\frac{1}{k}\right)^{-1}=\infty. \]

Analyticity implies quasi-analyticity but the converse is false. The Carleman condition is satisfied if the Hadamard condition is satisfied.

Corollary 5 (Carleman condition for the moments problem) Let \( {P} \) be a probability distribution on \( {\mathbb{R}} \) with finite absolute moments \( {(M_n)} \) and moments \( {(m_n)} \). If at least one of the following conditions is satisfied
\( {\sum_{n=1}^\infty {M_{2n}}^{-\frac{1}{2n}}=\infty} \);
\( {\sum_{n=1}^\infty {M_{n}}^{-\frac{1}{n}}=\infty} \);
\( {\sum_{n=1}^\infty |m_{n}|^{-\frac{1}{n}}=\infty} \)
then \( {P} \) is characterized by its moments.

Proof: We have \( {|m_n|\leq M_n} \) for every \( {n\in\mathbb{N}} \). On the other hand, the elementary bound

\[ 2|u|^{2n+1}\leq|u|^{2n}+|u|^{2n+2} \]

valid for every \( {u\in\mathbb{R}} \) and \( {n\in\mathbb{N}} \) implies that

\[ 2M_{2n+1}\leq M_{2n}+M_{2n+2} \]

for every \( {n\in\mathbb{N}} \). Consequently, we obtain the following cascading Carleman-like conditions:

\[ \sum_{n=1}^\infty {M_{2n}}^{-\frac{1}{2n}}=\infty \quad\Rightarrow\quad \sum_{n=1}^\infty {M_{n}}^{-\frac{1}{n}}=\infty \quad\Rightarrow\quad \sum_{n=1}^\infty|m_{n}|^{-\frac{1}{n}}=\infty. \]

They imply that \( {\varphi_P} \) is quasi-analytic, and that it is characterized by the sequence \( {(m_n)} \) since \( {\varphi_P^{(n)}(0)=i^n m_n} \) for every \( {n\in\mathbb{N}} \). The desired result follows then from the Denjoy-Carleman theorem with \( {c_n=|m_n|} \) by using the bound \( {\inf_{k\geq n} c_k^{\frac{1}{k}}\leq c_n^{\frac{1}{n}}} \). ☐

If \( {\varphi_P} \) is analytic on a neighborhood of the origin then the Hadamard condition \( {\varlimsup_n n^{-1}|m_n|^{\frac{1}{n}}<\infty} \) holds true and implies the Carleman condition \( {\sum_{n=1}^\infty |m_n|^{-\frac{1}{n}}=\infty} \).
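To get a feeling for the Carleman condition, one can compare partial sums of \( {\sum_n M_{2n}^{-1/(2n)}} \) for the standard Gaussian, where \( {M_{2n}=(2n)!/(n!2^n)} \), and for the log-normal distribution of Remark 1, where \( {M_k=e^{k^2/2}} \) and thus \( {M_{2n}^{-1/(2n)}=e^{-n}} \). The sketch below works with logarithms of moments to avoid overflow; the function names are illustrative assumptions. Finite partial sums can only suggest divergence or convergence, they do not prove it.

```python
import math

def carleman_partial_sum(log_even_moment, terms):
    """sum_{n=1}^{terms} M_{2n}^(-1/(2n)), given the map n -> log(M_{2n})."""
    return sum(math.exp(-log_even_moment(n) / (2 * n)) for n in range(1, terms + 1))

def log_gaussian_even_moment(n):
    # log of (2n)! / (n! 2^n), computed with lgamma to avoid overflow
    return math.lgamma(2 * n + 1) - math.lgamma(n + 1) - n * math.log(2.0)

def log_lognormal_even_moment(n):
    # log-normal: M_k = exp(k^2 / 2), hence log(M_{2n}) = 2 n^2
    return 2.0 * n * n

for terms in (10, 100, 1000, 10000):
    print(terms,
          carleman_partial_sum(log_gaussian_even_moment, terms),
          carleman_partial_sum(log_lognormal_even_moment, min(terms, 50)))
```

The Gaussian partial sums keep growing, roughly like \( {\sqrt{n}} \), while the log-normal ones stabilize near \( {1/(e-1)} \). The Carleman condition is only sufficient, so its failure for the log-normal distribution does not by itself prove non-uniqueness; that comes from the explicit family \( {f_a} \) of Remark 1.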

Let us move now to the multivariate moment problem. We define the sequence of absolute moments \( {(M_n)} \) of a probability distribution \( {P} \) on \( {\mathbb{R}^d} \) with \( {d>1} \) by

\[ M_n=\int_{\mathbb{R}^d}\!\left\Vert x\right\Vert^n\,dP(x)\in[0,\infty] \]

for any \( {n\in\mathbb{N}} \). If \( {P_{\left\Vert\cdot\right\Vert}} \) denotes the image distribution of \( {P} \) by the map \( {x\mapsto\left\Vert x\right\Vert} \), then \( {P_{\left\Vert\cdot\right\Vert}} \) is a probability distribution on \( {\mathbb{R}_+} \) with sequence of moments \( {(M_n)} \). By Hölder's inequality, if \( {M_n<\infty} \) for every \( {n\in\mathbb{N}} \), then for any multi-index \( {k\in\mathbb{N}^d} \), one can define the moment \( {m_k} \) of \( {P} \) by

\[ m_k=m_{k_1,\ldots,k_d}=\int_{\mathbb{R}^d}\!x_1^{k_1}\cdots x_d^{k_d}\,dP(x). \]

We say that \( {P} \) is characterized by its moments if and only if \( {P} \) is the unique probability distribution on \( {\mathbb{R}^d} \) with moments \( {(m_k)} \). Notice that when \( {M_n<\infty} \) for every \( {n\in\mathbb{N}} \), the Fourier transform \( {\varphi_P} \) is infinitely Fréchet differentiable on \( {\mathbb{R}^d} \) and for every \( {t\in\mathbb{R}^d} \) and \( {k\in\mathbb{N}^d} \),

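\[ \partial^{k_1}_{t_1}\cdots\partial^{k_d}_{t_d}\varphi_P(t_1,\ldots,t_d)= i^{k_1+\cdots+k_d}\int_{\mathbb{R}^d}\!x_1^{k_1}\cdots x_d^{k_d}e^{i\left<t,x\right>}\,dP(x), \quad\text{which equals } i^{k_1+\cdots+k_d}m_k \text{ at } t=0. \]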

It is delicate to make use of analyticity in dimension strictly bigger than \( {1} \) due to the lack of a multidimensional isolated zeros theorem (e.g. Hartogs-type phenomena). In some sense, the Carleman moments condition turns out to be more flexible. If \( {P} \) is a law on \( {\mathbb{R}^d} \) and \( {X\sim P} \) and \( {x\in\mathbb{R}^d} \) then we denote by \( {P_{\left<x\right>}} \) the law of the real random variable \( {\left<x,X\right>} \).

Theorem 6 (Multidimensional case) Let \( {P} \) be a probability distribution on \( {\mathbb{R}^d} \) with Fourier transform \( {\varphi_P} \) and finite absolute moments \( {(M_n)} \) and moments \( {(m_k)} \). If at least one of the following propositions holds true
\( {P_{\left<x\right>}} \) is characterized by its moments for every \( {x\in\mathbb{S}(\mathbb{R}^d)} \);
\( {P} \) satisfies the Carleman condition
\[ \sum_{n=1}^\infty \left(M_{n}\right)^{-\frac{1}{n}}=\infty; \]
for every \( {x\in\mathbb{S}(\mathbb{R}^d)} \), the function \( {t\in\mathbb{R}\mapsto\varphi_P(tx)} \) is analytic on \( {\mathbb{R}} \);
then \( {P} \) is characterized by its moments \( {(m_k)} \).

Proof: For every \( {x\in\mathbb{S}(\mathbb{R}^d)} \), the moments of the unidimensional probability distribution \( {P_{\left<x\right>}} \) are uniquely determined by the sequence \( {(m_k)} \) since for every \( {n\in\mathbb{N}} \),

\[ \int_{\mathbb{R}}\!u^n\,dP_{\left<x\right>}(u)=\int_{\mathbb{R}^d}\!\left<x,y\right>^n\,dP(y) =\sum_{k_1+\cdots+k_d=n}\binom{n}{k_1,\ldots,k_d} x_1^{k_1}\cdots x_d^{k_d}m_{k_1,\ldots,k_d}. \]

By the Cramér-Wold theorem, 1 \( {\Rightarrow} \) \( {P} \) is characterized by \( {(m_k)} \). 2\( {\Rightarrow} \)1. For every \( {x\in\mathbb{S}(\mathbb{R}^d)} \), the unidimensional probability distribution \( {P_{\left<x\right>}} \) satisfies in turn the Carleman condition since for every \( {n\in\mathbb{N}} \),

\[ \int_{\mathbb{R}}\!|u|^n\,dP_{\left<x\right>}(u)= \int_{\mathbb{R}^d}\!|\left<x,y\right>|^n\,dP(y) \leq \left\Vert x\right\Vert^{n}\int_{\mathbb{R}^d}\!\left\Vert y\right\Vert^{n}\,dP(y) =M_n, \]

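so that Corollary 5 applies to \( {P_{\left<x\right>}} \). 3\( {\Rightarrow} \)1. For every \( {x\in\mathbb{S}(\mathbb{R}^d)} \), we have \( {\varphi_{P_{\left<x\right>}}(t)=\varphi_P(tx)} \) for every \( {t\in\mathbb{R}} \), and Theorem 3 applies to \( {P_{\left<x\right>}} \). ☐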
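The first step of the proof, namely that the moments of each projection \( {P_{\left<x\right>}} \) are determined by the mixed moments \( {(m_k)} \), can be illustrated in dimension \( {d=2} \), where the multinomial expansion reduces to the binomial one. The sketch below uses a hypothetical finitely supported distribution (uniform on four arbitrary points) and an arbitrary direction; all names and numerical values are assumptions made for the illustration.

```python
import math

# Hypothetical distribution on R^2: uniform on these four points.
points = [(1.0, 0.5), (-0.3, 2.0), (0.7, -1.2), (-1.5, -0.4)]

def mixed_moment(k1, k2):
    """m_{k1,k2} = integral of y1^k1 * y2^k2 dP(y)."""
    return sum(y1 ** k1 * y2 ** k2 for y1, y2 in points) / len(points)

def projected_moment_direct(x, n):
    """n-th moment of <x, Y>, computed from the definition."""
    return sum((x[0] * y1 + x[1] * y2) ** n for y1, y2 in points) / len(points)

def projected_moment_from_mixed(x, n):
    """Same moment, rebuilt from the mixed moments via the binomial expansion."""
    return sum(
        math.comb(n, k) * x[0] ** k * x[1] ** (n - k) * mixed_moment(k, n - k)
        for k in range(n + 1)
    )

direction = (math.cos(0.7), math.sin(0.7))  # a unit vector of R^2
for n in range(7):
    print(n, projected_moment_direct(direction, n),
          projected_moment_from_mixed(direction, n))
```

Both columns agree up to rounding, so knowing \( {(m_k)} \) is enough to know the moments of every one-dimensional projection, which is what the Cramér-Wold argument then exploits.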

2 Comments

Djalil Chafaï 2012-02-27
More in these personal unpublished notes…
Ti 2024-05-30
Dear Prof. Chafaï,
thank you for your post, Existence and the associated counter example on the log normal are new to me… So the following question would seem a bit naive, anyway :
I was wondering if such results would exist with cumulants ?
On the moment problem, do you know any book that gather the theorems you mentioned ?
Kind regards
