Press "Enter" to skip to content

Libres pensées d'un mathématicien ordinaire Posts

The Sinkhorn factorization of a square matrix $A$ with positive entries takes the form $$A = D_1 S D_2$$ where $D_1$ and $D_2$ are diagonal matrices with positive diagonal entries and where $S$ is a doubly stochastic matrix meaning that it has entries in $[0,1]$ and each row and column sums up to one. It is named after Richard Dennis Sinkhorn (1934 – 1995) who worked on the subject in the 1960s. A natural algorithm for the numerical approximation of this factorization consists in iteratively normalize each row and each column, and this is referred to as the Sinkhorn-Knopp algorithm. Its convergence is fast. Actually this is a special case of the iterative proportional fitting (IPF) algorithm, classic in probability and statistics, which goes back at least to the analysis of the contingency tables in the 1920s, and which was reinvented many times, see for instance the article by Friedrich Pukelsheim and references therein. It is in particular well known that $S$ is the projection of $A$, with respect to the Kullback-Leibler divergence, on the set of doubly stochastic matrices (Birkhoff polytope). The Sinkhorn factorization and its algorithm became fashionable in the domain of computational optimal transportation due to a relation with entropic regularization. It is also considered in the domain of quantum information theory.

Here is a simple implementation using the Julia programming language.

using LinearAlgebra # for Diagonal
#
function sinkhorn(A, tol = 1E-6, maxiter = 1E6)
# Sinkhorn factorization A = D1 S D2 where D1 and D2 are diagonal and S doubly
# stochastic using Sinkhorn-Knopp algorithm which consists in iterative rows
# and columns sums normalizations. This code is not optimized.
m , n = size(A,1), size(A,2)
D1, D2 = ones(1,n), ones(1,n)
S = A
iter, err = 0, +Inf
while ((err > tol) && (iter < maxiter))
RS = vec(sum(S, dims = 2)) # column vector of rows sums
D1 = D1 .* RS
S = Diagonal(1 ./ RS) * S
CS = vec(sum(S, dims = 1)) # row vector of columns sums
D2 = CS .* D2
S = S * Diagonal(1 ./ CS)
iter += 1
err = norm(RS .- 1) + norm(CS .- 1)
end
return S, Diagonal(D1), Diagonal(D2), iter
end # function


Circular law. Suppose now that $A$ is a random matrix, $n\times n$, with independent and identically distributed positive entries of mean $m$ and variance $\sigma^2$, for instance following the uniform or the exponential distribution. Let $S$ be the random doubly stochastic matrix provided by the Sinkhorn factorization. It is tempting to conjecture that the empirical spectral distribution of $$\frac{m\sqrt{n}}{\sigma}S$$ converges weakly as $n\to\infty$ to the uniform distribution on the unit disc of the complex plane, with a single outlier equal to $\frac{m\sqrt{n}}{\sigma}$. The circular law for $S$ is inherited from the circular law for $A$ and the law of large numbers. This can be easily guessed from the first iteration of the Sinkhorn-Knopp algorithm, that reminds the circular law analysis of random Markov matrices inspired from the Dirichlet Markov Ensemble and its decomposition. The circular law for $A$ is a universal high dimensional phenomenon that holds as soon as the variance of the entries is finite. One can also explore the case of heavy tails, and wonder if the circular law for $S$ still remains true!

The following Julia code produces the graphics at the top and the bottom of this post.

using Printf # for @printf
using Plots # for scatter and plot
using LinearAlgebra # for LinRange

function sinkhornplot(A,m,sigma,filename)
S, D1, D2, iter = sinkhorn(A)
@printf("%s\n",filename)
@printf(" ‖A-D1 S D2‖ = %s\n",norm(A-D1*S*D2))
@printf(" ‖RowsSums(S)-1‖ = %s\n",norm(sum(S,dims=2).-1))
@printf(" ‖ColsSums(S)-1‖ = %s\n",norm(sum(S,dims=1).-1))
@printf(" SK iterations = %d\n",iter)
spec = eigvals(S) * m * sqrt(size(S,1)) / sigma
maxval, maxloc = findmax(abs.(spec))
deleteat!(spec, maxloc)
scatter(real(spec),imag(spec),aspect_ratio=:equal,legend=false)
x = LinRange(-1,1,100)
plot!(x,sqrt.(1 .- x.^2),linewidth=2,linecolor=:steelblue)
plot!(x,-sqrt.(1 .- x.^2),linewidth=2,linecolor=:steelblue)
savefig(filename)
end #function

sinkhornplot(rand(500,500),1/2,1/sqrt(12),"sinkhorn-unif.png")
sinkhornplot(-log.(rand(500,500)),1,1,"sinkhorn-expo.png")
sinkhornplot(abs.(randn(500,500)./randn(500,500)),1,1,"sinkhorn-heavy.png")


Example of program output.

sinkhorn-unif.png
‖A-D1 S D2‖ = 6.704194265149758e-14
‖RowsSums(S)-1‖ = 8.209123420903459e-11
‖ColsSums(S)-1‖ = 3.326966272203901e-15
SK iterations = 4
sinkhorn-expo.png
‖A-D1 S D2‖ = 1.9956579227110527e-13
‖RowsSums(S)-1‖ = 5.155302149020719e-11
‖ColsSums(S)-1‖ = 3.3454392974351074e-15
SK iterations = 5
sinkhorn-heavy.png
‖A-D1 S D2‖ = 8.423054036449791e-10
‖RowsSums(S)-1‖ = 4.5152766460217607e-7
‖ColsSums(S)-1‖ = 3.8778423131653425e-15
SK iterations = 126


Further reading.

This post is devoted to some aspects of the Landen transformation, essentially ${x\mapsto(1-x)/(1+x)}$ or ${x\mapsto 4x/(1+x)^2}$, used for certain special functions. It was introduced by John Landen (1719 — 1790) for expressing a hyperbolic arc in terms of two elliptic arcs. It is useful for numerical evaluation. Since its invention, the infinitesimal calculus was systematically used in order to compute geometric quantities such as for instance the length of an arc of ellipse and in particular the circumference of an ellipse. There is no closed formula like in the special case of the circle, and this leads to integrals known as elliptic integrals. Nowadays, they belong to the vast zoo of special functions.

Complete elliptic integrals of first and second kind. Given, for ${\rho\in[0,1]}$, by

$K(\rho) :=\int_0^{\frac{\pi}{2}}\frac{\mathrm{d}\theta}{\sqrt{1-\rho\sin^2(\theta)}} =\int_0^1\frac{\mathrm{d}t}{\sqrt{1-\rho t^2}\sqrt{1-t^2}}$

and

$E(\rho):=\int_0^{\frac{\pi}{2}}\sqrt{1-\rho\sin^2(\theta)}\mathrm{d}\theta =\int_0^1\frac{\sqrt{1-\rho t^2}}{\sqrt{1-t^2}}\mathrm{d}t.$

The incomplete elliptic integrals are given by the same formula after replacing ${\frac{\pi}{2}}$ by an arbitrary angle. The inverse of incomplete elliptic integrals are known as elliptic functions. Geometrically, the length of the arc of an ellipse can be expressed using the elliptic integral of the second kind, while the surface measure of an ellipsoid involves a combination of elliptic integrals of first and second kind. Elliptic integrals and functions appear at many places in mathematics, physics, and engineering, and were studied by several mathematicians, including historically, among others, Leonhard Euler (1707 — 1783), Adrien-Marie Legendre (1752 — 1833), Johann Carl Friedrich Gauss (1777 — 1855), Niels Henrik Abel (1802 — 1829), Carl Gustav Jacob Jacobi (1804 — 1851), Karl Weierstrass (1815 — 1897), and Arthur Cayley (1821 — 1895). Elliptic integrals and functions, are, as most classical special functions, very well known by software packages specialized in mathematics such as Maple and Mathematica.

Landen transformation. For all ${x\in[0,1]}$,

$K\Bigr(\Bigr(\frac{1-x}{1+x}\Bigr)^2\Bigr) =\frac{1+x}{2}K(1-x^2)$

and

$E\Bigr(\Bigr(\frac{1-x}{1+x}\Bigr)^2\Bigr) =\frac{1}{1+x}E(1-x^2)+\frac{2x}{(1+x)^2}K\Bigr(\Bigr(\frac{1-x}{1+x}\Bigr)^2\Bigr).$

Reformulation. If ${x_1:=\frac{1-x}{1+x}}$ then ${1+x_1=\frac{2}{1+x}}$ and ${\frac{4x_1}{(1+x_1)^2}=1-x^2}$, and we get

$K\Bigr(\frac{4x_1}{(1+x_1)^2}\Bigr) = (1+x_1)K(x_1^2).$

Note that ${x_1}$ runs over ${[0,1]}$ when ${x}$ runs over ${[0,1]}$. Similarly we get

$E\Bigr(\frac{4x_1}{(1+x_1)^2}\Bigr) = \frac{2}{1+x_1}E(x_1^2)-(1-x_1)K(x_1^2).$

For all ${x\geq1}$, the identity ${\frac{4x}{(1+x)^2}=\frac{4x^{-1}}{(1+x^{-1})^2}}$ allows to express ${K}$ and ${E}$ at ${\frac{4x}{(1+x)^2}}$ for all ${x\geq1}$.

Ivory derivation via hypergeometric series. It is possible to derive the formulas by using powerful change of variables, presented later on in this post, which remain valid for more general formulas for incomplete elliptic integrals. Nevertheless, following James Ivory (1765 — 1842), for complete elliptic integrals, it is more efficient to proceed by using a hypergeometric series expansion. Namely, by using the trick

$1+2x\cos(\alpha)+x^2=(1+x\mathrm{e}^{\mathrm{i}\alpha})(1+x\mathrm{e}^{-\mathrm{i}\alpha})$

and the Newton binomial theorem

$\frac{1}{(1-z)^{\alpha}}=\sum_{n=0}^\infty\frac{(\alpha)_n}{n!}z^n,$

where ${(\alpha)_n:=\alpha(\alpha+1)\cdots(\alpha+n-1)}$ is the rising factorial, we get, for ${0\leq x<1}$,

$\begin{array}{rcl} K\Bigr(\frac{4x}{(1+x)^2}\Bigr) &=&\frac{1}{2}\int_0^{\pi} \Bigr(\Bigr(1-\frac{4x}{(1+x)^2}\sin^2(\theta)\Bigr)^{-1/2} \mathrm{d}\theta\\ &=&\frac{1}{2}\int_0^{\pi} \Bigr(\Bigr(1-\frac{2x}{(1+x)^2}(1-\cos(2\theta))\Bigr)^{-1/2} \mathrm{d}\theta\\ &=&\frac{1+x}{2}\int_0^{\pi} \Bigr(1+x^2+2x\cos(2\theta)\Bigr)^{-1/2} \mathrm{d}\theta\\ &=&\frac{1+x}{2}\int_0^{\pi} \Bigr(1+x\mathrm{e}^{2\mathrm{i}\theta}\Bigr)^{-1/2} \Bigr(1+x\mathrm{e}^{-2\mathrm{i}\theta}\Bigr)^{-1/2} \mathrm{d}\theta\\ &=&\frac{1+x}{2}\sum_{m=0}^\infty \frac{(\frac{1}{2})_m(-x)^m}{m!} \sum_{n=0}^\infty \frac{(\frac{1}{2})_n(-x)^n}{n!} \int_0^{\pi}\mathrm{e}^{2\mathrm{i}(m-n)\theta}\mathrm{d}\theta\\ &=&\frac{\pi}{2}(1+x)\sum_{n=0}^\infty\frac{(\frac{1}{2})_n^2x^{2n}}{n!^2}\\ &=&(1+x)\frac{\pi}{2}F_{2,1}\Bigr(\frac{1}{2},\frac{1}{2};1;x^2\Bigr)\\ &=&(1+x)K(x^2). \end{array}$

This can be seen as a formula for hypergeometric series: if ${0\leq x<1}$ then

$F_{2,1}\Bigr(\frac{1}{2},\frac{1}{2};1;\frac{4x}{(1+x)^2}\Bigr) =(1+x)F_{2,1}\Bigr(\frac{1}{2},\frac{1}{2};1;x^2\Bigr).$

Similarly, using in the last step

$K(x^2)=F_{2,1}\Bigr(\frac{1}{2},\frac{1}{2};1;x^2\Bigr) \quad\text{and}\quad E(x^2)=F_{2,1}\Bigr(-\frac{1}{2},\frac{1}{2};1;x^2\Bigr),$

we get

$\begin{array}{rcl} (1+x)E\Bigr(\frac{4x}{(1+x)^2}\Bigr) &=&\frac{1+x}{2}\int_0^{\pi} \Bigr(1-\frac{4x}{(1+x)^2}\sin^2(\theta)\Bigr)^{1/2} \mathrm{d}\theta\\ &=&\frac{1+x}{2}\int_0^{\pi} \Bigr(\Bigr(1-\frac{2x}{(1+x)^2}(1-\cos(2\theta))\Bigr)^{1/2} \mathrm{d}\theta\\ &=&\frac{1}{2}\int_0^{\pi} \Bigr(1+x^2+2x\cos(2\theta)\Bigr)^{1/2} \mathrm{d}\theta\\ &=&\frac{1}{2}\int_0^{\pi} \Bigr(1+x\mathrm{e}^{2\mathrm{i}\theta}\Bigr)^{1/2} \Bigr(1+x\mathrm{e}^{-2\mathrm{i}\theta}\Bigr)^{1/2} \mathrm{d}\theta\\ &=&\frac{1}{2}\sum_{m=0}^\infty \frac{(-\frac{1}{2})_m(-x)^m}{m!} \sum_{n=0}^\infty \frac{(-\frac{1}{2})_n(-x)^n}{n!} \int_0^{\pi}\mathrm{e}^{2\mathrm{i}(m-n)\theta}\mathrm{d}\theta\\ &=&\frac{\pi}{2}\sum_{n=0}^\infty\frac{(-\frac{1}{2})_n^2x^{2n}}{n!^2}\\ &=&\frac{\pi}{2}F_{2,1}\Bigr(-\frac{1}{2},-\frac{1}{2};1;x^2\Bigr)\\ &=&\frac{\pi}{2}\Bigr(2F_{2,1}\Bigr(-\frac{1}{2},\frac{1}{2};1;x^2\Bigr) -(1-x^2)F_{2,1}\Bigr(\frac{1}{2},\frac{1}{2};1;x^2\Bigr)\Bigr)\\ &=&2E(x^2)-(1-x^2)K(x^2). \end{array}$

This corresponds to the hypergeometric identity

$\begin{array}{rcl} (1+x)F_{2,1}\Big(-\frac{1}{2},\frac{1}{2};1;\frac{4x}{(1+x)^2}\Bigr) &=&F_{2,1}\Bigr(-\frac{1}{2},-\frac{1}{2};1;x^2\Bigr)\\ &=&2F_{2,1}\Bigr(-\frac{1}{2},\frac{1}{2};1;x^2\Bigr) -(1-x^2)F_{2,1}\Bigr(\frac{1}{2},\frac{1}{2};1;x^2\Bigr). \end{array}$

Invariance of Cayley elliptic integral. For all ${a,b>0}$, the Cayley elliptic integral

$I(a,b) :=\int_0^{\frac{\pi}{2}} \frac{1}{\sqrt{a^2\cos^2(\theta)+b^2\sin^2(\theta)}} \mathrm{d}\theta$

is left unchanged if we replace ${a,b}$ by their arithmetic and geometric means, namely

$I(a,b)=I\left(\frac{a+b}{2},\sqrt{ab}\right).$

Similarly if we define

$J(a,b):= \int_0^{\frac{\pi}{2}} \sqrt{a^2\cos^2(\theta)+b^2\sin^2(\theta)} \mathrm{d}\theta$

then

$2J\Bigr(\frac{a+b}{2},\sqrt{ab}\Bigr) =J(a,b) + ab I(a,b)$

Link with Landen transformation for ${K}$. If ${a>b>0}$ then

$I(a,b) =\int_0^{\frac{\pi}{2}} \frac{1}{\sqrt{a^2-(a^2-b^2)\sin^2(\theta)}} \mathrm{d}\theta =\frac{1}{a}K\left(\frac{a^2-b^2}{a^2}\right)$

and

$I\left(\frac{a+b}{2},\sqrt{ab}\right) =\frac{1}{\frac{a+b}{2}} K\left(\frac{\left(\frac{a+b}{2}\right)^2-b^2}{\left(\frac{a+b}{2}\right)^2}\right) =\frac{2}{a+b}K\left(\frac{(a+b)^2-4b^2}{(a+b)^2}\right) =\frac{2}{a+b}K\left(\left(\frac{a-b}{a+b}\right)^2\right),$

and thus, by the invariance of Cayley elliptic integrals,

$K\left(\left(\frac{a-b}{a+b}\right)^2\right) =\frac{a+b}{2a}K\left(\frac{a^2-b^2}{a^2}\right).$

Setting ${x:=\frac{a-b}{a+b}}$ we get ${1+x=\frac{2a}{a+b}}$, ${\frac{4x}{(1+x)^2}=a^2-b^2}$, and, when ${a=1}$,

$(1+x)K(x^2) = K\left(\frac{4x}{(1+x)^2}\right).$

We have thus obtained, from the invariance formula for the Cayley integral ${I(a,b)}$, an alternative proof of the Landen transform formula involving ${K}$. Similarly, the invariance formula for ${J(a,b)}$ leads to the Landen transform formula involving ${E}$ and ${K}$.

Link with arithmetic-geometric mean. If we define ${a_0:=a}$, ${b_0=b}$, and

$a_{n+1}:=\frac{a_n+b_n}{2} \quad\text{and}\quad b_{n+1}:=\sqrt{a_nb_n}$

for all ${n\geq0}$, then it can be shown that

$b=b_0\leq b_1\leq\cdots\leq b_{n+1}\leq a_{n+1}\leq\cdots\leq a_1\leq a_0=a$

and that actually both sequences converge as ${n\rightarrow\infty}$ to a common limit, the arithmetic-geometric mean (AGM) ${M(a,b)}$. Indeed, if ${c_n:=\sqrt{a_n^2-b_n^2}}$ then

$c_{n+1}=\frac{a_n-b_n}{2}\quad\text{and}\quad c_n^2=(a_n-b_n)(a_n+b_n)=4c_{n+1}a_{n+1},$

hence ${c_n}$ decreases to ${0}$ as ${n\rightarrow\infty}$ and

$M(a,b):= \lim_{n\rightarrow\infty}a_n=\lim_{n\rightarrow\infty}b_n.$

Using the invariance of Cayley integrals and observing that ${(a,b)\mapsto I(a,b)}$ is continuous and ${I(c,c)=\frac{\pi}{2c}}$ for all ${c}$, we get

$I(a,b)=I(M(a,b),M(a,b))=\frac{\pi}{2M(a,b)}.$

Moreover since ${I(a,b)=\frac{1}{a}K\Bigr(\frac{a^2-b^2}{a^2}\Bigr)}$ we get

$M(a,b)=\frac{a\pi}{K\Big(\frac{a^2-b^2}{a^2}\Bigr)}.$

If we set ${x:=\frac{\sqrt{a^2-b^2}}{a}}$ then using

$M(a,b)=a^2M\Bigr(1,\frac{b}{a}\Bigr)=M(1,\sqrt{1-x^2})=M(1-x,1+x)$

we get

$M(1-x,1+x)=\frac{\pi}{2K(x^2)}.$

Proof of the invariance of Cayley elliptic integrals. The change of variable

$b\tan\theta=x$

gives ${\mathrm{d}\theta=\frac{\cos^2(\theta)}{b}\mathrm{d}x}$, and since ${\cos^2(\theta)=\frac{b^2}{x^2+b^2}}$,

$I(a,b) =\int_0^\infty \frac{1}{\sqrt{\cos^2(\theta)}\sqrt{(a^2+x^2)}}\frac{b}{x^2+b^2} \mathrm{d}x =\int_0^\infty\frac{1}{\sqrt{(a^2+x^2)(x^2+b^2)}} \mathrm{d}x.$

The further substitution ${x=t+\sqrt{t^2+ab}}$ satisfies

$\mathrm{d}x=(1+\frac{t}{\sqrt{t^2+ab}})\mathrm{d}t=\frac{x}{\sqrt{t^2+ab}}\mathrm{d}t,$

and the identity

$\sqrt{(x^2+a^2)(x^2+b^2)} =2x\sqrt{t^2+\left(\frac{a+b}{2}\right)^2},$

which can be checked using the simpler identity ${x^2=2tx+ab}$, gives

$I(a,b) = \frac{1}{2}\int_{-\infty}^\infty \frac{1}{\sqrt{t^2+\left(\frac{a+b}{2}\right)^2}\sqrt{t^2+ab}} \mathrm{d}t = \int_0^\infty \frac{1}{\sqrt{\left(t^2+\left(\frac{a+b}{2}\right)^2\right) \left(t^2+\sqrt{ab}^2\right)}} \mathrm{d}t$

which is equal to ${I\left(\frac{a+b}{2},\sqrt{ab}\right)}$ according to the preceding formula with ${\frac{a+b}{2},\sqrt{ab}}$ instead of ${a,b}$. Alternatively, following Jean-Pierre Demailly (1957 –), using the change of variable

$\varphi=\theta+\arctan\Bigr(\frac{b}{a}\tan(\theta)\Bigr)$

which is an increasing bijection from ${[0,\pi/2)}$ to ${[0,\pi)}$, we have

$\frac{\mathrm{d}\varphi}{\mathrm{d}\theta} =1+\frac{\frac{b}{a}(1+\tan^2(\theta))}{1+\frac{b^2}{a^2}\tan^2(\theta)} =\frac{(a+b)(a\cos^2(\theta)+b\sin^2(\theta))}{a^2\cos^2(\theta)+b^2\sin^2(\theta)}.$

On the other hand

$\varphi=\theta+\alpha\quad\text{where}\quad\alpha:=\arctan\Bigr(\frac{b}{a}\tan(\theta)\Bigr)\in[0,\pi/2).$

We have ${\tan(\alpha)=\frac{b}{a}\tan(\theta)}$ and since

$\cos(\alpha)=\frac{1}{\sqrt{1-\tan^2(\alpha)}} \quad\text{and}\quad \sin(\alpha)=\frac{\tan(\alpha)}{\sqrt{1+\tan^2(\alpha)}}$

we get

$\begin{array}{rcl} \cos(\varphi) &=&\cos(\theta)\cos(\alpha)-\sin(\theta)\sin(\alpha)\\ &=&\cos(\theta)\frac{1}{\sqrt{1+\Bigr(\frac{b}{a}\Bigr)^2\tan^2(\theta)}} -\sin(\theta)\frac{\frac{b}{a}\tan(\theta)}{\sqrt{1+\Bigr(\frac{b}{a}\Bigr)^2\tan^2(\theta)}} =\frac{a\cos^2(\theta)-b\sin^2(\theta)}{\sqrt{a^2\cos^2(\theta)+b^2\sin^2(\theta)}} \end{array}$

and

$\begin{array}{rcl} \sin(\varphi) &=&\sin(\theta)\cos(\alpha)+\cos(\theta)\sin(\alpha)\\ &=&\sin(\theta)\frac{1}{\sqrt{1+\Bigr(\frac{b}{a}\Bigr)^2\tan^2(\theta)}} +\cos(\theta)\frac{\frac{b}{a}\tan(\theta)}{\sqrt{1+\Bigr(\frac{b}{a}\Bigr)^2\tan^2(\theta)}} =\frac{(a+b)\sin(\theta)\cos(\theta)}{\sqrt{a^2\cos^2(\theta)+b^2\sin^2(\theta)}}. \end{array}$

This gives

$\begin{array}{rcl} a_1^2\cos^2(\varphi)+b_1^2\sin^2(\varphi) &=&\Bigr(\frac{a+b}{2}\Bigr)^2\Bigr(\cos^2(\varphi)+\frac{4ab}{(a+b)^2}\sin^2(\varphi)\Bigr)\\ &=&\Bigr(\frac{a+b}{2}\Bigr)^2 \frac{(a\cos^2(\theta)-b\sin^2(\theta))^2+4ab\sin^2(t)\cos^2(t)} {a^2\cos^2(\theta)+b^2\sin^2(\theta)}\\ &=&\Bigr(\frac{a+b}{2}\Bigr)^2 \frac{(a\cos^2(\theta)+b\sin^2(\theta))^2} {a^2\cos^2(\theta)+b^2\sin^2(\theta)}. \end{array}$

In other words, setting ${\Delta(\theta):=\sqrt{a^2\cos^2(\theta)+b^2\sin^2(\theta)}}$, we get

$\Delta_1(\varphi) :=\sqrt{a_1^2\cos^2(\varphi)+b_1^2\sin^2(\varphi)} =\frac{a+b}{2}\frac{a\cos^2(\varphi)+b\sin^2(\theta)}{\sqrt{a^2\cos^2(\theta)+b^2\sin^2(\theta)}} =\frac{a+b}{2}\frac{a\cos^2(\theta)+b\sin^2(\theta)}{\Delta(\theta)}$

and thus, combining this with a formula above for ${\mathrm{d}\varphi/\mathrm{d}\theta}$ we obtain

$\frac{1}{2}\frac{\mathrm{d}\varphi}{\Delta_1(\varphi)} =\frac{1}{2}\frac{\mathrm{d}\varphi}{\sqrt{a_1^2\cos^2(\varphi)+b_1^2\sin^2(\varphi)}} =\frac{\mathrm{d}\theta}{\sqrt{a^2\cos^2(\theta)+b^2\sin^2(\theta)}} =\frac{\mathrm{d}\theta}{\Delta(\theta)}$

hence finally

$I\Big(\frac{a+b}{2},\sqrt{ab}\Bigr)=I(a,b).$

Next, for the formula concerning ${J}$, we observe that

$\Delta_1(\varphi)+\frac{a-b}{2}\cos(\varphi) =\frac{a^2\cos^2(\theta)+b^2\sin^2(\theta)}{\sqrt{a^2\sin^2(\theta)+b^2\cos^2(\theta)}} =\Delta(\theta)$

and

$\Delta_1(\varphi)-\frac{a-b}{2}\cos(\varphi) =\frac{ab\cos^2(\theta)+ab\sin^2(\theta)}{\sqrt{a^2\sin^2(\theta)+b^2\cos^2(\theta)}} =\frac{ab}{\Delta(\theta)},$

hence

$2\Delta_1(\varphi)=\Delta(\theta)+\frac{ab}{\Delta(\theta)},$

while a formula above reads ${\mathrm{d}\varphi=2\frac{\Delta_1(\varphi)}{\Delta(\theta)}\mathrm{d}\theta}$ hence

$\Delta_1(\varphi)\mathrm{d}\varphi +\frac{a-b}{2}\cos(\varphi)\mathrm{d}\varphi =\Delta(\theta)\mathrm{d}\varphi =2\Delta_1(\varphi)\mathrm{d}\theta =\Bigr(\Delta(\theta)+\frac{ab}{\Delta(\theta)}\Bigr)\mathrm{d}\theta.$

Finally, since ${\int_0^{2\pi}\cos(\theta)\mathrm{d}\theta=0}$ we get as expected

$2J(a_1,b_1)=J(a,b)+abI(a,b).$

Historical proof by change of variable due to Landen. This proof works also for incomplete elliptic integrals, and admits a geometrical interpretation. Let

$x\in[0,1],\quad x’=\sqrt{1-x^2},\quad x_1=\frac{1-x’}{1+x’}.$

We start from the formula

$2K(x_1^2) =\int_0^{\pi}\frac{\mathrm{d}\theta_1}{\sqrt{1-x_1^2\sin^2(\theta_1)}}.$

We would like to set ${x_1\sin(\theta_1)=\sin(\alpha)}$ in such a way that ${x_1^2\sin^2(\theta_1)=\sin^2(\alpha)}$. But it turns out that it is better to use another change of variable, namely replace ${\theta_1}$ by ${\theta}$ with

$2\theta=\theta_1+\alpha=\theta_1+\arcsin(x_1\sin(\theta_1))$

where ${\arcsin}$ takes its values in ${[0,\pi/2]}$. Then ${\theta}$ runs over ${[0,\pi/2]}$ when ${\theta_1}$ runs over ${[0,\pi]}$, the derivative of the formula with respect to ${\theta_1}$ is ${1+\cos(\theta_1)/\sqrt{1-x_1^2\sin^2(\theta_1)}}$, which vanishes only for ${\theta_1=\pi}$. The crucial point now is the identity

$(1+x_1)\sqrt{1-x^2\sin^2(\theta)} =\sqrt{1-x_1^2\sin^2(\theta_1)} +x_1\cos(\theta_1)$

Now we get

$(1+x_1)x^2 \frac{\sin(\theta)\cos(\theta)} {\sqrt{1-x^2\sin^2(\theta)}} \mathrm{d}\theta = \left( x_1^2 \frac{\sin(\theta_1)\cos(\theta_1)} {\sqrt{1-x_1^2\sin^2(\theta_1)}} +x_1\sin(\theta_1) \right)\mathrm{d}\theta_1$

hence

$\frac{1+x_1}{2x_1}x^2 \frac{\sin(2\theta)} {\sqrt{1-x^2\sin^2(\theta)}} \mathrm{d}\theta = \left( \frac{x_1\sin(\theta_1)\cos(\theta_1) +\sin(\theta_1)\sqrt{1-x_1^2\sin^2(\theta_1)}} {\sqrt{1-x_1^2\sin^2(\theta_1)}} \right)\mathrm{d}\theta_1$

hence

$(1+x’) \frac{\sin(2\theta)} {\sqrt{1-x^2\sin^2(\theta)}} \mathrm{d}\theta = \left( \frac{\sin(\alpha)\cos(\theta_1) +\sin(\theta_1)\cos(\alpha)} {\sqrt{1-x_1^2\sin^2(\theta_1)}} \right)\mathrm{d}\theta_1$

hence

$(1+x’)\frac{\sin(2\theta)} {\sqrt{1-x^2\sin^2(\theta)}} \mathrm{d}\theta = \left( \frac{\sin(\alpha+\theta_1)} {\sqrt{1-x_1^2\sin^2(\theta_1)}} \right)\mathrm{d}\theta_1$

and finally

$(1+x’) \frac{1} {\sqrt{1-x^2\sin^2(\theta)}} \mathrm{d}\theta = \frac{1} {\sqrt{1-x_1^2\sin^2(\theta_1)}} \mathrm{d}\theta_1.$

Integrating over the range of ${\theta}$ and ${\theta_1}$ we obtain

$(1+x’) \int_0^{\frac{\pi}{2}} \frac{\mathrm{d}\theta} {\sqrt{1-x^2\sin^2(\theta)}} = \int_0^{\pi} \frac{\mathrm{d}\theta_1} {\sqrt{1-x_1^2\sin^2(\theta_1)}}$

which gives ${(1+x’)K(x^2)=2K(x_1^2)}$, in other words ${(1+x’)K(1-x’^2)=2K(x_1^2)}$ which is the desired formula with ${x’}$ instead of ${x}$. To get the formula involving ${E}$, we write

$1-x^2\sin^2(\theta) =1-x^2\frac{1-\cos(2\theta)}{2} =1-\frac{x^2}{2}+\frac{x^2}{2}\cos(2\theta),$

and since

$\cos(2\theta) =\cos(\theta_1)\cos(\alpha)-\sin(\theta_1)\sin(\alpha) =\cos(\theta_1)\sqrt{1-x_1^2\sin^2(\theta_1)}-x_1\sin^2(\theta_1)$

we get

$1-x^2\sin^2(\theta) =1-\frac{x^2}{2}-\frac{x^2}{2x_1} +\frac{x^2}{2}\cos(\theta_1)\sqrt{1-x_1^2\sin(\theta_1)} +\frac{x^2}{2x_1}(1-x_1^2\sin^2(\theta_1)).$

Now since ${\frac{x^2}{2x_1}=\frac{(1+x’)^2}{2}}$ and ${1-\frac{x^2}{2}-\frac{x^2}{2x_1}=-x’}$ we obtain

$1-x^2\sin^2(\theta) =x’ +\frac{x^2}{2}\cos(\theta_1)\sqrt{1-x_1^2\sin(\theta_1)} +\frac{(1+x’)^2}{2}(1-x_1^2\sin^2(\theta_1)).$

Combining this identity with a previous identity we obtain

$(1+x’) \int_0^{\frac{\pi}{2}} \sqrt{1-x^2\sin^2(\theta)} \mathrm{d}\theta = \int_0^{\pi} \frac{-x’ +\frac{x^2}{2}\cos(\theta_1)\sqrt{1-x_1^2\sin(\theta_1)} +\frac{(1+x’)^2}{2}(1-x_1^2\sin^2(\theta_1))} {\sqrt{1-x_1^2\sin^2(\theta_1)}} \mathrm{d}\theta_1,$

hence ${(1+x’)E(x^2)=-2x’K(x_1^2)+0+\frac{(1+x’)^2}{2}2E(x_1^2)}$ which rewrites as ${\frac{1}{1+x’}E(1-x’^2)=-\frac{2x’}{(1+x’)^2}K(x_1^2)+E(x_1^2)}$ which is the desired formula with ${x’}$ instead of ${x}$.

Alternative expression of the change of variable. It reads

$\tan(\alpha) =\frac{x_1\sin(\theta_1)}{\sqrt{1-x_1^2\sin^2(\theta_1)}} =\frac{x_1\sin(2\theta)}{1+x_1\cos(2\theta)}.$

Indeed, we have

$\cos(2\theta) =\cos(\theta_1)\cos(\alpha)-\sin(\theta_1)\sin(\alpha) =\cos(\theta_1)\sqrt{1-x_1^2\sin^2(\theta_1)}-x_1\sin^2(\theta_1)$

thus

$1+x_1\cos(2\theta) =\sqrt{1-x_1^2\sin^2(\theta_1)} \Bigr(x_1\cos(\theta_1)+\sqrt{1-x_1^2\sin^2(\theta_1)}\Bigr)$

and therefore

$\sqrt{1-x_1^2\sin^2(\theta_1)} =\frac{1+x_1\cos(2\theta)} {x_1\cos(\theta_1)+\sqrt{1-x_1^2\sin^2(\theta_1)}} =\frac{1+x_1\cos(2\theta)} {(1+x_1)\sqrt{1-x^2\sin^2(\theta)}}.$

Similarly we get

$\sin(2\theta) =\sin(\theta_1)\cos(\alpha)+\cos(\theta_1)\sin(\alpha) =\sin(\theta_1)\sqrt{1-x_1^2\sin^2(\theta_1)}+x_1\sin(\theta_1)\cos(\theta_1)$

and therefore

$x_1\sin(\theta_1) =\frac{x_1\sin(2\theta)} {\sqrt{1-x_1^2\sin^2(\theta_1)}+x_1\cos(\theta_1)} =\frac{x_1\sin(2\theta)} {(1+x_1)\sqrt{1-x^2\sin^2(\theta)}}$

Finally writing ${\tan(\alpha)=\frac{\sin(\alpha)}{\cos(\alpha)}=\frac{x_1\sin(\alpha)}{\sqrt{1-x_1^2\sin^2(\theta)}}}$ leads to the desired expression.

Further reading.

Final words. A famous quotes says “Be careful about reading health books. You may die of a misprint”. We could say the same about books on elliptic integrals!

The Funk-Hecke formula has an analytic, geometric, and probabilistic content. In its simplest form, and probabilistically, it gives the law of the projection of the uniform law on the sphere on any diameter of the sphere. It allows dimension reduction of multivariate integrals.

For ${x,y\in\mathbb{R}^n}$ we write ${x\cdot y=x_1y_1+\cdots+x_ny_n}$ and ${|x|=\sqrt{x\cdot x}=\sqrt{x_1^2+\cdots+x_n^2}}$.

Sphere. Let ${\sigma_{\mathbb{S}^{n-1}}}$ be the uniform probability measure on the unit sphere of ${\mathbb{R}^n}$, ${n\geq2}$,

$\mathbb{S}^{n-1}=\{x\in\mathbb{R}^n:|x|=1\}.$

Denoting ${\mathrm{d}x}$ the trace of the Lebesgue measure on ${\mathbb{S}^{n-1}}$, we have

$\sigma_{\mathbb{S}^{n-1}}(\mathrm{d}x) =\frac{\mathrm{d}x}{|\mathbb{S}^{n-1}|} \quad\text{where}\quad |\mathbb{S}^{n-1}|=\int_{\mathbb{S}^{n-1}}\mathrm{d}x=2\frac{\pi^{\frac{n}{2}}}{\Gamma(\frac{n}{2})}$

is the surface area of ${\mathbb{S}^{n-1}}$. If ${n=2}$ we recover the perimeter ${2\pi}$ of the unit circle.

If ${n=1}$, then ${\mathbb{S}^{n-1}=\mathbb{S}^0=\{-1,1\}}$ is a couple of points, and we can define the uniform probability measure on it as being the Rademacher or symmetric Bernoulli distribution ${\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_1}$, which is ${\frac{1}{2}}$ times the counting measure ${\delta_{-1}+\delta_1}$ which plays the role of the trace of the Lebesgue measure, and from this point of view the surface area is ${2}$, which is exactly the value given by the Gamma based formula above when ${n=1}$.

Funk-Hecke formula. In its basic form, the Funk-Hecke formula states that for all bounded measurable ${f:[-1,1]\mapsto\mathbb{R}}$ and all ${y\in\mathbb{S}^{n-1}}$,

$\int f(x\cdot y)\sigma_{\mathbb{S}^{n-1}}(\mathrm{d}x)=\frac{\Gamma(\frac{n}{2})}{\sqrt{\pi}\Gamma(\frac{n-1}{2})}\int_{-1}^1f(t)(1-t^2)^{\frac{n-3}{2}}\mathrm{d}t.$

The formula does not depend on ${y}$, an invariance due to spherical symmetry.

The constant in the right hand side can be easily recovered by taking ${f=1}$, indeed

$\int_{-1}^1(1-t^2)^{\frac{n-3}{2}}\mathrm{d}t =\int_0^1(1-u)^{\frac{n-3}{2}}u^{-\frac{1}{2}}\mathrm{d}u =\mathrm{Beta}\Bigr(\frac{n-1}{2},\frac{1}{2}\Bigr) =\frac{\Gamma(\frac{1}{2})\Gamma(\frac{n-1}{2})}{\Gamma(\frac{n}{2})}.$

To prove the Funk-Hecke formula, we can slice the integration over ${\mathbb{S}^{n-1}}$ with respect to the value of ${t=\cos(\theta)=x\cdot y\in[-1,1]}$, ${\theta\in[0,\pi]}$. We note that

$\mathrm{d}t=\sin(\theta)\mathrm{d}\theta=\sqrt{1-t^2}\mathrm{d}\theta.$

Moreover the intersection of ${\mathbb{S}^{n-1}}$ with the hyperplane ${\{x\in\mathbb{R}^n:x\cdot y=t\}}$ is a sphere ${S^{n-2}_t}$ of dimension ${n-2}$ and of radius ${\sqrt{1-t^2}}$ (a couple of points if ${n=2}$). Its surface area is ${|S^{n-2}_t|=\sqrt{1-t^2}^{n-2}|\mathbb{S}^{n-2}|}$ (equal to ${2}$ if ${n=2}$). As a consequence, we get

$\int_{\mathbb{S}^{n-1}} f(x\cdot y)\mathrm{d}x =\int_0^\pi f(t)|S^{n-2}_t|\mathrm{d}\theta =|\mathbb{S}^{n-2}|\int_{-1}^1f(t)(1-t^2)^{\frac{n-3}{2}}\mathrm{d}t.$

It remains to divide both sides by ${|\mathbb{S}^{n-1}|}$ and to observe that we have ${\frac{|\mathbb{S}^{n-2}|}{|\mathbb{S}^{n-1}|}=\frac{\Gamma(\frac{n}{2})}{\sqrt{\pi}\Gamma(\frac{n-1}{2})}}$.

Probability laws. In probabilistic terms, if ${X}$ is a random vector of ${\mathbb{R}^n}$ uniformly distributed on ${\mathbb{S}^{n-1}}$ then for all ${y\in\mathbb{S}^{n-1}}$, the law of ${X\cdot y}$ has density

$t\in[-1,1]\mapsto \frac{\Gamma(\frac{n}{2})}{\sqrt{\pi}\Gamma(\frac{n-1}{2})} (1-t^2)^{\frac{n-3}{2}}.$

This law does not depend on the choice of ${y\in\mathbb{S}^{n-1}}$.
It is symmetric in the sense that ${X\cdot y}$ and ${-(X\cdot y)}$ have same law.
The law of ${|X\cdot y|}$ is the image of the law ${\mathrm{Beta}\left(\frac{3}{2},\frac{n-1}{2}\right)}$ by the map ${u\mapsto\sqrt{u}}$.
The law of ${X\cdot y}$ is in particular:

• if ${n=2}$, an arcsine law with density ${\frac{\mathbf{1}_{t\in[-1,1]}}{\pi\sqrt{1-t^2}}}$,
• if ${n=3}$, a uniform law with density ${\frac{\mathbf{1}_{t\in[-1,1]}}{2}}$ (Archimedes principle, see below),
• if ${n=4}$, a semicircle law with density ${\frac{2\sqrt{1-t^2}\mathbf{1}_{t\in[-1,1]}}{\pi}}$.

Archimedes principle. It states that the projection of the uniform law of the unit sphere of ${\mathbb{R}^3}$ on a diameter is the uniform law on the diameter. It is the case ${n=3}$ of the Funk-Hecke formula above. More generally, if ${(X_1,\ldots,X_n)}$ is a random vector of ${\mathbb{R}^n}$ uniformly distributed on the unit sphere ${\mathbb{S}^{n-1}}$ then the projection ${(X_1,\ldots,X_{n-2})}$ is uniformly distributed on the unit ball of ${\mathbb{R}^{n-2}}$. What matters is to lose two dimensions.

It does not work if we replace ${n-2}$ by ${n-1}$. To see it, let us recall that the arcsine law on ${[-1,1]}$ is the projection of the uniform law on ${\mathbb{S}^1}$, which is the unit sphere of ${\mathbb{R}^2}$, while the semicircle law on ${[-1,1]}$ is the projection of the uniform law on the disc ${\{x\in\mathbb{R}^2:|x|\leq 1\}}$ which is the unit ball of ${\mathbb{R}^2}$. Also it follows from the Archimedes principle that the projection of the uniform law on ${\mathbb{S}^2}$ on a plane containing the origin cannot be the uniform law on a disc. Indeed, if it was the case, the projection on a diameter would be the semicircle law, while the Archimedes principle states that it is the uniform law.

Gegenbauer or ultraspherical orthogonal polynomials. This family is orthogonal with respect to the probability measure on ${[-1,1]}$ with density

$t\mapsto\frac{\Gamma(\alpha+1)}{\sqrt{\pi}\Gamma(\alpha+\frac{1}{2})}(1-t^2)^{\alpha-\frac{1}{2}},$

where ${\alpha}$ is a parameter. We could call this distribution the Gegenbauer law. The law that appear in the Funk-Hecke formula corresponds to the case ${\alpha=\frac{n-2}{2}}$. The Gegenbauer polynomials form a special class of Jacobi polynomials, and include Chebyshev ${T}$, Legendre, and Chebyshev ${U}$ polynomials as the special cases ${n=2}$, ${n=3}$, and ${n=4}$.

Spherical harmonics. They are by definition the homogeneous polynomials of ${n}$ variables which are harmonic on ${\mathbb{R}^n}$. Since they are homogeneous, they are characterized by their restriction to ${\mathbb{S}^{n-1}}$. We identify each of them to its restriction to ${\mathbb{S}^{n-1}}$. These polynomials are orthogonal with respect to the uniform probability measure on ${\mathbb{S}^{n-1}}$, and we normalize them in such a way that they form an orthonormal sequence. It turns out moreover that the spherical harmonics form the eigenfunctions of the Laplace-Beltrami operator on the sphere.

The ${k}$-th Gegenbauer polynomial ${P_k}$ of parameter ${\alpha=\frac{n-2}{2}}$, normalized in such a way that ${P_k(1)=1}$, can be expressed in terms of spherical harmonics of degree ${k}$. More precisely, if ${{(Y_{k,\ell})}_{1\leq\ell\leq Z_{n,k}}}$ are all the spherical harmonics on ${\mathbb{S}^{n-1}}$ of degree ${k}$ and if ${Z_{n,k}}$ is their number, then the addition formula states that for all ${x,y\in\mathbb{S}^{n-1}}$,

$\sum_{\ell=1}^{Z_{n,k}}Y_{k,\ell}(x)Y_{k,\ell}(y)=Z_{n,k}P_k(x\cdot y).$

General Funk-Hecke formula. It reads, for any bounded measurable ${f:\mathbb{S}^{n-1}\rightarrow\mathbb{R}}$, any ${y\in\mathbb{S}^{n-1}}$, and any spherical harmonics ${Y_k}$ on ${\mathbb{S}^{n-1}}$ of degree ${k\geq0}$,

$\int f(x\cdot y)Y_k(x)\sigma_{\mathbb{S}^{n-1}}(\mathrm{d}x) =\lambda_kY_k(y)$

where

$\lambda_k=\frac{\Gamma(\frac{n}{2})}{\sqrt{\pi}\Gamma(\frac{n-1}{2})}\int_{-1}^1f(t)P_k(t)(1-t^2)^{\frac{n-3}{2}}\mathrm{d}t.$

We recover the simple Funk-Hecke formula by taking ${k=0}$ for which ${Y_0=1}$ and ${P_0=1}$.

In other words, for any “kernel” ${x\mapsto f(x\cdot y)}$ on ${\mathbb{S}^{n-1}}$, a spherical harmonics ${Y_k}$ of degree ${k}$ is an eigenfunction, associated to the eigenvalue ${\lambda_k}$, of the Funk transform

$g\mapsto \int f(x\cdot\bullet)g(x)\sigma_{\mathbb{S}^{n-1}}(\mathrm{d}x).$

Naming. The Funk-Hecke formula is named after the mathematicians Paul Funk (1886 – 1969) and Erich Hecke (1887 – 1947), two former students of David Hilbert, for their work in harmonic analysis published in 1916 and 1918 respectively.

Stochastic processes. The uniform law on the sphere is invariant for spherical Brownian motion. How about the law of the projection of spherical Brownian motion on a diameter? It turns out that this is again a Markov diffusion process for which the Gegenbauer law is invariant, related to Wright-Fisher diffusions and Jacobi operators.

Further reading.

Syntax · Style · .