We will see in this post how the arcsine law appears naturally in the combinatorics of paths of the simple random walk on \( {\mathbb{Z}} \), while the Wigner semicircle law appears naturally in the asymptotic analysis (over the degree) of the combinatorics of paths in regular trees, in relation with free groups. The link with free probability and random matrices might be the subject of further posts.
Counting paths on graphs. Let \( {G=(V,E)} \) be a graph with set of vertices \( {V} \), and set of edges \( {E\subset\{\{v,w\}:v\neq w\in V\}} \). We assume that \( {V} \) is at most countable, that \( {E} \) is non empty, and that each vertex has positive and finite degree: for any \( {v\in V} \),
\[ d_v=\sum_{w\in V}\mathbf{1}_{\{v,w\}\in E}\in(0,\infty). \]
We may consider the Hilbert space
\[ \ell^2_{\mathbb{C}}(V) =\left\{x:V\rightarrow\mathbb{C}\mbox{ with }\sum_{v\in V}|x_v|^2<\infty\right\}. \]
To any \( {x\in\ell^2_{\mathbb{C}}(V)} \) we associate a complex measure \( {\sum_{v\in V}x_v\delta_v} \). We have \( {\langle\delta_v,\delta_w\rangle=\mathbf{1}_{v=w}} \) and we say that \( {{(\delta_v)}_{v\in V}} \) is the canonical basis of \( {\ell^2_{\mathbb{C}}(V)} \). The scalar product is
\[ \langle x,y\rangle =\sum_{v,w\in V}x_v\overline{y_w}\langle\delta_v,\delta_w\rangle =\sum_{v\in V}x_v\overline{y_v}. \]
The adjacency operator of \( {G} \) is \( {A:\ell^2_{\mathbb{C}}(V)\rightarrow \ell^2_{\mathbb{C}}(V)} \) defined for every \( {v,w\in V} \) by
\[ \langle A\delta_w,\delta_v\rangle=\mathbf{1}_{\{v,w\}\in E}. \]
Note that \( {A} \) is well defined since \( {\|A\delta_v\|^2=\langle A\delta_v,A\delta_v\rangle=d_v<\infty} \) for any \( {v\in V} \). For any \( {m\in\mathbb{N}} \), we denote by \( {A^m=A\circ\cdots\circ A} \) the \( {m} \)-th power of \( {A} \). For all vertices \( {v,w\in V} \), the number of paths in \( {G} \) of length \( {m} \) starting at \( {v} \) and ending at \( {w} \) is
\[ \langle A^m\delta_v,\delta_w\rangle =\sum_{\substack{u_0,\ldots,u_m\in V\\v=u_0,u_m=w}} \mathbf{1}_{\{u_0,u_1\}\in E}\cdots\mathbf{1}_{\{u_{m-1},u_m\}\in E}. \]
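As a small sanity check, the formula can be tried on a finite graph. The sketch below is only an illustration: the choice of the 4-cycle and the helper names `mat_mul` and `count_paths` are assumptions made here, not part of the text above.

```python
def mat_mul(a, b):
    """Product of two square integer matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def count_paths(adj, m, v, w):
    """Number of paths of length m from v to w, read off the m-th power of adj."""
    n = len(adj)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = identity
    for _ in range(m):
        power = mat_mul(power, adj)
    # adj is symmetric, so the (v, w) and (w, v) entries coincide.
    return power[v][w]


# Adjacency matrix of the 4-cycle 0 - 1 - 2 - 3 - 0 (illustrative choice).
C4 = [[0, 1, 0, 1],
      [1, 0, 1, 0],
      [0, 1, 0, 1],
      [1, 0, 1, 0]]

print(count_paths(C4, 2, 0, 0))  # closed paths 0-1-0 and 0-3-0
print(count_paths(C4, 2, 0, 2))  # paths 0-1-2 and 0-3-2
print(count_paths(C4, 3, 0, 0))  # the 4-cycle is bipartite: no odd closed path
```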
Simple random walk on graphs. Let \( {G=(V,E)} \) be as above. The simple random walk \( {X={(X_t)}_{t\in\mathbb{N}}} \) on \( {G} \) is the Markov chain with state space \( {V} \) and transition kernel \( {P:V\times V\rightarrow[0,1]} \) given for any \( {v,w\in V} \) by
\[ P(v,w) =\mathbb{P}(X_{t+1}=w|X_t=v) =\frac{\mathbf{1}_{\{v,w\}\in E}}{d_v}. \]
The measure \( {\mu} \) on \( {V} \) associated to the degree sequence \( {{(d_v)}_{v\in V}} \) of \( {G} \), defined by \( {\mu(v)=d_v} \) for every \( {v\in V} \), is symmetric with respect to the kernel \( {P} \): for every \( {v,w\in V} \),
\[ \mu(v)P(v,w)=\mu(w)P(w,v). \]
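Indeed both sides are equal to \( {\mathbf{1}_{\{v,w\}\in E}} \), since the factor \( {d_v} \) cancels. A minimal sketch of this check with exact fractions, on a small non-regular graph chosen here purely for illustration (the names `EDGES`, `degree`, `kernel`, `is_reversible` are hypothetical):

```python
from fractions import Fraction

# Path graph 0 - 1 - 2: a small non-regular example (degrees 1, 2, 1).
EDGES = {frozenset({0, 1}), frozenset({1, 2})}
VERTICES = [0, 1, 2]


def degree(v):
    """Number of neighbors of v."""
    return sum(1 for w in VERTICES if frozenset({v, w}) in EDGES)


def kernel(v, w):
    """P(v, w) = 1_{{v,w} in E} / d_v, as an exact fraction."""
    return Fraction(int(frozenset({v, w}) in EDGES), degree(v))


def is_reversible():
    """Check mu(v) P(v, w) == mu(w) P(w, v) for all v, w, with mu(v) = d_v."""
    return all(degree(v) * kernel(v, w) == degree(w) * kernel(w, v)
               for v in VERTICES for w in VERTICES)


print(is_reversible())
```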
Suppose now that \( {G} \) is regular, meaning that \( {{(d_v)}_{v\in V}} \) is constant and say equal to some \( {d>1} \). Then \( {\mu} \) is proportional to the counting measure, and \( {P} \) is doubly stochastic. Conditional on \( {\{X_0=v\}} \), for every \( {m\in\mathbb{N}} \), the law of the random path \( {(X_0=v,X_1,\ldots,X_m)} \) is uniform on the set of paths of length \( {m} \) starting at \( {v} \). It follows that for every \( {v\in V} \) and every \( {m\in\mathbb{N}} \),
\[ \mathbb{P}(X_m=v|X_0=v) =P^m(v,v) =\frac{\langle A^m\delta_v,\delta_v\rangle}{d^m}, \]
and it then follows from Markov chain theory that the random walk \( {X} \), which is irreducible when \( {G} \) is connected, is recurrent (null recurrent if \( {V} \) is infinite) if and only if for some (thus every) \( {v\in V} \)
\[ \sum_{m\in\mathbb{N}} \frac{\langle A^m\delta_v,\delta_v\rangle}{d^m} =\infty. \]
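The identity \( {P^m(v,v)=\langle A^m\delta_v,\delta_v\rangle/d^m} \) behind this criterion can be checked exactly on a finite regular graph. The following sketch uses the cycle with five vertices (\( {d=2} \)), an arbitrary choice made here; the helper names are hypothetical.

```python
from fractions import Fraction


def cycle_adjacency(n):
    """Adjacency matrix of the cycle with n >= 3 vertices (2-regular)."""
    return [[int((i - j) % n in (1, n - 1)) for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, m):
    """m-th power of a square matrix, by repeated multiplication."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = mat_mul(result, a)
    return result


d = 2
A = cycle_adjacency(5)
P = [[Fraction(entry, d) for entry in row] for row in A]  # transition kernel

for m in range(7):
    closed_paths = mat_pow(A, m)[0][0]
    # Identity P^m(v, v) = <A^m delta_v, delta_v> / d^m, checked exactly.
    assert mat_pow(P, m)[0][0] == Fraction(closed_paths, d ** m)
    print(m, closed_paths, mat_pow(P, m)[0][0])
```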
One can compute \( {\langle A^m\delta_v,\delta_v\rangle} \) with some combinatorics for certain nice regular graphs.
Counting paths on square lattices. Let us compute \( {\langle A^m\delta_v,\delta_v\rangle} \) when \( {G} \) is the usual square lattice of dimension \( {n\geq1} \), which is the Cayley graph of the commutative group \( {(\mathbb{Z}^n,+)} \). In this case \( {G} \) is regular of even degree \( {d=2n} \). In this graph \( {G} \), the quantity \( {\langle A^m\delta_v,\delta_v\rangle} \) does not depend on \( {v} \) and is equal to zero if \( {m} \) is odd. When \( {m} \) is even, we may replace \( {m} \) by \( {2m} \) for convenience. Thanks to commutativity, one can, for each path, group the increments by type. This makes the combinatorics multinomial (it encodes the outcomes of a die with \( {2n} \) faces thrown \( {2m} \) times): place \( {2m} \) balls in \( {2n} \) boxes labeled \( {+e_1,-e_1,\ldots,+e_n,-e_n} \) in such a way that the boxes \( {+e_i} \) and \( {-e_i} \) contain the same number of balls, for each \( {1\leq i\leq n} \):
\[ \langle A^{2m}\delta_0,\delta_0\rangle =\sum_{\substack{0\leq m_1,\ldots,m_n\leq m\\m_1+\cdots+m_n=m}}\binom{2m}{m_1,m_1,\ldots,m_n,m_n} =\sum_{\substack{0\leq m_1,\ldots,m_n\leq m\\m_1+\cdots+m_n=m}}\frac{(2m)!}{m_1!^2\cdots m_n!^2}. \]
When \( {n=1} \) then the formula boils down to the central binomial coefficient
\[ \langle A^{2m}\delta_0,\delta_0\rangle\overset{n=1}{=}\binom{2m}{m}. \]
When \( {n=2} \) then by the Vandermonde convolution formula for binomial coefficients,
\[ \langle A^{2m}\delta_0,\delta_0\rangle\overset{n=2}{=}\frac{(2m)!}{m!^2}\sum_{0\leq k\leq m} \binom{m}{k}\binom{m}{m-k}=\frac{(2m)!}{m!^2}\binom{2m}{m}=\frac{(2m)!^2}{m!^4}. \]
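These formulas can be compared with a brute-force count for small \( {n} \) and \( {m} \). The sketch below is only a numerical check, with hypothetical function names; it evaluates the multinomial sum on one side and propagates the walk step by step from the origin on the other.

```python
from math import comb, factorial


def compositions(total, parts):
    """Yield all tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def closed_paths_formula(n, m):
    """Multinomial sum for the number of closed paths of length 2m in Z^n."""
    total = 0
    for ms in compositions(m, n):
        denom = 1
        for mi in ms:
            denom *= factorial(mi) ** 2
        total += factorial(2 * m) // denom  # exact: a multinomial coefficient
    return total


def closed_paths_brute(n, m):
    """Count the same paths by propagating path counts from the origin."""
    origin = (0,) * n
    counts = {origin: 1}
    for _ in range(2 * m):
        new_counts = {}
        for pos, c in counts.items():
            for i in range(n):
                for step in (1, -1):
                    nxt = pos[:i] + (pos[i] + step,) + pos[i + 1:]
                    new_counts[nxt] = new_counts.get(nxt, 0) + c
        counts = new_counts
    return counts.get(origin, 0)


for n in (1, 2, 3):
    for m in (1, 2, 3):
        assert closed_paths_formula(n, m) == closed_paths_brute(n, m)

# Special cases n = 1 and n = 2 stated above.
assert all(closed_paths_formula(1, m) == comb(2 * m, m) for m in range(1, 6))
assert all(closed_paths_formula(2, m) == comb(2 * m, m) ** 2 for m in range(1, 6))
```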
More generally, one may look for compact formulas for any \( {n\geq1} \) in the same spirit. Beyond the example of square lattices, it turns out that one can compute \( {\langle A^m\delta_v,\delta_v\rangle} \) for other regular graphs, such as regular trees.
Counting paths on regular trees. For any integer \( {d\geq2} \), the regular tree \( {T_d=(V,E)} \) is the connected graph with a countably infinite set of vertices, without cycles, and in which each vertex has exactly \( {d} \) neighbors. If \( {d} \) is even with \( {d=2n} \) then \( {T_d=T_{2n}} \) is the Cayley graph of the free group \( {\mathbb{F}_n} \) (see below); in particular \( {T_2} \) is the Cayley graph of \( {\mathbb{Z}} \). Let us pick a vertex \( {\varnothing} \) in \( {T_d} \), and call it the root. In the case \( {d=2n} \) it is customary to identify \( {\varnothing} \) with the neutral element of the free group \( {\mathbb{F}_n} \) (which is \( {0} \) if \( {n=1} \), and the empty string if \( {n>1} \), see below for details).
Let us denote by \( {A_d} \) the adjacency operator of our regular tree \( {T_d} \). Recall that \( {\langle A_d^m\delta_v,\delta_v\rangle} \) is the number of paths in the graph \( {T_d} \), of length \( {m} \), starting and ending at \( {v} \). As for square lattices, this quantity does not depend on \( {v\in V} \), so that it is equal to \( {\langle A_d^m \delta_\varnothing,\delta_\varnothing\rangle} \), and it is equal to zero if \( {m} \) is odd. We have
\[ \langle A_d^m\delta_\varnothing,\delta_\varnothing\rangle =\int\!x^m\,\mu_{d}(dx) \]
for any \( {m\in\mathbb{N}} \), where \( {\mu_{d}} \) is the Kesten-McKay distribution given by
\[ \mu_{d}(dx) =\frac{d\sqrt{4(d-1)-x^2}}{2\pi(d^2-x^2)} \mathbf{1}_{[-2\sqrt{d-1},2\sqrt{d-1}]}(x)dx. \]
In other words, the number of paths of length \( {m} \) in \( {T_d} \), starting and ending at \( {\varnothing} \), is given by the \( {m} \)-th moment of the probability distribution \( {\mu_{d}} \). A proof relies on the Carleman moments condition for the uniqueness of the probability measure with these moments, and on the inversion of the Cauchy-Stieltjes transform to get the formula for the density. Experts may note by the way that the probability distribution \( {\mu_{d}} \) is nothing else but the so-called spectral distribution with respect to the vector \( {\delta_\varnothing} \) of the symmetric operator \( {A_d} \) (actually with respect to any canonical vector \( {\delta_v} \), \( {v\in V} \), thanks to symmetries!).
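The moment identity can be tested numerically. The sketch below is an illustration under assumptions made here: closed paths at the root are counted by tracking only the distance to the root (from the root there are \( {d} \) ways to move away, from any other vertex there are \( {d-1} \) ways to move away and one way back), and the moments of \( {\mu_d} \) are approximated by a midpoint rule after the substitution \( {x=2\sqrt{d-1}\cos\theta} \). The function names are hypothetical.

```python
from math import cos, pi, sin, sqrt


def closed_paths_tree(d, m):
    """Closed paths of length m at the root of T_d, via the distance to the root."""
    counts = [1]  # counts[k] = number of paths so far ending at distance k
    for _ in range(m):
        new = [0] * (len(counts) + 1)
        for k, c in enumerate(counts):
            new[k + 1] += c * (d if k == 0 else d - 1)  # move away from the root
            if k > 0:
                new[k - 1] += c  # move back toward the root (unique parent)
        counts = new
    return counts[0]


def kesten_mckay_moment(d, m, steps=20000):
    """Midpoint-rule approximation of the m-th moment of mu_d.

    Uses x = r cos(theta) with r = 2 sqrt(d-1) and theta in (0, pi), so that
    the density times dx becomes d (r sin(theta))^2 / (2 pi (d^2 - x^2)) dtheta.
    """
    r = 2 * sqrt(d - 1)
    h = pi / steps
    total = 0.0
    for j in range(steps):
        theta = (j + 0.5) * h
        x = r * cos(theta)
        weight = d * (r * sin(theta)) ** 2 / (2 * pi * (d * d - x * x))
        total += x ** m * weight
    return total * h


for d in (2, 3, 4):
    for m in range(0, 7):
        print(d, m, closed_paths_tree(d, m), round(kesten_mckay_moment(d, m), 6))
```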
When \( {d=2} \) then \( {T_2} \) is the Cayley graph of \( {\mathbb{F}_1=\mathbb{Z}} \) and the corresponding Kesten-McKay distribution \( {\mu_{2}} \) is the arcsine distribution on \( {[-2,2]} \): for every integer \( {m\geq1} \),
\[ \langle A_2^{2m}\delta_\varnothing,\delta_\varnothing\rangle =\int\!x^{2m}\,\mu_{2}(dx) =\int_{-2}^2\frac{x^{2m}}{\pi\sqrt{4-x^2}}dx =\binom{2m}{m} \]
and we recover that the number of paths of length \( {2m} \) in \( {\mathbb{Z}} \) starting and ending at the root \( {\varnothing} \) (which is the origin \( {0} \)) is given by the central binomial coefficient:
\[ \langle A_2^{2m}\delta_\varnothing,\delta_\varnothing\rangle =\binom{2m}{m}. \]
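For completeness, the arcsine moments can be evaluated with the substitution \( {x=2\sin\theta} \), which reduces the integral to a classical Wallis integral. This step is not spelled out above and is added here only as a sketch:
\[ \int_{-2}^2\frac{x^{2m}}{\pi\sqrt{4-x^2}}dx =\frac{4^m}{\pi}\int_{-\pi/2}^{\pi/2}\sin^{2m}(\theta)\,d\theta =\frac{4^m}{\pi}\,\frac{\pi}{4^m}\binom{2m}{m} =\binom{2m}{m}. \]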
At the other extreme, when \( {d\rightarrow\infty} \), the distribution \( {\mu_{d}} \), scaled by \( {(d-1)^{-1/2}} \), tends to the semicircle distribution on \( {[-2,2]} \) (also known as the Wigner or the Sato-Tate distribution): for every integer \( {m\geq1} \),
\[ \begin{array}{rcl} \lim_{d\rightarrow\infty} \frac{\langle A_d^{2m}\delta_\varnothing,\delta_\varnothing\rangle}{(d-1)^m} &=&\lim_{d\rightarrow\infty}\int\!\left(\frac{x}{\sqrt{d-1}}\right)^{2m}\,\mu_{d}(dx)\\ &=&\lim_{d\rightarrow\infty}\frac{d}{2\pi}\int_{-2\sqrt{d-1}}^{2\sqrt{d-1}}\! \left(\frac{x}{\sqrt{d-1}}\right)^{2m} \frac{\sqrt{4(d-1)-x^2}}{d^2-x^2}\,dx\\ &=&\lim_{d\rightarrow\infty}\frac{d}{2\pi}\int_{-2}^2\!y^{2m}\frac{\sqrt{4(d-1)-(d-1)y^2}}{d^2-(d-1)y^2}\,\sqrt{d-1}dy\\ &=&\int_{-2}^2 y^{2m}\frac{\sqrt{4-y^2}}{2\pi}dy\\ &=&\frac{1}{1+m}\binom{2m}{m}. \end{array} \]
This quantity \( {C_m=\frac{1}{1+m}\binom{2m}{m}} \) is nothing else but the \( {m} \)-th Catalan number. Taking \( {d=2n} \), this says that the number of paths of length \( {2m} \) in the Cayley graph of the free group \( {\mathbb{F}_n} \) starting and ending at the root \( {\varnothing} \) is equivalent, as \( {n\rightarrow\infty} \), to \( {(2n-1)^mC_m} \), and hence also to \( {(2n)^mC_m} \) since \( {m} \) is fixed:
\[ \langle A_{2n}^{2m}\delta_\varnothing,\delta_\varnothing\rangle \sim_{n\rightarrow\infty}(2n)^m\frac{1}{m+1}\binom{2m}{m}. \]
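The convergence toward Catalan numbers can be observed numerically. The following sketch, with hypothetical function names, counts closed paths at the root of \( {T_d} \) by tracking the distance to the root (an approach assumed here, not taken from the text) and prints the ratio to \( {(d-1)^m} \) for increasing \( {d} \).

```python
from math import comb


def closed_paths_tree(d, m):
    """Closed paths of length m at the root of T_d, via the distance to the root."""
    counts = [1]  # counts[k] = number of paths so far ending at distance k
    for _ in range(m):
        new = [0] * (len(counts) + 1)
        for k, c in enumerate(counts):
            new[k + 1] += c * (d if k == 0 else d - 1)  # away from the root
            if k > 0:
                new[k - 1] += c  # back toward the root
        counts = new
    return counts[0]


def catalan(m):
    """m-th Catalan number."""
    return comb(2 * m, m) // (m + 1)


m = 3
for d in (4, 16, 256, 65536):
    ratio = closed_paths_tree(d, 2 * m) / (d - 1) ** m
    print(d, ratio, catalan(m))  # the ratio approaches catalan(m) as d grows
```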
Tensor products versus free products. There are two natural multivariate generalizations of the group \( {(\mathbb{Z},+)} \). The first one is the commutative group \( {(\mathbb{Z}^n,+)=(\mathbb{Z},+)^{\otimes n}} \), obtained by tensor product, for which the Cayley graph is the usual square lattice. The second one is the less well-known free group \( {\mathbb{F}_n=(\mathbb{Z},+)^{\boxplus n}} \), which is non commutative if \( {n>1} \), but for which the Cayley graph is a tree, as for \( {\mathbb{Z}} \) and unlike \( {\mathbb{Z}^n} \) with \( {n>1} \). The Cayley graphs of \( {\mathbb{Z}^n} \) and \( {\mathbb{F}_n} \) are both infinite and \( {(2n)} \)-regular. One should not confuse the free group \( {\mathbb{F}_n} \) with the finite field \( {\mathbf{F}_q} \).
- Commutative universe: \( {\mathbb{Z}^n} \), tensor product of \( {n} \) copies of \( {\mathbb{Z}} \)
  - Regular square lattice, presence of cycles
  - Central binomial coefficients and arcsine distribution
- Non commutative universe: \( {\mathbb{F}_n} \), free product of \( {n>1} \) copies of \( {\mathbb{Z}} \)
  - Regular tree, absence of cycles
  - Catalan numbers and semicircle distribution (only as \( {n\rightarrow\infty} \)).
Free product and free group. If \( {G_1,\ldots,G_n} \) are at most countable groups, then their free product \( {G} \) is the group of finite strings (words) written using as letters the elements of the disjoint union of \( {G_1,\ldots,G_n} \). The sole simplifications in these strings are the ones coming from the group operation in each of \( {G_1,\ldots,G_n} \). The group operation in \( {G} \) is the concatenation of strings, and the neutral element is the empty string \( {\varnothing} \). If \( {G_1,\ldots,G_n} \) are finitely generated then \( {G} \) is also finitely generated and one may use the generators of \( {G_1,\ldots,G_n} \) as letters to construct the elements of \( {G} \). The free group \( {\mathbb{F}_n} \) is the free product of \( {n} \) copies of \( {(\mathbb{Z},+)} \). It is customary to use the multiplicative notation. Namely, let \( {n\geq1} \) be an integer and let
\[ S=\{e_1,\ldots,e_n,e_1^{-1},\ldots,e_n^{-1}\} \]
be a set of \( {2n} \) distinct letters. The notation will make sense in the sequel. The free group \( {\mathbb{F}_n} \) with \( {n} \) generators consists of all the finite words (strings of letters) written with the letters of \( {S} \), with the simplification rules
\[ e_i e_i^{-1} = e_i^{-1} e_i = \varnothing \]
for every \( {1\leq i\leq n} \), where \( {\varnothing} \) denotes the empty string. For any \( {a_1\cdots a_p} \) and \( {b_1\cdots b_q} \) in \( {\mathbb{F}_n} \), with \( {a_1,\ldots,a_p,b_1,\ldots,b_q} \) in \( {S} \), the group operation consists in first concatenating the strings into \( {a_1\cdots a_p b_1\cdots b_q} \), and then simplifying using the simplification rules. This operation is not commutative if \( {n>1} \). The empty word \( {\varnothing} \) is the neutral element. With multiplicative notation, we have \( {(e_i)^{-1}=e_i^{-1}} \) and \( {(e_i^{-1})^{-1}=e_i} \) and \( {(a_1\cdots a_p)^{-1}=a_p^{-1}\cdots a_1^{-1}} \). We say that \( {S} \) is a generating set of \( {\mathbb{F}_n} \), and that the elements of \( {S} \) are generators. Every element \( {a} \) of \( {\mathbb{F}_n} \) with \( {a\neq\varnothing} \) can be written as
\[ a=a_1\cdots a_p \]
with \( {p\geq1} \) and \( {a_1,\ldots,a_p\in S} \) and \( {a_{i+1}\neq a_i^{-1}} \) for every \( {1\leq i\leq p-1} \). The term free comes from the fact that the sole relations are due to the group structure. There are no extra relations. The free group is infinite, and non commutative if \( {n>1} \). Every finitely generated group is isomorphic to the quotient of a free group by extra relations. Note that \( {\mathbb{F}_1} \) is isomorphic to \( {(\mathbb{Z},+)} \) and \( {S=\{\pm e_1\}} \), while for \( {d>1} \) the group \( {(\mathbb{Z}^d,+)} \) is isomorphic to the quotient of \( {\mathbb{F}_d} \) by the extra relations which are the commutation relations between the generators \( {S=\{\pm e_i:1\leq i\leq d\}} \). The group \( {(\mathbb{Z}^d,+)} \) can be seen from this perspective as the free commutative group. The notation \( {e_i} \) matches the standard notation for the canonical basis of \( {\mathbb{Z}^d} \).
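The simplification rules translate directly into a stack-based reduction. The sketch below relies on an encoding chosen here for illustration: the letter \( {e_i} \) is the integer `i`, its inverse \( {e_i^{-1}} \) is `-i`, and a word is a tuple of such integers, the empty tuple playing the role of \( {\varnothing} \). The function names are hypothetical.

```python
def reduce_word(word):
    """Cancel adjacent pairs e_i e_i^{-1} and e_i^{-1} e_i, using a stack."""
    stack = []
    for letter in word:
        if stack and stack[-1] == -letter:
            stack.pop()  # the new letter cancels the last one kept
        else:
            stack.append(letter)
    return tuple(stack)


def multiply(a, b):
    """Group operation: concatenate, then simplify."""
    return reduce_word(tuple(a) + tuple(b))


def inverse(a):
    """(a_1 ... a_p)^{-1} = a_p^{-1} ... a_1^{-1}."""
    return tuple(-letter for letter in reversed(a))


a = (1, 2)                    # e_1 e_2
print(multiply(a, inverse(a)))               # empty word
print(multiply((1,), (2,)), multiply((2,), (1,)))  # e_1 e_2 versus e_2 e_1
print(reduce_word((1, 2, -2, -1, 3)))        # cascading cancellations
```

The stack makes cascading cancellations automatic: once an inner pair disappears, the letters around it become adjacent and are compared in turn.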
Cayley graph. The (left) Cayley graph of \( {\mathbb{F}_n} \) has vertices \( {V=\mathbb{F}_n} \) and edges
\[ E=\{\{h,k\}\subset V:kh^{-1}\in S\}=\{\{h,sh\}:h\in V,s\in S\}. \]
We say left since we multiply from the left by elements of \( {S} \) in order to produce edges. This graph is actually a rooted infinite \( {2n} \)-regular tree, in which the root \( {\varnothing} \) has \( {2n} \) children while each other vertex has exactly \( {1} \) parent and \( {2n-1} \) children. Note that \( {\mathbb{F}_1} \) is isomorphic to \( {\mathbb{Z}} \), and its Cayley graph is a doubly infinite chain. In \( {\mathbb{Z}^n} \), \( {n>1} \), the commutation relations produce cycles, which are not present in \( {\mathbb{F}_n} \).
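The tree structure implies that the sphere of radius \( {k\geq1} \) around the root contains \( {2n(2n-1)^{k-1}} \) vertices. A breadth-first search illustrates this; the sketch assumes an encoding chosen here (integer `i` for \( {e_i} \), `-i` for \( {e_i^{-1}} \), tuples for reduced words) and hypothetical function names.

```python
def generators(n):
    """The 2n letters of S, encoded as nonzero integers."""
    return [s for i in range(1, n + 1) for s in (i, -i)]


def left_multiply(s, word):
    """Reduced form of s * word, for a single letter s and a reduced word."""
    if word and word[0] == -s:
        return word[1:]
    return (s,) + word


def sphere_sizes(n, radius):
    """Sizes of the spheres of radius 0..radius around the root, by BFS."""
    seen = {()}
    frontier = [()]
    sizes = [1]
    for _ in range(radius):
        nxt = []
        for h in frontier:
            for s in generators(n):
                k = left_multiply(s, h)  # neighbor of h in the left Cayley graph
                if k not in seen:
                    seen.add(k)
                    nxt.append(k)
        frontier = nxt
        sizes.append(len(frontier))
    return sizes


print(sphere_sizes(1, 3))  # doubly infinite chain
print(sphere_sizes(2, 3))  # 4-regular tree
```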
The set \( {\mathbb{F}_n} \) is a metric space for the natural distance \( {\mathrm{dist}} \) on its Cayley graph: \( {\mathrm{dist}(a,b)} \) is the length of the shortest path from \( {a} \) to \( {b} \) in the graph, for any \( {a,b\in \mathbb{F}_n} \). Let us denote by \( {|a|} \) the length of the word \( {a} \) after simplification, in other words we have \( {a=c_1\cdots c_{|a|}} \) where \( {c_1,\ldots,c_{|a|}\in S} \) and \( {c_{i+1}\neq c_i^{-1}} \) for any \( {1\leq i<|a|} \). Then \( {\mathrm{dist}(\varnothing,a)=|a|} \) and, since edges are produced by left multiplication, \( {\mathrm{dist}(a,b)=|ba^{-1}|} \). The distance is invariant by right translation in the sense that \( {\mathrm{dist}(ac,bc)=\mathrm{dist}(a,b)} \) for any \( {a,b,c\in \mathbb{F}_n} \).
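The graph distance can be compared with a word length on small examples. In the sketch below the distance is computed as the reduced length of \( {ba^{-1}} \), the convention compatible with edges of the form \( {\{h,sh\}} \), and is checked against a breadth-first search. The integer encoding of letters (`i` for \( {e_i} \), `-i` for \( {e_i^{-1}} \)) and all function names are assumptions made here.

```python
from collections import deque


def reduce_word(word):
    """Cancel adjacent inverse pairs with a stack."""
    stack = []
    for letter in word:
        if stack and stack[-1] == -letter:
            stack.pop()
        else:
            stack.append(letter)
    return tuple(stack)


def inverse(a):
    return tuple(-letter for letter in reversed(a))


def word_dist(a, b):
    """Reduced length of b a^{-1}."""
    return len(reduce_word(tuple(b) + inverse(a)))


def bfs_dist(a, b, n, limit=6):
    """Graph distance from a to b in the left Cayley graph of F_n, up to `limit`."""
    gens = [s for i in range(1, n + 1) for s in (i, -i)]
    a, b = reduce_word(a), reduce_word(b)
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        h, dist = queue.popleft()
        if h == b:
            return dist
        if dist < limit:
            for s in gens:
                k = reduce_word((s,) + h)  # left multiplication by a generator
                if k not in seen:
                    seen.add(k)
                    queue.append((k, dist + 1))
    return None  # b is farther than `limit` from a


a, b, c = (1, 2), (2,), (-2, 1)
print(word_dist(a, b), bfs_dist(a, b, 2))
ac, bc = reduce_word(a + c), reduce_word(b + c)
print(word_dist(ac, bc))  # invariance under right translation by c
```

In this example \( {a=e_1e_2} \) and \( {b=e_2} \) are neighbors, since \( {b=e_1^{-1}a} \), whereas the reduced word \( {a^{-1}b} \) has length \( {3} \).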
Random walk again. The random walk on \( {\mathbb{F}_n} \) is the random walk on the Cayley graph of \( {\mathbb{F}_n} \). At each step, it multiplies its position in \( {\mathbb{F}_n} \), from the left, by a random letter chosen uniformly in \( {S} \). For any \( {t\geq0} \) and \( {h,k\in \mathbb{F}_n} \),
\[ P(h,k) :=\mathbb{P}(X_{t+1}=k\,|\,X_t=h) =\frac{\mathbf{1}_{\{h,k\}\in E}}{|S|} =\frac{\mathbf{1}_{kh^{-1}\in S}}{2n} \]
where \( {E} \) is the set of edges of the Cayley graph of \( {\mathbb{F}_n} \). For any \( {m\geq1} \), let us consider
\[ P^m(h,k):=\mathbb{P}(X_m=k\,|\,X_0=h). \]
The law of the random walk is translation invariant: for every \( {h,k\in \mathbb{F}_n} \),
\[ P^m(h,k)=P^m(\varnothing,kh^{-1}). \]
In particular the diagonal of \( {P^m} \) is constant: \( {P^m(h,h)} \) does not depend on \( {h} \). Let \( {A_{2n}} \) be the adjacency operator of the Cayley graph of \( {\mathbb{F}_n} \) which is the regular tree of degree \( {d=2n} \) (see above). For every \( {m\geq0} \), we get from our preceding analysis that
\[ P^{m}(\varnothing,\varnothing) =\frac{\langle A_{2n}^m\delta_\varnothing,\delta_\varnothing\rangle}{(2n)^{m}}. \]
We know that \( {P^{m}(\varnothing,\varnothing)=0} \) if \( {m} \) is odd. If \( {m} \) is even, we may replace \( {m} \) by \( {2m} \) for convenience. If \( {n=1} \) then \( {\mathbb{F}_1=\mathbb{Z}} \) is commutative and
\[ P^{2m}(\varnothing,\varnothing) \overset{n=1}{=}\frac{1}{2^{2m}}\binom{2m}{m}. \]
In contrast, if \( {n>1} \) then \( {\mathbb{F}_n} \) is no longer commutative, and by the Kesten-McKay formula,
\[ P^{2m}(\varnothing,\varnothing) \sim_{n\rightarrow\infty} \frac{(2n)^m}{(2n)^{2m}}C_m=\frac{C_m}{(2n)^m} \]
where \( {C_m=\frac{1}{1+m}\binom{2m}{m}} \) is the \( {m} \)-th Catalan number.
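Both regimes can be explored with exact arithmetic. The sketch below is an illustration under an assumption made here: since the Cayley graph is a tree, it suffices to follow the distance to the root, which moves from \( {0} \) to \( {1} \) with probability \( {1} \), and from \( {k\geq1} \) to \( {k+1} \) with probability \( {(2n-1)/(2n)} \) or to \( {k-1} \) with probability \( {1/(2n)} \). The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def return_probability(n, steps):
    """Exact P^steps(root, root) for the simple random walk on F_n."""
    d = 2 * n
    probs = [Fraction(1)]  # probs[k] = probability of being at distance k
    for _ in range(steps):
        new = [Fraction(0)] * (len(probs) + 1)
        for k, p in enumerate(probs):
            if k == 0:
                new[1] += p  # every neighbor of the root is at distance 1
            else:
                new[k + 1] += p * Fraction(d - 1, d)  # away from the root
                new[k - 1] += p * Fraction(1, d)      # back toward the root
        probs = new
    return probs[0]


# n = 1: central binomial coefficients.
for m in range(1, 5):
    assert return_probability(1, 2 * m) == Fraction(comb(2 * m, m), 4 ** m)

# Large n: (2n)^m P^{2m} approaches the Catalan number C_m.
m = 3
for n in (2, 10, 100, 1000):
    print(n, float(return_probability(n, 2 * m) * (2 * n) ** m))
```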
Notes. The Kesten-McKay distribution was obtained by Harry Kesten (1931 – 2019) in his doctoral thesis (35p., entitled Symmetric random walks on groups, published in 1959 in Transactions of the AMS), in the special case of the random walk on the Cayley graph of the free group. A modern presentation lifted to regular trees can be found in the book by Hora and Obata (theorem 4.4). The cumulative distribution of the Kesten-McKay distribution \( {\mu_d} \) has a nice expression: for any \( {u\in(-2\sqrt{d-1},2\sqrt{d-1})} \),
\[ \begin{array}{rcl} \mu_d((-\infty,u)) &=&\int_{-2\sqrt{d-1}}^u\!\frac{d\sqrt{4(d-1)-x^2}}{2\pi(d^2-x^2)}\,dx \\ &=&\frac{1}{2}+\frac{d}{2\pi} \left[\arcsin\frac{u}{2\sqrt{d-1}}-\frac{d-2}{d}\arctan\frac{(d-2)u}{d\sqrt{4(d-1)-u^2}} \right]. \end{array} \]
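The closed form can be compared with a direct numerical integration of the density. The sketch below uses a midpoint rule after the substitution \( {x=-2\sqrt{d-1}\cos\theta} \), a choice made here to avoid the square-root behavior at the edge of the support; the function names are hypothetical.

```python
from math import acos, asin, atan, cos, pi, sin, sqrt


def km_cdf_closed_form(d, u):
    """Closed-form expression of mu_d((-inf, u)) for |u| < 2 sqrt(d-1)."""
    r = 2 * sqrt(d - 1)
    s = sqrt(r * r - u * u)
    return 0.5 + d / (2 * pi) * (
        asin(u / r) - (d - 2) / d * atan((d - 2) * u / (d * s)))


def km_cdf_numeric(d, u, steps=20000):
    """Midpoint rule with x = -r cos(theta), theta from 0 to acos(-u / r)."""
    r = 2 * sqrt(d - 1)
    theta_u = acos(-u / r)
    h = theta_u / steps
    total = 0.0
    for j in range(steps):
        theta = (j + 0.5) * h
        x = -r * cos(theta)
        # density(x) dx = d (r sin(theta))^2 / (2 pi (d^2 - x^2)) dtheta
        total += d * (r * sin(theta)) ** 2 / (2 * pi * (d * d - x * x))
    return total * h


for d in (2, 3, 10):
    for u in (-1.0, 0.0, 0.5, 1.5):
        print(d, u, km_cdf_closed_form(d, u), km_cdf_numeric(d, u))
```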
It has been shown by Brendan McKay (1951 – ) in his 1981 paper The Expected Eigenvalue Distribution of Large Regular Graphs that if \( {{(G_n)}_{n\geq1}} \) is a sequence of finite regular graphs with fixed degree \( {d\geq2} \) such that
- the number of vertices \( {|G_n|} \) satisfies \( {\lim_{n\rightarrow\infty}|G_n|=\infty} \);
- the number of \( {k} \)-cycles \( {C_k(G_n)} \) in \( {G_n} \) satisfies \( {\lim_{n\rightarrow\infty}\frac{C_k(G_n)}{|G_n|}=0} \) for any \( {k\geq3} \);
then the empirical distribution of the eigenvalues of the adjacency matrix of \( {G_n} \) tends weakly to \( {\mu_d} \) as \( {n\rightarrow\infty} \). Moreover if \( {G} \) is a random regular graph drawn from the uniform distribution on regular graphs with degree \( {d} \) and \( {n} \) vertices, then the expected spectral distribution of the adjacency matrix of \( {G} \) tends weakly as \( {n\rightarrow\infty} \) to \( {\mu_d} \). The proof of McKay is based on the method of moments. At the very end of his paper, McKay says that his work can be connected to the statistical theory of random walks but does not cite explicitly the work of Kesten.