We will see in this post how the arcsine law appears naturally in the combinatorics of paths of the simple random walk on Z, while the Wigner semicircle law appears naturally in the asymptotic analysis (over the degree) of the combinatorics of paths in regular trees, in relation with free groups. The link with free probability and random matrices might be the subject of further posts.
Counting paths on graphs. Let G=(V,E) be a graph with set of vertices V, and set of edges E⊂{{v,w}:v≠w∈V}. We assume that V is at most countable, that E is non empty, and that each vertex has positive and finite degree: for any v∈V,
dv=∑w∈V1{v,w}∈E∈(0,∞).
We may consider the Hilbert space
ℓ2C(V)={x:V→C with ∑v∈V|xv|2<∞}.
To any x∈ℓ2C(V) we associate a complex measure ∑v∈Vxvδv. We have ⟨δv,δw⟩=1v=w and we say that (δv)v∈V is the canonical basis of ℓ2C(V). The scalar product is
⟨x,y⟩=∑v,w∈Vxv¯yw⟨δv,δw⟩=∑v∈Vxv¯yv.
The adjacency operator of G is A:ℓ2C(V)→ℓ2C(V) defined for every v,w∈V by
⟨Aδw,δv⟩=1{v,w}∈E.
Note that A is well defined since ‖Aδv‖2=⟨Aδv,Aδv⟩=dv<∞ for any v∈V. For any m∈N, we denote by Am=A∘⋯∘A the m-th power of A. For every vertices v,w∈V the number of paths in G of length m starting at v and ending at w is
⟨Amδv,δw⟩=∑u0,…,um∈Vv=u0,um=w1{u0,u1}∈E⋯1{um−1,um}∈E.
Simple random walk on graphs. Let G=(V,E) be as above. The simple random walk X=(Xt)t∈N on G is the Markov chain with state space V and transition kernel P:V×V↦[0,1] given for any v,w∈V by
P(v,w)=P(Xn+1=w|Xn=v)=1{v,w}∈Edv.
The measure μ on V associated to the degree sequence (dv)v∈V of G, defined by μ(v)=dv for every v∈V, is symmetric with respect to the kernel P: for every v,w∈V,
μ(v)P(v,w)=μ(w)P(w,v).
Suppose now that G is regular, meaning that (dv)v∈V is constant and say equal to some d>1. Then μ is proportional to the counting measure, and P is doubly stochastic. Conditional on {X0=v}, for every m∈N, the law of the random path (X0=v,X1,…,Xm) is uniform on the set of paths of length m starting at v. It follows that for every v∈V and every m∈N,
P(Xm=v|X0=v)=Pm(v,v)=⟨Amδv,δv⟩dm,
and it follows then from Markov chains theory that the random walk X, which is irreducible, is recurrent (null recurrent if V is infinite) if and only if for some (thus every) v∈V
∑m∈N⟨Amδv,δv⟩dm=∞.
One can compute ⟨Amδv,δv⟩ with some combinatorics for certain nice regular graphs.
Counting paths on square lattices. Let us compute ⟨Amδv,δv⟩ when G is the usual square lattice of dimension n≥1, which is the Cayley graph of the commutative group (Zn,+). In this case G is regular of even degree d=2n. In this graph G, the quantity ⟨Amδv,δv⟩ does not depend on v and is equal to zero if m is odd. When m is even, we may replace m by 2m for convenience. Thanks to the commutativity, one can, for each path, group the increments by type. This makes the combinatorics multinomial (encodes the outcomes of a dice of 2n faces thrown 2m times): place 2m balls in 2n boxes labeled +e1,−e1,…,+en,−en in such a way that the boxes +ei and −ei contain the same number of balls, for each 1≤i≤n:
⟨A2mδ0,δ0⟩=∑0≤m1,…,mn≤mm1+⋯+mn=m(2mm1,m1,…,mn,mn)=∑0≤m1,…,mn≤mm1+⋯+mn=m(2m)!m1!2⋯mn!2.
When n=1 then the formula boils down to the central binomial coefficient
⟨A2mδ0,δ0⟩n=1=(2mm).
When n=2 then by the Vandermonde convolution formula for binomial coefficients,
⟨A2mδ0,δ0⟩n=2=(2m)!m!2∑0≤k≤m(mk)(mm−k)=(2m)!m!2(2mm)=(2m)!2m!4.
More generally, one may seek for compact formulas for any n≥1 in the same spirit. Beyond the example of square lattices, it turns out that one can compute ⟨Amδv,δv⟩ for other examples of regular graphs, such as for instance regular trees.
Counting paths on regular trees. For any integer d≥2, the regular tree Td=(V,E) is the graph with infinite countable number of vertices, without cycles, and in which each vertex has exactly d neighbors. If d is even with d=2n then Td=T2n is the Cayley graph of the free group Fn (see below), in particular T2 is the Cayley graph of Z. Let us pick a vertex ∅ in Td, and call it the root. In the case d=2n it is customary to identify ∅ with the neutral element of the free group Fn (which is 0 if n=1, and the empty string if n>1, see below for details).
Let us denote by Ad the adjacency operator of our regular tree Td. Recall that ⟨Amdδv,δv⟩ is the number of paths in the graph Td, of length m, starting and ending at ∅. As for square lattices, the quantity ⟨Amdδv,δv⟩ does not depend on v∈V, and is equal in particular to ⟨Amdδ∅,δ∅⟩, and is equal to zero if m is odd. We have
⟨Amdδ∅,δ∅⟩=∫xmμd(dx)
for any m∈N, where μd is the Kesten-McKay distribution given by
μd(dx)=d√4(d−1)−x22π(d2−x2)1[−2√d−1,2√d−1](x)dx.
In other words, the number of paths of length m in Td, starting and ending at ∅, is given by the m-th moment of the probability distribution μd. A proof relies on the Carleman moments condition for the existence of the probability measure, and on the inversion of the Cauchy-Stieltjes transform to get the formula for the density. Experts may note by the way that the probability distribution μd is nothing else but the so-called spectral distribution with respect to vector δ∅ of the symmetric operator Ad (actually with respect to any canonical vector δv, v∈V, thanks to symmetries!).
When d=2 then T2 is the Cayley graph of F1=Z and the corresponding Kesten-McKay distribution μ2 is the arcsine distribution on [−2,2]: for every integer m≥1,
⟨A2mdδ∅,δ∅⟩=∫x2mμ2(dx)=∫2−2x2mπ√4−x2dx=(2mm)
and we recover that the number of paths of length 2m in Z starting and ending at the root ∅ (which is the origin 0) is given by the central binomial coefficient:
⟨A2mdδ∅,δ∅⟩=(2mm).
At the opposite side of degree, when d→∞ then μd, scaled by (d−1)−1/2, tends to the semicircle distribution on [−2,2] (also known as the Wigner or the Sato-Tate distribution): for every integer m≥1,
limd→∞⟨A2mdδ∅,δ∅⟩(d−1)m=limd→∞∫(x√d−1)2mμd(dx)=limd→∞d2π∫2√d−1−2√d−1(x√d−1)2m√4(d−1)−x2d2−x2dx=limd→∞d2π∫2−2y2m√4(d−1)−(d−1)y2d2−(d−1)y2√d−1dy=∫2−2y2m√4−y22πdy=11+m(2mm).
This quantity Cm=11+m(2mm) is nothing else but the m-th Catalan number. Taking d=2n, this says that the number of paths of length 2m in the free group Fn starting and ending at the root ∅ is equivalent, as n→∞, to (2n−1)mCm:
⟨A2m2nδ∅,δ∅⟩∼n→∞(2n)m1m+1(2mm).
Tensor products versus free products. There are two natural multivariate generalizations of the group (Z,+). The first one is the commutative group (Zn,+)=(Z,+)⊗n, obtained by tensor product, for which the Cayley graph is the usual square lattice. The second one is the less known free group Fn=(Z,+)⊞n which is non commutative if n>1, but for which the Cayley graph is a tree like for Z and unlike for Zn with n>1. The Cayley graphs of Zn and Fn are both infinite and (2n)-regular. One should not confuse the free group Fn with the finite field Fq.
- Commutative universe: Zn, tensor product of n copies of Z
- Regular square lattice, presence of cycles
- Central binomial coefficients and arcsine distribution
- Non commutative universe: Fn, free product of n>1 copies of Z
- Regular tree, absence of cycles
- Catalan numbers and semicircle distribution (only as n→∞).
Free product and free group. If G1,…,Gn are at most countable groups, then their free product G is the group of finite strings (words) written using as letters the element of the disjoint union of G1,…,Gn. The sole simplifications in these strings are the ones coming from the group operation in each G1,…,Gn. The group operation in G is the concatenation of strings, and the neutral element is the empty string ∅. If G1,…,Gn are finitely generated then G is also finitely generated and one may use the generators of G1,…,Gn as letters to construct the elements of G. The free group Fn is the free product of n copies of (Z,+). It is customary to use the multiplicative notation. Namely, let n≥1 be an integer and let
S={e1,…,en,e−11,…,e−1n}
be a set of 2n distinct letters. The notation will make sense in the sequel. The free group Fn with n generators is constituted by all the finite words (strings of letters) written with the letters of S, with the simplification rules
eie−1i=e−1iei=∅
for every 1≤i≤d, where ∅ denotes the empty string. For any a1⋯ap and b1⋯bq in Fn, with a1,…,ap,b1,…,bq in S, the group operation consists first to concatenate the strings into a1⋯apb1⋯bq, and then to simplify using the simplification rules. This operation is not commutative if n>1. The empty word ∅ is the neutral element. With multiplicative notation, we have (ei)−1=e−1i and (e−1i)−1=ei and (a1⋯ap)−1=a−1p⋯a−11. We say that S is a generating set of Fn, and that the element of S are generators. Every element a of Fn with a≠∅ can be written as
a=a1⋯ap
with p≥1 and a1,…,ap∈S and ai+1≠a−1i for every 1≤i≤p−1. The term free comes from the fact that the sole relations are due to the group structure. There are no extra relations. The free group is infinite and non commutative if n>1. Every finitely generated group is isomorphic to the quotient of a free group by extra relations. Note that F1 is isomorphic to (Z,+) and S={±e1}, while for d>1 the group (Zd,+) is isomorphic to the quotient of Fd with the extra relations which are the commutation relations between the generators S={±ei:1≤i≤d}. The group (Zd,+) can be seen from this perspective as the free commutative group. The notation ei matches the standard notation for the canonical basis of Zd.
Cayley graph. The (left) Cayley graph of Fn has vertices V=Fn and edges
E={{h,k}⊂V:kh−1∈S}={{h,sh}:h∈V,s∈S}.
We say left since we multiply from the left by elements of S in order to produce edges. This graph is actually a rooted infinite 2n-regular tree, in which the root ∅ has 2n children while each other vertex has exactly 1 parent and 2n−1 children. Note that F1 is isomorphic to Z, and its Cayley graph is a doubly infinite chain. In Zn, n>1, the commutation relations produce cycles, which are not present in Fn.
The set Fn is a metric space for the natural distance dist on its Cayley graph: dist(a,b) is the length of the shortest path from a to b in the graph, for any a,b∈Fn. Let us denote by |a| the length of the word a after simplification, in other words we have a=c1⋯c|a| where c1,…,c|a|∈S and ci≠c−1i for any 1≤i<|a|. Then dist(∅,a)=|a| and dist(a,b)=|a−1b|. The distance is invariant by translation in the sense that dist(ac,bc)=dist(a,b) for any a,b,c∈Fn.
Random walk again. The random walk on Fn is the random walk on the Cayley graph of Fn. At each step, it multiplies its position in Fn, from the left, by a random letter chosen uniformly in S. For any t≥0 and h,k∈Fn,
P(h,k):=P(Xt+1=k|Xt=h)=1{h,k}∈E|S|=1kh−1∈S2n
where E is the set of edges of the Cayley graph of Fn. For any m≥1, let us consider
Pm(h,k):=P(Xm=k|X0=h),
The law of the random walk is translation invariant: for every h,k∈Fn,
Pm(h,k)=pm(∅,kh−1).
In particular the diagonal of Pm is constant: Pm(h,h) does not depend on h. Let A2n be the adjacency operator of the Cayley graph of Fn which is the regular tree of degree d=2n (see above). For every m≥0, we get from our preceding analysis that
Pm(∅,∅)=⟨Am2nδ∅,δ∅⟩(2n)m.
We know that Pm(∅,∅)=0 if m is odd. If m is even, we may replace m by 2m for convenience. If n=1 then F1=Z is commutative and
P2m(∅,∅)n=1=122m(2mm).
In contrast, if n>1 then Fn is no longer commutative, and by the Kesten-McKay formula,
P2m(∅,∅)∼n→∞(2n)m(2n)2mCm=Cm(2n)m
where Cm=11+m(2mm) is the m-th Catalan number.
Notes. The Kesten-McKay distribution was obtained by Harry Kesten (1931 - ) in his doctoral thesis (35p., entitled Symmetric random walks on groups, published in 1959 in Transactions of the AMS), in the special case of the random walk on the Cayley graph of the free group. A modern presentation lifted to regular trees can be found in the book by Hora and Obata (theorem 4.4). The cumulative distribution of the Kesten-McKay distribution μd has a nice expression: for any u∈(−2√d−1,2√d−1),
μd((−∞,u))=∫u−2√d−1d√4(d−1)−x22π(d2−x2)dx=12+d2π[arcsinu2√d−1−d−2darctan(d−2)ud√4(d−1)−u2].
It has been shown by Brendan McKay (1951 - ) in his 1981 paper The Expected Eigenvalue Distribution of Large Regular Graphs that if (Gn)n≥1 is a sequence of finite regular graphs with fixed degree d≥2 such that
- the number of vertices |Gn| satisfies limn→∞|Gn|=∞;
- the number k-cycles Ck(Gn) in Gn satisfies limn→∞Ck(Gn)|Gn|=0 for any k≥3;
then the empirical distribution of the eigenvalues of the adjacency matrix of Gn tends weakly to μd as n→∞. Moreover if G is a random regular graph drawn from the uniform distribution on regular graphs with degree d and n vertices, then the expected spectral distribution of the adjacency matrix of G tends weakly as n→∞ to μd. The proof of McKay is based on the method of moments. At the very end of his paper, McKay says that his work can be connected to the statistical theory of random walks but does not cite explicitly the work of Kesten.