Month: June 2011

Generating random permutations following the uniform law on the symmetric group ${S_n}$ is useful in many concrete situations. Various simulation algorithms are available. At the time of writing, the randperm function of Octave/Freemat/Matlab is implemented as

$\mathrm{[junk,perm]=sort(rand(1,n))}$

This algorithm is based on the following fact: if ${U_1,\ldots,U_n}$ are i.i.d. uniform random variables on ${[0,1]}$ (one may replace the uniform law by any other law without atoms) then the random permutation which sorts them is distributed according to the uniform law on ${S_n}$. The computational complexity of the algorithm is roughly the complexity of the quicksort which is ${O(n\log(n))}$ with high probability and ${O(n^2)}$ in the worst case. By the way, the randomized version of quicksort is analyzed in the book by Motwani and Raghavan. It is quite fun to realize that in order to produce a disorder, we reorder a disorder.
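As a sketch, the same trick in Python/NumPy (the function name is my own; `argsort` returns the sorting permutation, playing the role of the second output of Matlab's sort):

```python
import numpy as np

def randperm_by_sorting(n, rng=None):
    """Uniform random permutation of {0, ..., n-1} via the sorting trick.

    The permutation sorting n i.i.d. uniform random variables is uniform
    on S_n, since the uniform law has no atoms (ties have probability 0).
    Complexity is that of the sort, O(n log n) on average.
    """
    rng = np.random.default_rng(rng)
    return np.argsort(rng.random(n))
```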

Actually, one can get rid of the logarithmic factor: there exist ${O(n)}$ algorithms producing uniform random permutations. For instance, one may pick the ${n}$ images of the permutation without replacement. Unfortunately, this method is awkward to implement, since one has to keep track of the already chosen elements, and this bookkeeping has a cost. Alternatively, one may use the Chinese restaurant process. Recall that the Chinese restaurant process (with parameter ${\theta=1}$) consists, at step ${k}$, in either adding ${k}$ to one of the existing cycles (built on ${\{1,\ldots,k-1\}}$) with probability proportional to its length, or creating a new cycle of unit length with probability ${1/k}$. Starting from the permutation ${(1)\cdots(n)}$ with ${n}$ fixed points, this corresponds to choosing ${i}$ uniformly on ${\{1,\ldots,k\}}$ and multiplying by the transposition ${(i,k)}$. Here ${(i,k)}$ is the identity if ${i=k}$, and we recall that multiplying by the transposition ${(i,k)}$ either merges the cycles containing ${i}$ and ${k}$ if they are disjoint, or splits the cycle containing both ${i}$ and ${k}$ if they belong to the same cycle. Since ${k}$ belongs to a cycle of length ${1}$, it is added to the cycle containing ${i}$ with probability ${(k-1)/k}$. Therefore, the algorithm reduces to computing the product of transpositions

$(1\,U_1)\cdots(n\,U_n)$

where ${U_1,\ldots,U_n}$ are independent random variables with ${U_k}$ uniform on ${\{1,\ldots,k\}}$. This algorithm, also known as the Fisher-Yates or Knuth shuffle, is very simple to implement:

$\begin{array}{rcl} &&\mathrm{for\ k\ from\ 1\ to\ n\ do\ v[k]:=k;\ end} \\ &&\mathrm{for\ k\ from\ n\ downto\ 2\ do\ i:=rand(1,k);\ swap(v[i],v[k]);\ end} \end{array}$
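For illustration, a direct Python transcription of this pseudocode (a sketch with 0-based indices, not the Scilab routine itself):

```python
import random

def fisher_yates(n, rng=random):
    """Uniform random permutation of {0, ..., n-1} by the Knuth shuffle, O(n)."""
    v = list(range(n))
    for k in range(n - 1, 0, -1):   # k from n-1 downto 1 (0-based)
        i = rng.randint(0, k)       # i uniform on {0, ..., k}; i == k allowed
        v[i], v[k] = v[k], v[i]     # apply the transposition (i k)
    return v
```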

The Fisher-Yates shuffle is the algorithm used at the time of writing by the grand random generator of Scilab, via the Fortran 77 routine genprm. Beware that this algorithm is not competitive if naively implemented in an interpreted language (which explains the sort(rand) trick above). Note that the Chinese restaurant process also produces the cycle decomposition almost for free. With an arbitrary ${\theta}$, it allows more generally the simulation of random permutations on ${S_n}$ following the Ewens law with parameter ${\theta}$ (we recover the uniform law on ${S_n}$ when ${\theta=1}$). The Ewens law appears as a Boltzmann-Gibbs measure on ${S_n}$ with inverse temperature ${-\log(\theta)}$ and energy equal to the number of cycles.
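As an illustration of the Chinese restaurant construction for the Ewens law, here is a hedged Python sketch (the function name and interface are my own; ${\theta=1}$ recovers the Fisher-Yates shuffle):

```python
import random

def ewens_permutation(n, theta=1.0, rng=random):
    """Random permutation of {0, ..., n-1} following the Ewens law.

    Chinese restaurant construction: at step k (0-based), element k starts
    a new cycle with probability theta/(theta + k); otherwise it is inserted
    into an existing cycle via the transposition (i k), i uniform on
    {0, ..., k-1}, chosen with probability proportional to cycle lengths.
    """
    p = list(range(n))                            # identity: n fixed points
    for k in range(1, n):
        if rng.random() >= theta / (theta + k):   # join an existing cycle
            i = rng.randrange(k)                  # i uniform on {0, ..., k-1}
            p[i], p[k] = p[k], p[i]               # multiply by (i k)
    return p
```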

There exist approximate Markov chain Monte Carlo (MCMC) methods, based on an aperiodic and irreducible Markov chain on ${S_n}$ admitting the uniform law on ${S_n}$ as invariant law. For instance, one may use the random walk on ${S_n}$ with i.i.d. increments distributed according to the uniform law on transpositions. This Markov chain on ${S_n}$ features a total variation cutoff at time ${\frac{1}{2}n\log(n)}$, as shown by Diaconis and Shahshahani. One may also use the top to random shuffle Markov chain, which features a cutoff at time ${n\log(n)}$ related to the coupon collector, see e.g. Aldous and Diaconis. More examples are considered in an article by Diaconis, Fill, and Pitman, and in the excellent, relatively recent book by Levin, Peres, and Wilmer.
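For concreteness, the random transposition walk can be sketched as follows (a lazy version, in which ${i=j}$ is allowed; this keeps the chain aperiodic, and is the variant analyzed by Diaconis and Shahshahani; the function name is mine):

```python
import random

def random_transposition_walk(n, steps, rng=random):
    """Run the lazy random-transposition walk on S_n for a given number of steps.

    Each step composes the current permutation with the transposition (i j),
    where i and j are independent and uniform on {0, ..., n-1} (i == j giving
    a lazy identity step). The uniform law on S_n is invariant, and the chain
    is close to uniform after about (1/2) n log(n) steps.
    """
    p = list(range(n))
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        p[i], p[j] = p[j], p[i]
    return p
```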

The Steinhaus-Johnson-Trotter algorithm is also based on transpositions.

Note: this post benefited from discussions with many people on various occasions, including Laurent Miclo, Edouard Maurel-Segala, and Yan Doumerc.

Last Updated on 2014-08-20

A famous theorem due to Constantin Carathéodory states that if ${A\subset\mathbb{R}^d}$ then every point ${x}$ in the convex hull of ${A}$ is the mean of a discrete probability measure supported by at most ${d+1}$ atoms in ${A}$. In other words, there exist ${x_1,\ldots,x_{d+1}\in A}$ and ${\lambda_1,\ldots,\lambda_{d+1}\in[0,1]}$ such that ${\lambda_1+\cdots+\lambda_{d+1}=1}$ and

$x=\sum_{i=1}^{d+1}\lambda_ix_i.$

The simplex case ${A=\{0,e_1,\ldots,e_d\}}$, where ${e_1,\ldots,e_d}$ is the canonical basis of ${\mathbb{R}^d}$, shows that one cannot replace ${d+1}$ by a smaller number in general. The proof is not difficult. Namely, since ${x}$ belongs to the convex hull of ${A}$, it is a convex combination of, say, ${n}$ points ${x_1,\ldots,x_n}$ of ${A}$, namely ${x=\sum_{i=1}^n\lambda_ix_i}$ with ${\lambda_1,\ldots,\lambda_n\in[0,1]}$ and ${\lambda_1+\cdots+\lambda_n=1}$, and we may assume that ${\lambda_i>0}$ for every ${i}$. Now if ${n>d+1}$ then the ${n-1}$ vectors ${x_1-x_n,\ldots,x_{n-1}-x_n}$ of ${\mathbb{R}^d}$ are linearly dependent: ${\sum_{i=1}^{n-1}\mu_i(x_i-x_n)=0}$ for some reals ${\mu_1,\ldots,\mu_{n-1}}$ not all equal to zero. Setting ${\mu_n:=-\sum_{i=1}^{n-1}\mu_i}$, this gives ${\sum_{i=1}^n\mu_ix_i=0}$ together with ${\sum_{i=1}^n\mu_i=0}$, and therefore ${x=\sum_{i=1}^n(\lambda_i-\theta\mu_i)x_i}$ for any real ${\theta}$, the weights still summing to ${1}$. Since the ${\mu_i}$ sum to zero and are not all zero, some ${\mu_i}$ is positive, and we may take ${\theta:=\min\{\lambda_i/\mu_i:\mu_i>0\}}$: with this choice all the weights ${\lambda_i-\theta\mu_i}$ remain nonnegative and at least one of them vanishes, which expresses ${x}$ as a convex combination of at most ${n-1}$ points. The desired result follows by repeating this argument ${n-(d+1)}$ times. This proof is constructive as soon as we start with an explicit ${x}$.
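The reduction step of this proof can be sketched numerically (a sketch with NumPy; the function name and the SVD-based null-space computation are my own choices):

```python
import numpy as np

def caratheodory(points, weights, tol=1e-12):
    """Reduce a convex combination in R^d to at most d+1 atoms.

    points: (n, d) array; weights: (n,) nonnegative, summing to 1.
    Returns (points', weights') with at most d+1 atoms and the same mean.
    """
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    d = pts.shape[1]
    while len(w) > d + 1:
        # Find mu with sum(mu_i) = 0 and sum(mu_i x_i) = 0: a null vector
        # of the (d+1) x n matrix whose i-th column is (x_i, 1).
        M = np.vstack([pts.T, np.ones(len(w))])
        _, _, Vt = np.linalg.svd(M)
        mu = Vt[-1]                    # null vector exists since n > d+1
        # theta = min over mu_i > 0 of w_i/mu_i: weights w - theta*mu stay
        # nonnegative, still sum to 1, and at least one of them vanishes.
        pos = mu > tol
        theta = np.min(w[pos] / mu[pos])
        w = w - theta * mu
        keep = w > tol
        pts, w = pts[keep], w[keep]
        w = w / w.sum()                # renormalize against rounding errors
    return pts, w
```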

Following Bernard Maurey, one may drop the dependence on the dimension by relaxing the desired result: if ${H}$ is a Hilbert space and ${A\subset H}$ (say bounded, for simplicity) then there exists a constant ${c=c(A)}$ such that for every ${x}$ in the convex hull of ${A}$ and every ${n\geq1}$, there exist ${x_1,\ldots,x_n\in A}$ such that

$\left\Vert x-\frac{1}{n}\sum_{i=1}^nx_i\right\Vert \leq \frac{c}{\sqrt{n}}.$

The proof of Maurey is probabilistic. Namely, since ${x}$ belongs to the convex hull of ${A}$, there exists a probability measure ${\mu}$ supported in ${A}$ with mean ${x}$. Let ${(X_i)_{i\geq1}}$ be independent and identically distributed random variables with law ${\mu}$, so that ${\mathbb{E}(X_i)=x}$. Writing ${c^2:=\mathbb{E}(\left\Vert X_1-x\right\Vert^2)}$, which is bounded uniformly in ${x}$ since ${A}$ is bounded, independence gives, for every ${n\geq1}$,

$\mathbb{E}\left(\left\Vert x-\frac{1}{n}\sum_{i=1}^nX_i\right\Vert^2\right) =\frac{1}{n}\mathbb{E}(\left\Vert X_1-x\right\Vert^2) = \frac{c^2}{n}.$

Therefore, there exists at least one ${\omega}$ such that

$\left\Vert x-\frac{1}{n}\sum_{i=1}^nX_i(\omega)\right\Vert \leq \frac{c}{\sqrt{n}}.$

It remains to take ${x_i=X_i(\omega)}$. This proof is a simple instance of what is known as the empirical method. Like many probabilistic proofs, it is not constructive. The method of Maurey has many useful applications, such as bounds on covering numbers; see for instance Pisier, Remarques sur un résultat non publié de B. Maurey, Séminaire d’analyse fonctionnelle (“Maurey-Schwartz”, Polytechnique) (1980-1981), exp. no 5, p. 1-12. Recent applications are given in Vershynin, How close is the sample covariance matrix to the actual covariance matrix?, arXiv:1004.3484 [math.PR], and in Gaïffas and Lecué, Sharp oracle inequalities for the prediction of a high-dimensional matrix, arXiv:1008.4886 [math.ST].
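Maurey's ${c/\sqrt{n}}$ bound is easy to observe numerically. A hedged sketch (the setup, a discrete law ${\mu}$ on a few points of ${\mathbb{R}^2}$, and the function name are my own):

```python
import numpy as np

def maurey_error(points, weights, n, trials=300, rng=None):
    """Average distance between x = E(X) and the empirical mean of n
    i.i.d. samples X_1, ..., X_n drawn from the discrete law (points, weights)."""
    rng = np.random.default_rng(rng)
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    x = w @ pts                                   # the mean of the law mu
    errs = []
    for _ in range(trials):
        idx = rng.choice(len(w), size=n, p=w)     # n i.i.d. samples from mu
        errs.append(np.linalg.norm(x - pts[idx].mean(axis=0)))
    return float(np.mean(errs))
```

Averaged over many trials, the error should shrink roughly like ${1/\sqrt{n}}$: multiplying ${n}$ by ${9}$ should divide the error by about ${3}$.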

Last Updated on 2016-11-16

Spring-Summer school [June 20-21-22] in Paris (IHP) with the following two 9-hour courses:

• Jared Tanner (U. Edinburgh, UK) Stochastic Geometry and Random Matrix Theory in Compressed Sensing. Stochastic geometry and random matrix theory can be used to model and analyse the efficacy of a new paradigm in signal processing, compressed sensing. This new perspective shows that if a signal is sufficiently simple, it can be acquired at a rate proportional to its information content rather than the worst case rate dictated by a larger ambient space containing it. These lectures will show how this phenomenon can be modelled using stochastic geometry, and will also show how standard eigen-analysis in random matrix theory can give similar results.
• Roman Vershynin (U. Michigan, USA) Introduction to the non-asymptotic analysis of random matrices. This is a mini-course on basic non-asymptotic methods and concepts in random matrix theory. We will develop several tools for the analysis of the extreme singular values of random matrices with independent rows or columns. Many of these methods sprung off from the development of geometric functional analysis in the 1970-2000’s. They have applications in several fields, most notably in theoretical computer science, statistics and signal processing. Two applications will be discussed: for the problem of estimating covariance matrices in statistics, and for validating probabilistic constructions of measurement matrices in compressed sensing.

Bitcoin [bitcoin.org] is a virtual currency without any centralized issuing authority, using an encrypted peer-to-peer network to secure transactions. The total number of bitcoins is programmed to reach 21 million. All transactions are public and stored in a distributed database, which serves to confirm transactions and prevent double spending. The value of a bitcoin in US dollars has increased from $0.06 to over $30.00 since October 7, 2010.
