Suppose that $X_1,\ldots,X_n$ and $Y_1,\ldots,Y_n$ are two independent Gaussian samples on $\mathbb{R}^p$ with zero mean and unknown invertible covariances $A$ and $B$ respectively. We consider the situation where $p$ is much larger than $n$, but we assume that $A^{-1}$ and $B^{-1}$ are sparse (conditional independence between components). How can we test efficiently whether $A=B$? Same question if the sparsity is assumed on $A$ and $B$ rather than on their inverses (independence instead of conditional independence).
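For concreteness, here is a naive simulation sketch of this setting together with a permutation baseline that compares the two sample covariances through a Frobenius norm and ignores the sparsity of $A^{-1}$ and $B^{-1}$. It is only a point of comparison, not the efficient test sought here, and the tridiagonal precision matrix and the function name are merely illustrative choices:

```python
import numpy as np

def permutation_covariance_test(x, y, n_perm=500, seed=0):
    """Naive permutation baseline for H0: Cov(X) = Cov(Y), ignoring sparsity.

    Statistic: squared Frobenius norm of the difference between the two
    sample covariance matrices.  Under H0 the pooled observations are
    exchangeable, so randomly relabelling them gives a valid (if very
    inefficient when p >> n) reference distribution.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    pooled = np.vstack([x, y])

    def stat(a, b):
        return np.sum((np.cov(a, rowvar=False) - np.cov(b, rowvar=False)) ** 2)

    t_obs = stat(x, y)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(2 * n)               # random relabelling of the pooled sample
        if stat(pooled[idx[:n]], pooled[idx[n:]]) >= t_obs:
            count += 1
    return (count + 1) / (n_perm + 1)              # permutation p-value

# Toy usage: p = 50 > n = 20, same sparse (tridiagonal) precision matrix for both samples.
p, n = 50, 20
precision = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))  # sparse, positive definite
covariance = np.linalg.inv(precision)
rng = np.random.default_rng(1)
x = rng.multivariate_normal(np.zeros(p), covariance, size=n)
y = rng.multivariate_normal(np.zeros(p), covariance, size=n)
print(permutation_covariance_test(x, y))           # p-value, typically not small under H0
```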
Suppose that $(X_1,Y_1),\ldots,(X_n,Y_n)$ is a sample drawn from some unknown law on $\mathbb{R}^2$. How can we efficiently test whether the marginal laws are identical? More generally, suppose that $\ldots,Z_{-1},Z_0,Z_1,\ldots$ is a time series. How can we test whether the series is stationary? In other words, how can we test for a nonparametric linear structure from the observation of a sample of an unknown law? Random projections?
For these questions coming from applications, we seek a concrete, usable answer…
This post is inspired by (separate) informal discussions with Georges Oppenheim and Didier Concordet. My friend and colleague Christophe Giraud told me that he is working with Nicolas Verzelen and Fanny Villers on the first question.
For your third problem, what kind of assumption would you make on the densities of X and Y?
Here is a really naive approach if you think that the densities are smooth and have compact support:
1) If you assume that the densities belong to some finite-dimensional subspace S, then you can build a test by estimating the densities of X and Y separately and comparing the associated confidence regions. This would give you a parametric test T_S.
2) Apply a Bonferroni procedure over a nested class of subspaces derived from a wavelet expansion (a toy sketch of the resulting multi-resolution test is given after these remarks).
The procedure is valid since under the null hypothesis the projections of the densities of X and Y on any subspace S are still the same.
Such a procedure does not make use of the possible dependency between X and Y.
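Here is a minimal sketch of such a procedure, assuming for simplicity that both densities are supported on [0,1] and using nested dyadic histograms as a stand-in for the Haar case of the wavelet expansion; the function name, the Hoeffding confidence intervals and the choice of max_level are only illustrative:

```python
import numpy as np

def haar_bonferroni_test(x, y, alpha=0.05, max_level=4):
    """Naive multi-resolution test of H0: X and Y have the same marginal law.

    Both samples are projected onto nested dyadic histograms on [0, 1]
    (the Haar case of a wavelet expansion).  For every bin at every
    resolution, a Hoeffding confidence interval is built for the bin
    probability of X and of Y; H0 is rejected as soon as two such
    intervals are disjoint.  A Bonferroni split of alpha over all the
    intervals keeps the size below alpha whatever the dependence
    between X and Y.
    """
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    n_intervals = 2 * sum(2 ** j for j in range(1, max_level + 1))
    delta = alpha / n_intervals                         # per-interval error budget
    half_width = np.sqrt(np.log(2.0 / delta) / (2 * n))  # Hoeffding half-width
    for j in range(1, max_level + 1):
        edges = np.linspace(0.0, 1.0, 2 ** j + 1)
        px = np.histogram(x, bins=edges)[0] / n         # empirical bin probabilities of X
        py = np.histogram(y, bins=edges)[0] / n         # empirical bin probabilities of Y
        if np.any(np.abs(px - py) > 2 * half_width):    # two disjoint confidence intervals
            return True                                 # reject H0
    return False                                        # do not reject H0

# Toy usage: dependent pair with identical uniform marginals (H0 true).
rng = np.random.default_rng(0)
u = rng.uniform(size=500)
print(haar_bonferroni_test(u, (u + 0.3) % 1.0))         # typically prints False
```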
Thanks, Nicolas, for your comments. Your answer is roughly the one I gave immediately to Georges Oppenheim, but he was not happy with it 🙂 He argued that if you estimate the marginal laws from the observation of the couples, then your two estimators are correlated, unless you split the sample into two parts, reducing its effective size by a factor of 2. I suggested some resampling procedure, but he was still not happy 🙂 I think that he was motivated by small samples rather than by asymptotic theory.
I do not think you have to split the sample in two parts:
Take a confidence region for the density f_X and a confidence region for f_Y, each at level 97.5%. A test that rejects H0 whenever the two confidence regions are disjoint then has size smaller than 5%. Roughly, I would think that you lose a factor of 2 in a logarithmic term instead of a factor of 2 in the sample size.
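Indeed, writing $R_X$ and $R_Y$ for the two 97.5% confidence regions and $f$ for the common density under H0, if the regions are disjoint then $f$ misses at least one of them, so by the union bound
$\mathbb{P}_{H_0}(R_X\cap R_Y=\emptyset)\leq \mathbb{P}(f\notin R_X)+\mathbb{P}(f\notin R_Y)\leq 0.025+0.025=0.05,$
whatever the dependence between the two estimators.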