-
Active clustering with bandit feedback
Authors:
Victor Thuot,
Alexandra Carpentier,
Christophe Giraud,
Nicolas Verzelen
Abstract:
We investigate the Active Clustering Problem (ACP). A learner interacts with an $N$-armed stochastic bandit with $d$-dimensional subGaussian feedback. There exists a hidden partition of the arms into $K$ groups, such that arms within the same group share the same mean vector. The learner's task is to uncover this hidden partition with the smallest budget - i.e., the least number of observations - and with a probability of error smaller than a prescribed constant $δ$. In this paper, (i) we derive a non-asymptotic lower bound for the budget, and (ii) we introduce the computationally efficient ACB algorithm, whose budget matches the lower bound in most regimes. We improve on the performance of a uniform sampling strategy. Importantly, contrary to the batch setting, we establish that there is no computation-information gap in the active setting.
Submitted 17 June, 2024;
originally announced June 2024.
-
Minimax optimal seriation in polynomial time
Authors:
Yann Issartel,
Christophe Giraud,
Nicolas Verzelen
Abstract:
We consider the statistical seriation problem, where the statistician seeks to recover a hidden ordering from a noisy observation of a permuted Robinson matrix. In this paper, we tightly characterize the minimax rate for this problem of matrix reordering when the Robinson matrix is bi-Lipschitz, and we also provide a polynomial time algorithm achieving this rate; thereby answering two open questions of [Giraud et al., 2021]. Our analysis further extends to broader classes of similarity matrices.
Submitted 14 May, 2024;
originally announced May 2024.
-
Estimating the history of a random recursive tree
Authors:
Simon Briend,
Christophe Giraud,
Gábor Lugosi,
Déborah Sulem
Abstract:
This paper studies the problem of estimating the order of arrival of the vertices in a random recursive tree. Specifically, we study two fundamental models: the uniform attachment model and the linear preferential attachment model. We propose an order estimator based on the Jordan centrality measure and define a family of risk measures to quantify the quality of the ordering procedure. Moreover, we establish a minimax lower bound for this problem, and prove that the proposed estimator is nearly optimal. Finally, we numerically demonstrate that the proposed estimator outperforms degree-based and spectral ordering procedures.
Submitted 14 March, 2024;
originally announced March 2024.
-
Computation-information gap in high-dimensional clustering
Authors:
Bertrand Even,
Christophe Giraud,
Nicolas Verzelen
Abstract:
We investigate the existence of a fundamental computation-information gap for the problem of clustering a mixture of isotropic Gaussians in the high-dimensional regime, where the ambient dimension $p$ is larger than the number $n$ of points. The existence of a computation-information gap in a specific Bayesian high-dimensional asymptotic regime has been conjectured by arXiv:1610.02918 based on the replica heuristic from statistical physics. We provide evidence of the existence of such a gap generically in the high-dimensional regime $p \geq n$, by (i) proving a non-asymptotic low-degree polynomial computational barrier for clustering in high dimension, matching the performance of the best known polynomial time algorithms, and by (ii) establishing that the information barrier for clustering is smaller than the computational barrier, when the number $K$ of clusters is large enough. These results are in contrast with the (moderately) low-dimensional regime $n \geq \mathrm{poly}(p, K)$, where there is no computation-information gap for clustering a mixture of isotropic Gaussians. In order to prove our low-degree computational barrier, we develop sophisticated combinatorial arguments to upper-bound the mixed moments of the signal under a Bernoulli Bayesian model.
Submitted 28 February, 2024;
originally announced February 2024.
-
Parameter-free projected gradient descent
Authors:
Evgenii Chzhen,
Christophe Giraud,
Gilles Stoltz
Abstract:
We consider the problem of minimizing a convex function over a closed convex set, with Projected Gradient Descent (PGD). We propose a fully parameter-free version of AdaGrad, which is adaptive to the distance between the initialization and the optimum, and to the sum of the square norm of the subgradients. Our algorithm is able to handle projection steps, does not involve restarts, reweighing along the trajectory or additional gradient evaluations compared to the classical PGD. It also fulfills optimal rates of convergence for cumulative regret up to logarithmic factors. We provide an extension of our approach to stochastic optimization and conduct numerical experiments supporting the developed theory.
Submitted 31 May, 2023;
originally announced May 2023.
-
Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with Application to Fairness
Authors:
Evgenii Chzhen,
Christophe Giraud,
Zhen Li,
Gilles Stoltz
Abstract:
We consider contextual bandit problems with knapsacks [CBwK], a problem where at each round, a scalar reward is obtained and vector-valued costs are suffered. The learner aims to maximize the cumulative rewards while ensuring that the cumulative costs are lower than some predetermined cost constraints. We assume that contexts come from a continuous set, that costs can be signed, and that the expected reward and cost functions, while unknown, may be uniformly estimated -- a typical assumption in the literature. In this setting, total cost constraints had so far to be at least of order $T^{3/4}$, where $T$ is the number of rounds, and were even typically assumed to depend linearly on $T$. We are however motivated to use CBwK to impose a fairness constraint of equalized average costs between groups: the budget associated with the corresponding cost constraints should be as close as possible to the natural deviations, of order $\sqrt{T}$. To that end, we introduce a dual strategy based on projected-gradient-descent updates, that is able to deal with total-cost constraints of the order of $\sqrt{T}$ up to poly-logarithmic terms. This strategy is more direct and simpler than existing strategies in the literature. It relies on a careful, adaptive, tuning of the step size.
Submitted 26 October, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
The price of unfairness in linear bandits with biased feedback
Authors:
Solenne Gaucher,
Alexandra Carpentier,
Christophe Giraud
Abstract:
In this paper, we study the problem of fair sequential decision making with biased linear bandit feedback. At each round, a player selects an action described by a covariate and by a sensitive attribute. The perceived reward is a linear combination of the covariates of the chosen action, but the player only observes a biased evaluation of this reward, depending on the sensitive attribute. To characterize the difficulty of this problem, we design a phased elimination algorithm that corrects the unfair evaluations, and establish upper bounds on its regret. We show that the worst-case regret is smaller than $\mathcal{O}(κ_*^{1/3}\log(T)^{1/3}T^{2/3})$, where $κ_*$ is an explicit geometrical constant characterizing the difficulty of bias estimation. We prove lower bounds on the worst-case regret for some sets of actions showing that this rate is tight up to a possible sub-logarithmic factor. We also derive gap-dependent upper bounds on the regret, and matching lower bounds for some problem instances. Interestingly, these results reveal a transition between a regime where the problem is as difficult as its unbiased counterpart, and a regime where it can be much harder.
Submitted 3 June, 2022; v1 submitted 18 March, 2022;
originally announced March 2022.
-
Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit
Authors:
Karl Hajjar,
Lénaïc Chizat,
Christophe Giraud
Abstract:
To theoretically understand the behavior of trained deep neural networks, it is necessary to study the dynamics induced by gradient methods from a random initialization. However, the nonlinear and compositional structure of these models makes these dynamics difficult to analyze. To overcome these challenges, large-width asymptotics have recently emerged as a fruitful viewpoint and led to practical insights on real-world deep networks. For two-layer neural networks, it has been understood via these asymptotics that the nature of the trained model radically changes depending on the scale of the initial random weights, ranging from a kernel regime (for large initial variance) to a feature learning regime (for small initial variance). For deeper networks more regimes are possible, and in this paper we study in detail a specific choice of "small" initialization corresponding to "mean-field" limits of neural networks, which we call integrable parameterizations (IPs). First, we show that under standard i.i.d. zero-mean initialization, integrable parameterizations of neural networks with more than four layers start at a stationary point in the infinite-width limit and no learning occurs. We then propose various methods to avoid this trivial behavior and analyze in detail the resulting dynamics. In particular, one of these methods consists in using large initial learning rates, and we show that it is equivalent to a modification of the recently proposed maximal update parameterization $μ$P. We confirm our results with numerical experiments on image classification tasks, which additionally show a strong difference in behavior between various choices of activation functions that is not yet captured by theory.
Submitted 20 December, 2021; v1 submitted 29 October, 2021;
originally announced October 2021.
-
Localization in 1D non-parametric latent space models from pairwise affinities
Authors:
Christophe Giraud,
Yann Issartel,
Nicolas Verzelen
Abstract:
We consider the problem of estimating latent positions in a one-dimensional torus from pairwise affinities. The observed affinity between a pair of items is modeled as a noisy observation of a function $f(x^*_{i},x^*_{j})$ of the latent positions $x^*_{i},x^*_{j}$ of the two items on the torus. The affinity function $f$ is unknown, and it is only assumed to fulfill some shape constraints ensuring that $f(x,y)$ is large when the distance between $x$ and $y$ is small, and vice versa. This non-parametric modeling offers good flexibility for fitting data. We introduce an estimation procedure that provably localizes all the latent positions with a maximum error of the order of $\sqrt{\log(n)/n}$, with high probability. This rate is proven to be minimax optimal. A computationally efficient variant of the procedure is also analyzed under some more restrictive assumptions. Our general results can be instantiated to the problem of statistical seriation, leading to new bounds for the maximum error in the ordering.
Submitted 11 August, 2023; v1 submitted 6 August, 2021;
originally announced August 2021.
-
A Unified Approach to Fair Online Learning via Blackwell Approachability
Authors:
Evgenii Chzhen,
Christophe Giraud,
Gilles Stoltz
Abstract:
We provide a setting and a general approach to fair online learning with stochastic sensitive and non-sensitive contexts. The setting is a repeated game between the Player and Nature, where at each stage both pick actions based on the contexts. Inspired by the notion of unawareness, we assume that the Player can only access the non-sensitive context before making a decision, while we discuss both cases of Nature accessing the sensitive contexts and Nature unaware of the sensitive contexts. Adapting Blackwell's approachability theory to handle the case of an unknown contexts' distribution, we provide a general necessary and sufficient condition for learning objectives to be compatible with some fairness constraints. This condition is instantiated on (group-wise) no-regret and (group-wise) calibration objectives, and on demographic parity as an additional constraint. When the objective is not compatible with the constraint, the provided framework makes it possible to characterise the optimal trade-off between the two.
Submitted 7 November, 2021; v1 submitted 23 June, 2021;
originally announced June 2021.
-
Pair-Matching: Links Prediction with Adaptive Queries
Authors:
Christophe Giraud,
Yann Issartel,
Luc Lehéricy,
Matthieu Lerasle
Abstract:
The pair-matching problem appears in many applications where one wants to discover good matches between pairs of entities or individuals. Formally, the set of individuals is represented by the nodes of a graph where the edges, unobserved at first, represent the good matches. The algorithm queries pairs of nodes and observes the presence/absence of edges. Its goal is to discover as many edges as possible with a fixed budget of queries. Pair-matching is a particular instance of the multi-armed bandit problem in which the arms are pairs of individuals and the rewards are edges linking these pairs. This bandit problem is non-standard though, as each arm can only be played once.
Given this last constraint, sublinear regret can be expected only if the graph presents some underlying structure. This paper shows that sublinear regret is achievable in the case where the graph is generated according to a Stochastic Block Model (SBM) with two communities. Optimal regret bounds are computed for this pair-matching problem. They exhibit a phase transition related to the Kesten-Stigum threshold for community detection in SBM. The pair-matching problem is also considered in the case where each node is constrained to be sampled fewer than a given number of times. We show how optimal regret rates depend on this constraint. The paper concludes with a conjecture regarding the optimal regret when the number of communities is larger than 2. Contrary to the two-community case, we argue that a statistical-computational gap would appear in this problem.
Submitted 5 March, 2024; v1 submitted 17 May, 2019;
originally announced May 2019.
-
Partial recovery bounds for clustering with the relaxed $K$means
Authors:
Christophe Giraud,
Nicolas Verzelen
Abstract:
We investigate the clustering performance of the relaxed $K$means in the setting of the sub-Gaussian Mixture Model (sGMM) and the Stochastic Block Model (SBM). After identifying the appropriate signal-to-noise ratio (SNR), we prove that the misclassification error decays exponentially fast with respect to this SNR. These partial recovery bounds for the relaxed $K$means improve upon results currently known in the sGMM setting. In the SBM setting, applying the relaxed $K$means SDP makes it possible to handle general connection probabilities, whereas other SDPs investigated in the literature are restricted to the assortative case (where within-group probabilities are larger than between-group probabilities). Again, this partial recovery bound complements the state-of-the-art results. Altogether, these results put forward the versatility of the relaxed $K$means.
Submitted 19 April, 2019; v1 submitted 19 July, 2018;
originally announced July 2018.
-
Estimation of species relative abundances and habitat preferences using opportunistic data
Authors:
Camille Coron,
Clément Calenge,
Christophe Giraud,
Romain Julliard
Abstract:
We develop a new statistical procedure to monitor, with opportunistic data, relative species abundances and their respective preferences for different habitat types. Following Giraud et al. (2015), we combine the opportunistic data with some standardized data in order to correct the bias inherent to the opportunistic data collection. Our main contributions are (i) to tackle the bias induced by habitat selection behaviors, (ii) to handle data where the habitat type associated with each observation is unknown, (iii) to estimate probabilities of selection of habitat for the species. As an illustration, we estimate common bird species habitat preferences and abundances in the region of Aquitaine (France).
Submitted 26 June, 2017;
originally announced June 2017.
-
PECOK: a convex optimization approach to variable clustering
Authors:
Florentina Bunea,
Christophe Giraud,
Martin Royer,
Nicolas Verzelen
Abstract:
The problem of variable clustering is that of grouping similar components of a $p$-dimensional vector $X=(X_{1},\ldots,X_{p})$, and estimating these groups from $n$ independent copies of $X$. When cluster similarity is defined via $G$-latent models, in which groups of $X$-variables have a common latent generator, and groups are relative to a partition $G$ of the index set $\{1, \ldots, p\}$, the most natural clustering strategy is $K$-means. We explain why this strategy cannot lead to perfect cluster recovery and offer a correction, based on semi-definite programming, that can be viewed as a penalized convex relaxation of $K$-means (PECOK). We introduce a cluster separation measure tailored to $G$-latent models, and derive its minimax lower bound for perfect cluster recovery. The clusters estimated by PECOK are shown to recover $G$ at a near minimax optimal cluster separation rate, a result that holds true even if $K$, the number of clusters, is estimated adaptively from the data. We compare PECOK with appropriate corrections of spectral clustering-type procedures, and show that the former outperforms the latter for perfect cluster recovery of minimally separated clusters.
Submitted 16 June, 2016;
originally announced June 2016.
-
Model Assisted Variable Clustering: Minimax-optimal Recovery and Algorithms
Authors:
Florentina Bunea,
Christophe Giraud,
Xi Luo,
Martin Royer,
Nicolas Verzelen
Abstract:
Model-based clustering defines population level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation. We take this view here and introduce the class of G-block covariance models as a background model for variable clustering. In such models, two variables in a cluster are deemed similar if they have similar associations with all other variables. This can arise, for instance, when groups of variables are noise-corrupted versions of the same latent factor. We quantify the difficulty of clustering data generated from a G-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics. We derive minimax cluster separation thresholds, which are the metric values below which no algorithm can recover the model-defined clusters exactly, and show that they are different for the two metrics. We therefore develop two algorithms, COD and PECOK, tailored to G-block covariance models, and study their minimax-optimality with respect to each metric. Of independent interest is the fact that the analysis of the PECOK algorithm, which is based on a corrected convex relaxation of the popular K-means algorithm, provides the first statistical analysis of such algorithms for variable clustering. Additionally, we contrast our methods with another popular clustering method, spectral clustering, specialized to variable clustering, and show that ensuring exact cluster recovery via this method requires clusters to have a higher separation, relative to the minimax threshold. Extensive simulation studies, as well as our data analyses, confirm the applicability of our approach.
Submitted 12 December, 2018; v1 submitted 8 August, 2015;
originally announced August 2015.
-
Capitalising on Opportunistic Data for Monitoring Species Relative Abundances
Authors:
Christophe Giraud,
Clément Calenge,
Camille Coron,
Romain Julliard
Abstract:
With the internet, a massive amount of information on species abundance can be collected under citizen science programs. However, these data are often difficult to use directly in statistical inference, as their collection is generally opportunistic, and the distribution of the sampling effort is often not known. In this paper, we develop a general statistical framework to combine such "opportunistic data" with data collected using schemes characterized by a known sampling effort. Under some structural assumptions regarding the sampling effort and detectability, our approach makes it possible to estimate the relative abundance of several species in different sites. It can be implemented through a simple generalized linear model. We illustrate the framework with typical bird datasets from the Aquitaine region, south-western France. We show that, under some assumptions, our approach provides estimates that are more precise than the ones obtained from the dataset with a known sampling effort alone. When the opportunistic data are abundant, the gain in precision may be considerable, especially for the rare species. We also show that estimates can be obtained even for species recorded only in the opportunistic scheme. Opportunistic data combined with a relatively small amount of data collected with a known effort may thus provide access to accurate and precise estimates of quantitative changes in relative abundance over space and/or time.
Submitted 26 February, 2015; v1 submitted 9 July, 2014;
originally announced July 2014.
-
Aggregation of predictors for nonstationary sub-linear processes and online adaptive forecasting of time varying autoregressive processes
Authors:
Christophe Giraud,
François Roueff,
Andres Sanchez-Perez
Abstract:
In this work, we study the problem of aggregating a finite number of predictors for nonstationary sub-linear processes. We provide oracle inequalities relying essentially on three ingredients: (1) a uniform bound of the $\ell^1$ norm of the time varying sub-linear coefficients, (2) a Lipschitz assumption on the predictors and (3) moment conditions on the noise appearing in the linear representation. Two kinds of aggregations are considered giving rise to different moment conditions on the noise and more or less sharp oracle inequalities. We apply this approach for deriving an adaptive predictor for locally stationary time varying autoregressive (TVAR) processes. It is obtained by aggregating a finite number of well chosen predictors, each of them enjoying an optimal minimax convergence rate under specific smoothness conditions on the TVAR coefficients. We show that the obtained aggregated predictor achieves a minimax rate while adapting to the unknown smoothness. To prove this result, a lower bound is established for the minimax rate of the prediction risk for the TVAR process. Numerical experiments complete this study. An important feature of this approach is that the aggregated predictor can be computed recursively and is thus applicable in an online prediction context.
Submitted 17 November, 2015; v1 submitted 27 April, 2014;
originally announced April 2014.
-
Discussion: Latent variable graphical model selection via convex optimization
Authors:
Christophe Giraud,
Alexandre Tsybakov
Abstract:
Discussion of "Latent variable graphical model selection via convex optimization" by Venkat Chandrasekaran, Pablo A. Parrilo and Alan S. Willsky [arXiv:1008.1290].
Submitted 5 November, 2012;
originally announced November 2012.
-
High-dimensional regression with unknown variance
Authors:
Christophe Giraud,
Sylvie Huet,
Nicolas Verzelen
Abstract:
We review recent results for high-dimensional sparse linear regression in the practical case of unknown variance. Different sparsity settings are covered, including coordinate-sparsity, group-sparsity and variation-sparsity. The emphasis is put on non-asymptotic analyses and feasible procedures. In addition, a small numerical study compares the practical performance of three schemes for tuning the Lasso estimator and some references are collected for some more general models, including multivariate regression and nonparametric regression.
Submitted 20 February, 2012; v1 submitted 26 September, 2011;
originally announced September 2011.
-
A pseudo-RIP for multivariate regression
Authors:
Christophe Giraud
Abstract:
We give a suitable RI-Property under which recent results for trace regression translate into strong risk bounds for multivariate regression. This pseudo-RIP is compatible with the setting $n < p$.
Submitted 28 June, 2011;
originally announced June 2011.
-
Low rank Multivariate regression
Authors:
Christophe Giraud
Abstract:
We consider the multivariate regression problem when the target regression matrix $A$ is close to a low-rank matrix. Our primary interest is in the practical case where the variance of the noise is unknown. Our main contribution is to propose, in this setting, a criterion to select among a family of low-rank estimators and to prove a non-asymptotic oracle inequality for the resulting estimator. We also investigate the easier case where the variance of the noise is known and point out that the penalties appearing in our criteria are minimal (in some sense). These penalties involve the expected value of the Ky-Fan quasi-norm of some random matrices. These quantities can be evaluated easily in practice, and upper bounds can be derived from recent results in random matrix theory.
Submitted 22 June, 2011; v1 submitted 27 September, 2010;
originally announced September 2010.
-
Estimator selection in the Gaussian setting
Authors:
Yannick Baraud,
Christophe Giraud,
Sylvie Huet
Abstract:
We consider the problem of estimating the mean $f$ of a Gaussian vector $Y$ with independent components of common unknown variance $σ^{2}$. Our estimation procedure is based on estimator selection. More precisely, we start with an arbitrary and possibly infinite collection $\FF$ of estimators of $f$ based on $Y$ and, with the same data $Y$, aim at selecting an estimator among $\FF$ with the smallest Euclidean risk. No assumptions on the estimators are made and their dependence on $Y$ may be unknown. We establish a non-asymptotic risk bound for the selected estimator. As particular cases, our approach handles the problems of aggregation and model selection, as well as those of choosing a window and a kernel for estimating a regression function, or tuning the parameter involved in a penalized criterion. We also derive oracle-type inequalities when $\FF$ consists of linear estimators. For illustration, we carry out two simulation studies. One compares our procedure to cross-validation for choosing a tuning parameter. The other shows how to implement our approach to solve the problem of variable selection in practice.
Submitted 22 June, 2011; v1 submitted 13 July, 2010;
originally announced July 2010.
-
Atomicity Improvement for Elliptic Curve Scalar Multiplication
Authors:
Christophe Giraud,
Vincent Verneuil
Abstract:
In this paper we address the problem of protecting elliptic curve scalar multiplication implementations against side-channel analysis by using the atomicity principle. First, we reexamine classical assumptions made by scalar multiplication designers and point out that some of them are not relevant in the context of embedded devices. We then describe the state of the art of atomic scalar multiplication and propose an atomic pattern improvement method. Compared to the most efficient atomic scalar multiplication published so far, our technique shows an average improvement of up to 10.6%.
Submitted 2 March, 2010; v1 submitted 24 February, 2010;
originally announced February 2010.
-
Graph selection with GGMselect
Authors:
Christophe Giraud,
Sylvie Huet,
Nicolas Verzelen
Abstract:
Applications to the inference of biological networks have raised strong interest in the problem of graph estimation in high-dimensional Gaussian graphical models. To handle this problem, we propose a two-stage procedure which first builds a family of candidate graphs from the data, and then selects one graph among this family according to a dedicated criterion. This estimation procedure is shown to be consistent in a high-dimensional setting, and its risk is controlled by a non-asymptotic oracle-like inequality. The procedure is tested on a real data set concerning gene expression data, and its performance is assessed on the basis of a large numerical study. The procedure is implemented in the R package GGMselect, available on CRAN.
Submitted 15 February, 2012; v1 submitted 3 July, 2009;
originally announced July 2009.
-
Mixing Least-Squares Estimators when the Variance is Unknown
Authors:
Christophe Giraud
Abstract:
We propose a procedure to handle the problem of Gaussian regression when the variance is unknown. We mix least-squares estimators from various models according to a procedure inspired by that of Leung and Barron (2007). We show that in some cases the resulting estimator is a simple shrinkage estimator. We then apply this procedure in various statistical settings such as linear regression or adaptive estimation in Besov spaces. Our results provide non-asymptotic risk bounds for the Euclidean risk of the estimator.
Submitted 2 November, 2007;
originally announced November 2007.
-
Estimation of Gaussian graphs by model selection
Authors:
Christophe Giraud
Abstract:
We investigate the estimation of Gaussian graphs by model selection from a non-asymptotic point of view. We start from an n-sample of a Gaussian law P_C in R^p and focus on the disadvantageous case where n is smaller than p. To estimate the graph of conditional dependences of P_C, we introduce a collection of candidate graphs and then select one of them by minimizing a penalized empirical risk. Our main result assesses the performance of the procedure in a non-asymptotic setting. We pay special attention to the maximal degree D of the graphs that we can handle, which turns out to be roughly n/(2 log p).
Submitted 16 July, 2008; v1 submitted 10 October, 2007;
originally announced October 2007.
-
Gaussian model selection with an unknown variance
Authors:
Yannick Baraud,
Christophe Giraud,
Sylvie Huet
Abstract:
Let $Y$ be a Gaussian vector whose components are independent with a common unknown variance. We consider the problem of estimating the mean $μ$ of $Y$ by model selection. More precisely, we start with a collection $\mathcal{S}=\{S_m,m\in\mathcal{M}\}$ of linear subspaces of $\mathbb{R}^n$ and associate to each of these the least-squares estimator of $μ$ on $S_m$. Then, we use a data-driven penalized criterion in order to select one estimator among these. Our first objective is to analyze the performance of estimators associated with classical criteria such as FPE, AIC, BIC and AMDL. Our second objective is to propose better penalties that are versatile enough to take into account both the complexity of the collection $\mathcal{S}$ and the sample size. We then apply these penalties to solve various statistical problems such as variable selection, change-point detection and signal estimation, among others. Our results are based on a non-asymptotic risk bound with respect to the Euclidean loss for the selected estimator. Some analogous results are also established for the Kullback loss.
Submitted 1 April, 2009; v1 submitted 9 January, 2007;
originally announced January 2007.