-
Graphon Estimation in bipartite graphs with observable edge labels and unobservable node labels
Authors:
Etienne Donier-Meroz,
Arnak S. Dalalyan,
Francis Kramarz,
Philippe Choné,
Xavier D'Haultfoeuille
Abstract:
Many real-world data sets can be presented in the form of a matrix whose entries correspond to the interaction between two entities of different natures (number of times a web user visits a web page, a student's grade in a subject, a patient's rating of a doctor, etc.). We assume in this paper that the mentioned interaction is determined by unobservable latent variables describing each entity. Our objective is to estimate the conditional expectation of the data matrix given the unobservable variables. This is presented as a problem of estimation of a bivariate function referred to as graphon. We study the cases of piecewise constant and Hölder-continuous graphons. We establish finite sample risk bounds for the least squares estimator and the exponentially weighted aggregate. These bounds highlight the dependence of the estimation error on the size of the data set, the maximum intensity of the interactions, and the level of noise. As the analyzed least-squares estimator is intractable, we propose an adaptation of Lloyd's alternating minimization algorithm to compute an approximation of the least-squares estimator. Finally, we present numerical experiments in order to illustrate the empirical performance of the graphon estimator on synthetic data sets.
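To illustrate the general idea of Lloyd-type alternating minimization for a piecewise constant graphon, here is a minimal pure-Python sketch. It assumes K1 row groups and K2 column groups and alternates between computing least-squares block means and reassigning row and column labels. This is not the authors' algorithm: the function names, the random initialization and the convention for empty blocks are all assumptions of this sketch.

```python
import random


def block_means(Y, rows, cols, K1, K2):
    """Least-squares block values: mean of Y over each (row group, column group) block."""
    sums = [[0.0] * K2 for _ in range(K1)]
    counts = [[0] * K2 for _ in range(K1)]
    for i, a in enumerate(rows):
        for j, b in enumerate(cols):
            sums[a][b] += Y[i][j]
            counts[a][b] += 1
    # Empty blocks get the value 0.0, an arbitrary convention of this sketch.
    return [[sums[a][b] / counts[a][b] if counts[a][b] else 0.0 for b in range(K2)]
            for a in range(K1)]


def lloyd_biclustering(Y, K1, K2, n_iter=20, seed=0):
    """Alternate between block means and label updates for a K1 x K2 piecewise constant fit."""
    rng = random.Random(seed)
    n1, n2 = len(Y), len(Y[0])
    rows = [rng.randrange(K1) for _ in range(n1)]  # random initial row labels
    cols = [rng.randrange(K2) for _ in range(n2)]  # random initial column labels
    for _ in range(n_iter):
        Q = block_means(Y, rows, cols, K1, K2)
        # Move each row to the row group whose block profile fits it best.
        rows = [min(range(K1), key=lambda a: sum((Y[i][j] - Q[a][cols[j]]) ** 2
                                                  for j in range(n2)))
                for i in range(n1)]
        Q = block_means(Y, rows, cols, K1, K2)
        # Same update for columns, with the row labels held fixed.
        cols = [min(range(K2), key=lambda b: sum((Y[i][j] - Q[rows[i]][b]) ** 2
                                                  for i in range(n1)))
                for j in range(n2)]
    Q = block_means(Y, rows, cols, K1, K2)
    fitted = [[Q[rows[i]][cols[j]] for j in range(n2)] for i in range(n1)]
    return rows, cols, fitted
```

Each label update and each block-mean update can only decrease the least-squares criterion, but, as with Lloyd's k-means, the result may be a local minimum that depends on the initialization.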
Submitted 4 September, 2023; v1 submitted 7 April, 2023;
originally announced April 2023.
-
Simple proof of the risk bound for denoising by exponential weights for asymmetric noise distributions
Authors:
Arnak S. Dalalyan
Abstract:
In this note, we consider the problem of aggregation of estimators in order to denoise a signal. The main contribution is a short proof of the fact that the exponentially weighted aggregate satisfies a sharp oracle inequality. While this result was already known for a wide class of symmetric noise distributions, the extension to asymmetric distributions presented in this note is new.
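As a rough illustration of the exponentially weighted aggregate (EWA) for denoising, the sketch below assumes a finite dictionary of candidate signals f_1, ..., f_M, a prior pi over them and a temperature beta, and forms the mixture with weights proportional to pi_j exp(-||Y - f_j||^2 / beta). The names `ewa`, `prior` and `beta` are illustrative, and the choice of beta under which the note's oracle inequality holds is not reproduced here.

```python
import math


def ewa(Y, dictionary, beta, prior=None):
    """Exponentially weighted aggregate of the candidate signals in `dictionary`."""
    M = len(dictionary)
    if prior is None:
        prior = [1.0 / M] * M  # uniform prior over the candidates
    # Log-weights: log pi_j - ||Y - f_j||^2 / beta
    logw = [math.log(p) - sum((y - f) ** 2 for y, f in zip(Y, fj)) / beta
            for p, fj in zip(prior, dictionary)]
    m = max(logw)  # subtract the maximum for numerical stability
    w = [math.exp(lw - m) for lw in logw]
    total = sum(w)
    w = [wi / total for wi in w]
    aggregate = [sum(wj * fj[i] for wj, fj in zip(w, dictionary)) for i in range(len(Y))]
    return w, aggregate
```

Small temperatures concentrate the weights on the best-fitting candidate; large temperatures push them back toward the prior.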
Submitted 25 December, 2022;
originally announced December 2022.
-
Nearly minimax robust estimator of the mean vector by iterative spectral dimension reduction
Authors:
Amir-Hossein Bateni,
Arshak Minasyan,
Arnak S. Dalalyan
Abstract:
We study the problem of robust estimation of the mean vector of a sub-Gaussian distribution. We introduce an estimator based on spectral dimension reduction (SDR) and establish a finite sample upper bound on its error that is minimax-optimal up to a logarithmic factor. Furthermore, we prove that the breakdown point of the SDR estimator is equal to $1/2$, the highest possible value of the breakdown point. In addition, the SDR estimator is equivariant by similarity transforms and has low computational complexity. More precisely, in the case of $n$ vectors of dimension $p$, of which at most $\varepsilon n$ are adversarially corrupted, the SDR estimator has a squared error of order $\big(\frac{r_\Sigma}{n} + \varepsilon^2\log(1/\varepsilon)\big)\log p$ and a running time of order $p^3 + n p^2$. Here, $r_\Sigma\le p$ is the effective rank of the covariance matrix of the reference distribution. Another advantage of the SDR estimator is that it does not require knowledge of the contamination rate and does not involve sample splitting. We also investigate extensions of the proposed algorithm and of the obtained results to the case of a (partially) unknown covariance matrix.
Submitted 5 April, 2022;
originally announced April 2022.
-
Risk bounds for aggregated shallow neural networks using Gaussian prior
Authors:
Laura Tinsi,
Arnak S. Dalalyan
Abstract:
Analysing statistical properties of neural networks is a central topic in statistics and machine learning. However, most results in the literature focus on the properties of the neural network minimizing the training error. The goal of this paper is to consider aggregated neural networks using a Gaussian prior. The departure point of our approach is an arbitrary aggregate satisfying the PAC-Bayesian inequality. The main contribution is a precise nonasymptotic assessment of the estimation error appearing in the PAC-Bayes bound. We also review available bounds on the error of approximating a function by a neural network. Combining bounds on estimation and approximation errors, we establish risk bounds that are sharp enough to lead to minimax rates of estimation over Sobolev smoothness classes.
Submitted 2 February, 2022; v1 submitted 21 December, 2021;
originally announced December 2021.
-
Penalized Langevin dynamics with vanishing penalty for smooth and log-concave targets
Authors:
Avetik Karagulyan,
Arnak S. Dalalyan
Abstract:
We study the problem of sampling from a probability distribution on $\mathbb R^p$ defined via a convex and smooth potential function. We consider a continuous-time diffusion-type process, termed Penalized Langevin dynamics (PLD), the drift of which is the negative gradient of the potential plus a linear penalty that vanishes when time goes to infinity. An upper bound on the Wasserstein-2 distance between the distribution of the PLD at time $t$ and the target is established. This upper bound highlights the influence of the speed of decay of the penalty on the accuracy of the approximation. As a consequence, by considering the low-temperature limit, we infer a new nonasymptotic guarantee of convergence of the penalized gradient flow for the optimization problem.
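For intuition only, the following sketch discretizes a penalized Langevin dynamics of the form dX_t = -(grad f(X_t) + alpha_t X_t) dt + sqrt(2T) dW_t with an Euler-Maruyama scheme. The penalty schedule alpha_t = alpha0 / (1 + t), the temperature parameter T and the function name are hypothetical choices made for illustration, not the schedule analyzed in the paper. Setting the temperature to zero gives a discretized penalized gradient flow, which mirrors the low-temperature limit mentioned in the abstract.

```python
import math
import random


def pld_euler(grad, x0, h, n_steps, alpha0=1.0, temperature=1.0, seed=0):
    """Euler-Maruyama sketch of dX = -(grad f(X) + alpha(t) X) dt + sqrt(2 T) dW."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(n_steps):
        t = k * h
        alpha = alpha0 / (1.0 + t)  # hypothetical vanishing penalty schedule
        g = grad(x)
        x = [xi - h * (gi + alpha * xi)
             + math.sqrt(2.0 * temperature * h) * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
    return x
```

With f(x) = (x - 3)^2 / 2 and zero temperature, the iterates track the minimizer 3 / (1 + alpha_t) of the penalized potential, which approaches 3 as the penalty vanishes.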
Submitted 24 June, 2020;
originally announced June 2020.
-
All-In-One Robust Estimator of the Gaussian Mean
Authors:
Arnak S. Dalalyan,
Arshak Minasyan
Abstract:
The goal of this paper is to show that a single robust estimator of the mean of a multivariate Gaussian distribution can enjoy five desirable properties. First, it is computationally tractable in the sense that it can be computed in a time which is at most polynomial in dimension, sample size and the logarithm of the inverse of the contamination rate. Second, it is equivariant by translations, uniform scaling and orthogonal transformations. Third, it has a high breakdown point equal to $0.5$, and a nearly-minimax-rate-breakdown point approximately equal to $0.28$. Fourth, it is minimax rate optimal, up to a logarithmic factor, when data consists of independent observations corrupted by adversarially chosen outliers. Fifth, it is asymptotically efficient when the rate of contamination tends to zero. The estimator is obtained by an iterative reweighting approach. Each sample point is assigned a weight that is iteratively updated by solving a convex optimization problem. We also establish a dimension-free non-asymptotic risk bound for the expected error of the proposed estimator. It is the first result of this kind in the literature and involves only the effective rank of the covariance matrix. Finally, we show that the obtained results can be extended to sub-Gaussian distributions, as well as to the cases of unknown rate of contamination or unknown covariance matrix.
Submitted 4 March, 2021; v1 submitted 4 February, 2020;
originally announced February 2020.
-
Bounding the error of discretized Langevin algorithms for non-strongly log-concave targets
Authors:
Arnak S. Dalalyan,
Avetik Karagulyan,
Lionel Riou-Durand
Abstract:
In this paper, we provide non-asymptotic upper bounds on the error of sampling from a target density using three schemes of discretized Langevin diffusions. The first scheme is the Langevin Monte Carlo (LMC) algorithm, the Euler discretization of the Langevin diffusion. The second and the third schemes are, respectively, the kinetic Langevin Monte Carlo (KLMC) for differentiable potentials and the kinetic Langevin Monte Carlo for twice-differentiable potentials (KLMC2). The main focus is on target densities that are smooth and log-concave on $\mathbb R^p$, but not necessarily strongly log-concave. Bounds on the computational complexity are obtained under two types of smoothness assumptions: the potential has a Lipschitz-continuous gradient, or the potential has a Lipschitz-continuous Hessian matrix. The error of sampling is measured by Wasserstein-$q$ distances. We advocate for the use of a new dimension-adapted scaling in the definition of the computational complexity when Wasserstein-$q$ distances are considered. The obtained results show that the number of iterations required to achieve a scaled error smaller than a prescribed value depends only polynomially on the dimension.
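The LMC update described above, the Euler discretization of dL_t = -grad f(L_t) dt + sqrt(2) dW_t, can be sketched in a few lines of Python. The function name and the list-based interface are illustrative assumptions; the sketch only shows the recursion theta_{k+1} = theta_k - h grad f(theta_k) + sqrt(2h) xi_k with standard Gaussian xi_k.

```python
import math
import random


def lmc(grad, theta0, h, n_iter, seed=0):
    """Langevin Monte Carlo: Euler discretization of dL = -grad f(L) dt + sqrt(2) dW."""
    rng = random.Random(seed)
    theta = list(theta0)
    samples = []
    for _ in range(n_iter):
        g = grad(theta)
        # Gradient step plus Gaussian noise with variance 2h in each coordinate.
        theta = [t - h * gi + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
                 for t, gi in zip(theta, g)]
        samples.append(theta)
    return samples
```

For the standard Gaussian target f(x) = x^2 / 2, the chain is the autoregression x' = (1 - h) x + sqrt(2h) xi, whose stationary variance is 1 / (1 - h/2) rather than 1: a simple example of the discretization bias that such error bounds quantify.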
Submitted 5 December, 2021; v1 submitted 20 June, 2019;
originally announced June 2019.
-
Outlier-robust estimation of a sparse linear model using $\ell_1$-penalized Huber's $M$-estimator
Authors:
Arnak S. Dalalyan,
Philip Thompson
Abstract:
We study the problem of estimating a $p$-dimensional $s$-sparse vector in a linear model with Gaussian design and additive noise. In the case where the labels are contaminated by at most $o$ adversarial outliers, we prove that the $\ell_1$-penalized Huber's $M$-estimator based on $n$ samples attains the optimal rate of convergence $(s/n)^{1/2} + (o/n)$, up to a logarithmic factor. For more general design matrices, our results highlight the importance of two properties: the transfer principle and the incoherence property. These properties with suitable constants are shown to yield the optimal rates, up to log-factors, of robust estimation with adversarial contamination.
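One standard way to compute an $\ell_1$-penalized Huber $M$-estimator is proximal gradient descent (ISTA). The sketch below assumes the objective (1/n) sum_i rho_tau(y_i - x_i^T b) + lam ||b||_1 with Huber threshold tau; the normalization, the solver and the values of lam and tau are choices made for illustration, not the tuning prescribed in the paper.

```python
import math


def huber_lasso(X, y, lam, tau=1.0, n_iter=300):
    """ISTA sketch for min_b (1/n) sum_i rho_tau(y_i - x_i.b) + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    # ||X||_F^2 / n upper-bounds the Lipschitz constant of the smooth part's gradient.
    L = sum(v * v for row in X for v in row) / n
    step = 1.0 / L
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [yi - sum(xij * bj for xij, bj in zip(xi, beta)) for xi, yi in zip(X, y)]
        # Derivative of the Huber loss: residuals clipped to [-tau, tau].
        psi = [max(-tau, min(tau, r)) for r in resid]
        grad = [-sum(psi[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        z = [b - step * g for b, g in zip(beta, grad)]
        # Soft-thresholding: the proximal map of step * lam * ||.||_1.
        beta = [math.copysign(max(abs(zj) - step * lam, 0.0), zj) for zj in z]
    return beta
```

Because the Huber score is bounded by tau, each corrupted label can shift the gradient only by a bounded amount, which is what makes the estimator robust to a few gross outliers.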
Submitted 19 November, 2019; v1 submitted 12 April, 2019;
originally announced April 2019.
-
A nonasymptotic law of iterated logarithm for general M-estimators
Authors:
Victor-Emmanuel Brunel,
Arnak S. Dalalyan,
Nicolas Schreuder
Abstract:
M-estimators are ubiquitous in machine learning and statistical learning theory. They are used both for defining prediction strategies and for evaluating their precision. In this paper, we propose the first non-asymptotic "any-time" deviation bounds for general M-estimators, where "any-time" means that the bound holds with a prescribed probability for every sample size. These bounds are nonasymptotic versions of the law of iterated logarithm. They are established under general assumptions such as Lipschitz continuity of the loss function and (local) curvature of the population risk. These conditions are satisfied for most examples used in machine learning, including those ensuring robustness to outliers and to heavy-tailed distributions. As an example of application, we consider the problem of best arm identification in a parametric stochastic multi-armed bandit setting. We show that the established bound can be converted into a new algorithm, with provably optimal theoretical guarantees. Numerical experiments illustrating the validity of the algorithm are reported.
Submitted 24 May, 2019; v1 submitted 15 March, 2019;
originally announced March 2019.
-
Confidence regions and minimax rates in outlier-robust estimation on the probability simplex
Authors:
Amir-Hossein Bateni,
Arnak S. Dalalyan
Abstract:
We consider the problem of estimating the mean of a distribution supported by the $k$-dimensional probability simplex in the setting where an $\varepsilon$ fraction of observations are subject to adversarial corruption. A simple particular example is the problem of estimating the distribution of a discrete random variable. Assuming that the discrete variable takes $k$ values, the unknown parameter $\boldsymbol\theta$ is a $k$-dimensional vector belonging to the probability simplex. We first describe various settings of contamination and discuss the relation between these settings. We then establish minimax rates when the quality of estimation is measured by the total-variation distance, the Hellinger distance, or the $\mathbb L^2$-distance between two probability measures. We also provide confidence regions for the unknown mean that shrink at the minimax rate. Our analysis reveals that the minimax rates associated with these three distances are all different, but they are all attained by the sample average. Furthermore, we show that the latter is adaptive to the possible sparsity of the unknown vector. Some numerical experiments illustrating our theoretical findings are reported.
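The three losses and the sample average mentioned above are easy to write down. In this sketch, observations of a discrete variable are one-hot vectors in the simplex, and the Hellinger distance uses the convention H(p, q)^2 = (1/2) sum_j (sqrt(p_j) - sqrt(q_j))^2. This normalization is an assumption of the sketch; other conventions differ by a constant factor.

```python
import math


def sample_average(X):
    """Coordinate-wise mean of the observations (each a point of the simplex)."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def tv(p, q):
    """Total-variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p, q):
    """Hellinger distance with the 1/2 normalization, so that it lies in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def l2(p, q):
    """Euclidean distance between two probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

In the discrete-variable example, an adversary who replaces an $\varepsilon$ fraction of the one-hot observations moves the sample average by at most $\varepsilon$ in total variation, which illustrates why the plain average is a natural candidate in this setting.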
Submitted 1 February, 2020; v1 submitted 12 February, 2019;
originally announced February 2019.
-
On sampling from a log-concave density using kinetic Langevin diffusions
Authors:
Arnak S. Dalalyan,
Lionel Riou-Durand
Abstract:
Langevin diffusion processes and their discretizations are often used for sampling from a target density. The most convenient framework for assessing the quality of such a sampling scheme corresponds to smooth and strongly log-concave densities defined on $\mathbb R^p$. The present work focuses on this framework and studies the behavior of Monte Carlo algorithms based on discretizations of the kinetic Langevin diffusion. We first prove the geometric mixing property of the kinetic Langevin diffusion with a mixing rate that is, in the overdamped regime, optimal in terms of its dependence on the condition number. We then use this result for obtaining improved guarantees of sampling using the kinetic Langevin Monte Carlo method, when the quality of sampling is measured by the Wasserstein distance. We also consider the situation where the Hessian of the log-density of the target distribution is Lipschitz-continuous. In this case, we introduce a new discretization of the kinetic Langevin diffusion and prove that this leads to a substantial improvement of the upper bound on the sampling error measured in Wasserstein distance.
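For orientation, the kinetic Langevin diffusion can be written as dX_t = V_t dt and dV_t = -gamma V_t dt - grad f(X_t) dt + sqrt(2 gamma) dW_t, with friction gamma > 0. The sketch below uses a plain semi-implicit Euler step (velocity first, then position). This is a deliberately simple discretization chosen for illustration and is not the KLMC scheme analyzed in the paper; the function name and parametrization are assumptions.

```python
import math
import random


def kinetic_langevin_euler(grad, x0, gamma, h, n_iter, seed=0):
    """Semi-implicit Euler sketch of the kinetic (underdamped) Langevin diffusion."""
    rng = random.Random(seed)
    x, v = list(x0), [0.0] * len(x0)
    samples = []
    for _ in range(n_iter):
        g = grad(x)
        # Velocity: friction, force -grad f(x), and Gaussian noise of variance 2 * gamma * h.
        v = [vi - h * (gamma * vi + gi) + math.sqrt(2.0 * gamma * h) * rng.gauss(0.0, 1.0)
             for vi, gi in zip(v, g)]
        # Position: move along the freshly updated velocity.
        x = [xi + h * vi for xi, vi in zip(x, v)]
        samples.append(x)
    return samples
```

The position marginal of the continuous-time process is the target exp(-f); the discretized chain only approximates it, with an error that shrinks with the step size h.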
Submitted 26 December, 2018; v1 submitted 24 July, 2018;
originally announced July 2018.
-
Exponential weights in multivariate regression and a low-rankness favoring prior
Authors:
Arnak S. Dalalyan
Abstract:
We establish theoretical guarantees for the expected prediction error of the exponentially weighted aggregate in the case of multivariate regression, that is, when the label vector is multidimensional. We consider the regression model with fixed design and bounded noise. The first new feature uncovered by our guarantees is that it is not necessary to require independence of the observations: a symmetry condition on the noise distribution alone suffices to get a sharp risk bound. This result needs the regression vectors to be bounded. A second curious finding concerns the case of unbounded regression vectors but independent noise. It turns out that applying exponential weights to the label vectors perturbed by a uniform noise leads to an estimator satisfying a sharp oracle inequality. The last contribution is the instantiation of the proposed oracle inequalities to problems in which the unknown parameter is a matrix. We propose a low-rankness favoring prior and show that it leads to an estimator that is optimal under weak assumptions.
Submitted 25 June, 2018;
originally announced June 2018.
-
Restricted eigenvalue property for corrupted Gaussian designs
Authors:
Philip Thompson,
Arnak S. Dalalyan
Abstract:
Motivated by the construction of tractable robust estimators via convex relaxations, we present conditions on the sample size which guarantee an augmented Restricted Eigenvalue-type condition for Gaussian designs. Such a notion is suitable for high-dimensional robust inference in a Gaussian linear model and a multivariate Gaussian model when samples are corrupted by outliers either in the response variable or in the design matrix. Our proof technique relies on simultaneous lower and upper bounds on two random bilinear forms with very different behaviors. Such simultaneous bounds are used for balancing the interaction between the parameter vector and the estimated corruption vector as well as for controlling the presence of corruption in the design. Our technique has the advantage of relying neither on known bounds of the extreme singular values of the associated Gaussian ensemble nor on mutual incoherence arguments. A relevant consequence of our analysis, compared to prior work, is that a significantly sharper restricted eigenvalue constant can be obtained under weaker assumptions. In particular, the sparsity of the unknown parameter and the number of outliers are allowed to be completely independent of each other.
Submitted 30 November, 2018; v1 submitted 21 May, 2018;
originally announced May 2018.
-
Minimax estimation of a p-dimensional linear functional in sparse Gaussian models and robust estimation of the mean
Authors:
Olivier Collier,
Arnak S. Dalalyan
Abstract:
We consider two problems of estimation in high-dimensional Gaussian models. The first problem is that of estimating a linear functional of the means of $n$ independent $p$-dimensional Gaussian vectors, under the assumption that most of these means are equal to zero. We show that, up to a logarithmic factor, the minimax rate of estimation in squared Euclidean norm is between $(s^2\wedge n) +sp$ and $(s^2\wedge np)+sp$. Since the estimator that attains the upper bound is computationally demanding, we investigate suitable versions of group thresholding estimators that are efficiently computable even when the dimension and the sample size are very large. An interesting new phenomenon revealed by this investigation is that group thresholding leads to a substantial improvement in the rate as compared to element-wise thresholding. More precisely, the rate of group thresholding is $s^2\sqrt{p}+sp$, while element-wise thresholding has an error of order $s^2p+sp$. To the best of our knowledge, this is the first known setting in which leveraging the group structure leads to a polynomial improvement in the rate.
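To make the contrast between the two kinds of thresholding concrete, the sketch below estimates the sum of the means of vectors X_1, ..., X_n, assuming X_i ~ N(mu_i, I_p). The group version keeps a whole vector when its squared norm exceeds the noise level p by more than lam; the element-wise version keeps individual coordinates with |X_ij| > t. The thresholds lam and t are left as free hypothetical parameters; the paper's tuned thresholds, which yield the rates quoted above, are not reproduced.

```python
def group_threshold_sum(X, lam):
    """Sum of the vectors X_i whose squared norm exceeds the noise level p by more than lam."""
    p = len(X[0])
    out = [0.0] * p
    for x in X:
        # E||X_i||^2 = p when mu_i = 0, so ||X_i||^2 - p measures the signal energy.
        if sum(v * v for v in x) - p > lam:
            out = [o + v for o, v in zip(out, x)]
    return out


def elementwise_threshold_sum(X, t):
    """Coordinate-wise sum keeping only the entries with |X_ij| > t."""
    p = len(X[0])
    return [sum(x[j] for x in X if abs(x[j]) > t) for j in range(p)]
```

The group rule decides once per vector using the pooled energy of all p coordinates, whereas the element-wise rule must detect each coordinate separately; this is the intuition behind the better rate of group thresholding.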
The second problem studied in this work is the estimation of the common $p$-dimensional mean of the inliers among $n$ independent Gaussian vectors. We show that there is a strong analogy between this problem and the first one. Exploiting it, we propose new strategies of robust estimation that are computationally tractable and have better rates of convergence than the other computationally tractable robust (with respect to the presence of the outliers in the data) estimators studied in the literature. However, this tractability comes with a loss of the minimax-rate-optimality in some regimes.
Submitted 8 November, 2018; v1 submitted 14 December, 2017;
originally announced December 2017.
-
User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient
Authors:
Arnak S. Dalalyan,
Avetik G. Karagulyan
Abstract:
In this paper, we study the problem of sampling from a given probability density function that is known to be smooth and strongly log-concave. We analyze several methods of approximate sampling based on discretizations of the (highly overdamped) Langevin diffusion and establish guarantees on their error measured in the Wasserstein-2 distance. Our guarantees improve or extend the state-of-the-art results in three directions. First, we provide an upper bound on the error of the first-order Langevin Monte Carlo (LMC) algorithm with optimized varying step-size. This result has the advantage of being horizon-free (we do not need to know the target precision in advance) and of improving by a logarithmic factor the corresponding result for the constant step-size. Second, we study the case where accurate evaluations of the gradient of the log-density are unavailable, but one can have access to approximations of the aforementioned gradient. In such a situation, we consider both deterministic and stochastic approximations of the gradient and provide an upper bound on the sampling error of the first-order LMC that quantifies the impact of the gradient evaluation inaccuracies. Third, we establish upper bounds for two versions of the second-order LMC, which leverage the Hessian of the log-density. We provide nonasymptotic guarantees on the sampling error of these second-order LMCs. These guarantees reveal that the second-order LMC algorithms improve on the first-order LMC in ill-conditioned settings.
Submitted 23 February, 2024; v1 submitted 29 September, 2017;
originally announced October 2017.
-
Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent
Authors:
Arnak S. Dalalyan
Abstract:
In this paper, we revisit the recently established theoretical guarantees for the convergence of the Langevin Monte Carlo algorithm for sampling from a smooth and (strongly) log-concave density. We improve the existing results when the convergence is measured in the Wasserstein distance and provide further insights into the very tight relations between, on the one hand, the Langevin Monte Carlo for sampling and, on the other hand, the gradient descent for optimization. Finally, we also establish guarantees for the convergence of a version of the Langevin Monte Carlo algorithm that is based on noisy evaluations of the gradient.
Submitted 28 July, 2017; v1 submitted 16 April, 2017;
originally announced April 2017.
-
Optimal Kullback-Leibler Aggregation in Mixture Density Estimation by Maximum Likelihood
Authors:
Arnak S. Dalalyan,
Mehdi Sebbar
Abstract:
We study the maximum likelihood estimator of the density of $n$ independent observations, under the assumption that it is well approximated by a mixture with a large number of components. The main focus is on statistical properties with respect to the Kullback-Leibler loss. We establish risk bounds taking the form of sharp oracle inequalities both in deviation and in expectation. A simple consequence of these bounds is that the maximum likelihood estimator attains the optimal rate $((\log K)/n)^{1/2}$, up to a possible logarithmic correction, in the problem of convex aggregation when the number $K$ of components is larger than $n^{1/2}$. More importantly, under the additional assumption that the Gram matrix of the components satisfies the compatibility condition, the obtained oracle inequalities yield the optimal rate in the sparsity scenario. That is, if the weight vector is (nearly) $D$-sparse, we get the rate $(D\log K)/n$. As a natural complement to our oracle inequalities, we introduce the notion of nearly-$D$-sparse aggregation and establish matching lower bounds for this type of aggregation.
Submitted 18 January, 2017;
originally announced January 2017.
-
On the Exponentially Weighted Aggregate with the Laplace Prior
Authors:
Arnak S. Dalalyan,
Edwin Grappin,
Quentin Paris
Abstract:
In this paper, we study the statistical behaviour of the Exponentially Weighted Aggregate (EWA) in the problem of high-dimensional regression with fixed design. Under the assumption that the underlying regression vector is sparse, it is reasonable to use the Laplace distribution as a prior. The resulting estimator and, specifically, a particular instance of it referred to as the Bayesian lasso, has already been used in the statistical literature because of its computational convenience, even though no thorough mathematical analysis of its statistical properties had been carried out. The present work fills this gap by establishing sharp oracle inequalities for the EWA with the Laplace prior. These inequalities show that if the temperature parameter is small, the EWA with the Laplace prior satisfies the same type of oracle inequality as the lasso estimator does, as long as the quality of estimation is measured by the prediction loss. Extensions of the proposed methodology to the problem of prediction with low-rank matrices are considered.
Submitted 25 November, 2016;
originally announced November 2016.
-
On the prediction loss of the lasso in the partially labeled setting
Authors:
Pierre C. Bellec,
Arnak S. Dalalyan,
Edwin Grappin,
Quentin Paris
Abstract:
In this paper we revisit the risk bounds of the lasso estimator in the context of transductive and semi-supervised learning. In other terms, the setting under consideration is that of regression with random design under partial labeling. The main goal is to obtain user-friendly bounds on the off-sample prediction risk. To this end, the simple setting of bounded response variable and bounded (high-dimensional) covariates is considered. We propose some new adaptations of the lasso to these settings and establish oracle inequalities both in expectation and in deviation. These results provide non-asymptotic upper bounds on the risk that highlight the interplay between the bias due to the mis-specification of the linear model, the bias due to the approximate sparsity and the variance. They also demonstrate that the presence of a large number of unlabeled features may have significant positive impact in the situations where the restricted eigenvalue of the design matrix vanishes or is very small.
Submitted 8 November, 2016; v1 submitted 20 June, 2016;
originally announced June 2016.
-
On estimation of the diagonal elements of a sparse precision matrix
Authors:
Samuel Balmand,
Arnak S. Dalalyan
Abstract:
In this paper, we present several estimators of the diagonal elements of the inverse of the covariance matrix, called the precision matrix, of a sample of iid random vectors. The focus is on high-dimensional vectors having a sparse precision matrix. It is now well understood that when the underlying distribution is Gaussian, the columns of the precision matrix can be estimated independently from one another by solving linear regression problems under sparsity constraints. This approach leads to a computationally efficient strategy for estimating the precision matrix that starts by estimating the regression vectors, then estimates the diagonal entries of the precision matrix and, in a final step, combines these to obtain estimators of the off-diagonal entries. While the step of estimating the regression vectors has been intensively studied over the past decade, the problem of deriving statistically accurate estimators of the diagonal entries has received much less attention. The goal of the present paper is to fill this gap by presenting four estimators of the diagonal entries of the precision matrix, which appear to be the most natural ones, and then performing a comprehensive empirical evaluation of these estimators. The estimators under consideration are the residual variance, the relaxed maximum likelihood, the symmetry-enforced maximum likelihood and the penalized maximum likelihood. We show, both theoretically and empirically, that when the aforementioned regression vectors are estimated without error, the symmetry-enforced maximum likelihood estimator has the smallest estimation error. However, in the more realistic setting where the regression vectors are estimated by a sparsity-favoring, computationally efficient method, the estimators perform comparably, with a slight advantage for the residual variance estimator.
Submitted 25 May, 2016; v1 submitted 18 April, 2015;
originally announced April 2015.
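The residual-variance estimator can be sketched as follows for a zero-mean Gaussian sample, in the idealized case mentioned above where the regression vectors are known exactly. It relies on the standard identity Omega_jj = 1 / Var(X_j | X_{-j}), where the regression coefficients of X_j on X_k are -Omega_kj / Omega_jj. The function name and the column layout of `B` are hypothetical; in practice `B` would be replaced by a sparse regression estimate (for instance the lasso), and the other three estimators are not shown.

```python
import numpy as np


def residual_variance_diag(X, B):
    """Residual-variance estimates of the diagonal of the precision matrix.

    X : (n, p) sample with zero-mean rows.
    B : (p, p) matrix whose column j holds the coefficients of the regression
        of X[:, j] on the other columns (so B[j, j] must be 0).
    Returns 1 / (mean squared residual) for each column.
    """
    residuals = X - X @ B  # column j: X_j - sum_{k != j} B[k, j] * X_k
    return 1.0 / np.mean(residuals ** 2, axis=0)


# Tridiagonal (hence sparse) precision matrix; positive definite since its
# eigenvalues are 2 - 1.6 * cos(k * pi / 6) > 0 for k = 1, ..., 5.
p = 5
Omega = 2.0 * np.eye(p) - 0.8 * (np.eye(p, k=1) + np.eye(p, k=-1))
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Omega), size=200_000)

# Oracle regression vectors: B[k, j] = -Omega[k, j] / Omega[j, j] for k != j.
B = -Omega / np.diag(Omega)[None, :]
np.fill_diagonal(B, 0.0)

print(residual_variance_diag(X, B))  # entries close to diag(Omega) = 2.0
```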
-
Discussion on the paper: Hypotheses testing by convex optimization by Goldenshluger, Juditsky and Nemirovski
Authors:
Arnak S. Dalalyan
Abstract:
We briefly discuss some interesting questions related to the paper "Hypotheses testing by convex optimization" by Goldenshluger, Juditsky and Nemirovski.
Submitted 2 February, 2015;
originally announced February 2015.
-
Theoretical guarantees for approximate sampling from smooth and log-concave densities
Authors:
Arnak S. Dalalyan
Abstract:
Sampling from various kinds of distributions is an issue of paramount importance in statistics since it is often the key ingredient for constructing estimators, test procedures or confidence intervals. In many situations, exact sampling from a given distribution is impossible or computationally expensive and, therefore, one needs to resort to approximate sampling strategies. However, there is no well-developed theory providing meaningful nonasymptotic guarantees for approximate sampling procedures, especially in high-dimensional problems. This paper makes some progress in this direction by considering the problem of sampling from a distribution having a smooth and log-concave density defined on \(\mathbb{R}^p\), for some integer \(p>0\). We establish nonasymptotic bounds for the error of approximating the target distribution by the one obtained by the Langevin Monte Carlo method and its variants. We illustrate the effectiveness of the established guarantees with various experiments. Underlying our analysis are insights from the theory of continuous-time diffusion processes, which may be of interest beyond the framework of log-concave densities considered in the present work.
Submitted 3 December, 2016; v1 submitted 23 December, 2014;
originally announced December 2014.
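A minimal sketch of the (unadjusted) Langevin Monte Carlo iteration theta_{k+1} = theta_k - h * grad f(theta_k) + sqrt(2h) * xi_k, targeting a density proportional to exp(-f). The target below, a standard Gaussian on R^2, as well as the function name and parameter values, are chosen for illustration only.

```python
import numpy as np


def langevin_monte_carlo(grad_f, theta0, step, n_iter, rng):
    """Unadjusted Langevin algorithm for a target density proportional to exp(-f).

    Each iteration applies
        theta <- theta - step * grad_f(theta) + sqrt(2 * step) * xi,
    with xi a standard Gaussian vector, and the whole path is returned.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    path = np.empty((n_iter, theta.size))
    for k in range(n_iter):
        xi = rng.standard_normal(theta.size)
        theta = theta - step * grad_f(theta) + np.sqrt(2.0 * step) * xi
        path[k] = theta
    return path


# Target: standard Gaussian on R^2, i.e. f(theta) = |theta|^2 / 2, grad f(theta) = theta.
step = 0.1
rng = np.random.default_rng(1)
path = langevin_monte_carlo(lambda t: t, theta0=[3.0, -3.0], step=step,
                            n_iter=200_000, rng=rng)
samples = path[1_000:]  # discard burn-in
print(samples.mean(axis=0), samples.var(axis=0))
```

For this Gaussian target each coordinate of the chain is an AR(1) process with stationary variance 1 / (1 - h/2) instead of 1, a simple instance of the discretization error that nonasymptotic bounds of the kind described above are meant to quantify.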
-
On the Prediction Performance of the Lasso
Authors:
Arnak S. Dalalyan,
Mohamed Hebiri,
Johannes Lederer
Abstract:
Although the Lasso has been extensively studied, the relationship between its prediction performance and the correlations of the covariates is not fully understood. In this paper, we give new insights into this relationship in the context of multiple linear regression. We show, in particular, that the incorporation of a simple correlation measure into the tuning parameter can lead to a nearly optimal prediction performance of the Lasso even for highly correlated covariates. However, we also reveal that for moderately correlated covariates, the prediction performance of the Lasso can be mediocre irrespective of the choice of the tuning parameter. We finally show that our results also lead to near-optimal rates for the least-squares estimator with total variation penalty.
Submitted 8 November, 2016; v1 submitted 7 February, 2014;
originally announced February 2014.
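To make the prediction loss concrete, here is a sketch that computes the Lasso by proximal gradient descent (ISTA) and evaluates the in-sample prediction loss ||X(b_hat - b_star)||^2 / n on synthetic data. The tuning parameter lam = sigma * sqrt(2 log p / n) is the standard universal choice, assumed here; it is not the correlation-aware tuning studied in the paper. Function and variable names are hypothetical.

```python
import numpy as np


def lasso_ista(X, y, lam, n_iter=5_000):
    """Lasso by proximal gradient descent (ISTA).

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the smooth part's gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - grad / L
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return b


rng = np.random.default_rng(2)
n, p, s, sigma = 100, 50, 3, 0.5
X = rng.standard_normal((n, p))
b_star = np.zeros(p)
b_star[:s] = 1.0
y = X @ b_star + sigma * rng.standard_normal(n)

lam = sigma * np.sqrt(2.0 * np.log(p) / n)  # universal choice, assumed here
b_hat = lasso_ista(X, y, lam)
pred_loss = np.sum((X @ (b_hat - b_star)) ** 2) / n
print(pred_loss)
```

Replacing the Gaussian design by one with strongly correlated columns is an easy way to explore, empirically, how the prediction loss depends on the correlations of the covariates.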
-
Minimax rates in permutation estimation for feature matching
Authors:
Olivier Collier,
Arnak S. Dalalyan
Abstract:
The problem of matching two sets of features appears in various tasks of computer vision and can often be formalized as a problem of permutation estimation. We address this problem from a statistical point of view and provide a theoretical analysis of the accuracy of several natural estimators. To this end, the minimax rate of separation is investigated and its expression is obtained as a function of the sample size, noise level and dimension. We consider the cases of homoscedastic and heteroscedastic noise and establish, in each case, tight upper bounds on the separation distance of several estimators. These upper bounds are shown to be unimprovable both in the homoscedastic and heteroscedastic settings. Interestingly, these bounds demonstrate that a phase transition occurs when the dimension $d$ of the features is of the order of the logarithm of the number of features $n$. For $d=O(\log n)$, the rate is dimension free and equals $\sigma(\log n)^{1/2}$, where $\sigma$ is the noise level. In contrast, when $d$ is larger than $c\log n$ for some constant $c>0$, the minimax rate increases with $d$ and is of the order $\sigma(d\log n)^{1/4}$. We also discuss the computational aspects of the estimators and provide empirical evidence of their consistency on synthetic data. Finally, we show that our results extend to more general matching criteria.
Submitted 2 February, 2015; v1 submitted 17 October, 2013;
originally announced October 2013.
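One natural estimator in this problem is the least sum of squares (LSS) permutation, which minimizes sum_i ||X_i - Y_{pi(i)}||^2. The sketch below finds it by brute force over all permutations, which is only feasible for a handful of features; for realistic n the same criterion is a linear assignment problem that can be solved in polynomial time, for example with `scipy.optimize.linear_sum_assignment` applied to the cost matrix. The data-generating setup (Gaussian features, noise level `sigma`) is an assumption for illustration.

```python
import itertools

import numpy as np


def lss_permutation(X, Y):
    """Least-sum-of-squares matching: argmin over pi of sum_i ||X[i] - Y[pi[i]]||^2.

    Brute force over all n! permutations, so only usable for very small n.
    """
    n = X.shape[0]
    # cost[i, j] = squared Euclidean distance between X[i] and Y[j]
    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    rows = np.arange(n)
    best = min(itertools.permutations(range(n)),
               key=lambda perm: cost[rows, list(perm)].sum())
    return np.array(best)


rng = np.random.default_rng(3)
n, d, sigma = 6, 4, 0.01
features = rng.standard_normal((n, d))
true_pi = rng.permutation(n)
X = features + sigma * rng.standard_normal((n, d))
Y = np.empty((n, d))
# Y[true_pi[i]] is an independent noisy copy of features[i]
Y[true_pi] = features + sigma * rng.standard_normal((n, d))
print(lss_permutation(X, Y), true_pi)
```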
-
Curve registration by nonparametric goodness-of-fit testing
Authors:
Olivier Collier,
Arnak S. Dalalyan
Abstract:
The problem of curve registration appears in many application areas, ranging from neuroscience to road traffic modeling. In the present work, we propose a nonparametric testing framework in which we develop a generalized likelihood ratio test to perform curve registration. We first prove that, under the null hypothesis, the resulting test statistic is asymptotically distributed as a chi-squared random variable. This result, often referred to as Wilks' phenomenon, provides a natural threshold for the test of a prescribed asymptotic significance level and a natural measure of lack-of-fit in terms of the $p$-value of the $\chi^2$-test. We also prove that the proposed test is consistent, i.e., its power is asymptotically equal to $1$. Finite sample properties of the proposed methodology are demonstrated by numerical simulations. As an application, a new local descriptor for digital images is introduced and an experimental evaluation of its discriminative power is conducted.
Submitted 19 February, 2015; v1 submitted 21 April, 2011;
originally announced April 2011.
-
Mirror averaging with sparsity priors
Authors:
Arnak S. Dalalyan,
Alexandre B. Tsybakov
Abstract:
We consider the problem of aggregating the elements of a possibly infinite dictionary for building a decision procedure that aims at minimizing a given criterion. Along with the dictionary, an independent identically distributed training sample is available, on which the performance of a given procedure can be tested. In a fairly general set-up, we establish an oracle inequality for the Mirror Averaging aggregate with any prior distribution. By choosing an appropriate prior, we apply this oracle inequality in the context of prediction under a sparsity assumption for the problems of regression with random design, density estimation and binary classification.
Submitted 17 August, 2012; v1 submitted 5 March, 2010;
originally announced March 2010.
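A sketch of mirror averaging over a finite dictionary with a uniform prior (the paper allows possibly infinite dictionaries and sparsity priors). At step k the exponential weights are proportional to prior_j * exp(-L_j(k) / beta), where L_j(k) is the cumulative loss of element j on the first k observations, and the aggregate uses the average of these weights over k = 0, ..., n - 1. This indexing convention, the temperature `beta`, and the toy dictionary of constant predictors are assumptions made for illustration.

```python
import numpy as np


def mirror_averaging_weights(losses, beta, prior=None):
    """Mirror-averaging weights over a finite dictionary of M elements.

    losses : (n, M) array, losses[i, j] = loss of element j on observation i.
    beta   : temperature (> 0).
    prior  : (M,) prior weights; uniform if None.
    Returns the average, over k = 0, ..., n - 1, of the exponential weights
    built from the cumulative losses on the first k observations.
    """
    n, M = losses.shape
    prior = np.full(M, 1.0 / M) if prior is None else np.asarray(prior, dtype=float)
    # cum[k, j] = sum of losses of element j over observations 0, ..., k - 1
    cum = np.vstack([np.zeros(M), np.cumsum(losses, axis=0)[:-1]])
    log_w = np.log(prior)[None, :] - cum / beta
    log_w -= log_w.max(axis=1, keepdims=True)  # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)
    return w.mean(axis=0)


# Toy dictionary of constant predictors f_j(x) = c[j] with squared loss;
# the responses are centred at 1, so the middle element is the best one.
rng = np.random.default_rng(4)
c = np.array([0.0, 1.0, 2.0])
y = 1.0 + rng.standard_normal(500)
losses = (y[:, None] - c[None, :]) ** 2
w_hat = mirror_averaging_weights(losses, beta=4.0)
print(w_hat, w_hat @ c)  # averaged weights and aggregated prediction
```

Averaging the weights over the sequence of prefixes, rather than using only the final weights, is what distinguishes mirror averaging from the plain exponentially weighted aggregate.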
-
Penalized maximum likelihood and semiparametric second-order efficiency
Authors:
A. S. Dalalyan,
G. K. Golubev,
A. B. Tsybakov
Abstract:
We consider the problem of estimating a shift parameter of an unknown symmetric function in Gaussian white noise. We introduce a notion of semiparametric second-order efficiency and propose estimators that are semiparametrically efficient and second-order efficient in our model. These estimators are of a penalized maximum likelihood type with an appropriately chosen penalty. We argue that second-order efficiency is crucial in semiparametric problems since only the second-order terms in the asymptotic expansion of the risk account for the behavior of the "nonparametric component" of a semiparametric procedure, and they are not dramatically smaller than the first-order terms.
Submitted 16 May, 2006;
originally announced May 2006.