-
Model-adapted Fourier sampling for generative compressed sensing
Authors:
Aaron Berk,
Simone Brugiapaglia,
Yaniv Plan,
Matthew Scott,
Xia Sheng,
Ozgur Yilmaz
Abstract:
We study generative compressed sensing when the measurement matrix is randomly subsampled from a unitary matrix (with the DFT as an important special case). It was recently shown that $\textit{O}(kdn\| \boldsymbol{\alpha}\|_{\infty}^{2})$ uniformly random Fourier measurements are sufficient to recover signals in the range of a neural network $G:\mathbb{R}^k \to \mathbb{R}^n$ of depth $d$, where each component of the so-called local coherence vector $\boldsymbol{\alpha}$ quantifies the alignment of a corresponding Fourier vector with the range of $G$. We construct a model-adapted sampling strategy with an improved sample complexity of $\textit{O}(kd\| \boldsymbol{\alpha}\|_{2}^{2})$ measurements. This is enabled by: (1) new theoretical recovery guarantees that we develop for nonuniformly random sampling distributions and then (2) optimizing the sampling distribution to minimize the number of measurements needed for these guarantees. This development offers a sample complexity applicable to natural signal classes, which are often almost maximally coherent with low Fourier frequencies. Finally, we consider a surrogate sampling scheme, and validate its performance in recovery experiments using the CelebA dataset.
Submitted 17 November, 2023; v1 submitted 7 October, 2023;
originally announced October 2023.
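The gap between the two bounds quoted in the abstract above, $n\|\boldsymbol{\alpha}\|_{\infty}^{2}$ versus $\|\boldsymbol{\alpha}\|_{2}^{2}$, can be illustrated with a small numerical sketch. The coherence vector `alpha` below is made up for illustration, and sampling rows with probability proportional to $\alpha_i^2$ (with replacement) is an assumption about the form of the adapted distribution, not something the abstract states; the function names are hypothetical.

```python
import random

def adapted_probabilities(alpha):
    """Row-sampling probabilities proportional to alpha_i^2 (assumed form)."""
    total = sum(a * a for a in alpha)
    return [a * a / total for a in alpha]

def complexity_factors(alpha):
    """Coherence-dependent factors of the two bounds:
    n * ||alpha||_inf^2 (uniform sampling) and ||alpha||_2^2 (adapted sampling)."""
    n = len(alpha)
    sup_sq = max(abs(a) for a in alpha) ** 2
    l2_sq = sum(a * a for a in alpha)
    return n * sup_sq, l2_sq

# Hypothetical coherence vector that decays with frequency index.
alpha = [1.0 / (1 + i) for i in range(64)]
probs = adapted_probabilities(alpha)
uniform_factor, adapted_factor = complexity_factors(alpha)

# Draw 16 row indices (with replacement) from the adapted distribution.
rng = random.Random(0)
rows = rng.choices(range(len(alpha)), weights=probs, k=16)
```

Because $\|\boldsymbol{\alpha}\|_{2}^{2} \le n\|\boldsymbol{\alpha}\|_{\infty}^{2}$ always holds, `adapted_factor` can never exceed `uniform_factor`; the gap is large when a few entries of `alpha` dominate.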
-
A coherence parameter characterizing generative compressed sensing with Fourier measurements
Authors:
Aaron Berk,
Simone Brugiapaglia,
Babhru Joshi,
Yaniv Plan,
Matthew Scott,
Özgür Yilmaz
Abstract:
In Bora et al. (2017), a mathematical framework was developed for compressed sensing guarantees in the setting where the measurement matrix is Gaussian and the signal structure is the range of a generative neural network (GNN). The problem of compressed sensing with GNNs has since been extensively analyzed when the measurement matrix and/or network weights follow a subgaussian distribution. We move beyond the subgaussian assumption, to measurement matrices that are derived by sampling uniformly at random rows of a unitary matrix (including subsampled Fourier measurements as a special case). Specifically, we prove the first known restricted isometry guarantee for generative compressed sensing with subsampled isometries and provide recovery bounds, addressing an open problem of Scarlett et al. (2022, p. 10). Recovery efficacy is characterized by the coherence, a new parameter, which measures the interplay between the range of the network and the measurement matrix. Our approach relies on subspace counting arguments and ideas central to high-dimensional probability. Furthermore, we propose a regularization strategy for training GNNs to have favourable coherence with the measurement operator. We provide compelling numerical simulations that support this regularized training strategy: our strategy yields low coherence networks that require fewer measurements for signal recovery. This, together with our theoretical results, supports coherence as a natural quantity for characterizing generative compressed sensing with subsampled isometries.
Submitted 9 November, 2022; v1 submitted 19 July, 2022;
originally announced July 2022.
-
Beyond Independent Measurements: General Compressed Sensing with GNN Application
Authors:
Alireza Naderi,
Yaniv Plan
Abstract:
We consider the problem of recovering a structured signal $\mathbf{x} \in \mathbb{R}^{n}$ from noisy linear observations $\mathbf{y} =\mathbf{M} \mathbf{x}+\mathbf{w}$. The measurement matrix is modeled as $\mathbf{M} = \mathbf{B}\mathbf{A}$, where $\mathbf{B} \in \mathbb{R}^{l \times m}$ is arbitrary and $\mathbf{A} \in \mathbb{R}^{m \times n}$ has independent sub-gaussian rows. By varying $\mathbf{B}$, and the sub-gaussian distribution of $\mathbf{A}$, this gives a family of measurement matrices which may have heavy tails, dependent rows and columns, and singular values with a large dynamic range. When the structure is given as a possibly non-convex cone $T \subset \mathbb{R}^{n}$, an approximate empirical risk minimizer is proven to be a robust estimator if the effective number of measurements is sufficient, even in the presence of a model mismatch. In classical compressed sensing with independent (sub-)gaussian measurements, one asks how many measurements are needed to recover $\mathbf{x}$. In our setting, however, the effective number of measurements depends on the properties of $\mathbf{B}$. We show that the effective rank of $\mathbf{B}$ may be used as a surrogate for the number of measurements, and if this exceeds the squared Gaussian mean width of $(T-T) \cap \mathbb{S}^{n-1}$, then accurate recovery is guaranteed. Furthermore, we examine the special case of generative priors in detail, that is when $\mathbf{x}$ lies close to $T = \mathrm{ran}(G)$ and $G: \mathbb{R}^k \rightarrow \mathbb{R}^n$ is a Generative Neural Network (GNN) with ReLU activation functions. Our work relies on a recent result in random matrix theory by Jeong, Li, Plan, and Yilmaz (arXiv:2001.10631).
Submitted 30 October, 2021;
originally announced November 2021.
-
NBIHT: An Efficient Algorithm for 1-bit Compressed Sensing with Optimal Error Decay Rate
Authors:
Michael P. Friedlander,
Halyun Jeong,
Yaniv Plan,
Ozgur Yilmaz
Abstract:
The Binary Iterative Hard Thresholding (BIHT) algorithm is a popular reconstruction method for one-bit compressed sensing due to its simplicity and fast empirical convergence. There have been several works about BIHT but a theoretical understanding of the corresponding approximation error and convergence rate still remains open.
This paper shows that the normalized version of BIHT (NBIHT) achieves an approximation error rate optimal up to logarithmic factors. More precisely, using $m$ one-bit measurements of an $s$-sparse vector $x$, we prove that the approximation error of NBIHT is of order $O \left(1 \over m \right)$ up to logarithmic factors, which matches the information-theoretic lower bound $\Omega\left(1 \over m \right)$ proved by Jacques, Laska, Boufounos, and Baraniuk in 2013. To our knowledge, this is the first theoretical analysis of a BIHT-type algorithm that explains the optimal rate of error decay empirically observed in the literature. This also makes NBIHT the first provable computationally-efficient one-bit compressed sensing algorithm that breaks the inverse square root error decay rate $O \left(1 \over m^{1/2} \right)$.
Submitted 23 December, 2020;
originally announced December 2020.
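For readers unfamiliar with BIHT-type iterations, the following is a minimal pure-Python sketch of a normalized iterative hard thresholding loop for one-bit measurements $y = \mathrm{sign}(Ax)$. It is written under assumptions not taken from the abstract: a Gaussian measurement matrix, a unit step size, initialization from $A^\top y$, and normalization after every step. The function names are hypothetical, and this is not claimed to match the exact variant analyzed in the paper.

```python
import math
import random

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:s])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]

def normalize(v):
    """Scale v to unit Euclidean norm (returned unchanged if it is zero)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else v

def sign(t):
    return 1.0 if t >= 0 else -1.0

def nbiht(A, y, s, step=1.0, iters=30):
    """Normalized BIHT-style loop: gradient-like step, hard threshold, normalize."""
    m, n = len(A), len(A[0])
    # Assumed initialization: threshold and normalize A^T y.
    x = normalize(hard_threshold(
        [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)], s))
    for _ in range(iters):
        # Sign mismatches between the observed bits and the current estimate.
        residual = [y[i] - sign(sum(A[i][j] * x[j] for j in range(n)))
                    for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = normalize(hard_threshold(
            [x[j] + (step / m) * grad[j] for j in range(n)], s))
    return x

# Small synthetic instance (all sizes are illustrative choices).
rng = random.Random(0)
n, s, m = 30, 3, 400
support = rng.sample(range(n), s)
x_true = [0.0] * n
for j in support:
    x_true[j] = rng.choice([-1.0, 1.0]) / math.sqrt(s)
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [sign(sum(A[i][j] * x_true[j] for j in range(n))) for i in range(m)]

x_hat = nbiht(A, y, s)
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_hat, x_true)))
```

One-bit measurements carry no information about the norm of $x$, which is why both the signal and every iterate are kept on the unit sphere.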
-
On the best choice of Lasso program given data parameters
Authors:
Aaron Berk,
Yaniv Plan,
Özgür Yilmaz
Abstract:
Generalized compressed sensing (GCS) is a paradigm in which a structured high-dimensional signal may be recovered from random, under-determined, and corrupted linear measurements. Generalized Lasso (GL) programs are effective for solving GCS problems due to their proven ability to leverage underlying signal structure. Three popular GL programs are equivalent in a sense and sometimes used interchangeably. Tuned by a governing parameter, each admits an optimal parameter choice. For sparse or low-rank signal structures, this choice yields minimax order-optimal error. While GCS is well-studied, existing theory for GL programs typically concerns this optimally tuned setting. However, the optimal parameter value for a GL program depends on properties of the data, and is typically unknown in practical settings. Performance in empirical problems thus hinges on a program's parameter sensitivity: it is desirable that small variation about the optimal parameter choice begets small variation about the optimal risk. We examine the risk for these three programs and demonstrate that their parameter sensitivity can differ for the same data. We prove a gauge-constrained GL program admits asymptotic cusp-like behaviour of its risk in the limiting low-noise regime. We prove that a residual-constrained Lasso program has asymptotically suboptimal risk for very sparse vectors. These results contrast observations about an unconstrained Lasso program, which is relatively less sensitive to its parameter choice. We support the asymptotic theory with numerical simulations, demonstrating that parameter sensitivity of GL programs is readily observed for even modest dimensional parameters. Importantly, these simulations demonstrate regimes in which a GL program exhibits sensitivity to its parameter choice, though the other two do not. We hope this work aids practitioners in selecting a GL program for their problem.
Submitted 17 October, 2020;
originally announced October 2020.
-
Sub-Gaussian Matrices on Sets: Optimal Tail Dependence and Applications
Authors:
Halyun Jeong,
Xiaowei Li,
Yaniv Plan,
Özgür Yılmaz
Abstract:
Random linear mappings are widely used in modern signal processing, compressed sensing and machine learning. These mappings may be used to embed the data into a significantly lower dimension while at the same time preserving useful information. This is done by approximately preserving the distances between data points, which are assumed to belong to $\mathbb{R}^n$. Thus, the performance of these mappings is usually captured by how close they are to an isometry on the data. Gaussian linear mappings have been the object of much study, while the sub-Gaussian setting is not yet fully understood. In the latter case, the performance depends on the sub-Gaussian norm of the rows. In many applications, e.g., compressed sensing, this norm may be large, or even growing with dimension, and thus it is important to characterize this dependence.
We study when a sub-Gaussian matrix can become a near isometry on a set, show that previous best known dependence on the sub-Gaussian norm was sub-optimal, and present the optimal dependence. Our result not only answers a remaining question posed by Liaw, Mehrabian, Plan and Vershynin in 2017, but also generalizes their work. We also develop a new Bernstein type inequality for sub-exponential random variables, and a new Hanson-Wright inequality for quadratic forms of sub-Gaussian random variables, in both cases improving the bounds in the sub-Gaussian regime under moment constraints. Finally, we illustrate popular applications such as Johnson-Lindenstrauss embeddings, null space property for 0-1 matrices, randomized sketches and blind demodulation, whose theoretical guarantees can be improved by our results (in the sub-Gaussian case).
Submitted 20 January, 2021; v1 submitted 28 January, 2020;
originally announced January 2020.
-
Weighted matrix completion from non-random, non-uniform sampling patterns
Authors:
Simon Foucart,
Deanna Needell,
Reese Pathak,
Yaniv Plan,
Mary Wootters
Abstract:
We study the matrix completion problem when the observation pattern is deterministic and possibly non-uniform. We propose a simple and efficient debiased projection scheme for recovery from noisy observations and analyze the error under a suitable weighted metric. We introduce a simple function of the weight matrix and the sampling pattern that governs the accuracy of the recovered matrix. We derive theoretical guarantees that upper bound the recovery error and nearly matching lower bounds that showcase optimality in several regimes. Our numerical experiments demonstrate the computational efficiency and accuracy of our approach, and show that debiasing is essential when using non-uniform sampling patterns.
Submitted 30 October, 2019;
originally announced October 2019.
-
Tight Analyses for Non-Smooth Stochastic Gradient Descent
Authors:
Nicholas J. A. Harvey,
Christopher Liaw,
Yaniv Plan,
Sikander Randhawa
Abstract:
Consider the problem of minimizing functions that are Lipschitz and strongly convex, but not necessarily differentiable. We prove that after $T$ steps of stochastic gradient descent, the error of the final iterate is $O(\log(T)/T)$ with high probability. We also construct a function from this class for which the error of the final iterate of deterministic gradient descent is $\Omega(\log(T)/T)$. This shows that the upper bound is tight and that, in this setting, the last iterate of stochastic gradient descent has the same general error rate (with high probability) as deterministic gradient descent. This resolves both open questions posed by Shamir (2012).
An intermediate step of our analysis proves that the suffix averaging method achieves error $O(1/T)$ with high probability, which is optimal (for any first-order optimization method). This improves results of Rakhlin (2012) and Hazan and Kale (2014), both of which achieved error $O(1/T)$, but only in expectation, and achieved a high probability error bound of $O(\log \log(T)/T)$, which is suboptimal.
We prove analogous results for functions that are Lipschitz and convex, but not necessarily strongly convex or differentiable. After $T$ steps of stochastic gradient descent, the error of the final iterate is $O(\log(T)/\sqrt{T})$ with high probability, and there exists a function for which the error of the final iterate of deterministic gradient descent is $\Omega(\log(T)/\sqrt{T})$.
Submitted 12 December, 2018;
originally announced December 2018.
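The two estimators compared in the abstract above, the final iterate and the suffix average, can be sketched on a toy problem. The test function $f(x) = |x| + x^2/2$ on $[-1, 1]$, the step size $1/t$, the uniform noise model, and the choice of the last half of the iterates as the suffix are all illustrative assumptions; this is not the lower-bound construction from the paper.

```python
import random

def sgd_final_and_suffix(T, seed=0):
    """Projected stochastic subgradient descent on f(x) = |x| + x^2 / 2 over [-1, 1].

    f is 1-strongly convex, Lipschitz on the interval, and non-differentiable
    at its minimizer x = 0. Returns (final iterate, average of last T/2 iterates).
    """
    rng = random.Random(seed)
    x = 1.0
    iterates = []
    for t in range(1, T + 1):
        # A subgradient of f at x, plus bounded zero-mean noise.
        subgrad = (1.0 if x > 0 else -1.0 if x < 0 else 0.0) + x
        g = subgrad + rng.uniform(-1.0, 1.0)
        # Step size 1 / (strong convexity * t) = 1 / t, then project onto [-1, 1].
        x = min(1.0, max(-1.0, x - g / t))
        iterates.append(x)
    suffix = iterates[T // 2:]
    return iterates[-1], sum(suffix) / len(suffix)

final_iterate, suffix_average = sgd_final_and_suffix(2000)
```

Both outputs should land close to the minimizer at zero; the abstract's point is that their guaranteed error rates differ by a logarithmic factor.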
-
Sensitivity of $\ell_{1}$ minimization to parameter choice
Authors:
Aaron Berk,
Yaniv Plan,
Özgür Yilmaz
Abstract:
The use of generalized LASSO is a common technique for recovery of structured high-dimensional signals. Each generalized LASSO program has a governing parameter whose optimal value depends on properties of the data. At this optimal value, compressed sensing theory explains why LASSO programs recover structured high-dimensional signals with minimax order-optimal error. Unfortunately in practice, the optimal choice is generally unknown and must be estimated. Thus, we investigate stability of each LASSO program with respect to its governing parameter. Our goal is to aid the practitioner in answering the following question: given real data, which LASSO program should be used? We take a step towards answering this by analyzing the case where the measurement matrix is identity (the so-called proximal denoising setup) and we use $\ell_{1}$ regularization. For each LASSO program, we specify settings in which that program is provably unstable with respect to its governing parameter. We support our analysis with detailed numerical simulations. For example, there are settings where a 0.1% underestimate of a LASSO parameter can increase the error significantly; and a 50% underestimate can cause the error to increase by a factor of $10^{9}$.
Submitted 1 April, 2019; v1 submitted 29 October, 2018;
originally announced October 2018.
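In the proximal denoising setup mentioned above (identity measurement matrix, $\ell_{1}$ regularization), the unconstrained program $\min_x \tfrac{1}{2}\|x - y\|_2^2 + \lambda\|x\|_1$ has a closed-form solution given by soft thresholding. The sketch below sweeps $\lambda$ on synthetic data to show how the error depends on the parameter. The dimensions, signal values, noise level, and parameter grid are made-up illustrative choices, and only the unconstrained program is shown, not the constrained variants the paper also analyzes.

```python
import random

def soft_threshold(y, lam):
    """Closed-form minimizer of 0.5 * ||x - y||_2^2 + lam * ||x||_1."""
    return [v - lam if v > lam else v + lam if v < -lam else 0.0 for v in y]

def squared_error(x_hat, x_true):
    return sum((a - b) ** 2 for a, b in zip(x_hat, x_true))

# Synthetic sparse signal observed in Gaussian noise (illustrative values).
rng = random.Random(0)
n, s, noise_std = 200, 5, 0.1
x_true = [5.0] * s + [0.0] * (n - s)
y = [v + rng.gauss(0.0, noise_std) for v in x_true]

# Squared error of the denoised estimate for several parameter values.
errors = {lam: squared_error(soft_threshold(y, lam), x_true)
          for lam in (0.0, 0.1, 0.3, 1.0)}
```

Too small a $\lambda$ leaves the noise on the off-support entries untouched, while too large a $\lambda$ biases the on-support entries, so the error is smallest at an intermediate value.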
-
Learning tensors from partial binary measurements
Authors:
Navid Ghadermarzy,
Yaniv Plan,
Ozgur Yilmaz
Abstract:
In this paper we generalize the 1-bit matrix completion problem to higher order tensors. We prove that when $r=O(1)$ a bounded rank-$r$, order-$d$ tensor $T$ in $\mathbb{R}^{N} \times \mathbb{R}^{N} \times \cdots \times \mathbb{R}^{N}$ can be estimated efficiently by only $m=O(Nd)$ binary measurements by regularizing its max-qnorm and M-norm as surrogates for its rank. We prove that similar to the matrix case, i.e., when $d=2$, the sample complexity of recovering a low-rank tensor from 1-bit measurements of a subset of its entries is the same as recovering it from unquantized measurements. Moreover, we show the advantage of using 1-bit tensor completion over matricization both theoretically and numerically. Specifically, we show how the 1-bit measurement model can be used for context-aware recommender systems.
Submitted 30 March, 2018;
originally announced April 2018.
-
Near-optimal sample complexity for convex tensor completion
Authors:
Navid Ghadermarzy,
Yaniv Plan,
Özgür Yılmaz
Abstract:
We analyze low rank tensor completion (TC) using noisy measurements of a subset of the tensor. Assuming a rank-$r$, order-$d$, $N \times N \times \cdots \times N$ tensor where $r=O(1)$, the best sampling complexity that was achieved is $O(N^{\frac{d}{2}})$, which is obtained by solving a tensor nuclear-norm minimization problem. However, this bound is significantly larger than the number of free variables in a low rank tensor which is $O(dN)$. In this paper, we show that by using an atomic-norm whose atoms are rank-$1$ sign tensors, one can obtain a sample complexity of $O(dN)$. Moreover, we generalize the matrix max-norm definition to tensors, which results in a max-quasi-norm (max-qnorm) whose unit ball has small Rademacher complexity. We prove that solving a constrained least squares estimation using either the convex atomic-norm or the nonconvex max-qnorm results in optimal sample complexity for the problem of low-rank tensor completion. Furthermore, we show that these bounds are nearly minimax rate-optimal. We also provide promising numerical results for max-qnorm constrained tensor completion, showing improved recovery results compared to matricization and alternating least squares.
Submitted 14 November, 2017;
originally announced November 2017.
-
Near-optimal Sample Complexity Bounds for Robust Learning of Gaussians Mixtures via Compression Schemes
Authors:
Hassan Ashtiani,
Shai Ben-David,
Nick Harvey,
Christopher Liaw,
Abbas Mehrabian,
Yaniv Plan
Abstract:
We prove that $\tilde{\Theta}(k d^2 / \varepsilon^2)$ samples are necessary and sufficient for learning a mixture of $k$ Gaussians in $\mathbb{R}^d$, up to error $\varepsilon$ in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that $\tilde{O}(k d / \varepsilon^2)$ samples suffice, matching a known lower bound. Moreover, these results hold in the agnostic-learning/robust-estimation setting as well, where the target distribution is only approximately a mixture of Gaussians.
The upper bound is shown using a novel technique for distribution learning based on a notion of `compression.' Any class of distributions that allows such a compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in $\mathbb{R}^d$ admits a small-sized compression scheme.
Submitted 21 July, 2020; v1 submitted 14 October, 2017;
originally announced October 2017.
-
One-Bit Compressive Sensing of Dictionary-Sparse Signals
Authors:
Rich Baraniuk,
Simon Foucart,
Deanna Needell,
Yaniv Plan,
Mary Wootters
Abstract:
One-bit compressive sensing has extended the scope of sparse recovery by showing that sparse signals can be accurately reconstructed even when their linear measurements are subject to the extreme quantization scenario of binary samples---only the sign of each linear measurement is maintained. Existing results in one-bit compressive sensing rely on the assumption that the signals of interest are sparse in some fixed orthonormal basis. However, in most practical applications, signals are sparse with respect to an overcomplete dictionary, rather than a basis. There has already been a surge of activity to obtain recovery guarantees under such a generalized sparsity model in the classical compressive sensing setting. Here, we extend the one-bit framework to this important model, providing a unified theory of one-bit compressive sensing under dictionary sparsity. Specifically, we analyze several different algorithms---based on convex programming and on hard thresholding---and show that, under natural assumptions on the sensing matrix (satisfied by Gaussian matrices), these algorithms can efficiently recover analysis-dictionary-sparse signals in the one-bit model.
Submitted 23 June, 2016;
originally announced June 2016.
-
Optimizing quantization for Lasso recovery
Authors:
Xiaoyi Gu,
Shenyinying Tu,
Hao-Jun Michael Shi,
Mindy Case,
Deanna Needell,
Yaniv Plan
Abstract:
This letter is focused on quantized Compressed Sensing, assuming that Lasso is used for signal estimation. Leveraging recent work, we provide a framework to optimize the quantization function and show that the recovered signal converges to the actual signal at a quadratic rate as a function of the quantization level. We show that when the number of observations is high, this method of quantization gives a significantly better recovery rate than standard Lloyd-Max quantization. We support our theoretical analysis with numerical simulations.
Submitted 9 June, 2016;
originally announced June 2016.
-
Average-case Hardness of RIP Certification
Authors:
Tengyao Wang,
Quentin Berthet,
Yaniv Plan
Abstract:
The restricted isometry property (RIP) for design matrices gives guarantees for optimal recovery in sparse linear models. It is of high interest in compressed sensing and statistical learning. This property is particularly important for computationally efficient recovery methods. As a consequence, even though it is in general NP-hard to check that RIP holds, there have been substantial efforts to find tractable proxies for it. These would allow the construction of RIP matrices and the polynomial-time verification of RIP given an arbitrary matrix. We consider the framework of average-case certifiers, that never wrongly declare that a matrix is RIP, while being often correct for random instances. While there are such functions which are tractable in a suboptimal parameter regime, we show that this is a computationally hard task in any better regime. Our results are based on a new, weaker assumption on the problem of detecting dense subgraphs.
Submitted 31 May, 2016;
originally announced May 2016.
-
A simple tool for bounding the deviation of random matrices on geometric sets
Authors:
Christopher Liaw,
Abbas Mehrabian,
Yaniv Plan,
Roman Vershynin
Abstract:
Let $A$ be an isotropic, sub-gaussian $m \times n$ matrix. We prove that the process $Z_x := \|Ax\|_2 - \sqrt m \|x\|_2$ has sub-gaussian increments. Using this, we show that for any bounded set $T \subseteq \mathbb{R}^n$, the deviation of $\|Ax\|_2$ around its mean is uniformly bounded by the Gaussian complexity of $T$. We also prove a local version of this theorem, which allows for unbounded sets. These theorems have various applications, some of which are reviewed in this paper. In particular, we give a new result regarding model selection in the constrained linear model.
Submitted 7 June, 2016; v1 submitted 2 March, 2016;
originally announced March 2016.
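The process $Z_x = \|Ax\|_2 - \sqrt{m}\,\|x\|_2$ from the abstract above can be simulated directly. The sketch below uses a standard Gaussian matrix, which is one special case of an isotropic sub-gaussian matrix, and evaluates $Z_x$ at independent random unit vectors with a fresh matrix each time. It therefore only illustrates pointwise concentration, not the uniform bound over a set $T$; the sizes and the function name are illustrative assumptions.

```python
import math
import random

def deviation(m, n, rng):
    """Z_x = ||A x||_2 - sqrt(m) * ||x||_2 for Gaussian A and a random unit x."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm_x = math.sqrt(sum(c * c for c in x))
    x = [c / norm_x for c in x]
    # Accumulate ||A x||_2^2 one row at a time, without storing A.
    ax_sq = 0.0
    for _ in range(m):
        row = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ax_sq += sum(r * c for r, c in zip(row, x)) ** 2
    return math.sqrt(ax_sq) - math.sqrt(m)  # ||x||_2 = 1 after normalization

rng = random.Random(0)
deviations = [deviation(400, 20, rng) for _ in range(20)]
max_abs = max(abs(z) for z in deviations)
```

Although $\|Ax\|_2$ itself is of order $\sqrt{m} = 20$ here, the deviations stay of constant order, which is the behaviour the sub-gaussian increment property describes.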
-
Random mappings designed for commercial search engines
Authors:
Roger Donaldson,
Arijit Gupta,
Yaniv Plan,
Thomas Reimer
Abstract:
We give a practical random mapping that takes any set of documents represented as vectors in Euclidean space and then maps them to a sparse subset of the Hamming cube while retaining ordering of inter-vector inner products. Once represented in the sparse space, it is natural to index documents using commercial text-based search engines which are specialized to take advantage of this sparse and discrete structure for large-scale document retrieval. We give a theoretical analysis of the mapping scheme, characterizing exact asymptotic behavior and also giving non-asymptotic bounds which we verify through numerical simulations. We balance the theoretical treatment with several practical considerations; these allow substantial speed up of the method. We further illustrate the use of this method on search over two real data sets: a corpus of images represented by their color histograms, and a corpus of daily stock market index values.
Submitted 21 July, 2015;
originally announced July 2015.
-
The generalized Lasso with non-linear observations
Authors:
Yaniv Plan,
Roman Vershynin
Abstract:
We study the problem of signal estimation from non-linear observations when the signal belongs to a low-dimensional set buried in a high-dimensional space. A rough heuristic often used in practice postulates that non-linear observations may be treated as noisy linear observations, and thus the signal may be estimated using the generalized Lasso. This is appealing because of the abundance of efficient, specialized solvers for this program. Just as noise may be diminished by projecting onto the lower dimensional space, the error from modeling non-linear observations with linear observations will be greatly reduced when using the signal structure in the reconstruction. We allow general signal structure, only assuming that the signal belongs to some set $K$ in $\mathbb{R}^n$. We consider the single-index model of non-linearity. Our theory allows the non-linearity to be discontinuous, not one-to-one and even unknown. We assume a random Gaussian model for the measurement matrix, but allow the rows to have an unknown covariance matrix. As special cases of our results, we recover near-optimal theory for noisy linear observations, and also give the first theoretical accuracy guarantee for 1-bit compressed sensing with unknown covariance matrix of the measurement vectors.
Submitted 16 November, 2015; v1 submitted 13 February, 2015;
originally announced February 2015.
-
On the Effective Measure of Dimension in the Analysis Cosparse Model
Authors:
Raja Giryes,
Yaniv Plan,
Roman Vershynin
Abstract:
Many applications have benefited remarkably from low-dimensional models in the recent decade. The fact that many signals, though high dimensional, are intrinsically low dimensional has given the possibility to recover them stably from a relatively small number of their measurements. For example, in compressed sensing with the standard (synthesis) sparsity prior and in matrix completion, the number of measurements needed is proportional (up to a logarithmic factor) to the signal's manifold dimension.
Recently, a new natural low-dimensional signal model has been proposed: the cosparse analysis prior. In the noiseless case, it is possible to recover signals from this model, using a combinatorial search, from a number of measurements proportional to the signal's manifold dimension. However, if we ask for stability to noise or an efficient (polynomial complexity) solver, all the existing results demand a number of measurements which is far removed from the manifold dimension, sometimes far greater. Thus, it is natural to ask whether this gap is a deficiency of the theory and the solvers, or if there exists a real barrier in recovering the cosparse signals by relying only on their manifold dimension. Is there an algorithm which, in the presence of noise, can accurately recover a cosparse signal from a number of measurements proportional to the manifold dimension? In this work, we prove that there is no such algorithm. Further, we show through numerical simulations that even in the noiseless case convex relaxations fail when the number of measurements is comparable to the manifold dimension. This gives a practical counter-example to the growing literature on compressed acquisition of signals based on manifold dimension.
Submitted 27 July, 2015; v1 submitted 3 October, 2014;
originally announced October 2014.
-
Exponential decay of reconstruction error from binary measurements of sparse signals
Authors:
Richard Baraniuk,
Simon Foucart,
Deanna Needell,
Yaniv Plan,
Mary Wootters
Abstract:
Binary measurements arise naturally in a variety of statistical and engineering applications. They may be inherent to the problem---e.g., in determining the relationship between genetics and the presence or absence of a disease---or they may be a result of extreme quantization. In one-bit compressed sensing it has recently been shown that the number of one-bit measurements required for signal estimation mirrors that of unquantized compressed sensing. Indeed, $s$-sparse signals in $\mathbb{R}^n$ can be estimated (up to normalization) from $Ω(s \log (n/s))$ one-bit measurements. Nevertheless, controlling the precise accuracy of the error estimate remains an open challenge. In this paper, we focus on optimizing the decay of the error as a function of the oversampling factor $λ:= m/(s \log(n/s))$, where $m$ is the number of measurements. It is known that the error in reconstructing sparse signals from standard one-bit measurements is bounded below by $Ω(λ^{-1})$. Without adjusting the measurement procedure, reducing this polynomial error decay rate is impossible. However, we show that an adaptive choice of the thresholds used for quantization may lower the error rate to $e^{-Ω(λ)}$. This improves upon guarantees for other methods of adaptive thresholding as proposed in Sigma-Delta quantization. We develop a general recursive strategy to achieve this exponential decay and two specific polynomial-time algorithms which fall into this framework, one based on convex programming and one on hard thresholding. This work is inspired by the one-bit compressed sensing model, in which the engineer controls the measurement procedure. Nevertheless, the principle is extendable to signal reconstruction problems in a variety of binary statistical models as well as statistical estimation problems like logistic regression.
Submitted 30 July, 2014;
originally announced July 2014.
-
High-dimensional estimation with geometric constraints
Authors:
Yaniv Plan,
Roman Vershynin,
Elena Yudovina
Abstract:
Consider measuring an n-dimensional vector x through the inner product with several measurement vectors, a_1, a_2, ..., a_m. It is common in both signal processing and statistics to assume the linear response model y_i = <a_i, x> + e_i, where e_i is a noise term. However, in practice the precise relationship between the signal x and the observations y_i may not follow the linear model, and in some cases it may not even be known. To address this challenge, in this paper we propose a general model where it is only assumed that each observation y_i may depend on a_i only through <a_i, x>. We do not assume that the dependence is known. This is a form of the semiparametric single index model, and it includes the linear model as well as many forms of the generalized linear model as special cases. We further assume that the signal x has some structure, and we formulate this as a general assumption that x belongs to some known (but arbitrary) feasible set K. We carefully detail the benefit of using the signal structure to improve estimation. The theory is based on the mean width of K, a geometric parameter which can be used to understand its effective dimension in estimation problems. We determine a simple, efficient two-step procedure for estimating the signal based on this model -- a linear estimation followed by metric projection onto K. We give general conditions under which the estimator is minimax optimal up to a constant. This leads to the intriguing conclusion that in the high noise regime, an unknown non-linearity in the observations does not significantly reduce one's ability to determine the signal, even when the non-linearity may be non-invertible. Our results may be specialized to understand the effect of non-linearities in compressed sensing.
Submitted 18 May, 2016; v1 submitted 14 April, 2014;
originally announced April 2014.
-
1-Bit Matrix Completion
Authors:
Mark A. Davenport,
Yaniv Plan,
Ewout van den Berg,
Mary Wootters
Abstract:
In this paper we develop a theory of matrix completion for the extreme case of noisy 1-bit observations. Instead of observing a subset of the real-valued entries of a matrix M, we obtain a small number of binary (1-bit) measurements generated according to a probability distribution determined by the real-valued entries of M. The central question we ask is whether or not it is possible to obtain an accurate estimate of M from this data. In general this would seem impossible, but we show that the maximum likelihood estimate under a suitable constraint returns an accurate estimate of M when ||M||_{\infty} <= α, and rank(M) <= r. If the log-likelihood is a concave function (e.g., the logistic or probit observation models), then we can obtain this maximum likelihood estimate by optimizing a convex program. In addition, we also show that if instead of recovering M we simply wish to obtain an estimate of the distribution generating the 1-bit measurements, then we can eliminate the requirement that ||M||_{\infty} <= α. For both cases, we provide lower bounds showing that these estimates are near-optimal. We conclude with a suite of experiments that both verify the implications of our theorems as well as illustrate some of the practical applications of 1-bit matrix completion. In particular, we compare our program to standard matrix completion methods on movie rating data in which users submit ratings from 1 to 5. In order to use our program, we quantize this data to a single bit, but we allow the standard matrix completion program to have access to the original ratings (from 1 to 5). Surprisingly, the approach based on binary data performs significantly better.
Submitted 1 July, 2014; v1 submitted 17 September, 2012;
originally announced September 2012.
-
One-bit compressed sensing with non-Gaussian measurements
Authors:
Albert Ai,
Alex Lapanowski,
Yaniv Plan,
Roman Vershynin
Abstract:
In one-bit compressed sensing, previous results state that sparse signals may be robustly recovered when the measurements are taken using Gaussian random vectors. In contrast to standard compressed sensing, these results are not extendable to natural non-Gaussian distributions without further assumptions, as can be demonstrated by simple counter-examples. We show that approximately sparse signals that are not extremely sparse can be accurately reconstructed from single-bit measurements sampled according to a sub-gaussian distribution, and the reconstruction comes as the solution to a convex program.
Submitted 8 April, 2013; v1 submitted 30 August, 2012;
originally announced August 2012.
-
Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach
Authors:
Yaniv Plan,
Roman Vershynin
Abstract:
This paper develops theoretical results regarding noisy 1-bit compressed sensing and sparse binomial regression. We show that a single convex program gives an accurate estimate of the signal, or coefficient vector, for both of these models. We demonstrate that an s-sparse signal in R^n can be accurately estimated from m = O(s log(n/s)) single-bit measurements using a simple convex program. This remains true even if each measurement bit is flipped with probability nearly 1/2. Worst-case (adversarial) noise can also be accounted for, and uniform results that hold for all sparse inputs are derived as well. In the terminology of sparse logistic regression, we show that O(s log(n/s)) Bernoulli trials are sufficient to estimate a coefficient vector in R^n which is approximately s-sparse. Moreover, the same convex program works for virtually all generalized linear models, in which the link function may be unknown. To our knowledge, these are the first results that tie together the theory of sparse logistic regression to 1-bit compressed sensing. Our results apply to general signal structures aside from sparsity; one only needs to know the size of the set K where signals reside. The size is given by the mean width of K, a computable quantity whose square serves as a robust extension of the dimension.
Submitted 19 July, 2012; v1 submitted 6 February, 2012;
originally announced February 2012.
-
Dimension reduction by random hyperplane tessellations
Authors:
Yaniv Plan,
Roman Vershynin
Abstract:
Given a subset K of the unit Euclidean sphere, we estimate the minimal number m = m(K) of hyperplanes that generate a uniform tessellation of K, in the sense that the fraction of the hyperplanes separating any pair x, y in K is nearly proportional to the Euclidean distance between x and y. Random hyperplanes prove to be almost ideal for this problem; they achieve the almost optimal bound m = O(w(K)^2) where w(K) is the Gaussian mean width of K. Using the map that sends x in K to the sign vector with respect to the hyperplanes, we conclude that every bounded subset K of R^n embeds into the Hamming cube {-1, 1}^m with a small distortion in the Gromov-Hausdorff metric. Since for many sets K one has m = m(K) << n, this yields a new discrete mechanism of dimension reduction for sets in Euclidean spaces.
Submitted 26 September, 2013; v1 submitted 18 November, 2011;
originally announced November 2011.
-
One-bit compressed sensing by linear programming
Authors:
Yaniv Plan,
Roman Vershynin
Abstract:
We give the first computationally tractable and almost optimal solution to the problem of one-bit compressed sensing, showing how to accurately recover an s-sparse vector x in R^n from the signs of O(s log^2(n/s)) random linear measurements of x. The recovery is achieved by a simple linear program. This result extends to approximately sparse vectors x. Our result is universal in the sense that with high probability, one measurement scheme will successfully recover all sparse vectors simultaneously. The argument is based on solving an equivalent geometric problem on random hyperplane tessellations.
Submitted 16 March, 2012; v1 submitted 20 September, 2011;
originally announced September 2011.
-
Unicity conditions for low-rank matrix recovery
Authors:
Yonina C. Eldar,
Deanna Needell,
Yaniv Plan
Abstract:
Low-rank matrix recovery addresses the problem of recovering an unknown low-rank matrix from few linear measurements. Nuclear-norm minimization is a tractable approach with a recent surge of strong theoretical backing. Analogous to the theory of compressed sensing, these results have required random measurements. For example, m >= Cnr Gaussian measurements are sufficient to recover any rank-r n x n matrix with high probability. In this paper we address the theoretical question of how many measurements are needed via any method whatsoever, tractable or not. We show that for a family of random measurement ensembles, m >= 4nr - 4r^2 measurements are sufficient to guarantee that no rank-2r matrix lies in the null space of the measurement operator with probability one. This is a necessary and sufficient condition to ensure uniform recovery of all rank-r matrices by rank minimization. Furthermore, this value of m precisely matches the dimension of the manifold of all rank-2r matrices. We also prove that for a fixed rank-r matrix, m >= 2nr - r^2 + 1 random measurements are enough to guarantee recovery using rank minimization. These results give a benchmark to which we may compare the efficacy of nuclear-norm minimization.
Submitted 28 March, 2011;
originally announced March 2011.
-
A probabilistic and RIPless theory of compressed sensing
Authors:
Emmanuel J. Candes,
Yaniv Plan
Abstract:
This paper introduces a simple and very general theory of compressive sensing. In this theory, the sensing mechanism simply selects sensing vectors independently at random from a probability distribution F; it includes all models - e.g. Gaussian, frequency measurements - discussed in the literature, but also provides a framework for new measurement strategies as well. We prove that if the probability distribution F obeys a simple incoherence property and an isotropy property, one can faithfully recover approximately sparse signals from a minimal number of noisy measurements. The novelty is that our recovery results do not require the restricted isometry property (RIP) - they make use of a much weaker notion - or a random model for the signal. As an example, the paper shows that a signal with s nonzero entries can be faithfully recovered from about s log n Fourier coefficients that are contaminated with noise.
Submitted 19 November, 2010; v1 submitted 16 November, 2010;
originally announced November 2010.
-
Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism
Authors:
Ery Arias-Castro,
Emmanuel J. Candès,
Yaniv Plan
Abstract:
Testing for the significance of a subset of regression coefficients in a linear model, a staple of statistical analysis, goes back at least to the work of Fisher who introduced the analysis of variance (ANOVA). We study this problem under the assumption that the coefficient vector is sparse, a common situation in modern high-dimensional settings. Suppose we have $p$ covariates and that under the alternative, the response only depends upon the order of $p^{1-α}$ of those, $0\leα\le1$. Under moderate sparsity levels, that is, $0\leα\le1/2$, we show that ANOVA is essentially optimal under some conditions on the design. This is no longer the case under strong sparsity constraints, that is, $α>1/2$. In such settings, a multiple comparison procedure is often preferred and we establish its optimality when $α\geq3/4$. However, these two very popular methods are suboptimal, and sometimes powerless, under moderately strong sparsity where $1/2<α<3/4$. We suggest a method based on the higher criticism that is powerful in the whole range $α>1/2$. This optimality property is true for a variety of designs, including the classical (balanced) multi-way designs and more modern "$p>n$" designs arising in genetics and signal processing. In addition to the standard fixed effects model, we establish similar results for a random effects model where the nonzero coefficients of the regression vector are normally distributed.
Submitted 23 February, 2012; v1 submitted 8 July, 2010;
originally announced July 2010.
-
Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements
Authors:
Emmanuel J. Candes,
Yaniv Plan
Abstract:
This paper presents several novel theoretical results regarding the recovery of a low-rank matrix from just a few measurements consisting of linear combinations of the matrix entries. We show that properly constrained nuclear-norm minimization stably recovers a low-rank matrix from a constant number of noisy measurements per degree of freedom; this seems to be the first result of this nature. Further, the recovery error from noisy data is within a constant of three targets: 1) the minimax risk, 2) an oracle error that would be available if the column space of the matrix were known, and 3) a more adaptive oracle error which would be available with the knowledge of the column space corresponding to the part of the matrix that stands above the noise. Lastly, the error bounds regarding low-rank matrices are extended to provide an error bound when the matrix has full rank with decaying singular values. The analysis in this paper is based on the restricted isometry property (RIP) introduced in [6] for vectors, and in [22] for matrices.
Submitted 2 January, 2010;
originally announced January 2010.
-
Accurate low-rank matrix recovery from a small number of linear measurements
Authors:
Emmanuel J. Candes,
Yaniv Plan
Abstract:
We consider the problem of recovering a low-rank matrix M from a small number of random linear measurements. A popular and useful example of this problem is matrix completion, in which the measurements reveal the values of a subset of the entries, and we wish to fill in the missing entries (this is the famous Netflix problem). When M is believed to have low rank, one would ideally try to recover M by finding the minimum-rank matrix that is consistent with the data; this is, however, problematic since this is a nonconvex problem that is, generally, intractable.
Nuclear-norm minimization has been proposed as a tractable approach, and past papers have delved into the theoretical properties of nuclear-norm minimization algorithms, establishing conditions under which minimizing the nuclear norm yields the minimum rank solution. We review this spring of emerging literature and extend and refine previous theoretical results. Our focus is on providing error bounds when M is well approximated by a low-rank matrix, and when the measurements are corrupted with noise. We show that for a certain class of random linear measurements, nuclear-norm minimization provides stable recovery from a number of samples nearly at the theoretical lower limit, and enjoys order-optimal error bounds (with high probability).
Submitted 2 October, 2009;
originally announced October 2009.
-
Matrix Completion With Noise
Authors:
Emmanuel J. Candes,
Yaniv Plan
Abstract:
On the heels of compressed sensing, a remarkable new field has very recently emerged. This field addresses a broad range of problems of significant practical interest, namely, the recovery of a data matrix from what appears to be incomplete, and perhaps even corrupted, information. In its simplest form, the problem is to recover a matrix from a small sample of its entries, and comes up in many areas of science and engineering including collaborative filtering, machine learning, control, remote sensing, and computer vision to name a few.
This paper surveys the novel literature on matrix completion, which shows that under some suitable conditions, one can recover an unknown low-rank matrix from a nearly minimal set of entries by solving a simple convex optimization problem, namely, nuclear-norm minimization subject to data constraints. Further, this paper introduces novel results showing that matrix completion is provably accurate even when the few observed entries are corrupted with a small amount of noise. A typical result is that one can recover an unknown n x n matrix of low rank r from just about nr log^2 n noisy samples with an error which is proportional to the noise level. We present numerical results which complement our quantitative analysis and show that, in practice, nuclear norm minimization accurately fills in the many missing entries of large low-rank matrices from just a few noisy samples. Some analogies between matrix completion and compressed sensing are discussed throughout.
Submitted 18 March, 2009;
originally announced March 2009.
-
Near-ideal model selection by $\ell_1$ minimization
Authors:
Emmanuel J. Candès,
Yaniv Plan
Abstract:
We consider the fundamental problem of estimating the mean of a vector $y=Xβ+z$, where $X$ is an $n\times p$ design matrix in which one can have far more variables than observations, and $z$ is a stochastic error term--the so-called "$p>n$" setup. When $β$ is sparse, or, more generally, when there is a sparse subset of covariates providing a close approximation to the unknown mean vector, we ask whether or not it is possible to accurately estimate $Xβ$ using a computationally tractable algorithm. We show that, in a surprisingly wide range of situations, the lasso happens to nearly select the best subset of variables. Quantitatively speaking, we prove that solving a simple quadratic program achieves a squared error within a logarithmic factor of the ideal mean squared error that one would achieve with an oracle supplying perfect information about which variables should and should not be included in the model. Interestingly, our results describe the average performance of the lasso; that is, the performance one can expect in a vast majority of cases where $Xβ$ is a sparse or nearly sparse superposition of variables, but not in all cases. Our results are nonasymptotic and widely applicable, since they simply require that pairs of predictor variables are not too collinear.
Submitted 21 August, 2009; v1 submitted 2 January, 2008;
originally announced January 2008.