-
Bagged Deep Image Prior for Recovering Images in the Presence of Speckle Noise
Authors:
Xi Chen,
Zhewen Hou,
Christopher A. Metzler,
Arian Maleki,
Shirin Jalali
Abstract:
We investigate both the theoretical and algorithmic aspects of likelihood-based methods for recovering a complex-valued signal from multiple sets of measurements, referred to as looks, affected by speckle (multiplicative) noise. Our theoretical contributions include establishing the first existing theoretical upper bound on the Mean Squared Error (MSE) of the maximum likelihood estimator under the…
▽ More
We investigate both the theoretical and algorithmic aspects of likelihood-based methods for recovering a complex-valued signal from multiple sets of measurements, referred to as looks, affected by speckle (multiplicative) noise. Our theoretical contributions include establishing the first existing theoretical upper bound on the Mean Squared Error (MSE) of the maximum likelihood estimator under the deep image prior hypothesis. Our theoretical results capture the dependence of MSE upon the number of parameters in the deep image prior, the number of looks, the signal dimension, and the number of measurements per look. On the algorithmic side, we introduce the concept of bagged Deep Image Priors (Bagged-DIP) and integrate them with projected gradient descent. Furthermore, we show how employing Newton-Schulz algorithm for calculating matrix inverses within the iterations of PGD reduces the computational complexity of the algorithm. We will show that this method achieves the state-of-the-art performance.
△ Less
Submitted 23 February, 2024;
originally announced February 2024.
-
Theoretical Analysis of Leave-one-out Cross Validation for Non-differentiable Penalties under High-dimensional Settings
Authors:
Haolin Zou,
Arnab Auddy,
Kamiar Rahnama Rad,
Arian Maleki
Abstract:
Despite a large and significant body of recent work focused on estimating the out-of-sample risk of regularized models in the high dimensional regime, a theoretical understanding of this problem for non-differentiable penalties such as generalized LASSO and nuclear norm is missing. In this paper we resolve this challenge. We study this problem in the proportional high dimensional regime where both…
▽ More
Despite a large and significant body of recent work focused on estimating the out-of-sample risk of regularized models in the high dimensional regime, a theoretical understanding of this problem for non-differentiable penalties such as generalized LASSO and nuclear norm is missing. In this paper we resolve this challenge. We study this problem in the proportional high dimensional regime where both the sample size n and number of features p are large, and n/p and the signal-to-noise ratio (per observation) remain finite. We provide finite sample upper bounds on the expected squared error of leave-one-out cross-validation (LO) in estimating the out-of-sample risk. The theoretical framework presented here provides a solid foundation for elucidating empirical findings that show the accuracy of LO.
△ Less
Submitted 14 February, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Approximate Leave-one-out Cross Validation for Regression with $\ell_1$ Regularizers (extended version)
Authors:
Arnab Auddy,
Haolin Zou,
Kamiar Rahnama Rad,
Arian Maleki
Abstract:
The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out cross validation (LO) offers a (nearly) distribution-free yet computationally demanding approach to estimate OO. Recent theoretical work showed that approximate leave-one-out cross validation (ALO) is a computationally efficient and statistically reliable estimate of LO (and OO) for…
▽ More
The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out cross validation (LO) offers a (nearly) distribution-free yet computationally demanding approach to estimate OO. Recent theoretical work showed that approximate leave-one-out cross validation (ALO) is a computationally efficient and statistically reliable estimate of LO (and OO) for generalized linear models with differentiable regularizers. For problems involving non-differentiable regularizers, despite significant empirical evidence, the theoretical understanding of ALO's error remains unknown. In this paper, we present a novel theory for a wide class of problems in the generalized linear model family with non-differentiable regularizers. We bound the error |ALO - LO| in terms of intuitive metrics such as the size of leave-i-out perturbations in active sets, sample size n, number of features p and regularization parameters. As a consequence, for the $\ell_1$-regularized problems, we show that |ALO - LO| goes to zero as p goes to infinity while n/p and SNR are fixed and bounded.
△ Less
Submitted 26 October, 2023;
originally announced October 2023.
-
Sharp Concentration Results for Heavy-Tailed Distributions
Authors:
Milad Bakhshizadeh,
Arian Maleki,
Victor H. de la Pena
Abstract:
We obtain concentration and large deviation for the sums of independent and identically distributed random variables with heavy-tailed distributions. Our concentration results are concerned with random variables whose distributions satisfy $\mathbb{P}(X>t) \leq {\rm e}^{- I(t)}$, where $I: \mathbb{R} \rightarrow \mathbb{R}$ is an increasing function and $I(t)/t \rightarrow α\in [0, \infty)$ as…
▽ More
We obtain concentration and large deviation for the sums of independent and identically distributed random variables with heavy-tailed distributions. Our concentration results are concerned with random variables whose distributions satisfy $\mathbb{P}(X>t) \leq {\rm e}^{- I(t)}$, where $I: \mathbb{R} \rightarrow \mathbb{R}$ is an increasing function and $I(t)/t \rightarrow α\in [0, \infty)$ as $t \rightarrow \infty$. Our main theorem can not only recover some of the existing results, such as the concentration of the sum of subWeibull random variables, but it can also produce new results for the sum of random variables with heavier tails. We show that the concentration inequalities we obtain are sharp enough to offer large deviation results for the sums of independent random variables as well. Our analyses which are based on standard truncation arguments simplify, unify and generalize the existing results on the concentration and large deviation of heavy-tailed random variables.
△ Less
Submitted 25 July, 2022; v1 submitted 30 March, 2020;
originally announced March 2020.
-
Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions
Authors:
Kamiar Rahnama Rad,
Wenda Zhou,
Arian Maleki
Abstract:
We study the problem of out-of-sample risk estimation in the high dimensional regime where both the sample size $n$ and number of features $p$ are large, and $n/p$ can be less than one. Extensive empirical evidence confirms the accuracy of leave-one-out cross validation (LO) for out-of-sample risk estimation. Yet, a unifying theoretical evaluation of the accuracy of LO in high-dimensional problems…
▽ More
We study the problem of out-of-sample risk estimation in the high dimensional regime where both the sample size $n$ and number of features $p$ are large, and $n/p$ can be less than one. Extensive empirical evidence confirms the accuracy of leave-one-out cross validation (LO) for out-of-sample risk estimation. Yet, a unifying theoretical evaluation of the accuracy of LO in high-dimensional problems has remained an open problem. This paper aims to fill this gap for penalized regression in the generalized linear family. With minor assumptions about the data generating process, and without any sparsity assumptions on the regression coefficients, our theoretical analysis obtains finite sample upper bounds on the expected squared error of LO in estimating the out-of-sample error. Our bounds show that the error goes to zero as $n,p \rightarrow \infty$, even when the dimension $p$ of the feature vectors is comparable with or greater than the sample size $n$. One technical advantage of the theory is that it can be used to clarify and connect some results from the recent literature on scalable approximate LO.
△ Less
Submitted 3 March, 2020;
originally announced March 2020.
-
Does SLOPE outperform bridge regression?
Authors:
Shuaiwen Wang,
Haolei Weng,
Arian Maleki
Abstract:
A recently proposed SLOPE estimator (arXiv:1407.3824) has been shown to adaptively achieve the minimax $\ell_2$ estimation rate under high-dimensional sparse linear regression models (arXiv:1503.08393). Such minimax optimality holds in the regime where the sparsity level $k$, sample size $n$, and dimension $p$ satisfy $k/p \rightarrow 0$, $k\log p/n \rightarrow 0$. In this paper, we characterize t…
▽ More
A recently proposed SLOPE estimator (arXiv:1407.3824) has been shown to adaptively achieve the minimax $\ell_2$ estimation rate under high-dimensional sparse linear regression models (arXiv:1503.08393). Such minimax optimality holds in the regime where the sparsity level $k$, sample size $n$, and dimension $p$ satisfy $k/p \rightarrow 0$, $k\log p/n \rightarrow 0$. In this paper, we characterize the estimation error of SLOPE under the complementary regime where both $k$ and $n$ scale linearly with $p$, and provide new insights into the performance of SLOPE estimators. We first derive a concentration inequality for the finite sample mean square error (MSE) of SLOPE. The quantity that MSE concentrates around takes a complicated and implicit form. With delicate analysis of the quantity, we prove that among all SLOPE estimators, LASSO is optimal for estimating $k$-sparse parameter vectors that do not have tied non-zero components in the low noise scenario. On the other hand, in the large noise scenario, the family of SLOPE estimators are sub-optimal compared with bridge regression such as the Ridge estimator.
△ Less
Submitted 22 September, 2021; v1 submitted 20 September, 2019;
originally announced September 2019.
-
Consistent Risk Estimation in Moderately High-Dimensional Linear Regression
Authors:
Ji Xu,
Arian Maleki,
Kamiar Rahnama Rad,
Daniel Hsu
Abstract:
Risk estimation is at the core of many learning systems. The importance of this problem has motivated researchers to propose different schemes, such as cross validation, generalized cross validation, and Bootstrap. The theoretical properties of such estimates have been extensively studied in the low-dimensional settings, where the number of predictors $p$ is much smaller than the number of observa…
▽ More
Risk estimation is at the core of many learning systems. The importance of this problem has motivated researchers to propose different schemes, such as cross validation, generalized cross validation, and Bootstrap. The theoretical properties of such estimates have been extensively studied in the low-dimensional settings, where the number of predictors $p$ is much smaller than the number of observations $n$. However, a unifying methodology accompanied with a rigorous theory is lacking in high-dimensional settings. This paper studies the problem of risk estimation under the moderately high-dimensional asymptotic setting $n,p \rightarrow \infty$ and $n/p \rightarrow δ>1$ ($δ$ is a fixed number), and proves the consistency of three risk estimates that have been successful in numerical studies, i.e., leave-one-out cross validation (LOOCV), approximate leave-one-out (ALO), and approximate message passing (AMP)-based techniques. A corner stone of our analysis is a bound that we obtain on the discrepancy of the `residuals' obtained from AMP and LOOCV. This connection not only enables us to obtain a more refined information on the estimates of AMP, ALO, and LOOCV, but also offers an upper bound on the convergence rate of each estimate.
△ Less
Submitted 18 January, 2021; v1 submitted 5 February, 2019;
originally announced February 2019.
-
Minimax Linear Estimation of the Retargeted Mean
Authors:
David A. Hirshberg,
Arian Maleki,
Jose R. Zubizarreta
Abstract:
Evaluating treatments received by one population for application to a different target population of scientific interest is a central problem in causal inference from observational studies. We study the minimax linear estimator of the treatment-specific mean outcome on a target population and provide a theoretical basis for inference based on it. In particular, we provide a justification for the c…
▽ More
Evaluating treatments received by one population for application to a different target population of scientific interest is a central problem in causal inference from observational studies. We study the minimax linear estimator of the treatment-specific mean outcome on a target population and provide a theoretical basis for inference based on it. In particular, we provide a justification for the common practice of ignoring bias when building confidence intervals with these linear estimators. Focusing on the case that the class of the unknown outcome function is the unit ball of a reproducing kernel Hilbert space, we show that the resulting linear estimator is asymptotically optimal under conditions only marginally stronger than those used with augmented estimators. We establish bounds attesting to the estimator's good finite sample properties. In an extensive simulation study, we observe promising performance of the estimator throughout a wide range of sample sizes, noise levels, and levels of overlap between the covariate distributions of the treated and target populations.
△ Less
Submitted 26 February, 2021; v1 submitted 10 January, 2019;
originally announced January 2019.
-
Benefits of over-parameterization with EM
Authors:
Ji Xu,
Daniel Hsu,
Arian Maleki
Abstract:
Expectation Maximization (EM) is among the most popular algorithms for maximum likelihood estimation, but it is generally only guaranteed to find its stationary points of the log-likelihood objective. The goal of this article is to present theoretical and empirical evidence that over-parameterization can help EM avoid spurious local optima in the log-likelihood. We consider the problem of estimati…
▽ More
Expectation Maximization (EM) is among the most popular algorithms for maximum likelihood estimation, but it is generally only guaranteed to find its stationary points of the log-likelihood objective. The goal of this article is to present theoretical and empirical evidence that over-parameterization can help EM avoid spurious local optima in the log-likelihood. We consider the problem of estimating the mean vectors of a Gaussian mixture model in a scenario where the mixing weights are known. Our study shows that the global behavior of EM, when one uses an over-parameterized model in which the mixing weights are treated as unknown, is better than that when one uses the (correct) model with the mixing weights fixed to the known values. For symmetric Gaussians mixtures with two components, we prove that introducing the (statistically redundant) weight parameters enables EM to find the global maximizer of the log-likelihood starting from almost any initial mean parameters, whereas EM without this over-parameterization may very often fail. For other Gaussian mixtures, we provide empirical evidence that shows similar behavior. Our results corroborate the value of over-parameterization in solving non-convex optimization problems, previously observed in other domains.
△ Less
Submitted 26 October, 2018;
originally announced October 2018.
-
Approximate Leave-One-Out for High-Dimensional Non-Differentiable Learning Problems
Authors:
Shuaiwen Wang,
Wenda Zhou,
Arian Maleki,
Haihao Lu,
Vahab Mirrokni
Abstract:
Consider the following class of learning schemes: \begin{equation} \label{eq:main-problem1}
\hat{\boldsymbolβ} := \underset{\boldsymbolβ \in \mathcal{C}}{\arg\min} \;\sum_{j=1}^n \ell(\boldsymbol{x}_j^\top\boldsymbolβ; y_j) + λR(\boldsymbolβ), \qquad \qquad \qquad (1) \end{equation} where $\boldsymbol{x}_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i^{\rm th}$ feature and response va…
▽ More
Consider the following class of learning schemes: \begin{equation} \label{eq:main-problem1}
\hat{\boldsymbolβ} := \underset{\boldsymbolβ \in \mathcal{C}}{\arg\min} \;\sum_{j=1}^n \ell(\boldsymbol{x}_j^\top\boldsymbolβ; y_j) + λR(\boldsymbolβ), \qquad \qquad \qquad (1) \end{equation} where $\boldsymbol{x}_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i^{\rm th}$ feature and response variable respectively. Let $\ell$ and $R$ be the convex loss function and regularizer, $\boldsymbolβ$ denote the unknown weights, and $λ$ be a regularization parameter. $\mathcal{C} \subset \mathbb{R}^{p}$ is a closed convex set. Finding the optimal choice of $λ$ is a challenging problem in high-dimensional regimes where both $n$ and $p$ are large. We propose three frameworks to obtain a computationally efficient approximation of the leave-one-out cross validation (LOOCV) risk for nonsmooth losses and regularizers. Our three frameworks are based on the primal, dual, and proximal formulations of (1). Each framework shows its strength in certain types of problems. We prove the equivalence of the three approaches under smoothness conditions. This equivalence enables us to justify the accuracy of the three methods under such conditions. We use our approaches to obtain a risk estimate for several standard problems, including generalized LASSO, nuclear norm regularization, and support vector machines. We empirically demonstrate the effectiveness of our results for non-differentiable cases.
△ Less
Submitted 4 October, 2018;
originally announced October 2018.
-
Approximate Leave-One-Out for Fast Parameter Tuning in High Dimensions
Authors:
Shuaiwen Wang,
Wenda Zhou,
Haihao Lu,
Arian Maleki,
Vahab Mirrokni
Abstract:
Consider the following class of learning schemes: $$\hat{\boldsymbolβ} := \arg\min_{\boldsymbolβ}\;\sum_{j=1}^n \ell(\boldsymbol{x}_j^\top\boldsymbolβ; y_j) + λR(\boldsymbolβ),\qquad\qquad (1) $$ where $\boldsymbol{x}_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i^{\text{th}}$ feature and response variable respectively. Let $\ell$ and $R$ be the loss function and regularizer,…
▽ More
Consider the following class of learning schemes: $$\hat{\boldsymbolβ} := \arg\min_{\boldsymbolβ}\;\sum_{j=1}^n \ell(\boldsymbol{x}_j^\top\boldsymbolβ; y_j) + λR(\boldsymbolβ),\qquad\qquad (1) $$ where $\boldsymbol{x}_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i^{\text{th}}$ feature and response variable respectively. Let $\ell$ and $R$ be the loss function and regularizer, $\boldsymbolβ$ denote the unknown weights, and $λ$ be a regularization parameter. Finding the optimal choice of $λ$ is a challenging problem in high-dimensional regimes where both $n$ and $p$ are large. We propose two frameworks to obtain a computationally efficient approximation ALO of the leave-one-out cross validation (LOOCV) risk for nonsmooth losses and regularizers. Our two frameworks are based on the primal and dual formulations of (1). We prove the equivalence of the two approaches under smoothness conditions. This equivalence enables us to justify the accuracy of both methods under such conditions. We use our approaches to obtain a risk estimate for several standard problems, including generalized LASSO, nuclear norm regularization, and support vector machines. We empirically demonstrate the effectiveness of our results for non-differentiable cases.
△ Less
Submitted 7 July, 2018;
originally announced July 2018.
-
A scalable estimate of the extra-sample prediction error via approximate leave-one-out
Authors:
Kamiar Rahnama Rad,
Arian Maleki
Abstract:
The paper considers the problem of out-of-sample risk estimation under the high dimensional settings where standard techniques such as $K$-fold cross validation suffer from large biases. Motivated by the low bias of the leave-one-out cross validation (LO) method, we propose a computationally efficient closed-form approximate leave-one-out formula (ALO) for a large class of regularized estimators.…
▽ More
The paper considers the problem of out-of-sample risk estimation under the high dimensional settings where standard techniques such as $K$-fold cross validation suffer from large biases. Motivated by the low bias of the leave-one-out cross validation (LO) method, we propose a computationally efficient closed-form approximate leave-one-out formula (ALO) for a large class of regularized estimators. Given the regularized estimate, calculating ALO requires minor computational overhead. With minor assumptions about the data generating process, we obtain a finite-sample upper bound for $|\text{LO} - \text{ALO}|$. Our theoretical analysis illustrates that $|\text{LO} - \text{ALO}| \rightarrow 0$ with overwhelming probability, when $n,p \rightarrow \infty$, where the dimension $p$ of the feature vectors may be comparable with or even greater than the number of observations, $n$. Despite the high-dimensionality of the problem, our theoretical results do not require any sparsity assumption on the vector of regression coefficients. Our extensive numerical experiments show that $|\text{LO} - \text{ALO}|$ decreases as $n,p$ increase, revealing the excellent finite sample performance of ALO. We further illustrate the usefulness of our proposed out-of-sample risk estimation method by an example of real recordings from spatially sensitive neurons (grid cells) in the medial entorhinal cortex of a rat.
△ Less
Submitted 11 February, 2020; v1 submitted 30 January, 2018;
originally announced January 2018.
-
Global analysis of Expectation Maximization for mixtures of two Gaussians
Authors:
Ji Xu,
Daniel Hsu,
Arian Maleki
Abstract:
Expectation Maximization (EM) is among the most popular algorithms for estimating parameters of statistical models. However, EM, which is an iterative algorithm based on the maximum likelihood principle, is generally only guaranteed to find stationary points of the likelihood objective, and these points may be far from any maximizer. This article addresses this disconnect between the statistical p…
▽ More
Expectation Maximization (EM) is among the most popular algorithms for estimating parameters of statistical models. However, EM, which is an iterative algorithm based on the maximum likelihood principle, is generally only guaranteed to find stationary points of the likelihood objective, and these points may be far from any maximizer. This article addresses this disconnect between the statistical principles behind EM and its algorithmic properties. Specifically, it provides a global analysis of EM for specific models in which the observations comprise an i.i.d. sample from a mixture of two Gaussians. This is achieved by (i) studying the sequence of parameters from idealized execution of EM in the infinite sample limit, and fully characterizing the limit points of the sequence in terms of the initial parameters; and then (ii) based on this convergence analysis, establishing statistical consistency (or lack thereof) for the actual sequence of parameters produced by EM.
△ Less
Submitted 26 August, 2016;
originally announced August 2016.
-
Consistent Parameter Estimation for LASSO and Approximate Message Passing
Authors:
Ali Mousavi,
Arian Maleki,
Richard G. Baraniuk
Abstract:
We consider the problem of recovering a vector $β_o \in \mathbb{R}^p$ from $n$ random and noisy linear observations $y= Xβ_o + w$, where $X$ is the measurement matrix and $w$ is noise. The LASSO estimate is given by the solution to the optimization problem $\hatβ_λ = \arg \min_β \frac{1}{2} \|y-Xβ\|_2^2 + λ\| β\|_1$. Among the iterative algorithms that have been proposed for solving this optimizat…
▽ More
We consider the problem of recovering a vector $β_o \in \mathbb{R}^p$ from $n$ random and noisy linear observations $y= Xβ_o + w$, where $X$ is the measurement matrix and $w$ is noise. The LASSO estimate is given by the solution to the optimization problem $\hatβ_λ = \arg \min_β \frac{1}{2} \|y-Xβ\|_2^2 + λ\| β\|_1$. Among the iterative algorithms that have been proposed for solving this optimization problem, approximate message passing (AMP) has attracted attention for its fast convergence. Despite significant progress in the theoretical analysis of the estimates of LASSO and AMP, little is known about their behavior as a function of the regularization parameter $λ$, or the thereshold parameters $τ^t$. For instance the following basic questions have not yet been studied in the literature: (i) How does the size of the active set $\|\hatβ^λ\|_0/p$ behave as a function of $λ$? (ii) How does the mean square error $\|\hatβ_λ - β_o\|_2^2/p$ behave as a function of $λ$? (iii) How does $\|β^t - β_o \|_2^2/p$ behave as a function of $τ^1, \ldots, τ^{t-1}$? Answering these questions will help in addressing practical challenges regarding the optimal tuning of $λ$ or $τ^1, τ^2, \ldots$. This paper answers these questions in the asymptotic setting and shows how these results can be employed in deriving simple and theoretically optimal approaches for tuning the parameters $τ^1, \ldots, τ^t$ for AMP or $λ$ for LASSO. It also explores the connection between the optimal tuning of the parameters of AMP and the optimal tuning of LASSO.
△ Less
Submitted 4 November, 2015; v1 submitted 3 November, 2015;
originally announced November 2015.
-
From Denoising to Compressed Sensing
Authors:
Christopher A. Metzler,
Arian Maleki,
Richard G. Baraniuk
Abstract:
A denoising algorithm seeks to remove noise, errors, or perturbations from a signal. Extensive research has been devoted to this arena over the last several decades, and as a result, today's denoisers can effectively remove large amounts of additive white Gaussian noise. A compressed sensing (CS) reconstruction algorithm seeks to recover a structured signal acquired using a small number of randomi…
▽ More
A denoising algorithm seeks to remove noise, errors, or perturbations from a signal. Extensive research has been devoted to this arena over the last several decades, and as a result, today's denoisers can effectively remove large amounts of additive white Gaussian noise. A compressed sensing (CS) reconstruction algorithm seeks to recover a structured signal acquired using a small number of randomized measurements. Typical CS reconstruction algorithms can be cast as iteratively estimating a signal from a perturbed observation. This paper answers a natural question: How can one effectively employ a generic denoiser in a CS reconstruction algorithm? In response, we develop an extension of the approximate message passing (AMP) framework, called Denoising-based AMP (D-AMP), that can integrate a wide class of denoisers within its iterations. We demonstrate that, when used with a high performance denoiser for natural images, D-AMP offers state-of-the-art CS recovery performance while operating tens of times faster than competing methods. We explain the exceptional performance of D-AMP by analyzing some of its theoretical features. A key element in D-AMP is the use of an appropriate Onsager correction term in its iterations, which coerces the signal perturbation at each iteration to be very close to the white Gaussian noise that denoisers are typically designed to remove.
△ Less
Submitted 17 April, 2016; v1 submitted 16 June, 2014;
originally announced June 2014.
-
Parameterless Optimal Approximate Message Passing
Authors:
Ali Mousavi,
Arian Maleki,
Richard G. Baraniuk
Abstract:
Iterative thresholding algorithms are well-suited for high-dimensional problems in sparse recovery and compressive sensing. The performance of this class of algorithms depends heavily on the tuning of certain threshold parameters. In particular, both the final reconstruction error and the convergence rate of the algorithm crucially rely on how the threshold parameter is set at each step of the alg…
▽ More
Iterative thresholding algorithms are well-suited for high-dimensional problems in sparse recovery and compressive sensing. The performance of this class of algorithms depends heavily on the tuning of certain threshold parameters. In particular, both the final reconstruction error and the convergence rate of the algorithm crucially rely on how the threshold parameter is set at each step of the algorithm. In this paper, we propose a parameter-free approximate message passing (AMP) algorithm that sets the threshold parameter at each iteration in a fully automatic way without either having an information about the signal to be reconstructed or needing any tuning from the user. We show that the proposed method attains both the minimum reconstruction error and the highest convergence rate. Our method is based on applying the Stein unbiased risk estimate (SURE) along with a modified gradient descent to find the optimal threshold in each iteration. Motivated by the connections between AMP and LASSO, it could be employed to find the solution of the LASSO for the optimal regularization parameter. To the best of our knowledge, this is the first work concerning parameter tuning that obtains the fastest convergence rate with theoretical guarantees.
△ Less
Submitted 31 October, 2013;
originally announced November 2013.
-
Asymptotic Analysis of LASSOs Solution Path with Implications for Approximate Message Passing
Authors:
Ali Mousavi,
Arian Maleki,
Richard G. Baraniuk
Abstract:
This paper concerns the performance of the LASSO (also knows as basis pursuit denoising) for recovering sparse signals from undersampled, randomized, noisy measurements. We consider the recovery of the signal $x_o \in \mathbb{R}^N$ from $n$ random and noisy linear observations $y= Ax_o + w$, where $A$ is the measurement matrix and $w$ is the noise. The LASSO estimate is given by the solution to th…
▽ More
This paper concerns the performance of the LASSO (also knows as basis pursuit denoising) for recovering sparse signals from undersampled, randomized, noisy measurements. We consider the recovery of the signal $x_o \in \mathbb{R}^N$ from $n$ random and noisy linear observations $y= Ax_o + w$, where $A$ is the measurement matrix and $w$ is the noise. The LASSO estimate is given by the solution to the optimization problem $x_o$ with $\hat{x}_λ = \arg \min_x \frac{1}{2} \|y-Ax\|_2^2 + λ\|x\|_1$. Despite major progress in the theoretical analysis of the LASSO solution, little is known about its behavior as a function of the regularization parameter $λ$. In this paper we study two questions in the asymptotic setting (i.e., where $N \rightarrow \infty$, $n \rightarrow \infty$ while the ratio $n/N$ converges to a fixed number in $(0,1)$): (i) How does the size of the active set $\|\hat{x}_λ\|_0/N$ behave as a function of $λ$, and (ii) How does the mean square error $\|\hat{x}_λ - x_o\|_2^2/N$ behave as a function of $λ$? We then employ these results in a new, reliable algorithm for solving LASSO based on approximate message passing (AMP).
△ Less
Submitted 23 September, 2013;
originally announced September 2013.
-
Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation
Authors:
Dominique Guillot,
Bala Rajaratnam,
Benjamin T. Rolfs,
Arian Maleki,
Ian Wong
Abstract:
The L1-regularized maximum likelihood estimation problem has recently become a topic of great interest within the machine learning, statistics, and optimization communities as a method for producing sparse inverse covariance estimators. In this paper, a proximal gradient method (G-ISTA) for performing L1-regularized covariance matrix estimation is presented. Although numerous algorithms have been…
▽ More
The L1-regularized maximum likelihood estimation problem has recently become a topic of great interest within the machine learning, statistics, and optimization communities as a method for producing sparse inverse covariance estimators. In this paper, a proximal gradient method (G-ISTA) for performing L1-regularized covariance matrix estimation is presented. Although numerous algorithms have been proposed for solving this problem, this simple proximal gradient method is found to have attractive theoretical and numerical properties. G-ISTA has a linear rate of convergence, resulting in an O(log e) iteration complexity to reach a tolerance of e. This paper gives eigenvalue bounds for the G-ISTA iterates, providing a closed-form linear convergence rate. The rate is shown to be closely related to the condition number of the optimal point. Numerical convergence results and timing comparisons for the proposed method are presented. G-ISTA is shown to perform very well, especially when the optimal point is well-conditioned.
△ Less
Submitted 26 November, 2012; v1 submitted 12 November, 2012;
originally announced November 2012.
-
Message Passing Algorithms for Compressed Sensing
Authors:
David L. Donoho,
Arian Maleki,
Andrea Montanari
Abstract:
Compressed sensing aims to undersample certain high-dimensional signals, yet accurately reconstruct them by exploiting signal characteristics. Accurate reconstruction is possible when the object to be recovered is sufficiently sparse in a known basis. Currently, the best known sparsity-undersampling tradeoff is achieved when reconstructing by convex optimization -- which is expensive in importan…
▽ More
Compressed sensing aims to undersample certain high-dimensional signals, yet accurately reconstruct them by exploiting signal characteristics. Accurate reconstruction is possible when the object to be recovered is sufficiently sparse in a known basis. Currently, the best known sparsity-undersampling tradeoff is achieved when reconstructing by convex optimization -- which is expensive in important large-scale applications. Fast iterative thresholding algorithms have been intensively studied as alternatives to convex optimization for large-scale problems. Unfortunately known fast algorithms offer substantially worse sparsity-undersampling tradeoffs than convex optimization.
We introduce a simple costless modification to iterative thresholding making the sparsity-undersampling tradeoff of the new algorithms equivalent to that of the corresponding convex optimization procedures. The new iterative-thresholding algorithms are inspired by belief propagation in graphical models. Our empirical measurements of the sparsity-undersampling tradeoff for the new algorithms agree with theoretical calculations. We show that a state evolution formalism correctly derives the true sparsity-undersampling tradeoff. There is a surprising agreement between earlier calculations based on random convex polytopes and this new, apparently very different theoretical formalism.
△ Less
Submitted 21 July, 2009;
originally announced July 2009.