-
Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs
Authors:
Fareed Qararyah,
Muhammad Waqar Azhar,
Mohammad Ali Maleki,
Pedro Trancoso
Abstract:
Depthwise and pointwise convolutions have fewer parameters and perform fewer operations than standard convolutions. As a result, they have become increasingly used in various compact DNNs, including convolutional neural networks (CNNs) and vision transformers (ViTs). However, they have a lower compute-to-memory-access ratio than standard convolutions, which often makes their memory accesses the performance bottleneck. This paper explores fusing depthwise and pointwise convolutions to overcome the memory access bottleneck. The focus is on fusing these operators on GPUs. The prior art on GPU-based fusion suffers from one or more of the following: (1) fusing a convolution with either an element-wise operator or multiple non-convolutional operators, (2) not explicitly optimizing for memory accesses, (3) not supporting depthwise convolutions. This paper proposes Fused Convolutional Modules (FCMs), a set of novel fused depthwise and pointwise GPU kernels. FCMs significantly reduce the memory accesses of pointwise and depthwise convolutions, improving execution time and energy efficiency. To evaluate the trade-offs associated with fusion and determine which convolutions are beneficial to fuse and the optimal FCM parameters, we propose FusePlanner. FusePlanner consists of cost models to estimate the memory accesses of depthwise, pointwise, and FCM kernels given GPU characteristics. Our experiments on three GPUs using representative CNNs and ViTs demonstrate that FCMs save up to 83% of the memory accesses and achieve speedups of up to 3.7x compared to cuDNN. Complete model implementations of various CNNs using our modules outperform TVM's, achieving speedups of up to 1.8x and saving up to two-thirds of the energy.
Submitted 30 April, 2024;
originally announced April 2024.
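The compute-to-memory-access argument in the abstract above can be illustrated with a back-of-the-envelope count. The sketch below is not the paper's FusePlanner cost model. It assumes an idealized setting in which every input, weight, and output element is moved exactly once, with stride 1 and "same" padding. The function names `conv_costs`, `intensity`, and `fused_dw_pw_moved` are hypothetical.

```python
def conv_costs(h, w, c_in, c_out, k, depthwise=False):
    """Return (MACs, elements moved) for one stride-1, same-padded convolution.

    Idealized assumption: each input, weight, and output element is
    transferred to or from memory exactly once.
    """
    if depthwise:
        # One k x k filter per channel, so the output has c_in channels.
        c_out = c_in
        macs = h * w * c_in * k * k
        weights = c_in * k * k
    else:
        macs = h * w * c_in * c_out * k * k
        weights = c_in * c_out * k * k
    moved = h * w * c_in + weights + h * w * c_out
    return macs, moved


def intensity(macs, moved):
    """Multiply-accumulates per element moved (higher means more compute-bound)."""
    return macs / moved


def fused_dw_pw_moved(h, w, c, c_out, k):
    """Elements moved by a depthwise conv followed by a pointwise conv.

    Returns (unfused, fused). Fusion is modeled as skipping the write and
    the re-read of the intermediate feature map, nothing more.
    """
    _, dw_moved = conv_costs(h, w, c, c, k, depthwise=True)
    _, pw_moved = conv_costs(h, w, c, c_out, 1)
    intermediate = h * w * c
    unfused = dw_moved + pw_moved
    fused = unfused - 2 * intermediate
    return unfused, fused


if __name__ == "__main__":
    shape = dict(h=56, w=56, c_in=128, c_out=128)
    print("standard 3x3 :", intensity(*conv_costs(k=3, **shape)))
    print("depthwise 3x3:", intensity(*conv_costs(k=3, depthwise=True, **shape)))
    print("pointwise 1x1:", intensity(*conv_costs(k=1, **shape)))
    print("unfused, fused elements moved:", fused_dw_pw_moved(56, 56, 128, 128, 3))
```

Under these assumptions the depthwise and pointwise layers have a much lower ratio than the standard convolution, and the fused pair avoids two passes over the intermediate feature map. Real kernels also pay for tiling, halo regions, and cache behavior, which this count ignores.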
-
Bagged Deep Image Prior for Recovering Images in the Presence of Speckle Noise
Authors:
Xi Chen,
Zhewen Hou,
Christopher A. Metzler,
Arian Maleki,
Shirin Jalali
Abstract:
We investigate both the theoretical and algorithmic aspects of likelihood-based methods for recovering a complex-valued signal from multiple sets of measurements, referred to as looks, affected by speckle (multiplicative) noise. Our theoretical contributions include establishing the first theoretical upper bound on the Mean Squared Error (MSE) of the maximum likelihood estimator under the deep image prior hypothesis. Our theoretical results capture the dependence of MSE upon the number of parameters in the deep image prior, the number of looks, the signal dimension, and the number of measurements per look. On the algorithmic side, we introduce the concept of bagged Deep Image Priors (Bagged-DIP) and integrate them with projected gradient descent (PGD). Furthermore, we show how employing the Newton-Schulz algorithm for calculating matrix inverses within the iterations of PGD reduces the computational complexity of the algorithm. We show that this method achieves state-of-the-art performance.
Submitted 23 February, 2024;
originally announced February 2024.
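The abstract above mentions the Newton-Schulz algorithm for matrix inversion. As a rough illustration, the textbook iteration X_{k+1} = X_k (2I - A X_k) can be sketched as follows. This is a generic version, not the paper's implementation: the starting point X_0 = A^T / (||A||_1 ||A||_inf), the fixed iteration count, and the helper names `matmul` and `newton_schulz_inverse` are assumptions made for this example.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def newton_schulz_inverse(a, iterations=30):
    """Approximate the inverse of a nonsingular square matrix `a`.

    Uses X_{k+1} = X_k (2I - A X_k), starting from
    X_0 = A^T / (||A||_1 * ||A||_inf), a standard choice that keeps the
    eigenvalues of A X_0 in (0, 1] so the iteration converges.
    """
    n = len(a)
    norm_1 = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))  # max column sum
    norm_inf = max(sum(abs(v) for v in row) for row in a)                # max row sum
    scale = norm_1 * norm_inf
    x = [[a[j][i] / scale for j in range(n)] for i in range(n)]          # A^T / scale
    for _ in range(iterations):
        ax = matmul(a, x)
        correction = [
            [(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
            for i in range(n)
        ]
        x = matmul(x, correction)
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0], [2.0, 3.0]]
    for row in newton_schulz_inverse(a):
        print(row)
```

The iteration uses only matrix products, which is why it maps well onto hardware optimized for matrix multiplication. In an iterative solver, the previous iterate's inverse could in principle serve as a warm start, but that choice is not something the abstract specifies.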
-
Coordinated Deep Neural Networks: A Versatile Edge Offloading Algorithm
Authors:
Alireza Maleki,
Hamed Shah-Mansouri,
Babak H. Khalaj
Abstract:
As artificial intelligence (AI) applications continue to expand, there is a growing need for deep neural network (DNN) models. Although DNN models deployed at the edge are promising to provide AI as a service with low latency, their cooperation is yet to be explored. In this paper, we consider the case in which DNN service providers share their computing resources as well as their models' parameters and allow other DNNs to offload their computations without mirroring. We propose a novel algorithm called coordinated DNNs on edge (CoDE) that facilitates coordination among DNN services by creating multi-task DNNs out of individual models. CoDE aims to find the optimal path that results in the lowest possible cost, where the cost reflects the inference delay, model accuracy, and local computation workload. With CoDE, DNN models can make new paths for inference by using their own or other models' parameters. We then evaluate the performance of CoDE through numerical experiments. The results demonstrate a $75\%$ reduction in the local service computation workload while degrading the accuracy by only $2\%$ and having the same inference time in a balanced load condition. Under heavy load, CoDE can further decrease the inference time by $30\%$ while the accuracy is reduced by only $4\%$.
Submitted 31 December, 2023;
originally announced January 2024.
-
Who Are Tweeting About Academic Publications? A Cochrane Systematic Review and Meta-Analysis of Altmetric Studies
Authors:
Ashraf Maleki,
Kim Holmberg
Abstract:
Previous studies have developed different categorizations of Twitter users who interact with scientific publications online, reflecting the difficulty in creating a unified approach. Using Cochrane Review meta-analysis to analyse earlier research (including 79,014 Twitter users, over twenty million tweets, and over five million tweeted publications from 23 studies), we created a consolidated robust categorization consisting of 11 user categories, at different dimensions, covering most future needs for user categorizations on Twitter and possibly also other social media platforms. Our findings showed, with moderate certainty, covering all the earlier different approaches employed, that the predominant group on Twitter was individual users (66%), being responsible for the majority of tweets (55%) and tweeted publications (50%), while organizations (22%, 27%, and 28%, respectively) and science communicators (16%, 13%, and 30%) clearly contributed to a lesser degree. These individual users consisted of both academic individuals (33%) and other individuals (28%). While academic individuals shared more academic publications than other individuals (42% vs. 31%), they posted fewer tweets overall (22% vs. 30%), but these differences do not reach statistical significance. Despite significant heterogeneity arising from variations in earlier categorizations, the findings consistently indicate the importance of academics in disseminating academic publications on Twitter.
Submitted 14 May, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Heterogeneous Multi-core Array-based DNN Accelerator
Authors:
Mohammad Ali Maleki,
Mehdi Kamal,
Ali Afzali-Kusha
Abstract:
In this article, we investigate the impact of architectural parameters of array-based DNN accelerators on the accelerator's energy consumption and performance in a wide variety of network topologies. For this purpose, we have developed a tool that simulates the execution of neural networks on array-based accelerators and has the capability of testing different configurations for the estimation of energy consumption and processing latency. Based on our analysis of the behavior of benchmark networks under different architectural parameters, we offer a few recommendations for an efficient yet high-performance accelerator design. Next, we propose a heterogeneous multi-core chip scheme for deep neural network execution. The evaluations of a selective small search space indicate that the execution of neural networks on their near-optimal core configuration can save up to 36% and 67% of energy consumption and energy-delay product, respectively. Also, we suggest an algorithm to distribute the processing of a network's layers across multiple cores of the same type in order to speed up the computations through model parallelism. Evaluations on different networks and with different numbers of cores verify the effectiveness of the proposed algorithm in speeding up the processing to near-optimal values.
Submitted 25 June, 2022;
originally announced June 2022.
-
Towards Designing Optimal Sensing Matrices for Generalized Linear Inverse Problems
Authors:
Junjie Ma,
Ji Xu,
Arian Maleki
Abstract:
We consider an inverse problem $\mathbf{y}= f(\mathbf{Ax})$, where $\mathbf{x}\in\mathbb{R}^n$ is the signal of interest, $\mathbf{A}$ is the sensing matrix, $f$ is a nonlinear function and $\mathbf{y} \in \mathbb{R}^m$ is the measurement vector. In many applications, we have some level of freedom to design the sensing matrix $\mathbf{A}$, and in such circumstances we could optimize $\mathbf{A}$ to achieve better reconstruction performance. As a first step towards optimal design, it is important to understand the impact of the sensing matrix on the difficulty of recovering $\mathbf{x}$ from $\mathbf{y}$.
In this paper, we study the performance of one of the most successful recovery methods, i.e., the expectation propagation (EP) algorithm. We define a notion of spikiness for the spectrum of $\mathbf{A}$ and show the importance of this measure for the performance of EP. We show that whether a spikier spectrum can hurt or help the recovery performance depends on $f$. Based on our framework, we are able to show that, in phase-retrieval problems, matrices with spikier spectrums are better for EP, while in 1-bit compressed sensing problems, less spiky spectrums lead to better performance. Our results unify and substantially generalize existing results that compare Gaussian and orthogonal matrices, and provide a platform towards designing optimal sensing systems.
Submitted 19 August, 2023; v1 submitted 4 November, 2021;
originally announced November 2021.
-
A composable autoencoder-based iterative algorithm for accelerating numerical simulations
Authors:
Rishikesh Ranade,
Chris Hill,
Haiyang He,
Amir Maleki,
Norman Chang,
Jay Pathak
Abstract:
Numerical simulations for engineering applications solve partial differential equations (PDEs) to model various physical processes. Traditional PDE solvers are very accurate but computationally costly. On the other hand, Machine Learning (ML) methods offer a significant computational speedup but face challenges with accuracy and generalization to different PDE conditions, such as geometry, boundary conditions, initial conditions and PDE source terms. In this work, we propose a novel ML-based approach, CoAE-MLSim (Composable AutoEncoder Machine Learning Simulation), which is an unsupervised, lower-dimensional, local method that is motivated by key ideas used in commercial PDE solvers. This allows our approach to learn better with relatively fewer samples of PDE solutions. The proposed ML approach is compared against commercial solvers for better benchmarks as well as the latest ML approaches for solving PDEs. It is tested for a variety of complex engineering cases to demonstrate its computational speed, accuracy, scalability, and generalization across different PDE conditions. The results show that our approach captures physics accurately across all metrics of comparison (including measures such as results on section cuts and lines).
Submitted 7 October, 2021;
originally announced October 2021.
-
OVERT: An Algorithm for Safety Verification of Neural Network Control Policies for Nonlinear Systems
Authors:
Chelsea Sidrane,
Amir Maleki,
Ahmed Irfan,
Mykel J. Kochenderfer
Abstract:
Deep learning methods can be used to produce control policies, but certifying their safety is challenging. The resulting networks are nonlinear and often very large. In response to this challenge, we present OVERT: a sound algorithm for safety verification of nonlinear discrete-time closed loop dynamical systems with neural network control policies. The novelty of OVERT lies in combining ideas from the classical formal methods literature with ideas from the newer neural network verification literature. The central concept of OVERT is to abstract nonlinear functions with a set of optimally tight piecewise linear bounds. Such piecewise linear bounds are designed for seamless integration into ReLU neural network verification tools. OVERT can be used to prove bounded-time safety properties by either computing reachable sets or solving feasibility queries directly. We demonstrate various examples of safety verification for several classical benchmark examples. OVERT compares favorably to existing methods both in computation time and in tightness of the reachable set.
Submitted 2 August, 2021;
originally announced August 2021.
-
Compressed sensing in the presence of speckle noise
Authors:
Wenda Zhou,
Shirin Jalali,
Arian Maleki
Abstract:
The problem of recovering a structured signal from its linear measurements in the presence of speckle noise is studied. This problem appears in many imaging systems such as synthetic aperture radar and optical coherence tomography. The current acquisition technology oversamples signals and converts the problem into a denoising problem with multiplicative noise. However, this paper explores the possibility of reducing the number of measurements below the ambient dimension of the signal. The complications that arise in the study of multiplicative noise have so far impeded theoretical analysis of such problems. This paper aims to present the first theoretical result regarding the recovery of signals from their undersampled measurements under speckle noise. It is shown that if the signal class is structured, in the sense that the signals can be compressed efficiently, then one can obtain accurate estimates of the signal from fewer measurements than the ambient dimension. We demonstrate the effectiveness of the methods we propose through simulation results.
Submitted 31 July, 2021;
originally announced August 2021.
-
Geometry encoding for numerical simulations
Authors:
Amir Maleki,
Jan Heyse,
Rishikesh Ranade,
Haiyang He,
Priya Kasimbeg,
Jay Pathak
Abstract:
We present a notion of geometry encoding suitable for machine learning-based numerical simulation. In particular, we delineate how this notion of encoding is different from other encoding algorithms commonly used in other disciplines such as computer vision and computer graphics. We also present a model comprised of multiple neural networks including a processor, a compressor and an evaluator. These parts each satisfy a particular requirement of our encoding. We compare our encoding model with the analogous models in the literature.
Submitted 15 April, 2021;
originally announced April 2021.
-
A Latent space solver for PDE generalization
Authors:
Rishikesh Ranade,
Chris Hill,
Haiyang He,
Amir Maleki,
Jay Pathak
Abstract:
In this work we propose a hybrid solver to solve partial differential equations (PDEs) in the latent space. The solver uses an iterative inferencing strategy combined with solution initialization to improve generalization of PDE solutions. The solver is tested on an engineering case and the results show that it can generalize well to several PDE conditions.
Submitted 6 April, 2021;
originally announced April 2021.
-
Preference-based Learning of Reward Function Features
Authors:
Sydney M. Katz,
Amir Maleki,
Erdem Bıyık,
Mykel J. Kochenderfer
Abstract:
Preference-based learning of reward functions, where the reward function is learned using comparison data, has been well studied for complex robotic tasks such as autonomous driving. Existing algorithms have focused on learning reward functions that are linear in a set of trajectory features. The features are typically hand-coded, and preference-based learning is used to determine a particular user's relative weighting for each feature. Designing a representative set of features to encode reward is challenging and can result in inaccurate models that fail to model the users' preferences or perform the task properly. In this paper, we present a method to learn both the relative weighting among features as well as additional features that help encode a user's reward function. The additional features are modeled as a neural network that is trained on the data from pairwise comparison queries. We apply our methods to a driving scenario used in previous work and compare the predictive power of our method to that of only hand-coded features. We perform additional analysis to interpret the learned features and examine the optimal trajectories. Our results show that adding an additional learned feature to the reward model enhances both its predictive power and expressiveness, producing unique results for each user.
Submitted 3 March, 2021;
originally announced March 2021.
-
Mismatched Data Detection in Massive MU-MIMO
Authors:
Charles Jeon,
Arian Maleki,
Christoph Studer
Abstract:
We investigate mismatched data detection for massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems in which the prior distribution of the transmit signal used in the data detector differs from the true prior. In order to minimize the performance loss caused by the prior mismatch, we include a tuning stage into the recently proposed large-MIMO approximate message passing (LAMA) algorithm, which enables the development of data detectors with optimal as well as sub-optimal parameter tuning. We show that carefully-selected priors enable the design of simpler and computationally more efficient data detection algorithms compared to LAMA that uses the optimal prior, while achieving near-optimal error-rate performance. In particular, we demonstrate that a hardware-friendly approximation of the exact prior enables the design of low-complexity data detectors that achieve near individually-optimal performance. Furthermore, for Gaussian priors and uniform priors within a hypercube covering the quadrature amplitude modulation (QAM) constellation, our performance analysis recovers classical and recent results on linear and non-linear massive MU-MIMO data detection, respectively.
Submitted 18 October, 2021; v1 submitted 10 July, 2020;
originally announced July 2020.
-
Sharp Concentration Results for Heavy-Tailed Distributions
Authors:
Milad Bakhshizadeh,
Arian Maleki,
Victor H. de la Pena
Abstract:
We obtain concentration and large deviation for the sums of independent and identically distributed random variables with heavy-tailed distributions. Our concentration results are concerned with random variables whose distributions satisfy $\mathbb{P}(X>t) \leq {\rm e}^{- I(t)}$, where $I: \mathbb{R} \rightarrow \mathbb{R}$ is an increasing function and $I(t)/t \rightarrow α\in [0, \infty)$ as $t \rightarrow \infty$. Our main theorem can not only recover some of the existing results, such as the concentration of the sum of subWeibull random variables, but it can also produce new results for the sum of random variables with heavier tails. We show that the concentration inequalities we obtain are sharp enough to offer large deviation results for the sums of independent random variables as well. Our analyses which are based on standard truncation arguments simplify, unify and generalize the existing results on the concentration and large deviation of heavy-tailed random variables.
Submitted 25 July, 2022; v1 submitted 30 March, 2020;
originally announced March 2020.
-
Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions
Authors:
Kamiar Rahnama Rad,
Wenda Zhou,
Arian Maleki
Abstract:
We study the problem of out-of-sample risk estimation in the high dimensional regime where both the sample size $n$ and number of features $p$ are large, and $n/p$ can be less than one. Extensive empirical evidence confirms the accuracy of leave-one-out cross validation (LO) for out-of-sample risk estimation. Yet, a unifying theoretical evaluation of the accuracy of LO in high-dimensional problems has remained an open problem. This paper aims to fill this gap for penalized regression in the generalized linear family. With minor assumptions about the data generating process, and without any sparsity assumptions on the regression coefficients, our theoretical analysis obtains finite sample upper bounds on the expected squared error of LO in estimating the out-of-sample error. Our bounds show that the error goes to zero as $n,p \rightarrow \infty$, even when the dimension $p$ of the feature vectors is comparable with or greater than the sample size $n$. One technical advantage of the theory is that it can be used to clarify and connect some results from the recent literature on scalable approximate LO.
Submitted 3 March, 2020;
originally announced March 2020.
-
Does SLOPE outperform bridge regression?
Authors:
Shuaiwen Wang,
Haolei Weng,
Arian Maleki
Abstract:
A recently proposed SLOPE estimator (arXiv:1407.3824) has been shown to adaptively achieve the minimax $\ell_2$ estimation rate under high-dimensional sparse linear regression models (arXiv:1503.08393). Such minimax optimality holds in the regime where the sparsity level $k$, sample size $n$, and dimension $p$ satisfy $k/p \rightarrow 0$, $k\log p/n \rightarrow 0$. In this paper, we characterize the estimation error of SLOPE under the complementary regime where both $k$ and $n$ scale linearly with $p$, and provide new insights into the performance of SLOPE estimators. We first derive a concentration inequality for the finite sample mean square error (MSE) of SLOPE. The quantity that MSE concentrates around takes a complicated and implicit form. With delicate analysis of the quantity, we prove that among all SLOPE estimators, LASSO is optimal for estimating $k$-sparse parameter vectors that do not have tied non-zero components in the low noise scenario. On the other hand, in the large noise scenario, the family of SLOPE estimators are sub-optimal compared with bridge regression such as the Ridge estimator.
Submitted 22 September, 2021; v1 submitted 20 September, 2019;
originally announced September 2019.
-
A Configurable Memristor-based Finite Impulse Response Filter
Authors:
Mohammad Hemmati,
Vahid Rashtchi,
Ahmad Maleki,
Siroos Toofan
Abstract:
There are two main methods to implement FIR filters: software and hardware. In the software method, an FIR filter can be implemented within the processor by programming; this gives the design more configurability, but it uses a large amount of memory and is extremely time-consuming. In most hardware-based implementations of FIR filters, Analog-to-Digital (A/D) and Digital-to-Analog (D/A) converters are mandatory and increase the cost. The most important advantage of hardware implementation of an FIR filter is its higher speed compared to its software counterpart. In this work, considering the advantages of software and hardware approaches, a method to implement direct form FIR filters using analog components and memristors is proposed. Not only are the A/D and D/A converters omitted, but the use of memristors also provides configurability. A new circuit is presented to handle negative coefficients of the filter, and memristance values are calculated using a heuristic method in order to achieve better accuracy in setting coefficients. Moreover, an appropriate sample and delay topology is employed which overcomes the limitations of previous research in the implementation of high-order filters. Proper operation and usefulness of the proposed structures are all validated via simulation in Cadence.
Submitted 10 April, 2019;
originally announced April 2019.
-
Analysis of Spectral Methods for Phase Retrieval with Random Orthogonal Matrices
Authors:
Rishabh Dudeja,
Milad Bakhshizadeh,
Junjie Ma,
Arian Maleki
Abstract:
Phase retrieval refers to algorithmic methods for recovering a signal from its phaseless measurements. Local search algorithms that work directly on the non-convex formulation of the problem have been very popular recently. Due to the nonconvexity of the problem, the success of these local search algorithms depends heavily on their starting points. The most widely used initialization scheme is the spectral method, in which the leading eigenvector of a data-dependent matrix is used as a starting point. Recently, the performance of the spectral initialization was characterized accurately for measurement matrices with independent and identically distributed entries. This paper aims to obtain the same level of knowledge for isotropically random column-orthogonal matrices, which are substantially better models for practical phase retrieval systems. Towards this goal, we consider the asymptotic setting in which the number of measurements $m$, and the dimension of the signal, $n$, diverge to infinity with $m/n = δ\in(1,\infty)$, and obtain a simple expression for the overlap between the spectral estimator and the true signal vector.
Submitted 4 March, 2020; v1 submitted 6 March, 2019;
originally announced March 2019.
-
Spectral Method for Phase Retrieval: an Expectation Propagation Perspective
Authors:
Junjie Ma,
Rishabh Dudeja,
Ji Xu,
Arian Maleki,
Xiaodong Wang
Abstract:
Phase retrieval refers to the problem of recovering a signal $\mathbf{x}_{\star}\in\mathbb{C}^n$ from its phaseless measurements $y_i=|\mathbf{a}_i^{\mathrm{H}}\mathbf{x}_{\star}|$, where $\{\mathbf{a}_i\}_{i=1}^m$ are the measurement vectors. Many popular phase retrieval algorithms are based on the following two-step procedure: (i) initialize the algorithm based on a spectral method, (ii) refine the initial estimate by a local search algorithm (e.g., gradient descent). The quality of the spectral initialization step can have a major impact on the performance of the overall algorithm. In this paper, we focus on the model where the measurement matrix $\mathbf{A}=[\mathbf{a}_1,\ldots,\mathbf{a}_m]^{\mathrm{H}}$ has orthonormal columns, and study the spectral initialization under the asymptotic setting $m,n\to\infty$ with $m/n\to δ\in(1,\infty)$. We use the expectation propagation (EP) framework to characterize the performance of spectral initialization for Haar distributed matrices. Our numerical results confirm that the predictions of the EP method are accurate not only for Haar distributed matrices, but also for realistic Fourier-based models (e.g., the coded diffraction model). The main findings of this paper are the following:
(1) There exists a threshold on $δ$ (denoted as $δ_{\mathrm{weak}}$) below which the spectral method cannot produce a meaningful estimate. We show that $δ_{\mathrm{weak}}=2$ for the column-orthonormal model. In contrast, previous results by Mondelli and Montanari show that $δ_{\mathrm{weak}}=1$ for the i.i.d. Gaussian model.
(2) The optimal design for the spectral method coincides with that for the i.i.d. Gaussian model, where the latter was recently introduced by Luo, Alghamdi and Lu.
Submitted 9 September, 2020; v1 submitted 6 March, 2019;
originally announced March 2019.
-
Consistent Risk Estimation in Moderately High-Dimensional Linear Regression
Authors:
Ji Xu,
Arian Maleki,
Kamiar Rahnama Rad,
Daniel Hsu
Abstract:
Risk estimation is at the core of many learning systems. The importance of this problem has motivated researchers to propose different schemes, such as cross validation, generalized cross validation, and Bootstrap. The theoretical properties of such estimates have been extensively studied in the low-dimensional settings, where the number of predictors $p$ is much smaller than the number of observations $n$. However, a unifying methodology accompanied by a rigorous theory is lacking in high-dimensional settings. This paper studies the problem of risk estimation under the moderately high-dimensional asymptotic setting $n,p \rightarrow \infty$ and $n/p \rightarrow δ>1$ ($δ$ is a fixed number), and proves the consistency of three risk estimates that have been successful in numerical studies, i.e., leave-one-out cross validation (LOOCV), approximate leave-one-out (ALO), and approximate message passing (AMP)-based techniques. A cornerstone of our analysis is a bound that we obtain on the discrepancy of the `residuals' obtained from AMP and LOOCV. This connection not only enables us to obtain more refined information on the estimates of AMP, ALO, and LOOCV, but also offers an upper bound on the convergence rate of each estimate.
Submitted 18 January, 2021; v1 submitted 5 February, 2019;
originally announced February 2019.
-
Optimal Data Detection in Large MIMO
Authors:
Charles Jeon,
Ramina Ghods,
Arian Maleki,
Christoph Studer
Abstract:
Large multiple-input multiple-output (MIMO) appears in massive multi-user MIMO and randomly-spread code-division multiple access (CDMA)-based wireless systems. In order to cope with the excessively high complexity of optimal data detection in such systems, a variety of efficient yet sub-optimal algorithms have been proposed in the past. In this paper, we propose a data detection algorithm that is computationally efficient and optimal in the sense that it is able to achieve the same error-rate performance as the individually optimal (IO) data detector under certain assumptions on the MIMO system matrix and constellation alphabet. Our algorithm, which we refer to as LAMA (short for large MIMO AMP), builds on complex-valued Bayesian approximate message passing (AMP), which enables an exact analytical characterization of the performance and complexity in the large-system limit via the state-evolution framework. We derive optimality conditions for LAMA and investigate performance/complexity trade-offs. As a byproduct of our analysis, we recover classical results of IO data detection for randomly-spread CDMA. We furthermore provide practical ways for LAMA to approach the theoretical performance limits in realistic, finite-dimensional systems at low computational complexity.
Submitted 5 November, 2018;
originally announced November 2018.
-
Benefits of over-parameterization with EM
Authors:
Ji Xu,
Daniel Hsu,
Arian Maleki
Abstract:
Expectation Maximization (EM) is among the most popular algorithms for maximum likelihood estimation, but it is generally only guaranteed to find stationary points of the log-likelihood objective. The goal of this article is to present theoretical and empirical evidence that over-parameterization can help EM avoid spurious local optima in the log-likelihood. We consider the problem of estimating the mean vectors of a Gaussian mixture model in a scenario where the mixing weights are known. Our study shows that the global behavior of EM, when one uses an over-parameterized model in which the mixing weights are treated as unknown, is better than that when one uses the (correct) model with the mixing weights fixed to the known values. For symmetric Gaussian mixtures with two components, we prove that introducing the (statistically redundant) weight parameters enables EM to find the global maximizer of the log-likelihood starting from almost any initial mean parameters, whereas EM without this over-parameterization may very often fail. For other Gaussian mixtures, we provide empirical evidence that shows similar behavior. Our results corroborate the value of over-parameterization in solving non-convex optimization problems, previously observed in other domains.
Submitted 26 October, 2018;
originally announced October 2018.
-
Approximate Leave-One-Out for High-Dimensional Non-Differentiable Learning Problems
Authors:
Shuaiwen Wang,
Wenda Zhou,
Arian Maleki,
Haihao Lu,
Vahab Mirrokni
Abstract:
Consider the following class of learning schemes: \begin{equation} \label{eq:main-problem1}
\hat{\boldsymbolβ} := \underset{\boldsymbolβ \in \mathcal{C}}{\arg\min} \;\sum_{j=1}^n \ell(\boldsymbol{x}_j^\top\boldsymbolβ; y_j) + λR(\boldsymbolβ), \qquad \qquad \qquad (1) \end{equation} where $\boldsymbol{x}_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i^{\rm th}$ feature and response variable respectively. Let $\ell$ and $R$ be the convex loss function and regularizer, $\boldsymbolβ$ denote the unknown weights, and $λ$ be a regularization parameter. $\mathcal{C} \subset \mathbb{R}^{p}$ is a closed convex set. Finding the optimal choice of $λ$ is a challenging problem in high-dimensional regimes where both $n$ and $p$ are large. We propose three frameworks to obtain a computationally efficient approximation of the leave-one-out cross validation (LOOCV) risk for nonsmooth losses and regularizers. Our three frameworks are based on the primal, dual, and proximal formulations of (1). Each framework shows its strength in certain types of problems. We prove the equivalence of the three approaches under smoothness conditions. This equivalence enables us to justify the accuracy of the three methods under such conditions. We use our approaches to obtain a risk estimate for several standard problems, including generalized LASSO, nuclear norm regularization, and support vector machines. We empirically demonstrate the effectiveness of our results for non-differentiable cases.
Submitted 4 October, 2018;
originally announced October 2018.
-
A strategic framework for identifying the critical factors of 4G technology diffusion in I.R. Iran - A Fuzzy DEMATEL approach
Authors:
Hossein Sabzian,
Hossein Gharib,
Seyyed Mostafa Seyyed Hashemi,
Ali Maleki
Abstract:
As the most prominent representative of 4G, Long Term Evolution (LTE) technology has become a focal point for mobile network operators all over the world. However, although the main Iranian operators, such as MCI and Irancell, have invested heavily in deploying this technology, its diffusion has been very slow, with a penetration rate of 0.06 at the end of spring 2017. If this rate does not increase, it will have negative unintended consequences for telecom operators, such as (I) failure to provide a large number of high-quality services, (II) inability to compete with OTT technologies, (III) loss of many revenue opportunities, (IV) prolongation of the payback period, and (V) lack of technological integrability with fifth-generation (5G) networks and loss of many IoT opportunities. By discussing the literature on technology adoption and diffusion both generally and specifically, identifying the major limitations of these studies, and establishing a comprehensive factor set based on four major groups, namely (I) mobile handset- and operator-related factors, (II) subscriber-related biological factors, (III) subscriber-related perceptual factors, and (IV) subscriber-related contextual factors, a novel fuzzy DEMATEL model has been developed. With this model, ICT policy makers can not only gain a clear understanding of the factors influencing technology adoption but also identify the critical success factors (CSFs) influencing Iranians' mindsets towards LTE adoption. They can therefore make effective and actionable policies to scale up the diffusion of LTE or other ICT-related technologies throughout society.
Submitted 10 July, 2018;
originally announced July 2018.
-
Approximate Leave-One-Out for Fast Parameter Tuning in High Dimensions
Authors:
Shuaiwen Wang,
Wenda Zhou,
Haihao Lu,
Arian Maleki,
Vahab Mirrokni
Abstract:
Consider the following class of learning schemes: $$\hat{\boldsymbolβ} := \arg\min_{\boldsymbolβ}\;\sum_{j=1}^n \ell(\boldsymbol{x}_j^\top\boldsymbolβ; y_j) + λR(\boldsymbolβ),\qquad\qquad (1) $$ where $\boldsymbol{x}_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i^{\text{th}}$ feature and response variable respectively. Let $\ell$ and $R$ be the loss function and regularizer, $\boldsymbolβ$ denote the unknown weights, and $λ$ be a regularization parameter. Finding the optimal choice of $λ$ is a challenging problem in high-dimensional regimes where both $n$ and $p$ are large. We propose two frameworks to obtain a computationally efficient approximation ALO of the leave-one-out cross validation (LOOCV) risk for nonsmooth losses and regularizers. Our two frameworks are based on the primal and dual formulations of (1). We prove the equivalence of the two approaches under smoothness conditions. This equivalence enables us to justify the accuracy of both methods under such conditions. We use our approaches to obtain a risk estimate for several standard problems, including generalized LASSO, nuclear norm regularization, and support vector machines. We empirically demonstrate the effectiveness of our results for non-differentiable cases.
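As a rough illustration of why leave-one-out risk can sometimes be obtained from a single fit, the sketch below treats only the smooth special case of problem (1) with squared loss and $R(β)=β^2$ (ridge) for a single predictor. In that case the leave-one-out residual equals the full-fit residual divided by $1 - H_{ii}$, so the shortcut is exact rather than approximate. This is not the paper's method for nonsmooth losses and regularizers; the function names (`ridge_1d`, `loo_brute_force`, `loo_shortcut`) and the synthetic data are assumptions made for illustration.

```python
import random


def ridge_1d(xs, ys, lam):
    """Ridge estimate for a single predictor without intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def loo_brute_force(xs, ys, lam):
    """Refit n times, each time leaving one observation out."""
    residuals = []
    for i in range(len(xs)):
        beta_i = ridge_1d(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        residuals.append(ys[i] - xs[i] * beta_i)
    return residuals


def loo_shortcut(xs, ys, lam):
    """Fit once, then rescale each residual by 1 / (1 - H_ii)."""
    beta = ridge_1d(xs, ys, lam)
    denom = sum(x * x for x in xs) + lam
    # H_ii = x_i^2 / (sum_j x_j^2 + lam) is the i-th leverage for this model.
    return [(y - x * beta) / (1.0 - x * x / denom) for x, y in zip(xs, ys)]


# Synthetic data (hypothetical): y = 1.5 x + noise.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
ys = [1.5 * x + rng.gauss(0.0, 0.5) for x in xs]

brute = loo_brute_force(xs, ys, lam=2.0)
fast = loo_shortcut(xs, ys, lam=2.0)
max_gap = max(abs(a - b) for a, b in zip(brute, fast))
loo_risk = sum(r * r for r in fast) / len(fast)  # LOOCV estimate of squared error
```

Here `max_gap` measures the disagreement between the $n$ refits and the single-fit shortcut; for this smooth toy case it is zero up to floating-point rounding.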
Submitted 7 July, 2018;
originally announced July 2018.
-
Approximate Message Passing for Amplitude Based Optimization
Authors:
Junjie Ma,
Ji Xu,
Arian Maleki
Abstract:
We consider an $\ell_2$-regularized non-convex optimization problem for recovering signals from their noisy phaseless observations. We design and study the performance of a message passing algorithm that aims to solve this optimization problem. We consider the asymptotic setting $m,n \rightarrow \infty$, $m/n \rightarrow δ$ and obtain sharp performance bounds, where $m$ is the number of measurements and $n$ is the signal dimension. We show that for complex signals the algorithm can perform accurate recovery with only $m=\left ( \frac{64}{π^2}-4\right)n\approx 2.5n$ measurements. Also, we provide sharp analysis on the sensitivity of the algorithm to noise. We highlight the following facts about our message passing algorithm: (i) Adding $\ell_2$ regularization to the non-convex loss function can be beneficial even in the noiseless setting; (ii) spectral initialization has marginal impact on the performance of the algorithm.
Submitted 8 June, 2018;
originally announced June 2018.
-
Optimization-based AMP for Phase Retrieval: The Impact of Initialization and $\ell_2$-regularization
Authors:
Junjie Ma,
Ji Xu,
Arian Maleki
Abstract:
We consider an $\ell_2$-regularized non-convex optimization problem for recovering signals from their noisy phaseless observations. We design and study the performance of a message passing algorithm that aims to solve this optimization problem. We consider the asymptotic setting $m,n \rightarrow \infty$, $m/n \rightarrow δ$ and obtain sharp performance bounds, where $m$ is the number of measurements and $n$ is the signal dimension. We show that for complex signals the algorithm can perform accurate recovery with only $m= \left(\frac{64}{π^2}-4\right)n \approx 2.5n$ measurements. Also, we provide a sharp analysis of the sensitivity of the algorithm to noise. We highlight the following facts about our message passing algorithm: (i) Adding $\ell_2$ regularization to the non-convex loss function can be beneficial. (ii) Spectral initialization has marginal impact on the performance of the algorithm. The sharp analyses in this paper not only enable us to compare the performance of our method with other phase recovery schemes, but also shed light on designing better iterative algorithms for other non-convex optimization problems.
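The measurement count quoted above is simple to check numerically: $64/π^2 - 4$ is about 2.48, which the abstract rounds to 2.5. The helper below is only a sketch of that arithmetic; the name `measurements_needed` and the example dimension are hypothetical.

```python
import math


def measurements_needed(n):
    """Measurement count m = (64 / pi^2 - 4) * n quoted in the abstract."""
    return (64.0 / math.pi ** 2 - 4.0) * n


ratio = measurements_needed(1)        # measurements per signal dimension
m_example = measurements_needed(1000) # hypothetical signal dimension n = 1000
```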
Submitted 24 February, 2018; v1 submitted 3 January, 2018;
originally announced January 2018.
-
Using Black-box Compression Algorithms for Phase Retrieval
Authors:
Milad Bakhshizadeh,
Arian Maleki,
Shirin Jalali
Abstract:
Compressive phase retrieval refers to the problem of recovering a structured $n$-dimensional complex-valued vector from its phase-less under-determined linear measurements. The non-linearity of measurements makes designing theoretically-analyzable efficient phase retrieval algorithms challenging. As a result, to a great extent, algorithms designed in this area are developed to take advantage of simple structures such as sparsity and its convex generalizations. The goal of this paper is to move beyond simple models through employing compression codes. Such codes are typically developed to take advantage of complex signal models to represent the signals as efficiently as possible. In this work, it is shown how an existing compression code can be treated as a black box and integrated into an efficient solution for phase retrieval. First, COmpressive PhasE Retrieval (COPER) optimization, a computationally-intensive compression-based phase retrieval method, is proposed. COPER provides a theoretical framework for studying compression-based phase retrieval. The number of measurements required by COPER is connected to $κ$, the $α$-dimension (closely related to the rate-distortion dimension) of the given family of compression codes. To find the solution of COPER, an efficient iterative algorithm called gradient descent for COPER (GD-COPER) is proposed. It is proven that under some mild conditions on the initialization, if the number of measurements is larger than $C κ^2 \log^2 n$, where $C$ is a constant, GD-COPER obtains an accurate estimate of the input vector in polynomial time. In the simulation results, JPEG2000 is integrated in GD-COPER to confirm the superb performance of the resulting algorithm on real-world images.
Submitted 8 June, 2020; v1 submitted 8 December, 2017;
originally announced December 2017.
-
VLSI Design of a Nonparametric Equalizer for Massive MU-MIMO
Authors:
Charles Jeon,
Gulnar Mirza,
Ramina Ghods,
Arian Maleki,
Christoph Studer
Abstract:
Linear minimum mean-square error (L-MMSE) equalization is among the most popular methods for data detection in massive multi-user multiple-input multiple-output (MU-MIMO) wireless systems. While L-MMSE equalization enables near-optimal spectral efficiency, accurate knowledge of the signal and noise powers is necessary. Furthermore, corresponding VLSI designs must solve linear systems of equations, which requires high arithmetic precision, exhibits stringent data dependencies, and results in high circuit complexity. This paper proposes the first VLSI design of the NOnParametric Equalizer (NOPE), which avoids knowledge of the transmit signal and noise powers, provably delivers the performance of L-MMSE equalization for massive MU-MIMO systems, and is resilient to numerous system and hardware impairments due to its parameter-free nature. Moreover, NOPE avoids computation of a matrix inverse and only requires hardware-friendly matrix-vector multiplications. To showcase the practical advantages of NOPE, we propose a parallel VLSI architecture and provide synthesis results in 28nm CMOS. We demonstrate that NOPE performs on par with existing data detectors for massive MU-MIMO that require accurate knowledge of the signal and noise powers.
Submitted 28 November, 2017;
originally announced November 2017.
-
Which bridge estimator is optimal for variable selection?
Authors:
Shuaiwen Wang,
Haolei Weng,
Arian Maleki
Abstract:
We study the problem of variable selection for linear models under the high-dimensional asymptotic setting, where the number of observations $n$ grows at the same rate as the number of predictors $p$. We consider two-stage variable selection techniques (TVS) in which the first stage uses bridge estimators to obtain an estimate of the regression coefficients, and the second stage simply thresholds this estimate to select the "important" predictors. The asymptotic false discovery proportion (AFDP) and true positive proportion (ATPP) of these TVS are evaluated. We prove that for a fixed ATPP, in order to obtain a smaller AFDP, one should pick a bridge estimator with smaller asymptotic mean square error in the first stage of TVS. Based on such principled discovery, we present a sharp comparison of different TVS, via an in-depth investigation of the estimation properties of bridge estimators. Rather than "order-wise" error bounds with loose constants, our analysis focuses on precise error characterization. Various interesting signal-to-noise ratio and sparsity settings are studied. Our results offer new and thorough insights into high-dimensional variable selection. For instance, we prove that a TVS with Ridge in its first stage outperforms TVS with other bridge estimators in large noise settings; two-stage LASSO becomes inferior when the signal is rare and weak. As a by-product, we show that two-stage methods outperform some standard variable selection techniques, such as LASSO and Sure Independence Screening, under certain conditions.
Submitted 25 March, 2020; v1 submitted 24 May, 2017;
originally announced May 2017.
-
Low noise sensitivity analysis of Lq-minimization in oversampled systems
Authors:
Haolei Weng,
Arian Maleki
Abstract:
The class of Lq-regularized least squares (LQLS) is considered for estimating a p-dimensional vector $\boldsymbol{\beta}$ from its n noisy linear observations $y = X\boldsymbol{\beta}+w$. The performance of these schemes is studied under the high-dimensional asymptotic setting in which p grows linearly with n. In this asymptotic setting, phase transition diagrams (PT) are often used for comparing the performance of different estimators. Although phase transition analysis is shown to provide useful information for compressed sensing, the fact that it ignores the measurement noise not only limits its applicability in many application areas, but also may lead to misunderstandings. For instance, consider a linear regression problem in which n > p and the signal is not exactly sparse. If the measurement noise is ignored in such systems, regularization techniques, such as LQLS, seem to be irrelevant since even the ordinary least squares (OLS) returns the exact solution. However, it is well-known that if n is not much larger than p then the regularization techniques improve the performance of OLS. In response to this limitation of PT analysis, we consider the low-noise sensitivity analysis. We show that this analysis framework (i) reveals the advantage of LQLS over OLS, (ii) captures the difference between different LQLS estimators even when n > p, and (iii) provides a fair comparison among different estimators in high signal-to-noise ratios. As an application of this framework, we will show that under mild conditions LASSO outperforms other LQLS even when the signal is dense. Finally, by a simple transformation we connect our low-noise sensitivity framework to the classical asymptotic regime in which n/p goes to infinity and characterize how and when regularization techniques offer improvements over ordinary least squares, and which regularizer gives the most improvement when the sample size is large.
Submitted 18 February, 2018; v1 submitted 9 May, 2017;
originally announced May 2017.
-
Optimally-Tuned Nonparametric Linear Equalization for Massive MU-MIMO Systems
Authors:
Ramina Ghods,
Charles Jeon,
Gulnar Mirza,
Arian Maleki,
Christoph Studer
Abstract:
This paper deals with linear equalization in massive multi-user multiple-input multiple-output (MU-MIMO) wireless systems. We first provide simple conditions on the antenna configuration for which the well-known linear minimum mean-square error (L-MMSE) equalizer provides near-optimal spectral efficiency, and we analyze its performance in the presence of parameter mismatches in the signal and/or noise powers. We then propose a novel, optimally-tuned NOnParametric Equalizer (NOPE) for massive MU-MIMO systems, which avoids knowledge of the transmit signal and noise powers altogether. We show that NOPE achieves the same performance as that of the L-MMSE equalizer in the large-antenna limit, and we demonstrate its efficacy in realistic, finite-dimensional systems. From a practical perspective, NOPE is computationally efficient and avoids dedicated training that is typically required for parameter estimation.
Submitted 8 May, 2017;
originally announced May 2017.
-
An efficient algorithm for compression-based compressed sensing
Authors:
Sajjad Beygi,
Shirin Jalali,
Arian Maleki,
Urbashi Mitra
Abstract:
Modern image and video compression codes employ elaborate structures existing in such signals to encode them into a small number of bits. Compressed sensing recovery algorithms, on the other hand, use such signals' structures to recover them from few linear observations. Despite the steady progress in the field of compressed sensing, structures that are often used for signal recovery are still much simpler than those employed by state-of-the-art compression codes. The main goal of this paper is to bridge this gap by answering the following question: Can one employ a given compression code to build an efficient (polynomial time) compressed sensing recovery algorithm? In response to this question, the compression-based gradient descent (C-GD) algorithm is proposed. C-GD, which is a low-complexity iterative algorithm, is able to employ a generic compression code for compressed sensing and therefore elevates the scope of structures used in compressed sensing to those used by compression codes. The convergence performance of C-GD and its required number of measurements in terms of the rate-distortion performance of the compression code are theoretically analyzed. It is also shown that C-GD is robust to additive white Gaussian noise. Finally, the presented simulation results show that combining C-GD with commercial image compression codes such as JPEG2000 yields state-of-the-art performance in imaging applications.
Submitted 6 April, 2017;
originally announced April 2017.
-
On the Gaussianity of Kolmogorov Complexity of Mixing Sequences
Authors:
Morgane Austern,
Arian Maleki
Abstract:
Let $K(X_1, \ldots, X_n)$ and $H(X_n | X_{n-1}, \ldots, X_1)$ denote the Kolmogorov complexity and Shannon's entropy rate of a stationary and ergodic process $\{X_i\}_{i=-\infty}^\infty$. It has been proved that \[ \frac{K(X_1, \ldots, X_n)}{n} - H(X_n | X_{n-1}, \ldots, X_1) \rightarrow 0, \] almost surely. This paper studies the convergence rate of this asymptotic result. In particular, we show that if the process satisfies certain mixing conditions, then there exists $σ<\infty$ such that $$\sqrt{n}\left(\frac{K(X_{1:n})}{n}- H(X_0|X_{-1},\dots,X_{-\infty})\right) \rightarrow_d N(0,σ^2).$$ Furthermore, we show that under slightly stronger mixing conditions one may obtain non-asymptotic concentration bounds for the Kolmogorov complexity.
Submitted 4 February, 2017;
originally announced February 2017.
-
Global analysis of Expectation Maximization for mixtures of two Gaussians
Authors:
Ji Xu,
Daniel Hsu,
Arian Maleki
Abstract:
Expectation Maximization (EM) is among the most popular algorithms for estimating parameters of statistical models. However, EM, which is an iterative algorithm based on the maximum likelihood principle, is generally only guaranteed to find stationary points of the likelihood objective, and these points may be far from any maximizer. This article addresses this disconnect between the statistical principles behind EM and its algorithmic properties. Specifically, it provides a global analysis of EM for specific models in which the observations comprise an i.i.d. sample from a mixture of two Gaussians. This is achieved by (i) studying the sequence of parameters from idealized execution of EM in the infinite sample limit, and fully characterizing the limit points of the sequence in terms of the initial parameters; and then (ii) based on this convergence analysis, establishing statistical consistency (or lack thereof) for the actual sequence of parameters produced by EM.
Submitted 26 August, 2016;
originally announced August 2016.
-
New approach to Bayesian high-dimensional linear regression
Authors:
Shirin Jalali,
Arian Maleki
Abstract:
Consider the problem of estimating parameters $X^n \in \mathbb{R}^n $, generated by a stationary process, from $m$ response variables $Y^m = AX^n+Z^m$, under the assumption that the distribution of $X^n$ is known. This is the most general version of the Bayesian linear regression problem. The lack of computationally feasible algorithms that can employ generic prior distributions and provide a good estimate of $X^n$ has limited the set of distributions researchers use to model the data. In this paper, a new scheme called Q-MAP is proposed. The new method has the following properties: (i) It has similarities to the popular MAP estimation under the noiseless setting. (ii) In the noiseless setting, it achieves the "asymptotically optimal performance" when $X^n$ has independent and identically distributed components. (iii) It scales favorably with the dimensions of the problem and therefore is applicable to high-dimensional setups. (iv) The solution of the Q-MAP optimization can be found via a proposed iterative algorithm which is provably robust to the error (noise) in the response variables.
Submitted 6 April, 2017; v1 submitted 9 July, 2016;
originally announced July 2016.
-
On the Performance of Mismatched Data Detection in Large MIMO Systems
Authors:
Charles Jeon,
Arian Maleki,
Christoph Studer
Abstract:
We investigate the performance of mismatched data detection in large multiple-input multiple-output (MIMO) systems, where the prior distribution of the transmit signal used in the data detector differs from the true prior. To minimize the performance loss caused by this prior mismatch, we include a tuning stage into our recently-proposed large MIMO approximate message passing (LAMA) algorithm, which allows us to develop mismatched LAMA algorithms with optimal as well as sub-optimal tuning. We show that carefully-selected priors often enable simpler and computationally more efficient algorithms compared to LAMA with the true prior while achieving near-optimal performance. A performance analysis of our algorithms for a Gaussian prior and a uniform prior within a hypercube covering the QAM constellation recovers classical and recent results on linear and non-linear MIMO data detection, respectively.
Submitted 22 June, 2016; v1 submitted 8 May, 2016;
originally announced May 2016.
-
Overcoming The Limitations of Phase Transition by Higher Order Analysis of Regularization Techniques
Authors:
Haolei Weng,
Arian Maleki,
Le Zheng
Abstract:
We study the problem of estimating $β\in \mathbb{R}^p$ from its noisy linear observations $y= Xβ+ w$, where $w \sim N(0, σ_w^2 I_{n\times n})$, under the following high-dimensional asymptotic regime: given a fixed number $δ$, $p \rightarrow \infty$, while $n/p \rightarrow δ$. We consider the popular class of $\ell_q$-regularized least squares (LQLS) estimators, a.k.a. bridge, given by the optimization problem: \begin{equation*} \hatβ (λ, q ) \in \arg\min_β\frac{1}{2} \|y-Xβ\|_2^2+ λ\|β\|_q^q, \end{equation*} and characterize the almost sure limit of $\frac{1}{p} \|\hatβ (λ, q )- β\|_2^2$. The expression we derive for this limit does not have an explicit form and hence is not useful in comparing different algorithms, or in evaluating the effect of $δ$ or the sparsity level of $β$. To simplify the expressions, researchers have considered the ideal "no-noise" regime and have characterized the values of $δ$ for which the almost sure limit is zero. This is known as the phase transition analysis.
In this paper, we first perform the phase transition analysis of LQLS. Our results reveal some of the limitations and misleading features of the phase transition analysis. To overcome these limitations, we propose the study of these algorithms under the low noise regime. Our new analysis framework not only sheds light on the results of the phase transition analysis, but also makes an accurate comparison of different regularizers possible.
Submitted 20 October, 2017; v1 submitted 23 March, 2016;
originally announced March 2016.
-
Consistent Parameter Estimation for LASSO and Approximate Message Passing
Authors:
Ali Mousavi,
Arian Maleki,
Richard G. Baraniuk
Abstract:
We consider the problem of recovering a vector $β_o \in \mathbb{R}^p$ from $n$ random and noisy linear observations $y= Xβ_o + w$, where $X$ is the measurement matrix and $w$ is noise. The LASSO estimate is given by the solution to the optimization problem $\hatβ_λ = \arg \min_β \frac{1}{2} \|y-Xβ\|_2^2 + λ\| β\|_1$. Among the iterative algorithms that have been proposed for solving this optimization problem, approximate message passing (AMP) has attracted attention for its fast convergence. Despite significant progress in the theoretical analysis of the estimates of LASSO and AMP, little is known about their behavior as a function of the regularization parameter $λ$, or the threshold parameters $τ^t$. For instance, the following basic questions have not yet been studied in the literature: (i) How does the size of the active set $\|\hatβ_λ\|_0/p$ behave as a function of $λ$? (ii) How does the mean square error $\|\hatβ_λ - β_o\|_2^2/p$ behave as a function of $λ$? (iii) How does $\|β^t - β_o \|_2^2/p$ behave as a function of $τ^1, \ldots, τ^{t-1}$? Answering these questions will help in addressing practical challenges regarding the optimal tuning of $λ$ or $τ^1, τ^2, \ldots$. This paper answers these questions in the asymptotic setting and shows how these results can be employed in deriving simple and theoretically optimal approaches for tuning the parameters $τ^1, \ldots, τ^t$ for AMP or $λ$ for LASSO. It also explores the connection between the optimal tuning of the parameters of AMP and the optimal tuning of LASSO.
Submitted 4 November, 2015; v1 submitted 3 November, 2015;
originally announced November 2015.
-
Optimal Large-MIMO Data Detection with Transmit Impairments
Authors:
Ramina Ghods,
Charles Jeon,
Arian Maleki,
Christoph Studer
Abstract:
Real-world transceiver designs for multiple-input multiple-output (MIMO) wireless communication systems are affected by a number of hardware impairments that already appear at the transmit side, such as amplifier non-linearities, quantization artifacts, and phase noise. While such transmit-side impairments are routinely ignored in the data-detection literature, they often limit reliable communication in practical systems. In this paper, we present a novel data-detection algorithm, referred to as large-MIMO approximate message passing with transmit impairments (short LAMA-I), which takes into account a broad range of transmit-side impairments in wireless systems with a large number of transmit and receive antennas. We provide conditions in the large-system limit for which LAMA-I achieves the error-rate performance of the individually-optimal (IO) data detector. We furthermore demonstrate that LAMA-I achieves near-IO performance at low computational complexity in realistic, finite dimensional large-MIMO systems.
Submitted 20 October, 2015;
originally announced October 2015.
-
Optimality of Large MIMO Detection via Approximate Message Passing
Authors:
Charles Jeon,
Ramina Ghods,
Arian Maleki,
Christoph Studer
Abstract:
Optimal data detection in multiple-input multiple-output (MIMO) communication systems with a large number of antennas at both ends of the wireless link entails prohibitive computational complexity. In order to reduce the computational complexity, a variety of sub-optimal detection algorithms have been proposed in the literature. In this paper, we analyze the optimality of a novel data-detection method for large MIMO systems that relies on approximate message passing (AMP). We show that our algorithm, referred to as individually-optimal (IO) large-MIMO AMP (short IO-LAMA), is able to perform IO data detection given certain conditions on the MIMO system and the constellation set (e.g., QAM or PSK) are met.
Submitted 20 October, 2015;
originally announced October 2015.
-
Does $\ell_p$-minimization outperform $\ell_1$-minimization?
Authors:
Le Zheng,
Arian Maleki,
Haolei Weng,
Xiaodong Wang,
Teng Long
Abstract:
In many application areas we are faced with the following question: Can we recover a sparse vector $x_o \in \mathbb{R}^N$ from its undersampled set of noisy observations $y \in \mathbb{R}^n$, $y=A x_o+w$? The last decade has witnessed a surge of algorithms and theoretical results addressing this question. One of the most popular algorithms is the $\ell_p$-regularized least squares (LPLS) given by the following formulation: \[ \hat{x}(γ,p )\in \arg\min_x \frac{1}{2}\|y - Ax\|_2^2+γ\|x\|_p^p, \] where $p \in [0,1]$. Despite the non-convexity of these problems for $p<1$, they are still appealing because of the following folklore theorems in compressed sensing: (i) $\hat{x}(γ,p )$ is closer to $x_o$ than $\hat{x}(γ,1)$. (ii) If we employ iterative methods that aim to converge to a local minimum of LPLS, then under good initialization these algorithms converge to a solution that is closer to $x_o$ than $\hat{x}(γ,1)$. In spite of the existence of plenty of empirical results that support these folklore theorems, the theoretical progress to establish them has been very limited.
This paper aims to study the above folklore theorems and establish their scope of validity. Starting with the approximate message passing (AMP) algorithm as a heuristic method for solving LPLS, we study the impact of initialization on the performance of AMP. Then, we employ the replica analysis to show the connection between the solution of AMP and $\hat{x}(γ, p)$ in the asymptotic settings. This enables us to compare the accuracy of $\hat{x}(γ,p)$ for $p \in [0,1]$. In particular, we will characterize the phase transition and noise sensitivity of LPLS for every $0\leq p\leq 1$ accurately. Our results in the noiseless setting confirm that LPLS exhibits the same phase transition for every $0\leq p <1$ and this phase transition is much higher than that of LASSO.
Submitted 10 June, 2016; v1 submitted 15 January, 2015;
originally announced January 2015.
-
From Denoising to Compressed Sensing
Authors:
Christopher A. Metzler,
Arian Maleki,
Richard G. Baraniuk
Abstract:
A denoising algorithm seeks to remove noise, errors, or perturbations from a signal. Extensive research has been devoted to this arena over the last several decades, and as a result, today's denoisers can effectively remove large amounts of additive white Gaussian noise. A compressed sensing (CS) reconstruction algorithm seeks to recover a structured signal acquired using a small number of randomized measurements. Typical CS reconstruction algorithms can be cast as iteratively estimating a signal from a perturbed observation. This paper answers a natural question: How can one effectively employ a generic denoiser in a CS reconstruction algorithm? In response, we develop an extension of the approximate message passing (AMP) framework, called Denoising-based AMP (D-AMP), that can integrate a wide class of denoisers within its iterations. We demonstrate that, when used with a high performance denoiser for natural images, D-AMP offers state-of-the-art CS recovery performance while operating tens of times faster than competing methods. We explain the exceptional performance of D-AMP by analyzing some of its theoretical features. A key element in D-AMP is the use of an appropriate Onsager correction term in its iterations, which coerces the signal perturbation at each iteration to be very close to the white Gaussian noise that denoisers are typically designed to remove.
Submitted 17 April, 2016; v1 submitted 16 June, 2014;
originally announced June 2014.
-
Parameterless Optimal Approximate Message Passing
Authors:
Ali Mousavi,
Arian Maleki,
Richard G. Baraniuk
Abstract:
Iterative thresholding algorithms are well-suited for high-dimensional problems in sparse recovery and compressive sensing. The performance of this class of algorithms depends heavily on the tuning of certain threshold parameters. In particular, both the final reconstruction error and the convergence rate of the algorithm crucially rely on how the threshold parameter is set at each step of the algorithm. In this paper, we propose a parameter-free approximate message passing (AMP) algorithm that sets the threshold parameter at each iteration in a fully automatic way, without either having any information about the signal to be reconstructed or needing any tuning from the user. We show that the proposed method attains both the minimum reconstruction error and the highest convergence rate. Our method is based on applying the Stein unbiased risk estimate (SURE) along with a modified gradient descent to find the optimal threshold in each iteration. Motivated by the connections between AMP and LASSO, it could be employed to find the solution of the LASSO for the optimal regularization parameter. To the best of our knowledge, this is the first work concerning parameter tuning that obtains the fastest convergence rate with theoretical guarantees.
Submitted 31 October, 2013;
originally announced November 2013.
-
Asymptotic Analysis of LASSOs Solution Path with Implications for Approximate Message Passing
Authors:
Ali Mousavi,
Arian Maleki,
Richard G. Baraniuk
Abstract:
This paper concerns the performance of the LASSO (also known as basis pursuit denoising) for recovering sparse signals from undersampled, randomized, noisy measurements. We consider the recovery of the signal $x_o \in \mathbb{R}^N$ from $n$ random and noisy linear observations $y= Ax_o + w$, where $A$ is the measurement matrix and $w$ is the noise. The LASSO estimate is given by the solution to the optimization problem $\hat{x}_λ = \arg \min_x \frac{1}{2} \|y-Ax\|_2^2 + λ\|x\|_1$. Despite major progress in the theoretical analysis of the LASSO solution, little is known about its behavior as a function of the regularization parameter $λ$. In this paper we study two questions in the asymptotic setting (i.e., where $N \rightarrow \infty$, $n \rightarrow \infty$ while the ratio $n/N$ converges to a fixed number in $(0,1)$): (i) How does the size of the active set $\|\hat{x}_λ\|_0/N$ behave as a function of $λ$, and (ii) How does the mean square error $\|\hat{x}_λ - x_o\|_2^2/N$ behave as a function of $λ$? We then employ these results in a new, reliable algorithm for solving LASSO based on approximate message passing (AMP).
Submitted 23 September, 2013;
originally announced September 2013.
-
Maximin Analysis of Message Passing Algorithms for Recovering Block Sparse Signals
Authors:
Armeen Taeb,
Arian Maleki,
Christoph Studer,
Richard Baraniuk
Abstract:
We consider the problem of recovering a block (or group) sparse signal from an underdetermined set of random linear measurements, which appear in compressed sensing applications such as radar and imaging. Recent results of Donoho, Johnstone, and Montanari have shown that approximate message passing (AMP) in combination with Stein's shrinkage outperforms group LASSO for large block sizes. In this paper, we prove that, for a fixed block size and in the strong undersampling regime (i.e., having very few measurements compared to the ambient dimension), AMP cannot improve upon group LASSO, thereby complementing the results of Donoho et al.
Submitted 10 March, 2013;
originally announced March 2013.
-
From compression to compressed sensing
Authors:
Shirin Jalali,
Arian Maleki
Abstract:
Can compression algorithms be employed for recovering signals from their underdetermined set of linear measurements? Addressing this question is the first step towards applying compression algorithms for compressed sensing (CS). In this paper, we consider a family of compression algorithms $\mathcal{C}_r$, parametrized by rate $r$, for a compact class of signals $\mathcal{Q} \subset \mathbb{R}^n$. The set of natural images and JPEG at different rates are examples of $\mathcal{Q}$ and $\mathcal{C}_r$, respectively. We establish a connection between the rate-distortion performance of $\mathcal{C}_r$, and the number of linear measurements required for successful recovery in CS. We then propose the compressible signal pursuit (CSP) algorithm and prove that, with high probability, it accurately and robustly recovers signals from an underdetermined set of linear measurements. We also explore the performance of CSP in the recovery of infinite dimensional signals.
Submitted 10 July, 2013; v1 submitted 17 December, 2012;
originally announced December 2012.
-
Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation
Authors:
Dominique Guillot,
Bala Rajaratnam,
Benjamin T. Rolfs,
Arian Maleki,
Ian Wong
Abstract:
The L1-regularized maximum likelihood estimation problem has recently become a topic of great interest within the machine learning, statistics, and optimization communities as a method for producing sparse inverse covariance estimators. In this paper, a proximal gradient method (G-ISTA) for performing L1-regularized covariance matrix estimation is presented. Although numerous algorithms have been proposed for solving this problem, this simple proximal gradient method is found to have attractive theoretical and numerical properties. G-ISTA has a linear rate of convergence, resulting in an O(log e) iteration complexity to reach a tolerance of e. This paper gives eigenvalue bounds for the G-ISTA iterates, providing a closed-form linear convergence rate. The rate is shown to be closely related to the condition number of the optimal point. Numerical convergence results and timing comparisons for the proposed method are presented. G-ISTA is shown to perform very well, especially when the optimal point is well-conditioned.
Submitted 26 November, 2012; v1 submitted 12 November, 2012;
originally announced November 2012.
-
Minimum Complexity Pursuit for Universal Compressed Sensing
Authors:
Shirin Jalali,
Arian Maleki,
Richard Baraniuk
Abstract:
The nascent field of compressed sensing is founded on the fact that high-dimensional signals with "simple structure" can be recovered accurately from just a small number of randomized samples. Several specific kinds of structures have been explored in the literature, from sparsity and group sparsity to low-rankness. However, two fundamental questions have been left unanswered, namely: What are the general abstract meanings of "structure" and "simplicity"? And do there exist universal algorithms for recovering such simple structured objects from fewer samples than their ambient dimension? In this paper, we address these two questions. Using algorithmic information theory tools such as the Kolmogorov complexity, we provide a unified definition of structure and simplicity. Leveraging this new definition, we develop and analyze an abstract algorithm for signal recovery motivated by Occam's Razor. Minimum complexity pursuit (MCP) requires just O(3κ) randomized samples to recover a signal of complexity κ and ambient dimension n. We also discuss the performance of MCP in the presence of measurement noise and with approximately simple signals.
Submitted 5 July, 2013; v1 submitted 28 August, 2012;
originally announced August 2012.
-
Minimum Complexity Pursuit: Stability Analysis
Authors:
Shirin Jalali,
Arian Maleki,
Richard Baraniuk
Abstract:
A host of problems involve the recovery of structured signals from a dimensionality reduced representation such as a random projection; examples include sparse signals (compressive sensing) and low-rank matrices (matrix completion). Given the wide range of different recovery algorithms developed to date, it is natural to ask whether there exist "universal" algorithms for recovering "structured" signals from their linear projections. We recently answered this question in the affirmative in the noise-free setting. In this paper, we extend our results to the case of noisy measurements.
Submitted 21 May, 2012;
originally announced May 2012.