-
Uncertainty quantification for learned ISTA
Authors:
Frederik Hoppe,
Claudio Mayrink Verdun,
Felix Krahmer,
Hannah Laus,
Holger Rauhut
Abstract:
Model-based deep learning solutions to inverse problems have attracted increasing attention in recent years as they bridge state-of-the-art numerical performance with interpretability. In addition, the incorporated prior domain knowledge can make the training more efficient as the smaller number of parameters allows the training step to be executed with smaller datasets. Algorithm unrolling schemes stand out among these model-based learning techniques. Despite their rapid advancement and their close connection to traditional high-dimensional statistical methods, they lack certainty estimates and a theory for uncertainty quantification is still elusive. This work provides a step towards closing this gap by proposing a rigorous way to obtain confidence intervals for the LISTA estimator.
Submitted 14 September, 2023;
originally announced September 2023.
-
Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models
Authors:
Leonardo Galli,
Holger Rauhut,
Mark Schmidt
Abstract:
Recent works have shown that line search methods can speed up Stochastic Gradient Descent (SGD) and Adam in modern over-parameterized settings. However, existing line searches may take steps that are smaller than necessary since they require a monotone decrease of the (mini-)batch objective function. We explore nonmonotone line search methods to relax this condition and possibly accept larger step sizes. Despite the lack of a monotonic decrease, we prove the same fast rates of convergence as in the monotone case. Our experiments show that nonmonotone methods improve the speed of convergence and generalization properties of SGD/Adam even beyond the previous monotone line searches. We propose a POlyak NOnmonotone Stochastic (PoNoS) method, obtained by combining a nonmonotone line search with a Polyak initial step size. Furthermore, we develop a new resetting technique that in the majority of the iterations reduces the number of backtracks to zero while still maintaining a large initial step size. Finally, a runtime comparison, to the best of our knowledge the first of its kind, shows that the epoch-wise advantage of line-search-based methods is reflected in the overall computational time.
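To make the nonmonotone idea concrete, here is a minimal deterministic sketch. It assumes a Grippo-Lampariello-Lucidi style reference value, namely the maximum of the last `M` objective values, and a Polyak initial step `(f(x) - f_star) / (c * ||g||^2)` followed by Armijo backtracking. The function name, constants and the quadratic test problem are illustrative assumptions. This is not PoNoS itself: there is no stochastic mini-batching and no resetting technique.

```python
import numpy as np
from collections import deque

def nonmonotone_gd(f, grad, x0, f_star=0.0, c=0.5, delta=0.5, M=5,
                   max_step=10.0, iters=100):
    """Gradient descent with a nonmonotone Armijo line search (hypothetical sketch).

    The sufficient-decrease test compares against the maximum of the last M
    objective values instead of f(x) itself, so single steps may increase f.
    """
    x = np.asarray(x0, dtype=float)
    history = deque([f(x)], maxlen=M)      # most recent objective values
    backtracks = 0
    for _ in range(iters):
        g = grad(x)
        gg = float(g @ g)
        if gg < 1e-20:                     # (near-)stationary point reached
            break
        fx = f(x)
        # Polyak initial step, clipped to [1e-8, max_step] so it stays positive
        step = min(max((fx - f_star) / (c * gg), 1e-8), max_step)
        ref = max(history)                 # nonmonotone reference value
        while f(x - step * g) > ref - c * step * gg:
            step *= delta                  # backtrack
            backtracks += 1
        x = x - step * g
        history.append(f(x))
    return x, backtracks
```

With `M = 1` the reference value is just `f(x)` and the routine reduces to an ordinary monotone Armijo search, which makes the two variants easy to compare.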
Submitted 25 October, 2023; v1 submitted 22 June, 2023;
originally announced June 2023.
-
Robust Implicit Regularization via Weight Normalization
Authors:
Hung-Hsu Chou,
Holger Rauhut,
Rachel Ward
Abstract:
Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line of work has shown that (stochastic) gradient descent tends to have an implicit bias towards low rank and/or sparse solutions when used to train deep linear networks, explaining to some extent why overparameterized neural network models trained by gradient descent tend to have good generalization performance in practice. However, existing theory for square-loss objectives often requires very small initialization of the trainable weights, which is at odds with the larger scale at which weights are initialized in practice for faster convergence and better generalization performance. In this paper, we aim to close this gap by incorporating and analyzing gradient flow (the continuous-time version of gradient descent) with weight normalization, where the weight vector is reparameterized in terms of polar coordinates, and gradient flow is applied to the polar coordinates. By analyzing key invariants of the gradient flow and using the Lojasiewicz theorem, we show that weight normalization also has an implicit bias towards sparse solutions in the diagonal linear model, but that, in contrast to plain gradient flow, weight normalization enables a robust bias that persists even if the weights are initialized at a practically large scale. Experiments suggest that weight normalization dramatically improves both the convergence speed and the robustness of the implicit bias in overparameterized diagonal linear network models.
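The polar reparameterization can be illustrated with plain gradient descent on a linear least-squares loss. This minimal sketch assumes the common form `w = g * v / ||v||`, with a scalar radius `g` and a direction vector `v`, and applies the chain rule to both. The function and parameter names are hypothetical, and the paper's diagonal linear model and its gradient-flow analysis are not reproduced.

```python
import numpy as np

def weightnorm_gd(A, y, v0, g0, lr=0.01, iters=2000):
    """Gradient descent on L(w) = 0.5*||A w - y||^2 with w = g * v / ||v|| (sketch)."""
    v, g = np.asarray(v0, dtype=float).copy(), float(g0)
    for _ in range(iters):
        nv = np.linalg.norm(v)
        u = v / nv                         # unit direction ("angle" part)
        w = g * u
        grad_w = A.T @ (A @ w - y)         # gradient w.r.t. the original weights
        grad_g = u @ grad_w                # chain rule: radial component
        # chain rule for v: only the part of grad_w orthogonal to u survives
        grad_v = (g / nv) * (grad_w - u * (u @ grad_w))
        g -= lr * grad_g
        v -= lr * grad_v
    return g * v / np.linalg.norm(v)
```

Because `grad_v` is orthogonal to `v`, each update slightly increases `||v||`. This shrinks the effective step on the direction over time, which is one of the invariant-type effects such analyses track.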
Submitted 23 February, 2024; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Uncertainty quantification for sparse Fourier recovery
Authors:
Frederik Hoppe,
Felix Krahmer,
Claudio Mayrink Verdun,
Marion I. Menzel,
Holger Rauhut
Abstract:
One of the most prominent methods for uncertainty quantification in high-dimensional statistics is the desparsified LASSO, which relies on unconstrained $\ell_1$-minimization. The majority of initial works focused on real (sub-)Gaussian designs. However, in many applications, such as magnetic resonance imaging (MRI), the measurement process possesses a certain structure due to the nature of the problem. The measurement operator in MRI can be described by a subsampled Fourier matrix. The purpose of this work is to extend the uncertainty quantification process using the desparsified LASSO to design matrices originating from a bounded orthonormal system, which naturally generalizes the subsampled Fourier case and also allows for the treatment of the case where the sparsity basis is not the standard basis. In particular, we construct honest confidence intervals for every pixel of an MR image that is sparse in the standard basis, provided the number of measurements satisfies $n \gtrsim \max\{ s\log^2 s\log p, s \log^2 p \}$, or that is sparse with respect to the Haar wavelet basis, provided the number of measurements is slightly larger.
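The desparsified LASSO pipeline can be sketched in three steps: solve the LASSO, add a one-step correction `x_u = x_hat + A^T (y - A x_hat) / n`, and build Gaussian intervals around `x_u`. This minimal real-valued version makes several assumptions. It uses a randomly row-subsampled Sylvester Hadamard matrix as a stand-in bounded orthonormal system with entries ±1, takes the correction matrix as the identity (reasonable only because `A^T A / n` is close to the identity for such designs), assumes a known noise level `sigma`, and uses ISTA as the LASSO solver. All names are hypothetical. This is not the paper's complex Fourier or Haar wavelet construction.

```python
import numpy as np
from statistics import NormalDist

def hadamard(p):
    """Sylvester Hadamard matrix of size p (p a power of two); entries are +-1."""
    H = np.array([[1.0]])
    while H.shape[0] < p:
        H = np.block([[H, H], [H, -H]])
    return H

def lasso_ista(A, y, lam, iters=500):
    """Minimise (1/(2n))*||A x - y||^2 + lam*||x||_1 with ISTA."""
    n = A.shape[0]
    L = np.linalg.norm(A, 2) ** 2 / n          # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - A.T @ (A @ x - y) / (n * L)    # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft thresholding
    return x

def desparsified_lasso_ci(A, y, lam, sigma, alpha=0.05):
    """Debiased LASSO (correction matrix = identity) with (1 - alpha) intervals."""
    n = A.shape[0]
    x_hat = lasso_ista(A, y, lam)
    x_u = x_hat + A.T @ (y - A @ x_hat) / n    # one-step debiasing
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # std of the noise term (A^T e / n)_i is sigma * ||a_i|| / n
    radius = z * sigma * np.linalg.norm(A, axis=0) / n
    return x_u, x_u - radius, x_u + radius
```

The interval radius ignores the remaining bias term `(I - A^T A / n)(x_hat - x)`. Bounding that term under a sample-complexity condition is what makes such intervals honest.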
Submitted 13 September, 2023; v1 submitted 30 December, 2022;
originally announced December 2022.
-
More is Less: Inducing Sparsity via Overparameterization
Authors:
Hung-Hsu Chou,
Johannes Maly,
Holger Rauhut
Abstract:
In deep learning it is common to overparameterize neural networks, that is, to use more parameters than training samples. Quite surprisingly, training the neural network via (stochastic) gradient descent leads to models that generalize very well, while classical statistics would suggest overfitting. In order to gain understanding of this implicit bias phenomenon we study the special case of sparse recovery (compressed sensing) which is of interest on its own. More precisely, in order to reconstruct a vector from underdetermined linear measurements, we introduce a corresponding overparameterized square loss functional, where the vector to be reconstructed is deeply factorized into several vectors. We show that, if there exists an exact solution, vanilla gradient flow for the overparameterized loss functional converges to a good approximation of the solution of minimal $\ell_1$-norm. The latter is well-known to promote sparse solutions. As a by-product, our results significantly improve the sample complexity for compressed sensing via gradient flow/descent on overparameterized models derived in previous works. The theory accurately predicts the recovery rate in numerical experiments. Our proof relies on analyzing a certain Bregman divergence of the flow. This bypasses the obstacles caused by non-convexity and should be of independent interest.
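A discrete-time sketch of the idea runs gradient descent on a factorized square loss from a small initialization. It assumes the Hadamard-power parameterization `x = u**L - v**L`, applied entrywise with a positive and a negative part so that signed vectors are reachable. This is one common way to realize a "deep factorization" and may differ from the paper's exact functional. The function name, step size and initialization scale are hypothetical.

```python
import numpy as np

def overparam_gd(A, y, depth=2, init=1e-3, lr=0.02, iters=5000):
    """Gradient descent on 0.5*||A(u**depth - v**depth) - y||^2 (hypothetical sketch).

    u and v start at the same small constant, so the initial x is exactly zero.
    """
    p = A.shape[1]
    u = np.full(p, init)
    v = np.full(p, init)
    for _ in range(iters):
        x = u**depth - v**depth
        r = A.T @ (A @ x - y)                  # gradient of the loss w.r.t. x
        # chain rule: dx/du = depth*u**(depth-1), dx/dv = -depth*v**(depth-1)
        u, v = (u - lr * depth * r * u**(depth - 1),
                v + lr * depth * r * v**(depth - 1))
    return u**depth - v**depth
```

Smaller `init` values push the limit closer to the minimal $\ell_1$-norm interpolant, at the cost of a longer initial growth phase.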
Submitted 10 May, 2023; v1 submitted 21 December, 2021;
originally announced December 2021.
-
Generalization Error Bounds for Iterative Recovery Algorithms Unfolded as Neural Networks
Authors:
Ekkehard Schnoor,
Arash Behboodi,
Holger Rauhut
Abstract:
Motivated by the learned iterative soft thresholding algorithm (LISTA), we introduce a general class of neural networks suitable for sparse reconstruction from few linear measurements. By allowing a wide range of degrees of weight-sharing between the layers, we enable a unified analysis for very different neural network types, ranging from recurrent ones to networks more similar to standard feedforward neural networks. Based on training samples, via empirical risk minimization we aim at learning the optimal network parameters and thereby the optimal network that reconstructs signals from their low-dimensional linear measurements. We derive generalization bounds by analyzing the Rademacher complexity of hypothesis classes consisting of such deep networks that also take into account the thresholding parameters. We obtain estimates of the sample complexity that essentially depend only linearly on the number of parameters and on the depth. We apply our main result to obtain specific generalization bounds for several practical examples, including different algorithms for (implicit) dictionary learning, and convolutional neural networks.
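An unfolded ISTA network can be written as a forward pass `x_{k+1} = S_{tau_k}(W1[k] y + W2[k] x_k)`, where `S` is entrywise soft thresholding. This minimal sketch follows the LISTA-style parameterization. Passing the same matrices in every layer gives a fully weight-shared (recurrent) network, and distinct matrices give a feedforward-like one. The helper names are hypothetical, training is not shown, and the paper's exact hypothesis class may differ. `ista_init` picks the parameters under which the network reproduces plain ISTA for `0.5*||A x - y||^2 + lam*||x||_1`.

```python
import numpy as np

def soft_threshold(z, tau):
    """Entrywise soft thresholding S_tau(z) = sign(z) * max(|z| - tau, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def unfolded_ista(y, W1, W2, thresholds):
    """Forward pass x_{k+1} = S_{tau_k}(W1[k] @ y + W2[k] @ x_k) with x_0 = 0.

    W1, W2 and thresholds hold one entry per layer; repeating the same
    matrix in every position corresponds to full weight sharing.
    """
    x = np.zeros(W2[0].shape[0])
    for B1, B2, tau in zip(W1, W2, thresholds):
        x = soft_threshold(B1 @ y + B2 @ x, tau)
    return x

def ista_init(A, lam, depth):
    """Layer parameters for which the network equals `depth` ISTA iterations."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    W1 = A.T / L
    W2 = np.eye(A.shape[1]) - A.T @ A / L
    return [W1] * depth, [W2] * depth, [lam / L] * depth
```

In a learned variant, the matrices and thresholds returned by `ista_init` would serve as the starting point that empirical risk minimization then adjusts.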
Submitted 17 January, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Spark Deficient Gabor Frames for Inverse Problems
Authors:
Vasiliki Kouni,
Holger Rauhut
Abstract:
In this paper, we apply the star-Digital Gabor Transform in analysis Compressed Sensing and speech denoising. Under assumptions on the ambient dimension, we produce a window vector that generates a spark deficient Gabor frame with many linear dependencies among its elements. We conduct computational experiments on both synthetic and real-world signals, using as baseline three Gabor transforms generated by state-of-the-art window vectors, and compare their performance to that of the star-Gabor transform. Results show that the proposed star-Gabor transform outperforms all others in all signal cases.
Submitted 13 October, 2021;
originally announced October 2021.
-
ADMM-DAD net: a deep unfolding network for analysis compressed sensing
Authors:
Vasiliki Kouni,
Georgios Paraskevopoulos,
Holger Rauhut,
George C. Alexandropoulos
Abstract:
In this paper, we propose a new deep unfolding neural network based on the ADMM algorithm for analysis Compressed Sensing. The proposed network jointly learns a redundant analysis operator for sparsification and reconstructs the signal of interest. We compare our proposed network with a state-of-the-art unfolded ISTA decoder, which also learns an orthogonal sparsifier. Moreover, we consider not only image, but also speech datasets as test examples. Computational experiments demonstrate that our proposed network outperforms the state-of-the-art deep unfolding network, consistently for both real-world image and speech datasets.
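The iterations that such a network unfolds can be illustrated with classical (non-learned) ADMM for analysis $\ell_1$-minimization, $\min_x \frac12\|Ax-y\|_2^2 + \lambda\|\Phi x\|_1$, using the split $z = \Phi x$. This minimal sketch assumes a fixed, given analysis operator `Phi` with full column rank, so the x-update system is invertible. It is not ADMM-DAD itself, which learns the operator and unrolls a fixed number of iterations as layers. Names and default values are hypothetical.

```python
import numpy as np

def soft_threshold(z, tau):
    """Entrywise soft thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def admm_analysis_cs(A, y, Phi, lam=0.1, rho=1.0, iters=200):
    """ADMM for min_x 0.5*||A x - y||^2 + lam*||Phi x||_1 with split z = Phi x."""
    z = np.zeros(Phi.shape[0])
    u = np.zeros(Phi.shape[0])                 # scaled dual variable
    M = A.T @ A + rho * Phi.T @ Phi            # fixed x-update system matrix
    Aty = A.T @ y
    for _ in range(iters):
        x = np.linalg.solve(M, Aty + rho * Phi.T @ (z - u))   # quadratic step
        z = soft_threshold(Phi @ x + u, lam / rho)            # prox of lam*||.||_1
        u = u + Phi @ x - z                                   # dual ascent
    return x
```

Unrolling these three updates for a fixed number of steps, with `Phi` (and possibly `rho`, `lam`) made trainable, gives a network of the kind described above.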
Submitted 2 May, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Path classification by stochastic linear recurrent neural networks
Authors:
Wiebke Bartolomaeus,
Youness Boutaib,
Sandra Nestler,
Holger Rauhut
Abstract:
We investigate the functioning of a classifying biological neural network from the perspective of statistical learning theory, modelled, in a simplified setting, as a continuous-time stochastic recurrent neural network (RNN) with identity activation function. In the purely stochastic (robust) regime, we give a generalisation error bound that holds with high probability, thus showing that the empirical risk minimiser is the best-in-class hypothesis. We show that RNNs retain a partial signature of the paths they are fed as the unique information exploited for training and classification tasks. We argue that these RNNs are easy to train and robust and back these observations with numerical experiments on both synthetic and real data. We also exhibit a trade-off phenomenon between accuracy and robustness.
Submitted 7 January, 2022; v1 submitted 6 August, 2021;
originally announced August 2021.
-
Convergence of gradient descent for learning linear neural networks
Authors:
Gabin Maxime Nguegnang,
Holger Rauhut,
Ulrich Terstiege
Abstract:
We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that under suitable conditions on the step sizes, gradient descent converges to a critical point of the loss function, i.e., the square loss in this article. Furthermore, we demonstrate that for almost all initializations gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
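For the square loss $\frac12\|W_N\cdots W_1X - Y\|_F^2$, the gradient with respect to layer $j$ is $(W_N\cdots W_{j+1})^T E\,(W_{j-1}\cdots W_1X)^T$, where $E$ is the residual. This minimal sketch implements plain gradient descent with that formula. The constant step size, Gaussian initialization scale and helper names are assumptions for illustration, not the step-size conditions analyzed in the paper.

```python
import numpy as np
from functools import reduce

def product(mats):
    """Return W_N @ ... @ W_1 for mats = [W_1, ..., W_N]."""
    return reduce(lambda acc, W: W @ acc, mats[1:], mats[0])

def deep_linear_gd(X, Y, dims, lr=0.01, iters=5000, scale=0.5, seed=0):
    """Gradient descent on 0.5*||W_N...W_1 X - Y||_F^2 with dims = [d_0, ..., d_N]."""
    rng = np.random.default_rng(seed)
    Ws = [scale * rng.standard_normal((dims[j + 1], dims[j]))
          for j in range(len(dims) - 1)]
    for _ in range(iters):
        E = product(Ws) @ X - Y                  # residual of the end-to-end map
        grads = []
        for j in range(len(Ws)):
            left = product(Ws[j + 1:]) if j + 1 < len(Ws) else np.eye(dims[-1])
            right = (product(Ws[:j]) if j > 0 else np.eye(dims[0])) @ X
            grads.append(left.T @ E @ right.T)   # gradient w.r.t. W_j
        Ws = [W - lr * G for W, G in zip(Ws, grads)]
    return Ws
```

Calling the function with `iters=0` returns the random initialization, which is convenient for comparing the loss before and after training.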
Submitted 24 November, 2021; v1 submitted 4 August, 2021;
originally announced August 2021.
-
New challenges in covariance estimation: multiple structures and coarse quantization
Authors:
Johannes Maly,
Tianyu Yang,
Sjoerd Dirksen,
Holger Rauhut,
Giuseppe Caire
Abstract:
In this self-contained chapter, we revisit a fundamental problem of multivariate statistics: estimating covariance matrices from finitely many independent samples. Based on massive Multiple-Input Multiple-Output (MIMO) systems, we illustrate the necessity of leveraging structure and considering quantization of samples when estimating covariance matrices in practice. We then provide a selective survey of theoretical advances of the last decade focusing on the estimation of structured covariance matrices. This review is spiced up by some as yet unpublished insights on how to benefit from combined structural constraints. Finally, we summarize the findings of our recently published preprint "Covariance estimation under one-bit quantization" to show how guaranteed covariance estimation is possible even under coarse quantization of the samples.
Submitted 11 June, 2021;
originally announced June 2021.
-
Star DGT: a Robust Gabor Transform for Speech Denoising
Authors:
Vasiliki Kouni,
Holger Rauhut,
Theoharis Theoharis
Abstract:
In this paper, we address the speech denoising problem, where Gaussian, pink and blue additive noises are to be removed from a given speech signal. Our approach is based on a redundant, analysis-sparse representation of the original speech signal. We pick an eigenvector of the Zauner unitary matrix and -- under certain assumptions on the ambient dimension -- we use it as window vector to generate a spark deficient Gabor frame. The analysis operator associated with such a frame is a (highly) redundant Gabor transform, which we use as a sparsifying transform in the denoising procedure. We conduct computational experiments on real-world speech data, using as baseline three Gabor transforms generated by state-of-the-art window vectors in time-frequency analysis and compare their performance to the proposed Gabor transform. The results show that our proposed redundant Gabor transform outperforms all others, consistently for all signals.
Submitted 27 December, 2021; v1 submitted 29 April, 2021;
originally announced April 2021.
-
Covariance estimation under one-bit quantization
Authors:
Sjoerd Dirksen,
Johannes Maly,
Holger Rauhut
Abstract:
We consider the classical problem of estimating the covariance matrix of a subgaussian distribution from i.i.d. samples in the novel context of coarse quantization, i.e., instead of having full knowledge of the samples, they are quantized to one or two bits per entry. This problem occurs naturally in signal processing applications. We introduce new estimators in two different quantization scenarios and derive non-asymptotic estimation error bounds in terms of the operator norm. In the first scenario we consider a simple, scale-invariant one-bit quantizer and derive an estimation result for the correlation matrix of a centered Gaussian distribution. In the second scenario, we add random dithering to the quantizer. In this case we can accurately estimate the full covariance matrix of a general subgaussian distribution by collecting two bits per entry of each sample. In both scenarios, our bounds apply to masked covariance estimation. We demonstrate the near-optimality of our error bounds by deriving corresponding (minimax) lower bounds and using numerical simulations.
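The two quantization scenarios can be sketched as follows. In the first, only signs are kept, and the classical arcsine law for centered Gaussians, $\mathbb{E}[\operatorname{sign}(x_i)\operatorname{sign}(x_j)] = \frac{2}{\pi}\arcsin(\rho_{ij})$, is inverted to estimate the correlation matrix. In the second, each entry is quantized twice with independent uniform dithers on $[-\lambda,\lambda]$. Since $\mathbb{E}[\lambda\operatorname{sign}(x+\tau)\mid x] = x$ whenever $|x|\le\lambda$, the product of the two bits gives an (almost) unbiased estimate of the full covariance, including the diagonal. This is a minimal sketch in that spirit. The function names and the choice of `lam` are assumptions, the paper's estimators and parameter choices may differ, and masking is not shown.

```python
import numpy as np

def one_bit_correlation(X):
    """Scale-invariant one-bit correlation estimate for centered Gaussian rows of X.

    Inverts the arcsine law E[sign(x_i) sign(x_j)] = (2/pi) * arcsin(rho_ij).
    """
    S = np.sign(X)
    return np.sin(np.pi / 2 * (S.T @ S) / X.shape[0])

def dithered_two_bit_covariance(X, lam, rng):
    """Covariance estimate from two dithered sign bits per entry (sketch).

    With independent dithers tau, tau' ~ U[-lam, lam], the product
    lam^2 * sign(x_i + tau_i) * sign(x_j + tau'_j) has mean x_i * x_j
    whenever |x_i|, |x_j| <= lam.
    """
    n, p = X.shape
    Y1 = np.sign(X + rng.uniform(-lam, lam, size=(n, p)))
    Y2 = np.sign(X + rng.uniform(-lam, lam, size=(n, p)))
    C = lam**2 * (Y1.T @ Y2) / n
    return (C + C.T) / 2                       # symmetrise
```

The dithering level trades off bias against variance. If `lam` is too small, entries beyond it are clipped. If it is too large, the $\lambda^2$ factor inflates the variance.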
Submitted 22 April, 2022; v1 submitted 2 April, 2021;
originally announced April 2021.
-
Spark Deficient Gabor Frame Provides a Novel Analysis Operator for Compressed Sensing
Authors:
Vasiliki Kouni,
Holger Rauhut
Abstract:
The analysis sparsity model is a very effective approach in modern Compressed Sensing applications. Specifically, redundant analysis operators can lead to fewer measurements needed for reconstruction when employing the analysis $\ell_1$-minimization in Compressed Sensing. In this paper, we pick an eigenvector of the Zauner unitary matrix and -- under certain assumptions on the ambient dimension -- we build a spark deficient Gabor frame. The analysis operator associated with such a spark deficient Gabor frame is a new (highly) redundant Gabor transform, which we use as a sparsifying transform in Compressed Sensing. We conduct computational experiments -- on both synthetic and real-world data -- solving the analysis $\ell_1$-minimization problem of Compressed Sensing, with four different choices of analysis operators, including our Gabor analysis operator. The results show that our proposed redundant Gabor transform outperforms -- in all cases -- Gabor transforms generated by state-of-the-art window vectors of time-frequency analysis.
Submitted 13 October, 2021; v1 submitted 20 March, 2021;
originally announced March 2021.
-
Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank
Authors:
Hung-Hsu Chou,
Carsten Gieshoff,
Johannes Maly,
Holger Rauhut
Abstract:
In deep learning, it is common to use more network parameters than training points. In such a scenario of over-parameterization, there are usually multiple networks that achieve zero training error so that the training algorithm induces an implicit bias on the computed solution. In practice, (stochastic) gradient descent tends to prefer solutions which generalize well, which provides a possible explanation of the success of deep learning. In this paper we analyze the dynamics of gradient descent in the simplified setting of linear networks and of an estimation problem. Although we are not in an overparameterized scenario, our analysis nevertheless provides insights into the phenomenon of implicit bias. In fact, we derive a rigorous analysis of the dynamics of vanilla gradient descent, and characterize the dynamical convergence of the spectrum. We are able to accurately locate time intervals where the effective rank of the iterates is close to the effective rank of a low-rank projection of the ground-truth matrix. In practice, those intervals can be used as criteria for early stopping if a certain regularity is desired. We also provide empirical evidence for implicit bias in more general scenarios, such as matrix sensing and random initialization. This suggests that deep learning prefers trajectories whose complexity (measured in terms of effective rank) is monotonically increasing, which we believe is a fundamental concept for the theoretical understanding of deep learning.
Submitted 20 August, 2023; v1 submitted 27 November, 2020;
originally announced November 2020.
-
Compressive Sensing and Neural Networks from a Statistical Learning Perspective
Authors:
Arash Behboodi,
Holger Rauhut,
Ekkehard Schnoor
Abstract:
Various iterative reconstruction algorithms for inverse problems can be unfolded as neural networks. Empirically, this approach has often led to improved results, but theoretical guarantees are still scarce. While some progress on generalization properties of neural networks has been made, great challenges remain. In this chapter, we discuss and combine these topics to present a generalization error analysis for a class of neural networks suitable for sparse reconstruction from few linear measurements. The hypothesis class considered is inspired by the classical iterative soft-thresholding algorithm (ISTA). The neural networks in this class are obtained by unfolding iterations of ISTA and learning some of the weights. Based on training samples, we aim at learning the optimal network parameters via empirical risk minimization and thereby the optimal network that reconstructs signals from their compressive linear measurements. In particular, we may learn a sparsity basis that is shared by all of the iterations/layers and thereby obtain a new approach for dictionary learning. For this class of networks, we present a generalization bound, which is based on bounding the Rademacher complexity of hypothesis classes consisting of such deep networks via Dudley's integral. Remarkably, under realistic conditions, the generalization error scales only logarithmically in the number of layers, and at most linearly in the number of measurements.
Submitted 13 August, 2021; v1 submitted 29 October, 2020;
originally announced October 2020.
-
Unfolding recurrence by Green's functions for optimized reservoir computing
Authors:
Sandra Nestler,
Christian Keup,
David Dahmen,
Matthieu Gilson,
Holger Rauhut,
Moritz Helias
Abstract:
Cortical networks are strongly recurrent, and neurons have intrinsic temporal dynamics. This sets them apart from deep feed-forward networks. Despite the tremendous progress in the application of feed-forward networks and their theoretical understanding, it remains unclear how the interplay of recurrence and non-linearities in recurrent cortical networks contributes to their function. The purpose of this work is to present a solvable recurrent network model that links to feed-forward networks. By perturbative methods we transform the time-continuous, recurrent dynamics into an effective feed-forward structure of linear and non-linear temporal kernels. The resulting analytical expressions allow us to build optimal time-series classifiers from random reservoir networks. Firstly, this allows us to optimize not only the readout vectors, but also the input projection, demonstrating a strong potential performance gain. Secondly, the analysis exposes how second-order stimulus statistics are a crucial element that interacts with the non-linearity of the dynamics and boosts performance.
Submitted 14 October, 2020; v1 submitted 13 October, 2020;
originally announced October 2020.
-
Overparameterization and generalization error: weighted trigonometric interpolation
Authors:
Yuege Xie,
Hung-Hsu Chou,
Holger Rauhut,
Rachel Ward
Abstract:
Motivated by surprisingly good generalization properties of learned deep neural networks in overparameterized scenarios and by the related double descent phenomenon, this paper analyzes the relation between smoothness and low generalization error in an overparameterized linear learning problem. We study a random Fourier series model, where the task is to estimate the unknown Fourier coefficients from equidistant samples. We derive exact expressions for the generalization error of both plain and weighted least squares estimators. We show precisely how a bias towards smooth interpolants, in the form of weighted trigonometric interpolation, can lead to smaller generalization error in the overparameterized regime compared to the underparameterized regime. This provides insight into the power of overparameterization, which is common in modern machine learning.
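Weighted trigonometric interpolation can be written as a minimum weighted-norm problem. Among all coefficient vectors $c = (c_k)_{k=-K}^{K}$ with more unknowns than equidistant samples $t_j = j/n$ that satisfy $\sum_k c_k e^{2\pi i k t_j} = y_j$, pick the one minimizing $\sum_k w_k|c_k|^2$. The closed form is $c = W^{-1}F^H(FW^{-1}F^H)^{-1}y$, with $F_{jk} = e^{2\pi i k t_j}$ and $W = \operatorname{diag}(w)$. Weights growing with $|k|$ bias the interpolant towards smooth functions. This minimal sketch assumes the weights are supplied by the caller; the weight choice `(1 + |k|)^2` in the usage check below is only an example, not the paper's.

```python
import numpy as np

def weighted_trig_interpolation(y, K, weights):
    """Minimum weighted-norm Fourier coefficients interpolating y at t_j = j/n.

    Solves min sum_k weights[k] * |c_k|^2 subject to
    sum_k c_k * exp(2*pi*i*k*t_j) = y_j for frequencies k = -K..K
    (typically 2K + 1 > n, i.e. the overparameterized regime).
    """
    n = len(y)
    t = np.arange(n) / n
    k = np.arange(-K, K + 1)
    F = np.exp(2j * np.pi * np.outer(t, k))    # n x (2K+1) design matrix
    Winv = 1.0 / np.asarray(weights, dtype=float)
    G = (F * Winv) @ F.conj().T                # F W^{-1} F^H (n x n)
    c = Winv * (F.conj().T @ np.linalg.solve(G, y))
    return k, c
```

Because equidistant sampling aliases frequency $k$ onto $k \bmod n$, the minimizer splits each aliased Fourier coefficient among $k, k \pm n, \dots$ in proportion to $1/w_k$. Larger weights at high frequencies therefore keep most of the energy at low frequencies.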
Submitted 27 October, 2021; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Sparse recovery in bounded Riesz systems with applications to numerical methods for PDEs
Authors:
Simone Brugiapaglia,
Sjoerd Dirksen,
Hans Christian Jung,
Holger Rauhut
Abstract:
We study sparse recovery with structured random measurement matrices having independent, identically distributed, and uniformly bounded rows and with a nontrivial covariance structure. This class of matrices arises from random sampling of bounded Riesz systems and generalizes random partial Fourier matrices. Our main result improves the currently available results for the null space and restricted isometry properties of such random matrices. The main novelty of our analysis is a new upper bound for the expectation of the supremum of a Bernoulli process associated with a restricted isometry constant. We apply our result to prove new performance guarantees for the CORSING method, a recently introduced numerical approximation technique for partial differential equations (PDEs) based on compressive sensing.
Submitted 14 May, 2020;
originally announced May 2020.
-
Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers
Authors:
Bubacarr Bah,
Holger Rauhut,
Ulrich Terstiege,
Michael Westdickenberg
Abstract:
We study the convergence of gradient flows related to learning deep linear neural networks (where the activation function is the identity map) from data. In this case, the composition of the network layers amounts to simply multiplying the weight matrices of all layers together, resulting in an overparameterized problem. The gradient flow with respect to these factors can be re-interpreted as a Riemannian gradient flow on the manifold of rank-$r$ matrices endowed with a suitable Riemannian metric. We show that the flow always converges to a critical point of the underlying functional. Moreover, we establish that, for almost all initializations, the flow converges to a global minimum on the manifold of rank $k$ matrices for some $k\leq r$.
Submitted 15 October, 2020; v1 submitted 12 October, 2019;
originally announced October 2019.
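A rough numerical picture of the factorized problem follows. It is a minimal sketch, assuming plain gradient descent with a small fixed step as an explicit Euler discretization of the gradient flow over the factors W_1, ..., W_N; it uses the Euclidean gradient with respect to each factor and does not construct the Riemannian metric on the rank-r manifold. All function names, dimensions and step sizes are hypothetical.

```python
import numpy as np

def chain(Ws, dim):
    """Product W_N ... W_1 of the list [W_1, ..., W_N]; identity of size dim if empty."""
    P = np.eye(dim)
    for W in Ws:
        P = W @ P
    return P

def loss(Ws, X, Y):
    """Squared loss 0.5 * ||W_N ... W_1 X - Y||_F^2 of a deep linear network."""
    R = chain(Ws, X.shape[0]) @ X - Y
    return 0.5 * np.sum(R ** 2)

def gradients(Ws, X, Y):
    """Gradient of the loss with respect to each factor W_j."""
    d0 = X.shape[0]
    E = (chain(Ws, d0) @ X - Y) @ X.T                 # gradient w.r.t. the end-to-end product
    grads = []
    for j in range(len(Ws)):
        left = chain(Ws[j + 1:], Ws[j].shape[0])      # W_N ... W_{j+1}
        right = chain(Ws[:j], d0)                     # W_{j-1} ... W_1
        grads.append(left.T @ E @ right.T)
    return grads

def gradient_descent(Ws, X, Y, lr=0.02, steps=500):
    """Explicit Euler discretization of the gradient flow over the factors."""
    for _ in range(steps):
        Ws = [W - lr * G for W, G in zip(Ws, gradients(Ws, X, Y))]
    return Ws
```

The overparameterization is visible in `chain`: only the product of the factors enters the loss, while the dynamics act on every factor separately.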
-
On the geometry of polytopes generated by heavy-tailed random vectors
Authors:
Olivier Guédon,
Felix Krahmer,
Christian Kümmerle,
Shahar Mendelson,
Holger Rauhut
Abstract:
We study the geometry of centrally symmetric random polytopes, generated by $N$ independent copies of a random vector $X$ taking values in $\mathbb{R}^n$. We show that under minimal assumptions on $X$, for $N \gtrsim n$ and with high probability, the polytope contains a deterministic set that is naturally associated with the random vector, namely, the polar of a certain floating body. This resolves the long-standing question of whether such a random polytope contains a canonical body. Moreover, by identifying the floating bodies associated with various random vectors we recover the estimates that have been obtained previously, and thanks to the minimal assumptions on $X$ we derive estimates in cases that had been out of reach, involving random polytopes generated by heavy-tailed random vectors (e.g., when $X$ is $q$-stable or when $X$ has an unconditional structure). Finally, the structural results are used for the study of a fundamental question in compressive sensing: noise-blind sparse recovery.
Submitted 16 July, 2019;
originally announced July 2019.
-
Jointly Low-Rank and Bisparse Recovery: Questions and Partial Answers
Authors:
Simon Foucart,
Rémi Gribonval,
Laurent Jacques,
Holger Rauhut
Abstract:
We investigate the problem of recovering jointly $r$-rank and $s$-bisparse matrices from as few linear measurements as possible, considering arbitrary measurements as well as rank-one measurements. In both cases, we show that $m \asymp r s \ln(en/s)$ measurements make the recovery possible in theory, meaning via a nonpractical algorithm. In the case of arbitrary measurements, we investigate the possibility of achieving practical recovery via an iterative-hard-thresholding algorithm when $m \asymp r s^{\gamma} \ln(en/s)$ for some exponent $\gamma > 0$. We show that this is feasible for $\gamma = 2$, and that the proposed analysis cannot cover the case $\gamma \leq 1$. The precise value of the optimal exponent $\gamma \in [1,2]$ is the object of a question, raised but unresolved in this paper, about head projections for the jointly low-rank and bisparse structure. Some related questions are partially answered in passing. For rank-one measurements, we suggest on arcane grounds an iterative-hard-thresholding algorithm modified to exploit the nonstandard restricted isometry property obeyed by this type of measurements.
Submitted 23 October, 2019; v1 submitted 12 February, 2019;
originally announced February 2019.
-
A Quotient Property for Matrices with Heavy-Tailed Entries and its Application to Noise-Blind Compressed Sensing
Authors:
Felix Krahmer,
Christian Kümmerle,
Holger Rauhut
Abstract:
For a large class of random matrices $A$ with i.i.d. entries we show that the $\ell_1$-quotient property holds with probability exponentially close to 1. In contrast to previous results, our analysis does not require concentration of the entrywise distributions. We provide a unified proof that recovers corresponding previous results for (sub-)Gaussian and Weibull distributions. Our findings generalize known results on the geometry of random polytopes, providing lower bounds on the size of the largest Euclidean ball contained in the centrally symmetric polytope spanned by the columns of $A$. At the same time, our results establish robustness of noise-blind $\ell_1$-decoders for recovering sparse vectors $x$ from underdetermined, noisy linear measurements $y = Ax + w$ under the weakest possible assumptions on the entrywise distributions that allow for recovery with optimal sample complexity even in the noiseless case. Our analysis predicts superior robustness behavior for measurement matrices with super-Gaussian entries, which we confirm by numerical experiments.
Submitted 11 June, 2018;
originally announced June 2018.
-
One-bit compressed sensing with partial Gaussian circulant matrices
Authors:
Sjoerd Dirksen,
Hans Christian Jung,
Holger Rauhut
Abstract:
In this paper we consider memoryless one-bit compressed sensing with randomly subsampled Gaussian circulant matrices. We show that in a small sparsity regime and for small enough accuracy $δ$, $m\sim δ^{-4} s\log(N/sδ)$ measurements suffice to reconstruct the direction of any $s$-sparse vector up to accuracy $δ$ via an efficient program. We derive this result by proving that partial Gaussian circulant matrices satisfy an $\ell_1/\ell_2$ RIP-property. Under a slightly worse dependence on $δ$, we establish stability with respect to approximate sparsity, as well as full vector recovery results.
Submitted 9 October, 2017;
originally announced October 2017.
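The measurement model can be illustrated with a short sketch: a Gaussian circulant matrix is subsampled at random rows, only the signs of the measurements are kept, and the direction of the sparse vector is estimated. The estimator used here, hard thresholding of the back-projection A^T y followed by normalization, is a simple stand-in chosen for illustration and is not claimed to be the efficient program analyzed in the paper; all names are hypothetical.

```python
import numpy as np

def partial_circulant(g, rows):
    """Rows `rows` of the circulant matrix C with C[i, j] = g[(i - j) % N]."""
    N = len(g)
    idx = (np.arange(N)[:, None] - np.arange(N)[None, :]) % N
    return g[idx][rows]

def hard_threshold(v, s):
    """Keep the s entries of largest magnitude and set the rest to zero."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out

def one_bit_direction_estimate(A, y, s):
    """Simple back-projection estimate of x / ||x|| from y = sign(A x) (illustrative)."""
    v = hard_threshold(A.T @ y, s)
    return v / np.linalg.norm(v)
```

Since sign(A x) is invariant under positive rescaling of x, only the direction x / ||x|| can be recovered, which is why the estimate is normalized.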
-
Masked Toeplitz covariance estimation
Authors:
Maryia Kabanava,
Holger Rauhut
Abstract:
The problem of estimating the covariance matrix $\Sigma$ of a $p$-variate distribution based on $n$ observations arises in many data analysis contexts. While for $n>p$ the classical sample covariance matrix $\hat\Sigma_n$ is a good estimator for $\Sigma$, it fails in the high-dimensional setting when $n\ll p$. In this scenario one requires prior knowledge about the structure of the covariance matrix in order to construct reasonable estimators. Under the common assumption that $\Sigma$ is sparse, a refined estimator is given by $M\cdot\hat\Sigma_n$, where $M$ is a suitable symmetric mask matrix indicating the nonzero entries of $\Sigma$ and $\cdot$ denotes the entrywise product of matrices. In the present work we assume that $\Sigma$ has Toeplitz structure corresponding to stationary signals. This suggests averaging the sample covariance $\hat\Sigma_n$ over the diagonals in order to obtain an estimator $\tilde\Sigma_n$ of Toeplitz structure. Assuming in addition that $\Sigma$ is sparse suggests studying estimators of the form $M\cdot\tilde\Sigma_n$. For Gaussian random vectors and, more generally, random vectors satisfying the convex concentration property, our main result bounds the estimation error in terms of $n$ and $p$ and shows that accurate estimation is indeed possible when $n \ll p$. The new bound significantly generalizes previous results by Cai, Ren and Zhou and provides an alternative proof. Our analysis exploits the connection between the spectral norm of a Toeplitz matrix and the supremum norm of the corresponding spectral density function.
Submitted 27 September, 2017;
originally announced September 2017.
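The two-step construction (diagonal averaging, then masking) is easy to express directly. The sketch below assumes mean-zero samples and uses a banded 0/1 mask that keeps only lags up to a chosen bandwidth; the paper allows a general symmetric mask, so the banded choice and all function names are hypothetical.

```python
import numpy as np

def toeplitz_average(S):
    """Replace every diagonal of the square matrix S by its mean value."""
    p = S.shape[0]
    T = np.zeros((p, p))
    for k in range(-(p - 1), p):
        i = np.arange(max(0, -k), min(p, p - k))     # rows touched by diagonal k
        T[i, i + k] = np.diagonal(S, offset=k).mean()
    return T

def masked_toeplitz_estimator(samples, bandwidth):
    """M . tilde(Sigma)_n for mean-zero samples (n x p) and a banded 0/1 mask M."""
    n, p = samples.shape
    S_hat = samples.T @ samples / n                   # sample covariance (mean assumed zero)
    S_tilde = toeplitz_average(S_hat)                 # Toeplitz estimator tilde(Sigma)_n
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    M = (lags <= bandwidth).astype(float)             # hypothetical mask: keep lags <= bandwidth
    return M * S_tilde                                # entrywise (Hadamard) product
```

Averaging along diagonals pools all entries that share a lag, which is where the gain in the regime n << p comes from: each lag is estimated from many products rather than one.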
-
Multi-level Compressed Sensing Petrov-Galerkin discretization of high-dimensional parametric PDEs
Authors:
Jean-Luc Bouchot,
Holger Rauhut,
Christoph Schwab
Abstract:
We analyze a novel multi-level version of a recently introduced compressed sensing (CS) Petrov-Galerkin (PG) method from [H. Rauhut and Ch. Schwab: Compressive Sensing Petrov-Galerkin approximation of high-dimensional parametric operator equations, Math. Comp. 304(2017) 661-700] for the solution of many-parametric partial differential equations. We propose to use multi-level PG discretizations, based on a hierarchy of nested finite dimensional subspaces, and to reconstruct parametric solutions at each level from level-dependent random samples of the high-dimensional parameter space via CS methods such as weighted $\ell_1$-minimization. For affine parametric, linear operator equations, we prove that our approach allows one to approximate the parametric solution with (almost) optimal convergence order as specified by certain summability properties of the coefficient sequence in a general polynomial chaos expansion of the parametric solution and by the convergence order of the PG discretization in the physical variables. The computation of the parameter samples of the PDE solution is "embarrassingly parallel", as in Monte-Carlo methods. Contrary to other recent approaches, and as already noted in [A. Doostan and H. Owhadi: A non-adapted sparse approximation of PDEs with stochastic inputs. JCP 230(2011) 3015-3034], the optimality of the computed approximations does not require a priori assumptions on ordering and structure of the index sets of the largest gpc coefficients (such as the "downward closed" property). We prove that under certain assumptions work versus accuracy of the new algorithms is asymptotically equal to that of one PG solve for the corresponding nominal problem on the finest discretization level up to a constant.
Submitted 15 December, 2017; v1 submitted 6 January, 2017;
originally announced January 2017.
-
Low-rank matrix recovery via rank one tight frame measurements
Authors:
Holger Rauhut,
Ulrich Terstiege
Abstract:
The task of reconstructing a low rank matrix from incomplete linear measurements arises in areas such as machine learning, quantum state tomography and the phase retrieval problem. In this note, we study the particular setup in which the measurements are taken with respect to rank one matrices constructed from the elements of a random tight frame. We consider a convex optimization approach and show both robustness of the reconstruction with respect to noise on the measurements and stability with respect to passing to approximately low rank matrices. This is achieved by establishing a version of the null space property of the corresponding measurement map.
Submitted 9 December, 2016;
originally announced December 2016.
-
Improved bounds for sparse recovery from subsampled random convolutions
Authors:
Shahar Mendelson,
Holger Rauhut,
Rachel Ward
Abstract:
We study the recovery of sparse vectors from subsampled random convolutions via $\ell_1$-minimization. We consider the setup in which both the subsampling locations and the generating vector are chosen at random. For a subgaussian generator with independent entries, we improve previously known estimates: if the sparsity $s$ is small enough, i.e., $s \lesssim \sqrt{n/\log(n)}$, we show that $m \gtrsim s \log(en/s)$ measurements are sufficient to recover $s$-sparse vectors in dimension $n$ with high probability, matching the well-known condition for recovery from standard Gaussian measurements. If $s$ is larger, then essentially $m \geq s \log^2(s) \log(\log(s)) \log(n)$ measurements are sufficient, again improving over previous estimates. Our results are shown via the so-called robust null space property which is weaker than the standard restricted isometry property. Our method of proof involves a novel combination of small ball estimates with chaining techniques which should be of independent interest.
Submitted 23 March, 2018; v1 submitted 17 October, 2016;
originally announced October 2016.
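Subsampled random convolutions never need to be stored as a matrix: by the circular convolution theorem, the measurements and the adjoint map (which iterative $\ell_1$ solvers need) can both be computed with FFTs. The sketch below assumes circular convolution with a real generator and leaves out the usual $1/\sqrt{m}$ normalization; the function names are hypothetical.

```python
import numpy as np

def subsampled_convolution(g, x, omega):
    """Measurements y = P_omega (g * x): circular convolution via the FFT, then subsampling."""
    full = np.fft.ifft(np.fft.fft(g) * np.fft.fft(x)).real   # real for real g and x
    return full[omega]

def subsampled_convolution_adjoint(g, y, omega):
    """Adjoint map: zero-fill y onto omega, then circularly correlate with g."""
    z = np.zeros(len(g))
    z[omega] = y
    return np.fft.ifft(np.conj(np.fft.fft(g)) * np.fft.fft(z)).real
```

Both maps cost O(n log n) instead of the O(mn) of an explicit m x n matrix, which matters when such operators are applied repeatedly inside an $\ell_1$ solver.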
-
Low rank tensor recovery via iterative hard thresholding
Authors:
Holger Rauhut,
Reinhold Schneider,
Zeljka Stojanac
Abstract:
We study extensions of compressive sensing and low rank matrix recovery (matrix completion) to the recovery of low rank tensors of higher order from a small number of linear measurements. While the theoretical understanding of low rank matrix recovery is already well-developed, only a few contributions on the low rank tensor recovery problem are available so far. In this paper, we introduce versions of the iterative hard thresholding algorithm for several tensor decompositions, namely the higher order singular value decomposition (HOSVD), the tensor train format (TT), and the general hierarchical Tucker decomposition (HT). We provide a partial convergence result for these algorithms which is based on a variant of the restricted isometry property of the measurement operator adapted to the tensor decomposition at hand that induces a corresponding notion of tensor rank. We show that subgaussian measurement ensembles satisfy the tensor restricted isometry property with high probability under a certain almost optimal bound on the number of measurements which depends on the corresponding tensor format. These bounds are extended to partial Fourier maps combined with random sign flips of the tensor entries. Finally, we illustrate the performance of iterative hard thresholding methods for tensor recovery via numerical experiments where we consider recovery from Gaussian random measurements, tensor completion (recovery of missing entries), and Fourier measurements for third order tensors.
Submitted 16 February, 2016;
originally announced February 2016.
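For the HOSVD format, the "hard thresholding" step is a projection onto tensors of bounded multilinear rank. The sketch below uses the truncated HOSVD (quasi-optimal, not the exact best approximation) as that projection and a fixed step size mu; both the step-size rule and the function names are assumptions for illustration, not the exact variants analyzed in the paper.

```python
import numpy as np

def unfold(T, mode):
    """Mode-k matricization: the mode-k fibers become the columns."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, U, mode):
    """Multiply tensor T by matrix U along the given mode."""
    out = np.tensordot(U, np.moveaxis(T, mode, 0), axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def truncated_hosvd(T, ranks):
    """Quasi-best approximation of multilinear rank <= ranks via the truncated HOSVD."""
    out = T
    for mode, r in enumerate(ranks):
        U = np.linalg.svd(unfold(T, mode), full_matrices=False)[0][:, :r]
        out = mode_product(out, U @ U.T, mode)        # project mode-k fibers onto span(U)
    return out

def tensor_iht(A, y, shape, ranks, steps=100, mu=1.0):
    """Iterative hard thresholding with the truncated HOSVD as thresholding operator.

    A acts on vectorized tensors (m x prod(shape)); mu is a fixed step size (assumption).
    """
    X = np.zeros(shape)
    for _ in range(steps):
        residual = y - A @ X.ravel()
        X = truncated_hosvd(X + mu * (A.T @ residual).reshape(shape), ranks)
    return X
```

Every iterate has mode-k unfoldings of rank at most `ranks[k]`, mirroring how matrix IHT keeps every iterate low rank via a truncated SVD.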
-
Conjugate gradient acceleration of iteratively re-weighted least squares methods
Authors:
Massimo Fornasier,
Steffen Peter,
Holger Rauhut,
Stephan Worm
Abstract:
Iteratively Re-weighted Least Squares (IRLS) is a method for solving minimization problems involving non-quadratic cost functions, perhaps non-convex and non-smooth, which can, however, be described as the infimum over a family of quadratic functions. This transformation suggests an algorithmic scheme that solves a sequence of quadratic problems to be tackled efficiently by tools of numerical linear algebra. Its general scope and its usually simple implementation, transforming the initial non-convex and non-smooth minimization problem into a more familiar and easily solvable quadratic optimization problem, make it a versatile algorithm. However, despite its simplicity, versatility, and elegant analysis, the complexity of IRLS strongly depends on the way the solution of the successive quadratic optimizations is addressed. For the important special case of $\textit{compressed sensing}$ and sparse recovery problems in signal processing, we investigate theoretically and numerically how accurately one needs to solve the quadratic problems by means of the $\textit{conjugate gradient}$ (CG) method in each iteration in order to guarantee convergence. The use of the CG method may significantly speed up the numerical solution of the quadratic subproblems, in particular, when fast matrix-vector multiplication (exploiting for instance the FFT) is available for the matrix involved. In addition, we study convergence rates. Our modified IRLS method outperforms state-of-the-art first-order methods such as Iterative Hard Thresholding (IHT) or Fast Iterative Soft-Thresholding Algorithm (FISTA) in many situations, especially in large dimensions. Moreover, IRLS is often able to recover sparse vectors from fewer measurements than required for IHT and FISTA.
Submitted 23 February, 2016; v1 submitted 14 September, 2015;
originally announced September 2015.
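The structure of IRLS with CG inner solves can be sketched as follows for the basis pursuit problem min ||x||_1 subject to Ax = y. This is a minimal sketch, assuming the classical smoothed weights w_i = (x_i^2 + eps^2)^(-1/2), the eps update from the (s+1)-th largest entry, a fixed CG tolerance and a numerical floor `eps_min`; it does not reproduce the paper's adaptive accuracy rule for the CG solves, and all names are hypothetical.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, maxiter=500):
    """Conjugate gradient method for a symmetric positive definite operator."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        if np.sqrt(rs) <= tol * np.linalg.norm(b):
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def irls_cg(A, y, s, iters=30, eps_min=1e-6):
    """IRLS for min ||x||_1 s.t. A x = y; each weighted least squares step is solved by CG."""
    N = A.shape[1]
    x = A.T @ conjugate_gradient(lambda z: A @ (A.T @ z), y)   # minimum l2-norm start
    eps = 1.0
    for _ in range(iters):
        d = np.sqrt(x ** 2 + eps ** 2)        # inverse weights 1 / w_i
        # Weighted least squares solution x = D A^T (A D A^T)^{-1} y with D = diag(d)
        z = conjugate_gradient(lambda v: A @ (d * (A.T @ v)), y)
        x = d * (A.T @ z)
        r = np.sort(np.abs(x))[::-1]
        # eps update from the (s+1)-th largest entry; eps_min is a numerical floor (assumption)
        eps = max(min(eps, r[s] / N), eps_min)
    return x
```

CG only touches A through products with A and A^T, so a fast transform (such as an FFT-based operator) can be substituted for the dense matrix without changing the loop.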
-
Refined analysis of sparse MIMO radar
Authors:
Dominik Dorsch,
Holger Rauhut
Abstract:
We analyze a multiple-input multiple-output (MIMO) radar model and provide recovery results for a compressed sensing (CS) approach. In MIMO radar, different pulses are emitted by several transmitters and the echoes are recorded at several receiver nodes. Under reasonable assumptions the transformation from emitted pulses to the received echoes can approximately be regarded as linear. For the considered model, and many radar tasks in general, sparsity of targets within the considered angle-range-Doppler domain is a natural assumption. Therefore, it is possible to apply methods from CS in order to reconstruct the parameters of the targets. Assuming Gaussian random pulses the resulting measurement matrix becomes a highly structured random matrix. Our first main result provides an estimate for the well-known restricted isometry property (RIP) ensuring stable and robust recovery. We require more measurements than standard results from CS, like for example those for Gaussian random measurements. Nevertheless, we show that due to the special structure of the considered measurement matrix our RIP result is in fact optimal (up to possibly logarithmic factors). Our two further main results on nonuniform recovery (i.e., for a fixed sparse target scene) reveal how the fine structure of the support set affects the (nonuniform) recovery performance. We show that for certain "balanced" support sets reconstruction with essentially the optimal number of measurements is possible. We prove recovery results for both perfect recovery of the support set in case of exactly sparse vectors and an $\ell_2$-norm approximation result for reconstruction under sparsity defect. Our analysis complements earlier work by Strohmer & Friedlander and deepens the understanding of the considered MIMO radar model.
Submitted 11 September, 2015;
originally announced September 2015.
-
Stable low-rank matrix recovery via null space properties
Authors:
Maryia Kabanava,
Richard Kueng,
Holger Rauhut,
Ulrich Terstiege
Abstract:
The problem of recovering a matrix of low rank from an incomplete and possibly noisy set of linear measurements arises in a number of areas. In order to derive rigorous recovery results, the measurement map is usually modeled probabilistically. We derive sufficient conditions on the minimal number of measurements ensuring recovery via convex optimization. We establish our results via certain properties of the null space of the measurement map. In the setting where the measurements are realized as Frobenius inner products with independent standard Gaussian random matrices we show that $10 r (n_1 + n_2)$ measurements are enough to uniformly and stably recover an $n_1 \times n_2$ matrix of rank at most $r$. We then significantly generalize this result by only requiring independent mean-zero, variance one entries with four finite moments at the cost of replacing $10$ by some universal constant. We also study the case of recovering Hermitian rank-$r$ matrices from measurement matrices proportional to rank-one projectors. For $m \geq C r n$ rank-one projective measurements onto independent standard Gaussian vectors, we show that nuclear norm minimization uniformly and stably reconstructs Hermitian rank-$r$ matrices with high probability. Next, we partially de-randomize this by establishing an analogous statement for projectors onto independent elements of a complex projective 4-design at the cost of a slightly higher sampling rate $m \geq C rn \log n$. Moreover, if the Hermitian matrix to be recovered is known to be positive semidefinite, then we show that the nuclear norm minimization approach may be replaced by minimizing the $\ell_2$-norm of the residual subject to the positive semidefinite constraint. Then no estimate of the noise level is required a priori. We discuss applications in quantum physics and the phase retrieval problem.
Submitted 26 July, 2015;
originally announced July 2015.
-
Tensor theta norms and low rank recovery
Authors:
Holger Rauhut,
Željka Stojanac
Abstract:
We study extensions of compressive sensing and low rank matrix recovery to the recovery of low rank tensors from incomplete linear information. While the reconstruction of low rank matrices via nuclear norm minimization is rather well understood by now, almost no theory is available for the extension to higher order tensors due to various theoretical and computational difficulties arising for tensor decompositions. In fact, nuclear norm minimization for matrix recovery is a tractable convex relaxation approach, but the extension of the nuclear norm to tensors is in general NP-hard to compute. In this article, we introduce convex relaxations of the tensor nuclear norm which are computable in polynomial time via semidefinite programming. Our approach is based on theta bodies, a concept from computational algebraic geometry similar to the Lasserre relaxations. We introduce polynomial ideals which are generated by the second order minors corresponding to different matricizations of the tensor (where the tensor entries are treated as variables) such that the nuclear norm ball is the convex hull of the algebraic variety of the ideal. The $k$-th theta body for such an ideal generates a new norm which we call the $θ_k$-norm. We show that in the matrix case, these norms reduce to the nuclear norm. For tensors of order $d \geq 3$, however, we obtain new norms. The sequence of the corresponding unit-$θ_k$-norm balls converges asymptotically to the unit tensor nuclear norm ball. By providing the Gröbner basis for the ideals, we explicitly give semidefinite programs for the computation of the $θ_k$-norm and for the minimization of the $θ_k$-norm under an affine constraint. Numerical experiments for order-3 tensor recovery via $θ_1$-norm minimization suggest that our approach successfully reconstructs tensors of low rank from incomplete linear (random) measurements.
Submitted 14 February, 2017; v1 submitted 19 May, 2015;
originally announced May 2015.
-
Identification of Matrices having a Sparse Representation
Authors:
Götz E. Pfander,
Holger Rauhut,
Jared Tanner
Abstract:
We consider the problem of recovering a matrix from its action on a known vector in the setting where the matrix can be represented efficiently in a known matrix dictionary. Connections with sparse signal recovery allow for the use of efficient reconstruction techniques such as Basis Pursuit. Of particular interest is the dictionary of time-frequency shift matrices and its role for channel estimation and identification in communications engineering. We present recovery results for Basis Pursuit with the time-frequency shift dictionary and various dictionaries of random matrices.
Submitted 22 April, 2015;
originally announced April 2015.
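The connection with sparse recovery comes from linearity: if H = sum over (k, l) of c_{k,l} M^l T^k, then H g = Psi_g c, where the columns of Psi_g are the shifted and modulated copies M^l T^k g of the known vector g. The sketch below builds Psi_g for cyclic translations T and modulations M on C^n; the conventions (cyclic shifts, the sign of the exponent, column ordering) and the function names are assumptions for illustration.

```python
import numpy as np

def translation(n, k):
    """Cyclic shift T^k with (T^k g)_j = g_{(j - k) mod n}."""
    return np.roll(np.eye(n), k, axis=0)

def modulation(n, l):
    """Modulation M^l with (M^l g)_j = exp(2 pi i l j / n) g_j."""
    return np.diag(np.exp(2j * np.pi * l * np.arange(n) / n))

def tf_dictionary_applied(g):
    """n x n^2 matrix Psi_g whose column k * n + l is M^l T^k g."""
    n = len(g)
    cols = [modulation(n, l) @ translation(n, k) @ g for k in range(n) for l in range(n)]
    return np.stack(cols, axis=1)
```

Recovering a matrix H with few nonzero coefficients c_{k,l} from the single observation H g then becomes the underdetermined sparse recovery problem y = Psi_g c with n equations and n^2 unknowns, to which Basis Pursuit applies.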
-
On the gap between RIP-properties and sparse recovery conditions
Authors:
Sjoerd Dirksen,
Guillaume Lecué,
Holger Rauhut
Abstract:
We consider the problem of recovering sparse vectors from underdetermined linear measurements via $\ell_p$-constrained basis pursuit. Previous analyses of this problem based on generalized restricted isometry properties have suggested that two phenomena occur if $p\neq 2$. First, one may need substantially more than $s \log(en/s)$ measurements (optimal for $p=2$) for uniform recovery of all $s$-sparse vectors. Second, the matrix that achieves recovery with the optimal number of measurements may not be Gaussian (as for $p=2$). We present a new, direct analysis which shows that in fact neither of these phenomena occurs. Via a suitable version of the null space property we show that a standard Gaussian matrix provides $\ell_q/\ell_1$-recovery guarantees for $\ell_p$-constrained basis pursuit in the optimal measurement regime. Our result extends to several heavier-tailed measurement matrices. As an application, we show that one can obtain a consistent reconstruction from uniform scalar quantized measurements in the optimal measurement regime.
Submitted 20 April, 2015;
originally announced April 2015.
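To make "consistent reconstruction" concrete: a uniform scalar quantizer with step delta changes each measurement by at most delta/2, so a reconstruction z is consistent with the quantized data q when ||A z - q||_inf <= delta/2, which is an $\ell_\infty$-type constraint of the kind used in $\ell_p$-constrained basis pursuit. The sketch below only implements the quantizer and this consistency check, under the assumption of a midtread quantizer on the grid delta * Z; it does not solve the basis pursuit program, and the names are hypothetical.

```python
import numpy as np

def uniform_quantize(v, delta):
    """Uniform scalar quantizer onto the grid delta * Z; error per entry at most delta / 2."""
    return delta * np.round(v / delta)

def is_consistent(A, z, q, delta, tol=1e-12):
    """Check the l_infinity consistency constraint ||A z - q||_inf <= delta / 2."""
    return np.max(np.abs(A @ z - q)) <= delta / 2 + tol
```

The true signal always passes this check, so it is feasible for the $\ell_\infty$-constrained program; the recovery guarantee is what ensures that the $\ell_1$ minimizer over this feasible set is close to it.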
-
Low rank matrix recovery from rank one measurements
Authors:
Richard Kueng,
Holger Rauhut,
Ulrich Terstiege
Abstract:
We study the recovery of Hermitian low rank matrices $X \in \mathbb{C}^{n \times n}$ from undersampled measurements via nuclear norm minimization. We consider the particular scenario where the measurements are Frobenius inner products with random rank-one matrices of the form $a_j a_j^*$ for some measurement vectors $a_1,...,a_m$, i.e., the measurements are given by $y_j = \mathrm{tr}(X a_j a_j^*)$. The case where the matrix $X=x x^*$ to be recovered is of rank one reduces to the problem of phaseless estimation (from measurements $y_j = |\langle x,a_j\rangle|^2$) via the PhaseLift approach, which has been introduced recently. We derive bounds for the number $m$ of measurements that guarantee successful uniform recovery of Hermitian rank-$r$ matrices, either for the vectors $a_j$, $j=1,...,m$, being chosen independently at random according to a standard Gaussian distribution, or $a_j$ being sampled independently from an (approximate) complex projective $t$-design with $t=4$. In the Gaussian case, we require $m \geq C r n$ measurements, while in the case of $4$-designs we need $m \geq Cr n \log(n)$. Our results are uniform in the sense that one random choice of the measurement vectors $a_j$ guarantees recovery of all rank-$r$ matrices simultaneously with high probability. Moreover, we prove robustness of recovery under perturbation of the measurements by noise. The result for approximate $4$-designs generalizes and improves a recent bound on phase retrieval due to Gross, Kueng and Krahmer. In addition, it has applications in quantum state tomography. Our proofs employ the so-called bowling scheme which is based on recent ideas by Mendelson and Koltchinskii.
Submitted 25 October, 2014;
originally announced October 2014.
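The two measurement models in this abstract are linked by the identity $\mathrm{tr}(X a_j a_j^*) = a_j^* X a_j$, which for $X = x x^*$ equals $|\langle x, a_j\rangle|^2$. The NumPy sketch checks this numerically with complex Gaussian vectors $a_j$ stored as the rows of a matrix `A`; it is a hedged illustration, not the authors' code, and the function names `lifted_measurements` and `phaseless_measurements` are hypothetical. It covers only the measurement map, not the nuclear norm minimization used for recovery.

```python
import numpy as np

def lifted_measurements(X, A):
    """Return y_j = tr(X a_j a_j^*) for each row a_j of A (shape m x n)."""
    # tr(X a a^*) = a^* X a, so the n x n rank-one matrices never need to be formed.
    # For Hermitian X the result is real; np.real drops round-off imaginary parts.
    return np.real(np.einsum("ji,ik,jk->j", A.conj(), X, A))

def phaseless_measurements(x, A):
    """Return y_j = |<x, a_j>|^2 for each row a_j of A."""
    # (A.conj() @ x)_j = a_j^* x, whose squared modulus is |<x, a_j>|^2.
    return np.abs(A.conj() @ x) ** 2
```

For a rank-one $X = x x^*$ both functions return the same vector, which is exactly the reduction of phaseless estimation to the lifted (PhaseLift) problem.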
-
Compressive sensing Petrov-Galerkin approximation of high-dimensional parametric operator equations
Authors:
Holger Rauhut,
Christoph Schwab
Abstract:
We analyze the convergence of compressive-sensing-based sampling techniques for the efficient evaluation of functionals of solutions for a class of high-dimensional, affine-parametric, linear operator equations that depend on possibly infinitely many parameters. The proposed algorithms are based on so-called "non-intrusive" sampling of the high-dimensional parameter space, reminiscent of Monte Carlo sampling. In contrast to Monte Carlo, however, a functional of the parametric solution is then computed via compressive sensing methods from samples of functionals of the solution. A key ingredient of our analysis, of independent interest, is a generalization of recent results on the approximate sparsity of generalized polynomial chaos (gpc) representations of the parametric solution families, in terms of the gpc series with respect to tensorized Chebyshev polynomials. In particular, we establish sufficient conditions on the parametric inputs to the parametric operator equation such that the Chebyshev coefficients of the gpc expansion are contained in certain weighted $\ell_p$-spaces for $0<p\leq 1$. Based on this, we show that reconstructions of the parametric solutions computed from the sampled problems converge, with high probability, at the $L_2$ and $L_\infty$ convergence rates afforded by best $s$-term approximations of the parametric solution, up to logarithmic factors.
Submitted 21 September, 2015; v1 submitted 18 October, 2014;
originally announced October 2014.
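To make the sampling step concrete, here is a minimal sketch of a measurement matrix built from tensorized Chebyshev polynomials evaluated at random parameter samples. It assumes the samples are drawn from the tensorized Chebyshev (arcsine) measure on $[-1,1]^d$, for which $T_0$ and $\sqrt{2}\,T_k$ ($k \geq 1$) are orthonormal; the finite multi-index set and all function names are hypothetical, and the Petrov-Galerkin solves that would produce the sampled functionals are not shown.

```python
import numpy as np

def chebyshev_orthonormal(k, y):
    """T_k(y) = cos(k arccos y), scaled to be orthonormal w.r.t. the arcsine measure on [-1, 1]."""
    t = np.cos(k * np.arccos(np.clip(y, -1.0, 1.0)))
    return t if k == 0 else np.sqrt(2.0) * t

def sample_chebyshev_measure(m, d, rng):
    """Draw m points from the tensorized arcsine measure: y = cos(pi * U), U uniform on [0, 1]."""
    return np.cos(np.pi * rng.random((m, d)))

def sampling_matrix(points, multi_indices):
    """Phi[i, l] = prod_j T_{nu_l[j]}(points[i, j]) for each multi-index nu_l (hypothetical index set)."""
    m = points.shape[0]
    Phi = np.ones((m, len(multi_indices)))
    for col, nu in enumerate(multi_indices):
        for j, k in enumerate(nu):
            Phi[:, col] *= chebyshev_orthonormal(k, points[:, j])
    return Phi
```

With many samples, $\Phi^\top \Phi / m$ is close to the identity, reflecting the orthonormality of the tensorized Chebyshev system with respect to the sampling measure.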
-
Uniform recovery of fusion frame structured sparse signals
Authors:
Ulaş Ayaz,
Sjoerd Dirksen,
Holger Rauhut
Abstract:
We consider the problem of recovering fusion frame sparse signals from incomplete measurements. These signals are composed of a small number of nonzero blocks taken from a family of subspaces. First, we show that, by using a priori knowledge of a coherence parameter associated with the angles between the subspaces, one can uniformly recover fusion frame sparse signals with a significantly reduced number of vector-valued (sub-)Gaussian measurements via mixed $\ell_1/\ell_2$-minimization. We prove this by establishing an appropriate version of the restricted isometry property. Our result complements previous nonuniform recovery results in this context and provides stronger stability guarantees for noisy measurements and approximately sparse signals. Second, we determine the minimal number of scalar-valued measurements needed to uniformly recover all fusion frame sparse signals via mixed $\ell_1/\ell_2$-minimization. This bound is achieved by scalar-valued subgaussian measurements. In particular, our result shows that the number of scalar-valued subgaussian measurements cannot be further reduced using knowledge of the coherence parameter. As a special case, it implies that the best known uniform recovery result for block sparse signals using subgaussian measurements is optimal.
Submitted 29 July, 2014;
originally announced July 2014.
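The mixed $\ell_1/\ell_2$ norm sums the Euclidean norms of the blocks, so minimizing it promotes signals with few nonzero blocks. A minimal sketch, assuming the blocks are stored as the rows of an array and ignoring the subspace structure of the fusion frame: `mixed_l1_l2_norm` evaluates the norm and `block_soft_threshold` is its proximal map (block soft thresholding), a building block of generic proximal-gradient solvers for this convex program. Both names are hypothetical, and this is not a method taken from the paper.

```python
import numpy as np

def mixed_l1_l2_norm(X):
    """Sum over blocks (rows of X) of the Euclidean norm of each block."""
    return np.linalg.norm(X, axis=1).sum()

def block_soft_threshold(X, tau):
    """Proximal map of tau * mixed_l1_l2_norm: shrink each block's norm by tau, zeroing small blocks."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Guard against division by zero for all-zero blocks (their scale becomes 0).
    scale = np.maximum(1.0 - tau / np.maximum(norms, np.finfo(float).tiny), 0.0)
    return scale * X
```

Each block is either shrunk toward zero along its own direction or set to zero entirely, which is why the mixed norm selects whole blocks rather than individual entries.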
-
Robust analysis $\ell_1$-recovery from Gaussian measurements and total variation minimization
Authors:
Maryia Kabanava,
Holger Rauhut,
Hui Zhang
Abstract:
Analysis $\ell_1$-recovery refers to a technique for recovering a signal that is sparse in some transform domain from incomplete, corrupted measurements. This includes total variation minimization as an important special case, in which the transform is generated by a difference operator. In the present paper we provide a bound on the number of Gaussian measurements required for successful recovery, both for total variation minimization and for the case where the analysis operator is a frame. The bounds are particularly suitable when the sparsity of the analysis representation of the signal is not very small.
Submitted 27 April, 2015; v1 submitted 28 July, 2014;
originally announced July 2014.
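In the total variation special case, the analysis operator is a finite-difference matrix, so the analysis representation of a piecewise constant signal is nonzero only at its jumps. The sketch assumes a one-dimensional forward-difference operator (the paper may treat other settings, such as images); `difference_operator`, `analysis_l1` and `cosparsity` are hypothetical helper names.

```python
import numpy as np

def difference_operator(n):
    """(n-1) x n forward-difference matrix D with (D x)_i = x_{i+1} - x_i."""
    return np.eye(n - 1, n, k=1) - np.eye(n - 1, n)

def analysis_l1(Omega, x):
    """Analysis l1 objective ||Omega x||_1; with Omega = D this is the 1-D total variation."""
    return np.abs(Omega @ x).sum()

def cosparsity(Omega, x, tol=1e-12):
    """Number of (numerically) zero entries of Omega x."""
    return int(np.sum(np.abs(Omega @ x) <= tol))
```

For example, the signal $(0,0,1,1,1,3)$ has differences $(0,1,0,0,2)$, hence total variation $3$ and cosparsity $3$.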
-
Tensor completion in hierarchical tensor representations
Authors:
Holger Rauhut,
Reinhold Schneider,
Zeljka Stojanac
Abstract:
Compressed sensing extends from the recovery of sparse vectors from undersampled measurements via efficient algorithms to the recovery of matrices of low rank from incomplete information. Here we consider a further extension to the reconstruction of tensors of low multi-linear rank in recently introduced hierarchical tensor formats from a small number of measurements. Hierarchical tensors are a flexible generalization of the well-known Tucker representation and have the advantage that the number of degrees of freedom of a low-rank tensor does not scale exponentially with the order of the tensor. While corresponding tensor decompositions can be computed efficiently via successive applications of (matrix) singular value decompositions, some important properties of the singular value decomposition do not extend from the matrix to the tensor case. This results in major computational and theoretical difficulties in designing and analyzing algorithms for low-rank tensor recovery. For instance, a canonical analogue of the tensor nuclear norm is NP-hard to compute in general, which is in stark contrast to the matrix case. In this book chapter we consider versions of iterative hard thresholding schemes adapted to hierarchical tensor formats. A variant builds on methods from Riemannian optimization and uses a retraction map from the tangent space of the manifold of low-rank tensors back to this manifold. We provide first partial convergence results based on a tensor version of the restricted isometry property (TRIP) of the measurement map. Moreover, an estimate of the number of measurements is provided that ensures the TRIP of a given tensor rank with high probability for Gaussian measurement maps.
Submitted 3 November, 2014; v1 submitted 15 April, 2014;
originally announced April 2014.
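The abstract notes that tensor decompositions can be computed via successive matrix SVDs. As a hedged illustration of that idea, the sketch uses the truncated higher-order SVD for the Tucker format, which the abstract names as the representation that hierarchical tensors generalize. It is neither the hierarchical format itself nor the iterative hard thresholding schemes of the chapter, and all function names are hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: move axis k to the front and flatten the remaining axes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along axis `mode`."""
    out = np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def truncated_hosvd(T, ranks):
    """Tucker approximation: leading left singular vectors of each unfolding, then project."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.conj().T, mode)
    return core, factors

def reconstruct(core, factors):
    """Expand a Tucker core back to a full tensor."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_product(T, U, mode)
    return T
```

A tensor of exact multilinear rank $(2,2,2)$ is reproduced exactly by the rank-$(2,2,2)$ truncation; for general tensors the truncated HOSVD is only quasi-optimal.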
-
Structured random measurements in signal processing
Authors:
Felix Krahmer,
Holger Rauhut
Abstract:
Compressed sensing and its extensions have recently triggered interest in randomized signal acquisition. A key finding is that random measurements provide sparse signal reconstruction guarantees for efficient and stable algorithms with a minimal number of samples. While this was first shown for (unstructured) Gaussian random measurement matrices, applications require the measurements to have a certain structure, which leads to structured random measurement matrices. Near-optimal recovery guarantees for such structured measurements have been developed over the past years in a variety of contexts. This article surveys the theory in three scenarios: compressed sensing (sparse recovery), low-rank matrix recovery, and phaseless estimation. The measurement setups considered include random partial Fourier matrices, partial random circulant matrices (subsampled convolutions), matrix completion, and phase estimation from magnitudes of Fourier-type measurements. The article concludes with a brief discussion of the mathematical techniques for the analysis of such structured random measurements.
Submitted 6 July, 2014; v1 submitted 6 January, 2014;
originally announced January 2014.
-
Interpolation via weighted $l_1$ minimization
Authors:
Holger Rauhut,
Rachel Ward
Abstract:
Functions of interest are often smooth and sparse in some sense, and both priors should be taken into account when interpolating sampled data. Classical linear interpolation methods are effective under strong regularity assumptions, but cannot incorporate nonlinear sparsity structure. At the same time, nonlinear methods such as $l_1$ minimization can reconstruct sparse functions from very few samples, but do not necessarily encourage smoothness. Here we show that weighted $l_1$ minimization effectively merges the two approaches, promoting both sparsity and smoothness in reconstruction. More precisely, we provide specific choices of weights in the $l_1$ objective to achieve approximation rates for functions with coefficient sequences in weighted $l_p$ spaces, $p \leq 1$. We consider the implications of these results for spherical harmonic and polynomial interpolation, in the univariate and multivariate setting. Along the way, we extend concepts from compressive sensing such as the restricted isometry property and null space property to accommodate weighted sparse expansions; these developments should be of independent interest in the study of structured sparse approximations and continuous-time compressive sensing problems.
Submitted 26 March, 2015; v1 submitted 3 August, 2013;
originally announced August 2013.
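In weighted $l_1$ minimization each coefficient is penalized by $w_j |x_j|$, so larger weights on high-degree coefficients discourage rough components while the $l_1$ structure still promotes sparsity. The sketch shows the weighted norm and its proximal map (weighted soft thresholding). The polynomial growth rule in `degree_weights` is a hypothetical choice for illustration, not the specific weights derived in the paper, and all function names are hypothetical.

```python
import numpy as np

def degree_weights(num_coeffs, alpha):
    """Hypothetical weights w_j = (1 + j)^alpha that grow with the polynomial degree j."""
    return (1.0 + np.arange(num_coeffs)) ** alpha

def weighted_l1_norm(x, w):
    """||x||_{w,1} = sum_j w_j |x_j| for positive weights w_j."""
    return np.sum(w * np.abs(x))

def weighted_soft_threshold(z, w, tau):
    """Proximal map of tau * ||.||_{w,1}: coordinate j is shrunk toward 0 by tau * w_j."""
    return np.sign(z) * np.maximum(np.abs(z) - tau * w, 0.0)
```

With equal inputs, a coefficient carrying a larger weight is shrunk more strongly, so high-degree terms survive only if the data strongly support them.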
-
Analysis $\ell_1$-recovery with frames and Gaussian measurements
Authors:
Holger Rauhut,
Maryia Kabanava
Abstract:
This paper provides novel results for the recovery of signals from undersampled measurements based on analysis $\ell_1$-minimization, when the analysis operator is given by a frame. We provide both so-called uniform and nonuniform recovery guarantees for cosparse (analysis-sparse) signals using Gaussian random measurement matrices. The nonuniform result relies on a recovery condition via tangent cones, and the uniform recovery guarantee is based on an analysis version of the null space property. Examining these conditions for Gaussian random matrices leads to precise bounds on the number of measurements required for successful recovery. In the special case of standard sparsity, our result improves the constant in a bound due to Rudelson and Vershynin concerning the exact reconstruction of sparse signals from Gaussian measurements, and extends it to stability under passing to approximately sparse signals and to robustness under noise on the measurements.
Submitted 3 November, 2014; v1 submitted 6 June, 2013;
originally announced June 2013.
-
Fast and RIP-optimal transforms
Authors:
Nir Ailon,
Holger Rauhut
Abstract:
We study constructions of $k \times n$ matrices $A$ that both (1) satisfy the restricted isometry property (RIP) at sparsity $s$ with optimal parameters, and (2) are efficient in the sense that only $O(n\log n)$ operations are required to compute $Ax$ given a vector $x$. Our construction is based on repeated application of independent transformations of the form $DH$, where $H$ is a Hadamard or Fourier transform and $D$ is a diagonal matrix with random $\{+1,-1\}$ elements on the diagonal, followed by any $k \times n$ matrix of orthonormal rows (e.g., selection of $k$ coordinates). We provide guarantees (1) and (2) for a larger regime of parameters for which such constructions were previously unknown. Additionally, our construction does not suffer from the extra poly-logarithmic factor multiplying the number of observations $k$ as a function of the sparsity $s$, as present in the currently best known RIP estimates for partial random Fourier matrices and other classes of structured random matrices.
Submitted 17 February, 2013; v1 submitted 5 January, 2013;
originally announced January 2013.
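The $O(n \log n)$ cost comes from applying $H$ as a fast transform. As a minimal sketch, assuming $n$ is a power of two and using the orthonormal Walsh-Hadamard transform in Sylvester ordering for $H$, the code applies $r$ rounds of $x \mapsto D_i H x$ with independent random sign diagonals $D_i$ and then keeps the first $k$ coordinates. The number of rounds, the choice of coordinates, and the omitted normalization are assumptions, and the function names are hypothetical.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (Sylvester ordering) in O(n log n) operations."""
    x = np.array(x, dtype=float)
    n = x.shape[0]
    if n & (n - 1):
        raise ValueError("length must be a power of 2")
    h = 1
    while h < n:
        # Butterfly step on blocks of size 2h: (a, b) -> (a + b, a - b).
        x = x.reshape(-1, 2, h)
        x = np.stack((x[:, 0] + x[:, 1], x[:, 0] - x[:, 1]), axis=1).reshape(n)
        h *= 2
    return x / np.sqrt(n)

def make_signs(n, rounds, rng):
    """Diagonals of the random sign matrices D_1, ..., D_r."""
    return rng.choice([-1.0, 1.0], size=(rounds, n))

def apply_fast_transform(x, signs, k):
    """Apply D_r H ... D_1 H to x, then keep the first k coordinates."""
    y = np.asarray(x, dtype=float)
    for d in signs:
        y = d * fwht(y)
    return y[:k]
```

Each round is orthogonal, so with $k = n$ the map preserves Euclidean norms; only the final coordinate selection reduces the dimension.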
-
Suprema of Chaos Processes and the Restricted Isometry Property
Authors:
Felix Krahmer,
Shahar Mendelson,
Holger Rauhut
Abstract:
We present a new bound for suprema of a special type of chaos processes indexed by a set of matrices, which is based on a chaining method. As applications we show significantly improved estimates for the restricted isometry constants of partial random circulant matrices and time-frequency structured random matrices. In both cases the required condition on the number $m$ of rows in terms of the sparsity $s$ and the vector length $n$ is $m \gtrsim s \log^2 s \log^2 n$.
Submitted 19 September, 2013; v1 submitted 1 July, 2012;
originally announced July 2012.
-
Remote sensing via $\ell_1$ minimization
Authors:
Max Hügel,
Holger Rauhut,
Thomas Strohmer
Abstract:
We consider the problem of detecting the locations of targets in the far field by sending probing signals from an antenna array and recording the reflected echoes. Drawing on key concepts from the area of compressive sensing, we use an $\ell_1$-based regularization approach to solve this inverse scattering problem, which is ill-posed in general. As is common in compressed sensing, we exploit randomness, which in this context comes from choosing the antenna locations at random. With $n$ antennas we obtain $n^2$ measurements of a vector $x \in \mathbb{C}^{N}$ representing the target locations and reflectivities on a discretized grid. It is common to assume that the scene $x$ is sparse due to a limited number of targets. Under a natural condition on the mesh size of the grid, we show that an $s$-sparse scene can be recovered via $\ell_1$-minimization with high probability if $n^2 \geq C s \log^2(N)$. The reconstruction is stable under noise and under passing from sparse to approximately sparse vectors. Our theoretical findings are confirmed by numerical simulations.
Submitted 24 April, 2013; v1 submitted 7 May, 2012;
originally announced May 2012.
-
The restricted isometry property for time-frequency structured random matrices
Authors:
Götz E. Pfander,
Holger Rauhut,
Joel A. Tropp
Abstract:
We establish the restricted isometry property for finite dimensional Gabor systems, that is, for families of time-frequency shifts of a randomly chosen window function. We show that the $s$-th order restricted isometry constant of the associated $n \times n^2$ Gabor synthesis matrix is small provided $s \leq c \, n^{2/3} / \log^2 n$. This improves on previous estimates that exhibit quadratic scaling of $n$ in $s$. Our proof develops bounds for a corresponding chaos process.
Submitted 16 June, 2011;
originally announced June 2011.
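Each column of the $n \times n^2$ Gabor synthesis matrix is a time-frequency shift $M_\ell T_k g$ of the window $g$, with $(T_k g)_t = g_{(t-k) \bmod n}$ and $(M_\ell g)_t = e^{2\pi i \ell t/n} g_t$. The sketch builds this matrix for a random window; drawing the entries uniformly from the complex unit circle (a Steinhaus window) and scaling so that $\|g\|_2 = 1$ is an assumption here, and `gabor_synthesis_matrix` is a hypothetical name.

```python
import numpy as np

def gabor_synthesis_matrix(g):
    """n x n^2 matrix whose column (k, l) is the time-frequency shift M_l T_k g."""
    n = g.shape[0]
    t = np.arange(n)
    cols = []
    for k in range(n):
        shifted = np.roll(g, k)  # (T_k g)_t = g_{(t - k) mod n}
        for l in range(n):
            cols.append(np.exp(2j * np.pi * l * t / n) * shifted)  # modulation M_l
    return np.stack(cols, axis=1)

def steinhaus_window(n, rng):
    """Random window with entries e^{2 pi i theta} / sqrt(n), so that ||g||_2 = 1."""
    return np.exp(2j * np.pi * rng.random(n)) / np.sqrt(n)
```

The full system of all $n^2$ shifts is a tight frame, so $G G^* = n \|g\|_2^2 I$; the restricted isometry property concerns the much finer behavior of small subsets of columns.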
-
Sparse recovery for spherical harmonic expansions
Authors:
Holger Rauhut,
Rachel Ward
Abstract:
We show that sparse spherical harmonic expansions can be efficiently recovered from a small number of randomly chosen samples on the sphere. To establish the main result, we verify the restricted isometry property of an associated preconditioned random measurement matrix using recent estimates on the uniform growth of Jacobi polynomials.
Submitted 20 February, 2011;
originally announced February 2011.
-
Low-rank matrix recovery via iteratively reweighted least squares minimization
Authors:
Massimo Fornasier,
Holger Rauhut,
Rachel Ward
Abstract:
We present and analyze an efficient implementation of an iteratively reweighted least squares algorithm for recovering a matrix from a small number of linear measurements. The algorithm is designed for the simultaneous promotion of both a minimal nuclear norm and an approximately low-rank solution. Under the assumption that the linear measurements fulfill a suitable generalization of the Null Space Property known in the context of compressed sensing, the algorithm is guaranteed to recover iteratively any matrix with an error of the order of the best rank-$k$ approximation. In certain relevant cases, for instance for the matrix completion problem, our version of this algorithm can take advantage of the Woodbury matrix identity, which allows one to expedite the solution of the least squares problems required at each iteration. We present numerical experiments that confirm the robustness of the algorithm for the solution of matrix completion problems and demonstrate its competitiveness with respect to other techniques proposed recently in the literature.
Submitted 15 July, 2011; v1 submitted 12 October, 2010;
originally announced October 2010.
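A minimal sketch of an IRLS iteration for real-valued matrix completion, in the spirit of the abstract but not the authors' implementation: it alternates between the weight $W = (X X^\top + \varepsilon^2 I)^{-1/2}$ and the minimizer of $\mathrm{tr}(W X X^\top)$ subject to the observed entries, which decouples into one small linear solve per column. The $\varepsilon$-update rule with parameter `gamma`, the iteration count, and the name `irls_matrix_completion` are assumptions, and the sketch solves each system directly instead of using the Woodbury identity.

```python
import numpy as np

def irls_matrix_completion(M_obs, mask, rank, iters=100, eps=1.0, gamma=0.5, eps_min=1e-6):
    """Sketch of IRLS for matrix completion; entries where mask is True stay fixed."""
    n1, n2 = M_obs.shape
    X = np.where(mask, M_obs, 0.0).astype(float)
    for _ in range(iters):
        # Weight W = (X X^T + eps^2 I)^(-1/2) via a symmetric eigendecomposition.
        evals, evecs = np.linalg.eigh(X @ X.T + eps**2 * np.eye(n1))
        W = (evecs * evals ** -0.5) @ evecs.T
        # tr(W X X^T) = sum_c x_c^T W x_c, so each column is a separate quadratic problem:
        # minimize x^T W x with x[obs] fixed  =>  x[free] = -W_ff^{-1} W_fo x[obs].
        for c in range(n2):
            obs, free = mask[:, c], ~mask[:, c]
            if free.any():
                rhs = W[np.ix_(free, obs)] @ M_obs[obs, c]
                X[free, c] = -np.linalg.solve(W[np.ix_(free, free)], rhs)
        # Assumed update: shrink eps using the (rank+1)-th singular value, with a floor.
        sigma = np.linalg.svd(X, compute_uv=False)
        if rank < sigma.size:
            eps = max(min(eps, gamma * sigma[rank]), eps_min)
    return X
```

The observed entries are never modified; only the unobserved ones are re-estimated, each time with weights that favor the dominant singular directions of the current iterate.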
-
Restricted Isometries for Partial Random Circulant Matrices
Authors:
Holger Rauhut,
Justin Romberg,
Joel A. Tropp
Abstract:
In the theory of compressed sensing, restricted isometry analysis has become a standard tool for studying how efficiently a measurement matrix acquires information about sparse and compressible signals. Many recovery algorithms are known to succeed when the restricted isometry constants of the sampling matrix are small. Many potential applications of compressed sensing involve a data-acquisition process that proceeds by convolution with a random pulse followed by (nonrandom) subsampling. At present, the theoretical analysis of this measurement technique is lacking. This paper demonstrates that the $s$th order restricted isometry constant is small when the number $m$ of samples satisfies $m \gtrsim (s \log n)^{3/2}$, where $n$ is the length of the pulse. This bound improves on previous estimates, which exhibit quadratic scaling.
Submitted 9 October, 2010;
originally announced October 2010.
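Convolution with a random pulse followed by subsampling can be applied in $O(n \log n)$ time with the FFT, without forming the matrix. The sketch, assuming a Rademacher ($\pm 1$) pulse and a fixed index set of retained samples, checks the FFT route against the explicit circulant matrix; normalization (such as a factor $1/\sqrt{m}$) is omitted, and the function names are hypothetical.

```python
import numpy as np

def circulant(b):
    """Explicit n x n circulant matrix C with C[i, j] = b[(i - j) mod n]."""
    n = b.shape[0]
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return b[idx]

def partial_circulant_apply(b, x, rows):
    """Circular convolution b * x via the FFT, restricted to the index set `rows` (real b, x)."""
    return np.real(np.fft.ifft(np.fft.fft(b) * np.fft.fft(x)))[rows]
```

The explicit matrix is only useful for checking; in practice the FFT route is what makes such measurements cheap to apply.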