-
Solving Inverse Problems with Model Mismatch using Untrained Neural Networks within Model-based Architectures
Authors:
Peimeng Guan,
Naveed Iqbal,
Mark A. Davenport,
Mudassir Masood
Abstract:
Model-based deep learning methods such as loop unrolling (LU) and deep equilibrium model (DEQ) extensions offer outstanding performance in solving inverse problems (IP). These methods unroll the optimization iterations into a sequence of neural networks that in effect learn a regularization function from data. While these architectures are currently state-of-the-art in numerous applications, their success heavily relies on the accuracy of the forward model. This assumption can be limiting in many physical applications due to model simplifications or uncertainties in the apparatus. To address forward model mismatch, we introduce an untrained forward model residual block within the model-based architecture to match the data consistency in the measurement domain for each instance. We propose two variants in well-known model-based architectures (LU and DEQ) and prove convergence under mild conditions. Our approach offers a unified solution that is less parameter-sensitive, requires no additional data, and enables simultaneous fitting of the forward model and reconstruction in a single pass, benefiting both linear and nonlinear inverse problems. The experiments show significant quality improvement in removing artifacts and preserving details across three distinct applications, encompassing both linear and nonlinear inverse problems. Moreover, we highlight reconstruction effectiveness in intermediate steps and showcase robustness to random initialization of the residual block and a higher number of iterations during evaluation. Code is available at https://github.com/InvProbs/A-adaptive-model-based-methods.
Submitted 10 June, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Perceptual adjustment queries and an inverted measurement paradigm for low-rank metric learning
Authors:
Austin Xu,
Andrew D. McRae,
Jingyan Wang,
Mark A. Davenport,
Ashwin Pananjady
Abstract:
We introduce a new type of query mechanism for collecting human feedback, called the perceptual adjustment query (PAQ). Being both informative and cognitively lightweight, the PAQ adopts an inverted measurement scheme, and combines advantages from both cardinal and ordinal queries. We showcase the PAQ in the metric learning problem, where we collect PAQ measurements to learn an unknown Mahalanobis distance. This gives rise to a high-dimensional, low-rank matrix estimation problem to which standard matrix estimators cannot be applied. Consequently, we develop a two-stage estimator for metric learning from PAQs, and provide sample complexity guarantees for this estimator. We present numerical simulations demonstrating the performance of the estimator and its notable properties.
Submitted 8 September, 2023;
originally announced September 2023.
-
New Equivalences Between Interpolation and SVMs: Kernels and Structured Features
Authors:
Chiraag Kaushik,
Andrew D. McRae,
Mark A. Davenport,
Vidya Muthukumar
Abstract:
The support vector machine (SVM) is a supervised learning algorithm that finds a maximum-margin linear classifier, often after mapping the data to a high-dimensional feature space via the kernel trick. Recent work has demonstrated that in certain sufficiently overparameterized settings, the SVM decision function coincides exactly with the minimum-norm label interpolant. This phenomenon of support vector proliferation (SVP) is especially interesting because it allows us to understand SVM performance by leveraging recent analyses of harmless interpolation in linear and kernel models. However, previous work on SVP has made restrictive assumptions on the data/feature distribution and spectrum. In this paper, we present a new and flexible analysis framework for proving SVP in an arbitrary reproducing kernel Hilbert space with a flexible class of generative models for the labels. We present conditions for SVP for features in the families of general bounded orthonormal systems (e.g. Fourier features) and independent sub-Gaussian features. In both cases, we show that SVP occurs in many interesting settings not covered by prior work, and we leverage these results to prove novel generalization results for kernel SVM classification.
Submitted 3 May, 2023;
originally announced May 2023.
-
Loop Unrolled Shallow Equilibrium Regularizer (LUSER) -- A Memory-Efficient Inverse Problem Solver
Authors:
Peimeng Guan,
Jihui Jin,
Justin Romberg,
Mark A. Davenport
Abstract:
In inverse problems we aim to reconstruct some underlying signal of interest from potentially corrupted and often ill-posed measurements. Classical optimization-based techniques proceed by optimizing a data consistency metric together with a regularizer. Current state-of-the-art machine learning approaches draw inspiration from such techniques by unrolling the iterative updates for an optimization-based solver and then learning a regularizer from data. This loop unrolling (LU) method has shown tremendous success, but often requires a deep model for the best performance leading to high memory costs during training. Thus, to address the balance between computation cost and network expressiveness, we propose an LU algorithm with shallow equilibrium regularizers (LUSER). These implicit models are as expressive as deeper convolutional networks, but far more memory efficient during training. The proposed method is evaluated on image deblurring, computed tomography (CT), as well as single-coil Magnetic Resonance Imaging (MRI) tasks and shows similar, or even better, performance while requiring up to 8 times less computational resources during training when compared against a more typical LU architecture with feedforward convolutional regularizers.
Submitted 13 October, 2022; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Learning Sinkhorn divergences for supervised change point detection
Authors:
Nauman Ahad,
Eva L. Dyer,
Keith B. Hengen,
Yao Xie,
Mark A. Davenport
Abstract:
Many modern applications require detecting change points in complex sequential data. Most existing methods for change point detection are unsupervised and, as a consequence, lack any information regarding what kind of changes we want to detect or if some kinds of changes are safe to ignore. This often results in poor change detection performance. We present a novel change point detection framework that uses true change point instances as supervision for learning a ground metric such that Sinkhorn divergences can then be used in two-sample tests on sliding windows to detect change points in an online manner. Our method can be used to learn a sparse metric which can be useful for both feature selection and interpretation in high-dimensional change point detection settings. Experiments on simulated as well as real world sequences show that our proposed method can substantially improve change point detection performance over existing unsupervised change point detection methods using only a few labeled change point instances.
Submitted 10 February, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
-
Active metric learning and classification using similarity queries
Authors:
Namrata Nadagouda,
Austin Xu,
Mark A. Davenport
Abstract:
Active learning is commonly used to train label-efficient models by adaptively selecting the most informative queries. However, most active learning strategies are designed to either learn a representation of the data (e.g., embedding or metric learning) or perform well on a task (e.g., classification) on the data. Yet many machine learning tasks involve a combination of both representation learning and a task-specific goal. Motivated by this, we propose a novel unified query framework that can be applied to any problem in which a key component is learning a representation of the data that reflects similarity. Our approach builds on similarity or nearest neighbor (NN) queries which seek to select samples that result in improved embeddings. The queries consist of a reference and a set of objects, with an oracle selecting the object most similar (i.e., nearest) to the reference. In order to reduce the number of solicited queries, they are chosen adaptively according to an information theoretic criterion. We demonstrate the effectiveness of the proposed strategy on two tasks -- active metric learning and active classification -- using a variety of synthetic and real world datasets. In particular, we demonstrate that actively selected NN queries outperform recently developed active triplet selection methods in a deep metric learning setting. Further, we show that in classification, actively selecting class labels can be reformulated as a process of selecting the most informative NN query, allowing direct application of our method.
Submitted 3 February, 2022;
originally announced February 2022.
-
Harmless interpolation in regression and classification with structured features
Authors:
Andrew D. McRae,
Santhosh Karnik,
Mark A. Davenport,
Vidya Muthukumar
Abstract:
Overparametrized neural networks tend to perfectly fit noisy training data yet generalize well on test data. Inspired by this empirical observation, recent work has sought to understand this phenomenon of benign overfitting or harmless interpolation in the much simpler linear model. Previous theoretical work critically assumes that either the data features are statistically independent or the input data is high-dimensional; this precludes general nonparametric settings with structured feature maps. In this paper, we present a general and flexible framework for upper bounding regression and classification risk in a reproducing kernel Hilbert space. A key contribution is that our framework describes precise sufficient conditions on the data Gram matrix under which harmless interpolation occurs. Our results recover prior independent-features results (with a much simpler analysis), but they furthermore show that harmless interpolation can occur in more general settings such as features that are a bounded orthonormal system. Furthermore, our results show an asymptotic separation between classification and regression performance in a manner that was previously only shown for Gaussian features.
Submitted 21 February, 2022; v1 submitted 9 November, 2021;
originally announced November 2021.
-
Deep inference of latent dynamics with spatio-temporal super-resolution using selective backpropagation through time
Authors:
Feng Zhu,
Andrew R. Sedler,
Harrison A. Grier,
Nauman Ahad,
Mark A. Davenport,
Matthew T. Kaufman,
Andrea Giovannucci,
Chethan Pandarinath
Abstract:
Modern neural interfaces allow access to the activity of up to a million neurons within brain circuits. However, bandwidth limits often create a trade-off between greater spatial sampling (more channels or pixels) and the temporal frequency of sampling. Here we demonstrate that it is possible to obtain spatio-temporal super-resolution in neuronal time series by exploiting relationships among neurons, embedded in latent low-dimensional population dynamics. Our novel neural network training strategy, selective backpropagation through time (SBTT), enables learning of deep generative models of latent dynamics from data in which the set of observed variables changes at each time step. The resulting models are able to infer activity for missing samples by combining observations with learned latent dynamics. We test SBTT applied to sequential autoencoders and demonstrate more efficient and higher-fidelity characterization of neural population dynamics in electrophysiological and calcium imaging data. In electrophysiology, SBTT enables accurate inference of neuronal population dynamics with lower interface bandwidths, providing an avenue to significant power savings for implanted neuroelectronic interfaces. In applications to two-photon calcium imaging, SBTT accurately uncovers high-frequency temporal structure underlying neural population activity, substantially outperforming the current state-of-the-art. Finally, we demonstrate that performance could be further improved by using limited, high-bandwidth sampling to pretrain dynamics models, and then using SBTT to adapt these models for sparsely-sampled data.
Submitted 29 October, 2021;
originally announced November 2021.
-
Thomson's Multitaper Method Revisited
Authors:
Santhosh Karnik,
Justin Romberg,
Mark A. Davenport
Abstract:
Thomson's multitaper method estimates the power spectrum of a signal from $N$ equally spaced samples by averaging $K$ tapered periodograms. Discrete prolate spheroidal sequences (DPSS) are used as tapers since they provide excellent protection against spectral leakage. Thomson's multitaper method is widely used in applications, but most of the existing theory is qualitative or asymptotic. Furthermore, many practitioners use a DPSS bandwidth $W$ and number of tapers that are smaller than what the theory suggests is optimal because the computational requirements increase with the number of tapers. We revisit Thomson's multitaper method from a linear algebra perspective involving subspace projections. This provides additional insight and helps us establish nonasymptotic bounds on some statistical properties of the multitaper spectral estimate, which are similar to existing asymptotic results. We show using $K=2NW-O(\log(NW))$ tapers instead of the traditional $2NW-O(1)$ tapers better protects against spectral leakage, especially when the power spectrum has a high dynamic range. Our perspective also allows us to derive an $\epsilon$-approximation to the multitaper spectral estimate which can be evaluated on a grid of frequencies using $O(\log(NW)\log\tfrac{1}{\epsilon})$ FFTs instead of $K=O(NW)$ FFTs. This is useful in problems where many samples are taken, and thus, using many tapers is desirable.
Submitted 22 March, 2021;
originally announced March 2021.
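The "average of $K$ tapered periodograms" described in this abstract can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: to stay within the Python standard library it uses orthonormal sine tapers as a stand-in for the DPSS tapers the abstract refers to, and the function names (`sine_tapers`, `tapered_periodogram`, `multitaper_estimate`) and the test signal are assumptions made here.

```python
import cmath
import math


def sine_tapers(N, K):
    """K orthonormal sine tapers of length N (an assumed stand-in for DPSS tapers)."""
    scale = math.sqrt(2.0 / (N + 1))
    return [
        [scale * math.sin(math.pi * (k + 1) * (n + 1) / (N + 1)) for n in range(N)]
        for k in range(K)
    ]


def tapered_periodogram(x, taper, f):
    """|sum_n taper[n] * x[n] * exp(-2j*pi*f*n)|^2 at normalized frequency f."""
    s = sum(
        t * xn * cmath.exp(-2j * math.pi * f * n)
        for n, (t, xn) in enumerate(zip(taper, x))
    )
    return abs(s) ** 2


def multitaper_estimate(x, K, freqs):
    """Average the K tapered periodograms at every frequency in freqs."""
    tapers = sine_tapers(len(x), K)
    return [sum(tapered_periodogram(x, v, f) for v in tapers) / K for f in freqs]


# Hypothetical test signal: a single cosine at 0.25 cycles/sample.
N, K = 64, 4
x = [math.cos(2 * math.pi * 0.25 * n) for n in range(N)]
freqs = [i / 16 for i in range(9)]  # coarse grid from 0 to 0.5 cycles/sample
S = multitaper_estimate(x, K, freqs)
peak = freqs[max(range(len(S)), key=S.__getitem__)]
print(peak)
```

Each frequency is evaluated with a direct sum for clarity; a practical implementation would use one FFT per taper, which is the per-taper cost the abstract's $O(\log(NW)\log\tfrac{1}{\epsilon})$ result aims to reduce.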
-
Semi-supervised sequence classification through change point detection
Authors:
Nauman Ahad,
Mark A. Davenport
Abstract:
Sequential sensor data is generated in a wide variety of practical applications. A fundamental challenge involves learning effective classifiers for such sequential data. While deep learning has led to impressive performance gains in recent years in domains such as speech, this has relied on the availability of large datasets of sequences with high-quality labels. In many applications, however, the associated class labels are often extremely limited, with precise labelling/segmentation being too expensive to perform at a high volume. However, large amounts of unlabeled data may still be available. In this paper we propose a novel framework for semi-supervised learning in such contexts. In an unsupervised manner, change point detection methods can be used to identify points within a sequence corresponding to likely class changes. We show that change points provide examples of similar/dissimilar pairs of sequences which, when coupled with labeled data, can be used in a semi-supervised classification setting. Leveraging the change points and labeled data, we form examples of similar/dissimilar sequences to train a neural network to learn improved representations for classification. We provide extensive synthetic simulations and show that the learned representations are superior to those learned through an autoencoder and obtain improved results on both simulated and real-world human activity recognition datasets.
Submitted 6 October, 2020; v1 submitted 24 September, 2020;
originally announced September 2020.
-
Simultaneous Preference and Metric Learning from Paired Comparisons
Authors:
Austin Xu,
Mark A. Davenport
Abstract:
A popular model of preference in the context of recommendation systems is the so-called \emph{ideal point} model. In this model, a user is represented as a vector $\mathbf{u}$ together with a collection of items $\mathbf{x_1}, \ldots, \mathbf{x_N}$ in a common low-dimensional space. The vector $\mathbf{u}$ represents the user's "ideal point," or the ideal combination of features that represents a hypothesized most preferred item. The underlying assumption in this model is that a smaller distance between $\mathbf{u}$ and an item $\mathbf{x_j}$ indicates a stronger preference for $\mathbf{x_j}$. In the vast majority of the existing work on learning ideal point models, the underlying distance has been assumed to be Euclidean. However, this eliminates any possibility of interactions between features and a user's underlying preferences. In this paper, we consider the problem of learning an ideal point representation of a user's preferences when the distance metric is an unknown Mahalanobis metric. Specifically, we present a novel approach to estimate the user's ideal point $\mathbf{u}$ and the Mahalanobis metric from paired comparisons of the form "item $\mathbf{x_i}$ is preferred to item $\mathbf{x_j}$." This can be viewed as a special case of a more general metric learning problem where the locations of some points are unknown a priori. We conduct extensive experiments on synthetic and real-world datasets to exhibit the effectiveness of our algorithm.
Submitted 6 September, 2020; v1 submitted 4 September, 2020;
originally announced September 2020.
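To make the ideal point model above concrete, here is a minimal sketch of how a Mahalanobis metric can change which item a user is predicted to prefer. It covers only the forward model (predicting a comparison from a given $\mathbf{u}$ and metric), not the estimation procedure of the paper; the function names, the example points, and the matrix `M` are all hypothetical values chosen for illustration.

```python
def mahalanobis_sq(u, x, M):
    """Squared Mahalanobis distance (x - u)^T M (x - u) for a PSD matrix M."""
    d = [xi - ui for xi, ui in zip(x, u)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sum(di * mdi for di, mdi in zip(d, Md))


def prefers(u, x_i, x_j, M):
    """True if the model predicts that item x_i is preferred to item x_j."""
    return mahalanobis_sq(u, x_i, M) < mahalanobis_sq(u, x_j, M)


# Hypothetical user and items in a 2-D feature space.
u = [0.0, 0.0]
x_1 = [1.0, 1.0]
x_2 = [1.2, -1.0]

identity = [[1.0, 0.0], [0.0, 1.0]]  # Euclidean distance
M = [[1.0, 0.9], [0.9, 1.0]]         # assumed PSD metric with feature interaction

print(prefers(u, x_1, x_2, identity))  # Euclidean: x_1 is closer to u
print(prefers(u, x_1, x_2, M))         # with interaction: x_2 is closer to u
```

The off-diagonal entries of `M` are what encode an interaction between the two features; with the identity matrix the same comparison comes out the other way.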
-
Dynamic Knowledge embedding and tracing
Authors:
Liangbei Xu,
Mark A. Davenport
Abstract:
The goal of knowledge tracing is to track the state of a student's knowledge as it evolves over time. This plays a fundamental role in understanding the learning process and is a key task in the development of an intelligent tutoring system. In this paper we propose a novel approach to knowledge tracing that combines techniques from matrix factorization with recent progress in recurrent neural networks (RNNs) to effectively track the state of a student's knowledge. The proposed \emph{DynEmb} framework enables the tracking of student knowledge even without the concept/skill tag information that other knowledge tracing models require while simultaneously achieving superior performance. We provide experimental evaluations demonstrating that DynEmb achieves improved performance compared to baselines and illustrating the robustness and effectiveness of the proposed framework. We also evaluate our approach using several real-world datasets showing that the proposed model outperforms the previous state-of-the-art. These results suggest that combining embedding models with sequential models such as RNNs is a promising new direction for knowledge tracing.
Submitted 18 May, 2020;
originally announced May 2020.
-
Localized sketching for matrix multiplication and ridge regression
Authors:
Rakshith S Srinivasa,
Mark A Davenport,
Justin Romberg
Abstract:
We consider sketched approximate matrix multiplication and ridge regression in the novel setting of localized sketching, where at any given point, only part of the data matrix is available. This corresponds to a block diagonal structure on the sketching matrix. We show that, under mild conditions, block diagonal sketching matrices require only $O(\text{stable rank}/\epsilon^2)$ and $O(\text{stat. dim.}/\epsilon)$ total sample complexity for matrix multiplication and ridge regression, respectively. This matches the state-of-the-art bounds that are obtained using global sketching matrices. The localized nature of sketching considered allows for different parts of the data matrix to be sketched independently and hence is more amenable to computation in distributed and streaming settings and results in a smaller memory and computational footprint.
Submitted 20 March, 2020;
originally announced March 2020.
-
Low-rank matrix completion and denoising under Poisson noise
Authors:
Andrew D. McRae,
Mark A. Davenport
Abstract:
This paper considers the problem of estimating a low-rank matrix from the observation of all or a subset of its entries in the presence of Poisson noise. When we observe all entries, this is a problem of matrix denoising; when we observe only a subset of the entries, this is a problem of matrix completion. In both cases, we exploit an assumption that the underlying matrix is low-rank. Specifically, we analyze several estimators, including a constrained nuclear-norm minimization program, nuclear-norm regularized least squares, and a nonconvex constrained low-rank optimization problem. We show that for all three estimators, with high probability, we have an upper error bound (in the Frobenius norm error metric) that depends on the matrix rank, the fraction of the elements observed, and maximal row and column sums of the true matrix. We furthermore show that the above results are minimax optimal (within a universal constant) in classes of matrices with low rank and bounded row and column sums. We also extend these results to handle the case of matrix multinomial denoising and completion.
Submitted 30 April, 2020; v1 submitted 11 July, 2019;
originally announced July 2019.
-
Active embedding search via noisy paired comparisons
Authors:
Gregory H. Canal,
Andrew K. Massimino,
Mark A. Davenport,
Christopher J. Rozell
Abstract:
Suppose that we wish to estimate a user's preference vector $w$ from paired comparisons of the form "does user $w$ prefer item $p$ or item $q$?," where both the user and items are embedded in a low-dimensional Euclidean space with distances that reflect user and item similarities. Such observations arise in numerous settings, including psychometrics and psychology experiments, search tasks, advertising, and recommender systems. In such tasks, queries can be extremely costly and subject to varying levels of response noise; thus, we aim to actively choose pairs that are most informative given the results of previous comparisons. We provide new theoretical insights into the benefits and challenges of greedy information maximization in this setting, and develop two novel strategies that maximize lower bounds on information gain and are simpler to analyze and compute respectively. We use simulated responses from a real-world dataset to validate our strategies through their similar performance to greedy information maximization, and their superior preference estimation over state-of-the-art selection methods as well as random queries.
Submitted 24 May, 2019; v1 submitted 10 May, 2019;
originally announced May 2019.
-
Estimation of Poisson arrival processes under linear models
Authors:
Michael G. Moore,
Mark A. Davenport
Abstract:
In this paper we consider the problem of estimating the parameters of a Poisson arrival process where the rate function is assumed to lie in the span of a known basis. Our goal is to estimate the basis expansion coefficients given a realization of this process. We establish novel guarantees concerning the accuracy achieved by the maximum likelihood estimate. Our initial result is near-optimal, with the exception of an undesirable dependence on the dynamic range of the rate function. We then show how to remove this dependence through a process of "noise regularization", which results in an improved bound. We conjecture that a similar guarantee should be possible when using a more direct (deterministic) regularization scheme. We conclude with a discussion of practical applications and an empirical examination of the proposed regularization schemes.
Submitted 20 December, 2018; v1 submitted 2 March, 2018;
originally announced March 2018.
-
As you like it: Localization via paired comparisons
Authors:
Andrew K. Massimino,
Mark A. Davenport
Abstract:
Suppose that we wish to estimate a vector $\mathbf{x}$ from a set of binary paired comparisons of the form "$\mathbf{x}$ is closer to $\mathbf{p}$ than to $\mathbf{q}$" for various choices of vectors $\mathbf{p}$ and $\mathbf{q}$. The problem of estimating $\mathbf{x}$ from this type of observation arises in a variety of contexts, including nonmetric multidimensional scaling, "unfolding," and ranking problems, often because it provides a powerful and flexible model of preference. We describe theoretical bounds for how well we can expect to estimate $\mathbf{x}$ under a randomized model for $\mathbf{p}$ and $\mathbf{q}$. We also present results for the case where the comparisons are noisy and subject to some degree of error. Additionally, we show that under a randomized model for $\mathbf{p}$ and $\mathbf{q}$, a suitable number of binary paired comparisons yield a stable embedding of the space of target vectors. Finally, we also show that we can achieve significant gains by adaptively changing the distribution for choosing $\mathbf{p}$ and $\mathbf{q}$.
Submitted 29 August, 2021; v1 submitted 19 February, 2018;
originally announced February 2018.
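As a short worked step on what a single comparison reveals, assume the distances are Euclidean (an assumption made here for illustration; the abstract does not fix the metric). The statement "$\mathbf{x}$ is closer to $\mathbf{p}$ than to $\mathbf{q}$" then reduces to a linear inequality in $\mathbf{x}$:

```latex
\begin{align*}
\|\mathbf{x}-\mathbf{p}\|_2^2 &< \|\mathbf{x}-\mathbf{q}\|_2^2 \\
\iff\; -2\,\mathbf{p}^\top\mathbf{x} + \|\mathbf{p}\|_2^2 &< -2\,\mathbf{q}^\top\mathbf{x} + \|\mathbf{q}\|_2^2 \\
\iff\; 2\,(\mathbf{q}-\mathbf{p})^\top\mathbf{x} &< \|\mathbf{q}\|_2^2 - \|\mathbf{p}\|_2^2 .
\end{align*}
```

Under this reading, each comparison places $\mathbf{x}$ on one side of the hyperplane that perpendicularly bisects the segment between $\mathbf{p}$ and $\mathbf{q}$, so a set of comparisons confines $\mathbf{x}$ to an intersection of halfspaces.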
-
ROAST: Rapid Orthogonal Approximate Slepian Transform
Authors:
Zhihui Zhu,
Santhosh Karnik,
Michael B. Wakin,
Mark A. Davenport,
Justin Romberg
Abstract:
In this paper, we provide a Rapid Orthogonal Approximate Slepian Transform (ROAST) for the discrete vector that one obtains when collecting a finite set of uniform samples from a baseband analog signal. The ROAST offers an orthogonal projection which is an approximation to the orthogonal projection onto the leading discrete prolate spheroidal sequence (DPSS) vectors (also known as Slepian basis vectors). As such, the ROAST is guaranteed to accurately and compactly represent not only oversampled bandlimited signals but also the leading DPSS vectors themselves. Moreover, the subspace angle between the ROAST subspace and the corresponding DPSS subspace can be made arbitrarily small. The complexity of computing the representation of a signal using the ROAST is comparable to the FFT, which is much less than the complexity of using the DPSS basis vectors. We also give non-asymptotic results to guarantee that the proposed basis not only provides a very high degree of approximation accuracy in a mean squared error sense for bandlimited sample vectors, but also that it can provide high-quality approximations of all sampled sinusoids within the band of interest.
Submitted 10 September, 2018; v1 submitted 13 December, 2017;
originally announced December 2017.
-
The Eigenvalue Distribution of Discrete Periodic Time-Frequency Limiting Operators
Authors:
Zhihui Zhu,
Santhosh Karnik,
Mark A. Davenport,
Justin Romberg,
Michael B. Wakin
Abstract:
Bandlimiting and timelimiting operators play a fundamental role in analyzing bandlimited signals that are approximately timelimited (or vice versa). In this paper, we consider a time-frequency (in the discrete Fourier transform (DFT) domain) limiting operator whose eigenvectors are known as the periodic discrete prolate spheroidal sequences (PDPSSs). We establish new nonasymptotic results on the eigenvalue distribution of this operator. As a byproduct, we also characterize the eigenvalue distribution of a set of submatrices of the DFT matrix, which is of independent interest.
Submitted 17 July, 2017;
originally announced July 2017.
-
Dynamic matrix recovery from incomplete observations under an exact low-rank constraint
Authors:
Liangbei Xu,
Mark A. Davenport
Abstract:
Low-rank matrix factorizations arise in a wide variety of applications -- including recommendation systems, topic models, and source separation, to name just a few. In these and many other applications, it has been widely noted that by incorporating temporal information and allowing for the possibility of time-varying models, significant improvements are possible in practice. However, despite the reported superior empirical performance of these dynamic models over their static counterparts, there is limited theoretical justification for introducing these more complex models. In this paper we aim to address this gap by studying the problem of recovering a dynamically evolving low-rank matrix from incomplete observations. First, we propose the locally weighted matrix smoothing (LOWEMS) framework as one possible approach to dynamic matrix recovery. We then establish error bounds for LOWEMS in both the matrix sensing and matrix completion observation models. Our results quantify the potential benefits of exploiting dynamic constraints both in terms of recovery accuracy and sample complexity. To illustrate these benefits we provide both synthetic and real-world experimental results.
Submitted 28 October, 2016;
originally announced October 2016.
-
An overview of low-rank matrix recovery from incomplete observations
Authors:
Mark A. Davenport,
Justin Romberg
Abstract:
Low-rank matrices play a fundamental role in modeling and computational methods for signal processing and machine learning. In many applications where low-rank matrices arise, these matrices cannot be fully sampled or directly observed, and one encounters the problem of recovering the matrix given only incomplete and indirect observations. This paper provides an overview of modern techniques for exploiting low-rank structure to perform matrix recovery in these settings, providing a survey of recent advances in this rapidly-developing field. Specific attention is paid to the algorithms most commonly used in practice, the existing theoretical guarantees for these algorithms, and representative practical applications of these techniques.
Submitted 24 January, 2016;
originally announced January 2016.
-
Constrained adaptive sensing
Authors:
Mark A. Davenport,
Andrew K. Massimino,
Deanna Needell,
Tina Woolf
Abstract:
Suppose that we wish to estimate a vector $\mathbf{x} \in \mathbb{C}^n$ from a small number of noisy linear measurements of the form $\mathbf{y} = \mathbf{A x} + \mathbf{z}$, where $\mathbf{z}$ represents measurement noise. When the vector $\mathbf{x}$ is sparse, meaning that it has only $s$ nonzeros with $s \ll n$, one can obtain a significantly more accurate estimate of $\mathbf{x}$ by adaptively selecting the rows of $\mathbf{A}$ based on the previous measurements provided that the signal-to-noise ratio (SNR) is sufficiently large. In this paper we consider the case where we wish to realize the potential of adaptivity but where the rows of $\mathbf{A}$ are subject to physical constraints. In particular, we examine the case where the rows of $\mathbf{A}$ are constrained to belong to a finite set of allowable measurement vectors. We demonstrate both the limitations and advantages of adaptive sensing in this constrained setting. We prove that for certain measurement ensembles, the benefits offered by adaptive designs fall far short of the improvements that are possible in the unconstrained adaptive setting. On the other hand, we also provide both theoretical and empirical evidence that in some scenarios adaptivity does still result in substantial improvements even in the constrained setting. To illustrate these potential gains, we propose practical algorithms for constrained adaptive sensing by exploiting connections to the theory of optimal experimental design and show that these algorithms exhibit promising performance in some representative applications.
Submitted 15 July, 2016; v1 submitted 19 June, 2015;
originally announced June 2015.
-
1-Bit Matrix Completion
Authors:
Mark A. Davenport,
Yaniv Plan,
Ewout van den Berg,
Mary Wootters
Abstract:
In this paper we develop a theory of matrix completion for the extreme case of noisy 1-bit observations. Instead of observing a subset of the real-valued entries of a matrix $M$, we obtain a small number of binary (1-bit) measurements generated according to a probability distribution determined by the real-valued entries of $M$. The central question we ask is whether or not it is possible to obtain an accurate estimate of $M$ from this data. In general this would seem impossible, but we show that the maximum likelihood estimate under a suitable constraint returns an accurate estimate of $M$ when $\|M\|_{\infty} \le \alpha$ and $\mathrm{rank}(M) \le r$. If the log-likelihood is a concave function (e.g., the logistic or probit observation models), then we can obtain this maximum likelihood estimate by optimizing a convex program. In addition, we also show that if instead of recovering $M$ we simply wish to obtain an estimate of the distribution generating the 1-bit measurements, then we can eliminate the requirement that $\|M\|_{\infty} \le \alpha$. For both cases, we provide lower bounds showing that these estimates are near-optimal. We conclude with a suite of experiments that both verify the implications of our theorems and illustrate some of the practical applications of 1-bit matrix completion. In particular, we compare our program to standard matrix completion methods on movie rating data in which users submit ratings from 1 to 5. In order to use our program, we quantize this data to a single bit, but we allow the standard matrix completion program to have access to the original ratings (from 1 to 5). Surprisingly, the approach based on binary data performs significantly better.
Submitted 1 July, 2014; v1 submitted 17 September, 2012;
originally announced September 2012.
-
Signal Space CoSaMP for Sparse Recovery with Redundant Dictionaries
Authors:
Mark A. Davenport,
Deanna Needell,
Michael B. Wakin
Abstract:
Compressive sensing (CS) has recently emerged as a powerful framework for acquiring sparse signals. The bulk of the CS literature has focused on the case where the acquired signal has a sparse or compressible representation in an orthonormal basis. In practice, however, there are many signals that cannot be sparsely represented or approximated using an orthonormal basis, but that do have sparse representations in a redundant dictionary. Standard results in CS can sometimes be extended to handle this case provided that the dictionary is sufficiently incoherent or well-conditioned, but these approaches fail to address the case of a truly redundant or overcomplete dictionary. In this paper we describe a variant of the iterative recovery algorithm CoSaMP for this more challenging setting. We utilize the D-RIP, a condition on the sensing matrix analogous to the well-known restricted isometry property. In contrast to prior work, the method and analysis are "signal-focused"; that is, they are oriented around recovering the signal rather than its dictionary coefficients. Under the assumption that we have a near-optimal scheme for projecting vectors in signal space onto the model family of candidate sparse signals, we provide provable recovery guarantees. Developing a practical algorithm that can provably compute the required near-optimal projections remains a significant open problem, but we include simulation results using various heuristics that empirically exhibit superior performance to traditional recovery algorithms.
Submitted 21 June, 2013; v1 submitted 1 August, 2012;
originally announced August 2012.
-
Compressive binary search
Authors:
Mark A. Davenport,
Ery Arias-Castro
Abstract:
In this paper we consider the problem of locating a nonzero entry in a high-dimensional vector from possibly adaptive linear measurements. We consider a recursive bisection method which we dub the compressive binary search and show that it improves on what any nonadaptive method can achieve. We also establish a non-asymptotic lower bound that applies to all methods, regardless of their computational complexity. Combined, these results show that the compressive binary search is within a double logarithmic factor of the optimal performance.
Submitted 30 May, 2012; v1 submitted 4 February, 2012;
originally announced February 2012.
-
Compressive Sensing of Analog Signals Using Discrete Prolate Spheroidal Sequences
Authors:
Mark A. Davenport,
Michael B. Wakin
Abstract:
Compressive sensing (CS) has recently emerged as a framework for efficiently capturing signals that are sparse or compressible in an appropriate basis. While often motivated as an alternative to Nyquist-rate sampling, there remains a gap between the discrete, finite-dimensional CS framework and the problem of acquiring a continuous-time signal. In this paper, we attempt to bridge this gap by exploiting the Discrete Prolate Spheroidal Sequences (DPSS's), a collection of functions that trace back to the seminal work by Slepian, Landau, and Pollak on the effects of time-limiting and bandlimiting operations. DPSS's form a highly efficient basis for sampled bandlimited functions; by modulating and merging DPSS bases, we obtain a dictionary that offers high-quality sparse approximations for most sampled multiband signals. This multiband modulated DPSS dictionary can be readily incorporated into the CS framework. We provide theoretical guarantees and practical insight into the use of this dictionary for recovery of sampled multiband signals from compressive measurements.
Submitted 21 March, 2012; v1 submitted 16 September, 2011;
originally announced September 2011.
-
How well can we estimate a sparse vector?
Authors:
Emmanuel J. Candès,
Mark A. Davenport
Abstract:
The estimation of a sparse vector in the linear model is a fundamental problem in signal processing, statistics, and compressive sensing. This paper establishes a lower bound on the mean-squared error, which holds regardless of the sensing/design matrix being used and regardless of the estimation procedure. This lower bound very nearly matches the known upper bound one gets by taking a random projection of the sparse vector followed by an $\ell_1$ estimation procedure such as the Dantzig selector. In this sense, compressive sensing techniques cannot essentially be improved.
Submitted 1 March, 2013; v1 submitted 27 April, 2011;
originally announced April 2011.
-
The Pros and Cons of Compressive Sensing for Wideband Signal Acquisition: Noise Folding vs. Dynamic Range
Authors:
Mark A. Davenport,
Jason N. Laska,
John R. Treichler,
Richard G. Baraniuk
Abstract:
Compressive sensing (CS) exploits the sparsity present in many signals to reduce the number of measurements needed for digital acquisition. With this reduction would come, in theory, commensurate reductions in the size, weight, power consumption, and/or monetary cost of both signal sensors and any associated communication links. This paper examines the use of CS in the design of a wideband radio receiver in a noisy environment. We formulate the problem statement for such a receiver and establish a reasonable set of requirements that a receiver should meet to be practically useful. We then evaluate the performance of a CS-based receiver in two ways: via a theoretical analysis of its expected performance, with a particular emphasis on noise and dynamic range, and via simulations that compare the CS receiver against the performance expected from a conventional implementation. On the one hand, we show that CS-based systems that aim to reduce the number of acquired measurements are somewhat sensitive to signal noise, exhibiting a 3 dB SNR loss per octave of subsampling, which parallels the classic noise-folding phenomenon. On the other hand, we demonstrate that since they sample at a lower rate, CS-based systems can potentially attain a significantly larger dynamic range. Hence, we conclude that while a CS-based system has inherent limitations that do impose some restrictions on its potential applications, it also has attributes that make it highly desirable in a number of important practical settings.
Submitted 30 May, 2012; v1 submitted 26 April, 2011;
originally announced April 2011.
-
A simple proof that random matrices are democratic
Authors:
Mark A. Davenport,
Jason N. Laska,
Petros T. Boufounos,
Richard G. Baraniuk
Abstract:
The recently introduced theory of compressive sensing (CS) enables the reconstruction of sparse or compressible signals from a small set of nonadaptive, linear measurements. If properly chosen, the number of measurements can be significantly smaller than the ambient dimension of the signal and yet preserve the significant signal information. Interestingly, it can be shown that random measurement schemes provide a near-optimal encoding in terms of the required number of measurements. In this report, we explore another relatively unexplored, though often alluded to, advantage of using random matrices to acquire CS measurements. Specifically, we show that random matrices are democratic, meaning that each measurement carries roughly the same amount of signal information. We demonstrate that by slightly increasing the number of measurements, the system is robust to the loss of a small number of arbitrary measurements. In addition, we draw connections to oversampling and demonstrate stability from the loss of significantly more measurements.
Submitted 4 November, 2009;
originally announced November 2009.
-
A Theoretical Analysis of Joint Manifolds
Authors:
Mark A. Davenport,
Chinmay Hegde,
Marco F. Duarte,
Richard G. Baraniuk
Abstract:
The emergence of low-cost sensor architectures for diverse modalities has made it possible to deploy sensor arrays that capture a single event from a large number of vantage points and using multiple modalities. In many scenarios, these sensors acquire very high-dimensional data such as audio signals, images, and video. To cope with such high-dimensional data, we typically rely on low-dimensional models. Manifold models provide a particularly powerful model that captures the structure of high-dimensional data when it is governed by a low-dimensional set of parameters. However, these models do not typically take into account dependencies among multiple sensors. We thus propose a new joint manifold framework for data ensembles that exploits such dependencies. We show that simple algorithms can exploit the joint manifold structure to improve their performance on standard signal processing applications. Additionally, recent results concerning dimensionality reduction for manifolds enable us to formulate a network-scalable data compression scheme that uses random projections of the sensed data. This scheme efficiently fuses the data from all sensors through the addition of such projections, regardless of the data modalities and dimensions.
Submitted 9 December, 2009; v1 submitted 7 January, 2009;
originally announced January 2009.