-
On the Optimal Recovery of Graph Signals
Authors:
Simon Foucart,
Chunyang Liao,
Nate Veldt
Abstract:
Learning a smooth graph signal from partially observed data is a well-studied task in graph-based machine learning. We consider this task from the perspective of optimal recovery, a mathematical framework for learning a function from observational data that adopts a worst-case perspective tied to model assumptions on the function to be learned. Earlier work in the optimal recovery literature has shown that minimizing a regularized objective produces optimal solutions for a general class of problems, but did not fully identify the regularization parameter. Our main contribution provides a way to compute regularization parameters that are optimal or near-optimal (depending on the setting), specifically for graph signal processing problems. Our results offer a new interpretation for classical optimization techniques in graph-based learning and also come with new insights for hyperparameter selection. We illustrate the potential of our methods in numerical experiments on several semi-synthetic graph signal processing datasets.
Submitted 29 May, 2023; v1 submitted 2 April, 2023;
originally announced April 2023.
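
As a rough illustration of the regularized objective this abstract refers to, the sketch below recovers a signal on a small path graph from two observed nodes by minimizing a data-fit term plus a Laplacian smoothness term. The particular weighting `(1 - tau)` / `tau`, the path graph, the helper names, and the value `tau=0.1` are assumptions made for this example; the parameter is chosen arbitrarily here and is not the optimal value the paper computes.

```python
def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of a path graph on n nodes."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def recover(L, observed, tau):
    """Minimize (1 - tau) * sum_{i observed} (x_i - y_i)^2 + tau * x^T L x.

    Setting the gradient to zero gives the linear system
    ((1 - tau) * P + tau * L) x = (1 - tau) * P y,
    where P is the diagonal 0/1 mask of observed nodes (assumed weighting).
    """
    n = len(L)
    A = [[tau * L[i][j] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for i, y in observed.items():
        A[i][i] += 1.0 - tau
        b[i] += (1.0 - tau) * y
    return solve(A, b)


L = path_laplacian(5)
# Only the two end nodes are observed; tau = 0.1 is an arbitrary choice.
x_hat = recover(L, {0: 0.0, 4: 4.0}, tau=0.1)
print(x_hat)
```

At every unobserved node the recovered value is the average of its neighbours, which is the smoothness that the Laplacian term enforces; the observed end values are pulled slightly toward each other because the data-fit term is not a hard constraint.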
-
The Sparsity of LASSO-type Minimizers
Authors:
Simon Foucart
Abstract:
This note extends an attribute of the LASSO procedure to a whole class of related procedures, including square-root LASSO, square LASSO, LAD-LASSO, and an instance of generalized LASSO. Namely, under the assumption that the input matrix satisfies an $\ell_p$-restricted isometry property (which in some sense is weaker than the standard $\ell_2$-restricted isometry property assumption), it is shown that if the input vector comes from the exact measurement of a sparse vector, then the minimizer of any such LASSO-type procedure has sparsity comparable to the sparsity of the measured vector. The result remains valid in the presence of moderate measurement error when the regularization parameter is not too small.
Submitted 11 September, 2022;
originally announced September 2022.
-
Optimal Recovery from Inaccurate Data in Hilbert Spaces: Regularize, but what of the Parameter?
Authors:
Simon Foucart,
Chunyang Liao
Abstract:
In Optimal Recovery, the task of learning a function from observational data is tackled deterministically by adopting a worst-case perspective tied to an explicit model assumption made on the functions to be learned. Working in the framework of Hilbert spaces, this article considers a model assumption based on approximability. It also incorporates observational inaccuracies modeled via additive errors bounded in $\ell_2$. Earlier works have demonstrated that regularization provides algorithms that are optimal in this situation, but did not fully identify the desired hyperparameter. This article fills the gap in both a local scenario and a global scenario. In the local scenario, which amounts to the determination of Chebyshev centers, the semidefinite recipe of Beck and Eldar (legitimately valid in the complex setting only) is complemented by a more direct approach, with the proviso that the observational functionals have orthonormal representers. In the said approach, the desired parameter is the solution to an equation that can be resolved via standard methods. In the global scenario, where linear algorithms rule, the parameter elusive in the works of Micchelli et al. is found as the byproduct of a semidefinite program. Additionally and quite surprisingly, in case of observational functionals with orthonormal representers, it is established that any regularization parameter is optimal.
Submitted 3 November, 2021;
originally announced November 2021.
-
Learning from Non-Random Data in Hilbert Spaces: An Optimal Recovery Perspective
Authors:
Simon Foucart,
Chunyang Liao,
Shahin Shahrampour,
Yinsong Wang
Abstract:
The notion of generalization in classical Statistical Learning is often attached to the postulate that data points are independent and identically distributed (IID) random variables. While relevant in many applications, this postulate may not hold in general, encouraging the development of learning frameworks that are robust to non-IID data. In this work, we consider the regression problem from an Optimal Recovery perspective. Relying on a model assumption comparable to choosing a hypothesis class, a learner aims at minimizing the worst-case error, without recourse to any probabilistic assumption on the data. We first develop a semidefinite program for calculating the worst-case error of any recovery map in finite-dimensional Hilbert spaces. Then, for any Hilbert space, we show that Optimal Recovery provides a formula which is user-friendly from an algorithmic point-of-view, as long as the hypothesis class is linear. Interestingly, this formula coincides with kernel ridgeless regression in some cases, proving that minimizing the average error and worst-case error can yield the same solution. We provide numerical experiments in support of our theoretical findings.
Submitted 11 September, 2020; v1 submitted 5 June, 2020;
originally announced June 2020.
-
On the sparsity of LASSO minimizers in sparse data recovery
Authors:
Simon Foucart,
Eitan Tadmor,
Ming Zhong
Abstract:
We present a detailed analysis of the unconstrained $\ell_1$-weighted LASSO method for recovery of sparse data from its observation by randomly generated matrices, satisfying the Restricted Isometry Property (RIP) with constant $δ<1$, and subject to negligible measurement and compressibility errors. We prove that if the data is $k$-sparse, then the size of support of the LASSO minimizer, $s$, maintains a comparable sparsity, $s\leq C_δk$. For example, if $δ=0.7$ then $s< 11k$ and a slightly smaller $δ=0.4$ yields $s< 4k$. We also derive new $\ell_2/\ell_1$ error bounds which highlight precise dependence on $k$ and on the LASSO parameter $λ$, before the error is driven below the scale of negligible measurement and compressibility errors.
Submitted 14 March, 2022; v1 submitted 8 April, 2020;
originally announced April 2020.
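
To make the object of study concrete, here is a minimal sketch that approximates a LASSO minimizer for a randomly generated matrix and counts the nonzero entries of the result. It assumes the common form of the unconstrained objective, $\frac{1}{2}\|Ax-y\|_2^2 + λ\|x\|_1$, and uses iterative soft thresholding (ISTA), which is a standard solver swapped in for illustration and not a method taken from the paper. The dimensions, the seed, the value of `lam`, and all function names are hypothetical, and the printed support size is an empirical observation, not a check of the bound $s\leq C_δk$.

```python
import random


def soft_threshold(v, t):
    """Proximal map of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def objective(A, y, x, lam):
    """Assumed LASSO objective: 0.5 * ||A x - y||_2^2 + lam * ||x||_1."""
    r = [ri - yi for ri, yi in zip(matvec(A, x), y)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)


def ista(A, y, lam, iters=500):
    """Iterative soft thresholding started from the zero vector."""
    m, n = len(A), len(A[0])
    # 1 / ||A||_F^2 is at most 1 / ||A||_2^2, so the objective never increases.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        r = [ri - yi for ri, yi in zip(matvec(A, x), y)]
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(xj - step * gj, step * lam) for xj, gj in zip(x, grad)]
    return x


random.seed(0)
m, n = 20, 40  # hypothetical problem size
A = [[random.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
x_true[3], x_true[17] = 1.0, -2.0  # a 2-sparse vector, measured exactly
y = matvec(A, x_true)
lam = 0.1  # arbitrary regularization parameter

x_hat = ista(A, y, lam)
support = [j for j, v in enumerate(x_hat) if v != 0.0]
print(len(support), "nonzero entries in the final iterate; the measured vector has 2")
```

Because the soft-thresholding step sets small entries exactly to zero, the support of the iterate can be read off directly, which is what makes the sparsity of the minimizer a meaningful quantity to bound.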
-
Weighted matrix completion from non-random, non-uniform sampling patterns
Authors:
Simon Foucart,
Deanna Needell,
Reese Pathak,
Yaniv Plan,
Mary Wootters
Abstract:
We study the matrix completion problem when the observation pattern is deterministic and possibly non-uniform. We propose a simple and efficient debiased projection scheme for recovery from noisy observations and analyze the error under a suitable weighted metric. We introduce a simple function of the weight matrix and the sampling pattern that governs the accuracy of the recovered matrix. We derive theoretical guarantees that upper bound the recovery error and nearly matching lower bounds that showcase optimality in several regimes. Our numerical experiments demonstrate the computational efficiency and accuracy of our approach, and show that debiasing is essential when using non-uniform sampling patterns.
Submitted 30 October, 2019;
originally announced October 2019.
-
Nonlinear Approximation and (Deep) ReLU Networks
Authors:
I. Daubechies,
R. DeVore,
S. Foucart,
B. Hanin,
G. Petrova
Abstract:
This article is concerned with the approximation and expressive powers of deep neural networks. This is an active research area currently producing many interesting papers. The results most commonly found in the literature prove that neural networks approximate functions with classical smoothness to the same accuracy as classical linear methods of approximation, e.g. approximation by polynomials or by piecewise polynomials on prescribed partitions. However, approximation by neural networks depending on n parameters is a form of nonlinear approximation and as such should be compared with other nonlinear methods such as variable knot splines or n-term approximation from dictionaries. The performance of neural networks in targeted applications such as machine learning indicates that they actually possess even greater approximation power than these traditional methods of nonlinear approximation. The main results of this article prove that this is indeed the case. This is done by exhibiting large classes of functions which can be efficiently captured by neural networks where classical nonlinear methods fall short of the task. The present article purposefully limits itself to studying the approximation of univariate functions by ReLU networks. Many generalizations to functions of several variables and other activation functions can be envisioned. However, even in this simplest of settings considered here, a theory that completely quantifies the approximation power of neural networks is still lacking.
Submitted 5 May, 2019;
originally announced May 2019.
-
One-Bit Sensing of Low-Rank and Bisparse Matrices
Authors:
Simon Foucart,
Laurent Jacques
Abstract:
This note studies the worst-case recovery error of low-rank and bisparse matrices as a function of the number of one-bit measurements used to acquire them. First, by way of the concept of consistency width, precise estimates are given on how fast the recovery error can in theory decay. Next, an idealized recovery method is proved to reach the fourth-root of the optimal decay rate for Gaussian sensing schemes. This idealized method being impractical, an implementable recovery algorithm is finally proposed in the context of factorized Gaussian sensing schemes. It is shown to provide a recovery error decaying as the sixth-root of the optimal rate.
Submitted 19 January, 2019;
originally announced January 2019.
-
One-Bit Compressive Sensing of Dictionary-Sparse Signals
Authors:
Rich Baraniuk,
Simon Foucart,
Deanna Needell,
Yaniv Plan,
Mary Wootters
Abstract:
One-bit compressive sensing has extended the scope of sparse recovery by showing that sparse signals can be accurately reconstructed even when their linear measurements are subject to the extreme quantization scenario of binary samples---only the sign of each linear measurement is maintained. Existing results in one-bit compressive sensing rely on the assumption that the signals of interest are sparse in some fixed orthonormal basis. However, in most practical applications, signals are sparse with respect to an overcomplete dictionary, rather than a basis. There has already been a surge of activity to obtain recovery guarantees under such a generalized sparsity model in the classical compressive sensing setting. Here, we extend the one-bit framework to this important model, providing a unified theory of one-bit compressive sensing under dictionary sparsity. Specifically, we analyze several different algorithms---based on convex programming and on hard thresholding---and show that, under natural assumptions on the sensing matrix (satisfied by Gaussian matrices), these algorithms can efficiently recover analysis-dictionary-sparse signals in the one-bit model.
Submitted 23 June, 2016;
originally announced June 2016.
-
Exponential decay of reconstruction error from binary measurements of sparse signals
Authors:
Richard Baraniuk,
Simon Foucart,
Deanna Needell,
Yaniv Plan,
Mary Wootters
Abstract:
Binary measurements arise naturally in a variety of statistical and engineering applications. They may be inherent to the problem---e.g., in determining the relationship between genetics and the presence or absence of a disease---or they may be a result of extreme quantization. In one-bit compressed sensing it has recently been shown that the number of one-bit measurements required for signal estimation mirrors that of unquantized compressed sensing. Indeed, $s$-sparse signals in $\mathbb{R}^n$ can be estimated (up to normalization) from $Ω(s \log (n/s))$ one-bit measurements. Nevertheless, controlling the precise accuracy of the error estimate remains an open challenge. In this paper, we focus on optimizing the decay of the error as a function of the oversampling factor $λ:= m/(s \log(n/s))$, where $m$ is the number of measurements. It is known that the error in reconstructing sparse signals from standard one-bit measurements is bounded below by $Ω(λ^{-1})$. Without adjusting the measurement procedure, reducing this polynomial error decay rate is impossible. However, we show that an adaptive choice of the thresholds used for quantization may lower the error rate to $e^{-Ω(λ)}$. This improves upon guarantees for other methods of adaptive thresholding as proposed in Sigma-Delta quantization. We develop a general recursive strategy to achieve this exponential decay and two specific polynomial-time algorithms which fall into this framework, one based on convex programming and one on hard thresholding. This work is inspired by the one-bit compressed sensing model, in which the engineer controls the measurement procedure. Nevertheless, the principle is extendable to signal reconstruction problems in a variety of binary statistical models as well as statistical estimation problems like logistic regression.
Submitted 30 July, 2014;
originally announced July 2014.
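
The effect of adaptive thresholds can be seen in a one-dimensional toy analogue, sketched below under simplifying assumptions: a single unknown scalar in a known interval is observed only through the sign of its difference with a threshold, and each threshold is placed at the midpoint of the interval still consistent with the earlier bits. This is plain bisection, not the recursive strategy or either algorithm of the paper; the function names and the `radius` parameter are hypothetical.

```python
def one_bit(x, threshold):
    """A single binary measurement: the sign of x - threshold."""
    return 1 if x >= threshold else -1


def adaptive_estimate(x, m, radius=1.0):
    """Estimate x in [-radius, radius] from m adaptively thresholded bits."""
    lo, hi = -radius, radius
    for _ in range(m):
        mid = (lo + hi) / 2  # threshold chosen from the bits seen so far
        if one_bit(x, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x = 0.3141
for m in (5, 10, 20):
    err = abs(adaptive_estimate(x, m) - x)
    print(m, err, "bound:", 2.0 ** -m)
```

Each bit halves the interval that can contain the unknown, so the error after `m` bits is at most `radius / 2**m`, an exponential decay in the number of measurements, in contrast with the polynomial lower bound the abstract quotes for fixed (non-adaptive) one-bit measurements.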
-
The Gelfand widths of $\ell_p$-balls for $0<p\leq 1$
Authors:
Simon Foucart,
Alain Pajor,
Holger Rauhut,
Tino Ullrich
Abstract:
We provide sharp lower and upper bounds for the Gelfand widths of $\ell_p$-balls in the $N$-dimensional $\ell_q^N$-space for $0<p\leq 1$ and $p<q \leq 2$. Such estimates are highly relevant to the novel theory of compressive sensing, and our proofs rely on methods from this area.
Submitted 16 December, 2010; v1 submitted 3 February, 2010;
originally announced February 2010.