-
Debiased LASSO under Poisson-Gauss Model
Authors:
Pedro Abdalla,
Gil Kur
Abstract:
Quantifying uncertainty in high-dimensional sparse linear regression is a fundamental task in statistics that arises in various applications. One of the most successful methods for quantifying uncertainty is the debiased LASSO, which has a solid theoretical foundation but is restricted to settings where the noise is purely additive. Motivated by real-world applications, we study the so-called Poisson inverse problem with additive Gaussian noise and propose a debiased LASSO algorithm that only requires $n \gg s\log^2p$ samples, which is optimal up to a logarithmic factor.
Submitted 26 February, 2024;
originally announced February 2024.
-
Covariance estimation with direction dependence accuracy
Authors:
Pedro Abdalla,
Shahar Mendelson
Abstract:
We construct an estimator $\widehat{\Sigma}$ for covariance matrices of unknown, centred random vectors $X$, with the given data consisting of $N$ independent measurements $X_1,...,X_N$ of $X$ and the desired confidence level. We show that, under minimal assumptions on $X$, the estimator achieves the optimal accuracy with respect to the operator norm. In addition, the estimator is also optimal with respect to direction-dependent accuracy: $\langle \widehat{\Sigma}u,u\rangle$ is an optimal estimator for $\sigma^2(u)=\mathbb{E}\langle X,u\rangle^2$ when $\sigma^2(u)$ is "large".
Submitted 13 February, 2024;
originally announced February 2024.
-
Covariance Estimation under Missing Observations and $L_4-L_2$ Moment Equivalence
Authors:
Pedro Abdalla
Abstract:
We consider the problem of estimating the covariance matrix of a random vector from i.i.d. samples in which each entry of the sampled vector is missed with probability $p$. Under the standard $L_4-L_2$ moment equivalence assumption, we construct the first estimator that simultaneously achieves optimality with respect to the parameter $p$ and recovers the optimal convergence rate for the classical covariance estimation problem when $p=1$.
Submitted 14 June, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Expander graphs are globally synchronizing
Authors:
Pedro Abdalla,
Afonso S. Bandeira,
Martin Kassabov,
Victor Souza,
Steven H. Strogatz,
Alex Townsend
Abstract:
The Kuramoto model is fundamental to the study of synchronization. It consists of a collection of oscillators with interactions given by a network, which we identify respectively with vertices and edges of a graph. In this paper, we show that a graph with sufficient expansion must be globally synchronizing, meaning that a homogeneous Kuramoto model of identical oscillators on such a graph will converge to the fully synchronized state with all the oscillators having the same phase, for every initial state up to a set of measure zero. In particular, we show that for any $\varepsilon > 0$ and $p \geq (1 + \varepsilon) (\log n) / n$, the homogeneous Kuramoto model on the Erdős-Rényi random graph $G(n, p)$ is globally synchronizing with probability tending to one as $n$ goes to infinity. This improves on a previous result of Kassabov, Strogatz, and Townsend and solves a conjecture of Ling, Xu, and Bandeira. We also show that the model is globally synchronizing on any $d$-regular Ramanujan graph, and on typical $d$-regular graphs, for large enough degree $d$.
Submitted 10 April, 2024; v1 submitted 23 October, 2022;
originally announced October 2022.
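As a rough illustration of the setting in the abstract above, the following minimal sketch simulates a homogeneous Kuramoto model of identical oscillators on an Erdős-Rényi graph $G(n, p)$ and measures how close the phases are to the fully synchronized state. It is not taken from the paper: the dynamics are assumed to be the standard form $\dot\theta_i = \sum_{j \sim i} \sin(\theta_j - \theta_i)$ in a rotating frame, integrated with a plain forward Euler step, and the function names, step size, and graph parameters are hypothetical choices made for the example.

```python
import cmath
import math
import random


def erdos_renyi_neighbors(n, p, rng):
    """Sample G(n, p): each possible edge is included independently with probability p."""
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def simulate_kuramoto(neighbors, theta0, dt=0.01, steps=2000):
    """Forward Euler integration of d(theta_i)/dt = sum over neighbors j of sin(theta_j - theta_i)."""
    theta = list(theta0)
    for _ in range(steps):
        velocity = [
            sum(math.sin(theta[j] - theta[i]) for j in neighbors[i])
            for i in range(len(theta))
        ]
        theta = [t + dt * v for t, v in zip(theta, velocity)]
    return theta


def order_parameter(theta):
    """Return |mean of exp(i * theta)|; it equals 1 exactly when all phases coincide modulo 2*pi."""
    return abs(sum(cmath.exp(1j * t) for t in theta)) / len(theta)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen only for reproducibility
    n, p = 40, 0.3          # illustrative values, well above the (log n) / n scale
    graph = erdos_renyi_neighbors(n, p, rng)
    start = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    final = simulate_kuramoto(graph, start, steps=1000)
    print("order parameter at start:", order_parameter(start))
    print("order parameter at end:  ", order_parameter(final))
```

A single run from one random initial condition is only a numerical sanity check. The statement in the abstract concerns convergence from every initial state outside a set of measure zero, which a finite simulation cannot verify.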
-
Guarantees for Spontaneous Synchronization on Random Geometric Graphs
Authors:
Pedro Abdalla,
Afonso S. Bandeira,
Clara Invernizzi
Abstract:
The Kuramoto model is a classical mathematical model in the field of non-linear dynamical systems that describes the evolution of coupled oscillators in a network that may reach a synchronous state. The relationship between the network's topology and whether the oscillators synchronize is a central question in the field of synchronization, and random graphs are often employed as a proxy for complex networks. On the other hand, the random graphs on which the Kuramoto model is rigorously analyzed in the literature are homogeneous models and fail to capture the underlying geometric structure that appears in several examples.
In this work, we leverage tools from random matrix theory, random graphs, and mathematical statistics to prove that the Kuramoto model on a random geometric graph on the sphere synchronizes with probability tending to one as the number of nodes tends to infinity. To the best of our knowledge, this is the first rigorous result for the Kuramoto model on random geometric graphs.
Submitted 15 February, 2024; v1 submitted 25 August, 2022;
originally announced August 2022.
-
Covariance Estimation: Optimal Dimension-free Guarantees for Adversarial Corruption and Heavy Tails
Authors:
Pedro Abdalla,
Nikita Zhivotovskiy
Abstract:
We provide an estimator of the covariance matrix that achieves the optimal rate of convergence (up to constant factors) in the operator norm under two standard notions of data contamination: We allow the adversary to corrupt an $η$-fraction of the sample arbitrarily, while the distribution of the remaining data points only satisfies that the $L_{p}$-marginal moment with some $p \ge 4$ is equivalent to the corresponding $L_2$-marginal moment. Despite requiring the existence of only a few moments, our estimator achieves the same tail estimates as if the underlying distribution were Gaussian. As a part of our analysis, we prove a dimension-free Bai-Yin type theorem in the regime $p > 4$.
Submitted 20 July, 2023; v1 submitted 17 May, 2022;
originally announced May 2022.
-
Robust Sparse Recovery with Sparse Bernoulli matrices via Expanders
Authors:
Pedro Abdalla
Abstract:
Sparse binary matrices are of great interest in the field of compressed sensing. This class of matrices makes it possible to perform signal recovery with lower storage costs and faster decoding algorithms. In particular, random matrices formed by i.i.d. Bernoulli $p$ random variables are of practical relevance in the context of nonnegative sparse recovery.
In this work, we investigate the robust nullspace property of sparse Bernoulli $p$ matrices. Previous results in the literature establish that such matrices can accurately recover $n$-dimensional $s$-sparse vectors with $m=O\left (\frac{s}{c(p)}\log\frac{en}{s}\right )$ measurements, where $c(p) \le p$ is a constant that only depends on the parameter $p$. These results suggest that, when $p$ vanishes, the sparse Bernoulli matrix requires considerably more measurements than the minimal number achieved by the standard isotropic subgaussian designs. We show that this is not true. Our main result characterizes, for a wide range of sparsity levels $s$, the smallest $p$ such that it is possible to perform sparse recovery with the minimal number of measurements. We also provide matching lower bounds to establish the optimality of our results.
Submitted 13 February, 2022; v1 submitted 28 December, 2021;
originally announced December 2021.
-
Community Detection with a Subsampled Semidefinite Program
Authors:
Pedro Abdalla,
Afonso S. Bandeira
Abstract:
Semidefinite programming is an important tool to tackle several problems in data science and signal processing, including clustering and community detection. However, semidefinite programs are often slow in practice, so speed-up techniques such as sketching are often considered. In the context of community detection in the stochastic block model, Mixon and Xie have recently proposed a sketching framework in which a semidefinite program is solved only on a subsampled subgraph of the network, giving rise to significant computational savings. In this short paper, we provide a positive answer to a conjecture of Mixon and Xie about the statistical limits of this technique for the stochastic block model with two balanced communities.
Submitted 10 May, 2022; v1 submitted 2 February, 2021;
originally announced February 2021.
-
Dictionary-Sparse Recovery From Heavy-Tailed Measurements
Authors:
Pedro Abdalla,
Christian Kümmerle
Abstract:
The recovery of signals that are sparse not in a basis, but rather sparse with respect to an over-complete dictionary is one of the most flexible settings in the field of compressed sensing with numerous applications. As in the standard compressed sensing setting, it is possible that the signal can be reconstructed efficiently from few, linear measurements, for example by the so-called $\ell_1$-synthesis method.
However, it has been less well-understood which measurement matrices provably work for this setting. Whereas in the standard setting, it has been shown that even certain heavy-tailed measurement matrices can be used in the same sample complexity regime as Gaussian matrices, comparable results are only available for the restrictive class of sub-Gaussian measurement vectors as far as the recovery of dictionary-sparse signals via $\ell_1$-synthesis is concerned.
In this work, we fill this gap and establish optimal guarantees for the recovery of vectors that are (approximately) sparse with respect to a dictionary via the $\ell_1$-synthesis method from linear, potentially noisy measurements for a large class of random measurement matrices. In particular, we show that random measurements that fulfill only a small-ball assumption and a weak moment assumption, such as random vectors with i.i.d. Student-$t$ entries with a logarithmic number of degrees of freedom, lead to comparable guarantees as (sub-)Gaussian measurements.
As a technical tool, we show a bound on the expectation of the sum of squared order statistics under very general assumptions, which might be of independent interest.
As a corollary of our results, we also obtain a slight improvement on the weakest assumption on a measurement matrix with i.i.d. rows sufficient for uniform recovery in standard compressed sensing, improving on results by Lecué and Mendelson and Dirksen, Lecué and Rauhut.
Submitted 29 September, 2021; v1 submitted 20 January, 2021;
originally announced January 2021.