-
Matrix Concentration Inequalities and Free Probability II. Two-sided Bounds and Applications
Authors:
Afonso S. Bandeira,
Giorgio Cipolloni,
Dominik Schröder,
Ramon van Handel
Abstract:
The first paper in this series introduced a new family of nonasymptotic matrix concentration inequalities that sharply capture the spectral properties of very general Gaussian (as well as non-Gaussian) random matrices in terms of an associated noncommutative model. These methods achieved matching upper and lower bounds for smooth spectral statistics, but only provided upper bounds for the spectral edges. Here we obtain matching lower bounds for the spectral edges, completing the theory initiated in the first paper. The resulting two-sided bounds enable the study of applications that require an exact determination of the spectral edges to leading order, which is fundamentally beyond the reach of classical matrix concentration inequalities. To illustrate their utility, we undertake a detailed study of phase transition phenomena for spectral outliers of nonhomogeneous random matrices.
Submitted 17 June, 2024;
originally announced June 2024.
-
Computational lower bounds for multi-frequency group synchronization
Authors:
Anastasia Kireeva,
Afonso S. Bandeira,
Dmitriy Kunisky
Abstract:
We consider a group synchronization problem with multiple frequencies which involves observing pairwise relative measurements of group elements on multiple frequency channels, corrupted by Gaussian noise. We study the computational phase transition in the problem of detecting whether a structured signal is present in such observations by analyzing low-degree polynomial algorithms. We show that, assuming the low-degree conjecture, in synchronization models over arbitrary finite groups as well as over the circle group $SO(2)$, a simple spectral algorithm is optimal among algorithms of runtime $\exp(\tilde{Ω}(n^{1/3}))$ for detection from an observation including a constant number of frequencies. Combined with an upper bound for the statistical threshold shown in Perry et al., our results indicate the presence of a statistical-to-computational gap in such models with a sufficiently large number of frequencies.
Submitted 5 June, 2024;
originally announced June 2024.
-
A lower bound for the Balan--Jiang matrix problem
Authors:
Afonso S. Bandeira,
Dustin G. Mixon,
Stefan Steinerberger
Abstract:
We prove the existence of a positive semidefinite matrix $A \in \mathbb{R}^{n \times n}$ such that any decomposition into rank-1 matrices has to have factors with a large $\ell^1$-norm, more precisely $$ \sum_{k} x_k x_k^*=A \quad \implies \quad \sum_k \|x_k\|^2_{1} \geq c \sqrt{n} \|A\|_{1},$$ where $c$ is independent of $n$. This provides a lower bound for the Balan--Jiang matrix problem. The construction is probabilistic.
Submitted 9 May, 2024;
originally announced May 2024.
-
Exact threshold for approximate ellipsoid fitting of random points
Authors:
Antoine Maillard,
Afonso S. Bandeira
Abstract:
We consider the problem $(\rm P)$ of exactly fitting an ellipsoid (centered at $0$) to $n$ standard Gaussian random vectors in $\mathbb{R}^d$, as $n, d \to \infty$ with $n / d^2 \to α > 0$. This problem is conjectured to undergo a sharp transition: with high probability, $(\rm P)$ has a solution if $α < 1/4$, while $(\rm P)$ has no solutions if $α > 1/4$. So far, only a trivial bound $α > 1/2$ is known to imply the absence of solutions, while the sharpest results on the positive side assume $α \leq η$ (for $η > 0$ a small constant) to prove that $(\rm P)$ is solvable. In this work we study universality between this problem and a so-called "Gaussian equivalent", for which the same transition can be rigorously analyzed. Our main results are twofold. On the positive side, we prove that if $α < 1/4$, there exists an ellipsoid fitting all the points up to a small error, and that the lengths of its principal axes are bounded above and below. On the other hand, for $α > 1/4$, we show that achieving small fitting error is not possible if the length of the ellipsoid's shortest axis does not approach $0$ as $d \to \infty$ (and in particular there does not exist any ellipsoid fit whose shortest axis length is bounded away from $0$ as $d \to \infty$). To the best of our knowledge, our work is the first rigorous result characterizing the expected phase transition in ellipsoid fitting at $α = 1/4$. In a companion non-rigorous work, the first author and D. Kunisky give a general analysis of ellipsoid fitting using the replica method of statistical physics, which inspired the present work.
Submitted 9 October, 2023;
originally announced October 2023.
-
Fitting an ellipsoid to a quadratic number of random points
Authors:
Afonso S. Bandeira,
Antoine Maillard,
Shahar Mendelson,
Elliot Paquette
Abstract:
We consider the problem $(\mathrm{P})$ of fitting $n$ standard Gaussian random vectors in $\mathbb{R}^d$ to the boundary of a centered ellipsoid, as $n, d \to \infty$. This problem is conjectured to have a sharp feasibility transition: for any $\varepsilon > 0$, if $n \leq (1 - \varepsilon) d^2 / 4$ then $(\mathrm{P})$ has a solution with high probability, while $(\mathrm{P})$ has no solutions with high probability if $n \geq (1 + \varepsilon) d^2 /4$. So far, only a trivial bound $n \geq d^2 / 2$ is known on the negative side, while the best results on the positive side assume $n \leq d^2 / \mathrm{polylog}(d)$. In this work, we improve over previous approaches using a key result of Bartl & Mendelson on the concentration of Gram matrices of random vectors under mild assumptions on their tail behavior. This allows us to give a simple proof that $(\mathrm{P})$ is feasible with high probability when $n \leq d^2 / C$, for a (possibly large) constant $C > 0$.
Submitted 3 July, 2023;
originally announced July 2023.
-
Injectivity of ReLU networks: perspectives from statistical physics
Authors:
Antoine Maillard,
Afonso S. Bandeira,
David Belius,
Ivan Dokmanić,
Shuta Nakajima
Abstract:
When can the input of a ReLU neural network be inferred from its output? In other words, when is the network injective? We consider a single layer, $x \mapsto \mathrm{ReLU}(Wx)$, with a random Gaussian $m \times n$ matrix $W$, in a high-dimensional setting where $n, m \to \infty$. Recent work connects this problem to spherical integral geometry giving rise to a conjectured sharp injectivity threshold for $α= \frac{m}{n}$ by studying the expected Euler characteristic of a certain random set. We adopt a different perspective and show that injectivity is equivalent to a property of the ground state of the spherical perceptron, an important spin glass model in statistical physics. By leveraging the (non-rigorous) replica symmetry-breaking theory, we derive analytical equations for the threshold whose solution is at odds with that from the Euler characteristic. Furthermore, we use Gordon's min--max theorem to prove that a replica-symmetric upper bound refutes the Euler characteristic prediction. Along the way we aim to give a tutorial-style introduction to key ideas from statistical physics in an effort to make the exposition accessible to a broad audience. Our analysis establishes a connection between spin glasses and integral geometry but leaves open the problem of explaining the discrepancies.
Submitted 27 February, 2023;
originally announced February 2023.
-
On the concentration of Gaussian Cayley matrices
Authors:
Afonso S. Bandeira,
Dmitriy Kunisky,
Dustin G. Mixon,
Xinmeng Zeng
Abstract:
Given a finite group, we study the Gaussian series of the matrices in the image of its left regular representation. We propose such random matrices as a benchmark for improvements to the noncommutative Khintchine inequality, and we highlight an application to the matrix Spencer conjecture.
Submitted 30 November, 2022;
originally announced December 2022.
-
Expander graphs are globally synchronizing
Authors:
Pedro Abdalla,
Afonso S. Bandeira,
Martin Kassabov,
Victor Souza,
Steven H. Strogatz,
Alex Townsend
Abstract:
The Kuramoto model is fundamental to the study of synchronization. It consists of a collection of oscillators with interactions given by a network, which we identify respectively with vertices and edges of a graph. In this paper, we show that a graph with sufficient expansion must be globally synchronizing, meaning that a homogeneous Kuramoto model of identical oscillators on such a graph will converge to the fully synchronized state with all the oscillators having the same phase, for every initial state up to a set of measure zero. In particular, we show that for any $\varepsilon > 0$ and $p \geq (1 + \varepsilon) (\log n) / n$, the homogeneous Kuramoto model on the Erdős-Rényi random graph $G(n, p)$ is globally synchronizing with probability tending to one as $n$ goes to infinity. This improves on a previous result of Kassabov, Strogatz, and Townsend and solves a conjecture of Ling, Xu, and Bandeira. We also show that the model is globally synchronizing on any $d$-regular Ramanujan graph, and on typical $d$-regular graphs, for large enough degree $d$.
Submitted 10 April, 2024; v1 submitted 23 October, 2022;
originally announced October 2022.
-
On free energy barriers in Gaussian priors and failure of cold start MCMC for high-dimensional unimodal distributions
Authors:
Afonso S. Bandeira,
Antoine Maillard,
Richard Nickl,
Sven Wang
Abstract:
We exhibit examples of high-dimensional unimodal posterior distributions arising in non-linear regression models with Gaussian process priors for which MCMC methods can take an exponential run-time to enter the regions where the bulk of the posterior measure concentrates. Our results apply to worst-case initialised (`cold start') algorithms that are local in the sense that their step-sizes cannot be too large on average. The counter-examples hold for general MCMC schemes based on gradient or random walk steps, and the theory is illustrated for Metropolis-Hastings adjusted methods such as pCN and MALA.
Submitted 19 November, 2022; v1 submitted 5 September, 2022;
originally announced September 2022.
-
Guarantees for Spontaneous Synchronization on Random Geometric Graphs
Authors:
Pedro Abdalla,
Afonso S. Bandeira,
Clara Invernizzi
Abstract:
The Kuramoto model is a classical mathematical model in the field of non-linear dynamical systems that describes the evolution of coupled oscillators in a network that may reach a synchronous state. The relationship between the network's topology and whether the oscillators synchronize is a central question in the field of synchronization, and random graphs are often employed as a proxy for complex networks. On the other hand, the random graphs on which the Kuramoto model is rigorously analyzed in the literature are homogeneous models and fail to capture the underlying geometric structure that appears in several examples.
In this work, we leverage tools from random matrix theory, random graphs, and mathematical statistics to prove that the Kuramoto model on a random geometric graph on the sphere synchronizes with probability tending to one as the number of nodes tends to infinity. To the best of our knowledge, this is the first rigorous result for the Kuramoto model on random geometric graphs.
Submitted 15 February, 2024; v1 submitted 25 August, 2022;
originally announced August 2022.
-
A remark on Kashin's discrepancy argument and partial coloring in the Komlós conjecture
Authors:
Afonso S. Bandeira,
Antoine Maillard,
Nikita Zhivotovskiy
Abstract:
In this expository note, we discuss an early partial coloring result of B. Kashin [C. R. Acad. Bulgare Sci., 1985]. Although this result only implies Spencer's six standard deviations [Trans. Amer. Math. Soc., 1985] up to a $\log\log n$ factor, Kashin's argument gives a simple proof of the existence of a constant discrepancy partial coloring in the setting of the Komlós conjecture.
Submitted 25 August, 2022; v1 submitted 17 June, 2022;
originally announced June 2022.
-
The Franz-Parisi Criterion and Computational Trade-offs in High Dimensional Statistics
Authors:
Afonso S. Bandeira,
Ahmed El Alaoui,
Samuel B. Hopkins,
Tselil Schramm,
Alexander S. Wein,
Ilias Zadik
Abstract:
Many high-dimensional statistical inference problems are believed to possess inherent computational hardness. Various frameworks have been proposed to give rigorous evidence for such hardness, including lower bounds against restricted models of computation (such as low-degree functions), as well as methods rooted in statistical physics that are based on free energy landscapes. This paper aims to make a rigorous connection between the seemingly different low-degree and free-energy based approaches. We define a free-energy based criterion for hardness and formally connect it to the well-established notion of low-degree hardness for a broad class of statistical problems, namely all Gaussian additive models and certain models with a sparse planted signal. By leveraging these rigorous connections we are able to: establish that for Gaussian additive models the "algebraic" notion of low-degree hardness implies failure of "geometric" local MCMC algorithms, and provide new low-degree lower bounds for sparse linear regression which seem difficult to prove directly. These results provide both conceptual insights into the connections between different notions of hardness, as well as concrete technical tools such as new methods for proving low-degree lower bounds.
Submitted 13 October, 2022; v1 submitted 19 May, 2022;
originally announced May 2022.
-
Dual bounds for the positive definite functions approach to mutually unbiased bases
Authors:
Afonso S. Bandeira,
Nikolaus Doppelbauer,
Dmitriy Kunisky
Abstract:
A long-standing open problem asks if there can exist 7 mutually unbiased bases (MUBs) in $\mathbb{C}^6$, or, more generally, $d + 1$ MUBs in $\mathbb{C}^d$ for any $d$ that is not a prime power. The recent work of Kolountzakis, Matolcsi, and Weiner (2016) proposed an application of the method of positive definite functions (a relative of Delsarte's method in coding theory and Lovász's semidefinite programming relaxation of the independent set problem) as a means of answering this question in the negative. Namely, they ask whether there exists a polynomial of a unitary matrix input satisfying various properties which, through the method of positive definite functions, would show the non-existence of 7 MUBs in $\mathbb{C}^6$. Using a convex duality argument, we prove that such a polynomial of degree at most 6 cannot exist. We also propose a general dual certificate which we conjecture to certify that this method can never show that there exist strictly fewer than $d + 1$ MUBs in $\mathbb{C}^d$.
Submitted 26 February, 2022;
originally announced February 2022.
-
Matrix Concentration Inequalities and Free Probability
Authors:
Afonso S. Bandeira,
March T. Boedihardjo,
Ramon van Handel
Abstract:
A central tool in the study of nonhomogeneous random matrices, the noncommutative Khintchine inequality, yields a nonasymptotic bound on the spectral norm of general Gaussian random matrices $X=\sum_i g_i A_i$ where $g_i$ are independent standard Gaussian variables and $A_i$ are matrix coefficients. This bound exhibits a logarithmic dependence on dimension that is sharp when the matrices $A_i$ commute, but often proves to be suboptimal in the presence of noncommutativity. In this paper, we develop nonasymptotic bounds on the spectrum of arbitrary Gaussian random matrices that can capture noncommutativity. These bounds quantify the degree to which the spectrum of $X$ is captured by that of a noncommutative model $X_{\rm free}$ that arises from free probability theory. This "intrinsic freeness" phenomenon provides a powerful tool for the study of various questions that are outside the reach of classical methods of random matrix theory. Our nonasymptotic bounds are easily applicable in concrete situations, and yield sharp results in examples where the noncommutative Khintchine inequality is suboptimal. When combined with a linearization argument, our bounds imply strong asymptotic freeness for a remarkably general class of Gaussian random matrix models that may be very sparse, have dependent entries, and lack any special symmetries. When combined with a universality principle, our bounds extend beyond the Gaussian setting to general sums of independent random matrices.
Submitted 28 February, 2023; v1 submitted 13 August, 2021;
originally announced August 2021.
-
The spectral norm of Gaussian matrices with correlated entries
Authors:
Afonso S. Bandeira,
March T. Boedihardjo
Abstract:
We give a non-asymptotic bound on the spectral norm of a $d\times d$ matrix $X$ with centered jointly Gaussian entries in terms of the covariance matrix of the entries. In some cases, this estimate is sharp and removes the $\sqrt{\log d}$ factor in the noncommutative Khintchine inequality.
Submitted 20 August, 2021; v1 submitted 6 April, 2021;
originally announced April 2021.
-
Community Detection with a Subsampled Semidefinite Program
Authors:
Pedro Abdalla,
Afonso S. Bandeira
Abstract:
Semidefinite programming is an important tool to tackle several problems in data science and signal processing, including clustering and community detection. However, semidefinite programs are often slow in practice, so speed up techniques such as sketching are often considered. In the context of community detection in the stochastic block model, Mixon and Xie \cite{mixon2020sketching} have recently proposed a sketching framework in which a semidefinite program is solved only on a subsampled subgraph of the network, giving rise to significant computational savings. In this short paper, we provide a positive answer to a conjecture of Mixon and Xie about the statistical limits of this technique for the stochastic block model with two balanced communities.
Submitted 10 May, 2022; v1 submitted 2 February, 2021;
originally announced February 2021.
-
Group Testing in the High Dilution Regime
Authors:
Gabriel Arpino,
Nicolò Grometto,
Afonso S. Bandeira
Abstract:
Non-adaptive group testing refers to the problem of inferring a sparse set of defectives from a larger population using the minimum number of simultaneous pooled tests. Recent positive results for noiseless group testing have motivated the study of practical noise models, a prominent one being dilution noise. Under the dilution noise model, items in a test pool have an i.i.d. probability of being diluted, meaning their contribution to a test does not take effect. In this setting, we investigate the number of tests required to achieve vanishing error probability with respect to existing algorithms and provide an algorithm-independent converse bound. In contrast to other noise models, we also encounter the interesting phenomenon that dilution noise on the resulting test outcomes can be offset by choosing a suitable noise-level-dependent Bernoulli test design, resulting in matching achievability and converse bounds up to order in the high noise regime.
Submitted 15 July, 2021; v1 submitted 1 February, 2021;
originally announced February 2021.
-
Average-Case Integrality Gap for Non-Negative Principal Component Analysis
Authors:
Afonso S. Bandeira,
Dmitriy Kunisky,
Alexander S. Wein
Abstract:
Montanari and Richard (2015) asked whether a natural semidefinite programming (SDP) relaxation can effectively optimize $\mathbf{x}^{\top}\mathbf{W} \mathbf{x}$ over $\|\mathbf{x}\| = 1$ with $x_i \geq 0$ for all coordinates $i$, where $\mathbf{W} \in \mathbb{R}^{n \times n}$ is drawn from the Gaussian orthogonal ensemble (GOE) or a spiked matrix model. In small numerical experiments, this SDP appears to be tight for the GOE, producing a rank-one optimal matrix solution aligned with the optimal vector $\mathbf{x}$. We prove, however, that as $n \to \infty$ the SDP is not tight, and certifies an upper bound asymptotically no better than the simple spectral bound $λ_{\max}(\mathbf{W})$ on this objective function. We also provide evidence, using tools from recent literature on hypothesis testing with low-degree polynomials, that no subexponential-time certification algorithm can improve on this behavior. Finally, we present further numerical experiments estimating how large $n$ would need to be before this limiting behavior becomes evident, providing a cautionary example against extrapolating asymptotics of SDPs in high dimension from their efficacy in small "laptop scale" computations.
Submitted 3 December, 2020;
originally announced December 2020.
-
Spectral Planting and the Hardness of Refuting Cuts, Colorability, and Communities in Random Graphs
Authors:
Afonso S. Bandeira,
Jess Banks,
Dmitriy Kunisky,
Cristopher Moore,
Alexander S. Wein
Abstract:
We study the problem of efficiently refuting the k-colorability of a graph, or equivalently certifying a lower bound on its chromatic number. We give formal evidence of average-case computational hardness for this problem in sparse random regular graphs, showing optimality of a simple spectral certificate. This evidence takes the form of a computationally-quiet planting: we construct a distribution of d-regular graphs that has significantly smaller chromatic number than a typical regular graph drawn uniformly at random, while providing evidence that these two distributions are indistinguishable by a large class of algorithms. We generalize our results to the more general problem of certifying an upper bound on the maximum k-cut.
This quiet planting is achieved by minimizing the effect of the planted structure (e.g. colorings or cuts) on the graph spectrum. Specifically, the planted structure corresponds exactly to eigenvectors of the adjacency matrix. This avoids the pushout effect of random matrix theory, and delays the point at which the planting becomes visible in the spectrum or local statistics. To illustrate this further, we give similar results for a Gaussian analogue of this problem: a quiet version of the spiked model, where we plant an eigenspace rather than adding a generic low-rank perturbation.
Our evidence for computational hardness of distinguishing two distributions is based on three different heuristics: stability of belief propagation, the local statistics hierarchy, and the low-degree likelihood ratio. Of independent interest, our results include general-purpose bounds on the low-degree likelihood ratio for multi-spiked matrix models, and an improved low-degree analysis of the stochastic block model.
Submitted 27 August, 2020;
originally announced August 2020.
-
The Spectral Norm of Random Lifts of Matrices
Authors:
Afonso S. Bandeira,
Yunzi Ding
Abstract:
We study the spectral norm of random lifts of matrices. Given an $n\times n$ symmetric matrix $A$, and a centered distribution $π$ on $k\times k\ (k\ge 2)$ symmetric matrices with spectral norm at most $1$, let the matrix random lift $A^{(k,π)}$ be the random symmetric $kn\times kn$ matrix $(A_{ij}X_{ij})_{1\le i < j \le n}$, where $X_{ij}$ are independent samples from $π$. We prove that
$$\mathbb{E} \|A^{(k,π)}\|\lesssim \max_{i}\sqrt{\sum_j A_{ij}^2}+\max_{ij}|A_{ij}|\sqrt{\log (kn)}.$$
This result can be viewed as an extension of existing spectral bounds on random matrices with independent entries, providing further instances where the multiplicative $\sqrt{\log n}$ factor in the Non-Commutative Khintchine inequality can be removed.
As a direct application of our result, we prove an upper bound of $2(1+ε)\sqrt{Δ}+O(\sqrt{\log(kn)})$ on the new eigenvalues for random $k$-lifts of a fixed graph $G = (V,E)$ with $|V| = n$ and maximum degree $Δ$, compared to the previous result of $O(\sqrt{Δ\log(kn)})$ by Oliveira and the recent breakthrough by Bordenave and Collins, which gives $2\sqrt{Δ-1} + o(1)$ as $k\rightarrow\infty$ for a $Δ$-regular graph $G$.
Submitted 1 June, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
The Average-Case Time Complexity of Certifying the Restricted Isometry Property
Authors:
Yunzi Ding,
Dmitriy Kunisky,
Alexander S. Wein,
Afonso S. Bandeira
Abstract:
In compressed sensing, the restricted isometry property (RIP) on $M \times N$ sensing matrices (where $M < N$) guarantees efficient reconstruction of sparse vectors. A matrix has the $(s,δ)$-$\mathsf{RIP}$ property if it behaves as a $δ$-approximate isometry on $s$-sparse vectors. It is well known that an $M\times N$ matrix with i.i.d. $\mathcal{N}(0,1/M)$ entries is $(s,δ)$-$\mathsf{RIP}$ with high probability as long as $s\lesssim δ^2 M/\log N$. On the other hand, most prior works aiming to deterministically construct $(s,δ)$-$\mathsf{RIP}$ matrices have failed when $s \gg \sqrt{M}$. An alternative way to find an RIP matrix could be to draw a random Gaussian matrix and certify that it is indeed RIP. However, there is evidence that this certification task is computationally hard when $s \gg \sqrt{M}$, both in the worst case and the average case.
In this paper, we investigate the exact average-case time complexity of certifying the RIP property for $M\times N$ matrices with i.i.d. $\mathcal{N}(0,1/M)$ entries, in the "possible but hard" regime $\sqrt{M} \ll s\lesssim M/\log N$. Based on analysis of the low-degree likelihood ratio, we give rigorous evidence that subexponential runtime $N^{\tilde{Ω}(s^2/M)}$ is required, demonstrating a smooth tradeoff between the maximum tolerated sparsity and the required computational power. This lower bound is essentially tight, matching the runtime of an existing algorithm due to Koiran and Zouzias. Our hardness result allows $δ$ to take any constant value in $(0,1)$, which captures the relevant regime for compressed sensing. This improves upon the existing average-case hardness result of Wang, Berthet, and Plan, which is limited to $δ= o(1)$.
Submitted 22 April, 2021; v1 submitted 22 May, 2020;
originally announced May 2020.
-
Computationally efficient sparse clustering
Authors:
Matthias Löffler,
Alexander S. Wein,
Afonso S. Bandeira
Abstract:
We study statistical and computational limits of clustering when the means of the centres are sparse and their dimension is possibly much larger than the sample size. Our theoretical analysis focuses on the model $X_i = z_i θ+ \varepsilon_i, ~z_i \in \{-1,1\}, ~\varepsilon_i \thicksim \mathcal{N}(0,I)$, which has two clusters with centres $θ$ and $-θ$. We provide a finite sample analysis of a new sparse clustering algorithm based on sparse PCA and show that it achieves the minimax optimal misclustering rate in the regime $\|θ\| \rightarrow \infty$.
Our results require the sparsity to grow slower than the square root of the sample size. Using a recent framework for computational lower bounds -- the low-degree likelihood ratio -- we give evidence that this condition is necessary for any polynomial-time clustering algorithm to succeed below the BBP threshold. This complements existing evidence based on reductions and statistical query lower bounds. Compared to these existing results, we cover a wider set of parameter regimes and give a more precise understanding of the runtime required and the misclustering error achievable. Our results imply that a large class of tests based on low-degree polynomials fail to solve even the weak testing task.
Submitted 22 March, 2021; v1 submitted 21 May, 2020;
originally announced May 2020.
-
Experimental performance of graph neural networks on random instances of max-cut
Authors:
Weichi Yao,
Afonso S. Bandeira,
Soledad Villar
Abstract:
This note explores the applicability of unsupervised machine learning techniques towards hard optimization problems on random inputs. In particular we consider Graph Neural Networks (GNNs) -- a class of neural networks designed to learn functions on graphs -- and we apply them to the max-cut problem on random regular graphs. We focus on the max-cut problem on random regular graphs because it is a fundamental problem that has been widely studied. In particular, even though there is no known explicit solution to compare the output of our algorithm to, we can leverage the known asymptotics of the optimal max-cut value in order to evaluate the performance of the GNNs.
In order to put the performance of the GNNs in context, we compare it with the classical semidefinite relaxation approach by Goemans and Williamson (SDP), and with extremal optimization, which is a local optimization heuristic from the statistical physics literature. The numerical results we obtain indicate that, surprisingly, Graph Neural Networks attain comparable performance to the Goemans and Williamson SDP. We also observe that extremal optimization consistently outperforms the other two methods. Furthermore, the performances of the three methods present similar patterns: for sparser and for larger graphs, the sizes of the cuts found are closer to the asymptotic optimal max-cut value.
Submitted 15 August, 2019;
originally announced August 2019.
-
A Tight Degree 4 Sum-of-Squares Lower Bound for the Sherrington-Kirkpatrick Hamiltonian
Authors:
Dmitriy Kunisky,
Afonso S. Bandeira
Abstract:
We show that, if $\mathbf{W} \in \mathbb{R}^{N \times N}_{\mathsf{sym}}$ is drawn from the Gaussian orthogonal ensemble, then with high probability the degree 4 sum-of-squares relaxation cannot certify an upper bound on the objective $N^{-1} \cdot \mathbf{x}^\top \mathbf{W} \mathbf{x}$ under the constraints $x_i^2 - 1 = 0$ (i.e. $\mathbf{x} \in \{ \pm 1 \}^N$) that is asymptotically smaller than $λ_{\max}(\mathbf{W}) \approx 2$. We also conjecture a proof technique for lower bounds against sum-of-squares relaxations of any degree held constant as $N \to \infty$, by proposing an approximate pseudomoment construction.
Submitted 7 November, 2020; v1 submitted 26 July, 2019;
originally announced July 2019.
-
Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio
Authors:
Dmitriy Kunisky,
Alexander S. Wein,
Afonso S. Bandeira
Abstract:
These notes survey and explore an emerging method, which we call the low-degree method, for predicting and understanding statistical-versus-computational tradeoffs in high-dimensional inference problems. In short, the method posits that a certain quantity -- the second moment of the low-degree likelihood ratio -- gives insight into how much computational time is required to solve a given hypothesis testing problem, which can in turn be used to predict the computational hardness of a variety of statistical inference tasks. While this method originated in the study of the sum-of-squares (SoS) hierarchy of convex programs, we present a self-contained introduction that does not require knowledge of SoS. In addition to showing how to carry out predictions using the method, we include a discussion investigating both rigorous and conjectural consequences of these predictions.
These notes include some new results, simplified proofs, and refined conjectures. For instance, we point out a formal connection between spectral methods and the low-degree likelihood ratio, and we give a sharp low-degree lower bound against subexponential-time algorithms for tensor PCA.
Submitted 26 July, 2019;
originally announced July 2019.
-
Subexponential-Time Algorithms for Sparse PCA
Authors:
Yunzi Ding,
Dmitriy Kunisky,
Alexander S. Wein,
Afonso S. Bandeira
Abstract:
We study the computational cost of recovering a unit-norm sparse principal component $x \in \mathbb{R}^n$ planted in a random matrix, in either the Wigner or Wishart spiked model (observing either $W + λxx^\top$ with $W$ drawn from the Gaussian orthogonal ensemble, or $N$ independent samples from $\mathcal{N}(0, I_n + βxx^\top)$, respectively). Prior work has shown that when the signal-to-noise ratio ($λ$ or $β\sqrt{N/n}$, respectively) is a small constant and the fraction of nonzero entries in the planted vector is $\|x\|_0 / n = ρ$, it is possible to recover $x$ in polynomial time if $ρ\lesssim 1/\sqrt{n}$. While it is possible to recover $x$ in exponential time under the weaker condition $ρ\ll 1$, it is believed that polynomial-time recovery is impossible unless $ρ\lesssim 1/\sqrt{n}$. We investigate the precise amount of time required for recovery in the "possible but hard" regime $1/\sqrt{n} \ll ρ\ll 1$ by exploring the power of subexponential-time algorithms, i.e., algorithms running in time $\exp(n^δ)$ for some constant $δ\in (0,1)$. For any $1/\sqrt{n} \ll ρ\ll 1$, we give a recovery algorithm with runtime roughly $\exp(ρ^2 n)$, demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the $\exp(ρn)$-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal.
Submitted 23 June, 2022; v1 submitted 26 July, 2019;
originally announced July 2019.
-
Computational Hardness of Certifying Bounds on Constrained PCA Problems
Authors:
Afonso S. Bandeira,
Dmitriy Kunisky,
Alexander S. Wein
Abstract:
Given a random $n \times n$ symmetric matrix $\boldsymbol W$ drawn from the Gaussian orthogonal ensemble (GOE), we consider the problem of certifying an upper bound on the maximum value of the quadratic form $\boldsymbol x^\top \boldsymbol W \boldsymbol x$ over all vectors $\boldsymbol x$ in a constraint set $\mathcal{S} \subset \mathbb{R}^n$. For a certain class of normalized constraint sets $\mathcal{S}$ we show that, conditional on certain complexity-theoretic assumptions, there is no polynomial-time algorithm certifying a better upper bound than the largest eigenvalue of $\boldsymbol W$. A notable special case included in our results is the hypercube $\mathcal{S} = \{ \pm 1 / \sqrt{n}\}^n$, which corresponds to the problem of certifying bounds on the Hamiltonian of the Sherrington-Kirkpatrick spin glass model from statistical physics.
Our proof proceeds in two steps. First, we give a reduction from the detection problem in the negatively-spiked Wishart model to the above certification problem. We then give evidence that this Wishart detection problem is computationally hard below the classical spectral threshold, by showing that no low-degree polynomial can (in expectation) distinguish the spiked and unspiked models. This method for identifying computational thresholds was proposed in a sequence of recent works on the sum-of-squares hierarchy, and is believed to be correct for a large class of problems. Our proof can be seen as constructing a distribution over symmetric matrices that appears computationally indistinguishable from the GOE, yet is supported on matrices whose maximum quadratic form over $\boldsymbol x \in \mathcal{S}$ is much larger than that of a GOE matrix.
Submitted 6 April, 2019; v1 submitted 19 February, 2019;
originally announced February 2019.
-
Sum-of-Squares Optimization and the Sparsity Structure of Equiangular Tight Frames
Authors:
Afonso S. Bandeira,
Dmitriy Kunisky
Abstract:
Equiangular tight frames (ETFs) may be used to construct examples of feasible points for semidefinite programs arising in sum-of-squares (SOS) optimization. We show how generalizing the calculations in a recent work of the authors' that explored this connection also yields new bounds on the sparsity of (both real and complex) ETFs. One corollary shows that Steiner ETFs corresponding to finite projective planes are optimally sparse in the sense of achieving tightness in a matrix inequality controlling overlaps between sparsity patterns of distinct rows of the synthesis matrix. We also formulate several natural open problems concerning further generalizations of our technique.
Submitted 30 January, 2019;
originally announced January 2019.
-
A Gramian Description of the Degree 4 Generalized Elliptope
Authors:
Afonso S. Bandeira,
Dmitriy Kunisky
Abstract:
One of the most widely studied convex relaxations in combinatorial optimization is the relaxation of the cut polytope $\mathscr C^N$ to the elliptope $\mathscr E^N$, which corresponds to the degree 2 sum-of-squares (SOS) relaxation of optimizing a quadratic form over the hypercube $\{\pm 1\}^N$. We study the extension of this classical idea to degree 4 SOS, which gives an intermediate relaxation we call the degree 4 generalized elliptope $\mathscr E_4^N$. Our main result is a necessary and sufficient condition for the Gram matrix of a collection of vectors to belong to $\mathscr E_4^N$. Consequences include a tight rank inequality between degree 2 and degree 4 pseudomoment matrices, and a guarantee that the only extreme points of $\mathscr E^N$ also in $\mathscr E_4^N$ are the cut matrices; that is, $\mathscr E^N$ and $\mathscr E_4^N$ share no "spurious" extreme point.
For Gram matrices of equiangular tight frames, we give a simple criterion for membership in $\mathscr{E}_4^N$. This yields new inequalities satisfied in $\mathscr{E}_4^N$ but not $\mathscr{E}^N$ whose structure is related to the Schläfli graph and which cannot be obtained as linear combinations of triangle inequalities. We also give a new proof of the restriction to degree 4 of a result of Laurent showing that $\mathscr{E}_4^N$ does not satisfy certain cut polytope inequalities capturing parity constraints. Though limited to this special case, our proof of the positive semidefiniteness of Laurent's pseudomoment matrix is short and elementary.
Our techniques also suggest that membership in $\mathscr{E}_4^N$ is closely related to the partial transpose operation on block matrices, which has previously played an important role in the study of quantum entanglement. To illustrate, we present a correspondence between certain entangled bipartite quantum states and the matrices of $\mathscr{E}_4^N\setminus\mathscr{C}^N$.
Submitted 24 March, 2019; v1 submitted 30 December, 2018;
originally announced December 2018.
-
On the Landscape of Synchronization Networks: A Perspective from Nonconvex Optimization
Authors:
Shuyang Ling,
Ruitu Xu,
Afonso S. Bandeira
Abstract:
Studying the landscape of nonconvex cost functions is key towards a better understanding of optimization algorithms widely used in signal processing, statistics, and machine learning. Meanwhile, the famous Kuramoto model has been an important mathematical model to study the synchronization phenomena of coupled oscillators over various network topologies. In this paper, we bring together these two seemingly unrelated objects by investigating the optimization landscape of a nonlinear function $E(\boldsymbol{θ}) = \frac{1}{2}\sum_{1\leq i,j\leq n} a_{ij}(1-\cos(θ_i - θ_j))$ associated to an underlying network and exploring the relationship between the existence of local minima and network topology. This function arises naturally in the Burer-Monteiro method applied to $\mathbb{Z}_2$ synchronization as well as matrix completion on the torus. Moreover, it corresponds to the energy function of the homogeneous Kuramoto model on complex networks for coupled oscillators. We prove that the minimizer of the energy function is unique up to a global translation for deterministic dense graphs and Erdős-Rényi random graphs, using tools from optimization and random matrix theory. Consequently, the stable equilibrium of the corresponding homogeneous Kuramoto model is unique and the basin of attraction for the synchronous state of these coupled oscillators is the whole phase space minus a set of measure zero. In addition, our results address when the Burer-Monteiro method recovers the ground truth exactly from highly incomplete observations in $\mathbb{Z}_2$ synchronization and shed light on the robustness of nonconvex optimization algorithms against certain types of so-called monotone adversaries. Numerical simulations are performed to illustrate our results.
Submitted 19 April, 2019; v1 submitted 28 September, 2018;
originally announced September 2018.
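The energy function $E(\boldsymbol{θ})$ from the abstract above can be evaluated and minimized numerically on a small graph. The sketch below is an illustration only and is not taken from the paper: the names `energy`, `gradient`, and `descend`, the complete graph on four nodes, the starting phases, and the step size are all assumptions chosen for the example. It assumes a symmetric weight matrix with zero diagonal.

```python
import math

def energy(theta, a):
    """E(theta) = 1/2 * sum_{i,j} a_ij * (1 - cos(theta_i - theta_j))."""
    n = len(theta)
    return 0.5 * sum(
        a[i][j] * (1.0 - math.cos(theta[i] - theta[j]))
        for i in range(n)
        for j in range(n)
    )

def gradient(theta, a):
    """For symmetric a: dE/dtheta_i = sum_j a_ij * sin(theta_i - theta_j)."""
    n = len(theta)
    return [
        sum(a[i][j] * math.sin(theta[i] - theta[j]) for j in range(n))
        for i in range(n)
    ]

def descend(theta, a, step=0.1, iters=2000):
    """Plain gradient descent with a fixed step size (illustrative choice)."""
    theta = list(theta)
    for _ in range(iters):
        g = gradient(theta, a)
        theta = [t - step * gi for t, gi in zip(theta, g)]
    return theta

# Hypothetical example: complete graph on 4 nodes, phases within a half circle.
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
start = [0.0, 0.5, 1.0, 2.0]
final = descend(start, K4)
print(energy(start, K4), energy(final, K4))
```

Any state with all phases equal has zero energy, which is the synchronous state referred to in the abstract; the energy depends only on phase differences, so minimizers can only be unique up to a global translation.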
-
Stochastic Block Model for Hypergraphs: Statistical limits and a semidefinite programming approach
Authors:
Chiheon Kim,
Afonso S. Bandeira,
Michel X. Goemans
Abstract:
We study the problem of community detection in a random hypergraph model which we call the stochastic block model for $k$-uniform hypergraphs ($k$-SBM). We investigate the exact recovery problem in $k$-SBM and show that a sharp phase transition occurs around a threshold: below the threshold it is impossible to recover the communities with non-vanishing probability, yet above the threshold there is an estimator which recovers the communities asymptotically almost surely. We also consider a simple, efficient algorithm for the exact recovery problem which is based on a semidefinite relaxation technique.
Submitted 8 July, 2018;
originally announced July 2018.
-
Optimality and Sub-optimality of PCA I: Spiked Random Matrix Models
Authors:
Amelia Perry,
Alexander S. Wein,
Afonso S. Bandeira,
Ankur Moitra
Abstract:
A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, introduced by Johnstone, in which a prominent eigenvector (or "spike") is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Peche showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the spike strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise. However, under structural assumptions on the spike, not all information is necessarily contained in the spectrum. We study the statistical limits of tests for the presence of a spike, including non-spectral tests. Our results leverage Le Cam's notion of contiguity, and include:
i) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for certain natural priors for the spike.
ii) For any non-Gaussian Wigner ensemble, PCA is sub-optimal for detection. However, an efficient variant of PCA achieves the optimal threshold (for natural priors) by pre-transforming the matrix entries.
iii) For the Gaussian Wishart ensemble, the PCA threshold is optimal for positive spikes (for natural priors) but this is not always the case for negative spikes.
Submitted 12 July, 2018; v1 submitted 2 July, 2018;
originally announced July 2018.
-
Deterministic guarantees for Burer-Monteiro factorizations of smooth semidefinite programs
Authors:
Nicolas Boumal,
Vladislav Voroninski,
Afonso S. Bandeira
Abstract:
We consider semidefinite programs (SDPs) with equality constraints. The variable to be optimized is a positive semidefinite matrix $X$ of size $n$. Following the Burer--Monteiro approach, we optimize a factor $Y$ of size $n \times p$ instead, such that $X = YY^T$. This ensures positive semidefiniteness at no cost and can reduce the dimension of the problem if $p$ is small, but results in a non-convex optimization problem with a quadratic cost function and quadratic equality constraints in $Y$. In this paper, we show that if the set of constraints on $Y$ regularly defines a smooth manifold, then, despite non-convexity, first- and second-order necessary optimality conditions are also sufficient, provided $p$ is large enough. For smaller values of $p$, we show a similar result holds for almost all (linear) cost functions. Under those conditions, a global optimum $Y$ maps to a global optimum $X = YY^T$ of the SDP. We deduce old and new consequences for SDP relaxations of the generalized eigenvector problem, the trust-region subproblem and quadratic optimization over several spheres, as well as for the Max-Cut and Orthogonal-Cut SDPs which are common relaxations in stochastic block modeling and synchronization of rotations.
Submitted 28 May, 2019; v1 submitted 5 April, 2018;
originally announced April 2018.
-
Notes on computational-to-statistical gaps: predictions using statistical physics
Authors:
Afonso S. Bandeira,
Amelia Perry,
Alexander S. Wein
Abstract:
In these notes we describe heuristics to predict computational-to-statistical gaps in certain statistical problems. These are regimes in which the underlying statistical problem is information-theoretically possible although no efficient algorithm exists, rendering the problem essentially unsolvable for large instances. The methods we describe here are based on mature, albeit non-rigorous, tools from statistical physics.
These notes are based on a lecture series given by the authors at the Courant Institute of Mathematical Sciences in New York City, on May 16th, 2017.
Submitted 20 April, 2018; v1 submitted 29 March, 2018;
originally announced March 2018.
-
Spurious Valleys in Two-layer Neural Network Optimization Landscapes
Authors:
Luca Venturi,
Afonso S. Bandeira,
Joan Bruna
Abstract:
Neural networks provide a rich class of high-dimensional, non-convex optimization problems. Despite their non-convexity, gradient-descent methods often successfully optimize these models. This has motivated a recent spur in research attempting to characterize properties of their loss surface that may explain such success.
In this paper, we address this phenomenon by studying a key topological property of the loss: the presence or absence of spurious valleys, defined as connected components of sub-level sets that do not include a global minimum. Focusing on a class of two-layer neural networks defined by smooth (but generally non-linear) activation functions, we identify a notion of intrinsic dimension and show that it provides necessary and sufficient conditions for the absence of spurious valleys. More concretely, finite intrinsic dimension guarantees that for sufficiently overparametrised models no spurious valleys exist, independently of the data distribution. Conversely, infinite intrinsic dimension implies that spurious valleys do exist for certain data distributions, independently of model overparametrisation. Besides these positive and negative results, we show that, although spurious valleys may exist in general, they are confined to low risk levels and avoided with high probability on overparametrised models.
Submitted 16 June, 2020; v1 submitted 18 February, 2018;
originally announced February 2018.
-
Estimation under group actions: recovering orbits from invariants
Authors:
Afonso S. Bandeira,
Ben Blum-Smith,
Joe Kileel,
Amelia Perry,
Jonathan Niles-Weed,
Alexander S. Wein
Abstract:
We study a class of orbit recovery problems in which we observe independent copies of an unknown element of $\mathbb{R}^p$, each linearly acted upon by a random element of some group (such as $\mathbb{Z}/p$ or $\mathrm{SO}(3)$) and then corrupted by additive Gaussian noise. We prove matching upper and lower bounds on the number of samples required to approximately recover the group orbit of this unknown element with high probability. These bounds, based on quantitative techniques in invariant theory, give a precise correspondence between the statistical difficulty of the estimation problem and algebraic properties of the group. Furthermore, we give computer-assisted procedures to certify these properties that are computationally efficient in many cases of interest.
The model is motivated by geometric problems in signal processing, computer vision, and structural biology, and applies to the reconstruction problem in cryo-electron microscopy (cryo-EM), a problem of significant practical interest. Our results allow us to verify (for a given problem size) that if cryo-EM images are corrupted by noise with variance $σ^2$, the number of images required to recover the molecule structure scales as $σ^6$. We match this bound with a novel (albeit computationally expensive) algorithm for ab initio reconstruction in cryo-EM, based on invariant features of degree at most 3. We further discuss how to recover multiple molecular structures from mixed (or heterogeneous) cryo-EM samples.
Submitted 13 June, 2023; v1 submitted 29 December, 2017;
originally announced December 2017.
-
The sample complexity of multi-reference alignment
Authors:
Amelia Perry,
Jonathan Weed,
Afonso S. Bandeira,
Philippe Rigollet,
Amit Singer
Abstract:
The growing role of data-driven approaches to scientific discovery has unveiled a large class of models that involve latent transformations with a rigid algebraic constraint. Three-dimensional molecule reconstruction in Cryo-Electron Microscopy (cryo-EM) is a central problem in this class. Despite decades of algorithmic and software development, there is still little theoretical understanding of the sample complexity of this problem, that is, the number of images required for 3-D reconstruction. Here we consider multi-reference alignment (MRA), a simple model that captures fundamental aspects of the statistical and algorithmic challenges arising in cryo-EM and related problems. In MRA, an unknown signal is subject to two types of corruption: a latent cyclic shift and the more traditional additive white noise. The goal is to recover the signal at a certain precision from independent samples. While at high signal-to-noise ratio (SNR), the number of observations needed to recover a generic signal is proportional to $1/\mathrm{SNR}$, we prove that it rises to a surprising $1/\mathrm{SNR}^3$ in the low SNR regime. This precise phenomenon was observed empirically more than twenty years ago for cryo-EM but has remained unexplained to date. Furthermore, our techniques can easily be extended to the heterogeneous MRA model where the samples come from a mixture of signals, as is often the case in applications such as cryo-EM, where molecules may have different conformations. This provides a first step towards a statistical theory for heterogeneous cryo-EM.
Submitted 3 June, 2019; v1 submitted 4 July, 2017;
originally announced July 2017.
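The MRA observation model described in the abstract above (a latent cyclic shift followed by additive white noise) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `mra_sample`, the uniform distribution of the shift, and the example signal are assumptions made here, not details taken from the paper.

```python
import random

def mra_sample(signal, sigma, rng):
    """Return one MRA observation of `signal` (a list of floats).

    Assumption: the latent cyclic shift is drawn uniformly at random.
    The noise is i.i.d. Gaussian with standard deviation `sigma`.
    """
    n = len(signal)
    shift = rng.randrange(n)                     # latent shift, never observed
    shifted = signal[shift:] + signal[:shift]    # cyclic shift of the signal
    return [v + rng.gauss(0.0, sigma) for v in shifted]

rng = random.Random(0)
x = [1.0, 0.0, -2.0, 0.5]                        # hypothetical unknown signal
samples = [mra_sample(x, sigma=0.1, rng=rng) for _ in range(5)]
```

With `sigma=0.0` each observation is an exact cyclic rotation of the signal, which is a convenient sanity check before adding noise.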
-
Revised Note on Learning Algorithms for Quadratic Assignment with Graph Neural Networks
Authors:
Alex Nowak,
Soledad Villar,
Afonso S. Bandeira,
Joan Bruna
Abstract:
Inverse problems correspond to a certain type of optimization problems formulated over appropriate input distributions. Recently, there has been a growing interest in understanding the computational hardness of these optimization problems, not only in the worst case, but in an average-complexity sense under this same input distribution.
In this revised note, we are interested in studying another aspect of hardness, related to the ability to learn how to solve a problem by simply observing a collection of previously solved instances. These 'planted solutions' are used to supervise the training of an appropriate predictive model that parametrizes a broad class of algorithms, with the hope that the resulting model will provide good accuracy-complexity tradeoffs in the average sense.
We illustrate this setup on the Quadratic Assignment Problem, a fundamental problem in Network Science. We observe that data-driven models based on Graph Neural Networks offer intriguingly good performance, even in regimes where standard relaxation-based techniques appear to suffer.
Submitted 30 August, 2018; v1 submitted 22 June, 2017;
originally announced June 2017.
-
Community Detection in Hypergraphs, Spiked Tensor Models, and Sum-of-Squares
Authors:
Chiheon Kim,
Afonso S. Bandeira,
Michel X. Goemans
Abstract:
We study the problem of community detection in hypergraphs under a stochastic block model. Similarly to how the stochastic block model in graphs suggests studying spiked random matrices, our model motivates investigating statistical and computational limits of exact recovery in a certain spiked tensor model. In contrast with the matrix case, the spiked model naturally arising from community detection in hypergraphs is different from the one arising in the so-called tensor Principal Component Analysis model. We investigate the effectiveness of algorithms in the Sum-of-Squares hierarchy on these models. Interestingly, our results suggest that these two apparently similar models exhibit significantly different computational to statistical gaps.
Submitted 3 July, 2018; v1 submitted 8 May, 2017;
originally announced May 2017.
-
Optimal rates of estimation for multi-reference alignment
Authors:
Afonso S. Bandeira,
Philippe Rigollet,
Jonathan Weed
Abstract:
In this paper, we establish optimal rates of adaptive estimation of a vector in the multi-reference alignment model, a problem with important applications in fields such as signal processing, image processing, and computer vision, among others. We describe how this model can be viewed as a multivariate Gaussian mixture model under the constraint that the centers belong to the orbit of a group. This enables us to derive matching upper and lower bounds that feature an interesting dependence on the signal-to-noise ratio of the model. Both upper and lower bounds are articulated around a tight local control of Kullback-Leibler divergences that showcases the central role of moment tensors in this problem.
Submitted 20 May, 2018; v1 submitted 27 February, 2017;
originally announced February 2017.
-
Statistical limits of spiked tensor models
Authors:
Amelia Perry,
Alexander S. Wein,
Afonso S. Bandeira
Abstract:
We study the statistical limits of both detecting and estimating a rank-one deformation of a symmetric random Gaussian tensor. We establish upper and lower bounds on the critical signal-to-noise ratio, under a variety of priors for the planted vector: (i) a uniformly sampled unit vector, (ii) i.i.d. $\pm 1$ entries, and (iii) a sparse vector where a constant fraction $ρ$ of entries are i.i.d. $\pm 1$ and the rest are zero. For each of these cases, our upper and lower bounds match up to a $1+o(1)$ factor as the order $d$ of the tensor becomes large. For sparse signals (iii), our bounds are also asymptotically tight in the sparse limit $ρ\to 0$ for any fixed $d$ (including the $d=2$ case of sparse PCA). Our upper bounds for (i) demonstrate a phenomenon reminiscent of the work of Baik, Ben Arous and Péché: an 'eigenvalue' of a perturbed tensor emerges from the bulk at a strictly lower signal-to-noise ratio than when the perturbation itself exceeds the bulk; we quantify the size of this effect. We also provide some general results for larger classes of priors. In particular, the large $d$ asymptotics of the threshold location differs between problems with discrete priors versus continuous priors. Finally, for priors (i) and (ii) we carry out the replica prediction from statistical physics, which is conjectured to give the exact information-theoretic threshold for any fixed $d$.
Of independent interest, we introduce a new improvement to the second moment method for contiguity, on which our lower bounds are based. Our technique conditions away from rare 'bad' events that depend on interactions between the signal and noise. This enables us to close $\sqrt{2}$-factor gaps present in several previous works.
Submitted 24 January, 2017; v1 submitted 22 December, 2016;
originally announced December 2016.
-
SE-Sync: A Certifiably Correct Algorithm for Synchronization over the Special Euclidean Group
Authors:
David M. Rosen,
Luca Carlone,
Afonso S. Bandeira,
John J. Leonard
Abstract:
Many important geometric estimation problems take the form of synchronization over the special Euclidean group: estimate the values of a set of poses given a set of relative measurements between them. This problem is typically formulated as a nonconvex maximum-likelihood estimation that is computationally hard to solve in general. Nevertheless, in this paper we present an algorithm that is able to efficiently recover certifiably globally optimal solutions of the special Euclidean synchronization problem in a non-adversarial noise regime. The crux of our approach is the development of a semidefinite relaxation of the maximum-likelihood estimation whose minimizer provides an exact MLE so long as the magnitude of the noise corrupting the available measurements falls below a certain critical threshold; furthermore, whenever exactness obtains, it is possible to verify this fact a posteriori, thereby certifying the optimality of the recovered estimate. We develop a specialized optimization scheme for solving large-scale instances of this relaxation by exploiting its low-rank, geometric, and graph-theoretic structure to reduce it to an equivalent optimization problem on a low-dimensional Riemannian manifold, and design a truncated-Newton trust-region method to solve this reduction efficiently. Finally, we combine this fast optimization approach with a simple rounding procedure to produce our algorithm, SE-Sync. Experimental evaluation on a variety of simulated and real-world pose-graph SLAM datasets shows that SE-Sync is able to recover certifiably globally optimal solutions when the available measurements are corrupted by noise up to an order of magnitude greater than that typically encountered in robotics and computer vision applications, and does so more than an order of magnitude faster than the Gauss-Newton-based approach that forms the basis of current state-of-the-art techniques.
Submitted 4 February, 2017; v1 submitted 21 December, 2016;
originally announced December 2016.
-
Marčenko-Pastur Law for Kendall's Tau
Authors:
Afonso S. Bandeira,
Asad Lodhia,
Philippe Rigollet
Abstract:
We prove that Kendall's rank correlation matrix converges to the Marčenko-Pastur law, under the assumption that the observations are i.i.d. random vectors $X_1$, $\dots$, $X_n$ with components that are independent and absolutely continuous with respect to the Lebesgue measure. This is the first result on the empirical spectral distribution of a multivariate $U$-statistic.
Submitted 21 January, 2017; v1 submitted 14 November, 2016;
originally announced November 2016.
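For readers unfamiliar with the object studied in the abstract above, here is a minimal Python sketch of Kendall's rank correlation matrix. The abstract does not give a formula, so the standard definition is assumed: the average of sign products over all pairs of observations. The function names and the toy data are illustrative only.

```python
from itertools import combinations

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def kendall_tau_matrix(rows):
    """Kendall's tau matrix of n observations (rows), each with p components.

    Entry (k, l) averages sign(X_ik - X_jk) * sign(X_il - X_jl)
    over all pairs i < j of observations.
    """
    n, p = len(rows), len(rows[0])
    pairs = list(combinations(range(n), 2))
    tau = [[0.0] * p for _ in range(p)]
    for k in range(p):
        for l in range(p):
            total = sum(
                sign(rows[i][k] - rows[j][k]) * sign(rows[i][l] - rows[j][l])
                for i, j in pairs
            )
            tau[k][l] = total / len(pairs)
    return tau

# Toy data: column 1 increases with column 0, column 2 decreases with it.
data = [(1, 10, 3), (2, 20, 2), (3, 30, 1)]
tau = kendall_tau_matrix(data)
```

This direct implementation costs on the order of $n^2 p^2$ operations, so it is meant for small examples rather than for the high-dimensional regime the paper analyzes.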
-
A Certifiably Correct Algorithm for Synchronization over the Special Euclidean Group
Authors:
David M. Rosen,
Luca Carlone,
Afonso S. Bandeira,
John J. Leonard
Abstract:
Many geometric estimation problems take the form of synchronization over the special Euclidean group: estimate the values of a set of poses given noisy measurements of a subset of their pairwise relative transforms. This problem is typically formulated as a maximum-likelihood estimation that requires solving a nonconvex nonlinear program, which is computationally intractable in general. Nevertheless, in this paper we present an algorithm that is able to efficiently recover certifiably globally optimal solutions of this estimation problem in a non-adversarial noise regime. The crux of our approach is the development of a semidefinite relaxation of the maximum-likelihood estimation whose minimizer provides the exact MLE so long as the magnitude of the noise corrupting the available measurements falls below a certain critical threshold; furthermore, whenever exactness obtains, it is possible to verify this fact a posteriori, thereby certifying the optimality of the recovered estimate. We develop a specialized optimization scheme for solving large-scale instances of this semidefinite relaxation by exploiting its low-rank, geometric, and graph-theoretic structure to reduce it to an equivalent optimization problem on a low-dimensional Riemannian manifold, and then design a Riemannian truncated-Newton trust-region method to solve this reduction efficiently. We combine this fast optimization approach with a simple rounding procedure to produce our algorithm, SE-Sync. Experimental evaluation on a variety of simulated and real-world pose-graph SLAM datasets shows that SE-Sync is capable of recovering globally optimal solutions when the available measurements are corrupted by noise up to an order of magnitude greater than that typically encountered in robotics applications, and does so at a computational cost that scales comparably with that of direct Newton-type local search techniques.
Submitted 9 February, 2017; v1 submitted 1 November, 2016;
originally announced November 2016.
-
A polynomial-time relaxation of the Gromov-Hausdorff distance
Authors:
Soledad Villar,
Afonso S. Bandeira,
Andrew J. Blumberg,
Rachel Ward
Abstract:
The Gromov-Hausdorff distance provides a metric on the set of isometry classes of compact metric spaces. Unfortunately, computing this metric directly is believed to be computationally intractable. Motivated by applications in shape matching and point-cloud comparison, we study a semidefinite programming relaxation of the Gromov-Hausdorff metric. This relaxation can be computed in polynomial time, and somewhat surprisingly is itself a pseudometric. We describe the induced topology on the set of compact metric spaces. Finally, we demonstrate the numerical performance of various algorithms for computing the relaxed distance and apply these algorithms to several relevant data sets. In particular we propose a greedy algorithm for finding the best correspondence between finite metric spaces that can handle hundreds of points.
Submitted 18 October, 2016; v1 submitted 17 October, 2016;
originally announced October 2016.
-
Message-passing algorithms for synchronization problems over compact groups
Authors:
Amelia Perry,
Alexander S. Wein,
Afonso S. Bandeira,
Ankur Moitra
Abstract:
Various alignment problems arising in cryo-electron microscopy, community detection, time synchronization, computer vision, and other fields fall into a common framework of synchronization problems over compact groups such as Z/L, U(1), or SO(3). The goal of such problems is to estimate an unknown vector of group elements given noisy relative observations. We present an efficient iterative algorithm to solve a large class of these problems, allowing for any compact group, with measurements on multiple 'frequency channels' (Fourier modes, or more generally, irreducible representations of the group). Our algorithm is a highly efficient iterative method following the blueprint of approximate message passing (AMP), which has recently arisen as a central technique for inference problems such as structured low-rank estimation and compressed sensing. We augment the standard ideas of AMP with ideas from representation theory so that the algorithm can work with distributions over compact groups. Using standard but non-rigorous methods from statistical physics we analyze the behavior of our algorithm on a Gaussian noise model, identifying phases where the problem is easy, (computationally) hard, and (statistically) impossible. In particular, such evidence predicts that our algorithm is information-theoretically optimal in many cases, and that the remaining cases show evidence of statistical-to-computational gaps.
Submitted 14 October, 2016;
originally announced October 2016.
-
Resilience for the Littlewood-Offord Problem
Authors:
Afonso S. Bandeira,
Asaf Ferber,
Matthew Kwan
Abstract:
Consider the sum $X(ξ)=\sum_{i=1}^n a_iξ_i$, where $a=(a_i)_{i=1}^n$ is a sequence of non-zero reals and $ξ=(ξ_i)_{i=1}^n$ is a sequence of i.i.d. Rademacher random variables (that is, $\Pr[ξ_i=1]=\Pr[ξ_i=-1]=1/2$). The classical Littlewood-Offord problem asks for the best possible upper bound on the concentration probabilities $\Pr[X=x]$. In this paper we study a resilience version of the Littlewood-Offord problem: how many of the $ξ_i$ is an adversary typically allowed to change without being able to force concentration on a particular value? We solve this problem asymptotically, and present a few interesting open problems.
Submitted 2 August, 2017; v1 submitted 26 September, 2016;
originally announced September 2016.
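The concentration probabilities $\Pr[X=x]$ in the classical Littlewood-Offord problem above can be computed exactly for small $n$ by enumerating all sign patterns. The sketch below is illustrative only: the function name `max_concentration` and the example coefficient sequences are assumptions, and the code covers the classical quantity, not the resilience question studied in the paper.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def max_concentration(a):
    """Exact max over x of Pr[X = x], where X = sum_i a_i * xi_i
    and the xi_i are i.i.d. Rademacher signs.

    Brute force over all 2^n sign patterns, so only suitable for small n.
    """
    counts = Counter(
        sum(ai * s for ai, s in zip(a, signs))
        for signs in product((-1, 1), repeat=len(a))
    )
    return Fraction(max(counts.values()), 2 ** len(a))

print(max_concentration([1, 1, 1, 1]))  # 3/8
print(max_concentration([1, 2, 4, 8]))  # 1/16
```

Equal coefficients give the largest concentration (here 6 of the 16 sign patterns sum to 0), whereas powers of two make every sum distinct, so each value has probability $2^{-n}$.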
-
Optimality and Sub-optimality of PCA for Spiked Random Matrices and Synchronization
Authors:
Amelia Perry,
Alexander S. Wein,
Afonso S. Bandeira,
Ankur Moitra
Abstract:
A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, in which a prominent eigenvector is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Péché showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the signal strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise.
However, not all the information about the spike is necessarily contained in the spectrum. We study the fundamental limitations of statistical methods, including non-spectral ones. Our results include:
I) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for a variety of benign priors for the spike. We extend previous work on the spherically symmetric and i.i.d. Rademacher priors through an elementary, unified analysis.
II) For any non-Gaussian Wigner ensemble, we show that PCA is always suboptimal for detection. However, a variant of PCA achieves the optimal threshold (for benign priors) by pre-transforming the matrix entries according to a carefully designed function. This approach has been stated before, and we give a rigorous and general analysis.
III) For both the Gaussian Wishart ensemble and various synchronization problems over groups, we show that inefficient procedures can work below the threshold where PCA succeeds, whereas no known efficient algorithm achieves this. This conjectural gap between what is statistically possible and what can be done efficiently remains open.
Submitted 23 December, 2016; v1 submitted 18 September, 2016;
originally announced September 2016.
-
The non-convex Burer-Monteiro approach works on smooth semidefinite programs
Authors:
Nicolas Boumal,
Vladislav Voroninski,
Afonso S. Bandeira
Abstract:
Semidefinite programs (SDPs) can be solved in polynomial time by interior point methods, but scalability can be an issue. To address this shortcoming, over a decade ago, Burer and Monteiro proposed to solve SDPs with few equality constraints via rank-restricted, non-convex surrogates. Remarkably, for some applications, local optimization methods seem to converge to global optima of these non-convex surrogates reliably. Although some theory supports this empirical success, a complete explanation of it remains an open question. In this paper, we consider a class of SDPs which includes applications such as max-cut, community detection in the stochastic block model, robust PCA, phase retrieval and synchronization of rotations. We show that the low-rank Burer-Monteiro formulation of SDPs in that class almost never has any spurious local optima.
Submitted 10 April, 2018; v1 submitted 15 June, 2016;
originally announced June 2016.
-
On the low-rank approach for semidefinite programs arising in synchronization and community detection
Authors:
Afonso S. Bandeira,
Nicolas Boumal,
Vladislav Voroninski
Abstract:
To address difficult optimization problems, convex relaxations based on semidefinite programming are now commonplace in many fields. Although solvable in polynomial time, large semidefinite programs tend to be computationally challenging. Over a decade ago, exploiting the fact that in many applications of interest the desired solutions are low rank, Burer and Monteiro proposed a heuristic to solve such semidefinite programs by restricting the search space to low-rank matrices. The accompanying theory does not explain the extent of the empirical success. We focus on Synchronization and Community Detection problems and provide theoretical guarantees shedding light on the remarkable efficiency of this heuristic.
Submitted 27 May, 2016; v1 submitted 14 February, 2016;
originally announced February 2016.