
Tensor cumulants for statistical inference on invariant distributions

Authors: Dmitriy Kunisky, Cristopher Moore, Alexander S. Wein

Abstract: Many problems in high-dimensional statistics appear to have a statistical-computational gap: a range of values of the signal-to-noise ratio where inference is information-theoretically possible, but (conjecturally) computationally intractable. A canonical such problem is Tensor PCA, where we observe a tensor $Y$ consisting of a rank-one signal plus Gaussian noise. Multiple lines of work suggest that Tensor PCA becomes computationally hard at a critical value of the signal's magnitude. In particular, below this transition, no low-degree polynomial algorithm can detect the signal with high probability; conversely, various spectral algorithms are known to succeed above this transition. We unify and extend this work by considering tensor networks, orthogonally invariant polynomials where multiple copies of $Y$ are "contracted" to produce scalars, vectors, matrices, or other tensors. We define a new set of objects, tensor cumulants, which provide an explicit, near-orthogonal basis for invariant polynomials of a given degree. This basis lets us unify and strengthen previous results on low-degree hardness, giving a combinatorial explanation of the hardness transition and of a continuum of subexponential-time algorithms that work below it, and proving tight lower bounds against low-degree polynomials for recovering rather than just detecting the signal.
It also lets us analyze a new problem of distinguishing between different tensor ensembles, such as Wigner and Wishart tensors, establishing a sharp computational threshold and giving evidence of a new statistical-computational gap in the Central Limit Theorem for random tensors. Finally, we believe these cumulants are valuable mathematical objects in their own right: they generalize the free cumulants of free probability theory from matrices to tensors, and share many of their properties, including additivity under additive free convolution.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 72 pages, 12 figures
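To make "contracting copies of $Y$" concrete, here is a minimal pure-Python sketch of two of the simplest order-3 tensor networks. It is not the paper's tensor-cumulant construction. Joining two copies of $Y$ along two legs gives an $n \times n$ matrix, and closing the remaining legs gives the scalar $\|Y\|_F^2$. The helper names, the $\pm 1/\sqrt{n}$ signal, and the unsymmetrized i.i.d. noise in `spiked_tensor` are illustrative assumptions. The last lines check numerically that the scalar network is unchanged when the same orthogonal matrix acts on every leg of $Y$.

```python
import math
import random

def spiked_tensor(n, lam, seed=0):
    """Hypothetical Tensor PCA instance Y = lam * x⊗x⊗x + W, W i.i.d. N(0,1) (not symmetrized)."""
    rng = random.Random(seed)
    x = [rng.choice((-1.0, 1.0)) / math.sqrt(n) for _ in range(n)]  # unit-norm hidden signal
    return [[[lam * x[i] * x[j] * x[k] + rng.gauss(0.0, 1.0)
              for k in range(n)] for j in range(n)] for i in range(n)]

def contract_pair(Y):
    """Degree-2 matrix network: M[i][j] = sum_{k,l} Y[i][k][l] * Y[j][k][l]."""
    n = len(Y)
    return [[sum(Y[i][k][l] * Y[j][k][l] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

def trace(M):
    """Closing the two open legs gives a scalar network, here ||Y||_F^2."""
    return sum(M[i][i] for i in range(len(M)))

def rotate_all_legs(Y, Q):
    """Apply orthogonal Q to every leg: Y'[a][b][c] = sum Q[a][i] Q[b][j] Q[c][k] Y[i][j][k]."""
    n = len(Y)
    return [[[sum(Q[a][i] * Q[b][j] * Q[c][k] * Y[i][j][k]
                  for i in range(n) for j in range(n) for k in range(n))
              for c in range(n)] for b in range(n)] for a in range(n)]

n = 4
Y = spiked_tensor(n, lam=2.0)
M = contract_pair(Y)
s = trace(M)

# A rotation in the (0, 1) coordinate plane; the scalar network should be invariant.
theta = 0.7
Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
Q[0][0] = Q[1][1] = math.cos(theta)
Q[0][1], Q[1][0] = -math.sin(theta), math.sin(theta)
s_rotated = trace(contract_pair(rotate_all_legs(Y, Q)))
```

Higher-degree networks follow the same pattern. More copies of $Y$ are joined along different pairings of legs, and the number of copies is the polynomial degree.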

arXiv:2402.05451

Low-degree phase transitions for detecting a planted clique in sublinear time

Authors: Jay Mardia, Kabir Aladin Verchand, Alexander S. Wein

Abstract: We consider the problem of detecting a planted clique of size $k$ in a random graph on $n$ vertices. When the size of the clique exceeds $Θ(\sqrt{n})$, polynomial-time algorithms for detection proliferate. We study faster -- namely, sublinear time -- algorithms in the high-signal regime when $k = Θ(n^{1/2 + δ})$, for some $δ> 0$. To this end, we consider algorithms that non-adaptively query a subset $M$ of entries of the adjacency matrix and then compute a low-degree polynomial function of the revealed entries. We prove a computational phase transition for this class of non-adaptive low-degree algorithms: under the scaling $\lvert M \rvert = Θ(n^γ)$, the clique can be detected when $γ> 3(1/2 - δ)$ but not when $γ< 3(1/2 - δ)$. As a result, the best known runtime for detecting a planted clique, $\widetilde{O}(n^{3(1/2-δ)})$, cannot be improved without looking beyond the non-adaptive low-degree class. Our proof of the lower bound -- based on bounding the conditional low-degree likelihood ratio -- reveals further structure in non-adaptive detection of a planted clique. Using (a bound on) the conditional low-degree likelihood ratio as a potential function, we show that for every non-adaptive query pattern, there is a highly structured query pattern of the same size that is at least as effective.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 23 pages, 2 figures
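The non-adaptive query model can be sketched as follows, assuming $G(n,1/2)$ with a uniformly random planted $k$-clique. The adjacency oracle samples an entry only when it is queried, so the work done is proportional to $\lvert M \rvert$ rather than $n^2$. The query pattern here, all pairs inside a random vertex subset $S$ of size $m$, and the degree-1 centered edge-count statistic are illustrative choices. They are not the optimal or structured patterns from the paper. The class and function names are hypothetical.

```python
import random

class PlantedCliqueOracle:
    """Lazily sampled G(n, 1/2) with a planted k-clique; entries are drawn only when queried."""

    def __init__(self, n, k, seed=0):
        self.rng = random.Random(seed)
        self.clique = set(self.rng.sample(range(n), k))
        self.cache = {}
        self.queries = 0  # number of distinct entries revealed, i.e. |M|

    def edge(self, i, j):
        key = (min(i, j), max(i, j))
        if key not in self.cache:
            self.queries += 1
            if i in self.clique and j in self.clique:
                self.cache[key] = 1
            else:
                self.cache[key] = self.rng.randint(0, 1)
        return self.cache[key]

def centered_edge_count(oracle, S):
    """Degree-1 polynomial of the queried entries: sum over pairs {i, j} in S of (A_ij - 1/2)."""
    S = sorted(S)
    return sum(oracle.edge(S[a], S[b]) - 0.5
               for a in range(len(S)) for b in range(a + 1, len(S)))

n, k, m = 10_000, 1_000, 400
S = random.Random(1).sample(range(n), m)
null_stat = centered_edge_count(PlantedCliqueOracle(n, 0, seed=2), S)
planted_stat = centered_edge_count(PlantedCliqueOracle(n, k, seed=2), S)
```

Under the null, the statistic has mean 0 and standard deviation $\sqrt{\lvert M \rvert}/2$. A planted clique $K$ shifts its mean by $\mathbb{E}\binom{\lvert S \cap K \rvert}{2}/2$. Comparing these two quantities shows how large $\lvert M \rvert$ must be for this simple test to work.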

arXiv:2402.00305

Information-Theoretic Thresholds for Planted Dense Cycles

Authors: Cheng Mao, Alexander S. Wein, Shenduo Zhang

Abstract: We study a random graph model for small-world networks which are ubiquitous in social and biological sciences. In this model, a dense cycle of expected bandwidth $n τ$, representing the hidden one-dimensional geometry of vertices, is planted in an ambient random graph on $n$ vertices. For both detection and recovery of the planted dense cycle, we characterize the information-theoretic thresholds in terms of $n$, $τ$, and an edge-wise signal-to-noise ratio $λ$. In particular, the information-theoretic thresholds differ from the computational thresholds established in a recent work for low-degree polynomial algorithms, thereby justifying the existence of statistical-to-computational gaps for this problem.

Submitted 31 January, 2024; originally announced February 2024.

Comments: 31 pages, 1 figure

MSC Class: 94A15; 62B10; 68Q87; 05C80; 05C60

arXiv:2312.13554

Time Lower Bounds for the Metropolis Process and Simulated Annealing

Authors: Zongchen Chen, Dan Mikulincer, Daniel Reichman, Alexander S. Wein

Abstract: The Metropolis process (MP) and Simulated Annealing (SA) are stochastic local search heuristics that are often used in solving combinatorial optimization problems. Despite significant interest, there are very few theoretical results regarding the quality of approximation obtained by MP and SA (with polynomially many iterations) for NP-hard optimization problems. We provide rigorous lower bounds for MP and SA with respect to the classical maximum independent set problem when the algorithms are initialized from the empty set. We establish the existence of a family of graphs for which both MP and SA fail to find approximate solutions in polynomial time. More specifically, we show that for any $\varepsilon \in (0,1)$ there are $n$-vertex graphs for which the probability SA (when limited to polynomially many iterations) will approximate the optimal solution within ratio $Ω\left(\frac{1}{n^{1-\varepsilon}}\right)$ is exponentially small. Our lower bounds extend to graphs of constant average degree $d$, illustrating the failure of MP to achieve an approximation ratio of $Ω\left(\frac{\log (d)}{d}\right)$ in polynomial time. In some cases, our impossibility results also go beyond Simulated Annealing and apply even when the temperature is chosen adaptively. Finally, we prove time lower bounds when the inputs to these algorithms are bipartite graphs, and even trees, which are known to admit polynomial-time algorithms for the independent set problem.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 44 pages
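For readers unfamiliar with the dynamics, here is a minimal sketch of a standard Metropolis chain on independent sets, started from the empty set as in the abstract. It targets $\pi(I) \propto \lambda^{\lvert I \rvert}$ with fugacity $\lambda \ge 1$. The paper's exact parametrization may differ, and the function name is hypothetical. Each step proposes a uniformly random vertex. An addition that keeps the set independent is always accepted, and a removal is accepted with probability $1/\lambda$. Simulated Annealing would instead raise $\lambda$ over time according to a temperature schedule.

```python
import random

def metropolis_independent_set(adj, lam, steps, seed=0):
    """Metropolis chain targeting pi(I) ∝ lam**|I| over independent sets, started from the empty set.

    adj: list of neighbor sets. Assumes lam >= 1, so valid additions are always accepted.
    """
    rng = random.Random(seed)
    n = len(adj)
    in_set = [False] * n  # empty initial state
    for _ in range(steps):
        v = rng.randrange(n)  # symmetric proposal: a uniformly random vertex
        if in_set[v]:
            if rng.random() < 1.0 / lam:  # removal accepted with probability 1/lam
                in_set[v] = False
        elif not any(in_set[u] for u in adj[v]):  # adding v keeps the set independent
            in_set[v] = True
    return {v for v in range(n) if in_set[v]}

# Usage on a 6-cycle.
cycle = [{(i - 1) % 6, (i + 1) % 6} for i in range(6)]
I = metropolis_independent_set(cycle, lam=3.0, steps=1000)
```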

arXiv:2304.08135

Detection of Dense Subhypergraphs by Low-Degree Polynomials

Authors: Abhishek Dhawan, Cheng Mao, Alexander S. Wein

Abstract: Detection of a planted dense subgraph in a random graph is a fundamental statistical and computational problem that has been extensively studied in recent years. We study a hypergraph version of the problem. Let $G^r(n,p)$ denote the $r$-uniform Erdős-Rényi hypergraph model with $n$ vertices and edge density $p$. We consider detecting the presence of a planted $G^r(n^γ, n^{-α})$ subhypergraph in a $G^r(n, n^{-β})$ hypergraph, where $0< α< β< r-1$ and $0 < γ< 1$. Focusing on tests that are degree-$n^{o(1)}$ polynomials of the entries of the adjacency tensor, we determine the threshold between the easy and hard regimes for the detection problem. More precisely, for $0 < γ< 1/2$, the threshold is given by $α= βγ$, and for $1/2 \le γ< 1$, the threshold is given by $α= β/2 + r(γ- 1/2)$. Our results are already new in the graph case $r=2$, as we consider the subtle log-density regime where hardness based on average-case reductions is not known. Our proof of low-degree hardness is based on a conditional variant of the standard low-degree likelihood calculation.

Submitted 17 April, 2023; originally announced April 2023.

Comments: 31 pages
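The piecewise threshold stated in the abstract can be written out directly. Both branches give $β/2$ at $γ = 1/2$, so the curve is continuous. The function names are hypothetical. The classification in `regime` assumes the usual orientation, where a denser planted part (smaller $α$) is the easy side. That orientation is an inference from the model, not a quotation from the paper.

```python
def low_degree_threshold(beta, gamma, r):
    """Critical alpha for detecting a planted G^r(n^gamma, n^-alpha) inside G^r(n, n^-beta)."""
    if not (0 < beta < r - 1 and 0 < gamma < 1):
        raise ValueError("requires 0 < beta < r - 1 and 0 < gamma < 1")
    if gamma < 0.5:
        return beta * gamma
    return beta / 2 + r * (gamma - 0.5)

def regime(alpha, beta, gamma, r):
    """Assumption: smaller alpha means a denser planted part, hence the easy side."""
    threshold = low_degree_threshold(beta, gamma, r)
    if alpha < threshold:
        return "easy"
    if alpha > threshold:
        return "hard"
    return "critical"

# Graph case r = 2 with background density n^{-1/2}.
small_planted = low_degree_threshold(0.5, 0.25, 2)  # beta * gamma branch
large_planted = low_degree_threshold(0.5, 0.75, 2)  # beta / 2 + r * (gamma - 1/2) branch
```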

arXiv:2303.00252

Is Planted Coloring Easier than Planted Clique?

Authors: Pravesh K. Kothari, Santosh S. Vempala, Alexander S. Wein, Jeff Xu

Abstract: We study the computational complexity of two related problems: recovering a planted $q$-coloring in $G(n,1/2)$, and finding efficiently verifiable witnesses of non-$q$-colorability (a.k.a. refutations) in $G(n,1/2)$. Our main results show hardness for both these problems in a restricted-but-powerful class of algorithms based on computing low-degree polynomials in the inputs. The problem of recovering a planted $q$-coloring is equivalent to recovering $q$ disjoint planted cliques that cover all the vertices -- a potentially easier variant of the well-studied planted clique problem. Our first result shows that this variant is as hard as the original planted clique problem in the low-degree polynomial model of computation: each clique needs to have size $k \gg \sqrt{n}$ for efficient recovery to be possible. For the related variant where the cliques cover a $(1-ε)$-fraction of the vertices, we also show hardness by reduction from planted clique. Our second result shows that refuting $q$-colorability of $G(n,1/2)$ is hard in the low-degree polynomial model when $q \gg n^{2/3}$ but easy when $q \lesssim n^{1/2}$, and we leave closing this gap for future work. Our proof is more subtle than similar results for planted clique and involves constructing a non-standard distribution over $q$-colorable graphs. We note that while related to several prior works, this is the first work that explicitly formulates refutation problems in the low-degree polynomial model.
The proofs of our main results involve showing low-degree hardness of hypothesis testing between an appropriately constructed pair of distributions. For refutation, we show completeness of this approach: in the low-degree model, the refutation task is precisely as hard as the hardest associated testing problem, i.e., proving hardness of refutation amounts to finding a "hard" distribution.

Submitted 1 March, 2023; originally announced March 2023.

Comments: 23 pages
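The equivalence between a planted $q$-coloring and $q$ disjoint planted cliques comes from graph complementation. The sketch below assumes one simple sampler: each vertex gets a uniformly random color, and each bichromatic pair becomes an edge with probability $1/2$. The paper may instead use balanced color classes. After complementing, every color class is a clique, and cross-class pairs are still fair coin flips, so the result is again $G(n,1/2)$ with $q$ planted cliques covering all vertices. The function names are hypothetical.

```python
import random

def planted_coloring(n, q, seed=0):
    """Hypothetical sampler: random colors; each bichromatic pair is an edge with probability 1/2."""
    rng = random.Random(seed)
    color = [rng.randrange(q) for _ in range(n)]
    edges = {(u, v) for u in range(n) for v in range(u + 1, n)
             if color[u] != color[v] and rng.random() < 0.5}
    return color, edges

def complement(n, edges):
    """Edge set of the complement graph, with pairs stored as (u, v), u < v."""
    return {(u, v) for u in range(n) for v in range(u + 1, n) if (u, v) not in edges}

n = 60
color, edges = planted_coloring(n, q=4)
co_edges = complement(n, edges)  # each color class is now a clique in co_edges
```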

arXiv:2302.06737

Detection-Recovery Gap for Planted Dense Cycles

Authors: Cheng Mao, Alexander S. Wein, Shenduo Zhang

Abstract: Planted dense cycles are a type of latent structure that appears in many applications, such as small-world networks in social sciences and sequence assembly in computational biology. We consider a model where a dense cycle with expected bandwidth $n τ$ and edge density $p$ is planted in an Erdős-Rényi graph $G(n,q)$. We characterize the computational thresholds for the associated detection and recovery problems for the class of low-degree polynomial algorithms. In particular, a gap exists between the two thresholds in a certain regime of parameters. For example, if $n^{-3/4} \ll τ\ll n^{-1/2}$ and $p = C q = Θ(1)$ for a constant $C>1$, the detection problem is computationally easy while the recovery problem is hard for low-degree algorithms.

Submitted 20 June, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: 41 pages, 1 figure

arXiv:2212.10872

Is it easier to count communities than find them?

Authors: Cynthia Rush, Fiona Skerman, Alexander S. Wein, Dana Yang

Abstract: Random graph models with community structure have been studied extensively in the literature. For both the problems of detecting and recovering community structure, an interesting landscape of statistical and computational phase transitions has emerged. A natural unanswered question is: might it be possible to infer properties of the community structure (for instance, the number and sizes of communities) even in situations where actually finding those communities is believed to be computationally hard? We show the answer is no. In particular, we consider certain hypothesis testing problems between models with different community structures, and we show (in the low-degree polynomial framework) that testing between two options is as hard as finding the communities. In addition, our methods give the first computational lower bounds for testing between two different 'planted' distributions, whereas previous results have considered testing between a planted distribution and an i.i.d. 'null' distribution.

Submitted 21 December, 2022; originally announced December 2022.

Comments: Accepted to Innovations in Theoretical Computer Science (ITCS) 2023

MSC Class: 05C80; 62F03; 68Q25

ACM Class: F.2; G.2

arXiv:2211.05274

doi 10.1145/3564246.3585232

Average-Case Complexity of Tensor Decomposition for Low-Degree Polynomials

Authors: Alexander S. Wein

Abstract: Suppose we are given an $n$-dimensional order-3 symmetric tensor $T \in (\mathbb{R}^n)^{\otimes 3}$ that is the sum of $r$ random rank-1 terms. The problem of recovering the rank-1 components is possible in principle when $r \lesssim n^2$ but polynomial-time algorithms are only known in the regime $r \ll n^{3/2}$. Similar "statistical-computational gaps" occur in many high-dimensional inference tasks, and in recent years there has been a flurry of work on explaining the apparent computational hardness in these problems by proving lower bounds against restricted (yet powerful) models of computation such as statistical queries (SQ), sum-of-squares (SoS), and low-degree polynomials (LDP). However, no such prior work exists for tensor decomposition, largely because its hardness does not appear to be explained by a "planted versus null" testing problem. We consider a model for random order-3 tensor decomposition where one component is slightly larger in norm than the rest (to break symmetry), and the components are drawn uniformly from the hypercube. We resolve the computational complexity in the LDP model: $O(\log n)$-degree polynomial functions of the tensor entries can accurately estimate the largest component when $r \ll n^{3/2}$ but fail to do so when $r \gg n^{3/2}$. This provides rigorous evidence suggesting that the best known algorithms for tensor decomposition cannot be improved, at least by known approaches. A natural extension of the result holds for tensors of any fixed order $k \ge 3$, in which case the LDP threshold is $r \sim n^{k/2}$.

Submitted 26 March, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

Comments: 42 pages; STOC 2023

arXiv:2208.09493

Near-optimal fitting of ellipsoids to random points

Authors: Aaron Potechin, Paxton Turner, Prayaag Venkat, Alexander S. Wein

Abstract: Given independent standard Gaussian points $v_1, \ldots, v_n$ in dimension $d$, for what values of $(n, d)$ does there exist with high probability an origin-symmetric ellipsoid that simultaneously passes through all of the points? This basic problem of fitting an ellipsoid to random points has connections to low-rank matrix decompositions, independent component analysis, and principal component analysis. Based on strong numerical evidence, Saunderson, Parrilo, and Willsky [Proc. of Conference on Decision and Control, pp. 6031-6036, 2013] conjecture that the ellipsoid fitting problem transitions from feasible to infeasible as the number of points $n$ increases, with a sharp threshold at $n \sim d^2/4$. We resolve this conjecture up to logarithmic factors by constructing a fitting ellipsoid for some $n = Ω( \, d^2/\mathrm{polylog}(d) \,)$, improving prior work of Ghosh et al. [Proc. of Symposium on Foundations of Computer Science, pp. 954-965, 2020] that requires $n = o(d^{3/2})$. Our proof demonstrates feasibility of the least squares construction of Saunderson et al. using a convenient decomposition of a certain non-standard random matrix and a careful analysis of its Neumann expansion via the theory of graph matrices.

Submitted 1 June, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

Comments: An earlier version of this paper contained an error in the proof of Proposition 5.2. The current version contains a corrected proof of the original result.

arXiv:2206.07640

Statistical and Computational Phase Transitions in Group Testing

Authors: Amin Coja-Oghlan, Oliver Gebhard, Max Hahn-Klimroth, Alexander S. Wein, Ilias Zadik

Abstract: We study the group testing problem where the goal is to identify a set of k infected individuals carrying a rare disease within a population of size n, based on the outcomes of pooled tests which return positive whenever there is at least one infected individual in the tested group. We consider two different simple random procedures for assigning individuals to tests: the constant-column design and Bernoulli design. Our first set of results concerns the fundamental statistical limits. For the constant-column design, we give a new information-theoretic lower bound which implies that the proportion of correctly identifiable infected individuals undergoes a sharp "all-or-nothing" phase transition when the number of tests crosses a particular threshold. For the Bernoulli design, we determine the precise number of tests required to solve the associated detection problem (where the goal is to distinguish between a group testing instance and pure noise), improving both the upper and lower bounds of Truong, Aldridge, and Scarlett (2020). For both group testing models, we also study the power of computationally efficient (polynomial-time) inference procedures. We determine the precise number of tests required for the class of low-degree polynomial algorithms to solve the detection problem. This provides evidence for an inherent computational-statistical gap in both the detection and recovery problems at small sparsity levels.
Notably, our evidence is contrary to that of Iliopoulos and Zadik (2021), who predicted the absence of a computational-statistical gap in the Bernoulli design.

Submitted 15 June, 2022; originally announced June 2022.

Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2022
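The pooled-test model under the Bernoulli design can be sketched in a few lines. Each test includes each individual independently with probability $p$, and a test is positive exactly when it contains an infected individual. For illustration, the sketch pairs this model with COMP, a classical simple decoder that is not the method analyzed in the paper. COMP clears everyone who appears in a negative test and declares everyone else infected, so it never misses an infected individual. The parameter values and function names are illustrative assumptions.

```python
import random

def bernoulli_design(n, m, p, seed=0):
    """m pooled tests; each of n individuals joins each test independently with probability p."""
    rng = random.Random(seed)
    return [[j for j in range(n) if rng.random() < p] for _ in range(m)]

def run_tests(design, infected):
    """A test is positive iff its pool contains at least one infected individual."""
    return [any(j in infected for j in pool) for pool in design]

def comp_decode(n, design, outcomes):
    """COMP: anyone appearing in a negative test is cleared; all others are declared infected."""
    cleared = set()
    for pool, positive in zip(design, outcomes):
        if not positive:
            cleared.update(pool)
    return set(range(n)) - cleared

n, k = 1000, 10
infected = set(random.Random(1).sample(range(n), k))
design = bernoulli_design(n, m=300, p=1 / k)
estimate = comp_decode(n, design, run_tests(design, infected))
```

The set `estimate` always contains `infected`. Its false positives shrink as the number of tests `m` grows.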

arXiv:2205.09727

The Franz-Parisi Criterion and Computational Trade-offs in High Dimensional Statistics

Authors: Afonso S. Bandeira, Ahmed El Alaoui, Samuel B. Hopkins, Tselil Schramm, Alexander S. Wein, Ilias Zadik

Abstract: Many high-dimensional statistical inference problems are believed to possess inherent computational hardness. Various frameworks have been proposed to give rigorous evidence for such hardness, including lower bounds against restricted models of computation (such as low-degree functions), as well as methods rooted in statistical physics that are based on free energy landscapes. This paper aims to make a rigorous connection between the seemingly different low-degree and free-energy based approaches. We define a free-energy based criterion for hardness and formally connect it to the well-established notion of low-degree hardness for a broad class of statistical problems, namely all Gaussian additive models and certain models with a sparse planted signal. By leveraging these rigorous connections we are able to: establish that for Gaussian additive models the "algebraic" notion of low-degree hardness implies failure of "geometric" local MCMC algorithms, and provide new low-degree lower bounds for sparse linear regression which seem difficult to prove directly. These results provide both conceptual insights into the connections between different notions of hardness, as well as concrete technical tools such as new methods for proving low-degree lower bounds.

Submitted 13 October, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

Comments: 52 pages, 1 figure

arXiv:2112.03898

Lattice-Based Methods Surpass Sum-of-Squares in Clustering

Authors: Ilias Zadik, Min Jae Song, Alexander S. Wein, Joan Bruna

Abstract: Clustering is a fundamental primitive in unsupervised learning which gives rise to a rich class of computationally-challenging inference tasks. In this work, we focus on the canonical task of clustering d-dimensional Gaussian mixtures with unknown (and possibly degenerate) covariance. Recent works (Ghosh et al. '20; Mao, Wein '21; Davis, Diaz, Wang '21) have established lower bounds against the class of low-degree polynomial methods and the sum-of-squares (SoS) hierarchy for recovering certain hidden structures planted in Gaussian clustering instances. Prior work on many similar inference tasks portends that such lower bounds strongly suggest the presence of an inherent statistical-to-computational gap for clustering, that is, a parameter regime where the clustering task is statistically possible but no polynomial-time algorithm succeeds. One special case of the clustering task we consider is equivalent to the problem of finding a planted hypercube vector in an otherwise random subspace. We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap, even though the aforementioned low-degree and SoS lower bounds continue to apply in this case. To achieve this, we give a polynomial-time algorithm based on the Lenstra-Lenstra-Lovász lattice basis reduction method which achieves the statistically-optimal sample complexity of d+1 samples.
This result extends the class of problems whose conjectured statistical-to-computational gaps can be "closed" by "brittle" polynomial-time algorithms, highlighting the crucial but subtle role of noise in the onset of statistical-to-computational gaps.

Submitted 7 January, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: Added a new tight information-theoretic lower bound for label recovery

arXiv:2109.01342

Circuit Lower Bounds for the p-Spin Optimization Problem

Authors: David Gamarnik, Aukosh Jagannath, Alexander S. Wein

Abstract: We consider the problem of finding a near ground state of a $p$-spin model with Rademacher couplings by means of a low-depth circuit. As a direct extension of the authors' recent work [Gamarnik, Jagannath, Wein 2020], we establish that any poly-size $n$-output circuit that produces a spin assignment with objective value within a certain constant factor of optimality must have depth at least $\log n/(2\log\log n)$ as $n$ grows. This is stronger than the known state-of-the-art bounds of the form $Ω(\log n/(k(n)\log\log n))$ for similar combinatorial optimization problems, where $k(n)$ depends on the optimality value. For example, for the largest clique problem $k(n)$ corresponds to the square of the size of the clique [Rossman 2010]. At the same time our results are not quite comparable since in our case the circuits are required to produce a solution itself rather than solving the associated decision problem. As in our earlier work, the approach is based on the overlap gap property (OGP) exhibited by random $p$-spin models, but the derivation of the circuit lower bound relies further on standard facts from Fourier analysis on the Boolean cube, in particular the Linial-Mansour-Nisan Theorem. To the best of our knowledge, this is the first instance when methods from spin glass theory have ramifications for circuit complexity.

Submitted 21 January, 2022; v1 submitted 3 September, 2021; originally announced September 2021.

Comments: 14 pages
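To fix ideas about the objective, the sketch below evaluates a $p$-spin energy with i.i.d. Rademacher couplings over all ordered index tuples. It uses one common normalization, $H(\sigma) = n^{-(p+1)/2} \sum_{i_1,\ldots,i_p} J_{i_1 \cdots i_p} \sigma_{i_1} \cdots \sigma_{i_p}$. That normalization, and the choice to treat the ground state as the maximizer of $H$, are assumptions. The brute-force search only serves as a reference point for tiny $n$. The paper concerns low-depth circuits and does not use exhaustive search. Function names are hypothetical.

```python
import itertools
import random

def rademacher_couplings(n, p, seed=0):
    """i.i.d. ±1 couplings J[i1, ..., ip] over all ordered p-tuples of indices."""
    rng = random.Random(seed)
    return {idx: rng.choice((-1, 1)) for idx in itertools.product(range(n), repeat=p)}

def energy(J, sigma, p):
    """H(sigma) = n^{-(p+1)/2} * sum J[i1..ip] * sigma_i1 * ... * sigma_ip (assumed normalization)."""
    n = len(sigma)
    total = 0
    for idx, coupling in J.items():
        term = coupling
        for i in idx:
            term *= sigma[i]
        total += term
    return total / n ** ((p + 1) / 2)

def brute_force_ground_state(J, n, p):
    """Exhaustive maximization over {-1, +1}^n; feasible only for very small n."""
    return max(itertools.product((-1, 1), repeat=n), key=lambda s: energy(J, s, p))

n, p = 8, 3
J = rademacher_couplings(n, p, seed=1)
best = brute_force_ground_state(J, n, p)
```

For odd $p$, $H(-\sigma) = -H(\sigma)$, so the maximum is always nonnegative.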

arXiv:2105.15081

Optimal Spectral Recovery of a Planted Vector in a Subspace

Authors: Cheng Mao, Alexander S. Wein

Abstract: Recovering a planted vector $v$ in an $n$-dimensional random subspace of $\mathbb{R}^N$ is a generic task related to many problems in machine learning and statistics, such as dictionary learning, subspace recovery, principal component analysis, and non-Gaussian component analysis. In this work, we study computationally efficient estimation and detection of a planted vector $v$ whose $\ell_4$ norm differs from that of a Gaussian vector with the same $\ell_2$ norm. For instance, in the special case where $v$ is an $N ρ$-sparse vector with Bernoulli-Gaussian or Bernoulli-Rademacher entries, our results include the following: (1) We give an improved analysis of a slight variant of the spectral method proposed by Hopkins, Schramm, Shi, and Steurer (2016), showing that it approximately recovers $v$ with high probability in the regime $n ρ\ll \sqrt{N}$. This condition subsumes the conditions $ρ\ll 1/\sqrt{n}$ or $n \sqrt{ρ} \lesssim \sqrt{N}$ required by previous work up to polylogarithmic factors. We achieve $\ell_\infty$ error bounds for the spectral estimator via a leave-one-out analysis, from which it follows that a simple thresholding procedure exactly recovers $v$ with Bernoulli-Rademacher entries, even in the dense case $ρ= 1$. (2) We study the associated detection problem and show that in the regime $n ρ\gg \sqrt{N}$, any spectral method from a large class (and more generally, any low-degree polynomial of the input) fails to detect the planted vector.
This matches the condition for recovery and offers evidence that no polynomial-time algorithm can succeed in recovering a Bernoulli-Gaussian vector $v$ when $n ρ\gg \sqrt{N}$.

Submitted 13 October, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

Comments: 54 pages

arXiv:2012.02243

Average-Case Integrality Gap for Non-Negative Principal Component Analysis

Authors: Afonso S. Bandeira, Dmitriy Kunisky, Alexander S. Wein

Abstract: Montanari and Richard (2015) asked whether a natural semidefinite programming (SDP) relaxation can effectively optimize $\mathbf{x}^{\top}\mathbf{W} \mathbf{x}$ over $\|\mathbf{x}\| = 1$ with $x_i \geq 0$ for all coordinates $i$, where $\mathbf{W} \in \mathbb{R}^{n \times n}$ is drawn from the Gaussian orthogonal ensemble (GOE) or a spiked matrix model. In small numerical experiments, this SDP appears to be tight for the GOE, producing a rank-one optimal matrix solution aligned with the optimal vector $\mathbf{x}$. We prove, however, that as $n \to \infty$ the SDP is not tight, and certifies an upper bound asymptotically no better than the simple spectral bound $λ_{\max}(\mathbf{W})$ on this objective function. We also provide evidence, using tools from recent literature on hypothesis testing with low-degree polynomials, that no subexponential-time certification algorithm can improve on this behavior. Finally, we present further numerical experiments estimating how large $n$ would need to be before this limiting behavior becomes evident, providing a cautionary example against extrapolating asymptotics of SDPs in high dimension from their efficacy in small "laptop scale" computations.

Submitted 3 December, 2020; originally announced December 2020.

Comments: 12 pages, 3 figures

arXiv:2010.06563 [pdf, ps, other]

Optimal Low-Degree Hardness of Maximum Independent Set

Authors: Alexander S. Wein

Abstract: We study the algorithmic task of finding a large independent set in a sparse Erdős-Rényi random graph with $n$ vertices and average degree $d$. The maximum independent set is known to have size $(2 \log d / d)n$ in the double limit $n \to \infty$ followed by $d \to \infty$, but the best known polynomial-time algorithms can only find an independent set of half-optimal size $(\log d / d)n$. We show that the class of low-degree polynomial algorithms can find independent sets of half-optimal size but no larger, improving upon a result of Gamarnik, Jagannath, and the author. This generalizes earlier work by Rahman and Virág, which proved the analogous result for the weaker class of local algorithms.
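The half-optimal baseline can be illustrated with a random-order greedy algorithm, a simple polynomial-time heuristic for this problem. The Python sketch below (standard library only; function names are hypothetical) samples a sparse Erdős-Rényi graph with average degree about $d$, runs greedy, and prints both $(\log d / d)n$ and $(2\log d / d)n$ for comparison. Since the abstract's statements concern the double limit $n \to \infty$ then $d \to \infty$, a finite run only hints at the trend.

```python
import math
import random


def erdos_renyi(n, d, rng):
    """Adjacency sets of G(n, p) with p = d / n, so the average degree is about d."""
    p = d / n
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def greedy_independent_set(adj, order):
    """Scan vertices in the given order, keeping each one that has no previously kept neighbor."""
    chosen = set()
    blocked = set()  # kept vertices and all of their neighbors
    for v in order:
        if v not in blocked:
            chosen.add(v)
            blocked.add(v)
            blocked.update(adj[v])
    return chosen


def is_independent(adj, vertices):
    return all(u not in adj[v] for v in vertices for u in vertices)


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 2000, 20.0
    adj = erdos_renyi(n, d, rng)
    order = list(range(n))
    rng.shuffle(order)  # random vertex order
    found = greedy_independent_set(adj, order)
    print("greedy size:    ", len(found))
    print("(log d / d) n:  ", math.log(d) / d * n)
    print("(2 log d / d) n:", 2 * math.log(d) / d * n)
```

Greedy is used here only as a familiar baseline; the abstract's results concern the broader class of low-degree polynomial algorithms.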

Submitted 12 November, 2020; v1 submitted 13 October, 2020; originally announced October 2020.

Comments: 19 pages

arXiv:2008.12237 [pdf, ps, other]

Spectral Planting and the Hardness of Refuting Cuts, Colorability, and Communities in Random Graphs

Authors: Afonso S. Bandeira, Jess Banks, Dmitriy Kunisky, Cristopher Moore, Alexander S. Wein

Abstract: We study the problem of efficiently refuting the k-colorability of a graph, or equivalently certifying a lower bound on its chromatic number. We give formal evidence of average-case computational hardness for this problem in sparse random regular graphs, showing optimality of a simple spectral certificate. This evidence takes the form of a computationally-quiet planting: we construct a distribution of d-regular graphs that has significantly smaller chromatic number than a typical regular graph drawn uniformly at random, while providing evidence that these two distributions are indistinguishable by a large class of algorithms. We generalize our results to the more general problem of certifying an upper bound on the maximum k-cut. This quiet planting is achieved by minimizing the effect of the planted structure (e.g. colorings or cuts) on the graph spectrum. Specifically, the planted structure corresponds exactly to eigenvectors of the adjacency matrix. This avoids the pushout effect of random matrix theory, and delays the point at which the planting becomes visible in the spectrum or local statistics. To illustrate this further, we give similar results for a Gaussian analogue of this problem: a quiet version of the spiked model, where we plant an eigenspace rather than adding a generic low-rank perturbation. Our evidence for computational hardness of distinguishing two distributions is based on three different heuristics: stability of belief propagation, the local statistics hierarchy, and the low-degree likelihood ratio.
Of independent interest, our results include general-purpose bounds on the low-degree likelihood ratio for multi-spiked matrix models, and an improved low-degree analysis of the stochastic block model.
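As an illustration of what a spectral certificate for colorability can look like, the classical Hoffman bound $\chi(G) \geq 1 - λ_{\max}(A)/λ_{\min}(A)$ lower-bounds the chromatic number using only the extreme eigenvalues of the adjacency matrix $A$ (for any graph with at least one edge). The Python sketch below computes it with a cyclic Jacobi eigenvalue routine (standard library only; function names are hypothetical). We use Hoffman's bound as a familiar example and do not claim it is the exact certificate analysed in the paper.

```python
import math


def jacobi_eigenvalues(M, max_sweeps=100, tol=1e-22):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations, in ascending order."""
    A = [list(map(float, row)) for row in M]
    n = len(A)
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))


def hoffman_bound(adj_matrix):
    """Hoffman's lower bound 1 - lambda_max / lambda_min on the chromatic number."""
    eig = jacobi_eigenvalues(adj_matrix)
    return 1.0 - eig[-1] / eig[0]


def certified_colors(adj_matrix):
    """Smallest number of colors the bound certifies; the slack absorbs rounding error."""
    return math.ceil(hoffman_bound(adj_matrix) - 1e-9)
```

For example, the complete graph $K_4$ has eigenvalues $3, -1, -1, -1$, so the bound certifies $4$ colors. For a random $d$-regular graph, $λ_{\max} = d$ while $λ_{\min}$ is close to $-2\sqrt{d-1}$, so this certificate gives roughly $\chi \ge 1 + d/(2\sqrt{d-1})$.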

Submitted 27 August, 2020; originally announced August 2020.

Comments: 59 pages

arXiv:2008.02269 [pdf, ps, other]

doi 10.1214/22-AOS2179

Computational Barriers to Estimation from Low-Degree Polynomials

Authors: Tselil Schramm, Alexander S. Wein

Abstract: One fundamental goal of high-dimensional statistics is to detect or recover planted structure (such as a low-rank matrix) hidden in noisy data. A growing body of work studies low-degree polynomials as a restricted model of computation for such problems: it has been demonstrated in various settings that low-degree polynomials of the data can match the statistical performance of the best known polynomial-time algorithms. Prior work has studied the power of low-degree polynomials for the task of detecting the presence of hidden structures. In this work, we extend these methods to address problems of estimation and recovery (instead of detection). For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any degree-D polynomial. To our knowledge, these are the first results to establish low-degree hardness of recovery problems for which the associated detection problem is easy. As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems, resolving (in the low-degree framework) open problems about the computational complexity of recovery in both cases.
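To make the planted submatrix setting concrete, the following Python sketch assumes the model $Y = λ vv^\top + W$ with $v_i \sim \mathrm{Bernoulli}(ρ)$ i.i.d. and symmetric Gaussian noise $W$ (our assumption about the setup; the paper's exact scaling may differ). It thresholds the row sums $\sum_j Y_{ij}$, which are degree-1 polynomials of the data, and reports the mean squared error of the resulting estimate of $v$. This is a simple baseline, not the optimal low-degree estimator characterized in the paper, and all function names are hypothetical.

```python
import random


def planted_submatrix(n, rho, lam, rng, noise=1.0):
    """Y = lam * v v^T + W with v_i ~ Bernoulli(rho) and symmetric N(0, noise^2) entries in W."""
    v = [1 if rng.random() < rho else 0 for _ in range(n)]
    Y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            Y[i][j] = Y[j][i] = lam * v[i] * v[j] + rng.gauss(0.0, noise)
    return Y, v


def row_sum_estimate(Y, rho, lam):
    """Threshold the degree-1 statistic sum_j Y[i][j] halfway between its two typical means."""
    n = len(Y)
    threshold = lam * rho * n / 2.0  # planted rows have mean about lam * rho * n, others about 0
    return [1 if sum(row) > threshold else 0 for row in Y]


def mean_squared_error(v_hat, v):
    return sum((a - b) ** 2 for a, b in zip(v_hat, v)) / len(v)


if __name__ == "__main__":
    rng = random.Random(0)
    n, rho = 300, 0.1
    for lam in (0.5, 1.0, 2.0):
        Y, v = planted_submatrix(n, rho, lam, rng)
        print("lambda =", lam, "MSE =", mean_squared_error(row_sum_estimate(Y, rho, lam), v))
```

Because both $\hat v_i$ and $v_i$ take values in $\{0,1\}$, the mean squared error here coincides with the fraction of misclassified coordinates.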

Submitted 18 June, 2022; v1 submitted 5 August, 2020; originally announced August 2020.

Comments: v2 adds new results on planted clique

Journal ref: Annals of Statistics 2022, Vol. 50, No. 3, 1833-1858

arXiv:2006.10689 [pdf, ps, other]

Free Energy Wells and Overlap Gap Property in Sparse PCA

Authors: Gérard Ben Arous, Alexander S. Wein, Ilias Zadik

Abstract: We study a variant of the sparse PCA (principal component analysis) problem in the "hard" regime, where the inference task is possible yet no polynomial-time algorithm is known to exist. Prior work, based on the low-degree likelihood ratio, has conjectured a precise expression for the best possible (sub-exponential) runtime throughout the hard regime. Following instead a statistical physics inspired point of view, we show bounds on the depth of free energy wells for various Gibbs measures naturally associated to the problem. These free energy wells imply hitting time lower bounds that corroborate the low-degree conjecture: we show that a class of natural MCMC (Markov chain Monte Carlo) methods (with worst-case initialization) cannot solve sparse PCA with less than the conjectured runtime. These lower bounds apply to a wide range of values for two tuning parameters: temperature and sparsity misparametrization. Finally, we prove that the Overlap Gap Property (OGP), a structural property that implies failure of certain local search algorithms, holds in a significant part of the hard regime.

Submitted 18 June, 2020; originally announced June 2020.

Comments: 63 pages. Accepted for presentation at the Conference on Learning Theory (COLT) 2020

arXiv:2005.11270 [pdf, ps, other]

The Average-Case Time Complexity of Certifying the Restricted Isometry Property

Authors: Yunzi Ding, Dmitriy Kunisky, Alexander S. Wein, Afonso S. Bandeira

Abstract: In compressed sensing, the restricted isometry property (RIP) on $M \times N$ sensing matrices (where $M < N$) guarantees efficient reconstruction of sparse vectors. A matrix has the $(s,δ)$-$\mathsf{RIP}$ property if it behaves as a $δ$-approximate isometry on $s$-sparse vectors. It is well known that an $M\times N$ matrix with i.i.d. $\mathcal{N}(0,1/M)$ entries is $(s,δ)$-$\mathsf{RIP}$ with high probability as long as $s\lesssim δ^2 M/\log N$. On the other hand, most prior works aiming to deterministically construct $(s,δ)$-$\mathsf{RIP}$ matrices have failed when $s \gg \sqrt{M}$. An alternative way to find an RIP matrix could be to draw a random Gaussian matrix and certify that it is indeed RIP. However, there is evidence that this certification task is computationally hard when $s \gg \sqrt{M}$, both in the worst case and the average case. In this paper, we investigate the exact average-case time complexity of certifying the RIP property for $M\times N$ matrices with i.i.d. $\mathcal{N}(0,1/M)$ entries, in the "possible but hard" regime $\sqrt{M} \ll s\lesssim M/\log N$. Based on analysis of the low-degree likelihood ratio, we give rigorous evidence that subexponential runtime $N^{\tilde{Ω}(s^2/M)}$ is required, demonstrating a smooth tradeoff between the maximum tolerated sparsity and the required computational power. This lower bound is essentially tight, matching the runtime of an existing algorithm due to Koiran and Zouzias.
Our hardness result allows $δ$ to take any constant value in $(0,1)$, which captures the relevant regime for compressed sensing. This improves upon the existing average-case hardness result of Wang, Berthet, and Plan, which is limited to $δ= o(1)$.
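The definition can be checked directly, if very slowly: a matrix $A$ is $(s,δ)$-$\mathsf{RIP}$ exactly when, for every support $S$ of size $s$, all eigenvalues of $A_S^\top A_S$ lie in $[1-δ, 1+δ]$. The Python sketch below (standard library only; helper names are hypothetical) computes the smallest such $δ$ by enumerating all $\binom{N}{s}$ supports and applying a Jacobi eigenvalue routine to each Gram matrix. This exhaustive search is only a naive baseline whose cost grows like $N^{s}$; it is not the Koiran-Zouzias algorithm mentioned above.

```python
import itertools
import math
import random


def jacobi_eigenvalues(M, max_sweeps=100, tol=1e-22):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations, in ascending order."""
    A = [list(map(float, row)) for row in M]
    n = len(A)
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))


def rip_constant(A, s):
    """Smallest delta such that A is (s, delta)-RIP, by brute force over all size-s supports."""
    M, N = len(A), len(A[0])
    delta = 0.0
    for S in itertools.combinations(range(N), s):
        # Gram matrix of the selected columns: G = A_S^T A_S.
        G = [[sum(A[r][i] * A[r][j] for r in range(M)) for j in S] for i in S]
        eig = jacobi_eigenvalues(G)
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta


def gaussian_sensing_matrix(M, N, rng):
    """M x N matrix with i.i.d. N(0, 1/M) entries, so each column has squared norm about 1."""
    sd = 1.0 / math.sqrt(M)
    return [[rng.gauss(0.0, sd) for _ in range(N)] for _ in range(M)]
```

For example, `rip_constant(gaussian_sensing_matrix(20, 30, random.Random(0)), 2)` enumerates 435 supports; raising $s$ quickly makes the enumeration infeasible, which is the cost that smarter certification algorithms try to beat.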

Submitted 22 April, 2021; v1 submitted 22 May, 2020; originally announced May 2020.

Comments: 14 pages

arXiv:2005.10817 [pdf, ps, other]

Computationally efficient sparse clustering

Authors: Matthias Löffler, Alexander S. Wein, Afonso S. Bandeira

Abstract: We study statistical and computational limits of clustering when the means of the centres are sparse and their dimension is possibly much larger than the sample size. Our theoretical analysis focuses on the model $X_i = z_i θ+ \varepsilon_i, ~z_i \in \{-1,1\}, ~\varepsilon_i \thicksim \mathcal{N}(0,I)$, which has two clusters with centres $θ$ and $-θ$. We provide a finite sample analysis of a new sparse clustering algorithm based on sparse PCA and show that it achieves the minimax optimal misclustering rate in the regime $\|θ\| \rightarrow \infty$. Our results require the sparsity to grow slower than the square root of the sample size. Using a recent framework for computational lower bounds -- the low-degree likelihood ratio -- we give evidence that this condition is necessary for any polynomial-time clustering algorithm to succeed below the BBP threshold. This complements existing evidence based on reductions and statistical query lower bounds. Compared to these existing results, we cover a wider set of parameter regimes and give a more precise understanding of the runtime required and the misclustering error achievable. Our results imply that a large class of tests based on low-degree polynomials fail to solve even the weak testing task.

Submitted 22 March, 2021; v1 submitted 21 May, 2020; originally announced May 2020.

Comments: 33 pages

MSC Class: 62H30

arXiv:2004.12063 [pdf, ps, other]

Hardness of Random Optimization Problems for Boolean Circuits, Low-Degree Polynomials, and Langevin Dynamics

Authors: David Gamarnik, Aukosh Jagannath, Alexander S. Wein

Abstract: We consider the problem of finding nearly optimal solutions of optimization problems with random objective functions. Two concrete problems we consider are (a) optimizing the Hamiltonian of a spherical or Ising $p$-spin glass model, and (b) finding a large independent set in a sparse Erdős-Rényi graph. The following families of algorithms are considered: (a) low-degree polynomials of the input; (b) low-depth Boolean circuits; (c) the Langevin dynamics algorithm. We show that these families of algorithms fail to produce nearly optimal solutions with high probability. For the case of Boolean circuits, our results improve the state-of-the-art bounds known in circuit complexity theory (although we consider the search problem as opposed to the decision problem). Our proof uses the fact that these models are known to exhibit a variant of the overlap gap property (OGP) of near-optimal solutions. Specifically, for both models, every two solutions whose objectives are above a certain threshold are either close or far from each other. The crux of our proof is that the classes of algorithms we consider exhibit a form of stability. We show by an interpolation argument that stable algorithms cannot overcome the OGP barrier. The stability of Langevin dynamics is an immediate consequence of the well-posedness of stochastic differential equations.
The stability of low-degree polynomials and Boolean circuits is established using tools from Gaussian and Boolean analysis -- namely hypercontractivity and total influence, as well as a novel lower bound for random walks avoiding certain subsets. In the case of Boolean circuits, the result also makes use of Linial-Mansour-Nisan's classical theorem. Our techniques apply more broadly to low influence functions and may apply more generally.

Submitted 26 January, 2022; v1 submitted 25 April, 2020; originally announced April 2020.

Comments: 41 pages; v1 is the conference paper "Low-Degree Hardness of Random Optimization Problems" (FOCS 2020); v2 is a journal version which adds circuit lower bounds for max independent set, based on ideas from our note arXiv:2109.01342

arXiv:2004.08454 [pdf, ps, other]

Counterexamples to the Low-Degree Conjecture

Authors: Justin Holmgren, Alexander S. Wein

Abstract: A conjecture of Hopkins (2018) posits that for certain high-dimensional hypothesis testing problems, no polynomial-time algorithm can outperform so-called "simple statistics", which are low-degree polynomials in the data. This conjecture formalizes the beliefs surrounding a line of recent work that seeks to understand statistical-versus-computational tradeoffs via the low-degree likelihood ratio. In this work, we refute the conjecture of Hopkins. However, our counterexample crucially exploits the specifics of the noise operator used in the conjecture, and we point out a simple way to modify the conjecture to rule out our counterexample. We also give an example illustrating that (even after the above modification), the symmetry assumption in the conjecture is necessary. These results do not undermine the low-degree framework for computational lower bounds, but rather aim to better understand what class of problems it is applicable to.

Submitted 17 April, 2020; originally announced April 2020.

Comments: 10 pages

arXiv:1907.11636 [pdf, ps, other]

Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio

Authors: Dmitriy Kunisky, Alexander S. Wein, Afonso S. Bandeira

Abstract: These notes survey and explore an emerging method, which we call the low-degree method, for predicting and understanding statistical-versus-computational tradeoffs in high-dimensional inference problems. In short, the method posits that a certain quantity -- the second moment of the low-degree likelihood ratio -- gives insight into how much computational time is required to solve a given hypothesis testing problem, which can in turn be used to predict the computational hardness of a variety of statistical inference tasks. While this method originated in the study of the sum-of-squares (SoS) hierarchy of convex programs, we present a self-contained introduction that does not require knowledge of SoS. In addition to showing how to carry out predictions using the method, we include a discussion investigating both rigorous and conjectural consequences of these predictions. These notes include some new results, simplified proofs, and refined conjectures. For instance, we point out a formal connection between spectral methods and the low-degree likelihood ratio, and we give a sharp low-degree lower bound against subexponential-time algorithms for tensor PCA.
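For the additive Gaussian model $Y = X + Z$, where $Z$ has i.i.d. $\mathcal{N}(0,1)$ entries and the signal $X$ is drawn from a prior, the second moment of the low-degree likelihood ratio has the closed form $\|L^{\le D}\|^2 = \sum_{d=0}^{D} \frac{1}{d!}\,\mathbb{E}\,\langle X, X' \rangle^{d}$, with $X, X'$ independent draws from the prior. The Python sketch below evaluates it exactly for a toy rank-one prior $X = (λ/\sqrt{n})\,xx^\top$ with $x$ uniform on $\{\pm 1\}^n$ and noise on all $n^2$ entries (a simplifying assumption of ours, not a model taken from the notes); the function name is hypothetical.

```python
import math


def low_degree_norm_sq(n, lam, D):
    """||L^{<=D}||^2 = sum_{d<=D} E[<X, X'>^d] / d! for X = (lam / sqrt(n)) x x^T, x uniform on {-1, 1}^n."""
    # <X, X'> = (lam^2 / n) * <x, x'>^2, and <x, x'> = n - 2K with K ~ Binomial(n, 1/2).
    weights = [math.comb(n, k) / 2 ** n for k in range(n + 1)]
    overlaps = [(lam ** 2 / n) * (n - 2 * k) ** 2 for k in range(n + 1)]
    total = 0.0
    for d in range(D + 1):
        moment = sum(w * o ** d for w, o in zip(weights, overlaps))  # E[<X, X'>^d]
        total += moment / math.factorial(d)
    return total


if __name__ == "__main__":
    for lam in (0.5, 1.0):
        print(lam, [round(low_degree_norm_sq(200, lam, D), 3) for D in (2, 4, 8, 16)])
```

As a sanity check, $D = 1$ gives $1 + λ^2$, because $\mathbb{E}\,\langle x, x' \rangle^2 = n$. Loosely, the method predicts hardness when this quantity stays bounded as $n \to \infty$ for suitably growing $D$, and suggests that degree-$D$ tests succeed when it diverges.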

Submitted 26 July, 2019; originally announced July 2019.

Comments: 44 pages

arXiv:1907.11635 [pdf, ps, other]

Subexponential-Time Algorithms for Sparse PCA

Authors: Yunzi Ding, Dmitriy Kunisky, Alexander S. Wein, Afonso S. Bandeira

Abstract: We study the computational cost of recovering a unit-norm sparse principal component $x \in \mathbb{R}^n$ planted in a random matrix, in either the Wigner or Wishart spiked model (observing either $W + λxx^\top$ with $W$ drawn from the Gaussian orthogonal ensemble, or $N$ independent samples from $\mathcal{N}(0, I_n + βxx^\top)$, respectively). Prior work has shown that when the signal-to-noise ratio ($λ$ or $β\sqrt{N/n}$, respectively) is a small constant and the fraction of nonzero entries in the planted vector is $\|x\|_0 / n = ρ$, it is possible to recover $x$ in polynomial time if $ρ\lesssim 1/\sqrt{n}$. While it is possible to recover $x$ in exponential time under the weaker condition $ρ\ll 1$, it is believed that polynomial-time recovery is impossible unless $ρ\lesssim 1/\sqrt{n}$. We investigate the precise amount of time required for recovery in the "possible but hard" regime $1/\sqrt{n} \ll ρ\ll 1$ by exploring the power of subexponential-time algorithms, i.e., algorithms running in time $\exp(n^δ)$ for some constant $δ\in (0,1)$. For any $1/\sqrt{n} \ll ρ\ll 1$, we give a recovery algorithm with runtime roughly $\exp(ρ^2 n)$, demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the $\exp(ρn)$-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal.
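The polynomial-time endpoint, diagonal thresholding, is easy to sketch for the Wishart model: since the diagonal of $I_n + βxx^\top$ is $1 + βx_i^2$, coordinates in the support of $x$ have inflated variance. The Python sketch below (standard library only; function names are hypothetical) simulates samples and keeps the $k$ coordinates with the largest empirical variance, assuming the sparsity $k = ρn$ is known. It shows only the support-selection step; a full recovery procedure would then compute a top eigenvector on the selected coordinates, and the paper's subexponential-time algorithms are not shown.

```python
import math
import random


def spiked_wishart_samples(x, beta, N, rng):
    """N samples y = z + sqrt(beta) * g * x with z ~ N(0, I_n), g ~ N(0, 1): covariance I_n + beta x x^T."""
    root_beta = math.sqrt(beta)
    samples = []
    for _ in range(N):
        g = rng.gauss(0.0, 1.0)
        samples.append([rng.gauss(0.0, 1.0) + root_beta * g * xi for xi in x])
    return samples


def diagonal_thresholding(samples, k):
    """Return the k coordinates with the largest empirical variance (the estimated support of x)."""
    n, N = len(samples[0]), len(samples)
    variances = [sum(y[i] ** 2 for y in samples) / N for i in range(n)]  # mean is known to be 0
    ranked = sorted(range(n), key=lambda i: variances[i], reverse=True)
    return set(ranked[:k])


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 200, 10
    support = rng.sample(range(n), k)
    x = [1.0 / math.sqrt(k) if i in support else 0.0 for i in range(n)]
    found = diagonal_thresholding(spiked_wishart_samples(x, 3.0, 500, rng), k)
    print("recovered", len(found & set(support)), "of", k, "support coordinates")
```

Each support coordinate raises the variance only by $βx_i^2 = β/k$, which is why this simple method needs the spike to be quite sparse relative to the number of samples.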

Submitted 23 June, 2022; v1 submitted 26 July, 2019; originally announced July 2019.

Comments: 44 pages

arXiv:1904.03858 [pdf, ps, other]

The Kikuchi Hierarchy and Tensor PCA

Authors: Alexander S. Wein, Ahmed El Alaoui, Cristopher Moore

Abstract: For the tensor PCA (principal component analysis) problem, we propose a new hierarchy of increasingly powerful algorithms with increasing runtime. Our hierarchy is analogous to the sum-of-squares (SOS) hierarchy but is instead inspired by statistical physics and related algorithms such as belief propagation and AMP (approximate message passing). Our level-$\ell$ algorithm can be thought of as a linearized message-passing algorithm that keeps track of $\ell$-wise dependencies among the hidden variables. Specifically, our algorithms are spectral methods based on the Kikuchi Hessian, which generalizes the well-studied Bethe Hessian to the higher-order Kikuchi free energies. It is known that AMP, the flagship algorithm of statistical physics, has substantially worse performance than SOS for tensor PCA. In this work we 'redeem' the statistical physics approach by showing that our hierarchy gives a polynomial-time algorithm matching the performance of SOS. Our hierarchy also yields a continuum of subexponential-time algorithms, and we prove that these achieve the same (conjecturally optimal) tradeoff between runtime and statistical power as SOS. Our proofs are much simpler than prior work, and also apply to the related problem of refuting random $k$-XOR formulas. The results we present here apply to tensor PCA for tensors of all orders, and to $k$-XOR when $k$ is even.
Our methods suggest a new avenue for systematically obtaining optimal algorithms for Bayesian inference problems, and our results constitute a step toward unifying the statistical physics and sum-of-squares approaches to algorithm design.

Submitted 1 October, 2019; v1 submitted 8 April, 2019; originally announced April 2019.

Comments: 42 pages. This version adds results on odd-order tensor PCA and even-arity XOR refutation

MSC Class: 68Q87

ACM Class: F.2.2

arXiv:1902.07324 [pdf, ps, other]

Computational Hardness of Certifying Bounds on Constrained PCA Problems

Authors: Afonso S. Bandeira, Dmitriy Kunisky, Alexander S. Wein

Abstract: Given a random $n \times n$ symmetric matrix $\boldsymbol W$ drawn from the Gaussian orthogonal ensemble (GOE), we consider the problem of certifying an upper bound on the maximum value of the quadratic form $\boldsymbol x^\top \boldsymbol W \boldsymbol x$ over all vectors $\boldsymbol x$ in a constraint set $\mathcal{S} \subset \mathbb{R}^n$. For a certain class of normalized constraint sets $\mathcal{S}$ we show that, conditional on certain complexity-theoretic assumptions, there is no polynomial-time algorithm certifying a better upper bound than the largest eigenvalue of $\boldsymbol W$. A notable special case included in our results is the hypercube $\mathcal{S} = \{ \pm 1 / \sqrt{n}\}^n$, which corresponds to the problem of certifying bounds on the Hamiltonian of the Sherrington-Kirkpatrick spin glass model from statistical physics. Our proof proceeds in two steps. First, we give a reduction from the detection problem in the negatively-spiked Wishart model to the above certification problem. We then give evidence that this Wishart detection problem is computationally hard below the classical spectral threshold, by showing that no low-degree polynomial can (in expectation) distinguish the spiked and unspiked models. This method for identifying computational thresholds was proposed in a sequence of recent works on the sum-of-squares hierarchy, and is believed to be correct for a large class of problems.
Our proof can be seen as constructing a distribution over symmetric matrices that appears computationally indistinguishable from the GOE, yet is supported on matrices whose maximum quadratic form over $\boldsymbol x \in \mathcal{S}$ is much larger than that of a GOE matrix.
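For very small $n$ the certification question can be checked by brute force. The Python sketch below (standard library only; function names are hypothetical) samples a GOE matrix under an assumed normalization (off-diagonal variance $1/n$, diagonal variance $2/n$), computes the exact maximum of $\boldsymbol x^\top \boldsymbol W \boldsymbol x$ over the hypercube $\{\pm 1/\sqrt{n}\}^n$ by enumerating all $2^n$ points, and compares it with $λ_{\max}(\boldsymbol W)$ from a Jacobi eigenvalue routine. The spectral bound always holds because every hypercube point is a unit vector; exhaustive enumeration is exponential and only illustrates the quantities involved.

```python
import itertools
import math
import random


def sample_goe(n, rng):
    """Symmetric n x n matrix: N(0, 1/n) off the diagonal, N(0, 2/n) on it (assumed normalization)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = rng.gauss(0.0, math.sqrt(2.0 / n))
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = rng.gauss(0.0, math.sqrt(1.0 / n))
    return W


def jacobi_eigenvalues(M, max_sweeps=100, tol=1e-22):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations, in ascending order."""
    A = [list(map(float, row)) for row in M]
    n = len(A)
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))


def hypercube_max(W):
    """Exact max of x^T W x over x in {+-1/sqrt(n)}^n by enumerating all 2^n sign vectors."""
    n = len(W)
    best = -math.inf
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        value = sum(W[i][j] * signs[i] * signs[j] for i in range(n) for j in range(n)) / n
        best = max(best, value)
    return best


if __name__ == "__main__":
    W = sample_goe(12, random.Random(0))
    print("hypercube max:", hypercube_max(W))
    print("lambda_max:   ", jacobi_eigenvalues(W)[-1])
```

With this normalization $λ_{\max}(\boldsymbol W)$ tends to $2$, while the hypercube optimum is known to converge to about $1.526$ (twice the Parisi constant), so the spectral certificate is not tight even though, per the abstract, no polynomial-time algorithm is expected to certify a better bound.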

Submitted 6 April, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

Comments: Submitted version (minor text revisions)

arXiv:1901.08334 [pdf, ps, other]

Overcomplete Independent Component Analysis via SDP

Authors: Anastasia Podosinnikova, Amelia Perry, Alexander Wein, Francis Bach, Alexandre d'Aspremont, David Sontag

Abstract: We present a novel algorithm for overcomplete independent component analysis (ICA), where the number of latent sources $k$ exceeds the dimension $p$ of observed variables. Previous algorithms either suffer from high computational complexity or make strong assumptions about the form of the mixing matrix. Our algorithm does not make any sparsity assumption yet enjoys favorable computational and theoretical properties. Our algorithm consists of two main steps: (a) estimation of the Hessians of the cumulant generating function (as opposed to the fourth and higher order cumulants used by most algorithms) and (b) a novel semi-definite programming (SDP) relaxation for recovering a mixing component. We show that this relaxation can be efficiently solved with a projected accelerated gradient descent method, which makes the whole algorithm computationally practical. Moreover, we conjecture that the proposed program recovers a mixing component at the rate $k < p^2/4$ and prove that a mixing component can be recovered with high probability when $k < (2 - \epsilon) p \log p$, provided the original components are sampled uniformly at random on the hypersphere. Experiments are provided on synthetic data and the CIFAR-10 dataset of real images.

Submitted 24 January, 2019; originally announced January 2019.

Comments: Appears in: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019). 21 pages

arXiv:1811.00944 [pdf, ps, other]

Spectral Methods from Tensor Networks

Authors: Ankur Moitra, Alexander S. Wein

Abstract: A tensor network is a diagram that specifies a way to "multiply" a collection of tensors together to produce another tensor (or matrix). Many existing algorithms for tensor problems (such as tensor decomposition and tensor PCA), although they are not presented this way, can be viewed as spectral methods on matrices built from simple tensor networks. In this work we leverage the full power of this abstraction to design new algorithms for certain continuous tensor decomposition problems. An important and challenging family of tensor problems comes from orbit recovery, a class of inference problems involving group actions (inspired by applications such as cryo-electron microscopy). Orbit recovery problems over finite groups can often be solved via standard tensor methods. However, for infinite groups, no general algorithms are known. We give a new spectral algorithm based on tensor networks for one such problem: continuous multi-reference alignment over the infinite group SO(2). Our algorithm extends to the more general heterogeneous case.
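A minimal example of "a spectral method on a matrix built from a simple tensor network": join two copies of an order-3 tensor $T$ along their second and third legs to get $M_{ij} = \sum_{k,l} T_{ikl} T_{jkl}$, then take the top eigenvector of $M$. The Python sketch below (standard library only; function names are hypothetical) does this with plain loops. This particular contraction is just the Gram matrix of the $n \times n^2$ unfolding of $T$, a generic textbook example rather than one of the networks designed in the paper for multi-reference alignment.

```python
import math


def rank_one_tensor(a):
    """Order-3 tensor T[i][j][k] = a[i] * a[j] * a[k]."""
    n = len(a)
    return [[[a[i] * a[j] * a[k] for k in range(n)] for j in range(n)] for i in range(n)]


def add_tensors(S, T):
    n = len(S)
    return [[[S[i][j][k] + T[i][j][k] for k in range(n)] for j in range(n)] for i in range(n)]


def contract_two_legs(T):
    """Network joining two copies of T along legs 2 and 3: M[i][j] = sum_{k,l} T[i][k][l] * T[j][k][l]."""
    n = len(T)
    return [
        [sum(T[i][k][l] * T[j][k][l] for k in range(n) for l in range(n)) for j in range(n)]
        for i in range(n)
    ]


def top_eigenvector(M, iters=200):
    """Power iteration from the all-ones vector (assumed not orthogonal to the top eigenvector).

    M is positive semidefinite here, so the iteration targets its largest eigenvalue.
    """
    x = [1.0] * len(M)
    for _ in range(iters):
        y = [sum(m * v for m, v in zip(row, x)) for row in M]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x
```

For a rank-one tensor $T = a^{\otimes 3}$ with $\|a\| = 1$, the contraction gives $M = aa^\top$, so the top eigenvector returns $\pm a$; for a sum of rank-one terms with orthonormal components, $M$ becomes the projector onto their span.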

Submitted 2 November, 2018; originally announced November 2018.

Comments: 30 pages, 8 figures

arXiv:1807.00891 [pdf, ps, other]

doi 10.1214/17-AOS1625

Optimality and Sub-optimality of PCA I: Spiked Random Matrix Models

Authors: Amelia Perry, Alexander S. Wein, Afonso S. Bandeira, Ankur Moitra

Abstract: A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, introduced by Johnstone, in which a prominent eigenvector (or "spike") is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Péché showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the spike strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise. However, under structural assumptions on the spike, not all information is necessarily contained in the spectrum. We study the statistical limits of tests for the presence of a spike, including non-spectral tests. Our results leverage Le Cam's notion of contiguity, and include: i) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for certain natural priors for the spike. ii) For any non-Gaussian Wigner ensemble, PCA is sub-optimal for detection. However, an efficient variant of PCA achieves the optimal threshold (for natural priors) by pre-transforming the matrix entries. iii) For the Gaussian Wishart ensemble, the PCA threshold is optimal for positive spikes (for natural priors) but this is not always the case for negative spikes.
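For reference, the spiked Wishart transition mentioned above is commonly stated as follows (notation chosen here: $y_1, \dots, y_N$ are i.i.d. $\mathcal{N}(0, I_n + β xx^\top)$ with $\|x\| = 1$, $\hat{Σ} = \frac{1}{N}\sum_{i=1}^N y_i y_i^\top$, and $n/N \to γ \in (0, \infty)$):

$$
λ_{\max}(\hat{Σ}) \;\xrightarrow{\text{a.s.}}\;
\begin{cases}
(1+β)\left(1+\dfrac{γ}{β}\right), & β > \sqrt{γ},\\[6pt]
(1+\sqrt{γ})^2, & β \le \sqrt{γ}.
\end{cases}
$$

Below the threshold the top eigenvalue converges to the Marchenko-Pastur edge $(1+\sqrt{γ})^2$, the same limit as with no spike, which is the sense in which it carries no information about the spike.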

Submitted 12 July, 2018; v1 submitted 2 July, 2018; originally announced July 2018.

Comments: 67 pages, 3 figures. This is the journal version of part I of arXiv:1609.05573, accepted to the Annals of Statistics. This version includes the supplementary material as appendices

MSC Class: 62H15; 62B15

Journal ref: Ann. Statist., Volume 46, Number 5 (2018), 2416-2451

arXiv:1803.11132 [pdf, other]

Notes on computational-to-statistical gaps: predictions using statistical physics

Authors: Afonso S. Bandeira, Amelia Perry, Alexander S. Wein

Abstract: In these notes we describe heuristics to predict computational-to-statistical gaps in certain statistical problems. These are regimes in which the underlying statistical problem is information-theoretically possible although no efficient algorithm exists, rendering the problem essentially unsolvable for large instances. The methods we describe here are based on mature, albeit non-rigorous, tools from statistical physics. These notes are based on a lecture series given by the authors at the Courant Institute of Mathematical Sciences in New York City, on May 16th, 2017.

Submitted 20 April, 2018; v1 submitted 29 March, 2018; originally announced March 2018.

Comments: 22 pages, 2 figures

arXiv:1712.10163

doi 10.1016/j.acha.2023.06.001

Estimation under group actions: recovering orbits from invariants

Authors: Afonso S. Bandeira, Ben Blum-Smith, Joe Kileel, Amelia Perry, Jonathan Niles-Weed, Alexander S. Wein

Abstract: We study a class of orbit recovery problems in which we observe independent copies of an unknown element of $\mathbb{R}^p$, each linearly acted upon by a random element of some group (such as $\mathbb{Z}/p$ or $\mathrm{SO}(3)$) and then corrupted by additive Gaussian noise. We prove matching upper and lower bounds on the number of samples required to approximately recover the group orbit of this unknown element with high probability. These bounds, based on quantitative techniques in invariant theory, give a precise correspondence between the statistical difficulty of the estimation problem and algebraic properties of the group. Furthermore, we give computer-assisted procedures to certify these properties that are computationally efficient in many cases of interest. The model is motivated by geometric problems in signal processing, computer vision, and structural biology, and applies to the reconstruction problem in cryo-electron microscopy (cryo-EM), a problem of significant practical interest. Our results allow us to verify (for a given problem size) that if cryo-EM images are corrupted by noise with variance $\sigma^2$, the number of images required to recover the molecule structure scales as $\sigma^6$. We match this bound with a novel (albeit computationally expensive) algorithm for ab initio reconstruction in cryo-EM, based on invariant features of degree at most 3. We further discuss how to recover multiple molecular structures from mixed (or heterogeneous) cryo-EM samples.

Submitted 13 June, 2023; v1 submitted 29 December, 2017; originally announced December 2017.

Comments: 81 pages. Minor revisions since previous version, reflecting peer review feedback. To be published in Applied and Computational Harmonic Analysis

MSC Class: 62F10; 92C55; 16W22

Journal ref: Applied and Computational Harmonic Analysis 66 (2023) 236--319

arXiv:1612.07728

Statistical limits of spiked tensor models

Authors: Amelia Perry, Alexander S. Wein, Afonso S. Bandeira

Abstract: We study the statistical limits of both detecting and estimating a rank-one deformation of a symmetric random Gaussian tensor. We establish upper and lower bounds on the critical signal-to-noise ratio, under a variety of priors for the planted vector: (i) a uniformly sampled unit vector, (ii) i.i.d. $\pm 1$ entries, and (iii) a sparse vector where a constant fraction $\rho$ of entries are i.i.d. $\pm 1$ and the rest are zero. For each of these cases, our upper and lower bounds match up to a $1+o(1)$ factor as the order $d$ of the tensor becomes large. For sparse signals (iii), our bounds are also asymptotically tight in the sparse limit $\rho \to 0$ for any fixed $d$ (including the $d=2$ case of sparse PCA). Our upper bounds for (i) demonstrate a phenomenon reminiscent of the work of Baik, Ben Arous and Péché: an 'eigenvalue' of a perturbed tensor emerges from the bulk at a strictly lower signal-to-noise ratio than when the perturbation itself exceeds the bulk; we quantify the size of this effect. We also provide some general results for larger classes of priors. In particular, the large $d$ asymptotics of the threshold location differs between problems with discrete priors versus continuous priors. Finally, for priors (i) and (ii) we carry out the replica prediction from statistical physics, which is conjectured to give the exact information-theoretic threshold for any fixed $d$. Of independent interest, we introduce a new improvement to the second moment method for contiguity, on which our lower bounds are based.
Our technique conditions away from rare 'bad' events that depend on interactions between the signal and noise. This enables us to close $\sqrt{2}$-factor gaps present in several previous works.

Submitted 24 January, 2017; v1 submitted 22 December, 2016; originally announced December 2016.

Comments: 39 pages, 5 figures

arXiv:1610.04583

doi 10.1002/cpa.21750

Message-passing algorithms for synchronization problems over compact groups

Authors: Amelia Perry, Alexander S. Wein, Afonso S. Bandeira, Ankur Moitra

Abstract: Various alignment problems arising in cryo-electron microscopy, community detection, time synchronization, computer vision, and other fields fall into a common framework of synchronization problems over compact groups such as Z/L, U(1), or SO(3). The goal of such problems is to estimate an unknown vector of group elements given noisy relative observations. We present an efficient iterative algorithm to solve a large class of these problems, allowing for any compact group, with measurements on multiple 'frequency channels' (Fourier modes, or more generally, irreducible representations of the group). Our algorithm is a highly efficient iterative method following the blueprint of approximate message passing (AMP), which has recently arisen as a central technique for inference problems such as structured low-rank estimation and compressed sensing. We augment the standard ideas of AMP with ideas from representation theory so that the algorithm can work with distributions over compact groups. Using standard but non-rigorous methods from statistical physics we analyze the behavior of our algorithm on a Gaussian noise model, identifying phases where the problem is easy, (computationally) hard, and (statistically) impossible. In particular, such evidence predicts that our algorithm is information-theoretically optimal in many cases, and that the remaining cases show evidence of statistical-to-computational gaps.

Submitted 14 October, 2016; originally announced October 2016.

Comments: 35 pages, 11 figures

arXiv:1609.05573

Optimality and Sub-optimality of PCA for Spiked Random Matrices and Synchronization

Authors: Amelia Perry, Alexander S. Wein, Afonso S. Bandeira, Ankur Moitra

Abstract: A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, in which a prominent eigenvector is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Péché showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the signal strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise. However, not all the information about the spike is necessarily contained in the spectrum. We study the fundamental limitations of statistical methods, including non-spectral ones. Our results include: I) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for a variety of benign priors for the spike. We extend previous work on the spherically symmetric and i.i.d. Rademacher priors through an elementary, unified analysis. II) For any non-Gaussian Wigner ensemble, we show that PCA is always suboptimal for detection. However, a variant of PCA achieves the optimal threshold (for benign priors) by pre-transforming the matrix entries according to a carefully designed function. This approach has been stated before, and we give a rigorous and general analysis.
III) For both the Gaussian Wishart ensemble and various synchronization problems over groups, we show that inefficient procedures can work below the threshold where PCA succeeds, whereas no known efficient algorithm achieves this. This conjectural gap between what is statistically possible and what can be done efficiently remains open.

Submitted 23 December, 2016; v1 submitted 18 September, 2016; originally announced September 2016.

Comments: 58 pages, 5 figures. This version adds improved results for the Wishart model

MSC Class: 62H15; 62B15

arXiv:1511.01473

How Robust are Reconstruction Thresholds for Community Detection?

Authors: Ankur Moitra, William Perry, Alexander S. Wein

Abstract: The stochastic block model is one of the oldest and most ubiquitous models for studying clustering and community detection. In an exciting sequence of developments, motivated by deep but non-rigorous ideas from statistical physics, Decelle et al. conjectured a sharp threshold for when community detection is possible in the sparse regime. Mossel, Neeman and Sly and Massoulié proved the conjecture and gave matching algorithms and lower bounds. Here we revisit the stochastic block model from the perspective of semirandom models where we allow an adversary to make 'helpful' changes that strengthen ties within each community and break ties between them. We show a surprising result that these 'helpful' changes can shift the information-theoretic threshold, making the community detection problem strictly harder. We complement this by showing that an algorithm based on semidefinite programming (which was known to get close to the threshold) continues to work in the semirandom model (even for partial recovery). This suggests that algorithms based on semidefinite programming are robust in ways that any algorithm meeting the information-theoretic threshold cannot be. These results point to an interesting new direction: Can we find robust, semirandom analogues to some of the classical, average-case thresholds in statistics?
We also explore this question in the broadcast tree model, and we show that the viewpoint of semirandom models can help explain why some algorithms are preferred to others in practice, in spite of the gaps in their statistical performance on random models.

Submitted 21 March, 2016; v1 submitted 4 November, 2015; originally announced November 2015.

Comments: 36 pages, 3 figures

arXiv:1507.05605

A semidefinite program for unbalanced multisection in the stochastic block model

Authors: Amelia Perry, Alexander S. Wein

Abstract: We propose a semidefinite programming (SDP) algorithm for community detection in the stochastic block model, a popular model for networks with latent community structure. We prove that our algorithm achieves exact recovery of the latent communities, up to the information-theoretic limits determined by Abbe and Sandon (2015). Our result extends prior SDP approaches by allowing for many communities of different sizes. By virtue of a semidefinite approach, our algorithms succeed against a semirandom variant of the stochastic block model, guaranteeing a form of robustness and generalization. We further explore how semirandom models can lend insight into both the strengths and limitations of SDPs in this setting.

Submitted 2 December, 2016; v1 submitted 20 July, 2015; originally announced July 2015.

Comments: 29 pages

MSC Class: 68

Showing 1–38 of 38 results for author: Wein, A