
Bounds on the ground state energy of quantum $p$-spin Hamiltonians

Authors: Eric R. Anschuetz, David Gamarnik, Bobak T. Kiani

Abstract: We consider the problem of estimating the ground state energy of quantum $p$-local spin glass random Hamiltonians, the quantum analogues of widely studied classical spin glass models. Our main result shows that the maximum energy achievable by product states has a well-defined limit (for even $p$) as $n\to\infty$ and is $E_{\text{product}}^\ast=\sqrt{2 \log p}$ in the limit of large $p$. This value is interpreted as the maximal energy of a much simpler so-called Random Energy Model, widely studied in the setting of classical spin glasses. The proof of the limit existing follows from an extension of Fekete's Lemma after we demonstrate near super-additivity of the (normalized) quenched free energy. The proof of the value follows from a second moment method on the number of states achieving a given energy when restricting to an $ε$-net of product states. Furthermore, we relate the maximal energy achieved over all states to a $p$-dependent constant $γ\left(p\right)$, which is defined by the degree of violation of a certain asymptotic independence ansatz over graph matchings. We show that the maximal energy achieved by all states $E^\ast\left(p\right)$ in the limit of large $n$ is at most $\sqrt{γ\left(p\right)}E_{\text{product}}^\ast$. We also prove using Lindeberg's interpolation method that the limiting $E^\ast\left(p\right)$ is robust with respect to the choice of the randomness and, for instance, also applies to the case of sparse random Hamiltonians. This robustness in the randomness extends to a wide range of random Hamiltonian models including SYK and random quantum max-cut.

Submitted 17 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: 54 pages, 0 figures. arXiv admin note: substantial text overlap with arXiv:2309.11709

arXiv:2402.08232 [pdf, ps, other]

Integrating High-Dimensional Functions Deterministically

Authors: David Gamarnik, Devin Smedira

Abstract: We design a Quasi-Polynomial time deterministic approximation algorithm for computing the integral of a multi-dimensional separable function, supported by some underlying hyper-graph structure, appropriately defined. Equivalently, our integral is the partition function of a graphical model with continuous potentials. While randomized algorithms for high-dimensional integration are widely known, deterministic counterparts generally do not exist. We use the correlation decay method applied to the Riemann sum of the function to produce our algorithm. For our method to work, we require that the domain is bounded and the hyper-edge potentials are positive and bounded on the domain. We further assume that the upper and lower bounds on the potentials are separated by a multiplicative factor of $1 + O(1/Δ^2)$, where $Δ$ is the maximum degree of the graph. When $Δ = 3$, our method works provided the upper and lower bounds are separated by a factor of at most $1.0479$. To the best of our knowledge, our algorithm is the first deterministic algorithm for high-dimensional integration of a continuous function, apart from the case of trivial product form distributions.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2312.03906 [pdf, ps, other]

Computing the Volume of a Restricted Independent Set Polytope Deterministically

Authors: David Gamarnik, Devin Smedira

Abstract: We construct a quasi-polynomial time deterministic approximation algorithm for computing the volume of an independent set polytope with restrictions. Randomized polynomial time approximation algorithms for computing the volume of a convex body have been known now for several decades, but the corresponding deterministic counterparts are not available, and our algorithm is the first of this kind. The class of polytopes for which our algorithm applies arises as linear programming relaxation of the independent set problem with the additional restriction that each variable takes value in the interval $[0,1-α]$ for some $α<1/2$. (We note that the $α\ge 1/2$ case is trivial). We use the correlation decay method for this problem applied to its appropriate and natural discretization. The method works provided $α> 1/2-O(1/Δ^2)$, where $Δ$ is the maximum degree of the graph. When $Δ=3$ (the sparsest non-trivial case), our method works provided $0.488<α<0.5$. Interestingly, the interpolation method, which is based on analyzing complex roots of the associated partition functions, fails even in the trivial case when the underlying graph is a singleton.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2311.04204 [pdf, other]

Sharp Thresholds Imply Circuit Lower Bounds: from random 2-SAT to Planted Clique

Authors: David Gamarnik, Elchanan Mossel, Ilias Zadik

Abstract: We show that sharp thresholds for Boolean functions directly imply average-case circuit lower bounds. More formally we show that any Boolean function exhibiting a sharp enough threshold at \emph{arbitrary} critical density cannot be computed by Boolean circuits of bounded depth and polynomial size. Our general result implies new average-case bounded depth circuit lower bounds in a variety of settings. (a) ($k$-cliques) For $k=Θ(n)$, we prove that any circuit of depth $d$ deciding the presence of a size $k$ clique in a random graph requires exponential-in-$n^{Θ(1/d)}$ size. To the best of our knowledge, this is the first average-case exponential size lower bound for bounded depth (not necessarily monotone) circuits solving the fundamental $k$-clique problem (for any $k=k_n$). (b) (random 2-SAT) We prove that any circuit of depth $d$ deciding the satisfiability of a random 2-SAT formula requires exponential-in-$n^{Θ(1/d)}$ size. To the best of our knowledge, this is the first bounded depth circuit lower bound for random $k$-SAT for any value of $k \geq 2.$ Our results also provide the first rigorous lower bound in agreement with a conjectured, but debated, ``computational hardness'' of random $k$-SAT around its satisfiability threshold. (c) (Statistical estimation -- planted $k$-clique) Over the recent years, multiple statistical estimation problems have also been proven to exhibit a ``statistical'' sharp threshold, called the All-or-Nothing (AoN) phenomenon. We show that AoN also implies circuit lower bounds for statistical problems. As a simple corollary of that, we prove that any circuit of depth $d$ that solves to information-theoretic optimality a ``dense'' variant of the celebrated planted $k$-clique problem requires exponential-in-$n^{Θ(1/d)}$ size.

Submitted 30 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: 29 pages

arXiv:2309.11709

Product states optimize quantum $p$-spin models for large $p$

Authors: Eric R. Anschuetz, David Gamarnik, Bobak T. Kiani

Abstract: We consider the problem of estimating the maximal energy of quantum $p$-local spin glass random Hamiltonians, the quantum analogues of widely studied classical spin glass models. Denoting by $E^*(p)$ the (appropriately normalized) maximal energy in the limit of a large number of qubits $n$, we show that $E^*(p)$ approaches $\sqrt{2\log 6}$ as $p$ increases. This value is interpreted as the maximal energy of a much simpler so-called Random Energy Model, widely studied in the setting of classical spin glasses. Our most notable and (arguably) surprising result proves the existence of near-maximal energy states which are product states, and thus not entangled. Specifically, we prove that with high probability as $n\to\infty$, for any $E<E^*(p)$ there exists a product state with energy $\geq E$ at sufficiently large constant $p$. Even more surprisingly, this remains true even when restricting to tensor products of Pauli eigenstates. Our approximations go beyond what is known from monogamy-of-entanglement style arguments -- the best of which, in this normalization, achieve approximation error growing with $n$. Our results not only challenge prevailing beliefs in physics that extremely low-temperature states of random local Hamiltonians should exhibit non-negligible entanglement, but they also imply that classical algorithms can be just as effective as quantum algorithms in optimizing Hamiltonians with large locality -- though performing such optimization is still likely a hard problem. Our results are robust with respect to the choice of the randomness (disorder) and apply to the case of sparse random Hamiltonian using Lindeberg's interpolation method. The proof of the main result is obtained by estimating the expected trace of the associated partition function, and then matching its asymptotics with the extremal energy of product states using the second moment method.

Submitted 5 April, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: There is an error in the proof of the current draft with regards to the upper bound in Section 5. Consequently, the main result in this paper is not correct. A manuscript with new and corrected results will be uploaded as a separate arXiv document

arXiv:2307.07461 [pdf, ps, other]

Shattering in the Ising Pure $p$-Spin Model

Authors: David Gamarnik, Aukosh Jagannath, Eren C. Kızıldağ

Abstract: We study the Ising pure $p$-spin model for large $p$. We investigate the landscape of the Hamiltonian of this model. We show that for any $γ>0$ and any large enough $p$, the model exhibits an intricate geometrical property known as the multi Overlap Gap Property above the energy value $γ\sqrt{2\ln 2}$. We then show that for any inverse temperature $\sqrt{\ln 2}<β<\sqrt{2\ln 2}$ and any large $p$, the model exhibits shattering: w.h.p. as $n\to\infty$, there exist exponentially many well-separated clusters such that (a) each cluster has exponentially small Gibbs mass, and (b) the clusters collectively contain all but a vanishing fraction of Gibbs mass. Moreover, these clusters consist of configurations with energy near $β$. The range of temperatures for which shattering occurs is within the replica symmetric region. To the best of our knowledge, this is the first shattering result regarding the Ising $p$-spin models. Our proof is elementary, and in particular based on simple applications of the first and the second moment methods.

Submitted 14 July, 2023; originally announced July 2023.
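
The constant $\sqrt{2\ln 2}$ in the entry above is the limiting maximal energy density of the Random Energy Model (REM). As a hedged illustration only, and not the paper's argument, here is the standard first-moment calculation, assuming the REM normalization of $2^n$ independent energies $H(\sigma)\sim N(0,n)$ (the listing does not state the normalization the paper itself uses):

```latex
\mathbb{E}\,\bigl|\{\sigma \in \{\pm 1\}^n : H(\sigma) \ge En\}\bigr|
  = 2^n \,\mathbb{P}\bigl(N(0,n) \ge En\bigr)
  = \exp\!\Bigl(n\bigl(\ln 2 - \tfrac{E^2}{2}\bigr) + o(n)\Bigr),
\qquad E > 0 .
```

The exponent changes sign at $E=\sqrt{2\ln 2}$: above it the expected count is exponentially small, so by Markov's inequality no such configuration exists w.h.p.; below it the expected count is exponentially large.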

arXiv:2306.02555 [pdf, ps, other]

Barriers for the performance of graph neural networks (GNN) in discrete random structures. A comment on~\cite{schuetz2022combinatorial},\cite{angelini2023modern},\cite{schuetz2023reply}

Authors: David Gamarnik

Abstract: Recently graph neural network (GNN) based algorithms were proposed to solve a variety of combinatorial optimization problems, including Maximum Cut problem, Maximum Independent Set problem and similar other problems~\cite{schuetz2022combinatorial},\cite{schuetz2022graph}. The publication~\cite{schuetz2022combinatorial} stirred a debate whether GNN based method was adequately benchmarked against best prior methods. In particular, critical commentaries~\cite{angelini2023modern} and~\cite{boettcher2023inability} point out that simple greedy algorithm performs better than GNN in the setting of random graphs, and in fact stronger algorithmic performance can be reached with more sophisticated methods. A response from the authors~\cite{schuetz2023reply} pointed out that GNN performance can be improved further by tuning up the parameters better. We do not intend to discuss the merits of arguments and counter-arguments in~\cite{schuetz2022combinatorial},\cite{angelini2023modern},\cite{boettcher2023inability},\cite{schuetz2023reply}. Rather in this note we establish a fundamental limitation for running GNN on random graphs considered in these references, for a broad range of choices of GNN architecture. These limitations arise from the presence of the Overlap Gap Property (OGP) phase transition, which is a barrier for many algorithms, both classical and quantum. As we demonstrate in this paper, it is also a barrier to GNN due to its local structure. We note that at the same time known algorithms ranging from simple greedy algorithms to more sophisticated algorithms based on message passing, provide best results for these problems \emph{up to} the OGP phase transition. This leaves very little space for GNN to outperform the known algorithms, and based on this we side with the conclusions made in~\cite{angelini2023modern} and~\cite{boettcher2023inability}.

Submitted 4 June, 2023; originally announced June 2023.

Comments: 5 pages

arXiv:2305.03591 [pdf, other]

Maximally-stable Local Optima in Random Graphs and Spin Glasses: Phase Transitions and Universality

Authors: Yatin Dandi, David Gamarnik, Lenka Zdeborová

Abstract: We provide a unified analysis of stable local optima of Ising spins with Hamiltonians having pair-wise interactions and partitions in random weighted graphs where a large number of vertices possess sufficient single spin-flip stability. For graphs, we consider partitions on random graphs where almost all vertices possess sufficient appropriately defined friendliness/unfriendliness. For spin glasses, we characterize approximate local optima having almost all local magnetic fields of sufficiently large magnitude. For $n$ nodes, as $n \rightarrow \infty$, we prove that the maximum number of vertices possessing such stability undergoes a phase transition from $n-o(n)$ to $n-Θ(n)$ around a certain value of the stability, proving a conjecture from Behrens et al. [2022]. Through a universality argument, we further prove that such a phase transition occurs around the same value of the stability for different choices of interactions, specifically ferromagnetic and anti-ferromagnetic, for sparse graphs, as $n \rightarrow \infty$ in the large degree limit. Furthermore, we show that after appropriate re-scaling, the same value of the threshold characterises such a phase transition for the case of fully connected spin-glass models. Our results also allow the characterization of possible energy values of maximally stable approximate local optima. Our work extends and proves seminal results in statistical physics related to metastable states, in particular, the work of Bray and Moore [1981].

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2304.00643 [pdf, other]

Combinatorial NLTS From the Overlap Gap Property

Authors: Eric R. Anschuetz, David Gamarnik, Bobak Kiani

Abstract: In an important recent development, Anshu, Breuckmann, and Nirkhe [ABN22] resolved positively the so-called No Low-Energy Trivial State (NLTS) conjecture by Freedman and Hastings. The conjecture postulated the existence of linear-size local Hamiltonians on n qubit systems for which no near-ground state can be prepared by a shallow (sublogarithmic depth) circuit. The construction in [ABN22] is based on recently developed good quantum codes. Earlier results in this direction included the constructions of the so-called Combinatorial NLTS -- a weaker version of NLTS -- where a state is defined to have low energy if it violates at most a vanishing fraction of the Hamiltonian terms [AB22]. These constructions were also based on codes. In this paper we provide a "non-code" construction of a class of Hamiltonians satisfying the Combinatorial NLTS. The construction is inspired by one in [AB22], but our proof uses the complex solution space geometry of random K-SAT instead of properties of codes. Specifically, it is known that above a certain clause-to-variables density the set of satisfying assignments of random K-SAT exhibits an overlap gap property, which implies that it can be partitioned into exponentially many clusters each constituting at most an exponentially small fraction of the total set of satisfying solutions. We establish a certain robust version of this clustering property for the space of near-satisfying assignments and show that for our constructed Hamiltonians every combinatorial near-ground state induces a near-uniform distribution supported by this set. Standard arguments then are used to show that such distributions cannot be prepared by quantum circuits with depth o(log n). Since the clustering property is exhibited by many random structures, including proper coloring and maximum cut, we anticipate that our approach is extendable to these models as well.

Submitted 11 March, 2024; v1 submitted 2 April, 2023; originally announced April 2023.

Comments: 19 pages, 2 figures

Report number: MIT-CTP/5542

arXiv:2303.13443 [pdf, other]

Cliques, Chromatic Number, and Independent Sets in the Semi-random Process

Authors: David Gamarnik, Mihyun Kang, Pawel Pralat

Abstract: The semi-random graph process is a single player game in which the player is initially presented an empty graph on $n$ vertices. In each round, a vertex $u$ is presented to the player independently and uniformly at random. The player then adaptively selects a vertex $v$, and adds the edge $uv$ to the graph. For a fixed monotone graph property, the objective of the player is to force the graph to satisfy this property with high probability in as few rounds as possible. In this paper, we investigate the following three properties: containing a complete graph of order $k$, having the chromatic number at least $k$, and not having an independent set of size at least $k$.

Submitted 13 May, 2024; v1 submitted 23 March, 2023; originally announced March 2023.
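
The rules of the semi-random graph process in the entry above are simple enough to simulate. The sketch below plays the game with a deliberately naive strategy for forcing a complete graph of order $k$: it only ever builds the clique on vertices $0,\dots,k-1$ and wastes every round in which the offered vertex lies outside that set. The function name `semi_random_clique`, the strategy, and the parameters are hypothetical illustrations, not the strategies analyzed in the paper.

```python
import random
from itertools import combinations


def semi_random_clique(n, k, seed=0, max_rounds=1_000_000):
    """Play the semi-random process until vertices 0..k-1 form a clique.

    Assumes 2 <= k <= n. Returns (rounds, edges), where edges is a set of
    frozenset({u, v}) pairs. The strategy is a naive baseline (an assumption
    for illustration): complete a fixed target clique and ignore the rest.
    """
    rng = random.Random(seed)
    target = range(k)
    # Edges of the target clique that have not been added yet.
    needed = {frozenset(pair) for pair in combinations(target, 2)}
    edges = set()
    for rounds in range(1, max_rounds + 1):
        u = rng.randrange(n)  # vertex offered uniformly at random
        v = None
        if u < k:
            # Useful round: pick a target vertex still missing its edge to u.
            for w in target:
                if w != u and frozenset((u, w)) in needed:
                    v = w
                    break
        if v is None:
            v = (u + 1) % n  # wasted round: add an arbitrary edge at u
        edge = frozenset((u, v))
        edges.add(edge)
        needed.discard(edge)
        if not needed:
            return rounds, edges
    raise RuntimeError("target clique not completed within max_rounds")


if __name__ == "__main__":
    rounds, edges = semi_random_clique(n=30, k=4, seed=1)
    print(f"4-clique forced after {rounds} rounds, {len(edges)} distinct edges")
```

Because a round is useful only when the offered vertex lands in the target set, this baseline spends on the order of $n/k$ rounds per useful offer; adaptive strategies that react to where offered vertices land can do better, which is the kind of question the abstract describes.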

arXiv:2302.06485 [pdf, ps, other]

Geometric Barriers for Stable and Online Algorithms for Discrepancy Minimization

Authors: David Gamarnik, Eren C. Kızıldağ, Will Perkins, Changji Xu

Abstract: For many computational problems involving randomness, intricate geometric features of the solution space have been used to rigorously rule out powerful classes of algorithms. This is often accomplished through the lens of the multi Overlap Gap Property ($m$-OGP), a rigorous barrier against algorithms exhibiting input stability. In this paper, we focus on the algorithmic tractability of two models: (i) discrepancy minimization, and (ii) the symmetric binary perceptron (\texttt{SBP}), a random constraint satisfaction problem as well as a toy model of a single-layer neural network. Our first focus is on the limits of online algorithms. By establishing and leveraging a novel geometrical barrier, we obtain sharp hardness guarantees against online algorithms for both the \texttt{SBP} and discrepancy minimization. Our results match the best known algorithmic guarantees, up to constant factors. Our second focus is on efficiently finding a constant discrepancy solution, given a random matrix $\mathcal{M}\in\mathbb{R}^{M\times n}$. In a smooth setting, where the entries of $\mathcal{M}$ are i.i.d. standard normal, we establish the presence of $m$-OGP for $n=Θ(M\log M)$. Consequently, we rule out the class of stable algorithms at this value. These results give the first rigorous evidence towards a conjecture of Altschuler and Niles-Weed~\cite[Conjecture~1]{altschuler2021discrepancy}. Our methods use the intricate geometry of the solution space to prove tight hardness results for online algorithms. The barrier we establish is a novel variant of the $m$-OGP. Furthermore, it regards $m$-tuples of solutions with respect to correlated instances, with growing values of $m$, $m=ω(1)$. Importantly, our results rule out online algorithms succeeding even with an exponentially small probability.

Submitted 13 February, 2023; originally announced February 2023.

arXiv:2212.03925 [pdf, ps, other]

Densest Subgraphs of a Dense Erdös-Rényi Graph. Asymptotics, Landscape and Universality

Authors: Houssam El Cheairi, David Gamarnik

Abstract: We consider the problem of estimating the edge density of densest $K$-node subgraphs of an Erdös-Rényi graph $\mathbb{G}(n,1/2)$. The problem is well-understood in the regime $K=Θ(\log n)$ and in the regime $K=Θ(n)$. In the former case it can be reduced to the problem of estimating the size of largest cliques, and its extensions. In the latter case the full answer is known up to the order $n^{3\over 2}$ using sophisticated methods from the theory of spin glasses. The intermediate case $K=n^α, α\in (0,1)$ however is not well studied and this is our focus. We establish that in this regime the density (that is the maximum number of edges supported by any $K$-node subgraph) is ${1\over 4}K^2+{1+o(1)\over 2}K^{3\over 2}\sqrt{\log (n/K)}$, w.h.p. as $n\to\infty$, and provide more refined asymptotics under the $o(\cdot)$, for various ranges of $α$. This extends earlier similar results where this asymptotics was confirmed only when $α$ is a small constant. We extend our results to the case of "weighted" graphs, when the weights have either Gaussian or arbitrary sub-Gaussian distributions. The proofs are based on the second moment method combined with concentration bounds, the Borell-TIS inequality for the Gaussian case and Talagrand's inequality for the case of distributions with bounded support (including the $\mathbb{G}(n,1/2)$ case). The case of general distribution is treated using a novel symmetrized version of the Lindeberg argument, which reduces the general case to the Gaussian case. Finally, using the results above we conduct the landscape analysis of the related Hidden Clique Problem, and establish that it exhibits an overlap gap property when the size of the clique is $O(n^{2\over 3})$, confirming a hypothesis stated in a previous related work.

Submitted 7 December, 2022; originally announced December 2022.

Comments: 63 pages, 1 figure

arXiv:2210.08312 [pdf, other]

doi 10.1088/1742-5468/ac9cc8

Disordered Systems Insights on Computational Hardness

Authors: David Gamarnik, Cristopher Moore, Lenka Zdeborová

Abstract: In this review article, we discuss connections between the physics of disordered systems, phase transitions in inference problems, and computational hardness. We introduce two models representing the behavior of glassy systems, the spiked tensor model and the generalized linear model. We discuss the random (non-planted) versions of these problems as prototypical optimization problems, as well as the planted versions (with a hidden solution) as prototypical problems in statistical inference and learning. Based on ideas from physics, many of these problems have transitions where they are believed to jump from easy (solvable in polynomial time) to hard (requiring exponential time). We discuss several emerging ideas in theoretical computer science and statistics that provide rigorous evidence for hardness by proving that large classes of algorithms fail in the conjectured hard regime. This includes the overlap gap property, a particular mathematization of clustering or dynamical symmetry-breaking, which can be used to show that many algorithms that are local or robust to changes in their input fail. We also discuss the sum-of-squares hierarchy, which places bounds on proofs or algorithms that use low-degree polynomials such as standard spectral methods and semidefinite relaxations, including the Sherrington-Kirkpatrick model. Throughout the manuscript, we present connections to the physics of disordered systems and associated replica symmetry breaking properties.

Submitted 18 October, 2022; v1 submitted 15 October, 2022; originally announced October 2022.

Comments: 42 pages

Journal ref: J. Stat. Mech. (2022) 114015

arXiv:2204.10306 [pdf, ps, other]

doi 10.1109/FOCS54457.2022.00039

Performance and limitations of the QAOA at constant levels on large sparse hypergraphs and spin glass models

Authors: Joao Basso, David Gamarnik, Song Mei, Leo Zhou

Abstract: The Quantum Approximate Optimization Algorithm (QAOA) is a general purpose quantum algorithm designed for combinatorial optimization. We analyze its expected performance and prove concentration properties at any constant level (number of layers) on ensembles of random combinatorial optimization problems in the infinite size limit. These ensembles include mixed spin models and Max-$q$-XORSAT on sparse random hypergraphs. Our analysis can be understood via a saddle-point approximation of a sum-over-paths integral. This is made rigorous by proving a generalization of the multinomial theorem, which is a technical result of independent interest. We then show that the performance of the QAOA at constant levels for the pure $q$-spin model matches asymptotically the ones for Max-$q$-XORSAT on random sparse Erdős-Rényi hypergraphs and every large-girth regular hypergraph. Through this correspondence, we establish that the average-case value produced by the QAOA at constant levels is bounded away from optimality for pure $q$-spin models when $q\ge 4$ and is even. This limitation gives a hardness of approximation result for quantum algorithms in a new regime where the whole graph is seen.

Submitted 28 September, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

Comments: 13+47 pages, updated introduction

Journal ref: Proceedings of FOCS 2022, pp. 335-343

arXiv:2203.15667 [pdf, ps, other]

Algorithms and Barriers in the Symmetric Binary Perceptron Model

Authors: David Gamarnik, Eren C. Kızıldağ, Will Perkins, Changji Xu

Abstract: The symmetric binary perceptron ($\texttt{SBP}$) exhibits a dramatic statistical-to-computational gap: the densities at which known efficient algorithms find solutions are far below the threshold for the existence of solutions. Furthermore, the $\texttt{SBP}$ exhibits a striking structural property: at all positive constraint densities almost all of its solutions are 'totally frozen' singletons separated by large Hamming distance \cite{perkins2021frozen,abbe2021proof}. This suggests that finding a solution to the $\texttt{SBP}$ may be computationally intractable. At the same time, the $\texttt{SBP}$ does admit polynomial-time search algorithms at low enough densities. A conjectural explanation for this conundrum was put forth in \cite{baldassi2020clustering}: efficient algorithms succeed in the face of freezing by finding exponentially rare clusters of large size. However, it was discovered recently that such rare large clusters exist at all subcritical densities, even at those well above the limits of known efficient algorithms \cite{abbe2021binary}. Thus the driver of the statistical-to-computational gap exhibited by this model remains a mystery. In this paper, we conduct a different landscape analysis to explain the algorithmic tractability of this problem. We show that at high enough densities the $\texttt{SBP}$ exhibits the multi Overlap Gap Property ($m$-OGP), an intricate geometrical property known to be a rigorous barrier for large classes of algorithms. Our analysis shows that the $m$-OGP threshold (a) is well below the satisfiability threshold; and (b) matches the best known algorithmic threshold up to logarithmic factors as $m\to\infty$. We then prove that the $m$-OGP rules out the class of stable algorithms for the $\texttt{SBP}$ above this threshold. We conjecture that the $m \to \infty$ limit of the $m$-OGP threshold marks the algorithmic threshold for the problem.

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2109.14409 [pdf, other]

doi 10.1073/pnas.2108492118

The Overlap Gap Property: a Geometric Barrier to Optimizing over Random Structures

Authors: David Gamarnik

Abstract: The problem of optimizing over random structures emerges in many areas of science and engineering, ranging from statistical physics to machine learning and artificial intelligence. For many such structures, no fast algorithm for finding optimal solutions is known, and often none is believed to exist. At the same time, formal hardness of these problems, in the form of, say, complexity-theoretic $NP$-hardness, is lacking. In this introductory article a new approach for algorithmic intractability in random structures is described, which is based on the topological disconnectivity property of the set of pair-wise distances of near optimal solutions, called the Overlap Gap Property. The article demonstrates how this property a) emerges in most models known to exhibit an apparent algorithmic hardness, b) is consistent with the hardness/tractability phase transition for many models analyzed to date, and importantly c) allows one to rule out, with mathematical rigor, large classes of algorithms as potential contenders, in particular the algorithms exhibiting input stability (insensitivity).

Submitted 1 August, 2021; originally announced September 2021.

Comments: 26 pages, 6 figures, 1 table

MSC Class: 60C05

arXiv:2109.01342 [pdf, ps, other]

Circuit Lower Bounds for the p-Spin Optimization Problem

Authors: David Gamarnik, Aukosh Jagannath, Alexander S. Wein

Abstract: We consider the problem of finding a near ground state of a $p$-spin model with Rademacher couplings by means of a low-depth circuit. As a direct extension of the authors' recent work [Gamarnik, Jagannath, Wein 2020], we establish that any poly-size $n$-output circuit that produces a spin assignment with objective value within a certain constant factor of optimality, must have depth at least $\log n/(2\log\log n)$ as $n$ grows. This is stronger than the known state of the art bounds of the form $Ω(\log n/(k(n)\log\log n))$ for similar combinatorial optimization problems, where $k(n)$ depends on the optimality value. For example, for the largest clique problem $k(n)$ corresponds to the square of the size of the clique [Rossman 2010]. At the same time our results are not quite comparable since in our case the circuits are required to produce a solution itself rather than solving the associated decision problem. As in our earlier work, the approach is based on the overlap gap property (OGP) exhibited by random $p$-spin models, but the derivation of the circuit lower bound relies further on standard facts from Fourier analysis on the Boolean cube, in particular the Linial-Mansour-Nisan Theorem. To the best of our knowledge, this is the first instance when methods from spin glass theory have ramifications for circuit complexity.

Submitted 21 January, 2022; v1 submitted 3 September, 2021; originally announced September 2021.

Comments: 14 pages

arXiv:2103.01887 [pdf, ps, other]

doi 10.1109/TSP.2022.3156702

Self-Regularity of Non-Negative Output Weights for Overparameterized Two-Layer Neural Networks

Authors: David Gamarnik, Eren C. Kızıldağ, Ilias Zadik

Abstract: We consider the problem of finding a two-layer neural network with sigmoid, rectified linear unit (ReLU), or binary step activation functions that "fits" a training data set as accurately as possible as quantified by the training error; and study the following question: \emph{does a low training error guarantee that the norm of the output layer (outer norm) itself is small?} We answer this question affirmatively for the case of non-negative output weights. Using a simple covering number argument, we establish that under quite mild distributional assumptions on the input/label pairs, any such network achieving a small training error on polynomially many data necessarily has a well-controlled outer norm. Notably, our results (a) have a polynomial (in $d$) sample complexity, (b) are independent of the number of hidden units (which can potentially be very high), (c) are oblivious to the training algorithm; and (d) require quite mild assumptions on the data (in particular the input vector $X\in\mathbb{R}^d$ need not have independent coordinates). We then leverage our bounds to establish generalization guarantees for such networks through \emph{fat-shattering dimension}, a scale-sensitive measure of the complexity class that the network architectures we investigate belong to. Notably, our generalization bounds also have good sample complexity (polynomials in $d$ with a low degree), and are in fact near-linear for some important cases of interest.

Submitted 2 March, 2021; originally announced March 2021.

Comments: 34 pages. Some of the results in the present paper are significantly strengthened versions of certain results appearing in arXiv:2003.10523

arXiv:2103.01369 [pdf, other]

Algorithmic Obstructions in the Random Number Partitioning Problem

Authors: David Gamarnik, Eren C. Kızıldağ

Abstract: We consider the algorithmic problem of finding a near-optimal solution for the number partitioning problem (NPP). The NPP appears in many applications, including the design of randomized controlled trials, multiprocessor scheduling, and cryptography; and is also of theoretical significance. It possesses a so-called statistical-to-computational gap: when its input $X$ has distribution $\mathcal{N}(0,I_n)$, its optimal value is $Θ(\sqrt{n}2^{-n})$ w.h.p.; whereas the best polynomial-time algorithm achieves an objective value of only $2^{-Θ(\log^2 n)}$, w.h.p. In this paper, we initiate the study of the nature of this gap. Inspired by insights from statistical physics, we study the landscape of NPP and establish the presence of the Overlap Gap Property (OGP), an intricate geometric property which is known to be rigorous evidence of algorithmic hardness for large classes of algorithms. By leveraging the OGP, we establish that (a) any sufficiently stable algorithm, appropriately defined, fails to find a near-optimal solution with energy below $2^{-ω(n \log^{-1/5} n)}$; and (b) a very natural MCMC dynamics fails to find near-optimal solutions. Our simulations suggest that the state of the art algorithm achieving $2^{-Θ(\log^2 n)}$ is indeed stable, but formally verifying this is left as an open problem. OGP regards the overlap structure of $m$-tuples of solutions achieving a certain objective value. When $m$ is constant we prove the presence of OGP in the regime $2^{-Θ(n)}$, and the absence of it in the regime $2^{-o(n)}$.
Interestingly, though, by considering overlaps with growing values of $m$ we prove the presence of the OGP up to the level $2^{-ω(\sqrt{n\log n})}$. Our proof of the failure of stable algorithms at values $2^{-ω(n \log^{-1/5} n)}$ employs methods from Ramsey Theory in extremal combinatorics, and is of independent interest.

Submitted 1 March, 2021; originally announced March 2021.

Comments: 84 pages, 3 figures
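
Illustrative sketch (not part of the listing): the gap described in the abstract above can be seen on a toy instance by comparing an exhaustive search with a fast heuristic. The heuristic below is the largest differencing method in the style of Karmarkar and Karp; the abstract does not name its polynomial-time baseline, so using this heuristic here is an assumption. The function names are hypothetical, and the objective is the unnormalized discrepancy $|\langle σ, X\rangle|$.

```python
import heapq
from itertools import product


def differencing_discrepancy(values):
    """Largest differencing heuristic (assumed baseline, see lead-in).

    Repeatedly replaces the two largest magnitudes by their difference,
    which commits them to opposite sides of the partition. The last
    remaining number is the discrepancy of the resulting partition.
    """
    # heapq is a min-heap, so magnitudes are stored negated.
    heap = [-abs(v) for v in values]
    heapq.heapify(heap)
    while len(heap) > 1:
        largest = -heapq.heappop(heap)
        second = -heapq.heappop(heap)
        heapq.heappush(heap, -(largest - second))
    return -heap[0] if heap else 0


def optimal_discrepancy(values):
    """Exhaustive minimum of |<sigma, X>| over sign vectors (exponential time)."""
    if not values:
        return 0
    return min(
        abs(sum(s * v for s, v in zip(signs, values)))
        for signs in product((1, -1), repeat=len(values))
    )


if __name__ == "__main__":
    xs = [8, 7, 6, 5, 4]
    print(differencing_discrepancy(xs), optimal_discrepancy(xs))
```

On the instance `[8, 7, 6, 5, 4]` the heuristic ends at discrepancy 2, while the split {8, 7} versus {6, 5, 4} is perfectly balanced. This toy example says nothing about the asymptotic rates quoted in the abstract; it only shows the two quantities being compared.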

arXiv:2011.04915 [pdf, ps, other]

Correlation Decay and the Absence of Zeros Property of Partition Functions

Authors: David Gamarnik

Abstract: The absence of (complex) zeros property is at the heart of the interpolation method developed by Barvinok \cite{barvinok2017combinatorics} for designing deterministic approximation algorithms for various graph counting and partition function computation problems. Earlier methods for solving the same problem include the one based on the correlation decay property. Remarkably, the classes of graphs for which the two methods apply sometimes coincide or nearly coincide. In this paper we show that this is more than just a coincidence. We establish that if the interpolation method is valid for a family of graphs satisfying the self-reducibility property, then this family exhibits a form of correlation decay property which is asymptotic Strong Spatial Mixing (SSM) at distances $ω(\log n)$, where $n$ is the number of nodes of the graph. This applies in particular to amenable graphs, such as graphs which are finite subsets of lattices. Our proof is based on a certain graph polynomial representation of the associated partition function. This representation is at the heart of the design of the polynomial time algorithms underlying the interpolation method itself. We conjecture that our result holds for all graphs, and not just amenable ones.

Submitted 1 December, 2020; v1 submitted 10 November, 2020; originally announced November 2020.

Comments: 27 pages

arXiv:2007.07219 [pdf, ps, other]

Stability, memory, and messaging tradeoffs in heterogeneous service systems

Authors: David Gamarnik, John N. Tsitsiklis, Martin Zubeldia

Abstract: We consider a heterogeneous distributed service system, consisting of $n$ servers with unknown and possibly different processing rates. Jobs with unit mean and independent processing times arrive as a renewal process of rate $λn$, with $0<λ<1$, to the system. Incoming jobs are immediately dispatched to one of several queues associated with the $n$ servers. We assume that the dispatching decisions are made by a central dispatcher endowed with a finite memory, and with the ability to exchange messages with the servers. We study the fundamental resource requirements (memory bits and message exchange rate) in order for a dispatching policy to be {\bf maximally stable}, i.e., stable whenever the processing rates are such that the arrival rate is less than the total available processing rate. First, for the case of Poisson arrivals and exponential service times, we present a policy that is maximally stable while using a positive (but arbitrarily small) message rate, and $\log_2(n)$ bits of memory. Second, we show that within a certain broad class of policies, a dispatching policy that exchanges $o\big(n^2\big)$ messages per unit of time, and with $o(\log(n))$ bits of memory, cannot be maximally stable. Thus, as long as the message rate is not too excessive, a logarithmic memory is necessary and sufficient for maximal stability.

Submitted 10 July, 2020; originally announced July 2020.

Comments: arXiv admin note: text overlap with arXiv:1807.02882

arXiv:2006.02806 [pdf, ps, other]

Estimation of Monotone Multi-Index Models

Authors: David Gamarnik, Julia Gaudio

Abstract: In a multi-index model with $k$ index vectors, the input variables are transformed by taking inner products with the index vectors. A transfer function $f: \mathbb{R}^k \to \mathbb{R}$ is applied to these inner products to generate the output. Thus, multi-index models are a generalization of linear models. In this paper, we consider monotone multi-index models. Namely, the transfer function is assumed to be coordinate-wise monotone. The monotone multi-index model therefore generalizes both linear regression and isotonic regression, which is the estimation of a coordinate-wise monotone function. We consider the case of nonnegative index vectors. We provide an algorithm based on integer programming for the estimation of monotone multi-index models, and provide guarantees on the $L_2$ loss of the estimated function relative to the ground truth.

Submitted 4 June, 2020; originally announced June 2020.

Comments: 20 pages

arXiv:2005.08747 [pdf, other]

The Quantum Approximate Optimization Algorithm Needs to See the Whole Graph: Worst Case Examples

Authors: Edward Farhi, David Gamarnik, Sam Gutmann

Abstract: The Quantum Approximate Optimization Algorithm can be applied to search problems on graphs with a cost function that is a sum of terms corresponding to the edges. When conjugating an edge term, the QAOA unitary at depth p produces an operator that depends only on the subgraph consisting of edges that are at most p away from the edge in question. On random d-regular graphs, with d fixed and with p a small constant times log n, these neighborhoods are almost all trees and so the performance of the QAOA is determined only by how it acts on an edge in the middle of a tree. Both bipartite random d-regular graphs and general random d-regular graphs locally are trees so the QAOA's performance is the same on these two ensembles. Using this we can show that the QAOA with $(d-1)^{2p} < n^A$ for any $A<1$, can only achieve an approximation ratio of 1/2 for Max-Cut on bipartite random d-regular graphs for d large. For Maximum Independent Set, in the same setting, the best approximation ratio is a d-dependent constant that goes to 0 as d gets big.

Submitted 18 May, 2020; originally announced May 2020.

Comments: 6 pages, no figures

Report number: MIT-CTP/5206
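
Illustrative sketch (not part of the listing): the condition $(d-1)^{2p} < n^A$ quoted in the abstract above limits the depth to roughly $p < A \log n / (2\log(d-1))$. The helper below, whose name and interface are hypothetical, computes the largest integer depth that still satisfies the inequality for given $n$, $d$, and $A$, to give a feel for how slowly the admissible depth grows with $n$.

```python
def max_blocked_depth(n, d, a):
    """Largest integer p >= 0 with (d - 1) ** (2 * p) < n ** a.

    n: number of vertices (n >= 2), d: graph degree (d >= 3),
    a: the exponent A from the abstract, required to satisfy 0 < a < 1.
    """
    if n < 2:
        raise ValueError("need n >= 2")
    if d < 3:
        raise ValueError("need d >= 3 so that d - 1 > 1")
    if not 0 < a < 1:
        raise ValueError("need 0 < a < 1")
    bound = n ** a
    p = 0
    # p = 0 always qualifies because (d - 1) ** 0 = 1 < n ** a when n >= 2.
    while (d - 1) ** (2 * (p + 1)) < bound:
        p += 1
    return p


if __name__ == "__main__":
    for degree in (3, 10):
        print(degree, max_blocked_depth(10**6, degree, 0.9))
```

The values of `n`, `d`, and `a` in the usage lines are arbitrary choices for illustration and do not come from the paper.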

arXiv:2004.12063 [pdf, ps, other]

Hardness of Random Optimization Problems for Boolean Circuits, Low-Degree Polynomials, and Langevin Dynamics

Authors: David Gamarnik, Aukosh Jagannath, Alexander S. Wein

Abstract: We consider the problem of finding nearly optimal solutions of optimization problems with random objective functions. Two concrete problems we consider are (a) optimizing the Hamiltonian of a spherical or Ising $p$-spin glass model, and (b) finding a large independent set in a sparse Erdős-Rényi graph. The following families of algorithms are considered: (a) low-degree polynomials of the input; (b) low-depth Boolean circuits; (c) the Langevin dynamics algorithm. We show that these families of algorithms fail to produce nearly optimal solutions with high probability. For the case of Boolean circuits, our results improve the state-of-the-art bounds known in circuit complexity theory (although we consider the search problem as opposed to the decision problem). Our proof uses the fact that these models are known to exhibit a variant of the overlap gap property (OGP) of near-optimal solutions. Specifically, for both models, every two solutions whose objectives are above a certain threshold are either close or far from each other. The crux of our proof is that the classes of algorithms we consider exhibit a form of stability. We show by an interpolation argument that stable algorithms cannot overcome the OGP barrier. The stability of Langevin dynamics is an immediate consequence of the well-posedness of stochastic differential equations.
The stability of low-degree polynomials and Boolean circuits is established using tools from Gaussian and Boolean analysis -- namely hypercontractivity and total influence, as well as a novel lower bound for random walks avoiding certain subsets. In the case of Boolean circuits, the result also makes use of the classical Linial-Mansour-Nisan theorem. Our techniques apply more broadly to low influence functions and may apply more generally.

Submitted 26 January, 2022; v1 submitted 25 April, 2020; originally announced April 2020.

Comments: 41 pages; v1 is the conference paper "Low-Degree Hardness of Random Optimization Problems" (FOCS 2020); v2 is a journal version which adds circuit lower bounds for max independent set, based on ideas from our note arXiv:2109.01342
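
Illustrative sketch (not part of the listing): the abstract above describes the overlap gap property as the statement that any two near-optimal solutions are either close or far from each other. For $\pm 1$ spin vectors this can be phrased as a forbidden interval for the normalized overlap $|\langle σ, τ\rangle|/n$. The functions and the interval endpoints `nu1`, `nu2` below are hypothetical names chosen for this sketch; the precise definition and thresholds used in the paper are not given in the abstract.

```python
from itertools import combinations


def normalized_overlap(sigma, tau):
    """Return |<sigma, tau>| / n for two +/-1 vectors of equal length n."""
    if len(sigma) != len(tau) or not sigma:
        raise ValueError("vectors must be non-empty and of equal length")
    return abs(sum(s * t for s, t in zip(sigma, tau))) / len(sigma)


def pairs_inside_gap(solutions, nu1, nu2):
    """Index pairs whose overlap lies strictly inside (nu1, nu2).

    An empty result means this list of solutions is consistent with an
    overlap gap on the interval (nu1, nu2).
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(solutions), 2)
        if nu1 < normalized_overlap(a, b) < nu2
    ]


if __name__ == "__main__":
    candidates = [[1, 1, 1, 1], [1, 1, 1, -1], [1, -1, 1, -1]]
    print(pairs_inside_gap(candidates, 0.25, 0.75))
```

In practice the list of solutions would be restricted to configurations whose objective exceeds the chosen threshold; that filtering step depends on the model and is omitted here.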

arXiv:2004.09002 [pdf, other]

The Quantum Approximate Optimization Algorithm Needs to See the Whole Graph: A Typical Case

Authors: Edward Farhi, David Gamarnik, Sam Gutmann

Abstract: The Quantum Approximate Optimization Algorithm can naturally be applied to combinatorial search problems on graphs. The quantum circuit has p applications of a unitary operator that respects the locality of the graph. On a graph with bounded degree, with p small enough, measurements of distant qubits in the state output by the QAOA give uncorrelated results. We focus on finding big independent sets in random graphs with dn/2 edges keeping d fixed and n large. Using the Overlap Gap Property of almost optimal independent sets in random graphs, and the locality of the QAOA, we are able to show that if p is less than a d-dependent constant times log n, the QAOA cannot do better than finding an independent set of size .854 times the optimal for d large. Because the logarithm is slowly growing, even at one million qubits we can only show that the algorithm is blocked if p is in single digits. At higher p the algorithm "sees" the whole graph and we have no indication that performance is limited.

Submitted 19 April, 2020; originally announced April 2020.

Comments: 19 pages, no figures

Report number: MIT-CTP/5198

arXiv:2003.10523 [pdf, other]

Neural Networks and Polynomial Regression. Demystifying the Overparametrization Phenomena

Authors: Matt Emschwiller, David Gamarnik, Eren C. Kızıldağ, Ilias Zadik

Abstract: In the context of neural network models, overparametrization refers to the phenomenon whereby these models appear to generalize well on the unseen data, even though the number of parameters significantly exceeds the sample sizes, and the model perfectly fits the in-training data. A conventional explanation of this phenomenon is based on self-regularization properties of algorithms used to train the data. In this paper we prove a series of results which provide a somewhat diverging explanation. Adopting a teacher/student model where the teacher network is used to generate the predictions and student network is trained on the observed labeled data, and then tested on out-of-sample data, we show that any student network interpolating the data generated by a teacher network generalizes well, provided that the sample size is at least an explicit quantity controlled by data dimension and approximation guarantee alone, regardless of the number of internal nodes of either teacher or student network. Our claim is based on approximating both teacher and student networks by polynomial (tensor) regression models with degree depending on the desired accuracy and network depth only. Such a parametrization notably does not depend on the number of internal nodes. Thus a message implied by our results is that parametrizing wide neural networks by the number of hidden nodes is misleading, and a more fitting measure of parametrization complexity is the number of regression coefficients associated with tensorized data.
In particular, this somewhat reconciles the generalization ability of neural networks with more classical statistical notions of data complexity and generalization bounds. Our empirical results on MNIST and Fashion-MNIST datasets indeed confirm that tensorized regression achieves a good out-of-sample performance, even when the degree of the tensor is at most two.

Submitted 23 March, 2020; originally announced March 2020.

Comments: 59 pages, 3 figures

arXiv:1912.01599 [pdf, ps, other]

Stationary Points of Shallow Neural Networks with Quadratic Activation Function

Authors: David Gamarnik, Eren C. Kızıldağ, Ilias Zadik

Abstract: We consider the teacher-student setting of learning shallow neural networks with quadratic activations and planted weight matrix $W^*\in\mathbb{R}^{m\times d}$, where $m$ is the width of the hidden layer and $d\le m$ is the data dimension. We study the optimization landscape associated with the empirical and the population squared risk of the problem. Under the assumption that the planted weights are full-rank we obtain the following results. First, we establish that the landscape of the empirical risk admits an "energy barrier" separating rank-deficient $W$ from $W^*$: if $W$ is rank deficient, then its risk is bounded away from zero by an amount we quantify. We then couple this result by showing that, assuming the number $N$ of samples grows at least like a polynomial function of $d$, all full-rank approximate stationary points of the empirical risk are nearly globally optimal. These two results allow us to prove that gradient descent, when initialized below the energy barrier, approximately minimizes the empirical risk and recovers the planted weights in polynomial-time. Next, we show that initializing below this barrier is in fact easily achieved when the weights are randomly generated under relatively weak assumptions. We show that provided the network is sufficiently overparametrized, initializing with an appropriate multiple of the identity suffices to obtain a risk below the energy barrier. At a technical level, the last result is a consequence of the semicircle law for the Wishart ensemble and could be of independent interest.
Finally, we study the minimizers of the empirical risk and identify a simple necessary and sufficient geometric condition on the training data under which any minimizer has necessarily zero generalization error. We show that as soon as $N\ge N^*=d(d+1)/2$, randomly generated data enjoys this geometric condition almost surely, while that ceases to be true if $N<N^*$.

Submitted 9 July, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

Comments: 54 pages

arXiv:1911.06943 [pdf, ps, other]

The Overlap Gap Property and Approximate Message Passing Algorithms for $p$-spin models

Authors: David Gamarnik, Aukosh Jagannath

Abstract: We consider the algorithmic problem of finding a near ground state (near optimal solution) of a $p$-spin model. We show that for a class of algorithms broadly defined as Approximate Message Passing (AMP), the presence of the Overlap Gap Property (OGP), appropriately defined, is a barrier. We conjecture that when $p\ge 4$ the model does indeed exhibit OGP (and prove it for the space of binary solutions). Assuming the validity of this conjecture, as an implication, the AMP fails to find near ground states in these models, per our result. We extend our result to the problem of finding pure states by means of Thouless, Anderson and Palmer (TAP) based iterations, which is yet another example of AMP type algorithms. We show that such iterations fail to find pure states approximately, subject to the conjecture that the space of pure states exhibits the OGP, appropriately stated, when $p\ge 4$.

Submitted 25 November, 2019; v1 submitted 15 November, 2019; originally announced November 2019.

Comments: 27 pages

arXiv:1910.10890 [pdf, other]

doi 10.1109/TIT.2021.3113921

Inference in High-Dimensional Linear Regression via Lattice Basis Reduction and Integer Relation Detection

Authors: David Gamarnik, Eren C. Kızıldağ, Ilias Zadik

Abstract: We focus on the high-dimensional linear regression problem, where the algorithmic goal is to efficiently infer an unknown feature vector $β^*\in\mathbb{R}^p$ from its linear measurements, using a small number $n$ of samples. Unlike most of the literature, we make no sparsity assumption on $β^*$, but instead adopt a different regularization: In the noiseless setting, we assume $β^*$ consists of entries which are either rational numbers with a common denominator $Q\in\mathbb{Z}^+$ (referred to as $Q$-rationality), or irrational numbers supported on a rationally independent set of bounded cardinality, known to the learner; collectively, we call this the mixed-support assumption. Using a novel combination of the PSLQ integer relation detection, and LLL lattice basis reduction algorithms, we propose a polynomial-time algorithm which provably recovers a $β^*\in\mathbb{R}^p$ enjoying the mixed-support assumption, from its linear measurements $Y=Xβ^*\in\mathbb{R}^n$ for a large class of distributions for the random entries of $X$, even with one measurement $(n=1)$. In the noisy setting, we propose a polynomial-time, lattice-based algorithm, which recovers a $β^*\in\mathbb{R}^p$ enjoying $Q$-rationality, from its noisy measurements $Y=Xβ^*+W\in\mathbb{R}^n$, even with a single sample $(n=1)$. We further establish that, for large $Q$ and normal noise, this algorithm tolerates an information-theoretically optimal level of noise. We then apply these ideas to develop a polynomial-time, single-sample algorithm for the phase retrieval problem.
Our methods address the single-sample $(n=1)$ regime, where the sparsity-based methods such as LASSO and Basis Pursuit are known to fail. Furthermore, our results also reveal an algorithmic connection between the high-dimensional linear regression problem, and the integer relation detection, randomized subset-sum, and shortest vector problems.

Submitted 23 October, 2019; originally announced October 2019.

Comments: 56 pages. Parts of the material of this manuscript were presented at NeurIPS 2018, and ISIT 2019. This submission subsumes the content of arXiv:1803.06716

Journal ref: IEEE Transactions on Information Theory (Volume: 67, Issue: 12, December 2021)

arXiv:1908.09959 [pdf, other]

doi 10.1007/s00440-021-01089-7

The Overlap Gap Property in Principal Submatrix Recovery

Authors: David Gamarnik, Aukosh Jagannath, Subhabrata Sen

Abstract: We study support recovery for a $k \times k$ principal submatrix with elevated mean $λ/N$, hidden in an $N\times N$ symmetric mean zero Gaussian matrix. Here $λ>0$ is a universal constant, and we assume $k = N ρ$ for some constant $ρ\in (0,1)$. We establish that there exists a constant $C>0$ such that the MLE recovers a constant proportion of the hidden submatrix if $λ\geq C \sqrt{\frac{1}ρ \log \frac{1}ρ}$, while such recovery is information theoretically impossible if $λ= o( \sqrt{\frac{1}ρ \log \frac{1}ρ} )$. The MLE is computationally intractable in general, and in fact, for $ρ>0$ sufficiently small, this problem is conjectured to exhibit a \emph{statistical-computational gap}. To provide rigorous evidence for this, we study the likelihood landscape for this problem, and establish that for some $\varepsilon>0$ and $\sqrt{\frac{1}ρ \log \frac{1}ρ } \ll λ\ll \frac{1}{ρ^{1/2 + \varepsilon}}$, the problem exhibits a variant of the \emph{Overlap-Gap-Property (OGP)}. As a direct consequence, we establish that a family of local MCMC based algorithms do not achieve optimal recovery. Finally, we establish that for $λ> 1/ρ$, a simple spectral method recovers a constant proportion of the hidden submatrix.

Submitted 12 December, 2020; v1 submitted 26 August, 2019; originally announced August 2019.

Comments: 42 pages, 1 figure

MSC Class: Primary:68Q87; 60C05; Secondary:82B44; 68Q25; 62H25

Journal ref: Probab. Theo. Relat. Fields 181, pp 757-814 (2021)

arXiv:1907.01715 [pdf, other]

Sparse High-Dimensional Isotonic Regression

Authors: David Gamarnik, Julia Gaudio

Abstract: We consider the problem of estimating an unknown coordinate-wise monotone function given noisy measurements, known as the isotonic regression problem. Often, only a small subset of the features affects the output. This motivates the sparse isotonic regression setting, which we consider here. We provide an upper bound on the expected VC entropy of the space of sparse coordinate-wise monotone functions, and identify the regime of statistical consistency of our estimator. We also propose a linear program to recover the active coordinates, and provide theoretical recovery guarantees. We close with experiments on cancer classification, and show that our method significantly outperforms standard methods.

Submitted 2 July, 2019; originally announced July 2019.

Comments: 28 pages, 3 figures

arXiv:1904.07174 [pdf, other]

The Landscape of the Planted Clique Problem: Dense subgraphs and the Overlap Gap Property

Authors: David Gamarnik, Ilias Zadik

Abstract: In this paper we study the computational-statistical gap of the planted clique problem, where a clique of size $k$ is planted in an Erdos-Renyi graph $G(n,\frac{1}{2})$ resulting in a graph $G\left(n,\frac{1}{2},k\right)$. The goal is to recover the planted clique vertices by observing $G\left(n,\frac{1}{2},k\right)$. It is known that the clique can be recovered as long as $k \geq \left(2+ε\right)\log n $ for any $ε>0$, but no polynomial-time algorithm is known for this task unless $k=Ω\left(\sqrt{n} \right)$. Following a statistical-physics inspired point of view as an attempt to understand this computational-statistical gap, we study the landscape of the "sufficiently dense" subgraphs of $G$ and their overlap with the planted clique. Using the first moment method, we study the densest subgraph problems for subgraphs with fixed, but arbitrary, overlap size with the planted clique, and provide evidence of a phase transition for the presence of Overlap Gap Property (OGP) at $k=Θ\left(\sqrt{n}\right)$. OGP is a concept introduced originally in spin glass theory and known to suggest algorithmic hardness when it appears. We establish the presence of OGP when $k$ is a small positive power of $n$ by using a conditional second moment method. As our main technical tool, we establish the first, to the best of our knowledge, concentration results for the $K$-densest subgraph problem for the Erdos-Renyi model $G\left(n,\frac{1}{2}\right)$ when $K=n^{0.5-ε}$ for arbitrary $ε>0$. Finally, to study the OGP we employ a certain form of overparametrization, which is conceptually aligned with a large body of recent work in learning theory and optimization.

Submitted 30 December, 2019; v1 submitted 15 April, 2019; originally announced April 2019.

Comments: 70 pages, 3 Figures. Added Figure 1 (phase diagram), and a new result proving that the OGP implies the failure of an MCMC family to recover the planted clique

arXiv:1810.05907 [pdf, ps, other]

doi 10.1214/20-AAP1625

Computing the partition function of the Sherrington-Kirkpatrick model is hard on average

Authors: David Gamarnik, Eren Kizildag

Abstract: We establish the average-case hardness of the algorithmic problem of exact computation of the partition function associated with the Sherrington-Kirkpatrick model of spin glasses with Gaussian couplings and random external field. In particular, we establish that unless $P= \#P$, there does not exist a polynomial-time algorithm to exactly compute the partition function on average. This is done by showing that if there exists a polynomial time algorithm, which exactly computes the partition function for inverse polynomial fraction ($1/n^{O(1)}$) of all inputs, then there is a polynomial time algorithm, which exactly computes the partition function for all inputs, with high probability, yielding $P=\#P$. The computational model that we adopt is {\em finite-precision arithmetic}, where the algorithmic inputs are truncated first to a certain level $N$ of digital precision. The ingredients of our proof include the random and downward self-reducibility of the partition function with random external field; an argument of Cai et al. \cite{cai1999hardness} for establishing the average-case hardness of computing the permanent of a matrix; a list-decoding algorithm of Sudan \cite{sudan1996maximum}, for reconstructing polynomials intersecting a given list of numbers at sufficiently many points; and near-uniformity of the log-normal distribution, modulo a large prime $p$. To the best of our knowledge, our result is the first one establishing a provable hardness of a model arising in the field of spin glasses. Furthermore, we extend our result to the same problem under a different {\em real-valued} computational model, e.g. using a Blum-Shub-Smale machine \cite{blum1988theory} operating over real-valued inputs.

Submitted 25 November, 2019; v1 submitted 13 October, 2018; originally announced October 2018.

Comments: 31 pages

Journal ref: The Annals of Applied Probability 31(3): 1474-1504 (June 2021)

arXiv:1809.06950 [pdf, ps, other]

Finding cliques using few probes

Authors: Uriel Feige, David Gamarnik, Joe Neeman, Miklós Z. Rácz, Prasad Tetali

Abstract: Consider algorithms with unbounded computation time that probe the entries of the adjacency matrix of an $n$ vertex graph, and need to output a clique. We show that if the input graph is drawn at random from $G_{n,\frac{1}{2}}$ (and hence is likely to have a clique of size roughly $2\log n$), then for every $δ< 2$ and constant $\ell$, there is an $α< 2$ (that may depend on $δ$ and $\ell$) such that no algorithm that makes $n^δ$ probes in $\ell$ rounds is likely (over the choice of the random graph) to output a clique of size larger than $α\log n$.

Submitted 18 September, 2018; originally announced September 2018.

Comments: 15 pages

arXiv:1807.02882 [pdf, ps, other]

A lower bound on the queueing delay in resource constrained load balancing

Authors: David Gamarnik, John N. Tsitsiklis, Martin Zubeldia

Abstract: We consider the following distributed service model: jobs with unit mean, general distribution, and independent processing times arrive as a renewal process of rate $λn$, with $0<λ<1$, and are immediately dispatched to one of several queues associated with $n$ identical servers with unit processing rate. We assume that the dispatching decisions are made by a central dispatcher endowed with a finite memory, and with the ability to exchange messages with the servers. We study the fundamental resource requirements (memory bits and message exchange rate), in order to drive the expected queueing delay in steady-state of a typical job to zero, as $n$ increases. We develop a novel approach to show that, within a certain broad class of "symmetric" policies, every dispatching policy with a message rate of the order of $n$, and with a memory of the order of $\log n$ bits, results in an expected queueing delay which is bounded away from zero, uniformly as $n\to\infty$.

Submitted 8 July, 2018; originally announced July 2018.

Comments: 44 pages

arXiv:1805.11238 [pdf, ps, other]

Explicit construction of RIP matrices is Ramsey-hard

Authors: David Gamarnik

Abstract: Matrices $Φ\in\R^{n\times p}$ satisfying the Restricted Isometry Property (RIP) are an important ingredient of compressive sensing methods. While it is known that random matrices satisfy the RIP with high probability even for $n=\log^{O(1)}p$, the explicit construction of such matrices has defied repeated efforts, and most known approaches hit the so-called $\sqrt{n}$ sparsity bottleneck. The notable exception is the work by Bourgain et al. \cite{bourgain2011explicit} constructing an $n\times p$ RIP matrix with sparsity $s=Θ(n^{{1\over 2}+ε})$, but in the regime $n=Ω(p^{1-δ})$. In this short note we resolve this open question in a sense by showing that an explicit construction of a matrix satisfying the RIP in the regime $n=O(\log^2 p)$ and $s=Θ(n^{1\over 2})$ implies an explicit construction of a three-colored Ramsey graph on $p$ nodes with clique sizes bounded by $O(\log^2 p)$ -- a question in extremal combinatorics which has been open for decades.

Submitted 15 November, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: 4 pages

arXiv:1803.06716 [pdf, other]

High Dimensional Linear Regression using Lattice Basis Reduction

Authors: David Gamarnik, Ilias Zadik

Abstract: We consider a high dimensional linear regression problem where the goal is to efficiently recover an unknown vector $β^*$ from $n$ noisy linear observations $Y=Xβ^*+W \in \mathbb{R}^n$, for known $X \in \mathbb{R}^{n \times p}$ and unknown $W \in \mathbb{R}^n$. Unlike most of the literature on this model we make no sparsity assumption on $β^*$. Instead we adopt a regularization based on assuming that the underlying vectors $β^*$ have rational entries with the same denominator $Q \in \mathbb{Z}_{>0}$. We call this the $Q$-rationality assumption. We propose a new polynomial-time algorithm for this task which is based on the seminal Lenstra-Lenstra-Lovasz (LLL) lattice basis reduction algorithm. We establish that under the $Q$-rationality assumption, our algorithm recovers exactly the vector $β^*$ for a large class of distributions for the iid entries of $X$ and non-zero noise $W$. We prove that it is successful under small noise, even when the learner has access to only one observation ($n=1$). Furthermore, we prove that in the case of the Gaussian white noise for $W$, $n=o\left(p/\log p\right)$ and $Q$ sufficiently large, our algorithm tolerates a nearly optimal information-theoretic level of the noise.

Submitted 8 November, 2018; v1 submitted 18 March, 2018; originally announced March 2018.

arXiv:1711.04952 [pdf, ps, other]

Sparse High-Dimensional Linear Regression. Algorithmic Barriers and a Local Search Algorithm

Authors: David Gamarnik, Ilias Zadik

Abstract: We consider a sparse high dimensional regression model where the goal is to recover a $k$-sparse unknown vector $β^*$ from $n$ noisy linear observations of the form $Y=Xβ^*+W \in \mathbb{R}^n$ where $X \in \mathbb{R}^{n \times p}$ has iid $N(0,1)$ entries and $W \in \mathbb{R}^n$ has iid $N(0,σ^2)$ entries. Under certain assumptions on the parameters, an intriguing asymptotic gap appears between the minimum value of $n$, call it $n^*$, for which the recovery is information theoretically possible, and the minimum value of $n$, call it $n_{\mathrm{alg}}$, for which an efficient algorithm is known to provably recover $β^*$. In \cite{gamarnikzadik} it was conjectured that the gap is not artificial, in the sense that for sample sizes $n \in [n^*,n_{\mathrm{alg}}]$ the problem is algorithmically hard. We support this conjecture in two ways. Firstly, we show that the optimal solution of the LASSO provably fails to $\ell_2$-stably recover the unknown vector $β^*$ when $n \in [n^*,c n_{\mathrm{alg}}]$, for some sufficiently small constant $c>0$. Secondly, we establish that $n_{\mathrm{alg}}$, up to a multiplicative constant factor, is a phase transition point for the appearance of a certain Overlap Gap Property (OGP) over the space of $k$-sparse vectors. The presence of such an Overlap Gap Property phase transition, which originates in statistical physics, is known to provide evidence of algorithmic hardness. Finally we show that if $n>C n_{\mathrm{alg}}$ for some large enough constant $C>0$, a very simple algorithm based on a local search improvement rule is able both to $\ell_2$-stably recover the unknown vector $β^*$ and to infer correctly its support, adding it to the list of provably successful algorithms for the high dimensional linear regression problem.

Submitted 22 September, 2019; v1 submitted 14 November, 2017; originally announced November 2017.

Comments: Added a result on the failure of the LASSO recovery mechanism in the conjectured algorithmically hard regime $n<c n_{alg}$ and minor corrections

arXiv:1709.04102 [pdf, other]

Delay, memory, and messaging tradeoffs in distributed service systems

Authors: David Gamarnik, John N. Tsitsiklis, Martin Zubeldia

Abstract: We consider the following distributed service model: jobs with unit mean, exponentially distributed, and independent processing times arrive as a Poisson process of rate $λn$, with $0<λ<1$, and are immediately dispatched by a centralized dispatcher to one of $n$ First-In-First-Out queues associated with $n$ identical servers. The dispatcher is endowed with a finite memory, and with the ability to exchange messages with the servers. We propose and study a resource-constrained "pull-based" dispatching policy that involves two parameters: (i) the number of memory bits available at the dispatcher, and (ii) the average rate at which servers communicate with the dispatcher. We establish (using a fluid limit approach) that the asymptotic, as $n\to\infty$, expected queueing delay is zero when either (i) the number of memory bits grows logarithmically with $n$ and the message rate grows superlinearly with $n$, or (ii) the number of memory bits grows superlogarithmically with $n$ and the message rate is at least $λn$. Furthermore, when the number of memory bits grows only logarithmically with $n$ and the message rate is proportional to $n$, we obtain a closed-form expression for the (now positive) asymptotic delay. Finally, we demonstrate an interesting phase transition in the resource-constrained regime where the asymptotic delay is non-zero. In particular, we show that for any given $α>0$ (no matter how small), if our policy only uses a linear message rate $αn$, the resulting asymptotic delay is upper bounded, uniformly over all $λ<1$; this is in sharp contrast to the delay obtained when no messages are used ($α= 0$), which grows as $1/(1-λ)$ when $λ\uparrow 1$, or when the popular power-of-$d$-choices is used, in which the delay grows as $\log(1/(1-λ))$.

Submitted 12 September, 2017; originally announced September 2017.

arXiv:1708.04263 [pdf, other]

Uniqueness of Gibbs Measures for Continuous Hardcore Models

Authors: David Gamarnik, Kavita Ramanan

Abstract: We formulate a continuous version of the well known discrete hardcore (or independent set) model on a locally finite graph, parameterized by the so-called activity parameter $λ> 0$. In this version, the state or "spin value" $x_u$ of any node $u$ of the graph lies in the interval $[0,1]$, the hardcore constraint $x_u + x_v \leq 1$ is satisfied for every edge $(u,v)$ of the graph, and the space of feasible configurations is given by a convex polytope. When the graph is a regular tree, we show that there is a unique Gibbs measure associated to each activity parameter $λ>0$. Our result shows that, in contrast to the standard discrete hardcore model, the continuous hardcore model does not exhibit a phase transition on the infinite regular tree. We also consider a family of continuous models that interpolate between the discrete and continuous hardcore models on a regular tree when $λ= 1$ and show that each member of the family has a unique Gibbs measure, even when the discrete model does not. In each case, the proof entails the analysis of an associated Hamiltonian dynamical system that describes a certain limit of the marginal distribution at a node. Furthermore, given any sequence of regular graphs with fixed degree and girth diverging to infinity, we apply our results to compute the asymptotic limit of suitably normalized volumes of the corresponding sequence of convex polytopes of feasible configurations. In particular, this yields an approximation for the partition function of the continuous hard core model on a regular graph with large girth in the case $λ= 1$.

Submitted 14 August, 2017; originally announced August 2017.

Comments: 34 pages, 1 figure

MSC Class: 60K35; 82B20; 82B27; 68W25

arXiv:1707.05386 [pdf, ps, other]

doi 10.1214/18-AOP1291

Suboptimality of local algorithms for a class of max-cut problems

Authors: Wei-Kuo Chen, David Gamarnik, Dmitry Panchenko, Mustazee Rahman

Abstract: We show that in random $K$-uniform hypergraphs of constant average degree, for even $K \geq 4$, local algorithms defined as factors of i.i.d. cannot find nearly maximal cuts, when the average degree is sufficiently large. These algorithms have been used frequently to obtain lower bounds for the max-cut problem on random graphs, but it was not known whether they could be successful in finding nearly maximal cuts. This result follows from the fact that the overlap of any two nearly maximal cuts in such hypergraphs does not take values in a certain non-trivial interval - a phenomenon referred to as the overlap gap property - which is proved by comparing diluted models with large average degree with appropriate fully connected spin glass models and showing the overlap gap property in the latter setting.

Submitted 8 August, 2018; v1 submitted 17 July, 2017; originally announced July 2017.

Comments: Final version; to appear in Ann. Probab

Journal ref: Annals of Probability 2019, Vol. 47, No. 3, 1587-1618

arXiv:1702.02267 [pdf, ps, other]

Matrix Completion from $O(n)$ Samples in Linear Time

Authors: David Gamarnik, Quan Li, Hongyi Zhang

Abstract: We consider the problem of reconstructing a rank-$k$ $n \times n$ matrix $M$ from a sampling of its entries. Under a certain incoherence assumption on $M$ and for the case when both the rank and the condition number of $M$ are bounded, it was shown in \cite{CandesRecht2009, CandesTao2010, keshavan2010, Recht2011, Jain2012, Hardt2014} that $M$ can be recovered exactly or approximately (depending on some trade-off between accuracy and computational complexity) using $O(n \, \text{poly}(\log n))$ samples in super-linear time $O(n^{a} \, \text{poly}(\log n))$ for some constant $a \geq 1$. In this paper, we propose a new matrix completion algorithm using a novel sampling scheme based on a union of independent sparse random regular bipartite graphs. We show that under the same conditions w.h.p. our algorithm recovers an $ε$-approximation of $M$ in terms of the Frobenius norm using $O(n \log^2(1/ε))$ samples and in linear time $O(n \log^2(1/ε))$. This provides the best known bounds both on the sample complexity and computational complexity for reconstructing (approximately) an unknown low-rank matrix. The novelty of our algorithm lies in two new steps of thresholding singular values and rescaling singular vectors in the application of the "vanilla" alternating minimization algorithm. The structure of sparse random regular graphs is used heavily for controlling the impact of these regularization steps.

Submitted 22 August, 2017; v1 submitted 7 February, 2017; originally announced February 2017.

Comments: 45 pages, 1 figure. Short version accepted for presentation at Conference on Learning Theory (COLT) 2017

arXiv:1701.04455 [pdf, other]

High-Dimensional Regression with Binary Coefficients. Estimating Squared Error and a Phase Transition

Authors: David Gamarnik, Ilias Zadik

Abstract: We consider a sparse linear regression model Y=Xβ^{*}+W where X has Gaussian entries, W is the noise vector with mean zero Gaussian entries, and β^{*} is a binary vector with support size (sparsity) k. Using a novel conditional second moment method we obtain an approximation, tight up to a multiplicative constant, of the optimal squared error \min_β\|Y-Xβ\|_{2}, where the minimization is over all k-sparse binary vectors β. The approximation reveals interesting structural properties of the underlying regression problem. In particular, a) We establish that n^*=2k\log p/\log (2k/σ^{2}+1) is a phase transition point with the following "all-or-nothing" property. When n exceeds n^{*}, (2k)^{-1}\|β_{2}-β^*\|_0\approx 0, and when n is below n^{*}, (2k)^{-1}\|β_{2}-β^*\|_0\approx 1, where β_2 is the optimal solution achieving the smallest squared error. With this we prove that n^{*} is the asymptotic threshold for recovering β^* information theoretically. b) We compute the squared error for an intermediate problem \min_β\|Y-Xβ\|_{2} where minimization is restricted to vectors β with \|β-β^{*}\|_0=2k ζ, for ζ\in [0,1]. We show that a lower bound part Γ(ζ) of the estimate, which corresponds to the estimate based on the first moment method, undergoes a phase transition at three different thresholds, namely n_{\text{inf,1}}=σ^2\log p, which is the information theoretic bound for recovering β^* when k=1 and σ is large, then at n^{*} and finally at n_{\text{LASSO/CS}}. c) We establish a certain Overlap Gap Property (OGP) on the space of all binary vectors β when n\le ck\log p for sufficiently small constant c. We conjecture that OGP is the source of algorithmic hardness of solving the minimization problem \min_β\|Y-Xβ\|_{2} in the regime n<n_{\text{LASSO/CS}}.
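The threshold n^*=2k\log p/\log (2k/σ^{2}+1) quoted in part a) of the abstract above can be evaluated directly. The following is a minimal Python sketch that only evaluates that formula; the function name `all_or_nothing_threshold` and its argument names are hypothetical, and the parameter values in the usage lines are illustrative rather than taken from the paper. Because the formula is a ratio of two logarithms, the choice of logarithm base does not affect the result.

```python
import math


def all_or_nothing_threshold(k: int, p: int, sigma2: float) -> float:
    """Evaluate n^* = 2k log p / log(2k / sigma^2 + 1) from the abstract.

    k is the sparsity, p the number of features, sigma2 the noise variance.
    The function and argument names are illustrative, not the authors'.
    """
    if k <= 0 or p <= 1 or sigma2 <= 0:
        raise ValueError("need k >= 1, p >= 2 and sigma2 > 0")
    return 2 * k * math.log(p) / math.log(2 * k / sigma2 + 1)


if __name__ == "__main__":
    # Illustrative parameter values only; they are not from the paper.
    print(all_or_nothing_threshold(k=10, p=10_000, sigma2=1.0))
```

For fixed k and p, a larger noise variance shrinks the denominator, so the computed threshold grows.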

Submitted 25 September, 2019; v1 submitted 16 January, 2017; originally announced January 2017.

Comments: 36 pages, 5 figures

arXiv:1610.03522 [pdf, ps, other]

Supermarket Queueing System in the Heavy Traffic Regime. Short Queue Dynamics

Authors: Patrick Eschenfeldt, David Gamarnik

Abstract: We consider a queueing system with $n$ parallel queues operating according to the so-called "supermarket model" in which arriving customers join the shortest of $d$ randomly selected queues. Assuming rate $nλ_{n}$ Poisson arrivals and rate $1$ exponentially distributed service times, we consider this model in the heavy traffic regime, described by $λ_{n}\uparrow 1$ as $n\to\infty$. We give a simple expectation argument establishing that the majority of queues have steady state length at least $\log_d(1-λ_{n})^{-1} - O(1)$ with probability approaching one as $n\rightarrow\infty$, implying the same for the steady state delay of a typical customer. Our main result concerns the detailed behavior of queues with length smaller than $\log_d(1-λ_{n})^{-1}-O(1)$. Assuming $λ_{n}$ converges to $1$ at rate at most $\sqrt{n}$, we show that the dynamics of such queues does not follow a diffusion process, as is typical for queueing systems in heavy traffic, but is described instead by a deterministic infinite system of linear differential equations, after an appropriate rescaling. The unique fixed point solution of this system is shown explicitly to be of the form $π_{1}(d^{i}-1)/(d-1), i\ge 1$, which we conjecture describes the steady state behavior of the queue lengths after the same rescaling. Our result is obtained by a combination of several technical ideas including establishing the existence and uniqueness of an associated infinite dimensional system of non-linear integral equations and adopting an appropriate stopped process as an intermediate step.
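The fixed-point form $π_{1}(d^{i}-1)/(d-1)$ stated in the abstract above is easy to tabulate. The sketch below is a hedged illustration in Python: `pi1` stands for the free constant $π_{1}$, and the function name `fixed_point_profile` is hypothetical. It also checks the recurrence $π_{i+1} = dπ_{i} + π_{1}$, which follows from the closed form by elementary algebra and is not a claim taken from the paper.

```python
from fractions import Fraction


def fixed_point_profile(pi1, d, num_terms):
    """Return [pi_1, ..., pi_num_terms] with pi_i = pi1 * (d**i - 1) / (d - 1).

    d is the (integer) number of sampled queues, d >= 2; pi1 is the free
    constant. Exact rational arithmetic avoids rounding for large i.
    """
    if d < 2 or num_terms < 1:
        raise ValueError("need d >= 2 and num_terms >= 1")
    pi1 = Fraction(pi1)
    return [pi1 * Fraction(d**i - 1, d - 1) for i in range(1, num_terms + 1)]


if __name__ == "__main__":
    profile = fixed_point_profile(pi1=1, d=2, num_terms=5)
    print([int(x) for x in profile])  # prints [1, 3, 7, 15, 31]
    # Elementary recurrence implied by the closed form.
    assert all(b == 2 * a + profile[0] for a, b in zip(profile, profile[1:]))
```

The profile grows geometrically in $i$ with ratio approaching $d$.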

Submitted 17 January, 2017; v1 submitted 11 October, 2016; originally announced October 2016.

Comments: 39 pages, 1 figure

arXiv:1603.06002 [pdf, ps, other]

A Message Passing Algorithm for the Problem of Path Packing in Graphs

Authors: Patrick Eschenfeldt, David Gamarnik

Abstract: We consider the problem of packing node-disjoint directed paths in a directed graph. We consider a variant of this problem where each path starts within a fixed subset of root nodes, subject to a given bound on the length of paths. This problem is motivated by the so-called kidney exchange problem, but has other potential applications and is interesting in its own right. We propose a new algorithm for this problem based on the message passing/belief propagation technique. A priori this problem does not have an associated graphical model, so in order to apply a belief propagation algorithm we provide a novel representation of the problem as a graphical model. Standard belief propagation on this model has poor scaling behavior, so we provide an efficient implementation that significantly decreases the complexity. We provide numerical results comparing the performance of our algorithm on both artificially created graphs and real world networks to several alternative algorithms, including algorithms based on integer programming (IP) techniques. These comparisons show that our algorithm scales better to large instances than IP-based algorithms and often finds better solutions than a simple algorithm that greedily selects the longest path from each root node. In some cases it also finds better solutions than the ones found by IP-based algorithms even when the latter are allowed to run significantly longer than our algorithm.

Submitted 18 March, 2016; originally announced March 2016.

Comments: 34 pages

arXiv:1602.08529 [pdf, other]

Finding a Large Submatrix of a Gaussian Random Matrix

Authors: David Gamarnik, Quan Li

Abstract: We consider the problem of finding a $k\times k$ submatrix of an $n\times n$ matrix with i.i.d. standard Gaussian entries, which has a large average entry. It was shown earlier by Bhamidi et al. that the largest average value of such a matrix is $2\sqrt{\log n/k}$ with high probability. In the same paper an evidence was provided that a natural greedy algorithm called Largest Average Submatrix (… ▽ More We consider the problem of finding a $k\times k$ submatrix of an $n\times n$ matrix with i.i.d. standard Gaussian entries, which has a large average entry. It was shown earlier by Bhamidi et al. that the largest average value of such a matrix is $2\sqrt{\log n/k}$ with high probability. In the same paper an evidence was provided that a natural greedy algorithm called Largest Average Submatrix ($\LAS$) should produce a matrix with average entry approximately $\sqrt{2}$ smaller. In this paper we show that the matrix produced by the $\LAS$ algorithm is indeed $\sqrt{2\log n/k}$ w.h.p. Then by drawing an analogy with the problem of finding cliques in random graphs, we propose a simple greedy algorithm which produces a $k\times k$ matrix with asymptotically the same average value. Since the greedy algorithm is the best known algorithm for finding cliques in random graphs, it is tempting to believe that beating the factor $\sqrt{2}$ performance gap suffered by both algorithms might be very challenging. Surprisingly, we show the existence of a very simple algorithm which produces a matrix with average value $(4/3)\sqrt{2\log n/k}$. To get an insight into the algorithmic hardness of this problem, and motivated by methods originating in the theory of spin glasses, we conduct the so-called expected overlap analysis of matrices with average value asymptotically $α\sqrt{2\log n/k}$. The overlap corresponds to the number of common rows and common columns for pairs of matrices achieving this value. 
We discover numerically an intriguing phase transition at $α^*\approx 1.3608..$: when $α<α^*$ the space of overlaps is a continuous subset of $[0,1]^2$, whereas $α=α^*$ marks the onset of discontinuity, and the model exhibits the Overlap Gap Property when $α>α^*$. We conjecture that $α>α^*$ marks the onset of the algorithmic hardness.

Submitted 26 February, 2016; originally announced February 2016.

Comments: 38 pages 6 figures

arXiv:1602.02164 [pdf, other]

doi 10.1109/LSP.2016.2576979

A Note on Alternating Minimization Algorithm for the Matrix Completion Problem

Authors: David Gamarnik, Sidhant Misra

Abstract: We consider the problem of reconstructing a low rank matrix from a subset of its entries and analyze two variants of the so-called Alternating Minimization algorithm, which has been proposed in the past. We establish that when the underlying matrix has rank $r=1$, has positive bounded entries, and the graph $\mathcal{G}$ underlying the revealed entries has bounded degree and diameter which is at most logarithmic in the size of the matrix, both algorithms succeed in reconstructing the matrix approximately in polynomial time starting from an arbitrary initialization. We further provide simulation results which suggest that the second algorithm, which is based on message-passing-type updates, performs significantly better.

Submitted 5 February, 2016; originally announced February 2016.

Comments: 8 pages, 2 figures

arXiv:1502.00999 [pdf, ps, other]

Join the Shortest Queue with Many Servers. The Heavy Traffic Asymptotics

Authors: Patrick Eschenfeldt, David Gamarnik

Abstract: We consider queueing systems with n parallel queues under a Join the Shortest Queue (JSQ) policy in the Halfin-Whitt heavy traffic regime. We use the martingale method to prove that a scaled process counting the number of idle servers and queues of length exactly 2 weakly converges to a two-dimensional reflected Ornstein-Uhlenbeck process, while processes counting longer queues converge to a deterministic system decaying to zero in constant time. This limiting system is comparable to that of the traditional Halfin-Whitt model, but there are key differences in the queueing behavior of the JSQ model. In particular, only a vanishing fraction of customers will have to wait, but those who do will incur a constant order waiting time.

Submitted 21 September, 2015; v1 submitted 3 February, 2015; originally announced February 2015.

Comments: 21 pages, 2 figures

arXiv:1412.1443 [pdf, ps, other]

Structure learning of antiferromagnetic Ising models

Authors: Guy Bresler, David Gamarnik, Devavrat Shah

Abstract: In this paper we investigate the computational complexity of learning the graph structure underlying a discrete undirected graphical model from i.i.d. samples. We first observe that the notoriously difficult problem of learning parities with noise can be captured as a special case of learning graphical models. This leads to an unconditional computational lower bound of $Ω(p^{d/2})$ for learning general graphical models on $p$ nodes of maximum degree $d$, for the class of so-called statistical algorithms recently introduced by Feldman et al. (2013). The lower bound suggests that the $O(p^d)$ runtime required to exhaustively search over neighborhoods cannot be significantly improved without restricting the class of models. Aside from structural assumptions on the graph such as it being a tree, hypertree, tree-like, etc., many recent papers on structure learning assume that the model has the correlation decay property. Indeed, focusing on ferromagnetic Ising models, Bento and Montanari (2009) showed that all known low-complexity algorithms fail to learn simple graphs when the interaction strength exceeds a number related to the correlation decay threshold. Our second set of results gives a class of repelling (antiferromagnetic) models that have the opposite behavior: very strong interaction allows efficient learning in time $O(p^2)$. We provide an algorithm whose performance interpolates between $O(p^2)$ and $O(p^{d+2})$ depending on the strength of the repulsion.

Submitted 3 December, 2014; originally announced December 2014.

Comments: 15 pages. NIPS 2014

arXiv:1411.1698 [pdf, ps, other]

On the Max-Cut of Sparse Random Graphs

Authors: David Gamarnik, Quan Li

Abstract: We consider the problem of estimating the size of a maximum cut (Max-Cut problem) in a random Erdős-Rényi graph on $n$ nodes and $\lfloor cn \rfloor$ edges. It is shown in Coppersmith et al. (2004) that the size of the maximum cut in this graph normalized by the number of nodes belongs to the asymptotic region $[c/2+0.37613\sqrt{c},c/2+0.58870\sqrt{c}]$ with high probability (w.h.p.) as $n$ increases, for all sufficiently large $c$. In this paper we improve both upper and lower bounds by introducing a novel bounding technique. Specifically, we establish that the size of the maximum cut normalized by the number of nodes belongs to the interval $[c/2+0.47523\sqrt{c},c/2+0.55909\sqrt{c}]$ w.h.p. as $n$ increases, for all sufficiently large $c$. Instead of considering the expected number of cuts achieving a particular value, as is done in the application of the first moment method, we observe that every maximum size cut satisfies a certain local optimality property, and we compute the expected number of cuts with a given value satisfying this local optimality property. Estimating this expectation amounts to solving a rather involved two-dimensional large deviations problem. We solve this underlying large deviations problem asymptotically as $c$ increases and use it to obtain an improved upper bound on the Max-Cut value. The lower bound is obtained by application of the second moment method, coupled with the same local optimality constraint, and is shown to work up to the stated lower bound value $c/2+0.47523\sqrt{c}$.
It is worth noting that both bounds are stronger than the ones obtained by standard first and second moment methods. Finally, we also obtain an improved lower bound of $1.36000n$ on the Max-Cut for the random cubic graph or any cubic graph with large girth, improving the previous best bound of $1.33773n$.

Submitted 12 February, 2017; v1 submitted 6 November, 2014; originally announced November 2014.

Comments: To appear in Random Structures & Algorithms

Showing 1–50 of 84 results for author: Gamarnik, D