-
Bounds on the ground state energy of quantum $p$-spin Hamiltonians
Authors:
Eric R. Anschuetz,
David Gamarnik,
Bobak T. Kiani
Abstract:
We consider the problem of estimating the ground state energy of quantum $p$-local spin glass random Hamiltonians, the quantum analogues of widely studied classical spin glass models. Our main result shows that the maximum energy achievable by product states has a well-defined limit (for even $p$) as $n\to\infty$ and is $E_{\text{product}}^\ast=\sqrt{2 \log p}$ in the limit of large $p$. This value is interpreted as the maximal energy of a much simpler so-called Random Energy Model, widely studied in the setting of classical spin glasses. The proof of the limit existing follows from an extension of Fekete's Lemma after we demonstrate near super-additivity of the (normalized) quenched free energy. The proof of the value follows from a second moment method on the number of states achieving a given energy when restricting to an $ε$-net of product states.
Furthermore, we relate the maximal energy achieved over all states to a $p$-dependent constant $γ\left(p\right)$, which is defined by the degree of violation of a certain asymptotic independence ansatz over graph matchings. We show that the maximal energy achieved by all states $E^\ast\left(p\right)$ in the limit of large $n$ is at most $\sqrt{γ\left(p\right)}E_{\text{product}}^\ast$. We also prove using Lindeberg's interpolation method that the limiting $E^\ast\left(p\right)$ is robust with respect to the choice of the randomness and, for instance, also applies to the case of sparse random Hamiltonians. This robustness in the randomness extends to a wide range of random Hamiltonian models including SYK and random quantum max-cut.
Submitted 17 April, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
Integrating High-Dimensional Functions Deterministically
Authors:
David Gamarnik,
Devin Smedira
Abstract:
We design a quasi-polynomial time deterministic approximation algorithm for computing the integral of a multi-dimensional separable function, supported by some underlying hyper-graph structure, appropriately defined. Equivalently, our integral is the partition function of a graphical model with continuous potentials. While randomized algorithms for high-dimensional integration are widely known, deterministic counterparts generally do not exist. We use the correlation decay method applied to the Riemann sum of the function to produce our algorithm. For our method to work, we require that the domain is bounded and the hyper-edge potentials are positive and bounded on the domain. We further assume that the upper and lower bounds on the potentials are separated by a multiplicative factor of $1 + O(1/Δ^2)$, where $Δ$ is the maximum degree of the graph. When $Δ= 3$, our method works provided the upper and lower bounds are separated by a factor of at most $1.0479$. To the best of our knowledge, our algorithm is the first deterministic algorithm for high-dimensional integration of a continuous function, apart from the case of trivial product form distributions.
Submitted 13 February, 2024;
originally announced February 2024.
-
Computing the Volume of a Restricted Independent Set Polytope Deterministically
Authors:
David Gamarnik,
Devin Smedira
Abstract:
We construct a quasi-polynomial time deterministic approximation algorithm for computing the volume of an independent set polytope with restrictions. Randomized polynomial time approximation algorithms for computing the volume of a convex body have been known now for several decades, but the corresponding deterministic counterparts are not available, and our algorithm is the first of this kind. The class of polytopes for which our algorithm applies arises as linear programming relaxation of the independent set problem with the additional restriction that each variable takes value in the interval $[0,1-α]$ for some $α<1/2$. (We note that the $α\ge 1/2$ case is trivial).
We use the correlation decay method for this problem applied to its appropriate and natural discretization. The method works provided $α> 1/2-O(1/Δ^2)$, where $Δ$ is the maximum degree of the graph. When $Δ=3$ (the sparsest non-trivial case), our method works provided $0.488<α<0.5$. Interestingly, the interpolation method, which is based on analyzing complex roots of the associated partition functions, fails even in the trivial case when the underlying graph is a singleton.
Submitted 6 December, 2023;
originally announced December 2023.
-
Sharp Thresholds Imply Circuit Lower Bounds: from random 2-SAT to Planted Clique
Authors:
David Gamarnik,
Elchanan Mossel,
Ilias Zadik
Abstract:
We show that sharp thresholds for Boolean functions directly imply average-case circuit lower bounds. More formally we show that any Boolean function exhibiting a sharp enough threshold at \emph{arbitrary} critical density cannot be computed by Boolean circuits of bounded depth and polynomial size.
Our general result implies new average-case bounded depth circuit lower bounds in a variety of settings.
(a) ($k$-cliques) For $k=Θ(n)$, we prove that any circuit of depth $d$ deciding the presence of a size $k$ clique in a random graph requires exponential-in-$n^{Θ(1/d)}$ size. To the best of our knowledge, this is the first average-case exponential size lower bound for bounded depth (not necessarily monotone) circuits solving the fundamental $k$-clique problem (for any $k=k_n$).
(b) (random 2-SAT) We prove that any circuit of depth $d$ deciding the satisfiability of a random 2-SAT formula requires exponential-in-$n^{Θ(1/d)}$ size. To the best of our knowledge, this is the first bounded depth circuit lower bound for random $k$-SAT for any value of $k \geq 2$. Our results also provide the first rigorous lower bound in agreement with a conjectured, but debated, ``computational hardness'' of random $k$-SAT around its satisfiability threshold.
(c) (Statistical estimation -- planted $k$-clique) Over the recent years, multiple statistical estimation problems have also been proven to exhibit a ``statistical'' sharp threshold, called the All-or-Nothing (AoN) phenomenon. We show that AoN also implies circuit lower bounds for statistical problems. As a simple corollary of that, we prove that any circuit of depth $d$ that solves to information-theoretic optimality a ``dense'' variant of the celebrated planted $k$-clique problem requires exponential-in-$n^{Θ(1/d)}$ size.
Submitted 30 November, 2023; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Product states optimize quantum $p$-spin models for large $p$
Authors:
Eric R. Anschuetz,
David Gamarnik,
Bobak T. Kiani
Abstract:
We consider the problem of estimating the maximal energy of quantum $p$-local spin glass random Hamiltonians, the quantum analogues of widely studied classical spin glass models. Denoting by $E^*(p)$ the (appropriately normalized) maximal energy in the limit of a large number of qubits $n$, we show that $E^*(p)$ approaches $\sqrt{2\log 6}$ as $p$ increases. This value is interpreted as the maximal energy of a much simpler so-called Random Energy Model, widely studied in the setting of classical spin glasses.
Our most notable and (arguably) surprising result proves the existence of near-maximal energy states which are product states, and thus not entangled. Specifically, we prove that with high probability as $n\to\infty$, for any $E<E^*(p)$ there exists a product state with energy $\geq E$ at sufficiently large constant $p$. Even more surprisingly, this remains true even when restricting to tensor products of Pauli eigenstates. Our approximations go beyond what is known from monogamy-of-entanglement style arguments -- the best of which, in this normalization, achieve approximation error growing with $n$. Our results not only challenge prevailing beliefs in physics that extremely low-temperature states of random local Hamiltonians should exhibit non-negligible entanglement, but they also imply that classical algorithms can be just as effective as quantum algorithms in optimizing Hamiltonians with large locality -- though performing such optimization is still likely a hard problem.
Our results are robust with respect to the choice of the randomness (disorder) and apply to the case of sparse random Hamiltonian using Lindeberg's interpolation method. The proof of the main result is obtained by estimating the expected trace of the associated partition function, and then matching its asymptotics with the extremal energy of product states using the second moment method.
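The Random Energy Model value quoted in this abstract can be motivated by a standard first-moment calculation. The sketch below is an illustration under stated assumptions, not the authors' proof: it takes $M^n$ independent Gaussian energies of variance $n$ (the symbols $M$, $X_i$, and $N_E$ are introduced here for illustration), and reading $M=6$ as the number of single-qubit Pauli eigenstates is an assumption about the normalization.

```latex
% Assumption: X_1, ..., X_{M^n} are i.i.d. N(0, n); N_E counts the energies at level E n or above.
\[
  \mathbb{E}[N_E]
  = M^n \, \Pr\!\left[ Z \ge E\sqrt{n} \right]
  = \exp\!\left( n \left( \log M - \tfrac{E^2}{2} \right) + o(n) \right),
  \qquad Z \sim \mathcal{N}(0,1), \quad E > 0 .
\]
% The expectation vanishes as n grows whenever E > \sqrt{2 \log M}, so by Markov's inequality
\[
  \frac{1}{n} \max_i X_i \le \sqrt{2 \log M} + o(1)
  \quad \text{w.h.p.}, \qquad \text{and } M = 6 \text{ gives } \sqrt{2 \log 6}.
\]
```

This gives only the upper bound; a matching lower bound needs a second moment argument, which is the method the abstract names.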
Submitted 5 April, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Shattering in the Ising Pure $p$-Spin Model
Authors:
David Gamarnik,
Aukosh Jagannath,
Eren C. Kızıldağ
Abstract:
We study the Ising pure $p$-spin model for large $p$. We investigate the landscape of the Hamiltonian of this model. We show that for any $γ>0$ and any large enough $p$, the model exhibits an intricate geometrical property known as the multi Overlap Gap Property above the energy value $γ\sqrt{2\ln 2}$. We then show that for any inverse temperature $\sqrt{\ln 2}<β<\sqrt{2\ln 2}$ and any large $p$, the model exhibits shattering: w.h.p. as $n\to\infty$, there exist exponentially many well-separated clusters such that (a) each cluster has exponentially small Gibbs mass, and (b) the clusters collectively contain all but a vanishing fraction of Gibbs mass. Moreover, these clusters consist of configurations with energy near $β$. The range of temperatures for which shattering occurs is within the replica symmetric region. To the best of our knowledge, this is the first shattering result regarding the Ising $p$-spin models. Our proof is elementary, and in particular based on simple applications of the first and the second moment methods.
Submitted 14 July, 2023;
originally announced July 2023.
-
Barriers for the performance of graph neural networks (GNN) in discrete random structures. A comment on~\cite{schuetz2022combinatorial},\cite{angelini2023modern},\cite{schuetz2023reply}
Authors:
David Gamarnik
Abstract:
Recently graph neural network (GNN) based algorithms were proposed to solve a variety of combinatorial optimization problems, including Maximum Cut problem, Maximum Independent Set problem and similar other problems~\cite{schuetz2022combinatorial},\cite{schuetz2022graph}.
The publication~\cite{schuetz2022combinatorial} stirred a debate about whether the GNN based method was adequately benchmarked against the best prior methods. In particular, critical commentaries~\cite{angelini2023modern} and~\cite{boettcher2023inability} point out that a simple greedy algorithm performs better than GNN in the setting of random graphs, and in fact stronger algorithmic performance can be reached with more sophisticated methods. A response from the authors~\cite{schuetz2023reply} pointed out that GNN performance can be improved further by tuning the parameters better.
We do not intend to discuss the merits of arguments and counter-arguments in~\cite{schuetz2022combinatorial},\cite{angelini2023modern},\cite{boettcher2023inability},\cite{schuetz2023reply}. Rather in this note we establish a fundamental limitation for running GNN on random graphs considered in these references, for a broad range of choices of GNN architecture. These limitations arise from the presence of the Overlap Gap Property (OGP) phase transition, which is a barrier for many algorithms, both classical and quantum. As we demonstrate in this paper, it is also a barrier to GNN due to its local structure. We note that at the same time known algorithms ranging from simple greedy algorithms to more sophisticated algorithms based on message passing, provide best results for these problems \emph{up to} the OGP phase transition. This leaves very little space for GNN to outperform the known algorithms, and based on this we side with the conclusions made in~\cite{angelini2023modern} and~\cite{boettcher2023inability}.
Submitted 4 June, 2023;
originally announced June 2023.
-
Maximally-stable Local Optima in Random Graphs and Spin Glasses: Phase Transitions and Universality
Authors:
Yatin Dandi,
David Gamarnik,
Lenka Zdeborová
Abstract:
We provide a unified analysis of stable local optima of Ising spins with Hamiltonians having pair-wise interactions and partitions in random weighted graphs where a large number of vertices possess sufficient single spin-flip stability. For graphs, we consider partitions on random graphs where almost all vertices possess sufficient appropriately defined friendliness/unfriendliness. For spin glasses, we characterize approximate local optima having almost all local magnetic fields of sufficiently large magnitude. For $n$ nodes, as $n \rightarrow \infty$, we prove that the maximum number of vertices possessing such stability undergoes a phase transition from $n-o(n)$ to $n-Θ(n)$ around a certain value of the stability, proving a conjecture from Behrens et al. [2022]. Through a universality argument, we further prove that such a phase transition occurs around the same value of the stability for different choices of interactions, specifically ferromagnetic and anti-ferromagnetic, for sparse graphs, as $n \rightarrow \infty$ in the large degree limit. Furthermore, we show that after appropriate re-scaling, the same value of the threshold characterises such a phase transition for the case of fully connected spin-glass models. Our results also allow the characterization of possible energy values of maximally stable approximate local optima. Our work extends and proves seminal results in statistical physics related to metastable states, in particular, the work of Bray and Moore [1981].
Submitted 5 May, 2023;
originally announced May 2023.
-
Combinatorial NLTS From the Overlap Gap Property
Authors:
Eric R. Anschuetz,
David Gamarnik,
Bobak Kiani
Abstract:
In an important recent development, Anshu, Breuckmann, and Nirkhe [ABN22] resolved positively the so-called No Low-Energy Trivial State (NLTS) conjecture by Freedman and Hastings. The conjecture postulated the existence of linear-size local Hamiltonians on n qubit systems for which no near-ground state can be prepared by a shallow (sublogarithmic depth) circuit. The construction in [ABN22] is based on recently developed good quantum codes. Earlier results in this direction included the constructions of the so-called Combinatorial NLTS -- a weaker version of NLTS -- where a state is defined to have low energy if it violates at most a vanishing fraction of the Hamiltonian terms [AB22]. These constructions were also based on codes.
In this paper we provide a "non-code" construction of a class of Hamiltonians satisfying the Combinatorial NLTS. The construction is inspired by one in [AB22], but our proof uses the complex solution space geometry of random K-SAT instead of properties of codes. Specifically, it is known that above a certain clause-to-variables density the set of satisfying assignments of random K-SAT exhibits an overlap gap property, which implies that it can be partitioned into exponentially many clusters each constituting at most an exponentially small fraction of the total set of satisfying solutions. We establish a certain robust version of this clustering property for the space of near-satisfying assignments and show that for our constructed Hamiltonians every combinatorial near-ground state induces a near-uniform distribution supported by this set. Standard arguments then are used to show that such distributions cannot be prepared by quantum circuits with depth o(log n). Since the clustering property is exhibited by many random structures, including proper coloring and maximum cut, we anticipate that our approach is extendable to these models as well.
Submitted 11 March, 2024; v1 submitted 2 April, 2023;
originally announced April 2023.
-
Cliques, Chromatic Number, and Independent Sets in the Semi-random Process
Authors:
David Gamarnik,
Mihyun Kang,
Pawel Pralat
Abstract:
The semi-random graph process is a single player game in which the player is initially presented an empty graph on $n$ vertices. In each round, a vertex $u$ is presented to the player independently and uniformly at random. The player then adaptively selects a vertex $v$, and adds the edge $uv$ to the graph. For a fixed monotone graph property, the objective of the player is to force the graph to satisfy this property with high probability in as few rounds as possible. In this paper, we investigate the following three properties: containing a complete graph of order $k$, having the chromatic number at least $k$, and not having an independent set of size at least $k$.
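The rounds of the semi-random process described above can be sketched as a short simulation. This is a minimal illustration only: the function names and the naive strategy (complete a clique on vertices 0..k-1, otherwise waste the move) are hypothetical and are not the strategies analyzed in the paper.

```python
import random


def run_semi_random_process(n, rounds, choose_v, seed=None):
    """Simulate the process: each round a uniform random vertex u is
    presented and the player's strategy choose_v(u, adj) picks v."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}  # empty graph on n vertices
    for _ in range(rounds):
        u = rng.randrange(n)            # vertex presented to the player
        v = choose_v(u, adj)            # player's adaptive choice
        if v != u:                      # a self-loop adds nothing; skip it
            adj[u].add(v)
            adj[v].add(u)
    return adj


def make_clique_strategy(k):
    """Hypothetical naive strategy: fill in missing edges among
    vertices 0..k-1; any other round is a wasted move toward vertex 0."""
    def choose_v(u, adj):
        if u < k:
            for v in range(k):
                if v != u and v not in adj[u]:
                    return v
        return 0
    return choose_v


def has_clique_on_first(adj, k):
    """Check whether vertices 0..k-1 are pairwise adjacent."""
    return all(b in adj[a] for a in range(k) for b in range(a + 1, k))


adj = run_semi_random_process(n=20, rounds=2000,
                              choose_v=make_clique_strategy(3), seed=0)
print(has_clique_on_first(adj, 3))
```

Passing the strategy in as a function keeps the random presentation of $u$ separate from the player's adaptive choice of $v$, so other strategies can be swapped in without touching the simulation loop.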
Submitted 13 May, 2024; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Geometric Barriers for Stable and Online Algorithms for Discrepancy Minimization
Authors:
David Gamarnik,
Eren C. Kızıldağ,
Will Perkins,
Changji Xu
Abstract:
For many computational problems involving randomness, intricate geometric features of the solution space have been used to rigorously rule out powerful classes of algorithms. This is often accomplished through the lens of the multi Overlap Gap Property ($m$-OGP), a rigorous barrier against algorithms exhibiting input stability. In this paper, we focus on the algorithmic tractability of two models: (i) discrepancy minimization, and (ii) the symmetric binary perceptron (\texttt{SBP}), a random constraint satisfaction problem as well as a toy model of a single-layer neural network.
Our first focus is on the limits of online algorithms. By establishing and leveraging a novel geometrical barrier, we obtain sharp hardness guarantees against online algorithms for both the \texttt{SBP} and discrepancy minimization. Our results match the best known algorithmic guarantees, up to constant factors. Our second focus is on efficiently finding a constant discrepancy solution, given a random matrix $\mathcal{M}\in\mathbb{R}^{M\times n}$. In a smooth setting, where the entries of $\mathcal{M}$ are i.i.d. standard normal, we establish the presence of $m$-OGP for $n=Θ(M\log M)$. Consequently, we rule out the class of stable algorithms at this value. These results give the first rigorous evidence towards a conjecture of Altschuler and Niles-Weed~\cite[Conjecture~1]{altschuler2021discrepancy}.
Our methods use the intricate geometry of the solution space to prove tight hardness results for online algorithms. The barrier we establish is a novel variant of the $m$-OGP. Furthermore, it regards $m$-tuples of solutions with respect to correlated instances, with growing values of $m$, $m=ω(1)$. Importantly, our results rule out online algorithms succeeding even with an exponentially small probability.
Submitted 13 February, 2023;
originally announced February 2023.
-
Densest Subgraphs of a Dense Erdös-Rényi Graph. Asymptotics, Landscape and Universality
Authors:
Houssam El Cheairi,
David Gamarnik
Abstract:
We consider the problem of estimating the edge density of densest $K$-node subgraphs of an Erdös-Rényi graph $\mathbb{G}(n,1/2)$. The problem is well-understood in the regime $K=Θ(\log n)$ and in the regime $K=Θ(n)$. In the former case it can be reduced to the problem of estimating the size of largest cliques, and its extensions. In the latter case the full answer is known up to the order $n^{3\over 2}$ using sophisticated methods from the theory of spin glasses. The intermediate case $K=n^α, α\in (0,1)$ however is not well studied and this is our focus. We establish that in this regime the density (that is the maximum number of edges supported by any $K$-node subgraph) is ${1\over 4}K^2+{1+o(1)\over 2}K^{3\over 2}\sqrt{\log (n/K)}$, w.h.p. as $n\to\infty$, and provide more refined asymptotics under the $o(\cdot)$, for various ranges of $α$. This extends earlier similar results where this asymptotics was confirmed only when $α$ is a small constant.
We extend our results to the case of ``weighted'' graphs, when the weights have either Gaussian or arbitrary sub-Gaussian distributions. The proofs are based on the second moment method combined with concentration bounds, the Borell-TIS inequality for the Gaussian case and Talagrand's inequality for the case of distributions with bounded support (including the $\mathbb{G}(n,1/2)$ case). The case of general distribution is treated using a novel symmetrized version of the Lindeberg argument, which reduces the general case to the Gaussian case. Finally, using the results above we conduct the landscape analysis of the related Hidden Clique Problem, and establish that it exhibits an overlap gap property when the size of the clique is $O(n^{2\over 3})$, confirming a hypothesis stated in a previous related work.
Submitted 7 December, 2022;
originally announced December 2022.
-
Disordered Systems Insights on Computational Hardness
Authors:
David Gamarnik,
Cristopher Moore,
Lenka Zdeborová
Abstract:
In this review article, we discuss connections between the physics of disordered systems, phase transitions in inference problems, and computational hardness. We introduce two models representing the behavior of glassy systems, the spiked tensor model and the generalized linear model. We discuss the random (non-planted) versions of these problems as prototypical optimization problems, as well as the planted versions (with a hidden solution) as prototypical problems in statistical inference and learning. Based on ideas from physics, many of these problems have transitions where they are believed to jump from easy (solvable in polynomial time) to hard (requiring exponential time). We discuss several emerging ideas in theoretical computer science and statistics that provide rigorous evidence for hardness by proving that large classes of algorithms fail in the conjectured hard regime. This includes the overlap gap property, a particular mathematization of clustering or dynamical symmetry-breaking, which can be used to show that many algorithms that are local or robust to changes in their input fail. We also discuss the sum-of-squares hierarchy, which places bounds on proofs or algorithms that use low-degree polynomials such as standard spectral methods and semidefinite relaxations, including the Sherrington-Kirkpatrick model. Throughout the manuscript, we present connections to the physics of disordered systems and associated replica symmetry breaking properties.
Submitted 18 October, 2022; v1 submitted 15 October, 2022;
originally announced October 2022.
-
Performance and limitations of the QAOA at constant levels on large sparse hypergraphs and spin glass models
Authors:
Joao Basso,
David Gamarnik,
Song Mei,
Leo Zhou
Abstract:
The Quantum Approximate Optimization Algorithm (QAOA) is a general purpose quantum algorithm designed for combinatorial optimization. We analyze its expected performance and prove concentration properties at any constant level (number of layers) on ensembles of random combinatorial optimization problems in the infinite size limit. These ensembles include mixed spin models and Max-$q$-XORSAT on sparse random hypergraphs. Our analysis can be understood via a saddle-point approximation of a sum-over-paths integral. This is made rigorous by proving a generalization of the multinomial theorem, which is a technical result of independent interest. We then show that the performance of the QAOA at constant levels for the pure $q$-spin model matches asymptotically the ones for Max-$q$-XORSAT on random sparse Erdős-Rényi hypergraphs and every large-girth regular hypergraph. Through this correspondence, we establish that the average-case value produced by the QAOA at constant levels is bounded away from optimality for pure $q$-spin models when $q\ge 4$ and is even. This limitation gives a hardness of approximation result for quantum algorithms in a new regime where the whole graph is seen.
Submitted 28 September, 2022; v1 submitted 21 April, 2022;
originally announced April 2022.
-
Algorithms and Barriers in the Symmetric Binary Perceptron Model
Authors:
David Gamarnik,
Eren C. Kızıldağ,
Will Perkins,
Changji Xu
Abstract:
The symmetric binary perceptron ($\texttt{SBP}$) exhibits a dramatic statistical-to-computational gap: the densities at which known efficient algorithms find solutions are far below the threshold for the existence of solutions. Furthermore, the $\texttt{SBP}$ exhibits a striking structural property: at all positive constraint densities almost all of its solutions are 'totally frozen' singletons separated by large Hamming distance \cite{perkins2021frozen,abbe2021proof}. This suggests that finding a solution to the $\texttt{SBP}$ may be computationally intractable. At the same time, the $\texttt{SBP}$ does admit polynomial-time search algorithms at low enough densities. A conjectural explanation for this conundrum was put forth in \cite{baldassi2020clustering}: efficient algorithms succeed in the face of freezing by finding exponentially rare clusters of large size. However, it was discovered recently that such rare large clusters exist at all subcritical densities, even at those well above the limits of known efficient algorithms \cite{abbe2021binary}. Thus the driver of the statistical-to-computational gap exhibited by this model remains a mystery.
In this paper, we conduct a different landscape analysis to explain the algorithmic tractability of this problem. We show that at high enough densities the $\texttt{SBP}$ exhibits the multi Overlap Gap Property ($m$-OGP), an intricate geometrical property known to be a rigorous barrier for large classes of algorithms. Our analysis shows that the $m$-OGP threshold (a) is well below the satisfiability threshold; and (b) matches the best known algorithmic threshold up to logarithmic factors as $m\to\infty$. We then prove that the $m$-OGP rules out the class of stable algorithms for the $\texttt{SBP}$ above this threshold. We conjecture that the $m \to \infty$ limit of the $m$-OGP threshold marks the algorithmic threshold for the problem.
Submitted 29 March, 2022;
originally announced March 2022.
-
The Overlap Gap Property: a Geometric Barrier to Optimizing over Random Structures
Authors:
David Gamarnik
Abstract:
The problem of optimizing over random structures emerges in many areas of science and engineering, ranging from statistical physics to machine learning and artificial intelligence. For many such structures, fast algorithms for finding optimal solutions are not known and are often believed not to exist. At the same time, formal hardness of these problems, in the form of, say, complexity-theoretic $NP$-hardness, is lacking.
In this introductory article a new approach for algorithmic intractability in random structures is described, which is based on the topological disconnectivity property of the set of pair-wise distances of near optimal solutions, called the Overlap Gap Property. The article demonstrates how this property a) emerges in most models known to exhibit an apparent algorithmic hardness, b) is consistent with the hardness/tractability phase transition for many models analyzed to date, and importantly c) allows one to rule out, with mathematical rigor, large classes of algorithms as potential contenders, in particular algorithms exhibiting input stability (insensitivity).
Submitted 1 August, 2021;
originally announced September 2021.
-
Circuit Lower Bounds for the p-Spin Optimization Problem
Authors:
David Gamarnik,
Aukosh Jagannath,
Alexander S. Wein
Abstract:
We consider the problem of finding a near ground state of a $p$-spin model with Rademacher couplings by means of a low-depth circuit. As a direct extension of the authors' recent work [Gamarnik, Jagannath, Wein 2020], we establish that any poly-size $n$-output circuit that produces a spin assignment with objective value within a certain constant factor of optimality, must have depth at least $\log n/(2\log\log n)$ as $n$ grows. This is stronger than the known state of the art bounds of the form $Ω(\log n/(k(n)\log\log n))$ for similar combinatorial optimization problems, where $k(n)$ depends on the optimality value. For example, for the largest clique problem $k(n)$ corresponds to the square of the size of the clique [Rossman 2010]. At the same time our results are not quite comparable since in our case the circuits are required to produce a solution itself rather than solving the associated decision problem. As in our earlier work, the approach is based on the overlap gap property (OGP) exhibited by random $p$-spin models, but the derivation of the circuit lower bound relies further on standard facts from Fourier analysis on the Boolean cube, in particular the Linial-Mansour-Nisan Theorem.
To the best of our knowledge, this is the first instance when methods from spin glass theory have ramifications for circuit complexity.
Submitted 21 January, 2022; v1 submitted 3 September, 2021;
originally announced September 2021.
-
Self-Regularity of Non-Negative Output Weights for Overparameterized Two-Layer Neural Networks
Authors:
David Gamarnik,
Eren C. Kızıldağ,
Ilias Zadik
Abstract:
We consider the problem of finding a two-layer neural network with sigmoid, rectified linear unit (ReLU), or binary step activation functions that "fits" a training data set as accurately as possible as quantified by the training error; and study the following question: \emph{does a low training error guarantee that the norm of the output layer (outer norm) itself is small?} We answer this question affirmatively for the case of non-negative output weights. Using a simple covering number argument, we establish that under quite mild distributional assumptions on the input/label pairs, any such network achieving a small training error on polynomially many data necessarily has a well-controlled outer norm. Notably, our results (a) have a polynomial (in $d$) sample complexity, (b) are independent of the number of hidden units (which can potentially be very high), (c) are oblivious to the training algorithm; and (d) require quite mild assumptions on the data (in particular the input vector $X\in\mathbb{R}^d$ need not have independent coordinates). We then leverage our bounds to establish generalization guarantees for such networks through \emph{fat-shattering dimension}, a scale-sensitive measure of the complexity class that the network architectures we investigate belong to. Notably, our generalization bounds also have good sample complexity (polynomials in $d$ with a low degree), and are in fact near-linear for some important cases of interest.
Submitted 2 March, 2021;
originally announced March 2021.
-
Algorithmic Obstructions in the Random Number Partitioning Problem
Authors:
David Gamarnik,
Eren C. Kızıldağ
Abstract:
We consider the algorithmic problem of finding a near-optimal solution for the number partitioning problem (NPP). The NPP appears in many applications, including the design of randomized controlled trials, multiprocessor scheduling, and cryptography; and is also of theoretical significance. It possesses a so-called statistical-to-computational gap: when its input $X$ has distribution $\mathcal{N}(0,I_n)$, its optimal value is $Θ(\sqrt{n}2^{-n})$ w.h.p.; whereas the best polynomial-time algorithm achieves an objective value of only $2^{-Θ(\log^2 n)}$, w.h.p.
In this paper, we initiate the study of the nature of this gap. Inspired by insights from statistical physics, we study the landscape of NPP and establish the presence of the Overlap Gap Property (OGP), an intricate geometric property which is known to be rigorous evidence of algorithmic hardness for large classes of algorithms. By leveraging the OGP, we establish that (a) any sufficiently stable algorithm, appropriately defined, fails to find a near-optimal solution with energy below $2^{-ω(n \log^{-1/5} n)}$; and (b) a very natural MCMC dynamics fails to find near-optimal solutions. Our simulations suggest that the state of the art algorithm achieving $2^{-Θ(\log^2 n)}$ is indeed stable, but formally verifying this is left as an open problem.
OGP regards the overlap structure of $m$-tuples of solutions achieving a certain objective value. When $m$ is constant we prove the presence of OGP in the regime $2^{-Θ(n)}$, and the absence of it in the regime $2^{-o(n)}$. Interestingly, though, by considering overlaps with growing values of $m$ we prove the presence of the OGP up to the level $2^{-ω(\sqrt{n\log n})}$. Our proof of the failure of stable algorithms at values $2^{-ω(n \log^{-1/5} n)}$ employs methods from Ramsey theory in extremal combinatorics, and is of independent interest.
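To make the objects in this abstract concrete, the following is a minimal Python sketch of the NPP objective and of the overlap between two sign assignments. It is an illustration only, not the authors' code: the function names `discrepancy`, `brute_force_npp`, and `overlap` are hypothetical, the objective is taken to be $|\langle σ, X\rangle|$ over $σ\in\{-1,1\}^n$, the overlap is taken to be the normalized absolute inner product, and the exhaustive search is only feasible for small $n$.

```python
import itertools
import random


def discrepancy(x, signs):
    """NPP objective (assumed form): |sum_i signs[i] * x[i]|."""
    return abs(sum(s * v for s, v in zip(signs, x)))


def brute_force_npp(x):
    """Exhaustively search all sign assignments; exponential in len(x)."""
    n = len(x)
    best_signs, best_value = None, float("inf")
    # signs and -signs give the same discrepancy, so fix the first sign to +1.
    for rest in itertools.product((1, -1), repeat=n - 1):
        signs = (1,) + rest
        value = discrepancy(x, signs)
        if value < best_value:
            best_signs, best_value = signs, value
    return best_signs, best_value


def overlap(a, b):
    """Normalized absolute inner product of two sign vectors, in [0, 1]."""
    return abs(sum(p * q for p, q in zip(a, b))) / len(a)


# Small random instance with i.i.d. standard normal entries.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(12)]
signs, value = brute_force_npp(x)
print("optimal discrepancy:", value)
print("overlap with all-ones assignment:", overlap(signs, (1,) * len(x)))
```

An overlap gap statement would then concern which values `overlap` can take on pairs (or $m$-tuples) of assignments whose `discrepancy` falls below a given level.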
Submitted 1 March, 2021;
originally announced March 2021.
-
Correlation Decay and the Absence of Zeros Property of Partition Functions
Authors:
David Gamarnik
Abstract:
Absence of (complex) zeros property is at the heart of the interpolation method developed by Barvinok \cite{barvinok2017combinatorics} for designing deterministic approximation algorithms for various graph counting and computing partition functions problems. Earlier methods for solving the same problem include the one based on the correlation decay property. Remarkably, the classes of graphs for which the two methods apply sometimes coincide or nearly coincide. In this paper we show that this is more than just a coincidence. We establish that if the interpolation method is valid for a family of graphs satisfying the self-reducibility property, then this family exhibits a form of correlation decay property which is asymptotic Strong Spatial Mixing (SSM) at distances $ω(\log n)$, where $n$ is the number of nodes of the graph. This applies in particular to amenable graphs, such as graphs which are finite subsets of lattices.
Our proof is based on a certain graph polynomial representation of the associated partition function. This representation is at the heart of the design of the polynomial time algorithms underlying the interpolation method itself. We conjecture that our result holds for all, and not just amenable graphs.
Submitted 1 December, 2020; v1 submitted 10 November, 2020;
originally announced November 2020.
-
Stability, memory, and messaging tradeoffs in heterogeneous service systems
Authors:
David Gamarnik,
John N. Tsitsiklis,
Martin Zubeldia
Abstract:
We consider a heterogeneous distributed service system, consisting of $n$ servers with unknown and possibly different processing rates. Jobs with unit mean and independent processing times arrive as a renewal process of rate $λn$, with $0<λ<1$, to the system. Incoming jobs are immediately dispatched to one of several queues associated with the $n$ servers. We assume that the dispatching decisions are made by a central dispatcher endowed with a finite memory, and with the ability to exchange messages with the servers.
We study the fundamental resource requirements (memory bits and message exchange rate) in order for a dispatching policy to be {\bf maximally stable}, i.e., stable whenever the processing rates are such that the arrival rate is less than the total available processing rate. First, for the case of Poisson arrivals and exponential service times, we present a policy that is maximally stable while using a positive (but arbitrarily small) message rate, and $\log_2(n)$ bits of memory. Second, we show that within a certain broad class of policies, a dispatching policy that exchanges $o\big(n^2\big)$ messages per unit of time, and with $o(\log(n))$ bits of memory, cannot be maximally stable. Thus, as long as the message rate is not too excessive, a logarithmic memory is necessary and sufficient for maximal stability.
Submitted 10 July, 2020;
originally announced July 2020.
-
Estimation of Monotone Multi-Index Models
Authors:
David Gamarnik,
Julia Gaudio
Abstract:
In a multi-index model with $k$ index vectors, the input variables are transformed by taking inner products with the index vectors. A transfer function $f: \mathbb{R}^k \to \mathbb{R}$ is applied to these inner products to generate the output. Thus, multi-index models are a generalization of linear models. In this paper, we consider monotone multi-index models. Namely, the transfer function is assumed to be coordinate-wise monotone. The monotone multi-index model therefore generalizes both linear regression and isotonic regression, which is the estimation of a coordinate-wise monotone function. We consider the case of nonnegative index vectors. We provide an algorithm based on integer programming for the estimation of monotone multi-index models, and provide guarantees on the $L_2$ loss of the estimated function relative to the ground truth.
△ Less
Submitted 4 June, 2020;
originally announced June 2020.
-
The Quantum Approximate Optimization Algorithm Needs to See the Whole Graph: Worst Case Examples
Authors:
Edward Farhi,
David Gamarnik,
Sam Gutmann
Abstract:
The Quantum Approximate Optimization Algorithm can be applied to search problems on graphs with a cost function that is a sum of terms corresponding to the edges. When conjugating an edge term, the QAOA unitary at depth p produces an operator that depends only on the subgraph consisting of edges that are at most p away from the edge in question. On random d-regular graphs, with d fixed and with p a small constant times log n, these neighborhoods are almost all trees and so the performance of the QAOA is determined only by how it acts on an edge in the middle of a tree. Both bipartite random d-regular graphs and general random d-regular graphs locally are trees so the QAOA's performance is the same on these two ensembles. Using this we can show that the QAOA with $(d-1)^{2p} < n^A$ for any $A<1$, can only achieve an approximation ratio of 1/2 for Max-Cut on bipartite random d-regular graphs for d large. For Maximum Independent Set, in the same setting, the best approximation ratio is a d-dependent constant that goes to 0 as d gets big.
Submitted 18 May, 2020;
originally announced May 2020.
-
Hardness of Random Optimization Problems for Boolean Circuits, Low-Degree Polynomials, and Langevin Dynamics
Authors:
David Gamarnik,
Aukosh Jagannath,
Alexander S. Wein
Abstract:
We consider the problem of finding nearly optimal solutions of optimization problems with random objective functions. Two concrete problems we consider are (a) optimizing the Hamiltonian of a spherical or Ising $p$-spin glass model, and (b) finding a large independent set in a sparse Erdős-Rényi graph. The following families of algorithms are considered: (a) low-degree polynomials of the input; (b) low-depth Boolean circuits; (c) the Langevin dynamics algorithm. We show that these families of algorithms fail to produce nearly optimal solutions with high probability. For the case of Boolean circuits, our results improve the state-of-the-art bounds known in circuit complexity theory (although we consider the search problem as opposed to the decision problem).
Our proof uses the fact that these models are known to exhibit a variant of the overlap gap property (OGP) of near-optimal solutions. Specifically, for both models, every two solutions whose objectives are above a certain threshold are either close or far from each other. The crux of our proof is that the classes of algorithms we consider exhibit a form of stability. We show by an interpolation argument that stable algorithms cannot overcome the OGP barrier.
The stability of Langevin dynamics is an immediate consequence of the well-posedness of stochastic differential equations. The stability of low-degree polynomials and Boolean circuits is established using tools from Gaussian and Boolean analysis -- namely hypercontractivity and total influence, as well as a novel lower bound for random walks avoiding certain subsets. In the case of Boolean circuits, the result also makes use of Linial-Mansour-Nisan's classical theorem. Our techniques apply more broadly to low influence functions and may apply more generally.
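The dichotomy described in this abstract, in which any two sufficiently good solutions are either close to or far from each other, can be written schematically as follows. This is a sketch under assumed notation that does not appear in the abstract: $H_n$ denotes the random objective on configurations $σ\in\{-1,1\}^n$, $μ$ is the objective threshold, and $0\le ν_1<ν_2\le 1$ delimit a forbidden interval of overlaps. The precise variant used in the paper may differ in its details.

$$
\text{for all } σ^1,σ^2 \text{ with } H_n(σ^1)\ge μ \text{ and } H_n(σ^2)\ge μ:\qquad \frac{1}{n}\left|\langle σ^1,σ^2\rangle\right| \in [0,ν_1]\cup[ν_2,1].
$$

In this notation, overlaps in $[ν_2,1]$ correspond to pairs that are close and overlaps in $[0,ν_1]$ to pairs that are far apart; no pair of near-optimal solutions has an intermediate overlap.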
Submitted 26 January, 2022; v1 submitted 25 April, 2020;
originally announced April 2020.
-
The Quantum Approximate Optimization Algorithm Needs to See the Whole Graph: A Typical Case
Authors:
Edward Farhi,
David Gamarnik,
Sam Gutmann
Abstract:
The Quantum Approximate Optimization Algorithm can naturally be applied to combinatorial search problems on graphs. The quantum circuit has p applications of a unitary operator that respects the locality of the graph. On a graph with bounded degree, with p small enough, measurements of distant qubits in the state output by the QAOA give uncorrelated results. We focus on finding big independent sets in random graphs with dn/2 edges keeping d fixed and n large. Using the Overlap Gap Property of almost optimal independent sets in random graphs, and the locality of the QAOA, we are able to show that if p is less than a d-dependent constant times log n, the QAOA cannot do better than finding an independent set of size .854 times the optimal for d large. Because the logarithm is slowly growing, even at one million qubits we can only show that the algorithm is blocked if p is in single digits. At higher p the algorithm "sees" the whole graph and we have no indication that performance is limited.
Submitted 19 April, 2020;
originally announced April 2020.
-
Neural Networks and Polynomial Regression. Demystifying the Overparametrization Phenomena
Authors:
Matt Emschwiller,
David Gamarnik,
Eren C. Kızıldağ,
Ilias Zadik
Abstract:
In the context of neural network models, overparametrization refers to the phenomenon whereby these models appear to generalize well on unseen data, even though the number of parameters significantly exceeds the sample size, and the model perfectly fits the training data. A conventional explanation of this phenomenon is based on self-regularization properties of the algorithms used to train the models. In this paper we prove a series of results which provide a somewhat diverging explanation. Adopting a teacher/student model where the teacher network is used to generate the predictions and student network is trained on the observed labeled data, and then tested on out-of-sample data, we show that any student network interpolating the data generated by a teacher network generalizes well, provided that the sample size is at least an explicit quantity controlled by data dimension and approximation guarantee alone, regardless of the number of internal nodes of either teacher or student network.
Our claim is based on approximating both teacher and student networks by polynomial (tensor) regression models with degree depending on the desired accuracy and network depth only. Such a parametrization notably does not depend on the number of internal nodes. Thus a message implied by our results is that parametrizing wide neural networks by the number of hidden nodes is misleading, and a more fitting measure of parametrization complexity is the number of regression coefficients associated with tensorized data. In particular, this somewhat reconciles the generalization ability of neural networks with more classical statistical notions of data complexity and generalization bounds. Our empirical results on MNIST and Fashion-MNIST datasets indeed confirm that tensorized regression achieves a good out-of-sample performance, even when the degree of the tensor is at most two.
Submitted 23 March, 2020;
originally announced March 2020.
-
Stationary Points of Shallow Neural Networks with Quadratic Activation Function
Authors:
David Gamarnik,
Eren C. Kızıldağ,
Ilias Zadik
Abstract:
We consider the teacher-student setting of learning shallow neural networks with quadratic activations and planted weight matrix $W^*\in\mathbb{R}^{m\times d}$, where $m$ is the width of the hidden layer and $d\le m$ is the data dimension. We study the optimization landscape associated with the empirical and the population squared risk of the problem. Under the assumption that the planted weights are full-rank we obtain the following results. First, we establish that the landscape of the empirical risk admits an "energy barrier" separating rank-deficient $W$ from $W^*$: if $W$ is rank deficient, then its risk is bounded away from zero by an amount we quantify. We then complement this result by showing that, assuming the number $N$ of samples grows at least like a polynomial function of $d$, all full-rank approximate stationary points of the empirical risk are nearly globally optimal. These two results allow us to prove that gradient descent, when initialized below the energy barrier, approximately minimizes the empirical risk and recovers the planted weights in polynomial time. Next, we show that initializing below this barrier is in fact easily achieved when the weights are randomly generated under relatively weak assumptions. We show that provided the network is sufficiently overparametrized, initializing with an appropriate multiple of the identity suffices to obtain a risk below the energy barrier. At a technical level, the last result is a consequence of the semicircle law for the Wishart ensemble and could be of independent interest. Finally, we study the minimizers of the empirical risk and identify a simple necessary and sufficient geometric condition on the training data under which any minimizer has necessarily zero generalization error. We show that as soon as $N\ge N^*=d(d+1)/2$, randomly generated data enjoys this geometric condition almost surely, while that ceases to be true if $N<N^*$.
Submitted 9 July, 2020; v1 submitted 3 December, 2019;
originally announced December 2019.
-
The Overlap Gap Property and Approximate Message Passing Algorithms for $p$-spin models
Authors:
David Gamarnik,
Aukosh Jagannath
Abstract:
We consider the algorithmic problem of finding a near ground state (near optimal solution) of a $p$-spin model. We show that for a class of algorithms broadly defined as Approximate Message Passing (AMP), the presence of the Overlap Gap Property (OGP), appropriately defined, is a barrier. We conjecture that when $p\ge 4$ the model does indeed exhibit OGP (and prove it for the space of binary solutions). Assuming the validity of this conjecture, as an implication, the AMP fails to find near ground states in these models, per our result. We extend our result to the problem of finding pure states by means of Thouless, Anderson and Palmer (TAP) based iterations, which is yet another example of AMP type algorithms. We show that such iterations fail to find pure states approximately, subject to the conjecture that the space of pure states exhibits the OGP, appropriately stated, when $p\ge 4$.
Submitted 25 November, 2019; v1 submitted 15 November, 2019;
originally announced November 2019.
-
Inference in High-Dimensional Linear Regression via Lattice Basis Reduction and Integer Relation Detection
Authors:
David Gamarnik,
Eren C. Kızıldağ,
Ilias Zadik
Abstract:
We focus on the high-dimensional linear regression problem, where the algorithmic goal is to efficiently infer an unknown feature vector $β^*\in\mathbb{R}^p$ from its linear measurements, using a small number $n$ of samples. Unlike most of the literature, we make no sparsity assumption on $β^*$, but instead adopt a different regularization: In the noiseless setting, we assume $β^*$ consists of entries which are either rational numbers with a common denominator $Q\in\mathbb{Z}^+$ (referred to as $Q$-rationality); or irrational numbers supported on a rationally independent set of bounded cardinality, known to the learner; collectively called the mixed-support assumption. Using a novel combination of the PSLQ integer relation detection and LLL lattice basis reduction algorithms, we propose a polynomial-time algorithm which provably recovers a $β^*\in\mathbb{R}^p$ enjoying the mixed-support assumption, from its linear measurements $Y=Xβ^*\in\mathbb{R}^n$ for a large class of distributions for the random entries of $X$, even with one measurement $(n=1)$. In the noisy setting, we propose a polynomial-time, lattice-based algorithm, which recovers a $β^*\in\mathbb{R}^p$ enjoying $Q$-rationality, from its noisy measurements $Y=Xβ^*+W\in\mathbb{R}^n$, even with a single sample $(n=1)$. We further establish that, for large $Q$ and normal noise, this algorithm tolerates an information-theoretically optimal level of noise. We then apply these ideas to develop a polynomial-time, single-sample algorithm for the phase retrieval problem. Our methods address the single-sample $(n=1)$ regime, where the sparsity-based methods such as LASSO and Basis Pursuit are known to fail. Furthermore, our results also reveal an algorithmic connection between the high-dimensional linear regression problem, and the integer relation detection, randomized subset-sum, and shortest vector problems.
Submitted 23 October, 2019;
originally announced October 2019.
-
The Overlap Gap Property in Principal Submatrix Recovery
Authors:
David Gamarnik,
Aukosh Jagannath,
Subhabrata Sen
Abstract:
We study support recovery for a $k \times k$ principal submatrix with elevated mean $λ/N$, hidden in an $N\times N$ symmetric mean zero Gaussian matrix. Here $λ>0$ is a universal constant, and we assume $k = N ρ$ for some constant $ρ\in (0,1)$. We establish that there exists a constant $C>0$ such that the MLE recovers a constant proportion of the hidden submatrix if $λ\geq C \sqrt{\frac{1}{ρ} \log \frac{1}{ρ}}$, while such recovery is information theoretically impossible if $λ= o\left( \sqrt{\frac{1}{ρ} \log \frac{1}{ρ}} \right)$. The MLE is computationally intractable in general, and in fact, for $ρ>0$ sufficiently small, this problem is conjectured to exhibit a \emph{statistical-computational gap}. To provide rigorous evidence for this, we study the likelihood landscape for this problem, and establish that for some $\varepsilon>0$ and $\sqrt{\frac{1}{ρ} \log \frac{1}{ρ}} \ll λ\ll \frac{1}{ρ^{1/2 + \varepsilon}}$, the problem exhibits a variant of the \emph{Overlap-Gap-Property (OGP)}. As a direct consequence, we establish that a family of local MCMC based algorithms do not achieve optimal recovery. Finally, we establish that for $λ> 1/ρ$, a simple spectral method recovers a constant proportion of the hidden submatrix.
Submitted 12 December, 2020; v1 submitted 26 August, 2019;
originally announced August 2019.
-
Sparse High-Dimensional Isotonic Regression
Authors:
David Gamarnik,
Julia Gaudio
Abstract:
We consider the problem of estimating an unknown coordinate-wise monotone function given noisy measurements, known as the isotonic regression problem. Often, only a small subset of the features affects the output. This motivates the sparse isotonic regression setting, which we consider here. We provide an upper bound on the expected VC entropy of the space of sparse coordinate-wise monotone functions, and identify the regime of statistical consistency of our estimator. We also propose a linear program to recover the active coordinates, and provide theoretical recovery guarantees. We close with experiments on cancer classification, and show that our method significantly outperforms standard methods.
Submitted 2 July, 2019;
originally announced July 2019.
-
The Landscape of the Planted Clique Problem: Dense subgraphs and the Overlap Gap Property
Authors:
David Gamarnik,
Ilias Zadik
Abstract:
In this paper we study the computational-statistical gap of the planted clique problem, where a clique of size $k$ is planted in an Erdos-Renyi graph $G(n,\frac{1}{2})$ resulting in a graph $G\left(n,\frac{1}{2},k\right)$. The goal is to recover the planted clique vertices by observing $G\left(n,\frac{1}{2},k\right)$. It is known that the clique can be recovered as long as $k \geq \left(2+ε\right)\log n $ for any $ε>0$, but no polynomial-time algorithm is known for this task unless $k=Ω\left(\sqrt{n} \right)$. Following a statistical-physics inspired point of view as an attempt to understand this computational-statistical gap, we study the landscape of the "sufficiently dense" subgraphs of $G$ and their overlap with the planted clique.
Using the first moment method, we study the densest subgraph problems for subgraphs with fixed, but arbitrary, overlap size with the planted clique, and provide evidence of a phase transition for the presence of Overlap Gap Property (OGP) at $k=Θ\left(\sqrt{n}\right)$. OGP is a concept introduced originally in spin glass theory and known to suggest algorithmic hardness when it appears. We establish the presence of OGP when $k$ is a small positive power of $n$ by using a conditional second moment method. As our main technical tool, we establish the first, to the best of our knowledge, concentration results for the $K$-densest subgraph problem for the Erdos-Renyi model $G\left(n,\frac{1}{2}\right)$ when $K=n^{0.5-ε}$ for arbitrary $ε>0$. Finally, to study the OGP we employ a certain form of overparametrization, which is conceptually aligned with a large body of recent work in learning theory and optimization.
Submitted 30 December, 2019; v1 submitted 15 April, 2019;
originally announced April 2019.
-
Computing the partition function of the Sherrington-Kirkpatrick model is hard on average
Authors:
David Gamarnik,
Eren Kizildag
Abstract:
We establish the average-case hardness of the algorithmic problem of exact computation of the partition function associated with the Sherrington-Kirkpatrick model of spin glasses with Gaussian couplings and random external field. In particular, we establish that unless $P= \#P$, there does not exist a polynomial-time algorithm to exactly compute the partition function on average. This is done by showing that if there exists a polynomial time algorithm, which exactly computes the partition function for inverse polynomial fraction ($1/n^{O(1)}$) of all inputs, then there is a polynomial time algorithm, which exactly computes the partition function for all inputs, with high probability, yielding $P=\#P$. The computational model that we adopt is {\em finite-precision arithmetic}, where the algorithmic inputs are truncated first to a certain level $N$ of digital precision. The ingredients of our proof include the random and downward self-reducibility of the partition function with random external field; an argument of Cai et al. \cite{cai1999hardness} for establishing the average-case hardness of computing the permanent of a matrix; a list-decoding algorithm of Sudan \cite{sudan1996maximum}, for reconstructing polynomials intersecting a given list of numbers at sufficiently many points; and near-uniformity of the log-normal distribution, modulo a large prime $p$. To the best of our knowledge, our result is the first one establishing a provable hardness of a model arising in the field of spin glasses.
Furthermore, we extend our result to the same problem under a different {\em real-valued} computational model, e.g. using a Blum-Shub-Smale machine \cite{blum1988theory} operating over real-valued inputs.
Submitted 25 November, 2019; v1 submitted 13 October, 2018;
originally announced October 2018.
-
Finding cliques using few probes
Authors:
Uriel Feige,
David Gamarnik,
Joe Neeman,
Miklós Z. Rácz,
Prasad Tetali
Abstract:
Consider algorithms with unbounded computation time that probe the entries of the adjacency matrix of an $n$ vertex graph, and need to output a clique. We show that if the input graph is drawn at random from $G_{n,\frac{1}{2}}$ (and hence is likely to have a clique of size roughly $2\log n$), then for every $δ< 2$ and constant $\ell$, there is an $α< 2$ (that may depend on $δ$ and $\ell$) such that no algorithm that makes $n^δ$ probes in $\ell$ rounds is likely (over the choice of the random graph) to output a clique of size larger than $α\log n$.
Submitted 18 September, 2018;
originally announced September 2018.
-
A lower bound on the queueing delay in resource constrained load balancing
Authors:
David Gamarnik,
John N. Tsitsiklis,
Martin Zubeldia
Abstract:
We consider the following distributed service model: jobs with unit mean, general distribution, and independent processing times arrive as a renewal process of rate $λn$, with $0<λ<1$, and are immediately dispatched to one of several queues associated with $n$ identical servers with unit processing rate. We assume that the dispatching decisions are made by a central dispatcher endowed with a finite memory, and with the ability to exchange messages with the servers.
We study the fundamental resource requirements (memory bits and message exchange rate), in order to drive the expected queueing delay in steady-state of a typical job to zero, as $n$ increases. We develop a novel approach to show that, within a certain broad class of "symmetric" policies, every dispatching policy with a message rate of the order of $n$, and with a memory of the order of $\log n$ bits, results in an expected queueing delay which is bounded away from zero, uniformly as $n\to\infty$.
Submitted 8 July, 2018;
originally announced July 2018.
-
Explicit construction of RIP matrices is Ramsey-hard
Authors:
David Gamarnik
Abstract:
Matrices $Φ\in\mathbb{R}^{n\times p}$ satisfying the Restricted Isometry Property (RIP) are an important ingredient of compressive sensing methods. While it is known that random matrices satisfy the RIP with high probability even for $n=\log^{O(1)}p$, the explicit construction of such matrices has defied repeated efforts, and most known approaches hit the so-called $\sqrt{n}$ sparsity bottleneck. The notable exception is the work by Bourgain et al. \cite{bourgain2011explicit} constructing an $n\times p$ RIP matrix with sparsity $s=Θ(n^{{1\over 2}+ε})$, but in the regime $n=Ω(p^{1-δ})$.
In this short note we resolve this open question in a sense by showing that an explicit construction of a matrix satisfying the RIP in the regime $n=O(\log^2 p)$ and $s=Θ(n^{1\over 2})$ implies an explicit construction of a three-colored Ramsey graph on $p$ nodes with clique sizes bounded by $O(\log^2 p)$, a question in extremal combinatorics which has been open for decades.
Submitted 15 November, 2018; v1 submitted 29 May, 2018;
originally announced May 2018.
-
High Dimensional Linear Regression using Lattice Basis Reduction
Authors:
David Gamarnik,
Ilias Zadik
Abstract:
We consider a high dimensional linear regression problem where the goal is to efficiently recover an unknown vector $β^*$ from $n$ noisy linear observations $Y=Xβ^*+W \in \mathbb{R}^n$, for known $X \in \mathbb{R}^{n \times p}$ and unknown $W \in \mathbb{R}^n$. Unlike most of the literature on this model we make no sparsity assumption on $β^*$. Instead we adopt a regularization based on assuming that the underlying vectors $β^*$ have rational entries with the same denominator $Q \in \mathbb{Z}_{>0}$. We call this the $Q$-rationality assumption.
We propose a new polynomial-time algorithm for this task which is based on the seminal Lenstra-Lenstra-Lovasz (LLL) lattice basis reduction algorithm. We establish that under the $Q$-rationality assumption, our algorithm recovers exactly the vector $β^*$ for a large class of distributions for the iid entries of $X$ and non-zero noise $W$. We prove that it is successful under small noise, even when the learner has access to only one observation ($n=1$). Furthermore, we prove that in the case of the Gaussian white noise for $W$, $n=o\left(p/\log p\right)$ and $Q$ sufficiently large, our algorithm tolerates a nearly optimal information-theoretic level of the noise.
Submitted 8 November, 2018; v1 submitted 18 March, 2018;
originally announced March 2018.
-
Sparse High-Dimensional Linear Regression. Algorithmic Barriers and a Local Search Algorithm
Authors:
David Gamarnik,
Ilias Zadik
Abstract:
We consider a sparse high dimensional regression model where the goal is to recover a $k$-sparse unknown vector $β^*$ from $n$ noisy linear observations of the form $Y=Xβ^*+W \in \mathbb{R}^n$ where $X \in \mathbb{R}^{n \times p}$ has iid $N(0,1)$ entries and $W \in \mathbb{R}^n$ has iid $N(0,σ^2)$ entries. Under certain assumptions on the parameters, an intriguing asymptotic gap appears between the minimum value of $n$, call it $n^*$, for which the recovery is information theoretically possible, and the minimum value of $n$, call it $n_{\mathrm{alg}}$, for which an efficient algorithm is known to provably recover $β^*$. In \cite{gamarnikzadik} it was conjectured that the gap is not artificial, in the sense that for sample sizes $n \in [n^*,n_{\mathrm{alg}}]$ the problem is algorithmically hard.
We support this conjecture in two ways. Firstly, we show that the optimal solution of the LASSO provably fails to $\ell_2$-stably recover the unknown vector $β^*$ when $n \in [n^*,c n_{\mathrm{alg}}]$, for some sufficiently small constant $c>0$. Secondly, we establish that $n_{\mathrm{alg}}$, up to a multiplicative constant factor, is a phase transition point for the appearance of a certain Overlap Gap Property (OGP) over the space of $k$-sparse vectors. The presence of such an Overlap Gap Property phase transition, which originates in statistical physics, is known to provide evidence of an algorithmic hardness. Finally we show that if $n>C n_{\mathrm{alg}}$ for some large enough constant $C>0$, a very simple algorithm based on a local search improvement rule is able both to $\ell_2$-stably recover the unknown vector $β^*$ and to infer correctly its support, adding it to the list of provably successful algorithms for the high dimensional linear regression problem.
Submitted 22 September, 2019; v1 submitted 14 November, 2017;
originally announced November 2017.
-
Delay, memory, and messaging tradeoffs in distributed service systems
Authors:
David Gamarnik,
John N. Tsitsiklis,
Martin Zubeldia
Abstract:
We consider the following distributed service model: jobs with unit mean, exponentially distributed, and independent processing times arrive as a Poisson process of rate $λn$, with $0<λ<1$, and are immediately dispatched by a centralized dispatcher to one of $n$ First-In-First-Out queues associated with $n$ identical servers. The dispatcher is endowed with a finite memory, and with the ability to exchange messages with the servers.
We propose and study a resource-constrained "pull-based" dispatching policy that involves two parameters: (i) the number of memory bits available at the dispatcher, and (ii) the average rate at which servers communicate with the dispatcher. We establish (using a fluid limit approach) that the asymptotic, as $n\to\infty$, expected queueing delay is zero when either (i) the number of memory bits grows logarithmically with $n$ and the message rate grows superlinearly with $n$, or (ii) the number of memory bits grows superlogarithmically with $n$ and the message rate is at least $λn$. Furthermore, when the number of memory bits grows only logarithmically with $n$ and the message rate is proportional to $n$, we obtain a closed-form expression for the (now positive) asymptotic delay.
Finally, we demonstrate an interesting phase transition in the resource-constrained regime where the asymptotic delay is non-zero. In particular, we show that for any given $α>0$ (no matter how small), if our policy only uses a linear message rate $αn$, the resulting asymptotic delay is upper bounded, uniformly over all $λ<1$; this is in sharp contrast to the delay obtained when no messages are used ($α= 0$), which grows as $1/(1-λ)$ when $λ\uparrow 1$, or when the popular power-of-$d$-choices is used, in which the delay grows as $\log(1/(1-λ))$.
Submitted 12 September, 2017;
originally announced September 2017.
-
Uniqueness of Gibbs Measures for Continuous Hardcore Models
Authors:
David Gamarnik,
Kavita Ramanan
Abstract:
We formulate a continuous version of the well known discrete hardcore (or independent set) model on a locally finite graph, parameterized by the so-called activity parameter $λ> 0$. In this version, the state or "spin value" $x_u$ of any node $u$ of the graph lies in the interval $[0,1]$, the hardcore constraint $x_u + x_v \leq 1$ is satisfied for every edge $(u,v)$ of the graph, and the space of feasible configurations is given by a convex polytope. When the graph is a regular tree, we show that there is a unique Gibbs measure associated to each activity parameter $λ>0$. Our result shows that, in contrast to the standard discrete hardcore model, the continuous hardcore model does not exhibit a phase transition on the infinite regular tree. We also consider a family of continuous models that interpolate between the discrete and continuous hardcore models on a regular tree when $λ= 1$ and show that each member of the family has a unique Gibbs measure, even when the discrete model does not. In each case, the proof entails the analysis of an associated Hamiltonian dynamical system that describes a certain limit of the marginal distribution at a node. Furthermore, given any sequence of regular graphs with fixed degree and girth diverging to infinity, we apply our results to compute the asymptotic limit of suitably normalized volumes of the corresponding sequence of convex polytopes of feasible configurations. In particular, this yields an approximation for the partition function of the continuous hard core model on a regular graph with large girth in the case $λ= 1$.
△ Less
Submitted 14 August, 2017;
originally announced August 2017.
-
Suboptimality of local algorithms for a class of max-cut problems
Authors:
Wei-Kuo Chen,
David Gamarnik,
Dmitry Panchenko,
Mustazee Rahman
Abstract:
We show that in random $K$-uniform hypergraphs of constant average degree, for even $K \geq 4$, local algorithms defined as factors of i.i.d. cannot find nearly maximal cuts when the average degree is sufficiently large. These algorithms have been used frequently to obtain lower bounds for the max-cut problem on random graphs, but it was not known whether they could be successful in finding nearly maximal cuts. This result follows from the fact that the overlap of any two nearly maximal cuts in such hypergraphs does not take values in a certain non-trivial interval, a phenomenon referred to as the overlap gap property. This property is proved by comparing diluted models with large average degree with appropriate fully connected spin glass models and showing the overlap gap property in the latter setting.
Submitted 8 August, 2018; v1 submitted 17 July, 2017;
originally announced July 2017.
-
Matrix Completion from $O(n)$ Samples in Linear Time
Authors:
David Gamarnik,
Quan Li,
Hongyi Zhang
Abstract:
We consider the problem of reconstructing a rank-$k$ $n \times n$ matrix $M$ from a sampling of its entries. Under a certain incoherence assumption on $M$ and for the case when both the rank and the condition number of $M$ are bounded, it was shown in \cite{CandesRecht2009, CandesTao2010, keshavan2010, Recht2011, Jain2012, Hardt2014} that $M$ can be recovered exactly or approximately (depending on some trade-off between accuracy and computational complexity) using $O(n \, \text{poly}(\log n))$ samples in super-linear time $O(n^{a} \, \text{poly}(\log n))$ for some constant $a \geq 1$.
In this paper, we propose a new matrix completion algorithm using a novel sampling scheme based on a union of independent sparse random regular bipartite graphs. We show that under the same conditions w.h.p. our algorithm recovers an $ε$-approximation of $M$ in terms of the Frobenius norm using $O(n \log^2(1/ε))$ samples and in linear time $O(n \log^2(1/ε))$. This provides the best known bounds both on the sample complexity and computational complexity for reconstructing (approximately) an unknown low-rank matrix.
The novelty of our algorithm is two new steps of thresholding singular values and rescaling singular vectors in the application of the "vanilla" alternating minimization algorithm. The structure of sparse random regular graphs is used heavily for controlling the impact of these regularization steps.
Submitted 22 August, 2017; v1 submitted 7 February, 2017;
originally announced February 2017.
-
High-Dimensional Regression with Binary Coefficients. Estimating Squared Error and a Phase Transition
Authors:
David Gamarnik,
Ilias Zadik
Abstract:
We consider a sparse linear regression model $Y=Xβ^{*}+W$ where $X$ has Gaussian entries, $W$ is the noise vector with mean zero Gaussian entries, and $β^{*}$ is a binary vector with support size (sparsity) $k$. Using a novel conditional second moment method we obtain an approximation, tight up to a multiplicative constant, of the optimal squared error $\min_β\|Y-Xβ\|_{2}$, where the minimization is over all $k$-sparse binary vectors $β$. The approximation reveals interesting structural properties of the underlying regression problem. In particular, a) We establish that $n^*=2k\log p/\log (2k/σ^{2}+1)$ is a phase transition point with the following "all-or-nothing" property. When $n$ exceeds $n^{*}$, $(2k)^{-1}\|β_{2}-β^*\|_0\approx 0$, and when $n$ is below $n^{*}$, $(2k)^{-1}\|β_{2}-β^*\|_0\approx 1$, where $β_2$ is the optimal solution achieving the smallest squared error. With this we prove that $n^{*}$ is the asymptotic threshold for recovering $β^*$ information theoretically. b) We compute the squared error for an intermediate problem $\min_β\|Y-Xβ\|_{2}$ where minimization is restricted to vectors $β$ with $\|β-β^{*}\|_0=2k ζ$, for $ζ\in [0,1]$. We show that a lower bound part $Γ(ζ)$ of the estimate, which corresponds to the estimate based on the first moment method, undergoes a phase transition at three different thresholds, namely $n_{\text{inf,1}}=σ^2\log p$, which is the information theoretic bound for recovering $β^*$ when $k=1$ and $σ$ is large, then at $n^{*}$ and finally at $n_{\text{LASSO/CS}}$. c) We establish a certain Overlap Gap Property (OGP) on the space of all binary vectors $β$ when $n\le ck\log p$ for sufficiently small constant $c$. We conjecture that OGP is the source of algorithmic hardness of solving the minimization problem $\min_β\|Y-Xβ\|_{2}$ in the regime $n<n_{\text{LASSO/CS}}$.
Submitted 25 September, 2019; v1 submitted 16 January, 2017;
originally announced January 2017.
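As a small worked illustration of the threshold quoted in the abstract above, $n^*=2k\log p/\log (2k/σ^{2}+1)$, the following sketch simply evaluates that formula. The function name `n_star` and the sample parameter values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import math


def n_star(k, p, sigma2):
    """Evaluate n* = 2 k log p / log(2 k / sigma^2 + 1).

    k      : sparsity (support size) of the binary vector
    p      : ambient dimension
    sigma2 : noise variance sigma^2 (must be positive)
    """
    if k <= 0 or p <= 1 or sigma2 <= 0:
        raise ValueError("need k > 0, p > 1 and sigma2 > 0")
    return 2.0 * k * math.log(p) / math.log(2.0 * k / sigma2 + 1.0)


if __name__ == "__main__":
    # Hypothetical parameter choices, for illustration only.
    for sigma2 in (0.5, 1.0, 4.0):
        print(sigma2, n_star(k=10, p=1000, sigma2=sigma2))
```

Under this formula the threshold grows as the noise variance grows, since the denominator $\log(2k/σ^{2}+1)$ shrinks.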
-
Supermarket Queueing System in the Heavy Traffic Regime. Short Queue Dynamics
Authors:
Patrick Eschenfeldt,
David Gamarnik
Abstract:
We consider a queueing system with $n$ parallel queues operating according to the so-called "supermarket model" in which arriving customers join the shortest of $d$ randomly selected queues. Assuming rate $nλ_{n}$ Poisson arrivals and rate $1$ exponentially distributed service times, we consider this model in the heavy traffic regime, described by $λ_{n}\uparrow 1$ as $n\to\infty$. We give a simple expectation argument establishing that the majority of queues have steady state length at least $\log_d(1-λ_{n})^{-1} - O(1)$ with probability approaching one as $n\rightarrow\infty$, implying the same for the steady state delay of a typical customer.
Our main result concerns the detailed behavior of queues with length smaller than $\log_d(1-λ_{n})^{-1}-O(1)$. Assuming $λ_{n}$ converges to $1$ at rate at most $\sqrt{n}$, we show that the dynamics of such queues does not follow a diffusion process, as is typical for queueing systems in heavy traffic, but is described instead by a deterministic infinite system of linear differential equations, after an appropriate rescaling. The unique fixed point solution of this system is shown explicitly to be of the form $π_{1}(d^{i}-1)/(d-1), i\ge 1$, which we conjecture describes the steady state behavior of the queue lengths after the same rescaling. Our result is obtained by a combination of several technical ideas, including establishing the existence and uniqueness of an associated infinite dimensional system of non-linear integral equations and adopting an appropriate stopped process as an intermediate step.
Submitted 17 January, 2017; v1 submitted 11 October, 2016;
originally announced October 2016.
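The supermarket model described in the abstract above (arrivals join the shortest of $d$ randomly selected queues) can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the authors' construction: the $d$ queues are sampled without replacement, ties go to the first sampled queue, and the continuous-time dynamics are replaced by their jump chain (each event is an arrival with probability $λ/(1+λ)$, otherwise a potential service completion at a uniformly chosen server). The names `pick_queue` and `simulate_supermarket` are hypothetical.

```python
import random


def pick_queue(queues, d, rng):
    """Sample d distinct queues uniformly; return the index of a shortest one."""
    candidates = rng.sample(range(len(queues)), d)
    return min(candidates, key=lambda i: queues[i])


def simulate_supermarket(n, lam, d, steps, seed=0):
    """Run the jump chain of the supermarket model for a number of events.

    Arrivals occur at total rate n * lam and each of the n servers works at
    rate 1, so an event is an arrival with probability lam / (1 + lam) and
    otherwise a potential departure at a uniformly chosen server (which has
    no effect if that server's queue is empty).
    """
    rng = random.Random(seed)
    queues = [0] * n
    for _ in range(steps):
        if rng.random() < lam / (1.0 + lam):
            queues[pick_queue(queues, d, rng)] += 1
        else:
            server = rng.randrange(n)
            if queues[server] > 0:
                queues[server] -= 1
    return queues


if __name__ == "__main__":
    final = simulate_supermarket(n=100, lam=0.95, d=2, steps=200_000, seed=1)
    print("longest queue:", max(final), "mean length:", sum(final) / len(final))
```

Running the same simulation with `d=1` (uniformly random routing) and `d=2` gives a rough empirical feel for how much the choice among $d$ queues shortens the queues.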
-
A Message Passing Algorithm for the Problem of Path Packing in Graphs
Authors:
Patrick Eschenfeldt,
David Gamarnik
Abstract:
We consider the problem of packing node-disjoint directed paths in a directed graph. We consider a variant of this problem where each path starts within a fixed subset of root nodes, subject to a given bound on the length of paths. This problem is motivated by the so-called kidney exchange problem, but has potential other applications and is interesting in its own right.
We propose a new algorithm for this problem based on the message passing/belief propagation technique. A priori this problem does not have an associated graphical model, so in order to apply a belief propagation algorithm we provide a novel representation of the problem as a graphical model. Standard belief propagation on this model has poor scaling behavior, so we provide an efficient implementation that significantly decreases the complexity. We provide numerical results comparing the performance of our algorithm on both artificially created graphs and real world networks to several alternative algorithms, including algorithms based on integer programming (IP) techniques. These comparisons show that our algorithm scales better to large instances than IP-based algorithms and often finds better solutions than a simple algorithm that greedily selects the longest path from each root node. In some cases it also finds better solutions than the ones found by IP-based algorithms even when the latter are allowed to run significantly longer than our algorithm.
Submitted 18 March, 2016;
originally announced March 2016.
-
Finding a Large Submatrix of a Gaussian Random Matrix
Authors:
David Gamarnik,
Quan Li
Abstract:
We consider the problem of finding a $k\times k$ submatrix of an $n\times n$ matrix with i.i.d. standard Gaussian entries, which has a large average entry. It was shown earlier by Bhamidi et al. that the largest average value of such a matrix is $2\sqrt{\log n/k}$ with high probability. In the same paper, evidence was provided that a natural greedy algorithm called Largest Average Submatrix ($\mathrm{LAS}$) should produce a matrix with average entry smaller by a factor of approximately $\sqrt{2}$.
In this paper we show that the average entry of the matrix produced by the $\mathrm{LAS}$ algorithm is indeed $\sqrt{2\log n/k}$ w.h.p. Then by drawing an analogy with the problem of finding cliques in random graphs, we propose a simple greedy algorithm which produces a $k\times k$ matrix with asymptotically the same average value. Since the greedy algorithm is the best known algorithm for finding cliques in random graphs, it is tempting to believe that beating the factor $\sqrt{2}$ performance gap suffered by both algorithms might be very challenging. Surprisingly, we show the existence of a very simple algorithm which produces a matrix with average value $(4/3)\sqrt{2\log n/k}$.
To get an insight into the algorithmic hardness of this problem, and motivated by methods originating in the theory of spin glasses, we conduct the so-called expected overlap analysis of matrices with average value asymptotically $α\sqrt{2\log n/k}$. The overlap corresponds to the number of common rows and common columns for pairs of matrices achieving this value. We discover numerically an intriguing phase transition at $α^*\approx 1.3608\ldots$: when $α<α^*$ the space of overlaps is a continuous subset of $[0,1]^2$, whereas $α=α^*$ marks the onset of discontinuity, and the model exhibits the Overlap Gap Property when $α>α^*$. We conjecture that $α>α^*$ marks the onset of the algorithmic hardness.
Submitted 26 February, 2016;
originally announced February 2016.
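The Largest Average Submatrix procedure mentioned in the abstract above is commonly described as an alternating search: fix $k$ columns, take the $k$ rows with the largest sums over those columns, then fix those rows and take the $k$ best columns, and repeat until nothing changes. The sketch below follows that description under stated assumptions: the random starting columns, the tie-breaking by index, the iteration cap, and the names `top_k_indices` and `las` are all choices made here for illustration and are not taken from the paper.

```python
import random


def top_k_indices(scores, k):
    """Indices of the k largest scores (ties broken by lower index), sorted."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sorted(order[:k])


def las(matrix, k, seed=0, max_iter=100):
    """Alternating row/column search for a k x k submatrix with large average.

    matrix is a square list of lists. Returns (rows, cols, average entry).
    """
    n = len(matrix)
    rng = random.Random(seed)
    cols = sorted(rng.sample(range(n), k))  # assumed random initial columns
    rows = []
    for _ in range(max_iter):
        # Best k rows for the current columns.
        row_sums = [sum(matrix[i][j] for j in cols) for i in range(n)]
        new_rows = top_k_indices(row_sums, k)
        # Best k columns for those rows.
        col_sums = [sum(matrix[i][j] for i in new_rows) for j in range(n)]
        new_cols = top_k_indices(col_sums, k)
        if new_rows == rows and new_cols == cols:
            break  # local optimum: neither rows nor columns changed
        rows, cols = new_rows, new_cols
    average = sum(matrix[i][j] for i in rows for j in cols) / (k * k)
    return rows, cols, average


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 200, 5
    gaussian = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    print(las(gaussian, k))
```

On a Gaussian input, the returned average can be compared informally with the benchmark values $\sqrt{2\log n/k}$ and $2\sqrt{\log n/k}$ quoted in the abstract, keeping in mind that those are asymptotic statements.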
-
A Note on Alternating Minimization Algorithm for the Matrix Completion Problem
Authors:
David Gamarnik,
Sidhant Misra
Abstract:
We consider the problem of reconstructing a low rank matrix from a subset of its entries and analyze two variants of the so-called Alternating Minimization algorithm, which has been proposed in the past. We establish that when the underlying matrix has rank $r=1$, has positive bounded entries, and the graph $\mathcal{G}$ underlying the revealed entries has bounded degree and diameter which is at most logarithmic in the size of the matrix, both algorithms succeed in reconstructing the matrix approximately in polynomial time starting from an arbitrary initialization. We further provide simulation results which suggest that the second algorithm, which is based on message-passing-type updates, performs significantly better.
△ Less
Submitted 5 February, 2016;
originally announced February 2016.
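For the rank-one setting discussed in the abstract above, a generic alternating minimization loop can be sketched as follows. This is a textbook least-squares version, not necessarily either of the two variants analyzed in the paper: with the column factor fixed, each row factor is set to its least-squares value over the revealed entries in that row, and then the roles are swapped. The function name `alternating_minimization_rank1`, the all-ones initialization, and the fixed iteration count are assumptions made for illustration.

```python
def alternating_minimization_rank1(observed, n_rows, n_cols, iters=200):
    """Fit M ~ x y^T from revealed entries by alternating least squares.

    observed maps (i, j) -> revealed value M[i][j].
    Returns the factor lists (x, y).
    """
    x = [1.0] * n_rows
    y = [1.0] * n_cols
    for _ in range(iters):
        # Update each x[i] with y fixed: minimize sum_j (M_ij - x_i y_j)^2.
        for i in range(n_rows):
            num = sum(v * y[j] for (a, j), v in observed.items() if a == i)
            den = sum(y[j] ** 2 for (a, j) in observed if a == i)
            if den > 0:
                x[i] = num / den
        # Update each y[j] with x fixed, by the symmetric formula.
        for j in range(n_cols):
            num = sum(v * x[i] for (i, b), v in observed.items() if b == j)
            den = sum(x[i] ** 2 for (i, b) in observed if b == j)
            if den > 0:
                y[j] = num / den
    return x, y


if __name__ == "__main__":
    # Hypothetical 3 x 3 example: M = u v^T with u = (1, 2, 3), v = (2, 1, 4),
    # revealed on a connected pattern of six entries.
    u, v = [1.0, 2.0, 3.0], [2.0, 1.0, 4.0]
    pattern = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 0)]
    revealed = {(i, j): u[i] * v[j] for (i, j) in pattern}
    x, y = alternating_minimization_rank1(revealed, 3, 3)
    print("estimate of hidden entry (0, 2):", x[0] * y[2])
```

Only the products `x[i] * y[j]` are meaningful here, since the factors themselves are determined only up to a common scaling.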
-
Join the Shortest Queue with Many Servers. The Heavy Traffic Asymptotics
Authors:
Patrick Eschenfeldt,
David Gamarnik
Abstract:
We consider queueing systems with n parallel queues under a Join the Shortest Queue (JSQ) policy in the Halfin-Whitt heavy traffic regime. We use the martingale method to prove that a scaled process counting the number of idle servers and queues of length exactly 2 weakly converges to a two-dimensional reflected Ornstein-Uhlenbeck process, while processes counting longer queues converge to a deterministic system decaying to zero in constant time. This limiting system is comparable to that of the traditional Halfin-Whitt model, but there are key differences in the queueing behavior of the JSQ model. In particular, only a vanishing fraction of customers will have to wait, but those who do will incur a constant order waiting time.
Submitted 21 September, 2015; v1 submitted 3 February, 2015;
originally announced February 2015.
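The model in the abstract above can be explored numerically with a small simulation. This is a minimal sketch under stated assumptions: Poisson arrivals at rate $n - \beta\sqrt{n}$ (one common way to parametrize the Halfin-Whitt regime), exponential unit-rate service, and a hypothetical function name and signature. It tracks only how many servers hold each number of customers, which is enough to apply the JSQ rule.

```python
import math
import random


def simulate_jsq(n, beta, horizon, rng):
    """Simulate n parallel unit-rate servers under Join the Shortest Queue.

    Assumes arrival rate lam = n - beta * sqrt(n). Returns (counts, arrivals,
    waited), where counts[k] is the number of servers holding k customers
    (including the one in service) at the end of the run.
    """
    lam = n - beta * math.sqrt(n)
    counts = [n]  # all servers start idle
    t = 0.0
    arrivals = 0
    waited = 0
    while True:
        busy = n - counts[0]
        total_rate = lam + busy
        t += rng.expovariate(total_rate)
        if t > horizon:
            break
        if rng.random() * total_rate < lam:
            # Arrival: join a server with the fewest customers.
            k = next(i for i, c in enumerate(counts) if c > 0)
            if k + 1 == len(counts):
                counts.append(0)
            counts[k] -= 1
            counts[k + 1] += 1
            arrivals += 1
            if k >= 1:
                waited += 1  # the chosen server was already busy
        else:
            # Departure: pick one of the busy servers uniformly at random.
            r = rng.randrange(busy)
            k = 1
            while r >= counts[k]:
                r -= counts[k]
                k += 1
            counts[k] -= 1
            counts[k - 1] += 1
    return counts, arrivals, waited
```

For example, `simulate_jsq(400, 1.0, 50.0, random.Random(0))` returns the final occupancy counts together with the number of arrivals and how many of them had to wait; the ratio `waited / arrivals` can then be compared across increasing `n`.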
-
Structure learning of antiferromagnetic Ising models
Authors:
Guy Bresler,
David Gamarnik,
Devavrat Shah
Abstract:
In this paper we investigate the computational complexity of learning the graph structure underlying a discrete undirected graphical model from i.i.d. samples. We first observe that the notoriously difficult problem of learning parities with noise can be captured as a special case of learning graphical models. This leads to an unconditional computational lower bound of $Ω(p^{d/2})$ for learning general graphical models on $p$ nodes of maximum degree $d$, for the class of so-called statistical algorithms recently introduced by Feldman et al. (2013). The lower bound suggests that the $O(p^d)$ runtime required to exhaustively search over neighborhoods cannot be significantly improved without restricting the class of models.
Aside from structural assumptions on the graph such as it being a tree, hypertree, tree-like, etc., many recent papers on structure learning assume that the model has the correlation decay property. Indeed, focusing on ferromagnetic Ising models, Bento and Montanari (2009) showed that all known low-complexity algorithms fail to learn simple graphs when the interaction strength exceeds a number related to the correlation decay threshold. Our second set of results gives a class of repelling (antiferromagnetic) models that have the opposite behavior: very strong interaction allows efficient learning in time $O(p^2)$. We provide an algorithm whose performance interpolates between $O(p^2)$ and $O(p^{d+2})$ depending on the strength of the repulsion.
Submitted 3 December, 2014;
originally announced December 2014.
-
On the Max-Cut of Sparse Random Graphs
Authors:
David Gamarnik,
Quan Li
Abstract:
We consider the problem of estimating the size of a maximum cut (Max-Cut problem) in a random Erdős-Rényi graph on $n$ nodes and $\lfloor cn \rfloor$ edges. It is shown in Coppersmith et al.~\cite{Coppersmith2004} that the size of the maximum cut in this graph normalized by the number of nodes belongs to the asymptotic region $[c/2+0.37613\sqrt{c},c/2+0.58870\sqrt{c}]$ with high probability (w.h.p.) as $n$ increases, for all sufficiently large $c$.
In this paper we improve both upper and lower bounds by introducing a novel bounding technique. Specifically, we establish that the size of the maximum cut normalized by the number of nodes belongs to the interval $[c/2+0.47523\sqrt{c},c/2+0.55909\sqrt{c}]$ w.h.p. as $n$ increases, for all sufficiently large $c$. Instead of considering the expected number of cuts achieving a particular value as is done in the application of the first moment method, we observe that every maximum size cut satisfies a certain local optimality property, and we compute the expected number of cuts with a given value satisfying this local optimality property. Estimating this expectation amounts to solving a rather involved two dimensional large deviations problem. We solve this underlying large deviation problem asymptotically as $c$ increases and use it to obtain an improved upper bound on the Max-Cut value. The lower bound is obtained by application of the second moment method, coupled with the same local optimality constraint, and is shown to work up to the stated lower bound value $c/2+0.47523\sqrt{c}$. It is worth noting that both bounds are stronger than the ones obtained by standard first and second moment methods.
Finally, we also obtain an improved lower bound of $1.36000n$ on the Max-Cut for the random cubic graph or any cubic graph with large girth, improving the previous best bound of $1.33773n$.
Submitted 12 February, 2017; v1 submitted 6 November, 2014;
originally announced November 2014.
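The local optimality idea in the abstract above can be made concrete with a short sketch. The abstract does not define the exact property used in the paper, so the code assumes the standard single-vertex version: a cut is locally optimal if moving any one node to the other side does not increase the cut size. Every maximum cut satisfies this condition. All function names are hypothetical.

```python
import random


def random_graph(n, m, rng):
    """Sample m distinct edges uniformly at random on n nodes (no self-loops)."""
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return sorted(edges)


def cut_size(edges, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def adjacency(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def is_locally_optimal(n, edges, side):
    """True if no single-node flip strictly increases the cut size."""
    adj = adjacency(n, edges)
    for v in range(n):
        same = sum(1 for w in adj[v] if side[w] == side[v])
        if 2 * same > len(adj[v]):  # flipping v would gain same - (deg - same) > 0
            return False
    return True


def local_search(n, edges, rng):
    """Flip nodes until the cut is locally optimal.

    Each flip strictly increases the cut size, so the loop terminates.
    """
    adj = adjacency(n, edges)
    side = [rng.randrange(2) for _ in range(n)]
    improved = True
    while improved:
        improved = False
        for v in range(n):
            same = sum(1 for w in adj[v] if side[w] == side[v])
            if 2 * same > len(adj[v]):
                side[v] ^= 1
                improved = True
    return side
```

At a locally optimal cut every node has at least half of its edges crossing the cut, so the cut contains at least half of all edges. This sketch only finds some locally optimal cut; it does not compute the maximum cut or the bounds stated in the abstract.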