-
Are Graph Neural Networks Optimal Approximation Algorithms?
Authors:
Morris Yau,
Eric Lu,
Nikolaos Karalias,
Jessica Xu,
Stefanie Jegelka
Abstract:
In this work we design graph neural network architectures that capture optimal approximation algorithms for a large class of combinatorial optimization problems, using powerful algorithmic tools from semidefinite programming (SDP). Concretely, we prove that polynomial-sized message-passing algorithms can represent the most powerful polynomial time algorithms for Max Constraint Satisfaction Problem…
▽ More
In this work we design graph neural network architectures that capture optimal approximation algorithms for a large class of combinatorial optimization problems, using powerful algorithmic tools from semidefinite programming (SDP). Concretely, we prove that polynomial-sized message-passing algorithms can represent the most powerful polynomial time algorithms for Max Constraint Satisfaction Problems assuming the Unique Games Conjecture. We leverage this result to construct efficient graph neural network architectures, OptGNN, that obtain high-quality approximate solutions on landmark combinatorial optimization problems such as Max-Cut, Min-Vertex-Cover, and Max-3-SAT. Our approach achieves strong empirical results across a wide range of real-world and synthetic datasets against solvers and neural baselines. Finally, we take advantage of OptGNN's ability to capture convex relaxations to design an algorithm for producing bounds on the optimal solution from the learned embeddings of OptGNN.
△ Less
Submitted 7 February, 2024; v1 submitted 30 September, 2023;
originally announced October 2023.
-
Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems
Authors:
Ainesh Bakshi,
Allen Liu,
Ankur Moitra,
Morris Yau
Abstract:
Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical…
▽ More
Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. As a result, our algorithm succeeds without strong separation conditions on the components, and can be used to compete with the Bayes optimal clustering of the trajectories. Moreover our algorithm works in the challenging partially-observed setting. Our starting point is the simple but powerful observation that the classic Ho-Kalman algorithm is a close relative of modern tensor decomposition methods for learning latent variable models. This gives us a playbook for how to extend it to work with more complicated generative models.
△ Less
Submitted 23 July, 2023; v1 submitted 12 July, 2023;
originally announced July 2023.
-
A New Approach to Learning Linear Dynamical Systems
Authors:
Ainesh Bakshi,
Allen Liu,
Ankur Moitra,
Morris Yau
Abstract:
Linear dynamical systems are the foundational statistical model upon which control theory is built. Both the celebrated Kalman filter and the linear quadratic regulator require knowledge of the system dynamics to provide analytic guarantees. Naturally, learning the dynamics of a linear dynamical system from linear measurements has been intensively studied since Rudolph Kalman's pioneering work in…
▽ More
Linear dynamical systems are the foundational statistical model upon which control theory is built. Both the celebrated Kalman filter and the linear quadratic regulator require knowledge of the system dynamics to provide analytic guarantees. Naturally, learning the dynamics of a linear dynamical system from linear measurements has been intensively studied since Rudolph Kalman's pioneering work in the 1960's. Towards these ends, we provide the first polynomial time algorithm for learning a linear dynamical system from a polynomial length trajectory up to polynomial error in the system parameters under essentially minimal assumptions: observability, controllability, and marginal stability. Our algorithm is built on a method of moments estimator to directly estimate Markov parameters from which the dynamics can be extracted. Furthermore, we provide statistical lower bounds when our observability and controllability assumptions are violated.
△ Less
Submitted 23 January, 2023;
originally announced January 2023.
-
Approximating Nash Equilibrium in Random Graphical Games
Authors:
Morris Yau
Abstract:
Computing Nash equilibrium in multi-agent games is a longstanding challenge at the interface of game theory and computer science. It is well known that a general normal form game in N players and k strategies requires exponential space simply to write down. This Curse of Multi-Agents prompts the study of succinct games which can be written down efficiently. A canonical example of a succinct game i…
▽ More
Computing Nash equilibrium in multi-agent games is a longstanding challenge at the interface of game theory and computer science. It is well known that a general normal form game in N players and k strategies requires exponential space simply to write down. This Curse of Multi-Agents prompts the study of succinct games which can be written down efficiently. A canonical example of a succinct game is the graphical game which models players as nodes in a graph interacting with only their neighbors in direct analogy with markov random fields. Graphical games have found applications in wireless, financial, and social networks. However, computing the nash equilbrium of graphical games has proven challenging. Even for polymatrix games, a model where payoffs to an agent can be written as the sum of payoffs of interactions with the agent's neighbors, it has been shown that computing an epsilon approximate nash equilibrium is PPAD hard for epsilon smaller than a constant. The focus of this work is to circumvent this computational hardness by considering average case graph models i.e random graphs. We provide a quasipolynomial time approximation scheme (QPTAS) for computing an epsilon approximate nash equilibrium of polymatrix games on random graphs with edge density greater than poly(k, 1/epsilon, ln(N))$ with high probability. Furthermore, with the same runtime we can compute an epsilon-approximate Nash equilibrium that epsilon-approximates the maximum social welfare of any nash equilibrium of the game. Our primary technical innovation is an "accelerated rounding" of a novel hierarchical convex program for the nash equilibrium problem. Our accelerated rounding also yields faster algorithms for Max-2CSP on the same family of random graphs, which may be of independent interest.
△ Less
Submitted 6 December, 2021;
originally announced December 2021.
-
Kalman Filtering with Adversarial Corruptions
Authors:
Sitan Chen,
Frederic Koehler,
Ankur Moitra,
Morris Yau
Abstract:
Here we revisit the classic problem of linear quadratic estimation, i.e. estimating the trajectory of a linear dynamical system from noisy measurements. The celebrated Kalman filter gives an optimal estimator when the measurement noise is Gaussian, but is widely known to break down when one deviates from this assumption, e.g. when the noise is heavy-tailed. Many ad hoc heuristics have been employe…
▽ More
Here we revisit the classic problem of linear quadratic estimation, i.e. estimating the trajectory of a linear dynamical system from noisy measurements. The celebrated Kalman filter gives an optimal estimator when the measurement noise is Gaussian, but is widely known to break down when one deviates from this assumption, e.g. when the noise is heavy-tailed. Many ad hoc heuristics have been employed in practice for dealing with outliers. In a pioneering work, Schick and Mitter gave provable guarantees when the measurement noise is a known infinitesimal perturbation of a Gaussian and raised the important question of whether one can get similar guarantees for large and unknown perturbations.
In this work we give a truly robust filter: we give the first strong provable guarantees for linear quadratic estimation when even a constant fraction of measurements have been adversarially corrupted. This framework can model heavy-tailed and even non-stationary noise processes. Our algorithm robustifies the Kalman filter in the sense that it competes with the optimal algorithm that knows the locations of the corruptions. Our work is in a challenging Bayesian setting where the number of measurements scales with the complexity of what we need to estimate. Moreover, in linear dynamical systems past information decays over time. We develop a suite of new techniques to robustly extract information across different time steps and over varying time scales.
△ Less
Submitted 11 November, 2021;
originally announced November 2021.
-
Online and Distribution-Free Robustness: Regression and Contextual Bandits with Huber Contamination
Authors:
Sitan Chen,
Frederic Koehler,
Ankur Moitra,
Morris Yau
Abstract:
In this work we revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits, from the perspective of adversarial robustness. Existing works in algorithmic robust statistics make strong distributional assumptions that ensure that the input data is evenly spread out or comes from a nice generative model. Is it possible to achieve strong robustness g…
▽ More
In this work we revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits, from the perspective of adversarial robustness. Existing works in algorithmic robust statistics make strong distributional assumptions that ensure that the input data is evenly spread out or comes from a nice generative model. Is it possible to achieve strong robustness guarantees even without distributional assumptions altogether, where the sequence of tasks we are asked to solve is adaptively and adversarially chosen?
We answer this question in the affirmative for both linear regression and contextual bandits. In fact our algorithms succeed where conventional methods fail. In particular we show strong lower bounds against Huber regression and more generally any convex M-estimator. Our approach is based on a novel alternating minimization scheme that interleaves ordinary least-squares with a simple convex program that finds the optimal reweighting of the distribution under a spectral constraint. Our results obtain essentially optimal dependence on the contamination level $η$, reach the optimal breakdown point, and naturally apply to infinite dimensional settings where the feature vectors are represented implicitly via a kernel map.
△ Less
Submitted 10 June, 2021; v1 submitted 8 October, 2020;
originally announced October 2020.
-
Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Connections to Evolvability
Authors:
Sitan Chen,
Frederic Koehler,
Ankur Moitra,
Morris Yau
Abstract:
In this paper we revisit some classic problems on classification under misspecification. In particular, we study the problem of learning halfspaces under Massart noise with rate $η$. In a recent work, Diakonikolas, Goulekakis, and Tzamos resolved a long-standing problem by giving the first efficient algorithm for learning to accuracy $η+ ε$ for any $ε> 0$. However, their algorithm outputs a compli…
▽ More
In this paper we revisit some classic problems on classification under misspecification. In particular, we study the problem of learning halfspaces under Massart noise with rate $η$. In a recent work, Diakonikolas, Goulekakis, and Tzamos resolved a long-standing problem by giving the first efficient algorithm for learning to accuracy $η+ ε$ for any $ε> 0$. However, their algorithm outputs a complicated hypothesis, which partitions space into $\text{poly}(d,1/ε)$ regions. Here we give a much simpler algorithm and in the process resolve a number of outstanding open questions:
(1) We give the first proper learner for Massart halfspaces that achieves $η+ ε$. We also give improved bounds on the sample complexity achievable by polynomial time algorithms.
(2) Based on (1), we develop a blackbox knowledge distillation procedure to convert an arbitrarily complex classifier to an equally good proper classifier.
(3) By leveraging a simple but overlooked connection to evolvability, we show any SQ algorithm requires super-polynomially many queries to achieve $\mathsf{OPT} + ε$.
Moreover we study generalized linear models where $\mathbb{E}[Y|\mathbf{X}] = σ(\langle \mathbf{w}^*, \mathbf{X}\rangle)$ for any odd, monotone, and Lipschitz function $σ$. This family includes the previously mentioned halfspace models as a special case, but is much richer and includes other fundamental models like logistic regression. We introduce a challenging new corruption model that generalizes Massart noise, and give a general algorithm for learning in this setting. Our algorithms are based on a small set of core recipes for learning to classify in the presence of misspecification.
Finally we study our algorithm for learning halfspaces under Massart noise empirically and find that it exhibits some appealing fairness properties.
△ Less
Submitted 20 September, 2023; v1 submitted 8 June, 2020;
originally announced June 2020.
-
List Decodable Mean Estimation in Nearly Linear Time
Authors:
Yeshwanth Cherapanamjeri,
Sidhanth Mohanty,
Morris Yau
Abstract:
Learning from data in the presence of outliers is a fundamental problem in statistics. Until recently, no computationally efficient algorithms were known to compute the mean of a high dimensional distribution under natural assumptions in the presence of even a small fraction of outliers. In this paper, we consider robust statistics in the presence of overwhelming outliers where the majority of the…
▽ More
Learning from data in the presence of outliers is a fundamental problem in statistics. Until recently, no computationally efficient algorithms were known to compute the mean of a high dimensional distribution under natural assumptions in the presence of even a small fraction of outliers. In this paper, we consider robust statistics in the presence of overwhelming outliers where the majority of the dataset is introduced adversarially. With only an $α< 1/2$ fraction of "inliers" (clean data) the mean of a distribution is unidentifiable. However, in their influential work, [CSV17] introduces a polynomial time algorithm recovering the mean of distributions with bounded covariance by outputting a succinct list of $O(1/α)$ candidate solutions, one of which is guaranteed to be close to the true distributional mean; a direct analog of 'List Decoding' in the theory of error correcting codes. In this work, we develop an algorithm for list decodable mean estimation in the same setting achieving up to constants the information theoretically optimal recovery, optimal sample complexity, and in nearly linear time up to polylogarithmic factors in dimension. Our conceptual innovation is to design a descent style algorithm on a nonconvex landscape, iteratively removing minima to generate a succinct list of solutions. Our runtime bottleneck is a saddle-point optimization for which we design custom primal dual solvers for generalized packing and covering SDP's under Ky-Fan norms, which may be of independent interest.
△ Less
Submitted 21 January, 2021; v1 submitted 19 May, 2020;
originally announced May 2020.
-
List Decodable Subspace Recovery
Authors:
Prasad Raghavendra,
Morris Yau
Abstract:
Learning from data in the presence of outliers is a fundamental problem in statistics. In this work, we study robust statistics in the presence of overwhelming outliers for the fundamental problem of subspace recovery. Given a dataset where an $α$ fraction (less than half) of the data is distributed uniformly in an unknown $k$ dimensional subspace in $d$ dimensions, and with no additional assumpti…
▽ More
Learning from data in the presence of outliers is a fundamental problem in statistics. In this work, we study robust statistics in the presence of overwhelming outliers for the fundamental problem of subspace recovery. Given a dataset where an $α$ fraction (less than half) of the data is distributed uniformly in an unknown $k$ dimensional subspace in $d$ dimensions, and with no additional assumptions on the remaining data, the goal is to recover a succinct list of $O(\frac{1}α)$ subspaces one of which is nontrivially correlated with the planted subspace. We provide the first polynomial time algorithm for the 'list decodable subspace recovery' problem, and subsume it under a more general framework of list decoding over distributions that are "certifiably resilient" capturing state of the art results for list decodable mean estimation and regression.
△ Less
Submitted 7 February, 2020;
originally announced February 2020.
-
List Decodable Learning via Sum of Squares
Authors:
Prasad Raghavendra,
Morris Yau
Abstract:
In the list-decodable learning setup, an overwhelming majority (say a $1-β$-fraction) of the input data consists of outliers and the goal of an algorithm is to output a small list $\mathcal{L}$ of hypotheses such that one of them agrees with inliers. We develop a framework for list-decodable learning via the Sum-of-Squares SDP hierarchy and demonstrate it on two basic statistical estimation proble…
▽ More
In the list-decodable learning setup, an overwhelming majority (say a $1-β$-fraction) of the input data consists of outliers and the goal of an algorithm is to output a small list $\mathcal{L}$ of hypotheses such that one of them agrees with inliers. We develop a framework for list-decodable learning via the Sum-of-Squares SDP hierarchy and demonstrate it on two basic statistical estimation problems
{\it Linear regression:} Suppose we are given labelled examples $\{(X_i,y_i)\}_{i \in [N]}$ containing a subset $S$ of $βN$ {\it inliers} $\{X_i \}_{i \in S}$ that are drawn i.i.d. from standard Gaussian distribution $N(0,I)$ in $\mathbb{R}^d$, where the corresponding labels $y_i$ are well-approximated by a linear function $\ell$. We devise an algorithm that outputs a list $\mathcal{L}$ of linear functions such that there exists some $\hat{\ell} \in \mathcal{L}$ that is close to $\ell$.
This yields the first algorithm for linear regression in a list-decodable setting. Our results hold for any distribution of examples whose concentration and anticoncentration can be certified by Sum-of-Squares proofs.
{\it Mean Estimation:}
Given data points $\{X_i\}_{i \in [N]}$ containing a subset $S$ of $βN$ {\it inliers} $\{X_i \}_{i \in S}$ that are drawn i.i.d. from a Gaussian distribution $N(μ,I)$ in $\mathbb{R}^d$, we devise an algorithm that generates a list $\mathcal{L}$ of means such that there exists $\hatμ \in \mathcal{L}$ close to $μ$.
The recovery guarantees of the algorithm are analogous to the existing algorithms for the problem by Diakonikolas \etal and Kothari \etal.
In an independent and concurrent work, Karmalkar \etal \cite{KlivansKS19} also obtain an algorithm for list-decodable linear regression using the Sum-of-Squares SDP hierarchy.
△ Less
Submitted 12 May, 2019;
originally announced May 2019.