-
Learning Graphical Models Using Multiplicative Weights
Authors:
Adam Klivans,
Raghu Meka
Abstract:
We give a simple, multiplicative-weight update algorithm for learning undirected graphical models or Markov random fields (MRFs). The approach is new, and for the well-studied case of Ising models or Boltzmann machines, we obtain an algorithm that uses a nearly optimal number of samples and has quadratic running time (up to logarithmic factors), subsuming and improving on all prior work. Additionally, we give the first efficient algorithm for learning Ising models over general alphabets.
Our main application is an algorithm for learning the structure of t-wise MRFs with nearly-optimal sample complexity (up to polynomial losses in necessary terms that depend on the weights) and running time that is $n^{O(t)}$. In addition, given $n^{O(t)}$ samples, we can also learn the parameters of the model and generate a hypothesis that is close in statistical distance to the true MRF. All prior work runs in time $n^{\Omega(d)}$ for graphs of bounded degree d and does not generate a hypothesis close in statistical distance even for t=3. We observe that our runtime has the correct dependence on n and t assuming the hardness of learning sparse parities with noise.
Our algorithm, the Sparsitron, is easy to implement (it has only one parameter) and holds in the on-line setting. Its analysis applies a regret bound from Freund and Schapire's classic Hedge algorithm. It also gives the first solution to the problem of learning sparse Generalized Linear Models (GLMs).
Submitted 20 June, 2017;
originally announced June 2017.
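The abstract above describes the Sparsitron as a one-parameter multiplicative-weight update analyzed through Hedge. The following is a minimal sketch of a Hedge-style update for a sparse GLM, not the authors' reference implementation. It assumes inputs in $[-1,1]^n$, labels in $[0,1]$, and a nonnegative target weight vector with $\ell_1$ norm at most `lam`; the function names, the learning rate `beta`, the sigmoid transfer function, and the noiseless toy data are all illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mw_candidates(samples, lam, transfer=sigmoid):
    """One online pass; returns one candidate weight vector per round.

    Assumption: the target weights are nonnegative with l1 norm <= lam.
    """
    n = len(samples[0][0])
    rounds = len(samples)
    # Hedge-style rate (an assumed choice); needs rounds > ln(n) so 0 < beta < 1.
    beta = 1.0 - math.sqrt(math.log(n) / rounds)
    weights = [1.0 / n] * n
    candidates = []
    for x, y in samples:
        total = sum(weights)
        p = [w / total for w in weights]  # distribution over coordinates
        candidates.append([lam * pi for pi in p])
        pred = transfer(lam * sum(pi * xi for pi, xi in zip(p, x)))
        # Per-coordinate loss in [0, 1]; low when x_i points toward the label.
        losses = [0.5 * (1.0 + (pred - y) * xi) for xi in x]
        # Multiplicative update, applied to the normalized weights.
        weights = [pi * beta ** loss for pi, loss in zip(p, losses)]
    return candidates


def squared_error(w, samples, transfer=sigmoid):
    total = 0.0
    for x, y in samples:
        total += (transfer(sum(wi * xi for wi, xi in zip(w, x))) - y) ** 2
    return total / len(samples)


def select_best(candidates, holdout, transfer=sigmoid):
    """Pick the candidate with the lowest held-out squared error."""
    return min(candidates, key=lambda w: squared_error(w, holdout, transfer))


def make_samples(count, n, true_w, rng):
    """Toy data: uniform +-1 inputs, label = conditional mean (no noise)."""
    out = []
    for _ in range(count):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        out.append((x, sigmoid(sum(wi * xi for wi, xi in zip(true_w, x)))))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n, lam = 20, 2.0
    true_w = [0.0] * n
    true_w[3] = lam  # a 1-sparse target, chosen only for illustration
    train = make_samples(3000, n, true_w, rng)
    holdout = make_samples(100, n, true_w, rng)
    best = select_best(mw_candidates(train, lam), holdout)
    print("largest coordinate:", max(range(n), key=lambda i: best[i]))
```

Signed weights could be handled by running the same update on the doubled input `x + [-xi for xi in x]`; that reduction is a standard trick and is stated here as an assumption rather than as a detail taken from the abstract.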
-
Hyperparameter Optimization: A Spectral Approach
Authors:
Elad Hazan,
Adam Klivans,
Yang Yuan
Abstract:
We give a simple, fast algorithm for hyperparameter optimization inspired by techniques from the analysis of Boolean functions. We focus on the high-dimensional regime where the canonical example is training a neural network with a large number of hyperparameters. The algorithm, an iterative application of compressed sensing techniques for orthogonal polynomials, requires only uniform sampling of the hyperparameters and is thus easily parallelizable.
Experiments for training deep neural networks on CIFAR-10 show that compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our algorithm finds significantly improved solutions, in some cases better than what is attainable by hand-tuning. In terms of overall running time (i.e., time required to sample various settings of hyperparameters plus additional computation time), we are at least an order of magnitude faster than Hyperband and Bayesian Optimization. We also outperform Random Search by 8x.
Additionally, our method comes with provable guarantees and yields the first improvements on the sample complexity of learning decision trees in over two decades. In particular, we obtain the first quasi-polynomial time algorithm for learning noisy decision trees with polynomial sample complexity.
Submitted 19 January, 2018; v1 submitted 2 June, 2017;
originally announced June 2017.
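The abstract above describes uniform sampling of hyperparameters followed by sparse recovery over orthogonal polynomials. The following is a minimal sketch of one such recovery stage, not the authors' implementation. It assumes binary hyperparameters encoded as $\pm 1$, uses parity functions of degree at most 2 as the polynomial basis, and fits them with a plain coordinate-descent Lasso; the function names, the penalty `alpha`, and the toy objective are illustrative assumptions.

```python
import itertools
import random


def parity_features(n, degree):
    """All index subsets of size 0..degree; each subset S names a parity chi_S."""
    subsets = []
    for d in range(degree + 1):
        subsets.extend(itertools.combinations(range(n), d))
    return subsets


def chi(subset, x):
    """Parity chi_S(x) = product of x[i] for i in S, with x in {-1, 1}^n."""
    prod = 1
    for i in subset:
        prod *= x[i]
    return prod


def soft_threshold(v, a):
    if v > a:
        return v - a
    if v < -a:
        return v + a
    return 0.0


def lasso_parities(xs, ys, subsets, alpha=0.05, sweeps=50):
    """Coordinate-descent Lasso for (1/2N)||y - Phi w||^2 + alpha * ||w||_1.

    Every feature is +-1 valued, so each column has mean square 1 and the
    coordinate update reduces to a single soft-threshold.
    """
    count = len(xs)
    phi = [[chi(s, x) for s in subsets] for x in xs]
    coef = [0.0] * len(subsets)
    resid = list(ys)  # residual y - Phi w, kept up to date
    for _ in range(sweeps):
        for j in range(len(subsets)):
            rho = sum(phi[i][j] * resid[i] for i in range(count)) / count + coef[j]
            new = soft_threshold(rho, alpha)
            delta = new - coef[j]
            if delta != 0.0:
                for i in range(count):
                    resid[i] -= delta * phi[i][j]
                coef[j] = new
    return dict(zip(subsets, coef))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 8
    # Uniformly sampled hyperparameter settings (embarrassingly parallel).
    xs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(200)]
    # Hypothetical validation loss that depends on only a few parities.
    ys = [1.0 - 0.8 * x[0] * x[1] + 0.5 * x[3] for x in xs]
    coef = lasso_parities(xs, ys, parity_features(n, 2))
    ranked = sorted((s for s in coef if s), key=lambda s: -abs(coef[s]))
    print("most influential parities:", ranked[:2])
```

In an iterative scheme, the variables appearing in the largest recovered parities would be fixed to their loss-minimizing values and the procedure repeated on the remaining ones; that outer loop is omitted here.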
-
An Invariance Principle for Polytopes
Authors:
Prahladh Harsha,
Adam Klivans,
Raghu Meka
Abstract:
Let $X$ be randomly chosen from $\{-1,1\}^n$, and let $Y$ be randomly chosen from the standard spherical Gaussian on $\mathbb{R}^n$. For any (possibly unbounded) polytope $P$ formed by the intersection of $k$ halfspaces, we prove that $|\Pr[X \in P] - \Pr[Y \in P]| < \log^{8/5} k \cdot \Delta$, where $\Delta$ is a parameter that is small for polytopes formed by the intersection of "regular" halfspaces (i.e., halfspaces with low influence). The novelty of our invariance principle is the polylogarithmic dependence on $k$. Previously, only bounds that were at least linear in $k$ were known. We give two important applications of our main result: (1) A bound, polylogarithmic in $k$, on the Boolean noise sensitivity of intersections of $k$ "regular" halfspaces (previous work gave bounds linear in $k$). (2) A pseudorandom generator (PRG) with seed length $O((\log n) \cdot \mathrm{poly}(\log k, 1/\delta))$ that $\delta$-fools all polytopes with $k$ faces with respect to the Gaussian distribution. We also obtain PRGs with similar parameters that fool polytopes formed by intersection of regular halfspaces over the hypercube. Using our PRG constructions, we obtain the first deterministic quasi-polynomial time algorithms for approximately counting the number of solutions to a broad class of integer programs, including dense covering problems and contingency tables.
Submitted 12 September, 2012; v1 submitted 24 December, 2009;
originally announced December 2009.
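The invariance statement in the abstract above can be illustrated numerically. The following is a minimal Monte Carlo sketch, assuming a polytope of the form $\{x : \langle a_j, x \rangle \le \theta_j\}$ with unit normals; it only estimates the gap between the two probabilities and verifies nothing about the bound itself. The function names, the random-direction construction of "regular" normals, and the thresholds are illustrative assumptions.

```python
import math
import random


def in_polytope(point, normals, thresholds):
    """True if <a_j, point> <= theta_j for every halfspace j."""
    for normal, theta in zip(normals, thresholds):
        if sum(a * p for a, p in zip(normal, point)) > theta:
            return False
    return True


def estimate_gap(normals, thresholds, trials=5000, seed=0):
    """Monte Carlo estimate of |Pr[X in P] - Pr[Y in P]|.

    X is uniform on {-1, 1}^n and Y is a standard Gaussian on R^n.
    """
    rng = random.Random(seed)
    n = len(normals[0])
    cube_hits = 0
    gauss_hits = 0
    for _ in range(trials):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        cube_hits += in_polytope(x, normals, thresholds)
        gauss_hits += in_polytope(y, normals, thresholds)
    return abs(cube_hits - gauss_hits) / trials


def random_unit_normals(k, n, seed=1):
    """k random unit vectors; their mass is spread over many coordinates."""
    rng = random.Random(seed)
    normals = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in v))
        normals.append([c / norm for c in v])
    return normals


if __name__ == "__main__":
    n = 30
    # Spread-out normals: every coordinate has low influence.
    regular_gap = estimate_gap(random_unit_normals(3, n), [0.5] * 3)
    # A single-coordinate ("dictator") halfspace: one coordinate has all the influence.
    dictator_gap = estimate_gap([[1.0] + [0.0] * (n - 1)], [0.5])
    print("regular gap:", regular_gap)
    print("dictator gap:", dictator_gap)
```

The dictator halfspace $x_1 \le 0.5$ has probability exactly $1/2$ on the hypercube but roughly $0.69$ under the Gaussian, which is why regularity is needed for the two probabilities to be close.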