-
Equivariant Neural Network for Factor Graphs
Authors:
Fan-Yun Sun,
Jonathan Kuck,
Hao Tang,
Stefano Ermon
Abstract:
Several indices used in a factor graph data structure can be permuted without changing the underlying probability distribution. An algorithm that performs inference on a factor graph should ideally be equivariant or invariant to permutations of global indices of nodes, variable orderings within a factor, and variable assignment orderings. However, existing neural network-based inference procedures fail to take advantage of this inductive bias. In this paper, we precisely characterize these isomorphic properties of factor graphs and propose two inference models: Factor-Equivariant Neural Belief Propagation (FE-NBP) and Factor-Equivariant Graph Neural Networks (FE-GNN). FE-NBP is a neural network that generalizes BP and respects each of the above properties of factor graphs while FE-GNN is an expressive GNN model that relaxes an isomorphic property in favor of greater expressivity. Empirically, we demonstrate on both real-world and synthetic datasets, for both marginal inference and MAP inference, that FE-NBP and FE-GNN together cover a range of sample complexity regimes: FE-NBP achieves state-of-the-art performance on small datasets while FE-GNN achieves state-of-the-art performance on large datasets.
Submitted 29 September, 2021;
originally announced September 2021.
-
Sample-Efficient Safety Assurances using Conformal Prediction
Authors:
Rachel Luo,
Shengjia Zhao,
Jonathan Kuck,
Boris Ivanovic,
Silvio Savarese,
Edward Schmerling,
Marco Pavone
Abstract:
When deploying machine learning models in high-stakes robotics applications, the ability to detect unsafe situations is crucial. Early warning systems can provide alerts when an unsafe situation is imminent (in the absence of corrective action). To reliably improve safety, these warning systems should have a provable false negative rate; i.e. of the situations that are unsafe, fewer than $ε$ will occur without an alert. In this work, we present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics, in order to tune warning systems to provably achieve an $ε$ false negative rate using as few as $1/ε$ data points. We apply our framework to a driver warning system and a robotic grasping application, and empirically demonstrate guaranteed false negative rate while also observing low false detection (positive) rate.
Submitted 2 January, 2024; v1 submitted 28 September, 2021;
originally announced September 2021.
-
Privacy Preserving Recalibration under Domain Shift
Authors:
Rachel Luo,
Shengjia Zhao,
Jiaming Song,
Jonathan Kuck,
Stefano Ermon,
Silvio Savarese
Abstract:
Classifiers deployed in high-stakes real-world applications must output calibrated confidence scores, i.e. their predicted probabilities should reflect empirical frequencies. Recalibration algorithms can greatly improve a model's probability estimates; however, existing algorithms are not applicable in real-world situations where the test data follows a different distribution from the training data, and privacy preservation is paramount (e.g. protecting patient records). We introduce a framework that abstracts out the properties of recalibration problems under differential privacy constraints. This framework allows us to adapt existing recalibration algorithms to satisfy differential privacy while remaining effective for domain-shift situations. Guided by our framework, we also design a novel recalibration algorithm, accuracy temperature scaling, that outperforms prior work on private datasets. In an extensive empirical study, we find that our algorithm improves calibration on domain-shift benchmarks under the constraints of differential privacy. On the 15 highest severity perturbations of the ImageNet-C dataset, our method achieves a median ECE of 0.029, over 2x better than the next best recalibration method and almost 5x better than without recalibration.
Submitted 21 August, 2020;
originally announced August 2020.
-
Belief Propagation Neural Networks
Authors:
Jonathan Kuck,
Shuvam Chakraborty,
Hao Tang,
Rachel Luo,
Jiaming Song,
Ashish Sabharwal,
Stefano Ermon
Abstract:
Learned neural solvers have successfully been used to solve combinatorial optimization and decision problems. More general counting variants of these problems, however, are still largely solved with hand-crafted solvers. To bridge this gap, we introduce belief propagation neural networks (BPNNs), a class of parameterized operators that operate on factor graphs and generalize Belief Propagation (BP). In its strictest form, a BPNN layer (BPNN-D) is a learned iterative operator that provably maintains many of the desirable properties of BP for any choice of the parameters. Empirically, we show that, with training, BPNN-D learns to perform the task better than the original BP: it converges 1.7x faster on Ising models while providing tighter bounds. On challenging model counting problems, BPNNs compute estimates hundreds of times faster than state-of-the-art handcrafted methods, while returning an estimate of comparable quality.
Submitted 1 July, 2020;
originally announced July 2020.
-
Approximating the Permanent by Sampling from Adaptive Partitions
Authors:
Jonathan Kuck,
Tri Dao,
Hamid Rezatofighi,
Ashish Sabharwal,
Stefano Ermon
Abstract:
Computing the permanent of a non-negative matrix is a core problem with practical applications ranging from target tracking to statistical thermodynamics. However, this problem is also #P-complete, which leaves little hope for finding an exact solution that can be computed efficiently. While the problem admits a fully polynomial randomized approximation scheme, this method has seen little use because it is both inefficient in practice and difficult to implement. We present AdaPart, a simple and efficient method for drawing exact samples from an unnormalized distribution. Using AdaPart, we show how to construct tight bounds on the permanent which hold with high probability, with guaranteed polynomial runtime for dense matrices. We find that AdaPart can provide empirical speedups exceeding 25x over prior sampling methods on matrices that are challenging for variational-based approaches. Finally, in the context of multi-target tracking, exact sampling from the distribution defined by the matrix permanent allows us to use the optimal proposal distribution during particle filtering. Using AdaPart, we show that this leads to improved tracking performance using an order of magnitude fewer samples.
Submitted 26 November, 2019;
originally announced November 2019.
-
Uniform Inference in High-Dimensional Gaussian Graphical Models
Authors:
Sven Klaassen,
Jannis Kück,
Martin Spindler,
Victor Chernozhukov
Abstract:
Graphical models have become a very popular tool for representing dependencies within a large set of variables and are key for representing causal structures. We provide results for uniform inference on high-dimensional graphical models with the number of target parameters $d$ being possibly much larger than the sample size. This is particularly important when certain features or structures of a causal model should be recovered. Our results highlight how in high-dimensional settings graphical models can be estimated and recovered with modern machine learning methods in complex data sets. To construct simultaneous confidence regions on many target parameters, sufficiently fast estimation rates of the nuisance functions are crucial. In this context, we establish uniform estimation rates and sparsity guarantees of the square-root estimator in a random design under approximate sparsity conditions that might be of independent interest for related problems in high dimensions. We also demonstrate in a comprehensive simulation study that our procedure has good small sample properties.
Submitted 3 December, 2018; v1 submitted 30 August, 2018;
originally announced August 2018.
-
Approximate Inference via Weighted Rademacher Complexity
Authors:
Jonathan Kuck,
Ashish Sabharwal,
Stefano Ermon
Abstract:
Rademacher complexity is often used to characterize the learnability of a hypothesis class and is known to be related to the class size. We leverage this observation and introduce a new technique for estimating the size of an arbitrary weighted set, defined as the sum of weights of all elements in the set. Our technique provides upper and lower bounds on a novel generalization of Rademacher complexity to the weighted setting in terms of the weighted set size. This generalizes Massart's Lemma, a known upper bound on the Rademacher complexity in terms of the unweighted set size. We show that the weighted Rademacher complexity can be estimated by solving a randomly perturbed optimization problem, allowing us to derive high-probability bounds on the size of any weighted set. We apply our method to the problems of calculating the partition function of an Ising model and computing propositional model counts (#SAT). Our experiments demonstrate that we can produce tighter bounds than competing methods in both the weighted and unweighted settings.
Submitted 26 January, 2018;
originally announced January 2018.
-
High-Dimensional $L_2$Boosting: Rate of Convergence
Authors:
Ye Luo,
Martin Spindler,
Jannis Kück
Abstract:
Boosting is one of the most significant developments in machine learning. This paper studies the rate of convergence of $L_2$Boosting, which is tailored for regression, in a high-dimensional setting. Moreover, we introduce so-called “post-Boosting”. This is a post-selection estimator which applies ordinary least squares to the variables selected in the first stage by $L_2$Boosting. Another variant is “Orthogonal Boosting” where after each step an orthogonal projection is conducted. We show that both post-$L_2$Boosting and the orthogonal boosting achieve the same rate of convergence as LASSO in a sparse, high-dimensional setting. We show that the rate of convergence of the classical $L_2$Boosting depends on the design matrix described by a sparse eigenvalue constant. To show the latter results, we derive new approximation results for the pure greedy algorithm, based on analyzing the revisiting behavior of $L_2$Boosting. We also introduce feasible rules for early stopping, which can be easily implemented and used in applied work. Our results also allow a direct comparison between LASSO and boosting which has been missing from the literature. Finally, we present simulation studies and applications to illustrate the relevance of our theoretical results and to provide insights into the practical aspects of boosting. In these simulation studies, post-$L_2$Boosting clearly outperforms LASSO.
Submitted 21 July, 2022; v1 submitted 29 February, 2016;
originally announced February 2016.