-
The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks
Authors:
Jakub Swiatkowski,
Kevin Roth,
Bastiaan S. Veeling,
Linh Tran,
Joshua V. Dillon,
Jasper Snoek,
Stephan Mandt,
Tim Salimans,
Rodolphe Jenatton,
Sebastian Nowozin
Abstract:
Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights. Recent work develo** this class of methods has explored ever richer parameterizations of the approximate posterior in the hope of improving performance. In contrast, here we share a curious experimental finding that suggests instead restricting the variational d…
▽ More
Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights. Recent work develo** this class of methods has explored ever richer parameterizations of the approximate posterior in the hope of improving performance. In contrast, here we share a curious experimental finding that suggests instead restricting the variational distribution to a more compact parameterization. For a variety of deep Bayesian neural networks trained using Gaussian mean-field variational inference, we find that the posterior standard deviations consistently exhibit strong low-rank structure after convergence. This means that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance. Furthermore, we find that such factorized parameterizations improve the signal-to-noise ratio of stochastic gradient estimates of the variational lower bound, resulting in faster convergence.
△ Less
Submitted 5 July, 2020; v1 submitted 7 February, 2020;
originally announced February 2020.
-
How Good is the Bayes Posterior in Deep Neural Networks Really?
Authors:
Florian Wenzel,
Kevin Roth,
Bastiaan S. Veeling,
Jakub Świątkowski,
Linh Tran,
Stephan Mandt,
Jasper Snoek,
Tim Salimans,
Rodolphe Jenatton,
Sebastian Nowozin
Abstract:
During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are---as of early 2020---no publicized deployments of Bayesian neura…
▽ More
During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are---as of early 2020---no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
△ Less
Submitted 2 July, 2020; v1 submitted 6 February, 2020;
originally announced February 2020.
-
Hydra: Preserving Ensemble Diversity for Model Distillation
Authors:
Linh Tran,
Bastiaan S. Veeling,
Kevin Roth,
Jakub Swiatkowski,
Joshua V. Dillon,
Jasper Snoek,
Stephan Mandt,
Tim Salimans,
Sebastian Nowozin,
Rodolphe Jenatton
Abstract:
Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty. However, they are expensive in computation and memory. Therefore, recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble while trying to preserve its predictive behavior. Most existing d…
▽ More
Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty. However, they are expensive in computation and memory. Therefore, recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble while trying to preserve its predictive behavior. Most existing distillation formulations summarize the ensemble by capturing its average predictions. As a result, the diversity of the ensemble predictions, stemming from each member, is lost. Thus, the distilled model cannot provide a measure of uncertainty comparable to that of the original ensemble. To retain more faithfully the diversity of the ensemble, we propose a distillation method based on a single multi-headed neural network, which we refer to as Hydra. The shared body network learns a joint feature representation that enables each head to capture the predictive behavior of each ensemble member. We demonstrate that with a slight increase in parameter count, Hydra improves distillation performance on classification and regression settings while capturing the uncertainty behavior of the original ensemble over both in-domain and out-of-distribution tasks.
△ Less
Submitted 19 March, 2021; v1 submitted 14 January, 2020;
originally announced January 2020.
-
Improved GQ-CNN: Deep Learning Model for Planning Robust Grasps
Authors:
Maciej Jaśkowski,
Jakub Świątkowski,
Michał Zając,
Maciej Klimek,
Jarek Potiuk,
Piotr Rybicki,
Piotr Polatowski,
Przemysław Walczyk,
Kacper Nowicki,
Marek Cygan
Abstract:
Recent developments in the field of robot gras** have shown great improvements in the grasp success rates when dealing with unknown objects. In this work we improve on one of the most promising approaches, the Grasp Quality Convolutional Neural Network (GQ-CNN) trained on the DexNet 2.0 dataset. We propose a new architecture for the GQ-CNN and describe practical improvements that increase the mo…
▽ More
Recent developments in the field of robot gras** have shown great improvements in the grasp success rates when dealing with unknown objects. In this work we improve on one of the most promising approaches, the Grasp Quality Convolutional Neural Network (GQ-CNN) trained on the DexNet 2.0 dataset. We propose a new architecture for the GQ-CNN and describe practical improvements that increase the model validation accuracy from 92.2% to 95.8% and from 85.9% to 88.0% on respectively image-wise and object-wise training and validation splits.
△ Less
Submitted 16 February, 2018;
originally announced February 2018.
-
Discriminative k-shot learning using probabilistic models
Authors:
Matthias Bauer,
Mateo Rojas-Carulla,
Jakub Bartłomiej Świątkowski,
Bernhard Schölkopf,
Richard E. Turner
Abstract:
This paper introduces a probabilistic framework for k-shot image classification. The goal is to generalise from an initial large-scale classification task to a separate task comprising new classes and small numbers of examples. The new approach not only leverages the feature-based representation learned by a neural network from the initial task (representational transfer), but also information abo…
▽ More
This paper introduces a probabilistic framework for k-shot image classification. The goal is to generalise from an initial large-scale classification task to a separate task comprising new classes and small numbers of examples. The new approach not only leverages the feature-based representation learned by a neural network from the initial task (representational transfer), but also information about the classes (concept transfer). The concept information is encapsulated in a probabilistic model for the final layer weights of the neural network which acts as a prior for probabilistic k-shot learning. We show that even a simple probabilistic model achieves state-of-the-art on a standard k-shot learning dataset by a large margin. Moreover, it is able to accurately model uncertainty, leading to well calibrated classifiers, and is easily extensible and flexible, unlike many recent approaches to k-shot learning.
△ Less
Submitted 8 December, 2017; v1 submitted 1 June, 2017;
originally announced June 2017.