
Simulation-based Inference with the Generalized Kullback-Leibler Divergence

Authors: Benjamin Kurt Miller, Marco Federici, Christoph Weniger, Patrick Forré

Abstract: In Simulation-based Inference, the goal is to solve the inverse problem when the likelihood is only known implicitly. Neural Posterior Estimation commonly fits a normalized density estimator as a surrogate model for the posterior. This formulation cannot easily fit unnormalized surrogates because it optimizes the Kullback-Leibler divergence. We propose to optimize a generalized Kullback-Leibler divergence that accounts for the normalization constant in unnormalized distributions. The objective recovers Neural Posterior Estimation when the model class is normalized and unifies it with Neural Ratio Estimation, combining both into a single objective. We investigate a hybrid model that offers the best of both worlds by learning a normalized base distribution and a learned ratio. We also present benchmark results.

Submitted 3 October, 2023; originally announced October 2023.

Comments: Accepted at the Synergy of Scientific and Machine Learning Modeling Workshop, ICML 2023. https://syns-ml.github.io/2023/contributions/

arXiv:2306.00608 [pdf, other]

On the Effectiveness of Hybrid Mutual Information Estimation

Authors: Marco Federici, David Ruhe, Patrick Forré

Abstract: Estimating the mutual information from samples from a joint distribution is a challenging problem in both science and engineering. In this work, we realize a variational bound that generalizes both discriminative and generative approaches. Using this bound, we propose a hybrid method to mitigate their respective shortcomings. Further, we propose Predictive Quantization (PQ): a simple generative method that can be easily combined with discriminative estimators for minimal computational overhead. Our propositions yield a tighter bound on the information thanks to the reduced variance of the estimator. We test our methods on a challenging task of correlated high-dimensional Gaussian distributions and a stochastic process involving a system of free particles subjected to a fixed energy landscape. Empirical results show that hybrid methods consistently improved mutual information estimates when compared to the corresponding discriminative counterpart.

Submitted 2 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2107.09301 [pdf, other]

A Bayesian Approach to Invariant Deep Neural Networks

Authors: Nikolaos Mourdoukoutas, Marco Federici, Georges Pantalos, Mark van der Wilk, Vincent Fortuin

Abstract: We propose a novel Bayesian neural network architecture that can learn invariances from data alone by inferring a posterior distribution over different weight-sharing schemes. We show that our model outperforms other non-invariant architectures, when trained on datasets that contain specific invariances. The same holds true when no data augmentation is performed.

Submitted 2 November, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

Comments: 8 pages, 3 figures, To be published in ICML UDL 2021

arXiv:2002.07017 [pdf, other]

Learning Robust Representations via Multi-View Information Bottleneck

Authors: Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, Zeynep Akata

Abstract: The information bottleneck principle provides an information-theoretic method for representation learning, by training an encoder to retain all information which is relevant for predicting the label while minimizing the amount of other, excess information in the representation. The original formulation, however, requires labeled data to identify the superfluous information. In this work, we extend this ability to the multi-view unsupervised setting, where two views of the same underlying entity are provided but the label is unknown. This enables us to identify superfluous information as that not shared by both views. A theoretical analysis leads to the definition of a new multi-view model that produces state-of-the-art results on the Sketchy dataset and label-limited versions of the MIR-Flickr dataset. We also extend our theory to the single-view setting by taking advantage of standard data augmentation techniques, empirically showing better generalization capabilities when compared to common unsupervised approaches for representation learning.

Submitted 18 February, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

arXiv:1711.06494 [pdf, other]

Improved Bayesian Compression

Authors: Marco Federici, Karen Ullrich, Max Welling

Abstract: Compression of Neural Networks (NN) has become a highly studied topic in recent years. The main reason for this is the demand for industrial scale usage of NNs such as deploying them on mobile devices, storing them efficiently, transmitting them via band-limited channels and most importantly doing inference at scale. In this work, we propose to join the Soft-Weight Sharing and Variational Dropout approaches that show strong results to define a new state-of-the-art in terms of model compression.

Submitted 7 December, 2017; v1 submitted 17 November, 2017; originally announced November 2017.

Showing 1–5 of 5 results for author: Federici, M