Search | arXiv e-print repository

Improving Variational Autoencoder Estimation from Incomplete Data with Mixture Variational Families

Authors: Vaidotas Simkus, Michael U. Gutmann

Abstract: We consider the task of estimating variational autoencoders (VAEs) when the training data is incomplete. We show that missing data increases the complexity of the model's posterior distribution over the latent variables compared to the fully-observed case. The increased complexity may adversely affect the fit of the model due to a mismatch between the variational and model posterior distributions.… ▽ More We consider the task of estimating variational autoencoders (VAEs) when the training data is incomplete. We show that missing data increases the complexity of the model's posterior distribution over the latent variables compared to the fully-observed case. The increased complexity may adversely affect the fit of the model due to a mismatch between the variational and model posterior distributions. We introduce two strategies based on (i) finite variational-mixture and (ii) imputation-based variational-mixture distributions to address the increased posterior complexity. Through a comprehensive evaluation of the proposed approaches, we show that variational mixtures are effective at improving the accuracy of VAE estimation from incomplete data. △ Less

Submitted 27 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: Published in Transactions on Machine Learning Research (TMLR), 2024

MSC Class: 62D10 ACM Class: I.2.6; G.3

arXiv:2308.09078 [pdf, other]

Conditional Sampling of Variational Autoencoders via Iterated Approximate Ancestral Sampling

Authors: Vaidotas Simkus, Michael U. Gutmann

Abstract: Conditional sampling of variational autoencoders (VAEs) is needed in various applications, such as missing data imputation, but is computationally intractable. A principled choice for asymptotically exact conditional sampling is Metropolis-within-Gibbs (MWG). However, we observe that the tendency of VAEs to learn a structured latent space, a commonly desired property, can cause the MWG sampler to… ▽ More Conditional sampling of variational autoencoders (VAEs) is needed in various applications, such as missing data imputation, but is computationally intractable. A principled choice for asymptotically exact conditional sampling is Metropolis-within-Gibbs (MWG). However, we observe that the tendency of VAEs to learn a structured latent space, a commonly desired property, can cause the MWG sampler to get "stuck" far from the target distribution. This paper mitigates the limitations of MWG: we systematically outline the pitfalls in the context of VAEs, propose two original methods that address these pitfalls, and demonstrate an improved performance of the proposed methods on a set of sampling tasks. △ Less

Submitted 8 November, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

Comments: Published in Transactions on Machine Learning Research (TMLR), 2023

MSC Class: 62D10 ACM Class: G.3

arXiv:2305.07721 [pdf, other]

Designing Optimal Behavioral Experiments Using Machine Learning

Authors: Simon Valentin, Steven Kleinegesse, Neil R. Bramley, Peggy Seriès, Michael U. Gutmann, Christopher G. Lucas

Abstract: Computational models are powerful tools for understanding human cognition and behavior. They let us express our theories clearly and precisely, and offer predictions that can be subtle and often counter-intuitive. However, this same richness and ability to surprise means our scientific intuitions and traditional tools are ill-suited to designing experiments to test and compare these models. To avo… ▽ More Computational models are powerful tools for understanding human cognition and behavior. They let us express our theories clearly and precisely, and offer predictions that can be subtle and often counter-intuitive. However, this same richness and ability to surprise means our scientific intuitions and traditional tools are ill-suited to designing experiments to test and compare these models. To avoid these pitfalls and realize the full potential of computational modeling, we require tools to design experiments that provide clear answers about what models explain human behavior and the auxiliary assumptions those models must make. Bayesian optimal experimental design (BOED) formalizes the search for optimal experimental designs by identifying experiments that are expected to yield informative data. In this work, we provide a tutorial on leveraging recent advances in BOED and machine learning to find optimal experiments for any kind of model that we can simulate data from, and show how by-products of this procedure allow for quick and straightforward evaluation of models and their parameters against real experimental data. As a case study, we consider theories of how people balance exploration and exploitation in multi-armed bandit decision-making tasks. We validate the presented approach using simulations and a real-world experiment. As compared to experimental designs commonly used in the literature, we show that our optimal designs more efficiently determine which of a set of models best account for individual human behavior, and more efficiently characterize behavior given a preferred model. At the same time, formalizing a scientific question such that it can be adequately addressed with BOED can be challenging and we discuss several potential caveats and pitfalls that practitioners should be aware of. We provide code and tutorial notebooks to replicate all analyses. △ Less

Submitted 26 November, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

Comments: Accepted in eLife

arXiv:2305.00869 [pdf, other]

Estimating the Density Ratio between Distributions with High Discrepancy using Multinomial Logistic Regression

Authors: Akash Srivastava, Seungwook Han, Kai Xu, Benjamin Rhodes, Michael U. Gutmann

Abstract: Functions of the ratio of the densities $p/q$ are widely used in machine learning to quantify the discrepancy between the two distributions $p$ and $q$. For high-dimensional distributions, binary classification-based density ratio estimators have shown great promise. However, when densities are well separated, estimating the density ratio with a binary classifier is challenging. In this work, we s… ▽ More Functions of the ratio of the densities $p/q$ are widely used in machine learning to quantify the discrepancy between the two distributions $p$ and $q$. For high-dimensional distributions, binary classification-based density ratio estimators have shown great promise. However, when densities are well separated, estimating the density ratio with a binary classifier is challenging. In this work, we show that the state-of-the-art density ratio estimators perform poorly on well-separated cases and demonstrate that this is due to distribution shifts between training and evaluation time. We present an alternative method that leverages multi-class classification for density ratio estimation and does not suffer from distribution shift issues. The method uses a set of auxiliary densities $\{m_k\}_{k=1}^K$ and trains a multi-class logistic regression to classify the samples from $p, q$, and $\{m_k\}_{k=1}^K$ into $K+2$ classes. We show that if these auxiliary densities are constructed such that they overlap with $p$ and $q$, then a multi-class logistic regression allows for estimating $\log p/q$ on the domain of any of the $K+2$ distributions and resolves the distribution shift problems of the current state-of-the-art methods. We compare our method to state-of-the-art density ratio estimators on both synthetic and real datasets and demonstrate its superior performance on the tasks of density ratio estimation, mutual information estimation, and representation learning. Code: https://www.blackswhan.com/mdre/ △ Less

Submitted 1 May, 2023; originally announced May 2023.

Journal ref: TMLR 2023

arXiv:2208.02704 [pdf, other]

Bayesian Optimization with Informative Covariance

Authors: Afonso Eduardo, Michael U. Gutmann

Abstract: Bayesian optimization is a methodology for global optimization of unknown and expensive objectives. It combines a surrogate Bayesian regression model with an acquisition function to decide where to evaluate the objective. Typical regression models are given by Gaussian processes with stationary covariance functions. However, these functions are unable to express prior input-dependent information,… ▽ More Bayesian optimization is a methodology for global optimization of unknown and expensive objectives. It combines a surrogate Bayesian regression model with an acquisition function to decide where to evaluate the objective. Typical regression models are given by Gaussian processes with stationary covariance functions. However, these functions are unable to express prior input-dependent information, including possible locations of the optimum. The ubiquity of stationary models has led to the common practice of exploiting prior information via informative mean functions. In this paper, we highlight that these models can perform poorly, especially in high dimensions. We propose novel informative covariance functions for optimization, leveraging nonstationarity to encode preferences for certain regions of the search space and adaptively promote local exploration during optimization. We demonstrate that the proposed functions can increase the sample efficiency of Bayesian optimization in high dimensions, even under weak prior information. △ Less

Submitted 1 April, 2023; v1 submitted 4 August, 2022; originally announced August 2022.

Journal ref: Transactions on Machine Learning Research (TMLR), 2023. URL: https://openreview.net/forum?id=JwgVBv18RG

arXiv:2206.13446 [pdf, other]

Pen and Paper Exercises in Machine Learning

Authors: Michael U. Gutmann

Abstract: This is a collection of (mostly) pen-and-paper exercises in machine learning. The exercises are on the following topics: linear algebra, optimisation, directed graphical models, undirected graphical models, expressive power of graphical models, factor graphs and message passing, inference for hidden Markov models, model-based learning (including ICA and unnormalised models), sampling and Monte-Car… ▽ More This is a collection of (mostly) pen-and-paper exercises in machine learning. The exercises are on the following topics: linear algebra, optimisation, directed graphical models, undirected graphical models, expressive power of graphical models, factor graphs and message passing, inference for hidden Markov models, model-based learning (including ICA and unnormalised models), sampling and Monte-Carlo integration, and variational inference. △ Less

Submitted 27 June, 2022; originally announced June 2022.

Comments: The associated github page is https://github.com/michaelgutmann/ml-pen-and-paper-exercises

arXiv:2204.13999 [pdf, other]

Statistical applications of contrastive learning

Authors: Michael U. Gutmann, Steven Kleinegesse, Benjamin Rhodes

Abstract: The likelihood function plays a crucial role in statistical inference and experimental design. However, it is computationally intractable for several important classes of statistical models, including energy-based models and simulator-based models. Contrastive learning is an intuitive and computationally feasible alternative to likelihood-based learning. We here first provide an introduction to co… ▽ More The likelihood function plays a crucial role in statistical inference and experimental design. However, it is computationally intractable for several important classes of statistical models, including energy-based models and simulator-based models. Contrastive learning is an intuitive and computationally feasible alternative to likelihood-based learning. We here first provide an introduction to contrastive learning and then show how we can use it to derive methods for diverse statistical problems, namely parameter estimation for energy-based models, Bayesian inference for simulator-based models, as well as experimental design. △ Less

Submitted 29 April, 2022; originally announced April 2022.

Comments: Accepted to Behaviormetrika

arXiv:2111.13180 [pdf, other]

Variational Gibbs Inference for Statistical Model Estimation from Incomplete Data

Authors: Vaidotas Simkus, Benjamin Rhodes, Michael U. Gutmann

Abstract: Statistical models are central to machine learning with broad applicability across a range of downstream tasks. The models are controlled by free parameters that are typically estimated from data by maximum-likelihood estimation or approximations thereof. However, when faced with real-world data sets many of the models run into a critical issue: they are formulated in terms of fully-observed data,… ▽ More Statistical models are central to machine learning with broad applicability across a range of downstream tasks. The models are controlled by free parameters that are typically estimated from data by maximum-likelihood estimation or approximations thereof. However, when faced with real-world data sets many of the models run into a critical issue: they are formulated in terms of fully-observed data, whereas in practice the data sets are plagued with missing data. The theory of statistical model estimation from incomplete data is conceptually similar to the estimation of latent-variable models, where powerful tools such as variational inference (VI) exist. However, in contrast to standard latent-variable models, parameter estimation with incomplete data often requires estimating exponentially-many conditional distributions of the missing variables, hence making standard VI methods intractable. We address this gap by introducing variational Gibbs inference (VGI), a new general-purpose method to estimate the parameters of statistical models from incomplete data. We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models such as variational autoencoders and normalising flows from incomplete data. The proposed method, whilst general-purpose, achieves competitive or better performance than existing model-specific estimation methods. △ Less

Submitted 15 August, 2023; v1 submitted 25 November, 2021; originally announced November 2021.

Comments: Published at Journal of Machine Learning Research (JMLR)

MSC Class: 62D10 ACM Class: I.2.6; G.3

Journal ref: Journal of Machine Learning Research, 24(196), 1-72, 2023

arXiv:2111.02329 [pdf, other]

Implicit Deep Adaptive Design: Policy-Based Experimental Design without Likelihoods

Authors: Desi R. Ivanova, Adam Foster, Steven Kleinegesse, Michael U. Gutmann, Tom Rainforth

Abstract: We introduce implicit Deep Adaptive Design (iDAD), a new method for performing adaptive experiments in real-time with implicit models. iDAD amortizes the cost of Bayesian optimal experimental design (BOED) by learning a design policy network upfront, which can then be deployed quickly at the time of the experiment. The iDAD network can be trained on any model which simulates differentiable samples… ▽ More We introduce implicit Deep Adaptive Design (iDAD), a new method for performing adaptive experiments in real-time with implicit models. iDAD amortizes the cost of Bayesian optimal experimental design (BOED) by learning a design policy network upfront, which can then be deployed quickly at the time of the experiment. The iDAD network can be trained on any model which simulates differentiable samples, unlike previous design policy work that requires a closed form likelihood and conditionally independent experiments. At deployment, iDAD allows design decisions to be made in milliseconds, in contrast to traditional BOED approaches that require heavy computation during the experiment itself. We illustrate the applicability of iDAD on a number of experiments, and show that it provides a fast and effective mechanism for performing adaptive design with implicit models. △ Less

Submitted 3 November, 2021; originally announced November 2021.

Comments: 33 pages, 8 figures. Published as a conference paper at NeurIPS 2021

arXiv:2110.15632 [pdf, other]

Bayesian Optimal Experimental Design for Simulator Models of Cognition

Authors: Simon Valentin, Steven Kleinegesse, Neil R. Bramley, Michael U. Gutmann, Christopher G. Lucas

Abstract: Bayesian optimal experimental design (BOED) is a methodology to identify experiments that are expected to yield informative data. Recent work in cognitive science considered BOED for computational models of human behavior with tractable and known likelihood functions. However, tractability often comes at the cost of realism; simulator models that can capture the richness of human behavior are ofte… ▽ More Bayesian optimal experimental design (BOED) is a methodology to identify experiments that are expected to yield informative data. Recent work in cognitive science considered BOED for computational models of human behavior with tractable and known likelihood functions. However, tractability often comes at the cost of realism; simulator models that can capture the richness of human behavior are often intractable. In this work, we combine recent advances in BOED and approximate inference for intractable models, using machine-learning methods to find optimal experimental designs, approximate sufficient summary statistics and amortized posterior distributions. Our simulation experiments on multi-armed bandit tasks show that our method results in improved model discrimination and parameter estimation, as compared to experimental designs commonly used in the literature. △ Less

Submitted 29 October, 2021; originally announced October 2021.

Comments: Accepted as a poster at the NeurIPS 2021 Workshop "AI for Science"

arXiv:2105.04379 [pdf, other]

Gradient-based Bayesian Experimental Design for Implicit Models using Mutual Information Lower Bounds

Authors: Steven Kleinegesse, Michael U. Gutmann

Abstract: We introduce a framework for Bayesian experimental design (BED) with implicit models, where the data-generating distribution is intractable but sampling from it is still possible. In order to find optimal experimental designs for such models, our approach maximises mutual information lower bounds that are parametrised by neural networks. By training a neural network on sampled data, we simultaneou… ▽ More We introduce a framework for Bayesian experimental design (BED) with implicit models, where the data-generating distribution is intractable but sampling from it is still possible. In order to find optimal experimental designs for such models, our approach maximises mutual information lower bounds that are parametrised by neural networks. By training a neural network on sampled data, we simultaneously update network parameters and designs using stochastic gradient-ascent. The framework enables experimental design with a variety of prominent lower bounds and can be applied to a wide range of scientific tasks, such as parameter estimation, model discrimination and improving future predictions. Using a set of intractable toy models, we provide a comprehensive empirical comparison of prominent lower bounds applied to the aforementioned tasks. We further validate our framework on a challenging system of stochastic differential equations from epidemiology. △ Less

Submitted 10 May, 2021; originally announced May 2021.

Comments: Under review

MSC Class: 62K05;

arXiv:2006.12204 [pdf, other]

Telesco** Density-Ratio Estimation

Authors: Benjamin Rhodes, Kai Xu, Michael U. Gutmann

Abstract: Density-ratio estimation via classification is a cornerstone of unsupervised learning. It has provided the foundation for state-of-the-art methods in representation learning and generative modelling, with the number of use-cases continuing to proliferate. However, it suffers from a critical limitation: it fails to accurately estimate ratios p/q for which the two densities differ significantly. Emp… ▽ More Density-ratio estimation via classification is a cornerstone of unsupervised learning. It has provided the foundation for state-of-the-art methods in representation learning and generative modelling, with the number of use-cases continuing to proliferate. However, it suffers from a critical limitation: it fails to accurately estimate ratios p/q for which the two densities differ significantly. Empirically, we find this occurs whenever the KL divergence between p and q exceeds tens of nats. To resolve this limitation, we introduce a new framework, telesco** density-ratio estimation (TRE), that enables the estimation of ratios between highly dissimilar densities in high-dimensional spaces. Our experiments demonstrate that TRE can yield substantial improvements over existing single-ratio methods for mutual information estimation, representation learning and energy-based modelling. △ Less

Submitted 24 November, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

arXiv:2003.09379 [pdf, other]

Sequential Bayesian Experimental Design for Implicit Models via Mutual Information

Authors: Steven Kleinegesse, Christopher Drovandi, Michael U. Gutmann

Abstract: Bayesian experimental design (BED) is a framework that uses statistical models and decision making under uncertainty to optimise the cost and performance of a scientific experiment. Sequential BED, as opposed to static BED, considers the scenario where we can sequentially update our beliefs about the model parameters through data gathered in the experiment. A class of models of particular interest… ▽ More Bayesian experimental design (BED) is a framework that uses statistical models and decision making under uncertainty to optimise the cost and performance of a scientific experiment. Sequential BED, as opposed to static BED, considers the scenario where we can sequentially update our beliefs about the model parameters through data gathered in the experiment. A class of models of particular interest for the natural and medical sciences are implicit models, where the data generating distribution is intractable, but sampling from it is possible. Even though there has been a lot of work on static BED for implicit models in the past few years, the notoriously difficult problem of sequential BED for implicit models has barely been touched upon. We address this gap in the literature by devising a novel sequential design framework for parameter estimation that uses the Mutual Information (MI) between model parameters and simulated data as a utility function to find optimal experimental designs, which has not been done before for implicit models. Our approach uses likelihood-free inference by ratio estimation to simultaneously estimate posterior distributions and the MI. During the sequential BED procedure we utilise Bayesian optimisation to help us optimise the MI utility. We find that our framework is efficient for the various implicit models tested, yielding accurate parameter estimates after only a few iterations. △ Less

Submitted 20 March, 2020; originally announced March 2020.

MSC Class: 62K05; 62L05

arXiv:2002.08129 [pdf, other]

Bayesian Experimental Design for Implicit Models by Mutual Information Neural Estimation

Authors: Steven Kleinegesse, Michael U. Gutmann

Abstract: Implicit stochastic models, where the data-generation distribution is intractable but sampling is possible, are ubiquitous in the natural sciences. The models typically have free parameters that need to be inferred from data collected in scientific experiments. A fundamental question is how to design the experiments so that the collected data are most useful. The field of Bayesian experimental des… ▽ More Implicit stochastic models, where the data-generation distribution is intractable but sampling is possible, are ubiquitous in the natural sciences. The models typically have free parameters that need to be inferred from data collected in scientific experiments. A fundamental question is how to design the experiments so that the collected data are most useful. The field of Bayesian experimental design advocates that, ideally, we should choose designs that maximise the mutual information (MI) between the data and the parameters. For implicit models, however, this approach is severely hampered by the high computational cost of computing posteriors and maximising MI, in particular when we have more than a handful of design variables to optimise. In this paper, we propose a new approach to Bayesian experimental design for implicit models that leverages recent advances in neural MI estimation to deal with these issues. We show that training a neural network to maximise a lower bound on MI allows us to jointly determine the optimal design and the posterior. Simulation studies illustrate that this gracefully extends Bayesian experimental design for implicit models to higher design dimensions. △ Less

Submitted 14 August, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: Accepted at the thirty-seventh International Conference on Machine Learning (ICML) 2020. Camera-ready version

MSC Class: 62K05 (Primary) ACM Class: G.3

arXiv:1907.01505 [pdf, other]

Adaptive Approximate Bayesian Computation Tolerance Selection

Authors: Umberto Simola, Jessica Cisewski-Kehe, Michael U. Gutmann, Jukka Corander

Abstract: Approximate Bayesian Computation (ABC) methods are increasingly used for inference in situations in which the likelihood function is either computationally costly or intractable to evaluate. Extensions of the basic ABC rejection algorithm have improved the computational efficiency of the procedure and broadened its applicability. The ABC-Population Monte Carlo (ABC-PMC) approach of Beaumont et al.… ▽ More Approximate Bayesian Computation (ABC) methods are increasingly used for inference in situations in which the likelihood function is either computationally costly or intractable to evaluate. Extensions of the basic ABC rejection algorithm have improved the computational efficiency of the procedure and broadened its applicability. The ABC-Population Monte Carlo (ABC-PMC) approach of Beaumont et al. (2009) has become a popular choice for approximate sampling from the posterior. ABC-PMC is a sequential sampler with an iteratively decreasing value of the tolerance, which specifies how close the simulated data need to be to the real data for acceptance. We propose a method for adaptively selecting a sequence of tolerances that improves the computational efficiency of the algorithm over other common techniques. In addition we define a stop** rule as a by-product of the adaptation procedure, which assists in automating termination of sampling. The proposed automatic ABC-PMC algorithm can be easily implemented and we present several examples demonstrating its benefits in terms of computational efficiency. △ Less

Submitted 30 April, 2020; v1 submitted 21 June, 2019; originally announced July 2019.

Comments: 26 pages, 8 figures

arXiv:1904.02431 [pdf, other]

To Stir or Not to Stir: Online Estimation of Liquid Properties for Pouring Actions

Authors: Tatiana Lopez-Guevara, Rita Pucci, Nicholas Taylor, Michael U. Gutmann, Subramanian Ramamoorthy, Kartic Subr

Abstract: Our brains are able to exploit coarse physical models of fluids to solve everyday manipulation tasks. There has been considerable interest in develo** such a capability in robots so that they can autonomously manipulate fluids adapting to different conditions. In this paper, we investigate the problem of adaptation to liquids with different characteristics. We develop a simple calibration task (… ▽ More Our brains are able to exploit coarse physical models of fluids to solve everyday manipulation tasks. There has been considerable interest in develo** such a capability in robots so that they can autonomously manipulate fluids adapting to different conditions. In this paper, we investigate the problem of adaptation to liquids with different characteristics. We develop a simple calibration task (stirring with a stick) that enables rapid inference of the parameters of the liquid from RBG data. We perform the inference in the space of simulation parameters rather than on physically accurate parameters. This facilitates prediction and optimization tasks since the inferred parameters may be fed directly to the simulator. We demonstrate that our "stirring" learner performs better than when the robot is calibrated with pouring actions. We show that our method is able to infer properties of three different liquids -- water, glycerin and gel -- and present experimental results by executing stirring and pouring actions on a UR10. We believe that decoupling of the training actions from the goal task is an important step towards simple, autonomous learning of the behavior of different fluids in unstructured environments. △ Less

Submitted 4 April, 2019; originally announced April 2019.

Comments: Presented at the Modeling the Physical World: Perception, Learning, and Control Workshop (NeurIPS) 2018

arXiv:1904.00670 [pdf, other]

Robust Optimisation Monte Carlo

Authors: Borislav Ikonomov, Michael U. Gutmann

Abstract: This paper is on Bayesian inference for parametric statistical models that are defined by a stochastic simulator which specifies how data is generated. Exact sampling is then possible but evaluating the likelihood function is typically prohibitively expensive. Approximate Bayesian Computation (ABC) is a framework to perform approximate inference in such situations. While basic ABC algorithms are w… ▽ More This paper is on Bayesian inference for parametric statistical models that are defined by a stochastic simulator which specifies how data is generated. Exact sampling is then possible but evaluating the likelihood function is typically prohibitively expensive. Approximate Bayesian Computation (ABC) is a framework to perform approximate inference in such situations. While basic ABC algorithms are widely applicable, they are notoriously slow and much research has focused on increasing their efficiency. Optimisation Monte Carlo (OMC) has recently been proposed as an efficient and embarrassingly parallel method that leverages optimisation to accelerate the inference. In this paper, we demonstrate an important previously unrecognised failure mode of OMC: It generates strongly overconfident approximations by collapsing regions of similar or near-constant likelihood into a single point. We propose an efficient, robust generalisation of OMC that corrects this. It makes fewer assumptions, retains the main benefits of OMC, and can be performed either as post-processing to OMC or as a stand-alone computation. We demonstrate the effectiveness of the proposed Robust OMC on toy examples and tasks in inverse-graphics where we perform Bayesian inference with a complex image renderer. △ Less

Submitted 28 February, 2020; v1 submitted 1 April, 2019; originally announced April 2019.

Comments: 8 pages + 6 page appendix; v2: made clarifications, added a second possible algorithm implementation and its results; v3: small clarifications, to be published in AISTATS 2020

arXiv:1902.10704 [pdf, other]

Adaptive Gaussian Copula ABC

Authors: Yanzhi Chen, Michael U. Gutmann

Abstract: Approximate Bayesian computation (ABC) is a set of techniques for Bayesian inference when the likelihood is intractable but sampling from the model is possible. This work presents a simple yet effective ABC algorithm based on the combination of two classical ABC approaches --- regression ABC and sequential ABC. The key idea is that rather than learning the posterior directly, we first target anoth… ▽ More Approximate Bayesian computation (ABC) is a set of techniques for Bayesian inference when the likelihood is intractable but sampling from the model is possible. This work presents a simple yet effective ABC algorithm based on the combination of two classical ABC approaches --- regression ABC and sequential ABC. The key idea is that rather than learning the posterior directly, we first target another auxiliary distribution that can be learned accurately by existing methods, through which we then subsequently learn the desired posterior with the help of a Gaussian copula. During this process, the complexity of the model changes adaptively according to the data at hand. Experiments on a synthetic dataset as well as three real-world inference tasks demonstrates that the proposed method is fast, accurate, and easy to use. △ Less

Submitted 27 February, 2019; originally announced February 2019.

Comments: 8 pages, 5 figures, accepted to AISTATS 2019

arXiv:1810.09899 [pdf, other]

Dynamic Likelihood-free Inference via Ratio Estimation (DIRE)

Authors: Traiko Dinev, Michael U. Gutmann

Abstract: Parametric statistical models that are implicitly defined in terms of a stochastic data generating process are used in a wide range of scientific disciplines because they enable accurate modeling. However, learning the parameters from observed data is generally very difficult because their likelihood function is typically intractable. Likelihood-free Bayesian inference methods have been proposed w… ▽ More Parametric statistical models that are implicitly defined in terms of a stochastic data generating process are used in a wide range of scientific disciplines because they enable accurate modeling. However, learning the parameters from observed data is generally very difficult because their likelihood function is typically intractable. Likelihood-free Bayesian inference methods have been proposed which include the frameworks of approximate Bayesian computation (ABC), synthetic likelihood, and its recent generalization that performs likelihood-free inference by ratio estimation (LFIRE). A major difficulty in all these methods is choosing summary statistics that reduce the dimensionality of the data to facilitate inference. While several methods for choosing summary statistics have been proposed for ABC, the literature for synthetic likelihood and LFIRE is very thin to date. We here address this gap in the literature, focusing on the important special case of time-series models. We show that convolutional neural networks trained to predict the input parameters from the data provide suitable summary statistics for LFIRE. On a wide range of time-series models, a single neural network architecture produced equally or more accurate posteriors than alternative methods. △ Less

Submitted 23 October, 2018; originally announced October 2018.

Comments: For a demo, see https://traiko.com/pages/research/lfire/

arXiv:1806.03664 [pdf, other]

Conditional Noise-Contrastive Estimation of Unnormalised Models

Authors: Ciwan Ceylan, Michael U. Gutmann

Abstract: Many parametric statistical models are not properly normalised and only specified up to an intractable partition function, which renders parameter estimation difficult. Examples of unnormalised models are Gibbs distributions, Markov random fields, and neural network models in unsupervised deep learning. In previous work, the estimation principle called noise-contrastive estimation (NCE) was introd… ▽ More Many parametric statistical models are not properly normalised and only specified up to an intractable partition function, which renders parameter estimation difficult. Examples of unnormalised models are Gibbs distributions, Markov random fields, and neural network models in unsupervised deep learning. In previous work, the estimation principle called noise-contrastive estimation (NCE) was introduced where unnormalised models are estimated by learning to distinguish between data and auxiliary noise. An open question is how to best choose the auxiliary noise distribution. We here propose a new method that addresses this issue. The proposed method shares with NCE the idea of formulating density estimation as a supervised learning problem but in contrast to NCE, the proposed method leverages the observed data when generating noise samples. The noise can thus be generated in a semi-automated manner. We first present the underlying theory of the new method, show that score matching emerges as a limiting case, validate the method on continuous and discrete valued synthetic data, and show that we can expect an improved performance compared to NCE when the data lie in a lower-dimensional manifold. Then we demonstrate its applicability in unsupervised deep learning by estimating a four-layer neural image model. △ Less

Submitted 10 June, 2018; originally announced June 2018.

Comments: Accepted to ICML 2018

arXiv:1806.00101 [pdf, other]

Generative Ratio Matching Networks

Authors: Akash Srivastava, Kai Xu, Michael U. Gutmann, Charles Sutton

Abstract: Deep generative models can learn to generate realistic-looking images, but many of the most effective methods are adversarial and involve a saddlepoint optimization, which requires a careful balancing of training between a generator network and a critic network. Maximum mean discrepancy networks (MMD-nets) avoid this issue by using kernel as a fixed adversary, but unfortunately, they have not on t… ▽ More Deep generative models can learn to generate realistic-looking images, but many of the most effective methods are adversarial and involve a saddlepoint optimization, which requires a careful balancing of training between a generator network and a critic network. Maximum mean discrepancy networks (MMD-nets) avoid this issue by using kernel as a fixed adversary, but unfortunately, they have not on their own been able to match the generative quality of adversarial training. In this work, we take their insight of using kernels as fixed adversaries further and present a novel method for training deep generative models that does not involve saddlepoint optimization. We call our method generative ratio matching or GRAM for short. In GRAM, the generator and the critic networks do not play a zero-sum game against each other, instead, they do so against a fixed kernel. Thus GRAM networks are not only stable to train like MMD-nets but they also match and beat the generative quality of adversarially trained generative networks. △ Less

Submitted 14 February, 2020; v1 submitted 31 May, 2018; originally announced June 2018.

Comments: ICLR 2020; Code: https://github.com/GRAM-nets

arXiv:1708.09274 [pdf, other]

doi 10.1038/s41524-019-0175-2

Efficient Bayesian Inference of Atomistic Structure in Complex Functional Materials

Authors: Milica Todorović, Michael U. Gutmann, Jukka Corander, Patrick Rinke

Abstract: Tailoring the functional properties of advanced organic/inorganic heterogeonous devices to their intended technological applications requires knowledge and control of the microscopic structure inside the device. Atomistic quantum mechanical simulation methods deliver accurate energies and properties for individual configurations, however, finding the most favourable configurations remains computat… ▽ More Tailoring the functional properties of advanced organic/inorganic heterogeonous devices to their intended technological applications requires knowledge and control of the microscopic structure inside the device. Atomistic quantum mechanical simulation methods deliver accurate energies and properties for individual configurations, however, finding the most favourable configurations remains computationally prohibitive. We propose a 'building block'-based Bayesian Optimisation Structure Search (BOSS) approach for addressing extended organic/inorganic interface problems and demonstrate its feasibility in a molecular surface adsorption study. In BOSS, a likelihood-free Bayesian scheme accelerates the identification of material energy landscapes with the number of sampled configurations during active learning, enabling structural inference with high chemical accuracy and featuring large simulation cells. This allowed us to identify several most favourable molecular adsorption configurations for $\mathrm{C}_{60}$ on the (101) surface of $\mathrm{TiO}_2$ anatase and clarify the key molecule-surface interactions governing structural assembly. Inferred structures were in good agreement with detailed experimental images of this surface adsorbate, demonstrating good predictive power of BOSS and opening the route towards large-scale surface adsorption studies of molecular aggregates and films. △ Less

Submitted 12 March, 2019; v1 submitted 30 August, 2017; originally announced August 2017.

Comments: 8 pages, 5 figures. Peer-reviewed version

arXiv:1708.00707 [pdf, other]

ELFI: Engine for Likelihood-Free Inference

Authors: Jarno Lintusaari, Henri Vuollekoski, Antti Kangasrääsiö, Kusti Skytén, Marko Järvenpää, Pekka Marttinen, Michael U. Gutmann, Aki Vehtari, Jukka Corander, Samuel Kaski

Abstract: Engine for Likelihood-Free Inference (ELFI) is a Python software library for performing likelihood-free inference (LFI). ELFI provides a convenient syntax for arranging components in LFI, such as priors, simulators, summaries or distances, to a network called ELFI graph. The components can be implemented in a wide variety of languages. The stand-alone ELFI graph can be used with any of the availab… ▽ More Engine for Likelihood-Free Inference (ELFI) is a Python software library for performing likelihood-free inference (LFI). ELFI provides a convenient syntax for arranging components in LFI, such as priors, simulators, summaries or distances, to a network called ELFI graph. The components can be implemented in a wide variety of languages. The stand-alone ELFI graph can be used with any of the available inference methods without modifications. A central method implemented in ELFI is Bayesian Optimization for Likelihood-Free Inference (BOLFI), which has recently been shown to accelerate likelihood-free inference up to several orders of magnitude by surrogate-modelling the distance. ELFI also has an inbuilt support for output data storing for reuse and analysis, and supports parallelization of computation from multiple cores up to a cluster environment. ELFI is designed to be extensible and provides interfaces for widening its functionality. This makes the adding of new inference methods to ELFI straightforward and automatically compatible with the inbuilt features. △ Less

Submitted 5 July, 2018; v1 submitted 2 August, 2017; originally announced August 2017.

Journal ref: Journal of Machine Learning Research, 19(16):1-7, 2018. http://jmlr.org/papers/v19/17-374.html

arXiv:1705.07761 [pdf, other]

VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning

Authors: Akash Srivastava, Lazar Valkov, Chris Russell, Michael U. Gutmann, Charles Sutton

Abstract: Deep generative models provide powerful tools for distributions over complicated manifolds, such as those of natural images. But many of these methods, including generative adversarial networks (GANs), can be difficult to train, in part because they are prone to mode collapse, which means that they characterize only a few modes of the true distribution. To address this, we introduce VEEGAN, which… ▽ More Deep generative models provide powerful tools for distributions over complicated manifolds, such as those of natural images. But many of these methods, including generative adversarial networks (GANs), can be difficult to train, in part because they are prone to mode collapse, which means that they characterize only a few modes of the true distribution. To address this, we introduce VEEGAN, which features a reconstructor network, reversing the action of the generator by map** from data to noise. Our training objective retains the original asymptotic consistency guarantee of GANs, and can be interpreted as a novel autoencoder loss over the noise. In sharp contrast to a traditional autoencoder over data points, VEEGAN does not require specifying a loss function over the data, but rather only over the representations, which are standard normal by assumption. On an extensive set of synthetic and real world image datasets, VEEGAN indeed resists mode collapsing to a far greater extent than other recent GAN variants, and produces more realistic samples. △ Less

Submitted 6 November, 2017; v1 submitted 22 May, 2017; originally announced May 2017.

Comments: Published as a conference paper at NIPS, 2017

arXiv:1704.00520 [pdf, other]

doi 10.1214/18-BA1121

Efficient acquisition rules for model-based approximate Bayesian computation

Authors: Marko Järvenpää, Michael U. Gutmann, Arijus Pleska, Aki Vehtari, Pekka Marttinen

Abstract: Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimisation (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimisation enables… ▽ More Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimisation (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimisation enables one to intelligently decide where to evaluate the model next but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density, which is due to a lack of simulations to estimate this quantity accurately, and define a loss function that measures this uncertainty. We then propose to select the next evaluation location to minimise the expected loss. Experiments show that the proposed method often produces the most accurate approximations as compared to common BO strategies. △ Less

Submitted 8 August, 2018; v1 submitted 3 April, 2017; originally announced April 2017.

Comments: 30 pages, 10 figures

arXiv:1611.10242 [pdf, other]

Likelihood-free inference by ratio estimation

Authors: Owen Thomas, Ritabrata Dutta, Jukka Corander, Samuel Kaski, Michael U. Gutmann

Abstract: We consider the problem of parametric statistical inference when likelihood computations are prohibitively expensive but sampling from the model is possible. Several so-called likelihood-free methods have been developed to perform inference in the absence of a likelihood function. The popular synthetic likelihood approach infers the parameters by modelling summary statistics of the data by a Gauss… ▽ More We consider the problem of parametric statistical inference when likelihood computations are prohibitively expensive but sampling from the model is possible. Several so-called likelihood-free methods have been developed to perform inference in the absence of a likelihood function. The popular synthetic likelihood approach infers the parameters by modelling summary statistics of the data by a Gaussian probability distribution. In another popular approach called approximate Bayesian computation, the inference is performed by identifying parameter values for which the summary statistics of the simulated data are close to those of the observed data. Synthetic likelihood is easier to use as no measure of `closeness' is required but the Gaussianity assumption is often limiting. Moreover, both approaches require judiciously chosen summary statistics. We here present an alternative inference approach that is as easy to use as synthetic likelihood but not as restricted in its assumptions, and that, in a natural way, enables automatic selection of relevant summary statistic from a large set of candidates. The basic idea is to frame the problem of estimating the posterior as a problem of estimating the ratio between the data generating distribution and the marginal distribution. This problem can be solved by logistic regression, and including regularising penalty terms enables automatic selection of the summary statistics relevant to the inference task. We illustrate the general theory on canonical examples and employ it to perform inference for challenging stochastic nonlinear dynamical systems and high-dimensional summary statistics. △ Less

Submitted 11 September, 2020; v1 submitted 30 November, 2016; originally announced November 2016.

Comments: Accepted to Bayesian Analysis (2020)

arXiv:1506.05666 [pdf, other]

Simultaneous Estimation of Non-Gaussian Components and their Correlation Structure

Authors: Hiroaki Sasaki, Michael U. Gutmann, Hayaru Shouno, Aapo Hyvärinen

Abstract: The statistical dependencies which independent component analysis (ICA) cannot remove often provide rich information beyond the linear independent components. It would thus be very useful to estimate the dependency structure from data. While such models have been proposed, they usually concentrated on higher-order correlations such as energy (square) correlations. Yet, linear correlations are a mo… ▽ More The statistical dependencies which independent component analysis (ICA) cannot remove often provide rich information beyond the linear independent components. It would thus be very useful to estimate the dependency structure from data. While such models have been proposed, they usually concentrated on higher-order correlations such as energy (square) correlations. Yet, linear correlations are a most fundamental and informative form of dependency in many real data sets. Linear correlations are usually completely removed by ICA and related methods, so they can only be analyzed by develo** new methods which explicitly allow for linearly correlated components. In this paper, we propose a probabilistic model of linear non-Gaussian components which are allowed to have both linear and energy correlations. The precision matrix of the linear components is assumed to be randomly generated by a higher-order process and explicitly parametrized by a parameter matrix. The estimation of the parameter matrix is shown to be particularly simple because using score matching, the objective function is a quadratic form. Using simulations with artificial data, we demonstrate that the proposed method improves identifiability of non-Gaussian components by simultaneously learning their correlation structure. Applications on simulated complex cells with natural image input, as well as spectrograms of natural audio data show that the method finds new kinds of dependencies between the components. △ Less

Submitted 27 July, 2017; v1 submitted 18 June, 2015; originally announced June 2015.

arXiv:1502.05503 [pdf, ps, other]

Classification and Bayesian Optimization for Likelihood-Free Inference

Authors: Michael U. Gutmann, Jukka Corander, Ritabrata Dutta, Samuel Kaski

Abstract: Some statistical models are specified via a data generating process for which the likelihood function cannot be computed in closed form. Standard likelihood-based inference is then not feasible but the model parameters can be inferred by finding the values which yield simulated data that resemble the observed data. This approach faces at least two major difficulties: The first difficulty is the ch… ▽ More Some statistical models are specified via a data generating process for which the likelihood function cannot be computed in closed form. Standard likelihood-based inference is then not feasible but the model parameters can be inferred by finding the values which yield simulated data that resemble the observed data. This approach faces at least two major difficulties: The first difficulty is the choice of the discrepancy measure which is used to judge whether the simulated data resemble the observed data. The second difficulty is the computationally efficient identification of regions in the parameter space where the discrepancy is low. We give here an introduction to our recent work where we tackle the two difficulties through classification and Bayesian optimization. △ Less

Submitted 19 February, 2015; originally announced February 2015.

arXiv:1501.03291 [pdf, other]

Bayesian Optimization for Likelihood-Free Inference of Simulator-Based Statistical Models

Authors: Michael U. Gutmann, Jukka Corander

Abstract: Our paper deals with inferring simulator-based statistical models given some observed data. A simulator-based model is a parametrized mechanism which specifies how data are generated. It is thus also referred to as generative model. We assume that only a finite number of parameters are of interest and allow the generative process to be very general; it may be a noisy nonlinear dynamical system wit… ▽ More Our paper deals with inferring simulator-based statistical models given some observed data. A simulator-based model is a parametrized mechanism which specifies how data are generated. It is thus also referred to as generative model. We assume that only a finite number of parameters are of interest and allow the generative process to be very general; it may be a noisy nonlinear dynamical system with an unrestricted number of hidden variables. This weak assumption is useful for devising realistic models but it renders statistical inference very difficult. The main challenge is the intractability of the likelihood function. Several likelihood-free inference methods have been proposed which share the basic idea of identifying the parameters by finding values for which the discrepancy between simulated and observed data is small. A major obstacle to using these methods is their computational cost. The cost is largely due to the need to repeatedly simulate data sets and the lack of knowledge about how the parameters affect the discrepancy. We propose a strategy which combines probabilistic modeling of the discrepancy with optimization to facilitate likelihood-free inference. The strategy is implemented using Bayesian optimization and is shown to accelerate the inference through a reduction in the number of required simulations by several orders of magnitude. △ Less

Submitted 31 December, 2015; v1 submitted 14 January, 2015; originally announced January 2015.

Comments: In press with the Journal of Machine Learning Research (JMLR). Accepted August 17, 2015

arXiv:1407.4981 [pdf, other]

Likelihood-free inference via classification

Authors: Michael U. Gutmann, Ritabrata Dutta, Samuel Kaski, Jukka Corander

Abstract: Increasingly complex generative models are being used across disciplines as they allow for realistic characterization of data, but a common difficulty with them is the prohibitively large computational cost to evaluate the likelihood function and thus to perform likelihood-based statistical inference. A likelihood-free inference framework has emerged where the parameters are identified by finding… ▽ More Increasingly complex generative models are being used across disciplines as they allow for realistic characterization of data, but a common difficulty with them is the prohibitively large computational cost to evaluate the likelihood function and thus to perform likelihood-based statistical inference. A likelihood-free inference framework has emerged where the parameters are identified by finding values that yield simulated data resembling the observed data. While widely applicable, a major difficulty in this framework is how to measure the discrepancy between the simulated and observed data. Transforming the original problem into a problem of classifying the data into simulated versus observed, we find that classification accuracy can be used to assess the discrepancy. The complete arsenal of classification methods becomes thereby available for inference of intractable generative models. We validate our approach using theory and simulations for both point estimation and Bayesian inference, and demonstrate its use on real data by inferring an individual-based epidemiological model for bacterial infections in child care centers. △ Less

Submitted 3 March, 2017; v1 submitted 18 July, 2014; originally announced July 2014.

Comments: Accepted for publication in Statistics and Computing (Feb 13, 2017)

arXiv:1304.6803 [pdf, ps, other]

Direct Learning of Sparse Changes in Markov Networks by Density Ratio Estimation

Authors: Song Liu, John A. Quinn, Michael U. Gutmann, Taiji Suzuki, Masashi Sugiyama

Abstract: We propose a new method for detecting changes in Markov network structure between two sets of samples. Instead of naively fitting two Markov network models separately to the two data sets and figuring out their difference, we \emph{directly} learn the network structure change by estimating the ratio of Markov network models. This density-ratio formulation naturally allows us to introduce sparsity… ▽ More We propose a new method for detecting changes in Markov network structure between two sets of samples. Instead of naively fitting two Markov network models separately to the two data sets and figuring out their difference, we \emph{directly} learn the network structure change by estimating the ratio of Markov network models. This density-ratio formulation naturally allows us to introduce sparsity in the network structure change, which highly contributes to enhancing interpretability. Furthermore, computation of the normalization term, which is a critical bottleneck of the naive approach, can be remarkably mitigated. We also give the dual formulation of the optimization problem, which further reduces the computation cost for large-scale Markov networks. Through experiments, we demonstrate the usefulness of our method. △ Less

Submitted 1 January, 2014; v1 submitted 25 April, 2013; originally announced April 2013.

Showing 1–31 of 31 results for author: Gutmann, M U