
Robust Neural Posterior Estimation and Statistical Model Criticism

Authors: Daniel Ward, Patrick Cannon, Mark Beaumont, Matteo Fasiolo, Sebastian M Schmon

Abstract: Computer simulations have proven a valuable tool for understanding complex phenomena across the sciences. However, the utility of simulators for modelling and forecasting purposes is often restricted by low data quality, as well as practical limits to model fidelity. In order to circumvent these difficulties, we argue that modellers must treat simulators as idealistic representations of the true data generating process, and consequently should thoughtfully consider the risk of model misspecification. In this work we revisit neural posterior estimation (NPE), a class of algorithms that enable black-box parameter inference in simulation models, and consider the implication of a simulation-to-reality gap. While recent works have demonstrated reliable performance of these methods, the analyses have been performed using synthetic data generated by the simulator model itself, and have therefore only addressed the well-specified case. In this paper, we find that the presence of misspecification, in contrast, leads to unreliable inference when NPE is used naively. As a remedy we argue that principled scientific inquiry with simulators should incorporate a model criticism component, to facilitate interpretable identification of misspecification and a robust inference component, to fit 'wrong but useful' models. We propose robust neural posterior estimation (RNPE), an extension of NPE to simultaneously achieve both these aims, through explicitly modelling the discrepancies between simulations and the observed data. We assess the approach on a range of artificially misspecified examples, and find RNPE performs well across the tasks, whereas naively using NPE leads to misleading and erratic posteriors.

Submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2209.01845

Investigating the Impact of Model Misspecification in Neural Simulation-based Inference

Authors: Patrick Cannon, Daniel Ward, Sebastian M. Schmon

Abstract: Aided by advances in neural density estimation, considerable progress has been made in recent years towards a suite of simulation-based inference (SBI) methods capable of performing flexible, black-box, approximate Bayesian inference for stochastic simulation models. While it has been demonstrated that neural SBI methods can provide accurate posterior approximations, the simulation studies establishing these results have considered only well-specified problems -- that is, where the model and the data generating process coincide exactly. However, the behaviour of such algorithms in the case of model misspecification has received little attention. In this work, we provide the first comprehensive study of the behaviour of neural SBI algorithms in the presence of various forms of model misspecification. We find that misspecification can have a profoundly deleterious effect on performance. Some mitigation strategies are explored, but no approach tested prevents failure in all cases. We conclude that new approaches are required to address model misspecification if neural SBI algorithms are to be relied upon to derive accurate scientific conclusions.

Submitted 5 September, 2022; originally announced September 2022.

arXiv:2207.03945

High Performance Simulation for Scalable Multi-Agent Reinforcement Learning

Authors: Jordan Langham-Lopez, Sebastian M. Schmon, Patrick Cannon

Abstract: Multi-agent reinforcement learning experiments and open-source training environments are typically limited in scale, supporting tens or sometimes up to hundreds of interacting agents. In this paper we demonstrate the use of Vogue, a high performance agent based model (ABM) framework. Vogue serves as a multi-agent training environment, supporting thousands to tens of thousands of interacting agents while maintaining high training throughput by running both the environment and reinforcement learning (RL) agents on the GPU. High performance multi-agent environments at this scale have the potential to enable the learning of robust and flexible policies for use in ABMs and simulations of complex systems. We demonstrate training performance with two newly developed, large scale multi-agent training environments. Moreover, we show that these environments can train shared RL policies on time-scales of minutes and hours.

Submitted 8 July, 2022; originally announced July 2022.

Comments: Accepted at the Workshop AI4ABM at ICML 2022 (Spotlight)

arXiv:2206.07570

Calibrating Agent-based Models to Microdata with Graph Neural Networks

Authors: Joel Dyer, Patrick Cannon, J. Doyne Farmer, Sebastian M. Schmon

Abstract: Calibrating agent-based models (ABMs) to data is among the most fundamental requirements to ensure the model fulfils its desired purpose. In recent years, simulation-based inference methods have emerged as powerful tools for performing this task when the model likelihood function is intractable, as is often the case for ABMs. In some real-world use cases of ABMs, both the observed data and the ABM output consist of the agents' states and their interactions over time. In such cases, there is a tension between the desire to make full use of the rich information content of such granular data on the one hand, and the need to reduce the dimensionality of the data to prevent difficulties associated with high-dimensional learning tasks on the other. A possible resolution is to construct lower-dimensional time-series through the use of summary statistics describing the macrostate of the system at each time point. However, a poor choice of summary statistics can result in an unacceptable loss of information from the original dataset, dramatically reducing the quality of the resulting calibration. In this work, we instead propose to learn parameter posteriors associated with granular microdata directly using temporal graph neural networks. We will demonstrate that such an approach offers highly compelling inductive biases for Bayesian inference using the raw ABM microstates as output.

Submitted 15 June, 2022; originally announced June 2022.

Comments: Accepted for a Spotlight presentation at the ICML 2022 Artificial Intelligence for Agent-based Modelling (AI4ABM) Workshop

arXiv:2202.11585

Amortised Likelihood-free Inference for Expensive Time-series Simulators with Signatured Ratio Estimation

Authors: Joel Dyer, Patrick Cannon, Sebastian M Schmon

Abstract: Simulation models of complex dynamics in the natural and social sciences commonly lack a tractable likelihood function, rendering traditional likelihood-based statistical inference impossible. Recent advances in machine learning have introduced novel algorithms for estimating otherwise intractable likelihood functions using a likelihood ratio trick based on binary classifiers. Consequently, efficient likelihood approximations can be obtained whenever good probabilistic classifiers can be constructed. We propose a kernel classifier for sequential data using path signatures based on the recently introduced signature kernel. We demonstrate that the representative power of signatures yields a highly performant classifier, even in the crucially important case where sample numbers are low. In such scenarios, our approach can outperform sophisticated neural networks for common posterior inference tasks.

Submitted 23 February, 2022; originally announced February 2022.

Comments: Accepted for publication at AISTATS 2022

arXiv:2106.12570

Learning Multimodal VAEs through Mutual Supervision

Authors: Tom Joy, Yuge Shi, Philip H. S. Torr, Tom Rainforth, Sebastian M. Schmon, N. Siddharth

Abstract: Multimodal VAEs seek to model the joint distribution over heterogeneous data (e.g. vision, language), whilst also capturing a shared representation across such modalities. Prior work has typically combined information from the modalities by reconciling idiosyncratic representations directly in the recognition model through explicit products, mixtures, or other such factorisations. Here we introduce a novel alternative, the MEME, that avoids such explicit combinations by repurposing semi-supervised VAEs to combine information between modalities implicitly through mutual supervision. This formulation naturally allows learning from partially-observed data where some modalities can be entirely missing -- something that most existing approaches either cannot handle, or do so to a limited extent. We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes on the MNIST-SVHN (image-image) and CUB (image-text) datasets. We also contrast the quality of the representations learnt by mutual supervision against standard approaches and observe interesting trends in its ability to capture relatedness between data.

Submitted 16 December, 2022; v1 submitted 23 June, 2021; originally announced June 2021.

arXiv:2106.12555

Approximate Bayesian Computation with Path Signatures

Authors: Joel Dyer, Patrick Cannon, Sebastian M Schmon

Abstract: Simulation models often lack tractable likelihood functions, making likelihood-free inference methods indispensable. Approximate Bayesian computation generates likelihood-free posterior samples by comparing simulated and observed data through some distance measure, but existing approaches are often poorly suited to time series simulators, for example due to an independent and identically distributed data assumption. In this paper, we propose to use path signatures in approximate Bayesian computation to handle the sequential nature of time series. We provide theoretical guarantees on the resultant posteriors and demonstrate competitive Bayesian parameter inference for simulators generating univariate, multivariate, irregularly spaced, and even non-Euclidean sequences.

Submitted 1 February, 2023; v1 submitted 23 June, 2021; originally announced June 2021.

Comments: 42 pages, 8 figures

arXiv:2104.06384

Optimal scaling of random-walk Metropolis algorithms using Bayesian large-sample asymptotics

Authors: Sebastian M Schmon, Philippe Gagnon

Abstract: High-dimensional limit theorems have been shown to be useful for deriving tuning rules for finding the optimal scaling in random-walk Metropolis algorithms. The assumptions under which weak convergence results are proved are, however, restrictive: the target density is typically assumed to be of a product form. Users may thus doubt the validity of such tuning rules in practical applications. In this paper, we shed some light on optimal-scaling problems from a different perspective, namely a large-sample one. This allows us to prove weak convergence results under realistic assumptions and to propose novel parameter-dimension-dependent tuning guidelines. The proposed guidelines are consistent with previous ones when the target density is close to having a product form, and the results highlight that the correlation structure has to be accounted for to avoid performance deterioration if that is not the case, while justifying the use of a natural (asymptotically exact) approximation to the correlation matrix that can be employed for the very first algorithm run.

Submitted 14 February, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

Comments: Both authors contributed equally. The paper is to appear in Statistics and Computing

arXiv:2011.08644

Generalized Posteriors in Approximate Bayesian Computation

Authors: Sebastian M Schmon, Patrick W Cannon, Jeremias Knoblauch

Abstract: Complex simulators have become a ubiquitous tool in many scientific disciplines, providing high-fidelity, implicit probabilistic models of natural and social phenomena. Unfortunately, they typically lack the tractability required for conventional statistical analysis. Approximate Bayesian computation (ABC) has emerged as a key method in simulation-based inference, wherein the true model likelihood and posterior are approximated using samples from the simulator. In this paper, we draw connections between ABC and generalized Bayesian inference (GBI). First, we re-interpret the accept/reject step in ABC as an implicitly defined error model. We then argue that these implicit error models will invariably be misspecified. While ABC posteriors are often treated as a necessary evil for approximating the standard Bayesian posterior, this allows us to re-interpret ABC as a potential robustification strategy. This leads us to suggest the use of GBI within ABC, a use case we explore empirically.

Submitted 23 February, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

Comments: Accepted at Advances in Approximate Bayesian Inference, AABI 2020

arXiv:2006.10102

Capturing Label Characteristics in VAEs

Authors: Tom Joy, Sebastian M. Schmon, Philip H. S. Torr, N. Siddharth, Tom Rainforth

Abstract: We present a principled approach to incorporating labels in VAEs that captures the rich characteristic information associated with those labels. While prior work has typically conflated these by learning latent variables that directly correspond to label values, we argue this is contrary to the intended effect of supervision in VAEs: capturing rich label characteristics with the latents. For example, we may want to capture the characteristics of a face that make it look young, rather than just the age of the person. To this end, we develop the CCVAE, a novel VAE model and concomitant variational objective which captures label characteristics explicitly in the latent space, eschewing direct correspondences between label values and latents. Through judicious structuring of mappings between such characteristic latents and labels, we show that the CCVAE can effectively learn meaningful representations of the characteristics of interest across a variety of supervision schemes. In particular, we show that the CCVAE allows for more effective and more general interventions to be performed, such as smooth traversals within the characteristics for a given label, diverse conditional generation, and transferring characteristics across datapoints.

Submitted 16 December, 2022; v1 submitted 17 June, 2020; originally announced June 2020.

Comments: Accepted to ICLR 2021

arXiv:2006.04893

A General Framework for Survival Analysis and Multi-State Modelling

Authors: Stefan Groha, Sebastian M Schmon, Alexander Gusev

Abstract: Survival models are a popular tool for the analysis of time to event data with applications in medicine, engineering, economics, and many more. Advances like the Cox proportional hazard model have enabled researchers to better describe hazard rates for the occurrence of single fatal events, but are unable to accurately model competing events and transitions. Common phenomena are often better described through multiple states, for example: the progress of a disease modeled as healthy, sick and dead instead of healthy and dead, where the competing nature of death and disease has to be taken into account. Moreover, Cox models are limited by modeling assumptions, like proportionality of hazard rates and linear effects. Individual characteristics can vary significantly between observational units, like patients, resulting in idiosyncratic hazard rates and different disease trajectories. These considerations require flexible modeling assumptions. To overcome these issues, we propose the use of neural ordinary differential equations as a flexible and general method for estimating multi-state survival models by directly solving the Kolmogorov forward equations. To quantify the uncertainty in the resulting individual cause-specific hazard rates, we further introduce a variational latent variable model and show that this enables meaningful clustering with respect to multi-state outcomes as well as interpretability regarding covariate values. We show that our model exhibits state-of-the-art performance on popular survival data sets and demonstrate its efficacy in a multi-state setting.

Submitted 15 February, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: 19 pages, 14 figures

arXiv:1912.00874

Implicit Priors for Knowledge Sharing in Bayesian Neural Networks

Authors: Jack K Fitzsimons, Sebastian M Schmon, Stephen J Roberts

Abstract: Bayesian interpretations of neural networks have a long history, dating back to early work in the 1990s, and have recently regained attention because of their desirable properties like uncertainty estimation, model robustness and regularisation. We want to discuss here the application of Bayesian models to knowledge sharing between neural networks. Knowledge sharing comes in different facets, such as transfer learning, model distillation and shared embeddings. All of these tasks have in common that learned "features" ought to be shared across different networks. Theoretically rooted in the concepts of Bayesian neural networks, this work has widespread application to general deep learning.

Submitted 2 December, 2019; originally announced December 2019.

Comments: 5 pages, 2 figures

Journal ref: 4th workshop on Bayesian Deep Learning (NeurIPS 2019)

arXiv:1903.00939

Bernoulli Race Particle Filters

Authors: Sebastian M Schmon, Arnaud Doucet, George Deligiannidis

Abstract: When the weights in a particle filter are not available analytically, standard resampling methods cannot be employed. To circumvent this problem state-of-the-art algorithms replace the true weights with non-negative unbiased estimates. This algorithm is still valid but at the cost of higher variance of the resulting filtering estimates in comparison to a particle filter using the true weights. We propose here a novel algorithm that allows for resampling according to the true intractable weights when only an unbiased estimator of the weights is available. We demonstrate our algorithm on several examples.

Submitted 3 March, 2019; originally announced March 2019.

Comments: 19 pages

Journal ref: The 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019)

arXiv:1806.10060

Large Sample Asymptotics of the Pseudo-Marginal Method

Authors: Sebastian M. Schmon, George Deligiannidis, Arnaud Doucet, Michael K. Pitt

Abstract: The pseudo-marginal algorithm is a variant of the Metropolis--Hastings algorithm which samples asymptotically from a probability distribution when it is only possible to estimate unbiasedly an unnormalized version of its density. Practically, one has to trade off the computational resources used to obtain this estimator against the asymptotic variances of the ergodic averages obtained by the pseudo-marginal algorithm. Recent works optimizing this trade-off rely on some strong assumptions which can cast doubts over their practical relevance. In particular, they all assume that the distribution of the difference between the log-density and its estimate is independent of the parameter value at which it is evaluated. Under regularity conditions we show here that, as the number of data points tends to infinity, a space-rescaled version of the pseudo-marginal chain converges weakly towards another pseudo-marginal chain for which this assumption indeed holds. A study of this limiting chain allows us to provide parameter dimension-dependent guidelines on how to optimally scale a normal random walk proposal and the number of Monte Carlo samples for the pseudo-marginal method in the large-sample regime. This complements and validates currently available results.

Submitted 2 December, 2019; v1 submitted 26 June, 2018; originally announced June 2018.

Comments: 76 pages, 3 figures

Showing 1–14 of 14 results for author: Schmon, S M