-
Robust Neural Posterior Estimation and Statistical Model Criticism
Authors:
Daniel Ward,
Patrick Cannon,
Mark Beaumont,
Matteo Fasiolo,
Sebastian M Schmon
Abstract:
Computer simulations have proven a valuable tool for understanding complex phenomena across the sciences. However, the utility of simulators for modelling and forecasting purposes is often restricted by low data quality, as well as practical limits to model fidelity. In order to circumvent these difficulties, we argue that modellers must treat simulators as idealistic representations of the true data generating process, and consequently should thoughtfully consider the risk of model misspecification. In this work we revisit neural posterior estimation (NPE), a class of algorithms that enable black-box parameter inference in simulation models, and consider the implication of a simulation-to-reality gap. While recent works have demonstrated reliable performance of these methods, the analyses have been performed using synthetic data generated by the simulator model itself, and have therefore only addressed the well-specified case. In this paper, we find that the presence of misspecification, in contrast, leads to unreliable inference when NPE is used naively. As a remedy we argue that principled scientific inquiry with simulators should incorporate a model criticism component, to facilitate interpretable identification of misspecification and a robust inference component, to fit 'wrong but useful' models. We propose robust neural posterior estimation (RNPE), an extension of NPE to simultaneously achieve both these aims, through explicitly modelling the discrepancies between simulations and the observed data. We assess the approach on a range of artificially misspecified examples, and find RNPE performs well across the tasks, whereas naively using NPE leads to misleading and erratic posteriors.
Submitted 12 October, 2022;
originally announced October 2022.
-
Investigating the Impact of Model Misspecification in Neural Simulation-based Inference
Authors:
Patrick Cannon,
Daniel Ward,
Sebastian M. Schmon
Abstract:
Aided by advances in neural density estimation, considerable progress has been made in recent years towards a suite of simulation-based inference (SBI) methods capable of performing flexible, black-box, approximate Bayesian inference for stochastic simulation models. While it has been demonstrated that neural SBI methods can provide accurate posterior approximations, the simulation studies establishing these results have considered only well-specified problems -- that is, where the model and the data generating process coincide exactly. However, the behaviour of such algorithms in the case of model misspecification has received little attention. In this work, we provide the first comprehensive study of the behaviour of neural SBI algorithms in the presence of various forms of model misspecification. We find that misspecification can have a profoundly deleterious effect on performance. Some mitigation strategies are explored, but no approach tested prevents failure in all cases. We conclude that new approaches are required to address model misspecification if neural SBI algorithms are to be relied upon to derive accurate scientific conclusions.
Submitted 5 September, 2022;
originally announced September 2022.
-
High Performance Simulation for Scalable Multi-Agent Reinforcement Learning
Authors:
Jordan Langham-Lopez,
Sebastian M. Schmon,
Patrick Cannon
Abstract:
Multi-agent reinforcement learning experiments and open-source training environments are typically limited in scale, supporting tens or sometimes up to hundreds of interacting agents. In this paper we demonstrate the use of Vogue, a high performance agent based model (ABM) framework. Vogue serves as a multi-agent training environment, supporting thousands to tens of thousands of interacting agents while maintaining high training throughput by running both the environment and reinforcement learning (RL) agents on the GPU. High performance multi-agent environments at this scale have the potential to enable the learning of robust and flexible policies for use in ABMs and simulations of complex systems. We demonstrate training performance with two newly developed, large scale multi-agent training environments. Moreover, we show that these environments can train shared RL policies on time-scales of minutes and hours.
Submitted 8 July, 2022;
originally announced July 2022.
-
Calibrating Agent-based Models to Microdata with Graph Neural Networks
Authors:
Joel Dyer,
Patrick Cannon,
J. Doyne Farmer,
Sebastian M. Schmon
Abstract:
Calibrating agent-based models (ABMs) to data is among the most fundamental requirements to ensure the model fulfils its desired purpose. In recent years, simulation-based inference methods have emerged as powerful tools for performing this task when the model likelihood function is intractable, as is often the case for ABMs. In some real-world use cases of ABMs, both the observed data and the ABM output consist of the agents' states and their interactions over time. In such cases, there is a tension between the desire to make full use of the rich information content of such granular data on the one hand, and the need to reduce the dimensionality of the data to prevent difficulties associated with high-dimensional learning tasks on the other. A possible resolution is to construct lower-dimensional time-series through the use of summary statistics describing the macrostate of the system at each time point. However, a poor choice of summary statistics can result in an unacceptable loss of information from the original dataset, dramatically reducing the quality of the resulting calibration. In this work, we instead propose to learn parameter posteriors associated with granular microdata directly using temporal graph neural networks. We will demonstrate that such an approach offers highly compelling inductive biases for Bayesian inference using the raw ABM microstates as output.
Submitted 15 June, 2022;
originally announced June 2022.
-
Amortised Likelihood-free Inference for Expensive Time-series Simulators with Signatured Ratio Estimation
Authors:
Joel Dyer,
Patrick Cannon,
Sebastian M Schmon
Abstract:
Simulation models of complex dynamics in the natural and social sciences commonly lack a tractable likelihood function, rendering traditional likelihood-based statistical inference impossible. Recent advances in machine learning have introduced novel algorithms for estimating otherwise intractable likelihood functions using a likelihood ratio trick based on binary classifiers. Consequently, efficient likelihood approximations can be obtained whenever good probabilistic classifiers can be constructed. We propose a kernel classifier for sequential data using path signatures based on the recently introduced signature kernel. We demonstrate that the representative power of signatures yields a highly performant classifier, even in the crucially important case where sample numbers are low. In such scenarios, our approach can outperform sophisticated neural networks for common posterior inference tasks.
Submitted 23 February, 2022;
originally announced February 2022.
-
Learning Multimodal VAEs through Mutual Supervision
Authors:
Tom Joy,
Yuge Shi,
Philip H. S. Torr,
Tom Rainforth,
Sebastian M. Schmon,
N. Siddharth
Abstract:
Multimodal VAEs seek to model the joint distribution over heterogeneous data (e.g. vision, language), whilst also capturing a shared representation across such modalities. Prior work has typically combined information from the modalities by reconciling idiosyncratic representations directly in the recognition model through explicit products, mixtures, or other such factorisations. Here we introduce a novel alternative, the MEME, that avoids such explicit combinations by repurposing semi-supervised VAEs to combine information between modalities implicitly through mutual supervision. This formulation naturally allows learning from partially-observed data where some modalities can be entirely missing -- something that most existing approaches either cannot handle, or do so to a limited extent. We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes on the MNIST-SVHN (image-image) and CUB (image-text) datasets. We also contrast the quality of the representations learnt by mutual supervision against standard approaches and observe interesting trends in its ability to capture relatedness between data.
Submitted 16 December, 2022; v1 submitted 23 June, 2021;
originally announced June 2021.
-
Approximate Bayesian Computation with Path Signatures
Authors:
Joel Dyer,
Patrick Cannon,
Sebastian M Schmon
Abstract:
Simulation models often lack tractable likelihood functions, making likelihood-free inference methods indispensable. Approximate Bayesian computation generates likelihood-free posterior samples by comparing simulated and observed data through some distance measure, but existing approaches are often poorly suited to time series simulators, for example due to an independent and identically distributed data assumption. In this paper, we propose to use path signatures in approximate Bayesian computation to handle the sequential nature of time series. We provide theoretical guarantees on the resultant posteriors and demonstrate competitive Bayesian parameter inference for simulators generating univariate, multivariate, irregularly spaced, and even non-Euclidean sequences.
Submitted 1 February, 2023; v1 submitted 23 June, 2021;
originally announced June 2021.
-
Optimal scaling of random-walk Metropolis algorithms using Bayesian large-sample asymptotics
Authors:
Sebastian M Schmon,
Philippe Gagnon
Abstract:
High-dimensional limit theorems have been shown to be useful for deriving tuning rules for finding the optimal scaling in random-walk Metropolis algorithms. The assumptions under which weak convergence results are proved are, however, restrictive: the target density is typically assumed to be of a product form. Users may thus doubt the validity of such tuning rules in practical applications. In this paper, we shed some light on optimal-scaling problems from a different perspective, namely a large-sample one. This allows us to prove weak convergence results under realistic assumptions and to propose novel parameter-dimension-dependent tuning guidelines. The proposed guidelines are consistent with previous ones when the target density is close to having a product form, and the results highlight that the correlation structure has to be accounted for to avoid performance deterioration if that is not the case, while justifying the use of a natural (asymptotically exact) approximation to the correlation matrix that can be employed for the very first algorithm run.
Submitted 14 February, 2022; v1 submitted 13 April, 2021;
originally announced April 2021.
-
Generalized Posteriors in Approximate Bayesian Computation
Authors:
Sebastian M Schmon,
Patrick W Cannon,
Jeremias Knoblauch
Abstract:
Complex simulators have become a ubiquitous tool in many scientific disciplines, providing high-fidelity, implicit probabilistic models of natural and social phenomena. Unfortunately, they typically lack the tractability required for conventional statistical analysis. Approximate Bayesian computation (ABC) has emerged as a key method in simulation-based inference, wherein the true model likelihood and posterior are approximated using samples from the simulator. In this paper, we draw connections between ABC and generalized Bayesian inference (GBI). First, we re-interpret the accept/reject step in ABC as an implicitly defined error model. We then argue that these implicit error models will invariably be misspecified. While ABC posteriors are often treated as a necessary evil for approximating the standard Bayesian posterior, this allows us to re-interpret ABC as a potential robustification strategy. This leads us to suggest the use of GBI within ABC, a use case we explore empirically.
Submitted 23 February, 2021; v1 submitted 17 November, 2020;
originally announced November 2020.
-
Capturing Label Characteristics in VAEs
Authors:
Tom Joy,
Sebastian M. Schmon,
Philip H. S. Torr,
N. Siddharth,
Tom Rainforth
Abstract:
We present a principled approach to incorporating labels in VAEs that captures the rich characteristic information associated with those labels. While prior work has typically conflated these by learning latent variables that directly correspond to label values, we argue this is contrary to the intended effect of supervision in VAEs -- capturing rich label characteristics with the latents. For example, we may want to capture the characteristics of a face that make it look young, rather than just the age of the person. To this end, we develop the CCVAE, a novel VAE model and concomitant variational objective which captures label characteristics explicitly in the latent space, eschewing direct correspondences between label values and latents. Through judicious structuring of mappings between such characteristic latents and labels, we show that the CCVAE can effectively learn meaningful representations of the characteristics of interest across a variety of supervision schemes. In particular, we show that the CCVAE allows for more effective and more general interventions to be performed, such as smooth traversals within the characteristics for a given label, diverse conditional generation, and transferring characteristics across datapoints.
Submitted 16 December, 2022; v1 submitted 17 June, 2020;
originally announced June 2020.
-
A General Framework for Survival Analysis and Multi-State Modelling
Authors:
Stefan Groha,
Sebastian M Schmon,
Alexander Gusev
Abstract:
Survival models are a popular tool for the analysis of time to event data with applications in medicine, engineering, economics, and many more. Advances like the Cox proportional hazard model have enabled researchers to better describe hazard rates for the occurrence of single fatal events, but are unable to accurately model competing events and transitions. Common phenomena are often better described through multiple states, for example: the progress of a disease modeled as healthy, sick and dead instead of healthy and dead, where the competing nature of death and disease has to be taken into account. Moreover, Cox models are limited by modeling assumptions, like proportionality of hazard rates and linear effects. Individual characteristics can vary significantly between observational units, like patients, resulting in idiosyncratic hazard rates and different disease trajectories. These considerations require flexible modeling assumptions. To overcome these issues, we propose the use of neural ordinary differential equations as a flexible and general method for estimating multi-state survival models by directly solving the Kolmogorov forward equations. To quantify the uncertainty in the resulting individual cause-specific hazard rates, we further introduce a variational latent variable model and show that this enables meaningful clustering with respect to multi-state outcomes as well as interpretability regarding covariate values. We show that our model exhibits state-of-the-art performance on popular survival data sets and demonstrate its efficacy in a multi-state setting.
Submitted 15 February, 2021; v1 submitted 8 June, 2020;
originally announced June 2020.
-
Implicit Priors for Knowledge Sharing in Bayesian Neural Networks
Authors:
Jack K Fitzsimons,
Sebastian M Schmon,
Stephen J Roberts
Abstract:
Bayesian interpretations of neural networks have a long history, dating back to early work in the 1990s, and have recently regained attention because of their desirable properties like uncertainty estimation, model robustness and regularisation. We want to discuss here the application of Bayesian models to knowledge sharing between neural networks. Knowledge sharing comes in different facets, such as transfer learning, model distillation and shared embeddings. All of these tasks have in common that learned "features" ought to be shared across different networks. Theoretically rooted in the concepts of Bayesian neural networks, this work has widespread application to general deep learning.
Submitted 2 December, 2019;
originally announced December 2019.
-
Bernoulli Race Particle Filters
Authors:
Sebastian M Schmon,
Arnaud Doucet,
George Deligiannidis
Abstract:
When the weights in a particle filter are not available analytically, standard resampling methods cannot be employed. To circumvent this problem, state-of-the-art algorithms replace the true weights with non-negative unbiased estimates. Such an algorithm is still valid, but at the cost of higher variance of the resulting filtering estimates in comparison to a particle filter using the true weights. We propose here a novel algorithm that allows for resampling according to the true intractable weights when only an unbiased estimator of the weights is available. We demonstrate our algorithm on several examples.
Submitted 3 March, 2019;
originally announced March 2019.
-
Large Sample Asymptotics of the Pseudo-Marginal Method
Authors:
Sebastian M. Schmon,
George Deligiannidis,
Arnaud Doucet,
Michael K. Pitt
Abstract:
The pseudo-marginal algorithm is a variant of the Metropolis--Hastings algorithm which samples asymptotically from a probability distribution when it is only possible to estimate unbiasedly an unnormalized version of its density. Practically, one has to trade off the computational resources used to obtain this estimator against the asymptotic variances of the ergodic averages obtained by the pseudo-marginal algorithm. Recent works optimizing this trade-off rely on some strong assumptions which can cast doubt on their practical relevance. In particular, they all assume that the distribution of the difference between the log-density and its estimate is independent of the parameter value at which it is evaluated. Under regularity conditions we show here that, as the number of data points tends to infinity, a space-rescaled version of the pseudo-marginal chain converges weakly towards another pseudo-marginal chain for which this assumption indeed holds. A study of this limiting chain allows us to provide parameter dimension-dependent guidelines on how to optimally scale a normal random walk proposal and the number of Monte Carlo samples for the pseudo-marginal method in the large-sample regime. This complements and validates currently available results.
Submitted 2 December, 2019; v1 submitted 26 June, 2018;
originally announced June 2018.