Search | arXiv e-print repository

A Trust Crisis In Simulation-Based Inference? Your Posterior Approximations Can Be Unfaithful

Authors: Joeri Hermans, Arnaud Delaunoy, François Rozet, Antoine Wehenkel, Volodimir Begy, Gilles Louppe

Abstract: We present extensive empirical evidence showing that current Bayesian simulation-based inference algorithms can produce computationally unfaithful posterior approximations. Our results show that all benchmarked algorithms -- (Sequential) Neural Posterior Estimation, (Sequential) Neural Ratio Estimation, Sequential Neural Likelihood and variants of Approximate Bayesian Computation -- can yield over… ▽ More We present extensive empirical evidence showing that current Bayesian simulation-based inference algorithms can produce computationally unfaithful posterior approximations. Our results show that all benchmarked algorithms -- (Sequential) Neural Posterior Estimation, (Sequential) Neural Ratio Estimation, Sequential Neural Likelihood and variants of Approximate Bayesian Computation -- can yield overconfident posterior approximations, which makes them unreliable for scientific use cases and falsificationist inquiry. Failing to address this issue may reduce the range of applicability of simulation-based inference. For this reason, we argue that research efforts should be made towards theoretical and methodological developments of conservative approximate inference algorithms and present research directions towards this objective. In this regard, we show empirical evidence that ensembling posterior surrogates provides more reliable approximations and mitigates the issue. △ Less

Submitted 4 December, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: TMLR version

arXiv:2010.06735 [pdf, other]

Error-guided likelihood-free MCMC

Authors: Volodimir Begy, Erich Schikuta

Abstract: This work presents a novel posterior inference method for models with intractable evidence and likelihood functions. Error-guided likelihood-free MCMC, or EG-LF-MCMC in short, has been developed for scientific applications, where a researcher is interested in obtaining approximate posterior densities over model parameters, while avoiding the need for expensive training of component estimators on f… ▽ More This work presents a novel posterior inference method for models with intractable evidence and likelihood functions. Error-guided likelihood-free MCMC, or EG-LF-MCMC in short, has been developed for scientific applications, where a researcher is interested in obtaining approximate posterior densities over model parameters, while avoiding the need for expensive training of component estimators on full observational data or the tedious design of expressive summary statistics, as in related approaches. Our technique is based on two phases. In the first phase, we draw samples from the prior, simulate respective observations and record their errors $ε$ in relation to the true observation. We train a classifier to distinguish between corresponding and non-corresponding $(ε, \boldsymbolθ)$-tuples. In the second stage the said classifier is conditioned on the smallest recorded $ε$ value from the training set and employed for the calculation of transition probabilities in a Markov Chain Monte Carlo sampling procedure. By conditioning the MCMC on specific $ε$ values, our method may also be used in an amortized fashion to infer posterior densities for observations, which are located a given distance away from the observed data. We evaluate the proposed method on benchmark problems with semantically and structurally different data and compare its performance against the state of the art approximate Bayesian computation (ABC). △ Less

Submitted 23 April, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

arXiv:1903.04057 [pdf, other]

Likelihood-free MCMC with Amortized Approximate Ratio Estimators

Authors: Joeri Hermans, Volodimir Begy, Gilles Louppe

Abstract: Posterior inference with an intractable likelihood is becoming an increasingly common task in scientific domains which rely on sophisticated computer simulations. Typically, these forward models do not admit tractable densities forcing practitioners to make use of approximations. This work introduces a novel approach to address the intractability of the likelihood and the marginal model. We achiev… ▽ More Posterior inference with an intractable likelihood is becoming an increasingly common task in scientific domains which rely on sophisticated computer simulations. Typically, these forward models do not admit tractable densities forcing practitioners to make use of approximations. This work introduces a novel approach to address the intractability of the likelihood and the marginal model. We achieve this by learning a flexible amortized estimator which approximates the likelihood-to-evidence ratio. We demonstrate that the learned ratio estimator can be embedded in MCMC samplers to approximate likelihood-ratios between consecutive states in the Markov chain, allowing us to draw samples from the intractable posterior. Techniques are presented to improve the numerical stability and to measure the quality of an approximation. The accuracy of our approach is demonstrated on a variety of benchmarks against well-established techniques. Scientific applications in physics show its applicability. △ Less

Submitted 26 June, 2020; v1 submitted 10 March, 2019; originally announced March 2019.

Comments: v5: Camera-ready version presented at ICML 2020

arXiv:1902.10069 [pdf, other]

Simulating Data Access Profiles of Computational Jobs in Data Grids

Authors: Volodimir Begy, Joeri Hermans, Martin Barisits, Mario Lassnig, Erich Schikuta

Abstract: The data access patterns of applications running in computing grids are changing due to the recent proliferation of high speed local and wide area networks. The data-intensive jobs are no longer strictly required to run at the computing sites, where the respective input data are located. Instead, jobs may access the data employing arbitrary combinations of data-placement, stage-in and remote data… ▽ More The data access patterns of applications running in computing grids are changing due to the recent proliferation of high speed local and wide area networks. The data-intensive jobs are no longer strictly required to run at the computing sites, where the respective input data are located. Instead, jobs may access the data employing arbitrary combinations of data-placement, stage-in and remote data access. These data access profiles exhibit partially non-overlap** throughput bottlenecks. This fact can be exploited in order to minimize the time jobs spend waiting for input data. In this work we present a novel grid computing simulator, which puts a heavy emphasis on the various data access profiles. The fundamental assumptions underlying our simulator are justified by empirical experiments performed in the Worldwide LHC Computing Grid (WLCG) at CERN. We demonstrate how to calibrate the simulator parameters in accordance with the true system using posterior inference with likelihood-free Markov Chain Monte Carlo. Thereafter, we validate the simulator's output with respect to an authentic production workload from WLCG, demonstrating its remarkable accuracy. △ Less

Submitted 12 March, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

Showing 1–4 of 4 results for author: Begy, V