-
Conditional Bayesian Quadrature
Authors:
Zonghao Chen,
Masha Naslidnyk,
Arthur Gretton,
François-Xavier Briol
Abstract:
We propose a novel approach for estimating conditional or parametric expectations in the setting where obtaining samples or evaluating integrands is costly. Through the framework of probabilistic numerical methods (such as Bayesian quadrature), our novel approach allows to incorporates prior information about the integrands especially the prior smoothness knowledge about the integrands and the con…
▽ More
We propose a novel approach for estimating conditional or parametric expectations in the setting where obtaining samples or evaluating integrands is costly. Through the framework of probabilistic numerical methods (such as Bayesian quadrature), our novel approach allows to incorporates prior information about the integrands especially the prior smoothness knowledge about the integrands and the conditional expectation. As a result, our approach provides a way of quantifying uncertainty and leads to a fast convergence rate, which is confirmed both theoretically and empirically on challenging tasks in Bayesian sensitivity analysis, computational finance and decision making under uncertainty.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
Outlier-robust Kalman Filtering through Generalised Bayes
Authors:
Gerardo Duran-Martin,
Matias Altamirano,
Alexander Y. Shestopaloff,
Leandro Sánchez-Betancourt,
Jeremias Knoblauch,
Matt Jones,
François-Xavier Briol,
Kevin Murphy
Abstract:
We derive a novel, provably robust, and closed-form Bayesian update rule for online filtering in state-space models in the presence of outliers and misspecified measurement models. Our method combines generalised Bayesian inference with filtering methods such as the extended and ensemble Kalman filter. We use the former to show robustness and the latter to ensure computational efficiency in the ca…
▽ More
We derive a novel, provably robust, and closed-form Bayesian update rule for online filtering in state-space models in the presence of outliers and misspecified measurement models. Our method combines generalised Bayesian inference with filtering methods such as the extended and ensemble Kalman filter. We use the former to show robustness and the latter to ensure computational efficiency in the case of nonlinear models. Our method matches or outperforms other robust filtering methods (such as those based on variational Bayes) at a much lower computational cost. We show this empirically on a range of filtering problems with outlier measurements, such as object tracking, state estimation in high-dimensional chaotic systems, and online learning of neural networks.
△ Less
Submitted 28 May, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Robust and Conjugate Gaussian Process Regression
Authors:
Matias Altamirano,
François-Xavier Briol,
Jeremias Knoblauch
Abstract:
To enable closed form conditioning, a common assumption in Gaussian process (GP) regression is independent and identically distributed Gaussian observation noise. This strong and simplistic assumption is often violated in practice, which leads to unreliable inferences and uncertainty quantification. Unfortunately, existing methods for robustifying GPs break closed-form conditioning, which makes th…
▽ More
To enable closed form conditioning, a common assumption in Gaussian process (GP) regression is independent and identically distributed Gaussian observation noise. This strong and simplistic assumption is often violated in practice, which leads to unreliable inferences and uncertainty quantification. Unfortunately, existing methods for robustifying GPs break closed-form conditioning, which makes them less attractive to practitioners and significantly more computationally expensive. In this paper, we demonstrate how to perform provably robust and conjugate Gaussian process (RCGP) regression at virtually no additional cost using generalised Bayesian inference. RCGP is particularly versatile as it enables exact conjugate closed form updates in all settings where standard GPs admit them. To demonstrate its strong empirical performance, we deploy RCGP for problems ranging from Bayesian optimisation to sparse variational Gaussian processes.
△ Less
Submitted 3 June, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Bayesian Numerical Integration with Neural Networks
Authors:
Katharina Ott,
Michael Tiemann,
Philipp Hennig,
François-Xavier Briol
Abstract:
Bayesian probabilistic numerical methods for numerical integration offer significant advantages over their non-Bayesian counterparts: they can encode prior information about the integrand, and can quantify uncertainty over estimates of an integral. However, the most popular algorithm in this class, Bayesian quadrature, is based on Gaussian process models and is therefore associated with a high com…
▽ More
Bayesian probabilistic numerical methods for numerical integration offer significant advantages over their non-Bayesian counterparts: they can encode prior information about the integrand, and can quantify uncertainty over estimates of an integral. However, the most popular algorithm in this class, Bayesian quadrature, is based on Gaussian process models and is therefore associated with a high computational cost. To improve scalability, we propose an alternative approach based on Bayesian neural networks which we call Bayesian Stein networks. The key ingredients are a neural network architecture based on Stein operators, and an approximation of the Bayesian posterior based on the Laplace approximation. We show that this leads to orders of magnitude speed-ups on the popular Genz functions benchmark, and on challenging problems arising in the Bayesian analysis of dynamical systems, and the prediction of energy production for a large-scale wind farm.
△ Less
Submitted 10 September, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Meta-learning Control Variates: Variance Reduction with Limited Data
Authors:
Zhuo Sun,
Chris J. Oates,
François-Xavier Briol
Abstract:
Control variates can be a powerful tool to reduce the variance of Monte Carlo estimators, but constructing effective control variates can be challenging when the number of samples is small. In this paper, we show that when a large number of related integrals need to be computed, it is possible to leverage the similarity between these integration tasks to improve performance even when the number of…
▽ More
Control variates can be a powerful tool to reduce the variance of Monte Carlo estimators, but constructing effective control variates can be challenging when the number of samples is small. In this paper, we show that when a large number of related integrals need to be computed, it is possible to leverage the similarity between these integration tasks to improve performance even when the number of samples per task is very small. Our approach, called meta learning CVs (Meta-CVs), can be used for up to hundreds or thousands of tasks. Our empirical assessment indicates that Meta-CVs can lead to significant variance reduction in such settings, and our theoretical analysis establishes general conditions under which Meta-CVs can be successfully trained.
△ Less
Submitted 7 June, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Robust and Scalable Bayesian Online Changepoint Detection
Authors:
Matias Altamirano,
François-Xavier Briol,
Jeremias Knoblauch
Abstract:
This paper proposes an online, provably robust, and scalable Bayesian approach for changepoint detection. The resulting algorithm has key advantages over previous work: it provides provable robustness by leveraging the generalised Bayesian perspective, and also addresses the scalability issues of previous attempts. Specifically, the proposed generalised Bayesian formalism leads to conjugate poster…
▽ More
This paper proposes an online, provably robust, and scalable Bayesian approach for changepoint detection. The resulting algorithm has key advantages over previous work: it provides provable robustness by leveraging the generalised Bayesian perspective, and also addresses the scalability issues of previous attempts. Specifically, the proposed generalised Bayesian formalism leads to conjugate posteriors whose parameters are available in closed form by leveraging diffusion score matching. The resulting algorithm is exact, can be updated through simple algebra, and is more than 10 times faster than its closest competitor.
△ Less
Submitted 12 May, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
Optimally-Weighted Estimators of the Maximum Mean Discrepancy for Likelihood-Free Inference
Authors:
Ayush Bharti,
Masha Naslidnyk,
Oscar Key,
Samuel Kaski,
François-Xavier Briol
Abstract:
Likelihood-free inference methods typically make use of a distance between simulated and real data. A common example is the maximum mean discrepancy (MMD), which has previously been used for approximate Bayesian computation, minimum distance estimation, generalised Bayesian inference, and within the nonparametric learning framework. The MMD is commonly estimated at a root-$m$ rate, where $m$ is th…
▽ More
Likelihood-free inference methods typically make use of a distance between simulated and real data. A common example is the maximum mean discrepancy (MMD), which has previously been used for approximate Bayesian computation, minimum distance estimation, generalised Bayesian inference, and within the nonparametric learning framework. The MMD is commonly estimated at a root-$m$ rate, where $m$ is the number of simulated samples. This can lead to significant computational challenges since a large $m$ is required to obtain an accurate estimate, which is crucial for parameter estimation. In this paper, we propose a novel estimator for the MMD with significantly improved sample complexity. The estimator is particularly well suited for computationally expensive smooth simulators with low- to mid-dimensional inputs. This claim is supported through both theoretical results and an extensive simulation study on benchmark simulators.
△ Less
Submitted 10 May, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
Data-driven modelling of turbine wake interactions and flow resistance in large wind farms
Authors:
Andrew Kirby,
François-Xavier Briol,
Thomas D. Dunstan,
Takafumi Nishino
Abstract:
Turbine wake and local blockage effects are known to alter wind farm power production in two different ways: (1) by changing the wind speed locally in front of each turbine; and (2) by changing the overall flow resistance in the farm and thus the so-called farm blockage effect. To better predict these effects with low computational costs, we develop data-driven emulators of the `local' or `interna…
▽ More
Turbine wake and local blockage effects are known to alter wind farm power production in two different ways: (1) by changing the wind speed locally in front of each turbine; and (2) by changing the overall flow resistance in the farm and thus the so-called farm blockage effect. To better predict these effects with low computational costs, we develop data-driven emulators of the `local' or `internal' turbine thrust coefficient $C_T^*$ as a function of turbine layout. We train the model using a multi-fidelity Gaussian Process (GP) regression with a combination of low (engineering wake model) and high-fidelity (Large-Eddy Simulations) simulations of farms with different layouts and wind directions. A large set of low-fidelity data speeds up the learning process and the high-fidelity data ensures a high accuracy. The trained multi-fidelity GP model is shown to give more accurate predictions of $C_T^*$ compared to a standard (single-fidelity) GP regression applied only to a limited set of high-fidelity data. We also use the multi-fidelity GP model of $C_T^*$ with the two-scale momentum theory (Nishino \& Dunstan 2020, J. Fluid Mech. 894, A2) to demonstrate that the model can be used to give fast and accurate predictions of large wind farm performance under various mesoscale atmospheric conditions. This new approach could be beneficial for improving annual energy production (AEP) calculations and farm optimisation in the future.
△ Less
Submitted 4 January, 2023;
originally announced January 2023.
-
Multilevel Bayesian Quadrature
Authors:
Kaiyu Li,
Daniel Giles,
Toni Karvonen,
Serge Guillas,
François-Xavier Briol
Abstract:
Multilevel Monte Carlo is a key tool for approximating integrals involving expensive scientific models. The idea is to use approximations of the integrand to construct an estimator with improved accuracy over classical Monte Carlo. We propose to further enhance multilevel Monte Carlo through Bayesian surrogate models of the integrand, focusing on Gaussian process models and the associated Bayesian…
▽ More
Multilevel Monte Carlo is a key tool for approximating integrals involving expensive scientific models. The idea is to use approximations of the integrand to construct an estimator with improved accuracy over classical Monte Carlo. We propose to further enhance multilevel Monte Carlo through Bayesian surrogate models of the integrand, focusing on Gaussian process models and the associated Bayesian quadrature estimators. We show, using both theory and numerical experiments, that our approach can lead to significant improvements in accuracy when the integrand is expensive and smooth, and when the dimensionality is small or moderate. We conclude the paper with a case study illustrating the potential impact of our method in landslide-generated tsunami modelling, where the cost of each integrand evaluation is typically too large for operational settings.
△ Less
Submitted 13 March, 2023; v1 submitted 15 October, 2022;
originally announced October 2022.
-
Towards Healing the Blindness of Score Matching
Authors:
Mingtian Zhang,
Oscar Key,
Peter Hayes,
David Barber,
Brooks Paige,
François-Xavier Briol
Abstract:
Score-based divergences have been widely used in machine learning and statistics applications. Despite their empirical success, a blindness problem has been observed when using these for multi-modal distributions. In this work, we discuss the blindness problem and propose a new family of divergences that can mitigate the blindness problem. We illustrate our proposed divergence in the context of de…
▽ More
Score-based divergences have been widely used in machine learning and statistics applications. Despite their empirical success, a blindness problem has been observed when using these for multi-modal distributions. In this work, we discuss the blindness problem and propose a new family of divergences that can mitigate the blindness problem. We illustrate our proposed divergence in the context of density estimation and report improved performance compared to traditional approaches.
△ Less
Submitted 15 October, 2022; v1 submitted 15 September, 2022;
originally announced September 2022.
-
Generalised Bayesian Inference for Discrete Intractable Likelihood
Authors:
Takuo Matsubara,
Jeremias Knoblauch,
François-Xavier Briol,
Chris. J. Oates
Abstract:
Discrete state spaces represent a major computational challenge to statistical inference, since the computation of normalisation constants requires summation over large or possibly infinite sets, which can be impractical. This paper addresses this computational challenge through the development of a novel generalised Bayesian inference procedure suitable for discrete intractable likelihood. Inspir…
▽ More
Discrete state spaces represent a major computational challenge to statistical inference, since the computation of normalisation constants requires summation over large or possibly infinite sets, which can be impractical. This paper addresses this computational challenge through the development of a novel generalised Bayesian inference procedure suitable for discrete intractable likelihood. Inspired by recent methodological advances for continuous data, the main idea is to update beliefs about model parameters using a discrete Fisher divergence, in lieu of the problematic intractable likelihood. The result is a generalised posterior that can be sampled from using standard computational tools, such as Markov chain Monte Carlo, circumventing the intractable normalising constant. The statistical properties of the generalised posterior are analysed, with sufficient conditions for posterior consistency and asymptotic normality established. In addition, a novel and general approach to calibration of generalised posteriors is proposed. Applications are presented on lattice models for discrete spatial data and on multivariate models for count data, where in each case the methodology facilitates generalised Bayesian inference at low computational cost.
△ Less
Submitted 1 September, 2023; v1 submitted 16 June, 2022;
originally announced June 2022.
-
Robust Bayesian Inference for Simulator-based Models via the MMD Posterior Bootstrap
Authors:
Charita Dellaporta,
Jeremias Knoblauch,
Theodoros Damoulas,
François-Xavier Briol
Abstract:
Simulator-based models are models for which the likelihood is intractable but simulation of synthetic data is possible. They are often used to describe complex real-world phenomena, and as such can often be misspecified in practice. Unfortunately, existing Bayesian approaches for simulators are known to perform poorly in those cases. In this paper, we propose a novel algorithm based on the posteri…
▽ More
Simulator-based models are models for which the likelihood is intractable but simulation of synthetic data is possible. They are often used to describe complex real-world phenomena, and as such can often be misspecified in practice. Unfortunately, existing Bayesian approaches for simulators are known to perform poorly in those cases. In this paper, we propose a novel algorithm based on the posterior bootstrap and maximum mean discrepancy estimators. This leads to a highly-parallelisable Bayesian inference algorithm with strong robustness properties. This is demonstrated through an in-depth theoretical study which includes generalisation bounds and proofs of frequentist consistency and robustness of our posterior. The approach is then assessed on a range of examples including a g-and-k distribution and a toggle-switch model.
△ Less
Submitted 19 December, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
-
ProbNum: Probabilistic Numerics in Python
Authors:
Jonathan Wenger,
Nicholas Krämer,
Marvin Pförtner,
Jonathan Schmidt,
Nathanael Bosch,
Nina Effenberger,
Johannes Zenn,
Alexandra Gessner,
Toni Karvonen,
François-Xavier Briol,
Maren Mahsereci,
Philipp Hennig
Abstract:
Probabilistic numerical methods (PNMs) solve numerical problems via probabilistic inference. They have been developed for linear algebra, optimization, integration and differential equation simulation. PNMs naturally incorporate prior information about a problem and quantify uncertainty due to finite computational resources as well as stochastic input. In this paper, we present ProbNum: a Python l…
▽ More
Probabilistic numerical methods (PNMs) solve numerical problems via probabilistic inference. They have been developed for linear algebra, optimization, integration and differential equation simulation. PNMs naturally incorporate prior information about a problem and quantify uncertainty due to finite computational resources as well as stochastic input. In this paper, we present ProbNum: a Python library providing state-of-the-art probabilistic numerical solvers. ProbNum enables custom composition of PNMs for specific problem classes via a modular design as well as wrappers for off-the-shelf use. Tutorials, documentation, developer guides and benchmarks are available online at www.probnum.org.
△ Less
Submitted 3 December, 2021;
originally announced December 2021.
-
Composite Goodness-of-fit Tests with Kernels
Authors:
Oscar Key,
Arthur Gretton,
François-Xavier Briol,
Tamara Fernandez
Abstract:
Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of robust methods which directly account for this issue. However, whether these more involved methods are required will depend on whether the model is really misspecified, and there is a lack of generally applicable methods to answer this question. In…
▽ More
Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of robust methods which directly account for this issue. However, whether these more involved methods are required will depend on whether the model is really misspecified, and there is a lack of generally applicable methods to answer this question. In this paper, we propose one such method. More precisely, we propose kernel-based hypothesis tests for the challenging composite testing problem, where we are interested in whether the data comes from any distribution in some parametric family. Our tests make use of minimum distance estimators based on the maximum mean discrepancy and the kernel Stein discrepancy. They are widely applicable, including whenever the density of the parametric model is known up to normalisation constant, or if the model takes the form of a simulator. As our main result, we show that we are able to estimate the parameter and conduct our test on the same data (without data splitting), while maintaining a correct test level. Our approach is illustrated on a range of problems, including testing for goodness-of-fit of an unnormalised non-parametric density model, and an intractable generative model of a biological cellular network.
△ Less
Submitted 27 February, 2024; v1 submitted 19 November, 2021;
originally announced November 2021.
-
Vector-Valued Control Variates
Authors:
Zhuo Sun,
Alessandro Barp,
François-Xavier Briol
Abstract:
Control variates are variance reduction tools for Monte Carlo estimators. They can provide significant variance reduction, but usually require a large number of samples, which can be prohibitive when sampling or evaluating the integrand is computationally expensive. Furthermore, there are many scenarios where we need to compute multiple related integrals simultaneously or sequentially, which can f…
▽ More
Control variates are variance reduction tools for Monte Carlo estimators. They can provide significant variance reduction, but usually require a large number of samples, which can be prohibitive when sampling or evaluating the integrand is computationally expensive. Furthermore, there are many scenarios where we need to compute multiple related integrals simultaneously or sequentially, which can further exacerbate computational costs. In this paper, we propose vector-valued control variates, an extension of control variates which can be used to reduce the variance of multiple Monte Carlo estimators jointly. This allows for the transfer of information across integration tasks, and hence reduces the need for a large number of samples. We focus on control variates based on kernel interpolants and our novel construction is obtained through a generalised Stein identity and the development of novel matrix-valued Stein reproducing kernels. We demonstrate our methodology on a range of problems including multifidelity modelling, Bayesian inference for dynamical systems, and model evidence computation through thermodynamic integration.
△ Less
Submitted 7 June, 2023; v1 submitted 18 September, 2021;
originally announced September 2021.
-
Discrepancy-based Inference for Intractable Generative Models using Quasi-Monte Carlo
Authors:
Ziang Niu,
Johanna Meier,
François-Xavier Briol
Abstract:
Intractable generative models are models for which the likelihood is unavailable but sampling is possible. Most approaches to parameter inference in this setting require the computation of some discrepancy between the data and the generative model. This is for example the case for minimum distance estimation and approximate Bayesian computation. These approaches require sampling a high number of r…
▽ More
Intractable generative models are models for which the likelihood is unavailable but sampling is possible. Most approaches to parameter inference in this setting require the computation of some discrepancy between the data and the generative model. This is for example the case for minimum distance estimation and approximate Bayesian computation. These approaches require sampling a high number of realisations from the model for different parameter values, which can be a significant challenge when simulating is an expensive operation. In this paper, we propose to enhance this approach by enforcing "sample diversity" in simulations of our models. This will be implemented through the use of quasi-Monte Carlo (QMC) point sets. Our key results are sample complexity bounds which demonstrate that, under smoothness conditions on the generator, QMC can significantly reduce the number of samples required to obtain a given level of accuracy when using three of the most common discrepancies: the maximum mean discrepancy, the Wasserstein distance, and the Sinkhorn divergence. This is complemented by a simulation study which highlights that an improved accuracy is sometimes also possible in some settings which are not covered by the theory.
△ Less
Submitted 4 July, 2022; v1 submitted 22 June, 2021;
originally announced June 2021.
-
Stein's Method Meets Computational Statistics: A Review of Some Recent Developments
Authors:
Andreas Anastasiou,
Alessandro Barp,
François-Xavier Briol,
Bruno Ebner,
Robert E. Gaunt,
Fatemeh Ghaderinezhad,
Jackson Gorham,
Arthur Gretton,
Christophe Ley,
Qiang Liu,
Lester Mackey,
Chris. J. Oates,
Gesine Reinert,
Yvik Swan
Abstract:
Stein's method compares probability distributions through the study of a class of linear operators called Stein operators. While mainly studied in probability and used to underpin theoretical statistics, Stein's method has led to significant advances in computational statistics in recent years. The goal of this survey is to bring together some of these recent developments and, in doing so, to stim…
▽ More
Stein's method compares probability distributions through the study of a class of linear operators called Stein operators. While mainly studied in probability and used to underpin theoretical statistics, Stein's method has led to significant advances in computational statistics in recent years. The goal of this survey is to bring together some of these recent developments and, in doing so, to stimulate further research into the successful field of Stein's method and statistics. The topics we discuss include tools to benchmark and compare sampling methods such as approximate Markov chain Monte Carlo, deterministic alternatives to sampling methods, control variate techniques, parameter estimation and goodness-of-fit testing.
△ Less
Submitted 22 June, 2022; v1 submitted 7 May, 2021;
originally announced May 2021.
-
Robust Generalised Bayesian Inference for Intractable Likelihoods
Authors:
Takuo Matsubara,
Jeremias Knoblauch,
François-Xavier Briol,
Chris. J. Oates
Abstract:
Generalised Bayesian inference updates prior beliefs using a loss function, rather than a likelihood, and can therefore be used to confer robustness against possible mis-specification of the likelihood. Here we consider generalised Bayesian inference with a Stein discrepancy as a loss function, motivated by applications in which the likelihood contains an intractable normalisation constant. In thi…
▽ More
Generalised Bayesian inference updates prior beliefs using a loss function, rather than a likelihood, and can therefore be used to confer robustness against possible mis-specification of the likelihood. Here we consider generalised Bayesian inference with a Stein discrepancy as a loss function, motivated by applications in which the likelihood contains an intractable normalisation constant. In this context, the Stein discrepancy circumvents evaluation of the normalisation constant and produces generalised posteriors that are either closed form or accessible using standard Markov chain Monte Carlo. On a theoretical level, we show consistency, asymptotic normality, and bias-robustness of the generalised posterior, highlighting how these properties are impacted by the choice of Stein discrepancy. Then, we provide numerical experiments on a range of intractable distributions, including applications to kernel-based exponential family models and non-Gaussian graphical models.
△ Less
Submitted 11 January, 2022; v1 submitted 15 April, 2021;
originally announced April 2021.
-
A General Method for Calibrating Stochastic Radio Channel Models with Kernels
Authors:
Ayush Bharti,
Francois-Xavier Briol,
Troels Pedersen
Abstract:
Calibrating stochastic radio channel models to new measurement data is challenging when the likelihood function is intractable. The standard approach to this problem involves sophisticated algorithms for extraction and clustering of multipath components, following which, point estimates of the model parameters can be obtained using specialized estimators. We propose a likelihood-free calibration m…
▽ More
Calibrating stochastic radio channel models to new measurement data is challenging when the likelihood function is intractable. The standard approach to this problem involves sophisticated algorithms for extraction and clustering of multipath components, following which, point estimates of the model parameters can be obtained using specialized estimators. We propose a likelihood-free calibration method using approximate Bayesian computation. The method is based on the maximum mean discrepancy, which is a notion of distance between probability distributions. Our method not only by-passes the need to implement any high-resolution or clustering algorithm, but is also automatic in that it does not require any additional input or manual pre-processing from the user. It also has the advantage of returning an entire posterior distribution on the value of the parameters, rather than a simple point estimate. We evaluate the performance of the proposed method by fitting two different stochastic channel models, namely the Saleh-Valenzuela model and the propagation graph model, to both simulated and measured data. The proposed method is able to estimate the parameters of both the models accurately in simulations, as well as when applied to 60 GHz indoor measurement data.
△ Less
Submitted 6 May, 2021; v1 submitted 17 December, 2020;
originally announced December 2020.
-
The Ridgelet Prior: A Covariance Function Approach to Prior Specification for Bayesian Neural Networks
Authors:
Takuo Matsubara,
Chris J. Oates,
François-Xavier Briol
Abstract:
Bayesian neural networks attempt to combine the strong predictive performance of neural networks with formal quantification of uncertainty associated with the predictive output in the Bayesian framework. However, it remains unclear how to endow the parameters of the network with a prior distribution that is meaningful when lifted into the output space of the network. A possible solution is propose…
▽ More
Bayesian neural networks attempt to combine the strong predictive performance of neural networks with formal quantification of uncertainty associated with the predictive output in the Bayesian framework. However, it remains unclear how to endow the parameters of the network with a prior distribution that is meaningful when lifted into the output space of the network. A possible solution is proposed that enables the user to posit an appropriate Gaussian process covariance function for the task at hand. Our approach constructs a prior distribution for the parameters of the network, called a ridgelet prior, that approximates the posited Gaussian process in the output space of the network. In contrast to existing work on the connection between neural networks and Gaussian processes, our analysis is non-asymptotic, with finite sample-size error bounds provided. This establishes the universality property that a Bayesian neural network can approximate any Gaussian process whose covariance function is sufficiently regular. Our experimental assessment is limited to a proof-of-concept, where we demonstrate that the ridgelet prior can out-perform an unstructured prior on regression problems for which a suitable Gaussian process prior can be provided.
△ Less
Submitted 11 January, 2022; v1 submitted 16 October, 2020;
originally announced October 2020.
-
Scalable Control Variates for Monte Carlo Methods via Stochastic Optimization
Authors:
Shi**g Si,
Chris. J. Oates,
Andrew B. Duncan,
Lawrence Carin,
François-Xavier Briol
Abstract:
Control variates are a well-established tool to reduce the variance of Monte Carlo estimators. However, for large-scale problems including high-dimensional and large-sample settings, their advantages can be outweighed by a substantial computational cost. This paper considers control variates based on Stein operators, presenting a framework that encompasses and generalizes existing approaches that…
▽ More
Control variates are a well-established tool to reduce the variance of Monte Carlo estimators. However, for large-scale problems including high-dimensional and large-sample settings, their advantages can be outweighed by a substantial computational cost. This paper considers control variates based on Stein operators, presenting a framework that encompasses and generalizes existing approaches that use polynomials, kernels and neural networks. A learning strategy based on minimising a variational objective through stochastic optimization is proposed, leading to scalable and effective control variates. Novel theoretical results are presented to provide insight into the variance reduction that can be achieved, and an empirical assessment, including applications to Bayesian inference, is provided in support.
△ Less
Submitted 21 July, 2021; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Bayesian Probabilistic Numerical Integration with Tree-Based Models
Authors:
Harrison Zhu,
Xing Liu,
Ruya Kang,
Zhichao Shen,
Seth Flaxman,
François-Xavier Briol
Abstract:
Bayesian quadrature (BQ) is a method for solving numerical integration problems in a Bayesian manner, which allows users to quantify their uncertainty about the solution. The standard approach to BQ is based on a Gaussian process (GP) approximation of the integrand. As a result, BQ is inherently limited to cases where GP approximations can be done in an efficient manner, thus often prohibiting ver…
▽ More
Bayesian quadrature (BQ) is a method for solving numerical integration problems in a Bayesian manner, which allows users to quantify their uncertainty about the solution. The standard approach to BQ is based on a Gaussian process (GP) approximation of the integrand. As a result, BQ is inherently limited to cases where GP approximations can be done in an efficient manner, thus often prohibiting very high-dimensional or non-smooth target functions. This paper proposes to tackle this issue with a new Bayesian numerical integration algorithm based on Bayesian Additive Regression Trees (BART) priors, which we call BART-Int. BART priors are easy to tune and well-suited for discontinuous functions. We demonstrate that they also lend themselves naturally to a sequential design setting and that explicit convergence rates can be obtained in a variety of settings. The advantages and disadvantages of this new methodology are highlighted on a set of benchmark tests including the Genz functions, and on a Bayesian survey design problem.
△ Less
Submitted 2 December, 2021; v1 submitted 9 June, 2020;
originally announced June 2020.
-
Joint Modeling of Received Power, Mean Delay, and Delay Spread for Wideband Radio Channels
Authors:
Ayush Bharti,
Ramoni Adeogun,
Xuesong Cai,
Wei Fan,
François-Xavier Briol,
Laurent Clavier,
Troels Pedersen
Abstract:
We propose a multivariate log-normal distribution to jointly model received power, mean delay, and root mean square (rms) delay spread of wideband radio channels, referred to as the standardized temporal moments. The model is validated using experimental data collected from five different measurement campaigns (four indoor and one outdoor scenario). We observe that the received power, mean delay a…
▽ More
We propose a multivariate log-normal distribution to jointly model received power, mean delay, and root mean square (rms) delay spread of wideband radio channels, referred to as the standardized temporal moments. The model is validated using experimental data collected from five different measurement campaigns (four indoor and one outdoor scenario). We observe that the received power, mean delay and rms delay spread are correlated random variables and, therefore, should be simulated jointly. Joint models are able to capture the structure of the underlying process, unlike the independent models considered in the literature. The proposed model of the multivariate log-normal distribution is found to be a good fit for a large number of wideband data-sets.
△ Less
Submitted 6 January, 2021; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Convergence Guarantees for Gaussian Process Means With Misspecified Likelihoods and Smoothness
Authors:
George Wynne,
François-Xavier Briol,
Mark Girolami
Abstract:
Gaussian processes are ubiquitous in machine learning, statistics, and applied mathematics. They provide a flexible modelling framework for approximating functions, whilst simultaneously quantifying uncertainty. However, this is only true when the model is well-specified, which is often not the case in practice. In this paper, we study the properties of Gaussian process means when the smoothness o…
▽ More
Gaussian processes are ubiquitous in machine learning, statistics, and applied mathematics. They provide a flexible modelling framework for approximating functions, whilst simultaneously quantifying uncertainty. However, this is only true when the model is well-specified, which is often not the case in practice. In this paper, we study the properties of Gaussian process means when the smoothness of the model and the likelihood function are misspecified. In this setting, an important theoretical question of practial relevance is how accurate the Gaussian process approximations will be given the difficulty of the problem, our model and the extent of the misspecification. The answer to this problem is particularly useful since it can inform our choice of model and experimental design. In particular, we describe how the experimental design and choice of kernel and kernel hyperparameters can be adapted to alleviate model misspecification.
△ Less
Submitted 18 May, 2021; v1 submitted 29 January, 2020;
originally announced January 2020.
-
Contributed Discussion of "A Bayesian Conjugate Gradient Method"
Authors:
Francois-Xavier Briol,
Francisco A. Diaz De la O,
Peter O. Hristov
Abstract:
We would like to congratulate the authors of "A Bayesian Conjugate Gradient Method" on their insightful paper, and welcome this publication which we firmly believe will become a fundamental contribution to the growing field of probabilistic numerical methods and in particular the sub-field of Bayesian numerical methods. In this short piece, which will be published as a comment alongside the main p…
▽ More
We would like to congratulate the authors of "A Bayesian Conjugate Gradient Method" on their insightful paper, and welcome this publication which we firmly believe will become a fundamental contribution to the growing field of probabilistic numerical methods and in particular the sub-field of Bayesian numerical methods. In this short piece, which will be published as a comment alongside the main paper, we first initiate a discussion on the choice of priors for solving linear systems, then propose an extension of the Bayesian conjugate gradient (BayesCG) algorithm for solving several related linear systems simultaneously.
△ Less
Submitted 8 August, 2019;
originally announced August 2019.
-
Minimum Stein Discrepancy Estimators
Authors:
Alessandro Barp,
Francois-Xavier Briol,
Andrew B. Duncan,
Mark Girolami,
Lester Mackey
Abstract:
When maximum likelihood estimation is infeasible, one often turns to score matching, contrastive divergence, or minimum probability flow to obtain tractable parameter estimates. We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with co…
▽ More
When maximum likelihood estimation is infeasible, one often turns to score matching, contrastive divergence, or minimum probability flow to obtain tractable parameter estimates. We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths. We establish the consistency, asymptotic normality, and robustness of DKSD and DSM estimators, then derive stochastic Riemannian gradient descent algorithms for their efficient optimisation. The main strength of our methodology is its flexibility, which allows us to design estimators with desirable properties for specific models at hand by carefully selecting a Stein discrepancy. We illustrate this advantage for several challenging problems for score matching, such as non-smooth, heavy-tailed or light-tailed densities.
△ Less
Submitted 5 October, 2022; v1 submitted 19 June, 2019;
originally announced June 2019.
-
Statistical Inference for Generative Models with Maximum Mean Discrepancy
Authors:
Francois-Xavier Briol,
Alessandro Barp,
Andrew B. Duncan,
Mark Girolami
Abstract:
While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we study a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable, but simulation…
▽ More
While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we study a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable, but simulation is cheap. The distance considered, maximum mean discrepancy (MMD), is defined through the embedding of probability measures into a reproducing kernel Hilbert space. We study the theoretical properties of these estimators, showing that they are consistent, asymptotically normal and robust to model misspecification. A main advantage of these estimators is the flexibility offered by the choice of kernel, which can be used to trade-off statistical efficiency and robustness. On the algorithmic side, we study the geometry induced by MMD on the parameter space and use this to introduce a novel natural gradient descent-like algorithm for efficient implementation of these estimators. We illustrate the relevance of our theoretical results on several classes of models including a discrete-time latent Markov process and two multivariate stochastic differential equation models.
△ Less
Submitted 13 June, 2019;
originally announced June 2019.
-
Stein Point Markov Chain Monte Carlo
Authors:
Wilson Ye Chen,
Alessandro Barp,
François-Xavier Briol,
Jackson Gorham,
Mark Girolami,
Lester Mackey,
Chris. J. Oates
Abstract:
An important task in machine learning and statistics is the approximation of a probability measure by an empirical measure supported on a discrete point set. Stein Points are a class of algorithms for this task, which proceed by sequentially minimising a Stein discrepancy between the empirical measure and the target and, hence, require the solution of a non-convex optimisation problem to obtain ea…
▽ More
An important task in machine learning and statistics is the approximation of a probability measure by an empirical measure supported on a discrete point set. Stein Points are a class of algorithms for this task, which proceed by sequentially minimising a Stein discrepancy between the empirical measure and the target and, hence, require the solution of a non-convex optimisation problem to obtain each new point. This paper removes the need to solve this optimisation problem by, instead, selecting each new point based on a Markov chain sample path. This significantly reduces the computational cost of Stein Points and leads to a suite of algorithms that are straightforward to implement. The new algorithms are illustrated on a set of challenging Bayesian inference problems, and rigorous theoretical guarantees of consistency are established.
△ Less
Submitted 14 September, 2020; v1 submitted 9 May, 2019;
originally announced May 2019.
-
Rejoinder for "Probabilistic Integration: A Role in Statistical Computation?"
Authors:
Francois-Xavier Briol,
Chris J. Oates,
Mark Girolami,
Michael A. Osborne,
Dino Sejdinovic
Abstract:
This article is the rejoinder for the paper "Probabilistic Integration: A Role in Statistical Computation?" to appear in Statistical Science with discussion. We would first like to thank the reviewers and many of our colleagues who helped shape this paper, the editor for selecting our paper for discussion, and of course all of the discussants for their thoughtful, insightful and constructive comme…
▽ More
This article is the rejoinder for the paper "Probabilistic Integration: A Role in Statistical Computation?" to appear in Statistical Science with discussion. We would first like to thank the reviewers and many of our colleagues who helped shape this paper, the editor for selecting our paper for discussion, and of course all of the discussants for their thoughtful, insightful and constructive comments. In this rejoinder, we respond to some of the points raised by the discussants and comment further on the fundamental questions underlying the paper: (i) Should Bayesian ideas be used in numerical analysis?, and (ii) If so, what role should such approaches have in statistical computation?
△ Less
Submitted 26 November, 2018;
originally announced November 2018.
-
Stein Points
Authors:
Wilson Ye Chen,
Lester Mackey,
Jackson Gorham,
François-Xavier Briol,
Chris J. Oates
Abstract:
An important task in computational statistics and machine learning is to approximate a posterior distribution $p(x)$ with an empirical measure supported on a set of representative points $\{x_i\}_{i=1}^n$. This paper focuses on methods where the selection of points is essentially deterministic, with an emphasis on achieving accurate approximation when $n$ is small. To this end, we present `Stein P…
▽ More
An important task in computational statistics and machine learning is to approximate a posterior distribution $p(x)$ with an empirical measure supported on a set of representative points $\{x_i\}_{i=1}^n$. This paper focuses on methods where the selection of points is essentially deterministic, with an emphasis on achieving accurate approximation when $n$ is small. To this end, we present `Stein Points'. The idea is to exploit either a greedy or a conditional gradient method to iteratively minimise a kernel Stein discrepancy between the empirical measure and $p(x)$. Our empirical results demonstrate that Stein Points enable accurate approximation of the posterior at modest computational cost. In addition, theoretical results are provided to establish convergence of the method.
△ Less
Submitted 19 June, 2018; v1 submitted 27 March, 2018;
originally announced March 2018.
-
Bayesian Quadrature for Multiple Related Integrals
Authors:
Xiaoyue Xi,
François-Xavier Briol,
Mark Girolami
Abstract:
Bayesian probabilistic numerical methods are a set of tools providing posterior distributions on the output of numerical methods. The use of these methods is usually motivated by the fact that they can represent our uncertainty due to incomplete/finite information about the continuous mathematical problem being approximated. In this paper, we demonstrate that this paradigm can provide additional a…
▽ More
Bayesian probabilistic numerical methods are a set of tools providing posterior distributions on the output of numerical methods. The use of these methods is usually motivated by the fact that they can represent our uncertainty due to incomplete/finite information about the continuous mathematical problem being approximated. In this paper, we demonstrate that this paradigm can provide additional advantages, such as the possibility of transferring information between several numerical methods. This allows users to represent uncertainty in a more faithful manner and, as a by-product, provide increased numerical efficiency. We propose the first such numerical method by extending the well-known Bayesian quadrature algorithm to the case where we are interested in computing the integral of several related functions. We then prove convergence rates for the method in the well-specified and misspecified cases, and demonstrate its efficiency in the context of multi-fidelity models for complex engineering systems and a problem of global illumination in computer graphics.
△ Less
Submitted 30 July, 2018; v1 submitted 12 January, 2018;
originally announced January 2018.
-
On the Sampling Problem for Kernel Quadrature
Authors:
Francois-Xavier Briol,
Chris J. Oates,
Jon Cockayne,
Wilson Ye Chen,
Mark Girolami
Abstract:
The standard Kernel Quadrature method for numerical integration with random point sets (also called Bayesian Monte Carlo) is known to converge in root mean square error at a rate determined by the ratio $s/d$, where $s$ and $d$ encode the smoothness and dimension of the integrand. However, an empirical investigation reveals that the rate constant $C$ is highly sensitive to the distribution of the…
▽ More
The standard Kernel Quadrature method for numerical integration with random point sets (also called Bayesian Monte Carlo) is known to converge in root mean square error at a rate determined by the ratio $s/d$, where $s$ and $d$ encode the smoothness and dimension of the integrand. However, an empirical investigation reveals that the rate constant $C$ is highly sensitive to the distribution of the random points. In contrast to standard Monte Carlo integration, for which optimal importance sampling is well-understood, the sampling distribution that minimises $C$ for Kernel Quadrature does not admit a closed form. This paper argues that the practical choice of sampling distribution is an important open problem. One solution is considered; a novel automatic approach based on adaptive tempering and sequential Monte Carlo. Empirical results demonstrate a dramatic reduction in integration error of up to 4 orders of magnitude can be achieved with the proposed method.
△ Less
Submitted 11 June, 2017;
originally announced June 2017.
-
Geometry and Dynamics for Markov Chain Monte Carlo
Authors:
Alessandro Barp,
Francois-Xavier Briol,
Anthony D. Kennedy,
Mark Girolami
Abstract:
Markov Chain Monte Carlo methods have revolutionised mathematical computation and enabled statistical inference within many previously intractable models. In this context, Hamiltonian dynamics have been proposed as an efficient way of building chains which can explore probability densities efficiently. The method emerges from physics and geometry and these links have been extensively studied by a…
▽ More
Markov Chain Monte Carlo methods have revolutionised mathematical computation and enabled statistical inference within many previously intractable models. In this context, Hamiltonian dynamics have been proposed as an efficient way of building chains which can explore probability densities efficiently. The method emerges from physics and geometry and these links have been extensively studied by a series of authors through the last thirty years. However, there is currently a gap between the intuitions and knowledge of users of the methodology and our deep understanding of these theoretical foundations. The aim of this review is to provide a comprehensive introduction to the geometric tools used in Hamiltonian Monte Carlo at a level accessible to statisticians, machine learners and other users of the methodology with only a basic understanding of Monte Carlo methods. This will be complemented with some discussion of the most recent advances in the field which we believe will become increasingly relevant to applied scientists.
△ Less
Submitted 8 May, 2017;
originally announced May 2017.
-
Comments on "Bayesian Solution Uncertainty Quantification for Differential Equations" by Chkrebtii, Campbell, Calderhead & Girolami
Authors:
Francois-Xavier Briol,
Jon Cockayne,
Onur Teymur
Abstract:
We commend the authors for an exciting paper which provides a strong contribution to the emerging field of probabilistic numerics (PN). Below, we discuss aspects of prior modelling which need to be considered thoroughly in future work.
We commend the authors for an exciting paper which provides a strong contribution to the emerging field of probabilistic numerics (PN). Below, we discuss aspects of prior modelling which need to be considered thoroughly in future work.
△ Less
Submitted 21 October, 2016;
originally announced October 2016.
-
Probabilistic Models for Integration Error in the Assessment of Functional Cardiac Models
Authors:
Chris. J. Oates,
Steven Niederer,
Angela Lee,
François-Xavier Briol,
Mark Girolami
Abstract:
This paper studies the numerical computation of integrals, representing estimates or predictions, over the output $f(x)$ of a computational model with respect to a distribution $p(\mathrm{d}x)$ over uncertain inputs $x$ to the model. For the functional cardiac models that motivate this work, neither $f$ nor $p$ possess a closed-form expression and evaluation of either requires $\approx$ 100 CPU ho…
▽ More
This paper studies the numerical computation of integrals, representing estimates or predictions, over the output $f(x)$ of a computational model with respect to a distribution $p(\mathrm{d}x)$ over uncertain inputs $x$ to the model. For the functional cardiac models that motivate this work, neither $f$ nor $p$ possess a closed-form expression and evaluation of either requires $\approx$ 100 CPU hours, precluding standard numerical integration methods. Our proposal is to treat integration as an estimation problem, with a joint model for both the a priori unknown function $f$ and the a priori unknown distribution $p$. The result is a posterior distribution over the integral that explicitly accounts for dual sources of numerical approximation error due to a severely limited computational budget. This construction is applied to account, in a statistically principled manner, for the impact of numerical errors that (at present) are confounding factors in functional cardiac model assessment.
△ Less
Submitted 12 December, 2017; v1 submitted 22 June, 2016;
originally announced June 2016.
-
Convergence Rates for a Class of Estimators Based on Stein's Method
Authors:
Chris J. Oates,
Jon Cockayne,
François-Xavier Briol,
Mark Girolami
Abstract:
Gradient information on the sampling distribution can be used to reduce the variance of Monte Carlo estimators via Stein's method. An important application is that of estimating an expectation of a test function along the sample path of a Markov chain, where gradient information enables convergence rate improvement at the cost of a linear system which must be solved. The contribution of this paper…
▽ More
Gradient information on the sampling distribution can be used to reduce the variance of Monte Carlo estimators via Stein's method. An important application is that of estimating an expectation of a test function along the sample path of a Markov chain, where gradient information enables convergence rate improvement at the cost of a linear system which must be solved. The contribution of this paper is to establish theoretical bounds on convergence rates for a class of estimators based on Stein's method. Our analysis accounts for (i) the degree of smoothness of the sampling distribution and test function, (ii) the dimension of the state space, and (iii) the case of non-independent samples arising from a Markov chain. These results provide insight into the rapid convergence of gradient-based estimators observed for low-dimensional problems, as well as clarifying a curse-of-dimension that appears inherent to such methods.
△ Less
Submitted 27 December, 2017; v1 submitted 10 March, 2016;
originally announced March 2016.
-
Probabilistic Integration: A Role in Statistical Computation?
Authors:
François-Xavier Briol,
Chris. J. Oates,
Mark Girolami,
Michael A. Osborne,
Dino Sejdinovic
Abstract:
A research frontier has emerged in scientific computation, wherein numerical error is regarded as a source of epistemic uncertainty that can be modelled. This raises several statistical challenges, including the design of statistical methods that enable the coherent propagation of probabilities through a (possibly deterministic) computational work-flow. This paper examines the case for probabilist…
▽ More
A research frontier has emerged in scientific computation, wherein numerical error is regarded as a source of epistemic uncertainty that can be modelled. This raises several statistical challenges, including the design of statistical methods that enable the coherent propagation of probabilities through a (possibly deterministic) computational work-flow. This paper examines the case for probabilistic numerical methods in routine statistical computation. Our focus is on numerical integration, where a probabilistic integrator is equipped with a full distribution over its output that reflects the presence of an unknown numerical error. Our main technical contribution is to establish, for the first time, rates of posterior contraction for these methods. These show that probabilistic integrators can in principle enjoy the "best of both worlds", leveraging the sampling efficiency of Monte Carlo methods whilst providing a principled route to assess the impact of numerical error on scientific conclusions. Several substantial applications are provided for illustration and critical evaluation, including examples from statistical modelling, computer graphics and a computer model for an oil reservoir.
△ Less
Submitted 18 October, 2017; v1 submitted 2 December, 2015;
originally announced December 2015.
-
Frank-Wolfe Bayesian Quadrature: Probabilistic Integration with Theoretical Guarantees
Authors:
François-Xavier Briol,
Chris J. Oates,
Mark Girolami,
Michael A. Osborne
Abstract:
There is renewed interest in formulating integration as an inference problem, motivated by obtaining a full distribution over numerical error that can be propagated through subsequent computation. Current methods, such as Bayesian Quadrature, demonstrate impressive empirical performance but lack theoretical analysis. An important challenge is to reconcile these probabilistic integrators with rigor…
▽ More
There is renewed interest in formulating integration as an inference problem, motivated by obtaining a full distribution over numerical error that can be propagated through subsequent computation. Current methods, such as Bayesian Quadrature, demonstrate impressive empirical performance but lack theoretical analysis. An important challenge is to reconcile these probabilistic integrators with rigorous convergence guarantees. In this paper, we present the first probabilistic integrator that admits such theoretical treatment, called Frank-Wolfe Bayesian Quadrature (FWBQ). Under FWBQ, convergence to the true value of the integral is shown to be exponential and posterior contraction rates are proven to be superexponential. In simulations, FWBQ is competitive with state-of-the-art methods and out-performs alternatives based on Frank-Wolfe optimisation. Our approach is applied to successfully quantify numerical error in the solution to a challenging model choice problem in cellular biology.
△ Less
Submitted 6 December, 2015; v1 submitted 8 June, 2015;
originally announced June 2015.
-
A numerical study of the 3D random interchange and random loop models
Authors:
Alessandro Barp,
Edoardo Gabriele Barp,
Francois-Xavier Briol,
Daniel Ueltschi
Abstract:
We have studied numerically the random interchange model and related loop models on the three-dimensional cubic lattice. We have determined the transition time for the occurrence of long loops. The joint distribution of the lengths of long loops is Poisson-Dirichlet with parameter 1 or 1/2.
We have studied numerically the random interchange model and related loop models on the three-dimensional cubic lattice. We have determined the transition time for the occurrence of long loops. The joint distribution of the lengths of long loops is Poisson-Dirichlet with parameter 1 or 1/2.
△ Less
Submitted 14 July, 2015; v1 submitted 5 May, 2015;
originally announced May 2015.