Search | arXiv e-print repository

The Bayesian approach to inverse Robin problems

Authors: Aksel Kaastrup Rasmussen, Fanny Seizilles, Mark Girolami, Ieva Kazlauskaite

Abstract: In this paper we investigate the Bayesian approach to inverse Robin problems. These are problems for certain elliptic boundary value problems of determining a Robin coefficient on a hidden part of the boundary from Cauchy data on the observable part. Such a nonlinear inverse problem arises naturally in the initialisation of large-scale ice sheet models that are crucial in climate and sea-level predictions. We motivate the Bayesian approach for a prototypical Robin inverse problem by showing that the posterior mean converges in probability to the data-generating ground truth as the number of observations increases. Related to the stability theory for inverse Robin problems, we establish a logarithmic convergence rate for Sobolev-regular Robin coefficients, whereas for analytic coefficients we can attain an algebraic rate. The use of rescaled analytic Gaussian priors in posterior consistency for nonlinear inverse problems is new and may be of separate interest in other inverse problems. Our numerical results illustrate the convergence property in two observation settings.

Submitted 29 November, 2023; originally announced November 2023.

MSC Class: 35R30; 62G20; 62F15

arXiv:2303.18059 [pdf, other]

Inferring networks from time series: a neural approach

Authors: Thomas Gaskin, Grigorios A. Pavliotis, Mark Girolami

Abstract: Network structures underlie the dynamics of many complex phenomena, from gene regulation and foodwebs to power grids and social media. Yet, as they often cannot be observed directly, their connectivities must be inferred from observations of the dynamics to which they give rise. In this work we present a powerful computational method to infer large network adjacency matrices from time series data using a neural network, in order to provide uncertainty quantification on the prediction in a manner that reflects both the degree to which the inference problem is underdetermined as well as the noise on the data. This is a feature that other approaches have hitherto been lacking. We demonstrate our method's capabilities by inferring line failure locations in the British power grid from its response to a power cut, providing probability densities on each edge and allowing the use of hypothesis testing to make meaningful probabilistic statements about the location of the cut. Our method is significantly more accurate than both Markov-chain Monte Carlo sampling and least squares regression on noisy data and when the problem is underdetermined, while naturally extending to the case of non-linear dynamics, which we demonstrate by learning an entire cost matrix for a non-linear model of economic activity in Greater London. Not having been specifically engineered for network inference, this method in fact represents a general parameter estimation scheme that is applicable to any high-dimensional parameter space.

Submitted 1 November, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

MSC Class: 68T07; 49M41; 65K05; 37A50

ACM Class: G.1.6; I.2.8; G.3; J.2

arXiv:2303.13429 [pdf, other]

Interacting Particle Langevin Algorithm for Maximum Marginal Likelihood Estimation

Authors: Ö. Deniz Akyildiz, Francesca Romana Crucinio, Mark Girolami, Tim Johnston, Sotirios Sabanis

Abstract: We develop a class of interacting particle systems for implementing a maximum marginal likelihood estimation (MMLE) procedure to estimate the parameters of a latent variable model. We achieve this by formulating a continuous-time interacting particle system which can be seen as a Langevin diffusion over an extended state space of parameters and latent variables. In particular, we prove that the parameter marginal of the stationary measure of this diffusion has the form of a Gibbs measure where the number of particles acts as the inverse temperature parameter in classical settings for global optimisation. Using a particular rescaling, we then prove geometric ergodicity of this system and bound the discretisation error in a manner that is uniform in time and does not increase with the number of particles. The discretisation results in an algorithm, termed Interacting Particle Langevin Algorithm (IPLA), which can be used for MMLE. We further prove nonasymptotic bounds for the optimisation error of our estimator in terms of key parameters of the problem, and also extend this result to the case of stochastic gradients covering practical scenarios. We provide numerical experiments to illustrate the empirical behaviour of our algorithm in the context of logistic regression with verifiable assumptions. Our setting provides a straightforward way to implement a diffusion-based optimisation routine compared to more classical approaches such as the Expectation Maximisation (EM) algorithm, and allows for especially explicit nonasymptotic bounds.

Submitted 11 October, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: 38 pages

arXiv:2301.11040 [pdf, other]

Random Grid Neural Processes for Parametric Partial Differential Equations

Authors: Arnaud Vadeboncoeur, Ieva Kazlauskaite, Yanni Papandreou, Fehmi Cirak, Mark Girolami, Ömer Deniz Akyildiz

Abstract: We introduce a new class of spatially stochastic physics and data informed deep latent models for parametric partial differential equations (PDEs) which operate through scalable variational neural processes. We achieve this by assigning probability measures to the spatial domain, which allows us to treat collocation grids probabilistically as random variables to be marginalised out. Adapting this spatial statistics view, we solve forward and inverse problems for parametric PDEs in a way that leads to the construction of Gaussian process models of solution fields. The implementation of these random grids poses a unique set of challenges for inverse physics informed deep learning frameworks and we propose a new architecture called Grid Invariant Convolutional Networks (GICNets) to overcome these challenges. We further show how to incorporate noisy data in a principled manner into our physics informed model to improve predictions for problems where data may be available but whose measurement location does not coincide with any fixed mesh or grid. The proposed method is tested on a nonlinear Poisson problem, Burgers equation, and Navier-Stokes equations, and we provide extensive numerical comparisons. We demonstrate significant computational advantages over current physics informed neural learning methods for parametric PDEs while improving the predictive capabilities and flexibility of these models.

Submitted 7 June, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

arXiv:2209.13565 [pdf, other]

doi 10.1073/pnas.2216415120

Neural parameter calibration for large-scale multi-agent models

Authors: Thomas Gaskin, Grigorios A. Pavliotis, Mark Girolami

Abstract: Computational models have become a powerful tool in the quantitative sciences to understand the behaviour of complex systems that evolve in time. However, they often contain a potentially large number of free parameters whose values cannot be obtained from theory but need to be inferred from data. This is especially the case for models in the social sciences, economics, or computational epidemiology. Yet many current parameter estimation methods are mathematically involved and computationally slow to run. In this paper we present a computationally simple and fast method to retrieve accurate probability densities for model parameters using neural differential equations. We present a pipeline comprising multi-agent models acting as forward solvers for systems of ordinary or stochastic differential equations, and a neural network to then extract parameters from the data generated by the model. The two combined create a powerful tool that can quickly estimate densities on model parameters, even for very large systems. We demonstrate the method on synthetic time series data of the SIR model of the spread of infection, and perform an in-depth analysis of the Harris-Wilson model of economic activity on a network, representing a non-convex problem. For the latter, we apply our method both to synthetic data and to data of economic activity across Greater London. We find that our method calibrates the model orders of magnitude more accurately than a previous study of the same dataset using classical techniques, while running between 195 and 390 times faster.

Submitted 31 January, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

Report number: 120, 7

MSC Class: 68T07; 49M41; 65K05; 37A50

ACM Class: G.1.6; I.2.8; G.3; J.2

Journal ref: PNAS 2023

arXiv:2209.12835 [pdf, ps, other]

Targeted Separation and Convergence with Kernel Discrepancies

Authors: Alessandro Barp, Carl-Johann Simon-Gabriel, Mark Girolami, Lester Mackey

Abstract: Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each setting, these kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or even (ii) control weak convergence to P. In this article we derive new sufficient and necessary conditions to ensure (i) and (ii). For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels and for controlling convergence with bounded kernels. We use these results on $\mathbb{R}^d$ to substantially broaden the known conditions for KSD separation and convergence control and to develop the first KSDs known to exactly metrize weak convergence to P. Along the way, we highlight the implications of our results for hypothesis testing, measuring and improving sample quality, and sampling with Stein variational gradient descent.

Submitted 6 December, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

arXiv:2208.04856 [pdf, other]

doi 10.1016/j.jcp.2023.112369

Fully probabilistic deep models for forward and inverse problems in parametric PDEs

Authors: Arnaud Vadeboncoeur, Ömer Deniz Akyildiz, Ieva Kazlauskaite, Mark Girolami, Fehmi Cirak

Abstract: We introduce a physics-driven deep latent variable model (PDDLVM) to learn simultaneously parameter-to-solution (forward) and solution-to-parameter (inverse) maps of parametric partial differential equations (PDEs). Our formulation leverages conventional PDE discretization techniques, deep neural networks, probabilistic modelling, and variational inference to assemble a fully probabilistic coherent framework. In the posited probabilistic model, both the forward and inverse maps are approximated as Gaussian distributions with a mean and covariance parameterized by deep neural networks. The PDE residual is assumed to be an observed random vector of value zero, hence we model it as a random vector with a zero mean and a user-prescribed covariance. The model is trained by maximizing the probability, that is the evidence or marginal likelihood, of observing a residual of zero by maximizing the evidence lower bound (ELBO). Consequently, the proposed methodology does not require any independent PDE solves and is physics-informed at training time, allowing the real-time solution of PDE forward and inverse problems after training. The proposed framework can be easily extended to seamlessly integrate observed data to solve inverse problems and to build generative models. We demonstrate the efficiency and robustness of our method on finite element discretized parametric PDE problems such as linear and nonlinear Poisson problems, elastic shells with complex 3D geometries, and time-dependent nonlinear and inhomogeneous PDEs using a physics-informed neural network (PINN) discretization. We achieve up to three orders of magnitude speed-up after training compared to traditional finite element method (FEM), while outputting coherent uncertainty estimates.

Submitted 14 July, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

arXiv:2203.10592 [pdf, other]

doi 10.1016/bs.host.2022.03.005

Geometric Methods for Sampling, Optimisation, Inference and Adaptive Agents

Authors: Alessandro Barp, Lancelot Da Costa, Guilherme França, Karl Friston, Mark Girolami, Michael I. Jordan, Grigorios A. Pavliotis

Abstract: In this chapter, we identify fundamental geometric structures that underlie the problems of sampling, optimisation, inference and adaptive decision-making. Based on this identification, we derive algorithms that exploit these geometric structures to solve these problems efficiently. We show that a wide range of geometric theories emerge naturally in these fields, including measure-preserving processes, information divergences, Poisson geometry, and geometric integration. Specifically, we explain how (i) leveraging the symplectic geometry of Hamiltonian systems enables us to construct (accelerated) sampling and optimisation methods, (ii) the theory of Hilbertian subspaces and Stein operators provides a general methodology to obtain robust estimators, (iii) preserving the information geometry of decision-making yields adaptive agents that perform active inference. Throughout, we emphasise the rich connections between these fields; e.g., inference draws on sampling and optimisation, and adaptive decision-making assesses decisions by inferring their counterfactual consequences. Our exposition provides a conceptual overview of underlying ideas, rather than a technical discussion, which can be found in the references herein.

Submitted 25 July, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

Comments: 30 pages, 4 figures; 42 pages including table of contents and references

Journal ref: Handbook of Statistics, vol. 46, pp. 21--78 (2022)

arXiv:2201.07543 [pdf, other]

Error analysis for a statistical finite element method

Authors: Toni Karvonen, Fehmi Cirak, Mark Girolami

Abstract: The recently proposed statistical finite element (statFEM) approach synthesises measurement data with finite element models and allows for making predictions about the true system response. We provide a probabilistic error analysis for a prototypical statFEM setup based on a Gaussian process prior under the assumption that the noisy measurement data are generated by a deterministic true system response function that satisfies a second-order elliptic partial differential equation for an unknown true source term. In certain cases, properties such as the smoothness of the source term may be misspecified by the Gaussian process model. The error estimates we derive are for the expectation with respect to the measurement noise of the $L^2$-norm of the difference between the true system response and the mean of the statFEM posterior. The estimates imply polynomial rates of convergence in the numbers of measurement points and finite element basis functions and depend on the Sobolev smoothness of the true source term and the Gaussian process model. A numerical example for Poisson's equation is used to illustrate these theoretical results.

Submitted 19 January, 2022; originally announced January 2022.

arXiv:2111.07691 [pdf, other]

Theoretical Guarantees for the Statistical Finite Element Method

Authors: Yanni Papandreou, Jon Cockayne, Mark Girolami, Andrew B. Duncan

Abstract: The statistical finite element method (StatFEM) is an emerging probabilistic method that allows observations of a physical system to be synthesised with the numerical solution of a PDE intended to describe it in a coherent statistical framework, to compensate for model error. This work presents a new theoretical analysis of the statistical finite element method demonstrating that it has similar convergence properties to the finite element method on which it is based. Our results constitute a bound on the Wasserstein-2 distance between the ideal prior and posterior and the StatFEM approximation thereof, and show that this distance converges at the same mesh-dependent rate as finite element solutions converge to the true solution. Several numerical examples are presented to demonstrate our theory, including an example which tests the robustness of StatFEM when extended to nonlinear quantities of interest.

Submitted 18 February, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

Comments: 27 pages for main article, 11 pages for supplement, 8 figures; typos corrected

MSC Class: 65N75 (Primary) 35R60; 65C30 (Secondary)

arXiv:2110.11131 [pdf, other]

Statistical Finite Elements via Langevin Dynamics

Authors: Ömer Deniz Akyildiz, Connor Duffin, Sotirios Sabanis, Mark Girolami

Abstract: The recent statistical finite element method (statFEM) provides a coherent statistical framework to synthesise finite element models with observed data. Through embedding uncertainty inside of the governing equations, finite element solutions are updated to give a posterior distribution which quantifies all sources of uncertainty associated with the model. However, to incorporate all sources of uncertainty, one must integrate over the uncertainty associated with the model parameters, the known forward problem of uncertainty quantification. In this paper, we make use of Langevin dynamics to solve the statFEM forward problem, studying the utility of the unadjusted Langevin algorithm (ULA), a Metropolis-free Markov chain Monte Carlo sampler, to build a sample-based characterisation of this otherwise intractable measure. Due to the structure of the statFEM problem, these methods are able to solve the forward problem without explicit full PDE solves, requiring only sparse matrix-vector products. ULA is also gradient-based, and hence provides a scalable approach up to high degrees-of-freedom. Leveraging the theory behind Langevin-based samplers, we provide theoretical guarantees on sampler performance, demonstrating convergence, for both the prior and posterior, in the Kullback-Leibler divergence and in Wasserstein-2, with further results on the effect of preconditioning. Numerical experiments are also provided, for both the prior and posterior, to demonstrate the efficacy of the sampler, with a Python package also included.

Submitted 27 December, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

Comments: New version, experiments updated

arXiv:2109.04757 [pdf, other]

doi 10.1016/j.jcp.2022.111261

Low-rank statistical finite elements for scalable model-data synthesis

Authors: Connor Duffin, Edward Cripps, Thomas Stemler, Mark Girolami

Abstract: Statistical learning additions to physically derived mathematical models are gaining traction in the literature. A recent approach has been to augment the underlying physics of the governing equations with data driven Bayesian statistical methodology. Coined statFEM, the method acknowledges a priori model misspecification, by embedding stochastic forcing within the governing equations. Upon receipt of additional data, the posterior distribution of the discretised finite element solution is updated using classical Bayesian filtering techniques. The resultant posterior jointly quantifies uncertainty associated with the ubiquitous problem of model misspecification and the data intended to represent the true process of interest. Despite this appeal, computational scalability is a challenge to statFEM's application to high-dimensional problems typically experienced in physical and industrial contexts. This article overcomes this hurdle by embedding a low-rank approximation of the underlying dense covariance matrix, obtained from the leading order modes of the full-rank alternative. Demonstrated on a series of reaction-diffusion problems of increasing dimension, using experimental and simulated data, the method reconstructs the sparsely observed data-generating processes with minimal loss of information, in both the posterior mean and variance, paving the way for further integration of physical and probabilistic approaches to complex systems.

Submitted 21 March, 2022; v1 submitted 10 September, 2021; originally announced September 2021.

Comments: 33 pages, 14 figures, revised version

arXiv:2107.11231 [pdf, other]

Optimization on manifolds: A symplectic approach

Authors: Guilherme França, Alessandro Barp, Mark Girolami, Michael I. Jordan

Abstract: Optimization tasks are crucial in statistical machine learning. Recently, there has been great interest in leveraging tools from dynamical systems to derive accelerated and robust optimization methods via suitable discretizations of continuous-time systems. However, these ideas have mostly been limited to Euclidean spaces and unconstrained settings, or to Riemannian gradient flows. In this work, we propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems over smooth manifolds, including problems with nonlinear constraints. We develop geometric/symplectic numerical integrators on manifolds that are "rate-matching," i.e., preserve the continuous-time rates of convergence. In particular, we introduce a dissipative RATTLE integrator able to achieve optimal convergence rate locally. Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.

Submitted 4 July, 2023; v1 submitted 23 July, 2021; originally announced July 2021.

Comments: additional results, including rates for constrained optimization on manifolds

arXiv:2105.02845 [pdf, ps, other]

A Unifying and Canonical Description of Measure-Preserving Diffusions

Authors: Alessandro Barp, So Takao, Michael Betancourt, Alexis Arnaudon, Mark Girolami

Abstract: A complete recipe of measure-preserving diffusions in Euclidean space was recently derived unifying several MCMC algorithms into a single framework. In this paper, we develop a geometric theory that improves and generalises this construction to any manifold. We thereby demonstrate that the completeness result is a direct consequence of the topology of the underlying manifold and the geometry induced by the target measure $P$; there is no need to introduce other structures such as a Riemannian metric, local coordinates, or a reference measure. Instead, our framework relies on the intrinsic geometry of $P$ and in particular its canonical derivative, the deRham rotationnel, which allows us to parametrise the Fokker--Planck currents of measure-preserving diffusions using potentials. The geometric formalism can easily incorporate constraints and symmetries, and deliver new important insights, for example, a new complete recipe of Langevin-like diffusions that are suited to the construction of samplers. We also analyse the reversibility and dissipative properties of the diffusions, the associated deterministic flow on the space of measures, and the geometry of Langevin processes. Our article connects ideas from various literature and frames the theory of measure-preserving diffusions in its appropriate mathematical context.

Submitted 6 May, 2021; originally announced May 2021.

arXiv:2103.13729 [pdf, other]

Digital twinning of self-sensing structures using the statistical finite element method

Authors: Eky Febrianto, Liam Butler, Mark Girolami, Fehmi Cirak

Abstract: The monitoring of infrastructure assets using sensor networks is becoming increasingly prevalent. A digital twin in the form of a finite element model, as used in design and construction, can help make sense of the copious amount of collected sensor data. This paper demonstrates the application of the statistical finite element method (statFEM), which provides a consistent and principled means for synthesising data and physics-based models, in developing a digital twin of a self-sensing structure. As a case study, an instrumented steel railway bridge of 27.34 m length located along the West Coast Mainline near Staffordshire in the UK is considered. Using strain data captured from fibre Bragg grating (FBG) sensors at 108 locations along the bridge superstructure, statFEM can predict the 'true' system response while taking into account the uncertainties in sensor readings, applied loading and finite element model misspecification errors. Longitudinal strain distributions along the two main I-beams are both measured and modelled during the passage of a passenger train. The digital twin, because of its physics-based component, is able to generate reasonable strain distribution predictions at locations where no measurement data is available, including at several points along the main I-beams and on structural elements on which sensors are not even installed. The implications for long-term structural health monitoring and assessment include optimisation of sensor placement, and performing more reliable what-if analyses at locations and under loading scenarios for which no measurement data is available.

Submitted 28 July, 2022; v1 submitted 25 March, 2021; originally announced March 2021.

arXiv:2004.12654 [pdf, other]

Integration in reproducing kernel Hilbert spaces of Gaussian kernels

Authors: Toni Karvonen, Chris J. Oates, Mark Girolami

Abstract: The Gaussian kernel plays a central role in machine learning, uncertainty quantification and scattered data approximation, but has received relatively little attention from a numerical analysis standpoint. The basic problem of finding an algorithm for efficient numerical integration of functions reproduced by Gaussian kernels has not been fully solved. In this article we construct two classes of algorithms that use $N$ evaluations to integrate $d$-variate functions reproduced by Gaussian kernels and prove the exponential or super-algebraic decay of their worst-case errors. In contrast to earlier work, no constraints are placed on the length-scale parameter of the Gaussian kernel. The first class of algorithms is obtained via an appropriate scaling of the classical Gauss-Hermite rules. For these algorithms we derive lower and upper bounds on the worst-case error of the forms $\exp(-c_1 N^{1/d}) N^{1/(4d)}$ and $\exp(-c_2 N^{1/d}) N^{-1/(4d)}$, respectively, for positive constants $c_1 > c_2$. The second class of algorithms we construct is more flexible and uses worst-case optimal weights for points that may be taken as a nested sequence. For these algorithms we derive upper bounds of the form $\exp(-c_3 N^{1/(2d)})$ for a positive constant $c_3$.

Submitted 31 March, 2021; v1 submitted 27 April, 2020; originally announced April 2020.

Comments: Accepted for publication in Mathematics of Computation

arXiv:2001.10818 [pdf, ps, other]

Convergence Guarantees for Gaussian Process Means With Misspecified Likelihoods and Smoothness

Authors: George Wynne, François-Xavier Briol, Mark Girolami

Abstract: Gaussian processes are ubiquitous in machine learning, statistics, and applied mathematics. They provide a flexible modelling framework for approximating functions, whilst simultaneously quantifying uncertainty. However, this is only true when the model is well-specified, which is often not the case in practice. In this paper, we study the properties of Gaussian process means when the smoothness of the model and the likelihood function are misspecified. In this setting, an important theoretical question of practical relevance is how accurate the Gaussian process approximations will be given the difficulty of the problem, our model and the extent of the misspecification. The answer to this problem is particularly useful since it can inform our choice of model and experimental design. In particular, we describe how the experimental design and choice of kernel and kernel hyperparameters can be adapted to alleviate model misspecification.

Submitted 18 May, 2021; v1 submitted 29 January, 2020; originally announced January 2020.

Comments: Accepted to JMLR

arXiv:1907.07037 [pdf, other]

Embedded Ridge Approximations

Authors: Chun Yui Wong, Pranay Seshadri, Geoffrey Parks, Mark Girolami

Abstract: Many quantities of interest (qois) arising from differential-equation-centric models can be resolved into functions of scalar fields. Examples of such qois include the lift over an airfoil or the displacement of a loaded structure; examples of corresponding fields are the static pressure field in a computational fluid dynamics solution, and the strain field in the finite element elasticity analysis. These scalar fields are evaluated at each node within a discretised computational domain. In certain scenarios, the field at a certain node is only weakly influenced by far-field perturbations; it is likely to be strongly governed by local perturbations, which in turn can be caused by uncertainties in the geometry. One can interpret this as a strong anisotropy of the field with respect to uncertainties in prescribed inputs. We exploit this notion of localised scalar-field influence for approximating global qois, which often are integrals of certain field quantities. We formalise our ideas by assigning ridge approximations for the field at select nodes. This embedded ridge approximation has favorable theoretical properties for approximating a global qoi in terms of the reduced number of computational evaluations required. Parallels are drawn between our proposed approach, active subspaces and vector-valued dimension reduction.
Additionally, we study the ridge directions of adjacent nodes and devise algorithms that can recover field quantities at selected nodes, when storing the ridge profiles at a subset of nodes---paving the way for novel reduced order modeling strategies. Our paper offers analytical and simulation-based examples that expose different facets of embedded ridge approximations.

Submitted 18 August, 2020; v1 submitted 16 July, 2019; originally announced July 2019.

arXiv:1906.08283 [pdf, other]

Minimum Stein Discrepancy Estimators

Authors: Alessandro Barp, Francois-Xavier Briol, Andrew B. Duncan, Mark Girolami, Lester Mackey

Abstract: When maximum likelihood estimation is infeasible, one often turns to score matching, contrastive divergence, or minimum probability flow to obtain tractable parameter estimates. We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths. We establish the consistency, asymptotic normality, and robustness of DKSD and DSM estimators, then derive stochastic Riemannian gradient descent algorithms for their efficient optimisation. The main strength of our methodology is its flexibility, which allows us to design estimators with desirable properties for specific models at hand by carefully selecting a Stein discrepancy. We illustrate this advantage for several challenging problems for score matching, such as non-smooth, heavy-tailed or light-tailed densities.

Submitted 5 October, 2022; v1 submitted 19 June, 2019; originally announced June 2019.

Comments: Accepted for publication at NeurIPS 2019

arXiv:1906.05944 [pdf, other]

Statistical Inference for Generative Models with Maximum Mean Discrepancy

Authors: Francois-Xavier Briol, Alessandro Barp, Andrew B. Duncan, Mark Girolami

Abstract: While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we study a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable, but simulation is cheap. The distance considered, maximum mean discrepancy (MMD), is defined through the embedding of probability measures into a reproducing kernel Hilbert space. We study the theoretical properties of these estimators, showing that they are consistent, asymptotically normal and robust to model misspecification. A main advantage of these estimators is the flexibility offered by the choice of kernel, which can be used to trade off statistical efficiency and robustness. On the algorithmic side, we study the geometry induced by MMD on the parameter space and use this to introduce a novel natural gradient descent-like algorithm for efficient implementation of these estimators. We illustrate the relevance of our theoretical results on several classes of models including a discrete-time latent Markov process and two multivariate stochastic differential equation models.

Submitted 13 June, 2019; originally announced June 2019.
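
Illustrative note (not part of the listing): the maximum mean discrepancy named in the abstract above compares two samples through kernel evaluations only, which is why it suits models that can be simulated but have no tractable likelihood. The following is a minimal sketch of a biased (V-statistic) estimate of the squared MMD for scalar samples. The Gaussian kernel, the `lengthscale` value, the sample sizes and all function names are assumptions made for illustration; this is not the authors' estimator or their natural-gradient algorithm.

```python
import math
import random


def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 * lengthscale^2)) on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def mmd_squared(xs, ys, lengthscale=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    n, m = len(xs), len(ys)
    # Average kernel value within each sample and across the two samples.
    k_xx = sum(gaussian_kernel(a, b, lengthscale) for a in xs for b in xs) / (n * n)
    k_yy = sum(gaussian_kernel(a, b, lengthscale) for a in ys for b in ys) / (m * m)
    k_xy = sum(gaussian_kernel(a, b, lengthscale) for a in xs for b in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


# Hypothetical data: "observations" plus two simulated samples, one from the
# same distribution and one from a shifted distribution.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
same = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]

mmd_same = mmd_squared(data, same)
mmd_shifted = mmd_squared(data, shifted)
```

In a minimum distance estimator of the kind the abstract describes, a quantity like `mmd_squared(data, simulate(theta))` would be minimised over the model parameter `theta`; `simulate` is a hypothetical simulator and is not defined here.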

arXiv:1905.06391 [pdf, other]

doi 10.1016/j.cma.2020.113533

The statistical finite element method (statFEM) for coherent synthesis of observation data and model predictions

Authors: Mark Girolami, Eky Febrianto, Ge Yin, Fehmi Cirak

Abstract: The increased availability of observation data from engineering systems in operation poses the question of how to incorporate this data into finite element models. To this end, we propose a novel statistical construction of the finite element method that provides the means of synthesising measurement data and finite element models. The Bayesian statistical framework is adopted to treat all the uncertainties present in the data, the mathematical model and its finite element discretisation. From the outset, we postulate a data-generating model which additively decomposes data into a finite element, a model misspecification and a noise component. Each of the components may be uncertain and is considered as a random variable with a respective prior probability density. The prior of the finite element component is given by a conventional stochastic forward problem. The prior probabilities of the model misspecification and measurement noise, without loss of generality, are assumed to have zero-mean and known covariance structure. Our proposed statistical model is hierarchical in the sense that each of the three random components may depend on non-observable random hyperparameters. Because of the hierarchical structure of the statistical model, Bayes rule is applied on three different levels in turn to infer the posterior densities of the three random components and hyperparameters.
On level one, we determine the posterior densities of the finite element component and the true system response using the prior finite element density given by the forward problem and the data likelihood. On the next level, we infer the hyperparameter posterior densities from their respective priors and the marginal likelihood of the first inference problem. Finally, on level three we use Bayes rule to choose the most suitable finite element model in light of the observed data by computing the model posteriors.

Submitted 22 January, 2021; v1 submitted 15 May, 2019; originally announced May 2019.

arXiv:1905.03673 [pdf, other]

Stein Point Markov Chain Monte Carlo

Authors: Wilson Ye Chen, Alessandro Barp, François-Xavier Briol, Jackson Gorham, Mark Girolami, Lester Mackey, Chris. J. Oates

Abstract: An important task in machine learning and statistics is the approximation of a probability measure by an empirical measure supported on a discrete point set. Stein Points are a class of algorithms for this task, which proceed by sequentially minimising a Stein discrepancy between the empirical measure and the target and, hence, require the solution of a non-convex optimisation problem to obtain each new point. This paper removes the need to solve this optimisation problem by, instead, selecting each new point based on a Markov chain sample path. This significantly reduces the computational cost of Stein Points and leads to a suite of algorithms that are straightforward to implement. The new algorithms are illustrated on a set of challenging Bayesian inference problems, and rigorous theoretical guarantees of consistency are established.

Submitted 14 September, 2020; v1 submitted 9 May, 2019; originally announced May 2019.

Comments: Minor bug fixed in Theorem 4 (result unchanged)

Journal ref: ICML 2019

arXiv:1811.10275 [pdf, ps, other]

Rejoinder for "Probabilistic Integration: A Role in Statistical Computation?"

Authors: Francois-Xavier Briol, Chris J. Oates, Mark Girolami, Michael A. Osborne, Dino Sejdinovic

Abstract: This article is the rejoinder for the paper "Probabilistic Integration: A Role in Statistical Computation?" to appear in Statistical Science with discussion. We would first like to thank the reviewers and many of our colleagues who helped shape this paper, the editor for selecting our paper for discussion, and of course all of the discussants for their thoughtful, insightful and constructive comments. In this rejoinder, we respond to some of the points raised by the discussants and comment further on the fundamental questions underlying the paper: (i) Should Bayesian ideas be used in numerical analysis?, and (ii) If so, what role should such approaches have in statistical computation?

Submitted 26 November, 2018; originally announced November 2018.

Comments: Accepted to Statistical Science

arXiv:1810.04946 [pdf, other]

A Riemann-Stein Kernel Method

Authors: Alessandro Barp, Chris. J. Oates, Emilio Porcu, Mark Girolami

Abstract: This paper proposes and studies a numerical method for approximation of posterior expectations based on interpolation with a Stein reproducing kernel. Finite-sample-size bounds on the approximation error are established for posterior distributions supported on a compact Riemannian manifold, and we relate these to a kernel Stein discrepancy (KSD). Moreover, we prove in our setting that the KSD is equivalent to Sobolev discrepancy and, in doing so, we completely characterise the convergence-determining properties of KSD. Our contribution is rooted in a novel combination of Stein's method, the theory of reproducing kernels, and existence and regularity results for partial differential equations on a Riemannian manifold.

Submitted 11 January, 2022; v1 submitted 11 October, 2018; originally announced October 2018.

arXiv:1801.05242 [pdf, other]

A Bayesian Conjugate Gradient Method

Authors: Jon Cockayne, Chris Oates, Ilse Ipsen, Mark Girolami

Abstract: A fundamental task in numerical computation is the solution of large linear systems. The conjugate gradient method is an iterative method which offers rapid convergence to the solution, particularly when an effective preconditioner is employed. However, for more challenging systems a substantial error can be present even after many iterations have been performed. The estimates obtained in this case are of little value unless further information can be provided about the numerical error. In this paper we propose a novel statistical model for this numerical error set in a Bayesian framework. Our approach is a strict generalisation of the conjugate gradient method, which is recovered as the posterior mean for a particular choice of prior. The estimates obtained are analysed with Krylov subspace methods and a contraction result for the posterior is presented. The method is then analysed in a simulation study as well as being applied to a challenging problem in medical imaging.

Submitted 17 December, 2018; v1 submitted 16 January, 2018; originally announced January 2018.
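
Illustrative note (not part of the listing): the abstract above states that the classical conjugate gradient method is recovered as the posterior mean of the Bayesian method for a particular prior. For orientation, here is a minimal sketch of that classical baseline only, for a small symmetric positive definite system. The function names, tolerance and example matrix are assumptions for illustration, and the sketch does not implement the Bayesian error model or any preconditioning.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a dense matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Classical CG for A x = b, with A symmetric positive definite."""
    n = len(b)
    max_iter = 10 * n if max_iter is None else max_iter
    x = [0.0] * n
    r = list(b)          # residual b - A x for the starting point x = 0
    p = list(r)          # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break        # residual norm is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                       # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction: residual plus a multiple of the previous direction.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 2x2 example; the exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```

The point estimate `x` carries no information about the remaining numerical error when the iteration stops early, which is the gap the abstract says the Bayesian formulation is meant to address.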

arXiv:1801.04153 [pdf, other]

Bayesian Quadrature for Multiple Related Integrals

Authors: Xiaoyue Xi, François-Xavier Briol, Mark Girolami

Abstract: Bayesian probabilistic numerical methods are a set of tools providing posterior distributions on the output of numerical methods. The use of these methods is usually motivated by the fact that they can represent our uncertainty due to incomplete/finite information about the continuous mathematical problem being approximated. In this paper, we demonstrate that this paradigm can provide additional advantages, such as the possibility of transferring information between several numerical methods. This allows users to represent uncertainty in a more faithful manner and, as a by-product, provide increased numerical efficiency. We propose the first such numerical method by extending the well-known Bayesian quadrature algorithm to the case where we are interested in computing the integral of several related functions. We then prove convergence rates for the method in the well-specified and misspecified cases, and demonstrate its efficiency in the context of multi-fidelity models for complex engineering systems and a problem of global illumination in computer graphics.

Submitted 30 July, 2018; v1 submitted 12 January, 2018; originally announced January 2018.

Comments: Proceedings of the 35th International Conference on Machine Learning (ICML), PMLR 80:5369-5378, 2018

arXiv:1711.11280 [pdf, ps, other]

How Deep Are Deep Gaussian Processes?

Authors: Matthew M. Dunlop, Mark A. Girolami, Andrew M. Stuart, Aretha L. Teckentrup

Abstract: Recent research has shown the potential utility of Deep Gaussian Processes. These deep structures are probability distributions, designed through hierarchical construction, which are conditionally Gaussian. In this paper, the current published body of work is placed in a common framework and, through recursion, several classes of deep Gaussian processes are defined. The resulting samples generated from a deep Gaussian process have a Markovian structure with respect to the depth parameter, and the effective depth of the resulting process is interpreted in terms of the ergodicity, or non-ergodicity, of the resulting Markov chain. For the classes of deep Gaussian processes introduced, we provide results concerning their ergodicity and hence their effective depth. We also demonstrate how these processes may be used for inference; in particular we show how a Metropolis-within-Gibbs construction across the levels of the hierarchy can be used to derive sampling tools which are robust to the level of resolution used to represent the functions on a computer. For illustration, we consider the effect of ergodicity in some simple numerical examples.

Submitted 17 August, 2018; v1 submitted 30 November, 2017; originally announced November 2017.

arXiv:1706.03369 [pdf, other]

On the Sampling Problem for Kernel Quadrature

Authors: Francois-Xavier Briol, Chris J. Oates, Jon Cockayne, Wilson Ye Chen, Mark Girolami

Abstract: The standard Kernel Quadrature method for numerical integration with random point sets (also called Bayesian Monte Carlo) is known to converge in root mean square error at a rate determined by the ratio $s/d$, where $s$ and $d$ encode the smoothness and dimension of the integrand. However, an empirical investigation reveals that the rate constant $C$ is highly sensitive to the distribution of the random points. In contrast to standard Monte Carlo integration, for which optimal importance sampling is well-understood, the sampling distribution that minimises $C$ for Kernel Quadrature does not admit a closed form. This paper argues that the practical choice of sampling distribution is an important open problem. One solution is considered: a novel automatic approach based on adaptive tempering and sequential Monte Carlo. Empirical results demonstrate that a dramatic reduction in integration error of up to 4 orders of magnitude can be achieved with the proposed method.

Submitted 11 June, 2017; originally announced June 2017.

Comments: To appear at Thirty-fourth International Conference on Machine Learning (ICML 2017)

Journal ref: Proceedings of the 34th International Conference on Machine Learning, PMLR 70:586-595, 2017

arXiv:1705.02891 [pdf, other]

Geometry and Dynamics for Markov Chain Monte Carlo

Authors: Alessandro Barp, Francois-Xavier Briol, Anthony D. Kennedy, Mark Girolami

Abstract: Markov Chain Monte Carlo methods have revolutionised mathematical computation and enabled statistical inference within many previously intractable models. In this context, Hamiltonian dynamics have been proposed as an efficient way of building chains which can explore probability densities efficiently. The method emerges from physics and geometry and these links have been extensively studied by a series of authors through the last thirty years. However, there is currently a gap between the intuitions and knowledge of users of the methodology and our deep understanding of these theoretical foundations. The aim of this review is to provide a comprehensive introduction to the geometric tools used in Hamiltonian Monte Carlo at a level accessible to statisticians, machine learners and other users of the methodology with only a basic understanding of Monte Carlo methods. This will be complemented with some discussion of the most recent advances in the field which we believe will become increasingly relevant to applied scientists.

Submitted 8 May, 2017; originally announced May 2017.

Comments: Submitted to "Annual Review of Statistics and Its Applications"

arXiv:1702.03673 [pdf, other]

doi 10.1137/17M1139357

Bayesian Probabilistic Numerical Methods

Authors: Jon Cockayne, Chris Oates, Tim Sullivan, Mark Girolami

Abstract: The emergent field of probabilistic numerics has thus far lacked clear statistical principles. This paper establishes Bayesian probabilistic numerical methods as those which can be cast as solutions to certain inverse problems within the Bayesian framework. This allows us to establish general conditions under which Bayesian probabilistic numerical methods are well-defined, encompassing both non-linear and non-Gaussian models. For general computation, a numerical approximation scheme is proposed and its asymptotic convergence established. The theoretical development is then extended to pipelines of computation, wherein probabilistic numerical methods are composed to solve more challenging numerical tasks. The contribution highlights an important research frontier at the interface of numerical analysis and uncertainty quantification, with a challenging industrial application presented.

Submitted 7 July, 2017; v1 submitted 13 February, 2017; originally announced February 2017.

Journal ref: SIAM Review 61(4):756--789, 2019

arXiv:1701.04006 [pdf, other]

doi 10.1063/1.4985359

Probabilistic Numerical Methods for PDE-constrained Bayesian Inverse Problems

Authors: Jon Cockayne, Chris Oates, Tim Sullivan, Mark Girolami

Abstract: This paper develops meshless methods for probabilistically describing discretisation error in the numerical solution of partial differential equations. This construction enables the solution of Bayesian inverse problems while accounting for the impact of the discretisation of the forward problem. In particular, this drives statistical inferences to be more conservative in the presence of significant solver error. Theoretical results are presented describing rates of convergence for the posteriors in both the forward and inverse problems. This method is tested on a challenging inverse problem with a nonlinear forward model.

Submitted 15 January, 2017; originally announced January 2017.

arXiv:1612.02989 [pdf, other]

Hyperpriors for Matérn fields with applications in Bayesian inversion

Authors: Lassi Roininen, Mark Girolami, Sari Lasanen, Markku Markkanen

Abstract: We introduce non-stationary Matérn field priors with stochastic partial differential equations, and construct correlation length-scaling with hyperpriors. We model both the hyperprior and the Matérn prior as continuous-parameter random fields. As hypermodels, we use Cauchy and Gaussian random fields, which we map suitably to a desired correlation length-scaling range. For computations, we discretise the models with finite difference methods. We consider the convergence of the discretised prior and posterior to the discretisation limit. We apply the developed methodology to certain interpolation and numerical differentiation problems, and show numerically that we can perform Bayesian inversion which promotes competing constraints of smoothness and edge-preservation. For computing the conditional mean estimator of the posterior distribution, we use a combination of Gibbs and Metropolis-within-Gibbs sampling algorithms.

Submitted 9 December, 2016; originally announced December 2016.

arXiv:1605.07811 [pdf, other]

Probabilistic Numerical Methods for Partial Differential Equations and Bayesian Inverse Problems

Authors: Jon Cockayne, Chris Oates, Tim Sullivan, Mark Girolami

Abstract: This paper develops a probabilistic numerical method for solution of partial differential equations (PDEs) and studies application of that method to PDE-constrained inverse problems. This approach enables the solution of challenging inverse problems whilst accounting, in a statistically principled way, for the impact of discretisation error due to numerical solution of the PDE. In particular, the approach confers robustness to failure of the numerical PDE solver, with statistical inferences driven to be more conservative in the presence of substantial discretisation error. Going further, the problem of choosing a PDE solver is cast as a problem in the Bayesian design of experiments, where the aim is to minimise the impact of solver error on statistical inferences; here the challenge of non-linear PDEs is also considered. The method is applied to parameter inference problems in which discretisation error is non-negligible and must be accounted for in order to reach conclusions that are statistically valid.

Submitted 11 July, 2017; v1 submitted 25 May, 2016; originally announced May 2016.

arXiv:1603.03220 [pdf, other]

Convergence Rates for a Class of Estimators Based on Stein's Method

Authors: Chris J. Oates, Jon Cockayne, François-Xavier Briol, Mark Girolami

Abstract: Gradient information on the sampling distribution can be used to reduce the variance of Monte Carlo estimators via Stein's method. An important application is that of estimating an expectation of a test function along the sample path of a Markov chain, where gradient information enables convergence rate improvement at the cost of a linear system which must be solved. The contribution of this paper is to establish theoretical bounds on convergence rates for a class of estimators based on Stein's method. Our analysis accounts for (i) the degree of smoothness of the sampling distribution and test function, (ii) the dimension of the state space, and (iii) the case of non-independent samples arising from a Markov chain. These results provide insight into the rapid convergence of gradient-based estimators observed for low-dimensional problems, as well as clarifying a curse-of-dimension that appears inherent to such methods.

Submitted 27 December, 2017; v1 submitted 10 March, 2016; originally announced March 2016.

Comments: To appear in Bernoulli, 2018

arXiv:1512.00933 [pdf, other]

Probabilistic Integration: A Role in Statistical Computation?

Authors: François-Xavier Briol, Chris. J. Oates, Mark Girolami, Michael A. Osborne, Dino Sejdinovic

Abstract: A research frontier has emerged in scientific computation, wherein numerical error is regarded as a source of epistemic uncertainty that can be modelled. This raises several statistical challenges, including the design of statistical methods that enable the coherent propagation of probabilities through a (possibly deterministic) computational work-flow. This paper examines the case for probabilistic numerical methods in routine statistical computation. Our focus is on numerical integration, where a probabilistic integrator is equipped with a full distribution over its output that reflects the presence of an unknown numerical error. Our main technical contribution is to establish, for the first time, rates of posterior contraction for these methods. These show that probabilistic integrators can in principle enjoy the "best of both worlds", leveraging the sampling efficiency of Monte Carlo methods whilst providing a principled route to assess the impact of numerical error on scientific conclusions. Several substantial applications are provided for illustration and critical evaluation, including examples from statistical modelling, computer graphics and a computer model for an oil reservoir.

Submitted 18 October, 2017; v1 submitted 2 December, 2015; originally announced December 2015.

Comments: Several improvements suggested by reviewers, including additional experiments on uncertainty quantification properties. Change of title: previously "Probabilistic Integration: A Role for Statisticians in Numerical Analysis?"

arXiv:1506.01326 [pdf, other]

doi 10.1098/rspa.2015.0142

Probabilistic Numerics and Uncertainty in Computations

Authors: Philipp Hennig, Michael A Osborne, Mark Girolami

Abstract: We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such uncertainties, arising from the loss of precision induced by numerical calculation with limited time or hardware, are important for much contemporary science and industry. Within applications such as climate science and astrophysics, the need to make decisions on the basis of computations with large and complex data has led to a renewed focus on the management of numerical uncertainty. We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. We then show that the probabilistic view suggests new algorithms that can flexibly be adapted to suit application specifics, while delivering improved empirical performance. We provide concrete illustrations of the benefits of probabilistic numeric algorithms on real scientific problems from astrometry and astronomical imaging, while highlighting open problems with these new algorithms. Finally, we describe how probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (e.g. both numerical optimisers and differential equation solvers), potentially allowing the diagnosis (and control) of error sources in computations.

Submitted 3 June, 2015; originally announced June 2015.

Comments: Author Generated Postprint. 17 pages, 4 Figures, 1 Table

arXiv:1411.6669 [pdf, ps, other]

Optimizing The Integrator Step Size for Hamiltonian Monte Carlo

Authors: M. J. Betancourt, Simon Byrne, Mark Girolami

Abstract: Hamiltonian Monte Carlo can provide powerful inference in complex statistical problems, but ultimately its performance is sensitive to various tuning parameters. In this paper we use the underlying geometry of Hamiltonian Monte Carlo to construct a universal optimization criterion for tuning the step size of the symplectic integrator crucial to any implementation of the algorithm as well as diagnostics to monitor for any signs of invalidity. An immediate outcome of this result is that the suggested target average acceptance probability of 0.651 can be relaxed to $0.6 \lesssim a \lesssim 0.9$ with larger values more robust in practice.

Submitted 2 February, 2015; v1 submitted 24 November, 2014; originally announced November 2014.

Comments: 36 pages, 5 figures
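
Illustrative note on the entry above (arXiv:1411.6669): the abstract refers to the average acceptance probability of Hamiltonian Monte Carlo as a function of the integrator step size. The following is a minimal sketch of how that statistic can be measured, not the paper's optimization criterion. Everything in it is an assumption made for illustration: the toy target (a one-dimensional standard normal), the function names `leapfrog` and `mean_acceptance`, and the default values for the number of leapfrog steps, iterations and seed.

```python
import math
import random


def leapfrog(q, p, eps, n_steps, grad_u):
    """Integrate Hamiltonian dynamics with the leapfrog scheme (unit mass)."""
    p -= 0.5 * eps * grad_u(q)            # initial half step for momentum
    for step in range(n_steps):
        q += eps * p                      # full step for position
        if step < n_steps - 1:
            p -= eps * grad_u(q)          # full momentum step between position updates
    p -= 0.5 * eps * grad_u(q)            # final half step for momentum
    return q, p


def mean_acceptance(eps, n_steps=10, n_iter=2000, seed=0):
    """Run HMC on a 1-D standard normal (toy target, an assumption of this sketch)
    and return the average Metropolis acceptance probability."""
    rng = random.Random(seed)

    def u(q):                             # potential energy: -log density up to a constant
        return 0.5 * q * q

    def grad_u(q):
        return q

    q, total = 0.0, 0.0
    for _ in range(n_iter):
        p = rng.gauss(0.0, 1.0)           # resample the momentum
        h_old = u(q) + 0.5 * p * p
        q_new, p_new = leapfrog(q, p, eps, n_steps, grad_u)
        h_new = u(q_new) + 0.5 * p_new * p_new
        # min(1, exp(h_old - h_new)), written to avoid overflow
        accept_prob = math.exp(min(0.0, h_old - h_new))
        total += accept_prob
        if rng.random() < accept_prob:    # Metropolis correction
            q = q_new
    return total / n_iter


if __name__ == "__main__":
    for eps in (0.1, 0.5, 1.0, 1.5):
        print(f"step size {eps:.1f}: mean acceptance {mean_acceptance(eps):.3f}")
```

In practice one might adjust the step size until the measured average falls in the range quoted in the abstract, $0.6 \lesssim a \lesssim 0.9$; how best to do that for a given model is outside the scope of this sketch.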

arXiv:1407.1517 [pdf, other]

doi 10.1088/0266-5611/30/11/114014

Solving Large-Scale PDE-constrained Bayesian Inverse Problems with Riemann Manifold Hamiltonian Monte Carlo

Authors: Tan Bui-Thanh, Mark Girolami

Abstract: We consider the Riemann manifold Hamiltonian Monte Carlo (RMHMC) method for solving statistical inverse problems governed by partial differential equations (PDEs). The power of the RMHMC method is that it exploits the geometric structure induced by the PDE constraints of the underlying inverse problem. Consequently, each RMHMC posterior sample is almost independent from the others providing statistically efficient Markov chain simulation. We reduce the cost of forming the Fisher information matrix by using a low rank approximation via a randomized singular value decomposition technique. This is efficient since a small number of Hessian-vector products are required. The Hessian-vector product in turn requires only two extra PDE solves using the adjoint technique. The results suggest RMHMC as a highly efficient simulation scheme for sampling from PDE induced posterior measures.

Submitted 6 July, 2014; originally announced July 2014.

Comments: To Appear in IoP Inverse Problems

arXiv:1309.2983 [pdf, ps, other]

doi 10.1016/j.spl.2014.04.002

Langevin diffusions and the Metropolis-adjusted Langevin algorithm

Authors: Tatiana Xifara, Chris Sherlock, Samuel Livingstone, Simon Byrne, Mark Girolami

Abstract: We provide a clarification of the description of Langevin diffusions on Riemannian manifolds and of the measure underlying the invariant density. As a result we propose a new position-dependent Metropolis-adjusted Langevin algorithm (MALA) based upon a Langevin diffusion in $\mathbb{R}^d$ which has the required invariant density with respect to Lebesgue measure. We show that our diffusion and the diffusion upon which a previously-proposed position-dependent MALA is based are equivalent in some cases but are distinct in general. A simulation study illustrates the gain in efficiency provided by the new position-dependent MALA.

Submitted 11 September, 2013; originally announced September 2013.

Journal ref: Statistics & Probability Letters. Volume 91, August 2014, pages 14-19

arXiv:1211.3759 [pdf, other]

doi 10.1080/10618600.2014.902764

Lagrangian Dynamical Monte Carlo

Authors: Shiwei Lan, Vassilios Stathopoulos, Babak Shahbaba, Mark Girolami

Abstract: Hamiltonian Monte Carlo (HMC) improves the computational efficiency of the Metropolis algorithm by reducing its random walk behavior. Riemannian Manifold HMC (RMHMC) further improves HMC's performance by exploiting the geometric properties of the parameter space. However, the geometric integrator used for RMHMC involves implicit equations that require costly numerical analysis (e.g., fixed-point iteration). In some cases, the computational overhead for solving implicit equations undermines RMHMC's benefits. To avoid this problem, we propose an explicit geometric integrator that replaces the momentum variable in RMHMC by velocity. We show that the resulting transformation is equivalent to transforming Riemannian Hamilton dynamics to Lagrangian dynamics. Experimental results show that our method improves RMHMC's overall computational efficiency. All computer programs and data sets are available online (http://www.ics.uci.edu/~babaks/Site/Codes.html) in order to allow replications of the results reported in this paper.

Submitted 20 November, 2012; v1 submitted 15 November, 2012; originally announced November 2012.

Journal ref: Journal of Computational and Graphical Statistics, Volume 24, Issue 2, 2015

arXiv:0907.1100

Riemannian Manifold Hamiltonian Monte Carlo

Authors: Mark Girolami, Ben Calderhead, Siu A. Chin

Abstract: The paper proposes a Riemannian Manifold Hamiltonian Monte Carlo sampler to resolve the shortcomings of existing Monte Carlo algorithms when sampling from target densities that may be high dimensional and exhibit strong correlations. The method provides a fully automated adaptation mechanism that circumvents the costly pilot runs required to tune proposal densities for Metropolis-Hastings or indeed Hybrid Monte Carlo and Metropolis Adjusted Langevin Algorithms. This allows for highly efficient sampling even in very high dimensions where different scalings may be required for the transient and stationary phases of the Markov chain. The proposed method exploits the Riemannian structure of the parameter space of statistical models and thus automatically adapts to the local manifold structure at each step based on the metric tensor. A semi-explicit second order symplectic integrator for non-separable Hamiltonians is derived for simulating paths across this manifold which provides highly efficient convergence and exploration of the target density. The performance of the Riemannian Manifold Hamiltonian Monte Carlo method is assessed by performing posterior inference on logistic regression models, log-Gaussian Cox point processes, stochastic volatility models, and Bayesian estimation of parameter posteriors of dynamical systems described by nonlinear differential equations. Substantial improvements in the time normalised Effective Sample Size are reported when compared to alternative sampling approaches. Matlab code at http://www.dcs.gla.ac.uk/inference/rmhmc allows replication of all results.

Submitted 17 December, 2019; v1 submitted 6 July, 2009; originally announced July 2009.

Comments: This paper has been withdrawn by the posting author because he is no longer a co-author of this work

Showing 1–41 of 41 results for author: Girolami, M