
Operator-informed score matching for Markov diffusion models

Abstract: Diffusion models are typically trained using score matching, yet score matching is agnostic to the particular forward process that defines the model. This paper argues that Markov diffusion models enjoy an advantage over other types of diffusion model, as their associated operators can be exploited to improve the training process. In particular, (i) there exists an explicit formal solution to the forward process as a sequence of time-dependent kernel mean embeddings; and (ii) the derivation of score-matching and related estimators can be streamlined. Building upon (i), we propose Riemannian diffusion kernel smoothing, which ameliorates the need for neural score approximation, at least in the low-dimensional context; building upon (ii), we propose operator-informed score matching, a variance reduction technique that is straightforward to implement in both low- and high-dimensional diffusion modeling and is demonstrated to improve score matching in an empirical proof-of-concept.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Preprint; 19 pages, 5 figures
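For readers unfamiliar with the baseline that the score-matching entry above improves upon, the following is a minimal sketch of standard denoising score matching. It assumes an Ornstein-Uhlenbeck forward process $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$ (a common Markov diffusion, chosen here for illustration rather than taken from the paper), whose transition density is Gaussian with mean $e^{-t}x_0$ and variance $1 - e^{-2t}$. The helper names `ou_transition_sample`, `conditional_score` and `dsm_loss` are hypothetical, and the sketch does not implement operator-informed score matching itself.

```python
import numpy as np

def ou_transition_sample(x0, t, rng):
    """Sample X_t | X_0 = x0 for the (assumed) OU process dX = -X dt + sqrt(2) dW."""
    mean = np.exp(-t) * x0
    var = 1.0 - np.exp(-2.0 * t)
    return mean + np.sqrt(var) * rng.standard_normal(x0.shape)

def conditional_score(xt, x0, t):
    """Score of the Gaussian transition density, grad_x log p_t(x | x0)."""
    return -(xt - np.exp(-t) * x0) / (1.0 - np.exp(-2.0 * t))

def dsm_loss(score_fn, x0, t, rng):
    """Monte Carlo denoising score matching loss at a single time t.

    x0 has shape (n_samples, dim); score_fn(xt, t) must return the same shape.
    """
    xt = ou_transition_sample(x0, t, rng)
    target = conditional_score(xt, x0, t)
    return np.mean(np.sum((score_fn(xt, t) - target) ** 2, axis=-1))
```

With standard Gaussian data the OU marginals remain standard Gaussian, so the true marginal score is $-x$, and it should attain a lower loss than, say, the zero function; the high variance of the regression target at small $t$ is the kind of issue variance-reduction techniques address.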

arXiv:2405.13574 [pdf, other]

Reinforcement Learning for Adaptive MCMC

Authors: Congye Wang, Wilson Chen, Heishiro Kanagawa, Chris. J. Oates

Abstract: An informal observation, made by several authors, is that the adaptive design of a Markov transition kernel has the flavour of a reinforcement learning task. Yet, to date it has remained unclear how to actually exploit modern reinforcement learning technologies for adaptive MCMC. The aim of this paper is to set out a general framework, called Reinforcement Learning Metropolis--Hastings, that is theoretically supported and empirically validated. Our principal focus is on learning fast-mixing Metropolis--Hastings transition kernels, which we cast as deterministic policies and optimise via a policy gradient. Control of the learning rate provably ensures conditions for ergodicity are satisfied. The methodology is used to construct a gradient-free sampler that out-performs a popular gradient-free adaptive Metropolis--Hastings algorithm on $\approx 90 \%$ of tasks in the PosteriorDB benchmark.

Submitted 22 May, 2024; originally announced May 2024.
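The adaptive MCMC entry above concerns tuning Metropolis--Hastings transition kernels. As a point of reference only, here is a sketch of a plain gradient-free random-walk Metropolis--Hastings sampler with a fixed, hand-chosen proposal scale `step`; that scale is the kind of quantity an adaptive (or reinforcement-learning) scheme would tune. The function name `rwmh` is hypothetical, and this is not the paper's Reinforcement Learning Metropolis--Hastings algorithm.

```python
import numpy as np

def rwmh(log_target, x0, step, n_iter, rng):
    """Random-walk Metropolis-Hastings with an isotropic Gaussian proposal of scale `step`.

    Returns the chain of shape (n_iter, dim) and the empirical acceptance rate.
    """
    x = np.asarray(x0, dtype=float)
    logp = log_target(x)
    samples = np.empty((n_iter, x.size))
    n_accept = 0
    for i in range(n_iter):
        proposal = x + step * rng.standard_normal(x.size)
        logp_prop = log_target(proposal)
        # The proposal is symmetric, so the acceptance ratio reduces to the target ratio.
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp = proposal, logp_prop
            n_accept += 1
        samples[i] = x
    return samples, n_accept / n_iter
```

For example, `rwmh(lambda x: -0.5 * np.sum(x ** 2), np.zeros(2), 1.0, 20000, np.random.default_rng(0))` targets a two-dimensional standard Gaussian; only unnormalised log-density evaluations are needed, which is what makes the sampler gradient-free.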

arXiv:2401.14537 [pdf, other]

The CIPM list "Recommended values of standard frequencies": 2021 update

Authors: H S Margolis, G Panfilo, G Petit, C Oates, T Ido, S Bize

Abstract: This paper gives a detailed account of the analysis underpinning the 2021 update to the list of standard reference frequency values recommended by the International Committee for Weights and Measures (CIPM). This update focused on a subset of atomic transitions that are secondary representations of the second (SRS) or considered as potential SRS. As in previous updates in 2015 and 2017, methods for analysing over-determined data sets were applied to make optimum use of the worldwide body of published clock comparison data. To ensure that these methods were robust, three independent calculations were performed using two different algorithms. The 2021 update differed from previous updates in taking detailed account of correlations among the input data, a step shown to be important in deriving unbiased frequency values and avoiding underestimation of their uncertainties. It also differed in the procedures used to assess input data and to assign uncertainties to the recommended frequency values, with previous practice being adapted to produce a fully consistent output data set consisting of frequency ratio values as well as absolute frequencies. These changes are significant in the context of an anticipated redefinition of the second in terms of an optical transition or transitions, since optical frequency ratio measurements will be critical for verifying the international consistency of optical clocks prior to the redefinition. In the meantime, the reduced uncertainties for optical SRS resulting from this analysis significantly increase the weight that secondary frequency standards based on these transitions can have in the steering of International Atomic Time (TAI).

Submitted 25 January, 2024; originally announced January 2024.

Comments: 22 pages, 5 figures
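The frequency-values entry above mentions least-squares analysis of over-determined data sets with correlated inputs. A generic sketch of generalised least squares (GLS), which weights observations by the inverse of their covariance, is given below. It assumes a linear model $y = Ax + e$ with known $\mathrm{Cov}(e) = \Sigma$, giving $\hat{x} = (A^\top \Sigma^{-1} A)^{-1} A^\top \Sigma^{-1} y$; it is not the CIPM's actual procedure, and the function name `generalised_least_squares` is hypothetical.

```python
import numpy as np

def generalised_least_squares(A, y, cov):
    """GLS estimate of x in y = A x + e with Cov(e) = cov, plus the covariance of the estimate."""
    cov_inv_A = np.linalg.solve(cov, A)  # Sigma^{-1} A without forming the inverse explicitly
    cov_inv_y = np.linalg.solve(cov, y)  # Sigma^{-1} y
    normal = A.T @ cov_inv_A             # A^T Sigma^{-1} A
    x_hat = np.linalg.solve(normal, A.T @ cov_inv_y)
    x_cov = np.linalg.inv(normal)
    return x_hat, x_cov
```

As an illustrative assumption, take two unknown log frequency ratios $a$ and $b$ measured directly, plus a redundant measurement of $b - a$, so that `A = [[1, 0], [0, 1], [-1, 1]]`; off-diagonal entries in `cov` then encode correlations between those measurements, and ignoring them changes both $\hat{x}$ and its reported covariance.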

arXiv:2401.07562 [pdf, other]

Probabilistic Richardson Extrapolation

Authors: Chris. J. Oates, Toni Karvonen, Aretha L. Teckentrup, Marina Strocchi, Steven A. Niederer

Abstract: For over a century, extrapolation methods have provided a powerful tool to improve the convergence order of a numerical method. However, these tools are not well-suited to modern computer codes, where multiple continua are discretised and convergence orders are not easily analysed. To address this challenge we present a probabilistic perspective on Richardson extrapolation, a point of view that unifies classical extrapolation methods with modern multi-fidelity modelling, and handles uncertain convergence orders by allowing these to be statistically estimated. The approach is developed using Gaussian processes, leading to Gauss-Richardson Extrapolation (GRE). Conditions are established under which extrapolation using the conditional mean achieves a polynomial (or even an exponential) speed-up compared to the original numerical method. Further, the probabilistic formulation unlocks the possibility of experimental design, casting the selection of fidelities as a continuous optimisation problem which can then be (approximately) solved. A case-study involving a computational cardiac model demonstrates that practical gains in accuracy can be achieved using the GRE method.

Submitted 15 January, 2024; originally announced January 2024.
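Classical (non-probabilistic) Richardson extrapolation, which the entry above generalises, can be sketched as follows. It assumes the error of an approximation $A(h)$ behaves like $C h^p$ with a known order $p$, so that $\frac{r^p A(h/r) - A(h)}{r^p - 1}$ cancels the leading error term. The example uses a central difference for $\frac{d}{dx}e^x$ at $x = 0$ (where $p = 2$); the function names are hypothetical, and GRE instead models the approximations with a Gaussian process and can estimate $p$.

```python
import math

def central_difference(f, x, h):
    """Second-order accurate approximation of f'(x): error ~ C h^2."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def richardson(a_h, a_h_over_r, order, ratio=2.0):
    """Combine A(h) and A(h/ratio) to cancel an assumed leading error term C h^order."""
    factor = ratio ** order
    return (factor * a_h_over_r - a_h) / (factor - 1.0)
```

With $h = 0.1$ the central difference for $e^x$ at $0$ is off by roughly $h^2/6 \approx 1.7 \times 10^{-3}$, whereas `richardson(central_difference(math.exp, 0.0, 0.1), central_difference(math.exp, 0.0, 0.05), order=2)` is accurate to roughly $10^{-7}$, because the remaining error is $O(h^4)$.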

arXiv:2310.12391 [pdf, other]

Real-time Semiparametric Regression via Sequential Monte Carlo

Authors: Marianne Menictas, Chris J. Oates, Matt P. Wand

Abstract: We develop and describe online algorithms for performing real-time semiparametric regression analyses. Earlier work on this topic is in Luts, Broderick & Wand (J. Comput. Graph. Statist., 2014) where online mean field variational Bayes was employed. In this article we instead develop sequential Monte Carlo approaches to circumvent well-known inaccuracies inherent in variational approaches. Even though sequential Monte Carlo is not as fast as online mean field variational Bayes, it can be a viable alternative for applications where the data rate is not overly high. For Gaussian response semiparametric regression models our new algorithms share the online mean field variational Bayes property of only requiring updating and storage of sufficient statistics quantities of streaming data. In the non-Gaussian case accurate real-time semiparametric regression requires the full data to be kept in storage. The new algorithms allow for new options concerning accuracy/speed trade-offs for real-time semiparametric regression.

Submitted 18 October, 2023; originally announced October 2023.
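The property noted in the entry above, that Gaussian-response models need only updated sufficient statistics rather than the full data stream, can be illustrated with conjugate Bayesian linear regression. The sketch below assumes a known noise variance and an independent Gaussian prior on the coefficients, so the running sums $X^\top X$ and $X^\top y$ determine the posterior exactly; it is not the paper's sequential Monte Carlo algorithm, and the class name `StreamingGaussianRegression` is hypothetical.

```python
import numpy as np

class StreamingGaussianRegression:
    """Accumulates X^T X and X^T y so each data batch can be discarded after use.

    Assumes y = X beta + noise, noise ~ N(0, noise_var), prior beta ~ N(0, prior_var * I).
    """

    def __init__(self, n_features, noise_var=1.0, prior_var=100.0):
        self.xtx = np.zeros((n_features, n_features))
        self.xty = np.zeros(n_features)
        self.noise_var = noise_var
        self.prior_prec = np.eye(n_features) / prior_var

    def update(self, x_batch, y_batch):
        """Fold a batch of rows (n_batch, n_features) into the sufficient statistics."""
        self.xtx += x_batch.T @ x_batch
        self.xty += x_batch.T @ y_batch

    def posterior(self):
        """Exact Gaussian posterior mean and covariance of the coefficients."""
        prec = self.prior_prec + self.xtx / self.noise_var
        cov = np.linalg.inv(prec)
        mean = cov @ (self.xty / self.noise_var)
        return mean, cov
```

Memory stays fixed at $O(p^2)$ for $p$ features regardless of how many observations stream past, and feeding the data in batches gives the same posterior as processing it all at once.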

arXiv:2307.14141 [pdf]

doi 10.1088/1681-7575/ad17d2

Roadmap towards the redefinition of the second

Authors: N. Dimarcq, M. Gertsvolf, G. Mileti, S. Bize, C. W. Oates, E. Peik, D. Calonico, T. Ido, P. Tavella, F. Meynadier, G. Petit, G. Panfilo, J. Bartholomew, P. Defraigne, E. A. Donley, P. O. Hedekvist, I. Sesia, M. Wouters, P. Dube, F. Fang, F. Levi, J. Lodewyck, H. S. Margolis, D. Newell, S. Slyusarev, et al. (12 additional authors not shown)

Abstract: This paper outlines the roadmap towards the redefinition of the second, which was recently updated by the CCTF Task Force created by the CCTF in 2020. The main achievements and the open challenges related to the status of the optical frequency standards, their contribution to time scales and UTC, the possibility of their comparison and the knowledge of the Earth's gravitational potential at the necessary level of uncertainty are discussed. In addition, the mandatory criteria to be achieved before redefinition and their current fulfilment level, together with the redefinition options based on a single or on a set of transitions are described.

Submitted 26 July, 2023; originally announced July 2023.

Comments: 26 pages. This paper is based on the work of a CCTF Task Force on the roadmap to the redefinition of the second

Journal ref: Metrologia 61 012001 (2024)

arXiv:2305.10068 [pdf, other]

Stein $Π$-Importance Sampling

Authors: Congye Wang, Wilson Chen, Heishiro Kanagawa, Chris. J. Oates

Abstract: Stein discrepancies have emerged as a powerful tool for retrospective improvement of Markov chain Monte Carlo output. However, the question of how to design Markov chains that are well-suited to such post-processing has yet to be addressed. This paper studies Stein importance sampling, in which weights are assigned to the states visited by a $Π$-invariant Markov chain to obtain a consistent approximation of $P$, the intended target. Surprisingly, the optimal choice of $Π$ is not identical to the target $P$; we therefore propose an explicit construction for $Π$ based on a novel variational argument. Explicit conditions for convergence of Stein $Π$-Importance Sampling are established. For $\approx 70\%$ of tasks in the PosteriorDB benchmark, a significant improvement over the analogous post-processing of $P$-invariant Markov chains is reported.

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2303.04756 [pdf, other]

Meta-learning Control Variates: Variance Reduction with Limited Data

Authors: Zhuo Sun, Chris J. Oates, François-Xavier Briol

Abstract: Control variates can be a powerful tool to reduce the variance of Monte Carlo estimators, but constructing effective control variates can be challenging when the number of samples is small. In this paper, we show that when a large number of related integrals need to be computed, it is possible to leverage the similarity between these integration tasks to improve performance even when the number of samples per task is very small. Our approach, called meta learning CVs (Meta-CVs), can be used for up to hundreds or thousands of tasks. Our empirical assessment indicates that Meta-CVs can lead to significant variance reduction in such settings, and our theoretical analysis establishes general conditions under which Meta-CVs can be successfully trained.

Submitted 7 June, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: Accepted for publication (with an oral presentation) at UAI 2023
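For context on the entry above, a classical single-task control variate estimator is sketched below. It estimates $E[f(U)]$ using a function $g$ with known mean, subtracting $\beta\,(g - E[g])$ with $\beta$ estimated as $\widehat{\mathrm{Cov}}(f, g) / \widehat{\mathrm{Var}}(g)$. The example ($f(u) = e^u$, $g(u) = u$, $U \sim \mathrm{Uniform}(0, 1)$) is illustrative, the function name is hypothetical, and this is not the Meta-CV method, which shares information across many related tasks.

```python
import numpy as np

def control_variate_estimate(f_vals, g_vals, g_mean):
    """Control variate estimate of E[f] using g with known mean g_mean.

    Returns the estimate and the fitted coefficient beta = Cov(f, g) / Var(g).
    """
    cov = np.cov(f_vals, g_vals)  # 2x2 sample covariance matrix
    beta = cov[0, 1] / cov[1, 1]
    return np.mean(f_vals - beta * (g_vals - g_mean)), beta
```

For example, with `u = rng.uniform(size=1000)`, calling `control_variate_estimate(np.exp(u), u, 0.5)` targets $e - 1$; because $e^u$ and $u$ are highly correlated on $[0, 1]$, the corrected samples have far smaller variance than $e^u$ alone. The difficulty the paper addresses is that estimating $\beta$ (or a richer $g$) becomes unreliable when each task has very few samples.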

arXiv:2303.02759 [pdf, ps, other]

The Matérn Model: A Journey through Statistics, Numerical Analysis and Machine Learning

Authors: Emilio Porcu, Moreno Bevilacqua, Robert Schaback, Chris J. Oates

Abstract: The Matérn model has been a cornerstone of spatial statistics for more than half a century. More recently, the Matérn model has been central to disciplines as diverse as numerical analysis, approximation theory, computational statistics, machine learning, and probability theory. In this article we take a Matérn-based journey across these disciplines. First, we reflect on the importance of the Matérn model for estimation and prediction in spatial statistics, establishing also connections to other disciplines in which the Matérn model has been influential. Then, we position the Matérn model within the literature on big data and scalable computation: the SPDE approach, the Vecchia likelihood approximation, and recent applications in Bayesian computation are all discussed. Finally, we review recent developments, including flexible alternatives to the Matérn model, whose performance we compare in terms of estimation, prediction, screening effect, computation, and Sobolev regularity properties.

Submitted 5 March, 2023; originally announced March 2023.
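The Matérn covariance discussed in the entry above has closed forms when the smoothness $\nu$ is a half-integer. The sketch below uses the parameterisation with scaled distance $s = \sqrt{2\nu}\,r/\ell$, giving $\sigma^2 e^{-s}$ for $\nu = 1/2$, $\sigma^2 (1 + s) e^{-s}$ for $\nu = 3/2$ and $\sigma^2 (1 + s + s^2/3) e^{-s}$ for $\nu = 5/2$. Parameterisation conventions differ between references, the general-$\nu$ case (which involves a modified Bessel function) is deliberately omitted, and the function name `matern` is hypothetical.

```python
import numpy as np

def matern(r, lengthscale=1.0, variance=1.0, nu=1.5):
    """Matérn covariance at distance(s) r for half-integer nu in {0.5, 1.5, 2.5}."""
    s = np.sqrt(2.0 * nu) * np.abs(r) / lengthscale
    if nu == 0.5:
        poly = 1.0                     # exponential covariance
    elif nu == 1.5:
        poly = 1.0 + s
    elif nu == 2.5:
        poly = 1.0 + s + s ** 2 / 3.0
    else:
        raise ValueError("only nu in {0.5, 1.5, 2.5} is implemented in this sketch")
    return variance * poly * np.exp(-s)
```

A Gram matrix is obtained by passing pairwise differences, e.g. `matern(x[:, None] - x[None, :], lengthscale=0.3, nu=2.5)`; larger $\nu$ yields smoother sample paths ($\lceil \nu \rceil - 1$ times mean-square differentiable).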

arXiv:2211.09196 [pdf, ps, other]

Sobolev Spaces, Kernels and Discrepancies over Hyperspheres

Authors: Simon Hubbert, Emilio Porcu, Chris. J. Oates, Mark Girolami

Abstract: This work provides theoretical foundations for kernel methods in the hyperspherical context. Specifically, we characterise the native spaces (reproducing kernel Hilbert spaces) and the Sobolev spaces associated with kernels defined over hyperspheres. Our results have direct consequences for kernel cubature, determining the rate of convergence of the worst case error, and expanding the applicability of cubature algorithms based on Stein's method. We first introduce a suitable characterisation of Sobolev spaces on the $d$-dimensional hypersphere embedded in $(d+1)$-dimensional Euclidean space. Our characterisation is based on the Fourier--Schoenberg sequences associated with a given kernel. Such sequences are hard (if not impossible) to compute analytically on $d$-dimensional spheres, but often feasible over Hilbert spheres. We circumvent this problem by finding a projection operator that allows Fourier mapping from Hilbert spheres into finite-dimensional hyperspheres. We illustrate our findings through some parametric families of kernels.

Submitted 16 November, 2022; originally announced November 2022.

arXiv:2210.16357 [pdf, other]

Minimum Kernel Discrepancy Estimators

Authors: Chris. J. Oates

Abstract: For two decades, reproducing kernels and their associated discrepancies have facilitated elegant theoretical analyses in the setting of quasi Monte Carlo. These same tools are now receiving interest in statistics and related fields, as criteria that can be used to select an appropriate statistical model for a given dataset. The focus of this article is on minimum kernel discrepancy estimators, whose use in statistical applications is reviewed, and a general theoretical framework for establishing their asymptotic properties is presented.

Submitted 23 August, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: To appear in: A. Hinrichs, P. Kritzer, F. Pillichshammer (eds.). Monte Carlo and Quasi-Monte Carlo Methods 2022. Springer Verlag
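A minimum kernel discrepancy estimator can be sketched in a few lines. The sketch assumes a Gaussian kernel, the (biased) V-statistic estimate of the squared maximum mean discrepancy (MMD), and a toy location model $X = \theta + Z$ with $Z \sim N(0, 1)$ simulated from a fixed noise draw; the grid search and all function names are illustrative assumptions rather than the article's framework.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between 1D sample arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

def minimum_mmd_location(data, thetas, base_noise, bandwidth=1.0):
    """Grid-search minimum-MMD estimate of theta in the model X = theta + noise.

    Reusing the same base_noise for every theta (common random numbers) keeps
    the objective smooth in theta.
    """
    losses = [mmd2(data, theta + base_noise, bandwidth) for theta in thetas]
    return thetas[int(np.argmin(losses))]
```

Only the ability to simulate from the model is required, not its density, which is one reason such estimators are attractive for simulator-based models; in practice a gradient-based optimiser would replace the grid.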

arXiv:2208.03885 [pdf, other]

Statistical Properties of the Probabilistic Numeric Linear Solver BayesCG

Authors: Tim W. Reid, Ilse C. F. Ipsen, Jon Cockayne, Chris J. Oates

Abstract: We analyse the calibration of BayesCG under the Krylov prior, a probabilistic numeric extension of the Conjugate Gradient (CG) method for solving systems of linear equations with symmetric positive definite coefficient matrix. Calibration refers to the statistical quality of the posterior covariances produced by a solver. Since BayesCG is not calibrated in the strict existing notion, we propose instead two test statistics that are necessary but not sufficient for calibration: the Z-statistic and the new S-statistic. We show analytically and experimentally that under low-rank approximate Krylov posteriors, BayesCG exhibits desirable properties of a calibrated solver, is only slightly optimistic, and is computationally competitive with CG.

Submitted 7 August, 2022; originally announced August 2022.

Comments: 40 Pages

MSC Class: 65F10; 62F15; 15A06
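Since BayesCG extends the conjugate gradient method, a textbook CG solver for a symmetric positive definite system $Ax = b$ is sketched below for reference. It returns only a point estimate; the Krylov-prior posterior covariance analysed in the entry above is not implemented, and the function name `conjugate_gradient` is hypothetical.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A by conjugate gradients."""
    n = b.shape[0]
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # exact line search along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p  # next A-conjugate search direction
        rs_old = rs_new
    return x
```

In exact arithmetic CG terminates in at most $n$ iterations, and each iterate minimises the $A$-norm error over a growing Krylov subspace; that subspace structure is what the Krylov prior of BayesCG builds on.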

arXiv:2207.02636 [pdf, other]

Gradient-Free Kernel Stein Discrepancy

Authors: Matthew A Fisher, Chris. J Oates

Abstract: Stein discrepancies have emerged as a powerful statistical tool, being applied to fundamental statistical problems including parameter inference, goodness-of-fit testing, and sampling. The canonical Stein discrepancies require the derivatives of a statistical model to be computed, and in return provide theoretical guarantees of convergence detection and control. However, for complex statistical models, the stable numerical computation of derivatives can require bespoke algorithmic development and render Stein discrepancies impractical. This paper focuses on posterior approximation using Stein discrepancies, and introduces a collection of non-canonical Stein discrepancies that are gradient free, meaning that derivatives of the statistical model are not required. Sufficient conditions for convergence detection and control are established, and applications to sampling and variational inference are presented.

Submitted 18 July, 2022; v1 submitted 6 July, 2022; originally announced July 2022.
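To make concrete what the canonical Stein discrepancies in the entry above require, here is a sketch of the squared kernel Stein discrepancy in one dimension, assuming the Langevin Stein operator and a Gaussian base kernel $k$. The Stein kernel is $k_0(x, y) = \partial_x \partial_y k + \partial_x k\, s(y) + \partial_y k\, s(x) + k\, s(x) s(y)$, where $s(x) = \frac{d}{dx} \log p(x)$ is the score of the target; needing $s$ is precisely the derivative requirement that the gradient-free discrepancies avoid. The function name `ksd2` is hypothetical.

```python
import numpy as np

def ksd2(x, score, lengthscale=1.0):
    """V-statistic estimate of the squared kernel Stein discrepancy for 1D samples x.

    Uses the Langevin Stein operator with a Gaussian base kernel; `score` is the
    derivative of the target log-density (unnormalised densities suffice).
    """
    d = x[:, None] - x[None, :]            # x_i - x_j
    l2 = lengthscale ** 2
    k = np.exp(-d ** 2 / (2.0 * l2))
    dk_dx = -d / l2 * k                    # derivative in the first argument
    dk_dy = d / l2 * k                     # derivative in the second argument
    d2k_dxdy = (1.0 / l2 - d ** 2 / l2 ** 2) * k
    s = score(x)
    sx, sy = s[:, None], s[None, :]
    k0 = d2k_dxdy + dk_dx * sy + dk_dy * sx + k * sx * sy
    return k0.mean()
```

For a standard Gaussian target the score is `lambda x: -x`; samples drawn from that target give a value close to zero, while samples from a shifted distribution give a clearly larger value. The normalising constant of $p$ never appears, because the score is unaffected by it.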

arXiv:2206.08420 [pdf, ps, other]

Generalised Bayesian Inference for Discrete Intractable Likelihood

Authors: Takuo Matsubara, Jeremias Knoblauch, François-Xavier Briol, Chris. J. Oates

Abstract: Discrete state spaces represent a major computational challenge to statistical inference, since the computation of normalisation constants requires summation over large or possibly infinite sets, which can be impractical. This paper addresses this computational challenge through the development of a novel generalised Bayesian inference procedure suitable for discrete intractable likelihood. Inspired by recent methodological advances for continuous data, the main idea is to update beliefs about model parameters using a discrete Fisher divergence, in lieu of the problematic intractable likelihood. The result is a generalised posterior that can be sampled from using standard computational tools, such as Markov chain Monte Carlo, circumventing the intractable normalising constant. The statistical properties of the generalised posterior are analysed, with sufficient conditions for posterior consistency and asymptotic normality established. In addition, a novel and general approach to calibration of generalised posteriors is proposed. Applications are presented on lattice models for discrete spatial data and on multivariate models for count data, where in each case the methodology facilitates generalised Bayesian inference at low computational cost.

Submitted 1 September, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

arXiv:2203.09179 [pdf, ps, other]

Maximum Likelihood Estimation in Gaussian Process Regression is Ill-Posed

Authors: Toni Karvonen, Chris J. Oates

Abstract: Gaussian process regression underpins countless academic and industrial applications of machine learning and statistics, with maximum likelihood estimation routinely used to select appropriate parameters for the covariance kernel. However, it remains an open problem to establish the circumstances in which maximum likelihood estimation is well-posed, that is, when the predictions of the regression model are insensitive to small perturbations of the data. This article identifies scenarios where the maximum likelihood estimator fails to be well-posed, in that the predictive distributions are not Lipschitz in the data with respect to the Hellinger distance. These failure cases occur in the noiseless data setting, for any Gaussian process with a stationary covariance function whose lengthscale parameter is estimated using maximum likelihood. Although the failure of maximum likelihood estimation is part of Gaussian process folklore, these rigorous theoretical results appear to be the first of their kind. The implication of these negative results is that well-posedness may need to be assessed post-hoc, on a case-by-case basis, when maximum likelihood estimation is used to train a Gaussian process model.

Submitted 25 April, 2023; v1 submitted 17 March, 2022; originally announced March 2022.

Comments: An important work is missing from our literature review. Ben Salem, Bachoc, Roustant, Gamboa and Tomaso [Gaussian process-based dimension reduction for goal-oriented sequential design. SIAM/ASA Journal on Uncertainty Quantification, 7(4):1369-1397, 2019. See Proposition 4.3.] have proved parts of Theorems 2.3 and 5.3 using a technique that is more or less identical to the proof in Section 7.4

Journal ref: Journal of Machine Learning Research, 24(120):1-47, 2023
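The maximum likelihood procedure analysed in the entry above can be sketched for a zero-mean Gaussian process with a Gaussian (squared-exponential) kernel. The log marginal likelihood $-\tfrac{1}{2} y^\top K^{-1} y - \tfrac{1}{2} \log |K| - \tfrac{n}{2} \log 2\pi$ is evaluated via a Cholesky factor and maximised over a lengthscale grid. A small jitter term is added to $K$ for numerical stability, so the data are treated as nearly (not exactly) noiseless; this is an illustrative assumption, the function names are hypothetical, and the sketch does not reproduce the ill-posedness results.

```python
import numpy as np

def gaussian_gram(x, lengthscale, jitter=1e-6):
    """Squared-exponential Gram matrix on 1D inputs, plus a small diagonal jitter."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * lengthscale ** 2)) + jitter * np.eye(x.size)

def log_marginal_likelihood(x, y, lengthscale, jitter=1e-6):
    """Log marginal likelihood of y under a zero-mean GP, computed via Cholesky."""
    L = np.linalg.cholesky(gaussian_gram(x, lengthscale, jitter))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                  # equals 0.5 * log|K|
            - 0.5 * x.size * np.log(2.0 * np.pi))

def mle_lengthscale(x, y, grid, jitter=1e-6):
    """Grid-search maximum likelihood estimate of the lengthscale."""
    values = [log_marginal_likelihood(x, y, ell, jitter) for ell in grid]
    return grid[int(np.argmax(values))]
```

Comparing the estimate returned by `mle_lengthscale` before and after a tiny perturbation of `y` is one informal, post-hoc way to probe the sensitivity that the article studies rigorously.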

arXiv:2112.10817 [pdf]

Fundamental Physics with a State-of-the-Art Optical Clock in Space

Authors: Andrei Derevianko, Kurt Gibble, Leo Hollberg, Nathan R. Newbury, Chris Oates, Marianna S. Safronova, Laura C. Sinclair, Nan Yu

Abstract: Recent advances in optical atomic clocks and optical time transfer have enabled new possibilities in precision metrology for both tests of fundamental physics and timing applications. Here we describe a space mission concept that would place a state-of-the-art optical atomic clock in an eccentric orbit around Earth. A high stability laser link would connect the relative time, range, and velocity of the orbiting spacecraft to earthbound stations. The primary goal for this mission would be to test the gravitational redshift, a classical test of general relativity, with a sensitivity 30,000 times beyond current limits. Additional science objectives include other tests of relativity, enhanced searches for dark matter and drifts in fundamental constants, and establishing a high accuracy international time/geodesic reference.

Submitted 20 December, 2021; originally announced December 2021.

arXiv:2111.07745 [pdf, other]

doi 10.1080/00401706.2021.2009034

A Statistical Approach to Surface Metrology for 3D-Printed Stainless Steel

Authors: Chris. J. Oates, Wilfrid S. Kendall, Liam Fleming

Abstract: Surface metrology is the area of engineering concerned with the study of geometric variation in surfaces. This paper explores the potential for modern techniques from spatial statistics to act as generative models for geometric variation in 3D-printed stainless steel. The complex macro-scale geometries of 3D-printed components pose a challenge that is not present in traditional surface metrology, as the training data and test data need not be defined on the same manifold. Strikingly, a covariance function defined in terms of geodesic distance on one manifold can fail to satisfy positive-definiteness and thus fail to be a valid covariance function in the context of a different manifold; this hinders the use of standard techniques that aim to learn a covariance function from a training dataset. On the other hand, the associated covariance differential operators are locally defined. This paper proposes to perform inference for such differential operators, facilitating generalisation from the manifold of a training dataset to the manifold of a test dataset. The approach is assessed in the context of model selection and explored in detail in the context of a finite element model for 3D-printed stainless steel.

Submitted 3 October, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

Journal ref: Technometrics, 64(3):370-383, 2022

arXiv:2110.08072 [pdf, other]

GaussED: A Probabilistic Programming Language for Sequential Experimental Design

Authors: Matthew A. Fisher, Onur Teymur, Chris. J. Oates

Abstract: Sequential algorithms are popular for experimental design, enabling emulation, optimisation and inference to be efficiently performed. For most of these applications bespoke software has been developed, but the approach is general and many of the actual computations performed in such software are identical. Motivated by the diverse problems that can in principle be solved with common code, this paper presents GaussED, a simple probabilistic programming language coupled to a powerful experimental design engine, which together automate sequential experimental design for approximating a (possibly nonlinear) quantity of interest in Gaussian process models. Using a handful of commands, GaussED can be used to: solve linear partial differential equations, perform tomographic reconstruction from integral data and implement Bayesian optimisation with gradient data.

Submitted 15 October, 2021; originally announced October 2021.

arXiv:2109.06075 [pdf, other]

Minimum Discrepancy Methods in Uncertainty Quantification

Authors: Chris J. Oates

Abstract: The lectures were prepared for the École Thématique sur les Incertitudes en Calcul Scientifique (ETICS) in September 2021.

Submitted 13 September, 2021; originally announced September 2021.

arXiv:2106.13718 [pdf, other]

Black Box Probabilistic Numerics

Authors: Onur Teymur, Christopher N. Foley, Philip G. Breen, Toni Karvonen, Chris. J. Oates

Abstract: Probabilistic numerics casts numerical tasks, such as the numerical solution of differential equations, as inference problems to be solved. One approach is to model the unknown quantity of interest as a random variable, and to constrain this variable using data generated during the course of a traditional numerical method. However, data may be nonlinearly related to the quantity of interest, rendering the proper conditioning of random variables difficult and limiting the range of numerical tasks that can be addressed. Instead, this paper proposes to construct probabilistic numerical methods based only on the final output from a traditional method. A convergent sequence of approximations to the quantity of interest constitutes a dataset, from which the limiting quantity of interest can be extrapolated, in a probabilistic analogue of Richardson's deferred approach to the limit. This black box approach (1) massively expands the range of tasks to which probabilistic numerics can be applied, (2) inherits the features and performance of state-of-the-art numerical methods, and (3) enables provably higher orders of convergence to be achieved. Applications are presented for nonlinear ordinary and partial differential equations, as well as for eigenvalue problems, a setting for which no probabilistic numerical methods have yet been developed.

Submitted 28 October, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

Journal ref: Advances in Neural Information Processing Systems 34 (2021)
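For reference, the classical deterministic technique that the abstract builds a probabilistic analogue of can be sketched as follows. This is plain Richardson extrapolation, not the paper's method; it assumes the error expansion starts at a known order (here order 2, the composite trapezoidal rule) and that the step size is halved. The integrand and the function names are illustrative choices.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n panels; its error is O(h^2), h = (b - a) / n."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return h * total

def richardson(coarse, fine, ratio, order):
    """Combine two approximations whose step sizes differ by `ratio`, assuming the
    leading error term is proportional to h**order, to cancel that term."""
    factor = ratio ** order
    return (factor * fine - coarse) / (factor - 1)

exact = math.e - 1.0                         # integral of exp(x) over [0, 1]
coarse = trapezoid(math.exp, 0.0, 1.0, 8)    # step h
fine = trapezoid(math.exp, 0.0, 1.0, 16)     # step h / 2
extrapolated = richardson(coarse, fine, ratio=2, order=2)
print(abs(fine - exact), abs(extrapolated - exact))
```

Cancelling the $h^2$ term leaves an $O(h^4)$ error, so the extrapolated value is far more accurate than either input; the paper replaces this deterministic limit with a probabilistic one.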

arXiv:2105.03481

Stein's Method Meets Computational Statistics: A Review of Some Recent Developments

Authors: Andreas Anastasiou, Alessandro Barp, François-Xavier Briol, Bruno Ebner, Robert E. Gaunt, Fatemeh Ghaderinezhad, Jackson Gorham, Arthur Gretton, Christophe Ley, Qiang Liu, Lester Mackey, Chris. J. Oates, Gesine Reinert, Yvik Swan

Abstract: Stein's method compares probability distributions through the study of a class of linear operators called Stein operators. While mainly studied in probability and used to underpin theoretical statistics, Stein's method has led to significant advances in computational statistics in recent years. The goal of this survey is to bring together some of these recent developments and, in doing so, to stimulate further research into the successful field of Stein's method and statistics. The topics we discuss include tools to benchmark and compare sampling methods such as approximate Markov chain Monte Carlo, deterministic alternatives to sampling methods, control variate techniques, parameter estimation and goodness-of-fit testing.

Submitted 22 June, 2022; v1 submitted 7 May, 2021; originally announced May 2021.

Comments: Accepted for publication by "Statistical Science"

arXiv:2104.12587

doi 10.1007/s11222-021-10030-w

Bayesian Numerical Methods for Nonlinear Partial Differential Equations

Authors: Junyang Wang, Jon Cockayne, Oksana Chkrebtii, T. J. Sullivan, Chris. J. Oates

Abstract: The numerical solution of differential equations can be formulated as an inference problem to which formal statistical approaches can be applied. However, nonlinear partial differential equations (PDEs) pose substantial challenges from an inferential perspective, most notably the absence of an explicit conditioning formula. This paper extends earlier work on linear PDEs to a general class of initial value problems specified by nonlinear PDEs, motivated by problems for which evaluations of the right-hand-side, initial conditions, or boundary conditions of the PDE have a high computational cost. The proposed method can be viewed as exact Bayesian inference under an approximate likelihood, which is based on discretisation of the nonlinear differential operator. Proof-of-concept experimental results demonstrate that meaningful probabilistic uncertainty quantification for the unknown solution of the PDE can be performed, while controlling the number of times the right-hand-side, initial and boundary conditions are evaluated. A suitable prior model for the solution of the PDE is identified using novel theoretical analysis of the sample path properties of Matérn processes, which may be of independent interest.

Submitted 3 May, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

Journal ref: Stat. Comput. 31(5):no. 55, 20pp., 2021

arXiv:2104.07359

Robust Generalised Bayesian Inference for Intractable Likelihoods

Authors: Takuo Matsubara, Jeremias Knoblauch, François-Xavier Briol, Chris. J. Oates

Abstract: Generalised Bayesian inference updates prior beliefs using a loss function, rather than a likelihood, and can therefore be used to confer robustness against possible mis-specification of the likelihood. Here we consider generalised Bayesian inference with a Stein discrepancy as a loss function, motivated by applications in which the likelihood contains an intractable normalisation constant. In this context, the Stein discrepancy circumvents evaluation of the normalisation constant and produces generalised posteriors that are either closed form or accessible using standard Markov chain Monte Carlo. On a theoretical level, we show consistency, asymptotic normality, and bias-robustness of the generalised posterior, highlighting how these properties are impacted by the choice of Stein discrepancy. Then, we provide numerical experiments on a range of intractable distributions, including applications to kernel-based exponential family models and non-Gaussian graphical models.

Submitted 11 January, 2022; v1 submitted 15 April, 2021; originally announced April 2021.

arXiv:2103.16048

doi 10.1146/annurev-statistics-040220-091727

Post-Processing of MCMC

Authors: Leah F. South, Marina Riabiz, Onur Teymur, Chris. J. Oates

Abstract: Markov chain Monte Carlo (MCMC) is the engine of modern Bayesian statistics, being used to approximate the posterior and derived quantities of interest. Despite this, the issue of how the output from a Markov chain is post-processed and reported is often overlooked. Convergence diagnostics can be used to control bias via burn-in removal, but these do not account for (common) situations where a limited computational budget engenders a bias-variance trade-off. The aim of this article is to review state-of-the-art techniques for post-processing Markov chain output. Our review covers methods based on discrepancy minimisation, which directly address the bias-variance trade-off, as well as general-purpose control variate methods for approximating expected quantities of interest.

Submitted 6 September, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: Version 3 is the accepted version. When citing this paper, please use the following: South, LF, Riabiz, M, Teymur, O & Oates, CJ. 2022. Post-Processing of MCMC. Annual Review of Statistics and Its Application. 9: Submitted. DOI: 10.1146/annurev-statistics-040220-091727

arXiv:2012.12670

Testing whether a Learning Procedure is Calibrated

Authors: Jon Cockayne, Matthew M. Graham, Chris J. Oates, T. J. Sullivan, Onur Teymur

Abstract: A learning procedure takes as input a dataset and performs inference for the parameters $θ$ of a model that is assumed to have given rise to the dataset. Here we consider learning procedures whose output is a probability distribution, representing uncertainty about $θ$ after seeing the dataset. Bayesian inference is a prime example of such a procedure, but one can also construct other learning procedures that return distributional output. This paper studies conditions for a learning procedure to be considered calibrated, in the sense that the true data-generating parameters are plausible as samples from its distributional output. A learning procedure whose inferences and predictions are systematically over- or under-confident will fail to be calibrated. On the other hand, a learning procedure that is calibrated need not be statistically efficient. A hypothesis-testing framework is developed in order to assess, using simulation, whether a learning procedure is calibrated. Several vignettes are presented to illustrate different aspects of the framework.

Submitted 16 June, 2022; v1 submitted 23 December, 2020; originally announced December 2020.

arXiv:2012.12615

Probabilistic Iterative Methods for Linear Systems

Authors: Jon Cockayne, Ilse C. F. Ipsen, Chris J. Oates, Tim W. Reid

Abstract: This paper presents a probabilistic perspective on iterative methods for approximating the solution $\mathbf{x}_* \in \mathbb{R}^d$ of a nonsingular linear system $\mathbf{A} \mathbf{x}_* = \mathbf{b}$. In this approach, a standard iterative method on $\mathbb{R}^d$ is lifted to act on the space of probability distributions $\mathcal{P}(\mathbb{R}^d)$. Classically, an iterative method produces a sequence $\mathbf{x}_m$ of approximations that converge to $\mathbf{x}_*$. The output of the iterative methods proposed in this paper is, instead, a sequence of probability distributions $μ_m \in \mathcal{P}(\mathbb{R}^d)$. The distributional output both provides a "best guess" for $\mathbf{x}_*$, for example as the mean of $μ_m$, and also probabilistic uncertainty quantification for the value of $\mathbf{x}_*$ when it has not been exactly determined. Theoretical analysis is provided in the prototypical case of a stationary linear iterative method. In this setting we characterise both the rate of contraction of $μ_m$ to an atomic measure on $\mathbf{x}_*$ and the nature of the uncertainty quantification being provided. We conclude with an empirical illustration that highlights the insight into solution uncertainty that can be provided by probabilistic iterative methods.

Submitted 11 January, 2021; v1 submitted 23 December, 2020; originally announced December 2020.
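As a minimal sketch of what lifting a stationary iterative method to distributions can look like, the code below pushes a Gaussian distribution through Jacobi iterations $\mathbf{x} \mapsto \mathbf{G}\mathbf{x} + \mathbf{f}$. The choice of Jacobi, the Gaussian initial distribution and the name `lifted_jacobi` are assumptions made for illustration; the paper's construction and its calibration of the uncertainty may differ. Because each step is affine, the pushforward stays Gaussian, with mean $\mathbf{G}\mathbf{m} + \mathbf{f}$ and covariance $\mathbf{G}\mathbf{S}\mathbf{G}^\top$, so the covariance contracts towards an atom at $\mathbf{x}_*$ at a rate governed by the spectral radius of $\mathbf{G}$.

```python
import numpy as np

def lifted_jacobi(A, b, mean0, cov0, iterations):
    """Push N(mean0, cov0) through Jacobi iterations x <- G x + f (illustrative lifting).

    The affine map sends N(m, S) to N(G m + f, G S G^T), so only the mean and
    covariance need to be tracked."""
    D_inv = np.diag(1.0 / np.diag(A))
    G = np.eye(len(b)) - D_inv @ A   # Jacobi iteration matrix
    f = D_inv @ b
    mean, cov = mean0.copy(), cov0.copy()
    for _ in range(iterations):
        mean = G @ mean + f
        cov = G @ cov @ G.T
    return mean, cov

# Strictly diagonally dominant matrix, so the Jacobi iteration converges.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
mean, cov = lifted_jacobi(A, b, np.zeros(3), np.eye(3), iterations=60)
print(mean, np.trace(cov))
```

The mean reproduces the classical Jacobi iterates, while the shrinking covariance plays the role of the uncertainty about $\mathbf{x}_*$ that remains after a given number of iterations.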

arXiv:2010.11779

Measure Transport with Kernel Stein Discrepancy

Authors: Matthew A. Fisher, Tui Nolan, Matthew M. Graham, Dennis Prangle, Chris J. Oates

Abstract: Measure transport underpins several recent algorithms for posterior approximation in the Bayesian context, wherein a transport map is sought to minimise the Kullback--Leibler divergence (KLD) from the posterior to the approximation. The KLD is a strong mode of convergence, requiring absolute continuity of measures and placing restrictions on which transport maps can be permitted. Here we propose to minimise a kernel Stein discrepancy (KSD) instead, requiring only that the set of transport maps is dense in an $L^2$ sense and demonstrating how this condition can be validated. The consistency of the associated posterior approximation is established and empirical results suggest that KSD is a competitive and more flexible alternative to KLD for measure transport.

Submitted 26 October, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

arXiv:2010.08488

The Ridgelet Prior: A Covariance Function Approach to Prior Specification for Bayesian Neural Networks

Authors: Takuo Matsubara, Chris J. Oates, François-Xavier Briol

Abstract: Bayesian neural networks attempt to combine the strong predictive performance of neural networks with formal quantification of uncertainty associated with the predictive output in the Bayesian framework. However, it remains unclear how to endow the parameters of the network with a prior distribution that is meaningful when lifted into the output space of the network. A possible solution is proposed that enables the user to posit an appropriate Gaussian process covariance function for the task at hand. Our approach constructs a prior distribution for the parameters of the network, called a ridgelet prior, that approximates the posited Gaussian process in the output space of the network. In contrast to existing work on the connection between neural networks and Gaussian processes, our analysis is non-asymptotic, with finite sample-size error bounds provided. This establishes the universality property that a Bayesian neural network can approximate any Gaussian process whose covariance function is sufficiently regular. Our experimental assessment is limited to a proof-of-concept, where we demonstrate that the ridgelet prior can out-perform an unstructured prior on regression problems for which a suitable Gaussian process prior can be provided.

Submitted 11 January, 2022; v1 submitted 16 October, 2020; originally announced October 2020.

arXiv:2010.07064

Optimal quantisation of probability measures using maximum mean discrepancy

Authors: Onur Teymur, Jackson Gorham, Marina Riabiz, Chris. J. Oates

Abstract: Several researchers have proposed minimisation of maximum mean discrepancy (MMD) as a method to quantise probability measures, i.e., to approximate a target distribution by a representative point set. We consider sequential algorithms that greedily minimise MMD over a discrete candidate set. We propose a novel non-myopic algorithm and, in order to both improve statistical efficiency and reduce computational cost, we investigate a variant that applies this technique to a mini-batch of the candidate set at each iteration. When the candidate points are sampled from the target, the consistency of these new algorithms, and of their mini-batch variants, is established. We demonstrate the algorithms on a range of important computational problems, including optimisation of nodes in Bayesian cubature and the thinning of Markov chain output.

Submitted 12 February, 2021; v1 submitted 14 October, 2020; originally announced October 2020.
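The sketch below implements the basic myopic greedy scheme that the abstract takes as its starting point; it does not reproduce the proposed non-myopic or mini-batch algorithms. It assumes a Gaussian kernel with unit length-scale and treats the empirical distribution of the candidate set as the target. At step $n$ it adds the candidate $x$ minimising the squared MMD of the enlarged set, using the fact that $n^2\,\mathrm{MMD}^2$ depends on $x$ only through $k(x,x) + 2\sum_{i<n} k(s_i, x) - 2n\,m(x)$, where $m(x)$ is the mean of $k(x, \cdot)$ over the candidates.

```python
import numpy as np

def gaussian_kernel(X, Y, lengthscale=1.0):
    """Gaussian kernel matrix exp(-|x - y|^2 / (2 lengthscale^2)) between rows of X and Y."""
    sq = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * lengthscale ** 2))

def greedy_mmd(candidates, n_select, lengthscale=1.0):
    """Myopic greedy MMD minimisation against the empirical distribution of
    `candidates` (assumed target). A candidate may be selected more than once."""
    K = gaussian_kernel(candidates, candidates, lengthscale)
    target_mean = K.mean(axis=1)            # m(x): mean embedding of the target at each candidate
    diag = np.diag(K)
    running = np.zeros(len(candidates))     # sum_i k(s_i, x) over points selected so far
    selected = []
    for n in range(1, n_select + 1):
        # x-dependent part of n^2 * MMD^2 after adding x as the n-th point
        score = diag + 2.0 * running - 2.0 * n * target_mean
        j = int(np.argmin(score))
        selected.append(j)
        running += K[j]
    return np.array(selected)

def mmd_squared(subset, candidates, lengthscale=1.0):
    """Squared MMD between uniform weights on `subset` and uniform weights on `candidates`."""
    return (gaussian_kernel(subset, subset, lengthscale).mean()
            - 2.0 * gaussian_kernel(subset, candidates, lengthscale).mean()
            + gaussian_kernel(candidates, candidates, lengthscale).mean())

rng = np.random.default_rng(0)
candidates = rng.standard_normal((500, 2))
idx = greedy_mmd(candidates, n_select=20)
print(mmd_squared(candidates[idx], candidates), mmd_squared(candidates[:20], candidates))
```

Comparing against the first 20 candidates (an i.i.d. subsample) shows the gain from greedy selection; the cost per step is linear in the candidate set size once the kernel matrix is available.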

arXiv:2008.03225

BayesCG As An Uncertainty Aware Version of CG

Authors: Tim W. Reid, Ilse C. F. Ipsen, Jon Cockayne, Chris J. Oates

Abstract: The Bayesian Conjugate Gradient method (BayesCG) is a probabilistic generalization of the Conjugate Gradient method (CG) for solving linear systems with real symmetric positive definite coefficient matrices. Our CG-based implementation of BayesCG under a structure-exploiting prior distribution represents an 'uncertainty-aware' version of CG. Its output consists of CG iterates and posterior covariances that can be propagated to subsequent computations. The covariances have low rank and are maintained in factored form. This allows easy generation of accurate samples to probe uncertainty in downstream computations. Numerical experiments confirm the effectiveness of the low-rank posterior covariances.

Submitted 3 October, 2022; v1 submitted 7 August, 2020; originally announced August 2020.

Comments: 34 Pages including supplementary material (main paper is 23 pages, supplement is 11 pages). Computer codes are available at https://github.com/treid5/ProbNumCG_Supp

MSC Class: 65F10; 62F15; 65F50; 15A06; 15A10

arXiv:2006.07487

Scalable Control Variates for Monte Carlo Methods via Stochastic Optimization

Authors: Shi**g Si, Chris. J. Oates, Andrew B. Duncan, Lawrence Carin, François-Xavier Briol

Abstract: Control variates are a well-established tool to reduce the variance of Monte Carlo estimators. However, for large-scale problems including high-dimensional and large-sample settings, their advantages can be outweighed by a substantial computational cost. This paper considers control variates based on Stein operators, presenting a framework that encompasses and generalizes existing approaches that use polynomials, kernels and neural networks. A learning strategy based on minimising a variational objective through stochastic optimization is proposed, leading to scalable and effective control variates. Novel theoretical results are presented to provide insight into the variance reduction that can be achieved, and an empirical assessment, including applications to Bayesian inference, is provided in support.

Submitted 21 July, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: Accepted by MCQMC2020

MSC Class: G.3
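A minimal sketch of a Stein-operator control variate, assuming a one-dimensional standard normal target. It uses the polynomial functions $\phi(x) = 1$ and $\phi(x) = x$, whose Langevin Stein transforms $\phi' + \phi\,\nabla \log p$ have zero mean under the target, and fits their coefficients by ordinary least squares; the paper instead proposes stochastic optimisation of a variational objective over richer classes (polynomials, kernels, neural networks), which is not reproduced here. The integrand $f(x) = x + x^2$ is contrived so that it lies exactly in the span of the features, which makes the variance reduction total.

```python
import numpy as np

def score(x):
    """grad log p for the assumed target p = N(0, 1)."""
    return -x

def stein_features(x):
    """Langevin Stein operator L[phi] = phi' + phi * score for phi(x) = 1 and phi(x) = x.
    Both columns have mean zero under the target."""
    return np.column_stack([score(x),               # L[1] = score(x)
                            1.0 + x * score(x)])    # L[x] = 1 + x * score(x)

def control_variate_estimate(f_values, x):
    """Least-squares fit of f on [1, L[phi_1], L[phi_2]]; because the Stein features
    integrate to zero, the fitted intercept estimates E[f(X)]."""
    design = np.column_stack([np.ones_like(x), stein_features(x)])
    coef, *_ = np.linalg.lstsq(design, f_values, rcond=None)
    return coef[0]

rng = np.random.default_rng(1)
x = rng.standard_normal(200)
f_values = x + x ** 2           # E[f(X)] = 1 when X ~ N(0, 1)
plain = f_values.mean()         # ordinary Monte Carlo estimate
cv = control_variate_estimate(f_values, x)
print(plain, cv)
```

Only the score $\nabla \log p$ is needed, never the normalising constant, which is what makes Stein-based control variates usable with unnormalised posteriors.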

arXiv:2005.03952

Optimal Thinning of MCMC Output

Authors: Marina Riabiz, Wilson Chen, Jon Cockayne, Pawel Swietach, Steven A. Niederer, Lester Mackey, Chris. J. Oates

Abstract: The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced. Typically a number of the initial states are attributed to "burn in" and removed, whilst the remainder of the chain is "thinned" if compression is also required. In this paper we consider the problem of retrospectively selecting a subset of states, of fixed cardinality, from the sample path such that the approximation provided by their empirical distribution is close to optimal. A novel method is proposed, based on greedy minimisation of a kernel Stein discrepancy, that is suitable for problems where heavy compression is required. Theoretical results guarantee consistency of the method and its effectiveness is demonstrated in the challenging context of parameter inference for ordinary differential equations. Software is available in the Stein Thinning package in Python, R and MATLAB.

Submitted 11 January, 2022; v1 submitted 8 May, 2020; originally announced May 2020.

Comments: To appear in the Journal of the Royal Statistical Society, Series B, 2022+
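Greedy minimisation of a kernel Stein discrepancy can be sketched as follows, assuming a Langevin Stein kernel built on the inverse multiquadric (IMQ) base kernel $(c^2 + \|x - y\|^2)^\beta$ with $c = 1$, $\beta = -1/2$, and a standard normal target whose score is $-x$. At each step the state minimising $k_P(x, x)/2 + \sum_j k_P(x_{\pi_j}, x)$ over previously selected indices $\pi_j$ is added. This is an illustrative re-implementation under those assumptions; it does not reproduce the API, kernel defaults or preconditioning of the Stein Thinning package.

```python
import numpy as np

def imq_stein_kernel(X, S, c=1.0, beta=-0.5):
    """Langevin Stein kernel k_P from the IMQ base kernel (c^2 + |x - y|^2)^beta.
    X: (n, d) states; S: (n, d) scores grad log p evaluated at those states."""
    d = X.shape[1]
    r = X[:, None, :] - X[None, :, :]            # r[i, j] = x_i - x_j
    sq = np.sum(r ** 2, axis=-1)
    u = c ** 2 + sq
    score_diff = S[None, :, :] - S[:, None, :]   # s(x_j) - s(x_i)
    # First two terms: trace of the mixed Hessian of the base kernel;
    # third: gradient-score cross terms; fourth: k(x_i, x_j) * s(x_i).s(x_j).
    return (-2.0 * beta * d * u ** (beta - 1)
            - 4.0 * beta * (beta - 1) * sq * u ** (beta - 2)
            + 2.0 * beta * u ** (beta - 1) * np.sum(r * score_diff, axis=-1)
            + u ** beta * (S @ S.T))

def stein_thinning(X, S, m):
    """Greedily select m indices (repeats allowed) minimising
    k_P(x, x) / 2 + sum over already-selected j of k_P(x_j, x)."""
    Kp = imq_stein_kernel(X, S)
    objective = 0.5 * np.diag(Kp)      # a new array, so in-place updates are safe
    selected = []
    for _ in range(m):
        j = int(np.argmin(objective))
        selected.append(j)
        objective += Kp[j]
    return np.array(selected)

def ksd(X, S):
    """Kernel Stein discrepancy of the uniform empirical distribution on X."""
    return np.sqrt(imq_stein_kernel(X, S).mean())

# Hypothetical "chain": 50 unconverged states near 5, followed by draws from N(0, 1).
rng = np.random.default_rng(2)
chain = np.vstack([5.0 + 0.1 * rng.standard_normal((50, 1)),
                   rng.standard_normal((500, 1))])
scores = -chain                        # score of the assumed N(0, 1) target
idx = stein_thinning(chain, scores, m=20)
print(ksd(chain[idx], scores[idx]), ksd(chain[:20], scores[:20]))
```

Because the criterion uses only the score, no burn-in length needs to be specified: unconverged states are penalised automatically through large values of $k_P(x, x)$.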

arXiv:2004.12654

Integration in reproducing kernel Hilbert spaces of Gaussian kernels

Authors: Toni Karvonen, Chris J. Oates, Mark Girolami

Abstract: The Gaussian kernel plays a central role in machine learning, uncertainty quantification and scattered data approximation, but has received relatively little attention from a numerical analysis standpoint. The basic problem of finding an algorithm for efficient numerical integration of functions reproduced by Gaussian kernels has not been fully solved. In this article we construct two classes of algorithms that use $N$ evaluations to integrate $d$-variate functions reproduced by Gaussian kernels and prove the exponential or super-algebraic decay of their worst-case errors. In contrast to earlier work, no constraints are placed on the length-scale parameter of the Gaussian kernel. The first class of algorithms is obtained via an appropriate scaling of the classical Gauss-Hermite rules. For these algorithms we derive lower and upper bounds on the worst-case error of the forms $\exp(-c_1 N^{1/d}) N^{1/(4d)}$ and $\exp(-c_2 N^{1/d}) N^{-1/(4d)}$, respectively, for positive constants $c_1 > c_2$. The second class of algorithms we construct is more flexible and uses worst-case optimal weights for points that may be taken as a nested sequence. For these algorithms we derive upper bounds of the form $\exp(-c_3 N^{1/(2d)})$ for a positive constant $c_3$.

Submitted 31 March, 2021; v1 submitted 27 April, 2020; originally announced April 2020.

Comments: Accepted for publication in Mathematics of Computation
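For reference, the classical Gauss-Hermite rule mapped to a $N(0, \sigma^2)$ measure can be sketched with NumPy's `hermgauss`, which returns nodes and weights for the weight function $\exp(-t^2)$; the substitution $x = \sqrt{2}\,\sigma t$ and division by $\sqrt{\pi}$ convert it to the normal density. The example applies the rule to a translate of a Gaussian kernel, a function reproduced by that kernel, whose integral is available in closed form. The length-scale-dependent scaling introduced in the paper is not reproduced; $\sigma$, the kernel length-scale and the shift are illustrative values.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gauss_hermite_normal(f, n, sigma=1.0):
    """Approximate E[f(X)], X ~ N(0, sigma^2), with the classical n-point Gauss-Hermite rule."""
    t, w = hermgauss(n)                      # nodes/weights for the weight exp(-t^2)
    return np.sum(w * f(np.sqrt(2.0) * sigma * t)) / np.sqrt(np.pi)

sigma, ell, shift = 0.7, 1.0, 0.3            # illustrative values

def kernel_translate(x):
    """Gaussian kernel k(x, shift) with length-scale ell."""
    return np.exp(-(x - shift) ** 2 / (2.0 * ell ** 2))

approx = gauss_hermite_normal(kernel_translate, n=30, sigma=sigma)
# Closed form: integral of N(x; 0, sigma^2) * k(x, shift) dx
exact = ell / np.sqrt(ell ** 2 + sigma ** 2) * np.exp(-shift ** 2 / (2.0 * (ell ** 2 + sigma ** 2)))
print(approx, exact)
```

An $n$-point rule integrates polynomials of degree up to $2n - 1$ exactly; for kernels with small length-scale relative to $\sigma$ the unscaled rule converges more slowly, which is the regime the paper's scaling is designed to handle.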

arXiv:2002.00033

Semi-Exact Control Functionals From Sard's Method

Authors: Leah F. South, Toni Karvonen, Chris Nemeth, Mark Girolami, Chris. J. Oates

Abstract: The numerical approximation of posterior expected quantities of interest is considered. A novel control variate technique is proposed for post-processing of Markov chain Monte Carlo output, based both on Stein's method and an approach to numerical integration due to Sard. The resulting estimators are proven to be polynomially exact in the Gaussian context, while empirical results suggest the estimators approximate a Gaussian cubature method near the Bernstein-von-Mises limit. The main theoretical result establishes a bias-correction property in settings where the Markov chain does not leave the posterior invariant. Empirical results are presented across a selection of Bayesian inference tasks. All methods used in this paper are available in the R package ZVCV.

Submitted 6 May, 2021; v1 submitted 31 January, 2020; originally announced February 2020.

Comments: There are 17 pages of main text. This revision provides an extended version of Theorem 1

arXiv:2001.10965

Maximum likelihood estimation and uncertainty quantification for Gaussian process approximation of deterministic functions

Authors: Toni Karvonen, George Wynne, Filip Tronarp, Chris J. Oates, Simo Särkkä

Abstract: Despite the ubiquity of the Gaussian process regression model, few theoretical results are available that account for the fact that parameters of the covariance kernel typically need to be estimated from the dataset. This article provides one of the first theoretical analyses in the context of Gaussian process regression with a noiseless dataset. Specifically, we consider the scenario where the scale parameter of a Sobolev kernel (such as a Matérn kernel) is estimated by maximum likelihood. We show that the maximum likelihood estimation of the scale parameter alone provides significant adaptation against misspecification of the Gaussian process model in the sense that the model can become "slowly" overconfident at worst, regardless of the difference between the smoothness of the data-generating function and that expected by the model. The analysis is based on a combination of techniques from nonparametric regression and scattered data interpolation. Empirical results are provided in support of the theoretical findings.

Submitted 11 May, 2020; v1 submitted 29 January, 2020; originally announced January 2020.
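When the covariance is parametrised as $\sigma^2 K$ for a fixed kernel matrix $K$, the log-likelihood of noiseless data $y$ is $-\tfrac{1}{2}\left(y^\top K^{-1} y / \sigma^2 + n \log \sigma^2 + \log\det K + n \log 2\pi\right)$, and setting its derivative in $\sigma^2$ to zero gives the closed-form maximiser $\hat\sigma^2 = y^\top K^{-1} y / n$. The sketch below checks this numerically for an assumed Matérn-3/2 kernel with length-scale 0.5 and an arbitrary test function; it only illustrates the estimator being analysed, not the paper's theory.

```python
import numpy as np

def matern32(X, Y, lengthscale=0.5):
    """Matérn-3/2 kernel on 1-D inputs with unit amplitude (assumed kernel choice)."""
    r = np.abs(X[:, None] - Y[None, :]) / lengthscale
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def log_likelihood(scale, K, y):
    """Gaussian log-likelihood of noiseless data y under covariance scale * K."""
    n = len(y)
    _, logdet = np.linalg.slogdet(scale * K)
    return -0.5 * (y @ np.linalg.solve(scale * K, y) + logdet + n * np.log(2.0 * np.pi))

def mle_scale(K, y):
    """Closed-form maximum likelihood estimate of the scale: y^T K^{-1} y / n."""
    return y @ np.linalg.solve(K, y) / len(y)

X = np.linspace(0.0, 1.0, 15)
y = np.sin(6.0 * X)            # an arbitrary deterministic "data-generating" function
K = matern32(X, X)
scale_hat = mle_scale(K, y)
print(scale_hat, log_likelihood(scale_hat, K, y))
```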

arXiv:1912.10496

Discussion of "Unbiased Markov chain Monte Carlo with couplings" by Pierre E. Jacob, John O'Leary and Yves F. Atchadé

Authors: Leah F. South, Chris Nemeth, Chris J. Oates

Abstract: This is a contribution for the discussion on "Unbiased Markov chain Monte Carlo with couplings" by Pierre E. Jacob, John O'Leary and Yves F. Atchadé to appear in the Journal of the Royal Statistical Society Series B.

Submitted 20 January, 2020; v1 submitted 22 December, 2019; originally announced December 2019.

Comments: This comment includes an appendix which was not included in the printed JRSS B discussion. Version 2 has been shortened slightly to meet word limit requirements

arXiv:1910.02995

A Locally Adaptive Bayesian Cubature Method

Authors: Matthew A Fisher, Chris J Oates, Catherine Powell, Aretha Teckentrup

Abstract: Bayesian cubature (BC) is a popular inferential perspective on the cubature of expensive integrands, wherein the integrand is emulated using a stochastic process model. Several approaches have been put forward to encode sequential adaptation (i.e. dependence on previous integrand evaluations) into this framework. However, these proposals have been limited to either estimating the parameters of a stationary covariance model or focusing computational resources on regions where large values are taken by the integrand. In contrast, many classical adaptive cubature methods focus computational resources on spatial regions in which local error estimates are largest. The contributions of this work are three-fold: First, we present a theoretical result that suggests there does not exist a direct Bayesian analogue of the classical adaptive trapezoidal method. Then we put forward a novel BC method that has empirically similar behaviour to the adaptive trapezoidal method. Finally we present evidence that the novel method provides improved cubature performance, relative to standard BC, in a detailed empirical assessment.

Submitted 7 October, 2019; originally announced October 2019.
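For comparison with the Bayesian method, the classical adaptive trapezoidal rule mentioned in the abstract can be sketched as a recursion that compares one- and two-panel trapezoid estimates on each interval and refines only where the resulting local error estimate exceeds that interval's share of the tolerance. Halving the tolerance at each level and the factor 1/3 in the error estimate are common textbook conventions, assumed here rather than taken from the paper; the integrand $\sqrt{x}$ is chosen because its singular derivative at 0 forces local refinement.

```python
import math

def adaptive_trapezoid(f, a, b, tol=1e-6, max_depth=50):
    """Classical recursive adaptive trapezoidal rule (illustrative conventions)."""
    def recurse(lo, hi, f_lo, f_hi, whole, local_tol, depth):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        left = 0.5 * (mid - lo) * (f_lo + f_mid)
        right = 0.5 * (hi - mid) * (f_mid + f_hi)
        # Halving h cuts the trapezoid error by roughly 4, so (left + right - whole) / 3
        # serves as a local estimate of the error in left + right.
        error = (left + right - whole) / 3.0
        if depth >= max_depth or abs(error) <= local_tol:
            return left + right
        # Refine only this interval, splitting the tolerance between the halves.
        return (recurse(lo, mid, f_lo, f_mid, left, local_tol / 2.0, depth + 1)
                + recurse(mid, hi, f_mid, f_hi, right, local_tol / 2.0, depth + 1))

    f_a, f_b = f(a), f(b)
    return recurse(a, b, f_a, f_b, 0.5 * (b - a) * (f_a + f_b), tol, 0)

result = adaptive_trapezoid(math.sqrt, 0.0, 1.0, tol=1e-6)
print(result, 2.0 / 3.0)
```

The evaluation points cluster near $x = 0$, where the local error estimates are largest; this error-driven placement is the behaviour the paper argues has no direct Bayesian analogue.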

arXiv:1907.06774

doi 10.1103/PhysRevLett.123.073202

Ramsey-Bordé matter-wave interferometry for laser frequency stabilization at $10^{-16}$ frequency instability and below

Authors: Judith Olson, Richard W. Fox, Tara M. Fortier, Todd F. Sheerin, Roger C. Brown, Holly Leopardi, Richard E. Stoner, Chris W. Oates, Andrew D. Ludlow

Abstract: We demonstrate Ramsey-Bordé (RB) atom interferometry for high performance laser stabilization with fractional frequency instability $<2 \times 10^{-16}$ for timescales between 10 s and 1000 s. The RB spectroscopy laser interrogates two counterpropagating $^{40}$Ca beams on the $^1$S$_0$ -- $^3$P$_1$ transition at 657 nm, yielding 1.6 kHz linewidth interference fringes. Fluorescence detection of the excited state population is performed on the (4s4p) $^3$P$_1$ -- (4p$^2$) $^3$P$_0$ transition at 431 nm. Minimal thermal shielding and no vibration isolation are used. These stability results surpass performance from other thermal atomic or molecular systems by one to two orders of magnitude, and further improvements look feasible.

Submitted 15 July, 2019; originally announced July 2019.

arXiv:1907.03867

doi 10.1140/epjd/e2019-100324-6

SAGE: A Proposal for a Space Atomic Gravity Explorer

Authors: G. M. Tino, A. Bassi, G. Bianco, K. Bongs, P. Bouyer, L. Cacciapuoti, S. Capozziello, X. Chen, M. L. Chiofalo, A. Derevianko, W. Ertmer, N. Gaaloul, P. Gill, P. W. Graham, J. M. Hogan, L. Iess, M. A. Kasevich, H. Katori, C. Klempt, X. Lu, L. -S. Ma, H. Müller, N. R. Newbury, C. Oates, A. Peters , et al. (22 additional authors not shown)

Abstract: The proposed mission "Space Atomic Gravity Explorer" (SAGE) has the scientific objective to investigate gravitational waves, dark matter, and other fundamental aspects of gravity as well as the connection between gravitational physics and quantum physics using new quantum sensors, namely, optical atomic clocks and atom interferometers based on ultracold strontium atoms.

Submitted 18 November, 2019; v1 submitted 8 July, 2019; originally announced July 2019.

Comments: Published in Eur. Phys. J. D 73 (2019) 228 in the Topical Issue Quantum Technologies for Gravitational Physics, Guest editors Tanja Mehlstaubler, Yanbei Chen, Guglielmo M. Tino and Hsien-Chi Yeh

Journal ref: Eur. Phys. J. D 73, 228 (2019)

arXiv:1906.10564 [pdf, other]

A Role for Symmetry in the Bayesian Solution of Differential Equations

Authors: Junyang Wang, Jon Cockayne, Chris J. Oates

Abstract: The interpretation of numerical methods, such as finite difference methods for differential equations, as point estimators suggests that formal uncertainty quantification can also be performed in this context. Competing statistical paradigms can be considered and Bayesian probabilistic numerical methods (PNMs) are obtained when Bayesian statistical principles are deployed. Bayesian PNMs have the appealing property of being closed under composition, such that uncertainty due to different sources of discretisation in a numerical method can be jointly modelled and rigorously propagated. Despite recent attention, no exact Bayesian PNM for the numerical solution of ordinary differential equations (ODEs) has been proposed. This raises the fundamental question of whether exact Bayesian methods for (in general nonlinear) ODEs even exist. The purpose of this paper is to provide a positive answer for a limited class of ODEs. To this end, we work at a foundational level, where a novel Bayesian PNM is proposed as a proof-of-concept. Our proposal is a synthesis of classical Lie group methods, to exploit underlying symmetries in the gradient field, and non-parametric regression in a transformed solution space for the ODE. The procedure is presented in detail for first- and second-order ODEs and relies on a certain strong technical condition being satisfied, namely the existence of a solvable Lie algebra. Numerical illustrations are provided.

Submitted 23 September, 2019; v1 submitted 24 June, 2019; originally announced June 2019.

Comments: A summary version of this manuscript appeared in the proceedings of MaxEnt 2018 in London, UK; see arXiv:1805.07109

arXiv:1905.03673 [pdf, other]

Stein Point Markov Chain Monte Carlo

Authors: Wilson Ye Chen, Alessandro Barp, François-Xavier Briol, Jackson Gorham, Mark Girolami, Lester Mackey, Chris. J. Oates

Abstract: An important task in machine learning and statistics is the approximation of a probability measure by an empirical measure supported on a discrete point set. Stein Points are a class of algorithms for this task, which proceed by sequentially minimising a Stein discrepancy between the empirical measure and the target and, hence, require the solution of a non-convex optimisation problem to obtain each new point. This paper removes the need to solve this optimisation problem by, instead, selecting each new point based on a Markov chain sample path. This significantly reduces the computational cost of Stein Points and leads to a suite of algorithms that are straightforward to implement. The new algorithms are illustrated on a set of challenging Bayesian inference problems, and rigorous theoretical guarantees of consistency are established.

Submitted 14 September, 2020; v1 submitted 9 May, 2019; originally announced May 2019.

Comments: Minor bug fixed in Theorem 4 (result unchanged)

Journal ref: ICML 2019

arXiv:1902.06858 [pdf]

doi 10.1103/PhysRevApplied.12.044069

Optical-Clock-Based Time Scale

Authors: Jian Yao, Jeff A. Sherman, Tara Fortier, Holly Leopardi, Thomas Parker, William McGrew, Xiaogang Zhang, Daniele Nicolodi, Robert Fasano, Stefan Schäffer, Kyle Beloy, Joshua Savory, Stefania Romisch, Chris Oates, Scott Diddams, Andrew Ludlow, Judah Levine

Abstract: A time scale is a procedure for accurately and continuously marking the passage of time. It is exemplified by Coordinated Universal Time (UTC) and provides the backbone for critical navigation tools such as the Global Positioning System (GPS). Present time scales employ microwave atomic clocks, whose attributes can be combined and averaged in a manner such that the composite is more stable, accurate, and reliable than the output of any individual clock. Over the past decade, clocks operating at optical frequencies, which are orders of magnitude more stable than any microwave clock, have been introduced. However, in spite of their great potential, these optical clocks cannot be operated continuously, which makes their use in a time scale problematic. In this paper, we report the development of a hybrid microwave-optical time scale, which only requires the optical clock to run intermittently while relying upon the ensemble of microwave clocks to serve as the flywheel oscillator. The benefit of using a clock ensemble as the flywheel oscillator, instead of a single clock, can be understood through the Dick-effect limit. This time scale demonstrates for the first time sub-nanosecond accuracy for a few months, attaining a fractional frequency uncertainty of $1.45\times10^{-16}$ at 30 days and reaching the $10^{-17}$ decade at 50 days, with respect to UTC. This time scale significantly improves the accuracy in timekeeping and could change the existing time-scale architectures.

Submitted 10 April, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

Journal ref: Phys. Rev. Applied 12, 044069 (2019)
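As a back-of-envelope check on the "sub-nanosecond" claim in the time-scale abstract above, one can assume (a simplification, not the paper's error analysis) that a constant fractional frequency offset $y$ accumulates a time error of roughly $y\,\tau$ over an interval $\tau$. The function name below is hypothetical and introduced only for illustration.

```python
# Rough sketch, assuming a constant fractional frequency offset y accumulates
# a time error of about y * tau over an interval tau. This ignores noise
# processes and is only an order-of-magnitude consistency check.
SECONDS_PER_DAY = 86_400


def accumulated_time_error(fractional_freq: float, days: float) -> float:
    """Return the time error in seconds from a constant fractional offset."""
    return fractional_freq * days * SECONDS_PER_DAY


# 1.45e-16 at 30 days is the figure quoted in the abstract.
err_ns = accumulated_time_error(1.45e-16, 30) * 1e9
print(f"{err_ns:.2f} ns")
```

Under this crude model the 30-day figure corresponds to a few tenths of a nanosecond, which is consistent in order of magnitude with the sub-nanosecond accuracy reported.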

arXiv:1901.04457 [pdf, other]

doi 10.1007/s11222-019-09902-z

A Modern Retrospective on Probabilistic Numerics

Authors: C. J. Oates, T. J. Sullivan

Abstract: This article attempts to place the emergence of probabilistic numerics as a mathematical-statistical research field within its historical context and to explore how its gradual development can be related both to applications and to a modern formal treatment. We highlight in particular the parallel contributions of Sul'din and Larkin in the 1960s and how their pioneering early ideas have reached a degree of maturity in the intervening period, mediated by paradigms such as average-case analysis and information-based complexity. We provide a subjective assessment of the state of research in probabilistic numerics and highlight some difficulties to be addressed by future works.

Submitted 5 May, 2019; v1 submitted 14 January, 2019; originally announced January 2019.

Comments: 23 pages, 2 figures

MSC Class: 62-03; 65-03; 01A60; 01A65; 01A67

Journal ref: Statistics and Computing 29(6):1335--1351, 2019

arXiv:1901.04326 [pdf, other]

doi 10.1515/9783110635461-005

Optimality Criteria for Probabilistic Numerical Methods

Authors: Chris. J. Oates, Jon Cockayne, Dennis Prangle, T. J. Sullivan, Mark Girolami

Abstract: It is well understood that Bayesian decision theory and average case analysis are essentially identical. However, if one is interested in performing uncertainty quantification for a numerical task, it can be argued that standard approaches from the decision-theoretic framework are neither appropriate nor sufficient. Instead, we consider a particular optimality criterion from Bayesian experimental design and study its implied optimal information in the numerical context. This information is demonstrated to differ, in general, from the information that would be used in an average-case-optimal numerical method. The explicit connection to Bayesian experimental design suggests several distinct regimes in which optimal probabilistic numerical methods can be developed.

Submitted 10 May, 2019; v1 submitted 14 January, 2019; originally announced January 2019.

Comments: Prepared for the proceedings of the RICAM workshop on Multivariate Algorithms and Information-Based Complexity, November 2018

Journal ref: Multivariate Algorithms and Information-Based Complexity, Radon Series on Computational and Applied Mathematics 27:65--88, 2020

arXiv:1811.11474 [pdf, other]

Improved Calibration of Numerical Integration Error in Sigma-Point Filters

Authors: Jakub Prüher, Toni Karvonen, Chris J. Oates, Ondřej Straka, Simo Särkkä

Abstract: The sigma-point filters, such as the UKF, which exploit numerical quadrature to obtain an additional order of accuracy in the moment transformation step, are popular alternatives to the ubiquitous EKF. The classical quadrature rules used in the sigma-point filters are motivated via polynomial approximation of the integrand; however, in the applied context these assumptions cannot always be justified. As a result, quadrature error can introduce bias into estimated moments, for which there is no compensatory mechanism in the classical sigma-point filters. This can lead in turn to estimates and predictions that are poorly calibrated. In this article, we investigate, in the context of sigma-point filters, the Bayes-Sard quadrature method, which enables uncertainty due to quadrature error to be formalised within a probabilistic model. Our first contribution is to derive the well-known classical quadratures as special cases of the Bayes-Sard quadrature method. Then a general-purpose moment transform is developed and utilised in the design of novel sigma-point filters, so that uncertainty due to quadrature error is explicitly quantified. Numerical experiments on a challenging tracking example with misspecified initial conditions show that the additional uncertainty quantification built into our method leads to better-calibrated state estimates with improved RMSE.

Submitted 22 February, 2020; v1 submitted 28 November, 2018; originally announced November 2018.

Comments: 13 pages, 4 figures
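The sigma-point abstract above notes that classical quadrature rules in these filters are motivated by polynomial approximation. A minimal sketch of one such classical rule, the unscented transform with the original Julier-Uhlmann weights (this is the classical rule, not the Bayes-Sard method proposed in the paper), shows what that means in practice: moments are exact for linear maps and, in one dimension, for quadratics. The function name and the default `kappa=1.0` are illustrative assumptions.

```python
import numpy as np


def unscented_transform(g, m, P, kappa=1.0):
    """Classical unscented transform of N(m, P) through g (Julier-Uhlmann weights).

    Requires n + kappa > 0. Returns the quadrature estimates of the mean and
    covariance of g(x) for x ~ N(m, P).
    """
    m = np.atleast_1d(np.asarray(m, dtype=float))
    P = np.atleast_2d(np.asarray(P, dtype=float))
    n = m.size
    # Columns of L are the sigma-point offsets: (n + kappa) P = L L^T.
    L = np.linalg.cholesky((n + kappa) * P)
    sigma = np.vstack([m, m + L.T, m - L.T])  # shape (2n + 1, n)
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    # Propagate each sigma point through the nonlinearity.
    y = np.array([np.atleast_1d(g(s)) for s in sigma])
    mean = w @ y
    diff = y - mean
    cov = (w[:, None] * diff).T @ diff
    return mean, cov


# Usage: in one dimension a quadratic is propagated exactly, E[x^2] = m^2 + P.
mean, cov = unscented_transform(lambda x: x**2, 0.5, 0.2)
print(mean)
```

When the integrand is not well approximated by a low-degree polynomial, nothing in this rule signals the resulting quadrature error, which is the gap the abstract describes.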

arXiv:1811.10275 [pdf, ps, other]

Rejoinder for "Probabilistic Integration: A Role in Statistical Computation?"

Authors: Francois-Xavier Briol, Chris J. Oates, Mark Girolami, Michael A. Osborne, Dino Sejdinovic

Abstract: This article is the rejoinder for the paper "Probabilistic Integration: A Role in Statistical Computation?" to appear in Statistical Science with discussion. We would first like to thank the reviewers and many of our colleagues who helped shape this paper, the editor for selecting our paper for discussion, and of course all of the discussants for their thoughtful, insightful and constructive comments. In this rejoinder, we respond to some of the points raised by the discussants and comment further on the fundamental questions underlying the paper: (i) Should Bayesian ideas be used in numerical analysis? (ii) If so, what role should such approaches have in statistical computation?

Submitted 26 November, 2018; originally announced November 2018.

Comments: Accepted to Statistical Science

arXiv:1811.05885 [pdf, other]

Towards Adoption of an Optical Second: Verifying Optical Clocks at the SI Limit

Authors: W. F. McGrew, X. Zhang, H. Leopardi, R. J. Fasano, D. Nicolodi, K. Beloy, J. Yao, J. A. Sherman, S. A. Schäffer, J. Savory, R. C. Brown, S. Römisch, C. W. Oates, T. E. Parker, T. M. Fortier, A. D. Ludlow

Abstract: The pursuit of ever more precise measures of time and frequency is likely to lead to the eventual redefinition of the second in terms of an optical atomic transition. To ensure continuity with the current definition, based on a microwave transition between hyperfine levels in ground-state $^{133}$Cs, it is necessary to measure the absolute frequency of candidate standards, which is done by comparing against a primary cesium reference. A key verification of this process can be achieved by performing a loop closure: comparing frequency ratios derived from absolute frequency measurements against ratios determined from direct optical comparisons. We measure the $^1$S$_0 \rightarrow {}^3$P$_0$ transition of $^{171}$Yb by comparing the clock frequency to an international frequency standard with the aid of a maser ensemble serving as a flywheel oscillator. Our measurements consist of 79 separate runs spanning eight months, and we determine the absolute frequency to be 518 295 836 590 863.71(11) Hz, the uncertainty of which is equivalent to a fractional frequency of $2.1\times10^{-16}$. This absolute frequency measurement, the most accurate reported for any transition, allows us to close the Cs-Yb-Sr-Cs frequency measurement loop at an uncertainty of $<3\times10^{-16}$, limited by the current realization of the SI second. We use these measurements to tighten the constraints on variation of the electron-to-proton mass ratio, $\mu = m_e/m_p$.
Incorporating our measurements with the entire record of Yb and Sr absolute frequency measurements, we infer a coupling coefficient to gravitational potential of $k_\mu = (-1.9 \pm 9.4)\times10^{-7}$ and a drift with respect to time of $\dot{\mu}/\mu = (5.3 \pm 6.5)\times10^{-17}\,\mathrm{yr}^{-1}$.

Submitted 14 November, 2018; originally announced November 2018.
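The fractional uncertainty quoted in the Yb abstract above follows directly from the concise notation "(11)", which denotes an uncertainty of 0.11 Hz on the last two digits. A short arithmetic sketch, using only the numbers in the abstract plus the defined speed of light, reproduces the $2.1\times10^{-16}$ figure and the corresponding vacuum wavelength.

```python
# Numbers taken from the abstract; C is the SI-defined speed of light.
C = 299_792_458.0               # m/s
f_abs = 518_295_836_590_863.71  # Hz, absolute frequency of the Yb transition
sigma = 0.11                    # Hz, the "(11)" uncertainty on the last digits

# Fractional uncertainty = absolute uncertainty / frequency.
frac = sigma / f_abs
# Vacuum wavelength of the transition, in nanometres.
wavelength_nm = C / f_abs * 1e9

print(f"fractional uncertainty: {frac:.2e}")
print(f"wavelength: {wavelength_nm:.1f} nm")
```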

arXiv:1811.05073 [pdf, other]

Regularized Zero-Variance Control Variates

Authors: Leah F. South, Chris J. Oates, Antonietta Mira, Christopher Drovandi

Abstract: Zero-variance control variates (ZV-CV) are a post-processing method to reduce the variance of Monte Carlo estimators of expectations using the derivatives of the log target. Once the derivatives are available, the only additional computational effort lies in solving a linear regression problem. Significant variance reductions have been achieved with this method in low-dimensional examples, but the number of covariates in the regression rapidly increases with the dimension of the target. In this paper, we present compelling empirical evidence that the use of penalized regression techniques in the selection of high-dimensional control variates provides performance gains over the classical least squares method. Another type of regularization based on using subsets of derivatives, or a priori regularization as we refer to it in this paper, is also proposed to reduce computational and storage requirements. Several examples showing the utility and limitations of regularized ZV-CV for Bayesian inference are given. The methods proposed in this paper are accessible through the R package ZVCV.

Submitted 15 August, 2022; v1 submitted 12 November, 2018; originally announced November 2018.

Comments: Accepted to Bayesian Analysis. ArXiv version is 20 pages plus 21 pages of appendices
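The ZV-CV abstract above reduces the method to a linear regression once the derivatives of the log target are available. A minimal sketch of the first-order case, under stated assumptions: the covariates are the score components (first-order ZV-CV uses these up to a constant factor, which does not affect the fit), an optional ridge penalty `lam` stands in for the penalized regression the paper studies, and the function name is hypothetical. This is not the ZVCV R package API.

```python
import numpy as np


def zv_cv_estimate(f_vals, scores, lam=0.0):
    """First-order ZV-CV estimate of E[f] with an optional ridge penalty lam.

    f_vals: (n,) evaluations of f at samples from the target.
    scores: (n, d) gradients of the log target at the same samples.
    """
    f_vals = np.asarray(f_vals, dtype=float)
    z = np.asarray(scores, dtype=float)
    # Centre both sides so that only the slopes are penalised, not the intercept.
    z_mean = z.mean(axis=0)
    f_mean = f_vals.mean()
    zc = z - z_mean
    fc = f_vals - f_mean
    d = z.shape[1]
    beta = np.linalg.solve(zc.T @ zc + lam * np.eye(d), zc.T @ fc)
    # The score has mean zero under the target, so the intercept estimates E[f].
    return f_mean - z_mean @ beta


rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
x = rng.normal(loc=mu, size=(500, 2))  # samples from N(mu, I)
scores = -(x - mu)                     # gradient of log N(mu, I)
f = x.sum(axis=1)                      # exact expectation is mu.sum() = -1
print(f.mean(), zv_cv_estimate(f, scores))
```

In this toy Gaussian case $f$ is exactly linear in the score, so the unpenalized fit removes all Monte Carlo error (the "zero-variance" case); for realistic targets the reduction is partial, and with many covariates a positive `lam` can stabilize the fit.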

arXiv:1810.04946 [pdf, other]

A Riemann-Stein Kernel Method

Authors: Alessandro Barp, Chris. J. Oates, Emilio Porcu, Mark Girolami

Abstract: This paper proposes and studies a numerical method for approximation of posterior expectations based on interpolation with a Stein reproducing kernel. Finite-sample-size bounds on the approximation error are established for posterior distributions supported on a compact Riemannian manifold, and we relate these to a kernel Stein discrepancy (KSD). Moreover, we prove in our setting that the KSD is equivalent to Sobolev discrepancy and, in doing so, we completely characterise the convergence-determining properties of KSD. Our contribution is rooted in a novel combination of Stein's method, the theory of reproducing kernels, and existence and regularity results for partial differential equations on a Riemannian manifold.

Submitted 11 January, 2022; v1 submitted 11 October, 2018; originally announced October 2018.

arXiv:1809.10227 [pdf, other]

Symmetry Exploits for Bayesian Cubature Methods

Authors: Toni Karvonen, Simo Särkkä, Chris. J. Oates

Abstract: Bayesian cubature provides a flexible framework for numerical integration, in which a priori knowledge about the integrand can be encoded and exploited. This additional flexibility, compared to many classical cubature methods, comes at a computational cost which is cubic in the number of evaluations of the integrand. It has been recently observed that fully symmetric point sets can be exploited in order to reduce, in some cases substantially, the computational cost of the standard Bayesian cubature method. This work identifies several additional symmetry exploits within the Bayesian cubature framework. In particular, we go beyond earlier work in considering non-symmetric measures and, in addition to the standard Bayesian cubature method, present exploits for the Bayes-Sard cubature method and the multi-output Bayesian cubature method.

Submitted 26 January, 2019; v1 submitted 26 September, 2018; originally announced September 2018.

Comments: Accepted for publication in Statistics and Computing
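To make the cubic cost mentioned in the Bayesian cubature abstract above concrete, here is a minimal one-dimensional sketch of the standard method, assuming a zero-mean Gaussian-process prior with a Gaussian kernel of length-scale `ell` and the standard normal as integration measure (both choices are illustrative, as are all function names). The weights solve $K w = z$, where $z$ is the kernel mean; the dense solve is the $O(n^3)$ step. For a symmetric node set the weights come out symmetric, which is the kind of structure symmetry exploits use, although this sketch still performs the full solve.

```python
import numpy as np


def gauss_kernel(x, y, ell):
    """Gaussian kernel matrix k(x_i, y_j) = exp(-(x_i - y_j)^2 / (2 ell^2))."""
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / (2.0 * ell**2))


def kernel_mean(x, ell):
    """Closed-form integral of k(x, y) against N(y; 0, 1)."""
    s2 = ell**2 + 1.0
    return ell / np.sqrt(s2) * np.exp(-(x**2) / (2.0 * s2))


def bayesian_cubature_weights(nodes, ell=1.0, jitter=1e-10):
    """Weights w = K^{-1} z of the posterior mean for the integral."""
    # Small jitter on the diagonal guards against ill-conditioning.
    K = gauss_kernel(nodes, nodes, ell) + jitter * np.eye(nodes.size)
    z = kernel_mean(nodes, ell)
    # Dense solve: O(n^3) in the number of nodes.
    return np.linalg.solve(K, z)


nodes = np.linspace(-3.0, 3.0, 7)  # a symmetric point set
w = bayesian_cubature_weights(nodes)
estimate = w @ nodes**2            # posterior mean for the integral of x^2 against N(0, 1)
print(w, estimate)
```

Since the exact value of that integral is 1, `estimate` gives a rough sense of accuracy for this small node set; no particular error level is claimed here.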

Showing 1–50 of 126 results for author: Oates, C