-
A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing
Authors:
Kevin H. Huang,
Xing Liu,
Andrew B. Duncan,
Axel Gandy
Abstract:
We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and asymmetric distribution. Our bounds are valid for any finite $n$ and $d$, independent of individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. As an application, we apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with $d$ and the bandwidth.
Submitted 2 July, 2023; v1 submitted 11 February, 2023;
originally announced February 2023.
-
A Fourier representation of kernel Stein discrepancy with application to Goodness-of-Fit tests for measures on infinite dimensional Hilbert spaces
Authors:
George Wynne,
Mikołaj Kasprzak,
Andrew B. Duncan
Abstract:
Kernel Stein discrepancy (KSD) is a widely used kernel-based measure of discrepancy between probability measures. It is often employed in the scenario where a user has a collection of samples from a candidate probability measure and wishes to compare them against a specified target probability measure. KSD has been employed in a range of settings including goodness-of-fit testing, parametric inference, MCMC output assessment and generative modelling. However, so far the method has been restricted to finite-dimensional data. We provide the first analysis of KSD in the generality of data lying in a separable Hilbert space, for example functional data. The main result is a novel Fourier representation of KSD obtained by combining the theory of measure equations with kernel methods. This allows us to prove that KSD can separate measures and thus is valid to use in practice. Additionally, our results improve the interpretability of KSD by decoupling the effect of the kernel and Stein operator. We demonstrate the efficacy of the proposed methodology by performing goodness-of-fit tests for various Gaussian and non-Gaussian functional models in a number of synthetic data experiments.
Submitted 20 August, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Theoretical Guarantees for the Statistical Finite Element Method
Authors:
Yanni Papandreou,
Jon Cockayne,
Mark Girolami,
Andrew B. Duncan
Abstract:
The statistical finite element method (StatFEM) is an emerging probabilistic method that allows observations of a physical system to be synthesised with the numerical solution of a PDE intended to describe it in a coherent statistical framework, to compensate for model error. This work presents a new theoretical analysis of the statistical finite element method demonstrating that it has similar convergence properties to the finite element method on which it is based. Our results constitute a bound on the Wasserstein-2 distance between the ideal prior and posterior and the StatFEM approximation thereof, and show that this distance converges at the same mesh-dependent rate as finite element solutions converge to the true solution. Several numerical examples are presented to demonstrate our theory, including an example which tests the robustness of StatFEM when extended to nonlinear quantities of interest.
Submitted 18 February, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Ensemble Inference Methods for Models With Noisy and Expensive Likelihoods
Authors:
Oliver R. A. Dunbar,
Andrew B. Duncan,
Andrew M. Stuart,
Marie-Therese Wolfram
Abstract:
The increasing availability of data presents an opportunity to calibrate unknown parameters which appear in complex models of phenomena in the biomedical, physical and social sciences. However, model complexity often leads to parameter-to-data maps which are expensive to evaluate and are only available through noisy approximations. This paper is concerned with the use of interacting particle systems for the solution of the resulting inverse problems for parameters. Of particular interest is the case where the available forward model evaluations are subject to rapid fluctuations, in parameter space, superimposed on the smoothly varying large scale parametric structure of interest. A motivating example from climate science is presented, and ensemble Kalman methods (which do not use the derivative of the parameter-to-data map) are shown, empirically, to perform well. Multiscale analysis is then used to analyze the behaviour of interacting particle system algorithms when rapid fluctuations, which we refer to as noise, pollute the large scale parametric dependence of the parameter-to-data map. Ensemble Kalman methods and Langevin-based methods (the latter use the derivative of the parameter-to-data map) are compared in this light. The ensemble Kalman methods are shown to behave favourably in the presence of noise in the parameter-to-data map, whereas Langevin methods are adversely affected. On the other hand, Langevin methods have the correct equilibrium distribution in the setting of noise-free forward models, whilst ensemble Kalman methods only provide an uncontrolled approximation, except in the linear case. Therefore a new class of algorithms, ensemble Gaussian process samplers, which combine the benefits of both ensemble Kalman and Langevin methods, are introduced and shown to perform favourably.
Submitted 22 January, 2022; v1 submitted 7 April, 2021;
originally announced April 2021.
-
Probabilistic Gradients for Fast Calibration of Differential Equation Models
Authors:
Jon Cockayne,
Andrew B. Duncan
Abstract:
Calibration of large-scale differential equation models to observational or experimental data is a widespread challenge throughout applied sciences and engineering. A crucial bottleneck in state-of-the-art calibration methods is the calculation of local sensitivities, i.e. derivatives of the loss function with respect to the estimated parameters, which often necessitates several numerical solves of the underlying system of partial or ordinary differential equations. In this paper we present a new probabilistic approach to computing local sensitivities. The proposed method has several advantages over classical methods. Firstly, it operates within a constrained computational budget and provides a probabilistic quantification of uncertainty incurred in the sensitivities from this constraint. Secondly, information from previous sensitivity estimates can be recycled in subsequent computations, reducing the overall computational effort for iterative gradient-based calibration methods. The methodology presented is applied to two challenging test problems and compared against classical methods.
Submitted 22 February, 2021; v1 submitted 3 September, 2020;
originally announced September 2020.
-
A Kernel Two-Sample Test for Functional Data
Authors:
George Wynne,
Andrew B. Duncan
Abstract:
We propose a nonparametric two-sample test procedure based on Maximum Mean Discrepancy (MMD) for testing the hypothesis that two samples of functions have the same underlying distribution, using kernels defined on function spaces. This construction is motivated by a scaling analysis of the efficiency of MMD-based tests for datasets of increasing dimension. Theoretical properties of kernels on function spaces and their associated MMD are established and employed to ascertain the efficacy of the newly proposed test, as well as to assess the effects of using functional reconstructions based on discretised function samples. The theoretical results are demonstrated over a range of synthetic and real world datasets.
Submitted 19 October, 2020; v1 submitted 25 August, 2020;
originally announced August 2020.
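The MMD-based two-sample statistic mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' functional-data test: it assumes finite-dimensional samples stored as tuples of floats, a Gaussian kernel, and the standard unbiased U-statistic estimator of squared MMD. The function names `gaussian_kernel` and `mmd2_unbiased` and the `bandwidth` parameter are hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased U-statistic estimate of squared MMD between samples xs and ys.

    Assumes len(xs) >= 2 and len(ys) >= 2. Diagonal terms (i == j) are
    excluded from the within-sample averages, so the estimate can be
    slightly negative when the two samples are similar.
    """
    n, m = len(xs), len(ys)
    k_xx = sum(
        gaussian_kernel(xs[i], xs[j], bandwidth)
        for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    k_yy = sum(
        gaussian_kernel(ys[i], ys[j], bandwidth)
        for i in range(m) for j in range(m) if i != j
    ) / (m * (m - 1))
    k_xy = sum(
        gaussian_kernel(x, y, bandwidth) for x in xs for y in ys
    ) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy
```

In a test, this statistic would be compared against a threshold obtained, for example, by permuting the pooled sample; that step is omitted here.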
-
Manifold Learning for Accelerating Coarse-Grained Optimization
Authors:
Dmitry Pozharskiy,
Noah J. Wichrowski,
Andrew B. Duncan,
Grigorios A. Pavliotis,
Ioannis G. Kevrekidis
Abstract:
Algorithms proposed for solving high-dimensional optimization problems with no derivative information frequently encounter the "curse of dimensionality," becoming ineffective as the dimension of the parameter space grows. One feature of a subclass of such problems that are effectively low-dimensional is that only a few parameters (or combinations thereof) are important for the optimization and must be explored in detail. Knowing these parameters/combinations in advance would greatly simplify the problem and its solution. We propose the data-driven construction of an effective (coarse-grained, "trend") optimizer, based on data obtained from ensembles of brief simulation bursts with an "inner" optimization algorithm, that has the potential to accelerate the exploration of the parameter space. The trajectories of this "effective optimizer" quickly become attracted onto a slow manifold parameterized by the few relevant parameter combinations. We obtain the parameterization of this low-dimensional, effective optimization manifold on the fly using data mining/manifold learning techniques on the results of simulation (inner optimizer iteration) burst ensembles and exploit it locally to "jump" forward along this manifold. As a result, we can bias the exploration of the parameter space towards the few, important directions and, through this "wrapper algorithm," speed up the convergence of traditional optimization algorithms.
Submitted 25 April, 2020; v1 submitted 10 January, 2020;
originally announced January 2020.
-
Minimum Stein Discrepancy Estimators
Authors:
Alessandro Barp,
Francois-Xavier Briol,
Andrew B. Duncan,
Mark Girolami,
Lester Mackey
Abstract:
When maximum likelihood estimation is infeasible, one often turns to score matching, contrastive divergence, or minimum probability flow to obtain tractable parameter estimates. We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths. We establish the consistency, asymptotic normality, and robustness of DKSD and DSM estimators, then derive stochastic Riemannian gradient descent algorithms for their efficient optimisation. The main strength of our methodology is its flexibility, which allows us to design estimators with desirable properties for specific models at hand by carefully selecting a Stein discrepancy. We illustrate this advantage for several challenging problems for score matching, such as non-smooth, heavy-tailed or light-tailed densities.
Submitted 5 October, 2022; v1 submitted 19 June, 2019;
originally announced June 2019.
-
Statistical Inference for Generative Models with Maximum Mean Discrepancy
Authors:
Francois-Xavier Briol,
Alessandro Barp,
Andrew B. Duncan,
Mark Girolami
Abstract:
While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we study a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable, but simulation is cheap. The distance considered, maximum mean discrepancy (MMD), is defined through the embedding of probability measures into a reproducing kernel Hilbert space. We study the theoretical properties of these estimators, showing that they are consistent, asymptotically normal and robust to model misspecification. A main advantage of these estimators is the flexibility offered by the choice of kernel, which can be used to trade off statistical efficiency and robustness. On the algorithmic side, we study the geometry induced by MMD on the parameter space and use this to introduce a novel natural gradient descent-like algorithm for efficient implementation of these estimators. We illustrate the relevance of our theoretical results on several classes of models including a discrete-time latent Markov process and two multivariate stochastic differential equation models.
Submitted 13 June, 2019;
originally announced June 2019.
-
The True Cost of Stochastic Gradient Langevin Dynamics
Authors:
Tigran Nagapetyan,
Andrew B. Duncan,
Leonard Hasenclever,
Sebastian J. Vollmer,
Lukasz Szpruch,
Konstantinos Zygalakis
Abstract:
The problem of posterior inference is central to Bayesian statistics and a wealth of Markov Chain Monte Carlo (MCMC) methods have been proposed to obtain asymptotically correct samples from the posterior. As datasets in applications grow larger and larger, scalability has emerged as a central problem for MCMC methods. Stochastic Gradient Langevin Dynamics (SGLD) and related stochastic gradient Markov Chain Monte Carlo methods offer scalability by using stochastic gradients in each step of the simulated dynamics. While these methods are asymptotically unbiased if the stepsizes are reduced in an appropriate fashion, in practice constant stepsizes are used. This introduces a bias that is often ignored. In this paper we study the mean squared error of Lipschitz functionals in strongly log-concave models with i.i.d. data of growing data set size and show that, given a batchsize, to control the bias of SGLD the stepsize has to be chosen so small that the computational cost of reaching a target accuracy is roughly the same for all batchsizes. Using a control variate approach, the cost can be reduced dramatically. The analysis is performed by considering the algorithms as noisy discretisations of the Langevin SDE which correspond to the Euler method if the full data set is used. An important observation is that the scale of the step size is determined by the stability criterion if the accuracy is required for consistent credible intervals. Experimental results confirm our theoretical findings.
Submitted 8 June, 2017;
originally announced June 2017.
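The constant-stepsize SGLD iteration discussed in the abstract above can be sketched on a toy problem. This is an illustrative assumption, not the paper's experimental setup: the model is a scalar Gaussian mean with a standard normal prior and unit-variance likelihood, and the names `sgld_step`, `run_sgld`, `batch_size` and `step_size` are hypothetical. The update is the usual Euler-type step with the full-data gradient replaced by a rescaled minibatch gradient.

```python
import math
import random


def sgld_step(theta, data, batch_size, step_size, rng):
    """One constant-stepsize SGLD update for a scalar Gaussian mean model."""
    batch = rng.sample(data, batch_size)
    # Gradient of the log prior, assuming theta ~ N(0, 1).
    grad_log_prior = -theta
    # Minibatch estimate of the log-likelihood gradient, assuming
    # x_i ~ N(theta, 1), rescaled by N / batch_size to be unbiased.
    grad_log_lik = (len(data) / batch_size) * sum(x - theta for x in batch)
    drift = 0.5 * step_size * (grad_log_prior + grad_log_lik)
    noise = math.sqrt(step_size) * rng.gauss(0.0, 1.0)
    return theta + drift + noise


def run_sgld(data, batch_size, step_size, n_steps, seed=0):
    """Run SGLD from theta = 0 and return the list of visited states."""
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        theta = sgld_step(theta, data, batch_size, step_size, rng)
        samples.append(theta)
    return samples
```

For this toy model the exact posterior mean is `sum(data) / (len(data) + 1)`, which gives a reference point for checking the time average of the chain. The step size must be small relative to `1 / len(data)` for the iteration to remain stable, in line with the stability constraint the abstract refers to.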
-
Using Perturbed Underdamped Langevin Dynamics to Efficiently Sample from Probability Distributions
Authors:
A. B. Duncan,
N. Nuesken,
G. A. Pavliotis
Abstract:
In this paper we introduce and analyse Langevin samplers that consist of perturbations of the standard underdamped Langevin dynamics. The perturbed dynamics is such that its invariant measure is the same as that of the unperturbed dynamics. We show that appropriate choices of the perturbations can lead to samplers that have improved properties, at least in terms of reducing the asymptotic variance. We present a detailed analysis of the new Langevin sampler for Gaussian target distributions. Our theoretical results are supported by numerical experiments with non-Gaussian target measures.
Submitted 29 April, 2017;
originally announced May 2017.
-
Note on A. Barbour's paper on Stein's method for diffusion approximations
Authors:
Mikolaj J. Kasprzak,
Andrew B. Duncan,
Sebastian J. Vollmer
Abstract:
In (Barbour, 1990) foundations for diffusion approximation via Stein's method are laid. This paper has been cited more than 130 times and is a cornerstone in the area of Stein's method. A semigroup argument is used therein to solve a Stein equation for Gaussian diffusion approximation. We prove that, contrary to the claim in (Barbour, 1990), the semigroup considered therein is not strongly continuous on the Banach space of continuous, real-valued functions on D[0,1] growing slower than a cubic, equipped with an appropriate norm. We also provide a proof of the exact formulation of the solution to the Stein equation of interest, which does not require the aforementioned strong continuity. This shows that the main results of (Barbour, 1990) hold true.
Submitted 15 April, 2017; v1 submitted 10 February, 2017;
originally announced February 2017.
-
Measuring Sample Quality with Diffusions
Authors:
Jackson Gorham,
Andrew B. Duncan,
Sebastian J. Vollmer,
Lester Mackey
Abstract:
Stein's method for measuring convergence to a continuous target distribution relies on an operator characterizing the target and Stein factor bounds on the solutions of an associated differential equation. While such operators and bounds are readily available for a diversity of univariate targets, few multivariate targets have been analyzed. We introduce a new class of characterizing operators based on Ito diffusions and develop explicit multivariate Stein factor bounds for any target with a fast-coupling Ito diffusion. As example applications, we develop computable and convergence-determining diffusion Stein discrepancies for log-concave, heavy-tailed, and multimodal targets and use these quality measures to select the hyperparameters of biased Markov chain Monte Carlo (MCMC) samplers, compare random and deterministic quadrature rules, and quantify bias-variance tradeoffs in approximate MCMC. Our results establish a near-linear relationship between diffusion Stein discrepancies and Wasserstein distances, improving upon past work even for strongly log-concave targets. The exposed relationship between Stein factors and Markov process coupling may be of independent interest.
Submitted 12 November, 2018; v1 submitted 21 November, 2016;
originally announced November 2016.
-
Variance Reduction using Nonreversible Langevin Samplers
Authors:
A. B. Duncan,
T. Lelievre,
G. A. Pavliotis
Abstract:
A standard approach to computing expectations with respect to a given target measure is to introduce an overdamped Langevin equation which is reversible with respect to the target distribution, and to approximate the expectation by a time-averaging estimator. As has been noted in recent papers, introducing an appropriately chosen nonreversible component to the dynamics is beneficial, both in terms of reducing the asymptotic variance and of speeding up convergence to the target distribution. In this paper we present a detailed study of the dependence of the asymptotic variance on the deviation from reversibility. Our theoretical findings are supported by numerical simulations.
Submitted 28 December, 2015; v1 submitted 16 June, 2015;
originally announced June 2015.
-
Homogenization of lateral diffusion on a random surface
Authors:
A. B. Duncan
Abstract:
We study the problem of lateral diffusion on a static, quasi-planar surface generated by a stationary, ergodic random field possessing rapid small-scale spatial fluctuations. The aim is to study the effective behaviour of a particle undergoing Brownian motion on the surface viewed as a projection on the underlying plane. By formulating the problem as a diffusion in a random medium, we are able to use known results from the theory of stochastic homogenization of SDEs to show that, in the limit of small scale fluctuations, the diffusion process behaves quantitatively like a Brownian motion with constant diffusion tensor $D$. While $D$ will not have a closed-form expression in general, we are able to derive variational bounds for the effective diffusion tensor, and using a duality transformation argument, obtain a closed form expression for $D$ in the special case where $D$ is isotropic. We also describe a numerical scheme for approximating the effective diffusion tensor and illustrate this scheme with two examples.
Submitted 30 January, 2014; v1 submitted 22 January, 2014;
originally announced January 2014.
-
A multiscale analysis of diffusions on rapidly varying surfaces
Authors:
A. B. Duncan,
C. M. Elliott,
G. A. Pavliotis,
A. M. Stuart
Abstract:
Lateral diffusion of molecules on surfaces plays a very important role in various biological processes, including lipid transport across the cell membrane, synaptic transmission and other phenomena such as exo- and endocytosis, signal transduction, chemotaxis and cell growth. In many cases, the surfaces can possess spatial inhomogeneities and/or be rapidly changing shape. Using a generalisation of the model for a thermally excited Helfrich elastic membrane, we consider the problem of lateral diffusion on quasi-planar surfaces, possessing both spatial and temporal fluctuations. Using results from homogenisation theory, we show that, under the assumption of scale separation between the characteristic length and time scales of the membrane fluctuations and the characteristic scale of the diffusing particle, the lateral diffusion process can be well approximated by a Brownian motion on the plane with constant diffusion tensor $D$ which depends in a highly nonlinear way on the detailed properties of the surface. The effective diffusion tensor will depend on the relative scales of the spatial and temporal fluctuations and, for different scaling regimes, we prove the existence of a macroscopic limit in each case.
Submitted 10 November, 2013; v1 submitted 5 November, 2013;
originally announced November 2013.