A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing

Authors: Kevin H. Huang, Xing Liu, Andrew B. Duncan, Axel Gandy

Abstract: We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and asymmetric distribution. Our bounds are valid for any finite $n$ and $d$, independent of individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. As an application, we apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with $d$ and the bandwidth.

Submitted 2 July, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

Comments: COLT camera-ready version

arXiv:2206.04552 [pdf, other]

A Fourier representation of kernel Stein discrepancy with application to Goodness-of-Fit tests for measures on infinite dimensional Hilbert spaces

Authors: George Wynne, Mikołaj Kasprzak, Andrew B. Duncan

Abstract: Kernel Stein discrepancy (KSD) is a widely used kernel-based measure of discrepancy between probability measures. It is often employed in the scenario where a user has a collection of samples from a candidate probability measure and wishes to compare them against a specified target probability measure. KSD has been employed in a range of settings including goodness-of-fit testing, parametric inference, MCMC output assessment and generative modelling. However, so far the method has been restricted to finite-dimensional data. We provide the first analysis of KSD in the generality of data lying in a separable Hilbert space, for example functional data. The main result is a novel Fourier representation of KSD obtained by combining the theory of measure equations with kernel methods. This allows us to prove that KSD can separate measures and thus is valid to use in practice. Additionally, our results improve the interpretability of KSD by decoupling the effect of the kernel and Stein operator. We demonstrate the efficacy of the proposed methodology by performing goodness-of-fit tests for various Gaussian and non-Gaussian functional models in a number of synthetic data experiments.

Submitted 20 August, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: To appear in Bernoulli

arXiv:2111.07691 [pdf, other]

Theoretical Guarantees for the Statistical Finite Element Method

Authors: Yanni Papandreou, Jon Cockayne, Mark Girolami, Andrew B. Duncan

Abstract: The statistical finite element method (StatFEM) is an emerging probabilistic method that allows observations of a physical system to be synthesised with the numerical solution of a PDE intended to describe it in a coherent statistical framework, to compensate for model error. This work presents a new theoretical analysis of the statistical finite element method demonstrating that it has similar convergence properties to the finite element method on which it is based. Our results constitute a bound on the Wasserstein-2 distance between the ideal prior and posterior and the StatFEM approximation thereof, and show that this distance converges at the same mesh-dependent rate as finite element solutions converge to the true solution. Several numerical examples are presented to demonstrate our theory, including an example which tests the robustness of StatFEM when extended to nonlinear quantities of interest.

Submitted 18 February, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

Comments: 27 pages for main article, 11 pages for supplement, 8 figures; typos corrected

MSC Class: 65N75 (Primary) 35R60; 65C30 (Secondary)

arXiv:2104.03384 [pdf, other]

Ensemble Inference Methods for Models With Noisy and Expensive Likelihoods

Authors: Oliver R. A. Dunbar, Andrew B. Duncan, Andrew M. Stuart, Marie-Therese Wolfram

Abstract: The increasing availability of data presents an opportunity to calibrate unknown parameters which appear in complex models of phenomena in the biomedical, physical and social sciences. However, model complexity often leads to parameter-to-data maps which are expensive to evaluate and are only available through noisy approximations. This paper is concerned with the use of interacting particle systems for the solution of the resulting inverse problems for parameters. Of particular interest is the case where the available forward model evaluations are subject to rapid fluctuations, in parameter space, superimposed on the smoothly varying large scale parametric structure of interest. A motivating example from climate science is presented, and ensemble Kalman methods (which do not use the derivative of the parameter-to-data map) are shown, empirically, to perform well. Multiscale analysis is then used to analyze the behaviour of interacting particle system algorithms when rapid fluctuations, which we refer to as noise, pollute the large scale parametric dependence of the parameter-to-data map. Ensemble Kalman methods and Langevin-based methods (the latter use the derivative of the parameter-to-data map) are compared in this light. The ensemble Kalman methods are shown to behave favourably in the presence of noise in the parameter-to-data map, whereas Langevin methods are adversely affected. On the other hand, Langevin methods have the correct equilibrium distribution in the setting of noise-free forward models, whilst ensemble Kalman methods only provide an uncontrolled approximation, except in the linear case. Therefore a new class of algorithms, ensemble Gaussian process samplers, which combine the benefits of both ensemble Kalman and Langevin methods, are introduced and shown to perform favourably.

Submitted 22 January, 2022; v1 submitted 7 April, 2021; originally announced April 2021.

MSC Class: 65C05; 65C40; 60J22

arXiv:2009.04239 [pdf, other]

Probabilistic Gradients for Fast Calibration of Differential Equation Models

Authors: Jon Cockayne, Andrew B. Duncan

Abstract: Calibration of large-scale differential equation models to observational or experimental data is a widespread challenge throughout applied sciences and engineering. A crucial bottleneck in state-of-the-art calibration methods is the calculation of local sensitivities, i.e. derivatives of the loss function with respect to the estimated parameters, which often necessitates several numerical solves of the underlying system of partial or ordinary differential equations. In this paper we present a new probabilistic approach to computing local sensitivities. The proposed method has several advantages over classical methods. Firstly, it operates within a constrained computational budget and provides a probabilistic quantification of uncertainty incurred in the sensitivities from this constraint. Secondly, information from previous sensitivity estimates can be recycled in subsequent computations, reducing the overall computational effort for iterative gradient-based calibration methods. The methodology presented is applied to two challenging test problems and compared against classical methods.

Submitted 22 February, 2021; v1 submitted 3 September, 2020; originally announced September 2020.

arXiv:2008.11095 [pdf, other]

A Kernel Two-Sample Test for Functional Data

Authors: George Wynne, Andrew B. Duncan

Abstract: We propose a nonparametric two-sample test procedure based on Maximum Mean Discrepancy (MMD) for testing the hypothesis that two samples of functions have the same underlying distribution, using kernels defined on function spaces. This construction is motivated by a scaling analysis of the efficiency of MMD-based tests for datasets of increasing dimension. Theoretical properties of kernels on function spaces and their associated MMD are established and employed to ascertain the efficacy of the newly proposed test, as well as to assess the effects of using functional reconstructions based on discretised function samples. The theoretical results are demonstrated over a range of synthetic and real world datasets.

Submitted 19 October, 2020; v1 submitted 25 August, 2020; originally announced August 2020.

Comments: Added to numerics section

arXiv:2001.03518 [pdf, other]

Manifold Learning for Accelerating Coarse-Grained Optimization

Authors: Dmitry Pozharskiy, Noah J. Wichrowski, Andrew B. Duncan, Grigorios A. Pavliotis, Ioannis G. Kevrekidis

Abstract: Algorithms proposed for solving high-dimensional optimization problems with no derivative information frequently encounter the "curse of dimensionality," becoming ineffective as the dimension of the parameter space grows. One feature of a subclass of such problems that are effectively low-dimensional is that only a few parameters (or combinations thereof) are important for the optimization and must be explored in detail. Knowing these parameters/combinations in advance would greatly simplify the problem and its solution. We propose the data-driven construction of an effective (coarse-grained, "trend") optimizer, based on data obtained from ensembles of brief simulation bursts with an "inner" optimization algorithm, that has the potential to accelerate the exploration of the parameter space. The trajectories of this "effective optimizer" quickly become attracted onto a slow manifold parameterized by the few relevant parameter combinations. We obtain the parameterization of this low-dimensional, effective optimization manifold on the fly using data mining/manifold learning techniques on the results of simulation (inner optimizer iteration) burst ensembles and exploit it locally to "jump" forward along this manifold. As a result, we can bias the exploration of the parameter space towards the few, important directions and, through this "wrapper algorithm," speed up the convergence of traditional optimization algorithms.

Submitted 25 April, 2020; v1 submitted 10 January, 2020; originally announced January 2020.

Comments: 30 pages, 18 figures. Submitted to the Journal of Computational Dynamics. Revised in response to reviewers' comments

MSC Class: 37N40 (Primary); 90C56 (Secondary)

arXiv:1906.08283 [pdf, other]

Minimum Stein Discrepancy Estimators

Authors: Alessandro Barp, Francois-Xavier Briol, Andrew B. Duncan, Mark Girolami, Lester Mackey

Abstract: When maximum likelihood estimation is infeasible, one often turns to score matching, contrastive divergence, or minimum probability flow to obtain tractable parameter estimates. We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths. We establish the consistency, asymptotic normality, and robustness of DKSD and DSM estimators, then derive stochastic Riemannian gradient descent algorithms for their efficient optimisation. The main strength of our methodology is its flexibility, which allows us to design estimators with desirable properties for specific models at hand by carefully selecting a Stein discrepancy. We illustrate this advantage for several challenging problems for score matching, such as non-smooth, heavy-tailed or light-tailed densities.

Submitted 5 October, 2022; v1 submitted 19 June, 2019; originally announced June 2019.

Comments: Accepted for publication at NeurIPS 2019

arXiv:1906.05944 [pdf, other]

Statistical Inference for Generative Models with Maximum Mean Discrepancy

Authors: Francois-Xavier Briol, Alessandro Barp, Andrew B. Duncan, Mark Girolami

Abstract: While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we study a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable, but simulation is cheap. The distance considered, maximum mean discrepancy (MMD), is defined through the embedding of probability measures into a reproducing kernel Hilbert space. We study the theoretical properties of these estimators, showing that they are consistent, asymptotically normal and robust to model misspecification. A main advantage of these estimators is the flexibility offered by the choice of kernel, which can be used to trade off statistical efficiency and robustness. On the algorithmic side, we study the geometry induced by MMD on the parameter space and use this to introduce a novel natural gradient descent-like algorithm for efficient implementation of these estimators. We illustrate the relevance of our theoretical results on several classes of models including a discrete-time latent Markov process and two multivariate stochastic differential equation models.

Submitted 13 June, 2019; originally announced June 2019.

arXiv:1706.02692 [pdf, other]

The True Cost of Stochastic Gradient Langevin Dynamics

Authors: Tigran Nagapetyan, Andrew B. Duncan, Leonard Hasenclever, Sebastian J. Vollmer, Lukasz Szpruch, Konstantinos Zygalakis

Abstract: The problem of posterior inference is central to Bayesian statistics and a wealth of Markov Chain Monte Carlo (MCMC) methods have been proposed to obtain asymptotically correct samples from the posterior. As datasets in applications grow larger and larger, scalability has emerged as a central problem for MCMC methods. Stochastic Gradient Langevin Dynamics (SGLD) and related stochastic gradient Markov Chain Monte Carlo methods offer scalability by using stochastic gradients in each step of the simulated dynamics. While these methods are asymptotically unbiased if the stepsizes are reduced in an appropriate fashion, in practice constant stepsizes are used. This introduces a bias that is often ignored. In this paper we study the mean squared error of Lipschitz functionals in strongly log-concave models with i.i.d. data of growing data set size and show that, given a batchsize, to control the bias of SGLD the stepsize has to be chosen so small that the computational cost of reaching a target accuracy is roughly the same for all batchsizes. Using a control variate approach, the cost can be reduced dramatically. The analysis is performed by considering the algorithms as noisy discretisations of the Langevin SDE which correspond to the Euler method if the full data set is used. An important observation is that the scale of the step size is determined by the stability criterion if the accuracy is required for consistent credible intervals. Experimental results confirm our theoretical findings.

Submitted 8 June, 2017; originally announced June 2017.

Comments: 6 Figures

MSC Class: 65C05

arXiv:1705.00170 [pdf, other]

doi 10.1007/s10955-017-1906-8

Using Perturbed Underdamped Langevin Dynamics to Efficiently Sample from Probability Distributions

Authors: A. B. Duncan, N. Nuesken, G. A. Pavliotis

Abstract: In this paper we introduce and analyse Langevin samplers that consist of perturbations of the standard underdamped Langevin dynamics. The perturbed dynamics is such that its invariant measure is the same as that of the unperturbed dynamics. We show that appropriate choices of the perturbations can lead to samplers that have improved properties, at least in terms of reducing the asymptotic variance. We present a detailed analysis of the new Langevin sampler for Gaussian target distributions. Our theoretical results are supported by numerical experiments with non-Gaussian target measures.

Submitted 29 April, 2017; originally announced May 2017.

Comments: 45 pages, 4 figures

arXiv:1702.03130 [pdf, other]

doi 10.1214/17-ECP54

Note on A. Barbour's paper on Stein's method for diffusion approximations

Authors: Mikolaj J. Kasprzak, Andrew B. Duncan, Sebastian J. Vollmer

Abstract: In (Barbour, 1990) foundations for diffusion approximation via Stein's method are laid. This paper has been cited more than 130 times and is a cornerstone in the area of Stein's method. A semigroup argument is used therein to solve a Stein equation for Gaussian diffusion approximation. We prove that, contrary to the claim in (Barbour, 1990), the semigroup considered therein is not strongly continuous on the Banach space of continuous, real-valued functions on D[0,1] growing slower than a cubic, equipped with an appropriate norm. We also provide a proof of the exact formulation of the solution to the Stein equation of interest, which does not require the aforementioned strong continuity. This shows that the main results of (Barbour, 1990) hold true.

Submitted 15 April, 2017; v1 submitted 10 February, 2017; originally announced February 2017.

Comments: 8 pages

MSC Class: Primary: 60B10; 60F17; Secondary: 60J60; 60J65; 60E05

Journal ref: Electron. Commun. Probab. 22 (2017), paper no. 23

arXiv:1611.06972 [pdf, other]

Measuring Sample Quality with Diffusions

Authors: Jackson Gorham, Andrew B. Duncan, Sebastian J. Vollmer, Lester Mackey

Abstract: Stein's method for measuring convergence to a continuous target distribution relies on an operator characterizing the target and Stein factor bounds on the solutions of an associated differential equation. While such operators and bounds are readily available for a diversity of univariate targets, few multivariate targets have been analyzed. We introduce a new class of characterizing operators based on Ito diffusions and develop explicit multivariate Stein factor bounds for any target with a fast-coupling Ito diffusion. As example applications, we develop computable and convergence-determining diffusion Stein discrepancies for log-concave, heavy-tailed, and multimodal targets and use these quality measures to select the hyperparameters of biased Markov chain Monte Carlo (MCMC) samplers, compare random and deterministic quadrature rules, and quantify bias-variance tradeoffs in approximate MCMC. Our results establish a near-linear relationship between diffusion Stein discrepancies and Wasserstein distances, improving upon past work even for strongly log-concave targets. The exposed relationship between Stein factors and Markov process coupling may be of independent interest.

Submitted 12 November, 2018; v1 submitted 21 November, 2016; originally announced November 2016.

MSC Class: 60J60; 62-04; 62E17; 60E15; 65C60 (Primary) 62-07; 65C05; 68T05 (Secondary)

arXiv:1506.04934 [pdf, other]

doi 10.1007/s10955-016-1491-2

Variance Reduction using Nonreversible Langevin Samplers

Authors: A. B. Duncan, T. Lelievre, G. A. Pavliotis

Abstract: A standard approach to computing expectations with respect to a given target measure is to introduce an overdamped Langevin equation which is reversible with respect to the target distribution, and to approximate the expectation by a time-averaging estimator. As has been noted in recent papers, introducing an appropriately chosen nonreversible component to the dynamics is beneficial, both in terms of reducing the asymptotic variance and of speeding up convergence to the target distribution. In this paper we present a detailed study of the dependence of the asymptotic variance on the deviation from reversibility. Our theoretical findings are supported by numerical simulations.

Submitted 28 December, 2015; v1 submitted 16 June, 2015; originally announced June 2015.

Comments: 30 pages

MSC Class: 60F05; 60J25; 60J60; 65C05; 82B80

arXiv:1401.5689 [pdf, other]

Homogenization of lateral diffusion on a random surface

Authors: A. B. Duncan

Abstract: We study the problem of lateral diffusion on a static, quasi-planar surface generated by a stationary, ergodic random field possessing rapid small-scale spatial fluctuations. The aim is to study the effective behaviour of a particle undergoing Brownian motion on the surface viewed as a projection on the underlying plane. By formulating the problem as a diffusion in a random medium, we are able to use known results from the theory of stochastic homogenization of SDEs to show that, in the limit of small scale fluctuations, the diffusion process behaves quantitatively like a Brownian motion with constant diffusion tensor $D$. While $D$ will not have a closed-form expression in general, we are able to derive variational bounds for the effective diffusion tensor, and using a duality transformation argument, obtain a closed form expression for $D$ in the special case where $D$ is isotropic. We also describe a numerical scheme for approximating the effective diffusion tensor and illustrate this scheme with two examples.

Submitted 30 January, 2014; v1 submitted 22 January, 2014; originally announced January 2014.

Comments: 25 pages, 7 figures

MSC Class: 35Q92; 60H30; 35B27

arXiv:1311.1007 [pdf, other]

A multiscale analysis of diffusions on rapidly varying surfaces

Authors: A. B. Duncan, C. M. Elliott, G. A. Pavliotis, A. M. Stuart

Abstract: Lateral diffusion of molecules on surfaces plays a very important role in various biological processes, including lipid transport across the cell membrane, synaptic transmission and other phenomena such as exo- and endocytosis, signal transduction, chemotaxis and cell growth. In many cases, the surfaces can possess spatial inhomogeneities and/or be rapidly changing shape. Using a generalisation of the model for a thermally excited Helfrich elastic membrane, we consider the problem of lateral diffusion on quasi-planar surfaces, possessing both spatial and temporal fluctuations. Using results from homogenisation theory, we show that, under the assumption of scale separation between the characteristic length and time scales of the membrane fluctuations and the characteristic scale of the diffusing particle, the lateral diffusion process can be well approximated by a Brownian motion on the plane with constant diffusion tensor $D$ which depends in a highly nonlinear way on the detailed properties of the surface. The effective diffusion tensor will depend on the relative scales of the spatial and temporal fluctuations and, for different scaling regimes, we prove the existence of a macroscopic limit in each case.

Submitted 10 November, 2013; v1 submitted 5 November, 2013; originally announced November 2013.

Comments: 56 pages, 9 figures, submitted to J. Nonlin. Sci

MSC Class: 35Q92; 60H30; 35B27

Showing 1–16 of 16 results for author: Duncan, A B