Search | arXiv e-print repository

arXiv:2406.19051 [pdf, other]

Stochastic Gradient Piecewise Deterministic Monte Carlo Samplers

Authors: Paul Fearnhead, Sebastiano Grazzi, Chris Nemeth, Gareth O. Roberts

Abstract: Recent work has suggested using Monte Carlo methods based on piecewise deterministic Markov processes (PDMPs) to sample from target distributions of interest. PDMPs are non-reversible continuous-time processes endowed with momentum, and hence can mix better than standard reversible MCMC samplers. Furthermore, they can incorporate exact sub-sampling schemes which only require access to a single (randomly selected) data point at each iteration, yet without introducing bias to the algorithm's stationary distribution. However, the range of models for which PDMPs can be used, particularly with sub-sampling, is limited. We propose approximate simulation of PDMPs with sub-sampling for scalable sampling from posterior distributions. The approximation takes the form of an Euler approximation to the true PDMP dynamics, and involves using an estimate of the gradient of the log-posterior based on a data sub-sample. We thus call this class of algorithms stochastic-gradient PDMPs. Importantly, the trajectories of stochastic-gradient PDMPs are continuous and can leverage recent ideas for sampling from measures with continuous and atomic components. We show these methods are easy to implement, present results on their approximation error and demonstrate numerically that this class of algorithms has similar efficiency to, but is more robust than, stochastic gradient Langevin dynamics.

Submitted 27 June, 2024; originally announced June 2024.

MSC Class: 62-08 62F15

arXiv:2406.11664 [pdf, other]

Diffusion Generative Modelling for Divide-and-Conquer MCMC

Authors: C. Trojan, P. Fearnhead, C. Nemeth

Abstract: Divide-and-conquer MCMC is a strategy for parallelising Markov Chain Monte Carlo sampling by running independent samplers on disjoint subsets of a dataset and merging their output. An ongoing challenge in the literature is to efficiently perform this merging without imposing distributional assumptions on the posteriors. We propose using diffusion generative modelling to fit density approximations to the subposterior distributions. This approach outperforms existing methods on challenging merging problems, while its computational cost scales more efficiently to high dimensional problems than existing density estimation approaches.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 16 pages, 5 figures

arXiv:2405.14392 [pdf, other]

Markovian Flow Matching: Accelerating MCMC with Continuous Normalizing Flows

Authors: Alberto Cabezas, Louis Sharrock, Christopher Nemeth

Abstract: Continuous normalizing flows (CNFs) learn the probability path between a reference and a target density by modeling the vector field generating said path using neural networks. Recently, Lipman et al. (2022) introduced a simple and inexpensive method for training CNFs in generative modeling, termed flow matching (FM). In this paper, we re-purpose this method for probabilistic inference by incorporating Markovian sampling methods in evaluating the FM objective and using the learned probability path to improve Monte Carlo sampling. We propose a sequential method, which uses samples from a Markov chain to fix the probability path defining the FM objective. We augment this scheme with an adaptive tempering mechanism that allows the discovery of multiple modes in the target. Under mild assumptions, we establish convergence to a local optimum of the FM objective, discuss improvements in the convergence rate, and illustrate our methods on synthetic and real-world examples.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2403.08514 [pdf, other]

Spatial Latent Gaussian Modelling with Change of Support

Authors: Erick A. Chacón-Montalván, Peter M. Atkinson, Christopher Nemeth, Benjamin M. Taylor, Paula Moraga

Abstract: Spatial data are often derived from multiple sources (e.g. satellites, in-situ sensors, survey samples) with different supports, but associated with the same properties of a spatial phenomenon of interest. It is common for predictors to also be measured on different spatial supports than the response variables. Although there is no standard way to work with spatial data with different supports, a prevalent approach used by practitioners has been to use downscaling or interpolation to project all the variables of analysis towards a common support, and then use standard spatial models. The main disadvantage of this approach is that simple interpolation can introduce biases and, more importantly, the uncertainty associated with the change of support is not taken into account in parameter estimation. In this article, we propose a Bayesian spatial latent Gaussian model that can handle data with different rectilinear supports in both the response variable and predictors. Our approach handles changes of support more naturally according to the properties of the spatial stochastic process being used, and takes into account the uncertainty from the change of support in parameter estimation and prediction. We use spatial stochastic processes as linear combinations of basis functions where Gaussian Markov random fields define the weights.
Our hierarchical modelling approach can be described by the following steps: (i) define a latent model where response variables and predictors are considered as latent stochastic processes with continuous support, (ii) link the continuous-index set stochastic processes with their projection to the support of the observed data, (iii) link the projected process with the observed data. We show the applicability of our approach by simulation studies and modelling land suitability for improved grassland in Rhondda Cynon Taf, a county borough in Wales.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 45 pages, 16 figures

arXiv:2402.00809 [pdf, other]

Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

Authors: Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang

Abstract: In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.

Submitted 2 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2310.17546 [pdf, other]

A changepoint approach to modelling non-stationary soil moisture dynamics

Authors: Mengyi Gong, Rebecca Killick, Christopher Nemeth, John Quinton

Abstract: Soil moisture dynamics provide an indicator of soil health that scientists model via soil drydown curves. The typical modeling process requires the soil moisture time series to be manually separated into drydown segments and then exponential decay models are fitted to them independently. Sensor development over recent years means that experiments that were previously conducted over a few field campaigns can now be scaled to months or even years, often at a higher sampling rate. Manual identification of drydown segments is no longer practical. To better meet the challenge of increasing data size, this paper proposes a novel changepoint-based approach to automatically identify structural changes in the soil drying process, and estimate the parameters characterizing the drying processes simultaneously. A simulation study is carried out to assess the performance of the method. The results demonstrate its ability to identify structural changes and retrieve key parameters of interest to soil scientists. The method is applied to hourly soil moisture time series from the NEON data portal to investigate the temporal dynamics of soil moisture drydown. We recover known relationships previously identified manually, alongside delivering new insights into the temporal variability across soil types and locations.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 19 pages for the main manuscript, 6 pages for the supplemental document

MSC Class: 62M10; 62P12

arXiv:2305.14943 [pdf, other]

Learning Rate Free Sampling in Constrained Domains

Authors: Louis Sharrock, Lester Mackey, Christopher Nemeth

Abstract: We introduce a suite of new particle-based algorithms for sampling in constrained domains which are entirely learning rate free. Our approach leverages coin betting ideas from convex optimisation, and the viewpoint of constrained sampling as a mirrored optimisation problem on the space of probability measures. Based on this viewpoint, we also introduce a unifying framework for several existing constrained sampling algorithms, including mirrored Langevin dynamics and mirrored Stein variational gradient descent. We demonstrate the performance of our algorithms on a range of numerical examples, including sampling from targets on the simplex, sampling with fairness constraints, and constrained sampling problems in post-selection inference. Our results indicate that our algorithms achieve competitive performance with existing constrained sampling methods, without the need to tune any hyperparameters.

Submitted 26 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted at NeurIPS 2023

arXiv:2305.14916 [pdf, other]

Tuning-Free Maximum Likelihood Training of Latent Variable Models via Coin Betting

Authors: Louis Sharrock, Daniel Dodd, Christopher Nemeth

Abstract: We introduce two new particle-based algorithms for learning latent variable models via marginal maximum likelihood estimation, including one which is entirely tuning-free. Our methods are based on the perspective of marginal maximum likelihood estimation as an optimization problem: namely, as the minimization of a free energy functional. One way to solve this problem is via the discretization of a gradient flow associated with the free energy. We study one such approach, which resembles an extension of Stein variational gradient descent, establishing a descent lemma which guarantees that the free energy decreases at each iteration. This method, and any other obtained as the discretization of the gradient flow, necessarily depends on a learning rate which must be carefully tuned by the practitioner in order to ensure convergence at a suitable rate. With this in mind, we also propose another algorithm for optimizing the free energy which is entirely learning rate free, based on coin betting techniques from convex optimization. We validate the performance of our algorithms across several numerical experiments, including several high-dimensional settings. Our results are competitive with existing particle-based methods, without the need for any hyperparameter tuning.

Submitted 1 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

arXiv:2301.11294 [pdf, other]

Coin Sampling: Gradient-Based Bayesian Inference without Learning Rates

Authors: Louis Sharrock, Christopher Nemeth

Abstract: In recent years, particle-based variational inference (ParVI) methods such as Stein variational gradient descent (SVGD) have grown in popularity as scalable methods for Bayesian inference. Unfortunately, the properties of such methods invariably depend on hyperparameters such as the learning rate, which must be carefully tuned by the practitioner in order to ensure convergence to the target measure at a suitable rate. In this paper, we introduce a suite of new particle-based methods for scalable Bayesian inference based on coin betting, which are entirely learning-rate free. We illustrate the performance of our approach on a range of numerical examples, including several high-dimensional models and datasets, demonstrating comparable performance to other ParVI algorithms with no need to tune a learning rate.

Submitted 1 June, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: ICML 2023

arXiv:2210.16189 [pdf, ps, other]

Preferential Subsampling for Stochastic Gradient Langevin Dynamics

Authors: Srshti Putcha, Christopher Nemeth, Paul Fearnhead

Abstract: Stochastic gradient MCMC (SGMCMC) offers a scalable alternative to traditional MCMC, by constructing an unbiased estimate of the gradient of the log-posterior with a small, uniformly-weighted subsample of the data. While efficient to compute, the resulting gradient estimator may exhibit a high variance and impact sampler performance. The problem of variance control has been traditionally addressed by constructing a better stochastic gradient estimator, often using control variates. We propose to use a discrete, non-uniform probability distribution to preferentially subsample data points that have a greater impact on the stochastic gradient. In addition, we present a method of adaptively adjusting the subsample size at each iteration of the algorithm, so that we increase the subsample size in areas of the sample space where the gradient is harder to estimate. We demonstrate that such an approach can maintain the same level of accuracy while substantially reducing the average subsample size that is used.

Submitted 8 July, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: 22 pages, 5 figures. Appeared in the proceedings of AISTATS 2023

arXiv:2210.10644 [pdf, other]

Transport Elliptical Slice Sampling

Authors: Alberto Cabezas, Christopher Nemeth

Abstract: We propose a new framework for efficiently sampling from complex probability distributions using a combination of normalizing flows and elliptical slice sampling (Murray et al., 2010). The central idea is to learn a diffeomorphism, through normalizing flows, that maps the non-Gaussian structure of the target distribution to an approximately Gaussian distribution. We then use the elliptical slice sampler, an efficient and tuning-free Markov chain Monte Carlo (MCMC) algorithm, to sample from the transformed distribution. The samples are then pulled back using the inverse normalizing flow, yielding samples that approximate the stationary target distribution of interest. Our transport elliptical slice sampler (TESS) is optimized for modern computer architectures, where its adaptation mechanism utilizes parallel cores to rapidly run multiple Markov chains for a few iterations. Numerical demonstrations show that TESS produces Monte Carlo samples from the target distribution with lower autocorrelation compared to non-transformed samplers, and demonstrates significant improvements in efficiency when compared to gradient-based proposals designed for parallel computer architectures, given a flexible enough diffeomorphism.

Submitted 27 March, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

arXiv:2208.04080 [pdf, other]

SwISS: A Scalable Markov chain Monte Carlo Divide-and-Conquer Strategy

Authors: Callum Vyner, Christopher Nemeth, Chris Sherlock

Abstract: Divide-and-conquer strategies for Monte Carlo algorithms are an increasingly popular approach to making Bayesian inference scalable to large data sets. In its simplest form, the data are partitioned across multiple computing cores and a separate Markov chain Monte Carlo algorithm on each core targets the associated partial posterior distribution, which we refer to as a sub-posterior, that is the posterior given only the data from the segment of the partition associated with that core. Divide-and-conquer techniques reduce computational, memory and disk bottlenecks, but make it difficult to recombine the sub-posterior samples. We propose SwISS: Sub-posteriors with Inflation, Scaling and Shifting; a new approach for recombining the sub-posterior samples which is simple to apply, scales to high-dimensional parameter spaces and accurately approximates the original posterior distribution through affine transformations of the sub-posterior samples. We prove that our transformation is asymptotically optimal across a natural set of affine transformations and illustrate the efficacy of SwISS against competing algorithms on synthetic and real-world data sets.

Submitted 8 August, 2022; originally announced August 2022.

Comments: 11 pages, 3 figures

arXiv:2206.09995 [pdf, other]

Modelling Populations of Interaction Networks via Distance Metrics

Authors: George Bolt, Simón Lunagómez, Christopher Nemeth

Abstract: Network data arises through observation of relational information between a collection of entities. Recent work in the literature has independently considered when (i) one observes a sample of networks, connectome data in neuroscience being a ubiquitous example, and (ii) the units of observation within a network are edges or paths, such as emails between people or a series of page visits to a website by a user, often referred to as interaction network data. The intersection of these two cases, however, is yet to be considered. In this paper, we propose a new Bayesian modelling framework to analyse such data. Given a practitioner-specified distance metric between observations, we define families of models through location and scale parameters, akin to a Gaussian distribution, with subsequent inference of model parameters providing reasoned statistical summaries for this non-standard data structure. To facilitate inference, we propose specialised Markov chain Monte Carlo (MCMC) schemes capable of sampling from doubly-intractable posterior distributions over discrete and multi-dimensional parameter spaces. Through simulation studies we confirm the efficacy of our methodology and inference scheme, and we illustrate its application via an example analysis of a location-based social network (LBSN) data set.

Submitted 20 June, 2022; originally announced June 2022.

Comments: 42 pages (76 with supplementary materials), 16 figures

arXiv:2206.08858 [pdf, other]

Distances for Comparing Multisets and Sequences

Authors: George Bolt, Simón Lunagómez, Christopher Nemeth

Abstract: Measuring the distance between data points is fundamental to many statistical techniques, such as dimension reduction or clustering algorithms. However, improvements in data collection technologies have led to a growing versatility of structured data for which standard distance measures are inapplicable. In this paper, we consider the problem of measuring the distance between sequences and multisets of points lying within a metric space, motivated by the analysis of an in-play football data set. Drawing on the wider literature, including that of time series analysis and optimal transport, we discuss various distances which are available in such an instance. For each distance, we state and prove theoretical properties, proposing possible extensions where they fail. Finally, via an example analysis of the in-play football data, we illustrate the usefulness of these distances in practice.

Submitted 17 June, 2022; originally announced June 2022.

Comments: 15 pages (41 pages with appendix), 5 figures

arXiv:2201.09681 [pdf, other]

Multivariate sensitivity analysis for a large-scale climate impact and adaptation model

Authors: Oluwole Oyebamiji, Christopher Nemeth, Paula Harrison, Rob Dunford, George Cojocaru

Abstract: We develop a new efficient methodology for Bayesian global sensitivity analysis for large-scale multivariate data. The focus is on computationally demanding models with correlated variables. A multivariate Gaussian process is used as a surrogate model to replace the expensive computer model. To improve the computational efficiency and performance of the model, compactly supported correlation functions are used. The goal is to generate sparse matrices, which give crucial advantages when dealing with large datasets, where we use cross-validation to determine the optimal degree of sparsity. This method was combined with a robust adaptive Metropolis algorithm coupled with a parallel implementation to speed up the convergence to the target distribution. The method was applied to a multivariate dataset from the IMPRESSIONS Integrated Assessment Platform (IAP2), an extension of the CLIMSAVE IAP, which has been widely applied in climate change impact, adaptation and vulnerability assessments. Our empirical results on synthetic and IAP2 data show that the proposed methods are efficient and accurate for global sensitivity analysis of complex models.

Submitted 24 January, 2022; originally announced January 2022.

arXiv:2112.10220 [pdf, other]

Sequential Estimation of Temporally Evolving Latent Space Network Models

Authors: Kathryn Turnbull, Christopher Nemeth, Matthew Nunes, Tyler McCormick

Abstract: In this article we focus on dynamic network data which describe interactions among a fixed population through time. We model this data using the latent space framework, in which the probability of a connection forming is expressed as a function of low-dimensional latent coordinates associated with the nodes, and consider sequential estimation of model parameters via Sequential Monte Carlo (SMC) methods. In this setting, SMC is a natural candidate for estimation which offers greater scalability than existing approaches commonly considered in the literature, allows for estimates to be conveniently updated given additional observations and facilitates both online and offline inference. We present a novel approach to sequentially infer parameters of dynamic latent space network models by building on techniques from the high-dimensional SMC literature. Furthermore, we examine the scalability and performance of our approach via simulation, demonstrate the flexibility of our approach to model variants and analyse a real-world dataset describing classroom contacts.

Submitted 19 December, 2021; originally announced December 2021.

arXiv:2106.01982 [pdf, other]

Gaussian Processes on Hypergraphs

Authors: Thomas Pinder, Kathryn Turnbull, Christopher Nemeth, David Leslie

Abstract: We derive a Matérn Gaussian process (GP) on the vertices of a hypergraph. This enables estimation of regression models of observed or latent values associated with the vertices, in which the correlation and uncertainty estimates are informed by the hypergraph structure. We further present a framework for embedding the vertices of a hypergraph into a latent space using the hypergraph GP. Finally, we provide a scheme for identifying a small number of representative inducing vertices that enables scalable inference through sparse GPs. We demonstrate the utility of our framework on three challenging real-world problems that concern multi-class classification for the political party affiliation of legislators on the basis of voting behaviour, probabilistic matrix factorisation of movie reviews, and embedding a hypergraph of animals into a low-dimensional latent space.

Submitted 3 June, 2021; originally announced June 2021.

Comments: 25 pages, 6 figures

arXiv:2105.13059 [pdf, other]

Efficient and Generalizable Tuning Strategies for Stochastic Gradient MCMC

Authors: Jeremie Coullon, Leah South, Christopher Nemeth

Abstract: Stochastic gradient Markov chain Monte Carlo (SGMCMC) is a popular class of algorithms for scalable Bayesian inference. However, these algorithms include hyperparameters such as step size or batch size that influence the accuracy of estimators based on the obtained posterior samples. As a result, these hyperparameters must be tuned by the practitioner and currently no principled and automated way to tune them exists. Standard MCMC tuning methods based on acceptance rates cannot be used for SGMCMC, thus requiring alternative tools and diagnostics. We propose a novel bandit-based algorithm that tunes the SGMCMC hyperparameters by minimizing the Stein discrepancy between the true posterior and its Monte Carlo approximation. We provide theoretical results supporting this approach and assess various Stein-based discrepancies. We support our results with experiments on both simulated and real datasets, and find that this method is practical for a wide range of applications.

Submitted 18 November, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

arXiv:2105.11022 [pdf, other]

Robust Bayesian Nonparametric Variable Selection for Linear Regression

Authors: Alberto Cabezas, Marco Battiston, Christopher Nemeth

Abstract: Spike-and-slab and horseshoe regression are arguably the most popular Bayesian variable selection approaches for linear regression models. However, their performance can deteriorate if outliers and heteroskedasticity are present in the data, which are common features in many real-world statistics and machine learning applications. In this work, we propose a Bayesian nonparametric approach to linear regression that performs variable selection while accounting for outliers and heteroskedasticity. Our proposed model is an instance of a Dirichlet process scale mixture model with the advantage that we can derive the full conditional distributions of all parameters in closed form, hence producing an efficient Gibbs sampler for posterior inference. Moreover, we present how to extend the model to account for heavy-tailed response variables. The performance of the model is tested against competing algorithms on synthetic and real-world datasets.

Submitted 19 October, 2022; v1 submitted 23 May, 2021; originally announced May 2021.

arXiv:2104.10979 [pdf, other]

A Probabilistic Assessment of the COVID-19 Lockdown on Air Quality in the UK

Authors: Thomas Pinder, Michael Hollaway, Christopher Nemeth, Paul J. Young, David Leslie

Abstract: In March 2020 the United Kingdom (UK) entered a nationwide lockdown period due to the Covid-19 pandemic. As a result, levels of nitrogen dioxide (NO2) in the atmosphere dropped. In this work, we use 550,134 NO2 data points from 237 stations in the UK to build a spatiotemporal Gaussian process capable of predicting NO2 levels across the entire UK. We integrate several covariate datasets to enhance the model's ability to capture the complex spatiotemporal dynamics of NO2. Our numerical analyses show that, within two weeks of a UK lockdown being imposed, UK NO2 levels dropped 36.8%. Further, we show that as a direct result of lockdown NO2 levels were 29-38% lower than what they would have been had no lockdown occurred. In accompaniment to these numerical results, we provide a software framework that allows practitioners to easily and efficiently fit similar models.

Submitted 22 April, 2021; originally announced April 2021.

Comments: 14 pages, 4 figures

arXiv:2009.12141 [pdf, other]

Stein Variational Gaussian Processes

Authors: Thomas Pinder, Christopher Nemeth, David Leslie

Abstract: We show how to use Stein variational gradient descent (SVGD) to carry out inference in Gaussian process (GP) models with non-Gaussian likelihoods and large data volumes. Markov chain Monte Carlo (MCMC) is extremely computationally intensive for these situations, but the parametric assumptions required for efficient variational inference (VI) result in incorrect inference when they encounter the multi-modal posterior distributions that are common for such models. SVGD provides a non-parametric alternative to variational inference which is substantially faster than MCMC. We prove that for GP models with Lipschitz gradients the SVGD algorithm monotonically decreases the Kullback-Leibler divergence from the sampling distribution to the true posterior. Our method is demonstrated on benchmark problems in both regression and classification, a multimodal posterior, and an air quality example with 550,134 spatiotemporal observations, showing substantial performance improvements over MCMC and VI.

Submitted 19 January, 2022; v1 submitted 25 September, 2020; originally announced September 2020.

Comments: 26 pages, 5 figures

arXiv:2002.00033 [pdf, other]

Semi-Exact Control Functionals From Sard's Method

Authors: Leah F. South, Toni Karvonen, Chris Nemeth, Mark Girolami, Chris J. Oates

Abstract: The numerical approximation of posterior expected quantities of interest is considered. A novel control variate technique is proposed for post-processing of Markov chain Monte Carlo output, based both on Stein's method and an approach to numerical integration due to Sard. The resulting estimators are proven to be polynomially exact in the Gaussian context, while empirical results suggest the estimators approximate a Gaussian cubature method near the Bernstein-von-Mises limit. The main theoretical result establishes a bias-correction property in settings where the Markov chain does not leave the posterior invariant. Empirical results are presented across a selection of Bayesian inference tasks. All methods used in this paper are available in the R package ZVCV.

Submitted 6 May, 2021; v1 submitted 31 January, 2020; originally announced February 2020.

Comments: There are 17 pages of main text. This revision provides an extended version of Theorem 1

arXiv:1912.10496 [pdf, other]

Discussion of "Unbiased Markov chain Monte Carlo with couplings" by Pierre E. Jacob, John O'Leary and Yves F. Atchadé

Authors: Leah F. South, Chris Nemeth, Chris J. Oates

Abstract: This is a contribution for the discussion on "Unbiased Markov chain Monte Carlo with couplings" by Pierre E. Jacob, John O'Leary and Yves F. Atchadé to appear in the Journal of the Royal Statistical Society Series B.

Submitted 20 January, 2020; v1 submitted 22 December, 2019; originally announced December 2019.

Comments: This comment includes an appendix which was not included in the printed JRSS B discussion. Version 2 has been shortened slightly to meet word limit requirements

arXiv:1909.00472 [pdf, other]

Latent Space Modelling of Hypergraph Data

Authors: Kathryn Turnbull, Simón Lunagómez, Christopher Nemeth, Edoardo Airoldi

Abstract: The increasing prevalence of relational data describing interactions among a target population has motivated a wide literature on statistical network analysis. In many applications, interactions may involve more than two members of the population and this data is more appropriately represented by a hypergraph. In this paper, we present a model for hypergraph data which extends the well established latent space approach for graphs and, by drawing a connection to constructs from computational topology, we develop a model whose likelihood is inexpensive to compute. A delayed-acceptance MCMC scheme is proposed to obtain posterior samples and we rely on Bookstein coordinates to remove the identifiability issues associated with the latent representation. We theoretically examine the degree distribution of hypergraphs generated under our framework and, through simulation, we investigate the flexibility of our model and consider estimation of predictive distributions. Finally, we explore the application of our model to two real-world datasets.

Submitted 2 November, 2021; v1 submitted 1 September, 2019; originally announced September 2019.

Comments: 46 pages, 13 figures

arXiv:1907.06986 [pdf, other]

Stochastic gradient Markov chain Monte Carlo

Authors: Christopher Nemeth, Paul Fearnhead

Abstract: Markov chain Monte Carlo (MCMC) algorithms are generally regarded as the gold standard technique for Bayesian inference. They are theoretically well-understood and conceptually simple to apply in practice. The drawback of MCMC is that in general performing exact inference requires all of the data to be processed at each iteration of the algorithm. For large data sets, the computational cost of MCMC can be prohibitive, which has led to recent developments in scalable Monte Carlo algorithms that have a significantly lower computational cost than standard MCMC. In this paper, we focus on a particular class of scalable Monte Carlo algorithms, stochastic gradient Markov chain Monte Carlo (SGMCMC) which utilises data subsampling techniques to reduce the per-iteration cost of MCMC. We provide an introduction to some popular SGMCMC algorithms and review the supporting theoretical results, as well as comparing the efficiency of SGMCMC algorithms against MCMC on benchmark examples. The supporting R code is available online.

Submitted 16 July, 2019; originally announced July 2019.

arXiv:1901.10568 [pdf, other]

Stochastic Gradient MCMC for Nonlinear State Space Models

Authors: Christopher Aicher, Srshti Putcha, Christopher Nemeth, Paul Fearnhead, Emily B. Fox

Abstract: State space models (SSMs) provide a flexible framework for modeling complex time series via a latent stochastic process. Inference for nonlinear, non-Gaussian SSMs is often tackled with particle methods that do not scale well to long time series. The challenge is two-fold: not only do computations scale linearly with time, as in the linear case, but particle filters additionally suffer from increasing particle degeneracy with longer series. Stochastic gradient MCMC methods have been developed to scale Bayesian inference for finite-state hidden Markov models and linear SSMs using buffered stochastic gradient estimates to account for temporal dependencies. We extend these stochastic gradient estimators to nonlinear SSMs using particle methods. We present error bounds that account for both buffering error and particle error in the case of nonlinear SSMs that are log-concave in the latent process. We evaluate our proposed particle buffered stochastic gradient using stochastic gradient MCMC for inference on both long sequential synthetic and minute-resolution financial returns data, demonstrating the importance of this class of methods.

Submitted 16 July, 2023; v1 submitted 29 January, 2019; originally announced January 2019.

Comments: To appear in Bayesian Analysis

arXiv:1812.09064 [pdf, other]

GaussianProcesses.jl: A Nonparametric Bayes package for the Julia Language

Authors: Jamie Fairbrother, Christopher Nemeth, Maxime Rischard, Johanni Brea, Thomas Pinder

Abstract: Gaussian processes are a class of flexible nonparametric Bayesian tools that are widely used across the sciences, and in industry, to model complex data sources. Key to applying Gaussian process models is the availability of well-developed open source software, which is available in many programming languages. In this paper, we present a tutorial of the GaussianProcesses.jl package that has been developed for the Julia programming language. GaussianProcesses.jl utilises the inherent computational benefits of the Julia language, including multiple dispatch and just-in-time compilation, to produce a fast, flexible and user-friendly Gaussian processes package. The package provides many mean and kernel functions with supporting inference tools to fit exact Gaussian process models, as well as a range of alternative likelihood functions to handle non-Gaussian data (e.g. binary classification models) and sparse approximations for scalable Gaussian processes. The package makes efficient use of existing Julia packages to provide users with a range of optimization and plotting tools.

Submitted 30 June, 2019; v1 submitted 21 December, 2018; originally announced December 2018.

Comments: 32 pages, 10 figures. Updated version includes sparse GPs

arXiv:1806.07137 [pdf, other]

Large-Scale Stochastic Sampling from the Probability Simplex

Authors: Jack Baker, Paul Fearnhead, Emily B Fox, Christopher Nemeth

Abstract: Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space the time-discretization error can dominate when we are near the boundary of the space. We demonstrate that because of this, current SGMCMC methods for the simplex struggle with sparse simplex spaces; when many of the components are close to zero. Unfortunately, many popular large-scale Bayesian models, such as network or topic models, require inference on sparse simplex spaces. To avoid the biases caused by this discretization error, we propose the stochastic Cox-Ingersoll-Ross process (SCIR), which removes all discretization error and we prove that samples from the SCIR process are asymptotically unbiased. We discuss how this idea can be extended to target other constrained spaces. Use of the SCIR process within a SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.

Submitted 26 October, 2018; v1 submitted 19 June, 2018; originally announced June 2018.

Comments: Accepted to Advances in Neural Information Processing Systems (2018)

arXiv:1710.00578 [pdf, other]

sgmcmc: An R Package for Stochastic Gradient Markov Chain Monte Carlo

Authors: Jack Baker, Paul Fearnhead, Emily B. Fox, Christopher Nemeth

Abstract: This paper introduces the R package sgmcmc; which can be used for Bayesian inference on problems with large datasets using stochastic gradient Markov chain Monte Carlo (SGMCMC). Traditional Markov chain Monte Carlo (MCMC) methods, such as Metropolis-Hastings, are known to run prohibitively slowly as the dataset size increases. SGMCMC solves this issue by only using a subset of data at each iteration. SGMCMC requires calculating gradients of the log likelihood and log priors, which can be time consuming and error prone to perform by hand. The sgmcmc package calculates these gradients itself using automatic differentiation, making the implementation of these methods much easier. To do this, the package uses the software library TensorFlow, which has a variety of statistical distributions and mathematical operations as standard, meaning a wide class of models can be built using this framework. SGMCMC has become widely adopted in the machine learning literature, but less so in the statistics community. We believe this may be partly due to lack of software; this package aims to bridge this gap.

Submitted 13 April, 2018; v1 submitted 2 October, 2017; originally announced October 2017.

arXiv:1708.05239 [pdf, other]

Pseudo-extended Markov chain Monte Carlo

Authors: Christopher Nemeth, Fredrik Lindsten, Maurizio Filippone, James Hensman

Abstract: Sampling from posterior distributions using Markov chain Monte Carlo (MCMC) methods can require an exhaustive number of iterations, particularly when the posterior is multi-modal as the MCMC sampler can become trapped in a local mode for a large number of iterations. In this paper, we introduce the pseudo-extended MCMC method as a simple approach for improving the mixing of the MCMC sampler for multi-modal posterior distributions. The pseudo-extended method augments the state-space of the posterior using pseudo-samples as auxiliary variables. On the extended space, the modes of the posterior are connected, which allows the MCMC sampler to easily move between well-separated posterior modes. We demonstrate that the pseudo-extended approach delivers improved MCMC sampling over the Hamiltonian Monte Carlo algorithm on multi-modal posteriors, including Boltzmann machines and models with sparsity-inducing priors.

Submitted 29 October, 2019; v1 submitted 17 August, 2017; originally announced August 2017.

Comments: Advances in Neural Information Processing Systems 2019

arXiv:1706.05439 [pdf, other]

Control Variates for Stochastic Gradient MCMC

Authors: Jack Baker, Paul Fearnhead, Emily B. Fox, Christopher Nemeth

Abstract: It is well known that Markov chain Monte Carlo (MCMC) methods scale poorly with dataset size. A popular class of methods for solving this issue is stochastic gradient MCMC. These methods use a noisy estimate of the gradient of the log posterior, which reduces the per iteration computational cost of the algorithm. Despite this, there are a number of results suggesting that stochastic gradient Langevin dynamics (SGLD), probably the most popular of these methods, still has computational cost proportional to the dataset size. We suggest an alternative log posterior gradient estimate for stochastic gradient MCMC, which uses control variates to reduce the variance. We analyse SGLD using this gradient estimate, and show that, under log-concavity assumptions on the target distribution, the computational cost required for a given level of accuracy is independent of the dataset size. Next we show that a different control variate technique, known as zero variance control variates can be applied to SGMCMC algorithms for free. This post-processing step improves the inference of the algorithm by reducing the variance of the MCMC output. Zero variance control variates rely on the gradient of the log posterior; we explore how the variance reduction is affected by replacing this with the noisy gradient estimate calculated by SGMCMC.

Submitted 14 December, 2017; v1 submitted 16 June, 2017; originally announced June 2017.

arXiv:1605.08576 [pdf, other]

Merging MCMC Subposteriors through Gaussian-Process Approximations

Authors: Christopher Nemeth, Chris Sherlock

Abstract: Markov chain Monte Carlo (MCMC) algorithms have become powerful tools for Bayesian inference. However, they do not scale well to large-data problems. Divide-and-conquer strategies, which split the data into batches and, for each batch, run independent MCMC algorithms targeting the corresponding subposterior, can spread the computational burden across a number of separate workers. The challenge with such strategies is in recombining the subposteriors to approximate the full posterior. By creating a Gaussian-process approximation for each log-subposterior density we create a tractable approximation for the full posterior. This approximation is exploited through three methodologies: firstly a Hamiltonian Monte Carlo algorithm targeting the expectation of the posterior density provides a sample from an approximation to the posterior; secondly, evaluating the true posterior at the sampled points leads to an importance sampler that, asymptotically, targets the true posterior expectations; finally, an alternative importance sampler uses the full Gaussian-process distribution of the approximation to the log-posterior density to re-weight any initial sample and provide both an estimate of the posterior expectation and a measure of the uncertainty in it.

Submitted 17 July, 2017; v1 submitted 27 May, 2016; originally announced May 2016.

Comments: Accepted to Bayesian Analysis

arXiv:1510.02604 [pdf, other]

doi 10.1109/TSP.2013.2296278

Sequential Monte Carlo Methods for State and Parameter Estimation in Abruptly Changing Environments

Authors: Christopher Nemeth, Paul Fearnhead, Lyudmila Mihaylova

Abstract: This paper develops a novel sequential Monte Carlo (SMC) approach for joint state and parameter estimation that can deal efficiently with abruptly changing parameters which is a common case when tracking maneuvering targets. The approach combines Bayesian methods for dealing with changepoints with methods for estimating static parameters within the SMC framework. The result is an approach which adaptively estimates the model parameters in accordance with changes to the target's trajectory. The developed approach is compared against the Interacting Multiple Model (IMM) filter for tracking a maneuvering target over a complex maneuvering scenario with nonlinear observations. In the IMM filter a large combination of models is required to account for unknown parameters. In contrast, the proposed approach circumvents the combinatorial complexity of applying multiple models in the IMM filter through Bayesian parameter estimation techniques. The developed approach is validated over complex maneuvering scenarios where both the system parameters and measurement noise parameters are unknown. Accurate estimation results are presented.

Submitted 9 October, 2015; originally announced October 2015.

Comments: 26 pages, 5 figures

Journal ref: IEEE Transactions on Signal Processing (2014), Volume: 62, Issue: 5

arXiv:1412.7299 [pdf, other]

Particle Metropolis-adjusted Langevin algorithms

Authors: Christopher Nemeth, Chris Sherlock, Paul Fearnhead

Abstract: This paper proposes a new sampling scheme based on Langevin dynamics that is applicable within pseudo-marginal and particle Markov chain Monte Carlo algorithms. We investigate this algorithm's theoretical properties under standard asymptotics, which correspond to an increasing dimension of the parameters, $n$. Our results show that the behaviour of the algorithm depends crucially on how accurately one can estimate the gradient of the log target density. If the error in the estimate of the gradient is not sufficiently controlled as dimension increases, then asymptotically there will be no advantage over the simpler random-walk algorithm. However, if the error is sufficiently well-behaved, then the optimal scaling of this algorithm will be $O(n^{-1/6})$ compared to $O(n^{-1/2})$ for the random walk. Our theory also gives guidelines on how to tune the number of Monte Carlo samples in the likelihood estimate and the proposal step-size.

Submitted 27 May, 2016; v1 submitted 23 December, 2014; originally announced December 2014.

Comments: Accepted to Biometrika. Main text: 22 pages and 3 figures. Supplementary material: 18 pages and 7 figures

arXiv:1402.0694

Particle Metropolis adjusted Langevin algorithms for state space models

Authors: Chris Nemeth, Paul Fearnhead

Abstract: Particle MCMC is a class of algorithms that can be used to analyse state-space models. They use MCMC moves to update the parameters of the models, and particle filters to propose values for the path of the state-space model. Currently the default is to use random walk Metropolis to update the parameter values. We show that it is possible to use information from the output of the particle filter to obtain better proposal distributions for the parameters. In particular it is possible to obtain estimates of the gradient of the log posterior from each run of the particle filter, and use these estimates within a Langevin-type proposal. We propose using the recent computationally efficient approach of Nemeth et al. (2013) for obtaining such estimates. We show empirically that for a variety of state-space models this proposal is more efficient than the standard random walk Metropolis proposal in terms of: reducing autocorrelation of the posterior samples, reducing the burn-in time of the MCMC sampler and increasing the squared jump distance between posterior samples.

Submitted 24 December, 2014; v1 submitted 4 February, 2014; originally announced February 2014.

Comments: Replaced with updated article with new title at arXiv:1412.7299

arXiv:1306.0735 [pdf, other]

Particle approximations of the score and observed information matrix for parameter estimation in state space models with linear computational cost

Authors: Christopher Nemeth, Paul Fearnhead, Lyudmila Mihaylova

Abstract: Poyiadjis et al. (2011) show how particle methods can be used to estimate both the score and the observed information matrix for state space models. These methods either suffer from a computational cost that is quadratic in the number of particles, or produce estimates whose variance increases quadratically with the amount of data. This paper introduces an alternative approach for estimating these terms at a computational cost that is linear in the number of particles. The method is derived using a combination of kernel density estimation, to avoid the particle degeneracy that causes the quadratically increasing variance, and Rao-Blackwellisation. Crucially, we show the method is robust to the choice of bandwidth within the kernel density estimation, as it has good asymptotic properties regardless of this choice. Our estimates of the score and observed information matrix can be used within both online and batch procedures for estimating parameters for state space models. Empirical results show improved parameter estimates compared to existing methods at a significantly reduced computational cost. Supplementary materials including code are available.

Submitted 4 September, 2015; v1 submitted 4 June, 2013; originally announced June 2013.

Comments: Accepted to Journal of Computational and Graphical Statistics

Showing 1–36 of 36 results for author: Nemeth, C