-
Intensity Profile Projection: A Framework for Continuous-Time Representation Learning for Dynamic Networks
Authors:
Alexander Modell,
Ian Gallagher,
Emma Ceccherini,
Nick Whiteley,
Patrick Rubin-Delanchy
Abstract:
We present a new representation learning framework, Intensity Profile Projection, for continuous-time dynamic network data. Given triples $(i,j,t)$, each representing a time-stamped ($t$) interaction between two entities ($i,j$), our procedure returns a continuous-time trajectory for each node, representing its behaviour over time. The framework consists of three stages: estimating pairwise intensity functions, e.g. via kernel smoothing; learning a projection which minimises a notion of intensity reconstruction error; and constructing evolving node representations via the learned projection. The trajectories satisfy two properties, known as structural and temporal coherence, which we see as fundamental for reliable inference. Moreover, we develop estimation theory providing tight control on the error of any estimated trajectory, indicating that the representations could even be used in quite noise-sensitive follow-on analyses. The theory also elucidates the role of smoothing as a bias-variance trade-off, and shows how we can reduce the level of smoothing as the signal-to-noise ratio increases on account of the algorithm `borrowing strength' across the network.
Submitted 17 January, 2024; v1 submitted 9 June, 2023;
originally announced June 2023.
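The first of the three stages named in the abstract above (estimating pairwise intensity functions via kernel smoothing) can be illustrated with a minimal sketch. The Gaussian kernel, the treatment of interactions as undirected, and the names `gaussian_kernel` and `estimate_intensities` are assumptions made for illustration, not details taken from the paper; the projection and trajectory stages are not shown.

```python
import math
from collections import defaultdict


def gaussian_kernel(x, bandwidth):
    """Gaussian density with standard deviation `bandwidth`, evaluated at x."""
    return math.exp(-0.5 * (x / bandwidth) ** 2) / (bandwidth * math.sqrt(2 * math.pi))


def estimate_intensities(events, bandwidth):
    """Return a function (i, j, t) -> kernel-smoothed intensity estimate.

    `events` is an iterable of (i, j, t) triples. Pairs are treated as
    undirected here, which is an assumption of this sketch.
    """
    times = defaultdict(list)
    for i, j, t in events:
        times[(min(i, j), max(i, j))].append(t)

    def intensity(i, j, t):
        key = (min(i, j), max(i, j))
        # Sum one kernel bump per observed interaction time of this pair.
        return sum(gaussian_kernel(t - s, bandwidth) for s in times.get(key, []))

    return intensity


events = [(0, 1, 1.0), (1, 0, 1.2), (0, 2, 5.0)]
lam = estimate_intensities(events, bandwidth=0.5)
print(lam(0, 1, 1.1), lam(0, 1, 4.0))
```

A larger `bandwidth` gives a smoother, more biased estimate and a smaller one a noisier estimate, which is the bias-variance trade-off the abstract mentions.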
-
Hierarchical clustering with dot products recovers hidden tree structure
Authors:
Annie Gray,
Alexander Modell,
Patrick Rubin-Delanchy,
Nick Whiteley
Abstract:
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure. We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance. We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model. The key technical innovations are to understand how hierarchical information in this model translates into tree geometry which can be recovered from data, and to characterise the benefits of simultaneously growing sample size and data dimension. We demonstrate superior tree recovery performance with real data over existing approaches such as UPGMA, Ward's method, and HDBSCAN.
Submitted 1 March, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
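The merge rule described in the abstract above (merge the pair of clusters with maximum average dot product) can be sketched in a few lines. This is a naive illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the average is taken over cross-cluster pairs only, and no attempt is made at efficiency.

```python
from itertools import combinations


def dot(u, v):
    """Euclidean dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def average_dot(points, c1, c2):
    """Average dot product between members of two clusters (lists of indices)."""
    total = sum(dot(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def dot_product_agglomerative(points):
    """Greedy agglomeration; returns merges as (cluster_a, cluster_b, score)."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the largest average dot product.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_dot(points, clusters[p[0]], clusters[p[1]]),
        )
        score = average_dot(points, clusters[a], clusters[b])
        merges.append((tuple(clusters[a]), tuple(clusters[b]), score))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.2, 0.8)]
for left, right, score in dot_product_agglomerative(points):
    print(left, right, round(score, 3))
```

The sequence of merges defines the output tree; the merge scores play the role that merge distances play in standard linkage methods.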
-
Implications of sparsity and high triangle density for graph representation learning
Authors:
Hannah Sansford,
Alexander Modell,
Nick Whiteley,
Patrick Rubin-Delanchy
Abstract:
Recent work has shown that sparse graphs containing many triangles cannot be reproduced using a finite-dimensional representation of the nodes, in which link probabilities are inner products. Here, we show that such graphs can be reproduced using an infinite-dimensional inner product model, where the node representations lie on a low-dimensional manifold. Recovering a global representation of the manifold is impossible in a sparse regime. However, we can zoom in on local neighbourhoods, where a lower-dimensional representation is possible. As our constructions allow the points to be uniformly distributed on the manifold, we find evidence against the common perception that triangles imply community structure.
Submitted 21 April, 2023; v1 submitted 27 October, 2022;
originally announced October 2022.
-
Statistical exploration of the Manifold Hypothesis
Authors:
Nick Whiteley,
Annie Gray,
Patrick Rubin-Delanchy
Abstract:
The Manifold Hypothesis is a widely accepted tenet of Machine Learning which asserts that nominally high-dimensional data are in fact concentrated near a low-dimensional manifold, embedded in high-dimensional space. This phenomenon is observed empirically in many real world situations, has led to development of a wide range of statistical methods in the last few decades, and has been suggested as a key factor in the success of modern AI technologies. We show that rich and sometimes intricate manifold structure in data can emerge from a generic and remarkably simple statistical model -- the Latent Metric Model -- via elementary concepts such as latent variables, correlation and stationarity. This establishes a general statistical explanation for why the Manifold Hypothesis seems to hold in so many situations. Informed by the Latent Metric Model we derive procedures to discover and interpret the geometry of high-dimensional data, and explore hypotheses about the data generating mechanism. These procedures operate under minimal assumptions and make use of well known, scaleable graph-analytic algorithms.
Submitted 9 February, 2024; v1 submitted 24 August, 2022;
originally announced August 2022.
-
Consistent and fast inference in compartmental models of epidemics using Poisson Approximate Likelihoods
Authors:
Michael Whitehouse,
Nick Whiteley,
Lorenzo Rimella
Abstract:
Addressing the challenge of scaling-up epidemiological inference to complex and heterogeneous models, we introduce Poisson Approximate Likelihood (PAL) methods. In contrast to the popular ODE approach to compartmental modelling, in which a large population limit is used to motivate a deterministic model, PALs are derived from approximate filtering equations for finite-population, stochastic compartmental models, and the large population limit drives consistency of maximum PAL estimators. Our theoretical results appear to be the first likelihood-based parameter estimation consistency results which apply to a broad class of partially observed stochastic compartmental models and address the large population limit. PALs are simple to implement, involving only elementary arithmetic operations and no tuning parameters, and fast to evaluate, requiring no simulation from the model and having computational cost independent of population size. Through examples we demonstrate how PALs can be used to: fit an age-structured model of influenza, taking advantage of automatic differentiation in Stan; compare over-dispersion mechanisms in a model of rotavirus by embedding PALs within sequential Monte Carlo; and evaluate the role of unit-specific parameters in a meta-population model of measles.
Submitted 2 June, 2023; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Matrix factorisation and the interpretation of geodesic distance
Authors:
Nick Whiteley,
Annie Gray,
Patrick Rubin-Delanchy
Abstract:
Given a graph or similarity matrix, we consider the problem of recovering a notion of true distance between the nodes, and so their true positions. We show that this can be accomplished in two steps: matrix factorisation, followed by nonlinear dimension reduction. This combination is effective because the point cloud obtained in the first step lives close to a manifold in which latent distance is encoded as geodesic distance. Hence, a nonlinear dimension reduction tool, approximating geodesic distance, can recover the latent positions, up to a simple transformation. We give a detailed account of the case where spectral embedding is used, followed by Isomap, and provide encouraging experimental evidence for other combinations of techniques.
Submitted 22 September, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
An invitation to sequential Monte Carlo samplers
Authors:
Chenguang Dai,
Jeremy Heng,
Pierre E. Jacob,
Nick Whiteley
Abstract:
Statisticians often use Monte Carlo methods to approximate probability distributions, primarily with Markov chain Monte Carlo and importance sampling. Sequential Monte Carlo samplers are a class of algorithms that combine both techniques to approximate distributions of interest and their normalizing constants. These samplers originate from particle filtering for state space models and have become general and scalable sampling techniques. This article describes sequential Monte Carlo samplers and their possible implementations, arguing that they remain under-used in statistics, despite their ability to perform sequential inference and to leverage parallel processing resources among other potential benefits.
Submitted 17 June, 2022; v1 submitted 23 July, 2020;
originally announced July 2020.
-
Inference in Stochastic Epidemic Models via Multinomial Approximations
Authors:
Nick Whiteley,
Lorenzo Rimella
Abstract:
We introduce a new method for inference in stochastic epidemic models which uses recursive multinomial approximations to integrate over unobserved variables and thus circumvent likelihood intractability. The method is applicable to a class of discrete-time, finite-population compartmental models with partial, randomly under-reported or missing count observations. In contrast to state-of-the-art alternatives such as Approximate Bayesian Computation techniques, no forward simulation of the model is required and there are no tuning parameters. Evaluating the approximate marginal likelihood of model parameters is achieved through a computationally simple filtering recursion. The accuracy of the approximation is demonstrated through analysis of real and simulated data using a model of the 1995 Ebola outbreak in the Democratic Republic of Congo. We show how the method can be embedded within a Sequential Monte Carlo approach to estimating the time-varying reproduction number of COVID-19 in Wuhan, China, recently published by Kucharski et al. 2020.
Submitted 23 February, 2021; v1 submitted 24 June, 2020;
originally announced June 2020.
-
Dynamic Bayesian Neural Networks
Authors:
Lorenzo Rimella,
Nick Whiteley
Abstract:
We define an evolving in time Bayesian neural network called a Hidden Markov neural network. The weights of a feed-forward neural network are modelled with the hidden states of a Hidden Markov model, whose observed process is given by the available data. A filtering algorithm is used to learn a variational approximation to the evolving in time posterior over the weights. Training is pursued through a sequential version of Bayes by Backprop (Blundell et al., 2015), which is enriched with a stronger regularization technique called variational DropConnect. The experiments test variational DropConnect on MNIST and display the performance of Hidden Markov neural networks on time series.
Submitted 24 June, 2020; v1 submitted 15 April, 2020;
originally announced April 2020.
-
Dynamic time series clustering via volatility change-points
Authors:
Nick Whiteley
Abstract:
This note outlines a method for clustering time series based on a statistical model in which volatility shifts at unobserved change-points. The model accommodates some classical stylized features of returns and its relation to GARCH is discussed. Clustering is performed using a probability metric evaluated between posterior distributions of the most recent change-point associated with each series. This implies series are grouped together at a given time if there is evidence the most recent shifts in their respective volatilities were coincident or closely timed. The clustering method is dynamic, in that groupings may be updated in an online manner as data arrive. Numerical results are given analyzing daily returns of constituents of the S&P 500.
Submitted 25 June, 2019;
originally announced June 2019.
-
Exploiting locality in high-dimensional factorial hidden Markov models
Authors:
Lorenzo Rimella,
Nick Whiteley
Abstract:
We propose algorithms for approximate filtering and smoothing in high-dimensional Factorial hidden Markov models. The approximation involves discarding, in a principled way, likelihood factors according to a notion of locality in a factor graph associated with the emission distribution. This allows the exponential-in-dimension cost of exact filtering and smoothing to be avoided. We prove that the approximation accuracy, measured in a local total variation norm, is "dimension-free" in the sense that as the overall dimension of the model increases the error bounds we derive do not necessarily degrade. A key step in the analysis is to quantify the error introduced by localizing the likelihood function in a Bayes' rule update. The factorial structure of the likelihood function which we exploit arises naturally when data have known spatial or network structure. We demonstrate the new algorithms on synthetic examples and a London Underground passenger flow problem, where the factor graph is effectively given by the train network.
Submitted 3 March, 2022; v1 submitted 5 February, 2019;
originally announced February 2019.
-
Parallelising Particle Filters with Butterfly Interactions
Authors:
Kari Heine,
Nick Whiteley,
A. Taylan Cemgil
Abstract:
The bootstrap particle filter (BPF) is the cornerstone of many popular algorithms used for solving inference problems involving time series that are observed through noisy measurements in a non-linear and non-Gaussian context. The long-term stability of the BPF arises from particle interactions, which in the context of modern parallel computing systems typically means that particle information needs to be communicated between processing elements, which makes parallel implementation of the BPF nontrivial.
In this paper we show that it is possible to constrain the interactions in a way which, under some assumptions, enables the reduction of the cost of communicating the particle information while still preserving the consistency and the long-term stability of the BPF. Numerical experiments demonstrate that although the imposed constraints introduce additional error, the proposed method shows potential to be the method of choice in certain settings.
Submitted 4 December, 2018;
originally announced December 2018.
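For orientation, a standard bootstrap particle filter can be sketched as below. The resampling step is the particle interaction that the abstract above refers to: it needs the weights of all particles at once, which is what makes parallel implementation nontrivial. The linear-Gaussian AR(1) model, its parameter values, and the function name are assumptions chosen for illustration; the butterfly-constrained interaction proposed in the paper is not implemented here.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles, rng,
                              phi=0.9, sigma_x=1.0, sigma_y=1.0):
    """Bootstrap particle filter for the assumed model
    x_t = phi * x_{t-1} + N(0, sigma_x^2),  y_t = x_t + N(0, sigma_y^2).

    Returns (filter_means, log_likelihood_estimate).
    """
    particles = [rng.gauss(0.0, sigma_x) for _ in range(n_particles)]
    log_likelihood = 0.0
    filter_means = []
    for y in observations:
        # Propagate every particle through the state dynamics.
        particles = [phi * x + rng.gauss(0.0, sigma_x) for x in particles]
        # Weight each particle by the observation density at y.
        weights = [
            math.exp(-0.5 * ((y - x) / sigma_y) ** 2) / (sigma_y * math.sqrt(2 * math.pi))
            for x in particles
        ]
        total = sum(weights)
        log_likelihood += math.log(total / n_particles)
        filter_means.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # Multinomial resampling: the global interaction between particles.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return filter_means, log_likelihood


means, ll = bootstrap_particle_filter([0.1, -0.3, 0.5, 0.2], 200, random.Random(0))
print(means, ll)
```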
-
The infinite Viterbi alignment and decay-convexity
Authors:
Nick Whiteley,
Matt W. Jones,
Aleks P. F. Domanski
Abstract:
The infinite Viterbi alignment is the limiting maximum a-posteriori estimate of the unobserved path in a hidden Markov model as the length of the time horizon grows. For models on state-space $\mathbb{R}^{d}$ satisfying a new ``decay-convexity'' condition, we develop an approach to existence of the infinite Viterbi alignment in an infinite dimensional Hilbert space. Quantitative bounds on the distance to the infinite Viterbi alignment, which are the first of their kind, are derived and used to illustrate how approximate estimation via parallelization can be accurate and scaleable to high-dimensional problems because the rate of convergence to the infinite Viterbi alignment does not necessarily depend on $d$. The results are applied to approximate estimation via parallelization and a model of neural population activity.
Submitted 13 February, 2023; v1 submitted 8 October, 2018;
originally announced October 2018.
-
Global consensus Monte Carlo
Authors:
Lewis J. Rendell,
Adam M. Johansen,
Anthony Lee,
Nick Whiteley
Abstract:
To conduct Bayesian inference with large data sets, it is often convenient or necessary to distribute the data across multiple machines. We consider a likelihood function expressed as a product of terms, each associated with a subset of the data. Inspired by global variable consensus optimisation, we introduce an instrumental hierarchical model associating auxiliary statistical parameters with each term, which are conditionally independent given the top-level parameters. One of these top-level parameters controls the unconditional strength of association between the auxiliary parameters. This model leads to a distributed MCMC algorithm on an extended state space yielding approximations of posterior expectations. A trade-off between computational tractability and fidelity to the original model can be controlled by changing the association strength in the instrumental model. We further propose the use of a SMC sampler with a sequence of association strengths, allowing both the automatic determination of appropriate strengths and for a bias correction technique to be applied. In contrast to similar distributed Monte Carlo algorithms, this approach requires few distributional assumptions. The performance of the algorithms is illustrated with a number of simulated examples.
Submitted 7 April, 2020; v1 submitted 24 July, 2018;
originally announced July 2018.
-
Dimension-free Wasserstein contraction of nonlinear filters
Authors:
Nick Whiteley
Abstract:
For a class of partially observed diffusions, conditions are given for the map from the initial condition of the signal to filtering distribution to be contractive with respect to Wasserstein distances, with rate which does not necessarily depend on the dimension of the state-space. The main assumptions are that the signal has affine drift and constant diffusion coefficient and that the likelihood functions are log-concave. Ergodic and nonergodic signals are handled in a single framework. Examples include linear-Gaussian, stochastic volatility, neural spike-train and dynamic generalized linear models. For these examples filter stability can be established without any assumptions on the observations.
Submitted 19 January, 2021; v1 submitted 4 August, 2017;
originally announced August 2017.
-
Negative association, ordering and convergence of resampling methods
Authors:
Mathieu Gerber,
Nicolas Chopin,
Nick Whiteley
Abstract:
We study convergence and convergence rates for resampling schemes. Our first main result is a general consistency theorem based on the notion of negative association, which is applied to establish the almost-sure weak convergence of measures output from Kitagawa's (1996) stratified resampling method. Carpenter et al's (1999) systematic resampling method is similar in structure but can fail to converge depending on the order of the input samples. We introduce a new resampling algorithm based on a stochastic rounding technique of Srinivasan (2001), which shares some attractive properties of systematic resampling, but which exhibits negative association and therefore converges irrespective of the order of the input samples. We confirm a conjecture made by Kitagawa (1996) that ordering input samples by their states in $\mathbb{R}$ yields a faster rate of convergence; we establish that when particles are ordered using the Hilbert curve in $\mathbb{R}^d$, the variance of the resampling error is ${\scriptscriptstyle\mathcal{O}}(N^{-(1+1/d)})$ under mild conditions, where $N$ is the number of particles. We use these results to establish asymptotic properties of particle algorithms based on resampling schemes that differ from multinomial resampling.
Submitted 17 January, 2020; v1 submitted 6 July, 2017;
originally announced July 2017.
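The two classical schemes compared in the abstract above differ only in how the uniform variates are drawn: stratified resampling uses one independent uniform per stratum, whereas systematic resampling reuses a single uniform across all strata. A minimal sketch follows, with hypothetical function names; the new algorithm based on Srinivasan's rounding technique and the Hilbert-curve ordering are not shown.

```python
import random


def _inverse_cdf(weights, positions):
    """Map increasing positions in [0, 1) to indices via the weight CDF."""
    total = sum(weights)
    indices = []
    i = 0
    cumulative = weights[0] / total
    for u in positions:
        # Advance until the cumulative normalised weight covers position u.
        while u > cumulative and i < len(weights) - 1:
            i += 1
            cumulative += weights[i] / total
        indices.append(i)
    return indices


def stratified_resample(weights, rng):
    """One independent uniform draw inside each stratum [k/N, (k+1)/N)."""
    n = len(weights)
    return _inverse_cdf(weights, [(k + rng.random()) / n for k in range(n)])


def systematic_resample(weights, rng):
    """A single uniform draw shared by all N strata."""
    n = len(weights)
    u = rng.random()
    return _inverse_cdf(weights, [(k + u) / n for k in range(n)])


rng = random.Random(0)
print(stratified_resample([0.1, 0.2, 0.3, 0.4], rng))
print(systematic_resample([0.1, 0.2, 0.3, 0.4], rng))
```

Because systematic resampling shares one uniform across strata, its output depends strongly on the order of the input weights, which is the sensitivity to ordering that the abstract discusses.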
-
Sampling normalizing constants in high dimensions using inhomogeneous diffusions
Authors:
Christophe Andrieu,
James Ridgway,
Nick Whiteley
Abstract:
Motivated by the task of computing normalizing constants and importance sampling in high dimensions, we study the dimension dependence of fluctuations for additive functionals of time-inhomogeneous Langevin-type diffusions on $\mathbb{R}^{d}$. The main results are nonasymptotic variance and bias bounds, and a central limit theorem in the $d\to\infty$ regime. We demonstrate that a temporal discretization inherits the fluctuation properties of the underlying diffusion, which are controlled at a computational cost growing at most polynomially with $d$. The key steps include establishing Poincaré inequalities for time-marginal distributions of the diffusion and nonasymptotic bounds on deviation from Gaussianity in a martingale central limit theorem.
Submitted 6 September, 2018; v1 submitted 22 December, 2016;
originally announced December 2016.
-
An algorithm for approximating the second moment of the normalizing constant estimate from a particle filter
Authors:
Svetoslav Kostov,
Nick Whiteley
Abstract:
We propose a new algorithm for approximating the non-asymptotic second moment of the marginal likelihood estimate, or normalizing constant, provided by a particle filter. The computational cost of the new method is $O(M)$ per time step, independently of the number of particles $N$ in the particle filter, where $M$ is a parameter controlling the quality of the approximation. This is in contrast to $O(MN)$ for a simple averaging technique using $M$ i.i.d. replicates of a particle filter with $N$ particles. We establish that the approximation delivered by the new algorithm is unbiased, strongly consistent and, under standard regularity conditions, increasing $M$ linearly with time is sufficient to prevent growth of the relative variance of the approximation, whereas for the simple averaging technique it can be necessary to increase $M$ exponentially with time in order to achieve the same effect. Numerical examples illustrate performance in the context of a stochastic Lotka–Volterra system and a simple AR(1) model.
Submitted 18 August, 2016; v1 submitted 6 February, 2016;
originally announced February 2016.
-
Stability with respect to initial conditions in V-norm for nonlinear filters with ergodic observations
Authors:
Mathieu Gerber,
Nick Whiteley
Abstract:
We establish conditions for an exponential rate of forgetting of the initial distribution of nonlinear filters in $V$-norm, path-wise along almost all observation sequences. In contrast to previous works, our results allow for unbounded test functions. The analysis is conducted in a general setup involving nonnegative kernels in a random environment which allows treatment of filters and prediction filters in a single framework. The main result is illustrated on two examples, the first showing that a total variation norm stability result obtained by Douc et al. (2009) can be extended to $V$-norm without any additional assumptions, the second concerning a situation in which forgetting of the initial condition holds in $V$-norm for the filters, but the $V$-norm of each prediction filter is infinite.
Submitted 15 December, 2015;
originally announced December 2015.
-
An Introduction to Twisted Particle Filters and Parameter Estimation in Non-linear State-space Models
Authors:
Juha Ala-Luhtala,
Nick Whiteley,
Kari Heine,
Robert Piche
Abstract:
Twisted particle filters are a class of sequential Monte Carlo methods recently introduced by Whiteley and Lee to improve the efficiency of marginal likelihood estimation in state-space models. The purpose of this article is to extend the twisted particle filtering methodology, establish accessible theoretical results which convey its rationale, and provide a demonstration of its practical performance within particle Markov chain Monte Carlo for estimating static model parameters. We derive twisted particle filters that incorporate systematic or multinomial resampling and information from historical particle states, and a transparent proof which identifies the optimal algorithm for marginal likelihood estimation. We demonstrate how to approximate the optimal algorithm for nonlinear state-space models with Gaussian noise and we apply such approximations to two examples: a range and bearing tracking problem and an indoor positioning problem with Bluetooth signal strength measurements. We demonstrate improvements over standard algorithms in terms of variance of marginal likelihood estimates and Markov chain autocorrelation for given CPU time, and improved tracking performance using estimated parameters.
Submitted 7 April, 2016; v1 submitted 30 September, 2015;
originally announced September 2015.
-
Variance estimation in the particle filter
Authors:
Anthony Lee,
Nick Whiteley
Abstract:
This paper concerns numerical assessment of Monte Carlo error in particle filters. We show that by keeping track of certain key features of the genealogical structure arising from resampling operations, it is possible to estimate variances of a number of standard Monte Carlo approximations which particle filters deliver. All our estimators can be computed from a single run of a particle filter with no further simulation. We establish that as the number of particles grows, our estimators are weakly consistent for asymptotic variances of the Monte Carlo approximations and some of them are also non-asymptotically unbiased. The asymptotic variances can be decomposed into terms corresponding to each time step of the algorithm, and we show how to consistently estimate each of these terms. When the number of particles may vary over time, this allows approximation of the asymptotically optimal allocation of particle numbers.
Submitted 28 June, 2016; v1 submitted 1 September, 2015;
originally announced September 2015.
-
Fluctuations, stability and instability of a distributed particle filter with local exchange
Authors:
Kari Heine,
Nick Whiteley
Abstract:
We study a distributed particle filter proposed by Bolić et al. (2005). This algorithm involves $m$ groups of $M$ particles, with interaction between groups occurring through a "local exchange" mechanism. We establish a central limit theorem in the regime where $M$ is fixed and $m\to\infty$. A formula we obtain for the asymptotic variance can be interpreted in terms of colliding Markov chains, enabling analytic and numerical evaluations of how the asymptotic variance behaves over time, with comparison to a benchmark algorithm consisting of $m$ independent particle filters. We prove that subject to regularity conditions, when $m$ is fixed both algorithms converge time-uniformly at rate $M^{-1/2}$. Through use of our asymptotic variance formula we give counter-examples satisfying the same regularity conditions to show that when $M$ is fixed neither algorithm, in general, converges time-uniformly at rate $m^{-1/2}$.
Submitted 19 May, 2016; v1 submitted 10 May, 2015;
originally announced May 2015.
-
A hidden Markov model for decoding and the analysis of replay in spike trains
Authors:
Marc Box,
Matt W. Jones,
Nick Whiteley
Abstract:
We present a hidden Markov model that describes variation in an animal's position associated with varying levels of activity in action potential spike trains of individual place cell neurons. The model incorporates a coarse-graining of position, which we find to be a more parsimonious description of the system than other models. We use a sequential Monte Carlo algorithm for Bayesian inference of model parameters, including the state space dimension, and we explain how to estimate position from spike train observations (decoding). We obtain greater accuracy over other methods in the conditions of high temporal resolution and small neuronal sample size. We also present a novel, model-based approach to the study of replay: the expression of spike train activity related to behaviour during times of motionlessness or sleep, thought to be integral to the consolidation of long-term memories. We demonstrate how we can detect the time, information content and compression rate of replay events in simulated and real hippocampal data recorded from rats in two different environments, and verify the correlation between the times of detected replay events and of sharp wave/ripples in the local field potential.
Submitted 19 December, 2014;
originally announced December 2014.
-
Butterfly resampling: asymptotics for particle filters with constrained interactions
Authors:
Kari Heine,
Nick Whiteley,
A. Taylan Cemgil,
Hakan Guldas
Abstract:
We generalize the elementary mechanism of sampling with replacement $N$ times from a weighted population of size $N$, by introducing auxiliary variables and constraints on conditional independence characterised by modular congruence relations. Motivated by considerations of parallelism, a convergence study reveals how sparsity of the mechanism's conditional independence graph is related to fluctuation properties of particle filters which use it for resampling, in some cases exhibiting exotic scaling behaviour. The proofs involve detailed combinatorial analysis of conditional independence graphs.
Submitted 21 November, 2014;
originally announced November 2014.
-
Perfect sampling for nonhomogeneous Markov chains and hidden Markov models
Authors:
Nick Whiteley,
Anthony Lee
Abstract:
We obtain a perfect sampling characterization of weak ergodicity for backward products of finite stochastic matrices, and equivalently, simultaneous tail triviality of the corresponding nonhomogeneous Markov chains. Applying these ideas to hidden Markov models, we show how to sample exactly from the finite-dimensional conditional distributions of the signal process given infinitely many observations, using an algorithm which requires only an almost surely finite number of observations to actually be accessed. A notion of "successful" coupling is introduced and its occurrence is characterized in terms of conditional ergodicity properties of the hidden Markov model and related to the stability of nonlinear filters.
Submitted 6 January, 2016; v1 submitted 16 October, 2014;
originally announced October 2014.
-
Forest resampling for distributed sequential Monte Carlo
Authors:
Anthony Lee,
Nick Whiteley
Abstract:
This paper brings explicit considerations of distributed computing architectures and data structures into the rigorous design of Sequential Monte Carlo (SMC) methods. A theoretical result established recently by the authors shows that adapting interaction between particles to suitably control the Effective Sample Size (ESS) is sufficient to guarantee stability of SMC algorithms. Our objective is to leverage this result and devise algorithms which are thus guaranteed to work well in a distributed setting. We make three main contributions to achieve this. Firstly, we study mathematical properties of the ESS as a function of matrices and graphs that parameterize the interaction amongst particles. Secondly, we show how these graphs can be induced by tree data structures which model the logical network topology of an abstract distributed computing environment. Thirdly, we present efficient distributed algorithms that achieve the desired ESS control, perform resampling and operate on forests associated with these trees.
Submitted 23 June, 2014;
originally announced June 2014.
-
On the role of interaction in sequential Monte Carlo algorithms
Authors:
Nick Whiteley,
Anthony Lee,
Kari Heine
Abstract:
We introduce a general form of sequential Monte Carlo algorithm defined in terms of a parameterized resampling mechanism. We find that a suitably generalized notion of the Effective Sample Size (ESS), widely used to monitor algorithm degeneracy, appears naturally in a study of its convergence properties. We are then able to phrase sufficient conditions for time-uniform convergence in terms of algorithmic control of the ESS, in turn achievable by adaptively modulating the interaction between particles. This leads us to suggest novel algorithms which are, in senses to be made precise, provably stable and yet designed to avoid the degree of interaction which hinders parallelization of standard algorithms. As a byproduct, we prove time-uniform convergence of the popular adaptive resampling particle filter.
Submitted 7 January, 2016; v1 submitted 11 September, 2013;
originally announced September 2013.
-
Bayesian learning of noisy Markov decision processes
Authors:
Sumeetpal S. Singh,
Nicolas Chopin,
Nick Whiteley
Abstract:
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.
Submitted 26 November, 2012;
originally announced November 2012.
-
Twisted particle filters
Authors:
Nick Whiteley,
Anthony Lee
Abstract:
We investigate sampling laws for particle algorithms and the influence of these laws on the efficiency of particle approximations of marginal likelihoods in hidden Markov models. Among a broad class of candidates we characterize the essentially unique family of particle system transition kernels which is optimal with respect to an asymptotic-in-time variance growth rate criterion. The sampling structure of the algorithm defined by these optimal transitions turns out to be only subtly different from standard algorithms and yet the fluctuation properties of the estimates it provides can be dramatically different. The structure of the optimal transition suggests a new class of algorithms, which we term "twisted" particle filters and which we validate with asymptotic analysis of a more traditional nature, in the regime where the number of particles tends to infinity.
Submitted 20 February, 2014; v1 submitted 30 September, 2012;
originally announced October 2012.
-
Approximate Bayesian Computation for Smoothing
Authors:
James S. Martin,
Ajay Jasra,
Sumeetpal S. Singh,
Nick Whiteley,
Emma McCoy
Abstract:
We consider a method for approximate inference in hidden Markov models (HMMs). The method circumvents the need to evaluate conditional densities of observations given the hidden states. It may be considered an instance of Approximate Bayesian Computation (ABC) and it involves the introduction of auxiliary variables valued in the same space as the observations. The quality of the approximation may be controlled to arbitrary precision through a parameter ε > 0. We provide theoretical results which quantify, in terms of ε, the ABC error in approximation of expectations of additive functionals with respect to the smoothing distributions. Under regularity assumptions, this error is O(nε), where n is the number of time steps over which smoothing is performed. For numerical implementation we adopt the forward-only sequential Monte Carlo (SMC) scheme of [16] and quantify the combined error from the ABC and SMC approximations. This forms some of the first quantitative results for ABC methods which jointly treat the ABC and simulation errors, with a finite number of data and simulated samples. When the HMM has unknown static parameters, we consider particle Markov chain Monte Carlo [2] (PMCMC) methods for batch statistical inference.
Submitted 22 June, 2012;
originally announced June 2012.
-
Calculating principal eigen-functions of non-negative integral kernels: particle approximations and applications
Authors:
Nick Whiteley,
Nikolas Kantas
Abstract:
Often in applications such as rare events estimation or optimal control it is required that one calculates the principal eigen-function and eigen-value of a non-negative integral kernel. Except in the finite-dimensional case, usually neither the principal eigen-function nor the eigen-value can be computed exactly. In this paper, we develop numerical approximations for these quantities. We show how a generic interacting particle algorithm can be used to deliver numerical approximations of the eigen-quantities and the associated so-called "twisted" Markov kernel as well as how these approximations are relevant to the aforementioned applications. In addition, we study a collection of random integral operators underlying the algorithm, address some of their mean and path-wise properties, and obtain $L_{r}$ error estimates. Finally, numerical examples are provided in the context of importance sampling for computing tail probabilities of Markov chains and computing value functions for a class of stochastic optimal control problems.
Submitted 27 September, 2016; v1 submitted 29 February, 2012;
originally announced February 2012.
-
Error Bounds and Normalizing Constants for Sequential Monte Carlo in High Dimensions
Authors:
Alexandros Beskos,
Dan Crisan,
Ajay Jasra,
Nick Whiteley
Abstract:
In a recent paper, Beskos et al (2011), the Sequential Monte Carlo (SMC) sampler introduced in Del Moral et al (2006), Neal (2001) has been shown to be asymptotically stable in the dimension of the state space $d$ at a cost that is only polynomial in $d$, when $N$, the number of Monte Carlo samples, is fixed. More precisely, it has been established that the effective sample size (ESS) of the ensuing (approximate) sample and the Monte Carlo error of fixed dimensional marginals will converge as $d$ grows, with a computational cost of $\mathcal{O}(Nd^2)$. In the present work, further results on SMC methods in high dimensions are provided as $d\to\infty$ and with $N$ fixed. We deduce an explicit bound on the Monte-Carlo error for estimates derived using the SMC sampler and the exact asymptotic relative $\mathbb{L}_2$-error of the estimate of the normalizing constant. We also establish marginal propagation of chaos properties of the algorithm. The accuracy in high-dimensions of some approximate SMC-based filtering schemes is also discussed.
Submitted 7 December, 2011;
originally announced December 2011.
-
Stability properties of some particle filters
Authors:
Nick Whiteley
Abstract:
Under multiplicative drift and other regularity conditions, it is established that the asymptotic variance associated with a particle filter approximation of the prediction filter is bounded uniformly in time, and the nonasymptotic, relative variance associated with a particle approximation of the normalizing constant is bounded linearly in time. The conditions are demonstrated to hold for some hidden Markov models on noncompact state spaces. The particle stability results are obtained by proving $v$-norm multiplicative stability and exponential moment results for the underlying Feynman-Kac formulas.
Submitted 5 December, 2013; v1 submitted 30 September, 2011;
originally announced September 2011.
-
Linear Variance Bounds for Particle Approximations of Time-Homogeneous Feynman-Kac Formulae
Authors:
Nick Whiteley,
Nikolas Kantas,
Ajay Jasra
Abstract:
This article establishes sufficient conditions for a linear-in-time bound on the non-asymptotic variance of particle approximations of time-homogeneous Feynman-Kac formulae. These formulae appear in a wide variety of applications including option pricing in finance and risk sensitive control in engineering. In direct Monte Carlo approximation of these formulae, the non-asymptotic variance typically increases at an exponential rate in the time parameter. It is shown that a linear bound holds when a non-negative kernel, defined by the logarithmic potential function and Markov kernel which specify the Feynman-Kac model, satisfies a type of multiplicative drift condition and other regularity assumptions. Examples illustrate that these conditions are general and flexible enough to accommodate two rather extreme cases, which can occur in the context of a non-compact state space: 1) when the potential function is bounded above, not bounded below and the Markov kernel is not ergodic; and 2) when the potential function is not bounded above, but the Markov kernel itself satisfies a multiplicative drift condition.
Submitted 13 February, 2012; v1 submitted 19 August, 2011;
originally announced August 2011.
-
Sequential Monte Carlo samplers: error bounds and insensitivity to initial conditions
Authors:
Nick Whiteley
Abstract:
This paper addresses finite sample stability properties of sequential Monte Carlo methods for approximating sequences of probability distributions. The results presented herein are applicable in the scenario where the start and end distributions in the sequence are fixed and the number of intermediate steps is a parameter of the algorithm. Under assumptions which hold on non-compact spaces, it is shown that the effect of the initial distribution decays exponentially fast in the number of intermediate steps and the corresponding stochastic error is stable in $\mathbb{L}_{p}$ norm.
Submitted 21 March, 2011;
originally announced March 2011.
-
Efficient Bayesian Inference for Switching State-Space Models using Discrete Particle Markov Chain Monte Carlo Methods
Authors:
Nick Whiteley,
Christophe Andrieu,
Arnaud Doucet
Abstract:
Switching state-space models (SSSM) are a very popular class of time series models that have found many applications in statistics, econometrics and advanced signal processing. Bayesian inference for these models typically relies on Markov chain Monte Carlo (MCMC) techniques. However, even sophisticated MCMC methods dedicated to SSSM can prove quite inefficient as they update potentially strongly correlated discrete-valued latent variables one-at-a-time (Carter and Kohn, 1996; Gerlach et al., 2000; Giordani and Kohn, 2008). Particle Markov chain Monte Carlo (PMCMC) methods are a recently developed class of MCMC algorithms which use particle filters to build efficient proposal distributions in high-dimensions (Andrieu et al., 2010). The existing PMCMC methods of Andrieu et al. (2010) are applicable to SSSM, but are restricted to employing standard particle filtering techniques. Yet, in the context of discrete-valued latent variables, specialised particle techniques have been developed which can outperform standard methods by up to an order of magnitude (Fearnhead, 1998; Fearnhead and Clifford, 2003; Fearnhead, 2004). In this paper we develop a novel class of PMCMC methods relying on these very efficient particle algorithms. We establish the theoretical validity of this new generic methodology, referred to as discrete PMCMC, and demonstrate it on a variety of examples including a multiple change-points model for well-log data and a model for U.S./U.K. exchange rate data. Discrete PMCMC algorithms are shown experimentally to outperform state-of-the-art MCMC techniques for a fixed computational complexity. Additionally they can be easily parallelized (Lee et al., 2010) which allows further substantial gains.
Submitted 10 November, 2010;
originally announced November 2010.