-
Mixing time of the conditional backward sampling particle filter
Authors:
Joona Karjalainen,
Anthony Lee,
Sumeetpal S. Singh,
Matti Vihola
Abstract:
The conditional backward sampling particle filter (CBPF) is a powerful Markov chain Monte Carlo sampler for general state space hidden Markov model smoothing. It was proposed as an improvement over the conditional particle filter, which is known to have an $O(T^2)$ computational time complexity under a general `strong' mixing assumption, where $T$ is the time horizon. We provide the first proof that the CBPF admits an $O(T \log T)$ time complexity under strong mixing, complementing strong empirical evidence of the superiority of the CBPF in practice. In particular, the CBPF's mixing time is upper bounded by $O(\log T)$, for any sufficiently large number of particles $N$ that depends only on the mixing assumptions and not $T$. We show that an $O(\log T)$ mixing time is optimal. The proof involves the analysis of a novel coupling of two CBPFs, which involves a maximal coupling of two particle systems at each time instant. The coupling is implementable, and thus can also be used to construct unbiased, finite-variance estimates of functionals which have arbitrary dependence on the latent state's path, with a total expected cost of $O(T \log T)$. We also investigate other couplings, and we show some of these alternatives have improved empirical behaviour.
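The coupling analysed in the paper is built from maximal couplings. As a minimal, hedged illustration of that building block only (not the authors' coupled CBPF), the Python sketch below draws a pair of indices from two discrete distributions, for example two vectors of normalised backward-sampling weights, so that each index has the correct marginal and the two indices coincide with the largest possible probability, $\sum_i \min(p_i, q_i)$. The function name `maximal_coupling_categorical` is hypothetical.

```python
import numpy as np


def maximal_coupling_categorical(p, q, rng):
    """Draw (I, J) with I ~ p, J ~ q and P(I == J) = sum_i min(p_i, q_i).

    Standard maximal coupling of two probability vectors on {0, ..., n-1}.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    overlap = np.minimum(p, q)
    alpha = overlap.sum()  # equals 1 - total variation distance
    if rng.random() < alpha:
        # Coupled case: a single draw from the normalised overlap
        i = rng.choice(len(p), p=overlap / alpha)
        return i, i
    # Uncoupled case: independent draws from the normalised residuals
    i = rng.choice(len(p), p=(p - overlap) / (1.0 - alpha))
    j = rng.choice(len(q), p=(q - overlap) / (1.0 - alpha))
    return i, j
```

The residual distributions have disjoint supports, so in the uncoupled case the two indices always differ; this is what makes the meeting probability exactly $\sum_i \min(p_i, q_i)$.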
Submitted 22 February, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.
-
On the Forgetting of Particle Filters
Authors:
Joona Karjalainen,
Anthony Lee,
Sumeetpal S. Singh,
Matti Vihola
Abstract:
We study the forgetting properties of the particle filter when its state (the collection of particles) is regarded as a Markov chain. Under a strong mixing assumption on the particle filter's underlying Feynman-Kac model, we find that the particle filter is exponentially mixing, and forgets its initial state in $O(\log N)$ `time', where $N$ is the number of particles and time refers to the number of particle filter algorithm steps, each comprising a selection (or resampling) and mutation (or prediction) operation. We present an example which suggests that this rate is optimal. In contrast to our result, available results to date are extremely conservative, suggesting $O(α^N)$ time steps are needed, for some $α>1$, for the particle filter to forget its initialisation. We also study the conditional particle filter (CPF) and extend our forgetting result to this context. We establish a similar conclusion, namely, the CPF is exponentially mixing and forgets its initial state in $O(\log N)$ time. To support this analysis, we establish new time-uniform $L^p$ error estimates for the CPF, which can be of independent interest.
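To make the view of the particle filter as a Markov chain concrete, here is a hedged Python sketch of one algorithm step (multinomial selection followed by mutation), mapping a set of $N$ particles to a new set of $N$ particles. The AR(1) mutation kernel, the potential $G(x) = \exp(-x^2/2)$ and the names `pf_step` and `potential` are illustrative assumptions, not the model studied in the paper.

```python
import numpy as np


def potential(x):
    """Hypothetical bounded potential G(x) = exp(-x^2 / 2)."""
    return np.exp(-0.5 * x**2)


def pf_step(particles, rng, a=0.9, sigma=1.0):
    """One particle filter step, viewed as a Markov kernel on N-particle sets."""
    n = len(particles)
    w = potential(particles)
    w = w / w.sum()
    # Selection: multinomial resampling proportional to the potentials
    idx = rng.choice(n, size=n, p=w)
    # Mutation: hypothetical AR(1) kernel x' = a x + sigma * noise
    return a * particles[idx] + sigma * rng.standard_normal(n)
```

Running two copies from very different initial particle sets and comparing summaries after a few dozen steps gives an informal picture of forgetting; it is no substitute for the paper's quantitative $O(\log N)$ bound.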
Submitted 15 September, 2023;
originally announced September 2023.
-
On the convergence of dynamic implementations of Hamiltonian Monte Carlo and No U-Turn Samplers
Authors:
Alain Durmus,
Samuel Gruffaz,
Miika Kailas,
Eero Saksman,
Matti Vihola
Abstract:
There is substantial empirical evidence of the success of dynamic implementations of Hamiltonian Monte Carlo (HMC), such as the No U-Turn Sampler (NUTS), in many challenging inference problems, but theoretical results about their behavior are scarce. The aim of this paper is to fill this gap. More precisely, we consider a general class of MCMC algorithms we call dynamic HMC. First, we show that this general framework encompasses NUTS as a particular case, implying the invariance of the target distribution as a by-product. Second, we establish conditions under which NUTS is irreducible and aperiodic and, as a corollary, ergodic. Under conditions similar to the ones existing for HMC, we also show that NUTS is geometrically ergodic. Finally, we improve existing convergence results for HMC, showing that this method is ergodic without any boundedness condition on the stepsize and the number of leapfrog steps, in the case where the target is a perturbation of a Gaussian distribution.
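Dynamic HMC samplers such as NUTS adaptively choose how far to integrate, but every variant is built on the leapfrog integrator. The Python sketch below shows that integrator for a Hamiltonian with identity mass matrix, $H(q, p) = -\log \pi(q) + |p|^2/2$; it is an assumption-laden illustration of the building block, not an implementation of NUTS or of the dynamic HMC class defined in the paper.

```python
import numpy as np


def leapfrog(q, p, grad_log_target, step_size, n_steps):
    """Leapfrog integration for H(q, p) = -log pi(q) + |p|^2 / 2."""
    q = np.array(q, dtype=float)
    p = np.array(p, dtype=float)
    # Initial half step for the momentum
    p = p + 0.5 * step_size * grad_log_target(q)
    for i in range(n_steps):
        # Full step for the position
        q = q + step_size * p
        # Full momentum steps, except after the last position update
        if i < n_steps - 1:
            p = p + step_size * grad_log_target(q)
    # Final half step for the momentum
    p = p + 0.5 * step_size * grad_log_target(q)
    return q, p
```

Flipping the momentum and integrating again returns to the starting point (up to round-off), which is the reversibility property that the validity of HMC-type samplers relies on.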
Submitted 7 July, 2023;
originally announced July 2023.
-
Simulating counterfactuals
Authors:
Juha Karvanen,
Santtu Tikka,
Matti Vihola
Abstract:
Counterfactual inference considers a hypothetical intervention in a parallel world that shares some evidence with the factual world. If the evidence specifies a conditional distribution on a manifold, counterfactuals may be analytically intractable. We present an algorithm for simulating values from a counterfactual distribution where conditions can be set on both discrete and continuous variables. We show that the proposed algorithm can be presented as a particle filter leading to asymptotically valid inference. The algorithm is applied to fairness analysis in credit scoring.
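In the simplest case the evidence pins down the latent noise exactly, and a counterfactual follows from the standard abduction, action and prediction steps. The hedged Python sketch below uses a hypothetical linear structural causal model $Y = bX + U_Y$; the paper's algorithm is aimed at the harder case in which the evidence only restricts the noise to a manifold and simulation (via a particle filter) is needed.

```python
def counterfactual_y(x_obs, y_obs, x_new, b):
    """Counterfactual Y under do(X = x_new) in the hypothetical SCM Y = b*X + U_Y."""
    # Abduction: recover the noise value consistent with the factual evidence
    u_y = y_obs - b * x_obs
    # Action and prediction: set X = x_new and recompute Y with the same noise
    return b * x_new + u_y
```

For instance, with $b = 2$, observed $(x, y) = (1, 3)$ and intervention $X = 2$, the recovered noise is $U_Y = 1$ and the counterfactual value is $Y = 5$.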
Submitted 26 March, 2024; v1 submitted 27 June, 2023;
originally announced June 2023.
-
Conditional particle filters with bridge backward sampling
Authors:
Santeri Karppinen,
Sumeetpal S. Singh,
Matti Vihola
Abstract:
Conditional particle filters (CPFs) with backward/ancestor sampling are powerful methods for sampling from the posterior distribution of the latent states of a dynamic model such as a hidden Markov model. However, the performance of these methods deteriorates with models involving weakly informative observations and/or slowly mixing dynamics. Both of these complications arise when sampling finely time-discretised continuous-time path integral models, but can occur with hidden Markov models too. Multinomial resampling, which is commonly employed with CPFs, resamples excessively for weakly informative observations and thereby introduces extra variance. Furthermore, slowly mixing dynamics render the backward/ancestor sampling steps ineffective, leading to degeneracy issues. We detail two conditional resampling strategies suitable for the weakly informative regime: the so-called `killing' resampling and the systematic resampling with mean partial order. To avoid the degeneracy issues, we introduce a generalisation of the CPF with backward sampling that involves auxiliary `bridging' CPF steps that are parameterised by a blocking sequence. We present practical tuning strategies for choosing an appropriate blocking. Our experiments demonstrate that the CPF with a suitable resampling and the developed `bridge backward sampling' can lead to substantial efficiency gains in the weakly informative and slow mixing regime.
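For reference, a hedged Python sketch of plain (unconditional) killing resampling: each particle survives with probability $w_i / \max_j w_j$, and every killed particle is replaced by a draw from the normalised weights, which keeps the expected number of copies of particle $i$ equal to $N w_i$. The conditional version used within the CPF, and the bridging construction, are not reproduced here; `killing_resample` is a hypothetical name.

```python
import numpy as np


def killing_resample(weights, rng):
    """Return ancestor indices under (unconditional) killing resampling."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(w)
    # Particle i survives (keeps itself as ancestor) with probability w_i / max_j w_j
    survive = rng.random(n) < w / w.max()
    ancestors = np.arange(n)
    # Killed particles draw a new ancestor from the normalised weights
    n_killed = np.count_nonzero(~survive)
    ancestors[~survive] = rng.choice(n, size=n_killed, p=w)
    return ancestors
```

With nearly equal weights, as in the weakly informative regime, almost every particle survives and little resampling noise is added.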
Submitted 19 June, 2023; v1 submitted 27 May, 2022;
originally announced May 2022.
-
On resampling schemes for particle filters with weakly informative observations
Authors:
Nicolas Chopin,
Sumeetpal S. Singh,
Tomás Soto,
Matti Vihola
Abstract:
We consider particle filters with weakly informative observations (or `potentials') relative to the latent state dynamics. The particular focus of this work is on particle filters to approximate time-discretisations of continuous-time Feynman--Kac path integral models (a scenario that naturally arises when addressing filtering and smoothing problems in continuous time), but our findings are indicative of weakly informative settings beyond this context too. We study the performance of different resampling schemes, such as systematic resampling, SSP (Srinivasan sampling process) and stratified resampling, as the time-discretisation becomes finer and also identify their continuous-time limit, which is expressed as a suitably defined `infinitesimal generator.' By contrasting these generators, we find that (certain modifications of) systematic and SSP resampling `dominate' stratified and independent `killing' resampling in terms of their limiting overall resampling rate. The reduced intensity of resampling manifests itself in lower variance in our numerical experiment. This efficiency result, through an ordering of the resampling rate, is new to the literature. The second major contribution of this work concerns the analysis of the limiting behaviour of the entire population of particles of the particle filter as the time discretisation becomes finer. We provide the first proof, under general conditions, that the particle approximation of the discretised continuous-time Feynman--Kac path integral models converges to a (uniformly weighted) continuous-time particle system.
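Systematic resampling, one of the schemes compared above, can be sketched in a few lines of Python. This is the textbook version with a single uniform offset; the modified variants analysed in the paper are not reproduced, and `systematic_resample` is a hypothetical name.

```python
import numpy as np


def systematic_resample(weights, rng):
    """Return ancestor indices under systematic resampling."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(w)
    # One uniform offset shared by n evenly spaced points in [0, 1)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(w)
    cumulative[-1] = 1.0  # guard against round-off in the last cumulative sum
    # Ancestor i is chosen for every point falling in (c_{i-1}, c_i]
    return np.searchsorted(cumulative, positions)
```

Each particle receives either $\lfloor N w_i \rfloor$ or $\lceil N w_i \rceil$ offspring, which is why systematic resampling adds little noise when the weights are nearly uniform.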
Submitted 9 July, 2022; v1 submitted 18 March, 2022;
originally announced March 2022.
-
bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R
Authors:
Jouni Helske,
Matti Vihola
Abstract:
We present an R package bssm for Bayesian non-linear/non-Gaussian state space modelling. Unlike the existing packages, bssm allows for easy-to-use approximate inference based on Gaussian approximations such as the Laplace approximation and the extended Kalman filter. The package also accommodates discretely observed latent diffusion processes. The inference is based on fully automatic, adaptive Markov chain Monte Carlo (MCMC) on the hyperparameters, with optional importance sampling post-correction to eliminate any approximation bias. The package also implements a direct pseudo-marginal MCMC and a delayed acceptance pseudo-marginal MCMC using intermediate approximations. The package offers an easy-to-use interface to define models with linear-Gaussian state dynamics with non-Gaussian observation models, and has an Rcpp interface for specifying custom non-linear and diffusion models.
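bssm is an R package and its interface is not reproduced here. As an illustration of the kind of recursion underlying Gaussian approximations, the hedged Python sketch below evaluates the Kalman filter log-likelihood of a linear-Gaussian local level model; the model and the function name `local_level_loglik` are assumptions chosen for brevity (the extended Kalman filter applies the same recursion after linearisation).

```python
import math


def local_level_loglik(y, sigma2_y, sigma2_x, m0, p0):
    """Kalman filter log-likelihood of the (hypothetical) local level model
    y_t = x_t + eps_t,      eps_t ~ N(0, sigma2_y),
    x_{t+1} = x_t + eta_t,  eta_t ~ N(0, sigma2_x),   x_1 ~ N(m0, p0).
    """
    m, p = m0, p0
    loglik = 0.0
    for yt in y:
        # One-step predictive distribution of y_t: N(m, p + sigma2_y)
        f = p + sigma2_y
        v = yt - m
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        # Filtering update of x_t given y_1..y_t
        k = p / f
        m = m + k * v
        p = p * (1.0 - k)
        # Prediction of x_{t+1}
        p = p + sigma2_x
    return loglik
```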
Submitted 28 May, 2021; v1 submitted 21 January, 2021;
originally announced January 2021.
-
Conditional particle filters with diffuse initial distributions
Authors:
Santeri Karppinen,
Matti Vihola
Abstract:
Conditional particle filters (CPFs) are powerful smoothing algorithms for general nonlinear/non-Gaussian hidden Markov models. However, CPFs can be inefficient or difficult to apply with diffuse initial distributions, which are common in statistical applications. We propose a simple but generally applicable auxiliary variable method, which can be used together with the CPF in order to perform efficient inference with diffuse initial distributions. The method only requires simulatable Markov transitions that are reversible with respect to the initial distribution, which can be improper. We focus in particular on random-walk type transitions which are reversible with respect to a uniform initial distribution (on some domain), and autoregressive kernels for Gaussian initial distributions. We propose to use on-line adaptations within the methods. In the case of random-walk transitions, our adaptations use the estimated covariance and acceptance rate adaptation, and we detail their theoretical validity. We tested our methods with a linear-Gaussian random-walk model, a stochastic volatility model, and a stochastic epidemic compartment model with time-varying transmission rate. The experimental findings demonstrate that our method works reliably with little user specification, and can mix substantially better than a direct particle Gibbs algorithm that treats initial states as parameters.
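The autoregressive kernels mentioned above can be written down explicitly. Assuming a Gaussian initial distribution $N(m, s^2)$, the proposal $x' = m + \rho (x - m) + \sqrt{1 - \rho^2}\, s\, Z$ with $Z \sim N(0, 1)$ is reversible with respect to $N(m, s^2)$ for any $\rho \in (-1, 1)$. The hedged Python sketch below implements it together with its transition density; the function names are hypothetical, and the paper's on-line adaptation is not included.

```python
import math

import numpy as np


def ar_proposal(x, rng, rho, mean=0.0, sd=1.0):
    """Draw x' = mean + rho (x - mean) + sqrt(1 - rho^2) sd Z, with Z ~ N(0, 1)."""
    return mean + rho * (x - mean) + math.sqrt(1.0 - rho**2) * sd * rng.standard_normal()


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def ar_density(x, x_new, rho, mean=0.0, sd=1.0):
    """Transition density k(x, x_new) of ar_proposal."""
    return normal_pdf(x_new, mean + rho * (x - mean), math.sqrt(1.0 - rho**2) * sd)
```

Reversibility means $N(x; m, s^2)\,k(x, x') = N(x'; m, s^2)\,k(x', x)$, which can be checked numerically with these functions.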
Submitted 20 November, 2020; v1 submitted 26 June, 2020;
originally announced June 2020.
-
Hierarchical log Gaussian Cox process for regeneration in uneven-aged forests
Authors:
Mikko Kuronen,
Aila Särkkä,
Matti Vihola,
Mari Myllymäki
Abstract:
We propose a hierarchical log Gaussian Cox process (LGCP) for point patterns, where a set of points x affects another set of points y but not vice versa. We use the model to investigate the effect of large trees on the locations of seedlings. In the model, every point in x has a parametric influence kernel or signal, which together form an influence field. Conditionally on the parameters, the influence field acts as a spatial covariate in the intensity of the model, and the intensity itself is a non-linear function of the parameters. Points outside the observation window may affect the influence field inside the window. We propose an edge correction to account for this missing data. The parameters of the model are estimated in a Bayesian framework using Markov chain Monte Carlo (MCMC) where a Laplace approximation is used for the Gaussian field of the LGCP model. The proposed model is used to analyze the effect of large trees on the success of regeneration in uneven-aged forest stands in Finland.
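To illustrate the structure of the influence field, the hedged Python sketch below sums a parametric kernel over the source points (the large trees) and adds it, together with an intercept and given values of a Gaussian random field, on the log-intensity scale. The Gaussian-shaped kernel with parameters `amplitude` and `radius`, the additive form, and the function names are illustrative assumptions rather than the paper's exact specification; the edge correction and the Laplace approximation are not shown.

```python
import numpy as np


def influence_field(s, sources, amplitude, radius):
    """Sum of hypothetical Gaussian-shaped influence kernels centred at the sources.

    s: (n, 2) array of locations; sources: (m, 2) array of source points.
    """
    s = np.atleast_2d(s)
    # Squared distances between every location and every source point, shape (n, m)
    d2 = ((s[:, None, :] - sources[None, :, :]) ** 2).sum(axis=-1)
    return amplitude * np.exp(-0.5 * d2 / radius**2).sum(axis=1)


def log_intensity(s, sources, beta0, amplitude, radius, gp_values):
    """Hypothetical log-intensity: intercept + influence field + Gaussian field values."""
    return beta0 + influence_field(s, sources, amplitude, radius) + gp_values
```

Because `radius` enters inside the exponential, the intensity is a non-linear function of the kernel parameters, in line with the description above.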
Submitted 7 July, 2021; v1 submitted 5 May, 2020;
originally announced May 2020.
-
On the use of approximate Bayesian computation Markov chain Monte Carlo with inflated tolerance and post-correction
Authors:
Matti Vihola,
Jordan Franks
Abstract:
Approximate Bayesian computation allows for inference of complicated probabilistic models with intractable likelihoods using model simulations. The Markov chain Monte Carlo implementation of approximate Bayesian computation is often sensitive to the tolerance parameter: low tolerance leads to poor mixing and large tolerance entails excess bias. We consider an approach using a relatively large tolerance for the Markov chain Monte Carlo sampler to ensure its sufficient mixing, and post-processing the output leading to estimators for a range of finer tolerances. We introduce an approximate confidence interval for the related post-corrected estimators, and propose an adaptive approximate Bayesian computation Markov chain Monte Carlo, which finds a `balanced' tolerance level automatically, based on acceptance rate optimisation. Our experiments show that post-processing based estimators can perform better than a direct Markov chain targeting a fine tolerance, that our confidence intervals are reliable, and that our adaptive algorithm leads to reliable inference with little user specification.
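With a simple cut-off kernel, post-correction to a finer tolerance $\delta \le \epsilon$ amounts to reweighting each ABC-MCMC sample by the indicator $1\{d_i \le \delta\}$, where $d_i$ is the distance between the simulated and observed data; every state of the chain already satisfies $d_i \le \epsilon$. The hedged Python sketch below computes such estimates for several tolerances from one run; the cut-off kernel and the name `post_corrected_estimates` are assumptions, and the paper's confidence intervals and adaptive tolerance tuning are not included.

```python
import numpy as np


def post_corrected_estimates(theta, dist, deltas, f=lambda t: t):
    """Estimates of E[f(theta)] under finer cut-off tolerances delta <= eps.

    theta: ABC-MCMC samples obtained with tolerance eps;
    dist:  the corresponding distances d_i (all <= eps).
    """
    theta = np.asarray(theta, dtype=float)
    dist = np.asarray(dist, dtype=float)
    values = f(theta)
    out = {}
    for delta in deltas:
        # Importance weight = ratio of cut-off kernels = indicator of d_i <= delta
        w = (dist <= delta).astype(float)
        out[delta] = np.nan if w.sum() == 0 else float(np.sum(w * values) / w.sum())
    return out
```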
Submitted 16 May, 2019; v1 submitted 1 February, 2019;
originally announced February 2019.
-
Graphical model inference: Sequential Monte Carlo meets deterministic approximations
Authors:
Fredrik Lindsten,
Jouni Helske,
Matti Vihola
Abstract:
Approximate inference in probabilistic graphical models (PGMs) can be grouped into deterministic methods and Monte-Carlo-based methods. The former can often provide accurate and rapid inferences, but are typically associated with biases that are hard to quantify. The latter enjoy asymptotic consistency, but can suffer from high computational costs. In this paper we present a way of bridging the gap between deterministic and stochastic inference. Specifically, we suggest an efficient sequential Monte Carlo (SMC) algorithm for PGMs which can leverage the output from deterministic inference methods. While generally applicable, we show explicitly how this can be done with loopy belief propagation, expectation propagation, and Laplace approximations. The resulting algorithm can be viewed as a post-correction of the biases associated with these methods and, indeed, numerical results show clear improvements over the baseline deterministic methods as well as over "plain" SMC.
Submitted 8 January, 2019;
originally announced January 2019.
-
Unbiased inference for discretely observed hidden Markov model diffusions
Authors:
Neil K. Chada,
Jordan Franks,
Ajay Jasra,
Kody J. H. Law,
Matti Vihola
Abstract:
We develop a Bayesian inference method for diffusions observed discretely and with noise, which is free of discretisation bias. Unlike existing unbiased inference methods, our method does not rely on exact simulation techniques. Instead, our method uses standard time-discretised approximations of diffusions, such as the Euler--Maruyama scheme. Our approach is based on particle marginal Metropolis--Hastings, a particle filter, randomised multilevel Monte Carlo, and importance sampling type correction of approximate Markov chain Monte Carlo. The resulting estimator leads to inference without a bias from the time-discretisation as the number of Markov chain iterations increases. We give convergence results and recommend allocations for algorithm inputs. Our method admits a straightforward parallelisation, and can be computationally efficient. The user-friendly approach is illustrated on three examples, where the underlying diffusion is an Ornstein--Uhlenbeck process, a geometric Brownian motion, and a 2d non-reversible Langevin equation.
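The Euler--Maruyama scheme mentioned above replaces the diffusion $dX_t = a(X_t)\,dt + \sigma\,dW_t$ by increments $X_{k+1} = X_k + a(X_k)\,\Delta t + \sigma \sqrt{\Delta t}\, Z_k$ with $Z_k \sim N(0, 1)$. The hedged Python sketch below does this for an Ornstein--Uhlenbeck drift $a(x) = -\theta x$, one of the examples named above; the parameter names are assumptions, and the randomised multilevel and importance sampling machinery of the paper is not shown.

```python
import numpy as np


def euler_maruyama_ou(x0, theta, sigma, dt, n_steps, rng):
    """Euler--Maruyama path of dX = -theta X dt + sigma dW on a grid of step dt."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        # Drift increment plus Gaussian increment with variance sigma^2 dt
        x[k + 1] = x[k] - theta * x[k] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x
```

In a multilevel setting, level $l$ would typically use a step such as $\Delta t = 2^{-l}$, with finer levels removing more of the discretisation bias.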
Submitted 9 March, 2021; v1 submitted 26 July, 2018;
originally announced July 2018.
-
Coupled conditional backward sampling particle filter
Authors:
Anthony Lee,
Sumeetpal S. Singh,
Matti Vihola
Abstract:
The conditional particle filter (CPF) is a promising algorithm for general hidden Markov model smoothing. Empirical evidence suggests that the variant of CPF with backward sampling (CBPF) performs well even with long time series. Previous theoretical results have not been able to demonstrate the improvement brought by backward sampling, whereas we provide rates showing that CBPF can remain effective with a fixed number of particles independent of the time horizon. Our result is based on analysis of a new coupling of two CBPFs, the coupled conditional backward sampling particle filter (CCBPF). We show that CCBPF has good stability properties in the sense that with a fixed number of particles, the coupling time in terms of iterations increases only linearly with respect to the time horizon under a general (strong mixing) condition. The CCBPF is useful not only as a theoretical tool, but also as a practical method that allows for unbiased estimation of smoothing expectations, following the recent developments by Jacob et al. (to appear). Unbiased estimation has many advantages, such as enabling the construction of asymptotically exact confidence intervals and straightforward parallelisation.
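Following the construction of Jacob et al. cited above, two coupled chains $(X_t)$ and $(Y_t)$ with the same transition kernel, where $X$ is run one step ahead and the chains meet at time $\tau$ (so $X_t = Y_{t-1}$ for $t \ge \tau$), yield the unbiased estimator $H_k = h(X_k) + \sum_{t=k+1}^{\tau-1} \{h(X_t) - h(Y_{t-1})\}$. The hedged Python sketch below evaluates $H_k$ from precomputed values $h(X_t)$ and $h(Y_t)$; producing the coupled trajectories (for example with the CCBPF) is not shown, and the function name is hypothetical.

```python
def unbiased_from_coupled(h_x, h_y, k, tau):
    """H_k = h(X_k) + sum_{t=k+1}^{tau-1} [h(X_t) - h(Y_{t-1})].

    h_x[t] = h(X_t) and h_y[t] = h(Y_t) for t = 0, 1, ...; the chains are
    assumed to have met at time tau, i.e. X_t = Y_{t-1} for all t >= tau.
    """
    est = h_x[k]
    # Bias-correction terms; the sum is empty when tau <= k + 1
    for t in range(k + 1, tau):
        est += h_x[t] - h_y[t - 1]
    return est
```

Independent replicates of $H_k$ can be computed in parallel and averaged, which is the source of the parallelisation and confidence-interval advantages mentioned above.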
Submitted 28 August, 2019; v1 submitted 15 June, 2018;
originally announced June 2018.
-
Importance sampling correction versus standard averages of reversible MCMCs in terms of the asymptotic variance
Authors:
Jordan Franks,
Matti Vihola
Abstract:
We establish an ordering criterion for the asymptotic variances of two consistent Markov chain Monte Carlo (MCMC) estimators: an importance sampling (IS) estimator, based on an approximate reversible chain and subsequent IS weighting, and a standard MCMC estimator, based on an exact reversible chain. Essentially, we relax the criterion of the Peskun type covariance ordering by considering two different invariant probabilities, and obtain, in place of a strict ordering of asymptotic variances, a bound on the asymptotic variance of IS in terms of that of the direct MCMC. Simple examples show that IS can have arbitrarily better or worse asymptotic variance than Metropolis-Hastings and delayed-acceptance (DA) MCMC. Our ordering implies that IS is guaranteed to be competitive up to a factor depending on the supremum of the (marginal) IS weight. We elaborate upon the criterion in the case of unbiased estimators as part of an auxiliary variable framework. We show how the criterion implies asymptotic variance guarantees for IS in terms of pseudo-marginal (PM) and DA corrections, essentially if the ratio of exact and approximate likelihoods is bounded. We also show that convergence of the IS chain can be less affected by unbounded high-variance unbiased estimators than PM and DA chains.
Submitted 24 March, 2020; v1 submitted 29 June, 2017;
originally announced June 2017.
-
Importance sampling type estimators based on approximate marginal MCMC
Authors:
Matti Vihola,
Jouni Helske,
Jordan Franks
Abstract:
We consider importance sampling (IS) type weighted estimators based on Markov chain Monte Carlo (MCMC) targeting an approximate marginal of the target distribution. In the context of Bayesian latent variable models, the MCMC typically operates on the hyperparameters, and the subsequent weighting may be based on IS or sequential Monte Carlo (SMC), but allows for multilevel techniques as well. The IS approach provides a natural alternative to delayed acceptance (DA) pseudo-marginal/particle MCMC, and has many advantages over DA, including a straightforward parallelisation and additional flexibility in MCMC implementation. We detail minimal conditions which ensure strong consistency of the suggested estimators, and provide central limit theorems with expressions for asymptotic variances. We demonstrate how our method can make use of SMC in the state space models context, using Laplace approximations and time-discretised diffusions. Our experimental results are promising and show that the IS type approach can provide substantial gains relative to an analogous DA scheme, and is often competitive even without parallelisation.
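The weighting step itself is short. Given MCMC output $\theta_1, \dots, \theta_n$ targeting the approximate marginal and log-weights $\log w_i$ (for example, the log-ratio of an unbiased likelihood estimate to the approximate likelihood), the self-normalised estimator is $\sum_i w_i f(\theta_i) / \sum_i w_i$. The hedged Python sketch below assumes the log-weights have already been computed, for instance with IS or SMC as described above; the function name is hypothetical.

```python
import numpy as np


def is_correction(samples, log_weights, f):
    """Self-normalised IS estimate of E[f(theta)] from approximate-marginal MCMC output."""
    lw = np.asarray(log_weights, dtype=float)
    # Subtract the maximum log-weight before exponentiating, for numerical stability
    w = np.exp(lw - lw.max())
    return float(np.sum(w * f(np.asarray(samples, dtype=float))) / np.sum(w))
```

Because each weight depends only on its own sample, the weights can be computed independently, which is the straightforward parallelisation referred to above.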
Submitted 9 March, 2020; v1 submitted 8 September, 2016;
originally announced September 2016.
-
Unbiased estimators and multilevel Monte Carlo
Authors:
Matti Vihola
Abstract:
Multilevel Monte Carlo (MLMC) and unbiased estimators recently proposed by McLeish (Monte Carlo Methods Appl., 2011) and Rhee and Glynn (Oper. Res., 2015) are closely related. This connection is elaborated by presenting a new general class of unbiased estimators, which admits previous debiasing schemes as special cases. New lower variance estimators are proposed, which are stratified versions of earlier unbiased schemes. Under general conditions, essentially when MLMC admits the canonical square root Monte Carlo error rate, the proposed new schemes are shown to be asymptotically as efficient as MLMC, both in terms of variance and cost. The experiments demonstrate that the variance reduction provided by the new schemes can be substantial.
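A minimal example of the debiasing schemes discussed above is the single-term estimator: draw a level $N$ with probabilities $p_n$ and return $\Delta_N / p_N$, where $\Delta_n$ is the $n$-th MLMC difference. The hedged Python sketch below truncates to finitely many levels for simplicity, so it is unbiased for $\sum_{n<L} \Delta_n$ rather than for the limit over infinitely many levels; deterministic differences stand in for the random coupled differences used in practice, and the stratified schemes proposed in the paper are not implemented.

```python
import numpy as np


def single_term_estimate(delta, probs, rng):
    """Single-term estimator Delta_N / p_N with N ~ probs over levels 0..L-1.

    delta: callable returning the level-n difference (hypothetical stand-in);
    probs: level probabilities, all positive and summing to one.
    """
    probs = np.asarray(probs, dtype=float)
    n = rng.choice(len(probs), p=probs)
    # E[Delta_N / p_N] = sum_n p_n * Delta_n / p_n = sum_n Delta_n
    return delta(n) / probs[n]
```

When $p_n$ is proportional to $\Delta_n$ (possible here only because the toy differences are positive and deterministic) the estimator has zero variance, which hints at why the choice of $p_n$ matters.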
Submitted 11 May, 2017; v1 submitted 3 December, 2015;
originally announced December 2015.
-
Establishing some order amongst exact approximations of MCMCs
Authors:
Christophe Andrieu,
Matti Vihola
Abstract:
Exact approximations of Markov chain Monte Carlo (MCMC) algorithms are a general emerging class of sampling algorithms. One of the main ideas behind exact approximations consists of replacing intractable quantities required to run standard MCMC algorithms, such as the target probability density in a Metropolis-Hastings algorithm, with estimators. Perhaps surprisingly, such approximations lead to powerful algorithms which are exact in the sense that they are guaranteed to have correct limiting distributions. In this paper we discover a general framework which allows one to compare, or order, performance measures of two implementations of such algorithms. In particular, we establish an order with respect to the mean acceptance probability, the first autocorrelation coefficient, the asymptotic variance and the right spectral gap. The key notion to guarantee the ordering is that of the convex order between estimators used to implement the algorithms. We believe that our convex order condition is close to optimal, and this is supported by a counter-example which shows that a weaker variance order is not sufficient. The convex order plays a central role by allowing us to construct a martingale coupling which enables the comparison of performance measures of Markov chains with differing invariant distributions, contrary to existing results. We detail applications of our result by identifying extremal distributions within given classes of approximations, by showing that averaging replicas improves performance in a monotonic fashion and that stratification is guaranteed to improve performance for the standard implementation of the Approximate Bayesian Computation (ABC) MCMC method.
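A typical exact approximation is pseudo-marginal Metropolis-Hastings, where the intractable target density is replaced by a non-negative unbiased estimate. The hedged Python sketch below uses a toy estimate $\pi(x) W$ with $\log W \sim N(-\sigma^2/2, \sigma^2)$, so that $E W = 1$, and averages `n_replicas` independent copies of $W$; the toy noise model and all names are assumptions. Averaging replicas is one of the cases for which the paper establishes a monotone improvement.

```python
import numpy as np


def pseudo_marginal_mh(log_target, n_iter, rng, step=1.0, noise_sd=0.5,
                       n_replicas=1, x0=0.0):
    """Random-walk pseudo-marginal Metropolis-Hastings on the real line.

    The density pi(x) is replaced by the toy unbiased estimate
    pi(x) * mean(W_1, ..., W_m), with log W_j ~ N(-noise_sd**2 / 2, noise_sd**2),
    so that E[W_j] = 1 (hypothetical noise model for illustration).
    """
    def log_estimate(x):
        log_w = noise_sd * rng.standard_normal(n_replicas) - 0.5 * noise_sd**2
        return log_target(x) + np.log(np.mean(np.exp(log_w)))

    x, log_est = x0, log_estimate(x0)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        y = x + step * rng.standard_normal()
        log_est_y = log_estimate(y)
        # Accept/reject using the estimates; the current estimate is recycled, never refreshed
        if np.log(rng.random()) < log_est_y - log_est:
            x, log_est = y, log_est_y
        chain[i] = x
    return chain
```

Despite the noisy estimates, the $x$-marginal of the chain's invariant distribution is exactly $\pi$, which is the sense in which such algorithms are exact.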
Submitted 29 October, 2015; v1 submitted 28 April, 2014;
originally announced April 2014.
-
Conditional convex orders and measurable martingale couplings
Authors:
Lasse Leskelä,
Matti Vihola
Abstract:
Strassen's classical martingale coupling theorem states that two real-valued random variables are ordered in the convex (resp. increasing convex) stochastic order if and only if they admit a martingale (resp. submartingale) coupling. By analyzing topological properties of spaces of probability measures equipped with a Wasserstein metric and applying a measurable selection theorem, we prove a conditional version of this result for real-valued random variables conditioned on a random element taking values in a general measurable space. We also provide an analogue of the conditional martingale coupling theorem in the language of probability kernels and illustrate how this result can be applied in the analysis of pseudo-marginal Markov chain Monte Carlo algorithms. We also illustrate how our results imply the existence of a measurable minimiser in the context of martingale optimal transport.
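For finitely supported distributions on the real line, the convex order can be checked directly: $X \le_{cx} Y$ if and only if $E X = E Y$ and $E(X - t)^+ \le E(Y - t)^+$ for all $t$. Both sides are piecewise linear in $t$ with kinks at support points, so it suffices to check $t$ in the union of the supports. The hedged Python sketch below does this; the function name and tolerance are assumptions. By Strassen's theorem, a positive answer is equivalent to the existence of a martingale coupling, for example $Y = X + \xi$ with $\xi = \pm 1$ when $X = 0$ and $Y$ is uniform on $\{-1, 1\}$.

```python
import numpy as np


def convex_order_leq(x_vals, x_probs, y_vals, y_probs, tol=1e-12):
    """Check X <=_cx Y for finitely supported real random variables X and Y."""
    x_vals, x_probs = np.asarray(x_vals, float), np.asarray(x_probs, float)
    y_vals, y_probs = np.asarray(y_vals, float), np.asarray(y_probs, float)
    # Equal means are necessary
    if abs(x_vals @ x_probs - y_vals @ y_probs) > tol:
        return False
    # t -> E(X - t)^+ is piecewise linear with kinks at support points, so the
    # comparison only needs to be made at the union of the two supports
    for t in np.union1d(x_vals, y_vals):
        if np.maximum(x_vals - t, 0.0) @ x_probs > np.maximum(y_vals - t, 0.0) @ y_probs + tol:
            return False
    return True
```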
Submitted 30 October, 2015; v1 submitted 3 April, 2014;
originally announced April 2014.
-
Convergence of Markovian Stochastic Approximation with discontinuous dynamics
Authors:
Gersende Fort,
Eric Moulines,
Amandine Schreck,
Matti Vihola
Abstract:
This paper is devoted to the convergence analysis of stochastic approximation algorithms of the form $θ_{n+1} = θ_n + γ_{n+1} H_{θ_n}(X_{n+1})$, where $\{θ_n, n \geq 0\}$ is an $\mathbb{R}^d$-valued sequence, $\{γ_n, n \geq 0\}$ is a deterministic step-size sequence and $\{X_n, n \geq 0\}$ is a controlled Markov chain. We study the convergence under weak assumptions on the smoothness in $θ$ of the function $θ \mapsto H_θ(x)$. It is usually assumed that this function is continuous for any $x$; in this work, we relax this condition. Our results are illustrated by considering stochastic approximation algorithms for (adaptive) quantile estimation and a penalized version of vector quantization.
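A minimal sketch of the quantile example, assuming i.i.d. uniform draws in place of a controlled Markov chain and a step size $γ_n = n^{-0.7}$ (both illustrative choices, not the paper's setting). Here $H_θ(x) = α - \mathbf{1}\{x \le θ\}$ is discontinuous in $θ$, which is the kind of map the abstract refers to; its mean field $α - P(X \le θ)$ vanishes at the $α$-quantile. The function name `sa_quantile` is hypothetical.

```python
import random


def sa_quantile(alpha, n_iter, seed=0):
    """Robbins-Monro recursion theta_{n+1} = theta_n + gamma_{n+1} H_theta(X_{n+1})
    with H_theta(x) = alpha - 1{x <= theta}, which is discontinuous in theta."""
    rng = random.Random(seed)
    theta = 0.0
    for n in range(1, n_iter + 1):
        x = rng.random()  # i.i.d. U(0, 1) draws stand in for the controlled Markov chain
        h = alpha - (1.0 if x <= theta else 0.0)
        theta += n ** -0.7 * h  # step size gamma_n = n^{-0.7}
    return theta


print(sa_quantile(0.9, 50_000))  # should be close to the 0.9-quantile of U(0, 1)
```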
Submitted 26 January, 2016; v1 submitted 26 March, 2014;
originally announced March 2014.
-
Uniform Ergodicity of the Iterated Conditional SMC and Geometric Ergodicity of Particle Gibbs samplers
Authors:
Christophe Andrieu,
Anthony Lee,
Matti Vihola
Abstract:
We establish quantitative bounds for rates of convergence and asymptotic variances for iterated conditional sequential Monte Carlo (i-cSMC) Markov chains and associated particle Gibbs samplers. Our main findings are that the essential boundedness of potential functions associated with the i-cSMC algorithm provides necessary and sufficient conditions for the uniform ergodicity of the i-cSMC Markov chain, as well as quantitative bounds on its (uniformly geometric) rate of convergence. Furthermore, we show that in many applications of interest the i-cSMC Markov chain cannot even be geometrically ergodic if this essential boundedness does not hold. Our sufficiency and quantitative bounds rely on a novel non-asymptotic analysis of the expectation of a standard normalizing constant estimate with respect to a "doubly conditional" SMC algorithm. In addition, our results for i-cSMC imply that the rate of convergence can be improved arbitrarily by increasing $N$, the number of particles in the algorithm, and that, under mixing assumptions, the rate of convergence can be kept constant by increasing $N$ linearly with the time horizon. We translate the sufficiency of the boundedness condition for i-cSMC into sufficient conditions for the particle Gibbs Markov chain to be geometrically ergodic and into quantitative bounds on its geometric rate of convergence, which imply convergence of properties of the particle Gibbs Markov chain to those of its corresponding Gibbs sampler. These results complement recently discovered, and related, conditions for the particle marginal Metropolis-Hastings (PMMH) Markov chain.
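To make the i-cSMC Markov chain concrete, the sketch below implements one conditional SMC sweep and iterates it, feeding each output path back in as the next reference. It is an illustration under assumptions, not the paper's code: the linear-Gaussian toy model, bootstrap proposals, multinomial resampling, $N = 16$ and $T = 30$ are all hypothetical choices. The potential $G_t$ is the observation likelihood, which is bounded here, so the boundedness condition discussed above holds for this toy model.

```python
import math
import random

RHO = 0.9  # hypothetical AR(1) coefficient of the toy model


def log_potential(y, x):
    """log G_t(x) = log p(y_t | x_t = x) up to a constant, for y_t = x_t + N(0, 1)."""
    return -0.5 * (y - x) ** 2


def conditional_smc(ref, ys, n_particles, rng):
    """One cSMC sweep: bootstrap proposals, multinomial resampling, the reference
    path frozen in slot 0, and ancestral tracing of a single output path."""
    T, N = len(ys), n_particles
    X = [[0.0] * N for _ in range(T)]
    A = [[0] * N for _ in range(T)]
    X[0] = [ref[0]] + [rng.gauss(0.0, 1.0) for _ in range(N - 1)]  # x_1 ~ N(0, 1)
    w = []
    for t in range(T):
        if t > 0:
            # Free particles choose ancestors by weight; slot 0 follows the reference.
            A[t] = [0] + rng.choices(range(N), weights=w, k=N - 1)
            X[t] = [ref[t]] + [RHO * X[t - 1][a] + rng.gauss(0.0, 1.0) for a in A[t][1:]]
        logw = [log_potential(ys[t], x) for x in X[t]]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]  # rescaled to avoid underflow
    k = rng.choices(range(N), weights=w, k=1)[0]
    path = [0.0] * T
    for t in reversed(range(T)):
        path[t] = X[t][k]
        k = A[t][k]  # ancestor index at time t - 1
    return path


def simulate(T, rng):
    """Draw observations y_1..y_T from the toy model."""
    x, ys = rng.gauss(0.0, 1.0), []
    for t in range(T):
        if t > 0:
            x = RHO * x + rng.gauss(0.0, 1.0)
        ys.append(x + rng.gauss(0.0, 1.0))
    return ys


rng = random.Random(1)
ys = simulate(30, rng)
path = [0.0] * len(ys)  # arbitrary initial reference trajectory
for _ in range(50):     # i-cSMC: each output becomes the next reference
    path = conditional_smc(path, ys, n_particles=16, rng=rng)
```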
Submitted 14 April, 2015; v1 submitted 22 December, 2013;
originally announced December 2013.
-
Quantitative convergence rates for sub-geometric Markov chains
Authors:
Christophe Andrieu,
Gersende Fort,
Matti Vihola
Abstract:
We provide explicit expressions for the constants involved in the characterisation of ergodicity of sub-geometric Markov chains. The constants are determined in terms of those appearing in the assumed drift and one-step minorisation conditions. The result is fundamental for the study of some algorithms where uniform bounds for these constants are needed for a family of Markov kernels. Our result also accommodates some classes of inhomogeneous chains.
Submitted 17 March, 2014; v1 submitted 3 September, 2013;
originally announced September 2013.
-
Adaptive Metropolis Algorithm Using Variational Bayesian Adaptive Kalman Filter
Authors:
Isambi S. Mbalawata,
Simo Särkkä,
Matti Vihola,
Heikki Haario
Abstract:
Markov chain Monte Carlo (MCMC) methods are powerful computational tools for the analysis of complex statistical problems. However, their computational efficiency is highly dependent on the chosen proposal distribution, which is generally difficult to find. One way to solve this problem is to use adaptive MCMC algorithms, which automatically tune the statistics of a proposal distribution during the MCMC run. A new adaptive MCMC algorithm, called the variational Bayesian adaptive Metropolis (VBAM) algorithm, is developed. The VBAM algorithm updates the proposal covariance matrix using the variational Bayesian adaptive Kalman filter (VB-AKF). A strong law of large numbers for the VBAM algorithm is proven. Empirical convergence results for three simulated examples and two real-data examples are also provided.
Submitted 1 October, 2014; v1 submitted 27 August, 2013;
originally announced August 2013.
-
Convergence properties of pseudo-marginal Markov chain Monte Carlo algorithms
Authors:
Christophe Andrieu,
Matti Vihola
Abstract:
We study convergence properties of pseudo-marginal Markov chain Monte Carlo algorithms (Andrieu and Roberts [Ann. Statist. 37 (2009) 697-725]). We find that the asymptotic variance of the pseudo-marginal algorithm is always at least as large as that of the marginal algorithm. We show that if the marginal chain admits a (right) spectral gap and the weights (normalised estimates of the target density) are uniformly bounded, then the pseudo-marginal chain has a spectral gap. In many cases, a similar result holds for the absolute spectral gap, which is equivalent to geometric ergodicity. We also consider unbounded weight distributions and recover polynomial convergence rates in more specific cases: when the marginal algorithm is uniformly ergodic, or is an independent Metropolis-Hastings algorithm or a random-walk Metropolis algorithm targeting a super-exponential density with regular contours. Our results on geometric and polynomial convergence rates imply central limit theorems. We also prove that under general conditions, the asymptotic variance of the pseudo-marginal algorithm converges to the asymptotic variance of the marginal algorithm if the accuracy of the estimators is increased.
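A minimal sketch of a random-walk pseudo-marginal Metropolis-Hastings chain, to show where the weights enter. Assumptions, none from the paper: a standard normal target, lognormal weights $W$ with $E W = 1$ (an unbounded weight distribution), proposal step 2.4 and noise level 0.5. The key detail is that the estimate at the current state is stored and reused rather than recomputed, which keeps the $x$-marginal of the chain equal to the target.

```python
import math
import random


def pseudo_marginal_mh(log_target, n_iter, noise_sd=0.5, step=2.4, seed=0):
    """Random-walk pseudo-marginal Metropolis-Hastings.

    The accept step only sees the noisy unbiased estimate pi(x) * W,
    where W is lognormal with E[W] = 1.
    """
    rng = random.Random(seed)

    def log_estimate(x):
        # log W = noise_sd * Z - noise_sd**2 / 2 with Z ~ N(0, 1), hence E[W] = 1
        return log_target(x) + noise_sd * rng.gauss(0.0, 1.0) - 0.5 * noise_sd ** 2

    x = 0.0
    log_est = log_estimate(x)
    samples = []
    for _ in range(n_iter):
        y = x + step * rng.gauss(0.0, 1.0)
        log_est_y = log_estimate(y)  # fresh estimate at the proposal only
        # The current estimate is recycled, not refreshed; this is what keeps
        # the x-marginal of the chain equal to pi.
        if math.log(1.0 - rng.random()) < log_est_y - log_est:
            x, log_est = y, log_est_y
        samples.append(x)
    return samples


samples = pseudo_marginal_mh(lambda z: -0.5 * z * z, 40_000)
```

Setting `noise_sd = 0` recovers the marginal random-walk Metropolis algorithm, the baseline against which the abstract compares asymptotic variances.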
Submitted 30 March, 2015; v1 submitted 4 October, 2012;
originally announced October 2012.
-
On the stability of some controlled Markov chains and its applications to stochastic approximation with Markovian dynamic
Authors:
Christophe Andrieu,
Vladislav B. Tadić,
Matti Vihola
Abstract:
We develop a practical approach to establish the stability, that is, the recurrence in a given set, of a large class of controlled Markov chains. These processes arise in various areas of applied science and encompass important numerical methods. We show in particular how individual Lyapunov functions and associated drift conditions for the parametrized family of Markov transition probabilities and the parameter update can be combined to form Lyapunov functions for the joint process, leading to the proof of the desired stability property. Of particular interest is the fact that the approach applies even in situations where the two components of the process present a time-scale separation, which is a crucial feature of practical situations. We then move on to show how such a recurrence property can be used in the context of stochastic approximation in order to prove the convergence of the parameter sequence, including in the situation where the so-called stepsize is adaptively tuned. We finally show that the results apply to various algorithms of interest in computational statistics and cognate areas.
Submitted 30 January, 2015; v1 submitted 18 May, 2012;
originally announced May 2012.
-
Adaptive parallel tempering algorithm
Authors:
Blazej Miasojedow,
Eric Moulines,
Matti Vihola
Abstract:
Parallel tempering is a generic Markov chain Monte Carlo sampling method which allows good mixing with multimodal target distributions, where conventional Metropolis-Hastings algorithms often fail. The mixing properties of the sampler depend strongly on the choice of tuning parameters, such as the temperature schedule and the proposal distribution used for local exploration. We propose an adaptive algorithm which tunes both the temperature schedule and the parameters of the random-walk Metropolis kernel automatically. We prove the convergence of the adaptation and a strong law of large numbers for the algorithm. We illustrate the performance of our method with examples. Our empirical findings indicate that the algorithm can cope well with different kinds of scenarios without prior tuning.
Submitted 4 May, 2012;
originally announced May 2012.
-
Markovian stochastic approximation with expanding projections
Authors:
Christophe Andrieu,
Matti Vihola
Abstract:
Stochastic approximation is a framework unifying many random iterative algorithms occurring in a diverse range of applications. The stability of the process is often difficult to verify in practical applications, and the process may even be unstable without additional stabilisation techniques. We study a stochastic approximation procedure with expanding projections similar to Andradóttir [Oper. Res. 43 (1995) 1037-1048]. We focus on Markovian noise and establish stability and convergence under general conditions. Our framework also incorporates the possibility of using a random step-size sequence, which allows us to consider settings with a non-smooth family of Markov kernels. We apply the theory to stochastic approximation expectation maximisation with particle independent Metropolis-Hastings sampling.
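One simple way to realise expanding projections is sketched below under stated assumptions: the noise is i.i.d. Gaussian rather than Markovian, the step size is the deterministic $γ_n = 1/n$, and the hypothetical bounds $M_n = \log(n+1)$ define the growing intervals $[-M_n, M_n]$. The procedure analysed in the paper (Markovian noise, possibly random step sizes) may differ in its details; the function name is illustrative.

```python
import math
import random


def sa_expanding_projection(n_iter, seed=0):
    """theta_{n+1} = Proj_[-M_{n+1}, M_{n+1}](theta_n + gamma_{n+1} (X_{n+1} - theta_n)),
    with hypothetical bounds M_n = log(n + 1) that grow to infinity."""
    rng = random.Random(seed)
    theta = 0.0
    for n in range(1, n_iter + 1):
        x = rng.gauss(5.0, 1.0)   # noisy observation; the root of E[X - theta] is 5
        theta += (x - theta) / n  # step size gamma_n = 1/n
        bound = math.log(n + 1)
        theta = max(-bound, min(bound, theta))  # project onto the current interval
    return theta


print(sa_expanding_projection(20_000))  # should be close to 5 once the bound exceeds 5
```

The projection keeps early iterates bounded; because the intervals eventually contain the root, the constraint stops binding and the recursion behaves like unprojected stochastic approximation.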
Submitted 7 March, 2014; v1 submitted 23 November, 2011;
originally announced November 2011.
-
Stochastic order characterization of uniform integrability and tightness
Authors:
Lasse Leskelä,
Matti Vihola
Abstract:
We show that a family of random variables is uniformly integrable if and only if it is stochastically bounded in the increasing convex order by an integrable random variable. This result is complemented by proving analogous statements for the strong stochastic order and for power-integrable dominating random variables. In particular, we show that whenever a family of random variables is stochastically bounded by a $p$-integrable random variable for some $p>1$, there is no distinction between the strong order and the increasing convex order. These results also yield new characterizations of relative compactness in Wasserstein and Prohorov metrics.
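For reference, the orders and uniform integrability can be written as follows (standard definitions, not taken from the paper), together with a short derivation of the easier implication. Applying the bound to $|X_i|$ is an assumption about how the statement is used for real-valued families.

\[ X \le_{\mathrm{st}} Y \iff \mathbb{P}(X > t) \le \mathbb{P}(Y > t) \text{ for all } t \in \mathbb{R}, \]
\[ X \le_{\mathrm{icx}} Y \iff \mathbb{E}(X - t)^+ \le \mathbb{E}(Y - t)^+ \text{ for all } t \in \mathbb{R} \quad (X, Y \text{ integrable}), \]
\[ \{X_i\} \text{ uniformly integrable} \iff \lim_{K \to \infty} \sup_i \mathbb{E}\big[|X_i| \, \mathbf{1}\{|X_i| > K\}\big] = 0. \]

If $|X_i| \le_{\mathrm{icx}} Y$ for all $i$ with $Y$ integrable, then for $t > 0$ we have $|X_i| \le 2(|X_i| - t)$ on $\{|X_i| > 2t\}$, so

\[ \sup_i \mathbb{E}\big[|X_i| \, \mathbf{1}\{|X_i| > 2t\}\big] \le 2 \sup_i \mathbb{E}(|X_i| - t)^+ \le 2\, \mathbb{E}(Y - t)^+ \to 0 \quad (t \to \infty), \]

where the limit follows from dominated convergence. The converse, constructing an integrable dominating variable from a uniformly integrable family, is the harder direction covered by the result above.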
Submitted 3 June, 2011;
originally announced June 2011.
-
Robust adaptive Metropolis algorithm with coerced acceptance rate
Authors:
Matti Vihola
Abstract:
The adaptive Metropolis (AM) algorithm of Haario, Saksman and Tamminen [Bernoulli 7 (2001) 223-242] uses the estimated covariance of the target distribution in the proposal distribution. This paper introduces a new robust adaptive Metropolis algorithm estimating the shape of the target distribution and simultaneously coercing the acceptance rate. The adaptation rule is computationally simple, adding no extra cost compared with the AM algorithm. The adaptation strategy can be seen as a multidimensional extension of the previously proposed method adapting the scale of the proposal distribution in order to attain a given acceptance rate. The empirical results show promising behaviour of the new algorithm in an example with a Student t target distribution having no finite second moment, where the AM covariance estimate is unstable. In the examples with finite second moments, the performance of the new approach seems to be competitive with the AM algorithm combined with scale adaptation.
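A sketch of a robust adaptive Metropolis step, assuming the update takes the rank-one form $S_n S_n^T = S_{n-1}\big(I + η_n(α_n - α^*)\, U_n U_n^T / \|U_n\|^2\big) S_{n-1}^T$ with $η_n = \min\{1, d\, n^{-2/3}\}$ and $α^* = 0.234$. This is our reading of the method, not a verbatim transcription of the paper; consult it for the exact conditions. The 2-D Gaussian target with standard deviations 1 and 10 and all function names are hypothetical. No covariance estimate is formed: the shape factor $S$ is adapted directly from the acceptance probabilities.

```python
import math
import random


def cholesky(M):
    """Lower-triangular L with L L^T = M for a small symmetric positive-definite M."""
    d = len(M)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L


def ram(log_target, x0, n_iter, target_accept=0.234, gamma=2.0 / 3.0, seed=0):
    """Proposal Y = X + S U with U ~ N(0, I), then the rank-one shape update
    S_n S_n^T = S_{n-1} (I + eta_n (alpha_n - alpha*) U U^T / |U|^2) S_{n-1}^T."""
    rng = random.Random(seed)
    d = len(x0)
    x, lp = list(x0), log_target(x0)
    S = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    alphas = []
    for n in range(1, n_iter + 1):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        v = [sum(S[i][k] * u[k] for k in range(d)) for i in range(d)]  # v = S u
        y = [xi + vi for xi, vi in zip(x, v)]
        lp_y = log_target(y)
        alpha = math.exp(min(0.0, lp_y - lp))  # acceptance probability alpha_n
        if rng.random() < alpha:
            x, lp = y, lp_y
        alphas.append(alpha)
        eta = min(1.0, d * n ** -gamma)
        c = eta * (alpha - target_accept) / sum(ui * ui for ui in u)
        # S (I + c u u^T) S^T = S S^T + c v v^T; eta <= 1 keeps it positive definite.
        M = [[sum(S[i][k] * S[j][k] for k in range(d)) + c * v[i] * v[j]
              for j in range(d)] for i in range(d)]
        S = cholesky(M)
    return x, S, alphas


def log_target(z):
    """Hypothetical 2-D Gaussian target with standard deviations 1 and 10."""
    return -0.5 * (z[0] ** 2 + (z[1] / 10.0) ** 2)


x, S, alphas = ram(log_target, [0.0, 0.0], 20_000)
late = alphas[len(alphas) // 2:]
print(sum(late) / len(late))  # should be near the target 0.234
```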
Submitted 27 May, 2011; v1 submitted 19 November, 2010;
originally announced November 2010.
-
Can the Adaptive Metropolis Algorithm Collapse Without the Covariance Lower Bound?
Authors:
Matti Vihola
Abstract:
The Adaptive Metropolis (AM) algorithm is based on the symmetric random-walk Metropolis algorithm. The proposal distribution has the following time-dependent covariance matrix at step $n+1$:
\[ S_n = \mathrm{Cov}(X_1, \ldots, X_n) + εI, \]
that is, the sample covariance matrix of the history of the chain plus a (small) constant $ε>0$ multiple of the identity matrix $I$. The lower bound on the eigenvalues of $S_n$ induced by the factor $εI$ is theoretically convenient, but practically cumbersome, as a good value for the parameter $ε$ may not always be easy to choose. This article considers variants of the AM algorithm that do not explicitly bound the eigenvalues of $S_n$ away from zero. The behaviour of $S_n$ is studied in detail, indicating that the eigenvalues of $S_n$ do not tend to collapse to zero in general.
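A sketch of the covariance bookkeeping behind $S_n$, assuming a $1/n$-normalised sample covariance (the paper's normalisation may differ) and omitting the proposal scaling factor usually applied in AM. The recursion folds each new state into the running mean and covariance in $O(d^2)$ time instead of recomputing them from the whole history; `eps = 0` corresponds to the variants without an explicit lower bound. The function names are hypothetical.

```python
def update_mean_cov(mean, cov, x, n):
    """Fold the (n+1)-th point x into the running mean and 1/n-normalised
    covariance of the first n points (Welford-style recursion)."""
    d = len(x)
    delta = [x[i] - mean[i] for i in range(d)]
    new_mean = [mean[i] + delta[i] / (n + 1) for i in range(d)]
    new_cov = [[n / (n + 1) * cov[i][j] + n / (n + 1) ** 2 * delta[i] * delta[j]
                for j in range(d)] for i in range(d)]
    return new_mean, new_cov


def proposal_cov(cov, eps):
    """S_n = Cov(X_1, ..., X_n) + eps * I; eps = 0 drops the eigenvalue lower bound."""
    d = len(cov)
    return [[cov[i][j] + (eps if i == j else 0.0) for j in range(d)] for i in range(d)]


def batch_mean_cov(points):
    """Direct 1/n-normalised mean and covariance, for comparison."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
            for j in range(d)] for i in range(d)]
    return mean, cov


points = [[0.0, 1.0], [2.0, -1.0], [1.0, 3.0], [4.0, 0.5]]
mean, cov = list(points[0]), [[0.0, 0.0], [0.0, 0.0]]  # one point: zero covariance
for n, p in enumerate(points[1:], start=1):           # n = number of points seen so far
    mean, cov = update_mean_cov(mean, cov, p, n)
S = proposal_cov(cov, eps=0.01)
```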
Submitted 3 November, 2009;
originally announced November 2009.
-
On the stability and ergodicity of adaptive scaling Metropolis algorithms
Authors:
Matti Vihola
Abstract:
The stability and ergodicity properties of two adaptive random walk Metropolis algorithms are considered. Both algorithms adjust the scaling of the proposal distribution continuously based on the observed acceptance probability. Unlike in previously proposed forms of the algorithms, the adapted scaling parameter is not constrained to a predefined compact interval. The first algorithm is based on scale adaptation only, while the second one also incorporates covariance adaptation. A strong law of large numbers is shown to hold assuming that the target density is smooth enough and has either compact support or super-exponentially decaying tails.
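A minimal sketch of the first (scale-only) algorithm, under assumptions not taken from the paper: a one-dimensional Gaussian random-walk proposal, a target acceptance rate of 0.44, and step size $n^{-2/3}$ in the Robbins-Monro update of $\log σ$. As in the abstract, the adapted scale is not projected onto a compact interval. The function name is hypothetical.

```python
import math
import random


def adaptive_scaling_metropolis(log_target, x0, n_iter, target_accept=0.44, seed=0):
    """1-D random-walk Metropolis whose log-scale follows a Robbins-Monro
    recursion driven by the acceptance probability, with no truncation."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    log_sigma = 0.0
    alphas = []
    for n in range(1, n_iter + 1):
        y = x + math.exp(log_sigma) * rng.gauss(0.0, 1.0)
        lp_y = log_target(y)
        alpha = math.exp(min(0.0, lp_y - lp))  # observed acceptance probability
        if rng.random() < alpha:
            x, lp = y, lp_y
        alphas.append(alpha)
        # Grow the scale when accepting too often, shrink it otherwise.
        log_sigma += n ** (-2.0 / 3.0) * (alpha - target_accept)
    return math.exp(log_sigma), alphas


sigma, alphas = adaptive_scaling_metropolis(lambda z: -0.5 * z * z, 0.0, 20_000)
```

For a standard normal target the scale attaining 0.44 acceptance is roughly 2.4, so `sigma` should settle near that value.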
Submitted 5 April, 2011; v1 submitted 24 March, 2009;
originally announced March 2009.
-
Grapham: Graphical Models with Adaptive Random Walk Metropolis Algorithms
Authors:
Matti Vihola
Abstract:
Recently developed adaptive Markov chain Monte Carlo (MCMC) methods have been applied successfully to many problems in Bayesian statistics. Grapham is a new open source implementation covering several such methods, with emphasis on graphical models for directed acyclic graphs. The implemented algorithms include the seminal Adaptive Metropolis algorithm adjusting the proposal covariance according to the history of the chain and a Metropolis algorithm adjusting the proposal scale based on the observed acceptance probability. Different variants of the algorithms allow one, for example, to use these two algorithms together, employ delayed rejection and adjust several parameters of the algorithms. The implemented Metropolis-within-Gibbs update allows arbitrary sampling blocks. The software is written in C and uses the simple extension language Lua for configuration.
Submitted 2 September, 2009; v1 submitted 25 November, 2008;
originally announced November 2008.
-
On the ergodicity of the adaptive Metropolis algorithm on unbounded domains
Authors:
Eero Saksman,
Matti Vihola
Abstract:
This paper describes sufficient conditions to ensure the correct ergodicity of the Adaptive Metropolis (AM) algorithm of Haario, Saksman and Tamminen [Bernoulli 7 (2001) 223--242] for target distributions with a noncompact support. The conditions ensuring a strong law of large numbers require that the tails of the target density decay super-exponentially and have regular contours. The result is based on the ergodicity of an auxiliary process that is sequentially constrained to feasible adaptation sets, independent estimates of the growth rate of the AM chain and the corresponding geometric drift constants. The ergodicity result of the constrained process is obtained through a modification of the approach due to Andrieu and Moulines [Ann. Appl. Probab. 16 (2006) 1462--1505].
Submitted 11 November, 2010; v1 submitted 18 June, 2008;
originally announced June 2008.