Search | arXiv e-print repository

Stochastic Approximation Beyond Gradient for Signal Processing and Machine Learning

Authors: Aymeric Dieuleveut, Gersende Fort, Eric Moulines, Hoi-To Wai

Abstract: Stochastic Approximation (SA) is a classical algorithm that has had, since the early days, a huge impact on signal processing, and nowadays on machine learning, due to the necessity to deal with a large amount of data observed with uncertainties. An exemplar special case of SA pertains to the popular stochastic (sub)gradient algorithm, which is the workhorse behind many important applications. A lesser-known fact is that the SA scheme also extends to non-stochastic-gradient algorithms such as compressed stochastic gradient, stochastic expectation-maximization, and a number of reinforcement learning algorithms. The aim of this article is to overview and introduce the non-stochastic-gradient perspectives of SA to the signal processing and machine learning audiences by presenting a design guideline of SA algorithms backed by theories. Our central theme is to propose a general framework that unifies existing theories of SA, including its non-asymptotic and asymptotic convergence results, and to demonstrate their applications on popular non-stochastic-gradient algorithms. We build our analysis framework based on classes of Lyapunov functions that satisfy a variety of mild conditions. We draw connections between non-stochastic-gradient algorithms and scenarios where the Lyapunov function is smooth, convex, or strongly convex. Using the said framework, we illustrate the convergence properties of the non-stochastic-gradient algorithms using concrete examples.
Extensions to the emerging variance reduction techniques for improved sample complexity will also be discussed.

Submitted 16 July, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: Accepted for publication at IEEE Transactions on Signal Processing; 31 pages, 7 pages of supplementary materials
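For the "Stochastic Approximation Beyond Gradient" entry above, compressed stochastic gradient is one of the non-stochastic-gradient instances of SA it lists. A minimal sketch, assuming a random-k sparsification compressor and a toy objective $f(θ) = \tfrac12\|θ - c\|^2$ (both illustrative choices, not taken from the paper); `rand_k` and `compressed_sgd` are hypothetical names. Rescaling the kept coordinates by d/k keeps the compressed direction unbiased, so the recursion keeps the mean field $-\nabla f$ while only k coordinates move per iteration.

```python
import random


def rand_k(g, k, rng):
    """Hypothetical compressor (an assumption, not the paper's operator): random-k
    sparsification, keeping k coordinates at random and rescaling by d/k so E[C(g)] = g."""
    d = len(g)
    keep = set(rng.sample(range(d), k))
    return [g[i] * d / k if i in keep else 0.0 for i in range(d)]


def compressed_sgd(target, k, n_iter, rng):
    """SA recursion theta <- theta - gamma_n * C(grad f(theta)), f(theta) = 0.5*||theta - target||^2."""
    theta = [0.0] * len(target)
    for n in range(1, n_iter + 1):
        grad = [t - c for t, c in zip(theta, target)]  # exact gradient of the toy objective
        gamma = 1.0 / (n + 10)                         # decreasing step size
        direction = rand_k(grad, k, rng)               # compressed (sparsified) direction
        theta = [t - gamma * g for t, g in zip(theta, direction)]
    return theta


rng = random.Random(0)
target = [1.0, -2.0, 3.0, 0.5]
theta = compressed_sgd(target, k=1, n_iter=20000, rng=rng)
print([round(t, 2) for t in theta])
```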

arXiv:2301.00631

Stochastic Variable Metric Proximal Gradient with variance reduction for non-convex composite optimization

Authors: Gersende Fort, Eric Moulines

Abstract: This paper introduces a novel algorithm, the Perturbed Proximal Preconditioned SPIDER algorithm (3P-SPIDER), designed to solve finite-sum non-convex composite optimization problems. It is a stochastic Variable Metric Forward-Backward algorithm, which allows an approximate preconditioned forward operator and uses a variable metric proximity operator as the backward operator; it also proposes a mini-batch strategy with variance reduction to address the finite-sum setting. We show that 3P-SPIDER extends some stochastic preconditioned gradient descent-based algorithms and some Incremental Expectation Maximization algorithms to composite optimization and to the case where the forward operator cannot be computed in closed form. We also provide an explicit control of convergence in expectation of 3P-SPIDER, and study its complexity in order to satisfy the epsilon-approximate stationary condition. Our results are the first to combine the composite non-convex optimization setting, a variance reduction technique that tackles the finite-sum setting by using a mini-batch strategy, and deterministic or random approximations of the preconditioned forward operator. Finally, through an application to inference in a logistic regression model with random effects, we numerically compare 3P-SPIDER to other stochastic forward-backward algorithms and discuss the role of some design parameters of 3P-SPIDER.

Submitted 8 March, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

Comments: Statistics and Computing, In press
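The 3P-SPIDER entry above describes a variable metric forward-backward step. The sketch below shows only that step, under stated assumptions: the nonsmooth term is $λ\|x\|_1$ and the metric is diagonal, $D = \mathrm{diag}(d_1, \dots, d_p)$, in which case the variable metric proximity operator separates into per-coordinate soft-thresholding at $γλ/d_i$. It omits the SPIDER estimator, the mini-batches and the approximation of the forward operator; `vm_forward_backward_step` is a hypothetical name.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau*|.| in one dimension."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def vm_forward_backward_step(x, grad, metric, gamma, lam):
    """One variable-metric forward-backward step for f(x) + lam*||x||_1 with D = diag(metric).

    Assumption: diagonal metric and l1 penalty (illustrative, not the paper's general setting).
    forward:  v  = x - gamma * D^{-1} grad
    backward: x+ = argmin_u lam*||u||_1 + (1 / (2*gamma)) * (u - v)^T D (u - v),
              which separates into soft-thresholding of v_i at gamma*lam/d_i.
    """
    out = []
    for xi, gi, di in zip(x, grad, metric):
        vi = xi - gamma * gi / di
        out.append(soft_threshold(vi, gamma * lam / di))
    return out


# toy problem: f(x) = sum_i 0.5*a_i*(x_i - b_i)^2, metric chosen equal to the curvature a
a, b, lam = [1.0, 4.0], [3.0, 0.2], 1.0
x = [0.0, 0.0]
grad = [ai * (xi - bi) for ai, xi, bi in zip(a, x, b)]
x_next = vm_forward_backward_step(x, grad, metric=a, gamma=1.0, lam=lam)
print(x_next)  # [2.0, 0.0]
```

Because the metric matches the curvature of f in this toy case, one step from the origin already returns the exact minimizer, soft-thresholding of $b_i$ at $λ/a_i$; in general the metric only approximates the curvature and the step is iterated.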

arXiv:2203.09142

doi 10.1109/TSP.2023.3247142

Covid19 Reproduction Number: Credibility Intervals by Blockwise Proximal Monte Carlo Samplers

Authors: Gersende Fort, Barbara Pascal, Patrice Abry, Nelly Pustelnik

Abstract: Monitoring the Covid19 pandemic constitutes a critical societal stake that has received considerable research effort. The intensity of the pandemic on a given territory is efficiently measured by the reproduction number, quantifying the rate of growth of daily new infections. Recently, estimates for the time evolution of the reproduction number were produced using an inverse problem formulation with a nonsmooth functional minimization. While it was designed to be robust to the limited quality of the Covid19 data (outliers, missing counts), the procedure lacks the ability to output credibility-interval-based estimates. This remains a severe limitation for practical use in actual pandemic monitoring by epidemiologists, which the present work aims to overcome by use of Monte Carlo sampling. After interpretation of the nonsmooth functional in a Bayesian framework, several sampling schemes are tailored to handle the nonsmooth nature of the resulting posterior distribution. The originality of the devised algorithms stems from combining a Langevin Monte Carlo sampling scheme with proximal operators. The performance of the new algorithms in producing relevant credibility intervals for the reproduction number estimates and denoised counts is compared. Assessment is conducted on real daily new infection counts made available by Johns Hopkins University. The interest of the devised monitoring tools is illustrated on Covid19 data from several different countries.

Submitted 8 March, 2023; v1 submitted 17 March, 2022; originally announced March 2022.

Journal ref: IEEE Transactions on Signal Processing, In press
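The Covid19 credibility-interval entry above combines a Langevin Monte Carlo scheme with proximal operators. As one representative of that family, not necessarily one of the samplers devised in the paper, the sketch below implements the Moreau-Yosida regularized unadjusted Langevin algorithm (MYULA) on an assumed one-dimensional toy posterior $π(x) \propto \exp(-(x-μ)^2/2 - λ|x|)$, and reads a 95% credibility interval off the empirical quantiles of the chain. The step size `gamma`, smoothing parameter `lam_my`, burn-in and chain length are illustrative values.

```python
import math
import random


def soft_threshold(v, tau):
    """Proximal operator of tau*|.|."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def myula(mu, lam, gamma, lam_my, n_iter, rng, x0=0.0):
    """MYULA chain for the toy target pi(x) ∝ exp(-(x - mu)^2 / 2 - lam * |x|) (an assumption).

    The nonsmooth term lam*|x| is replaced by its Moreau-Yosida envelope with
    parameter lam_my, whose gradient is (x - prox_{lam_my * lam * |.|}(x)) / lam_my.
    """
    x = x0
    samples = []
    for _ in range(n_iter):
        grad_smooth = x - mu
        grad_envelope = (x - soft_threshold(x, lam_my * lam)) / lam_my
        noise = math.sqrt(2.0 * gamma) * rng.gauss(0.0, 1.0)
        x = x - gamma * (grad_smooth + grad_envelope) + noise
        samples.append(x)
    return samples


rng = random.Random(1)
chain = myula(mu=2.0, lam=1.0, gamma=0.01, lam_my=0.02, n_iter=200000, rng=rng)
kept = sorted(chain[10000:])                     # discard burn-in, sort for quantiles
post_mean = sum(kept) / len(kept)
lower = kept[int(0.025 * len(kept))]
upper = kept[int(0.975 * len(kept))]
print(f"posterior mean {post_mean:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```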

arXiv:2202.05497

Temporal evolution of the Covid19 pandemic reproduction number: Estimations from proximal optimization to Monte Carlo sampling

Authors: Patrice Abry, Gersende Fort, Barbara Pascal, Nelly Pustelnik

Abstract: Monitoring the evolution of the Covid19 pandemic constitutes a critical step in sanitary policy design. Yet, the assessment of the pandemic intensity within the pandemic period remains a challenging task because of the limited quality of data made available by public health authorities (notably missing data, outliers and pseudoseasonalities), which calls for cumbersome and ad hoc preprocessing (denoising) prior to estimation. Recently, the estimation of the reproduction number, a measure of the pandemic intensity, was formulated as an inverse problem, combining data-model fidelity and space-time regularity constraints, solved by nonsmooth convex proximal minimizations. Though promising, that formulation lacks robustness against the limited quality of the Covid19 data and lacks a confidence assessment. The present work aims to address both limitations. First, it discusses solutions to produce a robust assessment of the pandemic intensity by accounting for the low quality of the data directly within the inverse problem formulation. Second, exploiting a Bayesian interpretation of the inverse problem formulation, it devises a Monte Carlo sampling strategy, tailored to a nonsmooth log-concave a posteriori distribution, to produce relevant credibility-interval-based estimates for the Covid19 reproduction number.
Clinical relevance: Applied to daily counts of new infections made publicly available by the health authorities of around 200 countries, the proposed procedures permit robust assessments of the time evolution of the Covid19 pandemic intensity, updated automatically and on a daily basis.

Submitted 11 February, 2022; originally announced February 2022.

arXiv:2111.02083

Federated Expectation Maximization with heterogeneity mitigation and variance reduction

Authors: Aymeric Dieuleveut, Gersende Fort, Eric Moulines, Geneviève Robin

Abstract: The Expectation Maximization (EM) algorithm is the default algorithm for inference in latent variable models. As in any other field of machine learning, applications of latent variable models to very large datasets make the use of advanced parallel and distributed architectures mandatory. This paper introduces FedEM, which is the first extension of the EM algorithm to the federated learning context. FedEM is a new communication-efficient method, which handles partial participation of local devices and is robust to heterogeneous distributions of the datasets. To alleviate the communication bottleneck, FedEM compresses appropriately defined complete-data sufficient statistics. We also develop and analyze an extension of FedEM to further incorporate a variance reduction scheme. In all cases, we derive finite-time complexity bounds for smooth non-convex problems. Numerical results are presented to support our theoretical findings, as well as an application to federated missing values imputation for biodiversity monitoring.

Submitted 10 November, 2021; v1 submitted 3 November, 2021; originally announced November 2021.

Journal ref: NeurIPS 2021 - 35th Conference on Neural Information Processing Systems, Dec 2021, Sydney, Australia

arXiv:2105.11733

The perturbed prox-preconditioned spider algorithm: non-asymptotic convergence bounds

Authors: Gersende Fort, E Moulines

Abstract: A novel algorithm named Perturbed Prox-Preconditioned SPIDER (3P-SPIDER) is introduced. It is a stochastic variance-reduced proximal-gradient type algorithm built on the Stochastic Path Integral Differential EstimatoR (SPIDER), an algorithm known to achieve a near-optimal first-order oracle inequality for nonconvex and nonsmooth optimization. Compared to the vanilla prox-SPIDER, 3P-SPIDER uses preconditioned gradient estimators. Preconditioning can either be applied "explicitly" to a gradient estimator or be introduced "implicitly" as in applications to the EM algorithm. 3P-SPIDER also assumes that the preconditioned gradients may not be known in closed analytical form and therefore must be approximated, which adds an additional degree of perturbation. Studying the convergence in expectation, we show that 3P-SPIDER achieves a near-optimal oracle inequality O(n^(1/2)/epsilon), where n is the number of observations and epsilon the target precision, even when the gradient is estimated by Monte Carlo methods. We illustrate the algorithm on an application to the minimization of a penalized empirical loss.

Submitted 25 May, 2021; originally announced May 2021.

Journal ref: IEEE Statistical Signal Processing Workshop, Jul 2021, Rio de Janeiro, Brazil
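The 3P-SPIDER entry above builds on the SPIDER estimator. A minimal sketch of that estimator alone, on an assumed one-dimensional finite sum $f(x) = \frac{1}{n}\sum_i \tfrac12 c_i (x - a_i)^2$: every `q` iterations the full gradient is recomputed, and in between the running estimate is corrected by mini-batch gradient differences $\nabla f_i(x_t) - \nabla f_i(x_{t-1})$. Preconditioning, the proximal step and the Monte Carlo approximations of 3P-SPIDER are not shown; `spider_minimize`, the constant step `gamma`, `q` and `batch` are hypothetical, illustrative choices.

```python
import random


def spider_minimize(c, a, gamma, q, batch, n_iter, rng, x0=0.0):
    """Gradient descent driven by a SPIDER-type estimator of grad f, f(x) = mean_i 0.5*c_i*(x - a_i)^2."""
    n = len(c)

    def grad_i(i, x):
        return c[i] * (x - a[i])

    x_prev, x, v = x0, x0, 0.0
    for t in range(n_iter):
        if t % q == 0:
            # periodic refresh: full gradient over the n terms
            v = sum(grad_i(i, x) for i in range(n)) / n
        else:
            # path-integrated correction from a mini-batch drawn with replacement
            idx = [rng.randrange(n) for _ in range(batch)]
            v += sum(grad_i(i, x) - grad_i(i, x_prev) for i in idx) / batch
        x_prev, x = x, x - gamma * v
    return x


rng = random.Random(0)
c = [rng.uniform(0.5, 2.0) for _ in range(100)]
a = [rng.uniform(-1.0, 1.0) for _ in range(100)]
x_star = sum(ci * ai for ci, ai in zip(c, a)) / sum(c)   # exact minimizer of the toy finite sum
x_hat = spider_minimize(c, a, gamma=0.1, q=10, batch=10, n_iter=500, rng=rng)
print(abs(x_hat - x_star))
```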

arXiv:2105.11732

The Perturbed Prox-Preconditioned SPIDER algorithm for EM-based large scale learning

Authors: Gersende Fort, Eric Moulines

Abstract: Incremental Expectation Maximization (EM) algorithms were introduced to design EM for the large-scale learning framework by avoiding processing the full data set at each iteration. Nevertheless, these algorithms all assume that the conditional expectations of the sufficient statistics are explicit. In this paper, we propose a novel algorithm named Perturbed Prox-Preconditioned SPIDER (3P-SPIDER), which builds on the Stochastic Path Integral Differential EstimatoR EM (SPIDER-EM) algorithm. The 3P-SPIDER algorithm addresses many intractabilities of the E-step of EM; it also deals with non-smooth regularization and convex constraint sets. Numerical experiments show that 3P-SPIDER outperforms other incremental EM methods, and the role of some design parameters is discussed.

Submitted 25 May, 2021; originally announced May 2021.

Journal ref: IEEE Statistical Signal Processing Workshop, Jul 2021, Rio de Janeiro, Brazil

arXiv:2012.14670

Fast Incremental Expectation Maximization for finite-sum optimization: nonasymptotic convergence

Authors: Gersende Fort, P. Gach, E. Moulines

Abstract: Fast Incremental Expectation Maximization (FIEM) is a version of the EM framework for large datasets. In this paper, we first recast FIEM and other incremental EM type algorithms in the Stochastic Approximation within EM framework. Then, we provide nonasymptotic bounds for the convergence in expectation as a function of the number of examples $n$ and of the maximal number of iterations $k_{\max}$. We propose two strategies for achieving an $ε$-approximate stationary point, respectively with $k_{\max} = O(n^{2/3}/ε)$ and $k_{\max} = O(\sqrt{n}/ε^{3/2})$, both strategies relying on a random termination rule before $k_{\max}$ and on a constant step size in the Stochastic Approximation step. Our bounds provide some improvements on the literature. First, they allow $k_{\max}$ to scale as $\sqrt{n}$, which is better than $n^{2/3}$, the best rate obtained so far; this comes at the cost of a larger dependence upon the tolerance $ε$, thus making this control relevant for small to medium accuracy with respect to the number of examples $n$. Second, for the $n^{2/3}$-rate, the numerical illustrations show that, thanks to an optimized choice of the step size and of the bounds in terms of quantities characterizing the optimization problem at hand, our results design a less conservative choice of the step size and provide a better control of the convergence in expectation.

Submitted 29 December, 2020; originally announced December 2020.
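The FIEM entry above recasts incremental EM algorithms as Stochastic Approximation within EM: the iterates live in the space of expected sufficient statistics, and the M-step maps them back to parameters. The sketch below shows the plainest instance of that template, an online EM with one example per iteration and a decreasing step size, on an assumed two-component, unit-variance, equal-weight Gaussian mixture with unknown means. It is not FIEM itself, which replaces the single-example term with a variance-reduced estimator and uses a constant step size with a random termination rule; `sa_within_em`, the model, the data and the step sizes are illustrative.

```python
import math
import random


def responsibilities(x, mu1, mu2):
    """E-step for one example: posterior probabilities of the two components."""
    l1 = math.exp(-0.5 * (x - mu1) ** 2)
    l2 = math.exp(-0.5 * (x - mu2) ** 2)
    return l1 / (l1 + l2), l2 / (l1 + l2)


def m_step(s):
    """Map the statistics (s0_1, s1_1, s0_2, s1_2) to the component means."""
    return s[1] / s[0], s[3] / s[2]


def sa_within_em(data, s_init, n_iter, rng):
    """Hypothetical online-EM sketch: SA on sufficient statistics, not the FIEM estimator."""
    s = list(s_init)
    for k in range(1, n_iter + 1):
        mu1, mu2 = m_step(s)
        x = data[rng.randrange(len(data))]              # one example drawn at random
        r1, r2 = responsibilities(x, mu1, mu2)
        target = (r1, r1 * x, r2, r2 * x)               # its conditional sufficient statistics
        gamma = (k + 10) ** -0.7                         # decreasing step size
        s = [si + gamma * (ti - si) for si, ti in zip(s, target)]
    return m_step(s)


rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) if rng.random() < 0.5 else rng.gauss(2.0, 1.0) for _ in range(1000)]
mu1, mu2 = sa_within_em(data, s_init=(0.5, -0.5, 0.5, 0.5), n_iter=20000, rng=rng)
print(round(mu1, 2), round(mu2, 2))
```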

arXiv:2012.01929

A Stochastic Path-Integrated Differential EstimatoR Expectation Maximization Algorithm

Authors: Gersende Fort, Eric Moulines, Hoi-To Wai

Abstract: The Expectation Maximization (EM) algorithm is of key importance for inference in latent variable models, including mixtures of regressors and experts, and missing observations. This paper introduces a novel EM algorithm, called SPIDER-EM, for inference from a training set of size $n$, $n \gg 1$. At the core of our algorithm is an estimator of the full conditional expectation in the E-step, adapted from the stochastic path-integrated differential estimator (SPIDER) technique. We derive finite-time complexity bounds for smooth non-convex likelihood: we show that for convergence to an $ε$-approximate stationary point, the complexity scales as $K_{\operatorname{Opt}}(n,ε) = {\cal O}(ε^{-1})$ and $K_{\operatorname{CE}}(n,ε) = n + \sqrt{n}\, {\cal O}(ε^{-1})$, where $K_{\operatorname{Opt}}(n,ε)$ and $K_{\operatorname{CE}}(n,ε)$ are respectively the number of M-steps and the number of per-sample conditional expectation evaluations. This improves over the state-of-the-art algorithms. Numerical results support our findings.

Submitted 30 November, 2020; originally announced December 2020.

Journal ref: Proceedings of the Conference on Neural Information Processing Systems (NeurIPS 2020), 2020

arXiv:2011.12392

Geom-SPIDER-EM: Faster Variance Reduced Stochastic Expectation Maximization for Nonconvex Finite-Sum Optimization

Authors: Gersende Fort, Eric Moulines, Hoi-To Wai

Abstract: The Expectation Maximization (EM) algorithm is a key reference for inference in latent variable models; unfortunately, its computational cost is prohibitive in the large-scale learning setting. In this paper, we propose an extension of the Stochastic Path-Integrated Differential EstimatoR EM (SPIDER-EM) and derive complexity bounds for this novel algorithm, designed to solve smooth nonconvex finite-sum optimization problems. We show that it reaches the same state-of-the-art complexity bounds as SPIDER-EM, and we provide conditions for a linear rate of convergence. Numerical results support our findings.

Submitted 24 November, 2020; originally announced November 2020.

Comments: Submitted to an international conference with a reviewing process

arXiv:1704.08891

Stochastic Proximal Gradient Algorithms for Penalized Mixed Models

Authors: Gersende Fort, Edouard Ollier, Adeline Samson

Abstract: Motivated by penalized likelihood maximization in complex models, we study optimization problems where neither the function to optimize nor its gradient has an explicit expression, but its gradient can be approximated by a Monte Carlo technique. We propose a new algorithm based on a stochastic approximation of the Proximal-Gradient (PG) algorithm. This new algorithm, named Stochastic Approximation PG (SAPG), is the combination of a stochastic gradient descent step which, roughly speaking, computes a smoothed approximation of the past gradient along the iterations, and a proximal step. The choice of the step size and the Monte Carlo batch size for the stochastic gradient descent step in SAPG is discussed. Our convergence results cover the cases of biased and unbiased Monte Carlo approximations. While the convergence analysis of the Monte Carlo-PG is already addressed in the literature (see Atchadé et al. [2016]), the convergence analysis of SAPG is new. The two algorithms are compared on a linear mixed effect model as a toy example. A more challenging application is proposed on non-linear mixed effect models in high dimension with a pharmacokinetic data set including genomic covariates. To the best of our knowledge, our work provides the first convergence result of a numerical method designed to solve penalized Maximum Likelihood in a non-linear mixed effect model.

Submitted 27 September, 2017; v1 submitted 28 April, 2017; originally announced April 2017.
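The SAPG entry above combines a stochastic approximation step, which smooths the Monte Carlo gradient estimates along the iterations, with a proximal step. A minimal sketch, assuming a one-dimensional toy problem $\min_θ \tfrac12(θ - b)^2 + λ|θ|$ whose gradient is only observed through a Monte Carlo average of `batch` noisy draws (Gaussian noise stands in for the Monte Carlo error); `sapg` and the step-size sequences `gamma` and `delta` are illustrative choices, not the ones analysed in the paper. For $b > λ$ the exact minimizer is $b - λ$.

```python
import random


def soft_threshold(v, tau):
    """Proximal operator of tau*|.|."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def sapg(b, lam, batch, n_iter, rng):
    """SAPG-style iterations for min_theta 0.5*(theta - b)^2 + lam*|theta| (toy problem, an assumption)."""
    theta, h = 0.0, 0.0
    for n in range(1, n_iter + 1):
        # Monte Carlo estimate of grad f(theta) = theta - b; the Gaussian noise mimics MC error
        g = sum(theta - b + rng.gauss(0.0, 1.0) for _ in range(batch)) / batch
        delta = n ** -0.6                 # weight of the new estimate in the smoothed gradient
        gamma = 0.5 * n ** -0.3           # proximal-gradient step size
        h = (1.0 - delta) * h + delta * g                         # stochastic approximation step
        theta = soft_threshold(theta - gamma * h, gamma * lam)    # proximal step
    return theta


rng = random.Random(0)
theta_hat = sapg(b=3.0, lam=1.0, batch=10, n_iter=20000, rng=rng)
print(round(theta_hat, 2))
```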

arXiv:1610.09194

Convergence and efficiency of adaptive importance sampling techniques with partial biasing

Authors: Gersende Fort, Benjamin Jourdain, Tony Lelièvre, Gabriel Stoltz

Abstract: We consider a generalization of the discrete-time Self Healing Umbrella Sampling method, which is an adaptive importance sampling technique useful to sample multimodal target distributions. The importance function is based on the weights (namely the relative probabilities) of disjoint sets which form a partition of the space. These weights are unknown but are learnt on the fly, yielding an adaptive algorithm. In the context of computational statistical physics, the logarithm of these weights is, up to a multiplicative constant, the free energy, and the discrete-valued function defining the partition is called the collective variable. The algorithm falls into the general class of Wang-Landau type methods, and is a generalization of the original Self Healing Umbrella Sampling method in two ways: (i) the updating strategy leads to a larger penalization strength of already visited sets in order to escape more quickly from metastable states, and (ii) the target distribution is biased using only a fraction of the free energy, in order to increase the effective sample size and reduce the variance of importance sampling estimators. The algorithm can also be seen as a generalization of well-tempered metadynamics. We prove the convergence of the algorithm and numerically analyze its efficiency on a toy example.

Submitted 1 September, 2017; v1 submitted 28 October, 2016; originally announced October 2016.

arXiv:1510.03638

Spatial Prediction Under Location Uncertainty In Cellular Networks

Authors: Hajer Braham, Sana Ben Jemaa, Gersende Fort, Eric Moulines, Berna Sayrac

Abstract: Coverage optimization is an important process for the operator, as it is a crucial prerequisite towards offering a satisfactory quality of service to the end users. The first step of this process is coverage prediction, which can be performed by interpolating geo-located measurements reported to the network by mobile user equipment. In previous works, we proposed a low-complexity coverage prediction algorithm based on the adaptation of the geostatistics Fixed Rank Kriging (FRK) algorithm. We supposed that the geo-location information reported with the radio measurements was perfect, which is not the case in reality. In this paper, we study the impact of location uncertainty on the coverage prediction accuracy and we extend the previously proposed algorithm to include geo-location error in the prediction model. We validate the proposed algorithm using both simulated and real field measurements. The FRK extended to take into account the location uncertainty proves to enhance the prediction accuracy while keeping a reasonable computational complexity.

Submitted 14 March, 2016; v1 submitted 13 October, 2015; originally announced October 2015.

arXiv:1505.07062

Fixed Rank Kriging for Cellular Coverage Analysis

Authors: Hajer Braham, Sana Ben Jemaa, Gersende Fort, Eric Moulines, Berna Sayrac

Abstract: Coverage planning and optimization is one of the most crucial tasks for a radio network operator. Efficient coverage optimization requires accurate coverage estimation. This estimation relies on geo-located field measurements which are gathered today during highly expensive drive tests (DT), and will be reported in the near future by users' mobile devices thanks to the 3GPP Minimizing Drive Tests (MDT) feature~\cite{3GPPproposal}. This feature consists of automatic reporting of the radio measurements associated with the geographic location of the user's mobile device. Such a solution is still costly in terms of battery consumption and signaling overhead. Therefore, predicting the coverage at a location where no measurements are available remains a key and challenging task. This paper describes a powerful tool that gives an accurate coverage prediction on the whole area of interest: it builds a coverage map by spatially interpolating geo-located measurements using the Kriging technique. The paper focuses on the reduction of the computational complexity of the Kriging algorithm by applying Fixed Rank Kriging (FRK). The performance evaluation of the FRK algorithm on both simulated and real field measurements shows a good trade-off between prediction efficiency and computational complexity. In order to go a step further towards the operational application of the proposed algorithm, a multicellular use case is studied. Simulation results show good performance in terms of coverage prediction and detection of the best serving cell.

Submitted 14 March, 2016; v1 submitted 26 May, 2015; originally announced May 2015.
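The Fixed Rank Kriging entries above reduce the cost of Kriging through a low-rank covariance model. Under the usual FRK model, the covariance of the n measurements is $Σ = S K S^\top + σ^2 I$, where the n × r matrix S holds r spatial basis functions evaluated at the measurement locations and K is r × r; the Sherman-Morrison-Woodbury identity then gives $Σ^{-1} = σ^{-2} I - σ^{-4} S (K^{-1} + σ^{-2} S^\top S)^{-1} S^\top$, so only r × r systems need to be solved. The pure-Python sketch below applies this identity in a zero-mean (simple Kriging) predictor $\hat y(s_0) = s_0^\top K S^\top Σ^{-1} y$. The function names are hypothetical, and the basis values, K, σ² and y are toy numbers, not measurements or the papers' basis functions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= f * aug[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))) / aug[r][r]
    return x


def inverse(A):
    """Inverse of a small matrix, column by column."""
    n = len(A)
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def apply_sigma_inv(y, S, K, sigma2):
    """Return Sigma^{-1} y for Sigma = S K S^T + sigma2 * I via the Woodbury identity."""
    n, r = len(S), len(K)
    k_inv = inverse(K)
    m = [[k_inv[i][j] + sum(S[t][i] * S[t][j] for t in range(n)) / sigma2 for j in range(r)]
         for i in range(r)]
    st_y = [sum(S[t][i] * y[t] for t in range(n)) for i in range(r)]
    z = solve(m, st_y)                           # r x r solve instead of n x n
    return [y[t] / sigma2 - sum(S[t][i] * z[i] for i in range(r)) / sigma2 ** 2 for t in range(n)]


def frk_predict(s0, y, S, K, sigma2):
    """Simple-Kriging prediction s0^T K S^T Sigma^{-1} y at a location with basis vector s0."""
    w = apply_sigma_inv(y, S, K, sigma2)
    r = len(K)
    st_w = [sum(S[t][i] * w[t] for t in range(len(S))) for i in range(r)]
    k_st_w = [sum(K[i][j] * st_w[j] for j in range(r)) for i in range(r)]
    return sum(s0[i] * k_st_w[i] for i in range(r))


# toy example: 5 measurement locations, r = 2 basis functions
S = [[1.0, 0.0], [0.8, 0.2], [0.5, 0.5], [0.2, 0.8], [0.0, 1.0]]
K = [[2.0, 0.5], [0.5, 1.0]]
y = [1.0, 0.7, 0.1, -0.4, -0.9]
print(frk_predict([0.6, 0.4], y, S, K, sigma2=0.5))
```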

arXiv:1410.6956

Success and Failure of Adaptation-Diffusion Algorithms for Consensus in Multi-Agent Networks

Authors: Gemma Morral, Pascal Bianchi, Gersende Fort

Abstract: This paper investigates the problem of distributed stochastic approximation in multi-agent systems. The algorithm under study consists of two steps: a local stochastic approximation step and a diffusion step which drives the network to a consensus. The diffusion step uses row-stochastic matrices to weight the network exchanges. As opposed to previous works, exchange matrices are not supposed to be doubly stochastic, and may also depend on the past estimate. We prove that non-doubly-stochastic matrices generally influence the limit points of the algorithm. Nevertheless, the limit points are not affected by the choice of the matrices provided that the latter are doubly stochastic in expectation. This conclusion justifies the use of broadcast-like diffusion protocols, which are easier to implement. Next, by means of a central limit theorem, we prove that doubly stochastic protocols perform asymptotically as well as centralized algorithms, and we quantify the degradation caused by the use of non-doubly-stochastic matrices. Throughout the paper, special emphasis is placed on the case of distributed non-convex optimization as an illustration of our results.

Submitted 25 October, 2014; originally announced October 2014.

Comments: 13 pages, 4 figures
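The adaptation-diffusion entry above alternates a local SA step with a diffusion step driven by a row-stochastic matrix W. A minimal sketch, assuming three agents with local objectives $f_i(θ) = \tfrac12(θ - a_i)^2$ and noise-free gradients so that the run is deterministic: in this example, a doubly stochastic W makes the agents agree on the mean of the $a_i$, the minimizer of the average objective, whereas a row-stochastic but not doubly stochastic W makes them agree on $\sum_i π_i a_i$, where π is the left Perron vector of W ($π^\top W = π^\top$). The function name, matrices and targets are illustrative.

```python
def adapt_then_diffuse(W, a, n_iter):
    """Hypothetical sketch: local gradient step on f_i(theta) = 0.5*(theta - a_i)^2, then theta <- W psi."""
    m = len(a)
    theta = [0.0] * m
    for n in range(1, n_iter + 1):
        gamma = 1.0 / (n + 1)                                    # decreasing step size
        psi = [t - gamma * (t - ai) for t, ai in zip(theta, a)]  # local (adaptation) step
        theta = [sum(W[i][j] * psi[j] for j in range(m)) for i in range(m)]  # diffusion step
    return theta


a = [0.0, 3.0, 9.0]
W_doubly = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
W_row = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]   # columns sum to 0.75, 1.5, 0.75
print(adapt_then_diffuse(W_doubly, a, 10000))   # consensus near mean(a) = 4.0
print(adapt_then_diffuse(W_row, a, 10000))      # consensus near 0.25*0 + 0.5*3 + 0.25*9 = 3.75
```

Here π = (0.25, 0.5, 0.25) for `W_row`; because the diffusion step preserves the π-weighted average and every $f_i$ has the same curvature, that average contracts towards $π^\top a$ at each iteration.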

arXiv:1410.2109

Self-Healing Umbrella Sampling: Convergence and efficiency

Authors: G. Fort, B. Jourdain, T. Lelievre, G. Stoltz

Abstract: The Self-Healing Umbrella Sampling (SHUS) algorithm is an adaptive biasing algorithm which has been proposed to efficiently sample a multimodal probability measure. We show that this method can be seen as a variant of the well-known Wang-Landau algorithm. Adapting results on the convergence of the Wang-Landau algorithm, we prove the convergence of the SHUS algorithm. We also compare the two methods in terms of efficiency. We finally propose a modification of the SHUS algorithm in order to increase its efficiency, and exhibit some similarities of SHUS with the well-tempered metadynamics method.

Submitted 8 October, 2014; originally announced October 2014.
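The SHUS entries above view Self-Healing Umbrella Sampling as a variant of the Wang-Landau algorithm, in which the weights θ(i) of the strata of a partition are learnt on the fly while sampling from the biased target $π_θ(x) \propto π(x)/θ(I(x))$. The sketch below uses a generic Wang-Landau-type update, $θ(i) \leftarrow θ(i)\,(1 + γ_n \mathbb{1}\{I(X) = i\}) / (1 + γ_n θ(I(X)))$, which penalizes the stratum just visited and keeps the weights summing to one; it is not the exact SHUS or partial-biasing update of the papers. To keep it short and its output stable, X is drawn exactly from $π_θ$ on an assumed 12-state space with three strata, where the papers use an MCMC move; `wang_landau_weights`, the target and the step sizes $γ_n = n^{-0.7}$ are illustrative.

```python
import math
import random


def wang_landau_weights(log_pi, strata, n_strata, n_iter, rng):
    """Estimate the stratum weights theta*(i) = pi(X_i) with a generic Wang-Landau-type update."""
    pi = [math.exp(v) for v in log_pi]                # unnormalized target
    states = range(len(pi))
    theta = [1.0 / n_strata] * n_strata
    for n in range(1, n_iter + 1):
        # biased target pi_theta(x) ∝ pi(x) / theta(I(x)); exact sampling replaces MCMC (assumption)
        w = [pi[x] / theta[strata[x]] for x in states]
        x = rng.choices(states, weights=w)[0]
        i = strata[x]
        gamma = n ** -0.7
        denom = 1.0 + gamma * theta[i]
        # penalize the visited stratum; dividing by denom keeps sum(theta) == 1
        theta = [t * (1.0 + (gamma if j == i else 0.0)) / denom for j, t in enumerate(theta)]
    return theta


log_pi = [-0.25 * x for x in range(12)]
strata = [x // 4 for x in range(12)]                 # three strata of four states each
theta_hat = wang_landau_weights(log_pi, strata, 3, 50000, random.Random(0))
print([round(t, 3) for t in theta_hat])
```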

arXiv:1403.6803

Convergence of Markovian Stochastic Approximation with discontinuous dynamics

Authors: Gersende Fort, Eric Moulines, Amandine Schreck, Matti Vihola

Abstract: This paper is devoted to the convergence analysis of stochastic approximation algorithms of the form $θ_{n+1} = θ_n + γ_{n+1} H_{θ_n}(X_{n+1})$, where $\{θ_n, n \geq 0\}$ is an $\mathbb{R}^d$-valued sequence, $\{γ_n, n \geq 0\}$ is a deterministic step-size sequence and $\{X_n, n \geq 0\}$ is a controlled Markov chain. We study the convergence under weak assumptions on the smoothness-in-$θ$ of the function $θ \mapsto H_θ(x)$. It is usually assumed that this function is continuous for any $x$; in this work, we relax this condition. Our results are illustrated by considering stochastic approximation algorithms for (adaptive) quantile estimation and a penalized version of vector quantization.

Submitted 26 January, 2016; v1 submitted 26 March, 2014; originally announced March 2014.
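Quantile estimation is a concrete case where the field is discontinuous in $θ$. For the $α$-quantile, one can take $H_θ(x) = α - \mathbf{1}\{x \le θ\}$, whose mean field $α - F(θ)$ vanishes at the quantile. The following is a minimal sketch only. It assumes i.i.d. standard normal draws instead of a controlled Markov chain, and uses an illustrative step size $γ_n = n^{-0.7}$. The name `sa_quantile` is hypothetical.

```python
import numpy as np

def sa_quantile(samples, alpha, theta0=0.0, a=0.7):
    """Robbins-Monro estimate of the alpha-quantile of the law of `samples`.

    Iterates theta_{n+1} = theta_n + gamma_{n+1} * H_theta(X_{n+1}) with the
    field H_theta(x) = alpha - 1{x <= theta}, which is discontinuous in theta.
    """
    theta = theta0
    for n, x in enumerate(samples, start=1):
        gamma = n ** (-a)  # deterministic steps: sum gamma = inf, sum gamma^2 < inf
        theta += gamma * (alpha - (1.0 if x <= theta else 0.0))
    return theta

rng = np.random.default_rng(0)
est = sa_quantile(rng.standard_normal(100_000), alpha=0.9)
print(est)  # should be near the N(0, 1) 0.9-quantile, approx. 1.2816
```

The indicator makes $θ \mapsto H_θ(x)$ jump at $θ = x$. This is exactly the lack of continuity that the paper's assumptions are designed to tolerate.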

arXiv:1402.4577

Subgeometric rates of convergence in Wasserstein distance for Markov chains

Authors: Alain Durmus, Gersende Fort, Eric Moulines

Abstract: In this paper, we provide sufficient conditions for the existence of the invariant distribution and for subgeometric rates of convergence in Wasserstein distance for general state-space Markov chains which are (possibly) not irreducible. Compared to previous work, our approach is based on a purely probabilistic coupling construction which allows us to retrieve rates of convergence matching those previously reported for convergence in total variation. Our results are applied to establish the subgeometric ergodicity in Wasserstein distance of non-linear autoregressive models and of the pre-conditioned Crank-Nicolson Markov chain Monte Carlo algorithm in Hilbert space.

Submitted 14 July, 2015; v1 submitted 19 February, 2014; originally announced February 2014.

arXiv:1402.2365

On perturbed proximal gradient algorithms

Authors: Yves F. Atchade, Gersende Fort, Eric Moulines

Abstract: We study a version of the proximal gradient algorithm for which the gradient is intractable and is approximated by Monte Carlo methods (and in particular Markov Chain Monte Carlo). We derive conditions on the step size and the Monte Carlo batch size under which convergence is guaranteed: both increasing batch size and constant batch size are considered. We also derive non-asymptotic bounds for an averaged version. Our results cover both the cases of biased and unbiased Monte Carlo approximation. To support our findings, we discuss the inference of a sparse generalized linear model with random effect and the problem of learning the edge structure and parameters of sparse undirected graphical models.

Submitted 19 November, 2016; v1 submitted 10 February, 2014; originally announced February 2014.

Comments: 33 pages, 5 figures

MSC Class: 60F15; 60G42
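The perturbed proximal gradient iteration alternates two steps. First, a gradient step uses a Monte Carlo estimate of the intractable gradient. Second, the proximal map of the non-smooth penalty is applied. The sketch below makes several simplifying assumptions:

- It uses a toy $\ell_1$-penalized least-squares problem.
- Fresh i.i.d. samples stand in for the MCMC draws discussed in the paper.
- The batch size grows as $m_n = 20n$.
- The step size is constant, $γ = 0.5$.

All names are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def perturbed_prox_grad(theta_star, lam, gamma=0.5, n_iter=100, sigma=0.5, seed=0):
    """Minimize E[(y - x^T theta)^2] / 2 + lam * ||theta||_1 with noisy gradients."""
    rng = np.random.default_rng(seed)
    d = theta_star.size
    theta = np.zeros(d)
    for n in range(1, n_iter + 1):
        m = 20 * n  # increasing Monte Carlo batch size
        X = rng.standard_normal((m, d))
        y = X @ theta_star + sigma * rng.standard_normal(m)
        grad_hat = X.T @ (X @ theta - y) / m  # Monte Carlo estimate of the gradient
        theta = soft_threshold(theta - gamma * grad_hat, gamma * lam)  # proximal step
    return theta

theta_star = np.array([2.0, 0.0, -1.0, 0.0, 0.0])
theta_hat = perturbed_prox_grad(theta_star, lam=0.1)
# Here E[x x^T] = I, so the exact minimizer is soft_threshold(theta_star, lam)
print(theta_hat, soft_threshold(theta_star, 0.1))
```

Because the toy design has identity covariance, the exact solution is known in closed form. This makes it easy to check that the Monte Carlo perturbation does not prevent convergence when the batch size increases.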

arXiv:1312.5658

A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection

Authors: Amandine Schreck, Gersende Fort, Sylvain Le Corff, Eric Moulines

Abstract: This paper introduces a new Markov Chain Monte Carlo method for Bayesian variable selection in high-dimensional settings. The algorithm is a Hastings-Metropolis sampler whose proposal mechanism combines a Metropolis Adjusted Langevin (MALA) step, which proposes local moves, with a shrinkage-thresholding step, which proposes new models. The geometric ergodicity of this new trans-dimensional Markov Chain Monte Carlo sampler is established. An extensive numerical experiment on simulated and real data illustrates the performance of the proposed algorithm in comparison with more classical trans-dimensional algorithms.

Submitted 11 September, 2015; v1 submitted 19 December, 2013; originally announced December 2013.
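The MALA ingredient mentioned in the abstract can be illustrated in isolation. The sketch below is a generic fixed-dimension Metropolis adjusted Langevin step with step size `h`, run on a toy standard Gaussian target. The shrinkage-thresholding, model-changing part of the proposal and its trans-dimensional acceptance ratio are not reproduced. The names `mala_step`, `log_pi` and `grad_log_pi` are hypothetical.

```python
import numpy as np

def mala_step(x, log_pi, grad_log_pi, h, rng):
    """One Metropolis adjusted Langevin step (fixed dimension)."""
    def log_q(to, frm):
        # log density (up to a constant) of the Langevin proposal frm -> to
        mean = frm + 0.5 * h * grad_log_pi(frm)
        return -np.sum((to - mean) ** 2) / (2.0 * h)

    y = x + 0.5 * h * grad_log_pi(x) + np.sqrt(h) * rng.standard_normal(x.shape)
    log_alpha = log_pi(y) + log_q(x, y) - log_pi(x) - log_q(y, x)
    if np.log(rng.uniform()) < log_alpha:
        return y, True
    return x, False

def log_pi(z):
    return -0.5 * np.sum(z ** 2)  # standard Gaussian target, up to a constant

def grad_log_pi(z):
    return -z

rng = np.random.default_rng(1)
x = np.zeros(2)
draws = []
for _ in range(20_000):
    x, _ = mala_step(x, log_pi, grad_log_pi, h=0.5, rng=rng)
    draws.append(x)
draws = np.array(draws)
print(draws.mean(axis=0), draws.var(axis=0))
```

The Metropolis correction uses the asymmetric Langevin proposal density in both directions. Without it, the chain would target a biased discretization rather than $π$.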

arXiv:1310.6550

Efficiency of the Wang-Landau algorithm: a simple test case

Authors: G. Fort, B. Jourdain, E. Kuhn, T. Lelièvre, G. Stoltz

Abstract: We analyze the efficiency of the Wang-Landau algorithm for sampling a multimodal distribution on a prototypical simple test case. We show that the exit time from a metastable state is much smaller for the Wang-Landau dynamics than for the original standard Metropolis-Hastings algorithm, in some asymptotic regime. Our results are confirmed by numerical experiments on a more realistic test case.

Submitted 8 February, 2014; v1 submitted 24 October, 2013; originally announced October 2013.

arXiv:1309.3116

Central Limit Theorems for Stochastic Approximation with controlled Markov chain dynamics

Authors: Gersende Fort

Abstract: This paper provides a Central Limit Theorem (CLT) for a process $\{θ_n, n\geq 0\}$ satisfying a stochastic approximation (SA) equation of the form $θ_{n+1} = θ_n + γ_{n+1} H(θ_n,X_{n+1})$; a CLT for the associated average sequence is also established. The originality of this paper is to address the case of controlled Markov chain dynamics $\{X_n, n\geq 0 \}$ and the case of multiple targets. The framework also accommodates (randomly) truncated SA algorithms. Sufficient conditions for CLTs to hold are provided, as well as comments on how these conditions extend previous works (such as independent and identically distributed dynamics, the Robbins-Monro dynamic or the single target case). The paper places special emphasis on how these conditions hold for SA with controlled Markov chain dynamics and multiple targets, and it is proved that the resulting conditions improve on existing works.

Submitted 12 September, 2013; originally announced September 2013.
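A CLT for the averaged sequence is what makes Polyak-Ruppert averaging attractive. The averaged iterate fluctuates at rate $1/\sqrt{n}$ with the optimal variance, while the last iterate fluctuates at the slower rate $\sqrt{γ_n}$.

The Monte Carlo sketch below illustrates this in the simplest setting. It assumes i.i.d. dynamics and a single target rather than the paper's controlled Markov chains and multiple targets. The field is $H(θ, x) = x - θ$ (estimating a mean), with $γ_n = n^{-0.7}$. The function name is hypothetical.

```python
import numpy as np

def sa_with_average(mu, sigma, n, reps, a=0.7, seed=0):
    """Run `reps` independent SA recursions theta_k = theta_{k-1} + gamma_k (X_k - theta_{k-1}),
    X_k ~ N(mu, sigma^2) i.i.d., gamma_k = k**(-a).
    Returns the last iterates and the averages (1/n) sum_k theta_k."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(reps)
    avg = np.zeros(reps)
    for k in range(1, n + 1):
        x = mu + sigma * rng.standard_normal(reps)
        theta += k ** (-a) * (x - theta)
        avg += (theta - avg) / k  # running mean of theta_1, ..., theta_k
    return theta, avg

n, sigma, mu = 20_000, 2.0, 1.0
last, avg = sa_with_average(mu, sigma, n, reps=1_000)
# n * MSE / sigma^2: approaches 1 for the average as n grows (optimal variance),
# while for the last iterate it grows roughly like n**(1 - a) / 2
ratio_avg = n * np.mean((avg - mu) ** 2) / sigma**2
ratio_last = n * np.mean((last - mu) ** 2) / sigma**2
print(ratio_avg, ratio_last)
```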

arXiv:1309.0622

Quantitative convergence rates for sub-geometric Markov chains

Authors: Christophe Andrieu, Gersende Fort, Matti Vihola

Abstract: We provide explicit expressions for the constants involved in the characterisation of ergodicity of sub-geometric Markov chains. The constants are determined in terms of those appearing in the assumed drift and one-step minorisation conditions. The result is fundamental for the study of some algorithms where uniform bounds for these constants are needed for a family of Markov kernels. Our result also accommodates some classes of inhomogeneous chains.

Submitted 17 March, 2014; v1 submitted 3 September, 2013; originally announced September 2013.

Comments: 14 pages

MSC Class: 60J05; 60J22

arXiv:1210.2601

doi 10.3150/13-BEJ578

Adaptive MCMC with online relabeling

Authors: Rémi Bardenet, Olivier Cappé, Gersende Fort, Balázs Kégl

Abstract: When targeting a distribution that is artificially invariant under some permutations, Markov chain Monte Carlo (MCMC) algorithms face the label-switching problem, rendering marginal inference particularly cumbersome. Such a situation arises, for example, in the Bayesian analysis of finite mixture models. Adaptive MCMC algorithms such as adaptive Metropolis (AM), which self-calibrates its proposal distribution using an online estimate of the covariance matrix of the target, are no exception. To address the label-switching issue, relabeling algorithms associate a permutation with each MCMC sample, trying to obtain reasonable marginals. In the case of adaptive Metropolis (Bernoulli 7 (2001) 223-242), an online relabeling strategy is required. This paper is devoted to the AMOR algorithm, a provably consistent variant of AM that can cope with the label-switching problem. The idea is to nest relabeling steps within the MCMC algorithm based on the estimation of a single covariance matrix that is used both for adapting the covariance of the proposal distribution in the Metropolis algorithm step and for online relabeling. We compare the behavior of AMOR to similar relabeling methods. In the case of compactly supported target distributions, we prove a strong law of large numbers for AMOR and its ergodicity. These are the first results on the consistency of an online relabeling algorithm to our knowledge. The proof underlines latent relations between relabeling and vector quantization.

Submitted 27 July, 2015; v1 submitted 9 October, 2012; originally announced October 2012.

Comments: Published at http://dx.doi.org/10.3150/13-BEJ578 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

Report number: IMS-BEJ-BEJ578

Journal ref: Bernoulli 2015, Vol. 21, No. 3, 1304-1340
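AMOR nests relabeling inside adaptive Metropolis. The sketch below shows only the AM backbone described in the abstract: a random-walk Metropolis sampler whose proposal covariance is an online estimate of the target covariance. That estimate is updated by a stochastic-approximation recursion with step $1/(n+1)$.

Several details are assumptions here. These are the $2.38^2/d$ scaling, the small regularization `eps`, and the initial covariance $I$. No relabeling step is included, and `adaptive_metropolis` is a hypothetical name.

```python
import numpy as np

def adaptive_metropolis(log_pi, x0, n_iter, eps=1e-6, seed=0):
    """Random-walk Metropolis whose proposal covariance is the running
    empirical covariance of the chain, scaled by 2.38^2 / d."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    lp = log_pi(x)
    mean, cov = x.copy(), np.eye(d)  # initial guesses for the adapted moments
    scale = 2.38 ** 2 / d
    chain = np.empty((n_iter, d))
    for n in range(1, n_iter + 1):
        L = np.linalg.cholesky(scale * cov + eps * np.eye(d))
        y = x + L @ rng.standard_normal(d)
        lp_y = log_pi(y)
        if np.log(rng.uniform()) < lp_y - lp:  # symmetric proposal: Metropolis ratio
            x, lp = y, lp_y
        chain[n - 1] = x
        # recursive update of the mean and covariance estimates, step 1/(n+1)
        w = 1.0 / (n + 1)
        delta = x - mean
        mean = mean + w * delta
        cov = cov + w * (np.outer(delta, x - mean) - cov)
    return chain, cov

target_cov = np.array([[4.0, 1.8], [1.8, 1.0]])
prec = np.linalg.inv(target_cov)

def log_pi(z):
    return -0.5 * z @ prec @ z  # correlated Gaussian target, up to a constant

chain, cov_hat = adaptive_metropolis(log_pi, np.zeros(2), n_iter=20_000)
print(cov_hat)
```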

arXiv:1207.6880

Convergence of the Wang-Landau algorithm

Authors: Gersende Fort, Benjamin Jourdain, Estelle Kuhn, Tony Lelièvre, Gabriel Stoltz

Abstract: We analyze the convergence properties of the Wang-Landau algorithm. This sampling method belongs to the general class of adaptive importance sampling strategies which use the free energy along a chosen reaction coordinate as a bias. Such algorithms are very helpful for enhancing the sampling properties of Markov Chain Monte Carlo algorithms when the dynamics is metastable. We prove the convergence of the Wang-Landau algorithm and an associated central limit theorem.

Submitted 26 September, 2013; v1 submitted 30 July, 2012; originally announced July 2012.

Comments: This work is supported by the French National Research Agency under the grants ANR-09-BLAN-0216-01 (MEGAS) and ANR-08-BLAN-0218 (BigMC)

MSC Class: 65C05; 60J05; 82C80
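A Wang-Landau iteration has two parts. First, a Metropolis step is taken targeting the biased density $π(x)/θ(I(x))$, where $I(x)$ is the stratum of $x$. Second, the weight of the currently visited stratum is increased and $θ$ is renormalized. The bias $θ$ should then approach the stratum masses $π(\mathsf{X}_i)$, i.e. the free energy along the chosen coordinate.

The sketch below is illustrative only. It uses deterministic step sizes $γ_n = n^{-0.7}$ and a toy two-well distribution on 8 states grouped into 4 strata. It also uses a $\pm 1$ random-walk proposal, with out-of-range proposals rejected. All of these choices, and the name `wang_landau`, are assumptions.

```python
import numpy as np

def wang_landau(log_pi, strata, n_iter, a=0.7, seed=0):
    """Wang-Landau on {0, ..., len(log_pi) - 1}; strata[x] is the stratum of x.
    Returns theta, the running estimate of the stratum masses."""
    rng = np.random.default_rng(seed)
    n_states, d = len(log_pi), strata.max() + 1
    theta = np.full(d, 1.0 / d)
    steps = rng.choice([-1, 1], size=n_iter)
    log_u = np.log(rng.uniform(size=n_iter))
    x = 0
    for n in range(1, n_iter + 1):
        y = x + steps[n - 1]
        if 0 <= y < n_states:  # out-of-range proposals are rejected
            # Metropolis step targeting the biased density pi(x) / theta[strata[x]]
            log_ratio = (log_pi[y] - np.log(theta[strata[y]])) - (log_pi[x] - np.log(theta[strata[x]]))
            if log_u[n - 1] < log_ratio:
                x = y
        # penalize the currently visited stratum, then renormalize
        theta[strata[x]] *= 1.0 + n ** (-a)
        theta /= theta.sum()
    return theta

U = np.array([0.0, 0.0, 4.0, 4.0, 4.0, 4.0, 0.0, 0.0])  # two wells separated by a barrier
log_pi = -U
strata = np.repeat(np.arange(4), 2)  # strata {0,1}, {2,3}, {4,5}, {6,7}
theta = wang_landau(log_pi, strata, n_iter=100_000)
w = np.exp(log_pi)
true_mass = np.bincount(strata, weights=w) / w.sum()
print(theta, true_mass)
```

Penalizing visited strata flattens the occupation of the strata. This is what lets the chain cross the barrier that traps a plain Metropolis sampler.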

arXiv:1207.0662

Adaptive Equi-Energy Sampler: Convergence and Illustration

Authors: Amandine Schreck, Gersende Fort, Eric Moulines

Abstract: Markov chain Monte Carlo (MCMC) methods make it possible to sample from a distribution known up to a multiplicative constant. Classical MCMC samplers are known to have very poor mixing properties when sampling multimodal distributions. The Equi-Energy sampler is an interacting MCMC sampler proposed by Kou, Zhou and Wong in 2006 to sample difficult multimodal distributions. This algorithm runs several chains at different temperatures in parallel, and allows lower-tempered chains to jump to a state from a higher-tempered chain having an energy 'close' to that of the current state. A major drawback of this algorithm is that it depends on many design parameters and thus requires a significant effort to tune these parameters. In this paper, we introduce an Adaptive Equi-Energy (AEE) sampler which automates the choice of the selection mechanism when jumping onto a state of the higher-temperature chain. We prove the ergodicity and a strong law of large numbers for AEE, and for the original Equi-Energy sampler as well. Finally, we apply our algorithm to motif sampling in DNA sequences.

Submitted 4 February, 2013; v1 submitted 3 July, 2012; originally announced July 2012.

arXiv:1203.3036

doi 10.1214/11-AOS938

Convergence of adaptive and interacting Markov chain Monte Carlo algorithms

Authors: G. Fort, E. Moulines, P. Priouret

Abstract: Adaptive and interacting Markov chain Monte Carlo algorithms (MCMC) have been recently introduced in the literature. These novel simulation algorithms are designed to increase the simulation efficiency to sample complex distributions. Motivated by some recently introduced algorithms (such as the adaptive Metropolis algorithm and the interacting tempering algorithm), we develop a general methodological and theoretical framework to establish both the convergence of the marginal distribution and a strong law of large numbers. This framework weakens the conditions introduced in the pioneering paper by Roberts and Rosenthal [J. Appl. Probab. 44 (2007) 458--475]. It also covers the case when the target distribution $π$ is sampled by using Markov transition kernels with a stationary distribution that differs from $π$.

Submitted 14 March, 2012; originally announced March 2012.

Comments: Published at http://dx.doi.org/10.1214/11-AOS938 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS938

Journal ref: Annals of Statistics 2011, Vol. 39, No. 6, 3262-3289

arXiv:1203.1505

Performance of a Distributed Stochastic Approximation Algorithm

Authors: Pascal Bianchi, Gersende Fort, Walid Hachem

Abstract: In this paper, a distributed stochastic approximation algorithm is studied. Applications of such algorithms include decentralized estimation, optimization, control or computing. The algorithm consists of two steps: a local step, where each node in a network updates a local estimate using a stochastic approximation algorithm with decreasing step size, and a gossip step, where a node computes a local weighted average between its estimate and those of its neighbors. Convergence of the estimates toward a consensus is established under weak assumptions. The approach relies on two main ingredients: the existence of a Lyapunov function for the mean field in the agreement subspace, and a contraction property of the random matrices of weights in the subspace orthogonal to the agreement subspace. A second order analysis of the algorithm is also performed in the form of a Central Limit Theorem. The Polyak-averaged version of the algorithm is also considered.

Submitted 2 December, 2013; v1 submitted 7 March, 2012; originally announced March 2012.

Comments: IEEE Transactions on Information Theory 2013
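The two-step structure (local SA step, then gossip) can be sketched on a toy consensus problem. Each node observes its own data stream with mean `means[i]`, and the network should agree on the average of the local means. This is the minimizer of the sum of the local objectives $\mathbb{E}(Y_i - θ)^2/2$.

The sketch makes two simplifying assumptions. It uses pairwise gossip on a complete graph, a special case of the random weight matrices considered in the paper. It also uses step sizes $γ_n = n^{-0.7}$. The name `distributed_sa` is hypothetical.

```python
import numpy as np

def distributed_sa(means, n_iter, a=0.7, seed=0):
    rng = np.random.default_rng(seed)
    N = len(means)
    theta = np.zeros(N)
    noise = rng.standard_normal((n_iter, N))
    i_idx = rng.integers(N, size=n_iter)
    j_idx = (i_idx + rng.integers(1, N, size=n_iter)) % N  # guarantees j != i
    for n in range(1, n_iter + 1):
        y = means + noise[n - 1]          # each node observes its own data stream
        theta += n ** (-a) * (y - theta)  # local step: SA with decreasing step size
        i, j = i_idx[n - 1], j_idx[n - 1]
        # gossip step: the random pair (i, j) replaces its estimates by their average
        theta[i] = theta[j] = 0.5 * (theta[i] + theta[j])
    return theta

means = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
theta = distributed_sa(means, n_iter=50_000)
print(theta)  # all entries should be close to means.mean() = 2.0
```

The pairwise averaging matrix is doubly stochastic, so the gossip step preserves the network average. It only contracts the disagreement between nodes, which mirrors the split between the agreement subspace and its orthogonal complement in the abstract.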

arXiv:1111.1307

Convergence of a Particle-based Approximation of the Block Online Expectation Maximization Algorithm

Authors: Sylvain Le Corff, Gersende Fort

Abstract: Online variants of the Expectation Maximization (EM) algorithm have recently been proposed to perform parameter inference with large data sets or data streams, in independent latent models and in hidden Markov models. Nevertheless, the convergence properties of these algorithms remain an open problem at least in the hidden Markov case. This contribution deals with a new online EM algorithm which updates the parameter at some deterministic times. Some convergence results have been derived even in general latent models such as hidden Markov models. These properties rely on the assumption that some intermediate quantities are available in closed form or can be approximated by Monte Carlo methods when the Monte Carlo error vanishes rapidly enough. In this paper, we propose an algorithm which approximates these quantities using Sequential Monte Carlo methods. The convergence of this algorithm and of an averaged version is established and their performance is illustrated through Monte Carlo experiments.

Submitted 30 May, 2012; v1 submitted 5 November, 2011; originally announced November 2011.

arXiv:1108.4130

Supplement paper to "Online Expectation Maximization based algorithms for inference in hidden Markov models"

Authors: Sylvain Le Corff, Gersende Fort

Abstract: This is supplementary material to the paper "Online Expectation Maximization based algorithms for inference in hidden Markov models". It contains further technical derivations and additional simulation results.

Submitted 16 October, 2012; v1 submitted 20 August, 2011; originally announced August 2011.

arXiv:1108.3968

Online Expectation Maximization based algorithms for inference in hidden Markov models

Authors: Sylvain Le Corff, Gersende Fort

Abstract: The Expectation Maximization (EM) algorithm is a versatile tool for model parameter estimation in latent data models. When processing large data sets or data streams, however, EM becomes intractable since it requires the whole data set to be available at each iteration of the algorithm. In this contribution, a new generic online EM algorithm for model parameter inference in general hidden Markov models is proposed. This new algorithm updates the parameter estimate after a block of observations is processed (online). The convergence of this new algorithm is established, and the rate of convergence is studied, showing the impact of the block size. An averaging procedure is also proposed to improve the rate of convergence. Finally, practical illustrations are presented to highlight the performance of these algorithms in comparison to other online maximum likelihood procedures.

Submitted 16 October, 2012; v1 submitted 19 August, 2011; originally announced August 2011.
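The block-wise update can be sketched as follows. For each new block of observations, the E-step averages the sufficient statistics over the block under the current parameter. A single M-step then maps these statistics to the new parameter.

This is a deliberately simplified stand-in, not the paper's algorithm. It assumes i.i.d. observations from a two-component, unit-variance Gaussian mixture rather than a hidden Markov model, so the E-step is exact and no smoothing or Sequential Monte Carlo is needed. The block-size schedule $τ_n = 50n$ and all names are likewise assumptions.

```python
import numpy as np

def block_online_em(draw, mu0, w0, n_blocks, block_size=lambda n: 50 * n):
    """Block-wise online EM for a unit-variance Gaussian mixture (weights w, means mu)."""
    mu, w = np.asarray(mu0, dtype=float), np.asarray(w0, dtype=float)
    for n in range(1, n_blocks + 1):
        y = draw(block_size(n))  # fresh block of observations
        # E-step on the block with the current parameter: responsibilities r[i, k]
        logr = np.log(w) - 0.5 * (y[:, None] - mu[None, :]) ** 2
        logr -= logr.max(axis=1, keepdims=True)
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        s0 = r.mean(axis=0)  # block-averaged sufficient statistics
        s1 = (r * y[:, None]).mean(axis=0)
        # M-step: the parameter is updated once per block
        w, mu = s0, s1 / s0
    return w, mu

rng = np.random.default_rng(0)

def draw(m):
    z = rng.uniform(size=m) < 0.3  # component 0 (mean -2) with probability 0.3
    return np.where(z, -2.0, 2.0) + rng.standard_normal(m)

w_hat, mu_hat = block_online_em(draw, mu0=[-1.0, 1.0], w0=[0.5, 0.5], n_blocks=30)
print(w_hat, mu_hat)
```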

arXiv:1107.2576

A simple variance inequality for U-statistics of a Markov chain with applications

Authors: Gersende Fort, Eric Moulines, Pierre Priouret, Pierre Vandekerkhove

Abstract: We establish a simple variance inequality for U-statistics whose underlying sequence of random variables is an ergodic Markov chain. The constants in this inequality are explicit and depend on computable bounds on the mixing rate of the Markov chain. We apply this result to derive the strong law of large numbers for U-statistics of a Markov chain under conditions which are close to optimal.

Submitted 13 July, 2011; originally announced July 2011.

Journal ref: Statistics and Probability Letters 82, 6 (2013) 1193--1201
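As a concrete instance of the object studied, the sketch computes an order-2 U-statistic along a stationary Gaussian AR(1) chain with $ρ = 0.5$. The kernel is $h(x, y) = |x - y|$.

Under the strong law, the value should approach $\iint h \, d\pi \, d\pi$. For $\pi = N(0, 1)$, this double integral equals $\mathbb{E}|N(0, 2)| = 2/\sqrt{\pi} \approx 1.128$. The chain, the kernel and the function names are illustrative choices, not taken from the paper.

```python
import numpy as np

def u_statistic(x, h):
    """Order-2 U-statistic: average of h(x_i, x_j) over all pairs i < j (O(n^2) memory)."""
    n = len(x)
    H = h(x[:, None], x[None, :])  # all pairwise kernel values
    iu = np.triu_indices(n, k=1)
    return H[iu].mean()

def ar1_chain(n, rho, rng):
    x = np.empty(n)
    x[0] = rng.standard_normal()  # start in the stationary N(0, 1) law
    for k in range(1, n):
        x[k] = rho * x[k - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
    return x

rng = np.random.default_rng(0)
x = ar1_chain(2000, rho=0.5, rng=rng)
u = u_statistic(x, lambda a, b: np.abs(a - b))
print(u, 2 / np.sqrt(np.pi))
```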

arXiv:1107.2574

A central limit theorem for adaptive and interacting Markov chains

Authors: Gersende Fort, Eric Moulines, Pierre Priouret, Pierre Vandekerkhove

Abstract: Adaptive and interacting Markov chain Monte Carlo (MCMC) algorithms are a novel class of non-Markovian algorithms aimed at improving the simulation efficiency for complicated target distributions. In this paper, we study a general (non-Markovian) simulation framework covering both the adaptive and interacting MCMC algorithms. We establish a Central Limit Theorem for additive functionals of unbounded functions under a set of verifiable conditions, and identify the asymptotic variance. Our result extends all the results reported so far. An application to the interacting tempering algorithm (a simplified version of the equi-energy sampler) is presented to support our claims.

Submitted 13 July, 2011; originally announced July 2011.

arXiv:1101.0950

CosmoPMC: Cosmology Population Monte Carlo

Authors: Martin Kilbinger, Karim Benabed, Olivier Cappe, Jean-Francois Cardoso, Jean Coupon, Gersende Fort, Henry J. McCracken, Simon Prunet, Christian P. Robert, Darren Wraith

Abstract: We present the public release of the Bayesian sampling algorithm for cosmology, CosmoPMC (Cosmology Population Monte Carlo). CosmoPMC explores the parameter space of various cosmological probes, and also provides a robust estimate of the Bayesian evidence. CosmoPMC is based on an adaptive importance sampling method called Population Monte Carlo (PMC). Various cosmology likelihood modules are implemented, and new modules can be added easily. The importance-sampling algorithm is written in C, and fully parallelised using the Message Passing Interface (MPI). Due to very little overhead, the wall-clock time required for sampling scales approximately with the number of CPUs. The CosmoPMC package contains post-processing and plotting programs, and in addition a Markov chain Monte Carlo (MCMC) algorithm. The sampling engine is implemented in the library pmclib, and can be used independently. The software is available for download at http://www.cosmopmc.info.

Submitted 29 December, 2012; v1 submitted 5 January, 2011; originally announced January 2011.

Comments: CosmoPMC user's guide, version v1.2. Replaced v1.1
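Adaptive importance sampling yields the Bayesian evidence at no extra cost. The mean of the unnormalized importance weights is an unbiased estimate of the normalizing constant.

The sketch below illustrates this with a single Gaussian proposal, adapted by weighted moment matching. This is a simplification and an assumption: PMC as used in practice adapts mixture proposals. The sketch does not use or reproduce the pmclib API, and `pmc_evidence` and `gauss_logpdf` are hypothetical names. The toy target is a correlated 2-D Gaussian scaled by a known constant $Z = 3$.

```python
import numpy as np

def gauss_logpdf(x, mean, cov):
    """Log density of N(mean, cov) evaluated at the rows of x."""
    d = mean.size
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (x - mean).T)  # shape (d, n)
    return -0.5 * np.sum(z**2, axis=0) - np.log(np.diag(L)).sum() - 0.5 * d * np.log(2 * np.pi)

def pmc_evidence(log_target, mean, cov, n_samples=5000, n_iter=8, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        x = rng.multivariate_normal(mean, cov, size=n_samples)
        logw = log_target(x) - gauss_logpdf(x, mean, cov)  # unnormalized log-weights
        w = np.exp(logw - logw.max())
        evidence = np.exp(logw.max()) * w.mean()  # mean of unnormalized weights
        wn = w / w.sum()
        mean = wn @ x  # moment-matching update of the proposal
        diff = x - mean
        cov = (wn[:, None] * diff).T @ diff
    return evidence, mean, cov

m = np.array([1.0, -1.0])
S = np.array([[1.0, 0.8], [0.8, 1.0]])

def log_target(x):
    return np.log(3.0) + gauss_logpdf(x, m, S)  # Z = 3 times a Gaussian density

Z_hat, mean_hat, cov_hat = pmc_evidence(log_target, np.zeros(2), 4.0 * np.eye(2))
print(Z_hat)
```

Each iteration draws an independent population, which is why the workload parallelizes easily. Only the proposal update needs all the weights.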

arXiv:0912.1614

doi 10.1111/j.1365-2966.2010.16605.x

Bayesian model comparison in cosmology with Population Monte Carlo

Authors: Martin Kilbinger, Darren Wraith, Christian P. Robert, Karim Benabed, Olivier Cappe, Jean-Francois Cardoso, Gersende Fort, Simon Prunet, Francois R. Bouchet

Abstract: We use Bayesian model selection techniques to test extensions of the standard flat LambdaCDM paradigm. Dark-energy and curvature scenarios, and primordial perturbation models are considered. To that end, we calculate the Bayesian evidence in favour of each model using Population Monte Carlo (PMC), a new adaptive sampling technique which was recently applied in a cosmological context. The Bayesian evidence is immediately available from the PMC sample used for parameter estimation without further computational effort, and it comes with an associated error evaluation. Besides, it provides an unbiased estimator of the evidence after any fixed number of iterations and it is naturally parallelizable, in contrast with MCMC and nested sampling methods. By comparison with analytical predictions for simulated data, we show that our results obtained with PMC are reliable and robust. The variability in the evidence evaluation and the stability for various cases are estimated both from simulations and from data. For the cases we consider, the log-evidence is calculated with a precision of better than 0.08. Using a combined set of recent CMB, SNIa and BAO data, we find inconclusive evidence between flat LambdaCDM and simple dark-energy models. A curved Universe is moderately to strongly disfavoured with respect to a flat cosmology. Using physically well-motivated priors within the slow-roll approximation of inflation, we find a weak preference for a running spectral index. A Harrison-Zel'dovich spectrum is weakly disfavoured.
With the current data, tensor modes are not detected; the large prior volume on the tensor-to-scalar ratio r results in moderate evidence in favour of r=0. [Abridged]

Submitted 29 March, 2010; v1 submitted 8 December, 2009; originally announced December 2009.

Comments: 11 pages, 6 figures. Matches version accepted for publication by MNRAS

arXiv:0911.0221

Limit theorems for some adaptive MCMC algorithms with subgeometric kernels: Part II

Authors: Yves F. Atchade, Gersende Fort

Abstract: We prove a central limit theorem for a general class of adaptive Markov Chain Monte Carlo algorithms driven by sub-geometrically ergodic Markov kernels. We discuss in detail the special case of stochastic approximation. We use the result to analyze the asymptotic behavior of an adaptive version of the Metropolis Adjusted Langevin algorithm with a heavy tailed target density.

Submitted 1 November, 2009; originally announced November 2009.

Comments: 34 pages

MSC Class: 60J10; 65C05

arXiv:0903.0837

doi 10.1103/PhysRevD.80.023507

Estimation of cosmological parameters using adaptive importance sampling

Authors: Darren Wraith, Martin Kilbinger, Karim Benabed, Olivier Cappé, Jean-François Cardoso, Gersende Fort, Simon Prunet, Christian P. Robert

Abstract: We present a Bayesian sampling algorithm called adaptive importance sampling or Population Monte Carlo (PMC), whose computational workload is easily parallelizable and thus has the potential to considerably reduce the wall-clock time required for sampling, along with providing other benefits. To assess the performance of the approach for cosmological problems, we use simulated and actual data consisting of CMB anisotropies, supernovae of type Ia, and weak cosmological lensing, and provide a comparison of results to those obtained using state-of-the-art Markov Chain Monte Carlo (MCMC). For both types of data sets, we find comparable parameter estimates for PMC and MCMC, with the advantage of a significantly lower computational time for PMC. In the case of WMAP5 data, for example, the wall-clock time reduces from several days for MCMC to a few hours using PMC on a cluster of processors. Other benefits of the PMC approach, along with potential difficulties in using the approach, are analysed and discussed.

Submitted 4 March, 2009; originally announced March 2009.

Comments: 17 pages, 11 figures

Journal ref: Phys.Rev.D80:023507,2009

arXiv:0901.2453

State-dependent Foster-Lyapunov criteria for subgeometric convergence of Markov chains

Authors: Stephen B. Connor, Gersende Fort

Abstract: We consider a form of state-dependent drift condition for a general Markov chain, whereby the chain subsampled at some deterministic time satisfies a geometric Foster-Lyapunov condition. We present sufficient criteria for such a drift condition to exist, and use these to partially answer a question posed by Connor & Kendall (2007) concerning the existence of so-called 'tame' Markov chains. Furthermore, we show that our 'subsampled drift condition' implies the existence of finite moments for the return time to a small set.

Submitted 3 September, 2009; v1 submitted 16 January, 2009; originally announced January 2009.

Comments: 20 pages, LaTeX: paper reduced in length

MSC Class: 60J10; 37A25
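For orientation, the geometric Foster-Lyapunov condition mentioned in the abstract above, together with its simplest subsampled form, can be written as follows in standard notation. The symbols V, λ, b, C and m are conventional choices, not taken from the abstract, and this is a hedged illustration only: the paper's state-dependent condition is more general than the fixed-m version shown here.

```latex
% Geometric Foster-Lyapunov drift condition for a transition kernel P:
% V : \mathsf{X} \to [1, \infty), \lambda \in (0, 1), b < \infty, C a small set.
P V(x) \le \lambda V(x) + b \, \mathbb{1}_C(x), \qquad x \in \mathsf{X}.

% Simplest subsampled form: the same inequality for the m-step kernel P^m,
% for one fixed (deterministic) integer m \ge 1.
P^m V(x) \le \lambda V(x) + b \, \mathbb{1}_C(x), \qquad x \in \mathsf{X}.
```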

arXiv:0809.1135 [pdf, ps, other]

On adaptive stratification

Authors: Pierre Etoré, Gersende Fort, Benjamin Jourdain, Eric Moulines

Abstract: This paper investigates the use of stratified sampling as a variance reduction technique for approximating integrals over high-dimensional spaces. The accuracy of this method critically depends on the choice of the space partition, the strata, which should ideally be fitted to the subsets where the function to integrate is nearly constant, and on the allocation of the number of samples within each stratum. When the dimension is large and the function to integrate is complex, finding such partitions and allocating the samples is a highly non-trivial problem. In this work, we investigate a novel method to improve the efficiency of the estimator "on the fly", by jointly sampling and adapting the strata and the allocation within the strata. The accuracy of estimators when this method is used is examined in detail, in the so-called asymptotic regime (i.e. when both the number of samples and the number of strata are large). We illustrate the use of the method for the computation of the price of path-dependent options in models with both constant and stochastic volatility. The use of this adaptive technique yields variance reduction by factors sometimes larger than 1000 compared to classical Monte Carlo estimators.

Submitted 15 September, 2009; v1 submitted 6 September, 2008; originally announced September 2008.
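The sketch below illustrates only the allocation half of the adaptive stratification idea above, in one dimension: equal-width strata on [0, 1] are fixed in advance, a pilot run estimates the standard deviation of f within each stratum, and the remaining budget is allocated proportionally to those estimates (Neyman allocation). It is a simplified stand-in, not the paper's method, which also adapts the strata themselves and targets high-dimensional integrands; the function name and all settings are hypothetical.

```python
import math
import random


def stratified_estimate(f, n_strata=10, n_pilot=20, n_total=20000, seed=0):
    """Estimate the integral of f over [0, 1] with equal-width strata.

    Phase 1 draws n_pilot points per stratum to estimate within-stratum
    standard deviations. Phase 2 spends the remaining budget with Neyman
    allocation (samples proportional to p_k * sigma_k; p_k is constant
    here), then pools all draws of each stratum.
    """
    rng = random.Random(seed)
    width = 1.0 / n_strata  # p_k: probability (length) of each stratum
    samples = [[] for _ in range(n_strata)]

    def draw(k):
        # Uniform point inside stratum k, then evaluate f there.
        return f((k + rng.random()) * width)

    def std(values):
        mean = sum(values) / len(values)
        return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

    # Phase 1: uniform pilot allocation.
    for k in range(n_strata):
        samples[k].extend(draw(k) for _ in range(n_pilot))

    # Phase 2: allocate the rest proportionally to the estimated sigma_k.
    sigmas = [std(s) for s in samples]
    remaining = n_total - n_pilot * n_strata
    total_sigma = sum(sigmas)
    for k in range(n_strata):
        if total_sigma > 0:
            n_k = int(remaining * sigmas[k] / total_sigma)
        else:
            n_k = remaining // n_strata
        samples[k].extend(draw(k) for _ in range(n_k))

    # Stratified estimator: sum over strata of p_k times the stratum mean.
    return sum(width * sum(s) / len(s) for s in samples)
```

For example, `stratified_estimate(lambda x: x * x)` should return a value close to 1/3; strata where f varies more (here, near x = 1) receive more samples.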

arXiv:0807.2952 [pdf, ps, other]

Limit theorems for some adaptive MCMC algorithms with subgeometric kernels

Authors: Yves Atchadé, Gersende Fort

Abstract: This paper deals with the ergodicity and the existence of a strong law of large numbers for adaptive Markov Chain Monte Carlo. We show that a diminishing adaptation assumption together with a drift condition for positive recurrence is enough to imply ergodicity. Strengthening the drift condition to a polynomial drift condition yields a strong law of large numbers for possibly unbounded functions. These results broaden considerably the class of adaptive MCMC algorithms for which rigorous analysis is now possible. As an example, we give a detailed analysis of the Adaptive Metropolis Algorithm of Haario et al. (2001) when the target distribution is sub-exponential in the tails.

Submitted 3 September, 2009; v1 submitted 18 July, 2008; originally announced July 2008.

MSC Class: 60J10; 65C05

Journal ref: Bernoulli 16, 1 (2010) 116--154
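The Adaptive Metropolis algorithm of Haario et al. (2001), mentioned in the abstract above, uses a Gaussian random-walk proposal whose covariance is learned from the chain's own history. The following Python sketch follows that recipe as we understand it: after an initial non-adaptive period of t0 steps, the proposal covariance is s_d times (empirical covariance + ε I) with s_d = 2.4²/d. The function names, the initial covariance 0.1² I, the values of t0 and ε, and the bivariate Gaussian target are assumptions for illustration; the sketch does not reproduce the paper's analysis, which concerns sub-exponential targets.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a small symmetric PD matrix."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def adaptive_metropolis(log_target, x0, n_iter=20000, t0=1000, eps=1e-6, seed=0):
    """Adaptive Metropolis sketch: random-walk proposal with learned covariance."""
    rng = random.Random(seed)
    d = len(x0)
    s_d = 2.4 ** 2 / d
    x = list(x0)
    lp = log_target(x)
    mean = list(x)                      # running mean of the chain
    m2 = [[0.0] * d for _ in range(d)]  # running sum of centred outer products
    chain = [list(x)]
    for t in range(1, n_iter):
        # At this point the chain holds t states.
        if t <= max(t0, 1):
            cov = [[0.01 if i == j else 0.0 for j in range(d)] for i in range(d)]
        else:
            cov = [[s_d * (m2[i][j] / (t - 1) + (eps if i == j else 0.0))
                    for j in range(d)] for i in range(d)]
        L = cholesky(cov)
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = [x[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        lp_y = log_target(y)
        # Metropolis acceptance for a symmetric proposal; 1 - random() is in (0, 1].
        if math.log(1.0 - rng.random()) < lp_y - lp:
            x, lp = y, lp_y
        chain.append(list(x))
        # Welford-style update of the running mean and covariance sums.
        n = t + 1
        delta = [x[i] - mean[i] for i in range(d)]
        mean = [mean[i] + delta[i] / n for i in range(d)]
        for i in range(d):
            for j in range(d):
                m2[i][j] += delta[i] * (x[j] - mean[j])
    return chain


def log_target(x):
    # Hypothetical target: bivariate Gaussian, unit variances, correlation 0.9.
    rho = 0.9
    q = (x[0] ** 2 - 2 * rho * x[0] * x[1] + x[1] ** 2) / (1 - rho ** 2)
    return -0.5 * q
```

Running `adaptive_metropolis(log_target, [0.0, 0.0])` and discarding an initial stretch of the chain, the sample moments should roughly match the target's zero means, unit variances and 0.9 correlation.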

arXiv:math/0703836 [pdf, ps, other]

Forgetting of the initial distribution for Hidden Markov Models

Authors: Randal Douc, Gersende Fort, Eric Moulines, Pierre Priouret

Abstract: The forgetting of the initial distribution for discrete Hidden Markov Models (HMM) is addressed: a new set of conditions is proposed to establish the forgetting property of the filter at polynomial and geometric rates. Both a pathwise-type convergence of the total variation distance between filters started from two different initial distributions and a convergence in expectation are considered. The results are illustrated on several HMMs of interest: the dynamic tobit model, the non-linear state space model and the stochastic volatility model.

Submitted 28 March, 2007; originally announced March 2007.

MSC Class: ACM : 93E11; 60B10; 60G35

Journal ref: Stochastic Processes and their Applications (2008), to appear
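To make the forgetting property above concrete, the sketch below runs the forward filter of a small HMM from two different initial distributions on the same observation record and tracks the total variation distance between the two filters. The two-state model, its parameters and the helper names are hypothetical, and the convention assumed here is that `mu` and `nu` are laws of the initial state, with the first observation attached to the next state. Because every entry of this transition matrix is positive, the toy example forgets at a geometric rate; the paper's conditions are more general and also cover polynomial rates.

```python
import random


def filter_step(phi, y, Q, g):
    """One forward-filter update: predict with Q, correct with g(y | state)."""
    k = len(phi)
    pred = [sum(phi[i] * Q[i][j] for i in range(k)) for j in range(k)]
    unnorm = [pred[j] * g[j][y] for j in range(k)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def forgetting_curve(Q, g, observations, mu, nu):
    """TV distance between filters started from mu and nu, after each observation."""
    phi, psi = list(mu), list(nu)
    tv = [total_variation(phi, psi)]
    for y in observations:
        phi = filter_step(phi, y, Q, g)
        psi = filter_step(psi, y, Q, g)
        tv.append(total_variation(phi, psi))
    return tv


# Hypothetical two-state HMM with binary observations.
Q = [[0.9, 0.1], [0.2, 0.8]]  # transition matrix
g = [[0.7, 0.3], [0.4, 0.6]]  # g[state][y]: emission probabilities

# Simulate 50 observations from the model.
rng = random.Random(1)
state, ys = 0, []
for _ in range(50):
    state = 0 if rng.random() < Q[state][0] else 1
    ys.append(0 if rng.random() < g[state][0] else 1)

tv = forgetting_curve(Q, g, ys, mu=[1.0, 0.0], nu=[0.0, 1.0])
```

Starting from point masses on different states, `tv[0]` equals 1, and the distance should shrink geometrically along the record regardless of which observations were drawn.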

arXiv:math/0607800 [pdf, ps, other]

doi 10.1214/07-AAP471

The ODE method for stability of skip-free Markov chains with applications to MCMC

Authors: Gersende Fort, Sean Meyn, Eric Moulines, Pierre Priouret

Abstract: Fluid limit techniques have become a central tool to analyze queueing networks over the last decade, with applications to performance analysis, simulation and optimization. In this paper, some of these techniques are extended to a general class of skip-free Markov chains. As in the case of queueing models, a fluid approximation is obtained by scaling time, space and the initial condition by a large constant. The resulting fluid limit is the solution of an ordinary differential equation (ODE) in "most" of the state space. Stability and finer ergodic properties for the stochastic model then follow from stability of the set of fluid limits. Moreover, similarly to the queueing context where fluid models are routinely used to design control policies, the structure of the limiting ODE in this general setting provides an understanding of the dynamics of the Markov chain. These results are illustrated through application to Markov chain Monte Carlo methods.

Submitted 2 April, 2008; v1 submitted 31 July, 2006; originally announced July 2006.

Comments: Published at http://dx.doi.org/10.1214/07-AAP471 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

MSC Class: 60J10; 65C05 (Primary)

Journal ref: Annals of Applied Probability 18, 2 (2008) 664-707
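A minimal illustration of the fluid scaling described in the abstract above, on a chain chosen for simplicity rather than taken from the paper: a ±1 random walk on the nonnegative integers (skip-free, held at 0 instead of going negative) with upward probability p < 1/2. Scaling time, space and the initial condition by n, the path t ↦ X_{⌊nt⌋}/n should stay close to the solution of dx/dt = -(1 - 2p) until it reaches 0, the boundary where this simple ODE description stops applying. Function names and parameter values are hypothetical.

```python
import random


def scaled_path(n, x0, p, t_max, seed=0):
    """Fluid-scaled path t -> X_{floor(n t)} / n of a reflected +-1 random walk.

    The walk starts at X_0 = round(n * x0) and steps +1 with probability p,
    -1 otherwise, staying at 0 instead of going negative.
    """
    rng = random.Random(seed)
    x = round(n * x0)
    path = [x / n]
    for _ in range(int(n * t_max)):
        x = x + 1 if rng.random() < p else max(x - 1, 0)
        path.append(x / n)
    return path


def fluid_limit(t, x0, p):
    """Solution of dx/dt = -(1 - 2p) for x > 0, held at 0 once it is reached."""
    return max(x0 - (1.0 - 2.0 * p) * t, 0.0)
```

With n = 10000, x0 = 1 and p = 0.3, the scaled path should track the line x(t) = 1 - 0.4 t up to t = 2.5 and then remain near 0; larger n shrinks the gap between the two.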

arXiv:math/0605791 [pdf, ps, other]

Subgeometric rates of convergence of f-ergodic strong Markov processes

Authors: Randal Douc, Gersende Fort, Arnaud Guillin

Abstract: We provide a condition for f-ergodicity of strong Markov processes at a subgeometric rate. This condition is couched in terms of a supermartingale property for a functional of the Markov process. Equivalent formulations in terms of a drift inequality on the extended generator and on the resolvent kernel are given. Results related to (f,r)-regularity and to a moderate deviation principle for integral (bounded) functionals are also derived. Applications to specific processes are considered, including elliptic stochastic differential equations, Langevin diffusions, hypoelliptic stochastic damping Hamiltonian systems and storage models.

Submitted 31 May, 2006; originally announced May 2006.

MSC Class: 60J25; 37A25; 60F10; 60J35; 60J60

arXiv:math/0505260 [pdf, ps, other]

doi 10.1214/105051605000000115

Subgeometric ergodicity of strong Markov processes

Authors: G. Fort, G. O. Roberts

Abstract: We derive sufficient conditions for subgeometric f-ergodicity of strongly Markovian processes. We first propose a criterion based on modulated moments of some delayed return time to a petite set. We then formulate a criterion for polynomial f-ergodicity in terms of a drift condition on the generator. Applications to specific processes are considered, including Langevin tempered diffusions on R^n and storage models.

Submitted 12 May, 2005; originally announced May 2005.

Comments: Published at http://dx.doi.org/10.1214/105051605000000115 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP093

MSC Class: 60J25 (Primary); 60J60, 60K30 (Secondary)

Journal ref: Annals of Applied Probability 2005, Vol. 15, No. 2, 1565-1589

arXiv:math/0407122 [pdf, ps, other]

doi 10.1214/105051604000000323

Practical drift conditions for subgeometric rates of convergence

Authors: Randal Douc, Gersende Fort, Eric Moulines, Philippe Soulier

Abstract: We present a new drift condition which implies rates of convergence to the stationary distribution of the iterates of a ψ-irreducible aperiodic and positive recurrent transition kernel. This condition, extending a condition introduced by Jarner and Roberts [Ann. Appl. Probab. 12 (2002) 224-247] for polynomial convergence rates, turns out to be very convenient to prove subgeometric rates of convergence. Several applications are presented including nonlinear autoregressive models, stochastic unit root models and multidimensional random walk Hastings-Metropolis algorithms.

Submitted 8 July, 2004; originally announced July 2004.

Report number: IMS-AAP-AAP004

MSC Class: 60J10 (Primary)

Journal ref: Annals of Applied Probability 2004, Vol. 14, No. 3, 1353-1377
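For reference, drift conditions of the kind described in the abstract above are usually stated with a concave rate function φ; the display below gives that form and the rate it yields in the polynomial case considered by Jarner and Roberts. It is a hedged summary in conventional notation (V, φ, b, C, H_φ, r_φ are standard symbols, not taken from the abstract), and the exact assumptions on φ and C should be checked against the paper.

```latex
% Subgeometric drift condition: V \ge 1, b < \infty, C a petite set,
% \phi : [1, \infty) \to (0, \infty) concave, increasing, with \phi'(v) \to 0.
P V + \phi \circ V \le V + b \, \mathbb{1}_C .

% Associated rate function:
H_\phi(v) = \int_1^v \frac{ds}{\phi(s)}, \qquad r_\phi(n) = \phi \circ H_\phi^{-1}(n).

% Polynomial case \phi(v) = c \, v^{\alpha}, \alpha \in (0, 1):
H_\phi(v) = \frac{v^{1-\alpha} - 1}{c (1 - \alpha)}, \qquad
r_\phi(n) = c \bigl(1 + c (1 - \alpha) n\bigr)^{\alpha / (1 - \alpha)}
\asymp n^{\alpha / (1 - \alpha)} .
```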

Showing 1–45 of 45 results for author: Fort, G