Stochastic Variable Metric Proximal Gradient with variance reduction for non-convex composite optimization

Abstract: This paper introduces a novel algorithm, the Perturbed Proximal Preconditioned SPIDER algorithm (3P-SPIDER), designed to solve finite sum non-convex composite optimization problems. It is a stochastic Variable Metric Forward-Backward algorithm, which allows an approximate preconditioned forward operator and uses a variable metric proximity operator as the backward operator; it also proposes a mini-batch strategy with variance reduction to address the finite sum setting. We show that 3P-SPIDER extends some Stochastic preconditioned Gradient Descent-based algorithms and some Incremental Expectation Maximization algorithms to composite optimization and to the case where the forward operator cannot be computed in closed form. We also provide an explicit control of convergence in expectation of 3P-SPIDER, and study its complexity in order to satisfy the epsilon-approximate stationary condition. Our results are the first to combine the composite non-convex optimization setting, a variance reduction technique that tackles the finite sum setting through a minibatch strategy, and deterministic or random approximations of the preconditioned forward operator. Finally, through an application to inference in a logistic regression model with random effects, we numerically compare 3P-SPIDER to other stochastic forward-backward algorithms and discuss the role of some design parameters of 3P-SPIDER.

Submitted 8 March, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

Comments: Statistics and Computing, In press
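
To make the forward-backward structure described in the abstract above more concrete, here is a minimal sketch of a single variable metric proximal gradient step. It is not the 3P-SPIDER algorithm: it leaves out the variance reduction and the perturbed forward operator, and it assumes a diagonal metric and an l1 penalty so that the proximity operator has a closed form. The names `soft_threshold` and `vm_prox_grad_step` are hypothetical.

```python
def soft_threshold(z, t):
    """Proximity operator of t*|.| at the scalar z (soft-thresholding)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def vm_prox_grad_step(x, grad, metric, step, lam):
    """One forward-backward step in a diagonal metric (illustrative assumption).

    Forward:  z_i = x_i - step * grad_i / metric_i   (preconditioned gradient step)
    Backward: prox of lam*|.|_1 in the metric, i.e. soft-thresholding at
              step * lam / metric_i on each coordinate.
    """
    out = []
    for xi, gi, di in zip(x, grad, metric):
        forward = xi - step * gi / di
        out.append(soft_threshold(forward, step * lam / di))
    return out


# Toy problem: f(x) = 0.5 * sum_i c_i * (x_i - a_i)^2, penalty lam * ||x||_1.
a = [3.0, -0.5]
c = [2.0, 1.0]
x = [0.0, 0.0]
grad = [ci * (xi - ai) for ci, xi, ai in zip(c, x, a)]
x_new = vm_prox_grad_step(x, grad, metric=c, step=1.0, lam=1.0)
print(x_new)
```

In this toy case the metric equals the curvature of the smooth part, so a single step with unit step size already lands on the minimizer; with a less well-matched metric the step would have to be iterated.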

arXiv:2203.09142 [pdf, ps, other]

doi 10.1109/TSP.2023.3247142

Covid19 Reproduction Number: Credibility Intervals by Blockwise Proximal Monte Carlo Samplers

Authors: Gersende Fort, Barbara Pascal, Patrice Abry, Nelly Pustelnik

Abstract: Monitoring the Covid19 pandemic constitutes a critical societal stake that received considerable research efforts. The intensity of the pandemic on a given territory is efficiently measured by the reproduction number, quantifying the rate of growth of daily new infections. Recently, estimates for the time evolution of the reproduction number were produced using an inverse problem formulation with a nonsmooth functional minimization. While it was designed to be robust to the limited quality of the Covid19 data (outliers, missing counts), the procedure lacks the ability to output credibility interval based estimates. This remains a severe limitation for practical use in actual pandemic monitoring by epidemiologists that the present work aims to overcome by use of Monte Carlo sampling. After interpretation of the nonsmooth functional into a Bayesian framework, several sampling schemes are tailored to adjust to the nonsmooth nature of the resulting posterior distribution. The originality of the devised algorithms stems from combining a Langevin Monte Carlo sampling scheme with Proximal operators. The performance of the new algorithms in producing relevant credibility intervals for the reproduction number estimates and denoised counts is compared. Assessment is conducted on real daily new infection counts made available by the Johns Hopkins University. The interest of the devised monitoring tools is illustrated on Covid19 data from several different countries.

Submitted 8 March, 2023; v1 submitted 17 March, 2022; originally announced March 2022.

Journal ref: IEEE Transactions on Signal Processing, In press

arXiv:2111.02083 [pdf, other]

Federated Expectation Maximization with heterogeneity mitigation and variance reduction

Authors: Aymeric Dieuleveut, Gersende Fort, Eric Moulines, Geneviève Robin

Abstract: The Expectation Maximization (EM) algorithm is the default algorithm for inference in latent variable models. As in any other field of machine learning, applications of latent variable models to very large datasets make the use of advanced parallel and distributed architectures mandatory. This paper introduces FedEM, which is the first extension of the EM algorithm to the federated learning context. FedEM is a new communication efficient method, which handles partial participation of local devices, and is robust to heterogeneous distributions of the datasets. To alleviate the communication bottleneck, FedEM compresses appropriately defined complete data sufficient statistics. We also develop and analyze an extension of FedEM to further incorporate a variance reduction scheme. In all cases, we derive finite-time complexity bounds for smooth non-convex problems. Numerical results are presented to support our theoretical findings, as well as an application to federated missing values imputation for biodiversity monitoring.

Submitted 10 November, 2021; v1 submitted 3 November, 2021; originally announced November 2021.

Journal ref: NeurIPS 2021 - 35th Conference on Neural Information Processing Systems, Dec 2021, Sydney, Australia

arXiv:2105.11732 [pdf, ps, other]

The Perturbed Prox-Preconditioned SPIDER algorithm for EM-based large scale learning

Authors: Gersende Fort, Eric Moulines

Abstract: Incremental Expectation Maximization (EM) algorithms were introduced to adapt EM to the large scale learning framework by avoiding processing the full data set at each iteration. Nevertheless, these algorithms all assume that the conditional expectations of the sufficient statistics are explicit. In this paper, we propose a novel algorithm named Perturbed Prox-Preconditioned SPIDER (3P-SPIDER), which builds on the Stochastic Path Integral Differential EstimatoR EM (SPIDER-EM) algorithm. The 3P-SPIDER algorithm addresses many intractabilities of the E-step of EM; it also deals with non-smooth regularization and a convex constraint set. Numerical experiments show that 3P-SPIDER outperforms other incremental EM methods, and the role of some design parameters is discussed.

Submitted 25 May, 2021; originally announced May 2021.

Journal ref: IEEE Statistical Signal Processing Workshop, Jul 2021, Rio de Janeiro, Brazil

arXiv:2012.14670 [pdf, ps, other]

Fast Incremental Expectation Maximization for finite-sum optimization: nonasymptotic convergence

Authors: Gersende Fort, P. Gach, E. Moulines

Abstract: Fast Incremental Expectation Maximization (FIEM) is a version of the EM framework for large datasets. In this paper, we first recast FIEM and other incremental EM type algorithms in the {\em Stochastic Approximation within EM} framework. Then, we provide nonasymptotic bounds for the convergence in expectation as a function of the number of examples $n$ and of the maximal number of iterations $\kmax$. We propose two strategies for achieving an $ε$-approximate stationary point, respectively with $\kmax = O(n^{2/3}/ε)$ and $\kmax = O(\sqrt{n}/ε^{3/2})$, both strategies relying on a random termination rule before $\kmax$ and on a constant step size in the Stochastic Approximation step. Our bounds provide some improvements on the literature. First, they allow $\kmax$ to scale as $\sqrt{n}$ which is better than $n^{2/3}$ which was the best rate obtained so far; it is at the cost of a larger dependence upon the tolerance $ε$, thus making this control relevant for small to medium accuracy with respect to the number of examples $n$. Second, for the $n^{2/3}$-rate, the numerical illustrations show that thanks to an optimized choice of the step size and of the bounds in terms of quantities characterizing the optimization problem at hand, our results design a less conservative choice of the step size and provide a better control of the convergence in expectation.

Submitted 29 December, 2020; originally announced December 2020.
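
The trade-off between the two rates quoted in the abstract above can be made explicit with a short calculation. The sketch below ignores the constants hidden in the $O(\cdot)$ notation, which is an assumption: the actual crossover depends on those constants.

```latex
\[
\frac{\sqrt{n}}{\varepsilon^{3/2}} \le \frac{n^{2/3}}{\varepsilon}
\iff \varepsilon^{-1/2} \le n^{1/6}
\iff \varepsilon \ge n^{-1/3}.
\]
% Example with n = 10^6, so the threshold is n^{-1/3} = 10^{-2}.
% At \varepsilon = 10^{-1}:
\[
\frac{\sqrt{n}}{\varepsilon^{3/2}} = \frac{10^{3}}{10^{-3/2}} \approx 3.2 \times 10^{4},
\qquad
\frac{n^{2/3}}{\varepsilon} = \frac{10^{4}}{10^{-1}} = 10^{5}.
\]
```

Up to constants, the $\sqrt{n}$ strategy therefore gives the smaller bound whenever the tolerance is no finer than $n^{-1/3}$, which matches the abstract's remark that it is relevant for small to medium accuracy relative to $n$.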

arXiv:2012.01929 [pdf, ps, other]

A Stochastic Path-Integrated Differential EstimatoR Expectation Maximization Algorithm

Authors: Gersende Fort, Eric Moulines, Hoi-To Wai

Abstract: The Expectation Maximization (EM) algorithm is of key importance for inference in latent variable models including mixture of regressors and experts, missing observations. This paper introduces a novel EM algorithm, called \texttt{SPIDER-EM}, for inference from a training set of size $n$, $n \gg 1$. At the core of our algorithm is an estimator of the full conditional expectation in the {\sf E}-step, adapted from the stochastic path-integrated differential estimator ({\tt SPIDER}) technique. We derive finite-time complexity bounds for smooth non-convex likelihood: we show that for convergence to an $ε$-approximate stationary point, the complexity scales as $K_{\operatorname{Opt}} (n,ε)={\cal O}(ε^{-1})$ and $K_{\operatorname{CE}}( n,ε) = n+ \sqrt{n} {\cal O}(ε^{-1} )$, where $K_{\operatorname{Opt}}( n,ε)$ and $K_{\operatorname{CE}}(n, ε)$ are respectively the number of {\sf M}-steps and the number of per-sample conditional expectations evaluations. This improves over the state-of-the-art algorithms. Numerical results support our findings.

Submitted 30 November, 2020; originally announced December 2020.

Journal ref: Proceedings of the Conference on Neural Information Processing Systems (NeurIPS 2020), 2020
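
As a rough illustration of the SPIDER-type estimator mentioned in the abstract above, the sketch below tracks the mean of $n$ per-sample quantities by adding a mini-batch correction to the previous estimate, instead of recomputing the full sum. This is a generic sketch of the control-variate idea only, not the SPIDER-EM algorithm itself; the functions `full_mean` and `spider_update` and the toy per-sample maps are hypothetical.

```python
def full_mean(fs, theta):
    """Exact mean over all per-sample functions (the costly full pass)."""
    return sum(f(theta) for f in fs) / len(fs)


def spider_update(prev_estimate, fs, theta_new, theta_old, batch):
    """SPIDER-type update: previous estimate plus a mini-batch average of
    the per-sample differences between the new and the old parameter."""
    correction = sum(fs[i](theta_new) - fs[i](theta_old) for i in batch)
    return prev_estimate + correction / len(batch)


# Toy per-sample maps s_i(theta) = a_i * theta^2 (illustrative assumption).
fs = [lambda t, a=a: a * t * t for a in (1.0, 2.0, 3.0)]

est_old = full_mean(fs, 1.0)  # full pass at the start of an epoch
est_new = spider_update(est_old, fs, 2.0, 1.0, batch=[0, 1, 2])
print(est_old, est_new, full_mean(fs, 2.0))
```

When the batch covers every sample, as in this usage example, the update reproduces the exact mean at the new parameter; with a smaller random batch it is only an approximation, whose error is small when consecutive parameters are close.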

arXiv:2011.12392 [pdf, ps, other]

Geom-SPIDER-EM: Faster Variance Reduced Stochastic Expectation Maximization for Nonconvex Finite-Sum Optimization

Authors: Gersende Fort, Eric Moulines, Hoi-To Wai

Abstract: The Expectation Maximization (EM) algorithm is a key reference for inference in latent variable models; unfortunately, its computational cost is prohibitive in the large scale learning setting. In this paper, we propose an extension of the Stochastic Path-Integrated Differential EstimatoR EM (SPIDER-EM) and derive complexity bounds for this novel algorithm, designed to solve smooth nonconvex finite-sum optimization problems. We show that it reaches the same state of the art complexity bounds as SPIDER-EM; and provide conditions for a linear rate of convergence. Numerical results support our findings.

Submitted 24 November, 2020; originally announced November 2020.

Comments: Submitted to an International conference, with reviewing process

arXiv:1510.03638 [pdf, other]

Spatial Prediction Under Location Uncertainty In Cellular Networks

Authors: Hajer Braham, Sana Ben Jemaa, Gersende Fort, Eric Moulines, Berna Sayrac

Abstract: Coverage optimization is an important process for the operator as it is a crucial prerequisite towards offering a satisfactory quality of service to the end-users. The first step of this process is coverage prediction, which can be performed by interpolating geo-located measurements reported to the network by mobile users' equipment. In previous works, we proposed a low complexity coverage prediction algorithm based on the adaptation of the Geo-statistics Fixed Rank Kriging (FRK) algorithm. We supposed that the geo-location information reported with the radio measurements was perfect, which is not the case in reality. In this paper, we study the impact of location uncertainty on the coverage prediction accuracy and we extend the previously proposed algorithm to include geo-location error in the prediction model. We validate the proposed algorithm using both simulated and real field measurements. The FRK extended to take into account the location uncertainty proves to enhance the prediction accuracy while keeping a reasonable computational complexity.

Submitted 14 March, 2016; v1 submitted 13 October, 2015; originally announced October 2015.

arXiv:1505.07062 [pdf, other]

Fixed Rank Kriging for Cellular Coverage Analysis

Authors: Hajer Braham, Sana Ben Jemaa, Gersende Fort, Eric Moulines, Berna Sayrac

Abstract: Coverage planning and optimization is one of the most crucial tasks for a radio network operator. Efficient coverage optimization requires accurate coverage estimation. This estimation relies on geo-located field measurements which are gathered today during highly expensive drive tests (DT); and will be reported in the near future by users' mobile devices thanks to the 3GPP Minimizing Drive Tests (MDT) feature~\cite{3GPPproposal}. This feature consists in an automatic reporting of the radio measurements associated with the geographic location of the user's mobile device. Such a solution is still costly in terms of battery consumption and signaling overhead. Therefore, predicting the coverage on a location where no measurements are available remains a key and challenging task. This paper describes a powerful tool that gives an accurate coverage prediction on the whole area of interest: it builds a coverage map by spatially interpolating geo-located measurements using the Kriging technique. The paper focuses on the reduction of the computational complexity of the Kriging algorithm by applying Fixed Rank Kriging (FRK). The performance evaluation of the FRK algorithm both on simulated measurements and real field measurements shows a good trade-off between prediction efficiency and computational complexity. In order to go a step further towards the operational application of the proposed algorithm, a multicellular use-case is studied. Simulation results show a good performance in terms of coverage prediction and detection of the best serving cell.

Submitted 14 March, 2016; v1 submitted 26 May, 2015; originally announced May 2015.

arXiv:1410.6956 [pdf, other]

Success and Failure of Adaptation-Diffusion Algorithms for Consensus in Multi-Agent Networks

Authors: Gemma Morral, Pascal Bianchi, Gersende Fort

Abstract: This paper investigates the problem of distributed stochastic approximation in multi-agent systems. The algorithm under study consists of two steps: a local stochastic approximation step and a diffusion step which drives the network to a consensus. The diffusion step uses row-stochastic matrices to weight the network exchanges. As opposed to previous works, exchange matrices are not supposed to be doubly stochastic, and may also depend on the past estimate. We prove that non-doubly stochastic matrices generally influence the limit points of the algorithm. Nevertheless, the limit points are not affected by the choice of the matrices provided that the latter are doubly-stochastic in expectation. This conclusion legitimates the use of broadcast-like diffusion protocols, which are easier to implement. Next, by means of a central limit theorem, we prove that doubly stochastic protocols perform asymptotically as well as centralized algorithms and we quantify the degradation caused by the use of non doubly stochastic matrices. Throughout the paper, a special emphasis is put on the special case of distributed non-convex optimization as an illustration of our results.

Submitted 25 October, 2014; originally announced October 2014.

Comments: 13 pages, 4 figures

arXiv:1203.1505 [pdf, other]

Performance of a Distributed Stochastic Approximation Algorithm

Authors: Pascal Bianchi, Gersende Fort, Walid Hachem

Abstract: In this paper, a distributed stochastic approximation algorithm is studied. Applications of such algorithms include decentralized estimation, optimization, control or computing. The algorithm consists in two steps: a local step, where each node in a network updates a local estimate using a stochastic approximation algorithm with decreasing step size, and a gossip step, where a node computes a local weighted average between its estimates and those of its neighbors. Convergence of the estimates toward a consensus is established under weak assumptions. The approach relies on two main ingredients: the existence of a Lyapunov function for the mean field in the agreement subspace, and a contraction property of the random matrices of weights in the subspace orthogonal to the agreement subspace. A second order analysis of the algorithm is also performed under the form of a Central Limit Theorem. The Polyak-averaged version of the algorithm is also considered.

Submitted 2 December, 2013; v1 submitted 7 March, 2012; originally announced March 2012.

Comments: IEEE Transactions on Information Theory 2013
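
The two-step scheme described in the abstract above (a local stochastic approximation step followed by a gossip step) can be sketched on a toy problem. The example assumes scalar estimates, quadratic local costs 0.5*(x - a_i)^2 observed through noisy gradients, a step size 1/k, and a fixed doubly stochastic weight matrix; the paper itself covers random weight matrices and more general settings. The function name `distributed_sa` is hypothetical.

```python
import random


def distributed_sa(targets, weights, n_iter=2000, noise=0.1, seed=0):
    """Local noisy-gradient step followed by a gossip averaging step."""
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    for k in range(1, n_iter + 1):
        step = 1.0 / k  # decreasing step size
        # Local step: each node moves along its own noisy gradient.
        tmp = [
            x[i] - step * ((x[i] - targets[i]) + rng.gauss(0.0, noise))
            for i in range(n)
        ]
        # Gossip step: each node takes a weighted average over the network.
        x = [sum(weights[i][j] * tmp[j] for j in range(n)) for i in range(n)]
    return x


# Three nodes with different local targets; the average target is 3.0.
targets = [1.0, 2.0, 6.0]
# Fixed doubly stochastic matrix (rows and columns sum to 1), an assumption.
weights = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]
estimates = distributed_sa(targets, weights)
print(estimates)
```

With doubly stochastic weights the network average follows the same recursion as a centralized scheme, so in this toy run the nodes agree on a value close to the mean of the local targets rather than on any single node's target.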

Showing 1–11 of 11 results for author: Fort, G