-
Long-time asymptotics of noisy SVGD outside the population limit
Authors:
Victor Priser,
Pascal Bianchi,
Adil Salim
Abstract:
Stein Variational Gradient Descent (SVGD) is a widely used sampling algorithm that has been successfully applied in several areas of Machine Learning. SVGD operates by iteratively moving a set of interacting particles (which represent the samples) to approximate the target distribution. Despite recent studies on the complexity of SVGD and its variants, their long-time asymptotic behavior (i.e., after numerous iterations) is still not understood in the regime of a finite number of particles. We study the long-time asymptotic behavior of a noisy variant of SVGD. First, we establish that the limit set of noisy SVGD for a large number of iterations is well-defined. We then characterize this limit set, showing that it approaches the target distribution as the number of particles increases. In particular, noisy SVGD provably avoids the variance collapse observed for SVGD. Our approach involves demonstrating that the trajectories of noisy SVGD closely resemble those described by a McKean-Vlasov process.
Submitted 21 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Long run convergence of discrete-time interacting particle systems of the McKean-Vlasov type
Authors:
Pascal Bianchi,
Walid Hachem,
Victor Priser
Abstract:
We consider a discrete-time system of n coupled random vectors, a.k.a. interacting particles. The dynamics involve a vanishing step size, some random centered perturbations, and a mean vector field which induces the coupling between the particles. We study the doubly asymptotic regime where both the number of iterations and the number n of particles tend to infinity, without any constraint on the relative rates of convergence of these two parameters. We establish that the empirical measure of the interpolated trajectories of the particles converges in probability, in an ergodic sense, to the set of recurrent McKean-Vlasov distributions. A first application example is the granular media equation, where the particles are shown to converge to a critical point of the Helmholtz energy. A second example is the convergence of stochastic gradient descent to the global minimizer of the risk, in wide two-layer neural networks using random features.
Submitted 3 April, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
A closed-measure approach to stochastic approximation
Authors:
Pascal Bianchi,
Rodolfo Rios-Zertuche
Abstract:
This paper introduces a new method to tackle the issue of the almost sure convergence of stochastic approximation algorithms defined from a differential inclusion. Under the assumption of slowly decaying step-sizes, we establish that the set of essential accumulation points of the iterates belongs to the Birkhoff center associated with the differential inclusion. Unlike previous works, our results do not rely on the notion of asymptotic pseudotrajectories introduced by Benaïm, Hofbauer, and Sorin, which is the predominant technique to address the convergence problem. They follow as a consequence of Young's superposition principle for closed measures. This perspective bridges the gap between Young's principle and the notion of invariant measure of set-valued dynamical systems introduced by Faure and Roth. Also, the proposed method yields sufficient conditions under which the velocities locally compensate around any essential accumulation point.
Submitted 2 December, 2023; v1 submitted 10 December, 2021;
originally announced December 2021.
-
Stochastic Subgradient Descent Escapes Active Strict Saddles on Weakly Convex Functions
Authors:
Pascal Bianchi,
Walid Hachem,
Sholom Schechtman
Abstract:
In non-smooth stochastic optimization, we establish the non-convergence of the stochastic subgradient descent (SGD) to the critical points recently called active strict saddles by Davis and Drusvyatskiy. Such points lie on a manifold $M$ where the function $f$ has a direction of second-order negative curvature. Off this manifold, the norm of the Clarke subdifferential of $f$ is lower-bounded. We require two conditions on $f$. The first assumption is a Verdier stratification condition, which is a refinement of the popular Whitney stratification. It allows us to establish a reinforced version of the projection formula of Bolte et al. for Whitney stratifiable functions, which is of independent interest. The second assumption, termed the angle condition, allows us to control the distance of the iterates to $M$. When $f$ is weakly convex, our assumptions are generic. Consequently, generically in the class of definable weakly convex functions, the SGD converges to a local minimizer.
Submitted 25 July, 2023; v1 submitted 4 August, 2021;
originally announced August 2021.
-
Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation
Authors:
Anas Barakat,
Pascal Bianchi,
Julien Lehmann
Abstract:
Actor-critic methods integrating target networks have exhibited stupendous empirical success in deep reinforcement learning. However, a theoretical understanding of the use of target networks in actor-critic methods is largely missing in the literature. In this paper, we reduce this gap between theory and practice by proposing the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting. Our algorithm uses three different timescales: one for the actor and two for the critic. Instead of using the standard single-timescale temporal difference (TD) learning algorithm as a critic, we use a two-timescale target-based version of TD learning closely inspired by practical actor-critic algorithms implementing target networks. First, we establish asymptotic convergence results for both the critic and the actor under Markovian sampling. Then, we provide a finite-time analysis showing the impact of incorporating a target network into actor-critic methods.
Submitted 22 February, 2022; v1 submitted 14 June, 2021;
originally announced June 2021.
-
Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
Authors:
A. Barakat,
P. Bianchi,
W. Hachem,
Sh. Schechtman
Abstract:
In this paper, a general stochastic optimization procedure is studied, unifying several variants of the stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm is seen as a noisy Euler discretization of a non-autonomous ordinary differential equation, recently introduced by Belotto da Silva and Gazeau, which is analyzed in depth. Assuming that the objective function is non-convex and differentiable, the stability and the almost sure convergence of the iterates to the set of critical points are established. A noteworthy special case is the convergence proof of S-NAG in a non-convex setting. Under some assumptions, the convergence rate is provided in the form of a Central Limit Theorem. Finally, the non-convergence of the algorithm to undesired critical points, such as local maxima or saddle points, is established. Here, the main ingredient is a new avoidance-of-traps result for non-autonomous settings, which is of independent interest.
Submitted 10 July, 2021; v1 submitted 7 December, 2020;
originally announced December 2020.
-
Convergence of constant step stochastic gradient descent for non-smooth non-convex functions
Authors:
Pascal Bianchi,
Walid Hachem,
Sholom Schechtman
Abstract:
This paper studies the asymptotic behavior of the constant step Stochastic Gradient Descent for the minimization of an unknown function F, defined as the expectation of a non-convex, non-smooth, locally Lipschitz random function. As the gradient may not exist, it is replaced by a certain operator: a reasonable choice is to use an element of the Clarke subdifferential of the random function; another choice is the output of the celebrated backpropagation algorithm, which is popular amongst practitioners, and whose properties have recently been studied by Bolte and Pauwels [7]. Since the expectation of the chosen operator is not in general an element of the Clarke subdifferential ∂F of the mean function, it has been assumed in the literature that an oracle of ∂F is available. As a first result, it is shown in this paper that such an oracle is not needed for almost all initialization points of the algorithm. Next, in the small step size regime, it is shown that the interpolated trajectory of the algorithm converges in probability (in the compact convergence sense) towards the set of solutions of the differential inclusion. Finally, viewing the iterates as a Markov chain whose transition kernel is indexed by the step size, it is shown that the invariant distributions of the kernel converge weakly to the set of invariant distributions of this differential inclusion as the step size tends to zero. These results show that when the step size is small, with large probability, the iterates eventually lie in a neighborhood of the critical points of the mean function F.
Submitted 12 April, 2022; v1 submitted 18 May, 2020;
originally announced May 2020.
-
Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization
Authors:
Anas Barakat,
Pascal Bianchi
Abstract:
Although ADAM is a very popular algorithm for optimizing the weights of neural networks, it has been recently shown that it can diverge even in simple convex optimization examples. Several variants of ADAM have been proposed to circumvent this convergence issue. In this work, we study the ADAM algorithm for smooth nonconvex optimization under a boundedness assumption on the adaptive learning rate. The bound on the adaptive step size depends on the Lipschitz constant of the gradient of the objective function and provides safe theoretical adaptive step sizes. Under this boundedness assumption, we show a novel first order convergence rate result in both deterministic and stochastic contexts. Furthermore, we establish convergence rates of the function value sequence using the Kurdyka-Lojasiewicz property.
Submitted 24 September, 2020; v1 submitted 18 November, 2019;
originally announced November 2019.
-
A Fully Stochastic Primal-Dual Algorithm
Authors:
Pascal Bianchi,
Walid Hachem,
Adil Salim
Abstract:
A new stochastic primal-dual algorithm for solving a composite optimization problem is proposed. It is assumed that all the functions/operators that enter the optimization problem are given as statistical expectations. These expectations are unknown but revealed across time through i.i.d. realizations. The proposed algorithm is proven to converge to a saddle point of the Lagrangian function. In the framework of the monotone operator theory, the convergence proof relies on recent results on the stochastic Forward-Backward algorithm involving random monotone operators. An example of convex optimization under stochastic linear constraints is considered.
Submitted 22 June, 2020; v1 submitted 23 January, 2019;
originally announced January 2019.
-
Convergence and Dynamical Behavior of the ADAM Algorithm for Non-Convex Stochastic Optimization
Authors:
Anas Barakat,
Pascal Bianchi
Abstract:
Adam is a popular variant of stochastic gradient descent for finding a local minimizer of a function. In the constant stepsize regime, assuming that the objective function is differentiable and non-convex, we establish the convergence in the long run of the iterates to a stationary point under a stability condition. The key ingredient is the introduction of a continuous-time version of Adam, in the form of a non-autonomous ordinary differential equation. This continuous-time system is a relevant approximation of the Adam iterates, in the sense that the interpolated Adam process converges weakly towards the solution to the ODE. The existence and the uniqueness of the solution are established. We further show the convergence of the solution towards the critical points of the objective function and quantify its convergence rate under a Lojasiewicz assumption. Then, we introduce a novel decreasing stepsize version of Adam. Under mild assumptions, it is shown that the iterates are almost surely bounded and converge almost surely to critical points of the objective function. Finally, we analyze the fluctuations of the algorithm by means of a conditional central limit theorem.
Submitted 13 May, 2020; v1 submitted 4 October, 2018;
originally announced October 2018.
-
A Constant Step Stochastic Douglas-Rachford Algorithm with Application to Non Separable Regularizations
Authors:
Adil Salim,
Pascal Bianchi,
Walid Hachem
Abstract:
The Douglas-Rachford algorithm converges to a minimizer of a sum of two convex functions. The algorithm consists of fixed point iterations involving computations of the proximity operators of the two functions separately. The paper investigates a stochastic version of the algorithm where both functions are random and the step size is constant. We establish that the iterates of the algorithm stay close to the set of solutions with high probability when the step size is small enough. Application to structured regularization is considered.
Submitted 3 April, 2018;
originally announced April 2018.
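As an illustration of the fixed point iterations mentioned in the abstract above, here is a minimal sketch of the classical deterministic Douglas-Rachford iteration, not the stochastic constant-step variant studied in the paper. The toy problem (minimizing 0.5 * (x - a)**2 + mu * |x| on the real line), the function names, and the parameter values are all assumptions chosen for illustration.

```python
def soft_threshold(v, t):
    """Proximity operator of t * |x| evaluated at v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def douglas_rachford(a, mu, gamma=1.0, iterations=200):
    """Minimize f(x) + g(x) with f(x) = 0.5 * (x - a)**2 and g(x) = mu * |x|.

    Each iteration calls the proximity operators of f and g separately.
    """
    z = 0.0  # auxiliary variable carried by the fixed point iteration
    x = 0.0
    for _ in range(iterations):
        # Proximity operator of gamma * f at z (closed form for a quadratic).
        x = (z + gamma * a) / (1.0 + gamma)
        # Proximity operator of gamma * g at the reflected point 2x - z.
        y = soft_threshold(2.0 * x - z, gamma * mu)
        # Fixed point update of the auxiliary variable.
        z = z + y - x
    return x


# Hypothetical toy instance: the minimizer is soft_threshold(3.0, 1.0).
solution = douglas_rachford(a=3.0, mu=1.0)
```

In this toy instance the minimizer is known in closed form, which makes it easy to check the iteration against it.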
-
Snake: a Stochastic Proximal Gradient Algorithm for Regularized Problems over Large Graphs
Authors:
Adil Salim,
Pascal Bianchi,
Walid Hachem
Abstract:
A regularized optimization problem over a large unstructured graph is studied, where the regularization term is tied to the graph geometry. Typical regularization examples include the total variation and the Laplacian regularizations over the graph. When applying the proximal gradient algorithm to solve this problem, there exist quite affordable methods to implement the proximity operator (backward step) in the special case where the graph is a simple path without loops. In this paper, an algorithm, referred to as "Snake", is proposed to solve such regularized problems over general graphs, by taking advantage of these fast methods. The algorithm consists of properly selecting random simple paths in the graph and performing the proximal gradient algorithm over these simple paths. This algorithm is an instance of a new general stochastic proximal gradient algorithm, whose convergence is proven. Applications to trend filtering and graph inpainting are provided among others. Numerical experiments are conducted over large graphs.
Submitted 19 December, 2017;
originally announced December 2017.
-
Distributed Deblurring of Large Images of Wide Field-Of-View
Authors:
Rahul Mourya,
André Ferrari,
Rémi Flamary,
Pascal Bianchi,
Cédric Richard
Abstract:
Image deblurring is an economical way to reduce certain degradations (blur and noise) in acquired images. Thus, it has become an essential tool in high resolution imaging in many applications, e.g., astronomy, microscopy or computational photography. In applications such as astronomy and satellite imaging, the size of acquired images can be extremely large (up to gigapixels), covering a wide field-of-view and suffering from shift-variant blur. Most of the existing image deblurring techniques are designed and implemented to work efficiently on centralized computing systems having multiple processors and a shared memory. Thus, the largest image that can be handled is limited by the size of the physical memory available on the system. In this paper, we propose a distributed nonblind image deblurring algorithm in which several connected processing nodes (with reasonable computational resources) simultaneously process different portions of a large image while maintaining certain coherency among them to finally obtain a single crisp image. Unlike the existing centralized techniques, image deblurring in distributed fashion raises several issues. To tackle these issues, we consider certain approximations that trade off the quality of the deblurred image against the computational resources required to achieve it. The experimental results show that our algorithm produces images of similar quality to the existing centralized techniques while allowing distribution, and is thus cost effective for extremely large images.
Submitted 17 May, 2017;
originally announced May 2017.
-
A constant step Forward-Backward algorithm involving random maximal monotone operators
Authors:
Pascal Bianchi,
Walid Hachem,
Adil Salim
Abstract:
A stochastic Forward-Backward algorithm with a constant step is studied. At each time step, this algorithm involves an independent copy of a pair of random maximal monotone operators. Defining a mean operator as a selection integral, the differential inclusion built from the sum of the two mean operators is considered. As a first result, it is shown that the interpolated process obtained from the iterates converges narrowly in the small step regime to the solution of this differential inclusion. In order to control the long term behavior of the iterates, a stability result is needed in addition. To this end, the sequence of the iterates is seen as a homogeneous Feller Markov chain whose transition kernel is parameterized by the algorithm step size. The cluster points of the Markov chain's invariant measures in the small step regime are invariant for the semiflow induced by the differential inclusion. Conclusions regarding the long run behavior of the iterates for small steps are drawn. It is shown that when the sum of the mean operators is demipositive, the probabilities that the iterates are away from the set of zeros of this sum are small in Cesàro mean. The ergodic behavior of these iterates is studied as well. Applications of the proposed algorithm are considered. In particular, a detailed analysis of the random proximal gradient algorithm with constant step is performed.
Submitted 4 April, 2018; v1 submitted 14 February, 2017;
originally announced February 2017.
-
Constant Step Stochastic Approximations Involving Differential Inclusions: Stability, Long-Run Convergence and Applications
Authors:
Pascal Bianchi,
Walid Hachem,
Adil Salim
Abstract:
We consider a Markov chain $(x_n)$ whose kernel is indexed by a scaling parameter $γ>0$, referred to as the step size. The aim is to analyze the behavior of the Markov chain in the doubly asymptotic regime where $n\to\infty$ then $γ\to 0$. First, under mild assumptions on the so-called drift of the Markov chain, we show that the interpolated process converges narrowly to the solutions of a Differential Inclusion (DI) involving an upper semicontinuous set-valued map with closed and convex values. Second, we provide verifiable conditions which ensure the stability of the iterates. Third, by putting the above results together, we establish the long run convergence of the iterates as $γ\to 0$, to the Birkhoff center of the DI. The ergodic behavior of the iterates is also provided. Application examples are investigated. We apply our findings to 1) the problem of nonconvex proximal stochastic optimization and 2) a fluid model of parallel queues.
Submitted 14 December, 2017; v1 submitted 12 December, 2016;
originally announced December 2016.
-
A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non Separable Functions
Authors:
Olivier Fercoq,
Pascal Bianchi
Abstract:
This paper introduces a coordinate descent version of the Vũ-Condat algorithm. By coordinate descent, we mean that only a subset of the coordinates of the primal and dual iterates is updated at each iteration, the other coordinates being kept at their previous values. Our method allows us to solve optimization problems with a combination of differentiable functions, constraints as well as non-separable and non-differentiable regularizers. We show that the sequences generated by our algorithm converge to a saddle point of the problem at stake, for a wider range of parameter values than previous methods. In particular, the condition on the step-sizes depends on the coordinate-wise Lipschitz constant of the differentiable function's gradient, which is a major feature allowing classical coordinate descent to perform so well when it is applicable. We then prove a sublinear rate of convergence in general and a linear rate of convergence if the objective enjoys strong convexity properties. We illustrate the performance of the algorithm on a total-variation regularized least squares regression problem and on large scale support vector machine problems.
Submitted 2 February, 2018; v1 submitted 19 August, 2015;
originally announced August 2015.
-
Dynamical behavior of a stochastic forward-backward algorithm using random monotone operators
Authors:
Pascal Bianchi,
Walid Hachem
Abstract:
The purpose of this paper is to study the dynamical behavior of the sequence produced by a forward-backward algorithm involving two random maximal monotone operators and a sequence of decreasing step sizes. Defining a mean monotone operator as an Aumann integral, and assuming that the sum of the two mean operators is maximal (sufficient maximality conditions are provided), it is shown that with probability one, the interpolated process obtained from the iterates is an asymptotic pseudo trajectory in the sense of Benaïm and Hirsch of the differential inclusion involving the sum of the mean operators. The convergence of the empirical means of the iterates towards a zero of the sum of the mean operators is shown, as well as the convergence of the sequence itself to such a zero under a demipositivity assumption. These results find applications in a wide range of optimization or variational inequality problems in random environments.
Submitted 4 July, 2016; v1 submitted 12 August, 2015;
originally announced August 2015.
-
Ergodic convergence of a stochastic proximal point algorithm
Authors:
Pascal Bianchi
Abstract:
The purpose of this paper is to establish the almost sure weak ergodic convergence of a sequence of iterates $(x_n)$ given by $x_{n+1} = (I+λ_n A(ξ_{n+1},\,.\,))^{-1}(x_n)$ where $(A(s,\,.\,):s\in E)$ is a collection of maximal monotone operators on a separable Hilbert space, $(ξ_n)$ is an independent identically distributed sequence of random variables on $E$ and $(λ_n)$ is a positive sequence in $\ell^2\backslash \ell^1$. The weighted averaged sequence of iterates is shown to converge weakly to a zero (assumed to exist) of the Aumann expectation ${\mathbb E}(A(ξ_1,\,.\,))$ under the assumption that the latter is maximal. We consider applications to stochastic optimization problems of the form $\min {\mathbb E}(f(ξ_1,x))$ w.r.t. $x\in \bigcap_{i=1}^m X_i$ where $f$ is a normal convex integrand and $(X_i)$ is a collection of closed convex sets. In this case, the iterations are closely related to a stochastic proximal algorithm recently proposed by Wang and Bertsekas.
Submitted 25 July, 2016; v1 submitted 21 April, 2015;
originally announced April 2015.
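The iteration $x_{n+1} = (I+λ_n A(ξ_{n+1},\,.\,))^{-1}(x_n)$ from the abstract above can be sketched on a one-dimensional toy case. Everything specific here is an assumption made for illustration: the operator $A(s,x) = x - s$ (the gradient of $0.5\,(x-s)^2$, whose mean operator vanishes at the expectation of $ξ_1$), Gaussian samples, the step sizes $λ_n = 1/n$ (square-summable but not summable), and all function names.

```python
import random


def stochastic_proximal_point(samples, x0=0.0):
    """Run x_{n+1} = (I + lam_n A(xi_{n+1}, .))^{-1}(x_n) for A(s, x) = x - s.

    Returns the last iterate and the average of the iterates weighted by
    the step sizes lam_n.
    """
    x = x0
    weighted_sum = 0.0
    weight_total = 0.0
    for n, xi in enumerate(samples, start=1):
        lam = 1.0 / n  # assumed step size: in l^2 but not in l^1
        # Resolvent step: solve y + lam * (y - xi) = x for y.
        x = (x + lam * xi) / (1.0 + lam)
        weighted_sum += lam * x
        weight_total += lam
    return x, weighted_sum / weight_total


# Hypothetical data: i.i.d. Gaussian samples with mean 3.0, fixed seed.
rng = random.Random(0)
samples = [rng.gauss(3.0, 1.0) for _ in range(20000)]
last_iterate, ergodic_average = stochastic_proximal_point(samples)
```

With this particular operator and step size, each resolvent step reduces to a running-mean update, so the last iterate is easy to check by hand. The weighted average moves slowly here because the weights $1/n$ favor early iterates.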
-
Success and Failure of Adaptation-Diffusion Algorithms for Consensus in Multi-Agent Networks
Authors:
Gemma Morral,
Pascal Bianchi,
Gersende Fort
Abstract:
This paper investigates the problem of distributed stochastic approximation in multi-agent systems. The algorithm under study consists of two steps: a local stochastic approximation step and a diffusion step which drives the network to a consensus. The diffusion step uses row-stochastic matrices to weight the network exchanges. As opposed to previous works, exchange matrices are not supposed to be doubly stochastic, and may also depend on the past estimate.
We prove that non-doubly stochastic matrices generally influence the limit points of the algorithm. Nevertheless, the limit points are not affected by the choice of the matrices provided that the latter are doubly stochastic in expectation. This conclusion legitimizes the use of broadcast-like diffusion protocols, which are easier to implement. Next, by means of a central limit theorem, we prove that doubly stochastic protocols perform asymptotically as well as centralized algorithms and we quantify the degradation caused by the use of non-doubly stochastic matrices. Throughout the paper, special emphasis is put on the case of distributed non-convex optimization as an illustration of our results.
Submitted 25 October, 2014;
originally announced October 2014.
-
A Coordinate Descent Primal-Dual Algorithm and Application to Distributed Asynchronous Optimization
Authors:
Pascal Bianchi,
Walid Hachem,
Franck Iutzeler
Abstract:
Based on the idea of randomized coordinate descent of $α$-averaged operators, a randomized primal-dual optimization algorithm is introduced, where a random subset of coordinates is updated at each iteration. The algorithm builds upon a variant of a recent (deterministic) algorithm proposed by Vũ and Condat that includes the well-known ADMM as a particular case. The obtained algorithm is used to solve asynchronously a distributed optimization problem. A network of agents, each having a separate cost function containing a differentiable term, seeks to find a consensus on the minimum of the aggregate objective. The method yields an algorithm where at each iteration, a random subset of agents wake up, update their local estimates, exchange some data with their neighbors, and go idle. Numerical results demonstrate the attractive performance of the method. The general approach can be naturally adapted to other situations where coordinate descent convex optimization algorithms are used with a random choice of the coordinates.
Submitted 30 September, 2015; v1 submitted 3 July, 2014;
originally announced July 2014.
-
Explicit Convergence Rate of a Distributed Alternating Direction Method of Multipliers
Authors:
Franck Iutzeler,
Pascal Bianchi,
Philippe Ciblat,
Walid Hachem
Abstract:
Consider a set of N agents seeking to solve distributively the minimization problem $\inf_{x} \sum_{n = 1}^N f_n(x)$ where the convex functions $f_n$ are local to the agents. The popular Alternating Direction Method of Multipliers has the potential to handle distributed optimization problems of this kind. We provide a general reformulation of the problem and obtain a class of distributed algorithms which encompass various network architectures. The rate of convergence of our method is considered. It is assumed that the infimum of the problem is reached at a point $x_\star$, the functions $f_n$ are twice differentiable at this point and $\sum \nabla^2 f_n(x_\star) > 0$ in the positive definite ordering of symmetric matrices. With these assumptions, it is shown that the convergence to the consensus $x_\star$ is linear and the exact rate is provided. Application examples where this rate can be optimized with respect to the ADMM free parameter $ρ$ are also given.
Submitted 28 December, 2014; v1 submitted 4 December, 2013;
originally announced December 2013.
-
Robust Consensus in Distributed Networks using Total Variation
Authors:
Walid Ben-Ameur,
Pascal Bianchi,
Jérémie Jakubowicz
Abstract:
Consider a connected network of agents endowed with local cost functions representing private objectives. Agents seek to find an agreement on some minimizer of the aggregate cost, by means of repeated communications between neighbors. Consensus on the average over the network, usually addressed by gossip algorithms, is a special instance of this problem, corresponding to quadratic private objectives. Consensus on the median, or more generally quantiles, is also a special instance, as are many other consensus problems. In this paper we show that optimizing the aggregate cost function regularized by a total variation term has appealing properties. First, it can be done very naturally in a distributed way, yielding algorithms that are efficient in numerical simulations. Secondly, the optimum for the regularized cost is shown to be also the optimum for the initial aggregate cost function under assumptions that are simple to state and easily verifiable. Finally, these algorithms are robust to unreliable agents that keep injecting some false value in the network. This robustness is notable: it does not hold, for instance, for gossip algorithms, which are entirely ruled by unreliable agents, as detailed in the paper.
Submitted 27 September, 2013;
originally announced September 2013.
-
Asynchronous Distributed Optimization using a Randomized Alternating Direction Method of Multipliers
Authors:
Franck Iutzeler,
Pascal Bianchi,
Philippe Ciblat,
Walid Hachem
Abstract:
Consider a set of networked agents endowed with private cost functions and seeking to find a consensus on the minimizer of the aggregate cost. A new class of random asynchronous distributed optimization methods is introduced. The methods generalize the standard Alternating Direction Method of Multipliers (ADMM) to an asynchronous setting where isolated components of the network are activated in an uncoordinated fashion. The algorithms rely on the introduction of randomized Gauss-Seidel iterations of a Douglas-Rachford operator for finding zeros of a sum of two monotone operators. Convergence to the sought minimizers is established under mild connectivity conditions. Numerical results support our claims.
Submitted 12 March, 2013;
originally announced March 2013.
-
Performance of a Distributed Stochastic Approximation Algorithm
Authors:
Pascal Bianchi,
Gersende Fort,
Walid Hachem
Abstract:
In this paper, a distributed stochastic approximation algorithm is studied. Applications of such algorithms include decentralized estimation, optimization, control or computing. The algorithm consists of two steps: a local step, where each node in a network updates a local estimate using a stochastic approximation algorithm with decreasing step size, and a gossip step, where a node computes a local weighted average between its estimates and those of its neighbors. Convergence of the estimates toward a consensus is established under weak assumptions. The approach relies on two main ingredients: the existence of a Lyapunov function for the mean field in the agreement subspace, and a contraction property of the random matrices of weights in the subspace orthogonal to the agreement subspace. A second-order analysis of the algorithm is also performed in the form of a Central Limit Theorem. The Polyak-averaged version of the algorithm is also considered.
Submitted 2 December, 2013; v1 submitted 7 March, 2012;
originally announced March 2012.
-
Convergence of a Multi-Agent Projected Stochastic Gradient Algorithm for Non-Convex Optimization
Authors:
Pascal Bianchi,
Jérémie Jakubowicz
Abstract:
We introduce a new framework for the convergence analysis of a class of distributed constrained non-convex optimization algorithms in multi-agent systems. The aim is to search for local minimizers of a non-convex objective function which is assumed to be a sum of local utility functions of the agents. The algorithm under study consists of two steps: a local stochastic gradient descent at each agent and a gossip step that drives the network of agents to a consensus. Under the assumption of decreasing stepsize, it is proved that consensus is asymptotically achieved in the network and that the algorithm converges to the set of Karush-Kuhn-Tucker points. As an important feature, the algorithm does not require the double-stochasticity of the gossip matrices. It is in particular suitable for use in a natural broadcast scenario for which no feedback messages between agents are required. It is proved that our result also holds if the number of communications in the network per unit of time vanishes at moderate speed as time increases, allowing for potential savings of the network's energy. Applications to power allocation in wireless ad-hoc networks are discussed. Finally, we provide numerical results which support our claims.
Submitted 2 December, 2013; v1 submitted 13 July, 2011;
originally announced July 2011.
-
High-Rate Vector Quantization for the Neyman-Pearson Detection of Correlated Processes
Authors:
Joffrey Villard,
Pascal Bianchi
Abstract:
This paper investigates the effect of quantization on the performance of the Neyman-Pearson test. It is assumed that a sensing unit observes samples of a correlated stationary ergodic multivariate process. Each sample is passed through an N-point quantizer and transmitted to a decision device which performs a binary hypothesis test. For any false alarm level, it is shown that the miss probability of the Neyman-Pearson test converges to zero exponentially as the number of samples tends to infinity, assuming that the observed process satisfies certain mixing conditions. The main contribution of this paper is to provide a compact closed-form expression of the error exponent in the high-rate regime, i.e., when the number N of quantization levels tends to infinity, generalizing previous results of Gupta and Hero to the case of non-independent observations. If d represents the dimension of one sample, it is proved that the error exponent converges at rate $N^{2/d}$ to the one obtained in the absence of quantization. As an application, relevant high-rate quantization strategies which lead to a large error exponent are determined. Numerical results indicate that the proposed quantization rule can yield better performance than existing ones in terms of detection error.
Submitted 4 May, 2011; v1 submitted 30 April, 2010;
originally announced April 2010.
-
Performance of Statistical Tests for Single Source Detection using Random Matrix Theory
Authors:
Pascal Bianchi,
Merouane Debbah,
Mylène Maïda,
Jamal Najim
Abstract:
This paper introduces a unified framework for the detection of a source with a sensor array in the context where the noise variance and the channel between the source and the sensors are unknown at the receiver. The Generalized Maximum Likelihood Test is studied and yields the analysis of the ratio between the maximum eigenvalue of the sampled covariance matrix and its normalized trace. Using recent results of random matrix theory, a practical way to evaluate the threshold and the $p$-value of the test is provided in the asymptotic regime where the number $K$ of sensors and the number $N$ of observations per sensor are large but have the same order of magnitude. The theoretical performance of the test is then analyzed in terms of Receiver Operating Characteristic (ROC) curve. It is in particular proved that both Type I and Type II error probabilities converge to zero exponentially as the dimensions increase at the same rate, and closed-form expressions are provided for the error exponents. These theoretical results rely on a precise description of the large deviations of the largest eigenvalue of spiked random matrix models, and establish that the presented test asymptotically outperforms the popular test based on the condition number of the sampled covariance matrix.
Submitted 31 May, 2010; v1 submitted 5 October, 2009;
originally announced October 2009.
-
Asymptotic Independence in the Spectrum of the Gaussian Unitary Ensemble
Authors:
P. Bianchi,
M. Debbah,
J. Najim
Abstract:
Consider an $n \times n$ matrix from the Gaussian Unitary Ensemble (GUE). Given a finite collection of bounded disjoint real Borel sets $(Δ_{i,n},\ 1\leq i\leq p)$, properly rescaled, and eventually included in any neighbourhood of the support of Wigner's semi-circle law, we prove that the related counting measures $({\mathcal N}_n(Δ_{i,n}), 1\leq i\leq p)$, where ${\mathcal N}_n(Δ)$ represents the number of eigenvalues within $Δ$, are asymptotically independent as the size $n$ goes to infinity, $p$ being fixed.
As a consequence, we prove that the largest and smallest eigenvalues, properly centered and rescaled, are asymptotically independent; we finally describe the fluctuations of the condition number of a matrix from the GUE.
Submitted 6 November, 2008;
originally announced November 2008.