-
Optimal convergence rates of totally asynchronous optimization
Authors:
Xuyang Wu,
Sindri Magnusson,
Hamid Reza Feyzmahdavian,
Mikael Johansson
Abstract:
Asynchronous optimization algorithms are at the core of modern machine learning and resource allocation systems. However, most convergence results consider bounded information delays and several important algorithms lack guarantees when they operate under total asynchrony. In this paper, we derive explicit convergence rates for the proximal incremental aggregated gradient (PIAG) and the asynchronous block-coordinate descent (Async-BCD) methods under a specific model of total asynchrony, and show that the derived rates are order-optimal. The convergence bounds provide an insightful understanding of how the growth rate of the delays deteriorates the convergence times of the algorithms. Our theoretical findings are demonstrated by a numerical example.
Submitted 9 March, 2022;
originally announced March 2022.
-
Delay-adaptive step-sizes for asynchronous learning
Authors:
Xuyang Wu,
Sindri Magnusson,
Hamid Reza Feyzmahdavian,
Mikael Johansson
Abstract:
In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most analysis results for the related asynchronous algorithms use an upper bound on the information delays in the system to determine learning rates. Not only are such bounds hard to obtain in advance, but they also result in unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block-coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.
Submitted 11 April, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees
Authors:
Hamid Reza Feyzmahdavian,
Mikael Johansson
Abstract:
We introduce novel convergence results for asynchronous iterations that appear in the analysis of parallel and distributed optimization algorithms. The results are simple to apply and give explicit estimates for how the degree of asynchrony impacts the convergence rates of the iterates. Our results shorten, streamline and strengthen existing convergence proofs for several asynchronous optimization methods and allow us to establish convergence guarantees for popular algorithms that were thus far lacking a complete theoretical understanding. Specifically, we use our results to derive better iteration complexity bounds for proximal incremental aggregated gradient methods, to obtain tighter guarantees depending on the average rather than maximum delay for the asynchronous stochastic gradient descent method, to provide less conservative analyses of the speedup conditions for asynchronous block-coordinate implementations of Krasnoselskii-Mann iterations, and to quantify the convergence rates for totally asynchronous iterations under various assumptions on communication delays and update rates.
Submitted 3 April, 2023; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Advances in Asynchronous Parallel and Distributed Optimization
Authors:
Mahmoud Assran,
Arda Aytekin,
Hamid Feyzmahdavian,
Mikael Johansson,
Michael Rabbat
Abstract:
Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables. Consequently, they generally can make more efficient use of computational resources than synchronous methods, and they are not sensitive to issues like stragglers (i.e., slow nodes) and unreliable communication links. Mathematical modeling of asynchronous methods involves proper accounting of information delays, which makes their analysis challenging. This article reviews recent developments in the design and analysis of asynchronous optimization methods, covering both centralized methods, where all processors update a master copy of the optimization variables, and decentralized methods, where each processor maintains a local copy of the variables. The analysis provides insights as to how the degree of asynchrony impacts convergence rates, especially in stochastic optimization methods.
Submitted 24 June, 2020;
originally announced June 2020.
-
Distributed learning with compressed gradients
Authors:
Sarit Khirirat,
Hamid Reza Feyzmahdavian,
Mikael Johansson
Abstract:
Asynchronous computation and gradient compression have emerged as two key techniques for achieving scalability in distributed optimization for large-scale machine learning. This paper presents a unified analysis framework for distributed gradient methods operating with stale and compressed gradients. Non-asymptotic bounds on convergence rates and information exchange are derived for several optimization algorithms. These bounds give explicit expressions for step-sizes and characterize how the amount of asynchrony and the compression accuracy affect iteration and communication complexity guarantees. Numerical results highlight convergence properties of different gradient compression algorithms and confirm that fast convergence under limited information exchange is indeed possible.
Submitted 29 November, 2018; v1 submitted 18 June, 2018;
originally announced June 2018.
-
Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server
Authors:
Arda Aytekin,
Hamid Reza Feyzmahdavian,
Mikael Johansson
Abstract:
This paper presents an asynchronous incremental aggregated gradient algorithm and its implementation in a parameter server framework for solving regularized optimization problems. The algorithm can handle both general convex (possibly non-smooth) regularizers and general convex constraints. When the empirical data loss is strongly convex, we establish a linear convergence rate, give explicit expressions for step-size choices that guarantee convergence to the optimum, and bound the associated convergence factors. The expressions have an explicit dependence on the degree of asynchrony and recover classical results under synchronous operation. Simulations and implementations on commercial compute clouds validate our findings.
Submitted 18 October, 2016;
originally announced October 2016.
-
Stability Analysis of Monotone Systems via Max-separable Lyapunov Functions
Authors:
Hamid Reza Feyzmahdavian,
Bart Besselink,
Mikael Johansson
Abstract:
We analyze stability properties of monotone nonlinear systems via max-separable Lyapunov functions, motivated by the following observations: first, recent results have shown that asymptotic stability of a monotone nonlinear system implies the existence of a max-separable Lyapunov function on a compact set; second, for monotone linear systems, asymptotic stability implies the stronger properties of D-stability and insensitivity to time-delays. This paper establishes that for monotone nonlinear systems, equivalence holds between asymptotic stability, the existence of a max-separable Lyapunov function, D-stability, and insensitivity to bounded and unbounded time-varying delays. In particular, a new and general notion of D-stability for monotone nonlinear systems is discussed and a set of necessary and sufficient conditions for delay-independent stability is derived. Examples show how the results extend the state-of-the-art.
Submitted 27 July, 2016;
originally announced July 2016.
-
An Asynchronous Mini-Batch Algorithm for Regularized Stochastic Optimization
Authors:
Hamid Reza Feyzmahdavian,
Arda Aytekin,
Mikael Johansson
Abstract:
Mini-batch optimization has proven to be a powerful paradigm for large-scale learning. However, the state-of-the-art parallel mini-batch algorithms assume synchronous operation or cyclic update orders. When worker nodes are heterogeneous (due to different computational capabilities or different communication delays), synchronous and cyclic operations are inefficient since they will leave workers idle waiting for the slower nodes to complete their computations. In this paper, we propose an asynchronous mini-batch algorithm for regularized stochastic optimization problems with smooth loss functions that eliminates idle waiting and allows workers to run at their maximal update rates. We show that by suitably choosing the step-size values, the algorithm achieves a rate of the order $O(1/\sqrt{T})$ for general convex regularization functions, and the rate $O(1/T)$ for strongly convex regularization functions, where $T$ is the number of iterations. In both cases, the impact of asynchrony on the convergence rate of our algorithm is asymptotically negligible, and a near-linear speedup in the number of workers can be expected. Theoretical results are confirmed in real implementations on a distributed computing infrastructure.
Submitted 18 May, 2015;
originally announced May 2015.
-
Global convergence of the Heavy-ball method for convex optimization
Authors:
Euhanna Ghadimi,
Hamid Reza Feyzmahdavian,
Mikael Johansson
Abstract:
This paper establishes global convergence and provides global bounds of the convergence rate of the Heavy-ball method for convex optimization problems. When the objective function has Lipschitz-continuous gradient, we show that the Cesaro average of the iterates converges to the optimum at a rate of $O(1/k)$, where $k$ is the number of iterations. When the objective function is also strongly convex, we prove that the Heavy-ball iterates converge linearly to the unique optimum.
Submitted 23 December, 2014;
originally announced December 2014.
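Illustration (not from the paper): a minimal sketch of the iteration this abstract refers to, assuming the standard Heavy-ball update $x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta (x_k - x_{k-1})$ with constant parameters. The function name `heavy_ball` and the arguments `alpha`, `beta`, and `num_iters` are hypothetical, and the step-size and momentum values for which the stated rates hold are given in the paper, not here. The sketch returns both the last iterate and the running (Cesaro) average of the iterates.

```python
import numpy as np


def heavy_ball(grad, x0, alpha, beta, num_iters):
    """Run the Heavy-ball iteration with constant step-size and momentum.

    grad      : callable returning the gradient of the objective at x
    x0        : starting point (used for both x_0 and x_{-1})
    alpha     : step-size (assumed constant; an illustrative choice)
    beta      : momentum parameter (assumed constant; an illustrative choice)
    num_iters : number of iterations to run

    Returns the last iterate and the Cesaro (running) average of x_0..x_k.
    """
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    avg = x.copy()  # average of the iterates seen so far (just x_0)
    for k in range(1, num_iters + 1):
        # Gradient step plus a momentum term built from the previous iterate.
        x_next = x - alpha * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
        # Incremental mean over the k + 1 iterates x_0, ..., x_k.
        avg += (x - avg) / (k + 1)
    return x, avg
```

On a strongly convex quadratic, the last iterate of this sketch would be expected to approach the minimizer much faster than the averaged iterate, which is in line with the two regimes the abstract distinguishes.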
-
Sub-homogeneous positive monotone systems are insensitive to heterogeneous time-varying delays
Authors:
Hamid Reza Feyzmahdavian,
Themistoklis Charalambous,
Mikael Johansson
Abstract:
We show that a sub-homogeneous positive monotone system with bounded heterogeneous time-varying delays is globally asymptotically stable if and only if the corresponding delay-free system is globally asymptotically stable. The proof is based on an extension of a delay-independent stability result for monotone systems under constant delays by Smith to systems with bounded heterogeneous time-varying delays. Under the additional assumption of positivity and sub-homogeneous vector fields, we establish the aforementioned delay insensitivity property and derive a novel test for global asymptotic stability. If the system has a unique equilibrium point in the positive orthant, we prove that our stability test is necessary and sufficient. Specialized to positive linear systems, our results extend and sharpen existing results from the literature.
Submitted 6 July, 2014;
originally announced July 2014.
-
Asymptotic Stability and Decay Rates of Homogeneous Positive Systems With Bounded and Unbounded Delays
Authors:
Hamid Reza Feyzmahdavian,
Themistoklis Charalambous,
Mikael Johansson
Abstract:
There are several results on the stability of nonlinear positive systems in the presence of time delays. However, most of them assume that the delays are constant. This paper considers time-varying, possibly unbounded, delays and establishes asymptotic stability and bounds the decay rate of a significant class of nonlinear positive systems which includes positive linear systems as a special case. Specifically, we present a necessary and sufficient condition for delay-independent stability of continuous-time positive systems whose vector fields are cooperative and homogeneous. We show that global asymptotic stability of such systems is independent of the magnitude and variation of the time delays. For various classes of time delays, we are able to derive explicit expressions that quantify the decay rates of positive systems. We also provide the corresponding counterparts for discrete-time positive systems whose vector fields are non-decreasing and homogeneous.
Submitted 30 September, 2014; v1 submitted 27 June, 2014;
originally announced June 2014.
-
Exponential Stability of Homogeneous Positive Systems of Degree One With Time-Varying Delays
Authors:
Hamid Reza Feyzmahdavian,
Themistoklis Charalambous,
Mikael Johansson
Abstract:
While the asymptotic stability of positive linear systems in the presence of bounded time delays has been thoroughly investigated, the theory for nonlinear positive systems is considerably less well-developed. This paper presents a set of conditions for establishing delay-independent stability and bounding the decay rate of a significant class of nonlinear positive systems which includes positive linear systems as a special case. Specifically, when the time delays have a known upper bound, we derive necessary and sufficient conditions for exponential stability of (a) continuous-time positive systems whose vector fields are homogeneous and cooperative, and (b) discrete-time positive systems whose vector fields are homogeneous and order preserving. We then present explicit expressions that allow us to quantify the impact of delays on the decay rate and show that the best decay rate of positive linear systems that our bounds provide can be found via convex optimization. Finally, we extend the results to general linear systems with time-varying delays.
Submitted 12 November, 2013;
originally announced November 2013.
-
Optimal Distributed Controller Design with Communication Delays: Application to Vehicle Formations
Authors:
Hamid Reza Feyzmahdavian,
Assad Alam,
Ather Gattami
Abstract:
This paper develops a controller synthesis algorithm for distributed LQG control problems under output feedback. We consider a system consisting of three interconnected linear subsystems with a delayed information sharing structure. While the state-feedback case of this problem has previously been solved, the extension to output-feedback is nontrivial, as the classical separation principle fails. To find the optimal solution, the controller is decomposed into two independent components. One is delayed centralized LQR, and the other is the sum of correction terms based on additional local information. Explicit discrete-time equations are derived whose solutions are the gains of the optimal controller.
Submitted 17 September, 2013;
originally announced September 2013.
-
Distributed Output-Feedback LQG Control with Delayed Information Sharing
Authors:
Hamid Reza Feyzmahdavian,
Ather Gattami,
Mikael Johansson
Abstract:
This paper develops a controller synthesis method for distributed LQG control problems under output-feedback. We consider a system consisting of three interconnected linear subsystems with a delayed information sharing structure. While the state-feedback case has previously been solved, the extension to output-feedback is nontrivial as the classical separation principle fails. To find the optimal solution, the controller is decomposed into two independent components: a centralized LQG-optimal controller under delayed state observations, and a sum of correction terms based on additional local information available to decision makers. Explicit discrete-time equations are derived whose solutions are the gains of the optimal controller.
Submitted 17 September, 2013; v1 submitted 27 April, 2012;
originally announced April 2012.
-
Contractive Interference Functions and Rates of Convergence of Distributed Power Control Laws
Authors:
Hamid Reza Feyzmahdavian,
Mikael Johansson,
Themistoklis Charalambous
Abstract:
The standard interference functions introduced by Yates have been very influential on the analysis and design of distributed power control laws. While powerful and versatile, the framework has some drawbacks: the existence of fixed-points has to be established separately, and no guarantees are given on the rate of convergence of the iterates. This paper introduces contractive interference functions, a slight reformulation of the standard interference functions that guarantees the existence and uniqueness of fixed-points along with linear convergence of iterates. We show that many power control laws from the literature are contractive and derive, sometimes for the first time, analytical convergence rate estimates for these algorithms. We also prove that contractive interference functions converge when executed totally asynchronously and, under the assumption that the communication delay is bounded, derive an explicit bound on the convergence time penalty due to increased delay. Finally, we demonstrate that although standard interference functions are, in general, not contractive, they are all para-contractions with respect to a certain metric. Similar results for two-sided scalable interference functions are also derived.
Submitted 30 May, 2012; v1 submitted 18 January, 2012;
originally announced January 2012.