Search | arXiv e-print repository

Optimal convergence rates of totally asynchronous optimization

Authors: Xuyang Wu, Sindri Magnusson, Hamid Reza Feyzmahdavian, Mikael Johansson

Abstract: Asynchronous optimization algorithms are at the core of modern machine learning and resource allocation systems. However, most convergence results consider bounded information delays and several important algorithms lack guarantees when they operate under total asynchrony. In this paper, we derive explicit convergence rates for the proximal incremental aggregated gradient (PIAG) and the asynchronous block-coordinate descent (Async-BCD) methods under a specific model of total asynchrony, and show that the derived rates are order-optimal. The convergence bounds provide an insightful understanding of how the growth rate of the delays deteriorates the convergence times of the algorithms. Our theoretical findings are demonstrated by a numerical example.

Submitted 9 March, 2022; originally announced March 2022.

Comments: 6 pages

arXiv:2202.08550 [pdf, ps, other]

Delay-adaptive step-sizes for asynchronous learning

Authors: Xuyang Wu, Sindri Magnusson, Hamid Reza Feyzmahdavian, Mikael Johansson

Abstract: In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most analysis results for the related asynchronous algorithms use an upper bound on the information delays in the system to determine learning rates. Not only are such bounds hard to obtain in advance, but they also result in unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block-coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.

Submitted 11 April, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

Comments: 21 pages, 4 figures

arXiv:2109.04522 [pdf, other]

Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees

Authors: Hamid Reza Feyzmahdavian, Mikael Johansson

Abstract: We introduce novel convergence results for asynchronous iterations that appear in the analysis of parallel and distributed optimization algorithms. The results are simple to apply and give explicit estimates for how the degree of asynchrony impacts the convergence rates of the iterates. Our results shorten, streamline and strengthen existing convergence proofs for several asynchronous optimization methods and allow us to establish convergence guarantees for popular algorithms that were thus far lacking a complete theoretical understanding. Specifically, we use our results to derive better iteration complexity bounds for proximal incremental aggregated gradient methods, to obtain tighter guarantees depending on the average rather than maximum delay for the asynchronous stochastic gradient descent method, to provide less conservative analyses of the speedup conditions for asynchronous block-coordinate implementations of Krasnoselskii-Mann iterations, and to quantify the convergence rates for totally asynchronous iterations under various assumptions on communication delays and update rates.

Submitted 3 April, 2023; v1 submitted 9 September, 2021; originally announced September 2021.

Comments: 62 pages, 1 Figure

arXiv:2006.13838 [pdf, other]

Advances in Asynchronous Parallel and Distributed Optimization

Authors: Mahmoud Assran, Arda Aytekin, Hamid Feyzmahdavian, Mikael Johansson, Michael Rabbat

Abstract: Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables. Consequently, they generally can make more efficient use of computational resources than synchronous methods, and they are not sensitive to issues like stragglers (i.e., slow nodes) and unreliable communication links. Mathematical modeling of asynchronous methods involves proper accounting of information delays, which makes their analysis challenging. This article reviews recent developments in the design and analysis of asynchronous optimization methods, covering both centralized methods, where all processors update a master copy of the optimization variables, and decentralized methods, where each processor maintains a local copy of the variables. The analysis provides insights as to how the degree of asynchrony impacts convergence rates, especially in stochastic optimization methods.

Submitted 24 June, 2020; originally announced June 2020.

Comments: 33 pages, 4 figures

arXiv:1806.06573 [pdf, other]

Distributed learning with compressed gradients

Authors: Sarit Khirirat, Hamid Reza Feyzmahdavian, Mikael Johansson

Abstract: Asynchronous computation and gradient compression have emerged as two key techniques for achieving scalability in distributed optimization for large-scale machine learning. This paper presents a unified analysis framework for distributed gradient methods operating with stale and compressed gradients. Non-asymptotic bounds on convergence rates and information exchange are derived for several optimization algorithms. These bounds give explicit expressions for step-sizes and characterize how the amount of asynchrony and the compression accuracy affect iteration and communication complexity guarantees. Numerical results highlight convergence properties of different gradient compression algorithms and confirm that fast convergence under limited information exchange is indeed possible.

Submitted 29 November, 2018; v1 submitted 18 June, 2018; originally announced June 2018.

Comments: 33 pages, 4 figures, 2 tables

arXiv:1610.05507 [pdf, ps, other]

Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server

Authors: Arda Aytekin, Hamid Reza Feyzmahdavian, Mikael Johansson

Abstract: This paper presents an asynchronous incremental aggregated gradient algorithm and its implementation in a parameter server framework for solving regularized optimization problems. The algorithm can handle both general convex (possibly non-smooth) regularizers and general convex constraints. When the empirical data loss is strongly convex, we establish a linear convergence rate, give explicit expressions for step-size choices that guarantee convergence to the optimum, and bound the associated convergence factors. The expressions have an explicit dependence on the degree of asynchrony and recover classical results under synchronous operation. Simulations and implementations on commercial compute clouds validate our findings.

Submitted 18 October, 2016; originally announced October 2016.

Comments: 10 pages, 3 figures

arXiv:1607.07966 [pdf, ps, other]

Stability Analysis of Monotone Systems via Max-separable Lyapunov Functions

Authors: Hamid Reza Feyzmahdavian, Bart Besselink, Mikael Johansson

Abstract: We analyze stability properties of monotone nonlinear systems via max-separable Lyapunov functions, motivated by the following observations: first, recent results have shown that asymptotic stability of a monotone nonlinear system implies the existence of a max-separable Lyapunov function on a compact set; second, for monotone linear systems, asymptotic stability implies the stronger properties of D-stability and insensitivity to time-delays. This paper establishes that for monotone nonlinear systems, equivalence holds between asymptotic stability, the existence of a max-separable Lyapunov function, D-stability, and insensitivity to bounded and unbounded time-varying delays. In particular, a new and general notion of D-stability for monotone nonlinear systems is discussed and a set of necessary and sufficient conditions for delay-independent stability is derived. Examples show how the results extend the state-of-the-art.

Submitted 27 July, 2016; originally announced July 2016.

arXiv:1505.04824 [pdf, other]

An Asynchronous Mini-Batch Algorithm for Regularized Stochastic Optimization

Authors: Hamid Reza Feyzmahdavian, Arda Aytekin, Mikael Johansson

Abstract: Mini-batch optimization has proven to be a powerful paradigm for large-scale learning. However, the state-of-the-art parallel mini-batch algorithms assume synchronous operation or cyclic update orders. When worker nodes are heterogeneous (due to different computational capabilities or different communication delays), synchronous and cyclic operations are inefficient since they will leave workers idle waiting for the slower nodes to complete their computations. In this paper, we propose an asynchronous mini-batch algorithm for regularized stochastic optimization problems with smooth loss functions that eliminates idle waiting and allows workers to run at their maximal update rates. We show that by suitably choosing the step-size values, the algorithm achieves a rate of the order $O(1/\sqrt{T})$ for general convex regularization functions, and the rate $O(1/T)$ for strongly convex regularization functions, where $T$ is the number of iterations. In both cases, the impact of asynchrony on the convergence rate of our algorithm is asymptotically negligible, and a near-linear speedup in the number of workers can be expected. Theoretical results are confirmed in real implementations on a distributed computing infrastructure.

Submitted 18 May, 2015; originally announced May 2015.

arXiv:1412.7457 [pdf, other]

Global convergence of the Heavy-ball method for convex optimization

Authors: Euhanna Ghadimi, Hamid Reza Feyzmahdavian, Mikael Johansson

Abstract: This paper establishes global convergence and provides global bounds of the convergence rate of the Heavy-ball method for convex optimization problems. When the objective function has Lipschitz-continuous gradient, we show that the Cesaro average of the iterates converges to the optimum at a rate of $O(1/k)$, where $k$ is the number of iterations. When the objective function is also strongly convex, we prove that the Heavy-ball iterates converge linearly to the unique optimum.

Submitted 23 December, 2014; originally announced December 2014.
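
The Heavy-ball iteration named in the abstract above is commonly written as $x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta (x_k - x_{k-1})$. The following is a minimal sketch of that update together with the running (Cesaro) average of the iterates. The function name `heavy_ball`, the test quadratic, and the values of `step` and `momentum` are illustrative assumptions, not the step-size conditions or experiments from the paper.

```python
import numpy as np

def heavy_ball(grad, x0, step, momentum, num_iters):
    """Run the Heavy-ball update and return (last iterate, average of iterates).

    grad      : callable returning the gradient at a point
    step      : step-size alpha (assumed value, not taken from the paper)
    momentum  : momentum weight beta (assumed value, not taken from the paper)
    """
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()              # start with x_0 = x_{-1}, so the first step has no momentum
    running_sum = np.zeros_like(x)
    for _ in range(num_iters):
        # x_{k+1} = x_k - alpha * grad f(x_k) + beta * (x_k - x_{k-1})
        x_next = x - step * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
        running_sum += x           # accumulate x_1, ..., x_K for the average
    return x, running_sum / num_iters

# Hypothetical test problem: f(x) = 0.5 * x^T A x with A = diag(1, 10), minimizer at 0.
A = np.diag([1.0, 10.0])
x_last, x_avg = heavy_ball(lambda x: A @ x, x0=[1.0, 1.0],
                           step=0.05, momentum=0.5, num_iters=500)
```

On this strongly convex quadratic both the last iterate and the average approach the minimizer; the average is the quantity the abstract's $O(1/k)$ statement refers to when only a Lipschitz-continuous gradient is assumed.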

arXiv:1407.1502 [pdf, other]

Sub-homogeneous positive monotone systems are insensitive to heterogeneous time-varying delays

Authors: Hamid Reza Feyzmahdavian, Themistoklis Charalambous, Mikael Johansson

Abstract: We show that a sub-homogeneous positive monotone system with bounded heterogeneous time-varying delays is globally asymptotically stable if and only if the corresponding delay-free system is globally asymptotically stable. The proof is based on an extension of a delay-independent stability result for monotone systems under constant delays by Smith to systems with bounded heterogeneous time-varying delays. Under the additional assumption of positivity and sub-homogeneous vector fields, we establish the aforementioned delay insensitivity property and derive a novel test for global asymptotic stability. If the system has a unique equilibrium point in the positive orthant, we prove that our stability test is necessary and sufficient. Specialized to positive linear systems, our results extend and sharpen existing results from the literature.

Submitted 6 July, 2014; originally announced July 2014.

Comments: Submitted to the 21st International Symposium on Mathematical Theory of Networks and Systems (MTNS), 2014

arXiv:1406.7210 [pdf, ps, other]

Asymptotic Stability and Decay Rates of Homogeneous Positive Systems With Bounded and Unbounded Delays

Authors: Hamid Reza Feyzmahdavian, Themistoklis Charalambous, Mikael Johansson

Abstract: There are several results on the stability of nonlinear positive systems in the presence of time delays. However, most of them assume that the delays are constant. This paper considers time-varying, possibly unbounded, delays and establishes asymptotic stability and bounds the decay rate of a significant class of nonlinear positive systems which includes positive linear systems as a special case. Specifically, we present a necessary and sufficient condition for delay-independent stability of continuous-time positive systems whose vector fields are cooperative and homogeneous. We show that global asymptotic stability of such systems is independent of the magnitude and variation of the time delays. For various classes of time delays, we are able to derive explicit expressions that quantify the decay rates of positive systems. We also provide the corresponding counterparts for discrete-time positive systems whose vector fields are non-decreasing and homogeneous.

Submitted 30 September, 2014; v1 submitted 27 June, 2014; originally announced June 2014.

Comments: SIAM Journal on Control and Optimization

Journal ref: SIAM Journal on Control and Optimization, 52(4), pp. 2623-2650, 2014

arXiv:1311.2897 [pdf, other]

doi 10.1109/TAC.2013.2292739

Exponential Stability of Homogeneous Positive Systems of Degree One With Time-Varying Delays

Authors: Hamid Reza Feyzmahdavian, Themistoklis Charalambous, Mikael Johansson

Abstract: While the asymptotic stability of positive linear systems in the presence of bounded time delays has been thoroughly investigated, the theory for nonlinear positive systems is considerably less well-developed. This paper presents a set of conditions for establishing delay-independent stability and bounding the decay rate of a significant class of nonlinear positive systems which includes positive linear systems as a special case. Specifically, when the time delays have a known upper bound, we derive necessary and sufficient conditions for exponential stability of (a) continuous-time positive systems whose vector fields are homogeneous and cooperative, and (b) discrete-time positive systems whose vector fields are homogeneous and order preserving. We then present explicit expressions that allow us to quantify the impact of delays on the decay rate and show that the best decay rate of positive linear systems that our bounds provide can be found via convex optimization. Finally, we extend the results to general linear systems with time-varying delays.

Submitted 12 November, 2013; originally announced November 2013.

Comments: Submitted to IEEE Transactions on Automatic Control

Journal ref: IEEE Transactions on Automatic Control, 59 (6), pp. 1594-1599, June 2014

arXiv:1309.4251 [pdf, other]

doi 10.1109/CDC.2012.6426380

Optimal Distributed Controller Design with Communication Delays: Application to Vehicle Formations

Authors: Hamid Reza Feyzmahdavian, Assad Alam, Ather Gattami

Abstract: This paper develops a controller synthesis algorithm for distributed LQG control problems under output feedback. We consider a system consisting of three interconnected linear subsystems with a delayed information sharing structure. While the state-feedback case of this problem has previously been solved, the extension to output-feedback is nontrivial, as the classical separation principle fails. To find the optimal solution, the controller is decomposed into two independent components. One is a delayed centralized LQR, and the other is the sum of correction terms based on additional local information. Explicit discrete-time equations are derived whose solutions are the gains of the optimal controller.

Submitted 17 September, 2013; originally announced September 2013.

Comments: Submitted to the 51st IEEE Conference on Decision and Control, 2012

arXiv:1204.6178 [pdf, other]

Distributed Output-Feedback LQG Control with Delayed Information Sharing

Authors: Hamid Reza Feyzmahdavian, Ather Gattami, Mikael Johansson

Abstract: This paper develops a controller synthesis method for distributed LQG control problems under output-feedback. We consider a system consisting of three interconnected linear subsystems with a delayed information sharing structure. While the state-feedback case has previously been solved, the extension to output-feedback is nontrivial as the classical separation principle fails. To find the optimal solution, the controller is decomposed into two independent components: a centralized LQG-optimal controller under delayed state observations, and a sum of correction terms based on additional local information available to decision makers. Explicit discrete-time equations are derived whose solutions are the gains of the optimal controller.

Submitted 17 September, 2013; v1 submitted 27 April, 2012; originally announced April 2012.

Comments: 25 pages, 3 figures

arXiv:1201.3740 [pdf, ps, other]

Contractive Interference Functions and Rates of Convergence of Distributed Power Control Laws

Authors: Hamid Reza Feyzmahdavian, Mikael Johansson, Themistoklis Charalambous

Abstract: The standard interference functions introduced by Yates have been very influential on the analysis and design of distributed power control laws. While powerful and versatile, the framework has some drawbacks: the existence of fixed-points has to be established separately, and no guarantees are given on the rate of convergence of the iterates. This paper introduces contractive interference functions, a slight reformulation of the standard interference functions that guarantees the existence and uniqueness of fixed-points along with linear convergence of iterates. We show that many power control laws from the literature are contractive and derive, sometimes for the first time, analytical convergence rate estimates for these algorithms. We also prove that contractive interference functions converge when executed totally asynchronously and, under the assumption that the communication delay is bounded, derive an explicit bound on the convergence time penalty due to increased delay. Finally, we demonstrate that although standard interference functions are, in general, not contractive, they are all para-contractions with respect to a certain metric. Similar results for two-sided scalable interference functions are also derived.

Submitted 30 May, 2012; v1 submitted 18 January, 2012; originally announced January 2012.

Comments: 20 pages, 1 figure

Journal ref: IEEE Transactions on Wireless Communications, 11 (12), pp. 4494-4502, December 2012

Showing 1–15 of 15 results for author: Feyzmahdavian, H