
Delay-adaptive step-sizes for asynchronous learning

Authors: Xuyang Wu, Sindri Magnusson, Hamid Reza Feyzmahdavian, Mikael Johansson

Abstract: In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most analysis results for the related asynchronous algorithms use an upper bound on the information delays in the system to determine learning rates. Not only are such bounds hard to obtain in advance, but they also result in unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block-coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.

Submitted 11 April, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

Comments: 21 pages, 4 figures

arXiv:2109.04522

Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees

Authors: Hamid Reza Feyzmahdavian, Mikael Johansson

Abstract: We introduce novel convergence results for asynchronous iterations that appear in the analysis of parallel and distributed optimization algorithms. The results are simple to apply and give explicit estimates for how the degree of asynchrony impacts the convergence rates of the iterates. Our results shorten, streamline and strengthen existing convergence proofs for several asynchronous optimization methods and allow us to establish convergence guarantees for popular algorithms that were thus far lacking a complete theoretical understanding. Specifically, we use our results to derive better iteration complexity bounds for proximal incremental aggregated gradient methods, to obtain tighter guarantees depending on the average rather than maximum delay for the asynchronous stochastic gradient descent method, to provide less conservative analyses of the speedup conditions for asynchronous block-coordinate implementations of Krasnoselskii-Mann iterations, and to quantify the convergence rates for totally asynchronous iterations under various assumptions on communication delays and update rates.

Submitted 3 April, 2023; v1 submitted 9 September, 2021; originally announced September 2021.

Comments: 62 pages, 1 figure

arXiv:2006.13838

Advances in Asynchronous Parallel and Distributed Optimization

Authors: Mahmoud Assran, Arda Aytekin, Hamid Feyzmahdavian, Mikael Johansson, Michael Rabbat

Abstract: Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables. Consequently, they generally can make more efficient use of computational resources than synchronous methods, and they are not sensitive to issues like stragglers (i.e., slow nodes) and unreliable communication links. Mathematical modeling of asynchronous methods involves proper accounting of information delays, which makes their analysis challenging. This article reviews recent developments in the design and analysis of asynchronous optimization methods, covering both centralized methods, where all processors update a master copy of the optimization variables, and decentralized methods, where each processor maintains a local copy of the variables. The analysis provides insights as to how the degree of asynchrony impacts convergence rates, especially in stochastic optimization methods.

Submitted 24 June, 2020; originally announced June 2020.

Comments: 33 pages, 4 figures

arXiv:1610.05507

Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server

Authors: Arda Aytekin, Hamid Reza Feyzmahdavian, Mikael Johansson

Abstract: This paper presents an asynchronous incremental aggregated gradient algorithm and its implementation in a parameter server framework for solving regularized optimization problems. The algorithm can handle both general convex (possibly non-smooth) regularizers and general convex constraints. When the empirical data loss is strongly convex, we establish linear convergence rate, give explicit expressions for step-size choices that guarantee convergence to the optimum, and bound the associated convergence factors. The expressions have an explicit dependence on the degree of asynchrony and recover classical results under synchronous operation. Simulations and implementations on commercial compute clouds validate our findings.

Submitted 18 October, 2016; originally announced October 2016.

Comments: 10 pages, 3 figures

arXiv:1201.3740

Contractive Interference Functions and Rates of Convergence of Distributed Power Control Laws

Authors: Hamid Reza Feyzmahdavian, Mikael Johansson, Themistoklis Charalambous

Abstract: The standard interference functions introduced by Yates have been very influential on the analysis and design of distributed power control laws. While powerful and versatile, the framework has some drawbacks: the existence of fixed-points has to be established separately, and no guarantees are given on the rate of convergence of the iterates. This paper introduces contractive interference functions, a slight reformulation of the standard interference functions that guarantees the existence and uniqueness of fixed-points along with linear convergence of iterates. We show that many power control laws from the literature are contractive and derive, sometimes for the first time, analytical convergence rate estimates for these algorithms. We also prove that contractive interference functions converge when executed totally asynchronously and, under the assumption that the communication delay is bounded, derive an explicit bound on the convergence time penalty due to increased delay. Finally, we demonstrate that although standard interference functions are, in general, not contractive, they are all para-contractions with respect to a certain metric. Similar results for two-sided scalable interference functions are also derived.

Submitted 30 May, 2012; v1 submitted 18 January, 2012; originally announced January 2012.

Comments: 20 pages, 1 figure

Journal ref: IEEE Transactions on Wireless Communications, 11 (12), pp. 4494-4502, December 2012

Showing 1–5 of 5 results for author: Feyzmahdavian, H