Showing 1–9 of 9 results for author: Gasanov, E

  1. arXiv:2402.10774 [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Error Feedback Reloaded: From Quadratic to Arithmetic Mean of Smoothness Constants

    Authors: Peter Richtárik, Elnur Gasanov, Konstantin Burlachenko

    Abstract: Error Feedback (EF) is a highly popular and immensely effective mechanism for fixing convergence issues which arise in distributed training methods (such as distributed GD or SGD) when these are enhanced with greedy communication compression techniques such as TopK. While EF was proposed almost a decade ago (Seide et al., 2014), and despite concentrated effort by the community to advance the theor…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 70 pages, 14 figures, 6 tables

    MSC Class: 90C26; 74Pxx

    ACM Class: G.1.6; I.2.11; I.2.m

  2. arXiv:2306.03626 [pdf, other]

    cs.LG math.OC

    Understanding Progressive Training Through the Framework of Randomized Coordinate Descent

    Authors: Rafał Szlendak, Elnur Gasanov, Peter Richtárik

    Abstract: We propose a Randomized Progressive Training algorithm (RPT), a stochastic proxy for the well-known Progressive Training method (PT) (Karras et al., 2017). Originally designed to train GANs (Goodfellow et al., 2014), PT was proposed as a heuristic, with no convergence analysis even for the simplest objective functions. On the contrary, to the best of our knowledge, RPT is the first PT-type algor…

    Submitted 6 June, 2023; originally announced June 2023.

  3. arXiv:2305.15264 [pdf, other]

    math.OC cs.DC cs.LG stat.ML

    Error Feedback Shines when Features are Rare

    Authors: Peter Richtárik, Elnur Gasanov, Konstantin Burlachenko

    Abstract: We provide the first proof that gradient descent (GD) with greedy sparsification (TopK) and error feedback (EF) can obtain better communication complexity than vanilla GD when solving the distributed optimization problem…

    Submitted 24 May, 2023; originally announced May 2023.

  4. arXiv:2211.00188 [pdf, other]

    cs.LG cs.DC cs.IT

    Adaptive Compression for Communication-Efficient Distributed Training

    Authors: Maksim Makarenko, Elnur Gasanov, Rustem Islamov, Abdurakhmon Sadiev, Peter Richtárik

    Abstract: We propose Adaptive Compressed Gradient Descent (AdaCGD), a novel optimization algorithm for communication-efficient training of supervised machine learning models with adaptive compression level. Our approach is inspired by the recently proposed three point compressor (3PC) framework of Richtárik et al. (2022), which includes error feedback (EF21), lazily aggregated gradient (LAG), and their com…

    Submitted 31 October, 2022; originally announced November 2022.

  5. arXiv:2202.00998 [pdf, other]

    cs.LG cs.DC cs.DS math.OC

    3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation

    Authors: Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, Zhize Li, Eduard Gorbunov

    Abstract: We propose and study a new class of gradient communication mechanisms for communication-efficient training, three point compressors (3PC), as well as efficient distributed nonconvex optimization algorithms that can take advantage of them. Unlike most established approaches, which rely on a static compressor choice (e.g., Top-K), our class allows the compressors to evolve throughout the…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: 52 pages

  6. arXiv:2111.11556 [pdf, other]

    cs.LG math.OC stat.ML

    FLIX: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning

    Authors: Elnur Gasanov, Ahmed Khaled, Samuel Horváth, Peter Richtárik

    Abstract: Federated Learning (FL) is an increasingly popular machine learning paradigm in which multiple nodes try to collaboratively learn under privacy, communication and multiple heterogeneity constraints. A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate in handling sev…

    Submitted 23 February, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: V2: includes non-convex analysis as well as new large-scale experiments with neural networks. To appear in AISTATS 2022

  7. arXiv:2106.04469 [pdf, other]

    math.OC cs.LG

    Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks

    Authors: Dmitry Kovalev, Elnur Gasanov, Peter Richtárik, Alexander Gasnikov

    Abstract: We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network whose links are allowed to change in time. We solve two fundamental problems for this task. First, we establish the first lower bounds on the number of decentralized communication rounds and the number of local computations required to find…

    Submitted 8 June, 2021; originally announced June 2021.

  8. arXiv:2004.01442 [pdf, other]

    cs.LG math.OC stat.ML

    From Local SGD to Local Fixed-Point Methods for Federated Learning

    Authors: Grigory Malinovsky, Dmitry Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtárik

    Abstract: Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms. In this work we consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting. Our work is motivated by the needs of federated learning. In this context, each local operator models the computatio…

    Submitted 16 June, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: Accepted to ICML 2020

  9. arXiv:1802.03703 [pdf, other]

    math.OC

    Stochastic Spectral and Conjugate Descent Methods

    Authors: Dmitry Kovalev, Eduard Gorbunov, Elnur Gasanov, Peter Richtárik

    Abstract: The state-of-the-art methods for solving optimization problems in big dimensions are variants of randomized coordinate descent (RCD). In this paper we introduce a fundamentally new type of acceleration strategy for RCD based on the augmentation of the set of coordinate directions by a few spectral or conjugate directions. As we increase the number of extra directions to be sampled from, the rate o…

    Submitted 11 February, 2018; originally announced February 2018.
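
Entries 1, 3, 4 and 5 all concern error feedback combined with greedy TopK-style compression. The following is a minimal, hypothetical sketch of one such scheme, the EF21 mechanism named in entry 4's abstract; it is an illustration only and not code from any of these papers. Each worker keeps a running gradient estimate g_i and transmits only TopK of the gap between its fresh gradient and g_i. The toy quadratic losses, the function names (top_k, local_grad, ef21_topk_gd), the step size and the iteration count are all assumptions chosen for the example.

```python
from typing import List

Vector = List[float]


def top_k(v: Vector, k: int) -> Vector:
    """Greedy TopK sparsifier: keep the k largest-magnitude entries, zero the rest."""
    keep = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k]
    out = [0.0] * len(v)
    for j in keep:
        out[j] = v[j]
    return out


def local_grad(a: Vector, b: Vector, x: Vector) -> Vector:
    """Gradient of the hypothetical toy loss f_i(x) = 0.5 * sum_j a_j * (x_j - b_j)**2."""
    return [aj * (xj - bj) for aj, bj, xj in zip(a, b, x)]


def ef21_topk_gd(A: List[Vector], B: List[Vector], x0: Vector,
                 k: int, step: float, iters: int) -> Vector:
    """Distributed GD in which worker i only transmits TopK(grad_i - g_i) (EF21-style sketch)."""
    n, d = len(A), len(x0)
    x = list(x0)
    # Each worker's gradient estimate g_i, initialised with its full local gradient
    # (assumption: one uncompressed round at the start).
    g = [local_grad(A[i], B[i], x) for i in range(n)]
    for _ in range(iters):
        # Server step: move along the average of the workers' estimates. A real server
        # would maintain this average by adding the mean of the received messages;
        # recomputing it here gives the same value.
        g_avg = [sum(g[i][j] for i in range(n)) / n for j in range(d)]
        x = [xj - step * gj for xj, gj in zip(x, g_avg)]
        # Worker step: compress the gap between the fresh gradient and g_i,
        # "send" only that sparse message, and add it to g_i.
        for i in range(n):
            fresh = local_grad(A[i], B[i], x)
            msg = top_k([fj - gj for fj, gj in zip(fresh, g[i])], k)
            g[i] = [gj + mj for gj, mj in zip(g[i], msg)]
    return x


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]
    B = [[1.0, -1.0, 0.5], [0.0, 2.0, -0.5]]
    x = ef21_topk_gd(A, B, [0.0, 0.0, 0.0], k=1, step=0.04, iters=1000)
    # Exact minimiser of the average loss, per coordinate: sum_i a_ij * b_ij / sum_i a_ij.
    x_star = [sum(a[j] * b[j] for a, b in zip(A, B)) / sum(a[j] for a in A) for j in range(3)]
    print(x, x_star)
```

Each worker sends one nonzero coordinate per round (k = 1 of d = 3), yet the iterates still approach the exact minimiser because g_i accumulates the compressed corrections. The step size is hand-picked for this toy problem rather than derived from any theorem in the papers listed.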