Showing 1–26 of 26 results for author: Kovalev, D

  1. arXiv:2405.18031

    math.OC cs.LG

    Lower Bounds and Optimal Algorithms for Non-Smooth Convex Decentralized Optimization over Time-Varying Networks

    Authors: Dmitry Kovalev, Ekaterina Borodich, Alexander Gasnikov, Dmitrii Feoktistov

    Abstract: We consider the task of minimizing the sum of convex functions stored in a decentralized manner across the nodes of a communication network. This problem is relatively well-studied in the scenario when the objective functions are smooth, or the links of the network are fixed in time, or both. In particular, lower bounds on the number of decentralized communications and (sub)gradient computations r…

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2212.14439

    math.OC cs.LG

    An Optimal Algorithm for Strongly Convex Min-min Optimization

    Authors: Alexander Gasnikov, Dmitry Kovalev, Grigory Malinovsky

    Abstract: In this paper we study the smooth strongly convex minimization problem $\min_{x}\min_y f(x,y)$. The existing optimal first-order methods require $\mathcal{O}(\sqrt{\max\{\kappa_x,\kappa_y\}} \log 1/\varepsilon)$ computations of both $\nabla_x f(x,y)$ and $\nabla_y f(x,y)$, where $\kappa_x$ and $\kappa_y$ are condition numbers with respect to variable blocks $x$ and $y$. We propose a new algorithm that only requires…

    Submitted 8 February, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: 12 pages, 2 figures, 1 algorithm

  3. Using Microbenchmark Suites to Detect Application Performance Changes

    Authors: Martin Grambow, Denis Kovalev, Christoph Laaber, Philipp Leitner, David Bermbach

    Abstract: Software performance changes are costly and often hard to detect pre-release. Similar to software testing frameworks, either application benchmarks or microbenchmarks can be integrated into quality assurance pipelines to detect performance changes before releasing a new application version. Unfortunately, extensive benchmarking studies usually take several hours, which is problematic when examining…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted for publication in IEEE Transactions on Cloud Computing

  4. arXiv:2208.13592

    math.OC cs.GT cs.LG stat.ML

    Smooth Monotone Stochastic Variational Inequalities and Saddle Point Problems: A Survey

    Authors: Aleksandr Beznosikov, Boris Polyak, Eduard Gorbunov, Dmitry Kovalev, Alexander Gasnikov

    Abstract: This paper is a survey of methods for solving smooth (strongly) monotone stochastic variational inequalities. To begin with, we give the deterministic foundation from which the stochastic methods eventually evolved. Then we review methods for the general stochastic formulation, and look at the finite sum setup. The last parts of the paper are devoted to various recent (not necessarily stochastic)…

    Submitted 2 April, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

    Comments: 12 pages

  5. arXiv:2207.03957

    cs.LG cs.DC math.OC

    Communication Acceleration of Local Gradient Methods via an Accelerated Primal-Dual Algorithm with Inexact Prox

    Authors: Abdurakhmon Sadiev, Dmitry Kovalev, Peter Richtárik

    Abstract: Inspired by a recent breakthrough of Mishchenko et al. (2022), who for the first time showed that local gradient steps can lead to provable communication acceleration, we propose an alternative algorithm which obtains the same communication acceleration as their method (ProxSkip). Our approach is very different, however: it is based on the celebrated method of Chambolle and Pock (2011), with severa…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: 53 pages, 3 algorithms, 3 tables, 9 theorems, 11 lemmas

  6. arXiv:2206.08303

    cs.LG math.OC

    On Scaled Methods for Saddle Point Problems

    Authors: Aleksandr Beznosikov, Aibek Alanov, Dmitry Kovalev, Martin Takáč, Alexander Gasnikov

    Abstract: Methods with adaptive scaling of different features play a key role in solving saddle point problems, primarily due to Adam's popularity for solving adversarial machine learning problems, including GAN training. This paper carries out a theoretical analysis of the following scaling techniques for solving SPPs: the well-known Adam and RmsProp scaling and the newer AdaHessian and OASIS based on Hut…

    Submitted 21 June, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: 54 pages, 2 algorithms with 4 options for each, 12 figures, 5 tables, 2 theorems

  7. arXiv:2205.15136

    math.OC cs.DC cs.LG

    Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity

    Authors: Dmitry Kovalev, Aleksandr Beznosikov, Ekaterina Borodich, Alexander Gasnikov, Gesualdo Scutari

    Abstract: We study structured convex optimization problems, with additive objective $r:=p + q$, where $r$ is ($\mu$-strongly) convex, $q$ is $L_q$-smooth and convex, and $p$ is $L_p$-smooth, possibly nonconvex. For such a class of problems, we propose an inexact accelerated gradient sliding method that can skip the gradient computation for one of these components while still achieving optimal complexity of g…

    Submitted 30 May, 2022; originally announced May 2022.

    Comments: 24 pages, 2 new algorithms, 12 theorems, 2 figures

  8. arXiv:2205.09647

    math.OC cs.LG

    The First Optimal Acceleration of High-Order Methods in Smooth Convex Optimization

    Authors: Dmitry Kovalev, Alexander Gasnikov

    Abstract: In this paper, we study the fundamental open question of finding the optimal high-order algorithm for solving smooth convex minimization problems. Arjevani et al. (2019) established the lower bound $\Omega\left(\varepsilon^{-2/(3p+1)}\right)$ on the number of the $p$-th order oracle calls required by an algorithm to find an $\varepsilon$-accurate solution to the problem, where the $p$-th order oracle stands for the comput…

    Submitted 19 May, 2022; originally announced May 2022.

  9. arXiv:2205.05653

    math.OC cs.LG

    The First Optimal Algorithm for Smooth and Strongly-Convex-Strongly-Concave Minimax Optimization

    Authors: Dmitry Kovalev, Alexander Gasnikov

    Abstract: In this paper, we revisit the smooth and strongly-convex-strongly-concave minimax optimization problem. Zhang et al. (2021) and Ibrahim et al. (2020) established the lower bound $\Omega\left(\sqrt{\kappa_x\kappa_y} \log \frac{1}{\varepsilon}\right)$ on the number of gradient evaluations required to find an $\varepsilon$-accurate solution, where $\kappa_x$ and $\kappa_y$ are condition numbers for the strong convexity and strong concavity assump…

    Submitted 11 May, 2022; originally announced May 2022.

  10. arXiv:2202.05583

    cs.LG

    Similarity learning for wells based on logging data

    Authors: Evgenia Romanenkova, Alina Rogulina, Anuar Shakirov, Nikolay Stulov, Alexey Zaytsev, Leyla Ismailova, Dmitry Kovalev, Klemens Katterbauer, Abdallah AlShehri

    Abstract: One of the first steps during the investigation of geological objects is the interwell correlation. It provides information on the structure of the objects under study, as it comprises the framework for constructing geological models and assessing hydrocarbon reserves. Today, the detailed interwell correlation relies on manual analysis of well-logging data. Thus, it is time-consuming and of a subj…

    Submitted 11 February, 2022; originally announced February 2022.

  11. arXiv:2202.02771

    math.OC cs.DC cs.LG

    Optimal Algorithms for Decentralized Stochastic Variational Inequalities

    Authors: Dmitry Kovalev, Aleksandr Beznosikov, Abdurakhmon Sadiev, Michael Persiianov, Peter Richtárik, Alexander Gasnikov

    Abstract: Variational inequalities are a formalism that includes games, minimization, saddle point, and equilibrium problems as special cases. Methods for variational inequalities are therefore universal approaches for many applied tasks, including machine learning problems. This work concentrates on the decentralized setting, which is increasingly important but not well understood. In particular, we consid…

    Submitted 2 April, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

    Comments: Appears in: Advances in Neural Information Processing Systems 35 (NeurIPS 2022). Minor modifications with respect to the NeurIPS version. 58 pages, 6 algorithms, 9 figures, 4 tables

    Journal ref: https://proceedings.neurips.cc/paper_files/paper/2022/hash/c959bb2cb164d37569a17fa67494d69a-Abstract-Conference.html

  12. arXiv:2112.15199

    math.OC cs.LG

    Accelerated Primal-Dual Gradient Method for Smooth and Convex-Concave Saddle-Point Problems with Bilinear Coupling

    Authors: Dmitry Kovalev, Alexander Gasnikov, Peter Richtárik

    Abstract: In this paper we study the convex-concave saddle-point problem $\min_x \max_y f(x) + y^T \mathbf{A} x - g(y)$, where $f(x)$ and $g(y)$ are smooth and convex functions. We propose an Accelerated Primal-Dual Gradient Method (APDG) for solving this problem, achieving (i) an optimal linear convergence rate in the strongly-convex-strongly-concave regime, matching the lower complexity bound (Zhang et al…

    Submitted 9 March, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

  13. arXiv:2106.04469

    math.OC cs.LG

    Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks

    Authors: Dmitry Kovalev, Elnur Gasanov, Peter Richtárik, Alexander Gasnikov

    Abstract: We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network whose links are allowed to change in time. We solve two fundamental problems for this task. First, we establish the first lower bounds on the number of decentralized communication rounds and the number of local computations required to find…

    Submitted 8 June, 2021; originally announced June 2021.

  14. arXiv:2102.09234

    math.OC cs.LG

    ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks

    Authors: Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Alexander Rogozin, Alexander Gasnikov

    Abstract: We propose ADOM, an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks. ADOM uses a dual oracle, i.e., we assume access to the gradient of the Fenchel conjugate of the individual loss functions. Up to a constant factor, which depends on the network structure only, its communication complexity is the same as that of accelerated Nesterov gradient…

    Submitted 18 February, 2021; originally announced February 2021.

  15. arXiv:2102.08374

    cs.LG math.OC stat.ML

    IntSGD: Adaptive Floatless Compression of Stochastic Gradients

    Authors: Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, Peter Richtárik

    Abstract: We propose a family of adaptive integer compression operators for distributed Stochastic Gradient Descent (SGD) that do not communicate a single float. This is achieved by multiplying floating-point vectors with a number known to every device and then rounding to integers. In contrast to the prior work on integer compression for SwitchML by Sapio et al. (2021), our IntSGD method is provably conver…

    Submitted 20 March, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: Spotlight at ICLR 2022. 27 pages, 6 figures, 3 algorithms

    Journal ref: International Conference on Learning Representations (2022)

  16. Decentralized Distributed Optimization for Saddle Point Problems

    Authors: Alexander Rogozin, Aleksandr Beznosikov, Darina Dvinskikh, Dmitry Kovalev, Pavel Dvurechensky, Alexander Gasnikov

    Abstract: We consider distributed convex-concave saddle point problems over arbitrary connected undirected networks and propose a decentralized distributed algorithm for their solution. The local functions distributed across the nodes are assumed to have global and local groups of variables. For the proposed algorithm we prove non-asymptotic convergence rate estimates with explicit dependence on the network…

    Submitted 9 April, 2024; v1 submitted 15 February, 2021; originally announced February 2021.

  17. arXiv:2011.01697

    math.OC cs.LG

    A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!

    Authors: Dmitry Kovalev, Anastasia Koloskova, Martin Jaggi, Peter Richtarik, Sebastian U. Stich

    Abstract: Decentralized optimization methods enable on-device training of machine learning models without a central coordinator. In many scenarios communication between devices is energy demanding and time consuming and forms the bottleneck of the entire system. We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators to the com…

    Submitted 3 November, 2020; originally announced November 2020.

  18. arXiv:2010.12292

    math.OC cs.LG

    Linearly Converging Error Compensated SGD

    Authors: Eduard Gorbunov, Dmitry Kovalev, Dmitry Makarenko, Peter Richtárik

    Abstract: In this paper, we propose a unified analysis of variants of distributed SGD with arbitrary compressions and delayed updates. Our framework is general enough to cover different variants of quantized SGD, Error-Compensated SGD (EC-SGD) and SGD with delayed updates (D-SGD). Via a single theorem, we derive the complexity results for all the methods that fit our framework. For the existing methods, thi…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020, 99 pages

  19. arXiv:2004.01442

    cs.LG math.OC stat.ML

    From Local SGD to Local Fixed-Point Methods for Federated Learning

    Authors: Grigory Malinovsky, Dmitry Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtárik

    Abstract: Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms. In this work we consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting. Our work is motivated by the needs of federated learning. In this context, each local operator models the computatio…

    Submitted 16 June, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: Accepted to ICML 2020

  20. arXiv:2002.11364

    math.OC cs.DC cs.LG

    Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization

    Authors: Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik

    Abstract: Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular. While in other contexts the best performing gradient-type methods invariably rely on some form of acceleration/momentum to reduce the number of iterations, there are no methods which combine the benefits of both gradient compr…

    Submitted 25 June, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

    Comments: ICML 2020

  21. arXiv:2002.04670

    math.OC cs.LG

    Variance Reduced Coordinate Descent with Acceleration: New Method With a Surprising Application to Finite-Sum Problems

    Authors: Filip Hanzely, Dmitry Kovalev, Peter Richtarik

    Abstract: We propose an accelerated version of stochastic variance reduced coordinate descent, ASVRCD. Like other variance reduced coordinate descent methods such as SEGA or SVRCD, our method can deal with problems that include a non-separable and non-smooth regularizer, while accessing only a random block of partial derivatives in each iteration. However, ASVRCD incorporates Nesterov's momentum, which offe…

    Submitted 11 February, 2020; originally announced February 2020.

    Comments: 30 pages, 8 figures

  22. arXiv:1912.09925

    cs.LG cs.DC math.NA math.OC

    Distributed Fixed Point Methods with Compressed Iterates

    Authors: Sélim Chraibi, Ahmed Khaled, Dmitry Kovalev, Peter Richtárik, Adil Salim, Martin Takáč

    Abstract: We propose basic and natural assumptions under which iterative optimization methods with compressed iterates can be analyzed. This problem is motivated by the practice of federated learning, where a large model stored in the cloud is compressed before it is sent to a mobile device, which then proceeds with training based on local data. We develop standard and variance reduced methods, and establis…

    Submitted 20 December, 2019; originally announced December 2019.

    Comments: 15 pages, 4 algorithms, 4 theorems

  23. arXiv:1912.01597

    cs.LG math.OC stat.ML

    Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates

    Authors: Dmitry Kovalev, Konstantin Mishchenko, Peter Richtárik

    Abstract: We present two new remarkably simple stochastic second-order methods for minimizing the average of a very large number of sufficiently smooth and strongly convex functions. The first is a stochastic variant of Newton's method (SN), and the second is a stochastic variant of cubically regularized Newton's method (SCN). We establish local linear-quadratic convergence results. Unlike existing stochast…

    Submitted 3 December, 2019; originally announced December 2019.

    Comments: 16 pages, 2 figures, 3 algorithms, 2 theorems, 7 lemmas; to be presented at the NeurIPS workshop "Beyond First Order Methods in ML"

  24. arXiv:1905.11768

    stat.ML cs.LG math.OC math.ST

    Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates

    Authors: Adil Salim, Dmitry Kovalev, Peter Richtárik

    Abstract: We propose a new algorithm, the Stochastic Proximal Langevin Algorithm (SPLA), for sampling from a log-concave distribution. Our method is a generalization of the Langevin algorithm to potentials expressed as the sum of one stochastic smooth term and multiple stochastic nonsmooth terms. In each iteration, our splitting technique only requires access to a stochastic gradient of the smooth term and a…

    Submitted 16 June, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

    Journal ref: NeurIPS 2019 (Spotlight)

  25. arXiv:1905.11373

    math.OC cs.LG

    Revisiting Stochastic Extragradient

    Authors: Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Yura Malitsky

    Abstract: We fix a fundamental issue in the stochastic extragradient method by providing a new sampling strategy that is motivated by approximating implicit updates. Since the existing stochastic extragradient algorithm of Juditsky et al. (2011), called Mirror-Prox, diverges on a simple bilinear problem when the domain is not bounded, we prove guarantees for solving variational inequalities that go beyond ex…

    Submitted 31 March, 2020; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Accepted to AISTATS 2020. 16 pages, 9 figures, 2 algorithms

    Journal ref: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4573-4582, 2020

  26. arXiv:1901.08689

    cs.LG math.OC stat.ML

    Don't Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop

    Authors: Dmitry Kovalev, Samuel Horvath, Peter Richtarik

    Abstract: The stochastic variance-reduced gradient method (SVRG) and its accelerated variant (Katyusha) have attracted enormous attention in the machine learning community in the last few years due to their superior theoretical properties and empirical behaviour on training supervised machine learning models via the empirical risk minimization paradigm. A key structural element in both of these methods is t…

    Submitted 5 June, 2019; v1 submitted 24 January, 2019; originally announced January 2019.

    Comments: 14 pages, 2 algorithms, 9 lemmas, 2 theorems, 4 figures