Showing 1–4 of 4 results for author: Sailanbayev, A

  1. arXiv:2104.09342  [pdf, other]

    cs.LG cs.AI math.OC

    Random Reshuffling with Variance Reduction: New Analysis and Better Rates

    Authors: Grigory Malinovsky, Alibek Sailanbayev, Peter Richtárik

    Abstract: Virtually all state-of-the-art methods for training supervised machine learning models are variants of SGD enhanced with a number of additional tricks, such as minibatching, momentum, and adaptive stepsizes. One of the tricks that works so well in practice that it is used as default in virtually all widely used machine learning software is random reshuffling (RR). However, the practical bene…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: 24 pages, 5 figures, 4 algorithms, 1 table

  2. arXiv:1906.01474  [pdf, other]

    math.OC

    MISO is Making a Comeback With Better Proofs and Rates

    Authors: Xun Qian, Alibek Sailanbayev, Konstantin Mishchenko, Peter Richtárik

    Abstract: MISO, also known as Finito, was one of the first stochastic variance reduced methods discovered, yet its popularity is fairly low. Its initial analysis was significantly limited by the so-called Big Data assumption. Although the assumption was lifted in subsequent work using negative momentum, this introduced a new parameter and required knowledge of strong convexity and smoothness constants, whic…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: 23 pages, 3 figures, 3 tables

  3. arXiv:1901.09401  [pdf, other]

    cs.LG math.OC stat.ML

    SGD: General Analysis and Improved Rates

    Authors: Robert Mansel Gower, Nicolas Loizou, Xun Qian, Alibek Sailanbayev, Egor Shulgin, Peter Richtárik

    Abstract: We propose a general yet simple theorem describing the convergence of SGD under the arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of variants of SGD, each of which is associated with a specific probability law governing the data selection rule used to form mini-batches. This is the first time such an analysis is performed, and most of our variants of SGD w…

    Submitted 1 May, 2019; v1 submitted 27 January, 2019; originally announced January 2019.

    Comments: 23 pages, 6 figures

    Journal ref: Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5200-5209, 2019

  4. arXiv:1806.05633  [pdf, other]

    math.OC

    Improving SAGA via a Probabilistic Interpolation with Gradient Descent

    Authors: Adel Bibi, Alibek Sailanbayev, Bernard Ghanem, Robert Mansel Gower, Peter Richtárik

    Abstract: We develop and analyze a new algorithm for empirical risk minimization, which is the key paradigm for training supervised machine learning models. Our method---SAGD---is based on a probabilistic interpolation of SAGA and gradient descent (GD). In particular, in each iteration we take a gradient step with probability $q$ and a SAGA step with probability $1-q$. We show that, surprisingly, the total…

    Submitted 2 April, 2020; v1 submitted 14 June, 2018; originally announced June 2018.