
Showing 1–33 of 33 results for author: Gürbüzbalaban, M

Searching in archive cs.
  1. arXiv:2403.02051

    stat.ML cs.CR cs.LG math.ST

    Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations

    Authors: Umut Şimşekli, Mert Gürbüzbalaban, Sinan Yıldırım, Lingjiong Zhu

    Abstract: Injecting heavy-tailed noise to the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years. While various theoretical properties of the resulting algorithm have been analyzed mainly from learning theory and optimization perspectives, their privacy preservation properties have not yet been established. Aiming to bridge this gap, we provide differenti…

    Submitted 4 March, 2024; originally announced March 2024.
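
    The abstract describes (S)GD whose iterates receive heavy-tailed perturbations. A minimal sketch of that update follows. It assumes symmetric $\alpha$-stable noise drawn with the Chambers-Mallows-Stuck method and a fixed noise scale `sigma`; both are assumptions, because the paper's exact noise model and scaling are not visible in the truncated abstract. The sketch also uses full gradients instead of stochastic ones and does no privacy accounting. The helper names `symmetric_alpha_stable` and `noisy_gd` are hypothetical.

```python
import math
import random


def symmetric_alpha_stable(alpha, rng):
    """One symmetric alpha-stable draw (unit scale, zero skewness) via the
    Chambers-Mallows-Stuck method; requires 0 < alpha <= 2."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)   # uniform angle
    w = rng.expovariate(1.0)                     # unit-rate exponential
    if alpha == 1.0:
        return math.tan(v)                       # Cauchy special case
    return (math.sin(alpha * v) / math.cos(v) ** (1 / alpha)
            * (math.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))


def noisy_gd(grad, x0, step, sigma, alpha, n_iters, seed=0):
    """Noisy GD: x <- x - step * grad(x) + sigma * xi, with xi i.i.d. alpha-stable.

    The fixed noise scale `sigma` is an assumption; the paper may scale
    the perturbation differently (e.g. together with the stepsize)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_iters):
        g = grad(x)
        x = [xj - step * gj + sigma * symmetric_alpha_stable(alpha, rng)
             for xj, gj in zip(x, g)]
    return x


# f(x) = 0.5 * ||x||^2, so grad f(x) = x; alpha = 1.5 gives infinite-variance noise
iterate = noisy_gd(lambda x: x, [1.0, -2.0], step=0.1, sigma=0.05, alpha=1.5, n_iters=500)
print(iterate)
```

    With `alpha = 2` the sampler reduces to a Gaussian with variance 2. Smaller `alpha` gives heavier tails and occasional very large jumps in the iterates.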

  2. arXiv:2307.07030

    math.OC cs.LG eess.SY

    Accelerated gradient methods for nonconvex optimization: Escape trajectories from strict saddle points and convergence to local minima

    Authors: Rishabh Dixit, Mert Gurbuzbalaban, Waheed U. Bajwa

    Abstract: This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions. Motivated by some recent works that have proposed effective algorithms, based on Polyak's heavy ball method and the Nesterov accelerated gradient method, to achieve convergence to a local minimum of nonconvex functions, this work proposes a broad class of…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: 107 pages, 10 figures; pre-print of a journal submission

  3. arXiv:2305.12056

    stat.ML cs.LG math.OC

    Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent

    Authors: Lingjiong Zhu, Mert Gurbuzbalaban, Anant Raj, Umut Simsekli

    Abstract: Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied on different classes of loss functions. While these bounds have illuminated various properties of optimization algorithms, the analysis of each case typically requir…

    Submitted 28 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: 49 pages, NeurIPS 2023

  4. arXiv:2302.05516

    stat.ML cs.LG math.OC

    Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize

    Authors: Mert Gürbüzbalaban, Yuanhan Hu, Umut Şimşekli, Lingjiong Zhu

    Abstract: Cyclic and randomized stepsizes are widely used in the deep learning practice and can often outperform standard stepsize choices such as constant stepsize in SGD. Despite their empirical success, not much is currently known about when and why they can theoretically improve the generalization performance. We consider a general class of Markovian stepsizes for learning, which contain i.i.d. random s…

    Submitted 29 August, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

    Comments: To Appear

    Journal ref: Transactions of Machine Learning Research, 2023

  5. arXiv:2301.11885

    stat.ML cs.LG

    Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions

    Authors: Anant Raj, Lingjiong Zhu, Mert Gürbüzbalaban, Umut Şimşekli

    Abstract: Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several empirical studies. Experimental evidence in previous works suggests a strong interplay between the heaviness of the tails and generalization behavior of SGD. To address this empirical phenomena theoretically, several works have made strong topological and statistical assumptions to link the generalization error…

    Submitted 30 January, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: The first two authors contributed equally to this work

  6. arXiv:2212.00570

    stat.ML cs.LG stat.CO

    Penalized Overdamped and Underdamped Langevin Monte Carlo Algorithms for Constrained Sampling

    Authors: Mert Gürbüzbalaban, Yuanhan Hu, Lingjiong Zhu

    Abstract: We consider the constrained sampling problem where the goal is to sample from a target distribution $\pi(x)\propto e^{-f(x)}$ when $x$ is constrained to lie on a convex body $\mathcal{C}$. Motivated by penalty methods from continuous optimization, we propose penalized Langevin Dynamics (PLD) and penalized underdamped Langevin Monte Carlo (PULMC) methods that convert the constrained sampling problem…

    Submitted 14 April, 2024; v1 submitted 29 November, 2022; originally announced December 2022.
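
    The penalty idea in this abstract replaces the hard constraint $x \in \mathcal{C}$ by an extra potential term that grows outside $\mathcal{C}$, after which an ordinary unconstrained Langevin sampler can be run. A one-dimensional sketch of overdamped Langevin with such a penalty follows. It assumes $\mathcal{C} = [lo, hi]$ and the penalty $S(x) = \tfrac{1}{2}\operatorname{dist}(x,\mathcal{C})^2$ weighted by $1/\delta$. The truncated abstract does not state which penalty PLD actually uses, and `penalized_langevin_1d` is a hypothetical name.

```python
import math
import random


def penalized_langevin_1d(grad_f, lo, hi, delta, step, n_iters, x0=0.5, seed=0):
    """Penalized overdamped Langevin for pi(x) ~ exp(-f(x)) restricted to [lo, hi].

    Hypothetical penalty: S(x) = 0.5 * dist(x, [lo, hi])**2 scaled by 1/delta,
    whose gradient is (x - proj(x)) / delta. Smaller delta enforces the
    constraint more tightly but requires a smaller step for stability."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_iters):
        proj = min(max(x, lo), hi)                   # projection onto [lo, hi]
        drift = grad_f(x) + (x - proj) / delta       # gradient of f + S / delta
        x = x - step * drift + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


# Standard Gaussian target, f(x) = x**2 / 2, constrained to the interval [0, 1]
samples = penalized_langevin_1d(lambda x: x, 0.0, 1.0, delta=1e-3, step=1e-4, n_iters=200_000)
# For comparison, the mean of the exactly truncated Gaussian is about 0.46
print(sum(samples) / len(samples))
```

    Each iteration is the Euler-Maruyama step $x_{k+1} = x_k - \eta\,(\nabla f(x_k) + \nabla S(x_k)/\delta) + \sqrt{2\eta}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0,1)$. With these settings $\eta(1 + 1/\delta) \approx 0.1$, which keeps the stiff penalty term numerically stable. Samples can sit slightly outside $[0, 1]$, since the penalty only discourages leaving the set.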

  7. arXiv:2206.01274

    stat.ML cs.LG

    Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares

    Authors: Anant Raj, Melih Barsbey, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli

    Abstract: Recent studies have shown that heavy tails can emerge in stochastic optimization and that the heaviness of the tails have links to the generalization error. While these studies have shed light on interesting aspects of the generalization behavior in modern settings, they relied on strong topological and statistical regularity assumptions, which are hard to verify in practice. Furthermore, it has b…

    Submitted 13 February, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: 50 pages

  8. arXiv:2205.06689

    stat.ML cs.LG math.OC

    Heavy-Tail Phenomenon in Decentralized SGD

    Authors: Mert Gurbuzbalaban, Yuanhan Hu, Umut Simsekli, Kun Yuan, Lingjiong Zhu

    Abstract: Recent theoretical studies have shown that heavy-tails can emerge in stochastic optimization due to 'multiplicative noise', even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have uncovered several interesting phenomena, they consider conventional stochastic optimization problems, which exclude decentralized settings that naturally arise in m…

    Submitted 16 May, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

  9. arXiv:2202.09688

    math.OC cs.LG

    A Variance-Reduced Stochastic Accelerated Primal Dual Algorithm

    Authors: Bugra Can, Mert Gurbuzbalaban, Necdet Serhat Aybat

    Abstract: In this work, we consider strongly convex strongly concave (SCSC) saddle point (SP) problems $\min_{x\in\mathbb{R}^{d_x}}\max_{y\in\mathbb{R}^{d_y}}f(x,y)$ where $f$ is $L$-smooth, $f(\cdot,y)$ is $\mu$-strongly convex for every $y$, and $f(x,\cdot)$ is $\mu$-strongly concave for every $x$. Such problems arise frequently in machine learning in the context of robust empirical risk minimization (ERM), e.g.…

    Submitted 19 February, 2022; originally announced February 2022.
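
    To make the SCSC setting concrete, the sketch below runs plain simultaneous gradient descent-ascent (GDA) on a hypothetical scalar example $f(x,y) = \tfrac{\mu}{2}x^2 + bxy - \tfrac{\mu}{2}y^2$. This function is $\mu$-strongly convex in $x$, $\mu$-strongly concave in $y$, and has its saddle point at the origin. GDA is a simple baseline chosen for illustration, not the variance-reduced stochastic accelerated primal-dual algorithm proposed in the paper.

```python
def gda(mu, b, step, x0, y0, n_iters):
    """Simultaneous gradient descent-ascent on the toy SCSC problem
    f(x, y) = mu/2 * x**2 + b * x * y - mu/2 * y**2 (saddle point at (0, 0))."""
    x, y = x0, y0
    for _ in range(n_iters):
        gx = mu * x + b * y                  # partial derivative in x
        gy = b * x - mu * y                  # partial derivative in y
        x, y = x - step * gx, y + step * gy  # descend in x, ascend in y
    return x, y


print(gda(mu=1.0, b=2.0, step=0.1, x0=1.0, y0=-1.0, n_iters=500))
```

    For this quadratic the GDA iteration matrix has eigenvalues $(1-\eta\mu) \pm i\eta b$, so the iterates contract exactly when $(1-\eta\mu)^2 + \eta^2 b^2 < 1$, that is, $\eta < 2\mu/(\mu^2 + b^2)$ (0.4 for the values used here). Larger steps make the iterates spiral outward.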

  10. arXiv:2108.09365

    math.OC cs.DC

    L-DQN: An Asynchronous Limited-Memory Distributed Quasi-Newton Method

    Authors: Bugra Can, Saeed Soori, Maryam Mehri Dehnavi, Mert Gürbüzbalaban

    Abstract: This work proposes a distributed algorithm for solving empirical risk minimization problems, called L-DQN, under the master/worker communication model. L-DQN is a distributed limited-memory quasi-Newton method that supports asynchronous computations among the worker nodes. Our method is efficient both in terms of storage and communication costs, i.e., in every iteration the master node and workers…

    Submitted 4 September, 2021; v1 submitted 20 August, 2021; originally announced August 2021.

    MSC Class: 68W15 (Primary)

  11. arXiv:2106.04881

    stat.ML cs.LG

    Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms

    Authors: Alexander Camuto, George Deligiannidis, Murat A. Erdogdu, Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu

    Abstract: Understanding generalization in deep learning has been one of the major challenges in statistical learning theory over the last decade. While recent work has illustrated that the dataset and the training algorithm must be taken into account in order to obtain meaningful generalization bounds, it is still theoretically not clear which properties of the data and the algorithm determine the generaliz…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: 34 pages including Supplement, 4 Figures

  12. arXiv:2106.03947

    cs.LG

    TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion

    Authors: Saeed Soori, Bugra Can, Baourun Mu, Mert Gürbüzbalaban, Maryam Mehri Dehnavi

    Abstract: This work proposes a time-efficient Natural Gradient Descent method, called TENGraD, with linear convergence guarantees. Computing the inverse of the neural network's Fisher information matrix is expensive in NGD because the Fisher matrix is large. Approximate NGD methods such as KFAC attempt to improve NGD's running time and practical application by reducing the Fisher matrix inversion cost with…

    Submitted 3 March, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

  13. arXiv:2102.07006

    stat.ML cs.LG

    Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections

    Authors: Alexander Camuto, Xiaoyu Wang, Lingjiong Zhu, Chris Holmes, Mert Gürbüzbalaban, Umut Şimşekli

    Abstract: Gaussian noise injections (GNIs) are a family of simple and widely-used regularisation methods for training neural networks, where one injects additive or multiplicative Gaussian noise to the network activations at every iteration of the optimisation algorithm, which is typically chosen as stochastic gradient descent (SGD). In this paper we focus on the so-called 'implicit effect' of GNIs, which i…

    Submitted 10 June, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: Main paper of 12 pages, followed by appendix

  14. arXiv:2101.02625

    math.OC cs.LG eess.SY math.DS

    Boundary Conditions for Linear Exit Time Gradient Trajectories Around Saddle Points: Analysis and Algorithm

    Authors: Rishabh Dixit, Mert Gurbuzbalaban, Waheed U. Bajwa

    Abstract: Gradient-related first-order methods have become the workhorse of large-scale numerical optimization problems. Many of these problems involve nonconvex objective functions with multiple saddle points, which necessitates an understanding of the behavior of discrete trajectories of first-order methods within the geometrical landscape of these functions. This paper concerns convergence of first-order…

    Submitted 9 March, 2022; v1 submitted 7 January, 2021; originally announced January 2021.

    Comments: 69 pages; 10 figures; extensive revision of the earlier version, including fewer assumptions, more comparisons with prior art, and new theoretical results

  15. arXiv:2008.01989

    cs.LG cs.CR math.OC stat.ML

    Differentially Private Accelerated Optimization Algorithms

    Authors: Nurdan Kuru, Ş. İlker Birbil, Mert Gurbuzbalaban, Sinan Yildirim

    Abstract: We present two classes of differentially private optimization algorithms derived from the well-known accelerated first-order methods. The first algorithm is inspired by Polyak's heavy ball method and employs a smoothing approach to decrease the accumulated noise on the gradient steps required for differential privacy. The second class of algorithms are based on Nesterov's accelerated gradient meth…

    Submitted 5 August, 2020; originally announced August 2020.

    Comments: 28 pages, 4 figures

    MSC Class: 68P27; 90C30; 90C25

    Journal ref: SIAM Journal on Optimization 2022 32:2, 795-821

  16. arXiv:2007.00590

    stat.ML cs.LG math.OC

    Decentralized Stochastic Gradient Langevin Dynamics and Hamiltonian Monte Carlo

    Authors: Mert Gürbüzbalaban, Xuefeng Gao, Yuanhan Hu, Lingjiong Zhu

    Abstract: Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov Chain Monte Carlo (MCMC) algorithms for Bayesian inference that can scale to large datasets, allowing to sample from the posterior distribution of the parameters of a statistical model given the input data and the prior distribution over the model parameters. However, these a…

    Submitted 26 August, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

    MSC Class: Primary: 68W15; 62F15; 65C05; 62D05; 62L20; secondary: 60J20; 90C15

  17. arXiv:2006.06733

    math.OC cs.LG

    IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method

    Authors: Yossi Arjevani, Joan Bruna, Bugra Can, Mert Gürbüzbalaban, Stefanie Jegelka, Hongzhou Lin

    Abstract: We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex. Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method, thereby providing a systematic way for deriving several well-known decentralized algorithms including EXTRA arXiv:140…

    Submitted 11 June, 2020; originally announced June 2020.

  18. arXiv:2006.04873

    math.OC cs.LG math.ST

    A Stochastic Subgradient Method for Distributionally Robust Non-Convex Learning

    Authors: Mert Gürbüzbalaban, Andrzej Ruszczyński, Landi Zhu

    Abstract: We consider a distributionally robust formulation of stochastic optimization problems arising in statistical learning, where robustness is with respect to uncertainty in the underlying data distribution. Our formulation builds on risk-averse optimization techniques and the theory of coherent risk measures. It uses semi-deviation risk for quantifying uncertainty, allowing us to compute solutions th…

    Submitted 7 June, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

  19. arXiv:2006.04740

    math.OC cs.LG math.ST

    The Heavy-Tail Phenomenon in SGD

    Authors: Mert Gurbuzbalaban, Umut Şimşekli, Lingjiong Zhu

    Abstract: In recent years, various notions of capacity and complexity have been proposed for characterizing the generalization properties of stochastic gradient descent (SGD) in deep learning. Some of the popular notions that correlate well with the performance on unseen data are (i) the 'flatness' of the local minimum found by SGD, which is related to the eigenvalues of the Hessian, (ii) the ratio of the s…

    Submitted 14 June, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Journal ref: Published as a conference paper at International Conference on Machine Learning (ICML) 2021

  20. arXiv:2006.01106

    math.OC cs.LG eess.SY

    Exit Time Analysis for Approximations of Gradient Descent Trajectories Around Saddle Points

    Authors: Rishabh Dixit, Mert Gurbuzbalaban, Waheed U. Bajwa

    Abstract: This paper considers the problem of understanding the exit time for trajectories of gradient-related first-order methods from saddle neighborhoods under some initial boundary conditions. Given the 'flat' geometry around saddle points, first-order methods can struggle to escape these regions in a fast manner due to the small magnitudes of gradients encountered. In particular, while it is known that…

    Submitted 6 October, 2023; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: 70 pages; pre-print of the journal paper published in Information and Inference: A Journal of the IMA, 2023

    MSC Class: 90C26; 15Axx; 41A58; 65Hxx

    Journal ref: Information and Inference: A Journal of the IMA, vol. 12, no. 2, pp. 714-786, Jun. 2023
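
    The slow escape from flat saddle regions described above can be seen on the hypothetical toy saddle $f(x,y) = \tfrac{1}{2}(x^2 - y^2)$. Gradient descent with step $\eta$ maps $(x, y) \mapsto ((1-\eta)x, (1+\eta)y)$. Starting at $(0, y_0)$, the iterate therefore leaves the ball of radius $r$ after the smallest $k$ with $(1+\eta)^k |y_0| > r$, roughly $\log(r/|y_0|)/\log(1+\eta)$ steps. The sketch below, with the illustrative helper `gd_exit_time`, simply counts those steps; it does not reproduce the paper's boundary-condition analysis.

```python
import math


def gd_exit_time(x0, y0, step, radius, max_iters=10_000):
    """Number of GD steps needed to leave the ball of the given radius around
    the strict saddle of f(x, y) = (x**2 - y**2) / 2, or None if the iterate
    is still inside after max_iters steps."""
    x, y = x0, y0
    for k in range(1, max_iters + 1):
        # grad f(x, y) = (x, -y): x contracts, y expands along the unstable direction
        x, y = x - step * x, y + step * y
        if math.hypot(x, y) > radius:
            return k
    return None


print(gd_exit_time(0.0, 1e-3, step=0.1, radius=1.0))  # 73
```

    Shrinking the initial offset from $10^{-3}$ to $10^{-6}$ roughly doubles the exit time (145 steps), reflecting the logarithmic dependence on $|y_0|$. An iterate started exactly on the stable direction ($y_0 = 0$) never leaves.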

  21. arXiv:2005.11878

    cs.LG math.OC math.ST stat.ML

    Fractional moment-preserving initialization schemes for training deep neural networks

    Authors: Mert Gurbuzbalaban, Yuanhan Hu

    Abstract: A traditional approach to initialization in deep neural networks (DNNs) is to sample the network weights randomly for preserving the variance of pre-activations. On the other hand, several studies show that during the training process, the distribution of stochastic gradients can be heavy-tailed especially for small batch sizes. In this case, weights and therefore pre-activations can be modeled wi…

    Submitted 13 February, 2021; v1 submitted 24 May, 2020; originally announced May 2020.

  22. arXiv:2002.05685

    stat.ML cs.LG

    Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise

    Authors: Umut Şimşekli, Lingjiong Zhu, Yee Whye Teh, Mert Gürbüzbalaban

    Abstract: Stochastic gradient descent with momentum (SGDm) is one of the most popular optimization algorithms in deep learning. While there is a rich theory of SGDm for convex problems, the theory is considerably less developed in the context of deep learning where the problem is non-convex and the gradient noise might exhibit a heavy-tailed behavior, as empirically observed in recent studies. In this study…

    Submitted 4 November, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: 20 pages, Published at International Conference on Machine Learning 2020

  23. arXiv:1912.00018

    stat.ML cs.LG math.CA

    On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks

    Authors: Umut Şimşekli, Mert Gürbüzbalaban, Thanh Huy Nguyen, Gaël Richard, Levent Sagun

    Abstract: The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Ga…

    Submitted 29 November, 2019; originally announced December 2019.

    Comments: 32 pages. arXiv admin note: substantial text overlap with arXiv:1901.06053

  24. arXiv:1910.08701

    math.OC cs.LG stat.ML

    Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks

    Authors: Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar, Umut Simsekli, Lingjiong Zhu

    Abstract: We study distributed stochastic gradient (D-SG) method and its accelerated variant (D-ASG) for solving decentralized strongly convex stochastic optimization problems where the objective function is distributed over several computational units, lying on a fixed but arbitrary connected communication graph, subject to local communication constraints where noisy estimates of the gradients are availabl…

    Submitted 4 October, 2021; v1 submitted 19 October, 2019; originally announced October 2019.

  25. arXiv:1906.09069

    stat.ML cs.LG

    First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

    Authors: Thanh Huy Nguyen, Umut Şimşekli, Mert Gürbüzbalaban, Gaël Richard

    Abstract: Stochastic gradient descent (SGD) has been widely used in machine learning due to its computational efficiency and favorable generalization properties. Recently, it has been empirically demonstrated that the gradient noise in several deep learning settings admits a non-Gaussian, heavy-tailed behavior. This suggests that the gradient noise can be modeled by using $\alpha$-stable distributions, a family…

    Submitted 21 June, 2019; originally announced June 2019.

  26. arXiv:1901.08022

    math.OC cs.LG stat.ML

    A Universally Optimal Multistage Accelerated Stochastic Gradient Method

    Authors: Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar

    Abstract: We study the problem of minimizing a strongly convex, smooth function when we have noisy estimates of its gradient. We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal rate both in the deterministic and stochastic case and operates without knowledge of noise characteristics. The algorithm consists of stages that use a stochastic…

    Submitted 27 October, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

  27. arXiv:1901.07445

    stat.ML cs.LG math.OC

    Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances

    Authors: Bugra Can, Mert Gurbuzbalaban, Lingjiong Zhu

    Abstract: Momentum methods such as Polyak's heavy ball (HB) method, Nesterov's accelerated gradient (AG) as well as accelerated projected gradient (APG) method have been commonly used in machine learning practice, but their performance is quite sensitive to noise in the gradients. We study these methods under a first-order stochastic oracle model where noisy estimates of the gradients are available. For str…

    Submitted 16 May, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

    Comments: 72 pages

    Journal ref: International Conference on Machine Learning 2019, 891-901

  28. arXiv:1901.06053

    cs.LG stat.ML

    A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks

    Authors: Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban

    Abstract: The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussiani…

    Submitted 17 January, 2019; originally announced January 2019.
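
    The tail index $\alpha$ in this line of work measures how heavy the tails are: for an $\alpha$-stable law with $\alpha < 2$, $P(|X| > x)$ decays like $x^{-\alpha}$, and smaller $\alpha$ means heavier tails. To illustrate estimating $\alpha$ from samples, the sketch below applies the classical Hill estimator to the $k$ largest absolute values. This is a swapped-in, widely used estimator and not necessarily the one used in the paper (the truncated abstract does not say). The names `hill_tail_index`, `data`, and the Pareto test distribution are hypothetical.

```python
import math
import random


def hill_tail_index(samples, k):
    """Hill estimate of the tail index from the k largest absolute values."""
    xs = sorted((abs(s) for s in samples), reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("need 0 < k < len(samples)")
    threshold = xs[k]                       # (k+1)-th largest absolute value
    return k / sum(math.log(x / threshold) for x in xs[:k])


# Pareto samples with known tail index 1.5: P(X > x) = x**-1.5 for x >= 1
rng = random.Random(0)
data = [rng.paretovariate(1.5) for _ in range(20_000)]
print(hill_tail_index(data, k=1_000))       # should land close to 1.5
```

    The choice of `k` trades bias against variance: too few order statistics make the estimate noisy, while too many pull in the body of the distribution, where the power-law approximation no longer holds.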

  29. arXiv:1812.07725

    math.OC cs.LG math.NA math.PR stat.ML

    Breaking Reversibility Accelerates Langevin Dynamics for Global Non-Convex Optimization

    Authors: Xuefeng Gao, Mert Gurbuzbalaban, Lingjiong Zhu

    Abstract: Langevin dynamics (LD) has been proven to be a powerful technique for optimizing a non-convex objective as an efficient algorithm to find local minima while eventually visiting a global minimum on longer time-scales. LD is based on the first-order Langevin diffusion which is reversible in time. We study two variants that are based on non-reversible Langevin diffusions: the underdamped Langevin dyn…

    Submitted 2 October, 2020; v1 submitted 18 December, 2018; originally announced December 2018.

    MSC Class: 65K05; 90C26; 90C30; 82C31; 65C30

  30. arXiv:1809.04618

    math.OC cs.LG

    Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Stochastic Optimization: Non-Asymptotic Performance Bounds and Momentum-Based Acceleration

    Authors: Xuefeng Gao, Mert Gürbüzbalaban, Lingjiong Zhu

    Abstract: Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is a variant of stochastic gradient with momentum where a controlled and properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates towards a global minimum. Many works reported its empirical success in practice for solving stochastic non-convex optimization problems, in particular it has been observed to outperform…

    Submitted 17 November, 2020; v1 submitted 12 September, 2018; originally announced September 2018.

  31. arXiv:1805.10579

    math.OC cs.LG stat.ML

    Robust Accelerated Gradient Methods for Smooth Strongly Convex Functions

    Authors: Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar

    Abstract: We study the trade-offs between convergence rate and robustness to gradient errors in designing a first-order algorithm. We focus on gradient descent (GD) and accelerated gradient (AG) methods for minimizing strongly convex functions when the gradient has random errors in the form of additive white noise. With gradient errors, the function values of the iterates need not converge to the optimal va…

    Submitted 5 November, 2019; v1 submitted 27 May, 2018; originally announced May 2018.

    Comments: To appear in SIAM Journal on Optimization (SIOPT)
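
    The setting in this abstract, accelerated gradient (AG) on a strongly convex function with gradients corrupted by additive white noise, can be simulated in a few lines. The sketch below uses Nesterov's method with the textbook constants step $= 1/L$ and momentum $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$, where $\kappa = L/\mu$. These are standard choices for illustration, not the robustness-tuned parameters studied in the paper. The Gaussian noise model, the test function, and the helper `noisy_nesterov` are assumptions.

```python
import math
import random


def noisy_nesterov(grad, x0, L, mu, noise_std, n_iters, seed=0):
    """Nesterov's accelerated gradient for an L-smooth, mu-strongly convex f,
    where every gradient is corrupted by additive Gaussian white noise."""
    rng = random.Random(seed)
    step = 1.0 / L
    kappa = L / mu
    momentum = (math.sqrt(kappa) - 1) / (math.sqrt(kappa) + 1)
    x_prev = list(x0)
    y = list(x0)
    for _ in range(n_iters):
        g = [gi + noise_std * rng.gauss(0.0, 1.0) for gi in grad(y)]   # noisy gradient
        x = [yi - step * gi for yi, gi in zip(y, g)]                   # gradient step at y
        y = [xi + momentum * (xi - xp) for xi, xp in zip(x, x_prev)]   # extrapolation
        x_prev = x
    return x_prev


def grad(v):
    # f(v) = 0.5 * (v[0]**2 + 10 * v[1]**2), so L = 10 and mu = 1
    return [v[0], 10.0 * v[1]]


print(noisy_nesterov(grad, [1.0, 1.0], L=10.0, mu=1.0, noise_std=0.0, n_iters=300))
print(noisy_nesterov(grad, [1.0, 1.0], L=10.0, mu=1.0, noise_std=0.1, n_iters=300))
```

    With `noise_std=0.0` the iterates converge linearly to the minimizer at the origin. With noise they only settle into a neighborhood of it, whose size depends on the step and momentum, which is why these parameters matter when gradients carry errors.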

  32. arXiv:1710.08883

    cs.DC cs.LG math.NA math.OC

    Avoiding Communication in Proximal Methods for Convex Optimization Problems

    Authors: Saeed Soori, Aditya Devarakonda, James Demmel, Mert Gurbuzbalaban, Maryam Mehri Dehnavi

    Abstract: The fast iterative soft thresholding algorithm (FISTA) is used to solve convex regularized optimization problems in machine learning. Distributed implementations of the algorithm have become popular since they enable the analysis of large datasets. However, existing formulations of FISTA communicate data at every iteration which reduces its performance on modern distributed architectures. The comm…

    Submitted 24 October, 2017; originally announced October 2017.
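
    For reference, a plain serial FISTA loop for the lasso problem $\min_x \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda\|x\|_1$ is sketched below. Each iteration takes a gradient step on the smooth term, applies soft-thresholding (the proximal map of the $\ell_1$ term), and then extrapolates with momentum. The lasso objective and the helper names `soft_threshold` and `fista_lasso` are assumptions for illustration. The paper's communication-avoiding reformulation for distributed machines is not reproduced here.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def fista_lasso(A, b, lam, L, n_iters):
    """Serial FISTA for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.

    A is a list of rows; L must upper-bound the largest eigenvalue of A^T A."""
    m, n = len(A), len(A[0])
    x_prev = [0.0] * n
    y = [0.0] * n
    t = 1.0
    for _ in range(n_iters):
        # gradient of the smooth part at y: A^T (A y - b)
        r = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # proximal gradient step from the extrapolated point y
        x = soft_threshold([y[j] - grad[j] / L for j in range(n)], lam / L)
        # momentum update with the standard FISTA sequence t_k
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [x[j] + (t - 1.0) / t_next * (x[j] - x_prev[j]) for j in range(n)]
        x_prev, t = x, t_next
    return x_prev


print(fista_lasso([[2.0, 0.0], [0.0, 1.0]], [4.0, 1.0], lam=1.0, L=4.0, n_iters=100))  # [1.75, 0.0]
```

    `L` must be at least the largest eigenvalue of $A^\top A$ (4 for the diagonal example in the code). A smaller value can make the iteration diverge, while a much larger one only slows it down.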

  33. arXiv:1611.00347

    math.OC cs.LG

    Surpassing Gradient Descent Provably: A Cyclic Incremental Method with Linear Convergence Rate

    Authors: Aryan Mokhtari, Mert Gürbüzbalaban, Alejandro Ribeiro

    Abstract: Recently, there has been growing interest in developing optimization methods for solving large-scale machine learning problems. Most of these problems boil down to the problem of minimizing an average of a finite set of smooth and strongly convex functions where the number of functions $n$ is large. Gradient descent method (GD) is successful in minimizing convex problems at a fast linear rate; how…

    Submitted 7 February, 2018; v1 submitted 1 November, 2016; originally announced November 2016.