
Showing 1–23 of 23 results for author: Gürbüzbalaban, M

Searching in archive stat.
  1. arXiv:2403.02051

    stat.ML cs.CR cs.LG math.ST

    Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations

    Authors: Umut Şimşekli, Mert Gürbüzbalaban, Sinan Yıldırım, Lingjiong Zhu

    Abstract: Injecting heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years. While various theoretical properties of the resulting algorithm have been analyzed mainly from learning theory and optimization perspectives, their privacy preservation properties have not yet been established. Aiming to bridge this gap, we provide differenti…

    Submitted 4 March, 2024; originally announced March 2024.

  2. arXiv:2305.12056

    stat.ML cs.LG math.OC

    Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent

    Authors: Lingjiong Zhu, Mert Gurbuzbalaban, Anant Raj, Umut Simsekli

    Abstract: Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied on different classes of loss functions. While these bounds have illuminated various properties of optimization algorithms, the analysis of each case typically requir…

    Submitted 28 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: 49 pages, NeurIPS 2023

  3. arXiv:2302.05516

    stat.ML cs.LG math.OC

    Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize

    Authors: Mert Gürbüzbalaban, Yuanhan Hu, Umut Şimşekli, Lingjiong Zhu

    Abstract: Cyclic and randomized stepsizes are widely used in deep learning practice and can often outperform standard stepsize choices such as constant stepsize in SGD. Despite their empirical success, not much is currently known about when and why they can theoretically improve the generalization performance. We consider a general class of Markovian stepsizes for learning, which contain i.i.d. random s…

    Submitted 29 August, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

    Comments: To Appear

    Journal ref: Transactions of Machine Learning Research, 2023

  4. arXiv:2301.11885

    stat.ML cs.LG

    Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions

    Authors: Anant Raj, Lingjiong Zhu, Mert Gürbüzbalaban, Umut Şimşekli

    Abstract: Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several empirical studies. Experimental evidence in previous works suggests a strong interplay between the heaviness of the tails and generalization behavior of SGD. To address these empirical phenomena theoretically, several works have made strong topological and statistical assumptions to link the generalization error…

    Submitted 30 January, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: The first two authors contributed equally to this work

  5. arXiv:2212.00570

    stat.ML cs.LG stat.CO

    Penalized Overdamped and Underdamped Langevin Monte Carlo Algorithms for Constrained Sampling

    Authors: Mert Gürbüzbalaban, Yuanhan Hu, Lingjiong Zhu

    Abstract: We consider the constrained sampling problem where the goal is to sample from a target distribution $\pi(x)\propto e^{-f(x)}$ when $x$ is constrained to lie on a convex body $\mathcal{C}$. Motivated by penalty methods from continuous optimization, we propose penalized Langevin Dynamics (PLD) and penalized underdamped Langevin Monte Carlo (PULMC) methods that convert the constrained sampling problem…

    Submitted 14 April, 2024; v1 submitted 29 November, 2022; originally announced December 2022.

  6. arXiv:2206.01274

    stat.ML cs.LG

    Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares

    Authors: Anant Raj, Melih Barsbey, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli

    Abstract: Recent studies have shown that heavy tails can emerge in stochastic optimization and that the heaviness of the tails has links to the generalization error. While these studies have shed light on interesting aspects of the generalization behavior in modern settings, they relied on strong topological and statistical regularity assumptions, which are hard to verify in practice. Furthermore, it has b…

    Submitted 13 February, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: 50 pages

  7. arXiv:2205.06689

    stat.ML cs.LG math.OC

    Heavy-Tail Phenomenon in Decentralized SGD

    Authors: Mert Gurbuzbalaban, Yuanhan Hu, Umut Simsekli, Kun Yuan, Lingjiong Zhu

    Abstract: Recent theoretical studies have shown that heavy tails can emerge in stochastic optimization due to 'multiplicative noise', even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have uncovered several interesting phenomena, they consider conventional stochastic optimization problems, which exclude decentralized settings that naturally arise in m…

    Submitted 16 May, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

  8. arXiv:2106.04881

    stat.ML cs.LG

    Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms

    Authors: Alexander Camuto, George Deligiannidis, Murat A. Erdogdu, Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu

    Abstract: Understanding generalization in deep learning has been one of the major challenges in statistical learning theory over the last decade. While recent work has illustrated that the dataset and the training algorithm must be taken into account in order to obtain meaningful generalization bounds, it is still theoretically not clear which properties of the data and the algorithm determine the generaliz…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: 34 pages including Supplement, 4 Figures

  9. arXiv:2102.10346

    math.OC stat.ML

    Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

    Authors: Hongjian Wang, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli, Murat A. Erdogdu

    Abstract: Recent studies have provided both empirical and theoretical evidence illustrating that heavy tails can emerge in stochastic gradient descent (SGD) in various scenarios. Such heavy tails potentially result in iterates with diverging variance, which hinders the use of conventional convergence analysis techniques that rely on the existence of the second-order moments. In this paper, we provide conver…

    Submitted 20 February, 2021; originally announced February 2021.

  10. arXiv:2102.07006

    stat.ML cs.LG

    Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections

    Authors: Alexander Camuto, Xiaoyu Wang, Lingjiong Zhu, Chris Holmes, Mert Gürbüzbalaban, Umut Şimşekli

    Abstract: Gaussian noise injections (GNIs) are a family of simple and widely-used regularisation methods for training neural networks, where one injects additive or multiplicative Gaussian noise into the network activations at every iteration of the optimisation algorithm, which is typically chosen as stochastic gradient descent (SGD). In this paper we focus on the so-called 'implicit effect' of GNIs, which i…

    Submitted 10 June, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: Main paper of 12 pages, followed by appendix

  11. arXiv:2008.01989

    cs.LG cs.CR math.OC stat.ML

    Differentially Private Accelerated Optimization Algorithms

    Authors: Nurdan Kuru, Ş. İlker Birbil, Mert Gurbuzbalaban, Sinan Yildirim

    Abstract: We present two classes of differentially private optimization algorithms derived from the well-known accelerated first-order methods. The first algorithm is inspired by Polyak's heavy ball method and employs a smoothing approach to decrease the accumulated noise on the gradient steps required for differential privacy. The second class of algorithms is based on Nesterov's accelerated gradient meth…

    Submitted 5 August, 2020; originally announced August 2020.

    Comments: 28 pages, 4 figures

    MSC Class: 68P27; 90C30; 90C25

    Journal ref: SIAM Journal on Optimization 2022 32:2, 795-821

  12. arXiv:2007.00590

    stat.ML cs.LG math.OC

    Decentralized Stochastic Gradient Langevin Dynamics and Hamiltonian Monte Carlo

    Authors: Mert Gürbüzbalaban, Xuefeng Gao, Yuanhan Hu, Lingjiong Zhu

    Abstract: Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov Chain Monte Carlo (MCMC) algorithms for Bayesian inference that can scale to large datasets, allowing one to sample from the posterior distribution of the parameters of a statistical model given the input data and the prior distribution over the model parameters. However, these a…

    Submitted 26 August, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

    MSC Class: Primary: 68W15; 62F15; 65C05; 62D05; 62L20; secondary: 60J20; 90C15

  13. arXiv:2005.11878

    cs.LG math.OC math.ST stat.ML

    Fractional moment-preserving initialization schemes for training deep neural networks

    Authors: Mert Gurbuzbalaban, Yuanhan Hu

    Abstract: A traditional approach to initialization in deep neural networks (DNNs) is to sample the network weights randomly for preserving the variance of pre-activations. On the other hand, several studies show that during the training process, the distribution of stochastic gradients can be heavy-tailed, especially for small batch sizes. In this case, weights and therefore pre-activations can be modeled wi…

    Submitted 13 February, 2021; v1 submitted 24 May, 2020; originally announced May 2020.

  14. arXiv:2004.02823

    math.OC stat.ML

    Non-Convex Optimization via Non-Reversible Stochastic Gradient Langevin Dynamics

    Authors: Yuanhan Hu, Xiaoyu Wang, Xuefeng Gao, Mert Gurbuzbalaban, Lingjiong Zhu

    Abstract: Stochastic Gradient Langevin Dynamics (SGLD) is a powerful algorithm for optimizing a non-convex objective, where a controlled and properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates towards a global minimum. SGLD is based on the overdamped Langevin diffusion which is reversible in time. By adding an anti-symmetric matrix to the drift term of the overdamped La…

    Submitted 2 June, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: 45 pages

  15. arXiv:2002.05685

    stat.ML cs.LG

    Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise

    Authors: Umut Şimşekli, Lingjiong Zhu, Yee Whye Teh, Mert Gürbüzbalaban

    Abstract: Stochastic gradient descent with momentum (SGDm) is one of the most popular optimization algorithms in deep learning. While there is a rich theory of SGDm for convex problems, the theory is considerably less developed in the context of deep learning where the problem is non-convex and the gradient noise might exhibit a heavy-tailed behavior, as empirically observed in recent studies. In this study…

    Submitted 4 November, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: 20 pages, Published at International Conference on Machine Learning 2020

  16. arXiv:1912.00018

    stat.ML cs.LG math.CA

    On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks

    Authors: Umut Şimşekli, Mert Gürbüzbalaban, Thanh Huy Nguyen, Gaël Richard, Levent Sagun

    Abstract: The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Ga…

    Submitted 29 November, 2019; originally announced December 2019.

    Comments: 32 pages. arXiv admin note: substantial text overlap with arXiv:1901.06053

  17. arXiv:1910.08701

    math.OC cs.LG stat.ML

    Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks

    Authors: Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar, Umut Simsekli, Lingjiong Zhu

    Abstract: We study the distributed stochastic gradient (D-SG) method and its accelerated variant (D-ASG) for solving decentralized strongly convex stochastic optimization problems where the objective function is distributed over several computational units, lying on a fixed but arbitrary connected communication graph, subject to local communication constraints where noisy estimates of the gradients are availabl…

    Submitted 4 October, 2021; v1 submitted 19 October, 2019; originally announced October 2019.

  18. arXiv:1906.09069

    stat.ML cs.LG

    First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

    Authors: Thanh Huy Nguyen, Umut Şimşekli, Mert Gürbüzbalaban, Gaël Richard

    Abstract: Stochastic gradient descent (SGD) has been widely used in machine learning due to its computational efficiency and favorable generalization properties. Recently, it has been empirically demonstrated that the gradient noise in several deep learning settings admits a non-Gaussian, heavy-tailed behavior. This suggests that the gradient noise can be modeled by using $\alpha$-stable distributions, a family…

    Submitted 21 June, 2019; originally announced June 2019.

  19. arXiv:1901.08022

    math.OC cs.LG stat.ML

    A Universally Optimal Multistage Accelerated Stochastic Gradient Method

    Authors: Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar

    Abstract: We study the problem of minimizing a strongly convex, smooth function when we have noisy estimates of its gradient. We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal rate both in the deterministic and stochastic case and operates without knowledge of noise characteristics. The algorithm consists of stages that use a stochastic…

    Submitted 27 October, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

  20. arXiv:1901.07445

    stat.ML cs.LG math.OC

    Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances

    Authors: Bugra Can, Mert Gurbuzbalaban, Lingjiong Zhu

    Abstract: Momentum methods such as Polyak's heavy ball (HB) method, Nesterov's accelerated gradient (AG) as well as accelerated projected gradient (APG) method have been commonly used in machine learning practice, but their performance is quite sensitive to noise in the gradients. We study these methods under a first-order stochastic oracle model where noisy estimates of the gradients are available. For str…

    Submitted 16 May, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

    Comments: 72 pages

    Journal ref: International Conference on Machine Learning 2019, 891-901

  21. arXiv:1901.06053

    cs.LG stat.ML

    A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks

    Authors: Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban

    Abstract: The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussiani…

    Submitted 17 January, 2019; originally announced January 2019.

  22. arXiv:1812.07725

    math.OC cs.LG math.NA math.PR stat.ML

    Breaking Reversibility Accelerates Langevin Dynamics for Global Non-Convex Optimization

    Authors: Xuefeng Gao, Mert Gurbuzbalaban, Lingjiong Zhu

    Abstract: Langevin dynamics (LD) has been proven to be a powerful technique for optimizing a non-convex objective as an efficient algorithm to find local minima while eventually visiting a global minimum on longer time-scales. LD is based on the first-order Langevin diffusion which is reversible in time. We study two variants that are based on non-reversible Langevin diffusions: the underdamped Langevin dyn…

    Submitted 2 October, 2020; v1 submitted 18 December, 2018; originally announced December 2018.

    MSC Class: 65K05; 90C26; 90C30; 82C31; 65C30

  23. arXiv:1805.10579

    math.OC cs.LG stat.ML

    Robust Accelerated Gradient Methods for Smooth Strongly Convex Functions

    Authors: Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar

    Abstract: We study the trade-offs between convergence rate and robustness to gradient errors in designing a first-order algorithm. We focus on gradient descent (GD) and accelerated gradient (AG) methods for minimizing strongly convex functions when the gradient has random errors in the form of additive white noise. With gradient errors, the function values of the iterates need not converge to the optimal va…

    Submitted 5 November, 2019; v1 submitted 27 May, 2018; originally announced May 2018.

    Comments: To appear in SIAM Journal on Optimization (SIOPT)
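Several entries (1, 7, 16, 18) concern SGD whose noise is heavy-tailed rather than Gaussian, and entry 18 notes that such noise can be modeled with $\alpha$-stable distributions. The sketch below is not taken from any of these papers: it draws standard symmetric $\alpha$-stable variates with the well-known Chambers-Mallows-Stuck method and injects them into plain 1-D gradient descent. The function names `sample_sas` and `noisy_gd`, and the step size and noise scale a caller passes in, are hypothetical choices for illustration only.

```python
import math
import random


def sample_sas(alpha: float, rng: random.Random) -> float:
    """Draw one standard symmetric alpha-stable variate (scale 1, skewness 0)
    using the Chambers-Mallows-Stuck method; alpha must lie in (0, 2]."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must be in (0, 2]")
    v = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform random angle
    w = 0.0
    while w == 0.0:  # guard against a (vanishingly rare) zero exponential draw
        w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(v)  # alpha = 1 is the Cauchy special case
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)) * (
        math.cos((1.0 - alpha) * v) / w
    ) ** ((1.0 - alpha) / alpha)


def noisy_gd(grad, x0: float, step: float, sigma: float, alpha: float,
             n_steps: int, seed: int = 0) -> list[float]:
    """Hypothetical 1-D noisy gradient descent:
    x <- x - step * grad(x) + sigma * xi, with xi symmetric alpha-stable.
    Returns the whole trajectory, starting with x0."""
    rng = random.Random(seed)
    x = x0
    trajectory = [x0]
    for _ in range(n_steps):
        x = x - step * grad(x) + sigma * sample_sas(alpha, rng)
        trajectory.append(x)
    return trajectory
```

At `alpha=2` the sampler reduces to a Gaussian with variance 2, so comparing how often `abs(sample_sas(2.0, rng))` and `abs(sample_sas(1.2, rng))` exceed a large threshold gives a quick feel for how much heavier the tails become as `alpha` decreases.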
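Entry 5 describes penalized Langevin dynamics (PLD), which borrows the idea of penalty methods from continuous optimization to handle the constraint $x \in \mathcal{C}$. Because the abstract is truncated before the penalty is specified, the sketch below simply assumes the common squared-distance penalty $\mathrm{dist}(x,\mathcal{C})^2/(2\delta)$ and runs unadjusted overdamped Langevin on $f(x)$ plus that penalty, for a 1-D interval $\mathcal{C} = [\mathrm{lo}, \mathrm{hi}]$. The helper names and the parameters `delta` and `step` are hypothetical, and this is not claimed to be the paper's exact scheme.

```python
import math
import random


def project_interval(x: float, lo: float, hi: float) -> float:
    """Euclidean projection of x onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def penalized_langevin(grad_f, lo: float, hi: float, delta: float, step: float,
                       n_steps: int, x0: float = 0.0, burn_in: int = 0,
                       seed: int = 0) -> list[float]:
    """Unadjusted overdamped Langevin on the penalized potential
    f(x) + dist(x, C)^2 / (2 * delta) with C = [lo, hi] (assumed penalty).
    Returns the iterates after the first `burn_in` steps."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step)
    x = x0
    samples: list[float] = []
    for k in range(n_steps):
        # For convex C, the gradient of dist(x, C)^2 / 2 is x - proj_C(x).
        penalty_grad = (x - project_interval(x, lo, hi)) / delta
        x = x - step * (grad_f(x) + penalty_grad) + noise_scale * rng.gauss(0.0, 1.0)
        if k >= burn_in:
            samples.append(x)
    return samples
```

For example, `penalized_langevin(lambda x: x, -1.0, 1.0, delta=1e-3, step=5e-4, n_steps=400_000, burn_in=4_000)` targets a standard normal restricted to $[-1, 1]$; a small fraction of iterates still lands just outside the interval. Shrinking `delta` reduces that leakage, but in this explicit scheme it also forces a smaller `step` (for this target, `step * (1 + 1/delta)` must stay below 2 for the iteration to remain stable outside $\mathcal{C}$).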
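Entries 14 and 22 start from the observation that overdamped Langevin dynamics is reversible in time, and entry 14 breaks reversibility by adding an anti-symmetric matrix to the drift. A standard fact, not specific to either paper, is that the drift $-(I+J)\nabla f$ with $J^\top = -J$ still leaves the Gibbs distribution $\propto e^{-\beta f}$ invariant. The 2-D sketch below assumes $J = \begin{pmatrix}0 & a\\ -a & 0\end{pmatrix}$, exact gradients, and a plain Euler step; the function name and parameters are hypothetical, and it is not either paper's algorithm.

```python
import math
import random


def nonreversible_langevin(grad_f, x0: tuple[float, float], a: float, beta: float,
                           step: float, n_steps: int,
                           seed: int = 0) -> list[tuple[float, float]]:
    """Euler discretization of dX = -(I + J) grad f(X) dt + sqrt(2 / beta) dW in 2-D,
    with the anti-symmetric J = [[0, a], [-a, 0]]. Setting a = 0 recovers ordinary
    (reversible) overdamped Langevin dynamics."""
    rng = random.Random(seed)
    x, y = x0
    noise = math.sqrt(2.0 * step / beta)
    path: list[tuple[float, float]] = []
    for _ in range(n_steps):
        gx, gy = grad_f(x, y)
        # (I + J) g = (gx + a * gy, gy - a * gx)
        drift_x = gx + a * gy
        drift_y = gy - a * gx
        x = x - step * drift_x + noise * rng.gauss(0.0, 1.0)
        y = y - step * drift_y + noise * rng.gauss(0.0, 1.0)
        path.append((x, y))
    return path
```

With $f(x, y) = (x^2 + 4y^2)/2$ and $\beta = 1$, the long-run averages of $x^2$ and $y^2$ along the path should be close to $1$ and $1/4$ for any $a$, up to discretization and Monte Carlo error. The property the papers care about, that a suitable $J$ can speed up convergence, is not measured by this sketch.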
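Entries 11, 20 and 23 study accelerated methods such as Polyak's heavy-ball method when only noisy gradient estimates are available. The sketch below is the textbook heavy-ball recursion $x_{k+1} = x_k - \alpha\,\tilde g_k + \beta (x_k - x_{k-1})$ with a hypothetical additive Gaussian noise oracle $\tilde g_k = \nabla f(x_k) + \text{noise}$ on a 1-D function; `alpha`, `beta` and `noise_std` are illustrative assumptions, not the tuned parameters or the robust and multistage schemes those papers propose.

```python
import random


def heavy_ball(grad, x0: float, alpha: float, beta: float, noise_std: float,
               n_steps: int, seed: int = 0) -> list[float]:
    """Polyak's heavy-ball iteration with an additive-noise gradient oracle:
    x_{k+1} = x_k - alpha * (grad(x_k) + noise) + beta * (x_k - x_{k-1}).
    Returns all iterates, starting with x0 (the first step has no momentum)."""
    rng = random.Random(seed)
    x_prev, x = x0, x0
    iterates = [x0]
    for _ in range(n_steps):
        g = grad(x) + noise_std * rng.gauss(0.0, 1.0)  # noisy gradient estimate
        x_prev, x = x, x - alpha * g + beta * (x - x_prev)
        iterates.append(x)
    return iterates
```

On $f(x) = (x - 3)^2$, i.e. `grad = lambda x: 2.0 * (x - 3.0)`, with `alpha=0.1` and `beta=0.5`, setting `noise_std=0.0` makes the iterates converge linearly to 3; with `noise_std > 0` they only settle into a neighborhood of 3 whose size depends on `alpha`, `beta` and the noise level. This is a small instance of the rate-versus-robustness trade-off that entry 23 refers to.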