Skip to main content

Showing 1–11 of 11 results for author: Kale, S

Searching in archive math. Search in all archives.
.
  1. arXiv:2210.06705  [pdf, ps, other

    cs.LG cs.AI math.OC

    From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent

    Authors: Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan

    Abstract: Stochastic Gradient Descent (SGD) has been the method of choice for learning large-scale non-convex models. While a general analysis of when SGD works has been elusive, there has been a lot of recent progress in understanding the convergence of Gradient Flow (GF) on the population loss, partly due to the simplicity that a continuous-time analysis buys us. An overarching theme of our paper is provi… ▽ More

    Submitted 12 October, 2022; originally announced October 2022.

  2. arXiv:2207.02794  [pdf, ps, other

    cs.DS cs.CR cs.LG math.MG stat.ML

    Private Matrix Approximation and Geometry of Unitary Orbits

    Authors: Oren Mangoubi, Yikai Wu, Satyen Kale, Abhradeep Guha Thakurta, Nisheeth K. Vishnoi

    Abstract: Consider the following optimization problem: Given $n \times n$ matrices $A$ and $Λ$, maximize $\langle A, UΛU^*\rangle$ where $U$ varies over the unitary group $\mathrm{U}(n)$. This problem seeks to approximate $A$ by a matrix whose spectrum is the same as $Λ$ and, by setting $Λ$ to be appropriate diagonal matrices, one can recover matrix approximation problems such as PCA and rank-$k$ approximat… ▽ More

    Submitted 6 July, 2022; originally announced July 2022.

    Journal ref: Proceedings of Thirty Fifth Conference on Learning Theory (COLT), PMLR 178:3547-3588, 2022

  3. arXiv:2202.04598  [pdf, ps, other

    math.OC cs.LG stat.ML

    Reproducibility in Optimization: Theoretical Framework and Limits

    Authors: Kwangjun Ahn, Prateek Jain, Ziwei Ji, Satyen Kale, Praneeth Netrapalli, Gil I. Shamir

    Abstract: We initiate a formal study of reproducibility in optimization. We define a quantitative measure of reproducibility of optimization procedures in the face of noisy or error-prone operations such as inexact or stochastic gradient computations or inexact initialization. We then analyze several convex optimization settings of interest such as smooth, non-smooth, and strongly-convex objective functions… ▽ More

    Submitted 4 December, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: 45 Pages; Accepted to NeurIPS 2022

  4. arXiv:2201.13419  [pdf, ps, other

    cs.LG math.OC stat.ML

    Agnostic Learnability of Halfspaces via Logistic Loss

    Authors: Ziwei Ji, Kwangjun Ahn, Pranjal Awasthi, Satyen Kale, Stefani Karp

    Abstract: We investigate approximation guarantees provided by logistic regression for the fundamental problem of agnostic learning of homogeneous halfspaces. Previously, for a certain broad class of "well-behaved" distributions on the examples, Diakonikolas et al. (2020) proved an $\tildeΩ(\textrm{OPT})$ lower bound, while Frei et al. (2021) proved an $\tilde{O}(\sqrt{\textrm{OPT}})$ upper bound, where… ▽ More

    Submitted 31 January, 2022; originally announced January 2022.

  5. arXiv:2103.06972  [pdf, other

    cs.LG cs.DC math.OC

    Federated Functional Gradient Boosting

    Authors: Zebang Shen, Hamed Hassani, Satyen Kale, Amin Karbasi

    Abstract: In this paper, we initiate a study of functional minimization in Federated Learning. First, in the semi-heterogeneous setting, when the marginal distributions of the feature vectors on client machines are identical, we develop the federated functional gradient boosting (FFGB) method that provably converges to the global minimum. Subsequently, we extend our results to the fully-heterogeneous settin… ▽ More

    Submitted 11 March, 2021; originally announced March 2021.

  6. arXiv:2102.11845  [pdf, other

    cs.LG cs.CR math.OC stat.ML

    Learning with User-Level Privacy

    Authors: Daniel Levy, Ziteng Sun, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, Ananda Theertha Suresh

    Abstract: We propose and analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints. Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution ($m \ge 1$ samples), providing more stringent but more realistic protection against information leaks. We show that for high-dimensional mean estimation, empirical… ▽ More

    Submitted 3 December, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021. 43 pages, 0 figure

  7. arXiv:2008.03606  [pdf, other

    cs.LG cs.DC math.OC stat.ML

    Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

    Authors: Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

    Abstract: Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of the data across different clients which gives rise to the client drift phenomenon. In fact, obtaining an algorithm for FL which is uniformly better than simple centralized training has been a major open problem thus far. In this work, we propose a general algorithmic framework, Mime, which i) mitigates cl… ▽ More

    Submitted 8 June, 2021; v1 submitted 8 August, 2020; originally announced August 2020.

    Comments: Version 2 provides stronger theoretical results and more thorough experiments

    MSC Class: 68W40; 68W15; 90C25; 90C06 ACM Class: G.1.6; F.2.1; E.4

  8. arXiv:1910.06378  [pdf, other

    cs.LG cs.DC math.OC stat.ML

    SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

    Authors: Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

    Abstract: Federated Averaging (FedAvg) has emerged as the algorithm of choice for federated learning due to its simplicity and low communication cost. However, in spite of recent research efforts, its performance is not fully understood. We obtain tight convergence rates for FedAvg and prove that it suffers from `client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow converge… ▽ More

    Submitted 9 April, 2021; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: v2 contains analysis of FedAvg, non-convex rates of Scaffold, and experimental evaluation. v3 fixes typos, ICML version. v4 slightly improves rate of SCAFFOLD for general convex functions

    MSC Class: 68W40; 68W15; 90C25; 90C06 ACM Class: G.1.6; F.2.1; E.4

  9. arXiv:1904.09237  [pdf, other

    cs.LG math.OC stat.ML

    On the Convergence of Adam and Beyond

    Authors: Sashank J. Reddi, Satyen Kale, Sanjiv Kumar

    Abstract: Several recently proposed stochastic optimization methods that have been successfully used in training deep networks such as RMSProp, Adam, Adadelta, Nadam are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients. In many applications, e.g. learning with large output spaces, it has been empirically observed that these algorithms fail to co… ▽ More

    Submitted 19 April, 2019; originally announced April 2019.

    Comments: Appeared in ICLR 2018

  10. arXiv:1901.09149  [pdf, other

    cs.LG math.OC stat.ML

    Esca** Saddle Points with Adaptive Gradient Methods

    Authors: Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra

    Abstract: Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood. In this paper, we seek a crisp, clean and precise characterization of their behavior in nonconvex settings. To this end, we first provide a novel view of adaptive methods as preconditioned SGD, where the preconditioner is estimated in an online manner. By studying the preconditioner on its own,… ▽ More

    Submitted 3 February, 2020; v1 submitted 25 January, 2019; originally announced January 2019.

    Comments: Update Theorem 4.1 and proof to use martingale concentration bounds, i.e. matrix Freedman

  11. arXiv:1006.2425  [pdf, ps, other

    math.OC

    An optimal algorithm for stochastic strongly-convex optimization

    Authors: Elad Hazan, Satyen Kale

    Abstract: We consider stochastic convex optimization with a strongly convex (but not necessarily smooth) objective. We give an algorithm which performs only gradient updates with optimal rate of convergence.

    Submitted 11 June, 2010; originally announced June 2010.