Showing 1–14 of 14 results for author: Dhillon, I

  1. arXiv:2402.07114  [pdf, other]

    cs.LG math.NA math.OC stat.ML

    Towards Quantifying the Preconditioning Effect of Adam

    Authors: Rudrajit Das, Naman Agarwal, Sujay Sanghavi, Inderjit S. Dhillon

    Abstract: There is a notable dearth of results characterizing the preconditioning effect of Adam and showing how it may alleviate the curse of ill-conditioning -- an issue plaguing gradient descent (GD). In this work, we perform a detailed analysis of Adam's preconditioning effect for quadratic functions and quantify to what extent Adam can mitigate the dependence on the condition number of the Hessian. Our…

    Submitted 11 February, 2024; originally announced February 2024.

  2. arXiv:2311.10085  [pdf, other]

    cs.LG cs.CL math.OC

    A Computationally Efficient Sparsified Online Newton Method

    Authors: Fnu Devvrit, Sai Surya Duvvuri, Rohan Anil, Vineet Gupta, Cho-Jui Hsieh, Inderjit Dhillon

    Abstract: Second-order methods hold significant promise for enhancing the convergence of deep neural network training; however, their large memory and computational demands have limited their practicality. Thus there is a need for scalable second-order methods that can efficiently train large models. In this paper, we introduce the Sparsified Online Newton (SONew) method, a memory-efficient second-order alg…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 30 pages. First two authors contributed equally. Accepted at NeurIPS 2023

  3. arXiv:2202.10506  [pdf, other]

    math.OC cs.LG stat.ML

    Accelerating Primal-dual Methods for Regularized Markov Decision Processes

    Authors: Haoya Li, Hsiang-fu Yu, Lexing Ying, Inderjit Dhillon

    Abstract: Entropy regularized Markov decision processes have been widely used in reinforcement learning. This paper is concerned with the primal-dual formulation of the entropy regularized problems. Standard first-order methods suffer from slow convergence due to the lack of strict convexity and concavity. To address this issue, we first introduce a new quadratically convexified primal-dual formulation. The…

    Submitted 12 June, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

  4. arXiv:2110.02398  [pdf, other]

    cs.LG math.OC

    Approximate Newton policy gradient algorithms

    Authors: Haoya Li, Samarth Gupta, Hsiangfu Yu, Lexing Ying, Inderjit Dhillon

    Abstract: Policy gradient algorithms have been widely applied to Markov decision processes and reinforcement learning problems in recent years. Regularization with various entropy functions is often used to encourage exploration and improve stability. This paper proposes an approximate Newton method for the policy gradient algorithm with entropy regularization. In the case of Shannon entropy, the resulting…

    Submitted 8 June, 2023; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: 22 pages, 15 figures, v6 accepted by SIAM SISC

  5. arXiv:2106.08882  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Robust Training in High Dimensions via Block Coordinate Geometric Median Descent

    Authors: Anish Acharya, Abolfazl Hashemi, Prateek Jain, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

    Abstract: Geometric median (\textsc{Gm}) is a classical method in statistics for achieving a robust estimation of the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5. However, its computational complexity makes it infeasible for robustifying stochastic gradient descent (SGD) for high-dimensional optimization problems. In this paper, we show that by applying \textsc{G…

    Submitted 16 June, 2021; originally announced June 2021.

  6. arXiv:2106.07094  [pdf, other]

    cs.LG cs.DC eess.SP math.OC stat.ML

    On the Convergence of Differentially Private Federated Learning on Non-Lipschitz Objectives, and with Normalized Client Updates

    Authors: Rudrajit Das, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon

    Abstract: There is a dearth of convergence results for differentially private federated learning (FL) with non-Lipschitz objective functions (i.e., when gradient norms are not bounded). The primary reason for this is that the clipping operation (i.e., projection onto an $\ell_2$ ball of a fixed radius called the clipping threshold) for bounding the sensitivity of the average update to each client's update i…

    Submitted 15 April, 2022; v1 submitted 13 June, 2021; originally announced June 2021.

  7. arXiv:2012.04061  [pdf, other]

    stat.ML cs.DC cs.LG math.OC

    Faster Non-Convex Federated Learning via Global and Local Momentum

    Authors: Rudrajit Das, Anish Acharya, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

    Abstract: We propose \texttt{FedGLOMO}, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(ε^{-1.5})$ to converge to an $ε$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq ε$) for smooth non-convex functions -- under arbitrary client heterogeneity and compressed communication -- compared to the $\mathcal{O}(ε^{-2})$ complexity of most prior works. Our key…

    Submitted 24 October, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

  8. arXiv:2011.10643  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization

    Authors: Abolfazl Hashemi, Anish Acharya, Rudrajit Das, Haris Vikalo, Sujay Sanghavi, Inderjit Dhillon

    Abstract: In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e. averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be {\em lossy compressed} versions of the local parameters. In this paper, we show that, in such co…

    Submitted 20 November, 2020; originally announced November 2020.

  9. arXiv:1906.02436  [pdf, other]

    cs.LG math.OC stat.ML

    Primal-Dual Block Frank-Wolfe

    Authors: Qi Lei, Jiacheng Zhuo, Constantine Caramanis, Inderjit S. Dhillon, Alexandros G. Dimakis

    Abstract: We propose a variant of the Frank-Wolfe algorithm for solving a class of sparse/low-rank optimization problems. Our formulation includes Elastic Net, regularized SVMs and phase retrieval as special cases. The proposed Primal-Dual Block Frank-Wolfe algorithm reduces the per-iteration cost while maintaining linear convergence rate. The per iteration cost of our method depends on the structural compl…

    Submitted 6 June, 2019; originally announced June 2019.

  10. arXiv:1812.00151  [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    Discrete Adversarial Attacks and Submodular Optimization with Applications to Text Classification

    Authors: Qi Lei, Lingfei Wu, Pin-Yu Chen, Alexandros G. Dimakis, Inderjit S. Dhillon, Michael Witbrock

    Abstract: Adversarial examples are carefully constructed modifications to an input that completely change the output of a classifier but are imperceptible to humans. Despite these successful attacks for continuous data (such as image and audio samples), generating adversarial examples for discrete structures such as text has proven significantly more challenging. In this paper we formulate the attacks with…

    Submitted 4 April, 2019; v1 submitted 1 December, 2018; originally announced December 2018.

    Comments: In SysML 2019

  11. arXiv:1811.00641  [pdf, other]

    cs.LG cs.CL math.NA stat.ML

    Online Embedding Compression for Text Classification using Low Rank Matrix Factorization

    Authors: Anish Acharya, Rahul Goel, Angeliki Metallinou, Inderjit Dhillon

    Abstract: Deep learning models have become state of the art for natural language processing (NLP) tasks; however, deploying these models in production systems poses significant memory constraints. Existing compression methods are either lossy or introduce significant latency. We propose a compression method that leverages low rank matrix factorization during training, to compress the word embedding layer which…

    Submitted 1 November, 2018; originally announced November 2018.

    Comments: Accepted in Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019)

  12. arXiv:1509.01404  [pdf, ps, other]

    math.NA cs.CV cs.LG math.OC stat.ML

    Coordinate Descent Methods for Symmetric Nonnegative Matrix Factorization

    Authors: Arnaud Vandaele, Nicolas Gillis, Qi Lei, Kai Zhong, Inderjit Dhillon

    Abstract: Given a symmetric nonnegative matrix $A$, symmetric nonnegative matrix factorization (symNMF) is the problem of finding a nonnegative matrix $H$, usually with much fewer columns than $A$, such that $A \approx HH^T$. SymNMF can be used for data analysis and in particular for various clustering tasks. In this paper, we propose simple and very efficient coordinate descent schemes to solve this proble…

    Submitted 31 May, 2016; v1 submitted 4 September, 2015; originally announced September 2015.

    Comments: 25 pages, 5 figures, 7 tables. Main changes: comparison with another symNMF algorithm (namely, BetaSNMF), and correction of an error in the convergence proof

    Journal ref: IEEE Transactions on Signal Processing 64 (21), pp. 5571-5584, 2016

  13. arXiv:1411.6081  [pdf, other]

    cs.LG math.NA stat.ML

    PU Learning for Matrix Completion

    Authors: Cho-Jui Hsieh, Nagarajan Natarajan, Inderjit S. Dhillon

    Abstract: In this paper, we consider the matrix completion problem when the observations are one-bit measurements of some underlying matrix M, and in particular the observed samples consist only of ones and no zeros. This problem is motivated by modern applications such as recommender systems and social networks where only "likes" or "friendships" are observed. The problem of learning from only positive and…

    Submitted 21 November, 2014; originally announced November 2014.

  14. arXiv:0709.0535  [pdf, ps, other]

    math.MG

    Constructing packings in Grassmannian manifolds via alternating projection

    Authors: I. S. Dhillon, R. W. Heath Jr, T. Strohmer, J. A. Tropp

    Abstract: This paper describes a numerical method for finding good packings in Grassmannian manifolds equipped with various metrics. This investigation also encompasses packing in projective spaces. In each case, producing a good packing is equivalent to constructing a matrix that has certain structural and spectral properties. By alternately enforcing the structural condition and then the spectral condit…

    Submitted 4 September, 2007; originally announced September 2007.

    Comments: 41 pages, 7 tables, 4 figures

    MSC Class: 51N15; 52C17

    Journal ref: Exper. Math., Vol. 17, no. 1, pp. 9-35, 2008