Showing 1–21 of 21 results for author: Cutkosky, A

  1. arXiv:2407.01825  [pdf, other]

    cs.LG math.OC

    Empirical Tests of Optimization Assumptions in Deep Learning

    Authors: Hoang Tran, Qinzi Zhang, Ashok Cutkosky

    Abstract: There is a significant gap between our theoretical understanding of optimization algorithms used in deep learning and their practical performance. Theoretical development usually focuses on proving convergence guarantees under a variety of different assumptions, which are themselves often chosen based on a rough combination of intuitive match to practice and analytical convenience. The theory/prac…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.19579  [pdf, ps, other]

    math.OC cs.CR cs.LG

    Private Zeroth-Order Nonsmooth Nonconvex Optimization

    Authors: Qinzi Zhang, Hoang Tran, Ashok Cutkosky

    Abstract: We introduce a new zeroth-order algorithm for private stochastic optimization on nonconvex and nonsmooth objectives. Given a dataset of size $M$, our algorithm ensures $(α,αρ^2/2)$-Rényi differential privacy and finds a $(δ,ε)$-stationary point so long as $M=\tildeΩ\left(\frac{d}{δε^3} + \frac{d^{3/2}}{ρδε^2}\right)$. This matches the optimal complexity of its non-private zeroth-order analog. Nota…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2405.20540  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Fully Unconstrained Online Learning

    Authors: Ashok Cutkosky, Zakaria Mhammedi

    Abstract: We provide an online learning algorithm that obtains regret $G\|w_\star\|\sqrt{T\log(\|w_\star\|G\sqrt{T})} + \|w_\star\|^2 + G^2$ on $G$-Lipschitz convex losses for any comparison point $w_\star$ without knowing either $G$ or $\|w_\star\|$. Importantly, this matches the optimal bound $G\|w_\star\|\sqrt{T}$ available with such knowledge (up to logarithmic factors), unless either $\|w_\star\|$ or…

    Submitted 30 May, 2024; originally announced May 2024.

  4. arXiv:2405.18199  [pdf, ps, other]

    cs.LG math.OC

    Adam with model exponential moving average is effective for nonconvex optimization

    Authors: Kwangjun Ahn, Ashok Cutkosky

    Abstract: In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and (ii) the model exponential moving average (EMA). Specifically, we demonstrate that a clipped version of Adam with model EMA achieves the optimal convergence rates in various nonconvex optimization settings, both smooth an…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Comments would be appreciated!

  5. arXiv:2405.15682  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    The Road Less Scheduled

    Authors: Aaron Defazio, Xingyu Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

    Abstract: Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from c…

    Submitted 30 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  6. arXiv:2405.09742  [pdf, other]

    cs.LG math.OC

    Random Scaling and Momentum for Non-smooth Non-convex Optimization

    Authors: Qinzi Zhang, Ashok Cutkosky

    Abstract: Training neural networks requires optimizing a loss function that may be highly irregular, and in particular neither convex nor smooth. Popular training algorithms are based on stochastic gradient descent with momentum (SGDM), for which classical analysis applies only if the loss is either convex or smooth. We show that a very small modification to SGDM closes this gap: simply scale the update at…

    Submitted 15 May, 2024; originally announced May 2024.

  7. arXiv:2302.03775  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion

    Authors: Ashok Cutkosky, Harsh Mehta, Francesco Orabona

    Abstract: We present new algorithms for optimizing non-smooth, non-convex stochastic objectives based on a novel analysis technique. This improves the current best-known complexity for finding a $(δ,ε)$-stationary point from $O(ε^{-4}δ^{-1})$ stochastic gradient queries to $O(ε^{-3}δ^{-1})$, which we also show to be optimal. Our primary technique is a reduction from non-smooth non-convex optimization to onl…

    Submitted 11 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

  8. arXiv:2301.13349  [pdf, other]

    cs.LG math.OC stat.ML

    Unconstrained Dynamic Regret via Sparse Coding

    Authors: Zhiyu Zhang, Ashok Cutkosky, Ioannis Ch. Paschalidis

    Abstract: Motivated by the challenge of nonstationarity in sequential decision making, we study Online Convex Optimization (OCO) under the coupling of two problem structures: the domain is unbounded, and the comparator sequence $u_1,\ldots,u_T$ is arbitrarily time-varying. As no algorithm can guarantee low regret simultaneously against all comparator sequences, handling this setting requires moving from min…

    Submitted 25 October, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: NeurIPS 2023

  9. arXiv:2203.00444  [pdf, other]

    cs.LG math.OC stat.ML

    Parameter-free Mirror Descent

    Authors: Andrew Jacobsen, Ashok Cutkosky

    Abstract: We develop a modified online mirror descent framework that is suitable for building adaptive and parameter-free algorithms in unbounded domains. We leverage this technique to develop the first unconstrained online linear optimization algorithm achieving an optimal dynamic regret bound, and we further demonstrate that natural strategies based on Follow-the-Regularized-Leader are unable to achieve s…

    Submitted 8 February, 2024; v1 submitted 26 February, 2022; originally announced March 2022.

    Comments: 59 pages. v4: Added a new section (7. Trade-offs in the Horizon Dependence) discussing how to achieve an alternative type of parameter-free bound using our framework; v3: published at COLT 2022 + fixed typos; v2: improved the algorithms in sections 3, 5, and 6 (tighter regret, simpler updates and analysis), corrected minor technical details and fixed typos

  10. arXiv:2202.00089  [pdf, other]

    cs.LG math.OC

    Understanding AdamW through Proximal Methods and Scale-Freeness

    Authors: Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona

    Abstract: Adam has been widely adopted for training deep neural networks due to less hyperparameter tuning and remarkable performance. To improve generalization, Adam is typically used in tandem with a squared $\ell_2$ regularizer (referred to as Adam-$\ell_2$). However, even better performance can be obtained with AdamW, which decouples the gradient of the regularizer from the update rule of Adam-$\ell_2$.…

    Submitted 31 January, 2022; originally announced February 2022.

  11. arXiv:2106.14343  [pdf, other]

    cs.LG math.OC stat.ML

    High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails

    Authors: Ashok Cutkosky, Harsh Mehta

    Abstract: We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails. We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high-probability with best-known rates for smooth losses when the gradients only have bounded $\mathfrak{p}$th moments for some…

    Submitted 9 November, 2021; v1 submitted 27 June, 2021; originally announced June 2021.

  12. arXiv:2002.04726  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Online Learning with Imperfect Hints

    Authors: Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

    Abstract: We consider a variant of the classical online linear optimization problem in which at every step, the online player receives a "hint" vector before choosing the action for that round. Rather surprisingly, it was shown that if the hint vector is guaranteed to have a positive correlation with the cost vector, then the online player can achieve a regret of $O(\log T)$, thus significantly improving ov…

    Submitted 2 October, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: appeared in ICML 2020

  13. arXiv:2002.03305  [pdf, other]

    cs.LG math.OC stat.ML

    Momentum Improves Normalized SGD

    Authors: Ashok Cutkosky, Harsh Mehta

    Abstract: We provide an improved analysis of normalized SGD showing that adding momentum provably removes the need for large batch sizes on non-convex objectives. Then, we consider the case of objectives with bounded second derivative and show that in this case a small tweak to the momentum formula allows normalized SGD with momentum to find an $ε$-critical point in $O(1/ε^{3.5})$ iterations, matching the b…

    Submitted 16 May, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

  14. arXiv:1905.12721  [pdf, other]

    cs.LG math.OC stat.ML

    Matrix-Free Preconditioning in Online Learning

    Authors: Ashok Cutkosky, Tamas Sarlos

    Abstract: We provide an online convex optimization algorithm with regret that interpolates between the regret of an algorithm using an optimal preconditioning matrix and one using a diagonal preconditioning matrix. Our regret bound is never worse than that obtained by diagonal preconditioning, and in certain settings even surpasses that of algorithms with full-matrix preconditioning. Importantly, our algorit…

    Submitted 29 May, 2019; originally announced May 2019.

    Comments: ICML 2019

  15. arXiv:1905.10018  [pdf, other]

    cs.LG math.OC stat.ML

    Momentum-Based Variance Reduction in Non-Convex SGD

    Authors: Ashok Cutkosky, Francesco Orabona

    Abstract: Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and willingness to use excessively large "mega-bat…

    Submitted 21 April, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

    Comments: Added Ack

  16. arXiv:1903.00974  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Anytime Online-to-Batch Conversions, Optimism, and Acceleration

    Authors: Ashok Cutkosky

    Abstract: A standard way to obtain convergence guarantees in stochastic convex optimization is to run an online learning algorithm and then output the average of its iterates: the actual iterates of the online learning algorithm do not come with individual guarantees. We close this gap by introducing a black-box modification to any online learning algorithm whose iterates converge to the optimum in stochast…

    Submitted 3 March, 2019; originally announced March 2019.

  17. arXiv:1902.09013  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Artificial Constraints and Lipschitz Hints for Unconstrained Online Learning

    Authors: Ashok Cutkosky

    Abstract: We provide algorithms that guarantee regret $R_T(u)\le \tilde O(G\|u\|^3 + G(\|u\|+1)\sqrt{T})$ or $R_T(u)\le \tilde O(G\|u\|^3T^{1/3} + GT^{1/3}+ G\|u\|\sqrt{T})$ for online convex optimization with $G$-Lipschitz losses for any comparison point $u$ without prior knowledge of either $G$ or $\|u\|$. Previous algorithms dispense with the $O(\|u\|^3)$ term at the expense of knowledge of one or both o…

    Submitted 24 February, 2019; originally announced February 2019.

  18. arXiv:1902.09003  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Combining Online Learning Guarantees

    Authors: Ashok Cutkosky

    Abstract: We show how to take any two parameter-free online learning algorithms with different regret guarantees and obtain a single algorithm whose regret is the minimum of the two base algorithms. Our method is embarrassingly simple: just add the iterates. This trick can generate efficient algorithms that adapt to many norms simultaneously, as well as providing diagonal-style algorithms that still maintai…

    Submitted 24 February, 2019; originally announced February 2019.

  19. arXiv:1901.09068  [pdf, other]

    cs.LG math.OC stat.ML

    Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization

    Authors: Zhenxun Zhuang, Ashok Cutkosky, Francesco Orabona

    Abstract: Stochastic Gradient Descent (SGD) has played a central role in machine learning. However, it requires a carefully hand-picked stepsize for fast convergence, which is notoriously tedious and time-consuming to tune. Over the last several years, a plethora of adaptive gradient-based algorithms have emerged to ameliorate this problem. They have proved efficient in reducing the labor of tuning in pract…

    Submitted 7 June, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

  20. arXiv:1802.06293  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Black-Box Reductions for Parameter-free Online Learning in Banach Spaces

    Authors: Ashok Cutkosky, Francesco Orabona

    Abstract: We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime. We reduce parameter-free online learning to online exp-concave optimization, we reduce optimization in a Banach space to one-dimensional optimization, and we reduce o…

    Submitted 25 June, 2018; v1 submitted 17 February, 2018; originally announced February 2018.

    Comments: Appears in Conference on Learning Theory 2018

  21. arXiv:0901.1678  [pdf, ps, other]

    math.AC math.CO

    Associated Primes of the Square of the Alexander Dual of Hypergraphs

    Authors: Ashok Cutkosky

    Abstract: The purpose of this paper is to provide methods for determining the associated primes of the square of the Alexander dual of the edge ideal for an m-hypergraph H. We prove a general method for detecting associated primes of the square of the Alexander dual of the edge ideal based on combinatorial conditions on the m-hypergraph. Also, we demonstrate a more efficient combinatorial criterion for de…

    Submitted 12 January, 2009; originally announced January 2009.

    Comments: 12 pages

    MSC Class: 13F55; 05E99