
Showing 1–10 of 10 results for author: Mishkin, A

  1. arXiv:2404.02378  [pdf, ps, other]

    math.OC cs.LG

    Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation

    Authors: Aaron Mishkin, Mert Pilanci, Mark Schmidt

    Abstract: We prove new convergence rates for a generalized version of stochastic Nesterov acceleration under interpolation conditions. Unlike previous analyses, our approach accelerates any stochastic gradient method which makes sufficient progress in expectation. The proof, which proceeds using the estimating sequences framework, applies to both convex and strongly convex functions and is easily specialize…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Results extend work from Aaron Mishkin's master's thesis

  2. arXiv:2403.04081  [pdf, other]

    cs.LG math.OC

    Directional Smoothness and Gradient Methods: Convergence and Adaptivity

    Authors: Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert M. Gower

    Abstract: We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective. Minimizing these upper-bounds requires solving implicit equations to obtain a se…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Twenty-four pages

  3. arXiv:2403.03362  [pdf, other]

    cs.LG math.OC

    Level Set Teleportation: An Optimization Perspective

    Authors: Aaron Mishkin, Alberto Bietti, Robert M. Gower

    Abstract: We study level set teleportation, an optimization sub-routine which seeks to accelerate gradient methods by maximizing the gradient norm on a level-set of the objective function. Since the descent lemma implies that gradient descent (GD) decreases the objective proportional to the squared norm of the gradient, level-set teleportation maximizes this one-step progress guarantee. For convex functions…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Thirty-five pages including appendices

  4. arXiv:2403.01046  [pdf, other]

    cs.LG cs.AI cs.NE math.OC stat.ML

    A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features

    Authors: Emi Zeger, Yifei Wang, Aaron Mishkin, Tolga Ergen, Emmanuel Candès, Mert Pilanci

    Abstract: We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features. The specific dictionary depends on the activation and depth. We consider 2 and 3-layer networks with piecewise linear activations, and rectangular and tree networks with sign activation and arbitrary depth. Interestingly in absolute valu…

    Submitted 29 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  5. arXiv:2307.01169  [pdf, other]

    math.OC cs.LG stat.ML

    Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm

    Authors: Amrutha Varshini Ramesh, Aaron Mishkin, Mark Schmidt, Yihan Zhou, Jonathan Wilder Lavington, Jennifer She

    Abstract: We consider minimizing a smooth function subject to a summation constraint over its variables. By exploiting a connection between the greedy 2-coordinate update for this problem and equality-constrained steepest descent in the 1-norm, we give a convergence rate for greedy selection under a proximal Polyak-Lojasiewicz assumption that is faster than random selection and independent of the problem di…

    Submitted 3 July, 2023; originally announced July 2023.

  6. arXiv:2306.00119  [pdf, other]

    cs.LG

    Optimal Sets and Solution Paths of ReLU Networks

    Authors: Aaron Mishkin, Mert Pilanci

    Abstract: We develop an analytical framework to characterize the set of optimal ReLU neural networks by reformulating the non-convex training problem as a convex program. We show that the global optima of the convex parameterization are given by a polyhedral set and then extend this characterization to the optimal set of the non-convex training objective. Since all stationary points of the ReLU training pro…

    Submitted 19 January, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

    Comments: Minor updates and corrections to clarify the role of merge/split symmetries in formation of ReLU optimal set and add missing sufficient conditions for all minimal models to have the same cardinality

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:24888-24924, 2023

  7. arXiv:2202.01331  [pdf, other]

    cs.LG

    Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions

    Authors: Aaron Mishkin, Arda Sahiner, Mert Pilanci

    Abstract: We develop fast algorithms and robust software for convex optimization of two-layer neural networks with ReLU activation functions. Our work leverages a convex reformulation of the standard weight-decay penalized training problem as a set of group-$\ell_1$-regularized data-local models, where locality is enforced by polyhedral cone constraints. In the special case of zero-regularization, we show t…

    Submitted 31 August, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: Camera ready version for ICML 2022

  8. arXiv:2006.06821  [pdf, other]

    cs.LG stat.ML

    To Each Optimizer a Norm, To Each Norm its Generalization

    Authors: Sharan Vaswani, Reza Babanezhad, Jose Gallego-Posada, Aaron Mishkin, Simon Lacoste-Julien, Nicolas Le Roux

    Abstract: We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. Since it is difficult to determine whether an optimizer converges to solutions that minimize a known norm, we flip the problem and investigate what is the corresponding norm minimized by an interpolating solution. Using this reasoni…

    Submitted 11 June, 2020; originally announced June 2020.

  9. arXiv:1905.09997  [pdf, other]

    cs.LG math.OC stat.ML

    Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates

    Authors: Sharan Vaswani, Aaron Mishkin, Issam Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien

    Abstract: Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities and SGD's practical performance heavily relies on the choice of this step-size. We propose to use line-search techniques t…

    Submitted 4 June, 2021; v1 submitted 23 May, 2019; originally announced May 2019.

    Comments: Added a citation to the related work of Paul Tseng, and citations to methods that had previously explored line-searches for deep learning empirically

  10. arXiv:1811.04504  [pdf, other]

    cs.LG cs.AI stat.ML

    SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient

    Authors: Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Mohammad Emtiyaz Khan

    Abstract: Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution. In such situations, existing methods usually resort to a diagonal approximation of the covariance matrix despite the fact that these matrices are known to result in poor uncertainty estimates. To address this issue,…

    Submitted 11 January, 2019; v1 submitted 11 November, 2018; originally announced November 2018.

    Comments: NeurIPS 2018 final version