Showing 1–9 of 9 results for author: Cohen, J M

  1. arXiv:2207.14484  [pdf, other]

    cs.LG

    Adaptive Gradient Methods at the Edge of Stability

    Authors: Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, Michal Badura, Daniel Suo, David Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer

    Abstract: Very little is known about the training dynamics of adaptive gradient methods like Adam in deep learning. In this paper, we shed light on the behavior of these algorithms in the full-batch and sufficiently large batch settings. Specifically, we empirically demonstrate that during full-batch training, the maximum eigenvalue of the preconditioned Hessian typically equilibrates at a certain numerical…

    Submitted 15 April, 2024; v1 submitted 29 July, 2022; originally announced July 2022.

    Comments: v2 corrects the formula for Adam's preconditioner in Eq 2

  2. Universal properties of the isotropic Laplace operator on homogeneous trees

    Authors: Joel M. Cohen, Mauro Pagliacci, Massimo A Picardello

    Abstract: Let $P$ be the isotropic nearest neighbor transition operator on a homogeneous tree. We consider the $λ$-eigenfunctions of $P$ for $λ$ outside its $\ell^2$ spectrum, i.e., the eigenfunctions with eigenvalue $γ=λ-1$ of the Laplace operator $\Delta=P-\mathbb I$, and also the $λ$-polyharmonic functions, that is, the union of the kernels of $(\Delta-γ\mathbb I)^n$ for $n\geqslant 0$. We prove that, on…

    Submitted 24 March, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: The last-named author acknowledges support by MIUR Excellence Departments Project awarded to the Department of Mathematics, University of Rome Tor Vergata, CUP E83C18000100006, and by Istituto Nazionale di Alta Matematica, Gruppo GNAFA. Adv. Math. (2022)

    MSC Class: Primary: 05C05; Secondary: 31A30; 31C20; 47A16; 60J45

  3. arXiv:2103.00065  [pdf, other]

    cs.LG stat.ML

    Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability

    Authors: Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar

    Abstract: We empirically demonstrate that full-batch gradient descent on neural network training objectives typically operates in a regime we call the Edge of Stability. In this regime, the maximum eigenvalue of the training loss Hessian hovers just above the numerical value $2 / \text{(step size)}$, and the training loss behaves non-monotonically over short timescales, yet consistently decreases over long…

    Submitted 23 November, 2022; v1 submitted 26 February, 2021; originally announced March 2021.

    Comments: ICLR 2021. v3 moves several figures from the appendix into the main text, and adds more discussion regarding Jastrzębski et al (2020): https://doi.org/10.48550/arXiv.2002.09572

  4. arXiv:1909.09577  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    NeMo: a toolkit for building AI applications using Neural Modules

    Authors: Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen

    Abstract: NeMo (Neural Modules) is a Python framework-agnostic toolkit for creating AI applications through re-usability, abstraction, and composition. NeMo is built around neural modules, conceptual blocks of neural networks that take typed inputs and produce typed outputs. Such modules typically represent data layers, encoders, decoders, language models, loss functions, or methods of combining activations…

    Submitted 13 September, 2019; originally announced September 2019.

    Comments: 6 pages plus references

  5. arXiv:1905.11286  [pdf, other]

    cs.LG stat.ML

    Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

    Authors: Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen

    Abstract: We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay. In our experiments on neural networks for image classification, speech recognition, machine translation, and language modeling, it performs on par or better than well tuned SGD with momentum and Adam or AdamW. Additionally, NovoGrad (1) is robust to the choice of l…

    Submitted 6 February, 2020; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Preprint, under review

  6. arXiv:1904.03288  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Jasper: An End-to-End Convolutional Neural Acoustic Model

    Authors: Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde

    Abstract: In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data. Our model, Jasper, uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections. To improve training, we further introduce a new layer-wise optimizer called NovoGrad. Through experiments, we demonstrate that the proposed deep arc…

    Submitted 26 August, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

    Comments: Accepted to INTERSPEECH 2019

  7. arXiv:1902.02918  [pdf, other]

    cs.LG stat.ML

    Certified Adversarial Robustness via Randomized Smoothing

    Authors: Jeremy M Cohen, Elan Rosenfeld, J. Zico Kolter

    Abstract: We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the $\ell_2$ norm. This "randomized smoothing" technique has been proposed recently in the literature, but existing guarantees are loose. We prove a tight robustness guarantee in $\ell_2$ norm for smoothing with Gaussian noise. We use rand…

    Submitted 15 June, 2019; v1 submitted 7 February, 2019; originally announced February 2019.

    Comments: ICML 2019

  8. Search for extended sources in the Galactic Plane using 6 years of Fermi-Large Area Telescope Pass 8 data above 10 GeV

    Authors: The Fermi LAT Collaboration, M. Ackermann, M. Ajello, L. Baldini, J. Ballet, G. Barbiellini, D. Bastieri, R. Bellazzini, E. Bissaldi, E. D. Bloom, R. Bonino, E. Bottacini, T. J. Brandt, J. Bregeon, P. Bruel, R. Buehler, R. A. Cameron, M. Caragiulo, P. A. Caraveo, D. Castro, E. Cavazzuti, C. Cecchi, E. Charles, A. Chekhtman, C. C. Cheung, et al. (95 additional authors not shown)

    Abstract: The spatial extension of a gamma-ray source is an essential ingredient to determine its spectral properties as well as its potential multi-wavelength counterpart. The capability to spatially resolve gamma-ray sources is greatly improved by the newly delivered Fermi-Large Area Telescope (LAT) Pass 8 event-level analysis which provides a greater acceptance and an improved point spread function, two…

    Submitted 11 April, 2018; v1 submitted 1 February, 2017; originally announced February 2017.

    Comments: 33 pages, 22 figures & 3 tables. Published by The Astrophysical Journal. Available on the Fermi Science Support Center (FSSC) together with the 3FHL catalog

  9. arXiv:1511.06778  [pdf, other]

    astro-ph.HE astro-ph.IM

    The 1st Fermi Lat Supernova Remnant Catalog

    Authors: Fabio Acero, Markus Ackermann, Marco Ajello, Luca Baldini, Jean Ballet, Guido Barbiellini, Denis Bastieri, Ronaldo Bellazzini, E. Bissaldi, Roger Blandford, E. D. Bloom, Raffaella Bonino, Eugenio Bottacini, J. Bregeon, Philippe Bruel, Rolf Buehler, S. Buson, G. A. Caliandro, Rob A. Cameron, R Caputo, Micaela Caragiulo, Patrizia A. Caraveo, Jean Marc Casandjian, Elisabetta Cavazzuti, Claudia Cecchi, et al. (134 additional authors not shown)

    Abstract: To uniformly determine the properties of supernova remnants (SNRs) at high energies, we have developed the first systematic survey at energies from 1 to 100 GeV using data from the Fermi Large Area Telescope. Based on the spatial overlap of sources detected at GeV energies with SNRs known from radio surveys, we classify 30 sources as likely GeV SNRs. We also report 14 marginal associations and 245…

    Submitted 20 November, 2015; originally announced November 2015.

    Comments: Resubmitted to ApJS

    Journal ref: ApJS 224 8 (2016)