Showing 1–7 of 7 results for author: Morwani, D

Searching in archive cs.
  1. arXiv:2406.17748  [pdf, other]

    cs.LG math.OC stat.ML

    A New Perspective on Shampoo's Preconditioner

    Authors: Depen Morwani, Itai Shapira, Nikhil Vyas, Eran Malach, Sham Kakade, Lucas Janson

    Abstract: Shampoo, a second-order optimization algorithm which uses a Kronecker product preconditioner, has recently garnered increasing attention from the machine learning community. The preconditioner used by Shampoo can be viewed either as an approximation of the Gauss–Newton component of the Hessian or the covariance matrix of the gradients maintained by Adagrad. We provide an explicit and novel connec…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2311.07568  [pdf, other]

    cs.LG

    Feature emergence via margin maximization: case studies in algebraic tasks

    Authors: Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham Kakade

    Abstract: Understanding the internal representations learned by neural networks is a cornerstone challenge in the science of machine learning. While there have been significant recent strides in some cases towards understanding how neural networks implement specific target functions, this paper explores a complementary question: why do networks arrive at particular computational strategies? Our inquiry fo…

    Submitted 19 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: Accepted as Spotlight at ICLR 2024

    ACM Class: I.5.1; I.2.6

  3. arXiv:2306.08590  [pdf, other]

    cs.LG stat.ML

    Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

    Authors: Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak

    Abstract: The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by finite batch sizes ("SGD noise"). While prior works focused on offline learning (i.e., multiple-epoch training), we study the impact of SGD noise on online (i.e., single epoch) learning. Through an extensive empirical analysis of image and language data, we demonstrate that small batch sizes do not…

    Submitted 7 June, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

  4. arXiv:2305.18411  [pdf, other]

    cs.LG

    Feature-Learning Networks Are Consistent Across Widths At Realistic Scales

    Authors: Nikhil Vyas, Alexander Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, Cengiz Pehlevan

    Abstract: We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets. Early in training, wide neural networks trained on online data have not only identical loss curves but also agree in their point-wise test predictions throughout training. For simple tasks such as CIFAR-5m this holds throughout training for networks of realistic widths.…

    Submitted 5 December, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

    Comments: 24 pages, 19 figures. NeurIPS 2023. Revised based on reviewer feedback

  5. arXiv:2302.00457  [pdf, other]

    cs.LG cs.AI stat.ML

    Simplicity Bias in 1-Hidden Layer Neural Networks

    Authors: Depen Morwani, Jatin Batra, Prateek Jain, Praneeth Netrapalli

    Abstract: Recent works have demonstrated that neural networks exhibit extreme simplicity bias (SB). That is, they learn only the simplest features to solve a task at hand, even in the presence of other, more robust but more complex features. Due to the lack of a general and rigorous definition of features, these works showcase SB on semi-synthetic datasets such as Color-MNIST, MNIST-CIFAR where defining feat…

    Submitted 1 February, 2023; originally announced February 2023.

    ACM Class: I.5.1; I.2.6

  6. arXiv:2012.08854  [pdf, ps, other]

    cs.LG stat.ML

    Using noise resilience for ranking generalization of deep neural networks

    Authors: Depen Morwani, Rahul Vashisht, Harish G. Ramaswamy

    Abstract: Recent papers have shown that sufficiently overparameterized neural networks can perfectly fit even random labels. Thus, it is crucial to understand the underlying reason behind the generalization performance of a network on real-world data. In this work, we propose several measures to predict the generalization error of a network given the training data and its parameters. Using one of these meas…

    Submitted 16 December, 2020; originally announced December 2020.

    ACM Class: I.5.1

  7. arXiv:2010.12909  [pdf, other]

    cs.LG stat.ML

    Inductive Bias of Gradient Descent for Weight Normalized Smooth Homogeneous Neural Nets

    Authors: Depen Morwani, Harish G. Ramaswamy

    Abstract: We analyze the inductive bias of gradient descent for weight normalized smooth homogeneous neural nets, when trained on exponential or cross-entropy loss. We analyze both standard weight normalization (SWN) and exponential weight normalization (EWN), and show that the gradient flow path with EWN is equivalent to gradient flow on standard networks with an adaptive learning rate. We extend these res…

    Submitted 31 January, 2023; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: Accepted to ALT 2022

    ACM Class: I.5.1; I.2.6