Showing 1–9 of 9 results for author: Kunin, D

  1. arXiv:2406.06158

    cs.LG cs.AI stat.ML

    Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

    Authors: Daniel Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew Saxe, Surya Ganguli

    Abstract: While the impressive performance of modern neural networks is often attributed to their capacity to efficiently extract task-relevant features from data, the mechanisms underlying this rich feature learning regime remain elusive, with much of our theoretical understanding stemming from the opposing lazy regime. In this work, we derive exact solutions to a minimal model that transitions between laz…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 40 pages, 12 figures

  2. arXiv:2306.04251

    cs.LG cs.AI stat.ML

    Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

    Authors: Feng Chen, Daniel Kunin, Atsushi Yamamura, Surya Ganguli

    Abstract: In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives overly expressive networks to much simpler subnetworks, thereby dramatically reducing the number of independent parameters, and improving generalization. To reveal this bias, we identify invariant sets, or subsets of parameter space that remain unmodified by SGD. We focus on two classes of invariant set…

    Submitted 28 May, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: 37 pages, 12 figures, NeurIPS 2023

  3. arXiv:2210.03820

    cs.LG stat.ML

    The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

    Authors: Daniel Kunin, Atsushi Yamamura, Chao Ma, Surya Ganguli

    Abstract: In this work, we explore the maximum-margin bias of quasi-homogeneous neural networks trained with gradient flow on an exponential loss and past a point of separability. We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations, even those with biases, residual connections, and normalization layers, while stru…

    Submitted 16 February, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: 41 pages, 5 figures, ICLR 2023

  4. arXiv:2107.09133

    cs.LG cond-mat.stat-mech q-bio.NC stat.ML

    The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion

    Authors: Daniel Kunin, Javier Sagastuy-Brena, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins

    Abstract: In this work we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD). As observed previously, long after performance has converged, networks continue to move through parameter space by a process of anomalous diffusion in which distance travelled grows as a power law in the number of gradient updates with a nontrivial exponent. We reveal an intricate…

    Submitted 28 December, 2023; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: 78 pages, 9 figures, Neural Computation 2024

    Journal ref: Neural Computation (2024) 36 (1) 151-174

  5. arXiv:2105.02716

    cs.LG cond-mat.dis-nn cond-mat.stat-mech q-bio.NC stat.ML

    Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks

    Authors: Hidenori Tanaka, Daniel Kunin

    Abstract: In nature, symmetry governs regularities, while symmetry breaking brings texture. In artificial neural networks, symmetry has been a central design principle to efficiently capture regularities in the world, but the role of symmetry breaking is not well understood. Here, we develop a theoretical framework to study the "geometry of learning dynamics" in neural networks, and reveal a key mechanism o…

    Submitted 2 November, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

    Journal ref: NeurIPS (Advances in Neural Information Processing Systems), 2021

  6. arXiv:2012.04728

    cs.LG cond-mat.dis-nn cond-mat.stat-mech q-bio.NC stat.ML

    Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

    Authors: Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka

    Abstract: Understanding the dynamics of neural network parameters during training is one of the key challenges in building a theoretical foundation for deep learning. A central obstacle is that the motion of a network in high-dimensional parameter space undergoes discrete finite steps along complex stochastic gradients derived from real-world datasets. We circumvent this obstacle through a unifying theoreti…

    Submitted 29 March, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: 30 pages, 17 figures, ICLR 2021

  7. arXiv:2006.05467

    cs.LG cond-mat.dis-nn cs.CV q-bio.NC stat.ML

    Pruning neural networks without any data by iteratively conserving synaptic flow

    Authors: Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli

    Abstract: Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we ide…

    Submitted 18 November, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020, 18 pages, 10 figures

    Journal ref: Advances in Neural Information Processing Systems 2020

  8. arXiv:2003.01513

    q-bio.NC cs.LG cs.NE stat.ML

    Two Routes to Scalable Credit Assignment without Weight Symmetry

    Authors: Daniel Kunin, Aran Nayebi, Javier Sagastuy-Brena, Surya Ganguli, Jonathan M. Bloom, Daniel L. K. Yamins

    Abstract: The neural plausibility of backpropagation has long been disputed, primarily for its use of non-local weight transport $-$ the biologically dubious requirement that one neuron instantaneously measure the synaptic weights of another. Until recently, attempts to create local learning rules that avoid weight transport have typically failed in the large-scale learning scenarios where backpropagation s…

    Submitted 24 June, 2020; v1 submitted 28 February, 2020; originally announced March 2020.

    Comments: ICML 2020 Camera Ready Version, 19 pages including supplementary information, 10 figures

  9. arXiv:1901.08168

    cs.LG stat.ML

    Loss Landscapes of Regularized Linear Autoencoders

    Authors: Daniel Kunin, Jonathan M. Bloom, Aleksandrina Goeva, Cotton Seed

    Abstract: Autoencoders are a deep learning model for representation learning. When trained to minimize the distance between the data and its reconstruction, linear autoencoders (LAEs) learn the subspace spanned by the top principal directions but cannot learn the principal directions themselves. In this paper, we prove that $L_2$-regularized LAEs are symmetric at all critical points and learn the principal…

    Submitted 14 May, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

    Comments: 12 pages, 8 figures. ICML 2019