Showing 1–41 of 41 results for author: Ganguli, S

Searching in archive stat.
  1. arXiv:2406.06158  [pdf, other]

    cs.LG cs.AI stat.ML

    Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

    Authors: Daniel Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew Saxe, Surya Ganguli

    Abstract: While the impressive performance of modern neural networks is often attributed to their capacity to efficiently extract task-relevant features from data, the mechanisms underlying this rich feature learning regime remain elusive, with much of our theoretical understanding stemming from the opposing lazy regime. In this work, we derive exact solutions to a minimal model that transitions between laz…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 40 pages, 12 figures

  2. arXiv:2401.13880  [pdf, other]

    stat.AP

    Principal Component Regression to Study the Impact of Economic Factors on Disadvantaged Communities

    Authors: Narmadha M. Mohankumar, Milan Jain, Heng Wan, Sumitrra Ganguli, Kyle D. Wilson, David M. Anderson

    Abstract: The Council on Environmental Quality's Climate and Economic Justice Screening Tool defines "disadvantaged communities" (DAC) in the USA, highlighting census tracts where benefits of climate and energy investments are not accruing. We use a principal component generalized linear model, which addresses the intertwined nature of economic factors, income and employment and model their relationship to…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 13 pages, 9 figures, 2 tables

  3. arXiv:2306.04251  [pdf, other]

    cs.LG cs.AI stat.ML

    Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

    Authors: Feng Chen, Daniel Kunin, Atsushi Yamamura, Surya Ganguli

    Abstract: In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives overly expressive networks to much simpler subnetworks, thereby dramatically reducing the number of independent parameters, and improving generalization. To reveal this bias, we identify invariant sets, or subsets of parameter space that remain unmodified by SGD. We focus on two classes of invariant set…

    Submitted 28 May, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: 37 pages, 12 figures, NeurIPS 2023

  4. arXiv:2210.03820  [pdf, other]

    cs.LG stat.ML

    The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

    Authors: Daniel Kunin, Atsushi Yamamura, Chao Ma, Surya Ganguli

    Abstract: In this work, we explore the maximum-margin bias of quasi-homogeneous neural networks trained with gradient flow on an exponential loss and past a point of separability. We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations, even those with biases, residual connections, and normalization layers, while stru…

    Submitted 16 February, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: 41 pages, 5 figures, ICLR 2023

  5. arXiv:2210.03044  [pdf, other]

    cs.LG cs.AI stat.ML

    Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?

    Authors: Mansheej Paul, Feng Chen, Brett W. Larsen, Jonathan Frankle, Surya Ganguli, Gintare Karolina Dziugaite

    Abstract: Modern deep learning involves training costly, highly overparameterized networks, thus motivating the search for sparser networks that can still be trained to the same accuracy as the full network (i.e. matching). Iterative magnitude pruning (IMP) is a state of the art algorithm that can find such highly sparse matching subnetworks, known as winning tickets. IMP operates by iterative cycles of tra…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: The first three authors contributed equally

  6. arXiv:2206.14486  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Beyond neural scaling laws: beating power law scaling via data pruning

    Authors: Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos

    Abstract: Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how in theory we can break beyond power law scal…

    Submitted 21 April, 2023; v1 submitted 29 June, 2022; originally announced June 2022.

    Comments: Outstanding Paper Award @ NeurIPS 2022. Added github link to metric scores

  7. arXiv:2206.01278  [pdf, other]

    cs.LG cs.AI stat.ML

    Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks

    Authors: Mansheej Paul, Brett W. Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite

    Abstract: A striking observation about iterative magnitude pruning (IMP; Frankle et al. 2020) is that, after just a few hundred steps of dense training, the method can find a sparse sub-network that can be trained to the same accuracy as the dense network. However, the same does not hold at step 0, i.e. random initialization. In this work, we seek to understand how this ear…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: The first two authors contributed equally

  8. arXiv:2107.09133  [pdf, other]

    cs.LG cond-mat.stat-mech q-bio.NC stat.ML

    The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion

    Authors: Daniel Kunin, Javier Sagastuy-Brena, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins

    Abstract: In this work we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD). As observed previously, long after performance has converged, networks continue to move through parameter space by a process of anomalous diffusion in which distance travelled grows as a power law in the number of gradient updates with a nontrivial exponent. We reveal an intricate…

    Submitted 28 December, 2023; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: 78 pages, 9 figures, Neural Computation 2024

    Journal ref: Neural Computation (2024) 36 (1) 151-174

  9. arXiv:2107.05802  [pdf, other]

    cs.LG stat.ML

    How many degrees of freedom do we need to train deep networks: a loss landscape perspective

    Authors: Brett W. Larsen, Stanislav Fort, Nic Becker, Surya Ganguli

    Abstract: A variety of recent works, spanning pruning, lottery tickets, and training within random subspaces, have shown that deep neural networks can be trained using far fewer degrees of freedom than the total number of parameters. We analyze this phenomenon for random subspaces by first examining the success probability of hitting a training loss sub-level set when training within a random subspace of a…

    Submitted 3 February, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: ICLR 2022

  10. arXiv:2012.04728  [pdf, other]

    cs.LG cond-mat.dis-nn cond-mat.stat-mech q-bio.NC stat.ML

    Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

    Authors: Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka

    Abstract: Understanding the dynamics of neural network parameters during training is one of the key challenges in building a theoretical foundation for deep learning. A central obstacle is that the motion of a network in high-dimensional parameter space undergoes discrete finite steps along complex stochastic gradients derived from real-world datasets. We circumvent this obstacle through a unifying theoreti…

    Submitted 29 March, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: 30 pages, 17 figures, ICLR 2021

  11. arXiv:2010.15110  [pdf, other]

    cs.LG stat.ML

    Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel

    Authors: Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli

    Abstract: In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight expansion of the network at initialization. Standard training, however, diverges from its linearization in ways that are poorly understood. We study the relationship between the training dynamics…

    Submitted 28 October, 2020; originally announced October 2020.

    Comments: 19 pages, 19 figures, In Advances in Neural Information Processing Systems 34 (NeurIPS 2020)

  12. arXiv:2010.11765  [pdf, other]

    q-bio.NC cs.LG stat.ML

    Identifying Learning Rules From Neural Network Observables

    Authors: Aran Nayebi, Sanjana Srivastava, Surya Ganguli, Daniel L. K. Yamins

    Abstract: The brain modifies its synaptic strengths during learning in order to better adapt to its environment. However, the underlying plasticity rules that govern learning are unknown. Many proposals have been suggested, including Hebbian mechanisms, explicit error backpropagation, and a variety of alternatives. It is an open question as to what specific experimental measurements would need to be made to…

    Submitted 8 December, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020 Camera Ready Version, 21 pages including supplementary information, 13 figures

  13. arXiv:2010.00578  [pdf, other]

    cs.LG cs.AI stat.ML

    Understanding Self-supervised Learning with Dual Deep Networks

    Authors: Yuandong Tian, Lantao Yu, Xinlei Chen, Surya Ganguli

    Abstract: We propose a novel theoretical framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks (e.g., SimCLR). First, we prove that in each SGD update of SimCLR with various loss functions, including simple contrastive loss, soft Triplet loss and InfoNCE loss, the weights at each layer are updated by a covariance operator that specific…

    Submitted 14 February, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

  14. arXiv:2006.14178  [pdf, ps, other]

    q-bio.NC cond-mat.dis-nn stat.ML

    Predictive coding in balanced neural networks with noise, chaos and delays

    Authors: Jonathan Kadmon, Jonathan Timcheck, Surya Ganguli

    Abstract: Biological neural networks face a formidable task: performing reliable computations in the face of intrinsic stochasticity in individual neurons, imprecisely specified synaptic connectivity, and nonnegligible delays in synaptic transmission. A common approach to combatting such biological heterogeneity involves averaging over large redundant networks of $N$ neurons resulting in coding errors that…

    Submitted 25 June, 2020; originally announced June 2020.

  15. arXiv:2006.05467  [pdf, other]

    cs.LG cond-mat.dis-nn cs.CV q-bio.NC stat.ML

    Pruning neural networks without any data by iteratively conserving synaptic flow

    Authors: Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli

    Abstract: Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we ide…

    Submitted 18 November, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020, 18 pages, 10 figures

    Journal ref: Advances in Neural Information Processing Systems 2020

  16. arXiv:2003.01513  [pdf, other]

    q-bio.NC cs.LG cs.NE stat.ML

    Two Routes to Scalable Credit Assignment without Weight Symmetry

    Authors: Daniel Kunin, Aran Nayebi, Javier Sagastuy-Brena, Surya Ganguli, Jonathan M. Bloom, Daniel L. K. Yamins

    Abstract: The neural plausibility of backpropagation has long been disputed, primarily for its use of non-local weight transport, the biologically dubious requirement that one neuron instantaneously measure the synaptic weights of another. Until recently, attempts to create local learning rules that avoid weight transport have typically failed in the large-scale learning scenarios where backpropagation s…

    Submitted 24 June, 2020; v1 submitted 28 February, 2020; originally announced March 2020.

    Comments: ICML 2020 Camera Ready Version, 19 pages including supplementary information, 10 figures

  17. arXiv:1910.05929  [pdf, other]

    cs.LG cs.NE stat.ML

    Emergent properties of the local geometry of neural loss landscapes

    Authors: Stanislav Fort, Surya Ganguli

    Abstract: The local geometry of high dimensional neural network loss landscapes can both challenge our cherished theoretical intuitions as well as dramatically impact the practical success of neural network training. Indeed recent works have observed 4 striking local properties of neural loss landscapes on classification tasks: (1) the landscape exhibits exactly $C$ directions of high positive curvature, wh…

    Submitted 14 October, 2019; originally announced October 2019.

    Comments: 10 pages, 8 figures

  18. arXiv:1907.00139  [pdf, other]

    cs.LG stat.ML

    Fast Convolutive Nonnegative Matrix Factorization Through Coordinate and Block Coordinate Updates

    Authors: Anthony Degleris, Ben Antin, Surya Ganguli, Alex H Williams

    Abstract: Identifying recurring patterns in high-dimensional time series data is an important problem in many scientific domains. A popular model to achieve this is convolutive nonnegative matrix factorization (CNMF), which extends classic nonnegative matrix factorization (NMF) to extract short-lived temporal motifs from a long time series. Prior work has typically fit this model by multiplicative parameter…

    Submitted 28 June, 2019; originally announced July 2019.

    Comments: 10 pages, 5 figures

  19. arXiv:1906.10720  [pdf, other]

    cs.LG stat.ML

    Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics

    Authors: Niru Maheswaranathan, Alex Williams, Matthew D. Golub, Surya Ganguli, David Sussillo

    Abstract: Recurrent neural networks (RNNs) are a widely used tool for modeling sequential data, yet they are often treated as inscrutable black boxes. Given a trained recurrent network, we would like to reverse engineer it--to obtain a quantitative, interpretable description of how it solves a particular task. Even for simple tasks, a detailed understanding of how recurrent networks work, or a prescription…

    Submitted 4 December, 2019; v1 submitted 25 June, 2019; originally announced June 2019.

    Comments: Presented at NeurIPS 2019

  20. arXiv:1810.10531  [pdf, other]

    cs.LG cs.AI q-bio.NC stat.ML

    A mathematical theory of semantic development in deep neural networks

    Authors: Andrew M. Saxe, James L. McClelland, Surya Ganguli

    Abstract: An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual expe…

    Submitted 23 October, 2018; originally announced October 2018.

  21. arXiv:1810.10065  [pdf, ps, other]

    stat.ML cs.LG q-bio.NC

    Statistical mechanics of low-rank tensor decomposition

    Authors: Jonathan Kadmon, Surya Ganguli

    Abstract: Often, large, high dimensional datasets collected across multiple modalities can be organized as a higher order tensor. Low-rank tensor decomposition then arises as a powerful and widely used tool to discover simple low dimensional structures underlying such data. However, we currently lack a theoretical understanding of the algorithmic behavior of low-rank tensor decompositions. We derive Bayesia…

    Submitted 23 October, 2018; originally announced October 2018.

    Comments: 27 pages, 3 figures

  22. arXiv:1809.10374  [pdf, other]

    stat.ML cs.LG

    An analytic theory of generalization dynamics and transfer learning in deep linear networks

    Authors: Andrew K. Lampinen, Surya Ganguli

    Abstract: Much attention has been devoted recently to the generalization puzzle in deep learning: large, deep networks can generalize well, but existing theories bounding generalization error are exceedingly loose, and thus cannot explain this striking performance. Furthermore, a major hope is that knowledge may transfer across tasks, so that multi-task learning can improve generalization on individual task…

    Submitted 4 January, 2019; v1 submitted 27 September, 2018; originally announced September 2018.

    Comments: ICLR 2019, 20 pages

    ACM Class: I.2.6; F.m

  23. arXiv:1802.09979  [pdf, other]

    stat.ML cs.LG

    The Emergence of Spectral Universality in Deep Networks

    Authors: Jeffrey Pennington, Samuel S. Schoenholz, Surya Ganguli

    Abstract: Recent work has shown that tight concentration of the entire spectrum of singular values of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude. Therefore, to guide important design choices, it is important to build a full theoretical understanding of the spectra of Jacobians at initialization. To this end, we leverage powerful tools fro…

    Submitted 27 February, 2018; originally announced February 2018.

    Comments: 17 pages, 4 figures. Appearing at the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018

  24. arXiv:1711.04735  [pdf, other]

    cs.LG stat.ML

    Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

    Authors: Jeffrey Pennington, Samuel S. Schoenholz, Surya Ganguli

    Abstract: It is well known that the initialization of weights in deep neural networks can have a dramatic impact on learning speed. For example, ensuring the mean squared singular value of a network's input-output Jacobian is $O(1)$ is essential for avoiding the exponential vanishing or explosion of gradients. The stronger condition that all singular values of the Jacobian concentrate near $1$ is a property…

    Submitted 13 November, 2017; originally announced November 2017.

    Comments: 13 pages, 6 figures. Appearing at the 31st Conference on Neural Information Processing Systems (NIPS 2017)

  25. arXiv:1711.02282  [pdf, other]

    stat.ML cs.LG cs.NE

    Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net

    Authors: Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio

    Abstract: We propose a novel method to directly learn a stochastic transition operator whose repeated application provides generated samples. Traditional undirected graphical models approach this problem indirectly by learning a Markov chain model whose stationary distribution obeys detailed balance with respect to a parameterized energy function. The energy function is then modified so the model and data d…

    Submitted 6 November, 2017; originally announced November 2017.

    Comments: To appear at NIPS 2017

  26. arXiv:1705.11146  [pdf, other]

    q-bio.NC cs.LG cs.NE stat.ML

    SuperSpike: Supervised learning in multi-layer spiking neural networks

    Authors: Friedemann Zenke, Surya Ganguli

    Abstract: A vast majority of computation in the brain is performed by spiking neural networks. Despite the ubiquity of such spiking, we currently lack an understanding of how biological spiking neural circuits learn and compute in-vivo, as well as how we can instantiate such capabilities in artificial spiking circuits in-silico. Here we revisit the problem of supervised learning in temporally coding multi-l…

    Submitted 14 October, 2017; v1 submitted 31 May, 2017; originally announced May 2017.

  27. arXiv:1703.09202  [pdf, other]

    stat.ML cs.LG q-bio.NC

    Biologically inspired protection of deep networks from adversarial attacks

    Authors: Aran Nayebi, Surya Ganguli

    Abstract: Inspired by biophysical principles underlying nonlinear dendritic computation in neural circuits, we develop a scheme to train deep neural networks to make them robust to adversarial attacks. Our scheme generates highly nonlinear, saturated neural networks that achieve state of the art performance on gradient based adversarial examples on MNIST, despite never being exposed to adversarially chosen…

    Submitted 27 March, 2017; originally announced March 2017.

    Comments: 11 pages

  28. arXiv:1703.04200  [pdf, other]

    cs.LG q-bio.NC stat.ML

    Continual Learning Through Synaptic Intelligence

    Authors: Friedemann Zenke, Ben Poole, Surya Ganguli

    Abstract: While deep learning has led to remarkable advances across diverse applications, it struggles in domains where the data distribution changes over the course of learning. In stark contrast, biological neural networks continually adapt to changing domains, possibly by leveraging complex molecular machinery to solve many tasks simultaneously. In this study, we introduce intelligent synapses that bring…

    Submitted 12 June, 2017; v1 submitted 12 March, 2017; originally announced March 2017.

    Comments: ICML 2017

  29. arXiv:1702.01825  [pdf, other]

    q-bio.NC stat.ML

    Deep Learning Models of the Retinal Response to Natural Scenes

    Authors: Lane T. McIntosh, Niru Maheswaranathan, Aran Nayebi, Surya Ganguli, Stephen A. Baccus

    Abstract: A central challenge in neuroscience is to understand neural computations and circuit mechanisms that underlie the encoding of ethologically relevant, natural stimuli. In multilayered neural circuits, nonlinear processes such as synaptic transmission and spiking dynamics present a significant obstacle to the creation of accurate computational models of responses to natural stimuli. Here we demonstr…

    Submitted 6 February, 2017; originally announced February 2017.

    Comments: L.T.M. and N.M. contributed equally to this work. Presented at NIPS 2016

    Journal ref: Advances in Neural Information Processing Systems 29 (2016) 1361-1369

  30. arXiv:1611.08083  [pdf, other]

    stat.ML cs.LG cs.NE

    Survey of Expressivity in Deep Neural Networks

    Authors: Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein

    Abstract: We survey results on neural network expressivity described in "On the Expressive Power of Deep Neural Networks". The paper motivates and develops three natural measures of expressiveness, which all display an exponential dependence on the depth of the network. In fact, all of these measures are related to a fourth quantity, trajectory length. This quantity grows exponentially in the depth of the n…

    Submitted 24 November, 2016; originally announced November 2016.

    Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

  31. arXiv:1611.01232  [pdf, other]

    stat.ML cs.LG

    Deep Information Propagation

    Authors: Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, Jascha Sohl-Dickstein

    Abstract: We study the behavior of untrained neural networks whose weights and biases are randomly distributed using mean field theory. We show the existence of depth scales that naturally limit the maximum depth of signal propagation through these random networks. Our main practical result is to show that random networks may be trained precisely when information can travel through them. Thus, the depth sca…

    Submitted 4 April, 2017; v1 submitted 3 November, 2016; originally announced November 2016.

  32. arXiv:1609.07060  [pdf, other]

    stat.ML cond-mat.dis-nn math.ST q-bio.NC

    An equivalence between high dimensional Bayes optimal inference and M-estimation

    Authors: Madhu Advani, Surya Ganguli

    Abstract: When recovering an unknown signal from noisy measurements, the computational difficulty of performing optimal Bayesian MMSE (minimum mean squared error) inference often necessitates the use of maximum a posteriori (MAP) inference, a special case of regularized M-estimation, as a surrogate. However, MAP is suboptimal in high dimensions, when the number of unknown signal components is similar to the…

    Submitted 22 September, 2016; originally announced September 2016.

    Comments: To appear in NIPS 2016

  33. arXiv:1607.04331  [pdf, other]

    stat.ML cs.LG q-bio.NC

    Random projections of random manifolds

    Authors: Subhaneil Lahiri, Peiran Gao, Surya Ganguli

    Abstract: Interesting data often concentrate on low dimensional smooth manifolds inside a high dimensional ambient space. Random projections are a simple, powerful tool for dimensionality reduction of such data. Previous works have studied bounds on how many projections are needed to accurately preserve the geometry of these manifolds, given their intrinsic dimensionality, volume and curvature. However, suc…

    Submitted 9 September, 2016; v1 submitted 14 July, 2016; originally announced July 2016.

    Comments: 45 pages, 9 figures

  34. arXiv:1606.05340  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Exponential expressivity in deep neural networks through transient chaos

    Authors: Ben Poole, Subhaneil Lahiri, Maithra Raghu, Jascha Sohl-Dickstein, Surya Ganguli

    Abstract: We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of signal propagation in generic, deep neural networks with random weights. Our results reveal an order-to-chaos expressivity phase transition, with networks in the chaotic phase computing nonlinear functions whose global curvature grows exponentially with depth but not width. We prove this gene…

    Submitted 17 June, 2016; v1 submitted 16 June, 2016; originally announced June 2016.

    Comments: Fixed equation references

  35. arXiv:1606.05336  [pdf, other]

    stat.ML cs.AI cs.LG

    On the Expressive Power of Deep Neural Networks

    Authors: Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein

    Abstract: We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute. Our approach is based on an interrelated set of measures of expressivity, unified by the novel notion of trajectory length, which measures how the output of a network changes as the input sweeps along a…

    Submitted 18 June, 2017; v1 submitted 16 June, 2016; originally announced June 2016.

    Comments: Accepted to ICML 2017

  36. arXiv:1603.07758  [pdf, other]

    cond-mat.stat-mech cs.IT physics.bio-ph q-bio.NC stat.ML

    A universal tradeoff between power, precision and speed in physical communication

    Authors: Subhaneil Lahiri, Jascha Sohl-Dickstein, Surya Ganguli

    Abstract: Maximizing the speed and precision of communication while minimizing power dissipation is a fundamental engineering design goal. Also, biological systems achieve remarkable speed, precision and power efficiency using poorly understood physical design principles. Powerful theories like information theory and thermodynamics do not provide general limits on power, precision and speed. Here we go beyo…

    Submitted 24 March, 2016; originally announced March 2016.

    Comments: 15 pages, 3 figures

  37. arXiv:1601.04650  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech math.ST q-bio.QM

    Statistical Mechanics of High-Dimensional Inference

    Authors: Madhu Advani, Surya Ganguli

    Abstract: To model modern large-scale datasets, we need efficient algorithms to infer a set of $P$ unknown model parameters from $N$ noisy measurements. What are fundamental limits on the accuracy of parameter inference, given finite signal-to-noise ratios, limited measurements, prior information, and computational tractability requirements? How can we combine prior information with measurements to achieve…

    Submitted 21 February, 2016; v1 submitted 18 January, 2016; originally announced January 2016.

    Comments: See http://ganguli-gang.stanford.edu/pdf/HighDimInf.Supp.pdf for supplementary material

    Journal ref: Phys. Rev. X 6, 031034 (2016)

  38. arXiv:1503.03585  [pdf, other]

    cs.LG cond-mat.dis-nn q-bio.NC stat.ML

    Deep Unsupervised Learning using Nonequilibrium Thermodynamics

    Authors: Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli

    Abstract: A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physi…

    Submitted 18 November, 2015; v1 submitted 12 March, 2015; originally announced March 2015.

  39. arXiv:1406.2572  [pdf, other]

    cs.LG math.OC stat.ML

    Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

    Authors: Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

    Abstract: A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima wit…

    Submitted 10 June, 2014; originally announced June 2014.

    Comments: The theoretical review and analysis in this article draw heavily from arXiv:1405.4604 [cs.LG]

  40. arXiv:1312.6120  [pdf, other]

    cs.NE cond-mat.dis-nn cs.CV cs.LG q-bio.NC stat.ML

    Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

    Authors: Andrew M. Saxe, James L. McClelland, Surya Ganguli

    Abstract: Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output map,…

    Submitted 19 February, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: Submission to ICLR2014. Revised based on reviewer feedback

  41. arXiv:1301.7115  [pdf, ps, other]

    q-bio.NC cond-mat.dis-nn stat.ML

    Statistical mechanics of complex neural systems and high dimensional data

    Authors: Madhu Advani, Subhaneil Lahiri, Surya Ganguli

    Abstract: Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? And second, h…

    Submitted 29 January, 2013; originally announced January 2013.

    Comments: 72 pages, 8 figures, iopart.cls, to appear in JSTAT