Showing 1–26 of 26 results for author: Schoenholz, S S

Searching in archive cs.
  1. arXiv:2210.05546

    cs.LG cs.CV

    What does a deep neural network confidently perceive? The effective dimension of high certainty class manifolds and their low confidence boundaries

    Authors: Stanislav Fort, Ekin Dogus Cubuk, Surya Ganguli, Samuel S. Schoenholz

    Abstract: Deep neural network classifiers partition input space into high confidence regions for each class. The geometry of these class manifolds (CMs) is widely studied and intimately related to model performance; for example, the margin depends on CM boundaries. We exploit the notions of Gaussian width and Gordon's escape theorem to tractably estimate the effective dimension of CMs and their boundaries t…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: An extended version of "Slice, Dice, and Optimize: Measuring the Dimension of Neural Network Class Manifolds"

  2. arXiv:2207.09432

    cs.LG

    Deep equilibrium networks are sensitive to initialization statistics

    Authors: Atish Agarwala, Samuel S. Schoenholz

    Abstract: Deep equilibrium networks (DEQs) are a promising way to construct models which trade off memory for compute. However, theoretical understanding of these models is still lacking compared to traditional networks, in part because of the repeated application of a single set of weights. We show that DEQs are sensitive to the higher order statistics of the matrix families from which they are initialized…

    Submitted 19 July, 2022; originally announced July 2022.

  3. arXiv:2206.08720

    cs.LG cs.AI stat.ML

    Fast Finite Width Neural Tangent Kernel

    Authors: Roman Novak, Jascha Sohl-Dickstein, Samuel S. Schoenholz

    Abstract: The Neural Tangent Kernel (NTK), defined as $Θ_θ^f(x_1, x_2) = \left[\partial f(θ, x_1)\big/\partial θ\right] \left[\partial f(θ, x_2)\big/\partial θ\right]^T$ where $\left[\partial f(θ, \cdot)\big/\partial θ\right]$ is a neural network (NN) Jacobian, has emerged as a central object of study in deep learning. In the infinite width limit, the NTK can sometimes be computed analytically and is useful…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Published as a conference paper at ICML 2022
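
    Note: the NTK definition quoted in this abstract can be evaluated directly for a small finite-width network by forming the parameter Jacobian and contracting it with itself. The JAX sketch below is only an illustration of that definition, not the method proposed in the paper: the toy network `f`, its layer sizes, and the helper `empirical_ntk` are hypothetical names introduced here, and the network is assumed to have a single scalar output per input so that the kernel is an ordinary matrix.

```python
import jax
import jax.numpy as jnp


def f(params, x):
    """Hypothetical toy network: one tanh hidden layer, one scalar output per input."""
    w1, w2 = params
    return jnp.tanh(x @ w1) @ w2  # shape: (batch,)


def empirical_ntk(f, params, x1, x2):
    """Theta(x1, x2) = J(x1) J(x2)^T, with J the Jacobian of f w.r.t. params."""

    def flat_jacobian(x):
        # jax.jacobian differentiates w.r.t. the first argument (params) and
        # returns one array per parameter, each of shape (batch, *param.shape).
        jac = jax.jacobian(f)(params, x)
        leaves = jax.tree_util.tree_leaves(jac)
        # Flatten all parameter axes so each row is one input's full gradient.
        return jnp.concatenate(
            [leaf.reshape(x.shape[0], -1) for leaf in leaves], axis=1
        )

    return flat_jacobian(x1) @ flat_jacobian(x2).T  # shape: (n1, n2)


# Assumed sizes for illustration: 3 input features, 16 hidden units, 4 inputs.
key_w1, key_w2, key_x = jax.random.split(jax.random.PRNGKey(0), 3)
params = (
    jax.random.normal(key_w1, (3, 16)) / jnp.sqrt(3.0),
    jax.random.normal(key_w2, (16,)) / jnp.sqrt(16.0),
)
x = jax.random.normal(key_x, (4, 3))
theta = empirical_ntk(f, params, x, x)  # 4 x 4 kernel matrix
```

    Materializing the full Jacobian costs memory proportional to the number of inputs times the number of parameters, which is why this direct contraction is practical only for small networks.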

  4. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  5. arXiv:2111.05803

    cs.LG stat.ML

    Gradients are Not All You Need

    Authors: Luke Metz, C. Daniel Freeman, Samuel S. Schoenholz, Tal Kachman

    Abstract: Differentiable programming techniques are widely used in the community and are responsible for the machine learning renaissance of the past several decades. While these methods are powerful, they have limits. In this short report, we discuss a common chaos based failure mode which appears in a variety of differentiable circumstances, ranging from recurrent neural networks and numerical physics sim…

    Submitted 20 January, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

  6. arXiv:2110.01765

    cs.LG cs.AI cs.NE

    Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping

    Authors: James Martens, Andy Ballard, Guillaume Desjardins, Grzegorz Swirszcz, Valentin Dalibard, Jascha Sohl-Dickstein, Samuel S. Schoenholz

    Abstract: Using an extended and formalized version of the Q/C map analysis of Poole et al. (2016), along with Neural Tangent Kernel theory, we identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data, and show how these can be avoided by carefully controlling the "shape" of the network's initialization-time kernel function. We then develop a…

    Submitted 4 October, 2021; originally announced October 2021.

  7. arXiv:2102.03793

    cs.LG cond-mat.soft stat.ML

    Tilting the playing field: Dynamical loss functions for machine learning

    Authors: Miguel Ruiz-Garcia, Ge Zhang, Samuel S. Schoenholz, Andrea J. Liu

    Abstract: We show that learning can be improved by using loss functions that evolve cyclically during training to emphasize one class at a time. In underparameterized networks, such dynamical loss functions can lead to successful training for networks that fail to find a deep minimum of the standard cross-entropy loss. In overparameterized networks, dynamical loss functions can lead to better generalization.…

    Submitted 23 June, 2021; v1 submitted 7 February, 2021; originally announced February 2021.

  8. arXiv:2008.07545

    cs.LG stat.ML

    Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization

    Authors: Neha S. Wadia, Daniel Duckworth, Samuel S. Schoenholz, Ethan Dyer, Jascha Sohl-Dickstein

    Abstract: Machine learning is predicated on the concept of generalization: a model achieving low error on a sufficiently large training set should also perform well on novel samples from the same distribution. We show that both data whitening and second order optimization can harm or entirely prevent generalization. In general, model training harnesses information contained in the sample-sample second momen…

    Submitted 19 July, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

    Comments: 13+10 pages, 10 figures; minor textual changes and some reorganization, one new figure and a new proof of main theorem added

  9. arXiv:2007.15801

    cs.LG stat.ML

    Finite Versus Infinite Neural Networks: an Empirical Study

    Authors: Jaehoon Lee, Samuel S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein

    Abstract: We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. Our experimental results include: kernel methods outperform fully-connected finite-width networks, but underperform convolutional finite width networks; neu…

    Submitted 8 September, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: 17+11 pages; v2 references added, minor improvements

  10. arXiv:2001.07301

    cs.LG stat.ML

    On the infinite width limit of neural networks with a standard parameterization

    Authors: Jascha Sohl-Dickstein, Roman Novak, Samuel S. Schoenholz, Jaehoon Lee

    Abstract: There are currently two parameterizations used to derive fixed kernels corresponding to infinite width neural networks, the NTK (Neural Tangent Kernel) parameterization and the naive standard parameterization. However, the extrapolation of both of these parameterizations to infinite width is problematic. The standard parameterization leads to a divergent neural tangent kernel while the NTK paramet…

    Submitted 18 April, 2020; v1 submitted 20 January, 2020; originally announced January 2020.

  11. arXiv:1912.13053

    cs.LG stat.ML

    Disentangling Trainability and Generalization in Deep Neural Networks

    Authors: Lechao Xiao, Jeffrey Pennington, Samuel S. Schoenholz

    Abstract: A longstanding goal in the theory of deep learning is to characterize the conditions under which a given neural network architecture will be trainable, and if so, how well it might generalize to unseen data. In this work, we provide such a characterization in the limit of very wide and very deep networks, for which the analysis simplifies considerably. For wide networks, the trajectory under gradi…

    Submitted 13 July, 2020; v1 submitted 30 December, 2019; originally announced December 2019.

    Comments: 22 pages, 3 figures, ICML 2020. Associated Colab notebook at https://colab.research.google.com/github/google/neural-tangents/blob/master/notebooks/Disentangling_Trainability_and_Generalization.ipynb

  12. arXiv:1912.02803

    stat.ML cs.LG

    Neural Tangents: Fast and Easy Infinite Neural Networks in Python

    Authors: Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Alexander A. Alemi, Jascha Sohl-Dickstein, Samuel S. Schoenholz

    Abstract: Neural Tangents is a library designed to enable research into infinite-width neural networks. It provides a high-level API for specifying complex and hierarchical neural network architectures. These networks can then be trained and evaluated either at finite-width as usual or in their infinite-width limit. Infinite-width networks can be trained analytically using exact Bayesian inference or using…

    Submitted 5 December, 2019; originally announced December 2019.

  13. arXiv:1902.08129

    cs.NE cond-mat.dis-nn cs.LG math.DS

    A Mean Field Theory of Batch Normalization

    Authors: Greg Yang, Jeffrey Pennington, Vinay Rao, Jascha Sohl-Dickstein, Samuel S. Schoenholz

    Abstract: We develop a mean field theory for batch normalization in fully-connected feedforward neural networks. In so doing, we provide a precise characterization of signal propagation and gradient backpropagation in wide batch-normalized networks at initialization. Our theory shows that gradient signals grow exponentially in depth and that these exploding gradients cannot be eliminated by tuning the initi…

    Submitted 5 March, 2019; v1 submitted 21 February, 2019; originally announced February 2019.

    Comments: To appear in ICLR 2019

  14. Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

    Authors: Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, Jeffrey Pennington

    Abstract: A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained…

    Submitted 8 December, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

    Comments: 12+16 pages; open-source code available at https://github.com/google/neural-tangents; accepted to NeurIPS 2019

  15. arXiv:1901.08987

    cs.LG stat.ML

    Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs

    Authors: Dar Gilboa, Bo Chang, Minmin Chen, Greg Yang, Samuel S. Schoenholz, Ed H. Chi, Jeffrey Pennington

    Abstract: Training recurrent neural networks (RNNs) on long sequence tasks is plagued with difficulties arising from the exponential explosion or vanishing of signals as they propagate forward or backward through the network. Many techniques have been proposed to ameliorate these issues, including various algorithmic and architectural modifications. Two of the most successful RNN architectures, the LSTM and…

    Submitted 23 May, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

  16. arXiv:1806.05394

    stat.ML cs.LG

    Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks

    Authors: Minmin Chen, Jeffrey Pennington, Samuel S. Schoenholz

    Abstract: Recurrent neural networks have gained widespread use in modeling sequence data across various domains. While many successful recurrent architectures employ a notion of gating, the exact mechanism that enables such remarkable performance is not well understood. We develop a theory for signal propagation in recurrent networks after random initialization using a combination of mean field theory and r…

    Submitted 15 August, 2018; v1 submitted 14 June, 2018; originally announced June 2018.

    Comments: ICML 2018 Conference Proceedings

  17. arXiv:1806.05393

    stat.ML cs.LG

    Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

    Authors: Lechao Xiao, Yasaman Bahri, Jascha Sohl-Dickstein, Samuel S. Schoenholz, Jeffrey Pennington

    Abstract: In recent years, state-of-the-art methods in computer vision have utilized increasingly deep convolutional neural network architectures (CNNs), with some of the most successful models employing hundreds or even thousands of layers. A variety of pathologies such as vanishing/exploding gradients make training such deep networks challenging. While residual connections and batch normalization do enabl…

    Submitted 10 July, 2018; v1 submitted 14 June, 2018; originally announced June 2018.

    Comments: ICML 2018 Conference Proceedings

  18. arXiv:1802.09979

    stat.ML cs.LG

    The Emergence of Spectral Universality in Deep Networks

    Authors: Jeffrey Pennington, Samuel S. Schoenholz, Surya Ganguli

    Abstract: Recent work has shown that tight concentration of the entire spectrum of singular values of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude. Therefore, to guide important design choices, it is important to build a full theoretical understanding of the spectra of Jacobians at initialization. To this end, we leverage powerful tools fro…

    Submitted 27 February, 2018; originally announced February 2018.

    Comments: 17 pages, 4 figures. Appearing at the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018

  19. arXiv:1801.02774

    cs.CV

    Adversarial Spheres

    Authors: Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, Ian Goodfellow

    Abstract: State of the art computer vision models have been shown to be vulnerable to small adversarial perturbations of the input. In other words, most images in the data distribution are both correctly classified by the model and are very close to a visually similar misclassified image. Despite substantial research interest, the cause of the phenomenon is still poorly understood and remains unsolved. We h…

    Submitted 10 September, 2018; v1 submitted 8 January, 2018; originally announced January 2018.

    MSC Class: 68T45 ACM Class: I.2.6

  20. arXiv:1712.08969

    cs.NE cond-mat.dis-nn cs.LG math.DS nlin.CD

    Mean Field Residual Networks: On the Edge of Chaos

    Authors: Greg Yang, Samuel S. Schoenholz

    Abstract: We study randomly initialized residual networks using mean field theory and the theory of difference equations. Classical feedforward neural networks, such as those with tanh activations, exhibit exponential behavior on the average when propagating inputs forward or gradients backward. The exponential forward dynamics causes rapid collapsing of the input space geometry, while the exponential backw…

    Submitted 24 December, 2017; originally announced December 2017.

    Comments: NIPS 2017

  21. arXiv:1711.04735

    cs.LG stat.ML

    Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

    Authors: Jeffrey Pennington, Samuel S. Schoenholz, Surya Ganguli

    Abstract: It is well known that the initialization of weights in deep neural networks can have a dramatic impact on learning speed. For example, ensuring the mean squared singular value of a network's input-output Jacobian is $O(1)$ is essential for avoiding the exponential vanishing or explosion of gradients. The stronger condition that all singular values of the Jacobian concentrate near $1$ is a property…

    Submitted 13 November, 2017; originally announced November 2017.

    Comments: 13 pages, 6 figures. Appearing at the 31st Conference on Neural Information Processing Systems (NIPS 2017)

  22. arXiv:1711.02846

    stat.ML cs.LG

    Intriguing Properties of Adversarial Examples

    Authors: Ekin D. Cubuk, Barret Zoph, Samuel S. Schoenholz, Quoc V. Le

    Abstract: It is becoming increasingly clear that many machine learning classifiers are vulnerable to adversarial examples. In attempting to explain the origin of adversarial examples, previous studies have typically focused on the fact that neural networks operate on high dimensional data, they overfit, or they are too linear. Here we argue that the origin of adversarial examples is primarily due to an inhe…

    Submitted 8 November, 2017; originally announced November 2017.

    Comments: 17 pages

  23. arXiv:1711.00165

    stat.ML cs.LG

    Deep Neural Networks as Gaussian Processes

    Authors: Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein

    Abstract: It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer…

    Submitted 2 March, 2018; v1 submitted 31 October, 2017; originally announced November 2017.

    Comments: Published version in ICLR 2018. 10 pages + appendix

  24. arXiv:1710.06570

    stat.ML cond-mat.dis-nn cs.LG

    A Correspondence Between Random Neural Networks and Statistical Field Theory

    Authors: Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein

    Abstract: A number of recent papers have provided evidence that practical design questions about neural networks may be tackled theoretically by studying the behavior of random networks. However, until now the tools available for analyzing random neural networks have been relatively ad-hoc. In this work, we show that the distribution of pre-activations in random neural networks can be exactly mapped onto la…

    Submitted 17 October, 2017; originally announced October 2017.

  25. arXiv:1704.01212

    cs.LG

    Neural Message Passing for Quantum Chemistry

    Authors: Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, George E. Dahl

    Abstract: Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant to molecular symmetries have already been described in the literature. These models learn a message passing algorithm and aggregation procedure to compute a function of their entire input graph. At…

    Submitted 12 June, 2017; v1 submitted 4 April, 2017; originally announced April 2017.

    Comments: 14 pages

    ACM Class: I.2.6

  26. arXiv:1611.01232

    stat.ML cs.LG

    Deep Information Propagation

    Authors: Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, Jascha Sohl-Dickstein

    Abstract: We study the behavior of untrained neural networks whose weights and biases are randomly distributed using mean field theory. We show the existence of depth scales that naturally limit the maximum depth of signal propagation through these random networks. Our main practical result is to show that random networks may be trained precisely when information can travel through them. Thus, the depth sca…

    Submitted 4 April, 2017; v1 submitted 3 November, 2016; originally announced November 2016.