Showing 1–22 of 22 results for author: Wyart, M

Searching in archive cs.
  1. arXiv:2406.00048

    cs.CL cond-mat.dis-nn cs.LG

    Towards a theory of how the structure of language is acquired by deep neural networks

    Authors: Francesco Cagnetta, Matthieu Wyart

    Abstract: How much data is required to learn the structure of a language via next-token prediction? We study this question for synthetic datasets generated via a Probabilistic Context-Free Grammar (PCFG) -- a hierarchical generative model that captures the tree-like structure of natural languages. We determine token-token correlations analytically in our model and show that they can be used to build a repre…

    Submitted 28 May, 2024; originally announced June 2024.

    Comments: 9 pages, 4 figures (main)

  2. arXiv:2404.10727

    stat.ML cond-mat.dis-nn cs.LG

    How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model

    Authors: Umberto Tomasini, Matthieu Wyart

    Abstract: Understanding what makes high-dimensional data learnable is a fundamental question in machine learning. On the one hand, it is believed that the success of deep learning lies in its ability to build a hierarchy of representations that become increasingly more abstract with depth, going from simple features like edges to more complex concepts. On the other hand, learning to be insensitive to invari…

    Submitted 2 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: 9 pages, 6 figures

  3. arXiv:2402.16991

    stat.ML cond-mat.dis-nn cs.CV cs.LG

    A Phase Transition in Diffusion Models Reveals the Hierarchical Nature of Data

    Authors: Antonio Sclocchi, Alessandro Favero, Matthieu Wyart

    Abstract: Understanding the structure of real data is paramount in advancing modern deep-learning methodologies. Natural data such as images are believed to be composed of features organised in a hierarchical and combinatorial manner, which neural networks capture during learning. Recent advancements show that diffusion models can generate high-quality images, hinting at their ability to capture this underl…

    Submitted 4 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: 21 pages, 16 figures

  4. arXiv:2309.10688

    cs.LG cond-mat.dis-nn stat.ML

    On the different regimes of Stochastic Gradient Descent

    Authors: Antonio Sclocchi, Matthieu Wyart

    Abstract: Modern deep networks are trained with stochastic gradient descent (SGD) whose key hyperparameters are the number of data considered at each step or batch size $B$, and the step size or learning rate $η$. For small $B$ and large $η$, SGD corresponds to a stochastic evolution of the parameters, whose noise amplitude is governed by the "temperature" $T\equiv η/B$. Yet this description is observed t…

    Submitted 27 February, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Main: 8 pages, 4 figures; Appendix: 15 pages, 11 figures

    Journal ref: Proceedings of the National Academy of Sciences 121.9 (2024): e2316301121

  5. arXiv:2307.02129

    cs.LG cs.CV stat.ML

    How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model

    Authors: Francesco Cagnetta, Leonardo Petrini, Umberto M. Tomasini, Alessandro Favero, Matthieu Wyart

    Abstract: Deep learning algorithms demonstrate a surprising ability to learn high-dimensional tasks from limited examples. This is commonly attributed to the depth of neural networks, enabling them to build a hierarchy of abstract, low-dimensional data representations. However, how many training examples are required to learn such representations remains unknown. To quantitatively study this question, we in…

    Submitted 3 July, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: 9 pages, 8 figures

    Journal ref: Phys. Rev. X 14, 031001 (2024)

  6. arXiv:2301.13703

    cs.LG cond-mat.dis-nn

    Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning

    Authors: Antonio Sclocchi, Mario Geiger, Matthieu Wyart

    Abstract: Understanding when the noise in stochastic gradient descent (SGD) affects generalization of deep neural networks remains a challenge, complicated by the fact that networks can operate in distinct training regimes. Here we study how the magnitude of this noise $T$ affects performance as the size of the training set $P$ and the scale of initialization $α$ are varied. For gradient descent, $α$ is a k…

    Submitted 30 May, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: 25 pages, 21 figures, added analysis in feature-learning

  7. arXiv:2210.01506

    cs.LG cs.CV

    How deep convolutional neural networks lose spatial information with training

    Authors: Umberto M. Tomasini, Leonardo Petrini, Francesco Cagnetta, Matthieu Wyart

    Abstract: A central question of machine learning is how deep nets manage to learn tasks in high dimensions. An appealing hypothesis is that they achieve this feat by building a representation of the data where information irrelevant to the task is lost. For image datasets, this view is supported by the observation that after (and not before) training, the neural representation becomes less and less sensitiv…

    Submitted 23 November, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

  8. arXiv:2208.01003

    stat.ML cs.LG

    What Can Be Learnt With Wide Convolutional Neural Networks?

    Authors: Francesco Cagnetta, Alessandro Favero, Matthieu Wyart

    Abstract: Understanding how convolutional neural networks (CNNs) can efficiently learn high-dimensional functions remains a fundamental challenge. A popular belief is that these models harness the local and hierarchical structure of natural data such as images. Yet, we lack a quantitative understanding of how such structure affects performance, e.g., the rate of decay of the generalisation error with the nu…

    Submitted 31 May, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202. 2023

  9. arXiv:2206.12314

    stat.ML cs.LG

    Learning sparse features can lead to overfitting in neural networks

    Authors: Leonardo Petrini, Francesco Cagnetta, Eric Vanden-Eijnden, Matthieu Wyart

    Abstract: It is widely believed that the success of deep networks lies in their ability to learn a meaningful representation of the features of the data. Yet, understanding when and how this feature learning improves performance remains a challenge: for example, it is beneficial for modern architectures trained to classify images, whereas it is detrimental for fully-connected networks trained for the same t…

    Submitted 12 October, 2022; v1 submitted 24 June, 2022; originally announced June 2022.

  10. arXiv:2202.03348

    cs.LG cond-mat.stat-mech

    Failure and success of the spectral bias prediction for Kernel Ridge Regression: the case of low-dimensional data

    Authors: Umberto M. Tomasini, Antonio Sclocchi, Matthieu Wyart

    Abstract: Recently, several theories including the replica method made predictions for the generalization error of Kernel Ridge Regression. In some regimes, they predict that the method has a 'spectral bias': decomposing the true function $f^*$ on the eigenbasis of the kernel, it fits well the coefficients associated with the O(P) largest eigenvalues, where $P$ is the size of the training set. This predicti…

    Submitted 16 February, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: 34 pages, 11 figures

  11. arXiv:2106.08849

    cs.LG

    How memory architecture affects learning in a simple POMDP: the two-hypothesis testing problem

    Authors: Mario Geiger, Christophe Eloy, Matthieu Wyart

    Abstract: Reinforcement learning is generally difficult for partially observable Markov decision processes (POMDPs), which occur when the agent's observation is partial or noisy. To seek good performance in POMDPs, one strategy is to endow the agent with a finite memory, whose update is governed by the policy. However, policy optimization is non-convex in that case and can lead to poor training performance…

    Submitted 18 November, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

  12. arXiv:2106.08619

    stat.ML cond-mat.dis-nn cs.LG

    Locality defeats the curse of dimensionality in convolutional teacher-student scenarios

    Authors: Alessandro Favero, Francesco Cagnetta, Matthieu Wyart

    Abstract: Convolutional neural networks perform a local and translationally-invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher-student framework for kernel regression, using 'convolutional' kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using…

    Submitted 12 November, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: 32 pages, 7 figures

  13. Relative stability toward diffeomorphisms indicates performance in deep nets

    Authors: Leonardo Petrini, Alessandro Favero, Mario Geiger, Matthieu Wyart

    Abstract: Understanding why deep nets can classify data in large dimensions remains a challenge. It has been proposed that they do so by becoming stable to diffeomorphisms, yet existing empirical measurements support that it is often not the case. We revisit this question by defining a maximum-entropy distribution on diffeomorphisms, that allows one to study typical diffeomorphisms of a given norm. We confirm t…

    Submitted 4 November, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: NeurIPS 2021 Conference

  14. arXiv:2012.15110

    cs.LG

    Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training

    Authors: Mario Geiger, Leonardo Petrini, Matthieu Wyart

    Abstract: Deep learning algorithms are responsible for a technological revolution in a variety of tasks including image recognition or Go playing. Yet, why they work is not understood. Ultimately, they manage to classify data lying in high dimension -- a feat generically impossible due to the geometry of high dimensional space and the associated curse of dimensionality. Understanding what kind of structure,…

    Submitted 30 December, 2020; originally announced December 2020.

  15. Geometric compression of invariant manifolds in neural nets

    Authors: Jonas Paccolat, Leonardo Petrini, Mario Geiger, Kevin Tyloo, Matthieu Wyart

    Abstract: We study how neural networks compress uninformative input space in models where data lie in $d$ dimensions, but whose labels vary only within a linear manifold of dimension $d_\parallel < d$. We show that for a one-hidden layer network initialized with infinitesimal weights (i.e. in the feature learning regime) trained with gradient descent, the first layer of weights evolves to become nearly insens…

    Submitted 11 March, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

    Journal ref: Journal of Statistical Mechanics: Theory and Experiment, Volume 2021, April 2021

  16. arXiv:2006.09754

    cs.LG cond-mat.dis-nn stat.ML

    How isotropic kernels perform on simple invariants

    Authors: Jonas Paccolat, Stefano Spigler, Matthieu Wyart

    Abstract: We investigate how the training curve of isotropic kernel methods depends on the symmetry of the task to be learned, in several settings. (i) We consider a regression task, where the target function is a Gaussian random field that depends only on $d_\parallel$ variables, fewer than the input dimension $d$. We compute the expected test error $ε$ that follows $ε\sim p^{-β}$ where $p$ is the size of…

    Submitted 14 December, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

  17. Disentangling feature and lazy training in deep neural networks

    Authors: Mario Geiger, Stefano Spigler, Arthur Jacot, Matthieu Wyart

    Abstract: Two distinct limits for deep learning have been derived as the network width $h\rightarrow \infty$, depending on how the weights of the last layer scale with $h$. In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear in the weights and is described by a frozen kernel $Θ$. By contrast, in the Mean-Field limit, the dynamics can be expressed in terms of the distribution of the paramet…

    Submitted 4 October, 2020; v1 submitted 19 June, 2019; originally announced June 2019.

    Comments: minor revisions

  18. arXiv:1905.10843

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm

    Authors: Stefano Spigler, Mario Geiger, Matthieu Wyart

    Abstract: How many training data are needed to learn a supervised task? It is often observed that the generalization error decreases as $n^{-β}$ where $n$ is the number of training examples and $β$ an exponent that depends on both data and algorithm. In this work we measure $β$ when applying kernel methods to real datasets. For MNIST we find $β\approx 0.4$ and for CIFAR10 $β\approx 0.1$, for both regression…

    Submitted 18 August, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: We added (i) the prediction of the exponent $β$ for real data using kernel PCA; (ii) the generalization of our results to non-Gaussian data from reference [11] (Bordelon et al., "Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks")

  19. arXiv:1901.01608

    cond-mat.dis-nn cs.LG

    Scaling description of generalization with number of parameters in deep learning

    Authors: Mario Geiger, Arthur Jacot, Stefano Spigler, Franck Gabriel, Levent Sagun, Stéphane d'Ascoli, Giulio Biroli, Clément Hongler, Matthieu Wyart

    Abstract: Supervised deep learning involves the training of neural networks with a large number $N$ of parameters. For large enough $N$, in the so-called over-parametrized regime, one can essentially fit the training data points. Sparsity-based arguments would suggest that the generalization error increases as $N$ grows past a certain threshold $N^{*}$. Instead, empirical studies have shown that in the over…

    Submitted 8 October, 2019; v1 submitted 6 January, 2019; originally announced January 2019.

    Comments: The clarity of the text has been improved: the section "Related works" has been updated and the section "3.1 Regression task" has been added

  20. arXiv:1810.09665

    cs.LG cond-mat.dis-nn stat.ML

    A jamming transition from under- to over-parametrization affects loss landscape and generalization

    Authors: Stefano Spigler, Mario Geiger, Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Matthieu Wyart

    Abstract: We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training since the number of constraints to satisfy is too small to h…

    Submitted 18 June, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: arXiv admin note: text overlap with arXiv:1809.09349

  21. arXiv:1809.09349

    cond-mat.dis-nn cs.LG

    The jamming transition as a paradigm to understand the loss landscape of deep neural networks

    Authors: Mario Geiger, Stefano Spigler, Stéphane d'Ascoli, Levent Sagun, Marco Baity-Jesi, Giulio Biroli, Matthieu Wyart

    Abstract: Deep learning has been immensely successful at a variety of tasks, ranging from classification to AI. Learning corresponds to fitting training data, which is implemented by descending a very high-dimensional loss function. Understanding under which conditions neural networks do not get stuck in poor minima of the loss, and how the landscape of that loss evolves as depth is increased remains a chal…

    Submitted 17 June, 2019; v1 submitted 25 September, 2018; originally announced September 2018.

    Journal ref: Phys. Rev. E 100, 012115 (2019)

  22. arXiv:1803.06969

    stat.ML cond-mat.dis-nn cs.LG

    Comparing Dynamics: Deep Neural Networks versus Glassy Systems

    Authors: M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli

    Abstract: We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that dur…

    Submitted 7 June, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: 10 pages, 5 figures. Version accepted at ICML 2018

    Journal ref: PMLR 80:324-333, 2018; Republication with DOI (cite this one): J. Stat. Mech. (2019) 124013