Showing 1–13 of 13 results for author: Wyart, M

Searching in archive stat.
  1. arXiv:2404.10727

    stat.ML cond-mat.dis-nn cs.LG

    How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model

    Authors: Umberto Tomasini, Matthieu Wyart

    Abstract: Understanding what makes high-dimensional data learnable is a fundamental question in machine learning. On the one hand, it is believed that the success of deep learning lies in its ability to build a hierarchy of representations that become increasingly more abstract with depth, going from simple features like edges to more complex concepts. On the other hand, learning to be insensitive to invari…

    Submitted 2 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: 9 pages, 6 figures

  2. arXiv:2402.16991

    stat.ML cond-mat.dis-nn cs.CV cs.LG

    A Phase Transition in Diffusion Models Reveals the Hierarchical Nature of Data

    Authors: Antonio Sclocchi, Alessandro Favero, Matthieu Wyart

    Abstract: Understanding the structure of real data is paramount in advancing modern deep-learning methodologies. Natural data such as images are believed to be composed of features organised in a hierarchical and combinatorial manner, which neural networks capture during learning. Recent advancements show that diffusion models can generate high-quality images, hinting at their ability to capture this underl…

    Submitted 4 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: 21 pages, 16 figures

  3. arXiv:2309.10688

    cs.LG cond-mat.dis-nn stat.ML

    On the different regimes of Stochastic Gradient Descent

    Authors: Antonio Sclocchi, Matthieu Wyart

    Abstract: Modern deep networks are trained with stochastic gradient descent (SGD) whose key hyperparameters are the number of data considered at each step or batch size $B$, and the step size or learning rate $η$. For small $B$ and large $η$, SGD corresponds to a stochastic evolution of the parameters, whose noise amplitude is governed by the "temperature" $T\equiv η/B$. Yet this description is observed t…

    Submitted 27 February, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Main: 8 pages, 4 figures; Appendix: 15 pages, 11 figures

    Journal ref: Proceedings of the National Academy of Sciences 121.9 (2024): e2316301121

  4. arXiv:2307.02129

    cs.LG cs.CV stat.ML

    How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model

    Authors: Francesco Cagnetta, Leonardo Petrini, Umberto M. Tomasini, Alessandro Favero, Matthieu Wyart

    Abstract: Deep learning algorithms demonstrate a surprising ability to learn high-dimensional tasks from limited examples. This is commonly attributed to the depth of neural networks, enabling them to build a hierarchy of abstract, low-dimensional data representations. However, how many training examples are required to learn such representations remains unknown. To quantitatively study this question, we in…

    Submitted 3 July, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: 9 pages, 8 figures

    Journal ref: Phys. Rev. X 14, 031001 (2024)

  5. arXiv:2208.01003

    stat.ML cs.LG

    What Can Be Learnt With Wide Convolutional Neural Networks?

    Authors: Francesco Cagnetta, Alessandro Favero, Matthieu Wyart

    Abstract: Understanding how convolutional neural networks (CNNs) can efficiently learn high-dimensional functions remains a fundamental challenge. A popular belief is that these models harness the local and hierarchical structure of natural data such as images. Yet, we lack a quantitative understanding of how such structure affects performance, e.g., the rate of decay of the generalisation error with the nu…

    Submitted 31 May, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202, 2023

  6. arXiv:2206.12314

    stat.ML cs.LG

    Learning sparse features can lead to overfitting in neural networks

    Authors: Leonardo Petrini, Francesco Cagnetta, Eric Vanden-Eijnden, Matthieu Wyart

    Abstract: It is widely believed that the success of deep networks lies in their ability to learn a meaningful representation of the features of the data. Yet, understanding when and how this feature learning improves performance remains a challenge: for example, it is beneficial for modern architectures trained to classify images, whereas it is detrimental for fully-connected networks trained for the same t…

    Submitted 12 October, 2022; v1 submitted 24 June, 2022; originally announced June 2022.

  7. arXiv:2106.08619

    stat.ML cond-mat.dis-nn cs.LG

    Locality defeats the curse of dimensionality in convolutional teacher-student scenarios

    Authors: Alessandro Favero, Francesco Cagnetta, Matthieu Wyart

    Abstract: Convolutional neural networks perform a local and translationally-invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher-student framework for kernel regression, using 'convolutional' kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using…

    Submitted 12 November, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: 32 pages, 7 figures

  8. Geometric compression of invariant manifolds in neural nets

    Authors: Jonas Paccolat, Leonardo Petrini, Mario Geiger, Kevin Tyloo, Matthieu Wyart

    Abstract: We study how neural networks compress uninformative input space in models where data lie in $d$ dimensions, but whose labels only vary within a linear manifold of dimension $d_\parallel < d$. We show that for a one-hidden-layer network initialized with infinitesimal weights (i.e. in the feature learning regime) trained with gradient descent, the first layer of weights evolves to become nearly insens…

    Submitted 11 March, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

    Journal ref: Journal of Statistical Mechanics: Theory and Experiment, Volume 2021, April 2021

  9. arXiv:2006.09754

    cs.LG cond-mat.dis-nn stat.ML

    How isotropic kernels perform on simple invariants

    Authors: Jonas Paccolat, Stefano Spigler, Matthieu Wyart

    Abstract: We investigate how the training curve of isotropic kernel methods depends on the symmetry of the task to be learned, in several settings. (i) We consider a regression task, where the target function is a Gaussian random field that depends only on $d_\parallel$ variables, fewer than the input dimension $d$. We compute the expected test error $ε$ that follows $ε\sim p^{-β}$ where $p$ is the size of…

    Submitted 14 December, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

  10. Disentangling feature and lazy training in deep neural networks

    Authors: Mario Geiger, Stefano Spigler, Arthur Jacot, Matthieu Wyart

    Abstract: Two distinct limits for deep learning have been derived as the network width $h\rightarrow \infty$, depending on how the weights of the last layer scale with $h$. In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear in the weights and is described by a frozen kernel $Θ$. By contrast, in the Mean-Field limit, the dynamics can be expressed in terms of the distribution of the paramet…

    Submitted 4 October, 2020; v1 submitted 19 June, 2019; originally announced June 2019.

    Comments: minor revisions

  11. arXiv:1905.10843

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm

    Authors: Stefano Spigler, Mario Geiger, Matthieu Wyart

    Abstract: How many training data are needed to learn a supervised task? It is often observed that the generalization error decreases as $n^{-β}$ where $n$ is the number of training examples and $β$ an exponent that depends on both data and algorithm. In this work we measure $β$ when applying kernel methods to real datasets. For MNIST we find $β\approx 0.4$ and for CIFAR10 $β\approx 0.1$, for both regression…

    Submitted 18 August, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: We added (i) the prediction of the exponent $β$ for real data using kernel PCA; (ii) the generalization of our results to non-Gaussian data from reference [11] (Bordelon et al., "Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks")

  12. arXiv:1810.09665

    cs.LG cond-mat.dis-nn stat.ML

    A jamming transition from under- to over-parametrization affects loss landscape and generalization

    Authors: Stefano Spigler, Mario Geiger, Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Matthieu Wyart

    Abstract: We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training since the number of constraints to satisfy is too small to h…

    Submitted 18 June, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: arXiv admin note: text overlap with arXiv:1809.09349

  13. arXiv:1803.06969

    stat.ML cond-mat.dis-nn cs.LG

    Comparing Dynamics: Deep Neural Networks versus Glassy Systems

    Authors: M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli

    Abstract: We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that dur…

    Submitted 7 June, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: 10 pages, 5 figures. Version accepted at ICML 2018

    Journal ref: PMLR 80:324-333, 2018; Republication with DOI (cite this one): J. Stat. Mech. (2019) 124013