Showing 1–5 of 5 results for author: Spigler, S

Searching in archive stat.

  1. arXiv:2006.09754  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    How isotropic kernels perform on simple invariants

    Authors: Jonas Paccolat, Stefano Spigler, Matthieu Wyart

    Abstract: We investigate how the training curve of isotropic kernel methods depends on the symmetry of the task to be learned, in several settings. (i) We consider a regression task, where the target function is a Gaussian random field that depends only on $d_\parallel$ variables, fewer than the input dimension $d$. We compute the expected test error $ε$ that follows $ε\sim p^{-β}$ where $p$ is the size of…

    Submitted 14 December, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

  2. Disentangling feature and lazy training in deep neural networks

    Authors: Mario Geiger, Stefano Spigler, Arthur Jacot, Matthieu Wyart

    Abstract: Two distinct limits for deep learning have been derived as the network width $h\rightarrow \infty$, depending on how the weights of the last layer scale with $h$. In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear in the weights and is described by a frozen kernel $Θ$. By contrast, in the Mean-Field limit, the dynamics can be expressed in terms of the distribution of the paramet…

    Submitted 4 October, 2020; v1 submitted 19 June, 2019; originally announced June 2019.

    Comments: minor revisions

  3. arXiv:1905.10843  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm

    Authors: Stefano Spigler, Mario Geiger, Matthieu Wyart

    Abstract: How many training data are needed to learn a supervised task? It is often observed that the generalization error decreases as $n^{-β}$ where $n$ is the number of training examples and $β$ an exponent that depends on both data and algorithm. In this work we measure $β$ when applying kernel methods to real datasets. For MNIST we find $β\approx 0.4$ and for CIFAR10 $β\approx 0.1$, for both regression…

    Submitted 18 August, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: We added (i) the prediction of the exponent $β$ for real data using kernel PCA; (ii) the generalization of our results to non-Gaussian data from reference [11] (Bordelon et al., "Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks")

  4. arXiv:1810.09665  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    A jamming transition from under- to over-parametrization affects loss landscape and generalization

    Authors: Stefano Spigler, Mario Geiger, Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Matthieu Wyart

    Abstract: We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training since the number of constraints to satisfy is too small to h…

    Submitted 18 June, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: arXiv admin note: text overlap with arXiv:1809.09349

  5. arXiv:1803.06969  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Comparing Dynamics: Deep Neural Networks versus Glassy Systems

    Authors: M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli

    Abstract: We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that dur…

    Submitted 7 June, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: 10 pages, 5 figures. Version accepted at ICML 2018

    Journal ref: PMLR 80:324-333, 2018; Republication with DOI (cite this one): J. Stat. Mech. (2019) 124013