Showing 1–8 of 8 results for author: Sagun, L

Searching in archive cond-mat.
  1. arXiv:2103.05524  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    On the interplay between data structure and loss function in classification problems

    Authors: Stéphane d'Ascoli, Marylou Gabrié, Levent Sagun, Giulio Biroli

    Abstract: One of the central puzzles in modern machine learning is the ability of heavily overparametrized models to generalize well. Although the low-dimensional structure of typical datasets is key to this behavior, most theoretical studies of overparametrization focus on isotropic inputs. In this work, we instead consider an analytically tractable model of structured data, where the input covariance is b…

    Submitted 12 October, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

  2. arXiv:2006.03509  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Triple descent and the two kinds of overfitting: Where & why do they appear?

    Authors: Stéphane d'Ascoli, Levent Sagun, Giulio Biroli

    Abstract: A recent line of research has highlighted the existence of a "double descent" phenomenon in deep learning, whereby increasing the number of training examples $N$ causes the generalization error of neural networks to peak when $N$ is of the same order as the number of parameters $P$. In earlier works, a similar phenomenon was shown to exist in simpler models such as linear regression, where the pea…

    Submitted 13 October, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

  3. arXiv:1906.06766  [pdf, other]

    cs.LG cond-mat.dis-nn cond-mat.stat-mech stat.ML

    Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias

    Authors: Stéphane d'Ascoli, Levent Sagun, Joan Bruna, Giulio Biroli

    Abstract: Despite the phenomenal success of deep neural networks in a broad range of learning tasks, there is a lack of theory to understand the way they work. In particular, Convolutional Neural Networks (CNNs) are known to perform much better than Fully-Connected Networks (FCNs) on spatially structured data: the architectural structure of CNNs benefits from prior knowledge on the features of the data, for…

    Submitted 4 February, 2020; v1 submitted 16 June, 2019; originally announced June 2019.

    Comments: Update for the camera ready version - NeurIPS 2019

  4. arXiv:1901.01608  [pdf, other]

    cond-mat.dis-nn cs.LG

    Scaling description of generalization with number of parameters in deep learning

    Authors: Mario Geiger, Arthur Jacot, Stefano Spigler, Franck Gabriel, Levent Sagun, Stéphane d'Ascoli, Giulio Biroli, Clément Hongler, Matthieu Wyart

    Abstract: Supervised deep learning involves the training of neural networks with a large number $N$ of parameters. For large enough $N$, in the so-called over-parametrized regime, one can essentially fit the training data points. Sparsity-based arguments would suggest that the generalization error increases as $N$ grows past a certain threshold $N^{*}$. Instead, empirical studies have shown that in the over…

    Submitted 8 October, 2019; v1 submitted 6 January, 2019; originally announced January 2019.

    Comments: The clarity of the text has been improved: the section "Related works" has been updated and the section "3.1 Regression task" has been added

  5. arXiv:1810.09665  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    A jamming transition from under- to over-parametrization affects loss landscape and generalization

    Authors: Stefano Spigler, Mario Geiger, Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Matthieu Wyart

    Abstract: We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training since the number of constraints to satisfy is too small to h…

    Submitted 18 June, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: arXiv admin note: text overlap with arXiv:1809.09349

  6. arXiv:1809.09349  [pdf, other]

    cond-mat.dis-nn cs.LG

    The jamming transition as a paradigm to understand the loss landscape of deep neural networks

    Authors: Mario Geiger, Stefano Spigler, Stéphane d'Ascoli, Levent Sagun, Marco Baity-Jesi, Giulio Biroli, Matthieu Wyart

    Abstract: Deep learning has been immensely successful at a variety of tasks, ranging from classification to AI. Learning corresponds to fitting training data, which is implemented by descending a very high-dimensional loss function. Understanding under which conditions neural networks do not get stuck in poor minima of the loss, and how the landscape of that loss evolves as depth is increased remains a chal…

    Submitted 17 June, 2019; v1 submitted 25 September, 2018; originally announced September 2018.

    Journal ref: Phys. Rev. E 100, 012115 (2019)

  7. arXiv:1803.06969  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Comparing Dynamics: Deep Neural Networks versus Glassy Systems

    Authors: M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli

    Abstract: We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that dur…

    Submitted 7 June, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: 10 pages, 5 figures. Version accepted at ICML 2018

    Journal ref: PMLR 80:324-333, 2018; Republication with DOI (cite this one): J. Stat. Mech. (2019) 124013

  8. arXiv:1703.07915  [pdf, other]

    stat.ML cond-mat.dis-nn cs.CV cs.LG hep-th

    Perspective: Energy Landscapes for Machine Learning

    Authors: Andrew J. Ballard, Ritankar Das, Stefano Martiniani, Dhagash Mehta, Levent Sagun, Jacob D. Stevenson, David J. Wales

    Abstract: Machine learning techniques are being increasingly used as flexible non-linear fitting and prediction tools in the physical sciences. Fitting functions that exhibit multiple solutions as local minima can be analysed in terms of the corresponding machine learning landscape. Methods to explore and visualise molecular potential energy landscapes can be applied to these machine learning landscapes to…

    Submitted 22 March, 2017; originally announced March 2017.

    Comments: 41 pages, 25 figures. Accepted for publication in Physical Chemistry Chemical Physics, 2017