Showing 1–10 of 10 results for author: Stephan, L

  1. arXiv:2406.02157  [pdf, other]

    stat.ML cs.LG

    Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs

    Authors: Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan

    Abstract: We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. We characterize the optimal batch size minimizing the iteration time as a function of the hardness of the target, as characterized by the information exponents. We show that performing gr…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2405.15459  [pdf, other]

    stat.ML cs.LG

    Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions

    Authors: Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Luca Pesce, Ludovic Stephan

    Abstract: Neural networks can identify low-dimensional relevant structures within high-dimensional noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we investigate the training dynamics of two-layer shallow neural networks trained with gradient-based algorithms, and discuss how they learn pertinent features in multi-index models, that is target functions with low-dimensi…

    Submitted 24 May, 2024; originally announced May 2024.

  3. arXiv:2305.18502  [pdf, other]

    stat.ML cs.LG

    Escaping mediocrity: how two-layer networks learn hard generalized linear models with SGD

    Authors: Luca Arnaboldi, Florent Krzakala, Bruno Loureiro, Ludovic Stephan

    Abstract: This study explores the sample complexity for two-layer neural networks to learn a generalized linear target function under Stochastic Gradient Descent (SGD), focusing on the challenging regime where many flat directions are present at initialization. It is well-established that in this scenario $n=O(d \log d)$ samples are typically needed. However, we provide precise results concerning the pre-fa…

    Submitted 1 March, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

  4. arXiv:2305.18270  [pdf, other]

    stat.ML cs.LG

    How Two-Layer Neural Networks Learn, One (Giant) Step at a Time

    Authors: Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan

    Abstract: We investigate theoretically how the features of a two-layer neural network adapt to the structure of the target function through a few large batch gradient descent steps, leading to improvement in the approximation capacity with respect to the initialization. We compare the influence of batch size and that of multiple (but finitely many) steps. For a single gradient step, a batch of size…

    Submitted 15 December, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

  5. arXiv:2302.08923  [pdf, other]

    math.ST cond-mat.dis-nn cs.LG stat.ML

    Are Gaussian data all you need? Extents and limits of universality in high-dimensional generalized linear estimation

    Authors: Luca Pesce, Florent Krzakala, Bruno Loureiro, Ludovic Stephan

    Abstract: In this manuscript we consider the problem of generalized linear estimation on Gaussian mixture data with labels given by a single-index model. Our first result is a sharp asymptotic expression for the test and training errors in the high-dimensional regime. Motivated by the recent stream of results on the Gaussian universality of the test and training errors in generalized linear estimation, we a…

    Submitted 17 February, 2023; originally announced February 2023.

  6. arXiv:2302.05882  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks

    Authors: Luca Arnaboldi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro

    Abstract: This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels generated by a similar, though not necessarily identical, target function. We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk. Our unifying an…

    Submitted 12 February, 2023; originally announced February 2023.

  7. arXiv:2205.13303  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Gaussian Universality of Perceptrons with Random Labels

    Authors: Federica Gerace, Florent Krzakala, Bruno Loureiro, Ludovic Stephan, Lenka Zdeborová

    Abstract: While classical in many theoretical settings - and in particular in statistical physics-inspired works - the assumption of Gaussian i.i.d. input data is often perceived as a strong limitation in the context of statistics and machine learning. In this study, we redeem this line of work in the case of generalized linear classification, a.k.a. the perceptron model, with random labels. We argue that t…

    Submitted 2 March, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

  8. arXiv:2202.00293  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

    Authors: Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connect…

    Submitted 14 June, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: 20 pages

    Journal ref: Advances in Neural Information Processing Systems (2022), vol 35, pages 23244-23255

  9. arXiv:2102.03188  [pdf, other]

    cs.LG math.PR

    A simpler spectral approach for clustering in directed networks

    Authors: Simon Coste, Ludovic Stephan

    Abstract: We study the task of clustering in directed networks. We show that using the eigenvalue/eigenvector decomposition of the adjacency matrix is simpler than all common methods which are based on a combination of data regularization and SVD truncation, and works well down to the very sparse regime where the edge density has constant order. Our analysis is based on a Master Theorem describing sharp asy…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: 42 pages

  10. arXiv:1811.05808  [pdf, ps, other]

    cs.DS math.PR

    Robustness of spectral methods for community detection

    Authors: Ludovic Stephan, Laurent Massoulié

    Abstract: The present work is concerned with community detection. Specifically, we consider a random graph drawn according to the stochastic block model: its vertex set is partitioned into blocks, or communities, and edges are placed randomly and independently of each other with probability depending only on the communities of their two endpoints. In this context, our aim is to recover the community labels…

    Submitted 25 June, 2019; v1 submitted 14 November, 2018; originally announced November 2018.