
Showing 1–50 of 80 results for author: Krzakala, F

Searching in archive stat.
  1. arXiv:2406.02157  [pdf, other]

    stat.ML cs.LG

    Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs

    Authors: Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan

    Abstract: We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. We characterize the optimal batch size minimizing the iteration time as a function of the hardness of the target, as characterized by the information exponents. We show that performing gr… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2405.15459  [pdf, other]

    stat.ML cs.LG

    Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions

    Authors: Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Luca Pesce, Ludovic Stephan

    Abstract: Neural networks can identify low-dimensional relevant structures within high-dimensional noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we investigate the training dynamics of two-layer shallow neural networks trained with gradient-based algorithms, and discuss how they learn pertinent features in multi-index models, that is target functions with low-dimensi… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  3. arXiv:2403.04234  [pdf, other]

    stat.ML cs.LG

    Fundamental limits of Non-Linear Low-Rank Matrix Estimation

    Authors: Pierre Mergny, Justin Ko, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the task of estimating a low-rank matrix from non-linear and noisy observations. We prove a strong universality result showing that Bayes-optimal performances are characterized by an equivalent Gaussian model with an effective prior, whose parameters are entirely determined by an expansion of the non-linear function. In particular, we show that to reconstruct the signal accurately, one… ▽ More

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 42 pages, 2 figures

  4. arXiv:2403.03695  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Spectral Phase Transition and Optimal PCA in Block-Structured Spiked models

    Authors: Pierre Mergny, Justin Ko, Florent Krzakala

    Abstract: We discuss the inhomogeneous spiked Wigner model, a theoretical framework recently introduced to study structured noise in various learning scenarios, through the prism of random matrix theory, with a specific focus on its spectral properties. Our primary objective is to find an optimal spectral method and to extend the celebrated Baik-Ben Arous-Péché (BBP) phase transition criterion -- well-known in the ho… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 26 pages, 2 figures

  5. arXiv:2402.13622  [pdf, ps, other]

    stat.ML cond-mat.dis-nn cs.LG

    Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression

    Authors: Lucas Clarté, Adrien Vandenbroucque, Guillaume Dalle, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight asymptotic description of the biases and variances estimated by these methods in the context of generalized linear models, such as ridge and logistic regression, ta… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

  6. arXiv:2402.05674  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    A High Dimensional Statistical Model for Adversarial Training: Geometry and Trade-Offs

    Authors: Kasimir Tanner, Matteo Vilucchio, Bruno Loureiro, Florent Krzakala

    Abstract: This work investigates adversarial training in the context of margin-based linear classifiers in the high-dimensional regime where the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $α= n / d$. We introduce a tractable mathematical model where the interplay between the data and adversarial attacker geometries can be studied, while capturing the core phenomenology observ… ▽ More

    Submitted 10 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  7. arXiv:2402.04980  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotics of feature learning in two-layer networks after one gradient-step

    Authors: Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro

    Abstract: In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we model the trained network by a spiked Random Features (sRF) model. Further building on recent progress on Gaussian universality (Dandi et al., 2023), w… ▽ More

    Submitted 4 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  8. arXiv:2402.03220  [pdf, other]

    stat.ML cs.LG

    The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents

    Authors: Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborová, Florent Krzakala

    Abstract: We investigate the training dynamics of two-layer neural networks when learning multi-index target functions. We focus on multi-pass gradient descent (GD) that reuses the batches multiple times and show that it significantly changes the conclusion about which functions are learnable compared to single-pass gradient descent. In particular, multi-pass GD with finite stepsize is found to overcome the… ▽ More

    Submitted 30 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted at the International Conference on Machine Learning (ICML), 2024

  9. arXiv:2310.03575  [pdf, other]

    stat.ML cs.LG

    Analysis of learning a flow-based generative model from limited sample complexity

    Authors: Hugo Cui, Florent Krzakala, Eric Vanden-Eijnden, Lenka Zdeborová

    Abstract: We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from th… ▽ More

    Submitted 25 June, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

  10. arXiv:2305.18974  [pdf, other]

    stat.ML cs.LG

    Asymptotic Characterisation of Robust Empirical Risk Minimisation Performance in the Presence of Outliers

    Authors: Matteo Vilucchio, Emanuele Troiani, Vittorio Erba, Florent Krzakala

    Abstract: We study robust linear regression in high-dimension, when both the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $α=n/d$, and study a data model that includes outliers. We provide exact asymptotics for the performances of the empirical risk minimisation (ERM) using $\ell_2$-regularised $\ell_2$, $\ell_1$, and Huber losses, which are the standard approach to such proble… ▽ More

    Submitted 27 September, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Journal ref: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:811-819, 2024

  11. arXiv:2305.18502  [pdf, other]

    stat.ML cs.LG

    Escaping mediocrity: how two-layer networks learn hard generalized linear models with SGD

    Authors: Luca Arnaboldi, Florent Krzakala, Bruno Loureiro, Ludovic Stephan

    Abstract: This study explores the sample complexity for two-layer neural networks to learn a generalized linear target function under Stochastic Gradient Descent (SGD), focusing on the challenging regime where many flat directions are present at initialization. It is well-established that in this scenario $n=O(d \log d)$ samples are typically needed. However, we provide precise results concerning the pre-fa… ▽ More

    Submitted 1 March, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

  12. arXiv:2305.18270  [pdf, other]

    stat.ML cs.LG

    How Two-Layer Neural Networks Learn, One (Giant) Step at a Time

    Authors: Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan

    Abstract: We investigate theoretically how the features of a two-layer neural network adapt to the structure of the target function through a few large batch gradient descent steps, leading to improvement in the approximation capacity with respect to the initialization. We compare the influence of batch size and that of multiple (but finitely many) steps. For a single gradient step, a batch of size… ▽ More

    Submitted 15 December, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

  13. arXiv:2303.02644  [pdf, other]

    cs.LG stat.ML

    Expectation consistency for calibration of neural networks

    Authors: Lucas Clarté, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: Despite their incredible performance, it is well reported that deep neural networks tend to be overoptimistic about their prediction confidence. Finding effective and efficient calibration methods for neural networks is therefore an important endeavour towards better uncertainty quantification in deep learning. In this manuscript, we introduce a novel calibration technique named expectation consis… ▽ More

    Submitted 4 August, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Journal ref: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:443-453, 2023

  14. arXiv:2302.08933  [pdf, other]

    math.ST stat.ML

    Universality laws for Gaussian mixtures in generalized linear models

    Authors: Yatin Dandi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro, Lenka Zdeborová

    Abstract: Let $(x_{i}, y_{i})_{i=1,\dots,n}$ denote independent samples from a general mixture distribution $\sum_{c\in\mathcal{C}}ρ_{c}P_{c}^{x}$, and consider the hypothesis class of generalized linear models $\hat{y} = F(Θ^{\top}x)$. In this work, we investigate the asymptotic joint statistics of the family of generalized linear estimators $(Θ_{1}, \dots, Θ_{M})$ obtained either from (a) minimizing an em… ▽ More

    Submitted 17 February, 2023; originally announced February 2023.

  15. arXiv:2302.08923  [pdf, other]

    math.ST cond-mat.dis-nn cs.LG stat.ML

    Are Gaussian data all you need? Extents and limits of universality in high-dimensional generalized linear estimation

    Authors: Luca Pesce, Florent Krzakala, Bruno Loureiro, Ludovic Stephan

    Abstract: In this manuscript we consider the problem of generalized linear estimation on Gaussian mixture data with labels given by a single-index model. Our first result is a sharp asymptotic expression for the test and training errors in the high-dimensional regime. Motivated by the recent stream of results on the Gaussian universality of the test and training errors in generalized linear estimation, we a… ▽ More

    Submitted 17 February, 2023; originally announced February 2023.

  16. arXiv:2302.06665  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR

    Optimal Algorithms for the Inhomogeneous Spiked Wigner Model

    Authors: Aleksandr Pak, Justin Ko, Florent Krzakala

    Abstract: In this paper, we study a spiked Wigner problem with an inhomogeneous noise profile. Our aim in this problem is to recover the signal passed through an inhomogeneous low-rank matrix channel. While the information-theoretic performances are well-known, we focus on the algorithmic problem. We derive an approximate message-passing algorithm (AMP) for the inhomogeneous problem and show that its rigoro… ▽ More

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: 17 pages, 5 figures

  17. arXiv:2302.05882  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks

    Authors: Luca Arnaboldi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro

    Abstract: This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels generated by a similar, though not necessarily identical, target function. We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk. Our unifying an… ▽ More

    Submitted 12 February, 2023; originally announced February 2023.

  18. arXiv:2302.00375  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Bayes-optimal Learning of Deep Random Networks of Extensive-width

    Authors: Hugo Cui, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights. We consider the asymptotic limit where the number of samples, the input dimension and the network width are proportionally large. We propose a closed-form expression for the Bayes-optimal test error, for regression and classification tasks. We furt… ▽ More

    Submitted 21 June, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:6468-6521, 2023

  19. arXiv:2210.06591  [pdf, other]

    math-ph cs.IT cs.LG stat.ML

    Rigorous dynamical mean field theory for stochastic gradient descent methods

    Authors: Cedric Gerbelot, Emanuele Troiani, Francesca Mignacco, Florent Krzakala, Lenka Zdeborova

    Abstract: We prove closed-form equations for the exact high-dimensional asymptotics of a family of first order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match th… ▽ More

    Submitted 29 November, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: 40 pages, 4 figures

  20. arXiv:2205.13527  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap

    Authors: Luca Pesce, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model where the cluster means are sparse vectors. Here we provide an exact asymptotic characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity, i.e. when the fraction of non-zero components of the cluster means $ρ$, as well as the r… ▽ More

    Submitted 1 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: NeurIPS camera-ready version

    Journal ref: Advances in Neural Information Processing Systems (2022), vol 35, pages 27087--27099

  21. arXiv:2205.13303  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Gaussian Universality of Perceptrons with Random Labels

    Authors: Federica Gerace, Florent Krzakala, Bruno Loureiro, Ludovic Stephan, Lenka Zdeborová

    Abstract: While classical in many theoretical settings - and in particular in statistical physics-inspired works - the assumption of Gaussian i.i.d. input data is often perceived as a strong limitation in the context of statistics and machine learning. In this study, we redeem this line of work in the case of generalized linear classification, a.k.a. the perceptron model, with random labels. We argue that t… ▽ More

    Submitted 2 March, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

  22. arXiv:2202.03295  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Theoretical characterization of uncertainty in high-dimensional linear classification

    Authors: Lucas Clarté, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: Being able to reliably assess not only the accuracy but also the uncertainty of models' predictions is an important endeavour in modern machine learning. Even if the model generating the data and labels is known, computing the intrinsic uncertainty after learning the model from a limited number of samples amounts to sampling the corresponding posterior probability measure. Such sampl… ▽ More

    Submitted 14 November, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Journal ref: Mach. Learn.: Sci. Technol. 4 025029 (2023)

  23. arXiv:2202.00293  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

    Authors: Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connect… ▽ More

    Submitted 14 June, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: 20 pages

    Journal ref: Advances in Neural Information Processing Systems (2022), vol 35, pages 23244--23255

  24. arXiv:2201.13383  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension

    Authors: Bruno Loureiro, Cédric Gerbelot, Maria Refinetti, Gabriele Sicuro, Florent Krzakala

    Abstract: From the sampling of data to the initialisation of parameters, randomness is ubiquitous in modern Machine Learning practice. Understanding the statistical fluctuations engendered by the different sources of randomness in prediction is therefore key to understanding robust generalisation. In this manuscript we develop a quantitative and rigorous theory for the study of fluctuations in an ensemble o… ▽ More

    Submitted 31 January, 2022; originally announced January 2022.

    Comments: 17 pages + Appendix

    Journal ref: Proceedings of the 39th International Conference on Machine Learning (ICML). PMLR 162:14283-14314, 2022

  25. Error Scaling Laws for Kernel Classification under Source and Capacity Conditions

    Authors: Hugo Cui, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the problem of kernel classification. While worst-case bounds on the decay rate of the prediction error with the number of samples are known for some classifiers, they often fail to accurately describe the learning curves of real data sets. In this work, we consider the important class of data sets satisfying the standard source and capacity conditions, comprising a number of real data… ▽ More

    Submitted 6 September, 2023; v1 submitted 29 January, 2022; originally announced January 2022.

    Journal ref: Mach. Learn.: Sci. Technol. (2023) 4 035033

  26. arXiv:2201.09986  [pdf, ps, other]

    cs.IT cs.CR cs.LG stat.ML

    Bayesian Inference with Nonlinear Generative Models: Comments on Secure Learning

    Authors: Ali Bereyhi, Bruno Loureiro, Florent Krzakala, Ralf R. Müller, Hermann Schulz-Baldes

    Abstract: Unlike the classical linear model, nonlinear generative models have been addressed sparsely in the literature of statistical learning. This work aims to bring attention to these models and their secrecy potential. To this end, we invoke the replica method to derive the asymptotic normalized cross entropy in an inverse probability problem whose generative model is described by a Gaussian random… ▽ More

    Submitted 13 July, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

    Comments: 72 pages, 14 figures

  27. arXiv:2106.03791  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions

    Authors: Bruno Loureiro, Gabriele Sicuro, Cédric Gerbelot, Alessandro Pacco, Florent Krzakala, Lenka Zdeborová

    Abstract: Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. In this manuscript, we characterise the learning of a mixture of $K$ Gaussians with generic means and covariances via empirical risk minimisation (ERM) with any convex loss and regularisation. In particular, we prove exact asymptotics characterising the ERM… ▽ More

    Submitted 14 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: 12 pages + 34 pages of Appendix, 10 figures

    Journal ref: Advances in Neural Information Processing Systems 34 (2021): 10144-10157

  28. arXiv:2105.15004  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime

    Authors: Hugo Cui, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: In this manuscript we consider Kernel Ridge Regression (KRR) under the Gaussian design. Exponents for the decay of the excess generalization error of KRR have been reported in various works under the assumption of power-law decay of eigenvalues of the features co-variance. These decays were, however, provided for sizeably different setups, namely in the noiseless case with constant regularization… ▽ More

    Submitted 15 December, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: 22 pages, 10 figures, 2 tables

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021) vol 34 p10131--10143. J. Stat. Mech. (2022) 114004

  29. arXiv:2105.07416  [pdf, other]

    q-bio.NC cond-mat.stat-mech stat.ML

    Bayesian reconstruction of memories stored in neural networks from their connectivity

    Authors: Sebastian Goldt, Florent Krzakala, Lenka Zdeborová, Nicolas Brunel

    Abstract: The advent of comprehensive synaptic wiring diagrams of large neural circuits has created the field of connectomics and given rise to a number of open research questions. One such question is whether it is possible to reconstruct the information stored in a recurrent network of neurons, given its synaptic connectivity matrix. Here, we address this question by determining when solving such an infer… ▽ More

    Submitted 29 August, 2022; v1 submitted 16 May, 2021; originally announced May 2021.

    Comments: Code available at https://github.com/sgoldt/reconstructing_memories

    Journal ref: PLOS Computational Biology 19(1): e1010813 2023

  30. arXiv:2102.11742  [pdf, other]

    cs.LG cond-mat.dis-nn cond-mat.stat-mech stat.ML

    Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed

    Authors: Maria Refinetti, Sebastian Goldt, Florent Krzakala, Lenka Zdeborová

    Abstract: A recent series of theoretical works showed that the dynamics of neural networks with a certain initialisation are well-captured by kernel methods. Concurrent empirical work demonstrated that kernel methods can come close to the performance of neural networks on some image classification tasks. These results raise the question of whether neural networks only learn successfully if kernels also lear… ▽ More

    Submitted 10 June, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: The accompanying code for this paper is available at https://github.com/mariaref/rfvs2lnn_GMM_online

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  31. arXiv:2102.08127  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Learning curves of generic features maps for realistic datasets with a teacher-student model

    Authors: Bruno Loureiro, Cédric Gerbelot, Hugo Cui, Sebastian Goldt, Florent Krzakala, Marc Mézard, Lenka Zdeborová

    Abstract: Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. The assumptions of Gaussian i.i.d. input data underlying the canonical teacher-student model may, however, be perceived as too restrictive to capture the behaviour of realistic data sets. In this paper, we introduce a Gaussian covariate generalis… ▽ More

    Submitted 14 December, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: v3: NeurIPS camera-ready

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021), vol 34 p18137--18151. J. Stat. Mech. (2022) 114001

  32. arXiv:2012.06373  [pdf, other]

    cs.LG cs.AI cs.AR cs.NE stat.ML

    Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment

    Authors: Julien Launay, Iacopo Poli, Kilian Müller, Gustave Pariente, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan

    Abstract: The scaling hypothesis motivates the expansion of models past trillions of parameters as a path towards better performance. Recent significant developments, such as GPT-3, have been driven by this conjecture. However, as models scale-up, training them efficiently with backpropagation becomes difficult. Because model, pipeline, and data parallelism distribute parameters and gradients over compute n… ▽ More

    Submitted 11 December, 2020; originally announced December 2020.

    Comments: 6 pages, 2 figures, 1 table. Oral at the Beyond Backpropagation Workshop, NeurIPS 2020

  33. arXiv:2006.14709  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    The Gaussian equivalence of generative models for learning with shallow neural networks

    Authors: Sebastian Goldt, Bruno Loureiro, Galen Reeves, Florent Krzakala, Marc Mézard, Lenka Zdeborová

    Abstract: Understanding the impact of data structure on the computational tractability of learning is a key challenge for the theory of neural networks. Many theoretical works do not explicitly model training data, or assume that inputs are drawn component-wise independently from some simple probability distribution. Here, we go beyond this simple paradigm by studying the performance of neural networks trai… ▽ More

    Submitted 21 May, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: The accompanying code for this paper is available at https://github.com/sgoldt/gaussian-equiv-2layer

    Journal ref: Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, PMLR 145:426-471 (2021)

  34. arXiv:2006.12878  [pdf, other]

    stat.ML cs.LG cs.NE

    Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures

    Authors: Julien Launay, Iacopo Poli, François Boniface, Florent Krzakala

    Abstract: Despite being the workhorse of deep learning, the backpropagation algorithm is no panacea. It enforces sequential layer updates, thus preventing efficient parallelization of the training process. Furthermore, its biological plausibility is being challenged. Alternative schemes have been devised; yet, under the constraint of synaptic asymmetry, none have scaled to modern deep learning tasks and arc… ▽ More

    Submitted 11 December, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: 23 pages, 6 figures, 10 tables. For associated code, see https://github.com/lightonai/dfa-scales-to-modern-deep-learning. Poster at NeurIPS 2020

    Journal ref: Advances in Neural Information Processing Systems, v33, pages 9346--9360, 2020

  35. arXiv:2006.07310  [pdf, other]

    stat.ML cs.LG eess.SP

    Reservoir Computing meets Recurrent Kernels and Structured Transforms

    Authors: Jonathan Dong, Ruben Ohana, Mushegh Rafayelyan, Florent Krzakala

    Abstract: Reservoir Computing is a class of simple yet efficient Recurrent Neural Networks where internal weights are fixed at random and only a linear output layer is trained. In the large size limit, such random neural networks have a deep connection with kernel methods. Our contributions are threefold: a) We rigorously establish the recurrent kernel limit of Reservoir Computing and prove its convergence.… ▽ More

    Submitted 21 October, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Journal ref: Advances in Neural Information Processing Systems, v33, pages 16785--16796, 2020

  36. arXiv:2006.06997  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval

    Authors: Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: Despite the widespread use of gradient-based algorithms for optimizing high-dimensional non-convex functions, understanding their ability to find good minima instead of being trapped in spurious ones remains to a large extent an open problem. Here we focus on gradient flow dynamics for phase retrieval from random measurements. When the ratio of the number of measurements over the input dimensio… ▽ More

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: 9 pages, 5 figures + appendix

    Journal ref: Advances in Neural Information Processing Systems, v22, page 3265--327, 2020

  37. arXiv:2006.06581  [pdf, other]

    stat.ML cond-mat.dis-nn cs.IT cs.LG math.PR

    Asymptotic Errors for Teacher-Student Convex Generalized Linear Models (or : How to Prove Kabashima's Replica Formula)

    Authors: Cedric Gerbelot, Alia Abbara, Florent Krzakala

    Abstract: There has been a recent surge of interest in the study of asymptotic reconstruction performance in various cases of generalized linear estimation problems in the teacher-student setting, especially for the case of i.i.d standard normal matrices. Here, we go beyond these matrices, and prove an analytical formula for the reconstruction performance of convex generalized linear models with rotationall… ▽ More

    Submitted 10 November, 2022; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: 53 pages, 4 figures

    Journal ref: IEEE Transactions on Information Theory, vol. 69, no. 3, pp. 1824-1852, March 2023

  38. arXiv:2006.06560  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.ST

    Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

    Authors: Benjamin Aubin, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

    Abstract: We consider a commonly studied supervised classification of a synthetic dataset whose labels are generated by feeding a one-layer neural network with random iid inputs. We study the generalization performances of standard classifiers in the high-dimensional regime where $α=n/d$ is kept finite in the limit of a high dimension $d$ and number of samples $n$. Our contribution is three-fold: First, we… ▽ More

    Submitted 7 November, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: 11 pages + 45 pages Supplementary Material / 5 figures, v2 revised and accepted at NeurIPS

    Journal ref: Advances in Neural Information Processing Systems, v33, pages 12199--12210, 2020

  39. arXiv:2006.06098  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

    Authors: Francesca Mignacco, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture where each cluster is assigned one of two labels. This problem provides a prototype of a non-convex loss landscape with interpolating regimes and a large generalization gap. We define a particular stochastic process for which SGD c…

    Submitted 9 November, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: 8 pages + appendix, 4 figures

    Journal ref: J. Stat. Mech. 2021 124008 & NeurIPS 2020

  40. arXiv:2006.01475  [pdf, other]

    cs.LG cs.ET eess.IV stat.ML

    Light-in-the-loop: using a photonics co-processor for scalable training of neural networks

    Authors: Julien Launay, Iacopo Poli, Kilian Müller, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan

    Abstract: As neural networks grow larger and more complex and data-hungry, training costs are skyrocketing. Especially when lifelong learning is necessary, such as in recommender systems or self-driving cars, this might soon become unsustainable. In this study, we present the first optical co-processor able to accelerate the training phase of digitally-implemented neural networks. We rely on direct feedback…

    Submitted 3 June, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

    Comments: 2 pages, 1 figure

  41. arXiv:2004.01571  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG eess.SP math.ST stat.CO

    Tree-AMP: Compositional Inference with Tree Approximate Message Passing

    Authors: Antoine Baker, Benjamin Aubin, Florent Krzakala, Lenka Zdeborová

    Abstract: We introduce Tree-AMP, standing for Tree Approximate Message Passing, a Python package for compositional inference in high-dimensional tree-structured models. The package provides a unifying framework to study several approximate message passing algorithms previously derived for a variety of machine learning tasks such as generalized linear models, inference in multi-layer networks, matrix factori…

    Submitted 11 December, 2021; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: Source code available at https://github.com/sphinxteam/tramp and documentation at https://sphinxteam.github.io/tramp.docs

    Journal ref: Journal of Machine Learning Research 24 (2023) 1-89

  42. arXiv:2003.01054  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Double Trouble in Double Descent : Bias and Variance(s) in the Lazy Regime

    Authors: Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala

    Abstract: Deep neural networks can achieve remarkable generalization performances while interpolating the training data perfectly. Rather than the U-curve emblematic of the bias-variance trade-off, their test error often follows a "double descent" - a mark of the beneficial role of overparametrization. In this work, we develop a quantitative theory for this phenomenon in the so-called lazy learning regime o…

    Submitted 3 April, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: 29 pages, 12 figures

  43. arXiv:2002.11544  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.ST

    The role of regularization in classification of high-dimensional noisy Gaussian mixture

    Authors: Francesca Mignacco, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

    Abstract: We consider a high-dimensional mixture of two Gaussians in the noisy regime where even an oracle knowing the centers of the clusters misclassifies a small but finite fraction of the points. We provide a rigorous analysis of the generalization error of regularized convex classifiers, including ridge, hinge and logistic regression, in the high-dimensional limit where the number $n$ of samples and th…

    Submitted 26 February, 2020; originally announced February 2020.

    Comments: 8 pages + appendix, 6 figures

    Journal ref: International Conference on Machine Learning, ICML 2020

  44. arXiv:2002.09339  [pdf, other]

    math.ST cs.LG math.PR stat.ML

    Generalisation error in learning with random features and the hidden manifold model

    Authors: Federica Gerace, Bruno Loureiro, Florent Krzakala, Marc Mézard, Lenka Zdeborová

    Abstract: We study generalised linear regression and classification for a synthetically generated dataset encompassing different problems of interest, such as learning with random features, neural networks in the lazy training regime, and the hidden manifold model. We consider the high-dimensional regime and, using the replica method from statistical physics, we provide a closed-form expression for the asymp…

    Submitted 20 August, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: v2: ICML 2020 camera-ready

    Journal ref: J. Stat. Mech. 2021 124013 & ICML 2020

  45. arXiv:2002.04372  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech

    Asymptotic errors for convex penalized linear regression beyond Gaussian matrices

    Authors: Cédric Gerbelot, Alia Abbara, Florent Krzakala

    Abstract: We consider the problem of learning a coefficient vector $x_{0}$ in $R^{N}$ from noisy linear observations $y=Fx_{0}+w$ in $R^{M}$ in the high dimensional limit $M,N$ to infinity with $α=M/N$ fixed. We provide a rigorous derivation of an explicit formula -- first conjectured using heuristic methods from statistical physics -- for the asymptotic mean squared error obtained by penalized convex regre…

    Submitted 11 February, 2020; originally announced February 2020.

    Comments: 31 pages, 2 figures

  46. arXiv:1912.02729  [pdf, ps, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.LG stat.ML

    Rademacher complexity and spin glasses: A link between the replica and statistical theories of learning

    Authors: Alia Abbara, Benjamin Aubin, Florent Krzakala, Lenka Zdeborová

    Abstract: Statistical learning theory provides bounds on the generalization gap, using in particular the Vapnik-Chervonenkis dimension and the Rademacher complexity. An alternative approach, mainly studied in the statistical physics literature, is the study of generalization in simple synthetic-data models. Here we discuss the connections between these approaches and focus on the link between the Rademacher…

    Submitted 15 June, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: 15 + 10 pages, v2 revised and accepted at MSML

    Journal ref: Proceedings of The First Mathematical and Scientific Machine Learning Conference, PMLR 107:27-54, 2020

  47. arXiv:1912.02008  [pdf, other]

    math.ST cond-mat.dis-nn cs.LG eess.SP stat.ML

    Exact asymptotics for phase retrieval and compressed sensing with random generative priors

    Authors: Benjamin Aubin, Bruno Loureiro, Antoine Baker, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the problem of compressed sensing and of (real-valued) phase retrieval with a random measurement matrix. We derive sharp asymptotics for the information-theoretically optimal performance and for the best known polynomial algorithm for an ensemble of generative priors consisting of fully connected deep neural networks with random weight matrices and arbitrary activations. We compare the p…

    Submitted 12 June, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: 13+3 pages, 7 figures, v2 revised and accepted at MSML

    Journal ref: Proceedings of The First Mathematical and Scientific Machine Learning Conference, PMLR 107:55-73, 2020

  48. arXiv:1909.11500  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    Modelling the influence of data structure on learning in neural networks: the hidden manifold model

    Authors: Sebastian Goldt, Marc Mézard, Florent Krzakala, Lenka Zdeborová

    Abstract: Understanding the reasons for the success of deep neural networks trained using stochastic gradient-based methods is a key open problem for the nascent theory of deep learning. The types of data where these networks are most successful, such as images or sequences of speech, are characterised by intricate correlations. Yet, most theoretical work on neural networks does not explicitly model trainin…

    Submitted 3 December, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

    Journal ref: Physical Review X, Vol. 10, No. 4 (2020)

  49. arXiv:1907.08226  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Who is Afraid of Big Bad Minima? Analysis of Gradient-Flow in a Spiked Matrix-Tensor Model

    Authors: Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Lenka Zdeborová

    Abstract: Gradient-based algorithms are effective for many machine learning tasks, but despite ample recent effort and some progress, it often remains unclear why they work in practice in optimising high-dimensional non-convex functions and why they find good minima instead of being trapped in spurious ones. Here we present a quantitative theory explaining this behaviour in a spiked matrix-tensor model.…

    Submitted 20 January, 2020; v1 submitted 18 July, 2019; originally announced July 2019.

    Comments: 9 pages, 4 figures + appendix. Appears in Proceedings of the Advances in Neural Information Processing Systems 2019 (NeurIPS 2019)

    Journal ref: Advances in Neural Information Processing Systems, pp. 8676-8686. 2019

  50. arXiv:1906.08632  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup

    Authors: Sebastian Goldt, Madhu S. Advani, Andrew M. Saxe, Florent Krzakala, Lenka Zdeborová

    Abstract: Deep neural networks achieve stellar generalisation even when they have enough parameters to easily fit all their training data. We study this phenomenon by analysing the dynamics and the performance of over-parameterised two-layer neural networks in the teacher-student setup, where one network, the student, is trained on data generated by another network, called the teacher. We show how the dynam…

    Submitted 27 October, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: 9 pages + references + supplemental material. Oral presentation at NeurIPS 2019. arXiv admin note: substantial text overlap with arXiv:1901.09085

    Journal ref: J. Stat. Mech. 2020 124010 & NeurIPS 2019