Showing 1–50 of 67 results for author: Zdeborova, L

Searching in archive stat.
  1. arXiv:2407.03522  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Optimal thresholds and algorithms for a model of multi-modal learning in high dimensions

    Authors: Christian Keup, Lenka Zdeborová

    Abstract: This work explores multi-modal inference in a high-dimensional simplified model, analytically quantifying the performance gain of multi-modal inference over that of analyzing modalities in isolation. We present the Bayes-optimal performance and weak recovery thresholds in a model where the objective is to recover the latent structures from two noisy data matrices with correlated spikes. The paper…

    Submitted 3 July, 2024; originally announced July 2024.

  2. arXiv:2405.10763  [pdf, other]

    cond-mat.dis-nn cs.DM math.OC stat.CO

    Integer Traffic Assignment Problem: Algorithms and Insights on Random Graphs

    Authors: Rayan Harfouche, Giovanni Piccioli, Lenka Zdeborová

    Abstract: Path optimization is a fundamental concern across various real-world scenarios, ranging from traffic congestion issues to efficient data routing over the internet. The Traffic Assignment Problem (TAP) is a classic continuous optimization problem in this field. This study considers the Integer Traffic Assignment Problem (ITAP), a discrete variant of TAP. ITAP involves determining optimal routes for…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 37 pages, 15 figures

  3. arXiv:2403.04234  [pdf, other]

    stat.ML cs.LG

    Fundamental limits of Non-Linear Low-Rank Matrix Estimation

    Authors: Pierre Mergny, Justin Ko, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the task of estimating a low-rank matrix from non-linear and noisy observations. We prove a strong universality result showing that Bayes-optimal performances are characterized by an equivalent Gaussian model with an effective prior, whose parameters are entirely determined by an expansion of the non-linear function. In particular, we show that to reconstruct the signal accurately, one…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 42 pages, 2 figures

  4. arXiv:2402.13622  [pdf, ps, other]

    stat.ML cond-mat.dis-nn cs.LG

    Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression

    Authors: Lucas Clarté, Adrien Vandenbroucque, Guillaume Dalle, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight asymptotic description of the biases and variances estimated by these methods in the context of generalized linear models, such as ridge and logistic regression, ta…

    Submitted 21 February, 2024; originally announced February 2024.

  5. arXiv:2402.04980  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotics of feature learning in two-layer networks after one gradient-step

    Authors: Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro

    Abstract: In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we model the trained network by a spiked Random Features (sRF) model. Further building on recent progress on Gaussian universality (Dandi et al., 2023), w…

    Submitted 4 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  6. arXiv:2402.03220  [pdf, other]

    stat.ML cs.LG

    The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents

    Authors: Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborová, Florent Krzakala

    Abstract: We investigate the training dynamics of two-layer neural networks when learning multi-index target functions. We focus on multi-pass gradient descent (GD) that reuses the batches multiple times and show that it significantly changes the conclusion about which functions are learnable compared to single-pass gradient descent. In particular, multi-pass GD with finite stepsize is found to overcome the…

    Submitted 30 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted at the International Conference on Machine Learning (ICML), 2024

  7. arXiv:2310.03575  [pdf, other]

    stat.ML cs.LG

    Analysis of learning a flow-based generative model from limited sample complexity

    Authors: Hugo Cui, Florent Krzakala, Eric Vanden-Eijnden, Lenka Zdeborová

    Abstract: We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from th…

    Submitted 25 June, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

  8. arXiv:2306.02729  [pdf, other]

    cs.LG stat.ML

    Gibbs Sampling the Posterior of Neural Networks

    Authors: Giovanni Piccioli, Emanuele Troiani, Lenka Zdeborová

    Abstract: In this paper, we study sampling from a posterior derived from a neural network. We propose a new probabilistic model consisting of adding noise at every pre- and post-activation in the network, arguing that the resulting posterior can be sampled using an efficient Gibbs sampler. For small models, the Gibbs sampler attains similar performances as the state-of-the-art Markov chain Monte Carlo (MCMC…

    Submitted 11 January, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

  9. arXiv:2305.11041  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    High-dimensional Asymptotics of Denoising Autoencoders

    Authors: Hugo Cui, Lenka Zdeborová

    Abstract: We address the problem of denoising data from a Gaussian mixture using a two-layer non-linear autoencoder with tied weights and a skip connection. We consider the high-dimensional limit where the number of training samples and the input dimension jointly tend to infinity while the number of hidden units remains bounded. We provide closed-form expressions for the denoising mean-squared test error.…

    Submitted 18 May, 2023; originally announced May 2023.

  10. arXiv:2303.09995  [pdf, other]

    cond-mat.dis-nn cs.SI stat.ML

    Neural-prior stochastic block model

    Authors: O. Duranthon, L. Zdeborová

    Abstract: The stochastic block model (SBM) is widely studied as a benchmark for graph clustering aka community detection. In practice, graph data often come with node attributes that bear additional information about the communities. Previous works modeled such data by considering that the node attributes are generated from the node community memberships. In this work, motivated by a recent surge of works i…

    Submitted 6 September, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

    Journal ref: Mach. Learn.: Sci. Technol. 4 035017 (2023)

  11. arXiv:2303.02644  [pdf, other]

    cs.LG stat.ML

    Expectation consistency for calibration of neural networks

    Authors: Lucas Clarté, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: Despite their incredible performance, it is well reported that deep neural networks tend to be overoptimistic about their prediction confidence. Finding effective and efficient calibration methods for neural networks is therefore an important endeavour towards better uncertainty quantification in deep learning. In this manuscript, we introduce a novel calibration technique named expectation consis…

    Submitted 4 August, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Journal ref: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:443-453, 2023

  12. arXiv:2302.08933  [pdf, other]

    math.ST stat.ML

    Universality laws for Gaussian mixtures in generalized linear models

    Authors: Yatin Dandi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro, Lenka Zdeborová

    Abstract: Let $(x_{i}, y_{i})_{i=1,\dots,n}$ denote independent samples from a general mixture distribution $\sum_{c\in\mathcal{C}}\rho_{c}P_{c}^{x}$, and consider the hypothesis class of generalized linear models $\hat{y} = F(\Theta^{\top}x)$. In this work, we investigate the asymptotic joint statistics of the family of generalized linear estimators $(\Theta_{1}, \dots, \Theta_{M})$ obtained either from (a) minimizing an em…

    Submitted 17 February, 2023; originally announced February 2023.

  13. arXiv:2302.00375  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Bayes-optimal Learning of Deep Random Networks of Extensive-width

    Authors: Hugo Cui, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights. We consider the asymptotic limit where the number of samples, the input dimension and the network width are proportionally large. We propose a closed-form expression for the Bayes-optimal test error, for regression and classification tasks. We furt…

    Submitted 21 June, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:6468-6521, 2023

  14. arXiv:2210.06591  [pdf, other]

    math-ph cs.IT cs.LG stat.ML

    Rigorous dynamical mean field theory for stochastic gradient descent methods

    Authors: Cedric Gerbelot, Emanuele Troiani, Francesca Mignacco, Florent Krzakala, Lenka Zdeborova

    Abstract: We prove closed-form equations for the exact high-dimensional asymptotics of a family of first order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match th…

    Submitted 29 November, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: 40 pages, 4 figures

  15. arXiv:2208.06488  [pdf, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.IR math.PR stat.CO

    The planted XY model: thermodynamics and inference

    Authors: Siyu Chen, Guanhao Huang, Giovanni Piccioli, Lenka Zdeborová

    Abstract: In this paper we study a fully connected planted spin glass named the planted XY model. Motivation for studying this system comes both from the spin glass field and the one of statistical inference where it models the angular synchronization problem. We derive the replica symmetric (RS) phase diagram in the temperature, ferromagnetic bias plane using the approximate message passing (AMP) algorithm…

    Submitted 11 January, 2024; v1 submitted 12 August, 2022; originally announced August 2022.

    Comments: 29 pages, 8 figures

    Journal ref: Phys. Rev. E 106, 054115 (2022)

  16. arXiv:2205.13527  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap

    Authors: Luca Pesce, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model where the cluster means are sparse vectors. Here we provide an exact asymptotic characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity, i.e. when the fraction of non-zero components of the cluster means $\rho$, as well as the r…

    Submitted 1 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: NeurIPS camera-ready version

    Journal ref: Advances in Neural Information Processing Systems (2022), vol 35, pages 27087--27099

  17. arXiv:2205.13303  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Gaussian Universality of Perceptrons with Random Labels

    Authors: Federica Gerace, Florent Krzakala, Bruno Loureiro, Ludovic Stephan, Lenka Zdeborová

    Abstract: While classical in many theoretical settings - and in particular in statistical physics-inspired works - the assumption of Gaussian i.i.d. input data is often perceived as a strong limitation in the context of statistics and machine learning. In this study, we redeem this line of work in the case of generalized linear classification, a.k.a. the perceptron model, with random labels. We argue that t…

    Submitted 2 March, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

  18. arXiv:2203.12094  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Learning curves for the multi-class teacher-student perceptron

    Authors: Elisabetta Cornacchia, Francesca Mignacco, Rodrigo Veiga, Cédric Gerbelot, Bruno Loureiro, Lenka Zdeborová

    Abstract: One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with the single-layer teacher-student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal estimation and empirical risk minimisation (ERM) were extensively analysed for this setting. At the same time, a considerable part of modern machin…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: 14 pages + appendix

    Journal ref: Machine Learning: Science and Technology 4 015019 (2023)

  19. arXiv:2202.03295  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Theoretical characterization of uncertainty in high-dimensional linear classification

    Authors: Lucas Clarté, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: Being able to reliably assess not only the accuracy but also the uncertainty of models' predictions is an important endeavour in modern machine learning. Even if the model generating the data and labels is known, computing the intrinsic uncertainty after learning the model from a limited number of samples amounts to sampling the corresponding posterior probability measure. Such sampl…

    Submitted 14 November, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Journal ref: Mach. Learn.: Sci. Technol. 4 025029 (2023)

  20. arXiv:2202.00293  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

    Authors: Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connect…

    Submitted 14 June, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: 20 pages

    Journal ref: Advances in Neural Information Processing Systems (2022), vol 35, pages 23244--23255

  21. Error Scaling Laws for Kernel Classification under Source and Capacity Conditions

    Authors: Hugo Cui, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the problem of kernel classification. While worst-case bounds on the decay rate of the prediction error with the number of samples are known for some classifiers, they often fail to accurately describe the learning curves of real data sets. In this work, we consider the important class of data sets satisfying the standard source and capacity conditions, comprising a number of real data…

    Submitted 6 September, 2023; v1 submitted 29 January, 2022; originally announced January 2022.

    Journal ref: Mach. Learn.: Sci. Technol. (2023) 4 035033

  22. arXiv:2106.03791  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions

    Authors: Bruno Loureiro, Gabriele Sicuro, Cédric Gerbelot, Alessandro Pacco, Florent Krzakala, Lenka Zdeborová

    Abstract: Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. In this manuscript, we characterise the learning of a mixture of $K$ Gaussians with generic means and covariances via empirical risk minimisation (ERM) with any convex loss and regularisation. In particular, we prove exact asymptotics characterising the ERM…

    Submitted 14 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: 12 pages + 34 pages of Appendix, 10 figures

    Journal ref: Advances in Neural Information Processing Systems 34 (2021): 10144-10157

  23. arXiv:2105.15004  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime

    Authors: Hugo Cui, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: In this manuscript we consider Kernel Ridge Regression (KRR) under the Gaussian design. Exponents for the decay of the excess generalization error of KRR have been reported in various works under the assumption of power-law decay of eigenvalues of the features co-variance. These decays were, however, provided for sizeably different setups, namely in the noiseless case with constant regularization…

    Submitted 15 December, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: 22 pages, 10 figures, 2 tables

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021) vol 34 p10131--10143. J. Stat. Mech. (2022) 114004

  24. arXiv:2105.07416  [pdf, other]

    q-bio.NC cond-mat.stat-mech stat.ML

    Bayesian reconstruction of memories stored in neural networks from their connectivity

    Authors: Sebastian Goldt, Florent Krzakala, Lenka Zdeborová, Nicolas Brunel

    Abstract: The advent of comprehensive synaptic wiring diagrams of large neural circuits has created the field of connectomics and given rise to a number of open research questions. One such question is whether it is possible to reconstruct the information stored in a recurrent network of neurons, given its synaptic connectivity matrix. Here, we address this question by determining when solving such an infer…

    Submitted 29 August, 2022; v1 submitted 16 May, 2021; originally announced May 2021.

    Comments: Code available at https://github.com/sgoldt/reconstructing_memories

    Journal ref: PLOS Computational Biology 19(1): e1010813 2023

  25. arXiv:2103.04902  [pdf, other]

    cond-mat.dis-nn cs.LG math.ST stat.ML

    Stochasticity helps to navigate rough landscapes: comparing gradient-descent-based algorithms in the phase retrieval problem

    Authors: Francesca Mignacco, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: In this paper we investigate how gradient-based algorithms such as gradient descent, (multi-pass) stochastic gradient descent, its persistent variant, and the Langevin algorithm navigate non-convex loss-landscapes and which of them is able to reach the best generalization error at limited sample complexity. We consider the loss landscape of the high-dimensional phase retrieval problem as a prototy…

    Submitted 13 April, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

    Comments: 28 pages, 11 figures

    Journal ref: Mach. Learn.: Sci. Technol. 2 035029 (2021)

  26. arXiv:2102.11742  [pdf, other]

    cs.LG cond-mat.dis-nn cond-mat.stat-mech stat.ML

    Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed

    Authors: Maria Refinetti, Sebastian Goldt, Florent Krzakala, Lenka Zdeborová

    Abstract: A recent series of theoretical works showed that the dynamics of neural networks with a certain initialisation are well-captured by kernel methods. Concurrent empirical work demonstrated that kernel methods can come close to the performance of neural networks on some image classification tasks. These results raise the question of whether neural networks only learn successfully if kernels also lear…

    Submitted 10 June, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: The accompanying code for this paper is available at https://github.com/mariaref/rfvs2lnn_GMM_online

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  27. arXiv:2102.08127  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Learning curves of generic features maps for realistic datasets with a teacher-student model

    Authors: Bruno Loureiro, Cédric Gerbelot, Hugo Cui, Sebastian Goldt, Florent Krzakala, Marc Mézard, Lenka Zdeborová

    Abstract: Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. The assumptions of Gaussian i.i.d. input data underlying the canonical teacher-student model may, however, be perceived as too restrictive to capture the behaviour of realistic data sets. In this paper, we introduce a Gaussian covariate generalis…

    Submitted 14 December, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: v3: NeurIPS camera-ready

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021), vol 34 p18137--18151. J. Stat. Mech. (2022) 114001

  28. arXiv:2006.15459  [pdf, other]

    cs.LG stat.ML

    Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

    Authors: Stefano Sarao Mannelli, Eric Vanden-Eijnden, Lenka Zdeborová

    Abstract: We study the dynamics of optimization and the generalization properties of one-hidden layer neural networks with quadratic activation function in the over-parametrized regime where the layer width $m$ is larger than the input dimension $d$. We consider a teacher-student scenario where the teacher has the same structure as the student with a hidden layer of smaller width $m^*\le m$. We describe…

    Submitted 18 August, 2020; v1 submitted 27 June, 2020; originally announced June 2020.

    Comments: 10 pages, 4 figures + appendix

    Journal ref: Advances in Neural Information Processing Systems, v33, page 13445--13455, 2020

  29. arXiv:2006.14709  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    The Gaussian equivalence of generative models for learning with shallow neural networks

    Authors: Sebastian Goldt, Bruno Loureiro, Galen Reeves, Florent Krzakala, Marc Mézard, Lenka Zdeborová

    Abstract: Understanding the impact of data structure on the computational tractability of learning is a key challenge for the theory of neural networks. Many theoretical works do not explicitly model training data, or assume that inputs are drawn component-wise independently from some simple probability distribution. Here, we go beyond this simple paradigm by studying the performance of neural networks trai…

    Submitted 21 May, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: The accompanying code for this paper is available at https://github.com/sgoldt/gaussian-equiv-2layer

    Journal ref: Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, PMLR 145:426-471 (2021)

  30. arXiv:2006.06997  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval

    Authors: Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: Despite the widespread use of gradient-based algorithms for optimizing high-dimensional non-convex functions, understanding their ability of finding good minima instead of being trapped in spurious ones remains to a large extent an open problem. Here we focus on gradient flow dynamics for phase retrieval from random measurements. When the ratio of the number of measurements over the input dimensio…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: 9 pages, 5 figures + appendix

    Journal ref: Advances in Neural Information Processing Systems, v33, page 3265--327, 2020

  31. arXiv:2006.06560  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.ST

    Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

    Authors: Benjamin Aubin, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

    Abstract: We consider a commonly studied supervised classification of a synthetic dataset whose labels are generated by feeding a one-layer neural network with random iid inputs. We study the generalization performances of standard classifiers in the high-dimensional regime where $\alpha=n/d$ is kept finite in the limit of a high dimension $d$ and number of samples $n$. Our contribution is three-fold: First, we…

    Submitted 7 November, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: 11 pages + 45 pages Supplementary Material / 5 figures, v2 revised and accepted at NeurIPS

    Journal ref: Advances in Neural Information Processing Systems, v33, pages 12199--12210, 2020

  32. arXiv:2006.06098  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

    Authors: Francesca Mignacco, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture where each cluster is assigned one of two labels. This problem provides a prototype of a non-convex loss landscape with interpolating regimes and a large generalization gap. We define a particular stochastic process for which SGD c…

    Submitted 9 November, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: 8 pages + appendix, 4 figures

    Journal ref: J. Stat. Mech. 2021 124008 & NeurIPS 2020

  33. arXiv:2004.01571  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG eess.SP math.ST stat.CO

    Tree-AMP: Compositional Inference with Tree Approximate Message Passing

    Authors: Antoine Baker, Benjamin Aubin, Florent Krzakala, Lenka Zdeborová

    Abstract: We introduce Tree-AMP, standing for Tree Approximate Message Passing, a python package for compositional inference in high-dimensional tree-structured models. The package provides a unifying framework to study several approximate message passing algorithms previously derived for a variety of machine learning tasks such as generalized linear models, inference in multi-layer networks, matrix factori…

    Submitted 11 December, 2021; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: Source code available at https://github.com/sphinxteam/tramp and documentation at https://sphinxteam.github.io/tramp.docs

    Journal ref: Journal of Machine Learning Research 24 (2023) 1-89

  34. arXiv:2002.11544  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.ST

    The role of regularization in classification of high-dimensional noisy Gaussian mixture

    Authors: Francesca Mignacco, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

    Abstract: We consider a high-dimensional mixture of two Gaussians in the noisy regime where even an oracle knowing the centers of the clusters misclassifies a small but finite fraction of the points. We provide a rigorous analysis of the generalization error of regularized convex classifiers, including ridge, hinge and logistic regression, in the high-dimensional limit where the number $n$ of samples and th…

    Submitted 26 February, 2020; originally announced February 2020.

    Comments: 8 pages + appendix, 6 figures

    Journal ref: International Conference on Machine Learning, ICML 2020

  35. arXiv:2002.09339  [pdf, other]

    math.ST cs.LG math.PR stat.ML

    Generalisation error in learning with random features and the hidden manifold model

    Authors: Federica Gerace, Bruno Loureiro, Florent Krzakala, Marc Mézard, Lenka Zdeborová

    Abstract: We study generalised linear regression and classification for a synthetically generated dataset encompassing different problems of interest, such as learning with random features, neural networks in the lazy training regime, and the hidden manifold model. We consider the high-dimensional regime and using the replica method from statistical physics, we provide a closed-form expression for the asymp…

    Submitted 20 August, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: v2: ICML 2020 camera-ready

    Journal ref: J. Stat. Mech. 2021 124013 & ICML 2020

  36. arXiv:2001.00479  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Thresholds of descending algorithms in inference problems

    Authors: Stefano Sarao Mannelli, Lenka Zdeborova

    Abstract: We review recent works on analyzing the dynamics of gradient-based algorithms in a prototypical statistical inference problem. Using methods and insights from the physics of glassy systems, these works showed how to understand quantitatively and qualitatively the performance of gradient-based algorithms. Here we review the key results and their interpretation in non-technical terms accessible to a…

    Submitted 4 January, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

    Comments: 8 pages, 4 figures

    Journal ref: J. Stat. Mech. (2020) 034004

  37. arXiv:1912.02729  [pdf, ps, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.LG stat.ML

    Rademacher complexity and spin glasses: A link between the replica and statistical theories of learning

    Authors: Alia Abbara, Benjamin Aubin, Florent Krzakala, Lenka Zdeborová

    Abstract: Statistical learning theory provides bounds of the generalization gap, using in particular the Vapnik-Chervonenkis dimension and the Rademacher complexity. An alternative approach, mainly studied in the statistical physics literature, is the study of generalization in simple synthetic-data models. Here we discuss the connections between these approaches and focus on the link between the Rademacher…

    Submitted 15 June, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: 15 + 10 pages, v2 revised and accepted at MSML

    Journal ref: Proceedings of The First Mathematical and Scientific Machine Learning Conference, PMLR 107:27-54, 2020

  38. arXiv:1912.02008  [pdf, other]

    math.ST cond-mat.dis-nn cs.LG eess.SP stat.ML

    Exact asymptotics for phase retrieval and compressed sensing with random generative priors

    Authors: Benjamin Aubin, Bruno Loureiro, Antoine Baker, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the problem of compressed sensing and of (real-valued) phase retrieval with random measurement matrix. We derive sharp asymptotics for the information-theoretically optimal performance and for the best known polynomial algorithm for an ensemble of generative priors consisting of fully connected deep neural networks with random weight matrices and arbitrary activations. We compare the p…

    Submitted 12 June, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: 13+3 pages, 7 figures, v2 revised and accepted at MSML

    Journal ref: Proceedings of The First Mathematical and Scientific Machine Learning Conference, PMLR 107:55-73, 2020

  39. arXiv:1909.11500  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    Modelling the influence of data structure on learning in neural networks: the hidden manifold model

    Authors: Sebastian Goldt, Marc Mézard, Florent Krzakala, Lenka Zdeborová

    Abstract: Understanding the reasons for the success of deep neural networks trained using stochastic gradient-based methods is a key open problem for the nascent theory of deep learning. The types of data where these networks are most successful, such as images or sequences of speech, are characterised by intricate correlations. Yet, most theoretical work on neural networks does not explicitly model trainin…

    Submitted 3 December, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

    Journal ref: Physical Review X, Vol. 10, No. 4 (2020)

  40. arXiv:1907.08226  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Who is Afraid of Big Bad Minima? Analysis of Gradient-Flow in a Spiked Matrix-Tensor Model

    Authors: Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Lenka Zdeborová

    Abstract: Gradient-based algorithms are effective for many machine learning tasks, but despite ample recent effort and some progress, it often remains unclear why they work in practice in optimising high-dimensional non-convex functions and why they find good minima instead of being trapped in spurious ones. Here we present a quantitative theory explaining this behaviour in a spiked matrix-tensor model.…

    Submitted 20 January, 2020; v1 submitted 18 July, 2019; originally announced July 2019.

    Comments: 9 pages, 4 figures + appendix. Appears in Proceedings of the Advances in Neural Information Processing Systems 2019 (NeurIPS 2019)

    Journal ref: Advances in Neural Information Processing Systems, pp. 8676-8686. 2019

  41. arXiv:1906.08632  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup

    Authors: Sebastian Goldt, Madhu S. Advani, Andrew M. Saxe, Florent Krzakala, Lenka Zdeborová

    Abstract: Deep neural networks achieve stellar generalisation even when they have enough parameters to easily fit all their training data. We study this phenomenon by analysing the dynamics and the performance of over-parameterised two-layer neural networks in the teacher-student setup, where one network, the student, is trained on data generated by another network, called the teacher. We show how the dynam…

    Submitted 27 October, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: 9 pages + references + supplemental material. Oral presentation at NeurIPS 2019. arXiv admin note: substantial text overlap with arXiv:1901.09085

    Journal ref: J. Stat. Mech. 2020 124010 & NeurIPS 2019

  42. arXiv:1906.04735  [pdf, other]

    stat.ML cs.IT cs.LG eess.SP math.ST

    On the Universality of Noiseless Linear Estimation with Respect to the Measurement Matrix

    Authors: Alia Abbara, Antoine Baker, Florent Krzakala, Lenka Zdeborová

    Abstract: In a noiseless linear estimation problem, one aims to reconstruct a vector x* from the knowledge of its linear projections y=Phi x*. There have been many theoretical works concentrating on the case where the matrix Phi is a random i.i.d. one, but a number of heuristic evidence suggests that many of these results are universal and extend well beyond this restricted case. Here we revisit this proble…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: 13 pages, 4 figures

    Journal ref: Journal of Physics A: Mathematical and Theoretical (2019)

  43. arXiv:1905.12385  [pdf, other]

    math.ST cs.LG eess.SP math.PR stat.ML

    The spiked matrix model with generative priors

    Authors: Benjamin Aubin, Bruno Loureiro, Antoine Maillard, Florent Krzakala, Lenka Zdeborová

    Abstract: Using a low-dimensional parametrization of signals is a generic and powerful way to enhance performance in signal processing and statistical inference. A very popular and widely explored type of dimensionality reduction is sparsity; another type is generative modelling of signal distributions. Generative models based on neural networks, such as GANs or variational auto-encoders, are particularly p…

    Submitted 30 May, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: 12 + 56, 8 figures, v2 lighter jpeg figures

    Journal ref: Advances in Neural Information Processing Systems, pp. 8364-8375. 2019

  44. arXiv:1902.00139  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Passed & Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models

    Authors: Stefano Sarao Mannelli, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: In this work we analyse quantitatively the interplay between the loss landscape and performance of descent algorithms in a prototypical inference problem, the spiked matrix-tensor model. We study a loss function that is the negative log-likelihood of the model. We analyse the number of local minima at a fixed distance from the signal/spike with the Kac-Rice formula, and locate trivialization of th…

    Submitted 20 January, 2020; v1 submitted 31 January, 2019; originally announced February 2019.

    Comments: 12 pages + appendix, 10 figures. Appears in Proceedings of the International Conference on Machine Learning (ICML 2019)

    Journal ref: International Conference on Machine Learning, 4333-4342 (ICML 2019)

  45. arXiv:1901.09085  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    Generalisation dynamics of online learning in over-parameterised neural networks

    Authors: Sebastian Goldt, Madhu S. Advani, Andrew M. Saxe, Florent Krzakala, Lenka Zdeborová

    Abstract: Deep neural networks achieve stellar generalisation on a variety of problems, despite often being large enough to easily fit all their training data. Here we study the generalisation dynamics of two-layer neural networks in a teacher-student setup, where one network, the student, is trained using stochastic gradient descent (SGD) on data generated by another network, called the teacher. We show ho…

    Submitted 25 January, 2019; originally announced January 2019.

    Comments: 25 pages, 13 figures

    Journal ref: Presented at the ICML 2019 Workshop on Theoretical Physics for Deep Learning

  46. arXiv:1812.09066  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Marvels and Pitfalls of the Langevin Algorithm in Noisy High-dimensional Inference

    Authors: Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: Gradient-descent-based algorithms and their stochastic versions have widespread applications in machine learning and statistical inference. In this work we perform an analytic study of the performances of one of them, the Langevin algorithm, in the context of noisy high-dimensional inference. We employ the Langevin algorithm to sample the posterior probability measure for the spiked matrix-tensor…

    Submitted 13 January, 2020; v1 submitted 21 December, 2018; originally announced December 2018.

    Comments: 11 pages and 5 figures + appendix

    Journal ref: Phys. Rev. X 10, 011057 (2020)

  47. arXiv:1809.06304  [pdf, other]

    stat.ML cs.IT cs.LG

    Approximate message-passing for convex optimization with non-separable penalties

    Authors: Andre Manoel, Florent Krzakala, Gaël Varoquaux, Bertrand Thirion, Lenka Zdeborová

    Abstract: We introduce an iterative optimization scheme for convex objectives consisting of a linear loss and a non-separable penalty, based on the expectation-consistent approximation and the vector approximate message-passing (VAMP) algorithm. Specifically, the penalties we approach are convex on a linear transformation of the variable to be determined, a notable example being total variation (TV). We des…

    Submitted 17 September, 2018; originally announced September 2018.

    Comments: 18 pages, 6 figures

  48. arXiv:1807.01296  [pdf, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.IT math.ST stat.ML

    Approximate Survey Propagation for Statistical Inference

    Authors: Fabrizio Antenucci, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: Approximate message passing algorithm enjoyed considerable attention in the last decade. In this paper we introduce a variant of the AMP algorithm that takes into account glassy nature of the system under consideration. We coin this algorithm as the approximate survey propagation (ASP) and derive it for a class of low-rank matrix estimation problems. We derive the state evolution for the ASP algor…

    Submitted 3 July, 2018; originally announced July 2018.

    Comments: 37 pages, 14 figures

    Journal ref: J. Stat. Mech. (2019) 023401

  49. arXiv:1806.05451  [pdf, other]

    cs.LG cond-mat.dis-nn cond-mat.stat-mech physics.comp-ph stat.ML

    The committee machine: Computational to statistical gaps in learning a two-layers neural network

    Authors: Benjamin Aubin, Antoine Maillard, Jean Barbier, Florent Krzakala, Nicolas Macris, Lenka Zdeborová

    Abstract: Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layers neural network model called the committee machine. We also introduce a version of…

    Submitted 29 February, 2024; v1 submitted 14 June, 2018; originally announced June 2018.

    Comments: 18 pages + supplementary material, 3 figures. (v2: update to match the published version ; v3: clarification of the caption of Fig. 3)

    Journal ref: J. Stat. Mech. (2019) 124023 & NeurIPS 2018

  50. arXiv:1805.09785  [pdf, other]

    cs.LG cond-mat.dis-nn cs.IT stat.ML

    Entropy and mutual information in models of deep neural networks

    Authors: Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, Nicolas Macris, Florent Krzakala, Lenka Zdeborová

    Abstract: We examine a class of deep learning models with a tractable method to compute information-theoretic quantities. Our contributions are three-fold: (i) We show how entropies and mutual informations can be derived from heuristic statistical physics methods, under the assumption that weight matrices are independent and orthogonally-invariant. (ii) We extend particular cases in which this result is kno…

    Submitted 29 October, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

    Journal ref: J. Stat. Mech. (2019) 124014 & NeurIPS 2018