Showing 1–50 of 150 results for author: Zdeborova, L

Searching in archive cs.
  1. arXiv:2407.03522  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Optimal thresholds and algorithms for a model of multi-modal learning in high dimensions

    Authors: Christian Keup, Lenka Zdeborová

    Abstract: This work explores multi-modal inference in a high-dimensional simplified model, analytically quantifying the performance gain of multi-modal inference over that of analyzing modalities in isolation. We present the Bayes-optimal performance and weak recovery thresholds in a model where the objective is to recover the latent structures from two noisy data matrices with correlated spikes. The paper…

    Submitted 3 July, 2024; originally announced July 2024.

  2. arXiv:2405.15480  [pdf, other]

    cs.LG cond-mat.dis-nn cs.CC

    Fundamental limits of weak learnability in high-dimensional multi-index models

    Authors: Emanuele Troiani, Yatin Dandi, Leonardo Defilippis, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala

    Abstract: Multi-index models -- functions which only depend on the covariates through a non-linear transformation of their projection on a subspace -- are a useful benchmark for investigating feature learning with neural networks. This paper examines the theoretical boundaries of learnability in this hypothesis class, focusing particularly on the minimum sample complexity required for weakly recovering thei…

    Submitted 24 May, 2024; originally announced May 2024.

  3. arXiv:2405.10763  [pdf, other]

    cond-mat.dis-nn cs.DM math.OC stat.CO

    Integer Traffic Assignment Problem: Algorithms and Insights on Random Graphs

    Authors: Rayan Harfouche, Giovanni Piccioli, Lenka Zdeborová

    Abstract: Path optimization is a fundamental concern across various real-world scenarios, ranging from traffic congestion issues to efficient data routing over the internet. The Traffic Assignment Problem (TAP) is a classic continuous optimization problem in this field. This study considers the Integer Traffic Assignment Problem (ITAP), a discrete variant of TAP. ITAP involves determining optimal routes for…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 37 pages, 15 figures

  4. arXiv:2403.04234  [pdf, other]

    stat.ML cs.LG

    Fundamental limits of Non-Linear Low-Rank Matrix Estimation

    Authors: Pierre Mergny, Justin Ko, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the task of estimating a low-rank matrix from non-linear and noisy observations. We prove a strong universality result showing that Bayes-optimal performances are characterized by an equivalent Gaussian model with an effective prior, whose parameters are entirely determined by an expansion of the non-linear function. In particular, we show that to reconstruct the signal accurately, one…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 42 pages, 2 figures

  5. arXiv:2402.13622  [pdf, ps, other]

    stat.ML cond-mat.dis-nn cs.LG

    Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression

    Authors: Lucas Clarté, Adrien Vandenbroucque, Guillaume Dalle, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight asymptotic description of the biases and variances estimated by these methods in the context of generalized linear models, such as ridge and logistic regression, ta…

    Submitted 21 February, 2024; originally announced February 2024.

  6. arXiv:2402.04980  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotics of feature learning in two-layer networks after one gradient-step

    Authors: Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro

    Abstract: In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we model the trained network by a spiked Random Features (sRF) model. Further building on recent progress on Gaussian universality (Dandi et al., 2023), w…

    Submitted 4 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  7. arXiv:2402.03902  [pdf, other]

    cs.LG

    A phase transition between positional and semantic learning in a solvable model of dot-product attention

    Authors: Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová

    Abstract: We investigate how a dot-product attention layer learns a positional attention matrix (with tokens attending to each other based on their respective positions) and a semantic attention matrix (with tokens attending to each other based on their meaning). For an algorithmic task, we experimentally show how the same simple architecture can learn to implement a solution using either the positional or…

    Submitted 6 February, 2024; originally announced February 2024.

  8. arXiv:2402.03818  [pdf, other]

    cs.LG cond-mat.dis-nn

    Asymptotic generalization error of a single-layer graph convolutional network

    Authors: O. Duranthon, L. Zdeborová

    Abstract: While graph convolutional networks show great practical promises, the theoretical understanding of their generalization properties as a function of the number of samples is still in its infancy compared to the more broadly studied case of supervised fully connected neural networks. In this article, we predict the performances of a single-layer graph convolutional network (GCN) trained on data prod…

    Submitted 20 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  9. arXiv:2402.03220  [pdf, other]

    stat.ML cs.LG

    The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents

    Authors: Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborová, Florent Krzakala

    Abstract: We investigate the training dynamics of two-layer neural networks when learning multi-index target functions. We focus on multi-pass gradient descent (GD) that reuses the batches multiple times and show that it significantly changes the conclusion about which functions are learnable compared to single-pass gradient descent. In particular, multi-pass GD with finite stepsize is found to overcome the…

    Submitted 30 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted at the International Conference on Machine Learning (ICML), 2024

  10. arXiv:2310.03575  [pdf, other]

    stat.ML cs.LG

    Analysis of learning a flow-based generative model from limited sample complexity

    Authors: Hugo Cui, Florent Krzakala, Eric Vanden-Eijnden, Lenka Zdeborová

    Abstract: We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from th…

    Submitted 25 June, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

  11. arXiv:2308.14085  [pdf, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.LG

    Sampling with flows, diffusion and autoregressive neural networks: A spin-glass perspective

    Authors: Davide Ghio, Yatin Dandi, Florent Krzakala, Lenka Zdeborová

    Abstract: Recent years witnessed the development of powerful generative models based on flows, diffusion or autoregressive neural networks, achieving remarkable success in generating data from examples with applications in a broad range of areas. A theoretical analysis of the performance and understanding of the limitations of these methods remain, however, challenging. In this paper, we undertake a step in…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: 39 pages, 12 figures

  12. arXiv:2306.07948  [pdf, other]

    cs.SI cs.LG

    Optimal Inference in Contextual Stochastic Block Models

    Authors: O. Duranthon, L. Zdeborová

    Abstract: The contextual stochastic block model (cSBM) was proposed for unsupervised community detection on attributed graphs where both the graph and the high-dimensional node information correlate with node labels. In the context of machine learning on graphs, the cSBM has been widely used as a synthetic dataset for evaluating the performance of graph-neural networks (GNNs) for semi-supervised node classi…

    Submitted 5 March, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

    Journal ref: TMLR 2024

  13. arXiv:2306.02729  [pdf, other]

    cs.LG stat.ML

    Gibbs Sampling the Posterior of Neural Networks

    Authors: Giovanni Piccioli, Emanuele Troiani, Lenka Zdeborová

    Abstract: In this paper, we study sampling from a posterior derived from a neural network. We propose a new probabilistic model consisting of adding noise at every pre- and post-activation in the network, arguing that the resulting posterior can be sampled using an efficient Gibbs sampler. For small models, the Gibbs sampler attains similar performances as the state-of-the-art Markov chain Monte Carlo (MCMC…

    Submitted 11 January, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

  14. arXiv:2305.11041  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    High-dimensional Asymptotics of Denoising Autoencoders

    Authors: Hugo Cui, Lenka Zdeborová

    Abstract: We address the problem of denoising data from a Gaussian mixture using a two-layer non-linear autoencoder with tied weights and a skip connection. We consider the high-dimensional limit where the number of training samples and the input dimension jointly tend to infinity while the number of hidden units remains bounded. We provide closed-form expressions for the denoising mean-squared test error.…

    Submitted 18 May, 2023; originally announced May 2023.

  15. arXiv:2304.12127  [pdf, other]

    cs.IT cond-mat.dis-nn cond-mat.stat-mech

    Compressed sensing with l0-norm: statistical physics analysis and algorithms for signal recovery

    Authors: D. Barbier, C. Lucibello, L. Saglietti, F. Krzakala, L. Zdeborova

    Abstract: Noiseless compressive sensing is a protocol that enables undersampling and later recovery of a signal without loss of information. This compression is possible because the signal is usually sufficiently sparse in a given basis. Currently, the algorithm offering the best tradeoff between compression rate, robustness, and speed for compressive sensing is the LASSO (l1-norm bias) algorithm. However,…

    Submitted 24 April, 2023; originally announced April 2023.

    Journal ref: Proceedings of ITW 2023

  16. arXiv:2303.09995  [pdf, other]

    cond-mat.dis-nn cs.SI stat.ML

    Neural-prior stochastic block model

    Authors: O. Duranthon, L. Zdeborová

    Abstract: The stochastic block model (SBM) is widely studied as a benchmark for graph clustering aka community detection. In practice, graph data often come with node attributes that bear additional information about the communities. Previous works modeled such data by considering that the node attributes are generated from the node community memberships. In this work, motivated by a recent surge of works i…

    Submitted 6 September, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

    Journal ref: Mach. Learn.: Sci. Technol. 4 035017 (2023)

  17. arXiv:2303.05237  [pdf, other]

    cond-mat.dis-nn cs.IT

    Statistical mechanics of the maximum-average submatrix problem

    Authors: Vittorio Erba, Florent Krzakala, Rodrigo Pérez, Lenka Zdeborová

    Abstract: We study the maximum-average submatrix problem, in which given an $N \times N$ matrix $J$ one needs to find the $k \times k$ submatrix with the largest average of entries. We study the problem for random matrices $J$ whose entries are i.i.d. random variables by mapping it to a variant of the Sherrington-Kirkpatrick spin-glass model at fixed magnetization. We characterize analytically the phase dia…

    Submitted 21 September, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

    Journal ref: J. Stat. Mech. (2024) 013403

  18. arXiv:2303.02644  [pdf, other]

    cs.LG stat.ML

    Expectation consistency for calibration of neural networks

    Authors: Lucas Clarté, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: Despite their incredible performance, it is well reported that deep neural networks tend to be overoptimistic about their prediction confidence. Finding effective and efficient calibration methods for neural networks is therefore an important endeavour towards better uncertainty quantification in deep learning. In this manuscript, we introduce a novel calibration technique named expectation consis…

    Submitted 4 August, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Journal ref: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:443-453, 2023

  19. arXiv:2302.00375  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Bayes-optimal Learning of Deep Random Networks of Extensive-width

    Authors: Hugo Cui, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights. We consider the asymptotic limit where the number of samples, the input dimension and the network width are proportionally large. We propose a closed-form expression for the Bayes-optimal test error, for regression and classification tasks. We furt…

    Submitted 21 June, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:6468-6521, 2023

  20. arXiv:2210.12760  [pdf, ps, other]

    cs.LG

    On double-descent in uncertainty quantification in overparametrized models

    Authors: Lucas Clarté, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: Uncertainty quantification is a central challenge in reliable and trustworthy machine learning. Naive measures such as last-layer scores are well-known to yield overconfident estimates in the context of overparametrized neural networks. Several methods, ranging from temperature scaling to different Bayesian treatments of neural networks, have been proposed to mitigate overconfidence, most often su…

    Submitted 23 May, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics (2023), PMLR 206:7089-7125

  21. arXiv:2210.08312  [pdf, other]

    cond-mat.dis-nn cs.CC math.PR math.ST

    Disordered Systems Insights on Computational Hardness

    Authors: David Gamarnik, Cristopher Moore, Lenka Zdeborová

    Abstract: In this review article, we discuss connections between the physics of disordered systems, phase transitions in inference problems, and computational hardness. We introduce two models representing the behavior of glassy systems, the spiked tensor model and the generalized linear model. We discuss the random (non-planted) versions of these problems as prototypical optimization problems, as well as t…

    Submitted 18 October, 2022; v1 submitted 15 October, 2022; originally announced October 2022.

    Comments: 42 pages

    Journal ref: J. Stat. Mech. (2022) 114015

  22. arXiv:2210.06591  [pdf, other]

    math-ph cs.IT cs.LG stat.ML

    Rigorous dynamical mean field theory for stochastic gradient descent methods

    Authors: Cedric Gerbelot, Emanuele Troiani, Francesca Mignacco, Florent Krzakala, Lenka Zdeborova

    Abstract: We prove closed-form equations for the exact high-dimensional asymptotics of a family of first order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match th…

    Submitted 29 November, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: 40 pages, 4 figures

  23. arXiv:2209.03423  [pdf, other]

    cond-mat.dis-nn cs.DM cs.IT math.PR math.ST

    Planted matching problems on random hypergraphs

    Authors: Urte Adomaityte, Anshul Toshniwal, Gabriele Sicuro, Lenka Zdeborová

    Abstract: We consider the problem of inferring a matching hidden in a weighted random $k$-hypergraph. We assume that the hyperedges' weights are random and distributed according to two different densities conditioning on the fact that they belong to the hidden matching, or not. We show that, for $k>2$ and in the large graph size limit, an algorithmic first order transition in the signal strength separates a…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: 13 pages, 12 figures

    Journal ref: Phys. Rev. E 106, 054302 (2022)

  24. arXiv:2208.06488  [pdf, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.IR math.PR stat.CO

    The planted XY model: thermodynamics and inference

    Authors: Siyu Chen, Guanhao Huang, Giovanni Piccioli, Lenka Zdeborová

    Abstract: In this paper we study a fully connected planted spin glass named the planted XY model. Motivation for studying this system comes both from the spin glass field and the one of statistical inference where it models the angular synchronization problem. We derive the replica symmetric (RS) phase diagram in the temperature, ferromagnetic bias plane using the approximate message passing (AMP) algorithm…

    Submitted 11 January, 2024; v1 submitted 12 August, 2022; originally announced August 2022.

    Comments: 29 pages, 8 figures

    Journal ref: Phys. Rev. E 106, 054115 (2022)

  25. arXiv:2205.13527  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap

    Authors: Luca Pesce, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model where the cluster means are sparse vectors. Here we provide an exact asymptotic characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity, i.e. when the fraction of non-zero components of the cluster means $ρ$, as well as the r…

    Submitted 1 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: NeurIPS camera-ready version

    Journal ref: Advances in Neural Information Processing Systems (2022), vol 35, pages 27087--27099

  26. arXiv:2205.13503  [pdf, other]

    cs.IT

    Multi-layer State Evolution Under Random Convolutional Design

    Authors: Max Daniels, Cédric Gerbelot, Florent Krzakala, Lenka Zdeborová

    Abstract: Signal recovery under generative neural network priors has emerged as a promising direction in statistical inference and computational imaging. Theoretical analysis of reconstruction algorithms under generative priors is, however, challenging. For generative priors with fully connected layers and Gaussian i.i.d. weights, this was achieved by the multi-layer approximate message (ML-AMP) algorithm v…

    Submitted 12 October, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: Accepted to NeurIPS 2022

    Journal ref: Advances in Neural Information Processing Systems (2022), vol 35, pages 7089--7102

  27. arXiv:2205.13303  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Gaussian Universality of Perceptrons with Random Labels

    Authors: Federica Gerace, Florent Krzakala, Bruno Loureiro, Ludovic Stephan, Lenka Zdeborová

    Abstract: While classical in many theoretical settings - and in particular in statistical physics-inspired works - the assumption of Gaussian i.i.d. input data is often perceived as a strong limitation in the context of statistics and machine learning. In this study, we redeem this line of work in the case of generalized linear classification, a.k.a. the perceptron model, with random labels. We argue that t…

    Submitted 2 March, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

  28. arXiv:2203.12094  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Learning curves for the multi-class teacher-student perceptron

    Authors: Elisabetta Cornacchia, Francesca Mignacco, Rodrigo Veiga, Cédric Gerbelot, Bruno Loureiro, Lenka Zdeborová

    Abstract: One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with the single-layer teacher-student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal estimation and empirical risk minimisation (ERM) were extensively analysed for this setting. At the same time, a considerable part of modern machin…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: 14 pages + appendix

    Journal ref: Machine Learning: Science and Technology 4 015019 (2022)

  29. arXiv:2203.07752  [pdf, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.IT math.PR

    Optimal denoising of rotationally invariant rectangular matrices

    Authors: Emanuele Troiani, Vittorio Erba, Florent Krzakala, Antoine Maillard, Lenka Zdeborová

    Abstract: In this manuscript we consider denoising of large rectangular matrices: given a noisy observation of a signal matrix, what is the best way of recovering the signal matrix itself? For Gaussian noise and rotationally-invariant signal priors, we completely characterize the optimal denoiser and its performance in the high-dimensional limit, in which the size of the signal matrix goes to infinity with…

    Submitted 15 March, 2022; originally announced March 2022.

    Journal ref: Proceedings of Mathematical and Scientific Machine Learning (MSML), PMLR 190:97-112, 2022

  30. arXiv:2202.10379  [pdf, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.DM math.PR

    (Dis)assortative Partitions on Random Regular Graphs

    Authors: Freya Behrens, Gabriel Arpino, Yaroslav Kivva, Lenka Zdeborová

    Abstract: We study the problem of assortative and disassortative partitions on random $d$-regular graphs. Nodes in the graph are partitioned into two non-empty groups. In the assortative partition every node requires at least $H$ of their neighbors to be in their own group. In the disassortative partition they require less than $H$ neighbors to be in their own group. Using the cavity method based on analysi…

    Submitted 2 May, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

    Comments: 21 pages; Corrected usage of the word "planted" in Section 4

    Journal ref: J. Phys. A: Math. Theor. 55 395004 (2022)

  31. arXiv:2202.03295  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Theoretical characterization of uncertainty in high-dimensional linear classification

    Authors: Lucas Clarté, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: Being able to reliably assess not only the accuracy but also the uncertainty of models' predictions is an important endeavour in modern machine learning. Even if the model generating the data and labels is known, computing the intrinsic uncertainty after learning the model from a limited number of samples amounts to sampling the corresponding posterior probability measure. Such sampl…

    Submitted 14 November, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Journal ref: Mach. Learn.: Sci. Technol. 4 025029 (2023)

  32. arXiv:2202.00293  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

    Authors: Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connect…

    Submitted 14 June, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: 20 pages

    Journal ref: Advances in Neural Information Processing Systems (2022), vol 35, pages 23244--23255

  33. Error Scaling Laws for Kernel Classification under Source and Capacity Conditions

    Authors: Hugo Cui, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the problem of kernel classification. While worst-case bounds on the decay rate of the prediction error with the number of samples are known for some classifiers, they often fail to accurately describe the learning curves of real data sets. In this work, we consider the important class of data sets satisfying the standard source and capacity conditions, comprising a number of real data…

    Submitted 6 September, 2023; v1 submitted 29 January, 2022; originally announced January 2022.

    Journal ref: Mach. Learn.: Sci. Technol. (2023) 4 035033

  34. arXiv:2112.13079  [pdf, other]

    cs.IT cond-mat.dis-nn cs.DM math.PR

    Aligning random graphs with a sub-tree similarity message-passing algorithm

    Authors: Giovanni Piccioli, Guilhem Semerjian, Gabriele Sicuro, Lenka Zdeborová

    Abstract: The problem of aligning Erdös-Rényi random graphs is a noisy, average-case version of the graph isomorphism problem, in which a pair of correlated random graphs is observed through a random permutation of their vertices. We study a polynomial time message-passing algorithm devised to solve the inference problem of partially recovering the hidden permutation, in the sparse regime with constant aver…

    Submitted 4 May, 2022; v1 submitted 24 December, 2021; originally announced December 2021.

    Comments: 36 pages, 14 figures, submitted to Journal of Statistical Mechanics: Theory and Experiment. Corrected typos. Modified Figure 1 for clarity. Added references' titles in bibliography. Added definition of "quasi-aligned". Added clarifications about the significance of Nishimori experiments

    Journal ref: J. Stat. Mech. (2022) 063401

  35. arXiv:2110.08775  [pdf, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.IT math.PR

    Perturbative construction of mean-field equations in extensive-rank matrix factorization and denoising

    Authors: Antoine Maillard, Florent Krzakala, Marc Mézard, Lenka Zdeborová

    Abstract: Factorization of matrices where the rank of the two factors diverges linearly with their sizes has many applications in diverse areas such as unsupervised representation learning, dictionary learning or sparse coding. We consider a setting where the two factors are generated from known component-wise independent prior distributions, and the statistician observes a (possibly noisy) component-wise f…

    Submitted 8 June, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

    Comments: 30 pages (main text), 25 pages of references and appendices. v2: Adding clarifications and a new result to derive the optimal denoising estimator from the asymptotic free energy. v3: corrections to match the published version

    Journal ref: J. Stat. Mech. (2022) 083301

  36. arXiv:2106.05418  [pdf, other]

    cs.LG cond-mat.dis-nn

    Probing transfer learning with a model of synthetic correlated datasets

    Authors: Federica Gerace, Luca Saglietti, Stefano Sarao Mannelli, Andrew Saxe, Lenka Zdeborová

    Abstract: Transfer learning can significantly improve the sample efficiency of neural networks, by exploiting the relatedness between a data-scarce target task and a data-abundant source task. Despite years of successful applications, transfer learning practice often relies on ad-hoc solutions, while theoretical understanding of these procedures is still limited. In the present work, we re-think a solvable…

    Submitted 2 February, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

    Journal ref: Machine Learning: Science and Technology 3.1 (2022): 015030

  37. arXiv:2106.03791  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions

    Authors: Bruno Loureiro, Gabriele Sicuro, Cédric Gerbelot, Alessandro Pacco, Florent Krzakala, Lenka Zdeborová

    Abstract: Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. In this manuscript, we characterise the learning of a mixture of $K$ Gaussians with generic means and covariances via empirical risk minimisation (ERM) with any convex loss and regularisation. In particular, we prove exact asymptotics characterising the ERM…

    Submitted 14 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: 12 pages + 34 pages of Appendix, 10 figures

    Journal ref: Advances in Neural Information Processing Systems 34 (2021): 10144-10157

  38. arXiv:2105.15004  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime

    Authors: Hugo Cui, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: In this manuscript we consider Kernel Ridge Regression (KRR) under the Gaussian design. Exponents for the decay of the excess generalization error of KRR have been reported in various works under the assumption of power-law decay of eigenvalues of the features co-variance. These decays were, however, provided for sizeably different setups, namely in the noiseless case with constant regularization…

    Submitted 15 December, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: 22 pages, 10 figures, 2 tables

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021) vol 34 p10131--10143. J. Stat. Mech. (2022) 114004

  39. arXiv:2103.04902  [pdf, other]

    cond-mat.dis-nn cs.LG math.ST stat.ML

    Stochasticity helps to navigate rough landscapes: comparing gradient-descent-based algorithms in the phase retrieval problem

    Authors: Francesca Mignacco, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: In this paper we investigate how gradient-based algorithms such as gradient descent, (multi-pass) stochastic gradient descent, its persistent variant, and the Langevin algorithm navigate non-convex loss-landscapes and which of them is able to reach the best generalization error at limited sample complexity. We consider the loss landscape of the high-dimensional phase retrieval problem as a prototy…

    Submitted 13 April, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

    Comments: 28 pages, 11 figures

    Journal ref: Mach. Learn.: Sci. Technol. 2 035029 (2021)

  40. arXiv:2102.11742  [pdf, other]

    cs.LG cond-mat.dis-nn cond-mat.stat-mech stat.ML

    Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed

    Authors: Maria Refinetti, Sebastian Goldt, Florent Krzakala, Lenka Zdeborová

    Abstract: A recent series of theoretical works showed that the dynamics of neural networks with a certain initialisation are well-captured by kernel methods. Concurrent empirical work demonstrated that kernel methods can come close to the performance of neural networks on some image classification tasks. These results raise the question of whether neural networks only learn successfully if kernels also lear…

    Submitted 10 June, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: The accompanying code for this paper is available at https://github.com/mariaref/rfvs2lnn_GMM_online

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  41. arXiv:2102.08127  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Learning curves of generic features maps for realistic datasets with a teacher-student model

    Authors: Bruno Loureiro, Cédric Gerbelot, Hugo Cui, Sebastian Goldt, Florent Krzakala, Marc Mézard, Lenka Zdeborová

    Abstract: Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. The assumptions of Gaussian i.i.d. input data underlying the canonical teacher-student model may, however, be perceived as too restrictive to capture the behaviour of realistic data sets. In this paper, we introduce a Gaussian covariate generalis…

    Submitted 14 December, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: v3: NeurIPS camera-ready

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021), vol 34 p18137--18151. J. Stat. Mech. (2022) 114001

  42. arXiv:2012.04524  [pdf, other]

    cs.IT cond-mat.dis-nn

    Construction of optimal spectral methods in phase retrieval

    Authors: Antoine Maillard, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

    Abstract: We consider the phase retrieval problem, in which the observer wishes to recover an $n$-dimensional real or complex signal $\mathbf{X}^\star$ from the (possibly noisy) observation of $|\mathbf{\Phi} \mathbf{X}^\star|$, in which $\mathbf{\Phi}$ is a matrix of size $m \times n$. We consider a \emph{high-dimensional} setting where $n,m \to \infty$ with $m/n = \mathcal{O}(1)$, and a large class of (possibly corr…

    Submitted 14 October, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: 14 pages + references and appendix. v2: Version updated to match the one accepted at MSML 2021. v3: Adding a reference to a previous work mentioning marginal stability and its connection to Bayes-optimality

    Journal ref: Proceedings of Machine Learning Research vol 145:1-28, 2021 2nd Annual Conference on Mathematical and Scientific Machine Learning (MSML 21)

  43. arXiv:2012.00194  [pdf, ps, other]

    cs.LG cond-mat.dis-nn cs.NE

    Solvable Model for Inheriting the Regularization through Knowledge Distillation

    Authors: Luca Saglietti, Lenka Zdeborová

    Abstract: In recent years the empirical success of transfer learning with neural networks has stimulated an increasing interest in obtaining a theoretical understanding of its core properties. Knowledge distillation where a smaller neural network is trained using the outputs of a larger neural network is a particularly interesting case of transfer learning. In the present work, we introduce a statistical ph…

    Submitted 2 December, 2020; v1 submitted 30 November, 2020; originally announced December 2020.

    Journal ref: Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, PMLR 145:809-846, 2022

  44. arXiv:2010.13700  [pdf, other]

    cond-mat.dis-nn cs.DM math.PR

    The planted $k$-factor problem

    Authors: Gabriele Sicuro, Lenka Zdeborová

    Abstract: We consider the problem of recovering an unknown $k$-factor, hidden in a weighted random graph. For $k=1$ this is the planted matching problem, while the $k=2$ case is closely related to the planted travelling salesman problem. The inference problem is solved by exploiting the information arising from the use of two different distributions for the weights on the edges inside and outside the plante…

    Submitted 8 April, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: 21 pages, 4 figures

    Journal ref: J. Phys. A: Math. Theor. 54 175002 (2021)

  45. arXiv:2009.09422  [pdf, other]

    q-bio.PE cond-mat.stat-mech cs.AI cs.LG

    Epidemic mitigation by statistical inference from contact tracing data

    Authors: Antoine Baker, Indaco Biazzo, Alfredo Braunstein, Giovanni Catania, Luca Dall'Asta, Alessandro Ingrosso, Florent Krzakala, Fabio Mazza, Marc Mézard, Anna Paola Muntoni, Maria Refinetti, Stefano Sarao Mannelli, Lenka Zdeborová

    Abstract: Contact-tracing is an essential tool in order to mitigate the impact of a pandemic such as COVID-19. In order to achieve efficient and scalable contact-tracing in real time, digital devices can play an important role. While a lot of attention has been paid to analyzing the privacy and ethical risks of the associated mobile applications, so far much less research has been devoted to optimizing th…

    Submitted 20 September, 2020; originally announced September 2020.

    Comments: 21 pages, 7 figures

    ACM Class: G.3; G.4; I.2.11; J.3

    Journal ref: PNAS 2021 Vol. 118 No. 32 e2106548118

  46. arXiv:2006.15459  [pdf, other]

    cs.LG stat.ML

    Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

    Authors: Stefano Sarao Mannelli, Eric Vanden-Eijnden, Lenka Zdeborová

    Abstract: We study the dynamics of optimization and the generalization properties of one-hidden layer neural networks with quadratic activation function in the over-parametrized regime where the layer width $m$ is larger than the input dimension $d$. We consider a teacher-student scenario where the teacher has the same structure as the student with a hidden layer of smaller width $m^*\le m$. We describe…

    Submitted 18 August, 2020; v1 submitted 27 June, 2020; originally announced June 2020.

    Comments: 10 pages, 4 figures + appendix

    Journal ref: Advances in Neural Information Processing Systems, v33, page 13445--13455, 2020

  47. arXiv:2006.14709  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    The Gaussian equivalence of generative models for learning with shallow neural networks

    Authors: Sebastian Goldt, Bruno Loureiro, Galen Reeves, Florent Krzakala, Marc Mézard, Lenka Zdeborová

    Abstract: Understanding the impact of data structure on the computational tractability of learning is a key challenge for the theory of neural networks. Many theoretical works do not explicitly model training data, or assume that inputs are drawn component-wise independently from some simple probability distribution. Here, we go beyond this simple paradigm by studying the performance of neural networks trai…

    Submitted 21 May, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: The accompanying code for this paper is available at https://github.com/sgoldt/gaussian-equiv-2layer

    Journal ref: Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, PMLR 145:426-471 (2021)

  48. arXiv:2006.06997  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval

    Authors: Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: Despite the widespread use of gradient-based algorithms for optimizing high-dimensional non-convex functions, understanding their ability to find good minima instead of being trapped in spurious ones remains to a large extent an open problem. Here we focus on gradient flow dynamics for phase retrieval from random measurements. When the ratio of the number of measurements over the input dimensio…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: 9 pages, 5 figures + appendix

    Journal ref: Advances in Neural Information Processing Systems, v33, page 3265--3274, 2020

  49. arXiv:2006.06560  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.ST

    Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

    Authors: Benjamin Aubin, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

    Abstract: We consider a commonly studied supervised classification of a synthetic dataset whose labels are generated by feeding a one-layer neural network with random iid inputs. We study the generalization performances of standard classifiers in the high-dimensional regime where $\alpha=n/d$ is kept finite in the limit of a high dimension $d$ and number of samples $n$. Our contribution is three-fold: First, we…

    Submitted 7 November, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: 11 pages + 45 pages Supplementary Material / 5 figures, v2 revised and accepted at NeurIPS

    Journal ref: Advances in Neural Information Processing Systems, v33, pages 12199--12210, 2020

  50. arXiv:2006.06098  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

    Authors: Francesca Mignacco, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture where each cluster is assigned one of two labels. This problem provides a prototype of a non-convex loss landscape with interpolating regimes and a large generalization gap. We define a particular stochastic process for which SGD c…

    Submitted 9 November, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: 8 pages + appendix, 4 figures

    Journal ref: J. Stat. Mech. 2021 124008 & NeurIPS 2020