Showing 1–9 of 9 results for author: Mignacco, F

  1. arXiv:2405.15926  [pdf, other]

    cs.LG cond-mat.dis-nn cond-mat.stat-mech stat.ML

    Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers

    Authors: Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, Haim Sompolinsky

    Abstract: Despite the remarkable empirical performance of Transformers, their theoretical understanding remains elusive. Here, we consider a deep multi-head self-attention network, that is closely related to Transformers yet analytically tractable. We develop a statistical mechanics theory of Bayesian learning in this model, deriving exact equations for the network's predictor statistics under the finite-wi…

    Submitted 24 May, 2024; originally announced May 2024.

  2. arXiv:2405.06851  [pdf, other]

    q-bio.NC cond-mat.dis-nn cond-mat.stat-mech cs.NE stat.ML

    Nonlinear classification of neural manifolds with contextual information

    Authors: Francesca Mignacco, Chi-Ning Chou, SueYeon Chung

    Abstract: Understanding how neural systems efficiently process information through distributed representations is a fundamental challenge at the interface of neuroscience and machine learning. Recent approaches analyze the statistical and geometrical attributes of neural representations as population-level mechanistic descriptors of task implementation. In particular, manifold capacity has emerged as a prom…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: 5 pages, 5 figures

  3. arXiv:2302.05440  [pdf, other]

    cs.LG

    Forward Learning with Top-Down Feedback: Empirical and Analytical Characterization

    Authors: Ravi Srinivasan, Francesca Mignacco, Martino Sorbaro, Maria Refinetti, Avi Cooper, Gabriel Kreiman, Giorgia Dellaferrera

    Abstract: "Forward-only" algorithms, which train neural networks while avoiding a backward pass, have recently gained attention as a way of solving the biologically unrealistic aspects of backpropagation. Here, we first address compelling challenges related to the "forward-only" rules, which include reducing the performance gap with backpropagation and providing an analytical understanding of their dynamics…

    Submitted 22 March, 2024; v1 submitted 10 February, 2023; originally announced February 2023.

  4. arXiv:2210.06591  [pdf, other]

    math-ph cs.IT cs.LG stat.ML

    Rigorous dynamical mean field theory for stochastic gradient descent methods

    Authors: Cédric Gerbelot, Emanuele Troiani, Francesca Mignacco, Florent Krzakala, Lenka Zdeborová

    Abstract: We prove closed-form equations for the exact high-dimensional asymptotics of a family of first order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match th…

    Submitted 29 November, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: 40 pages, 4 figures

  5. arXiv:2203.12094  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Learning curves for the multi-class teacher-student perceptron

    Authors: Elisabetta Cornacchia, Francesca Mignacco, Rodrigo Veiga, Cédric Gerbelot, Bruno Loureiro, Lenka Zdeborová

    Abstract: One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with the single-layer teacher-student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal estimation and empirical risk minimisation (ERM) were extensively analysed for this setting. At the same time, a considerable part of modern machin…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: 14 pages + appendix

    Journal ref: Machine Learning: Science and Technology 4 015019 (2022)

  6. arXiv:2112.10852  [pdf, other]

    cond-mat.dis-nn cs.LG stat.ML

    The effective noise of Stochastic Gradient Descent

    Authors: Francesca Mignacco, Pierfrancesco Urbani

    Abstract: Stochastic Gradient Descent (SGD) is the workhorse algorithm of deep learning technology. At each step of the training phase, a mini batch of samples is drawn from the training dataset and the weights of the neural network are adjusted according to the performance on this specific subset of examples. The mini-batch sampling procedure introduces a stochastic dynamics to the gradient descent, with a…

    Submitted 1 June, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: 7 pages + appendix, 5 figures

  7. arXiv:2103.04902  [pdf, other]

    cond-mat.dis-nn cs.LG math.ST stat.ML

    Stochasticity helps to navigate rough landscapes: comparing gradient-descent-based algorithms in the phase retrieval problem

    Authors: Francesca Mignacco, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: In this paper we investigate how gradient-based algorithms such as gradient descent, (multi-pass) stochastic gradient descent, its persistent variant, and the Langevin algorithm navigate non-convex loss-landscapes and which of them is able to reach the best generalization error at limited sample complexity. We consider the loss landscape of the high-dimensional phase retrieval problem as a prototy…

    Submitted 13 April, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

    Comments: 28 pages, 11 figures

    Journal ref: Mach. Learn.: Sci. Technol. 2 035029 (2021)

  8. arXiv:2006.06098  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

    Authors: Francesca Mignacco, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture where each cluster is assigned one of two labels. This problem provides a prototype of a non-convex loss landscape with interpolating regimes and a large generalization gap. We define a particular stochastic process for which SGD c…

    Submitted 9 November, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: 8 pages + appendix, 4 figures

    Journal ref: J. Stat. Mech. 2021 124008 & NeurIPS 2020

  9. arXiv:2002.11544  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.ST

    The role of regularization in classification of high-dimensional noisy Gaussian mixture

    Authors: Francesca Mignacco, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

    Abstract: We consider a high-dimensional mixture of two Gaussians in the noisy regime where even an oracle knowing the centers of the clusters misclassifies a small but finite fraction of the points. We provide a rigorous analysis of the generalization error of regularized convex classifiers, including ridge, hinge and logistic regression, in the high-dimensional limit where the number $n$ of samples and th…

    Submitted 26 February, 2020; originally announced February 2020.

    Comments: 8 pages + appendix, 6 figures

    Journal ref: International Conference on Machine Learning, ICML 2020