Showing 1–1 of 1 results for author: Tiberi, L

  1. arXiv:2405.15926  [pdf, other]

    cs.LG cond-mat.dis-nn cond-mat.stat-mech stat.ML

    Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers

    Authors: Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, Haim Sompolinsky

    Abstract: Despite the remarkable empirical performance of Transformers, their theoretical understanding remains elusive. Here, we consider a deep multi-head self-attention network, that is closely related to Transformers yet analytically tractable. We develop a statistical mechanics theory of Bayesian learning in this model, deriving exact equations for the network's predictor statistics under the finite-wi…

    Submitted 24 May, 2024; originally announced May 2024.