Asymptotic theory of in-context learning by linear attention
Authors:
Yue M. Lu,
Mary I. Letey,
Jacob A. Zavatone-Veth,
Anindita Maiti,
Cengiz Pehlevan
Abstract:
Transformers have a remarkable ability to learn and execute tasks based on examples provided within the input itself, without explicit prior training. It has been argued that this capability, known as in-context learning (ICL), is a cornerstone of Transformers' success, yet questions about the necessary sample complexity, pretraining task diversity, and context length for successful ICL remain unresolved. Here, we provide a precise answer to these questions in an exactly solvable model of ICL of a linear regression task by linear attention. We derive sharp asymptotics for the learning curve in a phenomenologically rich scaling regime where the token dimension is taken to infinity; the context length and pretraining task diversity scale proportionally with the token dimension; and the number of pretraining examples scales quadratically. We demonstrate a double-descent learning curve as the number of pretraining examples increases, and uncover a phase transition in the model's behavior between low and high task diversity regimes: In the low diversity regime, the model tends toward memorization of training tasks, whereas in the high diversity regime, it achieves genuine in-context learning and generalization beyond the scope of pretrained tasks. These theoretical insights are empirically validated through experiments with both linear attention and full nonlinear Transformer architectures.
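To make the setup concrete, here is a minimal NumPy sketch of in-context linear regression under the scalings described above. It assumes a simplified linear-attention predictor of the form ŷ = x_qᵀ Γ h, where h = (d/L) Σ_i y_i x_i summarizes the context and Γ is a d×d matrix fit by ridge regression on pretraining prompts. This reduction, the Gaussian data model, and the names `alpha`, `kappa`, `tau`, `sample_contexts`, `features`, `fit_gamma`, and `icl_error` are illustrative assumptions, not the authors' code. Because Γ has d² entries, one way to read the quadratic scaling of pretraining examples is that n must grow like d² to pin Γ down.

```python
# Hypothetical sketch: simplified linear-attention predictor, not the paper's code.
import numpy as np


def sample_contexts(n, d, L, tasks, noise, rng):
    """Draw n prompts of L labeled tokens plus one query token.

    Each prompt picks a task vector w uniformly from the rows of `tasks`;
    tokens are x ~ N(0, I_d / d) and labels are y = w . x + noise * eps.
    """
    idx = rng.integers(0, tasks.shape[0], size=n)
    W = tasks[idx]                                        # (n, d)
    X = rng.standard_normal((n, L + 1, d)) / np.sqrt(d)   # (n, L+1, d)
    y = np.einsum("nld,nd->nl", X, W) + noise * rng.standard_normal((n, L + 1))
    return X, y


def features(X, y):
    """Flattened outer product x_query h^T with h = (d/L) sum_i y_i x_i."""
    n, Lp1, d = X.shape
    L = Lp1 - 1
    h = (d / L) * np.einsum("nl,nld->nd", y[:, :L], X[:, :L])  # context summary
    xq = X[:, L]                                                # query token
    return np.einsum("ni,nj->nij", xq, h).reshape(n, d * d)


def fit_gamma(F, t, lam):
    """Ridge regression for the d*d entries of Gamma."""
    A = F.T @ F + lam * np.eye(F.shape[1])
    return np.linalg.solve(A, F.T @ t)


def icl_error(d=20, alpha=4.0, kappa=4.0, tau=8.0, noise=0.1, lam=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    L = int(alpha * d)     # context length, proportional to d
    k = int(kappa * d)     # pretraining task diversity, proportional to d
    n = int(tau * d * d)   # pretraining prompts, proportional to d^2
    train_tasks = rng.standard_normal((k, d))
    X, y = sample_contexts(n, d, L, train_tasks, noise, rng)
    gamma = fit_gamma(features(X, y), y[:, L], lam)

    # Evaluate on freshly drawn task vectors, not among the k pretraining tasks.
    m = 2000
    fresh_tasks = rng.standard_normal((m, d))
    Xt, yt = sample_contexts(m, d, L, fresh_tasks, noise, rng)
    pred = features(Xt, yt) @ gamma
    err = np.mean((pred - yt[:, L]) ** 2)
    baseline = np.mean(yt[:, L] ** 2)  # error of always predicting 0
    return err, baseline


if __name__ == "__main__":
    err, baseline = icl_error()
    print(f"ICL test MSE {err:.3f} vs. zero-predictor MSE {baseline:.3f}")
```

Sweeping `tau` through 1 (n ≈ d²) or shrinking `kappa` is where one would look for the double descent and the memorization regime the abstract describes; this sketch only sets up the experiment and does not by itself reproduce those results.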
Submitted 19 May, 2024;
originally announced May 2024.
Quantum initial conditions for curved inflating universes
Authors:
Mary I. Letey,
Zakhar Shumaylov,
Fruzsina J. Agocs,
Will J. Handley,
Michael P. Hobson,
Anthony N. Lasenby
Abstract:
We discuss the challenges of motivating, constructing, and quantizing a canonically normalized inflationary perturbation in spatially curved universes. We show that this has historically proved challenging due to the interaction of nonadiabaticity with spatial curvature. We construct a novel curvature perturbation that is canonically normalized in the sense of its equation of motion and is unique up to a single scalar parameter. With this construction it becomes possible to set initial conditions invariant under canonical transformations, overcoming known ambiguities in the literature. This corrected quantization has potentially observable consequences via modifications to the primordial power spectrum at large angular scales, as well as theoretical implications for quantization procedures in curved cosmologies filled with a scalar field.
Submitted 8 July, 2024; v1 submitted 30 November, 2022;
originally announced November 2022.