-
Scaling and renormalization in high-dimensional regression
Authors:
Alexander Atanasov,
Jacob A. Zavatone-Veth,
Cengiz Pehlevan
Abstract:
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning. Analytic formulas for the training and generalization errors are obtained in a few lines of algebra directly from the properties of the $S$-transform of free probability. This allows for a straightforward identification of the sources of power-law scaling in model performance. We compute the generalization error of a broad class of random feature models. We find that in all models, the $S$-transform corresponds to the train-test generalization gap, and yields an analogue of the generalized-cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. These novel results allow us to discover a scaling regime for random feature models where the variance due to the features limits performance in the overparameterized setting. We also demonstrate how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend and provide a unifying perspective on earlier models of neural scaling laws.
Submitted 26 June, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
A Dynamical Model of Neural Scaling Laws
Authors:
Blake Bordelon,
Alexander Atanasov,
Cengiz Pehlevan
Abstract:
On a variety of tasks, the performance of neural networks predictably improves with training time, dataset size and model size across many orders of magnitude. This phenomenon is known as a neural scaling law. Of fundamental importance is the compute-optimal scaling law, which reports the performance as a function of units of compute when choosing model sizes optimally. We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization. This reproduces many observations about neural scaling laws. First, our model makes a prediction about why the scaling of performance with training time and with model size have different power law exponents. Consequently, the theory predicts an asymmetric compute-optimal scaling rule where the number of training steps are increased faster than model parameters, consistent with recent empirical observations. Second, it has been observed that early in training, networks converge to their infinite-width dynamics at a rate $1/\textit{width}$ but at late time exhibit a rate $\textit{width}^{-c}$, where $c$ depends on the structure of the architecture and task. We show that our model exhibits this behavior. Lastly, our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
Submitted 23 June, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Precision Bootstrap for the $\mathcal{N}=1$ Super-Ising Model
Authors:
Alexander Atanasov,
Aaron Hillman,
David Poland,
Junchen Rong,
Ning Su
Abstract:
In this note we report an improved determination of the scaling dimensions and OPE coefficients of the minimal supersymmetric extension of the 3d Ising model using the conformal bootstrap. We also show how this data can be used as input to the Lorentzian inversion formula, finding good agreement between analytic calculations and numerical extremal spectra once mixing effects are resolved.
Submitted 6 January, 2022;
originally announced January 2022.
-
Bootstrapping the Minimal 3D SCFT
Authors:
Alexander Atanasov,
Aaron Hillman,
David Poland
Abstract:
We study the conformal bootstrap constraints for 3D conformal field theories with a $\mathbb{Z}_2$ or parity symmetry, assuming a single relevant scalar operator $ε$ that is invariant under the symmetry. When there is additionally a single relevant odd scalar $σ$, we map out the allowed space of dimensions and three-point couplings of such "Ising-like" CFTs. If we allow a second relevant odd scalar $σ'$, we identify a feature in the allowed space compatible with 3D $\mathcal{N}=1$ superconformal symmetry and conjecture that it corresponds to the minimal $\mathcal{N}=1$ supersymmetric extension of the Ising CFT. This model has appeared in previous numerical bootstrap studies, as well as in proposals for emergent supersymmetry on the boundaries of topological phases of matter. Adding further constraints from 3D $\mathcal{N}=1$ superconformal symmetry, we isolate this theory and use the numerical bootstrap to compute the leading scaling dimensions $Δ_σ = Δ_ε - 1 = .58444(22)$ and three-point couplings $λ_{σσε} = 1.0721(2)$ and $λ_{εεε} = 1.67(1)$. We additionally place bounds on the central charge and use the extremal functional method to estimate the dimensions of the next several operators in the spectrum. Based on our results we observe the possible exact relation $λ_{εεε}/λ_{σσε} = \tan(1)$.
Submitted 6 August, 2018; v1 submitted 16 July, 2018;
originally announced July 2018.