
Scaling and renormalization in high-dimensional regression

Authors: Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan

Abstract: This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning. Analytic formulas for the training and generalization errors are obtained in a few lines of algebra directly from the properties of the $S$-transform of free probability. This allows for a straightforward identification of the sources of power-law scaling in model performance. We compute the generalization error of a broad class of random feature models. We find that in all models, the $S$-transform corresponds to the train-test generalization gap, and yields an analogue of the generalized-cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. These novel results allow us to discover a scaling regime for random feature models where the variance due to the features limits performance in the overparameterized setting. We also demonstrate how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend and provide a unifying perspective on earlier models of neural scaling laws.

Submitted 26 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: 68 pages, 17 figures
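The abstract links the $S$-transform to the train-test gap and to an analogue of the generalized-cross-validation (GCV) estimator. As a rough numerical illustration only (this is the classical GCV formula, not the paper's free-probability derivation), the sketch below fits ridge regression on synthetic Gaussian data with an assumed power-law covariance spectrum and compares the GCV estimate, training error divided by $(1 - \mathrm{df}/N)^2$, with the error on fresh test samples. The function name `ridge_gcv_demo`, the teacher weights, and all parameter values are hypothetical.

```python
import numpy as np


def ridge_gcv_demo(n_train=400, n_test=4000, dim=200, ridge=0.1,
                   noise=0.5, alpha=1.5, seed=0):
    """Compare ridge train error, fresh-sample test error, and classical GCV.

    Assumptions (illustrative only): Gaussian covariates with a power-law
    covariance spectrum k**(-alpha), a random Gaussian teacher, additive noise.
    """
    rng = np.random.default_rng(seed)
    spectrum = np.arange(1, dim + 1) ** (-alpha)  # assumed power-law eigenvalues
    w_star = rng.standard_normal(dim)             # hypothetical teacher weights

    def sample(n):
        X = rng.standard_normal((n, dim)) * np.sqrt(spectrum)
        y = X @ w_star + noise * rng.standard_normal(n)
        return X, y

    X, y = sample(n_train)
    X_test, y_test = sample(n_test)

    # Ridge solution w = (X^T X + ridge * I)^{-1} X^T y.
    A = X.T @ X + ridge * np.eye(dim)
    w_hat = np.linalg.solve(A, X.T @ y)

    # Hat matrix H = X A^{-1} X^T; its trace is the effective degrees of freedom.
    H = X @ np.linalg.solve(A, X.T)
    df = np.trace(H)

    train_err = np.mean((y - X @ w_hat) ** 2)
    test_err = np.mean((y_test - X_test @ w_hat) ** 2)
    gcv = train_err / (1.0 - df / n_train) ** 2  # classical GCV estimate
    return train_err, test_err, gcv


train_err, test_err, gcv = ridge_gcv_demo()
print(f"train={train_err:.3f}  test={test_err:.3f}  GCV={gcv:.3f}")
```

The GCV factor $(1 - \mathrm{df}/N)^{-2}$ plays the role of a multiplicative train-to-test correction, which is the quantity the abstract attributes to the $S$-transform.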

arXiv:2402.01092

A Dynamical Model of Neural Scaling Laws

Authors: Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan

Abstract: On a variety of tasks, the performance of neural networks predictably improves with training time, dataset size and model size across many orders of magnitude. This phenomenon is known as a neural scaling law. Of fundamental importance is the compute-optimal scaling law, which reports the performance as a function of units of compute when choosing model sizes optimally. We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization. This model reproduces many observations about neural scaling laws. First, our model makes a prediction about why performance scales with training time and with model size with different power-law exponents. Consequently, the theory predicts an asymmetric compute-optimal scaling rule where the number of training steps is increased faster than the number of model parameters, consistent with recent empirical observations. Second, it has been observed that early in training, networks converge to their infinite-width dynamics at a rate $1/\textit{width}$ but at late time exhibit a rate $\textit{width}^{-c}$, where $c$ depends on the structure of the architecture and task. We show that our model exhibits this behavior. Lastly, our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.

Submitted 23 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: ICML camera-ready version. Included an online SGD section with additional simulations and its connection to the large-sample limit of our gradient flow theory. Fixed typo in Appendix eq. 112
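How different exponents for training time and model size produce an asymmetric compute-optimal rule can be seen in a toy calculation. Assume, purely for illustration (this is not the loss derived in the paper), that the loss separates as $L(N, t) = a N^{-\alpha} + b t^{-\beta}$ in model size $N$ and training steps $t$, with compute $C = N t$. Setting $\partial L/\partial N = 0$ at fixed $C$ gives $N^{\alpha+\beta} = \frac{\alpha a}{\beta b} C^{\beta}$, so $N^\star \propto C^{\beta/(\alpha+\beta)}$ and $t^\star \propto C^{\alpha/(\alpha+\beta)}$: training time grows faster than model size whenever $\alpha > \beta$. The sketch below checks these exponents by brute-force search; the constants $a, b, \alpha, \beta$ and the helper names are hypothetical.

```python
import math


def compute_optimal_split(compute, a=1.0, b=1.0, alpha=1.0, beta=0.5, grid=4000):
    """Return (N*, t*) minimizing the assumed loss a*N**-alpha + b*t**-beta
    subject to N * t = compute, via search over a log-uniform grid of N in [1, compute]."""
    best_loss, best_n = math.inf, 1.0
    for i in range(grid + 1):
        n = compute ** (i / grid)  # log-uniform grid from 1 to compute
        t = compute / n
        loss = a * n ** (-alpha) + b * t ** (-beta)
        if loss < best_loss:
            best_loss, best_n = loss, n
    return best_n, compute / best_n


def fitted_exponent(f, c_lo=1e4, c_hi=1e8):
    """Log-log slope of f(C) between two compute budgets."""
    return math.log(f(c_hi) / f(c_lo)) / math.log(c_hi / c_lo)


n_exp = fitted_exponent(lambda c: compute_optimal_split(c)[0])
t_exp = fitted_exponent(lambda c: compute_optimal_split(c)[1])
# Closed form for alpha=1, beta=0.5: N* ~ C^(1/3), t* ~ C^(2/3).
print(f"N* ~ C^{n_exp:.3f}, t* ~ C^{t_exp:.3f}")
```

Choosing $\alpha < \beta$ instead reverses the asymmetry, so under this ansatz the direction of the compute-optimal rule is set entirely by which exponent is larger.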

arXiv:2201.02206

doi 10.1007/JHEP08(2022)136

Precision Bootstrap for the $\mathcal{N}=1$ Super-Ising Model

Authors: Alexander Atanasov, Aaron Hillman, David Poland, Junchen Rong, Ning Su

Abstract: In this note we report an improved determination of the scaling dimensions and OPE coefficients of the minimal supersymmetric extension of the 3d Ising model using the conformal bootstrap. We also show how this data can be used as input to the Lorentzian inversion formula, finding good agreement between analytic calculations and numerical extremal spectra once mixing effects are resolved.

Submitted 6 January, 2022; originally announced January 2022.

Comments: 32 pages, 6 figures

Journal ref: JHEP 08 (2022) 136

arXiv:1807.05702

doi 10.1007/JHEP11(2018)140

Bootstrapping the Minimal 3D SCFT

Authors: Alexander Atanasov, Aaron Hillman, David Poland

Abstract: We study the conformal bootstrap constraints for 3D conformal field theories with a $\mathbb{Z}_2$ or parity symmetry, assuming a single relevant scalar operator $\epsilon$ that is invariant under the symmetry. When there is additionally a single relevant odd scalar $\sigma$, we map out the allowed space of dimensions and three-point couplings of such "Ising-like" CFTs. If we allow a second relevant odd scalar $\sigma'$, we identify a feature in the allowed space compatible with 3D $\mathcal{N}=1$ superconformal symmetry and conjecture that it corresponds to the minimal $\mathcal{N}=1$ supersymmetric extension of the Ising CFT. This model has appeared in previous numerical bootstrap studies, as well as in proposals for emergent supersymmetry on the boundaries of topological phases of matter. Adding further constraints from 3D $\mathcal{N}=1$ superconformal symmetry, we isolate this theory and use the numerical bootstrap to compute the leading scaling dimensions $\Delta_\sigma = \Delta_\epsilon - 1 = 0.58444(22)$ and three-point couplings $\lambda_{\sigma\sigma\epsilon} = 1.0721(2)$ and $\lambda_{\epsilon\epsilon\epsilon} = 1.67(1)$. We additionally place bounds on the central charge and use the extremal functional method to estimate the dimensions of the next several operators in the spectrum. Based on our results we observe the possible exact relation $\lambda_{\epsilon\epsilon\epsilon}/\lambda_{\sigma\sigma\epsilon} = \tan(1)$.

Submitted 6 August, 2018; v1 submitted 16 July, 2018; originally announced July 2018.

Comments: 16 pages, 6 figures; V2: references added
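The conjectured relation $\lambda_{\epsilon\epsilon\epsilon}/\lambda_{\sigma\sigma\epsilon} = \tan(1)$ can be checked against the quoted couplings with a few lines of arithmetic. The sketch below uses first-order error propagation for a ratio and treats the two quoted uncertainties as independent and Gaussian; that is an assumption, since numerical bootstrap error bars need not be either.

```python
import math

# Central values and quoted uncertainties taken from the abstract.
lam_sse, err_sse = 1.0721, 0.0002  # lambda_{sigma sigma epsilon}
lam_eee, err_eee = 1.67, 0.01      # lambda_{epsilon epsilon epsilon}

ratio = lam_eee / lam_sse
# First-order propagation for a quotient, assuming independent errors.
ratio_err = ratio * math.hypot(err_eee / lam_eee, err_sse / lam_sse)
deviation = abs(ratio - math.tan(1.0))

print(f"ratio = {ratio:.4f} +/- {ratio_err:.4f}; tan(1) = {math.tan(1.0):.4f}")
print(f"deviation / uncertainty = {deviation / ratio_err:.2f}")
```

The ratio agrees with $\tan(1) \approx 1.5574$ well within the propagated uncertainty, which is dominated by the error on $\lambda_{\epsilon\epsilon\epsilon}$.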

Showing 1–4 of 4 results for author: Atanasov, A