
Showing 1–14 of 14 results for author: Simon, J B

Searching in archive cs.
  1. arXiv:2311.14646  [pdf, other]

    cs.LG stat.ML

    More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory

    Authors: James B. Simon, Dhruva Karkada, Nikhil Ghosh, Mikhail Belkin

    Abstract: In our era of enormous neural networks, empirical progress has been driven by the philosophy that more is better. Recent deep learning practice has found repeatedly that larger model size, more data, and more computation (resulting in lower training loss) improves performance. In this paper, we give theoretical backing to these empirical observations by showing that these three properties hold in…

    Submitted 15 May, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: Appeared in ICLR 2024

  2. arXiv:2310.17813  [pdf, other]

    cs.LG

    A Spectral Condition for Feature Learning

    Authors: Greg Yang, James B. Simon, Jeremy Bernstein

    Abstract: The push to train ever larger neural networks has motivated the study of initialization and training at large network width. A key challenge is to scale training so that a network's internal representations evolve nontrivially at all widths, a process known as feature learning. Here, we show that feature learning is achieved by scaling the spectral norm of weight matrices and their updates like…

    Submitted 13 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

  3. arXiv:2309.01592  [pdf, other]

    stat.ML cs.AI cs.LG hep-th math.PR

    Les Houches Lectures on Deep Learning at Large & Infinite Width

    Authors: Yasaman Bahri, Boris Hanin, Antonin Brossollet, Vittorio Erba, Christian Keup, Rosalba Pacelli, James B. Simon

    Abstract: These lectures, presented at the 2022 Les Houches Summer School on Statistical Physics and Machine Learning, focus on the infinite-width limit and large-width regime of deep neural networks. Topics covered include various statistical and dynamical properties of these networks. In particular, the lecturers discuss properties of random deep neural networks; connections between trained deep neural ne…

    Submitted 12 February, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

    Comments: These are notes from lectures delivered by Yasaman Bahri and Boris Hanin at the 2022 Les Houches Summer School on Statistical Physics and Machine Learning; a first version of them was transcribed by Antonin Brossollet, Vittorio Erba, Christian Keup, Rosalba Pacelli, and James B. Simon

  4. arXiv:2306.13185  [pdf, ps, other]

    stat.ML cs.LG

    An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression

    Authors: Lijia Zhou, James B. Simon, Gal Vardi, Nathan Srebro

    Abstract: We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model. We take an "agnostic" view in the following sense: we consider the cost as a function of sample size for any target function, even if the sample size is not large enough for consistency or…

    Submitted 22 March, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

    Comments: This is the ICLR CR version

  5. arXiv:2306.08055  [pdf, other]

    cs.LG cs.AI

    Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training

    Authors: Abraham J. Fetterman, Ellie Kitanidis, Joshua Albrecht, Zachary Polizzi, Bryden Fogelman, Maksis Knutins, Bartosz Wróblewski, James B. Simon, Kanjun Qiu

    Abstract: Hyperparameter tuning of deep learning models can lead to order-of-magnitude performance gains for the same amount of compute. Despite this, systematic tuning is uncommon, particularly for large models, which are expensive to evaluate and tend to have many hyperparameters, necessitating difficult judgment calls about tradeoffs, budgets, and search bounds. To address these issues and propose a prac…

    Submitted 13 June, 2023; originally announced June 2023.

  6. arXiv:2303.15438  [pdf, other]

    cs.LG

    On the Stepwise Nature of Self-Supervised Learning

    Authors: James B. Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J. Fetterman, Joshua Albrecht

    Abstract: We present a simple picture of the training process of joint embedding self-supervised learning methods. We find that these methods learn their high-dimensional embeddings one dimension at a time in a sequence of discrete, well-separated steps. We arrive at this conclusion via the study of a linearized model of Barlow Twins applicable to the case in which the trained network is infinitely wide. We…

    Submitted 30 May, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: 9 pages (main text) + 14 pages (refs + appendices). ICML '23

  7. arXiv:2210.13417  [pdf, other]

    cs.AI cs.LG

    Avalon: A Benchmark for RL Generalization Using Procedurally Generated Worlds

    Authors: Joshua Albrecht, Abraham J. Fetterman, Bryden Fogelman, Ellie Kitanidis, Bartosz Wróblewski, Nicole Seo, Michael Rosenthal, Maksis Knutins, Zachary Polizzi, James B. Simon, Kanjun Qiu

    Abstract: Despite impressive successes, deep reinforcement learning (RL) systems still fall short of human performance on generalization to new tasks and environments that differ from their training. As a benchmark tailored for studying RL generalization, we introduce Avalon, a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gatheri…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS Datasets and Benchmarks 2022. Video and links to all code, data, etc can be found at https://generallyintelligent.com/avalon/

  8. arXiv:2209.01691  [pdf, other]

    cs.LG stat.ML

    On Kernel Regression with Data-Dependent Kernels

    Authors: James B. Simon

    Abstract: The primary hyperparameter in kernel regression (KR) is the choice of kernel. In most theoretical studies of KR, one assumes the kernel is fixed before seeing the training data. Under this assumption, it is known that the optimal kernel is equal to the prior covariance of the target function. In this note, we consider KR in which the kernel may be updated after seeing the training data. We point o…

    Submitted 26 September, 2022; v1 submitted 4 September, 2022; originally announced September 2022.

    Comments: 7 pages, 1 figure

  9. arXiv:2207.06569  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting

    Authors: Neil Mallinar, James B. Simon, Amirhesam Abedsoltan, Parthe Pandit, Mikhail Belkin, Preetum Nakkiran

    Abstract: The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body…

    Submitted 20 October, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: NM and JS co-first authors

  10. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  11. arXiv:2202.11730  [pdf, other]

    astro-ph.EP cs.LG

    Using Bayesian Deep Learning to infer Planet Mass from Gaps in Protoplanetary Disks

    Authors: Sayantan Auddy, Ramit Dey, Min-Kai Lin, Daniel Carrera, Jacob B. Simon

    Abstract: Planet induced sub-structures, like annular gaps, observed in dust emission from protoplanetary disks provide a unique probe to characterize unseen young planets. While deep learning based model has an edge in characterizing the planet's properties over traditional methods, like customized simulations and empirical relations, it lacks in its ability to quantify the uncertainty associated with its…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: 14 pages, 6 figures, submitted to ApJ

  12. arXiv:2110.03922  [pdf, other]

    cs.LG stat.ML

    The Eigenlearning Framework: A Conservation Law Perspective on Kernel Regression and Wide Neural Networks

    Authors: James B. Simon, Madeline Dickens, Dhruva Karkada, Michael R. DeWeese

    Abstract: We derive simple closed-form estimates for the test risk and other generalization metrics of kernel ridge regression (KRR). Relative to prior work, our derivations are greatly simplified and our final expressions are more readily interpreted. These improvements are enabled by our identification of a sharp conservation law which limits the ability of KRR to learn any orthonormal basis of functions.…

    Submitted 26 October, 2023; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: 12 pages (main text) + 25 pages (refs + appendices). A previous version of this manuscript was entitled "Neural Tangent Kernel Eigenvalues Accurately Predict Generalization."

  13. arXiv:2107.11774  [pdf, other]

    cs.LG math.OC stat.ML

    SGD with a Constant Large Learning Rate Can Converge to Local Maxima

    Authors: Liu Ziyin, Botao Li, James B. Simon, Masahito Ueda

    Abstract: Previous works on stochastic gradient descent (SGD) often focus on its success. In this work, we construct worst-case optimization problems illustrating that, when not in the regimes that the previous works often assume, SGD can exhibit many strange and potentially undesirable behaviors. Specifically, we construct landscapes and data distributions such that (1) SGD converges to local maxima, (2) S…

    Submitted 27 May, 2023; v1 submitted 25 July, 2021; originally announced July 2021.

    Comments: Fixed typos

  14. arXiv:2106.03186  [pdf, other]

    cs.LG

    Reverse Engineering the Neural Tangent Kernel

    Authors: James B. Simon, Sajant Anand, Michael R. DeWeese

    Abstract: The development of methods to guide the design of neural networks is an important open challenge for deep learning theory. As a paradigm for principled neural architecture design, we propose the translation of high-performing kernels, which are better-understood and amenable to first-principles design, into equivalent network architectures, which have superior efficiency, flexibility, and feature…

    Submitted 13 August, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: 15 pages, 5 figures