Showing 1–12 of 12 results for author: Roberts, D A

  1. arXiv:2404.01413  [pdf, other]

    cs.LG cs.AI cs.CL cs.ET stat.ML

    Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

    Authors: Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

    Abstract: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration…

    Submitted 29 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  2. arXiv:2403.17887  [pdf, other]

    cs.CL cs.LG stat.ML

    The Unreasonable Ineffectiveness of the Deeper Layers

    Authors: Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts

    Abstract: We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed. To prune these models, we identify the optimal block of layers to prune by considering similarity across layers; then, to "heal" the damage…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 12 + 10 pages, 5 + 4 figures

    Report number: MIT-CTP/5694

  3. arXiv:2310.07765  [pdf, other]

    cs.LG hep-ph hep-th stat.ML

    Feature Learning and Generalization in Deep Networks with Orthogonal Weights

    Authors: Hannah Day, Yonatan Kahn, Daniel A. Roberts

    Abstract: Fully-connected deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality, which prevents the exponential growth or decay of signals propagating through the network. However, such networks still exhibit fluctuations that grow linearly with the depth of the network, which may impair the training of networks with width comparable to depth. We s…

    Submitted 12 June, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: v2: numerical experiments updated with more data, plots updated to match, conclusions unchanged. 30+12 pages, 20 figures

    Report number: MIT-CTP/5625

  4. arXiv:2210.16859  [pdf, other]

    cs.LG hep-th stat.ML

    A Solvable Model of Neural Scaling Laws

    Authors: Alexander Maloney, Daniel A. Roberts, James Sully

    Abstract: Large language models with a huge number of parameters, when trained on near internet-sized number of tokens, have been empirically shown to obey neural scaling laws: specifically, their performance behaves predictably as a power law in either parameters or dataset size until bottlenecked by the other resource. To understand this better, we first identify the necessary properties allowing such sca…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: 73 + 23 pages, 14 + 5 figures

    Report number: MIT-CTP/5463

  5. arXiv:2106.10165  [pdf, other]

    cs.LG cs.AI hep-th stat.ML

    The Principles of Deep Learning Theory

    Authors: Daniel A. Roberts, Sho Yaida, Boris Hanin

    Abstract: This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are…

    Submitted 24 August, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: 471 pages, to be published by Cambridge University Press; v2: hyperlinks fixed, index added

    Report number: MIT-CTP/5306

    Journal ref: Cambridge University Press (2022)

  6. arXiv:2104.04874  [pdf, ps, other]

    cs.LG stat.ML

    SGD Implicitly Regularizes Generalization Error

    Authors: Daniel A. Roberts

    Abstract: We derive a simple and model-independent formula for the change in the generalization gap due to a gradient descent update. We then compare the change in the test error for stochastic gradient descent to the change in test error from an equivalent number of gradient descent updates and show explicitly that stochastic gradient descent acts to regularize generalization error by decorrelating nearby…

    Submitted 10 April, 2021; originally announced April 2021.

    Comments: First appeared at the "Workshop on Integration of Deep Learning Theories" at NeurIPS in 2018 and has been available since then at https://research.fb.com/publications/sgd-implicitly-regularizes-generalization-error/

  7. arXiv:2104.00008  [pdf, other]

    hep-th cs.AI cs.LG physics.hist-ph stat.ML

    Why is AI hard and Physics simple?

    Authors: Daniel A. Roberts

    Abstract: We discuss why AI is hard and why physics is simple. We discuss how physical intuition and the approach of theoretical physics can be brought to bear on the field of artificial intelligence and specifically machine learning. We suggest that the underlying project of machine learning and the underlying project of physics are strongly coupled through the principle of sparsity, and we call upon theor…

    Submitted 31 March, 2021; originally announced April 2021.

    Comments: Written for a special issue of Machine Learning: Science and Technology as an invited perspective piece

    Report number: MIT-CTP/5269

  8. arXiv:2102.08380  [pdf, other]

    hep-ph cs.LG stat.ML

    Topological Obstructions to Autoencoding

    Authors: Joshua Batson, C. Grace Haaf, Yonatan Kahn, Daniel A. Roberts

    Abstract: Autoencoders have been proposed as a powerful tool for model-independent anomaly detection in high-energy physics. The operating principle is that events which do not belong to the space of training data will be reconstructed poorly, thus flagging them as anomalies. We point out that in a variety of examples of interest, the connection between large reconstruction error and anomalies is not so cle…

    Submitted 3 May, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: 24 + 20 pages, 26 figures; no autoencoders were harmed in the making of this project. v2: JHEP published version

    Report number: MIT-CTP/5264

    Journal ref: JHEP04(2021)280

  9. arXiv:2012.08919  [pdf, ps, other]

    cs.CL cs.AI cs.IR cs.LG

    Multilingual Evidence Retrieval and Fact Verification to Combat Global Disinformation: The Power of Polyglotism

    Authors: Denisa A. O. Roberts

    Abstract: This article investigates multilingual evidence retrieval and fact verification as a step to combat global disinformation, a first effort of this kind, to the best of our knowledge. The goal is building multilingual systems that retrieve in evidence-rich languages to verify claims in evidence-poor languages that are more commonly targeted by disinformation. To this end, our EnmBERT fact verificati…

    Submitted 19 January, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: Accepted ECIR 2021

  10. arXiv:2009.10071  [pdf, ps, other]

    math.NA cs.LG cs.MS stat.ML

    QR and LQ Decomposition Matrix Backpropagation Algorithms for Square, Wide, and Deep -- Real or Complex -- Matrices and Their Software Implementation

    Authors: Denisa A. O. Roberts, Lucas R. Roberts

    Abstract: This article presents matrix backpropagation algorithms for the QR decomposition of matrices $A_{m, n}$, that are either square (m = n), wide (m < n), or deep (m > n), with rank $k = min(m, n)$. Furthermore, we derive novel matrix backpropagation results for the pivoted (full-rank) QR decomposition and for the LQ decomposition of deep input matrices. Differentiable QR decomposition offers a numeri…

    Submitted 11 December, 2020; v1 submitted 19 September, 2020; originally announced September 2020.

  11. arXiv:1908.02729  [pdf, other]

    stat.ML cs.LG

    Robust Learning with Jacobian Regularization

    Authors: Judy Hoffman, Daniel A. Roberts, Sho Yaida

    Abstract: Design of reliable systems must guarantee stability against input perturbations. In machine learning, such guarantee entails preventing overfitting and ensuring robustness of models against corruption of input data. In order to maximize stability, we analyze and develop a computationally efficient implementation of Jacobian regularization that increases classification margins of neural networks. T…

    Submitted 7 August, 2019; originally announced August 2019.

    Comments: 21 pages, 10 figures

  12. arXiv:1812.04754  [pdf, other]

    cs.LG cs.AI stat.ML

    Gradient Descent Happens in a Tiny Subspace

    Authors: Guy Gur-Ari, Daniel A. Roberts, Ethan Dyer

    Abstract: We show that in a variety of large-scale deep learning scenarios the gradient dynamically converges to a very small subspace after a short period of training. The subspace is spanned by a few top eigenvectors of the Hessian (equal to the number of classes in the dataset), and is mostly preserved over long periods of training. A simple argument then suggests that gradient descent may happen mostly…

    Submitted 11 December, 2018; originally announced December 2018.

    Comments: 9 pages + appendices, 12 figures