Showing 1–9 of 9 results for author: Leavitt, M L

Searching in archive cs.
  1. arXiv:2405.20541

    cs.LG cs.CL

    Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

    Authors: Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L. Leavitt, Mansheej Paul

    Abstract: In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the perplexity of a larger model can yield high-quality data, we investigate whether smaller models can be used for perplexity-based pruning and how pruning is affected…

    Submitted 30 May, 2024; originally announced May 2024.

  2. arXiv:2309.07311

    cs.CL

    Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs

    Authors: Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra

    Abstract: Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model. However, certain insights into model behavior may only be accessible by observing the trajectory of the training process. We present a case study of syntax acquisition in masked language models (MLMs) that demonstrates how analyzing the evolution of interpretable artifacts throughout…

    Submitted 7 February, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: ICLR 2024 camera-ready

  3. arXiv:2305.15096

    cs.CL cs.AI

    Dynamic Masking Rate Schedules for MLM Pretraining

    Authors: Zachary Ankner, Naomi Saphra, Davis Blalock, Jonathan Frankle, Matthew L. Leavitt

    Abstract: Most works on transformers trained with the Masked Language Modeling (MLM) objective use the original BERT model's fixed masking rate of 15%. We propose to instead dynamically schedule the masking rate throughout training. We find that linearly decreasing the masking rate over the course of pretraining improves average GLUE accuracy by up to 0.46% and 0.25% in BERT-base and BERT-large, respectivel…

    Submitted 10 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

  4. arXiv:2211.00683

    cs.LG cs.AI

    Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation

    Authors: Cody Blakeney, Jessica Zosa Forde, Jonathan Frankle, Ziliang Zong, Matthew L. Leavitt

    Abstract: Methods for improving the efficiency of deep network training (i.e. the resources required to achieve a given level of model quality) are of immediate benefit to deep learning practitioners. Distillation is typically used to compress models or improve model quality, but it's unclear if distillation actually improves training efficiency. Can the quality improvements of distillation be converted int…

    Submitted 1 November, 2022; originally announced November 2022.

  5. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  6. arXiv:2010.12016

    cs.CY cs.AI cs.CV cs.LG stat.ML

    Towards falsifiable interpretability research

    Authors: Matthew L. Leavitt, Ari Morcos

    Abstract: Methods for understanding the decisions of and mechanisms underlying deep neural networks (DNNs) typically rely on building intuition by emphasizing sensory or semantic features of individual examples. For instance, methods aim to visualize the components of an input which are "important" to a network's decision, or to measure the semantic properties of single neurons. Here, we argue that interpre…

    Submitted 22 October, 2020; originally announced October 2020.

  7. arXiv:2010.07693

    cs.LG cs.AI cs.CV cs.NE stat.ML

    Linking average- and worst-case perturbation robustness via class selectivity and dimensionality

    Authors: Matthew L. Leavitt, Ari Morcos

    Abstract: Representational sparsity is known to affect robustness to input perturbations in deep neural networks (DNNs), but less is known about how the semantic content of representations affects robustness. Class selectivity, the variability of a unit's responses across data classes or dimensions, is one way of quantifying the sparsity of semantic representations. Given recent evidence that class selectivit…

    Submitted 29 March, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: text overlap with arXiv:2007.04440

  8. arXiv:2007.04440

    cs.LG stat.ML

    On the relationship between class selectivity, dimensionality, and robustness

    Authors: Matthew L. Leavitt, Ari S. Morcos

    Abstract: While the relative trade-offs between sparse and distributed representations in deep neural networks (DNNs) are well-studied, less is known about how these trade-offs apply to representations of semantically-meaningful information. Class selectivity, the variability of a unit's responses across data classes or dimensions, is one way of quantifying the sparsity of semantic representations. Given re…

    Submitted 13 October, 2020; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning

  9. arXiv:2003.01262

    cs.LG cs.NE q-bio.NC stat.ML

    Selectivity considered harmful: evaluating the causal impact of class selectivity in DNNs

    Authors: Matthew L. Leavitt, Ari Morcos

    Abstract: The properties of individual neurons are often analyzed in order to understand the biological and artificial neural networks in which they're embedded. Class selectivity, typically defined as how different a neuron's responses are across different classes of stimuli or data samples, is commonly used for this purpose. However, it remains an open question whether it is necessary and/or sufficient for…

    Submitted 14 October, 2020; v1 submitted 2 March, 2020; originally announced March 2020.