Showing 1–14 of 14 results for author: Leavitt, M

Searching in archive cs.
  1. arXiv:2405.20541  [pdf, other]

    cs.LG cs.CL

    Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

    Authors: Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L. Leavitt, Mansheej Paul

    Abstract: In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the perplexity of a larger model can yield high-quality data, we investigate whether smaller models can be used for perplexity-based pruning and how pruning is affected…

    Submitted 30 May, 2024; originally announced May 2024.

  2. arXiv:2310.08743  [pdf]

    cs.CV cs.AI cs.LG

    Development and Validation of a Deep Learning-Based Microsatellite Instability Predictor from Prostate Cancer Whole-Slide Images

    Authors: Qiyuan Hu, Abbas A. Rizvi, Geoffery Schau, Kshitij Ingale, Yoni Muller, Rachel Baits, Sebastian Pretzer, Aïcha BenTaieb, Abigail Gordhamer, Roberto Nussenzveig, Adam Cole, Matthew O. Leavitt, Rohan P. Joshi, Nike Beaubier, Martin C. Stumpe, Kunal Nagpal

    Abstract: Microsatellite instability-high (MSI-H) is a tumor agnostic biomarker for immune checkpoint inhibitor therapy. However, MSI status is not routinely tested in prostate cancer, in part due to low prevalence and assay cost. As such, prediction of MSI status from hematoxylin and eosin (H&E) stained whole-slide images (WSIs) could identify prostate cancer patients most likely to benefit from confirmato…

    Submitted 12 October, 2023; originally announced October 2023.

  3. arXiv:2309.07311  [pdf, other]

    cs.CL

    Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs

    Authors: Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra

    Abstract: Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model. However, certain insights into model behavior may only be accessible by observing the trajectory of the training process. We present a case study of syntax acquisition in masked language models (MLMs) that demonstrates how analyzing the evolution of interpretable artifacts throughout…

    Submitted 7 February, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: ICLR 2024 camera-ready

  4. arXiv:2305.17409  [pdf, other]

    cs.LG

    On the special role of class-selective neurons in early training

    Authors: Omkar Ranadive, Nikhil Thakurdesai, Ari S Morcos, Matthew Leavitt, Stéphane Deny

    Abstract: It is commonly observed that deep networks trained for classification exhibit class-selective neurons in their early and intermediate layers. Intriguingly, recent studies have shown that these class-selective neurons can be ablated without deteriorating network function. But if class-selective neurons are not necessary, why do they exist? We attempt to answer this question in a series of experimen…

    Submitted 27 May, 2023; originally announced May 2023.

  5. arXiv:2305.15096  [pdf, other]

    cs.CL cs.AI

    Dynamic Masking Rate Schedules for MLM Pretraining

    Authors: Zachary Ankner, Naomi Saphra, Davis Blalock, Jonathan Frankle, Matthew L. Leavitt

    Abstract: Most works on transformers trained with the Masked Language Modeling (MLM) objective use the original BERT model's fixed masking rate of 15%. We propose to instead dynamically schedule the masking rate throughout training. We find that linearly decreasing the masking rate over the course of pretraining improves average GLUE accuracy by up to 0.46% and 0.25% in BERT-base and BERT-large, respectivel…

    Submitted 10 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

  6. arXiv:2303.06480  [pdf, other]

    cs.LG cs.CV

    Knowledge Distillation for Efficient Sequences of Training Runs

    Authors: Xingyu Liu, Alex Leonardi, Lu Yu, Chris Gilmer-Hill, Matthew Leavitt, Jonathan Frankle

    Abstract: In many practical scenarios -- like hyperparameter search or continual retraining with new data -- related training runs are performed many times in sequence. Current practice is to train each of these models independently from scratch. We study the problem of exploiting the computation invested in previous runs to reduce the cost of future runs using knowledge distillation (KD). We find that augm…

    Submitted 11 March, 2023; originally announced March 2023.

    Comments: Accepted at the ICML 2022 First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward

  7. arXiv:2211.00683  [pdf, other]

    cs.LG cs.AI

    Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation

    Authors: Cody Blakeney, Jessica Zosa Forde, Jonathan Frankle, Ziliang Zong, Matthew L. Leavitt

    Abstract: Methods for improving the efficiency of deep network training (i.e. the resources required to achieve a given level of model quality) are of immediate benefit to deep learning practitioners. Distillation is typically used to compress models or improve model quality, but it's unclear if distillation actually improves training efficiency. Can the quality improvements of distillation be converted int…

    Submitted 1 November, 2022; originally announced November 2022.

  8. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  9. arXiv:2202.00878  [pdf, other]

    cs.SI cs.AI cs.CL cs.CR cs.LG

    A Longitudinal Dataset of Twitter ISIS Users

    Authors: Younes Karimi, Anna Squicciarini, Peter K. Forster, Kira M. Leavitt

    Abstract: We present a large longitudinal dataset of tweets from two sets of users that are suspected to be affiliated with ISIS. These sets of users are identified based on a prior study and a campaign aimed at shutting down ISIS Twitter accounts. These users have engaged with known ISIS accounts at least once during 2014-2015 and are still active as of 2021. Some of them have directly supported the ISIS u…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: 10 pages, 7 figures; Submitted to the 16th International Conference on Web and Social Media (AAAI ICWSM-2022)

    MSC Class: 68T50; 68-11 ACM Class: J.4; K.4.1; I.2.7; H.4.3

  10. arXiv:2103.10697  [pdf, other]

    cs.CV cs.LG stat.ML

    ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases

    Authors: Stéphane d'Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli, Levent Sagun

    Abstract: Convolutional architectures have proven extremely successful for vision tasks. Their hard inductive biases enable sample-efficient learning, but come at the cost of a potentially lower performance ceiling. Vision Transformers (ViTs) rely on more flexible self-attention layers, and have recently outperformed CNNs for image classification. However, they require costly pre-training on large external…

    Submitted 10 June, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

  11. arXiv:2010.12016  [pdf, other]

    cs.CY cs.AI cs.CV cs.LG stat.ML

    Towards falsifiable interpretability research

    Authors: Matthew L. Leavitt, Ari Morcos

    Abstract: Methods for understanding the decisions of and mechanisms underlying deep neural networks (DNNs) typically rely on building intuition by emphasizing sensory or semantic features of individual examples. For instance, methods aim to visualize the components of an input which are "important" to a network's decision, or to measure the semantic properties of single neurons. Here, we argue that interpre…

    Submitted 22 October, 2020; originally announced October 2020.

  12. arXiv:2010.07693  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE stat.ML

    Linking average- and worst-case perturbation robustness via class selectivity and dimensionality

    Authors: Matthew L. Leavitt, Ari Morcos

    Abstract: Representational sparsity is known to affect robustness to input perturbations in deep neural networks (DNNs), but less is known about how the semantic content of representations affects robustness. Class selectivity, the variability of a unit's responses across data classes or dimensions, is one way of quantifying the sparsity of semantic representations. Given recent evidence that class selectivit…

    Submitted 29 March, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: text overlap with arXiv:2007.04440

  13. arXiv:2007.04440  [pdf, other]

    cs.LG stat.ML

    On the relationship between class selectivity, dimensionality, and robustness

    Authors: Matthew L. Leavitt, Ari S. Morcos

    Abstract: While the relative trade-offs between sparse and distributed representations in deep neural networks (DNNs) are well-studied, less is known about how these trade-offs apply to representations of semantically-meaningful information. Class selectivity, the variability of a unit's responses across data classes or dimensions, is one way of quantifying the sparsity of semantic representations. Given re…

    Submitted 13 October, 2020; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning

  14. arXiv:2003.01262  [pdf, other]

    cs.LG cs.NE q-bio.NC stat.ML

    Selectivity considered harmful: evaluating the causal impact of class selectivity in DNNs

    Authors: Matthew L. Leavitt, Ari Morcos

    Abstract: The properties of individual neurons are often analyzed in order to understand the biological and artificial neural networks in which they're embedded. Class selectivity, typically defined as how different a neuron's responses are across different classes of stimuli or data samples, is commonly used for this purpose. However, it remains an open question whether it is necessary and/or sufficient for…

    Submitted 14 October, 2020; v1 submitted 2 March, 2020; originally announced March 2020.