Showing 1–25 of 25 results for author: Saphra, N

Searching in archive cs.
  1. arXiv:2406.17746

    cs.CL cs.AI

    Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

    Authors: USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien, Jyothir S V, Mohammad Aflah Khan, Jaydeep Borkar, Christopher A. Choquette-Choo, Jacob Ray Fuehne, Stella Biderman, Tracy Ke, Katherine Lee, Naomi Saphra

    Abstract: Memorization in language models is typically treated as a homogenous phenomenon, neglecting the specifics of the memorized data. We instead model memorization as the effect of a set of complex factors that describe each sample and relate it to the model and corpus. To build intuition around these factors, we break memorization down into a taxonomy: recitation of highly duplicated sequences, recons…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.11741

    cs.LG cs.AI

    Transcendence: Generative Models Can Outperform The Experts That Train Them

    Authors: Edwin Zhang, Vincent Zhu, Naomi Saphra, Anat Kleiman, Benjamin L. Edelman, Milind Tambe, Sham M. Kakade, Eran Malach

    Abstract: Generative models are trained with the simple objective of imitating the conditional probability distribution induced by the data they are trained on. Therefore, when trained on data generated by humans, we may not expect the artificial model to outperform the humans on their original objectives. In this work, we study the phenomenon of transcendence: when a generative model achieves capabilities…

    Submitted 28 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Code, models, and data at https://transcendence.eddie.win

  3. arXiv:2403.13106

    cs.LG cs.AI cs.CL cs.CV

    Knowing Your Nonlinearities: Shapley Interactions Reveal the Underlying Structure of Data

    Authors: Divyansh Singhvi, Andrej Erkelens, Raghav Jain, Diganta Misra, Naomi Saphra

    Abstract: Measuring nonlinear feature interaction is an established approach to understanding complex patterns of attribution in many models. In this paper, we use Shapley Taylor interaction indices (STII) to analyze the impact of underlying data structure on model representations in a variety of modalities, tasks, and architectures. Considering linguistic structure in masked and auto-regressive language mo…

    Submitted 19 March, 2024; originally announced March 2024.

  4. arXiv:2311.18007

    astro-ph.IM astro-ph.GA cs.LG

    Towards out-of-distribution generalization in large-scale astronomical surveys: robust networks learn similar representations

    Authors: Yash Gondhalekar, Sultan Hassan, Naomi Saphra, Sambatra Andrianomena

    Abstract: The generalization of machine learning (ML) models to out-of-distribution (OOD) examples remains a key challenge in extracting information from upcoming astronomical surveys. Interpretability approaches are a natural way to gain insights into the OOD generalization problem. We use Centered Kernel Alignment (CKA), a similarity measure metric of neural network representations, to examine the relatio…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Accepted to Machine Learning and the Physical Sciences Workshop, NeurIPS 2023

  5. arXiv:2311.08695

    cs.LG cs.CL cs.CV

    Attribute Diversity Determines the Systematicity Gap in VQA

    Authors: Ian Berlot-Attwell, Kumar Krishna Agrawal, A. Michael Carrell, Yash Sharma, Naomi Saphra

    Abstract: The degree to which neural networks can generalize to new combinations of familiar concepts, and the conditions under which they are able to do so, has long been an open question. In this work, we study the systematicity gap in visual question answering: the performance difference between reasoning on previously seen and unseen combinations of object attributes. To test, we introduce a novel diagn…

    Submitted 24 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 33 pages, 20 figures

  6. arXiv:2311.05020

    cs.CL

    First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models

    Authors: Naomi Saphra, Eve Fleisig, Kyunghyun Cho, Adam Lopez

    Abstract: Many NLP researchers are experiencing an existential crisis triggered by the astonishing success of ChatGPT and other systems based on large language models (LLMs). After such a disruptive change to our understanding of the field, what is left to do? Taking a historical lens, we look for guidance from the first era of LLMs, which began in 2005 with large n-gram models for machine translation (MT…

    Submitted 25 March, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

  7. arXiv:2310.03646

    cs.LG cs.CL

    TRAM: Bridging Trust Regions and Sharpness Aware Minimization

    Authors: Tom Sherborne, Naomi Saphra, Pradeep Dasigi, Hao Peng

    Abstract: Sharpness-aware minimization (SAM) reports improving domain generalization by reducing the loss surface curvature in the parameter space. However, generalization during fine-tuning is often more dependent on the transferability of representations in the function space. Trust-region methods (TR) target this goal by regularizing representation curvature to reduce catastrophic forgetting of pre-train…

    Submitted 12 March, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Camera Ready for ICLR 2024 (Accepted as Spotlight). 21 pages, 14 tables, 2 figures

  8. arXiv:2309.07311

    cs.CL

    Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs

    Authors: Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra

    Abstract: Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model. However, certain insights into model behavior may only be accessible by observing the trajectory of the training process. We present a case study of syntax acquisition in masked language models (MLMs) that demonstrates how analyzing the evolution of interpretable artifacts throughout…

    Submitted 7 February, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: ICLR 2024 camera-ready

  9. arXiv:2308.09543

    cs.LG

    Latent State Models of Training Dynamics

    Authors: Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho

    Abstract: The impact of randomness on model training is poorly understood. How do differences in data order and initialization actually manifest in the model, such that some training runs outperform others or converge faster? Furthermore, how can we interpret the resulting training dynamics and the phase transitions that characterize different trajectories? To understand the effect of randomness on the dyna…

    Submitted 19 January, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted at TMLR 2023. Updated Jan 19, 2024 with erratum

  10. arXiv:2305.15096

    cs.CL cs.AI

    Dynamic Masking Rate Schedules for MLM Pretraining

    Authors: Zachary Ankner, Naomi Saphra, Davis Blalock, Jonathan Frankle, Matthew L. Leavitt

    Abstract: Most works on transformers trained with the Masked Language Modeling (MLM) objective use the original BERT model's fixed masking rate of 15%. We propose to instead dynamically schedule the masking rate throughout training. We find that linearly decreasing the masking rate over the course of pretraining improves average GLUE accuracy by up to 0.46% and 0.25% in BERT-base and BERT-large, respectivel…

    Submitted 10 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

  11. arXiv:2211.12424

    cs.DL cs.LG

    One Venue, Two Conferences: The Separation of Chinese and American Citation Networks

    Authors: Bingchen Zhao, Yuling Gu, Jessica Zosa Forde, Naomi Saphra

    Abstract: At NeurIPS, American and Chinese institutions cite papers from each other's regions substantially less than they cite endogamously. We build a citation graph to quantify this divide, compare it to European connectivity, and discuss the causes and consequences of the separation.

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: Workshop on Cultures of AI and AI for Culture @ NeurIPS 2022

  12. State-of-the-art generalisation research in NLP: A taxonomy and review

    Authors: Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin

    Abstract: The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is not well understood, nor are there any evaluation standards for generalisation. In this paper, we lay the groundwork to address both of these issues. We present a taxonomy for characterising and understanding generalisation…

    Submitted 12 January, 2024; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: This preprint was published as an Analysis article in Nature Machine Intelligence. Please refer to the published version when citing this work. 28 pages of content + 6 pages of appendix + 52 pages of references

    Journal ref: Nat Mach Intell 5, 1161-1174 (2023)

  13. arXiv:2208.08195

    cs.CL

    Benchmarking Compositionality with Formal Languages

    Authors: Josef Valvoda, Naomi Saphra, Jonathan Rawski, Adina Williams, Ryan Cotterell

    Abstract: Recombining known primitive concepts into larger novel combinations is a quintessentially human cognitive capability. Whether large neural models in NLP can acquire this ability while learning from data is an open question. In this paper, we investigate this problem from the perspective of formal languages. We use deterministic finite-state transducers to make an unbounded number of datasets with…

    Submitted 1 August, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: Published at COLING 2022. This version fixes a mistake in Figure 4 and adds a clarifying note in teal. Code is available at https://github.com/valvoda/neuralTransducer

  14. arXiv:2205.12411

    cs.LG cs.CL

    Linear Connectivity Reveals Generalization Strategies

    Authors: Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, Naomi Saphra

    Abstract: It is widely accepted in the mode connectivity literature that when two neural networks are trained similarly on the same data, they are connected by a path through parameter space over which test set accuracy is maintained. Under some circumstances, including transfer learning from pretrained models, these paths are presumed to be linear. In contrast to existing results, we find that among text c…

    Submitted 23 January, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Published as a conference paper at ICLR 2023

  15. arXiv:2106.16163

    cs.CL

    The MultiBERTs: BERT Reproductions for Robustness Analysis

    Authors: Thibault Sellam, Steve Yadlowsky, Jason Wei, Naomi Saphra, Alexander D'Amour, Tal Linzen, Jasmijn Bastings, Iulia Turc, Jacob Eisenstein, Dipanjan Das, Ian Tenney, Ellie Pavlick

    Abstract: Experiments with pre-trained models such as BERT are often based on a single checkpoint. While the conclusions drawn apply to the artifact tested in the experiment (i.e., the particular instance of the model), it is not always clear whether they hold for the more general procedure which includes the architecture, training data, initialization scheme, and loss function. Recent work has shown that r…

    Submitted 21 March, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: Accepted at ICLR'22. Checkpoints and example analyses: http://goo.gle/multiberts

  16. arXiv:2105.10185

    cs.CL cs.LG

    A Non-Linear Structural Probe

    Authors: Jennifer C. White, Tiago Pimentel, Naomi Saphra, Ryan Cotterell

    Abstract: Probes are models devised to investigate the encoding of knowledge -- e.g. syntactic structure -- in contextual representations. Probes are often designed for simplicity, which has led to restrictions on probe design that may not allow for the full exploitation of the structure of encoded information; one such restriction is linearity. We examine the case of a structural probe (Hewitt and Manning,…

    Submitted 21 May, 2021; originally announced May 2021.

    Comments: Accepted at NAACL 2021

  17. arXiv:2010.04650

    cs.CL cs.LG

    LSTMs Compose (and Learn) Bottom-Up

    Authors: Naomi Saphra, Adam Lopez

    Abstract: Recent work in NLP shows that LSTM language models capture hierarchical structure in language data. In contrast to existing work, we consider the learning process that leads to their compositional behavior. For a closer look at how an LSTM's sequential representations are composed hierarchically, we present a related measure of Decompositional Interdependence (DI) between word meanings in…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: Published in EMNLP Findings 2020. arXiv admin note: substantial text overlap with arXiv:2004.13195

  18. Pareto Probing: Trading Off Accuracy for Complexity

    Authors: Tiago Pimentel, Naomi Saphra, Adina Williams, Ryan Cotterell

    Abstract: The question of how to probe contextual word representations for linguistic structure in a way that is both principled and useful has seen significant attention recently in the NLP literature. In our contribution to this discussion, we argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance: the Pareto hypervolume. To measure complexity, we present…

    Submitted 4 December, 2023; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: Tiago Pimentel and Naomi Saphra contributed equally to this work. Camera ready version of EMNLP 2020 publication. In this new version, we fixed some notation issues in the appendix, and added a new appendix section describing our MLP. Code available in https://github.com/rycolab/pareto-probing

  19. arXiv:2004.13195

    cs.CL cs.LG stat.ML

    Word Interdependence Exposes How LSTMs Compose Representations

    Authors: Naomi Saphra, Adam Lopez

    Abstract: Recent work in NLP shows that LSTM language models capture compositional structure in language data. For a closer look at how these representations are composed hierarchically, we present a novel measure of interdependence between word meanings in an LSTM, based on their interactions at the internal gates. To explore how compositional representations arise over training, we conduct simple experime…

    Submitted 27 April, 2020; originally announced April 2020.

  20. arXiv:1911.04669

    cs.CL cs.IR cs.LG

    How to Evaluate Word Representations of Informal Domain?

    Authors: Yekun Chai, Naomi Saphra, Adam Lopez

    Abstract: Diverse word representations have surged in most state-of-the-art natural language processing (NLP) applications. Nevertheless, how to efficiently evaluate such word embeddings in the informal domain such as Twitter or forums, remains an ongoing challenge due to the lack of sufficient evaluation dataset. We derived a large list of variant spelling pairs from UrbanDictionary with the automatic appr…

    Submitted 12 November, 2019; v1 submitted 11 November, 2019; originally announced November 2019.

  21. arXiv:1908.01817

    cs.CL cs.LG stat.ML

    Sparsity Emerges Naturally in Neural Language Models

    Authors: Naomi Saphra, Adam Lopez

    Abstract: Concerns about interpretability, computational resources, and principled inductive priors have motivated efforts to engineer sparse neural models for NLP tasks. If sparsity is important for NLP, might well-trained neural models naturally become roughly sparse? Using the Taxi-Euclidean norm to measure sparsity, we find that frequent input words are associated with concentrated or sparse activations…

    Submitted 22 July, 2019; originally announced August 2019.

    Comments: Published in the ICML 2019 Workshop on Identifying and Understanding Deep Learning Phenomena: https://openreview.net/forum?id=H1ets1h56E

  22. Understanding Learning Dynamics Of Language Models with SVCCA

    Authors: Naomi Saphra, Adam Lopez

    Abstract: Research has shown that neural models implicitly encode linguistic features, but there has been no research showing how these encodings arise as the models are trained. We present the first study on the learning dynamics of neural language models, using a simple and flexible analysis method called Singular Vector Canonical Correlation Analysis (SVCCA), which enables us to compare learned re…

    Submitted 3 April, 2019; v1 submitted 1 November, 2018; originally announced November 2018.

    Comments: Accepted for publication in NAACL 2019

  23. arXiv:1701.03980

    stat.ML cs.CL cs.MS

    DyNet: The Dynamic Neural Network Toolkit

    Authors: Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

    Abstract: We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its deriva…

    Submitted 14 January, 2017; originally announced January 2017.

    Comments: 33 pages

  24. arXiv:1606.08270

    cs.CL

    Evaluating Informal-Domain Word Representations With UrbanDictionary

    Authors: Naomi Saphra, Adam Lopez

    Abstract: Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums. We want to test whether a representation of informal words fulfills the promise of eliding explicit text normalization as a preprocessing step. One possible evaluation metric for such domains is the proximity of spelling variants. We propose how such a metric might b…

    Submitted 27 June, 2016; originally announced June 2016.

  25. arXiv:1306.2091

    cs.CL

    A framework for (under)specifying dependency syntax without overloading annotators

    Authors: Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, Jason Baldridge

    Abstract: We introduce a framework for lightweight dependency syntax annotation. Our formalism builds upon the typical representation for unlabeled dependencies, permitting a simple notation and annotation workflow. Moreover, the formalism encourages annotators to underspecify parts of the syntax if doing so would streamline the annotation process. We demonstrate the efficacy of this annotation on three lan…

    Submitted 14 June, 2013; v1 submitted 9 June, 2013; originally announced June 2013.

    Comments: This is an expanded version of a paper appearing in Proceedings of the 7th Linguistic Annotation Workshop & Interoperability with Discourse, Sofia, Bulgaria, August 8-9, 2013