Showing 1–21 of 21 results for author: Tenney, I

  1. arXiv:2404.07498  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Interactive Prompt Debugging with Sequence Salience

    Authors: Ian Tenney, Ryan Mullins, Bin Du, Shree Pandya, Minsuk Kahng, Lucas Dixon

    Abstract: We present Sequence Salience, a visual tool for interactive prompt debugging with input salience methods. Sequence Salience builds on widely used salience methods for text classification and single-token prediction, and extends this to a system tailored for debugging complex LLM prompts. Our system is well-suited for long texts, and expands on previous work by 1) providing controllable aggregation…

    Submitted 11 April, 2024; originally announced April 2024.

  2. arXiv:2403.08295  [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  3. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2402.10524  [pdf, other]

    cs.HC cs.AI cs.CL cs.LG

    LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models

    Authors: Minsuk Kahng, Ian Tenney, Mahima Pushkarna, Michael Xieyang Liu, James Wexler, Emily Reif, Krystal Kallarackal, Minsuk Chang, Michael Terry, Lucas Dixon

    Abstract: Automatic side-by-side evaluation has emerged as a promising approach to evaluating the quality of responses from large language models (LLMs). However, analyzing the results from this evaluation approach raises scalability and interpretability challenges. In this paper, we present LLM Comparator, a novel visual analytics tool for interactively analyzing results from automatic side-by-side evaluat…

    Submitted 16 February, 2024; originally announced February 2024.

  5. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  6. arXiv:2303.08114  [pdf, other]

    cs.LG cs.CL

    Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs

    Authors: Kelvin Guu, Albert Webson, Ellie Pavlick, Lucas Dixon, Ian Tenney, Tolga Bolukbasi

    Abstract: Training data attribution (TDA) methods offer to trace a model's prediction on any given example back to specific influential training examples. Existing approaches do so by assigning a scalar influence score to each training example, under a simplifying assumption that influence is additive. But in reality, we observe that training examples interact in highly non-additive ways due to factors such…

    Submitted 14 March, 2023; originally announced March 2023.

  7. arXiv:2205.11482  [pdf, other]

    cs.CL cs.IR

    Towards Tracing Factual Knowledge in Language Models Back to the Training Data

    Authors: Ekin Akyürek, Tolga Bolukbasi, Frederick Liu, Binbin Xiong, Ian Tenney, Jacob Andreas, Kelvin Guu

    Abstract: Language models (LMs) have been shown to memorize a great deal of factual knowledge contained in their training data. But when an LM generates an assertion, it is often difficult to determine where it learned this information and whether it is true. In this paper, we propose the problem of fact tracing: identifying which training examples taught an LM to generate a particular factual assertion. Pr…

    Submitted 25 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: Findings of EMNLP, 2022

  8. arXiv:2110.07596  [pdf, other]

    cs.CL cs.AI

    Retrieval-guided Counterfactual Generation for QA

    Authors: Bhargavi Paranjape, Matthew Lamm, Ian Tenney

    Abstract: Deep NLP models have been shown to learn spurious correlations, leaving them brittle to input perturbations. Recent work has shown that counterfactual or contrastive data -- i.e. minimally perturbed inputs -- can reveal these weaknesses, and that data augmentation using counterfactuals can help ameliorate them. Proposed techniques for generating counterfactuals rely on human annotations, perturbat…

    Submitted 29 March, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: ACL 2022 Camera-ready version

  9. arXiv:2106.16163  [pdf, other]

    cs.CL

    The MultiBERTs: BERT Reproductions for Robustness Analysis

    Authors: Thibault Sellam, Steve Yadlowsky, Jason Wei, Naomi Saphra, Alexander D'Amour, Tal Linzen, Jasmijn Bastings, Iulia Turc, Jacob Eisenstein, Dipanjan Das, Ian Tenney, Ellie Pavlick

    Abstract: Experiments with pre-trained models such as BERT are often based on a single checkpoint. While the conclusions drawn apply to the artifact tested in the experiment (i.e., the particular instance of the model), it is not always clear whether they hold for the more general procedure which includes the architecture, training data, initialization scheme, and loss function. Recent work has shown that r…

    Submitted 21 March, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: Accepted at ICLR'22. Checkpoints and example analyses: http://goo.gle/multiberts

  10. arXiv:2010.06032  [pdf, other]

    cs.CL

    Measuring and Reducing Gendered Correlations in Pre-trained Models

    Authors: Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, Ed Chi, Slav Petrov

    Abstract: Pre-trained models have revolutionized natural language understanding. However, researchers have found they can encode artifacts undesired in many applications, such as professions correlating with one gender more than another. We explore such gendered correlations as a case study for how to address unintended correlations in pre-trained models. We define metrics and reveal that it is possible for…

    Submitted 2 March, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

  11. arXiv:2010.05345  [pdf, other]

    cs.CL

    Do Language Embeddings Capture Scales?

    Authors: Xikun Zhang, Deepak Ramachandran, Ian Tenney, Yanai Elazar, Dan Roth

    Abstract: Pretrained Language Models (LMs) have been shown to possess significant linguistic, common sense, and factual knowledge. One form of knowledge that has not been studied yet in this context is information about the scalar magnitudes of objects. We show that pretrained language models capture a significant amount of this information but are short of the capability required for general common-sense r…

    Submitted 24 November, 2020; v1 submitted 11 October, 2020; originally announced October 2020.

    Comments: Accepted at EMNLP Findings 2020 and EMNLP BlackboxNLP workshop 2020; 8 pages, 2 figures; Minor changes to the acknowledgment section

    ACM Class: I.2.7

  12. arXiv:2008.05122  [pdf, other]

    cs.CL

    The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models

    Authors: Ian Tenney, James Wexler, Jasmijn Bastings, Tolga Bolukbasi, Andy Coenen, Sebastian Gehrmann, Ellen Jiang, Mahima Pushkarna, Carey Radebaugh, Emily Reif, Ann Yuan

    Abstract: We present the Language Interpretability Tool (LIT), an open-source platform for visualization and understanding of NLP models. We focus on core questions about model behavior: Why did my model make this prediction? When does it perform poorly? What happens under a controlled change in the input? LIT integrates local explanations, aggregate analysis, and counterfactual generation into a streamline…

    Submitted 12 August, 2020; originally announced August 2020.

  13. arXiv:2004.14513  [pdf, other]

    cs.CL

    Asking without Telling: Exploring Latent Ontologies in Contextual Representations

    Authors: Julian Michael, Jan A. Botha, Ian Tenney

    Abstract: The success of pretrained contextual encoders, such as ELMo and BERT, has brought a great deal of interest in what these models learn: do they, without explicit supervision, learn to encode meaningful notions of linguistic structure? If so, how is this structure encoded? To investigate this, we introduce latent subclass learning (LSL): a modification to existing classifier-based probing methods th…

    Submitted 8 October, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: 21 pages, 8 figures, 11 tables. Published in EMNLP 2020

    ACM Class: I.2.7

  14. arXiv:2004.14448  [pdf, other]

    cs.CL

    What Happens To BERT Embeddings During Fine-tuning?

    Authors: Amil Merchant, Elahe Rahimtoroghi, Ellie Pavlick, Ian Tenney

    Abstract: While there has been much recent work studying how linguistic information is encoded in pre-trained sentence representations, comparatively little is understood about how these models change when adapted to solve downstream tasks. Using a suite of analysis techniques (probing classifiers, Representational Similarity Analysis, and model ablations), we investigate how fine-tuning affects the represe…

    Submitted 29 April, 2020; originally announced April 2020.

    Comments: 9 pages (not including references), 5 figures

  15. arXiv:2003.02249  [pdf, other]

    cs.CL

    jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models

    Authors: Yada Pruksachatkun, Phil Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Wang, Ian Tenney, Samuel R. Bowman

    Abstract: We introduce jiant, an open source toolkit for conducting multitask and transfer learning experiments on English NLU tasks. jiant enables modular and configuration-driven experimentation with state-of-the-art models and implements a broad set of tasks for probing, transfer learning, and multitask training experiments. jiant implements over 50 NLU tasks, including all GLUE and SuperGLUE benchmark t…

    Submitted 13 May, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

  16. arXiv:1905.06316  [pdf, other]

    cs.CL

    What do you learn from context? Probing for sentence structure in contextualized word representations

    Authors: Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, Ellie Pavlick

    Abstract: Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of downstream NLP tasks. Building on recent token-level probing work, we introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. We probe…

    Submitted 15 May, 2019; originally announced May 2019.

    Comments: ICLR 2019 camera-ready version, 17 pages including appendices

  17. arXiv:1905.05950  [pdf, other]

    cs.CL

    BERT Rediscovers the Classical NLP Pipeline

    Authors: Ian Tenney, Dipanjan Das, Ellie Pavlick

    Abstract: Pre-trained text encoders have rapidly advanced the state of the art on many NLP tasks. We focus on one such model, BERT, and aim to quantify where linguistic information is captured within the network. We find that the model represents the steps of the traditional NLP pipeline in an interpretable and localizable way, and that the regions responsible for each step appear in the expected sequence:…

    Submitted 9 August, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

    Comments: Presented at ACL 2019

    Journal ref: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (2019) 4593-4601

  18. arXiv:1904.11544  [pdf, other]

    cs.CL

    Probing What Different NLP Tasks Teach Machines about Function Word Comprehension

    Authors: Najoung Kim, Roma Patel, Adam Poliak, Alex Wang, Patrick Xia, R. Thomas McCoy, Ian Tenney, Alexis Ross, Tal Linzen, Benjamin Van Durme, Samuel R. Bowman, Ellie Pavlick

    Abstract: We introduce a set of nine challenge tasks that test for the understanding of function words. These tasks are created by structurally mutating sentences from existing datasets to target the comprehension of specific types of function words (e.g., prepositions, wh-words). Using these probing tasks, we explore the effects of various pretraining objectives for sentence encoders (e.g., language modeli…

    Submitted 7 August, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

    Comments: Accepted to *SEM 2019 (revised submission). Corresponding authors: Najoung Kim, Ellie Pavlick

  19. arXiv:1812.10860  [pdf, other]

    cs.CL

    Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling

    Authors: Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman

    Abstract: Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which are pretrained on variants of language modeling. We conduct the first large-scale systematic study of candidate pretraining tasks, comparing 19 different tasks both as alternatives and complements to language modeling. Our prim…

    Submitted 22 July, 2019; v1 submitted 27 December, 2018; originally announced December 2018.

    Comments: ACL 2019. This paper supersedes "Looking for ELMo's Friends: Sentence-Level Pretraining Beyond Language Modeling", an earlier version of this work by the same authors

  20. arXiv:1808.09422  [pdf, other]

    cs.CL

    WikiAtomicEdits: A Multilingual Corpus of Wikipedia Edits for Modeling Language and Discourse

    Authors: Manaal Faruqui, Ellie Pavlick, Ian Tenney, Dipanjan Das

    Abstract: We release a corpus of 43 million atomic edits across 8 languages. These edits are mined from Wikipedia edit history and consist of instances in which a human editor has inserted a single contiguous phrase into, or deleted a single contiguous phrase from, an existing sentence. We use the collected data to show that the language generated during editing differs from the language that we observe in…

    Submitted 28 August, 2018; originally announced August 2018.

    Journal ref: Proc. of EMNLP 2018

  21. arXiv:1301.3104  [pdf]

    physics.chem-ph

    Delayed Ultrafast X-ray Auger Probing (DUXAP) of Nucleobase Ultraviolet Photoprotection

    Authors: B. K. McFarland, J. P. Farrell, S. Miyabe, F. Tarantelli, A. Aguilar, N. Berrah, C. Bostedt, J. Bozek, P. H. Bucksbaum, J. C. Castagna, R. Coffee, J. Cryan, L. Fang, R. Feifel, K. Gaffney, J. Glownia, T. Martinez, M. Mucke, B. Murphy, A. Natan, T. Osipov, V. Petrovic, S. Schorb, Th. Schultz, L. Spector, et al. (6 additional authors not shown)

    Abstract: We present a new method for ultrafast spectroscopy of molecular photoexcited dynamics. The technique uses a pair of femtosecond pulses: a photoexcitation pulse initiating excited state dynamics followed by a soft x-ray (SXR) probe pulse that core ionizes certain atoms inside the molecule. We observe the Auger decay of the core hole as a function of delay between the photoexcitation and SXR pulses.…

    Submitted 14 January, 2013; originally announced January 2013.