Showing 1–38 of 38 results for author: Wallace, E

  1. arXiv:2406.20053  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation

    Authors: Danny Halawi, Alexander Wei, Eric Wallace, Tony T. Wang, Nika Haghtalab, Jacob Steinhardt

    Abstract: Black-box finetuning is an emerging interface for adapting state-of-the-art language models to user needs. However, such access may also let malicious actors undermine model safety. To demonstrate the challenge of defending finetuning interfaces, we introduce covert malicious finetuning, a method to compromise model safety via finetuning while evading detection. Our method constructs a malicious d…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 22 pages

  2. arXiv:2404.13208  [pdf, other]

    cs.CR cs.CL cs.LG

    The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

    Authors: Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Johannes Heidecke, Alex Beutel

    Abstract: Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts. In this work, we argue that one of the primary vulnerabilities underlying these attacks is that LLMs often consider system prompts (e.g., text from an application developer) to be the same priority as text from untrus…

    Submitted 19 April, 2024; originally announced April 2024.

  3. arXiv:2403.06634  [pdf, other]

    cs.CR

    Stealing Part of a Production Language Model

    Authors: Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Eric Wallace, David Rolnick, Florian Tramèr

    Abstract: We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \…

    Submitted 11 March, 2024; originally announced March 2024.

  4. arXiv:2403.05612  [pdf, other]

    cs.LG cs.AI cs.CL

    Unfamiliar Finetuning Examples Control How Language Models Hallucinate

    Authors: Katie Kang, Eric Wallace, Claire Tomlin, Aviral Kumar, Sergey Levine

    Abstract: Large language models are known to hallucinate when faced with unfamiliar queries, but the underlying mechanisms that govern how models hallucinate are not yet fully understood. In this work, we find that unfamiliar examples in the models' finetuning data -- those that introduce concepts beyond the base model's scope of knowledge -- are crucial in shaping these errors. In particular, we find that a…

    Submitted 28 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  5. arXiv:2402.11782  [pdf, other]

    cs.CL cs.LG

    What Evidence Do Language Models Find Convincing?

    Authors: Alexander Wan, Eric Wallace, Dan Klein

    Abstract: Retrieval-augmented language models are being increasingly tasked with subjective, contentious, and conflicting queries such as "is aspartame linked to cancer". To resolve these ambiguous queries, one must search through a large range of websites and consider "which, if any, of this evidence do I find convincing?". In this work, we study how LLMs answer this question. In particular, we construct C…

    Submitted 18 February, 2024; originally announced February 2024.

  6. arXiv:2311.17035  [pdf, other]

    cs.LG cs.CL cs.CR

    Scalable Extraction of Training Data from (Production) Language Models

    Authors: Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee

    Abstract: This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from…

    Submitted 28 November, 2023; originally announced November 2023.

  7. arXiv:2309.05610  [pdf, other]

    cs.CR cs.LG

    Privacy Side Channels in Machine Learning Systems

    Authors: Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr

    Abstract: Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more. In this work, we introduce privacy side channels: attacks that exploit these system-level components to extract private information at far higher rates th…

    Submitted 11 September, 2023; originally announced September 2023.

  8. arXiv:2308.04430  [pdf, other]

    cs.CL cs.AI cs.LG

    SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

    Authors: Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer

    Abstract: The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government documents), due to its limited size and domain coverage. We present SILO, a new language model that manages this risk-performance tradeoff during…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: 27 pages; 6 figures. Code, models, and data available at https://github.com/kernelmachine/silo-lm

  9. arXiv:2305.15717  [pdf, other]

    cs.CL

    The False Promise of Imitating Proprietary LLMs

    Authors: Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song

    Abstract: An emerging method to cheaply improve a weaker language model is to finetune it on outputs from a stronger model, such as a proprietary system like ChatGPT (e.g., Alpaca, Self-Instruct, and others). This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model. In this work, we critically analyze this approach. We first finetune a series of LMs that i…

    Submitted 25 May, 2023; originally announced May 2023.

  10. arXiv:2305.00944  [pdf, other]

    cs.CL cs.CR cs.LG

    Poisoning Language Models During Instruction Tuning

    Authors: Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein

    Abstract: Instruction-tuned LMs such as ChatGPT, FLAN, and InstructGPT are finetuned on datasets that contain user-submitted examples, e.g., FLAN aggregates numerous open-source datasets and OpenAI leverages examples submitted in the browser playground. In this work, we show that adversaries can contribute poison examples to these datasets, allowing them to manipulate model predictions whenever a desired tr…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: ICML 2023

  11. arXiv:2301.13188  [pdf, other]

    cs.CR cs.CV cs.LG

    Extracting Training Data from Diffusion Models

    Authors: Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

    Abstract: Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the…

    Submitted 30 January, 2023; originally announced January 2023.

  12. arXiv:2211.08411  [pdf, other]

    cs.CL cs.LG

    Large Language Models Struggle to Learn Long-Tail Knowledge

    Authors: Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, Colin Raffel

    Abstract: The Internet contains a wealth of knowledge -- from the birthdays of historical figures to tutorials on how to code -- all of which may be learned by language models. However, while certain pieces of information are ubiquitous on the web, others appear extremely rarely. In this paper, we study the relationship between the knowledge memorized by large language models and the information in pre-trai…

    Submitted 27 July, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: ICML 2023 Camera Ready Version

  13. arXiv:2207.00099  [pdf, other]

    cs.LG

    Measuring Forgetting of Memorized Training Examples

    Authors: Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Chiyuan Zhang

    Abstract: Machine learning models exhibit two seemingly contradictory phenomena: training data memorization, and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples which appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what…

    Submitted 9 May, 2023; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: Appeared at ICLR '23, 22 pages, 12 figures

  14. arXiv:2205.09665  [pdf, other]

    cs.CL

    Automated Crossword Solving

    Authors: Eric Wallace, Nicholas Tomlin, Albert Xu, Kevin Yang, Eshaan Pathak, Matthew Ginsberg, Dan Klein

    Abstract: We present the Berkeley Crossword Solver, a state-of-the-art approach for automatically solving crossword puzzles. Our system works by generating answer candidates for each crossword clue using neural question answering models and then combines loopy belief propagation with local search to find full puzzle solutions. Compared to existing approaches, our system improves exact puzzle accuracy from 7…

    Submitted 3 July, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

    Comments: ACL 2022

  15. arXiv:2204.05999  [pdf, other]

    cs.SE cs.CL cs.LG

    InCoder: A Generative Model for Code Infilling and Synthesis

    Authors: Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis

    Abstract: Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and move…

    Submitted 9 April, 2023; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: ICLR 2023. v3: camera-ready that includes PLBART and OpenAI baselines

  16. arXiv:2202.06539  [pdf, other]

    cs.CR cs.CL cs.LG

    Deduplicating Training Data Mitigates Privacy Risks in Language Models

    Authors: Nikhil Kandpal, Eric Wallace, Colin Raffel

    Abstract: Past work has shown that large language models are susceptible to privacy attacks, where adversaries generate sequences from a trained model and detect which sequences are memorized from the training set. In this work, we show that the success of these attacks is largely due to duplication in commonly used web-scraped training sets. We first show that the rate at which language models regenerate t…

    Submitted 20 December, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: ICML 2022 Camera Ready Version

  17. arXiv:2110.08514  [pdf, other]

    cs.CL cs.LG

    Analyzing Dynamic Adversarial Training Data in the Limit

    Authors: Eric Wallace, Adina Williams, Robin Jia, Douwe Kiela

    Abstract: To create models that are robust across a wide range of test inputs, training datasets should include diverse examples that span numerous phenomena. Dynamic adversarial data collection (DADC), where annotators craft examples that challenge continually improving models, holds promise as an approach for generating such diverse training sets. Prior work has shown that running DADC over 1-3 rounds can…

    Submitted 26 September, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: ACL Findings 2022

  18. arXiv:2106.13353  [pdf, other]

    cs.CL cs.LG

    Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

    Authors: Robert L. Logan IV, Ivana Balažević, Eric Wallace, Fabio Petroni, Sameer Singh, Sebastian Riedel

    Abstract: Prompting language models (LMs) with training examples and task descriptions has been seen as critical to recent successes in few-shot learning. In this work, we show that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering. In fact, one can use null prompts, prompts that contain neither task-specific templates nor training examples, and achieve competiti…

    Submitted 1 July, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

  19. arXiv:2104.06390  [pdf, other]

    cs.CL cs.LG

    Detoxifying Language Models Risks Marginalizing Minority Voices

    Authors: Albert Xu, Eshaan Pathak, Eric Wallace, Suchin Gururangan, Maarten Sap, Dan Klein

    Abstract: Language models (LMs) must be both safe and equitable to be responsibly deployed in practice. With safety in mind, numerous detoxification techniques (e.g., Dathathri et al. 2020; Krause et al. 2020) have been proposed to mitigate toxic LM generations. In this work, we show that current detoxification techniques hurt equity: they decrease the utility of LMs on language used by marginalized groups…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  20. arXiv:2102.09690  [pdf, other]

    cs.CL cs.LG

    Calibrate Before Use: Improving Few-Shot Performance of Language Models

    Authors: Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh

    Abstract: GPT-3 can perform numerous tasks when provided a natural language prompt that contains a few training examples. We show that this type of few-shot learning can be unstable: the choice of prompt format, training examples, and even the order of the training examples can cause accuracy to vary from near chance to near state-of-the-art. We demonstrate that this instability arises from the bias of lang…

    Submitted 10 June, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: ICML 2021

  21. arXiv:2012.07805  [pdf, other]

    cs.CR cs.CL cs.LG

    Extracting Training Data from Large Language Models

    Authors: Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel

    Abstract: It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model. We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and ar…

    Submitted 15 June, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

  22. arXiv:2010.15980  [pdf, other]

    cs.CL cs.LG

    AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts

    Authors: Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, Sameer Singh

    Abstract: The remarkable success of pretrained language models has motivated the study of what kinds of knowledge these models learn during pretraining. Reformulating tasks as fill-in-the-blanks problems (e.g., cloze tests) is a natural approach for gauging such knowledge, however, its usage is limited by the manual effort and guesswork required to write suitable prompts. To address this, we develop AutoPro…

    Submitted 7 November, 2020; v1 submitted 29 October, 2020; originally announced October 2020.

    Comments: v2: Fixed error in Figure 2

  23. arXiv:2010.12563  [pdf, other]

    cs.CL

    Concealed Data Poisoning Attacks on NLP Models

    Authors: Eric Wallace, Tony Z. Zhao, Shi Feng, Sameer Singh

    Abstract: Adversarial attacks alter NLP model predictions by perturbing test-time inputs. However, it is much less understood whether, and how, predictions can be manipulated with small, concealed changes to the training data. In this work, we develop a new data poisoning attack that allows an adversary to control model predictions whenever a desired trigger phrase is present in the input. For instance, we…

    Submitted 12 April, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: NAACL 2021

  24. arXiv:2010.05419  [pdf, other]

    cs.CL cs.LG

    Gradient-based Analysis of NLP Models is Manipulable

    Authors: Junlin Wang, Jens Tuyls, Eric Wallace, Sameer Singh

    Abstract: Gradient-based analysis methods, such as saliency map visualizations and adversarial input perturbations, have found widespread use in interpreting neural NLP models due to their simplicity, flexibility, and most importantly, their faithfulness. In this paper, however, we demonstrate that the gradients of a model are easily manipulable, and thus bring into question the reliability of gradient-base…

    Submitted 11 October, 2020; originally announced October 2020.

  25. arXiv:2008.04449  [pdf, ps, other]

    cs.CR cs.AI cs.AR cs.CY cs.LG

    Trustworthy AI Inference Systems: An Industry Research View

    Authors: Rosario Cammarota, Matthias Schunter, Anand Rajan, Fabian Boemer, Ágnes Kiss, Amos Treiber, Christian Weinert, Thomas Schneider, Emmanuel Stapf, Ahmad-Reza Sadeghi, Daniel Demmler, Joshua Stock, Huili Chen, Siam Umar Hussain, Sadegh Riazi, Farinaz Koushanfar, Saransh Gupta, Tajan Simunic Rosing, Kamalika Chaudhuri, Hamid Nejatollahi, Nikil Dutt, Mohsen Imani, Kim Laine, Anuj Dubey, Aydin Aysu, et al. (4 additional authors not shown)

    Abstract: In this work, we provide an industry research view for approaching the design, deployment, and operation of trustworthy Artificial Intelligence (AI) inference systems. Such systems provide customers with timely, informed, and customized inferences to aid their decision, while at the same time utilizing appropriate security protection mechanisms for AI models. Additionally, such systems should also…

    Submitted 10 February, 2023; v1 submitted 10 August, 2020; originally announced August 2020.

  26. arXiv:2004.15015  [pdf, other]

    cs.CL cs.CR cs.LG

    Imitation Attacks and Defenses for Black-box Machine Translation Systems

    Authors: Eric Wallace, Mitchell Stern, Dawn Song

    Abstract: Adversaries may look to steal or attack black-box NLP systems, either for financial gain or to exploit model errors. One setting of particular interest is machine translation (MT), where models have high commercial value and errors can be costly. We investigate possible exploits of black-box MT systems and explore a preliminary defense against such threats. We first show that MT systems can be sto…

    Submitted 3 January, 2021; v1 submitted 30 April, 2020; originally announced April 2020.

    Comments: EMNLP 2020

  27. arXiv:2004.06100  [pdf, other]

    cs.CL cs.LG

    Pretrained Transformers Improve Out-of-Distribution Robustness

    Authors: Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, Dawn Song

    Abstract: Although pretrained Transformers such as BERT achieve high accuracy on in-distribution examples, do they generalize to new distributions? We systematically measure out-of-distribution (OOD) generalization for seven NLP datasets by constructing a new robustness benchmark with realistic distribution shifts. We measure the generalization of previous models including bag-of-words models, ConvNets, and…

    Submitted 16 April, 2020; v1 submitted 13 April, 2020; originally announced April 2020.

    Comments: ACL 2020

  28. arXiv:2004.02709  [pdf, other]

    cs.CL

    Evaluating Models' Local Decision Boundaries via Contrast Sets

    Authors: Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, et al. (1 additional author not shown)

    Abstract: Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities. We propose a new annotation paradigm for NLP that helps to close systemati…

    Submitted 1 October, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

  29. arXiv:2002.11794  [pdf, other]

    cs.CL cs.LG

    Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

    Authors: Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez

    Abstract: Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that…

    Submitted 22 June, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

    Comments: ICML 2020

  30. arXiv:1909.09251  [pdf, other]

    cs.CL cs.LG

    AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models

    Authors: Eric Wallace, Jens Tuyls, Junlin Wang, Sanjay Subramanian, Matt Gardner, Sameer Singh

    Abstract: Neural NLP models are increasingly accurate but are imperfect and opaque---they break in counterintuitive ways and leave end users puzzled at their behavior. Model interpretation methods ameliorate this opacity by providing explanations for specific model predictions. Unfortunately, existing interpretation codebases make it difficult to apply these methods to new models and tasks, which hinders ad…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019 Demo

  31. arXiv:1909.07940  [pdf, other]

    cs.CL cs.LG

    Do NLP Models Know Numbers? Probing Numeracy in Embeddings

    Authors: Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, Matt Gardner

    Abstract: The ability to understand and work with numbers (numeracy) is critical for many complex reasoning tasks. Currently, most NLP models treat numbers in text in the same way as other tokens---they embed them as distributed vectors. Is this enough to capture numeracy? We begin by investigating the numerical reasoning capabilities of a state-of-the-art question answering model on the DROP dataset. We fi…

    Submitted 18 September, 2019; v1 submitted 17 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019

  32. arXiv:1908.07125  [pdf, other]

    cs.CL cs.LG

    Universal Adversarial Triggers for Attacking and Analyzing NLP

    Authors: Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh

    Abstract: Adversarial examples highlight model vulnerabilities and are useful for evaluation and interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens that trigger a model to produce a specific prediction when concatenated to any input from a dataset. We propose a gradient-guided search over tokens which finds short trigger sequences (e.g., one word for classification…

    Submitted 3 January, 2021; v1 submitted 19 August, 2019; originally announced August 2019.

    Comments: EMNLP 2019

  33. arXiv:1906.02900  [pdf, other]

    cs.CL cs.AI

    Compositional Questions Do Not Necessitate Multi-hop Reasoning

    Authors: Sewon Min, Eric Wallace, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi, Luke Zettlemoyer

    Abstract: Multi-hop reading comprehension (RC) questions are challenging because they require reading and reasoning over multiple paragraphs. We argue that it can be difficult to construct large multi-hop RC datasets. For example, even highly compositional questions can be answered with a single hop if they target specific entity types, or the facts needed to answer them are redundant. Our analysis is cente…

    Submitted 7 June, 2019; originally announced June 2019.

    Comments: Published as a conference paper at ACL 2019 (short). Code available at https://github.com/shmsw25/single-hop-rc

  34. arXiv:1905.05778  [pdf, ps, other]

    cs.LG cs.AI cs.CL stat.ML

    Misleading Failures of Partial-input Baselines

    Authors: Shi Feng, Eric Wallace, Jordan Boyd-Graber

    Abstract: Recent work establishes dataset difficulty and removes annotation artifacts via partial-input baselines (e.g., hypothesis-only models for SNLI or question-only models for VQA). When a partial-input baseline gets high accuracy, a dataset is cheatable. However, the converse is not necessarily true: the failure of a partial-input baseline does not mean a dataset is free of artifacts. To illustrate th…

    Submitted 18 June, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Comments: ACL 2019

  35. arXiv:1902.00407  [pdf, other]

    cs.LG stat.ML

    Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation

    Authors: Sahil Singla, Eric Wallace, Shi Feng, Soheil Feizi

    Abstract: Current methods to interpret deep learning models by generating saliency maps generally rely on two key assumptions. First, they use first-order approximations of the loss function neglecting higher-order terms such as the loss curvatures. Second, they evaluate each feature's importance in isolation, ignoring their inter-dependencies. In this work, we study the effect of relaxing these two assumpt…

    Submitted 30 May, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

    Comments: Proceedings of the 36th International Conference on Machine Learning, 2019

  36. arXiv:1809.02847  [pdf, other]

    cs.CL

    Interpreting Neural Networks With Nearest Neighbors

    Authors: Eric Wallace, Shi Feng, Jordan Boyd-Graber

    Abstract: Local model interpretation methods explain individual predictions by assigning an importance value to each input feature. This value is often determined by measuring the change in confidence when a feature is removed. However, the confidence of neural networks is not a robust measure of model uncertainty. This issue makes reliably judging the importance of the input features difficult. We address…

    Submitted 7 November, 2018; v1 submitted 8 September, 2018; originally announced September 2018.

    Comments: EMNLP 2018 BlackboxNLP

  37. arXiv:1809.02701  [pdf, other]

    cs.CL

    Trick Me If You Can: Human-in-the-loop Generation of Adversarial Examples for Question Answering

    Authors: Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, Jordan Boyd-Graber

    Abstract: Adversarial evaluation stress tests a model's understanding of natural language. While past approaches expose superficial patterns, the resulting adversarial examples are limited in complexity and diversity. We propose human-in-the-loop adversarial generation, where human authors are guided to break models. We aid the authors with interpretations of model predictions through an interactive user in…

    Submitted 16 July, 2019; v1 submitted 7 September, 2018; originally announced September 2018.

    Comments: Author final version of article accepted for publication in TACL 2019

  38. Pathologies of Neural Models Make Interpretations Difficult

    Authors: Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, Jordan Boyd-Graber

    Abstract: One way to interpret neural model predictions is to highlight the most important input features---for example, a heatmap visualization over the words in an input sentence. In existing interpretation methods for NLP, a word's importance is determined by either input perturbation---measuring the decrease in model confidence when that word is removed---or by the gradient with respect to that word. To…

    Submitted 28 August, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

    Comments: EMNLP 2018 camera ready