Showing 1–14 of 14 results for author: Wiegreffe, S

Searching in archive cs.
  1. arXiv:2401.06751  [pdf, other]

    cs.CL cs.AI cs.LG

    The Unreasonable Effectiveness of Easy Training Data for Hard Tasks

    Authors: Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe

    Abstract: How can we train models to perform well on hard test data when hard training data is by definition difficult to label correctly? This question has been termed the scalable oversight problem and has drawn increasing attention as language models have continually improved. In this paper, we present the surprising conclusion that current pretrained language models often generalize relatively well from…

    Submitted 5 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: ACL 2024. 23 pages, 20 figures

  2. arXiv:2311.09605  [pdf, other]

    cs.CL

    Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals

    Authors: Yanai Elazar, Bhargavi Paranjape, Hao Peng, Sarah Wiegreffe, Khyathi Raghavi, Vivek Srikumar, Sameer Singh, Noah A. Smith

    Abstract: The inevitable appearance of spurious correlations in training datasets hurts the generalization of NLP models on unseen data. Previous work has found that datasets with paired inputs are prone to correlations between a specific part of the input (e.g., the hypothesis in NLI) and the label; consequently, models trained only on those outperform chance. Are these correlations picked up by models tra…

    Submitted 16 November, 2023; originally announced November 2023.

  3. arXiv:2305.14956  [pdf, other]

    cs.CL

    Editing Common Sense in Transformers

    Authors: Anshita Gupta, Debanjan Mondal, Akshay Krishna Sheshadri, Wenlong Zhao, Xiang Lorraine Li, Sarah Wiegreffe, Niket Tandon

    Abstract: Editing model parameters directly in Transformers makes updating open-source transformer-based models possible without re-training (Meng et al., 2023). However, these editing methods have only been evaluated on statements about encyclopedic knowledge with a single correct answer. Commonsense knowledge with multiple correct answers, e.g., an apple can be green or red but not transparent, has not be…

    Submitted 26 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP 2023 Main Conference. Anshita, Debanjan, Akshay are co-first authors. Code and datasets for all experiments are available at https://github.com/anshitag/memit_csk

  4. arXiv:2305.14596  [pdf, other]

    cs.CL cs.LG

    Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy

    Authors: Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Peter Clark, Ashish Sabharwal

    Abstract: When pretrained language models (LMs) are applied to discriminative tasks such as multiple-choice questions, they place probability mass on vocabulary tokens that aren't among the given answer choices. Spreading probability mass across multiple surface forms with identical meaning (such as "bath" and "bathtub") is thought to cause an underestimation of a model's true performance, referred to as th…

    Submitted 31 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  5. arXiv:2303.17651  [pdf, other]

    cs.CL cs.AI cs.LG

    Self-Refine: Iterative Refinement with Self-Feedback

    Authors: Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark

    Abstract: Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an initial output using an LLMs; then, the same LLMs provides feedback for its output and uses it…

    Submitted 25 May, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: Code, data, and demo at https://selfrefine.info/

  6. arXiv:2204.07693  [pdf, other]

    cs.CL

    Calibrating Trust of Multi-Hop Question Answering Systems with Decompositional Probes

    Authors: Kaige Xie, Sarah Wiegreffe, Mark Riedl

    Abstract: Multi-hop Question Answering (QA) is a challenging task since it requires an accurate aggregation of information from multiple context paragraphs and a thorough understanding of the underlying reasoning chains. Recent work in multi-hop QA has shown that performance can be boosted by first decomposing the questions into simpler, single-hop questions. In this paper, we explore one additional utility…

    Submitted 31 October, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: Accepted to EMNLP 2022 Findings

  7. arXiv:2112.08674  [pdf, other]

    cs.CL

    Reframing Human-AI Collaboration for Generating Free-Text Explanations

    Authors: Sarah Wiegreffe, Jack Hessel, Swabha Swayamdipta, Mark Riedl, Yejin Choi

    Abstract: Large language models are increasingly capable of generating fluent-appearing text with relatively little task-specific supervision. But can these models accurately explain classification decisions? We consider the task of generating free-text explanations using human-written examples in a few-shot manner. We find that (1) authoring higher quality prompts results in higher quality generations; and…

    Submitted 4 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: NAACL 2022 Camera-ready. 13 pages main + references, 14 pages appendix

  8. arXiv:2105.01311  [pdf, other]

    cs.CL

    Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning

    Authors: Xiangyu Peng, Siyan Li, Sarah Wiegreffe, Mark Riedl

    Abstract: Transformer-based language model approaches to automated story generation currently provide state-of-the-art results. However, they still suffer from plot incoherence when generating narratives over time, and critically lack basic commonsense reasoning. Furthermore, existing methods generally focus only on single-character stories, or fail to track characters at all. To improve the coherence of ge…

    Submitted 17 November, 2023; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: Findings of EMNLP 2022. For conference video and anthology version, see https://aclanthology.org/2022.findings-emnlp.520/

  9. arXiv:2102.12060  [pdf, other]

    cs.CL cs.AI cs.LG

    Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing

    Authors: Sarah Wiegreffe, Ana Marasović

    Abstract: Explainable NLP (ExNLP) has increasingly focused on collecting human-annotated textual explanations. These explanations are used downstream in three ways: as data augmentation to improve performance on a predictive task, as supervision to train models to produce explanations for their predictions, and as a ground-truth to evaluate model-generated explanations. In this review, we identify 65 datase…

    Submitted 7 December, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: v3: NeurIPS 2021 accepted paper camera-ready version. The content of v3 is almost the same as of v1-2 but is more condensed. v4: Fixed a typo in the title and added acknowledgements. 10 pages main, 6 pages appendix

  10. arXiv:2010.12762  [pdf, other]

    cs.CL

    Measuring Association Between Labels and Free-Text Rationales

    Authors: Sarah Wiegreffe, Ana Marasović, Noah A. Smith

    Abstract: In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance. While prior work focuses on extractive rationales (a subset of the input words), we investigate their less-studied counterpart: free-text natural language rationales. We demonstrate that pipelines, existing models for faithful extractive rationalization on information-ex…

    Submitted 29 August, 2022; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Revision to EMNLP 2021 camera-ready; corrects simulatability terminology and clarifies computation of rationale quality metric (no results changed). For a detailed explanation of changes, see https://github.com/allenai/label_rationale_association

  11. arXiv:2005.00115  [pdf, other]

    cs.CL cs.AI cs.LG

    Learning to Faithfully Rationalize by Construction

    Authors: Sarthak Jain, Sarah Wiegreffe, Yuval Pinter, Byron C. Wallace

    Abstract: In many settings it is important for one to be able to understand why a model made a particular prediction. In NLP this often entails extracting snippets of an input text 'responsible for' corresponding model output; when such a snippet comprises tokens that indeed informed the model's prediction, it is a faithful explanation. In some settings, faithfulness may be critical to ensure transparency.…

    Submitted 30 April, 2020; originally announced May 2020.

    Comments: ACL2020 Camera Ready Submission

  12. arXiv:1908.04626  [pdf, other]

    cs.CL

    Attention is not not Explanation

    Authors: Sarah Wiegreffe, Yuval Pinter

    Abstract: Attention mechanisms play a central role in NLP systems, especially within recurrent neural network (RNN) models. Recently, there has been increasing interest in whether or not the intermediate representations offered by these modules may be used to explain the reasoning for a model's prediction, and consequently reach insights regarding the model's decision-making process. A recent paper claims t…

    Submitted 5 September, 2019; v1 submitted 13 August, 2019; originally announced August 2019.

    Comments: Accepted to EMNLP 2019; related blog post at https://medium.com/@yuvalpinter/attention-is-not-not-explanation-dbc25b534017

  13. arXiv:1906.03380  [pdf, other]

    cs.CL

    Clinical Concept Extraction for Document-Level Coding

    Authors: Sarah Wiegreffe, Edward Choi, Sherry Yan, Jimeng Sun, Jacob Eisenstein

    Abstract: The text of clinical notes can be a valuable source of patient information and clinical assessments. Historically, the primary approach for exploiting clinical notes has been information extraction: linking spans of text to concepts in a detailed domain ontology. However, recent work has demonstrated the potential of supervised machine learning to extract document-level codes directly from the raw…

    Submitted 7 June, 2019; originally announced June 2019.

    Comments: ACL BioNLP workshop (2019)

  14. arXiv:1802.05695  [pdf, other]

    cs.CL cs.LG stat.ML

    Explainable Prediction of Medical Codes from Clinical Text

    Authors: James Mullenbach, Sarah Wiegreffe, Jon Duke, Jimeng Sun, Jacob Eisenstein

    Abstract: Clinical notes are text documents that are created by clinicians for each patient encounter. They are typically accompanied by medical codes, which describe the diagnosis and treatment. Annotating these codes is labor intensive and error prone; furthermore, the connection between the codes and the text is not annotated, obscuring the reasons and details behind specific diagnoses and treatments. We…

    Submitted 16 April, 2018; v1 submitted 15 February, 2018; originally announced February 2018.

    Comments: NAACL 2018