Showing 1–29 of 29 results for author: Li, X L

Searching in archive cs.
  1. arXiv:2406.13905

    cs.CL

    Persuasiveness of Generated Free-Text Rationales in Subjective Decisions: A Case Study on Pairwise Argument Ranking

    Authors: Mohamed Elaraby, Diane Litman, Xiang Lorraine Li, Ahmed Magooda

    Abstract: Generating free-text rationales is among the emergent capabilities of Large Language Models (LLMs). These rationales have been found to enhance LLM performance across various NLP tasks. Recently, there has been growing interest in using these rationales to provide insights for various important downstream tasks. In this paper, we analyze generated free-text rationales in tasks with subjective answ…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2406.04145

    cs.CL cs.AI

    Every Answer Matters: Evaluating Commonsense with Probabilistic Measures

    Authors: Qi Cheng, Michael Boratko, Pranay Kumar Yelugam, Tim O'Gorman, Nalini Singh, Andrew McCallum, Xiang Lorraine Li

    Abstract: Large language models have demonstrated impressive performance on commonsense tasks; however, these tasks are often posed as multiple-choice questions, allowing models to exploit systematic biases. Commonsense is also inherently probabilistic with multiple correct answers. The purpose of "boiling water" could be making tea and cooking, but it also could be killing germs. Existing tasks do not capt…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Camera Ready

  3. arXiv:2403.18286

    cs.CL cs.AI cs.LG

    Few-Shot Recalibration of Language Models

    Authors: Xiang Lisa Li, Urvashi Khandelwal, Kelvin Guu

    Abstract: Recent work has uncovered promising ways to extract well-calibrated confidence estimates from language models (LMs), where the model's confidence score reflects how likely it is to be correct. However, while LMs may appear well-calibrated over broad distributions, this often hides significant miscalibration within narrower slices (e.g., systemic over-confidence in math can balance out systemic und…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: preprint

  4. arXiv:2401.01482

    cs.CV cs.AI cs.LG

    Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition

    Authors: Kyle Buettner, Sina Malakouti, Xiang Lorraine Li, Adriana Kovashka

    Abstract: Existing object recognition models have been shown to lack robustness in diverse geographical scenarios due to domain shifts in design and context. Class representations need to be adapted to more accurately reflect an object concept under these shifts. In the absence of training data from target geographies, we hypothesize that geographically diverse descriptive knowledge of categories can enhanc…

    Submitted 29 March, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

    Comments: To appear in IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2024

  5. arXiv:2312.04469

    cs.LG cs.CL cs.CR

    On the Learnability of Watermarks for Language Models

    Authors: Chenchen Gu, Xiang Lisa Li, Percy Liang, Tatsunori Hashimoto

    Abstract: Watermarking of language model outputs enables statistical detection of model-generated text, which can mitigate harms and misuses of language models. Existing watermarking strategies operate by altering the decoder of an existing language model. In this paper, we ask whether language models can directly learn to generate watermarked text, which would have significant implications for the real-wor…

    Submitted 2 May, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted at ICLR 2024

  6. arXiv:2311.08469

    cs.CL

    UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

    Authors: Wenting Zhao, Justin T Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr

    Abstract: Language technologies that accurately model the dynamics of events must perform commonsense reasoning. Existing work evaluating commonsense reasoning focuses on making inferences about common, everyday situations. To instead investigate the ability to model unusual, unexpected, and unlikely situations, we explore the task of uncommonsense abductive reasoning. Given a piece of context with an unexp…

    Submitted 1 May, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: accepted at NAACL'24

  7. arXiv:2311.07237

    cs.CL cs.AI

    In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search

    Authors: Huihan Li, Yuting Ning, Zeyi Liao, Siyuan Wang, Xiang Lorraine Li, Ximing Lu, Wenting Zhao, Faeze Brahman, Yejin Choi, Xiang Ren

    Abstract: State-of-the-art LLMs outperform humans on reasoning tasks such as Natural Language Inference. Recent works evaluating LLMs note a marked performance drop on input data from the low-probability distribution, i.e., the longtail. Therefore, we focus on systematically generating statements involving long-tail inferential knowledge for more effective evaluation of LLMs in the reasoning space. We first…

    Submitted 27 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

  8. arXiv:2310.01846

    cs.CL cs.LG

    Benchmarking and Improving Generator-Validator Consistency of Language Models

    Authors: Xiang Lisa Li, Vaishnavi Shrivastava, Siyan Li, Tatsunori Hashimoto, Percy Liang

    Abstract: As of September 2023, ChatGPT correctly answers "what is 7+8" with 15, but when asked "7+8=15, True or False" it responds with "False". This inconsistency between generating and validating an answer is prevalent in language models (LMs) and erodes trust. In this paper, we propose a framework for measuring the consistency between generation and validation (which we call generator-validator consiste…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: preprint

  9. arXiv:2305.19472

    cs.CL cs.AI cs.LG

    PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning

    Authors: Faeze Brahman, Chandra Bhagavatula, Valentina Pyatkin, Jena D. Hwang, Xiang Lorraine Li, Hirona J. Arai, Soumya Sanyal, Keisuke Sakaguchi, Xiang Ren, Yejin Choi

    Abstract: Procedural planning, which entails decomposing a high-level goal into a sequence of temporally ordered steps, is an important yet intricate task for machines. It involves integrating common-sense knowledge to reason about complex contextualized situations that are often counterfactual, e.g. "scheduling a doctor's appointment without a phone". While current approaches show encouraging results using…

    Submitted 26 July, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: cited new paper, 27 pages

  10. arXiv:2305.18654

    cs.CL cs.AI cs.LG

    Faith and Fate: Limits of Transformers on Compositionality

    Authors: Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi

    Abstract: Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This begs the question: Are these errors incidental, or do they signal more substantial limitations? In an attempt to demystify transformer LLMs, we investigate the li…

    Submitted 31 October, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: 10 pages + appendix (40 pages)

  11. arXiv:2305.14956

    cs.CL

    Editing Common Sense in Transformers

    Authors: Anshita Gupta, Debanjan Mondal, Akshay Krishna Sheshadri, Wenlong Zhao, Xiang Lorraine Li, Sarah Wiegreffe, Niket Tandon

    Abstract: Editing model parameters directly in Transformers makes updating open-source transformer-based models possible without re-training (Meng et al., 2023). However, these editing methods have only been evaluated on statements about encyclopedic knowledge with a single correct answer. Commonsense knowledge with multiple correct answers, e.g., an apple can be green or red but not transparent, has not be…

    Submitted 26 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP 2023 Main Conference. Anshita, Debanjan, Akshay are co-first authors. Code and datasets for all experiments are available at https://github.com/anshitag/memit_csk

  12. arXiv:2304.08467

    cs.CL

    Learning to Compress Prompts with Gist Tokens

    Authors: Jesse Mu, Xiang Lisa Li, Noah Goodman

    Abstract: Prompting is the primary way to utilize the multitask capabilities of language models (LMs), but prompts occupy valuable space in the input context window, and repeatedly encoding the same prompt is computationally inefficient. Finetuning and distillation methods allow for specialization of LMs without prompting, but require retraining the model for each task. To avoid this trade-off entirely, we…

    Submitted 12 February, 2024; v1 submitted 17 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2023, 26 pages. Version 3 updates preprint to camera-ready version and clarifies some writing in places

  13. arXiv:2212.14024

    cs.CL cs.IR

    Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

    Authors: Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, Matei Zaharia

    Abstract: Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LM) and retrieval models (RM). Existing work has combined these in simple "retrieve-then-read" pipelines in which the RM retrieves passages that are inserted into the LM prompt. To begin to fully realize the potential of frozen LMs and RMs, we propose De…

    Submitted 23 January, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

  14. arXiv:2212.09746

    cs.CL

    Evaluating Human-Language Model Interaction

    Authors: Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E. Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang

    Abstract: Many real-world applications of language models (LMs), such as writing assistance and code autocomplete, involve human-LM interaction. However, most benchmarks are non-interactive in that a model produces output without human involvement. To evaluate human-LM interaction, we develop a new framework, Human-AI Language-based Interaction Evaluation (HALIE), that defines the components of interactive…

    Submitted 5 January, 2024; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI)

  15. arXiv:2210.15097

    cs.CL cs.AI cs.LG

    Contrastive Decoding: Open-ended Text Generation as Optimization

    Authors: Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, Mike Lewis

    Abstract: Given a language model (LM), maximum probability is a poor decoding objective for open-ended generation, because it produces short and repetitive text. On the other hand, sampling can often produce incoherent text that drifts from the original topics. We propose contrastive decoding (CD), a reliable decoding approach that optimizes a contrastive objective subject to a plausibility constraint. The…

    Submitted 10 July, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Main conference long paper at ACL 2023

  16. arXiv:2205.14217

    cs.CL cs.AI cs.LG

    Diffusion-LM Improves Controllable Text Generation

    Authors: Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, Tatsunori B. Hashimoto

    Abstract: Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation. While recent works have demonstrated successes on controlling simple sentence attributes (e.g., sentiment), there has been little progress on complex, fine-grained controls (e.g., syntactic structure). To address this challenge, we develop a new non-autoregressive language…

    Submitted 27 May, 2022; originally announced May 2022.

  17. arXiv:2112.11446

    cs.CL cs.AI

    Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    Authors: Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, et al. (55 additional authors not shown)

    Abstract: Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gop…

    Submitted 21 January, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: 120 pages

  18. arXiv:2111.00607

    cs.CL

    A Systematic Investigation of Commonsense Knowledge in Large Language Models

    Authors: Xiang Lorraine Li, Adhiguna Kuncoro, Jordan Hoffmann, Cyprien de Masson d'Autume, Phil Blunsom, Aida Nematzadeh

    Abstract: Language models (LMs) trained on large amounts of data have shown impressive performance on many NLP tasks under the zero-shot and few-shot setup. Here we aim to better understand the extent to which such models learn commonsense knowledge -- a critical component of many NLP applications. We conduct a systematic and rigorous zero-shot and few-shot commonsense evaluation of large pre-trained LMs, w…

    Submitted 31 October, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

    Comments: Accepted to EMNLP 2022

  19. arXiv:2108.07258

    cs.LG cs.AI cs.CY

    On the Opportunities and Risks of Foundation Models

    Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, et al. (89 additional authors not shown)

    Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their cap…

    Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

  20. arXiv:2106.14361

    cs.CL cs.AI

    Word2Box: Capturing Set-Theoretic Semantics of Words using Box Embeddings

    Authors: Shib Sankar Dasgupta, Michael Boratko, Siddhartha Mishra, Shriya Atmakuri, Dhruvesh Patel, Xiang Lorraine Li, Andrew McCallum

    Abstract: Learning representations of words in a continuous space is perhaps the most fundamental task in NLP, however words interact in ways much richer than vector dot product similarity can provide. Many relationships between words can be expressed set-theoretically, for example, adjective-noun compounds (e.g., "red cars" $\subseteq$ "cars") and homographs (e.g., "tongue" $\cap$ "body" should be similar to "mout…

    Submitted 8 June, 2022; v1 submitted 27 June, 2021; originally announced June 2021.

    Journal ref: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022

  21. arXiv:2104.04597

    cs.AI cs.CL cs.LG

    Probabilistic Box Embeddings for Uncertain Knowledge Graph Reasoning

    Authors: Xuelu Chen, Michael Boratko, Muhao Chen, Shib Sankar Dasgupta, Xiang Lorraine Li, Andrew McCallum

    Abstract: Knowledge bases often consist of facts which are harvested from a variety of sources, many of which are noisy and some of which conflict, resulting in a level of uncertainty for each triple. Knowledge bases are also often incomplete, prompting the use of embedding methods to generalize from known facts, however, existing embedding methods only model triple-level uncertainty, and reasoning results…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: NAACL-HLT 2021

  22. arXiv:2101.00190

    cs.CL

    Prefix-Tuning: Optimizing Continuous Prompts for Generation

    Authors: Xiang Lisa Li, Percy Liang

    Abstract: Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen, but optimi…

    Submitted 1 January, 2021; originally announced January 2021.

  23. arXiv:2010.07375

    cs.CL

    Decoding Methods for Neural Narrative Generation

    Authors: Alexandra DeLucia, Aaron Mueller, Xiang Lisa Li, João Sedoc

    Abstract: Narrative generation is an open-ended NLP task in which a model generates a story given a prompt. The task is similar to neural response generation for chatbots; however, innovations in response generation are often not applied to narrative generation, despite the similarity between these tasks. We aim to bridge this gap by applying and evaluating advances in decoding methods for neural response g…

    Submitted 8 July, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: 20 pages. Updated to the accepted version in Workshop on Generation Evaluation and Metrics at ACL 2021 (GEM'21)

  24. arXiv:2010.04831

    cs.LG cs.AI stat.ML

    Improving Local Identifiability in Probabilistic Box Embeddings

    Authors: Shib Sankar Dasgupta, Michael Boratko, Dongxu Zhang, Luke Vilnis, Xiang Lorraine Li, Andrew McCallum

    Abstract: Geometric embeddings have recently received attention for their natural ability to represent transitive asymmetric relations via containment. Box embeddings, where objects are represented by n-dimensional hyperrectangles, are a particularly promising example of such an embedding as they are closed under intersection and their volume can be calculated easily, allowing them to naturally represent ca…

    Submitted 28 October, 2020; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted at NeurIPS2020

  25. arXiv:2009.12756

    cs.CL

    Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval

    Authors: Wenhan Xiong, Xiang Lorraine Li, Srini Iyer, Jingfei Du, Patrick Lewis, William Yang Wang, Yashar Mehdad, Wen-tau Yih, Sebastian Riedel, Douwe Kiela, Barlas Oğuz

    Abstract: We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions, which achieves state-of-the-art performance on two multi-hop datasets, HotpotQA and multi-evidence FEVER. Contrary to previous work, our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers, and can be ap…

    Submitted 19 February, 2021; v1 submitted 27 September, 2020; originally announced September 2020.

  26. arXiv:2005.04560

    cs.CL cs.AI cs.LG

    Posterior Control of Blackbox Generation

    Authors: Xiang Lisa Li, Alexander M. Rush

    Abstract: Text generation often requires high-precision output that obeys task-specific rules. This fine-grained control is difficult to enforce with off-the-shelf deep learning models. In this work, we consider augmenting neural generation models with discrete control states learned through a structured latent-variable approach. Under this formulation, task-specific knowledge can be encoded through a range…

    Submitted 9 May, 2020; originally announced May 2020.

    Comments: Accepted for publication at ACL 2020

  27. arXiv:2005.00771

    cs.CL

    ProtoQA: A Question Answering Dataset for Prototypical Common-Sense Reasoning

    Authors: Michael Boratko, Xiang Lorraine Li, Rajarshi Das, Tim O'Gorman, Dan Le, Andrew McCallum

    Abstract: Given questions regarding some prototypical situation, such as "Name something that people usually do before they leave the house for work?", a human can easily answer them via acquired experiences. There can be multiple right answers for such questions, with some more common for a situation than others. This paper introduces a new question answering dataset for training and evaluating common sense re…

    Submitted 27 October, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: First four authors contribute equally

  28. arXiv:1910.00163

    cs.CL cs.LG

    Specializing Word Embeddings (for Parsing) by Information Bottleneck

    Authors: Xiang Lisa Li, Jason Eisner

    Abstract: Pre-trained word embeddings like ELMo and BERT contain rich syntactic and semantic information, resulting in state-of-the-art performance on various tasks. We propose a very fast variational information bottleneck (VIB) method to nonlinearly compress these embeddings, keeping only the information that helps a discriminative parser. We compress each word embedding to either a discrete tag or a cont…

    Submitted 30 September, 2019; originally announced October 2019.

    Comments: Accepted for publication at EMNLP 2019

  29. arXiv:1906.11298

    cs.CL cs.LG

    A Generative Model for Punctuation in Dependency Trees

    Authors: Xiang Lisa Li, Dingquan Wang, Jason Eisner

    Abstract: Treebanks traditionally treat punctuation marks as ordinary words, but linguists have suggested that a tree's "true" punctuation marks are not observed (Nunberg, 1990). These latent "underlying" marks serve to delimit or separate constituents in the syntax tree. When the tree's yield is rendered as a written sentence, a string rewriting mechanism transduces the underlying marks into "surface" mark…

    Submitted 26 June, 2019; originally announced June 2019.