Skip to main content

Showing 1–25 of 25 results for author: Dziri, N

.
  1. arXiv:2406.18510  [pdf, other

    cs.CL

    WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

    Authors: Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Ye** Choi, Nouha Dziri

    Abstract: We introduce WildTeaming, an automatic LLM safety red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes multiple tactics for systematic exploration of novel jailbreaks. Compared to prior work that performed red-teaming via recruited human workers, gradient-based optimization, or iterative revision with… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.18495  [pdf, other

    cs.CL

    WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

    Authors: Seungju Han, Kavel Rao, Allyson Ettinger, Liwei Jiang, Bill Yuchen Lin, Nathan Lambert, Ye** Choi, Nouha Dziri

    Abstract: We introduce WildGuard -- an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate. Together, WildGuard serves the increasing needs for automatic safety moderation and evaluation of LLM interactions, providing a one-stop tool with enhanced a… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: First two authors contributed equally. Third and fourth authors contributed equally

  3. arXiv:2406.04770  [pdf, other

    cs.CL cs.AI

    WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

    Authors: Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu, Faeze Brahman, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Ye** Choi

    Abstract: We introduce WildBench, an automated evaluation framework designed to benchmark large language models (LLMs) using challenging, real-world user queries. WildBench consists of 1,024 tasks carefully selected from over one million human-chatbot conversation logs. For automated evaluation with WildBench, we have developed two metrics, WB-Reward and WB-Score, which are computable using advanced LLMs su… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Link: https://hf.co/spaces/allenai/WildBench

  4. arXiv:2404.10199  [pdf, other

    cs.CL cs.AI

    CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting

    Authors: Huihan Li, Liwei Jiang, Jena D. Huang, Hyunwoo Kim, Sebastin Santy, Taylor Sorensen, Bill Yuchen Lin, Nouha Dziri, Xiang Ren, Ye** Choi

    Abstract: As the utilization of large language models (LLMs) has proliferated worldwide, it is crucial for them to have adequate knowledge and fair representation for diverse global cultures. In this work, we uncover culture perceptions of three SOTA models on 110 countries and regions on 8 culture-related topics through culture-conditioned generations, and extract symbols from these generations that are as… ▽ More

    Submitted 26 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  5. arXiv:2403.13787  [pdf, other

    cs.LG

    RewardBench: Evaluating Reward Models for Language Modeling

    Authors: Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Ye** Choi, Noah A. Smith, Hannaneh Hajishirzi

    Abstract: Reward models (RMs) are at the crux of successfully using RLHF to align pretrained models to human preferences, yet there has been relatively little study that focuses on evaluation of those models. Evaluating reward models presents an opportunity to understand the opaque technologies used for alignment of language models and which values are embedded in them. Resources for reward model training a… ▽ More

    Submitted 8 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: 44 pages, 19 figures, 12 tables

  6. arXiv:2402.05070  [pdf, other

    cs.AI cs.CL cs.IR

    A Roadmap to Pluralistic Alignment

    Authors: Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, Ye** Choi

    Abstract: With increased power and prevalence of AI systems, it is ever more critical that AI systems are designed to serve all, i.e., people with diverse values and perspectives. However, aligning models to serve pluralistic human values remains an open research question. In this piece, we propose a roadmap to pluralistic alignment, specifically using language models as a test bed. We identify and formaliz… ▽ More

    Submitted 7 February, 2024; originally announced February 2024.

  7. arXiv:2312.01552  [pdf, other

    cs.CL cs.AI

    The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

    Authors: Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Chandu, Chandra Bhagavatula, Ye** Choi

    Abstract: The alignment tuning process of large language models (LLMs) typically involves instruction learning through supervised fine-tuning (SFT) and preference tuning via reinforcement learning from human feedback (RLHF). A recent study, LIMA (Zhou et al. 2023), shows that using merely 1K examples for SFT can achieve significant alignment performance as well, suggesting that the effect of alignment tunin… ▽ More

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: 26 pages, 8 figures. Project website: https://allenai.github.io/re-align/

  8. arXiv:2311.00059  [pdf, other

    cs.AI cs.CL cs.CV cs.LG

    The Generative AI Paradox: "What It Can Create, It May Not Understand"

    Authors: Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Ye** Choi

    Abstract: The recent wave of generative AI has sparked unprecedented global attention, with both excitement and concern over potentially superhuman levels of artificial intelligence: models now take only seconds to produce outputs that would challenge or exceed the capabilities even of expert humans. At the same time, models still show basic errors in understanding that would not be expected even in non-exp… ▽ More

    Submitted 31 October, 2023; originally announced November 2023.

  9. arXiv:2310.15431  [pdf, other

    cs.CL

    What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations

    Authors: Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Ye** Choi

    Abstract: Moral or ethical judgments rely heavily on the specific contexts in which they occur. Understanding varying shades of defeasible contextualizations (i.e., additional information that strengthens or attenuates the moral acceptability of an action) is critical to accurately represent the subtlety and intricacy of grounded human moral judgment in real-life scenarios. We introduce defeasible moral r… ▽ More

    Submitted 1 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Camera Ready EMNLP Findings 2023. First two authors contributed equally

  10. arXiv:2310.08559  [pdf, other

    cs.CL cs.AI

    Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement

    Authors: Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Ye** Choi, Nouha Dziri, Xiang Ren

    Abstract: The ability to derive underlying principles from a handful of observations and then generalize to novel situations -- known as inductive reasoning -- is central to human intelligence. Prior work suggests that language models (LMs) often fall short on inductive reasoning, despite achieving impressive success on research benchmarks. In this work, we conduct a systematic study of the inductive reason… ▽ More

    Submitted 22 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  11. Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

    Authors: Taylor Sorensen, Liwei Jiang, Jena Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Ye** Choi

    Abstract: Human values are crucial to human decision-making. Value pluralism is the view that multiple correct values may be held in tension with one another (e.g., when considering lying to a friend to protect their feelings, how does one balance honesty with friendship?). As statistical learners, AI systems fit to averages by default, washing out these potentially irreducible value conflicts. To improve A… ▽ More

    Submitted 2 April, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: Proceedings of the AAAI Conference on Artificial Intelligence, 38

    Journal ref: Vol. 38 No. 18: AAAI-24 Technical Tracks 18; 2024; 19937-19947

  12. arXiv:2306.01693  [pdf, other

    cs.CL

    Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

    Authors: Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi

    Abstract: Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF) - where human preference judgments on LM outputs are transformed into a learning signal - has recently shown promise in addressing these issues. However, such holistic feedback conveys limited information on long text… ▽ More

    Submitted 30 October, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 camera-ready

  13. arXiv:2305.18654  [pdf, other

    cs.CL cs.AI cs.LG

    Faith and Fate: Limits of Transformers on Compositionality

    Authors: Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Ye** Choi

    Abstract: Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This begs the question: Are these errors incidental, or do they signal more substantial limitations? In an attempt to demystify transformer LLMs, we investigate the li… ▽ More

    Submitted 31 October, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: 10 pages + appendix (40 pages)

  14. arXiv:2305.15065  [pdf, other

    cs.CL

    Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

    Authors: Ximing Lu, Faeze Brahman, Peter West, Jaehun Jang, Khyathi Chandu, Abhilasha Ravichander, Lianhui Qin, Prithviraj Ammanabrolu, Liwei Jiang, Sahana Ramnath, Nouha Dziri, Jillian Fisher, Bill Yuchen Lin, Skyler Hallinan, Xiang Ren, Sean Welleck, Ye** Choi

    Abstract: While extreme-scale language models have demonstrated exceptional performance on a variety of language tasks, the degree of control over these language models through pure prompting can often be limited. Directly fine-tuning such language models can be effective for tailoring them, but it can be either extremely costly (e.g., GPT-3) or not even feasible for the broader community (e.g., GPT-4). W… ▽ More

    Submitted 6 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  15. arXiv:2305.06984  [pdf, other

    cs.CL

    Evaluating Open-Domain Question Answering in the Era of Large Language Models

    Authors: Ehsan Kamalloo, Nouha Dziri, Charles L. A. Clarke, Davood Rafiei

    Abstract: Lexical matching remains the de facto evaluation method for open-domain question answering (QA). Unfortunately, lexical matching fails completely when a plausible candidate answer does not appear in the list of gold answers, which is increasingly the case as we shift from extractive to generative models. The recent success of large language models (LLMs) for QA aggravates lexical matching failures… ▽ More

    Submitted 6 July, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: ACL 2023; code and data released at https://github.com/ehsk/OpenQA-eval

  16. arXiv:2303.17651  [pdf, other

    cs.CL cs.AI cs.LG

    Self-Refine: Iterative Refinement with Self-Feedback

    Authors: Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark

    Abstract: Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an initial output using an LLMs; then, the same LLMs provides feedback for its output and uses it… ▽ More

    Submitted 25 May, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: Code, data, and demo at https://selfrefine.info/

  17. arXiv:2303.17574  [pdf, other

    cs.CL cs.AI cs.LG

    Elastic Weight Removal for Faithful and Abstractive Dialogue Generation

    Authors: Nico Daheim, Nouha Dziri, Mrinmaya Sachan, Iryna Gurevych, Edoardo M. Ponti

    Abstract: Ideally, dialogue systems should generate responses that are faithful to the knowledge contained in relevant documents. However, many models generate hallucinated responses instead that contradict it or contain unverifiable information. To mitigate such undesirable behaviour, it has been proposed to fine-tune a `negative expert' on negative examples and subtract its parameters from those of a pre-… ▽ More

    Submitted 30 March, 2023; originally announced March 2023.

  18. arXiv:2303.09713  [pdf, other

    cs.CL cs.AI

    CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos

    Authors: Seungju Han, Jack Hessel, Nouha Dziri, Ye** Choi, Youngjae Yu

    Abstract: Visual information is central to conversation: body gestures and physical behaviour, for example, contribute to meaning that transcends words alone. To date, however, most neural conversational models are limited to just text. We introduce CHAMPAGNE, a generative model of conversations that can account for visual contexts. To train CHAMPAGNE, we collect and release YTD-18M, a large-scale corpus of… ▽ More

    Submitted 16 August, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: ICCV 2023, Project page: https://seungjuhan.me/champagne

  19. arXiv:2204.10757  [pdf, other

    cs.CL

    FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

    Authors: Nouha Dziri, Ehsan Kamalloo, Sivan Milton, Osmar Zaiane, Mo Yu, Edoardo M. Ponti, Siva Reddy

    Abstract: The goal of information-seeking dialogue is to respond to seeker queries with natural language utterances that are grounded on knowledge sources. However, dialogue systems often produce unsupported utterances, a phenomenon known as hallucination. To mitigate this behavior, we adopt a data-centric solution and create FaithDial, a new benchmark for hallucination-free dialogues, by editing hallucinat… ▽ More

    Submitted 23 October, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: TACL 2022 (20 pages, 3 figures, 10 tables)

  20. arXiv:2204.07931  [pdf, other

    cs.CL

    On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?

    Authors: Nouha Dziri, Sivan Milton, Mo Yu, Osmar Zaiane, Siva Reddy

    Abstract: Knowledge-grounded conversational models are known to suffer from producing factually invalid statements, a phenomenon commonly called hallucination. In this work, we investigate the underlying causes of this phenomenon: is hallucination due to the training data, or to the models? We conduct a comprehensive human study on both existing knowledge-grounded conversational benchmarks and several state… ▽ More

    Submitted 17 April, 2022; originally announced April 2022.

    Comments: NAACL 2022, 14 pages

  21. arXiv:2106.13401  [pdf, other

    cs.LG cs.AI

    Decomposed Mutual Information Estimation for Contrastive Representation Learning

    Authors: Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Phil Bachman, Remi Tachet

    Abstract: Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong unde… ▽ More

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: ICML 2021

  22. arXiv:2105.00071  [pdf, other

    cs.CL

    Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark

    Authors: Nouha Dziri, Hannah Rashkin, Tal Linzen, David Reitter

    Abstract: Knowledge-grounded dialogue systems powered by large language models often generate responses that, while fluent, are not attributable to a relevant source of information. Progress towards models that do not exhibit this issue requires evaluation metrics that can quantify its prevalence. To this end, we introduce the Benchmark for Evaluation of Grounded INteraction (BEGIN), comprised of 12k dialog… ▽ More

    Submitted 28 June, 2022; v1 submitted 30 April, 2021; originally announced May 2021.

    Comments: TACL, 12 pages, 9 figures, 2 tables

  23. arXiv:2104.08455  [pdf, other

    cs.CL

    Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding

    Authors: Nouha Dziri, Andrea Madotto, Osmar Zaiane, Avishek Joey Bose

    Abstract: Dialogue systems powered by large pre-trained language models (LM) exhibit an innate ability to deliver fluent and natural-looking responses. Despite their impressive generation performance, these models can often generate factually incorrect statements impeding their widespread adoption. In this paper, we focus on the task of improving the faithfulness -- and thus reduce hallucination -- of Neura… ▽ More

    Submitted 14 September, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021 18 pages

  24. arXiv:1904.03371  [pdf, other

    cs.CL cs.LG

    Evaluating Coherence in Dialogue Systems using Entailment

    Authors: Nouha Dziri, Ehsan Kamalloo, Kory W. Mathewson, Osmar Zaiane

    Abstract: Evaluating open-domain dialogue systems is difficult due to the diversity of possible correct answers. Automatic metrics such as BLEU correlate weakly with human annotations, resulting in a significant bias across different models and datasets. Some researchers resort to human judgment experimentation for assessing response quality, which is expensive, time consuming, and not scalable. Moreover, j… ▽ More

    Submitted 31 March, 2020; v1 submitted 6 April, 2019; originally announced April 2019.

    Comments: 5 pages, 2 figures; NAACL-HLT 2019

  25. arXiv:1811.01063  [pdf, other

    cs.CL

    Augmenting Neural Response Generation with Context-Aware Topical Attention

    Authors: Nouha Dziri, Ehsan Kamalloo, Kory W. Mathewson, Osmar Zaiane

    Abstract: Sequence-to-Sequence (Seq2Seq) models have witnessed a notable success in generating natural conversational exchanges. Notwithstanding the syntactically well-formed responses generated by these neural network models, they are prone to be acontextual, short and generic. In this work, we introduce a Topical Hierarchical Recurrent Encoder Decoder (THRED), a novel, fully data-driven, multi-turn respon… ▽ More

    Submitted 4 June, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

    Comments: Accepted at ACL 2019 Workshop on NLP for ConvAI (NLP4ConvAI). 8 pages + 4 appendix pages, 6 figures, 9 tables