Showing 1–9 of 9 results for author: Kabbara, J

  1. arXiv:2406.17737  [pdf, other]

    cs.CL cs.AI cs.LG

    LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users

    Authors: Elinor Poole-Dayan, Deb Roy, Jad Kabbara

    Abstract: While state-of-the-art Large Language Models (LLMs) have shown impressive performance on many tasks, there has been extensive research on undesirable model behavior such as hallucinations and bias. In this work, we investigate how the quality of LLM responses changes in terms of information accuracy, truthfulness, and refusals depending on three user traits: English proficiency, education level, a…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2405.16282  [pdf, other]

    cs.CL cs.AI cs.LG

    Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models

    Authors: Abhishek Kumar, Robert Morabito, Sanzhar Umbet, Jad Kabbara, Ali Emami

    Abstract: As the use of Large Language Models (LLMs) becomes more widespread, understanding their self-evaluation of confidence in generated responses becomes increasingly important as it is integral to the reliability of the output of these models. We introduce the concept of Confidence-Probability Alignment, that connects an LLM's internal confidence, quantified by token probabilities, to the confidence c…

    Submitted 15 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: 9 pages (excluding references), accepted to ACL 2024 Main Conference

  3. arXiv:2404.12691  [pdf, other]

    cs.AI cs.CY

    Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them?

    Authors: Shayne Longpre, Robert Mahari, Naana Obeng-Marnu, William Brannon, Tobin South, Katy Gero, Sandy Pentland, Jad Kabbara

    Abstract: New capabilities in foundation models are owed in large part to massive, widely-sourced, and under-documented training data collections. Existing practices in data collection have led to challenges in documenting data transparency, tracing authenticity, verifying consent, privacy, representation, bias, copyright infringement, and the overall development of ethical and trustworthy foundation models…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 9 pages, 2 tables

  4. arXiv:2402.17019  [pdf, other]

    cs.CL cs.HC

    Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling

    Authors: Hang Jiang, Xiajie Zhang, Robert Mahari, Daniel Kessler, Eric Ma, Tal August, Irene Li, Alex 'Sandy' Pentland, Yoon Kim, Deb Roy, Jad Kabbara

    Abstract: Making legal knowledge accessible to non-experts is crucial for enhancing general legal literacy and encouraging civic participation in democracy. However, legal documents are often challenging to understand for people without legal backgrounds. In this paper, we present a novel application of large language models (LLMs) in legal education to help non-experts learn intricate legal concepts throug…

    Submitted 2 July, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024

  5. arXiv:2310.16787  [pdf, other]

    cs.CL cs.AI cs.LG

    The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

    Authors: Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Sara Hooker

    Abstract: The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners. To remedy these practices threatening data transparency and understanding, we convene a multi-disciplinary effort between legal and machine learning experts to systematically audit and trace 1800+ text datasets. We develop tool…

    Submitted 4 November, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 30 pages (18 main), 6 figures, 5 tables

  6. arXiv:2305.14321  [pdf, other]

    cs.CL

    ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings

    Authors: William Brannon, Wonjune Kang, Suyash Fulay, Hang Jiang, Brandon Roy, Deb Roy, Jad Kabbara

    Abstract: Learning on text-attributed graphs (TAGs), in which nodes are associated with one or more texts, has been the subject of much recent work. However, most approaches tend to make strong assumptions about the downstream task of interest, are reliant on hand-labeled data, or fail to equally balance the importance of both text and graph representations. In this work, we propose Contrastive Graph-Text p…

    Submitted 9 July, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: New visualizations, added references, and an application to community detection. To appear at the TextGraphs workshop @ ACL 2024. 21 pages, 5 figures, 13 tables

  7. arXiv:2305.14307  [pdf, other]

    cs.CL cs.AI

    Debiasing should be Good and Bad: Measuring the Consistency of Debiasing Techniques in Language Models

    Authors: Robert Morabito, Jad Kabbara, Ali Emami

    Abstract: Debiasing methods that seek to mitigate the tendency of Language Models (LMs) to occasionally output toxic or inappropriate text have recently gained traction. In this paper, we propose a standardized protocol which distinguishes methods that yield not only desirable results, but are also consistent with their mechanisms and specifications. For example, we ask, given a debiasing method that is dev…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 9 pages (excluding references), accepted at ACL Findings 2023

  8. arXiv:2305.02547  [pdf, other]

    cs.CL cs.AI cs.HC

    PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits

    Authors: Hang Jiang, Xiajie Zhang, Xubo Cao, Cynthia Breazeal, Deb Roy, Jad Kabbara

    Abstract: Despite the many use cases for large language models (LLMs) in creating personalized chatbots, there has been limited research on evaluating the extent to which the behaviors of personalized LLMs accurately and consistently reflect specific personality traits. We consider studying the behavior of LLM-based agents which we refer to as LLM personas and present a case study with GPT-3.5 and GPT-4 to…

    Submitted 2 April, 2024; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: First version in 05/2023. Accepted at NAACL Findings 2024

  9. arXiv:1806.04262  [pdf, other]

    cs.CL

    Let's do it "again": A First Computational Approach to Detecting Adverbial Presupposition Triggers

    Authors: Andre Cianflone, Yulan Feng, Jad Kabbara, Jackie Chi Kit Cheung

    Abstract: We introduce the task of predicting adverbial presupposition triggers such as also and again. Solving such a task requires detecting recurring or similar events in the discourse context, and has applications in natural language generation tasks such as summarization and dialogue systems. We create two new datasets for the task, derived from the Penn Treebank and the Annotated English Gigaword corp…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: ACL 2018 camera-ready version. Best paper award. First three listed authors contributed equally