Skip to main content

Showing 1–2 of 2 results for author: Perisetla, K

Searching in archive cs. Search in all archives.
.
  1. arXiv:2310.16787  [pdf, other

    cs.CL cs.AI cs.LG

    The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

    Authors: Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Sara Hooker

    Abstract: The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners. To remedy these practices threatening data transparency and understanding, we convene a multi-disciplinary effort between legal and machine learning experts to systematically audit and trace 1800+ text datasets. We develop tool… ▽ More

    Submitted 4 November, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 30 pages (18 main), 6 figures, 5 tables

  2. arXiv:2109.05052  [pdf, other

    cs.CL cs.LG

    Entity-Based Knowledge Conflicts in Question Answering

    Authors: Shayne Longpre, Kartik Perisetla, Anthony Chen, Nikhil Ramesh, Chris DuBois, Sameer Singh

    Abstract: Knowledge-dependent tasks typically use two sources of knowledge: parametric, learned at training time, and contextual, given as a passage at inference time. To understand how models use these sources together, we formalize the problem of knowledge conflicts, where the contextual information contradicts the learned information. Analyzing the behaviour of popular models, we measure their over-relia… ▽ More

    Submitted 11 January, 2022; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: Accepted to Empirical Methods in Natural Language Processing (EMNLP) 2021