Showing 1–4 of 4 results for author: Pozzobon, L

Searching in archive cs.
  1. arXiv:2403.03893  [pdf, other]

    cs.CL cs.AI

    From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models

    Authors: Luiza Pozzobon, Patrick Lewis, Sara Hooker, Beyza Ermis

    Abstract: To date, toxicity mitigation in language models has almost entirely been focused on single-language settings. As language models embrace multilingual capabilities, it's crucial our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient anno…

    Submitted 30 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  2. arXiv:2310.07589  [pdf, other]

    cs.AI

    Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models

    Authors: Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker

    Abstract: Considerable effort has been dedicated to mitigating toxicity, but existing methods often require drastic modifications to model parameters or the use of computationally intensive auxiliary models. Furthermore, previous approaches have often neglected the crucial factor of language's evolving nature over time. In this work, we present a comprehensive perspective on toxicity mitigation that takes i…

    Submitted 11 October, 2023; originally announced October 2023.

  3. arXiv:2309.04564  [pdf, other]

    cs.CL cs.LG

    When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale

    Authors: Max Marion, Ahmet Üstün, Luiza Pozzobon, Alex Wang, Marzieh Fadaee, Sara Hooker

    Abstract: Large volumes of text data have contributed significantly to the development of large language models (LLMs) in recent years. This data is typically acquired by scraping the internet, leading to pretraining datasets comprised of noisy web text. To date, efforts to prune these datasets down to a higher quality subset have relied on hand-crafted heuristics encoded as rule-based filters. In this work…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 14 pages, 8 figures

  4. arXiv:2304.12397  [pdf, other]

    cs.CL cs.AI

    On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research

    Authors: Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker

    Abstract: Perception of toxicity evolves over time and often differs between geographies and cultural backgrounds. Similarly, black-box commercially available APIs for detecting toxicity, such as the Perspective API, are not static, but frequently retrained to address any unattended weaknesses and biases. We evaluate the implications of these changes on the reproducibility of findings that compare the relat…

    Submitted 24 April, 2023; originally announced April 2023.