Showing 1–2 of 2 results for author: Manerba, M M

Searching in archive cs.
  1. arXiv:2402.17389  [pdf, other]

    cs.CL cs.AI

    FairBelief -- Assessing Harmful Beliefs in Language Models

    Authors: Mattia Setzu, Marta Marchiori Manerba, Pasquale Minervini, Debora Nozza

    Abstract: Language Models (LMs) have been shown to inherit undesired biases that might hurt minorities and underrepresented groups if such systems were integrated into real-world applications without careful fairness auditing. This paper proposes FairBelief, an analytical approach to capture and assess beliefs, i.e., propositions that an LM may embed with different degrees of confidence and that covertly in…

    Submitted 27 February, 2024; originally announced February 2024.

  2. arXiv:2311.09090  [pdf, other]

    cs.CL

    Social Bias Probing: Fairness Benchmarking for Language Models

    Authors: Marta Marchiori Manerba, Karolina Stańczak, Riccardo Guidotti, Isabelle Augenstein

    Abstract: While the impact of social biases in language models has been recognized, prior methods for bias evaluation have been limited to binary association tests on small datasets, limiting our understanding of bias complexities. This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment, which involves treating individuals differently according to…

    Submitted 22 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.