Showing 1–2 of 2 results for author: Iskander, S

Search v0.5.6 released 2020-02-24

arXiv:2403.09516

cs.CL cs.CY cs.LG

Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information

Authors: Shadi Iskander, Kira Radinsky, Yonatan Belinkov

Abstract: Mitigating social biases typically requires identifying the social groups associated with each data sample. In this paper, we present DAFair, a novel approach to address social bias in language models. Unlike traditional methods that rely on explicit demographic labels, our approach does not require any such information. Instead, we leverage predefined prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias in the model's representations. Our empirical results across two tasks and two models demonstrate the effectiveness of our method compared to previous approaches that do not rely on labeled data. Moreover, with limited demographic-annotated data, our approach outperforms common debiasing approaches.

Submitted 5 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

arXiv:2305.10204

cs.CL cs.AI

Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection

Authors: Shadi Iskander, Kira Radinsky, Yonatan Belinkov

Abstract: Natural language processing models tend to learn and encode social biases present in the data. One popular approach for addressing such biases is to eliminate encoded information from the model's representations. However, current methods are restricted to removing only linearly encoded information. In this work, we propose Iterative Gradient-Based Projection (IGBP), a novel method for removing non-linear encoded concepts from neural representations. Our method consists of iteratively training neural classifiers to predict a particular attribute we seek to eliminate, followed by a projection of the representation on a hypersurface, such that the classifiers become oblivious to the target attribute. We evaluate the effectiveness of our method on the task of removing gender and race information as sensitive attributes. Our results demonstrate that IGBP is effective in mitigating bias through intrinsic and extrinsic evaluations, with minimal impact on downstream task accuracy.

Submitted 17 May, 2023; originally announced May 2023.

Comments: This paper will be published in the proceedings of Findings of ACL 2023
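
Note: The projection step described in the abstract above (moving a representation onto a hypersurface where a classifier can no longer tell the attribute apart) can be illustrated with a minimal sketch. It assumes a first-order update, x' = x - f(x) * grad f(x) / ||grad f(x)||^2, where f is the classifier's logit. This is one common way to project toward the surface f(x) = 0 and may differ from the paper's exact update. The names `project_to_boundary`, `iterate_projection`, `toy_logit`, and `toy_grad` are hypothetical, and the toy logit is a hand-written stand-in for a trained neural classifier. Classifier training, and retraining a fresh classifier in each round, are omitted.

```python
from typing import Callable, List

Vector = List[float]


def project_to_boundary(
    x: Vector,
    logit: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
) -> Vector:
    """Take one first-order step from x toward the surface logit(x) = 0."""
    value = logit(x)
    g = grad(x)
    sq_norm = sum(gi * gi for gi in g)
    if sq_norm == 0.0:
        # No gradient direction to move along; leave x unchanged.
        return list(x)
    step = value / sq_norm
    return [xi - step * gi for xi, gi in zip(x, g)]


def iterate_projection(
    x: Vector,
    logit: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    rounds: int = 10,
) -> Vector:
    """Repeat the projection step against the same fixed logit.

    Assumption: a full iterative scheme would train a new classifier
    each round; this sketch reuses one fixed function for simplicity.
    """
    for _ in range(rounds):
        x = project_to_boundary(x, logit, grad)
    return x


# Hypothetical non-linear "classifier" logit and its analytic gradient.
def toy_logit(x: Vector) -> float:
    return x[0] ** 2 + x[1] - 1.0


def toy_grad(x: Vector) -> Vector:
    return [2.0 * x[0], 1.0]


if __name__ == "__main__":
    start = [2.0, 1.0]
    end = iterate_projection(start, toy_logit, toy_grad)
    print("logit before:", toy_logit(start))
    print("logit after: ", toy_logit(end))
```

For a linear logit a single step lands exactly on the decision boundary. For a non-linear logit each step is only approximate, which is why the sketch repeats it.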
