Probabilistic Bias Mitigation in Word Embeddings

Joren, Hailey; Alvarez-Melis, David

arXiv:1910.14497 (cs)

[Submitted on 31 Oct 2019 (v1), last revised 26 Apr 2020 (this version, v2)]

Abstract: It has been shown that word embeddings derived from large corpora tend to incorporate biases present in their training data. Various methods for mitigating these biases have been proposed, but recent work has demonstrated that these methods hide the biases rather than truly remove them: the biases can still be observed in word nearest-neighbor statistics. In this work, we propose a probabilistic view of word embedding bias. We leverage this framework to present a novel method for mitigating bias, which relies on probabilistic observations to yield a more robust bias mitigation algorithm. We demonstrate that this method effectively reduces bias according to three separate measures of bias while maintaining embedding quality across various popular benchmark semantic tasks.

Comments:	4 pages, 4 figures, Workshop on Human-Centric Machine Learning at NeurIPS 2019
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1910.14497 [cs.CL]
	(or arXiv:1910.14497v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1910.14497

Submission history

From: Hailey Joren
[v1] Thu, 31 Oct 2019 14:34:14 UTC (136 KB)
[v2] Sun, 26 Apr 2020 23:17:22 UTC (136 KB)
