Simplicity Bias Leads to Amplified Performance Disparities

Bell, Samuel J.; Sagun, Levent

doi:10.1145/3593013.3594003

arXiv:2212.06641 (cs)

[Submitted on 13 Dec 2022 (v1), last revised 8 Jun 2023 (this version, v2)]

Abstract: Which parts of a dataset will a given model find difficult? Recent work has shown that SGD-trained models have a bias towards simplicity, leading them to prioritize learning a majority class, or to rely upon harmful spurious correlations. Here, we show that the preference for "easy" runs far deeper: A model may prioritize any class or group of the dataset that it finds simple, at the expense of what it finds complex, as measured by performance difference on the test set. When subsets with different levels of complexity align with demographic groups, we term this difficulty disparity, a phenomenon that occurs even with balanced datasets that lack group/label associations. We show how difficulty disparity is a model-dependent quantity, and is further amplified in commonly-used models as selected by typical average performance scores. We quantify an amplification factor across a range of settings in order to compare disparity of different models on a fixed dataset. Finally, we present two real-world examples of difficulty amplification in action, resulting in worse-than-expected performance disparities between groups even when using a balanced dataset. The existence of such disparities in balanced datasets demonstrates that merely balancing sample sizes of groups is not sufficient to ensure unbiased performance. We hope this work presents a step towards measurable understanding of the role of model bias as it interacts with the structure of data, and call for additional model-dependent mitigation methods to be deployed alongside dataset audits.
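
As a rough illustration of the measurement the abstract refers to (a performance difference between groups on a test set), the following minimal sketch computes per-group accuracy and the gap between the best and worst group. It is not the authors' code: the function names `group_accuracies` and `accuracy_gap` are hypothetical, accuracy is assumed as the performance metric, and the amplification factor is left out because the abstract does not define it.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Return a dict mapping each group label to its accuracy.

    Hypothetical helper: y_true, y_pred and groups are equal-length
    sequences holding the true label, predicted label and group
    membership of each test example.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(y_true, y_pred, groups):
    """Gap between the best and worst per-group accuracy.

    Assumption: disparity is summarised as max minus min group accuracy;
    the paper may use a different summary.
    """
    accs = group_accuracies(y_true, y_pred, groups)
    return max(accs.values()) - min(accs.values())


if __name__ == "__main__":
    # Toy balanced test set: four examples per group, same label mix.
    y_true = [0, 1, 0, 1, 0, 1, 0, 1]
    y_pred = [0, 1, 0, 1, 0, 1, 1, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(group_accuracies(y_true, y_pred, groups))
    print(accuracy_gap(y_true, y_pred, groups))
```

In the toy data both groups have the same size and the same label distribution, so any gap reflects the predictions rather than sample counts, which is the situation the abstract describes for balanced datasets.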

Comments:	In 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT '23). ACM
Subjects:	Machine Learning (cs.LG); Computers and Society (cs.CY)
Cite as:	arXiv:2212.06641 [cs.LG]
	(or arXiv:2212.06641v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2212.06641
Related DOI:	https://doi.org/10.1145/3593013.3594003

Submission history

From: Samuel Bell [view email]
[v1] Tue, 13 Dec 2022 15:24:41 UTC (1,395 KB)
[v2] Thu, 8 Jun 2023 13:33:01 UTC (630 KB)
