Out-of-distribution Generalization in the Presence of Nuisance-Induced Spurious Correlations

Puli, Aahlad; Zhang, Lily H.; Oermann, Eric K.; Ranganath, Rajesh

arXiv:2107.00520 (cs)

[Submitted on 29 Jun 2021 (v1), last revised 12 Feb 2023 (this version, v5)]

Abstract: In many prediction problems, spurious correlations are induced by a changing relationship between the label and a nuisance variable that is also correlated with the covariates. For example, in classifying animals in natural images, the background, which is a nuisance, can predict the type of animal. This nuisance-label relationship does not always hold, and a model trained under one such relationship may perform poorly on data with a different nuisance-label relationship. To build predictive models that perform well regardless of the nuisance-label relationship, we develop Nuisance-Randomized Distillation (NURD). We introduce the nuisance-randomized distribution, a distribution where the nuisance and the label are independent. Under this distribution, we define the set of representations such that, conditional on any member of the set, the nuisance and the label remain independent. We prove that the representations in this set always perform better than chance, while representations outside of this set may not. NURD finds a representation from this set that is most informative of the label under the nuisance-randomized distribution, and we prove that this representation achieves the highest performance regardless of the nuisance-label relationship. We evaluate NURD on several tasks including chest X-ray classification where, using non-lung patches as the nuisance, NURD produces models that predict pneumonia under strong spurious correlations.
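
The two central objects in the abstract, the nuisance-randomized distribution and the set of representations that keep the nuisance and label independent, can be illustrated on a toy discrete problem. The sketch below is a minimal illustration, not the authors' implementation. It assumes a discrete label y and nuisance z, and it builds a distribution where y and z are independent by importance-weighting each sample by p(y)/p(y|z), a standard reweighting estimated here from empirical counts. It then measures the conditional mutual information I(y; z | r(x)) under that weighted distribution for two candidate representations r. The toy data, the covariate x = (label feature, nuisance feature), and every function name are hypothetical.

```python
from collections import Counter
from math import log

# Hypothetical toy training set of (x, y, z) triples with x = (y, z).
# The nuisance z agrees with the label y in 90% of samples (spurious correlation).
samples = []
for y, z, count in [(0, 0, 45), (0, 1, 5), (1, 0, 5), (1, 1, 45)]:
    samples += [((y, z), y, z)] * count


def nuisance_randomized_weights(samples):
    """Weight each sample by p(y) / p(y | z), estimated from counts.

    Reweighting p(x, y, z) = p(z) p(y|z) p(x|y, z) by p(y) / p(y|z) yields
    p(z) p(y) p(x|y, z), in which y and z are independent (assumes discrete y, z).
    """
    n = len(samples)
    count_y = Counter(y for _, y, _ in samples)
    count_z = Counter(z for _, _, z in samples)
    count_yz = Counter((y, z) for _, y, z in samples)
    return [
        (count_y[y] / n) / (count_yz[(y, z)] / count_z[z])
        for _, y, z in samples
    ]


def weighted_joint(samples, weights, key):
    """Normalized weighted distribution over key(sample)."""
    total = sum(weights)
    joint = Counter()
    for s, w in zip(samples, weights):
        joint[key(s)] += w / total
    return joint


def conditional_mutual_information(samples, weights, rep):
    """I(y; z | rep(x)) in nats under the weighted empirical distribution."""
    p_ryz = weighted_joint(samples, weights, lambda s: (rep(s[0]), s[1], s[2]))
    p_ry, p_rz, p_r = Counter(), Counter(), Counter()
    for (r, y, z), p in p_ryz.items():
        p_ry[(r, y)] += p
        p_rz[(r, z)] += p
        p_r[r] += p
    return sum(
        p * log(p * p_r[r] / (p_ry[(r, y)] * p_rz[(r, z)]))
        for (r, y, z), p in p_ryz.items()
        if p > 0
    )


weights = nuisance_randomized_weights(samples)
label_only = lambda x: x[0]           # uses only the label feature
mixed = lambda x: x[0] ^ x[1]         # mixes label and nuisance features
print(conditional_mutual_information(samples, weights, label_only))
print(conditional_mutual_information(samples, weights, mixed))
```

On this toy data the weighted joint of (y, z) is uniform, so y and z are independent after reweighting. The label-only representation leaves y and z conditionally independent (mutual information essentially zero), whereas conditioning on the XOR representation makes y and z fully dependent (about log 2 ≈ 0.693 nats), so it falls outside the set the abstract describes.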

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2107.00520 [cs.LG]
	(or arXiv:2107.00520v5 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2107.00520

Submission history

From: Aahlad Manas Puli
[v1] Tue, 29 Jun 2021 18:12:59 UTC (1,130 KB)
[v2] Tue, 6 Jul 2021 08:41:13 UTC (1,130 KB)
[v3] Mon, 18 Oct 2021 06:00:47 UTC (1,418 KB)
[v4] Thu, 17 Mar 2022 06:48:04 UTC (2,245 KB)
[v5] Sun, 12 Feb 2023 23:13:35 UTC (1,707 KB)