Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization

Arpit, Devansh; Wang, Huan; Zhou, Yingbo; Xiong, Caiming. Code: https://github.com/salesforce/ensemble-of-averages

arXiv:2110.10832 (cs)

[Submitted on 21 Oct 2021 (v1), last revised 10 Oct 2022 (this version, v4)]

Authors: Devansh Arpit, Huan Wang, Yingbo Zhou, Caiming Xiong

Abstract: In Domain Generalization (DG) settings, models trained independently on a given set of training domains have notoriously chaotic performance on distribution-shifted test domains, and stochasticity in optimization (e.g., the random seed) plays a big role. This makes deep learning models unreliable in real-world settings. We first show that this chaotic behavior exists even along the training optimization trajectory of a single model, and propose a simple model averaging protocol that both significantly boosts domain generalization and diminishes the impact of stochasticity by improving the rank correlation between in-domain validation accuracy and out-of-domain test accuracy, which is crucial for reliable early stopping. Taking advantage of this observation, we show that instead of ensembling unaveraged models (as is typical in practice), ensembling moving average models (EoA) from independent runs further boosts performance. We theoretically explain the boost in performance of ensembling and model averaging by adapting the well-known bias-variance trade-off to the domain generalization setting. On the DomainBed benchmark, when using a pre-trained ResNet-50, this ensemble of averages achieves an average accuracy of $68.0\%$, beating vanilla ERM (w/o averaging/ensembling) by $\sim 4\%$, and when using a pre-trained RegNetY-16GF, it achieves an average of $76.6\%$, beating vanilla ERM by $6\%$. Our code is available at https://github.com/salesforce/ensemble-of-averages.
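To make the two ideas in the abstract concrete, here is a minimal NumPy sketch. It assumes a uniform (simple) moving average of weights that starts after a hypothetical warm-up iteration `start_iter`, and an ensemble that averages softmax probabilities across the averaged models from independent runs. The linear classifier, `MovingAverage`, `ensemble_of_averages`, and the random-perturbation stand-in for SGD are all hypothetical illustrations, not the code in the authors' repository.

```python
import numpy as np


def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class MovingAverage:
    """Uniform running average of parameter snapshots taken at or after `start_iter`.

    Hypothetical helper: `start_iter` and the dict-of-arrays parameter format
    are assumptions for illustration, not the paper's implementation.
    """

    def __init__(self, start_iter=0):
        self.start_iter = start_iter
        self.count = 0
        self.params = None

    def update(self, params, iteration):
        if iteration < self.start_iter:
            return  # skip early iterates before averaging begins
        self.count += 1
        if self.params is None:
            # np.array copies, so later in-place updates never alias the model.
            self.params = {k: np.array(v, dtype=float) for k, v in params.items()}
            return
        for k, v in params.items():
            # Incremental mean: avg_n = avg_{n-1} + (x_n - avg_{n-1}) / n
            self.params[k] += (v - self.params[k]) / self.count


def predict_proba(params, x):
    # Hypothetical linear classifier standing in for a network: logits = x W + b.
    return softmax(x @ params["W"] + params["b"])


def ensemble_of_averages(averaged_params, x):
    # Assumption: the ensemble averages class probabilities across runs.
    probs = np.mean([predict_proba(p, x) for p in averaged_params], axis=0)
    return probs.argmax(axis=1), probs


# Usage: three independent runs (different seeds), each producing one averaged model.
x = np.random.default_rng(0).normal(size=(8, 4))
averages = []
for seed in range(3):
    run_rng = np.random.default_rng(seed)
    sma = MovingAverage(start_iter=5)
    W = run_rng.normal(size=(4, 3))
    b = np.zeros(3)
    for it in range(20):
        # Stand-in for an SGD step: a small random perturbation of the weights.
        W = W + 0.1 * run_rng.normal(size=W.shape)
        sma.update({"W": W, "b": b}, it)
    averages.append(sma.params)

labels, probs = ensemble_of_averages(averages, x)
```

The incremental mean keeps memory constant regardless of how many snapshots are averaged; in a real network the same update would be applied to every parameter tensor, and only the averaged copy of each run enters the ensemble.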

Comments:	Accepted at NeurIPS 2022
Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2110.10832 [cs.LG]
	(or arXiv:2110.10832v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2110.10832

Submission history

From: Devansh Arpit
[v1] Thu, 21 Oct 2021 00:08:17 UTC (316 KB)
[v2] Tue, 26 Oct 2021 23:46:41 UTC (316 KB)
[v3] Tue, 24 May 2022 16:02:46 UTC (351 KB)
[v4] Mon, 10 Oct 2022 20:20:12 UTC (702 KB)
