Source data selection for out-of-domain generalization

Miao, Xinran; Sankaran, Kris

arXiv:2202.02155 (cs)

[Submitted on 4 Feb 2022]

Abstract: Models that perform out-of-domain generalization borrow knowledge from heterogeneous source data and apply it to a related but distinct target task. Transfer learning has proven effective for accomplishing this generalization in many applications. However, a poorly chosen source dataset can lead to poor performance on the target, a phenomenon called negative transfer. To take full advantage of available source data, this work studies source data selection with respect to a target task. We propose two source selection methods, based on multi-bandit theory and random search, respectively. We conduct a thorough empirical evaluation on both simulated and real data. Our proposals can also be viewed as diagnostics for the existence of reweighted source subsamples that perform better than a random selection of available samples.
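To make the idea of source selection by random search concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the source data come in labeled groups, that a small target validation set is available, and that a ridge regression is an adequate stand-in model. The function names (`fit_ridge`, `mse`, `random_search_sources`), the toy data, and the "use all sources" baseline are all hypothetical and chosen only for illustration. Two of the four toy groups follow the target relationship and two follow the opposite one, which mimics negative transfer.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression weights (illustrative stand-in model)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(w, X, y):
    """Mean squared error of the linear predictor w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

def random_search_sources(sources, X_val, y_val, n_trials=50, seed=0):
    """Try random non-empty subsets of source groups and keep the subset
    whose fitted model has the lowest error on target validation data."""
    rng = np.random.default_rng(seed)
    k = len(sources)

    def subset_error(idx):
        X = np.vstack([sources[i][0] for i in idx])
        y = np.concatenate([sources[i][1] for i in idx])
        return mse(fit_ridge(X, y), X_val, y_val)

    # Start from the "use everything" baseline so the result is never worse.
    best_idx = list(range(k))
    best_err = baseline_err = subset_error(best_idx)
    for _ in range(n_trials):
        # Include each source group independently with probability 0.5.
        idx = np.flatnonzero(rng.random(k) < 0.5).tolist()
        if not idx:
            continue  # skip the empty subset
        err = subset_error(idx)
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx, best_err, baseline_err

# Toy data (assumption): groups 0 and 1 match the target, groups 2 and 3 do not.
rng = np.random.default_rng(1)
w_target = np.array([1.0, -2.0])

def make_group(w, n=40):
    X = rng.normal(size=(n, 2))
    return X, X @ w + 0.1 * rng.normal(size=n)

sources = [make_group(w_target), make_group(w_target),
           make_group(-w_target), make_group(-w_target)]
X_val, y_val = make_group(w_target, n=30)

best_idx, best_err, baseline_err = random_search_sources(sources, X_val, y_val)
print("selected source groups:", best_idx)
print("validation MSE: selected = %.3f, all sources = %.3f" % (best_err, baseline_err))
```

A gap between the selected subset's validation error and the all-sources error plays the diagnostic role the abstract describes: it suggests that some subsample of the sources transfers better than the pool as a whole. The sketch selects whole groups rather than reweighting individual samples, and it does not cover the bandit-based method.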

Comments:	18 pages, 16 figures
Subjects:	Machine Learning (cs.LG); Applications (stat.AP)
Cite as:	arXiv:2202.02155 [cs.LG]
	(or arXiv:2202.02155v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2202.02155

Submission history

From: Xinran Miao
[v1] Fri, 4 Feb 2022 14:37:31 UTC (19,490 KB)
