ProMix: Combating Label Noise via Maximizing Clean Sample Utility

Xiao, Ruixuan; Dong, Yiwen; Wang, Haobo; Feng, Lei; Wu, Runze; Chen, Gang; Zhao, Junbo

arXiv:2207.10276 (cs)

[Submitted on 21 Jul 2022 (v1), last revised 3 Aug 2023 (this version, v4)]

Abstract: Learning with Noisy Labels (LNL) has become an appealing topic, as imperfectly annotated data are relatively cheap to obtain. Recent state-of-the-art approaches employ specific selection mechanisms to separate clean and noisy samples and then apply Semi-Supervised Learning (SSL) techniques for improved performance. However, the selection step mostly yields a medium-sized, decent-enough clean subset, which overlooks a rich set of clean samples. To address this, we propose a novel LNL framework, ProMix, that attempts to maximize the utility of clean samples for boosted performance. Key to our method is a matched high confidence selection technique, which selects examples that have high confidence scores and predictions matching their given labels in order to dynamically expand a base clean sample set. To overcome the potential side effects of an excessive clean set selection procedure, we further devise a novel SSL framework that is able to train balanced and unbiased classifiers on the separated clean and noisy samples. Extensive experiments demonstrate that ProMix significantly advances the current state-of-the-art results on multiple benchmarks with different types and levels of noise. It achieves an average improvement of 2.48% on the CIFAR-N dataset. The code is available at this https URL.
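
The selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `matched_high_confidence_selection`, the fixed `threshold` of 0.9, and the representation of the base clean set as a set of indices are all assumptions made for illustration, since the abstract does not specify how the base set is built or how the confidence cutoff is chosen. The sketch only shows the two stated conditions (high confidence, and prediction matching the given label) being used to expand a base clean set.

```python
from typing import Sequence, Set


def matched_high_confidence_selection(
    probs: Sequence[Sequence[float]],
    given_labels: Sequence[int],
    base_clean: Set[int],
    threshold: float = 0.9,  # assumed value; the abstract gives no cutoff
) -> Set[int]:
    """Return indices of an expanded clean set (illustrative sketch only)."""
    selected = set(base_clean)  # start from the base clean sample set
    for i, (p, y) in enumerate(zip(probs, given_labels)):
        confidence = max(p)  # highest predicted class probability
        predicted = max(range(len(p)), key=p.__getitem__)  # argmax class
        # Add the sample only if the prediction is both confident
        # and matched with the (possibly noisy) given label.
        if predicted == y and confidence >= threshold:
            selected.add(i)
    return selected


# Toy data: four samples, three classes (hypothetical values).
probs = [
    [0.95, 0.03, 0.02],  # confident and matches label 0
    [0.50, 0.30, 0.20],  # matches label 0 but confidence is low
    [0.05, 0.92, 0.03],  # confident, but predicts 1 while label is 2
    [0.10, 0.10, 0.80],  # below threshold, but already in the base set
]
given_labels = [0, 0, 2, 2]

expanded = matched_high_confidence_selection(
    probs, given_labels, base_clean={3}, threshold=0.9
)
print(sorted(expanded))  # [0, 3]
```

In this toy run, sample 0 is added because it meets both conditions, samples 1 and 2 each fail one of them, and sample 3 is retained only because it was already in the base set.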

Comments:	Accepted to IJCAI 2023; A previous version won the 1st LMNL Challenge in IJCAI 2022
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2207.10276 [cs.LG]
	(or arXiv:2207.10276v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2207.10276

Submission history

From: Ruixuan Xiao
[v1] Thu, 21 Jul 2022 03:01:04 UTC (387 KB)
[v2] Fri, 22 Jul 2022 09:43:25 UTC (386 KB)
[v3] Wed, 2 Aug 2023 08:32:57 UTC (994 KB)
[v4] Thu, 3 Aug 2023 12:20:15 UTC (994 KB)