Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI

Seedat, Nabeel; Imrie, Fergus; van der Schaar, Mihaela

arXiv:2403.04551 (cs)

[Submitted on 7 Mar 2024]

Abstract: Characterizing samples that are difficult to learn from is crucial to developing highly performant ML models. This has led to numerous Hardness Characterization Methods (HCMs) that aim to identify "hard" samples. However, there is a lack of consensus regarding the definition and evaluation of "hardness". Unfortunately, current HCMs have only been evaluated on specific types of hardness and often only qualitatively or with respect to downstream performance, overlooking the fundamental quantitative identification task. We address this gap by presenting a fine-grained taxonomy of hardness types. Additionally, we propose the Hardness Characterization Analysis Toolkit (H-CAT), which supports comprehensive and quantitative benchmarking of HCMs across the hardness taxonomy and can easily be extended to new HCMs, hardness types, and datasets. We use H-CAT to evaluate 13 different HCMs across 8 hardness types. This comprehensive evaluation encompassing over 14K setups uncovers strengths and weaknesses of different HCMs, leading to practical tips to guide HCM selection and future development. Our findings highlight the need for more comprehensive HCM evaluation, while we hope our hardness taxonomy and toolkit will advance the principled evaluation and uptake of data-centric AI methods.

Comments:	Published at International Conference on Learning Representations (ICLR) 2024
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2403.04551 [cs.LG]
	(or arXiv:2403.04551v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2403.04551

Submission history

From: Nabeel Seedat
[v1] Thu, 7 Mar 2024 14:45:03 UTC (16,883 KB)
