Diversity Over Size: On the Effect of Sample and Topic Sizes for Argument Mining Datasets

Schiller, Benjamin; Daxenberger, Johannes; Gurevych, Iryna

arXiv:2205.11472 (cs)

[Submitted on 23 May 2022 (v1), last revised 15 Jul 2023 (this version, v2)]

Abstract: The task of Argument Mining, that is, extracting argumentative sentences for a specific topic from large document sources, is inherently difficult for machine learning models and humans alike, as large Argument Mining datasets are rare and recognizing argumentative sentences requires expert knowledge. The task becomes even more difficult if it also involves stance detection of retrieved arguments. Given the cost and complexity of creating suitably large Argument Mining datasets, we ask whether acceptable performance requires ever-larger datasets. Our findings show that, when using carefully composed training samples and a model pretrained on related tasks, we can reach 95% of the maximum performance while reducing the training sample size by at least 85%. This result is consistent across three Argument Mining tasks on three different datasets. We also publish a new dataset for future benchmarking.

Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2205.11472 [cs.CL]
(or arXiv:2205.11472v2 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2205.11472

Submission history

From: Benjamin Schiller
[v1] Mon, 23 May 2022 17:14:32 UTC (89 KB)
[v2] Sat, 15 Jul 2023 14:39:15 UTC (116 KB)