The Fragility of Multi-Treebank Parsing Evaluation

Alonso-Alonso, Iago; Vilares, David; Gómez-Rodríguez, Carlos

arXiv:2209.06699 (cs)

[Submitted on 14 Sep 2022]

Abstract: Treebank selection for parsing evaluation and the spurious effects that might arise from a biased choice have not been explored in detail. This paper studies how evaluating on a single subset of treebanks can lead to weak conclusions. First, we take a few contrasting parsers, and run them on subsets of treebanks proposed in previous work, whose use was justified (or not) on criteria such as typology or data scarcity. Second, we run a large-scale version of this experiment, create vast amounts of random subsets of treebanks, and compare on them many parsers whose scores are available. The results show substantial variability across subsets and that although establishing guidelines for good treebank selection is hard, it is possible to detect potentially harmful strategies.
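
The random-subset experiment described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`best_parser`, `winner_counts`), the parser and treebank labels, and every score below are made-up placeholders chosen only to show how the apparent winner can change with the subset of treebanks used for evaluation.

```python
import random
from collections import Counter


def best_parser(scores, subset):
    """Return the parser with the highest mean score over `subset`."""
    def mean_score(parser):
        return sum(scores[parser][t] for t in subset) / len(subset)
    # Iterate in sorted order so any tie is broken deterministically.
    return max(sorted(scores), key=mean_score)


def winner_counts(scores, treebanks, subset_size, n_subsets, seed=0):
    """Count how often each parser ranks first across random treebank subsets."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_subsets):
        subset = rng.sample(treebanks, subset_size)
        counts[best_parser(scores, subset)] += 1
    return counts


# Hypothetical toy scores (NOT results from the paper).
scores = {
    "parser_a": {"tb1": 90.0, "tb2": 70.0, "tb3": 80.0, "tb4": 60.0},
    "parser_b": {"tb1": 85.0, "tb2": 76.0, "tb3": 78.0, "tb4": 66.0},
}
treebanks = ["tb1", "tb2", "tb3", "tb4"]

counts = winner_counts(scores, treebanks, subset_size=2, n_subsets=1000)
print(counts)
```

In this toy setup `parser_b` has the higher mean over all four treebanks, yet `parser_a` comes out ahead on the subset `tb1`, `tb3`, so a conclusion drawn from that subset alone would point the other way.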

Comments:	Accepted at COLING 2022
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2209.06699 [cs.CL]
	(or arXiv:2209.06699v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2209.06699

Submission history

From: David Vilares
[v1] Wed, 14 Sep 2022 15:07:29 UTC (6,411 KB)
