Enhancements to the BOUN Treebank Reflecting the Agglutinative Nature of Turkish
Authors:
Büşra Marşan,
Salih Furkan Akkurt,
Muhammet Şen,
Merve Gürbüz,
Onur Güngör,
Şaziye Betül Özateş,
Suzan Üsküdarlı,
Arzucan Özgür,
Tunga Güngör,
Balkız Öztürk
Abstract:
In this study, we aim to offer linguistically motivated solutions to resolve the issues of the lack of representation of null morphemes, highly productive derivational processes, and syncretic morphemes of Turkish in the BOUN Treebank without diverging from the Universal Dependencies framework.
To tackle these issues, new annotation conventions were introduced by splitting certain lemmas and employing the MISC (miscellaneous) field in the UD framework to denote derivation. The representational capabilities of the re-annotated treebank were tested on an LSTM-based dependency parser, and an updated version of the BoAT Tool is introduced.
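As a rough illustration of where such derivation information would live, the sketch below reads one CoNLL-U token line and unpacks its tenth column, MISC, into a dictionary. The ten-column layout and the `SpaceAfter=No` attribute are standard in UD; the `Deriv` key, its `kitap+CI` value, and the example token are hypothetical and are not taken from the BOUN Treebank's actual convention.

```python
# The ten tab-separated columns of a CoNLL-U token line, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_misc(misc):
    """Turn a MISC value such as 'A=1|B=2' into a dict; '_' means empty."""
    if misc == "_":
        return {}
    result = {}
    for item in misc.split("|"):
        key, sep, value = item.partition("=")
        # Attributes without '=' are kept as boolean flags.
        result[key] = value if sep else True
    return result


def parse_token_line(line):
    """Parse a single CoNLL-U token line into a dict keyed by column name."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(CONLLU_FIELDS):
        raise ValueError(f"expected 10 columns, got {len(columns)}")
    token = dict(zip(CONLLU_FIELDS, columns))
    token["misc"] = parse_misc(token["misc"])
    return token


# Hypothetical token: the 'Deriv' key is an assumption made for this sketch.
line = "\t".join(["1", "kitapçı", "kitapçı", "NOUN", "_",
                  "Case=Nom|Number=Sing", "0", "root", "_",
                  "Deriv=kitap+CI|SpaceAfter=No"])
token = parse_token_line(line)
print(token["misc"])
```

Keeping derivation in MISC leaves the other nine columns untouched, so tools that ignore MISC would still read the file as ordinary CoNLL-U.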
Submitted 24 July, 2022;
originally announced July 2022.
BoAT v2 -- A Web-Based Dependency Annotation Tool with Focus on Agglutinative Languages
Authors:
Salih Furkan Akkurt,
Büşra Marşan,
Susan Uskudarli
Abstract:
The value of quality treebanks is steadily increasing due to the crucial role they play in the development of natural language processing tools. The creation of such treebanks is enormously labor-intensive and time-consuming. Especially when the size of treebanks is considered, tools that support the annotation process are essential. Various annotation tools have been proposed; however, they are often not suitable for agglutinative languages such as Turkish. BoAT v1 was developed for annotating dependency relations and was subsequently used to create the manually annotated BOUN Treebank (UD_Turkish-BOUN). In this work, we report on the design and implementation of a dependency annotation tool, BoAT v2, based on the experience gained from the use of BoAT v1, which revealed several opportunities for improvement. BoAT v2 is a multi-user, web-based dependency annotation tool designed with a focus on the annotator's user experience to yield valid annotations. The main objectives of the tool are to: (1) support creating valid and consistent annotations with increased speed, (2) significantly improve the user experience of the annotator, (3) support collaboration among annotators, and (4) provide an open-source and easily deployable web-based annotation tool with a flexible application programming interface (API) to benefit the scientific community. This paper discusses the requirements elicitation, design, and implementation of BoAT v2 along with examples.
Submitted 3 February, 2024; v1 submitted 4 July, 2022;
originally announced July 2022.