Corpus and Models for Lemmatisation and POS-tagging of Old French
Authors:
Jean-Baptiste Camps,
Thibault Clérice,
Frédéric Duval,
Lucence Ing,
Naomi Kanaoka,
Ariane Pinche
Abstract:
Old French is a typical example of an under-resourced historical language, one that also displays a considerable amount of linguistic variation. In this paper, we present the current results of a long-running project (2015-...) and describe how we approached the difficult question of providing lemmatisation and POS models for Old French with the help of neural taggers and the progressive constitution of dedicated corpora.
Submitted 23 September, 2021;
originally announced September 2021.
Stylometry for Noisy Medieval Data: Evaluating Paul Meyer's Hagiographic Hypothesis
Authors:
Jean-Baptiste Camps,
Thibault Clérice,
Ariane Pinche
Abstract:
Stylometric analysis of medieval vernacular texts is still a significant challenge: the importance of scribal variation, be it spelling or more substantial, as well as the variants and errors introduced in the tradition, complicate the task of the would-be stylometrist. Basing the analysis on the study of the copy from a single hand of several texts can partially mitigate these issues (Camps and Cafiero, 2013), but the limited availability of complete diplomatic transcriptions might make this difficult. In this paper, we use a workflow combining handwritten text recognition and stylometric analysis, applied to the case of the hagiographic works contained in MS BnF, fr. 412. We seek to evaluate Paul Meyer's hypothesis about the constitution of groups of hagiographic works, as well as to examine potential authorial groupings in a vastly anonymous corpus.
Submitted 7 December, 2020;
originally announced December 2020.