Showing 1–7 of 7 results for author: Kestemont, M

  1. From exemplar to copy: the scribal appropriation of a Hadewijch manuscript computationally explored

    Authors: Wouter Haverals, Mike Kestemont

    Abstract: This study is devoted to two of the oldest known manuscripts in which the oeuvre of the medieval mystical author Hadewijch has been preserved: Brussels, KBR, 2879-2880 (ms. A) and Brussels, KBR, 2877-2878 (ms. B). On the basis of codicological and contextual arguments, it is assumed that the scribe who produced B used A as an exemplar. While the similarities in both layout and content between the…

    Submitted 6 April, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

    Journal ref: Journal of Data Mining & Digital Humanities, On the Way to the Future of Digital Manuscript Studies (April 20, 2023) jdmdh:10206

  2. arXiv:2005.11239  [pdf, other]

    cs.CL

    Character-level Transformer-based Neural Machine Translation

    Authors: Nikolay Banar, Walter Daelemans, Mike Kestemont

    Abstract: Neural machine translation (NMT) is nowadays commonly applied at the subword level, using byte-pair encoding. A promising alternative approach focuses on character-level translation, which simplifies processing pipelines in NMT considerably. This approach, however, must consider relatively longer sequences, rendering the training process prohibitively expensive. In this paper, we discuss a novel,…

    Submitted 22 May, 2020; originally announced May 2020.

  3. arXiv:2005.05232  [pdf, other]

    cs.CV

    On the Transferability of Winning Tickets in Non-Natural Image Datasets

    Authors: Matthia Sabatelli, Mike Kestemont, Pierre Geurts

    Abstract: We study the generalization properties of pruned neural networks that are the winners of the lottery ticket hypothesis on datasets of natural images. We analyse their potential under conditions in which training data is scarce and comes from a non-natural domain. Specifically, we investigate whether pruned models that are found on the popular CIFAR-10/100 and Fashion-MNIST datasets, generalize to…

    Submitted 20 November, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

  4. arXiv:1905.02973  [pdf, other]

    cs.CL

    On the Feasibility of Automated Detection of Allusive Text Reuse

    Authors: Enrique Manjavacas, Brian Long, Mike Kestemont

    Abstract: The detection of allusive text reuse is particularly challenging due to the sparse evidence on which allusive references rely---commonly based on none or very few shared words. Arguably, lexical semantics can be resorted to since uncovering semantic relations between words has the potential to increase the support underlying the allusion and alleviate the lexical sparsity. A further obstacle is th…

    Submitted 8 May, 2019; originally announced May 2019.

    Journal ref: NAACL-HLT (LaTeCH-CLfL Workshop) 2019

  5. arXiv:1903.06939  [pdf, other]

    cs.CL

    Improving Lemmatization of Non-Standard Languages with Joint Learning

    Authors: Enrique Manjavacas, Ákos Kádár, Mike Kestemont

    Abstract: Lemmatization of standard languages is concerned with (i) abstracting over morphological differences and (ii) resolving token-lemma ambiguities of inflected words in order to map them to a dictionary headword. In the present paper we aim to improve lemmatization performance on a set of non-standard historical languages in which the difficulty is increased by an additional aspect (iii): spelling va…

    Submitted 16 March, 2019; originally announced March 2019.

    Journal ref: NAACL-HLT 2019

  6. arXiv:1708.05536  [pdf, other]

    cs.CL

    Assessing the Stylistic Properties of Neurally Generated Text in Authorship Attribution

    Authors: E. Manjavacas, J. de Gussem, W. Daelemans, M. Kestemont

    Abstract: Recent applications of neural language models have led to an increased interest in the automatic generation of natural language. However impressive, the evaluation of neurally generated text has so far remained rather informal and anecdotal. Here, we present an attempt at the systematic assessment of one aspect of the quality of neurally generated text. We focus on a specific aspect of neural lang…

    Submitted 18 August, 2017; originally announced August 2017.

  7. arXiv:1603.01597  [pdf]

    cs.CL cs.LG stat.ML

    Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning

    Authors: Mike Kestemont, Jeroen De Gussem

    Abstract: In this paper we consider two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization. These are both basic, yet foundational preprocessing steps in applications such as text re-use detection. Nevertheless, they are generally complicated by the considerable orthographic variation which is typical of medieval Latin. In Digital Classics, these tasks are traditionally solv…

    Submitted 3 August, 2017; v1 submitted 4 March, 2016; originally announced March 2016.

    Journal ref: Journal of Data Mining & Digital Humanities, Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages, Towards a Digital Ecosystem: NLP. Corpus infrastructure. Methods for Retrieving Texts and Computing Text Similarities (August 6, 2017) jdmdh:1398