Showing 1–7 of 7 results for author: van Cranenburgh, A

  1. arXiv:2011.01624

    cs.CL

    Results of a Single Blind Literary Taste Test with Short Anonymized Novel Fragments

    Authors: Andreas van Cranenburgh, Corina Koolen

    Abstract: It is an open question to what extent perceptions of literary quality are derived from text-intrinsic versus social factors. While supervised models can predict literary quality ratings from textual factors quite successfully, as shown in the Riddle of Literary Quality project (Koolen et al., 2020), this does not prove that social factors are not important, nor can we assume that readers make judg…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted for LaTeCH 2020 @ COLING

  2. arXiv:2011.01615

    cs.CL

    A Benchmark of Rule-Based and Neural Coreference Resolution in Dutch Novels and News

    Authors: Corbèn Poot, Andreas van Cranenburgh

    Abstract: We evaluate a rule-based (Lee et al., 2013) and neural (Lee et al., 2018) coreference system on Dutch datasets of two domains: literary novels and news/Wikipedia text. The results provide insight into the relative strengths of data-driven and knowledge-driven systems, as well as the influence of domain, document length, and annotation schemes. The neural system performs best on news/Wikipedia text…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted for CRAC 2020 @ COLING

  3. arXiv:2004.13580

    cs.CL

    Embarrassingly Simple Unsupervised Aspect Extraction

    Authors: Stéphan Tulkens, Andreas van Cranenburgh

    Abstract: We present a simple but effective method for aspect identification in sentiment analysis. Our unsupervised method only requires word embeddings and a POS tagger, and is therefore straightforward to apply to new domains and languages. We introduce Contrastive Attention (CAt), a novel single-head attention mechanism based on an RBF kernel, which gives a considerable boost in performance and makes th…

    Submitted 28 April, 2020; originally announced April 2020.

    Comments: Accepted as ACL 2020 short paper

  4. What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models

    Authors: Wietse de Vries, Andreas van Cranenburgh, Malvina Nissim

    Abstract: Peeking into the inner workings of BERT has shown that its layers resemble the classical NLP pipeline, with progressively more complex tasks being concentrated in later layers. To investigate to what extent these results also hold for a language other than English, we probe a Dutch BERT-based model and the multilingual BERT model for Dutch NLP tasks. In addition, through a deeper analysis of part-…

    Submitted 12 October, 2020; v1 submitted 14 April, 2020; originally announced April 2020.

    Comments: Accepted at Findings of EMNLP 2020 (camera-ready)

    Journal ref: Findings of the Association for Computational Linguistics: EMNLP 2020

  5. arXiv:1912.09582

    cs.CL

    BERTje: A Dutch BERT Model

    Authors: Wietse de Vries, Andreas van Cranenburgh, Arianna Bisazza, Tommaso Caselli, Gertjan van Noord, Malvina Nissim

    Abstract: The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks. Using the same architecture and parameters, we developed and evaluated a monolingual Dutch BERT model called BERTje. Compared to the multilingual BERT model, which includes Dutch but is only based on Wikipedia text, BERTje is based on a large and…

    Submitted 19 December, 2019; originally announced December 2019.

  6. arXiv:1701.03329

    cs.CL

    A Data-Oriented Model of Literary Language

    Authors: Andreas van Cranenburgh, Rens Bod

    Abstract: We consider the task of predicting how literary a text is, with a gold standard from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and explains 76.0% of…

    Submitted 26 January, 2017; v1 submitted 12 January, 2017; originally announced January 2017.

    Comments: To be published in EACL 2017, 11 pages

    Journal ref: Proceedings of EACL 2017, pp. 1228-1238

  7. arXiv:1410.0286

    cs.CL

    LAF-Fabric: a data analysis tool for Linguistic Annotation Framework with an application to the Hebrew Bible

    Authors: Dirk Roorda, Gino Kalkman, Martijn Naaijer, Andreas van Cranenburgh

    Abstract: The Linguistic Annotation Framework (LAF) provides a general, extensible stand-off markup system for corpora. This paper discusses LAF-Fabric, a new tool to analyse LAF resources in general with an extension to process the Hebrew Bible in particular. We first walk through the history of the Hebrew Bible as text database in decennium-wide steps. Then we describe how LAF-Fabric may serve as an analy…

    Submitted 1 October, 2014; originally announced October 2014.

    Journal ref: Computational Linguistics in the Netherlands Journal, Volume 4, December 2014, pp. 105-109