Showing 1–2 of 2 results for author: Kåsen, A

  1. arXiv:2305.13527  [pdf, other]

    cs.CL

    Aligning the Norwegian UD Treebank with Entity and Coreference Information

    Authors: Tollef Emil Jørgensen, Andre Kåsen

    Abstract: This paper presents a merged collection of entity and coreference annotated data grounded in the Universal Dependencies (UD) treebanks for the two written forms of Norwegian: Bokmål and Nynorsk. The aligned and converted corpora are the Norwegian Named Entities (NorNE) and Norwegian Anaphora Resolution Corpus (NARC). While NorNE is aligned with an older version of the treebank, NARC is misaligned…

    Submitted 25 May, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 4 pages, 1 table. Appendix: 3 tables and 5 data examples

    ACM Class: I.2.7

  2. arXiv:2210.06150  [pdf, other]

    cs.CL

    Annotating Norwegian Language Varieties on Twitter for Part-of-Speech

    Authors: Petter Mæhlum, Andre Kåsen, Samia Touileb, Jeremy Barnes

    Abstract: Norwegian Twitter data poses an interesting challenge for Natural Language Processing (NLP) tasks. These texts are difficult for models trained on standardized text in one of the two Norwegian written forms (Bokmål and Nynorsk), as they contain both the typical variation of social media text and a large amount of dialectal variety. In this paper we present a novel Norwegian Twitter dataset…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted at the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2022). Co-located with COLING 2022