Showing 1–6 of 6 results for author: Mæhlum, P

  1. arXiv:2404.18832  [pdf, other]

    cs.CL

    It's Difficult to be Neutral -- Human and LLM-based Sentiment Annotation of Patient Comments

    Authors: Petter Mæhlum, David Samuel, Rebecka Maria Norman, Elma Jelin, Øyvind Andresen Bjertnæs, Lilja Øvrelid, Erik Velldal

    Abstract: Sentiment analysis is an important tool for aggregating patient voices, in order to provide targeted improvements in healthcare services. A prerequisite for this is the availability of in-domain data annotated for sentiment. This article documents an effort to add sentiment annotations to free-text comments in patient surveys collected by the Norwegian Institute of Public Health (NIPH). However, a…

    Submitted 29 April, 2024; originally announced April 2024.

  2. arXiv:2404.01196  [pdf, other]

    cs.CL

    Estimating Lexical Complexity from Document-Level Distributions

    Authors: Sondre Wold, Petter Mæhlum, Oddbjørn Hove

    Abstract: Existing methods for complexity estimation are typically developed for entire documents. This limitation in scope makes them inapplicable for shorter pieces of text, such as health assessment tools. These typically consist of lists of independent sentences, all of which are too short for existing methods to apply. The choice of wording in these assessment tools is crucial, as both the cognitive ca…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: LREC-COLING 2024

  3. arXiv:2210.06150  [pdf, other]

    cs.CL

    Annotating Norwegian Language Varieties on Twitter for Part-of-Speech

    Authors: Petter Mæhlum, Andre Kåsen, Samia Touileb, Jeremy Barnes

    Abstract: Norwegian Twitter data poses an interesting challenge for Natural Language Processing (NLP) tasks. These texts are difficult for models trained on standardized text in one of the two Norwegian written forms (Bokmål and Nynorsk), as they contain both the typical variation of social media text, as well as a large amount of dialectal variety. In this paper we present a novel Norwegian Twitter dataset…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted at the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2022). Co-located with COLING 2022

  4. arXiv:2201.05123  [pdf, other]

    cs.CL

    NorDiaChange: Diachronic Semantic Change Dataset for Norwegian

    Authors: Andrey Kutuzov, Samia Touileb, Petter Mæhlum, Tita Ranveig Enstad, Alexandra Wittemann

    Abstract: We describe NorDiaChange: the first diachronic semantic change dataset for Norwegian. NorDiaChange comprises two novel subsets, covering about 80 Norwegian nouns manually annotated with graded semantic change over time. Both datasets follow the same annotation procedure and can be used interchangeably as train and test splits for each other. NorDiaChange covers the time periods related to pre- and…

    Submitted 27 April, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

    Comments: LREC'2022 proceedings

  5. arXiv:2104.04989  [pdf, other]

    cs.CL

    NorDial: A Preliminary Corpus of Written Norwegian Dialect Use

    Authors: Jeremy Barnes, Petter Mæhlum, Samia Touileb

    Abstract: Norway has a large amount of dialectal variation, as well as a general tolerance to its use in the public sphere. There are, however, few available resources to study this variation and its change over time and in more informal areas, e.g., on social media. In this paper, we propose a first step to creating a corpus of dialectal variation of written Norwegian. We collect a small corpus of tweets and…

    Submitted 11 April, 2021; originally announced April 2021.

    Comments: Accepted to NoDaLiDa 2021

  6. arXiv:1911.12722  [pdf, other]

    cs.CL

    A Fine-Grained Sentiment Dataset for Norwegian

    Authors: Lilja Øvrelid, Petter Mæhlum, Jeremy Barnes, Erik Velldal

    Abstract: We introduce NoReC_fine, a dataset for fine-grained sentiment analysis in Norwegian, annotated with respect to polar expressions, targets and holders of opinion. The underlying texts are taken from a corpus of professionally authored reviews from multiple news-sources and across a wide variety of domains, including literature, games, music, products, movies and more. We here present a detailed des…

    Submitted 6 April, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

    Comments: Accepted for LREC 2020