Showing 1–25 of 25 results for author: Velldal, E

  1. arXiv:2407.03916  [pdf, other]

    cs.CL

    Entity-Level Sentiment: More than the Sum of Its Parts

    Authors: Egil Rønningstad, Roman Klinger, Erik Velldal, Lilja Øvrelid

    Abstract: In sentiment analysis of longer texts, there may be a variety of topics discussed, of entities mentioned, and of sentiments expressed regarding each entity. We find a lack of studies exploring how such texts express their sentiment towards each entity of interest, and how these sentiments can be modelled. In order to better understand how sentiment regarding persons and organizations (each entity…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 14th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA 2024)

  2. arXiv:2406.04989  [pdf, other]

    cs.CL

    Compositional Generalization with Grounded Language Models

    Authors: Sondre Wold, Étienne Simon, Lucas Georges Gabriel Charpentier, Egor V. Kostylev, Erik Velldal, Lilja Øvrelid

    Abstract: Grounded language models use external sources of information, such as knowledge graphs, to meet some of the general challenges associated with pre-training. By extending previous work on compositional generalization in semantic parsing, we allow for a controlled evaluation of the degree to which these models learn and generalize from patterns in knowledge graphs. We develop a procedure for generat…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: ACL 2024, Findings

  3. arXiv:2404.18832  [pdf, other]

    cs.CL

    It's Difficult to be Neutral -- Human and LLM-based Sentiment Annotation of Patient Comments

    Authors: Petter Mæhlum, David Samuel, Rebecka Maria Norman, Elma Jelin, Øyvind Andresen Bjertnæs, Lilja Øvrelid, Erik Velldal

    Abstract: Sentiment analysis is an important tool for aggregating patient voices, in order to provide targeted improvements in healthcare services. A prerequisite for this is the availability of in-domain data annotated for sentiment. This article documents an effort to add sentiment annotations to free-text comments in patient surveys collected by the Norwegian Institute of Public Health (NIPH). However, a…

    Submitted 29 April, 2024; originally announced April 2024.

  4. arXiv:2306.02871  [pdf, other]

    cs.CL

    Text-To-KG Alignment: Comparing Current Methods on Classification Tasks

    Authors: Sondre Wold, Lilja Øvrelid, Erik Velldal

    Abstract: In contrast to large text corpora, knowledge graphs (KG) provide dense and structured representations of factual information. This makes them attractive for systems that supplement or ground the knowledge found in pre-trained language models with an external knowledge source. This has especially been the case for classification tasks, where recent work has focused on creating pipeline models that…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Camera ready version for MATCHING workshop at ACL 2023

  5. arXiv:2305.03880  [pdf, other]

    cs.CL

    NorBench -- A Benchmark for Norwegian Language Models

    Authors: David Samuel, Andrey Kutuzov, Samia Touileb, Erik Velldal, Lilja Øvrelid, Egil Rønningstad, Elina Sigdel, Anna Palatkina

    Abstract: We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics. We also introduce a range of new Norwegian language models (both encoder and encoder-decoder based). Finally, we compare and analyze their performance, along with other existing LMs, across the different benchmark tests of NorBench.

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted to NoDaLiDa 2023

  6. arXiv:2304.14241  [pdf, other]

    cs.CL

    Entity-Level Sentiment Analysis (ELSA): An exploratory task survey

    Authors: Egil Rønningstad, Erik Velldal, Lilja Øvrelid

    Abstract: This paper explores the task of identifying the overall sentiment expressed towards volitional entities (persons and organizations) in a document -- what we refer to as Entity-Level Sentiment Analysis (ELSA). While identifying sentiment conveyed towards an entity is well researched for shorter texts like tweets, we find little to no research on this specific task for longer texts with multiple men…

    Submitted 27 April, 2023; originally announced April 2023.

    Journal ref: Proceedings of the 29th International Conference on Computational Linguistics, 2022, pages 6773-6783

  7. arXiv:2304.05764  [pdf, other]

    cs.CL

    Measuring Normative and Descriptive Biases in Language Models Using Census Data

    Authors: Samia Touileb, Lilja Øvrelid, Erik Velldal

    Abstract: We investigate in this paper how distributions of occupations with respect to gender are reflected in pre-trained language models. Such distributions are not always aligned to normative ideals, nor do they necessarily reflect a descriptive assessment of reality. In this paper, we introduce an approach for measuring to what degree pre-trained language models are aligned to normative and descriptive…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: Accepted at EACL2023 -- main conference

  8. arXiv:2303.09859  [pdf, other]

    cs.CL

    Trained on 100 million words and still in shape: BERT meets British National Corpus

    Authors: David Samuel, Andrey Kutuzov, Lilja Øvrelid, Erik Velldal

    Abstract: While modern masked language models (LMs) are trained on ever larger corpora, we here explore the effects of down-scaling training to a modestly-sized but representative, well-balanced, and publicly available English text source -- the British National Corpus. We show that pre-training on this carefully curated corpus can reach better performance than the original BERT model. We argue that this ty…

    Submitted 5 May, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: Accepted to EACL 2023

  9. Contextualized language models for semantic change detection: lessons learned

    Authors: Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

    Abstract: We present a qualitative analysis of the (potentially erroneous) outputs of contextualized embedding-based methods for detecting diachronic semantic change. First, we introduce an ensemble method outperforming previously described contextualized approaches. This method is used as a basis for an in-depth analysis of the degrees of semantic change predicted for English words across 5 decades. Our fi…

    Submitted 31 August, 2022; originally announced September 2022.

    Journal ref: Northern European Journal of Language Technology (NEJLT). ISSN 2000-1533. 8(1)

  10. arXiv:2203.13209  [pdf, other]

    cs.CL

    Direct parsing to sentiment graphs

    Authors: David Samuel, Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, Erik Velldal

    Abstract: This paper demonstrates how a graph-based semantic parser can be applied to the task of structured sentiment analysis, directly predicting sentiment graphs from text. We advance the state of the art on 4 out of 5 standard benchmark sets. We release the source code, models and predictions.

    Submitted 26 April, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Accepted to ACL 2022

  11. arXiv:2105.14504  [pdf, other]

    cs.CL

    Structured Sentiment Analysis as Dependency Graph Parsing

    Authors: Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, Erik Velldal

    Abstract: Structured sentiment analysis attempts to extract full opinion tuples from a text, but over time this task has been subdivided into smaller and smaller sub-tasks, e.g., target extraction or targeted polarity classification. We argue that this division has become counterproductive and propose a new unified framework to remedy the situation. We cast the structured sentiment problem as dependency gra…

    Submitted 30 May, 2021; originally announced May 2021.

    Comments: Accepted at ACL-IJCNLP 2021

  12. arXiv:2104.06546  [pdf, other]

    cs.CL

    Large-Scale Contextualised Language Modelling for Norwegian

    Authors: Andrey Kutuzov, Jeremy Barnes, Erik Velldal, Lilja Øvrelid, Stephan Oepen

    Abstract: We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training. This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo an…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: Accepted to NoDaLiDa'2021

  13. arXiv:2102.00299  [pdf, other]

    cs.CL

    If you've got it, flaunt it: Making the most of fine-grained sentiment annotations

    Authors: Jeremy Barnes, Lilja Øvrelid, Erik Velldal

    Abstract: Fine-grained sentiment analysis attempts to extract sentiment holders, targets and polar expressions and resolve the relationship between them, but progress has been hampered by the difficulty of annotation. Targeted sentiment analysis, on the other hand, is a more narrow task, focusing on extracting sentiment targets and classifying their polarity. In this paper, we explore whether incorporating h…

    Submitted 30 January, 2021; originally announced February 2021.

    Comments: To appear in EACL 2021

  14. arXiv:2002.08131  [pdf, other]

    cs.CL cs.LG

    A Systematic Comparison of Architectures for Document-Level Sentiment Classification

    Authors: Jeremy Barnes, Vinit Ravishankar, Lilja Øvrelid, Erik Velldal

    Abstract: Documents are composed of smaller pieces - paragraphs, sentences, and tokens - that have complex relationships between one another. Sentiment classification models that take into account the structure inherent in these documents have a theoretical advantage over those that do not. At the same time, transfer learning models based on language model pretraining have shown promise for document classif…

    Submitted 2 February, 2022; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: 5 pages, 2 figures

  15. arXiv:1911.12722  [pdf, other]

    cs.CL

    A Fine-Grained Sentiment Dataset for Norwegian

    Authors: Lilja Øvrelid, Petter Mæhlum, Jeremy Barnes, Erik Velldal

    Abstract: We introduce NoReC_fine, a dataset for fine-grained sentiment analysis in Norwegian, annotated with respect to polar expressions, targets and holders of opinion. The underlying texts are taken from a corpus of professionally authored reviews from multiple news-sources and across a wide variety of domains, including literature, games, music, products, movies and more. We here present a detailed des…

    Submitted 6 April, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

    Comments: Accepted for LREC 2020

  16. arXiv:1911.12146  [pdf, other]

    cs.CL

    NorNE: Annotating Named Entities for Norwegian

    Authors: Fredrik Jørgensen, Tobias Aasmoe, Anne-Stine Ruud Husevåg, Lilja Øvrelid, Erik Velldal

    Abstract: This paper presents NorNE, a manually annotated corpus of named entities which extends the annotation of the existing Norwegian Dependency Treebank. Comprising both of the official standards of written Norwegian (Bokmål and Nynorsk), the corpus contains around 600,000 tokens and annotates a rich set of entity types including persons, organizations, locations, geo-political entities, products, and…

    Submitted 6 March, 2020; v1 submitted 27 November, 2019; originally announced November 2019.

    Comments: Accepted for LREC 2020

  17. arXiv:1907.12674  [pdf, other]

    cs.CL

    One-to-X analogical reasoning on word embeddings: a case for diachronic armed conflict prediction from news texts

    Authors: Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

    Abstract: We extend the well-known word analogy task to a one-to-X formulation, including one-to-none cases, when no correct answer exists. The task is cast as a relation discovery problem and applied to historical armed conflicts datasets, attempting to predict new relations of type `location:armed-group' based on data about past events. As the source of semantic information, we use diachronic word embeddi…

    Submitted 29 July, 2019; originally announced July 2019.

    Comments: 1st International Workshop on Computational Approaches to Historical Language Change (ACL 2019)

  18. Improving Sentiment Analysis with Multi-task Learning of Negation

    Authors: Jeremy Barnes, Erik Velldal, Lilja Øvrelid

    Abstract: Sentiment analysis is directly affected by compositional phenomena in language that act on the prior polarity of the words and phrases found in the text. Negation is the most prevalent of these phenomena and in order to correctly predict sentiment, a classifier must be able to identify negation and disentangle the effect that its scope has on the final polarity of a text. This paper proposes a mul…

    Submitted 1 October, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: Under submission for Journal of Natural Language Engineering special issue on Negation. 30 pages with references

    Journal ref: Nat. Lang. Eng. 27 (2021) 249-269

  19. arXiv:1906.05887  [pdf, other]

    cs.CL

    Sentiment analysis is not solved! Assessing and probing sentiment classification

    Authors: Jeremy Barnes, Lilja Øvrelid, Erik Velldal

    Abstract: Neural methods for SA have led to quantitative improvements over previous approaches, but these advances are not always accompanied with a thorough analysis of the qualitative differences. Therefore, it is not clear what outstanding conceptual challenges for sentiment analysis remain. In this work, we attempt to discover what challenges still prove a problem for sentiment classifiers for English a…

    Submitted 13 June, 2019; originally announced June 2019.

    Comments: Accepted to BlackBoxNLP Workshop at ACL 2019

  20. arXiv:1906.05061  [pdf, other]

    cs.CL

    Probing Multilingual Sentence Representations With X-Probe

    Authors: Vinit Ravishankar, Lilja Øvrelid, Erik Velldal

    Abstract: This paper extends the task of probing sentence representations for linguistic insight in a multilingual domain. In doing so, we make two contributions: first, we provide datasets for multilingual probing, derived from Wikipedia, in five languages, viz. English, French, German, Spanish and Russian. Second, we evaluate six sentence encoders for each language, each trained by mapping sentence repres…

    Submitted 12 June, 2019; originally announced June 2019.

    Comments: To appear at RepL4NLP '19

  21. arXiv:1809.06748  [pdf, other]

    cs.CL

    Transfer and Multi-Task Learning for Noun-Noun Compound Interpretation

    Authors: Murhaf Fares, Stephan Oepen, Erik Velldal

    Abstract: In this paper, we empirically evaluate the utility of transfer and multi-task learning on a challenging semantic classification task: semantic interpretation of noun--noun compounds. Through a comprehensive series of experiments and in-depth error analysis, we show that transfer learning via parameter initialization and multi-task learning via parameter sharing can help a neural classification mod…

    Submitted 18 September, 2018; originally announced September 2018.

    Comments: EMNLP 2018: Conference on Empirical Methods in Natural Language Processing (EMNLP)

  22. arXiv:1806.03537  [pdf, ps, other]

    cs.CL

    Diachronic word embeddings and semantic shifts: a survey

    Authors: Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, Erik Velldal

    Abstract: Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic resear…

    Submitted 13 June, 2018; v1 submitted 9 June, 2018; originally announced June 2018.

    Comments: Proceedings of COLING 2018

  23. arXiv:1710.05370  [pdf, other]

    cs.CL

    NoReC: The Norwegian Review Corpus

    Authors: Erik Velldal, Lilja Øvrelid, Eivind Alexander Bergem, Cathrine Stadsnes, Samia Touileb, Fredrik Jørgensen

    Abstract: This paper presents the Norwegian Review Corpus (NoReC), created for training and evaluating models for document-level sentiment analysis. The full-text reviews have been collected from major Norwegian news sources and cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each revi…

    Submitted 15 October, 2017; originally announced October 2017.

    Comments: Pending (non-anonymous) review for LREC 2018

  24. arXiv:1707.08660  [pdf, ps, other]

    cs.CL

    Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants

    Authors: Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

    Abstract: This paper deals with using word embedding models to trace the temporal dynamics of semantic relations between pairs of words. The set-up is similar to the well-known analogies task, but expanded with a time dimension. To this end, we apply incremental updating of the models with new training texts, including incremental vocabulary expansion, coupled with learned transformation matrices that let u…

    Submitted 26 July, 2017; originally announced July 2017.

    Comments: to appear in EMNLP 2017 proceedings

  25. arXiv:1608.03803  [pdf, other]

    cs.CL

    Redefining part-of-speech classes with distributional semantic models

    Authors: Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

    Abstract: This paper studies how word embeddings trained on the British National Corpus interact with part of speech boundaries. Our work targets the Universal PoS tag set, which is currently actively being used for annotation of a range of languages. We experiment with training classifiers for predicting PoS tags for words based on their embeddings. The results show that the information about PoS affiliati…

    Submitted 12 August, 2016; originally announced August 2016.

    Journal ref: Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, 2016, pp. 115-125