Search | arXiv e-print repository

Entity-Level Sentiment: More than the Sum of Its Parts

Authors: Egil Rønningstad, Roman Klinger, Erik Velldal, Lilja Øvrelid

Abstract: In sentiment analysis of longer texts, there may be a variety of topics discussed, of entities mentioned, and of sentiments expressed regarding each entity. We find a lack of studies exploring how such texts express their sentiment towards each entity of interest, and how these sentiments can be modelled. In order to better understand how sentiment regarding persons and organizations (each entity… ▽ More In sentiment analysis of longer texts, there may be a variety of topics discussed, of entities mentioned, and of sentiments expressed regarding each entity. We find a lack of studies exploring how such texts express their sentiment towards each entity of interest, and how these sentiments can be modelled. In order to better understand how sentiment regarding persons and organizations (each entity in our scope) is expressed in longer texts, we have collected a dataset of expert annotations where the overall sentiment regarding each entity is identified, together with the sentence-level sentiment for these entities separately. We show that the reader's perceived sentiment regarding an entity often differs from an arithmetic aggregation of sentiments at the sentence level. Only 70\% of the positive and 55\% of the negative entities receive a correct overall sentiment label when we aggregate the (human-annotated) sentiment labels for the sentences where the entity is mentioned. Our dataset reveals the complexity of entity-specific sentiment in longer texts, and allows for more precise modelling and evaluation of such sentiment expressions. △ Less

Submitted 4 July, 2024; originally announced July 2024.

Comments: 14th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA 2024)

arXiv:2406.04989 [pdf, other]

Compositional Generalization with Grounded Language Models

Authors: Sondre Wold, Étienne Simon, Lucas Georges Gabriel Charpentier, Egor V. Kostylev, Erik Velldal, Lilja Øvrelid

Abstract: Grounded language models use external sources of information, such as knowledge graphs, to meet some of the general challenges associated with pre-training. By extending previous work on compositional generalization in semantic parsing, we allow for a controlled evaluation of the degree to which these models learn and generalize from patterns in knowledge graphs. We develop a procedure for generat… ▽ More Grounded language models use external sources of information, such as knowledge graphs, to meet some of the general challenges associated with pre-training. By extending previous work on compositional generalization in semantic parsing, we allow for a controlled evaluation of the degree to which these models learn and generalize from patterns in knowledge graphs. We develop a procedure for generating natural language questions paired with knowledge graphs that targets different aspects of compositionality and further avoids grounding the language models in information already encoded implicitly in their weights. We evaluate existing methods for combining language models with knowledge graphs and find them to struggle with generalization to sequences of unseen lengths and to novel combinations of seen base components. While our experimental results provide some insight into the expressive power of these models, we hope our work and released datasets motivate future research on how to better combine language models with structured knowledge representations. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: ACL 2024, Findings

arXiv:2404.18832 [pdf, other]

It's Difficult to be Neutral -- Human and LLM-based Sentiment Annotation of Patient Comments

Authors: Petter Mæhlum, David Samuel, Rebecka Maria Norman, Elma Jelin, Øyvind Andresen Bjertnæs, Lilja Øvrelid, Erik Velldal

Abstract: Sentiment analysis is an important tool for aggregating patient voices, in order to provide targeted improvements in healthcare services. A prerequisite for this is the availability of in-domain data annotated for sentiment. This article documents an effort to add sentiment annotations to free-text comments in patient surveys collected by the Norwegian Institute of Public Health (NIPH). However, a… ▽ More Sentiment analysis is an important tool for aggregating patient voices, in order to provide targeted improvements in healthcare services. A prerequisite for this is the availability of in-domain data annotated for sentiment. This article documents an effort to add sentiment annotations to free-text comments in patient surveys collected by the Norwegian Institute of Public Health (NIPH). However, annotation can be a time-consuming and resource-intensive process, particularly when it requires domain expertise. We therefore also evaluate a possible alternative to human annotation, using large language models (LLMs) as annotators. We perform an extensive evaluation of the approach for two openly available pretrained LLMs for Norwegian, experimenting with different configurations of prompts and in-context learning, comparing their performance to human annotators. We find that even for zero-shot runs, models perform well above the baseline for binary sentiment, but still cannot compete with human annotators on the full dataset. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2306.02871 [pdf, other]

Text-To-KG Alignment: Comparing Current Methods on Classification Tasks

Authors: Sondre Wold, Lilja Øvrelid, Erik Velldal

Abstract: In contrast to large text corpora, knowledge graphs (KG) provide dense and structured representations of factual information. This makes them attractive for systems that supplement or ground the knowledge found in pre-trained language models with an external knowledge source. This has especially been the case for classification tasks, where recent work has focused on creating pipeline models that… ▽ More In contrast to large text corpora, knowledge graphs (KG) provide dense and structured representations of factual information. This makes them attractive for systems that supplement or ground the knowledge found in pre-trained language models with an external knowledge source. This has especially been the case for classification tasks, where recent work has focused on creating pipeline models that retrieve information from KGs like ConceptNet as additional context. Many of these models consist of multiple components, and although they differ in the number and nature of these parts, they all have in common that for some given text query, they attempt to identify and retrieve a relevant subgraph from the KG. Due to the noise and idiosyncrasies often found in KGs, it is not known how current methods compare to a scenario where the aligned subgraph is completely relevant to the query. In this work, we try to bridge this knowledge gap by reviewing current approaches to text-to-KG alignment and evaluating them on two datasets where manually created graphs are available, providing insights into the effectiveness of current methods. △ Less

Submitted 5 June, 2023; originally announced June 2023.

Comments: Camera ready version for MATCHING workshop at ACL 2023

arXiv:2305.03880 [pdf, other]

NorBench -- A Benchmark for Norwegian Language Models

Authors: David Samuel, Andrey Kutuzov, Samia Touileb, Erik Velldal, Lilja Øvrelid, Egil Rønningstad, Elina Sigdel, Anna Palatkina

Abstract: We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics. We also introduce a range of new Norwegian language models (both encoder and encoder-decoder based). Finally, we compare and analyze their performance, along with other existing LMs, across the different benchmark tests of NorBench. We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics. We also introduce a range of new Norwegian language models (both encoder and encoder-decoder based). Finally, we compare and analyze their performance, along with other existing LMs, across the different benchmark tests of NorBench. △ Less

Submitted 5 May, 2023; originally announced May 2023.

Comments: Accepted to NoDaLiDa 2023

arXiv:2304.14241 [pdf, other]

Entity-Level Sentiment Analysis (ELSA): An exploratory task survey

Authors: Egil Rønningstad, Erik Velldal, Lilja Øvrelid

Abstract: This paper explores the task of identifying the overall sentiment expressed towards volitional entities (persons and organizations) in a document -- what we refer to as Entity-Level Sentiment Analysis (ELSA). While identifying sentiment conveyed towards an entity is well researched for shorter texts like tweets, we find little to no research on this specific task for longer texts with multiple men… ▽ More This paper explores the task of identifying the overall sentiment expressed towards volitional entities (persons and organizations) in a document -- what we refer to as Entity-Level Sentiment Analysis (ELSA). While identifying sentiment conveyed towards an entity is well researched for shorter texts like tweets, we find little to no research on this specific task for longer texts with multiple mentions and opinions towards the same entity. This lack of research would be understandable if ELSA can be derived from existing tasks and models. To assess this, we annotate a set of professional reviews for their overall sentiment towards each volitional entity in the text. We sample from data already annotated for document-level, sentence-level, and target-level sentiment in a multi-domain review corpus, and our results indicate that there is no single proxy task that provides this overall sentiment we seek for the entities at a satisfactory level of performance. We present a suite of experiments aiming to assess the contribution towards ELSA provided by document-, sentence-, and target-level sentiment analysis, and provide a discussion of their shortcomings. We show that sentiment in our dataset is expressed not only with an entity mention as target, but also towards targets with a sentiment-relevant relation to a volitional entity. In our data, these relations extend beyond anaphoric coreference resolution, and our findings call for further research of the topic. Finally, we also present a survey of previous relevant work. △ Less

Submitted 27 April, 2023; originally announced April 2023.

Journal ref: Proceedings of the 29th International Conference on Computational Linguistics, 2022, pages 6773-6783

arXiv:2304.05764 [pdf, other]

Measuring Normative and Descriptive Biases in Language Models Using Census Data

Authors: Samia Touileb, Lilja Øvrelid, Erik Velldal

Abstract: We investigate in this paper how distributions of occupations with respect to gender is reflected in pre-trained language models. Such distributions are not always aligned to normative ideals, nor do they necessarily reflect a descriptive assessment of reality. In this paper, we introduce an approach for measuring to what degree pre-trained language models are aligned to normative and descriptive… ▽ More We investigate in this paper how distributions of occupations with respect to gender is reflected in pre-trained language models. Such distributions are not always aligned to normative ideals, nor do they necessarily reflect a descriptive assessment of reality. In this paper, we introduce an approach for measuring to what degree pre-trained language models are aligned to normative and descriptive occupational distributions. To this end, we use official demographic information about gender--occupation distributions provided by the national statistics agencies of France, Norway, United Kingdom, and the United States. We manually generate template-based sentences combining gendered pronouns and nouns with occupations, and subsequently probe a selection of ten language models covering the English, French, and Norwegian languages. The scoring system we introduce in this work is language independent, and can be used on any combination of template-based sentences, occupations, and languages. The approach could also be extended to other dimensions of national census data and other demographic variables. △ Less

Submitted 12 April, 2023; originally announced April 2023.

Comments: Accepted at EACL2023 -- main conference

arXiv:2303.09859 [pdf, other]

Trained on 100 million words and still in shape: BERT meets British National Corpus

Authors: David Samuel, Andrey Kutuzov, Lilja Øvrelid, Erik Velldal

Abstract: While modern masked language models (LMs) are trained on ever larger corpora, we here explore the effects of down-scaling training to a modestly-sized but representative, well-balanced, and publicly available English text source -- the British National Corpus. We show that pre-training on this carefully curated corpus can reach better performance than the original BERT model. We argue that this ty… ▽ More While modern masked language models (LMs) are trained on ever larger corpora, we here explore the effects of down-scaling training to a modestly-sized but representative, well-balanced, and publicly available English text source -- the British National Corpus. We show that pre-training on this carefully curated corpus can reach better performance than the original BERT model. We argue that this type of corpora has great potential as a language modeling benchmark. To showcase this potential, we present fair, reproducible and data-efficient comparative studies of LMs, in which we evaluate several training objectives and model architectures and replicate previous empirical results in a systematic way. We propose an optimized LM architecture called LTG-BERT. △ Less

Submitted 5 May, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: Accepted to EACL 2023

arXiv:2209.00154 [pdf, other]

doi 10.3384/nejlt.2000-1533.2022.3478

Contextualized language models for semantic change detection: lessons learned

Authors: Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

Abstract: We present a qualitative analysis of the (potentially erroneous) outputs of contextualized embedding-based methods for detecting diachronic semantic change. First, we introduce an ensemble method outperforming previously described contextualized approaches. This method is used as a basis for an in-depth analysis of the degrees of semantic change predicted for English words across 5 decades. Our fi… ▽ More We present a qualitative analysis of the (potentially erroneous) outputs of contextualized embedding-based methods for detecting diachronic semantic change. First, we introduce an ensemble method outperforming previously described contextualized approaches. This method is used as a basis for an in-depth analysis of the degrees of semantic change predicted for English words across 5 decades. Our findings show that contextualized methods can often predict high change scores for words which are not undergoing any real diachronic semantic shift in the lexicographic sense of the term (or at least the status of these shifts is questionable). Such challenging cases are discussed in detail with examples, and their linguistic categorization is proposed. Our conclusion is that pre-trained contextualized language models are prone to confound changes in lexicographic senses and changes in contextual variance, which naturally stem from their distributional nature, but is different from the types of issues observed in methods based on static embeddings. Additionally, they often merge together syntactic and semantic aspects of lexical entities. We propose a range of possible future solutions to these issues. △ Less

Submitted 31 August, 2022; originally announced September 2022.

Journal ref: Northern European Journal of Language Technology (NEJLT). ISSN 2000-1533. 8(1)

arXiv:2203.13209 [pdf, other]

Direct parsing to sentiment graphs

Authors: David Samuel, Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, Erik Velldal

Abstract: This paper demonstrates how a graph-based semantic parser can be applied to the task of structured sentiment analysis, directly predicting sentiment graphs from text. We advance the state of the art on 4 out of 5 standard benchmark sets. We release the source code, models and predictions. This paper demonstrates how a graph-based semantic parser can be applied to the task of structured sentiment analysis, directly predicting sentiment graphs from text. We advance the state of the art on 4 out of 5 standard benchmark sets. We release the source code, models and predictions. △ Less

Submitted 26 April, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

Comments: Accepted to ACL 2022

arXiv:2105.14504 [pdf, other]

Structured Sentiment Analysis as Dependency Graph Parsing

Authors: Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, Erik Velldal

Abstract: Structured sentiment analysis attempts to extract full opinion tuples from a text, but over time this task has been subdivided into smaller and smaller sub-tasks, e,g,, target extraction or targeted polarity classification. We argue that this division has become counterproductive and propose a new unified framework to remedy the situation. We cast the structured sentiment problem as dependency gra… ▽ More Structured sentiment analysis attempts to extract full opinion tuples from a text, but over time this task has been subdivided into smaller and smaller sub-tasks, e,g,, target extraction or targeted polarity classification. We argue that this division has become counterproductive and propose a new unified framework to remedy the situation. We cast the structured sentiment problem as dependency graph parsing, where the nodes are spans of sentiment holders, targets and expressions, and the arcs are the relations between them. We perform experiments on five datasets in four languages (English, Norwegian, Basque, and Catalan) and show that this approach leads to strong improvements over state-of-the-art baselines. Our analysis shows that refining the sentiment graphs with syntactic dependency information further improves results. △ Less

Submitted 30 May, 2021; originally announced May 2021.

Comments: Accepted at ACL-IJCNLP 2021

arXiv:2104.06546 [pdf, other]

Large-Scale Contextualised Language Modelling for Norwegian

Authors: Andrey Kutuzov, Jeremy Barnes, Erik Velldal, Lilja Øvrelid, Stephan Oepen

Abstract: We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training. This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo an… ▽ More We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training. This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo and BERT frameworks. In addition to detailing the training process, we present contrastive benchmark results on a suite of NLP tasks for Norwegian. For additional background and access to the data, models, and software, please see http://norlm.nlpl.eu △ Less

Submitted 13 April, 2021; originally announced April 2021.

Comments: Accepted to NoDaLiDa'2021

arXiv:2102.00299 [pdf, other]

If you've got it, flaunt it: Making the most of fine-grained sentiment annotations

Authors: Jeremy Barnes, Lilja Øvrelid, Erik Velldal

Abstract: Fine-grained sentiment analysis attempts to extract sentiment holders, targets and polar expressions and resolve the relationship between them, but progress has been hampered by the difficulty of annotation. Targeted sentiment analysis, on the other hand, is a more narrow task, focusing on extracting sentiment targets and classifying their polarity.In this paper, we explore whether incorporating h… ▽ More Fine-grained sentiment analysis attempts to extract sentiment holders, targets and polar expressions and resolve the relationship between them, but progress has been hampered by the difficulty of annotation. Targeted sentiment analysis, on the other hand, is a more narrow task, focusing on extracting sentiment targets and classifying their polarity.In this paper, we explore whether incorporating holder and expression information can improve target extraction and classification and perform experiments on eight English datasets. We conclude that jointly predicting target and polarity BIO labels improves target extraction, and that augmenting the input text with gold expressions generally improves targeted polarity classification. This highlights the potential importance of annotating expressions for fine-grained sentiment datasets. At the same time, our results show that performance of current models for predicting polar expressions is poor, hampering the benefit of this information in practice. △ Less

Submitted 30 January, 2021; originally announced February 2021.

Comments: To appear in EACL 2021

arXiv:2002.08131 [pdf, other]

A Systematic Comparison of Architectures for Document-Level Sentiment Classification

Authors: Jeremy Barnes, Vinit Ravishankar, Lilja Øvrelid, Erik Velldal

Abstract: Documents are composed of smaller pieces - paragraphs, sentences, and tokens - that have complex relationships between one another. Sentiment classification models that take into account the structure inherent in these documents have a theoretical advantage over those that do not. At the same time, transfer learning models based on language model pretraining have shown promise for document classif… ▽ More Documents are composed of smaller pieces - paragraphs, sentences, and tokens - that have complex relationships between one another. Sentiment classification models that take into account the structure inherent in these documents have a theoretical advantage over those that do not. At the same time, transfer learning models based on language model pretraining have shown promise for document classification. However, these two paradigms have not been systematically compared and it is not clear under which circumstances one approach is better than the other. In this work we empirically compare hierarchical models and transfer learning for document-level sentiment classification. We show that non-trivial hierarchical models outperform previous baselines and transfer learning on document-level sentiment classification in five languages. △ Less

Submitted 2 February, 2022; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: 5 pages, 2 figures

arXiv:1911.12722 [pdf, other]

A Fine-Grained Sentiment Dataset for Norwegian

Authors: Lilja Øvrelid, Petter Mæhlum, Jeremy Barnes, Erik Velldal

Abstract: We introduce NoReC_fine, a dataset for fine-grained sentiment analysis in Norwegian, annotated with respect to polar expressions, targets and holders of opinion. The underlying texts are taken from a corpus of professionally authored reviews from multiple news-sources and across a wide variety of domains, including literature, games, music, products, movies and more. We here present a detailed des… ▽ More We introduce NoReC_fine, a dataset for fine-grained sentiment analysis in Norwegian, annotated with respect to polar expressions, targets and holders of opinion. The underlying texts are taken from a corpus of professionally authored reviews from multiple news-sources and across a wide variety of domains, including literature, games, music, products, movies and more. We here present a detailed description of this annotation effort. We provide an overview of the developed annotation guidelines, illustrated with examples, and present an analysis of inter-annotator agreement. We also report the first experimental results on the dataset, intended as a preliminary benchmark for further experiments. △ Less

Submitted 6 April, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

Comments: Accepted for LREC 2020

arXiv:1911.12146 [pdf, other]

NorNE: Annotating Named Entities for Norwegian

Authors: Fredrik Jørgensen, Tobias Aasmoe, Anne-Stine Ruud Husevåg, Lilja Øvrelid, Erik Velldal

Abstract: This paper presents NorNE, a manually annotated corpus of named entities which extends the annotation of the existing Norwegian Dependency Treebank. Comprising both of the official standards of written Norwegian (Bokmål and Nynorsk), the corpus contains around 600,000 tokens and annotates a rich set of entity types including persons, organizations, locations, geo-political entities, products, and… ▽ More This paper presents NorNE, a manually annotated corpus of named entities which extends the annotation of the existing Norwegian Dependency Treebank. Comprising both of the official standards of written Norwegian (Bokmål and Nynorsk), the corpus contains around 600,000 tokens and annotates a rich set of entity types including persons, organizations, locations, geo-political entities, products, and events, in addition to a class corresponding to nominals derived from names. We here present details on the annotation effort, guidelines, inter-annotator agreement and an experimental analysis of the corpus using a neural sequence labeling architecture. △ Less

Submitted 6 March, 2020; v1 submitted 27 November, 2019; originally announced November 2019.

Comments: Accepted for LREC 2020

arXiv:1907.12674 [pdf, other]

One-to-X analogical reasoning on word embeddings: a case for diachronic armed conflict prediction from news texts

Authors: Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

Abstract: We extend the well-known word analogy task to a one-to-X formulation, including one-to-none cases, when no correct answer exists. The task is cast as a relation discovery problem and applied to historical armed conflicts datasets, attempting to predict new relations of type `location:armed-group' based on data about past events. As the source of semantic information, we use diachronic word embeddi… ▽ More We extend the well-known word analogy task to a one-to-X formulation, including one-to-none cases, when no correct answer exists. The task is cast as a relation discovery problem and applied to historical armed conflicts datasets, attempting to predict new relations of type `location:armed-group' based on data about past events. As the source of semantic information, we use diachronic word embedding models trained on English news texts. A simple technique to improve diachronic performance in such task is demonstrated, using a threshold based on a function of cosine distance to decrease the number of false positives; this approach is shown to be beneficial on two different corpora. Finally, we publish a ready-to-use test set for one-to-X analogy evaluation on historical armed conflicts data. △ Less

Submitted 29 July, 2019; originally announced July 2019.

Comments: 1st International Workshop on Computational Approaches to Historical Language Change (ACL 2019)

arXiv:1906.07610 [pdf, other]

doi 10.1017/S1351324920000510

Improving Sentiment Analysis with Multi-task Learning of Negation

Authors: Jeremy Barnes, Erik Velldal, Lilja Øvrelid

Abstract: Sentiment analysis is directly affected by compositional phenomena in language that act on the prior polarity of the words and phrases found in the text. Negation is the most prevalent of these phenomena and in order to correctly predict sentiment, a classifier must be able to identify negation and disentangle the effect that its scope has on the final polarity of a text. This paper proposes a mul… ▽ More Sentiment analysis is directly affected by compositional phenomena in language that act on the prior polarity of the words and phrases found in the text. Negation is the most prevalent of these phenomena and in order to correctly predict sentiment, a classifier must be able to identify negation and disentangle the effect that its scope has on the final polarity of a text. This paper proposes a multi-task approach to explicitly incorporate information about negation in sentiment analysis, which we show outperforms learning negation implicitly in a data-driven manner. We describe our approach, a cascading neural architecture with selective sharing of LSTM layers, and show that explicitly training the model with negation as an auxiliary task helps improve the main task of sentiment analysis. The effect is demonstrated across several different standard English-language data sets for both tasks and we analyze several aspects of our system related to its performance, varying types and amounts of input data and different multi-task setups. △ Less

Submitted 1 October, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

Comments: Under submission for Journal of Natural Language Engineering special issue on Negation. 30 pages with references

Journal ref: Nat. Lang. Eng. 27 (2021) 249-269

arXiv:1906.05887 [pdf, other]

Sentiment analysis is not solved! Assessing and probing sentiment classification

Authors: Jeremy Barnes, Lilja Øvrelid, Erik Velldal

Abstract: Neural methods for SA have led to quantitative improvements over previous approaches, but these advances are not always accompanied with a thorough analysis of the qualitative differences. Therefore, it is not clear what outstanding conceptual challenges for sentiment analysis remain. In this work, we attempt to discover what challenges still prove a problem for sentiment classifiers for English a… ▽ More Neural methods for SA have led to quantitative improvements over previous approaches, but these advances are not always accompanied with a thorough analysis of the qualitative differences. Therefore, it is not clear what outstanding conceptual challenges for sentiment analysis remain. In this work, we attempt to discover what challenges still prove a problem for sentiment classifiers for English and to provide a challenging dataset. We collect the subset of sentences that an (oracle) ensemble of state-of-the-art sentiment classifiers misclassify and then annotate them for 18 linguistic and paralinguistic phenomena, such as negation, sarcasm, modality, etc. The dataset is available at https://github.com/ltgoslo/assessing_and_probing_sentiment. Finally, we provide a case study that demonstrates the usefulness of the dataset to probe the performance of a given sentiment classifier with respect to linguistic phenomena. △ Less

Submitted 13 June, 2019; originally announced June 2019.

Comments: Accepted to BlackBoxNLP Workshop at ACL 2019

arXiv:1906.05061 [pdf, other]

Probing Multilingual Sentence Representations With X-Probe

Authors: Vinit Ravishankar, Lilja Øvrelid, Erik Velldal

Abstract: This paper extends the task of probing sentence representations for linguistic insight in a multilingual domain. In doing so, we make two contributions: first, we provide datasets for multilingual probing, derived from Wikipedia, in five languages, viz. English, French, German, Spanish and Russian. Second, we evaluate six sentence encoders for each language, each trained by map** sentence repres… ▽ More This paper extends the task of probing sentence representations for linguistic insight in a multilingual domain. In doing so, we make two contributions: first, we provide datasets for multilingual probing, derived from Wikipedia, in five languages, viz. English, French, German, Spanish and Russian. Second, we evaluate six sentence encoders for each language, each trained by map** sentence representations to English sentence representations, using sentences in a parallel corpus. We discover that cross-lingually mapped representations are often better at retaining certain linguistic information than representations derived from English encoders trained on natural language inference (NLI) as a downstream task. △ Less

Submitted 12 June, 2019; originally announced June 2019.

Comments: To appear at RepL4NLP '19

arXiv:1809.06748 [pdf, other]

Transfer and Multi-Task Learning for Noun-Noun Compound Interpretation

Authors: Murhaf Fares, Stephan Oepen, Erik Velldal

Abstract: In this paper, we empirically evaluate the utility of transfer and multi-task learning on a challenging semantic classification task: semantic interpretation of noun--noun compounds. Through a comprehensive series of experiments and in-depth error analysis, we show that transfer learning via parameter initialization and multi-task learning via parameter sharing can help a neural classification mod… ▽ More In this paper, we empirically evaluate the utility of transfer and multi-task learning on a challenging semantic classification task: semantic interpretation of noun--noun compounds. Through a comprehensive series of experiments and in-depth error analysis, we show that transfer learning via parameter initialization and multi-task learning via parameter sharing can help a neural classification model generalize over a highly skewed distribution of relations. Further, we demonstrate how dual annotation with two distinct sets of relations over the same set of compounds can be exploited to improve the overall accuracy of a neural classifier and its F1 scores on the less frequent, but more difficult relations. △ Less

Submitted 18 September, 2018; originally announced September 2018.

Comments: EMNLP 2018: Conference on Empirical Methods in Natural Language Processing (EMNLP)

arXiv:1806.03537 [pdf, ps, other]

Diachronic word embeddings and semantic shifts: a survey

Authors: Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, Erik Velldal

Abstract: Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic resear… ▽ More Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shifts detection. We start with discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges before this emerging subfield of NLP, as well as prospects and possible applications. △ Less

Submitted 13 June, 2018; v1 submitted 9 June, 2018; originally announced June 2018.

Comments: Proceedings of COLING 2018

arXiv:1710.05370 [pdf, other]

NoReC: The Norwegian Review Corpus

Authors: Erik Velldal, Lilja Øvrelid, Eivind Alexander Bergem, Cathrine Stadsnes, Samia Touileb, Fredrik Jørgensen

Abstract: This paper presents the Norwegian Review Corpus (NoReC), created for training and evaluating models for document-level sentiment analysis. The full-text reviews have been collected from major Norwegian news sources and cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each revi… ▽ More This paper presents the Norwegian Review Corpus (NoReC), created for training and evaluating models for document-level sentiment analysis. The full-text reviews have been collected from major Norwegian news sources and cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each review is labeled with a manually assigned score of 1-6, as provided by the rating of the original author. This first release of the corpus comprises more than 35,000 reviews. It is distributed using the CoNLL-U format, pre-processed using UDPipe, along with a rich set of metadata. The work reported in this paper forms part of the SANT initiative (Sentiment Analysis for Norwegian Text), a project seeking to provide resources and tools for sentiment analysis and opinion mining for Norwegian. As resources for sentiment analysis have so far been unavailable for Norwegian, NoReC represents a highly valuable and sought-after addition to Norwegian language technology. △ Less

Submitted 15 October, 2017; originally announced October 2017.

Comments: Pending (non-anonymous) review for LREC 2018

arXiv:1707.08660 [pdf, ps, other]

Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants

Authors: Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

Abstract: This paper deals with using word embedding models to trace the temporal dynamics of semantic relations between pairs of words. The set-up is similar to the well-known analogies task, but expanded with a time dimension. To this end, we apply incremental updating of the models with new training texts, including incremental vocabulary expansion, coupled with learned transformation matrices that let u… ▽ More This paper deals with using word embedding models to trace the temporal dynamics of semantic relations between pairs of words. The set-up is similar to the well-known analogies task, but expanded with a time dimension. To this end, we apply incremental updating of the models with new training texts, including incremental vocabulary expansion, coupled with learned transformation matrices that let us map between members of the relation. The proposed approach is evaluated on the task of predicting insurgent armed groups based on geographical locations. The gold standard data for the time span 1994--2010 is extracted from the UCDP Armed Conflicts dataset. The results show that the method is feasible and outperforms the baselines, but also that important work still remains to be done. △ Less

Submitted 26 July, 2017; originally announced July 2017.

Comments: to appear in EMNLP 2017 proceedings

arXiv:1608.03803 [pdf, other]

Redefining part-of-speech classes with distributional semantic models

Authors: Andrey Kutuzov, Erik Velldal, Lilja Øvrelid

Abstract: This paper studies how word embeddings trained on the British National Corpus interact with part of speech boundaries. Our work targets the Universal PoS tag set, which is currently actively being used for annotation of a range of languages. We experiment with training classifiers for predicting PoS tags for words based on their embeddings. The results show that the information about PoS affiliati… ▽ More This paper studies how word embeddings trained on the British National Corpus interact with part of speech boundaries. Our work targets the Universal PoS tag set, which is currently actively being used for annotation of a range of languages. We experiment with training classifiers for predicting PoS tags for words based on their embeddings. The results show that the information about PoS affiliation contained in the distributional vectors allows us to discover groups of words with distributional patterns that differ from other words of the same part of speech. This data often reveals hidden inconsistencies of the annotation process or guidelines. At the same time, it supports the notion of `soft' or `graded' part of speech affiliations. Finally, we show that information about PoS is distributed among dozens of vector components, not limited to only one or two features. △ Less

Submitted 12 August, 2016; originally announced August 2016.

Journal ref: Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, 2016, pp. 115-125

Showing 1–25 of 25 results for author: Velldal, E