-
A Dataset for the Detection of Dehumanizing Language
Authors:
Paul Engelmann,
Peter Brunsgaard Trolle,
Christian Hardmeier
Abstract:
Dehumanization is a mental process that enables the exclusion and ill treatment of a group of people. In this paper, we present two data sets of dehumanizing text, a large, automatically collected corpus and a smaller, manually annotated data set. Both data sets include a combination of political discourse and dialogue from movie subtitles. Our methods give us a broad and varied amount of dehumani…
▽ More
Dehumanization is a mental process that enables the exclusion and ill treatment of a group of people. In this paper, we present two data sets of dehumanizing text, a large, automatically collected corpus and a smaller, manually annotated data set. Both data sets include a combination of political discourse and dialogue from movie subtitles. Our methods give us a broad and varied amount of dehumanization data to work with, enabling further exploratory analysis and automatic classification of dehumanization patterns. Both data sets will be publicly released.
△ Less
Submitted 13 February, 2024;
originally announced February 2024.
-
Parallel Data Helps Neural Entity Coreference Resolution
Authors:
Gongbo Tang,
Christian Hardmeier
Abstract:
Coreference resolution is the task of finding expressions that refer to the same entity in a text. Coreference models are generally trained on monolingual annotated data but annotating coreference is expensive and challenging. Hardmeier et al.(2013) have shown that parallel data contains latent anaphoric knowledge, but it has not been explored in end-to-end neural models yet. In this paper, we pro…
▽ More
Coreference resolution is the task of finding expressions that refer to the same entity in a text. Coreference models are generally trained on monolingual annotated data but annotating coreference is expensive and challenging. Hardmeier et al.(2013) have shown that parallel data contains latent anaphoric knowledge, but it has not been explored in end-to-end neural models yet. In this paper, we propose a simple yet effective model to exploit coreference knowledge from parallel data. In addition to the conventional modules learning coreference from annotations, we introduce an unsupervised module to capture cross-lingual coreference knowledge. Our proposed cross-lingual model achieves consistent improvements, up to 1.74 percentage points, on the OntoNotes 5.0 English dataset using 9 different synthetic parallel datasets. These experimental results confirm that parallel data can provide additional coreference knowledge which is beneficial to coreference resolution tasks.
△ Less
Submitted 28 May, 2023;
originally announced May 2023.
-
Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity
Authors:
Dennis Ulmer,
Jes Frellsen,
Christian Hardmeier
Abstract:
We investigate the problem of determining the predictive confidence (or, conversely, uncertainty) of a neural classifier through the lens of low-resource languages. By training models on sub-sampled datasets in three different languages, we assess the quality of estimates from a wide array of approaches and their dependence on the amount of available data. We find that while approaches based on pr…
▽ More
We investigate the problem of determining the predictive confidence (or, conversely, uncertainty) of a neural classifier through the lens of low-resource languages. By training models on sub-sampled datasets in three different languages, we assess the quality of estimates from a wide array of approaches and their dependence on the amount of available data. We find that while approaches based on pre-trained models and ensembles achieve the best results overall, the quality of uncertainty estimates can surprisingly suffer with more data. We also perform a qualitative analysis of uncertainties on sequences, discovering that a model's total uncertainty seems to be influenced to a large degree by its data uncertainty, not model uncertainty. All model implementations are open-sourced in a software package.
△ Less
Submitted 20 October, 2022;
originally announced October 2022.
-
deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks
Authors:
Dennis Ulmer,
Christian Hardmeier,
Jes Frellsen
Abstract:
A lot of Machine Learning (ML) and Deep Learning (DL) research is of an empirical nature. Nevertheless, statistical significance testing (SST) is still not widely used. This endangers true progress, as seeming improvements over a baseline might be statistical flukes, leading follow-up research astray while wasting human and computational resources. Here, we provide an easy-to-use package containin…
▽ More
A lot of Machine Learning (ML) and Deep Learning (DL) research is of an empirical nature. Nevertheless, statistical significance testing (SST) is still not widely used. This endangers true progress, as seeming improvements over a baseline might be statistical flukes, leading follow-up research astray while wasting human and computational resources. Here, we provide an easy-to-use package containing different significance tests and utility functions specifically tailored towards research needs and usability.
△ Less
Submitted 14 April, 2022;
originally announced April 2022.
-
Experimental Standards for Deep Learning in Natural Language Processing Research
Authors:
Dennis Ulmer,
Elisa Bassignana,
Max Müller-Eberstein,
Daniel Varab,
Mike Zhang,
Rob van der Goot,
Christian Hardmeier,
Barbara Plank
Abstract:
The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well. Yet, compared to more established disciplines, a lack of common experimental standards remains an open challenge to the field at large. Starting from fundamental scientific principles, we distill ongoing discussions on experimental standards…
▽ More
The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well. Yet, compared to more established disciplines, a lack of common experimental standards remains an open challenge to the field at large. Starting from fundamental scientific principles, we distill ongoing discussions on experimental standards in NLP into a single, widely-applicable methodology. Following these best practices is crucial to strengthen experimental evidence, improve reproducibility and support scientific progress. These standards are further collected in a public repository to help them transparently adapt to future needs.
△ Less
Submitted 17 October, 2022; v1 submitted 13 April, 2022;
originally announced April 2022.
-
Unsupervised Discovery of Unaccusative and Unergative Verbs
Authors:
Sharid Loáiciga,
Luca Bevacqua,
Christian Hardmeier
Abstract:
We present an unsupervised method to detect English unergative and unaccusative verbs. These categories allow us to identify verbs participating in the causative-inchoative alternation without knowing the semantic roles of the verb. The method is based on the generation of intransitive sentence variants of candidate verbs and probing a language model. We obtained results on par with similar approa…
▽ More
We present an unsupervised method to detect English unergative and unaccusative verbs. These categories allow us to identify verbs participating in the causative-inchoative alternation without knowing the semantic roles of the verb. The method is based on the generation of intransitive sentence variants of candidate verbs and probing a language model. We obtained results on par with similar approaches, with the added benefit of not relying on annotated resources.
△ Less
Submitted 1 November, 2021;
originally announced November 2021.
-
Prior and Posterior Networks: A Survey on Evidential Deep Learning Methods For Uncertainty Estimation
Authors:
Dennis Ulmer,
Christian Hardmeier,
Jes Frellsen
Abstract:
Popular approaches for quantifying predictive uncertainty in deep neural networks often involve distributions over weights or multiple models, for instance via Markov Chain sampling, ensembling, or Monte Carlo dropout. These techniques usually incur overhead by having to train multiple model instances or do not produce very diverse predictions. This comprehensive and extensive survey aims to famil…
▽ More
Popular approaches for quantifying predictive uncertainty in deep neural networks often involve distributions over weights or multiple models, for instance via Markov Chain sampling, ensembling, or Monte Carlo dropout. These techniques usually incur overhead by having to train multiple model instances or do not produce very diverse predictions. This comprehensive and extensive survey aims to familiarize the reader with an alternative class of models based on the concept of Evidential Deep Learning: For unfamiliar data, they aim to admit "what they don't know", and fall back onto a prior belief. Furthermore, they allow uncertainty estimation in a single model and forward pass by parameterizing distributions over distributions. This survey recapitulates existing works, focusing on the implementation in a classification setting, before surveying the application of the same paradigm to regression. We also reflect on the strengths and weaknesses compared to other existing methods and provide the most fundamental derivations using a unified notation to aid future research.
△ Less
Submitted 7 March, 2023; v1 submitted 6 October, 2021;
originally announced October 2021.
-
How to Write a Bias Statement: Recommendations for Submissions to the Workshop on Gender Bias in NLP
Authors:
Christian Hardmeier,
Marta R. Costa-jussà,
Kellie Webster,
Will Radford,
Su Lin Blodgett
Abstract:
At the Workshop on Gender Bias in NLP (GeBNLP), we'd like to encourage authors to give explicit consideration to the wider aspects of bias and its social implications. For the 2020 edition of the workshop, we therefore requested that all authors include an explicit bias statement in their work to clarify how their work relates to the social context in which NLP systems are used.
The programme co…
▽ More
At the Workshop on Gender Bias in NLP (GeBNLP), we'd like to encourage authors to give explicit consideration to the wider aspects of bias and its social implications. For the 2020 edition of the workshop, we therefore requested that all authors include an explicit bias statement in their work to clarify how their work relates to the social context in which NLP systems are used.
The programme committee of the workshops included a number of reviewers with a background in the humanities and social sciences, in addition to NLP experts doing the bulk of the reviewing. Each paper was assigned one of those reviewers, and they were asked to pay specific attention to the provided bias statements in their reviews. This initiative was well received by the authors who submitted papers to the workshop, several of whom said they received useful suggestions and literature hints from the bias reviewers. We are therefore planning to keep this feature of the review process in future editions of the workshop.
△ Less
Submitted 7 April, 2021;
originally announced April 2021.
-
Principal Word Vectors
Authors:
Ali Basirat,
Christian Hardmeier,
Joakim Nivre
Abstract:
We generalize principal component analysis for embedding words into a vector space. The generalization is made in two major levels. The first is to generalize the concept of the corpus as a counting process which is defined by three key elements vocabulary set, feature (annotation) set, and context. This generalization enables the principal word embedding method to generate word vectors with regar…
▽ More
We generalize principal component analysis for embedding words into a vector space. The generalization is made in two major levels. The first is to generalize the concept of the corpus as a counting process which is defined by three key elements vocabulary set, feature (annotation) set, and context. This generalization enables the principal word embedding method to generate word vectors with regard to different types of contexts and different types of annotations provided for a corpus. The second is to generalize the transformation step used in most of the word embedding methods. To this end, we define two levels of transformations. The first is a quadratic transformation, which accounts for different types of weighting over the vocabulary units and contextual features. Second is an adaptive non-linear transformation, which reshapes the data distribution to be meaningful to principal component analysis. The effect of these generalizations on the word vectors is intrinsically studied with regard to the spread and the discriminability of the word vectors. We also provide an extrinsic evaluation of the contribution of the principal word vectors on a word similarity benchmark and the task of dependency parsing. Our experiments are finalized by a comparison between the principal word vectors and other sets of word vectors generated with popular word embedding methods. The results obtained from our intrinsic evaluation metrics show that the spread and the discriminability of the principal word vectors are higher than that of other word embedding methods. The results obtained from the extrinsic evaluation metrics show that the principal word vectors are better than some of the word embedding methods and on par with popular methods of word embedding.
△ Less
Submitted 9 July, 2020;
originally announced July 2020.
-
Findings of the 2016 WMT Shared Task on Cross-lingual Pronoun Prediction
Authors:
Liane Guillou,
Christian Hardmeier,
Preslav Nakov,
Sara Stymne,
Jörg Tiedemann,
Yannick Versley,
Mauro Cettolo,
Bonnie Webber,
Andrei Popescu-Belis
Abstract:
We describe the design, the evaluation setup, and the results of the 2016 WMT shared task on cross-lingual pronoun prediction. This is a classification task in which participants are asked to provide predictions on what pronoun class label should replace a placeholder value in the target-language text, provided in lemmatised and PoS-tagged form. We provided four subtasks, for the English-French an…
▽ More
We describe the design, the evaluation setup, and the results of the 2016 WMT shared task on cross-lingual pronoun prediction. This is a classification task in which participants are asked to provide predictions on what pronoun class label should replace a placeholder value in the target-language text, provided in lemmatised and PoS-tagged form. We provided four subtasks, for the English-French and English-German language pairs, in both directions. Eleven teams participated in the shared task; nine for the English-French subtask, five for French-English, nine for English-German, and six for German-English. Most of the submissions outperformed two strong language-model based baseline systems, with systems using deep recurrent neural networks outperforming those using other architectures for most language pairs.
△ Less
Submitted 27 November, 2019;
originally announced November 2019.
-
Getting Gender Right in Neural Machine Translation
Authors:
Eva Vanmassenhove,
Christian Hardmeier,
Andy Way
Abstract:
Speakers of different languages must attend to and encode strikingly different aspects of the world in order to use their language correctly (Sapir, 1921; Slobin, 1996). One such difference is related to the way gender is expressed in a language. Saying "I am happy" in English, does not encode any additional knowledge of the speaker that uttered the sentence. However, many other languages do have…
▽ More
Speakers of different languages must attend to and encode strikingly different aspects of the world in order to use their language correctly (Sapir, 1921; Slobin, 1996). One such difference is related to the way gender is expressed in a language. Saying "I am happy" in English, does not encode any additional knowledge of the speaker that uttered the sentence. However, many other languages do have grammatical gender systems and so such knowledge would be encoded. In order to correctly translate such a sentence into, say, French, the inherent gender information needs to be retained/recovered. The same sentence would become either "Je suis heureux", for a male speaker or "Je suis heureuse" for a female one. Apart from morphological agreement, demographic factors (gender, age, etc.) also influence our use of language in terms of word choices or even on the level of syntactic constructions (Tannen, 1991; Pennebaker et al., 2003). We integrate gender information into NMT systems. Our contribution is two-fold: (1) the compilation of large datasets with speaker information for 20 language pairs, and (2) a simple set of experiments that incorporate gender information into NMT for multiple language pairs. Our experiments show that adding a gender feature to an NMT system significantly improves the translation quality for some language pairs.
△ Less
Submitted 11 September, 2019;
originally announced September 2019.
-
Pronoun Translation in English-French Machine Translation: An Analysis of Error Types
Authors:
Christian Hardmeier,
Liane Guillou
Abstract:
Pronouns are a long-standing challenge in machine translation. We present a study of the performance of a range of rule-based, statistical and neural MT systems on pronoun translation based on an extensive manual evaluation using the PROTEST test suite, which enables a fine-grained analysis of different pronoun types and sheds light on the difficulties of the task. We find that the rule-based appr…
▽ More
Pronouns are a long-standing challenge in machine translation. We present a study of the performance of a range of rule-based, statistical and neural MT systems on pronoun translation based on an extensive manual evaluation using the PROTEST test suite, which enables a fine-grained analysis of different pronoun types and sheds light on the difficulties of the task. We find that the rule-based approaches in our corpus perform poorly as a result of oversimplification, whereas SMT and early NMT systems exhibit significant shortcomings due to a lack of awareness of the functional and referential properties of pronouns. A recent Transformer-based NMT system with cross-sentence context shows very promising results on non-anaphoric pronouns and intra-sentential anaphora, but there is still considerable room for improvement in examples with cross-sentence dependencies.
△ Less
Submitted 30 August, 2018;
originally announced August 2018.
-
Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point
Authors:
Liane Guillou,
Christian Hardmeier
Abstract:
We compare the performance of the APT and AutoPRF metrics for pronoun translation against a manually annotated dataset comprising human judgements as to the correctness of translations of the PROTEST test suite. Although there is some correlation with the human judgements, a range of issues limit the performance of the automated metrics. Instead, we recommend the use of semi-automatic metrics and…
▽ More
We compare the performance of the APT and AutoPRF metrics for pronoun translation against a manually annotated dataset comprising human judgements as to the correctness of translations of the PROTEST test suite. Although there is some correlation with the human judgements, a range of issues limit the performance of the automated metrics. Instead, we recommend the use of semi-automatic metrics and test suites in place of fully automatic metrics.
△ Less
Submitted 13 August, 2018;
originally announced August 2018.
-
Universal Word Segmentation: Implementation and Interpretation
Authors:
Yan Shao,
Christian Hardmeier,
Joakim Nivre
Abstract:
Word segmentation is a low-level NLP task that is non-trivial for a considerable number of languages. In this paper, we present a sequence tagging framework and apply it to word segmentation for a wide range of languages with different writing systems and typological characteristics. Additionally, we investigate the correlations between various typological factors and word segmentation accuracy. T…
▽ More
Word segmentation is a low-level NLP task that is non-trivial for a considerable number of languages. In this paper, we present a sequence tagging framework and apply it to word segmentation for a wide range of languages with different writing systems and typological characteristics. Additionally, we investigate the correlations between various typological factors and word segmentation accuracy. The experimental results indicate that segmentation accuracy is positively related to word boundary markers and negatively to the number of unique non-segmental terms. Based on the analysis, we design a small set of language-specific settings and extensively evaluate the segmentation system on the Universal Dependencies datasets. Our model obtains state-of-the-art accuracies on all the UD languages. It performs substantially better on languages that are non-trivial to segment, such as Chinese, Japanese, Arabic and Hebrew, when compared to previous work.
△ Less
Submitted 9 July, 2018;
originally announced July 2018.
-
Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF
Authors:
Yan Shao,
Christian Hardmeier,
Jörg Tiedemann,
Joakim Nivre
Abstract:
We present a character-based model for joint segmentation and POS tagging for Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is adapted and applied with novel vector representations of Chinese characters that capture rich contextual information and lower-than-character level features. The proposed model is extensively evaluated and compared with a state-of-the-art tag…
▽ More
We present a character-based model for joint segmentation and POS tagging for Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is adapted and applied with novel vector representations of Chinese characters that capture rich contextual information and lower-than-character level features. The proposed model is extensively evaluated and compared with a state-of-the-art tagger respectively on CTB5, CTB9 and UD Chinese. The experimental results indicate that our model is accurate and robust across datasets in different sizes, genres and annotation schemes. We obtain state-of-the-art performance on CTB5, achieving 94.38 F1-score for joint segmentation and POS tagging.
△ Less
Submitted 12 September, 2017; v1 submitted 5 April, 2017;
originally announced April 2017.
-
Learning Structural Kernels for Natural Language Processing
Authors:
Daniel Beck,
Trevor Cohn,
Christian Hardmeier,
Lucia Specia
Abstract:
Structural kernels are a flexible learning paradigm that has been widely used in Natural Language Processing. However, the problem of model selection in kernel-based methods is usually overlooked. Previous approaches mostly rely on setting default values for kernel hyperparameters or using grid search, which is slow and coarse-grained. In contrast, Bayesian methods allow efficient model selection…
▽ More
Structural kernels are a flexible learning paradigm that has been widely used in Natural Language Processing. However, the problem of model selection in kernel-based methods is usually overlooked. Previous approaches mostly rely on setting default values for kernel hyperparameters or using grid search, which is slow and coarse-grained. In contrast, Bayesian methods allow efficient model selection by maximizing the evidence on the training data through gradient-based methods. In this paper we show how to perform this in the context of structural kernels by using Gaussian Processes. Experimental results on tree kernels show that this procedure results in better prediction performance compared to hyperparameter optimization via grid search. The framework proposed in this paper can be adapted to other structures besides trees, e.g., strings and graphs, thereby extending the utility of kernel-based methods.
△ Less
Submitted 10 August, 2015;
originally announced August 2015.