Showing 1–16 of 16 results for author: Hardmeier, C

Searching in archive cs.
  1. arXiv:2402.08764  [pdf, other]

    cs.CL

    A Dataset for the Detection of Dehumanizing Language

    Authors: Paul Engelmann, Peter Brunsgaard Trolle, Christian Hardmeier

    Abstract: Dehumanization is a mental process that enables the exclusion and ill treatment of a group of people. In this paper, we present two data sets of dehumanizing text, a large, automatically collected corpus and a smaller, manually annotated data set. Both data sets include a combination of political discourse and dialogue from movie subtitles. Our methods give us a broad and varied amount of dehumani…

    Submitted 13 February, 2024; originally announced February 2024.

  2. arXiv:2305.17709  [pdf, other]

    cs.CL

    Parallel Data Helps Neural Entity Coreference Resolution

    Authors: Gongbo Tang, Christian Hardmeier

    Abstract: Coreference resolution is the task of finding expressions that refer to the same entity in a text. Coreference models are generally trained on monolingual annotated data, but annotating coreference is expensive and challenging. Hardmeier et al. (2013) have shown that parallel data contains latent anaphoric knowledge, but it has not been explored in end-to-end neural models yet. In this paper, we pro…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: camera-ready version; to appear in the Findings of ACL 2023

  3. arXiv:2210.15452  [pdf, other]

    cs.CL cs.AI cs.LG

    Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity

    Authors: Dennis Ulmer, Jes Frellsen, Christian Hardmeier

    Abstract: We investigate the problem of determining the predictive confidence (or, conversely, uncertainty) of a neural classifier through the lens of low-resource languages. By training models on sub-sampled datasets in three different languages, we assess the quality of estimates from a wide array of approaches and their dependence on the amount of available data. We find that while approaches based on pr…

    Submitted 20 October, 2022; originally announced October 2022.

  4. arXiv:2204.06815  [pdf, other]

    cs.LG

    deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks

    Authors: Dennis Ulmer, Christian Hardmeier, Jes Frellsen

    Abstract: A lot of Machine Learning (ML) and Deep Learning (DL) research is of an empirical nature. Nevertheless, statistical significance testing (SST) is still not widely used. This endangers true progress, as seeming improvements over a baseline might be statistical flukes, leading follow-up research astray while wasting human and computational resources. Here, we provide an easy-to-use package containin…

    Submitted 14 April, 2022; originally announced April 2022.

  5. arXiv:2204.06251  [pdf, other]

    cs.LG cs.CL

    Experimental Standards for Deep Learning in Natural Language Processing Research

    Authors: Dennis Ulmer, Elisa Bassignana, Max Müller-Eberstein, Daniel Varab, Mike Zhang, Rob van der Goot, Christian Hardmeier, Barbara Plank

    Abstract: The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well. Yet, compared to more established disciplines, a lack of common experimental standards remains an open challenge to the field at large. Starting from fundamental scientific principles, we distill ongoing discussions on experimental standards…

    Submitted 17 October, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

  6. arXiv:2111.00808  [pdf, ps, other]

    cs.CL

    Unsupervised Discovery of Unaccusative and Unergative Verbs

    Authors: Sharid Loáiciga, Luca Bevacqua, Christian Hardmeier

    Abstract: We present an unsupervised method to detect English unergative and unaccusative verbs. These categories allow us to identify verbs participating in the causative-inchoative alternation without knowing the semantic roles of the verb. The method is based on the generation of intransitive sentence variants of candidate verbs and probing a language model. We obtained results on par with similar approa…

    Submitted 1 November, 2021; originally announced November 2021.

  7. arXiv:2110.03051  [pdf, other]

    cs.LG cs.AI stat.ML

    Prior and Posterior Networks: A Survey on Evidential Deep Learning Methods For Uncertainty Estimation

    Authors: Dennis Ulmer, Christian Hardmeier, Jes Frellsen

    Abstract: Popular approaches for quantifying predictive uncertainty in deep neural networks often involve distributions over weights or multiple models, for instance via Markov Chain sampling, ensembling, or Monte Carlo dropout. These techniques usually incur overhead by having to train multiple model instances or do not produce very diverse predictions. This comprehensive and extensive survey aims to famil…

    Submitted 7 March, 2023; v1 submitted 6 October, 2021; originally announced October 2021.

  8. arXiv:2104.03026  [pdf, ps, other]

    cs.CL

    How to Write a Bias Statement: Recommendations for Submissions to the Workshop on Gender Bias in NLP

    Authors: Christian Hardmeier, Marta R. Costa-jussà, Kellie Webster, Will Radford, Su Lin Blodgett

    Abstract: At the Workshop on Gender Bias in NLP (GeBNLP), we'd like to encourage authors to give explicit consideration to the wider aspects of bias and its social implications. For the 2020 edition of the workshop, we therefore requested that all authors include an explicit bias statement in their work to clarify how their work relates to the social context in which NLP systems are used. The programme co…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: This document was originally published as a blog post on the web site of GeBNLP 2020

  9. arXiv:2007.04629  [pdf, ps, other]

    cs.CL

    Principal Word Vectors

    Authors: Ali Basirat, Christian Hardmeier, Joakim Nivre

    Abstract: We generalize principal component analysis for embedding words into a vector space. The generalization is made at two major levels. The first is to generalize the concept of the corpus as a counting process which is defined by three key elements: vocabulary set, feature (annotation) set, and context. This generalization enables the principal word embedding method to generate word vectors with regar…

    Submitted 9 July, 2020; originally announced July 2020.

  10. arXiv:1911.12091  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    Findings of the 2016 WMT Shared Task on Cross-lingual Pronoun Prediction

    Authors: Liane Guillou, Christian Hardmeier, Preslav Nakov, Sara Stymne, Jörg Tiedemann, Yannick Versley, Mauro Cettolo, Bonnie Webber, Andrei Popescu-Belis

    Abstract: We describe the design, the evaluation setup, and the results of the 2016 WMT shared task on cross-lingual pronoun prediction. This is a classification task in which participants are asked to provide predictions on what pronoun class label should replace a placeholder value in the target-language text, provided in lemmatised and PoS-tagged form. We provided four subtasks, for the English-French an…

    Submitted 27 November, 2019; originally announced November 2019.

    Comments: cross-lingual pronoun prediction, WMT, shared task, English, German, French

    MSC Class: 68T50; ACM Class: I.2.7

    Journal ref: WMT-2016

  11. Getting Gender Right in Neural Machine Translation

    Authors: Eva Vanmassenhove, Christian Hardmeier, Andy Way

    Abstract: Speakers of different languages must attend to and encode strikingly different aspects of the world in order to use their language correctly (Sapir, 1921; Slobin, 1996). One such difference is related to the way gender is expressed in a language. Saying "I am happy" in English does not encode any additional knowledge of the speaker that uttered the sentence. However, many other languages do have…

    Submitted 11 September, 2019; originally announced September 2019.

    Comments: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), October-November, 2018. Brussels, Belgium, pages 3003-3008, URL: https://www.aclweb.org/anthology/D18-1334, DOI: 10.18653/v1/D18-1334

    Journal ref: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

  12. arXiv:1808.10196  [pdf, ps, other]

    cs.CL

    Pronoun Translation in English-French Machine Translation: An Analysis of Error Types

    Authors: Christian Hardmeier, Liane Guillou

    Abstract: Pronouns are a long-standing challenge in machine translation. We present a study of the performance of a range of rule-based, statistical and neural MT systems on pronoun translation based on an extensive manual evaluation using the PROTEST test suite, which enables a fine-grained analysis of different pronoun types and sheds light on the difficulties of the task. We find that the rule-based appr…

    Submitted 30 August, 2018; originally announced August 2018.

  13. arXiv:1808.04164  [pdf, ps, other]

    cs.CL

    Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point

    Authors: Liane Guillou, Christian Hardmeier

    Abstract: We compare the performance of the APT and AutoPRF metrics for pronoun translation against a manually annotated dataset comprising human judgements as to the correctness of translations of the PROTEST test suite. Although there is some correlation with the human judgements, a range of issues limit the performance of the automated metrics. Instead, we recommend the use of semi-automatic metrics and…

    Submitted 13 August, 2018; originally announced August 2018.

    Comments: EMNLP 2018

  14. arXiv:1807.02974  [pdf, ps, other]

    cs.CL

    Universal Word Segmentation: Implementation and Interpretation

    Authors: Yan Shao, Christian Hardmeier, Joakim Nivre

    Abstract: Word segmentation is a low-level NLP task that is non-trivial for a considerable number of languages. In this paper, we present a sequence tagging framework and apply it to word segmentation for a wide range of languages with different writing systems and typological characteristics. Additionally, we investigate the correlations between various typological factors and word segmentation accuracy. T…

    Submitted 9 July, 2018; originally announced July 2018.

    Journal ref: Transactions of the Association for Computational Linguistics, vol. 6, pp. 421–435, 2018

  15. arXiv:1704.01314  [pdf, ps, other]

    cs.CL

    Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF

    Authors: Yan Shao, Christian Hardmeier, Jörg Tiedemann, Joakim Nivre

    Abstract: We present a character-based model for joint segmentation and POS tagging for Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is adapted and applied with novel vector representations of Chinese characters that capture rich contextual information and lower-than-character level features. The proposed model is extensively evaluated and compared with a state-of-the-art tag…

    Submitted 12 September, 2017; v1 submitted 5 April, 2017; originally announced April 2017.

    Comments: 10 pages plus 1 page appendix, 3 figures, IJCNLP 2017

  16. arXiv:1508.02131  [pdf, other]

    cs.CL cs.LG

    Learning Structural Kernels for Natural Language Processing

    Authors: Daniel Beck, Trevor Cohn, Christian Hardmeier, Lucia Specia

    Abstract: Structural kernels are a flexible learning paradigm that has been widely used in Natural Language Processing. However, the problem of model selection in kernel-based methods is usually overlooked. Previous approaches mostly rely on setting default values for kernel hyperparameters or using grid search, which is slow and coarse-grained. In contrast, Bayesian methods allow efficient model selection…

    Submitted 10 August, 2015; originally announced August 2015.

    Comments: Transactions of the Association for Computational Linguistics, 2015