Showing 1–50 of 92 results for author: Plank, B

  1. arXiv:2407.01137  [pdf, other]

    cs.CL cs.AI

    An Empirical Comparison of Generative Approaches for Product Attribute-Value Identification

    Authors: Kassem Sabeh, Robert Litschko, Mouna Kacimi, Barbara Plank, Johann Gamper

    Abstract: Product attributes are crucial for e-commerce platforms, supporting applications like search, recommendation, and question answering. The task of Product Attribute and Value Identification (PAVI) involves identifying both attributes and their values from product information. In this paper, we formulate PAVI as a generation task and provide, to the best of our knowledge, the most comprehensive eval…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.18403  [pdf, other]

    cs.CL

    LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

    Authors: Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, Alberto Testoni

    Abstract: There is an increasing trend towards evaluating NLP models with LLM-generated judgments instead of human judgments. In the absence of a comparison against human data, this raises concerns about the validity of these evaluations; in case they are conducted with proprietary models, this also raises concerns over reproducibility. We provide JUDGE-BENCH, a collection of 20 NLP datasets with human anno…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.17600  [pdf, other]

    cs.CL

    "Seeing the Big through the Small": Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?

    Authors: Beiduo Chen, Xinpeng Wang, Siyao Peng, Robert Litschko, Anna Korhonen, Barbara Plank

    Abstract: Human label variation (HLV) is a valuable source of information that arises when multiple human annotators provide different labels for valid reasons. In Natural Language Inference (NLI), earlier approaches to capturing HLV involve either collecting annotations from many crowd workers to represent human judgment distribution (HJD) or using expert linguists to provide detailed explanations for their c…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 22 pages, 9 figures

  4. arXiv:2406.16732  [pdf, other]

    cs.CL

    CLIMATELI: Evaluating Entity Linking on Climate Change Data

    Authors: Shijia Zhou, Siyao Peng, Barbara Plank

    Abstract: Climate Change (CC) is a pressing topic of global importance, attracting increasing attention across research fields, from social sciences to Natural Language Processing (NLP). CC is also discussed in various settings and communication platforms, from academic publications to social media forums. Understanding who and what is mentioned in such data is a first critical step to gaining new insights…

    Submitted 27 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 8 pages, accepted at ClimateNLP 2024 workshop @ ACL 2024

  5. arXiv:2406.12546  [pdf, other]

    cs.CL

    Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models

    Authors: Philipp Mondorf, Barbara Plank

    Abstract: Knights and knaves problems represent a classic genre of logical puzzles where characters either tell the truth or lie. The objective is to logically deduce each character's identity based on their statements. The challenge arises from the truth-telling or lying behavior, which influences the logical implications of each statement. Solving these puzzles requires not only direct deductions from ind…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 22 pages, 19 figures

  6. arXiv:2406.11096  [pdf, other]

    cs.CL

    The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models

    Authors: Bolei Ma, Xinpeng Wang, Tiancheng Hu, Anna-Carolina Haensch, Michael A. Hedderich, Barbara Plank, Frauke Kreuter

    Abstract: Recent advances in Large Language Models (LLMs) have sparked wide interest in validating and comprehending the human-like cognitive-behavioral traits LLMs may have. These cognitive-behavioral traits typically include Attitudes, Opinions, and Values (AOV). However, measuring AOV embedded within LLMs remains opaque, and different evaluation methods may yield different results. This has led to a lack of…

    Submitted 1 July, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  7. arXiv:2405.18218  [pdf, other]

    cs.LG

    FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models

    Authors: Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi

    Abstract: Overparametrized transformer networks are the state-of-the-art architecture for Large Language Models (LLMs). However, such models contain billions of parameters making large compute a necessity, while raising environmental concerns. To address these issues, we propose FinerCut, a new form of fine-grained layer pruning, which in contrast to prior work at the transformer block level, considers all…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 22 pages

  8. arXiv:2405.02411  [pdf, other]

    cs.CL

    The Call for Socially Aware Language Technologies

    Authors: Diyi Yang, Dirk Hovy, David Jurgens, Barbara Plank

    Abstract: Language technologies have made enormous progress, especially with the introduction of large language models (LLMs). On traditional tasks such as machine translation and sentiment analysis, these models perform at near-human level. These advances can, however, exacerbate a variety of issues that models have traditionally struggled with, such as bias, evaluation, and risks. In this position paper,…

    Submitted 3 May, 2024; originally announced May 2024.

  9. arXiv:2404.13760  [pdf, other]

    cs.CL

    How to Encode Domain Information in Relation Classification

    Authors: Elisa Bassignana, Viggo Unmack Gascou, Frida Nøhr Laustsen, Gustav Kristensen, Marie Haahr Petersen, Rob van der Goot, Barbara Plank

    Abstract: Current language models require a lot of training data to obtain high performance. For Relation Classification (RC), many datasets are domain-specific, so combining datasets to obtain better performance is non-trivial. We explore a multi-domain training setup for RC, and attempt to improve performance by encoding domain information. Our proposed models improve > 2 Macro-F1 against the baseline set…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted at LREC-COLING 2024

  10. arXiv:2404.08382  [pdf, other]

    cs.CL cs.AI

    Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think

    Authors: Xinpeng Wang, Chengzhi Hu, Bolei Ma, Paul Röttger, Barbara Plank

    Abstract: Multiple choice questions (MCQs) are commonly used to evaluate the capabilities of large language models (LLMs). One common way to evaluate the model response is to rank the candidate answers based on the log probability of the first token prediction. An alternative way is to examine the text output. Prior work has shown that first token probabilities lack robustness to changes in MCQ phrasing, an…

    Submitted 12 April, 2024; originally announced April 2024.

  11. arXiv:2404.02570  [pdf, other]

    cs.CL

    MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness

    Authors: Shijia Zhou, Huangyan Shan, Barbara Plank, Robert Litschko

    Abstract: This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness (STR), on Track C: Cross-lingual. The task aims to detect semantic relatedness of two sentences in a given target language without access to direct supervision (i.e. zero-shot cross-lingual transfer). To this end, we focus on different source language selection strategies on two different pre-trained…

    Submitted 3 April, 2024; originally announced April 2024.

  12. arXiv:2404.01869  [pdf, other]

    cs.CL cs.AI

    Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey

    Authors: Philipp Mondorf, Barbara Plank

    Abstract: Large language models (LLMs) have recently shown impressive performance on tasks involving reasoning, leading to a lively debate on whether these models possess reasoning capabilities similar to humans. However, despite these successes, the depth of LLMs' reasoning abilities remains uncertain. This uncertainty partly stems from the predominant focus on task performance, measured through shallow ac…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 26 pages, 2 figures

  13. arXiv:2403.12749  [pdf, other]

    cs.CL

    Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data

    Authors: Siyao Peng, Zihang Sun, Huangyan Shan, Marie Kolm, Verena Blaschke, Ekaterina Artemova, Barbara Plank

    Abstract: Named Entity Recognition (NER) is a fundamental task to extract key information from texts, but annotated resources are scarce for dialects. This paper introduces the first dialectal NER dataset for German, BarNER, with 161K tokens annotated on Bavarian Wikipedia articles (bar-wiki) and tweets (bar-tweet), using a schema adapted from German CoNLL 2006 and GermEval. The Bavarian dialect differs fro…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024

  14. arXiv:2403.10293  [pdf, other]

    cs.CL

    MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank

    Authors: Verena Blaschke, Barbara Kovačić, Siyao Peng, Hinrich Schütze, Barbara Plank

    Abstract: Despite the success of the Universal Dependencies (UD) project exemplified by its impressive language breadth, there is still a lack in 'within-language breadth': most treebanks focus on standard languages. Even for German, the language with the most annotations in UD, so far no treebank exists for one of its language varieties spoken by over 10M people: Bavarian. To contribute to closing this gap…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024

  15. arXiv:2403.05902  [pdf, other]

    cs.CL

    MaiBaam Annotation Guidelines

    Authors: Verena Blaschke, Barbara Kovačić, Siyao Peng, Barbara Plank

    Abstract: This document provides the annotation guidelines for MaiBaam, a Bavarian corpus annotated with part-of-speech (POS) tags and syntactic dependencies. MaiBaam belongs to the Universal Dependencies (UD) project, and our annotations elaborate on the general and German UD version 2 guidelines. In this document, we detail how to preprocess and tokenize Bavarian data, provide an overview of the POS tags…

    Submitted 9 March, 2024; originally announced March 2024.

  16. arXiv:2403.01931  [pdf, other]

    cs.CL

    VariErr NLI: Separating Annotation Error from Human Label Variation

    Authors: Leon Weber-Genzel, Siyao Peng, Marie-Catherine de Marneffe, Barbara Plank

    Abstract: Human label variation arises when annotators assign different labels to the same item for valid reasons, while annotation errors occur when labels are assigned for invalid reasons. These two issues are prevalent in NLP benchmarks, yet existing research has studied them in isolation. To the best of our knowledge, there exists no prior work that focuses on teasing apart error from signal, especially…

    Submitted 6 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 14 pages, accepted at ACL 2024 main

  17. arXiv:2403.01208  [pdf, other]

    cs.HC stat.ME

    Position: Insights from Survey Methodology can Improve Training Data

    Authors: Stephanie Eckman, Barbara Plank, Frauke Kreuter

    Abstract: Whether future AI models are fair, trustworthy, and aligned with the public's interests rests in part on our ability to collect accurate data about what we want the models to do. However, collecting high-quality data is difficult, and few AI/ML researchers are trained in data collection methods. Recent research in data-centric AI has shown that higher quality training data leads to better performin…

    Submitted 7 June, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: 9 pages, 4 figures. ICML 2024 Position Paper, forthcoming

    ACM Class: E.0

  18. arXiv:2402.16102  [pdf, other]

    cs.CL

    Interpreting Predictive Probabilities: Model Confidence or Human Label Variation?

    Authors: Joris Baan, Raquel Fernández, Barbara Plank, Wilker Aziz

    Abstract: With the rise of increasingly powerful and user-facing NLP systems, there is growing interest in assessing whether they have a good representation of uncertainty by evaluating the quality of their predictive distribution over outcomes. We identify two main perspectives that drive starkly different evaluation protocols. The first treats predictive probability as an indication of model confidence; t…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: EACL 2024 main

  19. arXiv:2402.14856  [pdf, other]

    cs.CL cs.AI

    Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning

    Authors: Philipp Mondorf, Barbara Plank

    Abstract: Deductive reasoning plays a pivotal role in the formulation of sound and cohesive arguments. It allows individuals to draw conclusions that logically follow, given the truth value of the information provided. Recent progress in the domain of large language models (LLMs) has showcased their capability in executing deductive reasoning tasks. Nonetheless, a significant portion of research primarily a…

    Submitted 3 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: ACL 2024 main, 31 pages, 19 figures

  20. arXiv:2402.14499  [pdf, other]

    cs.CL

    "My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

    Authors: Xinpeng Wang, Bolei Ma, Chengzhi Hu, Leon Weber-Genzel, Paul Röttger, Frauke Kreuter, Dirk Hovy, Barbara Plank

    Abstract: The open-ended nature of language generation makes the evaluation of autoregressive large language models (LLMs) challenging. One common evaluation approach uses multiple-choice questions (MCQ) to limit the response space. The model is then evaluated by ranking the candidate answers by the log probability of the first token prediction. However, first-tokens may not consistently reflect the final r…

    Submitted 22 February, 2024; originally announced February 2024.

  21. arXiv:2402.11968  [pdf, other]

    cs.CL

    What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects

    Authors: Verena Blaschke, Christoph Purschke, Hinrich Schütze, Barbara Plank

    Abstract: Natural language processing (NLP) has largely focused on modelling standardized languages. More recently, attention has increasingly shifted to local, non-standardized languages and dialects. However, the relevant speaker populations' needs and wishes with respect to NLP tools are largely unknown. In this paper, we focus on dialects and regional languages related to German -- a group of varieties…

    Submitted 7 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: ACL 2024 main

  22. arXiv:2402.07214  [pdf, other]

    cs.CL

    Through the Lens of Split Vote: Exploring Disagreement, Difficulty and Calibration in Legal Case Outcome Classification

    Authors: Shanshan Xu, T. Y. S. S Santosh, Oana Ichim, Barbara Plank, Matthias Grabmair

    Abstract: In legal decisions, split votes (SV) occur when judges cannot reach a unanimous decision, posing a difficulty for lawyers who must navigate diverse legal arguments and opinions. In high-stakes domains, understanding the alignment of perceived difficulty between humans and AI systems is crucial to build trust. However, existing NLP calibration methods focus on a classifier's awareness of predictive…

    Submitted 6 June, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  23. arXiv:2402.05617  [pdf, other]

    cs.CL

    Deep Learning-based Computational Job Market Analysis: A Survey on Skill Extraction and Classification from Job Postings

    Authors: Elena Senger, Mike Zhang, Rob van der Goot, Barbara Plank

    Abstract: Recent years have brought significant advances to Natural Language Processing (NLP), which enabled fast progress in the field of computational job market analysis. Core tasks in this application domain are skill extraction and classification from job postings. Because of its quick growth and its interdisciplinary nature, there is no exhaustive assessment of this emerging field. This survey aims to…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Published at NLP4HR 2024 (EACL Workshop)

  24. arXiv:2402.02864  [pdf, other]

    cs.CL cs.HC

    EEVEE: An Easy Annotation Tool for Natural Language Processing

    Authors: Axel Sorensen, Siyao Peng, Barbara Plank, Rob van der Goot

    Abstract: Annotation tools are the starting point for creating Natural Language Processing (NLP) datasets. There is a wide variety of tools available; setting up these tools is however a hindrance. We propose EEVEE, an annotation tool focused on simplicity, efficiency, and ease of use. It can run directly in the browser (no setup required) and uses tab-separated files (as opposed to character offsets or tas…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 6 pages; accepted to The Linguistic Annotation Workshop (LAW) at EACL 2024

  25. arXiv:2402.02078  [pdf, other]

    cs.CL

    Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties

    Authors: Ekaterina Artemova, Verena Blaschke, Barbara Plank

    Abstract: Mainstream cross-lingual task-oriented dialogue (ToD) systems leverage the transfer learning paradigm by training a joint model for intent recognition and slot-filling in English and applying it, zero-shot, to other languages. We address a gap in prior research, which often overlooked the transfer to lower-resource colloquial varieties due to limited test data. Inspired by prior work on English va…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: To appear in EACL 2024 (main)

  26. arXiv:2402.01423  [pdf, other]

    cs.CL

    Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations

    Authors: Siyao Peng, Zihang Sun, Sebastian Loftus, Barbara Plank

    Abstract: Named Entity Recognition (NER) is a key information extraction task with a long-standing tradition. While recent studies address and aim to correct annotation errors via re-labeling efforts, little is known about the sources of human label variation, such as text ambiguity, annotation error, or guideline divergence. This is especially the case for high-quality datasets and beyond English CoNLL03.…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 9 pages; Accepted at UnImplicit workshop at EACL 2024

  27. arXiv:2401.17979  [pdf, other]

    cs.CL

    Entity Linking in the Job Market Domain

    Authors: Mike Zhang, Rob van der Goot, Barbara Plank

    Abstract: In Natural Language Processing, entity linking (EL) has centered around Wikipedia, yet remains underexplored for the job market domain. Disambiguating skill mentions can help us get insight into the current labor market demands. In this work, we are the first to explore EL in this domain, specifically targeting the linkage of occupational skills to the ESCO taxonomy (le Vrang et al., 2014). Pr…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted at EACL 2024 Findings

  28. arXiv:2401.17092  [pdf, other]

    cs.CL

    NNOSE: Nearest Neighbor Occupational Skill Extraction

    Authors: Mike Zhang, Rob van der Goot, Min-Yen Kan, Barbara Plank

    Abstract: The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text. With the advent of English benchmark job description datasets, there is a need for systems that handle their diversity well. We tackle the complexity in occupational skill datasets tasks -- combining and leveraging multiple datasets for skill extraction, to identify rare…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted at EACL 2024 Main

  29. arXiv:2311.09122  [pdf, other]

    cs.CL

    Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

    Authors: Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, LJ Miranda, Barbara Plank, Arij Riabi, Yuval Pinter

    Abstract: We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 18 datasets annotated with named entities in a cross-lingual consistent schema across 12 diverse langu…

    Submitted 29 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: NAACL 2024 Camera-ready

  30. arXiv:2310.16484  [pdf, other]

    cs.CL

    Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training

    Authors: Max Müller-Eberstein, Rob van der Goot, Barbara Plank, Ivan Titov

    Abstract: Representational spaces learned via language modeling are fundamental to Natural Language Processing (NLP); however, there has been limited understanding regarding how and when during training various types of linguistic information emerge and interact. Leveraging a novel information theoretic probing suite, which enables direct comparisons of not just task performance, but their representational s…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023 (Findings)

  31. arXiv:2310.14979  [pdf, other]

    cs.CL cs.AI cs.LG

    ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation

    Authors: Xinpeng Wang, Barbara Plank

    Abstract: Label aggregation such as majority voting is commonly used to resolve annotator disagreement in dataset creation. However, this may disregard minority values and opinions. Recent studies indicate that learning from individual annotations outperforms learning from aggregated labels, though they require a considerable amount of annotation. Active learning, as an annotation cost-saving strategy, has…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Main

  32. arXiv:2310.12020  [pdf, other]

    cs.RO cs.CL cs.CV

    LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation

    Authors: Shengqiang Zhang, Philipp Wicke, Lütfi Kerem Şenel, Luis Figueredo, Abdeldjallil Naceri, Sami Haddadin, Barbara Plank, Hinrich Schütze

    Abstract: The convergence of embodied agents and large language models (LLMs) has brought significant advancements to embodied instruction following. Particularly, the strong reasoning capabilities of LLMs make it possible for robots to perform long-horizon tasks without expensive annotated demonstrations. However, public benchmarks for testing the long-horizon reasoning capabilities of language-conditioned…

    Submitted 23 October, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: 6 pages, 4 figures. The video and code of LoHoRavens are available at https://cisnlp.github.io/lohoravens-webpage/

  33. arXiv:2310.11878  [pdf, other]

    cs.CL

    From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification

    Authors: Shanshan Xu, T. Y. S. S Santosh, Oana Ichim, Isabella Risini, Barbara Plank, Matthias Grabmair

    Abstract: In legal NLP, Case Outcome Classification (COC) must not only be accurate but also trustworthy and explainable. Existing work in explainable COC has been limited to annotations by a single expert. However, it is well-known that lawyers may disagree in their assessment of case facts. We hence collect a novel dataset RAVE: Rationale Variation in ECHR, which is obtained from two experts in the domai…

    Submitted 16 February, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023

  34. arXiv:2310.05442  [pdf, other]

    cs.CL

    Establishing Trustworthiness: Rethinking Tasks and Model Evaluation

    Authors: Robert Litschko, Max Müller-Eberstein, Rob van der Goot, Leon Weber, Barbara Plank

    Abstract: Language understanding is a multi-faceted cognitive capability, which the Natural Language Processing (NLP) community has striven to model computationally for decades. Traditionally, facets of linguistic intelligence have been compartmentalized into tasks with specialized model architectures and corresponding evaluation protocols. With the advent of large language models (LLMs) the community has w…

    Submitted 23 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023 (Main Conference), camera-ready

  35. arXiv:2309.01669  [pdf, other]

    cs.CL

    Donkii: Can Annotation Error Detection Methods Find Errors in Instruction-Tuning Datasets?

    Authors: Leon Weber-Genzel, Robert Litschko, Ekaterina Artemova, Barbara Plank

    Abstract: Instruction tuning has become an integral part of training pipelines for Large Language Models (LLMs) and has been shown to yield strong performance gains. In an orthogonal line of research, Annotation Error Detection (AED) has emerged as a tool for detecting quality problems in gold standard labels. So far, however, the application of AED methods has been limited to classification tasks. It is an…

    Submitted 22 February, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

    Comments: Camera ready version for LAW-XVIII

  36. arXiv:2307.15703  [pdf, other]

    cs.CL cs.AI cs.LG

    Uncertainty in Natural Language Generation: From Theory to Applications

    Authors: Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz

    Abstract: Recent advances of powerful Language Models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications. As such, it is crucial that NLG systems are trustworthy and reliable, for example by indicating when they are likely…

    Submitted 28 July, 2023; originally announced July 2023.

  37. arXiv:2305.20080  [pdf, other]

    cs.CL

    Findings of the VarDial Evaluation Campaign 2023

    Authors: Noëmi Aepli, Çağrı Çöltekin, Rob Van Der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri

    Abstract: This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2023. The campaign is part of the tenth workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with EACL 2023. Three separate shared tasks were included this year: Slot and intent detection for low-resource language varieties (SID4LR),…

    Submitted 31 May, 2023; originally announced May 2023.

    Journal ref: In Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023), pages 251-261, Dubrovnik, Croatia. Association for Computational Linguistics

  38. arXiv:2305.20045  [pdf, other]

    cs.CL cs.LG

    ActiveAED: A Human in the Loop Improves Annotation Error Detection

    Authors: Leon Weber, Barbara Plank

    Abstract: Manually annotated datasets are crucial for training and evaluating Natural Language Processing models. However, recent work has discovered that even widely-used benchmark datasets contain a substantial number of erroneous annotations. This problem has been addressed with Annotation Error Detection (AED) models, which can flag such errors for human re-annotation. However, even though many of these…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: Findings of ACL 2023

  39. arXiv:2305.15032  [pdf, other]

    cs.CL cs.AI cs.LG

    How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives

    Authors: Xinpeng Wang, Leonie Weissweiler, Hinrich Schütze, Barbara Plank

    Abstract: Recently, various intermediate layer distillation (ILD) objectives have been shown to improve compression of BERT models via Knowledge Distillation (KD). However, a comprehensive evaluation of the objectives in both task-specific and task-agnostic settings is lacking. To the best of our knowledge, this is the first work comprehensively evaluating distillation objectives in both settings. We show t…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  40. arXiv:2305.12092  [pdf, other]

    cs.CL

    ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain

    Authors: Mike Zhang, Rob van der Goot, Barbara Plank

    Abstract: The increasing number of benchmarks for Natural Language Processing (NLP) tasks in the computational job market domain highlights the demand for methods that can handle job-related tasks such as skill extraction, skill classification, job title classification, and de-identification. While some approaches have been developed that are specific to the job market domain, there is a lack of generalized…

    Submitted 20 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL2023 (Main)

  41. arXiv:2305.11707  [pdf, other]

    cs.CL cs.AI cs.LG

    What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability

    Authors: Mario Giulianelli, Joris Baan, Wilker Aziz, Raquel Fernández, Barbara Plank

    Abstract: In Natural Language Generation (NLG) tasks, for any input, multiple communicative goals are plausible, and any goal can be put into words, or produced, in multiple ways. We characterise the extent to which human production varies lexically, syntactically, and semantically across four NLG tasks, connecting human production variability to aleatoric or data uncertainty. We then inspect the space of o…

    Submitted 20 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Camera ready version for EMNLP 2023

  42. arXiv:2305.11016  [pdf, other]

    cs.CL

    Silver Syntax Pre-training for Cross-Domain Relation Extraction

    Authors: Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank

    Abstract: Relation Extraction (RE) remains a challenging task, especially when considering realistic out-of-domain evaluations. One of the main reasons for this is the limited training size of current RE datasets: obtaining high-quality (manually annotated) data is extremely expensive and cannot realistically be repeated for each new domain. An intermediate training step on data from related tasks has shown…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted in Findings of the Association for Computational Linguistics: ACL 2023

  43. arXiv:2305.10985  [pdf, other

    cs.CL

    Multi-CrossRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction

    Authors: Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, Barbara Plank

    Abstract: Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose Multi-CrossRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English, and covering six text domains. Multi-CrossRE is a machine translated version of CrossRE (Bassignana and Plank, 2022), with a sub-portion including more than 200…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted at NoDaLiDa 2023

  44. arXiv:2305.05295  [pdf, other

    cs.CL cs.IR

    Boosting Zero-shot Cross-lingual Retrieval by Training on Artificially Code-Switched Data

    Authors: Robert Litschko, Ekaterina Artemova, Barbara Plank

    Abstract: Transferring information retrieval (IR) models from a high-resource language (typically English) to other languages in a zero-shot fashion has become a widely adopted approach. In this work, we show that the effectiveness of zero-shot rankers diminishes when queries and documents are present in different languages. Motivated by this, we propose to train ranking models on artificially code-switched…

    Submitted 26 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  45. arXiv:2304.14803  [pdf

    cs.CL

    SemEval-2023 Task 11: Learning With Disagreements (LeWiDi)

    Authors: Elisa Leonardelli, Alexandra Uma, Gavin Abercrombie, Dina Almanea, Valerio Basile, Tommaso Fornaciari, Barbara Plank, Verena Rieser, Massimo Poesio

    Abstract: NLP datasets annotated with human judgments are rife with disagreements between the judges. This is especially true for tasks depending on subjective judgments such as sentiment analysis or offensive language detection. Particularly in these latter cases, the NLP community has come to realize that the approach of 'reconciling' these different subjective interpretations is inappropriate. Many NLP r…

    Submitted 28 April, 2023; originally announced April 2023.

  46. arXiv:2304.10158  [pdf, other

    cs.CL

    Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages

    Authors: Verena Blaschke, Hinrich Schütze, Barbara Plank

    Abstract: One of the challenges with finetuning pretrained language models (PLMs) is that their tokenizer is optimized for the language(s) it was pretrained on, but brittle when it comes to previously unseen variations in the data. This can for instance be observed when finetuning PLMs on one language and evaluating them on data in a closely related language variety with no standardized orthography. Despite…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: VarDial 2023

  47. arXiv:2304.09957  [pdf, other

    cs.CL

    Low-resource Bilingual Dialect Lexicon Induction with Large Language Models

    Authors: Ekaterina Artemova, Barbara Plank

    Abstract: Bilingual word lexicons are crucial tools for multilingual natural language understanding and machine translation tasks, as they facilitate the mapping of words in one language to their synonyms in another language. To achieve this, numerous papers have explored bilingual lexicon induction (BLI) in high-resource scenarios, using a typical pipeline consisting of two unsupervised steps: bitext minin…

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: Accepted to NoDaLiDa 2023

  48. arXiv:2304.09805  [pdf, other

    cs.CL

    A Survey of Corpora for Germanic Low-Resource Languages and Dialects

    Authors: Verena Blaschke, Hinrich Schütze, Barbara Plank

    Abstract: Despite much progress in recent years, the vast majority of work in natural language processing (NLP) is on standard languages with many speakers. In this work, we instead focus on low-resource languages and in particular non-standardized low-resource languages. Even within branches of major language families, often considered well-researched, little is known about the extent and type of available…

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: NoDaLiDa 2023

  49. arXiv:2211.14082  [pdf, ps, other

    cs.DS

    Simple Algorithms for Stochastic Score Classification with Small Approximation Ratios

    Authors: Benedikt M. Plank, Kevin Schewior

    Abstract: We revisit the Stochastic Score Classification (SSC) problem introduced by Gkenosis et al. (ESA 2018): We are given $n$ tests. Each test $j$ can be conducted at cost $c_j$, and it succeeds independently with probability $p_j$. Further, a partition of the (integer) interval $\{0,\dots,n\}$ into $B$ smaller intervals is known. The goal is to conduct tests so as to determine that interval from the pa…

    Submitted 20 January, 2024; v1 submitted 25 November, 2022; originally announced November 2022.

  50. arXiv:2211.02570  [pdf, other

    cs.CL cs.LG

    The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

    Authors: Barbara Plank

    Abstract: Human variation in labeling is often considered noise. Annotation projects for machine learning (ML) aim at minimizing human label variation, with the assumption to maximize data quality and in turn optimize and maximize machine learning metrics. However, this conventional practice assumes that there exists a ground truth, and neglects that there exists genuine human variation in labeling due to d…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022