
Showing 1–50 of 201 results for author: Gurevych, I

Searching in archive cs.
  1. arXiv:2407.01091

    cs.CL

    M2QA: Multi-domain Multilingual Question Answering

    Authors: Leon Engländer, Hannah Sterz, Clifton Poth, Jonas Pfeiffer, Ilia Kuznetsov, Iryna Gurevych

    Abstract: Generalization and robustness to input variation are core desiderata of machine learning research. Language varies along several axes, most importantly, language instance (e.g. French) and domain (e.g. news). While adapting NLP models to new languages within a single domain, or to new domains within a single language, is widely studied, research in joint adaptation is hampered by the lack of evalu…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.07080

    cs.CL

    DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs

    Authors: Haishuo Fang, Xiaodan Zhu, Iryna Gurevych

    Abstract: Answering Questions over Knowledge Graphs (KGQA) is key to well-functioning autonomous language agents in various real-life applications. To improve the neural-symbolic reasoning capabilities of language agents powered by Large Language Models (LLMs) in KGQA, we propose the Decomposition-Alignment-Reasoning Agent (DARA) framework. DARA effectively parses questions into formal queries through a dual…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL2024 findings

  3. arXiv:2406.04566

    cs.CL cs.AI cs.LG

    SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models

    Authors: Md Imbesat Hassan Rizvi, Xiaodan Zhu, Iryna Gurevych

    Abstract: Spatial reasoning is a crucial component of both biological and artificial intelligence. In this work, we present a comprehensive study of the capability of current state-of-the-art large language models (LLMs) on spatial reasoning. To support our study, we create and contribute a novel Spatial Reasoning Characterization (SpaRC) framework and Spatial Reasoning Paths (SpaRP) datasets, to enable an…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 (Main)

  4. arXiv:2406.03930

    cs.CL

    Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art

    Authors: Chen Cecilia Liu, Iryna Gurevych, Anna Korhonen

    Abstract: The surge of interest in culturally aware and adapted Natural Language Processing (NLP) has inspired much recent research. However, the lack of common understanding of the concept of "culture" has made it difficult to evaluate progress in this emerging area. Drawing on prior research in NLP and related fields, we propose an extensive taxonomy of elements of culture that can provide a systematic fr…

    Submitted 6 June, 2024; originally announced June 2024.

  5. arXiv:2406.03181

    cs.CL

    Missci: Reconstructing Fallacies in Misrepresented Science

    Authors: Max Glockner, Yufang Hou, Preslav Nakov, Iryna Gurevych

    Abstract: Health-related misinformation on social networks can lead to poor decision-making and real-world dangers. Such misinformation often misrepresents scientific publications and cites them as "proof" to gain perceived credibility. To effectively counter such claims automatically, a system must explain how the claim was falsely derived from the cited publication. Current methods for automated fact-chec…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ACL 2024 (main)

  6. arXiv:2406.00197

    cs.CL

    Re3: A Holistic Framework and Dataset for Modeling Collaborative Document Revision

    Authors: Qian Ruan, Ilia Kuznetsov, Iryna Gurevych

    Abstract: Collaborative review and revision of textual documents is the core of knowledge work and a promising target for empirical analysis and NLP assistance. Yet, a holistic framework that would allow modeling complex relationships between document revisions, reviews and author responses is lacking. To address this gap, we introduce Re3, a framework for joint analysis of collaborative document revision.…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: accepted to ACL2024 main

  7. arXiv:2405.06563

    cs.CL

    What Can Natural Language Processing Do for Peer Review?

    Authors: Ilia Kuznetsov, Osama Mohammed Afzal, Koen Dercksen, Nils Dycke, Alexander Goldberg, Tom Hope, Dirk Hovy, Jonathan K. Kummerfeld, Anne Lauscher, Kevin Leyton-Brown, Sheng Lu, Mausam, Margot Mieskes, Aurélie Névéol, Danish Pruthi, Lizhen Qu, Roy Schwartz, Noah A. Smith, Thamar Solorio, Jingyan Wang, Xiaodan Zhu, Anna Rogers, Nihar B. Shah, Iryna Gurevych

    Abstract: The number of scientific articles produced every year is growing rapidly. Providing quality control over them is crucial for scientists and, ultimately, for the public good. In modern science, this process is largely delegated to peer review -- a distributed procedure in which each submission is evaluated by several independent experts in the field. Peer review is widely used, yet it is hard, time…

    Submitted 10 May, 2024; originally announced May 2024.

  8. arXiv:2404.18923

    cs.CL

    Holmes: Benchmark the Linguistic Competence of Language Models

    Authors: Andreas Waldis, Yotam Perlitz, Leshem Choshen, Yufang Hou, Iryna Gurevych

    Abstract: We introduce Holmes, a benchmark to assess the linguistic competence of language models (LMs) - their ability to grasp linguistic phenomena. Unlike prior prompting-based evaluations, Holmes assesses the linguistic competence of LMs via their internal representations using classifier-based probing. In doing so, we disentangle specific phenomena (e.g., part-of-speech of words) from other cognitive a…

    Submitted 22 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  9. arXiv:2404.14183

    cs.CL

    SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

    Authors: Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Chenxi Whitehouse, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

    Abstract: We present the results and the main findings of SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection. The task featured three subtasks. Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine. This subtask has two tracks: a monolingual track focused solely on English texts and a multilingual…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 23 pages, 12 tables

    Journal ref: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

  10. arXiv:2404.12897

    cs.CL

    Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning

    Authors: Ahmed Elshabrawy, Yongxin Huang, Iryna Gurevych, Alham Fikri Aji

    Abstract: While Large Language Models (LLMs) exhibit remarkable capabilities in zero-shot and few-shot scenarios, they often require computationally prohibitive sizes. Conversely, smaller Masked Language Models (MLMs) like BERT and RoBERTa achieve state-of-the-art results through fine-tuning but struggle with extending to few-shot and zero-shot settings due to their architectural constraints. Hence, we prop…

    Submitted 22 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  11. arXiv:2404.08821

    cs.CL

    Constrained C-Test Generation via Mixed-Integer Programming

    Authors: Ji-Ung Lee, Marc E. Pfetsch, Iryna Gurevych

    Abstract: This work proposes a novel method to generate C-Tests, a variant of cloze tests (a gap-filling exercise) in which only the last part of a word is turned into a gap. In contrast to previous work that only considers varying the gap size or gap placement to achieve locally optimal solutions, we propose a mixed-integer programming (MIP) approach. This allows us to consider gap size and placement si…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Github: https://github.com/UKPLab/arxiv2024-constrained-ctest-generation

  12. arXiv:2403.15210

    cs.LG

    Early Period of Training Impacts Out-of-Distribution Generalization

    Authors: Chen Cecilia Liu, Iryna Gurevych

    Abstract: Prior research has found that differences in the early period of neural network training significantly impact the performance of in-distribution (ID) tasks. However, neural networks are often sensitive to out-of-distribution (OOD) data, making them less reliable in downstream applications. Yet, the impact of the early training period on OOD generalization remains understudied due to its complexity…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: WIP

  13. arXiv:2403.03894

    cs.AI cs.CL cs.PL

    IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators

    Authors: Indraneil Paul, Goran Glavaš, Iryna Gurevych

    Abstract: Code understanding and generation have fast become some of the most popular applications of language models (LMs). Nonetheless, research on multilingual aspects of Code-LMs (i.e., LMs for code generation) such as cross-lingual transfer between different programming languages, language-specific data augmentation, and post-hoc LM adaptation, alongside exploitation of data sources other than the orig…

    Submitted 15 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  14. arXiv:2403.03627

    cs.CL cs.AI

    Multimodal Large Language Models to Support Real-World Fact-Checking

    Authors: Jiahui Geng, Yova Kementchedjhieva, Preslav Nakov, Iryna Gurevych

    Abstract: Multimodal large language models (MLLMs) carry the potential to support humans in processing vast amounts of information. While MLLMs are already being used as a fact-checking tool, their abilities and limitations in this regard are understudied. Here, we aim to bridge this gap. In particular, we propose a framework for systematically assessing the capacity of current multimodal models to facilitat…

    Submitted 26 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  15. arXiv:2403.03029

    cs.CL

    Socratic Reasoning Improves Positive Text Rewriting

    Authors: Anmol Goel, Nico Daheim, Iryna Gurevych

    Abstract: Reframing a negative into a positive thought is at the crux of several cognitive approaches to mental health and psychotherapy that could be made more accessible by large language model-based solutions. Such reframing is typically non-trivial and requires multiple rationalization steps to uncover the underlying issue of a negative thought and transform it to be more positive. However, this rationa…

    Submitted 5 March, 2024; originally announced March 2024.

  16. arXiv:2402.17641

    cs.LG cs.AI cs.CL math.OC stat.ML

    Variational Learning is Effective for Large Deep Networks

    Authors: Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas Möllenhoff

    Abstract: We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertaint…

    Submitted 6 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Published at International Conference on Machine Learning (ICML), 2024. The first two authors contributed equally. Code is available here: https://github.com/team-approx-bayes/ivon

  17. arXiv:2402.12332

    cs.CL

    Triple-Encoders: Representations That Fire Together, Wire Together

    Authors: Justus-Jonas Erker, Florian Mai, Nils Reimers, Gerasimos Spanakis, Iryna Gurevych

    Abstract: Search-based dialog models typically re-encode the dialog history at every turn, incurring high cost. Curved Contrastive Learning, a representation learning method that encodes relative distances between utterances into the embedding space via a bi-encoder, has recently shown promising results for dialog modeling at far superior efficiency. While high efficiency is achieved through independently e…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Under review at ACL Rolling Review

  18. arXiv:2402.11175

    cs.CL

    M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection

    Authors: Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

    Abstract: The advent of Large Language Models (LLMs) has brought an unprecedented surge in machine-generated text (MGT) across diverse channels. This raises legitimate concerns about its potential misuse and societal implications. The need to identify and differentiate such content from genuine human-generated text is critical in combating disinformation, preserving the integrity of education and scientific…

    Submitted 27 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 29 pages

    Journal ref: ACL 2024 main

  19. arXiv:2402.02113

    cs.CL

    Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon

    Authors: Fajri Koto, Tilman Beck, Zeerak Talat, Iryna Gurevych, Timothy Baldwin

    Abstract: Improving multilingual language models' capabilities in low-resource languages is generally difficult due to the scarcity of large-scale data in those languages. In this paper, we relax the reliance on texts in low-resource languages by using multilingual lexicons in pretraining to enhance multilingual capabilities. Specifically, we focus on zero-shot sentiment analysis tasks across 34 languages, i…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: Accepted at EACL 2024

  20. arXiv:2402.01375

    cs.CL

    Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization

    Authors: Andreas Waldis, Yufang Hou, Iryna Gurevych

    Abstract: Pre-trained language models (LMs) perform well in In-Topic setups, where training and testing data come from the same topics. However, they face challenges in Cross-Topic scenarios where testing data is derived from distinct topics -- such as Gun Control. This study analyzes various LMs with three probing-based experiments to shed light on the reasons behind the In- vs. Cross-Topic generalization…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: EACL 2024

  21. arXiv:2401.17658

    cs.CL

    Document Structure in Long Document Transformers

    Authors: Jan Buchmann, Max Eichler, Jan-Micha Bodensohn, Ilia Kuznetsov, Iryna Gurevych

    Abstract: Long documents often exhibit structure with hierarchically organized elements of different functions, such as section headers and paragraphs. Despite the omnipresence of document structure, its role in natural language processing (NLP) remains opaque. Do long-document Transformer models acquire an internal representation of document structure during pre-training? How can structural information be…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted at EACL 2024. Code and data: http://github.com/UKPLab/eacl2024-doc-structure

  22. arXiv:2401.10065

    cs.CL

    Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs

    Authors: Haritz Puerto, Martin Tutek, Somak Aditya, Xiaodan Zhu, Iryna Gurevych

    Abstract: Reasoning is a fundamental component of language understanding. Recent prompting techniques, such as chain of thought, have consistently improved LLMs' performance on various reasoning tasks. Nevertheless, there is still little understanding of what triggers reasoning abilities in LLMs in the inference stage. In this paper, we introduce code prompting, a chain of prompts that transforms a natural…

    Submitted 25 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: Code, prompt templates, prompts, and outputs are publicly available at https://github.com/UKPLab/arxiv2024-conditional-reasoning-llms

  23. arXiv:2401.09248

    cs.CL cs.HC

    Learning from Emotions, Demographic Information and Implicit User Feedback in Task-Oriented Document-Grounded Dialogues

    Authors: Dominic Petrak, Thy Thy Tran, Iryna Gurevych

    Abstract: The success of task-oriented and document-grounded dialogue systems depends on users accepting and enjoying using them. To achieve this, recently published work in the field of Human-Computer Interaction suggests that the combination of considering demographic information, user emotions and learning from the implicit feedback in their utterances, is particularly important. However, these findings…

    Submitted 17 January, 2024; originally announced January 2024.

  24. arXiv:2311.11077

    cs.CL cs.AI cs.LG

    Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning

    Authors: Clifton Poth, Hannah Sterz, Indraneil Paul, Sukannya Purkayastha, Leon Engländer, Timo Imhof, Ivan Vulić, Sebastian Ruder, Iryna Gurevych, Jonas Pfeiffer

    Abstract: We introduce Adapters, an open-source library that unifies parameter-efficient and modular transfer learning in large language models. By integrating 10 diverse adapter methods into a unified interface, Adapters offers ease of use and flexible configuration. Our library allows researchers and practitioners to leverage adapter modularity through composition blocks, enabling the design of complex ad…

    Submitted 18 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023: Systems Demonstrations

  25. arXiv:2311.09000

    cs.CL

    Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers

    Authors: Yuxia Wang, Revanth Gangi Reddy, Zain Muhammad Mujahid, Arnav Arora, Aleksandr Rubashevskii, Jiahui Geng, Osama Mohammed Afzal, Liangming Pan, Nadav Borenstein, Aditya Pillai, Isabelle Augenstein, Iryna Gurevych, Preslav Nakov

    Abstract: The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. In this work, we present a holistic end-to-end solution for annotating the factuality of LLM-generated responses, which encompasses a multi-stage annotation scheme designed to yield detailed labels concerning the verifiability and factu…

    Submitted 16 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: 30 pages, 13 figures

  26. arXiv:2311.08298

    cs.CL cs.AI

    A Survey of Confidence Estimation and Calibration in Large Language Models

    Authors: Jiahui Geng, Fengyu Cai, Yuxia Wang, Heinz Koeppl, Preslav Nakov, Iryna Gurevych

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks in various domains. Despite their impressive performance, they can be unreliable due to factual errors in their generations. Assessing their confidence and calibrating them across different tasks can help mitigate risks and enable LLMs to produce better generations. There has been a lot of recent re…

    Submitted 25 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 16 pages, 1 page, 1 table

  27. arXiv:2311.07230

    cs.CL

    How are Prompts Different in Terms of Sensitivity?

    Authors: Sheng Lu, Hendrik Schuff, Iryna Gurevych

    Abstract: In-context learning (ICL) has become one of the most popular learning paradigms. While there is a growing body of literature focusing on prompt engineering, there is a lack of systematic analysis comparing the effects of prompts across different models and tasks. To address this gap, we present a comprehensive prompt analysis based on the sensitivity of a function. Our analysis reveals that sensit…

    Submitted 20 June, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: NAACL 2024 Main

  28. arXiv:2311.06649

    cs.CL

    A Template Is All You Meme

    Authors: Luke Bates, Peter Ebert Christensen, Preslav Nakov, Iryna Gurevych

    Abstract: Memes are a modern form of communication and meme templates possess a base semantics that is customizable by whoever posts it on social media. Machine learning systems struggle with memes, which is likely due to such systems having insufficient context to understand memes, as there is more to memes than the obvious image and text. Here, to aid understanding of memes, we release a knowledge base o…

    Submitted 11 November, 2023; originally announced November 2023.

    Comments: 9 pages, 11 supplemental pages, 6 Tables, 10 Figures

  29. arXiv:2311.03998

    cs.CL

    Exploring Jiu-Jitsu Argumentation for Writing Peer Review Rebuttals

    Authors: Sukannya Purkayastha, Anne Lauscher, Iryna Gurevych

    Abstract: In many domains of argumentation, people's arguments are driven by so-called attitude roots, i.e., underlying beliefs and world views, and their corresponding attitude themes. Given the strength of these latent drivers of arguments, recent work in psychology suggests that instead of directly countering surface-level reasoning (e.g., falsifying given premises), one should follow an argumentation st…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP Main Conference 2023

  30. arXiv:2311.00408

    cs.CL

    AdaSent: Efficient Domain-Adapted Sentence Embeddings for Few-Shot Classification

    Authors: Yongxin Huang, Kexin Wang, Sourav Dutta, Raj Nath Patel, Goran Glavaš, Iryna Gurevych

    Abstract: Recent work has found that few-shot sentence classification based on pre-trained Sentence Encoders (SEs) is efficient, robust, and effective. In this work, we investigate strategies for domain-specialization in the context of few-shot sentence classification with SEs. We first establish that unsupervised Domain-Adaptive Pre-Training (DAPT) of a base Pre-trained Language Model (PLM) (i.e., not an S…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP 2023 Main

  31. arXiv:2310.15758

    cs.CL

    Learning From Free-Text Human Feedback -- Collect New Datasets Or Extend Existing Ones?

    Authors: Dominic Petrak, Nafise Sadat Moosavi, Ye Tian, Nikolai Rozanov, Iryna Gurevych

    Abstract: Learning from free-text human feedback is essential for dialog systems, but annotated data is scarce and usually covers only a small fraction of error types known in conversational AI. Instead of collecting and annotating new datasets from scratch, recent advances in synthetic dialog generation could be used to augment existing dialog datasets with the necessary annotations. However, to assess the…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted to be presented at EMNLP 2023

  32. arXiv:2310.12808

    cs.LG cs.AI cs.CL

    Model Merging by Uncertainty-Based Gradient Matching

    Authors: Nico Daheim, Thomas Möllenhoff, Edoardo Maria Ponti, Iryna Gurevych, Mohammad Emtiyaz Khan

    Abstract: Models trained on different datasets can be merged by a weighted-averaging of their parameters, but why does it work and when can it fail? Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. The connection also reveals implicit assumptions in other schemes such as averag…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Preprint. Under review

  33. arXiv:2310.12300

    cs.CL

    Measuring Pointwise $\mathcal{V}$-Usable Information In-Context-ly

    Authors: Sheng Lu, Shan Chen, Yingya Li, Danielle Bitterman, Guergana Savova, Iryna Gurevych

    Abstract: In-context learning (ICL) is a new learning paradigm that has gained popularity along with the development of large language models. In this work, we adapt a recently proposed hardness metric, pointwise $\mathcal{V}$-usable information (PVI), to an in-context version (in-context PVI). Compared to the original PVI, in-context PVI is more efficient in that it requires only a few exemplars and does n…

    Submitted 8 December, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Findings

  34. arXiv:2309.08591

    cs.CL

    Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings

    Authors: Chen Cecilia Liu, Fajri Koto, Timothy Baldwin, Iryna Gurevych

    Abstract: Large language models (LLMs) are highly adept at question answering and reasoning tasks, but when reasoning in a situational context, human expectations vary depending on the relevant cultural common ground. As languages are associated with diverse cultures, LLMs should also be culturally-diverse reasoners. In this paper, we study the ability of a wide range of state-of-the-art multilingual LLMs (…

    Submitted 30 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: NAACL

  35. arXiv:2309.08316

    cs.CL

    How to Handle Different Types of Out-of-Distribution Scenarios in Computational Argumentation? A Comprehensive and Fine-Grained Field Study

    Authors: Andreas Waldis, Yufang Hou, Iryna Gurevych

    Abstract: The advent of pre-trained Language Models (LMs) has markedly advanced natural language processing, but their efficacy in out-of-distribution (OOD) scenarios remains a significant challenge. Computational argumentation (CA), modeling human argumentation processes, is a field notably impacted by these challenges because complex annotation schemes and high annotation costs naturally lead to resources…

    Submitted 27 June, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

  36. arXiv:2309.07822

    cs.CL

    CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration

    Authors: Rachneet Sachdeva, Martin Tutek, Iryna Gurevych

    Abstract: In recent years, large language models (LLMs) have shown remarkable capabilities at scale, particularly at generating text conditioned on a prompt. In our work, we investigate the use of LLMs to augment training data of small language models (SLMs) with automatically generated counterfactual (CF) instances -- i.e. minimally altered inputs -- in order to improve out-of-domain (OOD) performance of S…

    Submitted 13 February, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted to EACL 2024 main conference

  37. arXiv:2309.07034

    cs.CL cs.AI

    Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting

    Authors: Tilman Beck, Hendrik Schuff, Anne Lauscher, Iryna Gurevych

    Abstract: Annotators' sociodemographic backgrounds (i.e., the individual compositions of their gender, age, educational background, etc.) have a strong impact on their decisions when working on subjective NLP tasks, such as toxic language detection. Often, heterogeneous backgrounds result in high disagreements. To model this variation, recent work has explored sociodemographic prompting, a technique, which…

    Submitted 8 February, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: EACL 2024 camera-ready

  38. arXiv:2309.01809

    cs.CL

    Are Emergent Abilities in Large Language Models just In-Context Learning?

    Authors: Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Iryna Gurevych

    Abstract: Large language models have exhibited emergent abilities, demonstrating exceptional performance across diverse tasks for which they were not explicitly trained, including those that require complex reasoning abilities. The emergence of such abilities carries profound implications for the future direction of research in NLP, especially as the deployment of such models becomes more prevalent. However…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: Code available at https://github.com/UKPLab/on-emergence and data available at https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3931

  39. arXiv:2307.10488

    cs.IR cs.CL cs.LG

    SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

    Authors: Nandan Thakur, Kexin Wang, Iryna Gurevych, Jimmy Lin

    Abstract: Traditionally, sparse retrieval systems that rely on lexical representations to retrieve documents, such as BM25, dominated information retrieval tasks. With the onset of pre-trained transformer models such as BERT, neural sparse retrieval has led to a new paradigm within retrieval. Despite the success, there has been limited software supporting different sparse retrievers running in a unified, commo…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: Accepted at SIGIR 2023 (Resource Track)

  40. arXiv:2307.08153

    cs.CL

    Analyzing Dataset Annotation Quality Management in the Wild

    Authors: Jan-Christoph Klie, Richard Eckart de Castilho, Iryna Gurevych

    Abstract: Data quality is crucial for training accurate, unbiased, and trustworthy machine learning models as well as for their correct evaluation. Recent works, however, have shown that even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, biases, or artifacts. While practices and guidelines regarding dataset creation projects exi…

    Submitted 9 March, 2024; v1 submitted 16 July, 2023; originally announced July 2023.

  41. arXiv:2306.16900

    cs.CL

    Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research

    Authors: Ji-Ung Lee, Haritz Puerto, Betty van Aken, Yuki Arase, Jessica Zosa Forde, Leon Derczynski, Andreas Rücklé, Iryna Gurevych, Roy Schwartz, Emma Strubell, Jesse Dodge

    Abstract: Many recent improvements in NLP stem from the development and use of large pre-trained language models (PLMs) with billions of parameters. Large model sizes make computational cost one of the main limiting factors for training and evaluating such models; and has raised severe concerns about the sustainability, reproducibility, and inclusiveness for researching PLMs. These concerns are often based…

    Submitted 9 November, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

  42. arXiv:2305.19748  [pdf, other]

    cs.CL

    UKP-SQuARE: An Interactive Tool for Teaching Question Answering

    Authors: Haishuo Fang, Haritz Puerto, Iryna Gurevych

    Abstract: The exponential growth of question answering (QA) has made it an indispensable topic in any Natural Language Processing (NLP) course. Additionally, the breadth of QA derived from this exponential growth makes it an ideal scenario for teaching related NLP topics such as information retrieval, explainability, and adversarial attacks among others. In this paper, we introduce UKP-SQuARE as a platform…

    Submitted 2 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: Accepted by BEA workshop, ACL2023

  43. arXiv:2305.15025  [pdf, other]

    cs.CL

    Dior-CVAE: Pre-trained Language Models and Diffusion Priors for Variational Dialog Generation

    Authors: Tianyu Yang, Thy Thy Tran, Iryna Gurevych

    Abstract: Current variational dialog models have employed pre-trained language models (PLMs) to parameterize the likelihood and posterior distributions. However, the Gaussian assumption made on the prior distribution is incompatible with these distributions, thus restricting the diversity of generated responses. These models also suffer from posterior collapse, i.e., the decoder tends to ignore latent varia…

    Submitted 23 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  44. arXiv:2305.14902  [pdf, other]

    cs.CL

    M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection

    Authors: Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Chenxi Whitehouse, Osama Mohammed Afzal, Tarek Mahmoud, Toru Sasaki, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

    Abstract: Large language models (LLMs) have demonstrated remarkable capability to generate fluent responses to a wide variety of user queries. However, this has also raised concerns about the potential misuse of such texts in journalism, education, and academia. In this study, we strive to create automated systems that can detect machine-generated texts and pinpoint potential misuse. We first introduce a la…

    Submitted 9 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 41 pages

  45. arXiv:2305.14536  [pdf, other]

    cs.CL

    MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems

    Authors: Jakub Macina, Nico Daheim, Sankalan Pal Chowdhury, Tanmay Sinha, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan

    Abstract: While automatic dialogue tutors hold great potential in making education personalized and more accessible, research on such systems has been hampered by a lack of sufficiently large and high-quality datasets. Collecting such datasets remains challenging, as recording tutoring sessions raises privacy concerns and crowdsourcing leads to insufficient data quality. To address this, we propose a framew…

    Submitted 23 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Jakub Macina, Nico Daheim, and Sankalan Pal Chowdhury contributed equally to this work. Accepted at EMNLP2023 Findings. Code and dataset available: https://github.com/eth-nlped/mathdial

  46. arXiv:2305.13915  [pdf, other]

    cs.IR cs.CL

    DAPR: A Benchmark on Document-Aware Passage Retrieval

    Authors: Kexin Wang, Nils Reimers, Iryna Gurevych

    Abstract: The work of neural retrieval so far focuses on ranking short texts and is challenged with long documents. There are many cases where the users want to find a relevant passage within a long document from a huge corpus, e.g. Wikipedia articles, research papers, etc. We propose and name this task Document-Aware Passage Retrieval (DAPR). While analyzing the errors of the State-of-The-Art (SoTA)…

    Submitted 9 June, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2024 Main Conference

  47. arXiv:2305.12920  [pdf, other]

    cs.CL

    A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why?

    Authors: Aniket Pramanick, Yufang Hou, Saif M. Mohammad, Iryna Gurevych

    Abstract: Understanding the fundamental concepts and trends in a scientific field is crucial for keeping abreast of its continuous advancement. In this study, we propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques. We define three variables to encompass diverse facets of the evolution of research topics within NL…

    Submitted 25 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: accepted at EMNLP 2023

  48. arXiv:2305.07716  [pdf, other]

    cs.RO cs.AI

    Learning to Reason over Scene Graphs: A Case Study of Finetuning GPT-2 into a Robot Language Model for Grounded Task Planning

    Authors: Georgia Chalvatzaki, Ali Younes, Daljeet Nandha, An Le, Leonardo F. R. Ribeiro, Iryna Gurevych

    Abstract: Long-horizon task planning is essential for the development of intelligent assistive and service robots. In this work, we investigate the applicability of a smaller class of large language models (LLMs), specifically GPT-2, in robotic task planning by learning to decompose tasks into subgoal specifications for a planner to execute sequentially. Our method grounds the input of the LLM on the domain…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: 21 pages, 6 figures

  49. arXiv:2304.12836  [pdf, other]

    cs.CL

    Lessons Learned from a Citizen Science Project for Natural Language Processing

    Authors: Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe, Gözde Gül Şahin, Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart de Castilho, Iryna Gurevych

    Abstract: Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often outsourced to paid crowdworkers. Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP. To investigate whether and how…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted to EACL 2023. Code will be published on github: https://github.com/UKPLab/eacl2023-citizen-science-lessons-learned

  50. arXiv:2304.08865  [pdf, other]

    cs.CL cs.LG

    Romanization-based Large-scale Adaptation of Multilingual Language Models

    Authors: Sukannya Purkayastha, Sebastian Ruder, Jonas Pfeiffer, Iryna Gurevych, Ivan Vulić

    Abstract: Large multilingual pretrained language models (mPLMs) have become the de facto state of the art for cross-lingual transfer in NLP. However, their large-scale deployment to many languages, besides pretraining data scarcity, is also hindered by the increase in vocabulary size and limitations in their parameter budget. In order to boost the capacity of mPLMs to deal with low-resource and unseen langu…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: 9 pages, 5 figures