
Showing 1–9 of 9 results for author: Aumiller, D

  1. arXiv:2407.03211

    cs.CL cs.LG

    How Does Quantization Affect Multilingual LLMs?

    Authors: Kelly Marchisio, Saurabh Dash, Hongyu Chen, Dennis Aumiller, Ahmet Üstün, Sara Hooker, Sebastian Ruder

    Abstract: Quantization techniques are widely used to improve inference speed and deployment of large language models. While a wide body of work examines the impact of quantized LLMs on English tasks, none have examined the effect of quantization across languages. We conduct a thorough analysis of quantized multilingual LLMs, focusing on their performance across languages and at varying scales. We use automa…

    Submitted 3 July, 2024; originally announced July 2024.

  2. arXiv:2310.15773

    cs.CL

    BLESS: Benchmarking Large Language Models on Sentence Simplification

    Authors: Tannon Kew, Alison Chi, Laura Vásquez-Rodríguez, Sweta Agrawal, Dennis Aumiller, Fernando Alva-Manchego, Matthew Shardlow

    Abstract: We present BLESS, a comprehensive performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS). We examine how well off-the-shelf LLMs can solve this challenging task, assessing a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news,…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted to EMNLP 2023 as a main long paper. 9 pages, 7 figures

  3. arXiv:2305.13309

    cs.CL

    Evaluating Factual Consistency of Texts with Semantic Role Labeling

    Authors: Jing Fan, Dennis Aumiller, Michael Gertz

    Abstract: Automated evaluation of text generation systems has recently seen increasing attention, particularly checking whether generated text stays truthful to input sources. Existing methods frequently rely on an evaluation using task-specific language models, which in turn allows for little interpretability of generated scores. We introduce SRLScore, a reference-free evaluation metric designed with text…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at *SEM 2023

  4. arXiv:2301.07095

    cs.CL

    On the State of German (Abstractive) Text Summarization

    Authors: Dennis Aumiller, Jing Fan, Michael Gertz

    Abstract: With recent advancements in the area of Natural Language Processing, the focus is slowly shifting from a purely English-centric view towards more language-specific solutions, including German. Especially practical for businesses to analyze their growing amount of textual data are text summarization systems, which transform long input documents into compressed and more digestible summary texts. In…

    Submitted 17 January, 2023; originally announced January 2023.

    Comments: Accepted at the 20th Conference on Database Systems for Business, Technology and Web (BTW'23)

  5. arXiv:2301.01764

    cs.CL

    UniHD at TSAR-2022 Shared Task: Is Compute All We Need for Lexical Simplification?

    Authors: Dennis Aumiller, Michael Gertz

    Abstract: Previous state-of-the-art models for lexical simplification consist of complex pipelines with several components, each of which requires deep technical knowledge and fine-tuned interaction to achieve its full potential. As an alternative, we describe a frustratingly simple pipeline based on prompted GPT-3 responses, beating competing approaches by a wide margin in settings with few training instan…

    Submitted 5 January, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

    Comments: Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022) at EMNLP 2022

  6. arXiv:2210.13448

    cs.CL

    EUR-Lex-Sum: A Multi- and Cross-lingual Dataset for Long-form Summarization in the Legal Domain

    Authors: Dennis Aumiller, Ashish Chouhan, Michael Gertz

    Abstract: Existing summarization datasets come with two main drawbacks: (1) They tend to focus on overly exposed domains, such as news articles or wiki-like texts, and (2) are primarily monolingual, with few multilingual datasets. In this work, we propose a novel dataset, called EUR-Lex-Sum, based on manually curated document summaries of legal acts from the European Union law platform (EUR-Lex). Documents…

    Submitted 24 October, 2022; originally announced October 2022.

  7. arXiv:2201.07198

    cs.CL

    Klexikon: A German Dataset for Joint Summarization and Simplification

    Authors: Dennis Aumiller, Michael Gertz

    Abstract: Traditionally, Text Simplification is treated as a monolingual translation task where sentences between source texts and their simplified counterparts are aligned for training. However, especially for longer input documents, summarizing the text (or dropping less relevant content altogether) plays an important role in the simplification process, which is currently not reflected in existing dataset…

    Submitted 28 July, 2022; v1 submitted 18 January, 2022; originally announced January 2022.

    Comments: Code and data are available on Github: https://github.com/dennlinger/klexikon

  8. arXiv:2109.14927   

    cs.CL

    BERT got a Date: Introducing Transformers to Temporal Tagging

    Authors: Satya Almasian, Dennis Aumiller, Michael Gertz

    Abstract: Temporal expressions in text play a significant role in language understanding and correctly identifying them is fundamental to various retrieval and natural language processing systems. Previous works have slowly shifted from rule-based to neural architectures, capable of tagging expressions with higher accuracy. However, neural models can not yet distinguish between different expression types at…

    Submitted 24 January, 2022; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: unreliable evaluation results for Seq2seq models

  9. Structural Text Segmentation of Legal Documents

    Authors: Dennis Aumiller, Satya Almasian, Sebastian Lackner, Michael Gertz

    Abstract: The growing complexity of legal cases has led to an increasing interest in legal information retrieval systems that can effectively satisfy user-specific information needs. However, such downstream systems typically require documents to be properly formatted and segmented, which is often done with relatively simple pre-processing steps, disregarding topical coherence of segments. Systems generall…

    Submitted 17 May, 2021; v1 submitted 7 December, 2020; originally announced December 2020.