Showing 1–12 of 12 results for author: Aßenmacher, M

  1. arXiv:2311.01544  [pdf, other]

    cs.CL cs.LG

    Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization

    Authors: Björn Deiseroth, Max Meuer, Nikolas Gritsch, Constantin Eichenberg, Patrick Schramowski, Matthias Aßenmacher, Kristian Kersting

    Abstract: Large Language Models (LLMs) have reshaped natural language processing with their impressive capabilities. However, their ever-increasing size has raised concerns about their effective deployment and the need for LLM compression. This study introduces the Divergent Token Metrics (DTMs), a novel approach to assessing compressed LLMs, addressing the limitations of traditional perplexity or accuracy…

    Submitted 3 April, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

  2. arXiv:2310.03031  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    How Prevalent is Gender Bias in ChatGPT? -- Exploring German and English ChatGPT Responses

    Authors: Stefanie Urchs, Veronika Thurner, Matthias Aßenmacher, Christian Heumann, Stephanie Thiemichen

    Abstract: With the introduction of ChatGPT, OpenAI made large language models (LLM) accessible to users with limited IT expertise. However, users with no background in natural language processing (NLP) might lack a proper understanding of LLMs and thus an awareness of their inherent limitations, and will therefore take the systems' output at face value. In this paper, we systematically analyse prompts and the…

    Submitted 13 May, 2024; v1 submitted 21 September, 2023; originally announced October 2023.

    Comments: Accepted @ "1st Workshop on Biased Data in Conversational Agents" (co-located with ECML PKDD 2023). This is the author's version of the work. The definitive version of record will be published in the proceedings

  3. arXiv:2308.09368  [pdf, other]

    cs.CV cs.CL cs.CY cs.LG stat.ML

    A tailored Handwritten-Text-Recognition System for Medieval Latin

    Authors: Philipp Koch, Gilary Vera Nuñez, Esteban Garces Arias, Christian Heumann, Matthias Schöffel, Alexander Häberlin, Matthias Aßenmacher

    Abstract: The Bavarian Academy of Sciences and Humanities aims to digitize its Medieval Latin Dictionary. This dictionary entails record cards referring to lemmas in medieval Latin, a low-resource language. A crucial step of the digitization process is the Handwritten Text Recognition (HTR) of the handwritten lemmas found on these record cards. In our work, we introduce an end-to-end pipeline, tailored to t…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: This paper has been accepted at the First Workshop on Ancient Language Processing, co-located with RANLP 2023. This is the author's version of the work. The definitive version of record will be published in the proceedings

  4. arXiv:2307.16511  [pdf, other]

    cs.CL cs.CY cs.LG stat.ML

    Classifying multilingual party manifestos: Domain transfer across country, time, and genre

    Authors: Matthias Aßenmacher, Nadja Sauter, Christian Heumann

    Abstract: Annotation costs for large corpora are still one of the main bottlenecks in empirical social science research. On the one hand, making use of the capabilities of domain transfer allows re-using annotated data sets and trained models. On the other hand, it is not clear how well domain transfer works and how reliable the results are for transfer across different dimensions. We explore the potential o…

    Submitted 31 July, 2023; originally announced July 2023.

  5. arXiv:2307.07331  [pdf, other]

    cs.CL cs.CY cs.LG stat.ML

    How Different Is Stereotypical Bias Across Languages?

    Authors: Ibrahim Tolga Öztürk, Rostislav Nedelchev, Christian Heumann, Esteban Garces Arias, Marius Roger, Bernd Bischl, Matthias Aßenmacher

    Abstract: Recent studies have demonstrated how to assess the stereotypical bias in pre-trained English language models. In this work, we extend this branch of research in multiple different dimensions by systematically investigating (a) mono- and multilingual models of (b) different underlying architectures with respect to their bias in (c) multiple different languages. To that end, we make use of the Engli…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted @ "3rd Workshop on Bias and Fairness in AI" (co-located with ECML PKDD 2023). This is the author's version of the work. The definitive version of record will be published in the proceedings

  6. arXiv:2306.10087  [pdf, other]

    cs.LG cs.AI

    ActiveGLAE: A Benchmark for Deep Active Learning with Transformers

    Authors: Lukas Rauch, Matthias Aßenmacher, Denis Huseljic, Moritz Wirth, Bernd Bischl, Bernhard Sick

    Abstract: Deep active learning (DAL) seeks to reduce annotation costs by enabling the model to actively query instance annotations from which it expects to learn the most. Despite extensive research, there is currently no standardized evaluation protocol for transformer-based language models in the field of DAL. Diverse experimental settings lead to difficulties in comparing research and deriving recommenda…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted @ ECML PKDD 2023. This is the author's version of the work. The definitive Version of Record will be published in the Proceedings of ECML PKDD 2023

  7. arXiv:2301.04856  [pdf, other]

    cs.CL cs.LG stat.ML

    Multimodal Deep Learning

    Authors: Cem Akkus, Luyang Chu, Vladana Djakovic, Steffen Jauch-Walser, Philipp Koch, Giacomo Loss, Christopher Marquardt, Marco Moldovan, Nadja Sauter, Maximilian Schneider, Rickmer Schulte, Karol Urbanczyk, Jann Goschenhofer, Christian Heumann, Rasmus Hvingelby, Daniel Schalk, Matthias Aßenmacher

    Abstract: This book is the result of a seminar in which we reviewed multimodal approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually. Further, modeling frameworks are discussed where one modality is transformed into the other, as well as models in which one modality is utilized to enhance rep…

    Submitted 12 January, 2023; originally announced January 2023.

  8. arXiv:2105.04876  [pdf, other]

    cs.CL cs.LG stat.ML

    Benchmarking down-scaled (not so large) pre-trained language models

    Authors: M. Aßenmacher, P. Schulze, C. Heumann

    Abstract: Large Transformer-based language models are pre-trained on corpora of varying sizes, for a different number of steps and with different batch sizes. At the same time, more fundamental components, such as the pre-training objective or architectural hyperparameters, are modified. In total, it is therefore difficult to ascribe changes in performance to specific factors. Since searching the hyperparam…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: 14 pages, 5 figures

  9. arXiv:2104.02496  [pdf, other]

    cs.CL cs.LG stat.ML

    Exploring Topic-Metadata Relationships with the STM: A Bayesian Approach

    Authors: P. Schulze, S. Wiegrebe, P. W. Thurner, C. Heumann, M. Aßenmacher, S. Wankmüller

    Abstract: Topic models such as the Structural Topic Model (STM) estimate latent topical clusters within text. An important step in many topic modeling applications is to explore relationships between the discovered topical structure and metadata associated with the text documents. Methods used to estimate such relationships must take into account that the topical structure is not directly observed, but inst…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 8 pages, 4 figures

  10. arXiv:2102.12330  [pdf, ps, other]

    cs.CL cs.LG

    Re-Evaluating GermEval17 Using German Pre-Trained Language Models

    Authors: M. Aßenmacher, A. Corvonato, C. Heumann

    Abstract: The lack of a commonly used benchmark data set (collection) such as (Super-)GLUE (Wang et al., 2018, 2019) for the evaluation of non-English pre-trained language models is a severe shortcoming of current English-centric NLP-research. It concentrates a large part of the research on English, neglecting the uncertainty when transferring conclusions found for the English language to other languages. W…

    Submitted 5 July, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: Accepted as a conference paper at the 6th Swiss Text Analytics Conference (SwissText), Brugg, Switzerland (Online), June 14-16, 2021

  11. arXiv:2012.02558  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    Pre-trained language models as knowledge bases for Automotive Complaint Analysis

    Authors: V. D. Viellieber, M. Aßenmacher

    Abstract: Recently it has been shown that large pre-trained language models like BERT (Devlin et al., 2018) are able to store commonsense factual knowledge captured in their pre-training corpus (Petroni et al., 2019). In our work we further evaluate this ability with respect to an application from industry creating a set of probes specifically designed to reveal technical quality issues captured as described…

    Submitted 4 December, 2020; originally announced December 2020.

    Comments: 5 pages

  12. arXiv:2001.00781  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    On the comparability of Pre-trained Language Models

    Authors: Matthias Aßenmacher, Christian Heumann

    Abstract: Recent developments in unsupervised representation learning have successfully established the concept of transfer learning in NLP. Mainly three forces are driving the improvements in this area of research: More elaborated architectures are making better use of contextual information. Instead of simply plugging in static pre-trained representations, these are learned based on surrounding context in…

    Submitted 3 January, 2020; originally announced January 2020.

    Journal ref: Proceedings of the 5th Swiss Text Analytics Conference (SwissText) & 16th Conference on Natural Language Processing (KONVENS), Zurich, Switzerland, June 23-25, 2020