Showing 1–14 of 14 results for author: Kreuter, F

Searching in archive cs.
  1. arXiv:2406.11096  [pdf, other]

    cs.CL

    The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models

    Authors: Bolei Ma, Xinpeng Wang, Tiancheng Hu, Anna-Carolina Haensch, Michael A. Hedderich, Barbara Plank, Frauke Kreuter

    Abstract: Recent advances in Large Language Models (LLMs) have sparked wide interest in validating and comprehending the human-like cognitive-behavioral traits LLMs may have. These cognitive-behavioral traits include typically Attitudes, Opinions, Values (AOV). However, measuring AOV embedded within LLMs remains opaque, and different evaluation methods may yield different results. This has led to a lack of…

    Submitted 1 July, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  2. arXiv:2406.09289  [pdf, other]

    cs.CL cs.AI cs.LG

    Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models

    Authors: Sarah Ball, Frauke Kreuter, Nina Rimsky

    Abstract: Conversational Large Language Models are trained to refuse to answer harmful questions. However, emergent jailbreaking techniques can still elicit unsafe outputs, presenting an ongoing challenge for model alignment. To better understand how different jailbreak types circumvent safeguards, this paper analyses model activations on different jailbreak inputs. We find that it is possible to extract a…

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2403.01208  [pdf, other]

    cs.HC stat.ME

    Position: Insights from Survey Methodology can Improve Training Data

    Authors: Stephanie Eckman, Barbara Plank, Frauke Kreuter

    Abstract: Whether future AI models are fair, trustworthy, and aligned with the public's interests rests in part on our ability to collect accurate data about what we want the models to do. However, collecting high-quality data is difficult, and few AI/ML researchers are trained in data collection methods. Recent research in data-centric AI has shown that higher quality training data leads to better performin…

    Submitted 7 June, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: 9 pages, 4 figures. ICML 2024 Position Paper, forthcoming

    ACM Class: E.0

  4. arXiv:2402.18397  [pdf, other]

    cs.CL

    Decomposed Prompting: Unveiling Multilingual Linguistic Structure Knowledge in English-Centric Large Language Models

    Authors: Ercong Nie, Shuzhou Yuan, Bolei Ma, Helmut Schmid, Michael Färber, Frauke Kreuter, Hinrich Schütze

    Abstract: Despite the predominance of English in their training data, English-centric Large Language Models (LLMs) like GPT-3 and LLaMA display a remarkable ability to perform multilingual tasks, raising questions about the depth and nature of their cross-lingual capabilities. This paper introduces the decomposed prompting approach to probe the linguistic structure understanding of these LLMs in sequence la…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 18 pages, 7 figures

  5. arXiv:2402.14499  [pdf, other]

    cs.CL

    "My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

    Authors: Xinpeng Wang, Bolei Ma, Chengzhi Hu, Leon Weber-Genzel, Paul Röttger, Frauke Kreuter, Dirk Hovy, Barbara Plank

    Abstract: The open-ended nature of language generation makes the evaluation of autoregressive large language models (LLMs) challenging. One common evaluation approach uses multiple-choice questions (MCQ) to limit the response space. The model is then evaluated by ranking the candidate answers by the log probability of the first token prediction. However, first-tokens may not consistently reflect the final r…

    Submitted 22 February, 2024; originally announced February 2024.

  6. arXiv:2401.16589  [pdf, other]

    cs.CL

    ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks

    Authors: Bolei Ma, Ercong Nie, Shuzhou Yuan, Helmut Schmid, Michael Färber, Frauke Kreuter, Hinrich Schütze

    Abstract: Prompt-based methods have been successfully applied to multilingual pretrained language models for zero-shot cross-lingual understanding. However, most previous studies primarily focused on sentence-level classification tasks, and only a few considered token-level labeling tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. In this paper, we propose Token-Level Prompt De…

    Submitted 13 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: EACL 2024

  7. arXiv:2311.14212  [pdf, other]

    stat.ML cs.CL cs.LG stat.ME

    Annotation Sensitivity: Training Data Collection Methods Affect Model Performance

    Authors: Christoph Kern, Stephanie Eckman, Jacob Beck, Rob Chew, Bolei Ma, Frauke Kreuter

    Abstract: When training data are collected from human annotators, the design of the annotation instrument, the instructions given to annotators, the characteristics of the annotators, and their interactions can impact training data. This study demonstrates that design choices made when creating an annotation instrument also impact the models trained on the resulting annotations. We introduce the term annota…

    Submitted 22 January, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023 Findings: https://aclanthology.org/2023.findings-emnlp.992/

  8. arXiv:2310.19091  [pdf, other]

    cs.LG cs.CY cs.HC stat.ME

    Bridging the Gap: Towards an Expanded Toolkit for ML-Supported Decision-Making in the Public Sector

    Authors: Unai Fischer-Abaigar, Christoph Kern, Noam Barda, Frauke Kreuter

    Abstract: Machine Learning (ML) systems are becoming instrumental in the public sector, with applications spanning areas like criminal justice, social welfare, financial fraud detection, and public health. While these systems offer great potential benefits to institutional decision-making processes, such as improved efficiency and reliability, they still face the challenge of aligning nuanced policy objecti…

    Submitted 26 April, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

  9. arXiv:2307.06708  [pdf, other]

    cs.CL cs.CR

    To share or not to share: What risks would laypeople accept to give sensitive data to differentially-private NLP systems?

    Authors: Christopher Weiss, Frauke Kreuter, Ivan Habernal

    Abstract: Although the NLP community has adopted central differential privacy as a go-to framework for privacy-preserving model training or data sharing, the choice and interpretation of the key parameter, privacy budget $\varepsilon$ that governs the strength of privacy protection, remains largely arbitrary. We argue that determining the $\varepsilon$ value should not be solely in the hands of researchers…

    Submitted 25 March, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Accepted at LREC-COLING 2024; final camera-ready version

  10. arXiv:2305.16703  [pdf, other]

    stat.ML cs.LG

    Sources of Uncertainty in Machine Learning -- A Statisticians' View

    Authors: Cornelia Gruber, Patrick Oliver Schenk, Malte Schierholz, Frauke Kreuter, Göran Kauermann

    Abstract: Machine Learning and Deep Learning have achieved an impressive standard today, enabling us to answer questions that were inconceivable a few years ago. Besides these successes, it becomes clear, that beyond pure prediction, which is the primary strength of most supervised machine learning algorithms, the quantification of uncertainty is relevant and necessary as well. While first concepts and idea…

    Submitted 26 May, 2023; originally announced May 2023.

  11. arXiv:2303.05349  [pdf, other]

    stat.AP cs.CL cs.CY

    Seeing ChatGPT Through Students' Eyes: An Analysis of TikTok Data

    Authors: Anna-Carolina Haensch, Sarah Ball, Markus Herklotz, Frauke Kreuter

    Abstract: Advanced large language models like ChatGPT have gained considerable attention recently, including among students. However, while the debate on ChatGPT in academia is making waves, more understanding is needed among lecturers and teachers on how students use and perceive ChatGPT. To address this gap, we analyzed the content on ChatGPT available on TikTok in February 2023. TikTok is a rapidly growi…

    Submitted 9 March, 2023; originally announced March 2023.

  12. arXiv:2108.04134  [pdf, other]

    cs.CY cs.LG stat.AP

    Fairness in Algorithmic Profiling: A German Case Study

    Authors: Christoph Kern, Ruben L. Bach, Hannah Mautner, Frauke Kreuter

    Abstract: Algorithmic profiling is increasingly used in the public sector as a means to allocate limited public resources effectively and objectively. One example is the prediction-based statistical profiling of job seekers to guide the allocation of support measures by public employment services. However, empirical evaluations of potential side-effects such as unintended discrimination and fairness concern…

    Submitted 4 August, 2021; originally announced August 2021.

  13. arXiv:2105.01441  [pdf, other]

    stat.ML cs.LG

    Distributive Justice and Fairness Metrics in Automated Decision-making: How Much Overlap Is There?

    Authors: Matthias Kuppler, Christoph Kern, Ruben L. Bach, Frauke Kreuter

    Abstract: The advent of powerful prediction algorithms led to increased automation of high-stake decisions regarding the allocation of scarce resources such as government spending and welfare support. This automation bears the risk of perpetuating unwanted discrimination against vulnerable and historically disadvantaged groups. Research on algorithmic discrimination in computer science and other disciplines…

    Submitted 6 May, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

  14. arXiv:2011.06916  [pdf]

    cs.HC cs.LG stat.AP

    Predicting respondent difficulty in web surveys: A machine-learning approach based on mouse movement features

    Authors: Amanda Fernández-Fontelo, Pascal J. Kieslich, Felix Henninger, Frauke Kreuter, Sonja Greven

    Abstract: A central goal of survey research is to collect robust and reliable data from respondents. However, despite researchers' best efforts in designing questionnaires, respondents may experience difficulty understanding questions' intent and therefore may struggle to respond appropriately. If it were possible to detect such difficulty, this knowledge could be used to inform real-time interventions thro…

    Submitted 5 November, 2020; originally announced November 2020.

    Comments: 40 pages, 2 Figures, 3 Tables