
Showing 1–28 of 28 results for author: Mosbach, M

  1. arXiv:2406.12618 [pdf, other]

    cs.CL

    From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP

    Authors: Marius Mosbach, Vagrant Gautam, Tomás Vergara-Browne, Dietrich Klakow, Mor Geva

    Abstract: Interpretability and analysis (IA) research is a growing subfield within NLP with the goal of developing a deeper understanding of the behavior or inner workings of NLP systems and methods. Despite growing interest in the subfield, a commonly voiced criticism is that it lacks actionable insights and therefore has little impact on NLP. In this paper, we seek to quantify the impact of IA research on…

    Submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2404.05961 [pdf, other]

    cs.CL cs.AI

    LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

    Authors: Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, Siva Reddy

    Abstract: Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consi…

    Submitted 8 April, 2024; originally announced April 2024.

  3. arXiv:2403.13537 [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    What explains the success of cross-modal fine-tuning with ORCA?

    Authors: Paloma García-de-Herreros, Vagrant Gautam, Philipp Slusallek, Dietrich Klakow, Marius Mosbach

    Abstract: ORCA (Shen et al., 2023) is a recent technique for cross-modal fine-tuning, i.e., applying pre-trained transformer models to modalities beyond their training data. The technique consists primarily of training an embedder and fine-tuning the embedder and model. Despite its high performance on a variety of downstream tasks, we do not understand precisely how each of these components contributes to OR…

    Submitted 20 March, 2024; originally announced March 2024.

  4. arXiv:2403.10187 [pdf, other]

    cs.RO cs.AI cs.LG

    Grasp Anything: Combining Teacher-Augmented Policy Gradient Learning with Instance Segmentation to Grasp Arbitrary Objects

    Authors: Malte Mosbach, Sven Behnke

    Abstract: Interactive grasping from clutter, akin to human dexterity, is one of the longest-standing problems in robot learning. Challenges stem from the intricacies of visual perception, the demand for precise motor skills, and the complex interplay between the two. In this work, we present Teacher-Augmented Policy Gradient (TAPG), a novel two-stage learning framework that synergizes reinforcement learning…

    Submitted 15 March, 2024; originally announced March 2024.

  5. arXiv:2402.13137 [pdf, other]

    cs.CL

    The Hidden Space of Transformer Language Adapters

    Authors: Jesujoba O. Alabi, Marius Mosbach, Matan Eyal, Dietrich Klakow, Mor Geva

    Abstract: We analyze the operation of transformer language adapters, which are small modules trained on top of a frozen language model to adapt its predictions to new target languages. We show that adapted predictions mostly evolve in the source language the model was trained on, while the target language becomes pronounced only in the very last layers of the model. Moreover, the adaptation process is gradu…

    Submitted 10 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024 (main conference)

  6. arXiv:2402.12976 [pdf, other]

    cs.CL cs.AI

    The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis

    Authors: Miaoran Zhang, Vagrant Gautam, Mingyang Wang, Jesujoba O. Alabi, Xiaoyu Shen, Dietrich Klakow, Marius Mosbach

    Abstract: In-context learning is a popular inference strategy where large language models solve a task using only a few labeled demonstrations without needing any parameter updates. Although there have been extensive studies on English in-context learning, multilingual in-context learning remains under-explored, and we lack an in-depth understanding of the role of demonstrations in this context. To address…

    Submitted 7 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: ACL 2024 findings

  7. arXiv:2311.04547 [pdf, other]

    cs.CL

    Large GPT-like Models are Bad Babies: A Closer Look at the Relationship between Linguistic Competence and Psycholinguistic Measures

    Authors: Julius Steuer, Marius Mosbach, Dietrich Klakow

    Abstract: Research on the cognitive plausibility of language models (LMs) has so far mostly concentrated on modelling psycholinguistic response variables such as reading times, gaze durations and N400/P600 EEG signals, while mostly leaving out the dimension of what Mahowald et al. (2023) described as formal and functional linguistic competence, and developmental plausibility. We address this gap by training…

    Submitted 8 November, 2023; originally announced November 2023.

  8. arXiv:2307.16499 [pdf, other]

    cs.RO cs.LG

    Learning Generalizable Tool Use with Non-rigid Grasp-pose Registration

    Authors: Malte Mosbach, Sven Behnke

    Abstract: Tool use, a hallmark feature of human intelligence, remains a challenging problem in robotics due to the complex contacts and high-dimensional action space. In this work, we present a novel method to enable reinforcement learning of tool use behaviors. Our approach provides a scalable way to learn the operation of tools in a new category using only a single demonstration. To this end, we propose a ne…

    Submitted 1 August, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: Accepted for publication at IEEE CASE 2023

  9. arXiv:2305.17442 [pdf, other]

    cs.CL

    Weaker Than You Think: A Critical Look at Weakly Supervised Learning

    Authors: Dawei Zhu, Xiaoyu Shen, Marius Mosbach, Andreas Stephan, Dietrich Klakow

    Abstract: Weakly supervised learning is a popular approach for training machine learning models in low-resource settings. Instead of requesting high-quality yet costly human annotations, it allows training models with noisy annotations obtained from various weak sources. Recently, many sophisticated approaches have been proposed for robust training under label noise, reporting impressive results. In this pa…

    Submitted 17 September, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: ACL 2023, oral presentation

  10. arXiv:2305.16938 [pdf, other]

    cs.CL

    Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation

    Authors: Marius Mosbach, Tiago Pimentel, Shauli Ravfogel, Dietrich Klakow, Yanai Elazar

    Abstract: Few-shot fine-tuning and in-context learning are two alternative strategies for task adaptation of pre-trained language models. Recently, in-context learning has gained popularity over fine-tuning due to its simplicity and improved out-of-domain generalization, and because extensive evidence shows that fine-tuned models pick up on spurious correlations. Unfortunately, previous comparisons of the t…

    Submitted 30 May, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  11. arXiv:2212.02126 [pdf, other]

    cs.RO cs.AI cs.LG

    Accelerating Interactive Human-like Manipulation Learning with GPU-based Simulation and High-quality Demonstrations

    Authors: Malte Mosbach, Kara Moraw, Sven Behnke

    Abstract: Dexterous manipulation with anthropomorphic robot hands remains a challenging problem in robotics because of the high-dimensional state and action spaces and complex contacts. Nevertheless, skillful closed-loop manipulation is required to enable humanoid robots to operate in unstructured real-world environments. Reinforcement learning (RL) has traditionally imposed enormous interaction data requir…

    Submitted 5 December, 2022; originally announced December 2022.

    Journal ref: 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids) 435-441

  12. arXiv:2211.10957 [pdf, other]

    cs.RO cs.AI cs.LG

    Efficient Representations of Object Geometry for Reinforcement Learning of Interactive Grasping Policies

    Authors: Malte Mosbach, Sven Behnke

    Abstract: Grasping objects of different shapes and sizes - a foundational, effortless skill for humans - remains a challenging task in robotics. Although model-based approaches can predict stable grasp configurations for known object models, they struggle to generalize to novel objects and often operate in a non-interactive open-loop manner. In this work, we present a reinforcement learning framework that l…

    Submitted 20 November, 2022; originally announced November 2022.

  13. arXiv:2208.02402 [pdf, other]

    cs.CL cs.LG

    Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models

    Authors: Vilém Zouhar, Marius Mosbach, Dietrich Klakow

    Abstract: Although masked language models are highly performant and widely adopted by NLP practitioners, they cannot be easily used for autoregressive language modelling (next word prediction and sequence probability estimation). We present an LSTM-based autoregressive language model which uses prefix embeddings (from a pretrained masked language model) via fusion (e.g. concatenation) to obtain a richer co…

    Submitted 5 August, 2022; v1 submitted 3 August, 2022; originally announced August 2022.

    Comments: Submitted to PBML. Code & experiment repository: https://github.com/zouharvi/sentence-embd-fusion

  14. arXiv:2207.14251 [pdf, other]

    cs.CL

    Measuring Causal Effects of Data Statistics on Language Model's 'Factual' Predictions

    Authors: Yanai Elazar, Nora Kassner, Shauli Ravfogel, Amir Feder, Abhilasha Ravichander, Marius Mosbach, Yonatan Belinkov, Hinrich Schütze, Yoav Goldberg

    Abstract: Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models. But what exactly in the training data causes a model to make a certain prediction? We seek to answer this question by providing a language for describing how training data influences predictions, through a causal framework. Importantly, our framework bypasses the need to retrain exp…

    Submitted 24 March, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

    Comments: We received a criticism regarding the validity of the causal formulation in this paper. We will address it in an upcoming version

  15. arXiv:2205.14036 [pdf, other]

    cs.CL

    StereoKG: Data-Driven Knowledge Graph Construction for Cultural Knowledge and Stereotypes

    Authors: Awantee Deshpande, Dana Ruiter, Marius Mosbach, Dietrich Klakow

    Abstract: Analyzing ethnic or religious bias is important for improving fairness, accountability, and transparency of natural language processing models. However, many techniques rely on human-compiled lists of bias terms, which are expensive to create and are limited in coverage. In this study, we present a fully data-driven pipeline for generating a knowledge graph (KG) of cultural knowledge and stereotyp…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: 12 pages, 2 figures, accepted as a long paper at WOAH at NAACL 2022

  16. arXiv:2204.10931 [pdf, other]

    cs.CL

    MCSE: Multimodal Contrastive Learning of Sentence Embeddings

    Authors: Miaoran Zhang, Marius Mosbach, David Ifeoluwa Adelani, Michael A. Hedderich, Dietrich Klakow

    Abstract: Learning semantically meaningful sentence embeddings is an open problem in natural language processing. In this work, we propose a sentence embedding learning approach that exploits both visual and textual information via a multimodal contrastive objective. Through experiments on a variety of semantic textual similarity tasks, we demonstrate that our approach consistently improves the performance…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: Accepted by NAACL 2022 main conference (short paper), 11 pages

  17. arXiv:2204.06487 [pdf, other]

    cs.CL

    Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning

    Authors: Jesujoba O. Alabi, David Ifeoluwa Adelani, Marius Mosbach, Dietrich Klakow

    Abstract: Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, especially African languages. One of the most effective approaches to adapt to a new language is language adaptive fine-tuning (LA…

    Submitted 18 October, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Accepted to COLING 2022

  18. arXiv:2204.02906 [pdf, other]

    cs.IR cs.CL

    Knowledge Base Index Compression via Dimensionality and Precision Reduction

    Authors: Vilém Zouhar, Marius Mosbach, Miaoran Zhang, Dietrich Klakow

    Abstract: Recently, neural network-based approaches to knowledge-intensive NLP tasks, such as question answering, started to rely heavily on the combination of neural retrievers and readers. Retrieval is typically performed over a large textual knowledge base (KB) which requires significant memory and compute resources, especially when scaled up. On HotpotQA we systematically investigate reducing the size of…

    Submitted 18 April, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: To be presented at Spa-NLP workshop at ACL 2022

  19. arXiv:2201.09651 [pdf, other]

    cs.CL cs.IR

    Artefact Retrieval: Overview of NLP Models with Knowledge Base Access

    Authors: Vilém Zouhar, Marius Mosbach, Debanjali Biswas, Dietrich Klakow

    Abstract: Many NLP models gain performance by having access to a knowledge base. A lot of research has been devoted to devising and improving the way the knowledge base is accessed and incorporated into the model, resulting in a number of mechanisms and pipelines. Despite the diversity of proposed mechanisms, there are patterns in the designs of such systems. In this paper, we systematically describe the ty…

    Submitted 24 January, 2022; originally announced January 2022.

    Comments: 11 pages of main content, 7 pages of appendix; presented at AKBC CSRR 2021

  20. arXiv:2110.05881 [pdf, ps, other]

    cs.CV

    Fourier-based Video Prediction through Relational Object Motion

    Authors: Malte Mosbach, Sven Behnke

    Abstract: The ability to predict future outcomes conditioned on observed video frames is crucial for intelligent decision-making in autonomous systems. Recently, deep recurrent architectures have been applied to the task of video prediction. However, this often results in blurry predictions and requires tedious training on large datasets. Here, we explore a different approach by (1) using frequency-domain a…

    Submitted 12 October, 2021; originally announced October 2021.

  21. arXiv:2106.08686 [pdf, other]

    cs.CL cs.SD eess.AS

    Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study

    Authors: Badr M. Abdullah, Marius Mosbach, Iuliia Zaitova, Bernd Möbius, Dietrich Klakow

    Abstract: Several variants of deep neural networks have been successfully employed for building parametric models that project variable-duration spoken word segments onto fixed-size vector representations, or acoustic word embeddings (AWEs). However, it remains unclear to what degree we can rely on the distance in the emerging AWE space as an estimate of word-form similarity. In this paper, we ask: does the…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted in Interspeech 2021

  22. arXiv:2011.00960 [pdf, other]

    cs.CL

    A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English

    Authors: Marius Mosbach, Stefania Degaetano-Ortlieb, Marie-Pauline Krielke, Badr M. Abdullah, Dietrich Klakow

    Abstract: Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on. We evaluate three models (BERT, RoBERTa, and ALBERT), testing their grammatical and semantic knowledge by sentence-level probing, diagnostic cases, and masked prediction tasks. We focus on relative clauses (in American English) as…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: Accepted to COLING 2020

  23. arXiv:2010.15251 [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Fusion Models for Improved Visual Captioning

    Authors: Marimuthu Kalimuthu, Aditya Mogadala, Marius Mosbach, Dietrich Klakow

    Abstract: Visual captioning aims to generate textual descriptions given images or videos. Traditionally, image captioning models are trained on human annotated datasets such as Flickr30k and MS-COCO, which are limited in size and diversity. This limitation hinders the generalization capabilities of these models while also rendering them liable to making mistakes. Language models can, however, be trained on…

    Submitted 4 December, 2020; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: Accepted at "Multi-Modal Deep Learning: Challenges and Applications" (MMDLCA), International Conference on Pattern Recognition (ICPR)-2020, Milan, Italy

    Journal ref: Springer LNCS, volume 12666, 2021

  24. arXiv:2010.02616 [pdf, other]

    cs.CL cs.LG

    On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers

    Authors: Marius Mosbach, Anna Khokhlova, Michael A. Hedderich, Dietrich Klakow

    Abstract: Fine-tuning pre-trained contextualized embedding models has become an integral part of the NLP pipeline. At the same time, probing has emerged as a way to investigate the linguistic knowledge captured by pre-trained models. Very little is, however, understood about how fine-tuning affects the representations of pre-trained models and thereby the linguistic knowledge they encode. This paper contrib…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: Accepted at Findings of EMNLP 2020 and BlackboxNLP 2020

  25. arXiv:2007.06077 [pdf, other]

    cs.CV cs.CL cs.LG

    Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation

    Authors: Aditya Mogadala, Marius Mosbach, Dietrich Klakow

    Abstract: Generating longer textual sequences when conditioned on the visual information is an interesting problem to explore. The challenge here goes beyond standard vision-conditioned sentence-level generation (e.g., image or video captioning), as it requires producing a brief and coherent story describing the visual content. In this paper, we mask this Vision-to-Sequence as Graph-to-Sequence lea…

    Submitted 12 July, 2020; originally announced July 2020.

    Comments: International Conference on Machine Learning (ICML) 2020 Workshop (https://logicalreasoninggnn.github.io/)

  26. arXiv:2006.04884 [pdf, other]

    cs.LG stat.ML

    On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines

    Authors: Marius Mosbach, Maksym Andriushchenko, Dietrich Klakow

    Abstract: Fine-tuning pre-trained transformer-based language models such as BERT has become a common practice dominating leaderboards across various NLP benchmarks. Despite the strong empirical performance of fine-tuned models, fine-tuning is an unstable process: training the same model with multiple random seeds can result in a large variance of the task performance. Previous literature (Devlin et al., 201…

    Submitted 25 March, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: ICLR 2021

  27. arXiv:1902.03020 [pdf, ps, other]

    cs.CR cs.LG

    On the security relevance of weights in deep learning

    Authors: Kathrin Grosse, Thomas A. Trost, Marius Mosbach, Michael Backes, Dietrich Klakow

    Abstract: Recently, a weight-based attack on stochastic gradient descent inducing overfitting has been proposed. We show that the threat is broader: A task-independent permutation on the initial weights suffices to limit the achieved accuracy to, for example, 50% on the Fashion MNIST dataset from initially more than 90%. These findings are confirmed on MNIST and CIFAR. We formally confirm that the attack su…

    Submitted 29 November, 2020; v1 submitted 8 February, 2019; originally announced February 2019.

    Comments: 16 pages, 18 figures, long version of paper published at ICANN 2020

  28. arXiv:1810.12042 [pdf, other]

    cs.LG cs.CR stat.ML

    Logit Pairing Methods Can Fool Gradient-Based Attacks

    Authors: Marius Mosbach, Maksym Andriushchenko, Thomas Trost, Matthias Hein, Dietrich Klakow

    Abstract: Recently, Kannan et al. [2018] proposed several logit regularization methods to improve the adversarial robustness of classifiers. We show that the computationally fast methods they propose - Clean Logit Pairing (CLP) and Logit Squeezing (LSQ) - just make the gradient-based optimization problem of crafting adversarial examples harder without providing actual robustness. We find that Adversarial Lo…

    Submitted 12 March, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

    Comments: Accepted to NeurIPS 2018 Workshop on Security in Machine Learning