Search | arXiv e-print repository

Multi-property Steering of Large Language Models with Dynamic Activation Composition

Authors: Daniel Scalena, Gabriele Sarti, Malvina Nissim

Abstract: Activation steering methods were shown to be effective in conditioning language model generation by additively intervening over models' intermediate representations. However, the evaluation of these techniques has so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies, highlighting… ▽ More Activation steering methods were shown to be effective in conditioning language model generation by additively intervening over models' intermediate representations. However, the evaluation of these techniques has so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies, highlighting the property-dependent nature of optimal parameters to ensure a robust effect throughout generation. To address this issue, we propose Dynamic Activation Composition, an information-theoretic approach to modulate the steering intensity of one or more properties throughout generation. Our experiments on multi-property steering show that our method successfully maintains high conditioning while minimizing the impact of conditioning on generation fluency. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.13663 [pdf, other]

Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation

Authors: Jirui Qi, Gabriele Sarti, Raquel Fernández, Arianna Bisazza

Abstract: Ensuring the verifiability of model answers is a fundamental challenge for retrieval-augmented generation (RAG) in the question answering (QA) domain. Recently, self-citation prompting was proposed to make large language models (LLMs) generate citations to supporting documents along with their answers. However, self-citing LLMs often struggle to match the required format, refer to non-existent sou… ▽ More Ensuring the verifiability of model answers is a fundamental challenge for retrieval-augmented generation (RAG) in the question answering (QA) domain. Recently, self-citation prompting was proposed to make large language models (LLMs) generate citations to supporting documents along with their answers. However, self-citing LLMs often struggle to match the required format, refer to non-existent sources, and fail to faithfully reflect LLMs' context usage throughout the generation. In this work, we present MIRAGE --Model Internals-based RAG Explanations -- a plug-and-play approach using model internals for faithful answer attribution in RAG applications. MIRAGE detects context-sensitive answer tokens and pairs them with retrieved documents contributing to their prediction via saliency methods. We evaluate our proposed approach on a multilingual extractive QA dataset, finding high agreement with human answer attribution. On open-ended QA, MIRAGE achieves citation quality and efficiency comparable to self-citation while also allowing for a finer-grained control of attribution parameters. Our qualitative evaluation highlights the faithfulness of MIRAGE's attributions and underscores the promising application of model internals for RAG answer attribution. △ Less

Submitted 1 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

Comments: Under review. Code and data released at https://github.com/Betswish/MIRAGE

arXiv:2405.00208 [pdf, other]

A Primer on the Inner Workings of Transformer-based Language Models

Authors: Javier Ferrando, Gabriele Sarti, Arianna Bisazza, Marta R. Costa-jussà

Abstract: The rapid progress of research aimed at interpreting the inner workings of advanced language models has highlighted a need for contextualizing the insights gained from years of work in this area. This primer provides a concise technical introduction to the current techniques used to interpret the inner workings of Transformer-based language models, focusing on the generative decoder-only architect… ▽ More The rapid progress of research aimed at interpreting the inner workings of advanced language models has highlighted a need for contextualizing the insights gained from years of work in this area. This primer provides a concise technical introduction to the current techniques used to interpret the inner workings of Transformer-based language models, focusing on the generative decoder-only architecture. We conclude by presenting a comprehensive overview of the known internal mechanisms implemented by these models, uncovering connections across popular approaches and active research directions in this area. △ Less

Submitted 1 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

arXiv:2310.03686 [pdf, other]

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Authors: Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, Jaap Jumelet

Abstract: In recent years, many interpretability methods have been proposed to help interpret the internal states of Transformer-models, at different levels of precision and complexity. Here, to analyze encoder-decoder Transformers, we propose a simple, new method: DecoderLens. Inspired by the LogitLens (for decoder-only Transformers), this method involves allowing the decoder to cross-attend representation… ▽ More In recent years, many interpretability methods have been proposed to help interpret the internal states of Transformer-models, at different levels of precision and complexity. Here, to analyze encoder-decoder Transformers, we propose a simple, new method: DecoderLens. Inspired by the LogitLens (for decoder-only Transformers), this method involves allowing the decoder to cross-attend representations of intermediate encoder layers instead of using the final encoder output, as is normally done in encoder-decoder models. The method thus maps previously uninterpretable vector representations to human-interpretable sequences of words or symbols. We report results from the DecoderLens applied to models trained on question answering, logical reasoning, speech recognition and machine translation. The DecoderLens reveals several specific subtasks that are solved at low or intermediate layers, shedding new light on the information flow inside the encoder component of this important class of models. △ Less

Submitted 3 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: Accepted to Findings of NAACL 2024

arXiv:2310.01188 [pdf, other]

Quantifying the Plausibility of Context Reliance in Neural Machine Translation

Authors: Gabriele Sarti, Grzegorz Chrupała, Malvina Nissim, Arianna Bisazza

Abstract: Establishing whether language models can use contextual information in a human-plausible way is important to ensure their trustworthiness in real-world settings. However, the questions of when and which parts of the context affect model generations are typically tackled separately, with current plausibility evaluations being practically limited to a handful of artificial benchmarks. To address thi… ▽ More Establishing whether language models can use contextual information in a human-plausible way is important to ensure their trustworthiness in real-world settings. However, the questions of when and which parts of the context affect model generations are typically tackled separately, with current plausibility evaluations being practically limited to a handful of artificial benchmarks. To address this, we introduce Plausibility Evaluation of Context Reliance (PECoRe), an end-to-end interpretability framework designed to quantify context usage in language models' generations. Our approach leverages model internals to (i) contrastively identify context-sensitive target tokens in generated texts and (ii) link them to contextual cues justifying their prediction. We use \pecore to quantify the plausibility of context-aware machine translation models, comparing model rationales with human annotations across several discourse-level phenomena. Finally, we apply our method to unannotated model translations to identify context-mediated predictions and highlight instances of (im)plausible context usage throughout generation. △ Less

Submitted 13 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: ICLR 2024 Camera Ready. Code: https://github.com/gsarti/pecore. Artifacts: https://huggingface.co/collections/gsarti/pecore-iclr-2024-65edab42e28439e21b612c2e

ACM Class: I.2.7

arXiv:2309.00751 [pdf, other]

Let the Models Respond: Interpreting Language Model Detoxification Through the Lens of Prompt Dependence

Authors: Daniel Scalena, Gabriele Sarti, Malvina Nissim, Elisabetta Fersini

Abstract: Due to language models' propensity to generate toxic or hateful responses, several techniques were developed to align model generations with users' preferences. Despite the effectiveness of such methods in improving the safety of model interactions, their impact on models' internal processes is still poorly understood. In this work, we apply popular detoxification approaches to several language mo… ▽ More Due to language models' propensity to generate toxic or hateful responses, several techniques were developed to align model generations with users' preferences. Despite the effectiveness of such methods in improving the safety of model interactions, their impact on models' internal processes is still poorly understood. In this work, we apply popular detoxification approaches to several language models and quantify their impact on the resulting models' prompt dependence using feature attribution methods. We evaluate the effectiveness of counter-narrative fine-tuning and compare it with reinforcement learning-driven detoxification, observing differences in prompt reliance between the two methods despite their similar detoxification performances. △ Less

Submitted 1 September, 2023; originally announced September 2023.

Comments: 4 pages

arXiv:2305.17131 [pdf, other]

doi 10.18653/v1/2023.acl-short.126

RAMP: Retrieval and Attribute-Marking Enhanced Prompting for Attribute-Controlled Translation

Authors: Gabriele Sarti, Phu Mon Htut, Xing Niu, Benjamin Hsu, Anna Currey, Georgiana Dinu, Maria Nadejde

Abstract: Attribute-controlled translation (ACT) is a subtask of machine translation that involves controlling stylistic or linguistic attributes (like formality and gender) of translation outputs. While ACT has garnered attention in recent years due to its usefulness in real-world applications, progress in the task is currently limited by dataset availability, since most prior approaches rely on supervised… ▽ More Attribute-controlled translation (ACT) is a subtask of machine translation that involves controlling stylistic or linguistic attributes (like formality and gender) of translation outputs. While ACT has garnered attention in recent years due to its usefulness in real-world applications, progress in the task is currently limited by dataset availability, since most prior approaches rely on supervised methods. To address this limitation, we propose Retrieval and Attribute-Marking enhanced Prompting (RAMP), which leverages large multilingual language models to perform ACT in few-shot and zero-shot settings. RAMP improves generation accuracy over the standard prompting approach by (1) incorporating a semantic similarity retrieval component for selecting similar in-context examples, and (2) marking in-context examples with attribute annotations. Our comprehensive experiments show that RAMP is a viable approach in both zero-shot and few-shot settings. △ Less

Submitted 26 May, 2023; originally announced May 2023.

Comments: Accepted at ACL 2023

Journal ref: Proceedings of ACL (2023) 1476-1490

arXiv:2302.14220 [pdf, other]

doi 10.1162/tacl_a_00651

Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation

Authors: Lukas Edman, Gabriele Sarti, Antonio Toral, Gertjan van Noord, Arianna Bisazza

Abstract: Pretrained character-level and byte-level language models have been shown to be competitive with popular subword models across a range of Natural Language Processing (NLP) tasks. However, there has been little research on their effectiveness for neural machine translation (NMT), particularly within the popular pretrain-then-finetune paradigm. This work performs an extensive comparison across multi… ▽ More Pretrained character-level and byte-level language models have been shown to be competitive with popular subword models across a range of Natural Language Processing (NLP) tasks. However, there has been little research on their effectiveness for neural machine translation (NMT), particularly within the popular pretrain-then-finetune paradigm. This work performs an extensive comparison across multiple languages and experimental conditions of character- and subword-level pretrained models (ByT5 and mT5, respectively) on NMT. We show the effectiveness of character-level modeling in translation, particularly in cases where fine-tuning data is limited. In our analysis, we show how character models' gains in translation quality are reflected in better translations of orthographically similar words and rare words. While evaluating the importance of source texts in driving model predictions, we highlight word-level patterns within ByT5, suggesting an ability to modulate word-level and character-level information during generation. We conclude by assessing the efficiency tradeoff of byte models, suggesting their usage in non-time-critical scenarios to boost translation quality. △ Less

Submitted 26 January, 2024; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: This version of our work is a pre-MIT Press publication version

arXiv:2302.13942 [pdf, other]

doi 10.18653/v1/2023.acl-demo.40

Inseq: An Interpretability Toolkit for Sequence Generation Models

Authors: Gabriele Sarti, Nils Feldhus, Ludwig Sickert, Oskar van der Wal, Malvina Nissim, Arianna Bisazza

Abstract: Past work in natural language processing interpretability focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python library to democratize access to interpretability analyses of sequence generation models. Inseq enables intuitive and optimized extraction of models' internal infor… ▽ More Past work in natural language processing interpretability focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python library to democratize access to interpretability analyses of sequence generation models. Inseq enables intuitive and optimized extraction of models' internal information and feature importance scores for popular decoder-only and encoder-decoder Transformers architectures. We showcase its potential by adopting it to highlight gender biases in machine translation models and locate factual knowledge inside GPT-2. Thanks to its extensible interface supporting cutting-edge techniques such as contrastive feature attribution, Inseq can drive future advances in explainable natural language generation, centralizing good practices and enabling fair and reproducible model evaluations. △ Less

Submitted 27 May, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: ACL 2023 Demo Track. Library: https://github.com/inseq-team/inseq, Docs: https://inseq.readthedocs.io, v0.4

Journal ref: Proceedings of ACL: System Demonstrations (2023) 421-435

arXiv:2205.12215 [pdf, other]

doi 10.18653/v1/2022.emnlp-main.532

DivEMT: Neural Machine Translation Post-Editing Effort Across Typologically Diverse Languages

Authors: Gabriele Sarti, Arianna Bisazza, Ana Guerberof Arenas, Antonio Toral

Abstract: We introduce DivEMT, the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages. Using a strictly controlled setup, 18 professional translators were instructed to translate or post-edit the same set of English documents into Arabic, Dutch, Italian, Turkish, Ukrainian, and Vietnamese. During the process, their edits, keys… ▽ More We introduce DivEMT, the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages. Using a strictly controlled setup, 18 professional translators were instructed to translate or post-edit the same set of English documents into Arabic, Dutch, Italian, Turkish, Ukrainian, and Vietnamese. During the process, their edits, keystrokes, editing times and pauses were recorded, enabling an in-depth, cross-lingual evaluation of NMT quality and post-editing effectiveness. Using this new dataset, we assess the impact of two state-of-the-art NMT systems, Google Translate and the multilingual mBART-50 model, on translation productivity. We find that post-editing is consistently faster than translation from scratch. However, the magnitude of productivity gains varies widely across systems and languages, highlighting major disparities in post-editing effectiveness for languages at different degrees of typological relatedness to English, even when controlling for system architecture and training data size. We publicly release the complete dataset including all collected behavioral data, to foster new research on the translation capabilities of NMT systems for typologically diverse languages. △ Less

Submitted 18 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: EMNLP 2022, materials: https://github.com/gsarti/divemt

Journal ref: Proceedings of EMNLP (2022) 7795-7816

arXiv:2203.03759 [pdf, other]

IT5: Text-to-text Pretraining for Italian Language Understanding and Generation

Authors: Gabriele Sarti, Malvina Nissim

Abstract: We introduce IT5, the first family of encoder-decoder transformer models pretrained specifically on Italian. We document and perform a thorough cleaning procedure for a large Italian corpus and use it to pretrain four IT5 model sizes. We then introduce the ItaGen benchmark, which includes a broad range of natural language understanding and generation tasks for Italian, and use it to evaluate the p… ▽ More We introduce IT5, the first family of encoder-decoder transformer models pretrained specifically on Italian. We document and perform a thorough cleaning procedure for a large Italian corpus and use it to pretrain four IT5 model sizes. We then introduce the ItaGen benchmark, which includes a broad range of natural language understanding and generation tasks for Italian, and use it to evaluate the performance of IT5 models and multilingual baselines. We find monolingual IT5 models to provide the best scale-to-performance ratio across tested models, consistently outperforming their multilingual counterparts and setting a new state-of-the-art for Italian language generation. △ Less

Submitted 20 May, 2024; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: LREC-COLING 2024. Code and checkpoints: https://github.com/gsarti/it5

Journal ref: Proceedings of LREC-COLING (2024) 9422-9433

arXiv:2108.08688 [pdf, other]

Contrastive Language-Image Pre-training for the Italian Language

Authors: Federico Bianchi, Giuseppe Attanasio, Raphael Pisoni, Silvia Terragni, Gabriele Sarti, Sri Lakshmi

Abstract: CLIP (Contrastive Language-Image Pre-training) is a very recent multi-modal model that jointly learns representations of images and texts. The model is trained on a massive amount of English data and shows impressive performance on zero-shot classification tasks. Training the same model on a different language is not trivial, since data in other languages might be not enough and the model needs hi… ▽ More CLIP (Contrastive Language-Image Pre-training) is a very recent multi-modal model that jointly learns representations of images and texts. The model is trained on a massive amount of English data and shows impressive performance on zero-shot classification tasks. Training the same model on a different language is not trivial, since data in other languages might be not enough and the model needs high-quality translations of the texts to guarantee a good performance. In this paper, we present the first CLIP model for the Italian Language (CLIP-Italian), trained on more than 1.4 million image-text pairs. Results show that CLIP-Italian outperforms the multilingual CLIP model on the tasks of image retrieval and zero-shot classification. △ Less

Submitted 19 August, 2021; originally announced August 2021.

arXiv:2104.12422 [pdf, other]

doi 10.18653/v1/2021.teachingnlp-1.26

Teaching NLP with Bracelets and Restaurant Menus: An Interactive Workshop for Italian Students

Authors: Ludovica Pannitto, Lucia Busso, Claudia Roberta Combei, Lucio Messina, Alessio Miaschi, Gabriele Sarti, Malvina Nissim

Abstract: Although Natural Language Processing (NLP) is at the core of many tools young people use in their everyday life, high school curricula (in Italy) do not include any computational linguistics education. This lack of exposure makes the use of such tools less responsible than it could be and makes choosing computational linguistics as a university degree unlikely. To raise awareness, curiosity, and l… ▽ More Although Natural Language Processing (NLP) is at the core of many tools young people use in their everyday life, high school curricula (in Italy) do not include any computational linguistics education. This lack of exposure makes the use of such tools less responsible than it could be and makes choosing computational linguistics as a university degree unlikely. To raise awareness, curiosity, and longer-term interest in young people, we have developed an interactive workshop designed to illustrate the basic principles of NLP and computational linguistics to high school Italian students aged between 13 and 18 years. The workshop takes the form of a game in which participants play the role of machines needing to solve some of the most common problems a computer faces in understanding language: from voice recognition to Markov chains to syntactic parsing. Participants are guided through the workshop with the help of instructors, who present the activities and explain core concepts from computational linguistics. The workshop was presented at numerous outlets in Italy between 2019 and 2021, both face-to-face and online. △ Less

Submitted 14 May, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

Comments: 11 pages, 16 figures, accepted at Teaching NLP 2021 Workshop

Journal ref: Proceedings of the 5th Workshop on Teaching NLP (2021) 160-170

arXiv:2104.12405 [pdf, other]

doi 10.18653/v1/2021.teachingnlp-1.7

A dissemination workshop for introducing young Italian students to NLP

Authors: Lucio Messina, Lucia Busso, Claudia Roberta Combei, Ludovica Pannitto, Alessio Miaschi, Gabriele Sarti, Malvina Nissim

Abstract: We describe and make available the game-based material developed for a laboratory run at several Italian science festivals to popularize NLP among young students. We describe and make available the game-based material developed for a laboratory run at several Italian science festivals to popularize NLP among young students. △ Less

Submitted 14 May, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

Comments: 3 pages, 4 figures, accepted at Teaching NLP 2021 workshop

Journal ref: Proceedings of the 5th Workshop on Teaching NLP (2021) 52-54

arXiv:2011.05197 [pdf, other]

UmBERTo-MTSA @ AcCompl-It: Improving Complexity and Acceptability Prediction with Multi-task Learning on Self-Supervised Annotations

Authors: Gabriele Sarti

Abstract: This work describes a self-supervised data augmentation approach used to improve learning models' performances when only a moderate amount of labeled data is available. Multiple copies of the original model are initially trained on the downstream task. Their predictions are then used to annotate a large set of unlabeled examples. Finally, multi-task training is performed on the parallel annotation… ▽ More This work describes a self-supervised data augmentation approach used to improve learning models' performances when only a moderate amount of labeled data is available. Multiple copies of the original model are initially trained on the downstream task. Their predictions are then used to annotate a large set of unlabeled examples. Finally, multi-task training is performed on the parallel annotations of the resulting training set, and final scores are obtained by averaging annotator-specific head predictions. Neural language models are fine-tuned using this procedure in the context of the AcCompl-it shared task at EVALITA 2020, obtaining considerable improvements in prediction quality. △ Less

Submitted 10 November, 2020; originally announced November 2020.

Comments: 5 pages, Best system award for the AcCompl-It shared task at the EVALITA 2020 workshop

MSC Class: 68T50 ACM Class: I.2.7; J.5

Journal ref: Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020)

arXiv:2008.10875 [pdf, other]

doi 10.4000/ijcol.728

ETC-NLG: End-to-end Topic-Conditioned Natural Language Generation

Authors: Ginevra Carbone, Gabriele Sarti

Abstract: Plug-and-play language models (PPLMs) enable topic-conditioned natural language generation by pairing large pre-trained generators with attribute models used to steer the predicted token distribution towards the selected topic. Despite their computational efficiency, PPLMs require large amounts of labeled texts to effectively balance generation fluency and proper conditioning, making them unsuitab… ▽ More Plug-and-play language models (PPLMs) enable topic-conditioned natural language generation by pairing large pre-trained generators with attribute models used to steer the predicted token distribution towards the selected topic. Despite their computational efficiency, PPLMs require large amounts of labeled texts to effectively balance generation fluency and proper conditioning, making them unsuitable for low-resource settings. We present ETC-NLG, an approach leveraging topic modeling annotations to enable fully-unsupervised End-to-end Topic-Conditioned Natural Language Generation over emergent topics in unlabeled document collections. We first test the effectiveness of our approach in a low-resource setting for Italian, evaluating the conditioning for both topic models and gold annotations. We then perform a comparative evaluation of ETC-NLG for Italian and English using a parallel corpus. Finally, we propose an automatic approach to estimate the effectiveness of conditioning on the generated utterances. △ Less

Submitted 22 June, 2021; v1 submitted 25 August, 2020; originally announced August 2020.

Journal ref: Italian Journal of Computational Linguistics (IJCoL) 6-2 (2020) 61-77

Showing 1–16 of 16 results for author: Sarti, G