Showing 1–16 of 16 results for author: Sarti, G

  1. arXiv:2406.17563

    cs.CL cs.AI cs.LG

    Multi-property Steering of Large Language Models with Dynamic Activation Composition

    Authors: Daniel Scalena, Gabriele Sarti, Malvina Nissim

    Abstract: Activation steering methods were shown to be effective in conditioning language model generation by additively intervening over models' intermediate representations. However, the evaluation of these techniques has so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies, highlighting…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.13663

    cs.CL cs.AI cs.LG

    Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation

    Authors: Jirui Qi, Gabriele Sarti, Raquel Fernández, Arianna Bisazza

    Abstract: Ensuring the verifiability of model answers is a fundamental challenge for retrieval-augmented generation (RAG) in the question answering (QA) domain. Recently, self-citation prompting was proposed to make large language models (LLMs) generate citations to supporting documents along with their answers. However, self-citing LLMs often struggle to match the required format, refer to non-existent sou…

    Submitted 1 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Under review. Code and data released at https://github.com/Betswish/MIRAGE

  3. arXiv:2405.00208

    cs.CL

    A Primer on the Inner Workings of Transformer-based Language Models

    Authors: Javier Ferrando, Gabriele Sarti, Arianna Bisazza, Marta R. Costa-jussà

    Abstract: The rapid progress of research aimed at interpreting the inner workings of advanced language models has highlighted a need for contextualizing the insights gained from years of work in this area. This primer provides a concise technical introduction to the current techniques used to interpret the inner workings of Transformer-based language models, focusing on the generative decoder-only architect…

    Submitted 1 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

  4. arXiv:2310.03686

    cs.CL

    DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

    Authors: Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, Jaap Jumelet

    Abstract: In recent years, many interpretability methods have been proposed to help interpret the internal states of Transformer-models, at different levels of precision and complexity. Here, to analyze encoder-decoder Transformers, we propose a simple, new method: DecoderLens. Inspired by the LogitLens (for decoder-only Transformers), this method involves allowing the decoder to cross-attend representation…

    Submitted 3 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of NAACL 2024

  5. arXiv:2310.01188

    cs.CL cs.AI cs.HC cs.LG

    Quantifying the Plausibility of Context Reliance in Neural Machine Translation

    Authors: Gabriele Sarti, Grzegorz Chrupała, Malvina Nissim, Arianna Bisazza

    Abstract: Establishing whether language models can use contextual information in a human-plausible way is important to ensure their trustworthiness in real-world settings. However, the questions of when and which parts of the context affect model generations are typically tackled separately, with current plausibility evaluations being practically limited to a handful of artificial benchmarks. To address thi…

    Submitted 13 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 Camera Ready. Code: https://github.com/gsarti/pecore. Artifacts: https://huggingface.co/collections/gsarti/pecore-iclr-2024-65edab42e28439e21b612c2e

    ACM Class: I.2.7

  6. arXiv:2309.00751

    cs.CL

    Let the Models Respond: Interpreting Language Model Detoxification Through the Lens of Prompt Dependence

    Authors: Daniel Scalena, Gabriele Sarti, Malvina Nissim, Elisabetta Fersini

    Abstract: Due to language models' propensity to generate toxic or hateful responses, several techniques were developed to align model generations with users' preferences. Despite the effectiveness of such methods in improving the safety of model interactions, their impact on models' internal processes is still poorly understood. In this work, we apply popular detoxification approaches to several language mo…

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: 4 pages

  7. RAMP: Retrieval and Attribute-Marking Enhanced Prompting for Attribute-Controlled Translation

    Authors: Gabriele Sarti, Phu Mon Htut, Xing Niu, Benjamin Hsu, Anna Currey, Georgiana Dinu, Maria Nadejde

    Abstract: Attribute-controlled translation (ACT) is a subtask of machine translation that involves controlling stylistic or linguistic attributes (like formality and gender) of translation outputs. While ACT has garnered attention in recent years due to its usefulness in real-world applications, progress in the task is currently limited by dataset availability, since most prior approaches rely on supervised…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023

    Journal ref: Proceedings of ACL (2023) 1476-1490

  8. Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation

    Authors: Lukas Edman, Gabriele Sarti, Antonio Toral, Gertjan van Noord, Arianna Bisazza

    Abstract: Pretrained character-level and byte-level language models have been shown to be competitive with popular subword models across a range of Natural Language Processing (NLP) tasks. However, there has been little research on their effectiveness for neural machine translation (NMT), particularly within the popular pretrain-then-finetune paradigm. This work performs an extensive comparison across multi…

    Submitted 26 January, 2024; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: This version of our work is a pre-MIT Press publication version

  9. arXiv:2302.13942

    cs.CL cs.AI cs.HC cs.LG

    Inseq: An Interpretability Toolkit for Sequence Generation Models

    Authors: Gabriele Sarti, Nils Feldhus, Ludwig Sickert, Oskar van der Wal, Malvina Nissim, Arianna Bisazza

    Abstract: Past work in natural language processing interpretability focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python library to democratize access to interpretability analyses of sequence generation models. Inseq enables intuitive and optimized extraction of models' internal infor…

    Submitted 27 May, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: ACL 2023 Demo Track. Library: https://github.com/inseq-team/inseq, Docs: https://inseq.readthedocs.io, v0.4

    Journal ref: Proceedings of ACL: System Demonstrations (2023) 421-435

  10. DivEMT: Neural Machine Translation Post-Editing Effort Across Typologically Diverse Languages

    Authors: Gabriele Sarti, Arianna Bisazza, Ana Guerberof Arenas, Antonio Toral

    Abstract: We introduce DivEMT, the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages. Using a strictly controlled setup, 18 professional translators were instructed to translate or post-edit the same set of English documents into Arabic, Dutch, Italian, Turkish, Ukrainian, and Vietnamese. During the process, their edits, keys…

    Submitted 18 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022, materials: https://github.com/gsarti/divemt

    Journal ref: Proceedings of EMNLP (2022) 7795-7816

  11. arXiv:2203.03759

    cs.CL

    IT5: Text-to-text Pretraining for Italian Language Understanding and Generation

    Authors: Gabriele Sarti, Malvina Nissim

    Abstract: We introduce IT5, the first family of encoder-decoder transformer models pretrained specifically on Italian. We document and perform a thorough cleaning procedure for a large Italian corpus and use it to pretrain four IT5 model sizes. We then introduce the ItaGen benchmark, which includes a broad range of natural language understanding and generation tasks for Italian, and use it to evaluate the p…

    Submitted 20 May, 2024; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: LREC-COLING 2024. Code and checkpoints: https://github.com/gsarti/it5

    Journal ref: Proceedings of LREC-COLING (2024) 9422-9433

  12. arXiv:2108.08688

    cs.CL cs.CV

    Contrastive Language-Image Pre-training for the Italian Language

    Authors: Federico Bianchi, Giuseppe Attanasio, Raphael Pisoni, Silvia Terragni, Gabriele Sarti, Sri Lakshmi

    Abstract: CLIP (Contrastive Language-Image Pre-training) is a very recent multi-modal model that jointly learns representations of images and texts. The model is trained on a massive amount of English data and shows impressive performance on zero-shot classification tasks. Training the same model on a different language is not trivial, since data in other languages might be not enough and the model needs hi…

    Submitted 19 August, 2021; originally announced August 2021.

  13. Teaching NLP with Bracelets and Restaurant Menus: An Interactive Workshop for Italian Students

    Authors: Ludovica Pannitto, Lucia Busso, Claudia Roberta Combei, Lucio Messina, Alessio Miaschi, Gabriele Sarti, Malvina Nissim

    Abstract: Although Natural Language Processing (NLP) is at the core of many tools young people use in their everyday life, high school curricula (in Italy) do not include any computational linguistics education. This lack of exposure makes the use of such tools less responsible than it could be and makes choosing computational linguistics as a university degree unlikely. To raise awareness, curiosity, and l…

    Submitted 14 May, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

    Comments: 11 pages, 16 figures, accepted at Teaching NLP 2021 Workshop

    Journal ref: Proceedings of the 5th Workshop on Teaching NLP (2021) 160-170

  14. A dissemination workshop for introducing young Italian students to NLP

    Authors: Lucio Messina, Lucia Busso, Claudia Roberta Combei, Ludovica Pannitto, Alessio Miaschi, Gabriele Sarti, Malvina Nissim

    Abstract: We describe and make available the game-based material developed for a laboratory run at several Italian science festivals to popularize NLP among young students.

    Submitted 14 May, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

    Comments: 3 pages, 4 figures, accepted at Teaching NLP 2021 workshop

    Journal ref: Proceedings of the 5th Workshop on Teaching NLP (2021) 52-54

  15. arXiv:2011.05197

    cs.CL cs.LG

    UmBERTo-MTSA @ AcCompl-It: Improving Complexity and Acceptability Prediction with Multi-task Learning on Self-Supervised Annotations

    Authors: Gabriele Sarti

    Abstract: This work describes a self-supervised data augmentation approach used to improve learning models' performances when only a moderate amount of labeled data is available. Multiple copies of the original model are initially trained on the downstream task. Their predictions are then used to annotate a large set of unlabeled examples. Finally, multi-task training is performed on the parallel annotation…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: 5 pages, Best system award for the AcCompl-It shared task at the EVALITA 2020 workshop

    MSC Class: 68T50 ACM Class: I.2.7; J.5

    Journal ref: Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020)

  16. ETC-NLG: End-to-end Topic-Conditioned Natural Language Generation

    Authors: Ginevra Carbone, Gabriele Sarti

    Abstract: Plug-and-play language models (PPLMs) enable topic-conditioned natural language generation by pairing large pre-trained generators with attribute models used to steer the predicted token distribution towards the selected topic. Despite their computational efficiency, PPLMs require large amounts of labeled texts to effectively balance generation fluency and proper conditioning, making them unsuitab…

    Submitted 22 June, 2021; v1 submitted 25 August, 2020; originally announced August 2020.

    Journal ref: Italian Journal of Computational Linguistics (IJCoL) 6-2 (2020) 61-77