Showing 1–2 of 2 results for author: Scalena, D

  1. arXiv:2406.17563  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-property Steering of Large Language Models with Dynamic Activation Composition

    Authors: Daniel Scalena, Gabriele Sarti, Malvina Nissim

    Abstract: Activation steering methods were shown to be effective in conditioning language model generation by additively intervening over models' intermediate representations. However, the evaluation of these techniques has so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies, highlighting…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2309.00751  [pdf, other]

    cs.CL

    Let the Models Respond: Interpreting Language Model Detoxification Through the Lens of Prompt Dependence

    Authors: Daniel Scalena, Gabriele Sarti, Malvina Nissim, Elisabetta Fersini

    Abstract: Due to language models' propensity to generate toxic or hateful responses, several techniques were developed to align model generations with users' preferences. Despite the effectiveness of such methods in improving the safety of model interactions, their impact on models' internal processes is still poorly understood. In this work, we apply popular detoxification approaches to several language mo…

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: 4 pages