
Showing 1–23 of 23 results for author: Bentivogli, L

  1. arXiv:2406.14177  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Luisa Bentivogli

    Abstract: This paper describes the FBK's participation in the Simultaneous Translation Evaluation Campaign at IWSLT 2024. For this year's submission in the speech-to-text translation (ST) sub-track, we propose SimulSeamless, which is realized by combining AlignAtt and SeamlessM4T in its medium configuration. The SeamlessM4T model is used "off-the-shelf" and its simultaneous inference is enabled through the…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.06097  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Luisa Bentivogli

    Abstract: Streaming speech-to-text translation (StreamST) is the task of automatically translating speech while incrementally receiving an audio stream. Unlike simultaneous ST (SimulST), which deals with pre-segmented speech, StreamST faces the challenges of handling continuous and unbounded audio streams. This requires additional decisions about what to retain of the previous history, which is impractical…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 main conference

  3. arXiv:2405.10741  [pdf, other]

    cs.CL

    SBAAM! Eliminating Transcript Dependency in Automatic Subtitling

    Authors: Marco Gaido, Sara Papi, Matteo Negri, Mauro Cettolo, Luisa Bentivogli

    Abstract: Subtitling plays a crucial role in enhancing the accessibility of audiovisual content and encompasses three primary subtasks: translating spoken dialogue, segmenting translations into concise textual units, and estimating timestamps that govern their on-screen duration. Past attempts to automate this process rely, to varying degrees, on automatic transcripts, employed diversely for the three subta…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024 main conference

  4. arXiv:2405.08477  [pdf, other]

    cs.CL

    Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models

    Authors: Andrea Piergentili, Beatrice Savoldi, Matteo Negri, Luisa Bentivogli

    Abstract: Machine translation (MT) models are known to suffer from gender bias, especially when translating into languages with extensive gendered morphology. Accordingly, they still fall short in using gender-inclusive language, also representative of non-binary identities. In this paper, we look at gender-inclusive neomorphemes, neologistic elements that avoid binary gender markings as an approach towards…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted at EAMT 2024

  5. arXiv:2402.13208  [pdf, other]

    cs.CL cs.AI

    How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena

    Authors: Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli

    Abstract: The attention mechanism, a cornerstone of state-of-the-art neural models, faces computational hurdles in processing long sequences due to its quadratic complexity. Consequently, research efforts in the last few years focused on finding more efficient alternatives. Among them, Hyena (Poli et al., 2023) stands out for achieving competitive results in both language modeling and image classification,…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted at LREC-COLING 2024

  6. arXiv:2402.12025  [pdf, other]

    cs.CL

    Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?

    Authors: Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli

    Abstract: The field of natural language processing (NLP) has recently witnessed a transformative shift with the emergence of foundation models, particularly Large Language Models (LLMs) that have revolutionized text-based NLP. This paradigm has extended to other modalities, including speech, where researchers are actively exploring the combination of Speech Foundation Models (SFMs) and LLMs into single, uni…

    Submitted 17 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to the ACL 2024 main conference

  7. arXiv:2402.06041  [pdf, other]

    cs.CL

    A Prompt Response to the Demand for Automatic Gender-Neutral Translation

    Authors: Beatrice Savoldi, Andrea Piergentili, Dennis Fucci, Matteo Negri, Luisa Bentivogli

    Abstract: Gender-neutral translation (GNT) that avoids biased and undue binary assumptions is a pivotal challenge for the creation of more inclusive translation technologies. Advancements for this task in Machine Translation (MT), however, are hindered by the lack of dedicated parallel data, which are necessary to adapt MT systems to satisfy neutral constraints. For such a scenario, large language models of…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted at EACL 2024

  8. arXiv:2310.19345  [pdf, other]

    cs.CL

    Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES

    Authors: Beatrice Savoldi, Marco Gaido, Matteo Negri, Luisa Bentivogli

    Abstract: As part of the WMT-2023 "Test suites" shared task, in this paper we summarize the results of two test suites evaluations: MuST-SHE-WMT23 and INES. By focusing on the en-de and de-en language pairs, we rely on these newly created test suites to investigate systems' ability to translate feminine and masculine gender and produce gender-inclusive translations. Furthermore we discuss metrics associated…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted at WMT 2023

  9. arXiv:2310.15752  [pdf, other]

    cs.CL cs.AI

    Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection

    Authors: Dennis Fucci, Marco Gaido, Sara Papi, Mauro Cettolo, Matteo Negri, Luisa Bentivogli

    Abstract: When translating words referring to the speaker, speech translation (ST) systems should not resort to default masculine generics nor rely on potentially misleading vocal traits. Rather, they should assign gender according to the speakers' preference. The existing solutions to do so, though effective, are hardly feasible in practice as they involve dedicated model re-training on gender-labeled ST d…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023

  10. arXiv:2310.15114  [pdf, other]

    cs.CL

    How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation

    Authors: Marco Gaido, Dennis Fucci, Matteo Negri, Luisa Bentivogli

    Abstract: When translating from notional gender languages (e.g., English) into grammatical gender languages (e.g., Italian), the generated translation requires explicit gender assignments for various words, including those referring to the speaker. When the source sentence does not convey the speaker's gender, speech translation (ST) models either rely on the possibly-misleading vocal traits of the speaker…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: To appear in CLiC-it 2023

  11. arXiv:2310.06590  [pdf, ps, other]

    cs.CL

    No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation

    Authors: Dennis Fucci, Marco Gaido, Matteo Negri, Mauro Cettolo, Luisa Bentivogli

    Abstract: Automatic speech recognition (ASR) systems are known to be sensitive to the sociolinguistic variability of speech data, in which gender plays a crucial role. This can result in disparities in recognition accuracy between male and female speakers, primarily due to the under-representation of the latter group in the training data. While in the context of hybrid ASR models several solutions have been…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted at ASRU 2023

  12. arXiv:2310.05294  [pdf, other]

    cs.CL

    Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus

    Authors: Andrea Piergentili, Beatrice Savoldi, Dennis Fucci, Matteo Negri, Luisa Bentivogli

    Abstract: Gender inequality is embedded in our communication practices and perpetuated in translation technologies. This becomes particularly apparent when translating into grammatical gender languages, where machine translation (MT) often defaults to masculine and stereotypical representations by making undue binary gender assumptions. Our work addresses the rising demand for inclusive language by focusing…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023

  13. arXiv:2306.05882  [pdf]

    cs.CL

    Good, but not always Fair: An Evaluation of Gender Bias for three commercial Machine Translation Systems

    Authors: Silvia Alma Piazzolla, Beatrice Savoldi, Luisa Bentivogli

    Abstract: Machine Translation (MT) continues to make significant strides in quality and is increasingly adopted on a larger scale. Consequently, analyses have been redirected to more nuanced aspects, intricate phenomena, as well as potential risks that may arise from the widespread use of MT tools. Along this line, this paper offers a meticulous assessment of three commercial MT systems - Google Translate,…

    Submitted 26 March, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Journal ref: Hermes Journal of Language and Communication in Business no 63 2023

  14. arXiv:2301.10075  [pdf, other]

    cs.CL

    Gender Neutralization for an Inclusive Machine Translation: from Theoretical Foundations to Open Challenges

    Authors: Andrea Piergentili, Dennis Fucci, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri

    Abstract: Gender inclusivity in language technologies has become a prominent research topic. In this study, we explore gender-neutral translation (GNT) as a form of gender inclusivity and a goal to be achieved by machine translation (MT) models, which have been found to perpetuate gender bias and discrimination. Specifically, we focus on translation from English into Italian, a language pair representative…

    Submitted 4 July, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

    Comments: Accepted at the GITT workshop @ EAMT 2023

  15. arXiv:2203.09866  [pdf, other]

    cs.CL

    Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation

    Authors: Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Matteo Negri, Marco Turchi

    Abstract: Gender bias is largely recognized as a problematic phenomenon affecting language technologies, with recent studies underscoring that it might surface differently across languages. However, most of current evaluation practices adopt a word-level focus on a narrow set of occupational nouns under synthetic conditions. Such protocols overlook key features of grammatical gender languages, which are cha…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: Accepted at ACL 2022

  16. arXiv:2109.07439  [pdf, other]

    cs.CL

    Is "moby dick" a Whale or a Bird? Named Entities and Terminology in Speech Translation

    Authors: Marco Gaido, Susana Rodríguez, Matteo Negri, Luisa Bentivogli, Marco Turchi

    Abstract: Automatic translation systems are known to struggle with rare words. Among these, named entities (NEs) and domain-specific terms are crucial, since errors in their translation can lead to severe meaning distortions. Despite their importance, previous speech translation (ST) studies have neglected them, also due to the dearth of publicly available resources tailored to their specific evaluation. To…

    Submitted 15 September, 2021; originally announced September 2021.

    Comments: Accepted at EMNLP2021

  17. arXiv:2106.01045  [pdf, other]

    cs.CL

    Cascade versus Direct Speech Translation: Do the Differences Still Make a Difference?

    Authors: Luisa Bentivogli, Mauro Cettolo, Marco Gaido, Alina Karakanta, Alberto Martinelli, Matteo Negri, Marco Turchi

    Abstract: Five years after the first published proofs of concept, direct approaches to speech translation (ST) are now competing with traditional cascade solutions. In light of this steady progress, can we claim that the performance gap between the two is closed? Starting from this question, we present a systematic comparison between state-of-the-art systems representative of the two paradigms. Focusing on…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: Accepted at ACL2021

  18. arXiv:2105.13782  [pdf, other]

    cs.CL

    How to Split: the Effect of Word Segmentation on Gender Bias in Speech Translation

    Authors: Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri, Marco Turchi

    Abstract: Having recognized gender bias as a major issue affecting current translation technologies, researchers have primarily attempted to mitigate it by working on the data front. However, whether algorithmic aspects concur to exacerbate unwanted outputs remains so far under-investigated. In this work, we bring the analysis on gender bias in automatic translation onto a seemingly neutral yet critical com…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: Accepted in Findings of ACL 2021

  19. arXiv:2104.06001  [pdf, other]

    cs.CL

    Gender Bias in Machine Translation

    Authors: Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Matteo Negri, Marco Turchi

    Abstract: Machine translation (MT) technology has facilitated our daily tasks by providing accessible shortcuts for gathering, elaborating and communicating information. However, it can suffer from biases that harm users and society at large. As a relatively new field of inquiry, gender bias in MT still lacks internal cohesion, which advocates for a unified framework to ease future research. To this end, we…

    Submitted 7 May, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2021

  20. arXiv:2012.04955  [pdf, ps, other]

    cs.CL

    Breeding Gender-aware Direct Speech Translation Systems

    Authors: Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri, Marco Turchi

    Abstract: In automatic speech translation (ST), traditional cascade approaches involving separate transcription and translation steps are giving ground to increasingly competitive and more robust direct solutions. In particular, by translating speech audio data without intermediate transcription, direct ST models are able to leverage and preserve essential information present in the input (e.g. speaker's vo…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: Outstanding paper at COLING 2020

    Journal ref: In Proceedings of the 28th International Conference on Computational Linguistics, Dec 2020, 3951-3964. Online

  21. arXiv:2006.05754  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus

    Authors: Luisa Bentivogli, Beatrice Savoldi, Matteo Negri, Mattia Antonino Di Gangi, Roldano Cattoni, Marco Turchi

    Abstract: Translating from languages without productive grammatical gender like English into gender-marked languages is a well-known difficulty for machines. This difficulty is also due to the fact that the training data on which models are built typically reflect the asymmetries of natural languages, gender bias included. Exclusively fed with textual data, machine translation is intrinsically constrained b…

    Submitted 10 June, 2020; originally announced June 2020.

    Comments: 9 pages of content, accepted at ACL 2020

  22. arXiv:1910.00478  [pdf, other]

    cs.CL

    Machine Translation for Machines: the Sentiment Classification Use Case

    Authors: Amirhossein Tebbifakhr, Luisa Bentivogli, Matteo Negri, Marco Turchi

    Abstract: We propose a neural machine translation (NMT) approach that, instead of pursuing adequacy and fluency ("human-oriented" quality criteria), aims to generate translations that are best suited as input to a natural language processing component designed for a specific downstream task (a "machine-oriented" criterion). Towards this objective, we present a reinforcement learning technique based on a new…

    Submitted 1 October, 2019; originally announced October 2019.

  23. arXiv:1608.04631  [pdf, other]

    cs.CL

    Neural versus Phrase-Based Machine Translation Quality: a Case Study

    Authors: Luisa Bentivogli, Arianna Bisazza, Mauro Cettolo, Marcello Federico

    Abstract: Within the field of Statistical Machine Translation (SMT), the neural approach (NMT) has recently emerged as the first technology able to challenge the long-standing dominance of phrase-based approaches (PBMT). In particular, at the IWSLT 2015 evaluation campaign, NMT outperformed well established state-of-the-art PBMT systems on English-German, a language pair known to be particularly hard becaus…

    Submitted 9 October, 2016; v1 submitted 16 August, 2016; originally announced August 2016.

    Comments: Conference on Empirical Methods in Natural Language Processing (EMNLP), November 1-5, 2016, Austin, Texas, USA