
Showing 1–50 of 68 results for author: Negri, M

Searching in archive cs.
  1. arXiv:2406.14177

    cs.CL cs.AI cs.SD eess.AS

    SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Luisa Bentivogli

    Abstract: This paper describes FBK's participation in the Simultaneous Translation Evaluation Campaign at IWSLT 2024. For this year's submission in the speech-to-text translation (ST) sub-track, we propose SimulSeamless, which is realized by combining AlignAtt and SeamlessM4T in its medium configuration. The SeamlessM4T model is used "off-the-shelf" and its simultaneous inference is enabled through the…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.09116

    cs.LG stat.ML

    Injective Flows for parametric hypersurfaces

    Authors: Marcello Massimo Negri, Jonathan Aellen, Volker Roth

    Abstract: Normalizing Flows (NFs) are powerful and efficient models for density estimation. When modeling densities on manifolds, NFs can be generalized to injective flows, but the Jacobian determinant becomes computationally prohibitive. Current approaches either consider bounds on the log-likelihood or rely on some approximations of the Jacobian determinant. In contrast, we propose injective flows for para…

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2406.06097

    cs.SD cs.AI cs.CL eess.AS

    StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Luisa Bentivogli

    Abstract: Streaming speech-to-text translation (StreamST) is the task of automatically translating speech while incrementally receiving an audio stream. Unlike simultaneous ST (SimulST), which deals with pre-segmented speech, StreamST faces the challenges of handling continuous and unbounded audio streams. This requires additional decisions about what to retain of the previous history, which is impractical…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 main conference

  4. arXiv:2406.03881

    cs.CL

    Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

    Authors: Matthias Sperber, Ondřej Bojar, Barry Haddow, Dávid Javorský, Xutai Ma, Matteo Negri, Jan Niehues, Peter Polák, Elizabeth Salesky, Katsuhito Sudoh, Marco Turchi

    Abstract: Human evaluation is a critical component in machine translation system development and has received much attention in text translation research. However, little prior work exists on the topic of human evaluation for speech translation, which poses additional challenges such as noisy data and segmentation mismatches. We take first steps to fill this gap by conducting a comprehensive human evaluation…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: LREC-COLING2024 publication (with corrections for Table 3)

    Journal ref: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

  5. arXiv:2405.10741

    cs.CL

    SBAAM! Eliminating Transcript Dependency in Automatic Subtitling

    Authors: Marco Gaido, Sara Papi, Matteo Negri, Mauro Cettolo, Luisa Bentivogli

    Abstract: Subtitling plays a crucial role in enhancing the accessibility of audiovisual content and encompasses three primary subtasks: translating spoken dialogue, segmenting translations into concise textual units, and estimating timestamps that govern their on-screen duration. Past attempts to automate this process rely, to varying degrees, on automatic transcripts, employed diversely for the three subta…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024 main conference

  6. arXiv:2405.08477

    cs.CL

    Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models

    Authors: Andrea Piergentili, Beatrice Savoldi, Matteo Negri, Luisa Bentivogli

    Abstract: Machine translation (MT) models are known to suffer from gender bias, especially when translating into languages with extensive gendered morphology. Accordingly, they still fall short in using gender-inclusive language, also representative of non-binary identities. In this paper, we look at gender-inclusive neomorphemes, neologistic elements that avoid binary gender markings as an approach towards…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted at EAMT 2024

  7. arXiv:2402.13208

    cs.CL cs.AI

    How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena

    Authors: Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli

    Abstract: The attention mechanism, a cornerstone of state-of-the-art neural models, faces computational hurdles in processing long sequences due to its quadratic complexity. Consequently, research efforts in the last few years have focused on finding more efficient alternatives. Among them, Hyena (Poli et al., 2023) stands out for achieving competitive results in both language modeling and image classification,…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted at LREC-COLING 2024

  8. arXiv:2402.12025

    cs.CL

    Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?

    Authors: Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli

    Abstract: The field of natural language processing (NLP) has recently witnessed a transformative shift with the emergence of foundation models, particularly Large Language Models (LLMs) that have revolutionized text-based NLP. This paradigm has extended to other modalities, including speech, where researchers are actively exploring the combination of Speech Foundation Models (SFMs) and LLMs into single, uni…

    Submitted 17 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to the ACL 2024 main conference

  9. arXiv:2402.06041

    cs.CL

    A Prompt Response to the Demand for Automatic Gender-Neutral Translation

    Authors: Beatrice Savoldi, Andrea Piergentili, Dennis Fucci, Matteo Negri, Luisa Bentivogli

    Abstract: Gender-neutral translation (GNT) that avoids biased and undue binary assumptions is a pivotal challenge for the creation of more inclusive translation technologies. Advancements for this task in Machine Translation (MT), however, are hindered by the lack of dedicated parallel data, which are necessary to adapt MT systems to satisfy neutral constraints. For such a scenario, large language models of…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted at EACL 2024

  10. arXiv:2310.19345

    cs.CL

    Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES

    Authors: Beatrice Savoldi, Marco Gaido, Matteo Negri, Luisa Bentivogli

    Abstract: As part of the WMT-2023 "Test suites" shared task, in this paper we summarize the results of two test suites evaluations: MuST-SHE-WMT23 and INES. By focusing on the en-de and de-en language pairs, we rely on these newly created test suites to investigate systems' ability to translate feminine and masculine gender and produce gender-inclusive translations. Furthermore, we discuss metrics associated…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted at WMT 2023

  11. arXiv:2310.15752

    cs.CL cs.AI

    Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection

    Authors: Dennis Fucci, Marco Gaido, Sara Papi, Mauro Cettolo, Matteo Negri, Luisa Bentivogli

    Abstract: When translating words referring to the speaker, speech translation (ST) systems should not resort to default masculine generics nor rely on potentially misleading vocal traits. Rather, they should assign gender according to the speakers' preference. The existing solutions to do so, though effective, are hardly feasible in practice as they involve dedicated model re-training on gender-labeled ST d…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023

  12. arXiv:2310.15114

    cs.CL

    How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation

    Authors: Marco Gaido, Dennis Fucci, Matteo Negri, Luisa Bentivogli

    Abstract: When translating from notional gender languages (e.g., English) into grammatical gender languages (e.g., Italian), the generated translation requires explicit gender assignments for various words, including those referring to the speaker. When the source sentence does not convey the speaker's gender, speech translation (ST) models either rely on the possibly-misleading vocal traits of the speaker…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: To appear in CLiC-it 2023

  13. arXiv:2310.06590

    cs.CL

    No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation

    Authors: Dennis Fucci, Marco Gaido, Matteo Negri, Mauro Cettolo, Luisa Bentivogli

    Abstract: Automatic speech recognition (ASR) systems are known to be sensitive to the sociolinguistic variability of speech data, in which gender plays a crucial role. This can result in disparities in recognition accuracy between male and female speakers, primarily due to the under-representation of the latter group in the training data. While in the context of hybrid ASR models several solutions have been…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted at ASRU 2023

  14. arXiv:2310.05294

    cs.CL

    Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus

    Authors: Andrea Piergentili, Beatrice Savoldi, Dennis Fucci, Matteo Negri, Luisa Bentivogli

    Abstract: Gender inequality is embedded in our communication practices and perpetuated in translation technologies. This becomes particularly apparent when translating into grammatical gender languages, where machine translation (MT) often defaults to masculine and stereotypical representations by making undue binary gender assumptions. Our work addresses the rising demand for inclusive language by focusing…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023

  15. arXiv:2309.15554

    cs.CL cs.AI cs.SD eess.AS

    Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023

    Authors: Sara Papi, Marco Gaido, Matteo Negri

    Abstract: This paper describes FBK's participation in the Simultaneous Translation and Automatic Subtitling tracks of the IWSLT 2023 Evaluation Campaign. Our submission focused on the use of direct architectures to perform both tasks: for the simultaneous one, we leveraged the knowledge already acquired by offline-trained models and directly applied a policy to obtain the real-time inference; for the su…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Published at IWSLT 2023

    Journal ref: Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

  16. arXiv:2306.07255

    cs.LG stat.ML

    Conditional Matrix Flows for Gaussian Graphical Models

    Authors: Marcello Massimo Negri, F. Arend Torres, Volker Roth

    Abstract: Studying conditional independence among many variables with few observations is a challenging task. Gaussian Graphical Models (GGMs) tackle this problem by encouraging sparsity in the precision matrix through $l_q$ regularization with $q\leq1$. However, most GGMs rely on the $l_1$ norm because the objective is highly non-convex for sub-$l_1$ pseudo-norms. In the frequentist formulation, the $l_1$…

    Submitted 16 November, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: NeurIPS23 version

  17. arXiv:2305.16846

    cs.LG physics.data-an physics.flu-dyn stat.ML

    Lagrangian Flow Networks for Conservation Laws

    Authors: F. Arend Torres, Marcello Massimo Negri, Marco Inversi, Jonathan Aellen, Volker Roth

    Abstract: We introduce Lagrangian Flow Networks (LFlows) for modeling fluid densities and velocities continuously in space and time. By construction, the proposed LFlows satisfy the continuity equation, a PDE describing mass conservation in its differentiable form. Our model is based on the insight that solutions to the continuity equation can be expressed as time-dependent density transformations via diffe…

    Submitted 13 December, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

  18. arXiv:2305.11408

    cs.CL cs.LG cs.SD eess.AS

    AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation

    Authors: Sara Papi, Marco Turchi, Matteo Negri

    Abstract: Attention is the core mechanism of today's most used architectures for natural language processing and has been analyzed from many perspectives, including its effectiveness for machine translation-related tasks. Among these studies, attention proved to be a useful source of insights about word alignment, even when the input text is replaced by audio segments, as in the cas…

    Submitted 19 July, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted at Interspeech 2023

    Journal ref: Proceedings of INTERSPEECH 2023

  19. arXiv:2303.16880

    cond-mat.dis-nn cs.LG

    Storage and Learning phase transitions in the Random-Features Hopfield Model

    Authors: Matteo Negri, Clarissa Lauditi, Gabriele Perugini, Carlo Lucibello, Enrico Malatesta

    Abstract: The Hopfield model is a paradigmatic model of neural networks that has been analyzed for many decades in the statistical physics, neuroscience, and machine learning communities. Inspired by the manifold hypothesis in machine learning, we propose and investigate a generalization of the standard setting that we name Random-Features Hopfield Model. Here $P$ binary patterns of length $N$ are generated…

    Submitted 28 April, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

  20. arXiv:2303.16166

    cs.CL cs.AI

    When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP

    Authors: Sara Papi, Marco Gaido, Andrea Pilzer, Matteo Negri

    Abstract: Despite its crucial role in research experiments, code correctness is often presumed only on the basis of the perceived quality of results. This assumption comes with the risk of erroneous outcomes and potentially misleading findings. To address this issue, we posit that the current focus on reproducibility should go hand in hand with the emphasis on software quality. We present a case study in wh…

    Submitted 15 August, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

  21. arXiv:2301.10075

    cs.CL

    Gender Neutralization for an Inclusive Machine Translation: from Theoretical Foundations to Open Challenges

    Authors: Andrea Piergentili, Dennis Fucci, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri

    Abstract: Gender inclusivity in language technologies has become a prominent research topic. In this study, we explore gender-neutral translation (GNT) as a form of gender inclusivity and a goal to be achieved by machine translation (MT) models, which have been found to perpetuate gender bias and discrimination. Specifically, we focus on translation from English into Italian, a language pair representative…

    Submitted 4 July, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

    Comments: Accepted at the GITT workshop @ EAMT 2023

  22. Attention as a Guide for Simultaneous Speech Translation

    Authors: Sara Papi, Matteo Negri, Marco Turchi

    Abstract: The study of the attention mechanism has sparked interest in many fields, such as language modeling and machine translation. Although its patterns have been exploited to perform different tasks, from neural network understanding to textual alignment, no previous work has analysed the encoder-decoder attention behavior in speech translation (ST) nor used it to improve ST on a specific task. In this…

    Submitted 11 May, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: Accepted to ACL 2023

    Journal ref: Proceedings of ACL 2023

  23. Joint Speech Translation and Named Entity Recognition

    Authors: Marco Gaido, Sara Papi, Matteo Negri, Marco Turchi

    Abstract: Modern automatic translation systems aim to place the human at the center by providing contextual support and knowledge. In this context, a critical task is enriching the output with information regarding the mentioned entities, which is currently achieved by processing the generated translation with named entity recognition (NER) and entity linking systems. In light of the recent promising results s…

    Submitted 20 May, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: Accepted at INTERSPEECH 2023

  24. arXiv:2209.13192

    cs.CL

    Direct Speech Translation for Automatic Subtitling

    Authors: Sara Papi, Marco Gaido, Alina Karakanta, Mauro Cettolo, Matteo Negri, Marco Turchi

    Abstract: Automatic subtitling is the task of automatically translating the speech of audiovisual content into short pieces of timed text, i.e. subtitles and their corresponding timestamps. The generated subtitles need to conform to space and time requirements, while being synchronised with the speech and segmented in a way that facilitates comprehension. Given its considerable complexity, the task has so f…

    Submitted 25 July, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted at TACL

  25. arXiv:2209.10608

    cs.CL

    Dodging the Data Bottleneck: Automatic Subtitling with Automatically Segmented ST Corpora

    Authors: Sara Papi, Alina Karakanta, Matteo Negri, Marco Turchi

    Abstract: Speech translation for subtitling (SubST) is the task of automatically translating speech data into well-formed subtitles by inserting subtitle breaks compliant with specific display guidelines. Similar to speech translation (ST), model training requires parallel data comprising audio inputs paired with their textual translations. In SubST, however, the text also has to be annotated with subtitle…

    Submitted 16 November, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

    Journal ref: AACL 2022

  26. Over-Generation Cannot Be Rewarded: Length-Adaptive Average Lagging for Simultaneous Speech Translation

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

    Abstract: Simultaneous speech translation (SimulST) systems aim at generating their output with the lowest possible latency, which is normally computed in terms of Average Lagging (AL). In this paper we highlight that, despite its widespread adoption, AL provides underestimated scores for systems that generate longer predictions compared to the corresponding references. We also show that this problem has pr…

    Submitted 20 June, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: AutoSimTrans Workshop @ NAACL2022

    Journal ref: Proceedings of the Third Workshop on Automatic Simultaneous Translation (AutoSimTrans 2022)

  27. arXiv:2206.01545

    cs.LG stat.ML

    Mesh-free Eulerian Physics-Informed Neural Networks

    Authors: Fabricio Arend Torres, Marcello Massimo Negri, Monika Nagy-Huber, Maxim Samarin, Volker Roth

    Abstract: Physics-informed Neural Networks (PINNs) have recently emerged as a principled way to include prior physical knowledge in the form of partial differential equations (PDEs) into neural networks. Although PINNs are generally viewed as mesh-free, current approaches still rely on collocation points within a bounded region, even in settings with spatially sparse signals. Furthermore, if the boundaries are…

    Submitted 1 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: Preprint

  28. arXiv:2205.06755

    cs.CL

    Who Are We Talking About? Handling Person Names in Speech Translation

    Authors: Marco Gaido, Matteo Negri, Marco Turchi

    Abstract: Recent work has shown that systems for speech translation (ST) -- similarly to automatic speech recognition (ASR) -- poorly handle person names. This shortcoming not only leads to errors that can seriously distort the meaning of the input, but also hinders the adoption of such systems in application scenarios (like computer-assisted interpreting) where the translation of named entities, like p…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: Accepted at IWSLT2022

  29. Efficient yet Competitive Speech Translation: FBK@IWSLT2022

    Authors: Marco Gaido, Sara Papi, Dennis Fucci, Giuseppe Fiameni, Matteo Negri, Marco Turchi

    Abstract: The primary goal of FBK's system submission to the IWSLT 2022 offline and simultaneous speech translation tasks is to reduce model training costs without sacrificing translation quality. As such, we first question the need for ASR pre-training, showing that it is not essential to achieve competitive results. Second, we focus on data filtering, showing that a simple method that looks at the ra…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: IWSLT 2022 System Description

    Journal ref: Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

  30. Does Simultaneous Speech Translation need Simultaneous Models?

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

    Abstract: In simultaneous speech translation (SimulST), finding the best trade-off between high translation quality and low latency is a challenging task. To meet the latency constraints posed by the different application scenarios, multiple dedicated SimulST models are usually trained and maintained, generating high computational costs. In this paper, motivated by the increased social and environmental imp…

    Submitted 16 November, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Findings of EMNLP 2022

    Journal ref: Findings of the Association for Computational Linguistics: EMNLP 2022

  31. arXiv:2203.09866

    cs.CL

    Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation

    Authors: Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Matteo Negri, Marco Turchi

    Abstract: Gender bias is largely recognized as a problematic phenomenon affecting language technologies, with recent studies underscoring that it might surface differently across languages. However, most current evaluation practices adopt a word-level focus on a narrow set of occupational nouns under synthetic conditions. Such protocols overlook key features of grammatical gender languages, which are cha…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: Accepted at ACL 2022

  32. arXiv:2111.00514

    cs.CL

    Visualization: the missing factor in Simultaneous Speech Translation

    Authors: Sara Papi, Matteo Negri, Marco Turchi

    Abstract: Simultaneous speech translation (SimulST) is the task in which output generation has to be performed on partial, incremental speech input. In recent years, SimulST has become popular due to the spread of cross-lingual application scenarios, like international live conferences and streaming lectures, in which on-the-fly speech translation can facilitate users' access to audio-visual content. In thi…

    Submitted 8 November, 2021; v1 submitted 31 October, 2021; originally announced November 2021.

    Comments: Accepted at CLiC-it 2021

    Journal ref: Italian Conference on Computational Linguistics 2021

  33. arXiv:2109.07439

    cs.CL

    Is "moby dick" a Whale or a Bird? Named Entities and Terminology in Speech Translation

    Authors: Marco Gaido, Susana Rodríguez, Matteo Negri, Luisa Bentivogli, Marco Turchi

    Abstract: Automatic translation systems are known to struggle with rare words. Among these, named entities (NEs) and domain-specific terms are crucial, since errors in their translation can lead to severe meaning distortions. Despite their importance, previous speech translation (ST) studies have neglected them, also due to the dearth of publicly available resources tailored to their specific evaluation. To…

    Submitted 15 September, 2021; originally announced September 2021.

    Comments: Accepted at EMNLP2021

  34. Speechformer: Reducing Information Loss in Direct Speech Translation

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

    Abstract: Transformer-based models have gained increasing popularity, achieving state-of-the-art performance in many research fields including speech translation. However, Transformer's quadratic complexity with respect to the input sequence length prevents its adoption as is with audio signals, which are typically represented by long sequences. Current solutions resort to an initial sub-optimal compression…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: Accepted to EMNLP 2021 Main Conference

    Journal ref: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

  35. arXiv:2107.08807

    cs.CL

    Simultaneous Speech Translation for Live Subtitling: from Delay to Display

    Authors: Alina Karakanta, Sara Papi, Matteo Negri, Marco Turchi

    Abstract: With the increased audiovisualisation of communication, the need for live subtitles in multilingual events is more relevant than ever. In an attempt to automatise the process, we aim at exploring the feasibility of simultaneous speech translation (SimulST) for live subtitling. However, the word-for-word rate of generation of SimulST systems is not optimal for displaying the subtitles in a comprehe…

    Submitted 20 July, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

    Journal ref: Proceedings of the 1st Workshop on Automatic Spoken Language Translation in Real-World Settings (ASLTRW 2021)

  36. arXiv:2107.06246

    cs.CL

    Between Flexibility and Consistency: Joint Generation of Captions and Subtitles

    Authors: Alina Karakanta, Marco Gaido, Matteo Negri, Marco Turchi

    Abstract: Speech translation (ST) has lately received growing interest for the generation of subtitles without the need for an intermediate source language transcription and timing (i.e. captions). However, the joint generation of source captions and target subtitles not only brings potential output quality advantages when the two decoding processes inform each other, but is also often required in mu…

    Submitted 13 July, 2021; originally announced July 2021.

    Comments: Accepted at IWSLT 2021

  37. arXiv:2106.12607

    cs.CL cs.SD eess.AS

    Dealing with training and test segmentation mismatch: FBK@IWSLT2021

    Authors: Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

    Abstract: This paper describes FBK's system submission to the IWSLT 2021 Offline Speech Translation task. We participated with a direct model, which is a Transformer-based architecture trained to translate English speech audio data into German texts. The training pipeline is characterized by knowledge distillation and a two-step fine-tuning procedure. Both knowledge distillation and the first fine-tuning st…

    Submitted 28 June, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: Accepted at IWSLT2021

    Journal ref: Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

  38. arXiv:2106.01045

    cs.CL

    Cascade versus Direct Speech Translation: Do the Differences Still Make a Difference?

    Authors: Luisa Bentivogli, Mauro Cettolo, Marco Gaido, Alina Karakanta, Alberto Martinelli, Matteo Negri, Marco Turchi

    Abstract: Five years after the first published proofs of concept, direct approaches to speech translation (ST) are now competing with traditional cascade solutions. In light of this steady progress, can we claim that the performance gap between the two is closed? Starting from this question, we present a systematic comparison between state-of-the-art systems representative of the two paradigms. Focusing on…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: Accepted at ACL2021

  39. arXiv:2105.13782

    cs.CL

    How to Split: the Effect of Word Segmentation on Gender Bias in Speech Translation

    Authors: Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri, Marco Turchi

    Abstract: Having recognized gender bias as a major issue affecting current translation technologies, researchers have primarily attempted to mitigate it by working on the data front. However, whether algorithmic aspects contribute to exacerbating unwanted outputs remains so far under-investigated. In this work, we bring the analysis of gender bias in automatic translation onto a seemingly neutral yet critical com…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: Accepted in Findings of ACL 2021

  40. arXiv:2104.11710

    cs.SD cs.CL eess.AS

    Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation

    Authors: Marco Gaido, Matteo Negri, Mauro Cettolo, Marco Turchi

    Abstract: The audio segmentation mismatch between training data and those seen at run-time is a major problem in direct speech translation. Indeed, while systems are usually trained on manually segmented corpora, in real use cases they are often presented with continuous audio requiring automatic (and sub-optimal) segmentation. After comparing existing techniques (VAD-based, fixed-length and hybrid segmenta…

    Submitted 14 October, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: Accepted to ICNLSP 2021

  41. arXiv:2104.06001

    cs.CL

    Gender Bias in Machine Translation

    Authors: Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Matteo Negri, Marco Turchi

    Abstract: Machine translation (MT) technology has facilitated our daily tasks by providing accessible shortcuts for gathering, elaborating and communicating information. However, it can suffer from biases that harm users and society at large. As a relatively new field of inquiry, gender bias in MT still lacks internal cohesion, which advocates for a unified framework to ease future research. To this end, we…

    Submitted 7 May, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2021

  42. arXiv:2103.05951

    cs.CL

    Self-Learning for Zero Shot Neural Machine Translation

    Authors: Surafel M. Lakew, Matteo Negri, Marco Turchi

    Abstract: Neural Machine Translation (NMT) approaches employing monolingual data are showing steady improvements in resource rich conditions. However, evaluations using real-world low-resource languages still result in unsatisfactory performance. This work proposes a novel zero-shot NMT modeling approach that learns without the now-standard assumption of a pivot language sharing parallel data with the zero-…

    Submitted 10 March, 2021; originally announced March 2021.

  43. arXiv:2102.01757  [pdf, other]

    cs.CL

    The Multilingual TEDx Corpus for Speech Recognition and Translation

    Authors: Elizabeth Salesky, Matthew Wiesner, Jacob Bremerman, Roldano Cattoni, Matteo Negri, Marco Turchi, Douglas W. Oard, Matt Post

    Abstract: We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages. The corpus is a collection of audio recordings from TEDx talks in 8 source languages. We segment transcripts into sentences and align them to the source-language audio and target-language translations. The corpus is released along with op…

    Submitted 14 June, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: Accepted to Interspeech 2021

  44. arXiv:2102.01578  [pdf, other]

    cs.CL

    CTC-based Compression for Direct Speech Translation

    Authors: Marco Gaido, Mauro Cettolo, Matteo Negri, Marco Turchi

    Abstract: Previous studies demonstrated that a dynamic phone-informed compression of the input audio is beneficial for speech translation (ST). However, they required a dedicated model for phone recognition and did not test this solution for direct ST, in which a single model translates the input audio into the target language without intermediate representations. In this work, we propose the first method a…

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: Accepted at EACL 2021

    Journal ref: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (2021), 690-696

  45. arXiv:2012.04964  [pdf, ps, other]

    cs.CL

    On Knowledge Distillation for Direct Speech Translation

    Authors: Marco Gaido, Mattia A. Di Gangi, Matteo Negri, Marco Turchi

    Abstract: Direct speech translation (ST) has shown to be a complex task requiring knowledge transfer from its sub-tasks: automatic speech recognition (ASR) and machine translation (MT). For MT, one of the most promising techniques to transfer knowledge is knowledge distillation. In this paper, we compare the different solutions to distill knowledge in a sequence-to-sequence task like ST. Moreover, we analyz…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: Accepted at CLiC-IT 2020

  46. arXiv:2012.04955  [pdf, ps, other]

    cs.CL

    Breeding Gender-aware Direct Speech Translation Systems

    Authors: Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri, Marco Turchi

    Abstract: In automatic speech translation (ST), traditional cascade approaches involving separate transcription and translation steps are giving ground to increasingly competitive and more robust direct solutions. In particular, by translating speech audio data without intermediate transcription, direct ST models are able to leverage and preserve essential information present in the input (e.g. speaker's vo…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: Outstanding paper at COLING 2020

    Journal ref: In Proceedings of the 28th International Conference on Computational Linguistics, Dec 2020, 3951-3964. Online

  47. arXiv:2010.14761  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST

    Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures

    Authors: Carlo Baldassi, Enrico M. Malatesta, Matteo Negri, Riccardo Zecchina

    Abstract: We analyze the connection between minimizers with good generalizing properties and high local entropy regions of a threshold-linear classifier in Gaussian mixtures with the mean squared error loss function. We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters. We explore analytically the error-counting loss landscape i…

    Submitted 17 November, 2020; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: 19 pages, 4 figures. arXiv admin note: text overlap with arXiv:2006.07897

  48. arXiv:2009.04707  [pdf, other]

    cs.CL

    On Target Segmentation for Direct Speech Translation

    Authors: Mattia Antonino Di Gangi, Marco Gaido, Matteo Negri, Marco Turchi

    Abstract: Recent studies on direct speech translation show continuous improvements by means of data augmentation techniques and bigger deep learning models. While these methods are helping to close the gap between this new approach and the more traditional cascaded one, there are many incongruities among different studies that make it difficult to assess the state of the art. Surprisingly, one point of disc…

    Submitted 10 September, 2020; originally announced September 2020.

    Comments: 14 pages single column, 4 figures, accepted for presentation at the AMTA2020 research track

  49. arXiv:2008.02270  [pdf, other]

    cs.CL

    Contextualized Translation of Automatically Segmented Speech

    Authors: Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Mauro Cettolo, Marco Turchi

    Abstract: Direct speech-to-text translation (ST) models are usually trained on corpora segmented at sentence level, but at inference time they are commonly fed with audio split by a voice activity detector (VAD). Since VAD segmentation is not syntax-informed, the resulting segments do not necessarily correspond to well-formed sentences uttered by the speaker but, most likely, to fragments of one or more sen…

    Submitted 5 August, 2020; originally announced August 2020.

    Comments: Interspeech 2020

  50. arXiv:2006.05754  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus

    Authors: Luisa Bentivogli, Beatrice Savoldi, Matteo Negri, Mattia Antonino Di Gangi, Roldano Cattoni, Marco Turchi

    Abstract: Translating from languages without productive grammatical gender like English into gender-marked languages is a well-known difficulty for machines. This difficulty is also due to the fact that the training data on which models are built typically reflect the asymmetries of natural languages, gender bias included. Exclusively fed with textual data, machine translation is intrinsically constrained b…

    Submitted 10 June, 2020; originally announced June 2020.

    Comments: 9 pages of content, accepted at ACL 2020