Showing 1–13 of 13 results for author: Lakew, S M

  1. arXiv:2302.12979  [pdf, other]

    cs.CL cs.SD eess.AS

    Jointly Optimizing Translations and Speech Timing to Improve Isochrony in Automatic Dubbing

    Authors: Alexandra Chronopoulou, Brian Thompson, Prashant Mathur, Yogesh Virkar, Surafel M. Lakew, Marcello Federico

    Abstract: Automatic dubbing (AD) is the task of translating the original speech in a video into target language speech. The new target language speech should satisfy isochrony; that is, the new speech should be time aligned with the original video, including mouth movements, pauses, hand gestures, etc. In this paper, we propose training a model that directly optimizes both the translation as well as the spe…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: 5 pages

  2. arXiv:2112.08682  [pdf, other]

    cs.CL cs.LG

    Isometric MT: Neural Machine Translation for Automatic Dubbing

    Authors: Surafel M. Lakew, Yogesh Virkar, Prashant Mathur, Marcello Federico

    Abstract: Automatic dubbing (AD) is among the machine translation (MT) use cases where translations should match a given length to allow for synchronicity between source and target speech. For neural MT, generating translations of length close to the source length (e.g. within ±10% in character count), while preserving quality is a challenging task. Controlling MT output length comes at a cost to translati…

    Submitted 16 February, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Published in ICASSP 2022 - scheduled for 22-27 May 2022 in Singapore

  3. arXiv:2112.08548  [pdf, other]

    cs.CL

    Isochrony-Aware Neural Machine Translation for Automatic Dubbing

    Authors: Derek Tam, Surafel M. Lakew, Yogesh Virkar, Prashant Mathur, Marcello Federico

    Abstract: We introduce the task of isochrony-aware machine translation which aims at generating translations suitable for dubbing. Dubbing of a spoken sentence requires transferring the content as well as the speech-pause structure of the source into the target language to achieve audiovisual coherence. Practically, this implies correctly projecting pauses from the source to the target and ensuring that tar…

    Submitted 8 July, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: Published at InterSpeech 2022 (https://interspeech2022.org) - scheduled for September 18-22 2022, Incheon Korea

  4. arXiv:2110.03847  [pdf, other]

    cs.CL cs.SD eess.AS

    Machine Translation Verbosity Control for Automatic Dubbing

    Authors: Surafel M. Lakew, Marcello Federico, Yue Wang, Cuong Hoang, Yogesh Virkar, Roberto Barra-Chicote, Robert Enyedi

    Abstract: Automatic dubbing aims at seamlessly replacing the speech in a video document with synthetic speech in a different language. The task implies many challenges, one of which is generating translations that not only convey the original content, but also match the duration of the corresponding utterances. In this paper, we focus on the problem of controlling the verbosity of machine translation output…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: Accepted at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021

  5. arXiv:2103.05951  [pdf, other]

    cs.CL

    Self-Learning for Zero Shot Neural Machine Translation

    Authors: Surafel M. Lakew, Matteo Negri, Marco Turchi

    Abstract: Neural Machine Translation (NMT) approaches employing monolingual data are showing steady improvements in resource rich conditions. However, evaluations using real-world low-resource languages still result in unsatisfactory performance. This work proposes a novel zero-shot NMT modeling approach that learns without the now-standard assumption of a pivot language sharing parallel data with the zero-…

    Submitted 10 March, 2021; originally announced March 2021.

  6. arXiv:2003.14402  [pdf, other]

    cs.CL

    Low Resource Neural Machine Translation: A Benchmark for Five African Languages

    Authors: Surafel M. Lakew, Matteo Negri, Marco Turchi

    Abstract: Recent advents in Neural Machine Translation (NMT) have shown improvements in low-resource language (LRL) translation tasks. In this work, we benchmark NMT between English and five African LRL pairs (Swahili, Amharic, Tigrigna, Oromo, Somali [SATOS]). We collected the available resources on the SATOS languages to evaluate the current state of NMT for LRLs. Our evaluation, comparing a baseline sing…

    Submitted 31 March, 2020; originally announced March 2020.

    Comments: Accepted for AfricaNLP workshop at ICLR 2020

  7. arXiv:1910.13998  [pdf, other]

    cs.CL

    Adapting Multilingual Neural Machine Translation to Unseen Languages

    Authors: Surafel M. Lakew, Alina Karakanta, Marcello Federico, Matteo Negri, Marco Turchi

    Abstract: Multilingual Neural Machine Translation (MNMT) for low-resource languages (LRL) can be enhanced by the presence of related high-resource languages (HRL), but the relatedness of HRL usually relies on predefined linguistic assumptions about language similarity. Recently, adapting MNMT to a LRL has shown to greatly improve performance. In this work, we explore the problem of adapting an MNMT model to…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: Accepted at the 16th International Workshop on Spoken Language Translation (IWSLT), November, 2019

  8. arXiv:1910.10408  [pdf, other]

    cs.CL

    Controlling the Output Length of Neural Machine Translation

    Authors: Surafel Melaku Lakew, Mattia Di Gangi, Marcello Federico

    Abstract: The recent advances introduced by neural machine translation (NMT) are rapidly expanding the application fields of machine translation, as well as reshaping the quality level to be targeted. In particular, if translations have to fit some given layout, quality should not only be measured in terms of adequacy and fluency, but also length. Exemplary cases are the translation of document files, subti…

    Submitted 25 October, 2019; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: To appear at the 16th International Workshop on Spoken Language Translation (IWSLT), 2019

  9. arXiv:1909.07342  [pdf, other]

    cs.CL

    Multilingual Neural Machine Translation for Zero-Resource Languages

    Authors: Surafel M. Lakew, Marcello Federico, Matteo Negri, Marco Turchi

    Abstract: In recent years, Neural Machine Translation (NMT) has been shown to be more effective than phrase-based statistical methods, thus quickly becoming the state of the art in machine translation (MT). However, NMT systems are limited in translating low-resourced languages, due to the significant amount of parallel data that is required to learn useful mappings between languages. In this work, we show…

    Submitted 16 September, 2019; originally announced September 2019.

    Comments: 15 pages, Published on Italian Journal of Computational Linguistics (IJCoL) -- Multilingual Neural Machine Translation for Low-Resource Languages, June 2018

  10. arXiv:1811.01389  [pdf, other]

    cs.CL

    Improving Zero-Shot Translation of Low-Resource Languages

    Authors: Surafel M. Lakew, Quintino F. Lotito, Matteo Negri, Marco Turchi, Marcello Federico

    Abstract: Recent work on multilingual neural machine translation reported competitive performance with respect to bilingual models and surprisingly good performance even on (zero-shot) translation directions not observed at training time. We investigate here a zero-shot translation in a particularly low-resource multilingual setting. We propose a simple iterative training procedure that leverages a duality of…

    Submitted 4 November, 2018; originally announced November 2018.

    Comments: Published at the International Workshop on Spoken Language Translation (IWSLT), Tokyo, Japan, December 2017

  11. arXiv:1811.01137  [pdf, other]

    cs.CL

    Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary

    Authors: Surafel M. Lakew, Aliia Erofeeva, Matteo Negri, Marcello Federico, Marco Turchi

    Abstract: We propose a method to transfer knowledge across neural machine translation (NMT) models by means of a shared dynamic vocabulary. Our approach allows to extend an initial model for a given language pair to cover new languages by adapting its vocabulary as long as new data become available (i.e., introducing new vocabulary items if they are not included in the initial model). The parameter transfer…

    Submitted 2 November, 2018; originally announced November 2018.

    Comments: Published at the International Workshop on Spoken Language Translation (IWSLT), 2018

  12. arXiv:1811.01064  [pdf, ps, other]

    cs.CL

    Neural Machine Translation into Language Varieties

    Authors: Surafel M. Lakew, Aliia Erofeeva, Marcello Federico

    Abstract: Both research and commercial machine translation have so far neglected the importance of properly handling the spelling, lexical and grammar divergences occurring among language varieties. Notable cases are standard national varieties such as Brazilian and European Portuguese, and Canadian and European French, which popular online machine translation services are not keeping distinct. We show that…

    Submitted 2 November, 2018; originally announced November 2018.

    Comments: Published at EMNLP 2018: third conference on machine translation (WMT 2018)

  13. arXiv:1806.06957  [pdf, ps, other]

    cs.CL

    A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation

    Authors: Surafel M. Lakew, Mauro Cettolo, Marcello Federico

    Abstract: Recently, neural machine translation (NMT) has been extended to multilinguality, that is to handle more than one translation direction with a single system. Multilingual NMT showed competitive performance against pure bilingual systems. Notably, in low-resource settings, it proved to work effectively and efficiently, thanks to shared representation space that is forced across languages and induces…

    Submitted 20 June, 2018; v1 submitted 18 June, 2018; originally announced June 2018.

    Comments: 12 pages, to appear on the 27th International Conference on Computational Linguistics (COLING 2018)