-
End-to-End Speech Translation of Arabic to English Broadcast News
Authors:
Fethi Bougares,
Salim Jouili
Abstract:
Speech translation (ST) is the task of directly translating acoustic speech signals in a source language into text in a foreign language. ST task has been addressed, for a long time, using a pipeline approach with two modules : first an Automatic Speech Recognition (ASR) in the source language followed by a text-to-text Machine translation (MT). In the past few years, we have seen a paradigm shift…
▽ More
Speech translation (ST) is the task of directly translating acoustic speech signals in a source language into text in a foreign language. ST task has been addressed, for a long time, using a pipeline approach with two modules : first an Automatic Speech Recognition (ASR) in the source language followed by a text-to-text Machine translation (MT). In the past few years, we have seen a paradigm shift towards the end-to-end approaches using sequence-to-sequence deep neural network models. This paper presents our efforts towards the development of the first Broadcast News end-to-end Arabic to English speech translation system. Starting from independent ASR and MT LDC releases, we were able to identify about 92 hours of Arabic audio recordings for which the manual transcription was also translated into English at the segment level. These data was used to train and compare pipeline and end-to-end speech translation systems under multiple scenarios including transfer learning and data augmentation techniques.
△ Less
Submitted 11 December, 2022;
originally announced December 2022.
-
ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks
Authors:
Marcely Zanon Boito,
John Ortega,
Hugo Riguidel,
Antoine Laurent,
Loïc Barrault,
Fethi Bougares,
Firas Chaabani,
Ha Nguyen,
Florentin Barbier,
Souhir Gahbiche,
Yannick Estève
Abstract:
This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation. For the Tunisian Arabic-English dataset (low-resource and dialect tracks), we build an end-to-end model as our joint primary submission, and compare it against cascaded models that leverage a large fine-tu…
▽ More
This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation. For the Tunisian Arabic-English dataset (low-resource and dialect tracks), we build an end-to-end model as our joint primary submission, and compare it against cascaded models that leverage a large fine-tuned wav2vec 2.0 model for ASR. Our results show that in our settings pipeline approaches are still very competitive, and that with the use of transfer learning, they can outperform end-to-end models for speech translation (ST). For the Tamasheq-French dataset (low-resource track) our primary submission leverages intermediate representations from a wav2vec 2.0 model trained on 234 hours of Tamasheq audio, while our contrastive model uses a French phonetic transcription of the Tamasheq audio as input in a Conformer speech translation architecture jointly trained on automatic speech recognition, ST and machine translation losses. Our results highlight that self-supervised models trained on smaller sets of target data are more effective to low-resource end-to-end ST fine-tuning, compared to large off-the-shelf models. Results also illustrate that even approximate phonetic transcriptions can improve ST scores.
△ Less
Submitted 4 May, 2022;
originally announced May 2022.
-
Speech Resources in the Tamasheq Language
Authors:
Marcely Zanon Boito,
Fethi Bougares,
Florentin Barbier,
Souhir Gahbiche,
Loïc Barrault,
Mickael Rouvier,
Yannick Estève
Abstract:
In this paper we present two datasets for Tamasheq, a develo** language mainly spoken in Mali and Niger. These two datasets were made available for the IWSLT 2022 low-resource speech translation track, and they consist of collections of radio recordings from daily broadcast news in Niger (Studio Kalangou) and Mali (Studio Tamani). We share (i) a massive amount of unlabeled audio data (671 hours)…
▽ More
In this paper we present two datasets for Tamasheq, a develo** language mainly spoken in Mali and Niger. These two datasets were made available for the IWSLT 2022 low-resource speech translation track, and they consist of collections of radio recordings from daily broadcast news in Niger (Studio Kalangou) and Mali (Studio Tamani). We share (i) a massive amount of unlabeled audio data (671 hours) in five languages: French from Niger, Fulfulde, Hausa, Tamasheq and Zarma, and (ii) a smaller 17 hours parallel corpus of audio recordings in Tamasheq, with utterance-level translations in the French language. All this data is shared under the Creative Commons BY-NC-ND 3.0 license. We hope these resources will inspire the speech community to develop and benchmark models using the Tamasheq language.
△ Less
Submitted 11 April, 2022; v1 submitted 13 January, 2022;
originally announced January 2022.
-
ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020
Authors:
Maha Elbayad,
Ha Nguyen,
Fethi Bougares,
Natalia Tomashenko,
Antoine Caubrière,
Benjamin Lecouteux,
Yannick Estève,
Laurent Besacier
Abstract:
This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). Attention…
▽ More
This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). Attention-based encoder-decoder models, trained end-to-end, were used for our submissions to the offline speech translation track. Our contributions focused on data augmentation and ensembling of multiple models. In the simultaneous speech translation track, we build on Transformer-based wait-k models for the text-to-text subtask. For speech-to-text simultaneous translation, we attach a wait-k MT system to a hybrid ASR system. We propose an algorithm to control the latency of the ASR+MT cascade and achieve a good latency-quality trade-off on both subtasks.
△ Less
Submitted 24 May, 2020;
originally announced May 2020.
-
ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task
Authors:
Ha Nguyen,
Natalia Tomashenko,
Marcely Zanon Boito,
Antoine Caubriere,
Fethi Bougares,
Mickael Rouvier,
Laurent Besacier,
Yannick Esteve
Abstract:
This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of IWSLT Evaluation 2019 for the English-to-Portuguese language pair. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). A single end-to-end model built as a neural encod…
▽ More
This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of IWSLT Evaluation 2019 for the English-to-Portuguese language pair. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). A single end-to-end model built as a neural encoder-decoder architecture with attention mechanism was used for two primary submissions corresponding to the two EN-PT evaluations sets: (1) TED (MuST-C) and (2) How2. In this paper, we notably investigate impact of pooling heterogeneous corpora for training, impact of target tokenization (characters or BPEs), impact of speech input segmentation and we also compare our best end-to-end model (BLEU of 26.91 on MuST-C and 43.82 on How2 validation sets) to a pipeline (ASR+MT) approach.
△ Less
Submitted 30 October, 2019;
originally announced October 2019.
-
LIUM-CVC Submissions for WMT18 Multimodal Translation Task
Authors:
Ozan Caglayan,
Adrien Bardet,
Fethi Bougares,
Loïc Barrault,
Kai Wang,
Marc Masana,
Luis Herranz,
Joost van de Weijer
Abstract:
This paper describes the multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT18 Shared Task on Multimodal Translation. This year we propose several modifications to our previous multimodal attention architecture in order to better integrate convolutional features and refine them using encoder-side information. Our final constrained submissions ranked first for English-Fr…
▽ More
This paper describes the multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT18 Shared Task on Multimodal Translation. This year we propose several modifications to our previous multimodal attention architecture in order to better integrate convolutional features and refine them using encoder-side information. Our final constrained submissions ranked first for English-French and second for English-German language pairs among the constrained submissions according to the automatic evaluation metric METEOR.
△ Less
Submitted 1 September, 2018;
originally announced September 2018.
-
Neural Machine Translation by Generating Multiple Linguistic Factors
Authors:
Mercedes García-Martínez,
Loïc Barrault,
Fethi Bougares
Abstract:
Factored neural machine translation (FNMT) is founded on the idea of using the morphological and grammatical decomposition of the words (factors) at the output side of the neural network. This architecture addresses two well-known problems occurring in MT, namely the size of target language vocabulary and the number of unknown tokens produced in the translation. FNMT system is designed to manage l…
▽ More
Factored neural machine translation (FNMT) is founded on the idea of using the morphological and grammatical decomposition of the words (factors) at the output side of the neural network. This architecture addresses two well-known problems occurring in MT, namely the size of target language vocabulary and the number of unknown tokens produced in the translation. FNMT system is designed to manage larger vocabulary and reduce the training time (for systems with equivalent target language vocabulary size). Moreover, we can produce grammatically correct words that are not part of the vocabulary. FNMT model is evaluated on IWSLT'15 English to French task and compared to the baseline word-based and BPE-based NMT systems. Promising qualitative and quantitative results (in terms of BLEU and METEOR) are reported.
△ Less
Submitted 5 December, 2017;
originally announced December 2017.
-
Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description
Authors:
Desmond Elliott,
Stella Frank,
Loïc Barrault,
Fethi Bougares,
Lucia Specia
Abstract:
We present the results from the second shared task on multimodal machine translation and multilingual image description. Nine teams submitted 19 systems to two tasks. The multimodal translation task, in which the source sentence is supplemented by an image, was extended with a new language (French) and two new test sets. The multilingual image description task was changed such that at test time, o…
▽ More
We present the results from the second shared task on multimodal machine translation and multilingual image description. Nine teams submitted 19 systems to two tasks. The multimodal translation task, in which the source sentence is supplemented by an image, was extended with a new language (French) and two new test sets. The multilingual image description task was changed such that at test time, only the image is given. Compared to last year, multimodal systems improved, but text-only systems remain competitive.
△ Less
Submitted 19 October, 2017;
originally announced October 2017.
-
LIUM Machine Translation Systems for WMT17 News Translation Task
Authors:
Mercedes García-Martínez,
Ozan Caglayan,
Walid Aransa,
Adrien Bardet,
Fethi Bougares,
Loïc Barrault
Abstract:
This paper describes LIUM submissions to WMT17 News Translation Task for English-German, English-Turkish, English-Czech and English-Latvian language pairs. We train BPE-based attentive Neural Machine Translation systems with and without factored outputs using the open source nmtpy framework. Competitive scores were obtained by ensembling various systems and exploiting the availability of target mo…
▽ More
This paper describes LIUM submissions to WMT17 News Translation Task for English-German, English-Turkish, English-Czech and English-Latvian language pairs. We train BPE-based attentive Neural Machine Translation systems with and without factored outputs using the open source nmtpy framework. Competitive scores were obtained by ensembling various systems and exploiting the availability of target monolingual corpora for back-translation. The impact of back-translation quantity and quality is also analyzed for English-Turkish where our post-deadline submission surpassed the best entry by +1.6 BLEU.
△ Less
Submitted 14 July, 2017;
originally announced July 2017.
-
LIUM-CVC Submissions for WMT17 Multimodal Translation Task
Authors:
Ozan Caglayan,
Walid Aransa,
Adrien Bardet,
Mercedes García-Martínez,
Fethi Bougares,
Loïc Barrault,
Marc Masana,
Luis Herranz,
Joost van de Weijer
Abstract:
This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures where either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs…
▽ More
This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures where either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs according to the automatic evaluation metrics METEOR and BLEU.
△ Less
Submitted 14 July, 2017;
originally announced July 2017.
-
NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems
Authors:
Ozan Caglayan,
Mercedes García-Martínez,
Adrien Bardet,
Walid Aransa,
Fethi Bougares,
Loïc Barrault
Abstract:
In this paper, we present nmtpy, a flexible Python toolkit based on Theano for training Neural Machine Translation and other neural sequence-to-sequence architectures. nmtpy decouples the specification of a network from the training and inference utilities to simplify the addition of a new architecture and reduce the amount of boilerplate code to be written. nmtpy has been used for LIUM's top-rank…
▽ More
In this paper, we present nmtpy, a flexible Python toolkit based on Theano for training Neural Machine Translation and other neural sequence-to-sequence architectures. nmtpy decouples the specification of a network from the training and inference utilities to simplify the addition of a new architecture and reduce the amount of boilerplate code to be written. nmtpy has been used for LIUM's top-ranked submissions to WMT Multimodal Machine Translation and News Translation tasks in 2016 and 2017.
△ Less
Submitted 1 June, 2017;
originally announced June 2017.
-
Factored Neural Machine Translation
Authors:
Mercedes García-Martínez,
Loïc Barrault,
Fethi Bougares
Abstract:
We present a new approach for neural machine translation (NMT) using the morphological and grammatical decomposition of the words (factors) in the output side of the neural network. This architecture addresses two main problems occurring in MT, namely dealing with a large target language vocabulary and the out of vocabulary (OOV) words. By the means of factors, we are able to handle larger vocabul…
▽ More
We present a new approach for neural machine translation (NMT) using the morphological and grammatical decomposition of the words (factors) in the output side of the neural network. This architecture addresses two main problems occurring in MT, namely dealing with a large target language vocabulary and the out of vocabulary (OOV) words. By the means of factors, we are able to handle larger vocabulary and reduce the training time (for systems with equivalent target language vocabulary size). In addition, we can produce new words that are not in the vocabulary. We use a morphological analyser to get a factored representation of each word (lemmas, Part of Speech tag, tense, person, gender and number). We have extended the NMT approach with attention mechanism in order to have two different outputs, one for the lemmas and the other for the rest of the factors. The final translation is built using some \textit{a priori} linguistic information. We compare our extension with a word-based NMT system. The experiments, performed on the IWSLT'15 dataset translating from English to French, show that while the performance do not always increase, the system can manage a much larger vocabulary and consistently reduce the OOV rate. We observe up to 2% BLEU point improvement in a simulated out of domain translation setup.
△ Less
Submitted 15 September, 2016;
originally announced September 2016.
-
Multimodal Attention for Neural Machine Translation
Authors:
Ozan Caglayan,
Loïc Barrault,
Fethi Bougares
Abstract:
The attention mechanism is an important part of the neural machine translation (NMT) where it was reported to produce richer source representation compared to fixed-length encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultane…
▽ More
The attention mechanism is an important part of the neural machine translation (NMT) where it was reported to produce richer source representation compared to fixed-length encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultaneously focus over an image and its natural language description for generating a description in another language. We train several variants of our proposed attention mechanism on the Multi30k multilingual image captioning dataset. We show that a dedicated attention for each modality achieves up to 1.6 points in BLEU and METEOR compared to a textual NMT baseline.
△ Less
Submitted 13 September, 2016;
originally announced September 2016.
-
Does Multimodality Help Human and Machine for Translation and Image Captioning?
Authors:
Ozan Caglayan,
Walid Aransa,
Yaxing Wang,
Marc Masana,
Mercedes García-Martínez,
Fethi Bougares,
Loïc Barrault,
Joost van de Weijer
Abstract:
This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural networks models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation an…
▽ More
This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural networks models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR.
△ Less
Submitted 16 August, 2016; v1 submitted 30 May, 2016;
originally announced May 2016.
-
On Using Monolingual Corpora in Neural Machine Translation
Authors:
Caglar Gulcehre,
Orhan Firat,
Kelvin Xu,
Kyunghyun Cho,
Loic Barrault,
Huei-Chi Lin,
Fethi Bougares,
Holger Schwenk,
Yoshua Bengio
Abstract:
Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation. Arguably, one of the major factors behind this success has been the availability of high quality parallel corpora. In this work, we investigate how to leverage abundant monolingual corpora for neural machine translation. Compared to a phrase-based and hi…
▽ More
Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation. Arguably, one of the major factors behind this success has been the availability of high quality parallel corpora. In this work, we investigate how to leverage abundant monolingual corpora for neural machine translation. Compared to a phrase-based and hierarchical baseline, we obtain up to $1.96$ BLEU improvement on the low-resource language pair Turkish-English, and $1.59$ BLEU on the focused domain task of Chinese-English chat messages. While our method was initially targeted toward such tasks with less parallel data, we show that it also extends to high resource languages such as Cs-En and De-En where we obtain an improvement of $0.39$ and $0.47$ BLEU scores over the neural machine translation baselines, respectively.
△ Less
Submitted 12 June, 2015; v1 submitted 11 March, 2015;
originally announced March 2015.
-
Incremental Adaptation Strategies for Neural Network Language Models
Authors:
Aram Ter-Sarkisov,
Holger Schwenk,
Loic Barrault,
Fethi Bougares
Abstract:
It is today acknowledged that neural network language models outperform backoff language models in applications like speech recognition or statistical machine translation. However, training these models on large amounts of data can take several days. We present efficient techniques to adapt a neural network language model to new data. Instead of training a completely new model or relying on mixtur…
▽ More
It is today acknowledged that neural network language models outperform backoff language models in applications like speech recognition or statistical machine translation. However, training these models on large amounts of data can take several days. We present efficient techniques to adapt a neural network language model to new data. Instead of training a completely new model or relying on mixture approaches, we propose two new methods: continued training on resampled data or insertion of adaptation layers. We present experimental results in an CAT environment where the post-edits of professional translators are used to improve an SMT system. Both methods are very fast and achieve significant improvements without overfitting the small adaptation data.
△ Less
Submitted 7 July, 2015; v1 submitted 20 December, 2014;
originally announced December 2014.
-
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Authors:
Kyunghyun Cho,
Bart van Merrienboer,
Caglar Gulcehre,
Dzmitry Bahdanau,
Fethi Bougares,
Holger Schwenk,
Yoshua Bengio
Abstract:
In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of…
▽ More
In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
△ Less
Submitted 2 September, 2014; v1 submitted 3 June, 2014;
originally announced June 2014.