Showing 1–17 of 17 results for author: Bougares, F

Searching in archive cs.
  1. arXiv:2212.05479  [pdf, other]

    cs.CL cs.AI cs.LG

    End-to-End Speech Translation of Arabic to English Broadcast News

    Authors: Fethi Bougares, Salim Jouili

    Abstract: Speech translation (ST) is the task of directly translating acoustic speech signals in a source language into text in a foreign language. The ST task has long been addressed using a pipeline approach with two modules: first, Automatic Speech Recognition (ASR) in the source language, followed by text-to-text Machine Translation (MT). In the past few years, we have seen a paradigm shift…

    Submitted 11 December, 2022; originally announced December 2022.

    Comments: Arabic Natural Language Processing Workshop 2022

  2. arXiv:2205.01987  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks

    Authors: Marcely Zanon Boito, John Ortega, Hugo Riguidel, Antoine Laurent, Loïc Barrault, Fethi Bougares, Firas Chaabani, Ha Nguyen, Florentin Barbier, Souhir Gahbiche, Yannick Estève

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation. For the Tunisian Arabic-English dataset (low-resource and dialect tracks), we build an end-to-end model as our joint primary submission, and compare it against cascaded models that leverage a large fine-tu…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: IWSLT 2022 system paper

  3. arXiv:2201.05051  [pdf, ps, other]

    cs.CL

    Speech Resources in the Tamasheq Language

    Authors: Marcely Zanon Boito, Fethi Bougares, Florentin Barbier, Souhir Gahbiche, Loïc Barrault, Mickael Rouvier, Yannick Estève

    Abstract: In this paper we present two datasets for Tamasheq, a developing language mainly spoken in Mali and Niger. These two datasets were made available for the IWSLT 2022 low-resource speech translation track, and they consist of collections of radio recordings from daily broadcast news in Niger (Studio Kalangou) and Mali (Studio Tamani). We share (i) a massive amount of unlabeled audio data (671 hours)…

    Submitted 11 April, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

    Comments: Accepted to LREC 2022

  4. arXiv:2005.11861  [pdf, other]

    cs.CL eess.AS

    ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020

    Authors: Maha Elbayad, Ha Nguyen, Fethi Bougares, Natalia Tomashenko, Antoine Caubrière, Benjamin Lecouteux, Yannick Estève, Laurent Besacier

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). Attention…

    Submitted 24 May, 2020; originally announced May 2020.

  5. arXiv:1910.13689  [pdf, other]

    cs.CL cs.SD eess.AS

    ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task

    Authors: Ha Nguyen, Natalia Tomashenko, Marcely Zanon Boito, Antoine Caubriere, Fethi Bougares, Mickael Rouvier, Laurent Besacier, Yannick Esteve

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of IWSLT Evaluation 2019 for the English-to-Portuguese language pair. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). A single end-to-end model built as a neural encod…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: IWSLT 2019 - First two authors contributed equally to this work

  6. arXiv:1809.00151  [pdf, other]

    cs.CL

    LIUM-CVC Submissions for WMT18 Multimodal Translation Task

    Authors: Ozan Caglayan, Adrien Bardet, Fethi Bougares, Loïc Barrault, Kai Wang, Marc Masana, Luis Herranz, Joost van de Weijer

    Abstract: This paper describes the multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT18 Shared Task on Multimodal Translation. This year we propose several modifications to our previous multimodal attention architecture in order to better integrate convolutional features and refine them using encoder-side information. Our final constrained submissions ranked first for English-Fr…

    Submitted 1 September, 2018; originally announced September 2018.

    Comments: WMT2018

  7. Neural Machine Translation by Generating Multiple Linguistic Factors

    Authors: Mercedes García-Martínez, Loïc Barrault, Fethi Bougares

    Abstract: Factored neural machine translation (FNMT) is founded on the idea of using the morphological and grammatical decomposition of the words (factors) at the output side of the neural network. This architecture addresses two well-known problems occurring in MT, namely the size of target language vocabulary and the number of unknown tokens produced in the translation. FNMT system is designed to manage l…

    Submitted 5 December, 2017; originally announced December 2017.

    Comments: 11 pages, 3 figures, SLSP conference

  8. arXiv:1710.07177  [pdf, other]

    cs.CL cs.CV

    Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description

    Authors: Desmond Elliott, Stella Frank, Loïc Barrault, Fethi Bougares, Lucia Specia

    Abstract: We present the results from the second shared task on multimodal machine translation and multilingual image description. Nine teams submitted 19 systems to two tasks. The multimodal translation task, in which the source sentence is supplemented by an image, was extended with a new language (French) and two new test sets. The multilingual image description task was changed such that at test time, o…

    Submitted 19 October, 2017; originally announced October 2017.

    Journal ref: Proceedings of the Second Conference on Machine Translation, 2017, pp. 215–233

  9. arXiv:1707.04499  [pdf, other]

    cs.CL

    LIUM Machine Translation Systems for WMT17 News Translation Task

    Authors: Mercedes García-Martínez, Ozan Caglayan, Walid Aransa, Adrien Bardet, Fethi Bougares, Loïc Barrault

    Abstract: This paper describes LIUM submissions to WMT17 News Translation Task for English-German, English-Turkish, English-Czech and English-Latvian language pairs. We train BPE-based attentive Neural Machine Translation systems with and without factored outputs using the open source nmtpy framework. Competitive scores were obtained by ensembling various systems and exploiting the availability of target mo…

    Submitted 14 July, 2017; originally announced July 2017.

    Comments: News Translation Task System Description paper for WMT17

  10. arXiv:1707.04481  [pdf, other]

    cs.CL

    LIUM-CVC Submissions for WMT17 Multimodal Translation Task

    Authors: Ozan Caglayan, Walid Aransa, Adrien Bardet, Mercedes García-Martínez, Fethi Bougares, Loïc Barrault, Marc Masana, Luis Herranz, Joost van de Weijer

    Abstract: This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures where either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs…

    Submitted 14 July, 2017; originally announced July 2017.

    Comments: MMT System Description Paper for WMT17

  11. NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems

    Authors: Ozan Caglayan, Mercedes García-Martínez, Adrien Bardet, Walid Aransa, Fethi Bougares, Loïc Barrault

    Abstract: In this paper, we present nmtpy, a flexible Python toolkit based on Theano for training Neural Machine Translation and other neural sequence-to-sequence architectures. nmtpy decouples the specification of a network from the training and inference utilities to simplify the addition of a new architecture and reduce the amount of boilerplate code to be written. nmtpy has been used for LIUM's top-rank…

    Submitted 1 June, 2017; originally announced June 2017.

    Comments: 10 pages, 3 figures

  12. arXiv:1609.04621  [pdf, other]

    cs.CL

    Factored Neural Machine Translation

    Authors: Mercedes García-Martínez, Loïc Barrault, Fethi Bougares

    Abstract: We present a new approach for neural machine translation (NMT) using the morphological and grammatical decomposition of the words (factors) in the output side of the neural network. This architecture addresses two main problems occurring in MT, namely dealing with a large target language vocabulary and the out of vocabulary (OOV) words. By the means of factors, we are able to handle larger vocabul…

    Submitted 15 September, 2016; originally announced September 2016.

    Comments: 8 pages, 3 figures

  13. arXiv:1609.03976  [pdf, other]

    cs.CL cs.NE

    Multimodal Attention for Neural Machine Translation

    Authors: Ozan Caglayan, Loïc Barrault, Fethi Bougares

    Abstract: The attention mechanism is an important part of the neural machine translation (NMT) where it was reported to produce richer source representation compared to fixed-length encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultane…

    Submitted 13 September, 2016; originally announced September 2016.

    Comments: 10 pages, under review COLING 2016

  14. arXiv:1605.09186  [pdf, other]

    cs.CL cs.LG cs.NE

    Does Multimodality Help Human and Machine for Translation and Image Captioning?

    Authors: Ozan Caglayan, Walid Aransa, Yaxing Wang, Marc Masana, Mercedes García-Martínez, Fethi Bougares, Loïc Barrault, Joost van de Weijer

    Abstract: This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural networks models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation an…

    Submitted 16 August, 2016; v1 submitted 30 May, 2016; originally announced May 2016.

    Comments: 7 pages, 2 figures, v4: Small clarification in section 4 title and content

  15. arXiv:1503.03535  [pdf, other]

    cs.CL

    On Using Monolingual Corpora in Neural Machine Translation

    Authors: Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio

    Abstract: Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation. Arguably, one of the major factors behind this success has been the availability of high quality parallel corpora. In this work, we investigate how to leverage abundant monolingual corpora for neural machine translation. Compared to a phrase-based and hi…

    Submitted 12 June, 2015; v1 submitted 11 March, 2015; originally announced March 2015.

    Comments: 9 pages, 2 figures

  16. arXiv:1412.6650  [pdf, other]

    cs.NE cs.CL cs.LG

    Incremental Adaptation Strategies for Neural Network Language Models

    Authors: Aram Ter-Sarkisov, Holger Schwenk, Loic Barrault, Fethi Bougares

    Abstract: It is today acknowledged that neural network language models outperform backoff language models in applications like speech recognition or statistical machine translation. However, training these models on large amounts of data can take several days. We present efficient techniques to adapt a neural network language model to new data. Instead of training a completely new model or relying on mixtur…

    Submitted 7 July, 2015; v1 submitted 20 December, 2014; originally announced December 2014.

    Comments: accepted as workshop paper at ACL-IJCNLP 2015

  17. arXiv:1406.1078  [pdf, other]

    cs.CL cs.LG cs.NE stat.ML

    Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

    Authors: Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio

    Abstract: In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of…

    Submitted 2 September, 2014; v1 submitted 3 June, 2014; originally announced June 2014.

    Comments: EMNLP 2014