Showing 1–32 of 32 results for author: Barrault, L

  1. arXiv:2401.03831  [pdf, other]

    cs.CL cs.LG

    We Need to Talk About Classification Evaluation Metrics in NLP

    Authors: Peter Vickers, Loïc Barrault, Emilio Monti, Nikolaos Aletras

    Abstract: In Natural Language Processing (NLP) classification tasks such as topic categorisation and sentiment analysis, model generalizability is generally measured with standard metrics such as Accuracy, F-Measure, or AUC-ROC. The diversity of metrics, and the arbitrariness of their application suggest that there is no agreement within NLP on a single best metric to use. This lack suggests there has not b…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Appeared in AACL 2023

  2. arXiv:2312.05187  [pdf, other]

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4…

    Submitted 8 December, 2023; originally announced December 2023.

  3. arXiv:2308.11596  [pdf, other]

    cs.CL

    SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, et al. (43 additional authors not shown)

    Abstract: What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded s…

    Submitted 24 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    ACM Class: I.2.7

  4. arXiv:2305.11746  [pdf, other]

    cs.CL

    HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation

    Authors: David Dale, Elena Voita, Janice Lam, Prangthip Hansanti, Christophe Ropers, Elahe Kalbassi, Cynthia Gao, Loïc Barrault, Marta R. Costa-jussà

    Abstract: Hallucinations in machine translation are translations that contain information completely unrelated to the input. Omissions are translations that do not include some of the input information. While both cases tend to be catastrophic errors undermining user trust, annotated data with these types of pathologies is extremely scarce and is limited to a few high-resource languages. In this work, we re…

    Submitted 5 December, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    ACM Class: I.2.7

    Journal ref: EMNLP 2023

  5. arXiv:2302.05611  [pdf, other]

    cs.CL

    Metaphor Detection with Effective Context Denoising

    Authors: Shun Wang, Yucheng Li, Chenghua Lin, Loïc Barrault, Frank Guerin

    Abstract: We propose a novel RoBERTa-based model, RoPPT, which introduces a target-oriented parse tree structure in metaphor detection. Compared to existing models, RoPPT focuses on semantically relevant information and achieves the state-of-the-art on several main metaphor datasets. We also compare our approach against several popular denoising and pruning methods, demonstrating the effectiveness of our ap…

    Submitted 11 February, 2023; originally announced February 2023.

  6. arXiv:2302.04834  [pdf, other]

    cs.CL

    FrameBERT: Conceptual Metaphor Detection with Frame Embedding Learning

    Authors: Yucheng Li, Shun Wang, Chenghua Lin, Frank Guerin, Loïc Barrault

    Abstract: In this paper, we propose FrameBERT, a RoBERTa-based model that can explicitly learn and incorporate FrameNet Embeddings for concept-level metaphor detection. FrameBERT not only achieves better or comparable performance to the state-of-the-art, but also is more explainable and interpretable compared to existing models, attributing to its ability of accounting for external knowledge of FrameNet.

    Submitted 9 February, 2023; originally announced February 2023.

  7. arXiv:2212.08597  [pdf, other]

    cs.CL

    Detecting and Mitigating Hallucinations in Machine Translation: Model Internal Workings Alone Do Well, Sentence Similarity Even Better

    Authors: David Dale, Elena Voita, Loïc Barrault, Marta R. Costa-jussà

    Abstract: While the problem of hallucinations in neural machine translation has long been recognized, so far the progress on its alleviation is very little. Indeed, recently it turned out that without artificially encouraging models to hallucinate, previously existing methods fall short and even the standard sequence log-probability is more informative. It means that characteristics internal to the model ca…

    Submitted 20 December, 2022; v1 submitted 16 December, 2022; originally announced December 2022.

    ACM Class: I.2.7

  8. arXiv:2207.04672  [pdf]

    cs.CL cs.AI

    No Language Left Behind: Scaling Human-Centered Machine Translation

    Authors: NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, et al. (14 additional authors not shown)

    Abstract: Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe, high quality res…

    Submitted 25 August, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: 190 pages

    MSC Class: 68T50 ACM Class: I.2.7

  9. arXiv:2205.05990  [pdf, other]

    cs.CL

    Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022

    Authors: Sebastian T. Vincent, Loïc Barrault, Carolina Scarton

    Abstract: This paper describes the SLT-CDT-UoS group's submission to the first Special Task on Formality Control for Spoken Language Translation, part of the IWSLT 2022 Evaluation Campaign. Our efforts were split between two fronts: data engineering and altering the objective function for best hypothesis selection. We used language-independent methods to extract formal and informal sentence pairs from the p…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: 8 pages, 10 figures, IWSLT22 camera-ready (system paper @ ACL-IWSLT Shared Task on Formality Control for Spoken Language Translation)

  10. arXiv:2205.04747  [pdf, other]

    cs.CL cs.AI

    Controlling Extra-Textual Attributes about Dialogue Participants -- A Case Study of English-to-Polish Neural Machine Translation

    Authors: Sebastian T. Vincent, Loïc Barrault, Carolina Scarton

    Abstract: Unlike English, morphologically rich languages can reveal characteristics of speakers or their conversational partners, such as gender and number, via pronouns, morphological endings of words and syntax. When translating from English to such languages, a machine translation model needs to opt for a certain interpretation of textual context, which may lead to serious translation errors if extra-tex…

    Submitted 30 May, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

    Comments: 9 pages, 9 figures, EAMT2022 camera-ready

    Journal ref: Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, p. 121-130, Ghent, Belgium, June 2022

  11. arXiv:2205.01987  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks

    Authors: Marcely Zanon Boito, John Ortega, Hugo Riguidel, Antoine Laurent, Loïc Barrault, Fethi Bougares, Firas Chaabani, Ha Nguyen, Florentin Barbier, Souhir Gahbiche, Yannick Estève

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation. For the Tunisian Arabic-English dataset (low-resource and dialect tracks), we build an end-to-end model as our joint primary submission, and compare it against cascaded models that leverage a large fine-tu…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: IWSLT 2022 system paper

  12. arXiv:2201.05051  [pdf, ps, other]

    cs.CL

    Speech Resources in the Tamasheq Language

    Authors: Marcely Zanon Boito, Fethi Bougares, Florentin Barbier, Souhir Gahbiche, Loïc Barrault, Mickael Rouvier, Yannick Estève

    Abstract: In this paper we present two datasets for Tamasheq, a developing language mainly spoken in Mali and Niger. These two datasets were made available for the IWSLT 2022 low-resource speech translation track, and they consist of collections of radio recordings from daily broadcast news in Niger (Studio Kalangou) and Mali (Studio Tamani). We share (i) a massive amount of unlabeled audio data (671 hours)…

    Submitted 11 April, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

    Comments: Accepted to LREC 2022

  13. arXiv:2109.03764  [pdf, other]

    cs.CL cs.AI cs.LG

    Active Learning by Acquiring Contrastive Examples

    Authors: Katerina Margatina, Giorgos Vernikos, Loïc Barrault, Nikolaos Aletras

    Abstract: Common acquisition functions for active learning use either uncertainty or diversity sampling, aiming to select difficult and diverse data points from the pool of unlabeled data, respectively. In this work, leveraging the best of both worlds, we propose an acquisition function that opts for selecting contrastive examples, i.e. data points that are similar in the model feature space and ye…

    Submitted 8 September, 2021; originally announced September 2021.

    Comments: Accepted at EMNLP 2021

  14. arXiv:2104.08320  [pdf, other]

    cs.CL

    On the Importance of Effectively Adapting Pretrained Language Models for Active Learning

    Authors: Katerina Margatina, Loïc Barrault, Nikolaos Aletras

    Abstract: Recent Active Learning (AL) approaches in Natural Language Processing (NLP) proposed using off-the-shelf pretrained language models (LMs). In this paper, we argue that these LMs are not adapted effectively to the downstream task during AL and we explore ways to address this issue. We suggest to first adapt the pretrained LM to the target task by continuing training with all the available unlabeled…

    Submitted 2 March, 2022; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: To appear at ACL 2022

  15. arXiv:2009.07310  [pdf, other]

    cs.CL

    Simultaneous Machine Translation with Visual Context

    Authors: Ozan Caglayan, Julia Ive, Veneta Haralampieva, Pranava Madhyastha, Loïc Barrault, Lucia Specia

    Abstract: Simultaneous machine translation (SiMT) aims to translate a continuous input text stream into another language with the lowest latency and highest quality possible. The translation thus has to start with an incomplete source text, which is read progressively, creating the need for anticipation. In this paper, we seek to understand whether the addition of visual information can compensate for the m…

    Submitted 13 October, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

    Comments: Long paper accepted to EMNLP 2020, Camera-ready version

  16. arXiv:1903.08678  [pdf, other]

    cs.CL

    Probing the Need for Visual Context in Multimodal Machine Translation

    Authors: Ozan Caglayan, Pranava Madhyastha, Lucia Specia, Loïc Barrault

    Abstract: Current work on multimodal machine translation (MMT) has suggested that the visual modality is either unnecessary or only marginally beneficial. We posit that this is a consequence of the very simple, short and repetitive sentences used in the only available dataset for the task (Multi30K), rendering the source text sufficient as context. In the general case, however, we believe that it is possibl…

    Submitted 2 June, 2019; v1 submitted 20 March, 2019; originally announced March 2019.

    Comments: Accepted to NAACL-HLT 2019, reviewer comments addressed, camera-ready

  17. arXiv:1811.03865  [pdf, other]

    cs.CL

    Multimodal Grounding for Sequence-to-Sequence Speech Recognition

    Authors: Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze

    Abstract: Humans are capable of processing speech by making use of multiple sensory modalities. For example, the environment where a conversation takes place generally provides semantic and/or acoustic context that helps us to resolve ambiguities or to recall named entities. Motivated by this, there have been many works studying the integration of visual information into the speech recognition pipeline. Spe…

    Submitted 19 February, 2019; v1 submitted 9 November, 2018; originally announced November 2018.

    Comments: ICASSP 2019

  18. arXiv:1811.00347  [pdf, other]

    cs.CL

    How2: A Large-scale Dataset for Multimodal Language Understanding

    Authors: Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze

    Abstract: In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations. We also present integrated sequence-to-sequence baselines for machine translation, automatic speech recognition, spoken language translation, and multimodal summarization. By making available data and code for several multimodal natural language tasks,…

    Submitted 7 December, 2018; v1 submitted 1 November, 2018; originally announced November 2018.

  19. arXiv:1809.00151  [pdf, other]

    cs.CL

    LIUM-CVC Submissions for WMT18 Multimodal Translation Task

    Authors: Ozan Caglayan, Adrien Bardet, Fethi Bougares, Loïc Barrault, Kai Wang, Marc Masana, Luis Herranz, Joost van de Weijer

    Abstract: This paper describes the multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT18 Shared Task on Multimodal Translation. This year we propose several modifications to our previous multimodal attention architecture in order to better integrate convolutional features and refine them using encoder-side information. Our final constrained submissions ranked first for English-Fr…

    Submitted 1 September, 2018; originally announced September 2018.

    Comments: WMT2018

  20. arXiv:1805.01070  [pdf, other]

    cs.CL

    What you can cram into a single vector: Probing sentence embeddings for linguistic properties

    Authors: Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni

    Abstract: Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing. "Downstream" tasks, often based on sentence classification, are commonly used to evaluate the quality of sentence representations. The complexity of the tasks makes it however difficult to infer what kind of information is present in the repres…

    Submitted 8 July, 2018; v1 submitted 2 May, 2018; originally announced May 2018.

    Comments: ACL 2018

  21. Neural Machine Translation by Generating Multiple Linguistic Factors

    Authors: Mercedes García-Martínez, Loïc Barrault, Fethi Bougares

    Abstract: Factored neural machine translation (FNMT) is founded on the idea of using the morphological and grammatical decomposition of the words (factors) at the output side of the neural network. This architecture addresses two well-known problems occurring in MT, namely the size of target language vocabulary and the number of unknown tokens produced in the translation. FNMT system is designed to manage l…

    Submitted 5 December, 2017; originally announced December 2017.

    Comments: 11 pages, 3 figures, SLSP conference

  22. arXiv:1710.07177  [pdf, other]

    cs.CL cs.CV

    Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description

    Authors: Desmond Elliott, Stella Frank, Loïc Barrault, Fethi Bougares, Lucia Specia

    Abstract: We present the results from the second shared task on multimodal machine translation and multilingual image description. Nine teams submitted 19 systems to two tasks. The multimodal translation task, in which the source sentence is supplemented by an image, was extended with a new language (French) and two new test sets. The multilingual image description task was changed such that at test time, o…

    Submitted 19 October, 2017; originally announced October 2017.

    Journal ref: Proceedings of the Second Conference on Machine Translation, 2017, pp. 215–233

  23. arXiv:1707.04499  [pdf, other]

    cs.CL

    LIUM Machine Translation Systems for WMT17 News Translation Task

    Authors: Mercedes García-Martínez, Ozan Caglayan, Walid Aransa, Adrien Bardet, Fethi Bougares, Loïc Barrault

    Abstract: This paper describes LIUM submissions to WMT17 News Translation Task for English-German, English-Turkish, English-Czech and English-Latvian language pairs. We train BPE-based attentive Neural Machine Translation systems with and without factored outputs using the open source nmtpy framework. Competitive scores were obtained by ensembling various systems and exploiting the availability of target mo…

    Submitted 14 July, 2017; originally announced July 2017.

    Comments: News Translation Task System Description paper for WMT17

  24. arXiv:1707.04481  [pdf, other]

    cs.CL

    LIUM-CVC Submissions for WMT17 Multimodal Translation Task

    Authors: Ozan Caglayan, Walid Aransa, Adrien Bardet, Mercedes García-Martínez, Fethi Bougares, Loïc Barrault, Marc Masana, Luis Herranz, Joost van de Weijer

    Abstract: This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures where either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs…

    Submitted 14 July, 2017; originally announced July 2017.

    Comments: MMT System Description Paper for WMT17

  25. NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems

    Authors: Ozan Caglayan, Mercedes García-Martínez, Adrien Bardet, Walid Aransa, Fethi Bougares, Loïc Barrault

    Abstract: In this paper, we present nmtpy, a flexible Python toolkit based on Theano for training Neural Machine Translation and other neural sequence-to-sequence architectures. nmtpy decouples the specification of a network from the training and inference utilities to simplify the addition of a new architecture and reduce the amount of boilerplate code to be written. nmtpy has been used for LIUM's top-rank…

    Submitted 1 June, 2017; originally announced June 2017.

    Comments: 10 pages, 3 figures

  26. arXiv:1705.02364  [pdf, ps, other]

    cs.CL

    Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

    Authors: Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, Antoine Bordes

    Abstract: Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features. Efforts to obtain embeddings for larger chunks of text, such as sentences, have however not been so successful. Several attempts at learning unsupervised representations of sentences have not reached satisfactory enough performance to be widely adopted. In this paper, w…

    Submitted 8 July, 2018; v1 submitted 5 May, 2017; originally announced May 2017.

    Comments: EMNLP 2017

  27. arXiv:1609.04621  [pdf, other]

    cs.CL

    Factored Neural Machine Translation

    Authors: Mercedes García-Martínez, Loïc Barrault, Fethi Bougares

    Abstract: We present a new approach for neural machine translation (NMT) using the morphological and grammatical decomposition of the words (factors) in the output side of the neural network. This architecture addresses two main problems occurring in MT, namely dealing with a large target language vocabulary and the out of vocabulary (OOV) words. By the means of factors, we are able to handle larger vocabul…

    Submitted 15 September, 2016; originally announced September 2016.

    Comments: 8 pages, 3 figures

  28. arXiv:1609.03976  [pdf, other]

    cs.CL cs.NE

    Multimodal Attention for Neural Machine Translation

    Authors: Ozan Caglayan, Loïc Barrault, Fethi Bougares

    Abstract: The attention mechanism is an important part of the neural machine translation (NMT) where it was reported to produce richer source representation compared to fixed-length encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultane…

    Submitted 13 September, 2016; originally announced September 2016.

    Comments: 10 pages, under review COLING 2016

  29. arXiv:1606.01781  [pdf, ps, other]

    cs.CL cs.LG cs.NE

    Very Deep Convolutional Networks for Text Classification

    Authors: Alexis Conneau, Holger Schwenk, Loïc Barrault, Yann Lecun

    Abstract: The dominant approach for many NLP tasks are recurrent neural networks, in particular LSTMs, and convolutional neural networks. However, these architectures are rather shallow in comparison to the deep convolutional networks which have pushed the state-of-the-art in computer vision. We present a new architecture (VDCNN) for text processing which operates directly at the character level and uses on…

    Submitted 27 January, 2017; v1 submitted 6 June, 2016; originally announced June 2016.

    Comments: 10 pages, EACL 2017, camera-ready

  30. arXiv:1605.09186  [pdf, other]

    cs.CL cs.LG cs.NE

    Does Multimodality Help Human and Machine for Translation and Image Captioning?

    Authors: Ozan Caglayan, Walid Aransa, Yaxing Wang, Marc Masana, Mercedes García-Martínez, Fethi Bougares, Loïc Barrault, Joost van de Weijer

    Abstract: This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural networks models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation an…

    Submitted 16 August, 2016; v1 submitted 30 May, 2016; originally announced May 2016.

    Comments: 7 pages, 2 figures, v4: Small clarification in section 4 title and content

  31. arXiv:1503.03535  [pdf, other]

    cs.CL

    On Using Monolingual Corpora in Neural Machine Translation

    Authors: Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio

    Abstract: Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation. Arguably, one of the major factors behind this success has been the availability of high quality parallel corpora. In this work, we investigate how to leverage abundant monolingual corpora for neural machine translation. Compared to a phrase-based and hi…

    Submitted 12 June, 2015; v1 submitted 11 March, 2015; originally announced March 2015.

    Comments: 9 pages, 2 figures

  32. arXiv:1412.6650  [pdf, other]

    cs.NE cs.CL cs.LG

    Incremental Adaptation Strategies for Neural Network Language Models

    Authors: Aram Ter-Sarkisov, Holger Schwenk, Loic Barrault, Fethi Bougares

    Abstract: It is today acknowledged that neural network language models outperform backoff language models in applications like speech recognition or statistical machine translation. However, training these models on large amounts of data can take several days. We present efficient techniques to adapt a neural network language model to new data. Instead of training a completely new model or relying on mixtur…

    Submitted 7 July, 2015; v1 submitted 20 December, 2014; originally announced December 2014.

    Comments: accepted as workshop paper at ACL-IJCNLP 2015