
Showing 1–46 of 46 results for author: Grangier, D

Searching in archive cs.
  1. arXiv:2402.01093

    cs.LG cs.CL

    Specialized Language Models with Cheap Inference from Limited Domain Data

    Authors: David Grangier, Angelos Katharopoulos, Pierre Ablin, Awni Hannun

    Abstract: Large language models have emerged as a versatile tool but are challenging to apply to tasks lacking large inference budgets and large in-domain training sets. This work formalizes these constraints and distinguishes four important variables: the pretraining budget (for training before the target domain is known), the specialization budget (for training after the target domain is known), the infer…

    Submitted 1 February, 2024; originally announced February 2024.

  2. arXiv:2401.16380

    cs.CL

    Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

    Authors: Pratyush Maini, Skyler Seto, He Bai, David Grangier, Yizhe Zhang, Navdeep Jaitly

    Abstract: Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such data requires an abundance of both compute and data, which grows with the size of the model being trained. This is infeasible both because of the large compute costs and duration associated with pre-training, and the impending s…

    Submitted 29 January, 2024; originally announced January 2024.

  3. arXiv:2311.11973

    cs.LG cs.CL

    Adaptive Training Distributions with Scalable Online Bilevel Optimization

    Authors: David Grangier, Pierre Ablin, Awni Hannun

    Abstract: Large neural networks pretrained on web-scale corpora are central to modern machine learning. In this paradigm, the distribution of the large, heterogeneous pretraining data rarely matches that of the application domain. This work considers modifying the pretraining distribution in the case where one has a small sample of data reflecting the targeted test conditions. We propose an algorithm motiva…

    Submitted 20 November, 2023; originally announced November 2023.

  4. arXiv:2311.06382

    cs.CL cs.LG

    Transfer Learning for Structured Pruning under Limited Task Data

    Authors: Lucio Dery, David Grangier, Awni Hannun

    Abstract: Large, pre-trained models are problematic to use in resource-constrained applications. Fortunately, task-aware structured pruning methods offer a solution. These approaches reduce model size by dropping structural units like layers and attention heads in a manner that takes into account the end-task. However, these pruning algorithms require more task-specific data than is typically available. We…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: 8 pages, 7 figures and 3 tables

  5. arXiv:2211.07534

    cs.CL

    High-Resource Methodological Bias in Low-Resource Investigations

    Authors: Maartje ter Hoeve, David Grangier, Natalie Schluter

    Abstract: The central bottleneck for low-resource NLP is typically regarded to be the quantity of accessible data, overlooking the contribution of data quality. This is particularly seen in the development and evaluation of low-resource systems via downsampling of high-resource language data. In this work we investigate the validity of this approach, and we specifically focus on two well-known NLP tasks fo…

    Submitted 14 November, 2022; originally announced November 2022.

  6. arXiv:2209.03143

    cs.SD cs.LG eess.AS

    AudioLM: a Language Modeling Approach to Audio Generation

    Authors: Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour

    Abstract: We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenizati…

    Submitted 25 July, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

  7. arXiv:2202.01653

    cs.LG

    Learning strides in convolutional neural networks

    Authors: Rachid Riad, Olivier Teboul, David Grangier, Neil Zeghidour

    Abstract: Convolutional neural networks typically contain several downsampling operators, such as strided convolutions or pooling layers, that progressively reduce the resolution of intermediate representations. This provides some shift-invariance while reducing the computational complexity of the whole architecture. A critical hyperparameter of such layers is their stride: the integer factor of downsamplin…

    Submitted 3 February, 2022; originally announced February 2022.

    Comments: Spotlight at ICLR 2022; open-source code available at https://github.com/google-research/diffstride

  8. arXiv:2111.09388

    cs.CL cs.AI cs.LG

    High Quality Rather than High Model Probability: Minimum Bayes Risk Decoding with Neural Metrics

    Authors: Markus Freitag, David Grangier, Qijun Tan, Bowen Liang

    Abstract: In Neural Machine Translation, it is typically assumed that the sentence with the highest estimated probability should also be the translation with the highest quality as measured by humans. In this work, we question this assumption and show that model estimates and translation quality only vaguely correlate. We apply Minimum Bayes Risk (MBR) decoding on unbiased samples to optimize diverse automa…

    Submitted 25 April, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: Accepted at TACL, presented at NAACL22

  9. arXiv:2109.10274

    cs.CL

    The Trade-offs of Domain Adaptation for Neural Language Models

    Authors: David Grangier, Dan Iter

    Abstract: This work connects language model adaptation with concepts of machine learning theory. We consider a training setup with a large out-of-domain set and a small in-domain set. We derive how the benefit of training a model on either set depends on the size of the sets and the distance between their underlying distributions. We analyze how out-of-domain pre-training before in-domain fine-tuning achiev…

    Submitted 21 March, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

    Comments: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2022

  10. arXiv:2109.07591

    cs.CL cs.LG

    On the Complementarity of Data Selection and Fine Tuning for Domain Adaptation

    Authors: Dan Iter, David Grangier

    Abstract: Domain adaptation of neural networks commonly relies on three training phases: pretraining, selected data training and then fine tuning. Data selection improves target domain generalization by training further on pretraining data identified by relying on a small sample of target domain data. This work examines the benefit of data selection for language modeling and machine translation. Our experim…

    Submitted 15 September, 2021; originally announced September 2021.

  11. arXiv:2108.11346

    cs.LG

    Auxiliary Task Update Decomposition: The Good, The Bad and The Neutral

    Authors: Lucio M. Dery, Yann Dauphin, David Grangier

    Abstract: While deep learning has been very beneficial in data-rich settings, tasks with smaller training sets often resort to pre-training or multitask learning to leverage data from other tasks. In this case, careful consideration is needed to select tasks and model parameterizations such that updates from the auxiliary tasks actually help the primary task. We seek to alleviate this burden by formulating a…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: 15 pages, 3 figures. Accepted to the International Conference on Learning Representations (ICLR) 2021. See https://github.com/ldery/ATTITTUD for associated code

  12. arXiv:2106.15818

    cs.CL

    On Systematic Style Differences between Unsupervised and Supervised MT and an Application for High-Resource Machine Translation

    Authors: Kelly Marchisio, Markus Freitag, David Grangier

    Abstract: Modern unsupervised machine translation (MT) systems reach reasonable translation quality under clean and controlled data conditions. As the performance gap between supervised and unsupervised MT narrows, it is interesting to ask whether the different training methods result in systematically different output beyond what is visible via quality metrics like adequacy or BLEU. We compare translations…

    Submitted 13 April, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: NAACL 2022 Camera-Ready. Tiny text changes to deal with compiler differences between arXiv and Overleaf

  13. arXiv:2105.13802

    cs.SD cs.LG eess.AS

    DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding

    Authors: Neil Zeghidour, Olivier Teboul, David Grangier

    Abstract: We introduce DIVE, an end-to-end speaker diarization algorithm. Our neural algorithm presents the diarization task as an iterative process: it repeatedly builds a representation for each speaker before predicting the voice activity of each speaker conditioned on the extracted representations. This strategy intrinsically resolves the speaker ordering ambiguity without requiring the classical permut…

    Submitted 28 May, 2021; originally announced May 2021.

  14. arXiv:2104.14478

    cs.CL cs.AI cs.LG

    Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation

    Authors: Markus Freitag, George Foster, David Grangier, Viresh Ratnakar, Qijun Tan, Wolfgang Macherey

    Abstract: Human evaluation of modern high-quality machine translation systems is a difficult problem, and there is increasing evidence that inadequate evaluation procedures can lead to erroneous conclusions. While there has been considerable research on human evaluation, the field still lacks a commonly-accepted standard procedure. As a step toward this goal, we propose an evaluation methodology grounded in…

    Submitted 29 April, 2021; originally announced April 2021.

  15. arXiv:2010.13694

    eess.SP cs.LG

    Learning from Heterogeneous EEG Signals with Differentiable Channel Reordering

    Authors: Aaqib Saeed, David Grangier, Olivier Pietquin, Neil Zeghidour

    Abstract: We propose CHARM, a method for training a single neural network across inconsistent input channels. Our work is motivated by Electroencephalography (EEG), where data collection protocols from different headsets result in varying channel ordering and number, which limits the feasibility of transferring trained systems across datasets. Our approach builds upon attention mechanisms to estimate a late…

    Submitted 21 October, 2020; originally announced October 2020.

  16. arXiv:2010.10915

    cs.SD cs.LG eess.AS

    Contrastive Learning of General-Purpose Audio Representations

    Authors: Aaqib Saeed, David Grangier, Neil Zeghidour

    Abstract: We introduce COLA, a self-supervised pre-training approach for learning a general-purpose representation of audio. Our approach is based on contrastive learning: it learns a representation which assigns high similarity to audio segments extracted from the same recording while assigning lower similarity to segments from different recordings. We build on top of recent advances in contrastive learnin…

    Submitted 21 October, 2020; originally announced October 2020.

  17. arXiv:2010.10245

    cs.CL cs.LG

    Human-Paraphrased References Improve Neural Machine Translation

    Authors: Markus Freitag, George Foster, David Grangier, Colin Cherry

    Abstract: Automatic evaluation comparing candidate translations to human-generated paraphrases of reference translations has recently been proposed by Freitag et al. When used in place of original references, the paraphrased versions produce metric scores that correlate better with human judgment. This effect holds for a variety of different automatic metrics, and tends to favor natural formulations over mo…

    Submitted 20 October, 2020; originally announced October 2020.

    Comments: Accepted at WMT 2020

  18. arXiv:2005.05255

    cs.CL

    Toward Better Storylines with Sentence-Level Language Models

    Authors: Daphne Ippolito, David Grangier, Douglas Eck, Chris Callison-Burch

    Abstract: We propose a sentence-level language model which selects the next sentence in a story from a finite set of fluent alternatives. Since it does not need to model fluency, the sentence-level language model can focus on longer range dependencies, which are crucial for multi-sentence coherence. Rather than dealing with individual words, our method treats the story so far as a list of pre-trained senten…

    Submitted 11 May, 2020; originally announced May 2020.

    Comments: ACL 2020 short paper

  19. arXiv:2004.06063

    cs.CL cs.AI cs.LG

    BLEU might be Guilty but References are not Innocent

    Authors: Markus Freitag, David Grangier, Isaac Caswell

    Abstract: The quality of automatic metrics for machine translation has been increasingly called into question, especially for high-quality systems. This paper demonstrates that, while choice of metric is important, the nature of the references is also critical. We study different methods to collect references and compare their value in automated evaluation by reporting correlation with human evaluation for…

    Submitted 20 October, 2020; v1 submitted 13 April, 2020; originally announced April 2020.

    Comments: Accepted at EMNLP 2020

  20. arXiv:2003.05997

    cs.LG eess.AS stat.ML

    Efficient Content-Based Sparse Attention with Routing Transformers

    Authors: Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier

    Abstract: Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic compute and memory requirements with respect to sequence length. Successful approaches to reduce this complexity focused on attending to local sliding windows or a small set of locations independent of content. Our work proposes to learn dynamic…

    Submitted 24 October, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: TACL 2020; pre-MIT Press publication version; v5 has a random attention baseline

  21. arXiv:2002.08933

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Wavesplit: End-to-End Speech Separation by Speaker Clustering

    Authors: Neil Zeghidour, David Grangier

    Abstract: We introduce Wavesplit, an end-to-end source separation system. From a single mixture, the model infers a representation for each source and then estimates each source signal given the inferred representations. The model is trained to jointly perform both tasks from the raw waveform. Wavesplit infers a set of source representations via clustering, which addresses the fundamental permutation proble…

    Submitted 2 July, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

  22. arXiv:1911.03823

    cs.CL

    Translationese as a Language in "Multilingual" NMT

    Authors: Parker Riley, Isaac Caswell, Markus Freitag, David Grangier

    Abstract: Machine translation has an undesirable propensity to produce "translationese" artifacts, which can lead to higher BLEU scores while being liked less by human raters. Motivated by this, we model translationese and original (i.e. natural) text as separate languages in a multilingual model, and pose the question: can we perform zero-shot translation between original source text and original target te…

    Submitted 9 July, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020) 7737-7746

  23. arXiv:1907.09190

    cs.CL

    ELI5: Long Form Question Answering

    Authors: Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, Michael Auli

    Abstract: We introduce the first large-scale corpus for long-form question answering, a task requiring elaborate and in-depth answers to open-ended questions. The dataset comprises 270K threads from the Reddit forum "Explain Like I'm Five" (ELI5) where an online community provides answers to questions which are comprehensible to five-year-olds. Compared to existing datasets, ELI5 comprises diverse questio…

    Submitted 22 July, 2019; originally announced July 2019.

  24. arXiv:1906.06442

    cs.CL cs.LG

    Tagged Back-Translation

    Authors: Isaac Caswell, Ciprian Chelba, David Grangier

    Abstract: Recent work in Neural Machine Translation (NMT) has shown significant quality gains from noised-beam decoding during back-translation, a method to generate synthetic parallel data. We show that the main role of such synthetic noise is not to diversify the source side, as previously suggested, but simply to indicate to the model that the given source is synthetic. We propose a simpler alternative t…

    Submitted 14 June, 2019; originally announced June 2019.

    Comments: Accepted as oral presentation in WMT 2019; 9 pages; 9 tables; 1 figure

  25. arXiv:1905.12752

    cs.LG cs.CL stat.ML

    Unsupervised Paraphrasing without Translation

    Authors: Aurko Roy, David Grangier

    Abstract: Paraphrasing exemplifies the ability to abstract semantic content from surface forms. Recent work on automatic paraphrasing is dominated by methods leveraging Machine Translation (MT) as an intermediate step. This contrasts with humans, who can paraphrase without being bilingual. This work proposes to learn paraphrasing models from an unlabeled monolingual corpus only. To that end, we propose a re…

    Submitted 29 May, 2019; originally announced May 2019.

    Comments: ACL 2019

  26. arXiv:1904.01038

    cs.CL

    fairseq: A Fast, Extensible Toolkit for Sequence Modeling

    Authors: Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli

    Abstract: fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. We also support fast mixed-precision training and inference on modern GPUs. A demo video can be found…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: NAACL 2019 Demo paper

  27. Modeling Human Motion with Quaternion-based Neural Networks

    Authors: Dario Pavllo, Christoph Feichtenhofer, Michael Auli, David Grangier

    Abstract: Previous work on predicting or generating 3D human pose sequences regresses either joint rotations or joint positions. The former strategy is prone to error accumulation along the kinematic chain, as well as discontinuities when using Euler angles or exponential maps as parameterizations. The latter requires re-projection onto skeleton constraints to avoid bone stretching and invalid configuration…

    Submitted 26 October, 2019; v1 submitted 21 January, 2019; originally announced January 2019.

    Comments: Follow-up work to arXiv:1805.06485. This is a pre-print of an article published in IJCV. The final authenticated version is available online at https://doi.org/10.1007/s11263-019-01245-6

    Journal ref: International Journal of Computer Vision (Special Issue on Machine Vision with Deep Learning), 2019. Online ISSN: 1573-1405

  28. arXiv:1811.11742

    cs.CV

    3D human pose estimation in video with temporal convolutions and semi-supervised training

    Authors: Dario Pavllo, Christoph Feichtenhofer, David Grangier, Michael Auli

    Abstract: In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data. We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-pro…

    Submitted 29 March, 2019; v1 submitted 28 November, 2018; originally announced November 2018.

    Comments: CVPR 2019

  29. arXiv:1808.09381

    cs.CL

    Understanding Back-Translation at Scale

    Authors: Sergey Edunov, Myle Ott, Michael Auli, David Grangier

    Abstract: An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target language sentences. This work broadens the understanding of back-translation and investigates a number of methods to generate synthetic source sentences. We find that in all but resource-poor settings back-translations obtained via sampling or…

    Submitted 2 October, 2018; v1 submitted 28 August, 2018; originally announced August 2018.

    Comments: 12 pages; EMNLP 2018

  30. arXiv:1806.00187

    cs.CL

    Scaling Neural Machine Translation

    Authors: Myle Ott, Sergey Edunov, David Grangier, Michael Auli

    Abstract: Sequence to sequence learning models still require several days to reach state of the art performance on large benchmark datasets using a single machine. This paper shows that reduced precision and large batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation. On WMT'14 English-German translation, we match the accuracy of Vaswani et al. (20…

    Submitted 4 September, 2018; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: WMT 2018

  31. arXiv:1805.06485

    cs.CV

    QuaterNet: A Quaternion-based Recurrent Model for Human Motion

    Authors: Dario Pavllo, David Grangier, Michael Auli

    Abstract: Deep learning for predicting or generating 3D human pose sequences is an active research area. Previous work regresses either joint rotations or joint positions. The former strategy is prone to error accumulation along the kinematic chain, as well as discontinuities when using Euler angle or exponential map parameterizations. The latter requires re-projection onto skeleton constraints to avoid bon…

    Submitted 31 July, 2018; v1 submitted 16 May, 2018; originally announced May 2018.

    Comments: British Machine Vision Conference (BMVC), 2018

  32. arXiv:1803.00047

    cs.CL

    Analyzing Uncertainty in Neural Machine Translation

    Authors: Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

    Abstract: Machine translation is a popular test bed for research in neural sequence-to-sequence models but despite much recent research, there is still a lack of understanding of these models. Practitioners report performance degradation with large beams, the under-estimation of rare words and a lack of diversity in the final translations. Our study relates some of these issues to the inherent uncertainty o…

    Submitted 13 August, 2018; v1 submitted 28 February, 2018; originally announced March 2018.

    Comments: ICML 2018

  33. arXiv:1711.05217

    cs.CL

    Controllable Abstractive Summarization

    Authors: Angela Fan, David Grangier, Michael Auli

    Abstract: Current models for document summarization disregard user preferences such as the desired length, style, the entities that the user might be interested in, or how much of the document the user has already read. We present a neural summarization model with a simple but effective mechanism to enable users to specify these high level attributes in order to control the shape of the final summaries to b…

    Submitted 18 May, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Comments: ACL 2018 Workshop on Neural Machine Translation and Generation (NMT@ACL)

  34. arXiv:1711.04956

    cs.CL

    Classical Structured Prediction Losses for Sequence to Sequence Learning

    Authors: Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

    Abstract: There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam. In this paper, we survey a range of classical objective functions that have been widely used to train linear models for structured prediction and apply them to neural sequence to sequence models. Our experiments show that these losse…

    Submitted 5 October, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Comments: 10 pages, NAACL 2018

  35. arXiv:1711.04805

    cs.CL

    QuickEdit: Editing Text & Translations by Crossing Words Out

    Authors: David Grangier, Michael Auli

    Abstract: We propose a framework for computer-assisted text editing. It applies to translation post-editing and to paraphrasing. Our proposal relies on very simple interactions: a human editor modifies a sentence by marking tokens they would like the system to change. Our model then generates a new sentence which reformulates the initial sentence by avoiding marked words. The approach builds upon neural seq…

    Submitted 28 March, 2018; v1 submitted 13 November, 2017; originally announced November 2017.

    Comments: NAACL'18

  36. arXiv:1705.03122

    cs.CL

    Convolutional Sequence to Sequence Learning

    Authors: Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

    Abstract: The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training and optimization is easier since the number of non-linearities is fixed…

    Submitted 24 July, 2017; v1 submitted 8 May, 2017; originally announced May 2017.

  37. arXiv:1612.08083

    cs.CL

    Language Modeling with Gated Convolutional Networks

    Authors: Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier

    Abstract: The predominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. We propose a novel simplified gating mechanism tha…

    Submitted 8 September, 2017; v1 submitted 23 December, 2016; originally announced December 2016.

  38. arXiv:1611.02344

    cs.CL

    A Convolutional Encoder Model for Neural Machine Translation

    Authors: Jonas Gehring, Michael Auli, David Grangier, Yann N. Dauphin

    Abstract: The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence. In this paper we present a faster and simpler architecture based on a succession of convolutional layers. This allows encoding the entire source sentence simultaneously, compared to recurrent networks for which computation is constrained by temporal dependencies. On WMT'16 English-Rom…

    Submitted 24 July, 2017; v1 submitted 7 November, 2016; originally announced November 2016.

    Comments: 13 pages

  39. arXiv:1610.06602

    cs.CL

    Iterative Refinement for Machine Translation

    Authors: Roman Novak, Michael Auli, David Grangier

    Abstract: Existing machine translation decoding algorithms generate translations in a strictly monotonic fashion and never revisit previous decisions. As a result, earlier mistakes cannot be corrected at a later stage. In this paper, we present a translation scheme that starts from an initial guess and then makes iterative improvements that may revisit previous decisions. We parameterize our model as a conv…

    Submitted 13 April, 2018; v1 submitted 20 October, 2016; originally announced October 2016.

    Comments: Presented as a poster at BayLearn 2017

  40. arXiv:1610.00072

    cs.CL

    Vocabulary Selection Strategies for Neural Machine Translation

    Authors: Gurvan L'Hostis, David Grangier, Michael Auli

    Abstract: Classical translation models constrain the space of possible outputs by selecting a subset of translation rules based on the input sentence. Recent work on improving the efficiency of neural translation models adopted a similar strategy by restricting the output vocabulary to a subset of likely candidates given the source. In this paper we experiment with context and embedding-based selection meth…

    Submitted 30 September, 2016; originally announced October 2016.

  41. arXiv:1609.04309

    cs.CL cs.LG

    Efficient softmax approximation for GPUs

    Authors: Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou

    Abstract: We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computation time. Our approach further reduces the computational time by…

    Submitted 19 June, 2017; v1 submitted 14 September, 2016; originally announced September 2016.

    Comments: Accepted to ICML 2017

  42. arXiv:1606.07545

    cs.CL stat.ML

    Interactive Semantic Featuring for Text Classification

    Authors: Camille Jandot, Patrice Simard, Max Chickering, David Grangier, Jina Suh

    Abstract: In text classification, dictionaries can be used to define human-comprehensible features. We propose an improvement to dictionary features called smoothed dictionary features. These features recognize document contexts instead of n-grams. We describe a principled methodology to solicit dictionary features from a teacher, and present results showing that models built using these human-comprehensibl…

    Submitted 23 June, 2016; originally announced June 2016.

    Comments: Presented at the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

  43. arXiv:1603.07771

    cs.CL

    Neural Text Generation from Structured Data with Application to the Biography Domain

    Authors: Remi Lebret, David Grangier, Michael Auli

    Abstract: This paper introduces a neural model for concept-to-text generation that scales to large, rich domains. We experiment with a new dataset of biographies from Wikipedia that is an order of magnitude larger than existing resources with over 700k samples. The dataset is also vastly more diverse with a 400k vocabulary, compared to a few hundred words for Weathergov or Robocup. Our model builds upon rec…

    Submitted 23 September, 2016; v1 submitted 24 March, 2016; originally announced March 2016.

    Comments: Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016

  44. arXiv:1512.04906

    cs.CL cs.LG

    Strategies for Training Large Vocabulary Neural Language Models

    Authors: Welin Chen, David Grangier, Michael Auli

    Abstract: Training neural network language models over large vocabularies is still computationally very costly compared to count-based models such as Kneser-Ney. At the same time, neural language models are gaining popularity for many applications such as speech recognition and machine translation whose success depends on scalability. We present a systematic comparison of strategies to represent and train l…

    Submitted 15 December, 2015; originally announced December 2015.

    Comments: 12 pages; journal paper; under review

  45. arXiv:1511.05622

    cs.LG cs.CV

    Predicting distributions with Linearizing Belief Networks

    Authors: Yann N. Dauphin, David Grangier

    Abstract: Conditional belief networks introduce stochastic binary variables in neural networks. Contrary to a classical neural network, a belief network can predict more than the expected value of the output $Y$ given the input $X$. It can predict a distribution of outputs $Y$ which is useful when an input can admit multiple outputs whose average is not necessarily a valid answer. Such networks are particul…

    Submitted 1 May, 2016; v1 submitted 17 November, 2015; originally announced November 2015.

  46. arXiv:1409.4814

    cs.AI cs.IR

    ICE: Enabling Non-Experts to Build Models Interactively for Large-Scale Lopsided Problems

    Authors: Patrice Simard, David Chickering, Aparna Lakshmiratan, Denis Charles, Leon Bottou, Carlos Garcia Jurado Suarez, David Grangier, Saleema Amershi, Johan Verwey, Jina Suh

    Abstract: Quick interaction between a human teacher and a learning machine presents numerous benefits and challenges when working with web-scale data. The human teacher guides the machine towards accomplishing the task of interest. The learning machine leverages big data to find examples that maximize the training value of its interaction with the teacher. When the teacher is restricted to labeling examples…

    Submitted 16 September, 2014; originally announced September 2014.
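
Entry 8 in the list above applies Minimum Bayes Risk (MBR) decoding to unbiased samples. MBR does not return the single most probable translation; it returns the candidate with the highest expected utility, estimated by scoring each candidate against the sampled translations used as pseudo-references. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the hard-coded strings stand in for model samples, and `unigram_f1` is a hypothetical token-overlap utility substituted for the neural metrics the paper targets.

```python
from collections import Counter


def unigram_f1(hyp: str, ref: str) -> float:
    """Hypothetical stand-in utility: token-level F1 between two strings.
    (The paper optimizes neural metrics; this toy metric is an assumption.)"""
    h, r = hyp.split(), ref.split()
    if not h or not r:
        return 0.0
    # Multiset intersection counts each shared token at most min(count_h, count_r) times.
    overlap = sum((Counter(h) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(h)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_decode(candidates, pseudo_refs, utility=unigram_f1):
    """Return the candidate with the highest average utility against the pseudo-references."""
    def expected_utility(cand):
        return sum(utility(cand, ref) for ref in pseudo_refs) / len(pseudo_refs)
    return max(candidates, key=expected_utility)


# Toy "samples" standing in for translations drawn from a model (assumption).
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a cat is sitting on the mat",
    "dogs run in the park",
]
print(mbr_decode(samples, samples))  # prints "the cat sat on a mat"
```

Here the same sample set serves as both candidates and pseudo-references, so MBR favors the consensus candidate that agrees most with the other samples rather than the outlier.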
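
Entry 16 in the list above (COLA) learns audio representations contrastively, assigning high similarity to segments extracted from the same recording and lower similarity to segments from different recordings. A minimal sketch of such an objective might look like the following, assuming an InfoNCE-style softmax loss with in-batch negatives and a plain dot-product similarity. The truncated abstract does not give COLA's exact similarity function, and the toy vectors below are placeholders, not outputs of a real audio encoder.

```python
import math


def dot(u, v):
    """Dot-product similarity (an assumption; COLA's exact similarity may differ)."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(anchors, positives):
    """InfoNCE-style loss with in-batch negatives.

    anchors[i] and positives[i] embed two segments of recording i; every
    positives[j] with j != i comes from another recording and acts as a negative.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        scores = [dot(anchors[i], positives[j]) for j in range(n)]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[i]  # -log softmax probability of the matching segment
    return total / n


anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]  # positives paired with their own recordings
swapped = [[0.0, 1.0], [1.0, 0.0]]  # positives paired with the wrong recordings
print(contrastive_loss(anchors, matched) < contrastive_loss(anchors, swapped))  # prints True
```

Using the other items in the batch as negatives avoids sampling negatives separately: minimizing the loss raises the score of each matching pair relative to every mismatched pair in the same batch.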