Showing 1–43 of 43 results for author: Bahdanau, D

  1. arXiv:2405.13022

    cs.CL cs.LG

    LLMs can learn self-restraint through iterative self-reflection

    Authors: Alexandre Piché, Aristides Milios, Dzmitry Bahdanau, Chris Pal

    Abstract: In order to be deployed safely, Large Language Models (LLMs) must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next token likeli…

    Submitted 15 May, 2024; originally announced May 2024.

  2. arXiv:2404.05961

    cs.CL cs.AI

    LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

    Authors: Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, Siva Reddy

    Abstract: Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consi…

    Submitted 8 April, 2024; originally announced April 2024.

  3. arXiv:2311.09635

    cs.CL

    Evaluating In-Context Learning of Libraries for Code Generation

    Authors: Arkil Patel, Siva Reddy, Dzmitry Bahdanau, Pradeep Dasigi

    Abstract: Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries for solving user-instructed tasks. Recent work has shown that large proprietary LLMs can learn novel library usage in-context from demonstrations. These results raise several open question…

    Submitted 4 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  4. arXiv:2310.14192

    cs.CL cs.AI

    PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation

    Authors: Gaurav Sahu, Olga Vechtomova, Dzmitry Bahdanau, Issam H. Laradji

    Abstract: Data augmentation is a widely used technique to address the problem of text classification when there is a limited amount of training data. Recent work often tackles this problem using large language models (LLMs) like GPT3 that can generate new examples given already available ones. In this work, we propose a method to generate more helpful augmented data by utilizing the LLM's abilities to follo…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 (Long paper)

  5. arXiv:2310.11634

    cs.CL

    MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations

    Authors: Arkil Patel, Satwik Bhattamishra, Siva Reddy, Dzmitry Bahdanau

    Abstract: Humans possess a remarkable ability to assign novel interpretations to linguistic expressions, enabling them to learn new words and understand community-specific connotations. However, Large Language Models (LLMs) have a knowledge cutoff and are costly to finetune repeatedly. Therefore, it is crucial for LLMs to learn novel interpretations in-context. In this paper, we systematically analyse the a…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  6. arXiv:2309.10954

    cs.CL cs.LG

    In-Context Learning for Text Classification with Many Labels

    Authors: Aristides Milios, Siva Reddy, Dzmitry Bahdanau

    Abstract: In-context learning (ICL) using large language models for tasks with many labels is challenging due to the limited context window, which makes it difficult to fit a sufficient number of examples in the prompt. In this paper, we use a pre-trained dense retrieval model to bypass this limitation, giving the model only a partial view of the full label space for each inference call. Testing with recent…

    Submitted 5 December, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: 12 pages, 4 figures

  7. arXiv:2306.10998

    cs.LG cs.AI cs.PL cs.SE

    RepoFusion: Training Code Models to Understand Your Repository

    Authors: Disha Shrivastava, Denis Kocetkov, Harm de Vries, Dzmitry Bahdanau, Torsten Scholak

    Abstract: Despite the huge success of Large Language Models (LLMs) in coding assistants like GitHub Copilot, these models struggle to understand the context present in the repository (e.g., imports, parent classes, files with similar names, etc.), thereby producing inaccurate code completions. This effect is more pronounced when using these assistants for repositories that the model has not seen during trai…

    Submitted 19 June, 2023; originally announced June 2023.

  8. arXiv:2305.06161

    cs.CL cs.AI cs.PL cs.SE

    StarCoder: may the source be with you!

    Authors: Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, et al. (42 additional authors not shown)

    Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large colle…

    Submitted 13 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

  9. arXiv:2301.03988

    cs.SE cs.AI cs.LG

    SantaCoder: don't reach for the stars!

    Authors: Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, et al. (16 additional authors not shown)

    Abstract: The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigat…

    Submitted 24 February, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

  10. arXiv:2211.15533

    cs.CL cs.AI

    The Stack: 3 TB of permissively licensed source code

    Authors: Denis Kocetkov, Raymond Li, Loubna Ben Allal, Jia Li, Chenghao Mou, Carlos Muñoz Ferrandis, Yacine Jernite, Margaret Mitchell, Sean Hughes, Thomas Wolf, Dzmitry Bahdanau, Leandro von Werra, Harm de Vries

    Abstract: Large Language Models (LLMs) play an ever-increasing role in the field of Artificial Intelligence (AI)--not only for natural language processing but also for code understanding and generation. To stimulate open and responsible research on LLMs for code, we introduce The Stack, a 3.1 TB dataset consisting of permissively licensed source code in 30 programming languages. We describe how we collect t…

    Submitted 20 November, 2022; originally announced November 2022.

  11. arXiv:2211.08473

    cs.CL cs.LG

    On the Compositional Generalization Gap of In-Context Learning

    Authors: Arian Hosseini, Ankit Vani, Dzmitry Bahdanau, Alessandro Sordoni, Aaron Courville

    Abstract: Pretrained large generative language models have shown great performance on many tasks, but exhibit low compositional generalization abilities. Scaling such models has been shown to improve their performance on various NLP tasks even just by conditioning them on a few examples to solve the task without any fine-tuning (also known as in-context learning). In this work, we look at the gap between th…

    Submitted 15 November, 2022; originally announced November 2022.

  12. arXiv:2205.09607   

    cs.CL cs.AI

    LAGr: Label Aligned Graphs for Better Systematic Generalization in Semantic Parsing

    Authors: Dora Jambor, Dzmitry Bahdanau

    Abstract: Semantic parsing is the task of producing structured meaning representations for natural language sentences. Recent research has pointed out that the commonly-used sequence-to-sequence (seq2seq) semantic parsers struggle to generalize systematically, i.e. to handle examples that require recombining known knowledge in novel settings. In this work, we show that better systematic generalization can b…

    Submitted 1 June, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

    Comments: published latest version of a paper that's already on arxiv instead of adding it as a new version. Please see arXiv:2110.07572

  13. arXiv:2204.01959

    cs.CL cs.AI

    Data Augmentation for Intent Classification with Off-the-shelf Large Language Models

    Authors: Gaurav Sahu, Pau Rodriguez, Issam H. Laradji, Parmida Atighehchian, David Vazquez, Dzmitry Bahdanau

    Abstract: Data augmentation is a widely employed technique to alleviate the problem of data scarcity. In this work, we propose a prompting-based approach to generate labelled training data for intent classification with off-the-shelf language models (LMs) such as GPT-3. An advantage of this method is that no task-specific LM-fine-tuning for data generation is required; hence the method requires no hyper-par…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted to 4th Workshop on NLP for Conversational AI, ACL 2022

  14. arXiv:2204.00498

    cs.CL cs.DB cs.LG

    Evaluating the Text-to-SQL Capabilities of Large Language Models

    Authors: Nitarshan Rajkumar, Raymond Li, Dzmitry Bahdanau

    Abstract: We perform an empirical evaluation of Text-to-SQL capabilities of the Codex language model. We find that, without any finetuning, Codex is a strong baseline on the Spider benchmark; we also analyze the failure modes of Codex in this setting. Furthermore, we demonstrate on the GeoQuery and Scholar benchmarks that a small number of in-domain examples provided in the prompt enables Codex to perform b…

    Submitted 15 March, 2022; originally announced April 2022.

  15. arXiv:2112.00578

    cs.CL cs.LG

    Systematic Generalization with Edge Transformers

    Authors: Leon Bergen, Timothy J. O'Donnell, Dzmitry Bahdanau

    Abstract: Recent research suggests that systematic generalization in natural language understanding remains a challenge for state-of-the-art neural models such as Transformers and Graph Neural Networks. To tackle this challenge, we propose Edge Transformer, a new model that combines inspiration from Transformers and rule-based symbolic AI. The first key idea in Edge Transformers is to associate vector state…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: Accepted as a conference paper at NeurIPS 2021

  16. arXiv:2110.07572

    cs.CL cs.AI

    LAGr: Labeling Aligned Graphs for Improving Systematic Generalization in Semantic Parsing

    Authors: Dora Jambor, Dzmitry Bahdanau

    Abstract: Semantic parsing is the task of producing a structured meaning representation for natural language utterances or questions. Recent research has pointed out that the commonly-used sequence-to-sequence (seq2seq) semantic parsers struggle to generalize systematically, i.e. to handle examples that require recombining known knowledge in novel settings. In this work, we show that better systematic gener…

    Submitted 1 June, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

  17. arXiv:2110.06843

    cs.CL

    Compositional Generalization in Dependency Parsing

    Authors: Emily Goodwin, Siva Reddy, Timothy J. O'Donnell, Dzmitry Bahdanau

    Abstract: Compositionality -- the ability to combine familiar units like words into novel phrases and sentences -- has been the focus of intense interest in artificial intelligence in recent years. To test compositional generalization in semantic parsing, Keysers et al. (2020) introduced Compositional Freebase Queries (CFQ). This dataset maximizes the similarity between the test and train distributions over…

    Submitted 15 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: 12 pages 7 figures

  18. arXiv:2109.05093

    cs.CL cs.PL

    PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models

    Authors: Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau

    Abstract: Large pre-trained language models for textual data have an unconstrained output space; at each decoding step, they can produce any of 10,000s of sub-word tokens. When fine-tuned to target constrained formal languages like SQL, these models often generate invalid code, rendering it unusable. We propose PICARD (code and trained models available at https://github.com/ElementAI/picard), a method for c…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: Accepted to EMNLP 2021. 7 pages

  19. arXiv:2105.03519

    cs.CL

    Understanding by Understanding Not: Modeling Negation in Language Models

    Authors: Arian Hosseini, Siva Reddy, Dzmitry Bahdanau, R Devon Hjelm, Alessandro Sordoni, Aaron Courville

    Abstract: Negation is a core construction in natural language. Despite being very successful on many tasks, state-of-the-art pre-trained language models often handle negation incorrectly. To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. By training BERT with the r…

    Submitted 7 May, 2021; originally announced May 2021.

  20. arXiv:2104.06645

    cs.CL

    Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention

    Authors: Leon Bergen, Dzmitry Bahdanau, Timothy J. O'Donnell

    Abstract: We present a model that jointly learns the denotations of words together with their groundings using a truth-conditional semantics. Our model builds on the neurosymbolic approach of Mao et al. (2019), learning to ground objects in the CLEVR dataset (Johnson et al., 2017) using a novel parallel attention mechanism. The model achieves state of the art performance on visual question answering, learni…

    Submitted 14 April, 2021; originally announced April 2021.

  21. DuoRAT: Towards Simpler Text-to-SQL Models

    Authors: Torsten Scholak, Raymond Li, Dzmitry Bahdanau, Harm de Vries, Chris Pal

    Abstract: Recent neural text-to-SQL models can effectively translate natural language questions to corresponding SQL queries on unseen databases. Working mostly on the Spider dataset, researchers have proposed increasingly sophisticated solutions to the problem. Contrary to this trend, in this paper we focus on simplifications. We begin by building DuoRAT, a re-implementation of the state-of-the-art RAT-SQL…

    Submitted 10 September, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: Accepted to NAACL 2021. 9 pages

  22. arXiv:2007.14435

    cs.CL

    Towards Ecologically Valid Research on Language User Interfaces

    Authors: Harm de Vries, Dzmitry Bahdanau, Christopher Manning

    Abstract: Language User Interfaces (LUIs) could improve human-machine interaction for a wide variety of tasks, such as playing music, getting insights from databases, or instructing domestic robots. In contrast to traditional hand-crafted approaches, recent work attempts to build LUIs in a data-driven way using modern deep learning methods. To satisfy the data needs of such learning algorithms, researchers…

    Submitted 28 July, 2020; originally announced July 2020.

  23. arXiv:2007.12770

    cs.AI cs.CL cs.LG

    BabyAI 1.1

    Authors: David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

    Abstract: The BabyAI platform is designed to measure the sample efficiency of training an agent to follow grounded-language instructions. BabyAI 1.0 presents baseline results of an agent trained by deep imitation or reinforcement learning. BabyAI 1.1 improves the agent's architecture in three minor ways. This increases reinforcement learning sample efficiency by up to 3 times and improves imitation learning…

    Submitted 24 July, 2020; originally announced July 2020.

    Comments: 9 pages, 1 figure, technical report

  24. arXiv:2002.00412

    cs.LG cs.AI stat.ML

    Combating False Negatives in Adversarial Imitation Learning

    Authors: Konrad Zolna, Chitwan Saharia, Leonard Boussioux, David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

    Abstract: In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones produced by the agent) become increasingly similar to expert ones. Despite the fact that the task is successfully accomplished in some of the agent's t…

    Submitted 2 February, 2020; originally announced February 2020.

    Comments: This is an extended version of the student abstract published at 34th AAAI Conference on Artificial Intelligence

  25. arXiv:1912.05783

    cs.AI cs.LG

    CLOSURE: Assessing Systematic Generalization of CLEVR Models

    Authors: Dzmitry Bahdanau, Harm de Vries, Timothy J. O'Donnell, Shikhar Murty, Philippe Beaudoin, Yoshua Bengio, Aaron Courville

    Abstract: The CLEVR dataset of natural-looking questions about 3D-rendered scenes has recently received much attention from the research community. A number of models have been proposed for this task, many of which achieved very high accuracies of around 97-99%. In this work, we study how systematic the generalization of such models is, that is to which extent they are capable of handling novel combinations…

    Submitted 17 October, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

    Comments: Technical report

  26. arXiv:1912.00444

    cs.LG cs.AI stat.ML

    Automated curriculum generation for Policy Gradients from Demonstrations

    Authors: Anirudh Srinivasan, Dzmitry Bahdanau, Maxime Chevalier-Boisvert, Yoshua Bengio

    Abstract: In this paper, we present a technique that improves the process of training an agent (using RL) for instruction following. We develop a training curriculum that uses a nominal number of expert demonstrations and trains the agent in a manner that draws parallels from one of the ways in which humans learn to perform complex tasks, i.e. by starting from the goal and working backwards. We test our meth…

    Submitted 1 December, 2019; originally announced December 2019.

    Comments: Accepted to Deep RL Workshop at NeurIPS 2019

  27. arXiv:1811.12889

    cs.CL cs.AI

    Systematic Generalization: What Is Required and Can It Be Learned?

    Authors: Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville

    Abstract: Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated. We compare both types of models in how much they lend themselves to a particular form of systematic generalization. Using a synthetic VQA test, w…

    Submitted 21 April, 2019; v1 submitted 30 November, 2018; originally announced November 2018.

    Comments: Published as a conference paper at ICLR 2019

  28. arXiv:1810.08272

    cs.AI cs.CL

    BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning

    Authors: Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio

    Abstract: Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts. Here, we introduce the BabyAI research platform to support investigations towards including humans in the loop for grounded languag…

    Submitted 19 December, 2019; v1 submitted 18 October, 2018; originally announced October 2018.

    Comments: Accepted at ICLR 2019

  29. arXiv:1806.01946

    cs.AI cs.LG

    Learning to Understand Goal Specifications by Modelling Reward

    Authors: Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette

    Abstract: Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards. However, this places on environment designers the onus of designing language-conditional reward functions which may not be easily or tractably implemented as the complexity of the environment and the language scales. To overcome this limitation, we prese…

    Submitted 23 December, 2019; v1 submitted 5 June, 2018; originally announced June 2018.

    Comments: 19 pages, 9 figures

  30. arXiv:1804.09259

    cs.CL

    Commonsense mining as knowledge base completion? A study on the impact of novelty

    Authors: Stanisław Jastrzębski, Dzmitry Bahdanau, Seyedarian Hosseini, Michael Noukhovitch, Yoshua Bengio, Jackie Chi Kit Cheung

    Abstract: Commonsense knowledge bases such as ConceptNet represent knowledge in the form of relational triples. Inspired by the recent work by Li et al., we analyse if knowledge base completion models can be used to mine commonsense knowledge from raw text. We propose novelty of predicted triples with respect to the training set as an important factor in interpreting results. We critically analyse the diffi…

    Submitted 24 April, 2018; originally announced April 2018.

    Comments: Published in Workshop on New Forms of Generalization in Deep Learning and Natural Language Processing (NAACL 2018)

  31. arXiv:1706.00286

    cs.LG cs.CL

    Learning to Compute Word Embeddings On the Fly

    Authors: Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio

    Abstract: Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare. Learning representations for words in the "long tail" of this distribution requires enormous amounts of data. Representations of rare words trained directly on end tasks are usually poor, requiring us to pre-train embeddings on external data, or treat all rare words as out-of-vocabulary words…

    Submitted 7 March, 2018; v1 submitted 1 June, 2017; originally announced June 2017.

  32. arXiv:1611.02796

    cs.LG cs.AI

    Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

    Authors: Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck

    Abstract: This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity. An RNN is first pre-trained on data using maximum likelihood estimation (MLE), and the probability distribution over the next token in the sequence learned by this model is t…

    Submitted 16 October, 2017; v1 submitted 8 November, 2016; originally announced November 2016.

    Comments: Add supplementary material

  33. arXiv:1607.07086

    cs.LG

    An Actor-Critic Algorithm for Sequence Prediction

    Authors: Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio

    Abstract: We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a …

    Submitted 3 March, 2017; v1 submitted 24 July, 2016; originally announced July 2016.

  34. arXiv:1605.02688

    cs.SC cs.LG cs.MS

    Theano: A Python framework for fast computation of mathematical expressions

    Authors: The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, et al. (88 additional authors not shown)

    Abstract: Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, mu…

    Submitted 9 May, 2016; originally announced May 2016.

    Comments: 19 pages, 5 figures

  35. arXiv:1511.06456

    cs.LG

    Task Loss Estimation for Sequence Prediction

    Authors: Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio

    Abstract: Often, the performance on a supervised machine learning task is evaluated with a task loss function that cannot be optimized directly. Examples of such loss functions include the classification error, the edit distance and the BLEU score. A common workaround for this problem is to instead optimize a surrogate loss function, such as for instance cross-entropy or hinge loss. In order for…

    Submitted 19 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Submitted to ICLR 2016

  36. arXiv:1508.04395

    cs.CL cs.AI cs.LG cs.NE

    End-to-End Attention-based Large Vocabulary Speech Recognition

    Authors: Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio

    Abstract: Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs). Most of these systems contain separate components that deal with the acoustic modelling, language modelling and sequence decoding. We investigate a more direct approach in which the HMM is replaced with a Recurrent Neural Network (RNN)…

    Submitted 14 March, 2016; v1 submitted 18 August, 2015; originally announced August 2015.

  37. arXiv:1506.07503

    cs.CL cs.LG cs.NE stat.ML

    Attention-Based Models for Speech Recognition

    Authors: Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio

    Abstract: Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation. We extend the attention-mechanism with features needed for speech recognition. We show that while an adaptation of the model used for machine translation in reaches…

    Submitted 24 June, 2015; originally announced June 2015.

  38. arXiv:1506.00619

    cs.LG cs.NE stat.ML

    Blocks and Fuel: Frameworks for deep learning

    Authors: Bart van Merriënboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, Yoshua Bengio

    Abstract: We introduce two Python frameworks to train neural networks on large datasets: Blocks and Fuel. Blocks is based on Theano, a linear algebra compiler with CUDA-support. It facilitates the training of complex neural network models by providing parametrized Theano operations, attaching metadata to Theano's symbolic computational graph, and providing an extensive set of utilities to assist training th…

    Submitted 1 June, 2015; originally announced June 2015.

  39. arXiv:1412.1602

    cs.NE cs.LG stat.ML

    End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results

    Authors: Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

    Abstract: We replace the Hidden Markov Model (HMM) which is traditionally used in continuous speech recognition with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context creat…

    Submitted 4 December, 2014; originally announced December 2014.

    Comments: As accepted to: Deep Learning and Representation Learning Workshop, NIPS 2014

  40. arXiv:1409.1259

    cs.CL stat.ML

    On the Properties of Neural Machine Translation: Encoder-Decoder Approaches

    Authors: Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio

    Abstract: Neural machine translation is a relatively new approach to statistical machine translation based purely on neural networks. The neural machine translation models often consist of an encoder and a decoder. The encoder extracts a fixed-length representation from a variable-length input sentence, and the decoder generates a correct translation from this representation. In this paper, we focus on anal…

    Submitted 7 October, 2014; v1 submitted 3 September, 2014; originally announced September 2014.

    Comments: Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8)

  41. arXiv:1409.1257

    cs.CL cs.LG cs.NE stat.ML

    Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation

    Authors: Jean Pouget-Abadie, Dzmitry Bahdanau, Bart van Merrienboer, Kyunghyun Cho, Yoshua Bengio

    Abstract: The authors of (Cho et al., 2014a) have shown that the recently introduced neural network translation systems suffer from a significant drop in translation quality when translating long sentences, unlike existing phrase-based translation systems. In this paper, we propose a way to address this issue by automatically segmenting an input sentence into phrases that can be easily translated by the neu…

    Submitted 7 October, 2014; v1 submitted 3 September, 2014; originally announced September 2014.

    Comments: Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8)

  42. arXiv:1409.0473

    cs.CL cs.LG cs.NE stat.ML

    Neural Machine Translation by Jointly Learning to Align and Translate

    Authors: Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

    Abstract: Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consists of a…

    Submitted 19 May, 2016; v1 submitted 1 September, 2014; originally announced September 2014.

    Comments: Accepted at ICLR 2015 as oral presentation

  43. arXiv:1406.1078

    cs.CL cs.LG cs.NE stat.ML

    Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

    Authors: Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio

    Abstract: In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of…

    Submitted 2 September, 2014; v1 submitted 3 June, 2014; originally announced June 2014.

    Comments: EMNLP 2014