Showing 101–150 of 184 results for author: Courville, A

  1. arXiv:1810.09536

    cs.CL cs.LG

    Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

    Authors: Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville

    Abstract: Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hi…

    Submitted 8 May, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: Published as a conference paper at ICLR 2019

  2. arXiv:1809.06848

    cs.LG cs.AI stat.ML

    On the Learning Dynamics of Deep Neural Networks

    Authors: Remi Tachet, Mohammad Pezeshki, Samira Shabanian, Aaron Courville, Yoshua Bengio

    Abstract: While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day largely misunderstood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm emp…

    Submitted 11 December, 2020; v1 submitted 18 September, 2018; originally announced September 2018.

    Comments: 19 pages, 7 figures

  3. arXiv:1809.01818

    cs.LG stat.ML

    Improving Explorability in Variational Inference with Annealed Variational Objectives

    Authors: Chin-Wei Huang, Shawn Tan, Alexandre Lacoste, Aaron Courville

    Abstract: Despite the advances in the representational capacity of approximate distributions for variational inference, the optimization process can still limit the density that is ultimately learned. We demonstrate the drawbacks of biasing the true posterior to be unimodal, and introduce Annealed Variational Objectives (AVO) into the training of hierarchical variational methods. Inspired by Annealed Import…

    Submitted 25 October, 2018; v1 submitted 6 September, 2018; originally announced September 2018.

    Comments: To appear in NIPS 2018

  4. arXiv:1808.09819

    cs.LG cs.AI stat.ML

    Approximate Exploration through State Abstraction

    Authors: Adrien Ali Taïga, Aaron Courville, Marc G. Bellemare

    Abstract: Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impractical. In this paper we study the interplay between exploration and approximation, what we call approximate exploration. Our main goal is to further our theoretical understanding of pseudo-count based exploration bonuses (Bellemare et al., 2016), a practical exp…

    Submitted 24 January, 2019; v1 submitted 29 August, 2018; originally announced August 2018.

  5. arXiv:1808.04446

    cs.CV cs.CL cs.LG stat.ML

    Visual Reasoning with Multi-hop Feature Modulation

    Authors: Florian Strub, Mathieu Seurin, Ethan Perez, Harm de Vries, Jérémie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin

    Abstract: Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue. For such tasks, one successful approach is to condition image-based convolutional network computation on language via Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and shifting. We propose to…

    Submitted 12 October, 2018; v1 submitted 3 August, 2018; originally announced August 2018.

    Comments: In Proc of ECCV 2018

  6. arXiv:1806.08734

    stat.ML cs.LG

    On the Spectral Bias of Neural Networks

    Authors: Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

    Abstract: Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with $100\%$ accuracy. In this work, we present properties of neural networks that complement this aspect of expressivity. By using tools from Fourier analysis, we show that deep ReLU networks are biased towards low frequency functions, meaning that they cannot have local fluctuatio…

    Submitted 31 May, 2019; v1 submitted 22 June, 2018; originally announced June 2018.

    Comments: 23 pages

    Journal ref: ICML 2019

  7. Learning Distributed Representations from Reviews for Collaborative Filtering

    Authors: Amjad Almahairi, Kyle Kastner, Kyunghyun Cho, Aaron Courville

    Abstract: Recent work has shown that collaborative filter-based recommender systems can be improved by incorporating side information, such as natural language reviews, as a way of regularizing the derived product representations. Motivated by the success of this approach, we introduce two different models of reviews and study their effect on collaborative filtering performance. While the previous state-of-…

    Submitted 18 June, 2018; originally announced June 2018.

    Comments: Published in RecSys 2015 conference

  8. arXiv:1806.05236

    stat.ML cs.AI cs.LG cs.NE

    Manifold Mixup: Better Representations by Interpolating Hidden States

    Authors: Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, Aaron Courville, David Lopez-Paz, Yoshua Bengio

    Abstract: Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden repr…

    Submitted 11 May, 2019; v1 submitted 13 June, 2018; originally announced June 2018.

    Comments: To appear in ICML 2019

  9. arXiv:1806.04168

    cs.CL cs.AI cs.LG

    Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

    Authors: Yikang Shen, Zhouhan Lin, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio

    Abstract: In this work, we propose a novel constituency parsing scheme. The model predicts a vector of real-valued scalars, named syntactic distances, for each split position in the input sentence. The syntactic distances specify the order in which the split points will be selected, recursively partitioning the input, in a top-down fashion. Compared to traditional shift-reduce parsing schemes, our approach…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: Published at ACL2018

  10. arXiv:1804.00779

    cs.LG stat.ML

    Neural Autoregressive Flows

    Authors: Chin-Wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville

    Abstract: Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate…

    Submitted 2 April, 2018; originally announced April 2018.

    Comments: 16 pages, 10 figures, 3 tables

  11. arXiv:1803.02710

    cs.CL cs.AI

    Generating Contradictory, Neutral, and Entailing Sentences

    Authors: Yikang Shen, Shawn Tan, Chin-Wei Huang, Aaron Courville

    Abstract: Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP). We want to learn a model that approximates the conditional latent space over the representations of a logical antecedent of the given statement. In our paper, we propose an approach to generating sentences, conditioned on an input sentence and a logical inference label. W…

    Submitted 7 March, 2018; originally announced March 2018.

  12. arXiv:1802.10151

    cs.LG

    Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data

    Authors: Amjad Almahairi, Sai Rajeswar, Alessandro Sordoni, Philip Bachman, Aaron Courville

    Abstract: Learning inter-domain mappings from unpaired data can improve performance in structured prediction tasks, such as image segmentation, by reducing the need for paired data. CycleGAN was recently proposed for this problem, but critically assumes the underlying inter-domain mapping is approximately deterministic and one-to-one. This assumption renders the model ineffective for tasks requiring flexibl…

    Submitted 18 June, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

    Comments: ICML 2018

  13. arXiv:1802.01071

    stat.ML cs.LG

    Hierarchical Adversarially Learned Inference

    Authors: Mohamed Ishmael Belghazi, Sai Rajeswar, Olivier Mastropietro, Negar Rostamzadeh, Jovana Mitrovic, Aaron Courville

    Abstract: We propose a novel hierarchical generative model with a simple Markovian structure and a corresponding inference model. Both the generative and inference model are trained using the adversarial learning paradigm. We demonstrate that the hierarchical structure supports the learning of progressively more abstract representations as well as providing semantically meaningful reconstructions with diffe…

    Submitted 3 February, 2018; originally announced February 2018.

    Comments: 18 pages, 7 figures

  14. arXiv:1801.04062

    cs.LG stat.ML

    MINE: Mutual Information Neural Estimation

    Authors: Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm

    Abstract: We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be…

    Submitted 14 August, 2021; v1 submitted 12 January, 2018; originally announced January 2018.

    Comments: 19 pages, 6 figures

    Journal ref: ICML 2018

  15. arXiv:1712.04120

    stat.ML cs.LG

    GibbsNet: Iterative Adversarial Inference for Deep Graphical Models

    Authors: Alex Lamb, Devon Hjelm, Yaroslav Ganin, Joseph Paul Cohen, Aaron Courville, Yoshua Bengio

    Abstract: Directed latent variable models that formulate the joint distribution as $p(x,z) = p(z) p(x \mid z)$ have the advantage of fast and exact sampling. However, these models have the weakness of needing to specify $p(z)$, often with a simple fixed prior that limits the expressiveness of the model. Undirected latent variable models discard the requirement that $p(z)$ be specified with a prior, yet samp…

    Submitted 11 December, 2017; originally announced December 2017.

    Comments: NIPS 2017

  16. arXiv:1711.11017

    cs.AI cs.CL cs.CV cs.RO cs.SD eess.AS

    HoME: a Household Multimodal Environment

    Authors: Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, Aaron Courville

    Abstract: We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible…

    Submitted 29 November, 2017; originally announced November 2017.

    Comments: Presented at NIPS 2017's Visually-Grounded Interaction and Language Workshop

  17. arXiv:1711.02013

    cs.CL cs.AI

    Neural Language Modeling by Jointly Learning Syntax and Lexicon

    Authors: Yikang Shen, Zhouhan Lin, Chin-Wei Huang, Aaron Courville

    Abstract: We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages the structure information to form better semantic representations and better language modeling. Standard recurrent neural networks are limited by their structure and fail to efficiently use syntactic information. On the other hand, tree-structured recursive networks usually require additio…

    Submitted 18 February, 2018; v1 submitted 2 November, 2017; originally announced November 2017.

    Comments: 16 pages, 5 figures, ICLR 2018

  18. arXiv:1710.04759

    stat.ML cs.AI cs.LG

    Bayesian Hypernetworks

    Authors: David Krueger, Chin-Wei Huang, Riashat Islam, Ryan Turner, Alexandre Lacoste, Aaron Courville

    Abstract: We study Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork $h$ is a neural network which learns to transform a simple noise distribution, $p(\boldsymbol{\epsilon}) = \mathcal{N}(\mathbf{0}, \mathbf{I})$, to a distribution $q(\boldsymbol{\theta}) := q(h(\boldsymbol{\epsilon}))$ over the parameters $\boldsymbol{\theta}$ of another neural network (the "primary network"). We train $q$ with variational inference, us…

    Submitted 24 April, 2018; v1 submitted 12 October, 2017; originally announced October 2017.

    Comments: David Krueger and Chin-Wei Huang contributed equally

  19. arXiv:1710.02248

    cs.LG cs.AI stat.ML

    Learnable Explicit Density for Continuous Latent Space and Variational Inference

    Authors: Chin-Wei Huang, Ahmed Touati, Laurent Dinh, Michal Drozdzal, Mohammad Havaei, Laurent Charlin, Aaron Courville

    Abstract: In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior. First, we decompose the learning of VAEs into layerwise density estimation, and argue that having a flexible prior is beneficial to both sample generation and inference. Second, we analyze the family of inverse autoregressive flows (inverse AF)…

    Submitted 5 October, 2017; originally announced October 2017.

    Comments: 2 figures, 5 pages, submitted to ICML Principled Approaches to Deep Learning workshop

  20. arXiv:1709.07871

    cs.CV cs.AI cs.CL stat.ML

    FiLM: Visual Reasoning with a General Conditioning Layer

    Authors: Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville

    Abstract: We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process -…

    Submitted 18 December, 2017; v1 submitted 22 September, 2017; originally announced September 2017.

    Comments: AAAI 2018. Code available at http://github.com/ethanjperez/film . Extends arXiv:1707.03017

  21. arXiv:1707.08588

    cs.CL cs.LG

    Self-organized Hierarchical Softmax

    Authors: Yikang Shen, Shawn Tan, Christopher Pal, Aaron Courville

    Abstract: We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies. Instead of using a predefined hierarchical structure, our approach is capable of learning word clusters with clear syntactical and semantic meaning during the language model training process. We provide experiments on standard benchmarks for language modeling and sent…

    Submitted 26 July, 2017; originally announced July 2017.

  22. arXiv:1707.03017

    cs.CV cs.AI cs.CL stat.ML

    Learning Visual Reasoning Without Strong Priors

    Authors: Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville

    Abstract: Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than lear…

    Submitted 18 December, 2017; v1 submitted 10 July, 2017; originally announced July 2017.

    Comments: Full AAAI 2018 paper is at arXiv:1709.07871. Presented at ICML 2017's Machine Learning in Speech and Language Processing Workshop. Code is at http://github.com/ethanjperez/film

  23. arXiv:1707.00683

    cs.CV cs.CL cs.LG

    Modulating early visual processing by language

    Authors: Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, Aaron Courville

    Abstract: It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and pro…

    Submitted 18 December, 2017; v1 submitted 2 July, 2017; originally announced July 2017.

    Comments: Advances in Neural Information Processing Systems 30 (NIPS 2017)

  24. arXiv:1706.05394

    stat.ML cs.LG

    A Closer Look at Memorization in Deep Networks

    Authors: Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. r…

    Submitted 1 July, 2017; v1 submitted 16 June, 2017; originally announced June 2017.

    Comments: Appears in Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, and David Krueger contributed equally to this work

  25. arXiv:1705.10929

    cs.CL cs.AI cs.NE stat.ML

    Adversarial Generation of Natural Language

    Authors: Sai Rajeswar, Sandeep Subramanian, Francis Dutil, Christopher Pal, Aaron Courville

    Abstract: Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise however are not commensurate with the progress made in generating images, and still lag far behind likelihood based methods. In this paper, we take a step towards generatin…

    Submitted 30 May, 2017; originally announced May 2017.

    Comments: 11 pages, 3 figures, 5 tables

  26. arXiv:1704.00028

    cs.LG stat.ML

    Improved Training of Wasserstein GANs

    Authors: Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville

    Abstract: Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the…

    Submitted 25 December, 2017; v1 submitted 31 March, 2017; originally announced April 2017.

    Comments: NIPS camera-ready

  27. arXiv:1703.05423

    cs.CL

    End-to-end optimization of goal-driven and visually grounded dialogue systems

    Authors: Florian Strub, Harm de Vries, Jeremie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin

    Abstract: End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too s…

    Submitted 15 March, 2017; originally announced March 2017.

  28. arXiv:1702.01691

    cs.LG

    Calibrating Energy-based Generative Adversarial Networks

    Authors: Zihang Dai, Amjad Almahairi, Philip Bachman, Eduard Hovy, Aaron Courville

    Abstract: In this paper, we propose to equip Generative Adversarial Networks with the ability to produce direct energy estimates for samples. Specifically, we propose a flexible adversarial training framework, and prove this framework not only ensures the generator converges to the true data distribution, but also enables the discriminator to retain the density information at the global optimal. We derive th…

    Submitted 23 February, 2017; v1 submitted 6 February, 2017; originally announced February 2017.

    Comments: ICLR 2017 camera ready

  29. arXiv:1701.02720

    cs.CL cs.LG stat.ML

    Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

    Authors: Ying Zhang, Mohammad Pezeshki, Philemon Brakel, Saizheng Zhang, Cesar Laurent, Yoshua Bengio, Aaron Courville

    Abstract: Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). Hybrid speech recognition systems incorporating CNNs with Hidden Markov Models/Gaussian Mixture Models (HMMs/GMMs) have achieved the state-of-the-art in various benchmarks. Meanwhile, Connectionist Temporal Classi…

    Submitted 10 January, 2017; originally announced January 2017.

  30. arXiv:1612.07837

    cs.SD cs.AI

    SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

    Authors: Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, Yoshua Bengio

    Abstract: In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure is able to capture underlying sources of variations in the temporal sequences over very lon…

    Submitted 11 February, 2017; v1 submitted 22 December, 2016; originally announced December 2016.

    Comments: Published as a conference paper at ICLR 2017

  31. arXiv:1612.03809

    stat.ML cs.CV cs.LG

    Generalizable Features From Unsupervised Learning

    Authors: Mehdi Mirza, Aaron Courville, Yoshua Bengio

    Abstract: Humans learn a predictive model of the world and use this model to reason about future events and the consequences of actions. In contrast to most machine predictors, we exhibit an impressive ability to generalize to unseen scenarios and reason intelligently in these settings. One important aspect of this ability is physical intuition (Lake et al., 2016). In this work, we explore the potential of u…

    Submitted 12 December, 2016; originally announced December 2016.

  32. arXiv:1612.00799

    cs.CV

    A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images

    Authors: David Vázquez, Jorge Bernal, F. Javier Sánchez, Gloria Fernández-Esparrach, Antonio M. López, Adriana Romero, Michal Drozdzal, Aaron Courville

    Abstract: Colorectal cancer (CRC) is the third cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to perform regular screening in search for polyps and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are polyp miss-rate and inability to perform visual assessment of polyp malignancy. These drawbacks can be reduced…

    Submitted 2 December, 2016; originally announced December 2016.

  33. arXiv:1612.00377

    cs.CL cs.AI cs.LG cs.NE

    Piecewise Latent Variables for Neural Variational Text Processing

    Authors: Iulian V. Serban, Alexander G. Ororbia II, Joelle Pineau, Aaron Courville

    Abstract: Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variables, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables - such…

    Submitted 23 September, 2017; v1 submitted 1 December, 2016; originally announced December 2016.

    Comments: 19 pages, 2 figures, 8 tables; EMNLP 2017

    ACM Class: I.5.1; I.2.7

  34. arXiv:1611.08481

    cs.AI cs.CV

    GuessWhat?! Visual object discovery through multi-modal dialogue

    Authors: Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville

    Abstract: We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the colle…

    Submitted 6 February, 2017; v1 submitted 23 November, 2016; originally announced November 2016.

    Comments: 23 pages; CVPR 2017 submission; see https://guesswhat.ai

  35. arXiv:1611.07810

    cs.CV

    A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering

    Authors: Tegan Maharaj, Nicolas Ballas, Anna Rohrbach, Aaron Courville, Christopher Pal

    Abstract: While deep convolutional neural networks frequently approach or exceed human-level performance at benchmark tasks involving static images, extending this success to moving images is not straightforward. Having models which can learn to understand video is of interest for many applications, including content recommendation, prediction, summarization, event/object detection and understanding human v…

    Submitted 5 February, 2017; v1 submitted 23 November, 2016; originally announced November 2016.

  36. arXiv:1611.05013

    cs.LG

    PixelVAE: A Latent Variable Model for Natural Images

    Authors: Ishaan Gulrajani, Kundan Kumar, Faruk Ahmed, Adrien Ali Taiga, Francesco Visin, David Vazquez, Aaron Courville

    Abstract: Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and model global structure well but have difficulty capturing small details. PixelCNN models details very well, but lacks a latent code and is difficult to scale for capturing large structures. We present PixelVAE, a VAE model with an autoregressive decoder…

    Submitted 15 November, 2016; originally announced November 2016.

  37. arXiv:1610.09038

    stat.ML cs.LG

    Professor Forcing: A New Algorithm for Training Recurrent Networks

    Authors: Alex Lamb, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, Yoshua Bengio

    Abstract: The Teacher Forcing algorithm trains recurrent networks by supplying observed sequence values as inputs during training and using the network's own one-step-ahead predictions to do multi-step sampling. We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling f…

    Submitted 27 October, 2016; originally announced October 2016.

    Comments: NIPS 2016 Accepted Paper

  38. arXiv:1607.07086

    cs.LG

    An Actor-Critic Algorithm for Sequence Prediction

    Authors: Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio

    Abstract: We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a \texti…

    Submitted 3 March, 2017; v1 submitted 24 July, 2016; originally announced July 2016.

  39. arXiv:1606.02680

    cs.CL

    First Result on Arabic Neural Machine Translation

    Authors: Amjad Almahairi, Kyunghyun Cho, Nizar Habash, Aaron Courville

    Abstract: Neural machine translation has become a major alternative to widely used phrase-based statistical machine translation. We notice however that much of research on neural machine translation has focused on European languages despite its language agnostic nature. In this paper, we apply neural machine translation to the task of Arabic translation (Ar<->En) and compare it against a standard phrase-bas…

    Submitted 8 June, 2016; originally announced June 2016.

    Comments: EMNLP submission

  40. arXiv:1606.01305

    cs.NE cs.CL cs.LG

    Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations

    Authors: David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal

    Abstract: We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving generalization. But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feed…

    Submitted 22 September, 2017; v1 submitted 3 June, 2016; originally announced June 2016.

    Comments: David Krueger and Tegan Maharaj contributed equally to this work

  41. arXiv:1606.00776

    cs.CL cs.AI cs.LG cs.NE stat.ML

    Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

    Authors: Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bowen Zhou, Yoshua Bengio, Aaron Courville

    Abstract: We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens. There are many ways to estimate or learn the high-level coarse tokens, but we argue that a simple extraction procedure is suf…

    Submitted 13 June, 2016; v1 submitted 2 June, 2016; originally announced June 2016.

    Comments: 21 pages, 2 figures, 10 tables

    ACM Class: I.5.1; I.2.7

  42. arXiv:1606.00704  [pdf, other]

    stat.ML cs.LG

    Adversarially Learned Inference

    Authors: Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, Aaron Courville

    Abstract: We introduce the adversarially learned inference (ALI) model, which jointly learns a generation network and an inference network using an adversarial process. The generation network maps samples from stochastic latent variables to the data space while the inference network maps training examples in data space to the space of latent variables. An adversarial game is cast between these two networks…

    Submitted 21 February, 2017; v1 submitted 2 June, 2016; originally announced June 2016.

  43. arXiv:1605.06069  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

    Authors: Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio

    Abstract: Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as found between the utterances in a dialogue. In an effort to model this kind of generative process, we propose a neural network-based generative architecture, with latent stochastic variables that span a variable number of time steps. We apply the proposed model to the task of dialogue r…

    Submitted 13 June, 2016; v1 submitted 19 May, 2016; originally announced May 2016.

    Comments: 15 pages, 5 tables, 4 figures

    ACM Class: I.5.1; I.2.7

  44. arXiv:1605.03705  [pdf, other]

    cs.CV cs.CL

    Movie Description

    Authors: Anna Rohrbach, Atousa Torabi, Marcus Rohrbach, Niket Tandon, Christopher Pal, Hugo Larochelle, Aaron Courville, Bernt Schiele

    Abstract: Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full…

    Submitted 12 May, 2016; originally announced May 2016.

  45. arXiv:1605.02688  [pdf, other]

    cs.SC cs.LG cs.MS

    Theano: A Python framework for fast computation of mathematical expressions

    Authors: The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, et al. (88 additional authors not shown)

    Abstract: Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, mu…

    Submitted 9 May, 2016; originally announced May 2016.

    Comments: 19 pages, 5 figures

  46. arXiv:1603.09025  [pdf, other]

    cs.LG

    Recurrent Batch Normalization

    Authors: Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville

    Abstract: We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate…

    Submitted 27 February, 2017; v1 submitted 29 March, 2016; originally announced March 2016.

  47. arXiv:1603.06807  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus

    Authors: Iulian Vlad Serban, Alberto García-Durán, Caglar Gulcehre, Sungjin Ahn, Sarath Chandar, Aaron Courville, Yoshua Bengio

    Abstract: Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances. However, to this date, there are no large-scale question-answer corpora available. In this paper we present the 30M Factoid Question-Answer Corpus, an enormous question answer pair corpus produced by applying a novel neural network architecture on the knowledge base…

    Submitted 29 May, 2016; v1 submitted 22 March, 2016; originally announced March 2016.

    Comments: 13 pages, 1 figure, 7 tables

    ACM Class: H.3.4; I.5.1; I.2.6; I.2.7

  48. arXiv:1602.03220  [pdf, other]

    stat.ML cs.LG

    Discriminative Regularization for Generative Models

    Authors: Alex Lamb, Vincent Dumoulin, Aaron Courville

    Abstract: We explore the question of whether the representations learned by classifiers can be used to enhance the quality of generative models. Our conjecture is that labels correspond to characteristics of natural data which are most salient to humans: identity in faces, objects in images, and utterances in speech. We propose to take advantage of this by using the representations from discriminative class…

    Submitted 15 February, 2016; v1 submitted 9 February, 2016; originally announced February 2016.

  49. arXiv:1511.07838  [pdf, other]

    cs.LG cs.NE

    Dynamic Capacity Networks

    Authors: Amjad Almahairi, Nicolas Ballas, Tim Cooijmans, Yin Zheng, Hugo Larochelle, Aaron Courville

    Abstract: We introduce the Dynamic Capacity Network (DCN), a neural network that can adaptively assign its capacity across different portions of the input data. This is achieved by combining modules of two types: low-capacity sub-networks and high-capacity sub-networks. The low-capacity sub-networks are applied across most of the input, but also provide a guide to select a few portions of the input on which…

    Submitted 22 May, 2016; v1 submitted 24 November, 2015; originally announced November 2015.

    Comments: ICML 2016

  50. arXiv:1511.07053  [pdf, other]

    cs.CV cs.LG

    ReSeg: A Recurrent Neural Network-based Model for Semantic Segmentation

    Authors: Francesco Visin, Marco Ciccone, Adriana Romero, Kyle Kastner, Kyunghyun Cho, Yoshua Bengio, Matteo Matteucci, Aaron Courville

    Abstract: We propose a structured prediction architecture, which exploits the local generic features extracted by Convolutional Neural Networks and the capacity of Recurrent Neural Networks (RNN) to retrieve distant dependencies. The proposed architecture, called ReSeg, is based on the recently introduced ReNet model for image classification. We modify and extend it to perform the more challenging task of s…

    Submitted 24 May, 2016; v1 submitted 22 November, 2015; originally announced November 2015.

    Comments: In CVPR Deep Vision Workshop, 2016