
Showing 151–184 of 184 results for author: Courville, A

  1. arXiv:1511.06481  [pdf, other]

    stat.ML cs.LG

    Variance Reduction in SGD by Distributed Importance Sampling

    Authors: Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, Yoshua Bengio

    Abstract: Humans are able to accelerate their learning by selecting training materials that are the most informative and at the appropriate level of difficulty. We propose a framework for distributing deep learning in which one set of workers search for the most informative examples in parallel while a single worker updates the model on examples selected by importance sampling. This leads the model to updat…

    Submitted 16 April, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

  2. arXiv:1511.06456  [pdf, other]

    cs.LG

    Task Loss Estimation for Sequence Prediction

    Authors: Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio

    Abstract: Often, the performance on a supervised machine learning task is evaluated with a task loss function that cannot be optimized directly. Examples of such loss functions include the classification error, the edit distance and the BLEU score. A common workaround for this problem is to instead optimize a surrogate loss function, such as for instance cross-entropy or hinge loss. In order for…

    Submitted 19 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Submitted to ICLR 2016

  3. arXiv:1511.06432  [pdf, other]

    cs.CV cs.LG cs.NE

    Delving Deeper into Convolutional Networks for Learning Video Representations

    Authors: Nicolas Ballas, Li Yao, Chris Pal, Aaron Courville

    Abstract: We propose an approach to learn spatio-temporal features in videos from intermediate visual representations we call "percepts" using Gated-Recurrent-Unit Recurrent Networks (GRUs). Our method relies on percepts that are extracted from all levels of a deep convolutional network trained on the large ImageNet dataset. While high-level percepts contain highly discriminative information, they tend to hav…

    Submitted 1 March, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: ICLR 2016

  4. arXiv:1511.06430  [pdf, other]

    cs.LG

    Deconstructing the Ladder Network Architecture

    Authors: Mohammad Pezeshki, Linxi Fan, Philemon Brakel, Aaron Courville, Yoshua Bengio

    Abstract: The manual labeling of data is and will remain a costly endeavor. For this reason, semi-supervised learning remains a topic of practical importance. The recently proposed Ladder Network is one such approach that has proven to be very successful. In addition to the supervised objective, the Ladder Network also adds an unsupervised objective corresponding to the reconstruction costs of a stack of de…

    Submitted 24 May, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016

  5. arXiv:1511.06428  [pdf, other]

    cs.LG cs.CV

    A Controller-Recognizer Framework: How necessary is recognition for control?

    Authors: Marcin Moczulski, Kelvin Xu, Aaron Courville, Kyunghyun Cho

    Abstract: Recently there has been growing interest in building active visual object recognizers, as opposed to the usual passive recognizers which classify a given static image into a predefined set of object categories. In this paper we propose to generalize these recently proposed end-to-end active visual recognizers into a controller-recognizer framework. A model in the controller-recognizer framework…

    Submitted 9 February, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

  6. arXiv:1507.04808  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

    Authors: Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau

    Abstract: We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models. Generative models produce system responses that are autonomously generated word-by-word, opening up the possibility for realistic, flexible interactions. In support of this goal, we extend the recently proposed hierarchical recurrent encoder-decoder neural netwo…

    Submitted 6 April, 2016; v1 submitted 16 July, 2015; originally announced July 2015.

    Comments: 8 pages with references; Published in AAAI 2016 (Special Track on Cognitive Systems)

    ACM Class: I.5.1; I.2.7

  7. arXiv:1507.01053  [pdf, other]

    cs.NE cs.CL cs.CV cs.LG

    Describing Multimedia Content using Attention-based Encoder--Decoder Networks

    Authors: Kyunghyun Cho, Aaron Courville, Yoshua Bengio

    Abstract: Whereas deep neural networks were first mostly used for classification tasks, they are rapidly expanding in the realm of structured output problems, where the observed target is composed of multiple random variables that have a rich joint distribution, given the input. We focus in this paper on the case where the input also has a rich structure and the input and output structures are somehow relat…

    Submitted 3 July, 2015; originally announced July 2015.

    Comments: Submitted to IEEE Transactions on Multimedia Special Issue on Deep Learning for Multimedia Computing

  8. arXiv:1506.02216  [pdf, other]

    cs.LG

    A Recurrent Latent Variable Model for Sequential Data

    Authors: Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio

    Abstract: In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder. We argue that through the use of high-level latent random variables, the variational RNN (VRNN) can model the kind of variability observed in highly structured sequential data such as natural speech. We empirical…

    Submitted 6 April, 2016; v1 submitted 7 June, 2015; originally announced June 2015.

  9. Brain Tumor Segmentation with Deep Neural Networks

    Authors: Mohammad Havaei, Axel Davy, David Warde-Farley, Antoine Biard, Aaron Courville, Yoshua Bengio, Chris Pal, Pierre-Marc Jodoin, Hugo Larochelle

    Abstract: In this paper, we present a fully automatic brain tumor segmentation method based on Deep Neural Networks (DNNs). The proposed networks are tailored to glioblastomas (both low and high grade) pictured in MR images. By their very nature, these tumors can appear anywhere in the brain and have almost any kind of shape, size, and contrast. These reasons motivate our exploration of a machine learning s…

    Submitted 20 May, 2016; v1 submitted 13 May, 2015; originally announced May 2015.

  10. arXiv:1505.00393  [pdf, other]

    cs.CV

    ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks

    Authors: Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio

    Abstract: In this paper, we propose a deep neural network architecture for object recognition based on recurrent neural networks. The proposed network, called ReNet, replaces the ubiquitous convolution+pooling layer of the deep convolutional neural network with four recurrent neural networks that sweep horizontally and vertically in both directions across the image. We evaluate the proposed ReNet on three w…

    Submitted 23 July, 2015; v1 submitted 3 May, 2015; originally announced May 2015.

  11. arXiv:1503.01800  [pdf, other]

    cs.LG cs.CV

    EmoNets: Multimodal deep learning approaches for emotion recognition in video

    Authors: Samira Ebrahimi Kahou, Xavier Bouthillier, Pascal Lamblin, Caglar Gulcehre, Vincent Michalski, Kishore Konda, Sébastien Jean, Pierre Froumenty, Yann Dauphin, Nicolas Boulanger-Lewandowski, Raul Chandias Ferrari, Mehdi Mirza, David Warde-Farley, Aaron Courville, Pascal Vincent, Roland Memisevic, Christopher Pal, Yoshua Bengio

    Abstract: The task of the emotion recognition in the wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches which consider combinations of features from multiple…

    Submitted 29 March, 2015; v1 submitted 5 March, 2015; originally announced March 2015.

  12. arXiv:1503.01070  [pdf, other]

    cs.CV cs.AI

    Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research

    Authors: Atousa Torabi, Christopher Pal, Hugo Larochelle, Aaron Courville

    Abstract: In this work, we introduce a dataset of video annotated with high quality natural language phrases describing the visual content in a given segment of time. Our dataset is based on the Descriptive Video Service (DVS) that is now encoded on many digital media products such as DVDs. DVS is an audio narration describing the visual elements and actions in a movie for the visually impaired. It is tempo…

    Submitted 3 March, 2015; originally announced March 2015.

    Comments: 7 pages

  13. arXiv:1502.08029  [pdf, other]

    stat.ML cs.AI cs.CL cs.CV cs.LG

    Describing Videos by Exploiting Temporal Structure

    Authors: Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville

    Abstract: Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application for video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description. In this context, we propose an approach that successfully…

    Submitted 30 September, 2015; v1 submitted 27 February, 2015; originally announced February 2015.

    Comments: Accepted to ICCV15. This version comes with code release and supplementary material

  14. arXiv:1502.03044  [pdf, other]

    cs.LG cs.CV

    Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

    Authors: Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio

    Abstract: Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through visualization how the model is able to auto…

    Submitted 19 April, 2016; v1 submitted 10 February, 2015; originally announced February 2015.

  15. arXiv:1410.0123  [pdf, other]

    cs.LG stat.ML

    Deep Tempering

    Authors: Guillaume Desjardins, Heng Luo, Aaron Courville, Yoshua Bengio

    Abstract: Restricted Boltzmann Machines (RBMs) are one of the fundamental building blocks of deep learning. Approximate maximum likelihood training of RBMs typically necessitates sampling from these models. In many training scenarios, computationally efficient Gibbs sampling procedures are crippled by poor mixing. In this work we propose a novel method of sampling from Boltzmann machines that demonstrates a…

    Submitted 1 October, 2014; originally announced October 2014.

  16. arXiv:1406.2661  [pdf, other]

    stat.ML cs.LG

    Generative Adversarial Networks

    Authors: Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

    Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This fram…

    Submitted 10 June, 2014; originally announced June 2014.

  17. arXiv:1312.6211  [pdf, other]

    stat.ML cs.LG cs.NE

    An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

    Authors: Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, Yoshua Bengio

    Abstract: Catastrophic forgetting is a problem faced by many machine learning models and algorithms. When trained on one task, then trained on a second task, many machine learning models "forget" how to perform the first task. This is widely believed to be a serious problem for neural networks. Here, we investigate the extent to which the catastrophic forgetting problem occurs for modern neural networks, co…

    Submitted 3 March, 2015; v1 submitted 21 December, 2013; originally announced December 2013.

  18. arXiv:1312.6197  [pdf, other]

    stat.ML cs.LG cs.NE

    An empirical analysis of dropout in piecewise linear networks

    Authors: David Warde-Farley, Ian J. Goodfellow, Aaron Courville, Yoshua Bengio

    Abstract: The recently introduced dropout training criterion for neural networks has been the subject of much attention due to its simplicity and remarkable effectiveness as a regularizer, as well as its interpretation as a training procedure for an exponentially large ensemble of networks that share parameters. In this work we empirically investigate several questions related to the efficacy of dropout, sp…

    Submitted 2 January, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: Extensive updates; 8 pages plus acknowledgements/references

  19. arXiv:1312.5258  [pdf, other]

    stat.ML cs.LG

    On the Challenges of Physical Implementations of RBMs

    Authors: Vincent Dumoulin, Ian J. Goodfellow, Aaron Courville, Yoshua Bengio

    Abstract: Restricted Boltzmann machines (RBMs) are powerful machine learning models, but learning and some kinds of inference in the model require sampling-based approximations, which, in classical digital computers, are implemented using expensive MCMC. Physical computation offers the opportunity to reduce the cost of sampling by building physical systems whose natural dynamics correspond to drawing sample…

    Submitted 24 October, 2014; v1 submitted 18 December, 2013; originally announced December 2013.

    Journal ref: Proc. AAAI 2014, pp. 1199-1205

  20. arXiv:1308.3432  [pdf, other]

    cs.LG

    Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

    Authors: Yoshua Bengio, Nicholas Léonard, Aaron Courville

    Abstract: Stochastic neurons and hard non-linearities can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such stochastic or non-smooth neurons? I.e., can we "back-propagate" through these stochastic neurons? We examine this question, existing approaches, and compare fou…

    Submitted 15 August, 2013; originally announced August 2013.

    Comments: arXiv admin note: substantial text overlap with arXiv:1305.2982

  21. arXiv:1307.0414  [pdf, other]

    stat.ML cs.LG

    Challenges in Representation Learning: A report on three machine learning contests

    Authors: Ian J. Goodfellow, Dumitru Erhan, Pierre Luc Carrier, Aaron Courville, Mehdi Mirza, Ben Hamner, Will Cukierski, Yichuan Tang, David Thaler, Dong-Hyun Lee, Yingbo Zhou, Chetan Ramaiah, Fangxiang Feng, Ruifan Li, Xiaojie Wang, Dimitris Athanasakis, John Shawe-Taylor, Maxim Milakov, John Park, Radu Ionescu, Marius Popescu, Cristian Grozea, James Bergstra, Jingjing Xie, Lukasz Romaszko, et al. (3 additional authors not shown)

    Abstract: The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kin…

    Submitted 1 July, 2013; originally announced July 2013.

    Comments: 8 pages, 2 figures

  22. arXiv:1302.4389  [pdf, other]

    stat.ML cs.LG

    Maxout Networks

    Authors: Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio

    Abstract: We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. We define a simple new model called maxout (so named because its output is the max of a set of inputs, and because it is a natural companion to dropout) designed to both facilitate optimization by dropout and improve the accuracy of dropout's fast approximate model av…

    Submitted 20 September, 2013; v1 submitted 18 February, 2013; originally announced February 2013.

    Comments: This is the version of the paper that appears in ICML 2013

    Journal ref: JMLR WCP 28 (3): 1319-1327, 2013

  23. arXiv:1301.3568  [pdf, other]

    stat.ML cs.LG

    Joint Training Deep Boltzmann Machines for Classification

    Authors: Ian J. Goodfellow, Aaron Courville, Yoshua Bengio

    Abstract: We introduce a new method for training deep Boltzmann machines jointly. Prior methods of training DBMs require an initial learning pass that trains the model greedily, one layer at a time, or do not perform well on classification tasks. In our approach, we train all layers of the DBM simultaneously, using a novel training procedure called multi-prediction training. The resulting model can either b…

    Submitted 1 May, 2013; v1 submitted 15 January, 2013; originally announced January 2013.

    Comments: Major revision with new techniques and experiments. This version includes new material put on the poster for the ICLR workshop

  24. arXiv:1301.3545  [pdf, other]

    cs.LG cs.NE stat.ML

    Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines

    Authors: Guillaume Desjardins, Razvan Pascanu, Aaron Courville, Yoshua Bengio

    Abstract: This paper introduces the Metric-Free Natural Gradient (MFNG) algorithm for training Boltzmann Machines. Similar in spirit to the Hessian-Free method of Martens [8], our algorithm belongs to the family of truncated Newton methods and exploits an efficient matrix-vector product to avoid explicitly storing the natural gradient metric $L$. This metric is shown to be the expected second derivative of…

    Submitted 16 March, 2013; v1 submitted 15 January, 2013; originally announced January 2013.

  25. arXiv:1212.2686  [pdf, ps, other]

    stat.ML cs.LG

    Joint Training of Deep Boltzmann Machines

    Authors: Ian Goodfellow, Aaron Courville, Yoshua Bengio

    Abstract: We introduce a new method for training deep Boltzmann machines jointly. Prior methods require an initial learning pass that trains the deep Boltzmann machine greedily, one layer at a time, or do not perform well on classification tasks.

    Submitted 11 December, 2012; originally announced December 2012.

    Comments: 4 pages

  26. arXiv:1211.5687  [pdf, other]

    cs.LG stat.ML

    Texture Modeling with Convolutional Spike-and-Slab RBMs and Deep Extensions

    Authors: Heng Luo, Pierre Luc Carrier, Aaron Courville, Yoshua Bengio

    Abstract: We apply the spike-and-slab Restricted Boltzmann Machine (ssRBM) to texture modeling. The ssRBM with tiled-convolution weight sharing (TssRBM) achieves or surpasses the state-of-the-art on texture synthesis and inpainting by parametric models. We also develop a novel RBM model with a spike-and-slab visible layer and binary variables in the hidden layer. This model is designed to be stacked on top…

    Submitted 24 November, 2012; originally announced November 2012.

  27. arXiv:1210.5474  [pdf, other]

    stat.ML cs.LG cs.NE

    Disentangling Factors of Variation via Generative Entangling

    Authors: Guillaume Desjardins, Aaron Courville, Yoshua Bengio

    Abstract: Here we propose a novel model family with the objective of learning to disentangle the factors of variation in data. Our approach is based on the spike-and-slab restricted Boltzmann machine which we generalize to include higher-order interactions among multiple latent variables. Seen from a generative perspective, the multiplicative interactions emulate the entangling of factors of variation. Inf…

    Submitted 19 October, 2012; originally announced October 2012.

  28. arXiv:1209.0521  [pdf, other]

    cs.LG stat.ML

    Efficient EM Training of Gaussian Mixtures with Missing Data

    Authors: Olivier Delalleau, Aaron Courville, Yoshua Bengio

    Abstract: In data-mining applications, we are frequently faced with a large fraction of missing entries in the data matrix, which is problematic for most discriminant machine learning algorithms. A solution that we explore in this paper is the use of a generative model (a mixture of Gaussians) to compute the conditional expectation of the missing variables given the observed variables. Since training a Gaus…

    Submitted 8 January, 2018; v1 submitted 3 September, 2012; originally announced September 2012.

  29. arXiv:1206.6407  [pdf]

    cs.LG stat.ML

    Large-Scale Feature Learning With Spike-and-Slab Sparse Coding

    Authors: Ian Goodfellow, Aaron Courville, Yoshua Bengio

    Abstract: We consider the problem of object recognition with a large number of classes. In order to overcome the low amount of labeled examples available in this setting, we introduce a new feature learning and extraction procedure based on a factor model we call spike-and-slab sparse coding (S3C). Prior work on S3C has not prioritized the ability to exploit parallel architectures and scale S3C to the enorm…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012). arXiv admin note: substantial text overlap with arXiv:1201.3382

  30. arXiv:1206.5538  [pdf, other]

    cs.LG

    Representation Learning: A Review and New Perspectives

    Authors: Yoshua Bengio, Aaron Courville, Pascal Vincent

    Abstract: The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is…

    Submitted 23 April, 2014; v1 submitted 24 June, 2012; originally announced June 2012.

  31. arXiv:1203.4416  [pdf, other]

    cs.NE cs.AI cs.LG

    On Training Deep Boltzmann Machines

    Authors: Guillaume Desjardins, Aaron Courville, Yoshua Bengio

    Abstract: The deep Boltzmann machine (DBM) has been an important development in the quest for powerful "deep" probabilistic models. To date, simultaneous or joint training of all layers of the DBM has been largely unsuccessful with existing training methods. We introduce a simple regularization scheme that encourages the weight vectors associated with each hidden unit to have similar norms. We demonstrate t…

    Submitted 20 March, 2012; originally announced March 2012.

  32. arXiv:1201.3382  [pdf, other]

    stat.ML cs.LG

    Spike-and-Slab Sparse Coding for Unsupervised Feature Discovery

    Authors: Ian J. Goodfellow, Aaron Courville, Yoshua Bengio

    Abstract: We consider the problem of using a factor model we call spike-and-slab sparse coding (S3C) to learn features for a classification task. The S3C model resembles both the spike-and-slab RBM and sparse coding. Since exact inference in this model is intractable, we derive a structured variational inference procedure and employ a variational EM training algorithm. Prior work on approximate infere…

    Submitted 3 April, 2012; v1 submitted 16 January, 2012; originally announced January 2012.

  33. arXiv:1109.6638  [pdf, other]

    cs.CV cs.AI

    The Statistical Inefficiency of Sparse Coding for Images (or, One Gabor to Rule them All)

    Authors: James Bergstra, Aaron Courville, Yoshua Bengio

    Abstract: Sparse coding is a proven principle for learning compact representations of images. However, sparse coding by itself often leads to very redundant dictionaries. With images, this often takes the form of similar edge detectors which are replicated many times at various positions, scales and orientations. An immediate consequence of this observation is that the estimation of the dictionary component…

    Submitted 30 September, 2011; v1 submitted 29 September, 2011; originally announced September 2011.

    Comments: 9 pages, 8 figures

  34. arXiv:1012.3476  [pdf, other]

    stat.ML cs.NE

    Adaptive Parallel Tempering for Stochastic Maximum Likelihood Learning of RBMs

    Authors: Guillaume Desjardins, Aaron Courville, Yoshua Bengio

    Abstract: Restricted Boltzmann Machines (RBM) have attracted a lot of attention of late, as one of the principal building blocks of deep networks. Training RBMs remains problematic however, because of the intractability of their partition function. The maximum likelihood gradient requires a very robust sampler which can accurately sample from the model despite the loss of ergodicity often incurred during learn…

    Submitted 15 December, 2010; originally announced December 2010.

    Comments: Presented at the "NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning"