
Showing 101–117 of 117 results for author: Pascanu, R

  1. arXiv:1605.02688

    cs.SC cs.LG cs.MS

    Theano: A Python framework for fast computation of mathematical expressions

    Authors: The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, et al. (88 additional authors not shown)

    Abstract: Theano is a Python library that allows users to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano has been actively and continuously developed since 2008, mu…

    Submitted 9 May, 2016; originally announced May 2016.

    Comments: 19 pages, 5 figures
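
The define, optimize, evaluate workflow described above can be illustrated with a toy expression graph. This is a minimal sketch in plain Python and not Theano's API: the `Var`, `Const`, `Add` and `Mul` node classes and the two rewrites (constant folding, and removing `* 1` and `+ 0`) are hypothetical stand-ins for the much larger set of graph optimizations a real expression compiler applies.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical node types of a tiny symbolic expression graph.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Const, Add, Mul]


def optimize(e: Expr) -> Expr:
    """Rewrite the graph bottom-up: fold constants, drop `* 1` and `+ 0`."""
    if isinstance(e, (Var, Const)):
        return e
    left, right = optimize(e.left), optimize(e.right)
    is_mul = isinstance(e, Mul)
    if isinstance(left, Const) and isinstance(right, Const):
        # Constant folding: the value is computed once, before any evaluation.
        return Const(left.value * right.value if is_mul else left.value + right.value)
    identity = Const(1.0) if is_mul else Const(0.0)  # neutral element of the op
    if right == identity:
        return left
    if left == identity:
        return right
    return Mul(left, right) if is_mul else Add(left, right)


def evaluate(e: Expr, env: Dict[str, float]) -> float:
    """Evaluate a graph for concrete values of its input variables."""
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Add):
        return evaluate(e.left, env) + evaluate(e.right, env)
    return evaluate(e.left, env) * evaluate(e.right, env)


# Define: y = x * (2 + (-1)) + 0 * 3
x = Var("x")
y = Add(Mul(x, Add(Const(2.0), Const(-1.0))), Mul(Const(0.0), Const(3.0)))

print(optimize(y))                         # Var(name='x')
print(evaluate(optimize(y), {"x": 5.0}))   # 5.0
```

Keeping the graph separate from its evaluation is what makes the rewrite step possible: `optimize` can inspect and simplify the whole expression before any input values are known.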

  2. arXiv:1511.06295

    cs.LG

    Policy Distillation

    Authors: Andrei A. Rusu, Sergio Gomez Colmenarejo, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell

    Abstract: Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and…

    Submitted 7 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Submitted to ICLR 2016
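
The abstract is cut off before it states the training objective, so the following is an assumption rather than the paper's stated method: one common way to distill a Q-network's policy is to soften the teacher's Q-values into a distribution with a low temperature `tau` and train the student to minimize the KL divergence to that distribution. The function names, the Q-values and `tau = 0.01` are hypothetical.

```python
import math
from typing import List


def softmax(values: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_q: List[float], student_logits: List[float],
                      tau: float = 0.01) -> float:
    """Hypothetical per-state loss: KL(softmax(teacher_q / tau) || softmax(student))."""
    target = softmax(teacher_q, temperature=tau)
    prediction = softmax(student_logits)
    return kl_divergence(target, prediction)


teacher_q = [1.0, 1.2, 0.9]  # made-up Q-values for three actions
print(distillation_loss(teacher_q, [0.0, 0.0, 0.0]))   # uniform student
print(distillation_loss(teacher_q, [0.0, 10.0, 0.0]))  # student favors action 1
```

A small `tau` turns the teacher's Q-values into a nearly one-hot target on the greedy action, so a student that puts most of its mass on that action gets a loss close to zero, while a uniform student pays about log(3) for three actions.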

  3. arXiv:1507.00210

    stat.ML cs.LG cs.NE

    Natural Neural Networks

    Authors: Guillaume Desjardins, Karen Simonyan, Razvan Pascanu, Koray Kavukcuoglu

    Abstract: We introduce Natural Neural Networks, a novel family of algorithms that speed up convergence by adapting their internal representation during training to improve conditioning of the Fisher matrix. In particular, we show a specific example that employs a simple and efficient reparametrization of the neural network weights by implicitly whitening the representation obtained at each layer, while pres…

    Submitted 1 July, 2015; originally announced July 2015.
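
Whitened activations have zero mean and identity covariance. The sketch below shows only that target property. It assumes ZCA whitening, (Σ + εI)^(-1/2)(h − μ), applied explicitly to made-up activations of a two-unit layer. It is not the paper's reparametrization, which whitens implicitly and whose details are cut off above. It uses the closed form √M = (M + √(det M)·I) / √(tr M + 2√(det M)) for a 2×2 symmetric positive-definite M, so no linear-algebra library is needed.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Matrix = Tuple[Tuple[float, float], Tuple[float, float]]


def mean_and_cov(points: List[Point]) -> Tuple[Point, Matrix]:
    """Sample mean and (population) covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))


def inv_sqrt_2x2(m: Matrix) -> Matrix:
    """M^(-1/2) for a 2x2 symmetric positive-definite M."""
    (a, b), (_, d) = m
    s = math.sqrt(a * d - b * b)          # sqrt(det M)
    t = math.sqrt(a + d + 2 * s)          # sqrt(tr M + 2 sqrt(det M))
    ra, rb, rd = (a + s) / t, b / t, (d + s) / t  # entries of sqrt(M)
    det = ra * rd - rb * rb
    return ((rd / det, -rb / det), (-rb / det, ra / det))  # inverse of sqrt(M)


def zca_whiten(points: List[Point], eps: float = 1e-8) -> List[Point]:
    """Center the points, then multiply by (cov + eps * I)^(-1/2)."""
    (mx, my), ((sxx, sxy), (_, syy)) = mean_and_cov(points)
    w = inv_sqrt_2x2(((sxx + eps, sxy), (sxy, syy + eps)))
    out = []
    for x, y in points:
        cx, cy = x - mx, y - my
        out.append((w[0][0] * cx + w[0][1] * cy, w[1][0] * cx + w[1][1] * cy))
    return out


# Hypothetical, strongly correlated activations of a two-unit layer.
acts = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8)]
white = zca_whiten(acts)
print(mean_and_cov(white))  # expect mean ~ (0, 0) and covariance ~ identity
```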

  4. arXiv:1406.2572

    cs.LG math.OC stat.ML

    Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

    Authors: Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

    Abstract: A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima wit…

    Submitted 10 June, 2014; originally announced June 2014.

    Comments: The theoretical review and analysis in this article draw heavily from arXiv:1405.4604 [cs.LG]

  5. arXiv:1405.4604

    cs.LG cs.NE

    On the saddle point problem for non-convex optimization

    Authors: Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, Yoshua Bengio

    Abstract: A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for the ability of these local methods to find the global minimum is the proliferation of l…

    Submitted 27 May, 2014; v1 submitted 19 May, 2014; originally announced May 2014.

    Comments: 11 pages, 8 figures
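
Both saddle-point entries concern critical points where the gradient vanishes but the surface curves upward in some directions and downward in others. Here is a minimal sketch on the hypothetical function f(x, y) = x² − y². The signs of the Hessian eigenvalues classify the critical point at the origin. Plain gradient descent started very close to it needs many steps to move away, because the escaping coordinate only grows by a factor of 1 + 2·lr per step.

```python
import math
from typing import Tuple


def hessian_eigenvalues(a: float, b: float, d: float) -> Tuple[float, float]:
    """Eigenvalues (ascending) of the symmetric 2x2 Hessian [[a, b], [b, d]]."""
    mid = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mid - radius, mid + radius


def classify(a: float, b: float, d: float) -> str:
    """Classify a critical point from the signs of its Hessian eigenvalues."""
    lo, hi = hessian_eigenvalues(a, b, d)
    if lo > 0:
        return "local minimum"
    if hi < 0:
        return "local maximum"
    if lo < 0 < hi:
        return "saddle point"
    return "degenerate (zero eigenvalue)"


# f(x, y) = x**2 - y**2: gradient (2x, -2y) vanishes at the origin,
# and the Hessian is the constant matrix [[2, 0], [0, -2]].
print(classify(2.0, 0.0, -2.0))  # saddle point


def steps_to_escape(y0: float, lr: float = 0.1) -> int:
    """Gradient descent on f from (0, y0); count steps until |y| > 1."""
    if y0 == 0.0:
        raise ValueError("starting exactly on the saddle, descent never leaves it")
    x, y, steps = 0.0, y0, 0
    while abs(y) <= 1.0:
        x, y = x - lr * 2 * x, y - lr * (-2 * y)  # step against grad (2x, -2y)
        steps += 1
    return steps


print(steps_to_escape(1e-6), steps_to_escape(1e-3))
```

Each factor of 10 closer to the saddle adds roughly log(10)/log(1.2) ≈ 12.6 steps with `lr = 0.1`.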

  6. arXiv:1402.1869

    stat.ML cs.LG cs.NE

    On the Number of Linear Regions of Deep Neural Networks

    Authors: Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio

    Abstract: We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. Deep networks are able to sequentially map portions of each layer's input-space to the same output. In this way, deep models compute functions that react equally to complicated patterns of different inputs.…

    Submitted 7 June, 2014; v1 submitted 8 February, 2014; originally announced February 2014.

  7. arXiv:1312.6098

    cs.LG cs.NE

    On the number of response regions of deep feed forward networks with piece-wise linear activations

    Authors: Razvan Pascanu, Guido Montufar, Yoshua Bengio

    Abstract: This paper explores the complexity of deep feedforward networks with linear pre-synaptic couplings and rectified linear activations. This is a contribution to the growing body of work contrasting the representational power of deep and shallow network architectures. In particular, we offer a framework for comparing deep and shallow models that belong to the family of piecewise linear functions base…

    Submitted 14 February, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: 17 pages, 9 figures
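
A small experiment shows how depth can multiply the number of linear regions. The sketch assumes a "tent map" folding construction. This is a common textbook illustration and not necessarily the construction used in either paper above. Each layer has two ReLUs and computes t(z) = 2·relu(z) − 4·relu(z − 0.5), which folds [0, 1] onto itself, so composing k layers gives a sawtooth. Regions are counted by scanning a fine grid on (0, 1) and counting changes in the on/off pattern of all ReLUs, since the network is affine wherever that pattern is constant.

```python
from typing import List, Tuple


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def tent_network(x: float, depth: int) -> Tuple[float, Tuple[bool, ...]]:
    """Apply `depth` tent layers; return the output and every ReLU's on/off state."""
    pattern: List[bool] = []
    z = x
    for _ in range(depth):
        a, b = z, z - 0.5                  # pre-activations of the layer's two ReLUs
        pattern += [a > 0.0, b > 0.0]
        z = 2.0 * relu(a) - 4.0 * relu(b)  # folds [0, 1] onto [0, 1]
    return z, tuple(pattern)


def count_linear_regions(depth: int, grid: int = 4096) -> int:
    """Count maximal runs of a constant activation pattern along (0, 1).

    Grid points are cell midpoints (i + 0.5) / grid, so with `grid` a power of
    two larger than 2**depth they never land exactly on a breakpoint.
    """
    regions, previous = 0, None
    for i in range(grid):
        _, pattern = tent_network((i + 0.5) / grid, depth)
        if pattern != previous:
            regions += 1
            previous = pattern
    return regions


for depth in range(1, 6):
    print(f"depth={depth}: {2 * depth} ReLUs, {count_linear_regions(depth)} regions")
```

The scan reports 2^k regions for k layers (2k ReLUs). By contrast, a single hidden layer of n ReLUs on a scalar input can produce at most n + 1 linear pieces, because each unit contributes at most one breakpoint.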

  8. arXiv:1312.6026

    cs.NE cs.LG stat.ML

    How to Construct Deep Recurrent Neural Networks

    Authors: Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio

    Abstract: In this paper, we explore different ways to extend a recurrent neural network (RNN) to a deep RNN. We start by arguing that the concept of depth in an RNN is not as clear as it is in feedforward neural networks. By carefully analyzing and understanding the architecture of an RNN, however, we find three points of an RNN which may be made deeper; (1) input-to-hidden function, (2) hidden-to-…

    Submitted 24 April, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: Accepted at ICLR 2014 (Conference Track). 10-page text + 3-page references
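
Of the three places listed in the abstract, only the first, the input-to-hidden function, is named before the text is cut off. The following is a minimal sketch of that one variant. It assumes a tanh RNN whose input-to-hidden map is a small two-layer MLP `g` instead of a single linear map. The layer sizes, random weights, the class name `DeepInputRNN` and the omission of biases are all hypothetical choices, and the hidden-to-hidden transition is left shallow.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(u: Vector, v: Vector) -> Vector:
    return [a + b for a, b in zip(u, v)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class DeepInputRNN:
    """Hypothetical RNN: h_t = tanh(W h_{t-1} + U g(x_t)), with g an MLP.

    A shallow RNN would feed x_t through a single linear map; here the
    input-to-hidden function g has a ReLU hidden layer (biases omitted).
    """

    def __init__(self, n_in: int, n_mlp: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.A = random_matrix(n_mlp, n_in, rng)          # MLP layer 1
        self.B = random_matrix(n_mlp, n_mlp, rng)         # MLP layer 2
        self.U = random_matrix(n_hidden, n_mlp, rng)      # MLP output -> hidden
        self.W = random_matrix(n_hidden, n_hidden, rng)   # hidden -> hidden
        self.n_hidden = n_hidden

    def g(self, x: Vector) -> Vector:
        hidden = [max(0.0, z) for z in matvec(self.A, x)]
        return [math.tanh(z) for z in matvec(self.B, hidden)]

    def run(self, xs: List[Vector]) -> List[Vector]:
        h = [0.0] * self.n_hidden
        states: List[Vector] = []
        for x in xs:
            h = [math.tanh(z) for z in add(matvec(self.W, h), matvec(self.U, self.g(x)))]
            states.append(h)
        return states


rnn = DeepInputRNN(n_in=3, n_mlp=4, n_hidden=2)
sequence = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [0.0, 1.0, 0.0]]
for t, h in enumerate(rnn.run(sequence)):
    print(t, [round(v, 3) for v in h])
```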

  9. arXiv:1311.1780

    cs.NE cs.LG stat.ML

    Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks

    Authors: Caglar Gulcehre, Kyunghyun Cho, Razvan Pascanu, Yoshua Bengio

    Abstract: In this paper we propose and investigate a novel nonlinear unit, called $L_p$ unit, for deep neural networks. The proposed $L_p$ unit receives signals from several projections of a subset of units in the layer below and computes a normalized $L_p$ norm. We notice two interesting interpretations of the $L_p$ unit. First, the proposed unit can be understood as a generalization of a number of convent…

    Submitted 1 September, 2014; v1 submitted 7 November, 2013; originally announced November 2013.

    Comments: ECML/PKDD 2014
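
One way to write a normalized L_p norm over N projections z_i of the layer below is (1/N · Σ_i |z_i|^p)^(1/p). The sketch assumes exactly that form with a fixed p. The abstract does not spell out any centering or how p is learned, so those are left out, and the `project` helper is hypothetical.

```python
from typing import List, Sequence


def lp_unit(projections: Sequence[float], p: float) -> float:
    """Normalized L_p norm: (mean_i |z_i| ** p) ** (1 / p), assumed form."""
    if p < 1:
        raise ValueError("p must be >= 1 for this to be a norm")
    n = len(projections)
    return (sum(abs(z) ** p for z in projections) / n) ** (1.0 / p)


def project(x: Sequence[float], weights: Sequence[Sequence[float]],
            biases: Sequence[float]) -> List[float]:
    """Hypothetical projections z_i = w_i . x + b_i of the layer below."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


z = project([1.0, -2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.5])
print(z)  # [1.0, -2.0, -0.5]
for p in (1.0, 2.0, 8.0, 64.0):
    print(p, lp_unit(z, p))
```

Because this is a power mean of the magnitudes, p = 1 gives their average, p = 2 their root mean square, and as p grows the value rises toward the largest |z_i| (here 2.0) without exceeding it.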

  10. arXiv:1308.4214

    stat.ML cs.LG cs.MS

    Pylearn2: a machine learning research library

    Authors: Ian J. Goodfellow, David Warde-Farley, Pascal Lamblin, Vincent Dumoulin, Mehdi Mirza, Razvan Pascanu, James Bergstra, Frédéric Bastien, Yoshua Bengio

    Abstract: Pylearn2 is a machine learning research library. This does not just mean that it is a collection of machine learning algorithms that share a common API; it means that it has been designed for flexibility and extensibility in order to facilitate research projects that involve new or unusual use cases. In this paper we give a brief history of the library, an overview of its basic philosophy, a summa…

    Submitted 19 August, 2013; originally announced August 2013.

    Comments: 9 pages

  11. arXiv:1301.3584

    cs.LG math.NA

    Revisiting Natural Gradient for Deep Networks

    Authors: Razvan Pascanu, Yoshua Bengio

    Abstract: We evaluate natural gradient, an algorithm originally proposed in Amari (1997), for learning deep models. The contributions of this paper are as follows. We show the connection between natural gradient and three other recently proposed methods for training deep models: Hessian-Free (Martens, 2010), Krylov Subspace Descent (Vinyals and Povey, 2012) and TONGA (Le Roux et al., 2008). We describe how…

    Submitted 17 February, 2014; v1 submitted 15 January, 2013; originally announced January 2013.
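
Natural gradient preconditions the ordinary gradient with the inverse Fisher information matrix, θ ← θ − η F(θ)⁻¹ ∇L(θ). The one-parameter sketch below assumes a Bernoulli model with success probability θ fitted to data whose mean is m. This is a toy setting, not an experiment from the paper. The mean negative log-likelihood has gradient (θ − m)/(θ(1 − θ)) and the Fisher information is 1/(θ(1 − θ)), so the natural gradient reduces to θ − m.

```python
def nll_gradient(theta: float, m: float) -> float:
    """d/dtheta of the mean Bernoulli negative log-likelihood for data mean m."""
    return (theta - m) / (theta * (1.0 - theta))


def fisher(theta: float) -> float:
    """Fisher information of one Bernoulli(theta) observation."""
    return 1.0 / (theta * (1.0 - theta))


def plain_step(theta: float, m: float, lr: float) -> float:
    return theta - lr * nll_gradient(theta, m)


def natural_step(theta: float, m: float, lr: float) -> float:
    return theta - lr * nll_gradient(theta, m) / fisher(theta)  # F^{-1} * grad


m = 0.8  # hypothetical fraction of positive outcomes in the data
for start in (0.5, 0.05):
    print(start, plain_step(start, m, lr=0.1), natural_step(start, m, lr=1.0))
```

Starting from θ = 0.05, the plain step with η = 0.1 jumps to about 1.63, outside the valid range for a probability, because the raw gradient is large near the boundary. The natural step with η = 1 lands on the maximum-likelihood estimate 0.8 from either starting point.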

  12. arXiv:1301.3545

    cs.LG cs.NE stat.ML

    Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines

    Authors: Guillaume Desjardins, Razvan Pascanu, Aaron Courville, Yoshua Bengio

    Abstract: This paper introduces the Metric-Free Natural Gradient (MFNG) algorithm for training Boltzmann Machines. Similar in spirit to the Hessian-Free method of Martens [8], our algorithm belongs to the family of truncated Newton methods and exploits an efficient matrix-vector product to avoid explicitly storing the natural gradient metric $L$. This metric is shown to be the expected second derivative of…

    Submitted 16 March, 2013; v1 submitted 15 January, 2013; originally announced January 2013.
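
Truncated Newton methods never form the curvature (here, metric) matrix. They solve for the update direction with conjugate gradient, which accesses the matrix only through matrix-vector products, and they stop after a limited number of iterations. The minimal sketch below assumes a hypothetical 3×3 symmetric positive-definite metric L = D + uuᵀ, exposed only through `metric_matvec`. The Boltzmann-machine product that MFNG uses is not reproduced here.

```python
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec: Callable[[Vector], Vector], b: Vector,
                       max_iters: int = 50, tol: float = 1e-10) -> Vector:
    """Solve L x = b for SPD L using only products L @ v (L is never stored)."""
    x = [0.0] * len(b)
    r = list(b)               # residual b - L x, with x = 0
    p = list(r)               # current search direction
    rs = dot(r, r)
    for _ in range(max_iters):  # "truncated": at most max_iters iterations
        if rs ** 0.5 < tol:
            break
        lp = matvec(p)
        alpha = rs / dot(p, lp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * lpi for ri, lpi in zip(r, lp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Hypothetical metric L = diag(D) + u u^T, applied without forming the matrix.
diag = [4.0, 3.0, 2.0]
u = [1.0, 0.5, -1.0]


def metric_matvec(v: Vector) -> Vector:
    uv = dot(u, v)
    return [d * vi + ui * uv for d, vi, ui in zip(diag, v, u)]


gradient = [1.0, -2.0, 0.5]   # made-up gradient
direction = conjugate_gradient(metric_matvec, gradient)  # L^{-1} @ gradient
print(direction)
print(metric_matvec(direction))  # should reproduce the gradient
```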

  13. arXiv:1212.0901

    cs.LG

    Advances in Optimizing Recurrent Networks

    Authors: Yoshua Bengio, Nicolas Boulanger-Lewandowski, Razvan Pascanu

    Abstract: After a more than decade-long period of relatively little research activity in the area of recurrent neural networks, several new developments will be reviewed here that have allowed substantial progress both in understanding and in technical solutions towards more efficient training of recurrent networks. These advances have been motivated by and related to the optimization issues surrounding dee…

    Submitted 13 December, 2012; v1 submitted 4 December, 2012; originally announced December 2012.

  14. arXiv:1211.5590

    cs.SC cs.LG

    Theano: new features and speed improvements

    Authors: Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, Yoshua Bengio

    Abstract: Theano is a linear algebra compiler that optimizes a user's symbolically-specified mathematical computations to produce efficient low-level implementations. In this paper, we present new features and efficiency improvements to Theano, and benchmarks demonstrating Theano's performance relative to Torch7, a recently introduced machine learning library, and to RNNLM, a C++ library targeted at recurre…

    Submitted 23 November, 2012; originally announced November 2012.

    Comments: Presented at the Deep Learning Workshop, NIPS 2012

  15. arXiv:1211.5063

    cs.LG

    On the difficulty of training Recurrent Neural Networks

    Authors: Razvan Pascanu, Tomas Mikolov, Yoshua Bengio

    Abstract: There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective s…

    Submitted 15 February, 2013; v1 submitted 21 November, 2012; originally announced November 2012.

    Comments: Improved description of the exploding gradient problem and description and analysis of the vanishing gradient problem
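
The simplest setting already shows both problems. The minimal sketch below assumes a linear scalar recurrence h_t = w·h_{t−1} + x_t, a toy model rather than the paper's analysis. Backpropagating through T steps multiplies by ∂h_t/∂h_{t−1} = w at every step, so ∂h_T/∂h_0 = w^T, which vanishes exponentially for |w| < 1 and explodes exponentially for |w| > 1.

```python
from typing import List


def forward(w: float, xs: List[float], h0: float = 0.0) -> List[float]:
    """Toy scalar linear RNN h_t = w * h_{t-1} + x_t; returns [h_0, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    return hs


def grad_wrt_initial_state(w: float, steps: int) -> float:
    """dh_T/dh_0 by the chain rule: a product of `steps` Jacobians, each equal to w."""
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad


T = 50
impulse = [1.0] + [0.0] * (T - 1)
for w in (0.5, 1.0, 1.5):
    print(f"w = {w}: dh_T/dh_0 = {grad_wrt_initial_state(w, T):.3e}")

# Finite-difference check that the chain-rule product matches the forward pass.
eps = 1e-3
fd = (forward(0.9, impulse, eps)[-1] - forward(0.9, impulse)[-1]) / eps
print(fd, grad_wrt_initial_state(0.9, T))
```

With T = 50 the factor is about 8.9e-16 for w = 0.5 and about 6.4e8 for w = 1.5, so a signal from the first step either disappears from the gradient or dominates it.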

  16. arXiv:1103.2832

    cs.LG cs.IR cs.SD

    Autotagging music with conditional restricted Boltzmann machines

    Authors: Michael Mandel, Razvan Pascanu, Hugo Larochelle, Yoshua Bengio

    Abstract: This paper describes two applications of conditional restricted Boltzmann machines (CRBMs) to the task of autotagging music. The first consists of training a CRBM to predict tags that a user would apply to a clip of a song based on tags already applied by other users. By learning the relationships between tags, this model is able to pre-process training data to significantly improve the performanc…

    Submitted 14 March, 2011; originally announced March 2011.

  17. arXiv:1009.3589

    cs.LG cs.CV cs.NE

    Deep Self-Taught Learning for Handwritten Character Recognition

    Authors: Frédéric Bastien, Yoshua Bengio, Arnaud Bergeron, Nicolas Boulanger-Lewandowski, Thomas Breuel, Youssouf Chherawala, Moustapha Cisse, Myriam Côté, Dumitru Erhan, Jeremy Eustache, Xavier Glorot, Xavier Muller, Sylvain Pannetier Lebeuf, Razvan Pascanu, Salah Rifai, Francois Savard, Guillaume Sicard

    Abstract: Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple non-linear transformations. Self-taught learning (exploiting unlabeled examples or examples from other distributions) has already been applied to deep learners, but mostly to show the advantage of…

    Submitted 18 September, 2010; originally announced September 2010.

    Report number: 1353, Dept. IRO, U. Montreal

    MSC Class: 68T05

    ACM Class: I.2.6