Showing 1–9 of 9 results for author: Moczulski, M

  1. arXiv:1907.10247  [pdf, other]

    cs.LG cs.AI stat.ML

    Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

    Authors: Yijie Guo, Jongwook Choi, Marcin Moczulski, Shengyu Feng, Samy Bengio, Mohammad Norouzi, Honglak Lee

    Abstract: Reinforcement learning with sparse rewards is challenging because an agent can rarely obtain non-zero rewards and hence, gradient-based optimization of parameterized policies can be incremental and slow. Recent work demonstrated that using a memory buffer of previous successful trajectories can result in more effective policies. However, existing methods may overly exploit past successful experien…

    Submitted 14 February, 2021; v1 submitted 24 July, 2019; originally announced July 2019.

  2. arXiv:1811.01483  [pdf, other]

    cs.LG cs.AI stat.ML

    Contingency-Aware Exploration in Reinforcement Learning

    Authors: Jongwook Choi, Yijie Guo, Marcin Moczulski, Junhyuk Oh, Neal Wu, Mohammad Norouzi, Honglak Lee

    Abstract: This paper investigates whether learning contingency-awareness and controllable aspects of an environment can lead to better exploration in reinforcement learning. To investigate this question, we consider an instantiation of this hypothesis evaluated on the Arcade Learning Environment (ALE). In this study, we develop an attentive dynamics model (ADM) that discovers controllable elements of the observ…

    Submitted 4 March, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

    Comments: In ICLR 2019

  3. arXiv:1703.00788  [pdf, other]

    cs.LG

    A Robust Adaptive Stochastic Gradient Method for Deep Learning

    Authors: Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

    Abstract: Stochastic gradient algorithms are the main focus of large-scale optimization problems and have led to important successes in the recent advancement of deep learning algorithms. The convergence of SGD depends on the careful choice of learning rate and the amount of the noise in stochastic estimates of the gradients. In this paper, we propose an adaptive learning rate algorithm, which utilizes stoch…

    Submitted 2 March, 2017; originally announced March 2017.

    Comments: IJCNN 2017 Accepted Paper, An extension of our paper, "ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient"

  4. arXiv:1608.04980  [pdf, other]

    cs.LG cs.NE

    Mollifying Networks

    Authors: Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio

    Abstract: The optimization of deep neural networks can be more challenging than traditional convex optimization problems due to the highly non-convex nature of the loss function, e.g. it can involve pathological landscapes such as saddle-surfaces that can be difficult to escape for algorithms based on simple gradient descent. In this paper, we attack the problem of optimization of highly non-convex neural n…

    Submitted 17 August, 2016; originally announced August 2016.

  5. arXiv:1603.00391  [pdf, other]

    cs.LG cs.NE stat.ML

    Noisy Activation Functions

    Authors: Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio

    Abstract: Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only). Gating mechanisms that use softly saturating activation functions to emulate the discrete switching of digital logic circuits are good examples of…

    Submitted 3 April, 2016; v1 submitted 1 March, 2016; originally announced March 2016.

  6. arXiv:1511.06428  [pdf, other]

    cs.LG cs.CV

    A Controller-Recognizer Framework: How necessary is recognition for control?

    Authors: Marcin Moczulski, Kelvin Xu, Aaron Courville, Kyunghyun Cho

    Abstract: Recently there has been growing interest in building active visual object recognizers, as opposed to the usual passive recognizers which classify a given static image into a predefined set of object categories. In this paper we propose to generalize these recently proposed end-to-end active visual recognizers into a controller-recognizer framework. A model in the controller-recognizer framework…

    Submitted 9 February, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

  7. arXiv:1511.05946  [pdf, other]

    cs.LG cs.NE

    ACDC: A Structured Efficient Linear Layer

    Authors: Marcin Moczulski, Misha Denil, Jeremy Appleyard, Nando de Freitas

    Abstract: The linear layer is one of the most pervasive modules in deep learning representations. However, it requires $O(N^2)$ parameters and $O(N^2)$ operations. These costs can be prohibitive in mobile applications or prevent scaling in many domains. Here, we introduce a deep, differentiable, fully-connected neural network module composed of diagonal matrices of parameters, $\mathbf{A}$ and $\mathbf{D}$,…

    Submitted 19 March, 2016; v1 submitted 18 November, 2015; originally announced November 2015.

  8. arXiv:1412.7419  [pdf, other]

    cs.LG cs.NE stat.ML

    ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient

    Authors: Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio

    Abstract: Stochastic gradient algorithms have been the main focus of large-scale learning problems and they led to important successes in machine learning. The convergence of SGD depends on the careful choice of learning rate and the amount of the noise in stochastic estimates of the gradients. In this paper, we propose a new adaptive learning rate algorithm, which utilizes curvature information for automat…

    Submitted 31 October, 2015; v1 submitted 23 December, 2014; originally announced December 2014.

    Comments: 8 pages, 3 figures, ICLR workshop submission

  9. arXiv:1412.7149  [pdf, other]

    cs.LG cs.NE stat.ML

    Deep Fried Convnets

    Authors: Zichao Yang, Marcin Moczulski, Misha Denil, Nando de Freitas, Alex Smola, Le Song, Ziyu Wang

    Abstract: The fully connected layers of a deep convolutional neural network typically contain over 90% of the network parameters, and consume the majority of the memory required to store the network parameters. Reducing the number of parameters while preserving essentially the same predictive performance is critically important for operating deep neural networks in memory constrained environments such as GP…

    Submitted 17 July, 2015; v1 submitted 22 December, 2014; originally announced December 2014.

    Comments: svd experiments included