-
Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban
Authors:
Peter Karkus,
Mehdi Mirza,
Arthur Guez,
Andrew Jaegle,
Timothy Lillicrap,
Lars Buesing,
Nicolas Heess,
Theophane Weber
Abstract:
Intelligent robots need to achieve abstract objectives using concrete, spatiotemporally complex sensory information and motor control. Tabula rasa deep reinforcement learning (RL) has tackled demanding tasks in terms of either visual, abstract, or physical reasoning, but solving these jointly remains a formidable challenge. One recent, unsolved benchmark task that integrates these challenges is Mujoban, where a robot needs to arrange 3D warehouses generated from 2D Sokoban puzzles. We explore whether integrated tasks like Mujoban can be solved by composing RL modules together in a sense-plan-act hierarchy, where modules have well-defined roles similarly to classic robot architectures. Unlike classic architectures that are typically model-based, we use only model-free modules trained with RL or supervised learning. We find that our modular RL approach dramatically outperforms the state-of-the-art monolithic RL agent on Mujoban. Further, learned modules can be reused when, e.g., using a different robot platform to solve the same task. Together our results give strong evidence for the importance of research into modular RL designs. Project website: https://sites.google.com/view/modular-rl/
Submitted 3 October, 2020;
originally announced October 2020.
-
An investigation of model-free planning
Authors:
Arthur Guez,
Mehdi Mirza,
Karol Gregor,
Rishabh Kabra,
Sébastien Racanière,
Théophane Weber,
David Raposo,
Adam Santoro,
Laurent Orseau,
Tom Eccles,
Greg Wayne,
David Silver,
Timothy Lillicrap
Abstract:
The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.
Submitted 20 May, 2019; v1 submitted 11 January, 2019;
originally announced January 2019.
-
Unsupervised Predictive Memory in a Goal-Directed Agent
Authors:
Greg Wayne,
Chia-Chun Hung,
David Amos,
Mehdi Mirza,
Arun Ahuja,
Agnieszka Grabska-Barwinska,
Jack Rae,
Piotr Mirowski,
Joel Z. Leibo,
Adam Santoro,
Mevlana Gemici,
Malcolm Reynolds,
Tim Harley,
Josh Abramson,
Shakir Mohamed,
Danilo Rezende,
David Saxton,
Adam Cain,
Chloe Hillier,
David Silver,
Koray Kavukcuoglu,
Matt Botvinick,
Demis Hassabis,
Timothy Lillicrap
Abstract:
Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement learning (RL) algorithms with deep neural networks, and the excitement surrounding these results has led to the pursuit of related ideas as explanations of non-human animal learning. However, we demonstrate that contemporary RL algorithms struggle to solve simple tasks when enough information is concealed from the sensors of the agent, a property called "partial observability". An obvious requirement for handling partially observed tasks is access to extensive memory, but we show memory is not enough; it is critical that the right information be stored in the right format. We develop a model, the Memory, RL, and Inference Network (MERLIN), in which memory formation is guided by a process of predictive modeling. MERLIN facilitates the solution of tasks in 3D virtual reality environments for which partial observability is severe and memories must be maintained over long durations. Our model demonstrates a single learning agent architecture that can solve canonical behavioural tasks in psychology and neurobiology without strong simplifying assumptions about the dimensionality of sensory input or the duration of experiences.
Submitted 28 March, 2018;
originally announced March 2018.
-
Generalizable Features From Unsupervised Learning
Authors:
Mehdi Mirza,
Aaron Courville,
Yoshua Bengio
Abstract:
Humans learn a predictive model of the world and use this model to reason about future events and the consequences of actions. In contrast to most machine predictors, we exhibit an impressive ability to generalize to unseen scenarios and reason intelligently in these settings. One important aspect of this ability is physical intuition (Lake et al., 2016). In this work, we explore the potential of unsupervised learning to find features that promote better generalization to settings outside the supervised training distribution. Our task is predicting the stability of towers of square blocks. We demonstrate that an unsupervised model, trained to predict future frames of a video sequence of stable and unstable block configurations, can yield features that support extrapolating stability prediction to block configurations outside the training set distribution.
Submitted 12 December, 2016;
originally announced December 2016.
-
Conditional Generative Adversarial Nets
Authors:
Mehdi Mirza,
Simon Osindero
Abstract:
Generative Adversarial Nets [8] were recently introduced as a novel way to train generative models. In this work we introduce the conditional version of generative adversarial nets, which can be constructed by simply feeding the data, y, we wish to condition on to both the generator and discriminator. We show that this model can generate MNIST digits conditioned on class labels. We also illustrate how this model could be used to learn a multi-modal model, and provide preliminary examples of an application to image tagging in which we demonstrate how this approach can generate descriptive tags which are not part of training labels.
Submitted 6 November, 2014;
originally announced November 2014.
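The conditioning step described in this abstract, feeding y to both the generator and the discriminator, can be illustrated with a minimal sketch. It assumes the simplest form of conditioning, concatenating a one-hot label to each network's input; the function names, the 100-dimensional noise, and the 784-dimensional flattened images are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer class labels as one-hot rows."""
    return np.eye(num_classes)[labels]

def generator_input(z, y_onehot):
    """Condition the generator: append the label y to the noise z."""
    return np.concatenate([z, y_onehot], axis=1)

def discriminator_input(x, y_onehot):
    """Condition the discriminator: append the same label y to the sample x."""
    return np.concatenate([x, y_onehot], axis=1)

# Hypothetical sizes: 4 samples, 10 classes (as for MNIST digits).
rng = np.random.default_rng(0)
y = one_hot(np.array([0, 1, 2, 3]), num_classes=10)
z = rng.normal(size=(4, 100))   # assumed noise dimension
x = rng.normal(size=(4, 784))   # assumed flattened 28x28 images

g_in = generator_input(z, y)      # shape (4, 110)
d_in = discriminator_input(x, y)  # shape (4, 794)
```

Both networks see the same y, so the discriminator can penalize samples that look realistic but do not match their label.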
-
Generative Adversarial Networks
Authors:
Ian J. Goodfellow,
Jean Pouget-Abadie,
Mehdi Mirza,
Bing Xu,
David Warde-Farley,
Sherjil Ozair,
Aaron Courville,
Yoshua Bengio
Abstract:
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
Submitted 10 June, 2014;
originally announced June 2014.
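For reference, the minimax game mentioned in this abstract is usually written as the value function below. The notation is an assumption of this sketch and does not appear in the abstract: p_data is the data distribution, p_z the noise prior, and p_g the distribution of generated samples.

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]

% For a fixed G, the optimal discriminator is
D^{*}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)}

% so when G recovers the data distribution (p_g = p_data),
% D^{*}(x) = 1/2 everywhere, matching the abstract.
```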
-
An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
Authors:
Ian J. Goodfellow,
Mehdi Mirza,
Da Xiao,
Aaron Courville,
Yoshua Bengio
Abstract:
Catastrophic forgetting is a problem faced by many machine learning models and algorithms. When trained on one task, then trained on a second task, many machine learning models "forget" how to perform the first task. This is widely believed to be a serious problem for neural networks. Here, we investigate the extent to which the catastrophic forgetting problem occurs for modern neural networks, comparing both established and recent gradient-based training algorithms and activation functions. We also examine the effect of the relationship between the first task and the second task on catastrophic forgetting. We find that it is always best to train using the dropout algorithm--the dropout algorithm is consistently best at adapting to the new task, remembering the old task, and has the best tradeoff curve between these two extremes. We find that different tasks and relationships between tasks result in very different rankings of activation function performance. This suggests the choice of activation function should always be cross-validated.
Submitted 3 March, 2015; v1 submitted 21 December, 2013;
originally announced December 2013.
-
Pylearn2: a machine learning research library
Authors:
Ian J. Goodfellow,
David Warde-Farley,
Pascal Lamblin,
Vincent Dumoulin,
Mehdi Mirza,
Razvan Pascanu,
James Bergstra,
Frédéric Bastien,
Yoshua Bengio
Abstract:
Pylearn2 is a machine learning research library. This does not just mean that it is a collection of machine learning algorithms that share a common API; it means that it has been designed for flexibility and extensibility in order to facilitate research projects that involve new or unusual use cases. In this paper we give a brief history of the library, an overview of its basic philosophy, a summary of the library's architecture, and a description of how the Pylearn2 community functions socially.
Submitted 19 August, 2013;
originally announced August 2013.
-
Challenges in Representation Learning: A report on three machine learning contests
Authors:
Ian J. Goodfellow,
Dumitru Erhan,
Pierre Luc Carrier,
Aaron Courville,
Mehdi Mirza,
Ben Hamner,
Will Cukierski,
Yichuan Tang,
David Thaler,
Dong-Hyun Lee,
Yingbo Zhou,
Chetan Ramaiah,
Fangxiang Feng,
Ruifan Li,
Xiaojie Wang,
Dimitris Athanasakis,
John Shawe-Taylor,
Maxim Milakov,
John Park,
Radu Ionescu,
Marius Popescu,
Cristian Grozea,
James Bergstra,
Jingjing Xie,
Lukasz Romaszko,
et al. (3 additional authors not shown)
Abstract:
The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions.
Submitted 1 July, 2013;
originally announced July 2013.
-
Maxout Networks
Authors:
Ian J. Goodfellow,
David Warde-Farley,
Mehdi Mirza,
Aaron Courville,
Yoshua Bengio
Abstract:
We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. We define a simple new model called maxout (so named because its output is the max of a set of inputs, and because it is a natural companion to dropout) designed to both facilitate optimization by dropout and improve the accuracy of dropout's fast approximate model averaging technique. We empirically verify that the model successfully accomplishes both of these tasks. We use maxout and dropout to demonstrate state of the art classification performance on four benchmark datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN.
Submitted 20 September, 2013; v1 submitted 18 February, 2013;
originally announced February 2013.
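The maxout unit described in this abstract, whose output is the max of a set of inputs, can be sketched as follows. This is a minimal illustration under stated assumptions: each unit takes the maximum over k learned affine pieces, and the tensor layout and the name `maxout` are choices made for this example rather than anything specified in the abstract.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout layer: each output unit is the max over k affine pieces.

    x: (batch, d_in) inputs
    W: (d_in, units, k) weights, one affine map per unit and piece
    b: (units, k) biases
    Returns an array of shape (batch, units).
    """
    z = np.einsum("bd,duk->buk", x, W) + b  # all k pieces for every unit
    return z.max(axis=-1)                   # keep the largest piece per unit

# With two pieces, +x and -x, a maxout unit reproduces the absolute value.
W = np.stack([np.eye(2), -np.eye(2)], axis=-1)  # shape (2, 2, 2)
b = np.zeros((2, 2))
x = np.array([[-1.0, 2.0]])
out = maxout(x, W, b)
```

Because the pieces are learned, a single unit can represent any convex piecewise-linear function with up to k segments; the absolute value is just one hand-picked instance.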