-
Meta-learning of Sequential Strategies
Authors:
Pedro A. Ortega,
Jane X. Wang,
Mark Rowland,
Tim Genewein,
Zeb Kurth-Nelson,
Razvan Pascanu,
Nicolas Heess,
Joel Veness,
Alex Pritzel,
Pablo Sprechmann,
Siddhant M. Jayakumar,
Tom McGrath,
Kevin Miller,
Mohammad Azar,
Ian Osband,
Neil Rabinowitz,
András György,
Silvia Chiappa,
Simon Osindero,
Yee Whye Teh,
Hado van Hasselt,
Nando de Freitas,
Matthew Botvinick,
Shane Legg
Abstract:
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal predictors and reinforcement learners which behave as if they had a probabilistic model that allowed them to efficiently exploit task structure. Furthermore, we recast memory-based meta-learning within a Bayesian framework, showing that the meta-learned strategies are near-optimal because they amortize Bayes-filtered data, where the adaptation is implemented in the memory dynamics as a state-machine of sufficient statistics. Essentially, memory-based meta-learning translates the hard problem of probabilistic sequential inference into a regression problem.
Submitted 18 July, 2019; v1 submitted 8 May, 2019;
originally announced May 2019.
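The abstract says a meta-learned memory behaves like a state machine over sufficient statistics, and that training it amounts to regression. A toy case makes this concrete. The sketch below is our own illustration, not code from the report. It assumes each task is a coin whose bias is drawn from a Beta(α, β) prior. The memory state is the pair of counts. `update` is the state-machine transition, and `predict` is the Bayes-optimal posterior predictive (h + α)/(n + α + β). The hypothetical helper `empirical_targets` samples tasks from the prior and averages the next observation for each memory state. That per-state mean is what minimizes squared or log loss, so these averages estimate the regression target a memory-based meta-learner would be trained toward. They should approach `predict`.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CountState:
    """Memory state: sufficient statistics of the binary observations seen so far."""
    heads: int = 0
    tails: int = 0


def update(state: CountState, x: int) -> CountState:
    """State-machine transition: fold one observation (0 or 1) into the counts."""
    if x not in (0, 1):
        raise ValueError("observations must be 0 or 1")
    return CountState(state.heads + x, state.tails + (1 - x))


def predict(state: CountState, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Bayes-optimal P(next = 1) under a Beta(alpha, beta) prior on the coin bias."""
    n = state.heads + state.tails
    return (state.heads + alpha) / (n + alpha + beta)


def empirical_targets(num_tasks=20000, length=3, alpha=1.0, beta=1.0, seed=0):
    """Sample tasks from the prior and average the next observation per memory state.
    This conditional mean is what a regression from state to next observation converges to."""
    rng = random.Random(seed)
    totals, visits = defaultdict(float), defaultdict(int)
    for _ in range(num_tasks):
        bias = rng.betavariate(alpha, beta)  # draw a task (coin bias) from the prior
        state = CountState()
        for _ in range(length):
            x = 1 if rng.random() < bias else 0
            totals[state] += x
            visits[state] += 1
            state = update(state, x)
    return {s: totals[s] / visits[s] for s in visits}


state = CountState()
for x in [1, 1, 0]:
    state = update(state, x)
print(state, predict(state))  # CountState(heads=2, tails=1) 0.6
```

Note that the counts, not the raw history, are all the predictor needs. Different sequences with the same counts land in the same state and receive the same prediction.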
-
Causal Reasoning from Meta-reinforcement Learning
Authors:
Ishita Dasgupta,
Jane Wang,
Silvia Chiappa,
Jovana Mitrovic,
Pedro Ortega,
David Raposo,
Edward Hughes,
Peter Battaglia,
Matthew Botvinick,
Zeb Kurth-Nelson
Abstract:
Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents. Here we explore whether causal reasoning can emerge via meta-reinforcement learning. We train a recurrent network with model-free reinforcement learning to solve a range of problems that each contain causal structure. We find that the trained agent can perform causal reasoning in novel situations in order to obtain rewards. The agent can select informative interventions, draw causal inferences from observational data, and make counterfactual predictions. Although established formal causal reasoning algorithms also exist, in this paper we show that such reasoning can arise from model-free reinforcement learning, and suggest that causal reasoning in complex settings may benefit from the more end-to-end learning-based approaches presented here. This work also offers new strategies for structured exploration in reinforcement learning, by providing agents with the ability to perform -- and interpret -- experiments.
Submitted 23 January, 2019;
originally announced January 2019.
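The abstract distinguishes drawing inferences from observational data from selecting interventions. Here is a minimal sketch of why the two differ. It uses a toy three-variable binary model of our own choosing, not one of the paper's environments. In the model, Z confounds X and Y. The code enumerates the model exactly and compares the observational P(Y=1 | X=x) with the interventional P(Y=1 | do(X=x)). The interventional quantity is computed by replacing X's mechanism with a constant. All mechanism probabilities and function names are assumptions made for illustration.

```python
from itertools import product

P_Z1 = 0.5  # assumed prior P(Z = 1) for the confounder


def p_x1_given_z(z):
    """Assumed mechanism for X: strongly driven by Z."""
    return 0.9 if z == 1 else 0.1


def p_y1_given_xz(x, z):
    """Assumed mechanism for Y: small direct effect of X, large effect of Z."""
    return 0.1 + 0.2 * x + 0.6 * z


def joint(z, x, y, do_x=None):
    """P(z, x, y); if do_x is given, X's mechanism is cut and X is fixed to do_x."""
    pz = P_Z1 if z == 1 else 1 - P_Z1
    if do_x is None:
        px = p_x1_given_z(z) if x == 1 else 1 - p_x1_given_z(z)
    else:
        px = 1.0 if x == do_x else 0.0
    py = p_y1_given_xz(x, z) if y == 1 else 1 - p_y1_given_xz(x, z)
    return pz * px * py


def p_y1(x_value, intervene):
    """P(Y=1 | X=x) if intervene is False, P(Y=1 | do(X=x)) if True."""
    do_x = x_value if intervene else None
    num = den = 0.0
    for z, y in product((0, 1), repeat=2):
        p = joint(z, x_value, y, do_x)
        den += p
        if y == 1:
            num += p
    return num / den


observed_effect = p_y1(1, False) - p_y1(0, False)
causal_effect = p_y1(1, True) - p_y1(0, True)
print(round(observed_effect, 2), round(causal_effect, 2))  # 0.68 0.2
```

Passive observation overstates X's influence more than threefold here, because seeing X=1 also signals Z=1. An agent that can choose interventions avoids this confound.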
-
Been There, Done That: Meta-Learning with Episodic Recall
Authors:
Samuel Ritter,
Jane X. Wang,
Zeb Kurth-Nelson,
Siddhant M. Jayakumar,
Charles Blundell,
Razvan Pascanu,
Matthew Botvinick
Abstract:
Meta-learning agents excel at rapidly learning new tasks from open-ended task distributions; yet, they forget what they learn about each task as soon as the next begins. When tasks reoccur, as they do in natural environments, meta-learning agents must explore again instead of immediately exploiting previously discovered solutions. We propose a formalism for generating open-ended yet repetitious environments, then develop a meta-learning architecture for solving these environments. This architecture melds the standard LSTM working memory with a differentiable neural episodic memory. We explore the capabilities of agents with this episodic LSTM in five meta-learning environments with reoccurring tasks, ranging from bandits to navigation and stochastic sequential decision problems.
Submitted 6 July, 2018; v1 submitted 24 May, 2018;
originally announced May 2018.
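The abstract does not spell out how the episodic memory connects to the LSTM. The sketch below shows one plausible reading, under stated assumptions:

- Keys are context (task) embeddings.
- Values are saved LSTM cell states.
- Reads use nearest-neighbor lookup.
- The cell update gains an extra gated term that reinstates the retrieved state.

`EpisodicMemory`, `cell_update`, and the additive reinstatement term are hypothetical illustrations, not the paper's definitions. A real model would compute the gates from learned weights and train end to end. This plain-Python version omits gradients entirely.

```python
import math


class EpisodicMemory:
    """Hypothetical key-value store mapping a context key to a saved LSTM cell state."""

    def __init__(self):
        self.keys = []
        self.values = []

    def write(self, key, cell_state):
        """Store a copy of the cell state reached in this context (e.g. at the end of a task)."""
        self.keys.append(list(key))
        self.values.append(list(cell_state))

    def read(self, query):
        """Return the cell state whose key is nearest (Euclidean) to the query, or None if empty."""
        if not self.keys:
            return None
        nearest = min(range(len(self.keys)), key=lambda i: math.dist(self.keys[i], query))
        return self.values[nearest]


def cell_update(c_prev, c_candidate, input_gate, forget_gate, reinstate_gate, c_retrieved=None):
    """Elementwise LSTM cell update plus an assumed reinstatement term:
    c = f * c_prev + i * c_candidate + r * c_retrieved."""
    if c_retrieved is None:
        c_retrieved = [0.0] * len(c_prev)  # nothing recalled: the extra term vanishes
    return [
        f * cp + i * cc + r * ce
        for cp, cc, i, f, r, ce in zip(
            c_prev, c_candidate, input_gate, forget_gate, reinstate_gate, c_retrieved
        )
    ]


memory = EpisodicMemory()
memory.write(key=[1.0, 0.0], cell_state=[0.5, -0.5])  # state reached on task A
memory.write(key=[0.0, 1.0], cell_state=[2.0, 2.0])   # state reached on task B
recalled = memory.read([0.9, 0.1])                     # task A reoccurs
print(recalled)  # [0.5, -0.5]
# With the forget and input gates closed and the reinstatement gate open,
# the cell jumps straight back to the stored state instead of re-exploring.
print(cell_update([0.0, 0.0], [0.3, 0.3], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0], recalled))  # [0.5, -0.5]
```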
-
Learning to reinforcement learn
Authors:
Jane X Wang,
Zeb Kurth-Nelson,
Dhruva Tirumala,
Hubert Soyer,
Joel Z Leibo,
Remi Munos,
Charles Blundell,
Dharshan Kumaran,
Matt Botvinick
Abstract:
In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can support meta-learning in a fully supervised context. We extend this approach to the RL setting. What emerges is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure. This second, learned RL algorithm can differ from the original one in arbitrary ways. Importantly, because it is learned, it is configured to exploit structure in the training domain. We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL. We consider prospects for extending and scaling up the approach, and also point out some potentially important implications for neuroscience.
Submitted 23 January, 2017; v1 submitted 17 November, 2016;
originally announced November 2016.
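The abstract describes a recurrent network whose weights, once trained by one RL algorithm, implement a second RL procedure through its hidden-state dynamics. The sketch below illustrates only that inner loop, under our own assumptions:

- Each task is a Bernoulli bandit.
- The network receives its previous action and reward as input at each step.
- The hidden state is reset at the start of every episode.
- No weights change within an episode.

Training is out of scope, so a hand-coded count-based rule stands in for the learned recurrent core. `recurrent_step`, `initial_hidden`, and `run_episode` are hypothetical names. In deep meta-RL these dynamics would instead be learned by the outer-loop RL algorithm, and could be shaped to exploit structure in the training task distribution.

```python
import random


def initial_hidden(num_arms):
    """Fresh hidden state for a new episode (task): one (successes, failures) pair per arm."""
    return [(0, 0)] * num_arms


def recurrent_step(hidden, prev_action, prev_reward):
    """Stand-in for a trained recurrent core: fold (prev_action, prev_reward) into the
    hidden state, then pick the arm with the highest Beta(1, 1) posterior-mean reward.
    The rule itself is fixed; all within-episode adaptation lives in `hidden`."""
    if prev_action is not None:
        successes, failures = hidden[prev_action]
        hidden = list(hidden)
        hidden[prev_action] = (successes + prev_reward, failures + 1 - prev_reward)
    means = [(s + 1) / (s + f + 2) for s, f in hidden]
    action = max(range(len(means)), key=lambda a: means[a])  # ties go to the lowest index
    return action, hidden


def run_episode(arm_probs, num_trials, rng):
    """Roll out one episode on a Bernoulli bandit; hidden state starts fresh for each task."""
    hidden = initial_hidden(len(arm_probs))
    prev_action, prev_reward = None, 0
    actions, rewards = [], []
    for _ in range(num_trials):
        action, hidden = recurrent_step(hidden, prev_action, prev_reward)
        reward = 1 if rng.random() < arm_probs[action] else 0
        actions.append(action)
        rewards.append(reward)
        prev_action, prev_reward = action, reward
    return actions, rewards


rng = random.Random(0)
print(sum(run_episode([0.0, 1.0], 10, rng)[1]))  # 9: one wasted pull on arm 0, then arm 1
print(sum(run_episode([1.0, 0.0], 10, rng)[1]))  # 10: fresh state, arm 0 is tried first and kept
```

Feeding the previous action and reward back in is what lets a network with fixed weights adapt at all. Without those inputs, its hidden state would carry no information about which arm has been paying off.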