-
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Authors:
Aleksandar Botev,
Soham De,
Samuel L Smith,
Anushan Fernando,
George-Cristian Muraru,
Ruba Haroun,
Leonard Berrada,
Razvan Pascanu,
Pier Giuseppe Sessa,
Robert Dadashi,
Léonard Hussenot,
Johan Ferret,
Sertan Girgin,
Olivier Bachem,
Alek Andreev,
Kathleen Kenealy,
Thomas Mesnard,
Cassidy Hardin,
Surya Bhupatiraju,
Shreya Pathak,
Laurent Sifre,
Morgane Rivière,
Mihir Sanjay Kale,
Juliette Love,
Pouya Tafti
, et al. (37 additional authors not shown)
Abstract:
We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned var…
▽ More
We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned variant. Both models achieve comparable performance to Gemma-2B despite being trained on fewer tokens.
△ Less
Submitted 11 April, 2024;
originally announced April 2024.
-
RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation
Authors:
Mel Vecerik,
Carl Doersch,
Yi Yang,
Todor Davchev,
Yusuf Aytar,
Guangyao Zhou,
Raia Hadsell,
Lourdes Agapito,
Jon Scholz
Abstract:
For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly. Current approaches lack either the generality to onboard new tasks without task-specific engineering, or else lack the data-efficiency to do so in an amount of time that enables practical use. In this work we explore dense tracking as a representational vehicle to allow faster a…
▽ More
For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly. Current approaches lack either the generality to onboard new tasks without task-specific engineering, or else lack the data-efficiency to do so in an amount of time that enables practical use. In this work we explore dense tracking as a representational vehicle to allow faster and more general learning from demonstration. Our approach utilizes Track-Any-Point (TAP) models to isolate the relevant motion in a demonstration, and parameterize a low-level controller to reproduce this motion across changes in the scene configuration. We show this results in robust robot policies that can solve complex object-arrangement tasks such as shape-matching, stacking, and even full path-following tasks such as applying glue and sticking objects together, all from demonstrations that can be collected in minutes.
△ Less
Submitted 31 August, 2023; v1 submitted 30 August, 2023;
originally announced August 2023.
-
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Authors:
Konstantinos Bousmalis,
Giulia Vezzani,
Dushyant Rao,
Coline Devin,
Alex X. Lee,
Maria Bauza,
Todor Davchev,
Yuxiang Zhou,
Agrim Gupta,
Akhil Raju,
Antoine Laurens,
Claudio Fantacci,
Valentin Dalibard,
Martina Zambelli,
Murilo Martins,
Rugile Pevceviciute,
Michiel Blokzijl,
Misha Denil,
Nathan Batchelor,
Thomas Lampe,
Emilio Parisotto,
Konrad Żołna,
Scott Reed,
Sergio Gómez Colmenarejo,
Jon Scholz
, et al. (14 additional authors not shown)
Abstract:
The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned de…
▽ More
The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.
△ Less
Submitted 22 December, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Barkour: Benchmarking Animal-level Agility with Quadruped Robots
Authors:
Ken Caluwaerts,
Atil Iscen,
J. Chase Kew,
Wenhao Yu,
Tingnan Zhang,
Daniel Freeman,
Kuang-Huei Lee,
Lisa Lee,
Stefano Saliceti,
Vincent Zhuang,
Nathan Batchelor,
Steven Bohez,
Federico Casarini,
Jose Enrique Chen,
Omar Cortes,
Erwin Coumans,
Adil Dostmohamed,
Gabriel Dulac-Arnold,
Alejandro Escontrela,
Erik Frey,
Roland Hafner,
Deepali Jain,
Bauyrjan Jyenis,
Yuheng Kuang,
Edward Lee
, et al. (19 additional authors not shown)
Abstract:
Animals have evolved various agile locomotion strategies, such as sprinting, lea**, and jum**. There is a growing interest in develo** legged robots that move like their biological counterparts and show various agile skills to navigate complex environments quickly. Despite the interest, the field lacks systematic benchmarks to measure the performance of control policies and hardware in agili…
▽ More
Animals have evolved various agile locomotion strategies, such as sprinting, lea**, and jum**. There is a growing interest in develo** legged robots that move like their biological counterparts and show various agile skills to navigate complex environments quickly. Despite the interest, the field lacks systematic benchmarks to measure the performance of control policies and hardware in agility. We introduce the Barkour benchmark, an obstacle course to quantify agility for legged robots. Inspired by dog agility competitions, it consists of diverse obstacles and a time based scoring mechanism. This encourages researchers to develop controllers that not only move fast, but do so in a controllable and versatile way. To set strong baselines, we present two methods for tackling the benchmark. In the first approach, we train specialist locomotion skills using on-policy reinforcement learning methods and combine them with a high-level navigation controller. In the second approach, we distill the specialist skills into a Transformer-based generalist locomotion policy, named Locomotion-Transformer, that can handle various terrains and adjust the robot's gait based on the perceived environment and robot states. Using a custom-built quadruped robot, we demonstrate that our method can complete the course at half the speed of a dog. We hope that our work represents a step towards creating controllers that enable robots to reach animal-level agility.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
Authors:
Tuomas Haarnoja,
Ben Moran,
Guy Lever,
Sandy H. Huang,
Dhruva Tirumala,
Jan Humplik,
Markus Wulfmeier,
Saran Tunyasuvunakool,
Noah Y. Siegel,
Roland Hafner,
Michael Bloesch,
Kristian Hartikainen,
Arunkumar Byravan,
Leonard Hasenclever,
Yuval Tassa,
Fereshteh Sadeghi,
Nathan Batchelor,
Federico Casarini,
Stefano Saliceti,
Charles Game,
Neil Sreendra,
Kushal Patel,
Marlon Gwira,
Andrea Huber,
Nicole Hurley
, et al. (3 additional authors not shown)
Abstract:
We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust…
▽ More
We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and it transitions between them in a smooth, stable, and efficient manner. The agent's locomotion and tactical behavior adapts to specific game contexts in a way that would be impractical to manually design. The agent also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. Our agent was trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer. Although the robots are inherently fragile, basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way -- well beyond what is intuitively expected from the robot. Indeed, in experiments, they walked 181% faster, turned 302% faster, took 63% less time to get up, and kicked a ball 34% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives.
△ Less
Submitted 11 April, 2024; v1 submitted 26 April, 2023;
originally announced April 2023.
-
Hindering Adversarial Attacks with Implicit Neural Representations
Authors:
Andrei A. Rusu,
Dan A. Calian,
Sven Gowal,
Raia Hadsell
Abstract:
We introduce the Lossy Implicit Network Activation Coding (LINAC) defence, an input transformation which successfully hinders several common adversarial attacks on CIFAR-$10$ classifiers for perturbations up to $ε= 8/255$ in $L_\infty$ norm and $ε= 0.5$ in $L_2$ norm. Implicit neural representations are used to approximately encode pixel colour intensities in $2\text{D}$ images such that classifie…
▽ More
We introduce the Lossy Implicit Network Activation Coding (LINAC) defence, an input transformation which successfully hinders several common adversarial attacks on CIFAR-$10$ classifiers for perturbations up to $ε= 8/255$ in $L_\infty$ norm and $ε= 0.5$ in $L_2$ norm. Implicit neural representations are used to approximately encode pixel colour intensities in $2\text{D}$ images such that classifiers trained on transformed data appear to have robustness to small perturbations without adversarial training or large drops in performance. The seed of the random number generator used to initialise and train the implicit neural representation turns out to be necessary information for stronger generic attacks, suggesting its role as a private key. We devise a Parametric Bypass Approximation (PBA) attack strategy for key-based defences, which successfully invalidates an existing method in this category. Interestingly, our LINAC defence also hinders some transfer and adaptive attacks, including our novel PBA strategy. Our results emphasise the importance of a broad range of customised attacks despite apparent robustness according to standard evaluations. LINAC source code and parameters of defended classifier evaluated throughout this submission are available: https://github.com/deepmind/linac
△ Less
Submitted 22 October, 2022;
originally announced October 2022.
-
Probing Transfer in Deep Reinforcement Learning without Task Engineering
Authors:
Andrei A. Rusu,
Sebastian Flennerhag,
Dushyant Rao,
Razvan Pascanu,
Raia Hadsell
Abstract:
We evaluate the use of original game curricula supported by the Atari 2600 console as a heterogeneous transfer benchmark for deep reinforcement learning agents. Game designers created curricula using combinations of several discrete modifications to the basic versions of games such as Space Invaders, Breakout and Freeway, making them progressively more challenging for human players. By formally or…
▽ More
We evaluate the use of original game curricula supported by the Atari 2600 console as a heterogeneous transfer benchmark for deep reinforcement learning agents. Game designers created curricula using combinations of several discrete modifications to the basic versions of games such as Space Invaders, Breakout and Freeway, making them progressively more challenging for human players. By formally organising these modifications into several factors of variation, we are able to show that Analyses of Variance (ANOVA) are a potent tool for studying the effects of human-relevant domain changes on the learning and transfer performance of a deep reinforcement learning agent. Since no manual task engineering is needed on our part, leveraging the original multi-factorial design avoids the pitfalls of unintentionally biasing the experimental setup. We find that game design factors have a large and statistically significant impact on an agent's ability to learn, and so do their combinatorial interactions. Furthermore, we show that zero-shot transfer from the basic games to their respective variations is possible, but the variance in performance is also largely explained by interactions between factors. As such, we argue that Atari game curricula offer a challenging benchmark for transfer learning in RL, that can help the community better understand the generalisation capabilities of RL agents along dimensions which meaningfully impact human generalisation performance. As a start, we report that value-function finetuning of regularly trained agents achieves positive transfer in a majority of cases, but significant headroom for algorithmic innovation remains. We conclude with the observation that selective transfer from multiple variants could further improve performance.
△ Less
Submitted 22 October, 2022;
originally announced October 2022.
-
MO2: Model-Based Offline Options
Authors:
Sasha Salter,
Markus Wulfmeier,
Dhruva Tirumala,
Nicolas Heess,
Martin Riedmiller,
Raia Hadsell,
Dushyant Rao
Abstract:
The ability to discover useful behaviours from past experience and transfer them to new tasks is considered a core component of natural embodied intelligence. Inspired by neuroscience, discovering behaviours that switch at bottleneck states have been long sought after for inducing plans of minimum description length across tasks. Prior approaches have either only supported online, on-policy, bottl…
▽ More
The ability to discover useful behaviours from past experience and transfer them to new tasks is considered a core component of natural embodied intelligence. Inspired by neuroscience, discovering behaviours that switch at bottleneck states have been long sought after for inducing plans of minimum description length across tasks. Prior approaches have either only supported online, on-policy, bottleneck state discovery, limiting sample-efficiency, or discrete state-action domains, restricting applicability. To address this, we introduce Model-Based Offline Options (MO2), an offline hindsight framework supporting sample-efficient bottleneck option discovery over continuous state-action spaces. Once bottleneck options are learnt offline over source domains, they are transferred online to improve exploration and value estimation on the transfer domain. Our experiments show that on complex long-horizon continuous control tasks with sparse, delayed rewards, MO2's properties are essential and lead to performance exceeding recent option learning methods. Additional ablations further demonstrate the impact on option predictability and credit assignment.
△ Less
Submitted 5 September, 2022;
originally announced September 2022.
-
The CLRS Algorithmic Reasoning Benchmark
Authors:
Petar Veličković,
Adrià Puigdomènech Badia,
David Budden,
Razvan Pascanu,
Andrea Banino,
Misha Dashevskiy,
Raia Hadsell,
Charles Blundell
Abstract:
Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms. Several important works have investigated whether neural networks can effectively reason like algorithms, typically by learning to execute them. The common trend in the area, however, is to generate targeted kinds of algorithmic data to evaluate…
▽ More
Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms. Several important works have investigated whether neural networks can effectively reason like algorithms, typically by learning to execute them. The common trend in the area, however, is to generate targeted kinds of algorithmic data to evaluate specific hypotheses, making results hard to transfer across publications, and increasing the barrier of entry. To consolidate progress and work towards unified evaluation, we propose the CLRS Algorithmic Reasoning Benchmark, covering classical algorithms from the Introduction to Algorithms textbook. Our benchmark spans a variety of algorithmic reasoning procedures, including sorting, searching, dynamic programming, graph algorithms, string algorithms and geometric algorithms. We perform extensive experiments to demonstrate how several popular algorithmic reasoning baselines perform on these tasks, and consequently, highlight links to several open challenges. Our library is readily available at https://github.com/deepmind/clrs.
△ Less
Submitted 4 June, 2022; v1 submitted 31 May, 2022;
originally announced May 2022.
-
A Generalist Agent
Authors:
Scott Reed,
Konrad Zolna,
Emilio Parisotto,
Sergio Gomez Colmenarejo,
Alexander Novikov,
Gabriel Barth-Maron,
Mai Gimenez,
Yury Sulsky,
Jackie Kay,
Jost Tobias Springenberg,
Tom Eccles,
Jake Bruce,
Ali Razavi,
Ashley Edwards,
Nicolas Heess,
Yutian Chen,
Raia Hadsell,
Oriol Vinyals,
Mahyar Bordbar,
Nando de Freitas
Abstract:
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, dec…
▽ More
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.
△ Less
Submitted 11 November, 2022; v1 submitted 12 May, 2022;
originally announced May 2022.
-
Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors
Authors:
Steven Bohez,
Saran Tunyasuvunakool,
Philemon Brakel,
Fereshteh Sadeghi,
Leonard Hasenclever,
Yuval Tassa,
Emilio Parisotto,
Jan Humplik,
Tuomas Haarnoja,
Roland Hafner,
Markus Wulfmeier,
Michael Neunert,
Ben Moran,
Noah Siegel,
Andrea Huber,
Francesco Romano,
Nathan Batchelor,
Federico Casarini,
Josh Merel,
Raia Hadsell,
Nicolas Heess
Abstract:
We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a movement skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our appro…
▽ More
We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a movement skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our approach does not require extensive reward engineering to produce sensible and natural looking behavior at the time of reuse. This makes it easy to create well-regularized, task-oriented controllers that are suitable for deployment on real robots. We demonstrate how our skill module can be used for imitation, and train controllable walking and ball dribbling policies for both the ANYmal quadruped and OP3 humanoid. These policies are then deployed on hardware via zero-shot simulation-to-reality transfer. Accompanying videos are available at https://bit.ly/robot-npmp.
△ Less
Submitted 31 March, 2022;
originally announced March 2022.
-
Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies
Authors:
Dushyant Rao,
Fereshteh Sadeghi,
Leonard Hasenclever,
Markus Wulfmeier,
Martina Zambelli,
Giulia Vezzani,
Dhruva Tirumala,
Yusuf Aytar,
Josh Merel,
Nicolas Heess,
Raia Hadsell
Abstract:
For robots operating in the real world, it is desirable to learn reusable behaviours that can effectively be transferred and adapted to numerous tasks and scenarios. We propose an approach to learn abstract motor skills from data using a hierarchical mixture latent variable model. In contrast to existing work, our method exploits a three-level hierarchy of both discrete and continuous latent varia…
▽ More
For robots operating in the real world, it is desirable to learn reusable behaviours that can effectively be transferred and adapted to numerous tasks and scenarios. We propose an approach to learn abstract motor skills from data using a hierarchical mixture latent variable model. In contrast to existing work, our method exploits a three-level hierarchy of both discrete and continuous latent variables, to capture a set of high-level behaviours while allowing for variance in how they are executed. We demonstrate in manipulation domains that the method can effectively cluster offline data into distinct, executable behaviours, while retaining the flexibility of a continuous latent variable model. The resulting skills can be transferred and fine-tuned on new tasks, unseen objects, and from state to vision-based policies, yielding better sample efficiency and asymptotic performance compared to existing skill- and imitation-based methods. We further analyse how and when the skills are most beneficial: they encourage directed exploration to cover large regions of the state space relevant to the task, making them most effective in challenging sparse-reward settings.
△ Less
Submitted 14 March, 2022; v1 submitted 9 December, 2021;
originally announced December 2021.
-
Few-Shot Keypoint Detection as Task Adaptation via Latent Embeddings
Authors:
Mel Vecerik,
Jackie Kay,
Raia Hadsell,
Lourdes Agapito,
Jon Scholz
Abstract:
Dense object tracking, the ability to localize specific object points with pixel-level accuracy, is an important computer vision task with numerous downstream applications in robotics. Existing approaches either compute dense keypoint embeddings in a single forward pass, meaning the model is trained to track everything at once, or allocate their full capacity to a sparse predefined set of points,…
▽ More
Dense object tracking, the ability to localize specific object points with pixel-level accuracy, is an important computer vision task with numerous downstream applications in robotics. Existing approaches either compute dense keypoint embeddings in a single forward pass, meaning the model is trained to track everything at once, or allocate their full capacity to a sparse predefined set of points, trading generality for accuracy. In this paper we explore a middle ground based on the observation that the number of relevant points at a given time are typically relatively few, e.g. grasp points on a target object. Our main contribution is a novel architecture, inspired by few-shot task adaptation, which allows a sparse-style network to condition on a keypoint embedding that indicates which point to track. Our central finding is that this approach provides the generality of dense-embedding models, while offering accuracy significantly closer to sparse-keypoint approaches. We present results illustrating this capacity vs. accuracy trade-off, and demonstrate the ability to zero-shot transfer to new object instances (within-class) using a real-robot pick-and-place task.
△ Less
Submitted 13 December, 2021; v1 submitted 9 December, 2021;
originally announced December 2021.
-
Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes
Authors:
Alex X. Lee,
Coline Devin,
Yuxiang Zhou,
Thomas Lampe,
Konstantinos Bousmalis,
Jost Tobias Springenberg,
Arunkumar Byravan,
Abbas Abdolmaleki,
Nimrod Gileadi,
David Khosid,
Claudio Fantacci,
Jose Enrique Chen,
Akhil Raju,
Rae Jeong,
Michael Neunert,
Antoine Laurens,
Stefano Saliceti,
Federico Casarini,
Martin Riedmiller,
Raia Hadsell,
Francesco Nori
Abstract:
We study the problem of robotic stacking with objects of complex geometry. We propose a challenging and diverse set of such objects that was carefully designed to require strategies beyond a simple "pick-and-place" solution. Our method is a reinforcement learning (RL) approach combined with vision-based interactive policy distillation and simulation-to-reality transfer. Our learned policies can ef…
▽ More
We study the problem of robotic stacking with objects of complex geometry. We propose a challenging and diverse set of such objects that was carefully designed to require strategies beyond a simple "pick-and-place" solution. Our method is a reinforcement learning (RL) approach combined with vision-based interactive policy distillation and simulation-to-reality transfer. Our learned policies can efficiently handle multiple object combinations in the real world and exhibit a large variety of stacking skills. In a large experimental study, we investigate what choices matter for learning such general vision-based agents in simulation, and what affects optimal transfer to the real robot. We then leverage data collected by such policies and improve upon them with offline RL. A video and a blog post of our work are provided as supplementary material.
△ Less
Submitted 3 November, 2021; v1 submitted 12 October, 2021;
originally announced October 2021.
-
Reasoning-Modulated Representations
Authors:
Petar Veličković,
Matko Bošnjak,
Thomas Kipf,
Alexander Lerchner,
Raia Hadsell,
Razvan Pascanu,
Charles Blundell
Abstract:
Neural networks leverage robust internal representations in order to generalise. Learning them is difficult, and often requires a large training set that covers the data distribution densely. We study a common setting where our task is not purely opaque. Indeed, very often we may have access to information about the underlying system (e.g. that observations must obey certain laws of physics) that…
▽ More
Neural networks leverage robust internal representations in order to generalise. Learning them is difficult, and often requires a large training set that covers the data distribution densely. We study a common setting where our task is not purely opaque. Indeed, very often we may have access to information about the underlying system (e.g. that observations must obey certain laws of physics) that any "tabula rasa" neural network would need to re-learn from scratch, penalising performance. We incorporate this information into a pre-trained reasoning module, and investigate its role in sha** the discovered representations in diverse self-supervised learning settings from pixels. Our approach paves the way for a new class of representation learning, grounded in algorithmic priors.
△ Less
Submitted 3 December, 2022; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Tactile Sim-to-Real Policy Transfer via Real-to-Sim Image Translation
Authors:
Alex Church,
John Lloyd,
Raia Hadsell,
Nathan F. Lepora
Abstract:
Simulation has recently become key for deep reinforcement learning to safely and efficiently acquire general and complex control policies from visual and proprioceptive inputs. Tactile information is not usually considered despite its direct relation to environment interaction. In this work, we present a suite of simulated environments tailored towards tactile robotics and reinforcement learning.…
▽ More
Simulation has recently become key for deep reinforcement learning to safely and efficiently acquire general and complex control policies from visual and proprioceptive inputs. Tactile information is not usually considered despite its direct relation to environment interaction. In this work, we present a suite of simulated environments tailored towards tactile robotics and reinforcement learning. A simple and fast method of simulating optical tactile sensors is provided, where high-resolution contact geometry is represented as depth images. Proximal Policy Optimisation (PPO) is used to learn successful policies across all considered tasks. A data-driven approach enables translation of the current state of a real tactile sensor to corresponding simulated depth images. This policy is implemented within a real-time control loop on a physical robot to demonstrate zero-shot sim-to-real policy transfer on several physically-interactive tasks requiring a sense of touch.
△ Less
Submitted 31 October, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.
-
On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning
Authors:
Abbas Abdolmaleki,
Sandy H. Huang,
Giulia Vezzani,
Bobak Shahriari,
Jost Tobias Springenberg,
Shruti Mishra,
Dhruva TB,
Arunkumar Byravan,
Konstantinos Bousmalis,
Andras Gyorgy,
Csaba Szepesvari,
Raia Hadsell,
Nicolas Heess,
Martin Riedmiller
Abstract:
Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step. This includes ideas as far ranging as exploration bonuses, entropy regularization, and regularization toward teachers or data priors. Often, the task reward and au…
▽ More
Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step. This includes ideas as far ranging as exploration bonuses, entropy regularization, and regularization toward teachers or data priors. Often, the task reward and auxiliary objectives are in conflict, and in this paper we argue that this makes it natural to treat these cases as instances of multi-objective (MO) optimization problems. We demonstrate how this perspective allows us to develop novel and more effective RL algorithms. In particular, we focus on offline RL and finetuning as case studies, and show that existing approaches can be understood as MO algorithms relying on linear scalarization. We hypothesize that replacing linear scalarization with a better algorithm can improve performance. We introduce Distillation of a Mixture of Experts (DiME), a new MORL algorithm that outperforms linear scalarization and can be applied to these non-standard MO problems. We demonstrate that for offline RL, DiME leads to a simple new algorithm that outperforms state-of-the-art. For finetuning, we derive new algorithms that learn to outperform the teacher policy.
△ Less
Submitted 1 August, 2023; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Skillful Precipitation Nowcasting using Deep Generative Models of Radar
Authors:
Suman Ravuri,
Karel Lenc,
Matthew Willson,
Dmitry Kangin,
Remi Lam,
Piotr Mirowski,
Megan Fitzsimons,
Maria Athanassiadou,
Sheleem Kashem,
Sam Madge,
Rachel Prudden,
Amol Mandhane,
Aidan Clark,
Andrew Brock,
Karen Simonyan,
Raia Hadsell,
Niall Robinson,
Ellen Clancy,
Alberto Arribas,
Shakir Mohamed
Abstract:
Precipitation nowcasting, the high-resolution forecasting of precipitation up to two hours ahead, supports the real-world socio-economic needs of many sectors reliant on weather-dependent decision-making. State-of-the-art operational nowcasting methods typically advect precipitation fields with radar-based wind estimates, and struggle to capture important non-linear events such as convective initi…
▽ More
Precipitation nowcasting, the high-resolution forecasting of precipitation up to two hours ahead, supports the real-world socio-economic needs of many sectors reliant on weather-dependent decision-making. State-of-the-art operational nowcasting methods typically advect precipitation fields with radar-based wind estimates, and struggle to capture important non-linear events such as convective initiations. Recently introduced deep learning methods use radar to directly predict future rain rates, free of physical constraints. While they accurately predict low-intensity rainfall, their operational utility is limited because their lack of constraints produces blurry nowcasts at longer lead times, yielding poor performance on more rare medium-to-heavy rain events. To address these challenges, we present a Deep Generative Model for the probabilistic nowcasting of precipitation from radar. Our model produces realistic and spatio-temporally consistent predictions over regions up to 1536 km x 1280 km and with lead times from 5-90 min ahead. In a systematic evaluation by more than fifty expert forecasters from the Met Office, our generative model ranked first for its accuracy and usefulness in 88% of cases against two competitive methods, demonstrating its decision-making value and ability to provide physical insight to real-world experts. When verified quantitatively, these nowcasts are skillful without resorting to blurring. We show that generative nowcasting can provide probabilistic predictions that improve forecast value and support operational utility, and at resolutions and lead times where alternative methods struggle.
△ Less
Submitted 2 April, 2021;
originally announced April 2021.
-
Learning rich touch representations through cross-modal self-supervision
Authors:
Martina Zambelli,
Yusuf Aytar,
Francesco Visin,
Yuxiang Zhou,
Raia Hadsell
Abstract:
The sense of touch is fundamental in several manipulation tasks, but rarely used in robot manipulation. In this work we tackle the problem of learning rich touch features from cross-modal self-supervision. We evaluate them identifying objects and their properties in a few-shot classification setting. Two new datasets are introduced using a simulated anthropomorphic robotic hand equipped with tacti…
▽ More
The sense of touch is fundamental in several manipulation tasks, but rarely used in robot manipulation. In this work we tackle the problem of learning rich touch features from cross-modal self-supervision. We evaluate them identifying objects and their properties in a few-shot classification setting. Two new datasets are introduced using a simulated anthropomorphic robotic hand equipped with tactile sensors on both synthetic and daily life objects. Several self-supervised learning methods are benchmarked on these datasets, by evaluating few-shot classification on unseen objects and poses. Our experiments indicate that cross-modal self-supervision effectively improves touch representation, and in turn has great potential to enhance robot manipulation skills.
△ Less
Submitted 21 January, 2021;
originally announced January 2021.
-
S3K: Self-Supervised Semantic Keypoints for Robotic Manipulation via Multi-View Consistency
Authors:
Mel Vecerik,
Jean-Baptiste Regli,
Oleg Sushkov,
David Barker,
Rugile Pevceviciute,
Thomas Rothörl,
Christopher Schuster,
Raia Hadsell,
Lourdes Agapito,
Jonathan Scholz
Abstract:
A robot's ability to act is fundamentally constrained by what it can perceive. Many existing approaches to visual representation learning utilize general-purpose training criteria, e.g. image reconstruction, smoothness in latent space, or usefulness for control, or else make use of large datasets annotated with specific features (bounding boxes, segmentations, etc.). However, both approaches often…
▽ More
A robot's ability to act is fundamentally constrained by what it can perceive. Many existing approaches to visual representation learning utilize general-purpose training criteria, e.g. image reconstruction, smoothness in latent space, or usefulness for control, or else make use of large datasets annotated with specific features (bounding boxes, segmentations, etc.). However, both approaches often struggle to capture the fine-detail required for precision tasks on specific objects, e.g. gras** and mating a plug and socket. We argue that these difficulties arise from a lack of geometric structure in these models. In this work we advocate semantic 3D keypoints as a visual representation, and present a semi-supervised training objective that can allow instance or category-level keypoints to be trained to 1-5 millimeter-accuracy with minimal supervision. Furthermore, unlike local texture-based approaches, our model integrates contextual information from a large area and is therefore robust to occlusion, noise, and lack of discernible texture. We demonstrate that this ability to locate semantic keypoints enables high level scripting of human understandable behaviours. Finally we show that these keypoints provide a good way to define reward functions for reinforcement learning and are a good representation for training agents.
△ Less
Submitted 13 October, 2020; v1 submitted 30 September, 2020;
originally announced September 2020.
-
Deep Reinforcement Learning for Tactile Robotics: Learning to Type on a Braille Keyboard
Authors:
Alex Church,
John Lloyd,
Raia Hadsell,
Nathan F. Lepora
Abstract:
Artificial touch would seem well-suited for Reinforcement Learning (RL), since both paradigms rely on interaction with an environment. Here we propose a new environment and set of tasks to encourage development of tactile reinforcement learning: learning to type on a braille keyboard. Four tasks are proposed, progressing in difficulty from arrow to alphabet keys and from discrete to continuous act…
▽ More
Artificial touch would seem well-suited for Reinforcement Learning (RL), since both paradigms rely on interaction with an environment. Here we propose a new environment and set of tasks to encourage development of tactile reinforcement learning: learning to type on a braille keyboard. Four tasks are proposed, progressing in difficulty from arrow to alphabet keys and from discrete to continuous actions. A simulated counterpart is also constructed by sampling tactile data from the physical environment. Using state-of-the-art deep RL algorithms, we show that all of these tasks can be successfully learnt in simulation, and 3 out of 4 tasks can be learned on the real robot. A lack of sample efficiency currently makes the continuous alphabet task impractical on the robot. To the best of our knowledge, this work presents the first demonstration of successfully training deep RL agents in the real world using observations that exclusively consist of tactile images. To aid future research utilising this environment, the code for this project has been released along with designs of the braille keycaps for 3D printing and a guide for recreating the experiments. A brief video summary is also available at https://youtu.be/eNylCA2uE_E.
△ Less
Submitted 6 August, 2020;
originally announced August 2020.
-
A Distributional View on Multi-Objective Policy Optimization
Authors:
Abbas Abdolmaleki,
Sandy H. Huang,
Leonard Hasenclever,
Michael Neunert,
H. Francis Song,
Martina Zambelli,
Murilo F. Martins,
Nicolas Heess,
Raia Hadsell,
Martin Riedmiller
Abstract:
Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for obj…
▽ More
Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way. We propose to learn an action distribution for each objective, and we use supervised learning to fit a parametric policy to a combination of these distributions. We demonstrate the effectiveness of our approach on challenging high-dimensional real and simulated robotics tasks, and show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
△ Less
Submitted 15 May, 2020;
originally announced May 2020.
-
Disentangled Cumulants Help Successor Representations Transfer to New Tasks
Authors:
Christopher Grimm,
Irina Higgins,
Andre Barreto,
Denis Teplyashin,
Markus Wulfmeier,
Tim Hertweck,
Raia Hadsell,
Satinder Singh
Abstract:
Biological intelligence can learn to solve many diverse tasks in a data efficient manner by re-using basic knowledge and skills from one task to another. Furthermore, many of such skills are acquired without explicit supervision in an intrinsically driven fashion. This is in contrast to the state-of-the-art reinforcement learning agents, which typically start learning each new task from scratch an…
▽ More
Biological intelligence can learn to solve many diverse tasks in a data efficient manner by re-using basic knowledge and skills from one task to another. Furthermore, many of such skills are acquired without explicit supervision in an intrinsically driven fashion. This is in contrast to the state-of-the-art reinforcement learning agents, which typically start learning each new task from scratch and struggle with knowledge transfer. In this paper we propose a principled way to learn a basis set of policies, which, when recombined through generalised policy improvement, come with guarantees on the coverage of the final task space. In particular, we concentrate on solving goal-based downstream tasks where the execution order of actions is not important. We demonstrate both theoretically and empirically that learning a small number of policies that reach intrinsically specified goal regions in a disentangled latent space can be re-used to quickly achieve a high level of performance on an exponentially larger number of externally specified, often significantly more complex downstream tasks. Our learning pipeline consists of two stages. First, the agent learns to perform intrinsically generated, goal-based tasks in the total absence of environmental rewards. Second, the agent leverages this experience to quickly achieve a high level of performance on numerous diverse externally specified tasks.
△ Less
Submitted 25 November, 2019;
originally announced November 2019.
-
Attention-Privileged Reinforcement Learning
Authors:
Sasha Salter,
Dushyant Rao,
Markus Wulfmeier,
Raia Hadsell,
Ingmar Posner
Abstract:
Image-based Reinforcement Learning is known to suffer from poor sample efficiency and generalisation to unseen visuals such as distractors (task-independent aspects of the observation space). Visual domain randomisation encourages transfer by training over visual factors of variation that may be encountered in the target domain. This increases learning complexity, can negatively impact learning ra…
▽ More
Image-based Reinforcement Learning is known to suffer from poor sample efficiency and generalisation to unseen visuals such as distractors (task-independent aspects of the observation space). Visual domain randomisation encourages transfer by training over visual factors of variation that may be encountered in the target domain. This increases learning complexity, can negatively impact learning rate and performance, and requires knowledge of potential variations during deployment. In this paper, we introduce Attention-Privileged Reinforcement Learning (APRiL) which uses a self-supervised attention mechanism to significantly alleviate these drawbacks: by focusing on task-relevant aspects of the observations, attention provides robustness to distractors as well as significantly increased learning efficiency. APRiL trains two attention-augmented actor-critic agents: one purely based on image observations, available across training and transfer domains; and one with access to privileged information (such as environment states) available only during training. Experience is shared between both agents and their attention mechanisms are aligned. The image-based policy can then be deployed without access to privileged information. We experimentally demonstrate accelerated and more robust learning on a diverse set of domains, leading to improved final performance for environments both within and outside the training distribution.
△ Less
Submitted 11 January, 2021; v1 submitted 19 November, 2019;
originally announced November 2019.
-
Continual Unsupervised Representation Learning
Authors:
Dushyant Rao,
Francesco Visin,
Andrei A. Rusu,
Yee Whye Teh,
Razvan Pascanu,
Raia Hadsell
Abstract:
Continual learning aims to improve the ability of modern learning systems to deal with non-stationary distributions, typically by attempting to learn a series of tasks sequentially. Prior art in the field has largely considered supervised or reinforcement learning tasks, and often assumes full knowledge of task labels and boundaries. In this work, we propose an approach (CURL) to tackle a more gen…
▽ More
Continual learning aims to improve the ability of modern learning systems to deal with non-stationary distributions, typically by attempting to learn a series of tasks sequentially. Prior art in the field has largely considered supervised or reinforcement learning tasks, and often assumes full knowledge of task labels and boundaries. In this work, we propose an approach (CURL) to tackle a more general problem that we will refer to as unsupervised continual learning. The focus is on learning representations without any knowledge about task identity, and we explore scenarios when there are abrupt changes between tasks, smooth transitions from one task to another, or even when the data is shuffled. The proposed approach performs task inference directly within the model, is able to dynamically expand to capture new concepts over its lifetime, and incorporates additional rehearsal-based techniques to deal with catastrophic forgetting. We demonstrate the efficacy of CURL in an unsupervised learning setting with MNIST and Omniglot, where the lack of labels ensures no information is leaked about the task. Further, we demonstrate strong performance compared to prior art in an i.i.d setting, or when adapting the technique to supervised tasks such as incremental class learning.
△ Less
Submitted 31 October, 2019;
originally announced October 2019.
-
Neural Execution of Graph Algorithms
Authors:
Petar Veličković,
Rex Ying,
Matilde Padovano,
Raia Hadsell,
Charles Blundell
Abstract:
Graph Neural Networks (GNNs) are a powerful representational tool for solving problems on graph-structured inputs. In almost all cases so far, however, they have been applied to directly recovering a final solution from raw inputs, without explicit guidance on how to structure their problem-solving. Here, instead, we focus on learning in the space of algorithms: we train several state-of-the-art G…
▽ More
Graph Neural Networks (GNNs) are a powerful representational tool for solving problems on graph-structured inputs. In almost all cases so far, however, they have been applied to directly recovering a final solution from raw inputs, without explicit guidance on how to structure their problem-solving. Here, instead, we focus on learning in the space of algorithms: we train several state-of-the-art GNN architectures to imitate individual steps of classical graph algorithms, parallel (breadth-first search, Bellman-Ford) as well as sequential (Prim's algorithm). As graph algorithms usually rely on making discrete decisions within neighbourhoods, we hypothesise that maximisation-based message passing neural networks are best-suited for such objectives, and validate this claim empirically. We also demonstrate how learning in the space of algorithms can yield new opportunities for positive transfer between tasks---showing how learning a shortest-path algorithm can be substantially improved when simultaneously learning a reachability algorithm.
△ Less
Submitted 15 January, 2020; v1 submitted 23 October, 2019;
originally announced October 2019.
-
Stabilizing Transformers for Reinforcement Learning
Authors:
Emilio Parisotto,
H. Francis Song,
Jack W. Rae,
Razvan Pascanu,
Caglar Gulcehre,
Siddhant M. Jayakumar,
Max Jaderberg,
Raphael Lopez Kaufman,
Aidan Clark,
Seb Noury,
Matthew M. Botvinick,
Nicolas Heess,
Raia Hadsell
Abstract:
Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in natural language processing (NLP), achieving state-of-the-art results in domains such as language modeling and machine translation. Harnessing the transformer's ability to process long time horizons o…
▽ More
Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in natural language processing (NLP), achieving state-of-the-art results in domains such as language modeling and machine translation. Harnessing the transformer's ability to process long time horizons of information could provide a similar performance boost in partially observable reinforcement learning (RL) domains, but the large-scale transformers used in NLP have yet to be successfully applied to the RL setting. In this work we demonstrate that the standard transformer architecture is difficult to optimize, which was previously observed in the supervised learning setting but becomes especially pronounced with RL objectives. We propose architectural modifications that substantially improve the stability and learning speed of the original Transformer and XL variant. The proposed architecture, the Gated Transformer-XL (GTrXL), surpasses LSTMs on challenging memory environments and achieves state-of-the-art results on the multi-task DMLab-30 benchmark suite, exceeding the performance of an external memory architecture. We show that the GTrXL, trained using the same losses, has stability and performance that consistently matches or exceeds a competitive LSTM baseline, including on more reactive tasks where memory is less critical. GTrXL offers an easy-to-train, simple-to-implement but substantially more expressive architectural alternative to the standard multi-layer LSTM ubiquitously used for RL agents in partially observable environments.
△ Less
Submitted 13 October, 2019;
originally announced October 2019.
-
Meta-Learning with Warped Gradient Descent
Authors:
Sebastian Flennerhag,
Andrei A. Rusu,
Razvan Pascanu,
Francesco Visin,
Hujun Yin,
Raia Hadsell
Abstract:
Learning an efficient update rule from data that promotes rapid learning of new tasks from the same distribution remains an open problem in meta-learning. Typically, previous works have approached this issue either by attempting to train a neural network that directly produces updates or by attempting to learn better initialisations or scaling factors for a gradient-based update rule. Both of thes…
▽ More
Learning an efficient update rule from data that promotes rapid learning of new tasks from the same distribution remains an open problem in meta-learning. Typically, previous works have approached this issue either by attempting to train a neural network that directly produces updates or by attempting to learn better initialisations or scaling factors for a gradient-based update rule. Both of these approaches pose challenges. On one hand, directly producing an update forgoes a useful inductive bias and can easily lead to non-converging behaviour. On the other hand, approaches that try to control a gradient-based update rule typically resort to computing gradients through the learning process to obtain their meta-gradients, leading to methods that can not scale beyond few-shot task adaptation. In this work, we propose Warped Gradient Descent (WarpGrad), a method that intersects these approaches to mitigate their limitations. WarpGrad meta-learns an efficiently parameterised preconditioning matrix that facilitates gradient descent across the task distribution. Preconditioning arises by interleaving non-linear layers, referred to as warp-layers, between the layers of a task-learner. Warp-layers are meta-learned without backpropagating through the task training process in a manner similar to methods that learn to directly produce updates. WarpGrad is computationally efficient, easy to implement, and can scale to arbitrarily large meta-learning problems. We provide a geometrical interpretation of the approach and evaluate its effectiveness in a variety of settings, including few-shot, standard supervised, continual and reinforcement learning.
△ Less
Submitted 18 February, 2020; v1 submitted 30 August, 2019;
originally announced September 2019.
-
Learning Gentle Object Manipulation with Curiosity-Driven Deep Reinforcement Learning
Authors:
Sandy H. Huang,
Martina Zambelli,
Jackie Kay,
Murilo F. Martins,
Yuval Tassa,
Patrick M. Pilarski,
Raia Hadsell
Abstract:
Robots must know how to be gentle when they need to interact with fragile objects, or when the robot itself is prone to wear and tear. We propose an approach that enables deep reinforcement learning to train policies that are gentle, both during exploration and task execution. In a reward-based learning environment, a natural approach involves augmenting the (task) reward with a penalty for non-ge…
▽ More
Robots must know how to be gentle when they need to interact with fragile objects, or when the robot itself is prone to wear and tear. We propose an approach that enables deep reinforcement learning to train policies that are gentle, both during exploration and task execution. In a reward-based learning environment, a natural approach involves augmenting the (task) reward with a penalty for non-gentleness, which can be defined as excessive impact force. However, augmenting with only this penalty impairs learning: policies get stuck in a local optimum which avoids all contact with the environment. Prior research has shown that combining auxiliary tasks or intrinsic rewards can be beneficial for stabilizing and accelerating learning in sparse-reward domains, and indeed we find that introducing a surprise-based intrinsic reward does avoid the no-contact failure case. However, we show that a simple dynamics-based surprise is not as effective as penalty-based surprise. Penalty-based surprise, based on predicting forceful contacts, has a further benefit: it encourages exploration which is contact-rich yet gentle. We demonstrate the effectiveness of the approach using a complex, tendon-powered robot hand with tactile sensors. Videos are available at http://sites.google.com/view/gentlemanipulation.
△ Less
Submitted 20 March, 2019;
originally announced March 2019.
-
The StreetLearn Environment and Dataset
Authors:
Piotr Mirowski,
Andras Banki-Horvath,
Keith Anderson,
Denis Teplyashin,
Karl Moritz Hermann,
Mateusz Malinowski,
Matthew Koichi Grimes,
Karen Simonyan,
Koray Kavukcuoglu,
Andrew Zisserman,
Raia Hadsell
Abstract:
Navigation is a rich and well-grounded problem domain that drives progress in many different areas of research: perception, planning, memory, exploration, and optimisation in particular. Historically these challenges have been separately considered and solutions built that rely on stationary datasets - for example, recorded trajectories through an environment. These datasets cannot be used for dec…
▽ More
Navigation is a rich and well-grounded problem domain that drives progress in many different areas of research: perception, planning, memory, exploration, and optimisation in particular. Historically these challenges have been separately considered and solutions built that rely on stationary datasets - for example, recorded trajectories through an environment. These datasets cannot be used for decision-making and reinforcement learning, however, and in general the perspective of navigation as an interactive learning task, where the actions and behaviours of a learning agent are learned simultaneously with the perception and planning, is relatively unsupported. Thus, existing navigation benchmarks generally rely on static datasets (Geiger et al., 2013; Kendall et al., 2015) or simulators (Beattie et al., 2016; Shah et al., 2018). To support and validate research in end-to-end navigation, we present StreetLearn: an interactive, first-person, partially-observed visual environment that uses Google Street View for its photographic content and broad coverage, and give performance baselines for a challenging goal-driven navigation task. The environment code, baseline agent code, and the dataset are available at http://streetlearn.cc
△ Less
Submitted 4 March, 2019;
originally announced March 2019.
-
Learning To Follow Directions in Street View
Authors:
Karl Moritz Hermann,
Mateusz Malinowski,
Piotr Mirowski,
Andras Banki-Horvath,
Keith Anderson,
Raia Hadsell
Abstract:
Navigating and understanding the real world remains a key challenge in machine learning and inspires a great variety of research in areas such as language grounding, planning, navigation and computer vision. We propose an instruction-following task that requires all of the above, and which combines the practicality of simulated environments with the challenges of ambiguous, noisy real world data.…
▽ More
Navigating and understanding the real world remains a key challenge in machine learning and inspires a great variety of research in areas such as language grounding, planning, navigation and computer vision. We propose an instruction-following task that requires all of the above, and which combines the practicality of simulated environments with the challenges of ambiguous, noisy real world data. StreetNav is built on top of Google Street View and provides visually accurate environments representing real places. Agents are given driving instructions which they must learn to interpret in order to successfully navigate in this environment. Since humans equipped with driving instructions can readily navigate in previously unseen cities, we set a high bar and test our trained agents for similar cognitive capabilities. Although deep reinforcement learning (RL) methods are frequently evaluated only on data that closely follow the training distribution, our dataset extends to multiple cities and has a clean train/test separation. This allows for thorough testing of generalisation ability. This paper presents the StreetNav environment and tasks, models that establish strong baselines, and extensive analysis of the task and the trained agents.
△ Less
Submitted 21 November, 2019; v1 submitted 1 March, 2019;
originally announced March 2019.
-
Value constrained model-free continuous control
Authors:
Steven Bohez,
Abbas Abdolmaleki,
Michael Neunert,
Jonas Buchli,
Nicolas Heess,
Raia Hadsell
Abstract:
The naive application of Reinforcement Learning algorithms to continuous control problems -- such as locomotion and manipulation -- often results in policies which rely on high-amplitude, high-frequency control signals, known colloquially as bang-bang control. Although such solutions may indeed maximize task reward, they can be unsuitable for real world systems. Bang-bang control may lead to incre…
▽ More
The naive application of Reinforcement Learning algorithms to continuous control problems -- such as locomotion and manipulation -- often results in policies which rely on high-amplitude, high-frequency control signals, known colloquially as bang-bang control. Although such solutions may indeed maximize task reward, they can be unsuitable for real world systems. Bang-bang control may lead to increased wear and tear or energy consumption, and tends to excite undesired second-order dynamics. To counteract this issue, multi-objective optimization can be used to simultaneously optimize both the reward and some auxiliary cost that discourages undesired (e.g. high-amplitude) control. In principle, such an approach can yield the sought after, smooth, control policies. It can, however, be hard to find the correct trade-off between cost and return that results in the desired behavior. In this paper we propose a new constraint-based reinforcement learning approach that ensures task success while minimizing one or more auxiliary costs (such as control effort). We employ Lagrangian relaxation to learn both (a) the parameters of a control policy that satisfies the desired constraints and (b) the Lagrangian multipliers for the optimization. Moreover, we demonstrate that we can satisfy constraints either in expectation or in a per-step fashion, and can even learn a single policy that is able to dynamically trade-off between return and cost. We demonstrate the efficacy of our approach using a number of continuous control benchmark tasks, a realistic, energy-optimized quadruped locomotion task, as well as a reaching task on a real robot arm.
△ Less
Submitted 12 February, 2019;
originally announced February 2019.
-
Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Gras** via Randomized-to-Canonical Adaptation Networks
Authors:
Stephen James,
Paul Wohlhart,
Mrinal Kalakrishnan,
Dmitry Kalashnikov,
Alex Irpan,
Julian Ibarz,
Sergey Levine,
Raia Hadsell,
Konstantinos Bousmalis
Abstract:
Real world data, especially in the domain of robotics, is notoriously costly to collect. One way to circumvent this can be to leverage the power of simulation to produce large amounts of labelled data. However, training models on simulated images does not readily transfer to real-world ones. Using domain adaptation methods to cross this "reality gap" requires a large amount of unlabelled real-worl…
▽ More
Real world data, especially in the domain of robotics, is notoriously costly to collect. One way to circumvent this can be to leverage the power of simulation to produce large amounts of labelled data. However, training models on simulated images does not readily transfer to real-world ones. Using domain adaptation methods to cross this "reality gap" requires a large amount of unlabelled real-world data, whilst domain randomization alone can waste modeling power. In this paper, we present Randomized-to-Canonical Adaptation Networks (RCANs), a novel approach to crossing the visual reality gap that uses no real-world data. Our method learns to translate randomized rendered images into their equivalent non-randomized, canonical versions. This in turn allows for real images to also be translated into canonical sim images. We demonstrate the effectiveness of this sim-to-real approach by training a vision-based closed-loop gras** reinforcement learning agent in simulation, and then transferring it to the real world to attain 70% zero-shot grasp success on unseen objects, a result that almost doubles the success of learning the same task directly on domain randomization alone. Additionally, by joint finetuning in the real-world with only 5,000 real-world grasps, our method achieves 91%, attaining comparable performance to a state-of-the-art system trained with 580,000 real-world grasps, resulting in a reduction of real-world data by more than 99%.
△ Less
Submitted 21 July, 2019; v1 submitted 18 December, 2018;
originally announced December 2018.
-
From pixels to percepts: Highly robust edge perception and contour following using deep learning and an optical biomimetic tactile sensor
Authors:
Nathan F. Lepora,
Alex Church,
Conrad De Kerckhove,
Raia Hadsell,
John Lloyd
Abstract:
Deep learning has the potential to have the impact on robot touch that it has had on robot vision. Optical tactile sensors act as a bridge between the subjects by allowing techniques from vision to be applied to touch. In this paper, we apply deep learning to an optical biomimetic tactile sensor, the TacTip, which images an array of papillae (pins) inside its sensing surface analogous to structure…
▽ More
Deep learning has the potential to have the impact on robot touch that it has had on robot vision. Optical tactile sensors act as a bridge between the subjects by allowing techniques from vision to be applied to touch. In this paper, we apply deep learning to an optical biomimetic tactile sensor, the TacTip, which images an array of papillae (pins) inside its sensing surface analogous to structures within human skin. Our main result is that the application of a deep CNN can give reliable edge perception and thus a robust policy for planning contact points to move around object contours. Robustness is demonstrated over several irregular and compliant objects with both tap** and continuous sliding, using a model trained only by tap** onto a disk. These results relied on using techniques to encourage generalization to tasks beyond which the model was trained. We expect this is a generic problem in practical applications of tactile sensing that deep learning will solve. A video demonstrating the approach can be found at https://www.youtube.com/watch?v=QHrGsG9AHts
△ Less
Submitted 6 February, 2019; v1 submitted 7 December, 2018;
originally announced December 2018.
-
Meta-Learning with Latent Embedding Optimization
Authors:
Andrei A. Rusu,
Dushyant Rao,
Jakub Sygnowski,
Oriol Vinyals,
Razvan Pascanu,
Simon Osindero,
Raia Hadsell
Abstract:
Gradient-based meta-learning techniques are both widely applicable and proficient at solving challenging few-shot learning and fast adaptation problems. However, they have practical difficulties when operating on high-dimensional parameter spaces in extreme low-data regimes. We show that it is possible to bypass these limitations by learning a data-dependent latent generative representation of mod…
▽ More
Gradient-based meta-learning techniques are both widely applicable and proficient at solving challenging few-shot learning and fast adaptation problems. However, they have practical difficulties when operating on high-dimensional parameter spaces in extreme low-data regimes. We show that it is possible to bypass these limitations by learning a data-dependent latent generative representation of model parameters, and performing gradient-based meta-learning in this low-dimensional latent space. The resulting approach, latent embedding optimization (LEO), decouples the gradient-based adaptation procedure from the underlying high-dimensional space of model parameters. Our evaluation shows that LEO can achieve state-of-the-art performance on the competitive miniImageNet and tieredImageNet few-shot classification tasks. Further analysis indicates LEO is able to capture uncertainty in the data, and can perform adaptation more effectively by optimizing in latent space.
△ Less
Submitted 26 March, 2019; v1 submitted 16 July, 2018;
originally announced July 2018.
-
Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal
Authors:
Jake Bruce,
Niko Sünderhauf,
Piotr Mirowski,
Raia Hadsell,
Michael Milford
Abstract:
Model-free reinforcement learning has recently been shown to be effective at learning navigation policies from complex image input. However, these algorithms tend to require large amounts of interaction with the environment, which can be prohibitively costly to obtain on robots in the real world. We present an approach for efficiently learning goal-directed navigation policies on a mobile robot, f…
▽ More
Model-free reinforcement learning has recently been shown to be effective at learning navigation policies from complex image input. However, these algorithms tend to require large amounts of interaction with the environment, which can be prohibitively costly to obtain on robots in the real world. We present an approach for efficiently learning goal-directed navigation policies on a mobile robot, from only a single coverage traversal of recorded data. The navigation agent learns an effective policy over a diverse action space in a large heterogeneous environment consisting of more than 2km of travel, through buildings and outdoor regions that collectively exhibit large variations in visual appearance, self-similarity, and connectivity. We compare pretrained visual encoders that enable precomputation of visual embeddings to achieve a throughput of tens of thousands of transitions per second at training time on a commodity desktop computer, allowing agents to learn from millions of trajectories of experience in a matter of hours. We propose multiple forms of computationally efficient stochastic augmentation to enable the learned policy to generalise beyond these precomputed embeddings, and demonstrate successful deployment of the learned policy on the real robot without fine tuning, despite environmental appearance differences at test time. The dataset and code required to reproduce these results and apply the technique to other datasets and robots is made publicly available at rl-navigation.github.io/deployable.
△ Less
Submitted 11 July, 2018;
originally announced July 2018.
-
Graph networks as learnable physics engines for inference and control
Authors:
Alvaro Sanchez-Gonzalez,
Nicolas Heess,
Jost Tobias Springenberg,
Josh Merel,
Martin Riedmiller,
Raia Hadsell,
Peter Battaglia
Abstract:
Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model. Here we introduce a new class of learnable models--based on graph networks--which implement an inductive bias for object- and relation-centric representations of complex, dynamical sys…
▽ More
Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model. Here we introduce a new class of learnable models--based on graph networks--which implement an inductive bias for object- and relation-centric representations of complex, dynamical systems. Our results show that as a forward model, our approach supports accurate predictions from real and simulated data, and surprisingly strong and efficient generalization, across eight distinct physical systems which we varied parametrically and structurally. We also found that our inference model can perform system identification. Our models are also differentiable, and support online planning via gradient-based trajectory optimization, as well as offline policy optimization. Our framework offers new opportunities for harnessing and exploiting rich knowledge about the world, and takes a key step toward building machines with more human-like representations of the world.
△ Less
Submitted 4 June, 2018;
originally announced June 2018.
-
Progress & Compress: A scalable framework for continual learning
Authors:
Jonathan Schwarz,
Jelena Luketina,
Wojciech M. Czarnecki,
Agnieszka Grabska-Barwinska,
Yee Whye Teh,
Razvan Pascanu,
Raia Hadsell
Abstract:
We introduce a conceptually simple and scalable framework for continual learning domains where tasks are learned sequentially. Our method is constant in the number of parameters and is designed to preserve performance on previously encountered tasks while accelerating learning progress on subsequent problems. This is achieved by training a network with two components: A knowledge base, capable of…
▽ More
We introduce a conceptually simple and scalable framework for continual learning domains where tasks are learned sequentially. Our method is constant in the number of parameters and is designed to preserve performance on previously encountered tasks while accelerating learning progress on subsequent problems. This is achieved by training a network with two components: A knowledge base, capable of solving previously encountered problems, which is connected to an active column that is employed to efficiently learn the current task. After learning a new task, the active column is distilled into the knowledge base, taking care to protect any previously acquired skills. This cycle of active learning (progression) followed by consolidation (compression) requires no architecture growth, no access to or storing of previous data or tasks, and no task-specific parameters. We demonstrate the progress & compress approach on sequential classification of handwritten alphabets as well as two reinforcement learning domains: Atari games and 3D maze navigation.
△ Less
Submitted 2 July, 2018; v1 submitted 16 May, 2018;
originally announced May 2018.
-
The Limits and Potentials of Deep Learning for Robotics
Authors:
Niko Sünderhauf,
Oliver Brock,
Walter Scheirer,
Raia Hadsell,
Dieter Fox,
Jürgen Leitner,
Ben Upcroft,
Pieter Abbeel,
Wolfram Burgard,
Michael Milford,
Peter Corke
Abstract:
The application of deep learning in robotics leads to very specific problems and research questions that are typically not addressed by the computer vision and machine learning communities. In this paper we discuss a number of robotics-specific learning, reasoning, and embodiment challenges for deep learning. We explain the need for better evaluation metrics, highlight the importance and unique ch…
▽ More
The application of deep learning in robotics leads to very specific problems and research questions that are typically not addressed by the computer vision and machine learning communities. In this paper we discuss a number of robotics-specific learning, reasoning, and embodiment challenges for deep learning. We explain the need for better evaluation metrics, highlight the importance and unique challenges for deep robotic learning in simulation, and explore the spectrum between purely data-driven and model-driven approaches. We hope this paper provides a motivating overview of important research directions to overcome the current limitations, and help fulfill the promising potentials of deep learning in robotics.
△ Less
Submitted 18 April, 2018;
originally announced April 2018.
-
Learning to Navigate in Cities Without a Map
Authors:
Piotr Mirowski,
Matthew Koichi Grimes,
Mateusz Malinowski,
Karl Moritz Hermann,
Keith Anderson,
Denis Teplyashin,
Karen Simonyan,
Koray Kavukcuoglu,
Andrew Zisserman,
Raia Hadsell
Abstract:
Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on develo** an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support cont…
▽ More
Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on develo** an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support continuous self-localisation ("I am here") and a representation of the goal ("I am going there"). Building upon recent research that applies deep reinforcement learning to maze navigation problems, we present an end-to-end deep reinforcement learning approach that can be applied on a city scale. Recognising that successful navigation relies on integration of general policies with locale-specific knowledge, we propose a dual pathway architecture that allows locale-specific features to be encapsulated, while still enabling transfer to multiple cities. We present an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage, and demonstrate that our learning method allows agents to learn to navigate multiple cities and to traverse to target destinations that may be kilometres away. The project webpage http://streetlearn.cc contains a video summarising our research and showing the trained agent in diverse city environments and on the transfer task, the form to request the StreetLearn dataset and links to further resources. The StreetLearn environment code is available at https://github.com/deepmind/streetlearn
△ Less
Submitted 9 January, 2019; v1 submitted 31 March, 2018;
originally announced April 2018.
-
Reinforcement and Imitation Learning for Diverse Visuomotor Skills
Authors:
Yuke Zhu,
Ziyu Wang,
Josh Merel,
Andrei Rusu,
Tom Erez,
Serkan Cabi,
Saran Tunyasuvunakool,
János Kramár,
Raia Hadsell,
Nando de Freitas,
Nicolas Heess
Abstract:
We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent. We apply this approach to robotic manipulation tasks and train end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities. We demonstrate that our approach can solve a wide variety of visuomotor tasks, for which en…
▽ More
We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent. We apply this approach to robotic manipulation tasks and train end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities. We demonstrate that our approach can solve a wide variety of visuomotor tasks, for which engineering a scripted controller would be laborious. In experiments, our reinforcement and imitation agent achieves significantly better performances than agents trained with reinforcement learning or imitation learning alone. We also illustrate that these policies, trained with large visual and dynamics variations, can achieve preliminary successes in zero-shot sim2real transfer. A brief visual description of this work can be viewed in https://youtu.be/EDl8SQUNjj0
△ Less
Submitted 27 May, 2018; v1 submitted 26 February, 2018;
originally announced February 2018.
-
One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay
Authors:
Jake Bruce,
Niko Suenderhauf,
Piotr Mirowski,
Raia Hadsell,
Michael Milford
Abstract:
Recently, model-free reinforcement learning algorithms have been shown to solve challenging problems by learning from extensive interaction with the environment. A significant issue with transferring this success to the robotics domain is that interaction with the real world is costly, but training on limited experience is prone to overfitting. We present a method for learning to navigate, to a fi…
▽ More
Recently, model-free reinforcement learning algorithms have been shown to solve challenging problems by learning from extensive interaction with the environment. A significant issue with transferring this success to the robotics domain is that interaction with the real world is costly, but training on limited experience is prone to overfitting. We present a method for learning to navigate, to a fixed goal and in a known environment, on a mobile robot. The robot leverages an interactive world model built from a single traversal of the environment, a pre-trained visual feature encoder, and stochastic environmental augmentation, to demonstrate successful zero-shot transfer under real-world environmental variations without fine-tuning.
△ Less
Submitted 28 November, 2017; v1 submitted 28 November, 2017;
originally announced November 2017.
-
Distral: Robust Multitask Reinforcement Learning
Authors:
Yee Whye Teh,
Victor Bapst,
Wojciech Marian Czarnecki,
John Quan,
James Kirkpatrick,
Raia Hadsell,
Nicolas Heess,
Razvan Pascanu
Abstract:
Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from d…
▽ More
Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model. We propose a new approach for joint training of multiple tasks, which we refer to as Distral (Distill & transfer learning). Instead of sharing parameters between the different workers, we propose to share a "distilled" policy that captures common behaviour across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy, while the shared policy is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust and more stable---attributes that are critical in deep reinforcement learning.
△ Less
Submitted 13 July, 2017;
originally announced July 2017.
-
Overcoming catastrophic forgetting in neural networks
Authors:
James Kirkpatrick,
Razvan Pascanu,
Neil Rabinowitz,
Joel Veness,
Guillaume Desjardins,
Andrei A. Rusu,
Kieran Milan,
John Quan,
Tiago Ramalho,
Agnieszka Grabska-Barwinska,
Demis Hassabis,
Claudia Clopath,
Dharshan Kumaran,
Raia Hadsell
Abstract:
The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have…
▽ More
The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate our approach is scalable and effective by solving a set of classification tasks based on the MNIST hand written digit dataset and by learning several Atari 2600 games sequentially.
△ Less
Submitted 25 January, 2017; v1 submitted 2 December, 2016;
originally announced December 2016.
-
Learning to Navigate in Complex Environments
Authors:
Piotr Mirowski,
Razvan Pascanu,
Fabio Viola,
Hubert Soyer,
Andrew J. Ballard,
Andrea Banino,
Misha Denil,
Ross Goroshin,
Laurent Sifre,
Koray Kavukcuoglu,
Dharshan Kumaran,
Raia Hadsell
Abstract:
Learning to navigate in complex environments with dynamic elements is an important milestone in develo** AI agents. In this work we formulate the navigation question as a reinforcement learning problem and show that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs. In particular we consider jointly lea…
▽ More
Learning to navigate in complex environments with dynamic elements is an important milestone in develo** AI agents. In this work we formulate the navigation question as a reinforcement learning problem and show that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs. In particular we consider jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks. This approach can learn to navigate from raw sensory input in complicated 3D mazes, approaching human-level performance even under conditions where the goal location changes frequently. We provide detailed analysis of the agent behaviour, its ability to localise, and its network activity dynamics, showing that the agent implicitly learns key navigation abilities.
△ Less
Submitted 13 January, 2017; v1 submitted 11 November, 2016;
originally announced November 2016.
-
Sim-to-Real Robot Learning from Pixels with Progressive Nets
Authors:
Andrei A. Rusu,
Mel Vecerik,
Thomas Rothörl,
Nicolas Heess,
Razvan Pascanu,
Raia Hadsell
Abstract:
Applying end-to-end learning to solve complex, interactive, pixel-driven control tasks on a robot is an unsolved problem. Deep Reinforcement Learning algorithms are too slow to achieve performance on a real robot, but their potential has been demonstrated in simulated environments. We propose using progressive networks to bridge the reality gap and transfer learned policies from simulation to the…
▽ More
Applying end-to-end learning to solve complex, interactive, pixel-driven control tasks on a robot is an unsolved problem. Deep Reinforcement Learning algorithms are too slow to achieve performance on a real robot, but their potential has been demonstrated in simulated environments. We propose using progressive networks to bridge the reality gap and transfer learned policies from simulation to the real world. The progressive net approach is a general framework that enables reuse of everything from low-level visual features to high-level policies for transfer to new tasks, enabling a compositional, yet simple, approach to building complex skills. We present an early demonstration of this approach with a number of experiments in the domain of robot manipulation that focus on bridging the reality gap. Unlike other proposed approaches, our real-world experiments demonstrate successful task learning from raw visual input on a fully actuated robot manipulator. Moreover, rather than relying on model-based trajectory optimisation, the task learning is accomplished using only deep reinforcement learning and sparse rewards.
△ Less
Submitted 22 May, 2018; v1 submitted 13 October, 2016;
originally announced October 2016.
-
Progressive Neural Networks
Authors:
Andrei A. Rusu,
Neil C. Rabinowitz,
Guillaume Desjardins,
Hubert Soyer,
James Kirkpatrick,
Koray Kavukcuoglu,
Razvan Pascanu,
Raia Hadsell
Abstract:
Learning to solve complex sequences of tasks--while both leveraging transfer and avoiding catastrophic forgetting--remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this architec…
▽ More
Learning to solve complex sequences of tasks--while both leveraging transfer and avoiding catastrophic forgetting--remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this architecture extensively on a wide variety of reinforcement learning tasks (Atari and 3D maze games), and show that it outperforms common baselines based on pretraining and finetuning. Using a novel sensitivity measure, we demonstrate that transfer occurs at both low-level sensory and high-level control layers of the learned policy.
△ Less
Submitted 22 October, 2022; v1 submitted 15 June, 2016;
originally announced June 2016.
-
Policy Distillation
Authors:
Andrei A. Rusu,
Sergio Gomez Colmenarejo,
Caglar Gulcehre,
Guillaume Desjardins,
James Kirkpatrick,
Razvan Pascanu,
Volodymyr Mnih,
Koray Kavukcuoglu,
Raia Hadsell
Abstract:
Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and…
▽ More
Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while being dramatically smaller and more efficient. Furthermore, the same method can be used to consolidate multiple task-specific policies into a single policy. We demonstrate these claims using the Atari domain and show that the multi-task distilled agent outperforms the single-task teachers as well as a jointly-trained DQN agent.
△ Less
Submitted 7 January, 2016; v1 submitted 19 November, 2015;
originally announced November 2015.