Skip to main content

Showing 1–50 of 59 results for author: Riedmiller, M

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.02425  [pdf, other

    cs.RO cs.AI

    Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning

    Authors: Dhruva Tirumala, Markus Wulfmeier, Ben Moran, Sandy Huang, Jan Humplik, Guy Lever, Tuomas Haarnoja, Leonard Hasenclever, Arunkumar Byravan, Nathan Batchelor, Neil Sreendra, Kushal Patel, Marlon Gwira, Francesco Nori, Martin Riedmiller, Nicolas Heess

    Abstract: We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including active perception, agile full-body control, and long-horizon planning in a dynamic, partially-observable, multi-agent domain. We rely on large-scale, simulation-b… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.

  2. arXiv:2402.06102  [pdf, other

    cs.RO cs.LG

    Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning

    Authors: Mohak Bhardwaj, Thomas Lampe, Michael Neunert, Francesco Romano, Abbas Abdolmaleki, Arunkumar Byravan, Markus Wulfmeier, Martin Riedmiller, Jonas Buchli

    Abstract: Recent advances in real-world applications of reinforcement learning (RL) have relied on the ability to accurately simulate systems at scale. However, domains such as fluid dynamical systems exhibit complex dynamic phenomena that are hard to simulate at high integration rates, limiting the direct application of modern deep RL algorithms to often expensive or safety critical hardware. In this work,… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

  3. arXiv:2402.05546  [pdf, other

    cs.LG cs.AI cs.RO

    Offline Actor-Critic Reinforcement Learning Scales to Large Models

    Authors: Jost Tobias Springenberg, Abbas Abdolmaleki, **gwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, Martin Riedmiller

    Abstract: We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows similar scaling laws as supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

  4. arXiv:2312.11374  [pdf, other

    cs.RO

    Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots

    Authors: Thomas Lampe, Abbas Abdolmaleki, Sarah Bechtle, Sandy H. Huang, Jost Tobias Springenberg, Michael Bloesch, Oliver Groth, Roland Hafner, Tim Hertweck, Michael Neunert, Markus Wulfmeier, **gwei Zhang, Francesco Nori, Nicolas Heess, Martin Riedmiller

    Abstract: Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly efficient through re-using previously collected sub-optimal data. In this paper we demonstrate how the increased understanding of off-policy learning methods and… ▽ More

    Submitted 18 December, 2023; originally announced December 2023.

  5. arXiv:2312.09120  [pdf, other

    cs.LG cs.AI cs.RO

    Less is more -- the Dispatcher/ Executor principle for multi-task Reinforcement Learning

    Authors: Martin Riedmiller, Tim Hertweck, Roland Hafner

    Abstract: Humans instinctively know how to neglect details when it comes to solve complex decision making problems in environments with unforeseeable variations. This abstraction process seems to be a vital property for most biological systems and helps to 'abstract away' unnecessary details and boost generalisation. In this work we introduce the dispatcher/ executor principle for the design of multi-task R… ▽ More

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 13 pages, 9 figures. Videos showing the results can be found at https://sites.google.com/view/dispatcher-executor

  6. arXiv:2311.15951  [pdf, other

    cs.LG cs.AI cs.RO

    Replay across Experiments: A Natural Extension of Off-Policy RL

    Authors: Dhruva Tirumala, Thomas Lampe, Jose Enrique Chen, Tuomas Haarnoja, Sandy Huang, Guy Lever, Ben Moran, Tim Hertweck, Leonard Hasenclever, Martin Riedmiller, Nicolas Heess, Markus Wulfmeier

    Abstract: Replaying data is a principal mechanism underlying the stability and data efficiency of off-policy reinforcement learning (RL). We present an effective yet simple framework to extend the use of replays across multiple experiments, minimally adapting the RL workflow for sizeable improvements in controller performance and research iteration times. At its core, Replay Across Experiments (RaE) involve… ▽ More

    Submitted 28 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

  7. arXiv:2309.07578  [pdf, other

    cs.LG cs.AI cs.RO

    Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning

    Authors: Cristina Pinneri, Sarah Bechtle, Markus Wulfmeier, Arunkumar Byravan, **gwei Zhang, William F. Whitney, Martin Riedmiller

    Abstract: We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL), where the agent learns from a fixed dataset without any additional interaction with the environment. Specifically, we aim to improve the agent's ability to generalize to out-of-distribution goals. To achieve this, we propose to learn a dynamics model and check if it is equivariant with re… ▽ More

    Submitted 14 September, 2023; originally announced September 2023.

  8. arXiv:2308.15470  [pdf, other

    cs.LG

    Policy composition in reinforcement learning via multi-objective policy optimization

    Authors: Shruti Mishra, Ankit Anand, Jordan Hoffmann, Nicolas Heess, Martin Riedmiller, Abbas Abdolmaleki, Doina Precup

    Abstract: We enable reinforcement learning agents to learn successful behavior policies by utilizing relevant pre-existing teacher policies. The teacher policies are introduced as objectives, in addition to the task objective, in a multi-objective policy optimization setting. Using the Multi-Objective Maximum a Posteriori Policy Optimization algorithm (Abdolmaleki et al. 2020), we show that teacher policies… ▽ More

    Submitted 30 August, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

  9. arXiv:2308.07741  [pdf, other

    cs.RO cs.LG

    Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World

    Authors: Nico Gürtler, Felix Widmaier, Cansu Sancaktar, Sebastian Blaes, Pavel Kolev, Stefan Bauer, Manuel Wüthrich, Markus Wulfmeier, Martin Riedmiller, Arthur Allshire, Qiang Wang, Robert McCarthy, Hangyeol Kim, Jongchan Baek, Wookyong Kwon, Shanliang Qian, Yasunori Toshimitsu, Mike Yan Michelis, Amirhossein Kazemipour, Arman Raayatsanati, Hehui Zheng, Barnabas Gavin Cangan, Bernhard Schölkopf, Georg Martius

    Abstract: Experimentation on real robots is demanding in terms of time and costs. For this reason, a large part of the reinforcement learning (RL) community uses simulators to develop and benchmark algorithms. However, insights gained in simulation do not necessarily translate to real robots, in particular for tasks involving complex interactions with the environment. The Real Robot Challenge 2022 therefore… ▽ More

    Submitted 24 November, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Typo in author list fixed

  10. arXiv:2307.11546  [pdf, other

    physics.plasm-ph cs.LG

    Towards practical reinforcement learning for tokamak magnetic control

    Authors: Brendan D. Tracey, Andrea Michi, Yuri Chervonyi, Ian Davies, Cosmin Paduraru, Nevena Lazic, Federico Felici, Timo Ewalds, Craig Donner, Cristian Galperti, Jonas Buchli, Michael Neunert, Andrea Huber, Jonathan Evens, Paula Kurylowicz, Daniel J. Mankowitz, Martin Riedmiller, The TCV Team

    Abstract: Reinforcement learning (RL) has shown promising results for real-time control systems, including the domain of plasma magnetic control. However, there are still significant drawbacks compared to traditional feedback control approaches for magnetic confinement. In this work, we address key drawbacks of the RL method; achieving higher control accuracy for desired plasma properties, reducing the stea… ▽ More

    Submitted 5 October, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

  11. arXiv:2307.09668  [pdf, other

    cs.RO cs.AI cs.LG

    Towards A Unified Agent with Foundation Models

    Authors: Norman Di Palo, Arunkumar Byravan, Leonard Hasenclever, Markus Wulfmeier, Nicolas Heess, Martin Riedmiller

    Abstract: Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in terms of understanding human intentions, reasoning, scene understanding, and planning-like behaviour, in text form, among many others. In this work, we investigate how to embed and leverage such abilities in Reinforcement Learning (RL) agents. We design a framework that uses language as the core rea… ▽ More

    Submitted 18 July, 2023; originally announced July 2023.

  12. arXiv:2306.11706  [pdf, other

    cs.RO cs.LG

    RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation

    Authors: Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed, Sergio Gómez Colmenarejo, Jon Scholz , et al. (14 additional authors not shown)

    Abstract: The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned de… ▽ More

    Submitted 22 December, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Transactions on Machine Learning Research (12/2023)

  13. arXiv:2305.10912  [pdf, other

    cs.AI cs.RO

    A Generalist Dynamics Model for Control

    Authors: Ingmar Schubert, **gwei Zhang, Jake Bruce, Sarah Bechtle, Emilio Parisotto, Martin Riedmiller, Jost Tobias Springenberg, Arunkumar Byravan, Leonard Hasenclever, Nicolas Heess

    Abstract: We investigate the use of transformer sequence models as dynamics models (TDMs) for control. We find that TDMs exhibit strong generalization capabilities to unseen environments, both in a few-shot setting, where a generalist TDM is fine-tuned with small amounts of data from the target environment, and in a zero-shot setting, where a generalist TDM is applied to an unseen environment without any fu… ▽ More

    Submitted 23 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

  14. arXiv:2302.12617  [pdf, other

    cs.RO cs.AI cs.LG

    Leveraging Jumpy Models for Planning and Fast Learning in Robotic Domains

    Authors: **gwei Zhang, Jost Tobias Springenberg, Arunkumar Byravan, Leonard Hasenclever, Abbas Abdolmaleki, Dushyant Rao, Nicolas Heess, Martin Riedmiller

    Abstract: In this paper we study the problem of learning multi-step dynamics prediction models (jumpy models) from unlabeled experience and their utility for fast inference of (high-level) plans in downstream tasks. In particular we propose to learn a jumpy model alongside a skill embedding space offline, from previously collected experience for which no labels or reward annotations are required. We then in… ▽ More

    Submitted 24 February, 2023; originally announced February 2023.

  15. arXiv:2211.13743  [pdf, other

    cs.LG cs.AI cs.RO

    SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration

    Authors: Giulia Vezzani, Dhruva Tirumala, Markus Wulfmeier, Dushyant Rao, Abbas Abdolmaleki, Ben Moran, Tuomas Haarnoja, Jan Humplik, Roland Hafner, Michael Neunert, Claudio Fantacci, Tim Hertweck, Thomas Lampe, Fereshteh Sadeghi, Nicolas Heess, Martin Riedmiller

    Abstract: The ability to effectively reuse prior knowledge is a key requirement when building general and flexible Reinforcement Learning (RL) agents. Skill reuse is one of the most common approaches, but current methods have considerable limitations.For example, fine-tuning an existing policy frequently fails, as the policy can degrade rapidly early in training. In a similar vein, distillation of expert be… ▽ More

    Submitted 11 January, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

  16. arXiv:2210.12566  [pdf, other

    cs.LG cs.AI cs.RO

    Solving Continuous Control via Q-learning

    Authors: Tim Seyde, Peter Werner, Wilko Schwarting, Igor Gilitschenski, Martin Riedmiller, Daniela Rus, Markus Wulfmeier

    Abstract: While there has been substantial success for solving continuous control with actor-critic methods, simpler critic-only methods such as Q-learning find limited application in the associated high-dimensional action spaces. However, most actor-critic methods come at the cost of added complexity: heuristics for stabilisation, compute requirements and wider hyperparameter search spaces. We show that a… ▽ More

    Submitted 25 September, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

  17. arXiv:2209.01947  [pdf, other

    cs.LG cs.AI cs.RO stat.ML

    MO2: Model-Based Offline Options

    Authors: Sasha Salter, Markus Wulfmeier, Dhruva Tirumala, Nicolas Heess, Martin Riedmiller, Raia Hadsell, Dushyant Rao

    Abstract: The ability to discover useful behaviours from past experience and transfer them to new tasks is considered a core component of natural embodied intelligence. Inspired by neuroscience, discovering behaviours that switch at bottleneck states have been long sought after for inducing plans of minimum description length across tasks. Prior approaches have either only supported online, on-policy, bottl… ▽ More

    Submitted 5 September, 2022; originally announced September 2022.

    Comments: Accepted at 1st Conference on Lifelong Learning Agents (CoLLAs) Conference Track, 2022

  18. arXiv:2204.10256  [pdf, other

    cs.LG cs.AI

    Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

    Authors: Bobak Shahriari, Abbas Abdolmaleki, Arunkumar Byravan, Abe Friesen, Siqi Liu, Jost Tobias Springenberg, Nicolas Heess, Matt Hoffman, Martin Riedmiller

    Abstract: Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks. Examples of this behavior include the D4PG and DMPO algorithms as compared to DDPG and MPO, respectively [Barth-Maron et al., 2018; Hoffman et al., 2020]. However, both agents rely on the C51 critic for value est… ▽ More

    Submitted 22 April, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

  19. arXiv:2201.11861  [pdf, other

    cs.LG

    The Challenges of Exploration for Offline Reinforcement Learning

    Authors: Nathan Lambert, Markus Wulfmeier, William Whitney, Arunkumar Byravan, Michael Bloesch, Vibhavari Dasagi, Tim Hertweck, Martin Riedmiller

    Abstract: Offline Reinforcement Learning (ORL) enablesus to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour. The second step has been widely studied in the offline setting, but just as critical to data-efficient RL is the collection of informative data. The task-agnostic setting for data collection, where the task is… ▽ More

    Submitted 18 February, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

  20. arXiv:2111.02552  [pdf, other

    cs.LG cs.AI cs.RO

    Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies

    Authors: Tim Seyde, Igor Gilitschenski, Wilko Schwarting, Bartolomeo Stellato, Martin Riedmiller, Markus Wulfmeier, Daniela Rus

    Abstract: Reinforcement learning (RL) for continuous control typically employs distributions whose support covers the entire action space. In this work, we investigate the colloquially known phenomenon that trained agents often prefer actions at the boundaries of that space. We draw theoretical connections to the emergence of bang-bang behavior in optimal control, and provide extensive empirical evaluation… ▽ More

    Submitted 3 November, 2021; originally announced November 2021.

  21. arXiv:2110.06192  [pdf, other

    cs.RO cs.LG

    Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes

    Authors: Alex X. Lee, Coline Devin, Yuxiang Zhou, Thomas Lampe, Konstantinos Bousmalis, Jost Tobias Springenberg, Arunkumar Byravan, Abbas Abdolmaleki, Nimrod Gileadi, David Khosid, Claudio Fantacci, Jose Enrique Chen, Akhil Raju, Rae Jeong, Michael Neunert, Antoine Laurens, Stefano Saliceti, Federico Casarini, Martin Riedmiller, Raia Hadsell, Francesco Nori

    Abstract: We study the problem of robotic stacking with objects of complex geometry. We propose a challenging and diverse set of such objects that was carefully designed to require strategies beyond a simple "pick-and-place" solution. Our method is a reinforcement learning (RL) approach combined with vision-based interactive policy distillation and simulation-to-reality transfer. Our learned policies can ef… ▽ More

    Submitted 3 November, 2021; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: CoRL 2021. Video: https://dpmd.ai/robotics-stacking-YT . Blog: https://dpmd.ai/robotics-stacking . Code: https://github.com/deepmind/rgb_stacking

  22. arXiv:2110.03363  [pdf, other

    cs.RO cs.AI cs.LG

    Evaluating model-based planning and planner amortization for continuous control

    Authors: Arunkumar Byravan, Leonard Hasenclever, Piotr Trochim, Mehdi Mirza, Alessandro Davide Ialongo, Yuval Tassa, Jost Tobias Springenberg, Abbas Abdolmaleki, Nicolas Heess, Josh Merel, Martin Riedmiller

    Abstract: There is a widespread intuition that model-based control methods should be able to surpass the data efficiency of model-free approaches. In this paper we attempt to evaluate this intuition on various challenging locomotion tasks. We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning; the learned policy serves as a proposal for MPC.… ▽ More

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: 9 pages main text, 30 pages with references and appendix including several ablations and additional experiments. Submitted to ICLR 2022

  23. arXiv:2109.08603  [pdf, other

    cs.LG cs.NE cs.RO

    Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration

    Authors: Oliver Groth, Markus Wulfmeier, Giulia Vezzani, Vibhavari Dasagi, Tim Hertweck, Roland Hafner, Nicolas Heess, Martin Riedmiller

    Abstract: Curiosity-based reward schemes can present powerful exploration mechanisms which facilitate the discovery of solutions for complex, sparse or long-horizon tasks. However, as the agent learns to reach previously unexplored spaces and the objective adapts to reward new areas, many behaviours emerge only to disappear due to being overwritten by the constantly shifting objective. We argue that merely… ▽ More

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: 14 pages, 7 figures, 2 tables

    ACM Class: I.2.6; I.2.9

  24. arXiv:2108.10273  [pdf, other

    cs.LG

    Collect & Infer -- a fresh look at data-efficient Reinforcement Learning

    Authors: Martin Riedmiller, Jost Tobias Springenberg, Roland Hafner, Nicolas Heess

    Abstract: This position paper proposes a fresh look at Reinforcement Learning (RL) from the perspective of data-efficiency. Data-efficient RL has gone through three major stages: pure on-line RL where every data-point is considered only once, RL with a replay buffer where additional learning is done on a portion of the experience, and finally transition memory based RL, where, conceptually, all transitions… ▽ More

    Submitted 23 August, 2021; originally announced August 2021.

  25. arXiv:2106.08199  [pdf, other

    cs.LG cs.RO

    On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning

    Authors: Abbas Abdolmaleki, Sandy H. Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva TB, Arunkumar Byravan, Konstantinos Bousmalis, Andras Gyorgy, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, Martin Riedmiller

    Abstract: Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step. This includes ideas as far ranging as exploration bonuses, entropy regularization, and regularization toward teachers or data priors. Often, the task reward and au… ▽ More

    Submitted 1 August, 2023; v1 submitted 15 June, 2021; originally announced June 2021.

  26. arXiv:2101.09458  [pdf, other

    cs.LG

    Decoupled Exploration and Exploitation Policies for Sample-Efficient Reinforcement Learning

    Authors: William F. Whitney, Michael Bloesch, Jost Tobias Springenberg, Abbas Abdolmaleki, Kyunghyun Cho, Martin Riedmiller

    Abstract: Despite the close connection between exploration and sample efficiency, most state of the art reinforcement learning algorithms include no considerations for exploration beyond maximizing the entropy of the policy. In this work we address this seeming missed opportunity. We observe that the most common formulation of directed exploration in deep RL, known as bonus-based exploration (BBE), suffers… ▽ More

    Submitted 1 July, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

  27. arXiv:2011.01758  [pdf, other

    cs.LG cs.AI cs.RO stat.ML

    Representation Matters: Improving Perception and Exploration for Robotics

    Authors: Markus Wulfmeier, Arunkumar Byravan, Tim Hertweck, Irina Higgins, Ankush Gupta, Tejas Kulkarni, Malcolm Reynolds, Denis Teplyashin, Roland Hafner, Thomas Lampe, Martin Riedmiller

    Abstract: Projecting high-dimensional environment observations into lower-dimensional structured representations can considerably improve data-efficiency for reinforcement learning in domains with limited data such as robotics. Can a single generally useful representation be found? In order to answer this question, it is important to understand how the representation will be used by the agent and what prope… ▽ More

    Submitted 21 March, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: Published at ICRA 2021

  28. arXiv:2010.15492  [pdf, other

    cs.RO

    "What, not how": Solving an under-actuated insertion task from scratch

    Authors: Giulia Vezzani, Michael Neunert, Markus Wulfmeier, Rae Jeong, Thomas Lampe, Noah Siegel, Roland Hafner, Abbas Abdolmaleki, Martin Riedmiller, Francesco Nori

    Abstract: Robot manipulation requires a complex set of skills that need to be carefully combined and coordinated to solve a task. Yet, most ReinforcementLearning (RL) approaches in robotics study tasks which actually consist only of a single manipulation skill, such as gras** an object or inserting a pre-grasped object. As a result the skill ('how' to solve the task) but not the actual goal of a complete… ▽ More

    Submitted 30 October, 2020; v1 submitted 29 October, 2020; originally announced October 2020.

  29. arXiv:2010.10644  [pdf, other

    cs.LG cs.AI stat.ML

    Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification

    Authors: Daniel J. Mankowitz, Dan A. Calian, Rae Jeong, Cosmin Paduraru, Nicolas Heess, Sumanth Dathathri, Martin Riedmiller, Timothy Mann

    Abstract: Many real-world physical control systems are required to satisfy constraints upon deployment. Furthermore, real-world systems are often subject to effects such as non-stationarity, wear-and-tear, uncalibrated sensors and so on. Such effects effectively perturb the system dynamics and can cause a policy trained successfully in one domain to perform poorly when deployed to a perturbed version of the… ▽ More

    Submitted 3 March, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

  30. arXiv:2010.05545  [pdf, other

    cs.LG cs.AI stat.ML

    Local Search for Policy Iteration in Continuous Control

    Authors: Jost Tobias Springenberg, Nicolas Heess, Daniel Mankowitz, Josh Merel, Arunkumar Byravan, Abbas Abdolmaleki, Jackie Kay, Jonas Degrave, Julian Schrittwieser, Yuval Tassa, Jonas Buchli, Dan Belov, Martin Riedmiller

    Abstract: We present an algorithm for local, regularized, policy improvement in reinforcement learning (RL) that allows us to formulate model-based and model-free variants in a single framework. Our algorithm can be interpreted as a natural extension of work on KL-regularized RL and introduces a form of tree search for continuous action spaces. We demonstrate that additional computation spent on model-based… ▽ More

    Submitted 12 October, 2020; originally announced October 2020.

  31. arXiv:2008.12228  [pdf, other

    cs.RO cs.AI cs.LG stat.ML

    Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion

    Authors: Roland Hafner, Tim Hertweck, Philipp Klöppner, Michael Bloesch, Michael Neunert, Markus Wulfmeier, Saran Tunyasuvunakool, Nicolas Heess, Martin Riedmiller

    Abstract: Modern Reinforcement Learning (RL) algorithms promise to solve difficult motor control problems directly from raw sensory inputs. Their attraction is due in part to the fact that they can represent a general class of methods that allow to learn a solution with a reasonably set reward and minimal prior knowledge, even in situations where it is difficult or expensive for a human expert. For RL to tr… ▽ More

    Submitted 6 August, 2020; originally announced August 2020.

  32. arXiv:2007.15588  [pdf, other

    cs.LG cs.AI cs.RO stat.ML

    Data-efficient Hindsight Off-policy Option Learning

    Authors: Markus Wulfmeier, Dushyant Rao, Roland Hafner, Thomas Lampe, Abbas Abdolmaleki, Tim Hertweck, Michael Neunert, Dhruva Tirumala, Noah Siegel, Nicolas Heess, Martin Riedmiller

    Abstract: We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning algorithm. Given any trajectory, HO2 infers likely option choices and backpropagates through the dynamic programming inference procedure to robustly train all policy components off-policy and end-to-end. The approach outperforms existing option learning methods on common benchmarks. To better understand the option fr… ▽ More

    Submitted 15 June, 2021; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: Published at ICML2021

  33. arXiv:2005.07541  [pdf, other

    cs.LG cs.AI cs.RO stat.ML

    Simple Sensor Intentions for Exploration

    Authors: Tim Hertweck, Martin Riedmiller, Michael Bloesch, Jost Tobias Springenberg, Noah Siegel, Markus Wulfmeier, Roland Hafner, Nicolas Heess

    Abstract: Modern reinforcement learning algorithms can learn solutions to increasingly difficult control problems while at the same time reduce the amount of prior knowledge needed for their application. One of the remaining challenges is the definition of reward schemes that appropriately facilitate exploration without biasing the solution in undesirable ways, and that can be implemented on real robotic sy… ▽ More

    Submitted 15 May, 2020; originally announced May 2020.

  34. arXiv:2005.07513  [pdf, other

    cs.LG cs.AI cs.RO stat.ML

    A Distributional View on Multi-Objective Policy Optimization

    Authors: Abbas Abdolmaleki, Sandy H. Huang, Leonard Hasenclever, Michael Neunert, H. Francis Song, Martina Zambelli, Murilo F. Martins, Nicolas Heess, Raia Hadsell, Martin Riedmiller

    Abstract: Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for obj… ▽ More

    Submitted 15 May, 2020; originally announced May 2020.

  35. arXiv:2002.08396  [pdf, other

    cs.LG cs.RO stat.ML

    Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning

    Authors: Noah Y. Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, Martin Riedmiller

    Abstract: Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set (batch) of environment interactions is available and no new experience can be acquired. This property makes these algorithms appealing for real world problems such as robot control. In practice, however, standard off-policy algorithms fail in the batch setting for continuous control. In th… ▽ More

    Submitted 17 June, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    ACM Class: I.2.6; I.2.9

    Journal ref: ICLR 2020

  36. arXiv:2001.00449  [pdf, other

    cs.LG cs.RO stat.ML

    Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics

    Authors: Michael Neunert, Abbas Abdolmaleki, Markus Wulfmeier, Thomas Lampe, Jost Tobias Springenberg, Roland Hafner, Francesco Romano, Jonas Buchli, Nicolas Heess, Martin Riedmiller

    Abstract: Many real-world control problems involve both discrete decision variables - such as the choice of control modes, gear switching or digital outputs - as well as continuous decision variables - such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully continuous or… ▽ More

    Submitted 2 January, 2020; originally announced January 2020.

    Comments: Presented at the 3rd Conference on Robot Learning (CoRL 2019), Osaka, Japan. Video: https://youtu.be/eUqQDLQXb7I

  37. arXiv:1911.01831  [pdf, other

    cs.LG cs.NE

    Quinoa: a Q-function You Infer Normalized Over Actions

    Authors: Jonas Degrave, Abbas Abdolmaleki, Jost Tobias Springenberg, Nicolas Heess, Martin Riedmiller

    Abstract: We present an algorithm for learning an approximate action-value soft Q-function in the relative entropy regularised reinforcement learning setting, for which an optimal improved policy can be recovered in closed form. We use recent advances in normalising flows for parametrising the policy together with a learned value-function; and show how this combination can be used to implicitly represent Q-… ▽ More

    Submitted 5 November, 2019; originally announced November 2019.

    Comments: Deep RL Workshop/NeurIPS

  38. arXiv:1910.04142  [pdf, other

    cs.RO cs.AI cs.CV cs.LG cs.NE

    Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

    Authors: Arunkumar Byravan, Jost Tobias Springenberg, Abbas Abdolmaleki, Roland Hafner, Michael Neunert, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

    Abstract: Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper, we explore how model-based Reinforcement Learning (RL) can facilitate transfer to new tasks. We develop an algorithm that learns an action-conditional, predicti… ▽ More

    Submitted 9 October, 2019; originally announced October 2019.

    Comments: To appear at the 3rd annual Conference on Robot Learning, Osaka, Japan (CoRL 2019). 24 pages including appendix (main paper - 8 pages)

  39. arXiv:1909.12238  [pdf, other

    cs.AI cs.LG

    V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control

    Authors: H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin Riedmiller, Matthew M. Botvinick

    Abstract: Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradie… ▽ More

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: * equal contribution

  40. arXiv:1906.11228  [pdf, other

    cs.LG cs.AI cs.RO stat.ML

    Compositional Transfer in Hierarchical Reinforcement Learning

    Authors: Markus Wulfmeier, Abbas Abdolmaleki, Roland Hafner, Jost Tobias Springenberg, Michael Neunert, Tim Hertweck, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

    Abstract: The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multip… ▽ More

    Submitted 19 May, 2020; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: Robotics Science and Systems 2020

  41. arXiv:1906.07516  [pdf, other

    cs.LG cs.AI stat.ML

    Robust Reinforcement Learning for Continuous Control with Model Misspecification

    Authors: Daniel J. Mankowitz, Nir Levine, Rae Jeong, Yuanyuan Shi, Jackie Kay, Abbas Abdolmaleki, Jost Tobias Springenberg, Timothy Mann, Todd Hester, Martin Riedmiller

    Abstract: We provide a framework for incorporating robustness -- to perturbations in the transition dynamics which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a… ▽ More

    Submitted 11 February, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

  42. arXiv:1902.04706  [pdf, other

    cs.LG cs.RO stat.ML

    Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup

    Authors: Devin Schwab, Tobias Springenberg, Murilo F. Martins, Thomas Lampe, Michael Neunert, Abbas Abdolmaleki, Tim Hertweck, Roland Hafner, Francesco Nori, Martin Riedmiller

    Abstract: We present a method for fast training of vision based control policies on real robots. The key idea behind our method is to perform multi-task Reinforcement Learning with auxiliary tasks that differ not only in the reward to be optimized but also in the state-space in which they operate. In particular, we allow auxiliary task policies to utilize task features that are available only at training-ti… ▽ More

    Submitted 18 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: Videos can be found at https://sites.google.com/view/rss-2019-sawyer-bic/

  43. arXiv:1901.00943  [pdf, other

    cs.LG cs.AI cs.NE cs.RO

    Self-supervised Learning of Image Embedding for Continuous Control

    Authors: Carlos Florensa, Jonas Degrave, Nicolas Heess, Jost Tobias Springenberg, Martin Riedmiller

    Abstract: Operating directly from raw high dimensional sensory inputs like images is still a challenge for robotic control. Recently, Reinforcement Learning methods have been proposed to solve specific tasks end-to-end, from pixels to torques. However, these approaches assume the access to a specified reward which may require specialized instrumentation of the environment. Furthermore, the obtained policy a… ▽ More

    Submitted 3 January, 2019; originally announced January 2019.

    Comments: Contributed talk at Inference to Control workshop at NeurIPS2018

  44. arXiv:1812.02256  [pdf, other

    cs.LG stat.ML

    Relative Entropy Regularized Policy Iteration

    Authors: Abbas Abdolmaleki, Jost Tobias Springenberg, Jonas Degrave, Steven Bohez, Yuval Tassa, Dan Belov, Nicolas Heess, Martin Riedmiller

    Abstract: We present an off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with learned action-value function. The result is a simple procedure consisting of three steps: i) policy evaluation by estimating a parametric action-value function; ii) policy improvement via the estimation of a local non-parametric policy; and… ▽ More

    Submitted 5 December, 2018; originally announced December 2018.

  45. arXiv:1806.06920  [pdf, other

    cs.LG cs.AI cs.IT cs.RO stat.ML

    Maximum a Posteriori Policy Optimisation

    Authors: Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, Martin Riedmiller

    Abstract: We introduce a new algorithm for reinforcement learning called Maximum aposteriori Policy Optimisation (MPO) based on coordinate ascent on a relative entropy objective. We show that several existing methods can directly be related to our derivation. We develop two off-policy algorithms and demonstrate that they are competitive with the state-of-the-art in deep reinforcement learning. In particular… ▽ More

    Submitted 14 June, 2018; originally announced June 2018.

  46. arXiv:1806.01242  [pdf, other

    cs.LG cs.AI stat.ML

    Graph networks as learnable physics engines for inference and control

    Authors: Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, Peter Battaglia

    Abstract: Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model. Here we introduce a new class of learnable models--based on graph networks--which implement an inductive bias for object- and relation-centric representations of complex, dynamical sys… ▽ More

    Submitted 4 June, 2018; originally announced June 2018.

    Comments: ICML 2018

  47. arXiv:1802.10567  [pdf, other

    cs.LG cs.RO stat.ML

    Learning by Playing - Solving Sparse Reward Tasks from Scratch

    Authors: Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tom Van de Wiele, Volodymyr Mnih, Nicolas Heess, Jost Tobias Springenberg

    Abstract: We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL). SAC-X enables learning of complex behaviors - from scratch - in the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of general auxiliary tasks, that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is t… ▽ More

    Submitted 28 February, 2018; originally announced February 2018.

    Comments: A video of the rich set of learned behaviours can be found at https://youtu.be/mPKyvocNe_M

  48. arXiv:1801.00690  [pdf, other

    cs.AI

    DeepMind Control Suite

    Authors: Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, Martin Riedmiller

    Abstract: The DeepMind Control Suite is a set of continuous control tasks with a standardised structure and interpretable rewards, intended to serve as performance benchmarks for reinforcement learning agents. The tasks are written in Python and powered by the MuJoCo physics engine, making them easy to use and modify. We include benchmarks for several learning algorithms. The Control Suite is publicly avail… ▽ More

    Submitted 2 January, 2018; originally announced January 2018.

    Comments: 24 pages, 7 figures, 2 tables

  49. arXiv:1707.08817  [pdf, other

    cs.AI

    Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

    Authors: Mel Vecerik, Todd Hester, Jonathan Scholz, Fumin Wang, Olivier Pietquin, Bilal Piot, Nicolas Heess, Thomas Rothörl, Thomas Lampe, Martin Riedmiller

    Abstract: We propose a general and model-free approach for Reinforcement Learning (RL) on real robotics with sparse rewards. We build upon the Deep Deterministic Policy Gradient (DDPG) algorithm to use demonstrations. Both demonstrations and actual interactions are used to fill a replay buffer and the sampling ratio between demonstrations and transitions is automatically tuned via a prioritized replay mecha… ▽ More

    Submitted 8 October, 2018; v1 submitted 27 July, 2017; originally announced July 2017.

  50. arXiv:1707.02286  [pdf, other

    cs.AI

    Emergence of Locomotion Behaviours in Rich Environments

    Authors: Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver

    Abstract: The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. In this paper explore how a rich environment can help to promote the learning of complex behavior. Specifically… ▽ More

    Submitted 10 July, 2017; v1 submitted 7 July, 2017; originally announced July 2017.