Skip to main content

Showing 1–35 of 35 results for author: Abdolmaleki, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2402.06102  [pdf, other

    cs.RO cs.LG

    Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning

    Authors: Mohak Bhardwaj, Thomas Lampe, Michael Neunert, Francesco Romano, Abbas Abdolmaleki, Arunkumar Byravan, Markus Wulfmeier, Martin Riedmiller, Jonas Buchli

    Abstract: Recent advances in real-world applications of reinforcement learning (RL) have relied on the ability to accurately simulate systems at scale. However, domains such as fluid dynamical systems exhibit complex dynamic phenomena that are hard to simulate at high integration rates, limiting the direct application of modern deep RL algorithms to often expensive or safety critical hardware. In this work,… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

  2. arXiv:2402.05546  [pdf, other

    cs.LG cs.AI cs.RO

    Offline Actor-Critic Reinforcement Learning Scales to Large Models

    Authors: Jost Tobias Springenberg, Abbas Abdolmaleki, **gwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, Martin Riedmiller

    Abstract: We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows similar scaling laws as supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

  3. arXiv:2312.11374  [pdf, other

    cs.RO

    Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots

    Authors: Thomas Lampe, Abbas Abdolmaleki, Sarah Bechtle, Sandy H. Huang, Jost Tobias Springenberg, Michael Bloesch, Oliver Groth, Roland Hafner, Tim Hertweck, Michael Neunert, Markus Wulfmeier, **gwei Zhang, Francesco Nori, Nicolas Heess, Martin Riedmiller

    Abstract: Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly efficient through re-using previously collected sub-optimal data. In this paper we demonstrate how the increased understanding of off-policy learning methods and… ▽ More

    Submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2308.15470  [pdf, other

    cs.LG

    Policy composition in reinforcement learning via multi-objective policy optimization

    Authors: Shruti Mishra, Ankit Anand, Jordan Hoffmann, Nicolas Heess, Martin Riedmiller, Abbas Abdolmaleki, Doina Precup

    Abstract: We enable reinforcement learning agents to learn successful behavior policies by utilizing relevant pre-existing teacher policies. The teacher policies are introduced as objectives, in addition to the task objective, in a multi-objective policy optimization setting. Using the Multi-Objective Maximum a Posteriori Policy Optimization algorithm (Abdolmaleki et al. 2020), we show that teacher policies… ▽ More

    Submitted 30 August, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

  5. arXiv:2306.11706  [pdf, other

    cs.RO cs.LG

    RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation

    Authors: Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed, Sergio Gómez Colmenarejo, Jon Scholz , et al. (14 additional authors not shown)

    Abstract: The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned de… ▽ More

    Submitted 22 December, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Transactions on Machine Learning Research (12/2023)

  6. arXiv:2302.12617  [pdf, other

    cs.RO cs.AI cs.LG

    Leveraging Jumpy Models for Planning and Fast Learning in Robotic Domains

    Authors: **gwei Zhang, Jost Tobias Springenberg, Arunkumar Byravan, Leonard Hasenclever, Abbas Abdolmaleki, Dushyant Rao, Nicolas Heess, Martin Riedmiller

    Abstract: In this paper we study the problem of learning multi-step dynamics prediction models (jumpy models) from unlabeled experience and their utility for fast inference of (high-level) plans in downstream tasks. In particular we propose to learn a jumpy model alongside a skill embedding space offline, from previously collected experience for which no labels or reward annotations are required. We then in… ▽ More

    Submitted 24 February, 2023; originally announced February 2023.

  7. arXiv:2211.13743  [pdf, other

    cs.LG cs.AI cs.RO

    SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration

    Authors: Giulia Vezzani, Dhruva Tirumala, Markus Wulfmeier, Dushyant Rao, Abbas Abdolmaleki, Ben Moran, Tuomas Haarnoja, Jan Humplik, Roland Hafner, Michael Neunert, Claudio Fantacci, Tim Hertweck, Thomas Lampe, Fereshteh Sadeghi, Nicolas Heess, Martin Riedmiller

    Abstract: The ability to effectively reuse prior knowledge is a key requirement when building general and flexible Reinforcement Learning (RL) agents. Skill reuse is one of the most common approaches, but current methods have considerable limitations.For example, fine-tuning an existing policy frequently fails, as the policy can degrade rapidly early in training. In a similar vein, distillation of expert be… ▽ More

    Submitted 11 January, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

  8. arXiv:2205.03353  [pdf, other

    cs.RO cs.LG

    How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation

    Authors: Alex X. Lee, Coline Devin, Jost Tobias Springenberg, Yuxiang Zhou, Thomas Lampe, Abbas Abdolmaleki, Konstantinos Bousmalis

    Abstract: Reinforcement learning (RL) has been shown to be effective at learning control from experience. However, RL typically requires a large amount of online interaction with the environment. This limits its applicability to real-world settings, such as in robotics, where such interaction is expensive. In this work we investigate ways to minimize online interactions in a target task, by reusing a subopt… ▽ More

    Submitted 6 May, 2022; originally announced May 2022.

  9. arXiv:2204.10256  [pdf, other

    cs.LG cs.AI

    Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

    Authors: Bobak Shahriari, Abbas Abdolmaleki, Arunkumar Byravan, Abe Friesen, Siqi Liu, Jost Tobias Springenberg, Nicolas Heess, Matt Hoffman, Martin Riedmiller

    Abstract: Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks. Examples of this behavior include the D4PG and DMPO algorithms as compared to DDPG and MPO, respectively [Barth-Maron et al., 2018; Hoffman et al., 2020]. However, both agents rely on the C51 critic for value est… ▽ More

    Submitted 22 April, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

  10. arXiv:2204.05893  [pdf, other

    cs.RO cs.AI cs.LG

    Forgetting and Imbalance in Robot Lifelong Learning with Off-policy Data

    Authors: Wenxuan Zhou, Steven Bohez, Jan Humplik, Abbas Abdolmaleki, Dushyant Rao, Markus Wulfmeier, Tuomas Haarnoja, Nicolas Heess

    Abstract: Robots will experience non-stationary environment dynamics throughout their lifetime: the robot dynamics can change due to wear and tear, or its surroundings may change over time. Eventually, the robots should perform well in all of the environment variations it has encountered. At the same time, it should still be able to learn fast in a new environment. We identify two challenges in Reinforcemen… ▽ More

    Submitted 18 August, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: Published at 1st Conference on Lifelong Learning Agents, 2022

  11. arXiv:2110.06192  [pdf, other

    cs.RO cs.LG

    Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes

    Authors: Alex X. Lee, Coline Devin, Yuxiang Zhou, Thomas Lampe, Konstantinos Bousmalis, Jost Tobias Springenberg, Arunkumar Byravan, Abbas Abdolmaleki, Nimrod Gileadi, David Khosid, Claudio Fantacci, Jose Enrique Chen, Akhil Raju, Rae Jeong, Michael Neunert, Antoine Laurens, Stefano Saliceti, Federico Casarini, Martin Riedmiller, Raia Hadsell, Francesco Nori

    Abstract: We study the problem of robotic stacking with objects of complex geometry. We propose a challenging and diverse set of such objects that was carefully designed to require strategies beyond a simple "pick-and-place" solution. Our method is a reinforcement learning (RL) approach combined with vision-based interactive policy distillation and simulation-to-reality transfer. Our learned policies can ef… ▽ More

    Submitted 3 November, 2021; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: CoRL 2021. Video: https://dpmd.ai/robotics-stacking-YT . Blog: https://dpmd.ai/robotics-stacking . Code: https://github.com/deepmind/rgb_stacking

  12. arXiv:2110.03363  [pdf, other

    cs.RO cs.AI cs.LG

    Evaluating model-based planning and planner amortization for continuous control

    Authors: Arunkumar Byravan, Leonard Hasenclever, Piotr Trochim, Mehdi Mirza, Alessandro Davide Ialongo, Yuval Tassa, Jost Tobias Springenberg, Abbas Abdolmaleki, Nicolas Heess, Josh Merel, Martin Riedmiller

    Abstract: There is a widespread intuition that model-based control methods should be able to surpass the data efficiency of model-free approaches. In this paper we attempt to evaluate this intuition on various challenging locomotion tasks. We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning; the learned policy serves as a proposal for MPC.… ▽ More

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: 9 pages main text, 30 pages with references and appendix including several ablations and additional experiments. Submitted to ICLR 2022

  13. arXiv:2106.08199  [pdf, other

    cs.LG cs.RO

    On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning

    Authors: Abbas Abdolmaleki, Sandy H. Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva TB, Arunkumar Byravan, Konstantinos Bousmalis, Andras Gyorgy, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, Martin Riedmiller

    Abstract: Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step. This includes ideas as far ranging as exploration bonuses, entropy regularization, and regularization toward teachers or data priors. Often, the task reward and au… ▽ More

    Submitted 1 August, 2023; v1 submitted 15 June, 2021; originally announced June 2021.

  14. arXiv:2105.12196  [pdf, other

    cs.AI cs.MA cs.NE cs.RO

    From Motor Control to Team Play in Simulated Humanoid Football

    Authors: Siqi Liu, Guy Lever, Zhe Wang, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel, Nicolas Heess

    Abstract: Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents… ▽ More

    Submitted 25 May, 2021; originally announced May 2021.

  15. arXiv:2101.09458  [pdf, other

    cs.LG

    Decoupled Exploration and Exploitation Policies for Sample-Efficient Reinforcement Learning

    Authors: William F. Whitney, Michael Bloesch, Jost Tobias Springenberg, Abbas Abdolmaleki, Kyunghyun Cho, Martin Riedmiller

    Abstract: Despite the close connection between exploration and sample efficiency, most state of the art reinforcement learning algorithms include no considerations for exploration beyond maximizing the entropy of the policy. In this work we address this seeming missed opportunity. We observe that the most common formulation of directed exploration in deep RL, known as bonus-based exploration (BBE), suffers… ▽ More

    Submitted 1 July, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

  16. arXiv:2010.15492  [pdf, other

    cs.RO

    "What, not how": Solving an under-actuated insertion task from scratch

    Authors: Giulia Vezzani, Michael Neunert, Markus Wulfmeier, Rae Jeong, Thomas Lampe, Noah Siegel, Roland Hafner, Abbas Abdolmaleki, Martin Riedmiller, Francesco Nori

    Abstract: Robot manipulation requires a complex set of skills that need to be carefully combined and coordinated to solve a task. Yet, most ReinforcementLearning (RL) approaches in robotics study tasks which actually consist only of a single manipulation skill, such as gras** an object or inserting a pre-grasped object. As a result the skill ('how' to solve the task) but not the actual goal of a complete… ▽ More

    Submitted 30 October, 2020; v1 submitted 29 October, 2020; originally announced October 2020.

  17. arXiv:2010.05545  [pdf, other

    cs.LG cs.AI stat.ML

    Local Search for Policy Iteration in Continuous Control

    Authors: Jost Tobias Springenberg, Nicolas Heess, Daniel Mankowitz, Josh Merel, Arunkumar Byravan, Abbas Abdolmaleki, Jackie Kay, Jonas Degrave, Julian Schrittwieser, Yuval Tassa, Jonas Buchli, Dan Belov, Martin Riedmiller

    Abstract: We present an algorithm for local, regularized, policy improvement in reinforcement learning (RL) that allows us to formulate model-based and model-free variants in a single framework. Our algorithm can be interpreted as a natural extension of work on KL-regularized RL and introduces a form of tree search for continuous action spaces. We demonstrate that additional computation spent on model-based… ▽ More

    Submitted 12 October, 2020; originally announced October 2020.

  18. arXiv:2007.15588  [pdf, other

    cs.LG cs.AI cs.RO stat.ML

    Data-efficient Hindsight Off-policy Option Learning

    Authors: Markus Wulfmeier, Dushyant Rao, Roland Hafner, Thomas Lampe, Abbas Abdolmaleki, Tim Hertweck, Michael Neunert, Dhruva Tirumala, Noah Siegel, Nicolas Heess, Martin Riedmiller

    Abstract: We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning algorithm. Given any trajectory, HO2 infers likely option choices and backpropagates through the dynamic programming inference procedure to robustly train all policy components off-policy and end-to-end. The approach outperforms existing option learning methods on common benchmarks. To better understand the option fr… ▽ More

    Submitted 15 June, 2021; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: Published at ICML2021

  19. arXiv:2006.00979  [pdf, other

    cs.LG cs.AI

    Acme: A Research Framework for Distributed Reinforcement Learning

    Authors: Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Stańczyk, Sabela Ramos, Anton Raichuk, Damien Vincent, Léonard Hussenot, Robert Dadashi, Gabriel Dulac-Arnold, Manu Orsini, Alexis Jacq, Johan Ferret, Nino Vieillard, Seyed Kamyar Seyed Ghasemipour, Sertan Girgin, Olivier Pietquin, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang , et al. (14 additional authors not shown)

    Abstract: Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce publishe… ▽ More

    Submitted 20 September, 2022; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme

  20. arXiv:2005.07513  [pdf, other

    cs.LG cs.AI cs.RO stat.ML

    A Distributional View on Multi-Objective Policy Optimization

    Authors: Abbas Abdolmaleki, Sandy H. Huang, Leonard Hasenclever, Michael Neunert, H. Francis Song, Martina Zambelli, Murilo F. Martins, Nicolas Heess, Raia Hadsell, Martin Riedmiller

    Abstract: Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for obj… ▽ More

    Submitted 15 May, 2020; originally announced May 2020.

  21. arXiv:2002.08396  [pdf, other

    cs.LG cs.RO stat.ML

    Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning

    Authors: Noah Y. Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, Martin Riedmiller

    Abstract: Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set (batch) of environment interactions is available and no new experience can be acquired. This property makes these algorithms appealing for real world problems such as robot control. In practice, however, standard off-policy algorithms fail in the batch setting for continuous control. In th… ▽ More

    Submitted 17 June, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    ACM Class: I.2.6; I.2.9

    Journal ref: ICLR 2020

  22. arXiv:2001.00449  [pdf, other

    cs.LG cs.RO stat.ML

    Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics

    Authors: Michael Neunert, Abbas Abdolmaleki, Markus Wulfmeier, Thomas Lampe, Jost Tobias Springenberg, Roland Hafner, Francesco Romano, Jonas Buchli, Nicolas Heess, Martin Riedmiller

    Abstract: Many real-world control problems involve both discrete decision variables - such as the choice of control modes, gear switching or digital outputs - as well as continuous decision variables - such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully continuous or… ▽ More

    Submitted 2 January, 2020; originally announced January 2020.

    Comments: Presented at the 3rd Conference on Robot Learning (CoRL 2019), Osaka, Japan. Video: https://youtu.be/eUqQDLQXb7I

  23. arXiv:1911.01831  [pdf, other

    cs.LG cs.NE

    Quinoa: a Q-function You Infer Normalized Over Actions

    Authors: Jonas Degrave, Abbas Abdolmaleki, Jost Tobias Springenberg, Nicolas Heess, Martin Riedmiller

    Abstract: We present an algorithm for learning an approximate action-value soft Q-function in the relative entropy regularised reinforcement learning setting, for which an optimal improved policy can be recovered in closed form. We use recent advances in normalising flows for parametrising the policy together with a learned value-function; and show how this combination can be used to implicitly represent Q-… ▽ More

    Submitted 5 November, 2019; originally announced November 2019.

    Comments: Deep RL Workshop/NeurIPS

  24. arXiv:1910.09471  [pdf, other

    cs.RO cs.LG

    Modelling Generalized Forces with Reinforcement Learning for Sim-to-Real Transfer

    Authors: Rae Jeong, Jackie Kay, Francesco Romano, Thomas Lampe, Tom Rothorl, Abbas Abdolmaleki, Tom Erez, Yuval Tassa, Francesco Nori

    Abstract: Learning robotic control policies in the real world gives rise to challenges in data efficiency, safety, and controlling the initial condition of the system. On the other hand, simulations are a useful alternative as they provide an abundant source of data without the restrictions of the real world. Unfortunately, simulations often fail to accurately model complex real-world phenomena. Traditional… ▽ More

    Submitted 21 October, 2019; originally announced October 2019.

  25. arXiv:1910.04142  [pdf, other

    cs.RO cs.AI cs.CV cs.LG cs.NE

    Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

    Authors: Arunkumar Byravan, Jost Tobias Springenberg, Abbas Abdolmaleki, Roland Hafner, Michael Neunert, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

    Abstract: Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper, we explore how model-based Reinforcement Learning (RL) can facilitate transfer to new tasks. We develop an algorithm that learns an action-conditional, predicti… ▽ More

    Submitted 9 October, 2019; originally announced October 2019.

    Comments: To appear at the 3rd annual Conference on Robot Learning, Osaka, Japan (CoRL 2019). 24 pages including appendix (main paper - 8 pages)

  26. arXiv:1910.00528  [pdf, other

    cs.LG cs.AI cs.RO stat.ML

    Augmenting learning using symmetry in a biologically-inspired domain

    Authors: Shruti Mishra, Abbas Abdolmaleki, Arthur Guez, Piotr Trochim, Doina Precup

    Abstract: Invariances to translation, rotation and other spatial transformations are a hallmark of the laws of motion, and have widespread use in the natural sciences to reduce the dimensionality of systems of equations. In supervised learning, such as in image classification tasks, rotation, translation and scale invariances are used to augment training datasets. In this work, we use data augmentation in a… ▽ More

    Submitted 1 October, 2019; originally announced October 2019.

  27. arXiv:1909.12238  [pdf, other

    cs.AI cs.LG

    V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control

    Authors: H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin Riedmiller, Matthew M. Botvinick

    Abstract: Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradie… ▽ More

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: * equal contribution

  28. arXiv:1906.11228  [pdf, other

    cs.LG cs.AI cs.RO stat.ML

    Compositional Transfer in Hierarchical Reinforcement Learning

    Authors: Markus Wulfmeier, Abbas Abdolmaleki, Roland Hafner, Jost Tobias Springenberg, Michael Neunert, Tim Hertweck, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

    Abstract: The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multip… ▽ More

    Submitted 19 May, 2020; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: Robotics Science and Systems 2020

  29. arXiv:1906.07516  [pdf, other

    cs.LG cs.AI stat.ML

    Robust Reinforcement Learning for Continuous Control with Model Misspecification

    Authors: Daniel J. Mankowitz, Nir Levine, Rae Jeong, Yuanyuan Shi, Jackie Kay, Abbas Abdolmaleki, Jost Tobias Springenberg, Timothy Mann, Todd Hester, Martin Riedmiller

    Abstract: We provide a framework for incorporating robustness -- to perturbations in the transition dynamics which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a… ▽ More

    Submitted 11 February, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

  30. arXiv:1902.04706  [pdf, other

    cs.LG cs.RO stat.ML

    Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup

    Authors: Devin Schwab, Tobias Springenberg, Murilo F. Martins, Thomas Lampe, Michael Neunert, Abbas Abdolmaleki, Tim Hertweck, Roland Hafner, Francesco Nori, Martin Riedmiller

    Abstract: We present a method for fast training of vision based control policies on real robots. The key idea behind our method is to perform multi-task Reinforcement Learning with auxiliary tasks that differ not only in the reward to be optimized but also in the state-space in which they operate. In particular, we allow auxiliary task policies to utilize task features that are available only at training-ti… ▽ More

    Submitted 18 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: Videos can be found at https://sites.google.com/view/rss-2019-sawyer-bic/

  31. arXiv:1902.04623  [pdf, other

    cs.RO

    Value constrained model-free continuous control

    Authors: Steven Bohez, Abbas Abdolmaleki, Michael Neunert, Jonas Buchli, Nicolas Heess, Raia Hadsell

    Abstract: The naive application of Reinforcement Learning algorithms to continuous control problems -- such as locomotion and manipulation -- often results in policies which rely on high-amplitude, high-frequency control signals, known colloquially as bang-bang control. Although such solutions may indeed maximize task reward, they can be unsuitable for real world systems. Bang-bang control may lead to incre… ▽ More

    Submitted 12 February, 2019; originally announced February 2019.

  32. arXiv:1812.02256  [pdf, other

    cs.LG stat.ML

    Relative Entropy Regularized Policy Iteration

    Authors: Abbas Abdolmaleki, Jost Tobias Springenberg, Jonas Degrave, Steven Bohez, Yuval Tassa, Dan Belov, Nicolas Heess, Martin Riedmiller

    Abstract: We present an off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with learned action-value function. The result is a simple procedure consisting of three steps: i) policy evaluation by estimating a parametric action-value function; ii) policy improvement via the estimation of a local non-parametric policy; and… ▽ More

    Submitted 5 December, 2018; originally announced December 2018.

  33. arXiv:1806.06920  [pdf, other

    cs.LG cs.AI cs.IT cs.RO stat.ML

    Maximum a Posteriori Policy Optimisation

    Authors: Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, Martin Riedmiller

    Abstract: We introduce a new algorithm for reinforcement learning called Maximum aposteriori Policy Optimisation (MPO) based on coordinate ascent on a relative entropy objective. We show that several existing methods can directly be related to our derivation. We develop two off-policy algorithms and demonstrate that they are competitive with the state-of-the-art in deep reinforcement learning. In particular… ▽ More

    Submitted 14 June, 2018; originally announced June 2018.

  34. arXiv:1801.00690  [pdf, other

    cs.AI

    DeepMind Control Suite

    Authors: Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, Martin Riedmiller

    Abstract: The DeepMind Control Suite is a set of continuous control tasks with a standardised structure and interpretable rewards, intended to serve as performance benchmarks for reinforcement learning agents. The tasks are written in Python and powered by the MuJoCo physics engine, making them easy to use and modify. We include benchmarks for several learning algorithms. The Control Suite is publicly avail… ▽ More

    Submitted 2 January, 2018; originally announced January 2018.

    Comments: 24 pages, 7 figures, 2 tables

  35. arXiv:1606.09197  [pdf, other

    cs.LG cs.RO

    Model-Free Trajectory-based Policy Optimization with Monotonic Improvement

    Authors: Riad Akrour, Abbas Abdolmaleki, Hany Abdulsamad, Jan Peters, Gerhard Neumann

    Abstract: Many of the recent trajectory optimization algorithms alternate between linear approximation of the system dynamics around the mean trajectory and conservative policy update. One way of constraining the policy change is by bounding the Kullback-Leibler (KL) divergence between successive policies. These approaches already demonstrated great experimental success in challenging problems such as end-t… ▽ More

    Submitted 2 July, 2018; v1 submitted 29 June, 2016; originally announced June 2016.