Showing 1–43 of 43 results for author: Springenberg, T

  1. arXiv:2402.05546 [pdf, other]

    cs.LG cs.AI cs.RO

    Offline Actor-Critic Reinforcement Learning Scales to Large Models

    Authors: Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, Martin Riedmiller

    Abstract: We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows similar scaling laws as supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We…

    Submitted 8 February, 2024; originally announced February 2024.

  2. arXiv:2401.08525 [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    GATS: Gather-Attend-Scatter

    Authors: Konrad Zolna, Serkan Cabi, Yutian Chen, Eric Lau, Claudio Fantacci, Jurgis Pasukonis, Jost Tobias Springenberg, Sergio Gomez Colmenarejo

    Abstract: As the AI community increasingly adopts large-scale models, it is crucial to develop general and flexible tools to integrate them. We introduce Gather-Attend-Scatter (GATS), a novel module that enables seamless combination of pretrained foundation models, both trainable and frozen, into larger multimodal networks. GATS empowers AI systems to process and generate information across multiple modalit…

    Submitted 16 January, 2024; originally announced January 2024.

  3. arXiv:2312.11374 [pdf, other]

    cs.RO

    Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots

    Authors: Thomas Lampe, Abbas Abdolmaleki, Sarah Bechtle, Sandy H. Huang, Jost Tobias Springenberg, Michael Bloesch, Oliver Groth, Roland Hafner, Tim Hertweck, Michael Neunert, Markus Wulfmeier, Jingwei Zhang, Francesco Nori, Nicolas Heess, Martin Riedmiller

    Abstract: Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly efficient through re-using previously collected sub-optimal data. In this paper we demonstrate how the increased understanding of off-policy learning methods and…

    Submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2306.11706 [pdf, other]

    cs.RO cs.LG

    RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation

    Authors: Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed, Sergio Gómez Colmenarejo, Jon Scholz, et al. (14 additional authors not shown)

    Abstract: The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned de…

    Submitted 22 December, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Transactions on Machine Learning Research (12/2023)

  5. arXiv:2305.10912 [pdf, other]

    cs.AI cs.RO

    A Generalist Dynamics Model for Control

    Authors: Ingmar Schubert, Jingwei Zhang, Jake Bruce, Sarah Bechtle, Emilio Parisotto, Martin Riedmiller, Jost Tobias Springenberg, Arunkumar Byravan, Leonard Hasenclever, Nicolas Heess

    Abstract: We investigate the use of transformer sequence models as dynamics models (TDMs) for control. We find that TDMs exhibit strong generalization capabilities to unseen environments, both in a few-shot setting, where a generalist TDM is fine-tuned with small amounts of data from the target environment, and in a zero-shot setting, where a generalist TDM is applied to an unseen environment without any fu…

    Submitted 23 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

  6. arXiv:2302.12617 [pdf, other]

    cs.RO cs.AI cs.LG

    Leveraging Jumpy Models for Planning and Fast Learning in Robotic Domains

    Authors: Jingwei Zhang, Jost Tobias Springenberg, Arunkumar Byravan, Leonard Hasenclever, Abbas Abdolmaleki, Dushyant Rao, Nicolas Heess, Martin Riedmiller

    Abstract: In this paper we study the problem of learning multi-step dynamics prediction models (jumpy models) from unlabeled experience and their utility for fast inference of (high-level) plans in downstream tasks. In particular we propose to learn a jumpy model alongside a skill embedding space offline, from previously collected experience for which no labels or reward annotations are required. We then in…

    Submitted 24 February, 2023; originally announced February 2023.

  7. arXiv:2205.06175 [pdf, other]

    cs.AI cs.CL cs.LG cs.RO

    A Generalist Agent

    Authors: Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas

    Abstract: Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, dec…

    Submitted 11 November, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: Published at TMLR, 42 pages

    Journal ref: Transactions on Machine Learning Research, 11/2022, https://openreview.net/forum?id=1ikK0kHjvj

  8. arXiv:2205.03353 [pdf, other]

    cs.RO cs.LG

    How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation

    Authors: Alex X. Lee, Coline Devin, Jost Tobias Springenberg, Yuxiang Zhou, Thomas Lampe, Abbas Abdolmaleki, Konstantinos Bousmalis

    Abstract: Reinforcement learning (RL) has been shown to be effective at learning control from experience. However, RL typically requires a large amount of online interaction with the environment. This limits its applicability to real-world settings, such as in robotics, where such interaction is expensive. In this work we investigate ways to minimize online interactions in a target task, by reusing a subopt…

    Submitted 6 May, 2022; originally announced May 2022.

  9. arXiv:2204.10256 [pdf, other]

    cs.LG cs.AI

    Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

    Authors: Bobak Shahriari, Abbas Abdolmaleki, Arunkumar Byravan, Abe Friesen, Siqi Liu, Jost Tobias Springenberg, Nicolas Heess, Matt Hoffman, Martin Riedmiller

    Abstract: Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks. Examples of this behavior include the D4PG and DMPO algorithms as compared to DDPG and MPO, respectively [Barth-Maron et al., 2018; Hoffman et al., 2020]. However, both agents rely on the C51 critic for value est…

    Submitted 22 April, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

  10. arXiv:2110.06192 [pdf, other]

    cs.RO cs.LG

    Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes

    Authors: Alex X. Lee, Coline Devin, Yuxiang Zhou, Thomas Lampe, Konstantinos Bousmalis, Jost Tobias Springenberg, Arunkumar Byravan, Abbas Abdolmaleki, Nimrod Gileadi, David Khosid, Claudio Fantacci, Jose Enrique Chen, Akhil Raju, Rae Jeong, Michael Neunert, Antoine Laurens, Stefano Saliceti, Federico Casarini, Martin Riedmiller, Raia Hadsell, Francesco Nori

    Abstract: We study the problem of robotic stacking with objects of complex geometry. We propose a challenging and diverse set of such objects that was carefully designed to require strategies beyond a simple "pick-and-place" solution. Our method is a reinforcement learning (RL) approach combined with vision-based interactive policy distillation and simulation-to-reality transfer. Our learned policies can ef…

    Submitted 3 November, 2021; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: CoRL 2021. Video: https://dpmd.ai/robotics-stacking-YT . Blog: https://dpmd.ai/robotics-stacking . Code: https://github.com/deepmind/rgb_stacking

  11. arXiv:2110.03363 [pdf, other]

    cs.RO cs.AI cs.LG

    Evaluating model-based planning and planner amortization for continuous control

    Authors: Arunkumar Byravan, Leonard Hasenclever, Piotr Trochim, Mehdi Mirza, Alessandro Davide Ialongo, Yuval Tassa, Jost Tobias Springenberg, Abbas Abdolmaleki, Nicolas Heess, Josh Merel, Martin Riedmiller

    Abstract: There is a widespread intuition that model-based control methods should be able to surpass the data efficiency of model-free approaches. In this paper we attempt to evaluate this intuition on various challenging locomotion tasks. We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning; the learned policy serves as a proposal for MPC.…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: 9 pages main text, 30 pages with references and appendix including several ablations and additional experiments. Submitted to ICLR 2022

  12. arXiv:2108.10273 [pdf, other]

    cs.LG

    Collect & Infer -- a fresh look at data-efficient Reinforcement Learning

    Authors: Martin Riedmiller, Jost Tobias Springenberg, Roland Hafner, Nicolas Heess

    Abstract: This position paper proposes a fresh look at Reinforcement Learning (RL) from the perspective of data-efficiency. Data-efficient RL has gone through three major stages: pure on-line RL where every data-point is considered only once, RL with a replay buffer where additional learning is done on a portion of the experience, and finally transition memory based RL, where, conceptually, all transitions…

    Submitted 23 August, 2021; originally announced August 2021.

  13. arXiv:2106.08199 [pdf, other]

    cs.LG cs.RO

    On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning

    Authors: Abbas Abdolmaleki, Sandy H. Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva TB, Arunkumar Byravan, Konstantinos Bousmalis, Andras Gyorgy, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, Martin Riedmiller

    Abstract: Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step. This includes ideas as far ranging as exploration bonuses, entropy regularization, and regularization toward teachers or data priors. Often, the task reward and au…

    Submitted 1 August, 2023; v1 submitted 15 June, 2021; originally announced June 2021.

  14. arXiv:2101.09458 [pdf, other]

    cs.LG

    Decoupled Exploration and Exploitation Policies for Sample-Efficient Reinforcement Learning

    Authors: William F. Whitney, Michael Bloesch, Jost Tobias Springenberg, Abbas Abdolmaleki, Kyunghyun Cho, Martin Riedmiller

    Abstract: Despite the close connection between exploration and sample efficiency, most state of the art reinforcement learning algorithms include no considerations for exploration beyond maximizing the entropy of the policy. In this work we address this seeming missed opportunity. We observe that the most common formulation of directed exploration in deep RL, known as bonus-based exploration (BBE), suffers…

    Submitted 1 July, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

  15. arXiv:2010.15040 [pdf, other]

    stat.ML cs.LG

    Training Generative Adversarial Networks by Solving Ordinary Differential Equations

    Authors: Chongli Qin, Yan Wu, Jost Tobias Springenberg, Andrew Brock, Jeff Donahue, Timothy P. Lillicrap, Pushmeet Kohli

    Abstract: The instability of Generative Adversarial Network (GAN) training has frequently been attributed to gradient descent. Consequently, recent methods have aimed to tailor the models and training procedures to stabilise the discrete updates. In contrast, we study the continuous-time dynamics induced by GAN training. Both theory and toy experiments suggest that these dynamics are in fact surprisingly st…

    Submitted 28 November, 2020; v1 submitted 28 October, 2020; originally announced October 2020.

  16. arXiv:2010.08587 [pdf, other]

    cs.RO cs.AI

    Learning Dexterous Manipulation from Suboptimal Experts

    Authors: Rae Jeong, Jost Tobias Springenberg, Jackie Kay, Daniel Zheng, Yuxiang Zhou, Alexandre Galashov, Nicolas Heess, Francesco Nori

    Abstract: Learning dexterous manipulation in high-dimensional state-action spaces is an important open challenge with exploration presenting a major bottleneck. Although in many cases the learning process could be guided by demonstrations or other suboptimal experts, current RL algorithms for continuous action spaces often fail to effectively utilize combinations of highly off-policy expert data and on-poli…

    Submitted 5 January, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

  17. arXiv:2010.05545 [pdf, other]

    cs.LG cs.AI stat.ML

    Local Search for Policy Iteration in Continuous Control

    Authors: Jost Tobias Springenberg, Nicolas Heess, Daniel Mankowitz, Josh Merel, Arunkumar Byravan, Abbas Abdolmaleki, Jackie Kay, Jonas Degrave, Julian Schrittwieser, Yuval Tassa, Jonas Buchli, Dan Belov, Martin Riedmiller

    Abstract: We present an algorithm for local, regularized, policy improvement in reinforcement learning (RL) that allows us to formulate model-based and model-free variants in a single framework. Our algorithm can be interpreted as a natural extension of work on KL-regularized RL and introduces a form of tree search for continuous action spaces. We demonstrate that additional computation spent on model-based…

    Submitted 12 October, 2020; originally announced October 2020.

  18. arXiv:2006.15134 [pdf, other]

    cs.LG cs.AI stat.ML

    Critic Regularized Regression

    Authors: Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

    Abstract: Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges with regard to the cost of data collection and safety, both of which are particularly pertinent to real-world applications of RL. Unfortunately, most off-policy algorithms perform poorly when learnin…

    Submitted 22 September, 2021; v1 submitted 26 June, 2020; originally announced June 2020.

    Comments: 24 pages; presented at NeurIPS 2020

  19. arXiv:2005.07541 [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Simple Sensor Intentions for Exploration

    Authors: Tim Hertweck, Martin Riedmiller, Michael Bloesch, Jost Tobias Springenberg, Noah Siegel, Markus Wulfmeier, Roland Hafner, Nicolas Heess

    Abstract: Modern reinforcement learning algorithms can learn solutions to increasingly difficult control problems while at the same time reducing the amount of prior knowledge needed for their application. One of the remaining challenges is the definition of reward schemes that appropriately facilitate exploration without biasing the solution in undesirable ways, and that can be implemented on real robotic sy…

    Submitted 15 May, 2020; originally announced May 2020.

  20. arXiv:2002.08396 [pdf, other]

    cs.LG cs.RO stat.ML

    Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning

    Authors: Noah Y. Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, Martin Riedmiller

    Abstract: Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set (batch) of environment interactions is available and no new experience can be acquired. This property makes these algorithms appealing for real world problems such as robot control. In practice, however, standard off-policy algorithms fail in the batch setting for continuous control. In th…

    Submitted 17 June, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    ACM Class: I.2.6; I.2.9

    Journal ref: ICLR 2020

  21. arXiv:2001.00449 [pdf, other]

    cs.LG cs.RO stat.ML

    Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics

    Authors: Michael Neunert, Abbas Abdolmaleki, Markus Wulfmeier, Thomas Lampe, Jost Tobias Springenberg, Roland Hafner, Francesco Romano, Jonas Buchli, Nicolas Heess, Martin Riedmiller

    Abstract: Many real-world control problems involve both discrete decision variables - such as the choice of control modes, gear switching or digital outputs - as well as continuous decision variables - such as velocity setpoints, control gains or analogue outputs. However, when defining the corresponding optimal control or reinforcement learning problem, it is commonly approximated with fully continuous or…

    Submitted 2 January, 2020; originally announced January 2020.

    Comments: Presented at the 3rd Conference on Robot Learning (CoRL 2019), Osaka, Japan. Video: https://youtu.be/eUqQDLQXb7I

  22. arXiv:1911.01831 [pdf, other]

    cs.LG cs.NE

    Quinoa: a Q-function You Infer Normalized Over Actions

    Authors: Jonas Degrave, Abbas Abdolmaleki, Jost Tobias Springenberg, Nicolas Heess, Martin Riedmiller

    Abstract: We present an algorithm for learning an approximate action-value soft Q-function in the relative entropy regularised reinforcement learning setting, for which an optimal improved policy can be recovered in closed form. We use recent advances in normalising flows for parametrising the policy together with a learned value-function; and show how this combination can be used to implicitly represent Q-…

    Submitted 5 November, 2019; originally announced November 2019.

    Comments: Deep RL Workshop/NeurIPS

  23. arXiv:1910.04142 [pdf, other]

    cs.RO cs.AI cs.CV cs.LG cs.NE

    Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

    Authors: Arunkumar Byravan, Jost Tobias Springenberg, Abbas Abdolmaleki, Roland Hafner, Michael Neunert, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

    Abstract: Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper, we explore how model-based Reinforcement Learning (RL) can facilitate transfer to new tasks. We develop an algorithm that learns an action-conditional, predicti…

    Submitted 9 October, 2019; originally announced October 2019.

    Comments: To appear at the 3rd annual Conference on Robot Learning, Osaka, Japan (CoRL 2019). 24 pages including appendix (main paper - 8 pages)

  24. arXiv:1909.12238 [pdf, other]

    cs.AI cs.LG

    V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control

    Authors: H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin Riedmiller, Matthew M. Botvinick

    Abstract: Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradie…

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: * equal contribution

  25. arXiv:1906.11228 [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Compositional Transfer in Hierarchical Reinforcement Learning

    Authors: Markus Wulfmeier, Abbas Abdolmaleki, Roland Hafner, Jost Tobias Springenberg, Michael Neunert, Tim Hertweck, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

    Abstract: The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multip…

    Submitted 19 May, 2020; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: Robotics Science and Systems 2020

  26. arXiv:1906.07516 [pdf, other]

    cs.LG cs.AI stat.ML

    Robust Reinforcement Learning for Continuous Control with Model Misspecification

    Authors: Daniel J. Mankowitz, Nir Levine, Rae Jeong, Yuanyuan Shi, Jackie Kay, Abbas Abdolmaleki, Jost Tobias Springenberg, Timothy Mann, Todd Hester, Martin Riedmiller

    Abstract: We provide a framework for incorporating robustness -- to perturbations in the transition dynamics which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a…

    Submitted 11 February, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

  27. arXiv:1902.04706 [pdf, other]

    cs.LG cs.RO stat.ML

    Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup

    Authors: Devin Schwab, Tobias Springenberg, Murilo F. Martins, Thomas Lampe, Michael Neunert, Abbas Abdolmaleki, Tim Hertweck, Roland Hafner, Francesco Nori, Martin Riedmiller

    Abstract: We present a method for fast training of vision based control policies on real robots. The key idea behind our method is to perform multi-task Reinforcement Learning with auxiliary tasks that differ not only in the reward to be optimized but also in the state-space in which they operate. In particular, we allow auxiliary task policies to utilize task features that are available only at training-ti…

    Submitted 18 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: Videos can be found at https://sites.google.com/view/rss-2019-sawyer-bic/

  28. arXiv:1901.00943 [pdf, other]

    cs.LG cs.AI cs.NE cs.RO

    Self-supervised Learning of Image Embedding for Continuous Control

    Authors: Carlos Florensa, Jonas Degrave, Nicolas Heess, Jost Tobias Springenberg, Martin Riedmiller

    Abstract: Operating directly from raw high dimensional sensory inputs like images is still a challenge for robotic control. Recently, Reinforcement Learning methods have been proposed to solve specific tasks end-to-end, from pixels to torques. However, these approaches assume the access to a specified reward which may require specialized instrumentation of the environment. Furthermore, the obtained policy a…

    Submitted 3 January, 2019; originally announced January 2019.

    Comments: Contributed talk at Inference to Control workshop at NeurIPS2018

  29. arXiv:1812.02256 [pdf, other]

    cs.LG stat.ML

    Relative Entropy Regularized Policy Iteration

    Authors: Abbas Abdolmaleki, Jost Tobias Springenberg, Jonas Degrave, Steven Bohez, Yuval Tassa, Dan Belov, Nicolas Heess, Martin Riedmiller

    Abstract: We present an off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with learned action-value function. The result is a simple procedure consisting of three steps: i) policy evaluation by estimating a parametric action-value function; ii) policy improvement via the estimation of a local non-parametric policy; and…

    Submitted 5 December, 2018; originally announced December 2018.

  30. arXiv:1806.06920 [pdf, other]

    cs.LG cs.AI cs.IT cs.RO stat.ML

    Maximum a Posteriori Policy Optimisation

    Authors: Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, Martin Riedmiller

    Abstract: We introduce a new algorithm for reinforcement learning called Maximum a posteriori Policy Optimisation (MPO) based on coordinate ascent on a relative entropy objective. We show that several existing methods can directly be related to our derivation. We develop two off-policy algorithms and demonstrate that they are competitive with the state-of-the-art in deep reinforcement learning. In particular…

    Submitted 14 June, 2018; originally announced June 2018.

  31. arXiv:1806.01242 [pdf, other]

    cs.LG cs.AI stat.ML

    Graph networks as learnable physics engines for inference and control

    Authors: Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, Peter Battaglia

    Abstract: Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model. Here we introduce a new class of learnable models--based on graph networks--which implement an inductive bias for object- and relation-centric representations of complex, dynamical sys…

    Submitted 4 June, 2018; originally announced June 2018.

    Comments: ICML 2018

  32. arXiv:1802.10567 [pdf, other]

    cs.LG cs.RO stat.ML

    Learning by Playing - Solving Sparse Reward Tasks from Scratch

    Authors: Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tom Van de Wiele, Volodymyr Mnih, Nicolas Heess, Jost Tobias Springenberg

    Abstract: We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL). SAC-X enables learning of complex behaviors - from scratch - in the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of general auxiliary tasks, that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is t…

    Submitted 28 February, 2018; originally announced February 2018.

    Comments: A video of the rich set of learned behaviours can be found at https://youtu.be/mPKyvocNe_M

  33. Deep learning with convolutional neural networks for EEG decoding and visualization

    Authors: Robin Tibor Schirrmeister, Jost Tobias Springenberg, Lukas Dominique Josef Fiederer, Martin Glasstetter, Katharina Eggensperger, Michael Tangermann, Frank Hutter, Wolfram Burgard, Tonio Ball

    Abstract: PLEASE READ AND CITE THE REVISED VERSION at Human Brain Mapping: http://onlinelibrary.wiley.com/doi/10.1002/hbm.23730/full Code available here: https://github.com/robintibor/braindecode

    Submitted 8 June, 2018; v1 submitted 15 March, 2017; originally announced March 2017.

    Comments: A revised manuscript (with the new title) has been accepted at Human Brain Mapping, see http://onlinelibrary.wiley.com/doi/10.1002/hbm.23730/full

    ACM Class: I.2.6

  34. arXiv:1612.05533 [pdf, other]

    cs.RO cs.AI cs.LG

    Deep Reinforcement Learning with Successor Features for Navigation across Similar Environments

    Authors: Jingwei Zhang, Jost Tobias Springenberg, Joschka Boedecker, Wolfram Burgard

    Abstract: In this paper we consider the problem of robot navigation in simple maze-like environments where the robot has to rely on its onboard sensors to perform the navigation task. In particular, we are interested in solutions to this problem that do not require localization, mapping or planning. Additionally, we require that our solution can quickly adapt to new situations (e.g., changing navigation goa…

    Submitted 23 July, 2017; v1 submitted 16 December, 2016; originally announced December 2016.

    Comments: Camera ready version for IROS 2017

  35. arXiv:1612.00767 [pdf, other]

    stat.ML cs.AI cs.LG

    Asynchronous Stochastic Gradient MCMC with Elastic Coupling

    Authors: Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, Frank Hutter

    Abstract: We consider parallel asynchronous Markov Chain Monte Carlo (MCMC) sampling for problems where we can leverage (stochastic) gradients to define continuous dynamics which explore the target distribution. We outline a solution strategy for this setting based on stochastic gradient Hamiltonian Monte Carlo sampling (SGHMC) which we alter to include an elastic coupling term that ties together multiple M…

    Submitted 8 December, 2016; v1 submitted 2 December, 2016; originally announced December 2016.

  36. arXiv:1511.06390 [pdf, other]

    stat.ML cs.LG

    Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks

    Authors: Jost Tobias Springenberg

    Abstract: In this paper we present a method for learning a discriminative classifier from unlabeled or partially labeled data. Our approach is based on an objective function that trades-off mutual information between observed examples and their predicted categorical class distribution, against robustness of the classifier to an adversarial generative model. The resulting algorithm can either be interpreted…

    Submitted 30 April, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

  37. arXiv:1507.06821 [pdf, other]

    cs.CV cs.LG cs.NE cs.RO

    Multimodal Deep Learning for Robust RGB-D Object Recognition

    Authors: Andreas Eitel, Jost Tobias Springenberg, Luciano Spinello, Martin Riedmiller, Wolfram Burgard

    Abstract: Robust object recognition is a crucial ingredient of many, if not all, real-world robotics applications. This paper leverages recent progress on Convolutional Neural Networks (CNNs) and proposes a novel RGB-D architecture for object recognition. Our architecture is composed of two separate CNN processing streams - one for each modality - which are consecutively combined with a late fusion network.…

    Submitted 18 August, 2015; v1 submitted 24 July, 2015; originally announced July 2015.

    Comments: Final version submitted to IROS'2015, results unchanged, reformulation of some text passages in abstract and introduction

  38. arXiv:1506.07365 [pdf, other]

    cs.LG cs.CV stat.ML

    Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images

    Authors: Manuel Watter, Jost Tobias Springenberg, Joschka Boedecker, Martin Riedmiller

    Abstract: We introduce Embed to Control (E2C), a method for model learning and control of non-linear dynamical systems from raw pixel images. E2C consists of a deep generative model, belonging to the family of variational autoencoders, that learns to generate image trajectories from a latent space in which the dynamics is constrained to be locally linear. Our model is derived directly from an optimal contro…

    Submitted 20 November, 2015; v1 submitted 24 June, 2015; originally announced June 2015.

    Comments: Final NIPS version

  39. arXiv:1412.6806  [pdf, other]

    cs.LG cs.CV cs.NE

    Striving for Simplicity: The All Convolutional Net

    Authors: Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller

    Abstract: Most modern convolutional neural networks (CNNs) used for object recognition are built using the same principles: Alternating convolution and max-pooling layers followed by a small number of fully connected layers. We re-evaluate the state of the art for object recognition from small images with convolutional networks, questioning the necessity of different components in the pipeline. We find that…

    Submitted 13 April, 2015; v1 submitted 21 December, 2014; originally announced December 2014.

    Comments: accepted to ICLR-2015 workshop track; no changes other than style

  40. arXiv:1411.5928  [pdf, other]

    cs.CV cs.LG cs.NE

    Learning to Generate Chairs, Tables and Cars with Convolutional Networks

    Authors: Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, Thomas Brox

    Abstract: We train generative 'up-convolutional' neural networks which are able to generate images of objects given object style, viewpoint, and color. We train the networks on rendered 3D models of chairs, tables, and cars. Our experiments show that the networks do not merely learn all images by heart, but rather find a meaningful representation of 3D models allowing them to assess the similarity of differ…

    Submitted 2 August, 2017; v1 submitted 21 November, 2014; originally announced November 2014.

    Comments: v4: final PAMI version. New architecture figure

  41. arXiv:1406.6909  [pdf, other]

    cs.LG cs.CV cs.NE

    Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks

    Authors: Alexey Dosovitskiy, Philipp Fischer, Jost Tobias Springenberg, Martin Riedmiller, Thomas Brox

    Abstract: Deep convolutional networks have proven to be very successful in learning task specific features that allow for unprecedented performance on various computer vision tasks. Training of such networks follows mostly the supervised learning paradigm, where sufficiently many input-output pairs are required for training. Acquisition of large training sets is one of the key challenges, when approaching a…

    Submitted 19 June, 2015; v1 submitted 26 June, 2014; originally announced June 2014.

    Comments: PAMI submission. Includes matching experiments as in arXiv:1405.5769v1. Also includes new network architectures, experiments on Caltech-256, experiment on combining Exemplar-CNN with clustering

  42. arXiv:1312.6116  [pdf, other]

    stat.ML cs.LG cs.NE

    Improving Deep Neural Networks with Probabilistic Maxout Units

    Authors: Jost Tobias Springenberg, Martin Riedmiller

    Abstract: We present a probabilistic variant of the recently introduced maxout unit. The success of deep neural networks utilizing maxout can partly be attributed to favorable performance under dropout, when compared to rectified linear units. It however also depends on the fact that each maxout unit performs a pooling operation over a group of linear transformations and is thus partially invariant to chang…

    Submitted 19 February, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

  43. arXiv:1312.5242  [pdf, other]

    cs.CV cs.LG cs.NE

    Unsupervised feature learning by augmenting single images

    Authors: Alexey Dosovitskiy, Jost Tobias Springenberg, Thomas Brox

    Abstract: When deep learning is applied to visual object recognition, data augmentation is often used to generate additional training data without extra labeling cost. It helps to reduce overfitting and increase the performance of the algorithm. In this paper we investigate if it is possible to use data augmentation as the main component of an unsupervised feature learning architecture. To that end we sampl…

    Submitted 16 February, 2014; v1 submitted 18 December, 2013; originally announced December 2013.

    Comments: ICLR 2014 workshop track submission (7 pages, 4 figures, 1 table)