
Showing 1–9 of 9 results for author: Stadie, B C

  1. arXiv:2310.13914  [pdf, other]

    cs.RO

    Cold Diffusion on the Replay Buffer: Learning to Plan from Known Good States

    Authors: Zidan Wang, Takeru Oba, Takuma Yoneda, Rui Shen, Matthew Walter, Bradly C. Stadie

    Abstract: Learning from demonstrations (LfD) has successfully trained robots to exhibit remarkable generalization capabilities. However, many powerful imitation techniques do not prioritize the feasibility of the robot behaviors they generate. In this work, we explore the feasibility of plans produced by LfD. As in prior work, we employ a temporal diffusion model with fixed start and goal states to facilita…

    Submitted 21 October, 2023; originally announced October 2023.

  2. arXiv:2209.13046  [pdf, other]

    cs.LG cs.AI

    Understanding Hindsight Goal Relabeling from a Divergence Minimization Perspective

    Authors: Lunjun Zhang, Bradly C. Stadie

    Abstract: Hindsight goal relabeling has become a foundational technique in multi-goal reinforcement learning (RL). The essential idea is that any trajectory can be seen as a sub-optimal demonstration for reaching its final state. Intuitively, learning from those arbitrary demonstrations can be seen as a form of imitation learning (IL). However, the connection between hindsight goal relabeling and imitation…

    Submitted 30 January, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

  3. arXiv:2011.12491  [pdf, other]

    cs.AI

    World Model as a Graph: Learning Latent Landmarks for Planning

    Authors: Lunjun Zhang, Ge Yang, Bradly C. Stadie

    Abstract: Planning - the ability to analyze the structure of a problem in the large and decompose it into interrelated subproblems - is a hallmark of human intelligence. While deep reinforcement learning (RL) has shown great promise for solving relatively straightforward control tasks, it remains an open problem how to best incorporate planning into existing deep RL paradigms to handle increasingly complex…

    Submitted 30 June, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

    Journal ref: International Conference on Machine Learning (ICML). 2021

  4. arXiv:1808.07804  [pdf, other]

    stat.ML cs.AI cs.LG stat.AP

    Transfer Learning for Estimating Causal Effects using Neural Networks

    Authors: Sören R. Künzel, Bradly C. Stadie, Nikita Vemuri, Varsha Ramakrishnan, Jasjeet S. Sekhon, Pieter Abbeel

    Abstract: We develop new algorithms for estimating heterogeneous treatment effects, combining recent developments in transfer learning for neural networks with insights from the causal inference literature. By taking advantage of transfer learning, we are able to efficiently use different data sources that are related to the same underlying causal mechanisms. We compare our algorithms with those in the exta…

    Submitted 23 August, 2018; originally announced August 2018.

  5. arXiv:1803.01118  [pdf, other]

    cs.AI

    Some Considerations on Learning to Explore via Meta-Reinforcement Learning

    Authors: Bradly C. Stadie, Ge Yang, Rein Houthooft, Xi Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever

    Abstract: We consider the problem of exploration in meta reinforcement learning. Two new meta reinforcement learning algorithms are suggested: E-MAML and E-RL². Results are presented on a novel environment we call "Krazy World" and a set of maze environments. We show E-MAML and E-RL² deliver better performance on tasks where exploration is important.

    Submitted 11 January, 2019; v1 submitted 3 March, 2018; originally announced March 2018.

  6. arXiv:1802.04821  [pdf, other]

    cs.LG cs.AI

    Evolved Policy Gradients

    Authors: Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel

    Abstract: We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take into a…

    Submitted 29 April, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

  7. arXiv:1703.07326  [pdf, other]

    cs.AI cs.LG cs.NE cs.RO

    One-Shot Imitation Learning

    Authors: Yan Duan, Marcin Andrychowicz, Bradly C. Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, Wojciech Zaremba

    Abstract: Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering, or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few demonstrations of any given task, and instantly generalize to new situations of the same task, without requiring task-specific engineer…

    Submitted 4 December, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

  8. arXiv:1703.01703  [pdf, other]

    cs.LG

    Third-Person Imitation Learning

    Authors: Bradly C. Stadie, Pieter Abbeel, Ilya Sutskever

    Abstract: Reinforcement learning (RL) makes it possible to train agents capable of achieving sophisticated goals in complex and uncertain environments. A key difficulty in reinforcement learning is specifying a reward function for the agent to optimize. Traditionally, imitation learning in RL has been used to overcome this problem. Unfortunately, hitherto imitation learning methods tend to require that demo…

    Submitted 22 September, 2019; v1 submitted 5 March, 2017; originally announced March 2017.

    Comments: Only changed the abstract to remove unneeded hyphens

  9. arXiv:1507.00814  [pdf, other]

    cs.AI cs.LG stat.ML

    Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

    Authors: Bradly C. Stadie, Sergey Levine, Pieter Abbeel

    Abstract: Achieving efficient and scalable exploration in complex domains poses a major challenge in reinforcement learning. While Bayesian and PAC-MDP approaches to the exploration problem offer strong formal guarantees, they are often impractical in higher dimensions due to their reliance on enumerating the state-action space. Hence, exploration in complex domains is often performed with simple epsilon-gr…

    Submitted 19 November, 2015; v1 submitted 3 July, 2015; originally announced July 2015.