
Showing 1–6 of 6 results for author: Walter, M R

Searching in archive stat.
  1. arXiv:2310.01737  [pdf, other]

    cs.LG cs.AI stat.ML

    Blending Imitation and Reinforcement Learning for Robust Policy Improvement

    Authors: Xuefeng Liu, Takuma Yoneda, Rick L. Stevens, Matthew R. Walter, Yuxin Chen

    Abstract: While reinforcement learning (RL) has shown promising performance, its sample complexity continues to be a substantial hurdle, restricting its broader application across a variety of domains. Imitation learning (IL) utilizes oracles to improve sample efficiency, yet it is often constrained by the quality of the oracles deployed. … which actively interleaves between IL and RL based on an online estim…

    Submitted 4 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

  2. arXiv:2306.10259  [pdf, other]

    cs.LG cs.AI stat.ML

    Active Policy Improvement from Multiple Black-box Oracles

    Authors: Xuefeng Liu, Takuma Yoneda, Chaoqi Wang, Matthew R. Walter, Yuxin Chen

    Abstract: Reinforcement learning (RL) has made significant strides in various complex domains. However, identifying an effective policy via RL often necessitates extensive exploration. Imitation learning aims to mitigate this issue by using expert demonstrations to guide exploration. In real-world scenarios, one often has access to multiple suboptimal black-box experts, rather than a single optimal oracle.…

    Submitted 5 July, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

  3. arXiv:2109.10957  [pdf, other]

    cs.RO stat.AP

    Real Robot Challenge: A Robotics Competition in the Cloud

    Authors: Stefan Bauer, Felix Widmaier, Manuel Wüthrich, Annika Buchholz, Sebastian Stark, Anirudh Goyal, Thomas Steinbrenner, Joel Akpo, Shruti Joshi, Vincent Berenz, Vaibhav Agrawal, Niklas Funk, Julen Urain De Jesus, Jan Peters, Joe Watson, Claire Chen, Krishnan Srinivasan, Junwu Zhang, Jeffrey Zhang, Matthew R. Walter, Rishabh Madan, Charles Schaff, Takahiro Maeda, Takuma Yoneda, Denis Yarats, et al. (17 additional authors not shown)

    Abstract: Dexterous manipulation remains an open problem in robotics. To coordinate efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at MPI for Intelligent Systems and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users are able…

    Submitted 10 June, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

  4. arXiv:2008.01205  [pdf, other]

    cs.LG cs.RO stat.ML

    Concurrent Training Improves the Performance of Behavioral Cloning from Observation

    Authors: Zachary W. Robertson, Matthew R. Walter

    Abstract: Learning from demonstration is widely used as an efficient way for robots to acquire new skills. However, it typically requires that demonstrations provide full access to the state and action sequences. In contrast, learning from observation offers a way to utilize unlabeled demonstrations (e.g., video) to perform imitation learning. One approach to this is behavioral cloning from observation (BCO…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: 13 pages, 2 figures, Submitted to the 4th Conference on Robot Learning (CoRL 2020)

  5. arXiv:2002.06299  [pdf, other]

    cs.LG stat.ML

    Loop Estimator for Discounted Values in Markov Reward Processes

    Authors: Falcon Z. Dai, Matthew R. Walter

    Abstract: At the working heart of policy iteration algorithms commonly used and studied in the discounted setting of reinforcement learning, the policy evaluation step estimates the value of states with samples from a Markov reward process induced by following a Markov policy in a Markov decision process. We propose a simple and efficient estimator called loop estimator that exploits the regenerative struct…

    Submitted 3 March, 2021; v1 submitted 14 February, 2020; originally announced February 2020.

    Comments: accepted to AAAI 2021

  6. arXiv:1907.02114  [pdf, ps, other]

    cs.LG stat.ML

    Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

    Authors: Falcon Z. Dai, Matthew R. Walter

    Abstract: We propose a new complexity measure for Markov decision processes (MDPs), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that this parameter replaces diameter in the upper bound on the optimal value span of an extended MDP, thus refining the associated upper bounds on the regret of sever…

    Submitted 4 November, 2019; v1 submitted 3 July, 2019; originally announced July 2019.

    Comments: Minor post-review revision. Main paper with appendix. To appear at NeurIPS 2019