
Showing 1–16 of 16 results for author: Osa, T

  1. arXiv:2406.05993  [pdf, other]

    cs.LG stat.ML

    Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning

    Authors: Takayuki Osa, Tatsuya Harada

    Abstract: Recent studies on online reinforcement learning (RL) have demonstrated the advantages of learning multiple behaviors from a single task, as in the case of few-shot adaptation to a new environment. Although this approach is expected to yield similar benefits in offline RL, appropriate methods for learning multiple solutions have not been fully investigated in previous studies. In this study, we the…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: ICML 2024, 21 pages

  2. arXiv:2406.04896  [pdf, other]

    cs.LG cs.AI

    Stabilizing Extreme Q-learning by Maclaurin Expansion

    Authors: Motoki Omura, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada

    Abstract: In Extreme Q-learning (XQL), Gumbel Regression is performed with an assumed Gumbel distribution for the error distribution. This allows learning of the value function without sampling out-of-distribution actions and has shown excellent performance mainly in Offline RL. However, issues remained, including the exponential term in the loss function causing instability and the potential for an error d…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted at RLC 2024: The first Reinforcement Learning Conference

  3. arXiv:2405.14114  [pdf, other]

    cs.LG cs.AI

    Offline Reinforcement Learning from Datasets with Structured Non-Stationarity

    Authors: Johannes Ackermann, Takayuki Osa, Masashi Sugiyama

    Abstract: Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy. Offline RL aims to solve this issue by using transitions collected by a different behavior policy. We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within ea…

    Submitted 27 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted for Reinforcement Learning Conference (RLC) 2024

  4. Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning

    Authors: Motoki Omura, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada

    Abstract: In deep reinforcement learning, estimating the value function to evaluate the quality of states and actions is essential. The value function is often trained using the least squares method, which implicitly assumes a Gaussian error distribution. However, a recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman ope…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted at AAAI 2024: The 38th Annual AAAI Conference on Artificial Intelligence (Main Tech Track)

  5. arXiv:2403.00344  [pdf, other]

    cs.RO cs.LG

    Robustifying a Policy in Multi-Agent RL with Diverse Cooperative Behaviors and Adversarial Style Sampling for Assistive Tasks

    Authors: Takayuki Osa, Tatsuya Harada

    Abstract: Autonomous assistance of people with motor impairments is one of the most promising applications of autonomous robotic systems. Recent studies have reported encouraging results using deep reinforcement learning (RL) in the healthcare domain. Previous studies showed that assistive tasks can be formulated as multi-agent RL, wherein there are two agents: a caregiver and a care-receiver. However, poli…

    Submitted 1 April, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: 7 pages, accepted for ICRA 2024

  6. arXiv:2310.08864  [pdf, other]

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  7. arXiv:2107.05842  [pdf, other]

    cs.RO cs.LG

    Motion Planning by Learning the Solution Manifold in Trajectory Optimization

    Authors: Takayuki Osa

    Abstract: The objective function used in trajectory optimization is often non-convex and can have an infinite set of local optima. In such cases, there are diverse solutions to perform a given task. Although there are a few methods to find multiple solutions for motion planning, they are limited to generating a finite set of solutions. To address this issue, we present an optimization method that learns an…

    Submitted 13 July, 2021; originally announced July 2021.

    Comments: 24 pages, to appear in the International Journal of Robotics Research

  8. arXiv:2103.07084  [pdf, other]

    stat.ML cs.AI cs.LG

    Discovering Diverse Solutions in Deep Reinforcement Learning by Maximizing State-Action-Based Mutual Information

    Authors: Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama

    Abstract: Reinforcement learning algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions often exist. Recent studies showed that learning a set of diverse solutions is beneficial because diversity enables robust few-shot adaptation. Although existing methods learn diverse solutions by using the mutual information as unsupervised rewards, such an app…

    Submitted 12 April, 2022; v1 submitted 11 March, 2021; originally announced March 2021.

    Comments: 35 pages

  9. arXiv:2007.12397  [pdf, other]

    cs.RO cs.AI cs.LG

    Learning the Solution Manifold in Optimization and Its Application in Motion Planning

    Authors: Takayuki Osa

    Abstract: Optimization is an essential component for solving problems in wide-ranging fields. Ideally, the objective function should be designed such that the solution is unique and the optimization problem can be solved stably. However, the objective function used in a practical application is usually non-convex, and sometimes it even has an infinite set of solutions. To address this issue, we propose to l…

    Submitted 24 July, 2020; originally announced July 2020.

  10. arXiv:2006.02608  [pdf, ps, other]

    cs.LG stat.ML

    Meta-Model-Based Meta-Policy Optimization

    Authors: Takuya Hiraoka, Takahisa Imagawa, Voot Tangkaratt, Takayuki Osa, Takashi Onishi, Yoshimasa Tsuruoka

    Abstract: Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarante…

    Submitted 11 October, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: ACML 2021. Video demo: https://drive.google.com/file/d/1DRA-pmIWnHGNv5G_gFrml8YzKCtMcGnu/view?usp=sharing URL Source code: https://github.com/TakuyaHiraoka/Meta-Model-Based-Meta-Policy-Optimization

  11. Multimodal Trajectory Optimization for Motion Planning

    Authors: Takayuki Osa

    Abstract: Existing motion planning methods often have two drawbacks: 1) goal configurations need to be specified by a user, and 2) only a single solution is generated under a given condition. In practice, multiple possible goal configurations exist to achieve a task. Although the choice of the goal configuration significantly affects the quality of the resulting trajectory, it is not trivial for a user to s…

    Submitted 23 September, 2020; v1 submitted 16 March, 2020; originally announced March 2020.

    Comments: 17 pages, International Journal of Robotics Research

  12. Goal-Conditioned Variational Autoencoder Trajectory Primitives with Continuous and Discrete Latent Codes

    Authors: Takayuki Osa, Shuhei Ikemoto

    Abstract: Imitation learning is an intuitive approach for teaching motion to robotic systems. Although previous studies have proposed various methods to model demonstrated movement primitives, one of the limitations of existing methods is that the shape of the trajectories is encoded in high dimensional space. The high dimensionality of the trajectory representation can be a bottleneck in the subsequent pr…

    Submitted 23 September, 2020; v1 submitted 9 December, 2019; originally announced December 2019.

    Comments: 8 pages, SN Computer Science

  13. arXiv:1910.01465  [pdf, other]

    cs.LG cs.AI cs.MA stat.ML

    Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics

    Authors: Johannes Ackermann, Volker Gabler, Takayuki Osa, Masashi Sugiyama

    Abstract: Many real world tasks require multiple agents to work together. Multi-agent reinforcement learning (RL) methods have been proposed in recent years to solve these tasks, but current methods often fail to efficiently learn policies. We thus investigate the presence of a common weakness in single-agent RL, namely value function overestimation bias, in the multi-agent setting. Based on our findings, w…

    Submitted 2 December, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

    Comments: Accepted for the Deep RL Workshop at NeurIPS 2019; Changes for v2: Changed Figures 3,4, due to an error in the implementation of MATD3. Please refer to this version for fair evaluation

  14. arXiv:1901.01365  [pdf, other]

    cs.LG cs.AI stat.ML

    Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization

    Authors: Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama

    Abstract: Real-world tasks are often highly structured. Hierarchical reinforcement learning (HRL) has attracted research interest as an approach for leveraging the hierarchical structure of a given task in reinforcement learning (RL). However, identifying the hierarchical policy structure that enhances the performance of RL is not a trivial task. In this paper, we propose an HRL method that learns a latent…

    Submitted 7 March, 2019; v1 submitted 4 January, 2019; originally announced January 2019.

    Comments: 16 pages, ICLR 2019

  15. An Algorithmic Perspective on Imitation Learning

    Authors: Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel, Jan Peters

    Abstract: As robots and other intelligent agents move from simple environments and problems to more complex, unstructured settings, manually programming their behavior has become increasingly challenging and expensive. Often, it is easier for a teacher to demonstrate a desired behavior rather than attempt to manually engineer it. This process of learning from demonstrations, and the study of algorithms to d…

    Submitted 16 November, 2018; originally announced November 2018.

    Comments: 187 pages. Published in Foundations and Trends in Robotics

  16. arXiv:1711.10173  [pdf, other]

    cs.LG stat.ML

    Hierarchical Policy Search via Return-Weighted Density Estimation

    Authors: Takayuki Osa, Masashi Sugiyama

    Abstract: Learning an optimal policy from a multi-modal reward function is a challenging problem in reinforcement learning (RL). Hierarchical RL (HRL) tackles this problem by learning a hierarchical policy, where multiple option policies are in charge of different strategies corresponding to modes of a reward function and a gating policy selects the best option for a given context. Although HRL has been dem…

    Submitted 30 November, 2017; v1 submitted 28 November, 2017; originally announced November 2017.

    Comments: The 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), 9 pages