-
6-DoF Stability Field via Diffusion Models
Authors:
Takuma Yoneda,
Tianchong Jiang,
Gregory Shakhnarovich,
Matthew R. Walter
Abstract:
A core capability for robot manipulation is reasoning over where and how to stably place objects in cluttered environments. Traditionally, robots have relied on object-specific, hand-crafted heuristics in order to perform such reasoning, with limited generalizability beyond a small number of object instances and object interaction patterns. Recent approaches instead learn notions of physical interaction, namely motion prediction, but require supervision in the form of labeled object information or come at the cost of high sample complexity, and do not directly reason over stability or object placement. We present 6-DoFusion, a generative model capable of generating 3D poses of an object that produces a stable configuration of a given scene. Underlying 6-DoFusion is a diffusion model that incrementally refines a randomly initialized SE(3) pose to generate a sample from a learned, context-dependent distribution over stable poses. We evaluate our model on different object placement and stacking tasks, demonstrating its ability to construct stable scenes that involve novel object classes as well as to improve the accuracy of state-of-the-art 3D pose estimation methods.
Submitted 26 October, 2023;
originally announced October 2023.
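The abstract's core mechanism, incrementally refining a randomly initialized SE(3) pose, can be illustrated with a minimal sketch. It assumes a hypothetical trained model `denoiser(R, t, step)` that returns a small rotation correction (as an axis-angle vector) and a translation correction. Each step composes the rotation with the SO(3) exponential map, so the pose remains a valid rigid transform. The schedule, parameterization, and denoiser interface are illustrative assumptions, not 6-DoFusion's actual sampler.

```python
import math
import random


def matmul3(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def so3_exp(w):
    """Rodrigues' formula: map an axis-angle vector w to a rotation matrix."""
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-12:
        return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    kx, ky, kz = (c / theta for c in w)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]  # unit-axis skew matrix
    K2 = matmul3(K, K)
    s, v = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + v * K2[i][j] for j in range(3)]
            for i in range(3)]


def refine_pose(denoiser, num_steps=50, seed=0):
    """Refine a random SE(3) pose (R, t) using a hypothetical learned denoiser."""
    rng = random.Random(seed)
    # Random initial pose (this rotation is not exactly uniform on SO(3)).
    R = so3_exp([rng.gauss(0.0, 1.0) for _ in range(3)])
    t = [rng.gauss(0.0, 1.0) for _ in range(3)]
    for step in reversed(range(num_steps)):
        dw, dt = denoiser(R, t, step)  # hypothetical model output
        R = matmul3(R, so3_exp(dw))   # composing rotations keeps R on SO(3)
        t = [ti + di for ti, di in zip(t, dt)]
    return R, t
```

In a real system the denoiser would also be conditioned on the scene (for example, a point cloud of the other objects); that input is omitted here to keep the sketch self-contained.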
-
Cold Diffusion on the Replay Buffer: Learning to Plan from Known Good States
Authors:
Zidan Wang,
Takeru Oba,
Takuma Yoneda,
Rui Shen,
Matthew Walter,
Bradly C. Stadie
Abstract:
Learning from demonstrations (LfD) has successfully trained robots to exhibit remarkable generalization capabilities. However, many powerful imitation techniques do not prioritize the feasibility of the robot behaviors they generate. In this work, we explore the feasibility of plans produced by LfD. As in prior work, we employ a temporal diffusion model with fixed start and goal states to facilitate imitation through in-painting. Unlike previous studies, we apply cold diffusion to ensure the optimization process is directed through the agent's replay buffer of previously visited states. This routing approach increases the likelihood that the final trajectories will predominantly occupy the feasible region of the robot's state space. We test this method in simulated robotic environments with obstacles and observe a significant improvement in the agent's ability to avoid these obstacles during planning.
Submitted 21 October, 2023;
originally announced October 2023.
-
Blending Imitation and Reinforcement Learning for Robust Policy Improvement
Authors:
Xuefeng Liu,
Takuma Yoneda,
Rick L. Stevens,
Matthew R. Walter,
Yuxin Chen
Abstract:
While reinforcement learning (RL) has shown promising performance, its sample complexity continues to be a substantial hurdle, restricting its broader application across a variety of domains. Imitation learning (IL) utilizes oracles to improve sample efficiency, yet it is often constrained by the quality of the oracles deployed. We introduce Robust Policy Improvement (RPI), an algorithm that actively interleaves between IL and RL based on an online estimate of their performance. RPI draws on the strengths of IL, using oracle queries to facilitate exploration, an aspect that is notably challenging in sparse-reward RL, particularly during the early stages of learning. As learning unfolds, RPI gradually transitions to RL, effectively treating the learned policy as an improved oracle. This algorithm is capable of learning from and improving upon a diverse set of black-box oracles. Integral to RPI are Robust Active Policy Selection (RAPS) and Robust Policy Gradient (RPG), both of which reason over whether to perform state-wise imitation from the oracles or learn from its own value function when the learner's performance surpasses that of the oracles in a specific state. Empirical evaluations and theoretical analysis validate that RPI excels in comparison to existing state-of-the-art methodologies, demonstrating superior performance across various benchmark domains.
Submitted 4 October, 2023; v1 submitted 2 October, 2023;
originally announced October 2023.
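The state-wise choice between imitating an oracle and learning from the agent's own value function can be sketched as a max over value estimates. This is a simplified illustration of the idea, not RAPS or RPG: the value-function callables and the `margin` parameter are hypothetical, and a faithful implementation would use uncertainty-aware, online-updated estimates.

```python
from typing import Callable, Sequence


def select_policy(state, oracle_value_fns: Sequence[Callable], learner_value_fn: Callable,
                  margin: float = 0.0) -> int:
    """Return the index of the oracle to imitate at `state`, or -1 to act with the learner.

    The value functions are hypothetical callables mapping a state to an estimated return.
    """
    oracle_vals = [v(state) for v in oracle_value_fns]
    best = max(range(len(oracle_vals)), key=oracle_vals.__getitem__)
    if learner_value_fn(state) > oracle_vals[best] + margin:
        return -1  # learner surpasses every oracle here: learn from its own value function
    return best    # otherwise imitate the oracle that looks best in this state
```

Early in training the learner's estimates are low, so most states route to an oracle; as the learner improves, more states return `-1`, mirroring the gradual IL-to-RL transition described above.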
-
Universality of almost periodicity in bounded discrete time series
Authors:
Chikara Nakayama,
Tsuyoshi Yoneda
Abstract:
We consider arbitrary bounded discrete time series. From its statistical feature, without any use of the Fourier transform, we find an almost periodic function which suitably characterizes the corresponding time series.
Submitted 24 March, 2024; v1 submitted 30 September, 2023;
originally announced October 2023.
-
Statler: State-Maintaining Language Models for Embodied Reasoning
Authors:
Takuma Yoneda,
Jiading Fang,
Peng Li,
Huanyu Zhang,
Tianchong Jiang,
Shengjie Lin,
Ben Picker,
David Yunis,
Hongyuan Mei,
Matthew R. Walter
Abstract:
There has been significant research interest in employing large language models to empower intelligent robots with complex reasoning. Existing work focuses on harnessing their abilities to reason about the histories of their actions and observations. In this paper, we explore a new dimension in which large language models may benefit robotics planning. In particular, we propose Statler, a framework in which large language models are prompted to maintain an estimate of the world state, which is often unobservable, and to track its transitions as new actions are taken. Our framework then conditions each action on the estimate of the current world state. Despite being conceptually simple, our Statler framework significantly outperforms strong competing methods (e.g., Code-as-Policies) on several robot planning tasks. Additionally, it has the potential advantage of scaling up to more challenging long-horizon planning tasks.
Submitted 20 May, 2024; v1 submitted 30 June, 2023;
originally announced June 2023.
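The state-maintaining loop can be sketched as follows: keep an explicit world-state string, ask a model for the next action conditioned on it, then ask the model to update the state after each action. `query_llm` is a hypothetical callable standing in for any language-model API, and the prompt formats are assumptions rather than Statler's actual prompts.

```python
from typing import Callable


def run_episode(instructions: list[str], initial_state: str,
                query_llm: Callable[[str], str]) -> tuple[list[str], str]:
    """Plan actions while maintaining an explicit estimate of the world state."""
    state = initial_state
    actions = []
    for instruction in instructions:
        # Condition each action on the current world-state estimate.
        action = query_llm(f"STATE: {state}\nINSTRUCTION: {instruction}\nACTION:")
        actions.append(action)
        # Ask the model to track the state transition caused by the action.
        state = query_llm(f"STATE: {state}\nACTION: {action}\nNEW STATE:")
    return actions, state
```

Because the state estimate is carried forward explicitly, the prompt need not contain the full action history, which is one plausible reason such a design could scale to longer horizons.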
-
Active Policy Improvement from Multiple Black-box Oracles
Authors:
Xuefeng Liu,
Takuma Yoneda,
Chaoqi Wang,
Matthew R. Walter,
Yuxin Chen
Abstract:
Reinforcement learning (RL) has made significant strides in various complex domains. However, identifying an effective policy via RL often necessitates extensive exploration. Imitation learning aims to mitigate this issue by using expert demonstrations to guide exploration. In real-world scenarios, one often has access to multiple suboptimal black-box experts, rather than a single optimal oracle. These experts do not universally outperform each other across all states, presenting a challenge in actively deciding which oracle to use and in which state. We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles. In particular, MAPS actively selects which of the oracles to imitate and improves their value function estimates, and MAPS-SE additionally leverages an active state exploration criterion to determine which states one should explore. We provide a comprehensive theoretical analysis and demonstrate that MAPS and MAPS-SE enjoy a sample-efficiency advantage over state-of-the-art policy improvement algorithms. Empirical results show that MAPS-SE significantly accelerates policy optimization via state-wise imitation learning from multiple oracles across a broad spectrum of control tasks in the DeepMind Control Suite. Our code is publicly available at: https://github.com/ripl/maps.
Submitted 5 July, 2023; v1 submitted 17 June, 2023;
originally announced June 2023.
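Actively choosing which oracle to roll out can be illustrated with an upper-confidence-bound (UCB1-style) rule over per-oracle return estimates. This is a swapped-in bandit heuristic that ignores state dependence for brevity; the class name, the exploration constant `c`, and the return bookkeeping are assumptions, and MAPS's own selection and exploration criteria are defined in the paper and repository.

```python
import math


class OracleSelector:
    """UCB-style choice among K black-box oracles based on observed returns (illustrative)."""

    def __init__(self, num_oracles: int, c: float = 1.0):
        self.c = c                         # exploration strength (assumed constant)
        self.counts = [0] * num_oracles    # times each oracle was followed
        self.means = [0.0] * num_oracles   # running mean return per oracle

    def select(self) -> int:
        # Try every oracle once before trusting the estimates.
        for k, n in enumerate(self.counts):
            if n == 0:
                return k
        total = sum(self.counts)
        ucb = [m + self.c * math.sqrt(math.log(total) / n)
               for m, n in zip(self.means, self.counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, k: int, ret: float) -> None:
        # Incremental mean of the returns observed when following oracle k.
        self.counts[k] += 1
        self.means[k] += (ret - self.means[k]) / self.counts[k]
```

A state-wise variant would keep separate statistics (or a learned value function) per state, which is closer to the setting the abstract describes.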
-
Pointwise convergence of Fourier series and deep neural network for the indicator function of d-dimensional ball
Authors:
Ryota Kawasumi,
Tsuyoshi Yoneda
Abstract:
In this paper, we clarify the crucial difference between a deep neural network and the Fourier series. For the multiple Fourier series of the periodization of certain radial functions on $\mathbb{R}^d$, Kuratsubo (2010) investigated the behavior of the spherical partial sum and discovered a third phenomenon, distinct from the well-known Gibbs-Wilbraham and Pinsky phenomena. In particular, this third phenomenon prevents pointwise convergence. In contrast, we give a specific deep neural network and prove its pointwise convergence.
Submitted 25 June, 2024; v1 submitted 17 April, 2023;
originally announced April 2023.
-
To the Noise and Back: Diffusion for Shared Autonomy
Authors:
Takuma Yoneda,
Luzhe Sun,
Ge Yang,
Bradly Stadie,
Matthew Walter
Abstract:
Shared autonomy is an operational concept in which a user and an autonomous agent collaboratively control a robotic system. It provides a number of advantages over the extremes of full teleoperation and full autonomy in many settings. Traditional approaches to shared autonomy rely on knowledge of the environment dynamics, a discrete space of user goals that is known a priori, or knowledge of the user's policy, assumptions that are unrealistic in many domains. Recent works relax some of these assumptions by formulating shared autonomy with model-free deep reinforcement learning (RL). In particular, they no longer need knowledge of the goal space (e.g., that the goals are discrete or constrained) or environment dynamics. However, they need knowledge of a task-specific reward function to train the policy. Unfortunately, such reward specification can be a difficult and brittle process. On top of that, the formulations inherently rely on human-in-the-loop training, which requires them to prepare a policy that mimics users' behavior. In this paper, we present a new approach to shared autonomy that employs a modulation of the forward and reverse diffusion process of diffusion models. Our approach does not assume known environment dynamics or the space of user goals, and in contrast to previous work, it does not require any reward feedback, nor does it require access to the user's policy during training. Instead, our framework learns a distribution over a space of desired behaviors. It then employs a diffusion model to translate the user's actions to a sample from this distribution. Crucially, we show that it is possible to carry out this process in a manner that preserves the user's control authority. We evaluate our framework on a series of challenging continuous control tasks, and analyze its ability to effectively correct user actions while maintaining their autonomy.
Submitted 15 June, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
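The forward-then-reverse modulation can be sketched with a standard DDPM schedule: the user's action is diffused forward for only `k` of the `T` steps and then denoised back with a noise-prediction model, so `k` trades fidelity to the user against conformity with the learned distribution. `eps_model` is a hypothetical trained network, and the linear beta schedule is an assumption, not necessarily the paper's configuration.

```python
import math
import random


def make_schedule(T=100, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and cumulative products alpha_bar_t (assumed values)."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars


def to_noise_and_back(action, eps_model, k, betas, alpha_bars, rng=random):
    """Partially diffuse `action` for k <= T steps, then run k reverse DDPM steps."""
    if k == 0:
        return list(action)  # no diffusion: the user keeps full control
    ab = alpha_bars[k - 1]
    # Forward: sample x_k ~ q(x_k | x_0) in closed form.
    x = [math.sqrt(ab) * a + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for a in action]
    for t in reversed(range(k)):
        beta, ab_t = betas[t], alpha_bars[t]
        eps = eps_model(x, t)  # hypothetical learned noise predictor
        # DDPM posterior mean computed from the predicted noise.
        x = [(xi - beta / math.sqrt(1.0 - ab_t) * ei) / math.sqrt(1.0 - beta)
             for xi, ei in zip(x, eps)]
        if t > 0:
            x = [xi + math.sqrt(beta) * rng.gauss(0.0, 1.0) for xi in x]
    return x
```

With `k = 0` the user's action passes through unchanged; with `k = T` the output is an ordinary sample from the learned distribution and the user's input is effectively ignored.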
-
Invariance Through Latent Alignment
Authors:
Takuma Yoneda,
Ge Yang,
Matthew R. Walter,
Bradly Stadie
Abstract:
A robot's deployment environment often involves perceptual changes that differ from what it has experienced during training. Standard practices such as data augmentation attempt to bridge this gap by augmenting source images in an effort to extend the support of the training distribution to better cover what the agent might experience at test time. In many cases, however, it is impossible to know the test-time distribution shift a priori, making these schemes infeasible. In this paper, we introduce a general approach, called Invariance Through Latent Alignment (ILA), that improves the test-time performance of a visuomotor control policy in deployment environments with unknown perceptual variations. ILA performs unsupervised adaptation at deployment time by matching the distribution of latent features on the target domain to the agent's prior experience, without relying on paired data. Although simple, we show that this idea leads to surprising improvements on a variety of challenging adaptation scenarios, including changes in lighting conditions, the content in the scene, and camera poses. We present results on calibrated control benchmarks in simulation (the distractor control suite) and a physical robot under a sim-to-real setup.
Submitted 17 May, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
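As a deliberately simplified stand-in for ILA's latent distribution matching, the sketch below aligns target-domain latent features to the source by matching the per-dimension mean and standard deviation. Like ILA, it needs only unpaired feature samples from each domain, but it is not ILA's objective, which learns an adaptation model; the function name and the moment-matching choice are assumptions.

```python
import statistics


def fit_moment_alignment(source_feats, target_feats, eps=1e-8):
    """Fit a per-dimension affine map so target latents match source mean and std.

    Simplified illustration only: matches first and second moments, not full distributions.
    """
    dims = len(source_feats[0])
    params = []
    for d in range(dims):
        s = [f[d] for f in source_feats]
        t = [f[d] for f in target_feats]
        mu_s, sd_s = statistics.fmean(s), statistics.pstdev(s)
        mu_t, sd_t = statistics.fmean(t), statistics.pstdev(t)
        scale = sd_s / (sd_t + eps)
        params.append((scale, mu_s - scale * mu_t))

    def align(feat):
        # Apply the fitted affine map to one target-domain latent vector.
        return [a * x + b for (a, b), x in zip(params, feat)]

    return align
```

The frozen policy would then consume `align(encoder(obs))` at deployment time, so only the lightweight alignment, not the policy, adapts to the new domain.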
-
Real Robot Challenge: A Robotics Competition in the Cloud
Authors:
Stefan Bauer,
Felix Widmaier,
Manuel Wüthrich,
Annika Buchholz,
Sebastian Stark,
Anirudh Goyal,
Thomas Steinbrenner,
Joel Akpo,
Shruti Joshi,
Vincent Berenz,
Vaibhav Agrawal,
Niklas Funk,
Julen Urain De Jesus,
Jan Peters,
Joe Watson,
Claire Chen,
Krishnan Srinivasan,
Junwu Zhang,
Jeffrey Zhang,
Matthew R. Walter,
Rishabh Madan,
Charles Schaff,
Takahiro Maeda,
Takuma Yoneda,
Denis Yarats
, et al. (17 additional authors not shown)
Abstract:
Dexterous manipulation remains an open problem in robotics. To coordinate efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at MPI for Intelligent Systems and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users are able to control the platforms remotely by submitting code that is executed automatically, akin to a computational cluster. Using this setup, i) we host robotics competitions, where teams from anywhere in the world access our platforms to tackle challenging tasks; ii) we publish the datasets collected during these competitions (consisting of hundreds of robot hours); and iii) we give researchers access to these platforms for their own projects.
Submitted 10 June, 2022; v1 submitted 22 September, 2021;
originally announced September 2021.
-
Benchmarking Structured Policies and Policy Optimization for Real-World Dexterous Object Manipulation
Authors:
Niklas Funk,
Charles Schaff,
Rishabh Madan,
Takuma Yoneda,
Julen Urain De Jesus,
Joe Watson,
Ethan K. Gordon,
Felix Widmaier,
Stefan Bauer,
Siddhartha S. Srinivasa,
Tapomayukh Bhattacharjee,
Matthew R. Walter,
Jan Peters
Abstract:
Dexterous manipulation is a challenging and important problem in robotics. While data-driven methods are a promising approach, current benchmarks require simulation or extensive engineering support due to the sample inefficiency of popular methods. We present benchmarks for the TriFinger system, an open-source robotic platform for dexterous manipulation and the focus of the 2020 Real Robot Challenge. The benchmarked methods, which were successful in the challenge, can be generally described as structured policies, as they combine elements of classical robotics and modern policy optimization. This inclusion of inductive biases facilitates sample efficiency, interpretability, reliability, and high performance. The key aspects of this benchmarking are validation of the baselines across both simulation and the real system, a thorough ablation study over the core features of each solution, and a retrospective analysis of the challenge as a manipulation benchmark. The code and demo videos for this work can be found on our website (https://sites.google.com/view/benchmark-rrc).
Submitted 8 December, 2021; v1 submitted 5 May, 2021;
originally announced May 2021.
-
Grasp and Motion Planning for Dexterous Manipulation for the Real Robot Challenge
Authors:
Takuma Yoneda,
Charles Schaff,
Takahiro Maeda,
Matthew Walter
Abstract:
This report describes our winning submission to the Real Robot Challenge (https://real-robot-challenge.com/). The Real Robot Challenge is a three-phase dexterous manipulation competition that involves manipulating various rectangular objects with the TriFinger Platform. Our approach combines motion planning with several motion primitives to manipulate the object. For Phases 1 and 2, we additionally learn a residual policy in simulation that applies corrective actions on top of our controller. Our approach won first place in Phase 2 and Phase 3 of the competition. We were anonymously known as 'ardentstork' on the competition leaderboard (https://real-robot-challenge.com/leader-board). Videos and our code can be found at https://github.com/ripl-ttic/real-robot-challenge.
Submitted 7 January, 2021;
originally announced January 2021.
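A residual policy that "applies corrective actions on top of our controller" amounts to summing the controller's command with a learned correction. A minimal sketch, assuming hypothetical `base_controller` and `residual_policy` callables and a symmetric per-dimension limit; the report's actual action space and limits may differ.

```python
from typing import Callable


def residual_action(obs, base_controller: Callable, residual_policy: Callable,
                    limit: float) -> list[float]:
    """Combine a hand-designed controller with a learned corrective residual.

    `base_controller` and `residual_policy` are hypothetical callables mapping an
    observation to a list of per-dimension commands.
    """
    base = base_controller(obs)
    correction = residual_policy(obs)
    # Clip the sum so the residual cannot push commands past the assumed limits.
    return [max(-limit, min(limit, b + r)) for b, r in zip(base, correction)]
```

Because the base controller already solves most of the task, the residual only has to learn small corrections, which is what makes training it in simulation comparatively sample-efficient.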
-
Pow-Wow: A Dataset and Study on Collaborative Communication in Pommerman
Authors:
Takuma Yoneda,
Matthew R. Walter,
Jason Naradowsky
Abstract:
In multi-agent learning, agents must coordinate with each other in order to succeed. For humans, this coordination is typically accomplished through the use of language. In this work, we perform a controlled study of human language use in a competitive team-based game, and search for useful lessons for structuring communication protocols between autonomous agents. We construct Pow-Wow, a new dataset for studying situated goal-directed human communication. Using the Pommerman game environment, we enlisted teams of humans to play against teams of AI agents, recording their observations, actions, and communications. We analyze the types of communications which result in effective game strategies, annotate them accordingly, and present corpus-level statistical analysis of how trends in communications affect game outcomes. Based on this analysis, we design a communication policy for learning agents, and show that agents which utilize communication achieve higher win-rates against baseline systems than those which do not.
Submitted 13 September, 2020;
originally announced September 2020.
-
Algorithms and System Architecture for Immediate Personalized News Recommendations
Authors:
Takeshi Yoneda,
Shunsuke Kozawa,
Keisuke Osone,
Yukinori Koide,
Yosuke Abe,
Yoshifumi Seki
Abstract:
Personalization plays an important role in many services, just as news does. Many studies have examined news personalization algorithms, but few have considered practical environments. This paper provides algorithms and system architecture for generating immediate personalized news in a practical environment. Immediacy means changes in news trends and user interests are reflected in recommended news lists quickly. Since news trends and user interests rapidly change, immediacy is critical in news personalization applications. We develop algorithms and system architecture to realize immediacy. Our algorithms are based on collaborative filtering of user clusters and evaluate news articles using click-through rate and decay scores based on the time elapsed since the user's last access. Existing studies have not fully discussed system architecture, so a major contribution of this paper is that we present a system architecture that realizes our algorithms, along with a configuration example implemented on top of Amazon Web Services. We evaluate the proposed method both offline and online. The offline experiments are conducted on a real-world dataset from a commercial news delivery service, and the online experiments are conducted via A/B testing in production environments. We confirm the effectiveness of our proposed method and also that our system architecture can operate in large-scale production environments.
Submitted 3 September, 2019;
originally announced September 2019.
-
Greedy Optimized Multileaving for Personalization
Authors:
Kojiro Iizuka,
Takeshi Yoneda,
Yoshifumi Seki
Abstract:
Personalization plays an important role in many services. To evaluate personalized rankings, online evaluation, such as A/B testing, is widely used today. Recently, multileaving has been found to be an efficient method for evaluating rankings in information retrieval fields. This paper describes the first attempt to optimize the multileaving method for personalization settings. We clarify the challenges of applying this method to personalized rankings. Then, to solve these challenges, we propose greedy optimized multileaving (GOM) with a new credit feedback function. The empirical results showed that GOM remained stable as the ranking length and the number of rankers increased. We implemented GOM on our actual news recommender systems and compared its online performance with that of A/B testing. The results showed that GOM evaluated the personalized rankings precisely, with significantly smaller sample sizes (< 1/10) than A/B testing.
Submitted 18 July, 2019;
originally announced July 2019.
-
Bib2vec: An Embedding-based Search System for Bibliographic Information
Authors:
Takuma Yoneda,
Koki Mori,
Makoto Miwa,
Yutaka Sasaki
Abstract:
We propose a novel embedding model that represents relationships among several elements in bibliographic information with high representation ability and flexibility. Based on this model, we present a novel search system that shows the relationships among the elements in the ACL Anthology Reference Corpus. The evaluation results show that our model can achieve a high prediction ability and produce reasonable search results.
Submitted 5 April, 2018; v1 submitted 15 June, 2017;
originally announced June 2017.