-
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Authors:
Alexander Khazatsky,
Karl Pertsch,
Suraj Nair,
Ashwin Balakrishna,
Sudeep Dasari,
Siddharth Karamcheti,
Soroush Nasiriany,
Mohan Kumar Srirama,
Lawrence Yunliang Chen,
Kirsty Ellis,
Peter David Fagan,
Joey Hejna,
Masha Itkina,
Marion Lepert,
Yecheng Jason Ma,
Patrick Tree Miller,
Jimmy Wu,
Suneel Belkhale,
Shivin Dass,
Huy Ha,
Arhan Jain,
Abraham Lee,
Youngwoon Lee,
Marius Memmel,
Sungjae Park,
et al. (74 additional authors not shown)
Abstract:
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.
Submitted 19 March, 2024;
originally announced March 2024.
-
URLB: Unsupervised Reinforcement Learning Benchmark
Authors:
Michael Laskin,
Denis Yarats,
Hao Liu,
Kimin Lee,
Albert Zhan,
Kevin Lu,
Catherine Cang,
Lerrel Pinto,
Pieter Abbeel
Abstract:
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Yet training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms have been hard to compare and develop due to the lack of a unified benchmark. To this end, we introduce the Unsupervised Reinforcement Learning Benchmark (URLB). URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards. Building on the DeepMind Control Suite, we provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods. We find that the implemented baselines make progress but are not able to solve URLB and propose directions for future research.
Submitted 28 October, 2021;
originally announced October 2021.
-
Hierarchical Few-Shot Imitation with Skill Transition Models
Authors:
Kourosh Hakhamaneshi,
Ruihan Zhao,
Albert Zhan,
Pieter Abbeel,
Michael Laskin
Abstract:
A desirable property of autonomous agents is the ability to both solve long-horizon problems and generalize to unseen tasks. Recent advances in data-driven skill learning have shown that extracting behavioral priors from offline data can enable agents to solve challenging long-horizon tasks with reinforcement learning. However, generalization to tasks unseen during behavioral prior training remains an outstanding challenge. To this end, we present Few-shot Imitation with Skill Transition Models (FIST), an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks given a few downstream demonstrations. FIST learns an inverse skill dynamics model, a distance function, and utilizes a semi-parametric approach for imitation. We show that FIST is capable of generalizing to new tasks and substantially outperforms prior baselines in navigation experiments requiring traversing unseen parts of a large maze and 7-DoF robotic arm experiments requiring manipulating previously unseen objects in a kitchen.
Submitted 10 March, 2022; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation
Authors:
Albert Zhan,
Ruihan Zhao,
Lerrel Pinto,
Pieter Abbeel,
Michael Laskin
Abstract:
Recent advances in unsupervised representation learning significantly improved the sample efficiency of training Reinforcement Learning policies in simulated environments. However, similar gains have not yet been seen for real-robot reinforcement learning. In this work, we focus on enabling data-efficient real-robot learning from pixels. We present Contrastive Pre-training and Data Augmentation for Efficient Robotic Learning (CoDER), a method that utilizes data augmentation and unsupervised learning to achieve sample-efficient training of real-robot arm policies from sparse rewards. While contrastive pre-training, data augmentation, demonstrations, and reinforcement learning are alone insufficient for efficient learning, our main contribution is showing that the combination of these disparate techniques results in a simple yet data-efficient method. We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels, such as reaching, picking, moving, pulling a large object, flipping a switch, and opening a drawer in just 30 minutes of mean real-world training time. We include videos and code on the project website: https://sites.google.com/view/efficient-robotic-manipulation/home
Submitted 16 October, 2022; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Preventing Imitation Learning with Adversarial Policy Ensembles
Authors:
Albert Zhan,
Stas Tiomkin,
Pieter Abbeel
Abstract:
Imitation learning can reproduce policies by observing experts, which poses a problem regarding policy privacy. Policies, such as human, or policies on deployed robots, can all be cloned without consent from the owners. How can we protect against external observers cloning our proprietary policies? To answer this question we introduce a new reinforcement learning framework, where we train an ensemble of near-optimal policies, whose demonstrations are guaranteed to be useless for an external observer. We formulate this idea by a constrained optimization problem, where the objective is to improve proprietary policies, and at the same time deteriorate the virtual policy of an eventual external observer. We design a tractable algorithm to solve this new optimization problem by modifying the standard policy gradient algorithm. Our formulation can be interpreted in lenses of confidentiality and adversarial behaviour, which enables a broader perspective of this work. We demonstrate the existence of "non-clonable" ensembles, providing a solution to the above optimization problem, which is calculated by our modified policy gradient algorithm. To our knowledge, this is the first work regarding the protection of policies in Reinforcement Learning.
Submitted 2 August, 2020; v1 submitted 30 January, 2020;
originally announced February 2020.
-
High Frequency Remote Monitoring of Parkinson's Disease via Smartphone: Platform Overview and Medication Response Detection
Authors:
Andong Zhan,
Max A. Little,
Denzil A. Harris,
Solomon O. Abiola,
E. Ray Dorsey,
Suchi Saria,
Andreas Terzis
Abstract:
Objective: The aim of this study is to develop a smartphone-based high-frequency remote monitoring platform, assess its feasibility for remote monitoring of symptoms in Parkinson's disease, and demonstrate the value of data collected using the platform by detecting dopaminergic medication response. Methods: We have developed HopkinsPD, a novel smartphone-based monitoring platform, which measures symptoms actively (i.e. data are collected when a suite of tests is initiated by the individual at specific times during the day), and passively (i.e. data are collected continuously in the background). After data collection, we extract features to assess measures of five key behaviors related to PD symptoms -- voice, balance, gait, dexterity, and reaction time. A random forest classifier is used to discriminate measurements taken after a dose of medication (treatment) versus before the medication dose (baseline). Results: A worldwide study for remote PD monitoring was established using HopkinsPD in July, 2014. This study used entirely remote, online recruitment and installation, demonstrating highly cost-effective scalability. In six months, 226 individuals (121 PD and 105 controls) contributed over 46,000 hours of passive monitoring data and approximately 8,000 instances of structured tests of voice, balance, gait, reaction, and dexterity. To the best of our knowledge, this is the first study to have collected data at such a scale for remote PD monitoring. Moreover, we demonstrate the initial ability to discriminate treatment from baseline with 71.0 (±0.4)% accuracy, which suggests medication response can be monitored remotely via smartphone-based measures.
Submitted 5 January, 2016;
originally announced January 2016.