Search | arXiv e-print repository

From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control

Authors: Yide Shentu, Philipp Wu, Aravind Rajeswaran, Pieter Abbeel

Abstract: Hierarchical control for robotics has long been plagued by the need to have a well defined interface layer to communicate between high-level task planners and low-level policies. With the advent of LLMs, language has been emerging as a prospective interface layer. However, this has several limitations. Not all tasks can be decomposed into steps that are easily expressible in natural language (e.g. performing a dance routine). Further, it makes end-to-end finetuning on embodied data challenging due to domain shift and catastrophic forgetting. We introduce our method -- Learnable Latent Codes as Bridges (LCB) -- as an alternate architecture to overcome these limitations. LCB uses a learnable latent code to act as a bridge between LLMs and low-level policies. This enables LLMs to flexibly communicate goals in the task plan without being entirely constrained by language limitations. Additionally, it enables end-to-end finetuning without destroying the embedding space of word tokens learned during pre-training. Through experiments on Language Table and Calvin, two common language based benchmarks for embodied agents, we find that LCB outperforms baselines (including those w/ GPT-4V) that leverage pure language as the interface layer on tasks that require reasoning and multi-step behaviors.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2309.13037 [pdf, other]

GELLO: A General, Low-Cost, and Intuitive Teleoperation Framework for Robot Manipulators

Authors: Philipp Wu, Yide Shentu, Zhongke Yi, Xingyu Lin, Pieter Abbeel

Abstract: Imitation learning from human demonstrations is a powerful framework to teach robots new skills. However, the performance of the learned policies is bottlenecked by the quality, scale, and variety of the demonstration data. In this paper, we aim to lower the barrier to collecting large and high-quality human demonstration data by proposing GELLO, a general framework for building low-cost and intuitive teleoperation systems for robotic manipulation. Given a target robot arm, we build a GELLO controller that has the same kinematic structure as the target arm, leveraging 3D-printed parts and off-the-shelf motors. GELLO is easy to build and intuitive to use. Through an extensive user study, we show that GELLO enables more reliable and efficient demonstration collection compared to commonly used teleoperation devices in the imitation learning literature such as VR controllers and 3D spacemouses. We further demonstrate the capabilities of GELLO for performing complex bi-manual and contact-rich manipulation tasks. To make GELLO accessible to everyone, we have designed and built GELLO systems for 3 commonly used robotic arms: Franka, UR5, and xArm. All software and hardware are open-sourced and can be found on our website: https://wuphilipp.github.io/gello/.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2307.12909 [pdf, other]

Dyn-E: Local Appearance Editing of Dynamic Neural Radiance Fields

Authors: Shangzhan Zhang, Sida Peng, Yinji ShenTu, Qing Shuai, Tianrun Chen, Kaicheng Yu, Hujun Bao, Xiaowei Zhou

Abstract: Recently, the editing of neural radiance fields (NeRFs) has gained considerable attention, but most prior works focus on static scenes while research on the appearance editing of dynamic scenes is relatively lacking. In this paper, we propose a novel framework to edit the local appearance of dynamic NeRFs by manipulating pixels in a single frame of training video. Specifically, to locally edit the appearance of dynamic NeRFs while preserving unedited regions, we introduce a local surface representation of the edited region, which can be inserted into and rendered along with the original NeRF and warped to arbitrary other frames through a learned invertible motion representation network. By employing our method, users without professional expertise can easily add desired content to the appearance of a dynamic scene. We extensively evaluate our approach on various scenes and show that our approach achieves spatially and temporally consistent editing results. Notably, our approach is versatile and applicable to different variants of dynamic NeRF representations.

Submitted 24 July, 2023; originally announced July 2023.

Comments: project page: https://dyn-e.github.io/

arXiv:2210.07424 [pdf, other]

Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction

Authors: YuXuan Liu, Nikhil Mishra, Maximilian Sieb, Yide Shentu, Pieter Abbeel, Xi Chen

Abstract: 3D bounding boxes are a widespread intermediate representation in many computer vision applications. However, predicting them is a challenging task, largely due to partial observability, which motivates the need for a strong sense of uncertainty. While many recent methods have explored better architectures for consuming sparse and unstructured point cloud data, we hypothesize that there is room for improvement in the modeling of the output distribution and explore how this can be achieved using an autoregressive prediction head. Additionally, we release a simulated dataset, COB-3D, which highlights new types of ambiguity that arise in real-world robotics applications, where 3D bounding box prediction has largely been underexplored. We propose methods for leveraging our autoregressive model to make high confidence predictions and meaningful uncertainty measures, achieving strong results on SUN-RGBD, Scannet, KITTI, and our new dataset.

Submitted 13 October, 2022; originally announced October 2022.

Comments: In ECCV 2022. Code and dataset are available at https://bbox.yuxuanliu.com

arXiv:2008.05406 [pdf, other]

doi 10.1002/pst.2104

Principal Stratum Strategy: Potential Role in Drug Development

Authors: Björn Bornkamp, Kaspar Rufibach, Jianchang Lin, Yi Liu, Devan V. Mehrotra, Satrajit Roychoudhury, Heinz Schmidli, Yue Shentu, Marcel Wolbers

Abstract: A randomized trial allows estimation of the causal effect of an intervention compared to a control in the overall population and in subpopulations defined by baseline characteristics. Often, however, clinical questions also arise regarding the treatment effect in subpopulations of patients, which would experience clinical or disease related events post-randomization. Events that occur after treatment initiation and potentially affect the interpretation or the existence of the measurements are called intercurrent events in the ICH E9(R1) guideline. If the intercurrent event is a consequence of treatment, randomization alone is no longer sufficient to meaningfully estimate the treatment effect. Analyses comparing the subgroups of patients without the intercurrent events for intervention and control will not estimate a causal effect. This is well known, but post-hoc analyses of this kind are commonly performed in drug development. An alternative approach is the principal stratum strategy, which classifies subjects according to their potential occurrence of an intercurrent event on both study arms. We illustrate with examples that questions formulated through principal strata occur naturally in drug development and argue that approaching these questions with the ICH E9(R1) estimand framework has the potential to lead to more transparent assumptions as well as more adequate analyses and conclusions. In addition, we provide an overview of assumptions required for estimation of effects in principal strata. Most of these assumptions are unverifiable and should hence be based on solid scientific understanding. Sensitivity analyses are needed to assess robustness of conclusions.

Submitted 8 February, 2021; v1 submitted 12 August, 2020; originally announced August 2020.

Journal ref: Pharm. Stat., 2021, 20, 737-751

arXiv:2006.08807 [pdf]

A Nonparametric Method for Value Function Guided Subgroup Identification via Gradient Tree Boosting for Censored Survival Data

Authors: **ye Zhang, Junshui Ma, Xinqun Chen, Yue Shentu

Abstract: In randomized clinical trials with survival outcome, there has been an increasing interest in subgroup identification based on baseline genomic, proteomic markers or clinical characteristics. Some of the existing methods identify subgroups that benefit substantially from the experimental treatment by directly modeling outcomes or treatment effect. When the goal is to find an optimal treatment for a given patient rather than finding the right patient for a given treatment, methods under the individualized treatment regime framework estimate an individualized treatment rule that would lead to the best expected clinical outcome as measured by a value function. Connecting the concept of value function to subgroup identification, we propose a nonparametric method that searches for subgroup membership scores by maximizing a value function that directly reflects the subgroup-treatment interaction effect based on restricted mean survival time. A gradient tree boosting algorithm is proposed to search for the individual subgroup membership scores. We conduct simulation studies to evaluate the performance of the proposed method and an application to an AIDS clinical trial is performed for illustration.

Submitted 15 June, 2020; originally announced June 2020.

Comments: 33 pages, 3 figures, 4 tables. Revisions Submitted to Statistics in Medicine

arXiv:2006.04480 [pdf]

doi 10.1080/19466315.2020.1785543

Assessing the Impact of COVID-19 on the Objective and Analysis of Oncology Clinical Trials -- Application of the Estimand Framework

Authors: Evgeny Degtyarev, Kaspar Rufibach, Yue Shentu, Godwin Yung, Michelle Casey, Stefan Englert, Feng Liu, Yi Liu, Oliver Sailer, Jonathan Siegel, Steven Sun, Rui Tang, Jiangxiu Zhou

Abstract: COVID-19 outbreak has rapidly evolved into a global pandemic. The impact of COVID-19 on patient journeys in oncology represents a new risk to interpretation of trial results and its broad applicability for future clinical practice. We identify key intercurrent events that may occur due to COVID-19 in oncology clinical trials with a focus on time-to-event endpoints and discuss considerations pertaining to the other estimand attributes introduced in the ICH E9 addendum. We propose strategies to handle COVID-19 related intercurrent events, depending on their relationship with malignancy and treatment and the interpretability of data after them. We argue that the clinical trial objective from a world without COVID-19 pandemic remains valid. The estimand framework provides a common language to discuss the impact of COVID-19 in a structured and transparent manner. This demonstrates that the applicability of the framework may even go beyond what it was initially intended for.

Submitted 21 June, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: Paper written on behalf of the industry working group on estimands in oncology (www.oncoestimand.org). Accepted for publication in a special issue of Statistics in Biopharmaceutical Research

Journal ref: Statistics in Biopharmaceutical Research, 2020, 12(4), 427-437

arXiv:2005.10248 [pdf]

Statistical Issues and Recommendations for Clinical Trials Conducted During the COVID-19 Pandemic

Authors: R. Daniel Meyer, Bohdana Ratitch, Marcel Wolbers, Olga Marchenko, Hui Quan, Daniel Li, Chrissie Fletcher, Xin Li, David Wright, Yue Shentu, Stefan Englert, Wei Shen, Jyotirmoy Dey, Thomas Liu, Ming Zhou, Norman Bohidar, Peng-Liang Zhao, Michael Hale

Abstract: The COVID-19 pandemic has had and continues to have major impacts on planned and ongoing clinical trials. Its effects on trial data create multiple potential statistical issues. The scale of impact is unprecedented, but when viewed individually, many of the issues are well defined and feasible to address. A number of strategies and recommendations are put forward to assess and address issues related to estimands, missing data, validity and modifications of statistical analysis methods, need for additional analyses, ability to meet objectives and overall trial interpretability.

Submitted 20 May, 2020; originally announced May 2020.

Comments: Accepted for publication in Statistics in Biopharmaceutical Research. 40 pages

arXiv:1806.08354 [pdf, other]

Learning Instance Segmentation by Interaction

Authors: Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik

Abstract: We present an approach for building an active agent that learns to segment its visual observations into individual objects by interacting with its environment in a completely self-supervised manner. The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels. The model learned from over 50K interactions generalizes to novel objects and backgrounds. To deal with noisy training signal for segmenting objects obtained by self-supervised interactions, we propose robust set loss. A dataset of robot's interactions along-with a few human labeled examples is provided as a benchmark for future research. We test the utility of the learned segmentation model by providing results on a downstream vision-based control task of rearranging multiple objects into target configurations from visual inputs alone. Videos, code, and robotic interaction dataset are available at https://pathak22.github.io/seg-by-interaction/

Submitted 21 June, 2018; originally announced June 2018.

Comments: Website at https://pathak22.github.io/seg-by-interaction/

arXiv:1804.08606 [pdf, other]

Zero-Shot Visual Imitation

Authors: Deepak Pathak, Parsa Mahmoudieh, Guanghao Luo, Pulkit Agrawal, Dian Chen, Yide Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, Trevor Darrell

Abstract: The current dominant paradigm for imitation learning relies on strong supervision of expert actions to learn both 'what' and 'how' to imitate. We pursue an alternative paradigm wherein an agent first explores the world without any expert supervision and then distills its experience into a goal-conditioned skill policy with a novel forward consistency loss. In our framework, the role of the expert is only to communicate the goals (i.e., what to imitate) during inference. The learned policy is then employed to mimic the expert (i.e., how to imitate) after seeing just a sequence of images demonstrating the desired task. Our method is 'zero-shot' in the sense that the agent never has access to expert actions during training or for the task demonstration at inference. We evaluate our zero-shot imitator in two real-world settings: complex rope manipulation with a Baxter robot and navigation in previously unseen office environments with a TurtleBot. Through further experiments in VizDoom simulation, we provide evidence that better mechanisms for exploration lead to learning a more capable policy which in turn improves end task performance. Videos, models, and more details are available at https://pathak22.github.io/zeroshot-imitation/

Submitted 23 April, 2018; originally announced April 2018.

Comments: Oral presentation at ICLR 2018. Website at https://pathak22.github.io/zeroshot-imitation/

Showing 1–10 of 10 results for author: Shentu, Y