-
Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching
Authors:
H. J. Terry Suh,
Glen Chou,
Hongkai Dai,
Lujie Yang,
Abhishek Gupta,
Russ Tedrake
Abstract:
Gradient-based methods enable efficient search capabilities in high dimensions. However, in order to apply them effectively in offline optimization paradigms such as offline Reinforcement Learning (RL) or Imitation Learning (IL), we require a more careful consideration of how uncertainty estimation interplays with first-order methods that attempt to minimize them. We study smoothed distance to dat…
▽ More
Gradient-based methods enable efficient search capabilities in high dimensions. However, in order to apply them effectively in offline optimization paradigms such as offline Reinforcement Learning (RL) or Imitation Learning (IL), we require a more careful consideration of how uncertainty estimation interplays with first-order methods that attempt to minimize them. We study smoothed distance to data as an uncertainty metric, and claim that it has two beneficial properties: (i) it allows gradient-based methods that attempt to minimize uncertainty to drive iterates to data as smoothing is annealed, and (ii) it facilitates analysis of model bias with Lipschitz constants. As distance to data can be expensive to compute online, we consider settings where we need amortize this computation. Instead of learning the distance however, we propose to learn its gradients directly as an oracle for first-order optimizers. We show these gradients can be efficiently learned with score-matching techniques by leveraging the equivalence between distance to data and data likelihood. Using this insight, we propose Score-Guided Planning (SGP), a planning algorithm for offline RL that utilizes score-matching to enable first-order planning in high-dimensional problems, where zeroth-order methods were unable to scale, and ensembles were unable to overcome local minima. Website: https://sites.google.com/view/score-guided-planning/home
△ Less
Submitted 16 October, 2023; v1 submitted 24 June, 2023;
originally announced June 2023.
-
Global Planning for Contact-Rich Manipulation via Local Smoothing of Quasi-dynamic Contact Models
Authors:
Tao Pang,
H. J. Terry Suh,
Lujie Yang,
Russ Tedrake
Abstract:
The empirical success of Reinforcement Learning (RL) in the setting of contact-rich manipulation leaves much to be understood from a model-based perspective, where the key difficulties are often attributed to (i) the explosion of contact modes, (ii) stiff, non-smooth contact dynamics and the resulting exploding / discontinuous gradients, and (iii) the non-convexity of the planning problem. The sto…
▽ More
The empirical success of Reinforcement Learning (RL) in the setting of contact-rich manipulation leaves much to be understood from a model-based perspective, where the key difficulties are often attributed to (i) the explosion of contact modes, (ii) stiff, non-smooth contact dynamics and the resulting exploding / discontinuous gradients, and (iii) the non-convexity of the planning problem. The stochastic nature of RL addresses (i) and (ii) by effectively sampling and averaging the contact modes. On the other hand, model-based methods have tackled the same challenges by smoothing contact dynamics analytically. Our first contribution is to establish the theoretical equivalence of the two methods for simple systems, and provide qualitative and empirical equivalence on a number of complex examples. In order to further alleviate (ii), our second contribution is a convex, differentiable and quasi-dynamic formulation of contact dynamics, which is amenable to both smoothing schemes, and has proven through experiments to be highly effective for contact-rich planning. Our final contribution resolves (iii), where we show that classical sampling-based motion planning algorithms can be effective in global planning when contact modes are abstracted via smoothing. Applying our method on a collection of challenging contact-rich manipulation tasks, we demonstrate that efficient model-based motion planning can achieve results comparable to RL with dramatically less computation. Video: https://youtu.be/12Ew4xC-VwA
△ Less
Submitted 27 February, 2023; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Do Differentiable Simulators Give Better Policy Gradients?
Authors:
H. J. Terry Suh,
Max Simchowitz,
Kaiqing Zhang,
Russ Tedrake
Abstract:
Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial…
▽ More
Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an $α$-order gradient estimator, with $α\in [0,1]$, which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zero-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the $α$-order estimator on some numerical examples.
△ Less
Submitted 22 August, 2022; v1 submitted 1 February, 2022;
originally announced February 2022.
-
SEED: Series Elastic End Effectors in 6D for Visuotactile Tool Use
Authors:
H. J. Terry Suh,
Naveen Kuppuswamy,
Tao Pang,
Paul Mitiguy,
Alex Alspach,
Russ Tedrake
Abstract:
We propose the framework of Series Elastic End Effectors in 6D (SEED), which combines a spatially compliant element with visuotactile sensing to grasp and manipulate tools in the wild. Our framework generalizes the benefits of series elasticity to 6-dof, while providing an abstraction of control using visuotactile sensing. We propose an algorithm for relative pose estimation from visuotactile sens…
▽ More
We propose the framework of Series Elastic End Effectors in 6D (SEED), which combines a spatially compliant element with visuotactile sensing to grasp and manipulate tools in the wild. Our framework generalizes the benefits of series elasticity to 6-dof, while providing an abstraction of control using visuotactile sensing. We propose an algorithm for relative pose estimation from visuotactile sensing, and a spatial hybrid force-position controller capable of achieving stable force interaction with the environment. We demonstrate the effectiveness of our framework on tools that require regulation of spatial forces. Video link: https://youtu.be/2-YuIfspDrk
△ Less
Submitted 2 November, 2021;
originally announced November 2021.
-
Bundled Gradients through Contact via Randomized Smoothing
Authors:
H. J. Terry Suh,
Tao Pang,
Russ Tedrake
Abstract:
The empirical success of derivative-free methods in reinforcement learning for planning through contact seems at odds with the perceived fragility of classical gradient-based optimization methods in these domains. What is causing this gap, and how might we use the answer to improve gradient-based methods? We believe a stochastic formulation of dynamics is one crucial ingredient. We use tools from…
▽ More
The empirical success of derivative-free methods in reinforcement learning for planning through contact seems at odds with the perceived fragility of classical gradient-based optimization methods in these domains. What is causing this gap, and how might we use the answer to improve gradient-based methods? We believe a stochastic formulation of dynamics is one crucial ingredient. We use tools from randomized smoothing to analyze sampling-based approximations of the gradient, and formalize such approximations through the gradient bundle. We show that using the gradient bundle in lieu of the gradient mitigates fast-changing gradients of non-smooth contact dynamics modeled by the implicit time-step**, or the penalty method. Finally, we apply the gradient bundle to optimal control using iLQR, introducing a novel algorithm which improves convergence over using exact gradients. Combining our algorithm with a convex implicit time-step** formulation of contact, we show that we can tractably tackle planning-through-contact problems in manipulation.
△ Less
Submitted 21 January, 2022; v1 submitted 10 September, 2021;
originally announced September 2021.
-
The Surprising Effectiveness of Linear Models for Visual Foresight in Object Pile Manipulation
Authors:
H. J. Terry Suh,
Russ Tedrake
Abstract:
In this paper, we tackle the problem of pushing piles of small objects into a desired target set using visual feedback. Unlike conventional single-object manipulation pipelines, which estimate the state of the system parametrized by pose, the underlying physical state of this system is difficult to observe from images. Thus, we take the approach of reasoning directly in the space of images, and ac…
▽ More
In this paper, we tackle the problem of pushing piles of small objects into a desired target set using visual feedback. Unlike conventional single-object manipulation pipelines, which estimate the state of the system parametrized by pose, the underlying physical state of this system is difficult to observe from images. Thus, we take the approach of reasoning directly in the space of images, and acquire the dynamics of visual measurements in order to synthesize a visual-feedback policy. We present a simple controller using an image-space Lyapunov function, and evaluate the closed-loop performance using three different class of models for image prediction: deep-learning-based models for image-to-image translation, an object-centric model obtained from treating each pixel as a particle, and a switched-linear system where an action-dependent linear map is used. Through results in simulation and experiment, we show that for this task, a linear model works surprisingly well -- achieving better prediction error, downstream task performance, and generalization to new environments than the deep models we trained on the same amount of data. We believe these results provide an interesting example in the spectrum of models that are most useful for vision-based feedback in manipulation, considering both the quality of visual prediction, as well as compatibility with rigorous methods for control design and analysis. Project site: https://sites.google.com/view/linear-visual-foresight/home
△ Less
Submitted 15 June, 2020; v1 submitted 20 February, 2020;
originally announced February 2020.
-
Energy-Efficient Motion Planning for Multi-Modal Hybrid Locomotion
Authors:
H. J. Terry Suh,
Xiaobin Xiong,
Andrew Singletary,
Aaron D. Ames,
Joel W. Burdick
Abstract:
Hybrid locomotion, which combines multiple modalities of locomotion within a single robot, enables robots to carry out complex tasks in diverse environments. This paper presents a novel method for planning multi-modal locomotion trajectories using approximate dynamic programming. We formulate this problem as a shortest-path search through a state-space graph, where the edge cost is assigned as opt…
▽ More
Hybrid locomotion, which combines multiple modalities of locomotion within a single robot, enables robots to carry out complex tasks in diverse environments. This paper presents a novel method for planning multi-modal locomotion trajectories using approximate dynamic programming. We formulate this problem as a shortest-path search through a state-space graph, where the edge cost is assigned as optimal transport cost along each segment. This cost is approximated from batches of offline trajectory optimizations, which allows the complex effects of vehicle under-actuation and dynamic constraints to be approximately captured in a tractable way. Our method is illustrated on a hybrid double-integrator, an amphibious robot, and a flying-driving drone, showing the practicality of the approach.
△ Less
Submitted 4 August, 2020; v1 submitted 23 September, 2019;
originally announced September 2019.