Search | arXiv e-print repository

DextrAH-G: Pixels-to-Action Dexterous Arm-Hand Gras** with Geometric Fabrics

Authors: Tyler Ga Wei Lum, Martin Matak, Viktor Makoviychuk, Ankur Handa, Arthur Allshire, Tucker Hermans, Nathan D. Ratliff, Karl Van Wyk

Abstract: A pivotal challenge in robotics is achieving fast, safe, and robust dexterous gras** across a diverse range of objects, an important goal within industrial applications. However, existing methods often have very limited speed, dexterity, and generality, along with limited or no hardware safety guarantees. In this work, we introduce DextrAH-G, a depth-based dexterous gras** policy trained entir… ▽ More A pivotal challenge in robotics is achieving fast, safe, and robust dexterous gras** across a diverse range of objects, an important goal within industrial applications. However, existing methods often have very limited speed, dexterity, and generality, along with limited or no hardware safety guarantees. In this work, we introduce DextrAH-G, a depth-based dexterous gras** policy trained entirely in simulation that combines reinforcement learning, geometric fabrics, and teacher-student distillation. We address key challenges in joint arm-hand policy learning, such as high-dimensional observation and action spaces, the sim2real gap, collision avoidance, and hardware constraints. DextrAH-G enables a 23 motor arm-hand robot to safely and continuously grasp and transport a large variety of objects at high speed using multi-modal inputs including depth images, allowing generalization across object geometry. Videos at https://sites.google.com/view/dextrah-g. △ Less

Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

arXiv:2405.02250 [pdf, other]

Geometric Fabrics: a Safe Guiding Medium for Policy Learning

Authors: Karl Van Wyk, Ankur Handa, Viktor Makoviychuk, Yijie Guo, Arthur Allshire, Nathan D. Ratliff

Abstract: Robotics policies are always subjected to complex, second order dynamics that entangle their actions with resulting states. In reinforcement learning (RL) contexts, policies have the burden of deciphering these complicated interactions over massive amounts of experience and complex reward functions to learn how to accomplish tasks. Moreover, policies typically issue actions directly to controllers… ▽ More Robotics policies are always subjected to complex, second order dynamics that entangle their actions with resulting states. In reinforcement learning (RL) contexts, policies have the burden of deciphering these complicated interactions over massive amounts of experience and complex reward functions to learn how to accomplish tasks. Moreover, policies typically issue actions directly to controllers like Operational Space Control (OSC) or joint PD control, which induces straightline motion towards these action targets in task or joint space. However, straightline motion in these spaces for the most part do not capture the rich, nonlinear behavior our robots need to exhibit, shifting the burden of discovering these behaviors more completely to the agent. Unlike these simpler controllers, geometric fabrics capture a much richer and desirable set of behaviors via artificial, second order dynamics grounded in nonlinear geometry. These artificial dynamics shift the uncontrolled dynamics of a robot via an appropriate control law to form behavioral dynamics. Behavioral dynamics unlock a new action space and safe, guiding behavior over which RL policies are trained. Behavioral dynamics enable bang-bang-like RL policy actions that are still safe for real robots, simplify reward engineering, and help sequence real-world, high-performance policies. We describe the framework more generally and create a specific instantiation for the problem of dexterous, in-hand reorientation of a cube by a highly actuated robot hand. △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2310.17274 [pdf, other]

cuRobo: Parallelized Collision-Free Minimum-Jerk Robot Motion Generation

Authors: Balakumar Sundaralingam, Siva Kumar Sastry Hari, Adam Fishman, Caelan Garrett, Karl Van Wyk, Valts Blukis, Alexander Millane, Helen Oleynikova, Ankur Handa, Fabio Ramos, Nathan Ratliff, Dieter Fox

Abstract: This paper explores the problem of collision-free motion generation for manipulators by formulating it as a global motion optimization problem. We develop a parallel optimization technique to solve this problem and demonstrate its effectiveness on massively parallel GPUs. We show that combining simple optimization techniques with many parallel seeds leads to solving difficult motion generation pro… ▽ More This paper explores the problem of collision-free motion generation for manipulators by formulating it as a global motion optimization problem. We develop a parallel optimization technique to solve this problem and demonstrate its effectiveness on massively parallel GPUs. We show that combining simple optimization techniques with many parallel seeds leads to solving difficult motion generation problems within 50ms on average, 60x faster than state-of-the-art (SOTA) trajectory optimization methods. We achieve SOTA performance by combining L-BFGS step direction estimation with a novel parallel noisy line search scheme and a particle-based optimization solver. To further aid trajectory optimization, we develop a parallel geometric planner that plans within 20ms and also introduce a collision-free IK solver that can solve over 7000 queries/s. We package our contributions into a state of the art GPU accelerated motion generation library, cuRobo and release it to enrich the robotics community. Additional details are available at https://curobo.org △ Less

Submitted 3 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: revised technical report, 62 pages, Website: https://curobo.org

arXiv:2309.07368 [pdf, ps, other]

Fabrics: A Foundationally Stable Medium for Encoding Prior Experience

Authors: Nathan Ratliff, Karl Van Wyk

Abstract: Most dynamics functions are not well-aligned to task requirements. Controllers, therefore, often invert the dynamics and reshape it into something more useful. The learning community has found that these controllers, such as Operational Space Control (OSC), can offer important inductive biases for training. However, OSC only captures straight line end-effector motion. There's a lot more behavior w… ▽ More Most dynamics functions are not well-aligned to task requirements. Controllers, therefore, often invert the dynamics and reshape it into something more useful. The learning community has found that these controllers, such as Operational Space Control (OSC), can offer important inductive biases for training. However, OSC only captures straight line end-effector motion. There's a lot more behavior we could and should be packing into these systems. Earlier work [15][16][19] developed a theory that generalized these ideas and constructed a broad and flexible class of second-order dynamical systems which was simultaneously expressive enough to capture substantial behavior (such as that listed above), and maintained the types of stability properties that make OSC and controllers like it a good foundation for policy design and learning. This paper, motivated by the empirical success of the types of fabrics used in [20], reformulates the theory of fabrics into a form that's more general and easier to apply to policy learning problems. We focus on the stability properties that make fabrics a good foundation for policy synthesis. Fabrics create a fundamentally stable medium within which a policy can operate; they influence the system's behavior without preventing it from achieving tasks within its constraints. When a fabrics is geometric (path consistent) we can interpret the fabric as forming a road network of paths that the system wants to follow at constant speed absent a forcing policy, giving geometric intuition to its role as a prior. The policy operating over the geometric fabric acts to modulate speed and steers the system from one road to the next as it accomplishes its task. We reformulate the theory of fabrics here rigorously and develop theoretical results characterizing system behavior and illuminating how to design these systems, while also emphasizing intuition throughout. △ Less

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2109.10443 [pdf, other]

Geometric Fabrics: Generalizing Classical Mechanics to Capture the Physics of Behavior

Authors: Karl Van Wyk, Mandy Xie, Anqi Li, Muhammad Asif Rana, Buck Babich, Bryan Peele, Qian Wan, Iretiayo Akinola, Balakumar Sundaralingam, Dieter Fox, Byron Boots, Nathan D. Ratliff

Abstract: Classical mechanical systems are central to controller design in energy sha** methods of geometric control. However, their expressivity is limited by position-only metrics and the intimate link between metric and geometry. Recent work on Riemannian Motion Policies (RMPs) has shown that shedding these restrictions results in powerful design tools, but at the expense of theoretical stability guara… ▽ More Classical mechanical systems are central to controller design in energy sha** methods of geometric control. However, their expressivity is limited by position-only metrics and the intimate link between metric and geometry. Recent work on Riemannian Motion Policies (RMPs) has shown that shedding these restrictions results in powerful design tools, but at the expense of theoretical stability guarantees. In this work, we generalize classical mechanics to what we call geometric fabrics, whose expressivity and theory enable the design of systems that outperform RMPs in practice. Geometric fabrics strictly generalize classical mechanics forming a new physics of behavior by first generalizing them to Finsler geometries and then explicitly bending them to shape their behavior while maintaining stability. We develop the theory of fabrics and present both a collection of controlled experiments examining their theoretical properties and a set of robot system experiments showing improved performance over a well-engineered and hardened implementation of RMPs, our current state-of-the-art in controller design. △ Less

Submitted 18 January, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

arXiv:2105.03019 [pdf, other]

Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories

Authors: Mandy Xie, Anqi Li, Karl Van Wyk, Frank Dellaert, Byron Boots, Nathan Ratliff

Abstract: Imitation learning (IL) is a frequently used approach for data-efficient policy learning. Many IL methods, such as Dataset Aggregation (DAgger), combat challenges like distributional shift by interacting with oracular experts. Unfortunately, assuming access to oracular experts is often unrealistic in practice; data used in IL frequently comes from offline processes such as lead-through or teleoper… ▽ More Imitation learning (IL) is a frequently used approach for data-efficient policy learning. Many IL methods, such as Dataset Aggregation (DAgger), combat challenges like distributional shift by interacting with oracular experts. Unfortunately, assuming access to oracular experts is often unrealistic in practice; data used in IL frequently comes from offline processes such as lead-through or teleoperation. In this paper, we present a novel imitation learning technique called Collocation for Demonstration Encoding (CoDE) that operates on only a fixed set of trajectory demonstrations. We circumvent challenges with methods like back-propagation-through-time by introducing an auxiliary trajectory network, which takes inspiration from collocation techniques in optimal control. Our method generalizes well and more accurately reproduces the demonstrated behavior with fewer guiding trajectories when compared to standard behavioral cloning methods. We present simulation results on a 7-degree-of-freedom (DoF) robotic manipulator that learns to exhibit lifting, target-reaching, and obstacle avoidance behaviors. △ Less

Submitted 5 June, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

arXiv:2104.13542 [pdf, other]

STORM: An Integrated Framework for Fast Joint-Space Model-Predictive Control for Reactive Manipulation

Authors: Mohak Bhardwaj, Balakumar Sundaralingam, Arsalan Mousavian, Nathan Ratliff, Dieter Fox, Fabio Ramos, Byron Boots

Abstract: Sampling-based model-predictive control (MPC) is a promising tool for feedback control of robots with complex, non-smooth dynamics, and cost functions. However, the computationally demanding nature of sampling-based MPC algorithms has been a key bottleneck in their application to high-dimensional robotic manipulation problems in the real world. Previous methods have addressed this issue by running… ▽ More Sampling-based model-predictive control (MPC) is a promising tool for feedback control of robots with complex, non-smooth dynamics, and cost functions. However, the computationally demanding nature of sampling-based MPC algorithms has been a key bottleneck in their application to high-dimensional robotic manipulation problems in the real world. Previous methods have addressed this issue by running MPC in the task space while relying on a low-level operational space controller for joint control. However, by not using the joint space of the robot in the MPC formulation, existing methods cannot directly account for non-task space related constraints such as avoiding joint limits, singular configurations, and link collisions. In this paper, we develop a system for fast, joint space sampling-based MPC for manipulators that is efficiently parallelized using GPUs. Our approach can handle task and joint space constraints while taking less than 8ms~(125Hz) to compute the next control command. Further, our method can tightly integrate perception into the control problem by utilizing learned cost functions from raw sensor data. We validate our approach by deploying it on a Franka Panda robot for a variety of dynamic manipulation tasks. We study the effect of different cost formulations and MPC parameters on the synthesized behavior and provide key insights that pave the way for the application of sampling-based MPC for manipulators in a principled manner. We also provide highly optimized, open-source code to be used by the wider robot learning and control community. Videos of experiments can be found at: https://sites.google.com/view/manipulation-mpc △ Less

Submitted 14 September, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

Comments: Accepted for oral presentation at the Conference on Robot Learning (CoRL), 2021. Code available at: https://github.com/NVlabs/storm

Journal ref: 5th Annual Conference on Robot Learning, 2021

arXiv:2103.05922 [pdf, other]

RMP2: A Structured Composable Policy Class for Robot Learning

Authors: Anqi Li, Ching-An Cheng, M. Asif Rana, Man Xie, Karl Van Wyk, Nathan Ratliff, Byron Boots

Abstract: We consider the problem of learning motion policies for acceleration-based robotics systems with a structured policy class specified by RMPflow. RMPflow is a multi-task control framework that has been successfully applied in many robotics problems. Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different lev… ▽ More We consider the problem of learning motion policies for acceleration-based robotics systems with a structured policy class specified by RMPflow. RMPflow is a multi-task control framework that has been successfully applied in many robotics problems. Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different levels of prior knowledge as well as the ability to transfer policies between robots. However, implementing a system for end-to-end learning RMPflow policies faces several computational challenges. In this work, we re-examine the message passing algorithm of RMPflow and propose a more efficient alternate algorithm, called RMP2, that uses modern automatic differentiation tools (such as TensorFlow and PyTorch) to compute RMPflow policies. Our new design retains the strengths of RMPflow while bringing in advantages from automatic differentiation, including 1) easy programming interfaces to designing complex transformations; 2) support of general directed acyclic graph (DAG) transformation structures; 3) end-to-end differentiability for policy learning; 4) improved computational efficiency. Because of these features, RMP2 can be treated as a structured policy class for efficient robot learning which is suitable encoding domain knowledge. Our experiments show that using structured policy class given by RMP2 can improve policy performance and safety in reinforcement learning tasks for goal reaching in cluttered space. △ Less

Submitted 10 March, 2021; originally announced March 2021.

arXiv:2012.13457 [pdf, other]

Towards Coordinated Robot Motions: End-to-End Learning of Motion Policies on Transform Trees

Authors: M. Asif Rana, Anqi Li, Dieter Fox, Sonia Chernova, Byron Boots, Nathan Ratliff

Abstract: Generating robot motion that fulfills multiple tasks simultaneously is challenging due to the geometric constraints imposed by the robot. In this paper, we propose to solve multi-task problems through learning structured policies from human demonstrations. Our structured policy is inspired by RMPflow, a framework for combining subtask policies on different spaces. The policy structure provides the… ▽ More Generating robot motion that fulfills multiple tasks simultaneously is challenging due to the geometric constraints imposed by the robot. In this paper, we propose to solve multi-task problems through learning structured policies from human demonstrations. Our structured policy is inspired by RMPflow, a framework for combining subtask policies on different spaces. The policy structure provides the user an interface to 1) specifying the spaces that are directly relevant to the completion of the tasks, and 2) designing policies for certain tasks that do not need to be learned. We derive an end-to-end learning objective function that is suitable for the multi-task problem, emphasizing the deviation of motions on task spaces. Furthermore, the motion generated from the learned policy class is guaranteed to be stable. We validate the effectiveness of our proposed learning framework through qualitative and quantitative evaluations on three robotic tasks on a 7-DOF Rethink Sawyer robot. △ Less

Submitted 10 March, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

arXiv:2010.15676 [pdf, other]

Optimization Fabrics for Behavioral Design

Authors: Nathan D. Ratliff, Karl Van Wyk, Mandy Xie, Anqi Li, Muhammad Asif Rana

Abstract: A common approach to the provably stable design of reactive behavior, exemplified by operational space control, is to reduce the problem to the design of virtual classical mechanical systems (energy sha**). This framework is widely used, and through it we gain stability, but at the price of expressivity. This work presents a comprehensive theoretical framework expanding this approach showing tha… ▽ More A common approach to the provably stable design of reactive behavior, exemplified by operational space control, is to reduce the problem to the design of virtual classical mechanical systems (energy sha**). This framework is widely used, and through it we gain stability, but at the price of expressivity. This work presents a comprehensive theoretical framework expanding this approach showing that there is a much larger class of differential equations generalizing classical mechanical systems (and the broader class of Lagrangian systems) and greatly expanding their expressivity while maintaining the same governing stability principles. At the core of our framework is a class of differential equations we call fabrics which constitute a behavioral medium across which we can optimize a potential function. These fabrics shape the system's behavior during optimization but still always provably converge to a local minimum, making them a building block of stable behavioral design. We build the theoretical foundations of our framework here and provide a simple empirical demonstration of a practical class of geometric fabrics, which additionally exhibit a natural geometric path consistency making them convenient for flexible and intuitive behavioral design. △ Less

Submitted 25 June, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2008.02399

arXiv:2010.14750 [pdf, other]

Geometric Fabrics for the Acceleration-based Design of Robotic Motion

Authors: Mandy Xie, Karl Van Wyk, Anqi Li, Muhammad Asif Rana, Qian Wan, Dieter Fox, Byron Boots, Nathan Ratliff

Abstract: This paper describes the pragmatic design and construction of geometric fabrics for sha** a robot's task-independent nominal behavior, capturing behavioral components such as obstacle avoidance, joint limit avoidance, redundancy resolution, global navigation heuristics, etc. Geometric fabrics constitute the most concrete incarnation of a new mathematical formulation for reactive behavior called… ▽ More This paper describes the pragmatic design and construction of geometric fabrics for sha** a robot's task-independent nominal behavior, capturing behavioral components such as obstacle avoidance, joint limit avoidance, redundancy resolution, global navigation heuristics, etc. Geometric fabrics constitute the most concrete incarnation of a new mathematical formulation for reactive behavior called optimization fabrics. Fabrics generalize recent work on Riemannian Motion Policies (RMPs); they add provable stability guarantees and improve design consistency while promoting the intuitive acceleration-based principles of modular design that make RMPs successful. We describe a suite of mathematical modeling tools that practitioners can employ in practice and demonstrate both how to mitigate system complexity by constructing behaviors layer-wise and how to employ these tools to design robust, strongly-generalizing, policies that solve practical problems one would expect to find in industry applications. Our system exhibits intelligent global navigation behaviors expressed entirely as provably stable fabrics with zero planning or state machine governance. △ Less

Submitted 25 June, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

arXiv:2010.14745 [pdf, other]

Generalized Nonlinear and Finsler Geometry for Robotics

Authors: Nathan D. Ratliff, Karl Van Wyk, Mandy Xie, Anqi Li, Muhammad Asif Rana

Abstract: Robotics research has found numerous important applications of Riemannian geometry. Despite that, the concept remain challenging to many roboticists because the background material is complex and strikingly foreign. Beyond {\em Riemannian} geometry, there are many natural generalizations in the mathematical literature -- areas such as Finsler geometry and spray geometry -- but those generalization… ▽ More Robotics research has found numerous important applications of Riemannian geometry. Despite that, the concept remain challenging to many roboticists because the background material is complex and strikingly foreign. Beyond {\em Riemannian} geometry, there are many natural generalizations in the mathematical literature -- areas such as Finsler geometry and spray geometry -- but those generalizations are largely inaccessible, and as a result there remain few applications within robotics. This paper presents a re-derivation of spray and Finsler geometries we found critical for the development of our recent work on a powerful behavioral design tool we call geometric fabrics. These derivations build from basic tools in advanced calculus and the calculus of variations making them more accessible to a robotics audience than standard presentations. We focus on the pragmatic and calculable results, avoiding the use of tensor notation to appeal to a broader audience, emphasizing geometric path consistency over ideas around connections and curvature. We hope that these derivations will contribute to an increased understanding of generalized nonlinear, and even classical Riemannian, geometry within the robotics community and inspire future research into new applications. △ Less

Submitted 2 July, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

arXiv:2008.02399 [pdf, other]

Optimization Fabrics

Authors: Nathan D. Ratliff, Karl Van Wyk, Mandy Xie, Anqi Li, Muhammad Asif Rana

Abstract: This paper presents a theory of optimization fabrics, second-order differential equations that encode nominal behaviors on a space and can be used to define the behavior of a smooth optimizer. Optimization fabrics can encode commonalities among optimization problems that reflect the structure of the space itself, enabling smooth optimization processes to intelligently navigate each problem even wh… ▽ More This paper presents a theory of optimization fabrics, second-order differential equations that encode nominal behaviors on a space and can be used to define the behavior of a smooth optimizer. Optimization fabrics can encode commonalities among optimization problems that reflect the structure of the space itself, enabling smooth optimization processes to intelligently navigate each problem even when optimizing simple naive potential functions. Importantly, optimization over a fabric is inherently asymptotically stable. The majority of this paper is dedicated to the development of a tool set for the design and use of a broad class of fabrics called geometric fabrics. Geometric fabrics encode behavior as general nonlinear geometries which are covariant second-order differential equations with a special homogeneity property that ensures their behavior is independent of the system's speed through the medium. A class of Finsler Lagrangian energies can be used to both define how these nonlinear geometries combine with one another and how they react when potential functions force them from their nominal paths. Furthermore, these geometric fabrics are closed under the standard operations of pullback and combination on a transform tree. For behavior representation, this class of geometric fabrics constitutes a broad class of spectral semi-sprays (specs), also known as Riemannian Motion Policies (RMPs) in the context of robotic motion generation, that captures both the intuitive separation between acceleration policy and priority metric critical for modular design and are inherently stable. Therefore, geometric fabrics are safe and easier to use by less experienced behavioral designers. Application of this theory to policy representation and generalization in learning are discussed as well. △ Less

Submitted 21 August, 2020; v1 submitted 5 August, 2020; originally announced August 2020.

arXiv:2007.14256 [pdf, other]

RMPflow: A Geometric Framework for Generation of Multi-Task Motion Policies

Authors: Ching-An Cheng, Mustafa Mukadam, Jan Issac, Stan Birchfield, Dieter Fox, Byron Boots, Nathan Ratliff

Abstract: Generating robot motion for multiple tasks in dynamic environments is challenging, requiring an algorithm to respond reactively while accounting for complex nonlinear relationships between tasks. In this paper, we develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs). RMPs are a class of reactive motion policies… ▽ More Generating robot motion for multiple tasks in dynamic environments is challenging, requiring an algorithm to respond reactively while accounting for complex nonlinear relationships between tasks. In this paper, we develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs). RMPs are a class of reactive motion policies that parameterize non-Euclidean behaviors as dynamical systems in intrinsically nonlinear task spaces. Given a set of RMPs designed for individual tasks, RMPflow can combine these policies to generate an expressive global policy, while simultaneously exploiting sparse structure for computational efficiency. We study the geometric properties of RMPflow and provide sufficient conditions for stability. Finally, we experimentally demonstrate that accounting for the natural Riemannian geometry of task policies can simplify classically difficult problems, such as planning through clutter on high-DOF manipulation systems. △ Less

Submitted 24 July, 2020; originally announced July 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1811.07049

arXiv:2007.04842 [pdf, other]

An Interior Point Method Solving Motion Planning Problems with Narrow Passages

Authors: Jim Mainprice, Nathan Ratliff, Marc Toussaint, Stefan Schaal

Abstract: Algorithmic solutions for the motion planning problem have been investigated for five decades. Since the development of A* in 1969 many approaches have been investigated, traditionally classified as either grid decomposition, potential fields or sampling-based. In this work, we focus on using numerical optimization, which is understudied for solving motion planning problems. This lack of interest… ▽ More Algorithmic solutions for the motion planning problem have been investigated for five decades. Since the development of A* in 1969 many approaches have been investigated, traditionally classified as either grid decomposition, potential fields or sampling-based. In this work, we focus on using numerical optimization, which is understudied for solving motion planning problems. This lack of interest in the favor of sampling-based methods is largely due to the non-convexity introduced by narrow passages. We address this shortcoming by grounding the solution in differential geometry. We demonstrate through a series of experiments on 3 Dofs and 6 Dofs narrow passage problems, how modeling explicitly the underlying Riemannian manifold leads to an efficient interior-point non-linear programming solution. △ Less

Submitted 24 July, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

Comments: IEEE RO-MAN 2020, 6 pages

arXiv:2006.03106 [pdf, other]

doi 10.1109/LRA.2020.2972836

Model-Based Generalization Under Parameter Uncertainty Using Path Integral Control

Authors: Ian Abraham, Ankur Handa, Nathan Ratliff, Kendall Lowrey, Todd D. Murphey, Dieter Fox

Abstract: This work addresses the problem of robot interaction in complex environments where online control and adaptation is necessary. By expanding the sample space in the free energy formulation of path integral control, we derive a natural extension to the path integral control that embeds uncertainty into action and provides robustness for model-based robot planning. Our algorithm is applied to a diver… ▽ More This work addresses the problem of robot interaction in complex environments where online control and adaptation is necessary. By expanding the sample space in the free energy formulation of path integral control, we derive a natural extension to the path integral control that embeds uncertainty into action and provides robustness for model-based robot planning. Our algorithm is applied to a diverse set of tasks using different robots and validate our results in simulation and real-world experiments. We further show that our method is capable of running in real-time without loss of performance. Videos of the experiments as well as additional implementation details can be found at https://sites.google.com/view/emppi. △ Less

Submitted 4 June, 2020; originally announced June 2020.

Journal ref: IEEE Robotics and Automation Letters ( Volume: 5 , Issue: 2 , April 2020 )

arXiv:2005.13143 [pdf, other]

Euclideanizing Flows: Diffeomorphic Reduction for Learning Stable Dynamical Systems

Authors: Muhammad Asif Rana, Anqi Li, Dieter Fox, Byron Boots, Fabio Ramos, Nathan Ratliff

Abstract: Robotic tasks often require motions with complex geometric structures. We present an approach to learn such motions from a limited number of human demonstrations by exploiting the regularity properties of human motions e.g. stability, smoothness, and boundedness. The complex motions are encoded as rollouts of a stable dynamical system, which, under a change of coordinates defined by a diffeomorphi… ▽ More Robotic tasks often require motions with complex geometric structures. We present an approach to learn such motions from a limited number of human demonstrations by exploiting the regularity properties of human motions e.g. stability, smoothness, and boundedness. The complex motions are encoded as rollouts of a stable dynamical system, which, under a change of coordinates defined by a diffeomorphism, is equivalent to a simple, hand-specified dynamical system. As an immediate result of using diffeomorphisms, the stability property of the hand-specified dynamical system directly carry over to the learned dynamical system. Inspired by recent works in density estimation, we propose to represent the diffeomorphism as a composition of simple parameterized diffeomorphisms. Additional structure is imposed to provide guarantees on the smoothness of the generated motions. The efficacy of this approach is demonstrated through validation on an established benchmark as well demonstrations collected on a real-world robotic system. △ Less

Submitted 21 September, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

Comments: 2nd Annual Conference on Learning for Dynamics and Control (L4DC) 2020 -- Revised Version

arXiv:2005.10872 [pdf, other]

Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning

Authors: Michelle A. Lee, Carlos Florensa, Jonathan Tremblay, Nathan Ratliff, Animesh Garg, Fabio Ramos, Dieter Fox

Abstract: Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. On the other hand, reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this w… ▽ More Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. On the other hand, reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline, while requiring minimal interactions with the environment. This is achieved by leveraging uncertainty estimates to divide the space in regions where the given model-based policy is reliable, and regions where it may have flaws or not be well defined. In these uncertain regions, we show that a locally learned-policy can be used directly with raw sensory inputs. We test our algorithm, Guided Uncertainty-Aware Policy Optimization (GUAPO), on a real-world robot performing peg insertion. Videos are available at https://sites.google.com/view/guapo-rl △ Less

Submitted 26 May, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

Journal ref: International Conference in Robotics and Automation 2020

arXiv:1910.04339 [pdf, other]

Collaborative Behavior Models for Optimized Human-Robot Teamwork

Authors: Adam Fishman, Chris Paxton, Wei Yang, Dieter Fox, Byron Boots, Nathan Ratliff

Abstract: Effective human-robot collaboration requires informed anticipation. The robot must anticipate the human's actions, but also react quickly and intuitively when its predictions are wrong. The robot must plan its actions to account for the human's own plan, with the knowledge that the human's behavior will change based on what the robot actually does. This cyclical game of predicting a human's future… ▽ More Effective human-robot collaboration requires informed anticipation. The robot must anticipate the human's actions, but also react quickly and intuitively when its predictions are wrong. The robot must plan its actions to account for the human's own plan, with the knowledge that the human's behavior will change based on what the robot actually does. This cyclical game of predicting a human's future actions and generating a corresponding motion plan is extremely difficult to model using standard techniques. In this work, we describe a novel Model Predictive Control (MPC)-based framework for finding optimal trajectories in a collaborative, multi-agent setting, in which we simultaneously plan for the robot while predicting the actions of its external collaborators. We use human-robot handovers to demonstrate that with a strong model of the collaborator, our framework produces fluid, reactive human-robot interactions in novel, cluttered environments. Our method efficiently generates coordinated trajectories, and achieves a high success rate in handover, even in the presence of significant sensor noise. △ Less

Submitted 3 September, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: 3 figures, 7 pages

arXiv:1910.03135 [pdf, other]

DexPilot: Vision Based Teleoperation of Dexterous Robotic Hand-Arm System

Authors: Ankur Handa, Karl Van Wyk, Wei Yang, Jacky Liang, Yu-Wei Chao, Qian Wan, Stan Birchfield, Nathan Ratliff, Dieter Fox

Abstract: Teleoperation offers the possibility of imparting robotic systems with sophisticated reasoning skills, intuition, and creativity to perform tasks. However, current teleoperation solutions for high degree-of-actuation (DoA), multi-fingered robots are generally cost-prohibitive, while low-cost offerings usually provide reduced degrees of control. Herein, a low-cost, vision based teleoperation system… ▽ More Teleoperation offers the possibility of imparting robotic systems with sophisticated reasoning skills, intuition, and creativity to perform tasks. However, current teleoperation solutions for high degree-of-actuation (DoA), multi-fingered robots are generally cost-prohibitive, while low-cost offerings usually provide reduced degrees of control. Herein, a low-cost, vision based teleoperation system, DexPilot, was developed that allows for complete control over the full 23 DoA robotic system by merely observing the bare human hand. DexPilot enables operators to carry out a variety of complex manipulation tasks that go beyond simple pick-and-place operations. This allows for collection of high dimensional, multi-modality, state-action data that can be leveraged in the future to learn sensorimotor policies for challenging manipulation tasks. The system performance was measured through speed and reliability metrics across two human demonstrators on a variety of tasks. The videos of the experiments can be found at https://sites.google.com/view/dex-pilot. △ Less

Submitted 14 October, 2019; v1 submitted 7 October, 2019; originally announced October 2019.

Comments: 17 pages, first version of DexPilot

arXiv:1910.02646 [pdf, other]

Riemannian Motion Policy Fusion through Learnable Lyapunov Function Resha**

Authors: Mustafa Mukadam, Ching-An Cheng, Dieter Fox, Byron Boots, Nathan Ratliff

Abstract: RMPflow is a recently proposed policy-fusion framework based on differential geometry. While RMPflow has demonstrated promising performance, it requires the user to provide sensible subtask policies as Riemannian motion policies (RMPs: a motion policy and an importance matrix function), which can be a difficult design problem in its own right. We propose RMPfusion, a variation of RMPflow, to addre… ▽ More RMPflow is a recently proposed policy-fusion framework based on differential geometry. While RMPflow has demonstrated promising performance, it requires the user to provide sensible subtask policies as Riemannian motion policies (RMPs: a motion policy and an importance matrix function), which can be a difficult design problem in its own right. We propose RMPfusion, a variation of RMPflow, to address this issue. RMPfusion supplements RMPflow with weight functions that can hierarchically reshape the Lyapunov functions of the subtask RMPs according to the current configuration of the robot and environment. This extra flexibility can remedy imperfect subtask RMPs provided by the user, improving the combined policy's performance. These weight functions can be learned by back-propagation. Moreover, we prove that, under mild restrictions on the weight functions, RMPfusion always yields a globally Lyapunov-stable motion policy. This implies that we can treat RMPfusion as a structured policy class in policy optimization that is guaranteed to generate stable policies, even during the immature phase of learning. We demonstrate these properties of RMPfusion in imitation learning experiments both in simulation and on a real-world robot. △ Less

Submitted 8 October, 2019; v1 submitted 7 October, 2019; originally announced October 2019.

Comments: Conference on Robot Learning (CoRL), 2019

arXiv:1909.12329 [pdf, other]

Scaling Local Control to Large-Scale Topological Navigation

Authors: Xiangyun Meng, Nathan Ratliff, Yu Xiang, Dieter Fox

Abstract: Visual topological navigation has been revitalized recently thanks to the advancement of deep learning that substantially improves robot perception. However, the scalability and reliability issue remain challenging due to the complexity and ambiguity of real world images and mechanical constraints of real robots. We present an intuitive solution to show that by accurately measuring the capability… ▽ More Visual topological navigation has been revitalized recently thanks to the advancement of deep learning that substantially improves robot perception. However, the scalability and reliability issue remain challenging due to the complexity and ambiguity of real world images and mechanical constraints of real robots. We present an intuitive solution to show that by accurately measuring the capability of a local controller, large-scale visual topological navigation can be achieved while being scalable and robust. Our approach achieves state-of-the-art results in trajectory following and planning in large-scale environments. It also generalizes well to real robots and new environments without retraining or finetuning. △ Less

Submitted 18 March, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

arXiv:1908.01896 [pdf, other]

Representing Robot Task Plans as Robust Logical-Dynamical Systems

Authors: Chris Paxton, Nathan Ratliff, Clemens Eppner, Dieter Fox

Abstract: It is difficult to create robust, reusable, and reactive behaviors for robots that can be easily extended and combined. Frameworks such as Behavior Trees are flexible but difficult to characterize, especially when designing reactions and recovery behaviors to consistently converge to a desired goal condition. We propose a framework which we call Robust Logical-Dynamical Systems (RLDS), which combi… ▽ More It is difficult to create robust, reusable, and reactive behaviors for robots that can be easily extended and combined. Frameworks such as Behavior Trees are flexible but difficult to characterize, especially when designing reactions and recovery behaviors to consistently converge to a desired goal condition. We propose a framework which we call Robust Logical-Dynamical Systems (RLDS), which combines the advantages of task representations like behavior trees with theoretical guarantees on performance. RLDS can also be constructed automatically from simple sequential task plans and will still achieve robust, reactive behavior in dynamic real-world environments. In this work, we describe both our proposed framework and a case study on a simple household manipulation task, with examples for how specific pieces can be implemented to achieve robust behavior. Finally, we show how in the context of these manipulation tasks, a combination of an RLDS with planning can achieve better results under adversarial conditions. △ Less

Submitted 5 August, 2019; originally announced August 2019.

Comments: 9 pages, extended version of IROS 2019 paper

arXiv:1904.01762 [pdf, other]

Neural Autonomous Navigation with Riemannian Motion Policy

Authors: Xiangyun Meng, Nathan Ratliff, Yu Xiang, Dieter Fox

Abstract: End-to-end learning for autonomous navigation has received substantial attention recently as a promising method for reducing modeling error. However, its data complexity, especially around generalization to unseen environments, is high. We introduce a novel image-based autonomous navigation technique that leverages in policy structure using the Riemannian Motion Policy (RMP) framework for deep lea… ▽ More End-to-end learning for autonomous navigation has received substantial attention recently as a promising method for reducing modeling error. However, its data complexity, especially around generalization to unseen environments, is high. We introduce a novel image-based autonomous navigation technique that leverages in policy structure using the Riemannian Motion Policy (RMP) framework for deep learning of vehicular control. We design a deep neural network to predict control point RMPs of the vehicle from visual images, from which the optimal control commands can be computed analytically. We show that our network trained in the Gibson environment can be used for indoor obstacle avoidance and navigation on a real RC car, and our RMP representation generalizes better to unseen environments than predicting local geometry or predicting control commands directly. △ Less

Submitted 3 April, 2019; originally announced April 2019.

arXiv:1903.03699 [pdf, other]

Joint Inference of Kinematic and Force Trajectories with Visuo-Tactile Sensing

Authors: Alexander Lambert, Mustafa Mukadam, Balakumar Sundaralingam, Nathan Ratliff, Byron Boots, Dieter Fox

Abstract: To perform complex tasks, robots must be able to interact with and manipulate their surroundings. One of the key challenges in accomplishing this is robust state estimation during physical interactions, where the state involves not only the robot and the object being manipulated, but also the state of the contact itself. In this work, within the context of planar pushing, we extend previous infere… ▽ More To perform complex tasks, robots must be able to interact with and manipulate their surroundings. One of the key challenges in accomplishing this is robust state estimation during physical interactions, where the state involves not only the robot and the object being manipulated, but also the state of the contact itself. In this work, within the context of planar pushing, we extend previous inference-based approaches to state estimation in several ways. We estimate the robot, object, and the contact state on multiple manipulation platforms configured with a vision-based articulated model tracker, and either a biomimetic tactile sensor or a force-torque sensor. We show how to fuse raw measurements from the tracker and tactile sensors to jointly estimate the trajectory of the kinematic states and the forces in the system via probabilistic inference on factor graphs, in both batch and incremental settings. We perform several benchmarks with our framework and show how performance is affected by incorporating various geometric and physics based constraints, occluding vision sensors, or injecting noise in tactile sensors. We also compare with prior work on multiple datasets and demonstrate that our approach can effectively optimize over multi-modal sensor data and reduce uncertainty to find better state estimates. △ Less

Submitted 8 March, 2019; originally announced March 2019.

arXiv:1811.07049 [pdf, other]

RMPflow: A Computational Graph for Automatic Motion Policy Generation

Authors: Ching-An Cheng, Mustafa Mukadam, Jan Issac, Stan Birchfield, Dieter Fox, Byron Boots, Nathan Ratliff

Abstract: We develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs). RMPs are a class of reactive motion policies designed to parameterize non-Euclidean behaviors as dynamical systems in intrinsically nonlinear task spaces. Given a set of RMPs designed for individual tasks, RMPflow can consistently combine these local polic… ▽ More We develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs). RMPs are a class of reactive motion policies designed to parameterize non-Euclidean behaviors as dynamical systems in intrinsically nonlinear task spaces. Given a set of RMPs designed for individual tasks, RMPflow can consistently combine these local policies to generate an expressive global policy, while simultaneously exploiting sparse structure for computational efficiency. We study the geometric properties of RMPflow and provide sufficient conditions for stability. Finally, we experimentally demonstrate that accounting for the geometry of task policies can simplify classically difficult problems, such as planning through clutter on high-DOF manipulation systems. △ Less

Submitted 5 April, 2019; v1 submitted 16 November, 2018; originally announced November 2018.

Comments: WAFR 2018

arXiv:1811.03704 [pdf, other]

doi 10.1109/ICRA.2019.8793520

Learning Latent Space Dynamics for Tactile Servoing

Authors: Giovanni Sutanto, Nathan Ratliff, Balakumar Sundaralingam, Yevgen Chebotar, Zhe Su, Ankur Handa, Dieter Fox

Abstract: To achieve a dexterous robotic manipulation, we need to endow our robot with tactile feedback capability, i.e. the ability to drive action based on tactile sensing. In this paper, we specifically address the challenge of tactile servoing, i.e. given the current tactile sensing and a target/goal tactile sensing --memorized from a successful task execution in the past-- what is the action that will… ▽ More To achieve a dexterous robotic manipulation, we need to endow our robot with tactile feedback capability, i.e. the ability to drive action based on tactile sensing. In this paper, we specifically address the challenge of tactile servoing, i.e. given the current tactile sensing and a target/goal tactile sensing --memorized from a successful task execution in the past-- what is the action that will bring the current tactile sensing to move closer towards the target tactile sensing at the next time step. We develop a data-driven approach to acquire a dynamics model for tactile servoing by learning from demonstration. Moreover, our method represents the tactile sensing information as to lie on a surface --or a 2D manifold-- and perform a manifold learning, making it applicable to any tactile skin geometry. We evaluate our method on a contact point tracking task using a robot equipped with a tactile finger. A video demonstrating our approach can be seen in https://youtu.be/0QK0-Vx7WkI △ Less

Submitted 15 April, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

Comments: Accepted to be published at the International Conference on Robotics and Automation (ICRA) 2019. The final version for publication at ICRA 2019 is 7 pages (i.e. 6 pages of technical content (including text, figures, tables, acknowledgement, etc.) and 1 page of the Bibliography/References), while this arXiv version is 8 pages (added Appendix and some extra details)

arXiv:1810.06509 [pdf, other]

Predictor-Corrector Policy Optimization

Authors: Ching-An Cheng, Xinyan Yan, Nathan Ratliff, Byron Boots

Abstract: We present a predictor-corrector framework, called PicCoLO, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning. The new "PicCoLOed" algorithm optimizes a policy by recursively repeating two steps: In the Prediction Step, the learner uses a model to predict the unseen futur… ▽ More We present a predictor-corrector framework, called PicCoLO, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning. The new "PicCoLOed" algorithm optimizes a policy by recursively repeating two steps: In the Prediction Step, the learner uses a model to predict the unseen future gradient and then applies the predicted estimate to update the policy; in the Correction Step, the learner runs the updated policy in the environment, receives the true gradient, and then corrects the policy using the gradient error. Unlike previous algorithms, PicCoLO corrects for the mistakes of using imperfect predicted gradients and hence does not suffer from model bias. The development of PicCoLO is made possible by a novel reduction from predictable online learning to adversarial online learning, which provides a systematic way to modify existing first-order algorithms to achieve the optimal regret with respect to predictable information. We show, in both theory and simulation, that the convergence rate of several first-order model-free algorithms can be improved by PicCoLO. △ Less

Submitted 24 May, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

arXiv:1810.06187 [pdf, other]

Robust Learning of Tactile Force Estimation through Robot Interaction

Authors: Balakumar Sundaralingam, Alexander Lambert, Ankur Handa, Byron Boots, Tucker Hermans, Stan Birchfield, Nathan Ratliff, Dieter Fox

Abstract: Current methods for estimating force from tactile sensor signals are either inaccurate analytic models or task-specific learned models. In this paper, we explore learning a robust model that maps tactile sensor signals to force. We specifically explore learning a map** for the SynTouch BioTac sensor via neural networks. We propose a voxelized input feature layer for spatial signals and leverage… ▽ More Current methods for estimating force from tactile sensor signals are either inaccurate analytic models or task-specific learned models. In this paper, we explore learning a robust model that maps tactile sensor signals to force. We specifically explore learning a map** for the SynTouch BioTac sensor via neural networks. We propose a voxelized input feature layer for spatial signals and leverage information about the sensor surface to regularize the loss function. To learn a robust tactile force model that transfers across tasks, we generate ground truth data from three different sources: (1) the BioTac rigidly mounted to a force torque~(FT) sensor, (2) a robot interacting with a ball rigidly attached to the same FT sensor, and (3) through force inference on a planar pushing task by formalizing the mechanics as a system of particles and optimizing over the object motion. A total of 140k samples were collected from the three sources. We achieve a median angular accuracy of 3.5 degrees in predicting force direction (66% improvement over the current state of the art) and a median magnitude accuracy of 0.06 N (93% improvement) on a test dataset. Additionally, we evaluate the learned force model in a force feedback grasp controller performing object lifting and gentle placement. Our results can be found on https://sites.google.com/view/tactile-force. △ Less

Submitted 5 March, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

Comments: accepted to ICRA 2019 (camera ready version)

arXiv:1810.05687 [pdf, other]

Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience

Authors: Yevgen Chebotar, Ankur Handa, Viktor Makoviychuk, Miles Macklin, Jan Issac, Nathan Ratliff, Dieter Fox

Abstract: We consider the problem of transferring policies to the real world by training on a distribution of simulated scenarios. Rather than manually tuning the randomization of simulations, we adapt the simulation parameter distribution using a few real world roll-outs interleaved with policy training. In doing so, we are able to change the distribution of simulations to improve the policy transfer by ma… ▽ More We consider the problem of transferring policies to the real world by training on a distribution of simulated scenarios. Rather than manually tuning the randomization of simulations, we adapt the simulation parameter distribution using a few real world roll-outs interleaved with policy training. In doing so, we are able to change the distribution of simulations to improve the policy transfer by matching the policy behavior in simulation and the real world. We show that policies trained with our method are able to reliably transfer to different robots in two real world tasks: swing-peg-in-hole and opening a cabinet drawer. The video of our experiments can be found at https://sites.google.com/view/simopt △ Less

Submitted 5 March, 2019; v1 submitted 12 October, 2018; originally announced October 2018.

arXiv:1801.02854 [pdf, other]

Riemannian Motion Policies

Authors: Nathan D. Ratliff, Jan Issac, Daniel Kappler, Stan Birchfield, Dieter Fox

Abstract: We introduce the Riemannian Motion Policy (RMP), a new mathematical object for modular motion generation. An RMP is a second-order dynamical system (acceleration field or motion policy) coupled with a corresponding Riemannian metric. The motion policy maps positions and velocities to accelerations, while the metric captures the directions in the space important to the policy. We show that RMPs pro… ▽ More We introduce the Riemannian Motion Policy (RMP), a new mathematical object for modular motion generation. An RMP is a second-order dynamical system (acceleration field or motion policy) coupled with a corresponding Riemannian metric. The motion policy maps positions and velocities to accelerations, while the metric captures the directions in the space important to the policy. We show that RMPs provide a straightforward and convenient method for combining multiple motion policies and transforming such policies from one space (such as the task space) to another (such as the configuration space) in geometrically consistent ways. The operators we derive for these combinations and transformations are provably optimal, have linearity properties making them agnostic to the order of application, and are strongly analogous to the covariant transformations of natural gradients popular in the machine learning literature. The RMP framework enables the fusion of motion policies from different motion generation paradigms, such as dynamical systems, dynamic movement primitives (DMPs), optimal control, operational space control, nonlinear reactive controllers, motion optimization, and model predictive control (MPC), thus unifying these disparate techniques from the literature. RMPs are easy to implement and manipulate, facilitate controller design, simplify handling of joint limits, and clarify a number of open questions regarding the proper fusion of motion generation methods (such as incorporating local reactive policies into long-horizon optimizers). We demonstrate the effectiveness of RMPs on both simulation and real robots, including their ability to naturally and efficiently solve complicated collision avoidance problems previously handled by more complex planners. △ Less

Submitted 25 July, 2018; v1 submitted 9 January, 2018; originally announced January 2018.

arXiv:1710.02513 [pdf, other]

A New Data Source for Inverse Dynamics Learning

Authors: Daniel Kappler, Franziska Meier, Nathan Ratliff, Stefan Schaal

Abstract: Modern robotics is gravitating toward increasingly collaborative human robot interaction. Tools such as acceleration policies can naturally support the realization of reactive, adaptive, and compliant robots. These tools require us to model the system dynamics accurately -- a difficult task. The fundamental problem remains that simulation and reality diverge--we do not know how to accurately chang… ▽ More Modern robotics is gravitating toward increasingly collaborative human robot interaction. Tools such as acceleration policies can naturally support the realization of reactive, adaptive, and compliant robots. These tools require us to model the system dynamics accurately -- a difficult task. The fundamental problem remains that simulation and reality diverge--we do not know how to accurately change a robot's state. Thus, recent research on improving inverse dynamics models has been focused on making use of machine learning techniques. Traditional learning techniques train on the actual realized accelerations, instead of the policy's desired accelerations, which is an indirect data source. Here we show how an additional training signal -- measured at the desired accelerations -- can be derived from a feedback control signal. This effectively creates a second data source for learning inverse dynamics models. Furthermore, we show how both the traditional and this new data source, can be used to train task-specific models of the inverse dynamics, when used independently or combined. We analyze the use of both data sources in simulation and demonstrate its effectiveness on a real-world robotic platform. We show that our system incrementally improves the learned inverse dynamics model, and when using both data sources combined converges more consistently and faster. △ Less

Submitted 6 October, 2017; originally announced October 2017.

Comments: IROS 2017

arXiv:1703.03512 [pdf, other]

Real-time Perception meets Reactive Motion Generation

Authors: Daniel Kappler, Franziska Meier, Jan Issac, Jim Mainprice, Cristina Garcia Cifuentes, Manuel Wüthrich, Vincent Berenz, Stefan Schaal, Nathan Ratliff, Jeannette Bohg

Abstract: We address the challenging problem of robotic gras** and manipulation in the presence of uncertainty. This uncertainty is due to noisy sensing, inaccurate models and hard-to-predict environment dynamics. We quantify the importance of continuous, real-time perception and its tight integration with reactive motion generation methods in dynamic manipulation scenarios. We compare three different sys… ▽ More We address the challenging problem of robotic gras** and manipulation in the presence of uncertainty. This uncertainty is due to noisy sensing, inaccurate models and hard-to-predict environment dynamics. We quantify the importance of continuous, real-time perception and its tight integration with reactive motion generation methods in dynamic manipulation scenarios. We compare three different systems that are instantiations of the most common architectures in the field: (i) a traditional sense-plan-act approach that is still widely used, (ii) a myopic controller that only reacts to local environment dynamics and (iii) a reactive planner that integrates feedback control and motion optimization. All architectures rely on the same components for real-time perception and reactive motion generation to allow a quantitative evaluation. We extensively evaluate the systems on a real robotic platform in four scenarios that exhibit either a challenging workspace geometry or a dynamic environment. In 333 experiments, we quantify the robustness and accuracy that is due to integrating real-time feedback at different time scales in a reactive motion generation system. We also report on the lessons learned for system building. △ Less

Submitted 6 October, 2017; v1 submitted 9 March, 2017; originally announced March 2017.

arXiv:1608.00309 [pdf, other]

DOOMED: Direct Online Optimization of Modeling Errors in Dynamics

Authors: Nathan Ratliff, Franziska Meier, Daniel Kappler, Stefan Schaal

Abstract: It has long been hoped that model-based control will improve tracking performance while maintaining or increasing compliance. This hope hinges on having or being able to estimate an accurate inverse dynamics model. As a result, substantial effort has gone into modeling and estimating dynamics (error) models. Most recent research has focused on learning the true inverse dynamics using data points m… ▽ More It has long been hoped that model-based control will improve tracking performance while maintaining or increasing compliance. This hope hinges on having or being able to estimate an accurate inverse dynamics model. As a result, substantial effort has gone into modeling and estimating dynamics (error) models. Most recent research has focused on learning the true inverse dynamics using data points map** observed accelerations to the torques used to generate them. Unfortunately, if the initial tracking error is bad, such learning processes may train substantially off-distribution to predict well on actual observed acceleration rather then the desired accelerations. This work takes a different approach. We define a class of gradient-based online learning algorithms we term Direct Online Optimization for Modeling Errors in Dynamics (DOOMED) that directly minimize an objective measuring the divergence between actual and desired accelerations. Our objective is defined in terms of the true system's unknown dynamics and is therefore impossible to evaluate. However, we show that its gradient is measurable online from system data. We develop a novel adaptive control approach based on running online learning to directly correct (inverse) dynamics errors in real time using the data stream from the robot to accurately achieve desired accelerations during execution. △ Less

Submitted 9 August, 2016; v1 submitted 31 July, 2016; originally announced August 2016.

Comments: Added an acknowledgements section

arXiv:1605.09296 [pdf, other]

On the Fundamental Importance of Gauss-Newton in Motion Optimization

Authors: Nathan Ratliff, Marc Toussaint, Jeannette Bohg, Stefan Schaal

Abstract: Hessian information speeds convergence substantially in motion optimization. The better the Hessian approximation the better the convergence. But how good is a given approximation theoretically? How much are we losing? This paper addresses that question and proves that for a particularly popular and empirically strong approximation known as the Gauss-Newton approximation, we actually lose very lit… ▽ More Hessian information speeds convergence substantially in motion optimization. The better the Hessian approximation the better the convergence. But how good is a given approximation theoretically? How much are we losing? This paper addresses that question and proves that for a particularly popular and empirically strong approximation known as the Gauss-Newton approximation, we actually lose very little--for a large class of highly expressive objective terms, the true Hessian actually limits to the Gauss-Newton Hessian quickly as the trajectory's time discretization becomes small. This result both motivates it's use and offers insight into computationally efficient design. For instance, traditional representations of kinetic energy exploit the generalized inertia matrix whose derivatives are usually difficult to compute. We introduce here a novel reformulation of rigid body kinetic energy designed explicitly for fast and accurate curvature calculation. Our theorem proves that the Gauss-Newton Hessian under this formulation efficiently captures the kinetic energy curvature, but requires only as much computation as a single evaluation of the traditional representation. Additionally, we introduce a technique that exploits these ideas implicitly using Cholesky decompositions for some cases when similar objective terms reformulations exist but may be difficult to find. Our experiments validate these findings and demonstrate their use on a real-world motion optimization system for high-dof motion generation. △ Less

Submitted 30 May, 2016; originally announced May 2016.

arXiv:1503.06375 [pdf, other]

Policy Learning with Hypothesis based Local Action Selection

Authors: Bharath Sankaran, Jeannette Bohg, Nathan Ratliff, Stefan Schaal

Abstract: For robots to be able to manipulate in unknown and unstructured environments the robot should be capable of operating under partial observability of the environment. Object occlusions and unmodeled environments are some of the factors that result in partial observability. A common scenario where this is encountered is manipulation in clutter. In the case that the robot needs to locate an object of… ▽ More For robots to be able to manipulate in unknown and unstructured environments the robot should be capable of operating under partial observability of the environment. Object occlusions and unmodeled environments are some of the factors that result in partial observability. A common scenario where this is encountered is manipulation in clutter. In the case that the robot needs to locate an object of interest and manipulate it, it needs to perform a series of decluttering actions to accurately detect the object of interest. To perform such a series of actions, the robot also needs to account for the dynamics of objects in the environment and how they react to contact. This is a non trivial problem since one needs to reason not only about robot-object interactions but also object-object interactions in the presence of contact. In the example scenario of manipulation in clutter, the state vector would have to account for the pose of the object of interest and the structure of the surrounding environment. The process model would have to account for all the aforementioned robot-object, object-object interactions. The complexity of the process model grows exponentially as the number of objects in the scene increases. This is commonly the case in unstructured environments. Hence it is not reasonable to attempt to model all object-object and robot-object interactions explicitly. Under this setting we propose a hypothesis based action selection algorithm where we construct a hypothesis set of the possible poses of an object of interest given the current evidence in the scene and select actions based on our current set of hypothesis. This hypothesis set tends to represent the belief about the structure of the environment and the number of poses the object of interest can take. The agent's only stop** criterion is when the uncertainty regarding the pose of the object is fully resolved. △ Less

Submitted 8 May, 2015; v1 submitted 21 March, 2015; originally announced March 2015.

Comments: RLDM abstract

arXiv:1202.3702 [pdf]

Semi-supervised Learning with Density Based Distances

Authors: Avleen S. Bijral, Nathan Ratliff, Nathan Srebro

Abstract: We present a simple, yet effective, approach to Semi-Supervised Learning. Our approach is based on estimating density-based distances (DBD) using a shortest path calculation on a graph. These Graph-DBD estimates can then be used in any distance-based supervised learning method, such as Nearest Neighbor methods and SVMs with RBF kernels. In order to apply the method to very large data sets, we also… ▽ More We present a simple, yet effective, approach to Semi-Supervised Learning. Our approach is based on estimating density-based distances (DBD) using a shortest path calculation on a graph. These Graph-DBD estimates can then be used in any distance-based supervised learning method, such as Nearest Neighbor methods and SVMs with RBF kernels. In order to apply the method to very large data sets, we also present a novel algorithm which integrates nearest neighbor computations into the shortest path search and can find exact shortest paths even in extremely large dense graphs. Significant runtime improvement over the commonly used Laplacian regularization method is then shown on a large scale dataset. △ Less

Submitted 14 February, 2012; originally announced February 2012.

Report number: UAI-P-2011-PG-43-50

Showing 1–37 of 37 results for author: Ratliff, N