Search | arXiv e-print repository

arXiv:2404.19264 [pdf, other]

DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets

Authors: Xiaoyu Huang, Yufeng Chi, Ruofeng Wang, Zhongyu Li, Xue Bin Peng, Sophia Shao, Borivoje Nikolic, Koushil Sreenath

Abstract: This work introduces DiffuseLoco, a framework for training multi-skill diffusion-based policies for dynamic legged locomotion from offline datasets, enabling real-time control of diverse skills on robots in the real world. Offline learning at scale has led to breakthroughs in computer vision, natural language processing, and robotic manipulation domains. However, scaling up learning for legged robot locomotion, especially with multiple skills in a single policy, presents significant challenges for prior online reinforcement learning methods. To address this challenge, we propose a novel, scalable framework that leverages diffusion models to directly learn from offline multimodal datasets with a diverse set of locomotion skills. With design choices tailored for real-time control in dynamical systems, including receding horizon control and delayed inputs, DiffuseLoco is capable of reproducing multimodality in performing various locomotion skills, zero-shot transfer to real quadrupedal robots, and it can be deployed on edge computing devices. Furthermore, DiffuseLoco demonstrates free transitions between skills and robustness against environmental variations. Through extensive benchmarking in real-world experiments, DiffuseLoco exhibits better stability and velocity tracking performance compared to prior reinforcement learning and non-diffusion-based behavior cloning baselines. The design choices are validated via comprehensive ablation studies. This work opens new possibilities for scaling up learning-based legged locomotion controllers through the scaling of large, expressive models and diverse offline datasets.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.05291 [pdf, other]

Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models

Authors: Yutao Ouyang, Jinhan Li, Yunfei Li, Zhongyu Li, Chao Yu, Koushil Sreenath, Yi Wu

Abstract: We present a large language model (LLM) based system to empower quadrupedal robots with problem-solving abilities for long-horizon tasks beyond short-term motions. Long-horizon tasks for quadrupeds are challenging since they require both a high-level understanding of the semantics of the problem for task planning and a broad range of locomotion and manipulation skills to interact with the environment. Our system builds a high-level reasoning layer with large language models, which generates hybrid discrete-continuous plans as robot code from task descriptions. It comprises multiple LLM agents: a semantic planner for sketching a plan, a parameter calculator for predicting arguments in the plan, and a code generator to convert the plan into executable robot code. At the low level, we adopt reinforcement learning to train a set of motion planning and control skills to unleash the flexibility of quadrupeds for rich environment interactions. Our system is tested on long-horizon tasks that are infeasible to complete with one single skill. Simulation and real-world experiments show that it successfully figures out multi-step strategies and demonstrates non-trivial behaviors, including building tools or notifying a human for help.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.20328 [pdf, other]

Learning Visual Quadrupedal Loco-Manipulation from Demonstrations

Authors: Zhengmao He, Kun Lei, Yanjie Ze, Koushil Sreenath, Zhongyu Li, Huazhe Xu

Abstract: Quadruped robots are progressively being integrated into human environments. Despite the growing locomotion capabilities of quadrupedal robots, their interaction with objects in realistic scenes is still limited. While additional robotic arms on quadrupedal robots enable manipulating objects, they are sometimes redundant given that a quadruped robot is essentially a mobile unit equipped with four limbs, each possessing 3 degrees of freedom (DoFs). Hence, we aim to empower a quadruped robot to execute real-world manipulation tasks using only its legs. We decompose the loco-manipulation process into a low-level reinforcement learning (RL)-based controller and a high-level Behavior Cloning (BC)-based planner. By parameterizing the manipulation trajectory, we synchronize the efforts of the upper and lower layers, thereby leveraging the advantages of both RL and BC. Our approach is validated through simulations and real-world experiments, demonstrating the robot's ability to perform tasks that demand mobility and high precision, such as lifting a basket from the ground while moving, closing a dishwasher, pressing a button, and pushing a door. Project website: https://zhengmaohe.github.io/leg-manip

Submitted 29 March, 2024; originally announced March 2024.

Comments: Project website: https://zhengmaohe.github.io/leg-manip

arXiv:2403.20001 [pdf, other]

Adaptive Energy Regularization for Autonomous Gait Transition and Energy-Efficient Quadruped Locomotion

Authors: Boyuan Liang, Lingfeng Sun, Xinghao Zhu, Bike Zhang, Ziyin Xiong, Chenran Li, Koushil Sreenath, Masayoshi Tomizuka

Abstract: In reinforcement learning for legged robot locomotion, crafting effective reward strategies is crucial. Pre-defined gait patterns and complex reward systems are widely used to stabilize policy training. Drawing from the natural locomotion behaviors of humans and animals, which adapt their gaits to minimize energy consumption, we propose a simplified, energy-centric reward strategy to foster the development of energy-efficient locomotion across various speeds in quadruped robots. By implementing an adaptive energy reward function and adjusting the weights based on velocity, we demonstrate that our approach enables ANYmal-C and Unitree Go1 robots to autonomously select appropriate gaits, such as four-beat walking at lower speeds and trotting at higher speeds, resulting in improved energy efficiency and stable velocity tracking compared to previous methods using complex reward designs and prior gait knowledge. The effectiveness of our policy is validated through simulations in the IsaacGym simulation environment and on real robots, demonstrating its potential to facilitate stable and adaptive locomotion.

Submitted 29 March, 2024; originally announced March 2024.

Comments: 8 pages, 5 figures

arXiv:2403.17320 [pdf, other]

Leveraging Symmetry in RL-based Legged Locomotion Control

Authors: Zhi Su, Xiaoyu Huang, Daniel Ordoñez-Apraez, Yunfei Li, Zhongyu Li, Qiayuan Liao, Giulio Turrisi, Massimiliano Pontil, Claudio Semini, Yi Wu, Koushil Sreenath

Abstract: Model-free reinforcement learning is a promising approach for autonomously solving challenging robotics control problems, but faces exploration difficulty without information of the robot's kinematics and dynamics morphology. The under-exploration of multiple modalities with symmetric states leads to behaviors that are often unnatural and sub-optimal. This issue becomes particularly pronounced in the context of robotic systems with morphological symmetries, such as legged robots for which the resulting asymmetric and aperiodic behaviors compromise performance, robustness, and transferability to real hardware. To mitigate this challenge, we can leverage symmetry to guide and improve the exploration in policy learning via equivariance/invariance constraints. In this paper, we investigate the efficacy of two approaches to incorporate symmetry: modifying the network architectures to be strictly equivariant/invariant, and leveraging data augmentation to approximate equivariant/invariant actor-critics. We implement the methods on challenging loco-manipulation and bipedal locomotion tasks and compare with an unconstrained baseline. We find that the strictly equivariant policy consistently outperforms other methods in sample efficiency and task performance in simulation. In addition, symmetry-incorporated approaches exhibit better gait quality, higher robustness and can be deployed zero-shot in real-world experiments.

Submitted 26 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2402.19469 [pdf, other]

Humanoid Locomotion as Next Token Prediction

Authors: Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik

Abstract: We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories. To account for the multi-modal nature of the data, we perform prediction in a modality-aligned way, and for each input token predict the next token from the same modality. This general formulation enables us to leverage data with missing modalities, like video trajectories without actions. We train our model on a collection of simulated trajectories coming from prior neural network policies, model-based controllers, motion capture data, and YouTube videos of humans. We show that our model enables a full-sized humanoid to walk in San Francisco zero-shot. Our model can transfer to the real world even when trained on only 27 hours of walking data, and can generalize to commands not seen during training like walking backward. These findings suggest a promising path toward learning challenging real-world control tasks by generative modeling of sensorimotor trajectories.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.05279 [pdf, other]

Safety Filters for Black-Box Dynamical Systems by Learning Discriminating Hyperplanes

Authors: Will Lavanakul, Jason J. Choi, Koushil Sreenath, Claire J. Tomlin

Abstract: Learning-based approaches are emerging as an effective approach for safety filters for black-box dynamical systems. Existing methods have relied on certificate functions like Control Barrier Functions (CBFs) and Hamilton-Jacobi (HJ) reachability value functions. The primary motivation for our work is the recognition that ultimately, enforcing the safety constraint as a control input constraint at each state is what matters. By focusing on this constraint, we can eliminate dependence on any specific certificate function-based design. To achieve this, we define a discriminating hyperplane that shapes the half-space constraint on control input at each state, serving as a sufficient condition for safety. This concept not only generalizes over traditional safety methods but also simplifies safety filter design by eliminating dependence on specific certificate functions. We present two strategies to learn the discriminating hyperplane: (a) a supervised learning approach, using pre-verified control invariant sets for labeling, and (b) a reinforcement learning (RL) approach, which does not require such labels. The main advantage of our method, unlike conventional safe RL approaches, is the separation of performance and safety. This offers a reusable safety filter for learning new tasks, avoiding the need to retrain from scratch. As such, we believe that the new notion of the discriminating hyperplane offers a more generalizable direction towards designing safety filters, encompassing and extending existing certificate-function-based or safe RL methodologies.

Submitted 21 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: * Indicate co-first authors. This is an extended version of the paper presented at L4DC 2024

arXiv:2401.16889 [pdf, other]

Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control

Authors: Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath

Abstract: This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture, utilizing both a long-term and short-term input/output (I/O) history of the robot. This control architecture, when trained through the proposed end-to-end RL approach, consistently outperforms other methods across a diverse range of skills in both simulation and the real world. The study also delves into the adaptivity and robustness introduced by the proposed RL system in developing locomotion controllers. We demonstrate that the proposed architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot's I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance to disturbances. The resulting control policies can be successfully deployed on Cassie, a torque-controlled human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments. We demonstrate a diverse range of locomotion skills, including: robust standing, versatile walking, fast running with a demonstration of a 400-meter dash, and a diverse set of jumping skills, such as standing long jumps and high jumps.

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2311.13824 [pdf, other]

Constraint-Guided Online Data Selection for Scalable Data-Driven Safety Filters in Uncertain Robotic Systems

Authors: Jason J. Choi, Fernando Castañeda, Wonsuhk Jung, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

Abstract: As the use of autonomous robotic systems expands in tasks that are complex and challenging to model, the demand for robust data-driven control methods that can certify safety and stability in uncertain conditions is increasing. However, the practical implementation of these methods often faces scalability issues due to the growing amount of data points with system complexity, and a significant reliance on high-quality training data. In response to these challenges, this study presents a scalable data-driven controller that efficiently identifies and infers from the most informative data points for implementing data-driven safety filters. Our approach is grounded in the integration of a model-based certificate function-based method and Gaussian Process (GP) regression, reinforced by a novel online data selection algorithm that reduces time complexity from quadratic to linear relative to dataset size. Empirical evidence, gathered from successful real-world cart-pole swing-up experiments and simulated locomotion of a five-link bipedal robot, demonstrates the efficacy of our approach. Our findings reveal that our efficient online data selection algorithm, which strategically selects key data points, enhances the practicality and efficiency of data-driven certifying filters in complex robotic systems, significantly mitigating scalability concerns inherent in nonparametric learning-based control methods.

Submitted 23 November, 2023; originally announced November 2023.

Comments: The first three authors contributed equally to the work. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2310.17180 [pdf, other]

A Forward Reachability Perspective on Robust Control Invariance and Discount Factors in Reachability Analysis

Authors: Jason J. Choi, Donggun Lee, Boyang Li, Jonathan P. How, Koushil Sreenath, Sylvia L. Herbert, Claire J. Tomlin

Abstract: Control invariant sets are crucial for various methods that aim to design safe control policies for systems whose state constraints must be satisfied over an indefinite time horizon. In this article, we explore the connections among reachability, control invariance, and Control Barrier Functions (CBFs) by examining the forward reachability problem associated with control invariant sets. We present the notion of an "inevitable Forward Reachable Tube" (FRT) as a tool for analyzing control invariant sets. Our findings show that the inevitable FRT of a robust control invariant set with a differentiable boundary is the set itself. We highlight the role of the differentiability of the boundary in shaping the FRTs of the sets through numerical examples. We also formulate a zero-sum differential game between the control and disturbance, where the inevitable FRT is characterized by the zero-superlevel set of the value function. By incorporating a discount factor in the cost function of the game, the barrier constraint of the CBF naturally arises as the constraint that is imposed on the optimal control policy. As a result, the value function of our FRT formulation serves as a CBF-like function, which has not been previously realized in reachability studies. Conversely, any valid CBF is also a forward reachability value function inside the control invariant set, thereby revealing the inverse optimality of the CBF. As such, our work establishes a strong link between reachability, control invariance, and CBFs, filling a gap that prior formulations based on backward reachability were unable to bridge.

Submitted 26 October, 2023; originally announced October 2023.

Comments: The first two authors contributed equally to this work

arXiv:2309.09969 [pdf, other]

Prompt a Robot to Walk with Large Language Models

Authors: Yen-Jen Wang, Bike Zhang, Jianyu Chen, Koushil Sreenath

Abstract: Large language models (LLMs) pre-trained on vast internet-scale data have showcased remarkable capabilities across diverse domains. Recently, there has been escalating interest in deploying LLMs for robotics, aiming to harness the power of foundation models in real-world settings. However, this approach faces significant challenges, particularly in grounding these models in the physical world and in generating dynamic robot motions. To address these issues, we introduce a novel paradigm in which we use few-shot prompts collected from the physical environment, enabling the LLM to autoregressively generate low-level control commands for robots without task-specific fine-tuning. Experiments across various robots and environments validate that our method can effectively prompt a robot to walk. We thus illustrate how LLMs can proficiently function as low-level feedback controllers for dynamic motion control even in high-dimensional robotic systems. The project website and source code can be found at: https://prompt2walk.github.io/ .

Submitted 16 November, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

arXiv:2306.13259 [pdf, other]

Nonsmooth Control Barrier Functions for Obstacle Avoidance between Convex Regions

Authors: Akshay Thirugnanam, Jun Zeng, Koushil Sreenath

Abstract: In this paper, we focus on non-conservative obstacle avoidance between robots with control affine dynamics with strictly convex and polytopic shapes. The core challenge for this obstacle avoidance problem is that the minimum distance between strictly convex regions or polytopes is generally implicit and non-smooth, such that distance constraints cannot be enforced directly in the optimization problem. To handle this challenge, we employ non-smooth control barrier functions to reformulate the avoidance problem in the dual space, with the positivity of the minimum distance between robots equivalently expressed using a quadratic program. Our approach is proven to guarantee system safety. We theoretically analyze the smoothness properties of the minimum distance quadratic program and its KKT conditions. We validate our approach by demonstrating computationally-efficient obstacle avoidance for multi-agent robotic systems with strictly convex and polytopic shapes. To our best knowledge, this is the first time a real-time QP problem can be formulated for general non-conservative avoidance between strictly convex shapes and polytopes.

Submitted 22 June, 2023; originally announced June 2023.

Comments: 17 pages

arXiv:2304.07954 [pdf, other]

Velocity Obstacle for Polytopic Collision Avoidance for Distributed Multi-robot Systems

Authors: Jihao Huang, Jun Zeng, Xuemin Chi, Koushil Sreenath, Zhitao Liu, Hongye Su

Abstract: Obstacle avoidance for multi-robot navigation with polytopic shapes is challenging. Existing works simplify the system dynamics or consider it as a convex or non-convex optimization problem with positive distance constraints between robots, which limits real-time performance and scalability. Additionally, generating collision-free behavior for polytopic-shaped robots is harder due to implicit and non-differentiable distance functions between polytopes. In this paper, we extend the concept of velocity obstacle (VO) principle for polytopic-shaped robots and propose a novel approach to construct the VO in the function of vertex coordinates and other robot's states. Compared with existing work about obstacle avoidance between polytopic-shaped robots, our approach is much more computationally efficient as the proposed approach for construction of VO between polytopes is optimization-free. Based on VO representation for polytopic shapes, we later propose a navigation approach for distributed multi-robot systems. We validate our proposed VO representation and navigation approach in multiple challenging scenarios including large-scale randomized tests, and our approach outperforms the state of art in many evaluation metrics, including completion rate, deadlock rate, and the average travel distance.

Submitted 10 June, 2024; v1 submitted 16 April, 2023; originally announced April 2023.

Comments: Accepted to IEEE Robotics and Automation Letters (RA-L) 2023, with open source repository released

arXiv:2303.03381 [pdf, other]

Real-World Humanoid Locomotion with Reinforcement Learning

Authors: Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, Koushil Sreenath

Abstract: Humanoid robots that can autonomously operate in diverse environments have the potential to help address labour shortages in factories, assist elderly at homes, and colonize new planets. While classical controllers for humanoid robots have shown impressive results in a number of settings, they are challenging to generalize and adapt to new environments. Here, we present a fully learning-based approach for real-world humanoid locomotion. Our controller is a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. We hypothesize that the observation-action history contains useful information about the world that a powerful transformer model can use to adapt its behavior in-context, without updating its weights. We train our model with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation and deploy it to the real world zero-shot. Our controller can walk over various outdoor terrains, is robust to external disturbances, and can adapt in context.

Submitted 14 December, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: Project page: https://learning-humanoid-locomotion.github.io

arXiv:2302.14246 [pdf, other]

i2LQR: Iterative LQR for Iterative Tasks in Dynamic Environments

Authors: Yifan Zeng, Suiyi He, Han Hoang Nguyen, Yihan Li, Zhongyu Li, Koushil Sreenath, Jun Zeng

Abstract: This work introduces a novel control strategy called Iterative Linear Quadratic Regulator for Iterative Tasks (i2LQR), which aims to improve closed-loop performance with local trajectory optimization for iterative tasks in a dynamic environment. The proposed algorithm is reference-free and utilizes historical data from previous iterations to enhance the performance of the autonomous system. Unlike existing algorithms, the i2LQR computes the optimal solution in an iterative manner at each timestamp, rendering it well-suited for iterative tasks with changing constraints at different iterations. To evaluate the performance of the proposed algorithm, we conduct numerical simulations for an iterative task aimed at minimizing completion time. The results show that i2LQR achieves an optimized performance with respect to learning-based MPC (LMPC) as the benchmark in static environments, and outperforms LMPC in dynamic environments with both static and dynamics obstacles.

Submitted 6 September, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: Accepted by 2023 62nd IEEE Conference on Decision and Control (CDC)

arXiv:2302.09450 [pdf, other]

Robust and Versatile Bipedal Jumping Control through Reinforcement Learning

Authors: Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath

Abstract: This work aims to push the limits of agility for bipedal robots by enabling a torque-controlled bipedal robot to perform robust and versatile dynamic jumps in the real world. We present a reinforcement learning framework for training a robot to accomplish a large variety of jumping tasks, such as jumping to different locations and directions. To improve performance on these challenging tasks, we develop a new policy structure that encodes the robot's long-term input/output (I/O) history while also providing direct access to a short-term I/O history. In order to train a versatile jumping policy, we utilize a multi-stage training scheme that includes different training stages for different objectives. After multi-stage training, the policy can be directly transferred to a real bipedal Cassie robot. Training on different tasks and exploring more diverse scenarios lead to highly robust policies that can exploit the diverse set of learned maneuvers to recover from perturbations or poor landings during real-world deployment. Such robustness in the proposed policy enables Cassie to succeed in completing a variety of challenging jump tasks in the real world, such as standing long jumps, jumping onto elevated platforms, and multi-axes jumps.

Submitted 31 May, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

Comments: Accepted in Robotics: Science and Systems 2023 (RSS 2023). The accompanying video is at https://youtu.be/aAPSZ2QFB-E

arXiv:2301.12012 [pdf, other]

In-Distribution Barrier Functions: Self-Supervised Policy Filters that Avoid Out-of-Distribution States

Authors: Fernando Castañeda, Haruki Nishimura, Rowan McAllister, Koushil Sreenath, Adrien Gaidon

Abstract: Learning-based control approaches have shown great promise in performing complex tasks directly from high-dimensional perception data for real robotic systems. Nonetheless, the learned controllers can behave unexpectedly if the trajectories of the system divert from the training data distribution, which can compromise safety. In this work, we propose a control filter that wraps any reference policy and effectively encourages the system to stay in-distribution with respect to offline-collected safe demonstrations. Our methodology is inspired by Control Barrier Functions (CBFs), which are model-based tools from the nonlinear control literature that can be used to construct minimally invasive safe policy filters. While existing methods based on CBFs require a known low-dimensional state representation, our proposed approach is directly applicable to systems that rely solely on high-dimensional visual observations by learning in a latent state-space. We demonstrate that our method is effective for two different visuomotor control tasks in simulation environments, including both top-down and egocentric view settings.

Submitted 27 January, 2023; originally announced January 2023.

arXiv:2212.14199 [pdf, other]

Walking in Narrow Spaces: Safety-critical Locomotion Control for Quadrupedal Robots with Duality-based Optimization

Authors: Qiayuan Liao, Zhongyu Li, Akshay Thirugnanam, Jun Zeng, Koushil Sreenath

Abstract: This paper presents a safety-critical locomotion control framework for quadrupedal robots. Our goal is to enable quadrupedal robots to safely navigate in cluttered environments. To tackle this, we introduce exponential Discrete Control Barrier Functions (exponential DCBFs) with duality-based obstacle avoidance constraints into a Nonlinear Model Predictive Control (NMPC) with Whole-Body Control (WBC) framework for quadrupedal locomotion control. This enables us to use polytopes to describe the shapes of the robot and obstacles for collision avoidance while doing locomotion control of quadrupedal robots. Compared to most prior work, especially using CBFs, that utilize spherical and conservative approximation for obstacle avoidance, this work demonstrates a quadrupedal robot autonomously and safely navigating through very tight spaces in the real world. (Our open-source code is available at github.com/HybridRobotics/quadruped_nmpc_dcbf_duality, and the video is available at youtu.be/p1gSQjwXm1Q.)

Submitted 9 August, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

Comments: Accepted to International Conference on Intelligent Robots and Systems (IROS) 2023

arXiv:2210.04435 [pdf, other]

Creating a Dynamic Quadrupedal Robotic Goalkeeper with Reinforcement Learning

Authors: Xiaoyu Huang, Zhongyu Li, Yanzhen Xiang, Yiming Ni, Yufeng Chi, Yunhao Li, Lizhi Yang, Xue Bin Peng, Koushil Sreenath

Abstract: We present a reinforcement learning (RL) framework that enables quadrupedal robots to perform soccer goalkeeping tasks in the real world. Soccer goalkeeping using quadrupeds is a challenging problem, that combines highly dynamic locomotion with precise and fast non-prehensile object (ball) manipulation. The robot needs to react to and intercept a potentially flying ball using dynamic locomotion maneuvers in a very short amount of time, usually less than one second. In this paper, we propose to address this problem using a hierarchical model-free RL framework. The first component of the framework contains multiple control policies for distinct locomotion skills, which can be used to cover different regions of the goal. Each control policy enables the robot to track random parametric end-effector trajectories while performing one specific locomotion skill, such as jump, dive, and sidestep. These skills are then utilized by the second part of the framework which is a high-level planner to determine a desired skill and end-effector trajectory in order to intercept a ball flying to different regions of the goal. We deploy the proposed framework on a Mini Cheetah quadrupedal robot and demonstrate the effectiveness of our framework for various agile interceptions of a fast-moving ball in the real world.

Submitted 10 October, 2022; originally announced October 2022.

Comments: First two authors contributed equally. Accompanying video is at https://youtu.be/iX6OgG67-ZQ

arXiv:2210.04361 [pdf, other]

Iterative Convex Optimization for Model Predictive Control with Discrete-Time High-Order Control Barrier Functions

Authors: Shuo Liu, Jun Zeng, Koushil Sreenath, Calin A. Belta

Abstract: Safety is one of the fundamental challenges in control theory. Recently, multi-step optimal control problems for discrete-time dynamical systems were formulated to enforce stability, while subject to input constraints as well as safety-critical requirements using discrete-time control barrier functions within a model predictive control (MPC) framework. Existing work usually focus on the feasibility or the safety for the optimization problem, and the majority of the existing work restrict the discussions to relative-degree one control barrier functions. Additionally, the real-time computation is challenging when a large horizon is considered in the MPC problem for relative-degree one or high-order control barrier functions. In this paper, we propose a framework that solves the safety-critical MPC problem in an iterative optimization, which is applicable for any relative-degree control barrier functions. In the proposed formulation, the nonlinear system dynamics as well as the safety constraints modeled as discrete-time high-order control barrier functions (DHOCBF) are linearized at each time step. Our formulation is generally valid for any control barrier function with an arbitrary relative-degree. The advantages of fast computational performance with safety guarantee are analyzed and validated with numerical results.

Submitted 13 July, 2023; v1 submitted 9 October, 2022; originally announced October 2022.

Comments: The open source code is added and the paper is accepted to American Control Conference (ACC) 2023 (8 pages)

arXiv:2209.05309 [pdf, other]

GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots

Authors: Gilbert Feng, Hongbo Zhang, Zhongyu Li, Xue Bin Peng, Bhuvan Basireddy, Linzhu Yue, Zhitao Song, Lizhi Yang, Yunhui Liu, Koushil Sreenath, Sergey Levine

Abstract: Recent years have seen a surge in commercially-available and affordable quadrupedal robots, with many of these platforms being actively used in research and industry. As the availability of legged robots grows, so does the need for controllers that enable these robots to perform useful skills. However, most learning-based frameworks for controller development focus on training robot-specific controllers, a process that needs to be repeated for every new robot. In this work, we introduce a framework for training generalized locomotion (GenLoco) controllers for quadrupedal robots. Our framework synthesizes general-purpose locomotion controllers that can be deployed on a large variety of quadrupedal robots with similar morphologies. We present a simple but effective morphology randomization method that procedurally generates a diverse set of simulated robots for training. We show that by training a controller on this large set of simulated robots, our models acquire more general control strategies that can be directly transferred to novel simulated and real-world robots with diverse morphologies, which were not observed during training.

Submitted 12 September, 2022; originally announced September 2022.

Comments: First two authors contributed equally

arXiv:2208.10733 [pdf, other]

Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions

Authors: Fernando Castañeda, Jason J. Choi, Wonsuhk Jung, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

Abstract: Learning-based control schemes have recently shown great efficacy performing complex tasks for a wide variety of applications. However, in order to deploy them in real systems, it is of vital importance to guarantee that the system will remain safe during online training and execution. Among the currently most popular methods to tackle this challenge, Control Barrier Functions (CBFs) serve as mathematical tools that provide a formal safety-preserving control synthesis procedure for systems with known dynamics. In this paper, we first introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers using Gaussian Process (GP) regression to bridge the gap between an approximate mathematical model and the real system. Compared to previous approaches, we study the feasibility of the resulting robust safety-critical controller. This feasibility analysis results in a set of richness conditions that the available information about the system should satisfy to guarantee that a safe control action can be found at all times. We then use these conditions to devise an event-triggered online data collection strategy that ensures the recursive feasibility of the learned safety-critical controller. Our proposed methodology endows the system with the ability to reason at all times about whether the current information at its disposal is enough to ensure safety or if new measurements are required. This, in turn, allows us to provide formal results of forward invariance of a safe set with high probability, even in a priori unexplored regions. Finally, we validate the proposed framework in numerical simulations of an adaptive cruise control system and a kinematic vehicle.

Submitted 26 September, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Journal article. Includes the results of the 2021 CDC paper titled "Pointwise feasibility of gaussian process-based safety-critical control under model uncertainty" and proposes a recursively feasible safe online learning algorithm as new contribution

arXiv:2208.06721 [pdf, other]

Lyapunov Design for Robust and Efficient Robotic Reinforcement Learning

Authors: Tyler Westenbroek, Fernando Castaneda, Ayush Agrawal, Shankar Sastry, Koushil Sreenath

Abstract: Recent advances in the reinforcement learning (RL) literature have enabled roboticists to automatically train complex policies in simulated environments. However, due to the poor sample complexity of these methods, solving RL problems using real-world data remains a challenging problem. This paper introduces a novel cost-shaping method which aims to reduce the number of samples needed to learn a stabilizing controller. The method adds a term involving a Control Lyapunov Function (CLF) -- an `energy-like' function from the model-based control literature -- to typical cost formulations. Theoretical results demonstrate the new costs lead to stabilizing controllers when smaller discount factors are used, which is well-known to reduce sample complexity. Moreover, the addition of the CLF term `robustifies' the search for a stabilizing controller by ensuring that even highly sub-optimal polices will stabilize the system. We demonstrate our approach with two hardware examples where we learn stabilizing controllers for a cartpole and an A1 quadruped with only seconds and a few minutes of fine-tuning data, respectively. Furthermore, simulation benchmark studies show that obtaining stabilizing policies by optimizing our proposed costs requires orders of magnitude less data compared to standard cost designs.

Submitted 17 November, 2022; v1 submitted 13 August, 2022; originally announced August 2022.

arXiv:2208.01160 [pdf, other]

Hierarchical Reinforcement Learning for Precise Soccer Shooting Skills using a Quadrupedal Robot

Authors: Yandong Ji, Zhongyu Li, Yinan Sun, Xue Bin Peng, Sergey Levine, Glen Berseth, Koushil Sreenath

Abstract: We address the problem of enabling quadrupedal robots to perform precise shooting skills in the real world using reinforcement learning. Developing algorithms to enable a legged robot to shoot a soccer ball to a given target is a challenging problem that combines robot motion control and planning into one task. To solve this problem, we need to consider the dynamics limitation and motion stability during the control of a dynamic legged robot. Moreover, we need to consider motion planning to shoot the hard-to-model deformable ball rolling on the ground with uncertain friction to a desired location. In this paper, we propose a hierarchical framework that leverages deep reinforcement learning to train (a) a robust motion control policy that can track arbitrary motions and (b) a planning policy to decide the desired kicking motion to shoot a soccer ball to a target. We deploy the proposed framework on an A1 quadrupedal robot and enable it to accurately shoot the ball to random targets in the real world.

Submitted 1 August, 2022; originally announced August 2022.

Comments: Accepted to 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

arXiv:2206.14424 [pdf, other]

Collaborative Navigation and Manipulation of a Cable-towed Load by Multiple Quadrupedal Robots

Authors: Chenyu Yang, Guo Ning Sue, Zhongyu Li, Lizhi Yang, Haotian Shen, Yufeng Chi, Akshara Rai, Jun Zeng, Koushil Sreenath

Abstract: This paper tackles the problem of robots collaboratively towing a load with cables to a specified goal location while avoiding collisions in real time. The introduction of cables (as opposed to rigid links) enables the robotic team to travel through narrow spaces by changing its intrinsic dimensions through slack/taut switches of the cable. However, this is a challenging problem because of the hybrid mode switches and the dynamical coupling among multiple robots and the load. Previous attempts at addressing such a problem were performed offline and do not consider avoiding obstacles online. In this paper, we introduce a cascaded planning scheme with a parallelized centralized trajectory optimization that deals with hybrid mode switches. We additionally develop a set of decentralized planners per robot, which enables our approach to solve the problem of collaborative load manipulation online. We develop and demonstrate one of the first collaborative autonomy framework that is able to move a cable-towed load, which is too heavy to move by a single robot, through narrow spaces with real-time feedback and reactive planning in experiments.

Submitted 29 June, 2022; originally announced June 2022.

Comments: Extended version of the manuscript accepted to IEEE Robotics and Automation Letters (RA-L) 2022

arXiv:2205.15299 [pdf, other]

Adapting Rapid Motor Adaptation for Bipedal Robots

Authors: Ashish Kumar, Zhongyu Li, Jun Zeng, Deepak Pathak, Koushil Sreenath, Jitendra Malik

Abstract: Recent advances in legged locomotion have enabled quadrupeds to walk on challenging terrains. However, bipedal robots are inherently more unstable and hence it's harder to design walking controllers for them. In this work, we leverage recent advances in rapid adaptation for locomotion control, and extend them to work on bipedal robots. Similar to existing works, we start with a base policy which produces actions while taking as input an estimated extrinsics vector from an adaptation module. This extrinsics vector contains information about the environment and enables the walking controller to rapidly adapt online. However, the extrinsics estimator could be imperfect, which might lead to poor performance of the base policy which expects a perfect estimator. In this paper, we propose A-RMA (Adapting RMA), which additionally adapts the base policy for the imperfect extrinsics estimator by finetuning it using model-free RL. We demonstrate that A-RMA outperforms a number of RL-based baseline controllers and model-based controllers in simulation, and show zero-shot deployment of a single A-RMA policy to enable a bipedal robot, Cassie, to walk in a variety of different scenarios in the real world beyond what it has seen during training. Videos and results at https://ashish-kmr.github.io/a-rma/

Submitted 6 September, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: First two authors contributed equally. Website at https://ashish-kmr.github.io/a-rma/

arXiv:2205.05787 [pdf, other]

Bridging Model-based Safety and Model-free Reinforcement Learning through System Identification of Low Dimensional Linear Models

Authors: Zhongyu Li, Jun Zeng, Akshay Thirugnanam, Koushil Sreenath

Abstract: Bridging model-based safety and model-free reinforcement learning (RL) for dynamic robots is appealing since model-based methods are able to provide formal safety guarantees, while RL-based methods are able to exploit the robot agility by learning from the full-order system dynamics. However, current approaches to tackle this problem are mostly restricted to simple systems. In this paper, we propose a new method to combine model-based safety with model-free reinforcement learning by explicitly finding a low-dimensional model of the system controlled by a RL policy and applying stability and safety guarantees on that simple model. We use a complex bipedal robot Cassie, which is a high dimensional nonlinear system with hybrid dynamics and underactuation, and its RL-based walking controller as an example. We show that a low-dimensional dynamical model is sufficient to capture the dynamics of the closed-loop system. We demonstrate that this model is linear, asymptotically stable, and is decoupled across control input in all dimensions. We further exemplify that such linearity exists even when using different RL control policies. Such results point out an interesting direction to understand the relationship between RL and optimal control: whether RL tends to linearize the nonlinear system during training in some cases. Furthermore, we illustrate that the found linear model is able to provide guarantees by safety-critical optimal control framework, e.g., Model Predictive Control with Control Barrier Functions, on an example of autonomous navigation using Cassie while taking advantage of the agility provided by the RL-based controller.

Submitted 11 May, 2022; originally announced May 2022.

Comments: Accepted in Proceedings of Robotics: Science and Systems 2022 (RSS 2022)

arXiv:2204.03134 [pdf, other]

Perception-aware receding horizon trajectory planning for multicopters with visual-inertial odometry

Authors: Xiangyu Wu, Shuxiao Chen, Koushil Sreenath, Mark W. Mueller

Abstract: Visual inertial odometry (VIO) is widely used for the state estimation of multicopters, but it may function poorly in environments with few visual features or in overly aggressive flights. In this work, we propose a perception-aware collision avoidance trajectory planner for multicopters, that may be used with any feature-based VIO algorithm. Our approach is able to fly the vehicle to a goal position at fast speed, avoiding obstacles in an unknown stationary environment while achieving good VIO state estimation accuracy. The proposed planner samples a group of minimum jerk trajectories and finds collision-free trajectories among them, which are then evaluated based on their speed to the goal and perception quality. Both the motion blur of features and their locations are considered for the perception quality. Our novel consideration of the motion blur of features enables automatic adaptation of the trajectory's aggressiveness under environments with different light levels. The best trajectory from the evaluation is tracked by the vehicle and is updated in a receding horizon manner when new images are received from the camera. Only generic assumptions about the VIO are made, so that the planner may be used with various existing systems. The proposed method can run in real-time on a small embedded computer on board. We validated the effectiveness of our proposed approach through experiments in both indoor and outdoor environments. Compared to a perception-agnostic planner, the proposed planner kept more features in the camera's view and made the flight less aggressive, making the VIO more accurate. It also reduced VIO failures, which occurred for the perception-agnostic planner but not for the proposed planner. The ability of the proposed planner to fly through dense obstacles was also validated. The experiment video can be found at https://youtu.be/qO3LZIrpwtQ.

Submitted 1 August, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

Comments: 12 pages

arXiv:2203.08180 [pdf, other]

Tethered Power for a Series of Quadcopters: Analysis and Applications

Authors: Karan P. Jain, Prasanth Kotaru, Massimiliano de Sa, Koushil Sreenath, Mark W. Mueller

Abstract: Tethered quadcopters are used for extended flight operations where the power to the system is provided via a tether connected to an external power source. In this work, we consider a system of multiple quadcopters powered by a single tether. We study the design factors that influence the power requirements, such as the electrical resistance of the tether, input voltage, and quadcopters' positions. We present an analysis to predict the required power to be supplied to a series of N tethered quadcopters, with respect to the thrust of each quadcopter which guarantees electrical safety and helps in design optimization. We find that there is a critical boundary of thrusts that cannot be exceeded due to fundamental electrical limitations. We compare the power consumption for one tethered quadcopter and two tethered quadcopters and show that for large quadcopters far enough from the anchor point, a two-quadcopter system consumes lesser power. We show that, for a representative use case of firefighting, a tethered system with two quadcopters consumes 26% less power than a corresponding system with one quadcopter. Finally, we present experiments demonstrating the use of a two-quadcopter tethered system as compared to a one-quadcopter tethered system in a cluttered environment, such as passing through a window and grasping an object over an obstacle.

Submitted 26 September, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

Comments: Submitted to ICRA 2023

arXiv:2203.05194 [pdf, other]

Learning Torque Control for Quadrupedal Locomotion

Authors: Shuxiao Chen, Bike Zhang, Mark W. Mueller, Akshara Rai, Koushil Sreenath

Abstract: Reinforcement learning (RL) has become a promising approach to developing controllers for quadrupedal robots. Conventionally, an RL design for locomotion follows a position-based paradigm, wherein an RL policy outputs target joint positions at a low frequency that are then tracked by a high-frequency proportional-derivative (PD) controller to produce joint torques. In contrast, for the model-based control of quadrupedal locomotion, there has been a paradigm shift from position-based control to torque-based control. In light of the recent advances in model-based control, we explore an alternative to the position-based RL paradigm, by introducing a torque-based RL framework, where an RL policy directly predicts joint torques at a high frequency, thus circumventing the use of a PD controller. The proposed learning torque control framework is validated with extensive experiments, in which a quadruped is capable of traversing various terrain and resisting external disturbances while following user-specified commands. Furthermore, compared to learning position control, learning torque control demonstrates the potential to achieve a higher reward and is more robust to significant external disturbances. To our knowledge, this is the first sim-to-real attempt for end-to-end learning torque control of quadrupedal locomotion.

Submitted 12 March, 2023; v1 submitted 10 March, 2022; originally announced March 2022.

arXiv:2203.02570 [pdf, other]

Bayesian Optimization Meets Hybrid Zero Dynamics: Safe Parameter Learning for Bipedal Locomotion Control

Authors: Lizhi Yang, Zhongyu Li, Jun Zeng, Koushil Sreenath

Abstract: In this paper, we propose a multi-domain control parameter learning framework that combines Bayesian Optimization (BO) and Hybrid Zero Dynamics (HZD) for locomotion control of bipedal robots. We leverage BO to learn the control parameters used in the HZD-based controller. The learning process is firstly deployed in simulation to optimize different control parameters for a large repertoire of gaits. Next, to tackle the discrepancy between the simulation and the real world, the learning process is applied on the physical robot to learn for corrections to the control parameters learned in simulation while also respecting a safety constraint for gait stability. This method empowers an efficient sim-to-real transition with a small number of samples in the real world, and does not require a valid controller to initialize the training in simulation. Our proposed learning framework is experimentally deployed and validated on a bipedal robot Cassie to perform versatile locomotion skills with improved performance on smoothness of walking gaits and reduction of steady-state tracking errors.

Submitted 4 March, 2022; originally announced March 2022.

Comments: Accepted to 2022 International Conference on Robotics and Automation (ICRA 2022)

arXiv:2203.02091 [pdf, other]

Teaching Robots to Span the Space of Functional Expressive Motion

Authors: Arjun Sripathy, Andreea Bobu, Zhongyu Li, Koushil Sreenath, Daniel S. Brown, Anca D. Dragan

Abstract: Our goal is to enable robots to perform functional tasks in emotive ways, be it in response to their users' emotional states, or expressive of their confidence levels. Prior work has proposed learning independent cost functions from user feedback for each target emotion, so that the robot may optimize it alongside task and environment specific objectives for any situation it encounters. However, this approach is inefficient when modeling multiple emotions and unable to generalize to new ones. In this work, we leverage the fact that emotions are not independent of each other: they are related through a latent space of Valence-Arousal-Dominance (VAD). Our key idea is to learn a model for how trajectories map onto VAD with user labels. Considering the distance between a trajectory's mapping and a target VAD allows this single model to represent cost functions for all emotions. As a result 1) all user feedback can contribute to learning about every emotion; 2) the robot can generate trajectories for any emotion in the space instead of only a few predefined ones; and 3) the robot can respond emotively to user-generated natural language by mapping it to a target VAD. We introduce a method that interactively learns to map trajectories to this latent space and test it in simulation and in a user study. In experiments, we use a simple vacuum robot as well as the Cassie biped.

Submitted 2 August, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

arXiv:2201.08538 [pdf, other]

Computation of Regions of Attraction for Hybrid Limit Cycles Using Reachability: An Application to Walking Robots

Authors: Jason J. Choi, Ayush Agrawal, Koushil Sreenath, Claire J. Tomlin, Somil Bansal

Abstract: Contact-rich robotic systems, such as legged robots and manipulators, are often represented as hybrid systems. However, the stability analysis and region-of-attraction computation for these systems are often challenging because of the discontinuous state changes upon contact (also referred to as state resets). In this work, we cast the computation of region-of-attraction as a Hamilton-Jacobi (HJ) reachability problem. This enables us to leverage HJ reachability tools that are compatible with general nonlinear system dynamics, and can formally deal with state and input constraints as well as bounded disturbances. Our main contribution is the generalization of HJ reachability framework to account for the discontinuous state changes originating from state resets, which has remained a challenge until now. We apply our approach for computing region-of-attractions for several underactuated walking robots and demonstrate that the proposed approach can (a) recover a bigger region-of-attraction than state-of-the-art approaches, (b) handle state resets, nonlinear dynamics, external disturbances, and input constraints, and (c) also provides a stabilizing controller for the system that can leverage the state resets for enhancing system stability.

Submitted 8 February, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: Accepted to IEEE RA-L & ICRA, 2022

arXiv:2201.01347 [pdf, other]

Learning Differentiable Safety-Critical Control using Control Barrier Functions for Generalization to Novel Environments

Authors: Hengbo Ma, Bike Zhang, Masayoshi Tomizuka, Koushil Sreenath

Abstract: Control barrier functions (CBFs) have become a popular tool to enforce safety of a control system. CBFs are commonly utilized in a quadratic program formulation (CBF-QP) as safety-critical constraints. A class $\mathcal{K}$ function in CBFs usually needs to be tuned manually in order to balance the trade-off between performance and safety for each environment. However, this process is often heuristic and can become intractable for high relative-degree systems. Moreover, it prevents the CBF-QP from generalizing to different environments in the real world. By embedding the optimization procedure of the exponential control barrier function based quadratic program (ECBF-QP) as a differentiable layer within a deep learning architecture, we propose a differentiable safety-critical control framework that enables generalization to new environments for high relative-degree systems with forward invariance guarantees. Finally, we validate the proposed control design with 2D double and quadruple integrator systems in various environments.

Submitted 10 April, 2022; v1 submitted 4 January, 2022; originally announced January 2022.

Comments: Accepted by European Control Conference 2022 (ECC22)

arXiv:2112.06435 [pdf, other]

Autonomous Racing with Multiple Vehicles using a Parallelized Optimization with Safety Guarantee using Control Barrier Functions

Authors: Suiyi He, Jun Zeng, Koushil Sreenath

Abstract: This paper presents a novel planning and control strategy for competing with multiple vehicles in a car racing scenario. The proposed racing strategy switches between two modes. When there are no surrounding vehicles, a learning-based model predictive control (MPC) trajectory planner is used to guarantee that the ego vehicle achieves better lap timing performance. When the ego vehicle is competing with other surrounding vehicles to overtake, an optimization-based planner generates multiple dynamically-feasible trajectories through parallel computation. Each trajectory is optimized under an MPC formulation with different homotopic Bezier-curve reference paths lying laterally between surrounding vehicles. The time-optimal trajectory among these different homotopic trajectories is selected and a low-level MPC controller with control barrier function constraints for obstacle avoidance is used to guarantee the system's safety-critical performance. The proposed algorithm has the capability to generate collision-free trajectories and track them while enhancing the lap timing performance with steady low computational complexity, outperforming existing approaches in both timing and performance for an autonomous racing environment. To demonstrate the performance of our racing strategy, we simulate with multiple randomly generated moving vehicles on the track and test the ego vehicle's overtake maneuvers.

Submitted 27 March, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

Comments: Accepted to IEEE International Conference on Robotics and Automation (ICRA 2022)

arXiv:2110.00891 [pdf, other]

Vision-aided Dynamic Quadrupedal Locomotion on Discrete Terrain using Motion Libraries

Authors: Ayush Agrawal, Shuxiao Chen, Akshara Rai, Koushil Sreenath

Abstract: In this paper, we present a framework rooted in control and planning that enables quadrupedal robots to traverse challenging terrains with discrete footholds using visual feedback. Navigating discrete terrain is challenging for quadrupeds because the motion of the robot can be aperiodic, highly dynamic, and blind for the hind legs of the robot. Additionally, the robot needs to reason over both the feasible footholds as well as robot velocity by speeding up and slowing down at different parts of the terrain. We build an offline library of periodic gaits which span two trotting steps on the robot, and switch between different motion primitives to achieve aperiodic motions of different step lengths on an A1 robot. The motion library is used to provide targets to a geometric model predictive controller which controls stance. To incorporate visual feedback, we use terrain mapping tools to build a local height map of the terrain around the robot using RGB and depth cameras, and extract feasible foothold locations around both the front and hind legs of the robot. Our experiments show a Unitree A1 robot navigating multiple unknown, challenging and discrete terrains in the real world.

Submitted 4 March, 2022; v1 submitted 2 October, 2021; originally announced October 2021.

Comments: Accepted to ICRA 2022

arXiv:2109.12313 [pdf, other]

Safety-Critical Control and Planning for Obstacle Avoidance between Polytopes with Control Barrier Functions

Authors: Akshay Thirugnanam, Jun Zeng, Koushil Sreenath

Abstract: Obstacle avoidance between polytopes is a challenging topic for optimal control and optimization-based trajectory planning problems. Existing work either solves this problem through mixed-integer optimization, relying on simplification of system dynamics, or through model predictive control with dual variables using distance constraints, requiring long horizons for obstacle avoidance. In either case, the solution can only be applied as an offline planning algorithm. In this paper, we exploit the property that a smaller horizon is sufficient for obstacle avoidance by using discrete-time control barrier function (DCBF) constraints and we propose a novel optimization formulation with dual variables based on DCBFs to generate a collision-free dynamically-feasible trajectory. The proposed optimization formulation has lower computational complexity compared to existing work and can be used as a fast online algorithm for control and planning for general nonlinear dynamical systems. We validate our algorithm on different robot shapes using numerical simulations with a kinematic bicycle model, resulting in successful navigation through maze environments with polytopic obstacles.

Submitted 30 May, 2022; v1 submitted 25 September, 2021; originally announced September 2021.

Comments: Accepted to IEEE International Conference on Robotics and Automation (ICRA 2022)

arXiv:2109.05714 [pdf, other]

Autonomous Navigation of Underactuated Bipedal Robots in Height-Constrained Environments

Authors: Zhongyu Li, Jun Zeng, Shuxiao Chen, Koushil Sreenath

Abstract: Navigating a large-scaled robot in unknown and cluttered height-constrained environments is challenging. Not only is a fast and reliable planning algorithm required to go around obstacles, the robot should also be able to change its intrinsic dimension by crouching in order to travel underneath height-constrained regions. There are few mobile robots that are capable of handling such a challenge, and bipedal robots provide a solution. However, as bipedal robots have nonlinear and hybrid dynamics, trajectory planning while ensuring dynamic feasibility and safety on these robots is challenging. This paper presents an end-to-end autonomous navigation framework which leverages three layers of planners and a variable walking height controller to enable bipedal robots to safely explore height-constrained environments. A vertically-actuated Spring-Loaded Inverted Pendulum (vSLIP) model is introduced to capture the robot's coupled dynamics of planar walking and vertical walking height. This reduced-order model is utilized to optimize for long-term and short-term safe trajectory plans. A variable walking height controller is leveraged to enable the bipedal robot to maintain stable periodic walking gaits while following the planned trajectory. The entire framework is tested and experimentally validated using a bipedal robot Cassie. This demonstrates reliable autonomy to drive the robot to safely avoid obstacles while walking to the goal location in various kinds of height-constrained cluttered environments.

Submitted 13 July, 2023; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: Accepted in International Journal of Robotics Research (IJRR) 2023. This is the author's version and will no longer be updated as the copyright may get transferred at anytime

arXiv:2108.03344 [pdf, other]

Real-time Geo-localization Using Satellite Imagery and Topography for Unmanned Aerial Vehicles

Authors: Shuxiao Chen, Xiangyu Wu, Mark W. Mueller, Koushil Sreenath

Abstract: The capabilities of autonomous flight with unmanned aerial vehicles (UAVs) have significantly increased in recent times. However, basic problems such as fast and robust geo-localization in GPS-denied environments still remain unsolved. Existing research has primarily concentrated on improving the accuracy of localization at the cost of long and varying computation time in various situations, which often necessitates the use of powerful ground station machines. In order to make image-based geo-localization online and pragmatic for lightweight embedded systems on UAVs, we propose a framework that is reliable in changing scenes, flexible about computing resource allocation and adaptable to common camera placements. The framework is comprised of two stages: offline database preparation and online inference. At the first stage, color images and depth maps are rendered as seen from potential vehicle poses quantized over the satellite and topography maps of anticipated flying areas. A database is then populated with the global and local descriptors of the rendered images. At the second stage, for each captured real-world query image, top global matches are retrieved from the database and the vehicle pose is further refined via local descriptor matching. We present field experiments of image-based localization on two different UAV platforms to validate our results.

Submitted 6 August, 2021; originally announced August 2021.

Comments: Accepted at 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2107.08360 [pdf, other]

Duality-based Convex Optimization for Real-time Obstacle Avoidance between Polytopes with Control Barrier Functions

Authors: Akshay Thirugnanam, Jun Zeng, Koushil Sreenath

Abstract: Developing controllers for obstacle avoidance between polytopes is a challenging and necessary problem for navigation in tight spaces. Traditional approaches can only formulate the obstacle avoidance problem as an offline optimization problem. To address these challenges, we propose a duality-based safety-critical optimal control using nonsmooth control barrier functions for obstacle avoidance between polytopes, which can be solved in real-time with a QP-based optimization problem. A dual optimization problem is introduced to represent the minimum distance between polytopes and the Lagrangian function for the dual form is applied to construct a control barrier function. We validate the obstacle avoidance with the proposed dual formulation for L-shaped (sofa-shaped) controlled robot in a corridor environment. We demonstrate real-time tight obstacle avoidance with non-conservative maneuvers on a moving sofa (piano) problem with nonlinear dynamics.

Submitted 18 April, 2022; v1 submitted 18 July, 2021; originally announced July 2021.

Comments: Accepted to 2022 American Control Conference (ACC) with full version of proofs in the appendix

arXiv:2107.00773 [pdf, other]

Autonomous Navigation for Quadrupedal Robots with Optimized Jumping through Constrained Obstacles

Authors: Scott Gilroy, Derek Lau, Lizhi Yang, Ed Izaguirre, Kristen Biermayer, Anxing Xiao, Mengti Sun, Ayush Agrawal, Jun Zeng, Zhongyu Li, Koushil Sreenath

Abstract: Quadrupeds are strong candidates for navigating challenging environments because of their agile and dynamic designs. This paper presents a methodology that extends the range of exploration for quadrupedal robots by creating an end-to-end navigation framework that exploits walking and jumping modes. To obtain a dynamic jumping maneuver while avoiding obstacles, dynamically-feasible trajectories are optimized offline through collocation-based optimization where safety constraints are imposed. Such optimization schematic allows the robot to jump through window-shaped obstacles by considering both obstacles in the air and on the ground. The resulted jumping mode is utilized in an autonomous navigation pipeline that leverages a search-based global planner and a local planner to enable the robot to reach the goal location by walking. A state machine together with a decision making strategy allows the system to switch behaviors between walking around obstacles or jumping through them. The proposed framework is experimentally deployed and validated on a quadrupedal robot, a Mini Cheetah, to enable the robot to autonomously navigate through an environment while avoiding obstacles and jumping over a maximum height of 13 cm to pass through a window-shaped opening in order to reach its goal.

Submitted 1 July, 2021; originally announced July 2021.

Comments: Accepted to 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE 2021)

arXiv:2106.07108 [pdf, other]

Pointwise Feasibility of Gaussian Process-based Safety-Critical Control under Model Uncertainty

Authors: Fernando Castañeda, Jason J. Choi, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

Abstract: Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) are popular tools for enforcing safety and stability of a controlled system, respectively. They are commonly utilized to build constraints that can be incorporated in a min-norm quadratic program (CBF-CLF-QP) which solves for a safety-critical control input. However, since these constraints rely on a model of the system, when this model is inaccurate the guarantees of safety and stability can be easily lost. In this paper, we present a Gaussian Process (GP)-based approach to tackle the problem of model uncertainty in safety-critical controllers that use CBFs and CLFs. The considered model uncertainty is affected by both state and control input. We derive probabilistic bounds on the effects that such model uncertainty has on the dynamics of the CBF and CLF. We then use these bounds to build safety and stability chance constraints that can be incorporated in a min-norm convex optimization-based controller, called GP-CBF-CLF-SOCP. As the main theoretical result of the paper, we present necessary and sufficient conditions for pointwise feasibility of the proposed optimization problem. We believe that these conditions could serve as a starting point towards understanding what are the minimal requirements on the distribution of data collected from the real system in order to guarantee safety. Finally, we validate the proposed framework with numerical simulations of an adaptive cruise controller for an automotive system.

Submitted 1 October, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

Comments: The first two authors contributed equally. Accepted for publication in IEEE 60th Conference on Decision and Control (CDC 2021)

arXiv:2105.10596 [pdf, other]

Enhancing Feasibility and Safety of Nonlinear Model Predictive Control with Discrete-Time Control Barrier Functions

Authors: Jun Zeng, Zhongyu Li, Koushil Sreenath

Abstract: Safety is one of the fundamental problems in robotics. Recently, one-step or multi-step optimal control problems for discrete-time nonlinear dynamical systems were formulated to offer tracking stability using control Lyapunov functions (CLFs) while subject to input constraints as well as safety-critical constraints using control barrier functions (CBFs). The limitations of these existing approaches are mainly about feasibility and safety. In the existing approaches, the feasibility of the optimization and the system safety cannot be enhanced at the same time theoretically. In this paper, we propose two formulations that unify CLFs and CBFs under the framework of nonlinear model predictive control (NMPC). In the proposed formulations, safety criteria are commonly formulated as CBF constraints and stability performance is ensured with either a terminal cost function or CLF constraints. Slack variables with relaxing technique are introduced on the CBF constraints to resolve the tradeoff between feasibility and safety so that they can be enhanced at the same time. The advantages about feasibility and safety of proposed formulations compared with existing methods are analyzed theoretically and validated with numerical results.

Submitted 1 October, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

Comments: Accepted to 2021 Conference on Decision and Control (CDC 2021)

arXiv:2104.04238 [pdf, other]

Legged Robot State Estimation in Slippery Environments Using Invariant Extended Kalman Filter with Velocity Update

Authors: Sangli Teng, Mark Wilfried Mueller, Koushil Sreenath

Abstract: This paper proposes a state estimator for legged robots operating in slippery environments. An Invariant Extended Kalman Filter (InEKF) is implemented to fuse inertial and velocity measurements from a tracking camera and leg kinematic constraints. The misalignment between the camera and the robot-frame is also modeled thus enabling auto-calibration of camera pose. The leg kinematics based velocity measurement is formulated as a right-invariant observation. Nonlinear observability analysis shows that other than the rotation around the gravity vector and the absolute position, all states are observable except for some singular cases. Discrete observability analysis demonstrates that our filter is consistent with the underlying nonlinear system. An online noise parameter tuning method is developed to adapt to the highly time-varying camera measurement noise. The proposed method is experimentally validated on a Cassie bipedal robot walking over slippery terrain. A video for the experiment can be found at https://youtu.be/VIqJL0cUr7s.

Submitted 9 April, 2021; originally announced April 2021.

Comments: To appear on the 2021 International Conference on Robotics and Automation (ICRA 2021)

arXiv:2104.02808 [pdf, other]

Robust Control Barrier-Value Functions for Safety-Critical Control

Authors: Jason J. Choi, Donggun Lee, Koushil Sreenath, Claire J. Tomlin, Sylvia L. Herbert

Abstract: This paper works towards unifying two popular approaches in the safety control community: Hamilton-Jacobi (HJ) reachability and Control Barrier Functions (CBFs). HJ Reachability has methods for direct construction of value functions that provide safety guarantees and safe controllers, however the online implementation can be overly conservative and/or rely on chattering bang-bang control. The CBF community has methods for safe-guarding controllers in the form of point-wise optimization using quadratic programs (CBF-QP), where the CBF-based safety certificate is used as a constraint. However, finding a valid CBF for a general dynamical system is challenging. This paper unifies these two methods by introducing a new reachability formulation inspired by the structure of CBFs to construct a Control Barrier-Value Function (CBVF). We verify that CBVF is a viscosity solution to a novel Hamilton-Jacobi-Isaacs Variational Inequality and preserves the same safety guarantee as the original reachability formulation. Finally, inspired by the CBF-QP, we propose a QP-based online control synthesis for systems affine in control and disturbance, whose solution is always the CBVF's optimal control signal robust to bounded disturbance. We demonstrate the benefit of using the CBVFs for double-integrator and Dubins car systems by comparing it to previous methods.

Submitted 25 October, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

Comments: IEEE CDC 2021

arXiv:2103.14300 [pdf, other]

Robotic Guide Dog: Leading a Human with Leash-Guided Hybrid Physical Interaction

Authors: Anxing Xiao, Wenzhe Tong, Lizhi Yang, Jun Zeng, Zhongyu Li, Koushil Sreenath

Abstract: An autonomous robot that is able to physically guide humans through narrow and cluttered spaces could be a big boon to the visually-impaired. Most prior robotic guiding systems are based on wheeled platforms with large bases with actuated rigid guiding canes. The large bases and the actuated arms limit these prior approaches from operating in narrow and cluttered environments. We propose a method that introduces a quadrupedal robot with a leash to enable the robot-guiding human system to change its intrinsic dimension (by letting the leash go slack) in order to fit into narrow spaces. We propose a hybrid physical Human-Robot Interaction model that involves leash tension to describe the dynamical relationship in the robot-guiding human system. This hybrid model is utilized in a mixed-integer programming problem to develop a reactive planner that is able to utilize slack-taut switching to guide a blind-folded person to safely travel in a confined space. The proposed leash-guided robot framework is deployed on a Mini Cheetah quadrupedal robot and validated in experiments.

Submitted 28 June, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

Comments: Accepted to 2021 International Conference on Robotics and Automation (ICRA 2021)

arXiv:2103.14295 [pdf, other]

Reinforcement Learning for Robust Parameterized Locomotion Control of Bipedal Robots

Authors: Zhongyu Li, Xuxin Cheng, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath

Abstract: Developing robust walking controllers for bipedal robots is a challenging endeavor. Traditional model-based locomotion controllers require simplifying assumptions and careful modelling; any small errors can result in unstable control. To address these challenges for bipedal locomotion, we present a model-free reinforcement learning framework for training robust locomotion policies in simulation, which can then be transferred to a real bipedal Cassie robot. To facilitate sim-to-real transfer, domain randomization is used to encourage the policies to learn behaviors that are robust across variations in system dynamics. The learned policies enable Cassie to perform a set of diverse and dynamic behaviors, while also being more robust than traditional controllers and prior learning-based methods that use residual control. We demonstrate this on versatile walking behaviors such as tracking a target walking velocity, walking height, and turning yaw.

Submitted 26 March, 2021; originally announced March 2021.

Comments: To appear at the 2021 International Conference on Robotics and Automation (ICRA 2021)

arXiv:2103.12382 [pdf, other]

Rule-Based Safety-Critical Control Design using Control Barrier Functions with Application to Autonomous Lane Change

Authors: Suiyi He, Jun Zeng, Bike Zhang, Koushil Sreenath

Abstract: This paper develops a new control design for guaranteeing a vehicle's safety during lane change maneuvers in a complex traffic environment. The proposed method uses a finite state machine (FSM), where a quadratic program based optimization problem using control Lyapunov functions and control barrier functions (CLF-CBF-QP) is used to calculate the system's optimal inputs via rule-based control strategies. The FSM can make switches between different states automatically according to the command of driver and traffic environment, which makes the ego vehicle find a safe opportunity to do a collision-free lane change maneuver. By using a convex quadratic program, the controller can guarantee the system's safety at a high update frequency. A set of pre-designed typical lane change scenarios as well as randomly generated driving scenarios are simulated to show the performance of our controller.

Submitted 23 March, 2021; originally announced March 2021.

Comments: Accepted to ACC 2021

arXiv:2103.12375 [pdf, other]

Safety-Critical Control using Optimal-decay Control Barrier Function with Guaranteed Point-wise Feasibility

Authors: Jun Zeng, Bike Zhang, Zhongyu Li, Koushil Sreenath

Abstract: Safety is one of the fundamental problems in robotics. Recently, a quadratic program-based control barrier function (CBF) method has emerged as a way to enforce safety-critical constraints. Together with control Lyapunov function (CLF), it forms a safety-critical control strategy, named CLF-CBF-QP, which can mediate between achieving the control objective and ensuring safety, while being executable in real-time. However, once additional constraints such as input constraints are introduced, the CLF-CBF-QP may encounter infeasibility. In order to address the challenge that arises due to the infeasibility, we propose an optimal-decay form for safety-critical control wherein the decay rate of the CBF is optimized point-wise in time so as to guarantee point-wise feasibility when the state lies inside the safe set. The proposed control design is numerically validated using an adaptive cruise control example.

Submitted 23 March, 2021; originally announced March 2021.

arXiv:2101.05916 [pdf, other]

Scalable Learning of Safety Guarantees for Autonomous Systems using Hamilton-Jacobi Reachability

Authors: Sylvia Herbert, Jason J. Choi, Suvansh Sanjeev, Marsalis Gibson, Koushil Sreenath, Claire J. Tomlin

Abstract: Autonomous systems like aircraft and assistive robots often operate in scenarios where guaranteeing safety is critical. Methods like Hamilton-Jacobi reachability can provide guaranteed safe sets and controllers for such systems. However, often these same scenarios have unknown or uncertain environments, system dynamics, or predictions of other agents. As the system is operating, it may learn new knowledge about these uncertainties and should therefore update its safety analysis accordingly. However, work to learn and update safety analysis is limited to small systems of about two dimensions due to the computational complexity of the analysis. In this paper we synthesize several techniques to speed up computation: decomposition, warm-starting, and adaptive grids. Using this new framework we can update safe sets by one or more orders of magnitude faster than prior work, making this technique practical for many realistic systems. We demonstrate our results on simulated 2D and 10D near-hover quadcopters operating in a windy environment.

Submitted 2 April, 2021; v1 submitted 14 January, 2021; originally announced January 2021.

Comments: The first two authors are co-first authors. ICRA 2021

Showing 1–50 of 67 results for author: Sreenath, K