-
Hierarchical Learned Risk-Aware Planning Framework for Human Driving Modeling
Authors:
Nathan Ludlow,
Yiwei Lyu,
John Dolan
Abstract:
This paper presents a novel approach to modeling human driving behavior, designed for use in evaluating autonomous vehicle control systems in a simulation environments. Our methodology leverages a hierarchical forward-looking, risk-aware estimation framework with learned parameters to generate human-like driving trajectories, accommodating multiple driver levels determined by model parameters. Thi…
▽ More
This paper presents a novel approach to modeling human driving behavior, designed for use in evaluating autonomous vehicle control systems in a simulation environments. Our methodology leverages a hierarchical forward-looking, risk-aware estimation framework with learned parameters to generate human-like driving trajectories, accommodating multiple driver levels determined by model parameters. This approach is grounded in multimodal trajectory prediction, using a deep neural network with LSTM-based social pooling to predict the trajectories of surrounding vehicles. These trajectories are used to compute forward-looking risk assessments along the ego vehicle's path, guiding its navigation. Our method aims to replicate human driving behaviors by learning parameters that emulate human decision-making during driving. We ensure that our model exhibits robust generalization capabilities by conducting simulations, employing real-world driving data to validate the accuracy of our approach in modeling human behavior. The results reveal that our model effectively captures human behavior, showcasing its versatility in modeling human drivers in diverse highway scenarios.
△ Less
Submitted 10 May, 2024;
originally announced May 2024.
-
WROOM: An Autonomous Driving Approach for Off-Road Navigation
Authors:
Dvij Kalaria,
Shreya Sharma,
Sarthak Bhagat,
Haoru Xue,
John M. Dolan
Abstract:
Off-road navigation is a challenging problem both at the planning level to get a smooth trajectory and at the control level to avoid flip** over, hitting obstacles, or getting stuck at a rough patch. There have been several recent works using classical approaches involving depth map prediction followed by smooth trajectory planning and using a controller to track it. We design an end-to-end rein…
▽ More
Off-road navigation is a challenging problem both at the planning level to get a smooth trajectory and at the control level to avoid flip** over, hitting obstacles, or getting stuck at a rough patch. There have been several recent works using classical approaches involving depth map prediction followed by smooth trajectory planning and using a controller to track it. We design an end-to-end reinforcement learning (RL) system for an autonomous vehicle in off-road environments using a custom-designed simulator in the Unity game engine. We warm-start the agent by imitating a rule-based controller and utilize Proximal Policy Optimization (PPO) to improve the policy based on a reward that incorporates Control Barrier Functions (CBF), facilitating the agent's ability to generalize effectively to real-world scenarios. The training involves agents concurrently undergoing domain-randomized trials in various environments. We also propose a novel simulation environment to replicate off-road driving scenarios and deploy our proposed approach on a real buggy RC car.
Videos and additional results: https://sites.google.com/view/wroom-utd/home
△ Less
Submitted 12 April, 2024;
originally announced April 2024.
-
Tractable Joint Prediction and Planning over Discrete Behavior Modes for Urban Driving
Authors:
Adam Villaflor,
Brian Yang,
Huangyuan Su,
Katerina Fragkiadaki,
John Dolan,
Jeff Schneider
Abstract:
Significant progress has been made in training multimodal trajectory forecasting models for autonomous driving. However, effectively integrating these models with downstream planners and model-based control approaches is still an open problem. Although these models have conventionally been evaluated for open-loop prediction, we show that they can be used to parameterize autoregressive closed-loop…
▽ More
Significant progress has been made in training multimodal trajectory forecasting models for autonomous driving. However, effectively integrating these models with downstream planners and model-based control approaches is still an open problem. Although these models have conventionally been evaluated for open-loop prediction, we show that they can be used to parameterize autoregressive closed-loop models without retraining. We consider recent trajectory prediction approaches which leverage learned anchor embeddings to predict multiple trajectories, finding that these anchor embeddings can parameterize discrete and distinct modes representing high-level driving behaviors. We propose to perform fully reactive closed-loop planning over these discrete latent modes, allowing us to tractably model the causal interactions between agents at each step. We validate our approach on a suite of more dynamic merging scenarios, finding that our approach avoids the $\textit{frozen robot problem}$ which is pervasive in conventional planners. Our approach also outperforms the previous state-of-the-art in CARLA on challenging dense traffic scenarios when evaluated at realistic speeds.
△ Less
Submitted 11 March, 2024;
originally announced March 2024.
-
Synthesis and verification of robust-adaptive safe controllers
Authors:
Simin Liu,
Kai S. Yun,
John M. Dolan,
Changliu Liu
Abstract:
Safe control with guarantees generally requires the system model to be known. It is far more challenging to handle systems with uncertain parameters. In this paper, we propose a generic algorithm that can synthesize and verify safe controllers for systems with constant, unknown parameters. In particular, we use robust-adaptive control barrier functions (raCBFs) to achieve safety. We develop new th…
▽ More
Safe control with guarantees generally requires the system model to be known. It is far more challenging to handle systems with uncertain parameters. In this paper, we propose a generic algorithm that can synthesize and verify safe controllers for systems with constant, unknown parameters. In particular, we use robust-adaptive control barrier functions (raCBFs) to achieve safety. We develop new theories and techniques using sum-of-squares that enable us to pose synthesis and verification as a series of convex optimization problems. In our experiments, we show that our algorithms are general and scalable, applying them to three different polynomial systems of up to moderate size (7D). Our raCBFs are currently the most effective way to guarantee safety for uncertain systems, achieving 100% safety and up to 55% performance improvement over a robust baseline.
△ Less
Submitted 2 April, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Safe Deep Policy Adaptation
Authors:
Wenli Xiao,
Tairan He,
John Dolan,
Guanya Shi
Abstract:
A critical goal of autonomy and artificial intelligence is enabling autonomous robots to rapidly adapt in dynamic and uncertain environments. Classic adaptive control and safe control provide stability and safety guarantees but are limited to specific system classes. In contrast, policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and…
▽ More
A critical goal of autonomy and artificial intelligence is enabling autonomous robots to rapidly adapt in dynamic and uncertain environments. Classic adaptive control and safe control provide stability and safety guarantees but are limited to specific system classes. In contrast, policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges. We propose SafeDPA, a novel RL and control framework that simultaneously tackles the problems of policy adaptation and safe reinforcement learning. SafeDPA jointly learns adaptive policy and dynamics models in simulation, predicts environment configurations, and fine-tunes dynamics models with few-shot real-world data. A safety filter based on the Control Barrier Function (CBF) on top of the RL policy is introduced to ensure safety during real-world deployment. We provide theoretical safety guarantees of SafeDPA and show the robustness of SafeDPA against learning errors and extra perturbations. Comprehensive experiments on (1) classic control problems (Inverted Pendulum), (2) simulation benchmarks (Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate great superiority of SafeDPA in both safety and task performance, over state-of-the-art baselines. Particularly, SafeDPA demonstrates notable generalizability, achieving a 300% increase in safety rate compared to the baselines, under unseen disturbances in real-world experiments.
△ Less
Submitted 28 April, 2024; v1 submitted 7 October, 2023;
originally announced October 2023.
-
Learning Model Predictive Control with Error Dynamics Regression for Autonomous Racing
Authors:
Haoru Xue,
Edward L. Zhu,
John M. Dolan,
Francesco Borrelli
Abstract:
This work presents a novel Learning Model Predictive Control (LMPC) strategy for autonomous racing at the handling limit that can iteratively explore and learn unknown dynamics in high-speed operational domains. We start from existing LMPC formulations and modify the system dynamics learning method. In particular, our approach uses a nominal, global, nonlinear, physics-based model with a local, li…
▽ More
This work presents a novel Learning Model Predictive Control (LMPC) strategy for autonomous racing at the handling limit that can iteratively explore and learn unknown dynamics in high-speed operational domains. We start from existing LMPC formulations and modify the system dynamics learning method. In particular, our approach uses a nominal, global, nonlinear, physics-based model with a local, linear, data-driven learning of the error dynamics. We conducted experiments in simulation and on 1/10th scale hardware, and deployed the proposed LMPC on a full-scale autonomous race car used in the Indy Autonomous Challenge (IAC) with closed loop experiments at the Putnam Park Road Course in Indiana, USA. The results show that the proposed control policy exhibits improved robustness to parameter tuning and data scarcity. Incremental and safety-aware exploration toward the limit of handling and iterative learning of the vehicle dynamics in high-speed domains is observed both in simulations and experiments.
△ Less
Submitted 7 March, 2024; v1 submitted 19 September, 2023;
originally announced September 2023.
-
Spline-Based Minimum-Curvature Trajectory Optimization for Autonomous Racing
Authors:
Haoru Xue,
Tianwei Yue,
John M. Dolan
Abstract:
We propose a novel B-spline trajectory optimization method for autonomous racing. We consider the unavailability of sophisticated race car and race track dynamics in early-stage autonomous motorsports development and derive methods that work with limited dynamics data and additional conservative constraints. We formulate a minimum-curvature optimization problem with only the spline control points…
▽ More
We propose a novel B-spline trajectory optimization method for autonomous racing. We consider the unavailability of sophisticated race car and race track dynamics in early-stage autonomous motorsports development and derive methods that work with limited dynamics data and additional conservative constraints. We formulate a minimum-curvature optimization problem with only the spline control points as optimization variables. We then compare the current state-of-the-art method with our optimization result, which achieves a similar level of optimality with a 90% reduction on the decision variable dimension, and in addition offers mathematical smoothness guarantee and flexible manipulation options. We concurrently reduce the problem computation time from seconds to milliseconds for a long race track, enabling future online adaptation of the previously offline technique.
△ Less
Submitted 17 September, 2023;
originally announced September 2023.
-
Reasoning with Latent Diffusion in Offline Reinforcement Learning
Authors:
Siddarth Venkatraman,
Shivesh Khaitan,
Ravi Tej Akella,
John Dolan,
Jeff Schneider,
Glen Berseth
Abstract:
Offline reinforcement learning (RL) holds promise as a means to learn high-reward policies from a static dataset, without the need for further environment interactions. However, a key challenge in offline RL lies in effectively stitching portions of suboptimal trajectories from the static dataset while avoiding extrapolation errors arising due to a lack of support in the dataset. Existing approach…
▽ More
Offline reinforcement learning (RL) holds promise as a means to learn high-reward policies from a static dataset, without the need for further environment interactions. However, a key challenge in offline RL lies in effectively stitching portions of suboptimal trajectories from the static dataset while avoiding extrapolation errors arising due to a lack of support in the dataset. Existing approaches use conservative methods that are tricky to tune and struggle with multi-modal data (as we show) or rely on noisy Monte Carlo return-to-go samples for reward conditioning. In this work, we propose a novel approach that leverages the expressiveness of latent diffusion to model in-support trajectory sequences as compressed latent skills. This facilitates learning a Q-function while avoiding extrapolation error via batch-constraining. The latent space is also expressive and gracefully copes with multi-modal data. We show that the learned temporally-abstract latent space encodes richer task-specific information for offline RL tasks as compared to raw state-actions. This improves credit assignment and facilitates faster reward propagation during Q-learning. Our method demonstrates state-of-the-art performance on the D4RL benchmarks, particularly excelling in long-horizon, sparse-reward tasks.
△ Less
Submitted 12 September, 2023;
originally announced September 2023.
-
Towards Optimal Head-to-head Autonomous Racing with Curriculum Reinforcement Learning
Authors:
Dvij Kalaria,
Qin Lin,
John M. Dolan
Abstract:
Head-to-head autonomous racing is a challenging problem, as the vehicle needs to operate at the friction or handling limits in order to achieve minimum lap times while also actively looking for strategies to overtake/stay ahead of the opponent. In this work we propose a head-to-head racing environment for reinforcement learning which accurately models vehicle dynamics. Some previous works have tri…
▽ More
Head-to-head autonomous racing is a challenging problem, as the vehicle needs to operate at the friction or handling limits in order to achieve minimum lap times while also actively looking for strategies to overtake/stay ahead of the opponent. In this work we propose a head-to-head racing environment for reinforcement learning which accurately models vehicle dynamics. Some previous works have tried learning a policy directly in the complex vehicle dynamics environment but have failed to learn an optimal policy. In this work, we propose a curriculum learning-based framework by transitioning from a simpler vehicle model to a more complex real environment to teach the reinforcement learning agent a policy closer to the optimal policy. We also propose a control barrier function-based safe reinforcement learning algorithm to enforce the safety of the agent in a more effective way while not compromising on optimality.
△ Less
Submitted 25 August, 2023;
originally announced August 2023.
-
Risk-aware Safe Control for Decentralized Multi-agent Systems via Dynamic Responsibility Allocation
Authors:
Yiwei Lyu,
Wenhao Luo,
John M. Dolan
Abstract:
Decentralized control schemes are increasingly favored in various domains that involve multi-agent systems due to the need for computational efficiency as well as general applicability to large-scale systems. However, in the absence of an explicit global coordinator, it is hard for distributed agents to determine how to efficiently interact with others. In this paper, we present a risk-aware decen…
▽ More
Decentralized control schemes are increasingly favored in various domains that involve multi-agent systems due to the need for computational efficiency as well as general applicability to large-scale systems. However, in the absence of an explicit global coordinator, it is hard for distributed agents to determine how to efficiently interact with others. In this paper, we present a risk-aware decentralized control framework that provides guidance on how much relative responsibility share (a percentage) an individual agent should take to avoid collisions with others while moving efficiently without direct communications. We propose a novel Control Barrier Function (CBF)-inspired risk measurement to characterize the aggregate risk agents face from potential collisions under motion uncertainty. We use this measurement to allocate responsibility shares among agents dynamically and develop risk-aware decentralized safe controllers. In this way, we are able to leverage the flexibility of robots with lower risk to improve the motion flexibility for those with higher risk, thus achieving improved collective safety. We demonstrate the validity and efficiency of our proposed approach through two examples: ramp merging in autonomous driving and a multi-agent position-swap** game.
△ Less
Submitted 22 May, 2023;
originally announced May 2023.
-
Active Probing and Influencing Human Behaviors Via Autonomous Agents
Authors:
Shuangge Wang,
Yiwei Lyu,
John M. Dolan
Abstract:
Autonomous agents (robots) face tremendous challenges while interacting with heterogeneous human agents in close proximity. One of these challenges is that the autonomous agent does not have an accurate model tailored to the specific human that the autonomous agent is interacting with, which could sometimes result in inefficient human-robot interaction and suboptimal system dynamics. Develo** an…
▽ More
Autonomous agents (robots) face tremendous challenges while interacting with heterogeneous human agents in close proximity. One of these challenges is that the autonomous agent does not have an accurate model tailored to the specific human that the autonomous agent is interacting with, which could sometimes result in inefficient human-robot interaction and suboptimal system dynamics. Develo** an online method to enable the autonomous agent to learn information about the human model is therefore an ongoing research goal. Existing approaches position the robot as a passive learner in the environment to observe the physical states and the associated human response. This passive design, however, only allows the robot to obtain information that the human chooses to exhibit, which sometimes doesn't capture the human's full intention. In this work, we present an online optimization-based probing procedure for the autonomous agent to clarify its belief about the human model in an active manner. By optimizing an information radius, the autonomous agent chooses the action that most challenges its current conviction. This procedure allows the autonomous agent to actively probe the human agents to reveal information that's previously unavailable to the autonomous agent. With this gathered information, the autonomous agent can interactively influence the human agent for some designated objectives. Our main contributions include a coherent theoretical framework that unifies the probing and influence procedures and two case studies in autonomous driving that show how active probing can help to create better participant experience during influence, like higher efficiency or less perturbations.
△ Less
Submitted 24 April, 2023;
originally announced April 2023.
-
Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning
Authors:
Wenli Xiao,
Yiwei Lyu,
John Dolan
Abstract:
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses…
▽ More
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses computational challenges for synthesizing shields in complex multi-agent environments. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields to monitor agents in complex environments without coordination overheads. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model. The proposed algorithm obtains an approximate world model by interacting with the environment during the early stage of exploration, making our MBDS enjoy formal safety guarantees with high probability. We demonstrate in simulations that our framework can surpass existing baselines in terms of safety guarantees and learning performance.
△ Less
Submitted 13 April, 2023;
originally announced April 2023.
-
Adaptive Planning and Control with Time-Varying Tire Models for Autonomous Racing Using Extreme Learning Machine
Authors:
Dvij Kalaria,
Qin Lin,
John M. Dolan
Abstract:
Autonomous racing is a challenging problem, as the vehicle needs to operate at the friction or handling limits in order to achieve minimum lap times. Autonomous race cars require highly accurate perception, state estimation, planning and precise application of controls. What makes it even more challenging is the accurate identification of vehicle model parameters that dictate the effects of the la…
▽ More
Autonomous racing is a challenging problem, as the vehicle needs to operate at the friction or handling limits in order to achieve minimum lap times. Autonomous race cars require highly accurate perception, state estimation, planning and precise application of controls. What makes it even more challenging is the accurate identification of vehicle model parameters that dictate the effects of the lateral tire slip, which may change over time, for example, due to wear and tear of the tires. Current works either propose model identification offline or need good parameters to start with (within 15-20\% of actual value), which is not enough to account for major changes in tire model that occur during actual races when driving at the control limits. We propose a unified framework which learns the tire model online from the collected data, as well as adjusts the model based on environmental changes even if the model parameters change by a higher margin. We demonstrate our approach in numeric and high-fidelity simulators for a 1:43 scale race car and a full-size car.
△ Less
Submitted 14 March, 2023;
originally announced March 2023.
-
Towards Safety Assured End-to-End Vision-Based Control for Autonomous Racing
Authors:
Dvij Kalaria,
Qin Lin,
John M. Dolan
Abstract:
Autonomous car racing is a challenging task, as it requires precise applications of control while the vehicle is operating at cornering speeds. Traditional autonomous pipelines require accurate pre-map**, localization, and planning which make the task computationally expensive and environment-dependent. Recent works propose use of imitation and reinforcement learning to train end-to-end deep neu…
▽ More
Autonomous car racing is a challenging task, as it requires precise applications of control while the vehicle is operating at cornering speeds. Traditional autonomous pipelines require accurate pre-map**, localization, and planning which make the task computationally expensive and environment-dependent. Recent works propose use of imitation and reinforcement learning to train end-to-end deep neural networks and have shown promising results for high-speed racing. However, the end-to-end models may be dangerous to be deployed on real systems, as the neural networks are treated as black-box models devoid of any provable safety guarantees. In this work we propose a decoupled approach where an optimal end-to-end controller and a state prediction end-to-end model are learned together, and the predicted state of the vehicle is used to formulate a control barrier function for safeguarding the vehicle to stay within lane boundaries. We validate our algorithm both on a high-fidelity Carla driving simulator and a 1/10-scale RC car on a real track. The evaluation results suggest that using an explicit safety controller helps to learn the task safely with fewer iterations and makes it possible to safely navigate the vehicle on the track along the more challenging racing line.
△ Less
Submitted 3 March, 2023;
originally announced March 2023.
-
Safe Reinforcement Learning with Probabilistic Control Barrier Functions for Ramp Merging
Authors:
Soumith Udatha,
Yiwei Lyu,
John Dolan
Abstract:
Prior work has looked at applying reinforcement learning and imitation learning approaches to autonomous driving scenarios, but either the safety or the efficiency of the algorithm is compromised. With the use of control barrier functions embedded into the reinforcement learning policy, we arrive at safe policies to optimize the performance of the autonomous driving vehicle. However, control barri…
▽ More
Prior work has looked at applying reinforcement learning and imitation learning approaches to autonomous driving scenarios, but either the safety or the efficiency of the algorithm is compromised. With the use of control barrier functions embedded into the reinforcement learning policy, we arrive at safe policies to optimize the performance of the autonomous driving vehicle. However, control barrier functions need a good approximation of the model of the car. We use probabilistic control barrier functions as an estimate of the model uncertainty. The algorithm is implemented as an online version in the CARLA (Dosovitskiy et al., 2017) Simulator and as an offline version on a dataset extracted from the NGSIM Database. The proposed algorithm is not just a safe ramp merging algorithm but a safe autonomous driving algorithm applied to address ramp merging on highways.
△ Less
Submitted 1 December, 2022;
originally announced December 2022.
-
Safe Control Under Input Limits with Neural Control Barrier Functions
Authors:
Simin Liu,
Changliu Liu,
John Dolan
Abstract:
We propose new methods to synthesize control barrier function (CBF)-based safe controllers that avoid input saturation, which can cause safety violations. In particular, our method is created for high-dimensional, general nonlinear systems, for which such tools are scarce. We leverage techniques from machine learning, like neural networks and deep learning, to simplify this challenging problem in…
▽ More
We propose new methods to synthesize control barrier function (CBF)-based safe controllers that avoid input saturation, which can cause safety violations. In particular, our method is created for high-dimensional, general nonlinear systems, for which such tools are scarce. We leverage techniques from machine learning, like neural networks and deep learning, to simplify this challenging problem in nonlinear control design. The method consists of a learner-critic architecture, in which the critic gives counterexamples of input saturation and the learner optimizes a neural CBF to eliminate those counterexamples. We provide empirical results on a 10D state, 4D input quadcopter-pendulum system. Our learned CBF avoids input saturation and maintains safety over nearly 100% of trials.
△ Less
Submitted 20 November, 2022;
originally announced November 2022.
-
Spatio-temporal Motion Planning for Autonomous Vehicles with Trapezoidal Prism Corridors and Bézier Curves
Authors:
Srujan Deolasee,
Qin Lin,
Jialun Li,
John M. Dolan
Abstract:
Safety-guaranteed motion planning is critical for self-driving cars to generate collision-free trajectories. A layered motion planning approach with decoupled path and speed planning is widely used for this purpose. This approach is prone to be suboptimal in the presence of dynamic obstacles. Spatial-temporal approaches deal with path planning and speed planning simultaneously; however, the existi…
▽ More
Safety-guaranteed motion planning is critical for self-driving cars to generate collision-free trajectories. A layered motion planning approach with decoupled path and speed planning is widely used for this purpose. This approach is prone to be suboptimal in the presence of dynamic obstacles. Spatial-temporal approaches deal with path planning and speed planning simultaneously; however, the existing methods only support simple-shaped corridors like cuboids, which restrict the search space for optimization in complex scenarios. We propose to use trapezoidal prism-shaped corridors for optimization, which significantly enlarges the solution space compared to the existing cuboidal corridors-based method. Finally, a piecewise Bézier curve optimization is conducted in our proposed corridors. This formulation theoretically guarantees the safety of the continuous-time trajectory. We validate the efficiency and effectiveness of the proposed approach in numerical and CommonRoad simulations.
△ Less
Submitted 29 September, 2022;
originally announced September 2022.
-
Delay-aware Robust Control for Safe Autonomous Driving and Racing
Authors:
Dvij Kalaria,
Qin Lin,
John M. Dolan
Abstract:
Delays endanger safety of autonomous systems operating in a rapidly changing environment, such as nondeterministic surrounding traffic participants in autonomous driving and high-speed racing. Unfortunately, delays are typically not considered during the conventional controller design or learning-enabled controller training phases prior to deployment in the physical world. In this paper, the compu…
▽ More
Delays endanger safety of autonomous systems operating in a rapidly changing environment, such as nondeterministic surrounding traffic participants in autonomous driving and high-speed racing. Unfortunately, delays are typically not considered during the conventional controller design or learning-enabled controller training phases prior to deployment in the physical world. In this paper, the computation delay from nonlinear optimization for motion planning and control, as well as other unavoidable delays caused by actuators, are addressed systematically and unifiedly. To deal with all these delays, in our framework: 1) we propose a new filtering approach with no prior knowledge of dynamics and disturbance distribution to adaptively and safely estimate the time-variant computation delay; 2) we model actuation dynamics for steering delay; 3) all the constrained optimization is realized in a robust tube model predictive controller. For the application merits, we demonstrate that our approach is suitable for both autonomous driving and autonomous racing. Our approach is a novel design for a standalone delay compensation controller. In addition, in the case that a learning-enabled controller assuming no delay works as a primary controller, our approach serves as the primary controller's safety guard.
△ Less
Submitted 29 August, 2022;
originally announced August 2022.
-
Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning
Authors:
Adam Villaflor,
Zhe Huang,
Swapnil Pande,
John Dolan,
Jeff Schneider
Abstract:
Impressive results in natural language processing (NLP) based on the Transformer neural network architecture have inspired researchers to explore viewing offline reinforcement learning (RL) as a generic sequence modeling problem. Recent works based on this paradigm have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, because thes…
▽ More
Impressive results in natural language processing (NLP) based on the Transformer neural network architecture have inspired researchers to explore viewing offline reinforcement learning (RL) as a generic sequence modeling problem. Recent works based on this paradigm have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, because these methods jointly model the states and actions as a single sequencing problem, they struggle to disentangle the effects of the policy and world dynamics on the return. Thus, in adversarial or stochastic environments, these methods lead to overly optimistic behavior that can be dangerous in safety-critical systems like autonomous driving. In this work, we propose a method that addresses this optimism bias by explicitly disentangling the policy and world models, which allows us at test time to search for policies that are robust to multiple possible futures in the environment. We demonstrate our method's superior performance on a variety of autonomous driving tasks in simulation.
△ Less
Submitted 21 July, 2022;
originally announced July 2022.
-
State Dropout-Based Curriculum Reinforcement Learning for Self-Driving at Unsignalized Intersections
Authors:
Shivesh Khaitan,
John M. Dolan
Abstract:
Traversing intersections is a challenging problem for autonomous vehicles, especially when the intersections do not have traffic control. Recently deep reinforcement learning has received massive attention due to its success in dealing with autonomous driving tasks. In this work, we address the problem of traversing unsignalized intersections using a novel curriculum for deep reinforcement learnin…
▽ More
Traversing intersections is a challenging problem for autonomous vehicles, especially when the intersections do not have traffic control. Recently deep reinforcement learning has received massive attention due to its success in dealing with autonomous driving tasks. In this work, we address the problem of traversing unsignalized intersections using a novel curriculum for deep reinforcement learning. The proposed curriculum leads to: 1) A faster training process for the reinforcement learning agent, and 2) Better performance compared to an agent trained without curriculum. Our main contribution is two-fold: 1) Presenting a unique curriculum for training deep reinforcement learning agents, and 2) showing the application of the proposed curriculum for the unsignalized intersection traversal task. The framework expects processed observations of the surroundings from the perception system of the autonomous vehicle. We test our method in the CommonRoad motion planning simulator on T-intersections and four-way intersections.
△ Less
Submitted 9 July, 2022;
originally announced July 2022.
-
Responsibility-associated Multi-agent Collision Avoidance with Social Preferences
Authors:
Yiwei Lyu,
Wenhao Luo,
John M. Dolan
Abstract:
This paper introduces a novel social preference-aware decentralized safe control framework to address the responsibility allocation problem in multi-agent collision avoidance. Considering that agents do not necessarily cooperate in symmetric ways, this paper focuses on semi-cooperative behavior among heterogeneous agents with varying cooperation levels. Drawing upon the idea of Social Value Orient…
▽ More
This paper introduces a novel social preference-aware decentralized safe control framework to address the responsibility allocation problem in multi-agent collision avoidance. Considering that agents do not necessarily cooperate in symmetric ways, this paper focuses on semi-cooperative behavior among heterogeneous agents with varying cooperation levels. Drawing upon the idea of Social Value Orientation (SVO) for quantifying the individual selfishness, we propose a novel concept of Responsibility-associated Social Value Orientation (R-SVO) to express the intended relative social implications between pairwise agents. This is used to redefine each agent's social preferences or personalities in terms of corresponding responsibility shares in contributing to the coordination scenario, such as semi-cooperative collision avoidance where all agents interact in an asymmetric way. By incorporating such relative social implications through proposed Local Pairwise Responsibility Weights, we develop a Responsibility-associated Control Barrier Function-based safe control framework for individual agents, and multi-agent collision avoidance is achieved with formally provable safety guarantees. Simulations are provided to demonstrate the effectiveness and efficiency of the proposed framework in several multi-agent navigation tasks, such as a position-swap** game, a self-driving car highway ramp merging scenario, and a circular position swap** game.
△ Less
Submitted 17 June, 2022;
originally announced June 2022.
-
Provable Probabilistic Safety and Feasibility-Assured Control for Autonomous Vehicles using Exponential Control Barrier Functions
Authors:
Spencer Van Koevering,
Yiwei Lyu,
Wenhao Luo,
John Dolan
Abstract:
With the increasing need for safe control in the domain of autonomous driving, model-based safety-critical control approaches are widely used, especially Control Barrier Function (CBF)-based approaches. Among them, Exponential CBF (eCBF) is particularly popular due to its realistic applicability to high-relative-degree systems. However, for most of the optimization-based controllers utilizing CBF-…
▽ More
With the increasing need for safe control in the domain of autonomous driving, model-based safety-critical control approaches are widely used, especially Control Barrier Function (CBF)-based approaches. Among them, Exponential CBF (eCBF) is particularly popular due to its realistic applicability to high-relative-degree systems. However, for most of the optimization-based controllers utilizing CBF-based constraints, solution feasibility is a common issue arising from potential conflict among different constraints. Moreover, how to incorporate uncertainty into the eCBF-based constraints in high-relative-degree systems to account for safety remains an open challenge. In this paper, we present a novel approach to extend an eCBF-based safe critical controller to a probabilistic setting to handle potential motion uncertainty from system dynamics. More importantly, we leverage an optimization-based technique to provide a solution feasibility guarantee in run time, while ensuring probabilistic safety. Lane changing and intersection handling are demonstrated as two use cases, and experiment results are provided to show the effectiveness of the proposed approach.
△ Less
Submitted 7 May, 2022;
originally announced May 2022.
-
BATS: Best Action Trajectory Stitching
Authors:
Ian Char,
Viraj Mehta,
Adam Villaflor,
John M. Dolan,
Jeff Schneider
Abstract:
The problem of offline reinforcement learning focuses on learning a good policy from a log of environment interactions. Past efforts for develo** algorithms in this area have revolved around introducing constraints to online reinforcement learning algorithms to ensure the actions of the learned policy are constrained to the logged data. In this work, we explore an alternative approach by plannin…
▽ More
The problem of offline reinforcement learning focuses on learning a good policy from a log of environment interactions. Past efforts for develo** algorithms in this area have revolved around introducing constraints to online reinforcement learning algorithms to ensure the actions of the learned policy are constrained to the logged data. In this work, we explore an alternative approach by planning on the fixed dataset directly. Specifically, we introduce an algorithm which forms a tabular Markov Decision Process (MDP) over the logged data by adding new transitions to the dataset. We do this by using learned dynamics models to plan short trajectories between states. Since exact value iteration can be performed on this constructed MDP, it becomes easy to identify which trajectories are advantageous to add to the MDP. Crucially, since most transitions in this MDP come from the logged data, trajectories from the MDP can be rolled out for long periods with confidence. We prove that this property allows one to make upper and lower bounds on the value function up to appropriate distance metrics. Finally, we demonstrate empirically how algorithms that uniformly constrain the learned policy to the entire dataset can result in unwanted behavior, and we show an example in which simply behavior cloning the optimal policy of the MDP created by our algorithm avoids this problem.
△ Less
Submitted 25 April, 2022;
originally announced April 2022.
-
Adaptive Safe Merging Control for Heterogeneous Autonomous Vehicles using Parametric Control Barrier Functions
Authors:
Yiwei Lyu,
Wenhao Luo,
John M. Dolan
Abstract:
With the increasing emphasis on the safe autonomy for robots, model-based safe control approaches such as Control Barrier Functions have been extensively studied to ensure guaranteed safety during inter-robot interactions. In this paper, we introduce the Parametric Control Barrier Function (Parametric-CBF), a novel variant of the traditional Control Barrier Function to extend its expressivity in d…
▽ More
With the increasing emphasis on the safe autonomy for robots, model-based safe control approaches such as Control Barrier Functions have been extensively studied to ensure guaranteed safety during inter-robot interactions. In this paper, we introduce the Parametric Control Barrier Function (Parametric-CBF), a novel variant of the traditional Control Barrier Function to extend its expressivity in describing different safe behaviors among heterogeneous robots. Instead of assuming cooperative and homogeneous robots using the same safe controllers, the ego robot is able to model the neighboring robots' underlying safe controllers through different Parametric-CBFs with observed data. Given learned parametric-CBF and proved forward invariance, it provides greater flexibility for the ego robot to better coordinate with other heterogeneous robots with improved efficiency while enjoying formally provable safety guarantees. We demonstrate the usage of Parametric-CBF in behavior prediction and adaptive safe control in the ramp merging scenario from the applications of autonomous driving. Compared to traditional CBF, Parametric-CBF has the advantage of capturing varying drivers' characteristics given richer description of robot behavior in the context of safe control. Numerical simulations are given to validate the effectiveness of the proposed method.
△ Less
Submitted 20 February, 2022;
originally announced February 2022.
-
Delay-aware Robust Control for Safe Autonomous Driving
Authors:
Dvij Kalaria,
Qin Lin,
John M. Dolan
Abstract:
With the advancement of affordable self-driving vehicles using complicated nonlinear optimization but limited computation resources, computation time becomes a matter of concern. Other factors such as actuator dynamics and actuator command processing cost also unavoidably cause delays. In high-speed scenarios, these delays are critical to the safety of a vehicle. Recent works consider these delays…
▽ More
With the advancement of affordable self-driving vehicles using complicated nonlinear optimization but limited computation resources, computation time becomes a matter of concern. Other factors such as actuator dynamics and actuator command processing cost also unavoidably cause delays. In high-speed scenarios, these delays are critical to the safety of a vehicle. Recent works consider these delays individually, but none unifies them all in the context of autonomous driving. Moreover, recent works inappropriately consider computation time as a constant or a large upper bound, which makes the control either less responsive or over-conservative. To deal with all these delays, we present a unified framework by 1) modeling actuation dynamics, 2) using robust tube model predictive control, 3) using a novel adaptive Kalman filter without assuminga known process model and noise covariance, which makes the controller safe while minimizing conservativeness. On onehand, our approach can serve as a standalone controller; on theother hand, our approach provides a safety guard for a high-level controller, which assumes no delay. This can be used for compensating the sim-to-real gap when deploying a black-box learning-enabled controller trained in a simplistic environment without considering delays for practical vehicle systems.
△ Less
Submitted 16 October, 2023; v1 submitted 15 September, 2021;
originally announced September 2021.
-
Probabilistic Safety-Assured Adaptive Merging Control for Autonomous Vehicles
Authors:
Yiwei Lyu,
Wenhao Luo,
John M. Dolan
Abstract:
Autonomous vehicles face tremendous challenges while interacting with human drivers in different kinds of scenarios. Develo** control methods with safety guarantees while performing interactions with uncertainty is an ongoing research goal. In this paper, we present a real-time safe control framework using bi-level optimization with Control Barrier Function (CBF) that enables an autonomous ego v…
▽ More
Autonomous vehicles face tremendous challenges while interacting with human drivers in different kinds of scenarios. Develo** control methods with safety guarantees while performing interactions with uncertainty is an ongoing research goal. In this paper, we present a real-time safe control framework using bi-level optimization with Control Barrier Function (CBF) that enables an autonomous ego vehicle to interact with human-driven cars in ramp merging scenarios with a consistent safety guarantee. In order to explicitly address motion uncertainty, we propose a novel extension of control barrier functions to a probabilistic setting with provable chance-constrained safety and analyze the feasibility of our control design. The formulated bi-level optimization framework entails first choosing the ego vehicle's optimal driving style in terms of safety and primary objective, and then minimally modifying a nominal controller in the context of quadratic programming subject to the probabilistic safety constraints. This allows for adaptation to different driving strategies with a formally provable feasibility guarantee for the ego vehicle's safe controller. Experimental results are provided to demonstrate the effectiveness of our proposed approach.
△ Less
Submitted 29 April, 2021;
originally announced April 2021.
-
Learning to Robustly Negotiate Bi-Directional Lane Usage in High-Conflict Driving Scenarios
Authors:
Christoph Killing,
Adam Villaflor,
John M. Dolan
Abstract:
Recently, autonomous driving has made substantial progress in addressing the most common traffic scenarios like intersection navigation and lane changing. However, most of these successes have been limited to scenarios with well-defined traffic rules and require minimal negotiation with other vehicles. In this paper, we introduce a previously unconsidered, yet everyday, high-conflict driving scena…
▽ More
Recently, autonomous driving has made substantial progress in addressing the most common traffic scenarios like intersection navigation and lane changing. However, most of these successes have been limited to scenarios with well-defined traffic rules and require minimal negotiation with other vehicles. In this paper, we introduce a previously unconsidered, yet everyday, high-conflict driving scenario requiring negotiations between agents of equal rights and priorities. There exists no centralized control structure and we do not allow communications. Therefore, it is unknown if other drivers are willing to cooperate, and if so to what extent. We train policies to robustly negotiate with opposing vehicles of an unobservable degree of cooperativeness using multi-agent reinforcement learning (MARL). We propose Discrete Asymmetric Soft Actor-Critic (DASAC), a maximum-entropy off-policy MARL algorithm allowing for centralized training with decentralized execution. We show that using DASAC we are able to successfully negotiate and traverse the scenario considered over 99% of the time. Our agents are robust to an unknown timing of opponent decisions, an unobservable degree of cooperativeness of the opposing vehicle, and previously unencountered policies. Furthermore, they learn to exhibit human-like behaviors such as defensive driving, anticipating solution options and interpreting the behavior of other agents.
△ Less
Submitted 22 March, 2021;
originally announced March 2021.
-
Trajectory Planning for Autonomous Vehicles Using Hierarchical Reinforcement Learning
Authors:
Kaleb Ben Naveed,
Zhiqian Qiao,
John M. Dolan
Abstract:
Planning safe trajectories under uncertain and dynamic conditions makes the autonomous driving problem significantly complex. Current sampling-based methods such as Rapidly Exploring Random Trees (RRTs) are not ideal for this problem because of the high computational cost. Supervised learning methods such as Imitation Learning lack generalization and safety guarantees. To address these problems an…
▽ More
Planning safe trajectories under uncertain and dynamic conditions makes the autonomous driving problem significantly complex. Current sampling-based methods such as Rapidly Exploring Random Trees (RRTs) are not ideal for this problem because of the high computational cost. Supervised learning methods such as Imitation Learning lack generalization and safety guarantees. To address these problems and in order to ensure a robust framework, we propose a Hierarchical Reinforcement Learning (HRL) structure combined with a Proportional-Integral-Derivative (PID) controller for trajectory planning. HRL helps divide the task of autonomous vehicle driving into sub-goals and supports the network to learn policies for both high-level options and low-level trajectory planner choices. The introduction of sub-goals decreases convergence time and enables the policies learned to be reused for other scenarios. In addition, the proposed planner is made robust by guaranteeing smooth trajectories and by handling the noisy perception system of the ego-car. The PID controller is used for tracking the waypoints, which ensures smooth trajectories and reduces jerk. The problem of incomplete observations is handled by using a Long-Short-Term-Memory (LSTM) layer in the network. Results from the high-fidelity CARLA simulator indicate that the proposed method reduces convergence time, generates smoother trajectories, and is able to handle dynamic surroundings and noisy observations.
△ Less
Submitted 9 November, 2020;
originally announced November 2020.
-
Safe Trajectory Planning Using Reinforcement Learning for Self Driving
Authors:
Josiah Coad,
Zhiqian Qiao,
John M. Dolan
Abstract:
Self-driving vehicles must be able to act intelligently in diverse and difficult environments, marked by high-dimensional state spaces, a myriad of optimization objectives and complex behaviors. Traditionally, classical optimization and search techniques have been applied to the problem of self-driving; but they do not fully address operations in environments with high-dimensional states and compl…
▽ More
Self-driving vehicles must be able to act intelligently in diverse and difficult environments, marked by high-dimensional state spaces, a myriad of optimization objectives and complex behaviors. Traditionally, classical optimization and search techniques have been applied to the problem of self-driving; but they do not fully address operations in environments with high-dimensional states and complex behaviors. Recently, imitation learning has been proposed for the task of self-driving; but it is labor-intensive to obtain enough training data. Reinforcement learning has been proposed as a way to directly control the car, but this has safety and comfort concerns. We propose using model-free reinforcement learning for the trajectory planning stage of self-driving and show that this approach allows us to operate the car in a more safe, general and comfortable manner, required for the task of self driving.
△ Less
Submitted 9 November, 2020;
originally announced November 2020.
-
Behavior Planning at Urban Intersections through Hierarchical Reinforcement Learning
Authors:
Zhiqian Qiao,
Jeff Schneider,
John M. Dolan
Abstract:
For autonomous vehicles, effective behavior planning is crucial to ensure safety of the ego car. In many urban scenarios, it is hard to create sufficiently general heuristic rules, especially for challenging scenarios that some new human drivers find difficult. In this work, we propose a behavior planning structure based on reinforcement learning (RL) which is capable of performing autonomous vehi…
▽ More
For autonomous vehicles, effective behavior planning is crucial to ensure safety of the ego car. In many urban scenarios, it is hard to create sufficiently general heuristic rules, especially for challenging scenarios that some new human drivers find difficult. In this work, we propose a behavior planning structure based on reinforcement learning (RL) which is capable of performing autonomous vehicle behavior planning with a hierarchical structure in simulated urban environments. Application of the hierarchical structure allows the various layers of the behavior planning system to be satisfied. Our algorithms can perform better than heuristic-rule-based methods for elective decisions such as when to turn left between vehicles approaching from the opposite direction or possible lane-change when approaching an intersection due to lane blockage or delay in front of the ego car. Such behavior is hard to evaluate as correct or incorrect, but for some aggressive expert human drivers handle such scenarios effectively and quickly. On the other hand, compared to traditional RL methods, our algorithm is more sample-efficient, due to the use of a hybrid reward mechanism and heuristic exploration during the training process. The results also show that the proposed method converges to an optimal policy faster than traditional RL methods.
△ Less
Submitted 9 November, 2020;
originally announced November 2020.
-
Safe planning and control under uncertainty for self-driving
Authors:
Shivesh Khaitan,
Qin Lin,
John M. Dolan
Abstract:
Motion Planning under uncertainty is critical for safe self-driving. In this paper, we propose a unified obstacle avoidance framework that deals with 1) uncertainty in ego-vehicle motion; and 2) prediction uncertainty of dynamic obstacles from environment. A two-stage traffic participant trajectory predictor comprising short-term and long-term prediction is used in the planning layer to generate s…
▽ More
Motion Planning under uncertainty is critical for safe self-driving. In this paper, we propose a unified obstacle avoidance framework that deals with 1) uncertainty in ego-vehicle motion; and 2) prediction uncertainty of dynamic obstacles from environment. A two-stage traffic participant trajectory predictor comprising short-term and long-term prediction is used in the planning layer to generate safe but not over-conservative trajectories for the ego vehicle. The prediction module cooperates well with existing planning approaches. Our work showcases its effectiveness in a Frenet frame planner. A robust controller using tube MPC guarantees safe execution of the trajectory in the presence of state noise and dynamic model uncertainty. A Gaussian process regression model is used for online identification of the uncertainty's bound. We demonstrate effectiveness, safety, and real-time performance of our framework in the CARLA simulator.
△ Less
Submitted 21 October, 2020;
originally announced October 2020.
-
Depth Completion via Inductive Fusion of Planar LIDAR and Monocular Camera
Authors:
Chen Fu,
Chiyu Dong,
Christoph Mertz,
John M. Dolan
Abstract:
Modern high-definition LIDAR is expensive for commercial autonomous driving vehicles and small indoor robots. An affordable solution to this problem is fusion of planar LIDAR with RGB images to provide a similar level of perception capability. Even though state-of-the-art methods provide approaches to predict depth information from limited sensor input, they are usually a simple concatenation of s…
▽ More
Modern high-definition LIDAR is expensive for commercial autonomous driving vehicles and small indoor robots. An affordable solution to this problem is fusion of planar LIDAR with RGB images to provide a similar level of perception capability. Even though state-of-the-art methods provide approaches to predict depth information from limited sensor input, they are usually a simple concatenation of sparse LIDAR features and dense RGB features through an end-to-end fusion architecture. In this paper, we introduce an inductive late-fusion block which better fuses different sensor modalities inspired by a probability model. The proposed demonstration and aggregation network propagates the mixed context and depth features to the prediction network and serves as a prior knowledge of the depth completion. This late-fusion block uses the dense context features to guide the depth prediction based on demonstrations by sparse depth features. In addition to evaluating the proposed method on benchmark depth completion datasets including NYUDepthV2 and KITTI, we also test the proposed method on a simulated planar LIDAR dataset. Our method shows promising results compared to previous approaches on both the benchmark datasets and simulated dataset with various 3D densities.
△ Less
Submitted 3 September, 2020;
originally announced September 2020.
-
Safe Planning for Self-Driving Via Adaptive Constrained ILQR
Authors:
Yanjun Pan,
Qin Lin,
Het Shah,
John M. Dolan
Abstract:
Constrained Iterative Linear Quadratic Regulator (CILQR), a variant of ILQR, has been recently proposed for motion planning problems of autonomous vehicles to deal with constraints such as obstacle avoidance and reference tracking. However, the previous work considers either deterministic trajectories or persistent prediction for target dynamical obstacles. The other drawback is lack of generality…
▽ More
Constrained Iterative Linear Quadratic Regulator (CILQR), a variant of ILQR, has been recently proposed for motion planning problems of autonomous vehicles to deal with constraints such as obstacle avoidance and reference tracking. However, the previous work considers either deterministic trajectories or persistent prediction for target dynamical obstacles. The other drawback is lack of generality - it requires manual weight tuning for different scenarios. In this paper, two significant improvements are achieved. Firstly, a two-stage uncertainty-aware prediction is proposed. The short-term prediction with safety guarantee based on reachability analysis is responsible for dealing with extreme maneuvers conducted by target vehicles. The long-term prediction leveraging an adaptive least square filter preserves the long-term optimality of the planned trajectory since using reachability only for long-term prediction is too pessimistic and makes the planner over-conservative. Secondly, to allow a wider coverage over different scenarios and to avoid tedious parameter tuning case by case, this paper designs a scenario-based analytical function taking the states from the ego vehicle and the target vehicle as input, and carrying weights of a cost function as output. It allows the ego vehicle to execute multiple behaviors (such as lane-kee** and overtaking) under a single planner. We demonstrate safety, effectiveness, and real-time performance of the proposed planner in simulations.
△ Less
Submitted 5 March, 2020;
originally announced March 2020.
-
Human Driver Behavior Prediction based on UrbanFlow
Authors:
Zhiqian Qiao,
**g Zhao,
Zachariah Tyree,
Priyantha Mudalige,
Jeff Schneider,
John M. Dolan
Abstract:
How autonomous vehicles and human drivers share public transportation systems is an important problem, as fully automatic transportation environments are still a long way off. Understanding human drivers' behavior can be beneficial for autonomous vehicle decision making and planning, especially when the autonomous vehicle is surrounded by human drivers who have various driving behaviors and patter…
▽ More
How autonomous vehicles and human drivers share public transportation systems is an important problem, as fully automatic transportation environments are still a long way off. Understanding human drivers' behavior can be beneficial for autonomous vehicle decision making and planning, especially when the autonomous vehicle is surrounded by human drivers who have various driving behaviors and patterns of interaction with other vehicles. In this paper, we propose an LSTM-based trajectory prediction method for human drivers which can help the autonomous vehicle make better decisions, especially in urban intersection scenarios. Meanwhile, in order to collect human drivers' driving behavior data in the urban scenario, we describe a system called UrbanFlow which includes the whole procedure from raw bird's-eye view data collection via drone to the final processed trajectories. The system is mainly intended for urban scenarios but can be extended to be used for any traffic scenarios.
△ Less
Submitted 9 November, 2019;
originally announced November 2019.
-
Hierarchical Reinforcement Learning Method for Autonomous Vehicle Behavior Planning
Authors:
Zhiqian Qiao,
Zachariah Tyree,
Priyantha Mudalige,
Jeff Schneider,
John M. Dolan
Abstract:
In this work, we propose a hierarchical reinforcement learning (HRL) structure which is capable of performing autonomous vehicle planning tasks in simulated environments with multiple sub-goals. In this hierarchical structure, the network is capable of 1) learning one task with multiple sub-goals simultaneously; 2) extracting attentions of states according to changing sub-goals during the learning…
▽ More
In this work, we propose a hierarchical reinforcement learning (HRL) structure which is capable of performing autonomous vehicle planning tasks in simulated environments with multiple sub-goals. In this hierarchical structure, the network is capable of 1) learning one task with multiple sub-goals simultaneously; 2) extracting attentions of states according to changing sub-goals during the learning process; 3) reusing the well-trained network of sub-goals for other similar tasks with the same sub-goals. The states are defined as processed observations which are transmitted from the perception system of the autonomous vehicle. A hybrid reward mechanism is designed for different hierarchical layers in the proposed HRL structure. Compared to traditional RL methods, our algorithm is more sample-efficient since its modular design allows reusing the policies of sub-goals across similar tasks. The results show that the proposed method converges to an optimal policy faster than traditional RL methods.
△ Less
Submitted 9 November, 2019;
originally announced November 2019.
-
Learning a Safety Verifiable Adaptive Cruise Controller from Human Driving Data
Authors:
Qin Lin,
Sicco Verwer,
John Dolan
Abstract:
Imitation learning provides a way to automatically construct a controller by mimicking human behavior from data. For safety-critical systems such as autonomous vehicles, it can be problematic to use controllers learned from data because they cannot be guaranteed to be collision-free. Recently, a method has been proposed for learning a multi-mode hybrid automaton cruise controller (MOHA). Besides b…
▽ More
Imitation learning provides a way to automatically construct a controller by mimicking human behavior from data. For safety-critical systems such as autonomous vehicles, it can be problematic to use controllers learned from data because they cannot be guaranteed to be collision-free. Recently, a method has been proposed for learning a multi-mode hybrid automaton cruise controller (MOHA). Besides being accurate, the logical nature of this model makes it suitable for formal verification. In this paper, we demonstrate this capability using the SpaceEx hybrid model checker as follows. After learning, we translate the automaton model into constraints and equations required by SpaceEx. We then verify that a pure MOHA controller is not collision-free. By adding a safety state based on headway in time, a rule that human drivers should follow anyway, we do obtain a provably safe cruise control. Moreover, the safe controller remains more human-like than existing cruise controllers.
△ Less
Submitted 29 October, 2019;
originally announced October 2019.
-
Measuring Similarity of Interactive Driving Behaviors Using Matrix Profile
Authors:
Qin Lin,
Wenshuo Wang,
Yihuan Zhang,
John Dolan
Abstract:
Understanding multi-vehicle interactive behaviors with temporal sequential observations is crucial for autonomous vehicles to make appropriate decisions in an uncertain traffic environment. On-demand similarity measures are significant for autonomous vehicles to deal with massive interactive driving behaviors by clustering and classifying diverse scenarios. This paper proposes a general approach f…
▽ More
Understanding multi-vehicle interactive behaviors with temporal sequential observations is crucial for autonomous vehicles to make appropriate decisions in an uncertain traffic environment. On-demand similarity measures are significant for autonomous vehicles to deal with massive interactive driving behaviors by clustering and classifying diverse scenarios. This paper proposes a general approach for measuring spatiotemporal similarity of interactive behaviors using a multivariate matrix profile technique. The key attractive features of the approach are its superior space and time complexity, real-time online computing for streaming traffic data, and possible capability of leveraging hardware for parallel computation. The proposed approach is validated through automatically discovering similar interactive driving behaviors at intersections from sequential data.
△ Less
Submitted 11 March, 2020; v1 submitted 28 October, 2019;
originally announced October 2019.
-
Low-cost LIDAR based Vehicle Pose Estimation and Tracking
Authors:
Chen Fu,
Chiyu Dong,
Xiao Zhang,
John M. Dolan
Abstract:
Detecting surrounding vehicles by low-cost LIDAR has been drawing enormous attention. In low-cost LIDAR, vehicles present a multi-layer L-Shape. Based on our previous optimization/criteria-based L-Shape fitting algorithm, we here propose a data-driven and model-based method for robust vehicle segmentation and tracking. The new method uses T-linkage RANSAC to take a limited amount of noisy data and…
▽ More
Detecting surrounding vehicles by low-cost LIDAR has been drawing enormous attention. In low-cost LIDAR, vehicles present a multi-layer L-Shape. Based on our previous optimization/criteria-based L-Shape fitting algorithm, we here propose a data-driven and model-based method for robust vehicle segmentation and tracking. The new method uses T-linkage RANSAC to take a limited amount of noisy data and performs a robust segmentation for a moving car against noise. Compared with our previous method, T-Linkage RANSAC is more tolerant of observation uncertainties, i.e., the number of sides of the target being observed, and gets rid of the L-Shape assumption. In addition, a vehicle tracking system with Multi-Model Association (MMA) is built upon the segmentation result, which provides smooth trajectories of tracked objects. A manually labeled dataset from low-cost multi-layer LIDARs for validation will also be released with the paper. Experiments on the dataset show that the new approach outperforms previous ones based on multiple criteria. The new algorithm can also run in real-time.
△ Less
Submitted 3 October, 2019;
originally announced October 2019.
-
Learning On-Road Visual Control for Self-Driving Vehicles with Auxiliary Tasks
Authors:
Yilun Chen,
Praveen Palanisamy,
Priyantha Mudalige,
Katharina Muelling,
John M. Dolan
Abstract:
A safe and robust on-road navigation system is a crucial component of achieving fully automated vehicles. NVIDIA recently proposed an End-to-End algorithm that can directly learn steering commands from raw pixels of a front camera by using one convolutional neural network. In this paper, we leverage auxiliary information aside from raw images and design a novel network structure, called Auxiliary…
▽ More
A safe and robust on-road navigation system is a crucial component of achieving fully automated vehicles. NVIDIA recently proposed an End-to-End algorithm that can directly learn steering commands from raw pixels of a front camera by using one convolutional neural network. In this paper, we leverage auxiliary information aside from raw images and design a novel network structure, called Auxiliary Task Network (ATN), to help boost the driving performance while maintaining the advantage of minimal training data and an End-to-End training method. In this network, we introduce human prior knowledge into vehicle navigation by transferring features from image recognition tasks. Image semantic segmentation is applied as an auxiliary task for navigation. We consider temporal information by introducing an LSTM module and optical flow to the network. Finally, we combine vehicle kinematics with a sensor fusion step. We discuss the benefits of our method over state-of-the-art visual navigation methods both in the Udacity simulation environment and on the real-world Comma.ai dataset.
△ Less
Submitted 19 December, 2018;
originally announced December 2018.
-
Decentralized Data Fusion and Active Sensing with Mobile Sensors for Modeling and Predicting Spatiotemporal Traffic Phenomena
Authors:
Jie Chen,
Kian Hsiang Low,
Colin Keng-Yan Tan,
Ali Oran,
Patrick Jaillet,
John Dolan,
Gaurav Sukhatme
Abstract:
The problem of modeling and predicting spatiotemporal traffic phenomena over an urban road network is important to many traffic applications such as detecting and forecasting congestion hotspots. This paper presents a decentralized data fusion and active sensing (D2FAS) algorithm for mobile sensors to actively explore the road network to gather and assimilate the most informative data for predicti…
▽ More
The problem of modeling and predicting spatiotemporal traffic phenomena over an urban road network is important to many traffic applications such as detecting and forecasting congestion hotspots. This paper presents a decentralized data fusion and active sensing (D2FAS) algorithm for mobile sensors to actively explore the road network to gather and assimilate the most informative data for predicting the traffic phenomenon. We analyze the time and communication complexity of D2FAS and demonstrate that it can scale well with a large number of observations and sensors. We provide a theoretical guarantee on its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the Gaussian process (GP) model: The computation of such a sparse approximate GP model can thus be parallelized and distributed among the mobile sensors (in a Google-like MapReduce paradigm), thereby achieving efficient and scalable prediction. We also theoretically guarantee its active sensing performance that improves under various practical environmental conditions. Empirical evaluation on real-world urban road network data shows that our D2FAS algorithm is significantly more time-efficient and scalable than state-oftheart centralized algorithms while achieving comparable predictive performance.
△ Less
Submitted 9 August, 2014;
originally announced August 2014.
-
Information-Theoretic Approach to Efficient Adaptive Path Planning for Mobile Robotic Environmental Sensing
Authors:
Kian Hsiang Low,
John M. Dolan,
Pradeep Khosla
Abstract:
Recent research in robot exploration and map** has focused on sampling environmental hotspot fields. This exploration task is formalized by Low, Dolan, and Khosla (2008) in a sequential decision-theoretic planning under uncertainty framework called MASP. The time complexity of solving MASP approximately depends on the map resolution, which limits its use in large-scale, high-resolution explorati…
▽ More
Recent research in robot exploration and map** has focused on sampling environmental hotspot fields. This exploration task is formalized by Low, Dolan, and Khosla (2008) in a sequential decision-theoretic planning under uncertainty framework called MASP. The time complexity of solving MASP approximately depends on the map resolution, which limits its use in large-scale, high-resolution exploration and map**. To alleviate this computational difficulty, this paper presents an information-theoretic approach to MASP (iMASP) for efficient adaptive path planning; by reformulating the cost-minimizing iMASP as a reward-maximizing problem, its time complexity becomes independent of map resolution and is less sensitive to increasing robot team size as demonstrated both theoretically and empirically. Using the reward-maximizing dual, we derive a novel adaptive variant of maximum entropy sampling, thus improving the induced exploration policy performance. It also allows us to establish theoretical bounds quantifying the performance advantage of optimal adaptive over non-adaptive policies and the performance quality of approximately optimal vs. optimal adaptive policies. We show analytically and empirically the superior performance of iMASP-based policies for sampling the log-Gaussian process to that of policies for the widely-used Gaussian process in map** the hotspot field. Lastly, we provide sufficient conditions that, when met, guarantee adaptivity has no benefit under an assumed environment model.
△ Less
Submitted 27 May, 2013;
originally announced May 2013.
-
Multi-Robot Informative Path Planning for Active Sensing of Environmental Phenomena: A Tale of Two Algorithms
Authors:
Nannan Cao,
Kian Hsiang Low,
John M. Dolan
Abstract:
A key problem of robotic environmental sensing and monitoring is that of active sensing: How can a team of robots plan the most informative observation paths to minimize the uncertainty in modeling and predicting an environmental phenomenon? This paper presents two principled approaches to efficient information-theoretic path planning based on entropy and mutual information criteria for in situ ac…
▽ More
A key problem of robotic environmental sensing and monitoring is that of active sensing: How can a team of robots plan the most informative observation paths to minimize the uncertainty in modeling and predicting an environmental phenomenon? This paper presents two principled approaches to efficient information-theoretic path planning based on entropy and mutual information criteria for in situ active sensing of an important broad class of widely-occurring environmental phenomena called anisotropic fields. Our proposed algorithms are novel in addressing a trade-off between active sensing performance and time efficiency. An important practical consequence is that our algorithms can exploit the spatial correlation structure of Gaussian process-based anisotropic fields to improve time efficiency while preserving near-optimal active sensing performance. We analyze the time complexity of our algorithms and prove analytically that they scale better than state-of-the-art algorithms with increasing planning horizon length. We provide theoretical guarantees on the active sensing performance of our algorithms for a class of exploration tasks called transect sampling, which, in particular, can be improved with longer planning time and/or lower spatial correlation along the transect. Empirical evaluation on real-world anisotropic field data shows that our algorithms can perform better or at least as well as the state-of-the-art algorithms while often incurring a few orders of magnitude less computational time, even when the field conditions are less favorable.
△ Less
Submitted 5 February, 2013; v1 submitted 4 February, 2013;
originally announced February 2013.
-
Decentralized Data Fusion and Active Sensing with Mobile Sensors for Modeling and Predicting Spatiotemporal Traffic Phenomena
Authors:
Jie Chen,
Kian Hsiang Low,
Colin Keng-Yan Tan,
Ali Oran,
Patrick Jaillet,
John M. Dolan,
Gaurav S. Sukhatme
Abstract:
The problem of modeling and predicting spatiotemporal traffic phenomena over an urban road network is important to many traffic applications such as detecting and forecasting congestion hotspots. This paper presents a decentralized data fusion and active sensing (D2FAS) algorithm for mobile sensors to actively explore the road network to gather and assimilate the most informative data for predicti…
▽ More
The problem of modeling and predicting spatiotemporal traffic phenomena over an urban road network is important to many traffic applications such as detecting and forecasting congestion hotspots. This paper presents a decentralized data fusion and active sensing (D2FAS) algorithm for mobile sensors to actively explore the road network to gather and assimilate the most informative data for predicting the traffic phenomenon. We analyze the time and communication complexity of D2FAS and demonstrate that it can scale well with a large number of observations and sensors. We provide a theoretical guarantee on its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the Gaussian process (GP) model: The computation of such a sparse approximate GP model can thus be parallelized and distributed among the mobile sensors (in a Google-like MapReduce paradigm), thereby achieving efficient and scalable prediction. We also theoretically guarantee its active sensing performance that improves under various practical environmental conditions. Empirical evaluation on real-world urban road network data shows that our D2FAS algorithm is significantly more time-efficient and scalable than state-of-the-art centralized algorithms while achieving comparable predictive performance.
△ Less
Submitted 28 June, 2012; v1 submitted 27 June, 2012;
originally announced June 2012.
-
Active Markov Information-Theoretic Path Planning for Robotic Environmental Sensing
Authors:
Kian Hsiang Low,
John M. Dolan,
Pradeep Khosla
Abstract:
Recent research in multi-robot exploration and map** has focused on sampling environmental fields, which are typically modeled using the Gaussian process (GP). Existing information-theoretic exploration strategies for learning GP-based environmental field maps adopt the non-Markovian problem structure and consequently scale poorly with the length of history of observations. Hence, it becomes com…
▽ More
Recent research in multi-robot exploration and map** has focused on sampling environmental fields, which are typically modeled using the Gaussian process (GP). Existing information-theoretic exploration strategies for learning GP-based environmental field maps adopt the non-Markovian problem structure and consequently scale poorly with the length of history of observations. Hence, it becomes computationally impractical to use these strategies for in situ, real-time active sampling. To ease this computational burden, this paper presents a Markov-based approach to efficient information-theoretic path planning for active sampling of GP-based fields. We analyze the time complexity of solving the Markov-based path planning problem, and demonstrate analytically that it scales better than that of deriving the non-Markovian strategies with increasing length of planning horizon. For a class of exploration tasks called the transect sampling task, we provide theoretical guarantees on the active sampling performance of our Markov-based policy, from which ideal environmental field conditions and sampling task settings can be established to limit its performance degradation due to violation of the Markov assumption. Empirical evaluation on real-world temperature and plankton density field data shows that our Markov-based policy can generally achieve active sampling performance comparable to that of the widely-used non-Markovian greedy policies under less favorable realistic field conditions and task settings while enjoying significant computational gain over them.
△ Less
Submitted 28 January, 2011;
originally announced January 2011.