-
Neural Bridge Sampling for Evaluating Safety-Critical Autonomous Systems
Authors:
Aman Sinha,
Matthew O'Kelly,
Russ Tedrake,
John Duchi
Abstract:
Learning-based methodologies increasingly find applications in safety-critical domains like autonomous driving and medical robotics. Due to the rare nature of dangerous events, real-world testing is prohibitively expensive and unscalable. In this work, we employ a probabilistic approach to safety evaluation in simulation, where we are concerned with computing the probability of dangerous events. We develop a novel rare-event simulation method that combines exploration, exploitation, and optimization techniques to find failure modes and estimate their rate of occurrence. We provide rigorous guarantees for the performance of our method in terms of both statistical and computational efficiency. Finally, we demonstrate the efficacy of our approach on a variety of scenarios, illustrating its usefulness as a tool for rapid sensitivity analysis and model comparison that are essential to developing and testing safety-critical autonomous systems.
Submitted 8 August, 2021; v1 submitted 24 August, 2020;
originally announced August 2020.
-
FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis
Authors:
Aman Sinha,
Matthew O'Kelly,
Hongrui Zheng,
Rahul Mangharam,
John Duchi,
Russ Tedrake
Abstract:
Balancing performance and safety is crucial to deploying autonomous vehicles in multi-agent environments. In particular, autonomous racing is a domain that penalizes safe but conservative policies, highlighting the need for robust, adaptive strategies. Current approaches either make simplifying assumptions about other agents or lack robust mechanisms for online adaptation. This work makes algorithmic contributions to both challenges. First, to generate a realistic, diverse set of opponents, we develop a novel method for self-play based on replica-exchange Markov chain Monte Carlo. Second, we propose a distributionally robust bandit optimization procedure that adaptively adjusts risk aversion relative to uncertainty in beliefs about opponents' behaviors. We rigorously quantify the tradeoffs in performance and robustness when approximating these computations in real-time motion-planning, and we demonstrate our methods experimentally on autonomous vehicles that achieve scaled speeds comparable to Formula One racecars.
Submitted 22 August, 2020; v1 submitted 8 March, 2020;
originally announced March 2020.
-
Efficient Black-box Assessment of Autonomous Vehicle Safety
Authors:
Justin Norden,
Matthew O'Kelly,
Aman Sinha
Abstract:
While autonomous vehicle (AV) technology has shown substantial progress, we still lack tools for rigorous and scalable testing. Real-world testing, the $\textit{de-facto}$ evaluation method, is dangerous to the public. Moreover, due to the rare nature of failures, billions of miles of driving are needed to statistically validate performance claims. Thus, the industry has largely turned to simulation to evaluate AV systems. However, having a simulation stack alone is not a solution. A simulation testing framework needs to prioritize which scenarios to run, learn how the chosen scenarios provide coverage of failure modes, and rank failure scenarios in order of importance. We implement a simulation testing framework that evaluates an entire modern AV system as a black box. This framework estimates the probability of accidents under a base distribution governing standard traffic behavior. In order to accelerate rare-event probability evaluation, we efficiently learn to identify and rank failure scenarios via adaptive importance-sampling methods. Using this framework, we conduct the first independent evaluation of a full-stack commercial AV system, Comma AI's OpenPilot.
Submitted 5 June, 2020; v1 submitted 8 December, 2019;
originally announced December 2019.
-
In-silico Risk Analysis of Personalized Artificial Pancreas Controllers via Rare-event Simulation
Authors:
Matthew O'Kelly,
Aman Sinha,
Justin Norden,
Hongseok Namkoong
Abstract:
Modern treatments for Type 1 diabetes (T1D) use devices known as artificial pancreata (APs), which combine an insulin pump with a continuous glucose monitor (CGM) operating in a closed-loop manner to control blood glucose levels. In practice, poor performance of APs (frequent hyper- or hypoglycemic events) is common enough at a population level that many T1D patients modify the algorithms on existing AP systems with unregulated open-source software. Anecdotally, the patients in this group have shown superior outcomes compared with standard of care, yet we do not understand how safe any AP system is since adverse outcomes are rare. In this paper, we construct generative models of individual patients' physiological characteristics and eating behaviors. We then couple these models with a T1D simulator approved for pre-clinical trials by the FDA. Given the ability to simulate patient outcomes in-silico, we utilize techniques from rare-event simulation theory in order to efficiently quantify the performance of a device with respect to a particular patient. We show a 72,000$\times$ speedup in simulation speed over real-time and up to a 2-10 times increase in the frequency with which we are able to sample adverse conditions relative to standard Monte Carlo sampling. In practice, our toolchain enables estimates of the likelihood of hypoglycemic events with approximately an order of magnitude fewer simulations.
Submitted 1 December, 2018;
originally announced December 2018.
-
Scalable End-to-End Autonomous Vehicle Testing via Rare-event Simulation
Authors:
Matthew O'Kelly,
Aman Sinha,
Hongseok Namkoong,
John Duchi,
Russ Tedrake
Abstract:
While recent developments in autonomous vehicle (AV) technology highlight substantial progress, we lack tools for rigorous and scalable testing. Real-world testing, the $\textit{de facto}$ evaluation environment, places the public in danger, and, due to the rare nature of accidents, will require billions of miles in order to statistically validate performance claims. We implement a simulation framework that can test an entire modern autonomous driving system, including, in particular, systems that employ deep-learning perception and control algorithms. Using adaptive importance-sampling methods to accelerate rare-event probability evaluation, we estimate the probability of an accident under a base distribution governing standard traffic behavior. We demonstrate our framework on a highway scenario, accelerating system evaluation by $2$-$20$ times over naive Monte Carlo sampling methods and $10$-$300 \mathsf{P}$ times (where $\mathsf{P}$ is the number of processors) over real-world testing.
Submitted 12 January, 2019; v1 submitted 31 October, 2018;
originally announced November 2018.