-
Formally Verifying Deep Reinforcement Learning Controllers with Lyapunov Barrier Certificates
Authors:
Udayan Mandal,
Guy Amir,
Haoze Wu,
Ieva Daukantas,
Fletcher Lee Newell,
Umberto J. Ravaioli,
Baoluo Meng,
Michael Durling,
Milan Ganai,
Tobey Shim,
Guy Katz,
Clark Barrett
Abstract:
Deep reinforcement learning (DRL) is a powerful machine learning paradigm for generating agents that control autonomous systems. However, the "black box" nature of DRL agents limits their deployment in real-world safety-critical applications. A promising approach for providing strong guarantees on an agent's behavior is to use Neural Lyapunov Barrier (NLB) certificates, which are learned functions…
▽ More
Deep reinforcement learning (DRL) is a powerful machine learning paradigm for generating agents that control autonomous systems. However, the "black box" nature of DRL agents limits their deployment in real-world safety-critical applications. A promising approach for providing strong guarantees on an agent's behavior is to use Neural Lyapunov Barrier (NLB) certificates, which are learned functions over the system whose properties indirectly imply that an agent behaves as desired. However, NLB-based certificates are typically difficult to learn and even more difficult to verify, especially for complex systems. In this work, we present a novel method for training and verifying NLB-based certificates for discrete-time systems. Specifically, we introduce a technique for certificate composition, which simplifies the verification of highly-complex systems by strategically designing a sequence of certificates. When jointly verified with neural network verification engines, these certificates provide a formal guarantee that a DRL agent both achieves its goals and avoids unsafe behavior. Furthermore, we introduce a technique for certificate filtering, which significantly simplifies the process of producing formally verified certificates. We demonstrate the merits of our approach with a case study on providing safety and liveness guarantees for a DRL-controlled spacecraft.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
Iterative Reachability Estimation for Safe Reinforcement Learning
Authors:
Milan Ganai,
Zheng Gong,
Chenning Yu,
Sylvia Herbert,
Sicun Gao
Abstract:
Ensuring safety is important for the practical deployment of reinforcement learning (RL). Various challenges must be addressed, such as handling stochasticity in the environments, providing rigorous guarantees of persistent state-wise safety satisfaction, and avoiding overly conservative behaviors that sacrifice performance. We propose a new framework, Reachability Estimation for Safe Policy Optim…
▽ More
Ensuring safety is important for the practical deployment of reinforcement learning (RL). Various challenges must be addressed, such as handling stochasticity in the environments, providing rigorous guarantees of persistent state-wise safety satisfaction, and avoiding overly conservative behaviors that sacrifice performance. We propose a new framework, Reachability Estimation for Safe Policy Optimization (RESPO), for safety-constrained RL in general stochastic settings. In the feasible set where there exist violation-free policies, we optimize for rewards while maintaining persistent safety. Outside this feasible set, our optimization produces the safest behavior by guaranteeing entrance into the feasible set whenever possible with the least cumulative discounted violations. We introduce a class of algorithms using our novel reachability estimation function to optimize in our proposed framework and in similar frameworks such as those concurrently handling multiple hard and soft constraints. We theoretically establish that our algorithms almost surely converge to locally optimal policies of our safe optimization framework. We evaluate the proposed methods on a diverse suite of safe RL environments from Safety Gym, PyBullet, and MuJoCo, and show the benefits in improving both reward performance and safety compared with state-of-the-art baselines.
△ Less
Submitted 23 September, 2023;
originally announced September 2023.
-
Target-independent XLA optimization using Reinforcement Learning
Authors:
Milan Ganai,
Haichen Li,
Theodore Enns,
Yida Wang,
Randy Huang
Abstract:
An important challenge in Machine Learning compilers like XLA is multi-pass optimization and analysis. There has been recent interest chiefly in XLA target-dependent optimization on the graph-level, subgraph-level, and kernel-level phases. We specifically focus on target-independent optimization XLA HLO pass ordering: our approach aims at finding the optimal sequence of compiler optimization passe…
▽ More
An important challenge in Machine Learning compilers like XLA is multi-pass optimization and analysis. There has been recent interest chiefly in XLA target-dependent optimization on the graph-level, subgraph-level, and kernel-level phases. We specifically focus on target-independent optimization XLA HLO pass ordering: our approach aims at finding the optimal sequence of compiler optimization passes, which is decoupled from target-dependent optimization. However, there is little domain specific study in pass ordering for XLA HLO. To this end, we propose introducing deep Reinforcement Learning (RL) based search for optimal XLA HLO pass ordering. We also propose enhancements to the deep RL algorithms to further improve optimal search performance and open the research direction for domain-specific guidance for RL. We create an XLA Gym experimentation framework as a tool to enable RL algorithms to interact with the compiler for passing optimizations and thereby train agents. Overall, in our experimentation we observe an average of $13.3\%$ improvement in operation count reduction on a benchmark of GPT-2 training graphs and $10.4\%$ improvement on a diverse benchmark including GPT-2, BERT, and ResNet graphs using the proposed approach over the compiler's default phase ordering.
△ Less
Submitted 28 August, 2023;
originally announced August 2023.
-
Learning Stabilization Control from Observations by Learning Lyapunov-like Proxy Models
Authors:
Milan Ganai,
Chiaki Hirayama,
Ya-Chien Chang,
Sicun Gao
Abstract:
The deployment of Reinforcement Learning to robotics applications faces the difficulty of reward engineering. Therefore, approaches have focused on creating reward functions by Learning from Observations (LfO) which is the task of learning policies from expert trajectories that only contain state sequences. We propose new methods for LfO for the important class of continuous control problems of le…
▽ More
The deployment of Reinforcement Learning to robotics applications faces the difficulty of reward engineering. Therefore, approaches have focused on creating reward functions by Learning from Observations (LfO) which is the task of learning policies from expert trajectories that only contain state sequences. We propose new methods for LfO for the important class of continuous control problems of learning to stabilize, by introducing intermediate proxy models acting as reward functions between the expert and the agent policy based on Lyapunov stability theory. Our LfO training process consists of two steps. The first step attempts to learn a Lyapunov-like landscape proxy model from expert state sequences without access to any kinematics model, and the second step uses the learned landscape model to guide in training the learner's policy. We formulate novel learning objectives for the two steps that are important for overall training success. We evaluate our methods in real automobile robot environments and other simulated stabilization control problems in model-free settings, like Quadrotor control and maintaining upright positions of Hopper in MuJoCo. We compare with state-of-the-art approaches and show the proposed methods can learn efficiently with less expert observations.
△ Less
Submitted 3 March, 2023;
originally announced March 2023.
-
Verification of Embedded Memory Systems using Efficient Memory Modeling
Authors:
Malay K. Ganai,
Aarti Gupta,
Pranav Ashar
Abstract:
We describe verification techniques for embedded memory systems using efficient memory modeling (EMM), without explicitly modeling each memory bit. We extend our previously proposed approach of EMM in Bounded Model Checking (BMC) for a single read/write port single memory system, to more commonly occurring systems with multiple memories, having multiple read and write ports. More importantly, we…
▽ More
We describe verification techniques for embedded memory systems using efficient memory modeling (EMM), without explicitly modeling each memory bit. We extend our previously proposed approach of EMM in Bounded Model Checking (BMC) for a single read/write port single memory system, to more commonly occurring systems with multiple memories, having multiple read and write ports. More importantly, we augment such EMM to providing correctness proofs, in addition to finding real bugs as before. The novelties of our verification approach are in a) combining EMM with proof-based abstraction that preserves the correctness of a property up to a certain analysis depth of SAT-based BMC, and b) modeling arbitrary initial memory state precisely and thereby, providing inductive proofs using SAT-based BMC for embedded memory systems. Similar to the previous approach, we construct a verification model by eliminating memory arrays, but retaining the memory interface signals with their control logic and adding constraints on those signals at every analysis depth to preserve the data forwarding semantics. The size of these EMM constraints depends quadratically on the number of memory accesses and the number of read and write ports; and linearly on the address and data widths and the number of memories. We show the effectiveness of our approach on several industry designs and software programs.
△ Less
Submitted 25 October, 2007;
originally announced October 2007.