
State and Input Constrained Output-Feedback Adaptive Optimal Control of Affine Nonlinear Systems

Authors: Tochukwu Elijah Ogri, Muzaffar Qureshi, Zachary I. Bell, Rushikesh Kamalapurkar

Abstract: In this paper, a novel online, output-feedback, critic-only, model-based reinforcement learning framework is developed for safety-critical control systems operating in complex environments. The developed framework ensures system stability and safety, regardless of the lack of full-state measurement, while learning and implementing an optimal controller. The approach leverages linear matrix inequality-based observer design method to efficiently search for observer gains for effective state estimation. Then, approximate dynamic programming is used to develop an approximate controller that uses simulated experiences to guarantee the safety and stability of the closed-loop system. Safety is enforced by adding a recentered robust Lyapunov-like barrier function to the cost function that effectively enforces safety constraints, even in the presence of uncertainty in the state. Lyapunov-based stability analysis is used to guarantee uniform ultimate boundedness of the trajectories of the closed-loop system and ensure safety. Simulation studies are performed to demonstrate the effectiveness of the developed method through two real-world safety-critical scenarios, ensuring that the state trajectories of a given system remain in a given set and obstacle avoidance.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.10124 [pdf]

Technical Report: A Totally Asynchronous Nesterov's Accelerated Gradient Method for Convex Optimization

Authors: Ellie Pond, April Sebok, Zachary Bell, Matthew Hale

Abstract: We present a totally asynchronous algorithm for convex optimization that is based on a novel generalization of Nesterov's accelerated gradient method. This algorithm is developed for fast convergence under "total asynchrony," i.e., allowing arbitrarily long delays between agents' computations and communications without assuming any form of delay bound. These conditions may arise, for example, due to jamming by adversaries. Our framework is block-based, in the sense that each agent is only responsible for computing updates to (and communicating the values of) a small subset of the network-level decision variables. In our main result, we present bounds on the algorithm's parameters that guarantee linear convergence to an optimizer. Then, we quantify the relationship between (i) the total number of computations and communications executed by the agents and (ii) the agents' collective distance to an optimum. Numerical simulations show that this algorithm requires 28% fewer iterations than the heavy ball algorithm and 61% fewer iterations than gradient descent under total asynchrony.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 11 pages, 1 figure
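
For orientation, the synchronous, single-agent versions of two of the methods compared in the abstract above can be sketched as follows. This is not the totally asynchronous, block-based algorithm of the report; the diagonal quadratic test problem, the function names, and the textbook step-size and momentum choices are all assumptions made for illustration.

```python
# Synchronous baselines on f(x) = 0.5 * sum(d_i * x_i**2), whose minimizer is 0.
# Illustrative sketch only: NOT the asynchronous algorithm from the report.

def grad(x, diag):
    """Gradient of the diagonal quadratic f(x) = 0.5 * sum(d_i * x_i**2)."""
    return [d * xi for d, xi in zip(diag, x)]


def norm(x):
    """Euclidean norm; equals the distance to the minimizer at the origin."""
    return sum(xi * xi for xi in x) ** 0.5


def gradient_descent(x0, diag, step, tol=1e-8, max_iter=100000):
    """Plain gradient descent. Returns (final iterate, iterations used)."""
    x = list(x0)
    for k in range(max_iter):
        if norm(x) < tol:
            return x, k
        g = grad(x, diag)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, max_iter


def nesterov(x0, diag, step, momentum, tol=1e-8, max_iter=100000):
    """Nesterov's accelerated gradient with a constant momentum weight."""
    x = list(x0)
    x_prev = list(x0)
    for k in range(max_iter):
        if norm(x) < tol:
            return x, k
        # Extrapolate along the previous displacement, then take the
        # gradient step from the extrapolated point.
        y = [xi + momentum * (xi - xp) for xi, xp in zip(x, x_prev)]
        g = grad(y, diag)
        x_prev = x
        x = [yi - step * gi for yi, gi in zip(y, g)]
    return x, max_iter


if __name__ == "__main__":
    diag = [1.0, 10.0, 100.0]          # eigenvalues: mu = 1, L = 100
    L, mu = max(diag), min(diag)
    beta = (L ** 0.5 - mu ** 0.5) / (L ** 0.5 + mu ** 0.5)
    x0 = [1.0, 1.0, 1.0]
    _, k_gd = gradient_descent(x0, diag, 1.0 / L)
    _, k_nag = nesterov(x0, diag, 1.0 / L, beta)
    print("gradient descent iterations:", k_gd)
    print("Nesterov iterations:", k_nag)
```

On this ill-conditioned toy problem the accelerated method should need noticeably fewer iterations than gradient descent, which mirrors the kind of comparison the abstract reports. The percentages quoted there refer to the authors' asynchronous setting and are not reproduced by this sketch.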

arXiv:2402.00671 [pdf, other]

Uncertainty-Aware Guidance for Target Tracking subject to Intermittent Measurements using Motion Model Learning

Authors: Andres Pulido, Kyle Volle, Kristy Waters, Zachary I. Bell, Prashant Ganesh, Jane Shin

Abstract: This letter presents a novel guidance law for target tracking applications where the target motion model is unknown and sensor measurements are intermittent due to unknown environmental conditions and low measurement update rate. In this work, the target motion model is represented by a transformer-based neural network and trained by previous target position measurements. This neural network (NN)-based motion model serves as the prediction step in a particle filter for target state estimation and uncertainty quantification. Then this estimation uncertainty is utilized in the information-driven guidance law to compute a path for the mobile agent to travel to a position with maximum expected entropy reduction (EER). The computation of EER is performed in real-time by approximating the probability distribution of the state using the particle representation from particle filter. Simulation and hardware experiments are performed with a quadcopter agent and TurtleBot target to demonstrate that the presented guidance law outperforms two other baseline guidance methods.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.09658 [pdf, ps, other]

An adaptive optimal control approach to monocular depth observability maximization

Authors: Tochukwu Elijah Ogri, Muzaffar Qureshi, Zachary I. Bell, Kristy Waters, Rushikesh Kamalapurkar

Abstract: This paper presents an integral concurrent learning (ICL)-based observer for a monocular camera to accurately estimate the Euclidean distance to features on a stationary object, under the restriction that state information is unavailable. Using distance estimates, an infinite horizon optimal regulation problem is solved, which aims to regulate the camera to a goal location while maximizing feature observability. Lyapunov-based stability analysis is used to guarantee exponential convergence of depth estimates and input-to-state stability of the goal location relative to the camera. The effectiveness of the proposed approach is verified in simulation, and a table illustrating improved observability is provided.

Submitted 6 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

arXiv:2311.03338 [pdf, other]

Defending a Static Target Point with a Slow Defender

Authors: Goutam Das, Michael Dorothy, Zachary I. Bell, Daigo Shishika

Abstract: This paper studies a target-defense game played between a slow defender and a fast attacker. The attacker wins the game if it reaches the target while avoiding the defender's capture disk. The defender wins the game by preventing the attacker from reaching the target, which includes reaching the target and containing it in the capture disk. Depending on the initial condition, the attacker must circumnavigate the defender's capture disk, resulting in a constrained trajectory. This condition produces three phases of the game, which we analyze to solve for the game of kind. We provide the barrier surface that divides the state space into attacker-win and defender win regions, and present the corresponding strategies that guarantee win for each region. Numerical experiments demonstrate the theoretical results as well as the efficacy of the proposed strategies.

Submitted 16 March, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: 8 pages, 12 figures, Accepted for Publication to IEEE ACC 2024

arXiv:2310.09502 [pdf, ps, other]

Deep Nonlinear Adaptive Control for Unmanned Aerial Systems Operating under Dynamic Uncertainties

Authors: Zachary Lamb, Zachary I. Bell, Matthew Longmire, Jared Paquet, Prashant Ganesh, Ricardo Sanfelice

Abstract: Recent literature in the field of machine learning (ML) control has shown promising theoretical results for a Deep Neural Network (DNN) based Nonlinear Adaptive Controller (DNAC) capable of achieving trajectory tracking for nonlinear systems. Expanding on this work, this paper applies DNAC to the Attitude Control System (ACS) of a quadrotor and shows improvement to attitude control performance under disturbed flying conditions where the model uncertainty is high. Moreover, these results are noteworthy for ML control because they were achieved with no prior training data and an arbitrary system dynamics initialization; simply put, the controller presented in this paper is practically modelless, yet yields the ability to force trajectory tracking for nonlinear systems while rejecting significant undesirable model disturbances learned through a DNN. The combination of ML techniques to learn a system's dynamics and the Lyapunov analysis required to provide stability guarantees leads to a controller with applications in safety-critical systems that may undergo uncertain model changes, as is the case for most aerial systems. Experimental findings are analyzed in the final section of this paper, and DNAC is shown to outperform the trajectory tracking capabilities of PID, MRAC, and the recently developed Deep Model Reference Adaptive Control (DMRAC) schemes.

Submitted 14 October, 2023; originally announced October 2023.

arXiv:2309.13753 [pdf, other]

Policy Stitching: Learning Transferable Robot Policies

Authors: Pingcheng Jian, Easop Lee, Zachary Bell, Michael M. Zavlanos, Boyuan Chen

Abstract: Training robots with reinforcement learning (RL) typically involves heavy interactions with the environment, and the acquired skills are often sensitive to changes in task environments and robot kinematics. Transfer RL aims to leverage previous knowledge to accelerate learning of new tasks or new body configurations. However, existing methods struggle to generalize to novel robot-task combinations and scale to realistic tasks due to complex architecture design or strong regularization that limits the capacity of the learned policy. We propose Policy Stitching, a novel framework that facilitates robot transfer learning for novel combinations of robots and tasks. Our key idea is to apply modular policy design and align the latent representations between the modular interfaces. Our method allows direct stitching of the robot and task modules trained separately to form a new policy for fast adaptation. Our simulated and real-world experiments on various 3D manipulation tasks demonstrate the superior zero-shot and few-shot transfer learning performances of our method. Our project website is at: http://generalroboticslab.com/PolicyStitching/ .

Submitted 24 September, 2023; originally announced September 2023.

Comments: CoRL 2023

arXiv:2304.01526 [pdf, ps, other]

State and Parameter Estimation for Affine Nonlinear Systems

Authors: Tochukwu Elijah Ogri, Zachary I. Bell, Rushikesh Kamalapurkar

Abstract: Real-world control applications in complex and uncertain environments require adaptability to handle model uncertainties and robustness against disturbances. This paper presents an online, output-feedback, critic-only, model-based reinforcement learning architecture that simultaneously learns and implements an optimal controller while maintaining stability during the learning phase. Using multiplier matrices, a convenient way to search for observer gains is designed along with a controller that learns from simulated experience to ensure stability and convergence of trajectories of the closed-loop system to a neighborhood of the origin. Local uniform ultimate boundedness of the trajectories is established using a Lyapunov-based analysis and demonstrated through simulation results, under mild excitation conditions.

Submitted 21 April, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

Comments: 16 pages, 2 figures, Submitted to 62nd IEEE Conference on Decision and Control

arXiv:2210.06637 [pdf, other]

Output Feedback Adaptive Optimal Control of Affine Nonlinear systems with a Linear Measurement Model

Authors: Tochukwu Elijah Ogri, S. M. Nahid Mahmud, Zachary I. Bell, Rushikesh Kamalapurkar

Abstract: Real-world control applications in complex and uncertain environments require adaptability to handle model uncertainties and robustness against disturbances. This paper presents an online, output-feedback, critic-only, model-based reinforcement learning architecture that simultaneously learns and implements an optimal controller while maintaining stability during the learning phase. Using multiplier matrices, a convenient way to search for observer gains is designed along with a controller that learns from simulated experience to ensure stability and convergence of trajectories of the closed-loop system to a neighborhood of the origin. Local uniform ultimate boundedness of the trajectories is established using a Lyapunov-based analysis and demonstrated through simulation results, under mild excitation conditions.

Submitted 3 April, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: 16 pages, 5 figures, submitted to 2023 IEEE Conference on Control Technology and Applications

arXiv:2209.09318 [pdf, other]

Guarding a Non-Maneuverable Translating Line with an Attached Defender

Authors: Goutam Das, Michael Dorothy, Zachary I. Bell, Daigo Shishika

Abstract: In this paper we consider a target-guarding differential game where the defender must protect a linearly translating line-segment by intercepting an attacker who tries to reach it. In contrast to common target-guarding problems, we assume that the defender is attached to the target and moves along with it. This assumption affects the defenders' maximum speed in inertial frame, which depends on the target's direction of motion. Zero-sum differential game of degree for both the attacker-win and defender-win scenarios are studied, where the payoff is defined to be the distance between the two agents at the time of game termination. We derive the equilibrium strategies and the Value function by leveraging the solution for the infinite-length target scenario. The zero-level set of this Value function provides the barrier surface that divides the state space into defender-win and attacker-win regions. We present simulation results to demonstrate the theoretical results.

Submitted 19 September, 2022; originally announced September 2022.

Comments: 8 pages, 8 figures. arXiv admin note: text overlap with arXiv:2207.04098

arXiv:2204.01409 [pdf, other]

Safe Controller for Output Feedback Linear Systems using Model-Based Reinforcement Learning

Authors: S M Nahid Mahmud, Moad Abudia, Scott A Nivison, Zachary I. Bell, Rushikesh Kamalapurkar

Abstract: The objective of this research is to enable safety-critical systems to simultaneously learn and execute optimal control policies in a safe manner to achieve complex autonomy. Learning optimal policies via trial and error, i.e., traditional reinforcement learning, is difficult to implement in safety-critical systems, particularly when task restarts are unavailable. Safe model-based reinforcement learning techniques based on a barrier transformation have recently been developed to address this problem. However, these methods rely on full state feedback, limiting their usability in a real-world environment. In this work, an output-feedback safe model-based reinforcement learning technique based on a novel barrier-aware dynamic state estimator has been designed to address this issue. The developed approach facilitates simultaneous learning and execution of safe control policies for safety-critical linear systems. Simulation results indicate that barrier transformation is an effective approach to achieve online reinforcement learning in safety-critical systems using output feedback.

Submitted 4 April, 2022; originally announced April 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2110.00271

arXiv:2110.00271 [pdf, other]

Safety aware model-based reinforcement learning for optimal control of a class of output-feedback nonlinear systems

Authors: S M Nahid Mahmud, Moad Abudia, Scott A Nivison, Zachary I. Bell, Rushikesh Kamalapurkar

Abstract: The ability to learn and execute optimal control policies safely is critical to realization of complex autonomy, especially where task restarts are not available and/or the systems are safety-critical. Safety requirements are often expressed in terms of state and/or control constraints. Methods such as barrier transformation and control barrier functions have been successfully used, in conjunction with model-based reinforcement learning, for safe learning in systems under state constraints, to learn the optimal control policy. However, existing barrier-based safe learning methods rely on full state feedback. In this paper, an output-feedback safe model-based reinforcement learning technique is developed that utilizes a novel dynamic state estimator to implement simultaneous learning and control for a class of safety-critical systems with partially observable state.

Submitted 1 October, 2021; originally announced October 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2007.12666

arXiv:2007.12666 [pdf, other]

Safe Model-Based Reinforcement Learning for Systems with Parametric Uncertainties

Authors: S M Nahid Mahmud, Scott A Nivison, Zachary I. Bell, Rushikesh Kamalapurkar

Abstract: Reinforcement learning has been established over the past decade as an effective tool to find optimal control policies for dynamical systems, with recent focus on approaches that guarantee safety during the learning and/or execution phases. In general, safety guarantees are critical in reinforcement learning when the system is safety-critical and/or task restarts are not practically feasible. In optimal control theory, safety requirements are often expressed in terms of state and/or control constraints. In recent years, reinforcement learning approaches that rely on persistent excitation have been combined with a barrier transformation to learn the optimal control policies under state constraints. To soften the excitation requirements, model-based reinforcement learning methods that rely on exact model knowledge have also been integrated with the barrier transformation framework. The objective of this paper is to develop safe reinforcement learning method for deterministic nonlinear systems, with parametric uncertainties in the model, to learn approximate constrained optimal policies without relying on stringent excitation conditions. To that end, a model-based reinforcement learning technique that utilizes a novel filtered concurrent learning method, along with a barrier transformation, is developed in this paper to realize simultaneous learning of unknown model parameters and approximate optimal state-constrained control policies for safety-critical systems.

Submitted 5 October, 2021; v1 submitted 24 July, 2020; originally announced July 2020.

Comments: This manuscript has been accepted in Frontiers in Robotics and AI. doi: 10.3389/frobt.2021.733104
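
The abstract above does not state the barrier transformation itself. As a rough illustration of the idea, the sketch below uses a logarithmic barrier of a form common in the barrier-transformation literature, which maps a box-constrained coordinate in (a, A), with a < 0 < A, onto the whole real line so that an unconstrained learning problem can be posed in the transformed coordinates. The specific formula, the function names, and the bounds are assumptions, not details taken from the paper.

```python
import math

# Illustrative sketch only: the formula below is an assumed, commonly used
# log-type barrier; the abstract does not specify the authors' exact choice.


def barrier(y, a, A):
    """Map y in the open interval (a, A), a < 0 < A, to the real line.

    The value is 0 at y = 0 and grows without bound as y approaches a or A.
    """
    if not (a < 0.0 < A):
        raise ValueError("bounds must satisfy a < 0 < A")
    if not (a < y < A):
        raise ValueError("y must lie strictly inside (a, A)")
    return math.log((A / a) * (a - y) / (A - y))


def barrier_inverse(s, a, A):
    """Map a real number s back into the open interval (a, A)."""
    if not (a < 0.0 < A):
        raise ValueError("bounds must satisfy a < 0 < A")
    e_pos = math.exp(s / 2.0)
    e_neg = math.exp(-s / 2.0)
    return a * A * (e_pos - e_neg) / (a * e_pos - A * e_neg)


if __name__ == "__main__":
    a, A = -2.0, 3.0  # hypothetical state bounds
    for y in (-1.9, -0.5, 0.0, 0.5, 2.9):
        s = barrier(y, a, A)
        print(y, s, barrier_inverse(s, a, A))
```

Because the inverse map always returns a value strictly inside (a, A) for moderate values of s, any bounded trajectory in the transformed coordinate corresponds to a trajectory of the original coordinate that respects the constraint. That property is what makes this kind of transformation attractive for safe learning.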

arXiv:1803.05584 [pdf, other]

A Switched Systems Approach to Path Following with Intermittent State Feedback

Authors: Hsi-Yuan Chen, Zachary I. Bell, Patryk Deptula, Warren E. Dixon

Abstract: Autonomous agents are often tasked with operating in an area where feedback is unavailable. Inspired by such applications, this paper develops a novel switched systems-based control method for uncertain nonlinear systems with temporary loss of state feedback. To compensate for intermittent feedback, an observer is used while state feedback is available to reduce the estimation error, and a predictor is utilized to propagate the estimates while state feedback is unavailable. Based on the resulting subsystems, maximum and minimum dwell time conditions are developed via a Lyapunov-based switched systems analysis to relax the constraint of maintaining constant feedback. Using the dwell time conditions, a switching trajectory is developed to enter and exit the feedback denied region in a manner that ensures the overall switched system remains stable. A scheme for designing a switching trajectory with a smooth transition function is provided. Simulation and experimental results are presented to demonstrate the performance of control design.

Submitted 15 March, 2018; originally announced March 2018.

Showing 1–14 of 14 results for author: Bell, Z