-
Online Policy Optimization in Unknown Nonlinear Systems
Authors:
Yiheng Lin,
James A. Preiss,
Fengze Xie,
Emile Anand,
Soon-Jo Chung,
Yisong Yue,
Adam Wierman
Abstract:
We study online policy optimization in nonlinear time-varying dynamical systems where the true dynamical models are unknown to the controller. This problem is challenging because, unlike in linear systems, the controller cannot obtain globally accurate estimations of the ground-truth dynamics using local exploration. We propose a meta-framework that combines a general online policy optimization algorithm ($\texttt{ALG}$) with a general online estimator of the dynamical system's model parameters ($\texttt{EST}$). We show that if the hypothetical joint dynamics induced by $\texttt{ALG}$ with known parameters satisfies several desired properties, the joint dynamics under inexact parameters from $\texttt{EST}$ will be robust to errors. Importantly, the final policy regret only depends on $\texttt{EST}$'s predictions on the visited trajectory, which relaxes a bottleneck on identifying the true parameters globally. To demonstrate our framework, we develop a computationally efficient variant of Gradient-based Adaptive Policy Selection, called Memoryless GAPS (M-GAPS), and use it to instantiate $\texttt{ALG}$. Combining M-GAPS with online gradient descent to instantiate $\texttt{EST}$ yields (to our knowledge) the first local regret bound for online policy optimization in nonlinear time-varying systems with unknown dynamics.
Submitted 19 April, 2024;
originally announced April 2024.
-
Online switching control with stability and regret guarantees
Authors:
Yingying Li,
James A. Preiss,
Na Li,
Yiheng Lin,
Adam Wierman,
Jeff Shamma
Abstract:
This paper considers online switching control with a finite candidate controller pool, an unknown dynamical system, and unknown cost functions. The candidate controllers can be unstabilizing policies. We only require at least one candidate controller to satisfy certain stability properties, but we do not know which one is stabilizing. We design an online algorithm that guarantees finite-gain stability throughout the duration of its execution. We also provide a sublinear policy regret guarantee compared with the optimal stabilizing candidate controller. Lastly, we numerically test our algorithm on quadrotor planar flights and compare it with a classical switching control algorithm, falsification-based switching, and a classical multi-armed bandit algorithm, Exp3 with batches.
Submitted 23 January, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
Online Adaptive Policy Selection in Time-Varying Systems: No-Regret via Contractive Perturbations
Authors:
Yiheng Lin,
James A. Preiss,
Emile Anand,
Yingying Li,
Yisong Yue,
Adam Wierman
Abstract:
We study online adaptive policy selection in systems with time-varying costs and dynamics. We develop the Gradient-based Adaptive Policy Selection (GAPS) algorithm together with a general analytical framework for online policy selection via online optimization. Under our proposed notion of contractive policy classes, we show that GAPS approximates the behavior of an ideal online gradient descent algorithm on the policy parameters while requiring less information and computation. When convexity holds, our algorithm is the first to achieve optimal policy regret. When convexity does not hold, we provide the first local regret bound for online policy selection. Our numerical experiments show that GAPS can adapt to changing environments more quickly than existing benchmarks.
Submitted 12 June, 2023; v1 submitted 21 October, 2022;
originally announced October 2022.
-
Suboptimal coverings for continuous spaces of control tasks
Authors:
James A. Preiss,
Gaurav S. Sukhatme
Abstract:
We propose the α-suboptimal covering number to characterize multi-task control problems where the set of dynamical systems and/or cost functions is infinite, analogous to the cardinality of finite task sets. This notion may help quantify the function class expressiveness needed to represent a good multi-task policy, which is important for learning-based control methods that use parameterized function approximation. We study suboptimal covering numbers for linear dynamical systems with quadratic cost (LQR problems) and construct a class of multi-task LQR problems amenable to analysis. For the scalar case, we show logarithmic dependence on the "breadth" of the space. For the matrix case, we present experiments 1) measuring the efficiency of a particular constructive cover, and 2) visualizing the behavior of two candidate systems for the lower bound.
Submitted 23 April, 2021;
originally announced April 2021.