Search | arXiv e-print repository

Online Nonstochastic Model-Free Reinforcement Learning

Authors: Udaya Ghai, Arushi Gupta, Wenhan Xia, Karan Singh, Elad Hazan

Abstract: We investigate robust model-free reinforcement learning algorithms designed for environments that may be dynamic or even adversarial. Traditional state-based policies often struggle to accommodate the challenges imposed by the presence of unmodeled disturbances in such settings. Moreover, optimizing linear state-based policies pose an obstacle for efficient optimization, leading to nonconvex objec… ▽ More We investigate robust model-free reinforcement learning algorithms designed for environments that may be dynamic or even adversarial. Traditional state-based policies often struggle to accommodate the challenges imposed by the presence of unmodeled disturbances in such settings. Moreover, optimizing linear state-based policies pose an obstacle for efficient optimization, leading to nonconvex objectives, even in benign environments like linear dynamical systems. Drawing inspiration from recent advancements in model-based control, we introduce a novel class of policies centered on disturbance signals. We define several categories of these signals, which we term pseudo-disturbances, and develop corresponding policy classes based on them. We provide efficient and practical algorithms for optimizing these policies. Next, we examine the task of online adaptation of reinforcement learning agents in the face of adversarial disturbances. Our methods seamlessly integrate with any black-box model-free approach, yielding provable regret guarantees when dealing with linear dynamics. These regret guarantees unconditionally improve the best-known results for bandit linear control in having no dependence on the state-space dimension. We evaluate our method over various standard RL benchmarks and demonstrate improved robustness. △ Less

Submitted 31 October, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: Camera-ready version for NeurIPS 2023

arXiv:2205.15235 [pdf, other]

Non-convex online learning via algorithmic equivalence

Authors: Udaya Ghai, Zhou Lu, Elad Hazan

Abstract: We study an algorithmic equivalence technique between non-convex gradient descent and convex mirror descent. We start by looking at a harder problem of regret minimization in online non-convex optimization. We show that under certain geometric and smoothness conditions, online gradient descent applied to non-convex functions is an approximation of online mirror descent applied to convex functions… ▽ More We study an algorithmic equivalence technique between non-convex gradient descent and convex mirror descent. We start by looking at a harder problem of regret minimization in online non-convex optimization. We show that under certain geometric and smoothness conditions, online gradient descent applied to non-convex functions is an approximation of online mirror descent applied to convex functions under reparameterization. In continuous time, the gradient flow with this reparameterization was shown to be exactly equivalent to continuous-time mirror descent by Amid and Warmuth 2020, but theory for the analogous discrete time algorithms is left as an open problem. We prove an $O(T^{\frac{2}{3}})$ regret bound for non-convex online gradient descent in this setting, answering this open problem. Our analysis is based on a new and simple algorithmic equivalence method. △ Less

Submitted 12 October, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

arXiv:2201.13288 [pdf, other]

A Regret Minimization Approach to Multi-Agent Control

Authors: Udaya Ghai, Udari Madhushani, Naomi Leonard, Elad Hazan

Abstract: We study the problem of multi-agent control of a dynamical system with known dynamics and adversarial disturbances. Our study focuses on optimal control without centralized precomputed policies, but rather with adaptive control policies for the different agents that are only equipped with a stabilizing controller. We give a reduction from any (standard) regret minimizing control method to a distri… ▽ More We study the problem of multi-agent control of a dynamical system with known dynamics and adversarial disturbances. Our study focuses on optimal control without centralized precomputed policies, but rather with adaptive control policies for the different agents that are only equipped with a stabilizing controller. We give a reduction from any (standard) regret minimizing control method to a distributed algorithm. The reduction guarantees that the resulting distributed algorithm has low regret relative to the optimal precomputed joint policy. Our methodology involves generalizing online convex optimization to a multi-agent setting and applying recent tools from nonstochastic control derived for a single agent. We empirically evaluate our method on a model of an overactuated aircraft. We show that the distributed method is robust to failure and to adversarial perturbations in the dynamics. △ Less

Submitted 25 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:7422-7434, 2022

arXiv:2107.07732 [pdf, ps, other]

Robust Online Control with Model Misspecification

Authors: Xinyi Chen, Udaya Ghai, Elad Hazan, Alexandre Megretski

Abstract: We study online control of an unknown nonlinear dynamical system that is approximated by a time-invariant linear system with model misspecification. Our study focuses on robustness, a measure of how much deviation from the assumed linear approximation can be tolerated by a controller while maintaining finite $\ell_2$-gain. A basic methodology to analyze robustness is via the small gain theorem.… ▽ More We study online control of an unknown nonlinear dynamical system that is approximated by a time-invariant linear system with model misspecification. Our study focuses on robustness, a measure of how much deviation from the assumed linear approximation can be tolerated by a controller while maintaining finite $\ell_2$-gain. A basic methodology to analyze robustness is via the small gain theorem. However, as an implication of recent lower bounds on adaptive control, this method can only yield robustness that is exponentially small in the dimension of the system and its parametric uncertainty. The work of Cusumano and Poolla shows that much better robustness can be obtained, but the control algorithm is inefficient, taking exponential time in the worst case. In this paper we investigate whether there exists an efficient algorithm with provable robustness beyond the small gain theorem. We demonstrate that for a fully actuated system, this is indeed attainable. We give an efficient controller that can tolerate robustness that is polynomial in the dimension and independent of the parametric uncertainty; furthermore, the controller obtains an $\ell_2$-gain whose dimension dependence is near optimal. △ Less

Submitted 4 April, 2022; v1 submitted 16 July, 2021; originally announced July 2021.

arXiv:2012.06695 [pdf, other]

Generating Adversarial Disturbances for Controller Verification

Authors: Udaya Ghai, David Snyder, Anirudha Majumdar, Elad Hazan

Abstract: We consider the problem of generating maximally adversarial disturbances for a given controller assuming only blackbox access to it. We propose an online learning approach to this problem that \emph{adaptively} generates disturbances based on control inputs chosen by the controller. The goal of the disturbance generator is to minimize \emph{regret} versus a benchmark disturbance-generating policy… ▽ More We consider the problem of generating maximally adversarial disturbances for a given controller assuming only blackbox access to it. We propose an online learning approach to this problem that \emph{adaptively} generates disturbances based on control inputs chosen by the controller. The goal of the disturbance generator is to minimize \emph{regret} versus a benchmark disturbance-generating policy class, i.e., to maximize the cost incurred by the controller as well as possible compared to the best possible disturbance generator \emph{in hindsight} (chosen from a benchmark policy class). In the setting where the dynamics are linear and the costs are quadratic, we formulate our problem as an online trust region (OTR) problem with memory and present a new online learning algorithm (\emph{MOTR}) for this problem. We prove that this method competes with the best disturbance generator in hindsight (chosen from a rich class of benchmark policies that includes linear-dynamical disturbance generating policies). We demonstrate our approach on two simulated examples: (i) synthetically generated linear systems, and (ii) generating wind disturbances for the popular PX4 controller in the AirSim simulator. On these examples, we demonstrate that our approach outperforms several baseline approaches, including $H_{\infty}$ disturbance generation and gradient-based methods. △ Less

Submitted 31 January, 2022; v1 submitted 11 December, 2020; originally announced December 2020.

arXiv:2002.02064 [pdf, ps, other]

No-Regret Prediction in Marginally Stable Systems

Authors: Udaya Ghai, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang

Abstract: We consider the problem of online prediction in a marginally stable linear dynamical system subject to bounded adversarial or (non-isotropic) stochastic perturbations. This poses two challenges. Firstly, the system is in general unidentifiable, so recent and classical results on parameter recovery do not apply. Secondly, because we allow the system to be marginally stable, the state can grow polyn… ▽ More We consider the problem of online prediction in a marginally stable linear dynamical system subject to bounded adversarial or (non-isotropic) stochastic perturbations. This poses two challenges. Firstly, the system is in general unidentifiable, so recent and classical results on parameter recovery do not apply. Secondly, because we allow the system to be marginally stable, the state can grow polynomially with time; this causes standard regret bounds in online convex optimization to be vacuous. In spite of these challenges, we show that the online least-squares algorithm achieves sublinear regret (improvable to polylogarithmic in the stochastic setting), with polynomial dependence on the system's parameters. This requires a refined regret analysis, including a structural lemma showing the current state of the system to be a small linear combination of past states, even if the state grows polynomially. By applying our techniques to learning an autoregressive filter, we also achieve logarithmic regret in the partially observed setting under Gaussian noise, with polynomial dependence on the memory of the associated Kalman filter. △ Less

Submitted 23 June, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

Comments: 43 pages. Appears in COLT 2020

arXiv:1902.01903 [pdf, other]

Exponentiated Gradient Meets Gradient Descent

Authors: Udaya Ghai, Elad Hazan, Yoram Singer

Abstract: The (stochastic) gradient descent and the multiplicative update method are probably the most popular algorithms in machine learning. We introduce and study a new regularization which provides a unification of the additive and multiplicative updates. This regularization is derived from an hyperbolic analogue of the entropy function, which we call hypentropy. It is motivated by a natural extension o… ▽ More The (stochastic) gradient descent and the multiplicative update method are probably the most popular algorithms in machine learning. We introduce and study a new regularization which provides a unification of the additive and multiplicative updates. This regularization is derived from an hyperbolic analogue of the entropy function, which we call hypentropy. It is motivated by a natural extension of the multiplicative update to negative numbers. The hypentropy has a natural spectral counterpart which we use to derive a family of matrix-based updates that bridge gradient methods and the multiplicative method for matrices. While the latter is only applicable to positive semi-definite matrices, the spectral hypentropy method can naturally be used with general rectangular matrices. We analyze the new family of updates by deriving tight regret bounds. We study empirically the applicability of the new update for settings such as multiclass learning, in which the parameters constitute a general rectangular matrix. △ Less

Submitted 5 February, 2019; originally announced February 2019.

Showing 1–7 of 7 results for author: Ghai, U