Robust Online Convex Optimization for Disturbance Rejection

Joyce Lai and Peter Seiler

J. Lai and P. Seiler are with the Department of Electrical Engineering and Computer Science at the University of Michigan, Ann Arbor, MI 48109, USA. {joycelai,pseiler}@umich.edu

Abstract

Online convex optimization (OCO) is a powerful tool for learning from sequential data, making it ideal for high precision control applications where the disturbances are arbitrary and unknown in advance. However, the ability of OCO-based controllers to accurately learn the disturbance while maintaining closed-loop stability relies on having an accurate model of the plant. This paper studies the performance of OCO-based controllers for linear time-invariant (LTI) systems subject to disturbance and model uncertainty. The model uncertainty can cause the closed-loop to become unstable. We provide a sufficient condition for robust stability based on the small gain theorem. This condition is easily incorporated as an online constraint in the OCO controller. Finally, we verify via numerical simulations that imposing the robust stability condition on the OCO controller ensures closed-loop stability.

1 Introduction

This paper considers a class of controllers recently developed using online convex optimization (OCO). Online machine learning and convex optimization methods are powerful tools for learning from sequential data. This makes these techniques ideal for high precision control applications like satellite pointing and photolithography. These systems have reliable physics-based models with small error (within the control bandwidth) but are subject to unknown arbitrary disturbances.

This has motivated a large body of recent work using online learning and convex optimization for control [1, 2, 3, 4, 5, 6, 7, 8, 9]. The most closely related work is the class of OCO controllers defined in [10]. Here, OCO with memory is introduced to the discrete-time control setting as an ideal cost minimization problem (which we describe in detail in Section 4.2) to handle arbitrary disturbances and general time-varying convex cost functions. The OCO controller has promising regret guarantees and makes less restrictive assumptions about the disturbance characteristics than $H_{2}$ and $H_{\infty}$ optimal control techniques, which assume, e.g., white noise or worst-case disturbances, respectively [11, 12]. This makes OCO methods well suited for high precision control applications with unknown, arbitrary disturbances that degrade the system performance.

The OCO framework in [10] aims to learn the disturbance characteristics in real time. However, small model errors can cause instability and thus must be explicitly considered in the design. Other works attempt to learn the model from data [13, 14, 15, 16, 17, 18, 19]. However, dynamic uncertainties in many high precision applications arise due to high frequency, time-varying, and/or nonlinear effects. It is difficult to learn such unmodeled effects from real-time data. In these cases, it is useful to design a robust OCO-based controller that can learn the disturbance features and tolerate model uncertainty, thus motivating our work.

There are three main contributions of our work. First, we provide a robust stability condition for OCO control of a discrete-time linear time-invariant (LTI) plant (Theorem 2 in Section 3.2). The scaled small gain condition is written abstractly with an arbitrary choice of an induced system norm. Our second contribution is to present a constrained OCO (C-OCO) control algorithm which is robust to nonparametric model uncertainties (Section 4). This algorithm uses a specific implementation of the scaled small gain condition with the induced $\ell_{\infty}$-norm (Section 3.3). This particular choice for the induced norm enables easy implementation of the robust stability condition in the C-OCO algorithm. The third contribution is to present numerical results that illustrate the effect of this robust stability constraint on the OCO controller (Section 5).

2 Problem Formulation

This section formulates the OCO control problem for discrete-time LTI plants subject to both model uncertainty and unknown disturbances.

2.1 Notation

Let $v\in\mathbb{R}^{n}$ be a vector. The $p$ -norm of this vector is defined as $\|v\|_{p}:=\big{[}\sum_{i=1}^{n}|v_{i}|^{p}\big{]}^{\frac{1}{p}}$ . Next, $\mathbb{N}$ denotes the set of non-negative integers. Let $d:\mathbb{N}\to\mathbb{R}^{n}$ denote a vector-valued sequence $\{d_{0},d_{1},\ldots\}$ . The $\ell_{p}$ -norm of $d$ is defined as:

\displaystyle\|d\|_{p}=\left[\sum_{t=0}^{\infty}\|d_{t}\|_{p}^{p}\right]^{\frac{1}{p}}.

(1)

Note that $\|d_{t}\|_{p}$ is the $p$-norm of the vector $d_{t}\in\mathbb{R}^{n}$ at time $t$ while $\|d\|_{p}$ is the $\ell_{p}$-norm of the sequence. For $p=\infty$, these are interpreted as $\|v\|_{\infty}:=\max_{i}|v_{i}|$ and $\|d\|_{\infty}:=\sup_{t}\|d_{t}\|_{\infty}$. The set $\ell_{p}$ consists of sequences that have finite $\ell_{p}$-norm. The extended space $\ell_{pe}\supset\ell_{p}$ consists of sequences that have finite $\ell_{p}$-norm on all finite intervals, i.e. $\sum_{t=0}^{T}\|d_{t}\|_{p}^{p}<\infty$ for all $T\geq 0$. Finally, let $G:\ell_{pe}\to\ell_{pe}$ denote a system that maps an input signal $u\in\ell_{pe}$ to an output signal $y\in\ell_{pe}$. The induced $\ell_{p}$-norm for this system is defined as:

\displaystyle\|G\|_{p\to p}=\sup_{0\neq u\in\ell_{p}}\frac{\|y\|_{p}}{\|u\|_{p}}.

(2)

To simplify notation, we will often use $\|d\|$ and $\|G\|$ for the signal norm and system induced norm when the specific $p$-norm is not important.
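As a small illustration of definitions (1) and (2), the following Python sketch computes vector $p$-norms and the $\ell_{p}$-norm of a finite sequence, including the $p=\infty$ case used in Section 3.3. The helper names `vec_norm` and `seq_norm` are hypothetical, and a finite list is assumed to stand in for a sequence that is zero after its last sample.

```python
import math


def vec_norm(v, p):
    """p-norm of a vector v (a list of floats); p may be math.inf."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def seq_norm(d, p):
    """l_p-norm (1) of a finite sequence d = [d_0, d_1, ...] of vectors.

    For p = inf this is sup_t ||d_t||_inf. The list is treated as a
    sequence that is zero after its last sample (an assumption).
    """
    if p == math.inf:
        return max(vec_norm(dt, math.inf) for dt in d)
    return sum(vec_norm(dt, p) ** p for dt in d) ** (1.0 / p)


d = [[3.0, -4.0], [0.0, 1.0]]   # two samples of a 2-vector signal
print(seq_norm(d, 2))           # sqrt(25 + 1)
print(seq_norm(d, math.inf))    # 4.0
```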

2.2 Model Uncertainty

In this section, we consider the feedback system in Figure 1 and discuss the model uncertainty $\Delta(z)$ in more detail.

Figure 1: Discrete-time feedback system with unknown disturbance $d$ and uncertainty $\Delta(z)$. OCO control is used to reject the disturbance $d$ without knowledge of the uncertainty $\Delta(z)$.

Consider the nominal discrete-time, LTI plant $G(z)$ with dynamics:

\displaystyle x_{t+1}=A\,x_{t}+B\,v_{t},

(3)

where $x_{t}\in\mathbb{R}^{n_{x}}$ and $v_{t}\in\mathbb{R}^{n_{u}}$ are the nominal plant state and input at time $t$ , respectively. We assume $x_{0}=0$ for simplicity.

Model uncertainty for systems with physics-based models often shows up as unmodeled actuator dynamics affecting the plant input [11, 20, 12]. We can account for these unmodeled dynamics by defining an input-multiplicative uncertainty set $\mathcal{G}_{\delta}$ as:

\displaystyle\mathcal{G}_{\delta}=\left\{\tilde{G}(z)=G(z)(I+\Delta(z)):\|\Delta\|\leq\delta\right\},

(4)

where $\delta\in[0,\infty)$. Note that the induced $2$-norm is a common choice to bound the uncertainty. However, our main result in Section 3 holds for any induced $p$-norm.

Let $\tilde{G}_{0}(z)$ denote the true plant dynamics. We assume that the true plant is within the uncertainty set, i.e. $\tilde{G}_{0}(z)\in\mathcal{G}_{\delta}$ . In other words, there exists a specific $\Delta_{0}(z)$ such that $\|\Delta_{0}\|\leq\delta$ and $\tilde{G}_{0}(z)=G(z)(I+\Delta_{0}(z))\in\mathcal{G}_{\delta}$ . More generally, we refer to $\tilde{G}(z)=G(z)(I+\Delta(z))$ as the uncertain plant. An alternative viewpoint is that the uncertain plant is $\tilde{G}(z)=G(z)F(z)$ where $F(z)=I+\Delta(z)$ represents unmodeled dynamics. Note that we assume the uncertainty $\Delta(z)$ is LTI. However, our main result in Section 3 can be extended to the case where $\Delta$ is a possibly nonlinear time-varying (NLTV) system.

2.3 OCO Control

This section describes the OCO controller. We consider the feedback system in Figure 2 where the OCO controller is shown in more detail.

Figure 2: Block diagram representation of the OCO controller in a discrete-time feedback system with unknown disturbance $d_{t}$ and uncertain plant $\tilde{G}(z)$. The OCO controller is composed of a state-feedback gain $K$, an estimator $E(z)$, and an LTV system $M_{LTV}$.

Unknown disturbances are often caused by environmental factors and moving physical components which degrade system performance. However, these disturbances often also have learnable characteristics. It is typical to model such disturbances as entering at the plant input as shown in Figure 2.

OCO control can be used to learn and reject the disturbance without a priori knowledge of the disturbance [1, 2, 3, 4, 5, 6, 7, 8, 9]. Here, we describe a class of OCO controllers closely related to [10], which considers the case $\Delta(z)=0$. The OCO controller has the block diagram representation shown in Figure 2. This corresponds to the class of disturbance action controllers defined as:

\displaystyle u_{t}=-Kx_{t}+\sum_{i=0}^{H-1}M_{t}^{[i]}\hat{w}_{t-i},

(5)

where $K\in\mathbb{R}^{n_{u}\times n_{x}}$, $M_{t}^{[i]}\in\mathbb{R}^{n_{u}\times n_{x}}$, and $\hat{w}_{t}\in\mathbb{R}^{n_{x}}$ are the state-feedback gain, learned coefficients, and disturbance estimate at time $t$, respectively. The state-feedback gain $K$ is user-selected while the learned coefficients $\{M_{t}^{[i]}\}_{i=0}^{H-1}$ are typically updated via some online optimization method. For example, [10] uses online projected gradient descent (OPGD) with memory (see Section 4.2).

The disturbance estimate $\hat{w}_{t}$ is assumed to be the output of an LTI estimator $E(z)$ with dynamics:

\displaystyle\begin{split}x_{t+1}^{e}&=A_{e}x_{t}^{e}+B_{e1}x_{t}+B_{e2}u_{t}\\ \hat{w}_{t}&=C_{e}x_{t}^{e}+D_{e1}x_{t}+D_{e2}u_{t},\end{split}

(6)

where $x_{t}^{e}\in\mathbb{R}^{n_{e}}$ and $\hat{w}_{t}\in\mathbb{R}^{n_{x}}$ are the estimator state and output at time $t$, respectively. Typically, $\hat{w}_{t}$ is an estimate of $Bd_{t}\in\mathbb{R}^{n_{x}}$ (possibly with delay), i.e., it is an estimate of the disturbance effect on the (nominal) state. The estimate is constructed from $x_{t}$ and $u_{t}$. This estimator is motivated by the case when $\Delta(z)=0$.

The first term in (5) is considered the baseline controller which we denote by:

\displaystyle u_{t}^{base}=-Kx_{t}.

(7)

The main results in Section 3 can be generalized to the case when the baseline control $u_{t}^{base}$ is the output of an LTI controller $K(z)$ with input $x_{t}$ . We assume the baseline controller is a static, state-feedback gain for simplicity.

The second term in (5) is the output of a finite impulse response (FIR) filter with time-varying coefficients. We denote this FIR filter as a linear time-varying (LTV) system $M_{LTV}$ with input-output dynamics defined as:

\displaystyle u_{t}^{oco}=\sum_{i=0}^{H-1}M_{t}^{[i]}\hat{w}_{t-i},

(8)

where $\hat{w}_{t}\in\mathbb{R}^{n_{x}}$ and $u_{t}^{oco}\in\mathbb{R}^{n_{u}}$ are the input and output at time $t$ , respectively. The FIR filter order $H$ is also referred to as the learning horizon since the coefficients are often updated via OCO using the past $H$ disturbance estimates. We provide an example of online optimization in Sections 4 and 5, but the main results in Section 3 assume only that the coefficients are time-varying.

Given (7) and (8), the OCO controller (5) can be interpreted as a baseline controller $u_{t}^{base}$ plus an adapting term $u_{t}^{oco}$ which corrects for the unknown disturbance $d_{t}$ based on disturbance estimates.
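A minimal sketch of the disturbance action law (5), split into the baseline term (7) and the FIR term (8), is given below. It assumes dense matrices stored as nested Python lists and a disturbance-estimate history ordered newest first; the function names `matvec` and `oco_control` are hypothetical, and the coefficient update itself is left out.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def oco_control(K, x, M, w_hist):
    """Disturbance action law (5): u_t = -K x_t + sum_i M_t^[i] w_hat_{t-i}.

    K      : n_u x n_x state-feedback gain
    x      : current state x_t
    M      : [M_t^[0], ..., M_t^[H-1]], each n_u x n_x
    w_hist : [w_hat_t, w_hat_{t-1}, ...], newest first; if fewer than H
             estimates are available, zip() stops early, which is the same
             as treating the missing (earlier) estimates as zero
    Returns (u_t, u_base, u_oco).
    """
    u_base = [-val for val in matvec(K, x)]           # baseline term (7)
    u_oco = [0.0] * len(u_base)                        # FIR term (8)
    for Mi, wi in zip(M, w_hist):
        u_oco = [a + b for a, b in zip(u_oco, matvec(Mi, wi))]
    u = [a + b for a, b in zip(u_base, u_oco)]
    return u, u_base, u_oco


# Scalar example with H = 2 (made-up numbers).
u, u_base, u_oco = oco_control([[0.15]], [2.0], [[[0.5]], [[-0.2]]], [[1.0], [3.0]])
print(u, u_base, u_oco)
```

Keeping $u^{base}$ and $u^{oco}$ separate mirrors the interpretation of (5) as a baseline controller plus an adapting correction.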

2.4 Model Uncertainty Effects on OCO Control

The uncertainty $\Delta(z)$ and disturbance $d_{t}$ have different effects on closed-loop stability. Suppose the state-feedback gain $K$ is stabilizing, i.e., all eigenvalues of $(A-BK)$ are strictly inside the unit disk. Given a perfect plant model, i.e., $\Delta(z)=0$ , OCO control can be designed to achieve disturbance rejection with provable guarantees [10]. In this case, a bounded disturbance $d$ cannot cause signals $x,u,\hat{w}$ , etc. to grow unbounded. However, small amounts of model uncertainty can cause the system to become unstable.

As shown in Figures 1 and 2, the (true) plant input is the control input perturbed by an unknown disturbance:

\displaystyle p_{t}=u_{t}+d_{t},

(9)

where $u_{t},d_{t},p_{t}\in\mathbb{R}^{n_{u}}$ are the control input, disturbance, and perturbed (true) plant input at time $t$ , respectively. The perturbed input $p_{t}$ is further distorted by the uncertainty $\Delta(z)$ . The resulting input to the nominal plant $G(z)$ is:

\displaystyle v_{t}=(I+\Delta)\,p_{t}=u_{t}+d_{t}+q_{t},

(10)

where $q_{t}=\Delta p_{t}\in\mathbb{R}^{n_{u}}$ . Again, $v_{t}$ is the nominal plant input at time $t$ . Not only is there an unknown disturbance $d_{t}$ , but also a distorted signal $q_{t}$ due to uncertainty $\Delta(z)$ .

The additional perturbation $q_{t}$ can lead to unexpected behaviors that affect the disturbance estimate and FIR filter coefficient update when left unaccounted for in the OCO design. This can occur even when the state-feedback gain $K$ is stabilizing for the true plant $\tilde{G}(z)$ . Thus, the OCO controller is required to: i) learn and compensate for the disturbance, and ii) stabilize the system in the presence of uncertainty. The OCO controller must achieve these objectives without a priori knowledge of the disturbance or uncertainty.

3 Main Result

This section provides a condition on $M_{LTV}$ that ensures the feedback system with OCO control remains stable even in the presence of the model uncertainty.

3.1 Linear Fractional Transformation

As a first step, we transform the feedback system of the OCO controller and uncertain plant (Figures 1 and 2) to a standard form as shown in Figure 3. This diagram separates the LTI dynamics $P$ from the uncertainty $\Delta$ and time-varying OCO dynamics $M_{LTV}$ . Here $P$ includes the dynamics due to the plant, estimator, and state-feedback gain. This diagram is called a linear fractional transformation (LFT) in the robust control literature [11, 12]. We use the notation $F_{U}(P,\Gamma)$ for this interconnection with $\Gamma=\left[\begin{smallmatrix}\Delta&0\\ 0&M_{LTV}\end{smallmatrix}\right]$ closed around the upper channels of $P$ .

Figure 3: Equivalent LFT $F_{U}(P,\Gamma)$ of the original system separating the LTI dynamics $P$ from the uncertainty $\Delta$ and the time-varying learning dynamics $M_{LTV}$.

An explicit state-space model for $P$ can be determined from the various components of the feedback system described in Section 2. The dynamics of $P$ are given by:

\displaystyle\begin{aligned}\begin{bmatrix}x_{t+1}\\ x_{t+1}^{e}\end{bmatrix}&=\begin{bmatrix}A-BK&0\\ B_{e1}-B_{e2}K&A_{e}\end{bmatrix}\begin{bmatrix}x_{t}\\ x_{t}^{e}\end{bmatrix}+\begin{bmatrix}B&B&B\\ 0&B_{e2}&0\end{bmatrix}\begin{bmatrix}q_{t}\\ u_{t}^{oco}\\ d_{t}\end{bmatrix}\\
\begin{bmatrix}p_{t}\\ \hat{w}_{t}\\ x_{t}\end{bmatrix}&=\begin{bmatrix}-K&0\\ D_{e1}-D_{e2}K&C_{e}\\ I&0\end{bmatrix}\begin{bmatrix}x_{t}\\ x_{t}^{e}\end{bmatrix}+\begin{bmatrix}0&I&I\\ 0&D_{e2}&0\\ 0&0&0\end{bmatrix}\begin{bmatrix}q_{t}\\ u_{t}^{oco}\\ d_{t}\end{bmatrix}\end{aligned}

We use the LFT representation $F_{U}(P,\Gamma)$ to formulate and state our robust stability theorem in the next subsection.
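To make the structure of $P$ concrete, the sketch below assembles its state-space matrices $(A_P,B_P,C_P,D_P)$ from the blocks above, with inputs ordered as $(q_t,u_t^{oco},d_t)$ and outputs as $(p_t,\hat{w}_t,x_t)$. The helper names are hypothetical and matrices are plain nested lists. The usage example borrows the scalar nominal plant of Section 5 ($A=0.9$, $B=0.1$, $K=0.15$) and the estimator of Section 4.1, for which $A_e=0$, $B_{e1}=-A$, $B_{e2}=-B$, $C_e=D_{e1}=1$, and $D_{e2}=0$.

```python
def zeros(r, c):
    return [[0.0] * c for _ in range(r)]


def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def sub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def neg(X):
    return [[-a for a in row] for row in X]


def block(rows):
    """Assemble a block matrix from a list of block rows (lists of matrices)."""
    out = []
    for brow in rows:
        for i in range(len(brow[0])):
            out.append([a for blk in brow for a in blk[i]])
    return out


def lft_plant(A, B, K, Ae, Be1, Be2, Ce, De1, De2):
    """State-space (AP, BP, CP, DP) of P: inputs (q, u_oco, d), outputs (p, w_hat, x)."""
    nx, nu, ne = len(A), len(B[0]), len(Ae)
    AP = block([[sub(A, matmul(B, K)), zeros(nx, ne)],
                [sub(Be1, matmul(Be2, K)), Ae]])
    BP = block([[B, B, B],
                [zeros(ne, nu), Be2, zeros(ne, nu)]])
    CP = block([[neg(K), zeros(nu, ne)],
                [sub(De1, matmul(De2, K)), Ce],
                [eye(nx), zeros(nx, ne)]])
    DP = block([[zeros(nu, nu), eye(nu), eye(nu)],
                [zeros(nx, nu), De2, zeros(nx, nu)],
                [zeros(nx, nu), zeros(nx, nu), zeros(nx, nu)]])
    return AP, BP, CP, DP


AP, BP, CP, DP = lft_plant(A=[[0.9]], B=[[0.1]], K=[[0.15]],
                           Ae=[[0.0]], Be1=[[-0.9]], Be2=[[-0.1]],
                           Ce=[[1.0]], De1=[[1.0]], De2=[[0.0]])
print(AP)  # approximately [[0.885, 0.0], [-0.885, 0.0]]
```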

3.2 Scaled Small Gain Theorem

Our first stability result is a variation of the standard small gain theorem (see Section 5.4 of [21]). This provides a sufficient condition for the dynamics $F_{U}(P,\Gamma)$ to have a bounded gain from disturbance $d$ to state $x$. Note that stability here is in the sense of a finite induced norm (bounded gain).

Theorem 1.

Consider the interconnection $F_{U}(P,\Gamma)$ where $P:\ell_{pe}\to\ell_{pe}$ and $\Gamma:\ell_{pe}\to\ell_{pe}$ are linear systems with finite induced $\ell_{p}$ -norm. Partition $P$ as:

\displaystyle\begin{bmatrix}\bar{p}\\ x\end{bmatrix}=\begin{bmatrix}P_{11}&P_{12}\\ P_{21}&P_{22}\end{bmatrix}\,\begin{bmatrix}\bar{q}\\ d\end{bmatrix},

(11)

where $\bar{p}:=\left[\begin{smallmatrix}p\\ \hat{w}\end{smallmatrix}\right]$ and $\bar{q}:=\left[\begin{smallmatrix}q\\ u^{oco}\end{smallmatrix}\right]$ are the inputs and outputs of $\Gamma$, respectively. The interconnection has finite induced $\ell_{p}$-norm, i.e. $\|F_{U}(P,\Gamma)\|<\infty$, if $\|P_{11}\|\,\|\Gamma\|<1$.

Proof.

The system $P$ is LTI so by the principle of superposition (assuming zero initial conditions):

\displaystyle\bar{p}=P_{11}\bar{q}+P_{12}d.

(12)

We can bound $\bar{p}$ using the triangle inequality and the definition of the induced norm:

\displaystyle\|\bar{p}\|\leq\|P_{11}\|\,\|\bar{q}\|+\|P_{12}\|\,\|d\|.

(13)

Next, $\bar{q}=\Gamma\bar{p}$ so that $\|\bar{q}\|\leq\|\Gamma\|\,\|\bar{p}\|$ . Substitute this bound into (13) and re-arrange to obtain:

\displaystyle\|\bar{p}\|\leq\frac{\|P_{12}\|}{1-\|P_{11}\|\|\Gamma\|}\,\|d\|.

(14)

This last step requires the small gain condition $\|P_{11}\|\,\|\Gamma\|<1$ to obtain the bound on $\|\bar{p}\|$ .

Finally, the state is $x=P_{21}\bar{q}+P_{22}d$ . We can use similar steps and the bound on $\bar{p}$ to obtain:

\displaystyle\|x\|\leq\left[\|P_{22}\|+\frac{\|P_{21}\|\,\|P_{12}\|\,\|\Gamma\|}{1-\|P_{11}\|\,\|\Gamma\|}\right]\|d\|.

(15)

Hence, $F_{U}(P,\Gamma)$ has finite induced $\ell_{p}$ -norm. ∎
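The bound (15) is easy to evaluate once the individual norms are known. The sketch below (hypothetical function name `small_gain_bound`; the norm values in the example are made up for illustration) returns the resulting gain bound from $d$ to $x$, or `None` when $\|P_{11}\|\,\|\Gamma\|\geq 1$. Returning `None` only means Theorem 1 is inconclusive; it does not imply instability.

```python
def small_gain_bound(nP11, nP12, nP21, nP22, nGamma):
    """Gain bound on ||x|| / ||d|| from (15), given the induced norms of the
    blocks of P and of Gamma. Returns None when the small gain condition
    ||P11|| * ||Gamma|| < 1 of Theorem 1 fails (the test is then inconclusive).
    """
    loop_gain = nP11 * nGamma
    if loop_gain >= 1.0:
        return None
    return nP22 + nP21 * nP12 * nGamma / (1.0 - loop_gain)


print(small_gain_bound(0.5, 1.0, 2.0, 0.3, 1.0))  # 0.3 + 2 / 0.5 = 4.3
print(small_gain_bound(0.5, 1.0, 2.0, 0.3, 2.0))  # None
```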

The small gain condition in the previous theorem can be conservative as it does not exploit the block structure $\Gamma=\left[\begin{smallmatrix}\Delta&0\\ 0&M_{LTV}\end{smallmatrix}\right]$ . We can reduce the conservatism by normalizing the blocks and introducing scalings. Specifically, assume $\|\Delta\|\leq\delta$ and $\|M_{LTV}\|\leq\beta$ . Define the normalized uncertainty and learning dynamics as: $\tilde{\Delta}=\frac{1}{\delta}\Delta$ and $\tilde{M}_{LTV}=\frac{1}{\beta}M_{LTV}$ . Stacking these together yields

\displaystyle\tilde{\Gamma}:=\begin{bmatrix}\frac{1}{\delta}&0\\ 0&\frac{1}{\beta}\end{bmatrix}\Gamma=\begin{bmatrix}\frac{1}{\delta}\Delta&0\\ 0&\frac{1}{\beta}M_{LTV}\end{bmatrix}.

(16)

The scaling normalizes each block so that $\|\tilde{\Gamma}\|\leq 1$.

Next, the uncertainty is LTI and hence $d_{1}\Delta=\Delta d_{1}$ for any scalar $d_{1}>0$ . (In fact, this relation holds even if $d_{1}$ is also an LTI system but we will not pursue this generalization.) Similarly, the learning dynamics are also linear and hence $d_{2}M_{LTV}=M_{LTV}d_{2}$ for any scalar $d_{2}>0$ . It follows that the normalized systems can be equivalently written, for any $d_{1},d_{2}>0$ , as:

\displaystyle\tilde{\Gamma}:=\begin{bmatrix}\frac{1}{d_{1}\delta}&0\\ 0&\frac{1}{d_{2}\beta}\end{bmatrix}\Gamma\begin{bmatrix}d_{1}&0\\ 0&d_{2}\end{bmatrix}.

(17)

This discussion leads to the following scaled small gain result.

Theorem 2.

Consider the interconnection $F_{U}(P,\Gamma)$ where $P:\ell_{pe}\to\ell_{pe}$ and $\Gamma:\ell_{pe}\to\ell_{pe}$ are linear systems with finite induced $\ell_{p}$ -norm. Assume $\Gamma:=\left[\begin{smallmatrix}\Delta&0\\ 0&M_{LTV}\end{smallmatrix}\right]$ where $\|\Delta\|\leq\delta$ and $\|M_{LTV}\|\leq\beta$ . Partition $P$ as:

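\displaystyle\begin{bmatrix}\bar{p}\\ x\end{bmatrix}=\begin{bmatrix}P_{11}&P_{12}\\ P_{21}&P_{22}\end{bmatrix}\,\begin{bmatrix}\bar{q}\\ d\end{bmatrix},

(18)

where $\bar{p}$ and $\bar{q}$ are defined as in Theorem 1. The interconnection has finite induced $\ell_{p}$-norm, i.e. $\|F_{U}(P,\Gamma)\|<\infty$, if there exist scalars $d_{1},d_{2}>0$ such that the scaled block

\displaystyle\tilde{P}_{11}:=\begin{bmatrix}\frac{1}{d_{1}}\,I&0\\ 0&\frac{1}{d_{2}}\,I\end{bmatrix}P_{11}\begin{bmatrix}d_{1}\delta\,I&0\\ 0&d_{2}\beta\,I\end{bmatrix}

(19)

satisfies $\|\tilde{P}_{11}\|<1$.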

Proof.

Define a scaled version of the nominal dynamics $P$ as:

\displaystyle\tilde{P}=\left[\begin{array}{cc|c}\frac{1}{d_{1}}I&0&0\\ 0&\frac{1}{d_{2}}I&0\\ \hline 0&0&I\end{array}\right]\begin{bmatrix}P_{11}&P_{12}\\ P_{21}&P_{22}\end{bmatrix}\left[\begin{array}{cc|c}d_{1}\delta\,I&0&0\\ 0&d_{2}\beta\,I&0\\ \hline 0&0&I\end{array}\right].

The constants introduced in the scaled plant $\tilde{P}$ cancel those introduced for $\tilde{\Gamma}$ in (17). In other words, $F_{U}(P,\Gamma)$ and $F_{U}(\tilde{P},\tilde{\Gamma})$ define the same dynamics from $d$ to $x$. Moreover, $\|\tilde{P}_{11}\|<1$ by assumption and $\|\tilde{\Gamma}\|\leq 1$ by the normalization in (16) and (17). It follows from the small gain theorem (Theorem 1), applied to $F_{U}(\tilde{P},\tilde{\Gamma})$, that $F_{U}(\tilde{P},\tilde{\Gamma})=F_{U}(P,\Gamma)$ has finite induced $\ell_{p}$-norm. ∎

The scalings $d_{1}$ and $d_{2}$ in the robust stability condition (Theorem 2) can be used to reduce the conservatism of the small gain condition (Theorem 1). They are known as $D$ -scales in the robust control literature ([22] and Chapter 11 in [11]) and are used in structured singular value robust stability tests.
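The benefit of the scalings can be seen on a toy case. The sketch below assumes a static $2\times 2$ gain $P_{11}$ (scalar $q$ and $u^{oco}$ channels, made-up numbers) and the induced $\infty$-norm, which for a matrix is the maximum absolute row sum. For a static block, only the ratio $r=d_2/d_1$ affects (19), so a log-spaced grid search over $r$ stands in for a proper optimization; the function names are hypothetical. For a dynamic LTI $P_{11}$ one would instead use row sums of the $\ell_1$-norms of the impulse responses of its entries.

```python
def inf_norm(M):
    """Induced infinity-norm of a matrix: maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in M)


def scaled_p11(P11, delta, beta, r):
    """Scaled block (19) for a static 2x2 P11; only r = d2/d1 matters."""
    return [[delta * P11[0][0], r * beta * P11[0][1]],
            [delta * P11[1][0] / r, beta * P11[1][1]]]


def best_scaling(P11, delta, beta, num=2001):
    """Grid search over log-spaced r in [1e-3, 1e3] minimizing ||P11_tilde||."""
    best_r, best_val = 1.0, inf_norm(scaled_p11(P11, delta, beta, 1.0))
    for k in range(num):
        r = 10.0 ** (-3.0 + 6.0 * k / (num - 1))
        val = inf_norm(scaled_p11(P11, delta, beta, r))
        if val < best_val:
            best_r, best_val = r, val
    return best_r, best_val


P11 = [[0.2, 4.0], [0.01, 0.3]]                    # made-up static gain
print(inf_norm(scaled_p11(P11, 1.0, 1.0, 1.0)))    # unscaled (r = 1): 4.2
print(best_scaling(P11, 1.0, 1.0))
```

With $\delta=\beta=1$, the unscaled value is 4.2, so the unscaled test is inconclusive, while a ratio near $r\approx 0.064$ brings $\|\tilde{P}_{11}\|$ down to about 0.46.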

3.3 Bounding the LTV Dynamics

In this section, we provide a result specific to the induced $\ell_{\infty}$-norm for the OCO control implementation. The induced $\ell_{\infty}$-norm is useful as it allows us to relate the system norm $\|M_{LTV}\|_{\infty\to\infty}$ to the matrix norms $\|M_{t}\|_{\infty\to\infty}$. The robust stability constraint can then be imposed in the projection step of OPGD as the point-wise in time constraint $\|M_{t}\|_{\infty\to\infty}\leq\beta$ on the coefficients. We discuss this further in Sections 4.2 and 4.3.

The dynamics $M_{LTV}$ in (8) can be expressed as:

\displaystyle u_{t}^{oco}=M_{t}\hat{W}_{t},

(20)

where

\displaystyle M_{t}:=\begin{bmatrix}M_{t}^{[0]}&\cdots&M_{t}^{[H-1]}\end{bmatrix}\in\mathbb{R}^{n_{u}\times n_{x}H},\mbox{ and }

(21)

\displaystyle\hat{W}_{t}:=\begin{bmatrix}\hat{w}_{t}\\ \vdots\\ \hat{w}_{t-H+1}\end{bmatrix}\in\mathbb{R}^{n_{x}H}

(22)

are the stacked FIR coefficients and estimated disturbance history. The following theorem relates the induced $\ell_{\infty}$ -norm of the system $M_{LTV}$ to the matrix induced $\infty$ -norm of $M_{t}$ .

Theorem 3.

Let $M_{LTV}$ be the LTV system defined in (20) and $M_{t}$ be the stacked gains defined in (21). Then

\displaystyle\|M_{LTV}\|_{\infty\to\infty}=\sup_{t}\|M_{t}\|_{\infty\to\infty}.

(23)

Proof.

The equality in (23) is shown in two steps: (A) $\|M_{LTV}\|_{\infty\to\infty}\leq\sup_{t}\|M_{t}\|_{\infty\to\infty}$ and (B) $\|M_{LTV}\|_{\infty\to\infty}\geq\sup_{t}\|M_{t}\|_{\infty\to\infty}$ .

First, we show direction (A). Let $\hat{w}$ and $u^{oco}$ be any input-output pair of $M_{LTV}$ . Equation (20) and the definition of the induced matrix norm imply that

\displaystyle\begin{aligned}\|u^{oco}\|_{\infty}&=\sup_{t}\|M_{t}\hat{W}_{t}\|_{\infty}\\ &\leq\sup_{t}\|M_{t}\|_{\infty\to\infty}\cdot\|\hat{w}\|_{\infty}.\end{aligned}

(24)

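Here we used $\|\hat{W}_{t}\|_{\infty}\leq\|\hat{w}\|_{\infty}$ for every $t$, since $\hat{W}_{t}$ stacks $H$ samples of $\hat{w}$. Thus, $\frac{\|u^{oco}\|_{\infty}}{\|\hat{w}\|_{\infty}}\leq\sup_{t}\,\|M_{t}\|_{\infty\to\infty}$ for every nonzero $\hat{w}$, so that $\|M_{LTV}\|_{\infty\to\infty}\leq\sup_{t}\|M_{t}\|_{\infty\to\infty}$. Hence, claim (A) holds.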

Next, we show direction (B). Suppose $\sup_{t}\|M_{t}\|_{\infty\to\infty}$ achieves its maximum at some finite time $t_{0}$ . (The proof can be modified if the supremum occurs as $t\to\infty$ .) Then there exists a vector $\hat{W}_{0}$ such that

\displaystyle\|\hat{W}_{0}\|_{\infty}=1\mbox{ and }\|M_{t_{0}}\,\hat{W}_{0}\|_{\infty}=\sup_{t}\|M_{t}\|_{\infty\to\infty}.

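We can use the vector $\hat{W}_{0}$ to construct a signal $\hat{w}_{0}$: choose its samples at times $t_{0},\ldots,t_{0}-H+1$ so that $\left[\begin{smallmatrix}\hat{w}_{0,t_{0}}\\ \vdots\\ \hat{w}_{0,t_{0}-H+1}\end{smallmatrix}\right]=\hat{W}_{0}$, and set all other samples to zero (taking $t_{0}\geq H-1$ so that this window lies in $t\geq 0$). Then $\|\hat{w}_{0}\|_{\infty}=1$, and $u^{oco}=M_{LTV}\,\hat{w}_{0}$ satisfies $\|u^{oco}\|_{\infty}\geq\|M_{t_{0}}\hat{W}_{0}\|_{\infty}$, i.e.,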

\displaystyle\|u^{oco}\|_{\infty}\geq\left[\sup_{t}\|M_{t}\|_{\infty\to\infty}% \right]\|\hat{w}_{0}\|_{\infty}.

Hence, claim (B) holds. ∎
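For a concrete matrix, both directions of the proof can be checked numerically. The sketch below computes $\|M_t\|_{\infty\to\infty}$ as the maximum absolute row sum and builds a maximizing vector $\hat{W}_0$ from the sign pattern of the largest row, as used in step (B); taking the maximum over a finite list of gains then stands in for the supremum in (23) over a finite window. Function names and the example gains are hypothetical.

```python
def inf_norm_and_maximizer(M):
    """Return (||M||_{inf->inf}, W0) where ||W0||_inf = 1 and
    ||M W0||_inf = ||M||_{inf->inf}, as in step (B) of the proof."""
    row_sums = [sum(abs(a) for a in row) for row in M]
    i_star = max(range(len(M)), key=lambda i: row_sums[i])
    # Signs of the largest row: M[i_star] . W0 = sum_j |M[i_star][j]|.
    W0 = [1.0 if a >= 0 else -1.0 for a in M[i_star]]
    return row_sums[i_star], W0


def ltv_gain_over_window(M_seq):
    """max_t ||M_t||_{inf->inf} over a finite list of stacked gains M_t."""
    return max(inf_norm_and_maximizer(M)[0] for M in M_seq)


M = [[1.0, -2.0, 0.5], [0.3, 0.3, -3.0]]   # hypothetical M_t, n_u = 2, n_x H = 3
norm, W0 = inf_norm_and_maximizer(M)
MW0 = [sum(a * w for a, w in zip(row, W0)) for row in M]
print(norm, W0, max(abs(v) for v in MW0))
print(ltv_gain_over_window([M, [[0.1, 0.1, 0.1], [0.0, 0.0, 0.0]]]))
```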

4 Application to OCO

In this section, we demonstrate how the main results can be applied to ensure robust stability of existing OCO controllers. We focus on the OCO controllers in [10, 7] where the coefficients of $M_{LTV}$ are updated via OPGD.

4.1 Estimator Design

The class of OCO controllers defined in [10] considers the feedback system with OCO control (Figure 2) and no uncertainty, i.e., $\Delta(z)=0$ in Figure 1. In this case, the plant model is assumed to be perfect: $\tilde{G}(z)=G(z)$. Thus, the nominal plant dynamics can be used to design an estimator $E(z)$ and the OPGD update of the coefficients in $M_{LTV}$. Later, we show how the OPGD projection step can be modified to ensure robust stability when there is uncertainty, $\Delta(z)\neq 0$.

Without uncertainty, the plant dynamics with unknown disturbance reduce to:

\displaystyle x_{t+1}=Ax_{t}+Bu_{t}+Bd_{t}.

Note that $Bd_{t}$ is the effective disturbance on the state at time $t$. Assuming the state $x_{t}$ is measurable, we can perfectly reconstruct the effective disturbance from the previous time step. Using the measured state and rearranging the plant dynamics gives:

\displaystyle\hat{w}_{t}=x_{t}-Ax_{t-1}-Bu_{t-1}.

(25)

With no uncertainty, this estimator perfectly reconstructs the effective disturbance with a one-step delay: $\hat{w}_{t}=Bd_{t-1}$ . However, perfect reconstruction is no longer guaranteed with uncertainty, i.e. if $\Delta(z)\neq 0$ then $\hat{w}_{t}\neq Bd_{t-1}$ . In this case, $\hat{w}_{t}$ is considered an estimate of $Bd_{t-1}$ .

The disturbance reconstruction (25) can be expressed in state-space form as:

\displaystyle\begin{aligned}x^{e}_{t+1}&=0\,x^{e}_{t}-A\,x_{t}-B\,u_{t}\\ \hat{w}_{t}&=x^{e}_{t}+x_{t},\end{aligned}

where $x^{e}_{t}=-Ax_{t-1}-Bu_{t-1}$ is the estimator state. This has the form of the general LTI estimator $E(z)$ in (6) with $A_{e}=0$, $B_{e1}=-A$, $B_{e2}=-B$, $C_{e}=I$, $D_{e1}=I$, and $D_{e2}=0$. The estimates $\hat{w}_{t}$ of past disturbances are used to update the FIR coefficients $M_{t}$ defined in (21) by minimizing an “ideal” cost, which we describe next.
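The following sketch simulates the scalar nominal plant of Section 5 ($A=0.9$, $B=0.1$) without uncertainty under the baseline law $u_t=-Kx_t$ with $K=0.15$, runs the estimator in the state-space form above, and checks the one-step-delayed reconstruction $\hat{w}_t=Bd_{t-1}$. The sinusoidal disturbance is an arbitrary choice for illustration, and the initial estimator state is assumed to be zero.

```python
import math

A, B, K = 0.9, 0.1, 0.15   # scalar nominal plant and gain from Section 5


def simulate(d):
    """Simulate x_{t+1} = A x_t + B (u_t + d_t) with u_t = -K x_t (no uncertainty)
    together with the estimator x^e_{t+1} = -A x_t - B u_t, w_hat_t = x^e_t + x_t.
    Returns the list of estimates w_hat_0, ..., w_hat_{T-1}."""
    x, xe = 0.0, 0.0            # x_0 = 0 and (assumed) x^e_0 = 0
    w_hat = []
    for dt in d:
        u = -K * x
        w_hat.append(xe + x)    # estimator output at time t
        xe_next = -A * x - B * u
        x = A * x + B * (u + dt)
        xe = xe_next
    return w_hat


d = [math.sin(0.3 * t) for t in range(50)]   # arbitrary test disturbance
w_hat = simulate(d)
# Largest reconstruction error |w_hat_t - B d_{t-1}| for t >= 1 (rounding level).
print(max(abs(w_hat[t] - B * d[t - 1]) for t in range(1, len(d))))
```

Inserting a mismatch between the simulated and assumed plant (for example, passing $u_t+d_t$ through an actuator model before the plant) would break this identity, which is the effect discussed in Section 5.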

4.2 OPGD on an Ideal Cost

The coefficients $M_{t}$ are updated at each time step via OPGD in the direction of an “ideal” (per-step) cost. This cost is associated with the nominal plant dynamics (3) and a per-step cost function. Here, we consider quadratic per-step costs:

\displaystyle c(x_{t},d_{t})=x_{t}^{\top}Q\,x_{t}+u_{t}^{\top}R\,u_{t},

(26)

where $Q=Q^{\top}\succeq 0\in\mathbb{R}^{n_{x}\times n_{x}}$ and $R=R^{\top}\succ 0\in\mathbb{R}^{n_{u}\times n_{u}}$. The finite-horizon cost is defined as:

\displaystyle J_{T}(x,d)=\sum_{t=0}^{T}c(x_{t},d_{t}),

(27)

where $T$ is the total time horizon. Based on the per-step cost (26), the ideal cost $g(M)$ is defined for any static gain $M\in\mathbb{R}^{n_{u}\times n_{x}H}$ as follows.

Let $\tilde{x}_{\tau}\in\mathbb{R}^{n_{x}}$ and $\tilde{u}_{\tau}\in\mathbb{R}^{n_{u}}$ denote the ideal state and control input at time $\tau$ , respectively. The ideal state and input are initialized at $\tau=t-H$ by:

\displaystyle\tilde{x}_{t-H}=0\mbox{ and }\tilde{u}_{t-H}=\sum_{i=0}^{H-1}M^{[i]}\,\hat{w}_{t-H-i},

(28)

where $t$ is the current time. The ideal state and control input are then computed for $\tau=t-H+1,\ldots,t$ by iterating over the plant dynamics with the static gains $M$ :

\displaystyle\tilde{x}_{\tau}=A\,\tilde{x}_{\tau-1}+B\,\tilde{u}_{\tau-1}+\hat{w}_{\tau-1}

(29)

\displaystyle\tilde{u}_{\tau}=-K\,\tilde{x}_{\tau}+\sum_{i=0}^{H-1}M^{[i]}\,\hat{w}_{\tau-i}.

(30)

The ideal cost is then defined as $g(M):=c(\tilde{x}_{t},\tilde{u}_{t})$. In other words, the ideal cost $g(M)$ is the per-step cost at time $t$ of the plant dynamics evolving with the static gain $M$ over the learning horizon $H$, neglecting dynamics before time $t-H$. The coefficients $M_{t}$ are updated via OPGD on this ideal cost:

\displaystyle M_{t+1}=\Pi_{\mathcal{M}}\left(M_{t}-\eta\nabla_{M}g(M_{t})\right),

(31)

where $\eta$ is the learning rate, and $\Pi_{\mathcal{M}}$ is the projection of the gradient step of $M_{t}$ onto a constraint set $\mathcal{M}$ . Additional details are given in [10, 7]. Next, we show how the constraint set $\mathcal{M}$ can be modified to ensure the robust stability of the OCO feedback system (Figures 1 and 2) when $\Delta(z)\neq 0$ .
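A rough sketch of one update (31) for a scalar plant ($n_u=n_x=1$) is shown below. It evaluates $g(M)$ by iterating (28) through (30), starting from $\tilde{x}_{t-H}=0$ and $\tilde{u}_{t-H}=\sum_{i}M^{[i]}\hat{w}_{t-H-i}$, and it approximates $\nabla_M g$ with central finite differences instead of an analytic gradient. The finite differences, the constant estimate history (zero before $t=0$), and all function names are assumptions for illustration, not the implementation of [10, 7]. The projection rescales the step onto $\{M:\|M\|_{\infty\to\infty}\leq\beta\}$, which for a $1\times H$ row is the sum of absolute values.

```python
def ideal_cost(M, w_hat, t, A, B, K, Q, R):
    """Ideal cost g(M) for a scalar plant, iterating (28)-(30).

    M     : list of H scalar FIR gains [M^[0], ..., M^[H-1]], held fixed
    w_hat : function of the time index returning the disturbance estimate
    """
    H = len(M)

    def fir(tau):
        return sum(M[i] * w_hat(tau - i) for i in range(H))

    x = 0.0                       # x~_{t-H} = 0
    u = fir(t - H)                # u~_{t-H} (x~_{t-H} = 0, so no -K x~ term)
    for tau in range(t - H + 1, t + 1):
        x = A * x + B * u + w_hat(tau - 1)   # (29)
        u = -K * x + fir(tau)                # (30)
    return Q * x * x + R * u * u             # g(M) = c(x~_t, u~_t)


def opgd_step(M, w_hat, t, A, B, K, Q, R, eta, beta, h=1e-6):
    """One update (31): finite-difference gradient step, then rescaling onto
    {M : ||M||_{inf->inf} <= beta}; for a 1 x H row this norm is sum |M^[i]|."""
    grad = []
    for i in range(len(M)):
        Mp, Mm = list(M), list(M)
        Mp[i] += h
        Mm[i] -= h
        g_p = ideal_cost(Mp, w_hat, t, A, B, K, Q, R)
        g_m = ideal_cost(Mm, w_hat, t, A, B, K, Q, R)
        grad.append((g_p - g_m) / (2.0 * h))
    M_step = [m - eta * g for m, g in zip(M, grad)]
    norm = sum(abs(m) for m in M_step)
    if norm <= beta:
        return M_step
    return [beta * m / norm for m in M_step]


def w_const(s):
    """Hypothetical constant estimate history, zero before time 0."""
    return 0.1 if s >= 0 else 0.0


args = (w_const, 10, 0.9, 0.1, 0.15, 1.0, 0.1)   # w_hat, t, A, B, K, Q, R
print(ideal_cost([0.0], *args))
print(opgd_step([0.0], *args, eta=1.0, beta=1.5))
```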

4.3 Robust OCO Control

Assuming the uncertainty $\Delta(z)$ is bounded by some $\delta$, i.e., $\|\Delta\|_{\infty\to\infty}\leq\delta$, we can use bisection on $\beta$ to find the largest bound $\|M_{LTV}\|_{\infty\to\infty}\leq\beta$ on the FIR filter such that the robust stability condition (Theorem 2) is satisfied. Larger values of $\beta$ risk instability, yet can improve disturbance rejection as they give the OCO more freedom to adapt the gains in $M_{LTV}$. Thus, it is important to determine the largest value of $\beta$ for which the robust stability condition holds. We refer to this $\beta$ as the stability bound. Theorem 3 allows us to impose this constraint as a point-wise in time constraint on the FIR coefficients $M_{t}$.
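The bisection can be sketched as follows. Since a full check of Theorem 2 for a dynamic $P_{11}$ is beyond a short example, the predicate below is a hypothetical stand-in that tests the scaled condition for a static $2\times 2$ $P_{11}$ with made-up entries under the induced $\infty$-norm, searching over the ratio $r=d_2/d_1$ on a grid. The bisection assumes the condition holds for every $\beta$ below the stability bound and fails above it.

```python
def inf_norm(M):
    """Induced infinity-norm of a matrix: maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in M)


def scaled_condition_holds(P11, delta, beta, num=2001):
    """Stand-in check of Theorem 2 for a static 2x2 P11: is there a grid value
    of r = d2/d1 in [1e-3, 1e3] with ||P11_tilde||_{inf->inf} < 1?"""
    for k in range(num):
        r = 10.0 ** (-3.0 + 6.0 * k / (num - 1))
        Pt = [[delta * P11[0][0], r * beta * P11[0][1]],
              [delta * P11[1][0] / r, beta * P11[1][1]]]
        if inf_norm(Pt) < 1.0:
            return True
    return False


def stability_bound(holds, beta_hi, tol=1e-6):
    """Bisection for the largest beta in [0, beta_hi] with holds(beta) True,
    assuming holds is True below some threshold and False above it.
    Returns 0.0 if no tested beta passes."""
    if holds(beta_hi):
        return beta_hi
    lo, hi = 0.0, beta_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if holds(mid):
            lo = mid
        else:
            hi = mid
    return lo


P11 = [[0.2, 4.0], [0.01, 0.3]]     # made-up static gain
delta = 1.0                         # assumed uncertainty bound
beta_star = stability_bound(lambda b: scaled_condition_holds(P11, delta, b), 10.0)
print(beta_star)
```

For this toy example, solving the two row conditions by hand gives the threshold $\beta<0.2/0.07\approx 2.857$; the grid-based predicate returns a value at or slightly below it.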

Once the bound $\beta$ has been determined, we can impose it by defining the constraint set $\mathcal{M}$ as:

\displaystyle\mathcal{M}:=\Big{\{}M\in\mathbb{R}^{n_{u}\times n_{x}H}:\|M\|_{\infty\to\infty}\leq\beta\Big{\}}.

(32)

Thus, the projection $\Pi_{\mathcal{M}}$ can be implemented by:

\displaystyle M_{t+1}=\begin{cases}M_{step},&\|M_{step}\|_{\infty\to\infty}\leq\beta\\ \beta\left(\frac{M_{step}}{\|M_{step}\|_{\infty\to\infty}}\right),&\|M_{step}\|_{\infty\to\infty}>\beta,\end{cases}

(33)

where $M_{step}:=M_{t}-\eta\nabla_{M}g(M_{t})$ is the gradient step of the FIR coefficients $M_{t}$ , defined at time $t$ . The constraint set $\mathcal{M}$ defined in (32) and projection $\Pi_{\mathcal{M}}$ in (33) can be implemented as part of Algorithm 1 in [10]. The numerical results in the following section are based on this implementation.
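The rescaling (33) can be sketched for a general stacked gain $M_{step}\in\mathbb{R}^{n_u\times n_xH}$ as below (hypothetical function names, nested-list matrices). Note that for the induced $\infty$-norm ball this radial rescaling returns a point of $\mathcal{M}$ but is in general not the Euclidean nearest point; for the robust stability argument, membership in $\mathcal{M}$ is what Theorem 3 requires.

```python
def inf_norm(M):
    """Induced infinity-norm of a matrix: maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in M)


def rescale_to_ball(M_step, beta):
    """Rescaling (33): keep M_step if ||M_step||_{inf->inf} <= beta, otherwise
    scale it radially so that its induced infinity-norm equals beta."""
    n = inf_norm(M_step)
    if n <= beta:
        return [list(row) for row in M_step]
    return [[beta * a / n for a in row] for row in M_step]


M_step = [[1.0, -2.0, 0.5]]          # hypothetical 1 x 3 gradient step (H = 3)
M_next = rescale_to_ball(M_step, 1.5)
print(M_next, inf_norm(M_next))
```

Because the rescaling preserves the direction of $M_{step}$, it only shortens the step; it never changes which way the coefficients move.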

5 Numerical Results

In this section, we provide numerical results of OPGD on a plant with uncertainty. Although we do not explicitly use the robust stability condition (Theorem 2) to compute the stability bound $\beta$ , we perform numerical studies to illustrate its effect. Future studies will focus on computing the exact bound, while the results here suggest that a stability bound $\beta$ exists.

Here, unconstrained OCO (U-OCO) refers to OCO control with $\beta=\infty$ , i.e., unconstrained FIR filter gains $M_{t}$ . Constrained OCO (C-OCO) refers to OCO control with gains $M_{t}$ bounded by some $\beta<\infty$ . We compare results between U-OCO and C-OCO on the following models:

\displaystyle\begin{aligned}G(z)&=\frac{0.1}{z-0.9}\\ F(z)&=\frac{0.1185z+0.1145}{z^{2}-1.672z+0.9048}\end{aligned}

where $G(z)$ and $F(z)$ are the nominal plant and unmodeled high frequency actuator dynamics, respectively. Note that $\Delta(z)=F(z)-1$ and $\tilde{G}(z)=G(z)F(z)$ . The following disturbance $d_{t}$ was generated to perturb the control input $u_{t}$ :

\displaystyle d_{t}=\begin{cases}100&0\leq t\leq 500\\ -100&500<t\leq T,\end{cases}

where the time horizon is $T=1000$ . We use the quadratic per-step cost $c(x_{t},d_{t})$ and total cost $J_{T}(x,d)$ defined in (26) and (27), respectively, with $Q=1$ and $R=10^{-1}$ . The state-feedback gain $K=0.15$ , learning horizon $H=1$ , and learning rate $\eta=5\times 10^{-4}$ are used for all simulations. Note that $K=0.15$ is stabilizing for both the nominal and true plant dynamics.
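Under the induced $\ell_\infty$-norm of Section 4.3, a valid $\delta$ for this example is $\|\Delta\|_{\infty\to\infty}$ with $\Delta(z)=F(z)-1$, which for a stable SISO LTI system equals the $\ell_1$-norm of its impulse response. The sketch below approximates it by truncating the impulse response at 2000 samples (the poles of $F$ have magnitude about 0.95, so the tail is negligible); the function name is hypothetical. Consistent with the remark above that Theorem 2 is not used to compute $\beta$ here, the snippet only illustrates how $\delta$ could be obtained.

```python
def impulse_response_F(n):
    """First n samples of the impulse response of
    F(z) = (0.1185 z + 0.1145) / (z^2 - 1.672 z + 0.9048), via the recursion
    y_t = 1.672 y_{t-1} - 0.9048 y_{t-2} + 0.1185 u_{t-1} + 0.1145 u_{t-2}."""
    u = [1.0] + [0.0] * (n - 1)
    y = [0.0] * n
    for t in range(n):
        y1 = y[t - 1] if t >= 1 else 0.0
        y2 = y[t - 2] if t >= 2 else 0.0
        u1 = u[t - 1] if t >= 1 else 0.0
        u2 = u[t - 2] if t >= 2 else 0.0
        y[t] = 1.672 * y1 - 0.9048 * y2 + 0.1185 * u1 + 0.1145 * u2
    return y


f = impulse_response_F(2000)
# Delta(z) = F(z) - 1: impulse response h_0 = f_0 - 1 and h_k = f_k for k >= 1.
h = [f[0] - 1.0] + f[1:]
# For a stable SISO LTI system the induced l_inf gain equals the l_1 norm of
# the impulse response; truncating at 2000 samples is an approximation.
delta_inf = sum(abs(v) for v in h)
print(delta_inf)
```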

Figure 4 shows the per-step cost $c(x_{t},d_{t})$ and estimated disturbance $\hat{w}_{t}$ of U-OCO at each time $t$. We compare the performance with a perfect (red dashed) and imperfect (blue solid) plant model. Again, a perfect model has no uncertainty, $\Delta(z)=0$, and an imperfect model has uncertainty, $\Delta(z)\neq 0$. With a perfect plant model, the disturbance is perfectly reconstructed, $\hat{w}_{t}=Bd_{t-1}$ (see Section 4.1). With an imperfect plant model, this no longer holds: $\hat{w}_{t}\neq Bd_{t-1}$. Since the ideal cost $g(M)$ assumes a perfect plant model and perfect disturbance estimates, this mismatch introduces an error in the coefficient update $M_{t+1}$. This causes an instability, reflected by the per-step cost and estimated disturbance growing unbounded. On the other hand, U-OCO is stable without uncertainty because the disturbance is estimated perfectly. Thus, the constraint $\beta$ is needed on the coefficient update to ensure stability for the imperfect plant model.

Figure 4: Per-step cost (top) and disturbance estimate (bottom) of running U-OCO on a perfect (red dashed) and imperfect (blue solid) plant model. U-OCO is stable with a perfect model and unstable for an imperfect model.

Figure 5 shows the per-step cost $c(x_{t},d_{t})$ and estimated disturbance $\hat{w}_{t}$ of C-OCO for $\beta=1.5$ at each time $t$. Again, we compare the performance with a perfect (red dashed) and imperfect (blue solid) plant model. As mentioned before, an error in the disturbance estimate introduces an error in the ideal cost gradient. The ideal cost gradient error can cause the gradient step $M_{step}=M_{t}-\eta\nabla_{M}g(M_{t})$ to grow too large in the wrong direction. When $\beta$ is chosen such that the robust stability condition (Theorem 2) is satisfied, the effect of the uncertainty-induced error on the gradient step of the coefficient update is limited. This is illustrated in Figure 5: with $\beta=1.5$, the performance of C-OCO on the imperfect plant model eventually recovers the performance on the perfect model. Thus, imposing the constraint $\beta$ can ensure that OCO is robust to uncertainty.

As mentioned in Section 4.3, the choice of $\beta$ is critical. Figure 6 shows the averaged per-step cost $J_{T}(x,d)/T$ of C-OCO as a function of $\beta$. Again, we compare the performance with a perfect (red dashed) and an imperfect (blue solid) plant model. When $\beta=0$, the OCO has no freedom to learn the disturbance, and pure state feedback (SF) is recovered for both the perfect and imperfect plants (red and blue circles, respectively). As $\beta$ increases, the OCO has more freedom to learn the disturbance, and the performance improves similarly for both plants. However, when $\beta$ is too large for the robust stability condition (Theorem 2) to hold, C-OCO on the imperfect plant becomes unstable. Figure 6 suggests that the stability bound lies around $\beta=1.5$. Note that once the constraint $\beta$ becomes inactive, C-OCO recovers the U-OCO performance for the perfect and imperfect plants (red and blue squares, respectively). For the perfect plant, this indicates a limit on how much the OCO can improve on the baseline controller. For the imperfect plant, this indicates a limit on how much the OCO performance can be degraded by uncertainty. Hence, there is a trade-off between OCO performance and robustness to uncertainty.

6 Conclusion

In this paper, we establish a robust stability condition, based on the small gain theorem, for a class of OCO controllers with memory, and use this result to develop an OCO control algorithm (C-OCO) that is robust to model uncertainty. In particular, we impose this condition on the controller by bounding the LTV dynamics of the OCO controller pointwise in time. Numerical results illustrate that imposing the robust stability constraint keeps the closed-loop system stable when it would otherwise go unstable. Future work will study the numerical implementation of the scaled small gain theorem to compute the stability bound $\beta$.

References

[1] O. Anava, E. Hazan, and S. Mannor, “Online convex optimization against adversaries with memory and application to statistical arbitrage,” 2014.
[2] E. Hazan, “The Convex Optimization Approach to Regret Minimization,” in Optimization for Machine Learning, The MIT Press, Sept. 2011.
[3] M. Zinkevich, “Online convex programming and generalized infinitesimal gradient ascent,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 928–936, 2003.
[4] E. Hazan, A. Agarwal, and S. Kale, “Logarithmic regret algorithms for online convex optimization,” Machine Learning, vol. 69, no. 2-3, pp. 169–192, 2007.
[5] S. Shalev-Shwartz, “Online learning and online convex optimization,” Foundations and Trends in Machine Learning, vol. 4, no. 2, pp. 107–194, 2011.
[6] E. Hazan, “Introduction to online convex optimization,” Foundations and Trends® in Optimization, vol. 2, no. 3-4, pp. 157–325, 2016.
[7] N. Agarwal, E. Hazan, and K. Singh, “Logarithmic regret for online control,” in Advances in Neural Information Processing Systems, pp. 10175–10184, 2019.
[8] D. Foster and M. Simchowitz, “Logarithmic regret for adversarial online control,” in International Conference on Machine Learning, pp. 3211–3221, 2020.
[9] G. Goel, N. Agarwal, K. Singh, and E. Hazan, “Best of both worlds in online control: Competitive ratio and policy regret,” arXiv preprint arXiv:2211.11219, 2022.
[10] N. Agarwal, B. Bullins, E. Hazan, S. Kakade, and K. Singh, “Online control with adversarial disturbances,” in International Conference on Machine Learning, pp. 111–119, PMLR, 2019.
[11] K. Zhou, J. Doyle, and K. Glover, Robust and optimal control. Pearson, 1995.
[12] S. Skogestad and I. Postlethwaite, Multivariable Feedback Control: Analysis and Design. John Wiley and Sons, 2nd ed., 2005.
[13] Y. Rahman, A. Xie, J. B. Hoagg, and D. S. Bernstein, “A tutorial and overview of retrospective cost adaptive control,” in 2016 American Control Conference, pp. 3386–3409, 2016.
[14] R. Venugopal and D. S. Bernstein, “Adaptive disturbance rejection using ARMARKOV/Toeplitz models,” IEEE Transactions on Control Systems Technology, vol. 8, no. 2, pp. 257–269, 2000.
[15] M. A. Santillo and D. S. Bernstein, “Adaptive control based on retrospective cost optimization,” Journal of Guidance, Control, and Dynamics, vol. 33, no. 2, pp. 289–304, 2010.
[16] G. Goel and B. Hassibi, “Measurement-feedback control with optimal data-dependent regret,” arXiv preprint arXiv:2209.06425, 2022.
[17] G. Goel and B. Hassibi, “Regret-optimal estimation and control,” arXiv preprint arXiv:2106.12097, 2021.
[18] G. Goel and B. Hassibi, “Regret-optimal measurement-feedback control,” in Learning for Dynamics and Control, pp. 1270–1280, PMLR, 2021.
[19] G. Goel and B. Hassibi, “Regret-optimal control in dynamic environments,” arXiv preprint arXiv:2010.10473, 2020.
[20] J. Doyle, “Guaranteed margins for LQG regulators,” IEEE Transactions on Automatic Control, vol. 23, no. 4, pp. 756–757, 1978.
[21] H. K. Khalil, Nonlinear Systems, 3rd ed. Upper Saddle River, NJ: Prentice-Hall, 2002.
[22] A. Packard and J. Doyle, “The complex structured singular value,” Automatica, vol. 29, no. 1, pp. 71–109, 1993.

$$\begin{aligned}\|u^{oco}\|_{\infty} &= \sup_{t}\|M_{t}\hat{W}_{t}\|_{\infty}\\ &\leq \sup_{t}\|M_{t}\|_{\infty\to\infty}\cdot\|\hat{w}\|_{\infty}.\end{aligned}\tag{24}$$
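The inequality in (24) follows from the definition of the induced norm, $\|M_t\hat{W}_t\|_\infty \le \|M_t\|_{\infty\to\infty}\|\hat{W}_t\|_\infty$, together with $\|\hat{W}_t\|_\infty \le \|\hat{w}\|_\infty$, since $\hat{W}_t$ only stacks past disturbance estimates. The sketch below checks this numerically on random data, using the fact that $\|M\|_{\infty\to\infty}$ is the maximum absolute row sum. The stacking order of $\hat{W}_t$, the dimensions, and the function names are hypothetical.

```python
import numpy as np


def induced_inf_norm(M):
    """||M||_{inf->inf}: maximum absolute row sum."""
    return float(np.max(np.sum(np.abs(M), axis=1)))


def check_bound_24(M_seq, w_hat, H):
    """Return (sup_t ||M_t W_t||_inf, sup_t ||M_t||_{inf->inf} * ||w_hat||_inf),
    where W_t stacks w_hat_{t-1}, ..., w_hat_{t-H} (hypothetical ordering)."""
    lhs, sup_M = 0.0, 0.0
    for t in range(H, len(w_hat)):
        W_t = w_hat[t - H:t][::-1].reshape(-1)  # most recent estimate first
        u_t = M_seq[t] @ W_t                    # OCO contribution u_t^{oco}
        lhs = max(lhs, float(np.max(np.abs(u_t))))
        sup_M = max(sup_M, induced_inf_norm(M_seq[t]))
    return lhs, sup_M * float(np.max(np.abs(w_hat)))


rng = np.random.default_rng(1)
T, m, n, H = 50, 1, 2, 3  # hypothetical: input dim m, disturbance dim n, memory H
w_hat = rng.standard_normal((T, n))
M_seq = [rng.standard_normal((m, n * H)) for _ in range(T)]
lhs, rhs = check_bound_24(M_seq, w_hat, H)
print(lhs, rhs)
```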