Robust Online Convex Optimization for Disturbance Rejection

Joyce Lai and Peter Seiler

J. Lai and P. Seiler are with the Department of Electrical Engineering and Computer Science at the University of Michigan, Ann Arbor, MI 48109, USA. {joycelai,pseiler}@umich.edu

Abstract

Online convex optimization (OCO) is a powerful tool for learning sequential data, making it ideal for high precision control applications where the disturbances are arbitrary and unknown in advance. However, the ability of OCO-based controllers to accurately learn the disturbance while maintaining closed-loop stability relies on having an accurate model of the plant. This paper studies the performance of OCO-based controllers for linear time-invariant (LTI) systems subject to disturbance and model uncertainty. The model uncertainty can cause the closed loop to become unstable. We provide a sufficient condition for robust stability based on the small gain theorem. This condition is easily incorporated as an online constraint in the OCO controller. Finally, we verify via numerical simulations that imposing the robust stability condition on the OCO controller ensures closed-loop stability.

1 Introduction

This paper considers a class of controllers recently developed using online convex optimization (OCO). Online machine learning and convex optimization methods are powerful tools for learning sequential data. This makes these techniques ideal for high precision control applications like satellite pointing and photolithography. These systems have reliable physics-based models with small error (within the control bandwidth) but are subject to unknown arbitrary disturbances.

This has motivated a large body of recent work using online learning and convex optimization for control [1, 2, 3, 4, 5, 6, 7, 8, 9]. The most closely related work is the class of OCO controllers defined in [10]. Here, OCO with memory is introduced to the discrete-time control setting as an ideal cost minimization problem (which we describe in detail in Section 4.2) to handle arbitrary disturbances and general time-varying convex cost functions. The OCO controller has promising regret guarantees and makes less restrictive assumptions about the disturbance characteristics (e.g., white noise or worst-case) than $H_2$ and $H_\infty$ optimal control techniques [11, 12]. This makes OCO methods well suited for high precision control applications with unknown, arbitrary disturbances that degrade the system performance.

The OCO framework in [10] aims to learn the disturbance characteristics in real time. However, small model errors can cause instability and thus must be explicitly considered in the design. There are additional works that attempt to learn the model from data [13, 14, 15, 16, 17, 18, 19]. However, dynamic uncertainties in many high precision applications arise due to high frequency, time-varying, and/or nonlinear effects. It is difficult to learn such unmodeled effects from real-time data. In these cases, it is useful to design a robust OCO-based controller that can learn the disturbance features and tolerate model uncertainty, thus motivating our work.

There are three main contributions of our work. First, we provide a robust stability condition for OCO control of a discrete-time linear time-invariant (LTI) plant (Theorem 2 in Section 3.2). The scaled small gain condition is written abstractly with an arbitrary choice of an induced system norm. Our second contribution is to present a constrained OCO (C-OCO) control algorithm which is robust to nonparametric model uncertainties (Section 4). This algorithm uses a specific implementation of the scaled small gain condition with the induced $\ell_\infty$-norm (Section 3.3). This particular choice for the induced norm enables easy implementation of the robust stability condition in the C-OCO algorithm. The third contribution is to present numerical results that illustrate the effect of this robust stability constraint on the OCO controller (Section 5).

2 Problem Formulation

This section formulates the OCO control problem for discrete-time LTI plants subject to both model uncertainty and unknown disturbances.

2.1 Notation

Let $v\in\mathbb{R}^{n}$ be a vector. The $p$-norm of this vector is defined as $\|v\|_{p}:=\big[\sum_{i=1}^{n}|v_{i}|^{p}\big]^{\frac{1}{p}}$. Next, $\mathbb{N}$ denotes the set of non-negative integers. Let $d:\mathbb{N}\to\mathbb{R}^{n}$ denote a vector-valued sequence $\{d_{0},d_{1},\ldots\}$. The $\ell_{p}$-norm of $d$ is defined as:

$$\|d\|_{p}=\left[\sum_{t=0}^{\infty}\|d_{t}\|_{p}^{p}\right]^{\frac{1}{p}}. \tag{1}$$

Note that $\|d_{t}\|_{p}$ is the $p$-norm of the vector $d_{t}\in\mathbb{R}^{n}$ at time $t$ while $\|d\|_{p}$ is the $\ell_{p}$-norm of the sequence. The set $\ell_{p}$ consists of sequences that have finite $\ell_{p}$-norm. The extended space $\ell_{pe}\supset\ell_{p}$ consists of sequences that have finite $\ell_{p}$-norm on all finite intervals, i.e., $\sum_{t=0}^{T}\|d_{t}\|_{p}^{p}<\infty$ for all $T\geq 0$. Finally, let $G:\ell_{pe}\to\ell_{pe}$ denote a system that maps an input signal $u\in\ell_{pe}$ to an output signal $y\in\ell_{pe}$. The induced $\ell_{p}$-norm for this system is defined as:

$$\|G\|_{p\to p}=\sup_{0\neq u\in\ell_{p}}\frac{\|y\|_{p}}{\|u\|_{p}}. \tag{2}$$

To simplify notation, we will often use $\|d\|$ and $\|G\|$ for the signal norm and induced system norm when the specific $p$-norm is not important.
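As a concrete illustration of the definitions in (1) and (2), the following Python sketch evaluates the vector $p$-norm, the $\ell_p$-norm of a finite-length sequence (the sum in (1) truncated at a finite horizon), and a lower bound on the induced norm obtained from a few test inputs. The function names and the example system are hypothetical and are not part of the original formulation; the sketch assumes $1\leq p<\infty$.

```python
import numpy as np

def vector_p_norm(v, p):
    """p-norm of a vector v in R^n, assuming 1 <= p < inf."""
    v = np.asarray(v, dtype=float)
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))

def sequence_lp_norm(d, p):
    """l_p-norm of a finite sequence d_0, ..., d_T stored as the rows of d.

    This is (1) with the infinite sum truncated at T. The sum of
    ||d_t||_p^p over t equals the sum of |d_{t,i}|^p over all entries.
    """
    d = np.asarray(d, dtype=float)
    return float(np.sum(np.abs(d) ** p) ** (1.0 / p))

def induced_norm_lower_bound(system, test_inputs, p):
    """Largest ratio ||y||_p / ||u||_p over the given nonzero test inputs.

    A finite set of inputs only yields a lower bound on the supremum in (2).
    """
    ratios = []
    for u in test_inputs:
        u = np.asarray(u, dtype=float)
        y = system(u)
        ratios.append(sequence_lp_norm(y, p) / sequence_lp_norm(u, p))
    return max(ratios)

# Hypothetical example: a static gain of 2 acting on a 2-dimensional signal.
d = np.array([[3.0, 4.0], [0.0, 0.0]])
gain = induced_norm_lower_bound(lambda u: 2.0 * u, [d, np.ones((5, 2))], p=2)
```

For the static gain used here every test input gives the same ratio, so the lower bound coincides with the induced norm; for a dynamic system the bound is generally not tight.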

2.2 Model Uncertainty

In this section, we consider the feedback system in Figure 1 and discuss the model uncertainty $\Delta(z)$ in more detail.

[Block diagram: OCO Controller with output $u$; disturbance $d$; signals $p$, $q$, $v$; Uncertain Plant $\tilde{G}=G(I+\Delta)$ containing $\Delta(z)$ and $G(z)$; state $x$.]
Figure 1: Discrete-time feedback system with unknown disturbance $d$ and uncertainty $\Delta(z)$. OCO control is used to reject the disturbance $d$ without knowledge of the uncertainty $\Delta(z)$.

Consider the nominal discrete-time LTI plant $G(z)$ with dynamics:

$$x_{t+1}=A\,x_{t}+B\,v_{t}, \tag{3}$$

where $x_{t}\in\mathbb{R}^{n_{x}}$ and $v_{t}\in\mathbb{R}^{n_{u}}$ are the nominal plant state and input at time $t$, respectively. We assume $x_{0}=0$ for simplicity.

Model uncertainty for systems with physics-based models often shows up as unmodeled actuator dynamics affecting the plant input [11, 20, 12]. We can account for these unmodeled dynamics by defining an input-multiplicative uncertainty set $\mathcal{G}_{\delta}$ as:

$$\mathcal{G}_{\delta}=\left\{\tilde{G}(z)=G(z)(I+\Delta(z)):\|\Delta\|\leq\delta\right\}, \tag{4}$$

where $\delta\in[0,\infty)$. Note that the induced $2$-norm is a common choice to bound the uncertainty. However, our main result in Section 3 holds for any induced $p$-norm.

Let $\tilde{G}_{0}(z)$ denote the true plant dynamics. We assume that the true plant is within the uncertainty set, i.e., $\tilde{G}_{0}(z)\in\mathcal{G}_{\delta}$. In other words, there exists a specific $\Delta_{0}(z)$ such that $\|\Delta_{0}\|\leq\delta$ and $\tilde{G}_{0}(z)=G(z)(I+\Delta_{0}(z))\in\mathcal{G}_{\delta}$. More generally, we refer to $\tilde{G}(z)=G(z)(I+\Delta(z))$ as the uncertain plant. An alternative viewpoint is that the uncertain plant is $\tilde{G}(z)=G(z)F(z)$ where $F(z)=I+\Delta(z)$ represents unmodeled dynamics. Note that we assume the uncertainty $\Delta(z)$ is LTI. However, our main result in Section 3 can be extended to the case where $\Delta$ is a possibly nonlinear time-varying (NLTV) system.
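To illustrate the viewpoint $\tilde{G}=GF$ with $F=I+\Delta$, the sketch below takes a hypothetical scalar first-order actuator lag $F(z)=(1-\alpha)z/(z-\alpha)$ with $0<\alpha<1$ (an assumption for illustration, not a model from this paper) and evaluates the size of $\Delta=F-1$ in the induced $\ell_\infty$-norm. It relies on the standard fact that the induced $\ell_\infty$-norm of a stable SISO LTI system equals the $\ell_1$-norm of its impulse response. For this example the value works out to $2\alpha$, so such a plant would belong to $\mathcal{G}_\delta$ for any $\delta\geq 2\alpha$ when that norm is used in (4).

```python
import numpy as np

def delta_impulse_response(alpha, n_terms):
    """Truncated impulse response of Delta(z) = F(z) - 1.

    F(z) = (1 - alpha) z / (z - alpha) is a hypothetical unit-DC-gain
    actuator lag with impulse response f_k = (1 - alpha) * alpha**k.
    """
    k = np.arange(n_terms)
    f = (1.0 - alpha) * alpha ** k
    delta = f.copy()
    delta[0] -= 1.0  # subtract the identity (unit impulse at k = 0)
    return delta

def induced_linf_norm_siso(impulse_response):
    """Induced l_inf norm of a SISO LTI system: l_1 norm of its impulse response."""
    return float(np.sum(np.abs(impulse_response)))

alpha = 0.1
delta_norm = induced_linf_norm_siso(delta_impulse_response(alpha, n_terms=200))
```

The truncation length `n_terms` is chosen so that the neglected geometric tail is negligible for the chosen `alpha`.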

2.3 OCO Control

This section describes the OCO controller. We consider the feedback system in Figure 2 where the OCO controller is shown in more detail.

[Block diagram: the OCO Controller contains the estimator $E(z)$ with output $\hat{w}$, the system $M_{LTV}$ with output $u^{oco}$, and the gain $-K$ with output $u^{base}$; signals $u$, $d$, $p$; uncertain plant $\tilde{G}(z)$; state $x$.]
Figure 2: Block diagram representation of the OCO controller in a discrete-time feedback system with unknown disturbance $d_{t}$ and uncertain plant $\tilde{G}(z)$. The OCO controller is composed of a state-feedback gain $K$, an estimator $E(z)$, and an LTV system $M_{LTV}$.

Unknown disturbances are often caused by environmental factors and moving physical components which degrade system performance. However, these disturbances often also have learnable characteristics. It is typical to model such disturbances as entering at the plant input as shown in Figure 2.

OCO control can be used to learn and reject the disturbance without a priori knowledge of the disturbance [1, 2, 3, 4, 5, 6, 7, 8, 9]. Here, we describe a class of OCO controllers closely related to those in [10], which considers the case $\Delta(z)=0$. The OCO controller has the block diagram representation shown in Figure 2. This corresponds to the class of disturbance action controllers defined as:

$$u_{t}=-Kx_{t}+\sum_{i=0}^{H-1}M_{t}^{[i]}\hat{w}_{t-i}, \tag{5}$$

where $K\in\mathbb{R}^{n_{u}\times n_{x}}$, $M_{t}^{[i]}\in\mathbb{R}^{n_{u}\times n_{x}}$, and $\hat{w}_{t}\in\mathbb{R}^{n_{x}}$ are the state-feedback gain, learned coefficients, and disturbance estimate at time $t$, respectively. The state-feedback gain $K$ is user-selected while the learned coefficients $\{M_{t}^{[i]}\}_{i=0}^{H-1}$ are typically updated via some online optimization method. For example, [10] uses online projected gradient descent (OPGD) with memory (see Section 4.2).

The disturbance estimate $\hat{w}_{t}$ is assumed to be the output of an LTI estimator $E(z)$ with dynamics:

$$\begin{aligned}x_{t+1}^{e}&=A_{e}x_{t}^{e}+B_{e1}x_{t}+B_{e2}u_{t}\\ \hat{w}_{t}&=C_{e}x_{t}^{e}+D_{e1}x_{t}+D_{e2}u_{t},\end{aligned} \tag{6}$$

where $x_{t}^{e}\in\mathbb{R}^{n_{e}}$ and $\hat{w}_{t}\in\mathbb{R}^{n_{x}}$ are the estimator state and output at time $t$, respectively. Typically, $\hat{w}_{t}$ is an estimate of $Bd_{t}\in\mathbb{R}^{n_{x}}$ (possibly with delay), i.e., it is an estimate of the disturbance effect on the (nominal) state. The estimate is constructed from $x_{t}$ and $u_{t}$. This estimator is motivated by the case when $\Delta(z)=0$.
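One simple estimator that fits the form (6) is $\hat{w}_{t}=x_{t}-Ax_{t-1}-Bu_{t-1}$, which stores the previous state and control in $x_{t}^{e}$. This particular choice is an assumption made here for illustration; the text only requires that $E(z)$ be LTI. When $\Delta(z)=0$ and the plant input is $u_{t}+d_{t}$, this estimator returns $\hat{w}_{t}=Bd_{t-1}$, i.e., the disturbance effect with one step of delay. The Python sketch below builds the corresponding matrices and runs them on a randomly generated nominal plant (all numerical values are hypothetical).

```python
import numpy as np

def delay_estimator_matrices(A, B):
    """Matrices of (6) for the hypothetical estimator
    w_hat_t = x_t - A x_{t-1} - B u_{t-1}, with x^e_t = [x_{t-1}; u_{t-1}]."""
    nx, nu = B.shape
    Ae = np.zeros((nx + nu, nx + nu))                   # pure one-step memory
    Be1 = np.vstack([np.eye(nx), np.zeros((nu, nx))])   # stores x_t
    Be2 = np.vstack([np.zeros((nx, nu)), np.eye(nu)])   # stores u_t
    Ce = np.hstack([-A, -B])
    De1 = np.eye(nx)
    De2 = np.zeros((nx, nu))
    return Ae, Be1, Be2, Ce, De1, De2

def run_nominal(A, B, u, d):
    """Simulate x_{t+1} = A x_t + B (u_t + d_t) (i.e., Delta = 0) together
    with the estimator (6), and return the sequence of estimates w_hat_t."""
    T, nx = len(u), A.shape[0]
    Ae, Be1, Be2, Ce, De1, De2 = delay_estimator_matrices(A, B)
    x, xe = np.zeros(nx), np.zeros(Ae.shape[0])
    w_hat = np.zeros((T, nx))
    for t in range(T):
        w_hat[t] = Ce @ xe + De1 @ x + De2 @ u[t]
        xe = Ae @ xe + Be1 @ x + Be2 @ u[t]
        x = A @ x + B @ (u[t] + d[t])
    return w_hat

rng = np.random.default_rng(0)
A = 0.3 * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 2))
u = rng.standard_normal((20, 2))
d = rng.standard_normal((20, 2))
w_hat = run_nominal(A, B, u, d)   # expected: w_hat[t] = B d[t-1] for t >= 1
```

With nonzero uncertainty the same estimator would also pick up the effect of the uncertainty on the plant input, which is why the text notes that the estimator is motivated by the case $\Delta(z)=0$.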

The first term in (5) is considered the baseline controller which we denote by:

$$u_{t}^{base}=-Kx_{t}. \tag{7}$$

The main results in Section 3 can be generalized to the case when the baseline control $u_{t}^{base}$ is the output of an LTI controller $K(z)$ with input $x_{t}$. We assume the baseline controller is a static, state-feedback gain for simplicity.

The second term in (5) is the output of a finite impulse response (FIR) filter with time-varying coefficients. We denote the FIR filter with time-varying coefficients as a linear time-varying (LTV) system $M_{LTV}$ with input-output dynamics defined as:

$$u_{t}^{oco}=\sum_{i=0}^{H-1}M_{t}^{[i]}\hat{w}_{t-i}, \tag{8}$$

where $\hat{w}_{t}\in\mathbb{R}^{n_{x}}$ and $u_{t}^{oco}\in\mathbb{R}^{n_{u}}$ are the input and output at time $t$, respectively. The FIR filter order $H$ is also referred to as the learning horizon since the coefficients are often updated via OCO using the past $H$ disturbance estimates. We provide an example of online optimization in Sections 4 and 5, but the main results in Section 3 assume only that the coefficients are time-varying.

Given (7) and (8), the OCO controller (5) can be interpreted as a baseline controller $u_{t}^{base}$ plus an adapting term $u_{t}^{oco}$ which corrects for the unknown disturbance $d_{t}$ based on disturbance estimates.
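The decomposition of (5) into the baseline term (7) and the FIR term (8) can be sketched in a few lines of Python. The array layout (coefficients stacked as an `(H, n_u, n_x)` array, estimates ordered most recent first) and the function name are assumptions made for this illustration; estimates at negative time indices would have to be supplied as zeros by the caller.

```python
import numpy as np

def oco_control(K, M_t, x_t, w_hat_recent):
    """Evaluate the control law (5) at one time step.

    K            : (n_u, n_x) state-feedback gain
    M_t          : (H, n_u, n_x) array, M_t[i] is the coefficient M_t^{[i]}
    x_t          : (n_x,) current state
    w_hat_recent : (H, n_x) array, row i is the estimate w_hat_{t-i}
    Returns (u_t, u_base, u_oco).
    """
    u_base = -K @ x_t                                      # baseline term (7)
    u_oco = np.einsum("iuk,ik->u", M_t, w_hat_recent)      # FIR term (8)
    return u_base + u_oco, u_base, u_oco

# Hypothetical numbers with n_x = 2, n_u = 1, H = 2.
K = np.array([[1.0, 2.0]])
M_t = np.array([[[1.0, 0.0]],
                [[0.0, 1.0]]])
x_t = np.array([1.0, 1.0])
w_hat_recent = np.array([[2.0, 3.0],    # w_hat_t
                         [4.0, 5.0]])   # w_hat_{t-1}
u_t, u_base, u_oco = oco_control(K, M_t, x_t, w_hat_recent)
```

Here the coefficients `M_t` are simply passed in; how they are updated online is a separate question, addressed in the sections on online optimization.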

2.4 Model Uncertainty Effects on OCO Control

The uncertainty $\Delta(z)$ and disturbance $d_{t}$ have different effects on closed-loop stability. Suppose the state-feedback gain $K$ is stabilizing, i.e., all eigenvalues of $(A-BK)$ are strictly inside the unit disk. Given a perfect plant model, i.e., $\Delta(z)=0$, OCO control can be designed to achieve disturbance rejection with provable guarantees [10]. In this case, a bounded disturbance $d$ cannot cause signals $x$, $u$, $\hat{w}$, etc. to grow unbounded. However, small amounts of model uncertainty can cause the system to become unstable.

As shown in Figures 1 and 2, the (true) plant input is the control input perturbed by an unknown disturbance:

$$p_{t}=u_{t}+d_{t}, \tag{9}$$

where $u_{t},d_{t},p_{t}\in\mathbb{R}^{n_{u}}$ are the control input, disturbance, and perturbed (true) plant input at time $t$, respectively. The perturbed input $p_{t}$ is further distorted by the uncertainty $\Delta(z)$. The resulting input to the nominal plant $G(z)$ is:

$$v_{t}=(I+\Delta)\,p_{t}=u_{t}+d_{t}+q_{t}, \tag{10}$$

where $q_{t}=\Delta p_{t}\in\mathbb{R}^{n_{u}}$. Again, $v_{t}$ is the nominal plant input at time $t$. Not only is there an unknown disturbance $d_{t}$, but also a distorted signal $q_{t}$ due to the uncertainty $\Delta(z)$.

The additional perturbation $q_{t}$ can lead to unexpected behaviors that affect the disturbance estimate and FIR filter coefficient update when left unaccounted for in the OCO design. This can occur even when the state-feedback gain $K$ is stabilizing for the true plant $\tilde{G}(z)$. Thus, the OCO controller is required to: i) learn and compensate for the disturbance, and ii) stabilize the system in the presence of uncertainty. The OCO controller must achieve these objectives without a priori knowledge of the disturbance or uncertainty.
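The signal flow in (9) and (10) and the destabilizing effect described above can be reproduced with a small scalar simulation. The sketch below is purely illustrative: the plant values, the static uncertainty $\Delta=\delta$, the one-step-delay estimator $\hat{w}_{t}=x_{t}-ax_{t-1}-bu_{t-1}$, and the fixed (not learned) coefficient $M$ with $H=1$ are all assumptions chosen for this example rather than settings taken from the paper.

```python
import numpy as np

def simulate(a, b, K, M, delta, d):
    """Scalar closed loop with static uncertainty Delta = delta and H = 1.

    Uses the hypothetical estimator w_hat_t = x_t - a x_{t-1} - b u_{t-1},
    which equals b (d_{t-1} + q_{t-1}) along the trajectory.
    """
    T = len(d)
    x = np.zeros(T + 1)
    x_prev, u_prev = 0.0, 0.0
    for t in range(T):
        w_hat = x[t] - a * x_prev - b * u_prev   # disturbance estimate
        u = -K * x[t] + M * w_hat                # control law (5) with H = 1
        p = u + d[t]                             # perturbed plant input (9)
        q = delta * p                            # q_t = Delta p_t
        v = p + q                                # nominal plant input (10)
        x[t + 1] = a * x[t] + b * v              # nominal plant (3)
        x_prev, u_prev = x[t], u
    return x

d = np.ones(200)  # bounded disturbance
# No uncertainty: bounded response even with a large coefficient M.
x_nominal = simulate(a=0.5, b=1.0, K=0.2, M=-6.0, delta=0.0, d=d)
# 20% uncertainty with a moderate coefficient: still bounded.
x_small_M = simulate(a=0.5, b=1.0, K=0.2, M=-1.0, delta=0.2, d=d)
# 20% uncertainty with the large coefficient: the state diverges.
x_large_M = simulate(a=0.5, b=1.0, K=0.2, M=-6.0, delta=0.2, d=d)
```

For these particular numbers $K$ stabilizes both the nominal and the perturbed plant, yet the combination of the large coefficient and the uncertainty creates an unstable loop through the disturbance estimate. This is the kind of interaction that a constraint on $M_{LTV}$ is meant to exclude.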

3 Main Result

This section provides a condition on $M_{LTV}$ that ensures the feedback system with OCO control remains stable even in the presence of the model uncertainty.

3.1 Linear Fractional Transformation

As a first step, we transform the feedback system of the OCO controller and uncertain plant (Figures 1 and 2) to a standard form as shown in Figure 3. This diagram separates the LTI dynamics $P$ from the uncertainty $\Delta$ and time-varying OCO dynamics $M_{LTV}$. Here $P$ includes the dynamics due to the plant, estimator, and state-feedback gain. This diagram is called a linear fractional transformation (LFT) in the robust control literature [11, 12]. We use the notation $F_{U}(P,\Gamma)$ for this interconnection with $\Gamma=\left[\begin{smallmatrix}\Delta&0\\ 0&M_{LTV}\end{smallmatrix}\right]$ closed around the upper channels of $P$.

[Block diagram: $P$ with input $d$ and output $x$; $\Gamma=\left[\begin{smallmatrix}\Delta&0\\ 0&M_{LTV}\end{smallmatrix}\right]$ maps $\begin{bmatrix}p\\ \hat{w}\end{bmatrix}$ to $\begin{bmatrix}q\\ u^{oco}\end{bmatrix}$.]
Figure 3: Equivalent LFT $F_{U}(P,\Gamma)$ of original system separating LTI dynamics $P$ from uncertainty $\Delta$ and time-varying learning dynamics $M_{LTV}$.

An explicit state-space model for $P$ can be determined from the various components of the feedback system described in Section 2. The dynamics of $P$ are given by:

\begin{align*}
\begin{bmatrix} x_{t+1} \\ x_{t+1}^{e} \end{bmatrix}
&= \begin{bmatrix} A-BK & 0 \\ B_{e1}-B_{e2}K & A_{e} \end{bmatrix}
\begin{bmatrix} x_{t} \\ x_{t}^{e} \end{bmatrix}
+ \begin{bmatrix} B & B & B \\ 0 & B_{e2} & 0 \end{bmatrix}
\begin{bmatrix} q_{t} \\ u_{t}^{oco} \\ d_{t} \end{bmatrix} \\
\begin{bmatrix} p_{t} \\ \hat{w}_{t} \\ x_{t} \end{bmatrix}
&= \begin{bmatrix} -K & 0 \\ D_{e1}-D_{e2}K & C_{e} \\ I & 0 \end{bmatrix}
\begin{bmatrix} x_{t} \\ x_{t}^{e} \end{bmatrix}
+ \begin{bmatrix} 0 & I & I \\ 0 & D_{e2} & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} q_{t} \\ u_{t}^{oco} \\ d_{t} \end{bmatrix}
\end{align*}

We use the LFT representation $F_{U}(P,\Gamma)$ to formulate and state our robust stability theorem in the next subsection.

3.2 Scaled Small Gain Theorem

Our first stability result is a variation of the standard small gain theorem (see Section 5.4 of [21]). This provides a sufficient condition for the dynamics $F_{U}(P,\Gamma)$ to have a bounded gain from disturbance $d$ to state $x$. Note that stability here is in the sense of bounded gain in some induced norm.

Theorem 1.

Consider the interconnection $F_{U}(P,\Gamma)$ where $P:\ell_{pe}\to\ell_{pe}$ and $\Gamma:\ell_{pe}\to\ell_{pe}$ are linear systems with finite induced $\ell_{p}$-norm. Partition $P$ as:

\begin{align}
\begin{bmatrix}\bar{p}\\ x\end{bmatrix}=\begin{bmatrix}P_{11}&P_{12}\\ P_{21}&P_{22}\end{bmatrix}\,\begin{bmatrix}\bar{q}\\ d\end{bmatrix}, \tag{11}
\end{align}

where $\bar{p}:=\left[\begin{smallmatrix}p\\ \hat{w}\end{smallmatrix}\right]$ and $\bar{q}:=\left[\begin{smallmatrix}q\\ u^{oco}\end{smallmatrix}\right]$ are the input and output of $\Gamma$, respectively. The interconnection has finite induced $\ell_{p}$-norm, i.e. $\|F_{U}(P,\Gamma)\|<\infty$, if $\|P_{11}\|\,\|\Gamma\|<1$.

Proof.

The system $P$ is LTI so by the principle of superposition (assuming zero initial conditions):

\begin{align}
\bar{p}=P_{11}\bar{q}+P_{12}d. \tag{12}
\end{align}

We can bound $\bar{p}$ using the triangle inequality and the definition of the induced norm:

\begin{align}
\|\bar{p}\|\leq\|P_{11}\|\,\|\bar{q}\|+\|P_{12}\|\,\|d\|. \tag{13}
\end{align}

Next, $\bar{q}=\Gamma\bar{p}$ so that $\|\bar{q}\|\leq\|\Gamma\|\,\|\bar{p}\|$. Substitute this bound into (13) and re-arrange to obtain:

\begin{align}
\|\bar{p}\|\leq\frac{\|P_{12}\|}{1-\|P_{11}\|\,\|\Gamma\|}\,\|d\|. \tag{14}
\end{align}

This last step requires the small gain condition $\|P_{11}\|\,\|\Gamma\|<1$ to obtain the bound on $\|\bar{p}\|$.

Finally, the state is $x=P_{21}\bar{q}+P_{22}d$. We can use similar steps and the bound on $\bar{p}$ to obtain:

\begin{align}
\|x\|\leq\left[\|P_{22}\|+\frac{\|P_{21}\|\,\|P_{12}\|\,\|\Gamma\|}{1-\|P_{11}\|\,\|\Gamma\|}\right]\|d\|. \tag{15}
\end{align}

Hence, $F_{U}(P,\Gamma)$ has finite induced $\ell_{p}$-norm. ∎
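As a quick numerical illustration of the bound (15), the following minimal sketch evaluates the right-hand side for given norm values. The function name `small_gain_bound` and the numbers are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def small_gain_bound(p11, p12, p21, p22, gamma):
    """Evaluate the gain bound from d to x on the right-hand side of (15).

    All arguments are induced-norm values (nonnegative floats):
    p11 = ||P11||, p12 = ||P12||, p21 = ||P21||, p22 = ||P22||, gamma = ||Gamma||.
    """
    loop_gain = p11 * gamma
    if loop_gain >= 1.0:
        # The small gain condition ||P11|| ||Gamma|| < 1 fails, so (15) gives no bound.
        raise ValueError("small gain condition violated")
    return p22 + p21 * p12 * gamma / (1.0 - loop_gain)


# Hypothetical norm values, for illustration only.
bound = small_gain_bound(p11=0.5, p12=2.0, p21=1.0, p22=3.0, gamma=1.0)
print(bound)  # 7.0
```

The bound grows without limit as $\|P_{11}\|\,\|\Gamma\|$ approaches 1, which is why the condition is needed for a finite gain.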

The small gain condition in the previous theorem can be conservative as it does not exploit the block structure $\Gamma=\left[\begin{smallmatrix}\Delta&0\\ 0&M_{LTV}\end{smallmatrix}\right]$. We can reduce the conservatism by normalizing the blocks and introducing scalings. Specifically, assume $\|\Delta\|\leq\delta$ and $\|M_{LTV}\|\leq\beta$. Define the normalized uncertainty and learning dynamics as $\tilde{\Delta}=\frac{1}{\delta}\Delta$ and $\tilde{M}_{LTV}=\frac{1}{\beta}M_{LTV}$. Stacking these together yields

\begin{align}
\tilde{\Gamma}:=\begin{bmatrix}\frac{1}{\delta}&0\\ 0&\frac{1}{\beta}\end{bmatrix}\Gamma=\begin{bmatrix}\frac{1}{\delta}\Delta&0\\ 0&\frac{1}{\beta}M_{LTV}\end{bmatrix}. \tag{16}
\end{align}

The scaling normalizes each block so that $\|\tilde{\Gamma}\|\leq 1$.

Next, the uncertainty is LTI and hence $d_{1}\Delta=\Delta d_{1}$ for any scalar $d_{1}>0$. (In fact, this relation holds even if $d_{1}$ is also an LTI system but we will not pursue this generalization.) Similarly, the learning dynamics are also linear and hence $d_{2}M_{LTV}=M_{LTV}d_{2}$ for any scalar $d_{2}>0$. It follows that the normalized systems can be equivalently written, for any $d_{1},d_{2}>0$, as:

\begin{align}
\tilde{\Gamma}:=\begin{bmatrix}\frac{1}{d_{1}\delta}&0\\ 0&\frac{1}{d_{2}\beta}\end{bmatrix}\Gamma\begin{bmatrix}d_{1}&0\\ 0&d_{2}\end{bmatrix}. \tag{17}
\end{align}

This discussion leads to the following scaled small gain result.

Theorem 2.

Consider the interconnection $F_{U}(P,\Gamma)$ where $P:\ell_{pe}\to\ell_{pe}$ and $\Gamma:\ell_{pe}\to\ell_{pe}$ are linear systems with finite induced $\ell_{p}$-norm. Assume $\Gamma:=\left[\begin{smallmatrix}\Delta&0\\ 0&M_{LTV}\end{smallmatrix}\right]$ where $\|\Delta\|\leq\delta$ and $\|M_{LTV}\|\leq\beta$. Partition $P$ as:

\begin{align}
\begin{bmatrix}\bar{p}\\ x\end{bmatrix}=\begin{bmatrix}P_{11}&P_{12}\\ P_{21}&P_{22}\end{bmatrix}\,\begin{bmatrix}\bar{q}\\ d\end{bmatrix}, \tag{18}
\end{align}

where $\bar{p}:=\left[\begin{smallmatrix}p\\ \hat{w}\end{smallmatrix}\right]$ and $\bar{q}:=\left[\begin{smallmatrix}q\\ u^{oco}\end{smallmatrix}\right]$ are the input and output of $\Gamma$, respectively. The interconnection has finite induced $\ell_{p}$-norm, i.e. $\|F_{U}(P,\Gamma)\|<\infty$, if there exist scalars $d_{1},d_{2}>0$ such that

\begin{align}
\tilde{P}_{11}:=\begin{bmatrix}\frac{1}{d_{1}}\,I&0\\ 0&\frac{1}{d_{2}}\,I\end{bmatrix}P_{11}\begin{bmatrix}d_{1}\delta\,I&0\\ 0&d_{2}\beta\,I\end{bmatrix} \tag{19}
\end{align}

satisfies $\|\tilde{P}_{11}\|<1$.

Proof.

Define a scaled version of the nominal dynamics $P$ as:

\begin{align*}
\tilde{P}=\left[\begin{array}{cc|c}\frac{1}{d_{1}}I&0&0\\ 0&\frac{1}{d_{2}}I&0\\ \hline 0&0&I\end{array}\right]\begin{bmatrix}P_{11}&P_{12}\\ P_{21}&P_{22}\end{bmatrix}\left[\begin{array}{cc|c}d_{1}\delta\,I&0&0\\ 0&d_{2}\beta\,I&0\\ \hline 0&0&I\end{array}\right].
\end{align*}

The constants introduced in the scaled plant $\tilde{P}$ cancel those introduced for $\tilde{\Gamma}$ in (16) and (17). In other words, $F_{U}(P,\Gamma)$ and $F_{U}(\tilde{P},\tilde{\Gamma})$ define the same dynamics from $d$ to $x$. Moreover, $\|\tilde{P}_{11}\|<1$ by assumption and $\|\tilde{\Gamma}\|\leq 1$ by the normalization. It follows from the small gain theorem (Theorem 1) that $F_{U}(\tilde{P},\tilde{\Gamma})=F_{U}(P,\Gamma)$ has finite induced $\ell_{p}$-norm. ∎

The scalings $d_{1}$ and $d_{2}$ in the robust stability condition (Theorem 2) can be used to reduce the conservatism of the small gain condition (Theorem 1). They are known as $D$-scales in the robust control literature ([22] and Chapter 11 in [11]) and are used in structured singular value robust stability tests.
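To illustrate how the scalings can reduce conservatism, the sketch below evaluates the scaled norm in (19) for a constant matrix standing in for $P_{11}$ and searches over the ratio $d_{2}/d_{1}$ (only this ratio affects the norm). This is a simplification made for illustration: for a dynamic LTI system the induced $\ell_{\infty}$-norm is computed from the impulse response rather than from a single matrix. The matrix entries, the square block partition, and the helper name `scaled_norm` are all hypothetical.

```python
import numpy as np


def scaled_norm(P11, n1, d1, d2, delta, beta):
    """Induced infinity-norm of the scaled matrix in (19) for a constant, square P11.

    n1 is the size of the first (uncertainty) channel; the remaining
    rows/columns form the learning channel (assumed partition).
    """
    n = P11.shape[0]
    left = np.diag([1.0 / d1] * n1 + [1.0 / d2] * (n - n1))
    right = np.diag([d1 * delta] * n1 + [d2 * beta] * (n - n1))
    # For a matrix, ord=np.inf returns the maximum absolute row sum.
    return np.linalg.norm(left @ P11 @ right, np.inf)


# Hypothetical static stand-in for P11 with strongly unbalanced off-diagonal terms.
P11 = np.array([[0.5, 10.0],
                [0.01, 0.5]])

unscaled = scaled_norm(P11, 1, 1.0, 1.0, delta=1.0, beta=1.0)

# Fix d1 = 1 and grid-search the ratio d2/d1 on a logarithmic grid.
ratios = np.logspace(-3, 3, 601)
norms = [scaled_norm(P11, 1, 1.0, r, delta=1.0, beta=1.0) for r in ratios]
best = int(np.argmin(norms))

print(unscaled)                    # 10.5, so the unscaled test fails
print(ratios[best], norms[best])   # scaled norm is below 1
```

In this example the unscaled condition of Theorem 1 fails, while a suitable choice of $d_{1},d_{2}$ brings the scaled norm below one so that Theorem 2 applies.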

3.3 Bounding the LTV Dynamics

In this section, we provide a result specific to the induced $\ell_{\infty}$-norm for the OCO control implementation. The induced $\ell_{\infty}$-norm is useful as it allows us to relate $\|M_{LTV}\|_{\infty\to\infty}$ to $\|M_{t}\|_{\infty\to\infty}$. The robust stability constraint can then be imposed as a point-wise in time bound $\|M_{t}\|_{\infty\to\infty}\leq\beta$ on the coefficients in the projection step of OPGD. We discuss this further in Sections 4.2 and 4.3.
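The induced $\infty$-norm of a matrix is its largest absolute row sum, so the set $\{M:\|M\|_{\infty\to\infty}\leq\beta\}$ is a product of $\ell_{1}$-balls, one per row. The following is a minimal sketch of one possible projection onto this set, assuming a Euclidean (Frobenius) projection computed row by row with the standard sort-based $\ell_{1}$-ball projection. The function names are hypothetical, and the projection used in Sections 4.2 and 4.3 may differ.

```python
import numpy as np


def project_l1_ball(v, radius):
    """Euclidean projection of a vector v onto the l1-ball of the given radius > 0."""
    if np.abs(v).sum() <= radius:
        return v.copy()                      # already feasible
    u = np.sort(np.abs(v))[::-1]             # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    # Soft-threshold the magnitudes and restore the signs.
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def project_rows(M, beta):
    """Project each row of M onto the l1-ball of radius beta.

    The result satisfies ||M||_{inf->inf} <= beta.
    """
    return np.vstack([project_l1_ball(row, beta) for row in M])


# Hypothetical coefficient matrix; the first row violates the bound beta = 2.
M = np.array([[3.0, 1.0],
              [0.5, -0.5]])
M_proj = project_rows(M, beta=2.0)
print(M_proj)                                # [[2, 0], [0.5, -0.5]]
print(np.linalg.norm(M_proj, np.inf))        # 2.0
```

Rows that already satisfy the bound are left unchanged, so the projection only modifies the coefficients when the constraint is active.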

The dynamics $M_{LTV}$ in (8) can be expressed as:

\begin{align}
u_{t}^{oco}=M_{t}\hat{W}_{t}, \tag{20}
\end{align}

where

\begin{align}
M_{t}&:=\begin{bmatrix}M_{t}^{[0]}&\cdots&M_{t}^{[H-1]}\end{bmatrix}\in\mathbb{R}^{n_{u}\times n_{x}H},\mbox{ and } \tag{21}\\
\hat{W}_{t}&:=\left[\begin{smallmatrix}\hat{w}_{t}\\ \vdots\\ \hat{w}_{t-H+1}\end{smallmatrix}\right]\in\mathbb{R}^{n_{x}H} \tag{22}
\end{align}

are the stacked FIR coefficients and estimated disturbance history. The following theorem relates the induced $\ell_{\infty}$-norm of the system $M_{LTV}$ to the matrix induced $\infty$-norm of $M_{t}$.

Theorem 3.

Let $M_{LTV}$ be the LTV system defined in (20) and $M_{t}$ be the stacked gains defined in (21). Then

\begin{align}
\|M_{LTV}\|_{\infty\to\infty}=\sup_{t}\|M_{t}\|_{\infty\to\infty}. \tag{23}
\end{align}
Proof.

The equality in (23) is shown in two steps: (A) $\|M_{LTV}\|_{\infty\to\infty}\leq\sup_{t}\|M_{t}\|_{\infty\to\infty}$ and (B) $\|M_{LTV}\|_{\infty\to\infty}\geq\sup_{t}\|M_{t}\|_{\infty\to\infty}$.

First, we show direction (A). Let $\hat{w}$ and $u^{oco}$ be any input-output pair of $M_{LTV}$. Equation (20) and the definition of the induced matrix norm imply that

\begin{align}
\|u^{oco}\|_{\infty}&=\sup_{t}\|M_{t}\hat{W}_{t}\|_{\infty} \nonumber\\
&\leq\sup_{t}\|M_{t}\|_{\infty\to\infty}\cdot\|\hat{w}\|_{\infty}. \tag{24}
\end{align}

The inequality uses $\|\hat{W}_{t}\|_{\infty}\leq\|\hat{w}\|_{\infty}$, which holds because every entry of $\hat{W}_{t}$ is an entry of the signal $\hat{w}$. Thus, $\frac{\|u^{oco}\|_{\infty}}{\|\hat{w}\|_{\infty}}\leq\sup_{t}\,\|M_{t}\|_{\infty\to\infty}$ so that $\|M_{LTV}\|_{\infty\to\infty}\leq\sup_{t}\|M_{t}\|_{\infty\to\infty}$. Hence, claim (A) holds.

Next, we show direction (B). Suppose the supremum $\sup_{t}\|M_{t}\|_{\infty\to\infty}$ is attained at some finite time $t_{0}$. (The proof can be modified if the supremum occurs as $t\to\infty$.) Then there exists a vector $\hat{W}_{0}$ such that

\begin{align*}
\|\hat{W}_{0}\|_{\infty}=1\mbox{ and }\|M_{t_{0}}\,\hat{W}_{0}\|_{\infty}=\sup_{t}\|M_{t}\|_{\infty\to\infty}.
\end{align*}

We can use the vector $\hat{W}_{0}$ to construct a signal $\hat{w}_{0}$, for example by placing the blocks of $\hat{W}_{0}$ at times $t_{0},\ldots,t_{0}-H+1$ (taking $t_{0}\geq H-1$) and zeros elsewhere, such that $u^{oco}=M_{LTV}\,\hat{w}_{0}$ satisfies

\begin{align*}
\|u^{oco}\|_{\infty}\geq\left[\sup_{t}\|M_{t}\|_{\infty\to\infty}\right]\|\hat{w}_{0}\|_{\infty}.
\end{align*}

Hence, claim (B) holds. ∎
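The equality (23) can be checked numerically. The sketch below draws random stacked gains $M_{t}$, builds a worst-case input in the manner of direction (B), and compares the achieved gain with $\sup_{t}\|M_{t}\|_{\infty\to\infty}$. The dimensions, the random gains, and the variable names are hypothetical. As a further assumption, the input signal is stored with $H-1$ leading samples so that a full history is available at every time step, including $t<H-1$.

```python
import numpy as np

rng = np.random.default_rng(0)
nu, nx, H, T = 2, 3, 4, 20                 # hypothetical dimensions and horizon

# Time-varying stacked gains M_t in R^{nu x nx*H}, as in (21).
M = rng.standard_normal((T, nu, nx * H))
norms = np.array([np.linalg.norm(Mt, np.inf) for Mt in M])
t0 = int(np.argmax(norms))                 # time at which the supremum is attained

# Worst-case unit vector for M_{t0}: signs of the row with the largest l1-norm.
row = int(np.argmax(np.abs(M[t0]).sum(axis=1)))
W0 = np.sign(M[t0, row])

# Build w_hat so that the stacked history at time t0 equals W0.
# w_hat[pad + t] stores the sample at time t; block k of W0 goes to time t0 - k.
pad = H - 1
w_hat = np.zeros((T + pad, nx))
for k in range(H):
    w_hat[pad + t0 - k] = W0[k * nx:(k + 1) * nx]


def stacked_history(t):
    """Return W_hat_t = [w_hat_t; ...; w_hat_{t-H+1}] as in (22)."""
    return np.concatenate([w_hat[pad + t - k] for k in range(H)])


# Simulate (20) and compute the achieved peak-to-peak gain.
u = np.array([M[t] @ stacked_history(t) for t in range(T)])
gain = np.abs(u).max() / np.abs(w_hat).max()

print(gain, norms.max())                   # the two values agree up to rounding
```

The achieved gain cannot exceed $\sup_{t}\|M_{t}\|_{\infty\to\infty}$ by direction (A), and the constructed input attains it at time $t_{0}$.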

4 Application to OCO

In this section, we demonstrate how the main results can be applied to ensure robust stability of existing OCO controllers. We focus on the OCO controllers in [10, 7] where the coefficients of $M_{LTV}$ are updated via OPGD.

4.1 Estimator Design

The class of OCO controllers defined by [10] considers the feedback system with OCO control (Figure 2) and no uncertainty, i.e. $\Delta(z)=0$ in Figure 1. In this case, a perfect plant model is assumed: $\tilde{G}(z)=G(z)$. Thus, the nominal plant dynamics can be used to design an estimator $E(z)$ and the OPGD update for the coefficients in $M_{LTV}$. Later, we will show how the OPGD projection step can be modified to ensure robust stability when there is uncertainty, $\Delta(z)\neq 0$.

Without uncertainty, the plant dynamics with unknown disturbance reduce to:

$$x_{t+1} = A x_t + B u_t + B d_t.$$

Note that $B d_t$ is the effective disturbance on the state at time $t$. Assuming the state $x_t$ is measurable, we can perfectly reconstruct the effective disturbance at the previous time step by using the measured state and rearranging the plant dynamics:

$$\hat{w}_t = x_t - A x_{t-1} - B u_{t-1}. \tag{25}$$

With no uncertainty, this estimator perfectly reconstructs the effective disturbance with a one-step delay: $\hat{w}_t = B d_{t-1}$. However, perfect reconstruction is no longer guaranteed with uncertainty, i.e., if $\Delta(z) \neq 0$ then $\hat{w}_t \neq B d_{t-1}$. In this case, $\hat{w}_t$ is considered an estimate of $B d_{t-1}$.
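As a minimal sketch of the reconstruction (25), the following Python snippet simulates a scalar nominal plant and checks that $\hat{w}_t = B d_{t-1}$ when there is no uncertainty. The realization ($A = 0.9$, $B = 0.1$, matching the nominal plant of Section 5), the random input and disturbance sequences, and the function name `reconstruct_disturbance` are illustrative assumptions rather than part of the controller in [10].

```python
import numpy as np

def reconstruct_disturbance(A, B, x_prev, u_prev, x_curr):
    """Estimate (25): w_hat_t = x_t - A x_{t-1} - B u_{t-1}."""
    return x_curr - A @ x_prev - B @ u_prev

# Assumed realization of the nominal plant: x_{t+1} = 0.9 x_t + 0.1 (u_t + d_t).
A = np.array([[0.9]])
B = np.array([[0.1]])

rng = np.random.default_rng(0)
T = 50
d = rng.uniform(-1.0, 1.0, size=(T, 1))  # arbitrary disturbance (illustrative)
u = rng.uniform(-1.0, 1.0, size=(T, 1))  # arbitrary control input (illustrative)

x = np.zeros((T + 1, 1))
w_hat = np.zeros((T + 1, 1))
for t in range(T):
    # Perfect-model plant update (no uncertainty).
    x[t + 1] = A @ x[t] + B @ u[t] + B @ d[t]
    # Estimate available at time t+1 from measured states and the past input.
    w_hat[t + 1] = reconstruct_disturbance(A, B, x[t], u[t], x[t + 1])

# Largest mismatch between w_hat_{t+1} and B d_t over the run.
max_err = max(np.abs(w_hat[t + 1] - B @ d[t]).max() for t in range(T))
print("max reconstruction error:", max_err)
```

With a perfect model the mismatch is at the level of floating-point rounding. If the simulated plant differed from the model used inside `reconstruct_disturbance`, the same check would report a nonzero error.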

The disturbance reconstruction (25) can be expressed in state-space form as:

$$\begin{aligned} x^e_{t+1} &= 0\,x^e_t - A\,x_t - B u_t \\ \hat{w}_t &= x^e_t + x_t, \end{aligned}$$

where $x^e_t = -A x_{t-1} - B u_{t-1}$ is the estimator state. This has the form of the general LTI estimator $E(z)$ in (6). The estimates $\hat{w}_t$ of past disturbances are used to update the FIR coefficients $M_t$ defined in (21) by minimizing an “ideal” cost which we describe next.

4.2 OPGD on an Ideal Cost

The coefficients $M_t$ are updated at each time step via OPGD in the direction of an “ideal” (per-step) cost. This cost is associated with the nominal plant dynamics (3) and a per-step cost function. Here, we consider quadratic per-step costs:

$$c(x_t, d_t) = x_t^\top Q\,x_t + u_t^\top R\,u_t, \tag{26}$$

where $Q = Q^\top \succeq 0 \in \mathbb{R}^{n_x \times n_x}$ and $R = R^\top \succ 0 \in \mathbb{R}^{n_u \times n_u}$. Note that the finite-horizon cost is defined as:

$$J_T(x, d) = \sum_{t=0}^{T} c(x_t, d_t), \tag{27}$$

where $T$ is the total time horizon. The ideal cost $g(M)$ is defined for any static gain $M \in \mathbb{R}^{n_u \times n_x H}$ based on the per-step cost (26) and is computed as follows.

Let $\tilde{x}_\tau \in \mathbb{R}^{n_x}$ and $\tilde{u}_\tau \in \mathbb{R}^{n_u}$ denote the ideal state and control input at time $\tau$, respectively. The ideal state and input are initialized at $\tau = t - H$ by:

$$\tilde{x}_{t-H} = 0 \mbox{ and } \tilde{u}_{t-H} = \sum_{i=0}^{H-1} M^{[i-1]}\,w_{t-H-i}, \tag{28}$$

where $t$ is the current time. The ideal state and control input are then computed for $\tau = t-H+1, \ldots, t$ by iterating over the plant dynamics with the static gains $M$:

$$\begin{aligned} \tilde{x}_\tau &= A\,\tilde{x}_{\tau-1} + B\,\tilde{u}_{\tau-1} + \hat{w}_{\tau-1} && (29) \\ \tilde{u}_\tau &= -K\,\tilde{x}_\tau + \sum_{i=0}^{H-1} M^{[i]}\,\hat{w}_{\tau-i}. && (30) \end{aligned}$$

The ideal cost is then defined as $g(M) := c(\tilde{x}_t, \tilde{u}_t)$. In other words, the ideal cost $g(M)$ is the cost of the plant dynamics evolving with static gain $M$ over the learning horizon $H$, neglecting dynamics beyond time $t-H$. The coefficients $M_t$ are updated via OPGD on this ideal cost:

$$M_{t+1} = \Pi_{\mathcal{M}}\left(M_t - \eta \nabla_M g(M_t)\right), \tag{31}$$

where $\eta$ is the learning rate, and $\Pi_{\mathcal{M}}$ is the projection of the gradient step of $M_t$ onto a constraint set $\mathcal{M}$. Additional details are given in [10, 7]. Next, we show how the constraint set $\mathcal{M}$ can be modified to ensure the robust stability of the OCO feedback system (Figures 1 and 2) when $\Delta(z) \neq 0$.
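The ideal-cost rollout (28)-(30) and the update (31) can be sketched in Python as follows. This is only an illustration under stated assumptions: the gradient is approximated by central finite differences (the source does not specify how $\nabla_M g$ is computed), the FIR gains are stored as an array of shape `(H, nu, nx)`, the index convention $M^{[i]}$ of (30) is also used for the initialization, and the names `ideal_cost`, `numerical_gradient`, and `opgd_step` are hypothetical. The example values ($A = 0.9$, $B = 0.1$, $K = 0.15$, $Q = 1$, $R = 0.1$, $H = 1$, $\eta = 5 \times 10^{-4}$) are taken from Section 5, with a constant disturbance estimate $\hat{w} = 10$ chosen for illustration.

```python
import numpy as np

def ideal_cost(M, w_hat, A, B, K, Q, R):
    """Ideal cost g(M) for FIR gains M of shape (H, nu, nx).

    w_hat holds the 2H most recent disturbance estimates, oldest first,
    so w_hat[-1] is the estimate at time t and w_hat[H-1] the one at t-H.
    """
    H = M.shape[0]

    def fir(k):
        # sum_i M[i] @ w_hat_{tau-i}, with k the position of tau in w_hat
        return sum(M[i] @ w_hat[k - i] for i in range(H))

    x = np.zeros(A.shape[0])      # ideal state at tau = t-H is zero
    u = fir(H - 1)                # ideal input at tau = t-H
    for k in range(H, 2 * H):     # tau = t-H+1, ..., t
        x = A @ x + B @ u + w_hat[k - 1]   # state recursion (29)
        u = -K @ x + fir(k)                # input recursion (30)
    return float(x @ Q @ x + u @ R @ u)    # quadratic per-step cost (26)

def numerical_gradient(f, M, eps=1e-6):
    """Central finite-difference gradient of f at M (assumed stand-in)."""
    grad = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        step = np.zeros_like(M)
        step[idx] = eps
        grad[idx] = (f(M + step) - f(M - step)) / (2 * eps)
    return grad

def opgd_step(M, w_hat, A, B, K, Q, R, eta, project=lambda M: M):
    """One update (31); `project` defaults to the identity (no constraint)."""
    g = lambda Mv: ideal_cost(Mv, w_hat, A, B, K, Q, R)
    return project(M - eta * numerical_gradient(g, M))

# Scalar example with H = 1.
A = np.array([[0.9]]); B = np.array([[0.1]]); K = np.array([[0.15]])
Q = np.array([[1.0]]); R = np.array([[0.1]])
w_hat = np.full((2, 1), 10.0)   # constant disturbance estimate (illustrative)
M0 = np.zeros((1, 1, 1))
M1 = opgd_step(M0, w_hat, A, B, K, Q, R, eta=5e-4)
g0 = ideal_cost(M0, w_hat, A, B, K, Q, R)
g1 = ideal_cost(M1, w_hat, A, B, K, Q, R)
print(M1[0, 0, 0], g0, g1)
```

Because $g$ is quadratic in $M$ for the quadratic cost (26), an analytic gradient could replace the finite-difference routine; the finite-difference version is used here only to keep the sketch short.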

4.3 Robust OCO Control

Assuming the uncertainty $\Delta(z)$ is bounded by some $\delta$, i.e., $\|\Delta\|_{\infty\to\infty} \leq \delta$, we can use a bisection to find the required bound $\beta$ on the FIR filter, $\|M_{LTV}\|_{\infty\to\infty} \leq \beta$, such that the robust stability condition (Theorem 2) is satisfied. Larger values of $\beta$ risk stability, yet can improve disturbance rejection as they allow the OCO more freedom to adapt the gains in $M_{LTV}$. Thus, it is important to determine the largest possible value of $\beta$ such that the robust stability condition holds. We refer to this $\beta$ as the stability bound. Theorem 3 allows us to impose this constraint as a point-wise in time constraint on the FIR coefficients $M_t$.
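The bisection mentioned above can be sketched generically as follows. The sketch assumes a user-supplied test `is_stable(beta)` that returns `True` exactly when the robust stability condition holds for the bound $\beta$, and that this test is monotone in $\beta$. The test used in the example, `gamma * b * delta < 1` with made-up constants `gamma` and `delta`, is only a placeholder with a small-gain flavor; it is not the condition of Theorem 2, and the function name `largest_stable_beta` is hypothetical.

```python
def largest_stable_beta(is_stable, beta_hi, tol=1e-6):
    """Bisection for the largest beta in [0, beta_hi] with is_stable(beta) True.

    Assumes is_stable is monotone: True below some threshold, False above it.
    """
    lo, hi = 0.0, beta_hi
    if not is_stable(lo):
        raise ValueError("stability test fails even for beta = 0")
    if is_stable(hi):
        return hi  # the whole interval passes the test
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_stable(mid):
            lo = mid   # mid is admissible, search above it
        else:
            hi = mid   # mid violates the test, search below it
    return lo          # largest value known to pass, within tol

# Placeholder test (NOT Theorem 2): made-up constants for illustration only.
gamma, delta = 0.5, 1.2
beta_star = largest_stable_beta(lambda b: gamma * b * delta < 1.0, beta_hi=10.0)
print(beta_star)
```

Returning the lower end of the final bracket keeps the result on the admissible side of the threshold, which is the conservative choice when the bound is later used as a constraint.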

Once the constraint $\beta$ has been determined, we can impose the constraint by defining the constraint set $\mathcal{M}$ as:

$$\mathcal{M} := \Big\{ M \in \mathbb{R}^{n_u \times n_x H} : \|M\|_{\infty\to\infty} \leq \beta \Big\}. \tag{32}$$

Thus, the projection $\Pi_{\mathcal{M}}$ can be implemented by:

$$M_{t+1} = \begin{cases} M_{step}, & \|M_{step}\|_{\infty\to\infty} \leq \beta \\ \beta\left(\dfrac{M_{step}}{\|M_{step}\|_{\infty\to\infty}}\right), & \|M_{step}\|_{\infty\to\infty} > \beta, \end{cases} \tag{33}$$

where $M_{step} := M_t - \eta \nabla_M g(M_t)$ is the gradient step of the FIR coefficients $M_t$, defined at time $t$. The constraint set $\mathcal{M}$ defined in (32) and projection $\Pi_{\mathcal{M}}$ in (33) can be implemented as part of Algorithm 1 in [10]. The numerical results in the following section are based on this implementation.
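A minimal Python sketch of the rescaling rule (33) is given below. It assumes that $\|M\|_{\infty\to\infty}$ is evaluated as the induced matrix $\infty$-norm of the stacked gain $M \in \mathbb{R}^{n_u \times n_x H}$, i.e., the maximum absolute row sum; the function names and the example matrix are hypothetical.

```python
import numpy as np

def induced_inf_norm(M):
    """Induced infinity norm of a matrix: maximum absolute row sum."""
    return np.abs(M).sum(axis=1).max()

def project_onto_norm_ball(M_step, beta):
    """Rule (33): keep M_step if its norm is at most beta, else rescale it."""
    norm = induced_inf_norm(M_step)
    if norm <= beta:
        return M_step
    return beta * M_step / norm

# Illustrative gradient step with norm 3 and bound beta = 1.5.
M_step = np.array([[1.0, -2.0],
                   [0.5,  0.5]])
M_next = project_onto_norm_ball(M_step, beta=1.5)
print(M_next, induced_inf_norm(M_next))
```

When the gradient step leaves the set, the rescaled gain has norm exactly $\beta$ and keeps the direction of $M_{step}$. A function of this form could be passed as the projection in an OPGD update loop.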

5 Numerical Results

In this section, we provide numerical results of OPGD on a plant with uncertainty. Although we do not explicitly use the robust stability condition (Theorem 2) to compute the stability bound $\beta$, we perform numerical studies to illustrate its effect. Future studies will focus on computing the exact bound, while the results here suggest that a stability bound $\beta$ exists.

Here, unconstrained OCO (U-OCO) refers to OCO control with $\beta = \infty$, i.e., unconstrained FIR filter gains $M_t$. Constrained OCO (C-OCO) refers to OCO control with gains $M_t$ bounded by some $\beta < \infty$. We compare results between U-OCO and C-OCO on the following models:

$$\begin{aligned} G(z) &= \frac{0.1}{z - 0.9} \\ F(z) &= \frac{0.1185z + 0.1145}{z^2 - 1.672z + 0.9048}, \end{aligned}$$

where $G(z)$ and $F(z)$ are the nominal plant and unmodeled high frequency actuator dynamics, respectively. Note that $\Delta(z) = F(z) - 1$ and $\tilde{G}(z) = G(z)F(z)$. The following disturbance $d_t$ was generated to perturb the control input $u_t$:

$$d_t = \begin{cases} 100 & 0 \leq t \leq 500 \\ -100 & 500 < t \leq T, \end{cases}$$

where the time horizon is $T = 1000$. We use the quadratic per-step cost $c(x_t, d_t)$ and total cost $J_T(x, d)$ defined in (26) and (27), respectively, with $Q = 1$ and $R = 10^{-1}$. The state-feedback gain $K = 0.15$, learning horizon $H = 1$, and learning rate $\eta = 5 \times 10^{-4}$ are used for all simulations. Note that $K = 0.15$ is stabilizing for both the nominal and true plant dynamics.
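For a stable single-input single-output LTI system, the induced $\infty\to\infty$ norm equals the $\ell_1$ norm of its impulse response. The following Python sketch uses this fact to estimate $\|\Delta\|_{\infty\to\infty}$ for $\Delta(z) = F(z) - 1$ with the $F(z)$ given above. The truncation length `n = 2000` and the function name `impulse_response_F` are assumptions made for illustration; the value is an estimate from a truncated sum, not a result reported in this paper.

```python
import numpy as np

def impulse_response_F(n):
    """First n impulse-response samples of
    F(z) = (0.1185 z + 0.1145) / (z^2 - 1.672 z + 0.9048),
    i.e. y[k] = 1.672 y[k-1] - 0.9048 y[k-2] + 0.1185 u[k-1] + 0.1145 u[k-2].
    """
    f = np.zeros(n)
    u = np.zeros(n)
    u[0] = 1.0  # unit impulse at k = 0
    for k in range(n):
        y1 = f[k - 1] if k >= 1 else 0.0
        y2 = f[k - 2] if k >= 2 else 0.0
        u1 = u[k - 1] if k >= 1 else 0.0
        u2 = u[k - 2] if k >= 2 else 0.0
        f[k] = 1.672 * y1 - 0.9048 * y2 + 0.1185 * u1 + 0.1145 * u2
    return f

n = 2000                       # truncation length (assumed long enough)
f = impulse_response_F(n)

delta_imp = f.copy()
delta_imp[0] -= 1.0            # Delta(z) = F(z) - 1 subtracts a unit impulse
delta_est = np.abs(delta_imp).sum()   # truncated l1 norm of Delta

dc_gain_F = f.sum()            # truncated sum approximates F(1)
print("estimated ||Delta||:", delta_est, " F(1) approx:", dc_gain_F)
```

Since $F(z)$ is strictly proper, the first impulse-response sample of $\Delta$ is $-1$, so the estimate is at least 1 regardless of the truncation length.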

Figure 4 shows the per-step cost $c(x_t, d_t)$ and estimated disturbance $\hat{w}_t$ of U-OCO at each time $t$. We compare the performance with a perfect (red dashed) and imperfect (blue solid) plant model. Again, a perfect model is without uncertainty, $\Delta(z) = 0$, and an imperfect model is with uncertainty, $\Delta(z) \neq 0$. With a perfect plant model, the disturbance is perfectly reconstructed, $\hat{w}_t = B d_{t-1}$ (see Section 4.1). With an imperfect plant model, this is not the case, i.e., $\hat{w}_t \neq B d_{t-1}$. Since the ideal cost $g(M)$ computation assumes a perfect plant model and perfect disturbance estimates, this mismatch introduces an error in the coefficient update $M_{t+1}$. This causes an instability, which is reflected by the per-step cost and estimated disturbance growing unbounded. On the other hand, U-OCO performance is stable without uncertainty because the disturbance is estimated perfectly. Thus, the constraint $\beta$ is needed on the coefficient update to ensure stability for the imperfect plant model.

Figure 4: Per-step cost (top) and disturbance estimate (bottom) of running U-OCO on a perfect (red dashed) and imperfect (blue solid) plant model. U-OCO is stable with a perfect model and unstable for an imperfect model.

Figure 5 shows the per-step cost $c(x_t, d_t)$ and estimated disturbance $\hat{w}_t$ of C-OCO for $\beta = 1.5$ at each time $t$. Again, we compare the performance with a perfect (red dashed) and imperfect (blue solid) plant model. As mentioned before, an error in the disturbance estimate introduces an error in the ideal cost gradient. The ideal cost gradient error can cause the gradient step $M_{step} = M_t - \eta \nabla_M g(M_t)$ to grow too large in the wrong direction. When the constraint $\beta$ is chosen such that the robust stability condition (Theorem 2) is satisfied, the effect of the uncertainty-induced error on the gradient step of the coefficient update is limited. This is illustrated in Figure 5, where the performance of C-OCO with $\beta = 1.5$ on the imperfect plant model eventually recovers the performance on the perfect model. Thus, imposing the constraint $\beta$ can ensure that OCO is robust to uncertainty.

As mentioned in Section 4.3, the choice of $\beta$ is critical. Figure 6 shows the averaged per-step cost $J_T(x, d)/T$ for C-OCO as a function of $\beta$. Again, we compare the performance with a perfect (red dashed) and imperfect (blue solid) plant model. When $\beta = 0$, the OCO has no freedom to learn the disturbance, and pure state-feedback (SF) is recovered for both the perfect and imperfect plants (red and blue circles, respectively). As $\beta$ is increased, the OCO is allowed more freedom to learn the disturbance, and we see similar improved performance in both the perfect and imperfect plants. However, when $\beta$ is “too large” such that the robust stability condition (Theorem 2) no longer holds, C-OCO on the imperfect plant becomes unstable. Figure 6 suggests that the stability bound occurs around $\beta = 1.5$. Note that once the constraint $\beta$ becomes inactive, C-OCO recovers U-OCO performance for the perfect and imperfect plants (red and blue squares, respectively). For the perfect plant, this indicates a limit as to how much the OCO can improve on the baseline controller. For the imperfect plant, this indicates a limit as to how much the OCO performance can be degraded by uncertainty. Hence, there is a trade-off between OCO performance and robustness to uncertainty.

Figure 5: Per-step cost (top) and disturbance estimate (bottom) of running C-OCO at $\beta = 1.5$ on a perfect (red dashed) and imperfect (blue solid) plant model. C-OCO is stable for the perfect and imperfect models.

Figure 6: Averaged per-step cost of running C-OCO for varying $\beta$ on a perfect (red dashed) and imperfect (blue solid) plant model. C-OCO results in improved performance for the perfect and imperfect models until the constraint becomes too large and C-OCO on the imperfect model becomes unstable.

6 Conclusion

In this paper, we establish a robust stability condition using the small gain theorem for a class of OCO controllers with memory and use this result to develop an OCO control algorithm (C-OCO) robust to model uncertainty. In particular, we impose this constraint on the controller by bounding the LTV dynamics of the OCO controller point-wise in time. We provide numerical results to illustrate that imposing the robust stability constraint keeps the closed-loop system stable when it would go unstable otherwise. Future work will study the numerical implementation of the scaled small gain theorem to compute the stability bound $\beta$.

References

  • [1] O. Anava, E. Hazan, and S. Mannor, “Online convex optimization against adversaries with memory and application to statistical arbitrage,” 2014.
  • [2] E. Hazan, “The convex optimization approach to regret minimization,” in Optimization for Machine Learning, The MIT Press, Sept. 2011.
  • [3] M. Zinkevich, “Online convex programming and generalized infinitesimal gradient ascent,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 928–936, 2003.
  • [4] E. Hazan, A. Agarwal, and S. Kale, “Logarithmic regret algorithms for online convex optimization,” Machine Learning, vol. 69, no. 2-3, pp. 169–192, 2007.
  • [5] S. Shalev-Shwartz, “Online learning and online convex optimization,” Foundations and Trends in Machine Learning, vol. 4, no. 2, pp. 107–194, 2011.
  • [6] E. Hazan, “Introduction to online convex optimization,” Foundations and Trends® in Optimization, vol. 2, no. 3-4, pp. 157–325, 2016.
  • [7] N. Agarwal, E. Hazan, and K. Singh, “Logarithmic regret for online control,” in Advances in Neural Information Processing Systems, pp. 10175–10184, 2019.
  • [8] D. Foster and M. Simchowitz, “Logarithmic regret for adversarial online control,” in International Conference on Machine Learning, pp. 3211–3221, 2020.
  • [9] G. Goel, N. Agarwal, K. Singh, and E. Hazan, “Best of both worlds in online control: Competitive ratio and policy regret,” arXiv preprint arXiv:2211.11219, 2022.
  • [10] N. Agarwal, B. Bullins, E. Hazan, S. Kakade, and K. Singh, “Online control with adversarial disturbances,” in International Conference on Machine Learning, pp. 111–119, PMLR, 2019.
  • [11] K. Zhou, J. Doyle, and K. Glover, Robust and optimal control. Pearson, 1995.
  • [12] S. Skogestad and I. Postlethwaite, Multivariable Feedback Control: Analysis and Design. John Wiley and Sons, 2nd ed., 2005.
  • [13] Y. Rahman, A. Xie, J. B. Hoagg, and D. S. Bernstein, “A tutorial and overview of retrospective cost adaptive control,” in 2016 American Control Conference, pp. 3386–3409, 2016.
  • [14] R. Venugopal and D. S. Bernstein, “Adaptive disturbance rejection using ARMARKOV/Toeplitz models,” IEEE Transactions on Control Systems Technology, vol. 8, no. 2, pp. 257–269, 2000.
  • [15] M. A. Santillo and D. S. Bernstein, “Adaptive control based on retrospective cost optimization,” Journal of Guidance, Control, and Dynamics, vol. 33, no. 2, pp. 289–304, 2010.
  • [16] G. Goel and B. Hassibi, “Measurement-feedback control with optimal data-dependent regret,” arXiv preprint arXiv:2209.06425, 2022.
  • [17] G. Goel and B. Hassibi, “Regret-optimal estimation and control,” arXiv preprint arXiv:2106.12097, 2021.
  • [18] G. Goel and B. Hassibi, “Regret-optimal measurement-feedback control,” in Learning for Dynamics and Control, pp. 1270–1280, PMLR, 2021.
  • [19] G. Goel and B. Hassibi, “Regret-optimal control in dynamic environments,” arXiv preprint arXiv:2010.10473, 2020.
  • [20] J. Doyle, “Guaranteed margins for LQG regulators,” IEEE Transactions on Automatic Control, vol. 23, no. 4, pp. 756–757, 1978.
  • [21] H. K. Khalil, Nonlinear Systems. Upper Saddle River, NJ: Prentice-Hall, 3rd ed., 2002.
  • [22] A. Packard and J. Doyle, “The complex structured singular value,” Automatica, vol. 29, no. 1, pp. 71–109, 1993.