Robust Online Convex Optimization for Disturbance Rejection
Abstract
Online convex optimization (OCO) is a powerful tool for learning from sequential data, making it well suited to high precision control applications where the disturbances are arbitrary and unknown in advance. However, the ability of OCO-based controllers to accurately learn the disturbance while maintaining closed-loop stability relies on having an accurate model of the plant. This paper studies the performance of OCO-based controllers for linear time-invariant (LTI) systems subject to disturbance and model uncertainty. The model uncertainty can cause the closed-loop system to become unstable. We provide a sufficient condition for robust stability based on the small gain theorem. This condition is easily incorporated as an online constraint in the OCO controller. Finally, we verify via numerical simulations that imposing the robust stability condition on the OCO controller ensures closed-loop stability.
1 Introduction
This paper considers a class of controllers recently developed using online convex optimization (OCO). Online machine learning and convex optimization methods are powerful tools for learning sequential data. This makes these techniques ideal for high precision control applications like satellite pointing and photolithography. These systems have reliable physics-based models with small error (within the control bandwidth) but are subject to unknown arbitrary disturbances.
This has motivated a large body of recent work using online learning and convex optimization for control [1, 2, 3, 4, 5, 6, 7, 8, 9]. The most closely related work is the class of OCO controllers defined in [10]. There, OCO with memory is introduced to the discrete-time control setting as an ideal cost minimization problem (which we describe in detail in Section 4.2) to handle arbitrary disturbances and general time-varying convex cost functions. The OCO controller has promising regret guarantees and makes less restrictive assumptions about the disturbance characteristics (e.g., white noise or worst-case) than $H_2$ and $H_\infty$ optimal control techniques [11, 12]. This makes OCO methods well suited for high precision control applications with unknown, arbitrary disturbances that degrade the system performance.
The OCO framework in [10] aims to learn the disturbance characteristics in real time. However, small model errors can cause instability and thus must be explicitly considered in the design. There are additional works that attempt to learn the model from data [13, 14, 15, 16, 17, 18, 19]. However, dynamic uncertainties in many high precision applications arise due to high frequency, time-varying, and/or nonlinear effects. It is difficult to learn such unmodeled effects from real-time data. In these cases, it is useful to design a robust OCO-based controller that can learn the disturbance features and tolerate model uncertainty, thus motivating our work.
There are three main contributions of our work. First, we provide a robust stability condition for OCO control of a discrete-time linear time-invariant (LTI) plant (Theorem 2 in Section 3.2). The scaled small gain condition is written abstractly with an arbitrary choice of an induced system norm. Our second contribution is to present a constrained OCO (C-OCO) control algorithm which is robust to nonparametric model uncertainties (Section 4). This algorithm uses a specific implementation of the scaled small gain condition with the induced $\ell_\infty$-norm (Section 3.3). This particular choice for the induced norm enables easy implementation of the robust stability condition in the C-OCO algorithm. The third contribution is to present numerical results that illustrate the effect of this robust stability constraint on the OCO controller (Section 5).
2 Problem Formulation
This section formulates the OCO control problem for discrete-time LTI plants subject to both model uncertainty and unknown disturbances.
2.1 Notation
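Let $v \in \mathbb{R}^n$ be a vector. The $p$-norm of this vector is defined as $\|v\|_p := \left(\sum_{i=1}^{n} |v_i|^p\right)^{1/p}$. Next, $\mathbb{N}$ denotes the set of non-negative integers. Let $v$ denote a vector-valued sequence $\{v_t\}_{t \in \mathbb{N}}$. The $\ell_p$-norm of $v$ is defined as:
$$\|v\|_p := \left(\sum_{t=0}^{\infty} \|v_t\|_p^p\right)^{1/p} \tag{1}$$
for $1 \le p < \infty$, with $\|v\|_\infty := \sup_{t \in \mathbb{N}} \|v_t\|_\infty$. Note that $\|v_t\|_p$ is the $p$-norm of the vector at time $t$ while $\|v\|_p$ is the $\ell_p$-norm of the sequence. The set $\ell_p$ consists of sequences that have finite $\ell_p$-norm. The extended space $\ell_{pe}$ consists of sequences that have finite $\ell_p$-norm on all finite intervals, i.e., $\sum_{t=0}^{T} \|v_t\|_p^p < \infty$ for all $T \in \mathbb{N}$. Finally, let $G$ denote a system that maps an input signal $u \in \ell_{pe}$ to an output signal $y \in \ell_{pe}$. The induced $\ell_p$-norm for this system is defined as:
$$\|G\|_{p \to p} := \sup_{0 \ne u \in \ell_p} \frac{\|G u\|_p}{\|u\|_p} \tag{2}$$
To simplify notation, we will often use $\|v\|$ and $\|G\|$ for the signal norm and system induced norm when the specific $p$-norm is not important.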
2.2 Model Uncertainty
In this section, we consider the feedback system in Figure 1 and discuss the model uncertainty in more detail.
Consider the nominal discrete-time, LTI plant with dynamics:
(3) |
where and are the nominal plant state and input at time , respectively. We assume for simplicity.
Model uncertainty for systems with physics-based models often shows up as unmodeled actuator dynamics affecting the plant input [11, 20, 12]. We can account for these unmodeled dynamics by defining an input-multiplicative uncertainty set as:
(4) |
where . Note that the induced $\ell_2$-norm is a common choice to bound the uncertainty. However, our main result in Section 3 holds for any induced $\ell_p$-norm.
Let denote the true plant dynamics. We assume that the true plant is within the uncertainty set, i.e. . In other words, there exists a specific such that and . More generally, we refer to as the uncertain plant. An alternative viewpoint is that the uncertain plant is where represents unmodeled dynamics. Note that we assume the uncertainty is LTI. However, our main result in Section 3 can be extended to the case where is a possibly nonlinear time-varying (NLTV) system.
2.3 OCO Control
This section describes the OCO controller. We consider the feedback system in Figure 2 where the OCO controller is shown in more detail.
Unknown disturbances are often caused by environmental factors and moving physical components which degrade system performance. However, these disturbances often also have learnable characteristics. It is typical to model such disturbances as entering at the plant input as shown in Figure 2.
OCO control can be used to learn and reject the disturbance without a priori knowledge of the disturbance [1, 2, 3, 4, 5, 6, 7, 8, 9]. Here, we describe a class of OCO controllers closely related to [10] which considers the case when . The OCO controller has the block diagram representation shown in Figure 2. This corresponds to the class of disturbance action controllers defined as:
(5) |
where , , and are the state-feedback gain, learned coefficients, and disturbance estimate, at time , respectively. The state-feedback gain is user-selected while the learned coefficients are typically updated via some online optimization method. For example, [10] uses online projected gradient descent (OPGD) with memory (see Section 4.2).
The disturbance estimate is assumed to be the output of an LTI estimator with dynamics:
(6) |
where and are the estimator state and output at time , respectively. Typically, is an estimate of (possibly with delay), i.e., it is an estimate of the disturbance effect on the (nominal) state. The estimate is constructed from and . This estimator is motivated by the case when .
The first term in (5) is considered the baseline controller which we denote by:
(7) |
The main results in Section 3 can be generalized to the case when the baseline control is the output of an LTI controller with input . We assume the baseline controller is a static, state-feedback gain for simplicity.
The second term in (5) is the output of a finite impulse response (FIR) filter with time-varying coefficients. We denote the FIR filter with time-varying coefficients as a linear time-varying (LTV) system with input-output dynamics defined as:
(8) |
where and are the input and output at time , respectively. The FIR filter order is also referred to as the learning horizon since the coefficients are often updated via OCO using the past disturbance estimates. We provide an example of online optimization in Sections 4 and 5, but the main results in Section 3 assume only that the coefficients are time-varying.
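The controller structure described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `K`, `M`, `w_hat_hist`, and `dac_control` are hypothetical, the sign convention (negative state feedback) is chosen here for concreteness, and the update of the FIR coefficients is not shown.

```python
import numpy as np

def dac_control(K, M, x, w_hat_hist):
    """Baseline state feedback plus an FIR filter on past disturbance estimates.

    K          : (m, n) state-feedback gain (user-selected)
    M          : list of H arrays, each (m, n), the current FIR coefficients
    x          : (n,) current state
    w_hat_hist : list of H arrays, each (n,), most recent estimate first
    """
    u_baseline = -K @ x  # sign convention is an assumption of this sketch
    u_learned = sum(M_i @ w_i for M_i, w_i in zip(M, w_hat_hist))
    return u_baseline + u_learned

# Toy dimensions: n = 2 states, m = 1 input, learning horizon H = 3
K = np.array([[0.5, 0.1]])
M = [np.array([[0.2, 0.0]]), np.array([[0.0, -0.1]]), np.zeros((1, 2))]
x = np.array([1.0, -2.0])
w_hat_hist = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([3.0, 3.0])]
u = dac_control(K, M, x, w_hat_hist)
```

Setting all FIR coefficients to zero recovers the baseline state-feedback controller.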
2.4 Model Uncertainty Effects on OCO Control
The uncertainty and disturbance have different effects on closed-loop stability. Suppose the state-feedback gain is stabilizing, i.e., all eigenvalues of are strictly inside the unit disk. Given a perfect plant model, i.e., , OCO control can be designed to achieve disturbance rejection with provable guarantees [10]. In this case, a bounded disturbance cannot cause signals , etc. to grow unbounded. However, small amounts of model uncertainty can cause the system to become unstable.
As shown in Figures 1 and 2, the (true) plant input is the control input perturbed by an unknown disturbance:
(9) |
where are the control input, disturbance, and perturbed (true) plant input at time , respectively. The perturbed input is further distorted by the uncertainty . The resulting input to the nominal plant is:
(10) |
where . Again, is the nominal plant input at time . Not only is there an unknown disturbance , but also a distorted signal due to uncertainty .
The additional perturbation can lead to unexpected behaviors that affect the disturbance estimate and FIR filter coefficient update when left unaccounted for in the OCO design. This can occur even when the state-feedback gain is stabilizing for the true plant . Thus, the OCO controller is required to: i) learn and compensate for the disturbance, and ii) stabilize the system in the presence of uncertainty. The OCO controller must achieve these objectives without a priori knowledge of the disturbance or uncertainty.
3 Main Result
This section provides a condition on that ensures the feedback system with OCO control remains stable even in the presence of the model uncertainty.
3.1 Linear Fractional Transformation
As a first step, we transform the feedback system of the OCO controller and uncertain plant (Figures 1 and 2) to a standard form as shown in Figure 3. This diagram separates the LTI dynamics from the uncertainty and time-varying OCO dynamics . Here includes the dynamics due to the plant, estimator, and state-feedback gain. This diagram is called a linear fractional transformation (LFT) in the robust control literature [11, 12]. We use the notation for this interconnection with closed around the upper channels of .
An explicit state-space model for can be determined from the various components of the feedback system described in Section 2. The dynamics of are given by:
We use the LFT representation to formulate and state our robust stability theorem in the next subsection.
3.2 Scaled Small Gain Theorem
Our first stability result is a variation of the standard small gain theorem (see Section 5.4 of [21]). This provides a sufficient condition for the dynamics to have a bounded gain from disturbance to state . Note stability here is in the sense of bounded gain in some induced norm.
Theorem 1.
Consider the interconnection where and are linear systems with finite induced -norm. Partition as:
(11) |
where and are the inputs and outputs of . The interconnection has finite induced -norm, i.e. , if .
Proof.
The system is LTI so by the principle of superposition (assuming zero initial conditions):
(12) |
We can bound using the triangle inequality and the definition of the induced norm:
(13) |
Next, so that . Substitute this bound into (13) and re-arrange to obtain:
(14) |
This last step requires the small gain condition to obtain the bound on .
Finally, the state is . We can use similar steps and the bound on to obtain:
(15) |
Hence, has finite induced -norm. ∎
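For reference, the chain of inequalities used in this proof can be written out as follows. This is a sketch under assumed notation rather than a reproduction of the original displays: $N_{ij}$ denotes the blocks of the LTI system, $\Delta$ the system closed around the upper channels with input $v$ and output $w$, $d$ the disturbance, and $x$ the state. The small gain condition is taken to be $\|N_{11}\| \, \|\Delta\| < 1$, and the signals are assumed to have finite norm (otherwise the same steps apply to truncated signals).

```latex
\begin{align*}
v &= N_{11} w + N_{12} d, \qquad x = N_{21} w + N_{22} d, \qquad w = \Delta v, \\
\|v\| &\le \|N_{11}\| \, \|w\| + \|N_{12}\| \, \|d\|
       \le \|N_{11}\| \, \|\Delta\| \, \|v\| + \|N_{12}\| \, \|d\|, \\
\|v\| &\le \frac{\|N_{12}\|}{1 - \|N_{11}\| \, \|\Delta\|} \, \|d\|
       \quad \text{if } \|N_{11}\| \, \|\Delta\| < 1, \\
\|x\| &\le \|N_{21}\| \, \|\Delta\| \, \|v\| + \|N_{22}\| \, \|d\|
       \le \left( \frac{\|N_{21}\| \, \|\Delta\| \, \|N_{12}\|}{1 - \|N_{11}\| \, \|\Delta\|}
       + \|N_{22}\| \right) \|d\|.
\end{align*}
```

The division in the third line is the only step that needs the small gain condition; without it the bound on $\|v\|$ is vacuous.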
The small gain condition in the previous theorem can be conservative as it does not exploit the block structure . We can reduce the conservatism by normalizing the blocks and introducing scalings. Specifically, assume and . Define the normalized uncertainty and learning dynamics as: and . Stacking these together yields
(16) |
The scaling normalizes each block so that .
Next, the uncertainty is LTI and hence for any scalar . (In fact, this relation holds even if is also an LTI system but we will not pursue this generalization.) Similarly, the learning dynamics are also linear and hence for any scalar . It follows that the normalized systems can be equivalently written, for any , as:
(17) |
This discussion leads to the following scaled small gain result.
Theorem 2.
Consider the interconnection where and are linear systems with finite induced -norm. Assume where and . Partition as:
(18) |
where and are the inputs and outputs of . The interconnection has finite induced -norm, i.e. , if there exist scalars such that
(19) |
satisfies .
Proof.
Define a scaled version of the nominal dynamics as:
The constants introduced in the scaled plant cancel those introduced for in (16). In other words, and define the same dynamics from to . Moreover, and by assumption. It follows from the small gain theorem (Theorem 1) that has finite induced -norm. ∎
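The benefit of the scalings can be illustrated with a small static example. The sketch below is not the dynamic condition of Theorem 2: it uses a hypothetical constant $2 \times 2$ gain `N11` in place of the upper-left block, a two-block diagonal structure, and the matrix induced $\infty$-norm. A scaling $D = \mathrm{diag}(1, d)$ commutes with any diagonal two-block perturbation, so $D N_{11} D^{-1}$ describes the same loop, yet its norm can be much smaller than that of $N_{11}$.

```python
import numpy as np

def inf_norm(A):
    """Induced matrix infinity-norm (maximum absolute row sum)."""
    return np.linalg.norm(A, np.inf)

# Hypothetical static gain seen by a two-block diagonal structure.
N11 = np.array([[0.5, 10.0],
                [0.01, 0.5]])

def scaled(N, d):
    """Similarity scaling D N D^{-1} with D = diag(1, d), d > 0."""
    D = np.diag([1.0, d])
    return D @ N @ np.linalg.inv(D)

unscaled_gain = inf_norm(N11)                      # 10.5: plain test fails
d_grid = np.logspace(-3, 3, 601)                   # candidate scalings
gains = np.array([inf_norm(scaled(N11, d)) for d in d_grid])
best_d = d_grid[np.argmin(gains)]
best_gain = gains.min()                            # below 1: scaled test passes
```

Only the ratio of the two scalings matters in this example, which is why a one-dimensional search over `d` suffices.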
3.3 Bounding the LTV Dynamics
In this section, we provide a result specific to the induced $\ell_\infty$-norm for the OCO control implementation. The induced $\ell_\infty$-norm is useful as it allows us to relate the norm of the LTV system to the norm of its coefficients at each time step. The robust stability constraint can then be imposed as a point-wise in time constraint on the coefficients in the projection step of OPGD. We discuss this further in Sections 4.2 and 4.3.
The dynamics in (8) can be expressed as:
(20) |
where
(21) | ||||
(22) |
are the stacked FIR coefficients and estimated disturbance history. The following theorem relates the induced $\ell_\infty$-norm of the LTV system to the matrix induced $\infty$-norm of the stacked coefficients.
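Theorem 3.
Let $M$ denote the LTV system in (20) and let $M_t$ denote its stacked FIR coefficients at time $t$ as in (21). Then
$$\|M\|_{\infty \to \infty} = \sup_{t \in \mathbb{N}} \|M_t\|_{\infty \to \infty} \tag{23}$$
Proof.
The equality in (23) is shown in two steps: (A) $\|M\|_{\infty \to \infty} \le \sup_{t} \|M_t\|_{\infty \to \infty}$ and (B) $\|M\|_{\infty \to \infty} \ge \sup_{t} \|M_t\|_{\infty \to \infty}$.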
First, we show direction (A). Let and be any input-output pair of . Equation (20) and the definition of the induced matrix norm imply that
(24) |
Thus, so that . Hence, claim (A) holds.
Next, we show direction (B). Suppose achieves its maximum at some finite time . (The proof can be modified if the supremum occurs as .) Then there exists a vector such that
We can use the vector to construct a signal such that satisfies
Hence, claim (B) holds. ∎
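The construction in direction (B) can be checked numerically at a single time step. The sketch below assumes the vector $\infty$-norm and a stacked coefficient matrix of size $m \times Hn$; the dimensions and the random coefficients are hypothetical. The worst-case stacked input takes the signs of the row with the largest absolute row sum, which attains the matrix induced $\infty$-norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, H = 3, 2, 4  # assumed state dimension, input dimension, FIR order

# Stacked FIR coefficients at one time step, shape (m, H * n)
M_t = rng.standard_normal((m, H * n))

# The induced matrix infinity-norm equals the maximum absolute row sum.
row_sums = np.abs(M_t).sum(axis=1)
norm_M_t = row_sums.max()

# Worst-case stacked input: signs of the row achieving the maximum.
i_star = int(np.argmax(row_sums))
W_star = np.sign(M_t[i_star])          # entries are +1 or -1

v = M_t @ W_star
gain = np.abs(v).max() / np.abs(W_star).max()
```

Because the $\infty$-norm of the stacked history is the maximum of the per-sample $\infty$-norms, the same vector can be spread over consecutive time steps to build the signal used in the proof.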
4 Application to OCO
In this section, we demonstrate how the main results can be applied to ensure robust stability of existing OCO controllers. We focus on the OCO controllers in [10, 7] where the coefficients of are updated via OPGD.
4.1 Estimator Design
The class of OCO controllers defined by [10] considers the feedback system with OCO control (Figure 2) and no uncertainty (Figure 1) when . In this case, a perfect plant model is assumed . Thus, the nominal plant dynamics can be used to design an estimator and OPGD to update the coefficients in . Later, we will show how the OPGD projection step can be modified to ensure robust stability for the case that there is uncertainty .
Without uncertainty, the plant dynamics with unknown disturbance reduce to:
Note that is the effective disturbance on the state at time . Assuming the state is measurable, we can perfectly reconstruct this effective disturbance at the previous time step. Using the measured state and rearranging the plant dynamics gives:
(25) |
With no uncertainty, this estimator perfectly reconstructs the effective disturbance with a one-step delay: . However, perfect reconstruction is no longer guaranteed with uncertainty, i.e. if then . In this case, is considered an estimate of .
The disturbance reconstruction (25) can be expressed in state-space form as:
where is the estimator state. This has the form of the general LTI estimator in (6). The estimates of past disturbances are used to update the FIR coefficients defined in (21) by minimizing an “ideal” cost which we describe next.
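The one-step-delayed reconstruction can be verified on a toy example. The sketch below assumes the nominal dynamics $x_{t+1} = A x_t + B(u_t + d_t)$ with the disturbance entering at the plant input; the matrices `A` and `B`, the sinusoidal disturbance, and the random control inputs are hypothetical choices. With a perfect model, the estimate $\hat{w}_t = x_t - A x_{t-1} - B u_{t-1}$ equals the effective disturbance $B d_{t-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
T = 50

x = np.zeros((T + 1, 2))
u = rng.standard_normal((T, 1))          # arbitrary control inputs
d = np.sin(0.3 * np.arange(T))[:, None]  # disturbance at the plant input

# Nominal plant with input disturbance: x_{t+1} = A x_t + B (u_t + d_t)
for t in range(T):
    x[t + 1] = A @ x[t] + B @ (u[t] + d[t])

# One-step-delayed reconstruction: w_hat_t = x_t - A x_{t-1} - B u_{t-1}
w_hat = np.zeros((T + 1, 2))
for t in range(1, T + 1):
    w_hat[t] = x[t] - A @ x[t - 1] - B @ u[t - 1]

# With a perfect model the estimate equals B d_{t-1}
effective_d = (B @ d.T).T                # row t holds B d_t
max_error = np.abs(w_hat[1:] - effective_d).max()
```

If the simulated plant input were additionally distorted by unmodeled dynamics, the same formula would return the effective disturbance plus the effect of that distortion, which is the mismatch discussed above.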
4.2 OPGD on an Ideal Cost
The coefficients are updated at each time step via OPGD in the direction of an “ideal” (per-step) cost. This cost is associated with the nominal plant dynamics (3) and a per-step cost function. Here, we consider quadratic per-step costs:
(26) |
where and . Note that the finite-horizon cost is defined as:
(27) |
where is the total time horizon. The ideal cost is defined for any static gain based on the per-step cost (26) and is computed as follows.
Let and denote the ideal state and control input at time , respectively. The ideal state and input are initialized at by:
(28) |
where is the current time. The ideal state and control input are then computed for by iterating over the plant dynamics with the static gains :
(29) | ||||
(30) |
The ideal cost is then defined as . In other words, the ideal cost is the cost of the plant dynamics evolving with static gain over the learning horizon , neglecting dynamics beyond time . The coefficients are updated via OPGD on this ideal cost:
(31) |
where is the learning rate, and is the projection of the gradient step of onto a constraint set . Additional details are given in [10, 7]. Next, we show how the constraint set can be modified to ensure the robust stability of the OCO feedback system (Figures 1 and 2) when .
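A minimal sketch of one gradient step on an ideal cost is given below. Several details are assumptions made for illustration and may differ from [10, 7]: the indexing of the FIR sum, the negative state-feedback sign, the use of the disturbance estimates as the effective disturbance in the ideal dynamics, and a central finite-difference gradient in place of an analytic one. The projection is omitted, and all numerical values are hypothetical.

```python
import numpy as np

def ideal_cost(M, w_hat, A, B, K, Q, R):
    """Quadratic cost reached after H steps with FIR coefficients M held fixed.

    M     : (H, m, n) FIR coefficients
    w_hat : (2H, n) disturbance estimates, oldest first
    """
    H = M.shape[0]

    def control(y, s):
        # State feedback plus FIR filter on the H most recent estimates.
        return -K @ y + sum(M[i] @ w_hat[s - i] for i in range(H))

    y = np.zeros(A.shape[0])               # ideal state starts at zero
    for s in range(H - 1, 2 * H - 1):      # H steps of the nominal dynamics
        y = A @ y + B @ control(y, s) + w_hat[s + 1]
    v = control(y, 2 * H - 1)              # ideal input at the final step
    return float(y @ Q @ y + v @ R @ v)

def numerical_gradient(f, M, eps=1e-6):
    """Central finite-difference gradient of f with respect to M."""
    grad = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        step = np.zeros_like(M)
        step[idx] = eps
        grad[idx] = (f(M + step) - f(M - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(2)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
K = np.array([[0.1, 0.3]])
Q, R = np.eye(2), np.eye(1)
H, eta = 3, 1e-4

w_hat = rng.standard_normal((2 * H, 2))
M = np.zeros((H, 1, 2))

f = lambda M_: ideal_cost(M_, w_hat, A, B, K, Q, R)
cost_before = f(M)
M_next = M - eta * numerical_gradient(f, M)   # projection step omitted here
cost_after = f(M_next)
```

The ideal state and input are affine in the coefficients, so this cost is a convex quadratic in `M` and a sufficiently small gradient step cannot increase it.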
4.3 Robust OCO Control
Assuming the induced norm of the uncertainty is bounded by a known constant, we can use bisection to find the required bound on the FIR filter such that the robust stability condition (Theorem 2) is satisfied. Larger values of this bound put stability at risk, yet they can improve disturbance rejection because they allow the OCO controller more freedom to adapt the FIR filter gains. Thus, it is important to determine the largest possible value of the bound such that the robust stability condition holds. We refer to this as the stability bound. Theorem 3 allows us to impose this constraint as a point-wise in time constraint on the FIR coefficients.
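The bisection can be sketched generically. In the snippet below, `is_feasible` is a hypothetical stand-in for a test of the robust stability condition at a given bound on the FIR filter, here called `beta`; it is replaced by a toy inequality, and feasibility is assumed to be monotone (feasible below some threshold, infeasible above it).

```python
def largest_feasible_bound(is_feasible, lo=0.0, hi=1.0, tol=1e-6, max_doublings=60):
    """Bisection for the largest beta >= lo with is_feasible(beta) True.

    Assumes is_feasible(lo) is True and that feasibility is monotone in beta.
    """
    if not is_feasible(lo):
        raise ValueError("lower end of the bracket must be feasible")
    # Grow the upper end until it is infeasible.
    for _ in range(max_doublings):
        if not is_feasible(hi):
            break
        lo, hi = hi, 2.0 * hi
    else:
        return lo  # no infeasible point found; the constraint appears inactive
    # Shrink the bracket [lo, hi] around the feasibility threshold.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Toy stand-in for the robust stability test: toy_gain * beta < 1
toy_gain = 2.5
beta_max = largest_feasible_bound(lambda beta: toy_gain * beta < 1.0)
```

Returning the lower end of the bracket keeps the result on the feasible side of the threshold.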
Once the constraint has been determined, we can impose the constraint by defining the constraint set as:
(32) |
Thus, the projection can be implemented by:
(33) |
where is the gradient step of the FIR coefficients , defined at time . The constraint set defined in (32) and projection in (33) can be implemented as part of Algorithm 1 in [10]. The numerical results in the following section are based on this implementation.
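One simple way to enforce a bound on the matrix induced $\infty$-norm of the stacked coefficients is radial rescaling, sketched below. The function name `project_inf_norm` and the bound `beta` are hypothetical, and rescaling is an assumption of this sketch: it always returns a feasible point but is not in general the Euclidean projection, which would instead project each row onto an $\ell_1$ ball.

```python
import numpy as np

def project_inf_norm(M_stacked, beta):
    """Rescale M_stacked so its induced infinity-norm is at most beta."""
    norm = np.linalg.norm(M_stacked, np.inf)  # maximum absolute row sum
    if norm <= beta:
        return M_stacked                       # already feasible
    return (beta / norm) * M_stacked

M_big = np.array([[3.0, -1.0, 0.0, 2.0],
                  [0.5, 0.5, -0.5, 0.5]])
M_proj = project_inf_norm(M_big, beta=1.5)
```

Rescaling preserves the direction of the gradient step, so the relative weighting of the FIR taps is unchanged when the constraint is active.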
5 Numerical Results
In this section, we provide numerical results of OPGD on a plant with uncertainty. Although we do not explicitly use the robust stability condition (Theorem 2) to compute the stability bound , we perform numerical studies to illustrate its effect. Future studies will focus on computing the exact bound, while the results here suggest that a stability bound exists.
Here, unconstrained OCO (U-OCO) refers to OCO control with , i.e., unconstrained FIR filter gains . Constrained OCO (C-OCO) refers to OCO control with gains bounded by some . We compare results between U-OCO and C-OCO on the following models:
where and are the nominal plant and unmodeled high frequency actuator dynamics, respectively. Note that and . The following disturbance was generated to perturb the control input :
where the time horizon is . We use the quadratic per-step cost and total cost defined in (26) and (27), respectively, with and . The state-feedback gain , learning horizon , and learning rate are used for all simulations. Note that is stabilizing for both the nominal and true plant dynamics.
Figure 4 shows the per-step cost and estimated disturbance of U-OCO at each time . We compare the performance with a perfect (red dashed) and imperfect (blue solid) plant model. Again, a perfect model is without uncertainty , and an imperfect model is with uncertainty . The disturbance is perfectly reconstructed (see Section 4.1) with a perfect plant model. However, with an imperfect plant model, this is not the case . Since the ideal cost computation assumes a perfect plant model and disturbance estimates, this mismatch introduces an error in the coefficient update . This causes an instability which is reflected by the per-step cost and estimated disturbance growing unbounded. On the other hand, U-OCO performance is stable without uncertainty because the disturbance is estimated perfectly. Thus, the constraint is needed on the coefficient update to ensure stability for the imperfect plant model.
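![Figure 4: Per-step cost and estimated disturbance of U-OCO with a perfect (red dashed) and imperfect (blue solid) plant model.](extracted/2405.07037v1/Figures/u-oco.png)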
Figure 5 shows the per-step cost and estimated disturbance of C-OCO for at each time . Again, we compare the performance with a perfect (red dashed) and imperfect (blue solid) plant model. As mentioned before, an error in the disturbance estimate introduces an error in the ideal cost gradient. The ideal cost gradient error can cause the gradient step to grow too large in the wrong direction. When the constraint is chosen such that the robust stability condition (Theorem 2) is satisfied, the effect of uncertainty induced error on the gradient step of the coefficient update is limited. This is illustrated in Figure 5 as the performance of C-OCO on the imperfect plant model eventually recovers the performance on the perfect model with . Thus, imposing the constraint can ensure that OCO is robust to uncertainty.
As mentioned in Section 4.3, the choice of the constraint bound is critical. Figure 6 shows the averaged per-step cost for C-OCO as a function of this bound. Again, we compare the performance with a perfect (red dashed) and imperfect (blue solid) plant model. When the bound is zero, the OCO has no freedom to learn the disturbance, and pure state-feedback (SF) is recovered for both the perfect and imperfect plants (red and blue circles, respectively). As the bound is increased, the OCO is allowed more freedom to learn the disturbance, and we see similar improved performance in both the perfect and imperfect plants. However, when the bound is “too large” such that the robust stability condition (Theorem 2) no longer holds, C-OCO on the imperfect plant becomes unstable. Figure 6 suggests that the stability bound occurs around . Note that once the constraint becomes inactive, C-OCO recovers U-OCO performance for the perfect and imperfect plants (red and blue squares, respectively). For the perfect plant, this indicates a limit as to how much the OCO can improve on the baseline controller. For the imperfect plant, this indicates a limit as to how much the OCO performance can be degraded by uncertainty. Hence, there is a trade-off between OCO performance and robustness to uncertainty.
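![Figure 5: Per-step cost and estimated disturbance of C-OCO with a perfect (red dashed) and imperfect (blue solid) plant model.](extracted/2405.07037v1/Figures/c-oco.png)
![Figure 6: Averaged per-step cost of C-OCO as a function of the constraint bound, with a perfect (red dashed) and imperfect (blue solid) plant model.](extracted/2405.07037v1/Figures/beta-sweep.png)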
6 Conclusion
In this paper, we establish a robust stability condition using the small gain theorem for a class of OCO controllers with memory and use this result to develop an OCO control algorithm (C-OCO) robust to model uncertainty. In particular, we impose this constraint on the controller by bounding the LTV dynamics of the OCO controller point-wise in time. We provide numerical results to illustrate that imposing the robust stability constraint keeps the closed-loop system stable when it would go unstable otherwise. Future work will study the numerical implementation of the scaled small gain theorem to compute the stability bound .
References
- [1] O. Anava, E. Hazan, and S. Mannor, “Online convex optimization against adversaries with memory and application to statistical arbitrage,” 2014.
- [2] E. Hazan, “The convex optimization approach to regret minimization,” in Optimization for Machine Learning, The MIT Press, 2011.
- [3] M. Zinkevich, “Online convex programming and generalized infinitesimal gradient ascent,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 928–936, 2003.
- [4] E. Hazan, A. Agarwal, and S. Kale, “Logarithmic regret algorithms for online convex optimization,” Machine Learning, vol. 69, no. 2-3, pp. 169–192, 2007.
- [5] S. Shalev-Shwartz, “Online learning and online convex optimization,” Foundations and trends in Machine Learning, vol. 4, no. 2, pp. 107–194, 2011.
- [6] E. Hazan, “Introduction to online convex optimization,” Foundations and Trends® in Optimization, vol. 2, no. 3-4, pp. 157–325, 2016.
- [7] N. Agarwal, E. Hazan, and K. Singh, “Logarithmic regret for online control,” in Advances in Neural Information Processing Systems, pp. 10175–10184, 2019.
- [8] D. Foster and M. Simchowitz, “Logarithmic regret for adversarial online control,” in International Conference on Machine Learning, pp. 3211–3221, 2020.
- [9] G. Goel, N. Agarwal, K. Singh, and E. Hazan, “Best of both worlds in online control: Competitive ratio and policy regret,” arXiv preprint arXiv:2211.11219, 2022.
- [10] N. Agarwal, B. Bullins, E. Hazan, S. Kakade, and K. Singh, “Online control with adversarial disturbances,” in International Conference on Machine Learning, pp. 111–119, PMLR, 2019.
- [11] K. Zhou, J. Doyle, and K. Glover, Robust and optimal control. Pearson, 1995.
- [12] S. Skogestad and I. Postlethwaite, Multivariable Feedback Control: Analysis and Design. John Wiley and Sons, 2nd ed., 2005.
- [13] Y. Rahman, A. Xie, J. B. Hoagg, and D. S. Bernstein, “A tutorial and overview of retrospective cost adaptive control,” in 2016 American Control Conference, pp. 3386–3409, 2016.
- [14] R. Venugopal and D. S. Bernstein, “Adaptive disturbance rejection using ARMARKOV/Toeplitz models,” IEEE Transactions on Control Systems Technology, vol. 8, no. 2, pp. 257–269, 2000.
- [15] M. A. Santillo and D. S. Bernstein, “Adaptive control based on retrospective cost optimization,” Journal of guidance, control, and dynamics, vol. 33, no. 2, pp. 289–304, 2010.
- [16] G. Goel and B. Hassibi, “Measurement-feedback control with optimal data-dependent regret,” arXiv preprint arXiv:2209.06425, 2022.
- [17] G. Goel and B. Hassibi, “Regret-optimal estimation and control,” arXiv preprint arXiv:2106.12097, 2021.
- [18] G. Goel and B. Hassibi, “Regret-optimal measurement-feedback control,” in Learning for Dynamics and Control, pp. 1270–1280, PMLR, 2021.
- [19] G. Goel and B. Hassibi, “Regret-optimal control in dynamic environments,” arXiv preprint arXiv:2010.10473, 2020.
- [20] J. Doyle, “Guaranteed margins for LQG regulators,” IEEE Transactions on Automatic Control, vol. 23, no. 4, pp. 756–757, 1978.
- [21] H. K. Khalil, Nonlinear Systems. Upper Saddle River, NJ: Prentice-Hall, 3rd ed., 2002.
- [22] A. Packard and J. Doyle, “The complex structured singular value,” Automatica, vol. 29, no. 1, pp. 71–109, 1993.