Disturbance Rejection-Guarded Learning for Vibration Suppression of Two-Inertia Systems

Fan Zhang¹, **feng Chen¹, Yu Hu¹, Zhiqiang Gao¹, Ge Lv², Qin Lin¹

¹The authors are with the Center for Advanced Control Technologies (CACT), Cleveland State University, 2121 Euclid Avenue, Cleveland, OH 44115, USA. Corresponding author: Qin Lin, [email protected]

²Ge Lv is with the Department of Mechanical Engineering, Clemson University, 105 Sikes Hall, Clemson, SC 29634, USA.

Abstract

Model uncertainty presents significant challenges in vibration suppression of multi-inertia systems, as these systems often rely on inaccurate nominal mathematical models due to system identification errors or unmodeled dynamics. An observer, such as an extended state observer (ESO), can estimate the discrepancy between the inaccurate nominal model and the true model, thus improving control performance via disturbance rejection. The conventional observer design is memoryless in the sense that once its estimated disturbance is obtained and sent to the controller, the datum is discarded. In this research, we propose a seamless integration of ESO and machine learning. On one hand, the machine learning model attempts to model the disturbance. With the assistance of prior information about the disturbance, the observer is expected to achieve faster convergence in disturbance estimation. On the other hand, machine learning benefits from an additional assurance layer provided by the ESO, as any imperfections in the machine learning model can be compensated for by the ESO. We validated the effectiveness of this novel learning-for-control paradigm through simulation and physical tests on two-inertia motion control systems used for vibration studies.

Index Terms:

Machine Learning, Disturbance Rejection, Extended State Observer, Model Uncertainty

I Introduction

Vibration suppression of multi-inertia systems is critical in many engineering applications, including automotive suspensions, series elastic actuators (SEA), and various other motion control systems [1]. These systems often involve multiple inertia components connected by flexible couplings, with a two-inertia subsystem serving as a fundamental block, which leads to inherent resonance issues. This resonance can cause dynamic stresses, energy waste, and performance degradation, therefore posing significant challenges to the systems’ efficiency and stability [2, 3]. Given the fundamental challenge of system identification and the necessity for real-time performance, it is common practice to employ a simplified or inaccurate nominal dynamic model. Consequently, disturbances become inevitable and must be rejected to achieve robust control. These disturbances are either internal (i.e., unknown or unmodelled parts of the plant dynamics) or external (i.e., perturbations from the outside affecting the dynamics) [4, 5].

The observer-based method has emerged as a promising approach to estimating the disturbance for the subsequent design of a disturbance rejection controller. Among the array of existing disturbance observers, the extended state observer (ESO) [6] is gaining popularity due to its simplicity in implementation. For the formulation of an ESO, the system is modeled as a simple chained integrator with a total disturbance term (also called lumped disturbance, $f$ ) that includes both internal and external disturbances. The total disturbance is treated as an extended state to be estimated together with other states. The estimated disturbance can be mitigated through various means, including a simple state feedback controller or more advanced control strategies such as sliding mode control [7] and model predictive control [8].

It is worth noting that the traditional ESO operates in a memoryless fashion, i.e., once it estimates a disturbance and transmits it to the controller, the datum used for estimation is then discarded. However, as a control system operates, we can improve our understanding of the disturbance through collected operational data. Prior works [9, 10] show that a model-based ESO (MB-ESO), which utilizes prior model information about the disturbance (such as a detailed dynamic model obtained through system identification), tends to exhibit reduced sensitivity to noise when compared to a model-free ESO (MF-ESO) that assumes a simple chained integrator as a nominal model. In order to circumvent the need for extensive system identification and maximize the utilization of disturbance information, we propose to leverage machine learning (ML), which has powerful capacities for nonlinear optimization, to memorize and generalize the past estimations from the ESO as a feedforward estimation of the disturbance. The learning component is expected to capture the internal dynamics as well as patterns of external disturbances.

[11, 12, 13] combine ESO with iterative learning control (ILC) for repetitive control tasks. Our approach focuses on general control tasks rather than just the repetitive ones. In addition, we assume that system dynamics, as well as disturbances, are unknown and not necessarily repetitive. In [14], a neural network is utilized to tune the parameters of ESO rather than explicitly learning the disturbance. Other learning-for-control approaches such as [15] employ neural networks to capture discrepancies between a nominal model $\hat{F}(x_{k},u_{k})$ and the true model $F(x_{k},u_{k})$ . Since the state of the true model is unknown, the measured next state $x_{k+1}$ is used to update the error model represented by the neural network. However, these methods always assume full-state information is available. In addition, when the learning performance falls short of expectations, it may result in suboptimal performance for subsequent model-based controllers. In contrast, our approach represents a novel paradigm that aims at learning the total disturbance with the help of output measurements instead of true values for states. Furthermore, our paradigm includes a correction mechanism for cases where the learning component fails to accurately capture the disturbances. The residual total disturbance, i.e., the remainder excluding the disturbance already estimated by the learning component, will be estimated by a conventional ESO in a feedback correction manner. Through this seamless integration, even when the learning-based estimation struggles to converge effectively, we can leverage the ESO for feedback correction, thereby adding an extra layer of robustness and assurance to the system.

In our new framework, as visualized in Fig. 1, we refer to the learning-enabled extended state observer as L-ESO. The estimation $\hat{f}$ of the true total disturbance $f$ consists of $\hat{f}_{L}$ and $\Delta\hat{f}$ , which are from the learning component and the ESO, respectively. First, the ESO uses the control $u$ and the observation $y$ to estimate the system’s states $\hat{x}$ and the residual disturbance $\Delta\hat{f}$ . Second, the ESO’s state estimate $\hat{x}$ and the control $u$ are fed as inputs to the learning component, which learns a regression model. The learning component carries out the feedforward estimation $\hat{f}_{L}$ , after which an online optimization iteratively minimizes the difference between $\hat{f}_{L}$ and $\hat{f}$ , allowing the learning component to approximate the total disturbance accurately. In situations where imperfect learning introduces errors, the ESO serves as an additional layer that corrects them.

Figure 1: The proposed framework in this paper, where the red and the blue blocks represent the L-ESO and the disturbance rejection tracking controller, respectively. Once the total disturbance is estimated, the tracking controller will be able to reject disturbance.

The contributions of our work are summarized as follows:

• We propose a novel framework that combines ML and ESO for feedforward estimation and feedback correction for a general disturbance rejection tracking control task. Compared with existing learning-for-control frameworks, we estimate states and disturbances in a unique way. We also have an extra error correction mechanism for the learning component.

• The learning component serves as an add-on to existing ESO-based control architecture. As shown in Fig. 1, only a learning component and a few connections (in green) are introduced. The advantage of our modular design is two-fold: 1) no need to change the existing framework; 2) users can customize the learning components by choosing any appropriate machine learning model.

• Our learning and estimation are real-time and online. We showcase the efficacy of our framework through simulations and a real-world two-inertia testbed as a fundamental block for a multi-inertia system.

The remainder of this paper is structured as follows. We first go through the preliminaries in Sec. II. Then, we construct our framework in Sec. III. Simulation results of the two-mass-spring benchmark system are presented in Sec. IV, followed by the hardware experiments of a torsional plant in Sec. V. Finally, we conclude our work and discuss possible future research directions in Sec. VI.

II Preliminary

The multi-inertia system can be represented as the sum of a nominal part and a nonlinear time-varying part:

\begin{cases}\dot{\bar{x}}(t)=A_{0}\bar{x}(t)+B_{0}u(t)+E_{0}f(x(t),d(t),t)\\ y=C_{0}\bar{x}\end{cases}

(1)

where $\bar{x}\in\mathbb{R}^{n}$ is the state vector, $u\in\mathbb{R}$ is a control input, $y\in\mathbb{R}$ is a measured output, and $f:\mathbb{R}^{n+1}\times[0,\infty)\rightarrow\mathbb{R}$ is an unknown function representing the time-varying uncertainty, which contains external disturbance $d(t)\in\mathbb{R}$ , unmodeled dynamics, and parameter uncertainty. Terms $A_{0}$ , $B_{0}$ , $E_{0}$ and $C_{0}$ are real and known matrices with appropriate dimensions. For the particular case of a two-inertia system, $n=4$ : each inertia contributes two states, its position (or angle) and its velocity (or angular velocity); see the example in Sec. IV for details. The justification of classifying (1) as a nonlinear time-varying system can be found in [16, 17].

Traditionally, an ESO is established for a system in a chained integrator form [6]. However, in our most recent work [18], we have significantly expanded the applicability of ESO and rigorously proved that, for a general system (1) satisfying Assumption 1 and Assumption 2, an ESO can be established to estimate $f$ without requiring the chained integrator form.

Assumption 1.

$(A_{0},C_{0})$ is observable.

Assumption 2.

$(A_{0},E_{0},C_{0})$ has no invariant zeros.

For system (1), under Assumptions 1 and 2, there exists a matrix

S=\begin{bmatrix}C_{0}\\ C_{0}A_{0}\\ \vdots\\ C_{0}A_{0}^{n-1}\end{bmatrix}

(2)

such that

\begin{split}\bar{A}_{0}=SA_{0}S^{-1}&=\begin{bmatrix}0&1&\dots&0\\ \vdots&\ddots&\ddots&\vdots\\ 0&\dots&0&1\\ -a_{0}&-a_{1}&\dots&-a_{n-1}\\ \end{bmatrix}\\ \bar{B}_{0}=SB_{0}&=\begin{bmatrix}0&0&\dots&b\end{bmatrix}^{T}\\ \bar{C}_{0}=C_{0}S^{-1}&=\begin{bmatrix}1&0&\dots&0\\ \end{bmatrix}\\ \bar{E}_{0}=SE_{0}&=\begin{bmatrix}0&0&\dots&1\end{bmatrix}^{T}\\ \end{split}

(3)

form the following new system

\begin{cases}\dot{x}=\bar{A}_{0}x+\bar{B}_{0}u+\bar{E}_{0}f\\ y=\bar{C}_{0}x\end{cases}

(4)

The readers are referred to [18] for more details on the matrix transformation. The new system (4) has an observable canonical form such that an ESO can be established for estimating $f$ .
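As an illustration of the transformation in (2) and (3), the sketch below builds $S$ numerically and applies it to the two-mass-spring matrices that appear later in (15), with $m_{1}=m_{2}=1$, $k=1$ and output $y=x_{2}$. The function name and the choice $E_{0}=B_{0}/b$ (a disturbance entering through the input channel) are assumptions made only for this example; they are not taken from [18].

```python
import numpy as np

def observability_transform(A0, B0, C0, E0):
    """Return S from (2) and the transformed matrices from (3)."""
    n = A0.shape[0]
    # Stack C0, C0*A0, ..., C0*A0^(n-1)
    S = np.vstack([C0 @ np.linalg.matrix_power(A0, i) for i in range(n)])
    S_inv = np.linalg.inv(S)   # exists when (A0, C0) is observable
    return S, S @ A0 @ S_inv, S @ B0, C0 @ S_inv, S @ E0

# Two-mass-spring plant of (15) with benchmark values
m1, m2, k = 1.0, 1.0, 1.0
A0 = np.array([[0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0],
               [-k / m1, k / m1, 0.0, 0.0],
               [k / m2, -k / m2, 0.0, 0.0]])
B0 = np.array([[0.0], [0.0], [1.0 / m1], [0.0]])
C0 = np.array([[0.0, 1.0, 0.0, 0.0]])      # c1 = 0, c2 = 1
E0 = B0 / (k / (m1 * m2))                  # illustrative assumption, see lead-in

S, A_bar, B_bar, C_bar, E_bar = observability_transform(A0, B0, C0, E0)
print(np.round(A_bar, 6))   # last row holds -a_0, ..., -a_{n-1}
print(np.round(B_bar.ravel(), 6), np.round(C_bar.ravel(), 6))
```

For these values the last row of the transformed state matrix carries $-k\frac{m_{1}+m_{2}}{m_{1}m_{2}}=-2$ in the third position, which is the coefficient of $\ddot{y}$ in (16).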

Remark 1.

Assumption 2 is equivalent to the following conditions. The proof can be found in [18].

$C_{0}E_{0}=0,C_{0}A_{0}E_{0}=0,\dots,C_{0}A_{0}^{n-1}E_{0}\neq 0$

According to whether or not the system dynamics are available, we have the following two variants of ESO:

II-A MB-ESO

If the model information, i.e., $-a_{0},-a_{1},\cdots,-a_{n-1},b$ , in matrices $\bar{A}_{0}$ and $\bar{B}_{0}$ is available, we have

\dot{x}=\underbrace{\begin{bmatrix}0&1&\dots&0\\ \vdots&\ddots&\ddots&\vdots\\ 0&\dots&0&1\\ -a_{0}&\dots&\dots&-a_{n-1}\\ \end{bmatrix}}_{\bar{A}_{0,MB}}x+\underbrace{\begin{bmatrix}0\\ 0\\ \vdots\\ b\end{bmatrix}}_{\bar{B}_{0,MB}}u+\underbrace{\begin{bmatrix}0\\ 0\\ \vdots\\ 1\end{bmatrix}}_{\bar{E}_{0}}\underbrace{d}_{f}

(5)

The total disturbance can be represented as:

f=d

(6)

where $d$ is the external disturbance, $b$ is the true control gain.

II-B MF-ESO

If the model information, i.e., $-a_{0},-a_{1},\cdots,-a_{n-1},b$ , in matrices $\bar{A}_{0}$ and $\bar{B}_{0}$ , is not available, we have

\begin{array}[]{r@{}l}\dot{x}=&\underbrace{\begin{bmatrix}0&1&\dots&0\\ \vdots&\ddots&\ddots&\vdots\\ 0&\dots&0&1\\ 0&0&\dots&0\\ \end{bmatrix}}_{\bar{A}_{0,MF}}x+\underbrace{\begin{bmatrix}0\\ 0\\ \vdots\\ b_{0}\end{bmatrix}}_{\bar{B}_{0,MF}}u+\\ &\underbrace{\begin{bmatrix}0\\ \vdots\\ 1\end{bmatrix}}_{\bar{E}_{0}}\underbrace{(-a_{0}x_{1}-\dots-a_{n-1}x_{n}+(b-b_{0})u+d)}_{f}\end{array}

(7)

where $-a_{0}x_{1}-\dots-a_{n-1}x_{n}+(b-b_{0})u$ is the internal disturbance (unknown/unmodelled dynamics), $b_{0}$ is the nominal control gain, and $d$ is the external disturbance. In such a case, the total disturbance becomes:

f=-a_{0}x_{1}-\dots-a_{n-1}x_{n}+(b-b_{0})u+d

(8)

ESO treats the total disturbance $f$ as an extended state, such that a Luenberger observer can be designed to estimate both the original system state $x$ and the total disturbance $f$ . The augmented dynamic system is as follows:

\begin{cases}\begin{bmatrix}\dot{x}\\ \dot{f}\end{bmatrix}=A\begin{bmatrix}x\\ f\end{bmatrix}+Bu+E\dot{f}\\ y=Cx\end{cases}

(9)

where $A=\begin{bmatrix}\bar{A_{0}}&\bar{E_{0}}\\ 0_{1\times n}&0\end{bmatrix}_{(n+1)\times(n+1)}$ , $B=\begin{bmatrix}\bar{B_{0}}\\ 0\end{bmatrix}_{(n+1)\times 1}$ , $C=[\bar{C_{0}},0]_{1\times(n+1)}$ , $E=[0,\cdots,0,1]_{(n+1)\times 1}^{T}$ .

The Luenberger observer has the following form:

\begin{bmatrix}\dot{\hat{x}}\\ \dot{\hat{f}}\end{bmatrix}=A\begin{bmatrix}\hat{x}\\ \hat{f}\end{bmatrix}+Bu+L\left(y-C\begin{bmatrix}\hat{x}\\ \hat{f}\end{bmatrix}\right)

(10)

where $\hat{x}$ and $\hat{f}$ are estimations of $x$ and $f$ , $L$ is the observer gain. We have the following estimation error dynamics:

\dot{e}=(A-LC)e+E\dot{f}

(11)

where $e=\begin{bmatrix}x-\hat{x}&f-\hat{f}\\ \end{bmatrix}^{T}$ .

Theorem 1.

Under Assumption 1 and Assumption 2, the eigenvalues of $A-LC$ can be placed in the open left half of the complex plane so that the estimation converges [18, 17].

All eigenvalues can be placed at $-\omega_{o}$ , which is called the observer bandwidth of ESO [19].
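To make the observer in (10) concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a model-free ESO on a second-order chained integrator (chosen only for brevity), a constant total disturbance, zero control input, and a forward-Euler discretization; the function names, the step size `dt`, and the binomial-coefficient form of the gain are illustrative choices. The binomial gain is the standard way to put all eigenvalues of $A-LC$ at $-\omega_{o}$ for a chain of integrators.

```python
import numpy as np
from math import comb

def eso_matrices(n, b0, omega_o):
    """Augmented (n+1)-state model-free ESO matrices and a bandwidth-parameterized gain."""
    A = np.diag(np.ones(n), k=1)           # chain of n+1 integrators: x_1..x_n, f
    B = np.zeros(n + 1); B[n - 1] = b0     # input enters the n-th state equation
    C = np.zeros(n + 1); C[0] = 1.0        # only y = x_1 is measured
    # Binomial gains: det(sI - A + LC) = (s + omega_o)^(n+1)
    L = np.array([comb(n + 1, i) * omega_o**i for i in range(1, n + 2)], dtype=float)
    return A, B, C, L

def simulate(f_true=2.0, n=2, b0=1.0, omega_o=10.0, dt=1e-3, steps=3000):
    """Run plant and observer side by side; return the final observer state [x_hat; f_hat]."""
    A, B, C, L = eso_matrices(n, b0, omega_o)
    x = np.zeros(n)        # true plant state (hypothetical chained-integrator plant)
    z = np.zeros(n + 1)    # observer state [x_hat; f_hat]
    u = 0.0                # assumption: no control action in this illustration
    for _ in range(steps):
        y = x[0]
        # Observer update: forward-Euler step of (10)
        z = z + dt * (A @ z + B * u + L * (y - C @ z))
        # Plant update: x_n' = f + b0*u, every other state integrates the next one
        x = x + dt * np.append(x[1:], f_true + b0 * u)
    return z

z_final = simulate()
print("estimated total disturbance:", z_final[-1])
```

With a constant disturbance the term $E\dot{f}$ in (11) vanishes, so the estimation error decays at the rate set by $\omega_{o}$; a time-varying disturbance would leave a bandwidth-dependent residual error.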

III Learning-Enabled ESO

The model-based ESO in (5) and the model-free ESO in (7) can be further expanded as follows:

\begin{array}[]{r@{}l}&\underbrace{\begin{bmatrix}\underbrace{\begin{matrix}0&1&\dots&0\\ \vdots&\ddots&\ddots&\vdots\\ 0&\dots&0&1\\ -a_{0}&\dots&\dots&-a_{n-1}\\ \end{matrix}}_{\bar{A}_{0,MB}}&\bar{E}_{0}\\ 0_{1\times n}&0\end{bmatrix}}_{A_{MB}}\begin{bmatrix}\hat{x}\\ \hat{f}\end{bmatrix}+\underbrace{\begin{bmatrix}0\\ \vdots\\ \underbrace{b_{0}+b-b_{0}}_{\bar{B}_{0,MB}}\\ 0\end{bmatrix}}_{B_{MB}}u=\\ &\underbrace{\begin{bmatrix}\underbrace{\begin{matrix}0&1&\dots&0\\ \vdots&\ddots&\ddots&\vdots\\ 0&\dots&0&1\\ 0&\dots&\dots&0\\ \end{matrix}}_{\bar{A}_{0,MF}}&\bar{E}_{0}\\ 0_{1\times n}&0\end{bmatrix}}_{A_{MF}}\begin{bmatrix}\hat{x}\\ \hat{f}\end{bmatrix}+\underbrace{\begin{bmatrix}0\\ \vdots\\ \underbrace{b_{0}}_{\bar{B}_{0,MF}}\\ 0\end{bmatrix}}_{B_{MF}}u+\\ &\begin{bmatrix}\bar{E}_{0}\\ 0\end{bmatrix}(-a_{0}x_{1}-\dots-a_{n-1}x_{n}+(b-b_{0})u)\end{array}

(12)

Remark 2.

By incorporating model information, MF-ESO becomes equivalent to MB-ESO.

Remark 3.

The learning component is motivated by the observation that the model information is learnable from data, so it can be incorporated into the observer without explicit system identification.

Remark 4.

The learning component can even learn the external disturbance together with the internal disturbance, so that both are incorporated.

Since the learning component has a feedforward estimation $\hat{f}_{L}$ for the total disturbance, ESO can serve as a feedback correction to estimate the residual total disturbance as $\Delta\hat{f}$ . The combination of the feedforward estimation and the feedback correction is realized as follows:

\begin{bmatrix}\dot{\hat{x}}\\ \dot{\Delta\hat{f}}\end{bmatrix}=A\begin{bmatrix}\hat{x}\\ \Delta\hat{f}\end{bmatrix}+Bu+L\left(y-C\begin{bmatrix}\hat{x}\\ \Delta\hat{f}\end{bmatrix}\right)+\begin{bmatrix}\bar{E}_{0}\\ 0\end{bmatrix}\hat{f}_{L}

(13)

Since the learning component is expected to capture the unknown dynamics, we employ a model-free ESO, see Fig. 1. The learning block in Fig. 1 is a function $h_{\theta}(x,u)$ parameterized by $\theta$ . To learn the total disturbance (see (8)), we establish a mapping from the input ( $\hat{x}$ estimated by ESO and control input $u$ ) to the output $\hat{f}$ , where $\hat{f}=\hat{f}_{L}+\Delta\hat{f}$ . The total disturbance estimation consists of two parts: 1) the feedforward estimation from the learning component $\hat{f}_{L}=h_{\theta}(\hat{x},u)$ ; 2) the feedback correction for the residual disturbance $\Delta\hat{f}$ by an MF-ESO. To optimize the parameters of the machine learning model, a general regression problem is formulated using the following cost function:

J(\theta)=\frac{1}{2}\sum_{i=1}^{n}(h_{\theta}(\hat{x}^{i},u^{i})-\hat{f}^{i})^{2}

(14)

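where $n$ is the size of the training data, i.e., the batch size in Alg. 1 (not the state dimension of Sec. II). The details are in Alg. 1. While the batch is not yet filled, only the MF-ESO is active (see Lines 7-14): the learning component does not yet return optimized parameters and its output is held at zero.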
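For reference, one gradient-descent update on (14) can be written out as follows. This is a sketch under two assumptions that the text does not state explicitly: the targets $\hat{f}^{i}$ are treated as constants with respect to $\theta$ during the update, and the learning rate $\alpha$ listed among the inputs of Alg. 1 is used as a plain step size.

```latex
\nabla_{\theta}J(\theta)
  = \sum_{i=1}^{n}\bigl(h_{\theta}(\hat{x}^{i},u^{i})-\hat{f}^{i}\bigr)\,
    \nabla_{\theta}h_{\theta}(\hat{x}^{i},u^{i}),
\qquad
\theta \leftarrow \theta-\alpha\,\nabla_{\theta}J(\theta).
```

For a model that is linear in its parameters, $h_{\theta}(\hat{x},u)=\theta^{T}\phi(\hat{x},u)$, the inner gradient is simply the feature vector $\phi(\hat{x}^{i},u^{i})$.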

Input: Control input $u$ , system output $y$ , learning rate $\alpha$ , batch size $n$ , maximum running time $N_{max}$

Output: Total disturbance $\hat{f}$

1 Initialize:
2 machine learning input batch $\mathcal{I}^{0}=\emptyset$
3 disturbance estimation by ESO batch $\Delta\mathcal{F}^{0}=\emptyset$
4 machine learning output batch $\mathcal{F_{L}}^{0}=\emptyset$
5 machine learning model parameter $\theta$
6 machine learning output $\hat{f}_{L}^{0}=0$
7 for $i=1$ to $n$ do
    Get $\hat{x}^{i}$ and $\Delta\hat{f}^{i}$ by running L-ESO  $\triangleright$ see (13)
    Compute $u^{i}$  $\triangleright$ see (22)
    $\mathcal{I}^{i}:=[\mathcal{I}^{i-1},[\hat{x}_{1}^{i},\hat{x}_{2}^{i},\dots,\hat{x}_{n}^{i},u^{i},1]^{T}]$
    $\Delta\mathcal{F}^{i}:=[\Delta\mathcal{F}^{i-1},\Delta\hat{f}^{i}]$
    $\mathcal{F_{L}}^{i}:=[\mathcal{F_{L}}^{i-1},0]$  $\triangleright$ append data into three batches
    $\hat{f}_{L}^{i}=0$
14 end for
16 for $i=n$ to $N_{max}$ do
    Get $\hat{x}^{i}$ and $\Delta\hat{f}^{i}$ by running L-ESO  $\triangleright$ see (13)
    Update $\mathcal{I}^{i}$  $\triangleright$ pop oldest datum, push new datum
    Update $\Delta\mathcal{F}^{i}$  $\triangleright$ pop oldest datum, push new datum
    Update $\theta^{i}$  $\triangleright$ according to (14)
    $\mathcal{F_{L}}^{i}=h_{\theta^{i}}(\mathcal{I}^{i})$
    $\hat{f}_{L}^{i}=h_{\theta^{i}}(x^{i})$
    $\hat{f}^{i}=\hat{f}_{L}^{i}+\Delta\hat{f}^{i}$  $\triangleright$ compute total disturbance
    Compute $u^{i}$  $\triangleright$ see (22)
26 end for

Algorithm 1: L-ESO
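The batch bookkeeping of Alg. 1 can be sketched as a small class. This is an illustrative simplification, not the authors' code: it assumes a linear model on the features $[\hat{x},u,1]$, stores one target per sample (the feedforward value in use at that instant plus the ESO residual) instead of separate $\mathcal{F_{L}}$ and $\Delta\mathcal{F}$ batches, and takes one gradient step on (14) per sample once the window is full. The class name, the `step` method, and the synthetic data in the usage loop are all hypothetical.

```python
from collections import deque
import numpy as np

class SlidingBatchLearner:
    """Linear h_theta(x_hat, u) trained online on a fixed-size window of ESO data."""

    def __init__(self, n_states, batch_size, alpha):
        self.theta = np.zeros(n_states + 2)       # weights for [x_hat, u, 1]
        self.alpha = alpha                        # learning rate
        self.features = deque(maxlen=batch_size)  # input batch
        self.targets = deque(maxlen=batch_size)   # total-disturbance estimates
        self.f_L = 0.0                            # feedforward value sent to the ESO

    def step(self, x_hat, u, delta_f_hat):
        """Store one sample, update theta if the window is full, return f_hat."""
        phi = np.append(x_hat, [u, 1.0])
        # A deque with maxlen drops the oldest datum automatically (pop/push)
        self.features.append(phi)
        self.targets.append(self.f_L + delta_f_hat)
        if len(self.features) == self.features.maxlen:
            Phi = np.array(self.features)
            err = Phi @ self.theta - np.array(self.targets)
            self.theta -= self.alpha * (Phi.T @ err)   # one gradient step on (14)
            self.f_L = float(self.theta @ phi)
        # Before the window fills, f_L stays 0 and the ESO residual is the estimate
        return self.f_L + delta_f_hat

# Usage with synthetic data: an idealized ESO that returns the exact residual
rng = np.random.default_rng(0)
learner = SlidingBatchLearner(n_states=4, batch_size=50, alpha=0.005)
for _ in range(3000):
    x_hat, u = rng.normal(size=4), rng.normal()
    f_true = -2.0 * x_hat[2] + 0.5                # hypothetical linear disturbance
    learner.step(x_hat, u, f_true - learner.f_L)
print(np.round(learner.theta, 3))
```

In a closed loop the residual would come from (13) rather than from a known `f_true`; the idealized residual is used here only so that the example is self-contained.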

Our framework is highly modular. The ESO itself follows the conventional model-free design; we only use its estimation to drive the training of the learning component. First, the learning component can serve as an add-on to existing ESO-based control architecture by adding just a few connections. Second, the learning component is flexible: users can customize it by choosing an appropriate machine learning model, e.g., linear, non-linear, parametric, or non-parametric.

IV Simulation Results

IV-A Two-Mass-Spring Problem Formulation

Fig. 2 depicts a schematic of a two-mass-spring system, which is from a well-known benchmark control problem [20]. The system includes two masses, $m_{1}$ and $m_{2}$ , which can slide freely over a horizontal surface without friction. Note that a frictionless setting has been shown to be more challenging for controller design [9]. The masses are connected by a light horizontal spring with a spring constant $k$ . The system is subject to two external disturbance forces $w_{1}$ and $w_{2}$ , which act on masses $m_{1}$ and $m_{2}$ , respectively. The control signal $u$ is the force applied to mass $m_{1}$ . Both the positions of mass $m_{1}$ and mass $m_{2}$ are measured, and either one can be used as an output to be controlled.

The states of the two-mass-spring system are defined as the displacements and velocities of the two masses. Specifically, the displacement and velocity of mass $m_{1}$ are $x_{1}$ and $x_{3}$ , respectively, while the displacement and velocity of mass $m_{2}$ are $x_{2}$ and $x_{4}$ , respectively. The dynamics of the system can be represented in the following state-space form:

\begin{split}\begin{bmatrix}\dot{x}_{1}\\ \dot{x}_{2}\\ \dot{x}_{3}\\ \dot{x}_{4}\end{bmatrix}&=\begin{bmatrix}0&0&1&0\\ 0&0&0&1\\ -\frac{k}{m_{1}}&\frac{k}{m_{1}}&0&0\\ \frac{k}{m_{2}}&-\frac{k}{m_{2}}&0&0\\ \end{bmatrix}\begin{bmatrix}x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ \end{bmatrix}\\ &+\begin{bmatrix}0\\ 0\\ \frac{1}{m_{1}}\\ 0\\ \end{bmatrix}(u+w_{1})+\begin{bmatrix}0\\ 0\\ 0\\ \frac{1}{m_{2}}\end{bmatrix}w_{2}\\ y&=\begin{bmatrix}c_{1}&c_{2}&0&0\end{bmatrix}\begin{bmatrix}x_{1}&x_{2}&x_{3}&x_{4}\\ \end{bmatrix}^{T}\end{split}

(15)

A time-varying unknown external disturbance $w_{2}$ acts on mass $m_{2}$ , while the control force is applied to $m_{1}$ so that $x_{2}$ tracks a desired trajectory. For the output $y=x_{2}$ , a chained-integrator form is derived by differentiating the output four times. The input and the disturbance then appear in the last channel of this fourth-order system, with $b=\frac{k}{m_{1}m_{2}}$ :

y^{(4)}=-k\frac{m_{1}+m_{2}}{m_{1}m_{2}}\ddot{y}+\frac{k}{m_{1}m_{2}}w_{2}+\frac{1}{m_{2}}\ddot{w}_{2}+bu

(16)
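The intermediate steps behind (16) can be written out from (15). The derivation below assumes $y=x_{2}$ (i.e., $c_{1}=0$, $c_{2}=1$) and $w_{1}=0$, which is consistent with (16) containing no $w_{1}$ term.

```latex
\begin{aligned}
\ddot{y} &= \ddot{x}_{2} = \frac{k}{m_{2}}(x_{1}-x_{2})+\frac{1}{m_{2}}w_{2}
  \;\Longrightarrow\; x_{1}-x_{2}=\frac{m_{2}\ddot{y}-w_{2}}{k},\\
\ddot{x}_{1} &= -\frac{k}{m_{1}}(x_{1}-x_{2})+\frac{1}{m_{1}}u
  = -\frac{m_{2}\ddot{y}-w_{2}}{m_{1}}+\frac{1}{m_{1}}u,\\
y^{(4)} &= \frac{k}{m_{2}}(\ddot{x}_{1}-\ddot{x}_{2})+\frac{1}{m_{2}}\ddot{w}_{2}
  = \frac{k}{m_{2}}\left(-\frac{m_{2}\ddot{y}-w_{2}}{m_{1}}+\frac{u}{m_{1}}-\ddot{y}\right)
    +\frac{1}{m_{2}}\ddot{w}_{2}\\
 &= -k\frac{m_{1}+m_{2}}{m_{1}m_{2}}\ddot{y}+\frac{k}{m_{1}m_{2}}w_{2}
    +\frac{1}{m_{2}}\ddot{w}_{2}+\frac{k}{m_{1}m_{2}}u .
\end{aligned}
```

The coefficient of $u$ in the last line is the control gain $b=\frac{k}{m_{1}m_{2}}$.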

IV-B ESO design

The states in the system are:

x=\begin{bmatrix}y&\dot{y}&\ddot{y}&\dddot{y}\end{bmatrix}^{T}

(17)

The state-space description of the system is

\begin{cases}\begin{bmatrix}\dot{x}\\ \dot{f}\end{bmatrix}=A\begin{bmatrix}x\\ f\end{bmatrix}+Bu+E\dot{f}\\ y=Cx\end{cases}

(18)

IV-B1 Model-free ESO

The state-space model is:

$A_{MF}=\begin{bmatrix}0&1&0&0&0\\ 0&0&1&0&0\\ 0&0&0&1&0\\ 0&0&0&0&1\\ 0&0&0&0&0\\ \end{bmatrix}$ , $B=\begin{bmatrix}0\\ 0\\ 0\\ b_{0}\\ 0\end{bmatrix}$ , $C=\begin{bmatrix}1&0&0&0&0\end{bmatrix}$ , $E=\begin{bmatrix}0&0&0&0&1\end{bmatrix}^{T}$ . As we can see, the model-free design assumes unknown dynamics, such that the total disturbance $f$ can be represented as:

f=-k\frac{m_{1}+m_{2}}{m_{1}m_{2}}\ddot{y}+\frac{k}{m_{1}m_{2}}w_{2}+\frac{1}{m_{2}}\ddot{w}_{2}+(b-b_{0})u

(19)

where $-k\frac{m_{1}+m_{2}}{m_{1}m_{2}}$ is the model parameter information, $b_{0}$ is the nominal control gain. We have

y^{(4)}=f+b_{0}u

(20)

where everything besides $b_{0}u$ is considered as the total disturbance (see (16)). It can be verified that such a system satisfies Assumptions 1 and 2. Therefore, an ESO can be designed for the estimation of $f$ , see (10).

The observer gain is chosen so that all the eigenvalues of $A_{MF}-LC$ are placed at $-\omega_{o}$ [19], i.e., $L_{MF}=[5\omega_{o}\quad 10\omega_{o}^{2}\quad 10\omega_{o}^{3}\quad 5\omega_{o}^{4}\quad\omega_{o}^{5}]$ .
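The pole placement claimed for $L_{MF}$ can be checked numerically. The sketch below avoids computing repeated eigenvalues directly, which is numerically ill-conditioned, and instead uses the Cayley-Hamilton theorem: if the characteristic polynomial of $M=A_{MF}-LC$ is $(s+\omega_{o})^{5}$, then $(M+\omega_{o}I)^{5}=0$. The function name and the test value of $\omega_{o}$ are illustrative.

```python
import numpy as np

def mf_gain_residual(omega_o):
    """Return (M + omega_o*I)^5 for M = A_MF - L_MF*C; it is zero iff all poles sit at -omega_o."""
    A = np.diag(np.ones(4), k=1)                  # A_MF: 5x5 chain of integrators
    C = np.array([[1.0, 0.0, 0.0, 0.0, 0.0]])
    L = np.array([[5 * omega_o,
                   10 * omega_o**2,
                   10 * omega_o**3,
                   5 * omega_o**4,
                   omega_o**5]]).T                # L_MF as a column vector
    M = A - L @ C
    return np.linalg.matrix_power(M + omega_o * np.eye(5), 5)

residual = mf_gain_residual(2.0)
print("largest entry of (M + w_o I)^5:", np.abs(residual).max())
```

The same check applies to any candidate gain for a fifth-order observer: a residual that is not close to zero indicates that the eigenvalues are not all at $-\omega_{o}$.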

IV-B2 Model-based ESO

The model-based design has the following state-space representation:

$A_{MB}=\begin{bmatrix}0&1&0&0&0\\ 0&0&1&0&0\\ 0&0&0&1&0\\ 0&0&-k\frac{m_{1}+m_{2}}{m_{1}m_{2}}&0&1\\ 0&0&0&0&0\\ \end{bmatrix}$ , $B=\begin{bmatrix}0\\ 0\\ 0\\ b_{0}\\ 0\end{bmatrix}$ , $C=\begin{bmatrix}1&0&0&0&0\end{bmatrix}$ , $E=\begin{bmatrix}0&0&0&0&1\end{bmatrix}^{T}$ . In contrast to the above-mentioned model-free design, such a system tries to leverage the prior knowledge of the dynamic model, by assuming $-k\frac{m_{1}+m_{2}}{m_{1}m_{2}}$ is known (see (16)). In this case, the total disturbance becomes:

f=\frac{k}{m_{1}m_{2}}w_{2}+\frac{1}{m_{2}}\ddot{w}_{2}+(b-b_{0})u

(21)

such that $y^{(4)}=-k\frac{m_{1}+m_{2}}{m_{1}m_{2}}\ddot{y}+f+b_{0}u$

The observer gain is chosen so that all eigenvalues of $A_{MB}-LC$ are placed at $-\omega_{o}$ [19]. With $a=-k\frac{m_{1}+m_{2}}{m_{1}m_{2}}$ , the coefficients of $L_{MB}$ are listed in Table I.

Parameters	Values
$L_{MB,1}$	$5\omega_{o}$
$L_{MB,2}$	$a+10\omega_{o}^{2}$
$L_{MB,3}$	$5a\omega_{o}+10\omega_{o}^{3}$
$L_{MB,4}$	$a^{2}+10a\omega_{o}^{2}+5\omega_{o}^{4}$
$L_{MB,5}$	$5a^{2}\omega_{o}+10a\omega_{o}^{3}+\omega_{o}^{5}$

TABLE I: Coefficients of $L_{MB}$

IV-B3 L-ESO

As shown in (19), the internal disturbance has a linearly structured mapping between the input (state and control) and the output (disturbance). Therefore, a linear regression model is a reasonable choice for the learning component, with $h_{\theta}(\cdot)=\theta^{T}\begin{bmatrix}\hat{x}_{1}&\hat{x}_{2}&\hat{x}_{3}&\hat{x}_{4}&u&1\end{bmatrix}^{T}$ . As mentioned before, the learning model can be linear, nonlinear, parametric, non-parametric, etc. Our contribution is not the complexity of the learning model but the novel design that seamlessly combines machine learning models with an ESO. A batch gradient descent method is used for optimizing the cost function. In our experiments, we initialize $\theta$ with all zeros.
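To see what the linear model is expected to converge to, (19) can be matched term by term against the feature vector. The expression below is a worked reading of (19), assuming no external disturbance ($w_{2}=0$) and perfect state estimates, so that $\hat{x}_{3}=\ddot{y}$; the symbol $\theta^{\star}$ is introduced here only for illustration.

```latex
\theta^{\star}=
\begin{bmatrix}
0 & 0 & -k\dfrac{m_{1}+m_{2}}{m_{1}m_{2}} & 0 & b-b_{0} & 0
\end{bmatrix}^{T},
\qquad
h_{\theta^{\star}}(\hat{x},u)
 = -k\frac{m_{1}+m_{2}}{m_{1}m_{2}}\,\hat{x}_{3}+(b-b_{0})\,u .
```

With the benchmark values $m_{1}=m_{2}=1$ kg and $k=1$ N/m, the third entry is $-2$ and $b=1$, so the fifth entry vanishes whenever the nominal gain $b_{0}$ is also 1.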

IV-C Controller Design

The control law for the system (20) can be designed as:

u=\frac{-\hat{f}+u_{0}}{b_{0}}

(22)

such that

y^{(4)}=u_{0}

(23)

It can be controlled by a state feedback controller

u_{0}=-K\hat{x}=k_{1}(r-\hat{x}_{1})-k_{2}\hat{x}_{2}-k_{3}\hat{x}_{3}-k_{4}\hat{x}_{4}

(24)

with a control gain $K=\begin{bmatrix}\omega_{c}^{4}&4\omega_{c}^{3}&6\omega_{c}^{2}&4\omega_{c}\end{bmatrix}$ , where $\omega_{c}$ is the closed-loop natural frequency [19].
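The control law in (22) and (24) amounts to a few lines of code. The sketch below is a direct transcription with hypothetical function names; `x_hat` and `f_hat` stand for the estimates delivered by the observer, and `r` is the reference.

```python
import numpy as np

def control_gain(omega_c):
    """K = [wc^4, 4wc^3, 6wc^2, 4wc], so that s^4 + k4 s^3 + k3 s^2 + k2 s + k1 = (s + wc)^4."""
    return np.array([omega_c**4, 4 * omega_c**3, 6 * omega_c**2, 4 * omega_c])

def control_input(x_hat, f_hat, r, b0, omega_c):
    """Disturbance-rejecting control: state feedback (24) followed by cancellation (22)."""
    k1, k2, k3, k4 = control_gain(omega_c)
    u0 = k1 * (r - x_hat[0]) - k2 * x_hat[1] - k3 * x_hat[2] - k4 * x_hat[3]
    return (u0 - f_hat) / b0

# Example: system at rest, unit reference, estimated disturbance of 0.5
u_demo = control_input(np.zeros(4), f_hat=0.5, r=1.0, b0=1.0, omega_c=1.0)
print(u_demo)
```

Subtracting `f_hat` before dividing by `b0` is what reduces the plant to the chain of integrators in (23), provided the estimate is accurate.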

IV-D Simulation Results

The system parameters are taken from the benchmark problem [20], i.e., $m_{1}=m_{2}=1$ kg, $k=1$ N/m, $c_{1}=0$ , $c_{2}=1$ . Tracking a desired trajectory for the position of mass $m_{2}$ is the control objective. A sinusoidal wave with a frequency of 1 rad/s and amplitude 1 is applied in the training phase for L-ESO. After 110 seconds, a step reference is given to all three approaches. A band-limited white noise with noise power $10^{-12}$ is added at the system output side. A sinusoidal external disturbance with frequency $\pi/10$ rad/s is applied on $m_{2}$ as $w_{2}$ starting at 150 s. The learning algorithm runs online. The learning phase is designed to emulate the typical operational scenarios of the machine under general conditions, whereas the step response is employed to assess and compare the tracking performance. All the control parameters are set identically for a fair comparison.

The controller bandwidth $\omega_{c}$ and the observer bandwidth $\omega_{o}$ are set to 1 rad/s and 10 rad/s, respectively. The control gain is set to 1. All three approaches share the same settings for a fair comparison.

The tracking performance and the control input are shown in Fig. 3 and Fig. 4, respectively.

1. MB-ESO and L-ESO show similar step-reference tracking after the training phase (see the position plot of $m_{2}$ and the zoom-in plot from 126 s to 134 s in Fig. 3). Both outperform MF-ESO in terms of overshoot (0 vs. 5‰) and settling time (12 s vs. 16 s).

2. For external disturbance rejection (see the zoom-in plot from 170 s to 195 s in Fig. 3), L-ESO performs best. Revisiting (8), if the external disturbance has a linear component, a linear regression model can still capture it, e.g., the rising and falling trends of a sinusoidal external disturbance.

3. Adding external disturbance information to the observer can help reduce the required bandwidth. In our experiments, we found that MF-ESO and MB-ESO need three times more bandwidth to achieve the same performance as L-ESO.

4. The control input of L-ESO fluctuates more than those of MF-ESO and MB-ESO, as shown in Fig. 4. This is caused by the noise signal and the batch gradient descent method we chose to minimize the cost function. In this example, it can be smoothed by increasing the batch size.

V Hardware Experiments Results

We conduct physical experiments on our ECP Model 205 torsional testbed [21], see Fig. 5. It is a mechanical system that consists of a flexible vertical shaft connecting two disks, a lower disk and an upper disk. Each disk is equipped with an encoder for position measurement. A DC servo motor drives the lower disk through a belt and pulley system, which provides a 3:1 speed reduction ratio. The system can be used to study the vibration of a torsional two-mass-spring system.

A personal computer with MATLAB® Simulink Desktop Real-Time™ installed is used for computation. The computer is also equipped with a four-channel quadrature encoder input card (NI-PCI6601) and a multi-function analog and digital I/O card (NI-PCI6221). These cards interface with the torsional plant Model 205 for real-time data acquisition and control. The quadrature encoder input card enables the computer to receive position and velocity data from the encoders on the disks of the plant. The multi-function analog and digital I/O card allows the computer to send control signals to the DC servo motor that drives the lower disk.

V-A System Model

Since the MB-ESO, as a baseline approach, needs the dynamics information, we first use the MATLAB® System Identification Toolbox to obtain the transfer function: $G(s)=\frac{4.6\times 10^{4}}{s^{4}+1.901s^{3}+1683s^{2}+1812s+0.1032}$ .

V-B ESO and Controller Design

As this testbed is again a fourth-order dynamic system, the same ESO design pipeline shown before can be applied.
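To connect the identified model with the notation of Sec. II, the transfer function of Sec. V-A can be rewritten as a differential equation. This is a sketch under the assumptions that $G(s)$ maps the control input $u$ to the measured output $y$, that initial conditions are zero, and that no external disturbance is present.

```latex
y^{(4)} = -1.901\,\dddot{y} - 1683\,\ddot{y} - 1812\,\dot{y} - 0.1032\,y
          + 4.6\times 10^{4}\,u ,
\qquad\text{i.e.}\qquad
a_{3}=1.901,\; a_{2}=1683,\; a_{1}=1812,\; a_{0}=0.1032,\; b=4.6\times 10^{4}.
```

Under these assumptions, an MB-ESO would place the $a_{i}$ in the last row of $\bar{A}_{0,MB}$ as in (5), whereas an MF-ESO would lump the corresponding terms into the total disturbance as in (8).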

V-C Experiment Results

Tracking a desired trajectory for the upper disk is the control objective. A sinusoidal wave with a frequency of $\pi/2$ rad/s and an amplitude $0.5\pi$ is applied in the training phase of L-ESO. $\omega_{c}$ and $\omega_{o}$ are set to 90 rad/s and 40 rad/s, respectively. The control gain is $5.5\times 10^{4}$ . A trapezoidal profile reference with the final value $\pi$ is used.

From the results illustrated in Fig. 6 and Fig. 7, we have the following observations: 1) L-ESO has the best performance among all the methods after the training phase in terms of overshoot percentage and settling time. L-ESO may outperform MB-ESO either because the system identification is imperfect or because our approach can learn internal as well as external disturbance. 2) The fluctuation of the control input of L-ESO lies between those of MF-ESO and MB-ESO, as shown in Fig. 7, which differs from the simulation result. This is because the learning rate is conservatively chosen due to the large noise in the hardware. Also, the trapezoidal profile reference is smoother than the step reference, which is beneficial for learning.

VI CONCLUSIONS

A novel learning-enabled extended state observer (L-ESO) with the capacity to memorize and generalize from past estimated disturbances is proposed in this paper. The machine learning model is seamlessly integrated into existing disturbance rejection control architecture as a flexible add-on for boosting robustness against unknown and time-varying disturbances. Compared with existing learning-for-control frameworks, our new paradigm does not rely on access to full states. In addition, the learning is guarded by disturbance rejection, which provides an extra assurance layer to compensate for the imperfections of the machine learning model. The efficacy of the proposed approach has been supported by simulation and hardware experiments. In the future, we will further validate the approach on real robotic testbeds.

References

[1] Y. Hori, H. Iseki, and K. Sugiura, “Basic consideration of vibration suppression and disturbance rejection control of multi-inertia system using SFLAC (state feedback and load acceleration control),” IEEE Transactions on Industry Applications, vol. 30, no. 4, pp. 889–896, 1994.
[2] S. Zhao and Z. Gao, “An active disturbance rejection based approach to vibration suppression in two-inertia systems,” Asian Journal of Control, vol. 15, no. 2, pp. 350–362, 2013.
[3] Y. Wang, L. Dong, Z. Chen, M. Sun, and X. Long, “Integrated skyhook vibration reduction control with active disturbance rejection decoupling for automotive semi-active suspension systems,” Nonlinear Dynamics, pp. 1–16, 2024.
[4] J. Chen, Y. Hu, and Z. Gao, “On practical solutions of series elastic actuator control in the context of active disturbance rejection,” Advanced Control for Applications: Engineering and Industrial Systems, vol. 3, no. 2, p. e69, 2021.
[5] Q. Zheng, Z. **, S. Soares, Y. Hu, and Z. Gao, “An active disturbance rejection control approach to fan control in servers,” in 2018 IEEE Conference on Control Technology and Applications (CCTA). IEEE, 2018, pp. 294–299.
[6] J. Han, “From PID to active disturbance rejection control,” IEEE Transactions on Industrial Electronics, vol. 56, no. 3, pp. 900–906, 2009.
[7] R. Cui, L. Chen, C. Yang, and M. Chen, “Extended state observer-based integral sliding mode control for an underwater robot with unknown disturbances and uncertain nonlinearities,” IEEE Transactions on Industrial Electronics, vol. 64, no. 8, pp. 6785–6795, 2017.
[8] H. Zhang, Y. Li, Z. Li, C. Zhao, F. Gao, F. Xu, and P. Wang, “Extended-state-observer based model predictive control of a hybrid modular DC transformer,” IEEE Transactions on Industrial Electronics, vol. 69, no. 2, pp. 1561–1572, 2021.
[9] H. Zhang, S. Zhao, and Z. Gao, “An active disturbance rejection control solution for the two-mass-spring benchmark problem,” in 2016 American Control Conference (ACC). IEEE, 2016, pp. 1566–1571.
[10] C. Fu and W. Tan, “Tuning of linear ADRC with known plant information,” ISA Transactions, vol. 65, pp. 384–393, 2016.
[11] Y. Hui, R. Chi, B. Huang, and Z. Hou, “Extended state observer-based data-driven iterative learning control for permanent magnet linear motor with initial shifts and disturbances,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 51, no. 3, pp. 1881–1891, 2021.
[12] J. Wang, D. Huang, S. Fang, Y. Wang, and W. Xu, “Model predictive control for ARC motors using extended state observer and iterative learning methods,” IEEE Transactions on Energy Conversion, vol. 37, no. 3, pp. 2217–2226, 2022.
[13] J. Zhang and D. Meng, “Improving tracking accuracy for repetitive learning systems by high-order extended state observers,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
[14] P. Kicki, K. Łakomy, and K. M. B. Lee, “Tuning of extended state observer with neural network-based control performance assessment,” European Journal of Control, vol. 64, p. 100609, 2022.
[15] G. Shi, X. Shi, M. O’Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S.-J. Chung, “Neural lander: Stable drone landing control using learned dynamics,” in 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 9784–9790.
[16] B. Guo and Z. Zhao, “On the convergence of an extended state observer for nonlinear systems with uncertainty,” Systems & Control Letters, vol. 60, no. 6, pp. 420–430, 2011.
[17] W. Bai, S. Chen, Y. Huang, B. Guo, and Z. Wu, “Observers and observability for uncertain nonlinear systems: A necessary and sufficient condition,” International Journal of Robust and Nonlinear Control, vol. 29, no. 10, pp. 2960–2977, 2019.
[18] J. Chen, Z. Gao, Y. Hu, and S. Shao, “A general model-based extended state observer with built-in zero dynamics,” arXiv preprint arXiv:2208.12314, 2023.
[19] Z. Gao, “Scaling and bandwidth-parameterization based controller tuning,” in Proceedings of the 2003 American Control Conference, 2003. IEEE, 2003, pp. 4989–4996.
[20] B. Wie and D. S. Bernstein, “Benchmark problems for robust control design,” Journal of Guidance, Control, and Dynamics, vol. 15, no. 5, pp. 1057–1059, 1992.
[21] Open AI, “Safety gym,” http://www.ecpsystems.com/controls_torplant.htm [Accessed: 3-23-2024].