Physics-Guided State-Space Model Augmentation Using Weighted Regularized Neural Networks

Yuhan Liu    Roland Tóth    Maarten Schoukens Control Systems Group, Eindhoven University of Technology, Eindhoven, the Netherlands Systems and Control Laboratory, Institute for Computer Science and Control, Budapest, Hungary
Abstract

Physics-guided neural networks (PGNNs) are an effective tool that combines the benefits of data-driven modeling with the interpretability and generalization of underlying physical information. However, for a classical PGNN, the penalization of the physics-guided part is at the output level, which leads to a conservative result, as systems with highly similar state-transition functions, i.e., with only slight differences in parameters, can have significantly different time-series outputs. Furthermore, the classical PGNN cost function regularizes the model estimate over the entire state space with a constant trade-off hyperparameter. In this paper, we introduce a novel model augmentation strategy for nonlinear state-space model identification based on PGNN, using a weighted function regularization (W-PGNN). The proposed approach can efficiently augment prior physics-based state-space models based on measurement data. A new weighted regularization term is added to the cost function to penalize the difference between the state and output functions of the baseline physics-based model and the final identified model. This ensures that the estimated model follows the baseline physics model functions in regions where the data has low information content, while placing greater trust in the data where high informativity is present. The effectiveness of the proposed strategy over the current PGNN method is demonstrated on a benchmark example.

keywords:
System Identification, Physics-Guided Neural Networks, State Space
thanks: This work is funded by the European Union (ERC, COMPLETE, 101075836). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.

1 Introduction

Model-based design plays a crucial role in achieving satisfactory performance for complex dynamic systems by providing an interpretable framework that facilitates a deep understanding of system behaviors, including nonlinearities such as damping and friction. However, accurate and comprehensive models of the system dynamics derived from first-principles laws are often costly to obtain.

Nonlinear system identification (Schoukens and Ljung, 2019) is a well-established topic and can be characterized by a wide range of model classes such as state-space models (Schön et al., 2011), block-oriented models (Schoukens and Tiels, 2017), NARMAX (Billings, 2013), etc. Among these, extensive research (Verdult, 2002; Paduart et al., 2010; Schön et al., 2018) on identification with nonlinear state-space (NLSS) models has shown their flexibility in handling multi-variable systems with potentially fewer parameters. Estimation of state-space models is advantageous for the subsequent control design, given the dependency of many nonlinear control methods on such a representation of the system behavior.

Artificial neural networks (ANNs) have long been a focus of interest in the field of nonlinear system identification because of their high expressiveness, flexibility, and capability of approximating functions with arbitrary accuracy (Scarselli and Tsoi, 1998). In Suykens et al. (1995), recurrent neural networks were already employed to represent a nonlinear state-space model. This structure is referred to as a state-space neural network (SS-NN) and has been further discussed in (Amoura et al., 2011; Forgione and Piga, 2020; Schoukens, 2021; Beintema et al., 2023). Recently, Beintema et al. (2023) have introduced a computationally efficient nonlinear state-space identification algorithm based on a subspace-encoder network (SUBNET). Nonetheless, ANNs are typically black-box models that lack physical interpretation and exhibit poor generalization capabilities outside the training dataset, especially when the training data is limited. Hence, even though ANNs may exhibit improved accuracy compared with first-principles modeling, deploying such models, or controllers designed from them, in practice can be unsafe.

To address this issue, physics-guided neural networks (PGNNs) (Karpatne et al., 2017) have been introduced, also within the field of systems and control (Bolderman et al., 2024), to improve the interpretability and generalization capabilities of the estimated models. Compared with ANNs, a physics-based cost function is incorporated into the optimization objective of a PGNN, ensuring that the learned model not only achieves high accuracy on the training dataset, but also shows consistency with known physical laws in unseen regions without the need for large amounts of ground truth data.

However, there are some open technical issues with using PGNNs in nonlinear state-space model identification. First, the classical PGNN does not perform model augmentation, i.e., the prior model is only used to compute the physics-based regularization term in the cost function. Second, the classical PGNN penalizes the difference between the physics model and the identified model at the output level, which can lead to conservative estimation results. This is because systems with highly similar state-transition functions can have significantly different time-series outputs. Furthermore, the physics-based term of the classical PGNN regularizes the model difference over the entire state space, which makes this approach lose some flexibility, especially when the assumed prior model is inaccurate.

Motivated by these facts, this paper proposes an innovative PGNN-based state-space modeling strategy for nonlinear system identification, namely, W-PGNN, to efficiently complete prior physics-based state-space models with a weighted regularized SS-NN. The main contributions are as follows:

1) A new weighted-regularization cost function is designed to penalize the difference between both the state and output functions of the baseline physics-based and final identified models in regions where measured data provides little information.

2) Compared to the classical PGNNs, the proposed identification approach makes more extensive use of the pre-existing approximate model. The learned dynamics are capable of adhering to the data in regions with high information content, and preserving the behavior of the baseline physics model outside this region. This significantly enhances the flexibility of the SS-NN model.

The remainder of this paper is organized as follows. Section 2 introduces the nonlinear model class and the identification method with a state-space neural network. The classical PGNN method is discussed in Section 3. The proposed W-PGNN method is detailed in Section 4. Numerical simulation results are provided in Section 5, followed by the conclusions in Section 6.

Notation: $\mathbb{R}$ and $\mathbb{Z}$ denote the sets of real numbers and integers, respectively. The 2-norm of a vector or a matrix is denoted as $\|\cdot\|$. $\mathrm{vec}(x_1,\ldots,x_n)=[x_1^\top\ \cdots\ x_n^\top]^\top$ denotes the column-wise composition of vectors. $\mathcal{N}(0,1)$ is the standard normal distribution, while $\mathcal{U}(a,b)$ represents a uniform distribution with support from $a$ to $b$.

2 Problem Statement

2.1 Nonlinear Model Class

Consider the following discrete-time state-space model as the data-generating system:

$$
\begin{aligned}
x(k+1) &= f(x(k),u(k)),\\
y_0(k) &= g(x(k),u(k)),
\end{aligned}
\tag{1}
$$

where $u(k)\in\mathbb{R}^{n_\mathrm{u}}$ denotes the input, $x(k)\in\mathbb{R}^{n_\mathrm{x}}$ is the state, $y(k)\in\mathbb{R}^{n_\mathrm{y}}$ is the output, and $k\in\mathbb{Z}$ represents the discrete time. Additionally, $f:\mathbb{R}^{n_\mathrm{x}\times n_\mathrm{u}}\to\mathbb{R}^{n_\mathrm{x}}$ and $g:\mathbb{R}^{n_\mathrm{x}\times n_\mathrm{u}}\to\mathbb{R}^{n_\mathrm{y}}$ are bounded deterministic vector functions.
The training dataset $\mathcal{D}=\{(y(k),u(k))\}_{k=1}^{N}$ contains $N$ noisy outputs $y(k)=y_0(k)+v(k)$, collected from an experiment on (1), where the noise $v(k)$ is assumed to be a zero-mean random signal with finite variance that is independent of the input $u(k)$.

Assume that we only have access to an a priori known state-space model:

$$
\begin{aligned}
\tilde{x}(k+1) &= \tilde{f}\left(\tilde{x}(k),u(k)\right),\\
\tilde{y}_0(k) &= \tilde{g}\left(\tilde{x}(k),u(k)\right),
\end{aligned}
\tag{2}
$$

with the state $\tilde{x}(k)\in\mathbb{R}^{n_\mathrm{x}}$ and output $\tilde{y}(k)\in\mathbb{R}^{n_\mathrm{y}}$, and with the same model order as (1). Note that the functions $\tilde{f}(\cdot)$ and $\tilde{g}(\cdot)$ constitute the physically well-interpretable and a priori known dynamics of the system (1), i.e., the nominal model. However, the prior model (2) does not accurately capture the true dynamics (1). For instance, there may exist local nonlinearities in certain regions that cannot be captured by a rough identification or by modeling based on first principles. Hence, it is essential to augment this a priori known model using newly measured data through nonlinear system identification.

2.2 State-Space Neural Network Identification

To this end, we model (1) with a nonlinear discrete-time state-space model of the following structure:

$$
\begin{aligned}
\hat{x}(k+1) &= \tilde{f}\left(\hat{x}(k),u(k)\right)+f_{\theta}\left(\hat{x}(k),u(k)\right),\\
\hat{y}(k) &= \tilde{g}\left(\hat{x}(k),u(k)\right)+g_{\theta}\left(\hat{x}(k),u(k)\right),
\end{aligned}
\tag{3}
$$

where $f_{\theta}(\cdot)$ and $g_{\theta}(\cdot)$ are the completion functions that model the dynamics that cannot be captured reliably by the idealistic model (2), and are represented by fully connected feedforward neural networks with one hidden layer containing $n_n$ neurons and a linear output layer:

$$
f_{\theta}(x(k),u(k)) = W_x\,\phi\left(\begin{bmatrix} W_{fx} & W_{fu}\end{bmatrix}\begin{bmatrix} x(k)\\ u(k)\end{bmatrix}+b_f\right)+b_x
\tag{4}
$$

where $\phi(\cdot)\in\mathbb{R}^{n_\mathrm{u}\times 1}$ denotes the activation function, and $W_x$, $W_{fx}$, $W_{fu}$, $b_f$ and $b_x$ represent the weight and bias parameters of the neural network with proper dimensions. A similar representation is used for $g_{\theta}(x(k),u(k))$.
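The structure in (3) and (4) can be illustrated with a minimal sketch in plain Python. The linear prior `f_prior`, the `tanh` activation, and all numerical values are assumptions chosen for illustration; they are not taken from the paper. The block matrix $[W_{fx}\ W_{fu}]$ is stored as a single input matrix `W_in`.

```python
import math

def mlp(z, W_in, b_in, W_out, b_out):
    """One-hidden-layer network as in (4): W_out * tanh(W_in z + b_in) + b_out."""
    hidden = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
              for row, b in zip(W_in, b_in)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W_out, b_out)]

def f_prior(x, u):
    """Hypothetical linear prior model (an assumption for illustration)."""
    return [0.9 * x[0] + 0.1 * u[0]]

def augmented_step(x, u, params):
    """State update (3): prior model plus neural completion term."""
    z = x + u  # list concatenation gives the stacked vector [x; u]
    correction = mlp(z, *params)
    return [fp + c for fp, c in zip(f_prior(x, u), correction)]

# n_x = 1, n_u = 1, two hidden neurons. The output layer is initialized to
# zero, so the augmented model starts out identical to the prior model.
params = ([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.0], [[0.0, 0.0]], [0.0])
x_next = augmented_step([1.0], [2.0], params)
```

Initializing the output layer at zero is one possible design choice: training then starts from the prior model and the network only learns the discrepancy.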

As discussed in Suykens et al. (1995) and Schoukens (2021), the state-space model (3) can be written in a specific form of a recurrent neural network, i.e., an SS-NN. The parameters $\theta$ of the SS-NN can be trained by optimizing the data-based cost function over $N$ samples:

$$
\begin{aligned}
V_{\mathrm{Data}}(\theta,N) &= \frac{1}{N}\sum_{k=1}^{N}\left\|y(k)-\hat{y}(k|\theta)\right\|^{2},\\
\hat{\theta} &= \arg\min_{\theta} V_{\mathrm{Data}}(\theta,N),
\end{aligned}
\tag{5}
$$

where $\hat{y}(k|\theta)$ is the simulated output of the model (3) given the parameter vector $\theta$. More detailed discussions of the SS-NN are provided in Section 4.
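A minimal sketch of how the simulation-error cost (5) could be evaluated is given below for a scalar model. The model functions, the input sequence, and the zero initial state `x0` are hypothetical; in particular, the text does not specify at this point how the initial state is chosen.

```python
def simulate(f, g, x0, u_seq):
    """Free-run simulation: returns the simulated outputs for k = 1..N."""
    x, y_hat = x0, []
    for u in u_seq:
        y_hat.append(g(x, u))  # output equation at the current state
        x = f(x, u)            # state update, fed back without measurements
    return y_hat

def v_data(y_meas, y_hat):
    """Mean squared simulation error as in (5), for scalar outputs."""
    return sum((y - yh) ** 2 for y, yh in zip(y_meas, y_hat)) / len(y_meas)

# Hypothetical scalar model: x(k+1) = 0.5 x(k) + u(k), y(k) = x(k).
f = lambda x, u: 0.5 * x + u
g = lambda x, u: x

u_seq = [1.0, 0.0, 0.0]
y_hat = simulate(f, g, 0.0, u_seq)     # [0.0, 1.0, 0.5]
cost = v_data([0.0, 1.0, 1.0], y_hat)  # (0 + 0 + 0.25) / 3
```

Because the simulated state is propagated without using measured outputs, errors in the state-transition function accumulate over time, which is why a simulation-based cost is sensitive to small parameter changes.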

From (5), it is clear that the ANN simply learns the mapping between the system input and output data without considering any prior knowledge about the underlying physics. This makes it difficult for the ANN to generalize well outside the training region, especially when the dataset is limited.

3 Classical PGNN for System Identification

In this section, we briefly introduce the concept of classical PGNN. Compared with the baseline ANN approach, there is an additional regularization term in the cost function to force the learnt model to follow the prior model even outside the training region.

The classical PGNN is trained by minimizing the following cost function:

$$
\begin{aligned}
V(\theta,N,\bar{N}) &= V_{\mathrm{Data}}(\theta,N)+\gamma V_{\mathrm{Phy}}(\theta,\bar{N}),\\
\hat{\theta} &= \arg\min_{\theta} V(\theta,N,\bar{N}),
\end{aligned}
\tag{6}
$$

where $V_{\mathrm{Data}}(\theta,N)$ is given by (5), and the physics-based penalty term $V_{\mathrm{Phy}}(\theta,\bar{N})$ is given by:

$$
V_{\mathrm{Phy}}(\theta,\bar{N})=\frac{1}{\bar{N}}\sum_{k=1}^{\bar{N}}\left\|\tilde{\bar{y}}(k)-\hat{\bar{y}}(k|\theta)\right\|^{2}
\tag{7}
$$

where $\tilde{\bar{y}}(k)$ and $\hat{\bar{y}}(k|\theta)$ are the output responses of the a priori known model (2) and the simulated model (3) to the regularization input signal $\bar{u}(k)$, respectively, and $\gamma\in\mathbb{R}_{>0}$ is the constant trade-off hyperparameter that balances the data-fitting term and the regularization term in the overall cost. In this way, the prior model (2) is embedded in the trained ANN. Note that the physics-based cost $V_{\mathrm{Phy}}(\theta,\bar{N})$ does not rely on the measurement $y(k)$ from system (1). It is evaluated over a separate regularization dataset $\mathcal{D}_{\mathrm{Reg}}=\{\tilde{\bar{y}}(k),\bar{u}(k)\}_{k=1}^{\bar{N}}$ generated by the user using the baseline physics model (2).
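The classical PGNN cost (6)-(7) can be sketched as follows for a scalar model. The function names, the example models, and the zero initial state are illustrative assumptions; the sketch only shows how the two terms are assembled, not how the cost is minimized.

```python
def simulate(f, g, x0, u_seq):
    """Free-run simulation of a scalar state-space model."""
    x, y = x0, []
    for u in u_seq:
        y.append(g(x, u))
        x = f(x, u)
    return y

def mse(a, b):
    """Mean squared difference between two scalar sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def pgnn_cost(model, prior, y_meas, u_train, u_reg, gamma, x0=0.0):
    """Cost (6): data fit (5) plus gamma times the output-level penalty (7)."""
    v_data = mse(y_meas, simulate(*model, x0, u_train))
    y_reg = simulate(*prior, x0, u_reg)  # regularization outputs from the prior model
    v_phy = mse(y_reg, simulate(*model, x0, u_reg))
    return v_data + gamma * v_phy

# Hypothetical prior model and a candidate model with a slightly different pole.
prior = (lambda x, u: 0.5 * x + u, lambda x, u: x)
candidate = (lambda x, u: 0.6 * x + u, lambda x, u: x)

y_meas, u_train, u_reg = [0.0, 1.0, 0.5], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]
cost_prior = pgnn_cost(prior, prior, y_meas, u_train, u_reg, gamma=0.5)
cost_candidate = pgnn_cost(candidate, prior, y_meas, u_train, u_reg, gamma=0.5)
```

Note that the regularization outputs require no experiment on the true system: they are produced entirely by simulating the prior model on the user-chosen input `u_reg`.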

As can be seen in (6), the penalization of the physics-guided part is at the output level. However, this can lead to a conservative result, as systems with highly similar state-transition functions, i.e., with only slight differences in parameters, can have significantly different time-series outputs. A good example is given by two mass-spring-damper systems with slightly different resonance frequencies, as shown in Fig. 1. Furthermore, the classical PGNN cost function regularizes the model estimate over the whole state space, which means that the a priori known model is assumed to hold equally for any unseen region. Although this feature gives the classical PGNN better generalization performance than the baseline NN, in an ideal setting we would like to trust the information in the data when high informativity is present, and to follow the prior model in regions where the data provides little information, i.e., to preserve the behavior of the a priori known model there.

Figure 1: Frequency and time responses of two mass-spring-damper systems with only $10\%$ uncertainties in the system parameters. It can be observed that with a slight shift of resonance frequencies, the outputs are significantly different.
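The effect described above can be reproduced with a small numerical sketch. The parameter values (unit mass, light damping, a 10% stiffness difference) and the choice of a free response from a unit initial displacement are assumptions made for illustration; they are not the settings used to generate Fig. 1.

```python
import math

def free_response(m, c, k, t):
    """Displacement of an underdamped mass-spring-damper released from
    x(0) = 1 with zero initial velocity (closed-form solution)."""
    alpha = c / (2.0 * m)                    # decay rate
    omega_d = math.sqrt(k / m - alpha ** 2)  # damped natural frequency
    return math.exp(-alpha * t) * (math.cos(omega_d * t)
                                   + alpha / omega_d * math.sin(omega_d * t))

times = [0.01 * i for i in range(10001)]  # 0 s to 100 s
x_nominal = [free_response(1.0, 0.02, 1.0, t) for t in times]
x_perturbed = [free_response(1.0, 0.02, 1.1, t) for t in times]  # stiffness +10%

# Largest pointwise gap between the two time responses.
max_gap = max(abs(a - b) for a, b in zip(x_nominal, x_perturbed))
```

Although the two systems differ only by 10% in one parameter, the slight frequency mismatch lets the responses drift out of phase, so `max_gap` grows to the same order as the initial displacement. An output-level penalty would therefore treat these two nearly identical systems as very different.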

4 Weighted PGNN Method

4.1 Weighted function regularization

Unlike other model augmentation strategies, for instance (Hoekstra et al., 2024), our approach aims to regularize the state-space neural network estimate using a reference model and to penalize the difference between the physics-based and identified models at both the state and output levels. Moreover, the regularization should only be active in regions where no data is present, i.e., the reference model prescribes the dynamics that the learned model should fall back to outside the training area.

The proposed approach starts by generating a surrogate input sequence $\bar{u}$ of length $\bar{N}$. It is worth noting that $\bar{u}$ is not applied to the true system during optimization to acquire output measurements, but plays a role in the regularization of the proposed approach. Ideally, $\bar{u}$ should cover the full range of operation of the system. Then, the model estimate is evaluated by applying both $u$ and $\bar{u}$ to the model (3), where the second input sequence results in the state sequence $\hat{\bar{x}}$.

Figure 2: Model structure of the physics-based SS-NN.

The cost function for the proposed W-PGNN is given by:

$$
V(\theta,N,\bar{N})=V_{\mathrm{Data}}(\theta,N)+V_{\mathrm{Reg}}(\theta,w,\bar{N})
\tag{8}
$$

where the novel weighted regularization term is given by:

$$
V_{\mathrm{Reg}}(\theta,w,\bar{N})\triangleq\frac{1}{\bar{N}}\sum_{j=1}^{\bar{N}}w_j\left(\gamma_x e^{x}_{j}+\gamma_y e^{y}_{j}\right)
\tag{9}
$$

where $e^{x}_{j}\triangleq\left\|f_{\theta}(\hat{\bar{x}}(j),\bar{u}(j))\right\|^{2}$ and $e^{y}_{j}\triangleq\left\|g_{\theta}(\hat{\bar{x}}(j),\bar{u}(j))\right\|^{2}$. The weight vector $w\in\mathbb{R}^{\bar{N}\times 1}$ is defined as:

$$
\begin{aligned}
w_j &\triangleq \frac{1}{\sum_{k=1}^{N}h_k(\bar{z}_j)+\epsilon},\\
h_k(\bar{z}_j) &= \exp\left(-\frac{\|\hat{z}_k-\bar{z}_j\|^{2}}{2\sigma^{2}}\right),
\end{aligned}
\tag{10}
$$

with the state-input pairs $\hat{z}_k=(\hat{x}(k),u(k))$ and $\bar{z}_j=(\hat{\bar{x}}(j),\bar{u}(j))$. Furthermore, $\hat{x}(k)$ and $\hat{\bar{x}}(j)$ denote the responses of the estimated model (3) to the training input $u$ and the regularization input $\bar{u}$, respectively. $\sigma$ represents the width of the kernel $h_k(\cdot)$, and $\epsilon$ is a small constant. Additionally, if the weight $w_j$ is set to $1$ for all $j=1,\ldots,\bar{N}$, then (8) corresponds to a classical PGNN with state-level regularization; if the weight $w_j$ is set to $0$ for all $j=1,\ldots,\bar{N}$, then the cost function (8) falls back completely to the baseline (5).
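The weights in (10) and the regularization term (9) can be computed as in the following sketch. The sample points and hyperparameter values are hypothetical, and the per-sample completion magnitudes `e_x` and `e_y` are passed in as plain numbers rather than evaluated from a network.

```python
import math

def kernel_weights(z_train, z_reg, sigma, eps):
    """Weights (10): small near training points, close to 1/eps far from them."""
    weights = []
    for zj in z_reg:
        # Sum of Gaussian kernels centered at the training state-input pairs.
        density = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(zk, zj)) / (2.0 * sigma ** 2))
            for zk in z_train)
        weights.append(1.0 / (density + eps))
    return weights

def v_reg(weights, e_x, e_y, gamma_x, gamma_y):
    """Weighted regularization term (9)."""
    return sum(w * (gamma_x * ex + gamma_y * ey)
               for w, ex, ey in zip(weights, e_x, e_y)) / len(weights)

# Two training pairs near the origin; one regularization pair on top of the
# data and one far away from it.
z_train = [(0.0, 0.0), (0.1, 0.0)]
z_reg = [(0.0, 0.0), (5.0, 5.0)]
w = kernel_weights(z_train, z_reg, sigma=0.5, eps=0.01)
penalty = v_reg(w, e_x=[1.0, 1.0], e_y=[0.0, 0.0], gamma_x=1.0, gamma_y=1.0)
```

In this example the regularization point that coincides with a training sample receives a weight below 1, while the distant point receives a weight close to `1/eps`, so the same completion magnitude is penalized far more heavily away from the data.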

One can observe from (10) that the further away the current regularization state-input pair $(\hat{\bar{x}}(j),\bar{u}(j))$ is from the training dataset, the larger the cost $V_{\mathrm{Reg}}(\theta,w,\bar{N})$ will be. This pushes $f_{\theta}$ and $g_{\theta}$ towards zero, consequently bringing the identified model closer to the prior model in that region of the joint input-state space. Additionally, the terms $e^{x}_{j}$ and $e^{y}_{j}$ share the same weight because both the state and the output function estimates depend on the same state-input pair. Note that the proposed cost function does not penalize the difference between the outputs of the a priori known model and the estimated model; rather, it penalizes the difference between the state and output functions of the two models in regions where little information is provided by the measured data. As a consequence, the regularization state-input pairs $(\hat{\bar{x}}(j),\bar{u}(j))$ should cover the full intended range of operation of the completed model for an effective regularized model augmentation through the proposed approach.
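The behavior of the weights can be made explicit with a short worked bound that follows directly from (10). The distance symbol $d$ is introduced here only for illustration.

```latex
% Each kernel value satisfies 0 < h_k(\bar{z}_j) \le 1, hence
\frac{1}{N+\epsilon} \;\le\; w_j \;<\; \frac{1}{\epsilon}.

% If \bar{z}_j coincides with at least one training pair \hat{z}_k,
% then \sum_k h_k(\bar{z}_j) \ge 1 and
w_j \;\le\; \frac{1}{1+\epsilon} \;<\; 1.

% If \|\hat{z}_k - \bar{z}_j\| \ge d for all k = 1,\ldots,N, then
\sum_{k=1}^{N} h_k(\bar{z}_j) \;\le\; N\exp\!\left(-\frac{d^{2}}{2\sigma^{2}}\right)
\quad\Longrightarrow\quad
w_j \;\ge\; \frac{1}{N\exp\!\left(-\frac{d^{2}}{2\sigma^{2}}\right)+\epsilon}
\;\xrightarrow[d\to\infty]{}\; \frac{1}{\epsilon}.
```

Under these bounds, $\epsilon$ sets the maximum penalty applied far from the data, while $\sigma$ controls how quickly the weight rises as a regularization point moves away from the training samples.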

4.2 Implementation

The overall structure of the physics-based SS-NN is illustrated in Fig. 2. The SS-NN architecture comprises two main components: the physics state/output layer, which represents the prior known model $\tilde{f}(\cdot)$ and $\tilde{g}(\cdot)$ given in (2), and the state/output completion layer, which estimates the unknown dynamics $f_\theta(\cdot)$ and $g_\theta(\cdot)$ given in (3). Note that the prior model (2) must have the same state dimension as the estimated model (3), which is a limitation of the proposed approach.

Training: The hyperparameters of the classical PGNN, $\gamma$, and of the proposed W-PGNN, $\gamma_x$, $\gamma_y$, $\sigma$, $\epsilon$, are determined by a grid search on a validation dataset. In particular, the selection of $\sigma$ depends on the density of the data distribution; for instance, sparsely distributed data can necessitate a larger $\sigma$. The weight and bias parameters $\theta = \mathrm{vec}(W_x, W_{fx}, W_{fu}, W_y, W_{gx}, W_{gu}, b_f, b_x, b_g, b_y)$ of the SS-NN are trained by minimizing the cost function (8) via gradient-based approaches. Several optimization algorithms are available for this problem, such as quasi-Newton (Fletcher and Powell, 1963) and conjugate gradient (Fletcher and Reeves, 1964) methods. In this paper, the Levenberg-Marquardt algorithm (Levenberg, 1944) is employed to minimize (8). All algorithms are implemented using the Matlab Deep Learning Toolbox and the Matlab Optimization Toolbox.

Initialization of model completion layer: Due to the use of the Levenberg-Marquardt optimization algorithm, an initial guess of the parameter values is required. We adopt the method in Schoukens (2021) to intuitively initialize the weight and bias parameters of the model completion layer, i.e., an explicit linear approximation is introduced:

$$
\begin{aligned}
f_\theta(x(k), u(k)) = {} & A_\theta x(k) + B_\theta u(k) \\
& + \tilde{W}_x \phi\left( \begin{bmatrix} \tilde{W}_{fx} & \tilde{W}_{fu} \end{bmatrix} \begin{bmatrix} x(k) \\ u(k) \end{bmatrix} + \tilde{b}_f \right) + \tilde{b}_x
\end{aligned} \tag{11}
$$

$$
\begin{aligned}
g_\theta(x(k), u(k)) = {} & C_\theta x(k) + D_\theta u(k) \\
& + \tilde{W}_y \phi\left( \begin{bmatrix} \tilde{W}_{gx} & \tilde{W}_{gu} \end{bmatrix} \begin{bmatrix} x(k) \\ u(k) \end{bmatrix} + \tilde{b}_g \right) + \tilde{b}_y
\end{aligned} \tag{12}
$$

which leaves considerable flexibility in initializing the weights and biases of the nonlinear layers. The weights and biases of the linear layer are initialized as $A_\theta = B_\theta = C_\theta = D_\theta = 0$ and $\tilde{b}_x = \tilde{b}_y = 0$. Additionally, the weights and biases of the nonlinear layer are initialized as $\tilde{W}_x = \tilde{W}_y = 0$, while $\tilde{W}_{fx}, \tilde{W}_{fu}, \tilde{W}_{gx}, \tilde{W}_{gu}, \tilde{b}_f, \tilde{b}_g$ are drawn randomly from $\mathcal{U}(-1, 1)$. With this choice, $f_\theta$ and $g_\theta$ in (11) and (12) are identically zero at the start, so the initial model behaves like the a priori provided physics model. During the optimization, the weights $\tilde{W}_x$ and $\tilde{W}_y$ become nonzero, which activates the model completion part of the model.
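The effect of this initialization can be checked with a minimal NumPy sketch of the completion function (11). The Gaussian form of the activation, the dictionary layout, and the helper names `init_completion` and `f_theta` are assumptions made for illustration; only the initial values follow the description above.

```python
import numpy as np

def rbf(s):
    # ASSUMPTION: Gaussian radial basis activation phi(s) = exp(-s^2)
    return np.exp(-s**2)

def init_completion(nx, nu, n_neurons, rng):
    """Initial parameters of f_theta in (11): zero linear part, zero output weights."""
    return {
        "A": np.zeros((nx, nx)),
        "B": np.zeros((nx, nu)),
        "W_x": np.zeros((nx, n_neurons)),
        "b_x": np.zeros(nx),
        # Hidden-layer weights and biases drawn from U(-1, 1)
        "W_fx": rng.uniform(-1.0, 1.0, (n_neurons, nx)),
        "W_fu": rng.uniform(-1.0, 1.0, (n_neurons, nu)),
        "b_f": rng.uniform(-1.0, 1.0, n_neurons),
    }

def f_theta(p, x, u):
    """Evaluate (11) for a single state x (shape (nx,)) and input u (shape (nu,))."""
    hidden = rbf(p["W_fx"] @ x + p["W_fu"] @ u + p["b_f"])
    return p["A"] @ x + p["B"] @ u + p["W_x"] @ hidden + p["b_x"]

rng = np.random.default_rng(0)
params = init_completion(nx=1, nu=1, n_neurons=20, rng=rng)
out_init = f_theta(params, np.array([0.3]), np.array([-0.5]))
```

Because the output weights start at zero while the hidden layer is random, the completion term contributes nothing initially, yet the gradient with respect to `W_x` is nonzero, which is what allows the optimizer to activate the completion layer.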

5 Simulation Study

In this section, simulation results are presented to illustrate the effectiveness of the proposed W-PGNN approach. A one-dimensional example is used to validate the superior learning performance of the proposed W-PGNN approach over the baseline and classical PGNN approaches. Consider the SISO system:

$$
\begin{aligned}
x(k+1) &= a x(k) + b u(k) + \Delta(x(k)), \\
y_0(k) &= x(k)
\end{aligned} \tag{13}
$$

with $a = 0.8187$ and $b = 0.1813$. The function $\Delta(x(k))$ is defined as:

$$
\Delta(x) = 0.2\left(e^{-\frac{x^2}{l^2}} - e^{-\frac{(x-c)^2}{l^2}}\right) \tag{14}
$$

with $c = -0.3$ and $l = 0.2$, which represents a local nonlinearity that cannot be expressed by the given baseline physics model. The augmentation structure (3) is then given in terms of the prior physics model $\tilde{f}(x(k), u(k)) = a x(k) + b u(k)$, $\tilde{g}(x(k), u(k)) = x(k)$ and the completion function $f_\theta(x)$, which aims to identify $\Delta(x(k))$, while $g_\theta = 0$ in this case. Thus, the goal is to augment the prior model $\tilde{f}(x(k), u(k))$ with a well-estimated $f_\theta$ using the proposed W-PGNN approach.
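As a concrete reference, the benchmark system (13) with the nonlinearity (14) can be simulated in a few lines. This is a sketch under stated assumptions: the initial state $x(0) = 0$ and the helper names `delta`, `f_prior`, and `simulate` are not given in the text and are chosen here for illustration.

```python
import numpy as np

A_TRUE, B_TRUE = 0.8187, 0.1813   # a and b in (13)
C_SHIFT, L_WIDTH = -0.3, 0.2      # c and l in (14)

def delta(x):
    """Local nonlinearity (14) that the prior model cannot express."""
    return 0.2 * (np.exp(-x**2 / L_WIDTH**2)
                  - np.exp(-(x - C_SHIFT)**2 / L_WIDTH**2))

def f_prior(x, u):
    """Linear prior physics model: a*x + b*u."""
    return A_TRUE * x + B_TRUE * u

def simulate(u, x0=0.0, augment=delta):
    """Noise-free output y0(k) = x(k) of (13); x0 = 0 is an ASSUMPTION."""
    x = np.empty(len(u) + 1)
    x[0] = x0
    for k, uk in enumerate(u):
        x[k + 1] = f_prior(x[k], uk) + augment(x[k])
    return x[:-1]

# True system versus prior-only model for the same step input
u_step = np.ones(200)
y_true = simulate(u_step)
y_prior = simulate(u_step, augment=lambda x: 0.0)
```

Since $a + b = 1$, the prior model has unit static gain, so `y_prior` settles at the input level. The nonlinearity (14) vanishes at $x = c/2 = -0.15$ and decays away from the two Gaussian bumps, which is why it only matters in a limited region of the state space.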

Figure 3: Illustration of the training, regularization, and test input and output signals used in the simulation.
Figure 4: Estimation results under the considered approaches in terms of $f_\theta(x) + ax$ (top), and the absolute value of the estimation error of $f_\theta(x)$ over $x$ (bottom).

For the SISO system (13), the training input is selected as $u_{\mathrm{train}}(k) = \sin(0.15k) - 0.2$ with $N = 200$ samples to generate the training dataset $\mathcal{D}$. The regularization input signal is a concatenation of the signals $\bar{u}(k) = 8\sin(0.2k)$ and $\bar{u}(k) = (8 + (2/500)k)\,\mathcal{N}(0, 1)$, each of length 500, leading to a $\mathcal{D}_{\mathrm{Reg}}$ of size $\bar{N} = 1000$. In addition, the test input signal is selected as $u_{\mathrm{test}}(k) = \sin(0.01k + 0.5) + \sin(0.02k - 0.1) - 2\sin(0.03k + 0.2)$ with $N = 500$ samples, which explores a much larger region of the input-output space than the training dataset. Note that noise is present only at the output of the system, with an SNR of approximately 40 dB. The aforementioned signals are visualized in Fig. 3, which shows that the training dataset is significantly less informative than the test dataset. This is in line with the model augmentation philosophy of this work: as an adequate prior model is already in place, we only want to augment this model using a simple dataset dedicated to a particular region.
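The three input signals and the output noise level described above can be generated as follows. Several details are assumptions not fixed by the text: the time index starts at $k = 0$, the index restarts for the second regularization segment, the noise is white and Gaussian, and the SNR is taken as the ratio of mean signal power to noise variance. The helper name `add_output_noise` is hypothetical.

```python
import numpy as np

def add_output_noise(y0, snr_db, rng):
    """Add white Gaussian noise to a noise-free output at a given SNR in dB.

    ASSUMPTION: SNR = mean signal power / noise variance.
    """
    noise_power = np.mean(y0**2) / 10 ** (snr_db / 10)
    return y0 + rng.normal(0.0, np.sqrt(noise_power), size=y0.shape)

rng = np.random.default_rng(0)

# Training input, N = 200 samples
k_train = np.arange(200)
u_train = np.sin(0.15 * k_train) - 0.2

# Regularization input: sine segment followed by a growing-amplitude noise segment
k_reg = np.arange(500)
u_reg = np.concatenate([
    8 * np.sin(0.2 * k_reg),
    (8 + (2 / 500) * k_reg) * rng.standard_normal(500),
])

# Test input, N = 500 samples
k_test = np.arange(500)
u_test = (np.sin(0.01 * k_test + 0.5) + np.sin(0.02 * k_test - 0.1)
          - 2 * np.sin(0.03 * k_test + 0.2))
```

The regularization input is deliberately much larger in amplitude than the training input, so that the regularization state-input pairs cover the intended operating range rather than only the region visited during training.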

To construct the NN model, a radial basis function is chosen as the activation function because of its universal approximation capability. A total of 20 neurons ($n_n = 20$) are used in the state/output completion layer. Moreover, to determine the most suitable hyperparameters for the classical PGNN and the proposed W-PGNN, a grid search is conducted on the validation dataset $\mathcal{D}_{\mathrm{Val}}$, which is generated by the validation input signal $u_{\mathrm{Val}}(k) = 1.08\sin(0.15k) - 0.2$ for the classical PGNN, and $u_{\mathrm{Val}}(k) = u_{\mathrm{train}}(k)$ for the W-PGNN. Both signals are 500 samples long. The resulting hyperparameters are $\gamma = 10^{-3}$, $\gamma_x = \gamma_y = 10^{-4}$, $\sigma = \sqrt{0.001}$, and $\epsilon = 0.1$. All three approaches are then trained on the datasets $\mathcal{D}$ and $\mathcal{D}_{\mathrm{Reg}}$, with the parameters $\hat{\theta}$ optimized by the Levenberg-Marquardt algorithm, as mentioned in Section 4.2.

Figure 5: Test results for the augmented state-space model by the three approaches on the test dataset. The proposed W-PGNN shows the best estimation results both within and outside the training region.

The estimation results in terms of $f_\theta(x) + ax$ and the absolute value of the estimation error $\Delta(x) - f_\theta(x)$ are depicted in Fig. 4, where the shaded area indicates the training data region and the black dots represent the linear prior model. All three approaches capture the true model well inside the training region; however, the baseline NN approach generalizes poorly to unseen data. Both the classical PGNN and the proposed W-PGNN approaches show good learning results outside the training region. Nevertheless, the performance of the proposed W-PGNN approach is approximately 20% better than that of the classical PGNN (see also Table 1). This mainly results from the novel weighted physics-based regularization term in the cost function, which enables the learned model to follow the ground truth within the range of the training data and forces it toward the linear prior model where the data is less informative. This can also be seen in Fig. 5, where the zoomed-in sub-figures show the estimated trajectories inside and outside the training region, respectively. Although the test dataset covers a much larger region than the training dataset, the proposed W-PGNN is still able to identify the system over the whole state space with the highest estimation accuracy.

Furthermore, a Monte Carlo simulation with 10 runs under random initial parameters is conducted to compare the estimation error of the three approaches. To assess the simulation performance of the identified models, the following root mean squared error (RMSE) on the test dataset is utilized:

$$
e_{\mathrm{RMSE}} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(y(k) - \hat{y}(k \mid \theta)\right)^2} \tag{15}
$$
Table 1: Quantitative evaluation of the performance of the three approaches in terms of their RMSE on the training and test datasets.

| Approach | $e_{\mathrm{RMSE}}$ ($\times 10^{-2}$), training set | $e_{\mathrm{RMSE}}$ ($\times 10^{-2}$), test set |
|---|---|---|
| Baseline | 0.241 ± 0.027 | 199.7 ± 170.3 |
| Classical PGNN | 0.206 ± 0.024 | 5.791 ± 2.810 |
| W-PGNN | 0.209 ± 0.021 | **4.303 ± 1.503** |

Table 1 reports the RMSE and its variability for the three considered approaches on the training and test datasets over 10 runs. All three approaches reach a comparable training RMSE, while the proposed W-PGNN achieves the lowest test RMSE with the smallest variability, indicating better generalization to unseen data than the baseline NN and the classical PGNN.
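For reference, the metric (15) and a per-approach summary over repeated runs can be computed as sketched below. The text does not state how the variability after the ± sign is defined, so the use of the standard deviation across runs is an assumption, and the function names `rmse` and `summarize_runs` are hypothetical.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error (15) between measured and simulated outputs."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def summarize_runs(run_errors):
    """Mean and spread of the RMSE over Monte Carlo runs.

    ASSUMPTION: the spread is the standard deviation across runs.
    """
    e = np.asarray(run_errors, dtype=float)
    return e.mean(), e.std()

# Toy check with made-up numbers (not results from the paper)
example_error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_mean, example_spread = summarize_runs([1.0, 3.0])
```

Here `y_hat` is meant to be the simulated output of the identified model on the test input, so the metric reflects simulation performance rather than one-step-ahead prediction.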

6 Conclusion

A novel PGNN-based model completion strategy is proposed in this paper for nonlinear state-space model identification. Specifically, we enhance the interpretability and generalization performance of the classical PGNN by introducing a weighted function regularization strategy, i.e., the W-PGNN. A new weighted regularization cost function is presented to penalize the difference between the physics-based and identified models at both the state and output function levels in regions with low information content. The proposed strategy provides new perspectives on the fusion of physics-guided and black-box data-driven modeling approaches, especially in cases where the available data is limited. The effectiveness of the W-PGNN has been demonstrated by numerical simulations and compared with the baseline NN and classical PGNN approaches. Future work will focus on extending the proposed W-PGNN method to more complex and larger benchmarks.

References

  • Amoura et al. (2011) Amoura, K., Wira, P., and Djennoune, S. (2011). A state-space neural network for modeling dynamical nonlinear systems. In Proc. of the International Conference on Neural Computation Theory and Applications, 369–376.
  • Beintema et al. (2023) Beintema, G.I., Schoukens, M., and Tóth, R. (2023). Deep subspace encoders for nonlinear system identification. Automatica, 156, 111210.
  • Billings (2013) Billings, S.A. (2013). Nonlinear system identification: NARMAX methods in the time, frequency, and spatio-temporal domains. John Wiley & Sons.
  • Bolderman et al. (2024) Bolderman, M., Butler, H., Koekebakker, S., van Horssen, E., Kamidi, R., Spaan-Burke, T., Strijbosch, N., and Lazar, M. (2024). Physics-guided neural networks for feedforward control with input-to-state-stability guarantees. Control Engineering Practice, 145, 105851.
  • Fletcher and Reeves (1964) Fletcher, R. and Reeves, C.M. (1964). Function minimization by conjugate gradients. The computer journal, 7(2), 149–154.
  • Fletcher and Powell (1963) Fletcher, R. and Powell, M.J. (1963). A rapidly convergent descent method for minimization. The computer journal, 6(2), 163–168.
  • Forgione and Piga (2020) Forgione, M. and Piga, D. (2020). Model structures and fitting criteria for system identification with neural networks. In Proc. of the 14th International Conference on Application of Information and Communication Technologies, 1–6.
  • Hoekstra et al. (2024) Hoekstra, J.H., Verhoek, C., Tóth, R., and Schoukens, M. (2024). Learning-based model augmentation with LFRs. arXiv preprint arXiv:2404.01901.
  • Karpatne et al. (2017) Karpatne, A., Watkins, W., Read, J., and Kumar, V. (2017). Physics-guided neural networks (PGNN): An application in lake temperature modeling. arXiv preprint arXiv:1710.11431, 2.
  • Levenberg (1944) Levenberg, K. (1944). A method for the solution of certain non-linear problems in least squares. Quarterly of applied mathematics, 2(2), 164–168.
  • Paduart et al. (2010) Paduart, J., Lauwers, L., Swevers, J., Smolders, K., Schoukens, J., and Pintelon, R. (2010). Identification of nonlinear systems using polynomial nonlinear state space models. Automatica, 46(4), 647–656.
  • Scarselli and Tsoi (1998) Scarselli, F. and Tsoi, A.C. (1998). Universal approximation using feedforward neural networks: A survey of some existing methods, and some new results. Neural networks, 11(1), 15–37.
  • Schön et al. (2018) Schön, T.B., Svensson, A., Murray, L., and Lindsten, F. (2018). Probabilistic learning of nonlinear dynamical systems using sequential Monte Carlo. Mechanical systems and signal processing, 104, 866–883.
  • Schön et al. (2011) Schön, T.B., Wills, A., and Ninness, B. (2011). System identification of nonlinear state-space models. Automatica, 47(1), 39–49.
  • Schoukens and Ljung (2019) Schoukens, J. and Ljung, L. (2019). Nonlinear system identification: A user-oriented road map. IEEE Control Systems Magazine, 39(6), 28–99.
  • Schoukens (2021) Schoukens, M. (2021). Improved initialization of state-space artificial neural networks. In Proc. of the European Control Conference, 1913–1918.
  • Schoukens and Tiels (2017) Schoukens, M. and Tiels, K. (2017). Identification of block-oriented nonlinear systems starting from linear approximations: A survey. Automatica, 85, 272–292.
  • Suykens et al. (1995) Suykens, J.A., De Moor, B.L., and Vandewalle, J. (1995). Nonlinear system identification using neural state space models, applicable to robust control design. International Journal of Control, 62(1), 129–152.
  • Verdult (2002) Verdult, V. (2002). Nonlinear system identification: a state-space approach. Ph.D. thesis, University of Twente, The Netherlands.