Physics-Guided State-Space Model Augmentation Using Weighted Regularized Neural Networks
Abstract
Physics-guided neural networks (PGNNs) are an effective tool that combines the benefits of data-driven modeling with the interpretability and generalization of underlying physical information. However, for a classical PGNN, the penalization of the physics-guided part is at the output level, which leads to a conservative result, as systems with highly similar state-transition functions, i.e., only slight differences in parameters, can have significantly different time-series outputs. Furthermore, the classical PGNN cost function regularizes the model estimate over the entire state space with a constant trade-off hyperparameter. In this paper, we introduce a novel model augmentation strategy for nonlinear state-space model identification based on PGNN, using a weighted function regularization (W-PGNN). The proposed approach can efficiently augment prior physics-based state-space models based on measurement data. A new weighted regularization term is added to the cost function to penalize the difference between the state and output functions of the baseline physics-based model and the final identified model. This ensures that the estimated model follows the baseline physics model functions in regions where the data has low information content, while placing greater trust in the data where it is highly informative. The effectiveness of the proposed strategy over the current PGNN method is demonstrated on a benchmark example.
keywords:
System Identification, Physics-Guided Neural Networks, State Space
1 Introduction
Model-based design plays a crucial role in achieving satisfactory performance for complex dynamic systems by providing an interpretable framework that facilitates a deep understanding of system behaviors, including nonlinearities such as damping and friction. However, accurate and comprehensive models of the system dynamics derived from first-principles laws are often costly to obtain.
Nonlinear system identification (Schoukens and Ljung, 2019) is a well-established topic that spans a wide range of model classes, such as state-space models (Schön et al., 2011), block-oriented models (Schoukens and Tiels, 2017), and NARMAX models (Billings, 2013). Among these, extensive research (Verdult, 2002; Paduart et al., 2010; Schön et al., 2018) on identification with nonlinear state-space (NLSS) models has shown their flexibility in handling multivariable systems with potentially fewer parameters. Estimating state-space models is also advantageous for subsequent control design, since many nonlinear control methods depend on such a representation of the system behavior.
Artificial neural networks (ANNs) have long been a focus of interest in the field of nonlinear system identification because of their high expressiveness, flexibility, and capability of approximating functions with arbitrary accuracy (Scarselli and Tsoi, 1998). Recurrent neural networks were already employed to represent a nonlinear state-space model in Suykens et al. (1995). This structure is referred to as a state-space neural network (SS-NN) and has been further discussed in (Amoura et al., 2011; Forgione and Piga, 2020; Schoukens, 2021; Beintema et al., 2023). Recently, Beintema et al. (2023) introduced a computationally efficient nonlinear state-space identification algorithm based on a subspace-encoder network (SUBNET). Nonetheless, ANNs are typically black-box models that lack physical interpretation and exhibit poor generalization capabilities outside the training dataset, especially when the training data is limited. Hence, even though ANNs may exhibit improved accuracy compared with first-principles modeling, deploying such models, or controllers designed with them, in practice can be unsafe.
To address this issue, physics-guided neural networks (PGNNs) (Karpatne et al., 2017) have been introduced, also within the field of systems and control (Bolderman et al., 2024), to ensure the interpretability and generalization capabilities of the estimated models. Compared with plain ANNs, a PGNN incorporates a physics-based cost function into the optimization objective, ensuring that the learned model not only achieves high accuracy on the training dataset, but also remains consistent with known physical laws in unseen regions without the need for large amounts of ground-truth data.
However, there are some open technical issues with using PGNNs in nonlinear state space model identification. First, the classical PGNN does not perform the model augmentation, i.e., the prior model is only used to compute the physics-based regularization term in the cost function. Second, the classical PGNN penalizes the difference between the physics model and the identified model at the output level, which can lead to conservative estimation results. This is because systems with highly similar state-transition functions can have significantly different time-series outputs. Furthermore, the physics-based term of the classical PGNN regularizes the model difference over the entire state space, which makes this approach lose some flexibility, especially when the assumed prior model is inaccurate.
Motivated by these facts, this paper proposes an innovative PGNN-based state-space modeling strategy for nonlinear system identification, namely, W-PGNN, to efficiently complete prior physics-based state-space models with a weighted regularized SS-NN. The main contributions are as follows:
1) A new weighted-regularization cost function is designed to penalize the difference between both the state and output functions of the baseline physics-based and final identified models in regions where measured data provides little information.
2) Compared to the classical PGNNs, the proposed identification approach makes more extensive use of the pre-existing approximate model. The learned dynamics are capable of adhering to the data in regions with high information content, and preserving the behavior of the baseline physics model outside this region. This significantly enhances the flexibility of the SS-NN model.
The remainder of this paper is organized as follows. Section 2 introduces the nonlinear model class and the identification method with a state-space neural network. The classical PGNN method is discussed in Section 3. The proposed W-PGNN method is detailed in Section 4. Numerical simulation results are provided in Section 5, followed by the conclusions in Section 6.
Notation: $\mathbb{R}$ and $\mathbb{Z}$ denote the sets of real numbers and integers, respectively. The 2-norm of a vector or a matrix is denoted as $\|\cdot\|_2$. $\mathrm{col}(\cdot)$ denotes the column-wise composition of vectors. $\mathcal{N}(0,1)$ is the standard normal distribution, while $\mathcal{U}(a,b)$ represents a uniform distribution with a support from $a$ to $b$.
2 Problem Statement
2.1 Nonlinear Model Class
Consider the following discrete-time state-space model as the data-generating system:
(1) | ||||
where denotes the input, is the state, is the output, and represents the discrete time. Additionally, and are bounded deterministic vector functions. The training dataset contains noisy outputs , collected from an experiment on (1), where the noise is assumed to be a zero-mean random signal with finite variance independent from the input .
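As a minimal sketch of how such a dataset could be generated, the snippet below simulates a scalar discrete-time state-space model and adds zero-mean output noise that is independent of the input. The functions `f_true` and `h_true`, the input range, and the noise level are hypothetical placeholders and not the system studied in this paper.

```python
import random

def f_true(x, u):
    # Hypothetical bounded state-transition function (illustration only).
    return 0.8 * x + 0.5 * u / (1.0 + u * u)

def h_true(x, u):
    # Hypothetical output function: the state is measured directly.
    return x

def simulate(f, h, inputs, x0=0.0):
    """Return the noise-free state and output sequences for an input sequence."""
    states, outputs = [], []
    x = x0
    for u in inputs:
        states.append(x)
        outputs.append(h(x, u))
        x = f(x, u)  # state update for the next time step
    return states, outputs

rng = random.Random(0)
u_train = [rng.uniform(-1.0, 1.0) for _ in range(200)]
x_seq, y_clean = simulate(f_true, h_true, u_train)

# Zero-mean measurement noise with finite variance, independent of the input.
noise_std = 0.01
y_noisy = [y + rng.gauss(0.0, noise_std) for y in y_clean]
dataset = list(zip(u_train, y_noisy))
```

Only the pairs in `dataset` would be available to the identification procedure; the state sequence `x_seq` is kept here purely for inspection.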
Assume that we only have access to an a priori known state-space model:
(2) | ||||
with the state and output and that has the same model order as (1). Note that the functions and constitute the physically well-interpretable and a priori known dynamics of the system (1), i.e., the nominal model. However, the prior model (2) does not accurately capture the true dynamics (1). For instance, there may exist local nonlinearities in certain regions that cannot be captured by a rough identification or by modeling based on first principles. Hence, it is essential to augment this a priori known model using newly measured data through nonlinear system identification.
2.2 State-Space Neural Network Identification
To this end, we consider the following nonlinear discrete-time state-space model of (1), which has the following structure:
(3) | ||||
where and are the completion functions that model the dynamics that cannot be captured reliably by the idealistic model (2), and are represented by fully connected feedforward neural networks with one hidden layer containing neurons and a linear output layer:
(4) |
where denotes the activation function, , , , and represent the weight and bias parameters of the neural network with proper dimensions, respectively. A similar representation is used for .
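The structure described above can be sketched in a few lines. The example assumes that the completion function enters additively on top of the prior model and uses a `tanh` activation; both choices, together with all names and the scalar prior `f_prior`, are illustrative assumptions rather than the exact formulation of (3) and (4).

```python
import math

def mlp(z, W1, b1, W2, b2, act=math.tanh):
    """One hidden layer with a linear output layer: W2 * act(W1 z + b1) + b2."""
    hidden = [act(sum(w * zi for w, zi in zip(row, z)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

def f_prior(x, u):
    # Hypothetical linear prior (physics-based) state-transition function.
    return 0.9 * x + 0.1 * u

def f_augmented(x, u, params):
    """Prior model plus neural completion term (additive form is an assumption)."""
    W1, b1, W2, b2 = params
    return f_prior(x, u) + mlp([x, u], W1, b1, W2, b2)[0]

# Three hidden neurons, inputs (x, u), one output.
params = (
    [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.7]],  # W1: hidden-layer weights
    [0.0, 0.1, -0.1],                        # b1: hidden-layer biases
    [[0.2, -0.1, 0.05]],                     # W2: output-layer weights
    [0.0],                                   # b2: output-layer bias
)
x_next = f_augmented(1.0, 0.5, params)
```

An output completion function could be built in the same way from a second call to `mlp` with its own parameters.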
As discussed in Suykens et al. (1995) and Schoukens (2021), the state-space model (3) can be written in a specific form of a recurrent neural network, i.e., an SS-NN. The parameters for the SS-NN can be trained by optimizing the data-based cost function over samples:
(5) | ||||
where is the simulated output of the model (3) given the parameter vector . More detailed discussions of SS-NN are provided in Section 4.
From (5), it is clear that the ANN simply learns the mapping between system input and output data without considering any prior knowledge about the underlying physics. This makes it difficult for the ANN to generalize well outside the training region, especially when the dataset is limited.
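To make the role of the data-based cost concrete, the sketch below computes a mean squared simulation error: the model is simulated forward from an initial state using only the inputs, and its outputs are compared with the measurements. The normalization by the number of samples and the names `simulate_outputs` and `data_cost` are assumptions made for illustration; the scaling in (5) may differ.

```python
def simulate_outputs(f, h, inputs, x0=0.0):
    """Free-run simulation: the model never sees the measured outputs."""
    x, outputs = x0, []
    for u in inputs:
        outputs.append(h(x, u))
        x = f(x, u)
    return outputs

def data_cost(f, h, inputs, measured, x0=0.0):
    """Mean squared error between measured and simulated outputs."""
    simulated = simulate_outputs(f, h, inputs, x0)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(measured, simulated)) / len(measured)

# Toy check with a hypothetical scalar model.
f_model = lambda x, u: 0.5 * x + u
h_model = lambda x, u: x
u_seq = [1.0, 0.0, 0.0]
y_meas = [0.0, 1.0, 0.5]  # exactly what the model produces for u_seq
cost_exact = data_cost(f_model, h_model, u_seq, y_meas)
cost_biased = data_cost(f_model, h_model, u_seq, [y + 0.1 for y in y_meas])
```

Nothing in this cost refers to a prior model, which is why a network trained on it alone has no guidance outside the region visited by the training data.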
3 Classical PGNN for System Identification
In this section, we briefly introduce the concept of classical PGNN. Compared with the baseline ANN approach, there is an additional regularization term in the cost function to force the learnt model to follow the prior model even outside the training region.
The classical PGNN is trained by minimizing the following cost function:
(6) | ||||
where is given by (5), and the physics-based penalized term is given by:
(7) |
where and are the output response of the a priori known model (2) and the simulated model (3) given the regularization input signal , respectively, and is the constant trade-off hyperparameter that balances between the data-fitting term and the regularization term in the overall cost. In this way, the prior model (2) is embedded in the trained ANN. Note that the physics-based cost does not rely on the measurement from system (1). It is evaluated over a separate regularization dataset generated by the user using the baseline physics model (2).
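A compact sketch of this output-level regularization is given below. It assumes that the trade-off hyperparameter (called `lam` here) multiplies a mean squared difference between the outputs of the prior model and of the estimated model on the regularization input; the exact scaling in (6) and (7) may differ, and all function names are hypothetical.

```python
def free_run(f, h, inputs, x0=0.0):
    """Simulate a state-space model and return its output sequence."""
    x, out = x0, []
    for u in inputs:
        out.append(h(x, u))
        x = f(x, u)
    return out

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def pgnn_cost(model, prior, u_train, y_train, u_reg, lam):
    """Data-fit term plus output-level physics term weighted by lam."""
    f_m, h_m = model
    f_p, h_p = prior
    data_term = mse(y_train, free_run(f_m, h_m, u_train))
    # The physics term needs no measurements: both responses are simulated.
    physics_term = mse(free_run(f_p, h_p, u_reg), free_run(f_m, h_m, u_reg))
    return data_term + lam * physics_term

prior = (lambda x, u: 0.9 * x + 0.1 * u, lambda x, u: x)
model = (lambda x, u: 0.8 * x + 0.1 * u, lambda x, u: x)
u_train, y_train = [1.0, 1.0, 1.0], [0.0, 0.1, 0.18]
u_reg = [2.0, -2.0, 2.0, -2.0]
cost_no_reg = pgnn_cost(model, prior, u_train, y_train, u_reg, lam=0.0)
cost_reg = pgnn_cost(model, prior, u_train, y_train, u_reg, lam=1.0)
```

In this toy case the model fits the training data almost perfectly, so the whole cost for `lam=1.0` comes from the mismatch with the prior model on the regularization input.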
As can be seen in (6), the penalization of the physics-guided part is at the output level. However, this can lead to a conservative result, as systems with highly similar state-transition functions, i.e., only slight differences in parameters, can have significantly different time-series outputs. A good example of this is given by two mass-spring-damper systems with slightly different resonance frequencies, as shown in Fig. 1. Furthermore, the classical PGNN cost function regularizes the model estimate over the whole state space, which means that the a priori known model is assumed to hold equally well in any unseen region. Although this feature gives the classical PGNN better generalization performance than the baseline NN, in an ideal setting we would like to trust the data where it is highly informative and follow the prior model in regions where the data provides little information, i.e., preserve the behavior of the a priori known model there.
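The gap between function-level and output-level similarity can be reproduced numerically. The sketch below uses two lightly damped discrete-time oscillators as a stand-in for the mass-spring-damper pair; the frequencies, the decay factor `r`, and the horizon are illustrative assumptions and are not taken from Fig. 1.

```python
import math

def rotation_dynamics(omega, r=0.999):
    """State matrix of a lightly damped discrete-time oscillator."""
    c, s = r * math.cos(omega), r * math.sin(omega)
    return [[c, -s], [s, c]]

def free_response(A, steps, x0=(1.0, 0.0)):
    """Output y_k = first state of x_{k+1} = A x_k."""
    x, y = list(x0), []
    for _ in range(steps):
        y.append(x[0])
        x = [A[0][0] * x[0] + A[0][1] * x[1],
             A[1][0] * x[0] + A[1][1] * x[1]]
    return y

A1 = rotation_dynamics(0.30)
A2 = rotation_dynamics(0.31)  # resonance frequency shifted by about 3 %

# Largest entry-wise difference between the two state-transition maps.
param_gap = max(abs(a - b) for ra, rb in zip(A1, A2) for a, b in zip(ra, rb))

y1 = free_response(A1, 400)
y2 = free_response(A2, 400)
# Largest difference between the two simulated output sequences.
output_gap = max(abs(a - b) for a, b in zip(y1, y2))

print(f"function-level gap: {param_gap:.4f}, output-level gap: {output_gap:.2f}")
```

The two state matrices differ by less than 0.02 in every entry, yet the responses drift out of phase and end up differing by more than the initial amplitude, so an output-level penalty treats two nearly identical models as very different.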
![Refer to caption](x1.png)
4 Weighted PGNN Method
4.1 Weighted function regularization
Unlike other model augmentation strategies, for instance, (Hoekstra et al., 2024), our approach aims to regularize state-space neural network estimation using a reference model and penalize the difference between physics and identified model at both the state and output levels. Moreover, the regularization should only be active in the regions where no data is present, i.e. the reference model prescribes the dynamics that the learned model should fall back to outside the training area.
The proposed approach starts by generating a surrogate input sequence of length . It is worth noting that is not applied to the true system during optimization to acquire output measurements, but plays a role in the regularization of the proposed approach. Ideally, should cover the full range of operation of the system. Then, the model estimate is evaluated both applying and on system (3), where the second input sequence results in the state sequence .
![Refer to caption](x2.png)
The cost function for the proposed W-PGNN is given by:
(8) |
where the novel weighted regularization term is given by:
(9) |
where , . The weight vector is defined as:
(10) | ||||
with the state-input pairs and . Furthermore, and denote the responses of the estimated model (3) to the training input and regularization input , respectively. represents the center width of , and is a small constant. Additionally, it is clear that if the weight is set to for all , then it is a classical PGNN with state-level regularization; if the weight is set to for all , then the cost function (8) will completely fall back to the baseline (5).
One can observe from (10) that, the further away the current regularization state-input pair is from the training dataset, the larger the cost will be. This pushes and towards zero, consequently bringing the identified model closer to the prior model in that region of the joint input-state space. Additionally, the terms and share the same weight because both the state and the output function estimate depend on the same state-input pair. Note that the proposed cost function does not penalize the difference between the output of the a priori known model and the estimated model, but it rather penalizes the difference between both the state and output function of both models in regions where little information is provided by the measured data. As a consequence, the regularization state-input pair should cover the full intended range of operation of the completed model for an effective regularized model augmentation through the proposed approach.
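The weighting idea can be illustrated with a small sketch. It assumes a Gaussian-type weight built from the distance between a regularization state-input pair and its nearest training pair, with `sigma` playing the role of the width and `eps` the small constant; this particular functional form is an assumption for illustration and is not claimed to reproduce (10).

```python
import math

def weight(z_reg, train_pairs, sigma=0.5, eps=1e-3):
    """Close to eps near the training data, close to 1 far away from it."""
    d2 = min(sum((a - b) ** 2 for a, b in zip(z_reg, z)) for z in train_pairs)
    return 1.0 - math.exp(-d2 / (2.0 * sigma ** 2)) + eps

def weighted_penalty(completions, reg_pairs, train_pairs, **kw):
    """Mean of w_i * (f_c^2 + h_c^2): state and output terms share one weight."""
    total = 0.0
    for (f_c, h_c), z in zip(completions, reg_pairs):
        total += weight(z, train_pairs, **kw) * (f_c ** 2 + h_c ** 2)
    return total / len(reg_pairs)

# Training state-input pairs (x, u) cluster near the origin.
train_pairs = [(0.0, 0.0), (0.1, -0.1), (-0.2, 0.1)]
w_near = weight((0.05, 0.0), train_pairs)
w_far = weight((3.0, 2.0), train_pairs)

# Same completion magnitude at both points, very different penalty.
reg_pairs = [(0.05, 0.0), (3.0, 2.0)]
completions = [(0.3, 0.1), (0.3, 0.1)]  # hypothetical completion outputs
penalty = weighted_penalty(completions, reg_pairs, train_pairs)
```

With these settings, a completion term of the same size is penalized over a hundred times more strongly far from the training data than next to it, which is the mechanism that lets the data dominate locally while the prior model dominates elsewhere.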
4.2 Implementation
The whole structure of the physics-based SS-NN is illustrated in Fig. 2. Specifically, the SS-NN architecture mainly comprises two components, namely the physics state/output layer and the state/output completion layer, where the physics state/output layer is employed to represent the prior known model and given in (2), and the state/output completion layer is utilized for estimating the unknown dynamics and given in (3). It is worth mentioning that the prior model (2) should have the same state dimension as the estimated model (3), which is a limitation of the proposed approach.
Training: The hyperparameters of the classical PGNN, , and the proposed W-PGNN, , , , , are determined by grid searching on a validation dataset. Specifically, the selection of depends on the density of data distribution, for instance, sparsely distributed data can necessitate choosing a larger . The weights and bias parameters of the SS-NN are trained by minimizing the cost function (8) via gradient-based approaches. Several optimization algorithms have been proposed to solve this problem, such as quasi-Newton (Fletcher and Powell, 1963) and conjugate gradients (Fletcher and Reeves, 1964) methods. In this paper, the Levenberg-Marquardt algorithm (Levenberg, 1944) is employed to find the minimum of (8). All the algorithms are implemented in the Matlab Deep Learning Toolbox and the Matlab Optimization Toolbox.
Initialization of model completion layer: Due to the use of the Levenberg-Marquardt optimization algorithm, an initial guess of the parameter values is required. We adopt the method in Schoukens (2021) to intuitively initialize the weight and bias parameters of the model completion layer, i.e., an explicit linear approximation is introduced:
(11) | ||||
(12) | ||||
which leaves quite some flexibility in initializing the weights and biases of the nonlinear layers. Then, the weights and biases of the linear layer are initialized as and . Additionally, the weights and biases of the nonlinear layer are initialized as , and are randomly initialized by . This chosen parameter initialization ensures that the initial model behaves like the a priori provided physics model. During the optimization, the weights will become nonzero, and this will activate the model completion part of the model.
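The effect of this initialization can be checked with a short sketch: if the linear output layer of the completion network starts at zero while the hidden layer is drawn at random, the completion term vanishes and the augmented model reproduces the prior model exactly. The network size, the Gaussian hidden-layer initialization, the additive structure, and the prior `f_prior` are illustrative assumptions.

```python
import math
import random

def init_completion(n_in, n_hidden, rng):
    """Random hidden layer, zero linear output layer."""
    W1 = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    W2 = [0.0] * n_hidden  # zero output weights switch the completion off
    b2 = 0.0
    return W1, b1, W2, b2

def completion(z, params):
    W1, b1, W2, b2 = params
    hidden = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

def f_prior(x, u):
    # Hypothetical prior state-transition function.
    return 0.9 * x + 0.1 * u

params = init_completion(n_in=2, n_hidden=20, rng=random.Random(1))
x, u = 0.7, -0.3
f_initial = f_prior(x, u) + completion([x, u], params)
```

Because the hidden layer is already random, the gradient with respect to the output weights is nonzero from the first iteration, so the optimizer can move away from the prior model wherever the data calls for it.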
5 Simulation Study
In this section, simulation results are presented to illustrate the effectiveness of the proposed W-PGNN approach. A 1-D example is used to demonstrate the improved learning performance of the proposed W-PGNN approach over the baseline and classical PGNN approaches. Consider a SISO system:
(13) | ||||
with and . The function is defined as:
(14) |
with , , which represents the local nonlinearity that is not able to be expressed by the given baseline physics model. Then, the augmentation structure (3) is given in terms of the prior physics model , and the completion function aimed to identify while in this case. Thus, the goal is to augment the prior model with a well-estimated based on the proposed W-PGNN approach.
![Refer to caption](x3.png)
![Refer to caption](x4.png)
![Refer to caption](x5.png)
With this SISO system (13), the training input is selected as with samples to generate the training dataset . Furthermore, the regularization input signal is designed as a concatenation of signals and , each with a length of 500, respectively, leading to a with size . In addition, the test input signal is selected as with samples, which will explore a much larger region of input-output space than the training dataset. It is worth noting that only noise at the output of the system is present with SNR40dB. The aforementioned signals are visualized in Fig. 3, which implies that the training dataset is significantly less informative than the test dataset. This is in line with the model augmentation philosophy of this work: as an adequate prior model is already in place, we only would like to augment this model using a simple dataset dedicated to a particular region.
To construct the NN model, the activation function is chosen as the radial basis function because of its universal approximation capability. A total of 20 neurons () are used in the state/output completion layer. Moreover, to determine the most suitable hyperparameters for classical PGNN and the proposed W-PGNN, a grid search is conducted on the validation dataset , which is generated by validation input signal for the classical PGNN, and for the W-PGNN. Both of them are 500 samples long. The results of the hyperparameter search are: , , , and . Then all three approaches are trained on the obtained dataset and , of which the parameters are optimized by the Levenberg-Marquardt algorithm, as mentioned in Section 4.2.
![Refer to caption](x6.png)
The estimation results in terms of , and the absolute value of estimation error are depicted in Fig. 4, where the shaded area indicates the training data region and the black dots represent the linear prior model. All three approaches capture the true model well inside the training region; however, the baseline NN approach generalizes poorly to unseen data. Both the classical PGNN and the proposed W-PGNN approaches show good learning results outside the training region, but the proposed W-PGNN approach performs approximately 20% better than the classical PGNN (see also Table 1). This improvement mainly results from the novel weighted-regularization physics-based term in the cost function, which lets the learned model follow the ground truth within the range of the training data and forces it toward the linear prior model where the data is less informative. This can also be seen in Fig. 5, where the zoom-in sub-figures show the estimation trajectories inside and outside the training region, respectively. Even though the test dataset covers a much larger region than the training dataset, the proposed W-PGNN identifies the system over the whole state space with the highest estimation accuracy.
Furthermore, a Monte Carlo simulation with 10 runs under random initial parameters is conducted to compare the estimation error of the three approaches. To assess the simulation performance of the identified models, the following root mean squared error (RMSE) on the test dataset is utilized:
(15) |
Approach | RMSE (training set) | RMSE (test set) |
---|---|---|
Baseline | 0.241 ± 0.027 | 199.7 ± 170.3 |
Classical PGNN | 0.206 ± 0.024 | 5.791 ± 2.810 |
W-PGNN | 0.209 ± 0.021 | |
Table 1 reports the RMSE and its variability for the three considered approaches on the training and test datasets over 10 runs. The RMSE achieved by the proposed W-PGNN is significantly improved, showing better generalization performance on the unseen dataset than the baseline NN and the classical PGNN.
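For reference, the evaluation protocol can be sketched as follows: compute the RMSE between the test outputs and the simulated outputs for each run, then report the mean and spread over the runs. The numbers in the example are made up to exercise the code and are unrelated to Table 1; using the sample standard deviation as the variability measure is an assumption.

```python
import math
import statistics

def rmse(y_true, y_sim):
    """Root mean squared error between two equally long sequences."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_sim)) / n)

def summarize(run_errors):
    """Mean and sample standard deviation over Monte Carlo runs."""
    return statistics.mean(run_errors), statistics.stdev(run_errors)

# Made-up example: three runs of a hypothetical model on four test samples.
y_test = [0.0, 1.0, 2.0, 3.0]
runs = [
    [0.1, 1.1, 2.1, 3.1],
    [0.0, 1.0, 2.0, 3.2],
    [0.2, 0.8, 2.2, 2.8],
]
errors = [rmse(y_test, y_hat) for y_hat in runs]
mean_err, std_err = summarize(errors)
print(f"RMSE = {mean_err:.3f} +/- {std_err:.3f}")
```

A large spread relative to the mean, as in the baseline test-set entry of Table 1, indicates that the result depends strongly on the random initialization.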
6 Conclusion
A novel PGNN-based model completion strategy is proposed in this paper for nonlinear state-space model identification. Specifically, we enhance the interpretability and generalization performance of the classical PGNN by introducing a weighted function regularization strategy, i.e., the W-PGNN. A new weighted regularization cost function is presented to penalize the difference between the physics-based and identified models at both the state and output levels in regions with low information content. The proposed strategy provides new perspectives on the fusion of physics-guided and black-box data-driven modeling approaches, especially in cases where the available data is limited. The effectiveness of W-PGNN has been analyzed and demonstrated by numerical simulations and compared with some classical ANN modeling methods. Future work will focus on extending the application scenarios of the proposed W-PGNN method to more complex and larger benchmarks.
References
- Amoura et al. (2011) Amoura, K., Wira, P., and Djennoune, S. (2011). A state-space neural network for modeling dynamical nonlinear systems. In Proc. of the International Conference on Neural Computation Theory and Applications, 369–376.
- Beintema et al. (2023) Beintema, G.I., Schoukens, M., and Tóth, R. (2023). Deep subspace encoders for nonlinear system identification. Automatica, 156, 111210.
- Billings (2013) Billings, S.A. (2013). Nonlinear system identification: NARMAX methods in the time, frequency, and spatio-temporal domains. John Wiley & Sons.
- Bolderman et al. (2024) Bolderman, M., Butler, H., Koekebakker, S., van Horssen, E., Kamidi, R., Spaan-Burke, T., Strijbosch, N., and Lazar, M. (2024). Physics-guided neural networks for feedforward control with input-to-state-stability guarantees. Control Engineering Practice, 145, 105851.
- Fletcher and Reeves (1964) Fletcher, R. and Reeves, C.M. (1964). Function minimization by conjugate gradients. The computer journal, 7(2), 149–154.
- Fletcher and Powell (1963) Fletcher, R. and Powell, M.J. (1963). A rapidly convergent descent method for minimization. The computer journal, 6(2), 163–168.
- Forgione and Piga (2020) Forgione, M. and Piga, D. (2020). Model structures and fitting criteria for system identification with neural networks. In Proc. of the 14th International Conference on Application of Information and Communication Technologies, 1–6.
- Hoekstra et al. (2024) Hoekstra, J.H., Verhoek, C., Tóth, R., and Schoukens, M. (2024). Learning-based model augmentation with LFRs. arXiv preprint arXiv:2404.01901.
- Karpatne et al. (2017) Karpatne, A., Watkins, W., Read, J., and Kumar, V. (2017). Physics-guided neural networks (PGNN): An application in lake temperature modeling. arXiv preprint arXiv:1710.11431.
- Levenberg (1944) Levenberg, K. (1944). A method for the solution of certain non-linear problems in least squares. Quarterly of applied mathematics, 2(2), 164–168.
- Paduart et al. (2010) Paduart, J., Lauwers, L., Swevers, J., Smolders, K., Schoukens, J., and Pintelon, R. (2010). Identification of nonlinear systems using polynomial nonlinear state space models. Automatica, 46(4), 647–656.
- Scarselli and Tsoi (1998) Scarselli, F. and Tsoi, A.C. (1998). Universal approximation using feedforward neural networks: A survey of some existing methods, and some new results. Neural networks, 11(1), 15–37.
- Schön et al. (2018) Schön, T.B., Svensson, A., Murray, L., and Lindsten, F. (2018). Probabilistic learning of nonlinear dynamical systems using sequential Monte Carlo. Mechanical systems and signal processing, 104, 866–883.
- Schön et al. (2011) Schön, T.B., Wills, A., and Ninness, B. (2011). System identification of nonlinear state-space models. Automatica, 47(1), 39–49.
- Schoukens and Ljung (2019) Schoukens, J. and Ljung, L. (2019). Nonlinear system identification: A user-oriented road map. IEEE Control Systems Magazine, 39(6), 28–99.
- Schoukens (2021) Schoukens, M. (2021). Improved initialization of state-space artificial neural networks. In Proc. of the European Control Conference, 1913–1918.
- Schoukens and Tiels (2017) Schoukens, M. and Tiels, K. (2017). Identification of block-oriented nonlinear systems starting from linear approximations: A survey. Automatica, 85, 272–292.
- Suykens et al. (1995) Suykens, J.A., De Moor, B.L., and Vandewalle, J. (1995). Nonlinear system identification using neural state space models, applicable to robust control design. International Journal of Control, 62(1), 129–152.
- Verdult (2002) Verdult, V. (2002). Nonlinear system identification: a state-space approach. Ph.D. thesis, University of Twente, The Netherlands.