Solving Differential Equations using Physics-Informed Deep Equilibrium Models
Abstract
This paper introduces Physics-Informed Deep Equilibrium Models (PIDEQs) for solving initial value problems (IVPs) of ordinary differential equations (ODEs). Leveraging recent advancements in deep equilibrium models (DEQs) and physics-informed neural networks (PINNs), PIDEQs combine the implicit output representation of DEQs with physics-informed training techniques. Our validation of PIDEQs, using the Van der Pol oscillator as a benchmark problem, yielded compelling results, demonstrating their efficiency and effectiveness in solving IVPs. Our analysis includes key hyperparameter considerations for optimizing PIDEQ performance. By bridging deep learning and physics-based modeling, this work advances computational techniques for solving IVPs with implications for scientific computing and engineering applications.
I INTRODUCTION
The advent of deep learning has revolutionized various industries, demonstrating its prowess in tackling complex problems across domains ranging from image recognition to natural language processing. Despite this success, applying deep learning to solve initial value problems (IVPs) of differential equations remains a formidable challenge, primarily because deep learning models are data-driven. Gathering sufficient data from real-life dynamical systems can be prohibitively expensive, which calls for a different approach to training deep learning models for such tasks.
The work by [1] introduced physics-informed neural networks (PINNs), which solve IVPs with deep learning models by optimizing a model’s dynamics rather than solely its outputs. This approach allows deep learning models to approximate the dynamics of a system, provided an accurate description of these dynamics is known, typically in the form of differential equations. Focusing on optimizing the dynamics allows the model to be trained with minimal data, covering only the initial and boundary conditions.
In parallel, [2] and [3] proposed an innovative approach to deep learning by implicitly defining a model’s output as a solution to an equilibrium equation. This methodology results in a model known as a Deep Equilibrium Model (DEQ), which exhibits an infinite-depth network structure with residual connections. This design offers significant representational power with relatively few parameters, widening the architectural possibilities for deep learning models.
The transition from PINNs to DEQs presents an opportunity to combine the strengths of both approaches. PINNs excel in incorporating physical laws into the learning process, reducing the dependency on extensive data. DEQs, with their implicit infinite-depth structure, offer a powerful framework for solving complex problems with fewer parameters. By integrating these methodologies, we can create a Physics-Informed Deep Equilibrium Model (PIDEQ) that leverages the physics-informed training techniques of PINNs within the robust and efficient framework of DEQs.
This paper explores this integration by studying, implementing, and validating PIDEQs as efficient and accurate solvers for IVPs. An efficient solver is characterized by its ability to operate with minimal data and computational resources. In contrast, an effective solver can provide accurate solutions across a vast domain of the independent variable. Our research is guided by three objectives: implementing a DEQ, designing and implementing a PINN training algorithm suitable for DEQs, and evaluating the performance of the physics-informed DEQ in solving IVPs. These objectives form the backbone of our study. More specifically, our contributions are:
- A novel approach for training DEQs for solving IVPs using physics regularization, resulting in a physics-informed deep equilibrium model.
- An experimental evaluation of PIDEQs as efficient and effective solvers for ordinary differential equations using the Van der Pol oscillator.
- An analysis of the key hyperparameters that must be adjusted to develop PIDEQs effectively.
Through these efforts, we aim to provide insights into the suitability of DEQs, a cutting-edge deep learning architecture, for addressing the challenges posed by solving IVPs of ODEs, paving the way for their broader adoption in scientific computing and engineering applications.
II SOLVING IVPs USING DEEP LEARNING
Solving IVPs of differential equations using deep learning poses unique challenges. Unlike traditional deep learning tasks where input-output pairs are readily available, in IVPs, both the target function (solution to the IVP) and input-output pairs are unknown or too complex to be directly useful. This lack of information complicates the training process, as the conventional approach of constructing a dataset of input-output pairs becomes impractical or impossible.
II-A Problem Statement
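Consider an IVP of an ODE. Given a function $\mathcal{N}: \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ and an initial condition $y_0 \in \mathbb{R}^n$ at time $t_0 \in I$, where $I \subset \mathbb{R}$ is an interval, we aim to find a solution $\phi: I \to \mathbb{R}^n$ such that the differential equation
$$\frac{d\phi(t)}{dt} = \mathcal{N}(t, \phi(t)), \qquad \phi(t_0) = y_0$$
holds on the entirety of the interval $I$.
Our objective is to train a deep learning model $D_\theta: I \to \mathbb{R}^n$, parameterized by $\theta$, to approximate the solution function $\phi$, satisfying the same conditions, i.e.,
$$\frac{dD_\theta(t)}{dt} = \mathcal{N}(t, D_\theta(t)), \qquad D_\theta(t_0) = y_0.$$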
II-B Physics Regularization
The conventional approach of constructing a dataset for training is inefficient in the context of IVPs, as it does not leverage the known dynamics represented by $\mathcal{N}$. To address this challenge, [1] proposed a physics-informed learning approach incorporating known dynamics into the training process.
The key idea is to train the model $D_\theta$ to approximate the solution at $t_0$, satisfying the initial condition constraint, and simultaneously train its Jacobian $\frac{dD_\theta}{dt}$ to approximate $\mathcal{N}$, ensuring that the dynamics are captured accurately.
To achieve this, the cost function is defined as
$$J(\theta) = J_b(\theta) + \lambda J_{\mathcal{N}}(\theta), \tag{1}$$
with two distinct components weighted by a scalar parameter $\lambda$. By optimizing this cost function, the model is guided to simultaneously satisfy the initial condition and approximate the dynamics represented by $\mathcal{N}$.
The term $J_b$ is introduced to enforce the initial condition constraint. It is given by
$$J_b(\theta) = \left\| D_\theta(t_0) - y_0 \right\|_2^2,$$
using an $\ell_2$ norm to penalize the deviation from the initial condition.
The term $J_{\mathcal{N}}$ plays a crucial role in ensuring that the model captures the underlying dynamics specified by $\mathcal{N}$. By regularizing the model’s Jacobian, this term forces the model to learn not just the solution at discrete points, but also how the solution evolves according to the differential equations. This is achieved by minimizing the difference between the model’s predicted derivative $\frac{dD_\theta(t)}{dt}$ and the true dynamics $\mathcal{N}(t, D_\theta(t))$. To this end, it is defined as
$$J_{\mathcal{N}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}(I)} \left\| \frac{dD_\theta(t)}{dt} - \mathcal{N}(t, D_\theta(t)) \right\|_2^2,$$
where $\mathcal{U}(I)$ is a uniform distribution over the interval $I$, i.e., the data is uniformly sampled from the domain (the sampling strategy can be considered a design choice).
This approach eliminates the need for explicit knowledge of the target function ($\phi$) and allows for efficient training using randomly constructed samples, making deep learning a viable option for solving IVPs.
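As an illustration of the cost function described above, the following minimal sketch evaluates the initial-condition term and the physics term for a scalar ODE. It is not the implementation used in our experiments: the test problem $dy/dt = -y$, $y(0) = 1$, the sampling interval $[0, 2]$, the weight `lam = 0.1`, and all function names are assumptions chosen for illustration, and the model derivative is supplied analytically rather than by automatic differentiation.

```python
import math
import random

def physics_informed_loss(model, model_dt, dynamics, t0, y0, samples, lam):
    """Return J = J_b + lam * J_N for a scalar ODE dy/dt = dynamics(t, y)."""
    # Initial-condition term: squared deviation at t0.
    j_b = (model(t0) - y0) ** 2
    # Physics term: mean squared residual of the ODE over the sampled times.
    residuals = [model_dt(t) - dynamics(t, model(t)) for t in samples]
    j_n = sum(r * r for r in residuals) / len(residuals)
    return j_b + lam * j_n

# Illustrative problem (an assumption): dy/dt = -y, y(0) = 1, solution exp(-t).
dynamics = lambda t, y: -y
rng = random.Random(0)
samples = [rng.uniform(0.0, 2.0) for _ in range(1000)]  # t ~ U([0, 2])

# A candidate that solves the IVP exactly incurs zero cost.
exact_loss = physics_informed_loss(
    lambda t: math.exp(-t), lambda t: -math.exp(-t), dynamics, 0.0, 1.0, samples, 0.1
)
# A candidate that matches the initial condition but not the dynamics is penalized.
linear_loss = physics_informed_loss(
    lambda t: 1.0 - t, lambda t: -1.0, dynamics, 0.0, 1.0, samples, 0.1
)
```

Note that no value of the true solution enters the computation of either term; only the dynamics and the initial condition are used.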
III DEEP EQUILIBRIUM MODELS
This section introduces and defines DEQs and presents their combination with physics-informed training in the proposed PIDEQ framework. We follow the notation proposed by [2] and explore the foundational concepts and practical considerations that enable the introduction of our proposed approach.
III-A Introduction and Definition
Deep Learning models typically compose simple parametrized functions to capture complex features. While traditional architectures stack these functions in a sequential manner, in a DEQ, the architecture is based on an infinite stack of the same function. If this function is well-posed, i.e., it respects a Lipschitz-continuity condition [3], the infinite stack leads to an equilibrium point that serves as the model’s output.
III-B Forward Computation
Formally, a DEQ can be defined as the solution to an equilibrium equation. Let $f_\theta: \mathbb{R}^{n_x} \times \mathbb{R}^{n_z} \to \mathbb{R}^{n_z}$ be the layer function. Then, the output of a DEQ can be described through the solution $z^\star$ of the equilibrium equation
$$z^\star = f_\theta(x, z^\star),$$
where $x$ is the model’s input. Fig. 1 illustrates the equilibrium equation inside a DEQ.
The simplest way to solve the equilibrium equation that defines the output of a DEQ is by iteratively applying the equilibrium function until convergence. Given an input $x$ and an initial guess $z^{[0]}$, the procedure updates the equilibrium guess by
$$z^{[k+1]} = f_\theta(x, z^{[k]})$$
until $\| z^{[k+1]} - z^{[k]} \|$ is sufficiently small. This approach, known as the simple iteration method, is intuitive but can be slow to converge and is sensitive to the starting point. Moreover, it only finds equilibria of functions that are a contraction between the starting point and the equilibrium point [4]. For example, the simple iteration method fails to find the equilibrium of $f(z) = 2z - 1$ starting from any point other than the equilibrium itself ($z = 1$).
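The behavior of the simple iteration method can be sketched in a few lines. The two scalar maps below are illustrative assumptions, not layers from our models: $\cos$ is a contraction around its fixed point, whereas the affine map $z \mapsto 2z - 1$ expands distances to its fixed point $z = 1$, so the iteration moves away from it.

```python
import math

def simple_iteration(f, z0, tol=1e-10, max_iter=1000):
    """Iterate z <- f(z) until successive guesses differ by less than tol.

    Returns (last guess, number of iterations, converged flag).
    """
    z = z0
    for k in range(1, max_iter + 1):
        z_next = f(z)
        if abs(z_next - z) < tol:
            return z_next, k, True
        z = z_next
    return z, max_iter, False

# Contraction: converges to the fixed point of cos (approximately 0.739).
z_cos, iters_cos, ok_cos = simple_iteration(math.cos, 1.0)

# Expansion: the distance to the fixed point z = 1 doubles at every step,
# so the iteration diverges even from a starting point very close to it.
z_aff, iters_aff, ok_aff = simple_iteration(lambda z: 2.0 * z - 1.0, 1.001, max_iter=50)
```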
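To address these limitations, Newton’s method can be used. It updates the incumbent equilibrium point by
$$z^{[k+1]} = z^{[k]} - \left( \frac{\partial f_\theta}{\partial z}(x, z^{[k]}) - \mathbf{I} \right)^{-1} \left( f_\theta(x, z^{[k]}) - z^{[k]} \right),$$
where $\frac{\partial f_\theta}{\partial z}$ represents the Jacobian of $f_\theta$ with respect to $z$ and $\mathbf{I}$ is the identity matrix. To avoid the computational burden of inverting the Jacobian matrix, it is common to solve the linear system $\left( \frac{\partial f_\theta}{\partial z}(x, z^{[k]}) - \mathbf{I} \right) \Delta z = -\left( f_\theta(x, z^{[k]}) - z^{[k]} \right)$ for the step $\Delta z = z^{[k+1]} - z^{[k]}$ instead.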
Newton’s method converges much faster and is applicable to a broader class of functions than simple iteration [4], at the cost of requiring the Jacobian of the equilibrium function. As the deep learning paradigm is to perform gradient-based optimization, the well-posedness of this Jacobian is already guaranteed, so we can benefit from the fast convergence rate and broad applicability of Newton’s method. In fact, most modern root-finding algorithms can be used for computing the equilibrium, such as Anderson acceleration [5] and Broyden’s Method [6].
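A minimal scalar sketch of Newton’s method applied to an equilibrium equation is given below. It seeks a root of the residual $f(z) - z$, so each step divides by $f'(z) - 1$. The function names and the two test maps are assumptions for illustration. The affine map $z \mapsto 2z - 1$ is expansive, yet Newton’s method reaches its fixed point in a single step because the residual is linear.

```python
import math

def newton_fixed_point(f, df, z0, tol=1e-12, max_iter=50):
    """Find z with f(z) = z by Newton's method on the residual f(z) - z.

    Returns (last guess, number of Newton steps taken).
    """
    z = z0
    for steps in range(max_iter):
        residual = f(z) - z
        if abs(residual) < tol:
            return z, steps
        # Derivative of the residual is df(z) - 1.
        z = z - residual / (df(z) - 1.0)
    return z, max_iter

# Expansive affine map: the exact fixed point is found after one step.
z_aff, steps_aff = newton_fixed_point(lambda z: 2.0 * z - 1.0, lambda z: 2.0, 5.0)

# Scalar tanh layer (illustrative coefficients).
f = lambda z: math.tanh(0.5 * z + 1.0)
df = lambda z: 0.5 * (1.0 - math.tanh(0.5 * z + 1.0) ** 2)
z_tanh, steps_tanh = newton_fixed_point(f, df, 0.0)
```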
III-C Backward Computation
Computing the gradient of the output of a DEQ with respect to its parameters is not straightforward. The approach of automatic-differentiation frameworks for deep learning is to backpropagate the loss at training time, differentiating every operation in the forward pass. Differentiating through modern root-finding algorithms is, at best, an intricate and costly operation. Even if the forward pass is done through the iterative process, its differentiation would require backpropagating through an unknown number of layers.
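Luckily, we can exploit the fact that the output of a DEQ for an equilibrium function $f_\theta$ defines a parametrization of $z^\star$ with respect to $x$. This allows us to apply the implicit function theorem to write the Jacobian of a DEQ as
$$\frac{\partial z^\star}{\partial x} = \left( \mathbf{I} - \frac{\partial f_\theta}{\partial z} \right)^{-1} \frac{\partial f_\theta}{\partial x},$$
with both partial derivatives evaluated at $(x, z^\star)$ and $\mathbf{I}$ denoting the identity matrix.
Note that the Jacobian can be computed regardless of the operations applied during the forward pass. Furthermore, we do not need to compute the entire Jacobian matrix for gradient descent; we only need the gradient of a loss function $\mathcal{L}$ with respect to the model’s parameters. Such gradient can be written as
$$\frac{\partial \mathcal{L}}{\partial \theta} = \frac{\partial \mathcal{L}}{\partial z^\star} \frac{\partial z^\star}{\partial \theta} = \frac{\partial \mathcal{L}}{\partial z^\star} \left( \mathbf{I} - \frac{\partial f_\theta}{\partial z} \right)^{-1} \frac{\partial f_\theta}{\partial \theta},$$
where $\frac{\partial \mathcal{L}}{\partial z^\star}$ is the gradient of the loss function and $\frac{\partial z^\star}{\partial \theta}$ is the Jacobian of the DEQ with respect to its parameters, obtained from the same theorem. Note that the gradient requires us to compute two vector-matrix products, in which the result of $\frac{\partial \mathcal{L}}{\partial z^\star} \left( \mathbf{I} - \frac{\partial f_\theta}{\partial z} \right)^{-1}$ can be computed through a root-finding algorithm, assuming that the gradient of the loss function is known. We refer the reader to [2] and [3] for further details.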
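The implicit function theorem argument can be checked numerically on a scalar example. The sketch below assumes a hypothetical one-state layer $f(x, z) = \tanh(a z + b x)$ with illustrative coefficients; it computes the derivatives of the equilibrium with respect to the input and the parameters from quantities evaluated at the equilibrium only, and compares them with central finite differences through the forward solver.

```python
import math

def solve_equilibrium(a, b, x, tol=1e-14, max_iter=10_000):
    """Solve z = tanh(a z + b x) by simple iteration (contraction for |a| < 1)."""
    z = 0.0
    for _ in range(max_iter):
        z_next = math.tanh(a * z + b * x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def implicit_gradients(a, b, x):
    """Derivatives of the equilibrium z* via the implicit function theorem."""
    z = solve_equilibrium(a, b, x)
    s = 1.0 - z * z                # tanh'(a z* + b x), since tanh(...) = z*
    inv = 1.0 / (1.0 - a * s)      # (1 - df/dz)^(-1) in the scalar case
    return {"x": inv * b * s, "a": inv * z * s, "b": inv * x * s}

a, b, x, eps = 0.5, 1.0, 0.3, 1e-6
grads = implicit_gradients(a, b, x)

# Central finite differences through the forward solver, for comparison.
fd_x = (solve_equilibrium(a, b, x + eps) - solve_equilibrium(a, b, x - eps)) / (2 * eps)
fd_a = (solve_equilibrium(a + eps, b, x) - solve_equilibrium(a - eps, b, x)) / (2 * eps)
fd_b = (solve_equilibrium(a, b + eps, x) - solve_equilibrium(a, b - eps, x)) / (2 * eps)
```

The implicit derivatives do not depend on how the equilibrium was found, which is why the forward solver can be swapped freely.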
III-D Physics-Informed Deep Equilibrium Model (PIDEQ)
A Physics-Informed Deep Equilibrium Model (PIDEQ) extends the principles of physics-informed learning to a DEQ. By integrating physics-based constraints into training, PIDEQs aim to combine the architectural power of DEQs with robustness and accuracy in modeling physical systems.
The challenge in physics-informing a DEQ lies in optimizing a cost function on its derivatives. As discussed in Sec. III-C, [2] proposed an efficient method to compute the first derivative of a DEQ with respect to either its parameters or the input, which, in the context of physics-informed learning, allows us to compute the loss function value. However, computing the gradient of the loss function requires second derivatives, which pose additional challenges as they imply differentiating the backward pass. Furthermore, to the best of our knowledge, higher-order derivatives of DEQs have not been investigated. To address this challenge, we limit ourselves to using solely differentiable operators in the backward pass. This way, automatic differentiation frameworks can automatically compute the second derivatives.
Finally, it has been shown that DEQs benefit significantly from penalizing the presence of large values in their Jacobian [7]. This penalization can be achieved by adding the Frobenius norm of the Jacobian of the equilibrium function as a regularization term in the loss function, which was shown to reduce training and inference times.
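Therefore, we propose the following loss function for training PIDEQs:
$$J(\theta) = J_b(\theta) + \lambda J_{\mathcal{N}}(\theta) + \kappa \left\| \frac{\partial f_\theta}{\partial z} \right\|_F.$$
It combines a base loss term ($J_b$) with a physics-informed loss term ($J_{\mathcal{N}}$) and a regularization term based on the Frobenius norm of the Jacobian of the equilibrium function, weighted by a coefficient $\kappa$.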
IV EXPERIMENTS
In this section, we conduct a series of experiments to evaluate the capacity of PIDEQs for solving differential equations. We use the Van der Pol oscillator as our benchmark IVP due to its well-documented complexity and lack of an analytical solution, which makes it an ideal candidate for testing the robustness of numerical solvers. We compare the PIDEQ approach with physics-informed neural networks (PINNs), a well-established deep learning technique for similar tasks [1, 8].
IV-A Problem Definition
The Van der Pol oscillator system is chosen as the target ODE for our experiments due to its nonlinear dynamics and the absence of an analytical solution, providing a stringent test for our models. The system is a classic example in nonlinear dynamics and has been extensively studied, making it a reliable benchmark.
The dynamics of the oscillator are described as a system of first-order equations
$$\frac{dy_1}{dt} = y_2, \qquad \frac{dy_2}{dt} = \mu \left( 1 - y_1^2 \right) y_2 - y_1.$$
We fix the damping coefficient $\mu$ and choose an initial condition $y(0)$ that is a small perturbation around the unstable equilibrium at the origin, where $y = (y_1, y_2)$ represents the two states of the system. The desired solution is sought over a time horizon of 2 seconds, during which the system is expected to converge to a limit cycle [9], as illustrated in Fig. 2.
![Refer to caption](extracted/5698745/images/vdp_statespace_mu_10.png)
IV-A1 Evaluation Metrics
Since no analytical solution is available, we assess the performance of the models based on their approximation error compared to a reference solution obtained using the fourth-order Runge-Kutta (RK4) method. The Integral of the Absolute Error (IAE) is computed over 1000 equally time-spaced steps within the solution interval. We also consider the computational time required for training and inference as valuable metrics, especially for time-sensitive applications.
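The evaluation procedure can be sketched as follows: a fixed-step RK4 integrator produces the reference trajectory on 1000 equally spaced steps, and the IAE is approximated by a trapezoidal rule over the same grid. This is a sketch under stated assumptions rather than our evaluation code: the values `mu = 1.0` and `y0 = (0.0, 0.1)` are illustrative placeholders, and summing the absolute error over both states before integrating is one possible convention.

```python
import math

def van_der_pol(mu):
    """Right-hand side of the Van der Pol oscillator as a first-order system."""
    def rhs(t, y):
        y1, y2 = y
        return (y2, mu * (1.0 - y1 * y1) * y2 - y1)
    return rhs

def rk4(rhs, y0, t0, t1, n_steps):
    """Classical fourth-order Runge-Kutta with a fixed step size."""
    h = (t1 - t0) / n_steps
    t, y = t0, tuple(y0)
    ts, ys = [t], [y]
    for i in range(n_steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, tuple(yi + h / 2 * k for yi, k in zip(y, k1)))
        k3 = rhs(t + h / 2, tuple(yi + h / 2 * k for yi, k in zip(y, k2)))
        k4 = rhs(t + h, tuple(yi + h * k for yi, k in zip(y, k3)))
        y = tuple(
            yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
        )
        t = t0 + (i + 1) * h
        ts.append(t)
        ys.append(y)
    return ts, ys

def iae(ts, ys_ref, ys_model):
    """Trapezoidal estimate of the integral of the absolute error (summed over states)."""
    errs = [sum(abs(r - m) for r, m in zip(yr, ym)) for yr, ym in zip(ys_ref, ys_model)]
    return sum(0.5 * (errs[i] + errs[i + 1]) * (ts[i + 1] - ts[i]) for i in range(len(ts) - 1))

# Reference trajectory with illustrative (assumed) settings.
ts, ys_ref = rk4(van_der_pol(1.0), (0.0, 0.1), 0.0, 2.0, 1000)

# Sanity check: with mu = 0 the system is a harmonic oscillator, and
# y(0) = (1, 0) gives the exact solution (cos t, -sin t).
ts_h, ys_h = rk4(van_der_pol(0.0), (1.0, 0.0), 0.0, 2.0, 1000)
exact_h = [(math.cos(t), -math.sin(t)) for t in ts_h]
iae_h = iae(ts_h, ys_h, exact_h)
```

A trained model would be evaluated by passing its predictions on `ts` as `ys_model`.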
IV-B Training
In our experiments with the PIDEQ, we use an architecture similar to the one proposed by [3], with an equilibrium function that provides as the output the (element-wise) hyperbolic tangent of an affine combination of its inputs, namely, the time value $t$ and the hidden states ($z$). The model’s output is a linear combination of the equilibrium vector $z^\star$. Formally, we can write
$$f_\theta(t, z) = \tanh(A z + t a + b), \qquad D_\theta(t) = C z^\star,$$
where the vector parameter $\theta$ is simply a vectorized representation of all coefficients, i.e., $\theta = (A, a, b, C)$, and the input is $x = t$.
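To make the architecture concrete, the sketch below implements a small model of this kind in plain Python. The coefficient names `A`, `a`, `b`, `C` and their values are assumptions for illustration, and the layer is taken to be $\tanh(A z + t a + b)$ with a linear output $C z^\star$. The forward pass uses simple iteration, and the time derivative needed by the physics term is obtained by iterating the linear fixed-point equation $v = \mathrm{diag}(1 - {z^\star}^2)(A v + a)$, which uses only differentiable operations. The Frobenius norm of the layer Jacobian, used as a regularizer, is also computed.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def layer(A, a, b, t, z):
    """Assumed layer: tanh(A z + t a + b), applied element-wise."""
    Az = matvec(A, z)
    return [math.tanh(Az[i] + t * a[i] + b[i]) for i in range(len(z))]

def equilibrium(A, a, b, t, iters=200):
    """Forward pass by simple iteration, starting from z = 0."""
    z = [0.0] * len(b)
    for _ in range(iters):
        z = layer(A, a, b, t, z)
    return z

def output_and_time_derivative(A, a, b, C, t, iters=200):
    """Return y = C z* and dy/dt = C dz*/dt at time t."""
    z = equilibrium(A, a, b, t, iters)
    s = [1.0 - zi * zi for zi in z]  # tanh' evaluated at the equilibrium
    v = [0.0] * len(z)               # v converges to dz*/dt
    for _ in range(iters):
        Av = matvec(A, v)
        v = [s[i] * (Av[i] + a[i]) for i in range(len(z))]
    return matvec(C, z), matvec(C, v)

def jacobian_frobenius_norm(A, z):
    """Frobenius norm of df/dz = diag(1 - z^2) A at the equilibrium z."""
    n = len(z)
    return math.sqrt(sum(((1.0 - z[i] ** 2) * A[i][j]) ** 2 for i in range(n) for j in range(n)))

# Illustrative coefficients; the small entries of A make the layer a contraction.
A = [[0.2, -0.1], [0.1, 0.3]]
a = [1.0, -0.5]
b = [0.0, 0.1]
C = [[1.0, 0.5], [-0.3, 0.8]]

t, eps = 0.7, 1e-5
y, dy_dt = output_and_time_derivative(A, a, b, C, t)
y_plus, _ = output_and_time_derivative(A, a, b, C, t + eps)
y_minus, _ = output_and_time_derivative(A, a, b, C, t - eps)
dy_dt_fd = [(p - m) / (2 * eps) for p, m in zip(y_plus, y_minus)]  # finite-difference check
reg = jacobian_frobenius_norm(A, equilibrium(A, a, b, t))
```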
We use the Adam optimizer [10] for training the PIDEQ. The cost function incorporates regularization terms to enforce physics-informed training, as detailed in Sec. III-D. Our default hyperparameter configuration, determined through early experimentation, fixes the learning rate, the physics-loss weight $\lambda$, and the Jacobian-regularization coefficient $\kappa$, with a budget of 50000 epochs. Our default solver for the forward pass was Anderson acceleration with a fixed tolerance, while the backward pass is always solved iteratively to ensure differentiability.
Nevertheless, the dimension of the hidden states ($n_z$), the solver tolerance, and the coefficient for the Jacobian regularization term ($\kappa$) are all considered hyperparameters, and we measure their impacts empirically. We also consider as a hyperparameter the choice between Anderson acceleration, Broyden’s method, and the simple iteration procedure as the forward pass solver.
IV-C Experimental Results
All experiments reported below were performed on a high-end computer using an RTX 3060 GPU. Further implementation details can be seen in our code repository (https://github.com/brunompacheco/pideq). For every hyperparameter configuration, five models were trained with random initial values for the trainable parameters.
IV-C1 Baseline
Our baseline PINN model follows the architecture proposed in [8], with four layers of twenty nodes each. As a DEQ with as many states as there are hidden nodes in a deep feedforward network has at least as much representational power [3], we start with 80 hidden states. The results can be seen in Fig. 3. Even though both models achieved low IAE values, the PINN presented better performance in terms of IAE and training time, converging in far fewer epochs.
![Refer to caption](extracted/5698745/images/exp_1_iae.png)
IV-C2 Hyperparameter Optimization
A deeper look at the baseline PIDEQ shows that the parameters converged to an $A$ matrix with many null rows. This indicates that the model could achieve similar performance with fewer hidden states. This intuition was confirmed in our hyperparameter tuning experiments, as shown in Fig. 4. We iteratively halved the number of states until the $A$ matrix had no more empty rows, which happened at 5 states. Then, we took one step further and reduced the number of states again, but the results indicated that, indeed, 5 was the sweet spot, with faster convergence and lower final IAE. The following experiments are performed with PIDEQs with five hidden states.
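The null-row inspection that guided this reduction can be sketched in a few lines. The threshold `tol` and the example matrix are assumptions for illustration; in practice, trained coefficients are rarely exactly zero, so a small tolerance is needed to decide which rows count as null.

```python
def null_rows(A, tol=1e-6):
    """Indices of rows of A whose entries are all smaller than tol in magnitude."""
    return [i for i, row in enumerate(A) if all(abs(x) < tol for x in row)]

# Hypothetical 3-state example: rows 1 and 2 are numerically null.
A = [
    [0.4, 0.0, -0.2],
    [0.0, 0.0, 0.0],
    [1e-9, -2e-8, 0.0],
]
idle = null_rows(A)
```

A state whose row is null receives no contribution from the hidden states in the affine combination inside the layer, which is what suggests that a smaller hidden dimension may suffice.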
![Refer to caption](extracted/5698745/images/exp_2_iae.png)
We further investigate the importance of the regularization term on the Jacobian of the equilibrium function. Even though the Jacobian of the DEQ, in the context of physics-informed training, is directly learned, our experiments indicate that the presence of the regularization term is essential. Without it, the training took over 30 times longer and was very unstable. However, the magnitude of the coefficient has little impact on the outcomes. Our results for different values of the regularization coefficient $\kappa$ (including 0) are illustrated in Fig. 5.
![Refer to caption](extracted/5698745/images/exp_4_iae.png)
Anderson acceleration, Broyden’s method, and the simple iteration procedure were evaluated as solvers for the forward pass. As the results illustrated in Fig. 6 show, Broyden’s method could provide a performance gain, but it came with a significant computational cost, as the epochs took over six times longer in comparison to using Anderson acceleration. At the same time, using the simple iteration procedure was three times faster than using Anderson acceleration. However, as discussed in Sec. III-B, the theoretical limitation of the simple iterative procedure limits its reliability. We also evaluated the performance of PIDEQ under different values for the solver tolerance, but no performance gain was observed for values different from our default.
![Refer to caption](extracted/5698745/images/exp_5_iae.png)
IV-C3 Results
The final PIDEQ, after hyperparameter tuning, achieves comparable results to the baseline PINN models. In terms of approximation performance, while the baseline PIDEQ achieved an IAE of 0.0082, our best PIDEQ achieved an IAE of 0.0018, and the baseline PINN achieved an IAE of 0.0002, which is seen through the last epochs in Fig. 8. Fig. 7 illustrates how small the difference is between the two approaches, showing them in comparison to the solution computed using RK4. However, the greatest gain comes from the reduced training time, as the tuned PIDEQ converges much faster than the baseline PIDEQ.
![Refer to caption](extracted/5698745/images/final_vdp_y1.png)
![Refer to caption](extracted/5698745/images/final_vdp_y2.png)
The greatest performance gain came from reducing the number of hidden states, guided by the sparsity of the $A$ matrix. The smaller model size leads to improved memory efficiency and explainability. We also trained PINN models with two hidden layers of 5 nodes each, totaling 52 trainable parameters, the same number of parameters as the tuned PIDEQ. The results are shown in Fig. 8. Note that both final models take around the same number of epochs to converge.
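The parameter count of the small PINN can be verified with a short calculation. The sketch assumes fully connected layers with biases, a scalar time input, and two outputs (one per state of the oscillator); the helper name is hypothetical.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Two hidden layers of 5 nodes: (1*5 + 5) + (5*5 + 5) + (5*2 + 2) = 10 + 30 + 12.
small_pinn = mlp_parameter_count([1, 5, 5, 2])

# Baseline with four hidden layers of 20 nodes, under the same assumptions.
baseline_pinn = mlp_parameter_count([1, 20, 20, 20, 20, 2])
```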
![Refer to caption](extracted/5698745/images/final_iae.png)
V CONCLUSIONS
This study explored two innovative approaches to deep learning: Physics-Informed Neural Networks (PINNs) and Deep Equilibrium Models (DEQs). While PINNs offer efficiency in training models for physical problems, DEQs promise greater representational power with fewer parameters. We introduced Physics-Informed Deep Equilibrium Models (PIDEQ), uniting the physics-regularization with the infinite-depth architecture.
Our experiments on the Van der Pol oscillator showed that PIDEQs can solve IVPs of ODEs, supporting our proposed approach. However, comparisons with PINN models revealed mixed results. Although PIDEQs demonstrated the ability to solve IVPs, they exhibited slightly higher errors and slower training times compared to PINNs. This suggests that while DEQ structures offer theoretical advantages in representational power, these benefits may not translate into practical improvements for certain types of problems like the Van der Pol oscillator.
The strengths of PIDEQs lie in their potential for handling more complex problems, leveraging the implicit depth of DEQs. However, the current implementation’s reliance on simple iterative methods for the backward pass can be a limiting factor, especially for more challenging problems and partial differential equations (PDEs). This limitation points to a significant area for improvement.
Future research should focus on evaluating PIDEQs with more complex problems (e.g., higher-order ODEs and PDEs) to better understand their potential advantages. Additionally, exploring more sophisticated methods for the backward pass, such as implicit differentiation techniques, could enhance the training efficiency and accuracy of PIDEQs.
In conclusion, while PIDEQs present a promising direction for integrating physics-informed principles with advanced deep-learning architectures, further work is still necessary to fully realize their potential and address their current limitations compared to traditional PINNs. This future research could pave the way for more robust and versatile models capable of solving a wide range of complex dynamical systems, providing competitive solver alternatives.
Acknowledgment
This research was funded in part by Fundação de Amparo à Pesquisa e Inovação do Estado de Santa Catarina (FAPESC) under grant 2021TR2265, CNPq under grants 308624/2021-1 and 402099/2023-0, and CAPES under grant PROEX.
References
- [1] M. Raissi, P. Perdikaris, and G. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” Journal of Computational Physics, vol. 378, pp. 686–707, 2019.
- [2] S. Bai, J. Z. Kolter, and V. Koltun, “Deep equilibrium models,” in Advances in Neural Information Processing Systems (H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, eds.), vol. 32, Curran Associates, Inc., 2019.
- [3] L. El Ghaoui, F. Gu, B. Travacca, A. Askari, and A. Tsai, “Implicit deep learning,” SIAM Journal on Mathematics of Data Science, vol. 3, pp. 930–958, Jan. 2021.
- [4] E. Süli and D. F. Mayers, An Introduction to Numerical Analysis. Cambridge University Press, Aug. 2003.
- [5] H. F. Walker and P. Ni, “Anderson acceleration for fixed-point iterations,” SIAM Journal on Numerical Analysis, vol. 49, pp. 1715–1735, Jan. 2011.
- [6] C. G. Broyden, “A class of methods for solving nonlinear simultaneous equations,” Mathematics of Computation, vol. 19, no. 92, pp. 577–593, 1965.
- [7] S. Bai, V. Koltun, and Z. Kolter, “Stabilizing equilibrium models by Jacobian regularization,” in Proceedings of the 38th International Conference on Machine Learning (M. Meila and T. Zhang, eds.), vol. 139, pp. 554–565, PMLR, 18–24 Jul 2021.
- [8] E. A. Antonelo, E. Camponogara, L. O. Seman, E. R. de Souza, J. P. Jordanou, and J. F. Hübner, “Physics-informed neural nets for control of dynamical systems,” Neurocomputing, vol. 579, p. 127419, April 2024.
- [9] R. Grimshaw, Nonlinear Ordinary Differential Equations: Applied Mathematics and Engineering Science Texts. CRC Press, March 2017.
- [10] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations (ICLR) (Y. Bengio and Y. LeCun, eds.), (San Diego, CA, USA), 2015.