Solving Differential Equations using Physics-Informed Deep Equilibrium Models

Bruno M. Pacheco and Eduardo Camponogara

B. Pacheco and E. Camponogara are with the Department of Automation and Systems Engineering, Federal University of Santa Catarina, Brazil.

Abstract

This paper introduces Physics-Informed Deep Equilibrium Models (PIDEQs) for solving initial value problems (IVPs) of ordinary differential equations (ODEs). Leveraging recent advancements in deep equilibrium models (DEQs) and physics-informed neural networks (PINNs), PIDEQs combine the implicit output representation of DEQs with physics-informed training techniques. Our validation of PIDEQs, using the Van der Pol oscillator as a benchmark problem, yielded compelling results, demonstrating their efficiency and effectiveness in solving IVPs. Our analysis includes key hyperparameter considerations for optimizing PIDEQ performance. By bridging deep learning and physics-based modeling, this work advances computational techniques for solving IVPs with implications for scientific computing and engineering applications.

I INTRODUCTION

The advent of deep learning has revolutionized various industries, demonstrating its prowess in tackling complex problems across domains ranging from image recognition to natural language processing. Despite this success, applying deep learning to solve initial value problems (IVPs) of differential equations remains a formidable challenge, primarily because deep learning models are data-driven. Gathering sufficient data from real-life dynamical systems can be prohibitively expensive, which calls for a different approach to training deep learning models for such tasks.

The work by [1] introduced physics-informed neural networks (PINNs), which solve IVPs with deep learning models by optimizing a model’s dynamics rather than solely its outputs. This approach allows deep learning models to approximate the dynamics of a system, provided an accurate description of these dynamics is known, typically in the form of differential equations. Focusing on optimizing the dynamics allows the model to be trained with minimal data, covering only the initial and boundary conditions.

In parallel, [2] and [3] proposed an innovative approach to deep learning by implicitly defining a model’s output as a solution to an equilibrium equation. This methodology results in a model known as a Deep Equilibrium Model (DEQ), which exhibits an infinite-depth network structure with residual connections. This design offers significant representational power with relatively few parameters, widening the architectural possibilities for deep learning models.

Bringing PINNs and DEQs together presents an opportunity to combine the strengths of both approaches. PINNs excel in incorporating physical laws into the learning process, reducing the dependency on extensive data. DEQs, with their implicit infinite-depth structure, offer a powerful framework for solving complex problems with fewer parameters. By integrating these methodologies, we can create a Physics-Informed Deep Equilibrium Model (PIDEQ) that applies the physics-informed training techniques of PINNs within the robust and efficient framework of DEQs.

This paper explores this integration by studying, implementing, and validating PIDEQs as efficient and accurate solvers for IVPs. An efficient solver is characterized by its ability to operate with minimal data and computational resources. In contrast, an effective solver can provide accurate solutions across a vast domain of the independent variable. Our research is guided by three objectives: implementing a DEQ, designing and implementing a PINN training algorithm suitable for DEQs, and evaluating the performance of the physics-informed DEQ in solving IVPs. These objectives form the backbone of our study. More specifically, our contributions are:

  • A novel approach for training DEQs for solving IVPs using physics regularization, resulting in a physics-informed deep equilibrium model.

  • An experimental evaluation of PIDEQs as efficient and effective solvers for ordinary differential equations using the Van der Pol oscillator.

  • An analysis of the key hyperparameters that must be adjusted to develop PIDEQs effectively.

Through these efforts, we aim to provide insights into the suitability of DEQs, a cutting-edge deep learning architecture, for addressing the challenges posed by solving IVPs of ODEs, paving the way for their broader adoption in scientific computing and engineering applications.

II SOLVING IVPs USING DEEP LEARNING

Solving IVPs of differential equations using deep learning poses unique challenges. Unlike traditional deep learning tasks where input-output pairs are readily available, in IVPs, both the target function (solution to the IVP) and input-output pairs are unknown or too complex to be directly useful. This lack of information complicates the training process, as the conventional approach of constructing a dataset of input-output pairs becomes impractical or impossible.

II-A Problem Statement

Consider an IVP of an ODE. Given a function $\mathcal{N}:\mathbb{R}\times\mathbb{R}^{m}\to\mathbb{R}^{m}$ and an initial condition $t_{0}\in I\subset\mathbb{R}$, $\bm{y}_{0}\in\mathbb{R}^{m}$, we aim to find a solution $\bm{\phi}:I\to\mathbb{R}^{m}$ such that the differential equation

$$\frac{d\bm{\phi}(t)}{dt}=\mathcal{N}\left(t,\bm{\phi}(t)\right),\quad\bm{\phi}(t_{0})=\bm{y}_{0}$$

holds on the entirety of the interval $I$.

Our objective is to train a deep learning model $D_{\bm{\theta}}:I\to\mathbb{R}^{m}$, parameterized by $\bm{\theta}\in\Theta$, to approximate the solution function $\bm{\phi}$, satisfying the same conditions, i.e.,

$$\frac{dD_{\bm{\theta}}(t)}{dt}=\mathcal{N}\left(t,D_{\bm{\theta}}(t)\right),\;t\in I\quad\text{and}\quad D_{\bm{\theta}}(t_{0})=\bm{y}_{0}.$$

II-B Physics Regularization

The conventional approach of constructing a dataset for training is inefficient in the context of IVPs, as it does not leverage the known dynamics represented by $\mathcal{N}$. To address this challenge, [1] proposed a physics-informed learning approach incorporating known dynamics into the training process.

The key idea is to train the model $D_{\bm{\theta}}$ to approximate the solution $\bm{\phi}$ at $t_{0}$, satisfying the initial condition constraint, and simultaneously train its Jacobian to approximate $\mathcal{N}$, ensuring that the dynamics are captured accurately.

To achieve this, the cost function is defined as

$$J(\bm{\theta})=J_{b}(\bm{\theta})+\lambda J_{\mathcal{N}}(\bm{\theta})\tag{1}$$

with two distinct components weighted by a scalar parameter $\lambda\in\mathbb{R}_{+}$. By optimizing this cost function, the model is guided to simultaneously satisfy the initial condition and approximate the dynamics represented by $\mathcal{N}$.

The term $J_{b}(\bm{\theta})$ is introduced to enforce the initial condition constraint. It is given by

$$J_{b}(\bm{\theta})=\|D_{\bm{\theta}}(t_{0})-\bm{y}_{0}\|_{2}^{2},$$

using the squared $\ell_{2}$ norm to penalize the deviation from the initial condition.

The term $J_{\mathcal{N}}(\bm{\theta})$ plays a crucial role in ensuring that the model captures the underlying dynamics specified by $\mathcal{N}$. By regularizing the model’s Jacobian, this term forces the model to learn not just the solution at discrete points, but also how the solution evolves according to the differential equations. This is achieved by minimizing the difference between the model’s predicted derivative $\frac{d}{dt}D_{\bm{\theta}}(t)$ and the true dynamics $\mathcal{N}(t,D_{\bm{\theta}}(t))$. To this end, it is defined as

$$J_{\mathcal{N}}(\bm{\theta})=\sum_{t\sim\mathcal{U}(I)}\left\|\frac{d}{dt}D_{\bm{\theta}}(t)-\mathcal{N}\left(t,D_{\bm{\theta}}(t)\right)\right\|_{2}^{2},$$

where $\mathcal{U}(I)$ is a uniform distribution over the interval $I$, i.e., the data is uniformly sampled from the domain. (The sampling strategy can be considered a design choice.)

This approach eliminates the need for explicit knowledge of the target function ($\bm{\phi}$) and allows for efficient training using randomly constructed samples, making deep learning a viable option for solving IVPs.
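As a rough illustration of the cost $J = J_b + \lambda J_{\mathcal{N}}$, the sketch below evaluates it for a scalar toy problem. The toy IVP ($y' = -y$, $y(0) = 1$), the two candidate functions, the function names, and the use of a central finite difference in place of automatic differentiation are all assumptions made for this example and are not taken from the original implementation.

```python
import math
import random

def physics_informed_cost(model, dynamics, t0, y0, interval, lam=0.1,
                          n_samples=200, h=1e-5, seed=0):
    """Evaluate J = J_b + lam * J_N for a scalar model D(t).

    J_b penalizes the initial-condition mismatch; J_N sums the squared ODE
    residual dD/dt - N(t, D(t)) over times sampled uniformly from the interval.
    The derivative is a central difference (an assumption of this sketch;
    a deep learning framework would use automatic differentiation).
    """
    rng = random.Random(seed)
    j_b = (model(t0) - y0) ** 2
    j_n = 0.0
    for _ in range(n_samples):
        t = rng.uniform(*interval)
        d_dt = (model(t + h) - model(t - h)) / (2 * h)
        j_n += (d_dt - dynamics(t, model(t))) ** 2
    return j_b + lam * j_n

# Hypothetical toy IVP: y' = -y, y(0) = 1, whose solution is exp(-t).
dynamics = lambda t, y: -y
exact = lambda t: math.exp(-t)
poor = lambda t: 1.0 - 0.5 * t  # satisfies y(0) = 1 but not the dynamics

cost_exact = physics_informed_cost(exact, dynamics, 0.0, 1.0, (0.0, 2.0))
cost_poor = physics_informed_cost(poor, dynamics, 0.0, 1.0, (0.0, 2.0))
```

Only the dynamics and the initial condition enter the cost; no samples of the true solution are required, and a candidate that violates the dynamics is penalized even though it matches the initial condition.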

III DEEP EQUILIBRIUM MODELS

This section introduces and defines DEQs and presents their combination with physics-informed training in the proposed PIDEQ framework. We follow the notation proposed by [2] and explore the foundational concepts and practical considerations that enable the introduction of our proposed approach.

III-A Introduction and Definition

Deep Learning models typically compose simple parametrized functions to capture complex features. While traditional architectures stack these functions in a sequential manner, in a DEQ, the architecture is based on an infinite stack of the same function. If this function is well-posed, i.e., it respects a Lipschitz-continuity condition [3], the infinite stack leads to an equilibrium point that serves as the model’s output.

III-B Forward Computation

Formally, a DEQ can be defined as the solution to an equilibrium equation. Let $\bm{f}_{\bm{\theta}}:\mathbb{R}^{n}\times\mathbb{R}^{m}\to\mathbb{R}^{m}$ be the layer function. Then, the output of a DEQ can be described through the solution $\bm{z}^{\star}$ of the equilibrium equation

$$\bm{z}=\bm{f}_{\bm{\theta}}(\bm{x},\bm{z}),$$

where $\bm{x}$ is the model’s input. Fig. 1 illustrates the equilibrium equation inside a DEQ.

Figure 1: Illustration of the recursion within the equilibrium equation that defines the DEQs (diagram labels: $\bm{x}$, $\bm{f}_{\bm{\theta}}$, $\bm{z}$, $D^{EQ}_{\bm{\theta}}$).

The simplest way to solve the equilibrium equation that defines the output of a DEQ is by iteratively applying the equilibrium function until convergence. Given an input $\bm{x}$ and an initial guess $\bm{z}^{[0]}$, the procedure updates the equilibrium guess $\bm{z}^{[i]}$ by

$$\bm{z}^{[i]}=\bm{f}_{\bm{\theta}}(\bm{x},\bm{z}^{[i-1]})$$

until $|\bm{z}^{[i]}-\bm{z}^{[i-1]}|$ is sufficiently small. This approach, known as the simple iteration method, is intuitive but can be slow to converge and is sensitive to the starting point. Moreover, it only finds equilibria of functions that are contractions between the starting point and the equilibrium point [4]. For example, the simple iteration method fails to find the equilibrium of $f(z)=2z-1$ starting from any point other than the equilibrium itself ($z=1$).
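The behavior described above can be reproduced with a few lines of code. The following is a minimal scalar sketch; the function name, the tolerance, the iteration budget, and the `tanh` example layer are assumptions chosen for illustration.

```python
import math

def simple_iteration(f, z0, tol=1e-10, max_iter=200):
    """Iterate z <- f(z) until successive guesses differ by less than tol.

    Returns the last guess and a flag telling whether the tolerance was met.
    """
    z = z0
    for _ in range(max_iter):
        z_next = f(z)
        if abs(z_next - z) < tol:
            return z_next, True
        z = z_next
    return z, False

# A tanh layer with a small weight is a contraction, so the iteration converges.
contraction = lambda z: math.tanh(0.5 * z + 0.3)
z_star, ok = simple_iteration(contraction, z0=0.0)

# f(z) = 2z - 1 doubles the distance to z = 1 at every step.
expanding = lambda z: 2.0 * z - 1.0
_, ok_at_equilibrium = simple_iteration(expanding, z0=1.0)
z_far, ok_nearby = simple_iteration(expanding, z0=1.0 + 1e-6)
```

Starting exactly at $z=1$ the iteration stops immediately, whereas a perturbation of $10^{-6}$ is amplified at every step and the tolerance is never met within the budget.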

To address these limitations, Newton’s method can be used. Applied to the residual $\bm{f}_{\bm{\theta}}(\bm{x},\bm{z})-\bm{z}$ of the equilibrium equation, it updates the incumbent equilibrium point by

$$\bm{z}^{[i+1]}=\bm{z}^{[i]}-\left(\frac{d\bm{f}_{\bm{\theta}}(\bm{x},\bm{z}^{[i]})}{d\bm{z}}-I\right)^{-1}\left(\bm{f}_{\bm{\theta}}(\bm{x},\bm{z}^{[i]})-\bm{z}^{[i]}\right),$$

where $\frac{d}{d\bm{z}}\bm{f}_{\bm{\theta}}(\bm{x},\bm{z}^{[i]})$ represents the Jacobian of $\bm{f}_{\bm{\theta}}$ with respect to $\bm{z}$ and $I$ is the identity matrix. (To avoid the computational burden of inverting the matrix, it is common to solve $\left(\frac{d\bm{f}_{\bm{\theta}}(\bm{x},\bm{z}^{[i]})}{d\bm{z}}-I\right)\left(\bm{z}^{[i+1]}-\bm{z}^{[i]}\right)=-\left(\bm{f}_{\bm{\theta}}(\bm{x},\bm{z}^{[i]})-\bm{z}^{[i]}\right)$ instead.)

Newton’s method converges much faster and is applicable to a broader class of functions than simple iteration [4], as it leverages the Jacobian of $\bm{f}_{\bm{\theta}}$. As the deep learning paradigm is to perform gradient-based optimization, the well-posedness of the Jacobian of the equilibrium function is already guaranteed. In other words, we can benefit from the fast convergence rate and broad applicability of Newton’s method. In fact, most modern root-finding algorithms can be used for computing the equilibrium, such as Anderson acceleration [5] and Broyden’s method [6].
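For a scalar equilibrium equation $z=f(z)$, Newton’s method can be sketched by applying it to the residual $g(z)=f(z)-z$, whose root is the equilibrium. The function name, the tolerance, and the example layers below are assumptions made for illustration; the derivative is supplied by hand rather than by automatic differentiation.

```python
import math

def newton_fixed_point(f, df_dz, z0, tol=1e-12, max_iter=50):
    """Newton's method on the residual g(z) = f(z) - z of a scalar equation z = f(z)."""
    z = z0
    for _ in range(max_iter):
        residual = f(z) - z
        if abs(residual) < tol:
            return z, True
        # g'(z) = f'(z) - 1, so the Newton step is z - g(z) / g'(z).
        z = z - residual / (df_dz(z) - 1.0)
    return z, False

# The linear map on which simple iteration diverges: equilibrium at z = 1.
z_lin, ok_lin = newton_fixed_point(lambda z: 2.0 * z - 1.0, lambda z: 2.0, z0=5.0)

# A hypothetical scalar tanh layer.
f_tanh = lambda z: math.tanh(0.5 * z + 0.3)
df_tanh = lambda z: 0.5 * (1.0 - math.tanh(0.5 * z + 0.3) ** 2)
z_tanh, ok_tanh = newton_fixed_point(f_tanh, df_tanh, z0=0.0)
```

Because $f(z)=2z-1$ is linear, a single Newton step from any starting point lands on $z=1$, the case in which repeated application of $f$ moves away from the equilibrium.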

III-C Backward Computation

Computing the gradient of the output of a DEQ with respect to its parameters is not straightforward. The approach of automatic-differentiation frameworks for deep learning is to backpropagate the loss at training time, differentiating every operation in the forward pass. Differentiating through modern root-finding algorithms is, at best, an intricate and costly operation. Even if the forward pass is done through the iterative process, its differentiation would require backpropagating through an unknown number of layers.

Luckily, we can exploit the fact that the output of a DEQ for an equilibrium function $\bm{f}_{\bm{\theta}}(\bm{x},\bm{z})$ defines a parametrization of $\bm{z}$ with respect to $\bm{x}$. This allows us to apply the implicit function theorem to write the Jacobian of a DEQ as

$$\frac{dD^{EQ}_{\bm{\theta}}(\bm{x})}{d\bm{\theta}}=-\left[\frac{d\bm{f}_{\bm{\theta}}(\bm{x},D^{EQ}_{\bm{\theta}}(\bm{x}))}{d\bm{z}}-I\right]^{-1}\frac{d\bm{f}_{\bm{\theta}}(\bm{x},D^{EQ}_{\bm{\theta}}(\bm{x}))}{d\bm{\theta}}.$$

Note that the Jacobian can be computed regardless of the operations applied during the forward pass. Furthermore, we do not need to compute the entire Jacobian matrix for gradient descent. We need to compute the gradient of a loss function $\mathcal{L}$ with respect to the model’s parameters. Such gradient can be written as

$$\begin{aligned}
\frac{d\mathcal{L}}{d\bm{\theta}}\bigg\rvert_{\hat{y}=D^{EQ}_{\bm{\theta}}(\bm{x})}
&=\frac{d\mathcal{L}}{d\hat{y}}\bigg\rvert_{\hat{y}=D^{EQ}_{\bm{\theta}}(\bm{x})}\frac{dD^{EQ}_{\bm{\theta}}}{d\bm{\theta}}\bigg\rvert_{\bm{x}}\\
&=-\frac{d\mathcal{L}}{d\hat{y}}\left[\frac{d\bm{f}_{\bm{\theta}}}{d\bm{z}}-I\right]^{-1}\frac{d\bm{f}_{\bm{\theta}}}{d\bm{\theta}},
\end{aligned}$$

where $\frac{d\mathcal{L}}{d\hat{y}}$ is the gradient of the loss function and $\frac{dD^{EQ}_{\bm{\theta}}}{d\bm{\theta}}$ is the Jacobian of the DEQ with respect to its parameters. Note that the gradient requires us to compute two vector-matrix products, in which the result of $-\frac{d\mathcal{L}}{d\hat{y}}\left[\frac{d\bm{f}_{\bm{\theta}}}{d\bm{z}}-I\right]^{-1}$ can be computed through a root-finding algorithm, assuming that the gradient of the loss function is known. We refer the reader to [2] and [3] for further details.
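To make the last remark concrete, note that the row vector $\bm{u}=-\frac{d\mathcal{L}}{d\hat{y}}\left[\frac{d\bm{f}_{\bm{\theta}}}{d\bm{z}}-I\right]^{-1}$ satisfies $\bm{u}=\bm{u}\frac{d\bm{f}_{\bm{\theta}}}{d\bm{z}}+\frac{d\mathcal{L}}{d\hat{y}}$, which is itself an equilibrium equation. The sketch below solves it by simple iteration for a small example; the helper names and the numerical values of the Jacobian and of the loss gradient are hypothetical, and the iteration is only expected to converge when the Jacobian has spectral radius below one.

```python
def vec_mat(u, M):
    """Row vector times matrix."""
    return [sum(u[i] * M[i][j] for i in range(len(u))) for j in range(len(M[0]))]

def solve_backward(grad_out, jac_z, tol=1e-12, max_iter=1000):
    """Find u = -grad_out [jac_z - I]^{-1} by iterating u <- u jac_z + grad_out."""
    u = list(grad_out)
    for _ in range(max_iter):
        u_next = [uj + g for uj, g in zip(vec_mat(u, jac_z), grad_out)]
        if max(abs(p - q) for p, q in zip(u_next, u)) < tol:
            return u_next
        u = u_next
    return u

# Hypothetical values: Jacobian of f with respect to z at the equilibrium,
# and gradient of the loss with respect to the DEQ output.
jac_z = [[0.2, -0.1], [0.3, 0.1]]
grad_out = [1.0, -2.0]
u = solve_backward(grad_out, jac_z)

# Consistency check: u (I - jac_z) should reproduce grad_out.
check = [ui - vi for ui, vi in zip(u, vec_mat(u, jac_z))]
```

The parameter gradient is then obtained with one more vector-matrix product, multiplying `u` by the Jacobian of the layer function with respect to the parameters, so the inverse matrix is never formed explicitly.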

III-D Physics-Informed Deep Equilibrium Model (PIDEQ)

A Physics-Informed Deep Equilibrium Model (PIDEQ) extends the principles of physics-informed learning to a DEQ. By integrating physics-based constraints into training, PIDEQs aim to combine the architectural power of DEQs with robustness and accuracy in modeling physical systems.

The challenge in physics-informing a DEQ lies in optimizing a cost function on its derivatives. As discussed in Sec. III-C, [2] proposed an efficient method to compute the first derivative of a DEQ with respect to either its parameters or its input, which, in the context of physics-informed learning, allows us to compute the loss function value. However, computing the gradient of the loss function requires second derivatives, which pose additional challenges as they imply differentiating through the backward pass. Furthermore, to the best of our knowledge, higher-order derivatives of DEQs have not been investigated. To address this challenge, we limit ourselves to using solely differentiable operators in the backward pass. This way, automatic differentiation frameworks can compute the second derivatives automatically.
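The first derivative with respect to the input, which the physics-informed loss needs, follows from the same implicit function theorem. The scalar sketch below assumes a layer $f(t,z)=\tanh(az+bt+c)$ with hypothetical coefficients, computes $\frac{dz^{\star}}{dt}=-\left[\frac{df}{dz}-1\right]^{-1}\frac{df}{dt}$, and compares it with a finite difference of the equilibrium. It does not cover the second derivatives required during training.

```python
import math

def equilibrium(t, a=0.5, b=0.8, c=0.1, tol=1e-13, max_iter=10_000):
    """Solve z = tanh(a z + b t + c) by simple iteration (|a| < 1, so it contracts)."""
    z = 0.0
    for _ in range(max_iter):
        z_next = math.tanh(a * z + b * t + c)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def dz_dt_implicit(t, a=0.5, b=0.8, c=0.1):
    """dz*/dt = -[df/dz - 1]^{-1} df/dt, evaluated at the equilibrium."""
    z = equilibrium(t, a, b, c)
    s = 1.0 - z * z  # tanh'(u) at the equilibrium, since tanh(u) = z there
    return -(b * s) / (a * s - 1.0)

t, h = 0.7, 1e-5
finite_diff = (equilibrium(t + h) - equilibrium(t - h)) / (2 * h)
implicit = dz_dt_implicit(t)
```

The implicit expression uses only quantities available at the equilibrium, independently of how the equilibrium was found.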

Finally, it has been shown that DEQs benefit significantly from penalizing the presence of large values in their Jacobian [7]. This penalization can be achieved by adding the Frobenius norm of the Jacobian of the equilibrium function as a regularization term in the loss function, which was shown to reduce training and inference times.

Therefore, we propose the following loss function for training PIDEQs:

$$J\left(\bm{\theta}\right)=J_{b}\left(\bm{\theta}\right)+\lambda J_{\mathcal{N}}\left(\bm{\theta}\right)+\kappa\left\lVert\frac{d\bm{f}_{\bm{\theta}}}{d\bm{z}}\right\rVert_{F}.$$

It combines a base loss term ($J_{b}$) with a physics-informed loss term ($J_{\mathcal{N}}$) and a regularization term based on the Frobenius norm of the Jacobian of the equilibrium function, weighted by a coefficient $\kappa\geq 0$.
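A minimal sketch of the regularization term follows, assuming a layer of the form $\tanh(A\bm{z}+\bm{o})$ with a generic offset $\bm{o}$, for which the Jacobian with respect to $\bm{z}$ is $\mathrm{diag}(1-\tanh^{2}(\bm{u}))A$ with $\bm{u}=A\bm{z}+\bm{o}$. The function names, the exact evaluation of the norm, and the example values are assumptions of this illustration; the default weights are arbitrary placeholders.

```python
import math

def jacobian_frobenius_norm(A, offset, z):
    """Frobenius norm of d/dz tanh(A z + offset), i.e. of diag(1 - tanh(u)^2) A."""
    u = [sum(a_ij * z_j for a_ij, z_j in zip(row, z)) + o
         for row, o in zip(A, offset)]
    total = 0.0
    for row, u_i in zip(A, u):
        s_i = 1.0 - math.tanh(u_i) ** 2  # derivative of tanh at u_i
        total += sum((s_i * a_ij) ** 2 for a_ij in row)
    return math.sqrt(total)

def pideq_cost(j_b, j_n, jac_norm, lam=0.1, kappa=1.0):
    """J = J_b + lam * J_N + kappa * ||df/dz||_F."""
    return j_b + lam * j_n + kappa * jac_norm

A = [[3.0, 0.0], [0.0, 4.0]]
norm_at_origin = jacobian_frobenius_norm(A, [0.0, 0.0], [0.0, 0.0])
norm_saturated = jacobian_frobenius_norm(A, [5.0, 5.0], [0.0, 0.0])
total_cost = pideq_cost(j_b=0.5, j_n=2.0, jac_norm=norm_at_origin)
```

At zero pre-activation the norm equals the Frobenius norm of $A$; when the units saturate, the factor $1-\tanh^{2}$ shrinks the Jacobian and with it the penalty.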

IV EXPERIMENTS

In this section, we conduct a series of experiments to evaluate the capacity of PIDEQs for solving differential equations. We use the Van der Pol oscillator as our benchmark IVP due to its well-documented complexity and lack of an analytical solution, which makes it an ideal candidate for testing the robustness of numerical solvers. We compare the PIDEQ approach with physics-informed neural networks (PINNs), a well-established deep learning technique for similar tasks [1, 8].

IV-A Problem Definition

The Van der Pol oscillator system is chosen as the target ODE for our experiments due to its nonlinear dynamics and the absence of an analytical solution, providing a stringent test for our models. The system is a classic example in nonlinear dynamics and has been extensively studied, making it a reliable benchmark.

The dynamics of the oscillator are described as a system of first-order equations

$$\frac{d}{dt}\begin{bmatrix}y_{1}(t)\\ y_{2}(t)\end{bmatrix}=\begin{bmatrix}y_{2}(t)\\ \mu\left(1-y_{1}(t)^{2}\right)y_{2}(t)-y_{1}(t)\end{bmatrix}.$$

We select a value of $\mu=1$ for the damping coefficient and an initial condition $\bm{y}(0)=(0,0.1)$, i.e., a small perturbation around the unstable equilibrium at the origin, where $\bm{y}$ represents the two states of the system. The desired solution is sought over a time horizon of 2 seconds, during which the system is expected to converge to a limit cycle [9], as illustrated in Fig. 2.

Figure 2: Solution for the Van der Pol oscillator with initial condition $y_{1}(0)=0,\,y_{2}(0)=0.1$ and $\mu=1$. The trajectory of the numerical solution is shown in a state-space plot, with $y_{2}$ on the vertical axis and $y_{1}$ on the horizontal axis.

IV-A1 Evaluation Metrics

Since no analytical solution is available, we assess the performance of the models based on their approximation error compared to a reference solution obtained using the fourth-order Runge-Kutta (RK4) method. The Integral of the Absolute Error (IAE) is computed over 1000 equally time-spaced steps within the solution interval. We also consider the computational time required for training and inference as valuable metrics, especially for time-sensitive applications.
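A reference trajectory and the IAE can be sketched as follows. The function names are hypothetical, and two details are assumptions of this example because the text does not fix them: the integral is approximated with a rectangle rule over the equally spaced steps, and the absolute errors of both state components are summed.

```python
def van_der_pol(t, y, mu=1.0):
    """Right-hand side of the Van der Pol system in first-order form."""
    y1, y2 = y
    return (y2, mu * (1.0 - y1 ** 2) * y2 - y1)

def rk4(f, y0, t0, t1, n_steps):
    """Classical fourth-order Runge-Kutta; returns the n_steps + 1 visited states."""
    h = (t1 - t0) / n_steps
    t, y = t0, tuple(y0)
    trajectory = [y]
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, tuple(yi + h / 2 * ki for yi, ki in zip(y, k1)))
        k3 = f(t + h / 2, tuple(yi + h / 2 * ki for yi, ki in zip(y, k2)))
        k4 = f(t + h, tuple(yi + h * ki for yi, ki in zip(y, k3)))
        y = tuple(yi + h / 6 * (a + 2 * b + 2 * c + d)
                  for yi, a, b, c, d in zip(y, k1, k2, k3, k4))
        t += h
        trajectory.append(y)
    return trajectory

def iae(prediction, reference, h):
    """Rectangle-rule estimate of the integral of the absolute error,
    summed over the state components (an assumption of this sketch)."""
    return h * sum(abs(p - r) for yp, yr in zip(prediction, reference)
                   for p, r in zip(yp, yr))

n_steps, horizon = 1000, 2.0
reference = rk4(van_der_pol, (0.0, 0.1), 0.0, horizon, n_steps)
constant_guess = [(0.0, 0.1)] * (n_steps + 1)  # a deliberately poor "model"
error_of_guess = iae(constant_guess, reference, horizon / n_steps)
```

In an evaluation, `constant_guess` would be replaced by the outputs of the trained model at the same time steps.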

IV-B Training

In our experiments with the PIDEQ, we use an architecture similar to the one proposed by [3], with an equilibrium function that provides as the output the (element-wise) hyperbolic tangent of an affine combination of its inputs, namely, the time value and the hidden states (𝒛𝒛\bm{z}bold_italic_z). The model’s output is a linear combination of the equilibrium vector 𝒛superscript𝒛\bm{z}^{\star}bold_italic_z start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT. Formally, we can write

D𝜽EQ(t)=C𝒛𝒛=𝒇𝜽(t,𝒛)𝒇𝜽(t,𝒛)=tanh(A𝒛+t𝒂+𝒃)subscriptsuperscript𝐷𝐸𝑄𝜽𝑡𝐶superscript𝒛superscript𝒛subscript𝒇𝜽𝑡superscript𝒛subscript𝒇𝜽𝑡𝒛𝐴𝒛𝑡𝒂𝒃\begin{split}D^{EQ}_{\bm{\theta}}(t)&=C\bm{z}^{\star}\\ \bm{z}^{\star}&=\bm{f}_{\bm{\theta}}\left(t,\bm{z}^{\star}\right)\\ \bm{f}_{\bm{\theta}}\left(t,\bm{z}\right)&=\tanh\left(A\bm{z}+t\bm{a}+\bm{b}% \right)\end{split}start_ROW start_CELL italic_D start_POSTSUPERSCRIPT italic_E italic_Q end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ( italic_t ) end_CELL start_CELL = italic_C bold_italic_z start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL bold_italic_z start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT end_CELL start_CELL = bold_italic_f start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ( italic_t , bold_italic_z start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ) end_CELL end_ROW start_ROW start_CELL bold_italic_f start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ( italic_t , bold_italic_z ) end_CELL start_CELL = roman_tanh ( italic_A bold_italic_z + italic_t bold_italic_a + bold_italic_b ) end_CELL end_ROW

where the parameter vector $\bm{\theta}$ is simply a vectorized representation of all coefficients, i.e., $\bm{\theta}=(A,C,\bm{a},\bm{b})$, and the input is $\bm{x}=t$.
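The following minimal sketch evaluates this model for a single time value by finding the equilibrium with the simple iteration procedure. The parameter values are random and purely illustrative; the number of states, the two-dimensional output, and the rescaling of $A$ to a spectral norm of 0.5 (which makes the equilibrium function a contraction in $\bm{z}$, so the iteration is guaranteed to converge) are assumptions of the example.

```python
import numpy as np

def f_theta(t, z, A, a, b):
    """Equilibrium function: element-wise tanh of an affine map of (t, z)."""
    return np.tanh(A @ z + t * a + b)

def deq_forward(t, A, C, a, b, tol=1e-4, max_iter=200):
    """Find z* = f_theta(t, z*) by simple iteration and return (C z*, z*)."""
    z = np.zeros(A.shape[0])
    for _ in range(max_iter):
        z_next = f_theta(t, z, A, a, b)
        converged = np.linalg.norm(z_next - z) < tol
        z = z_next
        if converged:
            break
    return C @ z, z

# Hypothetical parameters: 5 hidden states, 2 outputs, random values.
rng = np.random.default_rng(0)
n_states, n_out = 5, 2
A = rng.standard_normal((n_states, n_states))
A *= 0.5 / np.linalg.norm(A, 2)  # spectral norm 0.5, so f_theta contracts in z
C = rng.standard_normal((n_out, n_states))
a = rng.standard_normal(n_states)
b = rng.standard_normal(n_states)

y, z_star = deq_forward(0.3, A, C, a, b)
residual = np.linalg.norm(f_theta(0.3, z_star, A, a, b) - z_star)
print(y, residual)  # model output and fixed-point residual
```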

We use the Adam optimizer [10] for training the PIDEQ. The cost function incorporates regularization terms to enforce physics-informed training, as detailed in Sec. III-D. Our default hyperparameter configuration, determined through early experimentation, was a learning rate of $10^{-3}$, $\lambda=0.1$, $\kappa=1.0$, and a budget of 50000 epochs. Our default solver for the forward pass was Anderson acceleration with a tolerance of $10^{-4}$, while the backward pass was always solved iteratively to ensure differentiability.

Nevertheless, the dimension of the hidden states ($|\bm{z}|$), the solver tolerance, and the coefficient for the Jacobian regularization term ($\kappa$) are all considered hyperparameters, and we measure their impacts empirically. We also consider as a hyperparameter the choice between Anderson acceleration, Broyden’s method, and the simple iteration procedure as the forward pass solver.
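To make the solver choice concrete, the sketch below contrasts the simple iteration procedure with a basic Anderson acceleration in the difference formulation of [5]. It is not the implementation used in our experiments: the window size `m`, the stopping rule on the residual norm, and the random contraction used as a test problem are all assumptions of the example.

```python
import numpy as np

def simple_iteration(g, z0, tol=1e-4, max_iter=500):
    """Plain fixed-point iteration z <- g(z); returns (z, number of g evaluations)."""
    z = z0
    for k in range(1, max_iter + 1):
        z_next = g(z)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, k
        z = z_next
    return z, max_iter

def anderson(g, z0, m=5, tol=1e-4, max_iter=500):
    """Anderson acceleration keeping at most m residual differences."""
    gz = g(z0)
    G_hist, F_hist = [gz], [gz - z0]  # histories of g(z_i) and residuals g(z_i) - z_i
    z = gz                            # the first step is a plain iteration
    for k in range(2, max_iter + 1):
        gz = g(z)
        f = gz - z
        if np.linalg.norm(f) < tol:
            return gz, k
        G_hist.append(gz)
        F_hist.append(f)
        G_hist, F_hist = G_hist[-(m + 1):], F_hist[-(m + 1):]
        dF = np.diff(np.stack(F_hist, axis=1), axis=1)  # columns f_{i+1} - f_i
        dG = np.diff(np.stack(G_hist, axis=1), axis=1)  # columns g_{i+1} - g_i
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)  # min ||f - dF gamma||
        z = gz - dG @ gamma
    return z, max_iter

# Hypothetical test problem: g(z) = tanh(A z + b) with spectral norm of A equal to 0.5.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A *= 0.5 / np.linalg.norm(A, 2)
b = rng.standard_normal(5)
g = lambda z: np.tanh(A @ z + b)

z_sim, n_sim = simple_iteration(g, np.zeros(5))
z_and, n_and = anderson(g, np.zeros(5))
print(n_sim, n_and)  # evaluations of g used by each solver
```

Both solvers target the same equilibrium, so their results should agree up to the tolerance; they differ only in how many evaluations of the equilibrium function they need and in the extra linear algebra per step.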

IV-C Experimental Results

All experiments reported below were performed on a high-end computer using an RTX 3060 GPU. Further implementation details are available in our code repository (https://github.com/brunompacheco/pideq). For every hyperparameter configuration, five models were trained with random initial values for the trainable parameters.

IV-C1 Baseline

Our baseline PINN model follows the architecture proposed in [8], with four layers of twenty nodes each. As a DEQ with as many states as there are hidden nodes in a deep feedforward network has at least as much representational power [3], we start with $|\bm{z}|=80$ hidden states. The results can be seen in Fig. 3. Even though both models achieved low IAE values, the PINN presented better performance in terms of IAE and training time, converging in far fewer epochs.

Figure 3: The learning curve for the baseline models trained on the IVP of the Van der Pol oscillator. Solid lines are mean values ($n=5$), and shaded regions represent minimum and maximum values. For better visualization, a moving average of 100 epochs was taken.
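The aggregation used in the learning-curve figures can be sketched as follows. The IAE histories here are synthetic, and both the trailing form of the moving average and the choice to smooth each run before aggregating are assumptions of the example.

```python
import numpy as np

def moving_average(x, window=100):
    """Moving average over `window` epochs; the first window - 1 epochs are dropped."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

# Hypothetical IAE histories: 5 runs x 1000 epochs of noisy decaying curves.
rng = np.random.default_rng(0)
epochs = np.arange(1000)
runs = np.exp(-epochs / 200.0) * (1.0 + 0.1 * rng.standard_normal((5, 1000)))

smoothed = np.stack([moving_average(r) for r in runs])
mean_curve = smoothed.mean(axis=0)  # solid line
lower = smoothed.min(axis=0)        # bottom of the shaded region
upper = smoothed.max(axis=0)        # top of the shaded region
print(smoothed.shape)               # (5, 901)
```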

IV-C2 Hyperparameter Optimization

A closer look at the baseline PIDEQ shows that the parameters converged to an $A$ matrix with many null rows. This indicates that the model could achieve similar performance with fewer hidden states. Our hyperparameter tuning experiments confirmed this intuition, as shown in Fig. 4. We iteratively halved the number of states until the $A$ matrix had no more null rows, which happened at $|\bm{z}|=5$ states. We then went one step further, reducing to $|\bm{z}|=2$ states, but the results indicated that 5 was indeed the sweet spot, with faster convergence and lower final IAE. The following experiments use PIDEQs with five hidden states.
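A simple way to carry out this kind of inspection is sketched below. The matrix is a made-up example, and the threshold `atol` used to decide that an entry is numerically zero is an assumption.

```python
import numpy as np

def null_rows(A, atol=1e-6):
    """Indices of the rows of A whose entries are all numerically zero."""
    return np.flatnonzero(np.all(np.abs(A) <= atol, axis=1))

# Hypothetical 6-state matrix in which two rows have collapsed to zero.
A = np.array([
    [0.4, 0.0, -0.2, 0.0,  0.1, 0.0],
    [0.0, 0.0,  0.0, 0.0,  0.0, 0.0],
    [0.3, 0.1,  0.0, 0.0, -0.5, 0.0],
    [0.0, 0.0,  0.0, 0.0,  0.0, 0.0],
    [0.0, 0.2,  0.1, 0.0,  0.0, 0.3],
    [0.1, 0.0,  0.0, 0.2,  0.0, 0.0],
])
print(null_rows(A))  # [1 3]

# Halving schedule for the number of states, starting from the baseline of 80.
sizes = [80]
while sizes[-1] > 5:
    sizes.append(sizes[-1] // 2)
print(sizes)  # [80, 40, 20, 10, 5]
```

A state whose row in $A$ is null no longer depends on the other hidden states, since its update reduces to $\tanh(t a_i + b_i)$.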

Figure 4: Learning curve of the PIDEQs trained with a varying number of states. The model with 80 hidden states is the same as the baseline PIDEQ from Fig. 3. Solid lines are mean values ($n=5$). For better visualization, a moving average of 100 epochs was taken.

We further investigate the importance of the regularization term on the Jacobian of the equilibrium function. Even though the Jacobian of the DEQ, in the context of physics-informed training, is directly learned, our experiments indicate that the presence of the regularization term is essential. Without it, training took over 30 times longer and was very unstable. However, the magnitude of the $\kappa$ coefficient has little impact on the outcomes. Our results for different values of $\kappa$ (including 0) are illustrated in Fig. 5.
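For the equilibrium function used here, the Jacobian with respect to the hidden states has the closed form $\mathrm{diag}\left(1-\tanh^{2}(A\bm{z}+t\bm{a}+\bm{b})\right)A$. The sketch below computes it, checks it against finite differences, and evaluates a penalty of the form $\kappa\,\|J\|_F$. The parameter values are random, and using the plain (rather than squared or stochastically estimated) Frobenius norm is an assumption of the example; the exact form of the term is the one given in Sec. III-D.

```python
import numpy as np

def jacobian_f(t, z, A, a, b):
    """Analytic Jacobian of tanh(A z + t a + b) with respect to z."""
    s = np.tanh(A @ z + t * a + b)
    return (1.0 - s ** 2)[:, None] * A  # equals diag(1 - tanh^2) @ A

def numerical_jacobian(t, z, A, a, b, eps=1e-6):
    """Central finite-difference Jacobian, used only as a check."""
    n = z.size
    J = np.empty((n, n))
    for j in range(n):
        dz = np.zeros(n)
        dz[j] = eps
        f_plus = np.tanh(A @ (z + dz) + t * a + b)
        f_minus = np.tanh(A @ (z - dz) + t * a + b)
        J[:, j] = (f_plus - f_minus) / (2 * eps)
    return J

# Hypothetical parameters and evaluation point.
rng = np.random.default_rng(0)
A = 0.3 * rng.standard_normal((5, 5))
a, b = rng.standard_normal(5), rng.standard_normal(5)
z, t = rng.standard_normal(5), 0.3

J = jacobian_f(t, z, A, a, b)
kappa = 1.0
penalty = kappa * np.linalg.norm(J, "fro")
print(np.abs(J - numerical_jacobian(t, z, A, a, b)).max(), penalty)
```

Because $1-\tanh^{2}(\cdot)\leq 1$, the penalty is bounded above by $\kappa\,\|A\|_F$, so shrinking rows of $A$ is one way for the optimizer to reduce it.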

Figure 5: Learning curve of PIDEQs with different $\kappa$ values. Only one model was trained with $\kappa=0$ because the training took over 30 times longer. For all other scenarios, we present the mean over five runs (parameter initializations). For better visualization, a moving average of 100 epochs was taken.

Anderson acceleration, Broyden’s method, and the simple iteration procedure were evaluated as solvers for the forward pass. As the results illustrated in Fig. 6 show, Broyden’s method could provide a performance gain, but it came with a significant computational cost, as the epochs took over six times longer than with Anderson acceleration. At the same time, using the simple iteration procedure was three times faster than using Anderson acceleration. However, as discussed in Sec. III-B, the theoretical limitations of the simple iteration procedure restrict its reliability. We also evaluated the performance of PIDEQ under different values for the solver tolerance, but no performance gain was observed for values other than our default of $10^{-4}$.

Figure 6: Learning curve of PIDEQs with five states using different solvers for the forward pass. Solid lines are mean values ($n=5$). For better visualization, a moving average of 100 epochs was taken.

IV-C3 Results

The final PIDEQ, after hyperparameter tuning, achieves results comparable to those of the baseline PINN models. In terms of approximation performance, the baseline PIDEQ achieved an IAE of 0.0082, our best PIDEQ achieved an IAE of 0.0018, and the baseline PINN achieved an IAE of 0.0002, as seen in the final epochs of Fig. 8. Fig. 7 illustrates how small the difference is between the two approaches, showing them in comparison to the solution computed using RK4. However, the greatest gain comes from the reduced training time, as the tuned PIDEQ converges much faster than the baseline PIDEQ.

Figure 7: Prediction of PINN and PIDEQ in comparison to the reference result from RK4, shown in panels (a) and (b). Both models presented the median performance in the respective experiments.

The greatest performance gain came from reducing the number of hidden states, guided by the sparsity of the $A$ matrix. The smaller model size leads to improved memory efficiency and explainability. We also trained PINN models with two hidden layers of 5 nodes each, totaling 52 trainable parameters, the same number of parameters as the tuned PIDEQ. The results are shown in Fig. 8. Note that both final models take around the same number of epochs to converge.
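The parameter count of the small PINN can be reproduced with a short calculation, assuming a fully connected network with biases, a single input (the time $t$), and two outputs (one per state variable of the oscillator); the input and output sizes are assumptions of this sketch.

```python
def mlp_param_count(n_in, hidden, n_out):
    """Weights plus biases of a fully connected network with the given layer widths."""
    widths = [n_in, *hidden, n_out]
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths[:-1], widths[1:]))

# Two hidden layers of 5 nodes: (1*5 + 5) + (5*5 + 5) + (5*2 + 2)
print(mlp_param_count(1, [5, 5], 2))  # 52
```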

Figure 8: Learning curve of the final models in comparison to the baselines. “Final PINN” are the small PINN models, with only 52 parameters. “Final PIDEQ” are the PIDEQs after hyperparameter tuning. Solid lines are mean values ($n=5$), and shaded regions represent minimum and maximum values. For better visualization, a moving average of 100 epochs was taken.

V CONCLUSIONS

This study explored two innovative approaches to deep learning: Physics-Informed Neural Networks (PINNs) and Deep Equilibrium Models (DEQs). While PINNs offer efficiency in training models for physical problems, DEQs promise greater representational power with fewer parameters. We introduced Physics-Informed Deep Equilibrium Models (PIDEQ), uniting the physics-regularization with the infinite-depth architecture.

Our experiments on the Van der Pol oscillator validated the effectiveness of PIDEQs in solving IVPs of ODEs. However, comparisons with PINN models revealed mixed results. Although PIDEQs demonstrated the ability to solve IVPs, they exhibited slightly higher errors and slower training times compared to PINNs. This suggests that while DEQ structures offer theoretical advantages in representational power, these benefits may not translate into practical improvements for certain types of problems like the Van der Pol oscillator.

The strengths of PIDEQs lie in their potential for handling more complex problems, leveraging the implicit depth of DEQs. However, the current implementation’s reliance on simple iterative methods for the backward pass can be a limiting factor, especially for more challenging problems and partial differential equations (PDEs). This limitation points to a significant area for improvement.

Future research should focus on evaluating PIDEQs with more complex problems (e.g., higher-order ODEs and PDEs) to better understand their potential advantages. Additionally, exploring more sophisticated methods for the backward pass, such as implicit differentiation techniques, could enhance the training efficiency and accuracy of PIDEQs.

In conclusion, while PIDEQs present a promising direction for integrating physics-informed principles with advanced deep-learning architectures, further work is still necessary to fully realize their potential and address their current limitations compared to traditional PINNs. This future research could pave the way for more robust and versatile models capable of solving a wide range of complex dynamical systems, providing competitive solver alternatives.

Acknowledgment

This research was funded in part by Fundação de Amparo à Pesquisa e Inovação do Estado de Santa Catarina (FAPESC) under grant 2021TR2265, CNPq under grants 308624/2021-1 and 402099/2023-0, and CAPES under grant PROEX.

References

  • [1] M. Raissi, P. Perdikaris, and G. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” Journal of Computational Physics, vol. 378, pp. 686–707, 2019.
  • [2] S. Bai, J. Z. Kolter, and V. Koltun, “Deep equilibrium models,” in Advances in Neural Information Processing Systems (H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, eds.), vol. 32, Curran Associates, Inc., 2019.
  • [3] L. El Ghaoui, F. Gu, B. Travacca, A. Askari, and A. Tsai, “Implicit deep learning,” SIAM Journal on Mathematics of Data Science, vol. 3, pp. 930–958, Jan. 2021.
  • [4] E. Süli and D. F. Mayers, An Introduction to Numerical Analysis. Cambridge University Press, Aug. 2003.
  • [5] H. F. Walker and P. Ni, “Anderson acceleration for fixed-point iterations,” SIAM Journal on Numerical Analysis, vol. 49, pp. 1715–1735, Jan. 2011.
  • [6] C. G. Broyden, “A class of methods for solving nonlinear simultaneous equations,” Mathematics of Computation, vol. 19, no. 92, pp. 577–593, 1965.
  • [7] S. Bai, V. Koltun, and J. Z. Kolter, “Stabilizing equilibrium models by Jacobian regularization,” in Proceedings of the 38th International Conference on Machine Learning (M. Meila and T. Zhang, eds.), vol. 139, pp. 554–565, PMLR, 18–24 Jul 2021.
  • [8] E. A. Antonelo, E. Camponogara, L. O. Seman, E. R. de Souza, J. P. Jordanou, and J. F. Hübner, “Physics-informed neural nets for control of dynamical systems,” Neurocomputing, vol. 579, p. 127419, April 2024.
  • [9] R. Grimshaw, Nonlinear Ordinary Differential Equations: Applied Mathematics and Engineering Science Texts. CRC Press, March 2017.
  • [10] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations (ICLR) (Y. Bengio and Y. LeCun, eds.), (San Diego, CA, USA), 2015.