
Modeling Unknown Stochastic Dynamical System Subject to External Excitation

Yuan Chen and Dongbin Xiu

E-mail addresses: {chen.11050, xiu.16}@osu.edu. Department of Mathematics, The Ohio State University, Columbus, OH 43210, USA. Funding: This work was partially supported by AFOSR FA9550-22-1-0011.

Abstract

We present a numerical method for learning unknown nonautonomous stochastic dynamical systems, i.e., stochastic systems subject to time-dependent excitation or control signals. Our basic assumption is that the governing equations for the stochastic system are unavailable. However, short bursts of input/output (I/O) data consisting of certain known excitation signals and their corresponding system responses are available. When a sufficient amount of such I/O data is available, our method is capable of learning the unknown dynamics and producing an accurate predictive model for the stochastic responses of the system subject to arbitrary excitation signals not in the training data. Our method has two key components: (1) a local approximation of the training I/O data to transfer the learning into a parameterized form; and (2) a generative model to approximate the underlying unknown stochastic flow map in distribution. After presenting the method in detail, we present a comprehensive set of numerical examples to demonstrate the performance of the proposed method, especially for long-term system predictions.

Keywords: Data-driven modeling, stochastic dynamical systems, deep neural networks, nonautonomous system

MSC codes: 60H10, 60H35, 62M45, 65C30

1 Introduction

There has been a growing interest in recovering/discovering unknown dynamical systems from observational data. Most of the existing studies focus on deterministic systems, with methods such as physics-informed neural networks (PINNs) ([39, 40]), SINDy ([5]), Fourier neural operator (FNO) ([24]), computational graph completion ([30]), sparsity promoting methods ([41, 42, 19]), flow map learning (FML) ([38, 11]), to name a few.

Learning unknown stochastic systems is notably more challenging, as the stochastic noise in the system usually cannot be directly observed. Existing work utilizes Gaussian processes ([48, 1, 12, 29]), polynomial approximations ([44, 22]), deep neural networks (DNNs) ([8, 47, 9, 49, 14, 50]), etc. More recently, a stochastic extension of the deterministic flow map learning (FML) approach ([38, 11]) was proposed. It employs generative models such as GANs (generative adversarial networks) ([10]) or autoencoders ([46]) to model the underlying stochasticity. However, most, if not all, of these methods were developed for autonomous systems, where time-invariance (in distribution) holds and is critical to the method development.

This paper focuses on the learning and modeling of unknown non-autonomous stochastic systems. More specifically, we consider SDEs with unknown governing equations that are subject to time-dependent external excitation or control signals. Our goal is to develop a method that can capture the stochastic dynamics of the unknown systems by using short-term data consisting of input/output (I/O) relations between excitation signals and their corresponding system responses. We remark that there exist some studies on modeling deterministic non-autonomous systems, using methodologies such as Dynamic Mode Decomposition (DMD) ([25, 34]), SINDy ([6]), Koopman operator ([35]), FML ([36]), etc. These methods, however, are not applicable to stochastic non-autonomous systems.

The proposed method has two key components. First, the method utilizes the observational I/O data to construct an accurate representation of the unknown stochastic dynamics of the system. This is accomplished by a generative model that learns the stochastic mapping of the system between two consecutive discrete time steps. The learning of this stochastic flow map is similar to the work of [10, 46], which extended the deterministic FML to stochastic systems. While [10, 46] utilized GANs and autoencoders as the generative models, in this paper we employ conditional normalizing flows (cf. [31]). Normalizing flows have been widely adopted as probabilistic models for generating data with desired distributions. Their applications include image and video generation [21], statistical inference and sampling [27, 43], reinforcement learning [18], as well as scientific computing [26, 23, 17, 13]. The second key component of the proposed method is a local parameterization of the excitation signal in the training I/O data. The method was first introduced in [36] for deterministic nonautonomous systems. We adopt a similar idea and extend it to stochastic systems. The approach seeks to parameterize the excitation signals in the training data via a localized polynomial over one time step. This transforms the learning problem into a parametric learning problem between the coefficients of the local polynomials and the system responses. This is a critical component, as it allows the learned system to conduct long-term system predictions under arbitrary excitation signals that are never seen in the training data. Although the proposed method requires a large number of short data bursts, the overall demand for data may not be large, because the burst length of the training I/O data can be as short as two time steps. Once trained, the learned model is able to simulate the unknown stochastic system over very long time horizons and subject to arbitrary excitation/control signals. We demonstrate this important property in several of our numerical examples. The learning is performed using training I/O data observed over only $O(1)$ nondimensional time units. However, the system predictions by the learned system can be accurate for time units as large as $O(10^{4})$ and beyond, and under excitation signals not in the training data.

2 Setup

Let $\Omega$ be an event space and $T$ a finite time horizon. We consider a $d$-dimensional ($d\geq 1$) stochastic process $\mathbf{x}(\omega,t):\Omega\times[0,T]\mapsto\mathbb{R}^{d}$ driven by an unknown (non-autonomous) stochastic differential equation (SDE) subject to external inputs

\[
d\mathbf{x}_{t}=\mathbf{a}(\mathbf{x}_{t},\mu(t))\,dt+\mathbf{b}(\mathbf{x}_{t},\nu(t))\,d\mathbf{W}_{t}, \tag{1}
\]

where $\mathbf{W}_{t}$ is an $m$-dimensional ($m\geq 1$) Brownian motion, $\mathbf{a}:\mathbb{R}^{d}\times\mathbb{R}\to\mathbb{R}^{d}$ is the drift function, $\mathbf{b}:\mathbb{R}^{d}\times\mathbb{R}\to\mathbb{R}^{d\times m}$ is the diffusion function, and $\mu(t)$ and $\nu(t)$ are time-dependent external inputs into the stochastic system. In practice, the inputs can be external excitation signals or control signals. Throughout this paper, we will generally refer to them as excitations and denote $\mathbf{u}(t)=(\mu(t),\nu(t))^{T}$ to consolidate the notation.

Our basic assumption is that the SDE is unknown, in the sense that the functions $\mathbf{a}$ and $\mathbf{b}$ are not known. Also, the driving Brownian motion $\mathbf{W}_{t}$ cannot be observed. However, we have input-output (I/O) time history data between the excitations $\mathbf{u}=(\mu,\nu)^{T}$, i.e., the inputs, and the system response $\mathbf{x}$, i.e., the output,

\[
\textrm{I/O training data:}\qquad\mathbf{u}(t)\rightarrow\mathbf{x}(t). \tag{2}
\]

Our goal is to construct a numerical model for the unknown system (1) such that it can produce accurate predictions of the system response $\mathbf{x}(t)$ for arbitrarily given excitations $\mathbf{u}(t)$ that are not observed in the training data (2).

2.1 Problem Statement

The method presented in this paper is based on a discrete-time setting. Let $t_{0}<t_{1}<\cdots$ be discrete time points. For simplicity, we assume the time steps are of uniform length $\Delta=t_{i+1}-t_{i}$, $\forall i\geq 0$. Suppose we observe $N_{T}\geq 1$ I/O sequences of solution responses subject to input excitations: for $i=1,\dots,N_{T}$,

\[
\left(\mathbf{u}\left(t_{0}^{(i)}\right),\mathbf{u}\left(t_{1}^{(i)}\right),\cdots,\mathbf{u}\left(t_{L_{i}}^{(i)}\right)\right)\rightarrow\left(\mathbf{x}\left(t_{0}^{(i)}\right),\mathbf{x}\left(t_{1}^{(i)}\right),\cdots,\mathbf{x}\left(t_{L_{i}}^{(i)}\right)\right), \tag{3}
\]

where $(L_{i}+1)$ is the length of the $i$-th observation sequence. Note that each sequence of the I/O data can cover a different time span. Also, one may have more information about the excitation $\mathbf{u}(t)$ beyond its point values. For example, the analytical form of $\mathbf{u}(t)$ may be known within the time interval $[t_{0}^{(i)},t_{L_{i}}^{(i)}]$ for some sequences.
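For illustration only, synthetic bursts in the format (3) could be produced by treating a known SDE as the black box. The following Python sketch rests on several assumptions that are not part of the proposed method: the drift `drift`, the diffusion `diffusion`, the sinusoidal excitations from `make_excitation`, and the Euler-Maruyama sub-stepping in `simulate_burst` are all hypothetical choices made here for a scalar system ($d=m=1$).

```python
import math
import random

def drift(x, mu):
    # Hypothetical drift a(x, mu); stands in for the unknown function.
    return -x + mu

def diffusion(x, nu):
    # Hypothetical diffusion b(x, nu); stands in for the unknown function.
    return 0.3 * (1.0 + 0.5 * nu)

def make_excitation(amp, freq):
    # Hypothetical excitation u(t) = (mu(t), nu(t)).
    def u(t):
        return (amp * math.sin(freq * t), math.cos(freq * t))
    return u

def simulate_burst(x0, u, t0, delta, n_steps, rng, substeps=20):
    """Return (us, xs), each of length n_steps + 1, sampled every delta."""
    h = delta / substeps
    t, x = t0, x0
    us, xs = [u(t)], [x]
    for _step in range(n_steps):
        for _sub in range(substeps):
            mu, nu = u(t)
            dw = math.sqrt(h) * rng.gauss(0.0, 1.0)  # Brownian increment over h
            x = x + drift(x, mu) * h + diffusion(x, nu) * dw
            t += h
        us.append(u(t))
        xs.append(x)
    return us, xs

rng = random.Random(0)
delta, n_steps, n_bursts = 0.01, 2, 100  # each burst spans two time steps
data = []
for _ in range(n_bursts):
    u = make_excitation(rng.uniform(0.0, 2.0), rng.uniform(0.5, 3.0))
    x0 = rng.uniform(-1.0, 1.0)
    t0 = rng.uniform(0.0, 5.0)
    data.append(simulate_burst(x0, u, t0, delta, n_steps, rng))
print(len(data), len(data[0][0]), len(data[0][1]))
```

Each entry of `data` is one I/O sequence: a list of excitation samples and the list of corresponding responses at the same time levels. The Brownian path itself is discarded, mirroring the assumption that it cannot be observed.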

The objective is to construct a numerical model to predict the system response of (1) subject to arbitrary excitations. More specifically, given an initial condition $\mathbf{x}_{0}$ and an excitation signal $\mathbf{u}(t)$ that is not in the training I/O data (3), we require the model prediction $\widehat{\mathbf{x}}$ to approximate the true system response $\mathbf{x}$, i.e.,

\[
\widehat{\mathbf{x}}(t_{n};\mathbf{x}_{0},\mathbf{u}(t))\stackrel{d}{\approx}\mathbf{x}(t_{n};\mathbf{x}_{0},\mathbf{u}(t)),\qquad n=1,\dots, \tag{4}
\]

where $\stackrel{d}{\approx}$ stands for approximation in distribution. Note that since in general the stochastic driving term $\mathbf{W}_{t}$ cannot be directly observed, a weak approximation, such as approximation in distribution, is typically the most one can achieve from a mathematical point of view.

2.2 Related Work and Contribution

The method developed in this paper builds on two recent lines of work: flow map learning (FML) for modeling deterministic unknown dynamical systems and its extension to modeling stochastic dynamical systems.

Consider an unknown deterministic autonomous system, $\frac{d\mathbf{x}}{dt}=\mathbf{f}(\mathbf{x})$, $\mathbf{x}\in\mathbb{R}^{d}$, where $\mathbf{f}:\mathbb{R}^{d}\to\mathbb{R}^{d}$ is unknown. The FML method seeks to approximate the unknown flow map $\mathbf{x}_{n}=\mathbf{\Phi}_{t_{n}-t_{s}}(\mathbf{x}_{s})$ by using observation data. More specifically, by using data on $\mathbf{x}$ over one time step $\Delta t$, the FML method constructs a model

\[
\mathbf{x}_{n+1}=\widetilde{\mathbf{\Phi}}_{\Delta t}(\mathbf{x}_{n}),
\]

where $\widetilde{\mathbf{\Phi}}_{\Delta t}\approx\mathbf{\Phi}_{\Delta t}$ is a numerical approximation of the true flow map over one time step $\Delta t$. Once constructed, the FML model can be used as a time-marching scheme to predict the system response under a given initial condition. This framework was proposed in [38], with extensions to partially observed systems [16], parametric systems [37], as well as non-autonomous deterministic systems [36].

For learning an autonomous stochastic system, $\frac{d\mathbf{x}}{dt}=\mathbf{f}(\mathbf{x},\omega(t))$, where $\omega(t)$ represents an unknown stochastic process driving the system, the work of [10] developed stochastic flow map learning (sFML). Assuming the system satisfies the time-homogeneous property ([28]) $\mathbb{P}(\mathbf{x}_{s+\Delta t}|\mathbf{x}_{s})=\mathbb{P}(\mathbf{x}_{\Delta t}|\mathbf{x}_{0})$, $s\geq 0$, the method uses the observation data on the state variable $\mathbf{x}$ to construct a one-step generative model

\[
\mathbf{x}_{n+1}=\mathbf{G}_{\Delta t}(\mathbf{x}_{n};\mathbf{z}),
\]

where $\mathbf{z}$ is a random variable with known distribution (e.g., standard Gaussian). The function $\mathbf{G}$, termed the stochastic flow map, approximates the conditional distribution, $\mathbf{G}_{\Delta t}(\mathbf{x}_{s};\mathbf{z})\approx\mathbb{P}(\mathbf{x}_{s+\Delta t}|\mathbf{x}_{s})$. Consequently, the sFML model becomes a weak approximation, in distribution, to the true stochastic dynamics. Different generative models can be employed under the sFML framework. For example, generative adversarial networks (GANs) are used in [10], and an autoencoder is employed in [46].
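The time-marching use of a one-step generative model can be sketched as follows. In this Python sketch, `G_standin` is a hypothetical placeholder for a trained $\mathbf{G}_{\Delta t}$: it is the exact one-step transition of a scalar Ornstein-Uhlenbeck process, chosen here only because its conditional distribution is known in closed form. It is not the sFML implementation of [10] or [46], and the parameter values are assumptions.

```python
import math
import random

def G_standin(x, z, dt=0.01, theta=1.0, sigma=0.5):
    """Hypothetical stand-in for a learned one-step generator G(x; z).

    Exact transition of dx = -theta * x dt + sigma dW over one step dt,
    driven by a standard Gaussian sample z.
    """
    decay = math.exp(-theta * dt)
    std = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    return decay * x + std * z

def rollout(G, x0, n_steps, rng):
    """March x_{n+1} = G(x_n; z_n) with a fresh z_n ~ N(0, 1) at every step."""
    x = x0
    path = [x]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        x = G(x, z)
        path.append(x)
    return path

rng = random.Random(0)
paths = [rollout(G_standin, 1.0, 200, rng) for _ in range(2000)]

# Compare an ensemble statistic with the known mean exp(-theta * t) at t = 2.
mean_end = sum(p[-1] for p in paths) / len(paths)
print(mean_end, math.exp(-2.0))
```

Because a new latent sample is drawn at each step, individual paths differ from any true trajectory; only ensemble statistics such as the mean are meaningful to compare, which is the sense of approximation in distribution.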

The primary contribution of this paper is the development of data-driven modeling for unknown stochastic systems subject to external excitations. To accomplish this, we extend the sFML framework ([10]), which was developed for autonomous systems, to non-autonomous stochastic systems. To learn the system I/O responses, we employ the local parameterization technique developed for non-autonomous deterministic systems ([36]). The method parameterizes the input excitations in the data and transforms the learning problem into learning a parametric dynamical system. For the stochastic non-autonomous systems considered in this paper, we incorporate the method into a generative model in the sFML framework. In particular, we use normalizing flows as the generative model, which have not been considered in stochastic dynamical system learning. We shall demonstrate that the newly developed method is highly effective in modeling unknown stochastic systems, even when the excitations are not present in the training data.

3 Methodology

In this section, we describe the proposed learning method in detail.

3.1 Parameterization of Inputs

Consider the unknown SDE (1) over a time interval $[t_{n},t_{n+1}]$, $n\geq 0$,

\[
\mathbf{x}(t_{n+1})=\mathbf{x}(t_{n})+\int_{t_{n}}^{t_{n+1}}\mathbf{a}(\mathbf{x}(s),\mu(s))\,ds+\int_{t_{n}}^{t_{n+1}}\mathbf{b}(\mathbf{x}(s),\nu(s))\,d\mathbf{W}(s), \tag{5}
\]

which can be written equivalently as

\[
\begin{split}
\mathbf{x}(t_{n+1})=\mathbf{x}(t_{n})&+\int_{0}^{\Delta}\mathbf{a}(\mathbf{x}(t_{n}+\tau),\mu(t_{n}+\tau))\,d\tau\\
&+\int_{0}^{\Delta}\mathbf{b}(\mathbf{x}(t_{n}+\tau),\nu(t_{n}+\tau))\,d\mathbf{W}(t_{n}+\tau).
\end{split} \tag{6}
\]

By using the compact notation $\mathbf{u}(t)=(\mu(t),\nu(t))^{T}$, we now consider the excitation $\mathbf{u}(t)$ in the time interval $[t_{n},t_{n+1}]$. Given the information of the excitation in the training data (3), we construct a parameterized form

\[
\mathbf{u}(t)|_{[t_{n},t_{n+1})}\approx\widetilde{\mathbf{u}}(\tau;\mathbf{\Gamma}_{n})=\sum_{k=1}^{m}\boldsymbol{\alpha}_{n}^{k}\,p_{k}(\tau),\qquad\tau\in[0,\Delta), \tag{7}
\]

where $\{p_{k},k=1,\dots,m\}$ is a set of prescribed analytical basis functions and

\[
\mathbf{\Gamma}_{n}=\{\boldsymbol{\alpha}_{n}^{1},\dots,\boldsymbol{\alpha}_{n}^{m}\}\in\mathbb{R}^{n_{\Gamma}}, \tag{8}
\]

are the expansion coefficients. In principle, one can choose any suitable basis functions. Since the time interval $[t_{n},t_{n+1}]$ usually has a (very) small step size $\Delta$, it suffices to use low-order polynomials. In fact, low-degree monomial bases, $p_{k}(\tau)=\tau^{k-1}$, $k\geq 1$, would be sufficient for most problems. When the highest degree is $0$ (i.e., $m=1$), the parameterization takes the form of a piecewise constant function; when it is $1$ (i.e., $m=2$), a piecewise linear function.

The local parameterization of $\mathbf{u}$ is carried out based on the information one has about the excitations. If the excitations are only known at the discrete time instances, as shown in (3), then it is natural to utilize a piecewise linear polynomial,

\[
\widetilde{\mathbf{u}}(\tau;\mathbf{\Gamma}_{n})=\mathbf{u}(t_{n})+\frac{\tau}{\Delta}\left(\mathbf{u}(t_{n+1})-\mathbf{u}(t_{n})\right).
\]

If more information about $\mathbf{u}(t)$ is available, one can construct a higher-degree polynomial. Note that since the time step $\Delta$ is usually small, a quadratic polynomial $\widetilde{\mathbf{u}}$ can be highly accurate in any time interval $[t_{n},t_{n+1})$. We remark that in this representation, only the values of the excitations $\mathbf{u}$ at $t_{n}$ and $t_{n+1}$ are needed. The values of the times $t_{n}$ and $t_{n+1}$ themselves are not required.
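A minimal Python sketch of the piecewise linear parameterization above, assuming the monomial basis $p_{k}(\tau)=\tau^{k-1}$ with two terms. The function names `linear_local_params` and `eval_local` are hypothetical and introduced only for this illustration. Matching the linear formula term by term gives $\boldsymbol{\alpha}_{n}^{1}=\mathbf{u}(t_{n})$ and $\boldsymbol{\alpha}_{n}^{2}=(\mathbf{u}(t_{n+1})-\mathbf{u}(t_{n}))/\Delta$.

```python
from typing import List, Sequence

def linear_local_params(u_n: Sequence[float],
                        u_np1: Sequence[float],
                        delta: float) -> List[List[float]]:
    """Return [alpha^1, alpha^2] so that u~(tau) = alpha^1 + alpha^2 * tau."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    alpha1 = [float(a) for a in u_n]                        # constant term
    alpha2 = [(b - a) / delta for a, b in zip(u_n, u_np1)]  # slope term
    return [alpha1, alpha2]

def eval_local(gamma: List[List[float]], tau: float) -> List[float]:
    """Evaluate sum_k alpha^k * tau**(k-1) componentwise (monomial basis)."""
    dim = len(gamma[0])
    return [sum(alpha[i] * tau ** k for k, alpha in enumerate(gamma))
            for i in range(dim)]

# Example: u = (mu, nu)^T known only at two consecutive time levels.
delta = 0.01
gamma = linear_local_params([1.0, 0.5], [1.2, 0.4], delta)
print(eval_local(gamma, 0.0))    # reproduces u(t_n)
print(eval_local(gamma, delta))  # reproduces u(t_{n+1}) up to round-off
```

Note that neither function takes $t_{n}$ as an argument: the coefficients depend only on the two excitation values and on $\Delta$.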

3.2 Parametric Stochastic Flow Map

By replacing $\mathbf{u}$ with the local polynomial $\widetilde{\mathbf{u}}$ in (7), we transform the system (6) into

\[
\begin{split}
\widetilde{\mathbf{x}}(t_{n+1})=\widetilde{\mathbf{x}}(t_{n})&+\int_{0}^{\Delta}\mathbf{a}(\widetilde{\mathbf{x}}(t_{n}+\tau),\widetilde{\mu}(\tau;\mathbf{\Gamma}_{n}))\,d\tau\\
&+\int_{0}^{\Delta}\mathbf{b}(\widetilde{\mathbf{x}}(t_{n}+\tau),\widetilde{\nu}(\tau;\mathbf{\Gamma}_{n}))\,d\mathbf{W}(t_{n}+\tau),
\end{split} \tag{9}
\]

where the excitation signals $\mathbf{u}=(\mu,\nu)^{T}$ have been parameterized by $\widetilde{\mathbf{u}}$ via a set of parameters $\mathbf{\Gamma}_{n}$. Compared to (6), the transformed system (9) contains possible numerical error introduced by the parameterization of the excitations over the time domain $[0,\Delta)$. The error can be made arbitrarily small if one uses higher-degree polynomials when $\Delta$ is sufficiently small.

By using subscripts to denote the time level and letting

\[
\widetilde{\mathbf{x}}(t_{n})=\widetilde{\mathbf{x}}_{n},\qquad d\mathbf{W}_{n}(\tau)=d\mathbf{W}(t_{n}+\tau),
\]

the parameterized system (9) indicates that there exists a mapping

\[
\widetilde{\mathbf{x}}_{n+1}=\mathbf{G}_{\Delta}(\widetilde{\mathbf{x}}_{n},d\mathbf{W}_{n}(\Delta);\boldsymbol{\Gamma}_{n}), \tag{10}
\]

where $\mathbf{G}_{\Delta}$ is what we shall call the parametric stochastic flow map, which is parameterized by $\mathbf{\Gamma}_{n}$. It is an unknown operator, as the functions $\mathbf{a}$ and $\mathbf{b}$ are unknown in the original system (1).

Remark 3.1.

It is important to recognize that for the Brownian motion $\mathbf{W}(t)$, or in general for Lévy processes (càdlàg stochastic processes with stationary independent increments), the process $d\mathbf{W}_{n}(\tau)$ is stationary and independent of $\mathbf{W}(t_{n})$. Therefore, only the time difference $\Delta=t_{n+1}-t_{n}$ matters, and the values of $t_{n}$ and $t_{n+1}$ do not. Consequently, we have suppressed the time variables $t_{n}$ and $t_{n+1}$ in (10).

3.3 Stochastic Flow Map Learning

In this section, we describe our main method of stochastic flow map learning (sFML), which constructs a generative model to approximate the stochastic flow map (10) by using the trajectory data (3).

3.3.1 Training Data

To construct the training data set, we reorganize the original training data set (3) into pairs of consecutive time instances. Since from the $i$-th trajectory, $i=1,\dots,N_{T}$, we can extract $L_{i}$ such pairs, there are a total of $M=L_{1}+\cdots+L_{N_{T}}$ I/O data pairs from the data set (3):

\[ \left(\mathbf{u}\left(t_{k}^{(j)}\right),\mathbf{u}\left(t_{k+1}^{(j)}\right)\right)\rightarrow\left(\mathbf{x}\left(t_{k}^{(j)}\right),\mathbf{x}\left(t_{k+1}^{(j)}\right)\right),\qquad j=1,\dots,M. \tag{11} \]

Next, we apply the local parameterization procedure from Section 3.1 to each pair of the input data $\left(\mathbf{u}\left(t_{k}^{(j)}\right),\mathbf{u}\left(t_{k+1}^{(j)}\right)\right)$ and obtain its parameterization $\mathbf{\Gamma}_{k}^{(j)}$, $j=1,\dots,M$, in the form of (7). We now have

\[ \mathbf{\Gamma}_{k}^{(j)}\rightarrow\left(\mathbf{x}\left(t_{k}^{(j)}\right),\mathbf{x}\left(t_{k+1}^{(j)}\right)\right),\qquad j=1,\dots,M. \tag{12} \]

Since the values of the time variables do not matter, see Remark 3.1, we again suppress the time variables and write our training data set as

\[ \mathcal{S}_{M}=\left\{\mathbf{\Gamma}_{0}^{(j)};\left(\mathbf{x}_{0}^{(j)},\mathbf{x}_{1}^{(j)}\right)\right\}_{j=1}^{M}, \tag{13} \]

where $M$ is the total number of the parametric data pairs. In this way, each $j$-th entry of the data set is a trajectory of length two over one time step $\Delta$, starting with its “initial condition” at $\mathbf{x}_{0}^{(j)}$, ending one step later at $\mathbf{x}_{1}^{(j)}$, and driven by a known excitation parameterized by $\mathbf{\Gamma}_{0}^{(j)}$ and an unknown stationary stochastic process $d\mathbf{W}_{n}(\Delta)$.
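The reorganization of trajectories into the one-step data set (13) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `build_pairs`, the dictionary keys, and the premise that the local parameters $\mathbf{\Gamma}_{k}$ have already been computed for every step are all hypothetical.

```python
def build_pairs(trajectories):
    """Flatten trajectories into one-step triples (Gamma_0, x_0, x_1).

    Each trajectory is a dict with (hypothetical) keys:
      "x":     list of L+1 states x(t_0), ..., x(t_L)
      "gamma": list of L local excitation parameters, one per step
    """
    pairs = []
    for traj in trajectories:
        xs, gammas = traj["x"], traj["gamma"]
        if len(xs) != len(gammas) + 1:
            raise ValueError("need one parameter vector per time step")
        for k, gamma in enumerate(gammas):
            # The time index k is dropped: only the step size Delta matters.
            pairs.append((gamma, xs[k], xs[k + 1]))
    return pairs


trajs = [
    {"x": [0.0, 0.1, 0.3], "gamma": [(1.0, 0.0), (1.0, 0.5)]},  # L_1 = 2
    {"x": [2.0, 1.9], "gamma": [(-1.0, 0.0)]},                  # L_2 = 1
]
S_M = build_pairs(trajs)  # M = L_1 + L_2 = 3 entries
```

Each entry of `S_M` is a length-two trajectory, so trajectories of different lengths contribute to the same data set without padding.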

3.3.2 Generative Model

In stochastic flow map learning (sFML), we seek to approximate the parametric stochastic flow map (10) via a recursive generative model in the form of

\[ \widehat{\mathbf{x}}_{n+1}=\widehat{\mathbf{G}}_{\Delta}(\widehat{\mathbf{x}}_{n},\mathbf{z}_{n};\mathbf{\Gamma}_{n}), \tag{14} \]

where $\mathbf{z}_{n}\in\mathbb{R}^{n_{z}}$ is a random variable of known distribution. Again, since the stationary stochastic process $d\mathbf{W}_{n}(\Delta)$ in (10) is not observed, the constructed sFML model (14) is expected to be a weak approximation of (10), specifically an approximation in distribution.

In order to construct the sFML model (14), we execute the model for one time step over $\Delta$,

\[ \widehat{\mathbf{x}}_{1}=\widehat{\mathbf{G}}_{\Delta}(\widehat{\mathbf{x}}_{0},\mathbf{z}_{0};\mathbf{\Gamma}_{0}), \tag{15} \]

and utilize the training data set (13) to learn the unknown operator $\widehat{\mathbf{G}}$. Note that the random variable $\mathbf{z}_{0}$ is not in the training data set. In practice, one chooses $\mathbf{z}_{0}$ with a known distribution, typically a standard Gaussian, and a specified dimension $n_{z}\geq 1$. The presence of the random variable $\mathbf{z}_{0}$ enables (15) to be a stochastic generative model that can produce random realizations. Several methods exist to construct stochastic generative models, e.g., GANs, diffusion models, normalizing flows, autoencoder-decoders, etc. In this paper, we adopt normalizing flows for (15).

3.3.3 Normalizing Flow Model

Normalizing flows are generative models that produce tractable distributions to enable efficient and accurate sampling and density evaluation. A normalizing flow is a transformation of a simple probability distribution, e.g., a standard normal, into a more complex distribution by a sequence of diffeomorphisms. Let $\mathbf{Z}\in\mathbb{R}^{D}$ be a random variable with a known and tractable distribution $p_{\mathbf{Z}}$. Let $\mathbf{g}$ be a diffeomorphism, whose inverse is $\mathbf{f}=\mathbf{g}^{-1}$, and $\mathbf{Y}=\mathbf{g}(\mathbf{Z})$. Then, using the change of variable formula, one obtains the probability density of $\mathbf{Y}$:

\[ p_{\mathbf{Y}}(\mathbf{y})=p_{\mathbf{Z}}(\mathbf{f}(\mathbf{y}))\,|\det\mathbf{D}\mathbf{f}(\mathbf{y})|=p_{\mathbf{Z}}(\mathbf{f}(\mathbf{y}))\,|\det\mathbf{D}\mathbf{g}(\mathbf{f}(\mathbf{y}))|^{-1}, \]

where $\mathbf{D}\mathbf{f}(\mathbf{y})=\partial\mathbf{f}/\partial\mathbf{y}$ is the Jacobian of $\mathbf{f}$ and $\mathbf{D}\mathbf{g}(\mathbf{z})=\partial\mathbf{g}/\partial\mathbf{z}$ is the Jacobian of $\mathbf{g}$. When the target complex distribution $p_{\mathbf{Y}}$ is given, usually as a set of samples of $\mathbf{Y}$, one seeks $\mathbf{g}$ from a parameterized family $\mathbf{g}_{\theta}$, where the parameter $\theta$ is optimized to match the target distribution. Also, to circumvent the difficulty of constructing a complicated nonlinear function $\mathbf{g}$, one utilizes a composition of (much) simpler diffeomorphisms: $\mathbf{g}=\mathbf{g}_{m}\circ\mathbf{g}_{m-1}\circ\cdots\circ\mathbf{g}_{1}$. It can be shown that $\mathbf{g}$ remains a diffeomorphism with its inverse $\mathbf{f}=\mathbf{f}_{1}\circ\cdots\circ\mathbf{f}_{m-1}\circ\mathbf{f}_{m}$. There exists a large body of literature on normalizing flows. We refer the interested reader to the review articles [20, 32].
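The change of variable formula and the composition rule can be checked on a small example. The sketch below assumes one-dimensional affine layers $g_i(z)=a_i z+b_i$, chosen only because their Jacobians are constant; the helper names `affine_layer` and `flow_density` are illustrative and do not come from any library.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def affine_layer(scale, shift):
    """Return (g, f, log|det Dg|) for the 1D diffeomorphism g(z) = scale*z + shift."""
    if scale == 0.0:
        raise ValueError("scale must be nonzero for g to be invertible")
    g = lambda z: scale * z + shift
    f = lambda y: (y - shift) / scale
    return g, f, math.log(abs(scale))


def flow_density(y, layers):
    """Density of Y = g_m(...g_1(Z)), Z standard normal, via change of variables."""
    log_det = 0.0
    for _, f, layer_log_det in reversed(layers):  # invert g_m first, g_1 last
        y = f(y)
        log_det += layer_log_det
    # After the loop, y holds f(y) = f_1(...f_m(y)); divide by |det Dg|.
    return std_normal_pdf(y) * math.exp(-log_det)


layers = [affine_layer(2.0, 1.0), affine_layer(-0.5, 3.0)]
# Composite map: g(z) = -0.5*(2z + 1) + 3 = -z + 2.5, so Y ~ N(2.5, 1).
p = flow_density(2.0, layers)
expected = std_normal_pdf(2.0 - 2.5)  # N(2.5, 1) density evaluated at y = 2.0
```

Here `p` and `expected` agree because the log-determinants of the two layers cancel ($\log 2+\log 0.5=0$). For nonlinear layers the Jacobian would depend on the evaluation point and would have to be accumulated at each intermediate value.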

In our setting, we seek to construct the one-step generative model (15) by using the training data (13). Let $\mathbf{z}_{0}\in\mathbb{R}^{d}$ be a random variable with a known distribution. In our approach, we choose $\mathbf{z}_{0}$ to be $d$-dimensional standard normal. Let $\mathbf{T}_{\theta}$ be a diffeomorphism with a set of parameters $\theta\in\mathbb{R}^{n_{\theta}}$. Our objective is to find $\theta$ such that $\mathbf{T}_{\theta}(\mathbf{z}_{0})$ follows the distribution of $\{\mathbf{x}_{1}^{(j)}\}_{j=1}^{M}$ in (13).

Since the distribution of $\mathbf{x}_{1}$ clearly depends on $\mathbf{x}_{0}$ and $\mathbf{\Gamma}_{0}$, we constrain the choice of $\theta$ to be a function of $\mathbf{x}_{0}$ and $\mathbf{\Gamma}_{0}$. We define

\[ \theta=\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta), \tag{16} \]

where $\mathbf{N}$ is a DNN mapping with trainable hyperparameters $\Theta$. No special DNN structure is required, and we adopt the straightforward fully connected feedforward DNN for $\mathbf{N}$. This effectively defines

\[ \mathbf{x}_{1}=\mathbf{T}_{\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)}(\mathbf{z}_{0}), \tag{17} \]

where the diffeomorphism $\mathbf{T}$ is effectively parameterized by the trainable hyperparameters $\Theta$ of the DNN. Let $\mathbf{S}=\mathbf{T}^{-1}$ be the inverse of $\mathbf{T}$. We have $\mathbf{z}_{0}=\mathbf{S}_{\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)}(\mathbf{x}_{1})$.

The invertibility of 𝐓𝐓\mathbf{T}bold_T allows us to compute:

\[ p(\mathbf{x}_{1}|\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)=p_{\mathbf{z}_{0}}\left(\mathbf{S}_{\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)}(\mathbf{x}_{1})\right)\left|\det\mathbf{D}\mathbf{T}_{\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)}\left(\mathbf{S}_{\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)}(\mathbf{x}_{1})\right)\right|^{-1}. \tag{18} \]

The hyperparameters $\Theta$ are determined by maximizing the expected log-likelihood, which is accomplished by minimizing its negative as the loss function,

\[ \mathcal{L}(\Theta):=-\mathbb{E}_{(\mathbf{\Gamma}_{0},\mathbf{x}_{0},\mathbf{x}_{1})\sim p_{\mathtt{data}}}\log\left(p(\mathbf{x}_{1}|\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)\right), \]

where $p_{\mathtt{data}}$ is the distribution of the training data set (13). In practice, the loss is computed, up to the constant factor $1/M$ that does not affect the minimizer, as

\[ \mathcal{L}(\Theta)=-\sum_{j=1}^{M}\log\left(p(\mathbf{x}_{1}^{(j)}|\mathbf{x}_{0}^{(j)},\mathbf{\Gamma}_{0}^{(j)};\Theta)\right). \tag{19} \]

Several designs for the invertible map $\mathbf{T}$ have been developed and studied extensively in the literature. These include, for example, masked autoregressive flow (MAF) [33], real-valued non-volume preserving (RealNVP) [15], neural ordinary differential equations (Neural ODE) [7], etc. In this paper, we adopt the MAF approach, where the dimension of the parameter $\theta$ in (16) is set to be $n_{\theta}=2d$, where $d$ is the dimension of the dynamical system. For the technical details of MAF, see [33].
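To make the conditional likelihood (18) and the loss (19) concrete, the following sketch evaluates them for a scalar system ($d=1$) with a single conditional affine transform $\mathbf{T}_{\theta}(z)=\text{shift}+e^{\text{log\_scale}}z$, so that $\theta$ has $2d=2$ components. This is a simplified stand-in and not the MAF architecture of [33]; in particular, `conditioner` is a hypothetical linear map used in place of the DNN $\mathbf{N}$, and all names are illustrative.

```python
import math


def conditioner(x0, gamma0, weights):
    """Hypothetical stand-in for N(x0, Gamma0; Theta): a linear map
    returning theta = (shift, log_scale), i.e. n_theta = 2 for d = 1."""
    features = [1.0, x0, *gamma0]
    shift = sum(w * f for w, f in zip(weights["shift"], features))
    log_scale = sum(w * f for w, f in zip(weights["log_scale"], features))
    return shift, log_scale


def log_prob(x1, x0, gamma0, weights):
    """log p(x1 | x0, Gamma0; Theta) for T_theta(z) = shift + exp(log_scale) * z."""
    shift, log_scale = conditioner(x0, gamma0, weights)
    z0 = (x1 - shift) * math.exp(-log_scale)   # z0 = S_theta(x1)
    log_pz = -0.5 * z0 * z0 - 0.5 * math.log(2.0 * math.pi)
    return log_pz - log_scale                   # subtract log|det DT|


def nll_loss(data, weights):
    """Negative log-likelihood summed over (Gamma0, x0, x1) triples, as in (19)."""
    return -sum(log_prob(x1, x0, g0, weights) for g0, x0, x1 in data)


# Toy check: shift = x0 and unit scale, evaluated at x1 = x0, gives z0 = 0.
weights = {"shift": [0.0, 1.0, 0.0], "log_scale": [0.0, 0.0, 0.0]}
data = [((0.5,), 1.0, 1.0)]
loss = nll_loss(data, weights)
```

In an actual training loop the entries of `weights` would be replaced by the DNN hyperparameters $\Theta$ and updated by a gradient-based optimizer.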

3.4 DNN Model Structure and System Prediction

An illustration of the proposed sFML model structure can be found in Figure 1. This is in direct correspondence with (17). Minimization of the loss function (19), using the data set (13), results in the training of the DNN hyperparameters $\Theta$. Once the training is completed and $\Theta$ is fixed, (17) effectively defines the one-step sFML model (15):

\[ \mathbf{x}_{1}=\mathbf{T}_{\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0})}(\mathbf{z}_{0})=\widehat{\mathbf{G}}_{\Delta}(\mathbf{x}_{0},\mathbf{z}_{0};\mathbf{\Gamma}_{0}), \]

where we have suppressed the fixed parameter $\Theta$.

Figure 1: The DNN model structure for the proposed normalizing flow sFML method (17).

Iterative execution of the one-step sFML model allows one to conduct system predictions under excitations that are not in the training data. For a given (new) excitation signal $\mathbf{u}(t)=(\mu(t),\nu(t))^{T}$, we first conduct its parameterization in the form of (7), to obtain its local parameter $\mathbf{\Gamma}_{n}$ for $[t_{n},t_{n+1})$, for any $n\geq 0$. The sFML system then produces the system prediction, for a given initial condition $\mathbf{x}_{0}$,

\[ \mathbf{x}_{n+1}=\widehat{\mathbf{G}}_{\Delta}(\mathbf{x}_{n},\mathbf{z}_{n};\mathbf{\Gamma}_{n}),\qquad n\geq 0, \tag{20} \]

where $\mathbf{z}_{n}$ are i.i.d. $d$-dimensional standard normal random variables.
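The recursion (20) amounts to a simple loop. In the sketch below, `toy_g_hat` is a hypothetical placeholder for the trained one-step model (it is an Euler-Maruyama-like update, not a normalizing flow), and the local parameters are assumed to be given as one tuple per time step.

```python
import random


def rollout(g_hat, x0, gammas, rng):
    """Iterate x_{n+1} = g_hat(x_n, z_n, Gamma_n) with i.i.d. standard normal z_n."""
    xs = [x0]
    for gamma_n in gammas:
        z_n = rng.gauss(0.0, 1.0)  # d = 1 here; draw a d-vector in general
        xs.append(g_hat(xs[-1], z_n, gamma_n))
    return xs


def toy_g_hat(x, z, gamma, dt=0.01, sigma=0.2):
    # Hypothetical stand-in for the trained model: one explicit step of an OU
    # process whose drift is shifted by the leading local coefficient gamma[0].
    return x + (-x + gamma[0]) * dt + sigma * dt ** 0.5 * z


rng = random.Random(0)
gammas = [(0.5, 0.0, 0.0)] * 1000  # one parameter vector per step (T = 10)
path = rollout(toy_g_hat, 2.0, gammas, rng)
```

Repeating `rollout` with independent random streams yields an ensemble of trajectories from which means, standard deviations, and histograms can be estimated.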

4 Numerical Examples

In this section, we present several numerical tests to demonstrate the performance of our proposed method. After presenting results for an Ornstein-Uhlenbeck (OU) process and a nonlinear SDE, we focus on nonlinear SDE systems for long-term predictions. These include a stochastic predator-prey model and a stochastic oscillator with double well potential. In both cases, we study very long-term predictions of the learned sFML models. In particular, for the stochastic oscillator, we utilize a periodic excitation signal that is known to generate the “stochastic resonance” phenomenon.

In all the examples, the true SDE systems are known. However, the known SDEs are used only to generate the training data set (13). We solve the true systems by the Euler-Maruyama method with a time step $\Delta=0.01$. The “initial conditions” $\mathbf{x}_{0}$ in (13) are sampled uniformly in a domain $I_{\mathbf{x}}$, specified in each example, and the excitations are local polynomials whose coefficients $\mathbf{\Gamma}_{0}$ are sampled in a domain specified for each example.

In our sFML model (Figure 1), the DNN $\mathbf{N}$ has 3 layers, each with 20 nodes, and uses the $\tanh$ activation function. We employ a cyclic learning rate with a base rate of $3\times 10^{-4}$ and a maximum rate of $5\times 10^{-4}$, $\gamma=0.99999$, and step size $10{,}000$. The cycle is set for every $40{,}000$ training epochs, with a decay scale of $0.5$. A small weight decay of $0.01$ on the gradient updates is also used to help stabilize the training. In our examples, the DNN training is usually conducted for $200{,}000$ to $300{,}000$ epochs.

4.1 Linear SDE with Control

We first consider an Ornstein–Uhlenbeck (OU) process with control/excitation. Two cases are considered: when the control is in the drift and when the control is in both the drift and the diffusion. Note that, since the true equations are treated as unknown, one has no information on “where” the excitations act on the system. The sFML approach also does not seek to recover the drift or diffusion terms.

4.1.1 OU with Drift Control

We first consider an Ornstein–Uhlenbeck (OU) process,

\[ dx_{t}=\left[-\mu x_{t}+\alpha(t)\right]dt+\sigma\,dW_{t}, \tag{21} \]

where $\mu$ and $\sigma$ are set as $\mu=1.0$ and $\sigma=0.2$, and the control signal $\alpha(t)$ is applied to the drift. The training data set (13) is generated by sampling $x_{0}$ in $(-2,2)$ and using a Taylor polynomial of degree $2$ for the control $\alpha(t)$. This introduces 3 parameters for $\mathbf{\Gamma}_{0}$, which are sampled from $(-9,9)^{3}$. A total of $M=120{,}000$ trajectory pairs are used in the training data set (13), where the time step is $\Delta=0.01$.
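A minimal sketch of how such a training set might be generated is given below. It assumes that the local control takes the form $\alpha(t_{n}+\tau)\approx\Gamma_{0,0}+\Gamma_{0,1}\tau+\Gamma_{0,2}\tau^{2}$ and that a single Euler-Maruyama step of size $\Delta$ produces $x_{1}$ from $x_{0}$; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random

MU, SIGMA, DT = 1.0, 0.2, 0.01


def alpha_local(gamma, tau):
    """Assumed local form of the control: degree-2 polynomial in local time tau."""
    return gamma[0] + gamma[1] * tau + gamma[2] * tau ** 2


def em_step(x0, gamma, rng):
    """One Euler-Maruyama step of dx = (-MU*x + alpha) dt + SIGMA dW over DT."""
    dw = rng.gauss(0.0, math.sqrt(DT))  # Brownian increment, variance DT
    return x0 + (-MU * x0 + alpha_local(gamma, 0.0)) * DT + SIGMA * dw


def make_dataset(m, rng):
    """Draw m triples (Gamma_0, x_0, x_1) with uniformly sampled inputs."""
    data = []
    for _ in range(m):
        x0 = rng.uniform(-2.0, 2.0)
        gamma = tuple(rng.uniform(-9.0, 9.0) for _ in range(3))
        data.append((gamma, x0, em_step(x0, gamma, rng)))
    return data


data = make_dataset(1000, random.Random(0))  # the example above uses M = 120,000
```

With a single explicit step, only the leading coefficient `gamma[0]` enters the update; a finer inner step over $[0,\Delta]$ would be needed for the higher-order coefficients to influence $x_{1}$.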

Once the sFML model (14) is trained, we conduct system prediction for up to $T=10.0$, which requires 1,000 time steps.

Figure 2: Sample trajectories of Example 4.1.1 with initial condition $x_{0}=2.0$ and $\alpha(t)=\frac{1}{2}\sin(6t)$. Left: ground truth; Right: simulation using the trained sFML model.
Figure 3: Mean and standard deviation (STD) of Example 4.1.1 with initial condition $x_{0}=2.0$ and $\alpha(t)=\frac{1}{2}\sin(6t)$.
Figure 4: Comparison of the distributions of Example 4.1.1 at $T=2,4,8$ with initial condition $x_{0}=2.0$ and $\alpha(t)=0.5\sin(6t)$.

In Figure 2, we compare some sample trajectory paths produced by the ground truth (left) and the learned sFML model (right), with an initial condition $x_{0}=2.0$ and a “new” control signal $\alpha(t)=\frac{1}{2}\sin(6t)$. We observe that the two sets appear visually similar to each other. To further validate the sFML model prediction, we compute the mean and standard deviation of the solution averaged over $10{,}000$ trajectories. The sFML model predictions are shown in Figure 3, along with the reference ground truth. In Figure 4, we also show the comparison of the solution probability distributions at time $T=2,4,8$. We observe good agreement between the learned sFML model and the true model. This verifies that the sFML model indeed provides an accurate approximation in distribution.
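For the linear system (21) with a deterministic initial condition, the reference mean and variance can also be written in closed form, which offers an independent check on the Monte Carlo statistics. The expressions below are standard results for linear SDEs with deterministic forcing, worked out here for the parameter values of this example:

\[ \mathbb{E}[x_{t}]=x_{0}e^{-\mu t}+\int_{0}^{t}e^{-\mu(t-s)}\alpha(s)\,ds,\qquad \mathrm{Var}[x_{t}]=\frac{\sigma^{2}}{2\mu}\left(1-e^{-2\mu t}\right). \]

For a sinusoidal control $\alpha(t)=A\sin(\omega t)$ the integral evaluates to

\[ \mathbb{E}[x_{t}]=x_{0}e^{-\mu t}+\frac{A}{\mu^{2}+\omega^{2}}\left(\mu\sin\omega t-\omega\cos\omega t+\omega e^{-\mu t}\right). \]

With $\mu=1$, $\sigma=0.2$, $x_{0}=2$, $A=\frac{1}{2}$, and $\omega=6$, this gives $\mathbb{E}[x_{t}]=2e^{-t}+\frac{1}{74}\left(\sin 6t-6\cos 6t+6e^{-t}\right)$, and the standard deviation approaches $\sqrt{0.02}\approx 0.141$ as $t$ grows.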

We now present the results under a different setting: the initial condition $x_{0}=-1.0$, and the excitation $\alpha(t)=\frac{1}{2}\sin(5t)+\frac{1}{5}\sin(1.5t)$. The sample solution trajectories are shown in Figure 5 and the solution mean and standard deviation averaged over $10{,}000$ trajectories are shown in Figure 6. Again, we observe good agreement between the sFML model prediction and the ground truth.

Figure 5: Sample trajectories of Example 4.1.1 with initial condition $x_{0}=-1.0$ and $\alpha(t)=\frac{1}{2}\sin(5t)+\frac{1}{5}\sin(1.5t)$. Left: ground truth; Right: simulation using the trained sFML model.
Figure 6: Mean and standard deviation (STD) of Example 4.1.1 with initial condition $x_{0}=-1.0$ and $\alpha(t)=\frac{1}{2}\sin(5t)+\frac{1}{5}\sin(1.5t)$.

4.1.2 OU with Full Control

We then consider the following OU process with control on both drift and diffusion terms:

\[ dx_{t}=\left[-\mu x_{t}+\alpha(t)\right]dt+\beta(t)\,dW_{t}, \tag{22} \]

where $\mu=1.0$, and $\alpha(t)$ and $\beta(t)$ are the excitation/control signals. To generate training data, we conduct the local parameterization of $\alpha(t)$ and $\beta(t)$ with 2nd degree Taylor polynomials, resulting in $\mathbf{\Gamma}_{n}\in\mathbb{R}^{n_{\Gamma}}$, $n_{\Gamma}=3+3=6$. Moreover, we generate $120{,}000$ training data pairs with initial conditions uniformly sampled from $I_{\mathbf{x}}=[-0.8,1.5]$ and parameters sampled from $I_{\mathbf{\Gamma}}=[-0.6,0.6]\times[-0.8,0.8]\times[-0.7,0.7]\times[0.01,0.35]\times[-0.5,0.5]\times[-1.55,0.55]$.

Figure 7: Sample trajectories of Example 4.1.2 with initial condition $x_{0}=1.0$, $\alpha(t)=\frac{1}{2}\sin(\frac{\pi}{2}t)$ and $\beta(t)=\frac{1}{10}e^{\cos(\pi t)}$. Left: ground truth; Right: simulation using the trained sFML model.
Figure 8: Mean and standard deviation (STD) of Example 4.1.2 with initial condition $x_{0}=1.0$, $\alpha(t)=\frac{1}{2}\sin(\frac{\pi}{2}t)$ and $\beta(t)=\frac{1}{10}e^{\cos(\pi t)}$.
Figure 9: Comparison of the distributions of Example 4.1.2 at $T=2,6,8$ with initial condition $x_{0}=1.0$, $\alpha(t)=\frac{1}{2}\sin(\frac{\pi}{2}t)$ and $\beta(t)=\frac{1}{10}e^{\cos(\pi t)}$.

To examine the performance of the learned sFML model, we conduct a simulation with the initial condition $\mathbf{x}_0=1.0$ and excitations $\alpha(t)=\frac{1}{2}\sin(\frac{\pi}{2}t)$ and $\beta(t)=\frac{1}{10}e^{\cos(\pi t)}$. (Note that these excitations are not among the Taylor polynomials in the training data set.) Some sample solution trajectories are shown in Figure 7, and the mean and STD of the solution are shown in Figure 8. In Figure 9, we compare the probability distributions of the solution at $T=2,6,8$. We observe good agreement between the sFML model prediction and the ground truth.
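As a rough illustration of why these test signals remain within reach of the trained model, the following sketch computes the local 2nd-degree Taylor coefficients of $\alpha(t)$ and $\beta(t)$ along the prediction horizon and checks them against $I_{\mathbf{\Gamma}}$. It assumes, and this ordering is our reading rather than something stated in the text, that $\mathbf{\Gamma}_n$ collects the value, the first derivative and half the second derivative of $\alpha$, followed by the same three quantities for $\beta$. The function name `local_taylor_params` and the time grid are illustrative choices.

```python
import numpy as np

# Training domain I_Gamma from the text, one [low, high] row per parameter.
# Assumed ordering (not stated in the paper): value, slope, curvature/2.
I_GAMMA = np.array([
    [-0.6, 0.6], [-0.8, 0.8], [-0.7, 0.7],      # alpha
    [0.01, 0.35], [-0.5, 0.5], [-1.55, 0.55],   # beta
])

def local_taylor_params(t):
    """Degree-2 Taylor coefficients of alpha and beta about time t."""
    w = np.pi / 2
    a = 0.5 * np.sin(w * t)                 # alpha(t)
    da = 0.5 * w * np.cos(w * t)            # alpha'(t)
    dda = -0.5 * w**2 * np.sin(w * t)       # alpha''(t)
    c, s = np.cos(np.pi * t), np.sin(np.pi * t)
    b = 0.1 * np.exp(c)                     # beta(t)
    db = -np.pi * s * b                     # beta'(t)
    ddb = np.pi**2 * (s**2 - c) * b         # beta''(t)
    return np.stack([a, da, 0.5 * dda, b, db, 0.5 * ddb], axis=-1)

t_grid = np.linspace(0.0, 8.0, 801)
gamma = local_taylor_params(t_grid)         # one row of 6 parameters per time
inside = (gamma >= I_GAMMA[:, 0]) & (gamma <= I_GAMMA[:, 1])
print("all local parameters inside I_Gamma:", bool(inside.all()))
```

Under this assumed ordering, the check indicates whether the model is being queried inside the parameter region it was trained on, even though the global signals themselves never appear in the training set.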

4.2 Nonlinear SDEs with Control

We now consider a nonlinear system of SDEs, inspired by an example in Section 2.3.2 of [45]:

$$
\left\{\begin{array}{l}
\dot{x}_t = f(x_t,y_t,t)+\sigma_1\dot{W}_1,\\
\dot{y}_t = -\mu(y_t-x_t)+\sigma_2\dot{W}_2,
\end{array}\right.
\tag{23}
$$

where $W_1$ and $W_2$ are independent Brownian motions, $\mu=1.0$, $\sigma_1=0.2$, $\sigma_2=0.05$, and the function $f$ contains a control signal $u(t)$:

$$
f(x,y,t)=-y^3+u(t),\qquad u(t)=\sin(\pi t)+\cos(\sqrt{2}\pi t).
$$

To generate the training data, we simulate the system with 120,000 sample paths over one time step $\Delta=0.01$, from initial conditions sampled uniformly in $I_{\mathbf{x}}=[-1.5,2.0]\times[-1.0,1.6]$ and under controls given by 2nd-degree Taylor polynomials with coefficients sampled from $[-2,2]\times[-8,8]\times[-15,15]$.
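A minimal sketch of how such one-step training pairs could be produced is given below. It is not the authors' data generator: the text does not state which SDE solver is used, so the Euler-Maruyama scheme with `n_sub` sub-steps, the function name `make_training_pairs`, and the coefficient convention $u(\tau)=c_0+c_1\tau+c_2\tau^2$ on the local time $\tau\in[0,\Delta]$ are all assumptions made for illustration.

```python
import numpy as np

MU, SIGMA1, SIGMA2 = 1.0, 0.2, 0.05   # parameters stated for system (23)
DELTA = 0.01                           # length of each training burst

def make_training_pairs(n_samples, n_sub=10, seed=0):
    """Return (inputs, outputs) with inputs = [x0, y0, c0, c1, c2], outputs = [x1, y1].

    Assumed solver: Euler-Maruyama with n_sub sub-steps (not specified in the text).
    """
    rng = np.random.default_rng(seed)
    state = rng.uniform([-1.5, -1.0], [2.0, 1.6], size=(n_samples, 2))
    coef = rng.uniform([-2.0, -8.0, -15.0], [2.0, 8.0, 15.0], size=(n_samples, 3))
    x, y = state[:, 0].copy(), state[:, 1].copy()
    h = DELTA / n_sub
    for k in range(n_sub):
        tau = k * h
        # Local polynomial control, evaluated in local time tau.
        u = coef[:, 0] + coef[:, 1] * tau + coef[:, 2] * tau**2
        dw = rng.normal(0.0, np.sqrt(h), size=(n_samples, 2))
        x_new = x + (-y**3 + u) * h + SIGMA1 * dw[:, 0]
        y_new = y - MU * (y - x) * h + SIGMA2 * dw[:, 1]
        x, y = x_new, y_new
    return np.hstack([state, coef]), np.column_stack([x, y])

inputs, outputs = make_training_pairs(120_000)
print(inputs.shape, outputs.shape)
```

Each row pairs a state and a set of local control parameters with the state one step $\Delta$ later, which is the only kind of data the learning method is assumed to see.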

Figure 10: Sample phase portraits of Example 4.2 with initial condition $x_0=2.0$, $y_0=1.0$ for time up to $T=10$. Left: reference solution; Right: sFML model prediction.
Figure 11: Mean and standard deviation (STD) of Example 4.2 with initial condition $x_0=2.0$, $y_0=1.0$. Left: $x$; Right: $y$.
Figure 12: Comparison of the probability distributions of Example 4.2 at $T=4,6,7,9$ with initial condition $x_0=2.0$, $y_0=1.0$. Top row: $x$; Bottom row: $y$.

For the learned sFML model, we conduct system predictions with the initial condition $x_0=2.0$ and $y_0=1.0$. In Figure 10, we plot a few sample phase portraits from the ground truth (left) and from the sFML model prediction (right); the two are in good visual agreement. The mean and standard deviation of the system prediction by the sFML model are shown in Figure 11, along with those of the true solution. In Figure 12, we compare the reference and learned probability density functions of the solution at times $T=4,6,7,9$. We observe that the sFML model exhibits good accuracy in these predictions.
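The comparisons above are presented graphically. For readers who want a numerical counterpart, the sketch below compares two sample ensembles taken at the same time $T$ through their mean, their STD and an $L^1$ distance between histogram density estimates. The histogram estimator, the bin count and the function name `compare_ensembles` are our own illustrative choices; the text does not specify how the plotted densities were estimated.

```python
import numpy as np

def compare_ensembles(ref, pred, bins=50):
    """Compare two 1-D sample sets drawn at the same time T.

    Returns absolute errors in mean and STD, plus the L1 distance between
    histogram density estimates on a shared grid (0 = identical, 2 = disjoint).
    """
    ref, pred = np.asarray(ref, float), np.asarray(pred, float)
    lo = min(ref.min(), pred.min())
    hi = max(ref.max(), pred.max())
    edges = np.linspace(lo, hi, bins + 1)
    p_ref, _ = np.histogram(ref, bins=edges, density=True)
    p_pred, _ = np.histogram(pred, bins=edges, density=True)
    width = edges[1] - edges[0]
    return {
        "mean_err": abs(ref.mean() - pred.mean()),
        "std_err": abs(ref.std() - pred.std()),
        "l1_density": float(np.sum(np.abs(p_ref - p_pred)) * width),
    }

# Synthetic stand-ins for a reference ensemble and a model ensemble.
rng = np.random.default_rng(0)
same = compare_ensembles(rng.normal(0, 1, 20000), rng.normal(0, 1, 20000))
shifted = compare_ensembles(rng.normal(0, 1, 20000), rng.normal(3, 1, 20000))
print(same)
print(shifted)
```

In practice, `ref` would hold reference SDE samples and `pred` the model samples at a fixed time; the synthetic Gaussian inputs here only demonstrate that the metric separates matching from mismatched ensembles.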

4.3 Stochastic Predator-Prey Model

We then consider a stochastic Lotka-Volterra system with a time-dependent excitation u(t)𝑢𝑡u(t)italic_u ( italic_t ):

$$
\left\{\begin{array}{l}
\dot{x}_t = x_t - x_t y_t + u(t) + \sigma_1 x_t \dot{W}_1,\\
\dot{y}_t = -y_t + x_t y_t + \sigma_2 y_t \dot{W}_2,
\end{array}\right.
\tag{24}
$$

where $W_1$ and $W_2$ are independent Brownian motions, and $\sigma_1=\sigma_2=0.05$. The training data are generated by simulating 120,000 solution samples for one step $\Delta=0.01$, from initial conditions in $I_{\mathbf{x}}=[0.1,0.35]\times[0.2,5.5]$ and under excitations given by 2nd-degree Taylor polynomials whose coefficients are sampled from $[0.01,4.2]\times[-1.5,1.5]\times[-0.7,0.7]$.

Once we have the trained model, we conduct system prediction with the initial condition $x_0=2.0$, $y_0=1.0$ and excitation $u(t)=\sin(\frac{t}{3})+\cos(t)+2$. We conduct a relatively long-term prediction for time up to $T=80$. (Note that the training data are of length $0.01$.) In Figure 13, we plot a few samples of the phase portrait of the system. Good visual agreement between the sFML prediction and the ground truth can be observed. To examine the accuracy more closely, we present the mean and standard deviation of the system in Figure 14. We observe good predictive accuracy of the sFML model for up to $T=80$.
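The long-term prediction described here amounts to applying a one-step stochastic map recursively, re-parameterizing the excitation locally at every step. The sketch below illustrates that loop only. The trained sFML generator is not available, so `one_step_standin` is a hypothetical placeholder that takes an Euler-Maruyama step of (24) using the average of the local polynomial over the step; `excitation_params`, `rollout` and the coefficient ordering (value, slope, half curvature) are likewise assumptions.

```python
import numpy as np

DELTA = 0.01  # time step of the one-step model, matching the training data

def excitation_params(t):
    """Degree-2 Taylor coefficients of u(t) = sin(t/3) + cos(t) + 2 about t."""
    u = np.sin(t / 3) + np.cos(t) + 2
    du = np.cos(t / 3) / 3 - np.sin(t)
    ddu = -np.sin(t / 3) / 9 - np.cos(t)
    return np.array([u, du, 0.5 * ddu])

def one_step_standin(state, gamma, rng, sigma=0.05):
    """Placeholder for the trained generator: one Euler-Maruyama step of (24)."""
    x, y = state[:, 0], state[:, 1]
    # Average of the local polynomial c0 + c1*tau + c2*tau^2 over [0, DELTA].
    u = gamma[0] + gamma[1] * DELTA / 2 + gamma[2] * DELTA**2 / 3
    dw = rng.normal(0.0, np.sqrt(DELTA), size=state.shape)
    x_new = x + (x - x * y + u) * DELTA + sigma * x * dw[:, 0]
    y_new = y + (-y + x * y) * DELTA + sigma * y * dw[:, 1]
    return np.column_stack([x_new, y_new])

def rollout(x0, t_end, n_paths, step=one_step_standin, seed=0):
    """Apply the one-step map recursively; return ensemble mean and STD per step."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_end / DELTA))
    state = np.tile(np.asarray(x0, float), (n_paths, 1))
    mean = np.empty((n_steps + 1, 2))
    std = np.empty((n_steps + 1, 2))
    mean[0], std[0] = state.mean(0), state.std(0)
    for n in range(n_steps):
        gamma = excitation_params(n * DELTA)   # local parameters at t_n
        state = step(state, gamma, rng)
        mean[n + 1], std[n + 1] = state.mean(0), state.std(0)
    return mean, std

mean, std = rollout([2.0, 1.0], t_end=5.0, n_paths=500)
print(mean[-1], std[-1])
```

Replacing `one_step_standin` with a learned generator of the same signature would leave the loop unchanged, which is why one-step training data can in principle support predictions over much longer horizons.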

Figure 13: Phase portrait samples of the Lotka-Volterra system 4.3 with initial condition $x_0=2.0$, $y_0=1.0$ and $u(t)=\sin(\frac{t}{3})+\cos(t)+2$.
Figure 14: Mean and standard deviation (STD) of Example 4.3 with initial condition $x_0=2.0$, $y_0=1.0$ and $u(t)=\sin(\frac{t}{3})+\cos(t)+2$. Upper: $x$; Lower: $y$.

4.4 Stochastic Resonance

Finally, we consider the following SDE with a double-well potential and excitation,

$$
dx_t=\left[x_t-x_t^3+u(t)\right]dt+\sigma\, dW_t,
\tag{25}
$$

where $\sigma=0.25$ is a parameter and $u(t)$ is the excitation. When $u(t)\equiv 0$ (amplitude $V=0$ in the periodic excitation used below), there is no excitation to the system, and the solution exhibits random transitions between the two metastable states $x=-1$ and $x=1$. The transition probability depends on the parameter $\sigma$. When $V\neq 0$, an excitation is exerted on the system. If the excitation is periodic, then under the right circumstances the random transitions between the two metastable states become synchronized with the period of the excitation, resulting in the so-called stochastic resonance; cf. [4, 2, 3].
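To give a sense of scale for the unforced transitions, the following sketch evaluates the classical Kramers escape-rate estimate for the double-well potential $U(x)=x^4/4-x^2/2$, whose gradient gives the drift $x-x^3$ in (25). This estimate is not part of the paper; it is a standard small-noise asymptotic formula, used here only as a back-of-the-envelope check, and the function name `kramers_rate` is ours.

```python
import math

def kramers_rate(sigma, barrier=0.25, curv_well=2.0, curv_top=1.0):
    """Kramers escape rate (per unit time) for dx = -U'(x) dt + sigma dW.

    Defaults describe U(x) = x**4/4 - x**2/2: barrier U(0) - U(+-1) = 1/4,
    U''(+-1) = 2 and |U''(0)| = 1.
    """
    noise = 0.5 * sigma**2                      # D in dx = -U' dt + sqrt(2 D) dW
    prefactor = math.sqrt(curv_well * curv_top) / (2.0 * math.pi)
    return prefactor * math.exp(-barrier / noise)

rate = kramers_rate(0.25)
print(f"escape rate per unit time: {rate:.2e}")
print(f"mean residence time:       {1.0 / rate:.0f}")
```

With $\sigma=0.25$ the estimated rate is small, so transitions are rare on the time scale of a single step $\Delta=0.01$, and the strong dependence on $\sigma$ appears through the exponential factor.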

Here, we demonstrate that the proposed sFML method can accurately model and predict the long-term system behavior using only very short bursts of measurement data. Our data are 30,000 trajectories of one-step length ($\Delta=0.01$), with initial conditions sampled from $I_{\mathbf{x}}=[-1.6,1.6]$ and under piecewise constant excitations sampled from $[-0.13,0.13]$.

Once the sFML model is trained, we conduct system prediction under various excitations. In particular, we choose $u(t)=V\cos(\omega t)$, with $V=0.12$ and $\omega=0.001$. These parameters are chosen according to [4] to ensure the occurrence of stochastic resonance. An exceptionally long-term system prediction is conducted by the sFML model, for time up to $T=40{,}000$. The result is shown in the top of Figure 15, where we also plot the (rescaled) periodic excitation as a light grey line in the background. We can clearly observe the synchronization between the random transitions and the periodic excitation, i.e., the stochastic resonance. For reference, we also conduct the sFML system prediction with $V=0$, i.e., no excitation. The solution, shown in the bottom of Figure 15, exhibits the expected random transitions between the two metastable states. We emphasize that in this case the transition probability is very small, $O(10^{-5})$. The learned sFML model is capable of capturing such a small-probability event. We remark again that the training data are pairwise data separated by one time step. Thus, none of the (long-term) system behaviors can be observed in the training data.
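For orientation, a reference path of (25) under the periodic excitation can be generated directly from the SDE and its well-to-well switches counted, as sketched below. This is a plain Euler-Maruyama simulation of the known equation, not the sFML model; the step size, the hysteresis threshold used to detect a switch, the shortened horizon and the function names are illustrative assumptions.

```python
import numpy as np

def simulate_double_well(t_end, V, omega=0.001, sigma=0.25, dt=0.01, x0=1.0, seed=0):
    """Euler-Maruyama path of dx = [x - x^3 + V cos(omega t)] dt + sigma dW."""
    rng = np.random.default_rng(seed)
    n = int(round(t_end / dt))
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n)
    forcing = V * np.cos(omega * dt * np.arange(n))
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        x[k + 1] = x[k] + (x[k] - x[k]**3 + forcing[k]) * dt + noise[k]
    return x

def count_transitions(x, threshold=0.5):
    """Count well-to-well switches using a hysteresis band (-threshold, threshold)."""
    well = 1 if x[0] > 0 else -1
    switches = 0
    for v in x:
        if well == 1 and v < -threshold:
            well, switches = -1, switches + 1
        elif well == -1 and v > threshold:
            well, switches = 1, switches + 1
    return switches

# Shortened horizon for a quick run; the text uses T = 40,000.
path = simulate_double_well(t_end=2000.0, V=0.12)
print("switches up to T = 2000:", count_transitions(path))
```

The hysteresis band avoids counting small noisy excursions around $x=0$ as separate transitions; comparing switch times against the sign of $\cos(\omega t)$ over a longer run is one simple way to examine the synchronization described above.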

Figure 15: Sample trajectories of Example 4.4 with initial condition $x_0=1.0$. Top: With the periodic excitation $u(t)=V\cos(\omega t)$, $V=0.12$ and $\omega=0.001$. The system exhibits stochastic resonance. Bottom: No-excitation case with $u(t)=0$. The system exhibits random transitions with very small probability $O(10^{-5})$.

5 Conclusion

In this paper, we presented a general numerical framework for modeling unknown nonautonomous stochastic systems by using observed trajectory data. To overcome the difficulties brought by the external time-dependent inputs, we transform the original system into a local parametric stochastic system. We accomplish this by locally parameterizing the time-dependent external inputs at discrete time points. The resulting stochastic system is then driven by a stationary parametric stochastic flow map, and a normalizing flow model is devised to approximate this map. Using a comprehensive set of numerical examples, we demonstrated that the proposed approach is effective and accurate in modeling a variety of unknown stochastic systems. The learned model can conduct exceptionally long-term system predictions subject to arbitrary external excitations that are not contained in the training data.

References

  • [1] C. Archambeau, D. Cornford, M. Opper, and J. Shawe-Taylor, Gaussian process approximations of stochastic differential equations, in Gaussian Processes in Practice, N. D. Lawrence, A. Schwaighofer, and J. Quiñonero Candela, eds., vol. 1 of Proceedings of Machine Learning Research, Bletchley Park, UK, 12–13 Jun 2007, PMLR, pp. 1–16, https://proceedings.mlr.press/v1/archambeau07a.html.
  • [2] R. Benzi, G. Parisi, A. Sutera, and A. Vulpiani, Stochastic resonance in climatic change, Tellus, 34 (1982), pp. 10–16.
  • [3] R. Benzi, G. Parisi, A. Sutera, and A. Vulpiani, A theory of stochastic resonance in climatic change, SIAM J. Appl. Math., 43 (1983), pp. 565–578, https://doi.org/10.1137/0143037.
  • [4] R. Benzi, A. Sutera, and A. Vulpiani, The mechanism of stochastic resonance, J. Phys. A, 14 (1981), pp. L453–L457, http://stacks.iop.org/0305-4470/14/L453.
  • [5] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad. Sci. USA, 113 (2016), pp. 3932–3937, https://doi.org/10.1073/pnas.1517384113.
  • [6] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Sparse identification of nonlinear dynamics with control (sindyc), IFAC-PapersOnLine, 49 (2016), pp. 710–715, https://doi.org/10.1016/j.ifacol.2016.10.249. 10th IFAC Symposium on Nonlinear Control Systems NOLCOS 2016.
  • [7] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, Neural ordinary differential equations, in Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, eds., vol. 31, Curran Associates, Inc., 2018, https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf.
  • [8] X. Chen, J. Duan, J. Hu, and D. Li, Data-driven method to learn the most probable transition pathway and stochastic differential equation, Phys. D, 443 (2023), pp. Paper No. 133559, 15, https://doi.org/10.1016/j.physd.2022.133559.
  • [9] X. Chen, L. Yang, J. Duan, and G. E. Karniadakis, Solving inverse stochastic problems from discrete particle observations using the Fokker-Planck equation and physics-informed neural networks, SIAM J. Sci. Comput., 43 (2021), pp. B811–B830, https://doi.org/10.1137/20M1360153.
  • [10] Y. Chen and D. Xiu, Learning stochastic dynamical system via flow map operator, J. Comput. Phys., 508 (2024), p. Paper No. 112984, https://doi.org/10.1016/j.jcp.2024.112984.
  • [11] V. Churchill and D. Xiu, Flow map learning for unknown dynamical systems: Overview, implementation, and benchmarks, Journal of Machine Learning for Modeling and Computing, 4 (2023), pp. 173–201.
  • [12] M. Darcy, B. Hamzi, G. Livieri, H. Owhadi, and P. Tavallali, One-shot learning of stochastic differential equations with data adapted kernels, Phys. D, 444 (2023), pp. Paper No. 133583, 18, https://doi.org/10.1016/j.physd.2022.133583.
  • [13] R. Deng, B. Chang, M. A. Brubaker, G. Mori, and A. Lehrmann, Modeling continuous stochastic processes with dynamic normalizing flows, in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, eds., vol. 33, Curran Associates, Inc., 2020, pp. 7805–7815, https://proceedings.neurips.cc/paper_files/paper/2020/file/58c54802a9fb9526cd0923353a34a7ae-Paper.pdf.
  • [14] F. Dietrich, A. Makeev, G. Kevrekidis, N. Evangelou, T. Bertalan, S. Reich, and I. G. Kevrekidis, Learning effective stochastic differential equations from microscopic simulations: linking stochastic numerics to deep learning, Chaos, 33 (2023), pp. Paper No. 023121, 19, https://doi.org/10.1063/5.0113632.
  • [15] L. Dinh, J. Sohl-Dickstein, and S. Bengio, Density estimation using real NVP, in International Conference on Learning Representations, 2017, https://openreview.net/forum?id=HkpbnH9lx.
  • [16] X. Fu, L.-B. Chang, and D. Xiu, Learning reduced systems via deep neural networks with memory, J. Machine Learning Model. Comput., 1 (2020), pp. 97–118.
  • [17] L. Guo, H. Wu, and T. Zhou, Normalizing field flows: Solving forward and inverse stochastic differential equations using physics-informed flow models, Journal of Computational Physics, 461 (2022), p. 111202, https://doi.org/10.1016/j.jcp.2022.111202.
  • [18] T. Haarnoja, K. Hartikainen, P. Abbeel, and S. Levine, Latent space policies for hierarchical reinforcement learning, in Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause, eds., vol. 80 of Proceedings of Machine Learning Research, PMLR, 10–15 Jul 2018, pp. 1851–1860, https://proceedings.mlr.press/v80/haarnoja18a.html.
  • [19] S. H. Kang, W. Liao, and Y. Liu, IDENT: identifying differential equations with numerical time evolution, J. Sci. Comput., 87 (2021), pp. Paper No. 1, 27, https://doi.org/10.1007/s10915-020-01404-9.
  • [20] I. Kobyzev, S. Prince, and M. Brubaker, Normalizing flows: An introduction and review of current methods, IEEE Trans. Pattern Anal. Machine Intel., 43 (2021), pp. 3964–3979.
  • [21] V. Laparra, G. Camps-Valls, and J. Malo, Iterative gaussianization: From ica to random rotations, IEEE Transactions on Neural Networks, 22 (2011), pp. 537–549, https://doi.org/10.1109/TNN.2011.2106511.
  • [22] Y. Li and J. Duan, A data-driven approach for discovering stochastic dynamical systems with non-Gaussian Lévy noise, Phys. D, 417 (2021), pp. Paper No. 132830, 12, https://doi.org/10.1016/j.physd.2020.132830.
  • [23] Y. Li, Y. Lu, S. Xu, and J. Duan, Extracting stochastic dynamical systems with $\alpha$-stable Lévy noise from data, J. Stat. Mech. Theory Exp., (2022), pp. Paper No. 023405, 23, https://doi.org/10.1088/1742-5468/ac4e87.
  • [24] Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, Fourier neural operator for parametric partial differential equations, in International Conference on Learning Representations, 2021, https://openreview.net/forum?id=c8P9NQVtmnO.
  • [25] H. Lu and D. M. Tartakovsky, Data-driven models of nonautonomous systems, J. Comput. Phys., 507 (2024), p. Paper No. 112976, https://doi.org/10.1016/j.jcp.2024.112976.
  • [26] Y. Lu, R. Maulik, T. Gao, F. Dietrich, I. G. Kevrekidis, and J. Duan, Learning the temporal evolution of multivariate densities via normalizing flows, Chaos, 32 (2022), pp. Paper No. 033121, 17, https://doi.org/10.1063/5.0065093.
  • [27] T. Müller, B. Mcwilliams, F. Rousselle, M. Gross, and J. Novák, Neural importance sampling, ACM Trans. Graph., 38 (2019), https://doi.org/10.1145/3341156.
  • [28] B. Øksendal, Stochastic differential equations, in Stochastic differential equations, Springer, 2003, pp. 65–84.
  • [29] M. Opper, Variational inference for stochastic differential equations, Ann. Phys., 531 (2019), pp. 1800233, 9, https://doi.org/10.1002/andp.201800233.
  • [30] H. Owhadi, Computational graph completion, Research in the Mathematical Sciences, 9 (2022), p. 27.
  • [31] G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mohamed, and B. Lakshminarayanan, Normalizing flows for probabilistic modeling and inference, J. Mach. Learn. Res., 22 (2021), pp. Paper No. 57, 64.
  • [32] G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mohamed, and B. Lakshminarayanan, Normalizing flows for probabilistic modeling and inference, J. Machine Learning Res., 22 (2021), pp. 1–64.
  • [33] G. Papamakarios, T. Pavlakou, and I. Murray, Masked autoregressive flow for density estimation, in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds., vol. 30, Curran Associates, Inc., 2017, https://proceedings.neurips.cc/paper_files/paper/2017/file/6c1da886822c67822bcf3679d04369fa-Paper.pdf.
  • [34] J. L. Proctor, S. L. Brunton, and J. N. Kutz, Dynamic mode decomposition with control, SIAM J. Appl. Dyn. Syst., 15 (2016), pp. 142–161, https://doi.org/10.1137/15M1013857.
  • [35] J. L. Proctor, S. L. Brunton, and J. N. Kutz, Generalizing Koopman theory to allow for inputs and control, SIAM J. Appl. Dyn. Syst., 17 (2018), pp. 909–930, https://doi.org/10.1137/16M1062296.
  • [36] T. Qin, Z. Chen, J. D. Jakeman, and D. Xiu, Data-driven learning of nonautonomous systems, SIAM J. Sci. Comput., 43 (2021), pp. A1607–A1624, https://doi.org/10.1137/20M1342859.
  • [37] T. Qin, Z. Chen, J. D. Jakeman, and D. Xiu, Deep learning of parameterized equations with applications to uncertainty quantification, Int. J. Uncertain. Quantif., 11 (2021), pp. 63–82, https://doi.org/10.1615/Int.J.UncertaintyQuantification.2020034123.
  • [38] T. Qin, K. Wu, and D. Xiu, Data driven governing equations approximation using deep neural networks, J. Comput. Phys., 395 (2019), pp. 620–635, https://doi.org/10.1016/j.jcp.2019.06.042.
  • [39] M. Raissi, P. Perdikaris, and G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics, 378 (2019), pp. 686–707, https://doi.org/10.1016/j.jcp.2018.10.045.
  • [40] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Multistep neural networks for data-driven discovery of nonlinear dynamical systems, arXiv preprint arXiv:1801.01236, (2018).
  • [41] H. Schaeffer and S. G. McCalla, Sparse model selection via integral terms, Phys. Rev. E, 96 (2017), pp. 023302, 7, https://doi.org/10.1103/physreve.96.023302.
  • [42] H. Schaeffer, G. Tran, and R. Ward, Extracting sparse high-dimensional dynamics from limited data, SIAM J. Appl. Math., 78 (2018), pp. 3279–3295, https://doi.org/10.1137/18M116798X.
  • [43] J. Song, S. Zhao, and S. Ermon, A-nice-mc: Adversarial training for mcmc, in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds., vol. 30, Curran Associates, Inc., 2017, https://proceedings.neurips.cc/paper_files/paper/2017/file/2417dc8af8570f274e6775d4d60496da-Paper.pdf.
  • [44] Y. Wang, H. Fang, J. **, G. Ma, X. He, X. Dai, Z. Yue, C. Cheng, H.-T. Zhang, D. Pu, D. Wu, Y. Yuan, J. Gonçalves, J. Kurths, and H. Ding, Data-driven discovery of stochastic differential equations, Engineering, 17 (2022), pp. 244–252, https://doi.org/10.1016/j.eng.2022.02.007.
  • [45] E. Weinan, Principles of multiscale modeling, Cambridge University Press, 2011.
  • [46] Z. Xu, Y. Chen, Q. Chen, and D. Xiu, Modeling unknown stochastic dynamical system via autoencoder, arXiv preprint arXiv:2312.10001, (2023).
  • [47] L. Yang, C. Daskalakis, and G. E. Karniadakis, Generative ensemble regression: Learning particle dynamics from observations of ensembles with physics-informed deep generative models, SIAM Journal on Scientific Computing, 44 (2022), pp. B80–B99, https://doi.org/10.1137/21M1413018.
  • [48] C. Yildiz, M. Heinonen, J. Intosalmi, H. Mannerstrom, and H. Lahdesmaki, Learning stochastic differential equations with gaussian processes without gradient matching, in 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), IEEE, 2018, pp. 1–6.
  • [49] J. Zhang, S. Zhang, and G. Lin, Multiauto-deeponet: A multi-resolution autoencoder deeponet for nonlinear dimension reduction, uncertainty quantification and operator learning of forward and inverse stochastic problems, arXiv preprint arXiv:2204.03193, (2022).
  • [50] A. Zhu and Q. Li, Dyngma: a robust approach for learning stochastic differential equations from data, arXiv preprint arXiv:2402.14475, (2024).