
Modeling Unknown Stochastic Dynamical System Subject to External Excitation

Yuan Chen and Dongbin Xiu. E-mail addresses: {chen.11050, xiu.16}@osu.edu. Department of Mathematics, The Ohio State University, Columbus, OH 43210, USA. Funding: This work was partially supported by AFOSR FA9550-22-1-0011.

Abstract

We present a numerical method for learning unknown nonautonomous stochastic dynamical systems, i.e., stochastic systems subject to time-dependent excitation or control signals. Our basic assumption is that the governing equations of the stochastic system are unavailable. However, short bursts of input/output (I/O) data, consisting of certain known excitation signals and their corresponding system responses, are available. When a sufficient amount of such I/O data is available, our method is capable of learning the unknown dynamics and producing an accurate predictive model for the stochastic responses of the system subject to arbitrary excitation signals not in the training data. Our method has two key components: (1) a local approximation of the training I/O data to transfer the learning into a parameterized form; and (2) a generative model to approximate the underlying unknown stochastic flow map in distribution. After presenting the method in detail, we present a comprehensive set of numerical examples to demonstrate the performance of the proposed method, especially for long-term system predictions.

Keywords: Data-driven modeling, stochastic dynamical systems, deep neural networks, nonautonomous system

MSC codes: 60H10, 60H35, 62M45, 65C30

1 Introduction

There has been a growing interest in recovering/discovering unknown dynamical systems from observational data. Most of the existing studies focus on deterministic systems, with methods such as physics-informed neural networks (PINNs) ([39, 40]), SINDy ([5]), Fourier neural operator (FNO) ([24]), computational graph completion ([30]), sparsity promoting methods ([41, 42, 19]), flow map learning (FML) ([38, 11]), to name a few.

Learning unknown stochastic systems is notably more challenging, as the stochastic noise in the system usually cannot be directly observed. Existing work utilizes Gaussian processes ([48, 1, 12, 29]), polynomial approximations ([44, 22]), deep neural networks (DNNs) ([8, 47, 9, 49, 14, 50]), etc. More recently, a stochastic extension of the deterministic flow map learning (FML) approach ([38, 11]) was proposed. It employs generative models such as GANs (generative adversarial networks) ([10]) or autoencoders ([46]) to model the underlying stochasticity. However, most, if not all, of these methods are developed for autonomous systems, where time-invariance (in distribution) holds true and is critical to the method development.

The focus and contribution of this paper is the learning and modeling of unknown non-autonomous stochastic systems. More specifically, we consider SDEs with unknown governing equations that are subject to time-dependent external excitation or control signals. Our goal is to develop a method that can capture the stochastic dynamics of the unknown systems by using short-term data consisting of input/output (I/O) relations between excitation signals and their corresponding system responses. We remark that there exist some studies on modeling deterministic non-autonomous systems, using methodologies such as Dynamic Mode Decomposition (DMD) ([25, 34]), SINDy ([6]), Koopman operator ([35]), FML ([36]), etc. These methods are not applicable to stochastic non-autonomous systems.

The proposed method in this paper has two key components. First, the method utilizes the observational I/O data to construct an accurate representation of the unknown stochastic dynamics of the system. This is accomplished by a generative model that learns the stochastic mapping of the system between two consecutive discrete time steps. The learning of this stochastic flow map is similar to the work of [10, 46], which extended the deterministic FML to stochastic systems. While [10, 46] utilized GANs and autoencoders as the generative models, in this paper we employ conditional normalizing flows (cf. [31]). Normalizing flows have been widely adopted as probabilistic models for generating data with desired distributions. Their applications include image and video generation [21], statistical inference and sampling [27, 43], reinforcement learning [18], as well as scientific computing [26, 23, 17, 13].

The second key component of the proposed method is a local parameterization of the excitation signal in the training I/O data. The method was first introduced in [36] for deterministic non-autonomous systems. We adopt a similar idea and extend it to stochastic systems. The approach seeks to parameterize the excitation signals in the training data via a localized polynomial over one time step. This transforms the learning problem into a parametric learning problem between the coefficients of the local polynomials and the system responses. This is a critical component, as it allows the learned system to conduct long-term system predictions under arbitrary excitation signals that are never seen in the training data. Although the proposed method requires a large number of short data bursts, the overall demand for data may not be as large, because the burst length of the training I/O data is as short as two time steps. Once trained, the learned model is able to simulate the unknown stochastic system over very long time horizons and subject to arbitrary excitation/control signals. We demonstrate this important property in several of our numerical examples. The learning is performed using training I/O data observed over only $O(1)$ nondimensional time units. However, the predictions by the learned system can be accurate for time units as large as $O(10^{4})$ and beyond, and under excitation signals not in the training data.

2 Setup

Let $\Omega$ be an event space and $T$ a finite time horizon. We consider a $d$ -dimensional ( $d\geq 1$ ) stochastic process $\mathbf{x}(\omega,t):\Omega\times[0,T]\mapsto\mathbb{R}^{d}$ driven by an unknown (non-autonomous) stochastic differential equation (SDE) subject to external inputs

(1)

d\mathbf{x}_{t}=\mathbf{a}(\mathbf{x}_{t},\mu(t))dt+\mathbf{b}(\mathbf{x}_{t},\nu(t))d\mathbf{W}_{t},

where $\mathbf{W}_{t}$ is an $m$-dimensional ($m\geq 1$) Brownian motion, $\mathbf{a}:\mathbb{R}^{d}\times\mathbb{R}\to\mathbb{R}^{d}$ is the drift function, $\mathbf{b}:\mathbb{R}^{d}\times\mathbb{R}\to\mathbb{R}^{d\times m}$ is the diffusion function, and $\mu(t)$ and $\nu(t)$ are time-dependent external inputs into the stochastic system. In practice, the inputs can be external excitation signals or control signals. Throughout this paper, we will generally refer to them as excitations and denote $\mathbf{u}(t)=(\mu(t),\nu(t))^{T}$ to consolidate the notation.

Our basic assumption is that the SDE is unknown, in the sense that the functions $\mathbf{a}$ and $\mathbf{b}$ are not known. Also, the driving Brownian motion $\mathbf{W}_{t}$ cannot be observed. However, we have input-output (I/O) time history data between the excitations $\mathbf{u}=(\mu,\nu)^{T}$, i.e., the inputs, and the system response $\mathbf{x}$, i.e., the output,

(2)

\textrm{I/O training data:}\qquad\mathbf{u}(t)\rightarrow\mathbf{x}(t).

Our goal is to construct a numerical model for the unknown system (1) such that it can produce accurate predictions of the system response $\mathbf{x}(t)$ for arbitrarily given excitations $\mathbf{u}(t)$ that are not observed in the training data (2).

2.1 Problem Statement

The method presented in this paper is based on a discrete-time setting. Let $t_{0}<t_{1}<\cdots$ be discrete time points. For simplicity, we assume the time steps are of uniform length $\Delta=t_{i+1}-t_{i}$, $\forall i\geq 0$. Suppose we observe $N_{T}\geq 1$ I/O sequences of solution responses subject to input excitations: for $i=1,\dots,N_{T}$,

(3)

\left(\mathbf{u}\left(t_{0}^{(i)}\right),\mathbf{u}\left(t_{1}^{(i)}\right),\cdots,\mathbf{u}\left(t_{L_{i}}^{(i)}\right)\right)\rightarrow\left(\mathbf{x}\left(t_{0}^{(i)}\right),\mathbf{x}\left(t_{1}^{(i)}\right),\cdots,\mathbf{x}\left(t_{L_{i}}^{(i)}\right)\right),

where $(L_{i}+1)$ is the length of the $i$ -th observation sequence. Note that each sequence of the I/O data can cover different time spans. Also, one may have more information about the excitation $\mathbf{u}(t)$ beyond its point values. For example, the analytical form of $\mathbf{u}(t)$ may be known within the time interval $[t_{0}^{(i)},t_{L_{i}}^{(i)}]$ for some sequences.

The objective is to construct a numerical model to predict the system response of (1) subject to arbitrary excitations. More specifically, given an initial condition $\mathbf{x}_{0}$ and excitation signal $\mathbf{u}(t)$ that is not in the training I/O data (3), we require the model prediction $\widehat{\mathbf{x}}$ to approximate the true system response $\mathbf{x}$ , i.e.,

(4)

\widehat{\mathbf{x}}(t_{n};\mathbf{x}_{0},\mathbf{u}(t))\stackrel{d}{\approx}\mathbf{x}(t_{n};\mathbf{x}_{0},\mathbf{u}(t)),\qquad n=1,\dots,

where $\stackrel{d}{\approx}$ stands for approximation in distribution. Note that since the stochastic driving term $\mathbf{W}_{t}$ in general cannot be directly observed, a weak approximation, such as approximation in distribution, is typically the most one can achieve from a mathematical point of view.

2.2 Related Work and Contribution

The method developed in this paper builds on two recent lines of work: flow map learning (FML) for modeling unknown deterministic dynamical systems, and its extension to modeling stochastic dynamical systems.

For an unknown deterministic autonomous system $\frac{d\mathbf{x}}{dt}=\mathbf{f}(\mathbf{x})$, $\mathbf{x}\in\mathbb{R}^{d}$, where $\mathbf{f}:\mathbb{R}^{d}\to\mathbb{R}^{d}$ is unknown, the FML method seeks to approximate the unknown flow map $\mathbf{x}_{n}=\mathbf{\Phi}_{t_{n}-t_{s}}(\mathbf{x}_{s})$ by using observation data. More specifically, by using data on $\mathbf{x}$ over one time step $\Delta t$, the FML method constructs a model

\mathbf{x}_{n+1}=\widetilde{\mathbf{\Phi}}_{\Delta t}(\mathbf{x}_{n}),

where $\widetilde{\mathbf{\Phi}}_{\Delta t}\approx{\mathbf{\Phi}}_{\Delta t}$ is a numerical approximation of the true flow map over one time step $\Delta t$. Once constructed, the FML model can be used as a time marching scheme to predict the system response under a given initial condition. This framework was proposed in [38], with extensions to partially observed systems [16], parametric systems [37], as well as non-autonomous deterministic systems [36].

For learning an autonomous stochastic system $\frac{d\mathbf{x}}{dt}=\mathbf{f}(\mathbf{x},\omega(t))$, where $\omega(t)$ represents an unknown stochastic process driving the system, the work of [10] developed stochastic flow map learning (sFML). Assuming the system satisfies the time-homogeneous property ([28]) $\mathbb{P}(\mathbf{x}_{s+\Delta t}|\mathbf{x}_{s})=\mathbb{P}(\mathbf{x}_{\Delta t}|\mathbf{x}_{0})$, $s\geq 0$, the method uses the observation data on the state variable $\mathbf{x}$ to construct a one-step generative model

\mathbf{x}_{n+1}=\mathbf{G}_{\Delta t}(\mathbf{x}_{n};\mathbf{z}),

where $\mathbf{z}$ is a random variable with known distribution (e.g., standard Gaussian). The function $\mathbf{G}$, termed the stochastic flow map, approximates the conditional distribution, $\mathbf{G}_{\Delta t}(\mathbf{x}_{s};\mathbf{z})\approx\mathbb{P}(\mathbf{x}_{s+\Delta t}|\mathbf{x}_{s})$. Consequently, the sFML model becomes a weak approximation, in distribution, to the true stochastic dynamics. Different generative models can be employed under the sFML framework. For example, generative adversarial networks (GANs) are used in [10], and an autoencoder is employed in [46].

The primary contribution of this paper is the development of a data-driven modeling method for unknown stochastic systems subject to external excitations. To accomplish this, we extend the sFML framework ([10]), which was developed for autonomous systems, to non-autonomous stochastic systems. To learn the system I/O responses, we employ the local parameterization technique developed for non-autonomous deterministic systems ([36]). The method parameterizes the input excitations in the data and transforms the learning problem into learning a parametric dynamical system. For the stochastic non-autonomous systems considered in this paper, we incorporate the method into a generative model in the sFML framework. In particular, we use normalizing flow as the generative model, which has not been considered in stochastic dynamical system learning. We shall demonstrate that the newly developed method is highly effective in modeling unknown stochastic systems, even when the excitations are not present in the training data.

3 Methodology

In this section, we describe the proposed learning method in detail.

3.1 Parameterization of Inputs

Consider the unknown SDE (1) over a time interval $[t_{n},t_{n+1}]$ , $n\geq 0$ ,

(5)

\mathbf{x}(t_{n+1})=\mathbf{x}(t_{n})+\int_{t_{n}}^{t_{n+1}}\mathbf{a}(\mathbf{x}(s),\mu(s))ds+\int_{t_{n}}^{t_{n+1}}\mathbf{b}(\mathbf{x}(s),\nu(s))d\mathbf{W}(s),

which can be written equivalently as

(6)

\begin{split}\mathbf{x}(t_{n+1})=\mathbf{x}(t_{n})&+\int_{0}^{\Delta}\mathbf{a}(\mathbf{x}(t_{n}+\tau),\mu(t_{n}+\tau))d\tau\\ &+\int_{0}^{\Delta}\mathbf{b}(\mathbf{x}(t_{n}+\tau),\nu(t_{n}+\tau))d\mathbf{W}(t_{n}+\tau).\end{split}

By using the compact notation $\mathbf{u}(t)=(\mu(t),\nu(t))^{T}$ , we now consider the excitation $\mathbf{u}(t)$ in the time interval $[t_{n},t_{n+1}]$ . Given the information of the excitation in the training data (3), we construct a parameterized form

(7)

\mathbf{u}(t)|_{[t_{n},t_{n+1})}\approx\widetilde{\mathbf{u}}(\tau;\mathbf{\Gamma}_{n})=\sum_{k=1}^{m}\boldsymbol{\alpha}_{n}^{k}\,p_{k}(\tau),\qquad\tau\in[0,\Delta),

where $\{p_{k},k=1,\dots,m\}$ is a set of prescribed analytical basis functions and

(8)

\mathbf{\Gamma}_{n}=\{\boldsymbol{\alpha}_{n}^{1},\dots,\boldsymbol{\alpha}_{n}^{m}\}\in\mathbb{R}^{n_{\Gamma}},

are the expansion coefficients. In principle, one can choose any suitable basis functions. Since the time interval $[t_{n},t_{n+1}]$ usually has a (very) small step size $\Delta$, it suffices to use low-order polynomials. In fact, low-degree monomial bases, $p_{k}(\tau)=\tau^{k-1}$, $k\geq 1$, would be sufficient for most problems. With a single term ($m=1$, polynomial degree $0$), the parameterization takes the form of a piecewise constant function; with two terms ($m=2$, degree $1$), a piecewise linear function.

The local parameterization of $\mathbf{u}$ is carried out based on the information one has about the excitations. If the excitations are only known at the discrete time instances, as shown in (3), then it is natural to utilize a piecewise linear polynomial,

\widetilde{\mathbf{u}}(\tau;\mathbf{\Gamma}_{n})=\mathbf{u}(t_{n})+\frac{\tau}{\Delta}(\mathbf{u}(t_{n+1})-\mathbf{u}(t_{n})).

If more information about $\mathbf{u}(t)$ is available, one can construct a higher degree polynomial. Note that since the time step $\Delta$ is usually small, a quadratic polynomial $\widetilde{\mathbf{u}}$ can be highly accurate in any time interval $[t_{n},t_{n+1})$ . We remark that in the representation, only the values of the excitations $\mathbf{u}$ at $t_{n}$ and $t_{n+1}$ are needed. The values of the time $t_{n}$ and $t_{n+1}$ are not required.
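The piecewise linear case can be sketched in a few lines of Python. This is a minimal illustration, assuming the monomial basis $\{1,\tau\}$ from (7) and a coefficient layout $[\boldsymbol{\alpha}^{1},\boldsymbol{\alpha}^{2}]$; the function names and the two example signals are hypothetical choices made here, not part of the method's specification.

```python
import numpy as np

def linear_local_params(u_n, u_np1, delta):
    """Gamma_n for u(t_n) + (tau/delta) * (u(t_{n+1}) - u(t_n)) in the basis {1, tau}."""
    u_n = np.atleast_1d(np.asarray(u_n, dtype=float))
    u_np1 = np.atleast_1d(np.asarray(u_np1, dtype=float))
    alpha1 = u_n                      # coefficient of p_1(tau) = 1
    alpha2 = (u_np1 - u_n) / delta    # coefficient of p_2(tau) = tau
    return np.concatenate([alpha1, alpha2])

def eval_local_poly(gamma, tau, n_inputs):
    """Evaluate sum_k alpha^k * tau^(k-1) for coefficients stored as [alpha^1, alpha^2, ...]."""
    coeffs = np.asarray(gamma, dtype=float).reshape(-1, n_inputs)  # row k-1 holds alpha^k
    powers = tau ** np.arange(coeffs.shape[0])                      # [1, tau, tau^2, ...]
    return powers @ coeffs

# Hypothetical example signals u(t) = (mu(t), nu(t)); only point values are used.
def u(t):
    return np.array([np.sin(6.0 * t), 0.1 * np.exp(np.cos(np.pi * t))])

delta = 0.01
t_n = 0.3
gamma_n = linear_local_params(u(t_n), u(t_n + delta), delta)

# Interpolation error at the midpoint of the step; it scales like delta**2.
mid_error = np.max(np.abs(eval_local_poly(gamma_n, delta / 2, 2) - u(t_n + delta / 2)))
```

Note that `t_n` enters only through the sampled values of `u`, in line with the remark that the time values themselves are not required.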

3.2 Parametric Stochastic Flow Map

By replacing $\mathbf{u}$ by the local polynomial $\widetilde{\mathbf{u}}$ (7), we transform the system (6) into

(9)

\begin{split}\widetilde{\mathbf{x}}(t_{n+1})=\widetilde{\mathbf{x}}(t_{n})&+\int_{0}^{\Delta}\mathbf{a}(\widetilde{\mathbf{x}}(t_{n}+\tau),\widetilde{\mu}\left(\tau;\mathbf{\Gamma}_{n}\right))d\tau\\ &+\int_{0}^{\Delta}\mathbf{b}(\widetilde{\mathbf{x}}(t_{n}+\tau),\widetilde{\nu}\left(\tau;\mathbf{\Gamma}_{n}\right))d\mathbf{W}(t_{n}+\tau),\end{split}

where the excitation signals $\mathbf{u}=(\mu,\nu)^{T}$ have been parameterized by $\widetilde{\mathbf{u}}$ via a set of parameters $\mathbf{\Gamma}_{n}$. Compared to (6), the transformed system (9) contains a possible numerical error introduced by the parameterization of the excitations over the time domain $[0,\Delta)$. The error can be made arbitrarily small if one uses higher degree polynomials when $\Delta$ is sufficiently small.

By using subscript to denote the time level and letting

\widetilde{\mathbf{x}}(t_{n})=\widetilde{\mathbf{x}}_{n},\qquad d\mathbf{W}_{n}(\tau)=d\mathbf{W}(t_{n}+\tau),

the parameterized system (9) indicates that there exists a mapping

(10)

\widetilde{\mathbf{x}}_{n+1}=\mathbf{G}_{\Delta}(\widetilde{\mathbf{x}}_{n},d\mathbf{W}_{n}(\Delta);\boldsymbol{\Gamma}_{n}),

where $\mathbf{G}_{\Delta}$ is what we shall call the parametric stochastic flow map, which is parameterized by $\mathbf{\Gamma}_{n}$. It is an unknown operator, as the functions $\mathbf{a}$ and $\mathbf{b}$ are unknown in the original system (1).

Remark 3.1.

It is important to recognize that for Brownian motion $\mathbf{W}(t)$, or in general for Lévy processes (càdlàg stochastic processes with stationary independent increments), the increment process $d\mathbf{W}_{n}(\tau)$ is stationary and independent of $\mathbf{W}(t_{n})$. Therefore, only the time difference $\Delta=t_{n+1}-t_{n}$ matters, and the values of $t_{n}$ and $t_{n+1}$ do not. Consequently, we have suppressed the time variables $t_{n}$ and $t_{n+1}$ in (10).

3.3 Stochastic Flow Map Learning

In this section, we describe our main method of stochastic flow map learning (sFML), which constructs a generative model to approximate the stochastic flow map (10) by using the trajectory data (3).

3.3.1 Training Data

To construct the training data set, we reorganize the original data set (3) into pairs of consecutive time instances. Since the $i$-th trajectory, $i=1,\dots,N_{T}$, yields $L_{i}$ such pairs, there are a total of $M=L_{1}+\cdots+L_{N_{T}}$ I/O data pairs from the data set (3):

(11)

\left(\mathbf{u}\left(t_{k}^{(j)}\right),\mathbf{u}\left(t_{k+1}^{(j)}\right)\right)\rightarrow\left(\mathbf{x}\left(t_{k}^{(j)}\right),\mathbf{x}\left(t_{k+1}^{(j)}\right)\right),\qquad j=1,\dots,M.

Next, we perform the local parameterization procedure from Section 3.1 to each pair of the input data $\left(\mathbf{u}\left(t_{k}^{(j)}\right),\mathbf{u}\left(t_{k+1}^{(j)}\right)\right)$ and obtain its parameterization $\mathbf{\Gamma}_{k}^{(j)}$ , $j=1,\dots,M$ , in the form of (7). We now have

(12)

\mathbf{\Gamma}_{k}^{(j)}\rightarrow\left(\mathbf{x}\left(t_{k}^{(j)}\right),\mathbf{x}\left(t_{k+1}^{(j)}\right)\right),\qquad j=1,\dots,M.

Since the values of the time variables do not matter, see Remark 3.1, we again suppress the time variables and write our training data set as

(13)

\mathcal{S}_{M}=\left\{\mathbf{\Gamma}_{0}^{(j)};\left(\mathbf{x}_{0}^{(j)},\mathbf{x}_{1}^{(j)}\right)\right\}_{j=1}^{M},

where $M$ is the total number of the parametric data pairs. In this way, each $j$ -th entry of the data set is a trajectory of length two over one time step $\Delta$ , starting with its “initial condition” at $\mathbf{x}_{0}^{(j)}$ , ending one step later at $\mathbf{x}_{1}^{(j)}$ , and driven by a known excitation parameterized by $\boldsymbol{\mathbf{\Gamma}}_{0}^{(j)}$ and an unknown stationary stochastic process $d\mathbf{W}_{n}(\Delta)$ .
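The reorganization from trajectories (3) to the one-step data set (13) can be sketched as follows. This is an illustrative sketch only: it assumes the piecewise linear parameterization of Section 3.1, and the helper name `build_pairs` and the random placeholder sequences are hypothetical stand-ins for real observations.

```python
import numpy as np

def build_pairs(u_seqs, x_seqs, delta):
    """Turn trajectories of length L_i + 1 into one-step samples (Gamma_0, x_0, x_1)."""
    gammas, x0s, x1s = [], [], []
    for u, x in zip(u_seqs, x_seqs):
        u = np.asarray(u, dtype=float)   # shape (L_i + 1, n_u)
        x = np.asarray(x, dtype=float)   # shape (L_i + 1, d)
        slopes = (u[1:] - u[:-1]) / delta
        gammas.append(np.hstack([u[:-1], slopes]))  # piecewise linear Gamma for each step
        x0s.append(x[:-1])                          # state at the start of each step
        x1s.append(x[1:])                           # state one step later
    return np.vstack(gammas), np.vstack(x0s), np.vstack(x1s)

# Placeholder data: two trajectories with L_1 = 3 and L_2 = 5, n_u = 2 inputs, d = 1.
rng = np.random.default_rng(0)
lengths = [3, 5]
u_seqs = [rng.normal(size=(L + 1, 2)) for L in lengths]
x_seqs = [rng.normal(size=(L + 1, 1)) for L in lengths]

Gamma0, X0, X1 = build_pairs(u_seqs, x_seqs, delta=0.01)  # M = 3 + 5 = 8 rows each
```

The time stamps are dropped on purpose: after this step each row is treated as an independent length-two trajectory.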

3.3.2 Generative Model

In stochastic flow map learning (sFML), we seek to approximate the parametric stochastic flow map (10) via a recursive generative model in the form of

(14)

\widehat{\mathbf{x}}_{n+1}=\widehat{\mathbf{G}}_{\Delta}(\widehat{\mathbf{x}}_{n},\mathbf{z}_{n};\mathbf{\Gamma}_{n}),

where $\mathbf{z}_{n}\in\mathbb{R}^{n_{z}}$ is a random variable of known distribution. Again, since the stationary stochastic process $d\mathbf{W}_{n}(\Delta)$ in (10) is not observed, the constructed sFML model (14) is expected to be a weak approximation of (10), and in this particular case, an approximation in distribution.

In order to construct the sFML model (14), we execute the model for one time step over $\Delta$ ,

(15)

\widehat{\mathbf{x}}_{1}=\widehat{\mathbf{G}}_{\Delta}(\widehat{\mathbf{x}}_{0},\mathbf{z}_{0};\mathbf{\Gamma}_{0}),

and utilize the training data set (13) to learn the unknown operator $\widehat{\mathbf{G}}$. Note that the random variable $\mathbf{z}_{0}$ is not in the training data set. In practice, one chooses $\mathbf{z}_{0}$ with a known distribution, typically a standard Gaussian, and a specified dimension $n_{z}\geq 1$. The presence of the random variable $\mathbf{z}_{0}$ enables (15) to be a stochastic generative model that can produce random realizations. Several methods exist to construct stochastic generative models, e.g., GANs, diffusion models, normalizing flows, autoencoders, etc. In this paper, we adopt normalizing flow for (15).

3.3.3 Normalizing Flow Model

Normalizing flows are generative models that produce tractable distributions to enable efficient and accurate sampling and density evaluation. A normalizing flow is a transformation of a simple probability distribution, e.g., a standard normal, into a more complex distribution by a sequence of diffeomorphisms. Let $\mathbf{Z}\in\mathbb{R}^{D}$ be a random variable with a known and tractable distribution $p_{\mathbf{Z}}$. Let $\mathbf{g}$ be a diffeomorphism, whose inverse is $\mathbf{f}=\mathbf{g}^{-1}$, and $\mathbf{Y}=\mathbf{g}(\mathbf{Z})$. Then, using the change of variables formula, one obtains the probability density of $\mathbf{Y}$:

p_{\mathbf{Y}}(\mathbf{y})=p_{\mathbf{Z}}(\mathbf{f}(\mathbf{y}))|\det\mathbf{D}\mathbf{f}(\mathbf{y})|=p_{\mathbf{Z}}(\mathbf{f}(\mathbf{y}))|\det\mathbf{D}\mathbf{g}(\mathbf{f}(\mathbf{y}))|^{-1},

where $\mathbf{D}\mathbf{f}(\mathbf{y})=\partial\mathbf{f}/\partial\mathbf{y}$ is the Jacobian of $\mathbf{f}$ and $\mathbf{D}\mathbf{g}(\mathbf{z})=\partial\mathbf{g}/\partial\mathbf{z}$ is the Jacobian of $\mathbf{g}$. When the target complex distribution $p_{\mathbf{Y}}$ is given, usually as a set of samples of $\mathbf{Y}$, one seeks $\mathbf{g}$ from a parameterized family $\mathbf{g}_{\theta}$, where the parameter $\theta$ is optimized to match the target distribution. Also, to circumvent the difficulty of constructing a complicated nonlinear function $\mathbf{g}$, one utilizes a composition of (much) simpler diffeomorphisms: $\mathbf{g}=\mathbf{g}_{m}\circ\mathbf{g}_{m-1}\circ\cdots\circ\mathbf{g}_{1}$. It can be shown that $\mathbf{g}$ remains a diffeomorphism with its inverse $\mathbf{f}=\mathbf{f}_{1}\circ\cdots\circ\mathbf{f}_{m-1}\circ\mathbf{f}_{m}$. There exists a large body of literature on normalizing flows. We refer the interested reader to the review articles [20, 32].
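The change of variables formula can be checked numerically on a small one-dimensional example. The sketch below assumes a composition of two simple diffeomorphisms chosen here purely for illustration, $g_{1}(z)=az+b$ and $g_{2}(v)=\sinh(v)$; neither the maps nor the constants come from the paper.

```python
import numpy as np

a, b = 0.5, 1.0   # illustrative constants for the affine layer g1

def g(z):
    """Forward map g = g2 o g1 with g1(z) = a*z + b and g2(v) = sinh(v)."""
    return np.sinh(a * z + b)

def f(y):
    """Inverse map f = f1 o f2 = g^{-1}."""
    return (np.arcsinh(y) - b) / a

def df(y):
    """Derivative of f (the 1D Jacobian)."""
    return 1.0 / (a * np.sqrt(1.0 + y**2))

def dg(z):
    """Derivative of g (the 1D Jacobian)."""
    return a * np.cosh(a * z + b)

def std_normal_pdf(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def p_Y(y):
    """Density of Y = g(Z), Z standard normal, via p_Z(f(y)) * |Df(y)|."""
    return std_normal_pdf(f(y)) * np.abs(df(y))

# Riemann sum of p_Y over a wide grid; it should be close to 1.
dy = 1e-3
y = np.arange(-20.0, 150.0, dy)
total_mass = np.sum(p_Y(y)) * dy

# The two forms of the Jacobian factor agree: |Df(y)| = |Dg(f(y))|^{-1}.
jacobian_gap = np.max(np.abs(np.abs(df(y)) - 1.0 / np.abs(dg(f(y)))))
```

Sampling from $p_{\mathbf{Y}}$ only needs `g`, while density evaluation only needs `f` and its Jacobian, which is why both directions are kept explicit.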

In our setting, we seek to construct the one-step generative model (15) by using the training data (13). Let $\mathbf{z}_{0}\in\mathbb{R}^{d}$ be a random variable with a known distribution. In our approach, we choose $\mathbf{z}_{0}$ to be $d$ -dimensional standard normal. Let $\mathbf{T}_{\theta}$ be a diffeomorphism with a set of parameters $\theta\in\mathbb{R}^{n_{\theta}}$ . Our objective is to find $\theta$ such that $\mathbf{T}_{\theta}(\mathbf{z}_{0})$ follows the distribution of $\{\mathbf{x}_{1}^{(j)}\}_{j=1}^{M}$ in (13).

Since the distribution of $\mathbf{x}_{1}$ clearly depends on $\mathbf{x}_{0}$ and $\mathbf{\Gamma}_{0}$, we constrain the choice of $\theta$ to be a function of $\mathbf{x}_{0}$ and $\mathbf{\Gamma}_{0}$. We define

(16)

\theta=\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta),

where $\mathbf{N}$ is a DNN mapping with trainable hyperparameters $\Theta$. No special DNN structure is required, and we adopt a straightforward fully connected feedforward DNN for $\mathbf{N}$. This effectively defines

(17)

\mathbf{x}_{1}=\mathbf{T}_{\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)}(\mathbf{z}_{0}),

where the diffeomorphism $\mathbf{T}$ is effectively parameterized by the trainable hyperparameters $\Theta$ of the DNN. Let $\mathbf{S}=\mathbf{T}^{-1}$ be the inverse of $\mathbf{T}$. We have $\mathbf{z}_{0}=\mathbf{S}_{\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)}(\mathbf{x}_{1})$.

The invertibility of $\mathbf{T}$ allows us to compute:

(18)

p(\mathbf{x}_{1}|\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)=p_{\mathbf{z}_{0}}\left(\mathbf{S}_{\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)}(\mathbf{x}_{1})\right)\left|\det\mathbf{D}\mathbf{T}_{\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)}\left(\mathbf{S}_{\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)}(\mathbf{x}_{1})\right)\right|^{-1}.

The hyperparameters $\Theta$ are determined by maximizing the expected log-likelihood, which is accomplished by minimizing its negative as the loss function,

\mathcal{L}(\Theta):=-\mathbb{E}_{(\mathbf{\Gamma}_{0},\mathbf{x}_{0},\mathbf{x}_{1})\sim p_{\mathtt{data}}}\log\left(p(\mathbf{x}_{1}|\mathbf{x}_{0},\mathbf{\Gamma}_{0};\Theta)\right),

where $p_{\mathtt{data}}$ is the distribution of the training data set (13). In practice, the loss is computed, up to the constant factor $1/M$, as the sum over the training samples,

(19)

\mathcal{L}(\Theta)=-\sum_{j=1}^{M}\log\left(p(\mathbf{x}_{1}^{(j)}|\mathbf{x}_{0}^{(j)},\mathbf{\Gamma}_{0}^{(j)};\Theta)\right).

Several designs for the invertible map $\mathbf{T}$ have been developed and studied extensively in the literature. These include, for example, masked autoregressive flow (MAF) [33], real-valued non-volume preserving (RealNVP) [15], neural ordinary differential equations (Neural ODE) [7], etc. In this paper, we adopt the MAF approach, where the dimension of the parameter $\theta$ in (16) is set to be $n_{\theta}=2d$ , where $d$ is the dimension of the dynamical system. For the technical detail of MAF, see [33].
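To make the roles of $\mathbf{N}$, $\mathbf{T}$, $\mathbf{S}$ and the loss (19) concrete, the following sketch evaluates the negative log-likelihood for a deliberately simplified flow. It assumes a single elementwise affine map $\mathbf{T}_{\theta}(\mathbf{z})=\mathbf{m}+e^{\mathbf{s}}\odot\mathbf{z}$ with $\theta=(\mathbf{m},\mathbf{s})\in\mathbb{R}^{2d}$, which is a stand-in and not the MAF of [33]; the network weights are random and untrained, and all function names are hypothetical.

```python
import numpy as np

def init_net(n_in, n_hidden, n_out, rng):
    """Random weights for a small fully connected tanh network (untrained)."""
    sizes = [n_in] + [n_hidden] * 3 + [n_out]
    return [(rng.normal(scale=0.3, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def net(params, x0, gamma0):
    """theta = N(x0, Gamma0; Theta): returns shift and log-scale, each of size d."""
    h = np.hstack([x0, gamma0])
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    theta = h @ W + b
    d = theta.shape[1] // 2
    return theta[:, :d], theta[:, d:]

def neg_log_likelihood(params, gamma0, x0, x1):
    """Loss (19) for the affine stand-in T(z) = shift + exp(log_scale) * z."""
    shift, log_scale = net(params, x0, gamma0)
    z0 = (x1 - shift) * np.exp(-log_scale)                       # z0 = S(x1)
    log_pz = -0.5 * np.sum(z0**2 + np.log(2.0 * np.pi), axis=1)  # standard normal log-density
    log_det_T = np.sum(log_scale, axis=1)                        # log |det DT|
    return -np.sum(log_pz - log_det_T)                           # formula (18), summed over samples

def sample_one_step(params, gamma0, x0, rng):
    """Draw x1 = T(z0) with z0 standard normal, as in (17)."""
    shift, log_scale = net(params, x0, gamma0)
    return shift + np.exp(log_scale) * rng.standard_normal(shift.shape)

rng = np.random.default_rng(0)
d, n_gamma, M = 2, 3, 5                       # small illustrative sizes
params = init_net(d + n_gamma, 20, 2 * d, rng)
x0 = rng.normal(size=(M, d))
gamma0 = rng.normal(size=(M, n_gamma))
x1 = rng.normal(size=(M, d))
loss = neg_log_likelihood(params, gamma0, x0, x1)
```

In an actual implementation the loss would be minimized over the network weights with an automatic differentiation framework, and the affine map would be replaced by the chosen flow architecture.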

3.4 DNN Model Structure and System Prediction

An illustration of the proposed sFML model structure can be found in Figure 1, which is in direct correspondence with (17). Minimization of the loss function (19) over the data set (13) trains the DNN hyperparameters $\Theta$. Once the training is completed and $\Theta$ is fixed, (17) effectively defines the one-step sFML model (15):

\mathbf{x}_{1}=\mathbf{T}_{\mathbf{N}(\mathbf{x}_{0},\mathbf{\Gamma}_{0})}(\mathbf{z}_{0})=\widehat{\mathbf{G}}_{\Delta}(\mathbf{x}_{0},\mathbf{z}_{0};\mathbf{\Gamma}_{0}),

where we have suppressed the fixed parameter $\Theta$ .

Figure 1: The DNN model structure for the proposed normalizing flow sFML method (17).

Iterative execution of the one-step sFML model allows one to conduct system predictions under excitations that are not in the training data. For a given (new) excitation signal $\mathbf{u}(t)=(\mu(t),\nu(t))^{T}$ , we first conduct its parameterization in the form of (7), to obtain its local parameter $\mathbf{\Gamma}_{n}$ for $[t_{n},t_{n+1})$ , for any $n\geq 0$ . The sFML system then produces the system prediction, for a given initial condition $\mathbf{x}_{0}$ ,

(20)

\mathbf{x}_{n+1}=\widehat{\mathbf{G}}_{\Delta}(\mathbf{x}_{n},\mathbf{z}_{n};\mathbf{\Gamma}_{n}),\qquad n\geq 0,

where $\mathbf{z}_{n}$ are i.i.d. $d$ -dimensional standard normal random variables.
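The prediction loop (20) can be sketched as below for a scalar excitation. This is a hedged illustration: `G_hat_standin` is a hypothetical placeholder for the trained one-step model (here one Euler-Maruyama step of a controlled OU process), and the piecewise linear parameterization of the new signal is an assumption made for brevity.

```python
import numpy as np

def rollout(G_hat, x0, u, delta, n_steps, rng):
    """Iterate x_{n+1} = G_hat(x_n, z_n; Gamma_n) with a piecewise linear Gamma_n."""
    x = np.asarray(x0, dtype=float)
    path = [x]
    for n in range(n_steps):
        u_n, u_np1 = u(n * delta), u((n + 1) * delta)
        gamma_n = np.array([u_n, (u_np1 - u_n) / delta])  # local parameters on [t_n, t_{n+1})
        z_n = rng.standard_normal(x.shape)                # fresh i.i.d. standard normal draw
        x = G_hat(x, z_n, gamma_n)
        path.append(x)
    return np.stack(path)

def G_hat_standin(x, z, gamma, delta=0.01, mu=1.0, sigma=0.2):
    """Hypothetical stand-in for the learned model: one Euler-Maruyama step of a controlled OU process."""
    return x + (-mu * x + gamma[0]) * delta + sigma * np.sqrt(delta) * z

rng = np.random.default_rng(1)
x0 = np.full(1000, 2.0)                       # ensemble of 1000 copies of the initial condition
paths = rollout(G_hat_standin, x0, lambda t: 0.5 * np.sin(6.0 * t), 0.01, 200, rng)

mean_path = paths.mean(axis=0 if paths.ndim == 1 else 1)  # ensemble mean at each time level
std_path = paths.std(axis=1)                               # ensemble standard deviation
```

Running the whole ensemble through each step at once keeps the loop over time levels as the only Python loop.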

4 Numerical Examples

In this section, we present several numerical tests to demonstrate the performance of our proposed method. After presenting results for an Ornstein-Uhlenbeck (OU) process and a nonlinear SDE, we focus on nonlinear SDE systems for long-term predictions. These include a stochastic predator-prey model and a stochastic oscillator with a double-well potential. In both cases, we study very long-term predictions of the learned sFML models. In particular, for the stochastic oscillator, we utilize a periodic excitation signal that is known to generate the well-known “stochastic resonance” phenomenon.

In all the examples, the true SDE systems are known. However, the known SDEs are used only to generate the training data set (13). We solve the true systems by the Euler-Maruyama method with a time step $\Delta=0.01$. The “initial conditions” $\mathbf{x}_{0}$ in (13) are sampled uniformly in a domain $I_{\mathbf{x}}$, specified in each example, and the excitations are local polynomials whose coefficients $\mathbf{\Gamma}_{0}$ are sampled in a domain specified for each example.

In our sFML model (Figure 1), the DNN $\mathbf{N}$ has 3 layers, each with 20 nodes, and utilizes the $\tanh$ activation function. We employ a cyclic learning rate with a base rate of $3\times 10^{-4}$, a maximum rate of $5\times 10^{-4}$, $\gamma=0.99999$, and a step size of $10,000$. The cycle is set for every $40,000$ training epochs, with a decay scale of $0.5$. A small weight decay of $0.01$ on the gradient updates is also used to help stabilize the training. In our examples, the DNN training is usually conducted for $200,000\sim 300,000$ epochs.

4.1 Linear SDE with Control

We first consider an Ornstein–Uhlenbeck (OU) process with control/excitation. Two cases are considered: when the control is in the drift only, and when the control is in both the drift and the diffusion. Note that since the true equations are not known, one has no information on where the excitations act on the system. The sFML approach also does not seek to recover the drift or diffusion terms.

4.1.1 OU with Drift Control

In the first case, the OU process takes the form

(21)

dx_{t}=\left[-\mu x_{t}+\alpha(t)\right]dt+\sigma dW_{t},

where $\mu$ and $\sigma$ are set as $\mu=1.0$ and $\sigma=0.2$, and the control signal $\alpha(t)$ is applied to the drift. The training data set (13) is generated by sampling $x_{0}$ in $(-2,2)$ and using a Taylor polynomial of degree $2$ for the control $\alpha(t)$. This introduces 3 parameters for $\mathbf{\Gamma}_{0}$, which are sampled from $(-9,9)^{3}$. A total of $M=120,000$ trajectory pairs are used in the training data set (13), where the time step is $\Delta=0.01$.
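A training set of this kind could be generated along the following lines. This is a sketch under stated assumptions: uniform sampling of $\mathbf{\Gamma}_{0}$, the monomial form $\alpha(\tau)=\gamma_{0}+\gamma_{1}\tau+\gamma_{2}\tau^{2}$ consistent with (7), and a hypothetical parameter `n_sub` for the number of Euler-Maruyama sub-steps per data step (the default of 1 matches a solver step equal to $\Delta$).

```python
import numpy as np

def make_ou_training_set(M, delta=0.01, mu=1.0, sigma=0.2, n_sub=1, seed=0):
    """Sample (Gamma_0, x_0, x_1) for dx = (-mu*x + alpha(t)) dt + sigma dW over one step delta."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(-2.0, 2.0, size=M)           # initial conditions in (-2, 2)
    gamma0 = rng.uniform(-9.0, 9.0, size=(M, 3))  # degree-2 polynomial coefficients in (-9, 9)^3
    h = delta / n_sub                              # Euler-Maruyama sub-step (assumption)
    x = x0.copy()
    for i in range(n_sub):
        tau = i * h
        alpha = gamma0[:, 0] + gamma0[:, 1] * tau + gamma0[:, 2] * tau**2
        x = x + (-mu * x + alpha) * h + sigma * np.sqrt(h) * rng.standard_normal(M)
    return gamma0, x0, x

gamma0, x0, x1 = make_ou_training_set(M=120_000)
```

The Brownian increments are discarded after use, so the returned arrays contain only what the learning method is allowed to see.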

Once the sFML model (14) is trained, we conduct system prediction for up to $T=10.0$ , which requires 1,000 time steps.

In Figure 2, we compare some sample trajectory paths produced by the ground truth (left) and the learned sFML model (right), with an initial condition $x_{0}=2.0$ and a “new” control signal $\alpha(t)=\frac{1}{2}\sin(6t)$. We observe that the two sets appear visually similar to each other. To further validate the sFML model prediction, we compute the mean and standard deviation of the solution averaged over $10,000$ trajectories. The sFML model predictions are shown in Figure 3, along with the reference ground truth. In Figure 4, we also show the comparison of the solution probability distributions at times $T=2,4,8$. We observe good agreement between the learned sFML model and the true model. This verifies that the sFML model indeed provides an accurate approximation in distribution.
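As a cross-check on the reference statistics, note that (21) is linear, so its mean and variance are available in closed form. The expressions below are standard results for the OU process and are not taken from the figures; they assume the deterministic initial condition $x_{0}=2.0$ and the control $\alpha(t)=\frac{1}{2}\sin(6t)$ used above, with $\mu=1.0$ and $\sigma=0.2$.

```latex
\begin{aligned}
\mathbb{E}[x_t] &= x_0 e^{-\mu t} + \int_0^t e^{-\mu (t-s)}\alpha(s)\,ds
 = 2e^{-t} + \frac{\sin(6t) - 6\cos(6t) + 6e^{-t}}{74},\\
\mathrm{Var}[x_t] &= \frac{\sigma^2}{2\mu}\left(1 - e^{-2\mu t}\right)
 = 0.02\left(1 - e^{-2t}\right).
\end{aligned}
```

Under these assumptions, the standard deviation approaches $\sqrt{0.02}\approx 0.141$ for large $t$, independently of the control signal, while the mean settles into an oscillation of amplitude $1/(2\sqrt{37})\approx 0.082$.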

We now present the results under a different setting: the initial condition $x_{0}=-1.0$ , and the excitation $\alpha(t)=\frac{1}{2}\sin(5t)+\frac{1}{5}\sin(1.5t)$ . The sample solution trajectories are shown in Figure 5 and the solution mean and standard deviation averaged over $10,000$ trajectories are shown in Figure 6. Again, we observe good agreement between the sFML model prediction and the ground truth.

4.1.2 OU with Full Control

We then consider the following OU process with control on both drift and diffusion terms:

(22)

dx_{t}=\left[-\mu x_{t}+\alpha(t)\right]dt+\beta(t)dW_{t},

where $\mu=1.0$ , and $\alpha(t)$ and $\beta(t)$ are the excitation/control signals. To generate training data, we conduct the local parameterization of $\alpha(t)$ and $\beta(t)$ with 2nd degree Taylor polynomials, resulting in $\mathbf{\Gamma}_{n}\in\mathbb{R}^{n_{\Gamma}}$ , $n_{\Gamma}=3+3=6$ . Moreover, we generate $120,000$ training data pairs with initial conditions uniformly sampled from $I_{\mathbf{x}}=[-0.8,1.5]$ and $I_{\mathbf{\Gamma}}=[-0.6,0.6]\times[-0.8,0.8]\times[-0.7,0.7]\times[0.01,0.35% ]\times[-0.5,0.5]\times[-1.55,0.55]$ .

To examine the performance of the learned sFML model, we conduct a simulation with an initial condition $\mathbf{x}_{0}=1.0$ and excitations $\alpha(t)=\frac{1}{2}\sin(\frac{\pi}{2}t)$ and $\beta(t)=\frac{1}{10}e^{\cos(\pi t)}$. (Note that the excitations are not the Taylor polynomials in the training data set.) Some sample solution trajectories are shown in Figure 7. The mean and standard deviation of the solution are shown in Figure 8. In Figure 9, we also compare the probability distributions of the solution at $T=2,6,8$. We observe good agreement between the sFML model prediction and the ground truth.
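The following Python sketch illustrates why these test excitations, although not polynomials themselves, can still be handled by the trained model: their local 2nd-degree Taylor coefficients stay inside the training domain $I_{\mathbf{\Gamma}}$. The coefficient convention (value, first derivative, half of the second derivative), the helper names, and the time window $[0,10]$ are assumptions made for this illustration.

```python
import numpy as np

def alpha_coeffs(t):
    """Taylor coefficients (a0, a1, a2) of alpha(t) = 0.5*sin(pi*t/2) at time t."""
    w = np.pi / 2
    return np.stack([0.5 * np.sin(w * t),
                     0.5 * w * np.cos(w * t),
                     -0.25 * w**2 * np.sin(w * t)], axis=-1)

def beta_coeffs(t):
    """Taylor coefficients (b0, b1, b2) of beta(t) = 0.1*exp(cos(pi*t)) at time t."""
    s, c = np.sin(np.pi * t), np.cos(np.pi * t)
    e = 0.1 * np.exp(c)
    return np.stack([e,
                     -np.pi * s * e,
                     0.5 * np.pi**2 * (s**2 - c) * e], axis=-1)

# Local parameters Gamma_n along the prediction window (window length assumed).
t_grid = np.linspace(0.0, 10.0, 1001)
gamma = np.concatenate([alpha_coeffs(t_grid), beta_coeffs(t_grid)], axis=-1)

# Training domain I_Gamma quoted in the text.
lo = np.array([-0.6, -0.8, -0.7, 0.01, -0.5, -1.55])
hi = np.array([0.6, 0.8, 0.7, 0.35, 0.5, 0.55])
inside = bool(np.all((gamma >= lo) & (gamma <= hi)))
print(inside)
```

Under this convention the asymmetric interval $[-1.55,0.55]$ for the last coefficient matches the range of $\frac{1}{2}\beta''(t)$, which is itself asymmetric.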

4.2 Nonlinear SDEs with Control

We now consider a nonlinear system of SDEs, inspired by an example in Section 2.3.2 of [45]:

(23)

\left\{\begin{array}[]{l}\dot{x}_{t}=f(x_{t},y_{t},t)+\sigma_{1}\dot{W}_{1},\\ \dot{y}_{t}=-\mu(y_{t}-x_{t})+\sigma_{2}\dot{W}_{2},\end{array}\right.

where $W_{1}$ and ${W}_{2}$ are independent Brownian motions, $\mu=1.0$ , $\sigma_{1}=0.2$ , $\sigma_{2}=0.05$ , and the function $f$ contains a control signal $u(t)$ :

f(x,y,t)=-y^{3}+u(t),\qquad u(t)=\sin(\pi t)+\cos(\sqrt{2}\pi t).

To generate the training data, we simulate the system with $120,000$ sample paths over one time step $\Delta=0.01$, from initial conditions sampled uniformly from $I_{\mathbf{x}}=[-1.5,2.0]\times[-1.0,1.6]$ and under controls given by 2nd-degree Taylor polynomials with coefficients sampled from $[-2,2]\times[-8,8]\times[-15,15]$.

For the learned sFML model, we conduct system predictions with an initial condition $x_{0}=2.0$ and $y_{0}=1.0$ . In Figure 10, we plot a few sample phase portraits from ground truth (left), as well as from the sFML model prediction (right). They appear to be visually in agreement. The mean and standard deviation of the system prediction by the sFML model are shown in Figure 11, along with those of the true solution. In Figure 12, we also show the comparison of reference and learned density functions of the test trajectory at time $T=4,6,7,9$ . We observe that the sFML model exhibits good accuracy in these predictions.
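The reference statistics used in such a comparison can be obtained by direct Monte Carlo simulation of (23). The Python sketch below is one possible way to do this with the Euler–Maruyama scheme; the number of paths, the horizon $T=10$, the use of $\Delta=0.01$ as the integration step, and the function name `simulate_reference` are assumptions of this illustration, not settings reported in the paper.

```python
import numpy as np

def simulate_reference(n_paths=2000, T=10.0, dt=0.01, mu=1.0,
                       sigma=(0.2, 0.05), x0=2.0, y0=1.0, seed=0):
    """Euler-Maruyama paths of (23); returns per-step mean and std of (x, y)."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x = np.full(n_paths, x0)
    y = np.full(n_paths, y0)
    mean = np.empty((n_steps + 1, 2))
    std = np.empty((n_steps + 1, 2))
    mean[0], std[0] = (x0, y0), (0.0, 0.0)
    for n in range(n_steps):
        t = n * dt
        u = np.sin(np.pi * t) + np.cos(np.sqrt(2.0) * np.pi * t)
        dw = rng.normal(0.0, np.sqrt(dt), size=(2, n_paths))
        # Tuple assignment: both updates use the state at time t.
        x, y = (x + (-y**3 + u) * dt + sigma[0] * dw[0],
                y - mu * (y - x) * dt + sigma[1] * dw[1])
        mean[n + 1] = x.mean(), y.mean()
        std[n + 1] = x.std(), y.std()
    return mean, std

mean, std = simulate_reference()
print(mean[-1], std[-1])
```

The same two arrays computed from model-generated paths give the curves to be compared against the reference.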

4.3 Stochastic Predator-Prey Model

We then consider a stochastic Lotka-Volterra system with a time-dependent excitation $u(t)$ :

(24)

\left\{\begin{array}[]{l}\dot{x}_{t}=x_{t}-x_{t}y_{t}+u(t)+\sigma_{1}x_{t}\dot{W}_{1},\\ \dot{y}_{t}=-y_{t}+x_{t}y_{t}+\sigma_{2}y_{t}\dot{W}_{2},\end{array}\right.

where ${W}_{1}$ and ${W}_{2}$ are independent Brownian motions, and $\sigma_{1}=\sigma_{2}=0.05$. The training data are generated by simulating $120,000$ solution samples for one step $\Delta=0.01$, from initial conditions in $I_{\mathbf{x}}=[0.1,0.35]\times[0.2,5.5]$ and under excitations given by 2nd-degree Taylor polynomials whose coefficients are sampled from $[0.01,4.2]\times[-1.5,1.5]\times[-0.7,0.7]$.

Once we have the trained model, we conduct system prediction with an initial condition $x_{0}=2.0$, $y_{0}=1.0$ and excitation $u(t)=\sin(\frac{t}{3})+\cos(t)+2$. We conduct relatively long-term prediction for time up to $T=80$. (Note that the training data are of length $0.01$.) In Figure 13, we plot a few samples of the phase portrait of the system. Good visual agreement between the sFML prediction and the ground truth can be observed. To examine the accuracy more closely, we present the mean and standard deviation of the system in Figure 14. We observe good predictive accuracy of the sFML model for up to $T=80$.
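With $\Delta=0.01$, reaching $T=80$ amounts to $8,000$ recursive applications of the one-step model, with the excitation re-parameterized locally at every $t_n$. The Python sketch below shows only this loop structure. The function `one_step` is a hypothetical stand-in: it integrates (24) by Euler–Maruyama instead of calling a trained sFML network, and the names `local_coeffs`, `one_step`, `rollout`, and the sub-step count are assumptions of this illustration.

```python
import numpy as np

def local_coeffs(t):
    """2nd-degree Taylor coefficients of u(t) = sin(t/3) + cos(t) + 2 at time t."""
    return np.array([np.sin(t / 3) + np.cos(t) + 2.0,
                     np.cos(t / 3) / 3 - np.sin(t),
                     0.5 * (-np.sin(t / 3) / 9 - np.cos(t))])

def one_step(state, gamma, rng, dt=0.01, n_sub=10, sigma=(0.05, 0.05)):
    """Stand-in for a learned one-step map: Euler-Maruyama on (24), with the
    excitation replaced by its local polynomial with coefficients gamma."""
    x, y = state
    h = dt / n_sub
    for k in range(n_sub):
        tau = k * h
        u = gamma[0] + gamma[1] * tau + gamma[2] * tau**2
        dw1, dw2 = rng.normal(0.0, np.sqrt(h), size=2)
        x, y = (x + (x - x * y + u) * h + sigma[0] * x * dw1,
                y + (-y + x * y) * h + sigma[1] * y * dw2)
    return np.array([x, y])

def rollout(x0, T, dt=0.01, seed=0):
    """Recursive prediction: x_{n+1} = one_step(x_n, Gamma_n)."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    traj = np.empty((n_steps + 1, 2))
    traj[0] = x0
    for n in range(n_steps):
        gamma_n = local_coeffs(n * dt)  # re-parameterize the excitation at t_n
        traj[n + 1] = one_step(traj[n], gamma_n, rng, dt)
    return traj

traj = rollout(np.array([2.0, 1.0]), T=80.0)
print(traj.shape)
```

Replacing `one_step` with a sampler from a trained generative model would leave `rollout` unchanged, since the model only ever sees the current state and the current local parameters.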

4.4 Stochastic Resonance

Finally, we consider the following SDE with a double-well potential and excitation,

(25)

dx_{t}=\left[x_{t}-x_{t}^{3}+u(t)\right]dt+\sigma dW_{t},

where $\sigma=0.25$ is a parameter, and $u(t)$ is the excitation with amplitude $V$. When $V=0$, there is no excitation to the system, and the solution exhibits random transitions between the two metastable states $x=-1$ and $x=1$. The transition probability depends on the parameter $\sigma$. When $V\neq 0$, an excitation is exerted on the system. If the excitation is periodic, then under the right circumstances the random transitions between the two metastable states become synchronized with the periodicity of the excitation, resulting in the so-called stochastic resonance, cf. [4, 2, 3].

Here, we demonstrate that the proposed sFML method can accurately model and predict the long-term system behavior using only very short bursts of measurement data. Our data are $30,000$ trajectories of one-step length ($\Delta=0.01$), with initial conditions sampled from $I_{\mathbf{x}}=[-1.6,1.6]$ and under piecewise constant excitations sampled from $[-0.13,0.13]$.

Once the sFML model is trained, we conduct system prediction under various excitations. In particular, we choose $u(t)=V\cos(\omega t)$, with $V=0.12$ and $\omega=0.001$. These parameters are chosen according to [4], to ensure the occurrence of stochastic resonance. An exceptionally long-term system prediction is conducted by the sFML model, for time up to $T=40,000$. The result is shown in the top of Figure 15, where we also plot the (rescaled) periodic excitation as a light grey line in the background. We can clearly observe the synchronization between the random transitions and the periodic excitation, i.e., the stochastic resonance. For reference, we also conduct the sFML system prediction with $V=0$, i.e., no excitation. The solution, shown in the bottom of Figure 15, exhibits the expected random transitions between the two metastable states. We emphasize that in this case the transition probability is very small, $O(10^{-5})$. The learned sFML model is capable of capturing such a small-probability event. We remark again that the training data are pairwise data separated by one time step. Thus, none of the (long-term) system behaviors can be observed in the training data.
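For orientation, the scales involved can be estimated with a short calculation. The Python sketch below uses the classical Kramers small-noise formula for the escape rate from one well of the potential $U(x)=x^{4}/4-x^{2}/2$ when $V=0$. This formula is not part of the proposed method and is invoked here only as a back-of-the-envelope estimate; whether it corresponds exactly to the transition probability quoted above is an assumption.

```python
import math

sigma = 0.25
# Potential U(x) = x**4/4 - x**2/2: wells at x = -1 and x = 1, barrier at x = 0.
barrier = 0.25                   # U(0) - U(1)
curv_well, curv_top = 2.0, 1.0   # U''(1) and |U''(0)|
D = sigma**2 / 2                 # noise intensity in dx = -U'(x) dt + sqrt(2 D) dW

# Kramers escape rate (per unit time) for the unforced system.
rate = math.sqrt(curv_well * curv_top) / (2 * math.pi) * math.exp(-barrier / D)

forcing_period = 2 * math.pi / 0.001   # period of u(t) = V cos(omega t)
n_steps = round(40_000 / 0.01)         # one-step model evaluations up to T = 40,000

print(rate, 1.0 / rate, forcing_period, n_steps)
```

Under these assumptions the rate is roughly $7.5\times 10^{-5}$ per unit time, so transitions are separated by on the order of $10^{4}$ time units, while each training sample covers only $0.01$ time units; the prediction itself requires four million recursive model evaluations.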

5 Conclusion

In this paper, we presented a general numerical framework for modeling unknown nonautonomous stochastic systems by using observed trajectory data. To overcome the difficulties brought by the external time-dependent inputs, we transform the original system into a local parametric stochastic system. We accomplish this by locally parameterizing the time-dependent external inputs at discrete time points. The resulting stochastic system is then driven by a stationary parametric stochastic flow map. A normalizing flow model is devised to approximate the parametric stochastic flow map. By using a comprehensive set of numerical examples, we demonstrated that the proposed approach is effective and accurate in modeling a variety of unknown stochastic systems. The learned model can conduct exceptionally long-term system predictions, subject to arbitrary external excitations that are not contained in the training data.

References

[1] C. Archambeau, D. Cornford, M. Opper, and J. Shawe-Taylor, Gaussian process approximations of stochastic differential equations, in Gaussian Processes in Practice, N. D. Lawrence, A. Schwaighofer, and J. Quiñonero Candela, eds., vol. 1 of Proceedings of Machine Learning Research, Bletchley Park, UK, 12–13 Jun 2007, PMLR, pp. 1–16, https://proceedings.mlr.press/v1/archambeau07a.html.
[2] R. Benzi, G. Parisi, A. Sutera, and A. Vulpiani, Stochastic resonance in climatic change, Tellus, 34 (1982), pp. 10–16.
[3] R. Benzi, G. Parisi, A. Sutera, and A. Vulpiani, A theory of stochastic resonance in climatic change, SIAM J. Appl. Math., 43 (1983), pp. 565–578, https://doi.org/10.1137/0143037.
[4] R. Benzi, A. Sutera, and A. Vulpiani, The mechanism of stochastic resonance, J. Phys. A, 14 (1981), pp. L453–L457, http://stacks.iop.org/0305-4470/14/L453.
[5] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad. Sci. USA, 113 (2016), pp. 3932–3937, https://doi.org/10.1073/pnas.1517384113.
[6] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Sparse identification of nonlinear dynamics with control (SINDYc), IFAC-PapersOnLine, 49 (2016), pp. 710–715, https://doi.org/10.1016/j.ifacol.2016.10.249, https://www.sciencedirect.com/science/article/pii/S2405896316318298. 10th IFAC Symposium on Nonlinear Control Systems NOLCOS 2016.
[7] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, Neural ordinary differential equations, in Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, eds., vol. 31, Curran Associates, Inc., 2018, https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf.
[8] X. Chen, J. Duan, J. Hu, and D. Li, Data-driven method to learn the most probable transition pathway and stochastic differential equation, Phys. D, 443 (2023), pp. Paper No. 133559, 15, https://doi.org/10.1016/j.physd.2022.133559.
[9] X. Chen, L. Yang, J. Duan, and G. E. Karniadakis, Solving inverse stochastic problems from discrete particle observations using the Fokker-Planck equation and physics-informed neural networks, SIAM J. Sci. Comput., 43 (2021), pp. B811–B830, https://doi.org/10.1137/20M1360153.
[10] Y. Chen and D. Xiu, Learning stochastic dynamical system via flow map operator, J. Comput. Phys., 508 (2024), p. Paper No. 112984, https://doi.org/10.1016/j.jcp.2024.112984.
[11] V. Churchill and D. Xiu, Flow map learning for unknown dynamical systems: Overview, implementation, and benchmarks, Journal of Machine Learning for Modeling and Computing, 4 (2023), pp. 173–201.
[12] M. Darcy, B. Hamzi, G. Livieri, H. Owhadi, and P. Tavallali, One-shot learning of stochastic differential equations with data adapted kernels, Phys. D, 444 (2023), pp. Paper No. 133583, 18, https://doi.org/10.1016/j.physd.2022.133583.
[13] R. Deng, B. Chang, M. A. Brubaker, G. Mori, and A. Lehrmann, Modeling continuous stochastic processes with dynamic normalizing flows, in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, eds., vol. 33, Curran Associates, Inc., 2020, pp. 7805–7815, https://proceedings.neurips.cc/paper_files/paper/2020/file/58c54802a9fb9526cd0923353a34a7ae-Paper.pdf.
[14] F. Dietrich, A. Makeev, G. Kevrekidis, N. Evangelou, T. Bertalan, S. Reich, and I. G. Kevrekidis, Learning effective stochastic differential equations from microscopic simulations: linking stochastic numerics to deep learning, Chaos, 33 (2023), pp. Paper No. 023121, 19, https://doi.org/10.1063/5.0113632.
[15] L. Dinh, J. Sohl-Dickstein, and S. Bengio, Density estimation using real NVP, in International Conference on Learning Representations, 2017, https://openreview.net/forum?id=HkpbnH9lx.
[16] X. Fu, L.-B. Chang, and D. Xiu, Learning reduced systems via deep neural networks with memory, J. Machine Learning Model. Comput., 1 (2020), pp. 97–118.
[17] L. Guo, H. Wu, and T. Zhou, Normalizing field flows: Solving forward and inverse stochastic differential equations using physics-informed flow models, Journal of Computational Physics, 461 (2022), p. 111202, https://doi.org/10.1016/j.jcp.2022.111202, https://www.sciencedirect.com/science/article/pii/S0021999122002649.
[18] T. Haarnoja, K. Hartikainen, P. Abbeel, and S. Levine, Latent space policies for hierarchical reinforcement learning, in Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause, eds., vol. 80 of Proceedings of Machine Learning Research, PMLR, 10–15 Jul 2018, pp. 1851–1860, https://proceedings.mlr.press/v80/haarnoja18a.html.
[19] S. H. Kang, W. Liao, and Y. Liu, IDENT: identifying differential equations with numerical time evolution, J. Sci. Comput., 87 (2021), pp. Paper No. 1, 27, https://doi.org/10.1007/s10915-020-01404-9.
[20] I. Kobyzev, S. Prince, and M. Brubaker, Normalizing flows: An introduction and review of current methods, IEEE Trans. Pattern Anal. Machine Intel., 43 (2021), pp. 3964–3979.
[21] V. Laparra, G. Camps-Valls, and J. Malo, Iterative gaussianization: From ica to random rotations, IEEE Transactions on Neural Networks, 22 (2011), pp. 537–549, https://doi.org/10.1109/TNN.2011.2106511.
[22] Y. Li and J. Duan, A data-driven approach for discovering stochastic dynamical systems with non-Gaussian Lévy noise, Phys. D, 417 (2021), pp. Paper No. 132830, 12, https://doi.org/10.1016/j.physd.2020.132830.
[23] Y. Li, Y. Lu, S. Xu, and J. Duan, Extracting stochastic dynamical systems with $\alpha$-stable Lévy noise from data, J. Stat. Mech. Theory Exp., (2022), pp. Paper No. 023405, 23, https://doi.org/10.1088/1742-5468/ac4e87.
[24] Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, Fourier neural operator for parametric partial differential equations, in International Conference on Learning Representations, 2021, https://openreview.net/forum?id=c8P9NQVtmnO.
[25] H. Lu and D. M. Tartakovsky, Data-driven models of nonautonomous systems, J. Comput. Phys., 507 (2024), p. Paper No. 112976, https://doi.org/10.1016/j.jcp.2024.112976.
[26] Y. Lu, R. Maulik, T. Gao, F. Dietrich, I. G. Kevrekidis, and J. Duan, Learning the temporal evolution of multivariate densities via normalizing flows, Chaos, 32 (2022), pp. Paper No. 033121, 17, https://doi.org/10.1063/5.0065093.
[27] T. Müller, B. Mcwilliams, F. Rousselle, M. Gross, and J. Novák, Neural importance sampling, ACM Trans. Graph., 38 (2019), https://doi.org/10.1145/3341156.
[28] B. Øksendal, Stochastic differential equations, in Stochastic differential equations, Springer, 2003, pp. 65–84.
[29] M. Opper, Variational inference for stochastic differential equations, Ann. Phys., 531 (2019), pp. 1800233, 9, https://doi.org/10.1002/andp.201800233.
[30] H. Owhadi, Computational graph completion, Research in the Mathematical Sciences, 9 (2022), p. 27.
[31] G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mohamed, and B. Lakshminarayanan, Normalizing flows for probabilistic modeling and inference, J. Mach. Learn. Res., 22 (2021), pp. Paper No. 57, 64.
[32] G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mohamed, and B. Lakshminarayanan, Normalizing flows for probabilistic modeling and inference, J. Machine Learning Res., 22 (2021), pp. 1–64.
[33] G. Papamakarios, T. Pavlakou, and I. Murray, Masked autoregressive flow for density estimation, in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds., vol. 30, Curran Associates, Inc., 2017, https://proceedings.neurips.cc/paper_files/paper/2017/file/6c1da886822c67822bcf3679d04369fa-Paper.pdf.
[34] J. L. Proctor, S. L. Brunton, and J. N. Kutz, Dynamic mode decomposition with control, SIAM J. Appl. Dyn. Syst., 15 (2016), pp. 142–161, https://doi.org/10.1137/15M1013857.
[35] J. L. Proctor, S. L. Brunton, and J. N. Kutz, Generalizing Koopman theory to allow for inputs and control, SIAM J. Appl. Dyn. Syst., 17 (2018), pp. 909–930, https://doi.org/10.1137/16M1062296.
[36] T. Qin, Z. Chen, J. D. Jakeman, and D. Xiu, Data-driven learning of nonautonomous systems, SIAM J. Sci. Comput., 43 (2021), pp. A1607–A1624, https://doi.org/10.1137/20M1342859.
[37] T. Qin, Z. Chen, J. D. Jakeman, and D. Xiu, Deep learning of parameterized equations with applications to uncertainty quantification, Int. J. Uncertain. Quantif., 11 (2021), pp. 63–82, https://doi.org/10.1615/Int.J.UncertaintyQuantification.2020034123.
[38] T. Qin, K. Wu, and D. Xiu, Data driven governing equations approximation using deep neural networks, J. Comput. Phys., 395 (2019), pp. 620–635, https://doi.org/10.1016/j.jcp.2019.06.042.
[39] M. Raissi, P. Perdikaris, and G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics, 378 (2019), pp. 686–707, https://doi.org/10.1016/j.jcp.2018.10.045.
[40] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Multistep neural networks for data-driven discovery of nonlinear dynamical systems, arXiv preprint arXiv:1801.01236, (2018).
[41] H. Schaeffer and S. G. McCalla, Sparse model selection via integral terms, Phys. Rev. E, 96 (2017), pp. 023302, 7, https://doi.org/10.1103/physreve.96.023302.
[42] H. Schaeffer, G. Tran, and R. Ward, Extracting sparse high-dimensional dynamics from limited data, SIAM J. Appl. Math., 78 (2018), pp. 3279–3295, https://doi.org/10.1137/18M116798X.
[43] J. Song, S. Zhao, and S. Ermon, A-nice-mc: Adversarial training for mcmc, in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds., vol. 30, Curran Associates, Inc., 2017, https://proceedings.neurips.cc/paper_files/paper/2017/file/2417dc8af8570f274e6775d4d60496da-Paper.pdf.
[44] Y. Wang, H. Fang, J. Jin, G. Ma, X. He, X. Dai, Z. Yue, C. Cheng, H.-T. Zhang, D. Pu, D. Wu, Y. Yuan, J. Gonçalves, J. Kurths, and H. Ding, Data-driven discovery of stochastic differential equations, Engineering, 17 (2022), pp. 244–252, https://doi.org/10.1016/j.eng.2022.02.007.
[45] E. Weinan, Principles of multiscale modeling, Cambridge University Press, 2011.
[46] Z. Xu, Y. Chen, Q. Chen, and D. Xiu, Modeling unknown stochastic dynamical system via autoencoder, arXiv preprint arXiv:2312.10001, (2023).
[47] L. Yang, C. Daskalakis, and G. E. Karniadakis, Generative ensemble regression: Learning particle dynamics from observations of ensembles with physics-informed deep generative models, SIAM Journal on Scientific Computing, 44 (2022), pp. B80–B99, https://doi.org/10.1137/21M1413018.
[48] C. Yildiz, M. Heinonen, J. Intosalmi, H. Mannerstrom, and H. Lahdesmaki, Learning stochastic differential equations with gaussian processes without gradient matching, in 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), IEEE, 2018, pp. 1–6.
[49] J. Zhang, S. Zhang, and G. Lin, Multiauto-deeponet: A multi-resolution autoencoder deeponet for nonlinear dimension reduction, uncertainty quantification and operator learning of forward and inverse stochastic problems, arXiv preprint arXiv:2204.03193, (2022).
[50] A. Zhu and Q. Li, Dyngma: a robust approach for learning stochastic differential equations from data, arXiv preprint arXiv:2402.14475, (2024).