Linear Noise Approximation Assisted Bayesian Inference on Mechanistic Model of Partially Observed Stochastic Reaction Network

Wandi Xu Northeastern University, Boston, MA 02115 Wei Xie Corresponding author: [email protected] Northeastern University, Boston, MA 02115
Abstract

To support online mechanism learning and facilitate digital twin development for biomanufacturing processes, this paper develops an efficient Bayesian inference approach for the partially observed enzymatic stochastic reaction network (SRN), a fundamental building block of multi-scale bioprocess mechanistic models. The mechanistic model is built on nonlinear stochastic differential equations (SDEs), its state is only partially observed, and the observations are subject to measurement errors. To tackle these challenges, we propose an interpretable Bayesian updating linear noise approximation (LNA) metamodel, which incorporates the structure information of the mechanistic model, to approximate the likelihood of the observations. An efficient posterior sampling approach is then developed that uses the gradients of the derived likelihood to speed up the convergence of Markov chain Monte Carlo (MCMC). The empirical study demonstrates that the proposed approach has promising performance.

Keywords Partially observed stochastic reaction network · Bayesian inference · Linear noise approximation · Metropolis-adjusted Langevin algorithm

1 INTRODUCTION

A partially observed stochastic reaction network (SRN), which models the dynamics of a population of interacting species such as chemical molecules participating in multiple reactions, is the fundamental building block of multi-scale bioprocess mechanistic models that characterize the causal interdependencies from molecular- to macro-kinetics. It plays a critical role in: (1) facilitating digital twin development and supporting mechanism learning for biomanufacturing processes; (2) allowing us to probe critical latent states based on partially observed information; and (3) serving as a fundamental model for a biofoundry platform [1] that can integrate heterogeneous online and offline measurements collected from different manufacturing processes and speed up bioprocess development with far fewer experiments. Model inference on the SRN mechanistic model based on heterogeneous data also helps to strengthen the theoretical foundations of federated learning on bioprocess mechanisms, through which we can train and advance knowledge.

The SRN mechanistic model has three key features that make model inference challenging. First, the continuous-time state transition model, representing the evolution of the concentration or number of molecules, is highly nonlinear. At any time, the reaction rates, which characterize the regulation mechanisms of the enzymatic reaction network, are functions of the random state. We adopt the diffusion approximation in [2] to model the state dynamics with a set of coupled stochastic differential equations (SDEs). In this case, the state transition model is doubly stochastic, making it analytically intractable to obtain the state transition densities at different times and hard to obtain a closed-form likelihood of the observations. Second, since the state is partially observed, we need to integrate out the unobserved state variables to get the likelihood. Third, the data collected from biomanufacturing processes are heterogeneous and subject to measurement errors.

Model inference for enzymatic SRNs has attracted increasing interest, especially in biomanufacturing digital twin development. Even when the reaction network structure is known, building on a thousand years of understanding of biological system mechanisms, the mechanistic model parameters are often unknown. It is necessary to infer these parameters from the observations collected from biomanufacturing processes. Since each production batch can be expensive, we often have only a very small number of experimental observations. Coupled with the high stochasticity of biomanufacturing processes, the model uncertainty tends to be high. However, frequentist approaches to model uncertainty quantification, such as asymptotic normality and the bootstrap, are built on asymptotic approximations, which are hard to justify with so few observations. Thus, in this paper, we focus on Bayesian inference for the multi-scale mechanistic model, which can support online learning and interpretability.

An enormous volume of literature has been dedicated to Bayesian inference for SRN mechanistic models. Because the exact state transition density of the SDE-based mechanistic model is unknown and the state is only partially observed, the marginal likelihood that integrates out the unobserved state variables is intractable. Thus, many existing works are sampling approaches that avoid explicit calculation of the likelihood, such as approximate Bayesian computation (ABC) and its variants [3]. However, for such likelihood-free sampling approaches, the complex structure and high stochasticity of the SRN make it computationally expensive to simulate the large number of sample paths required, and the acceptance rates of the samplers are very low. Therefore, we construct a metamodel to approximate the state transition densities of the SDEs, obtain a likelihood approximation, and use it to speed up Bayesian inference.

The Gaussian process (GP) is often used as a metamodel. The studies [4] and [5] use GPs as priors for nonparametric estimation of the drift and diffusion terms of SDEs without exact knowledge of their functional forms. In this paper, we suppose the structure of the SDE-based mechanistic model is known. To fully exploit this structure information and improve the interpretability of the constructed metamodel, we draw on inference for dynamic systems based on deterministic ordinary differential equations (ODEs). In particular, [6] specifies a GP prior over the solution to the ODE and restricts the GP to a manifold that satisfies the ODE system, to address the incompatibility between the metamodel and the mechanistic model. An alternative to the GP under an SDE-based model is the linear noise approximation (LNA). The LNA was originally proposed to approximate the solution of the chemical master equation (CME) [7], and it can be derived in a number of ways. For instance, [8] and [9] follow the idea of an asymptotic system size expansion and derive the LNA by approximating the CME through a Taylor expansion. Since the solution of an SDE is itself a random variable, it is difficult to extend the GP approach developed in [6] to SDE model inference. Instead, following [10], we specify the derived LNA as a prior on the solution of the SDE, through which we take full advantage of the structure information provided by the SDE model without time-consuming numerical integration.

The likelihood of the observations can be obtained under the LNA, but the exact Bayesian posterior is still not analytically tractable because a conjugate prior is hard to find. One thus resorts to sampling approaches to generate samples from the posterior. The most common is Markov chain Monte Carlo (MCMC), such as the Metropolis-Hastings algorithm, whose effectiveness depends heavily on the choice of the proposal distribution. The Metropolis-adjusted Langevin algorithm (MALA) uses the gradient information of the target posterior distribution to construct a better proposal distribution, and it has been shown to mix faster than classic MCMC [11]. Therefore, in this paper, we specifically tailor a MALA procedure to generate posterior samples more efficiently.
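To make the role of the gradient concrete, the following is a minimal generic MALA sketch in Python. It is not the tailored procedure developed in this paper: the target is a toy standard bivariate Gaussian standing in for the posterior, and the function name `mala`, the step size, and the chain length are illustrative assumptions.

```python
import numpy as np

def mala(log_target, grad_log_target, theta0, step, n_iter, rng):
    """Generic Metropolis-adjusted Langevin sampler for a differentiable log-density."""
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_iter, theta.size))
    n_accept = 0

    def log_q(to, frm):
        # Log-density (up to a constant) of the Langevin proposal
        # N(frm + step/2 * grad log pi(frm), step * I), evaluated at `to`.
        mean = frm + 0.5 * step * grad_log_target(frm)
        return -np.sum((to - mean) ** 2) / (2.0 * step)

    for i in range(n_iter):
        # Gradient-informed proposal: drift towards higher density plus Gaussian noise.
        mean = theta + 0.5 * step * grad_log_target(theta)
        prop = mean + np.sqrt(step) * rng.standard_normal(theta.size)
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_alpha = (log_target(prop) + log_q(theta, prop)
                     - log_target(theta) - log_q(prop, theta))
        if np.log(rng.uniform()) < log_alpha:
            theta = prop
            n_accept += 1
        samples[i] = theta
    return samples, n_accept / n_iter

# Toy target (assumption): standard bivariate Gaussian, not the SRN posterior.
rng = np.random.default_rng(0)
samples, acc_rate = mala(lambda th: -0.5 * np.sum(th ** 2), lambda th: -th,
                         theta0=np.zeros(2), step=0.5, n_iter=5000, rng=rng)
```

In an actual application, `log_target` and `grad_log_target` would be replaced by the log-posterior and its gradient under the approximate likelihood; the accept/reject step keeps the chain targeting the intended distribution even though the proposal is only a discretized Langevin move.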

Specifically, we propose an LNA assisted Bayesian inference approach for the nonlinear multivariate SDE-based mechanistic model with partially observed state and measurement errors. The main contributions are twofold. First, an interpretable Bayesian updating LNA metamodel is developed for likelihood approximation. It provides a coherent way to simultaneously satisfy the SDE model and fit the observed data, allowing us to probe critical latent states based on partially observed information. Second, the proposed MALA procedure uses the gradient information from the derived likelihood to speed up the MCMC search and generate posterior samples more efficiently. The proposed Bayesian inference for SRN can support online mechanism learning, facilitate digital twin development, and speed up bioprocess design and control.

The paper is organized as follows. We provide a brief introduction of the SDE-based mechanistic model for enzymatic SRN and problem description in Section 2. To facilitate the model Bayesian inference, the LNA is used to construct the state transition densities and a closed form likelihood is thus derived in Section 3. Then, we propose an efficient and interpretable Bayesian posterior sampling algorithm in Section 4. Its performance is studied in Section 5. Finally, we conclude the paper in Section 6.

2 STOCHASTIC REACTION NETWORK (SRN) MODEL AND PROBLEM DESCRIPTION

(1) SRN model. We first review a general SRN composed of $J$ species, denoted by $\boldsymbol{X}=(X_{1},X_{2},\ldots,X_{J})^{\top}$, interacting with each other through $K$ reactions. The number of molecules of species $j$ at time $t$ is denoted by $x_{j}(t)$ and $\boldsymbol{x}(t)=\left(x_{1}(t),x_{2}(t),\ldots,x_{J}(t)\right)^{\top}$. Each reaction is characterized by a nonzero reaction vector $\boldsymbol{C}_{k}\in\mathbb{R}^{J}$ for $k=1,2,\ldots,K$, describing the change in the numbers of $J$ species' molecules when a $k$-th molecular reaction occurs. The associated propensity function, denoted by $\omega_{k}$, describes the probability with which the $k$-th reaction occurs per time unit. Specifically, for the $k$-th reaction equation given by

$$p_{k1}X_{1}+p_{k2}X_{2}+\cdots+p_{kJ}X_{J}\xrightarrow{\omega_{k}}q_{k1}X_{1}+q_{k2}X_{2}+\cdots+q_{kJ}X_{J},$$

the reaction relational structure, specified by $\boldsymbol{C}_{k}=(q_{k1}-p_{k1},q_{k2}-p_{k2},\ldots,q_{kJ}-p_{kJ})^{\top}$, is known for $k=1,2,\ldots,K$. Thus, the stoichiometry matrix $\boldsymbol{C}=\left(\boldsymbol{C}_{1},\boldsymbol{C}_{2},\ldots,\boldsymbol{C}_{K}\right)\in\mathbb{R}^{J\times K}$ characterizes the structure information of the reaction network composed of $K$ reactions, where its $(i,j)$-th element represents the number of molecules of the $i$-th species that are either consumed (indicated by a negative value) or produced (indicated by a positive value) in each random occurrence of the $j$-th reaction.
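As a small illustration of how the stoichiometry matrix is assembled from the reactant and product coefficients, the sketch below uses a hypothetical three-reaction enzymatic network (E + S -> ES, ES -> E + S, ES -> E + P). The network, the species ordering, and the array names `P` and `Q` are assumptions made for this example only.

```python
import numpy as np

species = ["E", "S", "ES", "P"]          # J = 4 species (hypothetical example)

# Row k holds the reactant coefficients p_k1, ..., p_kJ of reaction k.
P = np.array([[1, 1, 0, 0],              # E + S -> ES
              [0, 0, 1, 0],              # ES -> E + S
              [0, 0, 1, 0]])             # ES -> E + P
# Row k holds the product coefficients q_k1, ..., q_kJ of reaction k.
Q = np.array([[0, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 1]])

# Column k of C is the reaction vector C_k = (q_k1 - p_k1, ..., q_kJ - p_kJ)^T.
C = (Q - P).T                            # shape (J, K) = (4, 3)
```

Each column of `C` is one reaction vector, and each row shows how a single species is consumed or produced across reactions. In this example the rows for E and ES sum to zero, which reflects conservation of total enzyme.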

Then, we describe the state transition model for bioprocess. As a multi-scale bioprocess representing the dependence from molecular- to macro-kinetics, it is built on the fundamental building block, i.e., molecular reaction network. Let $d\boldsymbol{R}(t)$ represent a $K$-dimensional vector of occurrences of each molecular reaction in an infinitesimal time interval $(t,t+dt]$. It follows a distribution with parameters depending on the propensity functions $\boldsymbol{\omega}(\boldsymbol{x}(t);\boldsymbol{\theta})=\left(\omega_{1}(\boldsymbol{x}(t);\boldsymbol{\theta}_{1}),\omega_{2}(\boldsymbol{x}(t);\boldsymbol{\theta}_{2}),\ldots,\omega_{K}(\boldsymbol{x}(t);\boldsymbol{\theta}_{K})\right)^{\top}$, where the structure of each $\omega_{k}(\boldsymbol{x}(t);\boldsymbol{\theta}_{k})$, characterizing the bioprocess regulation mechanism for the $k$-th molecular reaction, is given and we focus on the inference of the unknown parameters $\boldsymbol{\theta}=(\boldsymbol{\theta}_{1}^{\top},\boldsymbol{\theta}_{2}^{\top},\ldots,\boldsymbol{\theta}_{K}^{\top})^{\top}$.

Because reaction events change species numbers by integer amounts, the state transition model is naturally characterized by a continuous-time Markov jump process [12]. In particular, assuming that two reactions cannot occur at exactly the same time, one can represent the number of occurrences of the $k$-th reaction in an infinitesimal time interval $(t,t+dt]$, denoted by $dR_{k}(t)$ (i.e., the $k$-th component of $d\boldsymbol{R}(t)$), using one of the most elementary counting processes, namely, the nonhomogeneous Poisson process. Since the change of the propensity function in any infinitesimal time interval $(t,t+dt]$ is negligible, the intensity of $dR_{k}(t)$ becomes $\omega_{k}(\boldsymbol{x}(t);\boldsymbol{\theta}_{k})dt$. Conditional on $\boldsymbol{x}(t)$, the increments $dR_{k}(t)$ for $k=1,2,\ldots,K$ can be considered independent of one another and are $\mathrm{Poisson}(\omega_{k}(\boldsymbol{x}(t);\boldsymbol{\theta}_{k})dt)$ random variables, from which we have $\mathbb{E}(d\boldsymbol{R}(t))=\boldsymbol{\omega}(\boldsymbol{x}(t);\boldsymbol{\theta})dt$ and $\mathrm{Cov}(d\boldsymbol{R}(t))=\mathrm{diag}\{\boldsymbol{\omega}(\boldsymbol{x}(t);\boldsymbol{\theta})\}dt$. Under the Poisson assumption, we adopt the diffusion approximation to the Markov jump process following [2] and then model $d\boldsymbol{R}(t)$ with an Itô SDE, i.e.,

$$d\boldsymbol{R}(t)=\mathbb{E}(d\boldsymbol{R}(t))+\left\{\mathrm{Cov}(d\boldsymbol{R}(t))\right\}^{\frac{1}{2}}d\boldsymbol{B}(t)=\boldsymbol{\omega}(\boldsymbol{x}(t);\boldsymbol{\theta})dt+\left\{\mathrm{diag}\{\boldsymbol{\omega}(\boldsymbol{x}(t);\boldsymbol{\theta})\}\right\}^{\frac{1}{2}}d\boldsymbol{B}(t),$$

where $d\boldsymbol{B}(t)$ is the increment of a $K$-dimensional standard Brownian motion. Given the reaction network structure specified by the stoichiometry matrix $\boldsymbol{C}$, the impact on the process dynamics becomes

$$d\boldsymbol{x}(t)=\boldsymbol{C}d\boldsymbol{R}(t)=\boldsymbol{C}\boldsymbol{\omega}(\boldsymbol{x}(t);\boldsymbol{\theta})dt+\left\{\boldsymbol{C}\,\mathrm{diag}\{\boldsymbol{\omega}(\boldsymbol{x}(t);\boldsymbol{\theta})\}\boldsymbol{C}^{\top}\right\}^{\frac{1}{2}}d\boldsymbol{B}(t). \tag{1}$$
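The drift and diffusion terms in Equation (1) follow from the linear map between reaction occurrences and state changes. As a short worked check, using only the conditional Poisson moments stated above, the first two moments of the state increment are:

```latex
\begin{aligned}
\mathbb{E}\left(d\boldsymbol{x}(t)\right)
  &= \boldsymbol{C}\,\mathbb{E}\left(d\boldsymbol{R}(t)\right)
   = \boldsymbol{C}\boldsymbol{\omega}(\boldsymbol{x}(t);\boldsymbol{\theta})\,dt, \\
\mathrm{Cov}\left(d\boldsymbol{x}(t)\right)
  &= \boldsymbol{C}\,\mathrm{Cov}\left(d\boldsymbol{R}(t)\right)\boldsymbol{C}^{\top}
   = \boldsymbol{C}\,\mathrm{diag}\{\boldsymbol{\omega}(\boldsymbol{x}(t);\boldsymbol{\theta})\}\,\boldsymbol{C}^{\top}dt.
\end{aligned}
```

The diffusion coefficient in Equation (1) is the matrix square root of this $J\times J$ covariance rate, so the noise term reproduces the covariance of $\boldsymbol{C}d\boldsymbol{R}(t)$.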

For both theoretical study and practical application purposes, the system is assumed to have a size parameter $\Omega$ (such as the volume of bioreactor). Then $s_{j}(t)=x_{j}(t)/\Omega$ represents the concentration of molecules of species $j$. At any time $t$, let $\boldsymbol{s}(t)=(s_{1}(t),s_{2}(t),\ldots,s_{J}(t))^{\top}=\Omega^{-1}\boldsymbol{x}(t)$ be the bioprocess state. And the propensity functions $\omega_{k}(\boldsymbol{x}(t);\boldsymbol{\theta}_{k})$ for $k=1,2,\ldots,K$ can be written as

$$\omega_{k}(\boldsymbol{x}(t);\boldsymbol{\theta}_{k})=\Omega v_{k}\left(\Omega^{-1}\boldsymbol{x}(t);\boldsymbol{\theta}_{k}\right)=\Omega v_{k}(\boldsymbol{s}(t);\boldsymbol{\theta}_{k}), \tag{2}$$

where $v_{k}$ is the reaction rate associated with the $k$-th reaction, specified by the parameters $\boldsymbol{\theta}_{k}$ and depending on the current system state $\boldsymbol{s}(t)$. By plugging the relation between the propensity function and the reaction rate (i.e., Equation (2)) into Equation (1), we get the state transition,

$$\begin{aligned}
d\boldsymbol{s}(t)=\Omega^{-1}d\boldsymbol{x}(t)
&=\boldsymbol{C}\boldsymbol{v}(\boldsymbol{s}(t);\boldsymbol{\theta})dt+\Omega^{-\frac{1}{2}}\left\{\boldsymbol{C}\,\mathrm{diag}\{\boldsymbol{v}(\boldsymbol{s}(t);\boldsymbol{\theta})\}\boldsymbol{C}^{\top}\right\}^{\frac{1}{2}}d\boldsymbol{B}(t)\\
&\triangleq\boldsymbol{\mu}(\boldsymbol{s}(t);\boldsymbol{\theta})dt+\Omega^{-\frac{1}{2}}\left\{\boldsymbol{D}(\boldsymbol{s}(t);\boldsymbol{\theta})\right\}^{\frac{1}{2}}d\boldsymbol{B}(t),
\end{aligned} \tag{3}$$

where $\boldsymbol{v}(\boldsymbol{s}(t);\boldsymbol{\theta})=\left(v_{1}(\boldsymbol{s}(t);\boldsymbol{\theta}_{1}),v_{2}(\boldsymbol{s}(t);\boldsymbol{\theta}_{2}),\ldots,v_{K}(\boldsymbol{s}(t);\boldsymbol{\theta}_{K})\right)^{\top}$ is the reaction rate vector. Equation (3) also represents the doubly stochastic property of SRN, that is, both mean $\boldsymbol{\mu}(\boldsymbol{s}(t);\boldsymbol{\theta})$ and variance $\boldsymbol{D}(\boldsymbol{s}(t);\boldsymbol{\theta})$ are functions of the current system state $\boldsymbol{s}(t)$ and characterized by the parameters $\boldsymbol{\theta}$, while $\boldsymbol{s}(t)$ is a random state vector that changes over time and its evolution (i.e., $d\boldsymbol{s}(t)$) is characterized by $\boldsymbol{\mu}(\boldsymbol{s}(t);\boldsymbol{\theta})$ and $\boldsymbol{D}(\boldsymbol{s}(t);\boldsymbol{\theta})$.
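To see how a sample path of Equation (3) could be generated, the sketch below applies a plain Euler-Maruyama discretization. It is an illustration under stated assumptions rather than the simulation scheme used in this paper: the three-reaction enzymatic network, the mass-action rate function `michaelis_menten_rates`, and all parameter values are hypothetical. The noise is drawn as $\boldsymbol{C}\,\mathrm{diag}\{\boldsymbol{v}\}^{1/2}$ times a $K$-dimensional Brownian increment, which has the same covariance $\boldsymbol{C}\,\mathrm{diag}\{\boldsymbol{v}\}\boldsymbol{C}^{\top}dt$ as the diffusion term in Equation (3) and avoids computing a matrix square root.

```python
import numpy as np

def michaelis_menten_rates(s, theta):
    """Hypothetical mass-action rates v(s; theta) for E+S->ES, ES->E+S, ES->E+P."""
    e, sub, es, _ = s
    k1, k2, k3 = theta
    return np.array([k1 * e * sub, k2 * es, k3 * es])

def euler_maruyama(s0, theta, C, rates, Omega, dt, n_steps, rng):
    """Simulate ds = C v dt + Omega^{-1/2} C diag(v)^{1/2} dB on a fixed time grid."""
    s = np.asarray(s0, dtype=float)
    path = np.empty((n_steps + 1, s.size))
    path[0] = s
    for n in range(n_steps):
        v = np.maximum(rates(s, theta), 0.0)                  # keep rates non-negative for the sqrt
        dB = np.sqrt(dt) * rng.standard_normal(C.shape[1])    # K-dimensional Brownian increment
        # (C * sqrt(v)) scales column k of C by sqrt(v_k), i.e. C diag(v)^{1/2}.
        s = s + C @ v * dt + (C * np.sqrt(v)) @ dB / np.sqrt(Omega)
        path[n + 1] = s
    return path

# Stoichiometry matrix for species (E, S, ES, P) and the three reactions above.
C = np.array([[-1, 1, 1],
              [-1, 1, 0],
              [1, -1, -1],
              [0, 0, 1]], dtype=float)
rng = np.random.default_rng(1)
path = euler_maruyama([1.0, 2.0, 0.5, 0.0], (1.0, 0.5, 1.0), C,
                      michaelis_menten_rates, Omega=1000.0, dt=0.01,
                      n_steps=200, rng=rng)
```

Because both the drift and the noise enter through `C`, linear conservation laws of the network (here, total enzyme E + ES) are preserved along the simulated path up to floating-point error. A larger `Omega` shrinks the noise term, consistent with the $\Omega^{-1/2}$ scaling in Equation (3).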

(2) Partially observed state and heterogeneous data collection. The measures of partially observed state variables are often heterogeneous and subject to measurement errors. The observations for different observable state components are also asynchronous; see Figure 1(a). In particular, we represent all observation times of state as the time set, denoted by 𝑻={t0,t1,,tH}𝑻subscript𝑡0subscript𝑡1subscript𝑡𝐻\boldsymbol{T}=\{t_{0},t_{1},\ldots,t_{H}\}bold_italic_T = { italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_t start_POSTSUBSCRIPT italic_H end_POSTSUBSCRIPT }, where t0<t1<<tHsubscript𝑡0subscript𝑡1subscript𝑡𝐻t_{0}<t_{1}<\cdots<t_{H}italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT < italic_t start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT < ⋯ < italic_t start_POSTSUBSCRIPT italic_H end_POSTSUBSCRIPT, and the time intervals Δth=th+1thΔsubscript𝑡subscript𝑡1subscript𝑡\Delta t_{h}=t_{h+1}-t_{h}roman_Δ italic_t start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT = italic_t start_POSTSUBSCRIPT italic_h + 1 end_POSTSUBSCRIPT - italic_t start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT can be variable for h=0,1,,H101𝐻1h=0,1,\ldots,H-1italic_h = 0 , 1 , … , italic_H - 1. 
At each observation time thsubscript𝑡t_{h}italic_t start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT, we denote the set of observed components’ subscripts of underlying state 𝒔𝒔\boldsymbol{s}bold_italic_s by 𝑱hsubscript𝑱\boldsymbol{J}_{h}bold_italic_J start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT, i.e., 𝑱h={j[J]:sj is observed at time th}subscript𝑱conditional-set𝑗delimited-[]𝐽subscript𝑠𝑗 is observed at time subscript𝑡\boldsymbol{J}_{h}=\{j\in[J]:s_{j}\text{ is observed at time }t_{h}\}bold_italic_J start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT = { italic_j ∈ [ italic_J ] : italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT is observed at time italic_t start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT } where [J]delimited-[]𝐽[J][ italic_J ] represents {1,2,,J}12𝐽\{1,2,\ldots,J\}{ 1 , 2 , … , italic_J }, and let 𝑱y=h=0H𝑱hsubscript𝑱𝑦superscriptsubscript0𝐻subscript𝑱\boldsymbol{J}_{y}=\cup_{h=0}^{H}\boldsymbol{J}_{h}bold_italic_J start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT = ∪ start_POSTSUBSCRIPT italic_h = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT bold_italic_J start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT be the set of subscripts of the components that can be observed at certain times of experiments. 
The observations are denoted by 𝒚h(th)M|𝑱h|subscript𝒚subscript𝑡superscript𝑀subscript𝑱\boldsymbol{y}_{h}(t_{h})\in\mathbb{R}^{M|\boldsymbol{J}_{h}|}bold_italic_y start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_M | bold_italic_J start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT | end_POSTSUPERSCRIPT, where |𝑱h|Jsubscript𝑱𝐽|\boldsymbol{J}_{h}|\leq J| bold_italic_J start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT | ≤ italic_J is the cardinality of 𝑱hsubscript𝑱\boldsymbol{J}_{h}bold_italic_J start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT representing the dimension of observed components of underlying state 𝒔𝒔\boldsymbol{s}bold_italic_s at time thsubscript𝑡t_{h}italic_t start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT, and M𝑀Mitalic_M is the batch size of experiments. Then, the observations at time thsubscript𝑡t_{h}italic_t start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT can be modeled as

𝒚h(th)=𝑮h𝒔(th)+ϵh(th).subscript𝒚subscript𝑡subscript𝑮𝒔subscript𝑡subscriptbold-italic-ϵsubscript𝑡\boldsymbol{y}_{h}(t_{h})=\boldsymbol{G}_{h}\boldsymbol{s}(t_{h})+\boldsymbol{% \epsilon}_{h}(t_{h}).bold_italic_y start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ) = bold_italic_G start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT bold_italic_s ( italic_t start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ) + bold_italic_ϵ start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ) . (4)

Suppose the measurement errors follow a multivariate Gaussian distribution $\boldsymbol{\epsilon}_{h}(t_{h})\sim\mathcal{N}(\boldsymbol{0},\boldsymbol{\Sigma}_{h})$, where $\boldsymbol{\Sigma}_{h}$ is a diagonal matrix with $M$ copies of the vector $\boldsymbol{\sigma}_{h}$ on the main diagonal, and $\boldsymbol{\sigma}_{h}=\left\{\sigma_{jj}:j\in\boldsymbol{J}_{h}\right\}^{\top}$ is the vector of measurement error levels at time $t_{h}$. Further, let $\boldsymbol{\sigma}=\left\{\sigma_{jj}:j\in\boldsymbol{J}_{y}\right\}^{\top}$ be the vector of measurement error levels of all observed components. The matrix $\boldsymbol{G}_{h}$ is an $M|\boldsymbol{J}_{h}|$-by-$J$ constant matrix that maps the full $J$-dimensional state vector $\boldsymbol{s}(t_{h})$ to the $M$ batches of $|\boldsymbol{J}_{h}|$-dimensional vectors containing only the components observed at time $t_{h}$. Notice that the dimension $|\boldsymbol{J}_{h}|$ can change across observation times, accounting for the fact that the measurements of the partially observed state are asynchronous.
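To make the observation model (4) concrete, the following minimal sketch builds a selection matrix and draws one noisy observation. It assumes that $\boldsymbol{G}_{h}$ stacks $M$ identical copies of a 0/1 row selector and that the entries of $\boldsymbol{\sigma}_{h}$ are used directly as the diagonal of $\boldsymbol{\Sigma}_{h}$ (i.e., as variances). The function names `selection_matrix` and `simulate_observation` and all toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def selection_matrix(observed_idx, J, M=1):
    """Assumed form of G_h: M stacked copies of a |J_h|-by-J 0/1 selector."""
    k = len(observed_idx)
    S = np.zeros((k, J))
    S[np.arange(k), observed_idx] = 1.0   # row r picks state component observed_idx[r]
    return np.vstack([S] * M)             # shape (M*|J_h|, J)

def simulate_observation(s, observed_idx, sigma_h, M=1, rng=None):
    """Draw y_h = G_h s + eps_h with eps_h ~ N(0, Sigma_h), as in Equation (4)."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(s, dtype=float)
    G = selection_matrix(observed_idx, s.size, M)
    # M copies of sigma_h on the diagonal (treated as variances, by assumption)
    diag = np.tile(np.asarray(sigma_h, dtype=float), M)
    eps = rng.normal(0.0, np.sqrt(diag))
    return G @ s + eps, G, np.diag(diag)

# Toy example: J = 4 state components, components 0 and 2 observed, M = 2 batches.
s_true = np.array([10.0, 5.0, 2.0, 1.0])
y, G, Sigma = simulate_observation(s_true, [0, 2], sigma_h=[0.1, 0.2], M=2,
                                   rng=np.random.default_rng(0))
print(G.shape, y.shape)
```

Changing `observed_idx` between calls mimics the asynchronous case in which $|\boldsymbol{J}_{h}|$ differs across observation times.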

Given the observed data set denoted by $\mathcal{D}_{M}=\{\boldsymbol{y}_{h}(t_{h})\}_{h=0}^{H}$, the model uncertainty is quantified by the posterior distribution $p\left(\boldsymbol{\theta},\boldsymbol{\sigma}|\mathcal{D}_{M}\right)\propto p\left(\boldsymbol{\theta}\right)p\left(\boldsymbol{\sigma}\right)p\left(\mathcal{D}_{M}|\boldsymbol{\theta},\boldsymbol{\sigma}\right)$. With the collection of new experiment data $\Delta\mathcal{D}$, the model uncertainty can be updated as $p\left(\boldsymbol{\theta},\boldsymbol{\sigma}|\mathcal{D}_{M}\cup\Delta\mathcal{D}\right)\propto p\left(\boldsymbol{\theta},\boldsymbol{\sigma}|\mathcal{D}_{M}\right)p\left(\Delta\mathcal{D}|\boldsymbol{\theta},\boldsymbol{\sigma}\right)$.
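The sequential update can be read off from Bayes' rule. The short derivation below is a sketch under one added assumption that is not stated explicitly above: the new data $\Delta\mathcal{D}$ are conditionally independent of $\mathcal{D}_{M}$ given $(\boldsymbol{\theta},\boldsymbol{\sigma})$, for example because they come from a separate experiment.

```latex
\begin{aligned}
p\left(\boldsymbol{\theta},\boldsymbol{\sigma}\,|\,\mathcal{D}_{M}\cup\Delta\mathcal{D}\right)
  &\propto p(\boldsymbol{\theta})\,p(\boldsymbol{\sigma})\,
     p\left(\mathcal{D}_{M},\Delta\mathcal{D}\,|\,\boldsymbol{\theta},\boldsymbol{\sigma}\right) \\
  &= p(\boldsymbol{\theta})\,p(\boldsymbol{\sigma})\,
     p\left(\mathcal{D}_{M}\,|\,\boldsymbol{\theta},\boldsymbol{\sigma}\right)
     p\left(\Delta\mathcal{D}\,|\,\mathcal{D}_{M},\boldsymbol{\theta},\boldsymbol{\sigma}\right) \\
  &\propto p\left(\boldsymbol{\theta},\boldsymbol{\sigma}\,|\,\mathcal{D}_{M}\right)
     p\left(\Delta\mathcal{D}\,|\,\boldsymbol{\theta},\boldsymbol{\sigma}\right).
\end{aligned}
```

The last line uses the conditional independence assumption to drop $\mathcal{D}_{M}$ from the second likelihood factor, so the current posterior plays the role of the prior for the new data.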

(3) Key challenges on Bayesian inference and summary of the proposed inference approach. Our focus in this paper is to develop a computationally efficient Bayesian inference approach for the unknown model parameters $\boldsymbol{\theta}\in\boldsymbol{\Theta}^{N}$, with emphasis on the nonlinear $\boldsymbol{\mu}(\boldsymbol{s}(t);\boldsymbol{\theta})$ and $\boldsymbol{D}(\boldsymbol{s}(t);\boldsymbol{\theta})$ characterizing the regulation mechanisms of the SRN as shown in (3), where $\boldsymbol{\Theta}^{N}\subset\mathbb{R}^{N}$ is the feasible parameter space. The first challenge is that the state is only partially observed and subject to random measurement error. The observed components of the system state $\boldsymbol{s}$ are recorded at a limited number of discrete time points, and the observation time points of the observable components may not be synchronized; see Figure 1(a). Moreover, some components of the state $\boldsymbol{s}$ are often unobservable. To tackle this challenge, we develop an interpretable Bayesian updating LNA metamodel on the underlying state $\boldsymbol{s}(t)$ in Section 3 so that we can predict all components of $\boldsymbol{s}(t)$ at any time $t$.

Such a metamodel needs to be able to characterize the dependence between components of $\boldsymbol{s}(t)$ and handle the double stochasticity of the SRN. Fortunately, we have an SDE model (3) representing the mechanism of state change. This brings us to the second challenge. On the one hand, because of the nonlinear drift and diffusion terms of the SDE (3), solving it directly to obtain the metamodel of $\boldsymbol{s}(t)$ requires time-consuming numerical integration methods. On the other hand, if we choose a black-box metamodel, the regulation mechanism structure information from the SDE cannot be fully exploited, which harms interpretability. To take full advantage of the structure information about the state transition provided by the SDE (3), and to avoid the use of numerical integration to solve these SDEs, we place LNA priors on the dynamics of the state $\boldsymbol{s}(t)$ to facilitate inference of the model parameters $\boldsymbol{\theta}$. Under the LNA, the underlying process $\{\boldsymbol{s}(t):t\geq 0\}$ follows a multivariate Gaussian distribution. Combined with the assumed linear Gaussian relation between each observation $\boldsymbol{y}_{h}(t_{h})$ and the underlying state value $\boldsymbol{s}(t_{h})$ in Equation (4), this gives a tractable approximation to the likelihood of the observations $\{\boldsymbol{y}_{h}(t_{h})\}_{h=0}^{H}$. The key to our approach is that, to avoid a poor approximation to the true distribution of $\boldsymbol{s}(t)$ as $t$ gets large, we reinitialize the LNA for each time interval $(t_{h},t_{h+1}]$ using the derived posterior distribution of $\boldsymbol{s}(t_{h})$ given $\boldsymbol{y}_{h}(t_{h}),\boldsymbol{y}_{h-1}(t_{h-1}),\ldots,\boldsymbol{y}_{0}(t_{0})$; see Figure 1(b).

Figure 1: An illustration of (a) the partially observed state with measurement error; and (b) the proposed interpretable Bayesian updating LNA metamodel for enzymatic stochastic reaction network (SRN).

3 BAYESIAN UPDATING LINEAR NOISE APPROXIMATION (LNA) METAMODEL

In this section, we first utilize the LNA to approximate the SDE model (3) and then develop a Bayesian updating LNA metamodel to reduce the approximation error between the true solution to the SDE (3) and the LNA model. The LNA divides the path $\{\boldsymbol{s}(t):t\geq 0\}$ of the SDE (3) into a deterministic path $\{\bar{\boldsymbol{s}}(t):t\geq 0\}$ and a stochastic perturbation $\{\boldsymbol{\xi}(t):t\geq 0\}$, where the fluctuations in $\boldsymbol{s}(t)$ at any given time $t$ are assumed to be of $O(\Omega^{-\frac{1}{2}})$; see [8] and [10] for a rigorous derivation and detailed discussion. Under this partition, through a Taylor expansion of the SDE (3) around $\bar{\boldsymbol{s}}(t)$ up to order $\Omega^{-\frac{1}{2}}$, we split the SDE (3) into one deterministic ODE with the solution $\bar{\boldsymbol{s}}(t)$ as shown in Equation (5),

$$d\bar{\boldsymbol{s}}(t)=\boldsymbol{\mu}(\bar{\boldsymbol{s}}(t);\boldsymbol{\theta})dt \tag{5}$$

with initial value $\bar{\boldsymbol{s}}(0)$, and one SDE whose solution $\boldsymbol{\xi}(t)$ follows a Gaussian distribution for any fixed or Gaussian distributed initial condition on $\boldsymbol{\xi}(0)$, denoted by $\boldsymbol{\xi}(t)\sim\mathcal{N}\left(\boldsymbol{\varphi}(t),\boldsymbol{\Psi}(t)\right)$. Its mean vector $\boldsymbol{\varphi}(t)$ and covariance matrix $\boldsymbol{\Psi}(t)$ for any $t\geq 0$ can be obtained by solving the ODEs in (6) and (7),

$$d\boldsymbol{\varphi}(t)=\nabla_{\boldsymbol{s}}\boldsymbol{\mu}(\boldsymbol{s};\boldsymbol{\theta})|_{\boldsymbol{s}=\bar{\boldsymbol{s}}(t)}\boldsymbol{\varphi}(t)dt, \tag{6}$$
$$d\boldsymbol{\Psi}(t)=\left\{\boldsymbol{\Psi}(t)\left(\nabla_{\boldsymbol{s}}\boldsymbol{\mu}(\boldsymbol{s};\boldsymbol{\theta})|_{\boldsymbol{s}=\bar{\boldsymbol{s}}(t)}\right)^{\top}+\nabla_{\boldsymbol{s}}\boldsymbol{\mu}(\boldsymbol{s};\boldsymbol{\theta})|_{\boldsymbol{s}=\bar{\boldsymbol{s}}(t)}\boldsymbol{\Psi}(t)+\boldsymbol{D}(\bar{\boldsymbol{s}}(t);\boldsymbol{\theta})\right\}dt, \tag{7}$$

with initial values $\boldsymbol{\varphi}(0)$ and $\boldsymbol{\Psi}(0)$. Without loss of generality, in the following discussion, we simplify the notation and assume a unit system size $\Omega=1$. Suppose the initial condition for the SDE (3) with $\Omega=1$ is $\boldsymbol{s}(0)\sim\mathcal{N}\left(\boldsymbol{\alpha}^{*}(0),\boldsymbol{\beta}^{*}(0)\right)$; then for arbitrary $\bar{\boldsymbol{s}}(0)$, we can set $\boldsymbol{\varphi}(0)=\boldsymbol{\alpha}^{*}(0)-\bar{\boldsymbol{s}}(0)$ and $\boldsymbol{\Psi}(0)=\boldsymbol{\beta}^{*}(0)$. Integrating the ODEs (5), (6), and (7) from time 0 to $t$ provides the LNA

$$\boldsymbol{s}(t)\sim\mathcal{N}\left(\bar{\boldsymbol{s}}(t)+\boldsymbol{\varphi}(t),\boldsymbol{\Psi}(t)\right). \tag{8}$$
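As an illustration of how the ODEs (5), (6), and (7) yield the Gaussian approximation (8), the sketch below integrates them jointly with a simple forward Euler scheme. The SRN drift and diffusion from (3) are not reproduced here, so the example uses a hypothetical one-species first-order decay model; the names `lna_euler`, `mu`, `jac`, `diff`, and all numerical values are assumptions made only for this illustration.

```python
import numpy as np

def lna_euler(mu, jac, diff, s_bar0, phi0, Psi0, t_end, n_steps):
    """Integrate ODEs (5)-(7) from 0 to t_end with forward Euler.

    mu(s) -> drift vector, jac(s) -> Jacobian of mu w.r.t. s,
    diff(s) -> diffusion matrix D(s). Returns the LNA mean and covariance in (8).
    """
    s_bar = np.array(s_bar0, dtype=float)
    phi = np.array(phi0, dtype=float)
    Psi = np.array(Psi0, dtype=float)
    dt = t_end / n_steps
    for _ in range(n_steps):
        F = jac(s_bar)                                    # Jacobian on the deterministic path
        ds = mu(s_bar) * dt                               # Equation (5)
        dphi = F @ phi * dt                               # Equation (6)
        dPsi = (Psi @ F.T + F @ Psi + diff(s_bar)) * dt   # Equation (7)
        s_bar, phi, Psi = s_bar + ds, phi + dphi, Psi + dPsi
    return s_bar + phi, Psi

# Hypothetical toy model (not the paper's SRN): first-order decay with rate theta.
theta = 0.5
mu = lambda s: -theta * s
jac = lambda s: np.array([[-theta]])
diff = lambda s: np.array([[theta * s[0]]])

# s(0) ~ N(alpha*(0), beta*(0)) with an arbitrary s_bar(0); phi(0) absorbs the offset.
alpha0, beta0 = np.array([100.0]), np.array([[0.0]])
s_bar0 = np.array([90.0])
mean, cov = lna_euler(mu, jac, diff, s_bar0, alpha0 - s_bar0, beta0,
                      t_end=1.0, n_steps=2000)
print(mean, cov)
```

For this linear toy drift the LNA mean $\bar{\boldsymbol{s}}(t)+\boldsymbol{\varphi}(t)$ does not depend on the choice of $\bar{\boldsymbol{s}}(0)$, whereas the covariance does, because $\boldsymbol{D}$ is evaluated along $\bar{\boldsymbol{s}}(t)$.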

Under the LNA model (8) on the partially observed state $\boldsymbol{s}(t_{h})$ with measurement error $\boldsymbol{\epsilon}_{h}(t_{h})$ as shown in (4) for $h=0,1,\ldots,H$, the likelihood of the observations $\mathcal{D}_{M}$ is tractable. In particular, the ODE components of the LNA (i.e., Equations (5), (6), and (7)) are solved once over the entire time interval for given initial values. However, the LNA can lead to a poor approximation to the true $\boldsymbol{s}(t)$, because the approximation error between the true solution to the SDE (3) and the LNA (8) gradually accumulates as $t$ gets large.

To tackle this issue, we construct the likelihood of the observations $\mathcal{D}_{M}$ by using the updated LNA model at each observation time point $t_{h}$ with $h=0,1,\ldots,H$. In particular, given an estimate of the model parameters $\boldsymbol{\theta}$ and the measurement error level $\boldsymbol{\sigma}$, we first set the LNA model (8) with the initial condition $\boldsymbol{s}(t_{0})\sim\mathcal{N}\left(\bar{\boldsymbol{s}}(t_{0})+\boldsymbol{\varphi}(t_{0}),\boldsymbol{\Psi}(t_{0})\right)$ as a prior, and then the observations $\boldsymbol{y}_{h}(t_{h})\in\mathcal{D}_{M}$ are used to sequentially update the prior on $\boldsymbol{s}(t_{h})$ for each $t_{h}$ with the procedure shown in Figure 1(b). Therefore, we can approximate the distribution of $\boldsymbol{y}_{h}(t_{h})$ given all observations up to time $t_{h}$ and obtain the likelihood. The detailed procedure is summarized in the following three steps.

Step 1: At the initial observation time point $t_{0}$, given the prior $\boldsymbol{s}(t_{0})\sim\mathcal{N}\left(\bar{\boldsymbol{s}}(t_{0})+\boldsymbol{\varphi}(t_{0}),\boldsymbol{\Psi}(t_{0})\right)$ and the observational uncertainty (4), we directly have

$$\boldsymbol{y}_{0}(t_{0})|\boldsymbol{\sigma}\sim\mathcal{N}\left(\boldsymbol{G}_{0}\left\{\bar{\boldsymbol{s}}(t_{0})+\boldsymbol{\varphi}(t_{0})\right\},\boldsymbol{G}_{0}\boldsymbol{\Psi}(t_{0})\boldsymbol{G}_{0}^{\top}+\boldsymbol{\Sigma}_{0}\right). \tag{9}$$

Combining the LNA prior of $\boldsymbol{s}(t_{0})$ with (9), we obtain the joint distribution of $\boldsymbol{s}(t_{0})$ and $\boldsymbol{y}_{0}(t_{0})$ as

$$\begin{pmatrix}\boldsymbol{s}(t_{0})\\ \boldsymbol{y}_{0}(t_{0})\end{pmatrix}\bigg|\boldsymbol{\sigma}\sim\mathcal{N}\left\{\begin{pmatrix}\bar{\boldsymbol{s}}(t_{0})+\boldsymbol{\varphi}(t_{0})\\ \boldsymbol{G}_{0}\left\{\bar{\boldsymbol{s}}(t_{0})+\boldsymbol{\varphi}(t_{0})\right\}\end{pmatrix},\begin{pmatrix}\boldsymbol{\Psi}(t_{0})&\boldsymbol{\Psi}(t_{0})\boldsymbol{G}_{0}^{\top}\\ \boldsymbol{G}_{0}\boldsymbol{\Psi}(t_{0})&\boldsymbol{G}_{0}\boldsymbol{\Psi}(t_{0})\boldsymbol{G}_{0}^{\top}+\boldsymbol{\Sigma}_{0}\end{pmatrix}\right\}.$$

By applying the conditional distribution properties of the multivariate Gaussian distribution, the posterior distribution of $\boldsymbol{s}(t_{0})$ is updated based on the observation $\boldsymbol{y}_{0}(t_{0})$, i.e.,

$$\boldsymbol{s}(t_{0})|\boldsymbol{y}_{0}(t_{0});\boldsymbol{\sigma}\sim\mathcal{N}\left(\boldsymbol{\alpha}(t_{0}),\boldsymbol{\beta}(t_{0})\right), \tag{10}$$

where

$$\boldsymbol{\alpha}(t_{0})=\bar{\boldsymbol{s}}(t_{0})+\boldsymbol{\varphi}(t_{0})+\boldsymbol{\Psi}(t_{0})\boldsymbol{G}_{0}^{\top}\left(\boldsymbol{G}_{0}\boldsymbol{\Psi}(t_{0})\boldsymbol{G}_{0}^{\top}+\boldsymbol{\Sigma}_{0}\right)^{-1}\left(\boldsymbol{y}_{0}(t_{0})-\boldsymbol{G}_{0}\left\{\bar{\boldsymbol{s}}(t_{0})+\boldsymbol{\varphi}(t_{0})\right\}\right), \tag{11}$$
$$\boldsymbol{\beta}(t_{0})=\boldsymbol{\Psi}(t_{0})-\boldsymbol{\Psi}(t_{0})\boldsymbol{G}_{0}^{\top}\left(\boldsymbol{G}_{0}\boldsymbol{\Psi}(t_{0})\boldsymbol{G}_{0}^{\top}+\boldsymbol{\Sigma}_{0}\right)^{-1}\boldsymbol{G}_{0}\boldsymbol{\Psi}(t_{0}). \tag{12}$$
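The update in Equations (9) to (12) is a standard Gaussian conditioning step. The following minimal sketch computes the posterior mean and covariance together with the log-density of the observation under (9). The function name `gaussian_update` and the toy numbers are hypothetical and serve only as an illustration; they are not taken from the paper.

```python
import numpy as np

def gaussian_update(m, P, G, Sigma, y):
    """Condition s ~ N(m, P) on y = G s + eps, eps ~ N(0, Sigma).

    Returns the posterior mean/covariance (Equations (11)-(12)) and the
    log-density of y under the predictive distribution (9).
    """
    resid = y - G @ m                       # y_0 - G_0 {s_bar + phi}
    S = G @ P @ G.T + Sigma                 # predictive covariance in (9)
    K = np.linalg.solve(S, G @ P).T         # P G^T S^{-1}; valid because P and S are symmetric
    alpha = m + K @ resid                   # Equation (11)
    beta = P - K @ G @ P                    # Equation (12)
    _, logdet = np.linalg.slogdet(S)
    loglik = -0.5 * (resid @ np.linalg.solve(S, resid)
                     + logdet + y.size * np.log(2.0 * np.pi))
    return alpha, beta, loglik

# Toy numbers (hypothetical): two state components, only the first is observed.
m = np.array([1.0, 2.0])                    # prior mean s_bar(t0) + phi(t0)
P = np.array([[2.0, 1.0], [1.0, 2.0]])      # prior covariance Psi(t0)
G = np.array([[1.0, 0.0]])
Sigma = np.array([[1.0]])
y = np.array([4.0])
alpha, beta, loglik = gaussian_update(m, P, G, Sigma, y)
print(alpha, beta, loglik)
```

In this toy case the unobserved second component is also shifted by the observation, through the off-diagonal entry of the prior covariance. This is how the update carries information to latent components of the state.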

Step 2: For the subsequent observation time points $t_{1},t_{2},\ldots,t_{H}$, we apply the idea of the Kalman filter to sequentially update the LNA prior for $\boldsymbol{s}(t_{h+1})$ and calculate the approximate $p\left(\boldsymbol{y}_{h+1}(t_{h+1})|\boldsymbol{y}_{h}(t_{h}),\ldots,\boldsymbol{y}_{0}(t_{0});\boldsymbol{\theta},\boldsymbol{\sigma}\right)$ recursively for $h=0,1,\ldots,H-1$. Specifically, we first reinitialize the initial values of the ODEs (5) and (7) to the posterior mean and covariance of $\boldsymbol{s}(t_{h})$, respectively. That is, set $\bar{\boldsymbol{s}}(t_{h})=\boldsymbol{\alpha}(t_{h})$ and $\boldsymbol{\Psi}(t_{h})=\boldsymbol{\beta}(t_{h})$. We also set $\boldsymbol{\varphi}(t_{h})=0$; because the ODE (6) is linear and homogeneous in $\boldsymbol{\varphi}$, it then stays at zero, so $\boldsymbol{\varphi}(t_{k})=0$ for all $k\geq h$. By integrating the ODEs (5) and (7) from time $t_{h}$ to $t_{h+1}$, we obtain $\bar{\boldsymbol{s}}(t_{h+1})$ and $\boldsymbol{\Psi}(t_{h+1})$. In practice we work with their discretized versions, given by the Euler method,

$$\bar{\boldsymbol{s}}(t_{h+1})=\bar{\boldsymbol{s}}(t_{h})+\boldsymbol{\mu}(\bar{\boldsymbol{s}}(t_{h});\boldsymbol{\theta})\Delta t_{h}, \tag{13}$$
$$\boldsymbol{\Psi}(t_{h+1})=\boldsymbol{\Psi}(t_{h})+\left\{\boldsymbol{\Psi}(t_{h})\left(\nabla_{\boldsymbol{s}}\boldsymbol{\mu}(\boldsymbol{s};\boldsymbol{\theta})|_{\boldsymbol{s}=\bar{\boldsymbol{s}}(t_{h})}\right)^{\top}+\nabla_{\boldsymbol{s}}\boldsymbol{\mu}(\boldsymbol{s};\boldsymbol{\theta})|_{\boldsymbol{s}=\bar{\boldsymbol{s}}(t_{h})}\boldsymbol{\Psi}(t_{h})+\boldsymbol{D}(\bar{\boldsymbol{s}}(t_{h});\boldsymbol{\theta})\right\}\Delta t_{h}. \tag{14}$$
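A single prediction step of the form (13) and (14) can be sketched as follows, starting from the reinitialized values $\bar{\boldsymbol{s}}(t_{h})=\boldsymbol{\alpha}(t_{h})$ and $\boldsymbol{\Psi}(t_{h})=\boldsymbol{\beta}(t_{h})$. The function name `euler_predict`, the first-order decay toy model, and the numbers are hypothetical placeholders; the drift and diffusion of the actual SRN in (3) would be substituted for `mu`, `jac`, and `diff`.

```python
import numpy as np

def euler_predict(alpha, beta, mu, jac, diff, dt):
    """One Euler step of (13)-(14), started from the posterior N(alpha, beta) at t_h."""
    s_bar = np.asarray(alpha, dtype=float)      # reinitialize s_bar(t_h) = alpha(t_h)
    Psi = np.asarray(beta, dtype=float)         # reinitialize Psi(t_h) = beta(t_h)
    F = jac(s_bar)                              # Jacobian of the drift at s_bar(t_h)
    s_next = s_bar + mu(s_bar) * dt                             # Equation (13)
    Psi_next = Psi + (Psi @ F.T + F @ Psi + diff(s_bar)) * dt   # Equation (14)
    return s_next, Psi_next

# Hypothetical toy model: first-order decay with rate theta.
theta = 0.5
mu = lambda s: -theta * s
jac = lambda s: np.array([[-theta]])
diff = lambda s: np.array([[theta * s[0]]])

s_next, Psi_next = euler_predict([100.0], [[4.0]], mu, jac, diff, dt=0.1)
print(s_next, Psi_next)
```

Because $\boldsymbol{\varphi}(t_{h})=0$ after reinitialization, only the deterministic path and the covariance need to be propagated in this step.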

As $\Delta t_{h}=t_{h+1}-t_{h}$ is often too large to be used as a time step in (13) and (14), we introduce $\Delta z_{h}=\Delta t_{h}/I_{h}$ for some positive integer $I_{h}\geq 1$. By choosing $I_{h}$ to be sufficiently large, we can make the discretization error associated with the Euler method arbitrarily small. That is, to compute $\bar{\boldsymbol{s}}(t_{h+1})$ and $\boldsymbol{\Psi}(t_{h+1})$ more accurately, we recursively calculate the following equations for $i=0,1,\ldots,I_{h}-1$,

$$\bar{\boldsymbol{s}}(t_{h}+(i+1)\Delta z_{h}) = \bar{\boldsymbol{s}}(t_{h}+i\Delta z_{h}) + \boldsymbol{\mu}(\bar{\boldsymbol{s}}(t_{h}+i\Delta z_{h});\boldsymbol{\theta})\Delta z_{h}, \tag{15}$$
$$\begin{aligned}
\boldsymbol{\Psi}(t_{h}+(i+1)\Delta z_{h}) ={}& \boldsymbol{\Psi}(t_{h}+i\Delta z_{h}) + \Big\{\boldsymbol{\Psi}(t_{h}+i\Delta z_{h})\left(\nabla_{\boldsymbol{s}}\boldsymbol{\mu}(\boldsymbol{s};\boldsymbol{\theta})|_{\boldsymbol{s}=\bar{\boldsymbol{s}}(t_{h}+i\Delta z_{h})}\right)^{\top} \\
&+ \nabla_{\boldsymbol{s}}\boldsymbol{\mu}(\boldsymbol{s};\boldsymbol{\theta})|_{\boldsymbol{s}=\bar{\boldsymbol{s}}(t_{h}+i\Delta z_{h})}\boldsymbol{\Psi}(t_{h}+i\Delta z_{h}) + \boldsymbol{D}(\bar{\boldsymbol{s}}(t_{h}+i\Delta z_{h});\boldsymbol{\theta})\Big\}\Delta z_{h}.
\end{aligned} \tag{16}$$
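The sub-stepped Euler recursion in (15) and (16) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lna_predict` and its callable arguments `mu`, `jac_mu`, and `D` (drift, drift Jacobian, and diffusion matrix) are hypothetical, and the toy model is a one-species degradation reaction with drift $-\theta s$ and diffusion $\theta s$, chosen only because its LNA mean and variance are known in closed form.

```python
import numpy as np

def lna_predict(s_bar, Psi, mu, jac_mu, D, dt, n_sub):
    """Propagate the LNA mean and covariance across one observation interval.

    Uses n_sub forward-Euler sub-steps of size dz = dt / n_sub, mirroring
    recursions (15) and (16). All argument names are illustrative assumptions.
    """
    dz = dt / n_sub
    s_bar = np.asarray(s_bar, dtype=float).copy()
    Psi = np.asarray(Psi, dtype=float).copy()
    for _ in range(n_sub):
        F = jac_mu(s_bar)                       # Jacobian of the drift at the current mean
        dPsi = Psi @ F.T + F @ Psi + D(s_bar)   # bracketed term in (16)
        Psi = Psi + dPsi * dz                   # covariance update (16)
        s_bar = s_bar + mu(s_bar) * dz          # mean update (15)
    return s_bar, Psi

# Toy example (assumption): degradation with rate theta, starting from a known state.
theta, s0 = 0.5, 100.0
s_pred, Psi_pred = lna_predict(
    s_bar=[s0],
    Psi=[[0.0]],
    mu=lambda s: -theta * s,
    jac_mu=lambda s: np.array([[-theta]]),
    D=lambda s: np.array([[theta * s[0]]]),
    dt=1.0,
    n_sub=10_000,
)
```

For this toy model the exact solutions are $\bar{s}(t)=s_0e^{-\theta t}$ and $\Psi(t)=s_0e^{-\theta t}(1-e^{-\theta t})$, so increasing `n_sub` (the role of $I_h$) should bring the Euler output closer to these values.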

Therefore, we get the updated LNA prior on $\boldsymbol{s}(t_{h+1})$ by applying (8), i.e.,

$$\boldsymbol{s}(t_{h+1})|\boldsymbol{y}_{h}(t_{h}),\ldots,\boldsymbol{y}_{0}(t_{0});\boldsymbol{\theta},\boldsymbol{\sigma}\sim\mathcal{N}\left(\bar{\boldsymbol{s}}(t_{h+1}),\boldsymbol{\Psi}(t_{h+1})\right). \tag{17}$$

Here, LNA gives us a Gaussian approximation to the transition density from $\boldsymbol{s}(t_{h})|\boldsymbol{y}_{h}(t_{h}),\ldots,\boldsymbol{y}_{0}(t_{0});\boldsymbol{\theta},\boldsymbol{\sigma}$ to $\boldsymbol{s}(t_{h+1})|\boldsymbol{y}_{h}(t_{h}),\ldots,\boldsymbol{y}_{0}(t_{0});\boldsymbol{\theta},\boldsymbol{\sigma}$. Then, based on the model of measurement uncertainty or error in (4), we get a one-step forecast of the observation $\boldsymbol{y}_{h+1}(t_{h+1})$ as

$$\boldsymbol{y}_{h+1}(t_{h+1})|\boldsymbol{y}_{h}(t_{h}),\ldots,\boldsymbol{y}_{0}(t_{0});\boldsymbol{\theta},\boldsymbol{\sigma}\sim\mathcal{N}\left(\boldsymbol{G}_{h+1}\bar{\boldsymbol{s}}(t_{h+1}),\boldsymbol{G}_{h+1}\boldsymbol{\Psi}(t_{h+1})\boldsymbol{G}_{h+1}^{\top}+\boldsymbol{\Sigma}_{h+1}\right). \tag{18}$$

Combining the distributions (17) and (18), we obtain the joint distribution as

$$\begin{pmatrix}\boldsymbol{s}(t_{h+1})\\ \boldsymbol{y}_{h+1}(t_{h+1})\end{pmatrix}\bigg|\boldsymbol{y}_{h}(t_{h}),\ldots,\boldsymbol{y}_{0}(t_{0});\boldsymbol{\theta},\boldsymbol{\sigma}\sim\mathcal{N}\left\{\begin{pmatrix}\bar{\boldsymbol{s}}(t_{h+1})\\ \boldsymbol{G}_{h+1}\bar{\boldsymbol{s}}(t_{h+1})\end{pmatrix},\begin{pmatrix}\boldsymbol{\Psi}(t_{h+1})&\boldsymbol{\Psi}(t_{h+1})\boldsymbol{G}_{h+1}^{\top}\\ \boldsymbol{G}_{h+1}\boldsymbol{\Psi}(t_{h+1})&\boldsymbol{G}_{h+1}\boldsymbol{\Psi}(t_{h+1})\boldsymbol{G}_{h+1}^{\top}+\boldsymbol{\Sigma}_{h+1}\end{pmatrix}\right\}.$$
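For reference, the step from this joint distribution to the posterior rests on the standard conditioning identity for jointly Gaussian vectors. The symbols $\boldsymbol{a}$, $\boldsymbol{b}$, $\boldsymbol{m}_a$, $\boldsymbol{m}_b$, $\boldsymbol{A}$, $\boldsymbol{B}$, $\boldsymbol{C}$ below are generic placeholders introduced only for this illustration and do not appear elsewhere in the paper:

$$\begin{pmatrix}\boldsymbol{a}\\ \boldsymbol{b}\end{pmatrix}\sim\mathcal{N}\left\{\begin{pmatrix}\boldsymbol{m}_a\\ \boldsymbol{m}_b\end{pmatrix},\begin{pmatrix}\boldsymbol{A}&\boldsymbol{C}\\ \boldsymbol{C}^{\top}&\boldsymbol{B}\end{pmatrix}\right\}\quad\Longrightarrow\quad\boldsymbol{a}|\boldsymbol{b}\sim\mathcal{N}\left(\boldsymbol{m}_a+\boldsymbol{C}\boldsymbol{B}^{-1}(\boldsymbol{b}-\boldsymbol{m}_b),\ \boldsymbol{A}-\boldsymbol{C}\boldsymbol{B}^{-1}\boldsymbol{C}^{\top}\right).$$

Matching blocks gives $\boldsymbol{A}=\boldsymbol{\Psi}(t_{h+1})$, $\boldsymbol{C}=\boldsymbol{\Psi}(t_{h+1})\boldsymbol{G}_{h+1}^{\top}$, and $\boldsymbol{B}=\boldsymbol{G}_{h+1}\boldsymbol{\Psi}(t_{h+1})\boldsymbol{G}_{h+1}^{\top}+\boldsymbol{\Sigma}_{h+1}$, with $\boldsymbol{b}$ set to the new observation $\boldsymbol{y}_{h+1}(t_{h+1})$.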

Thus, the posterior distribution of $\boldsymbol{s}(t_{h+1})$ becomes

$$\boldsymbol{s}(t_{h+1})|\boldsymbol{y}_{h+1}(t_{h+1}),\ldots,\boldsymbol{y}_{0}(t_{0});\boldsymbol{\theta},\boldsymbol{\sigma}\sim\mathcal{N}\left(\boldsymbol{\alpha}(t_{h+1}),\boldsymbol{\beta}(t_{h+1})\right), \tag{19}$$

where

$$\boldsymbol{\alpha}(t_{h+1}) = \bar{\boldsymbol{s}}(t_{h+1})+\boldsymbol{\Psi}(t_{h+1})\boldsymbol{G}_{h+1}^{\top}\left(\boldsymbol{G}_{h+1}\boldsymbol{\Psi}(t_{h+1})\boldsymbol{G}_{h+1}^{\top}+\boldsymbol{\Sigma}_{h+1}\right)^{-1}\left(\boldsymbol{y}_{h+1}(t_{h+1})-\boldsymbol{G}_{h+1}\bar{\boldsymbol{s}}(t_{h+1})\right), \tag{20}$$
$$\boldsymbol{\beta}(t_{h+1}) = \boldsymbol{\Psi}(t_{h+1})-\boldsymbol{\Psi}(t_{h+1})\boldsymbol{G}_{h+1}^{\top}\left(\boldsymbol{G}_{h+1}\boldsymbol{\Psi}(t_{h+1})\boldsymbol{G}_{h+1}^{\top}+\boldsymbol{\Sigma}_{h+1}\right)^{-1}\boldsymbol{G}_{h+1}\boldsymbol{\Psi}(t_{h+1}). \tag{21}$$
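The measurement update in (20) and (21) can be illustrated with a short NumPy sketch. The function name `lna_update` and the numerical values are assumptions made for illustration only; the example uses two states of which only the first is observed.

```python
import numpy as np

def lna_update(s_bar, Psi, y, G, Sigma):
    """Condition the LNA prior N(s_bar, Psi) on a noisy partial observation y.

    Returns the posterior mean alpha and covariance beta as in (20) and (21).
    """
    S = G @ Psi @ G.T + Sigma              # forecast covariance of y, see (18)
    # Gain Psi G^T S^{-1}; a linear solve avoids forming S^{-1} explicitly
    # (valid because Psi and S are symmetric).
    K = np.linalg.solve(S, G @ Psi).T
    alpha = s_bar + K @ (y - G @ s_bar)    # posterior mean (20)
    beta = Psi - K @ G @ Psi               # posterior covariance (21)
    return alpha, beta

# Illustrative numbers (assumptions): two states, only the first is measured.
s_bar = np.array([10.0, 5.0])
Psi = np.array([[4.0, 1.0],
                [1.0, 2.0]])
G = np.array([[1.0, 0.0]])
Sigma = np.array([[1.0]])
y = np.array([12.0])

alpha, beta = lna_update(s_bar, Psi, y, G, Sigma)
```

Even though the second state is never measured directly, its posterior mean and variance change because $\boldsymbol{\Psi}(t_{h+1})$ carries a nonzero cross-covariance between the two states.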

Step 3: From the distributions (9) and (18) for $h=0,1,\ldots,H-1$, the likelihood of the observations $\mathcal{D}_{M}$ can be calculated by the following decomposition,

$$p\left(\boldsymbol{y}_{0}(t_{0}),\boldsymbol{y}_{1}(t_{1}),\ldots,\boldsymbol{y}_{H}(t_{H})|\boldsymbol{\theta},\boldsymbol{\sigma}\right)=p\left(\boldsymbol{y}_{0}(t_{0})|\boldsymbol{\theta},\boldsymbol{\sigma}\right)\prod_{h=0}^{H-1}p\left(\boldsymbol{y}_{h+1}(t_{h+1})|\boldsymbol{y}_{h}(t_{h}),\ldots,\boldsymbol{y}_{0}(t_{0});\boldsymbol{\theta},\boldsymbol{\sigma}\right). \tag{22}$$
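In log form, the decomposition (22) becomes a sum of Gaussian log-densities, one per observation time. The sketch below shows this accumulation under simplifying assumptions: the helper names `forecast_logpdf` and `log_likelihood` are hypothetical, the observation matrix and noise covariance are held fixed across time, and the forecast means and covariances are supplied as made-up numbers rather than produced by the LNA recursion.

```python
import numpy as np

def forecast_logpdf(y, s_bar, Psi, G, Sigma):
    """Log-density of y under N(G s_bar, G Psi G^T + Sigma), as in (18)."""
    resid = y - G @ s_bar
    S = G @ Psi @ G.T + Sigma
    _, logdet = np.linalg.slogdet(S)             # stable log-determinant
    quad = resid @ np.linalg.solve(S, resid)     # resid^T S^{-1} resid
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + quad)

def log_likelihood(terms, G, Sigma):
    """Sum the one-step forecast log-densities, i.e. the logarithm of (22).

    `terms` is a list of (y, forecast mean, forecast covariance) triples,
    one per observation time.
    """
    return sum(forecast_logpdf(y, s, P, G, Sigma) for y, s, P in terms)

# Illustrative forecast moments (assumptions, not outputs of a real model).
G = np.array([[1.0, 0.0]])
Sigma = np.array([[1.0]])
terms = [
    (np.array([12.0]), np.array([10.0, 5.0]), np.array([[4.0, 1.0], [1.0, 2.0]])),
    (np.array([9.0]), np.array([9.5, 4.0]), np.array([[3.0, 0.5], [0.5, 1.5]])),
]
total = log_likelihood(terms, G, Sigma)
```

Working with the log of (22) avoids numerical underflow when many small densities are multiplied, and it is the quantity that enters the exponent of the posterior.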

4 BAYESIAN ANALYSIS AND ALGORITHM DEVELOPMENT

In this section, we simultaneously infer the model parameters $\boldsymbol{\theta}$ and the measurement error level $\boldsymbol{\sigma}$ from the observations $\mathcal{D}_{M}$. By applying Bayes' rule, we obtain the joint posterior distribution of $\boldsymbol{\theta}$ and $\boldsymbol{\sigma}$,

$$\begin{aligned}
p\left(\boldsymbol{\theta},\boldsymbol{\sigma}|\mathcal{D}_{M}\right) \propto{}& p\left(\boldsymbol{\theta}\right)p\left(\boldsymbol{\sigma}\right)\exp\Bigg\{-\frac{1}{2}\Big[M|\boldsymbol{J}_{0}|\log(2\pi)+\log\left|\boldsymbol{G}_{0}\boldsymbol{\Psi}(t_{0})\boldsymbol{G}_{0}^{\top}+\boldsymbol{\Sigma}_{0}\right| \\
&+\left(\boldsymbol{y}_{0}(t_{0})-\boldsymbol{G}_{0}\left\{\bar{\boldsymbol{s}}(t_{0})+\boldsymbol{\varphi}(t_{0})\right\}\right)^{\top}\left(\boldsymbol{G}_{0}\boldsymbol{\Psi}(t_{0})\boldsymbol{G}_{0}^{\top}+\boldsymbol{\Sigma}_{0}\right)^{-1}\left(\boldsymbol{y}_{0}(t_{0})-\boldsymbol{G}_{0}\left\{\bar{\boldsymbol{s}}(t_{0})+\boldsymbol{\varphi}(t_{0})\right\}\right)\Big] \\
&-\frac{1}{2}\sum_{h=0}^{H-1}\Big[M|\boldsymbol{J}_{h+1}|\log(2\pi)+\log\left|\boldsymbol{G}_{h+1}\boldsymbol{\Psi}(t_{h+1})\boldsymbol{G}_{h+1}^{\top}+\boldsymbol{\Sigma}_{h+1}\right| \\
&+\left(\boldsymbol{y}_{h+1}(t_{h+1})-\boldsymbol{G}_{h+1}\bar{\boldsymbol{s}}(t_{h+1})\right)^{\top}\left(\boldsymbol{G}_{h+1}\boldsymbol{\Psi}(t_{h+1})\boldsymbol{G}_{h+1}^{\top}+\boldsymbol{\Sigma}_{h+1}\right)^{-1}\left(\boldsymbol{y}_{h+1}(t_{h+1})-\boldsymbol{G}_{h+1}\bar{\boldsymbol{s}}(t_{h+1})\right)\Big]\Bigg\},
\end{aligned} \tag{23}$$

where $p\left(\boldsymbol{\theta}\right)$ and $p\left(\boldsymbol{\sigma}\right)$ are the priors for $\boldsymbol{\theta}$ and $\boldsymbol{\sigma}$, respectively. By utilizing the joint posterior distribution $p\left(\boldsymbol{\theta},\boldsymbol{\sigma}|\mathcal{D}_{M}\right)$ in (23), we further develop a MALA procedure to efficiently generate posterior samples of the $L$-dimensional parameters $\boldsymbol{\eta}\equiv\left(\boldsymbol{\theta}^{\top},\boldsymbol{\sigma}^{\top}\right)^{\top}$, where $L=N+|\boldsymbol{J}_{y}|$.

By utilizing the gradients of the posterior (23), MALA generates more promising candidate samples in regions of the parameter space with higher posterior probability. It improves the mixing of classic MCMC algorithms by combining two mechanisms: Langevin diffusion and a Metropolis-Hastings step. Langevin diffusion is originally a gradient descent on a potential function (representing a force field in physics) plus a Brownian motion term accounting for thermodynamics. To overcome the limitation of the random walk-based search strategies used in classic MCMC, we leverage the information provided by the closed-form posterior distribution $p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)$ and use Langevin diffusion to develop a more efficient posterior sampling approach. We construct a continuous-time stochastic process characterizing the Langevin diffusion-based posterior search. Specifically, we consider the following (overdamped) Langevin diffusion

$$d\boldsymbol{\eta}(\tau)=\nabla_{\boldsymbol{\eta}}\log p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)|_{\boldsymbol{\eta}=\boldsymbol{\eta}(\tau)}d\tau+\sqrt{2}\,d\boldsymbol{W}(\tau) \tag{24}$$

driven by the time derivative of an $L$-dimensional standard Brownian motion (i.e., $d\boldsymbol{W}(\tau)$). It speeds up MCMC convergence by drifting the search along the gradient of the target log-posterior distribution (i.e., $\log p(\boldsymbol{\eta}|\mathcal{D}_{M})$), which drives the random walk towards the parameter region with high posterior probability.

To numerically solve Equation (24) and generate posterior samples from $p(\boldsymbol{\eta}|\mathcal{D}_{M})$, the Euler-Maruyama approximation [13] is used to obtain the discretized Langevin diffusion with a step size $\Delta\tau>0$,

$$\boldsymbol{\eta}(\tau+1):=\boldsymbol{\eta}(\tau)+\nabla_{\boldsymbol{\eta}}\log p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)|_{\boldsymbol{\eta}=\boldsymbol{\eta}(\tau)}\Delta\tau+\sqrt{2}\,\Delta\boldsymbol{W}(\tau), \tag{25}$$

where each $\Delta\boldsymbol{W}(\tau)\in\mathbb{R}^{L}$ is a Gaussian random vector with mean zero and covariance $\mathrm{diag}\{\Delta\tau\}\in\mathbb{R}^{L\times L}$. The gradient of the log-posterior

$$\nabla_{\boldsymbol{\eta}}\log p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)=\left(\left\{\frac{\partial\log p\left(\boldsymbol{\theta},\boldsymbol{\sigma}|\mathcal{D}_{M}\right)}{\partial\theta_{n}},n\in[N]\right\},\left\{\frac{\partial\log p\left(\boldsymbol{\theta},\boldsymbol{\sigma}|\mathcal{D}_{M}\right)}{\partial\sigma_{jj}},j\in\boldsymbol{J}_{y}\right\}\right)^{\top}$$

is tractable from Equation (23). In particular, we provide a recursive procedure in Algorithm 1 to compute $p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)$ and $\nabla_{\boldsymbol{\eta}}\log p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)$.
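To make the two MALA mechanisms concrete, the sketch below combines the discretized Langevin proposal (25) with a Metropolis-Hastings accept/reject step. It is an illustration under stated assumptions rather than the paper's algorithm: the function name `mala_step` is hypothetical, and the target is a toy standard bivariate Gaussian with an analytic gradient, standing in for the LNA-based posterior (23) and its recursively computed gradient.

```python
import numpy as np

def mala_step(eta, log_post, grad_log_post, step, rng):
    """One Metropolis-adjusted Langevin step with step size `step` (Delta tau)."""
    def drift(x):
        # Deterministic part of the discretized Langevin move in (25).
        return x + step * grad_log_post(x)

    def log_q(to, frm):
        # Log proposal density N(to; drift(frm), 2 * step * I), up to a
        # constant that cancels in the acceptance ratio.
        return -np.sum((to - drift(frm)) ** 2) / (4.0 * step)

    # sqrt(2) * Delta W with Delta W ~ N(0, step * I).
    proposal = drift(eta) + np.sqrt(2.0 * step) * rng.normal(size=eta.shape)
    log_ratio = (log_post(proposal) + log_q(eta, proposal)
                 - log_post(eta) - log_q(proposal, eta))
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True     # accept the Langevin proposal
    return eta, False             # reject and stay at the current point

# Toy target (assumption): standard bivariate Gaussian.
log_post = lambda x: -0.5 * np.sum(x ** 2)
grad_log_post = lambda x: -x

rng = np.random.default_rng(0)
eta = np.array([3.0, 3.0])
n_iter, n_burn = 20_000, 2_000
samples, n_accept = [], 0
for _ in range(n_iter):
    eta, accepted = mala_step(eta, log_post, grad_log_post, step=0.5, rng=rng)
    samples.append(eta)
    n_accept += accepted
samples = np.array(samples[n_burn:])
accept_rate = n_accept / n_iter
```

The accept/reject step corrects the bias introduced by the finite step size, so the chain still targets the intended distribution. The step size trades off proposal distance against acceptance rate and usually needs tuning for a given posterior.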

Input: The priors $p\left(\boldsymbol{\theta}\right)$ and $p\left(\boldsymbol{\sigma}\right)$, observations $\mathcal{D}_{M}=\{\boldsymbol{y}_{h}(t_{h})\}_{h=0}^{H}$, ODE initial values $\bar{\boldsymbol{s}}(t_{0})$, $\boldsymbol{\varphi}(t_{0})$ and $\boldsymbol{\Psi}(t_{0})$, constant matrices $\boldsymbol{G}_{h}$, and appropriate positive integers $I_{h}$ for $h=0,1,\ldots,H-1$.
Output: $p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)$ and $\nabla_{\boldsymbol{\eta}}\log p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)$.
1. Calculate $\boldsymbol{\alpha}(t_{0})$ and $\boldsymbol{\beta}(t_{0})$ by applying Equations (11) and (12);
2. Calculate $\partial\boldsymbol{\alpha}(t_{0})/\partial\theta_{n}$, $\partial\boldsymbol{\beta}(t_{0})/\partial\theta_{n}$ for $n\in[N]$, and $\partial\boldsymbol{\alpha}(t_{0})/\partial\sigma_{jj}$, $\partial\boldsymbol{\beta}(t_{0})/\partial\sigma_{jj}$ for $j\in\boldsymbol{J}_{y}$;
for $h=0,1,\ldots,H-1$ do
    for $i=0,1,\ldots,I_{h}-1$ do
        3. Calculate $\bar{\boldsymbol{s}}(t_{h}+(i+1)\Delta z_{h})$ and $\boldsymbol{\Psi}(t_{h}+(i+1)\Delta z_{h})$ by applying Equations (15) and (16);
        4. Calculate $\partial\bar{\boldsymbol{s}}(t_{h}+(i+1)\Delta z_{h})/\partial\theta_{n}$, $\partial\boldsymbol{\Psi}(t_{h}+(i+1)\Delta z_{h})/\partial\theta_{n}$ for $n\in[N]$, and $\partial\bar{\boldsymbol{s}}(t_{h}+(i+1)\Delta z_{h})/\partial\sigma_{jj}$, $\partial\boldsymbol{\Psi}(t_{h}+(i+1)\Delta z_{h})/\partial\sigma_{jj}$ for $j\in\boldsymbol{J}_{y}$;
    5. Calculate $\boldsymbol{\alpha}(t_{h+1})$ and $\boldsymbol{\beta}(t_{h+1})$ by applying Equations (20) and (21);
    6. Calculate $\partial\boldsymbol{\alpha}(t_{h+1})/\partial\theta_{n}$, $\partial\boldsymbol{\beta}(t_{h+1})/\partial\theta_{n}$ for $n\in[N]$, and $\partial\boldsymbol{\alpha}(t_{h+1})/\partial\sigma_{jj}$, $\partial\boldsymbol{\beta}(t_{h+1})/\partial\sigma_{jj}$ for $j\in\boldsymbol{J}_{y}$;
      
7. Return p(𝜼|𝒟M)𝑝conditional𝜼subscript𝒟𝑀p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)italic_p ( bold_italic_η | caligraphic_D start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT ) by applying Equation (23), and 𝜼logp(𝜼|𝒟M)subscript𝜼𝑝conditional𝜼subscript𝒟𝑀\nabla_{\boldsymbol{\eta}}\log p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)∇ start_POSTSUBSCRIPT bold_italic_η end_POSTSUBSCRIPT roman_log italic_p ( bold_italic_η | caligraphic_D start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT ) by calculating logp(𝜼|𝒟M)/θn𝑝conditional𝜼subscript𝒟𝑀subscript𝜃𝑛\partial\log p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)/\partial\theta_{n}∂ roman_log italic_p ( bold_italic_η | caligraphic_D start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT ) / ∂ italic_θ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT for n[N]𝑛delimited-[]𝑁n\in[N]italic_n ∈ [ italic_N ] and logp(𝜼|𝒟M)/σjj𝑝conditional𝜼subscript𝒟𝑀subscript𝜎𝑗𝑗\partial\log p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)/\partial\sigma_{jj}∂ roman_log italic_p ( bold_italic_η | caligraphic_D start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT ) / ∂ italic_σ start_POSTSUBSCRIPT italic_j italic_j end_POSTSUBSCRIPT for j𝑱y𝑗subscript𝑱𝑦j\in\boldsymbol{J}_{y}italic_j ∈ bold_italic_J start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT.
Algorithm 1 Computing p(𝜼|𝒟M)𝑝conditional𝜼subscript𝒟𝑀p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)italic_p ( bold_italic_η | caligraphic_D start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT ) and 𝜼logp(𝜼|𝒟M)subscript𝜼𝑝conditional𝜼subscript𝒟𝑀\nabla_{\boldsymbol{\eta}}\log p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)∇ start_POSTSUBSCRIPT bold_italic_η end_POSTSUBSCRIPT roman_log italic_p ( bold_italic_η | caligraphic_D start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT ).

To correct the bias in the stationary distribution induced by the discretization used in the update rule (25), a Metropolis-Hastings step is incorporated for simulating the Langevin diffusion (24). Specifically, we consider the update rule (25) and define a proposal distribution to generate a new MCMC posterior sample $\tilde{\boldsymbol{\eta}}(\tau+1)$,

$$\tilde{\boldsymbol{\eta}}(\tau+1):=\boldsymbol{\eta}(\tau)+\nabla_{\boldsymbol{\eta}}\log p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)\big|_{\boldsymbol{\eta}=\boldsymbol{\eta}(\tau)}\Delta\tau+\sqrt{2}\,\Delta\boldsymbol{W}(\tau). \tag{26}$$

Thus, the MCMC conditional sampling distribution $\tilde{\boldsymbol{\eta}}(\tau+1)|\boldsymbol{\eta}(\tau)$ is Gaussian with mean $\boldsymbol{\eta}(\tau)+\nabla_{\boldsymbol{\eta}}\log p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)\big|_{\boldsymbol{\eta}=\boldsymbol{\eta}(\tau)}\Delta\tau$ and covariance ${\rm diag}\{2\Delta\tau\}\in\mathbb{R}^{L\times L}$. Then, the candidate sample $\tilde{\boldsymbol{\eta}}(\tau+1)$ generated from this proposal is accepted with probability

$$\gamma_{\rm acc}:=\min\left\{1,\frac{p\left(\tilde{\boldsymbol{\eta}}(\tau+1)|\mathcal{D}_{M}\right)q_{\rm trans}\left(\boldsymbol{\eta}(\tau)\,\big|\,\tilde{\boldsymbol{\eta}}(\tau+1)\right)}{p\left(\boldsymbol{\eta}(\tau)|\mathcal{D}_{M}\right)q_{\rm trans}\left(\tilde{\boldsymbol{\eta}}(\tau+1)\,\big|\,\boldsymbol{\eta}(\tau)\right)}\right\}, \tag{27}$$

where the proposal distribution $q_{\rm trans}(\boldsymbol{\eta}^{\prime}|\boldsymbol{\eta})\propto\exp\left\{-\frac{1}{4\Delta\tau}\left\|\boldsymbol{\eta}^{\prime}-\boldsymbol{\eta}-\nabla_{\boldsymbol{\eta}}\log p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)\Delta\tau\right\|^{2}\right\}$ ($\|\cdot\|$ denotes the Euclidean norm) is the transition density from $\boldsymbol{\eta}$ to $\boldsymbol{\eta}^{\prime}$ obtained from Equation (26).

In sum, we provide the MALA procedure in Algorithm 2 to generate posterior samples for the enzymatic SRN mechanistic model parameters $\boldsymbol{\theta}$ and the measurement error level $\boldsymbol{\sigma}$ jointly. Within the $\tau$-th iteration of the MALA joint posterior sampler, given the previous sample $\boldsymbol{\eta}(\tau)$, we generate one proposal sample $\tilde{\boldsymbol{\eta}}(\tau+1)$ from the discretized Langevin diffusion (26), and accept it with the Metropolis-Hastings probability (27). By repeating this procedure, $\boldsymbol{\theta}$ and $\boldsymbol{\sigma}$ are updated together with a joint gradient at each iteration, and we thus get samples $\boldsymbol{\eta}(\tau)=\left(\boldsymbol{\theta}^{\top}(\tau),\boldsymbol{\sigma}^{\top}(\tau)\right)^{\top}$ with $\tau=1,2,\ldots,T_{0}+(B-1)\delta+1$. To reduce the initial bias and the correlation between consecutive samples, we discard an appropriate burn-in period for convergence (i.e., the first $T_{0}$ samples) and then keep one of every $\delta$ samples.
Consequently, we obtain the posterior samples $\boldsymbol{\eta}^{(b)}\sim p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)$ with $b=1,2,\ldots,B$.

Input: The priors $p\left(\boldsymbol{\theta}\right)$ and $p\left(\boldsymbol{\sigma}\right)$, step size $\Delta\tau$, posterior sample size $B$, initial warm-up length $T_{0}$, and an appropriate integer $\delta$ to reduce sample correlation.
Output: Posterior samples $\boldsymbol{\eta}^{(b)}\sim p(\boldsymbol{\eta}|\mathcal{D}_{M})$ with $b=1,2,\ldots,B$.
1. Set the initial values $\boldsymbol{\eta}(0):=\left(\boldsymbol{\theta}^{\top}(0),\boldsymbol{\sigma}^{\top}(0)\right)^{\top}$ by sampling from the priors;
for $\tau=0,1,\ldots,T_{0}+(B-1)\delta$ do
       2. Calculate $p\left(\boldsymbol{\eta}(\tau)|\mathcal{D}_{M}\right)$ and $\nabla_{\boldsymbol{\eta}}\log p\left(\boldsymbol{\eta}|\mathcal{D}_{M}\right)|_{\boldsymbol{\eta}=\boldsymbol{\eta}(\tau)}$ by calling Algorithm 1;
       3. Generate a proposal $\tilde{\boldsymbol{\eta}}(\tau+1)$ by applying Equation (26);
       4. Calculate the acceptance ratio $\gamma_{\rm acc}$ by applying Equation (27);
       5. Draw $u$ from the continuous uniform distribution $U(0,1)$;
       if $u\leq\gamma_{\rm acc}$ then
             6. The proposal $\tilde{\boldsymbol{\eta}}(\tau+1)$ is accepted, and set $\boldsymbol{\eta}(\tau+1):=\tilde{\boldsymbol{\eta}}(\tau+1)$;
       else if $u>\gamma_{\rm acc}$ then
             7. The proposal $\tilde{\boldsymbol{\eta}}(\tau+1)$ is rejected, and set $\boldsymbol{\eta}(\tau+1):=\boldsymbol{\eta}(\tau)$;
8. Return posterior samples $\boldsymbol{\eta}^{(b)}:=\boldsymbol{\eta}(T_{0}+(b-1)\delta+1)$ for $b=1,2,\ldots,B$.
Algorithm 2 MALA joint posterior sampler for SRN.
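The loop structure of Algorithm 2, including burn-in and thinning, can be sketched as follows. This is an illustrative sketch only: `mala_sampler` is a hypothetical name, the toy standard Gaussian target replaces the LNA-based posterior returned by Algorithm 1, the chain is started from a standard Gaussian draw rather than from the priors, and the step size is chosen for the toy target rather than taken from the paper.

```python
import numpy as np

def mala_sampler(eta_init, log_post, grad_log_post, dtau, B, T0, delta, rng):
    """Run the MALA chain and return the B thinned post-burn-in samples."""
    L = eta_init.size
    n_iter = T0 + (B - 1) * delta + 1          # tau = 0, 1, ..., T0 + (B-1)*delta
    chain = np.empty((n_iter + 1, L))
    chain[0] = eta_init
    for tau in range(n_iter):
        eta = chain[tau]
        grad = grad_log_post(eta)
        # Step 3: proposal from the discretized Langevin diffusion, Equation (26).
        prop = eta + grad * dtau + np.sqrt(2.0 * dtau) * rng.standard_normal(L)
        # Step 4: log of the Metropolis-Hastings ratio in Equation (27).
        fwd = prop - eta - grad * dtau
        bwd = eta - prop - grad_log_post(prop) * dtau
        log_ratio = log_post(prop) - log_post(eta) - (bwd @ bwd - fwd @ fwd) / (4.0 * dtau)
        gamma_acc = np.exp(min(0.0, log_ratio))
        # Steps 5-7: accept or reject.
        chain[tau + 1] = prop if rng.uniform() <= gamma_acc else eta
    # Step 8: eta^(b) = eta(T0 + (b-1)*delta + 1) for b = 1, ..., B.
    keep = T0 + np.arange(B) * delta + 1
    return chain[keep]

# Toy target (an assumption for illustration): standard Gaussian in two dimensions.
log_post = lambda eta: -0.5 * np.dot(eta, eta)
grad_log_post = lambda eta: -eta

rng = np.random.default_rng(0)
samples = mala_sampler(rng.standard_normal(2), log_post, grad_log_post,
                       dtau=0.5, B=200, T0=500, delta=5, rng=rng)
```

Storing the whole chain keeps the indexing of step 8 explicit; a memory-conscious variant would store only the retained samples.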
Remark 1.

From Equation (26), the support of the posterior samples $\boldsymbol{\eta}^{(b)}$ generated by Algorithm 2 is the entire $L$-dimensional real space $\mathbb{R}^{L}$. In most real-world cases, including SRNs, however, the feasible space of $\boldsymbol{\eta}$ is restricted to a subset of $\mathbb{R}^{L}$. For instance, some biological parameters such as rates must be positive, while parameters such as probabilities or bioavailability must lie between 0 and 1 [14]. Reparametrization of the system allows us to take these constraints into account. Specifically, we can introduce one-to-one functions $f_{l}(\cdot)$ for $l=1,2,\ldots,L$, and define transformed parameters $\eta_{l}^{\rm trans}=f_{l}(\eta_{l})$. For instance, $f_{l}(\cdot)$ can be a logarithmic function to transform the support from the positive space to the real space, or an inverse logistic (logit) function to transform the support from the interval $[0,1]$ to the real space.
Then we can perform Algorithm 2 on the transformed $\boldsymbol{\eta}^{\rm trans}=\left(\eta_{1}^{\rm trans},\eta_{2}^{\rm trans},\ldots,\eta_{L}^{\rm trans}\right)^{\top}$.
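A short sketch of the two transformations mentioned in Remark 1 is given below. The helper names are hypothetical. The Jacobian term is the standard change-of-variables correction for a density under a one-to-one map; the remark does not spell it out, so including it here is an assumption about how the transformed density would be defined.

```python
import numpy as np

def log_transform(eta):
    """Map a positive parameter to the real line."""
    return np.log(eta)

def inv_log_transform(phi):
    return np.exp(phi)

def logit(eta):
    """Inverse logistic function: map a parameter in (0, 1) to the real line."""
    return np.log(eta) - np.log1p(-eta)

def inv_logit(phi):
    return 1.0 / (1.0 + np.exp(-phi))

def log_density_transformed(phi, log_p_eta, inverse, log_abs_jacobian):
    """Log density of phi = f(eta): log p(f^{-1}(phi)) + log |d f^{-1}(phi) / d phi|."""
    return log_p_eta(inverse(phi)) + log_abs_jacobian(phi)

# Example: a U(0, 1) density on a rate parameter, expressed on the log scale.
log_uniform01 = lambda eta: 0.0 if 0.0 < eta < 1.0 else -np.inf
log_jac_log = lambda phi: phi        # d exp(phi) / d phi = exp(phi), whose log is phi

value = log_density_transformed(-2.0, log_uniform01, inv_log_transform, log_jac_log)
```

Samples drawn on the transformed scale are mapped back with the inverse function, e.g., `inv_log_transform` for rate parameters.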

5 EMPIRICAL STUDY

In this section, we use a representative example of SRN, i.e., Michaelis-Menten enzyme kinetics [15], to assess the empirical performance of the proposed Bayesian inference approach. Specifically, we consider the Michaelis-Menten enzyme kinetic model involving four biochemical species, i.e., Enzyme, Substrate, Complex, and Product. It describes the catalytic conversion of a substrate into a product via an enzymatic reaction, represented by the following three chemical reactions,

$$\begin{array}{l}\text{Reaction }1:\ {\rm Enzyme}+{\rm Substrate}\longrightarrow{\rm Complex},\\ \text{Reaction }2:\ {\rm Complex}\longrightarrow{\rm Enzyme}+{\rm Substrate},\\ \text{Reaction }3:\ {\rm Complex}\longrightarrow{\rm Enzyme}+{\rm Product},\end{array}\quad\mbox{with}\quad\boldsymbol{C}=\begin{pmatrix}-1&1&1\\ -1&1&0\\ 1&-1&-1\\ 0&0&1\end{pmatrix}.$$

In particular, let $\boldsymbol{s}(t)=\left(s_{1}(t),s_{2}(t),s_{3}(t),s_{4}(t)\right)^{\top}$ denote the system state vector at any time $t$, where $s_{1}(t)$, $s_{2}(t)$, $s_{3}(t)$, and $s_{4}(t)$ are the respective concentrations of Enzyme, Substrate, Complex, and Product. The stoichiometry matrix $\boldsymbol{C}$ associated with the system can be obtained from the above three reaction equations, and the associated reaction rate vector is $\boldsymbol{v}(\boldsymbol{s}(t);\boldsymbol{\theta})=\left(\theta_{1}s_{1}(t)s_{2}(t),\theta_{2}s_{3}(t),\theta_{3}s_{3}(t)\right)^{\top}$.
Our goal is to perform Bayesian inference for the vector of the unknown kinetic rate parameters $\boldsymbol{\theta}=\left(\theta_{1},\theta_{2},\theta_{3}\right)^{\top}$.
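To make the roles of $\boldsymbol{C}$ and $\boldsymbol{v}$ concrete, the sketch below evaluates the reaction rate vector and the product $\boldsymbol{C}\boldsymbol{v}$, i.e., the expected net rate of change of each species, at one state. The function name `reaction_rates` is hypothetical, and the state and parameter values are the ones used in the simulation setup of this section.

```python
import numpy as np

# Stoichiometry matrix C (rows: Enzyme, Substrate, Complex, Product; columns: Reactions 1-3).
C = np.array([[-1.0,  1.0,  1.0],
              [-1.0,  1.0,  0.0],
              [ 1.0, -1.0, -1.0],
              [ 0.0,  0.0,  1.0]])

def reaction_rates(s, theta):
    """Reaction rate vector v(s; theta) = (theta1*s1*s2, theta2*s3, theta3*s3)."""
    s1, s2, s3, _ = s
    return np.array([theta[0] * s1 * s2, theta[1] * s3, theta[2] * s3])

s = np.array([45.0, 39.0, 55.0, 6.0])
theta = np.array([0.001, 0.005, 0.01])

v = reaction_rates(s, theta)     # (1.755, 0.275, 0.55)
drift = C @ v                    # (-0.93, -1.48, 0.93, 0.55)

# Conservation relations implied by C: Enzyme + Complex and
# Substrate + Complex + Product are unchanged by every reaction.
enzyme_total_change = C[0] + C[2]
substrate_total_change = C[1] + C[2] + C[3]
```

The two conservation relations follow directly from the rows of $\boldsymbol{C}$ and give a quick sanity check on any simulated trajectory.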

We simulate a synthetic dataset for 80 seconds (i.e., on the time interval $[0,80]$ seconds) using the Gillespie algorithm [16] to ensure exact simulation with the true parameters $\boldsymbol{\theta}^{\rm true}=\left(0.001,0.005,0.01\right)^{\top}$ and the initial states $\boldsymbol{s}(0)=\left(45,39,55,6\right)^{\top}$. These initial values are obtained after running the process for a short time from some arbitrarily chosen population levels. We create a challenging data-poor scenario for model inference by assuming incomplete and noisy observations. Specifically, we consider a single batch (i.e., $M=1$) and discard the observations on the Enzyme, Substrate, and Product levels, so that only the Complex level is observed every $\Delta t$ seconds from $t_{0}=0$ to $t_{H}=80$ ($H+1$ observation time points in total). Hence, $\boldsymbol{J}_{h}=\{3\}$ and $\boldsymbol{G}_{h}=\left(0,0,1,0\right)$ for any $h=0,1,\ldots,H$.
We assume homogeneous additive Gaussian measurement error, i.e., $\epsilon(t_{h})\sim\mathcal{N}(0,\sigma)$ with variance $\sigma=4$; that is, the error standard deviation is two Complex molecules. The inferred parameter vector is $\boldsymbol{\eta}\equiv(\boldsymbol{\theta}^{\top},\sigma)^{\top}$.
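The data-generating setup described above can be sketched as follows. This is a minimal illustration rather than the authors' simulation code: the function names and the random seed are hypothetical, and the observation spacing $\Delta t=5$ is one of the settings considered in this section.

```python
import numpy as np

# Stoichiometry matrix C (rows: Enzyme, Substrate, Complex, Product; columns: Reactions 1-3).
C = np.array([[-1,  1,  1],
              [-1,  1,  0],
              [ 1, -1, -1],
              [ 0,  0,  1]])

def reaction_rates(s, theta):
    """Reaction rate vector v(s; theta) of the three reactions."""
    return np.array([theta[0] * s[0] * s[1], theta[1] * s[2], theta[2] * s[2]])

def gillespie_at_times(s0, theta, obs_times, rng):
    """Run the Gillespie algorithm and record the state at each observation time."""
    s = np.array(s0, dtype=np.int64)
    t, k = 0.0, 0
    states = np.empty((len(obs_times), s.size), dtype=np.int64)
    while k < len(obs_times):
        v = reaction_rates(s, theta)
        total = v.sum()
        # Time of the next reaction event (never, if no reaction can fire).
        t_next = t + rng.exponential(1.0 / total) if total > 0 else np.inf
        # The state is constant until t_next, so record every observation time before it.
        while k < len(obs_times) and obs_times[k] < t_next:
            states[k] = s
            k += 1
        if k == len(obs_times):
            break
        j = rng.choice(3, p=v / total)      # index of the reaction that fires
        s = s + C[:, j]
        t = t_next
    return states

rng = np.random.default_rng(1)
theta_true = np.array([0.001, 0.005, 0.01])
s0 = np.array([45, 39, 55, 6])
dt = 5.0
obs_times = np.arange(0.0, 80.0 + dt, dt)   # t_0 = 0, ..., t_H = 80, i.e., H + 1 = 17 points
states = gillespie_at_times(s0, theta_true, obs_times, rng)

sigma = 4.0                                 # measurement error variance
# Only the Complex level (third component) is observed, with additive Gaussian noise.
y = states[:, 2] + rng.normal(0.0, np.sqrt(sigma), size=len(obs_times))
```

Only `y` would be passed to the inference procedure; the full `states` array is kept here to check the simulation.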

We assess the performance of the proposed joint posterior sampler under the model uncertainty induced by different data sizes, i.e., $H=4,8,16$ ($\Delta t=20,10,5$ seconds correspondingly). To study the effect of the additional gradient information and the Bayesian updating step, the MALA with Bayesian updating LNA is compared to other candidate approaches, including (1) the classic Metropolis-Hastings algorithm (M-H) with Bayesian updating LNA, and (2) MALA with original LNA (without Bayesian update), in terms of the convergence behavior of posterior sampling. Since the support of the parameters is the positive space, we first use the logarithmic function to transform it to the real space. That is, we set $\log(\boldsymbol{\eta})=\left(\log(\theta_{1}),\log(\theta_{2}),\log(\theta_{3}),\log(\sigma)\right)^{\top}$ and apply the three algorithms to $\log(\boldsymbol{\eta})$.
For both LNA metamodels, we set $\bar{\boldsymbol{s}}(t_{0})+\boldsymbol{\varphi}(t_{0})=\left(50,40,60,10\right)^{\top}$, $\boldsymbol{\Psi}(t_{0})$ as a 4-by-4 identity matrix, and $I_{h}=2000,1000,500$ for $\Delta t=20,10,5$ respectively, so that the step size $\Delta z_{h}=0.01$ is small for any $h=0,1,\ldots,H$. The priors of the parameters are set as $\theta_{k}\sim U(0,1)$ for $k=1,2,3$, and $\sigma\sim U(0,25)$, to consider a difficult situation without strong prior information. The results are estimated based on 10 macro-replications.
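The relation between the data size, the observation spacing, and the number of fine-grid steps can be checked with a few lines of arithmetic. The sketch assumes, consistent with the grid $t_{h}+(i+1)\Delta z_{h}$ used in Algorithm 1, that $I_{h}\Delta z_{h}=\Delta t$; the variable names are hypothetical.

```python
T_END = 80.0   # length of the observation window in seconds
DZ = 0.01      # fine-grid step size Delta z_h

grid = {}
for dt in (20.0, 10.0, 5.0):
    H = round(T_END / dt)     # number of observation intervals
    I_h = round(dt / DZ)      # fine-grid steps between two consecutive observations
    grid[dt] = (H, I_h)

# grid maps Delta t to (H, I_h)
```

Each setting uses the same total number of fine-grid steps, $H\cdot I_{h}=8000$, so the cost of one likelihood evaluation is comparable across data sizes.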

First, we compare the convergence speed of the three algorithms. For MALA with Bayesian updating LNA and with original LNA (without Bayesian update), we set the step size $\Delta\tau=0.001$. Correspondingly, to show that MALA improves the mixing of MCMC, for M-H with Bayesian updating LNA, we set its proposal distribution to be Gaussian with the current sample as mean and ${\rm diag}\{2\Delta\tau\}={\rm diag}\{0.002\}$ as covariance. Figure 2 shows the mean convergence trends of the three algorithms for the three log-kinetic rate parameters (with 95% confidence intervals (CIs) across 10 macro-replications) under the data size $H=16\ (\Delta t=5)$. The black line represents the true log-parameters. By comparing the widths of the CIs as iterations progress, we observe that MALA converges significantly faster than M-H in inferring the log-kinetic rate parameters, while the Bayesian updating step reduces the accumulation of approximation error over time in the original LNA, providing a more accurate likelihood approximation. This demonstrates that, by leveraging the likelihood and its gradient information provided by a moderate number of observations, MALA posterior sampling based on the likelihood approximated by the Bayesian updating LNA metamodel converges quickly towards the true log-parameters.

[Figure 2: panels (a), (b), and (c)]
Figure 2: The convergence trends of (1) MALA with original LNA, (2) MALA with Bayesian updating LNA, and (3) M-H with Bayesian updating LNA (with 95% CIs) when the data size $H=16\ (\Delta t=5)$.

Then, we study the root mean square error (RMSE) of model parameter estimation to assess the convergence results of the three algorithms under three different data sizes. The RMSE measures the difference between the true and the estimated log-parameters based on $B$ posterior samples, i.e., ${\rm RMSE}=\sqrt{\frac{1}{B}\sum_{b=1}^{B}\left|\log(\eta_{l}^{\rm true})-\{\log(\eta_{l})\}^{(b)}\right|^{2}}$ for $l=1,2,3,4$. We set the initial warm-up length $T_{0}=10000$ to reduce the initial bias, the posterior sample size $B=100$, and the integer $\delta=10$ to reduce sample correlation for all three algorithms. Table 1 summarizes the 95% CIs, obtained from 10 macro-replications, of the RMSEs for the four log-parameters inferred by the three algorithms. As it shows, for all log-parameters except $\log(\theta_{3})$ under the data sizes $H=8,16$, the RMSEs of MALA with Bayesian updating LNA decrease more markedly than those of MALA with original LNA as the data size increases, demonstrating the major benefit of the Bayesian updating LNA metamodel.
That is, it refines the approximation of the likelihood as more observations become available to update the original LNA model. The exception of $\log(\theta_{3})$ arises because $\theta_{3}$ is the kinetic rate parameter associated with Reaction 3, whose reactant (i.e., Complex) is the only observable component of this system. The data size $H=16$ is thus sufficient to provide relatively accurate likelihood information based on the original LNA metamodel, while the Bayesian updating step improves the accuracy of the likelihood approximation when the data size is $H=8$. With the help of MALA, $\log(\theta_{3})$ converges close to the true value when $H=8,16$. Additionally, for $\log(\theta_{1})$ and $\log(\theta_{2})$ under all data sizes, based on the likelihood approximated by the Bayesian updating LNA metamodel, MALA performs better than M-H in terms of both the RMSEs and their confidence half-widths, meaning that MALA converges faster than M-H; for $\log(\theta_{3})$ and $\log(\sigma)$, the performance of the two algorithms is similar, because both algorithms have converged before $T_{0}=10000$.
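The RMSE defined above, and one possible way of summarizing it across macro-replications, can be sketched as follows. The function names are hypothetical. The text does not state how the 95% CIs are constructed, so the $t$-based interval (quantile 2.262 for 9 degrees of freedom, i.e., 10 macro-replications) is an assumption.

```python
import numpy as np

def rmse_log(log_eta_true, log_eta_samples):
    """RMSE between a true log-parameter and its B posterior samples."""
    samples = np.asarray(log_eta_samples, dtype=float)
    return float(np.sqrt(np.mean((log_eta_true - samples) ** 2)))

def mean_and_half_width(values, t_quantile=2.262):
    """Mean and CI half-width across macro-replications (t-interval, an assumption)."""
    values = np.asarray(values, dtype=float)
    half_width = t_quantile * values.std(ddof=1) / np.sqrt(values.size)
    return float(values.mean()), float(half_width)

# Toy check: samples that are all one log-unit away from the truth give RMSE = 1.
log_true = np.log(0.01)
toy_samples = log_true + np.array([1.0, -1.0, 1.0, -1.0])
toy_rmse = rmse_log(log_true, toy_samples)
```

In use, `rmse_log` would be applied to the $B$ retained samples of each log-parameter in each macro-replication, and `mean_and_half_width` to the resulting 10 RMSE values.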

Table 1: The RMSEs between the estimated and the true log-parameters (with 95% CIs).
| Log-parameter | MALA with Bayesian updating LNA | | | M-H with Bayesian updating LNA | | | MALA with original LNA | | |
|---|---|---|---|---|---|---|---|---|---|
| Data size | $H=4$ ($\Delta t=20$) | $H=8$ ($\Delta t=10$) | $H=16$ ($\Delta t=5$) | $H=4$ ($\Delta t=20$) | $H=8$ ($\Delta t=10$) | $H=16$ ($\Delta t=5$) | $H=4$ ($\Delta t=20$) | $H=8$ ($\Delta t=10$) | $H=16$ ($\Delta t=5$) |
| $\log(\theta_{1})$ | $1.79\pm 0.73$ | $1.27\pm 0.92$ | $0.48\pm 0.08$ | $2.56\pm 1.09$ | $1.80\pm 1.07$ | $1.48\pm 1.37$ | $1.82\pm 0.74$ | $1.72\pm 0.81$ | $1.41\pm 0.94$ |
| $\log(\theta_{2})$ | $2.76\pm 1.37$ | $2.12\pm 1.31$ | $1.32\pm 0.58$ | $3.29\pm 1.82$ | $2.87\pm 1.49$ | $2.50\pm 1.61$ | $2.80\pm 1.38$ | $2.73\pm 1.36$ | $2.63\pm 1.19$ |
| $\log(\theta_{3})$ | $1.73\pm 2.02$ | $0.25\pm 0.03$ | $0.28\pm 0.03$ | $1.66\pm 2.12$ | $1.15\pm 2.02$ | $0.24\pm 0.03$ | $1.71\pm 2.02$ | $0.69\pm 0.83$ | $0.17\pm 0.04$ |
| $\log(\sigma)$ | $1.81\pm 0.93$ | $1.02\pm 0.56$ | $0.92\pm 0.59$ | $1.59\pm 0.70$ | $0.98\pm 0.43$ | $0.80\pm 0.29$ | $1.76\pm 0.93$ | $1.30\pm 0.62$ | $1.05\pm 0.73$ |
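
The RMSE and confidence-interval entries reported in Table 1 can be computed along the following lines. This is a sketch under stated assumptions: the exact indexing of the retained samples, the use of a Student-t quantile for the 95% half-width, and the function names are illustrative choices that the text does not specify, and the chains in the demo are synthetic placeholders rather than output of the proposed algorithms.

```python
import numpy as np

def thinned_posterior_samples(chain, warmup, thin, n_samples):
    """Drop the first `warmup` draws, then keep every `thin`-th draw."""
    kept = chain[warmup::thin][:n_samples]
    if kept.shape[0] < n_samples:
        raise ValueError("chain too short for the requested number of samples")
    return kept

def rmse_per_parameter(log_samples, log_true):
    """RMSE between sampled and true log-parameters, one value per parameter."""
    return np.sqrt(np.mean((log_samples - log_true) ** 2, axis=0))

def ci95_over_replications(rmse_matrix, t_crit=2.262):
    """Mean and 95% half-width across macro-replications (rows).

    The default t_crit is the 0.975 Student-t quantile with 9 degrees of
    freedom, i.e. it assumes 10 macro-replications.
    """
    n_rep = rmse_matrix.shape[0]
    mean = rmse_matrix.mean(axis=0)
    std_err = rmse_matrix.std(axis=0, ddof=1) / np.sqrt(n_rep)
    return mean, t_crit * std_err

# Settings taken from the text: T0 = 10000, B = 100, delta = 10, 10 macro-replications.
T0, B, DELTA, N_REP = 10_000, 100, 10, 10

# Synthetic stand-in chains (assumption): noisy draws around hypothetical true values.
rng = np.random.default_rng(1)
log_true = np.array([-1.0, 0.5, -2.0, 0.0])
rmses = np.empty((N_REP, log_true.size))
for r in range(N_REP):
    chain = log_true + 0.3 * rng.standard_normal((T0 + B * DELTA, log_true.size))
    samples = thinned_posterior_samples(chain, T0, DELTA, B)
    rmses[r] = rmse_per_parameter(samples, log_true)

mean, half_width = ci95_over_replications(rmses)
for name, m, h in zip(["log(theta1)", "log(theta2)", "log(theta3)", "log(sigma)"], mean, half_width):
    print(f"{name}: {m:.2f} +/- {h:.2f}")
```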

6 Conclusion

Bayesian inference on partially observed SRNs plays a critical role in multi-scale bioprocess mechanism learning. To tackle the critical challenges of biomanufacturing processes, including high complexity, high inherent stochasticity, and very limited and sparse observations of a partially observed state with measurement errors, we propose an interpretable Bayesian updating LNA metamodel to approximate the likelihood of heterogeneous online and offline measures, accounting for the structure information of the enzymatic SRN mechanistic model. Then, we develop a MALA sampling algorithm that utilizes the gradient information from the derived likelihood to generate posterior samples more efficiently. The empirical study shows that the proposed LNA-assisted Bayesian inference approach has promising performance, demonstrating its potential to benefit online learning of bioprocess mechanisms and digital twin development.

Acknowledgements

The authors thank Prof. Cheng Li from the Department of Statistics and Data Science, National University of Singapore, for the discussion of the Metropolis-adjusted Langevin algorithm (MALA).

References

  • [1] Nathan Hillson, Mark Caddick, Yizhi Cai, Jose A Carrasco, Matthew Wook Chang, Natalie C Curach, David J Bell, Rosalind Le Feuvre, Douglas C Friedman, Xiongfei Fu, et al. Building a global alliance of biofoundries. Nature Communications, 10(1):2040, 2019.
  • [2] Daniel T Gillespie. The chemical Langevin equation. The Journal of Chemical Physics, 113(1):297–306, 2000.
  • [3] Wei Xie, Keqi Wang, Hua Zheng, and Ben Feng. Sequential importance sampling for hybrid model Bayesian inference to support bioprocess mechanism learning and robust control. In 2022 Winter Simulation Conference (WSC), pages 2282–2293. IEEE, 2022.
  • [4] Cedric Archambeau, Dan Cornford, Manfred Opper, and John Shawe-Taylor. Gaussian process approximations of stochastic differential equations. In Gaussian Processes in Practice, pages 1–16. PMLR, 2007.
  • [5] Constantino A Garcia, Abraham Otero, Paulo Felix, Jesus Presedo, and David G Marquez. Nonparametric estimation of stochastic differential equations with sparse Gaussian processes. Physical Review E, 96(2):022104, 2017.
  • [6] Shihao Yang, Samuel WK Wong, and SC Kou. Inference of dynamic systems from noisy and sparse data via manifold-constrained Gaussian processes. Proceedings of the National Academy of Sciences, 118(15):e2020397118, 2021.
  • [7] Daniel T Gillespie. A rigorous derivation of the chemical master equation. Physica A: Statistical Mechanics and Its Applications, 188(1-3):404–425, 1992.
  • [8] Lars Ferm, Per Lötstedt, and Andreas Hellander. A hierarchy of approximations of the master equation scaled by a size parameter. Journal of Scientific Computing, 34(2):127–151, 2008.
  • [9] Andreas Ruttor and Manfred Opper. Efficient statistical inference for stochastic reaction processes. Physical Review Letters, 103(23):230601, 2009.
  • [10] Paul Fearnhead, Vasileios Giagos, and Chris Sherlock. Inference for reaction networks using the linear noise approximation. Biometrics, 70(2):457–466, 2014.
  • [11] Sinho Chewi, Chen Lu, Kwangjun Ahn, Xiang Cheng, Thibaut Le Gouic, and Philippe Rigollet. Optimal dimension dependence of the Metropolis-adjusted Langevin algorithm. In Conference on Learning Theory, pages 1260–1300. PMLR, 2021.
  • [12] David F Anderson and Thomas G Kurtz. Continuous time Markov chain models for chemical reaction networks. In Design and Analysis of Biomolecular Circuits: Engineering Approaches to Systems and Synthetic Biology, pages 3–42. Springer, 2011.
  • [13] P.E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations. Applications of Mathematics: Stochastic Modelling and Applied Probability. Springer, 1992.
  • [14] Mélanie Prague, Daniel Commenges, Jérémie Guedj, Julia Drylewicz, and Rodolphe Thiébaut. NIMROD: A program for inference via a normal approximation of the posterior in models with random effects based on ordinary differential equations. Computer Methods and Programs in Biomedicine, 111(2):447–458, 2013.
  • [15] Christopher V Rao and Adam P Arkin. Stochastic chemical kinetics and the quasi-steady-state assumption: Application to the Gillespie algorithm. The Journal of Chemical Physics, 118(11):4999–5010, 2003.
  • [16] Daniel T Gillespie. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25):2340–2361, 1977.