Adjoint Sensitivity Analysis on Multi-Scale Bioprocess Stochastic Reaction Network

Keilung Choy
Northeastern University

Wei Xie
Northeastern University
Corresponding author. Email: [email protected]

Abstract

Motivated by the pressing challenges in digital twin development for biomanufacturing systems, we introduce an adjoint sensitivity analysis (SA) approach to expedite the learning of mechanistic model parameters. In this paper, we consider enzymatic stochastic reaction networks representing a multi-scale bioprocess mechanistic model that allows us to integrate disparate data from diverse production processes and leverage the information from existing macro-kinetic and genome-scale models. To support forward prediction and backward reasoning, we develop a convergent adjoint SA algorithm that studies how perturbations of model parameters and inputs (e.g., initial state) propagate through enzymatic reaction networks and affect output trajectory predictions. This SA provides a sample-efficient and interpretable way to assess the sensitivities between inputs and outputs while accounting for their causal dependencies. Our empirical study underscores the resilience of these sensitivities and, through them, provides a deeper understanding of the regulatory mechanisms behind the bioprocess.

Keywords Multi-Scale Bioprocess $\cdot$ Enzymatic Reaction Network $\cdot$ Diffusion Approximation $\cdot$ Stochastic Differential Equation $\cdot$ Sensitivity Analysis $\cdot$ Gradient-Based Optimization

1 Introduction

With prior knowledge of the reaction network structure and regulatory mechanisms, the trajectory output of a bioprocess hinges on three key inputs: (1) initial states; (2) actions; and (3) parameters of the mechanistic model (Xie et al., 2022). We suppose the effect of actions, such as feeding strategies, on state change is immediate and known. Our objective is therefore to develop an interpretable and sample-efficient sensitivity analysis (SA) approach for the multi-scale bioprocess mechanistic model that focuses on the sensitivities between inputs (i.e., initial states and model parameters) and outputs.

A multi-scale bioprocess mechanistic model can facilitate the development of digital twins and BioFoundries for biomanufacturing processes. Within this paradigm, the model’s foundation is constructed upon fundamental building blocks, i.e., molecular reaction networks. There exist various challenges to conduct SA on the mechanistic model of enzymatic stochastic reaction networks. One significant feature is its double-stochasticity, which means at any time molecular reaction rates are contingent upon random states, such as species concentrations and environmental variables. Upon formulating the multi-scale bioprocess mechanistic model in the form of stochastic differential equations (SDEs), the drift and diffusion terms could be built based on Michaelis–Menten kinetics, which is the most frequently used kinetic model of enzymatic reactions found in existing literature (Kyriakopoulos et al., 2018). It leverages our existing understanding of the reaction rate structure and encapsulates the double stochasticity inherent in stochastic reaction networks, i.e., both the drift and diffusion terms are contingent upon the current states. Consequently, deriving analytical solutions for such SDEs, characterizing stochastic molecular reaction network dynamics and variations, could be difficult. Moreover, the interactions between input factors and intermediate states further introduce high complexity in SA.

Existing sensitivity analysis approaches could be divided into two categories: local and global sensitivity analysis. Global sensitivity analysis focuses on the impact of significant variations in model inputs. For example, the Sobol method (Sobol, 2001) is a global sensitivity analysis approach based on variance decomposition, which decomposes the variance of the model output into the contributions of different model inputs and parameters. In contrast, the objective of local sensitivity analysis is to quantify the effects of minor perturbations in model inputs and parameters on model predictions. There are a variety of local sensitivity analysis approaches including finite direct differential method (Kramer et al., 1984), Nominal Range Sensitivity Method (John Bailer, 2001), and automatic differentiation (Kedem, 1980). Unfortunately, most of these existing approaches are not applicable to our study since they scale poorly, either in terms of computational time or memory usage, as the number of parameters and states within the model increases. Given that the multi-scale bioprocess model typically represents a complex system with many inputs and parameters, these limitations pose a significant barrier to the learning process of underlying mechanisms.

In this paper, we formulate the multi-scale bioprocess mechanistic model in SDEs form, accounting for underlying causal interdependencies of an enzymatic stochastic reaction network, and then develop an adjoint SA approach studying the sensitivities between inputs and outputs. It can correctly and efficiently quantify the contribution and criticality of each input and model parameter impacting on the prediction errors of multivariate output trajectories, such as productivity and product critical quality attributes (CQAs). To support forward prediction and backward reasoning, this adjoint sensitivity analysis over the operator of SDEs, characterizing bioprocess mechanisms, exhibits robust scalability even when the complexity of the mechanistic model increases. In addition, since it leverages the structural information of the regulatory reaction network, the adjoint SA can provide sample efficient and interpretable guidance to search inputs and model parameters, accounting for their interactions, to speed up the learning process.

This paper addresses two key challenges in bioprocess modeling through the development of the adjoint SA approach. Firstly, it addresses the issue that the dimensionality of model parameters can be so high that searching for the optimal parameter estimation becomes challenging. The local SA approach assesses the impact of model parameters on prediction accuracy and expedites process mechanism learning. This facilitates digital twin development and more efficient experimental design, which is crucial in the costly, highly regulated biomanufacturing industry. Secondly, the proposed adjoint SA can incorporate the intricate interdependencies of bioprocesses. By leveraging structural information within enzymatic molecular reaction networks, the paper mitigates complexity and streamlines the calculation of input-output sensitivities, alleviating computational bottlenecks encountered in previous local SA studies.

The paper is organized as follows. In Section 2, we describe the multi-scale bioprocess mechanistic model and summarize its key characteristics. In Section 3, we construct a metamodel for the multi-scale bioprocess, leveraging the insights from the preceding section. Section 4 is dedicated to the development of the local sensitivity analysis algorithm, which helps us investigate the barriers to reducing model prediction errors and provides guidance to speed up the search for optimal inputs. Building upon this part, Section 4.2 introduces an adjoint SA algorithm on SDEs designed to enhance computational efficiency. The improvements brought by the algorithm are validated in Section 5 through an empirical study of its finite-sample performance. Finally, in Section 6, we synthesize the findings and insights gathered throughout this study and conclude the paper.

2 Problem Description

2.1 Multi-Scale Bioprocess Stochastic Reaction Network

A multi-scale bioprocess mechanistic model characterizes the causal dependence from molecular to macroscopic kinetics, and it is built on a fundamental building block, i.e., enzymatic molecular reaction networks. Suppose the system is composed of $I$ species, denoted by $\boldsymbol{X}=(X_{1},X_{2},\ldots,X_{I})^{\top}$, interacting with each other through $J$ reactions. At any time $t$, let $\boldsymbol{s}_{t}=(s_{t}^{1},s_{t}^{2},\ldots,s_{t}^{I})^{\top}\in\mathbb{R}_{+}^{I}$ be the bioprocess state, where $s_{t}^{i}$ denotes the number of molecules of species $i$. The $j$-th reaction, with $j=1,2,\ldots,J$, is characterized by a reaction vector $\boldsymbol{N}_{j}\in\mathbb{R}^{I}$ describing the change in the numbers of molecules of the $I$ species when the $j$-th molecular reaction occurs. The associated reaction rate, denoted by $v_{j}$, depends on the state (such as the current number of molecules of each species) and describes the rate at which the $j$-th reaction occurs. Specifically, for the $j$-th reaction equation given by

p_{j1}X_{1}+p_{j2}X_{2}+\cdots+p_{jI}X_{I}\xrightarrow{\bm{v}_{j}}q_{j1}X_{1}+q_{j2}X_{2}+\cdots+q_{jI}X_{I},

the reaction relational structure, specified by the vector $\boldsymbol{N}_{j}=(q_{j1}-p_{j1},q_{j2}-p_{j2},\ldots,q_{jI}-p_{jI})^{\top}$, is known for $j=1,2,\ldots,J$. Thus, the stoichiometry matrix $\boldsymbol{N}=\left(\boldsymbol{N}_{1},\boldsymbol{N}_{2},\ldots,\boldsymbol{N}_{J}\right)\in\mathbb{R}^{I\times J}$ characterizes the structure information of the reaction network composed of $J$ reactions. The $(i,j)$-th element of $\boldsymbol{N}$, denoted by $N_{ij}$, represents the number of molecules of the $i$-th species that are either consumed (indicated by a negative value) or produced (indicated by a positive value) in each random occurrence of the $j$-th reaction.
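As a small illustration of how $\boldsymbol{N}$ is assembled from the reaction equations, the sketch below builds the stoichiometry matrix from reactant and product coefficients. The two-reaction network (E + S → ES, ES → E + P) and the helper name `stoichiometry_matrix` are assumptions chosen for illustration; they are not taken from the paper.

```python
def stoichiometry_matrix(p, q):
    """Return N (I x J) with N[i][j] = q[j][i] - p[j][i].

    p[j][i] and q[j][i] are the reactant and product coefficients of
    species i in reaction j, following the reaction equation above.
    """
    n_reactions = len(p)
    n_species = len(p[0])
    return [[q[j][i] - p[j][i] for j in range(n_reactions)]
            for i in range(n_species)]


# Hypothetical network over the species (E, S, ES, P):
#   reaction 1: E + S -> ES
#   reaction 2: ES -> E + P
p = [[1, 1, 0, 0],
     [0, 0, 1, 0]]
q = [[0, 0, 1, 0],
     [1, 0, 0, 1]]

N = stoichiometry_matrix(p, q)
for species, row in zip(("E", "S", "ES", "P"), N):
    print(species, row)
```

Each column of `N` is one reaction vector $\boldsymbol{N}_{j}$; negative entries mark consumed species and positive entries mark produced species.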

In this paper, we suppose that the structure of the reaction network represented by matrix $\boldsymbol{N}$ is known. The regulation mechanism of the $j$-th reaction is characterized by the reaction rate function $\bm{v}_{j}$, which depends on the current system state $\boldsymbol{s}_{t}$ and the mechanistic model parameters denoted by $\boldsymbol{\theta}_{t}$. Let $\bm{R}_{t}$ be the vector of the numbers of occurrences of each molecular reaction in a short time interval $(t,t+\Delta t]$, during which the system state is updated from $\boldsymbol{s}_{t}$ to $\boldsymbol{s}_{t+1}$. Since a molecular reaction occurs when one molecule collides, binds, and reacts with another while molecules move around randomly, driven by the stochastic thermodynamics of Brownian motion (Golightly and Wilkinson, 2005), the occurrences of molecular reactions are modeled by a non-homogeneous Poisson process. Therefore, the state transition model becomes,

\boldsymbol{s}_{t+1}=\boldsymbol{s}_{t}+\bm{N}\cdot\bm{R}_{t}\quad\mbox{with}\quad\bm{R}_{t}\sim\mbox{Poisson}(\bm{v}(\boldsymbol{s}_{t},\bm{\theta}_{t})),

(1)

where $\bm{N}\cdot\bm{R_{t}}$ represents the net amount of reaction outputs during time interval $(t,t+\Delta t]$ .
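The transition in Equation (1) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the Poisson sampler uses Knuth's multiplication method (suitable only for small rates), and the single reaction A → B, the state values, and the rate `2.0` are hypothetical. Here `rates[j]` stands for the Poisson mean of reaction $j$ over the interval.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_step(s, N, rates, rng):
    """One transition s_{t+1} = s_t + N R_t with R_t[j] ~ Poisson(rates[j])."""
    R = [sample_poisson(v, rng) for v in rates]
    s_next = [s_i + sum(N[i][j] * R[j] for j in range(len(R)))
              for i, s_i in enumerate(s)]
    return s_next, R


rng = random.Random(0)
N = [[-1], [1]]                      # hypothetical reaction A -> B
s_next, R = poisson_step([50, 0], N, [2.0], rng)
print(s_next, R)
```

Because the only column of `N` sums to zero, the total number of molecules is conserved in this toy network regardless of the sampled count.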

Michaelis–Menten (MM) kinetics is commonly used to model the regulation mechanisms of enzymatic reaction networks and the flux rates $\bm{v}(\boldsymbol{s}_{t},\bm{\theta}_{t})$ depend on the state (Michaelis and Menten, 2007). In an enzymatic molecular reaction as shown in the first equation in (2), the substrate (S) initially forms a reversible complex (ES) with the enzyme (E), i.e., the enzyme and substrate have to interact for the enzyme to be able to perform its catalytic function to produce the product (P), with $K_{F}$ , $K_{R}$ , and $K_{cat}$ representing kinetic rates. For MM kinetics as shown in the second equation in (2), we assume the enzyme is either present as the free enzyme or as the ES complex, i.e., $[E]_{total}=[E]+[ES]$ with $[E]$ denoting the concentration of the enzyme. Suppose the rate of formation of the ES complex is equal to the rate of dissociation plus the breakdown, i.e., $K_{F}[E][S]=[ES](K_{R}+K_{cat})$ . Thus, the parameters in MM kinetics characterize the regulation mechanisms of enzymatic reaction network: (1) $V_{max}^{j}=K_{cat}[E]_{total}$ , is the maximum possible velocity of the $j$ -th molecular reaction that can occur when all the enzyme molecules are bound with the substrate, i.e., $[E]_{total}=[ES]$ ; and (2) $K_{m}^{j}=\frac{K_{R}+K_{cat}}{K_{F}}=\frac{[E][S]}{[ES]}$ is a dissociation constant for the ES complex,

E+S\underset{K_{R}}{\overset{K_{F}}{\rightleftarrows}}ES\overset{K_{cat}}{\rightarrow}E+P\text{ (product)}~~~\mbox{and}~~~\bm{v}_{j}(\boldsymbol{s}_{t},\bm{\theta}_{t})=\frac{V_{max}^{j}s_{t}^{j}}{K_{m}^{j}+s_{t}^{j}}.

(2)
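The rate law in Equation (2) is easy to probe numerically. The sketch below evaluates it at three substrate levels; the function name `mm_rate` and the parameter values are hypothetical and serve only to show the half-saturation and saturation behavior.

```python
def mm_rate(s, v_max, k_m):
    """Michaelis-Menten flux V_max * s / (K_m + s) for substrate level s."""
    return v_max * s / (k_m + s)


v_max, k_m = 2.0, 5.0                 # hypothetical kinetic parameters
for s in (0.0, k_m, 1.0e6):
    print(s, mm_rate(s, v_max, k_m))
```

At `s = k_m` the flux equals half of `v_max`, and for very large `s` it approaches `v_max`, which is the bound that keeps the drift and diffusion terms well behaved.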

By applying diffusion approximation, the state transition model becomes,

p(\boldsymbol{s}_{t+1}|\boldsymbol{s}_{t},\bm{\theta}_{t})\sim\left\{\begin{array}{ll}\mathcal{N}(\boldsymbol{s}_{t}+\bm{N}\boldsymbol{v}(\boldsymbol{s}_{t},\bm{\theta}_{t})\Delta t,\ \bm{N}\mathrm{diag}(\boldsymbol{v}(\boldsymbol{s}_{t},\bm{\theta}_{t}))\bm{N}^{\top}\Delta t),&\boldsymbol{s}_{t+1}>\bm{0},\\ 0,&\boldsymbol{s}_{t+1}\leq\bm{0}.\end{array}\right.

(3)
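One way to sample from the transition in Equation (3) is sketched below. It relies on the observation that an increment of the form $\bm{N}(\boldsymbol{v}\Delta t+\sqrt{\boldsymbol{v}\Delta t}\odot\boldsymbol{z})$ with standard normal $\boldsymbol{z}$ has mean $\bm{N}\boldsymbol{v}\Delta t$ and covariance $\bm{N}\mathrm{diag}(\boldsymbol{v})\bm{N}^{\top}\Delta t$. Imposing the truncation by rejecting non-positive draws is an assumption of this sketch, as are the toy reaction A → B and all numerical values.

```python
import math
import random


def diffusion_step(s, N, v, dt, rng):
    """Sample s_{t+1} from N(s + N v dt, N diag(v) N^T dt), truncated to s_{t+1} > 0.

    The increment N (v dt + sqrt(v dt) * z), z ~ N(0, I), has exactly this
    mean and covariance. Non-positive draws are rejected and redrawn, which
    is one simple way (an assumption here) to impose the truncation.
    """
    n_species, n_reactions = len(N), len(v)
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(n_reactions)]
        extent = [v[j] * dt + math.sqrt(v[j] * dt) * z[j]
                  for j in range(n_reactions)]
        s_next = [s[i] + sum(N[i][j] * extent[j] for j in range(n_reactions))
                  for i in range(n_species)]
        if all(x > 0.0 for x in s_next):
            return s_next


rng = random.Random(1)
N = [[-1.0], [1.0]]                   # hypothetical reaction A -> B
s, v, dt = [100.0, 50.0], [4.0], 0.1  # hypothetical state, flux, and step
samples = [diffusion_step(s, N, v, dt, rng) for _ in range(2000)]
mean_a = sum(x[0] for x in samples) / len(samples)
print(mean_a)                         # should lie close to 100 - 4 * 0.1
```

Sampling the reaction extents rather than the species increments avoids factorizing the covariance matrix, which is singular whenever the network has conservation relations.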

Thus, the multi-scale bioprocess mechanistic model is further represented as a system of Stochastic Differential Equations (SDEs), intricately reliant on parameters $\bm{\theta}_{t}$ and current states $\boldsymbol{s}_{t}$ . The diffusion term is necessary due to the inherent stochasticity of molecular reactions in the bioprocess:

(d\boldsymbol{s}_{t},d\bm{\theta}_{t})^{\top}=(\mu(\boldsymbol{s}_{t},\bm{\theta}_{t}),f(\boldsymbol{s}_{t},\bm{\theta}_{t}))^{\top}dt+(\sigma(\boldsymbol{s}_{t},\bm{\theta}_{t}),g(\boldsymbol{s}_{t},\bm{\theta}_{t}))^{\top}dW_{t},

(4)

where $\mu$ , $f$ are the drift terms and $\sigma$ , $g$ are the diffusion terms defined as:

\begin{aligned}
(\mu(\boldsymbol{s}_{t},\bm{\theta}_{t}),f(\boldsymbol{s}_{t},\bm{\theta}_{t}))^{\top}&=\mbox{E}[(\boldsymbol{s}_{t+\Delta t},\bm{\theta}_{t+\Delta t})^{\top}-(\boldsymbol{s}_{t},\bm{\theta}_{t})^{\top}|\boldsymbol{s}_{t},\bm{\theta}_{t}],\\
(\sigma^{2}(\boldsymbol{s}_{t},\bm{\theta}_{t}),g^{2}(\boldsymbol{s}_{t},\bm{\theta}_{t}))^{\top}&=\mbox{Var}[(\boldsymbol{s}_{t+\Delta t},\bm{\theta}_{t+\Delta t})^{\top}|\boldsymbol{s}_{t},\bm{\theta}_{t}].
\end{aligned}

The specific form of infinitesimal mean and variance on the right side will be derived in Section 3. This SDE system is the foundational groundwork for the subsequent local sensitivity analysis we will conduct. In this paper, $\bm{\theta}_{t}$ remains constant, resulting in both $f$ and $g$ becoming 0. However, given that we model the dynamics of $\bm{\theta}_{t}$ in SDE form as shown in Equation (4), our approach can be readily extended to accommodate the scenarios where $\bm{\theta}_{t}$ varies over time.

Given any feasible policy with action, i.e., $\boldsymbol{a}_{t}=\pi(\boldsymbol{s}_{t})$ at any decision time $t$ , we consider the state transition model characterizing the process mechanistic dynamics and inherent stochasticity, i.e.,

\boldsymbol{s}_{t+1}\sim p(\boldsymbol{s}_{t+1}|\boldsymbol{s}_{t},\boldsymbol{a}_{t};\boldsymbol{\theta}).

For the biomanufacturing process, suppose the impact of decision $\boldsymbol{a}_{t}$ (e.g., feeding strategy) on the state is known and happens immediately; that means we get the post-decision state denoted by $\boldsymbol{s}_{t}^{\prime}=\boldsymbol{f}(\boldsymbol{s}_{t},\boldsymbol{a}_{t})$ with a known function $\boldsymbol{f}$. To simplify notation, we ignore the impact of the action. The proposed sensitivity analysis over inputs (i.e., states and model parameters) is extendable to account for the policy effect.

2.2 Parameter selection and calibration

Local sensitivity analysis studies the changes in the model prediction outputs with respect to the initial input values, i.e., $(\boldsymbol{s}_{0},\bm{\theta}_{0})$. The variations around this local point are quantified by the sensitivity coefficients (Zi, 2011). To obtain them, we generate the forward flow $\Phi_{0,T}(\boldsymbol{s},\bm{\theta})$ and derive both $\frac{\partial\Phi_{0,T}(\boldsymbol{s},\bm{\theta})}{\partial\boldsymbol{s}_{0}}$ and $\frac{\partial\Phi_{0,T}(\boldsymbol{s},\bm{\theta})}{\partial\bm{\theta}_{0}}$, and then $\frac{\partial\mathcal{L}(\Phi_{0,T}(\boldsymbol{s},\bm{\theta}))}{\partial\boldsymbol{s}}$ and $\frac{\partial\mathcal{L}(\Phi_{0,T}(\boldsymbol{s},\bm{\theta}))}{\partial\bm{\theta}}$, all of which are intimately connected to the solution of the SDE system (4). In Section 3, we will derive $\mu(\boldsymbol{s}_{t},\bm{\theta}_{t})$ and $\sigma(\boldsymbol{s}_{t},\bm{\theta}_{t})$, showing that both functions are infinitely differentiable with bounded first-order derivatives, i.e., $\mu,\sigma\in C_{b}^{\infty,1}$. Consequently, once the initial values are given, a unique solution to the system (4) is guaranteed to exist. We use $\Phi_{0,t_{2}}(\boldsymbol{s}_{0},\bm{\theta}_{0})$ to represent the solution of the SDEs in (4) at time $t_{2}$; it is called the forward flow and satisfies the property:

\Phi_{0,t_{2}}(\boldsymbol{s}_{0},\bm{\theta}_{0})=\Phi_{t,t_{2}}(\Phi_{0,t}(\boldsymbol{s}_{0},\bm{\theta}_{0}))~~\mbox{ for }~~0\leq t\leq t_{2}.

To simulate $\Phi_{0,t_{2}}(\boldsymbol{s}_{0},\bm{\theta}_{0})$, we consider its integral form based on Equation (4):

\begin{aligned}
\Phi_{0,t_{2}}(\boldsymbol{s}_{0},\bm{\theta}_{0})&=(\boldsymbol{s}_{0},\bm{\theta}_{0})^{\top}+\int_{0}^{t_{2}}(\mu(\Phi_{0,t}(\boldsymbol{s}_{0},\bm{\theta}_{0})),f(\Phi_{0,t}(\boldsymbol{s}_{0},\bm{\theta}_{0})))^{\top}dt\\
&\quad+\int_{0}^{t_{2}}(\sigma(\Phi_{0,t}(\boldsymbol{s}_{0},\bm{\theta}_{0})),g(\Phi_{0,t}(\boldsymbol{s}_{0},\bm{\theta}_{0})))^{\top}\circ dW_{t},
\end{aligned}

(5)

where $\circ dW_{t}$ represents the Stratonovich stochastic integral. For a continuous semimartingale $\{f_{t}\}_{t<T}$ adapted to the forward filtration $\{\mathcal{F}_{0,t}\}_{t<T}$ , the Stratonovich stochastic integral is:

\int_{0}^{T}f_{t}\circ dW_{t}=\lim_{|\Pi|\to 0}\sum\limits_{i=1}^{N}\frac{(f_{t_{i-1}}+f_{t_{i}})}{2}(W_{t_{i}}-W_{t_{i-1}}),

where $\Pi=\{0=t_{0}<t_{1}<\ldots<t_{N}=T\}$ is a partition of the time interval $[0,T]$ and $|\Pi|=\max\limits_{n}(t_{n}-t_{n-1})$. We introduce the Stratonovich stochastic integral because it allows us to generate the inverse flow $\psi_{0,t_{2}}=\Phi_{0,t_{2}}^{-1}$ from SDE system (4) based on Equation (5) (Kunita, 2019):

\begin{aligned}
\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})&=(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})^{\top}-\int_{0}^{t_{2}}(\mu(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})),f(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})))^{\top}dt\\
&\quad-\int_{0}^{t_{2}}(\sigma(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})),g(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})))^{\top}\circ d\widetilde{W}_{t},
\end{aligned}

(6)

where $\widetilde{W}_{t}$ is the backward Wiener process defined as $\widetilde{W}_{t}=W_{t}-W_{T}$ for any $t<T$ . It is adapted to the backward filtration $\{\mathcal{F}_{t,T}\}_{t<T}$ . The difference between Equations (5) and (6) is only the negative sign, and such symmetry is attributed to the use of Stratonovich stochastic integral.
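The midpoint-sum definition of the Stratonovich integral used in Equations (5) and (6) can be checked numerically. The sketch below takes the integrand $f_{t}=W_{t}$ on a simulated Brownian path, for which the midpoint sum telescopes to $W_{T}^{2}/2$, and contrasts it with the left-endpoint (Itô) sum. The step count, seed, and function names are assumptions made for illustration.

```python
import math
import random


def stratonovich_sum(f_values, w_values):
    """Midpoint-rule sum of f o dW over a partition."""
    total = 0.0
    for i in range(1, len(w_values)):
        midpoint = 0.5 * (f_values[i - 1] + f_values[i])
        total += midpoint * (w_values[i] - w_values[i - 1])
    return total


def ito_sum(f_values, w_values):
    """Left-endpoint (Ito) sum, shown only for contrast."""
    return sum(f_values[i - 1] * (w_values[i] - w_values[i - 1])
               for i in range(1, len(w_values)))


rng = random.Random(0)
T, n = 1.0, 1000
dt = T / n
W = [0.0]
for _ in range(n):
    W.append(W[-1] + rng.gauss(0.0, math.sqrt(dt)))

strat = stratonovich_sum(W, W)        # integrand f_t = W_t
ito = ito_sum(W, W)
print(strat, 0.5 * W[-1] ** 2, strat - ito)
```

The midpoint sum obeys the ordinary chain rule, which is the property that makes the forward and backward flows differ only by a sign; the gap between the two sums is half the accumulated quadratic variation, close to $T/2$.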

We could further define a scalar loss function $\mathcal{L}$ of $\Phi_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0})$ . Then, the loss for system (4) becomes $\mathcal{L}(\Phi_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0}))$ and the sensitivity coefficient for $\boldsymbol{s}_{0}$ and $\bm{\theta}_{0}$ becomes,

A_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0})\equiv\left(\frac{\partial\mathcal{L}(\Phi_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0}))}{\partial\boldsymbol{s}_{0}},\frac{\partial\mathcal{L}(\Phi_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0}))}{\partial\bm{\theta}_{0}}\right)^{\top}.

(7)

Then, based on the chain rule:

A_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0})=\triangledown_{\Phi}\mathcal{L}(\Phi_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0}))\triangledown\Phi_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0}).

(8)

For $\triangledown\Phi_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0})$ in (8), we first derive $\left(\frac{\partial\boldsymbol{s}_{T}}{\partial\boldsymbol{s}_{0}},\frac{\partial\bm{\theta}_{T}}{\partial\bm{\theta}_{0}}\right)^{\top}$ under the assumption that $(\boldsymbol{s}_{T},\bm{\theta}_{T})^{\top}$ is deterministic, i.e., not dependent on the Wiener process $W_{t}$. We then extend to the case where $(\boldsymbol{s}_{T},\bm{\theta}_{T})^{\top}$ is obtained stochastically from the forward flow $\Phi_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0})$ of SDE system (4) and derive $\left(\frac{\partial\Phi_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0})}{\partial\boldsymbol{s}_{0}},\frac{\partial\Phi_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0})}{\partial\bm{\theta}_{0}}\right)^{\top}$, which exploits the structural information in the stochastic reaction network through a dual process of forward and backward propagation.

The term $\triangledown_{\Phi}\mathcal{L}(\Phi_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0}))$ in (8) is determined by the loss function $\mathcal{L}$. Denote the observed bioprocess states by $\boldsymbol{s}^{c}$, the underlying true parameters of the real reaction network by $\bm{\theta}^{c}$, and the output by $O(\boldsymbol{s}^{c})$, which is a scalar function of the states. Here the superscript “c” indicates observations and parameters from the real system. The simulated output is denoted by $O(\boldsymbol{s})$, where $\boldsymbol{s}$ is the simulated state obtained with the initial state $\boldsymbol{s}_{0}=\boldsymbol{s}^{c}_{0}$ and $\bm{\theta}_{0}$ as the inputs of the simulation model. $\mathcal{L}$ is a scalar function measuring the difference between observed and simulated outputs, such as the mean squared error (MSE), i.e., $\mbox{E}[(O(\boldsymbol{s}^{c})-O(\boldsymbol{s}))^{2}|\boldsymbol{s}_{0},\bm{\theta}_{0}]$. From this we obtain $\triangledown_{\Phi}\mathcal{L}(\Phi_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0}))$. Combining the above results, we can derive the sensitivities $A_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0})$, which identify the key contributors to the model prediction MSE.
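To make the chain rule in Equation (8) concrete, the sketch below estimates the sensitivities for a deliberately simplified setting. It is not the adjoint method: the diffusion term is dropped, the model is a single Michaelis–Menten decay $ds/dt=-V_{max}s/(K_{m}+s)$ integrated with Euler steps, the output is $O(\boldsymbol{s})=s_{T}$, and gradients come from central finite differences. All function names and numerical values are hypothetical.

```python
def simulate(s0, v_max, k_m, T=1.0, n_steps=1000):
    """Euler integration of ds/dt = -V_max s / (K_m + s); diffusion is dropped."""
    dt = T / n_steps
    s = s0
    for _ in range(n_steps):
        s -= v_max * s / (k_m + s) * dt
    return s


def loss(x, observed):
    """Squared error between the observed output and the simulated s_T."""
    return (observed - simulate(*x)) ** 2


def central_diff(fun, x, h=1e-5):
    """Central finite-difference gradient of a scalar function fun at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((fun(up) - fun(down)) / (2.0 * h))
    return grad


x0 = [10.0, 2.0, 5.0]                 # (s_0, V_max, K_m), hypothetical inputs
observed = simulate(10.0, 2.5, 5.0)   # stand-in "real system" with V_max = 2.5

# Direct sensitivities A_{0,T} = dL / d(s_0, theta_0).
A_direct = central_diff(lambda x: loss(x, observed), x0)

# Chain rule of Equation (8): dL/dPhi times dPhi/d(s_0, theta_0).
dL_dPhi = -2.0 * (observed - simulate(*x0))
dPhi = central_diff(lambda x: simulate(*x), x0)
A_chain = [dL_dPhi * g for g in dPhi]

print(A_direct)
print(A_chain)
```

Finite differences need two simulations per input, so their cost grows linearly with the number of states and parameters; this scaling is the bottleneck that motivates an adjoint formulation.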

3 Metamodeling and Prediction

In this section, we derive the expressions of $\mu$ and $\sigma$. We start with the two-step transition from the initial states,

\begin{aligned}
\mbox{E}(\boldsymbol{s}_{2}|\boldsymbol{s}_{0},\bm{\theta}_{0})&=\int p(\boldsymbol{s}_{1},\bm{\theta}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})\mbox{E}(\boldsymbol{s}_{2}|\boldsymbol{s}_{1},\bm{\theta}_{1})d(\boldsymbol{s}_{1},\bm{\theta}_{1})\\
&=\int p(\boldsymbol{s}_{1},\bm{\theta}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})[\boldsymbol{s}_{1}+\bm{N}\bm{v}(\boldsymbol{s}_{1},\bm{\theta}_{1})\Delta t]d(\boldsymbol{s}_{1},\bm{\theta}_{1})\\
&=\mbox{E}(\boldsymbol{s}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})+\int p(\boldsymbol{s}_{1},\bm{\theta}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})\bm{N}\bm{v}(\boldsymbol{s}_{1},\bm{\theta}_{1})\Delta t\,d(\boldsymbol{s}_{1},\bm{\theta}_{1}).
\end{aligned}

(9)

By combining Equation (9) with $\mbox{E}(\boldsymbol{s}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})=\boldsymbol{s}_{0}+\bm{N}\bm{v}(\boldsymbol{s}_{0},\bm{\theta}_{0})\Delta t$, we have

\mbox{E}(\boldsymbol{s}_{2}|\boldsymbol{s}_{0},\bm{\theta}_{0})=\boldsymbol{s}_{0}+\bm{N}\bm{v}(\boldsymbol{s}_{0},\bm{\theta}_{0})\Delta t+\bm{N}\int p(\boldsymbol{s}_{1},\bm{\theta}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})\bm{v}(\boldsymbol{s}_{1},\bm{\theta}_{1})\Delta t\,d(\boldsymbol{s}_{1},\bm{\theta}_{1}).

(10)

We need to calculate $\int p(\boldsymbol{s}_{1},\bm{\theta}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})\bm{v}(\boldsymbol{s}_{1},\bm{\theta}_{1})\Delta t\,d(\boldsymbol{s}_{1},\bm{\theta}_{1})$ in Equation (10), i.e., the expectation of the flux rate after a one-step transition. For the $j$-th reaction, the reaction rate ${v_{j}}(\boldsymbol{s}_{1},\bm{\theta}_{1})$ depends on $s_{1}^{j}$, and by applying the MM kinetics in (2), we have

\begin{aligned}
\int p(\boldsymbol{s}_{1},\bm{\theta}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})\bm{v}_{j}(\boldsymbol{s}_{1},\bm{\theta}_{1})\Delta t\,d(\boldsymbol{s}_{1},\bm{\theta}_{1})&=\int p(s_{1}^{j},\bm{\theta}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})\frac{V_{max}^{j}s_{1}^{j}}{K_{m}^{j}+s_{1}^{j}}\Delta t\,d(\boldsymbol{s}_{1},\bm{\theta}_{1})\\
&=\int p(s_{1}^{j},\bm{\theta}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})\left(V_{max}^{j}-\frac{V_{max}^{j}K_{m}^{j}}{K_{m}^{j}+s_{1}^{j}}\right)\Delta t\,d(\boldsymbol{s}_{1},\bm{\theta}_{1})\\
&=V_{max}^{j}\Delta t-V_{max}^{j}K_{m}^{j}\Delta t\int p(s_{1}^{j},\bm{\theta}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})\frac{1}{K_{m}^{j}+s_{1}^{j}}d(\boldsymbol{s}_{1},\bm{\theta}_{1}).
\end{aligned}

(11)

To compute the second term in Equation (11), we first consider the probability distribution function $p(s_{1}^{j},\bm{\theta}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})$. Since $\bm{\theta}_{t}$ is constant over time, we focus on $p(s_{1}^{j}|\boldsymbol{s}_{0},\bm{\theta}_{0})$. The state transition model was already derived in Equation (3) of Section 2.1. This model adopts a truncated normal distribution, which is motivated by the fact that the concentration of molecules, represented by $s_{1}^{j}$, cannot be negative in reality. The truncation is also needed because the inverse moment of an untruncated normal distribution does not exist. We first review Theorem 1 and then apply it to approximate the second term in Equation (11).

Theorem 1 (Hall (1979)).

Suppose $\mu>0$ and $X$ is a random variable with density

\phi(X)=\left\{\begin{array}{ll}\frac{k}{\sqrt{2\pi}\sigma}e^{\frac{-(X-\mu)^{2}}{2\sigma^{2}}},&X\geq a>0,\\ 0,&X<a,\end{array}\right.

where $k$ is the normalization constant. Then for each $\hat{\sigma}=\frac{\sigma}{\mu}\leq\frac{1}{5}$ and any value of $a$ satisfying the inequality $\hat{\sigma}^{2}\leq\frac{a}{\mu}\leq\frac{1}{25}$, we have $\mu\mbox{E}[X^{-1}]=I(\hat{\sigma})+e_{1}$ with $\left|e_{1}\right|<8000\hat{\sigma}^{12}<3.3\times 10^{-5}$, where $I(\hat{\sigma})$ is given in terms of Dawson’s Integral by the following equations:

I(\hat{\sigma})=\frac{\sqrt{2}}{\hat{\sigma}}D\left(\frac{1}{\sqrt{2}\hat{\sigma}}\right),\quad D(x)=e^{-x^{2}}\int_{0}^{x}e^{t^{2}}dt.

(12)

Before applying Theorem 1, we rewrite the integral in Equation (11) by a change of variable,

\int p(s_{1}^{j}|\boldsymbol{s}_{0},\bm{\theta}_{0})\frac{1}{K_{m}^{j}+s_{1}^{j}}ds_{1}^{j}\overset{\hat{s}_{1}^{j}=s_{1}^{j}+K_{m}^{j}}{=}\int p(\hat{s}_{1}^{j}|\boldsymbol{s}_{0},\bm{\theta}_{0})\frac{1}{\hat{s}_{1}^{j}}d\hat{s}_{1}^{j}.

When $\boldsymbol{s}_{0}\gg\Delta t$, both conditions $\hat{\sigma}\leq\frac{1}{5}$ and $\hat{\sigma}^{2}\leq\frac{a}{\mu}\leq\frac{1}{25}$ can be satisfied. Thus, Theorem 1 is applicable to the approximation of $\int p(s_{1}^{j}|\boldsymbol{s}_{0},\bm{\theta}_{0})\frac{1}{K_{m}^{j}+s_{1}^{j}}ds_{1}^{j}$, which means:

\int p(s_{1}^{j}|\boldsymbol{s}_{0},\bm{\theta}_{0})\frac{1}{K_{m}^{j}+s_{1}^{j}}ds_{1}^{j}\approx\frac{I(\hat{\sigma}_{0}^{j})}{\mu_{0}^{j}},

(13)

\mu_{0}^{j}=s_{0}^{j}+\sum\limits_{k=1}^{R}\bm{N}_{j,k}\boldsymbol{v}_{k}(\boldsymbol{s}_{0},\bm{\theta}_{0})\Delta t+K_{m}^{j},~~~\sigma_{0}^{j}=\sqrt{\sum\limits_{k=1}^{R}\bm{N}_{j,k}^{2}\boldsymbol{v}_{k}(\boldsymbol{s}_{0},\bm{\theta}_{0})\Delta t},~~~\hat{\sigma}_{0}^{j}=\frac{\sigma_{0}^{j}}{\mu_{0}^{j}}.

(14)

Since Dawson’s Integral does not have an analytic expression, we introduce an analytic approximation of Dawson’s Integral (Filobello-Nino et al., 2019): $D(x)\approx\frac{1}{2x}+\frac{1}{4x^{3}}$ for $x>2.68$. According to Filobello-Nino et al. (2019), this approximation keeps the relative error below 2.5%. Since we follow Theorem 1’s assumption that $\hat{\sigma}_{0}\leq\frac{1}{5}$, we have $\frac{1}{\sqrt{2}\hat{\sigma}_{0}}\geq\frac{1}{\frac{\sqrt{2}}{5}}>2.68$. Thus:

\frac{I(\hat{\sigma}_{0}^{j})}{\mu_{0}^{j}}=\frac{\frac{\sqrt{2}}{\hat{\sigma}_{0}^{j}}D\left(\frac{1}{\sqrt{2}\hat{\sigma}_{0}^{j}}\right)}{\mu_{0}^{j}}\approx\frac{\frac{\sqrt{2}}{\hat{\sigma}_{0}^{j}}\frac{\sqrt{2}\hat{\sigma}_{0}^{j}}{2}\left(1+\frac{(\sqrt{2}\hat{\sigma}_{0}^{j})^{2}}{2}\right)}{\mu_{0}^{j}}=\frac{1+{(\hat{\sigma}_{0}^{j})^{2}}}{\mu_{0}^{j}}=\frac{{(\sigma_{0}^{j})^{2}}+{(\mu_{0}^{j})^{2}}}{{(\mu_{0}^{j})^{3}}}.

(15)
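The quality of the closed form in Equation (15) can be checked against a direct numerical evaluation of $I(\hat{\sigma})$ from Equation (12). In the sketch below, Dawson's Integral is computed with composite Simpson quadrature rather than a library routine; the quadrature resolution, function names, and the tested values of $\hat{\sigma}$ are assumptions made for illustration.

```python
import math


def dawson(x, n=4000):
    """D(x) = exp(-x^2) * int_0^x exp(t^2) dt via composite Simpson's rule.

    The integrand is written as exp(t^2 - x^2) to avoid overflow.
    n is the number of subintervals and must be even.
    """
    h = x / n
    total = math.exp(-x * x) + 1.0    # endpoint values at t = 0 and t = x
    for i in range(1, n):
        t = i * h
        weight = 4.0 if i % 2 else 2.0
        total += weight * math.exp(t * t - x * x)
    return total * h / 3.0


def inverse_moment_factor(sigma_hat):
    """I(sigma_hat) from Equation (12)."""
    x = 1.0 / (math.sqrt(2.0) * sigma_hat)
    return math.sqrt(2.0) / sigma_hat * dawson(x)


def closed_form(sigma_hat):
    """Approximation 1 + sigma_hat^2 obtained in Equation (15)."""
    return 1.0 + sigma_hat ** 2


for sigma_hat in (0.05, 0.1, 0.2):
    reference = inverse_moment_factor(sigma_hat)
    approx = closed_form(sigma_hat)
    print(sigma_hat, reference, approx, abs(reference - approx) / reference)
```

The relative gap is largest at the boundary $\hat{\sigma}=\frac{1}{5}$ and shrinks quickly as $\hat{\sigma}$ decreases, which is consistent with the regime $\boldsymbol{s}_{0}\gg\Delta t$ assumed above.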

Combining results from Equations (10), (11), (13) and (15), we can get,

\mbox{E}(\boldsymbol{s}_{2}|\boldsymbol{s}_{0},\bm{\theta}_{0})\approx f(\bm{v}(\boldsymbol{s}_{0},\bm{\theta}_{0}))\triangleq\boldsymbol{s}_{0}+\bm{N}\bm{v}(\boldsymbol{s}_{0},\bm{\theta}_{0})\Delta t+\bm{N}V_{max}\Delta t-\bm{N}\Delta t\left(V_{max}\odot K_{m}\odot\frac{I(\hat{\sigma}_{0})}{\mu_{0}}\right),

(16)

where $\frac{I(\hat{\sigma}_{0})}{\mu_{0}}$, $V_{max}$ and $K_{m}$ are $R$-dimensional vectors with $j$-th component equal to $\frac{I(\hat{\sigma}_{0}^{j})}{\mu_{0}^{j}}$, $V_{max}^{j}$ and $K_{m}^{j}$, respectively.
In addition, we can derive $\mbox{Var}(\boldsymbol{s}_{2}|\boldsymbol{s}_{0},\bm{\theta}_{0})$ following the same process:

\begin{aligned}
\mbox{Var}(\boldsymbol{s}_{2}|\boldsymbol{s}_{0},\bm{\theta}_{0})&=\int p(\boldsymbol{s}_{1},\bm{\theta}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})\mbox{Var}(\boldsymbol{s}_{2}|\boldsymbol{s}_{1},\bm{\theta}_{1})d(\boldsymbol{s}_{1},\bm{\theta}_{1})\\
&=\int p(\boldsymbol{s}_{1},\bm{\theta}_{1}|\boldsymbol{s}_{0},\bm{\theta}_{0})\left[\bm{N}\mathrm{diag}(\bm{v}(\boldsymbol{s}_{1},\bm{\theta}_{1}))\bm{N}^{\top}\Delta t\right]d(\boldsymbol{s}_{1},\bm{\theta}_{1})\\
&=\bm{N}\mathrm{diag}\left(V_{max}-\left(V_{max}\odot K_{m}\odot\frac{I(\hat{\sigma}_{0})}{\mu_{0}}\right)\right)\bm{N}^{\top}\Delta t.
\end{aligned}

(17)

Since we know the distribution of $\boldsymbol{s}_{2}$ conditioned on $\boldsymbol{s}_{0}$ , we could extend it to time $t$ and write the form of $\mu$ and $\sigma$ :

\begin{aligned}
d\boldsymbol{s}_{t}&=\mu(\boldsymbol{s}_{t},\bm{\theta}_{t})dt+\sigma(\boldsymbol{s}_{t},\bm{\theta}_{t})dW_{t},\\
\mu(\boldsymbol{s}_{t},\bm{\theta}_{t})&=\mbox{E}[\boldsymbol{s}_{t+2}-\boldsymbol{s}_{t}|\boldsymbol{s}_{t},\bm{\theta}_{t}]\approx\bm{N}\bm{v}(\boldsymbol{s}_{t},\bm{\theta}_{t})\Delta t+\bm{N}V_{max}\Delta t-\bm{N}\Delta t\left(V_{max}\odot K_{m}\odot\frac{I(\hat{\sigma}_{t})}{\mu_{t}}\right),\\
\sigma^{2}(\boldsymbol{s}_{t},\bm{\theta}_{t})&=\mbox{Var}[\boldsymbol{s}_{t+2}|\boldsymbol{s}_{t},\bm{\theta}_{t}]\approx\bm{N}\mathrm{diag}\left(V_{max}-\left(V_{max}\odot K_{m}\odot\frac{I(\hat{\sigma}_{t})}{\mu_{t}}\right)\right)\bm{N}^{\top}\Delta t.
\end{aligned}
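The drift and variance expressions above can be evaluated directly once $\bm{N}$, $V_{max}$, $K_{m}$, and $\Delta t$ are given. The sketch below does so under two assumptions that are not stated in this form in the text: the substrate of reaction $j$ is species $j$ (so the first $J$ species index the reactions), and $\frac{I(\hat{\sigma})}{\mu}$ is replaced by the closed form $\frac{\sigma^{2}+\mu^{2}}{\mu^{3}}$ from Equation (15). The chain A → B → (removed) and all numerical values are hypothetical.

```python
def mm_rates(s, v_max, k_m):
    """Michaelis-Menten fluxes, assuming reaction j uses species j as substrate."""
    return [v_max[j] * s[j] / (k_m[j] + s[j]) for j in range(len(v_max))]


def drift_and_variance(s, N, v_max, k_m, dt):
    """Approximate two-step drift vector and covariance matrix."""
    n_species, n_reactions = len(N), len(v_max)
    v = mm_rates(s, v_max, k_m)
    # Equation (14): mean and variance of the shifted state s^j + K_m^j.
    mu0 = [s[j] + sum(N[j][k] * v[k] for k in range(n_reactions)) * dt + k_m[j]
           for j in range(n_reactions)]
    var0 = [sum(N[j][k] ** 2 * v[k] for k in range(n_reactions)) * dt
            for j in range(n_reactions)]
    # Equation (15): I(sigma_hat) / mu is approximated by (sigma^2 + mu^2) / mu^3.
    inv_moment = [(var0[j] + mu0[j] ** 2) / mu0[j] ** 3
                  for j in range(n_reactions)]
    # Expected flux after one step: V_max - V_max * K_m * I / mu.
    v_next = [v_max[j] - v_max[j] * k_m[j] * inv_moment[j]
              for j in range(n_reactions)]
    drift = [sum(N[i][j] * (v[j] + v_next[j]) for j in range(n_reactions)) * dt
             for i in range(n_species)]
    cov = [[sum(N[i][j] * v_next[j] * N[k][j] for j in range(n_reactions)) * dt
            for k in range(n_species)]
           for i in range(n_species)]
    return drift, cov


N = [[-1.0, 0.0],
     [1.0, -1.0]]                     # hypothetical chain A -> B -> (removed)
s = [100.0, 50.0]
v_max, k_m, dt = [2.0, 1.0], [10.0, 5.0], 0.1
drift, cov = drift_and_variance(s, N, v_max, k_m, dt)
print(drift)
print(cov)
```

Because the expected one-step-ahead flux stays close to the current flux when $\Delta t$ is small, the drift is close to $2\bm{N}\bm{v}\Delta t$, which is a quick sanity check on any implementation.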

We can also calculate $\frac{\partial\mu(\boldsymbol{s}_{t},\bm{\theta}_{t})}{\partial\boldsymbol{s}_{t}}$, $\frac{\partial\mu(\boldsymbol{s}_{t},\bm{\theta}_{t})}{\partial\bm{\theta}_{t}}$, $\frac{\partial\sigma(\boldsymbol{s}_{t},\bm{\theta}_{t})}{\partial\boldsymbol{s}_{t}}$ and $\frac{\partial\sigma(\boldsymbol{s}_{t},\bm{\theta}_{t})}{\partial\bm{\theta}_{t}}$, which will be the inputs of the algorithm in Section 4.

4 Local Sensitivity Analysis

4.1 Convergence Analysis

We start from the gradient of the backward flow represented in Equation (6),

\begin{aligned}
\triangledown\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})&=\triangledown\boldsymbol{s}_{t_{2}}-\triangledown\int_{0}^{t_{2}}(\mu(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})),f(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})))^{\top}dt\\
&\quad-\triangledown\int_{0}^{t_{2}}(\sigma(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})),g(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})))^{\top}\circ d\widetilde{W}_{t}.
\end{aligned}

The gradient $\triangledown\boldsymbol{s}_{t_{2}}$ is the identity matrix $\mathbb{I}_{d}$ of dimension $d$, where $d$ is the sum of the dimensions of the bioprocess states $\boldsymbol{s}_{t}$ and the parameter set $\bm{\theta}_{t}$. For the last two terms on the right-hand side of the equation, based on Proposition 2.4.3 and Theorem 3.4.3 from Kunita (2019), we can switch the order of the derivative and the integral, i.e.,

	$\displaystyle\triangledown\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})$	$\displaystyle=$	$\displaystyle\mathbb{I}_{d}-\int_{0}^{t_{2}}(\triangledown_{\psi}\mu(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})),\triangledown_{\psi}f(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})))^{\top}\triangledown\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})dt$
			$\displaystyle-\int_{0}^{t_{2}}(\triangledown_{\psi}\sigma(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})),\triangledown_{\psi}g(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})))^{\top}\triangledown\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})\circ d\widetilde{W}_{t}.$		(18)

Since $\psi_{0,t_{2}}$ is the inverse function of $\Phi_{0,t_{2}}$, we have $\Phi_{0,t_{2}}(\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))=(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})$. Therefore, by applying the chain rule, $\triangledown\Phi_{0,t_{2}}(\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))\triangledown\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})=\mathbb{I}_{d}$. Then, by applying the Stratonovich version of Itô’s formula (Theorem 2.4.1 in Kunita (2019)) to Equation (18), we have

	$\displaystyle\frac{\partial\triangledown\Phi_{0,t_{2}}(\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))}{\partial\triangledown\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})}=\frac{\partial(\triangledown\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))^{-1}}{\partial\triangledown\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})}=-\frac{1}{\triangledown\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})^{2}},$

where

$\displaystyle\triangledown\Phi_{0,t_{2}}(\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))$			(19)
	$\displaystyle=$	$\displaystyle\mathbb{I}_{d}-\int_{0}^{t_{2}}(\triangledown_{\psi}\mu(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})),\triangledown_{\psi}f(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})))^{\top}\triangledown\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})\frac{\partial\triangledown\Phi_{t,t_{2}}(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))}{\partial\triangledown\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})}dt$
		$\displaystyle-\int_{0}^{t_{2}}(\triangledown_{\psi}\sigma(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})),\triangledown_{\psi}g(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})))^{\top}\triangledown\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})\frac{\partial\triangledown\Phi_{t,t_{2}}(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))}{\partial\triangledown\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})}\circ d\widetilde{W}_{t}$
	$\displaystyle=$	$\displaystyle\mathbb{I}_{d}+\int_{0}^{t_{2}}(\triangledown_{\psi}\mu(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})),\triangledown_{\psi}f(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})))^{\top}\triangledown\Phi_{t,t_{2}}(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))dt$
		$\displaystyle+\int_{0}^{t_{2}}(\triangledown_{\psi}\sigma(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})),\triangledown_{\psi}g(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})))^{\top}\triangledown\Phi_{t,t_{2}}(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))\circ d\widetilde{W}_{t}.$
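The sign change between the two expressions in Equation (19) can be checked in one line. Write $M_{t}$ as shorthand (introduced here only for illustration) for $\triangledown\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})$, so that $\triangledown\Phi_{t,t_{2}}(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))=M_{t}^{-1}$ by the chain-rule identity, and substitute the derivative of the inverse stated before Equation (19):

```latex
-\,M_{t}\,\frac{\partial M_{t}^{-1}}{\partial M_{t}}
  \;=\; -\,M_{t}\left(-\frac{1}{M_{t}^{2}}\right)
  \;=\; M_{t}^{-1}
  \;=\; \triangledown\Phi_{t,t_{2}}\big(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})\big).
```

This display follows the scalar-style notation used above. For a matrix-valued $M_{t}$, the corresponding standard identity is $d(M^{-1})=-M^{-1}(dM)M^{-1}$, which yields the same sign flip in both integrands.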

Let $\widetilde{A}_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})=A_{0,t_{2}}(\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))$. Then, by combining Equation (19) and

	$\displaystyle A_{0,t_{2}}(\boldsymbol{s}_{0},\bm{\theta}_{0})=\triangledown\mathcal{L}(\Phi_{0,t_{2}}(\boldsymbol{s}_{0},\bm{\theta}_{0}))\triangledown\Phi_{0,t_{2}}(\boldsymbol{s}_{0},\bm{\theta}_{0}),$

we can derive

$\displaystyle\widetilde{A}_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})=A_{0,t_{2}}(\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))=\triangledown\mathcal{L}(\Phi_{0,t_{2}}(\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})))\triangledown\Phi_{0,t_{2}}(\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))$			(20)
	$\displaystyle=$	$\displaystyle\triangledown\mathcal{L}(\boldsymbol{s}_{t_{2}})+\int_{0}^{t_{2}}(\triangledown_{\psi}\mu(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})),\triangledown_{\psi}f(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})))^{\top}\widetilde{A}_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})dt$
		$\displaystyle+\int_{0}^{t_{2}}(\triangledown_{\psi}\sigma(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})),\triangledown_{\psi}g(\psi_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})))^{\top}\widetilde{A}_{t,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})\circ d\widetilde{W}_{t}.$

From Equations (6) and (20), we can see that the drift and diffusion terms for $\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})$ and $\widetilde{A}_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})$ are $C_{b}^{\infty,1}$, which means the system $(\widetilde{A}_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}),\psi_{0,t_{2}}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}}))$ has a unique strong solution. Thus, taking $t_{2}=T$, we can define $\widetilde{A}_{0,T}(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}})=F(\boldsymbol{s}_{t_{2}},\bm{\theta}_{t_{2}},W)$, where $W=\{W_{t}\}_{0\leq t\leq T}$ is a path of the Wiener process and $F:\mathbb{R}^{d}\times C([0,1],\mathbb{R}^{J})\to\mathbb{R}^{d}$ is a deterministic measurable function, also called the Itô map. In our case, the number of reactions, denoted by $J$, defines the dimension of the Wiener process used to simulate the stochastic reaction network. Note that $A_{0,t_{2}}(\boldsymbol{s}_{0},\bm{\theta}_{0})=\widetilde{A}_{0,t_{2}}(\Phi_{0,t_{2}}(\boldsymbol{s}_{0},\bm{\theta}_{0}))$. In Section 2.2, we have already shown that $\Phi_{0,t_{2}}(\boldsymbol{s}_{0},\bm{\theta}_{0})$ has a unique strong solution. Similar to $F$, we define $G:\mathbb{R}^{d}\times C([0,1],\mathbb{R}^{J})\to\mathbb{R}^{d}$ as the solution map for the forward flow, i.e., $G(\boldsymbol{s}_{0},\bm{\theta}_{0},W)=\Phi_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0})$. It then follows that

	$\displaystyle A_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0})=\widetilde{A}_{0,T}(G(\boldsymbol{s}_{0},\bm{\theta}_{0},W))=F(G(\boldsymbol{s}_{0},\bm{\theta}_{0},W),W).$

4.2 Algorithm Development

Based on Equations (5), (6), and (20), with $\frac{\partial\mu(\boldsymbol{s}_{t},\bm{\theta}_{t})}{\partial\boldsymbol{s}_{t}}$, $\frac{\partial\mu(\boldsymbol{s}_{t},\bm{\theta}_{t})}{\partial\bm{\theta}_{t}}$, $\frac{\partial\sigma(\boldsymbol{s}_{t},\bm{\theta}_{t})}{\partial\boldsymbol{s}_{t}}$, and $\frac{\partial\sigma(\boldsymbol{s}_{t},\bm{\theta}_{t})}{\partial\bm{\theta}_{t}}$ as inputs, we can approximate $A_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0})$. Because the system is written with Stratonovich integrals, the commonly used Euler-Maruyama scheme, which requires the system to be represented with Itô integrals, is not directly applicable. We therefore use the Euler-Heun method:

$\displaystyle(\boldsymbol{s}_{t+1},\boldsymbol{\theta}_{t+1})^{\top}$	$\displaystyle=$	$\displaystyle(\boldsymbol{s}_{t},\boldsymbol{\theta}_{t})^{\top}+(\mu(\boldsymbol{s}_{t},\bm{\theta}_{t}),f(\boldsymbol{s}_{t},\bm{\theta}_{t}))^{\top}\Delta t$
		$\displaystyle+\frac{1}{2}\left[(\sigma(\boldsymbol{s}_{t},\bm{\theta}_{t}),g(\boldsymbol{s}_{t},\bm{\theta}_{t}))^{\top}+(\sigma(\overline{\boldsymbol{s}}_{t},\overline{\bm{\theta}}_{t}),g(\overline{\boldsymbol{s}}_{t},\overline{\bm{\theta}}_{t}))^{\top}\right](W_{t}-W_{t-1}),$
$\displaystyle(\overline{\boldsymbol{s}}_{t},\overline{\bm{\theta}}_{t})^{\top}$	$\displaystyle=$	$\displaystyle(\boldsymbol{s}_{t},\bm{\theta}_{t})^{\top}+(\sigma(\boldsymbol{s}_{t},\bm{\theta}_{t}),g(\boldsymbol{s}_{t},\bm{\theta}_{t}))^{\top}(W_{t}-W_{t-1}).$
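To illustrate why the choice of scheme matters, the following sketch applies both an Euler-Heun step and an Euler-Maruyama step to a toy scalar SDE with drift $a x$ and diffusion $b x$ on the same Wiener path. The toy model, the parameter values, and all function names are assumptions made for illustration and are not the bioprocess model. For this toy model the Stratonovich solution is $x_{0}\exp(aT+bW_{T})$ and the Itô solution is $x_{0}\exp((a-b^{2}/2)T+bW_{T})$, so the two schemes can be compared against closed forms.

```python
import math
import random


def euler_heun_step(x, drift, diffusion, dt, dw):
    """One Euler-Heun step for dX = drift(X) dt + diffusion(X) o dW (Stratonovich)."""
    x_bar = x + diffusion(x) * dw  # predictor uses the diffusion term only
    return x + drift(x) * dt + 0.5 * (diffusion(x) + diffusion(x_bar)) * dw


def euler_maruyama_step(x, drift, diffusion, dt, dw):
    """One Euler-Maruyama step; consistent with the Ito reading of the coefficients."""
    return x + drift(x) * dt + diffusion(x) * dw


def simulate_both(x0, a, b, t_end, n_steps, seed=0):
    """Integrate the toy SDE with both schemes on one shared Wiener path."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    drift = lambda x: a * x
    diffusion = lambda x: b * x
    x_heun = x_em = x0
    w = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x_heun = euler_heun_step(x_heun, drift, diffusion, dt, dw)
        x_em = euler_maruyama_step(x_em, drift, diffusion, dt, dw)
        w += dw
    return x_heun, x_em, w


x_heun, x_em, w_T = simulate_both(x0=1.0, a=0.3, b=0.5, t_end=1.0, n_steps=2000)
stratonovich_exact = math.exp(0.3 * 1.0 + 0.5 * w_T)
ito_exact = math.exp((0.3 - 0.5 * 0.5 ** 2) * 1.0 + 0.5 * w_T)
print("Euler-Heun     :", x_heun, "Stratonovich closed form:", stratonovich_exact)
print("Euler-Maruyama :", x_em, "Ito closed form         :", ito_exact)
```

With the same coefficients, the Euler-Maruyama iterate tracks the Itô solution rather than the Stratonovich one, which is the mismatch the Euler-Heun scheme avoids.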

Based on these schemes, the procedure for adjoint sensitivity analysis on the SDE-based mechanistic model is described in Algorithm 1. The algorithm is built on the Euler-Heun scheme, takes the gradients of the drift and diffusion terms as inputs, and produces the expected gradient of the simulation prediction MSE as its output. For each replication $n$, the algorithm generates a sample path of the Wiener process $W_{t}^{n}$ in Step 2. In Steps 3 and 5, we simulate the forward flow from $\boldsymbol{s}_{0}$ and $\bm{\theta}_{0}$ and obtain the simulated output, denoted by $O(\boldsymbol{s})$. In the empirical study, the observed states $\bm{s}^{c}$ are generated from the model with a predefined set of true parameters $\bm{\theta}^{c}$, using the same path $W_{t}^{n}$ to control randomness. In Step 4, $W_{t}^{n}$ is queried again to form the backward pass $\widetilde{W}_{t}^{n}$, which drives the backward flow $\psi^{n}_{(t-1)\Delta t,T}(\Phi_{0,T}^{n}(\boldsymbol{s}_{0},\bm{\theta}_{0}))$ in Step 6. By combining the observed output, the simulated output, and the backward flow, we derive the sensitivity $\widetilde{A}^{n}_{(t-1)\Delta t,T}(\Phi_{0,T}^{n}(\boldsymbol{s}_{0},\bm{\theta}_{0}))$ corresponding to path $W_{t}^{n}$ in Step 7. Repeating these steps yields $N$ sample paths, and the mean of $\widetilde{A}^{n}_{0,T}(\Phi_{0,T}^{n}(\boldsymbol{s}_{0},\bm{\theta}_{0}))$ over the sample paths serves as the final sensitivity, as shown in Step 8.

Input:
• Observed states $\boldsymbol{s}^{c}$ from the real system.
• Initial parameters $\bm{\theta}_{0}$.
• Start time $t_{0}$, end time $T$.
• Gradients of the drift and diffusion terms $\frac{\partial\mu(\boldsymbol{s}_{t},\bm{\theta}_{t})}{\partial\boldsymbol{s}_{t}}$, $\frac{\partial\mu(\boldsymbol{s}_{t},\bm{\theta}_{t})}{\partial\bm{\theta}_{t}}$, $\frac{\partial\sigma(\boldsymbol{s}_{t},\bm{\theta}_{t})}{\partial\boldsymbol{s}_{t}}$, and $\frac{\partial\sigma(\boldsymbol{s}_{t},\bm{\theta}_{t})}{\partial\bm{\theta}_{t}}$.
• Number of iterations $N$.
• Grid size $\Delta t$, number of steps $S=\frac{T}{\Delta t}$.

Output: $\mbox{E}\left[\left.\left(\frac{\partial(O(\boldsymbol{s})-O(\boldsymbol{s}^{c}))^{2}}{\partial\boldsymbol{s}_{0}},\frac{\partial(O(\boldsymbol{s})-O(\boldsymbol{s}^{c}))^{2}}{\partial\bm{\theta}_{0}}\right)^{\top}\right|(\boldsymbol{s}_{0},\bm{\theta}_{0})\right]$

1. Calculate the real system output $O(\boldsymbol{s}^{c})$ and define $\boldsymbol{s}_{0}=\boldsymbol{s}_{0}^{c}$;
for $n=1,\ldots,N$ do
  for $t\leftarrow 1$ to $S$ do
    2. Generate a sample path of Wiener process $W^{n}_{t}$ with grid size $\Delta t$;
    3. Based on $W^{n}_{t}$, generate $\Phi^{n}_{0,t\Delta t}(\boldsymbol{s}_{0},\bm{\theta}_{0})$ from Equation (5) with Euler-Heun Scheme;
  end for
  4. Generate backward pass $\widetilde{W}^{n}_{t}$ based on forward pass $W_{t}^{n}$;
  5. Calculate the model prediction output $O(\Phi^{n}_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0}))$;
  for $t\leftarrow S$ to $1$ do
    6. Based on $\widetilde{W}^{n}_{t}$, generate $\psi^{n}_{(t-1)\Delta t,T}(\Phi_{0,T}^{n}(\boldsymbol{s}_{0},\bm{\theta}_{0}))$ from Equation (6) with Euler-Heun Scheme;
    7. Based on $\psi^{n}_{(t-1)\Delta t,T}(\Phi_{0,T}^{n}(\boldsymbol{s}_{0},\bm{\theta}_{0}))$, generate $\widetilde{A}^{n}_{(t-1)\Delta t,T}(\Phi_{0,T}^{n}(\boldsymbol{s}_{0},\bm{\theta}_{0}))$ from Equation (20) with Euler-Heun Scheme, with $\triangledown\mathcal{L}(\boldsymbol{s}_{T})=2|O(\boldsymbol{s})-O(\boldsymbol{s}^{c})|$;
  end for
end for
8. Calculate $\mbox{E}\left[\left.\left(\frac{\partial(O(\boldsymbol{s})-O(\boldsymbol{s}^{c}))^{2}}{\partial\boldsymbol{s}_{0}},\frac{\partial(O(\boldsymbol{s})-O(\boldsymbol{s}^{c}))^{2}}{\partial\bm{\theta}_{0}}\right)^{\top}\right|(\boldsymbol{s}_{0},\bm{\theta}_{0})\right]=\frac{1}{N}\sum\limits_{n=1}^{N}\widetilde{A}^{n}_{0,T}(\Phi_{0,T}^{n}(\boldsymbol{s}_{0},\bm{\theta}_{0}))$

Algorithm 1 Adjoint Sensitivity Analysis on SDEs based on the MSE of Output Prediction.
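The forward-backward structure of Algorithm 1 can be sketched on a toy problem. The code below is a minimal illustration under stated assumptions, not the IVT implementation: it uses a scalar Stratonovich SDE with drift $\theta x$ and diffusion $\beta x$ ($\beta$ held fixed), takes the output to be the final state, sets the terminal adjoint to the signed derivative $2(x_{T}-y)$ of the squared error, and reuses the forward Wiener increments in reverse order for the backward pass. All names are hypothetical. For this toy model the per-path gradients are known in closed form, $2(x_{T}-y)x_{T}/x_{0}$ with respect to $x_{0}$ and $2(x_{T}-y)x_{T}T$ with respect to $\theta$, which gives a direct check.

```python
import math
import random


def heun_step(u, drift, diffusion, dt, dw):
    """Euler-Heun step for a vector Stratonovich SDE; u is a list of floats."""
    g0 = diffusion(u)
    u_bar = [ui + gi * dw for ui, gi in zip(u, g0)]  # diffusion-only predictor
    g1 = diffusion(u_bar)
    f0 = drift(u)
    return [ui + fi * dt + 0.5 * (ga + gb) * dw
            for ui, fi, ga, gb in zip(u, f0, g0, g1)]


def adjoint_gradient_one_path(x0, theta, beta, y_obs, t_end, n_steps, rng):
    """Gradient of (x_T - y_obs)^2 w.r.t. (x0, theta) along one Wiener path."""
    dt = t_end / n_steps
    increments = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]

    # Forward pass: simulate the state with the stored increments.
    state = [x0]
    for dw in increments:
        state = heun_step(state, lambda u: [theta * u[0]],
                          lambda u: [beta * u[0]], dt, dw)
    x_T = state[0]

    # Backward pass on (x, adjoint of x, accumulated gradient w.r.t. theta).
    def back_drift(u):
        x, a_x, _ = u
        return [-theta * x, theta * a_x, a_x * x]

    def back_diffusion(u):
        x, a_x, _ = u
        return [-beta * x, beta * a_x, 0.0]

    state = [x_T, 2.0 * (x_T - y_obs), 0.0]  # terminal condition of the adjoint
    for dw in reversed(increments):          # same noise, queried backward
        state = heun_step(state, back_drift, back_diffusion, dt, dw)
    _, grad_x0, grad_theta = state
    return x_T, sum(increments), grad_x0, grad_theta


def expected_gradient(x0, theta, beta, y_obs, t_end, n_steps, n_paths, seed=0):
    """Average the per-path adjoint gradients over independent Wiener paths."""
    rng = random.Random(seed)
    total_x0 = total_theta = 0.0
    for _ in range(n_paths):
        _, _, g_x0, g_theta = adjoint_gradient_one_path(
            x0, theta, beta, y_obs, t_end, n_steps, rng)
        total_x0 += g_x0
        total_theta += g_theta
    return total_x0 / n_paths, total_theta / n_paths


print(expected_gradient(x0=1.0, theta=0.4, beta=0.2, y_obs=0.5,
                        t_end=1.0, n_steps=200, n_paths=50))
```

Storing the increments is the simplest way to reproduce the same path in the backward pass in a sketch of this size; other ways of reconstructing the path are possible and are not shown here.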

By Theorem 2, as the grid size $\Delta t\to 0$, the output of Algorithm 1 converges pathwise (i.e., almost surely) from any fixed starting point to the true local sensitivity. This sensitivity analysis, which accounts for the interdependencies within the multi-scale mechanistic model, accelerates the search for the optimal parameters $\boldsymbol{\theta}$ that minimize the MSE of the model prediction.

Theorem 2 (Li et al. (2020)).

Suppose the schemes $F_{h}$ and $G_{h}$ , with $h=\frac{T}{L}$ denoting the size of the grid for solvers, satisfy the following conditions:

• $F_{h}(\boldsymbol{s}_{t},\boldsymbol{\theta}_{t},W)\to F(\boldsymbol{s}_{t},\boldsymbol{\theta}_{t},W)$ and $G_{h}(\boldsymbol{s}_{t},\boldsymbol{\theta}_{t},W)\to G(\boldsymbol{s}_{t},\boldsymbol{\theta}_{t},W)$ as $h\to 0$;
• For any $M>0$, $\sup_{\|(\boldsymbol{s}_{t},\boldsymbol{\theta}_{t})\|\leq M}|F_{h}(\boldsymbol{s}_{t},\boldsymbol{\theta}_{t},W)-F(\boldsymbol{s}_{t},\boldsymbol{\theta}_{t},W)|\to 0$ as $h\to 0$.

Then for any starting point $\boldsymbol{s}_{0}$ and $\boldsymbol{\theta}_{0}$, we have

	$\displaystyle F_{h}(G_{h}(\boldsymbol{s}_{0},\bm{\theta}_{0},W),W)\to F(G(\boldsymbol{s}_{0},\bm{\theta}_{0},W),W)=A_{0,T}(\boldsymbol{s}_{0},\bm{\theta}_{0}),\quad\forall\boldsymbol{s}_{0}\in\mathbb{R}^{d}.$		(21)
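The pathwise nature of this convergence can be illustrated numerically for the forward solver alone. The sketch below is an illustration under assumptions, not a proof and not the bioprocess model: it uses a toy scalar Stratonovich SDE with drift $a x$ and diffusion $b x$, whose solution $\exp(aT+bW_{T})$ (for $x_{0}=1$) is known, and compares Euler-Heun approximations on coarser grids built from the same fine Wiener increments. The helper names are hypothetical.

```python
import math
import random


def euler_heun_path(x0, a, b, dt, increments):
    """Euler-Heun solution of dX = a X dt + b X o dW for given increments."""
    x = x0
    for dw in increments:
        x_bar = x + b * x * dw
        x = x + a * x * dt + 0.5 * (b * x + b * x_bar) * dw
    return x


def coarsen(increments, factor):
    """Sum consecutive fine increments so coarse grids share the same path."""
    return [sum(increments[i:i + factor])
            for i in range(0, len(increments), factor)]


def mean_abs_error_by_grid(a, b, t_end, fine_steps, factors, n_paths, seed=0):
    """Mean absolute terminal error for each grid size h = factor * t_end / fine_steps."""
    rng = random.Random(seed)
    dt_fine = t_end / fine_steps
    totals = [0.0] * len(factors)
    for _ in range(n_paths):
        fine = [rng.gauss(0.0, math.sqrt(dt_fine)) for _ in range(fine_steps)]
        exact = math.exp(a * t_end + b * sum(fine))  # closed form with x0 = 1
        for j, factor in enumerate(factors):
            approx = euler_heun_path(1.0, a, b, dt_fine * factor,
                                     coarsen(fine, factor))
            totals[j] += abs(approx - exact)
    return [total / n_paths for total in totals]


# Grid sizes h = 1/16, 1/128, 1/1024 on the same simulated paths.
errors = mean_abs_error_by_grid(a=0.4, b=0.2, t_end=1.0, fine_steps=1024,
                                factors=[64, 8, 1], n_paths=50)
print(errors)
```

The printed errors shrink as $h$ decreases, which is the behavior the first condition of the theorem requires of the forward scheme; the same kind of check could be applied to the backward adjoint solver.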

5 Empirical Study

In this section, we conduct an empirical study using the in vitro transcription (IVT) system as an example to validate our SA algorithm. The bioprocess model from Wang et al. (2024), with the true model parameters denoted by $\bm{\theta}^{c}$, is used to assess the performance of the proposed adjoint SA algorithm. We start by providing an in-depth interpretation of the sensitivity analysis results obtained from Algorithm 1. We then assess the efficacy of our approach by showing the rate of decrease in the MSE of the simulation output prediction over multiple search iterations of model parameters.

The IVT process is mainly composed of four sub-processes: (1) initiation; (2) elongation; (3) termination; and (4) degradation of generated RNA molecule product. The molecular reaction network is illustrated in Figure 1. One insightful interpretation of the sensitivity analysis results is to investigate the relative importance of each subprocess, which is inferred from the sensitivity of their corresponding parameters. Aligned with this reasoning, Figure 1 presents the relative importance of each subprocess within the IVT system. Notably, the result underscores the pivotal role of elongation reactions in the IVT process regardless of termination time $T$ . Furthermore, it is observed that the impact of degradation increases over time, which fits our expectation since the synthesis rate of mRNA transcripts decreases as the IVT process approaches completion. Concurrently, the accumulation of generated RNA transcripts leads to an increase in the degradation rate at the system level. Therefore, the relevance of parameters associated with the degradation model becomes increasingly significant over time. Conversely, the influence of the initiation process appears to diminish over time as the raw materials (i.e., NTPs) used to synthesize RNA products are consumed.

To further assess the performance of the proposed adjoint SA, we benchmark our approach against a state-of-the-art gradient estimation approach (Fu, 2015) that estimates the gradient using a finite difference (FD) simulation estimator, i.e.,

	$\displaystyle\frac{O(\Phi^{n}_{0,T}(\boldsymbol{s}_{0},\hat{\bm{\theta}}+c_{k}\boldsymbol{e}_{k}))-O(\Phi^{n}_{0,T}(\boldsymbol{s}_{0},\hat{\bm{\theta}}-c_{k}\boldsymbol{e}_{k}))}{2c_{k}}$

for the $n$-th iteration of parameter search, where $c_{k}$ is the perturbation and $\boldsymbol{e}_{k}$ denotes a unit vector whose $k$-th element equals 1 and whose remaining elements equal 0. In our case, we define $c_{k}=p\times\hat{\bm{\theta}}_{k}$, where $p$ is a percentage set to $5\%$, $10\%$, or $20\%$.
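A minimal sketch of this central finite difference estimator with relative perturbation $c_{k}=p\times\hat{\theta}_{k}$ is given below. The function names and the deterministic toy output are assumptions for illustration. In a stochastic simulation, `output_fn` would be expected to evaluate both perturbed runs on the same Wiener path, which is how the shared path index in the estimator is read here.

```python
def central_fd_gradient(output_fn, theta, p):
    """Central finite difference gradient with perturbation c_k = p * theta_k.

    Assumes every theta_k is nonzero, since c_k is relative to theta_k.
    """
    gradient = []
    for k, theta_k in enumerate(theta):
        c_k = p * theta_k
        theta_up = list(theta)
        theta_down = list(theta)
        theta_up[k] += c_k
        theta_down[k] -= c_k
        gradient.append((output_fn(theta_up) - output_fn(theta_down)) / (2.0 * c_k))
    return gradient


def toy_output(theta):
    """Hypothetical smooth output used only to exercise the estimator."""
    return theta[0] ** 2 + 3.0 * theta[1]


for p in (0.05, 0.10, 0.20):
    print(p, central_fd_gradient(toy_output, [2.0, 5.0], p))
```

Each gradient evaluation costs two simulation runs per parameter, whereas the adjoint approach obtains all components from one forward and one backward pass per path.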

Approach	Observed	Adjoint sensitivity	Finite difference ($p=5\%$)	Finite difference ($p=10\%$)	Finite difference ($p=20\%$)
Yield ($\mu M$)	$73.48\pm 0.28$	$73.05\pm 0.27$	$72.66\pm 0.32$	$74.37\pm 0.39$	$74.51\pm 0.31$

Table 1: Final RNA yield and its $95\%$ confidence interval with $\hat{\bm{\theta}}$ calibrated with different approaches.

Figure 2 illustrates the convergence of the mean squared error (MSE) of the IVT process output prediction over the iterations. As the number of iterations increases, the adjoint SA approach surpasses the finite difference gradient estimation approach. Nonetheless, as with the finite difference estimator, the convergence speed gradually decreases, and the MSE shows no further improvement once it reaches approximately $0.4(g/L)^{2}$, which could be associated with the fixed step size, i.e., $\alpha=0.01$, used during the gradient search process. This slowdown could also be attributed to our MSE objective, which relies solely on the final RNA yield. These discrete data provide very limited information about the interactions within the complex reaction network. Table 1 reports the RNA yield predicted by the model with parameters $\hat{\bm{\theta}}$ estimated with the different approaches. For reference, the final RNA yield and its $95\%$ confidence interval (CI) obtained from the system with true parameters $\bm{\theta}^{c}$ are $73.48\pm 0.28$. The $95\%$ CI of the adjoint SA approach’s prediction overlaps with that of the observed RNA yield, suggesting no statistically significant difference between them. In contrast, the CIs of all three finite difference settings lie entirely outside the observed CI, indicating a significant prediction discrepancy and showing that the adjoint SA approach outperforms the finite difference method.
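The interval comparison can be reproduced directly from the values in Table 1. The short sketch below is only an arithmetic check with hypothetical helper names; it treats each entry as a mean with a symmetric half-width and tests for overlap with the observed interval.

```python
def interval(mean, half_width):
    """Return the (lower, upper) bounds of a symmetric confidence interval."""
    return (mean - half_width, mean + half_width)


def overlaps(first, second):
    """True when two closed intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


# Means and 95% CI half-widths copied from Table 1.
observed = interval(73.48, 0.28)
calibrated = {
    "Adjoint sensitivity": interval(73.05, 0.27),
    "Finite difference (p=5%)": interval(72.66, 0.32),
    "Finite difference (p=10%)": interval(74.37, 0.39),
    "Finite difference (p=20%)": interval(74.51, 0.31),
}

for approach, ci in calibrated.items():
    print(approach, ci, "overlaps observed:", overlaps(ci, observed))
```

Overlap of two confidence intervals is used here as the informal criterion described in the text; a formal two-sample test would be a stricter alternative.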

Furthermore, we present the expected relative error between $\hat{\bm{\theta}}$ and $\bm{\theta}^{c}$, i.e., $\mbox{E}\big[\big|\frac{\hat{\bm{\theta}}-\bm{\theta}^{c}}{\bm{\theta}^{c}}\big|\big]$, in Figure 2. Our approach consistently outperforms the finite difference estimator: after 50 iterations, its expected relative error has decreased more than that of the finite difference estimator, regardless of the $p$ value chosen. However, similar to the MSE results, the relative error stagnates at approximately $15\%$, which could be attributed to the same limitations mentioned above.

6 CONCLUSION

In this paper, we present an adjoint sensitivity analysis approach designed to expedite the search for stochastic reaction network model parameters. We first formulate the multi-scale bioprocess stochastic reaction network in the form of SDEs. Then, we develop an adjoint SA algorithm on SDEs for computing local sensitivities of model parameters and validate the convergence of the algorithm. Our empirical study underscores the importance of model parameters of enzymatic stochastic reaction networks and provides empirical evidence of the efficacy of our approach. Moving forward, we aim to leverage this capability by refining the loss function. Rather than relying solely on the MSE of the final output, we intend to integrate additional bioprocess states and consider the disparities between simulated and real trajectories, which promises to enhance the effectiveness and accuracy of our method.

References

Filobello-Nino et al. (2019) Uriel A. Filobello-Nino, Hector Vazquez-Leal, Agustin Herrera-May, Roberto Ambrosio, Roberto Castañeda-Sheissa, V.M. Jimenez-Fernandez, Mario Sandoval-Hernandez, and Ana Contreras-Hernandez. A handy, accurate, invertible and integrable expression for Dawson’s function. 29:18, 09 2019. doi: 10.15174/au.2019.2124.
Fu (2015) Michael C. Fu. Stochastic Gradient Estimation, pages 105–147. Springer New York, New York, NY, 2015. ISBN 978-1-4939-1384-8. doi: 10.1007/978-1-4939-1384-8_5. URL https://doi.org/10.1007/978-1-4939-1384-8_5.
Golightly and Wilkinson (2005) Andrew Golightly and Darren J Wilkinson. Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics, 61(3):781–788, 2005.
Hall (1979) Richard L. Hall. Inverse moments for a class of truncated normal distributions. Sankhyā: The Indian Journal of Statistics, Series B (1960-2002), 41(1/2):66–76, 1979. ISSN 05815738. URL http://www.jstor.org/stable/25052135.
John Bailer (2001) A. John Bailer. Probabilistic Techniques in Exposure Assessment. A Handbook for Dealing with Variability and Uncertainty in Models and Inputs. A. C. Cullen and H. C. Frey, Plenum Press, New York and London, 1999. No. of pages: ix + 335. Price: $99.50. ISBN 0-306-45956-6. Statistics in Medicine, 20(14):2211–2213, 2001. doi: https://doi.org/10.1002/sim.958. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.958.
Kedem (1980) Gershon Kedem. Automatic differentiation of computer programs. ACM Trans. Math. Softw., 6(2):150–165, jun 1980. ISSN 0098-3500. doi: 10.1145/355887.355890. URL https://doi.org/10.1145/355887.355890.
Kramer et al. (1984) Mark A. Kramer, Herschel Rabitz, Joseph M. Calo, and Robert J. Kee. Sensitivity analysis in chemical kinetics: Recent developments and computational comparisons. International Journal of Chemical Kinetics, 16(5):559–578, 1984. doi: https://doi.org/10.1002/kin.550160506. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/kin.550160506.
Kunita (2019) H. Kunita. Stochastic Flows and Jump-diffusions. Probability theory and stochastic modelling. Springer, 2019. ISBN 9789811338021. URL https://books.google.com/books?id=_hftxgEACAAJ.
Kyriakopoulos et al. (2018) Sarantos Kyriakopoulos, Kok Siong Ang, Meiyappan Lakshmanan, Zhuangrong Huang, Seongkyu Yoon, Rudiyanto Gunawan, and Dong-Yup Lee. Kinetic modeling of mammalian cell culture bioprocessing: The quest to advance biomanufacturing. Biotechnology Journal, 13(3):1700229, 2018. doi: https://doi.org/10.1002/biot.201700229. URL https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/biot.201700229.
Li et al. (2020) Xuechen Li, Ting-Kam Leonard Wong, Ricky T. Q. Chen, and David Kristjanson Duvenaud. Scalable gradients for stochastic differential equations. ArXiv, abs/2001.01328, 2020. URL https://api.semanticscholar.org/CorpusID:209862121.
Michaelis and Menten (2007) Leonor Michaelis and Maud Leonora Menten. Die Kinetik der Invertinwirkung. Biochemische Zeitschrift, 49:333–369, 2007.
Sobol (2001) Ilya M. Sobol. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. 2001. URL https://api.semanticscholar.org/CorpusID:202584415.
Wang et al. (2024) Keqi Wang, Wei Xie, and Hua Zheng. Stochastic molecular reaction queueing network modeling for in vitro transcription process. In Proceedings of the Winter Simulation Conference, WSC ’23, page 1900–1911. IEEE Press, 2024. ISBN 9798350369663.
Xie et al. (2022) Wei Xie, Bo Wang, Cheng Li, Dongming Xie, and Jared Auclair. Interpretable biomanufacturing process risk and sensitivity analyses for quality-by-design and stability control. Naval Research Logistics (NRL), 69(3):461–483, 2022. doi: https://doi.org/10.1002/nav.22019. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/nav.22019.
Zi (2011) Zhike Zi. Sensitivity analysis approaches applied to systems biology models. IET Systems Biology, 5(6):336–6, 2011. URL https://api.semanticscholar.org/CorpusID:473778.