License: CC BY-NC-ND 4.0
arXiv:2403.03274v1 [q-bio.QM] 05 Mar 2024

From Noise to Signal: Unveiling Treatment Effects from Digital Health Data through Pharmacology-Informed Neural-SDE

Samira Pakravan (equal contribution): Clinical Pharmacology, Genentech, South San Francisco, CA 94080, USA; Department of Mechanical Engineering, University of California, Santa Barbara, CA, USA
Nikolaos Evangelou (equal contribution): Clinical Pharmacology, Genentech, South San Francisco, CA 94080, USA; Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
Maxime Usdin: Computational Sciences, Genentech, South San Francisco, CA 94080, USA
Logan Brooks (corresponding author): Clinical Pharmacology, Genentech, South San Francisco, CA 94080, USA
James Lu (corresponding author): Clinical Pharmacology, Genentech, South San Francisco, CA 94080, USA
Corresponding authors ( [email protected], [email protected])

Abstract

Digital health technologies (DHT), such as wearable devices, provide personalized, continuous, and real-time monitoring of patients. These technologies are contributing to the development of novel therapies and personalized medicine. Gaining insight from these technologies requires appropriate modeling techniques to capture clinically relevant changes in disease state. The data generated from these devices are stochastic in nature, may have missing elements, and exhibit considerable inter-individual variability, making them difficult to analyze using traditional longitudinal modeling techniques. We present a novel pharmacology-informed neural stochastic differential equation (SDE) model capable of addressing these challenges. Using synthetic data, we demonstrate that our approach is effective in identifying treatment effects and learning causal relationships from stochastic data, thereby enabling counterfactual simulation.

1 Introduction

The rise of digital health technologies (DHT), including wearable devices such as smartwatches and patch-based physiological sensors, has opened new possibilities for continuous patient monitoring Friend et al. (2023) and enables the generation of time-series data at an unprecedented temporal resolution and duration, thereby offering the potential to generate new clinical measures and insights Berisha et al. (2021). Furthermore, recent examples have shown the clinical value of modeling both the longitudinal trends and the stochasticity in digital health (DH) data Leander et al. (2022).

Stochastic differential equations (SDEs) have been developed to describe various phenomena that exhibit random fluctuations Fagin et al. (2023), including in biological and biomedical applications Mei et al. (2013); Tajmirriahi & Amini (2021). In the context of DH, the interplay between physiology and the measurement device is likely far too complex for one to derive from first principles the equations linking disease status to DH data. Instead, we propose to learn the underlying dynamical system directly from data, with the help of neural-SDEs Evangelou et al. (2023); Dietrich et al. (2023).

Here, we develop a pharmacology-informed neural-SDE, following Lu et al. (2021); Laurie & Lu (2023), that:

  • learns the underlying dynamical system from a patient population, while introducing patient-dependent parameters that enable the characterization of patient-to-patient variability;

  • incorporates the causality between pharmacokinetics (PK) and pharmacodynamics (PD);

  • enables counterfactual simulations to describe drug effects at the individual patient level.

We demonstrate the effectiveness of the proposed model using synthetic data.

2 Methods

2.1 Neural-SDE Model

We assume that the longitudinal data are modelled by a system of equations of the form,

$$dc_t = f(c_t)\,dt \tag{1}$$
$$dx_t = \nu(x_t, c_t, \bm{p})\,dt + \sigma(x_t, c_t, \bm{p})\,dW_t \tag{2}$$

where Equation 1 represents a known Ordinary Differential Equation (ODE) model with $f(\cdot)$ being the vector field for PK that governs the drug concentration, $c_t \in \mathbb{R}$, and where the drift and diffusion terms (i.e., $\nu(x_t, c_t, \bm{p})$ and $\sigma(x_t, c_t, \bm{p})$, respectively) are described by neural networks. We work under the hypothesis that the drift and diffusivity terms of the effective SDE depend on the state ($x_t \in \mathbb{R}$) as well as the drug concentration $c_t$. Additionally, while the underlying equations are the same for all patients, the model includes a latent patient-dependent parameter vector $\bm{p}$ that describes the patient-to-patient variability. This latent parameter $\bm{p}$ is discovered in a data-driven way based on the work of Lu et al. (2021), which we elaborate on below.

While the available data are in the form of trajectories, we transform them to snapshots $\mathcal{D}$ in a manner analogous to that done in Dietrich et al. (2023). In particular, each snapshot $\mathcal{D}^i$, uniquely identified by the index $i$, takes the form $\mathcal{D}^i = \{x_1^i, x_0^i, \Delta t, c_1^i, \bm{p}^{i,j}\}$, where $x_1^i$ is the value of the state variable $x_t$ after a time step $\Delta t$ given the initial condition $x_0^i$, and $\bm{p}^{i,j}$ is the latent parameter for the $j$th patient. Note that we use the concentration $c_1$ and not $c_0$, following the (symplectic) Euler-Maruyama scheme discussed in Dietrich et al. (2023). The concentration $c_t$ and the patient-dependent parameter $\bm{p}$ enter into the overall architecture as inputs, based on Dietrich et al. (2023); Evangelou et al. (2023).
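The trajectory-to-snapshot transformation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `Snapshot` and `trajectory_to_snapshots` are hypothetical, and a patient index stands in for the latent vector $\bm{p}^{i,j}$, which would in practice come from the encoder.

```python
from typing import List, NamedTuple, Sequence


class Snapshot(NamedTuple):
    x1: float      # state after the step
    x0: float      # state before the step
    dt: float      # elapsed time between the two observations
    c1: float      # drug concentration at the END of the step (c_1, not c_0)
    patient: int   # hypothetical stand-in for the latent parameter p of patient j


def trajectory_to_snapshots(
    times: Sequence[float], x: Sequence[float], c: Sequence[float], patient: int
) -> List[Snapshot]:
    """Split one patient's trajectory into consecutive-pair snapshots."""
    snapshots = []
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]   # time steps may be uneven
        snapshots.append(Snapshot(x[k + 1], x[k], dt, c[k + 1], patient))
    return snapshots


# Three observations with uneven spacing give two snapshots.
snaps = trajectory_to_snapshots(
    times=[0.0, 1.0, 3.0], x=[10.0, 9.5, 9.8], c=[0.0, 2.0, 1.0], patient=0
)
```

Because each snapshot carries its own $\Delta t$, missing observations simply produce snapshots with a longer time step rather than requiring imputation.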

The construction of the loss function (based on Dietrich et al. (2023)) is derived from the numerical integration scheme (symplectic) Euler-Maruyama. The numerical approximation of Equations 1 and 2 results in:

$$c_1^i = c_0^i + f(c_0^i)\,\Delta t \tag{3}$$
$$x_1^i = x_0^i + \nu(x_0^i, c_1^i, \bm{p}^{i,j})\,\Delta t + \sigma(x_0^i, c_1^i, \bm{p}^{i,j})\,\delta W_0, \tag{4}$$

where $\delta W_0$ is normally distributed around zero and $\Delta t$ is a variable timestep. The drift and diffusivity terms are approximated by two networks $\nu_\theta$ and $\sigma_\theta$, under the assumption that $x_1$ is drawn from a normal distribution of the form,

$$x_1^i \sim \mathcal{N}\left(x_0^i + \nu_\theta(x_0^i, c_1^i, \bm{p}^{i,j})\,\Delta t,\ \sigma_\theta(x_0^i, c_1^i, \bm{p}^{i,j})^2\,\Delta t\right). \tag{5}$$
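A single step of the discretized scheme can be sketched as follows. The functions `f`, `nu`, and `sigma` below are toy placeholders chosen for illustration, not the paper's PK model or trained networks. The Brownian increment is drawn with variance $\Delta t$, which is the choice that yields the variance $\sigma^2 \Delta t$ of the normal distribution above.

```python
import math
import random


def euler_maruyama_step(x0, c0, dt, f, nu, sigma, p, rng):
    """Advance (x, c) by one step: explicit Euler for c, Euler-Maruyama for x."""
    c1 = c0 + f(c0) * dt                        # deterministic PK update
    dW = rng.gauss(0.0, math.sqrt(dt))          # Brownian increment, variance dt
    x1 = x0 + nu(x0, c1, p) * dt + sigma(x0, c1, p) * dW   # uses c1, not c0
    return x1, c1


# Toy placeholders (assumptions for illustration only).
f = lambda c: -0.5 * c                          # first-order elimination
nu = lambda x, c, p: p * (1.0 - x) - 0.1 * c * x
sigma = lambda x, c, p: 0.2

rng = random.Random(0)
x, c = 1.0, 2.0
for _ in range(100):
    x, c = euler_maruyama_step(x, c, 0.01, f, nu, sigma, 1.0, rng)
```

Repeating the step with different random seeds gives an ensemble of sample paths from which summary statistics can be computed.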

With the mean and variance in Equation 5 determined by the drift and diffusivity, we can compute the logarithm of the resulting normal density and derive the following loss function, whose minimization maximizes the likelihood:

$$\mathcal{L}(\theta \mid x_0^i, x_1^i, \Delta t) := \frac{\left(x_1^i - x_0^i - \nu_\theta(x_0^i, c_1^i, \bm{p}^{i,j})\,\Delta t\right)^2}{\Delta t\,\sigma_\theta(x_0^i, c_1^i, \bm{p}^{i,j})^2} + \log\left|\Delta t\,\sigma_\theta(x_0^i, c_1^i, \bm{p}^{i,j})^2\right|. \tag{6}$$

It should be noted that the Neural-SDE framework by Dietrich et al. (2023) is also capable of handling varying time steps $\Delta t$.
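As a minimal sketch, the per-snapshot loss can be written as the Gaussian negative log-likelihood implied by Equation 5 (mean $x_0 + \nu_\theta \Delta t$, variance $\sigma_\theta^2 \Delta t$), up to additive and multiplicative constants. The function name `snapshot_loss` is hypothetical, and the network outputs are passed in as plain numbers.

```python
import math


def snapshot_loss(x0, x1, dt, drift, diffusion):
    """Gaussian negative log-likelihood of one snapshot, constants dropped.

    drift and diffusion are the values of nu_theta and sigma_theta evaluated
    at (x0, c1, p); dt may differ from snapshot to snapshot.
    """
    variance = dt * diffusion ** 2
    residual = x1 - x0 - drift * dt
    return residual ** 2 / variance + math.log(abs(variance))


# A drift that explains the observed increment scores better than one that does not.
good = snapshot_loss(x0=0.0, x1=0.1, dt=0.1, drift=1.0, diffusion=1.0)
bad = snapshot_loss(x0=0.0, x1=0.1, dt=0.1, drift=0.0, diffusion=1.0)
```

The logarithmic term penalizes an inflated diffusivity, so the network cannot lower the loss simply by predicting a very large $\sigma_\theta$.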

The Neural-SDE architecture consists of two network components for the drift and diffusion models. In our work, the drift network consists of 4 layers with 64 neurons each, each followed by an ELU activation function. The diffusion network consists of 3 layers with 32 neurons; the first two layers are followed by an ELU activation function and the output layer is followed by a softplus activation function. A schematic of the Neural-SDE architecture is shown in Figure 1.
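One possible reading of this description in PyTorch is sketched below. Several details are assumptions rather than statements from the text: the size of $\bm{p}$ (`LATENT_DIM`), the concatenation of $x$, $c$, and $\bm{p}$ into one input vector, the scalar linear read-out added after the drift network's four ELU layers, and the width of one for the diffusion output layer. The helper names are hypothetical.

```python
import torch
from torch import nn

LATENT_DIM = 128  # assumed size of p; not stated in the description above


def make_drift_net(latent_dim: int = LATENT_DIM) -> nn.Sequential:
    """Four 64-unit ELU layers, plus an assumed scalar linear read-out."""
    layers, in_dim = [], 2 + latent_dim          # inputs: x, c, and p
    for _ in range(4):
        layers += [nn.Linear(in_dim, 64), nn.ELU()]
        in_dim = 64
    layers.append(nn.Linear(64, 1))              # assumption: scalar drift output
    return nn.Sequential(*layers)


def make_diffusion_net(latent_dim: int = LATENT_DIM) -> nn.Sequential:
    """Two ELU layers, then an output layer followed by softplus."""
    return nn.Sequential(
        nn.Linear(2 + latent_dim, 32), nn.ELU(),
        nn.Linear(32, 32), nn.ELU(),
        nn.Linear(32, 1), nn.Softplus(),         # softplus keeps sigma non-negative
    )


torch.manual_seed(0)
inputs = torch.randn(8, 2 + LATENT_DIM)          # batch of 8 snapshots
drift = make_drift_net()(inputs)
sigma = make_diffusion_net()(inputs)
```

The softplus on the diffusion output matters for the loss: it guarantees a non-negative $\sigma_\theta$, so the variance in the likelihood is well defined.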

2.2 Latent Patient Descriptors - GRU Encoder

Our approach to learning the Neural-SDE from data across the patient population is to identify a set of dynamical equations that holds across all patients, as well as patient-specific descriptors (or embeddings) that characterize patient-to-patient variability Laurie & Lu (2023). In our approach, those patient-specific descriptors are discovered in a data-driven manner, based on the work of Lu et al. (2021): a Gated Recurrent Unit (GRU) encoder was used to discover the latent parameter $\bm{p}$, with longitudinal data provided in tabular form as input. More specifically, the input data entering the encoder consist of a variable number of rows for each patient and the following four columns: (1) the absolute time; (2) the time after dose; (3) the stochastic PD data; and (4) the deterministic PK data.

Each tabular input was padded, and masking was applied in order to handle the variable number of time points. The GRU encoder has 128 hidden states and is connected to a Multilayer Perceptron (MLP) consisting of 2 layers, each with 128 neurons, both followed by an ELU activation function. The output of the MLP is the latent parameter $\bm{p}$ that enters the Neural-SDE architecture. End-to-end training was implemented using the loss function given by Equation 6.
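A minimal PyTorch sketch of such an encoder follows. The class name `PatientEncoder` is hypothetical, and the use of `pad_sequence` with `pack_padded_sequence` is only one way to realize padding and masking; the text does not say which mechanism was used. Read literally, the description makes $\bm{p}$ a 128-dimensional vector, which is what this sketch returns.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_sequence


class PatientEncoder(nn.Module):
    """GRU over a patient's table, then a 2-layer ELU MLP producing p."""

    def __init__(self, n_features: int = 4, hidden: int = 128):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, 128), nn.ELU(),
            nn.Linear(128, 128), nn.ELU(),
        )

    def forward(self, tables):
        # tables: list of (n_rows_j, 4) tensors, one per patient
        lengths = torch.tensor([t.shape[0] for t in tables])
        padded = pad_sequence(tables, batch_first=True)      # zero-pad to max length
        packed = pack_padded_sequence(
            padded, lengths, batch_first=True, enforce_sorted=False
        )                                                     # GRU skips the padding
        _, h_n = self.gru(packed)
        return self.mlp(h_n[-1])                              # one p per patient


torch.manual_seed(0)
encoder = PatientEncoder()
# Columns: absolute time, time after dose, PD value, PK value (random here).
tables = [torch.randn(5, 4), torch.randn(9, 4)]
p = encoder(tables)
```

Packing ensures that the final hidden state of the shorter patient is taken at its last real row, so the embedding does not depend on how much padding the batch required.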

Figure 1: The Neural-SDE architecture including the GRU encoder.

2.3 Dataset

To mimic clinical digital health measurements, synthetic data were simulated in which the PK serves as a deterministic driving input that causally influences a stochastically evolving PD. Patient-specific parameters were sampled from a log-normal distribution: 50 individual patient trajectories were sampled at each of 3 dose levels (50 mg, 100 mg, 400 mg) for a total of 150 patient trajectories, and a 70:30 train-test split was used; further details are summarized in Appendix A.1.

3 Results

Figure 2 demonstrates the model’s ability to learn the underlying system’s dynamics by comparing “true” (i.e., the underlying ground truth) SDE trajectories from the test dataset against the model-predicted trajectories. For each patient in the test set, we sampled 250 trajectories to provide a robust representation of the predictive variability associated with the model. This result demonstrates the model’s ability to replicate the complex dynamics of PD trajectories at the population level.

Figure 2: Comparison of the true and predicted SDE trajectories in the test dataset. Left panel: the colored lines represent the observed stochastic trajectories in the test data. Right panel: the blue line and shaded region represent the median and the 10th to 90th percentile range, respectively, of the ground truth trajectories; similarly, the orange line and shaded region represent those from the model.
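The median and percentile bands used in such comparisons can be computed per time point across the sampled trajectories. The sketch below uses only the Python standard library; `percentile_band` is a hypothetical helper, and the random walks merely stand in for the 250 model samples.

```python
import random
import statistics


def percentile_band(trajectories, lower=10, upper=90):
    """Per-time-point median and [lower, upper] percentile band."""
    median, lo, hi = [], [], []
    for values in zip(*trajectories):            # iterate over time points
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        median.append(statistics.median(values))
        lo.append(cuts[lower - 1])               # cuts[k - 1] is the k-th percentile
        hi.append(cuts[upper - 1])
    return median, lo, hi


# Placeholder ensemble: 250 random walks of 20 steps each.
rng = random.Random(0)
samples = []
for _ in range(250):
    x, path = 0.0, []
    for _ in range(20):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    samples.append(path)

median, p10, p90 = percentile_band(samples)
```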

3.1 Dosing regimen analysis

To analyze the impact of different dosing regimens on PD, we consider three distinct simulated doses of 50 mg, 100 mg, and 400 mg. For each patient from the test dataset, we sampled 250 SDE trajectories. Figure 3 shows that the model is qualitatively able to capture the true underlying dose-response relationship.

Figure 3: Comparison of the true and predicted SDE trajectories in the test dataset for 50, 100 and 400 mg doses. Blue lines represent the median of the ground truth trajectories; orange dashed lines and shaded regions represent the median and the 10th to 90th percentile range of trajectories from the model.

3.2 Patient-specific responses and counterfactual analysis

Figure 4 demonstrates the proposed methodology’s ability to perform counterfactual analysis and identify individual treatment effects. To accomplish this, for each patient the drift and diffusivity terms were inferred from the trained model and 250 SDE trajectories were generated. The results demonstrate the model’s ability to capture the underlying dynamics of the stochastic process for individual patients. This suggests that the GRU encoding strategy not only captures the population behaviors, but also successfully learns to differentiate amongst patients. Moreover, we demonstrate a what-if scenario: in the absence of PK, the model correctly predicts a lack of dynamical change in the modeled PD endpoint. This suggests our model is able to correctly identify the causal relationship between PK and PD.

Figure 4: Patient-specific trajectories and counterfactual simulations. Each subplot represents a random patient from the respective dosages. The solid blue line represents the true drift; the orange dashed line and shaded region represent the mean and mean ± standard deviation (std) of 250 posterior samples; the green dashed lines represent counterfactual simulations assuming no dosing (i.e., PK = 0).
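The counterfactual comparison can be illustrated with a small simulation: the same drift and diffusion functions are rolled out once under a dosed concentration profile and once with the concentration fixed at zero. Everything below is a sketch under assumptions. The `drift` and `diffusion` lambdas are hand-written stand-ins for the trained networks, the concentration profile is invented, and the difference in final mean values is just one simple summary of an individual treatment effect.

```python
import math
import random


def simulate(x0, conc, dt, n_steps, drift, diffusion, seed):
    """Euler-Maruyama rollout of the PD state under a concentration profile."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for k in range(n_steps):
        c = conc((k + 1) * dt)                   # concentration at the end of the step
        dW = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x, c) * dt + diffusion(x, c) * dW
        path.append(x)
    return path


# Hypothetical stand-ins for the trained drift and diffusion networks.
drift = lambda x, c: 1.0 - (1.0 - 0.8 * c / (1.0 + c)) * x
diffusion = lambda x, c: 0.05 * x

dosed = lambda t: 2.0 * math.exp(-0.1 * t)       # invented PK profile
no_drug = lambda t: 0.0                          # counterfactual: PK = 0

n_samples = 250
factual = [simulate(1.0, dosed, 0.1, 100, drift, diffusion, s) for s in range(n_samples)]
counterfactual = [simulate(1.0, no_drug, 0.1, 100, drift, diffusion, s) for s in range(n_samples)]

mean_cf = sum(path[-1] for path in counterfactual) / n_samples
effect = sum(f[-1] - cf[-1] for f, cf in zip(factual, counterfactual)) / n_samples
```

Reusing the same seeds for both arms means each factual path is paired with a counterfactual path driven by identical noise, so the difference between them reflects the concentration input alone.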

4 Conclusion

We proposed a pharmacology-informed neural-SDE architecture that is able to learn the relationship between a deterministic PK and a stochastic PD. Using synthetic data, the model correctly reproduces the underlying PK-PD relationship at the population level. Furthermore, the model enables the counterfactual simulation of PD in the absence of the hypothetical drug and, in doing so, quantifies the individual treatment effect.

References

  • Berisha et al. (2021) Visar Berisha, Chelsea Krantsevich, P Richard Hahn, Shira Hahn, Gautam Dasarathy, Pavan Turaga, and Julie Liss. Digital medicine and the curse of dimensionality. NPJ digital medicine, 4(1):153, 2021.
  • Dayneka et al. (1993) N. L. Dayneka, V. Garg, and W. J. Jusko. Comparison of four basic models of indirect pharmacodynamic responses. Journal of pharmacokinetics and biopharmaceutics, 21(4):457–478, 1993. doi: 10.1007/BF01061691.
  • Dietrich et al. (2023) Felix Dietrich, Alexei Makeev, George Kevrekidis, Nikolaos Evangelou, Tom Bertalan, Sebastian Reich, and Ioannis G Kevrekidis. Learning effective stochastic differential equations from microscopic simulations: Linking stochastic numerics to deep learning. Chaos: An Interdisciplinary Journal of Nonlinear Science, 33(2), 2023.
  • Evangelou et al. (2023) Nikolaos Evangelou, Felix Dietrich, Juan M Bello-Rivas, Alex J Yeh, Rachel S Hendley, Michael A Bevan, and Ioannis G Kevrekidis. Learning effective sdes from brownian dynamic simulations of colloidal particles. Molecular Systems Design & Engineering, 2023.
  • Fagin et al. (2023) Joshua Fagin, Ji Won Park, Henry Best, KE Saavik Ford, Matthew J Graham, V Ashley Villar, Shirley Ho, James Hung-Hsu Chan, and Matthew O’Dowd. Latent stochastic differential equations for modeling quasar variability and inferring black hole properties. arXiv preprint arXiv:2304.04277, 2023.
  • Friend et al. (2023) Stephen H Friend, Geoffrey S Ginsburg, and Rosalind W Picard. Wearable digital health technology, 2023.
  • Laurie & Lu (2023) Mark Laurie and James Lu. Explainable deep learning for tumor dynamic modeling and overall survival prediction using neural-ode. npj Systems Biology and Applications, 9(1):58, 2023.
  • Leander et al. (2022) Jacob Leander, Mats Jirstrand, Ulf G Eriksson, and Robert Palmér. A stochastic mixed effects model to assess treatment effects and fluctuations in home-measured peak expiratory flow and the association with exacerbation risk in asthma. CPT: Pharmacometrics & Systems Pharmacology, 11(2):212–224, 2022.
  • Lu et al. (2021) James Lu, Brendan Bender, Jin Y Jin, and Yuanfang Guan. Deep learning prediction of patient response time course from early data via neural-pharmacokinetic/pharmacodynamic modelling. Nature machine intelligence, 3:696–704, 2021.
  • Mei et al. (2013) Yongguo Mei, Adria Carbo, Raquel Hontecillas, and Josep Bassaganya-Riera. Enisi sde: a novel web-based stochastic modeling tool for computational biology. In 2013 IEEE International Conference on Bioinformatics and Biomedicine, pp.  392–397. IEEE, 2013.
  • Tajmirriahi & Amini (2021) Mahnoosh Tajmirriahi and Zahra Amini. Modeling of seizure and seizure-free eeg signals based on stochastic differential equations. Chaos, Solitons & Fractals, 150:111104, 2021.

Appendix A Appendix

A.1 Dataset Generation Details

Synthetic training data were generated to represent an indirect response PK-PD model Dayneka et al. (1993) in which PK acts causally to change the PD, with the additional modification that the observable PD variable is stochastic in nature. This system follows the general form of Equations 1 and 2, with the following system of ODEs being specified for the term $f(c_t, \bm{p})\,dt$:

$$\frac{du_1}{dt} = -KA \times u_1(t) \tag{7}$$
$$\frac{du_2}{dt} = KA \times u_1(t) - u_2(t) \times (KE + K12) + u_3(t) \times K21 \tag{8}$$
$$\frac{du_3}{dt} = K12 \times u_2(t) - K21 \times u_3(t) \tag{9}$$

where $c_t = u_2(t)/\mathrm{V2}$, with V2 representing the volume of distribution for drug in plasma circulation. The drift term in the relationship between $c_t$ and PD is represented by the following:

$$\frac{du_4}{dt} = KIN - KOUT \times \left(1 - \frac{Imax \times c_t}{IC50 + c_t}\right) \times u_4(t). \tag{10}$$

Example trajectories of this system are shown in Figure 5. The diffusion term in Equation 2 is given by $\beta u_4\,dW_t$, where $\beta$ was sampled from a log-normal distribution. Examples of stochastic trajectories for $c_t$ are shown in Figure 6.

In the current set of experiments, an equal number of patients were simulated for each of a range of doses (50, 100, 400 mg). Dosing was set to begin at day 5 for all synthetic subjects with daily dosing; the PD sampling frequency is once per hour, over a period of 30 days.
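The data-generating process described in this appendix can be sketched as follows. All numerical parameter values, the hourly time unit for the rate constants, the 0.1-hour internal integration step, and the function name `simulate_patient` are assumptions made for illustration; the paper does not list them here. The inhibition term is written as $Imax \times c_t / (IC50 + c_t)$, the usual form for an indirect response model with inhibited loss.

```python
import math
import random


def simulate_patient(dose_mg, params, beta, seed, days=30, substeps=10):
    """Integrate the PK ODEs (Euler) and the stochastic PD (Euler-Maruyama)."""
    rng = random.Random(seed)
    KA, KE, K12, K21, V2, KIN, KOUT, IMAX, IC50 = params
    dt = 1.0 / substeps                          # internal step, in hours
    u1 = u2 = u3 = 0.0
    u4 = KIN / KOUT                              # start PD at its drug-free baseline
    pk_hourly, pd_hourly = [0.0], [u4]
    for hour in range(days * 24):
        if hour >= 5 * 24 and hour % 24 == 0:
            u1 += dose_mg                        # daily dose, starting at day 5
        for _ in range(substeps):
            c = u2 / V2
            du1 = -KA * u1
            du2 = KA * u1 - u2 * (KE + K12) + u3 * K21
            du3 = K12 * u2 - K21 * u3
            du4 = KIN - KOUT * (1.0 - IMAX * c / (IC50 + c)) * u4
            dW = rng.gauss(0.0, math.sqrt(dt))
            u1, u2, u3 = u1 + du1 * dt, u2 + du2 * dt, u3 + du3 * dt
            u4 = u4 + du4 * dt + beta * u4 * dW  # multiplicative noise beta * u4 * dW
        pk_hourly.append(u2 / V2)                # hourly sampling
        pd_hourly.append(u4)
    return pk_hourly, pd_hourly


# Hypothetical parameter values: KA, KE, K12, K21, V2, KIN, KOUT, IMAX, IC50.
params = (1.0, 0.1, 0.05, 0.05, 10.0, 1.0, 0.1, 0.9, 1.0)
rng = random.Random(1)
beta = rng.lognormvariate(math.log(0.02), 0.2)   # log-normal noise scale (assumed)
pk, pd = simulate_patient(100.0, params, beta, seed=0)
pk_det, pd_det = simulate_patient(100.0, params, beta=0.0, seed=0)
```

Setting `beta=0.0` recovers the deterministic trajectories, which is a convenient check that the PD stays at its baseline until the first dose.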

Figure 5: Synthetic data trajectories without the diffusivity component under different simulated doses.
Figure 6: Synthetic data trajectories under different doses.

A.2 Training methodology and optimization strategy

The current model, including the numerical integration scheme, which employs an Euler-Maruyama solver, has been implemented in PyTorch. While higher-order methods were not used in this work, they remain open for future development based on specific needs.

In model training, we leveraged vectorization rather than processing each time step for each patient sequentially, one value at a time. The model thus operates at the patient level, concurrently processing all data points associated with a specific patient. This is feasible because the loss function in Equation 6 can be evaluated at each time step independently of all other time steps. This vectorization strategy significantly improves training and inference performance.

We trained the network for 100 epochs using the Adam optimizer with a learning rate of 0.001 and a batch size of 1. The overall training process takes around 140 seconds on one NVIDIA V100 GPU.