From Noise to Signal: Unveiling Treatment Effects from Digital Health Data through Pharmacology-Informed Neural-SDE

Samira Pakravan (equal contribution)
Clinical Pharmacology, Genentech, South San Francisco, CA 94080, USA
Department of Mechanical Engineering, University of California, Santa Barbara, CA, USA

Nikolaos Evangelou (equal contribution)
Clinical Pharmacology, Genentech, South San Francisco, CA 94080, USA
Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA

Maxime Usdin
Computational Sciences, Genentech, South San Francisco, CA 94080, USA

Logan Brooks (corresponding author)
Clinical Pharmacology, Genentech, South San Francisco, CA 94080, USA

James Lu (corresponding author)
Clinical Pharmacology, Genentech, South San Francisco, CA 94080, USA

Corresponding authors ( [email protected], [email protected])

Abstract

Digital health technologies (DHT), such as wearable devices, provide personalized, continuous, and real-time monitoring of patients. These technologies are contributing to the development of novel therapies and personalized medicine. Gaining insight from these technologies requires appropriate modeling techniques to capture clinically relevant changes in disease state. The data generated by these devices are stochastic in nature, may have missing elements, and exhibit considerable inter-individual variability, which makes them difficult to analyze using traditional longitudinal modeling techniques. We present a novel pharmacology-informed neural stochastic differential equation (SDE) model capable of addressing these challenges. Using synthetic data, we demonstrate that our approach is effective in identifying treatment effects and learning causal relationships from stochastic data, thereby enabling counterfactual simulation.

1 Introduction

The rise of digital health technologies (DHT), including wearable devices such as smartwatches and patch-based physiological sensors, has opened new possibilities for continuous patient monitoring Friend et al. (2023). These devices generate time-series data at an unprecedented temporal resolution and duration, thereby offering the potential to generate new clinical measures and insights Berisha et al. (2021). Furthermore, recent examples have shown the clinical value of modeling both the longitudinal trends and the stochasticity in digital health (DH) data Leander et al. (2022).

Stochastic differential equations (SDEs) have been developed to describe various phenomena that exhibit random fluctuations Fagin et al. (2023), including in biological and biomedical applications Mei et al. (2013); Tajmirriahi & Amini (2021). In the context of DH, the interplay between physiology and the measurement device is likely far too complex for one to derive, from first principles, the equations linking disease status to DH data. Instead, we propose to learn the underlying dynamical system directly from data, with the help of neural-SDE Evangelou et al. (2023); Dietrich et al. (2023).

Here, we develop a pharmacology-informed Lu et al. (2021); Laurie & Lu (2023) neural-SDE that:

• learns the underlying dynamical system from a patient population, while introducing patient-dependent parameters that enable the characterization of patient-to-patient variability;

• incorporates the causality between pharmacokinetics (PK) and pharmacodynamics (PD);

• enables counterfactual simulations to describe drug effects at the individual patient level.

We demonstrate the effectiveness of the proposed model using synthetic data.

2 Methods

2.1 Neural-SDE Model

We assume that the longitudinal data are modelled by a system of equations of the form,

$$dc_{t}=f(c_{t})\,dt \tag{1}$$
$$dx_{t}=\nu(x_{t},c_{t},\bm{p})\,dt+\sigma(x_{t},c_{t},\bm{p})\,dW_{t} \tag{2}$$

where Equation 1 represents a known Ordinary Differential Equation (ODE) model with $f(\cdot)$ being the vector field for PK that governs the drug concentration, $c_{t}\in\mathbb{R}$, and where the drift and diffusion terms (i.e., $\nu(x_{t},c_{t},\bm{p})$ and $\sigma(x_{t},c_{t},\bm{p})$ respectively) are described by neural networks. We work under the hypothesis that the drift and diffusivity terms of the effective SDE depend on the state ($x_{t}\in\mathbb{R}$) as well as the drug concentration $c_{t}$. Additionally, while the underlying equations are the same for all patients, the model includes a latent patient-dependent parameter vector $\bm{p}$ that describes the patient-to-patient variability. This latent parameter $\bm{p}$ is discovered in a data-driven way based on the work of Lu et al. (2021), which we elaborate on below.

While the available data are in the form of trajectories, we transform them to snapshots $\mathcal{D}$ in a manner analogous to that done in Dietrich et al. (2023). In particular, each snapshot $\mathcal{D}^{i}$, uniquely identified by the index $i$, takes the form $\mathcal{D}^{i}=\{x_{1}^{i},x_{0}^{i},\Delta t,c_{1}^{i},\bm{p}^{i,j}\}$, where $x^{i}_{1}$ is the evolution of the state variable $x_{t}$ after a time step $\Delta t$ given the initial condition $x_{0}^{i}$, and $\bm{p}^{i,j}$ is the latent parameter for the $j$-th patient. Note that we utilize the concentration $c_{1}$ and not $c_{0}$, following the (symplectic) Euler-Maruyama scheme discussed in Dietrich et al. (2023). The concentration $c_{t}$ and the patient-dependent parameter $\bm{p}$ enter the overall architecture as inputs, based on Dietrich et al. (2023); Evangelou et al. (2023).
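The conversion from a trajectory to snapshots can be sketched as follows. This is a minimal illustration, not the authors' code: the names `Snapshot` and `trajectory_to_snapshots` are hypothetical, and the patient index `patient` stands in for the latent parameter $\bm{p}^{i,j}$, which in the model is produced by the encoder rather than stored with the data.

```python
from typing import List, NamedTuple, Sequence


class Snapshot(NamedTuple):
    x1: float     # state after the step
    x0: float     # state before the step
    dt: float     # step size (may differ between snapshots)
    c1: float     # drug concentration at the END of the step
    patient: int  # patient index j (placeholder for the latent p^{i,j})


def trajectory_to_snapshots(
    t: Sequence[float], x: Sequence[float], c: Sequence[float], patient: int
) -> List[Snapshot]:
    """Split one patient's trajectory into consecutive (x0 -> x1) pairs."""
    if not (len(t) == len(x) == len(c)):
        raise ValueError("t, x and c must have the same length")
    snapshots = []
    for k in range(len(t) - 1):
        dt = t[k + 1] - t[k]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        # Pair the state at t[k] with the concentration at t[k+1].
        snapshots.append(Snapshot(x[k + 1], x[k], dt, c[k + 1], patient))
    return snapshots


# Illustrative values only; note the irregular time grid.
snaps = trajectory_to_snapshots(
    t=[0.0, 1.0, 3.0], x=[10.0, 10.5, 9.8], c=[0.0, 2.0, 1.5], patient=7
)
```

Because every snapshot carries its own `dt`, irregular sampling and gaps in a trajectory translate into snapshots with different step sizes rather than requiring interpolation.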

The loss function (based on Dietrich et al. (2023)) is derived from the (symplectic) Euler-Maruyama numerical integration scheme. The numerical approximation of Equations 1 and 2 results in:

$$c^{i}_{1}=c^{i}_{0}+f(c^{i}_{0})\Delta t \tag{3}$$
$$x^{i}_{1}=x^{i}_{0}+\nu(x^{i}_{0},c^{i}_{1},\bm{p}^{i,j})\Delta t+\sigma(x^{i}_{0},c^{i}_{1},\bm{p}^{i,j})\delta W_{0}, \tag{4}$$

where $\delta W_{0}$ is normally distributed with zero mean and variance $\Delta t$, and $\Delta t$ is a variable timestep. The drift and diffusivity terms are approximated by two networks $\nu_{\theta}$ and $\sigma_{\theta}$, under the assumption that $x_{1}$ is drawn from a normal distribution of the form,

$$x_{1}^{i}\sim\mathcal{N}\left(x_{0}^{i}+\nu_{\theta}(x_{0}^{i},c_{1}^{i},\bm{p}^{i,j})\Delta t,\ \sigma_{\theta}(x_{0}^{i},c_{1}^{i},\bm{p}^{i,j})^{2}\Delta t\right). \tag{5}$$

Given the mean and variance in Equation 5, we take the logarithm of the resulting normal density and obtain the following loss function, whose minimization maximizes the likelihood of the data:

$$\mathcal{L}(\theta|x_{0}^{i},x_{1}^{i},\Delta t):=\frac{\left(x_{1}^{i}-x_{0}^{i}-\nu_{\theta}(x_{0}^{i},c_{1}^{i},\bm{p}^{i,j})\Delta t\right)^{2}}{\Delta t\,\sigma_{\theta}(x_{0}^{i},c_{1}^{i},\bm{p}^{i,j})^{2}}+\log\left|\Delta t\,\sigma_{\theta}(x_{0}^{i},c_{1}^{i},\bm{p}^{i,j})^{2}\right|. \tag{6}$$

It should be noted that the Neural-SDE framework by Dietrich et al. (2023) is also capable of handling varying time steps $\Delta t$ .
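As a sanity check on the loss, the sketch below evaluates it for a single snapshot and compares it with the Gaussian log-density implied by Equation 5. The function names and numerical values are illustrative assumptions; the drift enters multiplied by $\Delta t$, so that the residual is measured against the mean in Equation 5.

```python
import math


def snapshot_loss(x0: float, x1: float, dt: float,
                  drift: float, diffusion: float) -> float:
    """Loss for one snapshot: -2 * log-likelihood, without the log(2*pi) constant."""
    var = dt * diffusion ** 2        # variance of x1 given x0 (Equation 5)
    resid = x1 - x0 - drift * dt     # deviation from the predicted mean
    return resid ** 2 / var + math.log(var)


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    """Log-density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


# Illustrative numbers (not from the paper).
x0, x1, dt = 1.0, 1.3, 0.5
nu, sigma = 0.4, 0.8   # values the drift and diffusion networks might return

loss = snapshot_loss(x0, x1, dt, nu, sigma)
logp = gaussian_logpdf(x1, x0 + nu * dt, sigma ** 2 * dt)
```

Here `loss` equals `-2 * logp - log(2 * pi)`, so minimizing the loss over the network parameters is equivalent to maximizing the likelihood. Since `dt` is an argument of each evaluation, nothing in the computation requires a uniform time grid.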

The Neural-SDE architecture consists of two network components, one for the drift and one for the diffusion. In our work, the drift network consists of 4 layers with 64 neurons each, where each layer is followed by an ELU activation function. The diffusion network consists of 3 layers with 32 neurons; the first two layers are followed by an ELU activation function and the output layer is followed by a softplus activation function. A schematic of the Neural-SDE architecture is shown in Figure 1.
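One possible PyTorch reading of this description is sketched below. Several details are assumptions rather than facts from the text: the class names are hypothetical, the drift network is given an extra linear read-out to a scalar, the third diffusion layer is taken to map to a single positive output, and the dimension of $\bm{p}$ (`latent_dim`) is set to 128 purely for illustration.

```python
import torch
from torch import nn


class DriftNet(nn.Module):
    """Four hidden layers of 64 units with ELU; scalar linear read-out (assumed)."""

    def __init__(self, latent_dim: int, hidden: int = 64, n_hidden: int = 4):
        super().__init__()
        layers, width = [], 2 + latent_dim  # inputs: x0, c1 and the latent p
        for _ in range(n_hidden):
            layers += [nn.Linear(width, hidden), nn.ELU()]
            width = hidden
        layers.append(nn.Linear(width, 1))  # assumption: not stated in the text
        self.net = nn.Sequential(*layers)

    def forward(self, x0, c1, p):
        return self.net(torch.cat([x0, c1, p], dim=-1))


class DiffusionNet(nn.Module):
    """Two ELU layers of 32 units, then an output layer with softplus."""

    def __init__(self, latent_dim: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 + latent_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # softplus keeps sigma positive
        )

    def forward(self, x0, c1, p):
        return self.net(torch.cat([x0, c1, p], dim=-1))


torch.manual_seed(0)
latent_dim = 128  # assumed size of p
drift, diffusion = DriftNet(latent_dim), DiffusionNet(latent_dim)
x0, c1, p = torch.randn(8, 1), torch.rand(8, 1), torch.randn(8, latent_dim)
nu, sigma = drift(x0, c1, p), diffusion(x0, c1, p)  # each of shape (8, 1)
```

The softplus on the diffusion output guarantees a positive $\sigma_{\theta}$, which keeps the logarithm and the division in the loss well defined.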

2.2 Latent Patient Descriptors - GRU Encoder

Our approach to learning the Neural-SDE from data across the patient population is to identify a set of dynamical equations that holds across all patients, as well as patient-specific descriptors (or embeddings) that characterize patient-to-patient variability Laurie & Lu (2023). In our approach, those patient-specific descriptors are discovered in a data-driven manner, based on the work of Lu et al. (2021): a Gated Recurrent Unit (GRU) encoder was used to discover the latent parameter $\bm{p}$, with longitudinal data provided in tabular form as input. More specifically, the input data entering the encoder consist of a variable number of rows for each patient and the following four columns: (1) the absolute time; (2) the time after dose; (3) the stochastic PD data; and (4) the deterministic PK data.

Each tabular input was padded, and masking was applied in order to handle the variable number of time points. The GRU encoder has 128 hidden states and is connected to a Multilayer Perceptron (MLP) consisting of 2 layers, each with 128 neurons and each followed by an ELU activation function. The output of the MLP is the latent parameter $\bm{p}$ that enters the Neural-SDE architecture. End-to-end training was implemented using the loss function given by Equation 6.
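A minimal PyTorch sketch of such an encoder is given below. It is an illustration under stated assumptions: the class name `PatientEncoder` is hypothetical, masking is realized here through `pack_padded_sequence` (the text does not say which mechanism was used), and the MLP output is taken directly as $\bm{p}$, which makes $\bm{p}$ 128-dimensional in this reading.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_sequence


class PatientEncoder(nn.Module):
    """GRU over a patient's table, then a 2-layer MLP giving the latent p."""

    def __init__(self, n_features: int = 4, hidden: int = 128):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, 128), nn.ELU(),
            nn.Linear(128, 128), nn.ELU(),
        )

    def forward(self, padded: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        # Packing makes the GRU stop at each patient's true last row,
        # so the zero padding never influences the hidden state.
        packed = pack_padded_sequence(
            padded, lengths.cpu(), batch_first=True, enforce_sorted=False
        )
        _, h_n = self.gru(packed)   # h_n: (num_layers, batch, hidden)
        return self.mlp(h_n[-1])    # latent p: (batch, 128)


torch.manual_seed(0)
# Two patients with 5 and 3 rows; columns: time, time after dose, PD, PK.
seqs = [torch.randn(5, 4), torch.randn(3, 4)]
lengths = torch.tensor([len(s) for s in seqs])
padded = pad_sequence(seqs, batch_first=True)  # shape (2, 5, 4), zero-padded

encoder = PatientEncoder()
with torch.no_grad():
    p = encoder(padded, lengths)
    p_single = encoder(seqs[1].unsqueeze(0), torch.tensor([3]))
```

Encoding the shorter patient alone (`p_single`) gives the same embedding as encoding it inside the padded batch, which is the property that padding plus masking is meant to provide.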

Figure 1: The Neural-SDE architecture including the GRU encoder.

2.3 Dataset

To mimic clinical digital health measurements, synthetic data were simulated in which the PK serves as a deterministic driving input that causally influences a stochastically evolving PD. Patient-specific parameters were sampled from a log-normal distribution: 50 individual patient trajectories were sampled at each of 3 dose levels (50 mg, 100 mg, 400 mg), for a total of 150 patient trajectories, and a 70:30 train-test split was used. Further details are summarized in Appendix A.1.

3 Results

Figure 2 demonstrates the model’s ability to learn the underlying system’s dynamics by comparing “true” (i.e., the underlying ground truth) SDE trajectories from the test dataset against the model-predicted trajectories. For each patient in the test set, we sampled 250 trajectories to provide a robust representation of the predictive variability associated with the model. This result demonstrates the model’s ability to replicate the complex dynamics of PD trajectories at the population level.

3.1 Dosing regimen analysis

To analyze the impact of different dosing regimens on PD, we consider three distinct simulated doses of $50~\mathrm{mg}$, $100~\mathrm{mg}$, and $400~\mathrm{mg}$. For each patient from the test dataset, we sampled 250 SDE trajectories. Figure 3 shows that the model is qualitatively able to capture the true underlying dose-response relationship.

3.2 Patient-specific responses and counterfactual analysis

Figure 4 demonstrates the proposed methodology’s ability to perform counterfactual analysis and identify individual treatment effects. To accomplish this, for each patient the drift and diffusivity terms were inferred from the trained model and 250 SDE trajectories were generated. The results demonstrate the model’s ability to capture the underlying dynamics of the stochastic process for individual patients. This suggests that the GRU encoding strategy not only captures the population behaviors, but also successfully learns to differentiate amongst patients. Moreover, we demonstrate a what-if scenario: in the absence of PK, the model correctly predicts a lack of dynamical change in the modeled PD endpoint. This suggests our model is able to correctly identify the causal relationship between PK and PD.
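The counterfactual comparison can be sketched as follows: the same drift and diffusion functions are rolled out with Euler-Maruyama once under a treated concentration profile and once with the concentration set to zero. Everything numerical here is an assumption made for illustration. In particular, `drift` and `diffusion` are hand-written stand-ins for the trained networks with $\bm{p}$ held fixed for one patient, and the constant concentration profile is not a real PK curve.

```python
import math
import random
from typing import Callable, List, Sequence


def sample_paths(
    x_init: float,
    conc: Sequence[float],
    dt: float,
    drift: Callable[[float, float], float],
    diffusion: Callable[[float, float], float],
    n_paths: int = 250,
    seed: int = 0,
) -> List[List[float]]:
    """Euler-Maruyama roll-outs of the PD state, driven by a given PK profile."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        x, path = x_init, [x_init]
        for k in range(1, len(conc)):
            c1 = conc[k]                         # concentration at the end of the step
            dW = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment, variance dt
            x = x + drift(x, c1) * dt + diffusion(x, c1) * dW
            path.append(x)
        paths.append(path)
    return paths


# Hypothetical stand-ins for the trained networks (patient embedding fixed).
def drift(x: float, c: float) -> float:
    return 1.0 - 0.1 * (1.0 - c / (1.0 + c)) * x


def diffusion(x: float, c: float) -> float:
    return 0.02 * x


def final_mean(paths: List[List[float]]) -> float:
    return sum(path[-1] for path in paths) / len(paths)


n_steps, dt, baseline = 240, 1.0, 10.0
treated = sample_paths(baseline, [2.0] * (n_steps + 1), dt, drift, diffusion)
untreated = sample_paths(baseline, [0.0] * (n_steps + 1), dt, drift, diffusion)

# Difference in the mean final PD value with and without drug.
treatment_effect = final_mean(treated) - final_mean(untreated)
```

With zero concentration the stand-in drift has its fixed point at the baseline, so the untreated paths fluctuate around their starting value, while the treated paths drift away from it. The difference between the two ensembles is one simple summary of an individual treatment effect.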

4 Conclusion

We proposed a pharmacology-informed neural-SDE architecture that is able to learn the relationship between a deterministic PK and a stochastic PD. Using synthetic data, the model correctly reproduces the underlying PK-PD relationship at the population level. Furthermore, the model enables the counterfactual simulation of PD in the absence of the hypothetical drug and, in doing so, quantifies the individual treatment effect.

References

Berisha et al. (2021) Visar Berisha, Chelsea Krantsevich, P Richard Hahn, Shira Hahn, Gautam Dasarathy, Pavan Turaga, and Julie Liss. Digital medicine and the curse of dimensionality. NPJ digital medicine, 4(1):153, 2021.
Dayneka et al. (1993) N. L. Dayneka, V. Garg, and W. J. Jusko. Comparison of four basic models of indirect pharmacodynamic responses. Journal of pharmacokinetics and biopharmaceutics, 21(4):457–478, 1993. doi: 10.1007/BF01061691.
Dietrich et al. (2023) Felix Dietrich, Alexei Makeev, George Kevrekidis, Nikolaos Evangelou, Tom Bertalan, Sebastian Reich, and Ioannis G Kevrekidis. Learning effective stochastic differential equations from microscopic simulations: Linking stochastic numerics to deep learning. Chaos: An Interdisciplinary Journal of Nonlinear Science, 33(2), 2023.
Evangelou et al. (2023) Nikolaos Evangelou, Felix Dietrich, Juan M Bello-Rivas, Alex J Yeh, Rachel S Hendley, Michael A Bevan, and Ioannis G Kevrekidis. Learning effective SDEs from Brownian dynamic simulations of colloidal particles. Molecular Systems Design & Engineering, 2023.
Fagin et al. (2023) Joshua Fagin, Ji Won Park, Henry Best, KE Saavik Ford, Matthew J Graham, V Ashley Villar, Shirley Ho, James Hung-Hsu Chan, and Matthew O’Dowd. Latent stochastic differential equations for modeling quasar variability and inferring black hole properties. arXiv preprint arXiv:2304.04277, 2023.
Friend et al. (2023) Stephen H Friend, Geoffrey S Ginsburg, and Rosalind W Picard. Wearable digital health technology, 2023.
Laurie & Lu (2023) Mark Laurie and James Lu. Explainable deep learning for tumor dynamic modeling and overall survival prediction using neural-ode. npj Systems Biology and Applications, 9(1):58, 2023.
Leander et al. (2022) Jacob Leander, Mats Jirstrand, Ulf G Eriksson, and Robert Palmér. A stochastic mixed effects model to assess treatment effects and fluctuations in home-measured peak expiratory flow and the association with exacerbation risk in asthma. CPT: Pharmacometrics & Systems Pharmacology, 11(2):212–224, 2022.
Lu et al. (2021) James Lu, Brendan Bender, ** Y **, and Yuanfang Guan. Deep learning prediction of patient response time course from early data via neural-pharmacokinetic/pharmacodynamic modelling. Nature machine intelligence, 3:696–704, 2021.
Mei et al. (2013) Yongguo Mei, Adria Carbo, Raquel Hontecillas, and Josep Bassaganya-Riera. ENISI SDE: a novel web-based stochastic modeling tool for computational biology. In 2013 IEEE International Conference on Bioinformatics and Biomedicine, pp. 392–397. IEEE, 2013.
Tajmirriahi & Amini (2021) Mahnoosh Tajmirriahi and Zahra Amini. Modeling of seizure and seizure-free EEG signals based on stochastic differential equations. Chaos, Solitons & Fractals, 150:111104, 2021.

Appendix A

A.1 Dataset Generation Details

Synthetic training data were generated to represent an indirect response PK-PD model Dayneka et al. (1993), in which PK acts causally to change the PD, with the additional modification that the observable PD variable is stochastic in nature. This system follows the general form of Equations 1 and 2, with the following system of ODEs being specified for the term $f(c_{t},\bm{p})dt$:

$$\frac{du_{1}}{dt}=-KA\times u_{1}(t) \tag{7}$$
$$\frac{du_{2}}{dt}=KA\times u_{1}(t)-u_{2}(t)\times(KE+K12)+u_{3}(t)\times K21 \tag{8}$$
$$\frac{du_{3}}{dt}=K12\times u_{2}(t)-K21\times u_{3}(t) \tag{9}$$

where $c_{t}=u_{2}(t)/\text{V2}$, with V2 representing the volume of distribution for drug in plasma circulation. The drift term in the relationship between $c_{t}$ and PD is represented by the following:

$$\frac{du_{4}}{dt}=KIN-KOUT\times\left(1-\frac{Imax\times c_{t}}{IC50+c_{t}}\right)\times u_{4}(t). \tag{10}$$

Example trajectories of this system are shown in Figure 5. The diffusion term in Equation 2 is given by $\beta u_{4}dW_{t}$, where $\beta$ was sampled from a log-normal distribution. Examples of stochastic trajectories for $c_{t}$ are shown in Figure 6.

In the current set of experiments, an equal number of patients were simulated for each dose in a range of doses (50, 100, 400 mg). Dosing was set to begin at day 5 for all synthetic subjects, with daily dosing; the PD sampling frequency is once per hour, over a period of 30 days.
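A sketch of how one such synthetic patient could be generated is shown below. All parameter values are placeholders chosen for illustration; the values and log-normal distributions used for the actual dataset are not given in the text. Further assumptions are that each dose enters the depot compartment $u_{1}$ as a bolus, that "day 5" means $t=120$ hours, that the inhibition term is $Imax\times c_{t}/(IC50+c_{t})$, and that ten Euler sub-steps per hourly sample are adequate.

```python
import math
import random


def simulate_patient(dose_mg, seed=0, days=30, first_dose_day=5, substeps=10,
                     KA=0.5, KE=0.1, K12=0.05, K21=0.05, V2=10.0,
                     KIN=1.0, KOUT=0.1, IMAX=0.8, IC50=2.0, beta=0.02):
    """Hourly samples of PK concentration and stochastic PD for one patient.

    Rate constants are per hour; every default value is a placeholder.
    """
    rng = random.Random(seed)
    h = 1.0 / substeps                      # integration step in hours
    u1 = u2 = u3 = 0.0                      # depot, central, peripheral amounts
    u4 = KIN / KOUT                         # PD starts at its drug-free baseline
    times, conc, pd = [0.0], [0.0], [u4]
    for hour in range(days * 24):
        if hour >= first_dose_day * 24 and hour % 24 == 0:
            u1 += dose_mg                   # daily dose into the depot
        for _ in range(substeps):
            c = u2 / V2
            du1 = -KA * u1
            du2 = KA * u1 - u2 * (KE + K12) + u3 * K21
            du3 = K12 * u2 - K21 * u3
            drift = KIN - KOUT * (1.0 - IMAX * c / (IC50 + c)) * u4
            dW = rng.gauss(0.0, math.sqrt(h))
            u1, u2, u3 = u1 + du1 * h, u2 + du2 * h, u3 + du3 * h
            u4 = u4 + drift * h + beta * u4 * dW   # Euler-Maruyama for PD
        times.append(hour + 1.0)
        conc.append(u2 / V2)
        pd.append(u4)
    return times, conc, pd


times, conc100, pd100 = simulate_patient(100.0, seed=1)
_, conc400, _ = simulate_patient(400.0, seed=1)
```

Between-patient variability could then be introduced by drawing the parameters per patient, for example with `random.Random.lognormvariate`, before calling `simulate_patient`.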

A.2 Training methodology and optimization strategy

The current model, including the numerical integration scheme based on an Euler-Maruyama solver, has been implemented in PyTorch. Higher-order methods were not used in the current work; they remain open for future development based on specific needs.

In model training, we leveraged vectorization: rather than processing each time step of each patient sequentially, one value at a time, the model operates at the patient level and concurrently processes all data points associated with a specific patient. This is feasible because the loss function in Equation 6 can be evaluated at each time step independently of the other time instances. The vectorization strategy significantly enhances training and inference performance.
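The equivalence between the sequential and the vectorized evaluation can be illustrated with the short PyTorch sketch below. The two small networks are hypothetical stand-ins that take only the state and the concentration as inputs (the patient embedding is omitted for brevity), and the data are random numbers rather than a simulated patient.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in networks with inputs (x0, c1).
drift_net = nn.Sequential(nn.Linear(2, 16), nn.ELU(), nn.Linear(16, 1))
diff_net = nn.Sequential(nn.Linear(2, 16), nn.ELU(), nn.Linear(16, 1), nn.Softplus())


def loss_terms(x0, x1, c1, dt):
    """Per-snapshot loss for a batch of snapshots, evaluated in one call."""
    inputs = torch.stack([x0, c1], dim=-1)           # (n, 2)
    nu = drift_net(inputs).squeeze(-1)               # (n,)
    var = dt * diff_net(inputs).squeeze(-1) ** 2     # (n,)
    return (x1 - x0 - nu * dt) ** 2 / var + torch.log(var)


# One patient's trajectory: 720 hourly steps (random placeholder data).
T = 720
x, c = torch.randn(T + 1), torch.rand(T + 1)
dt = torch.full((T,), 1.0)
x0, x1, c1 = x[:-1], x[1:], c[1:]

with torch.no_grad():
    # Vectorized: all snapshots of the patient in a single forward pass.
    vectorized = loss_terms(x0, x1, c1, dt).mean()
    # Sequential: one snapshot at a time, for comparison only.
    looped = torch.stack(
        [loss_terms(x0[k:k + 1], x1[k:k + 1], c1[k:k + 1], dt[k:k + 1])
         for k in range(T)]
    ).mean()
```

Both evaluations give the same value up to floating-point error, since no snapshot's loss term depends on any other snapshot; the vectorized form simply replaces 720 small forward passes with one.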

We trained the network for 100 epochs using the Adam optimizer with a learning rate of 0.001 and a batch size of 1. The overall training process takes around 140 seconds on one NVIDIA V100 GPU.