$\mathsf{f}\text{-}\mathsf{GAN}$: a frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

Abstract

Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual’s cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due to motion, they have been widely used to continuously monitor cardiovascular health because of their convenience. Therefore, we would like to combine the ease with which PPGs can be collected with the information that ECGs provide about cardiovascular health by developing models to synthesize ECG signals from paired PPG signals. We tackled this problem using generative adversarial networks (GANs) and found that models trained using the original GAN formulation can be successfully used to synthesize ECG signals from which heart rate can be extracted using standard signal processing pipelines. Incorporating a frequency-domain constraint into model training improved the stability of model performance and also the performance on heart rate estimation.

Index Terms—  Generative adversarial networks, electrocardiograms, photoplethysmograms, cardiovascular health

1 Introduction

There is increasing interest in cardiovascular wellness [1, 2] and there exist various non-invasive approaches for monitoring cardiovascular health. Clinically, electrocardiograms (ECGs) and fingertip photoplethysmograms (PPGs) are used along with established signal processing pipelines for extracting clinically-validated traits. However, the equipment necessary for their collection precludes their use in continuous monitoring. The increasing prevalence of wrist-worn devices may alleviate this problem, since PPGs are easily collected by these devices [3].

There are trade-offs to consider when choosing between ECGs and PPGs. The two signal modalities differ in their availability, signal quality and mode of measurement. For example, ECGs are less noisy and have a higher sampling rate because they are usually collected in stationary settings. In addition, ECGs monitor cardiac electrical activity whereas PPGs monitor blood flow in the periphery using optical sensors. Therefore, it is of interest to combine the richness of ECGs with the availability of PPGs.

We work towards this goal by synthesizing ECG signals directly from (paired) PPG signals with a few desiderata. First, we would like a model that can synthesize ECG signals from PPG signals that are obtained from wearable devices during daily activities, as opposed to from PPG signals obtained using fingertip pulse oximeters. This is because wearable devices more readily allow for the continuous monitoring of biometric signals during typical daily activities. Second, we would like a PPG-to-ECG synthesis model that can generalize across subjects so that it can be more readily used off-the-shelf.

Here, we used the framework of generative adversarial networks (GANs, [4]) to learn the mapping from PPG signals to ECG signals. We did not pose this problem as a supervised translation problem (e.g., Euclidean distance minimization) because GANs allow us to not only learn the mapping function (from PPG to ECG), but also remove the need to hand-engineer the objective function for synthesizing realistic data samples [5]. Further motivating our use of GANs is their successful applications in image-to-image translation problems [5, 6].

There are a few related works that used GANs for synthesizing ECG signals from PPG signals. Using the MIMIC dataset [7, 8, 9], it has been shown that ECG signals could be generated with high fidelity from PPG signals obtained via fingertip pulse oximetry [10]. Other work showing successful PPG-to-ECG synthesis used a GAN that incorporated both time-domain and frequency-domain inputs along with constraints on backwards translation (i.e., ECG-to-PPG) [11]. Their model was very sophisticated, but it was unclear which components contributed the most to model performance and how stable it was across random initializations. Finally, models that do not use GANs have also been shown to be capable of PPG-to-ECG reconstruction, but are subject-specific and are built using PPG signals obtained from fingertip pulse oximetry [12].

Our work focusses on building GAN-based models to synthesize ECGs from PPGs, starting with the most basic GAN formulation, and on investigating the benefits that certain model components confer. We found that training a model using the original GAN formulation resulted in synthetic ECG signals that can be used for heart rate estimation, serving as a good baseline. Furthermore, adding a frequency-domain constraint during model training improves the stability of model training and also improves model performance on heart rate estimation.

2 Methodology

2.1 Dataset and Signal Preprocessing

We used the dataset collected by Reiss et al. [13], known as PPG-Dalia. It consists of a set of synchronized PPG and ECG signals obtained from 15 participants while they performed a wide variety of natural activities, including sitting (at rest), cycling, working, walking and playing table soccer, over a span of approximately two hours. PPG signals in this dataset were collected from a wrist-worn device at 64 Hz, while ECG signals were (simultaneously) collected at 700 Hz using a chest-worn device.

We next performed basic signal preprocessing of each participant’s data. The PPG and the ECG signals were first resampled to 128 Hz. They were then segmented into overlapping four-second windows (i.e., 512 time points for each data sample), where adjacent data samples had a two-second overlap. After signal segmentation, a bandpass filter was applied to each signal using a Python package known as biosppy [14]. Each PPG segment was bandpass filtered using a fourth-order Chebyshev Type II filter with passband frequencies of 0.4 Hz and 8 Hz. Similarly, each ECG segment was bandpass filtered using a finite impulse response (FIR) filter with passband frequencies of 3 Hz and 45 Hz. Finally, both signal segments were min-max scaled to $[-1, 1]$. These preprocessing steps are similar to those used in prior work [11].
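The windowing and scaling steps above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names `segment_windows` and `min_max_scale` are hypothetical, the resampling and bandpass filtering steps are omitted, and the handling of a constant segment is our own assumption.

```python
import numpy as np

def segment_windows(signal, fs=128, window_s=4, overlap_s=2):
    """Split a 1D signal into overlapping fixed-length windows."""
    size = int(window_s * fs)               # 4 s * 128 Hz = 512 samples
    hop = int((window_s - overlap_s) * fs)  # 2 s hop = 256 samples
    if len(signal) < size:
        return np.empty((0, size))
    n_windows = (len(signal) - size) // hop + 1
    return np.stack([signal[i * hop:i * hop + size] for i in range(n_windows)])

def min_max_scale(segment, lo=-1.0, hi=1.0):
    """Scale one segment so that its minimum maps to lo and its maximum to hi."""
    seg_min, seg_max = segment.min(), segment.max()
    if seg_max == seg_min:
        # Assumption: a constant segment is mapped to zeros to avoid dividing by zero.
        return np.zeros_like(segment, dtype=float)
    return lo + (hi - lo) * (segment - seg_min) / (seg_max - seg_min)

# Toy input: 10 s of a 1.2 Hz sinusoid sampled at 128 Hz (1280 samples).
toy = np.sin(2 * np.pi * 1.2 * np.arange(1280) / 128)
windows = segment_windows(toy)
scaled = np.array([min_max_scale(w) for w in windows])
```

With a 10-second toy signal, the two-second hop yields four windows of 512 samples each, and every scaled window spans exactly $[-1, 1]$.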

Prior to model training and evaluation, the data were split into train, validation and test sets. Nine random participants were assigned to the train set ($40\,675$ segments), three random participants were assigned to the validation set ($11\,276$ segments) and the final three participants were assigned to the test set ($12\,796$ segments). All three sets were disjoint and consisted of data from different subjects, so that our models are subject agnostic.
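A subject-wise split of this kind can be sketched as below. The helper `split_subjects`, the participant identifiers 1 to 15 and the seed are illustrative assumptions; the actual assignment of participants to sets is not reproduced here.

```python
import numpy as np

def split_subjects(subject_ids, n_train=9, n_val=3, n_test=3, seed=0):
    """Randomly assign whole participants to disjoint train/validation/test sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(subject_ids)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:n_train + n_val + n_test]
    return set(train.tolist()), set(val.tolist()), set(test.tolist())

# Hypothetical participant identifiers 1..15.
train_ids, val_ids, test_ids = split_subjects(np.arange(1, 16))
```

Splitting by participant rather than by segment matters here because adjacent segments overlap by two seconds, so a segment-level split would leak nearly identical samples across sets.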

2.2 Generative Adversarial Networks and Objective Functions

At a high level, GANs are based on a two-player game, where both players can be deep neural networks. One player is the “generator” ($G$) and its goal is to produce synthetic data (e.g., ECG signals) that are as indistinguishable as possible from real data. The other player is the “discriminator” ($D$) and its goal is to determine whether or not its input is synthetic (i.e., fake). Thus, the generator and the discriminator have opposing objectives and are trained adversarially until they reach equilibrium.

We experimented with two different objective functions when training a generator to synthesize ECG signals from PPG signals. The first objective function was the original adversarial loss formulation [4], defined as follows:

$$\begin{aligned}
\mathcal{L}_{\mathsf{GAN}}(\boldsymbol{X},\boldsymbol{Y};\boldsymbol{\theta}_{G},\boldsymbol{\theta}_{D}) = {} & \mathbb{E}_{\boldsymbol{x}\sim\mathsf{P}_{\boldsymbol{X}}(\boldsymbol{x})}[\log(1-D(G(\boldsymbol{x})))] \, + \\
& \mathbb{E}_{\boldsymbol{y}\sim\mathsf{P}_{\boldsymbol{Y}}(\boldsymbol{y})}[\log D(\boldsymbol{y})],
\end{aligned} \tag{1}$$

where $\boldsymbol{\theta}_{G}$ and $\boldsymbol{\theta}_{D}$ are the parameters of the generator and the discriminator respectively, $\boldsymbol{X}$ is the set of PPG signals, $\boldsymbol{Y}$ is the set of ECG signals, $\boldsymbol{x}\in\mathbb{R}^{512}$ is a PPG signal and $\boldsymbol{y}\in\mathbb{R}^{512}$ is a real ECG signal. Since the objective of $D$ is to distinguish between synthetic and real ECG signals, $D$ aims to maximize $\mathcal{L}_{\mathsf{GAN}}$. In contrast, $G$ aims to synthesize ECG signals that are realistic enough to fool $D$. Thus, $G$ aims to minimize $\mathcal{L}_{\mathsf{GAN}}$.
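To make the opposing objectives concrete, the following sketch estimates Equation 1 from a batch of discriminator outputs. It assumes that $D$ outputs probabilities in $(0, 1)$; the function name `gan_loss` and the clipping constant `eps` are our own additions for numerical safety and are not part of the original formulation.

```python
import numpy as np

def gan_loss(d_real, d_fake, eps=1e-12):
    """Batch estimate of Equation 1 from discriminator outputs in (0, 1).

    d_real: D(y) for real ECG segments; d_fake: D(G(x)) for synthetic ones.
    """
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return np.mean(np.log(1.0 - d_fake)) + np.mean(np.log(d_real))

# A discriminator that separates real from fake well: the loss is close to 0.
confident_d = gan_loss(d_real=np.array([0.9, 0.95]), d_fake=np.array([0.1, 0.05]))
# A fully fooled discriminator outputs 0.5 everywhere: the loss is 2 * log(0.5).
fooled_d = gan_loss(d_real=np.array([0.5, 0.5]), d_fake=np.array([0.5, 0.5]))
```

The value is larger when the discriminator is confident and correct, which is why $D$ ascends this quantity while $G$ descends it.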

The second objective function builds upon Equation 1 by incorporating a constraint in the frequency domain, since constraints in the time-domain may not be able to handle the undesirable effects due to the different latencies between PPG peaks and ECG peaks across participants. The constraint is also motivated by the desire to encourage greater morphological similarity between the synthetic and the real ECG signals (e.g., number of R-peaks, P-waves, T-waves). Concretely, the frequency-domain constraint is defined as follows:

$$\mathcal{L}_{\mathsf{freq}}(\boldsymbol{X},\boldsymbol{Y};\boldsymbol{\theta}_{G})=\mathbb{E}_{\boldsymbol{x},\boldsymbol{y}\sim\mathsf{P}_{\boldsymbol{X},\boldsymbol{Y}}(\boldsymbol{x},\boldsymbol{y})}\left[\left\lVert\,\lvert\mathcal{F}(G(\boldsymbol{x}))\rvert-\lvert\mathcal{F}(\boldsymbol{y})\rvert\,\right\rVert_{1}\right], \tag{2}$$

where $\lvert\mathcal{F}(\boldsymbol{z})\rvert$ denotes the amplitude of each frequency component of $\boldsymbol{z}$, excluding the 0 Hz component (i.e., DC component), obtained via a fast Fourier transform. The final objective incorporating the frequency-domain constraint is:
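The per-batch value of Equation 2 can be sketched in NumPy as follows. Several details are assumptions on our part: the use of a one-sided transform (`np.fft.rfft`), the absence of any FFT normalization, and the function name `freq_l1`. A training implementation would need a differentiable equivalent, for example `torch.fft.rfft` in PyTorch.

```python
import numpy as np

def freq_l1(synthetic, real):
    """L1 distance between FFT amplitude spectra with the DC bin excluded."""
    amp_syn = np.abs(np.fft.rfft(synthetic, axis=-1))[..., 1:]   # drop the 0 Hz bin
    amp_real = np.abs(np.fft.rfft(real, axis=-1))[..., 1:]
    # Sum over frequency bins (L1 norm), then average over the batch.
    return np.mean(np.sum(np.abs(amp_syn - amp_real), axis=-1))

t = np.arange(512) / 128.0                 # four seconds at 128 Hz
real = np.sin(2 * np.pi * 1.5 * t)         # toy stand-in for an ECG segment
shifted = np.roll(real, 20)                # circular time shift (a latency)
loss_shifted = freq_l1(shifted, real)      # near zero: amplitudes ignore shifts
loss_offset = freq_l1(real + 0.3, real)    # near zero: DC bin is excluded
loss_scaled = freq_l1(0.5 * real, real)    # large: spectral amplitudes differ
```

The toy example illustrates the motivation given above: a pure time shift between the synthetic and the real signal leaves the amplitude spectrum unchanged, so the constraint does not penalize latency differences, whereas a change in spectral content does incur a penalty.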

$$\begin{aligned}
\mathcal{L}_{\mathsf{f}\text{-}\mathsf{GAN}}(\boldsymbol{X},\boldsymbol{Y};\boldsymbol{\theta}_{G},\boldsymbol{\theta}_{D}) = {} & \mathcal{L}_{\mathsf{GAN}}(\boldsymbol{X},\boldsymbol{Y};\boldsymbol{\theta}_{G},\boldsymbol{\theta}_{D}) \, + \\
& \lambda_{\mathsf{freq}}\mathcal{L}_{\mathsf{freq}}(\boldsymbol{X},\boldsymbol{Y};\boldsymbol{\theta}_{G}),
\end{aligned} \tag{3}$$

where $\lambda_{\mathsf{freq}}$ is the coefficient that weights the relative importance of the frequency-domain constraint. This combined objective therefore encourages the generator to both fool the discriminator and remain close to the real data in the frequency domain in an $\ell_{1}$ sense.

2.3 Model Architecture

The generator architecture was a U-Net with skip connections [15, 5, 11]. In addition to the U-Net structure, we applied attention gates at the output of each skip connection so that the model learns to emphasize input features at various resolutions that are useful for a task, as in prior work [16, 11]. The “encoding” portion of the generator consisted of six 1D convolutional layers with an increasing number of output filters: $(64, 128, 256, 512, 512, 512)$. The first three convolutional operations had a stride of two and the latter three convolutional operations had a stride of one. The “decoding” portion of the generator was essentially the “mirrored” version of the encoder, where spatial size is gradually increased by sequentially applying upsampling and convolutional operations. All the convolutional filters had a kernel size of 16. A high-level schematic of the generator architecture is shown on the left of Figure 1.
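As a rough illustration of how the strides shape the encoder, the sketch below tracks the temporal length of the feature maps for a 512-sample input. The padding scheme is not stated in the text, so we assume "same"-style padding, under which each layer's output length is the input length divided by the stride (rounded up); the helper `encoder_lengths` is hypothetical.

```python
import math

def encoder_lengths(n_in=512, strides=(2, 2, 2, 1, 1, 1)):
    """Temporal length after each encoder layer, assuming 'same'-style padding."""
    lengths = []
    n = n_in
    for stride in strides:
        n = math.ceil(n / stride)   # assumption: padding keeps length / stride
        lengths.append(n)
    return lengths

filters = (64, 128, 256, 512, 512, 512)
# (channels, length) of the feature map produced by each encoder layer.
shapes = list(zip(filters, encoder_lengths()))
```

Under this assumption the three strided layers reduce the length from 512 to 64, and the remaining three layers keep it at 64 while operating on 512 channels.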

Fig. 1: Schematics of the generator and the discriminator architectures. “AG” denotes the attention gate when combining features from the encoder with the features from the decoder. Left: architecture of the generator, which takes as input the PPG signal. Right: architecture of the discriminator, which takes as input the real or the synthetic ECG signal.

The discriminator architecture was a six-layer 1D convolutional neural network. The number of output filters for the first five layers increased as a function of depth: $(32, 64, 128, 256, 512)$. A sixth 1D convolutional layer was applied to the output of these five convolutional layers to reduce the number of output channels from 512 to one. Finally, this output was reduced to a scalar value via an average-pooling operation (the scalar value is used to determine the probability that the input is real or fake). All the convolutional filters had a kernel size of 16 with a stride of one. A high-level schematic of the discriminator architecture is shown on the right of Figure 1.

2.4 Model Training

The objective functions described in Equations 1 and 3 were optimized using the Adam optimizer [17] with a learning rate of $10^{-5}$ for the discriminator, a learning rate of $10^{-4}$ for the generator and a batch size of 128. The beta coefficients for the optimizer were set to $\beta_{1}=0.9$ and $\beta_{2}=0.999$. The discriminator was also updated five times less frequently than the generator (i.e., the discriminator is updated every five training iterations and the generator is updated every iteration). To optimize Equation 1, the models were trained for 15 epochs with the learning rates constant for four epochs and then linearly decayed to zero. To optimize Equation 3, $\lambda_{\mathsf{freq}}$ was set to 0.1 and the models were trained for 11 epochs with the learning rates constant for five epochs and then linearly decayed to zero. Finally, the optimization of each objective function was performed 31 times, each time with a different random seed. All models and optimization were implemented using PyTorch.
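The learning-rate schedule and the update ratio can be sketched as below. The text does not specify the exact form of the linear decay, so this sketch assumes per-epoch steps that reach zero at the end of the final epoch; `lr_at_epoch` and `updates_discriminator` are hypothetical helper names, and which iteration within each group of five triggers the discriminator update is also an assumption.

```python
def lr_at_epoch(epoch, base_lr, n_epochs, n_constant):
    """Learning rate for a 0-indexed epoch: constant, then linear decay.

    Assumption: the rate drops by equal steps after the constant phase and
    would reach exactly zero at epoch index n_epochs (the end of training).
    """
    if epoch < n_constant:
        return base_lr
    return base_lr * max(n_epochs - epoch, 0) / (n_epochs - n_constant + 1)

def updates_discriminator(iteration, every=5):
    """True on the iterations where the discriminator is updated."""
    return iteration % every == 0

# Generator schedule for Equation 1: 15 epochs, constant for the first four.
gen_lrs = [lr_at_epoch(e, 1e-4, n_epochs=15, n_constant=4) for e in range(15)]
# Number of discriminator updates in 100 iterations (the generator gets 100).
n_disc_updates = sum(updates_discriminator(i) for i in range(100))
```

Pairing a lower discriminator learning rate with less frequent discriminator updates keeps the discriminator from overpowering the generator early in training, which is one plausible reading of the settings above.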

2.5 Model Evaluation

We evaluated each synthetic ECG signal based on how well heart rate could be estimated from the signal with respect to the heart rate estimated from the real ECG signal, as this is more interpretable than other metrics such as root mean-squared error. A 10-second ECG segment allows for heart rate to be extracted more reliably. Therefore, we first segmented the signals from the validation and the test sets into 10-second paired PPG and ECG segments (with eight-second overlaps between consecutive samples). A 10-second synthetic ECG signal was then generated for each 10-second PPG signal. A popular peak detection algorithm [18, 14] was applied to extract heart rate from both the synthetic and the real ECG signals, the same as that used in prior work [13, 11]. We then computed the absolute difference between the two estimated heart rates scaled by the heart rate of the real ECG signal. Concretely, the mean absolute percentage error (denoted as $\mathsf{MAPE}$) across the dataset is defined as follows:

$$\mathsf{MAPE}(\boldsymbol{\mathsf{ECG}},\widehat{\boldsymbol{\mathsf{ECG}}})=\frac{100}{N}\sum_{i=1}^{N}\frac{\lvert\mathsf{HR}(\boldsymbol{\mathsf{ECG}}_{i})-\mathsf{HR}(\widehat{\boldsymbol{\mathsf{ECG}}}_{i})\rvert}{\mathsf{HR}(\boldsymbol{\mathsf{ECG}}_{i})}, \tag{4}$$

where $N$ is the number of (PPG or ECG) signal segments in either the validation or the test set, $\boldsymbol{\mathsf{ECG}}_{i}$ is the $i$th real ECG signal, $\widehat{\boldsymbol{\mathsf{ECG}}}_{i}$ is the $i$th synthetic ECG signal and $\mathsf{HR}(\cdot)$ is the peak detection algorithm used to estimate heart rate from ECG signals [18].
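Equation 4 can be sketched as follows once heart rates are available. The R-peak detector itself [18, 14] is not reproduced here; instead, `heart_rate_bpm` is a hypothetical helper that converts already-detected R-peak sample indices to beats per minute via the mean R-R interval, which is our assumption about how a single heart rate is obtained per segment.

```python
import numpy as np

def heart_rate_bpm(r_peak_indices, fs=128):
    """Heart rate in beats per minute from R-peak sample indices (mean R-R interval)."""
    rr_seconds = np.diff(r_peak_indices) / fs
    return 60.0 / rr_seconds.mean()

def mape(hr_real, hr_synth):
    """Mean absolute percentage error of Equation 4 over paired heart rates."""
    hr_real = np.asarray(hr_real, dtype=float)
    hr_synth = np.asarray(hr_synth, dtype=float)
    return 100.0 * np.mean(np.abs(hr_real - hr_synth) / hr_real)

# Toy example: one R-peak per second over a 10-second segment at 128 Hz.
peaks = np.arange(0, 1280, 128)
hr = heart_rate_bpm(peaks)                 # 60 beats per minute
# Two segments with relative errors of 10% and 5%.
err = mape([60.0, 80.0], [66.0, 76.0])     # (10 + 5) / 2 = 7.5
```

Because the error is normalized by the heart rate of the real ECG signal, a fixed error in beats per minute counts for less at higher heart rates, which is worth keeping in mind when comparing the "active" and "not active" subsets.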

3 Results

3.1 Qualitative Results

Figure 2 shows qualitatively that a GAN trained with the frequency-domain constraint (Equation 3) can synthesize ECG signals that look very similar to the real ECG signal. Furthermore, we can observe some aspects of ECG-signal morphology, including the P-wave, the QRS complex and the T-wave, in each heartbeat.

Fig. 2: Example of a synthesized ECG signal from a PPG signal using a GAN trained with the frequency-domain constraint. The PPG signal (top) was used as the input to the generator, which output a synthetic ECG signal (middle). The synthetic ECG signal looks qualitatively similar to the real (paired) ECG signal. Signal amplitudes shown here were obtained after signal preprocessing steps were performed.

3.2 Improved Stability of Model Performance Across Random Seeds

GANs are known to have non-convergence issues [19, 20]. We therefore ascertained the stability of each model’s performance on heart rate estimation across 31 random seeds (i.e., random initializations and dataset shuffles) using the validation set. The distribution of the mean absolute percentage error for both objective functions across the random seeds is shown in Figure 3. We found that across the random seeds, model performance was more variable if a frequency-domain constraint was not incorporated, as can be seen by the heavily right-skewed distribution in Figure 3. Specifically, the standard deviation in model performance across random seeds for the model trained without the frequency-domain constraint was 12% and was 3% for the model trained with the constraint. Furthermore, the average of the mean absolute percentage error across random seeds was smaller for the model trained with the constraint ($p=0.005$).

Fig. 3: Distribution across random seeds of model performance on heart rate estimation during all activities for the two objective functions. Across the entire validation set, the model incorporating the frequency-domain constraint performs better (on average across seeds) than the model without the constraint ($t(60)=3.04$, $p=0.005$) and its performance is less variable across random seeds.
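The 60 degrees of freedom in the reported statistic are consistent with a pooled two-sample t-test over two groups of 31 seeds each ($31 + 31 - 2 = 60$); that the pooled variant was used is our inference, not something the text states. The sketch below computes such a statistic. The input values are arbitrary placeholders drawn from a random generator, not results from this work, and `pooled_t` is a hypothetical helper.

```python
import numpy as np

def pooled_t(a, b):
    """Two-sample Student's t statistic with pooled variance, and its degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / df
    t = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df

# Placeholder per-seed scores (NOT the paper's data): 31 seeds per objective.
rng = np.random.default_rng(0)
placeholder_a = rng.normal(1.0, 2.0, size=31)
placeholder_b = rng.normal(0.0, 0.5, size=31)
t_stat, dof = pooled_t(placeholder_a, placeholder_b)

# Small hand-checkable case: means 2 and 3, both sample variances equal to 1.
t_small, dof_small = pooled_t([1, 2, 3], [2, 3, 4])
```

Given that the two groups have visibly different variances, an unequal-variance (Welch) test would be a reasonable alternative, although it would generally yield non-integer degrees of freedom.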

We subset the validation set into segments that were associated with “more active” activities to investigate how the models performed under activities that have higher heart rates. These activities consisted of going up and down stairs, playing table soccer, cycling, driving, walking and working. The performance distributions evaluated over the “active” activities shifted slightly to the right (compare the positions of the leftmost bars in Figures 3 and 4). Overall, the observations of improved average model performance and reduced performance variance across random seeds also hold when the evaluation is restricted to the subset of activities with higher heart rates.

Fig. 4: Distribution across random seeds of model performance on heart rate estimation during “active” activities for the two objective functions. When evaluated on signal segments during staircase traversal, table soccer, cycling, driving, walking and working, the model incorporating the frequency-domain constraint has better performance (on average across seeds) than the model without the constraint ($t(60)=2.98$, $p=0.005$). Incorporating the constraint also reduces performance variance.

3.3 Reduction in the Number of Heart Rate Estimation Failure Cases

One important use case of the synthetic ECG signals is their incorporation into well-established signal processing pipelines that can detect different morphological properties of ECGs. To assess each model’s suitability for this use case, we computed, for each model (and for each random seed), the total number of validation set samples in which a standard peak-detection algorithm [18, 14] failed to detect heart rate. Across the random seeds, we found that the model trained without the frequency-domain constraint had many more failure cases than the model trained with the constraint, shown by the long right tail of the “Original GAN” distribution in Figure 5 ($p=0.006$). Thus, training models using an additional frequency-domain constraint more readily allows for existing ECG signal processing pipelines to be used.

Fig. 5: Distribution across random seeds of total number of samples where heart rate estimation failed during all activities for the two objective functions. Across the entire validation set (a total of $11\,276$ samples), the model incorporating the frequency-domain constraint results in fewer samples in which heart rate estimation fails (on average across seeds) than the model without the constraint ($t(60)=2.97$, $p=0.006$), while using existing signal processing pipelines.

3.4 Reduction in Average Heart Rate Estimation Error

We next evaluated the best performing model (chosen according to the validation set), for each objective function, on the test set. We also compared the models’ performance with respect to the mean absolute percentage error obtained if the peak-detection algorithm of Elgendi et al. [21, 22] were used directly on the PPG signals, without accelerometer data. We found that the model trained with the frequency-domain constraint outperforms the model trained without the constraint by 2 percentage points when all the activities are considered (leftmost column of Table 1). It also outperforms a strong PPG peak-detection algorithm for heart rate estimation by 3 percentage points.

| | All | Not Active | Active |
|---|---|---|---|
| PPG [21, 22] | 15% | 14% | 17% |
| Original GAN (Equation 1) | 14% | 14% | 14% |
| Original GAN + Frequency-Domain Constraint (Equation 3) | **12%** | **12%** | **12%** |

Table 1: Mean absolute percentage error on the test set (a total of $12\,796$ samples) for different activity subsets using the best model determined by the validation set. For the PPG signals, heart rate was estimated using the peak-detection algorithm of Elgendi et al. [21, 22] and the error was computed with respect to the heart rate estimated from the paired ECG signal.

4 Conclusions

The increasing adoption of health-oriented wearable devices will increase the availability of PPG signals. A subject-agnostic model to generate ECG signals from PPG signals could provide vast amounts of information about wearable users’ cardiovascular health. This information could potentially support the continuous monitoring of existing cardiovascular conditions.

In this work, we have made progress on this problem by providing a basis upon which next-generation, GAN-based models could be developed. First, by converting PPG signals to ECG signals, we obtained more accurate heart rate estimates than a strong PPG-based heart rate estimation algorithm. This was especially prominent when participants were performing activities causing noisier PPG signals.

We also showed that incorporating a frequency-domain constraint during subject-agnostic model training conferred several advantages. It led to improved performance stability (across random seeds) during GAN training, measured by average heart rate estimation error. Moreover, the constraint led to a reduced number of heart rate estimation failure cases, therefore improving the reliability of the detected heart rates so that they can be more readily used with existing ECG signal processing pipelines. Finally, models trained with the constraint had reduced average heart rate estimation error compared to the model trained without the constraint.

References

  • [1] US Preventive Services Task Force, Carol M Mangione, Michael J Barry, Wanda K Nicholson, Michael Cabana, David Chelmow, Tumaini Rucker Coker, Esa M Davis, Katrina E Donahue, Chyke A Doubeni, Carlos Roberto Jaén, Martha Kubik, Li Li, Gbenga Ogedegbe, Lori Pbert, John M Ruiz, James Stevermer, and John B Wong, “Vitamin, mineral, and multivitamin supplementation to prevent cardiovascular disease and cancer: US preventive services task force recommendation statement,” JAMA, vol. 327, no. 23, pp. 2326–2333, June 2022.
  • [2] Zulqarnain Javed, Muhammad Haisum Maqsood, Tamer Yahya, Zahir Amin, Isaac Acquah, Javier Valero-Elizondo, Julia Andrieni, Prachi Dubey, Ryane K Jackson, Mary A Daffin, Miguel Cainzos-Achirica, Adnan A Hyder, and Khurram Nasir, “Race, racism, and cardiovascular health: Applying a social determinants of health framework to racial/ethnic disparities in cardiovascular disease,” Circ. Cardiovasc. Qual. Outcomes, vol. 15, no. 1, pp. e007917, Jan. 2022.
  • [3] Peter H Charlton, Panicos A Kyriacou, Jonathan Mant, Vaidotas Marozas, Phil Chowienczyk, and Jordi Alastruey, “Wearable photoplethysmography for cardiovascular monitoring,” Proc. IEEE Inst. Electr. Electron. Eng., vol. 110, no. 3, pp. 355–381, Mar. 2022.
  • [4] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, vol. 27.
  • [5] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.
  • [6] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
  • [7] Alistair EW Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark, “MIMIC-III, a freely accessible critical care database,” Scientific Data, vol. 3, no. 1, pp. 1–9, 2016.
  • [8] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals,” Circulation, vol. 101, no. 23, pp. e215–e220, 2000.
  • [9] George B Moody and Roger G Mark, “A database to support development and evaluation of intelligent intensive care monitoring,” in Computers in Cardiology. IEEE, 1996, pp. 657–660.
  • [10] Khuong Vo, Emad Kasaeyan Naeini, Amir Naderi, Daniel Jilani, Amir M Rahmani, Nikil Dutt, and Hung Cao, “P2E-WGAN: ECG waveform synthesis from PPG with conditional Wasserstein generative adversarial networks,” in Proceedings of the 36th Annual ACM Symposium on Applied Computing, 2021, pp. 1030–1036.
  • [11] Pritam Sarkar and Ali Etemad, “CardioGAN: Attentive generative adversarial network with dual discriminators for synthesis of ECG from PPG,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2021, vol. 35, pp. 488–496.
  • [12] Qunfeng Tang, Zhencheng Chen, Yanke Guo, Yongbo Liang, Rabab Ward, Carlo Menon, and Mohamed Elgendi, “Robust reconstruction of electrocardiogram using photoplethysmography: A subject-based model,” Frontiers in Physiology, p. 645, 2022.
  • [13] Attila Reiss, Ina Indlekofer, Philip Schmidt, and Kristof Van Laerhoven, “Deep PPG: Large-scale heart rate estimation with convolutional neural networks,” Sensors, vol. 19, no. 14, pp. 3079, 2019.
  • [14] Carlos Carreiras, Ana Priscila Alves, André Lourenço, Filipe Canento, Hugo Silva, Ana Fred, et al., “BioSPPy: Biosignal processing in Python,” 2015–, [Online].
  • [15] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
  • [16] Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y Hammerla, Bernhard Kainz, et al., “Attention U-Net: Learning where to look for the pancreas,” arXiv preprint arXiv:1804.03999, 2018.
  • [17] Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations, 2015.
  • [18] Pat Hamilton, “Open source ECG analysis,” in Computers in Cardiology. IEEE, 2002, pp. 101–104.
  • [19] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen, “Improved techniques for training GANs,” Advances in Neural Information Processing Systems, vol. 29, 2016.
  • [20] Ian Goodfellow, “NIPS 2016 tutorial: Generative adversarial networks,” arXiv preprint arXiv:1701.00160, 2016.
  • [21] Mohamed Elgendi, Ian Norton, Matt Brearley, Derek Abbott, and Dale Schuurmans, “Systolic peak detection in acceleration photoplethysmograms measured from emergency responders in tropical conditions,” PLoS ONE, vol. 8, no. 10, pp. e76585, 2013.
  • [22] Dominique Makowski, Tam Pham, Zen J. Lau, Jan C. Brammer, François Lespinasse, Hung Pham, Christopher Schölzel, and S. H. Annabel Chen, “NeuroKit2: A python toolbox for neurophysiological signal processing,” Behavior Research Methods, vol. 53, no. 4, pp. 1689–1696, 2021.