: a frequency-domain-constrained generative adversarial network for PPG to ECG synthesis
Abstract
Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are commonly used to monitor an individual’s cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to motion-induced noise, they have been widely used to continuously monitor cardiovascular health because of their convenience. We would therefore like to combine the ease with which PPGs can be collected with the information that ECGs provide about cardiovascular health by developing models that synthesize ECG signals from paired PPG signals. We tackled this problem using generative adversarial networks (GANs) and found that models trained with the original GAN formulation can synthesize ECG signals from which heart rate can be extracted using standard signal processing pipelines. Incorporating a frequency-domain constraint into model training improved both the stability of model performance across random initializations and heart rate estimation performance.
Index Terms— Generative adversarial networks, electrocardiograms, photoplethysmograms, cardiovascular health
1 Introduction
There is increasing interest in cardiovascular wellness [1, 2], and various non-invasive approaches exist for monitoring cardiovascular health. Clinically, electrocardiograms (ECGs) and fingertip photoplethysmograms (PPGs) are used along with established signal processing pipelines to extract clinically validated traits. However, the equipment necessary for collecting these signals precludes their use for continuous monitoring. The increasing prevalence of wrist-worn devices may alleviate this problem, since these devices readily collect PPGs [3].
There are trade-offs to consider when choosing between ECGs and PPGs. The two modalities differ in availability, signal quality and measurement principle. For example, ECGs are typically less noisy and sampled at higher rates because they are usually collected in stationary settings. In addition, ECGs measure cardiac electrical activity, whereas PPGs use optical sensors to measure blood flow in the periphery. It is therefore of interest to combine the richness of ECGs with the availability of PPGs.
We work towards this goal by synthesizing ECG signals directly from paired PPG signals, with two desiderata. First, the model should synthesize ECG signals from PPG signals obtained from wearable devices during daily activities, rather than from PPG signals obtained using fingertip pulse oximeters, because wearable devices more readily allow continuous monitoring of biometric signals during typical daily activities. Second, the PPG-to-ECG synthesis model should generalize across subjects so that it can be used off-the-shelf.
Here, we used the framework of generative adversarial networks (GANs) [4] to learn the mapping from PPG signals to ECG signals. We did not pose this as a supervised translation problem (e.g., Euclidean distance minimization) because GANs not only learn the mapping function from PPG to ECG but also remove the need to hand-engineer an objective function for synthesizing realistic data samples [5]. Our use of GANs is further motivated by their successful application to image-to-image translation problems [5, 6].
A few related works have used GANs to synthesize ECG signals from PPG signals. Using the MIMIC dataset [7, 8, 9], it has been shown that ECG signals can be generated with high fidelity from PPG signals obtained via fingertip pulse oximetry [10]. Other work demonstrated successful PPG-to-ECG synthesis with a GAN that incorporated both time-domain and frequency-domain inputs along with constraints on backward translation (i.e., ECG-to-PPG) [11]. That model was sophisticated, but it was unclear which components contributed most to its performance and how stable it was across random initializations. Finally, non-GAN models have also been shown to reconstruct ECGs from PPGs, but they are subject-specific and built using PPG signals obtained from fingertip pulse oximetry [12].
Our work focuses on building GAN-based models to synthesize ECGs from PPGs, starting from the most basic GAN formulation, and on investigating the benefits that individual model components confer. We found that a model trained with the original GAN formulation synthesized ECG signals that can be used for heart rate estimation, serving as a good baseline. Furthermore, adding a frequency-domain constraint during model training made performance more stable across random seeds and improved heart rate estimation.
2 Methodology
2.1 Dataset and Signal Preprocessing
We used the PPG-DaLiA dataset collected by Reiss et al. [13]. It consists of synchronized PPG and ECG signals recorded from participants over approximately two hours while they performed a wide variety of natural activities, including sitting (at rest), cycling, working, walking and playing table soccer. PPG signals were collected from a wrist-worn device at Hz, while ECG signals were simultaneously collected at Hz using a chest-worn device.
We next performed basic signal preprocessing on each participant’s data. The PPG and ECG signals were first resampled to Hz. They were then segmented into overlapping four-second windows (i.e., time points per data sample), with a two-second overlap between adjacent windows. After segmentation, each signal was bandpass filtered using the Python package BioSPPy [14]. Each PPG segment was filtered with a fourth-order Chebyshev Type II filter with passband frequencies of Hz and Hz, and each ECG segment was filtered with a finite impulse response (FIR) filter with passband frequencies of Hz and Hz. Finally, both signal segments were min-max scaled to . These preprocessing steps are similar to those used in prior work [11].
Prior to model training and evaluation, the data were split by participant into train, validation and test sets. Nine randomly chosen participants were assigned to the train set ( segments), three to the validation set ( segments) and the remaining three to the test set ( segments). Because the three sets contain data from disjoint subjects, evaluation measures how well the models generalize to unseen subjects.
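The windowing and scaling steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the sampling rate `fs` and the scaling range `(lo, hi)` are parameters because their values are not restated here, the defaults `lo=0.0, hi=1.0` are a hypothetical choice, the helper names are hypothetical, and the BioSPPy bandpass filtering step is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segment(signal, fs, window_s=4.0, step_s=2.0):
    """Split a 1D signal into overlapping windows.

    A 4 s window advanced by 2 s yields the 2 s overlap between adjacent
    segments. `fs` is the (resampled) sampling rate in Hz. Raises ValueError
    if the signal is shorter than one window.
    """
    win = int(round(window_s * fs))
    step = int(round(step_s * fs))
    windows = sliding_window_view(np.asarray(signal, dtype=float), win)
    return windows[::step].copy()  # shape: (n_segments, win)


def min_max_scale(x, lo=0.0, hi=1.0):
    """Linearly rescale one segment to [lo, hi] (the range is an assumption)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:  # flat segment: avoid division by zero
        return np.full_like(x, lo)
    return lo + (x - x.min()) * (hi - lo) / span


# Hypothetical example: 10 s of a toy signal at 10 Hz gives four 4 s windows.
segments = segment(np.arange(100), fs=10)
scaled = np.stack([min_max_scale(s) for s in segments])
print(segments.shape)  # (4, 40)
```

Scaling is applied per segment, so each window uses its own minimum and maximum rather than statistics from the whole recording.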
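Splitting by participant rather than by segment is what keeps the three sets subject-disjoint. A minimal sketch, assuming hypothetical subject labels `S1` to `S15` (9 + 3 + 3 = 15 participants) and a hypothetical helper `split_subjects`; every segment of a participant then inherits that participant's split.

```python
import random


def split_subjects(subject_ids, n_train=9, n_val=3, n_test=3, seed=0):
    """Randomly partition subject IDs into disjoint train/val/test groups."""
    ids = sorted(subject_ids)  # sort first so the split depends only on the seed
    if len(ids) < n_train + n_val + n_test:
        raise ValueError("not enough subjects for the requested split")
    rng = random.Random(seed)
    rng.shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


# Hypothetical subject labels S1..S15; segments are then assigned by subject.
subjects = [f"S{i}" for i in range(1, 16)]
train_ids, val_ids, test_ids = split_subjects(subjects, seed=42)
```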
2.2 Generative Adversarial Networks and Objective Functions
At a high level, GANs are based on a two-player game in which both players can be deep neural networks. One player is the “generator” (G), whose goal is to produce synthetic data (e.g., ECG signals) that are as indistinguishable as possible from real data. The other player is the “discriminator” (D), whose goal is to determine whether its input is real or synthetic (i.e., fake). The generator and the discriminator thus have opposing objectives and are trained adversarially, ideally until they reach an equilibrium.
We experimented with two different objective functions when training a generator to synthesize ECG signals from PPG signals. The first objective function was the original adversarial loss formulation [4], defined as follows:
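$$\min_{\theta_G}\max_{\theta_D}\ \mathcal{L}_{\mathrm{GAN}}(\theta_G, \theta_D) = \mathbb{E}_{y \sim \mathcal{Y}}\left[\log D(y)\right] + \mathbb{E}_{x \sim \mathcal{X}}\left[\log\left(1 - D(G(x))\right)\right] \tag{1}$$
where $\theta_G$ and $\theta_D$ are the parameters of the generator and the discriminator respectively, $\mathcal{X}$ is the set of PPG signals, $\mathcal{Y}$ is the set of ECG signals, $x$ is a PPG signal and $y$ is a real ECG signal. Since the objective of D is to distinguish between synthetic and real ECG signals, D aims to maximize $\mathcal{L}_{\mathrm{GAN}}$. Conversely, G aims to synthesize ECG signals realistic enough to fool D, and thus aims to minimize $\mathcal{L}_{\mathrm{GAN}}$.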
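Given discriminator outputs interpreted as probabilities, the two sides of Equation 1 can be written as losses to minimize. This NumPy sketch is illustrative only (the models themselves were implemented in PyTorch); it assumes D ends in a sigmoid, uses hypothetical helper names, and includes, as an option, the non-saturating generator loss suggested by Goodfellow et al. [4], which the text does not say was used.

```python
import numpy as np

EPS = 1e-12  # keeps log() finite when probabilities hit 0 or 1


def discriminator_loss(d_real, d_fake):
    """Negated Equation 1: minimizing this maximizes log D(y) + log(1 - D(G(x)))."""
    d_real = np.clip(d_real, EPS, 1.0 - EPS)
    d_fake = np.clip(d_fake, EPS, 1.0 - EPS)
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))


def generator_loss(d_fake, non_saturating=False):
    """Generator side of Equation 1 (minimax form by default)."""
    d_fake = np.clip(d_fake, EPS, 1.0 - EPS)
    if non_saturating:
        # Alternative from [4]: maximize log D(G(x)) for stronger early gradients.
        return -np.mean(np.log(d_fake))
    return np.mean(np.log(1.0 - d_fake))


# When D outputs 0.5 everywhere, the discriminator loss equals 2 * ln 2.
print(discriminator_loss(np.array([0.5]), np.array([0.5])))
```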
The second objective function builds on Equation 1 by incorporating a constraint in the frequency domain, because time-domain constraints may not cope well with the differing latencies between PPG peaks and ECG peaks across participants, whereas an amplitude spectrum, which discards phase, is largely insensitive to such timing offsets. The constraint is also motivated by the desire to encourage greater morphological similarity (e.g., in the number of R-peaks, P-waves and T-waves) between the synthetic and the real ECG signals. Concretely, the frequency-domain constraint is defined as follows:
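$$\mathcal{L}_{\mathrm{freq}}(\theta_G) = \mathbb{E}_{(x, y) \sim (\mathcal{X}, \mathcal{Y})}\left[\left\lVert F(y) - F(G(x)) \right\rVert\right] \tag{2}$$
where $F(\cdot)$ denotes the amplitude of each frequency component of its input, excluding the 0 Hz (i.e., DC) component, obtained via a fast Fourier transform. The final objective incorporating the frequency-domain constraint is: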
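$$\min_{\theta_G}\max_{\theta_D}\ \mathcal{L}_{\mathrm{GAN}}(\theta_G, \theta_D) + \lambda\,\mathcal{L}_{\mathrm{freq}}(\theta_G) \tag{3}$$
where $\lambda$ is the coefficient that weights the relative importance of the frequency-domain constraint. This combined objective therefore encourages the generator to fool the discriminator while remaining close to the real data in the frequency domain, as measured by the norm in Equation 2.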
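The frequency-domain term and the combined generator objective can be sketched with NumPy's real FFT. Two choices here are assumptions rather than details from the text: the distance between amplitude spectra is taken as a mean absolute difference (an L1-style norm), and any value passed as `lam` is hypothetical. The helper names are also hypothetical.

```python
import numpy as np


def amplitude_spectrum(x):
    """FFT amplitudes along the last axis, with the 0 Hz (DC) bin removed."""
    return np.abs(np.fft.rfft(x, axis=-1))[..., 1:]


def frequency_loss(y_real, y_fake):
    """Frequency-domain term: mean absolute difference of amplitude spectra."""
    return np.mean(np.abs(amplitude_spectrum(y_real) - amplitude_spectrum(y_fake)))


def generator_objective(adv_loss, y_real, y_fake, lam):
    """Adversarial loss plus the weighted frequency-domain constraint (Equation 3)."""
    return adv_loss + lam * frequency_loss(y_real, y_fake)


t = np.arange(256) / 256.0
y = np.sin(2 * np.pi * 8 * t)  # toy "real" signal: 8 cycles per window
shifted = np.roll(y, 20)       # same waveform, circularly delayed
offset = y + 0.3               # same waveform plus a DC offset
print(frequency_loss(y, shifted), frequency_loss(y, offset))  # both ~0
```

Because the amplitude spectrum discards phase, a circular time shift leaves this loss essentially unchanged, and dropping the DC bin makes it ignore constant offsets; a time-domain L1 loss would penalize both.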
2.3 Model Architecture
The generator architecture was a U-Net with skip connections [15, 5, 11]. In addition to the U-Net structure, we applied attention gates at the output of each skip connection so that the model learns to emphasize task-relevant input features at various resolutions, as in prior work [16, 11]. The encoding portion of the generator consisted of six 1D convolutional layers with an increasing number of output filters: . The first three convolutional layers had a stride of two and the last three had a stride of one. The decoding portion of the generator essentially mirrored the encoder, gradually increasing the temporal length by sequentially applying upsampling and convolutional operations. All convolutional filters had a kernel size of . A high-level schematic of the generator architecture is shown on the left of Figure 1.
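Stacking three stride-2 layers downsamples the sequence by a factor of eight, and the decoder has to undo exactly that for the skip connections to line up. The hypothetical helpers below track sequence lengths using the output-length formula documented for `torch.nn.Conv1d`; the 512-sample input, the kernel size of 5, the padding of `kernel // 2` and the factor-of-two upsampling are hypothetical choices, since those values are not given here.

```python
def conv1d_out_len(l_in, kernel, stride, padding, dilation=1):
    """Output length of a 1D convolution (formula from the torch.nn.Conv1d docs)."""
    return (l_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def encoder_lengths(l_in, kernel, strides=(2, 2, 2, 1, 1, 1)):
    """Sequence length after each of the six encoder layers."""
    lengths = [l_in]
    for s in strides:
        lengths.append(conv1d_out_len(lengths[-1], kernel, s, padding=kernel // 2))
    return lengths


def decoder_lengths(l_bottleneck, n_upsample=3, scale=2):
    """Lengths after each upsampling step; stride-1 convs keep the length."""
    lengths = [l_bottleneck]
    for _ in range(n_upsample):
        lengths.append(lengths[-1] * scale)
    return lengths


enc = encoder_lengths(512, kernel=5)
dec = decoder_lengths(enc[-1])
print(enc)  # [512, 256, 128, 64, 64, 64, 64]
print(dec)  # [64, 128, 256, 512]
```

With a 500-sample input, the encoder would instead reach 63 samples and the decoder would return 504, so the skip connections would no longer match; input lengths divisible by eight avoid this.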
![Figure 1: High-level schematic of the generator (left) and discriminator (right) architectures.](x1.png)
The discriminator architecture was a six-layer 1D convolutional neural network. The first five convolutional layers had a number of output filters that increased with depth: . A sixth 1D convolutional layer was applied to the output of these five layers to reduce the number of output channels from to one. Finally, this output was reduced to a scalar via average pooling; the scalar is used to determine the probability that the input is real or fake. All convolutional filters had a kernel size of with a stride of one. A high-level schematic of the discriminator architecture is shown on the right of Figure 1.
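The last two steps of the discriminator, a convolution that collapses the channels to one followed by average pooling to a scalar, can be sketched in NumPy. This is an illustration under assumptions: the helper names, the kernel size of 3, the channel count and the random weights are hypothetical, and mapping the scalar to a probability with a sigmoid is an assumption about how the output is interpreted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv1d_same(x, w):
    """Stride-1 1D convolution (cross-correlation, as in PyTorch) with zero padding.

    x: (C_in, L) feature map; w: (C_out, C_in, K) with odd K. Returns (C_out, L).
    """
    k = w.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    windows = sliding_window_view(xp, k, axis=1)  # (C_in, L, K)
    return np.einsum("clk,ock->ol", windows, w)


def discriminator_head(features, w):
    """Reduce (C, L) features to one channel, average-pool, then squash to (0, 1)."""
    logits = conv1d_same(features, w)  # (1, L)
    score = logits.mean()              # global average pooling to a scalar
    return 1.0 / (1.0 + np.exp(-score))


rng = np.random.default_rng(0)
features = rng.standard_normal((64, 128))   # hypothetical last-layer output
w = rng.standard_normal((1, 64, 3)) * 0.01  # hypothetical 64 -> 1 channel kernel
p_real = discriminator_head(features, w)
```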
2.4 Model Training
The objective functions in Equations 1 and 3 were optimized using the Adam optimizer [17] with a learning rate of for the discriminator, a learning rate of for the generator and a batch size of . The Adam beta coefficients were set to and . The discriminator was updated five times less frequently than the generator (i.e., the discriminator was updated every fifth training iteration, whereas the generator was updated every iteration). To optimize Equation 1, the models were trained for epochs, with the learning rates held constant for four epochs and then linearly decayed to zero. To optimize Equation 3, $\lambda$ was set to and the models were trained for epochs, with the learning rates held constant for five epochs and then linearly decayed to zero. Finally, the optimization of each objective function was repeated times, each with a different random seed. All models and optimization procedures were implemented in PyTorch.
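The update schedule described above can be sketched in plain Python. The function names are hypothetical, `n_total` stands in for the epoch count that is not restated here, and the exact form of the linear decay (reaching zero at epoch `n_total`) is an assumption.

```python
def lr_at_epoch(base_lr, epoch, n_constant, n_total):
    """Constant for `n_constant` epochs, then linear decay reaching 0 at `n_total`."""
    if epoch < n_constant:
        return base_lr
    decay_epochs = max(1, n_total - n_constant)
    return base_lr * max(0.0, 1.0 - (epoch - n_constant) / decay_epochs)


def update_discriminator(iteration, period=5):
    """True on every `period`-th iteration; the generator updates every iteration."""
    return iteration % period == 0


# Hypothetical run: 8 epochs with the Equation 1 schedule (4 constant epochs).
print([lr_at_epoch(1.0, e, n_constant=4, n_total=8) for e in range(8)])
# [1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25]
```

In a training loop, `update_discriminator(i)` would gate the discriminator's optimizer step at iteration `i`, while the generator step runs unconditionally.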
2.5 Model Evaluation
We evaluated each synthetic ECG signal by how well the heart rate estimated from it agreed with the heart rate estimated from the corresponding real ECG signal, since this is more interpretable than other metrics such as root mean-squared error. Because heart rate can be extracted more reliably from a -second ECG segment, we first segmented the signals from the validation and test sets into -second paired PPG and ECG segments (with eight-second overlaps between consecutive samples). We then generated a -second synthetic ECG signal for each -second PPG signal. A popular peak detection algorithm [18, 14] was applied to extract heart rate from both the synthetic and the real ECG signals, as in prior work [13, 11]. We then computed the absolute difference between the two estimated heart rates, normalized by the heart rate of the real ECG signal. Concretely, the mean absolute percentage error (denoted as $\mathrm{MAPE}$) across the dataset is defined as follows:
$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \frac{\left| \mathrm{HR}(y_i) - \mathrm{HR}(\hat{y}_i) \right|}{\mathrm{HR}(y_i)} \tag{4}$$
where $N$ is the number of (PPG or ECG) signal segments in either the validation or the test set, $y_i$ is the $i$th real ECG signal, $\hat{y}_i$ is the $i$th synthetic ECG signal and $\mathrm{HR}(\cdot)$ is the peak detection algorithm used to estimate heart rate from ECG signals [18].
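Equation 4 can be sketched as follows, assuming R-peak sample indices have already been obtained from a detector such as Hamilton's [18]. Estimating heart rate as 60 divided by the mean RR interval is a simplifying assumption for illustration; the BioSPPy pipeline may compute heart rate differently. The function names are hypothetical.

```python
import numpy as np


def heart_rate_bpm(r_peaks, fs):
    """Mean heart rate in beats/min from R-peak indices, or None if undefined."""
    r_peaks = np.asarray(r_peaks)
    if r_peaks.size < 2:
        return None  # fewer than two beats: no RR interval to measure
    rr_seconds = np.diff(r_peaks) / fs
    return 60.0 / rr_seconds.mean()


def mape(hr_real, hr_synth):
    """Equation 4: mean of |HR(y_i) - HR(y_hat_i)| / HR(y_i), in percent."""
    hr_real = np.asarray(hr_real, dtype=float)
    hr_synth = np.asarray(hr_synth, dtype=float)
    return 100.0 * np.mean(np.abs(hr_real - hr_synth) / hr_real)


hr = heart_rate_bpm([0, 100, 200, 300], fs=100)  # 1 s RR intervals
print(hr)  # 60.0
print(mape([60.0, 80.0], [66.0, 80.0]))  # ~5.0
```

Segments where the detector yields `None` have to be handled (for example, counted as failures) before averaging.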
3 Results
3.1 Qualitative Results
Figure 2 shows qualitatively that a GAN trained with the frequency-domain constraint (Equation 3) can synthesize ECG signals that closely resemble the real ECG signal. Furthermore, characteristic aspects of ECG morphology, including the P-wave, the QRS complex and the T-wave, are visible in each heartbeat.
![Figure 2: Example real and synthetic ECG signals, the latter generated by a GAN trained with the frequency-domain constraint (Equation 3).](x2.png)
3.2 Improved Stability of Model Performance Across Random Seeds
GANs are known to suffer from non-convergence issues [19, 20]. We therefore assessed the stability of each model's heart rate estimation performance across random seeds (i.e., random initializations and dataset shuffles) using the validation set. Figure 3 shows the distribution of the mean absolute percentage error across random seeds for both objective functions. Model performance across random seeds was more variable without the frequency-domain constraint, as shown by the heavily right-skewed distribution in Figure 3. Specifically, the standard deviation of model performance across random seeds was for the model trained without the frequency-domain constraint and for the model trained with it. Furthermore, the mean absolute percentage error averaged across random seeds was smaller for the model trained with the constraint ().
![Figure 3: Distribution of validation-set mean absolute percentage error across random seeds for models trained with Equation 1 and Equation 3.](x3.png)
To investigate how the models performed during activities associated with higher heart rates, we restricted the validation set to segments from “more active” activities: going up and down stairs, playing table soccer, cycling, driving, walking and working. The performance distributions evaluated over these “active” activities shifted slightly to the right (compare the positions of the leftmost bars in Figures 3 and 4). Overall, the improved average performance and reduced performance variance across random seeds also hold when evaluation is restricted to activities with higher heart rates.
![Figure 4: Distribution of validation-set mean absolute percentage error across random seeds, restricted to the “active” activities.](x4.png)
3.3 Reduction in the Number of Heart Rate Estimation Failure Cases
An important use case of synthetic ECG signals is their compatibility with well-established signal processing pipelines that detect morphological properties of ECGs. To assess each model's suitability for this use case, we computed, for each model and each random seed, the total number of validation-set samples for which a standard peak-detection algorithm [18, 14] failed to detect heart rate. Across random seeds, the model trained without the frequency-domain constraint had many more failure cases than the model trained with the constraint, as shown by the long right tail of the “Original GAN” distribution in Figure 5 (). Thus, training with an additional frequency-domain constraint makes it easier to apply existing ECG signal processing pipelines to the synthetic signals.
![Figure 5: Distribution, across random seeds, of the number of validation-set samples for which heart rate detection failed, for the “Original GAN” and the model trained with the frequency-domain constraint.](x5.png)
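Counting failure cases per seed can be sketched as below. What counts as a failure is an assumption here: a segment is treated as failed when the detector returns no result or fewer than two peaks, so that no heart rate can be computed. The helper name and the toy detector outputs are hypothetical.

```python
def count_failures(peak_lists, min_peaks=2):
    """Number of segments whose detected peaks cannot define a heart rate."""
    return sum(1 for peaks in peak_lists if peaks is None or len(peaks) < min_peaks)


# Hypothetical detector outputs per seed: lists of R-peak indices per segment.
seed_outputs = {
    0: [[10, 110, 210], [5], None, [0, 90]],
    1: [[12, 115, 220], [3, 98], [7, 105], [1, 99]],
}
failures_per_seed = {seed: count_failures(out) for seed, out in seed_outputs.items()}
print(failures_per_seed)  # {0: 2, 1: 0}
```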
3.4 Reduction in Average Heart Rate Estimation Error
We next evaluated the best-performing model for each objective function (selected on the validation set) on the test set. We also compared against the mean absolute percentage error obtained by applying the peak-detection algorithm of Elgendi et al. [21, 22] directly to the PPG signals, without accelerometer data. The model trained with the frequency-domain constraint outperformed the model trained without it by when all activities are considered (leftmost column of Table 1). It also outperformed this strong PPG peak-detection baseline for heart rate estimation by .

Table 1: Test-set mean absolute percentage error of heart rate estimation over all, “not active” and “active” activities.

| Method | All | Not Active | Active |
|---|---|---|---|
| PPG [21, 22] | | | |
| Original GAN (Equation 1) | | | |
| GAN with frequency-domain constraint (Equation 3) | | | |
4 Conclusions
The increasing adoption of health-oriented wearable devices will increase the availability of PPG signals. A subject-agnostic model that generates ECG signals from PPG signals could therefore provide a wealth of information about wearable users’ cardiovascular health, which could potentially support continuous monitoring of existing cardiovascular conditions.
In this work, we have made progress on this problem by providing a basis on which next-generation GAN-based models could be developed. First, by converting PPG signals to ECG signals, we obtained more accurate heart rate estimates than a strong PPG peak-detection algorithm [21, 22]. This improvement was especially pronounced when participants performed activities that produce noisier PPG signals.
We also showed that incorporating a frequency-domain constraint during subject-agnostic model training confers several advantages. It improved the stability of heart rate estimation performance across random seeds during GAN training. Moreover, it reduced the number of heart rate estimation failure cases, making the synthetic ECG signals more readily usable with existing ECG signal processing pipelines. Finally, models trained with the constraint had lower average heart rate estimation error than the model trained without it.
References
- [1] US Preventive Services Task Force, Carol M Mangione, Michael J Barry, Wanda K Nicholson, Michael Cabana, David Chelmow, Tumaini Rucker Coker, Esa M Davis, Katrina E Donahue, Chyke A Doubeni, Carlos Roberto Jaén, Martha Kubik, Li Li, Gbenga Ogedegbe, Lori Pbert, John M Ruiz, James Stevermer, and John B Wong, “Vitamin, mineral, and multivitamin supplementation to prevent cardiovascular disease and cancer: US preventive services task force recommendation statement,” JAMA, vol. 327, no. 23, pp. 2326–2333, June 2022.
- [2] Zulqarnain Javed, Muhammad Haisum Maqsood, Tamer Yahya, Zahir Amin, Isaac Acquah, Javier Valero-Elizondo, Julia Andrieni, Prachi Dubey, Ryane K Jackson, Mary A Daffin, Miguel Cainzos-Achirica, Adnan A Hyder, and Khurram Nasir, “Race, racism, and cardiovascular health: Applying a social determinants of health framework to racial/ethnic disparities in cardiovascular disease,” Circ. Cardiovasc. Qual. Outcomes, vol. 15, no. 1, pp. e007917, Jan. 2022.
- [3] Peter H Charlton, Panicos A Kyriacou, Jonathan Mant, Vaidotas Marozas, Phil Chowienczyk, and Jordi Alastruey, “Wearable photoplethysmography for cardiovascular monitoring,” Proc. IEEE Inst. Electr. Electron. Eng., vol. 110, no. 3, pp. 355–381, Mar. 2022.
- [4] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, vol. 27.
- [5] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.
- [6] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
- [7] Alistair EW Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark, “MIMIC-III, a freely accessible critical care database,” Scientific Data, vol. 3, no. 1, pp. 1–9, 2016.
- [8] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals,” Circulation, vol. 101, no. 23, pp. e215–e220, 2000.
- [9] George B Moody and Roger G Mark, “A database to support development and evaluation of intelligent intensive care monitoring,” in Computers in Cardiology. IEEE, 1996, pp. 657–660.
- [10] Khuong Vo, Emad Kasaeyan Naeini, Amir Naderi, Daniel Jilani, Amir M Rahmani, Nikil Dutt, and Hung Cao, “P2E-WGAN: ECG waveform synthesis from PPG with conditional Wasserstein generative adversarial networks,” in Proceedings of the 36th Annual ACM Symposium on Applied Computing, 2021, pp. 1030–1036.
- [11] Pritam Sarkar and Ali Etemad, “CardioGAN: Attentive generative adversarial network with dual discriminators for synthesis of ECG from PPG,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2021, vol. 35, pp. 488–496.
- [12] Qunfeng Tang, Zhencheng Chen, Yanke Guo, Yongbo Liang, Rabab Ward, Carlo Menon, and Mohamed Elgendi, “Robust reconstruction of electrocardiogram using photoplethysmography: A subject-based model,” Frontiers in Physiology, p. 645, 2022.
- [13] Attila Reiss, Ina Indlekofer, Philip Schmidt, and Kristof Van Laerhoven, “Deep PPG: Large-scale heart rate estimation with convolutional neural networks,” Sensors, vol. 19, no. 14, pp. 3079, 2019.
- [14] Carlos Carreiras, Ana Priscila Alves, André Lourenço, Filipe Canento, Hugo Silva, Ana Fred, et al., “BioSPPy: Biosignal processing in Python,” 2015–, [Online].
- [15] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
- [16] Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y Hammerla, Bernhard Kainz, et al., “Attention U-Net: Learning where to look for the pancreas,” arXiv preprint arXiv:1804.03999, 2018.
- [17] Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations, 2015.
- [18] Pat Hamilton, “Open source ECG analysis,” in Computers in Cardiology. IEEE, 2002, pp. 101–104.
- [19] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen, “Improved techniques for training GANs,” Advances in Neural Information Processing Systems, vol. 29, 2016.
- [20] Ian Goodfellow, “NIPS 2016 tutorial: Generative adversarial networks,” arXiv preprint arXiv:1701.00160, 2016.
- [21] Mohamed Elgendi, Ian Norton, Matt Brearley, Derek Abbott, and Dale Schuurmans, “Systolic peak detection in acceleration photoplethysmograms measured from emergency responders in tropical conditions,” PLoS ONE, vol. 8, no. 10, pp. e76585, 2013.
- [22] Dominique Makowski, Tam Pham, Zen J. Lau, Jan C. Brammer, François Lespinasse, Hung Pham, Christopher Schölzel, and S. H. Annabel Chen, “NeuroKit2: A python toolbox for neurophysiological signal processing,” Behavior Research Methods, vol. 53, no. 4, pp. 1689–1696, 2021.