-
Masking Kernel for Learning Energy-Efficient Representations for Speaker Recognition and Mobile Health
Authors:
Apiwat Ditthapron,
Emmanuel O. Agu,
Adam C. Lammert
Abstract:
Modern smartphones possess the hardware to acquire audio and to perform speech processing tasks such as speaker recognition and health assessment. However, energy consumption remains a concern, especially for resource-intensive DNNs. Prior work has improved DNN energy efficiency by using a compact model or by reducing the dimensions of speech features. Both approaches reduce energy consumption during DNN inference but not during speech acquisition. This paper proposes a masking kernel, integrated into gradient descent during DNN training, that learns the most energy-efficient speech length and sampling rate for windowing, a common step in sample construction. To determine the most energy-efficient parameters, a masking function with non-zero derivatives is combined with a low-pass filter. The proposed approach reduces the energy consumption of both data collection and inference by 57% and is competitive with speaker recognition and traumatic brain injury detection baselines.
Submitted 15 August, 2023; v1 submitted 8 February, 2023;
originally announced February 2023.
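The abstract above mentions a masking function with non-zero derivatives that lets gradient descent adjust the speech length. The following is only a minimal sketch of that general idea, assuming a sigmoid-shaped mask; the function names, the `sharpness` parameter, and the sigmoid form are illustrative assumptions and are not taken from the paper.

```python
import math

def soft_length_mask(num_samples, length, sharpness=0.05):
    """Hypothetical soft mask: near 1 for sample indices below `length`,
    near 0 above it. The sigmoid form is an assumption for illustration."""
    return [1.0 / (1.0 + math.exp(-sharpness * (length - t)))
            for t in range(num_samples)]

def mask_grad_wrt_length(num_samples, length, sharpness=0.05):
    """Derivative of each mask value with respect to `length`.
    For a sigmoid m, d m / d length = sharpness * m * (1 - m), which is
    never exactly zero in theory, so a gradient signal always exists."""
    mask = soft_length_mask(num_samples, length, sharpness)
    return [sharpness * m * (1.0 - m) for m in mask]

# Example: a 1000-sample window with a soft cutoff around sample 400.
mask = soft_length_mask(1000, length=400.0)
grad = mask_grad_wrt_length(1000, length=400.0)
```

A hard 0/1 window would have zero gradient almost everywhere, so the cutoff could not be learned by gradient descent; a smooth mask avoids that problem, which is presumably why the abstract stresses non-zero derivatives.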
-
Improving Heart Rate Estimation on Consumer Grade Wrist-Worn Device Using Post-Calibration Approach
Authors:
Tanut Choksatchawathi,
Puntawat Ponglertnapakorn,
Apiwat Ditthapron,
Pitshaporn Leelaarporn,
Thayakorn Wisutthisen,
Maytus Piriyajitakonkij,
Theerawit Wilaiprasitporn
Abstract:
Advances in wireless health monitoring through direct skin contact have enabled lightweight wrist-worn wearable devices equipped with sensors such as photoplethysmography (PPG) sensors. However, motion artifacts (MA) can occur during daily activities. In this study, we performed a post-calibration of the heart rate (HR) estimation during three states of average daily activity (resting, lying down, and intense treadmill activity) in 29 participants (130 minutes/person) on four popular wearable devices: Fitbit Charge HR, Apple Watch Series 4, TicWatch Pro, and Empatica E4. Compared with the standard measurement (HR$_\text{ECG}$), the HR provided by Fitbit Charge HR (HR$_\text{Fitbit}$) yielded the highest error among the four devices: $3.26 \pm 0.34$ bpm in resting, $2.33 \pm 0.23$ bpm in lying down, $9.53 \pm 1.47$ bpm in intense treadmill activity, and $5.02 \pm 0.64$ bpm in all states combined. With our improved HR estimation model, which uses rolling windows as features (HR$_\text{R}$), the mean absolute error (MAE) was significantly reduced by $33.44\%$ in resting, $15.88\%$ in lying down, $9.55\%$ in intense treadmill activity, and $18.73\%$ in all states combined. This demonstrates the feasibility of the proposed method for providing post-calibrated HR monitoring with high accuracy, raising further awareness of individual fitness in daily use.
Submitted 5 March, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
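The abstract above reports results as mean absolute error (MAE) against an ECG reference and as a percentage reduction in MAE. As a small sketch of how those two quantities are usually computed, the code below uses made-up heart-rate values; the numbers and function names are illustrative assumptions and do not come from the study.

```python
def mean_absolute_error(reference, estimate):
    """Average of |reference - estimate| over paired samples (in bpm here)."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)

def relative_reduction_percent(mae_before, mae_after):
    """Percentage by which the error dropped after calibration."""
    return 100.0 * (mae_before - mae_after) / mae_before

# Made-up example values, not data from the paper.
hr_ecg = [60, 62, 65, 70]         # reference heart rate
hr_device = [63, 60, 69, 67]      # raw device reading
hr_calibrated = [61, 61, 67, 69]  # reading after a hypothetical calibration

mae_before = mean_absolute_error(hr_ecg, hr_device)
mae_after = mean_absolute_error(hr_ecg, hr_calibrated)
reduction = relative_reduction_percent(mae_before, mae_after)
```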
-
Deep Neural Networks with Weighted Averaged Overnight Airflow Features for Sleep Apnea-Hypopnea Severity Classification
Authors:
Payongkit Lakhan,
Apiwat Ditthapron,
Nannapas Banluesombatkul,
Theerawit Wilaiprasitporn
Abstract:
The rapid rise of Deep Learning (DL) and its capability in biomedical applications led us to explore the advantages of using DL for sleep apnea-hypopnea severity classification. To reduce the complexity of clinical diagnosis using Polysomnography (PSG), which is a multi-sensor platform, we apply our proposed DL scheme to a single Airflow (AF) sensing signal (a subset of PSG). Seventeen features are extracted from AF and fed into Deep Neural Networks for classification in two studies. First, we propose binary classification using cutoff indices at AHI = 5, 15, and 30 events/hour. Second, we propose multiclass Sleep Apnea-Hypopnea Syndrome (SAHS) severity classification, which assigns patients to four groups: no SAHS, mild SAHS, moderate SAHS, and severe SAHS. For evaluation, we used more patients than related works to accommodate more diversity: 520 AF records obtained from the MrOS sleep study (Visit 2) database. We then applied 10-fold cross-validation to obtain the accuracy, sensitivity, and specificity. Moreover, we compared the results of our main classifier with two approaches used in previous research: the Support Vector Machine (SVM) and Adaboost-Classification and Regression Trees (AB-CART). In binary classification, the proposed method performs significantly better than the other two approaches, with accuracies of 83.46%, 85.39%, and 92.69% at the respective cutoffs. In multiclass classification, it also achieves the highest accuracy of all approaches, 63.70%.
Submitted 31 August, 2018;
originally announced August 2018.
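The abstract above names AHI cutoffs of 5, 15, and 30 events/hour and four severity groups. The sketch below shows one way such labels could be derived from an AHI value. It assumes the common convention that each cutoff is the inclusive lower bound of the next group; the abstract does not state how boundary values are handled, and the function names are illustrative.

```python
def sahs_severity(ahi):
    """Map an AHI value (events/hour) to one of four severity groups.
    Boundary handling (cutoff belongs to the higher group) is an assumption."""
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "no SAHS"
    if ahi < 15:
        return "mild SAHS"
    if ahi < 30:
        return "moderate SAHS"
    return "severe SAHS"

def binary_label(ahi, cutoff):
    """Binary target for one cutoff: True when AHI is at or above it."""
    return ahi >= cutoff

# One multiclass label and three binary labels for a single example record.
example_ahi = 17.2
severity = sahs_severity(example_ahi)
binary_targets = {c: binary_label(example_ahi, c) for c in (5, 15, 30)}
```

The three binary tasks and the four-group task share the same thresholds, so a single AHI value yields all four training labels.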
-
Universal Joint Feature Extraction for P300 EEG Classification using Multi-task Autoencoder
Authors:
Apiwat Ditthapron,
Nannapas Banluesombatkul,
Sombat Ketrat,
Ekapol Chuangsuwanich,
Theerawit Wilaiprasitporn
Abstract:
Recording Electroencephalography (EEG) signals is onerous and requires massive storage to hold signals at an applicable sampling rate. In this work, we propose the Event-Related Potential Encoder Network (ERPENet), a multi-task autoencoder-based model that can be applied to any ERP-related task. The strength of ERPENet lies in its capability to handle various kinds of ERP datasets and its robustness across multiple recording setups, enabling joint training across datasets. ERPENet incorporates Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) in an autoencoder setup, which simultaneously compresses the input EEG signal and extracts related P300 features into a latent vector. We refer to the process of generating the latent vector as universal joint feature extraction. The network also includes a classification part that classifies attended and unattended events as an auxiliary task. We experimented on six different P300 datasets. The results show that the latent vector exhibits better compression capability than the previous state-of-the-art semi-supervised autoencoder model. For attended and unattended event classification, pre-trained weights are adopted as initial weights and tested on unseen P300 datasets to evaluate the adaptability of the model, which shortens the training process compared with random Xavier weight initialization. At a compression rate of 6.84, the classification accuracy outperforms conventional P300 classification models (XdawnLDA, DeepConvNet, and EEGNet), achieving 79.37% - 88.52% classification accuracy depending on the dataset.
Submitted 30 April, 2019; v1 submitted 30 July, 2018;
originally announced August 2018.
-
Affective EEG-Based Person Identification Using the Deep Learning Approach
Authors:
Theerawit Wilaiprasitporn,
Apiwat Ditthapron,
Karis Matchaparn,
Tanaboon Tongbuasirilai,
Nannapas Banluesombatkul,
Ekapol Chuangsuwanich
Abstract:
Electroencephalography (EEG) is another mode for performing Person Identification (PI). Due to the nature of EEG signals, EEG-based PI is typically done while the person is performing some kind of mental task, such as motor control. However, few works have considered EEG-based PI while the person is in different mental states (affective EEG). The aim of this paper is to improve the performance of affective EEG-based PI using a deep learning approach. We propose a cascade of deep learning models that combines Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). CNNs handle the spatial information from the EEG, while RNNs extract the temporal information. We evaluate two types of RNNs, namely Long Short-Term Memory (CNN-LSTM) and Gated Recurrent Unit (CNN-GRU). The proposed method is evaluated on the state-of-the-art affective dataset DEAP. The results indicate that CNN-GRU and CNN-LSTM can perform PI from different affective states and reach up to 99.90-100% mean Correct Recognition Rate (CRR), significantly outperforming a support vector machine (SVM) baseline system that uses power spectral density (PSD) features. Notably, the 100% mean CRR comes from only 40 subjects in the DEAP dataset. When the number of EEG electrodes is reduced from thirty-two to five for more practical applications, the frontal region gives the best results, reaching up to 99.17% CRR (with CNN-GRU). Of the two deep learning models, we find that CNN-GRU slightly outperforms CNN-LSTM while training faster. Furthermore, CNN-GRU overcomes the influence of affective states in EEG-based PI reported in previous works.
Submitted 29 April, 2019; v1 submitted 5 July, 2018;
originally announced July 2018.