Search | arXiv e-print repository

doi 10.21437/Interspeech.2023-1026

Masking Kernel for Learning Energy-Efficient Representations for Speaker Recognition and Mobile Health

Authors: Apiwat Ditthapron, Emmanuel O. Agu, Adam C. Lammert

Abstract: Modern smartphones have hardware for acquiring audio and performing speech-processing tasks such as speaker recognition and health assessment. However, energy consumption remains a concern, especially for resource-intensive DNNs. Prior work has improved DNN energy efficiency by using a compact model or by reducing the dimensionality of speech features. Both approaches reduced energy consumption during DNN inference but not during speech acquisition. This paper proposes integrating a masking kernel into gradient descent during DNN training to learn the most energy-efficient speech length and sampling rate for windowing, a common step in sample construction. To determine the most energy-optimal parameters, a masking function with non-zero derivatives was combined with a low-pass filter. The proposed approach reduces the energy consumption of both data collection and inference by 57% and is competitive with speaker recognition and traumatic brain injury detection baselines.
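The abstract does not give the exact masking function. As a minimal sketch, assuming a sigmoid-shaped mask over sample indices (a stand-in, not necessarily the authors' kernel), the code below shows why a mask with non-zero derivatives lets gradient descent move the window length: a hard cutoff has zero gradient almost everywhere, while the sigmoid's gradient peaks at the cutoff. The names `soft_length_mask`, `mask_grad_wrt_length`, the `sharpness` parameter, and the 16 kHz example rate are hypothetical; the low-pass filter used for the sampling rate is not modeled.

```python
import numpy as np


def soft_length_mask(num_samples: int, length: float, sharpness: float = 0.05) -> np.ndarray:
    """Hypothetical sigmoid mask: close to 1 for sample index t < length, close to 0 after.

    m(t) = 1 / (1 + exp(-sharpness * (length - t)))
    A hard cutoff has zero derivative almost everywhere; this soft version does not,
    so `length` can be updated by gradient descent.
    """
    t = np.arange(num_samples, dtype=float)
    return 1.0 / (1.0 + np.exp(-sharpness * (length - t)))


def mask_grad_wrt_length(num_samples: int, length: float, sharpness: float = 0.05) -> np.ndarray:
    """Analytic derivative dm/d(length) = sharpness * m * (1 - m); largest at t == length."""
    m = soft_length_mask(num_samples, length, sharpness)
    return sharpness * m * (1.0 - m)


# Usage (hypothetical): mask one second of 16 kHz audio with a learnable cutoff at 0.5 s.
signal = np.random.default_rng(0).standard_normal(16000)
windowed = signal * soft_length_mask(16000, length=8000.0)
```

Because the gradient is concentrated around the current cutoff, a training loss that also penalizes energy (for example, proportional to `length`) could push the cutoff earlier as long as recognition accuracy holds.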

Submitted 15 August, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

Journal ref: Proc. INTERSPEECH 2023, 2843-2847

arXiv:1910.07881

doi 10.1109/JSEN.2020.2979191

Improving Heart Rate Estimation on Consumer Grade Wrist-Worn Device Using Post-Calibration Approach

Authors: Tanut Choksatchawathi, Puntawat Ponglertnapakorn, Apiwat Ditthapron, Pitshaporn Leelaarporn, Thayakorn Wisutthisen, Maytus Piriyajitakonkij, Theerawit Wilaiprasitporn

Abstract: Advances in wireless health monitoring through direct skin contact have enabled lightweight wrist-worn wearable devices equipped with sensors such as photoplethysmography (PPG) sensors. However, motion artifacts (MA) can occur during daily activities. In this study, we performed a post-calibration of heart rate (HR) estimation during three common daily-activity states (resting, lying down, and intense treadmill activity) in 29 participants (130 minutes/person) on four popular wearable devices: Fitbit Charge HR, Apple Watch Series 4, TicWatch Pro, and Empatica E4. Compared with the reference measurement (HR_ECG), the HR provided by the Fitbit Charge HR (HR_Fitbit) yielded the highest error among the four devices: 3.26 ± 0.34 bpm at rest, 2.33 ± 0.23 bpm lying down, 9.53 ± 1.47 bpm during intense treadmill activity, and 5.02 ± 0.64 bpm across all states combined. After applying our improved HR estimation model with rolling windows as features (HR_R), the mean absolute error (MAE) was significantly reduced by 33.44% at rest, 15.88% lying down, 9.55% during intense treadmill activity, and 18.73% across all states combined.
This demonstrates the feasibility of the proposed method for correcting HR estimates through post-calibration with high accuracy, raising further awareness of individual fitness in daily applications.
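The abstract does not specify the calibration model, the window length, or the exact feature set. As a hedged sketch, assuming a linear least-squares regression on the raw device HR plus its trailing rolling mean (a hypothetical feature set; `window=5` is arbitrary), post-calibration against an ECG reference could look like this. All function names are illustrative.

```python
import numpy as np


def rolling_mean(x, window: int) -> np.ndarray:
    """Trailing rolling mean; the first window-1 points average over the samples available so far."""
    x = np.asarray(x, dtype=float)
    return np.array([x[max(0, i - window + 1): i + 1].mean() for i in range(len(x))])


def rolling_features(hr_device, window: int) -> np.ndarray:
    """Design matrix: raw device HR, its rolling mean, and an intercept column."""
    hr_device = np.asarray(hr_device, dtype=float)
    return np.column_stack([hr_device, rolling_mean(hr_device, window), np.ones_like(hr_device)])


def fit_calibration(hr_device, hr_ecg, window: int = 5) -> np.ndarray:
    """Least-squares coefficients mapping device features to the ECG reference HR."""
    X = rolling_features(hr_device, window)
    coef, *_ = np.linalg.lstsq(X, np.asarray(hr_ecg, dtype=float), rcond=None)
    return coef


def apply_calibration(hr_device, coef: np.ndarray, window: int = 5) -> np.ndarray:
    """Calibrated HR estimate (HR_R in the abstract's notation, under this sketch's assumptions)."""
    return rolling_features(hr_device, window) @ coef


def mae(a, b) -> float:
    """Mean absolute error in bpm."""
    return float(np.mean(np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))))
```

In practice the coefficients would be fitted on training participants and evaluated on held-out ones, and MAE would be compared per activity state as in the abstract.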

Submitted 5 March, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

Journal ref: IEEE Sensors Journal, 2020

arXiv:1808.10845

doi 10.1109/TENCON.2018.8650491

Deep Neural Networks with Weighted Averaged Overnight Airflow Features for Sleep Apnea-Hypopnea Severity Classification

Authors: Payongkit Lakhan, Apiwat Ditthapron, Nannapas Banluesombatkul, Theerawit Wilaiprasitporn

Abstract: The dramatic rise of Deep Learning (DL) and its capabilities in biomedical applications led us to explore the advantages of using DL for sleep apnea-hypopnea severity classification. To reduce the complexity of clinical diagnosis using Polysomnography (PSG), a multi-sensor platform, we applied our proposed DL scheme to a single Airflow (AF) signal (a subset of PSG). Seventeen features were extracted from the AF signal and fed into Deep Neural Networks for classification in two studies. First, we performed binary classification using apnea-hypopnea index (AHI) cutoffs of 5, 15, and 30 events/hour. Second, we performed multiclass Sleep Apnea-Hypopnea Syndrome (SAHS) severity classification, assigning patients to four groups: no SAHS, mild SAHS, moderate SAHS, and severe SAHS. For evaluation, we used more patients than related works to capture greater diversity: 520 AF records obtained from the MrOS Sleep Study (Visit 2) database. We applied 10-fold cross-validation to estimate accuracy, sensitivity, and specificity. We also compared our main classifier with two approaches used in previous research: the Support Vector Machine (SVM) and Adaboost-Classification and Regression Trees (AB-CART). In binary classification, our method performed significantly better than the other two approaches, with accuracies of 83.46%, 85.39%, and 92.69% at the three cutoffs, respectively.
For multiclass classification, it also achieved the highest accuracy of all approaches, 63.70%.
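The two labeling schemes and the evaluation split can be made concrete. The abstract gives the AHI cutoffs (5, 15, 30 events/hour) and the four severity groups but not the boundary convention; the sketch below assumes an AHI equal to a cutoff falls in the higher group (a common clinical convention) and uses a plain shuffled, unstratified 10-fold split over records. It is an illustration under those assumptions, not the authors' code, and the function names are hypothetical.

```python
import numpy as np

# AHI cutoffs in events/hour, as listed in the abstract.
AHI_CUTOFFS = (5.0, 15.0, 30.0)
SEVERITY_NAMES = ("no SAHS", "mild SAHS", "moderate SAHS", "severe SAHS")


def binary_labels(ahi, cutoff: float) -> np.ndarray:
    """1 if AHI >= cutoff (assumed boundary convention), else 0."""
    return (np.asarray(ahi, dtype=float) >= cutoff).astype(int)


def severity_class(ahi) -> np.ndarray:
    """0..3 = no/mild/moderate/severe; class i when AHI_CUTOFFS[i-1] <= x < AHI_CUTOFFS[i]."""
    return np.digitize(np.asarray(ahi, dtype=float), AHI_CUTOFFS)


def kfold_indices(n: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index arrays for a shuffled, unstratified k-fold split."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```

With 520 records and k = 10, each test fold holds 52 records and every record is tested exactly once.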

Submitted 31 August, 2018; originally announced August 2018.

Journal ref: TENCON 2018 - 2018 IEEE Region 10 Conference

arXiv:1808.06541

doi 10.1109/ACCESS.2019.2919143

Universal Joint Feature Extraction for P300 EEG Classification using Multi-task Autoencoder

Authors: Apiwat Ditthapron, Nannapas Banluesombatkul, Sombat Ketrat, Ekapol Chuangsuwanich, Theerawit Wilaiprasitporn

Abstract: Recording Electroencephalography (EEG) signals is onerous and requires massive storage to hold signals at a usable sampling rate. In this work, we propose the Event-Related Potential Encoder Network (ERPENet), a multi-task autoencoder-based model that can be applied to any ERP-related task. The strength of ERPENet lies in its capability to handle various kinds of ERP datasets and its robustness across multiple recording setups, enabling joint training across datasets. ERPENet combines Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) in an autoencoder setup that simultaneously compresses the input EEG signal and extracts related P300 features into a latent vector. We refer to the process of generating the latent vector as universal joint feature extraction. The network also includes a classification branch that distinguishes attended from unattended events as an auxiliary task. We experimented on six different P300 datasets. The results show that the latent vector exhibits better compression capability than the previous state-of-the-art semi-supervised autoencoder model. For attended/unattended event classification, pre-trained weights were adopted as initial weights and tested on unseen P300 datasets to evaluate the adaptability of the model; this shortens training compared with random Xavier weight initialization.
At a compression rate of 6.84, ERPENet achieves 79.37% to 88.52% classification accuracy, depending on the dataset, outperforming conventional P300 classification models: XdawnLDA, DeepConvNet, and EEGNet.
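ERPENet's exact objective and its definition of "compression rate" are not stated in the abstract. Assuming the rate is the number of input values divided by the latent-vector length, and that the two tasks are combined as a weighted sum of a reconstruction loss (mean squared error) and the auxiliary attended/unattended loss (binary cross-entropy), a multi-task objective could be sketched as follows. `compression_rate`, `multitask_loss`, and the `weight` parameter are hypothetical.

```python
import numpy as np


def compression_rate(input_shape, latent_size: int) -> float:
    """Assumed definition: number of input values divided by latent-vector length."""
    return float(np.prod(input_shape)) / latent_size


def multitask_loss(x, x_hat, y, p, weight: float = 1.0, eps: float = 1e-7) -> float:
    """Reconstruction MSE plus a weighted binary cross-entropy for attended (1) vs unattended (0).

    x, x_hat: input EEG and its reconstruction (same shape)
    y, p:     event labels and predicted probabilities of "attended"
    weight:   hypothetical balance between the two tasks
    """
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    recon = np.mean((x - x_hat) ** 2)
    bce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(recon + weight * bce)
```

Training both terms jointly is what pushes the latent vector to stay compact while still carrying P300-relevant information for the auxiliary classifier.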

Submitted 30 April, 2019; v1 submitted 30 July, 2018; originally announced August 2018.

Journal ref: IEEE Access 2019

arXiv:1807.03147

doi 10.1109/TCDS.2019.2924648

Affective EEG-Based Person Identification Using the Deep Learning Approach

Authors: Theerawit Wilaiprasitporn, Apiwat Ditthapron, Karis Matchaparn, Tanaboon Tongbuasirilai, Nannapas Banluesombatkul, Ekapol Chuangsuwanich

Abstract: Electroencephalography (EEG) is another modality for performing Person Identification (PI). Due to the nature of EEG signals, EEG-based PI is typically done while the person is performing some kind of mental task, such as motor control. However, few works have considered EEG-based PI while the person is in different mental states (affective EEG). The aim of this paper is to improve the performance of affective EEG-based PI using a deep learning approach. We propose a cascade of deep learning models combining Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). CNNs handle the spatial information in the EEG, while RNNs extract the temporal information. We evaluated two types of RNNs, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), giving the CNN-LSTM and CNN-GRU models. The proposed method is evaluated on the state-of-the-art affective dataset DEAP. The results indicate that CNN-GRU and CNN-LSTM can perform PI across different affective states and reach a mean Correct Recognition Rate (CRR) of 99.90% to 100%, significantly outperforming a support vector machine (SVM) baseline system that uses power spectral density (PSD) features. Notably, the 100% mean CRR comes from only 40 subjects in the DEAP dataset. When the number of EEG electrodes is reduced from thirty-two to five for more practical applications, the frontal region gives the best results, reaching up to 99.17% CRR (with CNN-GRU).
Of the two deep learning models, we find that CNN-GRU slightly outperforms CNN-LSTM while training faster. Furthermore, CNN-GRU overcomes the influence of affective states on EEG-based PI reported in previous works.
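The CRR figures above are percentages of correctly identified trials. Assuming the usual definition (correct identifications divided by total trials) and that the "mean" is taken over evaluation folds or runs, which the abstract does not specify, the metric can be sketched as below; both function names are hypothetical.

```python
import numpy as np


def correct_recognition_rate(true_ids, predicted_ids) -> float:
    """CRR in percent: share of trials whose predicted identity matches the true one."""
    true_ids, predicted_ids = np.asarray(true_ids), np.asarray(predicted_ids)
    if true_ids.shape != predicted_ids.shape:
        raise ValueError("true_ids and predicted_ids must have the same shape")
    return 100.0 * float(np.mean(true_ids == predicted_ids))


def mean_crr(runs) -> float:
    """Average CRR over (true_ids, predicted_ids) pairs, e.g. one pair per fold or run."""
    return float(np.mean([correct_recognition_rate(t, p) for t, p in runs]))
```

For example, a run that identifies 3 of 4 trials correctly scores 75% CRR, and averaging it with a perfect run gives a mean CRR of 87.5%.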

Submitted 29 April, 2019; v1 submitted 5 July, 2018; originally announced July 2018.

Comments: 10 pages

Journal ref: IEEE Transactions on Cognitive and Developmental Systems (2019)

Showing 1–5 of 5 results for author: Ditthapron, A