Search | arXiv e-print repository

Learning Job Title Representation from Job Description Aggregation Network

Authors: Napat Laosaengpha, Thanit Tativannarat, Chawan Piansaddhayanon, Attapol Rutherford, Ekapol Chuangsuwanich

Abstract: Learning job title representation is a vital process for developing automatic human resource tools. To do so, existing methods primarily rely on learning the title representation through skills extracted from the job description, neglecting the rich and diverse content within. Thus, we propose an alternative framework for learning job titles through their respective job description (JD) and utilize a Job Description Aggregator component to handle the lengthy description and a bidirectional contrastive loss to account for the bidirectional relationship between the job title and its description. We evaluated the performance of our method in both in-domain and out-of-domain settings, achieving superior performance over the skill-based approach.
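
As a rough illustration of what a bidirectional contrastive loss can look like, the sketch below averages a title-to-description and a description-to-title cross-entropy over an in-batch similarity matrix. It is a generic symmetric InfoNCE-style formulation, not the authors' implementation; the function names, the cosine similarity, and the `temperature` value are assumptions made for this example.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def _one_direction_loss(sim):
    """Mean cross-entropy where row i is expected to select column i."""
    total = 0.0
    for i, row in enumerate(sim):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_z - row[i]
    return total / len(sim)


def bidirectional_contrastive_loss(title_vecs, jd_vecs, temperature=0.1):
    """Average of the title->JD and JD->title contrastive losses (hypothetical helper)."""
    titles = [_normalize(v) for v in title_vecs]
    jds = [_normalize(v) for v in jd_vecs]
    # sim[i][j]: scaled cosine similarity between title i and description j
    sim = [[sum(a * b for a, b in zip(t, d)) / temperature for d in jds]
           for t in titles]
    sim_t = [list(col) for col in zip(*sim)]  # transpose gives the JD->title direction
    return 0.5 * (_one_direction_loss(sim) + _one_direction_loss(sim_t))
```

With matched title/description pairs on the diagonal, the loss is small; swapping the descriptions so that the pairs no longer line up makes it grow, in both directions at once.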

Submitted 12 June, 2024; originally announced June 2024.

Comments: to be published in Findings of the Association for Computational Linguistics: ACL 2024

arXiv:2406.06139 [pdf, other]

Thunder : Unified Regression-Diffusion Speech Enhancement with a Single Reverse Step using Brownian Bridge

Authors: Thanapat Trachu, Chawan Piansaddhayanon, Ekapol Chuangsuwanich

Abstract: Diffusion-based speech enhancement has shown promising results, but can suffer from a slower inference time. Initializing the diffusion process with the enhanced audio generated by a regression-based model can be used to reduce the computational steps required. However, these approaches often necessitate a regression model, further increasing the system's complexity. We propose Thunder, a unified regression-diffusion model that utilizes the Brownian bridge process, which allows the model to act in both modes. The regression mode can be accessed by setting the diffusion time step close to 1. However, the standard score-based diffusion modeling does not perform well in this setup due to gradient instability. To mitigate this problem, we modify the diffusion model to predict the clean speech instead of the score function, achieving competitive performance with a more compact model size and fewer reverse steps.
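
For orientation, the marginal of a standard Brownian bridge pinned at both ends can be written as below. This is the textbook form, not necessarily the paper's exact parameterization; the symbols $x_0$ (assumed here to be the clean speech), $y$ (assumed to be the noisy speech), and the scale $\sigma$ are assumptions made for this illustration.

```latex
% Brownian bridge pinned at x_0 (t = 0) and y (t = 1); symbols are illustrative assumptions
x_t \mid x_0, y \;\sim\; \mathcal{N}\!\bigl((1 - t)\,x_0 + t\,y,\;\; \sigma^2\, t\,(1 - t)\, I\bigr),
\qquad t \in [0, 1]
```

Under these assumptions the variance $\sigma^2 t(1-t)$ vanishes as $t \to 1$, so $x_t$ approaches $y$, and predicting clean speech from that state amounts to a single regression-style step.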

Submitted 10 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures, 4 tables. This paper will be submitted to the Interspeech conference

arXiv:2406.05733 [pdf, other]

MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model

Authors: Danupat Khamnuansin, Tawunrat Chalothorn, Ekapol Chuangsuwanich

Abstract: Large Language Models (LLMs) often struggle with hallucinations and outdated information. To address this, Information Retrieval (IR) systems can be employed to augment LLMs with up-to-date knowledge. However, existing IR techniques contain deficiencies, posing a performance bottleneck. Given the extensive array of IR systems, combining diverse approaches presents a viable strategy. Nevertheless, prior attempts have yielded restricted efficacy. In this work, we propose an approach that leverages learning-to-rank techniques to combine heterogeneous IR systems. We demonstrate the method on two Retrieval Question Answering (ReQA) tasks. Our empirical findings exhibit a significant performance enhancement, outperforming previous approaches and achieving state-of-the-art results on ReQA SQuAD.

Submitted 9 June, 2024; originally announced June 2024.

Comments: To be published in Findings of ACL 2024

arXiv:2406.03125 [pdf, other]

Space Decomposition for Sentence Embedding

Authors: Wuttikorn Ponwitayarat, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Sarana Nutanong

Abstract: Determining sentence pair similarity is crucial for various NLP tasks. A common technique to address this is typically evaluated on a continuous semantic textual similarity scale from 0 to 5. However, based on a linguistic observation in STS annotation guidelines, we found that the score in the range [4,5] indicates an upper-range sample, while the rest are lower-range samples. This necessitates a new approach to treating the upper-range and lower-range classes separately. In this paper, we introduce a novel embedding space decomposition method called MixSP utilizing a Mixture of Specialized Projectors, designed to distinguish and rank upper-range and lower-range samples accurately. The experimental results demonstrate that MixSP decreased the overlap representation between upper-range and lower-range classes significantly while outperforming competitors on STS and zero-shot benchmarks.

Submitted 5 June, 2024; originally announced June 2024.

Comments: ACL Findings 2024. The code and pre-trained models are available at https://github.com/KornWtp/MixSP

arXiv:2403.16127 [pdf, other]

WangchanLion and WangchanX MRC Eval

Authors: Wannaphong Phatthiyaphaibun, Surapon Nonesung, Patomporn Payoungkhamdee, Peerat Limkonchotiwat, Can Udomcharoenchaikit, Jitkapat Sawatphol, Chompakorn Chaksangchaichot, Ekapol Chuangsuwanich, Sarana Nutanong

Abstract: This technical report describes the development of WangchanLion, an instruction fine-tuned model focusing on Machine Reading Comprehension (MRC) in the Thai language. Our model is based on SEA-LION and a collection of instruction following datasets. To promote open research and reproducibility, we publicly release all training data, code, and the final model weights under the Apache-2 license. To assess the contextual understanding capability, we conducted extensive experimental studies using two Thai MRC datasets, XQuAD and Iapp_wiki_qa_squad. Experimental results demonstrate the model's ability to comprehend the context and produce an answer faithful to the reference one in 0-shot and 1-shot settings. In addition, our evaluation goes beyond the traditional MRC. We propose a new evaluation scheme assessing the answer's correctness, helpfulness, conciseness, and contextuality. Our code is available publicly at https://github.com/vistec-AI/WangchanLion.

Submitted 23 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

arXiv:2311.03228 [pdf, other]

An Efficient Self-Supervised Cross-View Training For Sentence Embedding

Authors: Peerat Limkonchotiwat, Wuttikorn Ponwitayarat, Lalita Lowphansirikul, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong

Abstract: Self-supervised sentence representation learning is the task of constructing an embedding space for sentences without relying on human annotation efforts. One straightforward approach is to finetune a pretrained language model (PLM) with a representation learning method such as contrastive learning. While this approach achieves impressive performance on larger PLMs, the performance rapidly degrades as the number of parameters decreases. In this paper, we propose a framework called Self-supervised Cross-View Training (SCT) to narrow the performance gap between large and small PLMs. To evaluate the effectiveness of SCT, we compare it to 5 baseline and state-of-the-art competitors on seven Semantic Textual Similarity (STS) benchmarks using 5 PLMs with the number of parameters ranging from 4M to 340M. The experimental results show that SCT outperforms the competitors for PLMs with fewer than 100M parameters in 18 of 21 cases.

Submitted 6 November, 2023; originally announced November 2023.

Comments: Accepted to TACL. The code and pre-trained models are available at https://github.com/mrpeerat/SCT

arXiv:2306.10348 [pdf, other]

Typo-Robust Representation Learning for Dense Retrieval

Authors: Panuthep Tasawong, Wuttikorn Ponwitayarat, Peerat Limkonchotiwat, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong

Abstract: Dense retrieval is a basic building block of information retrieval applications. One of the main challenges of dense retrieval in real-world settings is the handling of queries containing misspelled words. A popular approach for handling misspelled queries is minimizing the representation discrepancy between misspelled queries and their pristine ones. Unlike the existing approaches, which only focus on the alignment between misspelled and pristine queries, our method also improves the contrast between each misspelled query and its surrounding queries. To assess the effectiveness of our proposed method, we compare it against the existing competitors using two benchmark datasets and two base encoders. Our method outperforms the competitors in all cases with misspelled queries. Our code and models are available at https://github.com/panuthept/DST-DenseRetrieval.

Submitted 17 June, 2023; originally announced June 2023.

Comments: 5 pages, 2 figures

ACM Class: I.2.7

arXiv:2303.13396 [pdf, other]

Zero-guidance Segmentation Using Zero Segment Labels

Authors: Pitchaporn Rewatbowornwong, Nattanat Chatthee, Ekapol Chuangsuwanich, Supasorn Suwajanakorn

Abstract: CLIP has enabled new and exciting joint vision-language applications, one of which is open-vocabulary segmentation, which can locate any segment given an arbitrary text query. In our research, we ask whether it is possible to discover semantic segments without any user guidance in the form of text queries or predefined classes, and label them using natural language automatically. We propose a novel problem, zero-guidance segmentation, and the first baseline that leverages two pre-trained generalist models, DINO and CLIP, to solve this problem without any fine-tuning or segmentation dataset. The general idea is to first segment an image into small over-segments, encode them into CLIP's visual-language space, translate them into text labels, and merge semantically similar segments together. The key challenge, however, is how to encode a visual segment into a segment-specific embedding that balances global and local context information, both useful for recognition. Our main contribution is a novel attention-masking technique that balances the two contexts by analyzing the attention layers inside CLIP. We also introduce several metrics for the evaluation of this new task. With CLIP's innate knowledge, our method can precisely locate the Mona Lisa painting among a museum crowd. Project page: https://zero-guide-seg.github.io/.

Submitted 4 September, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

arXiv:2208.04799 [pdf, ps, other]

Thai Wav2Vec2.0 with CommonVoice V8

Authors: Wannaphong Phatthiyaphaibun, Chompakorn Chaksangchaichot, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Sarana Nutanong

Abstract: Recently, Automatic Speech Recognition (ASR), a system that converts audio into text, has caught a lot of attention in the machine learning community. Thus, a lot of publicly available models were released in HuggingFace. However, most of these ASR models are available in English; only a minority of the models are available in Thai. Additionally, most of the Thai ASR models are closed-sourced, and the performance of existing open-sourced models lacks robustness. To address this problem, we train a new ASR model on a pre-trained XLSR-Wav2Vec model with the Thai CommonVoice corpus V8 and train a trigram language model to boost the performance of our ASR model. We hope that our models will be beneficial to individuals and the ASR community in Thailand.

Submitted 9 August, 2022; originally announced August 2022.

arXiv:2202.13912 [pdf, other]

ReCasNet: Improving consistency within the two-stage mitosis detection framework

Authors: Chawan Piansaddhayanon, Sakun Santisukwongchote, Shanop Shuangshoti, Qingyi Tao, Sira Sriswasdi, Ekapol Chuangsuwanich

Abstract: Mitotic count (MC) is an important histological parameter for cancer diagnosis and grading, but the manual process for obtaining MC from whole-slide histopathological images is very time-consuming and prone to error. Therefore, deep learning models have been proposed to facilitate this process. Existing approaches utilize a two-stage pipeline: the detection stage for identifying the locations of potential mitotic cells and the classification stage for refining prediction confidences. However, this pipeline formulation can lead to inconsistencies in the classification stage due to the poor prediction quality of the detection stage and the mismatches in training data distributions between the two stages. In this study, we propose a Refine Cascade Network (ReCasNet), an enhanced deep learning pipeline that mitigates the aforementioned problems with three improvements. First, window relocation was used to reduce the number of poor quality false positives generated during the detection stage. Second, object re-cropping was performed with another deep learning model to adjust poorly centered objects. Third, improved data selection strategies were introduced during the classification stage to reduce the mismatches in training data distributions.
ReCasNet was evaluated on two large-scale mitotic figure recognition datasets, canine cutaneous mast cell tumor (CCMCT) and canine mammary carcinoma (CMC), which resulted in up to 4.8 percentage point improvements in the F1 scores for mitotic cell detection and 44.1% reductions in mean absolute percentage error (MAPE) for MC prediction. Techniques that underlie ReCasNet can be generalized to other two-stage object detection networks and should contribute to improving the performances of deep learning models in broad digital pathology applications.
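
For readers unfamiliar with the MAPE metric mentioned above, a minimal sketch of the usual definition follows. The function name and the example counts are hypothetical and are not taken from the paper; the sketch also assumes every reference count is non-zero.

```python
def mean_absolute_percentage_error(true_counts, predicted_counts):
    """MAPE in percent: mean of |true - predicted| / |true| over all samples.

    Assumes every true count is non-zero (illustrative helper, not the paper's code).
    """
    if len(true_counts) != len(predicted_counts):
        raise ValueError("inputs must have the same length")
    if not true_counts:
        raise ValueError("inputs must not be empty")
    ratios = [abs(t - p) / abs(t) for t, p in zip(true_counts, predicted_counts)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical mitotic counts for two slides: reference vs. model prediction.
example_mape = mean_absolute_percentage_error([10, 20], [8, 25])
```

Here the two slides contribute relative errors of 0.20 and 0.25, so the example evaluates to a MAPE of about 22.5%.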

Submitted 28 February, 2022; originally announced February 2022.

arXiv:2010.11475 [pdf, other]

High resolution weakly supervised localization architectures for medical images

Authors: Konpat Preechakul, Sira Sriswasdi, Boonserm Kijsirikul, Ekapol Chuangsuwanich

Abstract: In medical imaging, Class-Activation Map (CAM) serves as the main explainability tool by pointing to the region of interest. Since the localization accuracy from CAM is constrained by the resolution of the model's feature map, one may expect that segmentation models, which generally have large feature maps, would produce more accurate CAMs. However, we have found that this is not the case due to task mismatch. While segmentation models are developed for datasets with pixel-level annotation, only image-level annotation is available in most medical imaging datasets. Our experiments suggest that Global Average Pooling (GAP) and Group Normalization are the main culprits that worsen the localization accuracy of CAM. To address this issue, we propose Pyramid Localization Network (PYLON), a model for high-accuracy weakly-supervised localization that achieved 0.62 average point localization accuracy on NIH's Chest X-Ray 14 dataset, compared to 0.45 for a traditional CAM model. Source code and extended results are available at https://github.com/cmb-chula/pylon.

Submitted 22 October, 2020; originally announced October 2020.

Comments: submitted to ICASSP 2021

arXiv:2005.07920 [pdf, other]

Reducing Spelling Inconsistencies in Code-Switching ASR using Contextualized CTC Loss

Authors: Burin Naowarat, Thananchai Kongthaworn, Korrawe Karunratanakul, Sheng Hui Wu, Ekapol Chuangsuwanich

Abstract: Code-Switching (CS) remains a challenge for Automatic Speech Recognition (ASR), especially character-based models. With the combined choice of characters from multiple languages, the outcome from character-based models suffers from phoneme duplication, resulting in language-inconsistent spellings. We propose Contextualized Connectionist Temporal Classification (CCTC) loss to encourage spelling consistencies of a character-based non-autoregressive ASR which allows for faster inference. The CCTC loss conditions the main prediction on the predicted contexts to ensure language consistency in the spellings. In contrast to existing CTC-based approaches, CCTC loss does not require frame-level alignments, since the context ground truth is obtained from the model's estimated path. Compared to the same model trained with regular CTC loss, our method consistently improved the ASR performance on both CS and monolingual corpora.

Submitted 22 June, 2021; v1 submitted 16 May, 2020; originally announced May 2020.

Comments: ICASSP 2021

arXiv:2004.04157 [pdf, other]

doi 10.1109/JBHI.2020.3037693

MetaSleepLearner: A Pilot Study on Fast Adaptation of Bio-signals-Based Sleep Stage Classifier to New Individual Subject Using Meta-Learning

Authors: Nannapas Banluesombatkul, Pichayoot Ouppaphan, Pitshaporn Leelaarporn, Payongkit Lakhan, Busarakum Chaitusaney, Nattapong Jaimchariyatam, Ekapol Chuangsuwanich, Wei Chen, Huy Phan, Nat Dilokthanakul, Theerawit Wilaiprasitporn

Abstract: Identifying bio-signal-based sleep stages requires time-consuming and tedious labor by skilled clinicians. Deep learning approaches have been introduced in order to challenge the automatic sleep stage classification conundrum. However, difficulties arise in replacing the clinicians with an automatic system due to the differences in many aspects found in individual bio-signals, causing inconsistency in the performance of the model on every incoming individual. Thus, we aim to explore the feasibility of using a novel approach, capable of assisting the clinicians and lessening the workload. We propose the transfer learning framework, entitled MetaSleepLearner, based on Model Agnostic Meta-Learning (MAML), in order to transfer the acquired sleep staging knowledge from a large dataset to new individual subjects. The framework was demonstrated to require the labelling of only a few sleep epochs by the clinicians and allow the remainder to be handled by the system. Layer-wise Relevance Propagation (LRP) was also applied to understand the learning course of our approach. In all acquired datasets, in comparison to the conventional approach, MetaSleepLearner achieved a range of 5.4% to 17.7% improvement with statistical difference in the mean of both approaches. The illustration of the model interpretation after the adaptation to each subject also confirmed that the performance was directed towards reasonable learning.
MetaSleepLearner outperformed the conventional approaches as a result of fine-tuning using the recordings of both healthy subjects and patients. This is the first work that investigated a non-conventional pre-training method, MAML, resulting in a possibility for human-machine collaboration in sleep stage classification and easing the burden of the clinicians in labelling the sleep stages through only several epochs rather than an entire recording.

Submitted 10 November, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

Comments: IEEE Journal of Biomedical and Health Informatics (Accepted) (source code is available at https://github.com/IoBT-VISTEC/MetaSleepLearner)

Journal ref: IEEE Journal of Biomedical and Health Informatics (2020)

arXiv:1908.01294 [pdf, ps, other]

Semi-supervised Thai Sentence Segmentation Using Local and Distant Word Representations

Authors: Chanatip Saetia, Ekapol Chuangsuwanich, Tawunrat Chalothorn, Peerapon Vateekul

Abstract: A sentence is typically treated as the minimal syntactic unit used for extracting valuable information from a longer piece of text. However, in written Thai, there are no explicit sentence markers. We proposed a deep learning model for the task of sentence segmentation that includes three main contributions. First, we integrate n-gram embedding as a local representation to capture word groups near sentence boundaries. Second, to focus on the keywords of dependent clauses, we combine the model with a distant representation obtained from self-attention modules. Finally, due to the scarcity of labeled data, for which annotation is difficult and time-consuming, we also investigate and adapt Cross-View Training (CVT) as a semi-supervised learning technique, allowing us to utilize unlabeled data to improve the model representations. In the Thai sentence segmentation experiments, our model reduced the relative error by 7.4% and 10.5% compared with the baseline models on the Orchid and UGWC datasets, respectively. We also applied our model to the task of pronunciation recovery on the IWSLT English dataset. Our model outperformed the prior sequence tagging models, achieving a relative error reduction of 2.5%. Ablation studies revealed that utilizing n-gram representations was the main contributing factor for Thai, while the semi-supervised training helped the most for English.
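
To make the "relative error reduction" figures concrete, the sketch below shows the usual arithmetic: the drop in error rate divided by the baseline error rate. The function name and the error rates in the example are hypothetical and do not come from the paper.

```python
def relative_error_reduction(baseline_error, new_error):
    """Relative error reduction in percent: (baseline - new) / baseline * 100."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical example: an error rate falling from 20.00% to 18.52%.
example_reduction = relative_error_reduction(20.0, 18.52)
```

In this made-up case the absolute drop is 1.48 percentage points, which corresponds to a relative error reduction of about 7.4%.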

Submitted 25 August, 2019; v1 submitted 4 August, 2019; originally announced August 2019.

Comments: 19 pages, 6 figures

arXiv:1808.10852 [pdf, other]

doi 10.1109/TENCON.2018.8650546

Towards Asynchronous Motor Imagery-Based Brain-Computer Interfaces: a joint training scheme using deep learning

Authors: Patcharin Cheng, Phairot Autthasan, Boriwat Pijarana, Ekapol Chuangsuwanich, Theerawit Wilaiprasitporn

Abstract: In this paper, the deep learning (DL) approach is applied to a joint training scheme for asynchronous motor imagery-based Brain-Computer Interface (BCI). The proposed DL approach is a cascade of one-dimensional convolutional neural networks and fully-connected neural networks (CNN-FC). The focus is mainly on three types of brain responses: non-imagery EEG (background EEG), pure imagery EEG, and EEG during the transitional period between background EEG and pure imagery (transitional imagery). The study of transitional imagery signals should provide greater insight into real-world scenarios. It may be inferred that pure imagery and transitional EEG are high and low power EEG imagery, respectively. Moreover, the results from the CNN-FC are compared to the conventional approach for motor imagery-BCI, namely the common spatial pattern (CSP) for feature extraction and support vector machine (SVM) for classification (CSP-SVM). Under a joint training scheme, pure and transitional imagery are treated as the same class, while background EEG is another class. Ten-fold cross-validation is used to evaluate whether the joint training scheme significantly improves the performance of classifying pure and transitional imagery signals from background EEG. Using a sparse set of just a few electrode channels ($C_{z}$, $C_{3}$ and $C_{4}$), mean accuracy reaches 71.52% and 70.27% for CNN-FC and CSP-SVM, respectively.
On the other hand, mean accuracy without the joint training scheme achieves only 62.68% and 52.41% for CNN-FC and CSP-SVM, respectively.

Submitted 31 August, 2018; originally announced August 2018.

Journal ref: TENCON 2018 - 2018 IEEE Region 10 Conference

arXiv:1808.06541 [pdf, other]

doi 10.1109/ACCESS.2019.2919143

Universal Joint Feature Extraction for P300 EEG Classification using Multi-task Autoencoder

Authors: Apiwat Ditthapron, Nannapas Banluesombatkul, Sombat Ketrat, Ekapol Chuangsuwanich, Theerawit Wilaiprasitporn

Abstract: The process of recording Electroencephalography (EEG) signals is onerous and requires massive storage to store signals at an applicable frequency rate. In this work, we propose the Event-Related Potential Encoder Network (ERPENet), a multi-task autoencoder-based model that can be applied to any ERP-related tasks. The strength of ERPENet lies in its capability to handle various kinds of ERP datasets and its robustness across multiple recording setups, enabling joint training across datasets. ERPENet incorporates Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM), in an autoencoder setup, which tries to simultaneously compress the input EEG signal and extract related P300 features into a latent vector. Here, we can infer the process for generating the latent vector as universal joint feature extraction. The network also includes a classification part for attended and unattended events classification as an auxiliary task. We experimented on six different P300 datasets. The results show that the latent vector exhibits better compression capability than the previous state-of-the-art semi-supervised autoencoder model. For attended and unattended events classification, pre-trained weights are adopted as initial weights and tested on unseen P300 datasets to evaluate the adaptability of the model, which shortens the training process as compared to using random Xavier weight initialization.
At the compression rate of 6.84, the classification accuracy outperforms conventional P300 classification models (XdawnLDA, DeepConvNet, and EEGNet), achieving 79.37% - 88.52% classification accuracy depending on the dataset.

Submitted 30 April, 2019; v1 submitted 30 July, 2018; originally announced August 2018.

Journal ref: IEEE Access 2019

arXiv:1807.03147 [pdf, other]

doi 10.1109/TCDS.2019.2924648

Affective EEG-Based Person Identification Using the Deep Learning Approach

Authors: Theerawit Wilaiprasitporn, Apiwat Ditthapron, Karis Matchaparn, Tanaboon Tongbuasirilai, Nannapas Banluesombatkul, Ekapol Chuangsuwanich

Abstract: Electroencephalography (EEG) is another mode for performing Person Identification (PI). Due to the nature of the EEG signals, EEG-based PI is typically done while the person is performing some kind of mental task, such as motor control. However, few works have considered EEG-based PI while the person is in different mental states (affective EEG). The aim of this paper is to improve the performance of affective EEG-based PI using a deep learning approach. We proposed a cascade of deep learning using a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). CNNs are used to handle the spatial information from the EEG while RNNs extract the temporal information. We evaluated two types of RNNs, namely, Long Short-Term Memory (CNN-LSTM) and Gated Recurrent Unit (CNN-GRU). The proposed method is evaluated on the state-of-the-art affective dataset DEAP. The results indicate that CNN-GRU and CNN-LSTM can perform PI from different affective states and reach up to 99.90-100% mean Correct Recognition Rate (CRR), significantly outperforming a support vector machine (SVM) baseline system that uses power spectral density (PSD) features. Notably, the 100% mean CRR comes from only 40 subjects in the DEAP dataset. To reduce the number of EEG electrodes from thirty-two to five for more practical applications, the frontal region gives the best results reaching up to 99.17% CRR (from CNN-GRU).
Amongst the two deep learning models, we find CNN-GRU to slightly outperform CNN-LSTM, while having faster training time. Furthermore, CNN-GRU overcomes the influence of affective states in EEG-Based PI reported in the previous works.

Submitted 29 April, 2019; v1 submitted 5 July, 2018; originally announced July 2018.

Comments: 10 pages

Journal ref: IEEE Transactions on Cognitive and Developmental Systems (2019)

arXiv:1805.11491 [pdf]

Rice Classification Using Spatio-Spectral Deep Convolutional Neural Network

Authors: Itthi Chatnuntawech, Kittipong Tantisantisom, Paisan Khanchaitit, Thitikorn Boonkoom, Berkin Bilgic, Ekapol Chuangsuwanich

Abstract: Rice has been one of the staple foods that contribute significantly to human food supplies. Numerous rice varieties have been cultivated, imported, and exported worldwide. Different rice varieties could be mixed during rice production and trading. Rice impurities could damage the trust between rice importers and exporters, calling for the need to develop a rice variety inspection system. In this work, we develop a non-destructive rice variety classification system that benefits from the synergy between hyperspectral imaging and a deep convolutional neural network (CNN). The proposed method uses a hyperspectral imaging system to simultaneously acquire complementary spatial and spectral information of rice seeds. The rice varieties are then determined from the acquired spatio-spectral data using a deep CNN. As opposed to several existing rice variety classification methods that require hand-engineered features, the proposed method automatically extracts spatio-spectral features from the raw sensor data. As demonstrated using two types of rice datasets, the proposed method achieved up to 11.9% absolute improvement in the mean classification accuracy, compared to the commonly used classification methods based on support vector machines.

Submitted 25 June, 2019; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: 22 pages, 10 figures, 6 tables; more methods and experiments included with references; link to github included; article restructured for clarity; typos fixed

arXiv:1510.08985 [pdf, other]

Prediction-Adaptation-Correction Recurrent Neural Networks for Low-Resource Language Speech Recognition

Authors: Yu Zhang, Ekapol Chuangsuwanich, James Glass, Dong Yu

Abstract: In this paper, we investigate the use of prediction-adaptation-correction recurrent neural networks (PAC-RNNs) for low-resource speech recognition. A PAC-RNN comprises a pair of neural networks in which a correction network uses auxiliary information given by a prediction network to help estimate the state probability. The information from the correction network is also used by the prediction network in a recurrent loop. Our model outperforms other state-of-the-art neural networks (DNNs, LSTMs) on IARPA-Babel tasks. Moreover, transfer learning from a language that is similar to the target language can help improve performance further.

Submitted 30 October, 2015; originally announced October 2015.

Showing 1–19 of 19 results for author: Chuangsuwanich, E