Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children
Authors:
Taekyung Ahn,
Yeonjung Hong,
Younggon Im,
Do Hyung Kim,
Dayoung Kang,
Joo Won Jeong,
Jae Won Kim,
Min Jung Kim,
Ah-ra Cho,
Dae-Hyun Jang,
Hosung Nam
Abstract:
This study presents an automatic speech recognition (ASR) model designed to diagnose pronunciation issues in children with speech sound disorders (SSDs), with the aim of replacing manual transcription in clinical procedures. Because general-purpose ASR models primarily map input speech to real words, even a well-known high-performance ASR model is impractical for evaluating pronunciation in children with SSDs. We fine-tuned the wav2vec 2.0 XLS-R model to recognize speech as pronounced rather than as existing words. The model was fine-tuned on a speech dataset from 137 children with inadequate speech production pronouncing 73 Korean words selected for actual clinical diagnosis. The model's predictions of the word pronunciations matched the human annotations with about 90% accuracy. While the model still requires improvement in recognizing unclear pronunciation, this study demonstrates that ASR models can streamline complex diagnostic procedures for pronunciation errors in clinical settings.
Submitted 12 March, 2024;
originally announced March 2024.
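The abstract above reports that model predictions matched human annotations with about 90% accuracy but does not state how the match was scored. As a minimal sketch, assuming the score is an exact match per word and adding a character error rate as a finer-grained alternative, the comparison could look like the following. The function names and the example transcriptions are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _check(predictions: Sequence[str], annotations: Sequence[str]) -> None:
    if len(predictions) != len(annotations):
        raise ValueError("predictions and annotations must have equal length")
    if not annotations:
        raise ValueError("at least one item is required")


def exact_match_accuracy(predictions: Sequence[str],
                         annotations: Sequence[str]) -> float:
    """Fraction of words whose predicted transcription equals the annotation."""
    _check(predictions, annotations)
    matches = sum(p == a for p, a in zip(predictions, annotations))
    return matches / len(annotations)


def character_error_rate(predictions: Sequence[str],
                         annotations: Sequence[str]) -> float:
    """Total edit distance divided by total annotation length."""
    _check(predictions, annotations)
    edits = sum(edit_distance(p, a) for p, a in zip(predictions, annotations))
    chars = sum(len(a) for a in annotations)
    return edits / chars


# Hypothetical as-pronounced transcriptions, for illustration only.
annotations = ["다과", "토끼", "나무"]   # human annotator
predictions = ["다과", "토기", "나무"]   # model output
print(exact_match_accuracy(predictions, annotations))
print(character_error_rate(predictions, annotations))
```

Exact match is strict: one wrong syllable fails the whole word, whereas the character error rate gives partial credit, which may matter when the clinical goal is to locate individual sound errors.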
M3FPolypSegNet: Segmentation Network with Multi-frequency Feature Fusion for Polyp Localization in Colonoscopy Images
Authors:
Ju-Hyeon Nam,
Seo-Hyeong Park,
Nur Suriza Syazwany,
Yerim Jung,
Yu-Han Im,
Sang-Chul Lee
Abstract:
Polyp segmentation is crucial for preventing colorectal cancer, a common type of cancer. Deep learning has been used to segment polyps automatically, which reduces the risk of misdiagnosis. Localizing small polyps in colonoscopy images is challenging because of their complex characteristics, such as color, occlusion, and varied shapes. To address this challenge, we propose a novel frequency-based fully convolutional neural network, the Multi-Frequency Feature Fusion Polyp Segmentation Network (M3FPolypSegNet), which decomposes the input image into low-, high-, and full-frequency components to exploit the characteristics of each component. We used three independent multi-frequency encoders to map the multiple input images into a high-dimensional feature space. In the Frequency-ASPP Scalable Attention Module (F-ASPP SAM), ASPP was applied between each frequency component to preserve scale information. Subsequently, scalable attention was applied to emphasize polyp regions in the high-dimensional feature space. Finally, we designed three multi-task learning objectives (region, edge, and distance) in four decoder blocks to learn the structural characteristics of the region. The proposed model outperformed various segmentation models, with average performance gains of 6.92% and 7.52% across all metrics on CVC-ClinicDB and BKAI-IGH-NeoPolyp, respectively.
Submitted 9 October, 2023; v1 submitted 9 October, 2023;
originally announced October 2023.
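The abstract above describes decomposing an input image into low-, high-, and full-frequency components but does not say how the split is computed. As a minimal sketch, assuming an ideal circular low-pass filter in the Fourier domain on a single-channel image, the decomposition could be written with NumPy as follows. The function name `split_frequencies` and the `cutoff` parameter are hypothetical and may differ from what the authors used.

```python
import numpy as np


def split_frequencies(image: np.ndarray, cutoff: float = 0.1):
    """Split a 2D grayscale image into (low, high, full) frequency components.

    cutoff is the low-pass radius in cycles per pixel (0 to about 0.707).
    This is an assumed, illustrative filter, not the paper's method.
    """
    if image.ndim != 2:
        raise ValueError("expected a 2D (grayscale) image")
    height, width = image.shape

    # Frequency coordinate of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Keep only bins inside the cutoff radius (includes the DC term).
    low_mask = radius <= cutoff
    spectrum = np.fft.fft2(image)
    low = np.fft.ifft2(spectrum * low_mask).real

    # The high-frequency part is the residual, so low + high == full.
    high = image - low
    return low, high, image


# Usage on a random stand-in image (no real colonoscopy data involved).
rng = np.random.default_rng(0)
img = rng.random((64, 64))
low, high, full = split_frequencies(img, cutoff=0.1)
print(low.shape, high.shape, full.shape)
```

Defining the high-frequency component as the residual guarantees that the two bands sum back to the full image exactly, so each encoder in a three-branch design would see complementary information plus the original.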