-
Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children
Authors:
Taekyung Ahn,
Yeonjung Hong,
Younggon Im,
Do Hyung Kim,
Dayoung Kang,
Joo Won Jeong,
Jae Won Kim,
Min Jung Kim,
Ah-ra Cho,
Dae-Hyun Jang,
Hosung Nam
Abstract:
This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children wit…
▽ More
This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children with SSDs is impractical. We fine-tuned the wav2vec 2.0 XLS-R model to recognize speech as pronounced rather than as existing words. The model was fine-tuned with a speech dataset from 137 children with inadequate speech production pronouncing 73 Korean words selected for actual clinical diagnosis. The model's predictions of the pronunciations of the words matched the human annotations with about 90% accuracy. While the model still requires improvement in recognizing unclear pronunciation, this study demonstrates that ASR models can streamline complex pronunciation error diagnostic procedures in clinical fields.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Authors:
Jeong Hun Yeo,
Minsu Kim,
Jeongsoo Choi,
Dae Hoe Kim,
Yong Man Ro
Abstract:
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements. VSR is regarded as a challenging task because of the insufficient information on lip movements. In this paper, we propose an Audio Knowledge empowered Visual Speech Recognition framework (AKVSR) to complement the insufficient speech information of visual modality by using audio modality. Different fro…
▽ More
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements. VSR is regarded as a challenging task because of the insufficient information on lip movements. In this paper, we propose an Audio Knowledge empowered Visual Speech Recognition framework (AKVSR) to complement the insufficient speech information of visual modality by using audio modality. Different from the previous methods, the proposed AKVSR 1) utilizes rich audio knowledge encoded by a large-scale pretrained audio model, 2) saves the linguistic information of audio knowledge in compact audio memory by discarding the non-linguistic information from the audio through quantization, and 3) includes Audio Bridging Module which can find the best-matched audio features from the compact audio memory, which makes our training possible without audio inputs, once after the compact audio memory is composed. We validate the effectiveness of the proposed method through extensive experiments, and achieve new state-of-the-art performances on the widely-used LRS3 dataset.
△ Less
Submitted 11 January, 2024; v1 submitted 15 August, 2023;
originally announced August 2023.
-
Drive Safe: Cognitive-Behavioral Mining for Intelligent Transportation Cyber-Physical System
Authors:
Md. Shirajum Munir,
Sarder Fakhrul Abedin,
Ki Tae Kim,
Do Hyeon Kim,
Md. Golam Rabiul Alam,
Choong Seon Hong
Abstract:
This paper presents a cognitive behavioral-based driver mood repairment platform in intelligent transportation cyber-physical systems (IT-CPS) for road safety. In particular, we propose a driving safety platform for distracted drivers, namely \emph{drive safe}, in IT-CPS. The proposed platform recognizes the distracting activities of the drivers as well as their emotions for mood repair. Further,…
▽ More
This paper presents a cognitive behavioral-based driver mood repairment platform in intelligent transportation cyber-physical systems (IT-CPS) for road safety. In particular, we propose a driving safety platform for distracted drivers, namely \emph{drive safe}, in IT-CPS. The proposed platform recognizes the distracting activities of the drivers as well as their emotions for mood repair. Further, we develop a prototype of the proposed drive safe platform to establish proof-of-concept (PoC) for the road safety in IT-CPS. In the developed driving safety platform, we employ five AI and statistical-based models to infer a vehicle driver's cognitive-behavioral mining to ensure safe driving during the drive. Especially, capsule network (CN), maximum likelihood (ML), convolutional neural network (CNN), Apriori algorithm, and Bayesian network (BN) are deployed for driver activity recognition, environmental feature extraction, mood recognition, sequential pattern mining, and content recommendation for affective mood repairment of the driver, respectively. Besides, we develop a communication module to interact with the systems in IT-CPS asynchronously. Thus, the developed drive safe PoC can guide the vehicle drivers when they are distracted from driving due to the cognitive-behavioral factors. Finally, we have performed a qualitative evaluation to measure the usability and effectiveness of the developed drive safe platform. We observe that the P-value is 0.0041 (i.e., < 0.05) in the ANOVA test. Moreover, the confidence interval analysis also shows significant gains in prevalence value which is around 0.93 for a 95% confidence level. The aforementioned statistical results indicate high reliability in terms of driver's safety and mental state.
△ Less
Submitted 23 August, 2020;
originally announced August 2020.
-
CycleMorph: Cycle Consistent Unsupervised Deformable Image Registration
Authors:
Boah Kim,
Dong Hwan Kim,
Seong Ho Park,
Jieun Kim,
June-Goo Lee,
Jong Chul Ye
Abstract:
Image registration is a fundamental task in medical image analysis. Recently, deep learning based image registration methods have been extensively investigated due to their excellent performance despite the ultra-fast computational time. However, the existing deep learning methods still have limitation in the preservation of original topology during the deformation with registration vector fields.…
▽ More
Image registration is a fundamental task in medical image analysis. Recently, deep learning based image registration methods have been extensively investigated due to their excellent performance despite the ultra-fast computational time. However, the existing deep learning methods still have limitation in the preservation of original topology during the deformation with registration vector fields. To address this issues, here we present a cycle-consistent deformable image registration. The cycle consistency enhances image registration performance by providing an implicit regularization to preserve topology during the deformation. The proposed method is so flexible that can be applied for both 2D and 3D registration problems for various applications, and can be easily extended to multi-scale implementation to deal with the memory issues in large volume registration. Experimental results on various datasets from medical and non-medical applications demonstrate that the proposed method provides effective and accurate registration on diverse image pairs within a few seconds. Qualitative and quantitative evaluations on deformation fields also verify the effectiveness of the cycle consistency of the proposed method.
△ Less
Submitted 13 August, 2020;
originally announced August 2020.
-
Deep Convolutional GANs for Car Image Generation
Authors:
Dong Hui Kim
Abstract:
In this paper, we investigate the application of deep convolutional GANs on car image generation. We improve upon the commonly used DCGAN architecture by implementing Wasserstein loss to decrease mode collapse and introducing dropout at the end of the discrimiantor to introduce stochasticity. Furthermore, we introduce convolutional layers at the end of the generator to improve expressiveness and s…
▽ More
In this paper, we investigate the application of deep convolutional GANs on car image generation. We improve upon the commonly used DCGAN architecture by implementing Wasserstein loss to decrease mode collapse and introducing dropout at the end of the discrimiantor to introduce stochasticity. Furthermore, we introduce convolutional layers at the end of the generator to improve expressiveness and smooth noise. All of these improvements upon the DCGAN architecture comprise our proposal of the novel BoolGAN architecture, which is able to decrease the FID from 195.922 (baseline) to 165.966.
△ Less
Submitted 24 June, 2020;
originally announced June 2020.
-
Unsupervised Deformable Image Registration Using Cycle-Consistent CNN
Authors:
Boah Kim,
Jieun Kim,
June-Goo Lee,
Dong Hwan Kim,
Seong Ho Park,
Jong Chul Ye
Abstract:
Medical image registration is one of the key processing steps for biomedical image analysis such as cancer diagnosis. Recently, deep learning based supervised and unsupervised image registration methods have been extensively studied due to its excellent performance in spite of ultra-fast computational time compared to the classical approaches. In this paper, we present a novel unsupervised medical…
▽ More
Medical image registration is one of the key processing steps for biomedical image analysis such as cancer diagnosis. Recently, deep learning based supervised and unsupervised image registration methods have been extensively studied due to its excellent performance in spite of ultra-fast computational time compared to the classical approaches. In this paper, we present a novel unsupervised medical image registration method that trains deep neural network for deformable registration of 3D volumes using a cycle-consistency. Thanks to the cycle consistency, the proposed deep neural networks can take diverse pair of image data with severe deformation for accurate registration. Experimental results using multiphase liver CT images demonstrate that our method provides very precise 3D image registration within a few seconds, resulting in more accurate cancer size estimation.
△ Less
Submitted 2 July, 2019;
originally announced July 2019.