Search | arXiv e-print repository

A Flexible 2.5D Medical Image Segmentation Approach with In-Slice and Cross-Slice Attention

Authors: Amarjeet Kumar, Hongxu Jiang, Muhammad Imran, Cyndi Valdes, Gabriela Leon, Dahyun Kang, Parvathi Nataraj, Yuyin Zhou, Michael D. Weiss, Wei Shao

Abstract: Deep learning has become the de facto method for medical image segmentation, with 3D segmentation models excelling in capturing complex 3D structures and 2D models offering high computational efficiency. However, segmenting 2.5D images, which have high in-plane but low through-plane resolution, is a relatively unexplored challenge. While applying 2D models to individual slices of a 2.5D image is f… ▽ More Deep learning has become the de facto method for medical image segmentation, with 3D segmentation models excelling in capturing complex 3D structures and 2D models offering high computational efficiency. However, segmenting 2.5D images, which have high in-plane but low through-plane resolution, is a relatively unexplored challenge. While applying 2D models to individual slices of a 2.5D image is feasible, it fails to capture the spatial relationships between slices. On the other hand, 3D models face challenges such as resolution inconsistencies in 2.5D images, along with computational complexity and susceptibility to overfitting when trained with limited data. In this context, 2.5D models, which capture inter-slice correlations using only 2D neural networks, emerge as a promising solution due to their reduced computational demand and simplicity in implementation. In this paper, we introduce CSA-Net, a flexible 2.5D segmentation model capable of processing 2.5D images with an arbitrary number of slices through an innovative Cross-Slice Attention (CSA) module. This module uses the cross-slice attention mechanism to effectively capture 3D spatial information by learning long-range dependencies between the center slice (for segmentation) and its neighboring slices. Moreover, CSA-Net utilizes the self-attention mechanism to understand correlations among pixels within the center slice. We evaluated CSA-Net on three 2.5D segmentation tasks: (1) multi-class brain MRI segmentation, (2) binary prostate MRI segmentation, and (3) multi-class prostate MRI segmentation. CSA-Net outperformed leading 2D and 2.5D segmentation methods across all three tasks, demonstrating its efficacy and superiority. Our code is publicly available at https://github.com/mirthAI/CSA-Net. △ Less

Submitted 30 April, 2024; originally announced May 2024.

arXiv:2403.15803 [pdf, other]

Innovative Quantitative Analysis for Disease Progression Assessment in Familial Cerebral Cavernous Malformations

Authors: Ruige Zong, Tao Wang, Chunwang Li, Xinlin Zhang, Yuanbin Chen, Longxuan Zhao, Qixuan Li, Qinquan Gao, Dezhi Kang, Fuxin Lin, Tong Tong

Abstract: Familial cerebral cavernous malformation (FCCM) is a hereditary disorder characterized by abnormal vascular structures within the central nervous system. The FCCM lesions are often numerous and intricate, making quantitative analysis of the lesions a labor-intensive task. Consequently, clinicians face challenges in quantitatively assessing the severity of lesions and determining whether lesions ha… ▽ More Familial cerebral cavernous malformation (FCCM) is a hereditary disorder characterized by abnormal vascular structures within the central nervous system. The FCCM lesions are often numerous and intricate, making quantitative analysis of the lesions a labor-intensive task. Consequently, clinicians face challenges in quantitatively assessing the severity of lesions and determining whether lesions have progressed. To alleviate this problem, we propose a quantitative statistical framework for FCCM, comprising an efficient annotation module, an FCCM lesion segmentation module, and an FCCM lesion quantitative statistics module. Our framework demonstrates precise segmentation of the FCCM lesion based on efficient data annotation, achieving a Dice coefficient of 93.22\%. More importantly, we focus on quantitative statistics of lesions, which is combined with image registration to realize the quantitative comparison of lesions between different examinations of patients, and a visualization framework has been established for doctors to comprehensively compare and analyze lesions. The experimental results have demonstrated that our proposed framework not only obtains objective, accurate, and comprehensive quantitative statistical information, which provides a quantitative assessment method for disease progression and drug efficacy study, but also considerably reduces the manual measurement and statistical workload of lesions, assisting clinical decision-making for FCCM and accelerating progress in FCCM clinical research. This highlights the potential of practical application of the framework in FCCM clinical research and clinical decision-making. The codes are available at https://github.com/6zrg/Quantitative-Statistics-of-FCCM. △ Less

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.08187 [pdf, other]

Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children

Authors: Taekyung Ahn, Yeonjung Hong, Younggon Im, Do Hyung Kim, Dayoung Kang, Joo Won Jeong, Jae Won Kim, Min Jung Kim, Ah-ra Cho, Dae-Hyun Jang, Hosung Nam

Abstract: This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children wit… ▽ More This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children with SSDs is impractical. We fine-tuned the wav2vec 2.0 XLS-R model to recognize speech as pronounced rather than as existing words. The model was fine-tuned with a speech dataset from 137 children with inadequate speech production pronouncing 73 Korean words selected for actual clinical diagnosis. The model's predictions of the pronunciations of the words matched the human annotations with about 90% accuracy. While the model still requires improvement in recognizing unclear pronunciation, this study demonstrates that ASR models can streamline complex pronunciation error diagnostic procedures in clinical fields. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Comments: 12 pages, 2 figures

ACM Class: I.2.7

arXiv:2307.07123 [pdf, other]

Improved Flood Insights: Diffusion-Based SAR to EO Image Translation

Authors: Minseok Seo, Youngtack Oh, Doyi Kim, Dongmin Kang, Yeji Choi

Abstract: Driven by rapid climate change, the frequency and intensity of flood events are increasing. Electro-Optical (EO) satellite imagery is commonly utilized for rapid response. However, its utilities in flood situations are hampered by issues such as cloud cover and limitations during nighttime, making accurate assessment of damage challenging. Several alternative flood detection techniques utilizing S… ▽ More Driven by rapid climate change, the frequency and intensity of flood events are increasing. Electro-Optical (EO) satellite imagery is commonly utilized for rapid response. However, its utilities in flood situations are hampered by issues such as cloud cover and limitations during nighttime, making accurate assessment of damage challenging. Several alternative flood detection techniques utilizing Synthetic Aperture Radar (SAR) data have been proposed. Despite the advantages of SAR over EO in the aforementioned situations, SAR presents a distinct drawback: human analysts often struggle with data interpretation. To tackle this issue, this paper introduces a novel framework, Diffusion-Based SAR to EO Image Translation (DSE). The DSE framework converts SAR images into EO images, thereby enhancing the interpretability of flood insights for humans. Experimental results on the Sen1Floods11 and SEN12-FLOOD datasets confirm that the DSE framework not only delivers enhanced visual information but also improves performance across all tested flood segmentation baselines. △ Less

Submitted 13 July, 2023; originally announced July 2023.

Comments: 10 pages, 6 figures

Report number: 10

arXiv:2305.17842 [pdf, other]

doi 10.1109/LRA.2023.3307008

RL + Model-based Control: Using On-demand Optimal Control to Learn Versatile Legged Locomotion

Authors: Dongho Kang, ** Cheng, Miguel Zamora, Fatemeh Zargarbashi, Stelian Coros

Abstract: This paper presents a control framework that combines model-based optimal control and reinforcement learning (RL) to achieve versatile and robust legged locomotion. Our approach enhances the RL training process by incorporating on-demand reference motions generated through finite-horizon optimal control, covering a broad range of velocities and gaits. These reference motions serve as targets for t… ▽ More This paper presents a control framework that combines model-based optimal control and reinforcement learning (RL) to achieve versatile and robust legged locomotion. Our approach enhances the RL training process by incorporating on-demand reference motions generated through finite-horizon optimal control, covering a broad range of velocities and gaits. These reference motions serve as targets for the RL policy to imitate, leading to the development of robust control policies that can be learned with reliability. Furthermore, by utilizing realistic simulation data that captures whole-body dynamics, RL effectively overcomes the inherent limitations in reference motions imposed by modeling simplifications. We validate the robustness and controllability of the RL training process within our framework through a series of experiments. In these experiments, our method showcases its capability to generalize reference motions and effectively handle more complex locomotion tasks that may pose challenges for the simplified model, thanks to RL's flexibility. Additionally, our framework effortlessly supports the training of control policies for robots with diverse dimensions, eliminating the necessity for robot-specific adjustments in the reward function and hyperparameters. △ Less

Submitted 4 September, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: The paper has been accepted for publication in IEEE Robotics and Automation Letters (RA-L). You can find the copyright information on the front page of the paper. The supplementary video is available in https://www.youtube.com/watch?v=qPttVfzGS84

arXiv:2305.10732 [pdf, ps, other]

BlindHarmony: "Blind" Harmonization for MR Images via Flow model

Authors: Hwihun Jeong, Heejoon Byun, Dong Un Kang, Jongho Lee

Abstract: In MRI, images of the same contrast (e.g., T$_1$) from the same subject can exhibit noticeable differences when acquired using different hardware, sequences, or scan parameters. These differences in images create a domain gap that needs to be bridged by a step called image harmonization, to process the images successfully using conventional or deep learning-based image analysis (e.g., segmentation… ▽ More In MRI, images of the same contrast (e.g., T$_1$) from the same subject can exhibit noticeable differences when acquired using different hardware, sequences, or scan parameters. These differences in images create a domain gap that needs to be bridged by a step called image harmonization, to process the images successfully using conventional or deep learning-based image analysis (e.g., segmentation). Several methods, including deep learning-based approaches, have been proposed to achieve image harmonization. However, they often require datasets from multiple domains for deep learning training and may still be unsuccessful when applied to images from unseen domains. To address this limitation, we propose a novel concept called `Blind Harmonization', which utilizes only target domain data for training but still has the capability to harmonize images from unseen domains. For the implementation of blind harmonization, we developed BlindHarmony using an unconditional flow model trained on target domain data. The harmonized image is optimized to have a correlation with the input source domain image while ensuring that the latent vector of the flow model is close to the center of the Gaussian distribution. BlindHarmony was evaluated on both simulated and real datasets and compared to conventional methods. BlindHarmony demonstrated noticeable performance on both datasets, highlighting its potential for future use in clinical settings. The source code is available at: https://github.com/SNU-LIST/BlindHarmony △ Less

Submitted 16 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: ICCV 2023 accepted. 9 pages and 5 Figures for manuscipt, supplementary included

arXiv:2212.05662 [pdf, other]

Optimal Planning of Hybrid Energy Storage Systems using Curtailed Renewable Energy through Deep Reinforcement Learning

Authors: Dongju Kang, Doeun Kang, Sumin Hwangbo, Haider Niaz, Won Bo Lee, J. Jay Liu, Jonggeol Na

Abstract: Energy management systems (EMS) are becoming increasingly important in order to utilize the continuously growing curtailed renewable energy. Promising energy storage systems (ESS), such as batteries and green hydrogen should be employed to maximize the efficiency of energy stakeholders. However, optimal decision-making, i.e., planning the leveraging between different strategies, is confronted with… ▽ More Energy management systems (EMS) are becoming increasingly important in order to utilize the continuously growing curtailed renewable energy. Promising energy storage systems (ESS), such as batteries and green hydrogen should be employed to maximize the efficiency of energy stakeholders. However, optimal decision-making, i.e., planning the leveraging between different strategies, is confronted with the complexity and uncertainties of large-scale problems. Here, we propose a sophisticated deep reinforcement learning (DRL) methodology with a policy-based algorithm to realize the real-time optimal ESS planning under the curtailed renewable energy uncertainty. A quantitative performance comparison proved that the DRL agent outperforms the scenario-based stochastic optimization (SO) algorithm, even with a wide action and observation space. Owing to the uncertainty rejection capability of the DRL, we could confirm a robust performance, under a large uncertainty of the curtailed renewable energy, with a maximizing net profit and stable system. Action-map** was performed for visually assessing the action taken by the DRL agent according to the state. The corresponding results confirmed that the DRL agent learns the way like what a human expert would do, suggesting reliable application of the proposed methodology. △ Less

Submitted 11 December, 2022; originally announced December 2022.

Comments: 30 pages, 8 figures

arXiv:2210.14406 [pdf, other]

RedPen: Region- and Reason-Annotated Dataset of Unnatural Speech

Authors: Kyumin Park, Keon Lee, Daeyoung Kim, Dongyeop Kang

Abstract: Even with recent advances in speech synthesis models, the evaluation of such models is based purely on human judgement as a single naturalness score, such as the Mean Opinion Score (MOS). The score-based metric does not give any further information about which parts of speech are unnatural or why human judges believe they are unnatural. We present a novel speech dataset, RedPen, with human annotat… ▽ More Even with recent advances in speech synthesis models, the evaluation of such models is based purely on human judgement as a single naturalness score, such as the Mean Opinion Score (MOS). The score-based metric does not give any further information about which parts of speech are unnatural or why human judges believe they are unnatural. We present a novel speech dataset, RedPen, with human annotations on unnatural speech regions and their corresponding reasons. RedPen consists of 180 synthesized speeches with unnatural regions annotated by crowd workers; These regions are then reasoned and categorized by error types, such as voice trembling and background noise. We find that our dataset shows a better explanation for unnatural speech regions than the model-driven unnaturalness prediction. Our analysis also shows that each model includes different types of error types. Summing up, our dataset successfully shows the possibility that various error regions and types lie under the single naturalness score. We believe that our dataset will shed light on the evaluation and development of more interpretable speech models in the future. Our dataset will be publicly available upon acceptance. △ Less

Submitted 25 October, 2022; originally announced October 2022.

Comments: Submitted to ICASSP 2023

arXiv:2210.11946 [pdf, other]

RT-MOT: Confidence-Aware Real-Time Scheduling Framework for Multi-Object Tracking Tasks

Authors: Donghwa Kang, Seunghoon Lee, Hoon Sung Chwa, Seung-Hwan Bae, Chang Mook Kang, **kyu Lee, Hyeongboo Baek

Abstract: Different from existing MOT (Multi-Object Tracking) techniques that usually aim at improving tracking accuracy and average FPS, real-time systems such as autonomous vehicles necessitate new requirements of MOT under limited computing resources: (R1) guarantee of timely execution and (R2) high tracking accuracy. In this paper, we propose RT-MOT, a novel system design for multiple MOT tasks, which a… ▽ More Different from existing MOT (Multi-Object Tracking) techniques that usually aim at improving tracking accuracy and average FPS, real-time systems such as autonomous vehicles necessitate new requirements of MOT under limited computing resources: (R1) guarantee of timely execution and (R2) high tracking accuracy. In this paper, we propose RT-MOT, a novel system design for multiple MOT tasks, which addresses R1 and R2. Focusing on multiple choices of a workload pair of detection and association, which are two main components of the tracking-by-detection approach for MOT, we tailor a measure of object confidence for RT-MOT and develop how to estimate the measure for the next frame of each MOT task. By utilizing the estimation, we make it possible to predict tracking accuracy variation according to different workload pairs to be applied to the next frame of an MOT task. Next, we develop a novel confidence-aware real-time scheduling framework, which offers an offline timing guarantee for a set of MOT tasks based on non-preemptive fixed-priority scheduling with the smallest workload pair. At run-time, the framework checks the feasibility of a priority-inversion associated with a larger workload pair, which does not compromise the timing guarantee of every task, and then chooses a feasible scenario that yields the largest tracking accuracy improvement based on the proposed prediction. Our experiment results demonstrate that RT-MOT significantly improves overall tracking accuracy by up to 1.5x, compared to existing popular tracking-by-detection approaches, while guaranteeing timely execution of all MOT tasks. △ Less

Submitted 19 October, 2022; originally announced October 2022.

Comments: Accepted to 2022 Real-Time Systems Symposium (RTSS)

arXiv:2207.01520 [pdf, other]

Adaptive GLCM sampling for transformer-based COVID-19 detection on CT

Authors: Okchul Jung, Dong Un Kang, Gwanghyun Kim, Se Young Chun

Abstract: The world has suffered from COVID-19 (SARS-CoV-2) for the last two years, causing much damage and change in people's daily lives. Thus, automated detection of COVID-19 utilizing deep learning on chest computed tomography (CT) scans became promising, which helps correct diagnosis efficiently. Recently, transformer-based COVID-19 detection method on CT is proposed to utilize 3D information in CT vol… ▽ More The world has suffered from COVID-19 (SARS-CoV-2) for the last two years, causing much damage and change in people's daily lives. Thus, automated detection of COVID-19 utilizing deep learning on chest computed tomography (CT) scans became promising, which helps correct diagnosis efficiently. Recently, transformer-based COVID-19 detection method on CT is proposed to utilize 3D information in CT volume. However, its sampling method for selecting slices is not optimal. To leverage rich 3D information in CT volume, we propose a transformer-based COVID-19 detection using a novel data curation and adaptive sampling method using gray level co-occurrence matrices (GLCM). To train the model which consists of CNN layer, followed by transformer architecture, we first executed data curation based on lung segmentation and utilized the entropy of GLCM value of every slice in CT volumes to select important slices for the prediction. The experimental results show that the proposed method improve the detection performance with large margin without much difficult modification to the model. △ Less

Submitted 4 July, 2022; originally announced July 2022.

Comments: 6 pages

arXiv:2110.07716 [pdf]

Adversarial Scene Reconstruction and Object Detection System for Assisting Autonomous Vehicle

Authors: Md Foysal Haque, Hay-Youn Lim, Dae-Seong Kang

Abstract: In the current computer vision era classifying scenes through video surveillance systems is a crucial task. Artificial Intelligence (AI) Video Surveillance technologies have been advanced remarkably while artificial intelligence and deep learning ascended into the system. Adopting the superior compounds of deep learning visual classification methods achieved enormous accuracy in classifying visual… ▽ More In the current computer vision era classifying scenes through video surveillance systems is a crucial task. Artificial Intelligence (AI) Video Surveillance technologies have been advanced remarkably while artificial intelligence and deep learning ascended into the system. Adopting the superior compounds of deep learning visual classification methods achieved enormous accuracy in classifying visual scenes. However, the visual classifiers face difficulties examining the scenes in dark visible areas, especially during the nighttime. Also, the classifiers face difficulties in identifying the contexts of the scenes. This paper proposed a deep learning model that reconstructs dark visual scenes to clear scenes like daylight, and the method recognizes visual actions for the autonomous vehicle. The proposed model achieved 87.3 percent accuracy for scene reconstruction and 89.2 percent in scene understanding and detection tasks. △ Less

Submitted 13 October, 2021; originally announced October 2021.

arXiv:1911.07410 [pdf, other]

Multi-Temporal Recurrent Neural Networks For Progressive Non-Uniform Single Image Deblurring With Incremental Temporal Training

Authors: Dongwon Park, Dong Un Kang, Jisoo Kim, Se Young Chun

Abstract: Multi-scale (MS) approaches have been widely investigated for blind single image / video deblurring that sequentially recovers deblurred images in low spatial scale first and then in high spatial scale later with the output of lower scales. MS approaches have been effective especially for severe blurs induced by large motions in high spatial scale since those can be seen as small blurs in low spat… ▽ More Multi-scale (MS) approaches have been widely investigated for blind single image / video deblurring that sequentially recovers deblurred images in low spatial scale first and then in high spatial scale later with the output of lower scales. MS approaches have been effective especially for severe blurs induced by large motions in high spatial scale since those can be seen as small blurs in low spatial scale. In this work, we investigate alternative approach to MS, called multi-temporal (MT) approach, for non-uniform single image deblurring. We propose incremental temporal training with constructed MT level dataset from time-resolved dataset, develop novel MT-RNNs with recurrent feature maps, and investigate progressive single image deblurring over iterations. Our proposed MT methods outperform state-of-the-art MS methods on the GoPro dataset in PSNR with the smallest number of parameters. △ Less

Submitted 17 November, 2019; originally announced November 2019.

Comments: 10 pages, 8 figures, 6 tables, work in progress

Showing 1–12 of 12 results for author: Kang, D