Search | arXiv e-print repository

Breast Ultrasound Report Generation using LangChain

Authors: Jaeyoung Huh, Hyun Jeong Park, Jong Chul Ye

Abstract: Breast ultrasound (BUS) is a critical diagnostic tool in the field of breast imaging, aiding in the early detection and characterization of breast abnormalities. Interpreting breast ultrasound images commonly involves creating comprehensive medical reports, containing vital information to promptly assess the patient's condition. However, the ultrasound imaging system necessitates capturing multipl… ▽ More Breast ultrasound (BUS) is a critical diagnostic tool in the field of breast imaging, aiding in the early detection and characterization of breast abnormalities. Interpreting breast ultrasound images commonly involves creating comprehensive medical reports, containing vital information to promptly assess the patient's condition. However, the ultrasound imaging system necessitates capturing multiple images of various parts to compile a single report, presenting a time-consuming challenge. To address this problem, we propose the integration of multiple image analysis tools through a LangChain using Large Language Models (LLM), into the breast reporting process. Through a combination of designated tools and text generation through LangChain, our method can accurately extract relevant features from ultrasound images, interpret them in a clinical context, and produce comprehensive and standardized reports. This approach not only reduces the burden on radiologists and healthcare professionals but also enhances the consistency and quality of reports. The extensive experiments shows that each tools involved in the proposed method can offer qualitatively and quantitatively significant results. Furthermore, clinical evaluation on the generated reports demonstrates that the proposed method can make report in clinically meaningful way. △ Less

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.01908 [pdf, other]

LLM-driven Multimodal Target Volume Contouring in Radiation Oncology

Authors: Yu** Oh, Sangjoon Park, Hwa Kyung Byun, Yeona Cho, Ik Jae Lee, ** Sung Kim, Jong Chul Ye

Abstract: Target volume contouring for radiation therapy is considered significantly more challenging than the normal organ segmentation tasks as it necessitates the utilization of both image and text-based clinical information. Inspired by the recent advancement of large language models (LLMs) that can facilitate the integration of the textural information and images, here we present a novel LLM-driven mul… ▽ More Target volume contouring for radiation therapy is considered significantly more challenging than the normal organ segmentation tasks as it necessitates the utilization of both image and text-based clinical information. Inspired by the recent advancement of large language models (LLMs) that can facilitate the integration of the textural information and images, here we present a novel LLM-driven multimodal AI, namely LLMSeg, that utilizes the clinical text information and is applicable to the challenging task of target volume contouring for radiation therapy, and validate it within the context of breast cancer radiation therapy target volume contouring. Using external validation and data-insufficient environments, which attributes highly conducive to real-world applications, we demonstrate that the proposed model exhibits markedly improved performance compared to conventional unimodal AI models, particularly exhibiting robust generalization performance and data efficiency. To our best knowledge, this is the first LLM-driven multimodal AI model that integrates the clinical text information into target volume delineation for radiation oncology. △ Less

Submitted 15 April, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

arXiv:2308.00193 [pdf, other]

C-DARL: Contrastive diffusion adversarial representation learning for label-free blood vessel segmentation

Authors: Boah Kim, Yu** Oh, Bradford J. Wood, Ronald M. Summers, Jong Chul Ye

Abstract: Blood vessel segmentation in medical imaging is one of the essential steps for vascular disease diagnosis and interventional planning in a broad spectrum of clinical scenarios in image-based medicine and interventional medicine. Unfortunately, manual annotation of the vessel masks is challenging and resource-intensive due to subtle branches and complex structures. To overcome this issue, this pape… ▽ More Blood vessel segmentation in medical imaging is one of the essential steps for vascular disease diagnosis and interventional planning in a broad spectrum of clinical scenarios in image-based medicine and interventional medicine. Unfortunately, manual annotation of the vessel masks is challenging and resource-intensive due to subtle branches and complex structures. To overcome this issue, this paper presents a self-supervised vessel segmentation method, dubbed the contrastive diffusion adversarial representation learning (C-DARL) model. Our model is composed of a diffusion module and a generation module that learns the distribution of multi-domain blood vessel data by generating synthetic vessel images from diffusion latent. Moreover, we employ contrastive learning through a mask-based contrastive loss so that the model can learn more realistic vessel representations. To validate the efficacy, C-DARL is trained using various vessel datasets, including coronary angiograms, abdominal digital subtraction angiograms, and retinal imaging. Experimental results confirm that our model achieves performance improvement over baseline methods with noise robustness, suggesting the effectiveness of C-DARL for vessel segmentation. △ Less

Submitted 31 July, 2023; originally announced August 2023.

arXiv:2307.15208 [pdf, other]

Generative AI for Medical Imaging: extending the MONAI Framework

Authors: Walter H. L. Pinaya, Mark S. Graham, Eric Kerfoot, Petru-Daniel Tudosiu, Jessica Dafflon, Virginia Fernandez, Pedro Sanchez, Julia Wolleb, Pedro F. da Costa, Ashay Patel, Hyung** Chung, Can Zhao, Wei Peng, Zelong Liu, Xueyan Mei, Oeslle Lucena, Jong Chul Ye, Sotirios A. Tsaftaris, Prerna Dogra, Andrew Feng, Marc Modat, Parashkev Nachev, Sebastien Ourselin, M. Jorge Cardoso

Abstract: Recent advances in generative AI have brought incredible breakthroughs in several areas, including medical imaging. These generative models have tremendous potential not only to help safely share medical data via synthetic datasets but also to perform an array of diverse applications, such as anomaly detection, image-to-image translation, denoising, and MRI reconstruction. However, due to the comp… ▽ More Recent advances in generative AI have brought incredible breakthroughs in several areas, including medical imaging. These generative models have tremendous potential not only to help safely share medical data via synthetic datasets but also to perform an array of diverse applications, such as anomaly detection, image-to-image translation, denoising, and MRI reconstruction. However, due to the complexity of these models, their implementation and reproducibility can be difficult. This complexity can hinder progress, act as a use barrier, and dissuade the comparison of new methods with existing works. In this study, we present MONAI Generative Models, a freely available open-source platform that allows researchers and developers to easily train, evaluate, and deploy generative models and related applications. Our platform reproduces state-of-art studies in a standardised way involving different architectures (such as diffusion models, autoregressive transformers, and GANs), and provides pre-trained models for the community. We have implemented these models in a generalisable fashion, illustrating that their results can be extended to 2D or 3D scenarios, including medical images with different modalities (like CT, MRI, and X-Ray data) and from different anatomical areas. Finally, we adopt a modular and extensible approach, ensuring long-term maintainability and the extension of current applications for future features. △ Less

Submitted 27 July, 2023; originally announced July 2023.

arXiv:2306.04339 [pdf, other]

Unpaired Deep Learning for Pharmacokinetic Parameter Estimation from Dynamic Contrast-Enhanced MRI

Authors: Gyutaek Oh, Won-** Moon, Jong Chul Ye

Abstract: DCE-MRI provides information about vascular permeability and tissue perfusion through the acquisition of pharmacokinetic parameters. However, traditional methods for estimating these pharmacokinetic parameters involve fitting tracer kinetic models, which often suffer from computational complexity and low accuracy due to noisy arterial input function (AIF) measurements. Although some deep learning… ▽ More DCE-MRI provides information about vascular permeability and tissue perfusion through the acquisition of pharmacokinetic parameters. However, traditional methods for estimating these pharmacokinetic parameters involve fitting tracer kinetic models, which often suffer from computational complexity and low accuracy due to noisy arterial input function (AIF) measurements. Although some deep learning approaches have been proposed to tackle these challenges, most existing methods rely on supervised learning that requires paired input DCE-MRI and labeled pharmacokinetic parameter maps. This dependency on labeled data introduces significant time and resource constraints, as well as potential noise in the labels, making supervised learning methods often impractical. To address these limitations, here we present a novel unpaired deep learning method for estimating both pharmacokinetic parameters and the AIF using a physics-driven CycleGAN approach. Our proposed CycleGAN framework is designed based on the underlying physics model, resulting in a simpler architecture with a single generator and discriminator pair. Crucially, our experimental results indicate that our method, which does not necessitate separate AIF measurements, produces more reliable pharmacokinetic parameters than other techniques. △ Less

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2305.16482 [pdf, other]

Score-based Diffusion Models for Bayesian Image Reconstruction

Authors: Michael T. McCann, Hyung** Chung, Jong Chul Ye, Marc L. Klasky

Abstract: This paper explores the use of score-based diffusion models for Bayesian image reconstruction. Diffusion models are an efficient tool for generative modeling. Diffusion models can also be used for solving image reconstruction problems. We present a simple and flexible algorithm for training a diffusion model and using it for maximum a posteriori reconstruction, minimum mean square error reconstruc… ▽ More This paper explores the use of score-based diffusion models for Bayesian image reconstruction. Diffusion models are an efficient tool for generative modeling. Diffusion models can also be used for solving image reconstruction problems. We present a simple and flexible algorithm for training a diffusion model and using it for maximum a posteriori reconstruction, minimum mean square error reconstruction, and posterior sampling. We present experiments on both a linear and a nonlinear reconstruction problem that highlight the strengths and limitations of the approach. △ Less

Submitted 25 May, 2023; originally announced May 2023.

Comments: 5 pages, 3 figures

arXiv:2303.08440 [pdf, other]

Improving 3D Imaging with Pre-Trained Perpendicular 2D Diffusion Models

Authors: Suhyeon Lee, Hyung** Chung, Minyoung Park, Jonghyuk Park, Wi-Sun Ryu, Jong Chul Ye

Abstract: Diffusion models have become a popular approach for image generation and reconstruction due to their numerous advantages. However, most diffusion-based inverse problem-solving methods only deal with 2D images, and even recently published 3D methods do not fully exploit the 3D distribution prior. To address this, we propose a novel approach using two perpendicular pre-trained 2D diffusion models to… ▽ More Diffusion models have become a popular approach for image generation and reconstruction due to their numerous advantages. However, most diffusion-based inverse problem-solving methods only deal with 2D images, and even recently published 3D methods do not fully exploit the 3D distribution prior. To address this, we propose a novel approach using two perpendicular pre-trained 2D diffusion models to solve the 3D inverse problem. By modeling the 3D data distribution as a product of 2D distributions sliced in different directions, our method effectively addresses the curse of dimensionality. Our experimental results demonstrate that our method is highly effective for 3D medical image reconstruction tasks, including MRI Z-axis super-resolution, compressed sensing MRI, and sparse-view CT. Our method can generate high-quality voxel volumes suitable for medical applications. △ Less

Submitted 1 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: ICCV23 poster. 15 pages, 9 figures

arXiv:2303.00091 [pdf, other]

Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model

Authors: Jaeyoung Huh, Sangjoon Park, Jeong Eun Lee, Jong Chul Ye

Abstract: Automatic Speech Recognition (ASR) is a technology that converts spoken words into text, facilitating interaction between humans and machines. One of the most common applications of ASR is Speech-To-Text (STT) technology, which simplifies user workflows by transcribing spoken words into text. In the medical field, STT has the potential to significantly reduce the workload of clinicians who rely on… ▽ More Automatic Speech Recognition (ASR) is a technology that converts spoken words into text, facilitating interaction between humans and machines. One of the most common applications of ASR is Speech-To-Text (STT) technology, which simplifies user workflows by transcribing spoken words into text. In the medical field, STT has the potential to significantly reduce the workload of clinicians who rely on typists to transcribe their voice recordings. However, develo** an STT model for the medical domain is challenging due to the lack of sufficient speech and text datasets. To address this issue, we propose a medical-domain text correction method that modifies the output text of a general STT system using the Vision Language Pre-training (VLP) method. VLP combines textual and visual information to correct text based on image knowledge. Our extensive experiments demonstrate that the proposed method offers quantitatively and clinically significant improvements in STT performance in the medical field. We further show that multi-modal understanding of image and text information outperforms single-modal understanding using only text information. △ Less

Submitted 27 February, 2023; originally announced March 2023.

arXiv:2301.03027 [pdf, other]

Annealed Score-Based Diffusion Model for MR Motion Artifact Reduction

Authors: Gyutaek Oh, Jeong Eun Lee, Jong Chul Ye

Abstract: Motion artifact reduction is one of the important research topics in MR imaging, as the motion artifact degrades image quality and makes diagnosis difficult. Recently, many deep learning approaches have been studied for motion artifact reduction. Unfortunately, most existing models are trained in a supervised manner, requiring paired motion-corrupted and motion-free images, or are based on a stric… ▽ More Motion artifact reduction is one of the important research topics in MR imaging, as the motion artifact degrades image quality and makes diagnosis difficult. Recently, many deep learning approaches have been studied for motion artifact reduction. Unfortunately, most existing models are trained in a supervised manner, requiring paired motion-corrupted and motion-free images, or are based on a strict motion-corruption model, which limits their use for real-world situations. To address this issue, here we present an annealed score-based diffusion model for MRI motion artifact reduction. Specifically, we train a score-based model using only motion-free images, and then motion artifacts are removed by applying forward and reverse diffusion processes repeatedly to gradually impose a low-frequency data consistency. Experimental results verify that the proposed method successfully reduces both simulated and in vivo motion artifacts, outperforming the state-of-the-art deep learning methods. △ Less

Submitted 8 January, 2023; originally announced January 2023.

arXiv:2209.14566 [pdf, other]

Diffusion Adversarial Representation Learning for Self-supervised Vessel Segmentation

Authors: Boah Kim, Yu** Oh, Jong Chul Ye

Abstract: Vessel segmentation in medical images is one of the important tasks in the diagnosis of vascular diseases and therapy planning. Although learning-based segmentation approaches have been extensively studied, a large amount of ground-truth labels are required in supervised methods and confusing background structures make neural networks hard to segment vessels in an unsupervised manner. To address t… ▽ More Vessel segmentation in medical images is one of the important tasks in the diagnosis of vascular diseases and therapy planning. Although learning-based segmentation approaches have been extensively studied, a large amount of ground-truth labels are required in supervised methods and confusing background structures make neural networks hard to segment vessels in an unsupervised manner. To address this, here we introduce a novel diffusion adversarial representation learning (DARL) model that leverages a denoising diffusion probabilistic model with adversarial learning, and apply it to vessel segmentation. In particular, for self-supervised vessel segmentation, DARL learns the background signal using a diffusion module, which lets a generation module effectively provide vessel representations. Also, by adversarial learning based on the proposed switchable spatially-adaptive denormalization, our model estimates synthetic fake vessel images as well as vessel segmentation masks, which further makes the model capture vessel-relevant semantic information. Once the proposed model is trained, the model generates segmentation masks in a single step and can be applied to general vascular structure segmentation of coronary angiography and retinal images. Experimental results on various datasets show that our method significantly outperforms existing unsupervised and self-supervised vessel segmentation methods. △ Less

Submitted 15 February, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: Accepted at ICLR 2023

arXiv:2208.05140 [pdf, other]

Self-supervised Multi-modal Training from Uncurated Image and Reports Enables Zero-shot Oversight Artificial Intelligence in Radiology

Authors: Sangjoon Park, Eun Sun Lee, Kyung Sook Shin, Jeong Eun Lee, Jong Chul Ye

Abstract: Oversight AI is an emerging concept in radiology where the AI forms a symbiosis with radiologists by continuously supporting radiologists in their decision-making. Recent advances in vision-language models sheds a light on the long-standing problems of the oversight AI by the understanding both visual and textual concepts and their semantic correspondences. However, there have been limited success… ▽ More Oversight AI is an emerging concept in radiology where the AI forms a symbiosis with radiologists by continuously supporting radiologists in their decision-making. Recent advances in vision-language models sheds a light on the long-standing problems of the oversight AI by the understanding both visual and textual concepts and their semantic correspondences. However, there have been limited successes in the application of vision-language models in the medical domain, as the current vision-language models and learning strategies for photographic images and captions call for the web-scale data corpus of image and text pairs which was not often feasible in the medical domain. To address this, here we present a model dubbed Medical Cross-attention Vision-Language model (Medical X-VL), leveraging the key components to be tailored for the medical domain. Our medical X-VL model is based on the following components: self-supervised uni-modal models in medical domain and fusion encoder to bridge them, momentum distillation, sentence-wise contrastive learning for medical reports, and the sentence similarity-adjusted hard negative mining. We experimentally demonstrated that our model enables various zero-shot tasks for oversight AI, ranging from the zero-shot classification to zero-shot error correction. Our model outperformed the current state-of-the-art models in two different medical image database, suggesting the novel clinical usage of our oversight AI model for monitoring human errors. Our method was especially successful in the data-limited setting, which is frequently encountered in the clinics, suggesting the potential widespread applicability in medical domain. △ Less

Submitted 12 April, 2023; v1 submitted 10 August, 2022; originally announced August 2022.

arXiv:2207.02377 [pdf, other]

Patch-wise Deep Metric Learning for Unsupervised Low-Dose CT Denoising

Authors: Chanyong Jung, Joonhyung Lee, Sunkyoung You, Jong Chul Ye

Abstract: The acquisition conditions for low-dose and high-dose CT images are usually different, so that the shifts in the CT numbers often occur. Accordingly, unsupervised deep learning-based approaches, which learn the target image distribution, often introduce CT number distortions and result in detrimental effects in diagnostic performance. To address this, here we propose a novel unsupervised learning… ▽ More The acquisition conditions for low-dose and high-dose CT images are usually different, so that the shifts in the CT numbers often occur. Accordingly, unsupervised deep learning-based approaches, which learn the target image distribution, often introduce CT number distortions and result in detrimental effects in diagnostic performance. To address this, here we propose a novel unsupervised learning approach for lowdose CT reconstruction using patch-wise deep metric learning. The key idea is to learn embedding space by pulling the positive pairs of image patches which shares the same anatomical structure, and pushing the negative pairs which have same noise level each other. Thereby, the network is trained to suppress the noise level, while retaining the original global CT number distributions even after the image translation. Experimental results confirm that our deep metric learning plays a critical role in producing high quality denoised images without CT number shift. △ Less

Submitted 13 July, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

Comments: MICCAI 2022

arXiv:2206.13295 [pdf, other]

Diffusion Deformable Model for 4D Temporal Medical Image Generation

Authors: Boah Kim, Jong Chul Ye

Abstract: Temporal volume images with 3D+t (4D) information are often used in medical imaging to statistically analyze temporal dynamics or capture disease progression. Although deep-learning-based generative models for natural images have been extensively studied, approaches for temporal medical image generation such as 4D cardiac volume data are limited. In this work, we present a novel deep learning mode… ▽ More Temporal volume images with 3D+t (4D) information are often used in medical imaging to statistically analyze temporal dynamics or capture disease progression. Although deep-learning-based generative models for natural images have been extensively studied, approaches for temporal medical image generation such as 4D cardiac volume data are limited. In this work, we present a novel deep learning model that generates intermediate temporal volumes between source and target volumes. Specifically, we propose a diffusion deformable model (DDM) by adapting the denoising diffusion probabilistic model that has recently been widely investigated for realistic image generation. Our proposed DDM is composed of the diffusion and the deformation modules so that DDM can learn spatial deformation information between the source and target volumes and provide a latent code for generating intermediate frames along a geodesic path. Once our model is trained, the latent code estimated from the diffusion module is simply interpolated and fed into the deformation module, which enables DDM to generate temporal frames along the continuous trajectory while preserving the topology of the source image. We demonstrate the proposed method with the 4D cardiac MR image generation between the diastolic and systolic phases for each subject. Compared to the existing deformation methods, our DDM achieves high performance on temporal volume generation. △ Less

Submitted 27 June, 2022; originally announced June 2022.

Comments: Accepted for MICCAI 2022

arXiv:2203.12621 [pdf, other]

MR Image Denoising and Super-Resolution Using Regularized Reverse Diffusion

Authors: Hyung** Chung, Eun Sun Lee, Jong Chul Ye

Abstract: Patient scans from MRI often suffer from noise, which hampers the diagnostic capability of such images. As a method to mitigate such artifact, denoising is largely studied both within the medical imaging community and beyond the community as a general subject. However, recent deep neural network-based approaches mostly rely on the minimum mean squared error (MMSE) estimates, which tend to produce… ▽ More Patient scans from MRI often suffer from noise, which hampers the diagnostic capability of such images. As a method to mitigate such artifact, denoising is largely studied both within the medical imaging community and beyond the community as a general subject. However, recent deep neural network-based approaches mostly rely on the minimum mean squared error (MMSE) estimates, which tend to produce a blurred output. Moreover, such models suffer when deployed in real-world sitautions: out-of-distribution data, and complex noise distributions that deviate from the usual parametric noise models. In this work, we propose a new denoising method based on score-based reverse diffusion sampling, which overcomes all the aforementioned drawbacks. Our network, trained only with coronal knee scans, excels even on out-of-distribution in vivo liver MRI data, contaminated with complex mixture of noise. Even more, we propose a method to enhance the resolution of the denoised image with the same network. With extensive experiments, we show that our method establishes state-of-the-art performance, while having desirable properties which prior MMSE denoisers did not have: flexibly choosing the extent of denoising, and quantifying uncertainty. △ Less

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2202.08510 [pdf, other]

doi 10.1109/JBHI.2023.3276778

Multi-Scale Hybrid Vision Transformer for Learning Gastric Histology: AI-Based Decision Support System for Gastric Cancer Treatment

Authors: Yu** Oh, Go Eun Bae, Kyung-Hee Kim, Min-Kyung Yeo, Jong Chul Ye

Abstract: Gastric endoscopic screening is an effective way to decide appropriate gastric cancer (GC) treatment at an early stage, reducing GC-associated mortality rate. Although artificial intelligence (AI) has brought a great promise to assist pathologist to screen digitalized whole slide images, existing AI systems are limited in fine-grained cancer subclassifications and have little usability in planning… ▽ More Gastric endoscopic screening is an effective way to decide appropriate gastric cancer (GC) treatment at an early stage, reducing GC-associated mortality rate. Although artificial intelligence (AI) has brought a great promise to assist pathologist to screen digitalized whole slide images, existing AI systems are limited in fine-grained cancer subclassifications and have little usability in planning cancer treatment. We propose a practical AI system that enables five subclassifications of GC pathology, which can be directly matched to general GC treatment guidance. The AI system is designed to efficiently differentiate multi-classes of GC through multi-scale self-attention mechanism using 2-stage hybrid Vision Transformer (ViT) networks, by mimicking the way how human pathologists understand histology. The AI system demonstrates reliable diagnostic performance by achieving class-average sensitivity of above 0.85 on a total of 1,212 slides from multicentric cohort. Furthermore, AI-assisted pathologists show significantly improved diagnostic sensitivity by 12% in addition to 18% reduced screening time compared to human pathologists. Our results demonstrate that AI-assisted gastric endoscopic screening has a great potential for providing presumptive pathologic opinion and appropriate cancer treatment of gastric cancer in practical clinical settings. △ Less

Submitted 15 August, 2023; v1 submitted 17 February, 2022; originally announced February 2022.

Journal ref: Published in: IEEE Journal of Biomedical and Health Informatics (Volume: 27, Issue: 8, August 2023)

arXiv:2202.08262 [pdf, other]

Phase Aberration Robust Beamformer for Planewave US Using Self-Supervised Learning

Authors: Shujaat Khan, Jaeyoung Huh, Jong Chul Ye

Abstract: Ultrasound (US) is widely used for clinical imaging applications thanks to its real-time and non-invasive nature. However, its lesion detectability is often limited in many applications due to the phase aberration artefact caused by variations in the speed of sound (SoS) within body parts. To address this, here we propose a novel self-supervised 3D CNN that enables phase aberration robust plane-wa… ▽ More Ultrasound (US) is widely used for clinical imaging applications thanks to its real-time and non-invasive nature. However, its lesion detectability is often limited in many applications due to the phase aberration artefact caused by variations in the speed of sound (SoS) within body parts. To address this, here we propose a novel self-supervised 3D CNN that enables phase aberration robust plane-wave imaging. Instead of aiming at estimating the SoS distribution as in conventional methods, our approach is unique in that the network is trained in a self-supervised manner to robustly generate a high-quality image from various phase aberrated images by modeling the variation in the speed of sound as stochastic. Experimental results using real measurements from tissue-mimicking phantom and \textit{in vivo} scans confirmed that the proposed method can significantly reduce the phase aberration artifacts and improve the visual quality of deep scans. △ Less

Submitted 16 February, 2022; originally announced February 2022.

Comments: 10 pages, 12 figures, submitted to IEEE-TMI

arXiv:2202.06431 [pdf, other]

doi 10.1038/s41467-022-31514-x

AI can evolve without labels: self-evolving vision transformer for chest X-ray diagnosis through knowledge distillation

Authors: Sangjoon Park, Gwanghyun Kim, Yu** Oh, Joon Beom Seo, Sang Min Lee, ** Hwan Kim, Sungjun Moon, Jae-Kwang Lim, Chang Min Park, Jong Chul Ye

Abstract: Although deep learning-based computer-aided diagnosis systems have recently achieved expert-level performance, develo** a robust deep learning model requires large, high-quality data with manual annotation, which is expensive to obtain. This situation poses the problem that the chest x-rays collected annually in hospitals cannot be used due to the lack of manual labeling by experts, especially i… ▽ More Although deep learning-based computer-aided diagnosis systems have recently achieved expert-level performance, develo** a robust deep learning model requires large, high-quality data with manual annotation, which is expensive to obtain. This situation poses the problem that the chest x-rays collected annually in hospitals cannot be used due to the lack of manual labeling by experts, especially in deprived areas. To address this, here we present a novel deep learning framework that uses knowledge distillation through self-supervised learning and self-training, which shows that the performance of the original model trained with a small number of labels can be gradually improved with more unlabeled data. Experimental results show that the proposed framework maintains impressive robustness against a real-world environment and has general applicability to several diagnostic tasks such as tuberculosis, pneumothorax, and COVID-19. Notably, we demonstrated that our model performs even better than those trained with the same amount of labeled data. The proposed framework has a great potential for medical imaging, where plenty of data is accumulated every year, but ground truth annotations are expensive to obtain. △ Less

Submitted 13 February, 2022; originally announced February 2022.

Comments: 24 pages

arXiv:2112.05149 [pdf, other]

DiffuseMorph: Unsupervised Deformable Image Registration Using Diffusion Model

Authors: Boah Kim, Inhwa Han, Jong Chul Ye

Abstract: Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually require a high computational cost for iterative optimizations. Although deep-learning-based methods have been developed for fast image registration, it is still challenging to obtain realistic continuous deformations from a moving image to a fixed image with less topological… ▽ More Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually require a high computational cost for iterative optimizations. Although deep-learning-based methods have been developed for fast image registration, it is still challenging to obtain realistic continuous deformations from a moving image to a fixed image with less topological folding problem. To address this, here we present a novel diffusion-model-based image registration method, called DiffuseMorph. DiffuseMorph not only generates synthetic deformed images through reverse diffusion but also allows image registration by deformation fields. Specifically, the deformation fields are generated by the conditional score function of the deformation between the moving and fixed images, so that the registration can be performed from continuous deformation by simply scaling the latent feature of the score. Experimental results on 2D facial and 3D medical image registration tasks demonstrate that our method provides flexible deformations with topology preservation capability. △ Less

Submitted 29 September, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

arXiv:2112.05146 [pdf, other]

Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction

Authors: Hyung** Chung, Byeongsu Sim, Jong Chul Ye

Abstract: Diffusion models have recently attained significant interest within the community owing to their strong performance as generative models. Furthermore, its application to inverse problems have demonstrated state-of-the-art performance. Unfortunately, diffusion models have a critical downside - they are inherently slow to sample from, needing few thousand steps of iteration to generate images from p… ▽ More Diffusion models have recently attained significant interest within the community owing to their strong performance as generative models. Furthermore, its application to inverse problems have demonstrated state-of-the-art performance. Unfortunately, diffusion models have a critical downside - they are inherently slow to sample from, needing few thousand steps of iteration to generate images from pure Gaussian noise. In this work, we show that starting from Gaussian noise is unnecessary. Instead, starting from a single forward diffusion with better initialization significantly reduces the number of sampling steps in the reverse conditional diffusion. This phenomenon is formally explained by the contraction theory of the stochastic difference equations like our conditional diffusion strategy - the alternating applications of reverse diffusion followed by a non-expansive data consistency step. The new sampling strategy, dubbed Come-Closer-Diffuse-Faster (CCDF), also reveals a new insight on how the existing feed-forward neural network approaches for inverse problems can be synergistically combined with the diffusion models. Experimental results with super-resolution, image inpainting, and compressed sensing MRI demonstrate that our method can achieve state-of-the-art reconstruction performance at significantly reduced sampling steps. △ Less

Submitted 19 March, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: Accepted to CVPR 2022

arXiv:2112.03696 [pdf, other]

Noise Distribution Adaptive Self-Supervised Image Denoising using Tweedie Distribution and Score Matching

Authors: Kwanyoung Kim, Taesung Kwon, Jong Chul Ye

Abstract: Tweedie distributions are a special case of exponential dispersion models, which are often used in classical statistics as distributions for generalized linear models. Here, we reveal that Tweedie distributions also play key roles in modern deep learning era, leading to a distribution independent self-supervised image denoising formula without clean reference images. Specifically, by combining wit… ▽ More Tweedie distributions are a special case of exponential dispersion models, which are often used in classical statistics as distributions for generalized linear models. Here, we reveal that Tweedie distributions also play key roles in modern deep learning era, leading to a distribution independent self-supervised image denoising formula without clean reference images. Specifically, by combining with the recent Noise2Score self-supervised image denoising approach and the saddle point approximation of Tweedie distribution, we can provide a general closed-form denoising formula that can be used for large classes of noise distributions without ever knowing the underlying noise distribution. Similar to the original Noise2Score, the new approach is composed of two successive steps: score matching using perturbed noisy images, followed by a closed form image denoising formula via distribution-independent Tweedie's formula. This also suggests a systematic algorithm to estimate the noise model and noise parameters for a given noisy image data set. Through extensive experiments, we demonstrate that the proposed method can accurately estimate noise models and parameters, and provide the state-of-the-art self-supervised image denoising performance in the benchmark dataset and real-world dataset. △ Less

Submitted 4 December, 2021; originally announced December 2021.

arXiv:2112.02896 [pdf, other]

Tunable Image Quality Control of 3-D Ultrasound using Switchable CycleGAN

Authors: Jaeyoung Huh, Shujaat Khan, Sung** Choi, Dongkuk Shin, Eun Sun Lee, Jong Chul Ye

Abstract: In contrast to 2-D ultrasound (US) for uniaxial plane imaging, a 3-D US imaging system can visualize a volume along three axial planes. This allows for a full view of the anatomy, which is useful for gynecological (GYN) and obstetrical (OB) applications. Unfortunately, the 3-D US has an inherent limitation in resolution compared to the 2-D US. In the case of 3-D US with a 3-D mechanical probe, for… ▽ More In contrast to 2-D ultrasound (US) for uniaxial plane imaging, a 3-D US imaging system can visualize a volume along three axial planes. This allows for a full view of the anatomy, which is useful for gynecological (GYN) and obstetrical (OB) applications. Unfortunately, the 3-D US has an inherent limitation in resolution compared to the 2-D US. In the case of 3-D US with a 3-D mechanical probe, for example, the image quality is comparable along the beam direction, but significant deterioration in image quality is often observed in the other two axial image planes. To address this, here we propose a novel unsupervised deep learning approach to improve 3-D US image quality. In particular, using {\em unmatched} high-quality 2-D US images as a reference, we trained a recently proposed switchable CycleGAN architecture so that every map** plane in 3-D US can learn the image quality of 2-D US images. Thanks to the switchable architecture, our network can also provide real-time control of image enhancement level based on user preference, which is ideal for a user-centric scanner setup. Extensive experiments with clinical evaluation confirm that our method offers significantly improved image quality as well user-friendly flexibility. △ Less

Submitted 6 December, 2021; originally announced December 2021.

arXiv:2112.00374 [pdf, other]

CLIPstyler: Image Style Transfer with a Single Text Condition

Authors: Gihyun Kwon, Jong Chul Ye

Abstract: Existing neural style transfer methods require reference style images to transfer texture information of style images to content images. However, in many practical situations, users may not have reference style images but still be interested in transferring styles by just imagining them. In order to deal with such applications, we propose a new framework that enables a style transfer `without' a s… ▽ More Existing neural style transfer methods require reference style images to transfer texture information of style images to content images. However, in many practical situations, users may not have reference style images but still be interested in transferring styles by just imagining them. In order to deal with such applications, we propose a new framework that enables a style transfer `without' a style image, but only with a text description of the desired style. Using the pre-trained text-image embedding model of CLIP, we demonstrate the modulation of the style of content images only with a single text condition. Specifically, we propose a patch-wise text-image matching loss with multiview augmentations for realistic texture transfer. Extensive experimental results confirmed the successful image style transfer with realistic textures that reflect semantic query texts. △ Less

Submitted 19 March, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

Comments: CVPR 2022 camera ready

arXiv:2111.01338 [pdf, other]

Federated Split Vision Transformer for COVID-19 CXR Diagnosis using Task-Agnostic Training

Authors: Sangjoon Park, Gwanghyun Kim, Jeongsol Kim, Boah Kim, Jong Chul Ye

Abstract: Federated learning, which shares the weights of the neural network across clients, is gaining attention in the healthcare sector as it enables training on a large corpus of decentralized data while maintaining data privacy. For example, this enables neural network training for COVID-19 diagnosis on chest X-ray (CXR) images without collecting patient CXR data across multiple hospitals. Unfortunatel… ▽ More Federated learning, which shares the weights of the neural network across clients, is gaining attention in the healthcare sector as it enables training on a large corpus of decentralized data while maintaining data privacy. For example, this enables neural network training for COVID-19 diagnosis on chest X-ray (CXR) images without collecting patient CXR data across multiple hospitals. Unfortunately, the exchange of the weights quickly consumes the network bandwidth if highly expressive network architecture is employed. So-called split learning partially solves this problem by dividing a neural network into a client and a server part, so that the client part of the network takes up less extensive computation resources and bandwidth. However, it is not clear how to find the optimal split without sacrificing the overall network performance. To amalgamate these methods and thereby maximize their distinct strengths, here we show that the Vision Transformer, a recently developed deep learning architecture with straightforward decomposable configuration, is ideally suitable for split learning without sacrificing performance. Even under the non-independent and identically distributed data distribution which emulates a real collaboration between hospitals using CXR datasets from multiple sources, the proposed framework was able to attain performance comparable to data-centralized training. In addition, the proposed framework along with heterogeneous multi-task clients also improves individual task performances including the diagnosis of COVID-19, eliminating the need for sharing large weights with innumerable parameters. Our results affirm the suitability of Transformer for collaborative learning in medical imaging and pave the way forward for future real-world implementations. △ Less

Submitted 3 November, 2021; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: Accepted for NeurIPS 2021

arXiv:2110.05243 [pdf, other]

Score-based diffusion models for accelerated MRI

Authors: Hyung** Chung, Jong Chul Ye

Abstract: Score-based diffusion models provide a powerful way to model images using the gradient of the data distribution. Leveraging the learned score function as a prior, here we introduce a way to sample data from a conditional distribution given the measurements, such that the model can be readily used for solving inverse problems in imaging, especially for accelerated MRI. In short, we train a continuo… ▽ More Score-based diffusion models provide a powerful way to model images using the gradient of the data distribution. Leveraging the learned score function as a prior, here we introduce a way to sample data from a conditional distribution given the measurements, such that the model can be readily used for solving inverse problems in imaging, especially for accelerated MRI. In short, we train a continuous time-dependent score function with denoising score matching. Then, at the inference stage, we iterate between numerical SDE solver and data consistency projection step to achieve reconstruction. Our model requires magnitude images only for training, and yet is able to reconstruct complex-valued data, and even extends to parallel imaging. The proposed method is agnostic to sub-sampling patterns, and can be used with any sampling schemes. Also, due to its generative nature, our approach can quantify uncertainty, which is not possible with standard regression settings. On top of all the advantages, our method also has very strong performance, even beating the models trained with full supervision. With extensive experiments, we verify the superiority of our method in terms of quality and practicality. △ Less

Submitted 16 July, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

Comments: Code available at: https://github.com/HJ-harry/score-MRI

arXiv:2110.05201 [pdf, other]

doi 10.1109/TSP.2022.3215735

Performance Analysis of Fractional Learning Algorithms

Authors: Abdul Wahab, Shujaat Khan, Imran Naseem, Jong Chul Ye

Abstract: Fractional learning algorithms are trending in signal processing and adaptive filtering recently. However, it is unclear whether the proclaimed superiority over conventional algorithms is well-grounded or is a myth as their performance has never been extensively analyzed. In this article, a rigorous analysis of fractional variants of the least mean squares and steepest descent algorithms is perfor… ▽ More Fractional learning algorithms are trending in signal processing and adaptive filtering recently. However, it is unclear whether the proclaimed superiority over conventional algorithms is well-grounded or is a myth as their performance has never been extensively analyzed. In this article, a rigorous analysis of fractional variants of the least mean squares and steepest descent algorithms is performed. Some critical schematic kinks in fractional learning algorithms are identified. Their origins and consequences on the performance of the learning algorithms are discussed and swift ready-witted remedies are proposed. Apposite numerical experiments are conducted to discuss the convergence and efficiency of the fractional learning algorithms in stochastic environments. △ Less

Submitted 11 October, 2021; originally announced October 2021.

Comments: 29 pages, 6 figures

arXiv:2109.11431 [pdf, other]

Deep Learning for Ultrasound Beamforming

Authors: Ruud JG van Sloun, Jong Chul Ye, Yonina C Eldar

Abstract: Diagnostic imaging plays a critical role in healthcare, serving as a fundamental asset for timely diagnosis, disease staging and management as well as for treatment choice, planning, guidance, and follow-up. Among the diagnostic imaging options, ultrasound imaging is uniquely positioned, being a highly cost-effective modality that offers the clinician an unmatched and invaluable level of interacti… ▽ More Diagnostic imaging plays a critical role in healthcare, serving as a fundamental asset for timely diagnosis, disease staging and management as well as for treatment choice, planning, guidance, and follow-up. Among the diagnostic imaging options, ultrasound imaging is uniquely positioned, being a highly cost-effective modality that offers the clinician an unmatched and invaluable level of interaction, enabled by its real-time nature. Ultrasound probes are becoming increasingly compact and portable, with the market demand for low-cost pocket-sized and (in-body) miniaturized devices expanding. At the same time, there is a strong trend towards 3D imaging and the use of high-frame-rate imaging schemes; both accompanied by dramatically increasing data rates that pose a heavy burden on the probe-system communication and subsequent image reconstruction algorithms. With the demand for high-quality image reconstruction and signal extraction from less (e.g unfocused or parallel) transmissions that facilitate fast imaging, and a push towards compact probes, modern ultrasound imaging leans heavily on innovations in powerful digital receive channel processing. Beamforming, the process of map** received ultrasound echoes to the spatial image domain, naturally lies at the heart of the ultrasound image formation chain. In this chapter on Deep Learning for Ultrasound Beamforming, we discuss why and when deep learning methods can play a compelling role in the digital beamforming pipeline, and then show how these data-driven systems can be leveraged for improved ultrasound image reconstruction. △ Less

Submitted 23 September, 2021; originally announced September 2021.

arXiv:2107.11089 [pdf, other]

A Resolution-Adaptive 8 mm$^\text{2}$ 9.98 Gb/s 39.7 pJ/b 32-Antenna All-Digital Spatial Equalizer for mmWave Massive MU-MIMO in 65nm CMOS

Authors: Oscar Castañeda, Zachariah Boynton, Seyed Hadi Mirfarshbafan, Shimin Huang, Jamie C. Ye, Alyosha Molnar, Christoph Studer

Abstract: All-digital millimeter-wave (mmWave) massive multi-user multiple-input multiple-output (MU-MIMO) receivers enable extreme data rates but require high power consumption. In order to reduce power consumption, this paper presents the first resolution-adaptive all-digital receiver ASIC that is able to adjust the resolution of the data-converters and baseband-processing engine to the instantaneous comm… ▽ More All-digital millimeter-wave (mmWave) massive multi-user multiple-input multiple-output (MU-MIMO) receivers enable extreme data rates but require high power consumption. In order to reduce power consumption, this paper presents the first resolution-adaptive all-digital receiver ASIC that is able to adjust the resolution of the data-converters and baseband-processing engine to the instantaneous communication scenario. The scalable 32-antenna, 65 nm CMOS receiver occupies a total area of 8 mm$^\text{2}$ and integrates analog-to-digital converters (ADCs) with programmable gain and resolution, beamspace channel estimation, and a resolution-adaptive processing-in-memory spatial equalizer. With 6-bit ADC samples and a 4-bit spatial equalizer, our ASIC achieves a throughput of 9.98 Gb/s while being at least 2x more energy-efficient than state-of-the-art designs. △ Less

Submitted 23 July, 2021; originally announced July 2021.

Comments: To be presented at the IEEE European Solid-State Circuits Conference (ESSCIRC) 2021

arXiv:2106.07009 [pdf, other]

Noise2Score: Tweedie's Approach to Self-Supervised Image Denoising without Clean Images

Authors: Kwanyoung Kim, Jong Chul Ye

Abstract: Recently, there has been extensive research interest in training deep networks to denoise images without clean reference. However, the representative approaches such as Noise2Noise, Noise2Void, Stein's unbiased risk estimator (SURE), etc. seem to differ from one another and it is difficult to find the coherent mathematical structure. To address this, here we present a novel approach, called Noise2… ▽ More Recently, there has been extensive research interest in training deep networks to denoise images without clean reference. However, the representative approaches such as Noise2Noise, Noise2Void, Stein's unbiased risk estimator (SURE), etc. seem to differ from one another and it is difficult to find the coherent mathematical structure. To address this, here we present a novel approach, called Noise2Score, which reveals a missing link in order to unite these seemingly different approaches. Specifically, we show that image denoising problems without clean images can be addressed by finding the mode of the posterior distribution and that the Tweedie's formula offers an explicit solution through the score function (i.e. the gradient of log likelihood). Our method then uses the recent finding that the score function can be stably estimated from the noisy images using the amortized residual denoising autoencoder, the method of which is closely related to Noise2Noise or Nose2Void. Our Noise2Score approach is so universal that the same network training can be used to remove noises from images that are corrupted by any exponential family distributions and noise parameters. Using extensive experiments with Gaussian, Poisson, and Gamma noises, we show that Noise2Score significantly outperforms the state-of-the-art self-supervised denoising methods in the benchmark data set such as (C)BSD68, Set12, and Kodak, etc. △ Less

Submitted 27 October, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

Comments: Camera ready version for NeurIPS 2021

arXiv:2105.08040 [pdf, other]

doi 10.1109/MSP.2021.3119273

Unsupervised Deep Learning Methods for Biological Image Reconstruction and Enhancement

Authors: Mehmet Akçakaya, Burhaneddin Yaman, Hyung** Chung, Jong Chul Ye

Abstract: Recently, deep learning approaches have become the main research frontier for biological image reconstruction and enhancement problems thanks to their high performance, along with their ultra-fast inference times. However, due to the difficulty of obtaining matched reference data for supervised learning, there has been increasing interest in unsupervised learning approaches that do not need paired… ▽ More Recently, deep learning approaches have become the main research frontier for biological image reconstruction and enhancement problems thanks to their high performance, along with their ultra-fast inference times. However, due to the difficulty of obtaining matched reference data for supervised learning, there has been increasing interest in unsupervised learning approaches that do not need paired reference data. In particular, self-supervised learning and generative models have been successfully used for various biological imaging applications. In this paper, we overview these approaches from a coherent perspective in the context of classical inverse problems, and discuss their applications to biological imaging, including electron, fluorescence and deconvolution microscopy, optical diffraction tomography and functional neuroimaging. △ Less

Submitted 22 November, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: To appear in IEEE Signal Processing Magazine

Journal ref: IEEE Signal Processing Magazine, 2022

arXiv:2105.00240 [pdf, other]

Simultaneous super-resolution and motion artifact removal in diffusion-weighted MRI using unsupervised deep learning

Authors: Hyung** Chung, Jaehyun Kim, Jeong Hee Yoon, Jeong Min Lee, Jong Chul Ye

Abstract: Diffusion-weighted MRI is nowadays performed routinely due to its prognostic ability, yet the quality of the scans are often unsatisfactory which can subsequently hamper the clinical utility. To overcome the limitations, here we propose a fully unsupervised quality enhancement scheme, which boosts the resolution and removes the motion artifact simultaneously. This process is done by first training… ▽ More Diffusion-weighted MRI is nowadays performed routinely due to its prognostic ability, yet the quality of the scans are often unsatisfactory which can subsequently hamper the clinical utility. To overcome the limitations, here we propose a fully unsupervised quality enhancement scheme, which boosts the resolution and removes the motion artifact simultaneously. This process is done by first training the network using optimal transport driven cycleGAN with stochastic degradation block which learns to remove aliasing artifacts and enhance the resolution, then using the trained network in the test stage by utilizing bootstrap subsampling and aggregation for motion artifact suppression. We further show that we can control the trade-off between the amount of artifact correction and resolution by controlling the bootstrap subsampling ratio at the inference stage. To the best of our knowledge, the proposed method is the first to tackle super-resolution and motion artifact correction simultaneously in the context of MRI using unsupervised learning. We demonstrate the efficiency of our method by applying it to both quantitative evaluation using simulation study, and to in vivo diffusion-weighted MR scans, which shows that our method is superior to the current state-of-the-art methods. The proposed method is flexible in that it can be applied to various quality enhancement schemes in other types of MR scans, and also directly to the quality enhancement of apparent diffusion coefficient maps. △ Less

Submitted 1 May, 2021; originally announced May 2021.

arXiv:2105.00194 [pdf, other]

Feature Disentanglement in generating three-dimensional structure from two-dimensional slice with sliceGAN

Authors: Hyung** Chung, Jong Chul Ye

Abstract: Deep generative models are known to be able to model arbitrary probability distributions. Among these, a recent deep generative model, dubbed sliceGAN, proposed a new way of using the generative adversarial network (GAN) to capture the micro-structural characteristics of a two-dimensional (2D) slice and generate three-dimensional (3D) volumes with similar properties. While 3D micrographs are large… ▽ More Deep generative models are known to be able to model arbitrary probability distributions. Among these, a recent deep generative model, dubbed sliceGAN, proposed a new way of using the generative adversarial network (GAN) to capture the micro-structural characteristics of a two-dimensional (2D) slice and generate three-dimensional (3D) volumes with similar properties. While 3D micrographs are largely beneficial in simulating diverse material behavior, they are often much harder to obtain than their 2D counterparts. Hence, sliceGAN opens up many interesting directions of research by learning the representative distribution from 2D slices, and transferring the learned knowledge to generate arbitrary 3D volumes. However, one limitation of sliceGAN is that latent space steering is not possible. Hence, we combine sliceGAN with AdaIN to endow the model with the ability to disentangle the features and control the synthesis. △ Less

Submitted 1 May, 2021; originally announced May 2021.

arXiv:2104.08538 [pdf, other]

Cycle-free CycleGAN using Invertible Generator for Unsupervised Low-Dose CT Denoising

Authors: Taesung Kwon, Jong Chul Ye

Abstract: Recently, CycleGAN was shown to provide high-performance, ultra-fast denoising for low-dose X-ray computed tomography (CT) without the need for a paired training dataset. Although this was possible thanks to cycle consistency, CycleGAN requires two generators and two discriminators to enforce cycle consistency, demanding significant GPU resources and technical skills for training. A recent proposa… ▽ More Recently, CycleGAN was shown to provide high-performance, ultra-fast denoising for low-dose X-ray computed tomography (CT) without the need for a paired training dataset. Although this was possible thanks to cycle consistency, CycleGAN requires two generators and two discriminators to enforce cycle consistency, demanding significant GPU resources and technical skills for training. A recent proposal of tunable CycleGAN with Adaptive Instance Normalization (AdaIN) alleviates the problem in part by using a single generator. However, two discriminators and an additional AdaIN code generator are still required for training. To solve this problem, here we present a novel cycle-free Cycle-GAN architecture, which consists of a single generator and a discriminator but still guarantees cycle consistency. The main innovation comes from the observation that the use of an invertible generator automatically fulfills the cycle consistency condition and eliminates the additional discriminator in the CycleGAN formulation. To make the invertible generator more effective, our network is implemented in the wavelet residual domain. Extensive experiments using various levels of low-dose CT images confirm that our method can significantly improve denoising performance using only 10% of learnable parameters and faster training time compared to the conventional CycleGAN. △ Less

Submitted 17 April, 2021; originally announced April 2021.

Comments: 12 pages, 12 figures

arXiv:2104.07235 [pdf, other]

Vision Transformer using Low-level Chest X-ray Feature Corpus for COVID-19 Diagnosis and Severity Quantification

Authors: Sangjoon Park, Gwanghyun Kim, Yu** Oh, Joon Beom Seo, Sang Min Lee, ** Hwan Kim, Sungjun Moon, Jae-Kwang Lim, Jong Chul Ye

Abstract: Develo** a robust algorithm to diagnose and quantify the severity of COVID-19 using Chest X-ray (CXR) requires a large number of well-curated COVID-19 datasets, which is difficult to collect under the global COVID-19 pandemic. On the other hand, CXR data with other findings are abundant. This situation is ideally suited for the Vision Transformer (ViT) architecture, where a lot of unlabeled data… ▽ More Develo** a robust algorithm to diagnose and quantify the severity of COVID-19 using Chest X-ray (CXR) requires a large number of well-curated COVID-19 datasets, which is difficult to collect under the global COVID-19 pandemic. On the other hand, CXR data with other findings are abundant. This situation is ideally suited for the Vision Transformer (ViT) architecture, where a lot of unlabeled data can be used through structural modeling by the self-attention mechanism. However, the use of existing ViT is not optimal, since feature embedding through direct patch flattening or ResNet backbone in the standard ViT is not intended for CXR. To address this problem, here we propose a novel Vision Transformer that utilizes low-level CXR feature corpus obtained from a backbone network that extracts common CXR findings. Specifically, the backbone network is first trained with large public datasets to detect common abnormal findings such as consolidation, opacity, edema, etc. Then, the embedded features from the backbone network are used as corpora for a Transformer model for the diagnosis and the severity quantification of COVID-19. We evaluate our model on various external test datasets from totally different institutions to evaluate the generalization capability. The experimental results confirm that our model can achieve the state-of-the-art performance in both diagnosis and severity quantification tasks with superior generalization capability, which are sine qua non of widespread deployment. △ Less

Submitted 15 April, 2021; originally announced April 2021.

Comments: 13 pages

arXiv:2104.05892 [pdf, other]

CXR Segmentation by AdaIN-based Domain Adaptation and Knowledge Distillation

Authors: Yu** Oh, Jong Chul Ye

Abstract: As segmentation labels are scarce, extensive researches have been conducted to train segmentation networks with domain adaptation, semi-supervised or self-supervised learning techniques to utilize abundant unlabeled dataset. However, these approaches appear different from each other, so it is not clear how these approaches can be combined for better performance. Inspired by recent multi-domain ima… ▽ More As segmentation labels are scarce, extensive researches have been conducted to train segmentation networks with domain adaptation, semi-supervised or self-supervised learning techniques to utilize abundant unlabeled dataset. However, these approaches appear different from each other, so it is not clear how these approaches can be combined for better performance. Inspired by recent multi-domain image translation approaches, here we propose a novel segmentation framework using adaptive instance normalization (AdaIN), so that a single generator is trained to perform both domain adaptation and semi-supervised segmentation tasks via knowledge distillation by simply changing task-specific AdaIN codes. Specifically, our framework is designed to deal with difficult situations in chest X-ray radiograph (CXR) segmentation, where labels are only available for normal data, but the trained model should be applied to both normal and abnormal data. The proposed network demonstrates great generalizability under domain shift and achieves the state-of-the-art performance for abnormal CXR segmentation. △ Less

Submitted 11 October, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: Accepted to ECCV 2022

Journal ref: ECCV 2022, Part XXI, LNCS 13681

arXiv:2104.02895 [pdf, other]

doi 10.1007/978-3-030-67070-2_12

PyNET-CA: Enhanced PyNET with Channel Attention for End-to-End Mobile Image Signal Processing

Authors: Byung-Hoon Kim, Joonyoung Song, Jong Chul Ye, JaeHyun Baek

Abstract: Reconstructing RGB image from RAW data obtained with a mobile device is related to a number of image signal processing (ISP) tasks, such as demosaicing, denoising, etc. Deep neural networks have shown promising results over hand-crafted ISP algorithms on solving these tasks separately, or even replacing the whole reconstruction process with one model. Here, we propose PyNET-CA, an end-to-end mobil… ▽ More Reconstructing RGB image from RAW data obtained with a mobile device is related to a number of image signal processing (ISP) tasks, such as demosaicing, denoising, etc. Deep neural networks have shown promising results over hand-crafted ISP algorithms on solving these tasks separately, or even replacing the whole reconstruction process with one model. Here, we propose PyNET-CA, an end-to-end mobile ISP deep learning algorithm for RAW to RGB reconstruction. The model enhances PyNET, a recently proposed state-of-the-art model for mobile ISP, and improve its performance with channel attention and subpixel reconstruction module. We demonstrate the performance of the proposed method with comparative experiments and results from the AIM 2020 learned smartphone ISP challenge. The source code of our implementation is available at https://github.com/egyptdj/skyb-aim2020-public △ Less

Submitted 6 April, 2021; originally announced April 2021.

Comments: ECCV 2020 AIM workshop accepted version

arXiv:2103.09022 [pdf, other]

Missing Cone Artifacts Removal in ODT using Unsupervised Deep Learning in Projection Domain

Authors: Hyung** Chung, Jaeyoung Huh, Geon Kim, Yong Keun Park, Jong Chul Ye

Abstract: Optical diffraction tomography (ODT) produces three dimensional distribution of refractive index (RI) by measuring scattering fields at various angles. Although the distribution of RI index is highly informative, due to the missing cone problem stemming from the limited-angle acquisition of holograms, reconstructions have very poor resolution along axial direction compared to the horizontal imagin… ▽ More Optical diffraction tomography (ODT) produces three dimensional distribution of refractive index (RI) by measuring scattering fields at various angles. Although the distribution of RI index is highly informative, due to the missing cone problem stemming from the limited-angle acquisition of holograms, reconstructions have very poor resolution along axial direction compared to the horizontal imaging plane. To solve this issue, here we present a novel unsupervised deep learning framework, which learns the probability distribution of missing projection views through optimal transport driven cycleGAN. Experimental results show that missing cone artifact in ODT can be significantly resolved by the proposed method. △ Less

Submitted 18 July, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

Comments: This will appear in IEEE Trans. on Computational Imaging

arXiv:2103.07062 [pdf, other]

Severity Quantification and Lesion Localization of COVID-19 on CXR using Vision Transformer

Authors: Gwanghyun Kim, Sangjoon Park, Yu** Oh, Joon Beom Seo, Sang Min Lee, ** Hwan Kim, Sungjun Moon, Jae-Kwang Lim, Jong Chul Ye

Abstract: Under the global pandemic of COVID-19, building an automated framework that quantifies the severity of COVID-19 and localizes the relevant lesion on chest X-ray images has become increasingly important. Although pixel-level lesion severity labels, e.g. lesion segmentation, can be the most excellent target to build a robust model, collecting enough data with such labels is difficult due to time and… ▽ More Under the global pandemic of COVID-19, building an automated framework that quantifies the severity of COVID-19 and localizes the relevant lesion on chest X-ray images has become increasingly important. Although pixel-level lesion severity labels, e.g. lesion segmentation, can be the most excellent target to build a robust model, collecting enough data with such labels is difficult due to time and labor-intensive annotation tasks. Instead, array-based severity labeling that assigns integer scores on six subdivisions of lungs can be an alternative choice enabling the quick labeling. Several groups proposed deep learning algorithms that quantify the severity of COVID-19 using the array-based COVID-19 labels and localize the lesions with explainability maps. To further improve the accuracy and interpretability, here we propose a novel Vision Transformer tailored for both quantification of the severity and clinically applicable localization of the COVID-19 related lesions. Our model is trained in a weakly-supervised manner to generate the full probability maps from weak array-based labels. Furthermore, a novel progressive self-training method enables us to build a model with a small labeled dataset. The quantitative and qualitative analysis on the external testset demonstrates that our method shows comparable performance with radiologists for both tasks with stability in a real-world application. △ Less

Submitted 11 March, 2021; originally announced March 2021.

Comments: 8 pages

arXiv:2103.07055 [pdf, other]

Vision Transformer for COVID-19 CXR Diagnosis using Chest X-ray Feature Corpus

Authors: Sangjoon Park, Gwanghyun Kim, Yu** Oh, Joon Beom Seo, Sang Min Lee, ** Hwan Kim, Sungjun Moon, Jae-Kwang Lim, Jong Chul Ye

Abstract: Under the global COVID-19 crisis, develo** robust diagnosis algorithm for COVID-19 using CXR is hampered by the lack of the well-curated COVID-19 data set, although CXR data with other disease are abundant. This situation is suitable for vision transformer architecture that can exploit the abundant unlabeled data using pre-training. However, the direct use of existing vision transformer that use… ▽ More Under the global COVID-19 crisis, develo** robust diagnosis algorithm for COVID-19 using CXR is hampered by the lack of the well-curated COVID-19 data set, although CXR data with other disease are abundant. This situation is suitable for vision transformer architecture that can exploit the abundant unlabeled data using pre-training. However, the direct use of existing vision transformer that uses the corpus generated by the ResNet is not optimal for correct feature embedding. To mitigate this problem, we propose a novel vision Transformer by using the low-level CXR feature corpus that are obtained to extract the abnormal CXR features. Specifically, the backbone network is trained using large public datasets to obtain the abnormal features in routine diagnosis such as consolidation, glass-grass opacity (GGO), etc. Then, the embedded features from the backbone network are used as corpus for vision transformer training. We examine our model on various external test datasets acquired from totally different institutions to assess the generalization ability. Our experiments demonstrate that our method achieved the state-of-art performance and has better generalization capability, which are crucial for a widespread deployment. △ Less

Submitted 11 March, 2021; originally announced March 2021.

Comments: 10 pages

arXiv:2011.10475 [pdf, other]

DeepPhaseCut: Deep Relaxation in Phase for Unsupervised Fourier Phase Retrieval

Authors: Eunju Cha, Chanseok Lee, Mooseok Jang, Jong Chul Ye

Abstract: Fourier phase retrieval is a classical problem of restoring a signal only from the measured magnitude of its Fourier transform. Although Fienup-type algorithms, which use prior knowledge in both spatial and Fourier domains, have been widely used in practice, they can often stall in local minima. Modern methods such as PhaseLift and PhaseCut may offer performance guarantees with the help of convex… ▽ More Fourier phase retrieval is a classical problem of restoring a signal only from the measured magnitude of its Fourier transform. Although Fienup-type algorithms, which use prior knowledge in both spatial and Fourier domains, have been widely used in practice, they can often stall in local minima. Modern methods such as PhaseLift and PhaseCut may offer performance guarantees with the help of convex relaxation. However, these algorithms are usually computationally intensive for practical use. To address this problem, we propose a novel, unsupervised, feed-forward neural network for Fourier phase retrieval which enables immediate high quality reconstruction. Unlike the existing deep learning approaches that use a neural network as a regularization term or an end-to-end blackbox model for supervised training, our algorithm is a feed-forward neural network implementation of PhaseCut algorithm in an unsupervised learning framework. Specifically, our network is composed of two generators: one for the phase estimation using PhaseCut loss, followed by another generator for image reconstruction, all of which are trained simultaneously using a cycleGAN framework without matched data. The link to the classical Fienup-type algorithms and the recent symmetry-breaking learning approach is also revealed. Extensive experiments demonstrate that the proposed method outperforms all existing approaches in Fourier phase retrieval problems. △ Less

Submitted 20 November, 2020; originally announced November 2020.

arXiv:2011.06337 [pdf, other]

Unsupervised MR Motion Artifact Deep Learning using Outlier-Rejecting Bootstrap Aggregation

Authors: Gyutaek Oh, Jeong Eun Lee, Jong Chul Ye

Abstract: Recently, deep learning approaches for MR motion artifact correction have been extensively studied. Although these approaches have shown high performance and reduced computational complexity compared to classical methods, most of them require supervised training using paired artifact-free and artifact-corrupted images, which may prohibit its use in many important clinical applications. For example… ▽ More Recently, deep learning approaches for MR motion artifact correction have been extensively studied. Although these approaches have shown high performance and reduced computational complexity compared to classical methods, most of them require supervised training using paired artifact-free and artifact-corrupted images, which may prohibit its use in many important clinical applications. For example, transient severe motion (TSM) due to acute transient dyspnea in Gd-EOB-DTPA-enhanced MR is difficult to control and model for paired data generation. To address this issue, here we propose a novel unsupervised deep learning scheme through outlier-rejecting bootstrap subsampling and aggregation. This is inspired by the observation that motions usually cause sparse k-space outliers in the phase encoding direction, so k-space subsampling along the phase encoding direction can remove some outliers and the aggregation step can further improve the results from the reconstruction network. Our method does not require any paired data because the training step only requires artifact-free images. Furthermore, to address the smoothing from potential bias to the artifact-free images, the network is trained in an unsupervised manner using optimal transport driven cycleGAN. We verify that our method can be applied for artifact correction from simulated motion as well as real motion from TSM successfully, outperforming existing state-of-the-art deep learning methods. △ Less

Submitted 12 November, 2020; originally announced November 2020.

arXiv:2011.04994 [pdf, other]

AIM 2020 Challenge on Learned Image Signal Processing Pipeline

Authors: Andrey Ignatov, Radu Timofte, Zhilu Zhang, Ming Liu, Haolin Wang, Wangmeng Zuo, Jiawei Zhang, Ruimao Zhang, Zhanglin Peng, Sijie Ren, Linhui Dai, Xiaohong Liu, Chengqi Li, Jun Chen, Yuichi Ito, Bhavya Vasudeva, Puneesh Deora, Umapada Pal, Zhenyu Guo, Yu Zhu, Tian Liang, Chenghua Li, Cong Leng, Zhihong Pan, Baopu Li , et al. (14 additional authors not shown)

Abstract: This paper reviews the second AIM learned ISP challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world RAW-to-RGB map** problem, where to goal was to map the original low-quality RAW images captured by the Huawei P20 device to the same photos obtained with the Canon 5D DSLR camera. The considered task embraced a number of com… ▽ More This paper reviews the second AIM learned ISP challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world RAW-to-RGB map** problem, where to goal was to map the original low-quality RAW images captured by the Huawei P20 device to the same photos obtained with the Canon 5D DSLR camera. The considered task embraced a number of complex computer vision subtasks, such as image demosaicing, denoising, white balancing, color and contrast correction, demoireing, etc. The target metric used in this challenge combined fidelity scores (PSNR and SSIM) with solutions' perceptual results measured in a user study. The proposed solutions significantly improved the baseline results, defining the state-of-the-art for practical image signal processing pipeline modeling. △ Less

Submitted 10 November, 2020; originally announced November 2020.

Comments: Published in ECCV 2020 Workshops (Advances in Image Manipulation), https://data.vision.ee.ethz.ch/cvl/aim20/

arXiv:2009.13777 [pdf]

DeepRegularizer: Rapid Resolution Enhancement of Tomographic Imaging using Deep Learning

Authors: DongHun Ryu, Dongmin Ryu, YoonSeok Baek, Hyungjoo Cho, Geon Kim, Young Seo Kim, Yongki Lee, Yoosik Kim, Jong Chul Ye, Hyun-Seok Min, YongKeun Park

Abstract: Optical diffraction tomography measures the three-dimensional refractive index map of a specimen and visualizes biochemical phenomena at the nanoscale in a non-destructive manner. One major drawback of optical diffraction tomography is poor axial resolution due to limited access to the three-dimensional optical transfer function. This missing cone problem has been addressed through regularization… ▽ More Optical diffraction tomography measures the three-dimensional refractive index map of a specimen and visualizes biochemical phenomena at the nanoscale in a non-destructive manner. One major drawback of optical diffraction tomography is poor axial resolution due to limited access to the three-dimensional optical transfer function. This missing cone problem has been addressed through regularization algorithms that use a priori information, such as non-negativity and sample smoothness. However, the iterative nature of these algorithms and their parameter dependency make real-time visualization impossible. In this article, we propose and experimentally demonstrate a deep neural network, which we term DeepRegularizer, that rapidly improves the resolution of a three-dimensional refractive index map. Trained with pairs of datasets (a raw refractive index tomogram and a resolution-enhanced refractive index tomogram via the iterative total variation algorithm), the three-dimensional U-net-based convolutional neural network learns a transformation between the two tomogram domains. The feasibility and generalizability of our network are demonstrated using bacterial cells and a human leukaemic cell line, and by validating the model across different samples. DeepRegularizer offers more than an order of magnitude faster regularization performance compared to the conventional iterative method. We envision that the proposed data-driven approach can bypass the high time complexity of various image reconstructions in other imaging modalities. △ Less

Submitted 29 September, 2020; originally announced September 2020.

arXiv:2008.13646 [pdf, other]

Switchable Deep Beamformer

Authors: Shujaat Khan, Jaeyoung Huh, Jong Chul Ye

Abstract: Recent proposals of deep beamformers using deep neural networks have attracted significant attention as computational efficient alternatives to adaptive and compressive beamformers. Moreover, deep beamformers are versatile in that image post-processing algorithms can be combined with the beamforming. Unfortunately, in the current technology, a separate beamformer should be trained and stored for e… ▽ More Recent proposals of deep beamformers using deep neural networks have attracted significant attention as computational efficient alternatives to adaptive and compressive beamformers. Moreover, deep beamformers are versatile in that image post-processing algorithms can be combined with the beamforming. Unfortunately, in the current technology, a separate beamformer should be trained and stored for each application, demanding significant scanner resources. To address this problem, here we propose a {\em switchable} deep beamformer that can produce various types of output such as DAS, speckle removal, deconvolution, etc., using a single network with a simple switch. In particular, the switch is implemented through Adaptive Instance Normalization (AdaIN) layers, so that various output can be generated by merely changing the AdaIN code. Experimental results using B-mode focused ultrasound confirm the flexibility and efficacy of the proposed methods for various applications. △ Less

Submitted 4 September, 2020; v1 submitted 31 August, 2020; originally announced August 2020.

arXiv:2008.12967 [pdf, other]

Unpaired Deep Learning for Accelerated MRI using Optimal Transport Driven CycleGAN

Authors: Gyutaek Oh, Byeongsu Sim, Hyung** Chung, Leonard Sunwoo, Jong Chul Ye

Abstract: Recently, deep learning approaches for accelerated MRI have been extensively studied thanks to their high performance reconstruction in spite of significantly reduced runtime complexity. These neural networks are usually trained in a supervised manner, so matched pairs of subsampled and fully sampled k-space data are required. Unfortunately, it is often difficult to acquire matched fully sampled k… ▽ More Recently, deep learning approaches for accelerated MRI have been extensively studied thanks to their high performance reconstruction in spite of significantly reduced runtime complexity. These neural networks are usually trained in a supervised manner, so matched pairs of subsampled and fully sampled k-space data are required. Unfortunately, it is often difficult to acquire matched fully sampled k-space data, since the acquisition of fully sampled k-space data requires long scan time and often leads to the change of the acquisition protocol. Therefore, unpaired deep learning without matched label data has become a very important research topic. In this paper, we propose an unpaired deep learning approach using a optimal transport driven cycle-consistent generative adversarial network (OT-cycleGAN) that employs a single pair of generator and discriminator. The proposed OT-cycleGAN architecture is rigorously derived from a dual formulation of the optimal transport formulation using a specially designed penalized least squares cost. The experimental results show that our method can reconstruct high resolution MR images from accelerated k- space data from both single and multiple coil acquisition, without requiring matched reference data. △ Less

Submitted 29 August, 2020; originally announced August 2020.

Comments: Accepted for IEEE Transactions on Computational Imaging

arXiv:2008.05772 [pdf, other]

CycleMorph: Cycle Consistent Unsupervised Deformable Image Registration

Authors: Boah Kim, Dong Hwan Kim, Seong Ho Park, Jieun Kim, June-Goo Lee, Jong Chul Ye

Abstract: Image registration is a fundamental task in medical image analysis. Recently, deep learning based image registration methods have been extensively investigated due to their excellent performance despite the ultra-fast computational time. However, the existing deep learning methods still have limitation in the preservation of original topology during the deformation with registration vector fields.… ▽ More Image registration is a fundamental task in medical image analysis. Recently, deep learning based image registration methods have been extensively investigated due to their excellent performance despite the ultra-fast computational time. However, the existing deep learning methods still have limitation in the preservation of original topology during the deformation with registration vector fields. To address this issues, here we present a cycle-consistent deformable image registration. The cycle consistency enhances image registration performance by providing an implicit regularization to preserve topology during the deformation. The proposed method is so flexible that can be applied for both 2D and 3D registration problems for various applications, and can be easily extended to multi-scale implementation to deal with the memory issues in large volume registration. Experimental results on various datasets from medical and non-medical applications demonstrate that the proposed method provides effective and accurate registration on diverse image pairs within a few seconds. Qualitative and quantitative evaluations on deformation fields also verify the effectiveness of the cycle consistency of the proposed method. △ Less

Submitted 13 August, 2020; originally announced August 2020.

arXiv:2008.05753 [pdf, other]

AdaIN-Switchable CycleGAN for Efficient Unsupervised Low-Dose CT Denoising

Authors: Jawook Gu, Jong Chul Ye

Abstract: Recently, deep learning approaches have been extensively studied for low-dose CT denoising thanks to its superior performance despite the fast computational time. In particular, cycleGAN has been demonstrated as a powerful unsupervised learning scheme to improve the low-dose CT image quality without requiring matched high-dose reference data. Unfortunately, one of the main limitations of the cycle… ▽ More Recently, deep learning approaches have been extensively studied for low-dose CT denoising thanks to its superior performance despite the fast computational time. In particular, cycleGAN has been demonstrated as a powerful unsupervised learning scheme to improve the low-dose CT image quality without requiring matched high-dose reference data. Unfortunately, one of the main limitations of the cycleGAN approach is that it requires two deep neural network generators at the training phase, although only one of them is used at the inference phase. The secondary auxiliary generator is needed to enforce the cycle-consistency, but the additional memory requirement and increases of the learnable parameters are the main huddles for cycleGAN training. To address this issue, here we propose a novel cycleGAN architecture using a single switchable generator. In particular, a single generator is implemented using adaptive instance normalization (AdaIN) layers so that the baseline generator converting a low-dose CT image to a routine-dose CT image can be switched to a generator converting high-dose to low-dose by simply changing the AdaIN code. Thanks to the shared baseline network, the additional memory requirement and weight increases are minimized, and the training can be done more stably even with small training data. Experimental results show that the proposed method outperforms the previous cycleGAN approaches while using only about half the parameters. △ Less

Submitted 13 August, 2020; originally announced August 2020.

Comments: 12 pages, 10 figures

arXiv:2008.01362 [pdf, other]

Two-Stage Deep Learning for Accelerated 3D Time-of-Flight MRA without Matched Training Data

Authors: Hyung** Chung, Eunju Cha, Leonard Sunwoo, Jong Chul Ye

Abstract: Time-of-flight magnetic resonance angiography (TOF-MRA) is one of the most widely used non-contrast MR imaging methods to visualize blood vessels, but due to the 3-D volume acquisition highly accelerated acquisition is necessary. Accordingly, high quality reconstruction from undersampled TOF-MRA is an important research topic for deep learning. However, most existing deep learning works require ma… ▽ More Time-of-flight magnetic resonance angiography (TOF-MRA) is one of the most widely used non-contrast MR imaging methods to visualize blood vessels, but due to the 3-D volume acquisition highly accelerated acquisition is necessary. Accordingly, high quality reconstruction from undersampled TOF-MRA is an important research topic for deep learning. However, most existing deep learning works require matched reference data for supervised training, which are often difficult to obtain. By extending the recent theoretical understanding of cycleGAN from the optimal transport theory, here we propose a novel two-stage unsupervised deep learning approach, which is composed of the multi-coil reconstruction network along the coronal plane followed by a multi-planar refinement network along the axial plane. Specifically, the first network is trained in the square-root of sum of squares (SSoS) domain to achieve high quality parallel image reconstruction, whereas the second refinement network is designed to efficiently learn the characteristics of highly-activated blood flow using double-headed max-pool discriminator. Extensive experiments demonstrate that the proposed learning process without matched reference exceeds performance of state-of-the-art compressed sensing (CS)-based method and provides comparable or even better results than supervised learning approaches. △ Less

Submitted 4 August, 2020; originally announced August 2020.

arXiv:2007.05205 [pdf, other]

OT-driven Multi-Domain Unsupervised Ultrasound Image Artifact Removal using a Single CNN

Authors: Jaeyoung Huh, Shujaat Khan, Jong Chul Ye

Abstract: Ultrasound imaging (US) often suffers from distinct image artifacts from various sources. Classic approaches for solving these problems are usually model-based iterative approaches that have been developed specifically for each type of artifact, which are often computationally intensive. Recently, deep learning approaches have been proposed as computationally efficient and high performance alterna… ▽ More Ultrasound imaging (US) often suffers from distinct image artifacts from various sources. Classic approaches for solving these problems are usually model-based iterative approaches that have been developed specifically for each type of artifact, which are often computationally intensive. Recently, deep learning approaches have been proposed as computationally efficient and high performance alternatives. Unfortunately, in the current deep learning approaches, a dedicated neural network should be trained with matched training data for each specific artifact type. This poses a fundamental limitation in the practical use of deep learning for US, since large number of models should be stored to deal with various US image artifacts. Inspired by the recent success of multi-domain image transfer, here we propose a novel, unsupervised, deep learning approach in which a single neural network can be used to deal with different types of US artifacts simply by changing a mask vector that switches between different target domains. Our algorithm is rigorously derived using an optimal transport (OT) theory for cascaded probability measures. Experimental results using phantom and in vivo data demonstrate that the proposed method can generate high quality image by removing distinct artifacts, which are comparable to those obtained by separately trained multiple neural networks. △ Less

Submitted 10 July, 2020; originally announced July 2020.

arXiv:2007.03480 [pdf, other]

Unsupervised CT Metal Artifact Learning using Attention-guided beta-CycleGAN

Authors: Junghyun Lee, Jawook Gu, Jong Chul Ye

Abstract: Metal artifact reduction (MAR) is one of the most important research topics in computed tomography (CT). With the advance of deep learning technology for image reconstruction,various deep learning methods have been also suggested for metal artifact removal, among which supervised learning methods are most popular. However, matched non-metal and metal image pairs are difficult to obtain in real CT… ▽ More Metal artifact reduction (MAR) is one of the most important research topics in computed tomography (CT). With the advance of deep learning technology for image reconstruction,various deep learning methods have been also suggested for metal artifact removal, among which supervised learning methods are most popular. However, matched non-metal and metal image pairs are difficult to obtain in real CT acquisition. Recently, a promising unsupervised learning for MAR was proposed using feature disentanglement, but the resulting network architecture is complication and difficult to handle large size clinical images. To address this, here we propose a much simpler and much effective unsupervised MAR method for CT. The proposed method is based on a novel beta-cycleGAN architecture derived from the optimal transport theory for appropriate feature space disentanglement. Another important contribution is to show that attention mechanism is the key element to effectively remove the metal artifacts. Specifically, by adding the convolutional block attention module (CBAM) layers with a proper disentanglement parameter, experimental results confirm that we can get more improved MAR that preserves the detailed texture of the original image. △ Less

Submitted 7 July, 2020; originally announced July 2020.

arXiv:2006.14773 [pdf, other]

Pushing the Limit of Unsupervised Learning for Ultrasound Image Artifact Removal

Authors: Shujaat Khan, Jaeyoung Huh, Jong Chul Ye

Abstract: Ultrasound (US) imaging is a fast and non-invasive imaging modality which is widely used for real-time clinical imaging applications without concerning about radiation hazard. Unfortunately, it often suffers from poor visual quality from various origins, such as speckle noises, blurring, multi-line acquisition (MLA), limited RF channels, small number of view angles for the case of plane wave imagi… ▽ More Ultrasound (US) imaging is a fast and non-invasive imaging modality which is widely used for real-time clinical imaging applications without concerning about radiation hazard. Unfortunately, it often suffers from poor visual quality from various origins, such as speckle noises, blurring, multi-line acquisition (MLA), limited RF channels, small number of view angles for the case of plane wave imaging, etc. Classical methods to deal with these problems include image-domain signal processing approaches using various adaptive filtering and model-based approaches. Recently, deep learning approaches have been successfully used for ultrasound imaging field. However, one of the limitations of these approaches is that paired high quality images for supervised training are difficult to obtain in many practical applications. In this paper, inspired by the recent theory of unsupervised learning using optimal transport driven cycleGAN (OT-cycleGAN), we investigate applicability of unsupervised deep learning for US artifact removal problems without matched reference data. Experimental results for various tasks such as deconvolution, speckle removal, limited data artifact removal, etc. confirmed that our unsupervised learning method provides comparable results to supervised learning for many practical applications. △ Less

Submitted 25 June, 2020; originally announced June 2020.

Showing 1–50 of 67 results for author: Ye, J C