-
CAVM: Conditional Autoregressive Vision Model for Contrast-Enhanced Brain Tumor MRI Synthesis
Authors:
Lujun Gui,
Chuyang Ye,
Tianyi Yan
Abstract:
Contrast-enhanced magnetic resonance imaging (MRI) is pivotal in the pipeline of brain tumor segmentation and analysis. Gadolinium-based contrast agents, as the most commonly used contrast agents, are expensive and may have potential side effects, and it is desired to obtain contrast-enhanced brain tumor MRI scans without the actual use of contrast agents. Deep learning methods have been applied t…
▽ More
Contrast-enhanced magnetic resonance imaging (MRI) is pivotal in the pipeline of brain tumor segmentation and analysis. Gadolinium-based contrast agents, as the most commonly used contrast agents, are expensive and may have potential side effects, and it is desired to obtain contrast-enhanced brain tumor MRI scans without the actual use of contrast agents. Deep learning methods have been applied to synthesize virtual contrast-enhanced MRI scans from non-contrast images. However, as this synthesis problem is inherently ill-posed, these methods fall short in producing high-quality results. In this work, we propose Conditional Autoregressive Vision Model (CAVM) for improving the synthesis of contrast-enhanced brain tumor MRI. As the enhancement of image intensity grows with a higher dose of contrast agents, we assume that it is less challenging to synthesize a virtual image with a lower dose, where the difference between the contrast-enhanced and non-contrast images is smaller. Thus, CAVM gradually increases the contrast agent dosage and produces higher-dose images based on previous lower-dose ones until the final desired dose is achieved. Inspired by the resemblance between the gradual dose increase and the Chain-of-Thought approach in natural language processing, CAVM uses an autoregressive strategy with a decomposition tokenizer and a decoder. Specifically, the tokenizer is applied to obtain a more compact image representation for computational efficiency, and it decomposes the image into dose-variant and dose-invariant tokens. Then, a masked self-attention mechanism is developed for autoregression that gradually increases the dose of the virtual image based on the dose-variant tokens. Finally, the updated dose-variant tokens corresponding to the desired dose are decoded together with dose-invariant tokens to produce the final contrast-enhanced MRI.
△ Less
Submitted 23 June, 2024;
originally announced June 2024.
-
A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts
Authors:
Xinru Zhang,
Ni Ou,
Berke Doga Basaran,
Marco Visentin,
Mengyun Qiao,
Renyang Gu,
Cheng Ouyang,
Yaou Liu,
Paul M. Matthew,
Chuyang Ye,
Wenjia Bai
Abstract:
Brain lesion segmentation plays an essential role in neurological research and diagnosis. As brain lesions can be caused by various pathological alterations, different types of brain lesions tend to manifest with different characteristics on different imaging modalities. Due to this complexity, brain lesion segmentation methods are often developed in a task-specific manner. A specific segmentation…
▽ More
Brain lesion segmentation plays an essential role in neurological research and diagnosis. As brain lesions can be caused by various pathological alterations, different types of brain lesions tend to manifest with different characteristics on different imaging modalities. Due to this complexity, brain lesion segmentation methods are often developed in a task-specific manner. A specific segmentation model is developed for a particular lesion type and imaging modality. However, the use of task-specific models requires predetermination of the lesion type and imaging modality, which complicates their deployment in real-world scenarios. In this work, we propose a universal foundation model for 3D brain lesion segmentation, which can automatically segment different types of brain lesions for input data of various imaging modalities. We formulate a novel Mixture of Modality Experts (MoME) framework with multiple expert networks attending to different imaging modalities. A hierarchical gating network combines the expert predictions and fosters expertise collaboration. Furthermore, we introduce a curriculum learning strategy during training to avoid the degeneration of each expert network and preserve their specialization. We evaluated the proposed method on nine brain lesion datasets, encompassing five imaging modalities and eight lesion types. The results show that our model outperforms state-of-the-art universal models and provides promising generalization to unseen datasets.
△ Less
Submitted 16 May, 2024;
originally announced May 2024.
-
Advancing Brain Tumor Inpainting with Generative Models
Authors:
Ruizhi Zhu,
Xinru Zhang,
Haowen Pang,
Chundan Xu,
Chuyang Ye
Abstract:
Synthesizing healthy brain scans from diseased brain scans offers a potential solution to address the limitations of general-purpose algorithms, such as tissue segmentation and brain extraction algorithms, which may not effectively handle diseased images. We consider this a 3D inpainting task and investigate the adaptation of 2D inpainting methods to meet the requirements of 3D magnetic resonance…
▽ More
Synthesizing healthy brain scans from diseased brain scans offers a potential solution to address the limitations of general-purpose algorithms, such as tissue segmentation and brain extraction algorithms, which may not effectively handle diseased images. We consider this a 3D inpainting task and investigate the adaptation of 2D inpainting methods to meet the requirements of 3D magnetic resonance imaging(MRI) data. Our contributions encompass potential modifications tailored to MRI-specific needs, and we conducted evaluations of multiple inpainting techniques using the BraTS2023 Inpainting datasets to assess their efficacy and limitations.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Breast Ultrasound Report Generation using LangChain
Authors:
Jaeyoung Huh,
Hyun Jeong Park,
Jong Chul Ye
Abstract:
Breast ultrasound (BUS) is a critical diagnostic tool in the field of breast imaging, aiding in the early detection and characterization of breast abnormalities. Interpreting breast ultrasound images commonly involves creating comprehensive medical reports, containing vital information to promptly assess the patient's condition. However, the ultrasound imaging system necessitates capturing multipl…
▽ More
Breast ultrasound (BUS) is a critical diagnostic tool in the field of breast imaging, aiding in the early detection and characterization of breast abnormalities. Interpreting breast ultrasound images commonly involves creating comprehensive medical reports, containing vital information to promptly assess the patient's condition. However, the ultrasound imaging system necessitates capturing multiple images of various parts to compile a single report, presenting a time-consuming challenge. To address this problem, we propose the integration of multiple image analysis tools through a LangChain using Large Language Models (LLM), into the breast reporting process. Through a combination of designated tools and text generation through LangChain, our method can accurately extract relevant features from ultrasound images, interpret them in a clinical context, and produce comprehensive and standardized reports. This approach not only reduces the burden on radiologists and healthcare professionals but also enhances the consistency and quality of reports. The extensive experiments shows that each tools involved in the proposed method can offer qualitatively and quantitatively significant results. Furthermore, clinical evaluation on the generated reports demonstrates that the proposed method can make report in clinically meaningful way.
△ Less
Submitted 4 December, 2023;
originally announced December 2023.
-
LLM-driven Multimodal Target Volume Contouring in Radiation Oncology
Authors:
Yu** Oh,
Sangjoon Park,
Hwa Kyung Byun,
Yeona Cho,
Ik Jae Lee,
** Sung Kim,
Jong Chul Ye
Abstract:
Target volume contouring for radiation therapy is considered significantly more challenging than the normal organ segmentation tasks as it necessitates the utilization of both image and text-based clinical information. Inspired by the recent advancement of large language models (LLMs) that can facilitate the integration of the textural information and images, here we present a novel LLM-driven mul…
▽ More
Target volume contouring for radiation therapy is considered significantly more challenging than the normal organ segmentation tasks as it necessitates the utilization of both image and text-based clinical information. Inspired by the recent advancement of large language models (LLMs) that can facilitate the integration of the textural information and images, here we present a novel LLM-driven multimodal AI, namely LLMSeg, that utilizes the clinical text information and is applicable to the challenging task of target volume contouring for radiation therapy, and validate it within the context of breast cancer radiation therapy target volume contouring. Using external validation and data-insufficient environments, which attributes highly conducive to real-world applications, we demonstrate that the proposed model exhibits markedly improved performance compared to conventional unimodal AI models, particularly exhibiting robust generalization performance and data efficiency. To our best knowledge, this is the first LLM-driven multimodal AI model that integrates the clinical text information into target volume delineation for radiation oncology.
△ Less
Submitted 15 April, 2024; v1 submitted 3 November, 2023;
originally announced November 2023.
-
Learned, Uncertainty-driven Adaptive Acquisition for Photon-Efficient Multiphoton Microscopy
Authors:
Cassandra Tong Ye,
Jiashu Han,
Kunzan Liu,
Anastasios Angelopoulos,
Linda Griffith,
Kristina Monakhova,
Sixian You
Abstract:
Multiphoton microscopy (MPM) is a powerful imaging tool that has been a critical enabler for live tissue imaging. However, since most multiphoton microscopy platforms rely on point scanning, there is an inherent trade-off between acquisition time, field of view (FOV), phototoxicity, and image quality, often resulting in noisy measurements when fast, large FOV, and/or gentle imaging is needed. Deep…
▽ More
Multiphoton microscopy (MPM) is a powerful imaging tool that has been a critical enabler for live tissue imaging. However, since most multiphoton microscopy platforms rely on point scanning, there is an inherent trade-off between acquisition time, field of view (FOV), phototoxicity, and image quality, often resulting in noisy measurements when fast, large FOV, and/or gentle imaging is needed. Deep learning could be used to denoise multiphoton microscopy measurements, but these algorithms can be prone to hallucination, which can be disastrous for medical and scientific applications. We propose a method to simultaneously denoise and predict pixel-wise uncertainty for multiphoton imaging measurements, improving algorithm trustworthiness and providing statistical guarantees for the deep learning predictions. Furthermore, we propose to leverage this learned, pixel-wise uncertainty to drive an adaptive acquisition technique that rescans only the most uncertain regions of a sample. We demonstrate our method on experimental noisy MPM measurements of human endometrium tissues, showing that we can maintain fine features and outperform other denoising methods while predicting uncertainty at each pixel. Finally, with our adaptive acquisition technique, we demonstrate a 120X reduction in acquisition time and total light dose while successfully recovering fine features in the sample. We are the first to demonstrate distribution-free uncertainty quantification for a denoising task with real experimental data and the first to propose adaptive acquisition based on reconstruction uncertainty
△ Less
Submitted 24 October, 2023;
originally announced October 2023.
-
Progressive Dual Priori Network for Generalized Breast Tumor Segmentation
Authors:
Li Wang,
Lihui Wang,
Zixiang Kuai,
Lei Tang,
Yingfeng Ou,
Chen Ye,
Yuemin Zhu
Abstract:
To promote the generalization ability of breast tumor segmentation models, as well as to improve the segmentation performance for breast tumors with smaller size, low-contrast and irregular shape, we propose a progressive dual priori network (PDPNet) to segment breast tumors from dynamic enhanced magnetic resonance images (DCE-MRI) acquired at different centers. The PDPNet first cropped tumor regi…
▽ More
To promote the generalization ability of breast tumor segmentation models, as well as to improve the segmentation performance for breast tumors with smaller size, low-contrast and irregular shape, we propose a progressive dual priori network (PDPNet) to segment breast tumors from dynamic enhanced magnetic resonance images (DCE-MRI) acquired at different centers. The PDPNet first cropped tumor regions with a coarse-segmentation based localization module, then the breast tumor mask was progressively refined by using the weak semantic priori and cross-scale correlation prior knowledge. To validate the effectiveness of PDPNet, we compared it with several state-of-the-art methods on multi-center datasets. The results showed that, comparing against the suboptimal method, the DSC and HD95 of PDPNet were improved at least by 5.13% and 7.58% respectively on multi-center test sets. In addition, through ablations, we demonstrated that the proposed localization module can decrease the influence of normal tissues and therefore improve the generalization ability of the model. The weak semantic priors allow focusing on tumor regions to avoid missing small tumors and low-contrast tumors. The cross-scale correlation priors are beneficial for promoting the shape-aware ability for irregular tumors. Thus integrating them in a unified framework improved the multi-center breast tumor segmentation performance. The source code and open data can be accessed at https://github.com/wangli100209/PDPNet.
△ Less
Submitted 16 June, 2024; v1 submitted 20 October, 2023;
originally announced October 2023.
-
Better Generalization of White Matter Tract Segmentation to Arbitrary Datasets with Scaled Residual Bootstrap
Authors:
Wan Liu,
Chuyang Ye
Abstract:
White matter (WM) tract segmentation is a crucial step for brain connectivity studies. It is performed on diffusion magnetic resonance imaging (dMRI), and deep neural networks (DNNs) have achieved promising segmentation accuracy. Existing DNN-based methods use an annotated dataset for model training. However, the performance of the trained model on a different test dataset may not be optimal due t…
▽ More
White matter (WM) tract segmentation is a crucial step for brain connectivity studies. It is performed on diffusion magnetic resonance imaging (dMRI), and deep neural networks (DNNs) have achieved promising segmentation accuracy. Existing DNN-based methods use an annotated dataset for model training. However, the performance of the trained model on a different test dataset may not be optimal due to distribution shift, and it is desirable to design WM tract segmentation approaches that allow better generalization of the segmentation model to arbitrary test datasets. In this work, we propose a WM tract segmentation approach that improves the generalization with scaled residual bootstrap. The difference between dMRI scans in training and test datasets is most noticeably caused by the different numbers of diffusion gradients and noise levels. Since both of them lead to different signal-to-noise ratios (SNRs) between the training and test data, we propose to augment the training scans by adjusting the noise magnitude and develop an adapted residual bootstrap strategy for the augmentation. To validate the proposed approach, two dMRI datasets were used, and the experimental results show that our method consistently improved the generalization of WM tract segmentation under various settings.
△ Less
Submitted 25 September, 2023;
originally announced September 2023.
-
C-DARL: Contrastive diffusion adversarial representation learning for label-free blood vessel segmentation
Authors:
Boah Kim,
Yu** Oh,
Bradford J. Wood,
Ronald M. Summers,
Jong Chul Ye
Abstract:
Blood vessel segmentation in medical imaging is one of the essential steps for vascular disease diagnosis and interventional planning in a broad spectrum of clinical scenarios in image-based medicine and interventional medicine. Unfortunately, manual annotation of the vessel masks is challenging and resource-intensive due to subtle branches and complex structures. To overcome this issue, this pape…
▽ More
Blood vessel segmentation in medical imaging is one of the essential steps for vascular disease diagnosis and interventional planning in a broad spectrum of clinical scenarios in image-based medicine and interventional medicine. Unfortunately, manual annotation of the vessel masks is challenging and resource-intensive due to subtle branches and complex structures. To overcome this issue, this paper presents a self-supervised vessel segmentation method, dubbed the contrastive diffusion adversarial representation learning (C-DARL) model. Our model is composed of a diffusion module and a generation module that learns the distribution of multi-domain blood vessel data by generating synthetic vessel images from diffusion latent. Moreover, we employ contrastive learning through a mask-based contrastive loss so that the model can learn more realistic vessel representations. To validate the efficacy, C-DARL is trained using various vessel datasets, including coronary angiograms, abdominal digital subtraction angiograms, and retinal imaging. Experimental results confirm that our model achieves performance improvement over baseline methods with noise robustness, suggesting the effectiveness of C-DARL for vessel segmentation.
△ Less
Submitted 31 July, 2023;
originally announced August 2023.
-
Generative AI for Medical Imaging: extending the MONAI Framework
Authors:
Walter H. L. Pinaya,
Mark S. Graham,
Eric Kerfoot,
Petru-Daniel Tudosiu,
Jessica Dafflon,
Virginia Fernandez,
Pedro Sanchez,
Julia Wolleb,
Pedro F. da Costa,
Ashay Patel,
Hyung** Chung,
Can Zhao,
Wei Peng,
Zelong Liu,
Xueyan Mei,
Oeslle Lucena,
Jong Chul Ye,
Sotirios A. Tsaftaris,
Prerna Dogra,
Andrew Feng,
Marc Modat,
Parashkev Nachev,
Sebastien Ourselin,
M. Jorge Cardoso
Abstract:
Recent advances in generative AI have brought incredible breakthroughs in several areas, including medical imaging. These generative models have tremendous potential not only to help safely share medical data via synthetic datasets but also to perform an array of diverse applications, such as anomaly detection, image-to-image translation, denoising, and MRI reconstruction. However, due to the comp…
▽ More
Recent advances in generative AI have brought incredible breakthroughs in several areas, including medical imaging. These generative models have tremendous potential not only to help safely share medical data via synthetic datasets but also to perform an array of diverse applications, such as anomaly detection, image-to-image translation, denoising, and MRI reconstruction. However, due to the complexity of these models, their implementation and reproducibility can be difficult. This complexity can hinder progress, act as a use barrier, and dissuade the comparison of new methods with existing works. In this study, we present MONAI Generative Models, a freely available open-source platform that allows researchers and developers to easily train, evaluate, and deploy generative models and related applications. Our platform reproduces state-of-art studies in a standardised way involving different architectures (such as diffusion models, autoregressive transformers, and GANs), and provides pre-trained models for the community. We have implemented these models in a generalisable fashion, illustrating that their results can be extended to 2D or 3D scenarios, including medical images with different modalities (like CT, MRI, and X-Ray data) and from different anatomical areas. Finally, we adopt a modular and extensible approach, ensuring long-term maintainability and the extension of current applications for future features.
△ Less
Submitted 27 July, 2023;
originally announced July 2023.
-
Encoding Enhanced Complex CNN for Accurate and Highly Accelerated MRI
Authors:
Zimeng Li,
Sa Xiao,
Cheng Wang,
Haidong Li,
Xiuchao Zhao,
Caohui Duan,
Qian Zhou,
Qiuchen Rao,
Yuan Fang,
Junshuai Xie,
Lei Shi,
Fumin Guo,
Chaohui Ye,
Xin Zhou
Abstract:
Magnetic resonance imaging (MRI) using hyperpolarized noble gases provides a way to visualize the structure and function of human lung, but the long imaging time limits its broad research and clinical applications. Deep learning has demonstrated great potential for accelerating MRI by reconstructing images from undersampled data. However, most existing deep conventional neural networks (CNN) direc…
▽ More
Magnetic resonance imaging (MRI) using hyperpolarized noble gases provides a way to visualize the structure and function of human lung, but the long imaging time limits its broad research and clinical applications. Deep learning has demonstrated great potential for accelerating MRI by reconstructing images from undersampled data. However, most existing deep conventional neural networks (CNN) directly apply square convolution to k-space data without considering the inherent properties of k-space sampling, limiting k-space learning efficiency and image reconstruction quality. In this work, we propose an encoding enhanced (EN2) complex CNN for highly undersampled pulmonary MRI reconstruction. EN2 employs convolution along either the frequency or phase-encoding direction, resembling the mechanisms of k-space sampling, to maximize the utilization of the encoding correlation and integrity within a row or column of k-space. We also employ complex convolution to learn rich representations from the complex k-space data. In addition, we develop a feature-strengthened modularized unit to further boost the reconstruction performance. Experiments demonstrate that our approach can accurately reconstruct hyperpolarized 129Xe and 1H lung MRI from 6-fold undersampled k-space data and provide lung function measurements with minimal biases compared with fully-sampled image. These results demonstrate the effectiveness of the proposed algorithmic components and indicate that the proposed approach could be used for accelerated pulmonary MRI in research and clinical lung disease patient care.
△ Less
Submitted 13 November, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Unpaired Deep Learning for Pharmacokinetic Parameter Estimation from Dynamic Contrast-Enhanced MRI
Authors:
Gyutaek Oh,
Won-** Moon,
Jong Chul Ye
Abstract:
DCE-MRI provides information about vascular permeability and tissue perfusion through the acquisition of pharmacokinetic parameters. However, traditional methods for estimating these pharmacokinetic parameters involve fitting tracer kinetic models, which often suffer from computational complexity and low accuracy due to noisy arterial input function (AIF) measurements. Although some deep learning…
▽ More
DCE-MRI provides information about vascular permeability and tissue perfusion through the acquisition of pharmacokinetic parameters. However, traditional methods for estimating these pharmacokinetic parameters involve fitting tracer kinetic models, which often suffer from computational complexity and low accuracy due to noisy arterial input function (AIF) measurements. Although some deep learning approaches have been proposed to tackle these challenges, most existing methods rely on supervised learning that requires paired input DCE-MRI and labeled pharmacokinetic parameter maps. This dependency on labeled data introduces significant time and resource constraints, as well as potential noise in the labels, making supervised learning methods often impractical. To address these limitations, here we present a novel unpaired deep learning method for estimating both pharmacokinetic parameters and the AIF using a physics-driven CycleGAN approach. Our proposed CycleGAN framework is designed based on the underlying physics model, resulting in a simpler architecture with a single generator and discriminator pair. Crucially, our experimental results indicate that our method, which does not necessitate separate AIF measurements, produces more reliable pharmacokinetic parameters than other techniques.
△ Less
Submitted 7 June, 2023;
originally announced June 2023.
-
Score-based Diffusion Models for Bayesian Image Reconstruction
Authors:
Michael T. McCann,
Hyung** Chung,
Jong Chul Ye,
Marc L. Klasky
Abstract:
This paper explores the use of score-based diffusion models for Bayesian image reconstruction. Diffusion models are an efficient tool for generative modeling. Diffusion models can also be used for solving image reconstruction problems. We present a simple and flexible algorithm for training a diffusion model and using it for maximum a posteriori reconstruction, minimum mean square error reconstruc…
▽ More
This paper explores the use of score-based diffusion models for Bayesian image reconstruction. Diffusion models are an efficient tool for generative modeling. Diffusion models can also be used for solving image reconstruction problems. We present a simple and flexible algorithm for training a diffusion model and using it for maximum a posteriori reconstruction, minimum mean square error reconstruction, and posterior sampling. We present experiments on both a linear and a nonlinear reconstruction problem that highlight the strengths and limitations of the approach.
△ Less
Submitted 25 May, 2023;
originally announced May 2023.
-
NexToU: Efficient Topology-Aware U-Net for Medical Image Segmentation
Authors:
Pengcheng Shi,
Xutao Guo,
Yanwu Yang,
Chenfei Ye,
Ting Ma
Abstract:
Convolutional neural networks (CNN) and Transformer variants have emerged as the leading medical image segmentation backbones. Nonetheless, due to their limitations in either preserving global image context or efficiently processing irregular shapes in visual objects, these backbones struggle to effectively integrate information from diverse anatomical regions and reduce inter-individual variabili…
▽ More
Convolutional neural networks (CNN) and Transformer variants have emerged as the leading medical image segmentation backbones. Nonetheless, due to their limitations in either preserving global image context or efficiently processing irregular shapes in visual objects, these backbones struggle to effectively integrate information from diverse anatomical regions and reduce inter-individual variability, particularly for the vasculature. Motivated by the successful breakthroughs of graph neural networks (GNN) in capturing topological properties and non-Euclidean relationships across various fields, we propose NexToU, a novel hybrid architecture for medical image segmentation. NexToU comprises improved Pool GNN and Swin GNN modules from Vision GNN (ViG) for learning both global and local topological representations while minimizing computational costs. To address the containment and exclusion relationships among various anatomical structures, we reformulate the topological interaction (TI) module based on the nature of binary trees, rapidly encoding the topological constraints into NexToU. Extensive experiments conducted on three datasets (including distinct imaging dimensions, disease types, and imaging modalities) demonstrate that our method consistently outperforms other state-of-the-art (SOTA) architectures. All the code is publicly available at https://github.com/PengchengShi1220/NexToU.
△ Less
Submitted 25 May, 2023;
originally announced May 2023.
-
Improving 3D Imaging with Pre-Trained Perpendicular 2D Diffusion Models
Authors:
Suhyeon Lee,
Hyung** Chung,
Minyoung Park,
Jonghyuk Park,
Wi-Sun Ryu,
Jong Chul Ye
Abstract:
Diffusion models have become a popular approach for image generation and reconstruction due to their numerous advantages. However, most diffusion-based inverse problem-solving methods only deal with 2D images, and even recently published 3D methods do not fully exploit the 3D distribution prior. To address this, we propose a novel approach using two perpendicular pre-trained 2D diffusion models to…
▽ More
Diffusion models have become a popular approach for image generation and reconstruction due to their numerous advantages. However, most diffusion-based inverse problem-solving methods only deal with 2D images, and even recently published 3D methods do not fully exploit the 3D distribution prior. To address this, we propose a novel approach using two perpendicular pre-trained 2D diffusion models to solve the 3D inverse problem. By modeling the 3D data distribution as a product of 2D distributions sliced in different directions, our method effectively addresses the curse of dimensionality. Our experimental results demonstrate that our method is highly effective for 3D medical image reconstruction tasks, including MRI Z-axis super-resolution, compressed sensing MRI, and sparse-view CT. Our method can generate high-quality voxel volumes suitable for medical applications.
△ Less
Submitted 1 September, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model
Authors:
Jaeyoung Huh,
Sangjoon Park,
Jeong Eun Lee,
Jong Chul Ye
Abstract:
Automatic Speech Recognition (ASR) is a technology that converts spoken words into text, facilitating interaction between humans and machines. One of the most common applications of ASR is Speech-To-Text (STT) technology, which simplifies user workflows by transcribing spoken words into text. In the medical field, STT has the potential to significantly reduce the workload of clinicians who rely on…
▽ More
Automatic Speech Recognition (ASR) is a technology that converts spoken words into text, facilitating interaction between humans and machines. One of the most common applications of ASR is Speech-To-Text (STT) technology, which simplifies user workflows by transcribing spoken words into text. In the medical field, STT has the potential to significantly reduce the workload of clinicians who rely on typists to transcribe their voice recordings. However, develo** an STT model for the medical domain is challenging due to the lack of sufficient speech and text datasets. To address this issue, we propose a medical-domain text correction method that modifies the output text of a general STT system using the Vision Language Pre-training (VLP) method. VLP combines textual and visual information to correct text based on image knowledge. Our extensive experiments demonstrate that the proposed method offers quantitatively and clinically significant improvements in STT performance in the medical field. We further show that multi-modal understanding of image and text information outperforms single-modal understanding using only text information.
△ Less
Submitted 27 February, 2023;
originally announced March 2023.
-
Annealed Score-Based Diffusion Model for MR Motion Artifact Reduction
Authors:
Gyutaek Oh,
Jeong Eun Lee,
Jong Chul Ye
Abstract:
Motion artifact reduction is one of the important research topics in MR imaging, as the motion artifact degrades image quality and makes diagnosis difficult. Recently, many deep learning approaches have been studied for motion artifact reduction. Unfortunately, most existing models are trained in a supervised manner, requiring paired motion-corrupted and motion-free images, or are based on a stric…
▽ More
Motion artifact reduction is one of the important research topics in MR imaging, as the motion artifact degrades image quality and makes diagnosis difficult. Recently, many deep learning approaches have been studied for motion artifact reduction. Unfortunately, most existing models are trained in a supervised manner, requiring paired motion-corrupted and motion-free images, or are based on a strict motion-corruption model, which limits their use for real-world situations. To address this issue, here we present an annealed score-based diffusion model for MRI motion artifact reduction. Specifically, we train a score-based model using only motion-free images, and then motion artifacts are removed by applying forward and reverse diffusion processes repeatedly to gradually impose a low-frequency data consistency. Experimental results verify that the proposed method successfully reduces both simulated and in vivo motion artifacts, outperforming the state-of-the-art deep learning methods.
△ Less
Submitted 8 January, 2023;
originally announced January 2023.
-
Accelerating Diffusion Models via Pre-segmentation Diffusion Sampling for Medical Image Segmentation
Authors:
Xutao Guo,
Yanwu Yang,
Chenfei Ye,
Shang Lu,
Yang Xiang,
Ting Ma
Abstract:
Based on the Denoising Diffusion Probabilistic Model (DDPM), medical image segmentation can be described as a conditional image generation task, which allows to compute pixel-wise uncertainty maps of the segmentation and allows an implicit ensemble of segmentations to boost the segmentation performance. However, DDPM requires many iterative denoising steps to generate segmentations from Gaussian n…
▽ More
Based on the Denoising Diffusion Probabilistic Model (DDPM), medical image segmentation can be described as a conditional image generation task, which allows to compute pixel-wise uncertainty maps of the segmentation and allows an implicit ensemble of segmentations to boost the segmentation performance. However, DDPM requires many iterative denoising steps to generate segmentations from Gaussian noise, resulting in extremely inefficient inference. To mitigate the issue, we propose a principled acceleration strategy, called pre-segmentation diffusion sampling DDPM (PD-DDPM), which is specially used for medical image segmentation. The key idea is to obtain pre-segmentation results based on a separately trained segmentation network, and construct noise predictions (non-Gaussian distribution) according to the forward diffusion rule. We can then start with noisy predictions and use fewer reverse steps to generate segmentation results. Experiments show that PD-DDPM yields better segmentation results over representative baseline methods even if the number of reverse steps is significantly reduced. Moreover, PD-DDPM is orthogonal to existing advanced segmentation models, which can be combined to further improve the segmentation performance.
△ Less
Submitted 26 October, 2022;
originally announced October 2022.
-
Multi-modal Dynamic Graph Network: Coupling Structural and Functional Connectome for Disease Diagnosis and Classification
Authors:
Yanwu Yang,
Xutao Guo,
Zhikai Chang,
Chenfei Ye,
Yang Xiang,
Ting Ma
Abstract:
Multi-modal neuroimaging technology has greatlly facilitated the efficiency and diagnosis accuracy, which provides complementary information in discovering objective disease biomarkers. Conventional deep learning methods, e.g. convolutional neural networks, overlook relationships between nodes and fail to capture topological properties in graphs. Graph neural networks have been proven to be of gre…
▽ More
Multi-modal neuroimaging technology has greatlly facilitated the efficiency and diagnosis accuracy, which provides complementary information in discovering objective disease biomarkers. Conventional deep learning methods, e.g. convolutional neural networks, overlook relationships between nodes and fail to capture topological properties in graphs. Graph neural networks have been proven to be of great importance in modeling brain connectome networks and relating disease-specific patterns. However, most existing graph methods explicitly require known graph structures, which are not available in the sophisticated brain system. Especially in heterogeneous multi-modal brain networks, there exists a great challenge to model interactions among brain regions in consideration of inter-modal dependencies. In this study, we propose a Multi-modal Dynamic Graph Convolution Network (MDGCN) for structural and functional brain network learning. Our method benefits from modeling inter-modal representations and relating attentive multi-model associations into dynamic graphs with a compositional correspondence matrix. Moreover, a bilateral graph convolution layer is proposed to aggregate multi-modal representations in terms of multi-modal associations. Extensive experiments on three datasets demonstrate the superiority of our proposed method in terms of disease classification, with the accuracy of 90.4%, 85.9% and 98.3% in predicting Mild Cognitive Impairment (MCI), Parkinson's disease (PD), and schizophrenia (SCHZ) respectively. Furthermore, our statistical evaluations on the correspondence matrix exhibit a high correspondence with previous evidence of biomarkers.
△ Less
Submitted 24 October, 2022;
originally announced October 2022.
-
Diffusion Adversarial Representation Learning for Self-supervised Vessel Segmentation
Authors:
Boah Kim,
Yu** Oh,
Jong Chul Ye
Abstract:
Vessel segmentation in medical images is one of the important tasks in the diagnosis of vascular diseases and therapy planning. Although learning-based segmentation approaches have been extensively studied, a large amount of ground-truth labels are required in supervised methods and confusing background structures make neural networks hard to segment vessels in an unsupervised manner. To address t…
▽ More
Vessel segmentation in medical images is one of the important tasks in the diagnosis of vascular diseases and therapy planning. Although learning-based segmentation approaches have been extensively studied, a large amount of ground-truth labels are required in supervised methods and confusing background structures make neural networks hard to segment vessels in an unsupervised manner. To address this, here we introduce a novel diffusion adversarial representation learning (DARL) model that leverages a denoising diffusion probabilistic model with adversarial learning, and apply it to vessel segmentation. In particular, for self-supervised vessel segmentation, DARL learns the background signal using a diffusion module, which lets a generation module effectively provide vessel representations. Also, by adversarial learning based on the proposed switchable spatially-adaptive denormalization, our model estimates synthetic fake vessel images as well as vessel segmentation masks, which further makes the model capture vessel-relevant semantic information. Once the proposed model is trained, the model generates segmentation masks in a single step and can be applied to general vascular structure segmentation of coronary angiography and retinal images. Experimental results on various datasets show that our method significantly outperforms existing unsupervised and self-supervised vessel segmentation methods.
△ Less
Submitted 15 February, 2023; v1 submitted 29 September, 2022;
originally announced September 2022.
-
Estimating Brain Age with Global and Local Dependencies
Authors:
Yanwu Yang,
Xutao Guo,
Zhikai Chang,
Chenfei Ye,
Yang Xiang,
Haiyan Lv,
Ting Ma
Abstract:
The brain age has been proven to be a phenotype of relevance to cognitive performance and brain disease. Achieving accurate brain age prediction is an essential prerequisite for optimizing the predicted brain-age difference as a biomarker. As a comprehensive biological characteristic, the brain age is hard to be exploited accurately with models using feature engineering and local processing such a…
▽ More
The brain age has been proven to be a phenotype of relevance to cognitive performance and brain disease. Achieving accurate brain age prediction is an essential prerequisite for optimizing the predicted brain-age difference as a biomarker. As a comprehensive biological characteristic, the brain age is hard to be exploited accurately with models using feature engineering and local processing such as local convolution and recurrent operations that process one local neighborhood at a time. Instead, Vision Transformers learn global attentive interaction of patch tokens, introducing less inductive bias and modeling long-range dependencies. In terms of this, we proposed a novel network for learning brain age interpreting with global and local dependencies, where the corresponding representations are captured by Successive Permuted Transformer (SPT) and convolution blocks. The SPT brings computation efficiency and locates the 3D spatial information indirectly via continuously encoding 2D slices from different views. Finally, we collect a large cohort of 22645 subjects with ages ranging from 14 to 97 and our network performed the best among a series of deep learning methods, yielding a mean absolute error (MAE) of 2.855 in validation set, and 2.911 in an independent test set.
△ Less
Submitted 19 September, 2022;
originally announced September 2022.
-
Self-supervised Multi-modal Training from Uncurated Image and Reports Enables Zero-shot Oversight Artificial Intelligence in Radiology
Authors:
Sangjoon Park,
Eun Sun Lee,
Kyung Sook Shin,
Jeong Eun Lee,
Jong Chul Ye
Abstract:
Oversight AI is an emerging concept in radiology where the AI forms a symbiosis with radiologists by continuously supporting radiologists in their decision-making. Recent advances in vision-language models sheds a light on the long-standing problems of the oversight AI by the understanding both visual and textual concepts and their semantic correspondences. However, there have been limited success…
▽ More
Oversight AI is an emerging concept in radiology where the AI forms a symbiosis with radiologists by continuously supporting radiologists in their decision-making. Recent advances in vision-language models sheds a light on the long-standing problems of the oversight AI by the understanding both visual and textual concepts and their semantic correspondences. However, there have been limited successes in the application of vision-language models in the medical domain, as the current vision-language models and learning strategies for photographic images and captions call for the web-scale data corpus of image and text pairs which was not often feasible in the medical domain. To address this, here we present a model dubbed Medical Cross-attention Vision-Language model (Medical X-VL), leveraging the key components to be tailored for the medical domain. Our medical X-VL model is based on the following components: self-supervised uni-modal models in medical domain and fusion encoder to bridge them, momentum distillation, sentence-wise contrastive learning for medical reports, and the sentence similarity-adjusted hard negative mining. We experimentally demonstrated that our model enables various zero-shot tasks for oversight AI, ranging from the zero-shot classification to zero-shot error correction. Our model outperformed the current state-of-the-art models in two different medical image database, suggesting the novel clinical usage of our oversight AI model for monitoring human errors. Our method was especially successful in the data-limited setting, which is frequently encountered in the clinics, suggesting the potential widespread applicability in medical domain.
△ Less
Submitted 12 April, 2023; v1 submitted 10 August, 2022;
originally announced August 2022.
-
Patch-wise Deep Metric Learning for Unsupervised Low-Dose CT Denoising
Authors:
Chanyong Jung,
Joonhyung Lee,
Sunkyoung You,
Jong Chul Ye
Abstract:
The acquisition conditions for low-dose and high-dose CT images are usually different, so that the shifts in the CT numbers often occur. Accordingly, unsupervised deep learning-based approaches, which learn the target image distribution, often introduce CT number distortions and result in detrimental effects in diagnostic performance. To address this, here we propose a novel unsupervised learning…
▽ More
The acquisition conditions for low-dose and high-dose CT images are usually different, so that the shifts in the CT numbers often occur. Accordingly, unsupervised deep learning-based approaches, which learn the target image distribution, often introduce CT number distortions and result in detrimental effects in diagnostic performance. To address this, here we propose a novel unsupervised learning approach for lowdose CT reconstruction using patch-wise deep metric learning. The key idea is to learn embedding space by pulling the positive pairs of image patches which shares the same anatomical structure, and pushing the negative pairs which have same noise level each other. Thereby, the network is trained to suppress the noise level, while retaining the original global CT number distributions even after the image translation. Experimental results confirm that our deep metric learning plays a critical role in producing high quality denoised images without CT number shift.
△ Less
Submitted 13 July, 2022; v1 submitted 5 July, 2022;
originally announced July 2022.
-
Diffusion Deformable Model for 4D Temporal Medical Image Generation
Authors:
Boah Kim,
Jong Chul Ye
Abstract:
Temporal volume images with 3D+t (4D) information are often used in medical imaging to statistically analyze temporal dynamics or capture disease progression. Although deep-learning-based generative models for natural images have been extensively studied, approaches for temporal medical image generation such as 4D cardiac volume data are limited. In this work, we present a novel deep learning mode…
▽ More
Temporal volume images with 3D+t (4D) information are often used in medical imaging to statistically analyze temporal dynamics or capture disease progression. Although deep-learning-based generative models for natural images have been extensively studied, approaches for temporal medical image generation such as 4D cardiac volume data are limited. In this work, we present a novel deep learning model that generates intermediate temporal volumes between source and target volumes. Specifically, we propose a diffusion deformable model (DDM) by adapting the denoising diffusion probabilistic model that has recently been widely investigated for realistic image generation. Our proposed DDM is composed of the diffusion and the deformation modules so that DDM can learn spatial deformation information between the source and target volumes and provide a latent code for generating intermediate frames along a geodesic path. Once our model is trained, the latent code estimated from the diffusion module is simply interpolated and fed into the deformation module, which enables DDM to generate temporal frames along the continuous trajectory while preserving the topology of the source image. We demonstrate the proposed method with the 4D cardiac MR image generation between the diastolic and systolic phases for each subject. Compared to the existing deformation methods, our DDM achieves high performance on temporal volume generation.
△ Less
Submitted 27 June, 2022;
originally announced June 2022.
-
A microstructure estimation Transformer inspired by sparse representation for diffusion MRI
Authors:
Tianshu Zheng,
Cong Sun,
Weihao Zheng,
Wen Shi,
Haotian Li,
Yi Sun,
Yi Zhang,
Guangbin Wang,
Chuyang Ye,
Dan Wu
Abstract:
Diffusion magnetic resonance imaging (dMRI) is an important tool in characterizing tissue microstructure based on biophysical models, which are complex and highly non-linear. Resolving microstructures with optimization techniques is prone to estimation errors and requires dense sampling in the q-space. Deep learning based approaches have been proposed to overcome these limitations. Motivated by th…
▽ More
Diffusion magnetic resonance imaging (dMRI) is an important tool in characterizing tissue microstructure based on biophysical models, which are complex and highly non-linear. Resolving microstructures with optimization techniques is prone to estimation errors and requires dense sampling in the q-space. Deep learning based approaches have been proposed to overcome these limitations. Motivated by the superior performance of the Transformer, in this work, we present a learning-based framework based on Transformer, namely, a Microstructure Estimation Transformer with Sparse Coding (METSC) for dMRI-based microstructure estimation with downsampled q-space data. To take advantage of the Transformer while addressing its limitation in large training data requirements, we explicitly introduce an inductive bias - model bias into the Transformer using a sparse coding technique to facilitate the training process. Thus, the METSC is composed with three stages, an embedding stage, a sparse representation stage, and a map** stage. The embedding stage is a Transformer-based structure that encodes the signal to ensure the voxel is represented effectively. In the sparse representation stage, a dictionary is constructed by solving a sparse reconstruction problem that unfolds the Iterative Hard Thresholding (IHT) process. The map** stage is essentially a decoder that computes the microstructural parameters from the output of the second stage, based on the weighted sum of normalized dictionary coefficients where the weights are also learned. We tested our framework on two dMRI models with downsampled q-space data, including the intravoxel incoherent motion (IVIM) model and the neurite orientation dispersion and density imaging (NODDI) model. The proposed method achieved up to 11.25 folds of acceleration in scan time and outperformed the other state-of-the-art learning-based methods.
△ Less
Submitted 13 May, 2022;
originally announced May 2022.
-
Multi-agent Reinforcement Learning for Dynamic Resource Management in 6G in-X Subnetworks
Authors:
Xiao Du,
Ting Wang,
Qiang Feng,
Chenhui Ye,
Tao Tao,
Yuanming Shi,
Mingsong Chen
Abstract:
The 6G network enables a subnetwork-wide evolution, resulting in a "network of subnetworks". However, due to the dynamic mobility of wireless subnetworks, the data transmission of intra-subnetwork and inter-subnetwork will inevitably interfere with each other, which poses a great challenge to radio resource management. Moreover, most of the existing approaches require the instantaneous channel gai…
▽ More
The 6G network enables a subnetwork-wide evolution, resulting in a "network of subnetworks". However, due to the dynamic mobility of wireless subnetworks, the data transmission of intra-subnetwork and inter-subnetwork will inevitably interfere with each other, which poses a great challenge to radio resource management. Moreover, most of the existing approaches require the instantaneous channel gain between subnetworks, which are usually difficult to be collected. To tackle these issues, in this paper we propose a novel effective intelligent radio resource management method using multi-agent deep reinforcement learning (MARL), which only needs the sum of received power, named received signal strength indicator (RSSI), on each channel instead of channel gains. However, to directly separate individual interference from RSSI is an almost impossible thing. To this end, we further propose a novel MARL architecture, named GA-Net, which integrates a hard attention layer to model the importance distribution of inter-subnetwork relationships based on RSSI and exclude the impact of unrelated subnetworks, and employs a graph attention network with a multi-head attention layer to exact the features and calculate their weights that will impact individual throughput. Experimental results prove that our proposed framework significantly outperforms both traditional and MARL-based methods in various aspects.
△ Less
Submitted 10 May, 2022;
originally announced May 2022.
-
MR Image Denoising and Super-Resolution Using Regularized Reverse Diffusion
Authors:
Hyung** Chung,
Eun Sun Lee,
Jong Chul Ye
Abstract:
Patient scans from MRI often suffer from noise, which hampers the diagnostic capability of such images. As a method to mitigate such artifact, denoising is largely studied both within the medical imaging community and beyond the community as a general subject. However, recent deep neural network-based approaches mostly rely on the minimum mean squared error (MMSE) estimates, which tend to produce…
▽ More
Patient scans from MRI often suffer from noise, which hampers the diagnostic capability of such images. As a method to mitigate such artifact, denoising is largely studied both within the medical imaging community and beyond the community as a general subject. However, recent deep neural network-based approaches mostly rely on the minimum mean squared error (MMSE) estimates, which tend to produce a blurred output. Moreover, such models suffer when deployed in real-world sitautions: out-of-distribution data, and complex noise distributions that deviate from the usual parametric noise models. In this work, we propose a new denoising method based on score-based reverse diffusion sampling, which overcomes all the aforementioned drawbacks. Our network, trained only with coronal knee scans, excels even on out-of-distribution in vivo liver MRI data, contaminated with complex mixture of noise. Even more, we propose a method to enhance the resolution of the denoised image with the same network. With extensive experiments, we show that our method establishes state-of-the-art performance, while having desirable properties which prior MMSE denoisers did not have: flexibly choosing the extent of denoising, and quantifying uncertainty.
△ Less
Submitted 23 March, 2022;
originally announced March 2022.
-
Multi-Scale Hybrid Vision Transformer for Learning Gastric Histology: AI-Based Decision Support System for Gastric Cancer Treatment
Authors:
Yu** Oh,
Go Eun Bae,
Kyung-Hee Kim,
Min-Kyung Yeo,
Jong Chul Ye
Abstract:
Gastric endoscopic screening is an effective way to decide appropriate gastric cancer (GC) treatment at an early stage, reducing GC-associated mortality rate. Although artificial intelligence (AI) has brought a great promise to assist pathologist to screen digitalized whole slide images, existing AI systems are limited in fine-grained cancer subclassifications and have little usability in planning…
▽ More
Gastric endoscopic screening is an effective way to decide appropriate gastric cancer (GC) treatment at an early stage, reducing GC-associated mortality rate. Although artificial intelligence (AI) has brought a great promise to assist pathologist to screen digitalized whole slide images, existing AI systems are limited in fine-grained cancer subclassifications and have little usability in planning cancer treatment. We propose a practical AI system that enables five subclassifications of GC pathology, which can be directly matched to general GC treatment guidance. The AI system is designed to efficiently differentiate multi-classes of GC through multi-scale self-attention mechanism using 2-stage hybrid Vision Transformer (ViT) networks, by mimicking the way how human pathologists understand histology. The AI system demonstrates reliable diagnostic performance by achieving class-average sensitivity of above 0.85 on a total of 1,212 slides from multicentric cohort. Furthermore, AI-assisted pathologists show significantly improved diagnostic sensitivity by 12% in addition to 18% reduced screening time compared to human pathologists. Our results demonstrate that AI-assisted gastric endoscopic screening has a great potential for providing presumptive pathologic opinion and appropriate cancer treatment of gastric cancer in practical clinical settings.
△ Less
Submitted 15 August, 2023; v1 submitted 17 February, 2022;
originally announced February 2022.
-
Phase Aberration Robust Beamformer for Planewave US Using Self-Supervised Learning
Authors:
Shujaat Khan,
Jaeyoung Huh,
Jong Chul Ye
Abstract:
Ultrasound (US) is widely used for clinical imaging applications thanks to its real-time and non-invasive nature. However, its lesion detectability is often limited in many applications due to the phase aberration artefact caused by variations in the speed of sound (SoS) within body parts. To address this, here we propose a novel self-supervised 3D CNN that enables phase aberration robust plane-wa…
▽ More
Ultrasound (US) is widely used for clinical imaging applications thanks to its real-time and non-invasive nature. However, its lesion detectability is often limited in many applications due to the phase aberration artefact caused by variations in the speed of sound (SoS) within body parts. To address this, here we propose a novel self-supervised 3D CNN that enables phase aberration robust plane-wave imaging. Instead of aiming at estimating the SoS distribution as in conventional methods, our approach is unique in that the network is trained in a self-supervised manner to robustly generate a high-quality image from various phase aberrated images by modeling the variation in the speed of sound as stochastic. Experimental results using real measurements from tissue-mimicking phantom and \textit{in vivo} scans confirmed that the proposed method can significantly reduce the phase aberration artifacts and improve the visual quality of deep scans.
△ Less
Submitted 16 February, 2022;
originally announced February 2022.
-
AI can evolve without labels: self-evolving vision transformer for chest X-ray diagnosis through knowledge distillation
Authors:
Sangjoon Park,
Gwanghyun Kim,
Yu** Oh,
Joon Beom Seo,
Sang Min Lee,
** Hwan Kim,
Sungjun Moon,
Jae-Kwang Lim,
Chang Min Park,
Jong Chul Ye
Abstract:
Although deep learning-based computer-aided diagnosis systems have recently achieved expert-level performance, develo** a robust deep learning model requires large, high-quality data with manual annotation, which is expensive to obtain. This situation poses the problem that the chest x-rays collected annually in hospitals cannot be used due to the lack of manual labeling by experts, especially i…
▽ More
Although deep learning-based computer-aided diagnosis systems have recently achieved expert-level performance, develo** a robust deep learning model requires large, high-quality data with manual annotation, which is expensive to obtain. This situation poses the problem that the chest x-rays collected annually in hospitals cannot be used due to the lack of manual labeling by experts, especially in deprived areas. To address this, here we present a novel deep learning framework that uses knowledge distillation through self-supervised learning and self-training, which shows that the performance of the original model trained with a small number of labels can be gradually improved with more unlabeled data. Experimental results show that the proposed framework maintains impressive robustness against a real-world environment and has general applicability to several diagnostic tasks such as tuberculosis, pneumothorax, and COVID-19. Notably, we demonstrated that our model performs even better than those trained with the same amount of labeled data. The proposed framework has a great potential for medical imaging, where plenty of data is accumulated every year, but ground truth annotations are expensive to obtain.
△ Less
Submitted 13 February, 2022;
originally announced February 2022.
-
DiffuseMorph: Unsupervised Deformable Image Registration Using Diffusion Model
Authors:
Boah Kim,
Inhwa Han,
Jong Chul Ye
Abstract:
Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually require a high computational cost for iterative optimizations. Although deep-learning-based methods have been developed for fast image registration, it is still challenging to obtain realistic continuous deformations from a moving image to a fixed image with less topological…
▽ More
Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually require a high computational cost for iterative optimizations. Although deep-learning-based methods have been developed for fast image registration, it is still challenging to obtain realistic continuous deformations from a moving image to a fixed image with less topological folding problem. To address this, here we present a novel diffusion-model-based image registration method, called DiffuseMorph. DiffuseMorph not only generates synthetic deformed images through reverse diffusion but also allows image registration by deformation fields. Specifically, the deformation fields are generated by the conditional score function of the deformation between the moving and fixed images, so that the registration can be performed from continuous deformation by simply scaling the latent feature of the score. Experimental results on 2D facial and 3D medical image registration tasks demonstrate that our method provides flexible deformations with topology preservation capability.
△ Less
Submitted 29 September, 2022; v1 submitted 9 December, 2021;
originally announced December 2021.
-
Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction
Authors:
Hyung** Chung,
Byeongsu Sim,
Jong Chul Ye
Abstract:
Diffusion models have recently attained significant interest within the community owing to their strong performance as generative models. Furthermore, its application to inverse problems have demonstrated state-of-the-art performance. Unfortunately, diffusion models have a critical downside - they are inherently slow to sample from, needing few thousand steps of iteration to generate images from p…
▽ More
Diffusion models have recently attained significant interest within the community owing to their strong performance as generative models. Furthermore, its application to inverse problems have demonstrated state-of-the-art performance. Unfortunately, diffusion models have a critical downside - they are inherently slow to sample from, needing few thousand steps of iteration to generate images from pure Gaussian noise. In this work, we show that starting from Gaussian noise is unnecessary. Instead, starting from a single forward diffusion with better initialization significantly reduces the number of sampling steps in the reverse conditional diffusion. This phenomenon is formally explained by the contraction theory of the stochastic difference equations like our conditional diffusion strategy - the alternating applications of reverse diffusion followed by a non-expansive data consistency step. The new sampling strategy, dubbed Come-Closer-Diffuse-Faster (CCDF), also reveals a new insight on how the existing feed-forward neural network approaches for inverse problems can be synergistically combined with the diffusion models. Experimental results with super-resolution, image inpainting, and compressed sensing MRI demonstrate that our method can achieve state-of-the-art reconstruction performance at significantly reduced sampling steps.
△ Less
Submitted 19 March, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Noise Distribution Adaptive Self-Supervised Image Denoising using Tweedie Distribution and Score Matching
Authors:
Kwanyoung Kim,
Taesung Kwon,
Jong Chul Ye
Abstract:
Tweedie distributions are a special case of exponential dispersion models, which are often used in classical statistics as distributions for generalized linear models. Here, we reveal that Tweedie distributions also play key roles in modern deep learning era, leading to a distribution independent self-supervised image denoising formula without clean reference images. Specifically, by combining wit…
▽ More
Tweedie distributions are a special case of exponential dispersion models, which are often used in classical statistics as distributions for generalized linear models. Here, we reveal that Tweedie distributions also play key roles in modern deep learning era, leading to a distribution independent self-supervised image denoising formula without clean reference images. Specifically, by combining with the recent Noise2Score self-supervised image denoising approach and the saddle point approximation of Tweedie distribution, we can provide a general closed-form denoising formula that can be used for large classes of noise distributions without ever knowing the underlying noise distribution. Similar to the original Noise2Score, the new approach is composed of two successive steps: score matching using perturbed noisy images, followed by a closed form image denoising formula via distribution-independent Tweedie's formula. This also suggests a systematic algorithm to estimate the noise model and noise parameters for a given noisy image data set. Through extensive experiments, we demonstrate that the proposed method can accurately estimate noise models and parameters, and provide the state-of-the-art self-supervised image denoising performance in the benchmark dataset and real-world dataset.
△ Less
Submitted 4 December, 2021;
originally announced December 2021.
-
Tunable Image Quality Control of 3-D Ultrasound using Switchable CycleGAN
Authors:
Jaeyoung Huh,
Shujaat Khan,
Sung** Choi,
Dongkuk Shin,
Eun Sun Lee,
Jong Chul Ye
Abstract:
In contrast to 2-D ultrasound (US) for uniaxial plane imaging, a 3-D US imaging system can visualize a volume along three axial planes. This allows for a full view of the anatomy, which is useful for gynecological (GYN) and obstetrical (OB) applications. Unfortunately, the 3-D US has an inherent limitation in resolution compared to the 2-D US. In the case of 3-D US with a 3-D mechanical probe, for…
▽ More
In contrast to 2-D ultrasound (US) for uniaxial plane imaging, a 3-D US imaging system can visualize a volume along three axial planes. This allows for a full view of the anatomy, which is useful for gynecological (GYN) and obstetrical (OB) applications. Unfortunately, the 3-D US has an inherent limitation in resolution compared to the 2-D US. In the case of 3-D US with a 3-D mechanical probe, for example, the image quality is comparable along the beam direction, but significant deterioration in image quality is often observed in the other two axial image planes. To address this, here we propose a novel unsupervised deep learning approach to improve 3-D US image quality. In particular, using {\em unmatched} high-quality 2-D US images as a reference, we trained a recently proposed switchable CycleGAN architecture so that every map** plane in 3-D US can learn the image quality of 2-D US images. Thanks to the switchable architecture, our network can also provide real-time control of image enhancement level based on user preference, which is ideal for a user-centric scanner setup. Extensive experiments with clinical evaluation confirm that our method offers significantly improved image quality as well user-friendly flexibility.
△ Less
Submitted 6 December, 2021;
originally announced December 2021.
-
CLIPstyler: Image Style Transfer with a Single Text Condition
Authors:
Gihyun Kwon,
Jong Chul Ye
Abstract:
Existing neural style transfer methods require reference style images to transfer texture information of style images to content images. However, in many practical situations, users may not have reference style images but still be interested in transferring styles by just imagining them. In order to deal with such applications, we propose a new framework that enables a style transfer `without' a s…
▽ More
Existing neural style transfer methods require reference style images to transfer texture information of style images to content images. However, in many practical situations, users may not have reference style images but still be interested in transferring styles by just imagining them. In order to deal with such applications, we propose a new framework that enables a style transfer `without' a style image, but only with a text description of the desired style. Using the pre-trained text-image embedding model of CLIP, we demonstrate the modulation of the style of content images only with a single text condition. Specifically, we propose a patch-wise text-image matching loss with multiview augmentations for realistic texture transfer. Extensive experimental results confirmed the successful image style transfer with realistic textures that reflect semantic query texts.
△ Less
Submitted 19 March, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Federated Split Vision Transformer for COVID-19 CXR Diagnosis using Task-Agnostic Training
Authors:
Sangjoon Park,
Gwanghyun Kim,
Jeongsol Kim,
Boah Kim,
Jong Chul Ye
Abstract:
Federated learning, which shares the weights of the neural network across clients, is gaining attention in the healthcare sector as it enables training on a large corpus of decentralized data while maintaining data privacy. For example, this enables neural network training for COVID-19 diagnosis on chest X-ray (CXR) images without collecting patient CXR data across multiple hospitals. Unfortunatel…
▽ More
Federated learning, which shares the weights of the neural network across clients, is gaining attention in the healthcare sector as it enables training on a large corpus of decentralized data while maintaining data privacy. For example, this enables neural network training for COVID-19 diagnosis on chest X-ray (CXR) images without collecting patient CXR data across multiple hospitals. Unfortunately, the exchange of the weights quickly consumes the network bandwidth if highly expressive network architecture is employed. So-called split learning partially solves this problem by dividing a neural network into a client and a server part, so that the client part of the network takes up less extensive computation resources and bandwidth. However, it is not clear how to find the optimal split without sacrificing the overall network performance. To amalgamate these methods and thereby maximize their distinct strengths, here we show that the Vision Transformer, a recently developed deep learning architecture with straightforward decomposable configuration, is ideally suitable for split learning without sacrificing performance. Even under the non-independent and identically distributed data distribution which emulates a real collaboration between hospitals using CXR datasets from multiple sources, the proposed framework was able to attain performance comparable to data-centralized training. In addition, the proposed framework along with heterogeneous multi-task clients also improves individual task performances including the diagnosis of COVID-19, eliminating the need for sharing large weights with innumerable parameters. Our results affirm the suitability of Transformer for collaborative learning in medical imaging and pave the way forward for future real-world implementations.
△ Less
Submitted 3 November, 2021; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Score-based diffusion models for accelerated MRI
Authors:
Hyung** Chung,
Jong Chul Ye
Abstract:
Score-based diffusion models provide a powerful way to model images using the gradient of the data distribution. Leveraging the learned score function as a prior, here we introduce a way to sample data from a conditional distribution given the measurements, such that the model can be readily used for solving inverse problems in imaging, especially for accelerated MRI. In short, we train a continuo…
▽ More
Score-based diffusion models provide a powerful way to model images using the gradient of the data distribution. Leveraging the learned score function as a prior, here we introduce a way to sample data from a conditional distribution given the measurements, such that the model can be readily used for solving inverse problems in imaging, especially for accelerated MRI. In short, we train a continuous time-dependent score function with denoising score matching. Then, at the inference stage, we iterate between numerical SDE solver and data consistency projection step to achieve reconstruction. Our model requires magnitude images only for training, and yet is able to reconstruct complex-valued data, and even extends to parallel imaging. The proposed method is agnostic to sub-sampling patterns, and can be used with any sampling schemes. Also, due to its generative nature, our approach can quantify uncertainty, which is not possible with standard regression settings. On top of all the advantages, our method also has very strong performance, even beating the models trained with full supervision. With extensive experiments, we verify the superiority of our method in terms of quality and practicality.
△ Less
Submitted 16 July, 2022; v1 submitted 8 October, 2021;
originally announced October 2021.
-
Performance Analysis of Fractional Learning Algorithms
Authors:
Abdul Wahab,
Shujaat Khan,
Imran Naseem,
Jong Chul Ye
Abstract:
Fractional learning algorithms are trending in signal processing and adaptive filtering recently. However, it is unclear whether the proclaimed superiority over conventional algorithms is well-grounded or is a myth as their performance has never been extensively analyzed. In this article, a rigorous analysis of fractional variants of the least mean squares and steepest descent algorithms is perfor…
▽ More
Fractional learning algorithms are trending in signal processing and adaptive filtering recently. However, it is unclear whether the proclaimed superiority over conventional algorithms is well-grounded or is a myth as their performance has never been extensively analyzed. In this article, a rigorous analysis of fractional variants of the least mean squares and steepest descent algorithms is performed. Some critical schematic kinks in fractional learning algorithms are identified. Their origins and consequences on the performance of the learning algorithms are discussed and swift ready-witted remedies are proposed. Apposite numerical experiments are conducted to discuss the convergence and efficiency of the fractional learning algorithms in stochastic environments.
△ Less
Submitted 11 October, 2021;
originally announced October 2021.
-
Deep Learning for Ultrasound Beamforming
Authors:
Ruud JG van Sloun,
Jong Chul Ye,
Yonina C Eldar
Abstract:
Diagnostic imaging plays a critical role in healthcare, serving as a fundamental asset for timely diagnosis, disease staging and management as well as for treatment choice, planning, guidance, and follow-up. Among the diagnostic imaging options, ultrasound imaging is uniquely positioned, being a highly cost-effective modality that offers the clinician an unmatched and invaluable level of interacti…
▽ More
Diagnostic imaging plays a critical role in healthcare, serving as a fundamental asset for timely diagnosis, disease staging and management as well as for treatment choice, planning, guidance, and follow-up. Among the diagnostic imaging options, ultrasound imaging is uniquely positioned, being a highly cost-effective modality that offers the clinician an unmatched and invaluable level of interaction, enabled by its real-time nature. Ultrasound probes are becoming increasingly compact and portable, with the market demand for low-cost pocket-sized and (in-body) miniaturized devices expanding. At the same time, there is a strong trend towards 3D imaging and the use of high-frame-rate imaging schemes; both accompanied by dramatically increasing data rates that pose a heavy burden on the probe-system communication and subsequent image reconstruction algorithms.
With the demand for high-quality image reconstruction and signal extraction from less (e.g unfocused or parallel) transmissions that facilitate fast imaging, and a push towards compact probes, modern ultrasound imaging leans heavily on innovations in powerful digital receive channel processing. Beamforming, the process of map** received ultrasound echoes to the spatial image domain, naturally lies at the heart of the ultrasound image formation chain. In this chapter on Deep Learning for Ultrasound Beamforming, we discuss why and when deep learning methods can play a compelling role in the digital beamforming pipeline, and then show how these data-driven systems can be leveraged for improved ultrasound image reconstruction.
△ Less
Submitted 23 September, 2021;
originally announced September 2021.
-
CarveMix: A Simple Data Augmentation Method for Brain Lesion Segmentation
Authors:
Xinru Zhang,
Chenghao Liu,
Ni Ou,
Xiangzhu Zeng,
Xiaoliang Xiong,
Yizhou Yu,
Zhiwen Liu,
Chuyang Ye
Abstract:
Brain lesion segmentation provides a valuable tool for clinical diagnosis, and convolutional neural networks (CNNs) have achieved unprecedented success in the task. Data augmentation is a widely used strategy that improves the training of CNNs, and the design of the augmentation method for brain lesion segmentation is still an open problem. In this work, we propose a simple data augmentation appro…
▽ More
Brain lesion segmentation provides a valuable tool for clinical diagnosis, and convolutional neural networks (CNNs) have achieved unprecedented success in the task. Data augmentation is a widely used strategy that improves the training of CNNs, and the design of the augmentation method for brain lesion segmentation is still an open problem. In this work, we propose a simple data augmentation approach, dubbed as CarveMix, for CNN-based brain lesion segmentation. Like other "mix"-based methods, such as Mixup and CutMix, CarveMix stochastically combines two existing labeled images to generate new labeled samples. Yet, unlike these augmentation strategies based on image combination, CarveMix is lesion-aware, where the combination is performed with an attention on the lesions and a proper annotation is created for the generated image. Specifically, from one labeled image we carve a region of interest (ROI) according to the lesion location and geometry, and the size of the ROI is sampled from a probability distribution. The carved ROI then replaces the corresponding voxels in a second labeled image, and the annotation of the second image is replaced accordingly as well. In this way, we generate new labeled images for network training and the lesion information is preserved. To evaluate the proposed method, experiments were performed on two brain lesion datasets. The results show that our method improves the segmentation accuracy compared with other simple data augmentation approaches.
△ Less
Submitted 16 August, 2021; v1 submitted 15 August, 2021;
originally announced August 2021.
-
A Resolution-Adaptive 8 mm$^\text{2}$ 9.98 Gb/s 39.7 pJ/b 32-Antenna All-Digital Spatial Equalizer for mmWave Massive MU-MIMO in 65nm CMOS
Authors:
Oscar Castañeda,
Zachariah Boynton,
Seyed Hadi Mirfarshbafan,
Shimin Huang,
Jamie C. Ye,
Alyosha Molnar,
Christoph Studer
Abstract:
All-digital millimeter-wave (mmWave) massive multi-user multiple-input multiple-output (MU-MIMO) receivers enable extreme data rates but require high power consumption. In order to reduce power consumption, this paper presents the first resolution-adaptive all-digital receiver ASIC that is able to adjust the resolution of the data-converters and baseband-processing engine to the instantaneous comm…
▽ More
All-digital millimeter-wave (mmWave) massive multi-user multiple-input multiple-output (MU-MIMO) receivers enable extreme data rates but require high power consumption. In order to reduce power consumption, this paper presents the first resolution-adaptive all-digital receiver ASIC that is able to adjust the resolution of the data-converters and baseband-processing engine to the instantaneous communication scenario. The scalable 32-antenna, 65 nm CMOS receiver occupies a total area of 8 mm$^\text{2}$ and integrates analog-to-digital converters (ADCs) with programmable gain and resolution, beamspace channel estimation, and a resolution-adaptive processing-in-memory spatial equalizer. With 6-bit ADC samples and a 4-bit spatial equalizer, our ASIC achieves a throughput of 9.98 Gb/s while being at least 2x more energy-efficient than state-of-the-art designs.
△ Less
Submitted 23 July, 2021;
originally announced July 2021.
-
Noise2Score: Tweedie's Approach to Self-Supervised Image Denoising without Clean Images
Authors:
Kwanyoung Kim,
Jong Chul Ye
Abstract:
Recently, there has been extensive research interest in training deep networks to denoise images without clean reference. However, the representative approaches such as Noise2Noise, Noise2Void, Stein's unbiased risk estimator (SURE), etc. seem to differ from one another and it is difficult to find the coherent mathematical structure. To address this, here we present a novel approach, called Noise2…
▽ More
Recently, there has been extensive research interest in training deep networks to denoise images without clean reference. However, the representative approaches such as Noise2Noise, Noise2Void, Stein's unbiased risk estimator (SURE), etc. seem to differ from one another and it is difficult to find the coherent mathematical structure. To address this, here we present a novel approach, called Noise2Score, which reveals a missing link in order to unite these seemingly different approaches. Specifically, we show that image denoising problems without clean images can be addressed by finding the mode of the posterior distribution and that the Tweedie's formula offers an explicit solution through the score function (i.e. the gradient of log likelihood). Our method then uses the recent finding that the score function can be stably estimated from the noisy images using the amortized residual denoising autoencoder, the method of which is closely related to Noise2Noise or Nose2Void. Our Noise2Score approach is so universal that the same network training can be used to remove noises from images that are corrupted by any exponential family distributions and noise parameters. Using extensive experiments with Gaussian, Poisson, and Gamma noises, we show that Noise2Score significantly outperforms the state-of-the-art self-supervised denoising methods in the benchmark data set such as (C)BSD68, Set12, and Kodak, etc.
△ Less
Submitted 27 October, 2021; v1 submitted 13 June, 2021;
originally announced June 2021.
-
Knowledge Transfer for Few-shot Segmentation of Novel White Matter Tracts
Authors:
Qi Lu,
Chuyang Ye
Abstract:
Convolutional neural networks (CNNs) have achieved stateof-the-art performance for white matter (WM) tract segmentation based on diffusion magnetic resonance imaging (dMRI). These CNNs require a large number of manual delineations of the WM tracts of interest for training, which are generally labor-intensive and costly. The expensive manual delineation can be a particular disadvantage when novel W…
▽ More
Convolutional neural networks (CNNs) have achieved stateof-the-art performance for white matter (WM) tract segmentation based on diffusion magnetic resonance imaging (dMRI). These CNNs require a large number of manual delineations of the WM tracts of interest for training, which are generally labor-intensive and costly. The expensive manual delineation can be a particular disadvantage when novel WM tracts, i.e., tracts that have not been included in existing manual delineations, are to be analyzed. To accurately segment novel WM tracts, it is desirable to transfer the knowledge learned about existing WM tracts, so that even with only a few delineations of the novel WM tracts, CNNs can learn adequately for the segmentation. In this paper, we explore the transfer of such knowledge to the segmentation of novel WM tracts in the few-shot setting. Although a classic fine-tuning strategy can be used for the purpose, the information in the last task-specific layer for segmenting existing WM tracts is completely discarded. We hypothesize that the weights of this last layer can bear valuable information for segmenting the novel WM tracts and thus completely discarding the information is not optimal. In particular, we assume that the novel WM tracts can correlate with existing WM tracts and the segmentation of novel WM tracts can be predicted with the logits of existing WM tracts. In this way, better initialization of the last layer than random initialization can be achieved for fine-tuning. Further, we show that a more adaptive use of the knowledge in the last layer for segmenting existing WM tracts can be conveniently achieved by simply inserting a warmup stage before classic fine-tuning. The proposed method was evaluated on a publicly available dMRI dataset, where we demonstrate the benefit of our method for few-shot segmentation of novel WM tracts.
△ Less
Submitted 1 June, 2021; v1 submitted 30 May, 2021;
originally announced May 2021.
-
Unsupervised Deep Learning Methods for Biological Image Reconstruction and Enhancement
Authors:
Mehmet Akçakaya,
Burhaneddin Yaman,
Hyung** Chung,
Jong Chul Ye
Abstract:
Recently, deep learning approaches have become the main research frontier for biological image reconstruction and enhancement problems thanks to their high performance, along with their ultra-fast inference times. However, due to the difficulty of obtaining matched reference data for supervised learning, there has been increasing interest in unsupervised learning approaches that do not need paired…
▽ More
Recently, deep learning approaches have become the main research frontier for biological image reconstruction and enhancement problems thanks to their high performance, along with their ultra-fast inference times. However, due to the difficulty of obtaining matched reference data for supervised learning, there has been increasing interest in unsupervised learning approaches that do not need paired reference data. In particular, self-supervised learning and generative models have been successfully used for various biological imaging applications. In this paper, we overview these approaches from a coherent perspective in the context of classical inverse problems, and discuss their applications to biological imaging, including electron, fluorescence and deconvolution microscopy, optical diffraction tomography and functional neuroimaging.
△ Less
Submitted 22 November, 2021; v1 submitted 17 May, 2021;
originally announced May 2021.
-
Simultaneous super-resolution and motion artifact removal in diffusion-weighted MRI using unsupervised deep learning
Authors:
Hyung** Chung,
Jaehyun Kim,
Jeong Hee Yoon,
Jeong Min Lee,
Jong Chul Ye
Abstract:
Diffusion-weighted MRI is nowadays performed routinely due to its prognostic ability, yet the quality of the scans are often unsatisfactory which can subsequently hamper the clinical utility. To overcome the limitations, here we propose a fully unsupervised quality enhancement scheme, which boosts the resolution and removes the motion artifact simultaneously. This process is done by first training…
▽ More
Diffusion-weighted MRI is nowadays performed routinely due to its prognostic ability, yet the quality of the scans are often unsatisfactory which can subsequently hamper the clinical utility. To overcome the limitations, here we propose a fully unsupervised quality enhancement scheme, which boosts the resolution and removes the motion artifact simultaneously. This process is done by first training the network using optimal transport driven cycleGAN with stochastic degradation block which learns to remove aliasing artifacts and enhance the resolution, then using the trained network in the test stage by utilizing bootstrap subsampling and aggregation for motion artifact suppression. We further show that we can control the trade-off between the amount of artifact correction and resolution by controlling the bootstrap subsampling ratio at the inference stage. To the best of our knowledge, the proposed method is the first to tackle super-resolution and motion artifact correction simultaneously in the context of MRI using unsupervised learning. We demonstrate the efficiency of our method by applying it to both quantitative evaluation using simulation study, and to in vivo diffusion-weighted MR scans, which shows that our method is superior to the current state-of-the-art methods. The proposed method is flexible in that it can be applied to various quality enhancement schemes in other types of MR scans, and also directly to the quality enhancement of apparent diffusion coefficient maps.
△ Less
Submitted 1 May, 2021;
originally announced May 2021.
-
Feature Disentanglement in generating three-dimensional structure from two-dimensional slice with sliceGAN
Authors:
Hyung** Chung,
Jong Chul Ye
Abstract:
Deep generative models are known to be able to model arbitrary probability distributions. Among these, a recent deep generative model, dubbed sliceGAN, proposed a new way of using the generative adversarial network (GAN) to capture the micro-structural characteristics of a two-dimensional (2D) slice and generate three-dimensional (3D) volumes with similar properties. While 3D micrographs are large…
▽ More
Deep generative models are known to be able to model arbitrary probability distributions. Among these, a recent deep generative model, dubbed sliceGAN, proposed a new way of using the generative adversarial network (GAN) to capture the micro-structural characteristics of a two-dimensional (2D) slice and generate three-dimensional (3D) volumes with similar properties. While 3D micrographs are largely beneficial in simulating diverse material behavior, they are often much harder to obtain than their 2D counterparts. Hence, sliceGAN opens up many interesting directions of research by learning the representative distribution from 2D slices, and transferring the learned knowledge to generate arbitrary 3D volumes. However, one limitation of sliceGAN is that latent space steering is not possible. Hence, we combine sliceGAN with AdaIN to endow the model with the ability to disentangle the features and control the synthesis.
△ Less
Submitted 1 May, 2021;
originally announced May 2021.
-
Cycle-free CycleGAN using Invertible Generator for Unsupervised Low-Dose CT Denoising
Authors:
Taesung Kwon,
Jong Chul Ye
Abstract:
Recently, CycleGAN was shown to provide high-performance, ultra-fast denoising for low-dose X-ray computed tomography (CT) without the need for a paired training dataset. Although this was possible thanks to cycle consistency, CycleGAN requires two generators and two discriminators to enforce cycle consistency, demanding significant GPU resources and technical skills for training. A recent proposa…
▽ More
Recently, CycleGAN was shown to provide high-performance, ultra-fast denoising for low-dose X-ray computed tomography (CT) without the need for a paired training dataset. Although this was possible thanks to cycle consistency, CycleGAN requires two generators and two discriminators to enforce cycle consistency, demanding significant GPU resources and technical skills for training. A recent proposal of tunable CycleGAN with Adaptive Instance Normalization (AdaIN) alleviates the problem in part by using a single generator. However, two discriminators and an additional AdaIN code generator are still required for training. To solve this problem, here we present a novel cycle-free Cycle-GAN architecture, which consists of a single generator and a discriminator but still guarantees cycle consistency. The main innovation comes from the observation that the use of an invertible generator automatically fulfills the cycle consistency condition and eliminates the additional discriminator in the CycleGAN formulation. To make the invertible generator more effective, our network is implemented in the wavelet residual domain. Extensive experiments using various levels of low-dose CT images confirm that our method can significantly improve denoising performance using only 10% of learnable parameters and faster training time compared to the conventional CycleGAN.
△ Less
Submitted 17 April, 2021;
originally announced April 2021.
-
Identification of mental fatigue in language comprehension tasks based on EEG and deep learning
Authors:
Chunhua Ye,
Zhong Yin,
Chenxi Wu,
Xiayidai Abulaiti,
Yixing Zhang,
Zhenqi Sun,
Jianhua Zhang
Abstract:
Mental fatigue increases the risk of operator error in language comprehension tasks. In order to prevent operator performance degradation, we used EEG signals to assess the mental fatigue of operators in human-computer systems. This study presents an experimental design for fatigue detection in language comprehension tasks. We obtained EEG signals from a 14-channel wireless EEG detector in 15 heal…
▽ More
Mental fatigue increases the risk of operator error in language comprehension tasks. In order to prevent operator performance degradation, we used EEG signals to assess the mental fatigue of operators in human-computer systems. This study presents an experimental design for fatigue detection in language comprehension tasks. We obtained EEG signals from a 14-channel wireless EEG detector in 15 healthy participants. Each participant was given a cognitive test of a language comprehension task, in the form of multiple choice questions, in which pronoun references were selected between nominal and surrogate sentences. In this paper, the 2400 EEG fragments collected are divided into three data sets according to different utilization rates, namely 1200s data set with 50% utilization rate, 1500s data set with 62.5% utilization rate, and 1800s data set with 75% utilization rate. In the aspect of feature extraction, different EEG features were extracted, including time domain features, frequency domain features and entropy features, and the effects of different features and feature combinations on classification accuracy were explored. In terms of classification, we introduced the Convolutional Neural Network (CNN) method as the preferred method, It was compared with Least Squares Support Vector Machines(LSSVM),Support Vector Machines(SVM),Logistic Regression (LR), Random Forest(RF), Naive Bayes (NB), K-Nearest Neighbor (KNN) and Decision Tree(DT).According to the results, the classification accuracy of convolutional neural network (CNN) is higher than that of other classification methods. The classification results show that the classification accuracy of 1200S dataset is higher than the other two datasets. The combination of Frequency and entropy feature and CNN has the highest classification accuracy, which is 85.34%.
△ Less
Submitted 14 April, 2021;
originally announced April 2021.
-
Vision Transformer using Low-level Chest X-ray Feature Corpus for COVID-19 Diagnosis and Severity Quantification
Authors:
Sangjoon Park,
Gwanghyun Kim,
Yu** Oh,
Joon Beom Seo,
Sang Min Lee,
** Hwan Kim,
Sungjun Moon,
Jae-Kwang Lim,
Jong Chul Ye
Abstract:
Develo** a robust algorithm to diagnose and quantify the severity of COVID-19 using Chest X-ray (CXR) requires a large number of well-curated COVID-19 datasets, which is difficult to collect under the global COVID-19 pandemic. On the other hand, CXR data with other findings are abundant. This situation is ideally suited for the Vision Transformer (ViT) architecture, where a lot of unlabeled data…
▽ More
Develo** a robust algorithm to diagnose and quantify the severity of COVID-19 using Chest X-ray (CXR) requires a large number of well-curated COVID-19 datasets, which is difficult to collect under the global COVID-19 pandemic. On the other hand, CXR data with other findings are abundant. This situation is ideally suited for the Vision Transformer (ViT) architecture, where a lot of unlabeled data can be used through structural modeling by the self-attention mechanism. However, the use of existing ViT is not optimal, since feature embedding through direct patch flattening or ResNet backbone in the standard ViT is not intended for CXR. To address this problem, here we propose a novel Vision Transformer that utilizes low-level CXR feature corpus obtained from a backbone network that extracts common CXR findings. Specifically, the backbone network is first trained with large public datasets to detect common abnormal findings such as consolidation, opacity, edema, etc. Then, the embedded features from the backbone network are used as corpora for a Transformer model for the diagnosis and the severity quantification of COVID-19. We evaluate our model on various external test datasets from totally different institutions to evaluate the generalization capability. The experimental results confirm that our model can achieve the state-of-the-art performance in both diagnosis and severity quantification tasks with superior generalization capability, which are sine qua non of widespread deployment.
△ Less
Submitted 15 April, 2021;
originally announced April 2021.
-
CXR Segmentation by AdaIN-based Domain Adaptation and Knowledge Distillation
Authors:
Yu** Oh,
Jong Chul Ye
Abstract:
As segmentation labels are scarce, extensive researches have been conducted to train segmentation networks with domain adaptation, semi-supervised or self-supervised learning techniques to utilize abundant unlabeled dataset. However, these approaches appear different from each other, so it is not clear how these approaches can be combined for better performance. Inspired by recent multi-domain ima…
▽ More
As segmentation labels are scarce, extensive researches have been conducted to train segmentation networks with domain adaptation, semi-supervised or self-supervised learning techniques to utilize abundant unlabeled dataset. However, these approaches appear different from each other, so it is not clear how these approaches can be combined for better performance. Inspired by recent multi-domain image translation approaches, here we propose a novel segmentation framework using adaptive instance normalization (AdaIN), so that a single generator is trained to perform both domain adaptation and semi-supervised segmentation tasks via knowledge distillation by simply changing task-specific AdaIN codes. Specifically, our framework is designed to deal with difficult situations in chest X-ray radiograph (CXR) segmentation, where labels are only available for normal data, but the trained model should be applied to both normal and abnormal data. The proposed network demonstrates great generalizability under domain shift and achieves the state-of-the-art performance for abnormal CXR segmentation.
△ Less
Submitted 11 October, 2022; v1 submitted 12 April, 2021;
originally announced April 2021.