Search | arXiv e-print repository

Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration

Authors: Mingyuan Meng, Dagan Feng, Lei Bi, **man Kim

Abstract: Deformable image registration is a fundamental step for medical image analysis. Recently, transformers have been used for registration and outperformed Convolutional Neural Networks (CNNs). Transformers can capture long-range dependence among image features, which have been shown beneficial for registration. However, due to the high computation/memory loads of self-attention, transformers are typi… ▽ More Deformable image registration is a fundamental step for medical image analysis. Recently, transformers have been used for registration and outperformed Convolutional Neural Networks (CNNs). Transformers can capture long-range dependence among image features, which have been shown beneficial for registration. However, due to the high computation/memory loads of self-attention, transformers are typically used at downsampled feature resolutions and cannot capture fine-grained long-range dependence at the full image resolution. This limits deformable registration as it necessitates precise dense correspondence between each image pixel. Multi-layer Perceptrons (MLPs) without self-attention are efficient in computation/memory usage, enabling the feasibility of capturing fine-grained long-range dependence at full resolution. Nevertheless, MLPs have not been extensively explored for image registration and are lacking the consideration of inductive bias crucial for medical registration tasks. In this study, we propose the first correlation-aware MLP-based registration network (CorrMLP) for deformable medical image registration. Our CorrMLP introduces a correlation-aware multi-window MLP block in a novel coarse-to-fine registration architecture, which captures fine-grained multi-range dependence to perform correlation-aware coarse-to-fine registration. Extensive experiments with seven public medical datasets show that our CorrMLP outperforms state-of-the-art deformable registration methods. △ Less

Submitted 12 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

Comments: Accepted at CVPR2024 as Oral Presentation && Best Paper Candidate

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 9645-9654

arXiv:2311.16707 [pdf]

Full-resolution MLPs Empower Medical Dense Prediction

Authors: Mingyuan Meng, Yuxin Xue, Dagan Feng, Lei Bi, **man Kim

Abstract: Dense prediction is a fundamental requirement for many medical vision tasks such as medical image restoration, registration, and segmentation. The most popular vision model, Convolutional Neural Networks (CNNs), has reached bottlenecks due to the intrinsic locality of convolution operations. Recently, transformers have been widely adopted for dense prediction for their capability to capture long-r… ▽ More Dense prediction is a fundamental requirement for many medical vision tasks such as medical image restoration, registration, and segmentation. The most popular vision model, Convolutional Neural Networks (CNNs), has reached bottlenecks due to the intrinsic locality of convolution operations. Recently, transformers have been widely adopted for dense prediction for their capability to capture long-range visual dependence. However, due to the high computational complexity and large memory consumption of self-attention operations, transformers are usually used at downsampled feature resolutions. Such usage cannot effectively leverage the tissue-level textural information available only at the full image resolution. This textural information is crucial for medical dense prediction as it can differentiate the subtle human anatomy in medical images. In this study, we hypothesize that Multi-layer Perceptrons (MLPs) are superior alternatives to transformers in medical dense prediction where tissue-level details dominate the performance, as MLPs enable long-range dependence at the full image resolution. To validate our hypothesis, we develop a full-resolution hierarchical MLP framework that uses MLPs beginning from the full image resolution. We evaluate this framework with various MLP blocks on a wide range of medical dense prediction tasks including restoration, registration, and segmentation. Extensive experiments on six public well-benchmarked datasets show that, by simply using MLPs at full resolution, our framework outperforms its CNN and transformer counterparts and achieves state-of-the-art performance on various medical dense prediction tasks. △ Less

Submitted 28 November, 2023; originally announced November 2023.

Comments: Under Review

arXiv:2310.15550 [pdf]

PET Synthesis via Self-supervised Adaptive Residual Estimation Generative Adversarial Network

Authors: Yuxin Xue, Lei Bi, Yige Peng, Michael Fulham, David Dagan Feng, **man Kim

Abstract: Positron emission tomography (PET) is a widely used, highly sensitive molecular imaging in clinical diagnosis. There is interest in reducing the radiation exposure from PET but also maintaining adequate image quality. Recent methods using convolutional neural networks (CNNs) to generate synthesized high-quality PET images from low-dose counterparts have been reported to be state-of-the-art for low… ▽ More Positron emission tomography (PET) is a widely used, highly sensitive molecular imaging in clinical diagnosis. There is interest in reducing the radiation exposure from PET but also maintaining adequate image quality. Recent methods using convolutional neural networks (CNNs) to generate synthesized high-quality PET images from low-dose counterparts have been reported to be state-of-the-art for low-to-high image recovery methods. However, these methods are prone to exhibiting discrepancies in texture and structure between synthesized and real images. Furthermore, the distribution shift between low-dose PET and standard PET has not been fully investigated. To address these issues, we developed a self-supervised adaptive residual estimation generative adversarial network (SS-AEGAN). We introduce (1) An adaptive residual estimation map** mechanism, AE-Net, designed to dynamically rectify the preliminary synthesized PET images by taking the residual map between the low-dose PET and synthesized output as the input, and (2) A self-supervised pre-training strategy to enhance the feature representation of the coarse generator. Our experiments with a public benchmark dataset of total-body PET images show that SS-AEGAN consistently outperformed the state-of-the-art synthesis methods with various dose reduction factors. △ Less

Submitted 24 October, 2023; originally announced October 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2309.05271 [pdf]

AutoFuse: Automatic Fusion Networks for Deformable Medical Image Registration

Authors: Mingyuan Meng, Michael Fulham, Dagan Feng, Lei Bi, **man Kim

Abstract: Deformable image registration aims to find a dense non-linear spatial correspondence between a pair of images, which is a crucial step for many medical tasks such as tumor growth monitoring and population analysis. Recently, Deep Neural Networks (DNNs) have been widely recognized for their ability to perform fast end-to-end registration. However, DNN-based registration needs to explore the spatial… ▽ More Deformable image registration aims to find a dense non-linear spatial correspondence between a pair of images, which is a crucial step for many medical tasks such as tumor growth monitoring and population analysis. Recently, Deep Neural Networks (DNNs) have been widely recognized for their ability to perform fast end-to-end registration. However, DNN-based registration needs to explore the spatial information of each image and fuse this information to characterize spatial correspondence. This raises an essential question: what is the optimal fusion strategy to characterize spatial correspondence? Existing fusion strategies (e.g., early fusion, late fusion) were empirically designed to fuse information by manually defined prior knowledge, which inevitably constrains the registration performance within the limits of empirical designs. In this study, we depart from existing empirically-designed fusion strategies and develop a data-driven fusion strategy for deformable image registration. To achieve this, we propose an Automatic Fusion network (AutoFuse) that provides flexibility to fuse information at many potential locations within the network. A Fusion Gate (FG) module is also proposed to control how to fuse information at each potential network location based on training data. Our AutoFuse can automatically optimize its fusion strategy during training and can be generalizable to both unsupervised registration (without any labels) and semi-supervised registration (with weak labels provided for partial training data). Extensive experiments on two well-benchmarked medical registration tasks (inter- and intra-patient registration) with eight public datasets show that our AutoFuse outperforms state-of-the-art unsupervised and semi-supervised registration methods. △ Less

Submitted 11 September, 2023; originally announced September 2023.

Comments: Under Review

arXiv:2307.03427 [pdf]

doi 10.1007/978-3-031-43987-2_39

Merging-Diverging Hybrid Transformer Networks for Survival Prediction in Head and Neck Cancer

Authors: Mingyuan Meng, Lei Bi, Michael Fulham, Dagan Feng, **man Kim

Abstract: Survival prediction is crucial for cancer patients as it provides early prognostic information for treatment planning. Recently, deep survival models based on deep learning and medical images have shown promising performance for survival prediction. However, existing deep survival models are not well developed in utilizing multi-modality images (e.g., PET-CT) and in extracting region-specific info… ▽ More Survival prediction is crucial for cancer patients as it provides early prognostic information for treatment planning. Recently, deep survival models based on deep learning and medical images have shown promising performance for survival prediction. However, existing deep survival models are not well developed in utilizing multi-modality images (e.g., PET-CT) and in extracting region-specific information (e.g., the prognostic information in Primary Tumor (PT) and Metastatic Lymph Node (MLN) regions). In view of this, we propose a merging-diverging learning framework for survival prediction from multi-modality images. This framework has a merging encoder to fuse multi-modality information and a diverging decoder to extract region-specific information. In the merging encoder, we propose a Hybrid Parallel Cross-Attention (HPCA) block to effectively fuse multi-modality features via parallel convolutional layers and cross-attention transformers. In the diverging decoder, we propose a Region-specific Attention Gate (RAG) block to screen out the features related to lesion regions. Our framework is demonstrated on survival prediction from PET-CT images in Head and Neck (H&N) cancer, by designing an X-shape merging-diverging hybrid transformer network (named XSurv). Our XSurv combines the complementary information in PET and CT images and extracts the region-specific prognostic information in PT and MLN regions. Extensive experiments on the public dataset of HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022) demonstrate that our XSurv outperforms state-of-the-art survival prediction methods. △ Less

Submitted 7 July, 2023; originally announced July 2023.

Comments: Early Accepted at International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023)

Journal ref: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 400-410, 2023

arXiv:2307.03421 [pdf]

doi 10.1007/978-3-031-43999-5_71

Non-iterative Coarse-to-fine Transformer Networks for Joint Affine and Deformable Image Registration

Authors: Mingyuan Meng, Lei Bi, Michael Fulham, Dagan Feng, **man Kim

Abstract: Image registration is a fundamental requirement for medical image analysis. Deep registration methods based on deep learning have been widely recognized for their capabilities to perform fast end-to-end registration. Many deep registration methods achieved state-of-the-art performance by performing coarse-to-fine registration, where multiple registration steps were iterated with cascaded networks.… ▽ More Image registration is a fundamental requirement for medical image analysis. Deep registration methods based on deep learning have been widely recognized for their capabilities to perform fast end-to-end registration. Many deep registration methods achieved state-of-the-art performance by performing coarse-to-fine registration, where multiple registration steps were iterated with cascaded networks. Recently, Non-Iterative Coarse-to-finE (NICE) registration methods have been proposed to perform coarse-to-fine registration in a single network and showed advantages in both registration accuracy and runtime. However, existing NICE registration methods mainly focus on deformable registration, while affine registration, a common prerequisite, is still reliant on time-consuming traditional optimization-based methods or extra affine registration networks. In addition, existing NICE registration methods are limited by the intrinsic locality of convolution operations. Transformers may address this limitation for their capabilities to capture long-range dependency, but the benefits of using transformers for NICE registration have not been explored. In this study, we propose a Non-Iterative Coarse-to-finE Transformer network (NICE-Trans) for image registration. Our NICE-Trans is the first deep registration method that (i) performs joint affine and deformable coarse-to-fine registration within a single network, and (ii) embeds transformers into a NICE registration framework to model long-range relevance between images. Extensive experiments with seven public datasets show that our NICE-Trans outperforms state-of-the-art registration methods on both registration accuracy and runtime. △ Less

Submitted 7 July, 2023; originally announced July 2023.

Comments: Accepted at International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023)

Journal ref: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp.750-760, 2023

arXiv:2305.09946 [pdf]

AdaMSS: Adaptive Multi-Modality Segmentation-to-Survival Learning for Survival Outcome Prediction from PET/CT Images

Authors: Mingyuan Meng, Bingxin Gu, Michael Fulham, Shaoli Song, Dagan Feng, Lei Bi, **man Kim

Abstract: Survival prediction is a major concern for cancer management. Deep survival models based on deep learning have been widely adopted to perform end-to-end survival prediction from medical images. Recent deep survival models achieved promising performance by jointly performing tumor segmentation with survival prediction, where the models were guided to extract tumor-related information through Multi-… ▽ More Survival prediction is a major concern for cancer management. Deep survival models based on deep learning have been widely adopted to perform end-to-end survival prediction from medical images. Recent deep survival models achieved promising performance by jointly performing tumor segmentation with survival prediction, where the models were guided to extract tumor-related information through Multi-Task Learning (MTL). However, these deep survival models have difficulties in exploring out-of-tumor prognostic information. In addition, existing deep survival models are unable to effectively leverage multi-modality images. Empirically-designed fusion strategies were commonly adopted to fuse multi-modality information via task-specific manually-designed networks, thus limiting the adaptability to different scenarios. In this study, we propose an Adaptive Multi-modality Segmentation-to-Survival model (AdaMSS) for survival prediction from PET/CT images. Instead of adopting MTL, we propose a novel Segmentation-to-Survival Learning (SSL) strategy, where our AdaMSS is trained for tumor segmentation and survival prediction sequentially in two stages. This strategy enables the AdaMSS to focus on tumor regions in the first stage and gradually expand its focus to include other prognosis-related regions in the second stage. We also propose a data-driven strategy to fuse multi-modality information, which realizes adaptive optimization of fusion strategies based on training data during training. With the SSL and data-driven fusion strategies, our AdaMSS is designed as an adaptive model that can self-adapt its focus regions and fusion strategy for different training stages. Extensive experiments with two large clinical datasets show that our AdaMSS outperforms state-of-the-art survival prediction methods. △ Less

Submitted 19 July, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: Under Review

arXiv:2304.00725 [pdf]

CG-3DSRGAN: A classification guided 3D generative adversarial network for image quality recovery from low-dose PET images

Authors: Yuxin Xue, Yige Peng, Lei Bi, Dagan Feng, **man Kim

Abstract: Positron emission tomography (PET) is the most sensitive molecular imaging modality routinely applied in our modern healthcare. High radioactivity caused by the injected tracer dose is a major concern in PET imaging and limits its clinical applications. However, reducing the dose leads to inadequate image quality for diagnostic practice. Motivated by the need to produce high quality images with mi… ▽ More Positron emission tomography (PET) is the most sensitive molecular imaging modality routinely applied in our modern healthcare. High radioactivity caused by the injected tracer dose is a major concern in PET imaging and limits its clinical applications. However, reducing the dose leads to inadequate image quality for diagnostic practice. Motivated by the need to produce high quality images with minimum low-dose, Convolutional Neural Networks (CNNs) based methods have been developed for high quality PET synthesis from its low-dose counterparts. Previous CNNs-based studies usually directly map low-dose PET into features space without consideration of different dose reduction level. In this study, a novel approach named CG-3DSRGAN (Classification-Guided Generative Adversarial Network with Super Resolution Refinement) is presented. Specifically, a multi-tasking coarse generator, guided by a classification head, allows for a more comprehensive understanding of the noise-level features present in the low-dose data, resulting in improved image synthesis. Moreover, to recover spatial details of standard PET, an auxiliary super resolution network - Contextual-Net - is proposed as a second-stage training to narrow the gap between coarse prediction and standard PET. We compared our method to the state-of-the-art methods on whole-body PET with different dose reduction factors (DRFs). Experiments demonstrate our method can outperform others on all DRF. △ Less

Submitted 3 April, 2023; originally announced April 2023.

arXiv:2211.05409 [pdf]

doi 10.1007/978-3-031-27420-6_14

Radiomics-enhanced Deep Multi-task Learning for Outcome Prediction in Head and Neck Cancer

Authors: Mingyuan Meng, Lei Bi, Dagan Feng, **man Kim

Abstract: Outcome prediction is crucial for head and neck cancer patients as it can provide prognostic information for early treatment planning. Radiomics methods have been widely used for outcome prediction from medical images. However, these methods are limited by their reliance on intractable manual segmentation of tumor regions. Recently, deep learning methods have been proposed to perform end-to-end ou… ▽ More Outcome prediction is crucial for head and neck cancer patients as it can provide prognostic information for early treatment planning. Radiomics methods have been widely used for outcome prediction from medical images. However, these methods are limited by their reliance on intractable manual segmentation of tumor regions. Recently, deep learning methods have been proposed to perform end-to-end outcome prediction so as to remove the reliance on manual segmentation. Unfortunately, without segmentation masks, these methods will take the whole image as input, such that makes them difficult to focus on tumor regions and potentially unable to fully leverage the prognostic information within the tumor regions. In this study, we propose a radiomics-enhanced deep multi-task framework for outcome prediction from PET/CT images, in the context of HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022). In our framework, our novelty is to incorporate radiomics as an enhancement to our recently proposed Deep Multi-task Survival model (DeepMTS). The DeepMTS jointly learns to predict the survival risk scores of patients and the segmentation masks of tumor regions. Radiomics features are extracted from the predicted tumor regions and combined with the predicted survival risk scores for final outcome prediction, through which the prognostic information in tumor regions can be further leveraged. Our method achieved a C-index of 0.681 on the testing set, placing the 2nd on the leaderboard with only 0.00068 lower in C-index than the 1st place. △ Less

Submitted 10 November, 2022; originally announced November 2022.

Comments: HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022)

Journal ref: Head and Neck Tumor Segmentation and Outcome Prediction (HECKTOR 2022), pp.135-143

arXiv:2210.15808 [pdf]

Hyper-Connected Transformer Network for Multi-Modality PET-CT Segmentation

Authors: Lei Bi, Michael Fulham, Shaoli Song, David Dagan Feng, **man Kim

Abstract: [18F]-Fluorodeoxyglucose (FDG) positron emission tomography - computed tomography (PET-CT) has become the imaging modality of choice for diagnosing many cancers. Co-learning complementary PET-CT imaging features is a fundamental requirement for automatic tumor segmentation and for develo** computer aided cancer diagnosis systems. In this study, we propose a hyper-connected transformer (HCT) netw… ▽ More [18F]-Fluorodeoxyglucose (FDG) positron emission tomography - computed tomography (PET-CT) has become the imaging modality of choice for diagnosing many cancers. Co-learning complementary PET-CT imaging features is a fundamental requirement for automatic tumor segmentation and for develo** computer aided cancer diagnosis systems. In this study, we propose a hyper-connected transformer (HCT) network that integrates a transformer network (TN) with a hyper connected fusion for multi-modality PET-CT images. The TN was leveraged for its ability to provide global dependencies in image feature learning, which was achieved by using image patch embeddings with a self-attention mechanism to capture image-wide contextual information. We extended the single-modality definition of TN with multiple TN based branches to separately extract image features. We also introduced a hyper connected fusion to fuse the contextual and complementary image features across multiple transformers in an iterative manner. Our results with two clinical datasets show that HCT achieved better performance in segmentation accuracy when compared to the existing methods. △ Less

Submitted 7 August, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: EMBC 2023

arXiv:2209.07705 [pdf, other]

Automatic Tumor Segmentation via False Positive Reduction Network for Whole-Body Multi-Modal PET/CT Images

Authors: Yige Peng, **man Kim, Dagan Feng, Lei Bi

Abstract: Multi-modality Fluorodeoxyglucose (FDG) positron emission tomography / computed tomography (PET/CT) has been routinely used in the assessment of common cancers, such as lung cancer, lymphoma, and melanoma. This is mainly attributed to the fact that PET/CT combines the high sensitivity for tumor detection of PET and anatomical information from CT. In PET/CT image assessment, automatic tumor segment… ▽ More Multi-modality Fluorodeoxyglucose (FDG) positron emission tomography / computed tomography (PET/CT) has been routinely used in the assessment of common cancers, such as lung cancer, lymphoma, and melanoma. This is mainly attributed to the fact that PET/CT combines the high sensitivity for tumor detection of PET and anatomical information from CT. In PET/CT image assessment, automatic tumor segmentation is an important step, and in recent years, deep learning based methods have become the state-of-the-art. Unfortunately, existing methods tend to over-segment the tumor regions and include regions such as the normal high uptake organs, inflammation, and other infections. In this study, we introduce a false positive reduction network to overcome this limitation. We firstly introduced a self-supervised pre-trained global segmentation module to coarsely delineate the candidate tumor regions using a self-supervised pre-trained encoder. The candidate tumor regions were then refined by removing false positives via a local refinement module. Our experiments with the MICCAI 2022 Automated Lesion Segmentation in Whole-Body FDG-PET/CT (AutoPET) challenge dataset showed that our method achieved a dice score of 0.9324 with the preliminary testing data and was ranked 1st place in dice on the leaderboard. Our method was also ranked in the top 7 methods on the final testing data, the final ranking will be announced during the 2022 MICCAI AutoPET workshop. Our code is available at: https://github.com/YigePeng/AutoPET_False_Positive_Reduction. △ Less

Submitted 16 September, 2022; originally announced September 2022.

Comments: Pre-print paper for 2022 MICCAI AutoPET Challenge

arXiv:2112.06979 [pdf, other]

The Brain Tumor Sequence Registration (BraTS-Reg) Challenge: Establishing Correspondence Between Pre-Operative and Follow-up MRI Scans of Diffuse Glioma Patients

Authors: Bhakti Baheti, Satrajit Chakrabarty, Hamed Akbari, Michel Bilello, Benedikt Wiestler, Julian Schwarting, Evan Calabrese, Jeffrey Rudie, Syed Abidi, Mina Mousa, Javier Villanueva-Meyer, Brandon K. K. Fields, Florian Kofler, Russell Takeshi Shinohara, Juan Eugenio Iglesias, Tony C. W. Mok, Albert C. S. Chung, Marek Wodzinski, Artur Jurgas, Niccolo Marini, Manfredo Atzori, Henning Muller, Christoph Grobroehmer, Hanna Siebert, Lasse Hansen , et al. (48 additional authors not shown)

Abstract: Registration of longitudinal brain MRI scans containing pathologies is challenging due to dramatic changes in tissue appearance. Although there has been progress in develo** general-purpose medical image registration techniques, they have not yet attained the requisite precision and reliability for this task, highlighting its inherent complexity. Here we describe the Brain Tumor Sequence Registr… ▽ More Registration of longitudinal brain MRI scans containing pathologies is challenging due to dramatic changes in tissue appearance. Although there has been progress in develo** general-purpose medical image registration techniques, they have not yet attained the requisite precision and reliability for this task, highlighting its inherent complexity. Here we describe the Brain Tumor Sequence Registration (BraTS-Reg) challenge, as the first public benchmark environment for deformable registration algorithms focusing on estimating correspondences between pre-operative and follow-up scans of the same patient diagnosed with a diffuse brain glioma. The BraTS-Reg data comprise de-identified multi-institutional multi-parametric MRI (mpMRI) scans, curated for size and resolution according to a canonical anatomical template, and divided into training, validation, and testing sets. Clinical experts annotated ground truth (GT) landmark points of anatomical locations distinct across the temporal domain. Quantitative evaluation and ranking were based on the Median Euclidean Error (MEE), Robustness, and the determinant of the Jacobian of the displacement field. The top-ranked methodologies yielded similar performance across all evaluation metrics and shared several methodological commonalities, including pre-alignment, deep neural networks, inverse consistency analysis, and test-time instance optimization per-case basis as a post-processing step. The top-ranked method attained the MEE at or below that of the inter-rater variability for approximately 60% of the evaluated landmarks, underscoring the scope for further accuracy and robustness improvements, especially relative to human experts. The aim of BraTS-Reg is to continue to serve as an active resource for research, with the data and online evaluation tools accessible at https://bratsreg.github.io/. △ Less

Submitted 17 April, 2024; v1 submitted 13 December, 2021; originally announced December 2021.

arXiv:2109.14805 [pdf, other]

Unsupervised Landmark Detection Based Spatiotemporal Motion Estimation for 4D Dynamic Medical Images

Authors: Yuyu Guo, Lei Bi, Dongming Wei, Liyun Chen, Zhengbin Zhu, Dagan Feng, Ruiyan Zhang, Qian Wang, **man Kim

Abstract: Motion estimation is a fundamental step in dynamic medical image processing for the assessment of target organ anatomy and function. However, existing image-based motion estimation methods, which optimize the motion field by evaluating the local image similarity, are prone to produce implausible estimation, especially in the presence of large motion. In this study, we provide a novel motion estima… ▽ More Motion estimation is a fundamental step in dynamic medical image processing for the assessment of target organ anatomy and function. However, existing image-based motion estimation methods, which optimize the motion field by evaluating the local image similarity, are prone to produce implausible estimation, especially in the presence of large motion. In this study, we provide a novel motion estimation framework of Dense-Sparse-Dense (DSD), which comprises two stages. In the first stage, we process the raw dense image to extract sparse landmarks to represent the target organ anatomical topology and discard the redundant information that is unnecessary for motion estimation. For this purpose, we introduce an unsupervised 3D landmark detection network to extract spatially sparse but representative landmarks for the target organ motion estimation. In the second stage, we derive the sparse motion displacement from the extracted sparse landmarks of two images of different time points. Then, we present a motion reconstruction network to construct the motion field by projecting the sparse landmarks displacement back into the dense image domain. Furthermore, we employ the estimated motion field from our two-stage DSD framework as initialization and boost the motion estimation quality in light-weight yet effective iterative optimization. We evaluate our method on two dynamic medical imaging tasks to model cardiac motion and lung respiratory motion, respectively. Our method has produced superior motion estimation accuracy compared to existing comparative methods. Besides, the extensive experimental results demonstrate that our solution can extract well representative anatomical landmarks without any requirement of manual annotation. Our code is publicly available online. △ Less

Submitted 7 November, 2021; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: accepted by IEEE Transactions on Cybernetics

arXiv:2109.07711 [pdf]

doi 10.1109/JBHI.2022.3181791

DeepMTS: Deep Multi-task Learning for Survival Prediction in Patients with Advanced Nasopharyngeal Carcinoma using Pretreatment PET/CT

Authors: Mingyuan Meng, Bingxin Gu, Lei Bi, Shaoli Song, David Dagan Feng, **man Kim

Abstract: Nasopharyngeal Carcinoma (NPC) is a malignant epithelial cancer arising from the nasopharynx. Survival prediction is a major concern for NPC patients, as it provides early prognostic information to plan treatments. Recently, deep survival models based on deep learning have demonstrated the potential to outperform traditional radiomics-based survival prediction models. Deep survival models usually… ▽ More Nasopharyngeal Carcinoma (NPC) is a malignant epithelial cancer arising from the nasopharynx. Survival prediction is a major concern for NPC patients, as it provides early prognostic information to plan treatments. Recently, deep survival models based on deep learning have demonstrated the potential to outperform traditional radiomics-based survival prediction models. Deep survival models usually use image patches covering the whole target regions (e.g., nasopharynx for NPC) or containing only segmented tumor regions as the input. However, the models using the whole target regions will also include non-relevant background information, while the models using segmented tumor regions will disregard potentially prognostic information existing out of primary tumors (e.g., local lymph node metastasis and adjacent tissue invasion). In this study, we propose a 3D end-to-end Deep Multi-Task Survival model (DeepMTS) for joint survival prediction and tumor segmentation in advanced NPC from pretreatment PET/CT. Our novelty is the introduction of a hard-sharing segmentation backbone to guide the extraction of local features related to the primary tumors, which reduces the interference from non-relevant background information. In addition, we also introduce a cascaded survival network to capture the prognostic information existing out of primary tumors and further leverage the global tumor information (e.g., tumor size, shape, and locations) derived from the segmentation backbone. Our experiments with two clinical datasets demonstrate that our DeepMTS can consistently outperform traditional radiomics-based survival prediction models and existing deep survival models. △ Less

Submitted 7 June, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

Comments: Accepted at IEEE Journal of Biomedical and Health Informatics (JBHI)

Journal ref: IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 9, pp. 4497-4507, 2022

arXiv:2106.04961 [pdf]

Spatio-Temporal Dual-Stream Neural Network for Sequential Whole-Body PET Segmentation

Authors: Kai-Chieh Liang, Lei Bi, Ashnil Kumar, Michael Fulham, **man Kim

Abstract: Sequential whole-body 18F-Fluorodeoxyglucose (FDG) positron emission tomography (PET) scans are regarded as the imaging modality of choice for the assessment of treatment response in the lymphomas because they detect treatment response when there may not be changes on anatomical imaging. Any computerized analysis of lymphomas in whole-body PET requires automatic segmentation of the studies so that… ▽ More Sequential whole-body 18F-Fluorodeoxyglucose (FDG) positron emission tomography (PET) scans are regarded as the imaging modality of choice for the assessment of treatment response in the lymphomas because they detect treatment response when there may not be changes on anatomical imaging. Any computerized analysis of lymphomas in whole-body PET requires automatic segmentation of the studies so that sites of disease can be quantitatively monitored over time. State-of-the-art PET image segmentation methods are based on convolutional neural networks (CNNs) given their ability to leverage annotated datasets to derive high-level features about the disease process. Such methods, however, focus on PET images from a single time-point and discard information from other scans or are targeted towards specific organs and cannot cater for the multiple structures in whole-body PET images. In this study, we propose a spatio-temporal 'dual-stream' neural network (ST-DSNN) to segment sequential whole-body PET scans. Our ST-DSNN learns and accumulates image features from the PET images done over time. The accumulated image features are used to enhance the organs / structures that are consistent over time to allow easier identification of sites of active lymphoma. Our results show that our method outperforms the state-of-the-art PET image segmentation methods. △ Less

Submitted 9 June, 2021; originally announced June 2021.

Comments: 16 pages

arXiv:2104.11416 [pdf]

doi 10.1088/1361-6560/ac3d17

Predicting Distant Metastases in Soft-Tissue Sarcomas from PET-CT scans using Constrained Hierarchical Multi-Modality Feature Learning

Authors: Yige Peng, Lei Bi, Ashnil Kumar, Michael Fulham, Dagan Feng, **man Kim

Abstract: Distant metastases (DM) refer to the dissemination of tumors, usually, beyond the organ where the tumor originated. They are the leading cause of death in patients with soft-tissue sarcomas (STSs). Positron emission tomography-computed tomography (PET-CT) is regarded as the imaging modality of choice for the management of STSs. It is difficult to determine from imaging studies which STS patients w… ▽ More Distant metastases (DM) refer to the dissemination of tumors, usually, beyond the organ where the tumor originated. They are the leading cause of death in patients with soft-tissue sarcomas (STSs). Positron emission tomography-computed tomography (PET-CT) is regarded as the imaging modality of choice for the management of STSs. It is difficult to determine from imaging studies which STS patients will develop metastases. 'Radiomics' refers to the extraction and analysis of quantitative features from medical images and it has been employed to help identify such tumors. The state-of-the-art in radiomics is based on convolutional neural networks (CNNs). Most CNNs are designed for single-modality imaging data (CT or PET alone) and do not exploit the information embedded in PET-CT where there is a combination of an anatomical and functional imaging modality. Furthermore, most radiomic methods rely on manual input from imaging specialists for tumor delineation, definition and selection of radiomic features. This approach, however, may not be scalable to tumors with complex boundaries and where there are multiple other sites of disease. We outline a new 3D CNN to help predict DM in STS patients from PET-CT data. The 3D CNN uses a constrained feature learning module and a hierarchical multi-modality feature learning module that leverages the complementary information from the modalities to focus on semantically important regions. Our results on a public PET-CT dataset of STS patients show that multi-modal information improves the ability to identify those patients who develop DM. Further our method outperformed all other related state-of-the-art methods. △ Less

Submitted 23 April, 2021; originally announced April 2021.

Comments: Under Review

arXiv:2103.05220 [pdf]

doi 10.3389/fonc.2022.899351

Prediction of 5-year Progression-Free Survival in Advanced Nasopharyngeal Carcinoma with Pretreatment PET/CT using Multi-Modality Deep Learning-based Radiomics

Authors: Bingxin Gu, Mingyuan Meng, Lei Bi, **man Kim, David Dagan Feng, Shaoli Song

Abstract: Objective: Deep Learning-based Radiomics (DLR) has achieved great success in medical image analysis and has been considered a replacement for conventional radiomics that relies on handcrafted features. In this study, we aimed to explore the capability of DLR for the prediction of 5-year Progression-Free Survival (PFS) in Nasopharyngeal Carcinoma (NPC) using pretreatment PET/CT. Methods: A total of… ▽ More Objective: Deep Learning-based Radiomics (DLR) has achieved great success in medical image analysis and has been considered a replacement for conventional radiomics that relies on handcrafted features. In this study, we aimed to explore the capability of DLR for the prediction of 5-year Progression-Free Survival (PFS) in Nasopharyngeal Carcinoma (NPC) using pretreatment PET/CT. Methods: A total of 257 patients (170/87 in internal/external cohorts) with advanced NPC (TNM stage III or IVa) were enrolled. We developed an end-to-end multi-modality DLR model, in which a 3D convolutional neural network was optimized to extract deep features from pretreatment PET/CT images and predict the probability of 5-year PFS. TNM stage, as a high-level clinical feature, could be integrated into our DLR model to further improve the prognostic performance. To compare conventional radiomics and DLR, 1456 handcrafted features were extracted, and optimal conventional radiomics methods were selected from 54 cross-combinations of 6 feature selection methods and 9 classification methods. In addition, risk group stratification was performed with clinical signature, conventional radiomics signature, and DLR signature. Results: Our multi-modality DLR model using both PET and CT achieved higher prognostic performance than the optimal conventional radiomics method. Furthermore, the multi-modality DLR model outperformed single-modality DLR models using only PET or only CT. For risk group stratification, the conventional radiomics signature and DLR signature enabled significant differences between the high- and low-risk patient groups in both internal and external cohorts, while the clinical signature failed in the external cohort. Conclusion: Our study identified potential prognostic tools for survival prediction in advanced NPC, suggesting that DLR could provide complementary values to the current TNM staging. △ Less

Submitted 4 July, 2022; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: Accepted at Frontiers in Oncology

Journal ref: Frontiers in Oncology, vol. 12, pp. 899352, 2022

arXiv:2103.05213 [pdf]

doi 10.1016/j.neuroimage.2022.119444

Enhancing Medical Image Registration via Appearance Adjustment Networks

Authors: Mingyuan Meng, Lei Bi, Michael Fulham, David Dagan Feng, **man Kim

Abstract: Deformable image registration is fundamental for many medical image analyses. A key obstacle for accurate image registration lies in image appearance variations such as the variations in texture, intensities, and noise. These variations are readily apparent in medical images, especially in brain images where registration is frequently used. Recently, deep learning-based registration methods (DLRs)… ▽ More Deformable image registration is fundamental for many medical image analyses. A key obstacle for accurate image registration lies in image appearance variations such as the variations in texture, intensities, and noise. These variations are readily apparent in medical images, especially in brain images where registration is frequently used. Recently, deep learning-based registration methods (DLRs), using deep neural networks, have shown computational efficiency that is several orders of magnitude faster than traditional optimization-based registration methods (ORs). DLRs rely on a globally optimized network that is trained with a set of training samples to achieve faster registration. DLRs tend, however, to disregard the target-pair-specific optimization inherent in ORs and thus have degraded adaptability to variations in testing samples. This limitation is severe for registering medical images with large appearance variations, especially since few existing DLRs explicitly take into account appearance variations. In this study, we propose an Appearance Adjustment Network (AAN) to enhance the adaptability of DLRs to appearance variations. Our AAN, when integrated into a DLR, provides appearance transformations to reduce the appearance variations during registration. In addition, we propose an anatomy-constrained loss function through which our AAN generates anatomy-preserving transformations. Our AAN has been purposely designed to be readily inserted into a wide range of DLRs and can be trained cooperatively in an unsupervised and end-to-end manner. We evaluated our AAN with three state-of-the-art DLRs on three well-established public datasets of 3D brain magnetic resonance imaging (MRI). The results show that our AAN consistently improved existing DLRs and outperformed state-of-the-art ORs on registration accuracy, while adding a fractional computational load to existing DLRs. △ Less

Submitted 3 July, 2022; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: Published at NeuroImage

Journal ref: NeuroImage, vol. 259, pp. 119444, 2022

arXiv:2103.03931 [pdf]

doi 10.1016/j.patcog.2022.108576

Attention-Enhanced Cross-Task Network for Analysing Multiple Attributes of Lung Nodules in CT

Authors: Xiaohang Fu, Lei Bi, Ashnil Kumar, Michael Fulham, **man Kim

Abstract: Accurate characterisation of visual attributes such as spiculation, lobulation, and calcification of lung nodules is critical in cancer management. The characterisation of these attributes is often subjective, which may lead to high inter- and intra-observer variability. Furthermore, lung nodules are often heterogeneous in the cross-sectional image slices of a 3D volume. Current state-of-the-art m… ▽ More Accurate characterisation of visual attributes such as spiculation, lobulation, and calcification of lung nodules is critical in cancer management. The characterisation of these attributes is often subjective, which may lead to high inter- and intra-observer variability. Furthermore, lung nodules are often heterogeneous in the cross-sectional image slices of a 3D volume. Current state-of-the-art methods that score multiple attributes rely on deep learning-based multi-task learning (MTL) schemes. These methods, however, extract shared visual features across attributes and then examine each attribute without explicitly leveraging their inherent intercorrelations. Furthermore, current methods either treat each slice with equal importance without considering their relevance or heterogeneity, which limits performance. In this study, we address these challenges with a new convolutional neural network (CNN)-based MTL model that incorporates multiple attention-based learning modules to simultaneously score 9 visual attributes of lung nodules in computed tomography (CT) image volumes. Our model processes entire nodule volumes of arbitrary depth and uses a slice attention module to filter out irrelevant slices. We also introduce cross-attribute and attribute specialisation attention modules that learn an optimal amalgamation of meaningful representations to leverage relationships between attributes. We demonstrate that our model outperforms previous state-of-the-art methods at scoring attributes using the well-known public LIDC-IDRI dataset of pulmonary nodules from over 1,000 patients. Our model also performs competitively when repurposed for benign-malignant classification. Our attention modules also provide easy-to-interpret weights that offer insights into the predictions of the model. △ Less

Submitted 10 June, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

arXiv:2007.14728 [pdf]

doi 10.1109/JBHI.2021.3059453

Multimodal Spatial Attention Module for Targeting Multimodal PET-CT Lung Tumor Segmentation

Authors: Xiaohang Fu, Lei Bi, Ashnil Kumar, Michael Fulham, **man Kim

Abstract: Multimodal positron emission tomography-computed tomography (PET-CT) is used routinely in the assessment of cancer. PET-CT combines the high sensitivity for tumor detection with PET and anatomical information from CT. Tumor segmentation is a critical element of PET-CT but at present, there is not an accurate automated segmentation method. Segmentation tends to be done manually by different imaging… ▽ More Multimodal positron emission tomography-computed tomography (PET-CT) is used routinely in the assessment of cancer. PET-CT combines the high sensitivity for tumor detection with PET and anatomical information from CT. Tumor segmentation is a critical element of PET-CT but at present, there is not an accurate automated segmentation method. Segmentation tends to be done manually by different imaging experts and it is labor-intensive and prone to errors and inconsistency. Previous automated segmentation methods largely focused on fusing information that is extracted separately from the PET and CT modalities, with the underlying assumption that each modality contains complementary information. However, these methods do not fully exploit the high PET tumor sensitivity that can guide the segmentation. We introduce a multimodal spatial attention module (MSAM) that automatically learns to emphasize regions (spatial areas) related to tumors and suppress normal regions with physiologic high-uptake. The resulting spatial attention maps are subsequently employed to target a convolutional neural network (CNN) for segmentation of areas with higher tumor likelihood. Our MSAM can be applied to common backbone architectures and trained end-to-end. Our experimental results on two clinical PET-CT datasets of non-small cell lung cancer (NSCLC) and soft tissue sarcoma (STS) validate the effectiveness of the MSAM in these different cancer types. We show that our MSAM, with a conventional U-Net backbone, surpasses the state-of-the-art lung tumor segmentation approach by a margin of 7.6% in Dice similarity coefficient (DSC). △ Less

Submitted 6 August, 2020; v1 submitted 29 July, 2020; originally announced July 2020.

arXiv:2007.06002 [pdf, other]

Multi-Modality Information Fusion for Radiomics-based Neural Architecture Search

Authors: Yige Peng, Lei Bi, Michael Fulham, Dagan Feng, **man Kim

Abstract: 'Radiomics' is a method that extracts mineable quantitative features from radiographic images. These features can then be used to determine prognosis, for example, predicting the development of distant metastases (DM). Existing radiomics methods, however, require complex manual effort including the design of hand-crafted radiomic features and their extraction and selection. Recent radiomics method… ▽ More 'Radiomics' is a method that extracts mineable quantitative features from radiographic images. These features can then be used to determine prognosis, for example, predicting the development of distant metastases (DM). Existing radiomics methods, however, require complex manual effort including the design of hand-crafted radiomic features and their extraction and selection. Recent radiomics methods, based on convolutional neural networks (CNNs), also require manual input in network architecture design and hyper-parameter tuning. Radiomic complexity is further compounded when there are multiple imaging modalities, for example, combined positron emission tomography - computed tomography (PET-CT) where there is functional information from PET and complementary anatomical localization information from computed tomography (CT). Existing multi-modality radiomics methods manually fuse the data that are extracted separately. Reliance on manual fusion often results in sub-optimal fusion because they are dependent on an 'expert's' understanding of medical images. In this study, we propose a multi-modality neural architecture search method (MM-NAS) to automatically derive optimal multi-modality image features for radiomics and thus negate the dependence on a manual process. We evaluated our MM-NAS on the ability to predict DM using a public PET-CT dataset of patients with soft-tissue sarcomas (STSs). Our results show that our MM-NAS had a higher prediction accuracy when compared to state-of-the-art radiomics methods. △ Less

Submitted 12 July, 2020; originally announced July 2020.

Comments: Accepted by MICCAI 2020

arXiv:2002.12680 [pdf, other]

A Spatiotemporal Volumetric Interpolation Network for 4D Dynamic Medical Image

Authors: Yuyu Guo, Lei Bi, Euijoon Ahn, Dagan Feng, Qian Wang, **man Kim

Abstract: Dynamic medical imaging is usually limited in application due to the large radiation doses and longer image scanning and reconstruction times. Existing methods attempt to reduce the dynamic sequence by interpolating the volumes between the acquired image volumes. However, these methods are limited to either 2D images and/or are unable to support large variations in the motion between the image vol… ▽ More Dynamic medical imaging is usually limited in application due to the large radiation doses and longer image scanning and reconstruction times. Existing methods attempt to reduce the dynamic sequence by interpolating the volumes between the acquired image volumes. However, these methods are limited to either 2D images and/or are unable to support large variations in the motion between the image volume sequences. In this paper, we present a spatiotemporal volumetric interpolation network (SVIN) designed for 4D dynamic medical images. SVIN introduces dual networks: first is the spatiotemporal motion network that leverages the 3D convolutional neural network (CNN) for unsupervised parametric volumetric registration to derive spatiotemporal motion field from two-image volumes; the second is the sequential volumetric interpolation network, which uses the derived motion field to interpolate image volumes, together with a new regression-based module to characterize the periodic motion cycles in functional organ structures. We also introduce an adaptive multi-scale architecture to capture the volumetric large anatomy motions. Experimental results demonstrated that our SVIN outperformed state-of-the-art temporal medical interpolation methods and natural video interpolation methods that have been extended to support volumetric images. Our ablation study further exemplified that our motion network was able to better represent the large functional motion compared with the state-of-the-art unsupervised medical registration methods. △ Less

Submitted 24 April, 2020; v1 submitted 28 February, 2020; originally announced February 2020.

Comments: 10 pages, 8 figures, Conference on Computer Vision and Pattern Recognition (CVPR) 2020

Showing 1–22 of 22 results for author: Bi, L