Search | arXiv e-print repository

Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation

Authors: Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang

Abstract: The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entai… ▽ More The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entail manual or semi-manual corrections employing state-of-the-art annotation tools. Motivated by this process, we introduce a novel approach that leverages the advantages of online machine learning to enhance Segment Anything (SA) during test time. We employ rectified annotations to perform online learning, with the aim of improving the segmentation quality of SA on medical images. To improve the effectiveness and efficiency of online learning when integrated with large-scale vision models like SAM, we propose a new method called Auxiliary Online Learning (AuxOL). AuxOL creates and applies a small auxiliary model (specialist) in conjunction with SAM (generalist), entails adaptive online-batch and adaptive segmentation fusion. Experiments conducted on eight datasets covering four medical imaging modalities validate the effectiveness of the proposed method. Our work proposes and validates a new, practical, and effective approach for enhancing SA on downstream segmentation tasks (e.g., medical image segmentation). △ Less

Submitted 2 June, 2024; originally announced June 2024.

Comments: Project Link: https://sam-auxol.github.io/AuxOL/

arXiv:2404.15339 [pdf, other]

Efficient EndoNeRF Reconstruction and Its Application for Data-driven Surgical Simulation

Authors: Yuehao Wang, Bingchen Gong, Yonghao Long, Siu Hin Fan, Qi Dou

Abstract: The healthcare industry has a growing need for realistic modeling and efficient simulation of surgical scenes. With effective models of deformable surgical scenes, clinicians are able to conduct surgical planning and surgery training on scenarios close to real-world cases. However, a significant challenge in achieving such a goal is the scarcity of high-quality soft tissue models with accurate sha… ▽ More The healthcare industry has a growing need for realistic modeling and efficient simulation of surgical scenes. With effective models of deformable surgical scenes, clinicians are able to conduct surgical planning and surgery training on scenarios close to real-world cases. However, a significant challenge in achieving such a goal is the scarcity of high-quality soft tissue models with accurate shapes and textures. To address this gap, we present a data-driven framework that leverages emerging neural radiance field technology to enable high-quality surgical reconstruction and explore its application for surgical simulations. We first focus on develo** a fast NeRF-based surgical scene 3D reconstruction approach that achieves state-of-the-art performance. This method can significantly outperform traditional 3D reconstruction methods, which have failed to capture large deformations and produce fine-grained shapes and textures. We then propose an automated creation pipeline of interactive surgical simulation environments through a closed mesh extraction algorithm. Our experiments have validated the superior performance and efficiency of our proposed approach in surgical scene 3D reconstruction. We further utilize our reconstructed soft tissues to conduct FEM and MPM simulations, showcasing the practical application of our method in data-driven surgical simulations. △ Less

Submitted 10 April, 2024; originally announced April 2024.

Comments: 14 pages, 4 figures. Accepted by International Journal of Computer Assisted Radiology and Surgery

arXiv:2404.01082 [pdf, other]

The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023

Authors: Jun Lyu, Chen Qin, Shuo Wang, Fanwen Wang, Yan Li, Zi Wang, Kunyuan Guo, Cheng Ouyang, Michael Tänzer, Meng Liu, Longyu Sun, Mengting Sun, Qin Li, Zhang Shi, Sha Hua, Hao Li, Zhensen Chen, Zhenlin Zhang, Bingyu Xin, Dimitris N. Metaxas, George Yiasemis, Jonas Teuwen, Li** Zhang, Weitian Chen, Yidong Zhao , et al. (25 additional authors not shown)

Abstract: Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation p… ▽ More Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation platform hinder the development of data-driven reconstruction algorithms. To address this issue, we organized the Cardiac MRI Reconstruction Challenge (CMRxRecon) in 2023, in collaboration with the 26th International Conference on MICCAI. CMRxRecon presented an extensive k-space dataset comprising cine and map** raw data, accompanied by detailed annotations of cardiac anatomical structures. With overwhelming participation, the challenge attracted more than 285 teams and over 600 participants. Among them, 22 teams successfully submitted Docker containers for the testing phase, with 7 teams submitted for both cine and map** tasks. All teams use deep learning based approaches, indicating that deep learning has predominately become a promising solution for the problem. The first-place winner of both tasks utilizes the E2E-VarNet architecture as backbones. In contrast, U-Net is still the most popular backbone for both multi-coil and single-coil reconstructions. This paper provides a comprehensive overview of the challenge design, presents a summary of the submitted results, reviews the employed methods, and offers an in-depth discussion that aims to inspire future advancements in cardiac MRI reconstruction models. The summary emphasizes the effective strategies observed in Cardiac MRI reconstruction, including backbone architecture, loss function, pre-processing techniques, physical modeling, and model complexity, thereby providing valuable insights for further developments in this field. △ Less

Submitted 16 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 25 pages, 17 figures

arXiv:2312.09899 [pdf, other]

SQA-SAM: Segmentation Quality Assessment for Medical Images Utilizing the Segment Anything Model

Authors: Yizhe Zhang, Shuo Wang, Tao Zhou, Qi Dou, Danny Z. Chen

Abstract: Segmentation quality assessment (SQA) plays a critical role in the deployment of a medical image based AI system. Users need to be informed/alerted whenever an AI system generates unreliable/incorrect predictions. With the introduction of the Segment Anything Model (SAM), a general foundation segmentation model, new research opportunities emerged in how one can utilize SAM for medical image segmen… ▽ More Segmentation quality assessment (SQA) plays a critical role in the deployment of a medical image based AI system. Users need to be informed/alerted whenever an AI system generates unreliable/incorrect predictions. With the introduction of the Segment Anything Model (SAM), a general foundation segmentation model, new research opportunities emerged in how one can utilize SAM for medical image segmentation. In this paper, we propose a novel SQA method, called SQA-SAM, which exploits SAM to enhance the accuracy of quality assessment for medical image segmentation. When a medical image segmentation model (MedSeg) produces predictions for a test image, we generate visual prompts based on the predictions, and SAM is utilized to generate segmentation maps corresponding to the visual prompts. How well MedSeg's segmentation aligns with SAM's segmentation indicates how well MedSeg's segmentation aligns with the general perception of objectness and image region partition. We develop a score measure for such alignment. In experiments, we find that the generated scores exhibit moderate to strong positive correlation (in Pearson correlation and Spearman correlation) with Dice coefficient scores reflecting the true segmentation quality. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: Work in progress;

arXiv:2304.11341 [pdf, ps, other]

Performance Analysis and Optimal Design of HARQ-IR-Aided Terahertz Communications

Authors: Ziyang Song, Zheng Shi, Jiaji Su, Qing** Dou, Guanghua Yang, Haichuan Ding, Shaodan Ma

Abstract: Terahertz (THz) communications are envisioned to be a promising technology for 6G thanks to its broad bandwidth. However, the large path loss, antenna misalignment, and atmospheric influence of THz communications severely deteriorate its reliability. To address this, hybrid automatic repeat request (HARQ) is recognized as an effective technique to ensure reliable THz communications. This paper del… ▽ More Terahertz (THz) communications are envisioned to be a promising technology for 6G thanks to its broad bandwidth. However, the large path loss, antenna misalignment, and atmospheric influence of THz communications severely deteriorate its reliability. To address this, hybrid automatic repeat request (HARQ) is recognized as an effective technique to ensure reliable THz communications. This paper delves into the performance analysis of HARQ with incremental redundancy (HARQ-IR)-aided THz communications in the presence/absence of blockage. More specifically, the analytical expression of the outage probability of HARQ-IR-aided THz communications is derived, with which the asymptotic outage analysis is enabled to gain meaningful insights, including diversity order, power allocation gain, modulation and coding gain, etc. Then the long term average throughput (LTAT) is expressed in terms of the outage probability based on renewal theory. Moreover, to combat the blockage effects, a multi-hop HARQ-IR-aided THz communication scheme is proposed and its performance is examined. To demonstrate the superiority of the proposed scheme, the other two HARQ-aided schemes, i.e., Type-I HARQ and HARQ with chase combining (HARQ-CC), are used for benchmarking in the simulations. In addition, a deep neural network (DNN) based outage evaluation framework with low computational complexity is devised to reap the benefits of using both asymptotic and simulation results in low and high outage regimes, respectively. This novel outage evaluation framework is finally employed for the optimal rate selection, which outperforms the asymptotic based optimization. △ Less

Submitted 22 April, 2023; originally announced April 2023.

Comments: Blockage, hybrid automatic repeat request (HARQ), outage probability, terahertz (THz) communications

arXiv:2302.12662 [pdf, other]

FedDBL: Communication and Data Efficient Federated Deep-Broad Learning for Histopathological Tissue Classification

Authors: Tianpeng Deng, Yanqi Huang, Guoqiang Han, Zhenwei Shi, Jiatai Lin, Qi Dou, Zaiyi Liu, Xiao-**g Guo, C. L. Philip Chen, Chu Han

Abstract: Histopathological tissue classification is a fundamental task in computational pathology. Deep learning-based models have achieved superior performance but centralized training with data centralization suffers from the privacy leakage problem. Federated learning (FL) can safeguard privacy by kee** training samples locally, but existing FL-based frameworks require a large number of well-annotated… ▽ More Histopathological tissue classification is a fundamental task in computational pathology. Deep learning-based models have achieved superior performance but centralized training with data centralization suffers from the privacy leakage problem. Federated learning (FL) can safeguard privacy by kee** training samples locally, but existing FL-based frameworks require a large number of well-annotated training samples and numerous rounds of communication which hinder their practicability in the real-world clinical scenario. In this paper, we propose a universal and lightweight federated learning framework, named Federated Deep-Broad Learning (FedDBL), to achieve superior classification performance with limited training samples and only one-round communication. By simply associating a pre-trained deep learning feature extractor, a fast and lightweight broad learning inference system and a classical federated aggregation approach, FedDBL can dramatically reduce data dependency and improve communication efficiency. Five-fold cross-validation demonstrates that FedDBL greatly outperforms the competitors with only one-round communication and limited training samples, while it even achieves comparable performance with the ones under multiple-round communications. Furthermore, due to the lightweight design and one-round communication, FedDBL reduces the communication burden from 4.6GB to only 276.5KB per client using the ResNet-50 backbone at 50-round training. Since no data or deep model sharing across different clients, the privacy issue is well-solved and the model security is guaranteed with no model inversion attack risk. Code is available at https://github.com/tianpeng-deng/FedDBL. △ Less

Submitted 17 December, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

arXiv:2209.13638 [pdf, ps, other]

Outage Probability Analysis of HARQ-Aided Terahertz Communications

Authors: Ziyang Song, Zheng Shi, Qing** Dou, Guanghua Yang, Yunfei Li, Shaodan Ma

Abstract: Although terahertz (THz) communications can provide mobile broadband services, it usually has a large path loss and is vulnerable to antenna misalignment. This significantly degrades the reception reliability. To address this issue, the hybrid automatic repeat request (HARQ) is proposed to further enhance the reliability of THz communications. This paper provides an in-depth investigation on the o… ▽ More Although terahertz (THz) communications can provide mobile broadband services, it usually has a large path loss and is vulnerable to antenna misalignment. This significantly degrades the reception reliability. To address this issue, the hybrid automatic repeat request (HARQ) is proposed to further enhance the reliability of THz communications. This paper provides an in-depth investigation on the outage performance of two different types of HARQ-aided THz communications, including Type-I HARQ and HARQ with chase combining (HARQ-CC). Moreover, the effects of both fading and stochastic antenna misalignment are considered in this paper. The exact outage probabilities of HARQ-aided THz communications are derived in closed-form, with which the asymptotic outage analysis is enabled to explore helpful insights. In particular, it is revealed that full time diversity can be achieved by using HARQ assisted schemes. Besides, the HARQ-CC-aided scheme performs better than the Type-I HARQ-aided one due to its high diversity combining gain. The analytical results are eventually validated via Monte-Carlo simulations. △ Less

Submitted 24 September, 2022; originally announced September 2022.

arXiv:2207.00769 [pdf, other]

Test-time Adaptation with Calibration of Medical Image Classification Nets for Label Distribution Shift

Authors: Wenao Ma, Cheng Chen, Shuang Zheng, **g Qin, Huimao Zhang, Qi Dou

Abstract: Class distribution plays an important role in learning deep classifiers. When the proportion of each class in the test set differs from the training set, the performance of classification nets usually degrades. Such a label distribution shift problem is common in medical diagnosis since the prevalence of disease vary over location and time. In this paper, we propose the first method to tackle labe… ▽ More Class distribution plays an important role in learning deep classifiers. When the proportion of each class in the test set differs from the training set, the performance of classification nets usually degrades. Such a label distribution shift problem is common in medical diagnosis since the prevalence of disease vary over location and time. In this paper, we propose the first method to tackle label shift for medical image classification, which effectively adapt the model learned from a single training label distribution to arbitrary unknown test label distribution. Our approach innovates distribution calibration to learn multiple representative classifiers, which are capable of handling different one-dominating-class distributions. When given a test image, the diverse classifiers are dynamically aggregated via the consistency-driven test-time adaptation, to deal with the unknown test label distribution. We validate our method on two important medical image classification tasks including liver fibrosis staging and COVID-19 severity prediction. Our experiments clearly show the decreased model performance under label shift. With our method, model performance significantly improves on all the test datasets with different label shifts for both medical image diagnosis tasks. △ Less

Submitted 9 July, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

Comments: This paper has been accepted by MICCAI 2022

arXiv:2205.04723 [pdf, other]

Robust Medical Image Classification from Noisy Labeled Data with Global and Local Representation Guided Co-training

Authors: Cheng Xue, Lequan Yu, Pengfei Chen, Qi Dou, Pheng-Ann Heng

Abstract: Deep neural networks have achieved remarkable success in a wide variety of natural image and medical image computing tasks. However, these achievements indispensably rely on accurately annotated training data. If encountering some noisy-labeled images, the network training procedure would suffer from difficulties, leading to a sub-optimal classifier. This problem is even more severe in the medical… ▽ More Deep neural networks have achieved remarkable success in a wide variety of natural image and medical image computing tasks. However, these achievements indispensably rely on accurately annotated training data. If encountering some noisy-labeled images, the network training procedure would suffer from difficulties, leading to a sub-optimal classifier. This problem is even more severe in the medical image analysis field, as the annotation quality of medical images heavily relies on the expertise and experience of annotators. In this paper, we propose a novel collaborative training paradigm with global and local representation learning for robust medical image classification from noisy-labeled data to combat the lack of high quality annotated medical data. Specifically, we employ the self-ensemble model with a noisy label filter to efficiently select the clean and noisy samples. Then, the clean samples are trained by a collaborative training strategy to eliminate the disturbance from imperfect labeled samples. Notably, we further design a novel global and local representation learning scheme to implicitly regularize the networks to utilize noisy samples in a self-supervised manner. We evaluated our proposed robust learning strategy on four public medical image classification datasets with three types of label noise,ie,random noise, computer-generated label noise, and inter-observer variability noise. Our method outperforms other learning from noisy label methods and we also conducted extensive experiments to analyze each component of our method. △ Less

Submitted 10 May, 2022; originally announced May 2022.

arXiv:2204.10836 [pdf, other]

doi 10.1038/s41467-022-33407-5

Federated Learning Enables Big Data for Rare Cancer Boundary Detection

Authors: Sarthak Pati, Ujjwal Baid, Brandon Edwards, Micah Sheller, Shih-Han Wang, G Anthony Reina, Patrick Foley, Alexey Gruzdev, Deepthi Karkada, Christos Davatzikos, Chiharu Sako, Satyam Ghodasara, Michel Bilello, Suyash Mohan, Philipp Vollmuth, Gianluca Brugnara, Chandrakanth J Preetha, Felix Sahm, Klaus Maier-Hein, Maximilian Zenk, Martin Bendszus, Wolfgang Wick, Evan Calabrese, Jeffrey Rudie, Javier Villanueva-Meyer , et al. (254 additional authors not shown)

Abstract: Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train acc… ▽ More Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing. △ Less

Submitted 25 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: federated learning, deep learning, convolutional neural network, segmentation, brain tumor, glioma, glioblastoma, FeTS, BraTS

arXiv:2204.08467 [pdf, other]

IOP-FL: Inside-Outside Personalization for Federated Medical Image Segmentation

Authors: Meirui Jiang, Hongzheng Yang, Chen Cheng, Qi Dou

Abstract: Federated learning (FL) allows multiple medical institutions to collaboratively learn a global model without centralizing client data. It is difficult, if possible at all, for such a global model to commonly achieve optimal performance for each individual client, due to the heterogeneity of medical images from various scanners and patient demographics. This problem becomes even more significant wh… ▽ More Federated learning (FL) allows multiple medical institutions to collaboratively learn a global model without centralizing client data. It is difficult, if possible at all, for such a global model to commonly achieve optimal performance for each individual client, due to the heterogeneity of medical images from various scanners and patient demographics. This problem becomes even more significant when deploying the global model to unseen clients outside the FL with unseen distributions not presented during federated training. To optimize the prediction accuracy of each individual client for medical imaging tasks, we propose a novel unified framework for both \textit{Inside and Outside model Personalization in FL} (IOP-FL). Our inside personalization uses a lightweight gradient-based approach that exploits the local adapted model for each client, by accumulating both the global gradients for common knowledge and the local gradients for client-specific optimization. Moreover, and importantly, the obtained local personalized models and the global model can form a diverse and informative routing space to personalize an adapted model for outside FL clients. Hence, we design a new test-time routing scheme using the consistency loss with a shape constraint to dynamically incorporate the models, given the distribution information conveyed by the test data. Our extensive experimental results on two medical image segmentation tasks present significant improvements over SOTA methods on both inside and outside personalization, demonstrating the potential of our IOP-FL scheme for clinical practice. △ Less

Submitted 29 March, 2023; v1 submitted 16 April, 2022; originally announced April 2022.

Comments: Accepted by IEEE TMI special issue on federated learning for medical imaging

arXiv:2203.13963 [pdf, other]

Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution

Authors: Guangyuan Li, Jun Lv, Yapeng Tian, Qi Dou, Chengyan Wang, Chenliang Xu, **g Qin

Abstract: Magnetic resonance imaging (MRI) can present multi-contrast images of the same anatomical structures, enabling multi-contrast super-resolution (SR) techniques. Compared with SR reconstruction using a single-contrast, multi-contrast SR reconstruction is promising to yield SR images with higher quality by leveraging diverse yet complementary information embedded in different imaging modalities. Howe… ▽ More Magnetic resonance imaging (MRI) can present multi-contrast images of the same anatomical structures, enabling multi-contrast super-resolution (SR) techniques. Compared with SR reconstruction using a single-contrast, multi-contrast SR reconstruction is promising to yield SR images with higher quality by leveraging diverse yet complementary information embedded in different imaging modalities. However, existing methods still have two shortcomings: (1) they neglect that the multi-contrast features at different scales contain different anatomical details and hence lack effective mechanisms to match and fuse these features for better reconstruction; and (2) they are still deficient in capturing long-range dependencies, which are essential for the regions with complicated anatomical structures. We propose a novel network to comprehensively address these problems by develo** a set of innovative Transformer-empowered multi-scale contextual matching and aggregation techniques; we call it McMRSR. Firstly, we tame transformers to model long-range dependencies in both reference and target images. Then, a new multi-scale contextual matching method is proposed to capture corresponding contexts from reference features at different scales. Furthermore, we introduce a multi-scale aggregation mechanism to gradually and interactively aggregate multi-scale matched features for reconstructing the target SR MR image. Extensive experiments demonstrate that our network outperforms state-of-the-art approaches and has great potential to be applied in clinical practice. Codes are available at https://github.com/XAIMI-Lab/McMRSR. △ Less

Submitted 25 March, 2022; originally announced March 2022.

Comments: CVPR 2022 accepted

arXiv:2112.10775 [pdf, other]

HarmoFL: Harmonizing Local and Global Drifts in Federated Learning on Heterogeneous Medical Images

Authors: Meirui Jiang, Zirui Wang, Qi Dou

Abstract: Multiple medical institutions collaboratively training a model using federated learning (FL) has become a promising solution for maximizing the potential of data-driven models, yet the non-independent and identically distributed (non-iid) data in medical images is still an outstanding challenge in real-world practice. The feature heterogeneity caused by diverse scanners or protocols introduces a d… ▽ More Multiple medical institutions collaboratively training a model using federated learning (FL) has become a promising solution for maximizing the potential of data-driven models, yet the non-independent and identically distributed (non-iid) data in medical images is still an outstanding challenge in real-world practice. The feature heterogeneity caused by diverse scanners or protocols introduces a drift in the learning process, in both local (client) and global (server) optimizations, which harms the convergence as well as model performance. Many previous works have attempted to address the non-iid issue by tackling the drift locally or globally, but how to jointly solve the two essentially coupled drifts is still unclear. In this work, we concentrate on handling both local and global drifts and introduce a new harmonizing framework called HarmoFL. First, we propose to mitigate the local update drift by normalizing amplitudes of images transformed into the frequency domain to mimic a unified imaging setting, in order to generate a harmonized feature space across local clients. Second, based on harmonized features, we design a client weight perturbation guiding each local model to reach a flat optimum, where a neighborhood area of the local optimal solution has a uniformly low loss. Without any extra communication cost, the perturbation assists the global model to optimize towards a converged optimal solution by aggregating several local flat optima. We have theoretically analyzed the proposed method and empirically conducted extensive experiments on three medical image classification and segmentation tasks, showing that HarmoFL outperforms a set of recent state-of-the-art methods with promising convergence behavior. Code is available at https://github.com/med-air/HarmoFL. △ Less

Submitted 24 April, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

Comments: Accepted at AAAI 2022

arXiv:2112.10074 [pdf, other]

doi 10.59275/j.melba.2022-354b

QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results

Authors: Raghav Mehta, Angelos Filos, Ujjwal Baid, Chiharu Sako, Richard McKinley, Michael Rebsamen, Katrin Datwyler, Raphael Meier, Piotr Radojewski, Gowtham Krishnan Murugesan, Sahil Nalawade, Chandan Ganesh, Ben Wagner, Fang F. Yu, Baowei Fei, Ananth J. Madhuranthakam, Joseph A. Maldjian, Laura Daza, Catalina Gomez, Pablo Arbelaez, Chengliang Dai, Shuo Wang, Hadrien Reynaud, Yuan-han Mo, Elsa Angelini , et al. (67 additional authors not shown)

Abstract: Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying… ▽ More Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way toward clinical translation. Several uncertainty estimation methods have recently been introduced for DL medical image segmentation tasks. Develo** scores to evaluate and compare the performance of uncertainty measures will assist the end-user in making more informed decisions. In this study, we explore and evaluate a score developed during the BraTS 2019 and BraTS 2020 task on uncertainty quantification (QU-BraTS) and designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This score (1) rewards uncertainty estimates that produce high confidence in correct assertions and those that assign low confidence levels at incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, highlighting the need for uncertainty quantification in medical image analyses. Finally, in favor of transparency and reproducibility, our evaluation code is made publicly available at: https://github.com/RagMeh11/QU-BraTS. △ Less

Submitted 23 August, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA): https://www.melba-journal.org/papers/2022:026.html

Journal ref: Machine.Learning.for.Biomedical.Imaging. 1 (2022)

arXiv:2110.03912 [pdf, other]

doi 10.1109/TBME.2022.3195027

Stereo Dense Scene Reconstruction and Accurate Localization for Learning-Based Navigation of Laparoscope in Minimally Invasive Surgery

Authors: Ruofeng Wei, Bin Li, Hangjie Mo, Bo Lu, Yonghao Long, Bohan Yang, Qi Dou, Yunhui Liu, Dong Sun

Abstract: Objective: The computation of anatomical information and laparoscope position is a fundamental block of surgical navigation in Minimally Invasive Surgery (MIS). Recovering a dense 3D structure of surgical scene using visual cues remains a challenge, and the online laparoscopic tracking primarily relies on external sensors, which increases system complexity. Methods: Here, we propose a learning-dri… ▽ More Objective: The computation of anatomical information and laparoscope position is a fundamental block of surgical navigation in Minimally Invasive Surgery (MIS). Recovering a dense 3D structure of surgical scene using visual cues remains a challenge, and the online laparoscopic tracking primarily relies on external sensors, which increases system complexity. Methods: Here, we propose a learning-driven framework, in which an image-guided laparoscopic localization with 3D reconstructions of complex anatomical structures is obtained. To reconstruct the 3D structure of the whole surgical environment, we first fine-tune a learning-based stereoscopic depth perception method, which is robust to the texture-less and variant soft tissues, for depth estimation. Then, we develop a dense visual reconstruction algorithm to represent the scene by surfels, estimate the laparoscope poses and fuse the depth maps into a unified reference coordinate for tissue reconstruction. To estimate poses of new laparoscope views, we achieve a coarse-to-fine localization method, which incorporates our reconstructed 3D model. Results: We evaluate the reconstruction method and the localization module on three datasets, namely, the stereo correspondence and reconstruction of endoscopic data (SCARED), the ex-vivo phantom and tissue data collected with Universal Robot (UR) and Karl Storz Laparoscope, and the in-vivo DaVinci robotic surgery dataset, where the reconstructed 3D structures have rich details of surface texture with an accuracy error under 1.71 mm and the localization module can accurately track the laparoscope with only images as input. Conclusions: Experimental results demonstrate the superior performance of the proposed method in 3D anatomy reconstruction and laparoscopic localization. Significance: The proposed framework can be potentially extended to the current surgical navigation system. △ Less

Submitted 27 November, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

Journal ref: IEEE Transactions on Biomedical Engineering 2022

arXiv:2109.14956 [pdf]

Comparative Validation of Machine Learning Algorithms for Surgical Workflow and Skill Analysis with the HeiChole Benchmark

Authors: Martin Wagner, Beat-Peter Müller-Stich, Anna Kisilenko, Duc Tran, Patrick Heger, Lars Mündermann, David M Lubotsky, Benjamin Müller, Tornike Davitashvili, Manuela Capek, Annika Reinke, Tong Yu, Armine Vardazaryan, Chinedu Innocent Nwoye, Nicolas Padoy, Xinyang Liu, Eung-Joo Lee, Constantin Disch, Hans Meine, Tong Xia, Fucang Jia, Satoshi Kondo, Wolfgang Reiter, Yueming **, Yonghao Long , et al. (16 additional authors not shown)

Abstract: PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported fo… ▽ More PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported for phase recognition on an open data single-center dataset. In this work we investigated the generalizability of phase recognition algorithms in a multi-center setting including more difficult recognition tasks such as surgical action and surgical skill. METHODS: To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 hours was created. Labels included annotation of seven surgical phases with 250 phase transitions, 5514 occurences of four surgical actions, 6980 occurences of 21 surgical instruments from seven instrument categories and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 teams submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. RESULTS: F1-scores were achieved for phase recognition between 23.9% and 67.7% (n=9 teams), for instrument presence detection between 38.5% and 63.8% (n=8 teams), but for action recognition only between 21.8% and 23.3% (n=5 teams). The average absolute error for skill assessment was 0.78 (n=1 team). CONCLUSION: Surgical workflow and skill analysis are promising technologies to support the surgical team, but are not solved yet, as shown by our comparison of algorithms. This novel benchmark can be used for comparable evaluation and validation of future work. △ Less

Submitted 30 September, 2021; originally announced September 2021.

arXiv:2109.09735 [pdf, other]

Source-Free Domain Adaptive Fundus Image Segmentation with Denoised Pseudo-Labeling

Authors: Cheng Chen, Quande Liu, Yueming **, Qi Dou, Pheng-Ann Heng

Abstract: Domain adaptation typically requires to access source domain data to utilize their distribution information for domain alignment with the target data. However, in many real-world scenarios, the source data may not be accessible during the model adaptation in the target domain due to privacy issue. This paper studies the practical yet challenging source-free unsupervised domain adaptation problem,… ▽ More Domain adaptation typically requires to access source domain data to utilize their distribution information for domain alignment with the target data. However, in many real-world scenarios, the source data may not be accessible during the model adaptation in the target domain due to privacy issue. This paper studies the practical yet challenging source-free unsupervised domain adaptation problem, in which only an existing source model and the unlabeled target data are available for model adaptation. We present a novel denoised pseudo-labeling method for this problem, which effectively makes use of the source model and unlabeled target data to promote model self-adaptation from pseudo labels. Importantly, considering that the pseudo labels generated from source model are inevitably noisy due to domain shift, we further introduce two complementary pixel-level and class-level denoising schemes with uncertainty estimation and prototype estimation to reduce noisy pseudo labels and select reliable ones to enhance the pseudo-labeling efficacy. Experimental results on cross-domain fundus image segmentation show that without using any source images or altering source training, our approach achieves comparable or even higher performance than state-of-the-art source-dependent unsupervised domain adaptation methods. △ Less

Submitted 19 September, 2021; originally announced September 2021.

Comments: Accepted to MICCAI 2021

arXiv:2108.13035 [pdf, other]

SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning

Authors: Jiaqi Xu, Bin Li, Bo Lu, Yun-Hui Liu, Qi Dou, Pheng-Ann Heng

Abstract: Autonomous surgical execution relieves tedious routines and surgeon's fatigue. Recent learning-based methods, especially reinforcement learning (RL) based methods, achieve promising performance for dexterous manipulation, which usually requires the simulation to collect data efficiently and reduce the hardware cost. The existing learning-based simulation platforms for medical robots suffer from li… ▽ More Autonomous surgical execution relieves tedious routines and surgeon's fatigue. Recent learning-based methods, especially reinforcement learning (RL) based methods, achieve promising performance for dexterous manipulation, which usually requires the simulation to collect data efficiently and reduce the hardware cost. The existing learning-based simulation platforms for medical robots suffer from limited scenarios and simplified physical interactions, which degrades the real-world performance of learned policies. In this work, we designed SurRoL, an RL-centered simulation platform for surgical robot learning compatible with the da Vinci Research Kit (dVRK). The designed SurRoL integrates a user-friendly RL library for algorithm development and a real-time physics engine, which is able to support more PSM/ECM scenarios and more realistic physical interactions. Ten learning-based surgical tasks are built in the platform, which are common in the real autonomous surgical execution. We evaluate SurRoL using RL algorithms in simulation, provide in-depth analysis, deploy the trained policies on the real dVRK, and show that our SurRoL achieves better transferability in the real world. △ Less

Submitted 30 August, 2021; originally announced August 2021.

Comments: 8 pages, 8 figures, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2104.01975 [pdf, other]

Cascaded Robust Learning at Imperfect Labels for Chest X-ray Segmentation

Authors: Cheng Xue, Qiao Deng, Xiaomeng Li, Qi Dou, Pheng Ann Heng

Abstract: The superior performance of CNN on medical image analysis heavily depends on the annotation quality, such as the number of labeled image, the source of image, and the expert experience. The annotation requires great expertise and labour. To deal with the high inter-rater variability, the study of imperfect label has great significance in medical image segmentation tasks. In this paper, we present… ▽ More The superior performance of CNN on medical image analysis heavily depends on the annotation quality, such as the number of labeled image, the source of image, and the expert experience. The annotation requires great expertise and labour. To deal with the high inter-rater variability, the study of imperfect label has great significance in medical image segmentation tasks. In this paper, we present a novel cascaded robust learning framework for chest X-ray segmentation with imperfect annotation. Our model consists of three independent network, which can effectively learn useful information from the peer networks. The framework includes two stages. In the first stage, we select the clean annotated samples via a model committee setting, the networks are trained by minimizing a segmentation loss using the selected clean samples. In the second stage, we design a joint optimization framework with label correction to gradually correct the wrong annotation and improve the network performance. We conduct experiments on the public chest X-ray image datasets collected by Shenzhen Hospital. The results show that our methods could achieve a significant improvement on the accuracy in segmentation tasks compared to the previous methods. △ Less

Submitted 5 April, 2021; originally announced April 2021.

Comments: 9pages, 4 figures. MICCAI 2020

arXiv:2009.07652 [pdf, other]

doi 10.1109/JBHI.2020.3023246

Contrastive Cross-site Learning with Redesigned Net for COVID-19 CT Classification

Authors: Zhao Wang, Quande Liu, Qi Dou

Abstract: The pandemic of coronavirus disease 2019 (COVID-19) has lead to a global public health crisis spreading hundreds of countries. With the continuous growth of new infections, develo** automated tools for COVID-19 identification with CT image is highly desired to assist the clinical diagnosis and reduce the tedious workload of image interpretation. To enlarge the datasets for develo** machine lea… ▽ More The pandemic of coronavirus disease 2019 (COVID-19) has lead to a global public health crisis spreading hundreds of countries. With the continuous growth of new infections, develo** automated tools for COVID-19 identification with CT image is highly desired to assist the clinical diagnosis and reduce the tedious workload of image interpretation. To enlarge the datasets for develo** machine learning methods, it is essentially helpful to aggregate the cases from different medical systems for learning robust and generalizable models. This paper proposes a novel joint learning framework to perform accurate COVID-19 identification by effectively learning with heterogeneous datasets with distribution discrepancy. We build a powerful backbone by redesigning the recently proposed COVID-Net in aspects of network architecture and learning strategy to improve the prediction accuracy and learning efficiency. On top of our improved backbone, we further explicitly tackle the cross-site domain shift by conducting separate feature normalization in latent space. Moreover, we propose to use a contrastive training objective to enhance the domain invariance of semantic embeddings for boosting the classification performance on each dataset. We develop and evaluate our method with two public large-scale COVID-19 diagnosis datasets made up of CT images. Extensive experiments show that our approach consistently improves the performances on both datasets, outperforming the original COVID-Net trained on each dataset by 12.16% and 14.23% in AUC respectively, also exceeding existing state-of-the-art multi-site learning methods. △ Less

Submitted 15 September, 2020; originally announced September 2020.

Comments: Published as a journal paper at IEEE J-BHI; code and dataset are available at https://github.com/med-air/Contrastive-COVIDNet

arXiv:2007.02096 [pdf]

doi 10.1109/TMI.2021.3055428

Multi-Site Infant Brain Segmentation Algorithms: The iSeg-2019 Challenge

Authors: Yue Sun, Kun Gao, Zhengwang Wu, Zhihao Lei, Ying Wei, Jun Ma, ** Yang, Xue Feng, Li Zhao, Trung Le Phan, Jitae Shin, Tao Zhong, Yu Zhang, Lequan Yu, Caizi Li, Ramesh Basnet, M. Omair Ahmad, M. N. S. Swamy, Wenao Ma, Qi Dou, Toan Duc Bui, Camilo Bermudez Noguera, Bennett Landman, Ian H. Gotlib, Kathryn L. Humphreys , et al. (8 additional authors not shown)

Abstract: To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of major limitations is that the learning-based methods may suffer from the multi-site i… ▽ More To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of major limitations is that the learning-based methods may suffer from the multi-site issue, that is, the models trained on a dataset from one site may not be applicable to the datasets acquired from other sites with different imaging protocols/scanners. To promote methodological development in the community, iSeg-2019 challenge (http://iseg2019.web.unc.edu) provides a set of 6-month infant subjects from multiple sites with different protocols/scanners for the participating methods. Training/validation subjects are from UNC (MAP) and testing subjects are from UNC/UMN (BCP), Stanford University, and Emory University. By the time of writing, there are 30 automatic segmentation methods participating in iSeg-2019. We review the 8 top-ranked teams by detailing their pipelines/implementations, presenting experimental results and evaluating performance in terms of the whole brain, regions of interest, and gyral landmark curves. We also discuss their limitations and possible future directions for the multi-site issue. We hope that the multi-site dataset in iSeg-2019 and this review article will attract more researchers on the multi-site issue. △ Less

Submitted 11 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

Journal ref: IEEE Transactions on Medical Imaging, 40(5), 1363-1376, 2021

arXiv:2006.16741 [pdf, other]

doi 10.1007/978-3-030-59728-3_69

Image-level Harmonization of Multi-Site Data using Image-and-Spatial Transformer Networks

Authors: R. Robinson, Q. Dou, D. C. Castro, K. Kamnitsas, M. de Groot, R. M. Summers, D. Rueckert, B. Glocker

Abstract: We investigate the use of image-and-spatial transformer networks (ISTNs) to tackle domain shift in multi-site medical imaging data. Commonly, domain adaptation (DA) is performed with little regard for explainability of the inter-domain transformation and is often conducted at the feature-level in the latent space. We employ ISTNs for DA at the image-level which constrains transformations to explai… ▽ More We investigate the use of image-and-spatial transformer networks (ISTNs) to tackle domain shift in multi-site medical imaging data. Commonly, domain adaptation (DA) is performed with little regard for explainability of the inter-domain transformation and is often conducted at the feature-level in the latent space. We employ ISTNs for DA at the image-level which constrains transformations to explainable appearance and shape changes. As proof-of-concept we demonstrate that ISTNs can be trained adversarially on a classification problem with simulated 2D data. For real-data validation, we construct two 3D brain MRI datasets from the Cam-CAN and UK Biobank studies to investigate domain shift due to acquisition and population differences. We show that age regression and sex classification models trained on ISTN output improve generalization when training on data from one and testing on the other site. △ Less

Submitted 30 June, 2020; originally announced June 2020.

Comments: Accepted at MICCAI 2020

Journal ref: Medical Image Computing and Computer-Assisted Intervention (2020), pp. 710-719, LNCS 12267

arXiv:2002.03366 [pdf, other]

doi 10.1109/TMI.2020.2974574

MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data

Authors: Quande Liu, Qi Dou, Lequan Yu, Pheng Ann Heng

Abstract: Automated prostate segmentation in MRI is highly demanded for computer-assisted diagnosis. Recently, a variety of deep learning methods have achieved remarkable progress in this task, usually relying on large amounts of training data. Due to the nature of scarcity for medical images, it is important to effectively aggregate data from multiple sites for robust model training, to alleviate the insuf… ▽ More Automated prostate segmentation in MRI is highly demanded for computer-assisted diagnosis. Recently, a variety of deep learning methods have achieved remarkable progress in this task, usually relying on large amounts of training data. Due to the nature of scarcity for medical images, it is important to effectively aggregate data from multiple sites for robust model training, to alleviate the insufficiency of single-site samples. However, the prostate MRIs from different sites present heterogeneity due to the differences in scanners and imaging protocols, raising challenges for effective ways of aggregating multi-site data for network training. In this paper, we propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations, leveraging multiple sources of data. To compensate for the inter-site heterogeneity of different MRI datasets, we develop Domain-Specific Batch Normalization layers in the network backbone, enabling the network to estimate statistics and perform feature normalization for each site separately. Considering the difficulty of capturing the shared knowledge from multiple datasets, a novel learning paradigm, i.e., Multi-site-guided Knowledge Transfer, is proposed to enhance the kernels to extract more generic representations from multi-site data. Extensive experiments on three heterogeneous prostate MRI datasets demonstrate that our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning. △ Less

Submitted 19 February, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

Comments: IEEE TMI, 2020

arXiv:2002.02255 [pdf, other]

Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation

Authors: Cheng Chen, Qi Dou, Hao Chen, **g Qin, Pheng Ann Heng

Abstract: Unsupervised domain adaptation has increasingly gained interest in medical image computing, aiming to tackle the performance degradation of deep neural networks when being deployed to unseen data with heterogeneous characteristics. In this work, we present a novel unsupervised domain adaptation framework, named as Synergistic Image and Feature Alignment (SIFA), to effectively adapt a segmentation… ▽ More Unsupervised domain adaptation has increasingly gained interest in medical image computing, aiming to tackle the performance degradation of deep neural networks when being deployed to unseen data with heterogeneous characteristics. In this work, we present a novel unsupervised domain adaptation framework, named as Synergistic Image and Feature Alignment (SIFA), to effectively adapt a segmentation network to an unlabeled target domain. Our proposed SIFA conducts synergistic alignment of domains from both image and feature perspectives. In particular, we simultaneously transform the appearance of images across domains and enhance domain-invariance of the extracted features by leveraging adversarial learning in multiple aspects and with a deeply supervised mechanism. The feature encoder is shared between both adaptive perspectives to leverage their mutual benefits via end-to-end learning. We have extensively evaluated our method with cardiac substructure segmentation and abdominal multi-organ segmentation for bidirectional cross-modality adaptation between MRI and CT images. Experimental results on two different tasks demonstrate that our SIFA method is effective in improving segmentation performance on unlabeled target images, and outperforms the state-of-the-art domain adaptation approaches by a large margin. △ Less

Submitted 6 February, 2020; originally announced February 2020.

Comments: IEEE TMI

arXiv:2001.03111 [pdf, other]

Unpaired Multi-modal Segmentation via Knowledge Distillation

Authors: Qi Dou, Quande Liu, Pheng Ann Heng, Ben Glocker

Abstract: Multi-modal learning is typically performed with network architectures containing modality-specific layers and shared layers, utilizing co-registered images of different modalities. We propose a novel learning scheme for unpaired cross-modality image segmentation, with a highly compact architecture achieving superior segmentation accuracy. In our method, we heavily reuse network parameters, by sha… ▽ More Multi-modal learning is typically performed with network architectures containing modality-specific layers and shared layers, utilizing co-registered images of different modalities. We propose a novel learning scheme for unpaired cross-modality image segmentation, with a highly compact architecture achieving superior segmentation accuracy. In our method, we heavily reuse network parameters, by sharing all convolutional kernels across CT and MRI, and only employ modality-specific internal normalization layers which compute respective statistics. To effectively train such a highly compact model, we introduce a novel loss term inspired by knowledge distillation, by explicitly constraining the KL-divergence of our derived prediction distributions between modalities. We have extensively validated our approach on two multi-class segmentation problems: i) cardiac structure segmentation, and ii) abdominal organ segmentation. Different network settings, i.e., 2D dilated network and 3D U-net, are utilized to investigate our method's general efficacy. Experimental results on both tasks demonstrate that our novel multi-modal learning scheme consistently outperforms single-modal training and previous multi-modal approaches. △ Less

Submitted 6 January, 2020; originally announced January 2020.

Comments: IEEE TMI, code available

arXiv:1910.04597 [pdf, other]

Machine Learning with Multi-Site Imaging Data: An Empirical Study on the Impact of Scanner Effects

Authors: Ben Glocker, Robert Robinson, Daniel C. Castro, Qi Dou, Ender Konukoglu

Abstract: This is an empirical study to investigate the impact of scanner effects when using machine learning on multi-site neuroimaging data. We utilize structural T1-weighted brain MRI obtained from two different studies, Cam-CAN and UK Biobank. For the purpose of our investigation, we construct a dataset consisting of brain scans from 592 age- and sex-matched individuals, 296 subjects from each original… ▽ More This is an empirical study to investigate the impact of scanner effects when using machine learning on multi-site neuroimaging data. We utilize structural T1-weighted brain MRI obtained from two different studies, Cam-CAN and UK Biobank. For the purpose of our investigation, we construct a dataset consisting of brain scans from 592 age- and sex-matched individuals, 296 subjects from each original study. Our results demonstrate that even after careful pre-processing with state-of-the-art neuroimaging pipelines a classifier can easily distinguish between the origin of the data with very high accuracy. Our analysis on the example application of sex classification suggests that current approaches to harmonize data are unable to remove scanner-specific bias leading to overly optimistic performance estimates and poor generalization. We conclude that multi-site data harmonization remains an open challenge and particular care needs to be taken when using such data with advanced machine learning methods for predictive modelling. △ Less

Submitted 10 October, 2019; originally announced October 2019.

Comments: Presented at the Medical Imaging meets NeurIPS Workshop 2019

arXiv:1909.12289 [pdf, other]

Attention Forcing for Sequence-to-sequence Model Training

Authors: Qingyun Dou, Yiting Lu, Joshua Efiong, Mark J. F. Gales

Abstract: Auto-regressive sequence-to-sequence models with attention mechanism have achieved state-of-the-art performance in many tasks such as machine translation and speech synthesis. These models can be difficult to train. The standard approach, teacher forcing, guides a model with reference output history during training. The problem is that the model is unlikely to recover from its mistakes during infe… ▽ More Auto-regressive sequence-to-sequence models with attention mechanism have achieved state-of-the-art performance in many tasks such as machine translation and speech synthesis. These models can be difficult to train. The standard approach, teacher forcing, guides a model with reference output history during training. The problem is that the model is unlikely to recover from its mistakes during inference, where the reference output is replaced by generated output. Several approaches deal with this problem, largely by guiding the model with generated output history. To make training stable, these approaches often require a heuristic schedule or an auxiliary classifier. This paper introduces attention forcing, which guides the model with generated output history and reference attention. This approach can train the model to recover from its mistakes, in a stable fashion, without the need for a schedule or a classifier. In addition, it allows the model to generate output sequences aligned with the references, which can be important for cascaded systems like many speech synthesis systems. Experiments on speech synthesis show that attention forcing yields significant performance gain. Experiments on machine translation show that for tasks where various re-orderings of the output are valid, guiding the model with generated output history is challenging, while guiding the model with reference attention is beneficial. △ Less

Submitted 2 October, 2019; v1 submitted 26 September, 2019; originally announced September 2019.

Comments: 11 pages, 4 figures, conference

ACM Class: I.2

arXiv:1909.02344 [pdf, other]

An Active Learning Approach for Reducing Annotation Cost in Skin Lesion Analysis

Authors: Xueying Shi, Qi Dou, Cheng Xue, **g Qin, Hao Chen, Pheng-Ann Heng

Abstract: Automated skin lesion analysis is very crucial in clinical practice, as skin cancer is among the most common human malignancy. Existing approaches with deep learning have achieved remarkable performance on this challenging task, however, heavily relying on large-scale labelled datasets. In this paper, we present a novel active learning framework for cost-effective skin lesion analysis. The goal is… ▽ More Automated skin lesion analysis is very crucial in clinical practice, as skin cancer is among the most common human malignancy. Existing approaches with deep learning have achieved remarkable performance on this challenging task, however, heavily relying on large-scale labelled datasets. In this paper, we present a novel active learning framework for cost-effective skin lesion analysis. The goal is to effectively select and utilize much fewer labelled samples, while the network can still achieve state-of-the-art performance. Our sample selection criteria complementarily consider both informativeness and representativeness, derived from decoupled aspects of measuring model certainty and covering sample diversity. To make wise use of the selected samples, we further design a simple yet effective strategy to aggregate intra-class images in pixel space, as a new form of data augmentation. We validate our proposed method on data of ISIC 2017 Skin Lesion Classification Challenge for two tasks. Using only up to 50% of samples, our approach can achieve state-of-the-art performances on both tasks, which are comparable or exceeding the accuracies with full-data training, and outperform other well-known active learning methods by a large margin. △ Less

Submitted 5 September, 2019; originally announced September 2019.

Comments: Accepted by MIML2019

arXiv:1907.06099 [pdf, other]

Multi-Task Recurrent Convolutional Network with Correlation Loss for Surgical Video Analysis

Authors: Yueming **, Huaxia Li, Qi Dou, Hao Chen, **g Qin, Chi-Wing Fu, Pheng-Ann Heng

Abstract: Surgical tool presence detection and surgical phase recognition are two fundamental yet challenging tasks in surgical video analysis and also very essential components in various applications in modern operating rooms. While these two analysis tasks are highly correlated in clinical practice as the surgical process is well-defined, most previous methods tackled them separately, without making full… ▽ More Surgical tool presence detection and surgical phase recognition are two fundamental yet challenging tasks in surgical video analysis and also very essential components in various applications in modern operating rooms. While these two analysis tasks are highly correlated in clinical practice as the surgical process is well-defined, most previous methods tackled them separately, without making full use of their relatedness. In this paper, we present a novel method by develo** a multi-task recurrent convolutional network with correlation loss (MTRCNet-CL) to exploit their relatedness to simultaneously boost the performance of both tasks. Specifically, our proposed MTRCNet-CL model has an end-to-end architecture with two branches, which share earlier feature encoders to extract general visual features while holding respective higher layers targeting for specific tasks. Given that temporal information is crucial for phase recognition, long-short term memory (LSTM) is explored to model the sequential dependencies in the phase recognition branch. More importantly, a novel and effective correlation loss is designed to model the relatedness between tool presence and phase identification of each video frame, by minimizing the divergence of predictions from the two branches. Mutually leveraging both low-level feature sharing and high-level prediction correlating, our MTRCNet-CL method can encourage the interactions between the two tasks to a large extent, and hence can bring about benefits to each other. Extensive experiments on a large surgical video dataset (Cholec80) demonstrate outstanding performance of our proposed method, consistently exceeding the state-of-the-art methods by a large margin (e.g., 89.1% v.s. 81.0% for the mAP in tool presence detection and 87.4% v.s. 84.5% for F1 score in phase recognition). The code can be found on our project website. △ Less

Submitted 13 July, 2019; originally announced July 2019.

Comments: Minor Revision at Medical Image Analysis

arXiv:1906.09745 [pdf]

Respiratory Motion Correction in Abdominal MRI using a Densely Connected U-Net with GAN-guided Training

Authors: Wenhao Jiang, Zhiyu Liu, Kit-Hang Lee, Shihui Chen, Yui-Lun Ng, Qi Dou, Hing-Chiu Chang, Ka-Wai Kwok

Abstract: Abdominal magnetic resonance imaging (MRI) provides a straightforward way of characterizing tissue and locating lesions of patients as in standard diagnosis. However, abdominal MRI often suffers from respiratory motion artifacts, which leads to blurring and ghosting that significantly deteriorate the imaging quality. Conventional methods to reduce or eliminate these motion artifacts include breath… ▽ More Abdominal magnetic resonance imaging (MRI) provides a straightforward way of characterizing tissue and locating lesions of patients as in standard diagnosis. However, abdominal MRI often suffers from respiratory motion artifacts, which leads to blurring and ghosting that significantly deteriorate the imaging quality. Conventional methods to reduce or eliminate these motion artifacts include breath holding, patient sedation, respiratory gating, and image post-processing, but these strategies inevitably involve extra scanning time and patient discomfort. In this paper, we propose a novel deep-learning-based model to recover MR images from respiratory motion artifacts. The proposed model comprises a densely connected U-net with generative adversarial network (GAN)-guided training and a perceptual loss function. We validate the model using a diverse collection of MRI data that are adversely affected by both synthetic and authentic respiration artifacts. Effective outcomes of motion removal are demonstrated. Our experimental results show the great potential of utilizing deep-learning-based methods in respiratory motion correction for abdominal MRI. △ Less

Submitted 24 June, 2019; originally announced June 2019.

Comments: 8 pages, 4 figures, submitted to the 22nd International Conference on Medical Image Computing and Computer Assisted Intervention

arXiv:1906.02999 [pdf, other]

Deep Angular Embedding and Feature Correlation Attention for Breast MRI Cancer Analysis

Authors: Luyang Luo, Hao Chen, Xi Wang, Qi Dou, Huang** Lin, Juan Zhou, Gongjie Li, Pheng-Ann Heng

Abstract: Accurate and automatic analysis of breast MRI plays an important role in early diagnosis and successful treatment planning for breast cancer. Due to the heterogeneity nature, accurate diagnosis of tumors remains a challenging task. In this paper, we propose to identify breast tumor in MRI by Cosine Margin Sigmoid Loss (CMSL) with deep learning (DL) and localize possible cancer lesion by COrrelatio… ▽ More Accurate and automatic analysis of breast MRI plays an important role in early diagnosis and successful treatment planning for breast cancer. Due to the heterogeneity nature, accurate diagnosis of tumors remains a challenging task. In this paper, we propose to identify breast tumor in MRI by Cosine Margin Sigmoid Loss (CMSL) with deep learning (DL) and localize possible cancer lesion by COrrelation Attention Map (COAM) based on the learned features. The CMSL embeds tumor features onto a hypersphere and imposes a decision margin through cosine constraints. In this way, the DL model could learn more separable inter-class features and more compact intra-class features in the angular space. Furthermore, we utilize the correlations among feature vectors to generate attention maps that could accurately localize cancer candidates with only image-level label. We build the largest breast cancer dataset involving 10,290 DCE-MRI scan volumes for develo** and evaluating the proposed methods. The model driven by CMSL achieved classification accuracy of 0.855 and AUC of 0.902 on the testing set, with sensitivity and specificity of 0.857 and 0.852, respectively, outperforming other competitive methods overall. In addition, the proposed COAM accomplished more accurate localization of the cancer center compared with other state-of-the-art weakly supervised localization method. △ Less

Submitted 7 June, 2019; originally announced June 2019.

Comments: Accepted by MICCAI 2019

arXiv:1906.02283 [pdf, other]

Improving RetinaNet for CT Lesion Detection with Dense Masks from Weak RECIST Labels

Authors: Martin Zlocha, Qi Dou, Ben Glocker

Abstract: Accurate, automated lesion detection in Computed Tomography (CT) is an important yet challenging task due to the large variation of lesion types, sizes, locations and appearances. Recent work on CT lesion detection employs two-stage region proposal based methods trained with centroid or bounding-box annotations. We propose a highly accurate and efficient one-stage lesion detector, by re-designing… ▽ More Accurate, automated lesion detection in Computed Tomography (CT) is an important yet challenging task due to the large variation of lesion types, sizes, locations and appearances. Recent work on CT lesion detection employs two-stage region proposal based methods trained with centroid or bounding-box annotations. We propose a highly accurate and efficient one-stage lesion detector, by re-designing a RetinaNet to meet the particular challenges in medical imaging. Specifically, we optimize the anchor configurations using a differential evolution search algorithm. For training, we leverage the response evaluation criteria in solid tumors (RECIST) annotation which are measured in clinical routine. We incorporate dense masks from weak RECIST labels, obtained automatically using GrabCut, into the training objective, which in combination with other advancements yields new state-of-the-art performance. We evaluate our method on the public DeepLesion benchmark, consisting of 32,735 lesions across the body. Our one-stage detector achieves a sensitivity of 90.77% at 4 false positives per image, significantly outperforming the best reported methods by over 5%. △ Less

Submitted 5 June, 2019; originally announced June 2019.

Comments: Accepted at MICCAI 2019

Showing 1–32 of 32 results for author: Dou, Q