Search | arXiv e-print repository

The ULS23 Challenge: a Baseline Model and Benchmark Dataset for 3D Universal Lesion Segmentation in Computed Tomography

Authors: M. J. J. de Grauw, E. Th. Scholten, E. J. Smit, M. J. C. M. Rutten, M. Prokop, B. van Ginneken, A. Hering

Abstract: Size measurements of tumor manifestations on follow-up CT examinations are crucial for evaluating treatment outcomes in cancer patients. Efficient lesion segmentation can speed up these radiological workflows. While numerous benchmarks and challenges address lesion segmentation in specific organs like the liver, kidneys, and lungs, the larger variety of lesion types encountered in clinical practic… ▽ More Size measurements of tumor manifestations on follow-up CT examinations are crucial for evaluating treatment outcomes in cancer patients. Efficient lesion segmentation can speed up these radiological workflows. While numerous benchmarks and challenges address lesion segmentation in specific organs like the liver, kidneys, and lungs, the larger variety of lesion types encountered in clinical practice demands a more universal approach. To address this gap, we introduced the ULS23 benchmark for 3D universal lesion segmentation in chest-abdomen-pelvis CT examinations. The ULS23 training dataset contains 38,693 lesions across this region, including challenging pancreatic, colon and bone lesions. For evaluation purposes, we curated a dataset comprising 775 lesions from 284 patients. Each of these lesions was identified as a target lesion in a clinical context, ensuring diversity and clinical relevance within this dataset. The ULS23 benchmark is publicly accessible via uls23.grand-challenge.org, enabling researchers worldwide to assess the performance of their segmentation methods. Furthermore, we have developed and publicly released our baseline semi-supervised 3D lesion segmentation model. This model achieved an average Dice coefficient of 0.703 $\pm$ 0.240 on the challenge test set. We invite ongoing submissions to advance the development of future ULS models. △ Less

Submitted 21 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.06463 [pdf, other]

MRSegmentator: Robust Multi-Modality Segmentation of 40 Classes in MRI and CT Sequences

Authors: Hartmut Häntze, Lina Xu, Felix J. Dorfner, Leonhard Donle, Daniel Truhn, Hugo Aerts, Mathias Prokop, Bram van Ginneken, Alessa Hering, Lisa C. Adams, Keno K. Bressem

Abstract: Purpose: To introduce a deep learning model capable of multi-organ segmentation in MRI scans, offering a solution to the current limitations in MRI analysis due to challenges in resolution, standardized intensity values, and variability in sequences. Materials and Methods: he model was trained on 1,200 manually annotated MRI scans from the UK Biobank, 221 in-house MRI scans and 1228 CT scans, le… ▽ More Purpose: To introduce a deep learning model capable of multi-organ segmentation in MRI scans, offering a solution to the current limitations in MRI analysis due to challenges in resolution, standardized intensity values, and variability in sequences. Materials and Methods: he model was trained on 1,200 manually annotated MRI scans from the UK Biobank, 221 in-house MRI scans and 1228 CT scans, leveraging cross-modality transfer learning from CT segmentation models. A human-in-the-loop annotation workflow was employed to efficiently create high-quality segmentations. The model's performance was evaluated on NAKO and the AMOS22 dataset containing 600 and 60 MRI examinations. Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD) was used to assess segmentation accuracy. The model will be open sourced. Results: The model showcased high accuracy in segmenting well-defined organs, achieving Dice Similarity Coefficient (DSC) scores of 0.97 for the right and left lungs, and 0.95 for the heart. It also demonstrated robustness in organs like the liver (DSC: 0.96) and kidneys (DSC: 0.95 left, 0.95 right), which present more variability. However, segmentation of smaller and complex structures such as the portal and splenic veins (DSC: 0.54) and adrenal glands (DSC: 0.65 left, 0.61 right) revealed the need for further model optimization. Conclusion: The proposed model is a robust, tool for accurate segmentation of 40 anatomical structures in MRI and CT images. By leveraging cross-modality learning and interactive annotation, the model achieves strong performance and generalizability across diverse datasets, making it a valuable resource for researchers and clinicians. It is open source and can be downloaded from https://github.com/hhaentze/MRSegmentator. △ Less

Submitted 13 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

Comments: 13 pages, 6 figures; corrected co-author info

ACM Class: J.3

arXiv:2401.02192 [pdf]

Nodule detection and generation on chest X-rays: NODE21 Challenge

Authors: Ecem Sogancioglu, Bram van Ginneken, Finn Behrendt, Marcel Bengs, Alexander Schlaefer, Miron Radu, Di Xu, Ke Sheng, Fabien Scalzo, Eric Marcus, Samuele Papa, Jonas Teuwen, Ernst Th. Scholten, Steven Schalekamp, Nils Hendrix, Colin Jacobs, Ward Hendrix, Clara I Sánchez, Keelin Murphy

Abstract: Pulmonary nodules may be an early manifestation of lung cancer, the leading cause of cancer-related deaths among both men and women. Numerous studies have established that deep learning methods can yield high-performance levels in the detection of lung nodules in chest X-rays. However, the lack of gold-standard public datasets slows down the progression of the research and prevents benchmarking of… ▽ More Pulmonary nodules may be an early manifestation of lung cancer, the leading cause of cancer-related deaths among both men and women. Numerous studies have established that deep learning methods can yield high-performance levels in the detection of lung nodules in chest X-rays. However, the lack of gold-standard public datasets slows down the progression of the research and prevents benchmarking of methods for this task. To address this, we organized a public research challenge, NODE21, aimed at the detection and generation of lung nodules in chest X-rays. While the detection track assesses state-of-the-art nodule detection systems, the generation track determines the utility of nodule generation algorithms to augment training data and hence improve the performance of the detection systems. This paper summarizes the results of the NODE21 challenge and performs extensive additional experiments to examine the impact of the synthetically generated nodule training images on the detection algorithm performance. △ Less

Submitted 4 January, 2024; originally announced January 2024.

Comments: 15 pages, 5 figures

arXiv:2311.05032 [pdf, other]

Transfer learning from a sparsely annotated dataset of 3D medical images

Authors: Gabriel Efrain Humpire-Mamani, Colin Jacobs, Mathias Prokop, Bram van Ginneken, Nikolas Lessmann

Abstract: Transfer learning leverages pre-trained model features from a large dataset to save time and resources when training new models for various tasks, potentially enhancing performance. Due to the lack of large datasets in the medical imaging domain, transfer learning from one medical imaging model to other medical imaging models has not been widely explored. This study explores the use of transfer le… ▽ More Transfer learning leverages pre-trained model features from a large dataset to save time and resources when training new models for various tasks, potentially enhancing performance. Due to the lack of large datasets in the medical imaging domain, transfer learning from one medical imaging model to other medical imaging models has not been widely explored. This study explores the use of transfer learning to improve the performance of deep convolutional neural networks for organ segmentation in medical imaging. A base segmentation model (3D U-Net) was trained on a large and sparsely annotated dataset; its weights were used for transfer learning on four new down-stream segmentation tasks for which a fully annotated dataset was available. We analyzed the training set size's influence to simulate scarce data. The results showed that transfer learning from the base model was beneficial when small datasets were available, providing significant performance improvements; where fine-tuning the base model is more beneficial than updating all the network weights with vanilla transfer learning. Transfer learning with fine-tuning increased the performance by up to 0.129 (+28\%) Dice score than experiments trained from scratch, and on average 23 experiments increased the performance by 0.029 Dice score in the new segmentation tasks. The study also showed that cross-modality transfer learning using CT scans was beneficial. The findings of this study demonstrate the potential of transfer learning to improve the efficiency of annotation and increase the accessibility of accurate organ segmentation in medical imaging, ultimately leading to improved patient care. We made the network definition and weights publicly available to benefit other users and researchers. △ Less

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2309.03383 [pdf, other]

Kidney abnormality segmentation in thorax-abdomen CT scans

Authors: Gabriel Efrain Humpire Mamani, Nikolas Lessmann, Ernst Th. Scholten, Mathias Prokop, Colin Jacobs, Bram van Ginneken

Abstract: In this study, we introduce a deep learning approach for segmenting kidney parenchyma and kidney abnormalities to support clinicians in identifying and quantifying renal abnormalities such as cysts, lesions, masses, metastases, and primary tumors. Our end-to-end segmentation method was trained on 215 contrast-enhanced thoracic-abdominal CT scans, with half of these scans containing one or more abn… ▽ More In this study, we introduce a deep learning approach for segmenting kidney parenchyma and kidney abnormalities to support clinicians in identifying and quantifying renal abnormalities such as cysts, lesions, masses, metastases, and primary tumors. Our end-to-end segmentation method was trained on 215 contrast-enhanced thoracic-abdominal CT scans, with half of these scans containing one or more abnormalities. We began by implementing our own version of the original 3D U-Net network and incorporated four additional components: an end-to-end multi-resolution approach, a set of task-specific data augmentations, a modified loss function using top-$k$, and spatial dropout. Furthermore, we devised a tailored post-processing strategy. Ablation studies demonstrated that each of the four modifications enhanced kidney abnormality segmentation performance, while three out of four improved kidney parenchyma segmentation. Subsequently, we trained the nnUNet framework on our dataset. By ensembling the optimized 3D U-Net and the nnUNet with our specialized post-processing, we achieved marginally superior results. Our best-performing model attained Dice scores of 0.965 and 0.947 for segmenting kidney parenchyma in two test sets (20 scans without abnormalities and 30 with abnormalities), outperforming an independent human observer who scored 0.944 and 0.925, respectively. In segmenting kidney abnormalities within the 30 test scans containing them, the top-performing method achieved a Dice score of 0.585, while an independent second human observer reached a score of 0.664, suggesting potential for further improvement in computerized methods. All training data is available to the research community under a CC-BY 4.0 license on https://doi.org/10.5281/zenodo.8014289 △ Less

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2309.02576 [pdf, other]

doi 10.1038/s41598-023-40116-6

Emphysema Subty** on Thoracic Computed Tomography Scans using Deep Neural Networks

Authors: Weiyi Xie, Colin Jacobs, Jean-Paul Charbonnier, Dirk Jan Slebos, Bram van Ginneken

Abstract: Accurate identification of emphysema subtypes and severity is crucial for effective management of COPD and the study of disease heterogeneity. Manual analysis of emphysema subtypes and severity is laborious and subjective. To address this challenge, we present a deep learning-based approach for automating the Fleischner Society's visual score system for emphysema subty** and severity analysis. W… ▽ More Accurate identification of emphysema subtypes and severity is crucial for effective management of COPD and the study of disease heterogeneity. Manual analysis of emphysema subtypes and severity is laborious and subjective. To address this challenge, we present a deep learning-based approach for automating the Fleischner Society's visual score system for emphysema subty** and severity analysis. We trained and evaluated our algorithm using 9650 subjects from the COPDGene study. Our algorithm achieved the predictive accuracy at 52\%, outperforming a previously published method's accuracy of 45\%. In addition, the agreement between the predicted scores of our method and the visual scores was good, where the previous method obtained only moderate agreement. Our approach employs a regression training strategy to generate categorical labels while simultaneously producing high-resolution localized activation maps for visualizing the network predictions. By leveraging these dense activation maps, our method possesses the capability to compute the percentage of emphysema involvement per lung in addition to categorical severity scores. Furthermore, the proposed method extends its predictive capabilities beyond centrilobular emphysema to include paraseptal emphysema subtypes. △ Less

Submitted 5 September, 2023; originally announced September 2023.

Journal ref: Sci Rep. 2023 Aug 29;13(1):14147

arXiv:2306.12217 [pdf]

doi 10.1038/s41597-024-03090-w

Lumbar spine segmentation in MR images: a dataset and a public benchmark

Authors: Jasper W. van der Graaf, Miranda L. van Hooff, Constantinus F. M. Buckens, Matthieu Rutten, Job L. C. van Susante, Robert Jan Kroeze, Marinus de Kleuver, Bram van Ginneken, Nikolas Lessmann

Abstract: This paper presents a large publicly available multi-center lumbar spine magnetic resonance imaging (MRI) dataset with reference segmentations of vertebrae, intervertebral discs (IVDs), and spinal canal. The dataset includes 447 sagittal T1 and T2 MRI series from 218 patients with a history of low back pain and was collected from four different hospitals. An iterative data annotation approach was… ▽ More This paper presents a large publicly available multi-center lumbar spine magnetic resonance imaging (MRI) dataset with reference segmentations of vertebrae, intervertebral discs (IVDs), and spinal canal. The dataset includes 447 sagittal T1 and T2 MRI series from 218 patients with a history of low back pain and was collected from four different hospitals. An iterative data annotation approach was used by training a segmentation algorithm on a small part of the dataset, enabling semi-automatic segmentation of the remaining images. The algorithm provided an initial segmentation, which was subsequently reviewed, manually corrected, and added to the training data. We provide reference performance values for this baseline algorithm and nnU-Net, which performed comparably. Performance values were computed on a sequestered set of 39 studies with 97 series, which were additionally used to set up a continuous segmentation challenge that allows for a fair comparison of different segmentation algorithms. This study may encourage wider collaboration in the field of spine segmentation and improve the diagnostic value of lumbar spine MRI. △ Less

Submitted 5 March, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

Comments: Published in Scientific Data

Journal ref: Scientific Data 11.1 (2024): 264

arXiv:2306.10484 [pdf, other]

The STOIC2021 COVID-19 AI challenge: applying reusable training methodologies to private data

Authors: Luuk H. Boulogne, Julian Lorenz, Daniel Kienzle, Robin Schon, Katja Ludwig, Rainer Lienhart, Simon Jegou, Guang Li, Cong Chen, Qi Wang, Derik Shi, Mayug Maniparambil, Dominik Muller, Silvan Mertes, Niklas Schroter, Fabio Hellmann, Miriam Elia, Ine Dirks, Matias Nicolas Bossa, Abel Diaz Berenguer, Tanmoy Mukherjee, Jef Vandemeulebroucke, Hichem Sahli, Nikos Deligiannis, Panagiotis Gonidakis , et al. (13 additional authors not shown)

Abstract: Challenges drive the state-of-the-art of automated medical image analysis. The quantity of public training data that they provide can limit the performance of their solutions. Public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training m… ▽ More Challenges drive the state-of-the-art of automated medical image analysis. The quantity of public training data that they provide can limit the performance of their solutions. Public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training methodologies. With T3, challenge organizers train a codebase provided by the participants on sequestered training data. T3 was implemented in the STOIC2021 challenge, with the goal of predicting from a computed tomography (CT) scan whether subjects had a severe COVID-19 infection, defined as intubation or death within one month. STOIC2021 consisted of a Qualification phase, where participants developed challenge solutions using 2000 publicly available CT scans, and a Final phase, where participants submitted their training methodologies with which solutions were trained on CT scans of 9724 subjects. The organizers successfully trained six of the eight Final phase submissions. The submitted codebases for training and running inference were released publicly. The winning solution obtained an area under the receiver operating characteristic curve for discerning between severe and non-severe COVID-19 of 0.815. The Final phase solutions of all finalists improved upon their Qualification phase solutions.HSUXJM-TNZF9CHSUXJM-TNZF9C △ Less

Submitted 25 June, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

arXiv:2302.03116 [pdf, ps, other]

Uncertainty-Aware Multiple-Instance Learning for Reliable Classification: Application to Optical Coherence Tomography

Authors: Coen de Vente, Bram van Ginneken, Carel B. Hoyng, Caroline C. W. Klaver, Clara I. Sánchez

Abstract: Deep learning classification models for medical image analysis often perform well on data from scanners that were used during training. However, when these models are applied to data from different vendors, their performance tends to drop substantially. Artifacts that only occur within scans from specific scanners are major causes of this poor generalizability. We aimed to improve the reliability… ▽ More Deep learning classification models for medical image analysis often perform well on data from scanners that were used during training. However, when these models are applied to data from different vendors, their performance tends to drop substantially. Artifacts that only occur within scans from specific scanners are major causes of this poor generalizability. We aimed to improve the reliability of deep learning classification models by proposing Uncertainty-Based Instance eXclusion (UBIX). This technique, based on multiple-instance learning, reduces the effect of corrupted instances on the bag-classification by seamlessly integrating out-of-distribution (OOD) instance detection during inference. Although UBIX is generally applicable to different medical images and diverse classification tasks, we focused on staging of age-related macular degeneration in optical coherence tomography. After being trained using images from one vendor, UBIX showed a reliable behavior, with a slight decrease in performance (a decrease of the quadratic weighted kappa ($κ_w$) from 0.861 to 0.708), when applied to images from different vendors containing artifacts; while a state-of-the-art 3D neural network suffered from a significant detriment of performance ($κ_w$ from 0.852 to 0.084) on the same test set. We showed that instances with unseen artifacts can be identified with OOD detection and their contribution to the bag-level predictions can be reduced, improving reliability without the need for retraining on new data. This potentially increases the applicability of artificial intelligence models to data from other scanners than the ones for which they were developed. △ Less

Submitted 6 February, 2023; originally announced February 2023.

Comments: 17 pages, 7 figures, 5 tables

arXiv:2302.01738 [pdf, other]

AIROGS: Artificial Intelligence for RObust Glaucoma Screening Challenge

Authors: Coen de Vente, Koenraad A. Vermeer, Nicolas Jaccard, He Wang, Hongyi Sun, Firas Khader, Daniel Truhn, Temirgali Aimyshev, Yerkebulan Zhanibekuly, Tien-Dung Le, Adrian Galdran, Miguel Ángel González Ballester, Gustavo Carneiro, Devika R G, Hrishikesh P S, Densen Puthussery, Hong Liu, Zekang Yang, Satoshi Kondo, Satoshi Kasai, Edward Wang, Ashritha Durvasula, Jónathan Heras, Miguel Ángel Zapata, Teresa Araújo , et al. (11 additional authors not shown)

Abstract: The early detection of glaucoma is essential in preventing visual impairment. Artificial intelligence (AI) can be used to analyze color fundus photographs (CFPs) in a cost-effective manner, making glaucoma screening more accessible. While AI models for glaucoma screening from CFPs have shown promising results in laboratory settings, their performance decreases significantly in real-world scenarios… ▽ More The early detection of glaucoma is essential in preventing visual impairment. Artificial intelligence (AI) can be used to analyze color fundus photographs (CFPs) in a cost-effective manner, making glaucoma screening more accessible. While AI models for glaucoma screening from CFPs have shown promising results in laboratory settings, their performance decreases significantly in real-world scenarios due to the presence of out-of-distribution and low-quality images. To address this issue, we propose the Artificial Intelligence for Robust Glaucoma Screening (AIROGS) challenge. This challenge includes a large dataset of around 113,000 images from about 60,000 patients and 500 different screening centers, and encourages the development of algorithms that are robust to ungradable and unexpected input data. We evaluated solutions from 14 teams in this paper, and found that the best teams performed similarly to a set of 20 expert ophthalmologists and optometrists. The highest-scoring team achieved an area under the receiver operating characteristic curve of 0.99 (95% CI: 0.98-0.99) for detecting ungradable images on-the-fly. Additionally, many of the algorithms showed robust performance when tested on three other publicly available datasets. These results demonstrate the feasibility of robust AI-enabled glaucoma screening. △ Less

Submitted 10 February, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: 19 pages, 8 figures, 3 tables

arXiv:2207.06193 [pdf, other]

Domain adaptation strategies for cancer-independent detection of lymph node metastases

Authors: Péter Bándi, Maschenka Balkenhol, Marcory van Dijk, Bram van Ginneken, Jeroen van der Laak, Geert Litjens

Abstract: Recently, large, high-quality public datasets have led to the development of convolutional neural networks that can detect lymph node metastases of breast cancer at the level of expert pathologists. Many cancers, regardless of the site of origin, can metastasize to lymph nodes. However, collecting and annotating high-volume, high-quality datasets for every cancer type is challenging. In this paper… ▽ More Recently, large, high-quality public datasets have led to the development of convolutional neural networks that can detect lymph node metastases of breast cancer at the level of expert pathologists. Many cancers, regardless of the site of origin, can metastasize to lymph nodes. However, collecting and annotating high-volume, high-quality datasets for every cancer type is challenging. In this paper we investigate how to leverage existing high-quality datasets most efficiently in multi-task settings for closely related tasks. Specifically, we will explore different training and domain adaptation strategies, including prevention of catastrophic forgetting, for colon and head-and-neck cancer metastasis detection in lymph nodes. Our results show state-of-the-art performance on both cancer metastasis detection tasks. Furthermore, we show the effectiveness of repeated adaptation of networks from one cancer type to another to obtain multi-task metastasis detection networks. Last, we show that leveraging existing high-quality datasets can significantly boost performance on new target tasks and that catastrophic forgetting can be effectively mitigated using regularization. △ Less

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2112.04489 [pdf, other]

Learn2Reg: comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning

Authors: Alessa Hering, Lasse Hansen, Tony C. W. Mok, Albert C. S. Chung, Hanna Siebert, Stephanie Häger, Annkristin Lange, Sven Kuckertz, Stefan Heldmann, Wei Shao, Sulaiman Vesal, Mirabela Rusu, Geoffrey Sonn, Théo Estienne, Maria Vakalopoulou, Luyi Han, Yunzhi Huang, Pew-Thian Yap, Mikael Brudfors, Yaël Balbastre, Samuel Joutard, Marc Modat, Gal Lifshitz, Dan Raviv, **xin Lv , et al. (28 additional authors not shown)

Abstract: Image registration is a fundamental medical image analysis task, and a wide variety of approaches have been proposed. However, only a few studies have comprehensively compared medical image registration approaches on a wide range of clinically relevant tasks. This limits the development of registration methods, the adoption of research advances into practice, and a fair benchmark across competing… ▽ More Image registration is a fundamental medical image analysis task, and a wide variety of approaches have been proposed. However, only a few studies have comprehensively compared medical image registration approaches on a wide range of clinically relevant tasks. This limits the development of registration methods, the adoption of research advances into practice, and a fair benchmark across competing approaches. The Learn2Reg challenge addresses these limitations by providing a multi-task medical image registration data set for comprehensive characterisation of deformable registration algorithms. A continuous evaluation will be possible at https://learn2reg.grand-challenge.org. Learn2Reg covers a wide range of anatomies (brain, abdomen, and thorax), modalities (ultrasound, CT, MR), availability of annotations, as well as intra- and inter-patient registration evaluation. We established an easily accessible framework for training and validation of 3D registration methods, which enabled the compilation of results of over 65 individual method submissions from more than 20 unique teams. We used a complementary set of metrics, including robustness, accuracy, plausibility, and runtime, enabling unique insight into the current state-of-the-art of medical image registration. This paper describes datasets, tasks, evaluation methods and results of the challenge, as well as results of further analysis of transferability to new datasets, the importance of label supervision, and resulting bias. While no single approach worked best across all tasks, many methodological aspects could be identified that push the performance of medical image registration to new state-of-the-art performance. Furthermore, we demystified the common belief that conventional registration methods have to be much slower than deep-learning-based methods. △ Less

Submitted 7 October, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

arXiv:2106.05735 [pdf, other]

doi 10.1038/s41467-022-30695-9

The Medical Segmentation Decathlon

Authors: Michela Antonelli, Annika Reinke, Spyridon Bakas, Keyvan Farahani, AnnetteKopp-Schneider, Bennett A. Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M. Summers, Bram van Ginneken, Michel Bilello, Patrick Bilic, Patrick F. Christ, Richard K. G. Do, Marc J. Gollub, Stephan H. Heckers, Henkjan Huisman, William R. Jarnagin, Maureen K. McHugo, Sandy Napel, Jennifer S. Goli Pernicka, Kawal Rhode, Catalina Tobon-Gomez, Eugene Vorontsov , et al. (34 additional authors not shown)

Abstract: International challenges have become the de facto standard for comparative assessment of image analysis algorithms given a specific task. Segmentation is so far the most widely investigated medical image processing task, but the various segmentation challenges have typically been organized in isolation, such that algorithm development was driven by the need to tackle a single specific clinical pro… ▽ More International challenges have become the de facto standard for comparative assessment of image analysis algorithms given a specific task. Segmentation is so far the most widely investigated medical image processing task, but the various segmentation challenges have typically been organized in isolation, such that algorithm development was driven by the need to tackle a single specific clinical problem. We hypothesized that a method capable of performing well on multiple tasks will generalize well to a previously unseen task and potentially outperform a custom-designed solution. To investigate the hypothesis, we organized the Medical Segmentation Decathlon (MSD) - a biomedical image analysis challenge, in which algorithms compete in a multitude of both tasks and modalities. The underlying data set was designed to explore the axis of difficulties typically encountered when dealing with medical images, such as small data sets, unbalanced labels, multi-site data and small objects. The MSD challenge confirmed that algorithms with a consistent good performance on a set of tasks preserved their good average performance on a different set of previously unseen tasks. Moreover, by monitoring the MSD winner for two years, we found that this algorithm continued generalizing well to a wide range of other clinical problems, further confirming our hypothesis. Three main conclusions can be drawn from this study: (1) state-of-the-art image segmentation algorithms are mature, accurate, and generalize well when retrained on unseen tasks; (2) consistent algorithmic performance across multiple tasks is a strong surrogate of algorithmic generalizability; (3) the training of accurate AI segmentation models is now commoditized to non AI experts. △ Less

Submitted 10 June, 2021; originally announced June 2021.

MSC Class: 68T07

arXiv:2106.01351 [pdf, other]

Deep Clustering Activation Maps for Emphysema Subty**

Authors: Weiyi Xie, Colin Jacobs, Bram van Ginneken

Abstract: We propose a deep learning clustering method that exploits dense features from a segmentation network for emphysema subty** from computed tomography (CT) scans. Using dense features enables high-resolution visualization of image regions corresponding to the cluster assignment via dense clustering activation maps (dCAMs). This approach provides model interpretability. We evaluated clustering resu… ▽ More We propose a deep learning clustering method that exploits dense features from a segmentation network for emphysema subty** from computed tomography (CT) scans. Using dense features enables high-resolution visualization of image regions corresponding to the cluster assignment via dense clustering activation maps (dCAMs). This approach provides model interpretability. We evaluated clustering results on 500 subjects from the COPDGenestudy, where radiologists manually annotated emphysema sub-types according to their visual CT assessment. We achieved a 43% unsupervised clustering accuracy, outperforming our baseline at 41% and yielding results comparable to supervised classification at 45%. The proposed method also offers a better cluster formation than the baseline, achieving0.54 in silhouette coefficient and 0.55 in David-Bouldin scores. △ Less

Submitted 1 June, 2021; originally announced June 2021.

arXiv:2105.11748 [pdf, other]

Dense Regression Activation Maps For Lesion Segmentation in CT scans of COVID-19 patients

Authors: Weiyi Xie, Colin Jacobs, Jean-Paul Charbonnier, Bram van Ginneken

Abstract: Automatic lesion segmentation on thoracic CT enables rapid quantitative analysis of lung involvement in COVID-19 infections. However, obtaining a large amount of voxel-level annotations for training segmentation networks is prohibitively expensive. Therefore, we propose a weakly-supervised segmentation method based on dense regression activation maps (dRAMs). Most weakly-supervised segmentation ap… ▽ More Automatic lesion segmentation on thoracic CT enables rapid quantitative analysis of lung involvement in COVID-19 infections. However, obtaining a large amount of voxel-level annotations for training segmentation networks is prohibitively expensive. Therefore, we propose a weakly-supervised segmentation method based on dense regression activation maps (dRAMs). Most weakly-supervised segmentation approaches exploit class activation maps (CAMs) to localize objects. However, because CAMs were trained for classification, they do not align precisely with the object segmentations. Instead, we produce high-resolution activation maps using dense features from a segmentation network that was trained to estimate a per-lobe lesion percentage. In this way, the network can exploit knowledge regarding the required lesion volume. In addition, we propose an attention neural network module to refine dRAMs, optimized together with the main regression task. We evaluated our algorithm on 90 subjects. Results show our method achieved 70.2% Dice coefficient, substantially outperforming the CAM-based baseline at 48.6%. △ Less

Submitted 18 November, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

arXiv:2105.01181 [pdf, other]

doi 10.1002/mp.15655

Automated Estimation of Total Lung Volume using Chest Radiographs and Deep Learning

Authors: Ecem Sogancioglu, Keelin Murphy, Ernst Th. Scholten, Luuk H. Boulogne, Mathias Prokop, Bram van Ginneken

Abstract: Total lung volume is an important quantitative biomarker and is used for the assessment of restrictive lung diseases. In this study, we investigate the performance of several deep-learning approaches for automated measurement of total lung volume from chest radiographs. 7621 posteroanterior and lateral view chest radiographs (CXR) were collected from patients with chest CT available. Similarly, 92… ▽ More Total lung volume is an important quantitative biomarker and is used for the assessment of restrictive lung diseases. In this study, we investigate the performance of several deep-learning approaches for automated measurement of total lung volume from chest radiographs. 7621 posteroanterior and lateral view chest radiographs (CXR) were collected from patients with chest CT available. Similarly, 928 CXR studies were chosen from patients with pulmonary function test (PFT) results. The reference total lung volume was calculated from lung segmentation on CT or PFT data, respectively. This dataset was used to train deep-learning architectures to predict total lung volume from chest radiographs. The experiments were constructed in a step-wise fashion with increasing complexity to demonstrate the effect of training with CT-derived labels only and the sources of error. The optimal models were tested on 291 CXR studies with reference lung volume obtained from PFT. The optimal deep-learning regression model showed an MAE of 408 ml and a MAPE of 8.1\% and Pearson's r = 0.92 using both frontal and lateral chest radiographs as input. CT-derived labels were useful for pre-training but the optimal performance was obtained by fine-tuning the network with PFT-derived labels. We demonstrate, for the first time, that state-of-the-art deep learning solutions can accurately measure total lung volume from plain chest radiographs. The proposed model can be used to obtain total lung volume from routinely acquired chest radiographs at no additional cost and could be a useful tool to identify trends over time in patients referred regularly for chest x-rays. △ Less

Submitted 3 May, 2021; originally announced May 2021.

Comments: Under review

arXiv:2104.05642 [pdf, other]

Common Limitations of Image Processing Metrics: A Picture Story

Authors: Annika Reinke, Minu D. Tizabi, Carole H. Sudre, Matthias Eisenmann, Tim Rädsch, Michael Baumgartner, Laura Acion, Michela Antonelli, Tal Arbel, Spyridon Bakas, Peter Bankhead, Arriel Benis, Matthew Blaschko, Florian Buettner, M. Jorge Cardoso, Jianxu Chen, Veronika Cheplygina, Evangelia Christodoulou, Beth Cimini, Gary S. Collins, Sandy Engelhardt, Keyvan Farahani, Luciana Ferrer, Adrian Galdran, Bram van Ginneken , et al. (68 additional authors not shown)

Abstract: While the importance of automatic image analysis is continuously increasing, recent meta-research revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of the used automatic algorithms, but relatively little attention has been given to the practical pitfalls when using spe… ▽ More While the importance of automatic image analysis is continuously increasing, recent meta-research revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of the used automatic algorithms, but relatively little attention has been given to the practical pitfalls when using specific metrics for a given image analysis task. These are typically related to (1) the disregard of inherent metric properties, such as the behaviour in the presence of class imbalance or small target structures, (2) the disregard of inherent data set properties, such as the non-independence of the test cases, and (3) the disregard of the actual biomedical domain interest that the metrics should reflect. This living dynamically document has the purpose to illustrate important limitations of performance metrics commonly applied in the field of image analysis. In this context, it focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection task. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide. △ Less

Submitted 6 December, 2023; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: Shared first authors: Annika Reinke and Minu D. Tizabi. This is a dynamic paper on limitations of commonly used metrics. It discusses metrics for image-level classification, semantic and instance segmentation, and object detection. For missing use cases, comments or questions, please contact [email protected]. Substantial contributions to this document will be acknowledged with a co-authorship

arXiv:2103.13833 [pdf]

doi 10.1371/journal.pone.0255301

Deep Learning with robustness to missing data: A novel approach to the detection of COVID-19

Authors: Erdi Çallı, Keelin Murphy, Steef Kurstjens, Tijs Samson, Robert Herpers, Henk Smits, Matthieu Rutten, Bram van Ginneken

Abstract: In the context of the current global pandemic and the limitations of the RT-PCR test, we propose a novel deep learning architecture, DFCN (Denoising Fully Connected Network). Since medical facilities around the world differ enormously in what laboratory tests or chest imaging may be available, DFCN is designed to be robust to missing input data. An ablation study extensively evaluates the performa… ▽ More In the context of the current global pandemic and the limitations of the RT-PCR test, we propose a novel deep learning architecture, DFCN (Denoising Fully Connected Network). Since medical facilities around the world differ enormously in what laboratory tests or chest imaging may be available, DFCN is designed to be robust to missing input data. An ablation study extensively evaluates the performance benefits of the DFCN as well as its robustness to missing inputs. Data from 1088 patients with confirmed RT-PCR results are obtained from two independent medical facilities. The data includes results from 27 laboratory tests and a chest x-ray scored by a deep learning model. Training and test datasets are taken from different medical facilities. Data is made publicly available. The performance of DFCN in predicting the RT-PCR result is compared with 3 related architectures as well as a Random Forest baseline. All models are trained with varying levels of masked input data to encourage robustness to missing inputs. Missing data is simulated at test time by masking inputs randomly. DFCN outperforms all other models with statistical significance using random subsets of input data with 2-27 available inputs. When all 28 inputs are available DFCN obtains an AUC of 0.924, higher than any other model. Furthermore, with clinically meaningful subsets of parameters consisting of just 6 and 7 inputs respectively, DFCN achieves higher AUCs than any other model, with values of 0.909 and 0.919. △ Less

Submitted 2 August, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

arXiv:2103.08700 [pdf, other]

doi 10.1016/j.media.2021.102125

Deep Learning for Chest X-ray Analysis: A Survey

Authors: Ecem Sogancioglu, Erdi Çallı, Bram van Ginneken, Kicky G. van Leeuwen, Keelin Murphy

Abstract: Recent advances in deep learning have led to a promising performance in many medical image analysis tasks. As the most commonly performed radiological exam, chest radiographs are a particularly important modality for which a variety of applications have been researched. The release of multiple, large, publicly available chest X-ray datasets in recent years has encouraged research interest and boos… ▽ More Recent advances in deep learning have led to a promising performance in many medical image analysis tasks. As the most commonly performed radiological exam, chest radiographs are a particularly important modality for which a variety of applications have been researched. The release of multiple, large, publicly available chest X-ray datasets in recent years has encouraged research interest and boosted the number of publications. In this paper, we review all studies using deep learning on chest radiographs, categorizing works by task: image-level prediction (classification and regression), segmentation, localization, image generation and domain adaptation. Commercially available applications are detailed, and a comprehensive discussion of the current state of the art and potential future directions are provided. △ Less

Submitted 15 March, 2021; originally announced March 2021.

Comments: Under review in Medical Image Analysis

arXiv:2101.06468 [pdf, other]

Adversarial cycle-consistent synthesis of cerebral microbleeds for data augmentation

Authors: Khrystyna Faryna, Kevin Koschmieder, Marcella M. Paul, Thomas van den Heuvel, Anke van der Eerden, Rashindra Manniesing, Bram van Ginneken

Abstract: We propose a novel framework for controllable pathological image synthesis for data augmentation. Inspired by CycleGAN, we perform cycle-consistent image-to-image translation between two domains: healthy and pathological. Guided by a semantic mask, an adversarially trained generator synthesizes pathology on a healthy image in the specified location. We demonstrate our approach on an institutional… ▽ More We propose a novel framework for controllable pathological image synthesis for data augmentation. Inspired by CycleGAN, we perform cycle-consistent image-to-image translation between two domains: healthy and pathological. Guided by a semantic mask, an adversarially trained generator synthesizes pathology on a healthy image in the specified location. We demonstrate our approach on an institutional dataset of cerebral microbleeds in traumatic brain injury patients. We utilize synthetic images generated with our method for data augmentation in cerebral microbleeds detection. Enriching the training dataset with synthetic images exhibits the potential to increase detection performance for cerebral microbleeds in traumatic brain injury patients. △ Less

Submitted 16 January, 2021; originally announced January 2021.

Comments: Accepted in Medical Imaging meets NIPS Workshop, NIPS 2020

arXiv:2009.11120 [pdf, other]

doi 10.1016/j.cmpb.2020.105821

Anisotropic 3D Multi-Stream CNN for Accurate Prostate Segmentation from Multi-Planar MRI

Authors: Anneke Meyer, Grzegorz Chlebus, Marko Rak, Daniel Schindele, Martin Schostak, Bram van Ginneken, Andrea Schenk, Hans Meine, Horst K. Hahn, Andreas Schreiber, Christian Hansen

Abstract: Background and Objective: Accurate and reliable segmentation of the prostate gland in MR images can support the clinical assessment of prostate cancer, as well as the planning and monitoring of focal and loco-regional therapeutic interventions. Despite the availability of multi-planar MR scans due to standardized protocols, the majority of segmentation approaches presented in the literature consid… ▽ More Background and Objective: Accurate and reliable segmentation of the prostate gland in MR images can support the clinical assessment of prostate cancer, as well as the planning and monitoring of focal and loco-regional therapeutic interventions. Despite the availability of multi-planar MR scans due to standardized protocols, the majority of segmentation approaches presented in the literature consider the axial scans only. Methods: We propose an anisotropic 3D multi-stream CNN architecture, which processes additional scan directions to produce a higher-resolution isotropic prostate segmentation. We investigate two variants of our architecture, which work on two (dual-plane) and three (triple-plane) image orientations, respectively. We compare them with the standard baseline (single-plane) used in literature, i.e., plain axial segmentation. To realize a fair comparison, we employ a hyperparameter optimization strategy to select optimal configurations for the individual approaches. Results: Training and evaluation on two datasets spanning multiple sites obtain statistical significant improvement over the plain axial segmentation ($p<0.05$ on the Dice similarity coefficient). The improvement can be observed especially at the base ($0.898$ single-plane vs. $0.906$ triple-plane) and apex ($0.888$ single-plane vs. $0.901$ dual-plane). Conclusion: This study indicates that models employing two or three scan directions are superior to plain axial segmentation. The knowledge of precise boundaries of the prostate is crucial for the conservation of risk structures. Thus, the proposed models have the potential to improve the outcome of prostate cancer diagnosis and therapies. △ Less

Submitted 2 December, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

Comments: Accepted manuscript in Elsevier Computer Methods and Programs in Biomedicine. Anneke Meyer and Grzegorz Chlebus contributed equally to this work. Sourcecode available at https://github.com/AnnekeMeyer/AnisotropicMultiStreamCNN. Data available at https://doi.org/10.7937/TCIA.2019.DEG7ZG1U

arXiv:2009.09725 [pdf, other]

Improving Automated COVID-19 Grading with Convolutional Neural Networks in Computed Tomography Scans: An Ablation Study

Authors: Coen de Vente, Luuk H. Boulogne, Kiran Vaidhya Venkadesh, Cheryl Sital, Nikolas Lessmann, Colin Jacobs, Clara I. Sánchez, Bram van Ginneken

Abstract: Amidst the ongoing pandemic, several studies have shown that COVID-19 classification and grading using computed tomography (CT) images can be automated with convolutional neural networks (CNNs). Many of these studies focused on reporting initial results of algorithms that were assembled from commonly used components. The choice of these components was often pragmatic rather than systematic. For in… ▽ More Amidst the ongoing pandemic, several studies have shown that COVID-19 classification and grading using computed tomography (CT) images can be automated with convolutional neural networks (CNNs). Many of these studies focused on reporting initial results of algorithms that were assembled from commonly used components. The choice of these components was often pragmatic rather than systematic. For instance, several studies used 2D CNNs even though these might not be optimal for handling 3D CT volumes. This paper identifies a variety of components that increase the performance of CNN-based algorithms for COVID-19 grading from CT images. We investigated the effectiveness of using a 3D CNN instead of a 2D CNN, of using transfer learning to initialize the network, of providing automatically computed lesion maps as additional network input, and of predicting a continuous instead of a categorical output. A 3D CNN with these components achieved an area under the ROC curve (AUC) of 0.934 on our test set of 105 CT scans and an AUC of 0.923 on a publicly available set of 742 CT scans, a substantial improvement in comparison with a previously published 2D CNN. An ablation study demonstrated that in addition to using a 3D CNN instead of a 2D CNN transfer learning contributed the most and continuous output contributed the least to improving the model performance. △ Less

Submitted 21 September, 2020; originally announced September 2020.

Comments: 9 pages, 6 figures

arXiv:2008.09104 [pdf, other]

doi 10.1109/JPROC.2021.3054390

A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises

Authors: S. Kevin Zhou, Hayit Greenspan, Christos Davatzikos, James S. Duncan, Bram van Ginneken, Anant Madabhushi, Jerry L. Prince, Daniel Rueckert, Ronald M. Summers

Abstract: Since its renaissance, deep learning has been widely used in various medical imaging tasks and has achieved remarkable success in many medical imaging applications, thereby propelling us into the so-called artificial intelligence (AI) era. It is known that the success of AI is mostly attributed to the availability of big data with annotations for a single task and the advances in high performance… ▽ More Since its renaissance, deep learning has been widely used in various medical imaging tasks and has achieved remarkable success in many medical imaging applications, thereby propelling us into the so-called artificial intelligence (AI) era. It is known that the success of AI is mostly attributed to the availability of big data with annotations for a single task and the advances in high performance computing. However, medical imaging presents unique challenges that confront deep learning approaches. In this survey paper, we first present traits of medical imaging, highlight both clinical needs and technical challenges in medical imaging, and describe how emerging trends in deep learning are addressing these issues. We cover the topics of network architecture, sparse and noisy labels, federating learning, interpretability, uncertainty quantification, etc. Then, we present several case studies that are commonly found in clinical practice, including digital pathology and chest, brain, cardiovascular, and abdominal imaging. Rather than presenting an exhaustive literature survey, we instead describe some prominent research highlights related to these case study applications. We conclude with a discussion and presentation of promising future directions. △ Less

Submitted 5 March, 2021; v1 submitted 2 August, 2020; originally announced August 2020.

Comments: 20 pages, 7 figures

Journal ref: Proceedings of the IEEE (2021)

arXiv:2006.06356 [pdf, other]

doi 10.1016/j.media.2021.102141

Adversarial Attack Vulnerability of Medical Image Analysis Systems: Unexplored Factors

Authors: Gerda Bortsova, Cristina González-Gonzalo, Suzanne C. Wetstein, Florian Dubost, Ioannis Katramados, Laurens Hogeweg, Bart Liefers, Bram van Ginneken, Josien P. W. Pluim, Mitko Veta, Clara I. Sánchez, Marleen de Bruijne

Abstract: Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be vulnerable to adversarial attacks due to strong financial incentives and the associated technological infrastructure. In this paper, we study previously unexplored factors affecting adversarial attack vulnerability of deep l… ▽ More Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be vulnerable to adversarial attacks due to strong financial incentives and the associated technological infrastructure. In this paper, we study previously unexplored factors affecting adversarial attack vulnerability of deep learning MedIA systems in three medical domains: ophthalmology, radiology, and pathology. We focus on adversarial black-box settings, in which the attacker does not have full access to the target model and usually uses another model, commonly referred to as surrogate model, to craft adversarial examples. We consider this to be the most realistic scenario for MedIA systems. Firstly, we study the effect of weight initialization (ImageNet vs. random) on the transferability of adversarial attacks from the surrogate model to the target model. Secondly, we study the influence of differences in development data between target and surrogate models. We further study the interaction of weight initialization and data differences with differences in model architecture. All experiments were done with a perturbation degree tuned to ensure maximal transferability at minimal visual perceptibility of the attacks. Our experiments show that pre-training may dramatically increase the transferability of adversarial examples, even when the target and surrogate's architectures are different: the larger the performance gain using pre-training, the larger the transferability. Differences in the development data between target and surrogate models considerably decrease the performance of the attack; this decrease is further amplified by difference in the model architecture. We believe these factors should be considered when develo** security-critical MedIA systems planned to be deployed in clinical practice. △ Less

Submitted 17 June, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: First three authors contributed equally

Journal ref: Medical Image Analysis. Available online 18 Jun 2021

arXiv:2004.07443 [pdf, other]

doi 10.1109/TMI.2020.2995108

Relational Modeling for Robust and Efficient Pulmonary Lobe Segmentation in CT Scans

Authors: Weiyi Xie, Colin Jacobs, Jean-Paul Charbonnier, Bram van Ginneken

Abstract: Pulmonary lobe segmentation in computed tomography scans is essential for regional assessment of pulmonary diseases. Recent works based on convolution neural networks have achieved good performance for this task. However, they are still limited in capturing structured relationships due to the nature of convolution. The shape of the pulmonary lobes affect each other and their borders relate to the… ▽ More Pulmonary lobe segmentation in computed tomography scans is essential for regional assessment of pulmonary diseases. Recent works based on convolution neural networks have achieved good performance for this task. However, they are still limited in capturing structured relationships due to the nature of convolution. The shape of the pulmonary lobes affect each other and their borders relate to the appearance of other structures, such as vessels, airways, and the pleural wall. We argue that such structural relationships play a critical role in the accurate delineation of pulmonary lobes when the lungs are affected by diseases such as COVID-19 or COPD. In this paper, we propose a relational approach (RTSU-Net) that leverages structured relationships by introducing a novel non-local neural network module. The proposed module learns both visual and geometric relationships among all convolution features to produce self-attention weights. With a limited amount of training data available from COVID-19 subjects, we initially train and validate RTSU-Net on a cohort of 5000 subjects from the COPDGene study (4000 for training and 1000 for evaluation). Using models pre-trained on COPDGene, we apply transfer learning to retrain and evaluate RTSU-Net on 470 COVID-19 suspects (370 for retraining and 100 for evaluation). Experimental results show that RTSU-Net outperforms three baselines and performs robustly on cases with severe lung infection due to COVID-19. △ Less

Submitted 12 May, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

arXiv:2003.06158 [pdf, other]

Random smooth gray value transformations for cross modality learning with gray value invariant networks

Authors: Nikolas Lessmann, Bram van Ginneken

Abstract: Random transformations are commonly used for augmentation of the training data with the goal of reducing the uniformity of the training samples. These transformations normally aim at variations that can be expected in images from the same modality. Here, we propose a simple method for transforming the gray values of an image with the goal of reducing cross modality differences. This approach enabl… ▽ More Random transformations are commonly used for augmentation of the training data with the goal of reducing the uniformity of the training samples. These transformations normally aim at variations that can be expected in images from the same modality. Here, we propose a simple method for transforming the gray values of an image with the goal of reducing cross modality differences. This approach enables segmentation of the lumbar vertebral bodies in CT images using a network trained exclusively with MR images. The source code is made available at https://github.com/nlessmann/rsgt △ Less

Submitted 13 March, 2020; originally announced March 2020.

arXiv:1910.04071 [pdf]

BIAS: Transparent reporting of biomedical image analysis challenges

Authors: Lena Maier-Hein, Annika Reinke, Michal Kozubek, Anne L. Martel, Tal Arbel, Matthias Eisenmann, Allan Hanbuary, Pierre Jannin, Henning Müller, Sinan Onogur, Julio Saez-Rodriguez, Bram van Ginneken, Annette Kopp-Schneider, Bennett Landman

Abstract: The number of biomedical image analysis challenges organized per year is steadily increasing. These international competitions have the purpose of benchmarking algorithms on common data sets, typically to identify the best method for a given problem. Recent research, however, revealed that common practice related to challenge reporting does not allow for adequate interpretation and reproducibility… ▽ More The number of biomedical image analysis challenges organized per year is steadily increasing. These international competitions have the purpose of benchmarking algorithms on common data sets, typically to identify the best method for a given problem. Recent research, however, revealed that common practice related to challenge reporting does not allow for adequate interpretation and reproducibility of results. To address the discrepancy between the impact of challenges and the quality (control), the Biomedical I mage Analysis ChallengeS (BIAS) initiative developed a set of recommendations for the reporting of challenges. The BIAS statement aims to improve the transparency of the reporting of a biomedical image analysis challenge regardless of field of application, image modality or task category assessed. This article describes how the BIAS statement was developed and presents a checklist which authors of biomedical image analysis challenges are encouraged to include in their submission when giving a paper on a challenge into review. The purpose of the checklist is to standardize and facilitate the review process and raise interpretability and reproducibility of challenge results by making relevant information explicit. △ Less

Submitted 31 August, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: 2 Appendices - Appendix A: BIAS reporting guideline for biomedical image analysis challenges, Appendix B: Glossary; 2 Supplements - Suppl 1: Form for summarizing information on challenge organization, Suppl 2: Structured description of a challenge design

arXiv:1909.12047 [pdf, other]

Function Follows Form: Regression from Complete Thoracic Computed Tomography Scans

Authors: Max Argus, Cornelia Schaefer-Prokop, David A. Lynch, Bram van Ginneken

Abstract: Chronic Obstructive Pulmonary Disease (COPD) is a leading cause of morbidity and mortality. While COPD diagnosis is based on lung function tests, early stages and progression of different aspects of the disease can be visible and quantitatively assessed on computed tomography (CT) scans. Many studies have been published that quantify imaging biomarkers related to COPD. In this paper we present a c… ▽ More Chronic Obstructive Pulmonary Disease (COPD) is a leading cause of morbidity and mortality. While COPD diagnosis is based on lung function tests, early stages and progression of different aspects of the disease can be visible and quantitatively assessed on computed tomography (CT) scans. Many studies have been published that quantify imaging biomarkers related to COPD. In this paper we present a convolutional neural network that directly computes visual emphysema scores and predicts the outcome of lung function tests for 195 CT scans from the COPDGene study. Contrary to previous work, the proposed method does not encode any specific prior knowledge about what to quantify, but it is trained end-to-end with a set of 1424 CT scans for which the output parameters were available. The network provided state-of-the-art results for these tasks: Visual emphysema scores are comparable to those assessed by trained human observers; COPD diagnosis from estimated lung function reaches an area under the ROC curve of 0.94, outperforming prior art. The method is easily generalizable to other situations where information from whole scans needs to be summarized in single quantities. △ Less

Submitted 27 September, 2019; v1 submitted 26 September, 2019; originally announced September 2019.

arXiv:1908.05621 [pdf]

doi 10.1016/j.ophtha.2020.02.009

A deep learning model for segmentation of geographic atrophy to study its long-term natural history

Authors: Bart Liefers, Johanna M. Colijn, Cristina González-Gonzalo, Timo Verzijden, Paul Mitchell, Carel B. Hoyng, Bram van Ginneken, Caroline C. W. Klaver, Clara I. Sánchez

Abstract: Purpose: To develop and validate a deep learning model for automatic segmentation of geographic atrophy (GA) in color fundus images (CFIs) and its application to study growth rate of GA. Participants: 409 CFIs of 238 eyes with GA from the Rotterdam Study (RS) and the Blue Mountain Eye Study (BMES) for model development, and 5,379 CFIs of 625 eyes from the Age-Related Eye Disease Study (AREDS) for… ▽ More Purpose: To develop and validate a deep learning model for automatic segmentation of geographic atrophy (GA) in color fundus images (CFIs) and its application to study growth rate of GA. Participants: 409 CFIs of 238 eyes with GA from the Rotterdam Study (RS) and the Blue Mountain Eye Study (BMES) for model development, and 5,379 CFIs of 625 eyes from the Age-Related Eye Disease Study (AREDS) for analysis of GA growth rate. Methods: A deep learning model based on an ensemble of encoder-decoder architectures was implemented and optimized for the segmentation of GA in CFIs. Four experienced graders delineated GA in CFIs from RS and BMES. These manual delineations were used to evaluate the segmentation model using 5-fold cross-validation. The model was further applied to CFIs from the AREDS to study the growth rate of GA. Linear regression analysis was used to study associations between structural biomarkers at baseline and GA growth rate. A general estimate of the progression of GA area over time was made by combining growth rates of all eyes with GA from the AREDS set. Results: The model obtained an average Dice coefficient of 0.72 $\pm$ 0.26 on the BMES and RS. An intraclass correlation coefficient of 0.83 was reached between the automatically estimated GA area and the graders' consensus measures. Eight automatically calculated structural biomarkers (area, filled area, convex area, convex solidity, eccentricity, roundness, foveal involvement and perimeter) were significantly associated with growth rate. Combining all growth rates indicated that GA area grows quadratically up to an area of around 12 mm$^{2}$, after which growth rate stabilizes or decreases. Conclusion: The presented deep learning model allowed for fully automatic and robust segmentation of GA in CFIs. These segmentations can be used to extract structural characteristics of GA that predict its growth rate. △ Less

Submitted 15 August, 2019; originally announced August 2019.

Comments: 22 pages, 3 tables, 4 figures, 1 supplemental figure

Journal ref: Ophthalmology, Published February 14, 2020

arXiv:1908.00295 [pdf, other]

Chest CT Super-resolution and Domain-adaptation using Memory-efficient 3D Reversible GANs

Authors: Tycho F. A. van der Ouderaa, Daniel E. Worrall, Bram van Ginneken

Abstract: Recently, paired (e.g. Pix2pix) and unpaired (e.g. CycleGAN) image-to-image translation methods have shown effective in medical imaging tasks. In practice, however, it can be difficult to apply these deep models on medical data volumes, such as from MR and CT scans, since such data volumes tend to be 3-dimensional and of a high spatial resolution pushing the limits of the memory constraints of GPU… ▽ More Recently, paired (e.g. Pix2pix) and unpaired (e.g. CycleGAN) image-to-image translation methods have shown effective in medical imaging tasks. In practice, however, it can be difficult to apply these deep models on medical data volumes, such as from MR and CT scans, since such data volumes tend to be 3-dimensional and of a high spatial resolution pushing the limits of the memory constraints of GPU hardware that is typically used to train these models. Recent studies in the field of invertible neural networks have shown that reversible neural networks do not require to store intermediate activations for training. We use the RevGAN model that makes use of this property to perform memory-efficient partially-reversible image-to-image translation. We demonstrate this by performing a 3D super-resolution and a 3D domain-adaptation task on chest CT data volumes. △ Less

Submitted 1 August, 2019; originally announced August 2019.

Comments: MIDL 2019 [arXiv:1907.08612]

Report number: MIDL/2019/ExtendedAbstract/SkxueFsiFV

arXiv:1907.12314 [pdf, other]

Automated interpretation of prenatal ultrasound using a predefined acquisition protocol in resource-limited countries

Authors: Thomas L. A. van den Heuvel, Chris L. de Korte, Bram van Ginneken

Abstract: In this study, we combine a standardized acquisition protocol with image analysis algorithms to investigate if it is possible to automatically detect maternal risk factors without a trained sonographer. The standardized acquisition protocol can be taught to any health care worker within two hours. This protocol was acquired from 280 pregnant women at St.\ Luke's Catholic Hospital, Wolisso, Ethiopi… ▽ More In this study, we combine a standardized acquisition protocol with image analysis algorithms to investigate if it is possible to automatically detect maternal risk factors without a trained sonographer. The standardized acquisition protocol can be taught to any health care worker within two hours. This protocol was acquired from 280 pregnant women at St.\ Luke's Catholic Hospital, Wolisso, Ethiopia. A VGG-like network was used to perform a frame classification for each frame within the acquired ultrasound data. This frame classification was used to automatically determine the number of fetuses and the fetal presentation. A U-net was trained to measure the fetal head circumference in all frames in which the VGG-like network detected a fetal head. This head circumference was used to estimate the gestational age. The results show that it possible automatically determine gestational age and determine fetal presentation and the potential to detect twin pregnancies using the standardized acquisition protocol. △ Less

Submitted 29 July, 2019; originally announced July 2019.

Comments: MIDL 2019 [arXiv:1907.08612]

Report number: MIDL/2019/ExtendedAbstract/H1eUCb6at4

arXiv:1907.10978 [pdf, other]

Vertebra partitioning with thin-plate spline surfaces steered by a convolutional neural network

Authors: Nikolas Lessmann, Jelmer M. Wolterink, Majd Zreik, Max A. Viergever, Bram van Ginneken, Ivana Išgum

Abstract: Thin-plate splines can be used for interpolation of image values, but can also be used to represent a smooth surface, such as the boundary between two structures. We present a method for partitioning vertebra segmentation masks into two substructures, the vertebral body and the posterior elements, using a convolutional neural network that predicts the boundary between the two structures. This boun… ▽ More Thin-plate splines can be used for interpolation of image values, but can also be used to represent a smooth surface, such as the boundary between two structures. We present a method for partitioning vertebra segmentation masks into two substructures, the vertebral body and the posterior elements, using a convolutional neural network that predicts the boundary between the two structures. This boundary is modeled as a thin-plate spline surface defined by a set of control points predicted by the network. The neural network is trained using the reconstruction error of a convolutional autoencoder to enable the use of unpaired data. △ Less

Submitted 25 July, 2019; originally announced July 2019.

Comments: MIDL 2019 [arXiv:1907.08612]

Report number: MIDL/2019/ExtendedAbstract/B1eQv5INqV

arXiv:1907.07980 [pdf, other]

doi 10.1016/S1470-2045(19)30739-9

Automated Gleason Grading of Prostate Biopsies using Deep Learning

Authors: Wouter Bulten, Hans Pinckaers, Hester van Boven, Robert Vink, Thomas de Bel, Bram van Ginneken, Jeroen van der Laak, Christina Hulsbergen-van de Kaa, Geert Litjens

Abstract: The Gleason score is the most important prognostic marker for prostate cancer patients but suffers from significant inter-observer variability. We developed a fully automated deep learning system to grade prostate biopsies. The system was developed using 5834 biopsies from 1243 patients. A semi-automatic labeling technique was used to circumvent the need for full manual annotation by pathologists.… ▽ More The Gleason score is the most important prognostic marker for prostate cancer patients but suffers from significant inter-observer variability. We developed a fully automated deep learning system to grade prostate biopsies. The system was developed using 5834 biopsies from 1243 patients. A semi-automatic labeling technique was used to circumvent the need for full manual annotation by pathologists. The developed system achieved a high agreement with the reference standard. In a separate observer experiment, the deep learning system outperformed 10 out of 15 pathologists. The system has the potential to improve prostate cancer prognostics by acting as a first or second reader. △ Less

Submitted 18 July, 2019; originally announced July 2019.

Comments: 13 pages, 6 figures

Journal ref: The Lancet Oncology, Available online 8 January 2020

arXiv:1907.01253 [pdf, other]

FRODO: Free rejection of out-of-distribution samples: application to chest x-ray analysis

Authors: Erdi Çallı, Keelin Murphy, Ecem Sogancioglu, Bram van Ginneken

Abstract: In this work, we propose a method to reject out-of-distribution samples which can be adapted to any network architecture and requires no additional training data. Publicly available chest x-ray data (38,353 images) is used to train a standard ResNet-50 model to detect emphysema. Feature activations of intermediate layers are used as descriptors defining the training data distribution. A novel metr… ▽ More In this work, we propose a method to reject out-of-distribution samples which can be adapted to any network architecture and requires no additional training data. Publicly available chest x-ray data (38,353 images) is used to train a standard ResNet-50 model to detect emphysema. Feature activations of intermediate layers are used as descriptors defining the training data distribution. A novel metric, FRODO, is measured by using the Mahalanobis distance of a new test sample to the training data distribution. The method is tested using a held-out test dataset of 21,176 chest x-rays (in-distribution) and a set of 14,821 out-of-distribution x-ray images of incorrect orientation or anatomy. In classifying test samples as in or out-of distribution, our method achieves an AUC score of 0.99. △ Less

Submitted 2 July, 2019; originally announced July 2019.

Comments: MIDL 2019 [arXiv:1907.08612]

Report number: MIDL/2019/ExtendedAbstract/H1e7kWD794

arXiv:1903.03349 [pdf]

doi 10.1038/s41598-020-62148-y

Computer aided detection of tuberculosis on chest radiographs: An evaluation of the CAD4TB v6 system

Authors: Keelin Murphy, Shifa Salman Habib, Syed Mohammad Asad Zaidi, Saira Khowaja, Aamir Khan, Jaime Melendez, Ernst T. Scholten, Farhan Amad, Steven Schalekamp, Maurits Verhagen, Rick H. H. M. Philipsen, Annet Meijers, Bram van Ginneken

Abstract: There is a growing interest in the automated analysis of chest X-Ray (CXR) as a sensitive and inexpensive means of screening susceptible populations for pulmonary tuberculosis. In this work we evaluate the latest version of CAD4TB, a commercial software platform designed for this purpose. Version 6 of CAD4TB was released in 2018 and is here tested on a fully independent dataset of 5565 CXR images… ▽ More There is a growing interest in the automated analysis of chest X-Ray (CXR) as a sensitive and inexpensive means of screening susceptible populations for pulmonary tuberculosis. In this work we evaluate the latest version of CAD4TB, a commercial software platform designed for this purpose. Version 6 of CAD4TB was released in 2018 and is here tested on a fully independent dataset of 5565 CXR images with GeneXpert (Xpert) sputum test results available (854 Xpert positive subjects). A subset of 500 subjects (50% Xpert positive) was reviewed and annotated by 5 expert observers independently to obtain a radiological reference standard. The latest version of CAD4TB is found to outperform all previous versions in terms of area under receiver operating curve (ROC) with respect to both Xpert and radiological reference standards. Improvements with respect to Xpert are most apparent at high sensitivity levels with a specificity of 76% obtained at a fixed 90% sensitivity. When compared with the radiological reference standard, CAD4TB v6 also outperformed previous versions by a considerable margin and achieved 98% specificity at the 90% sensitivity setting. No substantial difference was found between the performance of CAD4TB v6 and any of the various expert observers against the Xpert reference standard. A cost and efficiency analysis on this dataset demonstrates that in a standard clinical situation, operating at 90% sensitivity, users of CAD4TB v6 can process 132 subjects per day at n average cost per screen of \$5.95 per subject, while users of version 3 process only 85 subjects per day at a cost of \$8.38 per subject. At all tested operating points version 6 is shown to be more efficient and cost effective than any other version. △ Less

Submitted 2 April, 2020; v1 submitted 8 March, 2019; originally announced March 2019.

Comments: Published in Scientific Reports

Journal ref: Scientific Reports 10, 5492 (2020)

arXiv:1902.09063 [pdf, other]

A large annotated medical image dataset for the development and evaluation of segmentation algorithms

Authors: Amber L. Simpson, Michela Antonelli, Spyridon Bakas, Michel Bilello, Keyvan Farahani, Bram van Ginneken, Annette Kopp-Schneider, Bennett A. Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M. Summers, Patrick Bilic, Patrick F. Christ, Richard K. G. Do, Marc Gollub, Jennifer Golia-Pernicka, Stephan H. Heckers, William R. Jarnagin, Maureen K. McHugo, Sandy Napel, Eugene Vorontsov, Lena Maier-Hein, M. Jorge Cardoso

Abstract: Semantic segmentation of medical images aims to associate a pixel with a label in a medical image without human initialization. The success of semantic segmentation algorithms is contingent on the availability of high-quality imaging data with corresponding labels provided by experts. We sought to create a large collection of annotated medical image datasets of various clinically relevant anatomie… ▽ More Semantic segmentation of medical images aims to associate a pixel with a label in a medical image without human initialization. The success of semantic segmentation algorithms is contingent on the availability of high-quality imaging data with corresponding labels provided by experts. We sought to create a large collection of annotated medical image datasets of various clinically relevant anatomies available under open source license to facilitate the development of semantic segmentation algorithms. Such a resource would allow: 1) objective assessment of general-purpose segmentation methods through comprehensive benchmarking and 2) open and free access to medical image data for any researcher interested in the problem domain. Through a multi-institutional effort, we generated a large, curated dataset representative of several highly variable segmentation tasks that was used in a crowd-sourced challenge - the Medical Segmentation Decathlon held during the 2018 Medical Image Computing and Computer Aided Interventions Conference in Granada, Spain. Here, we describe these ten labeled image datasets so that these data may be effectively reused by the research community. △ Less

Submitted 24 February, 2019; originally announced February 2019.

Showing 1–36 of 36 results for author: van Ginneken, B