-
QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results
Authors:
Raghav Mehta,
Angelos Filos,
Ujjwal Baid,
Chiharu Sako,
Richard McKinley,
Michael Rebsamen,
Katrin Datwyler,
Raphael Meier,
Piotr Radojewski,
Gowtham Krishnan Murugesan,
Sahil Nalawade,
Chandan Ganesh,
Ben Wagner,
Fang F. Yu,
Baowei Fei,
Ananth J. Madhuranthakam,
Joseph A. Maldjian,
Laura Daza,
Catalina Gomez,
Pablo Arbelaez,
Chengliang Dai,
Shuo Wang,
Hadrien Reynaud,
Yuan-han Mo,
Elsa Angelini,
et al. (67 additional authors not shown)
Abstract:
Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way toward clinical translation. Several uncertainty estimation methods have recently been introduced for DL medical image segmentation tasks. Developing scores to evaluate and compare the performance of uncertainty measures will assist the end-user in making more informed decisions. In this study, we explore and evaluate a score developed during the BraTS 2019 and BraTS 2020 task on uncertainty quantification (QU-BraTS) and designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This score (1) rewards uncertainty estimates that produce high confidence in correct assertions and those that assign low confidence to incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, highlighting the need for uncertainty quantification in medical image analyses. Finally, in favor of transparency and reproducibility, our evaluation code is made publicly available at: https://github.com/RagMeh11/QU-BraTS.
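The two properties of the score can be illustrated with a minimal NumPy sketch. It is not the official implementation (that is in the linked QU-BraTS repository), and several details are assumptions: a single binary region, uncertainty values scaled to [0, 100], voxels with uncertainty above a threshold τ filtered out, thresholds τ ∈ {0, 25, 50, 75, 100}, and an equal-weight average of the area under the Dice-vs-τ curve with one minus the areas under the filtered-true-positive and filtered-true-negative curves. All function names are hypothetical.

```python
import numpy as np


def _auc(x, y):
    """Trapezoidal area under the curve y(x); x must be sorted ascending."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))


def _dice(pred, gt):
    """Dice overlap of two boolean masks; defined as 1.0 when both are empty."""
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, gt).sum() / denom


def uncertainty_score_sketch(pred, gt, unc, thresholds=(0, 25, 50, 75, 100)):
    """Hypothetical simplified score for one binary region (higher is better).

    Assumes `thresholds` is sorted ascending and `unc` lies in [0, 100].
    """
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    unc = np.asarray(unc, dtype=float)
    tp_all = pred & gt    # correct foreground assertions
    tn_all = ~pred & ~gt  # correct background assertions
    n_tp, n_tn = tp_all.sum(), tn_all.sum()
    dices, ftp, ftn = [], [], []
    for tau in thresholds:
        keep = unc <= tau  # voxels confident enough to survive filtering
        # reward: Dice on the retained voxels rises if errors are filtered out
        dices.append(_dice(pred & keep, gt & keep))
        # penalty: fraction of correct assertions discarded as "uncertain"
        ftp.append(0.0 if n_tp == 0 else 1.0 - (tp_all & keep).sum() / n_tp)
        ftn.append(0.0 if n_tn == 0 else 1.0 - (tn_all & keep).sum() / n_tn)
    x = np.asarray(thresholds, dtype=float) / 100.0
    return (_auc(x, dices) + (1.0 - _auc(x, ftp)) + (1.0 - _auc(x, ftn))) / 3.0


gt = np.array([1, 1, 0, 0], dtype=bool)
pred = np.array([1, 0, 0, 1], dtype=bool)  # one miss, one false alarm
for name, unc in [("confident everywhere", [0, 0, 0, 0]),
                  ("uncertain on errors", [0, 100, 0, 100]),
                  ("uncertain everywhere", [100, 100, 100, 100])]:
    print(f"{name}: {uncertainty_score_sketch(pred, gt, unc):.3f}")
```

On this toy example the three cases print 0.833, 0.979 and 0.396: flagging exactly the wrong voxels as uncertain scores best, while flagging everything (including correct voxels) is penalized below simply being confident.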
Submitted 23 August, 2022; v1 submitted 19 December, 2021;
originally announced December 2021.
-
Addressing catastrophic forgetting for medical domain expansion
Authors:
Sharut Gupta,
Praveer Singh,
Ken Chang,
Liangqiong Qu,
Mehak Aggarwal,
Nishanth Arun,
Ashwin Vaswani,
Shruti Raghavan,
Vibha Agarwal,
Mishka Gidwani,
Katharina Hoebel,
Jay Patel,
Charles Lu,
Christopher P. Bridge,
Daniel L. Rubin,
Jayashree Kalpathy-Cramer
Abstract:
Model brittleness is a key concern when deploying deep learning models in real-world medical settings. A model that has high performance at one institution may suffer a significant decline in performance when tested at other institutions. While pooling datasets from multiple institutions and retraining may provide a straightforward solution, it is often infeasible and may compromise patient privacy. An alternative approach is to fine-tune the model on subsequent institutions after training on the original institution. However, this approach degrades model performance at the original institution, a phenomenon known as catastrophic forgetting. In this paper, we develop an approach to address catastrophic forgetting based on elastic weight consolidation combined with modulation of batch normalization statistics under two scenarios: first, expanding the domain from one imaging system's data to another imaging system's, and second, expanding the domain from a large multi-institutional dataset to another single-institution dataset. We show that our approach outperforms several other state-of-the-art approaches and provide theoretical justification for the efficacy of batch normalization modulation. The results of this study are generally applicable to the deployment of any clinical deep learning model that requires domain expansion.
Submitted 24 March, 2021;
originally announced March 2021.
-
The unreasonable effectiveness of Batch-Norm statistics in addressing catastrophic forgetting across medical institutions
Authors:
Sharut Gupta,
Praveer Singh,
Ken Chang,
Mehak Aggarwal,
Nishanth Arun,
Liangqiong Qu,
Katharina Hoebel,
Jay Patel,
Mishka Gidwani,
Ashwin Vaswani,
Daniel L Rubin,
Jayashree Kalpathy-Cramer
Abstract:
Model brittleness is a primary concern when deploying deep learning models in medical settings owing to inter-institution variations, such as patient demographics, and intra-institution variations, such as multiple scanner types. While simply training on the combined datasets is fraught with data privacy limitations, fine-tuning the model on subsequent institutions after training it on the original institution results in a decrease in performance on the original dataset, a phenomenon called catastrophic forgetting. In this paper, we investigate the trade-off between model refinement and retention of previously learned knowledge and subsequently address catastrophic forgetting for the assessment of mammographic breast density. More specifically, we propose a simple yet effective approach, adapting elastic weight consolidation (EWC) using the global batch normalization (BN) statistics of the original dataset. The results of this study provide guidance for the deployment of clinical deep learning models where continuous learning is needed for domain expansion.
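The abstract does not detail how the global BN statistics enter the method. As an assumption, the sketch below illustrates only the underlying idea: after training on the original dataset, the layer's running mean and variance are frozen, so new-institution batches are normalized with the original dataset's global statistics rather than their own. The class and method names are hypothetical, and the running variance uses the biased batch variance for simplicity.

```python
import numpy as np


class FrozenStatsBatchNorm:
    """Minimal batch-norm over features (axis 0 = batch), hypothetical helper.

    Before freeze(): normalizes with batch statistics and updates running ones.
    After freeze(): normalizes with the stored (original-dataset) statistics.
    """

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum, self.eps = momentum, eps
        self.frozen = False

    def freeze(self):
        self.frozen = True

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        if self.frozen:
            mean, var = self.running_mean, self.running_var
        else:
            mean, var = x.mean(axis=0), x.var(axis=0)  # biased batch variance
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta


rng = np.random.default_rng(0)
bn = FrozenStatsBatchNorm(3, momentum=1.0)  # momentum=1 keeps only the last batch
bn(rng.normal(0.0, 1.0, size=(256, 3)))      # "original institution" batch
bn.freeze()
y = bn(rng.normal(5.0, 2.0, size=(256, 3)))  # shifted "new institution" batch
print("running mean kept:", bn.running_mean)
print("new-data output mean (shift stays visible):", y.mean(axis=0))
```

In PyTorch, a comparable effect is commonly obtained by keeping `BatchNorm` modules in `eval()` mode during fine-tuning, which makes them use and preserve their running statistics.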
Submitted 16 November, 2020;
originally announced November 2020.
-
Towards Trainable Saliency Maps in Medical Imaging
Authors:
Mehak Aggarwal,
Nishanth Arun,
Sharut Gupta,
Ashwin Vaswani,
Bryan Chen,
Matthew Li,
Ken Chang,
Jay Patel,
Katherine Hoebel,
Mishka Gidwani,
Jayashree Kalpathy-Cramer,
Praveer Singh
Abstract:
While the success of deep learning (DL) in automated diagnosis can be transformative to medical practice, especially for people with little or no access to doctors, its widespread acceptability is severely limited by inherent black-box decision making and unsafe failure modes. While saliency methods attempt to tackle this problem in non-medical contexts, their a priori explanations do not transfer well to medical use cases. With this study, we validate a model design element agnostic to both architecture complexity and model task, and show how introducing this element gives an inherently self-explanatory model. We compare our results with state-of-the-art non-trainable saliency maps on the RSNA Pneumonia Dataset and demonstrate a much higher localization efficacy using our adopted technique. We also compare with a fully supervised baseline and provide a reasonable alternative to its high data-labelling overhead. We further investigate the validity of our claims through qualitative evaluation from an expert reader.
Submitted 15 November, 2020;
originally announced November 2020.
-
Assessing the (Un)Trustworthiness of Saliency Maps for Localizing Abnormalities in Medical Imaging
Authors:
Nishanth Arun,
Nathan Gaw,
Praveer Singh,
Ken Chang,
Mehak Aggarwal,
Bryan Chen,
Katharina Hoebel,
Sharut Gupta,
Jay Patel,
Mishka Gidwani,
Julius Adebayo,
Matthew D. Li,
Jayashree Kalpathy-Cramer
Abstract:
Saliency maps have become a widely used method to make deep learning models more interpretable by providing post-hoc explanations of classifiers through identification of the most pertinent areas of the input medical image. They are increasingly being used in medical imaging to provide clinically plausible explanations for the decisions the neural network makes. However, the utility and robustness of these visualization maps have not yet been rigorously examined in the context of medical imaging. We posit that trustworthiness in this context requires 1) localization utility, 2) sensitivity to model weight randomization, 3) repeatability, and 4) reproducibility. Using the localization information available in two large public radiology datasets, we quantify the performance of eight commonly used saliency map approaches for the above criteria using area under the precision-recall curves (AUPRC) and structural similarity index (SSIM), comparing their performance to various baseline measures. Using our framework to quantify the trustworthiness of saliency maps, we show that all eight saliency map techniques fail at least one of the criteria and are, in most cases, less trustworthy when compared to the baselines. We suggest that their usage in the high-risk domain of medical imaging warrants additional scrutiny and recommend that detection or segmentation models be used if localization is the desired output of the network. Additionally, to promote reproducibility of our findings, we provide the code we used for all tests performed in this work at this link: https://github.com/QTIM-Lab/Assessing-Saliency-Maps.
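The localization-utility criterion can be sketched by treating each pixel's saliency value as a score and a ground-truth bounding box as pixel labels, then computing AUPRC as step-wise average precision. The pixel-level setup and the helper names are assumptions for illustration; the authors' exact protocol is in their linked repository. For real use, scikit-learn's `sklearn.metrics.average_precision_score(y_true, y_score)` computes the same quantity.

```python
import numpy as np


def box_mask(shape, row0, col0, row1, col1):
    """Boolean mask that is True inside the half-open box [row0:row1, col0:col1]."""
    mask = np.zeros(shape, dtype=bool)
    mask[row0:row1, col0:col1] = True
    return mask


def average_precision(scores, labels):
    """AUPRC as average precision: AP = sum_n (R_n - R_{n-1}) * P_n.

    Pixels with tied scores are treated as a single threshold.
    """
    s = np.asarray(scores, dtype=float).ravel()
    y = np.asarray(labels, dtype=bool).ravel()
    n_pos = y.sum()
    if n_pos == 0:
        return 0.0
    order = np.argsort(-s, kind="mergesort")  # most salient pixels first
    s, y = s[order], y[order]
    ends = np.r_[np.nonzero(np.diff(s))[0], s.size - 1]  # last index of each tie group
    tp = np.cumsum(y)[ends]
    precision = tp / (ends + 1)
    recall = tp / n_pos
    prev_recall = np.r_[0.0, recall[:-1]]
    return float(np.sum((recall - prev_recall) * precision))


rng = np.random.default_rng(0)
gt_box = box_mask((64, 64), 20, 20, 40, 40)      # hypothetical lesion box
saliency = rng.random((64, 64)) + 0.5 * gt_box   # toy map, brighter inside the box
print("AUPRC:", round(average_precision(saliency, gt_box), 3))
print("chance level (box area fraction):", gt_box.mean())
```

Comparing against the box's area fraction, which is the AUPRC of an uninformative map, is one simple baseline in the spirit of the comparisons the abstract describes.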
Submitted 14 July, 2021; v1 submitted 6 August, 2020;
originally announced August 2020.
-
Assessing the validity of saliency maps for abnormality localization in medical imaging
Authors:
Nishanth Thumbavanam Arun,
Nathan Gaw,
Praveer Singh,
Ken Chang,
Katharina Viktoria Hoebel,
Jay Patel,
Mishka Gidwani,
Jayashree Kalpathy-Cramer
Abstract:
Saliency maps have become a widely used method to assess which areas of the input image are most pertinent to the prediction of a trained neural network. However, in the context of medical imaging, there is no study to our knowledge that has examined the efficacy of these techniques and quantified them using overlap with ground truth bounding boxes. In this work, we explored the credibility of the various existing saliency map methods on the RSNA Pneumonia dataset. We found that GradCAM was the most sensitive to model parameter and label randomization, and was highly agnostic to model architecture.
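The abstract does not state which overlap measure was used against the ground-truth bounding boxes. As an assumption, this sketch binarizes a saliency map by keeping its highest-valued fraction of pixels (the 5% default is hypothetical) and reports intersection-over-union (IoU) with the box mask; the function names are illustrative only.

```python
import numpy as np


def top_fraction_mask(saliency, fraction=0.05):
    """Binarize a saliency map by keeping its highest-valued `fraction` of pixels.

    All pixels tied at the cut-off value are kept, so slightly more may survive.
    """
    s = np.asarray(saliency, dtype=float)
    k = max(1, int(round(fraction * s.size)))
    cutoff = np.sort(s, axis=None)[-k]
    return s >= cutoff


def box_iou(saliency, box, fraction=0.05):
    """Intersection-over-union between the thresholded map and a box mask."""
    pred = top_fraction_mask(saliency, fraction)
    box = np.asarray(box, dtype=bool)
    union = np.logical_or(pred, box).sum()
    return float(np.logical_and(pred, box).sum() / union) if union else 0.0


box = np.zeros((4, 4), dtype=bool)
box[:2, :2] = True                             # hypothetical box: top-left quadrant
on_target = box.astype(float)                  # saliency concentrated inside the box
off_target = np.flipud(np.fliplr(on_target))   # same map moved to the bottom-right
print(box_iou(on_target, box, 0.25), box_iou(off_target, box, 0.25))  # 1.0 0.0
```

A threshold-free alternative is the AUPRC-style scoring used in the related work above; IoU is shown here because it is the most direct reading of "overlap".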
Submitted 29 May, 2020;
originally announced June 2020.
-
Electric field Induced Patterns in Soft Visco-elastic films: From Long Waves of Viscous Liquids to Short Waves of Elastic Solids
Authors:
N. Arun,
Ashutosh Sharma,
Partho S. G. Pattader,
Indrani Banerjee,
Hemant M. Dixit,
K. S. Narayan
Abstract:
We show that the electric-field-driven surface instability of visco-elastic films has two distinct regimes: (1) visco-elastic films behaving like a liquid display long wavelengths governed by the applied voltage and surface tension, independent of their elastic storage and viscous loss moduli, and (2) films behaving like a solid require a threshold voltage for the instability, whose wavelength always scales as ~4 × film thickness, independent of their surface tension, applied voltage, and loss and storage moduli. The wavelength in a narrow transition zone between these regimes depends on the storage modulus.
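Writing λ for the instability wavelength, h for the film thickness, γ for surface tension, V for the applied voltage, and G′ and G″ for the storage and loss moduli (notation introduced here, not taken from the paper), the solid-like scaling and a worked example with a hypothetical thickness read:

```latex
\begin{gathered}
\lambda \approx 4h \qquad \text{(solid-like regime; independent of } \gamma,\ V,\ G',\ G''\text{)}\\
h = 2.5\,\mu\mathrm{m} \;\Rightarrow\; \lambda \approx 4 \times 2.5\,\mu\mathrm{m} = 10\,\mu\mathrm{m}
\end{gathered}
```

In the liquid-like regime, by contrast, λ is set by V and γ rather than by h alone, which is why the abstract describes those patterns as long waves.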
Submitted 2 June, 2009;
originally announced June 2009.