Search | arXiv e-print repository

doi 10.1016/j.media.2023.102861

Uncertainty Aware Training to Improve Deep Learning Model Calibration for Classification of Cardiac MR Images

Authors: Tareen Dawood, Chen Chen, Baldeep S. Sidhua, Bram Ruijsink, Justin Goulda, Bradley Porter, Mark K. Elliott, Vishal Mehta, Christopher A. Rinaldi, Esther Puyol-Anton, Reza Razavi, Andrew P. King

Abstract: Quantifying uncertainty of predictions has been identified as one way to develop more trustworthy artificial intelligence (AI) models beyond conventional reporting of performance metrics. When considering their role in a clinical decision support setting, AI classification models should ideally avoid confident wrong predictions and maximise the confidence of correct predictions. Models that do thi… ▽ More Quantifying uncertainty of predictions has been identified as one way to develop more trustworthy artificial intelligence (AI) models beyond conventional reporting of performance metrics. When considering their role in a clinical decision support setting, AI classification models should ideally avoid confident wrong predictions and maximise the confidence of correct predictions. Models that do this are said to be well-calibrated with regard to confidence. However, relatively little attention has been paid to how to improve calibration when training these models, i.e., to make the training strategy uncertainty-aware. In this work we evaluate three novel uncertainty-aware training strategies comparing against two state-of-the-art approaches. We analyse performance on two different clinical applications: cardiac resynchronisation therapy (CRT) response prediction and coronary artery disease (CAD) diagnosis from cardiac magnetic resonance (CMR) images. The best-performing model in terms of both classification accuracy and the most common calibration measure, expected calibration error (ECE) was the Confidence Weight method, a novel approach that weights the loss of samples to explicitly penalise confident incorrect predictions. The method reduced the ECE by 17% for CRT response prediction and by 22% for CAD diagnosis when compared to a baseline classifier in which no uncertainty-aware strategy was included. In both applications, as well as reducing the ECE there was a slight increase in accuracy from 69% to 70% and 70% to 72% for CRT response prediction and CAD diagnosis respectively. However, our analysis showed a lack of consistency in terms of optimal models when using different calibration measures. This indicates the need for careful consideration of performance metrics when training and selecting models for complex high-risk applications in healthcare. △ Less

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.13861 [pdf, other]

Bias in Unsupervised Anomaly Detection in Brain MRI

Authors: Cosmin I. Bercea, Esther Puyol-Antón, Benedikt Wiestler, Daniel Rueckert, Julia A. Schnabel, Andrew P. King

Abstract: Unsupervised anomaly detection methods offer a promising and flexible alternative to supervised approaches, holding the potential to revolutionize medical scan analysis and enhance diagnostic performance. In the current landscape, it is commonly assumed that differences between a test case and the training distribution are attributed solely to pathological conditions, implying that any disparity… ▽ More Unsupervised anomaly detection methods offer a promising and flexible alternative to supervised approaches, holding the potential to revolutionize medical scan analysis and enhance diagnostic performance. In the current landscape, it is commonly assumed that differences between a test case and the training distribution are attributed solely to pathological conditions, implying that any disparity indicates an anomaly. However, the presence of other potential sources of distributional shift, including scanner, age, sex, or race, is frequently overlooked. These shifts can significantly impact the accuracy of the anomaly detection task. Prominent instances of such failures have sparked concerns regarding the bias, credibility, and fairness of anomaly detection. This work presents a novel analysis of biases in unsupervised anomaly detection. By examining potential non-pathological distributional shifts between the training and testing distributions, we shed light on the extent of these biases and their influence on anomaly detection results. Moreover, this study examines the algorithmic limitations that arise due to biases, providing valuable insights into the challenges encountered by anomaly detection algorithms in accurately learning and capturing the entire range of variability present in the normative distribution. Through this analysis, we aim to enhance the understanding of these biases and pave the way for future improvements in the field. Here, we specifically investigate Alzheimer's disease detection from brain MR imaging as a case study, revealing significant biases related to sex, race, and scanner variations that substantially impact the results. These findings align with the broader goal of improving the reliability, fairness, and effectiveness of anomaly detection in medical imaging. △ Less

Submitted 26 August, 2023; originally announced August 2023.

arXiv:2308.13415 [pdf, other]

An investigation into the impact of deep learning model choice on sex and race bias in cardiac MR segmentation

Authors: Tiarna Lee, Esther Puyol-Antón, Bram Ruijsink, Keana Aitcheson, Miao**g Shi, Andrew P. King

Abstract: In medical imaging, artificial intelligence (AI) is increasingly being used to automate routine tasks. However, these algorithms can exhibit and exacerbate biases which lead to disparate performances between protected groups. We investigate the impact of model choice on how imbalances in subject sex and race in training datasets affect AI-based cine cardiac magnetic resonance image segmentation. W… ▽ More In medical imaging, artificial intelligence (AI) is increasingly being used to automate routine tasks. However, these algorithms can exhibit and exacerbate biases which lead to disparate performances between protected groups. We investigate the impact of model choice on how imbalances in subject sex and race in training datasets affect AI-based cine cardiac magnetic resonance image segmentation. We evaluate three convolutional neural network-based models and one vision transformer model. We find significant sex bias in three of the four models and racial bias in all of the models. However, the severity and nature of the bias varies between the models, highlighting the importance of model choice when attempting to train fair AI-based segmentation models for medical imaging tasks. △ Less

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.08038 [pdf, other]

Deep Learning Framework for Spleen Volume Estimation from 2D Cross-sectional Views

Authors: Zhen Yuan, Esther Puyol-Anton, Haran Jogeesvaran, Baba Inusa, Andrew P. King

Abstract: Abnormal spleen enlargement (splenomegaly) is regarded as a clinical indicator for a range of conditions, including liver disease, cancer and blood diseases. While spleen length measured from ultrasound images is a commonly used surrogate for spleen size, spleen volume remains the gold standard metric for assessing splenomegaly and the severity of related clinical conditions. Computed tomography i… ▽ More Abnormal spleen enlargement (splenomegaly) is regarded as a clinical indicator for a range of conditions, including liver disease, cancer and blood diseases. While spleen length measured from ultrasound images is a commonly used surrogate for spleen size, spleen volume remains the gold standard metric for assessing splenomegaly and the severity of related clinical conditions. Computed tomography is the main imaging modality for measuring spleen volume, but it is less accessible in areas where there is a high prevalence of splenomegaly (e.g., the Global South). Our objective was to enable automated spleen volume measurement from 2D cross-sectional segmentations, which can be obtained from ultrasound imaging. In this study, we describe a variational autoencoder-based framework to measure spleen volume from single- or dual-view 2D spleen segmentations. We propose and evaluate three volume estimation methods within this framework. We also demonstrate how 95% confidence intervals of volume estimates can be produced to make our method more clinically useful. Our best model achieved mean relative volume accuracies of 86.62% and 92.58% for single- and dual-view segmentations, respectively, surpassing the performance of the clinical standard approach of linear regression using manual measurements and a comparative deep learning-based 2D-3D reconstruction-based approach. The proposed spleen volume estimation framework can be integrated into standard clinical workflows which currently use 2D ultrasound images to measure spleen length. To the best of our knowledge, this is the first work to achieve direct 3D spleen volume estimation from 2D spleen segmentations. △ Less

Submitted 17 August, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

Comments: 22 pages, 7 figures

arXiv:2301.13296 [pdf, other]

Addressing Deep Learning Model Calibration Using Evidential Neural Networks and Uncertainty-Aware Training

Authors: Tareen Dawood, Emily Chan, Reza Razavi, Andrew P. King, Esther Puyol-Anton

Abstract: In terms of accuracy, deep learning (DL) models have had considerable success in classification problems for medical imaging applications. However, it is well-known that the outputs of such models, which typically utilise the SoftMax function in the final classification layer can be over-confident, i.e. they are poorly calibrated. Two competing solutions to this problem have been proposed: uncerta… ▽ More In terms of accuracy, deep learning (DL) models have had considerable success in classification problems for medical imaging applications. However, it is well-known that the outputs of such models, which typically utilise the SoftMax function in the final classification layer can be over-confident, i.e. they are poorly calibrated. Two competing solutions to this problem have been proposed: uncertainty-aware training and evidential neural networks (ENNs). In this paper, we perform an investigation into the improvements to model calibration that can be achieved by each of these approaches individually, and their combination. We perform experiments on two classification tasks: a simpler MNIST digit classification task and a more complex and realistic medical imaging artefact detection task using Phase Contrast Cardiac Magnetic Resonance images. The experimental results demonstrate that model calibration can suffer when the task becomes challenging enough to require a higher-capacity model. However, in our complex artefact detection task, we saw an improvement in calibration for both a low and higher-capacity model when implementing both the ENN and uncertainty-aware training together, indicating that this approach can offer a promising way to improve calibration in such settings. The findings highlight the potential use of these approaches to improve model calibration in a complex application, which would in turn improve clinician trust in DL models. △ Less

Submitted 27 February, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

arXiv:2209.14212 [pdf, other]

Automated Quality Controlled Analysis of 2D Phase Contrast Cardiovascular Magnetic Resonance Imaging

Authors: Emily Chan, Ciaran O'Hanlon, Carlota Asegurado Marquez, Marwenie Petalcorin, Jorge Mariscal-Harana, Haotian Gu, Raymond J. Kim, Robert M. Judd, Phil Chowienczyk, Julia A. Schnabel, Reza Razavi, Andrew P. King, Bram Ruijsink, Esther Puyol-Antón

Abstract: Flow analysis carried out using phase contrast cardiac magnetic resonance imaging (PC-CMR) enables the quantification of important parameters that are used in the assessment of cardiovascular function. An essential part of this analysis is the identification of the correct CMR views and quality control (QC) to detect artefacts that could affect the flow quantification. We propose a novel deep lear… ▽ More Flow analysis carried out using phase contrast cardiac magnetic resonance imaging (PC-CMR) enables the quantification of important parameters that are used in the assessment of cardiovascular function. An essential part of this analysis is the identification of the correct CMR views and quality control (QC) to detect artefacts that could affect the flow quantification. We propose a novel deep learning based framework for the fully-automated analysis of flow from full CMR scans that first carries out these view selection and QC steps using two sequential convolutional neural networks, followed by automatic aorta and pulmonary artery segmentation to enable the quantification of key flow parameters. Accuracy values of 0.958 and 0.914 were obtained for view classification and QC, respectively. For segmentation, Dice scores were $>$0.969 and the Bland-Altman plots indicated excellent agreement between manual and automatic peak flow values. In addition, we tested our pipeline on an external validation data set, with results indicating good robustness of the pipeline. This work was carried out using multivendor clinical data consisting of 986 cases, indicating the potential for the use of this pipeline in a clinical setting. △ Less

Submitted 28 September, 2022; originally announced September 2022.

Comments: STACOM 2022 workshop

arXiv:2209.01627 [pdf, other]

A systematic study of race and sex bias in CNN-based cardiac MR segmentation

Authors: Tiarna Lee, Esther Puyol-Anton, Bram Ruijsink, Miao**g Shi, Andrew P. King

Abstract: In computer vision there has been significant research interest in assessing potential demographic bias in deep learning models. One of the main causes of such bias is imbalance in the training data. In medical imaging, where the potential impact of bias is arguably much greater, there has been less interest. In medical imaging pipelines, segmentation of structures of interest plays an important r… ▽ More In computer vision there has been significant research interest in assessing potential demographic bias in deep learning models. One of the main causes of such bias is imbalance in the training data. In medical imaging, where the potential impact of bias is arguably much greater, there has been less interest. In medical imaging pipelines, segmentation of structures of interest plays an important role in estimating clinical biomarkers that are subsequently used to inform patient management. Convolutional neural networks (CNNs) are starting to be used to automate this process. We present the first systematic study of the impact of training set imbalance on race and sex bias in CNN-based segmentation. We focus on segmentation of the structures of the heart from short axis cine cardiac magnetic resonance images, and train multiple CNN segmentation models with different levels of race/sex imbalance. We find no significant bias in the sex experiment but significant bias in two separate race experiments, highlighting the need to consider adequate representation of different demographic groups in health datasets. △ Less

Submitted 4 September, 2022; originally announced September 2022.

arXiv:2208.03305 [pdf, other]

Deep Learning-based Segmentation of Pleural Effusion From Ultrasound Using Coordinate Convolutions

Authors: Germain Morilhat, Naomi Kifle, Sandra FinesilverSmith, Bram Ruijsink, Vittoria Vergani, Habtamu Tegegne Desita, Zerubabel Tegegne Desita, Esther Puyol-Anton, Aaron Carass, Andrew P. King

Abstract: In many low-to-middle income (LMIC) countries, ultrasound is used for assessment of pleural effusion. Typically, the extent of the effusion is manually measured by a sonographer, leading to significant intra-/inter-observer variability. In this work, we investigate the use of deep learning (DL) to automate the process of pleural effusion segmentation from ultrasound images. On two datasets acquire… ▽ More In many low-to-middle income (LMIC) countries, ultrasound is used for assessment of pleural effusion. Typically, the extent of the effusion is manually measured by a sonographer, leading to significant intra-/inter-observer variability. In this work, we investigate the use of deep learning (DL) to automate the process of pleural effusion segmentation from ultrasound images. On two datasets acquired in a LMIC setting, we achieve median Dice Similarity Coefficients (DSCs) of 0.82 and 0.74 respectively using the nnU-net DL model. We also investigate the use of coordinate convolutions in the DL model and find that this results in a statistically significant improvement in the median DSC on the first dataset to 0.85, with no significant change on the second dataset. This work showcases, for the first time, the potential of DL in automating the process of effusion assessment from ultrasound in LMIC settings where there is often a lack of experienced radiologists to perform such tasks. △ Less

Submitted 5 August, 2022; originally announced August 2022.

Comments: This paper has been accepted for publication at the MICCAI FAIR workshop

arXiv:2206.08137 [pdf]

An AI tool for automated analysis of large-scale unstructured clinical cine CMR databases

Authors: Jorge Mariscal-Harana, Clint Asher, Vittoria Vergani, Maleeha Rizvi, Louise Keehn, Raymond J. Kim, Robert M. Judd, Steffen E. Petersen, Reza Razavi, Andrew King, Bram Ruijsink, Esther Puyol-Antón

Abstract: Artificial intelligence (AI) techniques have been proposed for automating analysis of short axis (SAX) cine cardiac magnetic resonance (CMR), but no CMR analysis tool exists to automatically analyse large (unstructured) clinical CMR datasets. We develop and validate a robust AI tool for start-to-end automatic quantification of cardiac function from SAX cine CMR in large clinical databases. Our pip… ▽ More Artificial intelligence (AI) techniques have been proposed for automating analysis of short axis (SAX) cine cardiac magnetic resonance (CMR), but no CMR analysis tool exists to automatically analyse large (unstructured) clinical CMR datasets. We develop and validate a robust AI tool for start-to-end automatic quantification of cardiac function from SAX cine CMR in large clinical databases. Our pipeline for processing and analysing CMR databases includes automated steps to identify the correct data, robust image pre-processing, an AI algorithm for biventricular segmentation of SAX CMR and estimation of functional biomarkers, and automated post-analysis quality control to detect and correct errors. The segmentation algorithm was trained on 2793 CMR scans from two NHS hospitals and validated on additional cases from this dataset (n=414) and five external datasets (n=6888), including scans of patients with a range of diseases acquired at 12 different centres using CMR scanners from all major vendors. Median absolute errors in cardiac biomarkers were within the range of inter-observer variability: <8.4mL (left ventricle volume), <9.2mL (right ventricle volume), <13.3g (left ventricular mass), and <5.9% (ejection fraction) across all datasets. Stratification of cases according to phenotypes of cardiac disease and scanner vendors showed good performance across all groups. We show that our proposed tool, which combines image pre-processing steps, a domain-generalisable AI algorithm trained on a large-scale multi-domain CMR dataset and quality control steps, allows robust analysis of (clinical or research) databases from multiple centres, vendors, and cardiac diseases. This enables translation of our tool for use in fully-automated processing of large multi-centre databases. △ Less

Submitted 5 July, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: Accepted at EHJ Digital Health; Bram Ruijsink and Esther Puyol-Antón are shared last authors

arXiv:2205.01673 [pdf, other]

A Deep Learning-based Integrated Framework for Quality-aware Undersampled Cine Cardiac MRI Reconstruction and Analysis

Authors: Inês P. Machado, Esther Puyol-Antón, Kerstin Hammernik, Gastão Cruz, Devran Ugurlu, Ihsane Olakorede, Ilkay Oksuz, Bram Ruijsink, Miguel Castelo-Branco, Alistair A. Young, Claudia Prieto, Julia A. Schnabel, Andrew P. King

Abstract: Cine cardiac magnetic resonance (CMR) imaging is considered the gold standard for cardiac function evaluation. However, cine CMR acquisition is inherently slow and in recent decades considerable effort has been put into accelerating scan times without compromising image quality or the accuracy of derived results. In this paper, we present a fully-automated, quality-controlled integrated framework… ▽ More Cine cardiac magnetic resonance (CMR) imaging is considered the gold standard for cardiac function evaluation. However, cine CMR acquisition is inherently slow and in recent decades considerable effort has been put into accelerating scan times without compromising image quality or the accuracy of derived results. In this paper, we present a fully-automated, quality-controlled integrated framework for reconstruction, segmentation and downstream analysis of undersampled cine CMR data. The framework enables active acquisition of radial k-space data, in which acquisition can be stopped as soon as acquired data are sufficient to produce high quality reconstructions and segmentations. This results in reduced scan times and automated analysis, enabling robust and accurate estimation of functional biomarkers. To demonstrate the feasibility of the proposed approach, we perform realistic simulations of radial k-space acquisitions on a dataset of subjects from the UK Biobank and present results on in-vivo cine CMR k-space data collected from healthy subjects. The results demonstrate that our method can produce quality-controlled images in a mean scan time reduced from 12 to 4 seconds per slice, and that image quality is sufficient to allow clinically relevant parameters to be automatically estimated to within 5% mean absolute difference. △ Less

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2203.11726 [pdf, other]

AI-enabled Assessment of Cardiac Systolic and Diastolic Function from Echocardiography

Authors: Esther Puyol-Antón, Bram Ruijsink, Baldeep S. Sidhu, Justin Gould, Bradley Porter, Mark K. Elliott, Vishal Mehta, Haotian Gu, Miguel Xochicale, Alberto Gomez, Christopher A. Rinaldi, Martin Cowie, Phil Chowienczyk, Reza Razavi, Andrew P. King

Abstract: Left ventricular (LV) function is an important factor in terms of patient management, outcome, and long-term survival of patients with heart disease. The most recently published clinical guidelines for heart failure recognise that over reliance on only one measure of cardiac function (LV ejection fraction) as a diagnostic and treatment stratification biomarker is suboptimal. Recent advances in AI-… ▽ More Left ventricular (LV) function is an important factor in terms of patient management, outcome, and long-term survival of patients with heart disease. The most recently published clinical guidelines for heart failure recognise that over reliance on only one measure of cardiac function (LV ejection fraction) as a diagnostic and treatment stratification biomarker is suboptimal. Recent advances in AI-based echocardiography analysis have shown excellent results on automated estimation of LV volumes and LV ejection fraction. However, from time-varying 2-D echocardiography acquisition, a richer description of cardiac function can be obtained by estimating functional biomarkers from the complete cardiac cycle. In this work we propose for the first time an AI approach for deriving advanced biomarkers of systolic and diastolic LV function from 2-D echocardiography based on segmentations of the full cardiac cycle. These biomarkers will allow clinicians to obtain a much richer picture of the heart in health and disease. The AI model is based on the 'nn-Unet' framework and was trained and tested using four different databases. Results show excellent agreement between manual and automated analysis and showcase the potential of the advanced systolic and diastolic biomarkers for patient stratification. Finally, for a subset of 50 cases, we perform a correlation analysis between clinical biomarkers derived from echocardiography and CMR and we show excellent agreement between the two modalities. △ Less

Submitted 21 July, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

Journal ref: MICCAI ASMUS 2020

arXiv:2109.13230 [pdf, ps, other]

The Impact of Domain Shift on Left and Right Ventricle Segmentation in Short Axis Cardiac MR Images

Authors: Devran Ugurlu, Esther Puyol-Anton, Bram Ruijsink, Alistair Young, Ines Machado, Kerstin Hammernik, Andrew P. King, Julia A. Schnabel

Abstract: Domain shift refers to the difference in the data distribution of two datasets, normally between the training set and the test set for machine learning algorithms. Domain shift is a serious problem for generalization of machine learning models and it is well-established that a domain shift between the training and test sets may cause a drastic drop in the model's performance. In medical imaging, t… ▽ More Domain shift refers to the difference in the data distribution of two datasets, normally between the training set and the test set for machine learning algorithms. Domain shift is a serious problem for generalization of machine learning models and it is well-established that a domain shift between the training and test sets may cause a drastic drop in the model's performance. In medical imaging, there can be many sources of domain shift such as different scanners or scan protocols, different pathologies in the patient population, anatomical differences in the patient population (e.g. men vs women) etc. Therefore, in order to train models that have good generalization performance, it is important to be aware of the domain shift problem, its potential causes and to devise ways to address it. In this paper, we study the effect of domain shift on left and right ventricle blood pool segmentation in short axis cardiac MR images. Our dataset contains short axis images from 4 different MR scanners and 3 different pathology groups. The training is performed with nnUNet. The results show that scanner differences cause a greater drop in performance compared to changing the pathology group, and that the impact of domain shift is greater on right ventricle segmentation compared to left ventricle segmentation. Increasing the number of training subjects increased cross-scanner performance more than in-scanner performance at small training set sizes, but this difference in improvement decreased with larger training set sizes. Training models using data from multiple scanners improved cross-domain performance. △ Less

Submitted 22 September, 2021; originally announced September 2021.

Comments: Accepted to STACOM 2021

arXiv:2109.10641 [pdf, other]

Uncertainty-Aware Training for Cardiac Resynchronisation Therapy Response Prediction

Authors: Tareen Dawood, Chen Chen, Robin Andlauer, Baldeep S. Sidhu, Bram Ruijsink, Justin Gould, Bradley Porter, Mark Elliott, Vishal Mehta, C. Aldo Rinaldi, Esther Puyol-Antón, Reza Razavi, Andrew P. King

Abstract: Evaluation of predictive deep learning (DL) models beyond conventional performance metrics has become increasingly important for applications in sensitive environments like healthcare. Such models might have the capability to encode and analyse large sets of data but they often lack comprehensive interpretability methods, preventing clinical trust in predictive outcomes. Quantifying uncertainty of… ▽ More Evaluation of predictive deep learning (DL) models beyond conventional performance metrics has become increasingly important for applications in sensitive environments like healthcare. Such models might have the capability to encode and analyse large sets of data but they often lack comprehensive interpretability methods, preventing clinical trust in predictive outcomes. Quantifying uncertainty of a prediction is one way to provide such interpretability and promote trust. However, relatively little attention has been paid to how to include such requirements into the training of the model. In this paper we: (i) quantify the data (aleatoric) and model (epistemic) uncertainty of a DL model for Cardiac Resynchronisation Therapy response prediction from cardiac magnetic resonance images, and (ii) propose and perform a preliminary investigation of an uncertainty-aware loss function that can be used to retrain an existing DL image-based classification model to encourage confidence in correct predictions and reduce confidence in incorrect predictions. Our initial results are promising, showing a significant increase in the (epistemic) confidence of true positive predictions, with some evidence of a reduction in false negative confidence. △ Less

Submitted 22 September, 2021; originally announced September 2021.

Comments: STACOM 2021 Workshop

arXiv:2109.09421 [pdf, other]

Improved AI-based segmentation of apical and basal slices from clinical cine CMR

Authors: Jorge Mariscal-Harana, Naomi Kifle, Reza Razavi, Andrew P. King, Bram Ruijsink, Esther Puyol-Antón

Abstract: Current artificial intelligence (AI) algorithms for short-axis cardiac magnetic resonance (CMR) segmentation achieve human performance for slices situated in the middle of the heart. However, an often-overlooked fact is that segmentation of the basal and apical slices is more difficult. During manual analysis, differences in the basal segmentations have been reported as one of the major sources of… ▽ More Current artificial intelligence (AI) algorithms for short-axis cardiac magnetic resonance (CMR) segmentation achieve human performance for slices situated in the middle of the heart. However, an often-overlooked fact is that segmentation of the basal and apical slices is more difficult. During manual analysis, differences in the basal segmentations have been reported as one of the major sources of disagreement in human interobserver variability. In this work, we aim to investigate the performance of AI algorithms in segmenting basal and apical slices and design strategies to improve their segmentation. We trained all our models on a large dataset of clinical CMR studies obtained from two NHS hospitals (n=4,228) and evaluated them against two external datasets: ACDC (n=100) and M&Ms (n=321). Using manual segmentations as a reference, CMR slices were assigned to one of four regions: non-cardiac, base, middle, and apex. Using the nnU-Net framework as a baseline, we investigated two different approaches to reduce the segmentation performance gap between cardiac regions: (1) non-uniform batch sampling, which allows us to choose how often images from different regions are seen during training; and (2) a cardiac-region classification model followed by three (i.e. base, middle, and apex) region-specific segmentation models. We show that the classification and segmentation approach was best at reducing the performance gap across all datasets. We also show that improvements in the classification performance can subsequently lead to a significantly better performance in the segmentation task. △ Less

Submitted 20 September, 2021; originally announced September 2021.

Comments: *Shared last authors

arXiv:2109.07955 [pdf, other]

Quality-aware Cine Cardiac MRI Reconstruction and Analysis from Undersampled k-space Data

Authors: Ines Machado, Esther Puyol-Anton, Kerstin Hammernik, Gastao Cruz, Devran Ugurlu, Bram Ruijsink, Miguel Castelo-Branco, Alistair Young, Claudia Prieto, Julia A. Schnabel, Andrew P. King

Abstract: Cine cardiac MRI is routinely acquired for the assessment of cardiac health, but the imaging process is slow and typically requires several breath-holds to acquire sufficient k-space profiles to ensure good image quality. Several undersampling-based reconstruction techniques have been proposed during the last decades to speed up cine cardiac MRI acquisition. However, the undersampling factor is co… ▽ More Cine cardiac MRI is routinely acquired for the assessment of cardiac health, but the imaging process is slow and typically requires several breath-holds to acquire sufficient k-space profiles to ensure good image quality. Several undersampling-based reconstruction techniques have been proposed during the last decades to speed up cine cardiac MRI acquisition. However, the undersampling factor is commonly fixed to conservative values before acquisition to ensure diagnostic image quality, potentially leading to unnecessarily long scan times. In this paper, we propose an end-to-end quality-aware cine short-axis cardiac MRI framework that combines image acquisition and reconstruction with downstream tasks such as segmentation, volume curve analysis and estimation of cardiac functional parameters. The goal is to reduce scan time by acquiring only a fraction of k-space data to enable the reconstruction of images that can pass quality control checks and produce reliable estimates of cardiac functional parameters. The framework consists of a deep learning model for the reconstruction of 2D+t cardiac cine MRI images from undersampled data, an image quality-control step to detect good quality reconstructions, followed by a deep learning model for bi-ventricular segmentation, a quality-control step to detect good quality segmentations and automated calculation of cardiac functional parameters. To demonstrate the feasibility of the proposed approach, we perform simulations using a cohort of selected participants from the UK Biobank (n=270), 200 healthy subjects and 70 patients with cardiomyopathies. Our results show that we can produce quality-controlled images in a scan time reduced from 12 to 4 seconds per slice, enabling reliable estimates of cardiac functional parameters such as ejection fraction within 5% mean absolute error. △ Less

Submitted 16 September, 2021; originally announced September 2021.

arXiv:2107.10662 [pdf, other]

A Multimodal Deep Learning Model for Cardiac Resynchronisation Therapy Response Prediction

Authors: Esther Puyol-Antón, Baldeep S. Sidhu, Justin Gould, Bradley Porter, Mark K. Elliott, Vishal Mehta, Christopher A. Rinaldi, Andrew P. King

Abstract: We present a novel multimodal deep learning framework for cardiac resynchronisation therapy (CRT) response prediction from 2D echocardiography and cardiac magnetic resonance (CMR) data. The proposed method first uses the `nnU-Net' segmentation model to extract segmentations of the heart over the full cardiac cycle from the two modalities. Next, a multimodal deep learning classifier is used for CRT… ▽ More We present a novel multimodal deep learning framework for cardiac resynchronisation therapy (CRT) response prediction from 2D echocardiography and cardiac magnetic resonance (CMR) data. The proposed method first uses the `nnU-Net' segmentation model to extract segmentations of the heart over the full cardiac cycle from the two modalities. Next, a multimodal deep learning classifier is used for CRT response prediction, which combines the latent spaces of the segmentation models of the two modalities. At inference time, this framework can be used with 2D echocardiography data only, whilst taking advantage of the implicit relationship between CMR and echocardiography features learnt from the model. We evaluate our pipeline on a cohort of 50 CRT patients for whom paired echocardiography/CMR data were available, and results show that the proposed multimodal classifier results in a statistically significant improvement in accuracy compared to the baseline approach that uses only 2D echocardiography data. The combination of multimodal data enables CRT response to be predicted with 77.38% accuracy (83.33% sensitivity and 71.43% specificity), which is comparable with the current state-of-the-art in machine learning-based CRT response prediction. Our work represents the first multimodal deep learning approach for CRT response prediction. △ Less

Submitted 20 July, 2021; originally announced July 2021.

arXiv:2010.02041 [pdf, other]

Probabilistic 3D surface reconstruction from sparse MRI information

Authors: Katarína Tóthová, Sarah Parisot, Matthew Lee, Esther Puyol-Antón, Andrew King, Marc Pollefeys, Ender Konukoglu

Abstract: Surface reconstruction from magnetic resonance (MR) imaging data is indispensable in medical image analysis and clinical research. A reliable and effective reconstruction tool should: be fast in prediction of accurate well localised and high resolution models, evaluate prediction uncertainty, work with as little input data as possible. Current deep learning state of the art (SOTA) 3D reconstructio… ▽ More Surface reconstruction from magnetic resonance (MR) imaging data is indispensable in medical image analysis and clinical research. A reliable and effective reconstruction tool should: be fast in prediction of accurate well localised and high resolution models, evaluate prediction uncertainty, work with as little input data as possible. Current deep learning state of the art (SOTA) 3D reconstruction methods, however, often only produce shapes of limited variability positioned in a canonical position or lack uncertainty evaluation. In this paper, we present a novel probabilistic deep learning approach for concurrent 3D surface reconstruction from sparse 2D MR image data and aleatoric uncertainty prediction. Our method is capable of reconstructing large surface meshes from three quasi-orthogonal MR imaging slices from limited training sets whilst modelling the location of each mesh vertex through a Gaussian distribution. Prior shape information is encoded using a built-in linear principal component analysis (PCA) model. Extensive experiments on cardiac MR data show that our probabilistic approach successfully assesses prediction uncertainty while at the same time qualitatively and quantitatively outperforms SOTA methods in shape prediction. Compared to SOTA, we are capable of properly localising and orientating the prediction via the use of a spatially aware neural network. △ Less

Submitted 5 October, 2020; originally announced October 2020.

Comments: MICCAI 2020

arXiv:2009.02704 [pdf, other]

Deep Learning for Automatic Spleen Length Measurement in Sickle Cell Disease Patients

Authors: Zhen Yuan, Esther Puyol-Anton, Haran Jogeesvaran, Catriona Reid, Baba Inusa, Andrew P. King

Abstract: Sickle Cell Disease (SCD) is one of the most common genetic diseases in the world. Splenomegaly (abnormal enlargement of the spleen) is frequent among children with SCD. If left untreated, splenomegaly can be life-threatening. The current workflow to measure spleen size includes palpation, possibly followed by manual length measurement in 2D ultrasound imaging. However, this manual measurement is… ▽ More Sickle Cell Disease (SCD) is one of the most common genetic diseases in the world. Splenomegaly (abnormal enlargement of the spleen) is frequent among children with SCD. If left untreated, splenomegaly can be life-threatening. The current workflow to measure spleen size includes palpation, possibly followed by manual length measurement in 2D ultrasound imaging. However, this manual measurement is dependent on operator expertise and is subject to intra- and inter-observer variability. We investigate the use of deep learning to perform automatic estimation of spleen length from ultrasound images. We investigate two types of approach, one segmentation-based and one based on direct length estimation, and compare the results against measurements made by human experts. Our best model (segmentation-based) achieved a percentage length error of 7.42%, which is approaching the level of inter-observer variability (5.47%-6.34%). To the best of our knowledge, this is the first attempt to measure spleen size in a fully automated way from ultrasound images. △ Less

Submitted 6 September, 2020; originally announced September 2020.

Comments: 9 pages, 2 figures

arXiv:2009.00584 [pdf, other]

Quality-aware semi-supervised learning for CMR segmentation

Authors: Bram Ruijsink, Esther Puyol-Anton, Ye Li, Wenja Bai, Eric Kerfoot, Reza Razavi, Andrew P. King

Abstract: One of the challenges in develo** deep learning algorithms for medical image segmentation is the scarcity of annotated training data. To overcome this limitation, data augmentation and semi-supervised learning (SSL) methods have been developed. However, these methods have limited effectiveness as they either exploit the existing data set only (data augmentation) or risk negative impact by adding… ▽ More One of the challenges in develo** deep learning algorithms for medical image segmentation is the scarcity of annotated training data. To overcome this limitation, data augmentation and semi-supervised learning (SSL) methods have been developed. However, these methods have limited effectiveness as they either exploit the existing data set only (data augmentation) or risk negative impact by adding poor training examples (SSL). Segmentations are rarely the final product of medical image analysis - they are typically used in downstream tasks to infer higher-order patterns to evaluate diseases. Clinicians take into account a wealth of prior knowledge on biophysics and physiology when evaluating image analysis results. We have used these clinical assessments in previous works to create robust quality-control (QC) classifiers for automated cardiac magnetic resonance (CMR) analysis. In this paper, we propose a novel scheme that uses QC of the downstream task to identify high quality outputs of CMR segmentation networks, that are subsequently utilised for further network training. In essence, this provides quality-aware augmentation of training data in a variant of SSL for segmentation networks (semiQCSeg). We evaluate our approach in two CMR segmentation tasks (aortic and short axis cardiac volume segmentation) using UK Biobank data and two commonly used network architectures (U-net and a Fully Convolutional Network) and compare against supervised and SSL strategies. We show that semiQCSeg improves training of the segmentation networks. It decreases the need for labelled data, while outperforming the other methods in terms of Dice and clinical metrics. SemiQCSeg can be an efficient approach for training segmentation networks for medical image data when labelled datasets are scarce. △ Less

Submitted 1 September, 2020; originally announced September 2020.

Comments: MICCAI STACOM 2020

arXiv:2006.13811 [pdf, other]

doi 10.1007/978-3-030-59710-8_28

Interpretable Deep Models for Cardiac Resynchronisation Therapy Response Prediction

Authors: Esther Puyol-Antón, Chen Chen, James R. Clough, Bram Ruijsink, Baldeep S. Sidhu, Justin Gould, Bradley Porter, Mark Elliott, Vishal Mehta, Daniel Rueckert, Christopher A. Rinaldi, Andrew P. King

Abstract: Advances in deep learning (DL) have resulted in impressive accuracy in some medical image classification tasks, but often deep models lack interpretability. The ability of these models to explain their decisions is important for fostering clinical trust and facilitating clinical translation. Furthermore, for many problems in medicine there is a wealth of existing clinical knowledge to draw upon, w… ▽ More Advances in deep learning (DL) have resulted in impressive accuracy in some medical image classification tasks, but often deep models lack interpretability. The ability of these models to explain their decisions is important for fostering clinical trust and facilitating clinical translation. Furthermore, for many problems in medicine there is a wealth of existing clinical knowledge to draw upon, which may be useful in generating explanations, but it is not obvious how this knowledge can be encoded into DL models - most models are learnt either from scratch or using transfer learning from a different domain. In this paper we address both of these issues. We propose a novel DL framework for image-based classification based on a variational autoencoder (VAE). The framework allows prediction of the output of interest from the latent space of the autoencoder, as well as visualisation (in the image domain) of the effects of crossing the decision boundary, thus enhancing the interpretability of the classifier. Our key contribution is that the VAE disentangles the latent space based on `explanations' drawn from existing clinical knowledge. The framework can predict outputs as well as explanations for these outputs, and also raises the possibility of discovering new biomarkers that are separate (or disentangled) from the existing knowledge. We demonstrate our framework on the problem of predicting response of patients with cardiomyopathy to cardiac resynchronization therapy (CRT) from cine cardiac magnetic resonance images. The sensitivity and specificity of the proposed model on the task of CRT response prediction are 88.43% and 84.39% respectively, and we showcase the potential of our model in enhancing understanding of the factors contributing to CRT response. △ Less

Submitted 9 July, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

Comments: MICCAI 2020 conference

arXiv:1908.04538 [pdf, other]

doi 10.1007/978-3-030-39074-7_3

Assessing the Impact of Blood Pressure on Cardiac Function Using Interpretable Biomarkers and Variational Autoencoders

Authors: Esther Puyol-Antón, Bram Ruijsink, James R. Clough, Ilkay Oksuz, Daniel Rueckert, Reza Razavi, Andrew P. King

Abstract: Maintaining good cardiac function for as long as possible is a major concern for healthcare systems worldwide and there is much interest in learning more about the impact of different risk factors on cardiac health. The aim of this study is to analyze the impact of systolic blood pressure (SBP) on cardiac function while preserving the interpretability of the model using known clinical biomarkers i… ▽ More Maintaining good cardiac function for as long as possible is a major concern for healthcare systems worldwide and there is much interest in learning more about the impact of different risk factors on cardiac health. The aim of this study is to analyze the impact of systolic blood pressure (SBP) on cardiac function while preserving the interpretability of the model using known clinical biomarkers in a large cohort of the UK Biobank population. We propose a novel framework that combines deep learning based estimation of interpretable clinical biomarkers from cardiac cine MR data with a variational autoencoder (VAE). The VAE architecture integrates a regression loss in the latent space, which enables the progression of cardiac health with SBP to be learnt. Results on 3,600 subjects from the UK Biobank show that the proposed model allows us to gain important insight into the deterioration of cardiac function with increasing SBP, identify key interpretable factors involved in this process, and lastly exploit the model to understand patterns of positive and adverse adaptation of cardiac function. △ Less

Submitted 13 August, 2019; originally announced August 2019.

arXiv:1906.06188 [pdf, ps, other]

Global and Local Interpretability for Cardiac MRI Classification

Authors: James R. Clough, Ilkay Oksuz, Esther Puyol-Anton, Bram Ruijsink, Andrew P. King, Julia A. Schnabel

Abstract: Deep learning methods for classifying medical images have demonstrated impressive accuracy in a wide range of tasks but often these models are hard to interpret, limiting their applicability in clinical practice. In this work we introduce a convolutional neural network model for identifying disease in temporal sequences of cardiac MR segmentations which is interpretable in terms of clinically fami… ▽ More Deep learning methods for classifying medical images have demonstrated impressive accuracy in a wide range of tasks but often these models are hard to interpret, limiting their applicability in clinical practice. In this work we introduce a convolutional neural network model for identifying disease in temporal sequences of cardiac MR segmentations which is interpretable in terms of clinically familiar measurements. The model is based around a variational autoencoder, reducing the input into a low-dimensional latent space in which classification occurs. We then use the recently developed `concept activation vector' technique to associate concepts which are diagnostically meaningful (eg. clinical biomarkers such as `low left-ventricular ejection fraction') to certain vectors in the latent space. These concepts are then qualitatively inspected by observing the change in the image domain resulting from interpolations in the latent space in the direction of these vectors. As a result, when the model classifies images it is also capable of providing naturally interpretable concepts relevant to that classification and demonstrating the meaning of those concepts in the image domain. Our approach is demonstrated on the UK Biobank cardiac MRI dataset where we detect the presence of coronary artery disease. △ Less

Submitted 12 August, 2019; v1 submitted 14 June, 2019; originally announced June 2019.

Comments: Accepted at MICCAI 2019, 9 pages, 3 figures

arXiv:1906.05695 [pdf, other]

Detection and Correction of Cardiac MR Motion Artefacts during Reconstruction from K-space

Authors: lkay Oksuz, James Clough, Bram Ruijsink, Esther Puyol-Anton, Aurelien Bustin, Gastao Cruz, Claudia Prieto, Daniel Rueckert, Andrew P. King, Julia A. Schnabel

Abstract: In fully sampled cardiac MR (CMR) acquisitions, motion can lead to corruption of k-space lines, which can result in artefacts in the reconstructed images. In this paper, we propose a method to automatically detect and correct motion-related artefacts in CMR acquisitions during reconstruction from k-space data. Our correction method is inspired by work on undersampled CMR reconstruction, and uses d… ▽ More In fully sampled cardiac MR (CMR) acquisitions, motion can lead to corruption of k-space lines, which can result in artefacts in the reconstructed images. In this paper, we propose a method to automatically detect and correct motion-related artefacts in CMR acquisitions during reconstruction from k-space data. Our correction method is inspired by work on undersampled CMR reconstruction, and uses deep learning to optimize a data-consistency term for under-sampled k-space reconstruction. Our main methodological contribution is the addition of a detection network to classify motion-corrupted k-space lines to convert the problem of artefact correction to a problem of reconstruction using the data consistency term. We train our network to automatically correct for motion-related artefacts using synthetically corrupted cine CMR k-space data as well as uncorrupted CMR images. Using a test set of 50 2D+time cine CMR datasets from the UK Biobank, we achieve good image quality in the presence of synthetic motion artefacts. We quantitatively compare our method with a variety of techniques for recovering good image quality and showcase better performance compared to state of the art denoising techniques with a PSNR of 37.1. Moreover, we show that our method preserves the quality of uncorrupted images and therefore can be also utilized as a general image reconstruction algorithm. △ Less

Submitted 12 June, 2019; originally announced June 2019.

Comments: Accepted to MICCAI 2019. arXiv admin note: text overlap with arXiv:1808.05130

Showing 1–23 of 23 results for author: Puyol-Anton, E