Robustness of an Artificial Intelligence Solution for Diagnosis of Normal Chest X-Rays
Authors:
Tom Dyer,
Jordan Smith,
Gaetan Dissez,
Nicole Tay,
Qaiser Malik,
Tom Naunton Morgan,
Paul Williams,
Liliana Garcia-Mondragon,
George Pearse,
Simon Rasalingham
Abstract:
Purpose: Artificial intelligence (AI) solutions for medical diagnosis require thorough evaluation to demonstrate that performance is maintained for all patient sub-groups and to ensure that proposed improvements in care will be delivered equitably. This study evaluates the robustness of an AI solution for the diagnosis of normal chest X-rays (CXRs) by comparing performance across multiple patient…
▽ More
Purpose: Artificial intelligence (AI) solutions for medical diagnosis require thorough evaluation to demonstrate that performance is maintained for all patient sub-groups and to ensure that proposed improvements in care will be delivered equitably. This study evaluates the robustness of an AI solution for the diagnosis of normal chest X-rays (CXRs) by comparing performance across multiple patient and environmental subgroups, as well as comparing AI errors with those made by human experts.
Methods: A total of 4,060 CXRs were sampled to represent a diverse dataset of NHS patients and care settings. Ground-truth labels were assigned by a 3-radiologist panel. AI performance was evaluated against assigned labels and sub-groups analysis was conducted against patient age and sex, as well as CXR view, modality, device manufacturer and hospital site.
Results: The AI solution was able to remove 18.5% of the dataset by classification as High Confidence Normal (HCN). This was associated with a negative predictive value (NPV) of 96.0%, compared to 89.1% for diagnosis of normal scans by radiologists. In all AI false negative (FN) cases, a radiologist was found to have also made the same error when compared to final ground-truth labels. Subgroup analysis showed no statistically significant variations in AI performance, whilst reduced normal classification was observed in data from some hospital sites.
Conclusion: We show the AI solution could provide meaningful workload savings by diagnosis of 18.5% of scans as HCN with a superior NPV to human readers. The AI solution is shown to perform well across patient subgroups and error cases were shown to be subjective or subtle in nature.
△ Less
Submitted 31 August, 2022;
originally announced September 2022.
Enhancing Early Lung Cancer Detection on Chest Radiographs with AI-assistance: A Multi-Reader Study
Authors:
Gaetan Dissez,
Nicole Tay,
Tom Dyer,
Matthew Tam,
Richard Dittrich,
David Doyne,
James Hoare,
Jackson J. Pat,
Stephanie Patterson,
Amanda Stockham,
Qaiser Malik,
Tom Naunton Morgan,
Paul Williams,
Liliana Garcia-Mondragon,
Jordan Smith,
George Pearse,
Simon Rasalingham
Abstract:
Objectives: The present study evaluated the impact of a commercially available explainable AI algorithm in augmenting the ability of clinicians to identify lung cancer on chest X-rays (CXR).
Design: This retrospective study evaluated the performance of 11 clinicians for detecting lung cancer from chest radiographs, with and without assistance from a commercially available AI algorithm (red dot,…
▽ More
Objectives: The present study evaluated the impact of a commercially available explainable AI algorithm in augmenting the ability of clinicians to identify lung cancer on chest X-rays (CXR).
Design: This retrospective study evaluated the performance of 11 clinicians for detecting lung cancer from chest radiographs, with and without assistance from a commercially available AI algorithm (red dot, Behold.ai) that predicts suspected lung cancer from CXRs. Clinician performance was evaluated against clinically confirmed diagnoses.
Setting: The study analysed anonymised patient data from an NHS hospital; the dataset consisted of 400 chest radiographs from adult patients (18 years and above) who had a CXR performed in 2020, with corresponding clinical text reports.
Participants: A panel of readers consisting of 11 clinicians (consultant radiologists, radiologist trainees and reporting radiographers) participated in this study.
Main outcome measures: Overall accuracy, sensitivity, specificity and precision for detecting lung cancer on CXRs by clinicians, with and without AI input. Agreement rates between clinicians and performance standard deviation were also evaluated, with and without AI input.
Results: The use of the AI algorithm by clinicians led to an improved overall performance for lung tumour detection, achieving an overall increase of 17.4% of lung cancers being identified on CXRs which would have otherwise been missed, an overall increase in detection of smaller tumours, a 24% and 13% increased detection of stage 1 and stage 2 lung cancers respectively, and standardisation of clinician performance.
Conclusions: This study showed great promise in the clinical utility of AI algorithms in improving early lung cancer diagnosis and promoting health equity through overall improvement in reader performances, without impacting downstream imaging resources.
△ Less
Submitted 31 August, 2022;
originally announced August 2022.