AI Models Close to your Chest: Robust Federated Learning Strategies for Multi-site CT
Authors:
Edward H. Lee,
Brendan Kelly,
Emre Altinmakas,
Hakan Dogan,
Maryam Mohammadzadeh,
Errol Colak,
Steve Fu,
Olivia Choudhury,
Ujjwal Ratan,
Felipe Kitamura,
Hernan Chaves,
Jimmy Zheng,
Mourad Said,
Eduardo Reis,
Jaekwang Lim,
Patricia Yokoo,
Courtney Mitchell,
Golnaz Houshmand,
Marzyeh Ghassemi,
Ronan Killeen,
Wendy Qiu,
Joel Hayden,
Farnaz Rafiee,
Chad Klochko,
Nicholas Bevins
, et al. (5 additional authors not shown)
Abstract:
While it is well known that population differences from genetics, sex, race, and environmental factors contribute to disease, AI studies in medicine have largely focused on locoregional patient cohorts with less diverse data sources. Such limitation stems from barriers to large-scale data share and ethical concerns over data privacy. Federated learning (FL) is one potential pathway for AI developm…
▽ More
While it is well known that population differences from genetics, sex, race, and environmental factors contribute to disease, AI studies in medicine have largely focused on locoregional patient cohorts with less diverse data sources. Such limitation stems from barriers to large-scale data share and ethical concerns over data privacy. Federated learning (FL) is one potential pathway for AI development that enables learning across hospitals without data share. In this study, we show the results of various FL strategies on one of the largest and most diverse COVID-19 chest CT datasets: 21 participating hospitals across five continents that comprise >10,000 patients with >1 million images. We also propose an FL strategy that leverages synthetically generated data to overcome class and size imbalances. We also describe the sources of data heterogeneity in the context of FL, and show how even among the correctly labeled populations, disparities can arise due to these biases.
△ Less
Submitted 13 April, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
A Generalizable Artificial Intelligence Model for COVID-19 Classification Task Using Chest X-ray Radiographs: Evaluated Over Four Clinical Datasets with 15,097 Patients
Authors:
Ran Zhang,
Xin Tie,
John W. Garrett,
Dalton Griner,
Zhihua Qi,
Nicholas B. Bevins,
Scott B. Reeder,
Guang-Hong Chen
Abstract:
Purpose: To answer the long-standing question of whether a model trained from a single clinical site can be generalized to external sites.
Materials and Methods: 17,537 chest x-ray radiographs (CXRs) from 3,264 COVID-19-positive patients and 4,802 COVID-19-negative patients were collected from a single site for AI model development. The generalizability of the trained model was retrospectively e…
▽ More
Purpose: To answer the long-standing question of whether a model trained from a single clinical site can be generalized to external sites.
Materials and Methods: 17,537 chest x-ray radiographs (CXRs) from 3,264 COVID-19-positive patients and 4,802 COVID-19-negative patients were collected from a single site for AI model development. The generalizability of the trained model was retrospectively evaluated using four different real-world clinical datasets with a total of 26,633 CXRs from 15,097 patients (3,277 COVID-19-positive patients). The area under the receiver operating characteristic curve (AUC) was used to assess diagnostic performance.
Results: The AI model trained using a single-source clinical dataset achieved an AUC of 0.82 (95% CI: 0.80, 0.84) when applied to the internal temporal test set. When applied to datasets from two external clinical sites, an AUC of 0.81 (95% CI: 0.80, 0.82) and 0.82 (95% CI: 0.80, 0.84) were achieved. An AUC of 0.79 (95% CI: 0.77, 0.81) was achieved when applied to a multi-institutional COVID-19 dataset collected by the Medical Imaging and Data Resource Center (MIDRC). A power-law dependence, N^(k )(k is empirically found to be -0.21 to -0.25), indicates a relatively weak performance dependence on the training data sizes.
Conclusion: COVID-19 classification AI model trained using well-curated data from a single clinical site is generalizable to external clinical sites without a significant drop in performance.
△ Less
Submitted 4 October, 2022;
originally announced October 2022.