-
Mind the Gap: Federated Learning Broadens Domain Generalization in Diagnostic AI Models
Authors:
Soroosh Tayebi Arasteh,
Christiane Kuhl,
Marwin-Jonathan Saehn,
Peter Isfort,
Daniel Truhn,
Sven Nebelung
Abstract:
Developing robust artificial intelligence (AI) models that generalize well to unseen datasets is challenging and usually requires large and variable datasets, preferably from multiple institutions. In federated learning (FL), a model is trained collaboratively at numerous sites that hold local datasets without exchanging them. So far, the impact of training strategy, i.e., local versus collaborative, on the diagnostic on-domain and off-domain performance of AI models interpreting chest radiographs has not been assessed. To address this gap, using 610,000 chest radiographs from five institutions across the globe, we assessed diagnostic performance as a function of training strategy (i.e., local vs. collaborative), network architecture (i.e., convolutional vs. transformer-based), generalization performance (i.e., on-domain vs. off-domain), imaging finding (i.e., cardiomegaly, pleural effusion, pneumonia, atelectasis, consolidation, pneumothorax, and no abnormality), dataset size (i.e., from n=18,000 to 213,921 radiographs), and dataset diversity. Models trained on large datasets showed minimal performance gains with FL and, in some instances, even decreases. In contrast, models trained on smaller datasets improved markedly. Thus, on-domain performance was mainly driven by training data size, whereas off-domain performance relied more on training data diversity. When trained collaboratively across diverse external institutions, AI models consistently surpassed models trained locally on off-domain tasks, emphasizing FL's potential for leveraging data diversity. In conclusion, FL can bolster the privacy, reproducibility, and off-domain diagnostic reliability of AI models and, potentially, optimize healthcare outcomes.
Submitted 19 December, 2023; v1 submitted 1 October, 2023;
originally announced October 2023.
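The abstract describes FL only at a high level. As an illustration, here is a minimal sketch of one widely used aggregation scheme, federated averaging (FedAvg), assuming a toy linear model trained with plain SGD. The abstract does not state which aggregation rule, optimizer, or network the authors used, and the helper names (`local_update`, `fed_avg`, `federated_round`) are hypothetical. The point it shows is that only model weights leave each site; the local datasets never do.

```python
from typing import List, Sequence, Tuple

Weights = List[float]
Dataset = Sequence[Tuple[List[float], float]]  # (features, target) pairs held at one site


def local_update(weights: Weights, data: Dataset, lr: float = 0.01, epochs: int = 1) -> Weights:
    """Plain SGD on a linear least-squares model, using only this site's data."""
    w = list(weights)  # copy, so the global model is not modified in place
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fed_avg(site_weights: List[Weights], site_sizes: List[int]) -> Weights:
    """Average the site models, weighting each one by its number of training samples."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total for j in range(dim)]


def federated_round(global_w: Weights, sites: List[Dataset], lr: float = 0.01) -> Weights:
    """One communication round: every site trains locally, the server aggregates weights."""
    updates = [local_update(global_w, data, lr) for data in sites]
    return fed_avg(updates, [len(data) for data in sites])


if __name__ == "__main__":
    # Two hypothetical sites whose data follow y = 2 * x.
    sites = [[([1.0], 2.0), ([2.0], 4.0)], [([3.0], 6.0)]]
    w = [0.0]
    for _ in range(200):
        w = federated_round(w, sites)
    print(w)
```

Weighting by sample count is the usual FedAvg choice; an unweighted mean is an alternative when every site should contribute equally regardless of size.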
-
Preserving privacy in domain transfer of medical AI models comes at no performance costs: The integral role of differential privacy
Authors:
Soroosh Tayebi Arasteh,
Mahshad Lotfinia,
Teresa Nolte,
Marwin Saehn,
Peter Isfort,
Christiane Kuhl,
Sven Nebelung,
Georgios Kaissis,
Daniel Truhn
Abstract:
Developing robust and effective artificial intelligence (AI) models in medicine requires access to large amounts of patient data. Training AI models on large multi-institutional datasets can help with this, yet the imperative to ensure data privacy remains, particularly as membership inference risks breaching patient confidentiality. As a proposed remedy, we advocate for the integration of differential privacy (DP). We specifically investigate the performance of models trained with DP as compared to models trained without DP on data from institutions that the model had not seen during its training (i.e., external validation), the situation that reflects the clinical use of AI models. By leveraging more than 590,000 chest radiographs from five institutions, we evaluated the efficacy of DP-enhanced domain transfer (DP-DT) in diagnosing cardiomegaly, pleural effusion, pneumonia, and atelectasis, and in identifying healthy subjects. We compared DP-DT with non-DP-DT and examined diagnostic accuracy and demographic fairness using the area under the receiver operating characteristic curve (AUC) as the main metric, as well as accuracy, sensitivity, and specificity. Our results show that DP-DT, even with exceptionally high privacy levels (epsilon around 1), performs comparably to non-DP-DT (P>0.119 across all domains). Furthermore, DP-DT led to marginal AUC differences, less than 1%, for nearly all subgroups, relative to non-DP-DT. Although there is consistent evidence that DP models suffer significant performance degradation in on-domain applications, we show that off-domain performance is barely affected. Therefore, we ardently advocate for the adoption of DP in training diagnostic medical AI models, given its minimal impact on performance.
Submitted 7 December, 2023; v1 submitted 10 June, 2023;
originally announced June 2023.
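A common way to add DP to deep-learning training is DP-SGD: each per-sample gradient is clipped to a fixed L2 norm, and Gaussian noise scaled to that norm is added to the summed gradient before the update. The sketch below shows a single such step in pure Python under that assumption. The abstract does not specify the authors' DP mechanism, and the function names and parameters (`clip_to_norm`, `dp_sgd_step`, `max_norm`, `noise_multiplier`) are illustrative. Converting the noise multiplier, sampling rate, and number of steps into a privacy budget such as epsilon around 1 requires a separate privacy accountant, which is not shown.

```python
import math
import random
from typing import List

Vector = List[float]


def clip_to_norm(grad: Vector, max_norm: float) -> Vector:
    """Scale a per-sample gradient down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * (max_norm / norm) for g in grad]


def dp_sgd_step(
    weights: Vector,
    per_sample_grads: List[Vector],
    lr: float,
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> Vector:
    """One DP-SGD update: clip each sample's gradient, sum, add Gaussian noise, average."""
    clipped = [clip_to_norm(g, max_norm) for g in per_sample_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # The noise std is tied to the clipping bound, i.e. to one sample's maximum influence.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch_size = len(per_sample_grads)
    return [w - lr * n / batch_size for w, n in zip(weights, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5]]  # hypothetical per-sample gradients
    print(dp_sgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single patient's record can move the model, and the noise masks what remains; larger `noise_multiplier` values give stronger privacy at the cost of noisier updates.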
-
Collaborative Training of Medical Artificial Intelligence Models with non-uniform Labels
Authors:
Soroosh Tayebi Arasteh,
Peter Isfort,
Marwin Saehn,
Gustav Mueller-Franzes,
Firas Khader,
Jakob Nikolas Kather,
Christiane Kuhl,
Sven Nebelung,
Daniel Truhn
Abstract:
Due to rapid advancements in recent years, medical image analysis is largely dominated by deep learning (DL). However, building powerful and robust DL models requires training with large multi-party datasets. While multiple stakeholders have provided publicly available datasets, the ways in which these data are labeled vary widely. For instance, an institution might provide a dataset of chest radiographs containing labels denoting the presence of pneumonia, while another institution might focus on determining the presence of metastases in the lung. Training a single AI model utilizing all these data is not feasible with conventional federated learning (FL). This prompts us to propose an extension to the widespread FL process, namely flexible federated learning (FFL), for collaborative training on such data. Using 695,000 chest radiographs from five institutions from across the globe, each with differing labels, we demonstrate that, with heterogeneously labeled datasets, FFL-based training leads to a significant performance increase compared to conventional FL training, in which only the uniformly annotated images are utilized. We believe that our proposed algorithm could accelerate the process of bringing collaborative training methods from the research and simulation phase to real-world applications in healthcare.
Submitted 13 April, 2023; v1 submitted 24 November, 2022;
originally announced November 2022.
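One simple way to let differently labeled sites train a shared multi-label model is to compute the loss only over the labels each site actually annotates, so that missing labels contribute no gradient. The sketch below shows such a label-masked binary cross-entropy. It is an illustrative assumption, not necessarily how FFL is implemented in the paper; `masked_bce` is a hypothetical name, and a label that a site does not provide is encoded as `None`.

```python
import math
from typing import List, Optional


def masked_bce(probs: List[float], targets: List[Optional[int]], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over only the labels a site annotates.

    probs[i] is the predicted probability for label i; targets[i] is 1, 0, or
    None when the site does not provide label i. None entries are skipped, so
    they contribute neither loss nor gradient.
    """
    terms = []
    for p, t in zip(probs, targets):
        if t is None:
            continue
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        terms.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    # Hypothetical label order: [pneumonia, lung metastases]
    site_a = masked_bce([0.8, 0.3], [1, None])  # site A annotates pneumonia only
    site_b = masked_bce([0.8, 0.3], [None, 0])  # site B annotates metastases only
    print(round(site_a, 4), round(site_b, 4))
```

Because unannotated labels are skipped rather than treated as negatives, a site that never labels metastases does not teach the shared model that metastases are absent.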
-
Image prediction of disease progression by style-based manifold extrapolation
Authors:
Tianyu Han,
Jakob Nikolas Kather,
Federico Pedersoli,
Markus Zimmermann,
Sebastian Keil,
Maximilian Schulze-Hagen,
Marc Terwoelbeck,
Peter Isfort,
Christoph Haarburger,
Fabian Kiessling,
Volkmar Schulz,
Christiane Kuhl,
Sven Nebelung,
Daniel Truhn
Abstract:
Disease-modifying management aims to prevent deterioration and progression of the disease, not just relieve symptoms. Unfortunately, the development of necessary therapies is often hampered by the failure to recognize presymptomatic disease and by limited understanding of disease development. We present a generic solution to this problem: a methodology that allows the prediction of progression risk and morphology in individuals using a latent extrapolation optimization approach. To this end, we combined a regularized generative adversarial network (GAN) and a latent nearest neighbor algorithm for joint optimization to generate plausible images of future time points. We evaluated our method on osteoarthritis (OA) data from a multi-center longitudinal study (the Osteoarthritis Initiative, OAI). With presymptomatic baseline data, our model is generative and significantly outperforms the end-to-end learning model in discriminating the progressive cohort. Two experiments were performed with seven experienced radiologists. When no synthetic follow-up radiographs were provided, our model performed better than all seven radiologists. In cases where the synthetic follow-ups generated by our model were available, the specificity and sensitivity of all readers in discriminating progressors increased from 72.3% to 88.6% and from 42.1% to 51.6%, respectively. Our results open up a new possibility of using model-based morphology and risk prediction to make predictions about future disease occurrence, as demonstrated in the example of OA.
Submitted 8 April, 2022; v1 submitted 22 November, 2021;
originally announced November 2021.
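The abstract combines a regularized GAN with a latent nearest neighbor algorithm but does not give the details. To make the idea of latent extrapolation concrete, here is a deliberately simplified sketch under stated assumptions: images have already been encoded into GAN latent vectors, a reference set of (baseline, follow-up) latent pairs from other patients is available, and the predicted follow-up latent is the individual's baseline shifted by the baseline-to-follow-up displacement of the nearest reference patient. This is not the authors' joint optimization; the function names are hypothetical, and decoding the result back into an image with the generator is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[Vector, Vector]  # (baseline latent, follow-up latent) of one reference patient


def nearest_reference(query: Vector, references: Sequence[Pair]) -> Pair:
    """Return the reference pair whose baseline latent is closest (Euclidean) to query."""
    return min(references, key=lambda pair: math.dist(query, pair[0]))


def extrapolate_latent(z_baseline: Vector, references: Sequence[Pair], steps: float = 1.0) -> Vector:
    """Shift z_baseline along the nearest neighbor's baseline-to-follow-up direction.

    steps scales the displacement (e.g. 2.0 for twice the reference interval),
    which assumes, as a simplification, that progression is linear in latent space.
    """
    base_nn, follow_nn = nearest_reference(z_baseline, references)
    direction = [f - b for f, b in zip(follow_nn, base_nn)]
    return [z + steps * d for z, d in zip(z_baseline, direction)]


if __name__ == "__main__":
    # Hypothetical 2-D latents for two reference patients.
    refs = [([0.0, 0.0], [1.0, 0.0]), ([5.0, 5.0], [5.0, 7.0])]
    print(extrapolate_latent([0.5, 0.2], refs))
    print(extrapolate_latent([4.0, 4.0], refs))
```

A single nearest neighbor keeps the example short; averaging the displacements of the k nearest reference patients would be a natural, less noisy variant.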