-
Real-World Federated Learning in Radiology: Hurdles to overcome and Benefits to gain
Authors:
Markus R. Bujotzek,
Ünal Akünal,
Stefan Denner,
Peter Neher,
Maximilian Zenk,
Eric Frodl,
Astha Jaiswal,
Moon Kim,
Nicolai R. Krekiehn,
Manuel Nickel,
Richard Ruppel,
Marcus Both,
Felix Döllinger,
Marcel Opitz,
Thorsten Persigehl,
Jens Kleesiek,
Tobias Penzkofer,
Klaus Maier-Hein,
Rickmer Braren,
Andreas Bucher
Abstract:
Objective: Federated Learning (FL) enables collaborative model training while keeping data locally. Currently, most FL studies in radiology are conducted in simulated environments due to numerous hurdles impeding its translation into practice. The few existing real-world FL initiatives rarely communicate specific measures taken to overcome these hurdles, leaving behind a significant knowledge gap. Minding efforts to implement real-world FL, there is a notable lack of comprehensive assessment comparing FL to less complex alternatives. Materials & Methods: We extensively reviewed FL literature, categorizing insights along with our findings according to their nature and phase while establishing a FL initiative, summarized to a comprehensive guide. We developed our own FL infrastructure within the German Radiological Cooperative Network (RACOON) and demonstrated its functionality by training FL models on lung pathology segmentation tasks across six university hospitals. We extensively evaluated FL against less complex alternatives in three distinct evaluation scenarios. Results: The proposed guide outlines essential steps, identified hurdles, and proposed solutions for establishing successful FL initiatives conducting real-world experiments. Our experimental results show that FL outperforms less complex alternatives in all evaluation scenarios, justifying the effort required to translate FL into real-world applications. Discussion & Conclusion: Our proposed guide aims to aid future FL researchers in circumventing pitfalls and accelerating translation of FL into radiological applications. Our results underscore the value of efforts needed to translate FL into real-world applications by demonstrating advantageous performance over alternatives, and emphasize the importance of strategic organization, robust management of distributed data and infrastructure in real-world settings.
Submitted 15 May, 2024;
originally announced May 2024.
-
Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks
Authors:
Paul Azunre,
Craig Corcoran,
Numa Dhamani,
Jeffrey Gleason,
Garrett Honke,
David Sullivan,
Rebecca Ruppel,
Sandeep Verma,
Jonathon Morgan
Abstract:
A character-level convolutional neural network (CNN) motivated by applications in "automated machine learning" (AutoML) is proposed to semantically classify columns in tabular data. Simulated data containing a set of base classes is first used to learn an initial set of weights. Hand-labeled data from the CKAN repository is then used in a transfer-learning paradigm to adapt the initial weights to a more sophisticated representation of the problem (e.g., including more classes). In doing so, realistic data imperfections are learned and the set of classes handled can be expanded from the base set with reduced labeled data and computing power requirements. Results show the effectiveness and flexibility of this approach in three diverse domains: semantic classification of tabular data, age prediction from social media posts, and email spam classification. In addition to providing further evidence of the effectiveness of transfer learning in natural language processing (NLP), our experiments suggest that analyzing the semantic structure of language at the character level without additional metadata---i.e., network structure, headers, etc.---can produce competitive accuracy for type classification, spam classification, and social media age prediction. We present our open-source toolkit SIMON, an acronym for Semantic Inference for the Modeling of ONtologies, which implements this approach in a user-friendly and scalable/parallelizable fashion.
Submitted 24 January, 2019;
originally announced January 2019.
-
Abstractive Tabular Dataset Summarization via Knowledge Base Semantic Embeddings
Authors:
Paul Azunre,
Craig Corcoran,
David Sullivan,
Garrett Honke,
Rebecca Ruppel,
Sandeep Verma,
Jonathon Morgan
Abstract:
This paper describes an abstractive summarization method for tabular data which employs a knowledge base semantic embedding to generate the summary. Assuming the dataset contains descriptive text in headers, columns and/or some augmenting metadata, the system employs the embedding to recommend a subject/type for each text segment. Recommendations are aggregated into a small collection of super types considered to be descriptive of the dataset by exploiting the hierarchy of types in a pre-specified ontology. Using February 2015 Wikipedia as the knowledge base, and a corresponding DBpedia ontology as types, we present experimental results on open data taken from several sources--OpenML, CKAN and data.world--to illustrate the effectiveness of the approach.
Submitted 5 April, 2018; v1 submitted 4 April, 2018;
originally announced April 2018.