Fairness and Privacy in Federated Learning and Their Implications in Healthcare
Authors:
Navya Annapareddy,
Jade Preston,
Judy Fox
Abstract:
Currently, many contexts exist where distributed learning is difficult or otherwise constrained by security and communication limitations. One common domain where this is a consideration is in Healthcare where data is often governed by data-use-ordinances like HIPAA. On the other hand, larger sample sizes and shared data models are necessary to allow models to better generalize on account of the p…
▽ More
Currently, many contexts exist where distributed learning is difficult or otherwise constrained by security and communication limitations. One common domain where this is a consideration is in Healthcare where data is often governed by data-use-ordinances like HIPAA. On the other hand, larger sample sizes and shared data models are necessary to allow models to better generalize on account of the potential for more variability and balancing underrepresented classes. Federated learning is a type of distributed learning model that allows data to be trained in a decentralized manner. This, in turn, addresses data security, privacy, and vulnerability considerations as data itself is not shared across a given learning network nodes. Three main challenges to federated learning include node data is not independent and identically distributed (iid), clients requiring high levels of communication overhead between peers, and there is the heterogeneity of different clients within a network with respect to dataset bias and size. As the field has grown, the notion of fairness in federated learning has also been introduced through novel implementations. Fairness approaches differ from the standard form of federated learning and also have distinct challenges and considerations for the healthcare domain. This paper endeavors to outline the typical lifecycle of fair federated learning in research as well as provide an updated taxonomy to account for the current state of fairness in implementations. Lastly, this paper provides added insight into the implications and challenges of implementing and supporting fairness in federated learning in the healthcare domain.
△ Less
Submitted 15 August, 2023;
originally announced August 2023.
Towards Fair and Privacy Preserving Federated Learning for the Healthcare Domain
Authors:
Navya Annapareddy,
Yingzheng Liu,
Judy Fox
Abstract:
Federated learning enables data sharing in healthcare contexts where it might otherwise be difficult due to data-use-ordinances or security and communication constraints. Distributed and shared data models allow models to become generalizable and learn from heterogeneous clients. While addressing data security, privacy, and vulnerability considerations, data itself is not shared across nodes in a…
▽ More
Federated learning enables data sharing in healthcare contexts where it might otherwise be difficult due to data-use-ordinances or security and communication constraints. Distributed and shared data models allow models to become generalizable and learn from heterogeneous clients. While addressing data security, privacy, and vulnerability considerations, data itself is not shared across nodes in a given learning network. On the other hand, FL models often struggle with variable client data distributions and operate on an assumption of independent and identically distributed data. As the field has grown, the notion of fairness-aware federated learning mechanisms has also been introduced and is of distinct significance to the healthcare domain where many sensitive groups and protected classes exist. In this paper, we create a benchmark methodology for FAFL mechanisms under various heterogeneous conditions on datasets in the healthcare domain typically outside the scope of current federated learning benchmarks, such as medical imaging and waveform data formats. Our results indicate considerable variation in how various FAFL schemes respond to high levels of data heterogeneity. Additionally, doing so under privacy-preserving conditions can create significant increases in network communication cost and latency compared to the typical federated learning scheme.
△ Less
Submitted 3 August, 2023;
originally announced August 2023.