SecureFedYJ: a safe feature Gaussianization protocol for Federated Learning

Authors: Tanguy Marchand, Boris Muzellec, Constance Beguier, Jean Ogier du Terrail, Mathieu Andreux

Abstract: The Yeo-Johnson (YJ) transformation is a standard parametrized per-feature unidimensional transformation often used to Gaussianize features in machine learning. In this paper, we investigate the problem of applying the YJ transformation in a cross-silo Federated Learning setting under privacy constraints. For the first time, we prove that the YJ negative log-likelihood is in fact convex, which allows us to optimize it with exponential search. We numerically show that the resulting algorithm is more stable than the state-of-the-art approach based on the Brent minimization method. Building on this simple algorithm and Secure Multiparty Computation routines, we propose SecureFedYJ, a federated algorithm that performs a pooled-equivalent YJ transformation without leaking more information than the final fitted parameters do. Quantitative experiments on real data demonstrate that, in addition to being secure, our approach reliably normalizes features across silos as well as if data were pooled, making it a viable approach for safe federated feature Gaussianization.

Submitted 13 October, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: Accepted to NeurIPS 2022
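
As a rough illustration of the idea summarized in this abstract, the sketch below fits the YJ parameter on a single pooled feature by exploiting the convexity of the negative log-likelihood: an exponential (doubling) phase brackets the minimizer, then bisection narrows it. This is not the authors' implementation. The function names, the finite-difference slope test, the starting point at 0, and the cap on doublings are all assumptions made for illustration, and the secure multiparty part of SecureFedYJ is omitted entirely.

```python
import math


def yeo_johnson(x, lam):
    """Yeo-Johnson transform of a scalar x with parameter lam."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(x)
        # ((x + 1)^lam - 1) / lam, written with expm1 for numerical stability
        return math.expm1(lam * math.log1p(x)) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-x)
    # -((1 - x)^(2 - lam) - 1) / (2 - lam)
    return -math.expm1((2.0 - lam) * math.log1p(-x)) / (2.0 - lam)


def yj_nll(xs, lam):
    """Negative log-likelihood of lam, up to an additive constant."""
    n = len(xs)
    ys = [yeo_johnson(x, lam) for x in xs]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    # Log-Jacobian term: (lam - 1) * sum of sign(x) * log(|x| + 1)
    jac = sum(math.copysign(math.log1p(abs(x)), x) for x in xs)
    return 0.5 * n * math.log(var) - (lam - 1.0) * jac


def fit_lambda(xs, tol=1e-6, h=1e-4, max_doublings=6):
    """Hypothetical helper: exponential search, then bisection, on the slope sign."""

    def slope(lam):
        # Central difference; only its sign is used (assumption: the paper's
        # method evaluates the derivative sign differently).
        return yj_nll(xs, lam + h) - yj_nll(xs, lam - h)

    # Convexity implies the slope is non-decreasing, so its sign at 0
    # tells us on which side the minimizer lies.
    direction = -1.0 if slope(0.0) > 0 else 1.0
    inner, outer = 0.0, direction
    for _ in range(max_doublings):
        if slope(outer) * direction > 0:  # slope sign flipped: bracket found
            break
        inner, outer = outer, 2.0 * outer
    else:
        raise ValueError("minimum not bracketed within the allowed range")

    lo, hi = min(inner, outer), max(inner, outer)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Because the search only ever consumes the sign of the slope, each step reveals a single bit about the data, which is plausibly what makes this kind of procedure convenient to run under secure computation.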

arXiv:2101.02997

Differentially Private Federated Learning for Cancer Prediction

Authors: Constance Beguier, Jean Ogier du Terrail, Iqraa Meah, Mathieu Andreux, Eric W. Tramel

Abstract: Since 2014, the NIH-funded iDASH (integrating Data for Analysis, Anonymization, SHaring) National Center for Biomedical Computing has hosted yearly competitions on the topic of private computing for genomic data. For one track of the 2020 iteration of this competition, participants were challenged to produce an approach to federated learning (FL) training of genomic cancer prediction models using differential privacy (DP), with submissions ranked according to held-out test accuracy for a given set of DP budgets. More precisely, in this track, we are tasked with training a supervised model for the prediction of breast cancer occurrence from genomic data split between two virtual centers while ensuring data privacy with respect to model transfer via DP. In this article, we present our 3rd place submission to this competition. During the competition, we encountered two main challenges discussed in this article: i) ensuring correctness of the privacy budget evaluation and ii) achieving an acceptable trade-off between prediction performance and privacy budget.

Submitted 24 March, 2021; v1 submitted 8 January, 2021; originally announced January 2021.

arXiv:2008.07424

Siloed Federated Learning for Multi-Centric Histopathology Datasets

Authors: Mathieu Andreux, Jean Ogier du Terrail, Constance Beguier, Eric W. Tramel

Abstract: While federated learning is a promising approach for training deep learning models over distributed sensitive datasets, it presents new challenges for machine learning, especially when applied in the medical domain where multi-centric data heterogeneity is common. Building on previous domain adaptation works, this paper proposes a novel federated learning approach for deep learning architectures via the introduction of local-statistic batch normalization (BN) layers, resulting in collaboratively-trained, yet center-specific models. This strategy improves robustness to data heterogeneity while also reducing the potential for information leaks by not sharing the center-specific layer activation statistics. We benchmark the proposed method on the classification of tumorous histopathology image patches extracted from the Camelyon16 and Camelyon17 datasets. We show that our approach compares favorably to previous state-of-the-art methods, especially for transfer learning across datasets.

Submitted 17 August, 2020; originally announced August 2020.

Comments: Accepted to MICCAI 2020 DCL workshop
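
The aggregation rule hinted at in this abstract can be sketched as follows: every shared entry of the model state is averaged across centers, while BN activation statistics stay local to each center. This is only a guess at the mechanics based on the abstract. The key names (`running_mean`, `running_var`), the dict-of-lists model state, and the helper name are hypothetical, and plain unweighted averaging is an assumption.

```python
# Hypothetical suffixes marking BN activation statistics that stay local.
LOCAL_SUFFIXES = ("running_mean", "running_var")


def aggregate_except_bn_stats(center_states):
    """Average shared entries across centers; keep BN statistics per center.

    center_states: list of dicts mapping parameter names to lists of floats.
    Returns one updated state dict per center.
    """
    n = len(center_states)
    averaged = {}
    for key in center_states[0]:
        if key.endswith(LOCAL_SUFFIXES):
            continue  # never shared, so never averaged
        columns = zip(*(state[key] for state in center_states))
        averaged[key] = [sum(vals) / n for vals in columns]
    # Each center receives the averaged entries but retains its own statistics.
    return [{**state, **averaged} for state in center_states]
```

Under this sketch the centers end up with identical shared weights but different normalization statistics, which matches the abstract's description of collaboratively trained yet center-specific models.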

arXiv:2007.14861

Efficient Sparse Secure Aggregation for Federated Learning

Authors: Constance Beguier, Mathieu Andreux, Eric W. Tramel

Abstract: Federated Learning enables one to jointly train a machine learning model across distributed clients holding sensitive datasets. In real-world settings, this approach is hindered by expensive communication and privacy concerns. Both of these challenges have already been addressed individually, resulting in competing optimisations. In this article, we tackle them simultaneously for one of the first times. More precisely, we adapt compression-based federated techniques to additive secret sharing, leading to an efficient secure aggregation protocol, with an adaptable security level. We prove its privacy against malicious adversaries and its correctness in the semi-honest setting. Experiments on deep convolutional networks demonstrate that our secure protocol achieves high accuracy with low communication costs. Compared to prior works on secure aggregation, our protocol has lower communication and computation costs for a similar accuracy.

Submitted 18 October, 2021; v1 submitted 29 July, 2020; originally announced July 2020.
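
The additive secret sharing primitive mentioned in this abstract can be illustrated with a minimal sketch: each client splits an integer-encoded value into random shares that sum to the value modulo a fixed modulus, each party adds up the shares it receives, and only the combined total is reconstructed. The modulus, the function names, and the single-process simulation of the parties are assumptions for illustration. The compression techniques and the adaptable security level described in the paper are not modelled here.

```python
import secrets

MODULUS = 2**32  # assumed ring size; values are integer-encoded beforehand


def share(value, n_parties):
    """Split value into n_parties additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(client_values, n_parties):
    """Simulate secure aggregation: parties only ever see random-looking shares."""
    party_totals = [0] * n_parties
    for value in client_values:
        for j, s in enumerate(share(value % MODULUS, n_parties)):
            # Party j accumulates the j-th share of every client.
            party_totals[j] = (party_totals[j] + s) % MODULUS
    # Combining the per-party totals reveals only the sum of all client values.
    return sum(party_totals) % MODULUS
```

Any subset of fewer than `n_parties` shares of a value is uniformly distributed, so individual client values stay hidden as long as the parties do not all collude.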

Showing 1–4 of 4 results for author: Beguier, C