-
FedECA: A Federated External Control Arm Method for Causal Inference with Time-To-Event Data in Distributed Settings
Authors:
Jean Ogier du Terrail,
Quentin Klopfenstein,
Honghao Li,
Imke Mayer,
Nicolas Loiseau,
Mohammad Hallal,
Félix Balazard,
Mathieu Andreux
Abstract:
External control arms (ECA) can inform the early clinical development of experimental drugs and provide efficacy evidence for regulatory approval in non-randomized settings. However, the main challenge of implementing ECA lies in accessing real-world data or historical clinical trials. Indeed, data sharing is often not feasible due to privacy considerations related to data leaving the original col…
▽ More
External control arms (ECA) can inform the early clinical development of experimental drugs and provide efficacy evidence for regulatory approval in non-randomized settings. However, the main challenge of implementing ECA lies in accessing real-world data or historical clinical trials. Indeed, data sharing is often not feasible due to privacy considerations related to data leaving the original collection centers, along with pharmaceutical companies' competitive motives. In this paper, we leverage a privacy-enhancing technology called federated learning (FL) to remove some of the barriers to data sharing. We introduce a federated learning inverse probability of treatment weighted (IPTW) method for time-to-event outcomes called FedECA which eases the implementation of ECA by limiting patients' data exposure. We show with extensive experiments that FedECA outperforms its closest competitor, matching-adjusted indirect comparison (MAIC), in terms of statistical power and ability to balance the treatment and control groups. To encourage the use of such methods, we publicly release our code which relies on Substra, an open-source FL software with proven experience in privacy-sensitive contexts.
△ Less
Submitted 20 December, 2023; v1 submitted 28 November, 2023;
originally announced November 2023.
-
SRATTA : Sample Re-ATTribution Attack of Secure Aggregation in Federated Learning
Authors:
Tanguy Marchand,
Régis Loeb,
Ulysse Marteau-Ferey,
Jean Ogier du Terrail,
Arthur Pignet
Abstract:
We consider a cross-silo federated learning (FL) setting where a machine learning model with a fully connected first layer is trained between different clients and a central server using FedAvg, and where the aggregation step can be performed with secure aggregation (SA). We present SRATTA an attack relying only on aggregated models which, under realistic assumptions, (i) recovers data samples fro…
▽ More
We consider a cross-silo federated learning (FL) setting where a machine learning model with a fully connected first layer is trained between different clients and a central server using FedAvg, and where the aggregation step can be performed with secure aggregation (SA). We present SRATTA an attack relying only on aggregated models which, under realistic assumptions, (i) recovers data samples from the different clients, and (ii) groups data samples coming from the same client together. While sample recovery has already been explored in an FL setting, the ability to group samples per client, despite the use of SA, is novel. This poses a significant unforeseen security threat to FL and effectively breaks SA. We show that SRATTA is both theoretically grounded and can be used in practice on realistic models and datasets. We also propose counter-measures, and claim that clients should play an active role to guarantee their privacy during training.
△ Less
Submitted 13 June, 2023;
originally announced June 2023.
-
FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings
Authors:
Jean Ogier du Terrail,
Samy-Safwan Ayed,
Edwige Cyffers,
Felix Grimberg,
Chaoyang He,
Regis Loeb,
Paul Mangold,
Tanguy Marchand,
Othmane Marfoq,
Erum Mushtaq,
Boris Muzellec,
Constantin Philippenko,
Santiago Silva,
Maria Teleńczuk,
Shadi Albarqouni,
Salman Avestimehr,
Aurélien Bellet,
Aymeric Dieuleveut,
Martin Jaggi,
Sai Praneeth Karimireddy,
Marco Lorenzi,
Giovanni Neglia,
Marc Tommasi,
Mathieu Andreux
Abstract:
Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models, without centralizing data. The cross-silo FL setting corresponds to the case of few ($2$--$50$) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works hav…
▽ More
Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models, without centralizing data. The cross-silo FL setting corresponds to the case of few ($2$--$50$) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works have proposed representative datasets for cross-device FL, few realistic healthcare cross-silo FL datasets exist, thereby slowing algorithmic research in this critical application. In this work, we propose a novel cross-silo dataset suite focused on healthcare, FLamby (Federated Learning AMple Benchmark of Your cross-silo strategies), to bridge the gap between theory and practice of cross-silo FL. FLamby encompasses 7 healthcare datasets with natural splits, covering multiple tasks, modalities, and data volumes, each accompanied with baseline training code. As an illustration, we additionally benchmark standard FL algorithms on all datasets. Our flexible and modular suite allows researchers to easily download datasets, reproduce results and re-use the different components for their research. FLamby is available at~\url{www.github.com/owkin/flamby}.
△ Less
Submitted 5 May, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
-
SecureFedYJ: a safe feature Gaussianization protocol for Federated Learning
Authors:
Tanguy Marchand,
Boris Muzellec,
Constance Beguier,
Jean Ogier du Terrail,
Mathieu Andreux
Abstract:
The Yeo-Johnson (YJ) transformation is a standard parametrized per-feature unidimensional transformation often used to Gaussianize features in machine learning. In this paper, we investigate the problem of applying the YJ transformation in a cross-silo Federated Learning setting under privacy constraints. For the first time, we prove that the YJ negative log-likelihood is in fact convex, which all…
▽ More
The Yeo-Johnson (YJ) transformation is a standard parametrized per-feature unidimensional transformation often used to Gaussianize features in machine learning. In this paper, we investigate the problem of applying the YJ transformation in a cross-silo Federated Learning setting under privacy constraints. For the first time, we prove that the YJ negative log-likelihood is in fact convex, which allows us to optimize it with exponential search. We numerically show that the resulting algorithm is more stable than the state-of-the-art approach based on the Brent minimization method. Building on this simple algorithm and Secure Multiparty Computation routines, we propose SecureFedYJ, a federated algorithm that performs a pooled-equivalent YJ transformation without leaking more information than the final fitted parameters do. Quantitative experiments on real data demonstrate that, in addition to being secure, our approach reliably normalizes features across silos as well as if data were pooled, making it a viable approach for safe federated feature Gaussianization.
△ Less
Submitted 13 October, 2022; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Differentially Private Federated Learning for Cancer Prediction
Authors:
Constance Beguier,
Jean Ogier du Terrail,
Iqraa Meah,
Mathieu Andreux,
Eric W. Tramel
Abstract:
Since 2014, the NIH funded iDASH (integrating Data for Analysis, Anonymization, SHaring) National Center for Biomedical Computing has hosted yearly competitions on the topic of private computing for genomic data. For one track of the 2020 iteration of this competition, participants were challenged to produce an approach to federated learning (FL) training of genomic cancer prediction models using…
▽ More
Since 2014, the NIH funded iDASH (integrating Data for Analysis, Anonymization, SHaring) National Center for Biomedical Computing has hosted yearly competitions on the topic of private computing for genomic data. For one track of the 2020 iteration of this competition, participants were challenged to produce an approach to federated learning (FL) training of genomic cancer prediction models using differential privacy (DP), with submissions ranked according to held-out test accuracy for a given set of DP budgets. More precisely, in this track, we are tasked with training a supervised model for the prediction of breast cancer occurrence from genomic data split between two virtual centers while ensuring data privacy with respect to model transfer via DP. In this article, we present our 3rd place submission to this competition. During the competition, we encountered two main challenges discussed in this article: i) ensuring correctness of the privacy budget evaluation and ii) achieving an acceptable trade-off between prediction performance and privacy budget.
△ Less
Submitted 24 March, 2021; v1 submitted 8 January, 2021;
originally announced January 2021.
-
Siloed Federated Learning for Multi-Centric Histopathology Datasets
Authors:
Mathieu Andreux,
Jean Ogier du Terrail,
Constance Beguier,
Eric W. Tramel
Abstract:
While federated learning is a promising approach for training deep learning models over distributed sensitive datasets, it presents new challenges for machine learning, especially when applied in the medical domain where multi-centric data heterogeneity is common. Building on previous domain adaptation works, this paper proposes a novel federated learning approach for deep learning architectures v…
▽ More
While federated learning is a promising approach for training deep learning models over distributed sensitive datasets, it presents new challenges for machine learning, especially when applied in the medical domain where multi-centric data heterogeneity is common. Building on previous domain adaptation works, this paper proposes a novel federated learning approach for deep learning architectures via the introduction of local-statistic batch normalization (BN) layers, resulting in collaboratively-trained, yet center-specific models. This strategy improves robustness to data heterogeneity while also reducing the potential for information leaks by not sharing the center-specific layer activation statistics. We benchmark the proposed method on the classification of tumorous histopathology image patches extracted from the Camelyon16 and Camelyon17 datasets. We show that our approach compares favorably to previous state-of-the-art methods, especially for transfer learning across datasets.
△ Less
Submitted 17 August, 2020;
originally announced August 2020.
-
Faster RER-CNN: application to the detection of vehicles in aerial images
Authors:
Jean Ogier du Terrail,
Frédéric Jurie
Abstract:
Detecting small vehicles in aerial images is a difficult job that can be challenging even for humans. Rotating objects, low resolution, small inter-class variability and very large images comprising complicated backgrounds render the work of photo-interpreters tedious and wearisome. Unfortunately even the best classical detection pipelines like Faster R-CNN cannot be used off-the-shelf with good r…
▽ More
Detecting small vehicles in aerial images is a difficult job that can be challenging even for humans. Rotating objects, low resolution, small inter-class variability and very large images comprising complicated backgrounds render the work of photo-interpreters tedious and wearisome. Unfortunately even the best classical detection pipelines like Faster R-CNN cannot be used off-the-shelf with good results because they were built to process object centric images from day-to-day life with multi-scale vertical objects. In this work we build on the Faster R-CNN approach to turn it into a detection framework that deals appropriately with the rotation equivariance inherent to any aerial image task. This new pipeline (Faster Rotation Equivariant Regions CNN) gives, without any bells and whistles, state-of-the-art results on one of the most challenging aerial imagery datasets: VeDAI and give good results w.r.t. the baseline Faster R-CNN on two others: Munich and GoogleEarth .
△ Less
Submitted 16 September, 2020; v1 submitted 20 September, 2018;
originally announced September 2018.
-
Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks
Authors:
Shivang Agarwal,
Jean Ogier Du Terrail,
Frédéric Jurie
Abstract:
Object detection-the computer vision task dealing with detecting instances of objects of a certain class (e.g., 'car', 'plane', etc.) in images-attracted a lot of attention from the community during the last 5 years. This strong interest can be explained not only by the importance this task has for many applications but also by the phenomenal advances in this area since the arrival of deep convolu…
▽ More
Object detection-the computer vision task dealing with detecting instances of objects of a certain class (e.g., 'car', 'plane', etc.) in images-attracted a lot of attention from the community during the last 5 years. This strong interest can be explained not only by the importance this task has for many applications but also by the phenomenal advances in this area since the arrival of deep convolutional neural networks (DCNN). This article reviews the recent literature on object detection with deep CNN, in a comprehensive way, and provides an in-depth view of these recent advances. The survey covers not only the typical architectures (SSD, YOLO, Faster-RCNN) but also discusses the challenges currently met by the community and goes on to show how the problem of object detection can be extended. This survey also reviews the public datasets and associated state-of-the-art algorithms.
△ Less
Submitted 20 August, 2019; v1 submitted 10 September, 2018;
originally announced September 2018.