-
Identifying Climate Targets in National Laws and Policies using Machine Learning
Authors:
Matyas Juhasz,
Tina Marchand,
Roshan Melwani,
Kalyan Dutia,
Sarah Goodenough,
Harrison Pim,
Henry Franks
Abstract:
Quantified policy targets are a fundamental element of climate policy, typically characterised by domain-specific and technical language. Current methods for curating comprehensive views of global climate policy targets entail significant manual effort. At present there are few scalable methods for extracting climate targets from national laws or policies, which limits policymakers' and researchers' ability to (1) assess private and public sector alignment with global goals and (2) inform policy decisions. In this paper we present an approach for extracting mentions of climate targets from national laws and policies. We create an expert-annotated dataset identifying three categories of target ('Net Zero', 'Reduction' and 'Other' (e.g. renewable energy targets)) and train a classifier to reliably identify them in text. We investigate bias and equity impacts related to our model and identify specific years and country names as problematic features. Finally, we investigate the characteristics of the dataset produced by running this classifier on the Climate Policy Radar (CPR) dataset of global national climate laws and policies and UNFCCC submissions, highlighting the potential of automated and scalable data collection for existing climate policy databases and supporting further research. Our work represents a significant upgrade in the accessibility of these key climate policy elements for policymakers and researchers. We publish our model at https://huggingface.co/ClimatePolicyRadar/national-climate-targets and related dataset at https://huggingface.co/datasets/ClimatePolicyRadar/national-climate-targets.
Submitted 4 April, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
SRATTA : Sample Re-ATTribution Attack of Secure Aggregation in Federated Learning
Authors:
Tanguy Marchand,
Régis Loeb,
Ulysse Marteau-Ferey,
Jean Ogier du Terrail,
Arthur Pignet
Abstract:
We consider a cross-silo federated learning (FL) setting where a machine learning model with a fully connected first layer is trained between different clients and a central server using FedAvg, and where the aggregation step can be performed with secure aggregation (SA). We present SRATTA, an attack relying only on aggregated models, which, under realistic assumptions, (i) recovers data samples from the different clients, and (ii) groups data samples coming from the same client together. While sample recovery has already been explored in an FL setting, the ability to group samples per client, despite the use of SA, is novel. This poses a significant unforeseen security threat to FL and effectively breaks SA. We show that SRATTA is both theoretically grounded and usable in practice on realistic models and datasets. We also propose counter-measures, and claim that clients should play an active role to guarantee their privacy during training.
Submitted 13 June, 2023;
originally announced June 2023.
-
FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings
Authors:
Jean Ogier du Terrail,
Samy-Safwan Ayed,
Edwige Cyffers,
Felix Grimberg,
Chaoyang He,
Regis Loeb,
Paul Mangold,
Tanguy Marchand,
Othmane Marfoq,
Erum Mushtaq,
Boris Muzellec,
Constantin Philippenko,
Santiago Silva,
Maria Teleńczuk,
Shadi Albarqouni,
Salman Avestimehr,
Aurélien Bellet,
Aymeric Dieuleveut,
Martin Jaggi,
Sai Praneeth Karimireddy,
Marco Lorenzi,
Giovanni Neglia,
Marc Tommasi,
Mathieu Andreux
Abstract:
Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models, without centralizing data. The cross-silo FL setting corresponds to the case of few (2 to 50) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works have proposed representative datasets for cross-device FL, few realistic healthcare cross-silo FL datasets exist, thereby slowing algorithmic research in this critical application. In this work, we propose a novel cross-silo dataset suite focused on healthcare, FLamby (Federated Learning AMple Benchmark of Your cross-silo strategies), to bridge the gap between theory and practice of cross-silo FL. FLamby encompasses 7 healthcare datasets with natural splits, covering multiple tasks, modalities, and data volumes, each accompanied by baseline training code. As an illustration, we additionally benchmark standard FL algorithms on all datasets. Our flexible and modular suite allows researchers to easily download datasets, reproduce results and re-use the different components for their research. FLamby is available at www.github.com/owkin/flamby.
Submitted 5 May, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
-
SecureFedYJ: a safe feature Gaussianization protocol for Federated Learning
Authors:
Tanguy Marchand,
Boris Muzellec,
Constance Beguier,
Jean Ogier du Terrail,
Mathieu Andreux
Abstract:
The Yeo-Johnson (YJ) transformation is a standard parametrized per-feature unidimensional transformation often used to Gaussianize features in machine learning. In this paper, we investigate the problem of applying the YJ transformation in a cross-silo Federated Learning setting under privacy constraints. For the first time, we prove that the YJ negative log-likelihood is in fact convex, which allows us to optimize it with exponential search. We numerically show that the resulting algorithm is more stable than the state-of-the-art approach based on the Brent minimization method. Building on this simple algorithm and Secure Multiparty Computation routines, we propose SecureFedYJ, a federated algorithm that performs a pooled-equivalent YJ transformation without leaking more information than the final fitted parameters do. Quantitative experiments on real data demonstrate that, in addition to being secure, our approach reliably normalizes features across silos as well as if data were pooled, making it a viable approach for safe federated feature Gaussianization.
Submitted 13 October, 2022; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Wavelet Conditional Renormalization Group
Authors:
Tanguy Marchand,
Misaki Ozawa,
Giulio Biroli,
Stéphane Mallat
Abstract:
We develop a multiscale approach to estimate high-dimensional probability distributions from a dataset of physical fields or configurations observed in experiments or simulations. In this way we can estimate energy functions (or Hamiltonians) and efficiently generate new samples of many-body systems in various domains, from statistical physics to cosmology. Our method -- the Wavelet Conditional Renormalization Group (WC-RG) -- proceeds scale by scale, estimating models for the conditional probabilities of "fast degrees of freedom" conditioned by coarse-grained fields. These probability distributions are modeled by energy functions associated with scale interactions, and are represented in an orthogonal wavelet basis. WC-RG decomposes the microscopic energy function as a sum of interaction energies at all scales and can efficiently generate new samples by going from coarse to fine scales. Near phase transitions, it avoids the "critical slowing down" of direct estimation and sampling algorithms. This is explained theoretically by combining results from RG and wavelet theories, and verified numerically for the Gaussian and $\varphi^4$ field theories. We show that multiscale WC-RG energy-based models are more general than local potential models and can capture the physics of complex many-body interacting systems at all length scales. This is demonstrated for weak-gravitational-lensing fields reflecting dark matter distributions in cosmology, which include long-range interactions with long-tail probability distributions. WC-RG has a large number of potential applications in non-equilibrium systems, where the underlying distribution is not known a priori. Finally, we discuss the connection between WC-RG and deep network architectures.
Submitted 11 July, 2022;
originally announced July 2022.
-
Self supervised learning improves dMMR/MSI detection from histology slides across multiple cancers
Authors:
Charlie Saillard,
Olivier Dehaene,
Tanguy Marchand,
Olivier Moindrot,
Aurélie Kamoun,
Benoit Schmauch,
Simon Jegou
Abstract:
Microsatellite instability (MSI) is a tumor phenotype whose diagnosis largely impacts patient care in colorectal cancers (CRC), and is associated with response to immunotherapy in all solid tumors. Deep learning models detecting MSI tumors directly from H&E stained slides have shown promise in improving diagnosis of MSI patients. Prior deep learning models for MSI detection have relied on neural networks pretrained on the ImageNet dataset, which does not contain any medical images. In this study, we leverage recent advances in self-supervised learning by training neural networks on histology images from the TCGA dataset using MoCo V2. We show that these networks consistently outperform their counterparts pretrained using ImageNet and obtain state-of-the-art results for MSI detection with AUCs of 0.92 and 0.83 for CRC and gastric tumors, respectively. These models generalize well on an external CRC cohort (0.97 AUC on PAIP) and improve transfer from one organ to another. Finally, we show that predictive image regions exhibit meaningful histological patterns, and that the use of MoCo features highlighted more relevant patterns according to an expert pathologist.
Submitted 13 September, 2021;
originally announced September 2021.
-
New Interpretable Statistics for Large Scale Structure Analysis and Generation
Authors:
E. Allys,
T. Marchand,
J. -F. Cardoso,
F. Villaescusa-Navarro,
S. Ho,
S. Mallat
Abstract:
We introduce Wavelet Phase Harmonics (WPH) statistics: interpretable low-dimensional statistics that describe 2D non-Gaussian fields. These statistics are built from WPH moments, which were recently introduced in the data science and machine learning community. We apply WPH statistics to projected 2D matter density fields from the Quijote N-body simulations of the large-scale structure of the Universe. By computing Fisher information matrices, we find that the WPH statistics place more stringent constraints on four of five cosmological parameters when compared to statistics based on the combination of the power spectrum and bispectrum. We also use the WPH statistics with a maximum entropy model to statistically generate new 2D density fields that accurately reproduce the probability density function, the mean and standard deviation of the power spectrum, the bispectrum, and Minkowski functionals of the input density fields. Although other methods are efficient for either parameter estimates or statistical syntheses of the large-scale structure, WPH statistics are the first statistics that achieve state-of-the-art results for both tasks as well as being interpretable.
Submitted 5 October, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
The mass quadrupole moment of compact binary systems at the fourth post-Newtonian order
Authors:
Tanguy Marchand,
Quentin Henry,
François Larrouturou,
Sylvain Marsat,
Guillaume Faye,
Luc Blanchet
Abstract:
The mass-type quadrupole moment of inspiralling compact binaries (without spins) is computed at the fourth post-Newtonian (4PN) approximation of general relativity. The multipole moments are defined by matching between the field in the exterior zone of the matter system and the PN field in the near zone, following the multipolar-post-Minkowskian (MPM)-PN formalism. The matching implies a specific regularization for handling infra-red (IR) divergences of the multipole moments at infinity, based on the Hadamard finite part procedure. On the other hand, the calculation entails ultra-violet (UV) divergences due to the modelling of compact objects by delta-functions, which are treated with dimensional regularization (DR). In future work we intend to systematically study the IR divergences by means of dimensional regularization as well. Our result constitutes an important step toward the goal of obtaining the gravitational wave templates of inspiralling compact binary systems with 4PN/4.5PN accuracy.
Submitted 21 September, 2020; v1 submitted 30 March, 2020;
originally announced March 2020.
-
Center-of-Mass Equations of Motion and Conserved Integrals of Compact Binary Systems at the Fourth Post-Newtonian Order
Authors:
Laura Bernard,
Luc Blanchet,
Guillaume Faye,
Tanguy Marchand
Abstract:
The dynamics of compact binary systems at the fourth post-Newtonian (4PN) approximation of general relativity has been recently completed in a self-consistent way. In this paper, we compute the ten Poincaré constants of the motion and present the equations of motion in the frame of the center of mass (CM), together with the corresponding CM Lagrangian, conserved energy and conserved angular momentum. Next, we investigate the reduction of the CM dynamics to the case of quasi-circular orbits. The non-local (in time) tail effect at the 4PN order is consistently included, as well as the relevant radiation-reaction dissipative contributions to the energy and angular momentum.
Submitted 19 March, 2018; v1 submitted 1 November, 2017;
originally announced November 2017.
-
Ambiguity-Free Completion of the Equations of Motion of Compact Binary Systems at the Fourth Post-Newtonian Order
Authors:
Tanguy Marchand,
Laura Bernard,
Luc Blanchet,
Guillaume Faye
Abstract:
We present the first complete (i.e., ambiguity-free) derivation of the equations of motion of two non-spinning compact objects up to the 4PN order, based on the Fokker action of point particles in harmonic coordinates. The last ambiguity parameter is determined from first principles, by resorting to a matching between the near zone and far zone fields, and a consistent computation of the 4PN tail effect in d dimensions. Dimensional regularization is used throughout for treating IR divergences appearing at 4PN order, as well as UV divergences due to the model of point particles describing compact objects.
Submitted 9 March, 2018; v1 submitted 28 July, 2017;
originally announced July 2017.
-
Gravitational-wave tail effects to quartic non-linear order
Authors:
Tanguy Marchand,
Luc Blanchet,
Guillaume Faye
Abstract:
Gravitational-wave tails are due to the backscattering of linear waves onto the space-time curvature generated by the total mass of the matter source. The dominant tails correspond to quadratic non-linear interactions and arise at the one-and-a-half post-Newtonian (1.5PN) order in the gravitational waveform. The "tails-of-tails", which are cubic non-linear effects appearing at the 3PN order in the waveform, are also known. We derive here higher non-linear tail effects, namely those associated with quartic non-linear interactions or "tails-of-tails-of-tails", which are shown to arise at the 4.5PN order. As an application, we obtain at that order the complete coefficient in the total gravitational-wave energy flux of compact binary systems moving on circular orbits. Our result perfectly agrees with black-hole perturbation calculations in the limit of extreme mass ratio of the two compact objects.
Submitted 28 November, 2016; v1 submitted 26 July, 2016;
originally announced July 2016.