-
Less is More: A privacy-respecting Android malware classifier using Federated Learning
Authors:
Rafa Gálvez,
Veelasha Moonsamy,
Claudia Diaz
Abstract:
In this paper we present LiM ("Less is More"), a malware classification framework that leverages Federated Learning to detect and classify malicious apps in a privacy-respecting manner. Information about newly installed apps is kept locally on users' devices, so that the provider cannot infer which apps were installed by users. At the same time, input from all users is taken into account in the federated learning process, and they all benefit from better classification performance. A key challenge of this setting is that users do not have access to the ground truth (i.e., they cannot correctly identify whether an app is malicious). To tackle this, LiM uses a safe semi-supervised ensemble that maximizes classification accuracy with respect to a baseline classifier trained by the service provider (i.e., the cloud). We implement LiM and show that the cloud server has an F1 score of 95%, while clients have perfect recall with only 1 false positive in more than 100 apps, using a dataset of 25K clean apps and 25K malicious apps, 200 users and 50 rounds of federation. Furthermore, we conduct a security analysis and demonstrate that LiM is robust against both poisoning attacks by adversaries who control half of the clients and inference attacks performed by an honest-but-curious cloud server. Further experiments with MaMaDroid's dataset confirm resistance against poisoning attacks and a performance improvement due to the federation.
Submitted 16 June, 2021; v1 submitted 16 July, 2020;
originally announced July 2020.
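The abstract does not spell out how client updates are combined. As a point of reference only, the sketch below shows generic FedAvg-style aggregation (a sample-weighted mean of client parameter vectors) together with the precision/recall/F1 arithmetic behind figures such as "perfect recall with only 1 false positive". It is not LiM's safe semi-supervised ensemble; the function names `federated_average` and `precision_recall_f1` and all example numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Sample-weighted mean of client parameter vectors (generic FedAvg-style step).

    Hypothetical helper for illustration; this is not the LiM aggregation protocol.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must send parameter vectors of the same length")
        for i, w in enumerate(weights):
            aggregated[i] += w * size / total  # each client contributes in proportion to its data
    return aggregated


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from confusion-matrix counts (0.0 where undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: two clients, the second holding three times as much data.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
# Hypothetical counts: no missed malware (fn=0) and a single false positive.
print(precision_recall_f1(tp=50, fp=1, fn=0))
```

With these hypothetical counts, recall is exactly 1.0 while precision is 50/51 (about 0.98), which is why a single false positive still leaves recall "perfect".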
-
Modeling the Gaia Color-Magnitude Diagram with Bayesian Neural Flows to Constrain Distance Estimates
Authors:
Miles D. Cranmer,
Richard Galvez,
Lauren Anderson,
David N. Spergel,
Shirley Ho
Abstract:
We demonstrate an algorithm for learning a flexible color-magnitude diagram from noisy parallax and photometry measurements using a normalizing flow, a deep neural network capable of learning an arbitrary multi-dimensional probability distribution. We present a catalog of 640M photometric distance posteriors to nearby stars derived from this data-driven model using Gaia DR2 photometry and parallaxes. Dust estimation and dereddening are done iteratively inside the model and without prior distance information, using the Bayestar map. The signal-to-noise (precision) of distance measurements improves on average by more than 48% over the raw Gaia data, and we also demonstrate how the accuracy of distances has improved over other models, especially in the noisy-parallax regime. Applications are discussed, including significantly improved Milky Way disk separation and substructure detection. We conclude with a discussion of future work, which exploits the normalizing flow architecture to allow us to exactly marginalize over missing photometry, enabling the inclusion of many surveys without losing coverage.
Submitted 21 August, 2019;
originally announced August 2019.
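A normalizing flow defines a density through an invertible map f from data x to a base variable z with a simple distribution, via the change-of-variables formula log p(x) = log p_Z(f(x)) + log|det ∂f/∂x|. The one-dimensional sketch below, assuming a standard normal base and two toy invertible layers (an affine map and an inverse sinh), illustrates only that formula. The model in the paper is a deep network over multi-dimensional color-magnitude data; the layer and function names here are hypothetical.

```python
import math
from typing import Callable, List, Tuple

# A layer maps a value toward the base space and returns (new_value, log|d new / d old|).
Layer = Callable[[float], Tuple[float, float]]


def affine_inverse(shift: float, scale: float) -> Layer:
    """Inverse of the generative map x = shift + scale * z (hypothetical toy layer)."""
    if scale == 0.0:
        raise ValueError("scale must be nonzero for the map to be invertible")

    def layer(x: float) -> Tuple[float, float]:
        return (x - shift) / scale, -math.log(abs(scale))

    return layer


def sinh_inverse() -> Layer:
    """Inverse of the generative map x = sinh(z): z = asinh(x), dz/dx = 1 / sqrt(1 + x^2)."""

    def layer(x: float) -> Tuple[float, float]:
        return math.asinh(x), -0.5 * math.log1p(x * x)

    return layer


def flow_log_prob(x: float, layers: List[Layer]) -> float:
    """Exact log-density of x under the flow, with a standard normal base distribution.

    `layers` are applied in order, mapping data space to base space.
    """
    z, log_det = x, 0.0
    for layer in layers:
        z, layer_log_det = layer(z)
        log_det += layer_log_det  # Jacobian terms of a composition add in log space
    log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
    return log_base + log_det


# Hypothetical two-layer flow evaluated at one point.
layers = [affine_inverse(shift=1.0, scale=2.0), sinh_inverse()]
print(flow_log_prob(0.5, layers))
```

Stacking layers simply adds their log-Jacobian terms, which is why a flow can be made deep while still yielding an exact, normalized density rather than an approximation.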
-
A Machine Learning Dataset Prepared From the NASA Solar Dynamics Observatory Mission
Authors:
Richard Galvez,
David F. Fouhey,
Meng **,
Alexandre Szenicer,
Andrés Muñoz-Jaramillo,
Mark C. M. Cheung,
Paul J. Wright,
Monica G. Bobra,
Yang Liu,
James Mason,
Rajat Thomas
Abstract:
In this paper we present a curated dataset from the NASA Solar Dynamics Observatory (SDO) mission in a format suitable for machine learning research. Beginning from level 1 scientific products, we have applied various instrumental corrections, downsampled to manageable spatial and temporal resolutions, and synchronized observations spatially and temporally. We illustrate the use of this dataset with two example applications: forecasting future EVE irradiance from present EVE irradiance and translating HMI observations into AIA observations. For each application we provide metrics and baselines for future model comparison. We anticipate this curated dataset will facilitate machine learning research in heliophysics and the physical sciences generally, increasing the scientific return of the SDO mission. This work is a direct result of the 2018 NASA Frontier Development Laboratory Program. Please see the appendix for access to the dataset.
Submitted 11 March, 2019;
originally announced March 2019.
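The abstract states that observations were synchronized temporally but does not describe the procedure. One common approach, assumed here purely for illustration, is to pair each reference timestamp with the nearest observation from another instrument and drop pairs whose gap exceeds a tolerance. The function `match_nearest`, the tolerance and the example timestamps are hypothetical.

```python
import math
from bisect import bisect_left
from typing import List, Optional, Sequence


def match_nearest(reference: Sequence[float], candidates: Sequence[float],
                  tolerance: float) -> List[Optional[int]]:
    """For each reference time, return the index of the closest candidate time,
    or None if the closest one is more than `tolerance` away.

    `candidates` must be sorted in ascending order. Ties go to the earlier candidate.
    """
    matches: List[Optional[int]] = []
    for t in reference:
        i = bisect_left(candidates, t)  # first candidate >= t
        best: Optional[int] = None
        best_gap = math.inf
        for j in (i - 1, i):  # the nearest neighbour is just before t or at/after t
            if 0 <= j < len(candidates):
                gap = abs(candidates[j] - t)
                if gap < best_gap:
                    best, best_gap = j, gap
        matches.append(best if best_gap <= tolerance else None)
    return matches


# Hypothetical timestamps in minutes for two instruments with different cadences.
reference_times = [0.0, 12.0, 30.0, 47.0]
other_times = [0.5, 11.0, 24.0, 36.0, 48.0]
print(match_nearest(reference_times, other_times, tolerance=1.5))  # [0, 1, None, 4]
```

Binary search keeps each lookup at O(log n), and returning None instead of a distant match makes gaps in one instrument's record explicit rather than silently pairing unrelated frames.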