Robust, privacy-preserving, transparent, and auditable on-device blocklisting
Authors:
Kurt Thomas,
Sarah Meiklejohn,
Michael A. Specter,
Xiang Wang,
Xavier Llorà,
Stephan Somogyi,
David Kleidermacher
Abstract:
With the accelerated adoption of end-to-end encryption, there is an opportunity to re-architect security and anti-abuse primitives in a manner that preserves new privacy expectations. In this paper, we consider two novel protocols for on-device blocklisting that allow a client to determine whether an object (e.g., URL, document, image) is harmful based on threat information possessed by a so-called remote enforcer, in a way that is both privacy-preserving and trustworthy. Our protocols leverage a unique combination of private set intersection to promote privacy, cryptographic hashes to ensure resilience to false positives, cryptographic signatures to improve transparency, and Merkle inclusion proofs to ensure consistency and auditability. We benchmark our two protocols, one time-efficient and the other space-efficient, to demonstrate their practical use for applications such as email, messaging, and storage. We also highlight remaining challenges, such as privacy and censorship tensions that exist with logging or reporting. We consider our work to be a critical first step towards enabling complex, multi-stakeholder discussions on how best to provide on-device protections.
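To illustrate what a Merkle inclusion proof contributes to auditability, the following is a minimal sketch, not the authors' construction. It assumes the enforcer's blocklist entries are the leaves of a SHA-256 Merkle tree using RFC 6962-style domain separation (0x00 prefix for leaves, 0x01 for interior nodes), and that an unpaired node at the end of a level is promoted unchanged. The function names (`build_levels`, `inclusion_proof`, `verify`) are hypothetical. A client holding only the published root can check that a given entry is in the committed list without seeing the other entries.

```python
import hashlib

def leaf_hash(data: bytes) -> bytes:
    # Domain-separated leaf hash (0x00 prefix), in the style of RFC 6962.
    return hashlib.sha256(b"\x00" + data).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # Domain-separated interior-node hash (0x01 prefix).
    return hashlib.sha256(b"\x01" + left + right).digest()

def build_levels(entries):
    """Return every level of the tree, leaves first, root level last.
    Assumption: an unpaired node at the end of a level is promoted unchanged."""
    level = [leaf_hash(e) for e in entries]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(node_hash(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # promoted odd node
        level = nxt
        levels.append(level)
    return levels

def inclusion_proof(levels, index):
    """Collect (sibling_is_left, sibling_hash) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):  # no sibling means the node was promoted
            proof.append((sibling < index, level[sibling]))
        index //= 2
    return proof

def verify(entry, proof, root):
    # Recompute the path to the root from the entry and its audit path.
    h = leaf_hash(entry)
    for sibling_is_left, sib in proof:
        h = node_hash(sib, h) if sibling_is_left else node_hash(h, sib)
    return h == root
```

For example, after `levels = build_levels([b"a", b"b", b"c"])`, the root is `levels[-1][0]` and `verify(b"b", inclusion_proof(levels, 1), root)` returns `True`, while an entry that was never committed fails verification against the same proof. The domain-separation prefixes prevent an interior node from being passed off as a leaf.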
Submitted 5 April, 2023;
originally announced April 2023.
Robust Machine Learning Applied to Astronomical Datasets II: Quantifying Photometric Redshifts for Quasars Using Instance-Based Learning
Authors:
Nicholas M. Ball,
Robert J. Brunner,
Adam D. Myers,
Natalie E. Strand,
Stacey L. Alberts,
David Tcheng,
Xavier Llorà
Abstract:
We apply instance-based machine learning in the form of a k-nearest neighbor algorithm to the task of estimating photometric redshifts for 55,746 objects spectroscopically classified as quasars in the Fifth Data Release of the Sloan Digital Sky Survey. We compare the results obtained to those from an empirical color-redshift relation (CZR). In contrast to previously published results using CZRs, we find that the instance-based photometric redshifts are assigned with no regions of catastrophic failure. Remaining outliers are simply scattered about the ideal relation, in a similar manner to the pattern seen in the optical for normal galaxies at redshifts z < ~1. The instance-based algorithm is trained on a representative sample of the data and pseudo-blind-tested on the remaining unseen data. The variance between the photometric and spectroscopic redshifts is sigma^2 = 0.123 +/- 0.002 (compared to sigma^2 = 0.265 +/- 0.006 for the CZR), and 54.9 +/- 0.7%, 73.3 +/- 0.6%, and 80.7 +/- 0.3% of the objects are within delta z < 0.1, 0.2, and 0.3, respectively. We also match our sample to the Second Data Release of the Galaxy Evolution Explorer (GALEX) legacy data, and the resulting 7,642 objects show a further improvement, giving a variance of sigma^2 = 0.054 +/- 0.005, with 70.8 +/- 1.2%, 85.8 +/- 1.0%, and 90.8 +/- 0.7% of objects within delta z < 0.1, 0.2, and 0.3. We show that this improvement is indeed due to the extra information provided by GALEX: training on the same dataset using SDSS photometry alone gives a variance of sigma^2 = 0.090 +/- 0.007. Each set of results represents a realistic standard for application to further datasets for which the spectra are representative.
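The following is a minimal sketch of k-nearest-neighbor photometric-redshift estimation and the two summary statistics quoted above, not the authors' pipeline. It assumes Euclidean distance in color space, an unweighted mean of the neighbors' spectroscopic redshifts, and a hypothetical default of k = 5; the paper's distance metric, weighting, and choice of k may differ. Here sigma^2 is taken as the mean squared residual between photometric and spectroscopic redshifts, and the delta z thresholds are applied to the absolute residual; both are assumptions about how the quoted figures are defined. The function names are hypothetical.

```python
import math

def knn_predict(train_colors, train_z, colors, k=5):
    """Estimate a redshift as the mean spectroscopic redshift of the
    k nearest training objects in color space (Euclidean distance)."""
    dists = sorted(
        (math.dist(colors, c), z) for c, z in zip(train_colors, train_z)
    )
    nearest = dists[:k]
    return sum(z for _, z in nearest) / len(nearest)

def evaluate(z_phot, z_spec, thresholds=(0.1, 0.2, 0.3)):
    """Return (sigma^2, {threshold: fraction with |delta z| < threshold}).
    Assumption: sigma^2 is the mean squared residual."""
    residuals = [p - s for p, s in zip(z_phot, z_spec)]
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    fractions = {t: sum(abs(r) < t for r in residuals) / n for t in thresholds}
    return sigma2, fractions
```

In the setup the abstract describes, `knn_predict` would be fit on the representative training sample and applied to each held-out object, after which `evaluate` summarizes the held-out predictions against their spectroscopic redshifts.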
Submitted 22 March, 2007; v1 submitted 17 December, 2006;
originally announced December 2006.