-
Pushing the Boundaries of Private, Large-Scale Query Answering
Authors:
Brendan Avent,
Aleksandra Korolova
Abstract:
We address the problem of efficiently and effectively answering large numbers of queries on a sensitive dataset while ensuring differential privacy (DP). We separately analyze this problem in two distinct settings, grounding our work in a state-of-the-art DP mechanism for large-scale query answering: the Relaxed Adaptive Projection (RAP) mechanism.
The first setting is a classic setting in DP li…
▽ More
We address the problem of efficiently and effectively answering large numbers of queries on a sensitive dataset while ensuring differential privacy (DP). We separately analyze this problem in two distinct settings, grounding our work in a state-of-the-art DP mechanism for large-scale query answering: the Relaxed Adaptive Projection (RAP) mechanism.
The first setting is a classic setting in DP literature where all queries are known to the mechanism in advance. Within this setting, we identify challenges in the RAP mechanism's original analysis, then overcome them with an enhanced implementation and analysis. We then extend the capabilities of the RAP mechanism to be able to answer a more general and powerful class of queries (r-of-k thresholds) than previously considered. Empirically evaluating this class, we find that the mechanism is able to answer orders of magnitude larger sets of queries than prior works, and does so quickly and with high utility.
We then define a second setting motivated by real-world considerations and whose definition is inspired by work in the field of machine learning. In this new setting, a mechanism is only given partial knowledge of queries that will be posed in the future, and it is expected to answer these future-posed queries with high utility. We formally define this setting and how to measure a mechanism's utility within it. We then comprehensively empirically evaluate the RAP mechanism's utility within this new setting. From this evaluation, we find that even with weak partial knowledge of the future queries that will be posed, the mechanism is able to efficiently and effectively answer arbitrary queries posed in the future. Taken together, the results from these two settings advance the state of the art on differentially private large-scale query answering.
△ Less
Submitted 9 February, 2023;
originally announced February 2023.
-
Advances and Open Problems in Federated Learning
Authors:
Peter Kairouz,
H. Brendan McMahan,
Brendan Avent,
Aurélien Bellet,
Mehdi Bennis,
Arjun Nitin Bhagoji,
Kallista Bonawitz,
Zachary Charles,
Graham Cormode,
Rachel Cummings,
Rafael G. L. D'Oliveira,
Hubert Eichner,
Salim El Rouayheb,
David Evans,
Josh Gardner,
Zachary Garrett,
Adrià Gascón,
Badih Ghazi,
Phillip B. Gibbons,
Marco Gruteser,
Zaid Harchaoui,
Chaoyang He,
Lie He,
Zhouyuan Huo,
Ben Hutchinson
, et al. (34 additional authors not shown)
Abstract:
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while kee** the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs re…
▽ More
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while kee** the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.
△ Less
Submitted 8 March, 2021; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Automatic Discovery of Privacy-Utility Pareto Fronts
Authors:
Brendan Avent,
Javier Gonzalez,
Tom Diethe,
Andrei Paleyes,
Borja Balle
Abstract:
Differential privacy is a mathematical framework for privacy-preserving data analysis. Changing the hyperparameters of a differentially private algorithm allows one to trade off privacy and utility in a principled way. Quantifying this trade-off in advance is essential to decision-makers tasked with deciding how much privacy can be provided in a particular application while maintaining acceptable…
▽ More
Differential privacy is a mathematical framework for privacy-preserving data analysis. Changing the hyperparameters of a differentially private algorithm allows one to trade off privacy and utility in a principled way. Quantifying this trade-off in advance is essential to decision-makers tasked with deciding how much privacy can be provided in a particular application while maintaining acceptable utility. Analytical utility guarantees offer a rigorous tool to reason about this trade-off, but are generally only available for relatively simple problems. For more complex tasks, such as training neural networks under differential privacy, the utility achieved by a given algorithm can only be measured empirically. This paper presents a Bayesian optimization methodology for efficiently characterizing the privacy--utility trade-off of any differentially private algorithm using only empirical measurements of its utility. The versatility of our method is illustrated on a number of machine learning tasks involving multiple models, optimizers, and datasets.
△ Less
Submitted 21 July, 2020; v1 submitted 26 May, 2019;
originally announced May 2019.
-
The Power of The Hybrid Model for Mean Estimation
Authors:
Brendan Avent,
Yatharth Dubey,
Aleksandra Korolova
Abstract:
We explore the power of the hybrid model of differential privacy (DP), in which some users desire the guarantees of the local model of DP and others are content with receiving the trusted-curator model guarantees. In particular, we study the utility of hybrid model estimators that compute the mean of arbitrary real-valued distributions with bounded support. When the curator knows the distribution'…
▽ More
We explore the power of the hybrid model of differential privacy (DP), in which some users desire the guarantees of the local model of DP and others are content with receiving the trusted-curator model guarantees. In particular, we study the utility of hybrid model estimators that compute the mean of arbitrary real-valued distributions with bounded support. When the curator knows the distribution's variance, we design a hybrid estimator that, for realistic datasets and parameter settings, achieves a constant factor improvement over natural baselines. We then analytically characterize how the estimator's utility is parameterized by the problem setting and parameter choices. When the distribution's variance is unknown, we design a heuristic hybrid estimator and analyze how it compares to the baselines. We find that it often performs better than the baselines, and sometimes almost as well as the known-variance estimator. We then answer the question of how our estimator's utility is affected when users' data are not drawn from the same distribution, but rather from distributions dependent on their trust model preference. Concretely, we examine the implications of the two groups' distributions diverging and show that in some cases, our estimators maintain fairly high utility. We then demonstrate how our hybrid estimator can be incorporated as a sub-component in more complex, higher-dimensional applications. Finally, we propose a new privacy amplification notion for the hybrid model that emerges due to interaction between the groups, and derive corresponding amplification results for our hybrid estimators.
△ Less
Submitted 15 July, 2020; v1 submitted 29 November, 2018;
originally announced November 2018.
-
BLENDER: Enabling Local Search with a Hybrid Differential Privacy Model
Authors:
Brendan Avent,
Aleksandra Korolova,
David Zeber,
Torgeir Hovden,
Benjamin Livshits
Abstract:
We propose a hybrid model of differential privacy that considers a combination of regular and opt-in users who desire the differential privacy guarantees of the local privacy model and the trusted curator model, respectively. We demonstrate that within this model, it is possible to design a new type of blended algorithm for the task of privately computing the head of a search log. This blended app…
▽ More
We propose a hybrid model of differential privacy that considers a combination of regular and opt-in users who desire the differential privacy guarantees of the local privacy model and the trusted curator model, respectively. We demonstrate that within this model, it is possible to design a new type of blended algorithm for the task of privately computing the head of a search log. This blended approach provides significant improvements in the utility of obtained data compared to related work while providing users with their desired privacy guarantees. Specifically, on two large search click data sets, comprising 1.75 and 16 GB respectively, our approach attains NDCG values exceeding 95% across a range of privacy budget values.
△ Less
Submitted 21 November, 2019; v1 submitted 2 May, 2017;
originally announced May 2017.