
Showing 1–50 of 64 results for author: Kairouz, P

  1. arXiv:2407.03496 [pdf, other]

    cs.CR cs.DB

    Releasing Large-Scale Human Mobility Histograms with Differential Privacy

    Authors: Christopher Bian, Albert Cheu, Yannis Guzman, Marco Gruteser, Peter Kairouz, Ryan McKenna, Edo Roth

    Abstract: Environmental Insights Explorer (EIE) is a Google product that reports aggregate statistics about human mobility, including various methods of transit used by people across roughly 50,000 regions globally. These statistics are used to estimate carbon emissions and provided to policymakers to inform their decisions on transportation policy and infrastructure. Due to the inherent sensitivity of this…

    Submitted 3 July, 2024; originally announced July 2024.

  2. arXiv:2406.09073 [pdf, other]

    cs.LG

    Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition

    Authors: Eleni Triantafillou, Peter Kairouz, Fabian Pedregosa, Jamie Hayes, Meghdad Kurmanji, Kairan Zhao, Vincent Dumoulin, Julio Jacques Junior, Ioannis Mitliagkas, Jun Wan, Lisheng Sun Hosoya, Sergio Escalera, Gintare Karolina Dziugaite, Peter Triantafillou, Isabelle Guyon

    Abstract: We present the findings of the first NeurIPS competition on unlearning, which sought to stimulate the development of novel algorithms and initiate discussions on formal and robust evaluation methodologies. The competition was highly successful: nearly 1,200 teams from across the world participated, and a wealth of novel, imaginative solutions with different characteristics were contributed. In thi…

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2405.05175 [pdf, other]

    cs.CR cs.CL cs.LG

    Air Gap: Protecting Privacy-Conscious Conversational Agents

    Authors: Eugene Bagdasaryan, Ren Yi, Sahra Ghalebikesabi, Peter Kairouz, Marco Gruteser, Sewoong Oh, Borja Balle, Daniel Ramage

    Abstract: The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns. While these agents excel at understanding and acting on context, this capability can be exploited by malicious actors. We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into re…

    Submitted 8 May, 2024; originally announced May 2024.

  4. arXiv:2405.02341 [pdf, other]

    cs.CR cs.LG

    Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy

    Authors: Wei-Ning Chen, Berivan Isik, Peter Kairouz, Albert No, Sewoong Oh, Zheng Xu

    Abstract: We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: firstly, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L_2$ geometry, resulting in suboptimal leading constants in mean square…

    Submitted 1 May, 2024; originally announced May 2024.

  5. arXiv:2404.11607 [pdf, other]

    cs.DS

    Private federated discovery of out-of-vocabulary words for Gboard

    Authors: Ziteng Sun, Peter Kairouz, Haicheng Sun, Adria Gascon, Ananda Theertha Suresh

    Abstract: The vocabulary of language models in Gboard, Google's keyboard application, plays a crucial role in improving user experience. One way to improve the vocabulary is to discover frequently typed out-of-vocabulary (OOV) words on user devices. This task requires strong privacy protection due to the sensitive nature of user input data. In this report, we present a private OOV discovery algorithm for G…

    Submitted 18 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  6. arXiv:2404.10764 [pdf, other]

    cs.CR cs.LG

    Confidential Federated Computations

    Authors: Hubert Eichner, Daniel Ramage, Kallista Bonawitz, Dzmitry Huba, Tiziano Santoro, Brett McLarnon, Timon Van Overveldt, Nova Fallen, Peter Kairouz, Albert Cheu, Katharine Daly, Adria Gascon, Marco Gruteser, Brendan McMahan

    Abstract: Federated Learning and Analytics (FLA) have seen widespread adoption by technology platforms for processing sensitive on-device data. However, basic FLA systems have privacy limitations: they do not necessarily require anonymization mechanisms like differential privacy (DP), and provide limited protections against a potentially malicious service provider. Adding DP to a basic FLA system currently…

    Submitted 16 April, 2024; originally announced April 2024.

  7. arXiv:2404.01041 [pdf, other]

    cs.LG cs.AI cs.CR cs.MA

    Can LLMs get help from other LLMs without revealing private information?

    Authors: Florian Hartmann, Duc-Hieu Tran, Peter Kairouz, Victor Cărbune, Blaise Aguera y Arcas

    Abstract: Cascades are a common type of machine learning systems in which a large, remote model can be queried if a local model is not able to accurately label a user's data by itself. Serving stacks for large language models (LLMs) increasingly use cascades due to their ability to preserve task performance while dramatically reducing inference costs. However, applying cascade systems in situations where th…

    Submitted 2 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  8. arXiv:2402.13659 [pdf, other]

    cs.CR cs.CL

    Privacy-Preserving Instructions for Aligning Large Language Models

    Authors: Da Yu, Peter Kairouz, Sewoong Oh, Zheng Xu

    Abstract: Service providers of large language model (LLM) applications collect user instructions in the wild and use them in further aligning LLMs with users' intentions. These instructions, which potentially contain sensitive information, are annotated by human workers in the process. This poses a new privacy risk not addressed by the typical private optimization. To this end, we propose using synthetic in…

    Submitted 2 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: ICML 2024. Code available at https://github.com/google-research/google-research/tree/master/dp_instructions

  9. arXiv:2310.09266 [pdf, other]

    cs.CR cs.CL cs.LG

    User Inference Attacks on Large Language Models

    Authors: Nikhil Kandpal, Krishna Pillutla, Alina Oprea, Peter Kairouz, Christopher A. Choquette-Choo, Zheng Xu

    Abstract: Fine-tuning is a common and effective method for tailoring large language models (LLMs) to specialized tasks and applications. In this paper, we study the privacy implications of fine-tuning LLMs on user data. To this end, we consider a realistic threat model, called user inference, wherein an attacker infers whether or not a user's data was used for fine-tuning. We design attacks for performing u…

    Submitted 23 February, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: v2 contains experiments on additional datasets and differential privacy

  10. arXiv:2307.13347 [pdf, other]

    cs.DS cs.CR cs.IT

    Federated Heavy Hitter Recovery under Linear Sketching

    Authors: Adria Gascon, Peter Kairouz, Ziteng Sun, Ananda Theertha Suresh

    Abstract: Motivated by real-life deployments of multi-round federated analytics with secure aggregation, we investigate the fundamental communication-accuracy tradeoffs of the heavy hitter discovery and approximate (open-domain) histogram problems under a linear sketching constraint. We propose efficient algorithms based on local subsampling and invertible Bloom lookup tables (IBLTs). We also show that our…

    Submitted 25 July, 2023; originally announced July 2023.

  11. arXiv:2307.10999 [pdf, other]

    cs.LG stat.ML

    Private Federated Learning with Autotuned Compression

    Authors: Enayat Ullah, Christopher A. Choquette-Choo, Peter Kairouz, Sewoong Oh

    Abstract: We propose new techniques for reducing communication in private federated learning without the need for setting or tuning compression rates. Our on-the-fly methods automatically adjust the compression rate based on the error induced during training, while maintaining provable privacy guarantees through the use of secure aggregation and differential privacy. Our techniques are provably instance-opt…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted to ICML 2023

  12. arXiv:2306.14793 [pdf, other]

    cs.CR

    Private Federated Learning in Gboard

    Authors: Yuanbo Zhang, Daniel Ramage, Zheng Xu, Yanxiang Zhang, Shumin Zhai, Peter Kairouz

    Abstract: This white paper describes recent advances in Gboard (Google Keyboard)'s use of federated learning, DP-Follow-the-Regularized-Leader (DP-FTRL) algorithm, and secure aggregation techniques to train machine learning (ML) models for suggestion, prediction and correction intelligence from many users' typing data. Gboard's investment in those privacy technologies allows users' typing data to be processe…

    Submitted 26 June, 2023; originally announced June 2023.

  13. arXiv:2306.09396 [pdf, other]

    cs.DS cs.LG

    Private Federated Frequency Estimation: Adapting to the Hardness of the Instance

    Authors: Jingfeng Wu, Wennan Zhu, Peter Kairouz, Vladimir Braverman

    Abstract: In federated frequency estimation (FFE), multiple clients work together to estimate the frequencies of their collective data by communicating with a server that respects the privacy constraints of Secure Summation (SecSum), a cryptographic multi-party computation protocol that ensures that the server can only access the sum of client-held vectors. For single-round FFE, it is known that count sketc…

    Submitted 2 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 camera ready version
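
    The abstract relies on count sketches being linear: summing per-client sketches under SecSum yields exactly the sketch of the combined data, so the server never needs individual vectors. A minimal Python sketch of that property, assuming all clients share the hash seeds; the `CountSketch` class and `federated_sketch` helper are hypothetical names, a plain sum stands in for the cryptographic secure summation, and the paper's instance-adaptive algorithms are not implemented.

```python
import hashlib
import statistics


def _hash(seed: int, item: str, mod: int) -> int:
    # Deterministic hash keyed by `seed`; clients are assumed to share the seeds.
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % mod


class CountSketch:
    def __init__(self, rows: int, width: int) -> None:
        self.rows, self.width = rows, width
        self.table = [[0] * width for _ in range(rows)]

    def _cell(self, r: int, item: str) -> tuple[int, int]:
        # Bucket index and +/-1 sign for `item` in row r.
        bucket = _hash(2 * r, item, self.width)
        sign = 1 if _hash(2 * r + 1, item, 2) else -1
        return bucket, sign

    def add(self, item: str, count: int = 1) -> None:
        for r in range(self.rows):
            bucket, sign = self._cell(r, item)
            self.table[r][bucket] += sign * count

    def merge(self, other: "CountSketch") -> None:
        # Linearity: sketch(A) + sketch(B) equals the sketch of A and B combined.
        for r in range(self.rows):
            for b in range(self.width):
                self.table[r][b] += other.table[r][b]

    def estimate(self, item: str) -> float:
        # Median over rows of the sign-corrected bucket values.
        values = []
        for r in range(self.rows):
            bucket, sign = self._cell(r, item)
            values.append(sign * self.table[r][bucket])
        return statistics.median(values)


def federated_sketch(client_data: list[list[str]], rows: int = 5,
                     width: int = 256) -> CountSketch:
    server = CountSketch(rows, width)
    for items in client_data:
        local = CountSketch(rows, width)  # built on the client device
        for item in items:
            local.add(item)
        server.merge(local)  # plain sum standing in for SecSum
    return server
```

    Because only the summed table reaches the server, the sketch width is the communication knob that trades accuracy for bandwidth.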

  14. arXiv:2305.18465 [pdf, other]

    cs.LG cs.CR

    Federated Learning of Gboard Language Models with Differential Privacy

    Authors: Zheng Xu, Yanxiang Zhang, Galen Andrew, Christopher A. Choquette-Choo, Peter Kairouz, H. Brendan McMahan, Jesse Rosenstock, Yuanbo Zhang

    Abstract: We train language models (LMs) with federated learning (FL) and differential privacy (DP) in the Google Keyboard (Gboard). We apply the DP-Follow-the-Regularized-Leader (DP-FTRL) algorithm (Kairouz et al., 2021) to achieve meaningfully formal DP guarantees without requiring uniform sampling of client devices. To provide favorable privacy-utility trade-offs, we introduce a new client participation crit…

    Submitted 17 July, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: ACL industry track; v2 updating SecAgg details

  15. arXiv:2305.18447 [pdf, other]

    cs.LG cs.CR cs.IT math.ST

    Unleashing the Power of Randomization in Auditing Differentially Private ML

    Authors: Krishna Pillutla, Galen Andrew, Peter Kairouz, H. Brendan McMahan, Alina Oprea, Sewoong Oh

    Abstract: We present a rigorous methodology for auditing differentially private machine learning algorithms by adding multiple carefully designed examples called canaries. We take a first principles approach based on three key components. First, we introduce Lifted Differential Privacy (LiDP) that expands the definition of differential privacy to handle randomized datasets. This gives us the freedom to desi…

    Submitted 28 May, 2023; originally announced May 2023.

  16. arXiv:2304.06929 [pdf]

    cs.CR

    Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment

    Authors: Rachel Cummings, Damien Desfontaines, David Evans, Roxana Geambasu, Yangsibo Huang, Matthew Jagielski, Peter Kairouz, Gautam Kamath, Sewoong Oh, Olga Ohrimenko, Nicolas Papernot, Ryan Rogers, Milan Shen, Shuang Song, Weijie Su, Andreas Terzis, Abhradeep Thakurta, Sergei Vassilvitskii, Yu-Xiang Wang, Li Xiong, Sergey Yekhanin, Da Yu, Huanyu Zhang, Wanrong Zhang

    Abstract: In this article, we present a detailed review of current practices and state-of-the-art methodologies in the field of differential privacy (DP), with a focus on advancing DP's deployment in real-world applications. Key points and high-level contents of the article originated from the discussions at "Differential Privacy (DP): Challenges Towards the Next Frontier," a workshop held in July 20…

    Submitted 12 March, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

  17. arXiv:2304.01541 [pdf, other]

    stat.ML cs.CR cs.LG

    Privacy Amplification via Compression: Achieving the Optimal Privacy-Accuracy-Communication Trade-off in Distributed Mean Estimation

    Authors: Wei-Ning Chen, Dan Song, Ayfer Ozgur, Peter Kairouz

    Abstract: Privacy and communication constraints are two major bottlenecks in federated learning (FL) and analytics (FA). We study the optimal accuracy of mean and frequency estimation (canonical models for FL and FA respectively) under joint communication and $(\varepsilon, \delta)$-differential privacy (DP) constraints. We show that in order to achieve the optimal error under $(\varepsilon, \delta)$-DP, it is suffic…

    Submitted 4 April, 2023; originally announced April 2023.

  18. arXiv:2303.18086 [pdf, other]

    cs.CR cs.DB

    Differentially Private Stream Processing at Scale

    Authors: Bing Zhang, Vadym Doroshenko, Peter Kairouz, Thomas Steinke, Abhradeep Thakurta, Ziyin Ma, Eidan Cohen, Himani Apte, Jodi Spacek

    Abstract: We design, to the best of our knowledge, the first differentially private (DP) stream aggregation processing system at scale. Our system -- Differential Privacy SQL Pipelines (DP-SQLP) -- is built using a streaming framework similar to Spark streaming, and is built on top of the Spanner database and the F1 query engine from Google. Towards designing DP-SQLP we make both algorithmic and systemic…

    Submitted 4 April, 2024; v1 submitted 31 March, 2023; originally announced March 2023.

  19. arXiv:2302.03098 [pdf, other]

    cs.LG cs.CR

    One-shot Empirical Privacy Estimation for Federated Learning

    Authors: Galen Andrew, Peter Kairouz, Sewoong Oh, Alina Oprea, H. Brendan McMahan, Vinith M. Suriyakumar

    Abstract: Privacy estimation techniques for differentially private (DP) algorithms are useful for comparing against analytical bounds, or to empirically measure privacy loss in settings where known analytical bounds are not tight. However, existing privacy auditing techniques usually make strong assumptions on the adversary (e.g., knowledge of intermediate model iterates or the training data distribution),…

    Submitted 18 April, 2024; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Final revision, oral presentation at ICLR 2024

  20. arXiv:2207.09916 [pdf, other]

    cs.CR cs.IT cs.LG stat.ML

    The Poisson binomial mechanism for secure and private federated learning

    Authors: Wei-Ning Chen, Ayfer Özgür, Peter Kairouz

    Abstract: We introduce the Poisson Binomial mechanism (PBM), a discrete differential privacy mechanism for distributed mean estimation (DME) with applications to federated learning and analytics. We provide a tight analysis of its privacy guarantees, showing that it achieves the same privacy-accuracy trade-offs as the continuous Gaussian mechanism. Our analysis is based on a novel bound on the Rényi diverge…

    Submitted 9 July, 2022; originally announced July 2022.

    Comments: 25 pages
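
    The PBM replaces additive noise with a randomized integer encoding, so only integers enter secure aggregation and the server debiases their sum. A minimal Python sketch, assuming the encoding p(x) = 1/2 + θx/c for x in [-c, c] with 0 < θ ≤ 1/4 and m binomial trials per coordinate; `pbm_encode` and `pbm_decode` are hypothetical names, and θ and m are not calibrated to any privacy level because the paper's Rényi-divergence analysis is not reproduced here.

```python
import numpy as np


def pbm_encode(x: np.ndarray, c: float, theta: float, m: int,
               rng: np.random.Generator) -> np.ndarray:
    # Clip to [-c, c], then map each coordinate to a success probability
    # in [1/2 - theta, 1/2 + theta] (assumed encoding, 0 < theta <= 1/4).
    x = np.clip(x, -c, c)
    p = 0.5 + theta * x / c
    # Integer-valued output, suitable for modular secure aggregation.
    return rng.binomial(m, p)


def pbm_decode(z_sum: np.ndarray, n: int, c: float, theta: float,
               m: int) -> np.ndarray:
    # E[z_sum] = n*m/2 + (m*theta/c) * sum_i x_i, so invert that affine map.
    return (c / (m * theta)) * (z_sum - n * m / 2)


# Usage: 50 clients, 3 coordinates each.
rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, size=(50, 3))
z_sum = sum(pbm_encode(x, 1.0, 0.25, 10_000, rng) for x in xs)
estimate = pbm_decode(z_sum, 50, 1.0, 0.25, 10_000)  # close to xs.sum(axis=0)
```

    Larger m lowers the estimator's variance (roughly c²/(4mθ²) per client) at the cost of more bits per coordinate.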

  21. arXiv:2206.03008 [pdf, other]

    cs.LG cs.CR

    Algorithms for bounding contribution for histogram estimation under user-level privacy

    Authors: Yuhan Liu, Ananda Theertha Suresh, Wennan Zhu, Peter Kairouz, Marco Gruteser

    Abstract: We study the problem of histogram estimation under user-level differential privacy, where the goal is to preserve the privacy of all entries of any single user. We consider the heterogeneous scenario where the quantity of data can be different for each user. In this scenario, the amount of noise injected into the histogram to obtain differential privacy is proportional to the maximum user contribu…

    Submitted 30 June, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: 32 pages, ICML 2023
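
    Since the noise scale tracks the maximum user contribution, the simplest control is to cap each user at k contributions and calibrate Laplace noise to k. A minimal Python sketch of that clip-then-noise baseline over a fixed, known domain; `bounded_histogram` and `laplace` are hypothetical helpers, and this is not one of the paper's contribution-bounding algorithms, which choose the bound more carefully.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def bounded_histogram(user_items: dict[str, list[str]], domain: list[str],
                      k: int, epsilon: float,
                      rng: random.Random) -> dict[str, float]:
    counts = {b: 0 for b in domain}
    for items in user_items.values():
        # Keep at most k contributions per user (uniform subsample), so adding
        # or removing one user changes the histogram by at most k in L1 norm.
        kept = items if len(items) <= k else rng.sample(items, k)
        for b in kept:
            if b in counts:  # ignore items outside the fixed domain
                counts[b] += 1
    # Laplace noise calibrated to L1 sensitivity k; every domain bin is
    # released, including empty ones, so the set of keys reveals nothing.
    return {b: c + laplace(k / epsilon, rng) for b, c in counts.items()}
```

    A small k discards data from heavy users (bias), while a large k inflates the noise for everyone; that tension is the trade-off the abstract describes.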

  22. arXiv:2203.03761 [pdf, other]

    cs.LG stat.ML

    The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning

    Authors: Wei-Ning Chen, Christopher A. Choquette-Choo, Peter Kairouz, Ananda Theertha Suresh

    Abstract: We consider the problem of training a $d$-dimensional model with distributed differential privacy (DP) where secure aggregation (SecAgg) is used to ensure that the server only sees the noisy sum of $n$ model updates in every training round. Taking into account the constraints imposed by SecAgg, we characterize the fundamental communication cost required to obtain the best accuracy achievable under…

    Submitted 7 March, 2022; originally announced March 2022.

  23. arXiv:2201.04782 [pdf, other]

    cs.CR cs.LG

    Privacy-Utility Trades in Crowdsourced Signal Map Obfuscation

    Authors: Jiang Zhang, Lillian Clark, Matthew Clark, Konstantinos Psounis, Peter Kairouz

    Abstract: Cellular providers and data aggregating companies crowdsource cellular signal strength measurements from user devices to generate signal maps, which can be used to improve network performance. Recognizing that this data collection may be at odds with growing awareness of privacy concerns, we consider obfuscating such data before the data leaves the mobile device. The goal is to increase privacy suc…

    Submitted 12 January, 2022; originally announced January 2022.

  24. arXiv:2111.02356 [pdf, other]

    cs.CR cs.LG

    Towards Sparse Federated Analytics: Location Heatmaps under Distributed Differential Privacy with Secure Aggregation

    Authors: Eugene Bagdasaryan, Peter Kairouz, Stefan Mellem, Adrià Gascón, Kallista Bonawitz, Deborah Estrin, Marco Gruteser

    Abstract: We design a scalable algorithm to privately generate location heatmaps over decentralized data from millions of user devices. It aims to ensure differential privacy before data becomes visible to a service provider while maintaining high data accuracy and minimizing resource consumption on users' devices. To achieve this, we revisit distributed differential privacy based on recent results in secur…

    Submitted 26 June, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: In PETS'22

  25. arXiv:2111.00092 [pdf, other]

    cs.CR cs.LG

    Optimal Compression of Locally Differentially Private Mechanisms

    Authors: Abhin Shah, Wei-Ning Chen, Johannes Balle, Peter Kairouz, Lucas Theis

    Abstract: Compressing the output of ε-locally differentially private (LDP) randomizers naively leads to suboptimal utility. In this work, we demonstrate the benefits of using schemes that jointly compress and privatize the data using shared randomness. In particular, we investigate a family of schemes based on Minimal Random Coding (Havasi et al., 2019) and prove that they offer optimal privacy-accuracy-com…

    Submitted 26 February, 2022; v1 submitted 29 October, 2021; originally announced November 2021.

  26. arXiv:2110.04995 [pdf, other]

    cs.LG cs.CR cs.DS math.PR stat.ML

    The Skellam Mechanism for Differentially Private Federated Learning

    Authors: Naman Agarwal, Peter Kairouz, Ziyu Liu

    Abstract: We introduce the multi-dimensional Skellam mechanism, a discrete differential privacy mechanism based on the difference of two independent Poisson random variables. To quantify its privacy guarantees, we analyze the privacy loss distribution via a numerical evaluation and provide a sharp bound on the Rényi divergence between two shifted Skellam distributions. While useful in both centralized and d…

    Submitted 29 October, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Paper published in NeurIPS 2021
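
    Skellam noise is integer-valued and closed under summation, which is what makes it fit distributed DP with secure aggregation. A minimal Python sketch of those mechanics, assuming clients hold already-scaled integer vectors and parametrizing Skellam by its Poisson mean: each of n clients adds Skellam(μ/n) noise, the aggregate then carries Skellam(μ) noise, and the sum is taken modulo M as SecAgg would compute it. `skellam_noise`, `distributed_skellam_sum`, and the default `modulus` are assumptions; scaling, rounding, and the paper's privacy accounting are omitted, so μ is not tied to any ε.

```python
import numpy as np


def skellam_noise(mu: float, size: int, rng: np.random.Generator) -> np.ndarray:
    # Difference of two independent Poisson(mu) draws: integer-valued,
    # mean 0, variance 2 * mu.
    return rng.poisson(mu, size) - rng.poisson(mu, size)


def distributed_skellam_sum(int_vectors: list[list[int]], mu_total: float,
                            rng: np.random.Generator,
                            modulus: int = 2**32) -> np.ndarray:
    n = len(int_vectors)
    total = np.zeros(len(int_vectors[0]), dtype=np.int64)
    for v in int_vectors:
        # Each client adds Skellam noise with parameter mu_total / n; independent
        # Skellam variables add, so the aggregate carries parameter mu_total.
        noisy = np.asarray(v, dtype=np.int64) + skellam_noise(mu_total / n, len(v), rng)
        total = (total + noisy % modulus) % modulus  # what SecAgg would compute
    # Map from Z_M back to signed integers in [-M/2, M/2).
    return np.where(total >= modulus // 2, total - modulus, total)
```

    The modulus must be large enough that the true sum plus noise rarely wraps around; otherwise the signed decoding at the end is wrong.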

  27. arXiv:2110.03189 [pdf, other]

    cs.IT

    Pointwise Bounds for Distribution Estimation under Communication Constraints

    Authors: Wei-Ning Chen, Peter Kairouz, Ayfer Özgür

    Abstract: We consider the problem of estimating a $d$-dimensional discrete distribution from its samples observed under a $b$-bit communication constraint. In contrast to most previous results that largely focus on the global minimax error, we study the local behavior of the estimation error and provide pointwise bounds that depend on the target distribution $p$. In particular, we show that the…

    Submitted 29 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

  28. arXiv:2108.12851 [pdf, other]

    cs.IT

    Lower Bounds for the MMSE via Neural Network Estimation and Their Applications to Privacy

    Authors: Mario Diaz, Peter Kairouz, Lalitha Sankar

    Abstract: The minimum mean-square error (MMSE) achievable by optimal estimation of a random variable $Y\in\mathbb{R}$ given another random variable $X\in\mathbb{R}^{d}$ is of much interest in a variety of statistical settings. In the context of estimation-theoretic privacy, the MMSE has been proposed as an information leakage measure that captures the ability of an adversary in estimating $Y$ upon observing…

    Submitted 10 July, 2022; v1 submitted 29 August, 2021; originally announced August 2021.

    Comments: 42 pages

  29. arXiv:2108.10241 [pdf, other]

    cs.LG cs.CR cs.DC

    Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Production Federated Learning

    Authors: Virat Shejwalkar, Amir Houmansadr, Peter Kairouz, Daniel Ramage

    Abstract: While recent works have indicated that federated learning (FL) may be vulnerable to poisoning attacks by compromised clients, their real impact on production FL systems is not fully understood. In this work, we aim to develop a comprehensive systemization for poisoning attacks on FL by enumerating all possible threat models, variations of poisoning, and adversary capabilities. We specifically put…

    Submitted 13 December, 2021; v1 submitted 23 August, 2021; originally announced August 2021.

    Comments: To appear in the IEEE Symposium on Security & Privacy (Oakland), 2022

  30. arXiv:2107.06917 [pdf, other]

    cs.LG

    A Field Guide to Federated Optimization

    Authors: Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H. Brendan McMahan, Blaise Aguera y Arcas, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Hard, Chaoyang He, Samuel Horvath, Zhouyuan Huo, Alex Ingerman, Martin Jaggi, Tara Javidi, Peter Kairouz, et al. (28 additional authors not shown)

    Abstract: Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and…

    Submitted 14 July, 2021; originally announced July 2021.

  31. arXiv:2106.08597 [pdf, ps, other]

    stat.ML cs.LG

    Breaking The Dimension Dependence in Sparse Distribution Estimation under Communication Constraints

    Authors: Wei-Ning Chen, Peter Kairouz, Ayfer Özgür

    Abstract: We consider the problem of estimating a $d$-dimensional $s$-sparse discrete distribution from its samples observed under a $b$-bit communication constraint. The best-known previous result on $\ell_2$ estimation error for this problem is $O\left( \frac{s\log\left( {d}/{s}\right)}{n2^b}\right)$. Surprisingly, we show that when sample size $n$ exceeds a minimum threshold $n^*(s, d, b)$, we can achiev…

    Submitted 16 June, 2021; originally announced June 2021.

  32. arXiv:2105.05180 [pdf, other]

    cs.CR cs.LG

    On the Renyi Differential Privacy of the Shuffle Model

    Authors: Antonious M. Girgis, Deepesh Data, Suhas Diggavi, Ananda Theertha Suresh, Peter Kairouz

    Abstract: The central question studied in this paper is Renyi Differential Privacy (RDP) guarantees for general discrete local mechanisms in the shuffle privacy model. In the shuffle model, each of the $n$ clients randomizes its response using a local differentially private (LDP) mechanism and the untrusted server only receives a random permutation (shuffle) of the client responses without association to ea…

    Submitted 11 May, 2021; originally announced May 2021.
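
    In the shuffle model each client runs an LDP randomizer and the server sees only a random permutation of the reports. A minimal Python sketch with k-ary randomized response as the local mechanism (one example of a discrete local mechanism), plus the standard unbiased frequency estimator; `k_rr` and `shuffle_and_estimate` are hypothetical names, and the paper's RDP bound for the shuffled output is not reproduced.

```python
import math
import random
from collections import Counter


def k_rr(value: int, k: int, epsilon: float, rng: random.Random) -> int:
    # k-ary randomized response over {0, ..., k-1}: keep the true value with
    # probability e^eps / (e^eps + k - 1), otherwise report a uniform other value.
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def shuffle_and_estimate(values: list[int], k: int, epsilon: float,
                         rng: random.Random) -> list[float]:
    reports = [k_rr(v, k, epsilon, rng) for v in values]
    rng.shuffle(reports)  # the shuffler drops the link between reports and clients
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)  # P[report j | true j]
    q = 1 / (math.exp(epsilon) + k - 1)                  # P[report j | true != j]
    counts = Counter(reports)
    # E[count_j] = n_j * p + (n - n_j) * q, solved for n_j.
    return [(counts[j] - n * q) / (p - q) for j in range(k)]
```

    The estimator only uses report counts, so shuffling costs no accuracy while removing the server's ability to attribute a report to a client.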

  33. arXiv:2103.00039 [pdf, other]

    cs.CR cs.LG

    Practical and Private (Deep) Learning without Sampling or Shuffling

    Authors: Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, Zheng Xu

    Abstract: We consider training models with differential privacy (DP) using mini-batch gradients. The existing state-of-the-art, Differentially Private Stochastic Gradient Descent (DP-SGD), requires privacy amplification by sampling or shuffling to obtain the best privacy/accuracy/computation trade-offs. Unfortunately, the precise requirements on exact sampling and shuffling can be hard to obtain in importan…

    Submitted 10 December, 2021; v1 submitted 26 February, 2021; originally announced March 2021.
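
    DP-FTRL, the algorithm introduced in this paper, avoids relying on sampling or shuffling by privatizing prefix sums of gradients with tree aggregation. A minimal Python sketch of the binary-tree mechanism over scalars, assuming Gaussian node noise with a caller-chosen `sigma` (`tree_prefix_sums` is a hypothetical name): every dyadic block is noised once, and each prefix sum combines at most ⌊log2 t⌋ + 1 noisy blocks. The FTRL optimizer built on top and the privacy accounting are omitted.

```python
import random


def tree_prefix_sums(xs: list[float], sigma: float,
                     rng: random.Random) -> list[float]:
    noisy: dict[tuple[int, int], float] = {}

    def node(level: int, j: int) -> float:
        # Noisy sum of the dyadic block [j * 2**level, (j + 1) * 2**level).
        # Each block is noised once and reused by every prefix that needs it.
        if (level, j) not in noisy:
            lo, hi = j << level, (j + 1) << level
            noisy[(level, j)] = sum(xs[lo:hi]) + rng.gauss(0.0, sigma)
        return noisy[(level, j)]

    out = []
    for t in range(1, len(xs) + 1):
        # Split [0, t) into dyadic blocks following the binary digits of t.
        total, start = 0.0, 0
        for level in reversed(range(t.bit_length())):
            if t & (1 << level):
                total += node(level, start >> level)
                start += 1 << level
        out.append(total)
    return out
```

    Each element lands in only about log2 T blocks, so the noise in any released prefix sum grows polylogarithmically in T rather than linearly, which is the property the FTRL view exploits.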

  34. arXiv:2102.06387 [pdf, other]

    cs.LG cs.DS stat.ML

    The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation

    Authors: Peter Kairouz, Ziyu Liu, Thomas Steinke

    Abstract: We consider training models on private data that are distributed across user devices. To ensure privacy, we add on-device noise and use secure aggregation so that only the noisy sum is revealed to the server. We present a comprehensive end-to-end system, which appropriately discretizes the data and adds discrete Gaussian noise before performing secure aggregation. We provide a novel privacy analys…

    Submitted 8 September, 2022; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: International Conference on Machine Learning (ICML), 2021

  35. arXiv:2011.00083 [pdf, other]

    cs.IT cs.CR cs.DS cs.LG

    Estimating Sparse Discrete Distributions Under Local Privacy and Communication Constraints

    Authors: Jayadev Acharya, Peter Kairouz, Yuhan Liu, Ziteng Sun

    Abstract: We consider the problem of estimating sparse discrete distributions under local differential privacy (LDP) and communication constraints. We characterize the sample complexity for sparse estimation under LDP constraints up to a constant factor and the sample complexity under communication constraints up to a logarithmic factor. Our upper bounds under LDP are based on the Hadamard Response, a priva…

    Submitted 18 February, 2021; v1 submitted 30 October, 2020; originally announced November 2020.

  36. arXiv:2008.07180 [pdf, ps, other]

    cs.LG cs.IT stat.ML

    Shuffled Model of Federated Learning: Privacy, Communication and Accuracy Trade-offs

    Authors: Antonious M. Girgis, Deepesh Data, Suhas Diggavi, Peter Kairouz, Ananda Theertha Suresh

    Abstract: We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements, motivated by the federated learning (FL) framework. Unique challenges to the traditional ERM problem in the context of FL include (i) need to provide privacy guarantees on clients' data, (ii) compress the communication between clients and the server, since client…

    Submitted 23 September, 2020; v1 submitted 17 August, 2020; originally announced August 2020.

  37. arXiv:2008.06570 [pdf, ps, other]

    cs.LG stat.ML

    Fast Dimension Independent Private AdaGrad on Publicly Estimated Subspaces

    Authors: Peter Kairouz, Mónica Ribero, Keith Rush, Abhradeep Thakurta

    Abstract: We revisit the problem of empirical risk minimization (ERM) with differential privacy. We show that noisy AdaGrad, given appropriate knowledge and conditions on the subspace from which gradients can be drawn, achieves a regret comparable to traditional AdaGrad plus a well-controlled term due to noise. We show a convergence rate of $O(\text{Tr}(G_T)/T)$, where $G_T$ captures the geometry of the gra…

    Submitted 30 January, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

  38. arXiv:2007.11707 [pdf, other]

    cs.LG cs.CR cs.IT stat.ML

    Breaking the Communication-Privacy-Accuracy Trilemma

    Authors: Wei-Ning Chen, Peter Kairouz, Ayfer Özgür

    Abstract: Two major challenges in distributed learning and estimation are 1) preserving the privacy of the local samples; and 2) communicating them efficiently to a central server, while achieving high accuracy for the end-to-end task. While there has been significant interest in addressing each of these challenges separately in the recent literature, treatments that simultaneously address both challenges a…

    Submitted 20 April, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

    Comments: 35 pages, 9 figures, submitted to NeurIPS 2020

  39. arXiv:2007.06605 [pdf, other]

    cs.LG cs.CR stat.ML

    Privacy Amplification via Random Check-Ins

    Authors: Borja Balle, Peter Kairouz, H. Brendan McMahan, Om Thakkar, Abhradeep Thakurta

    Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) forms a fundamental building block in many applications for learning over sensitive data. Two standard approaches, privacy amplification by subsampling, and privacy amplification by shuffling, permit adding lower noise in DP-SGD than via naïve schemes. A key assumption in both these approaches is that the elements in the data set can be u…

    Submitted 30 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: Updated proof for $(ε_0, δ_0)$-DP local randomizers

  40. arXiv:2001.09700 [pdf, other]

    cs.LG stat.ML

    DP-CGAN: Differentially Private Synthetic Data and Label Generation

    Authors: Reihaneh Torkzadehmahani, Peter Kairouz, Benedict Paten

    Abstract: Generative Adversarial Networks (GANs) are one of the well-known models to generate synthetic data including images, especially for research communities that cannot use original sensitive datasets because they are not publicly accessible. One of the main challenges in this area is to preserve the privacy of individuals who participate in the training of the GAN models. To address this challenge, w…

    Submitted 27 January, 2020; originally announced January 2020.

    Comments: 7 pages, 4 figures

  41. arXiv:1912.04977 [pdf, other]

    cs.LG cs.CR stat.ML

    Advances and Open Problems in Federated Learning

    Authors: Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, et al. (34 additional authors not shown)

    Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs re…

    Submitted 8 March, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Published in Foundations and Trends in Machine Learning Vol 4 Issue 1. See: https://www.nowpublishers.com/article/Details/MAL-083
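The setting described in this abstract (clients train locally, a central server orchestrates, and raw data stays on the clients) can be illustrated with one round of federated averaging (FedAvg), a common baseline algorithm that this listing does not itself describe. The sketch below is a toy under stated assumptions: a one-parameter linear model, full-batch local gradient descent, and the hypothetical names `local_update` and `fedavg_round`.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (x, y) for a 1-D linear model y ≈ w * x

def local_update(w: float, data: List[Example], lr: float, epochs: int) -> float:
    """Client side: plain gradient descent on local data, which never leaves the client."""
    for _ in range(epochs):
        # Mean gradient of 0.5 * (w * x - y)^2 over the client's examples.
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fedavg_round(w: float, clients: List[List[Example]], lr: float, epochs: int) -> float:
    """Server side: broadcast w, collect locally updated models, and average them
    weighted by local dataset size. Only model parameters are communicated."""
    total = sum(len(c) for c in clients)
    return sum(len(c) * local_update(w, c, lr, epochs) for c in clients) / total
```

For example, two clients whose data both follow y = 3x drive `w` toward 3.0 over repeated rounds, although neither client ever shares its examples with the server.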

  42. arXiv:1911.07963

    cs.LG cs.CR stat.ML

    Can You Really Backdoor Federated Learning?

    Authors: Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh, H. Brendan McMahan

    Abstract: The decentralized nature of federated learning makes detecting and defending against adversarial attacks a challenging task. This paper focuses on backdoor attacks in the federated learning setting, where the goal of the adversary is to reduce the performance of the model on targeted tasks while maintaining good performance on the main task. Unlike existing works, we allow non-malicious clients to…

    Submitted 2 December, 2019; v1 submitted 18 November, 2019; originally announced November 2019.

    Comments: To appear at the 2nd International Workshop on Federated Learning for Data Privacy and Confidentiality at NeurIPS 2019

  43. arXiv:1911.06679

    cs.LG stat.ML

    Generative Models for Effective ML on Private, Decentralized Datasets

    Authors: Sean Augenstein, H. Brendan McMahan, Daniel Ramage, Swaroop Ramaswamy, Peter Kairouz, Mingqing Chen, Rajiv Mathews, Blaise Aguera y Arcas

    Abstract: To improve real-world applications of machine learning, experienced modelers develop intuition about their datasets, their models, and how the two interact. Manual inspection of raw data - of representative samples, of outliers, of misclassifications - is an essential tool in a) identifying and fixing problems in the data, b) generating new modeling hypotheses, and c) assigning or refining human-p…

    Submitted 4 February, 2020; v1 submitted 15 November, 2019; originally announced November 2019.

    Comments: 26 pages, 8 figures. Camera-ready ICLR 2020 version

  44. arXiv:1911.03405

    stat.ML cs.LG

    Theoretical Guarantees for Model Auditing with Finite Adversaries

    Authors: Mario Diaz, Peter Kairouz, Jiachun Liao, Lalitha Sankar

    Abstract: Privacy concerns have led to the development of privacy-preserving approaches for learning models from sensitive data. Yet, in practice, even models learned with privacy guarantees can inadvertently memorize unique training examples or leak sensitive features. To identify such privacy violations, existing model auditing techniques use finite adversaries defined as machine learning models with (a)…

    Submitted 8 November, 2019; originally announced November 2019.

    Comments: 18 pages, 1 figure

  45. arXiv:1911.00038

    cs.LG cs.CR cs.DS cs.IT stat.ML

    Context-Aware Local Differential Privacy

    Authors: Jayadev Acharya, Keith Bonawitz, Peter Kairouz, Daniel Ramage, Ziteng Sun

    Abstract: Local differential privacy (LDP) is a strong notion of privacy for individual users that often comes at the expense of a significant drop in utility. The classical definition of LDP assumes that all elements in the data domain are equally sensitive. However, in many applications, some symbols are more sensitive than others. This work proposes a context-aware framework of local differential privacy…

    Submitted 27 July, 2020; v1 submitted 31 October, 2019; originally announced November 2019.
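As a baseline for the classical LDP definition mentioned in this abstract, the sketch below implements k-ary randomized response, a standard ε-LDP mechanism that treats every symbol in the domain as equally sensitive. It is not the context-aware mechanism proposed in the paper, and the names `k_randomized_response` and `output_probability` are hypothetical.

```python
import math
import random
from typing import Hashable, Sequence

def output_probability(out: Hashable, value: Hashable,
                       domain: Sequence[Hashable], epsilon: float) -> float:
    """P(report = out | true value = value) under k-ary randomized response."""
    denom = math.exp(epsilon) + len(domain) - 1
    return math.exp(epsilon) / denom if out == value else 1.0 / denom

def k_randomized_response(value: Hashable, domain: Sequence[Hashable],
                          epsilon: float, rng: random.Random) -> Hashable:
    """Report the true value with probability e^eps / (e^eps + k - 1); otherwise
    report one of the other k - 1 symbols uniformly at random."""
    if len(domain) < 2 or value not in domain:
        raise ValueError("need k >= 2 and value in domain")
    k = len(domain)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    others = [d for d in domain if d != value]
    return rng.choice(others)
```

For any reported symbol and any two true inputs, the ratio of `output_probability` values is at most e^ε, which is exactly the ε-LDP condition. Because the bound is the same for every pair of symbols, the mechanism gives all of them identical protection, which is the uniform-sensitivity assumption the abstract questions.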

  46. arXiv:1910.00411

    cs.LG stat.ML

    Generating Fair Universal Representations using Adversarial Models

    Authors: Peter Kairouz, Jiachun Liao, Chong Huang, Maunil Vyas, Monica Welfert, Lalitha Sankar

    Abstract: We present a data-driven framework for learning fair universal representations (FUR) that guarantee statistical fairness for any learning task that may not be known a priori. Our framework leverages recent advances in adversarial learning to allow a data holder to learn representations in which a set of sensitive attributes are decoupled from the rest of the dataset. We formulate this as a constra…

    Submitted 11 May, 2022; v1 submitted 27 September, 2019; originally announced October 2019.

    Comments: Extended version of a paper accepted to TIFS

  47. arXiv:1906.02314

    cs.LG stat.ML

    A Tunable Loss Function for Robust Classification: Calibration, Landscape, and Generalization

    Authors: Tyler Sypherd, Mario Diaz, John Kevin Cava, Gautam Dasarathy, Peter Kairouz, Lalitha Sankar

    Abstract: We introduce a tunable loss function called $α$-loss, parameterized by $α\in (0,\infty]$, which interpolates between the exponential loss ($α= 1/2$), the log-loss ($α= 1$), and the 0-1 loss ($α= \infty$), for the machine learning setting of classification. Theoretically, we illustrate a fundamental connection between $α$-loss and Arimoto conditional entropy, verify the classification-calibration o…

    Submitted 21 December, 2022; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: Published at the Transactions on Information Theory
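The interpolation described in this abstract can be checked numerically. The sketch below assumes the form of α-loss commonly stated for this family, ℓ_α(p) = α/(α−1) · (1 − p^((α−1)/α)) for the probability p assigned to the true label, with α = 1 and α = ∞ taken as the limits −log p and 1 − p. The function names `alpha_loss` and `sigmoid` are hypothetical. With a sigmoid output p = σ(z) for margin z, α = 1/2 gives 1/p − 1 = e^(−z), which is the exponential loss.

```python
import math

def alpha_loss(p: float, alpha: float) -> float:
    """alpha-loss of the probability p in (0, 1] assigned to the true label.

    Assumed form: alpha/(alpha-1) * (1 - p**((alpha-1)/alpha)) for alpha in
    (0, 1) or (1, inf); alpha == 1 gives log-loss and alpha == inf gives 1 - p.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if alpha == 1.0:
        return -math.log(p)  # log-loss
    if math.isinf(alpha):
        return 1.0 - p  # limit as alpha -> infinity
    return alpha / (alpha - 1.0) * (1.0 - p ** ((alpha - 1.0) / alpha))

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

if __name__ == "__main__":
    z = 0.7
    # At alpha = 1/2 the two printed values agree (exponential loss).
    print(alpha_loss(sigmoid(z), 0.5), math.exp(-z))
    # Near alpha = 1 the loss approaches log-loss.
    print(alpha_loss(0.3, 1.0 + 1e-6), -math.log(0.3))
```

Treating α = 1 and α = ∞ as explicit branches avoids the division by zero and the overflow that the closed form would hit at those endpoints.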

  48. arXiv:1902.08534

    cs.CR

    Federated Heavy Hitters Discovery with Differential Privacy

    Authors: Wennan Zhu, Peter Kairouz, Brendan McMahan, Haicheng Sun, Wei Li

    Abstract: The discovery of heavy hitters (most frequent items) in user-generated data streams drives improvements in the app and web ecosystems, but can incur substantial privacy risks if not done with care. To address these risks, we propose a distributed and privacy-preserving algorithm for discovering the heavy hitters in a population of user-generated data streams. We leverage the sampling and threshold…

    Submitted 28 February, 2020; v1 submitted 22 February, 2019; originally announced February 2019.

  49. arXiv:1902.04639

    cs.LG cs.IT stat.ML

    A Tunable Loss Function for Binary Classification

    Authors: Tyler Sypherd, Mario Diaz, Lalitha Sankar, Peter Kairouz

    Abstract: We present $α$-loss, $α\in [1,\infty]$, a tunable loss function for binary classification that bridges log-loss ($α=1$) and $0$-$1$ loss ($α= \infty$). We prove that $α$-loss has an equivalent margin-based form and is classification-calibrated, two desirable properties for a good surrogate loss function for the ideal yet intractable $0$-$1$ loss. For logistic regression-based classification, we pr…

    Submitted 19 March, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: 9 pages, 1 figure, ISIT 2019

  50. arXiv:1812.06210

    cs.LG stat.ML

    A General Approach to Adding Differential Privacy to Iterative Training Procedures

    Authors: H. Brendan McMahan, Galen Andrew, Ulfar Erlingsson, Steve Chien, Ilya Mironov, Nicolas Papernot, Peter Kairouz

    Abstract: In this work we address the practical challenges of training machine learning models on privacy-sensitive datasets by introducing a modular approach that minimizes changes to training algorithms, provides a variety of configuration strategies for the privacy mechanism, and then isolates and simplifies the critical logic that computes the final privacy guarantees. A key challenge is that training a…

    Submitted 4 March, 2019; v1 submitted 14 December, 2018; originally announced December 2018.

    Comments: Presented at NeurIPS 2018 workshop on Privacy Preserving Machine Learning; Companion paper to TensorFlow Privacy OSS Library