Federated Learning Privacy: Attacks, Defenses, Applications, and Policy Landscape - A Survey

Authors: Joshua C. Zhao, Saurabh Bagchi, Salman Avestimehr, Kevin S. Chan, Somali Chaterji, Dimitris Dimitriadis, Jiacheng Li, Ninghui Li, Arash Nourian, Holger R. Roth

Abstract: Deep learning has shown incredible potential across a vast array of tasks and accompanying this growth has been an insatiable appetite for data. However, a large amount of data needed for enabling deep learning is stored on personal devices and recent concerns on privacy have further highlighted challenges for accessing such data. As a result, federated learning (FL) has emerged as an important privacy-preserving technology enabling collaborative training of machine learning models without the need to send the raw, potentially sensitive, data to a central server. However, the fundamental premise that sending model updates to a server is privacy-preserving only holds if the updates cannot be "reverse engineered" to infer information about the private training data. It has been shown under a wide variety of settings that this premise for privacy does not hold. In this survey paper, we provide a comprehensive literature review of the different privacy attacks and defense methods in FL. We identify the current limitations of these attacks and highlight the settings in which FL client privacy can be broken. We dissect some of the successful industry applications of FL and draw lessons for future successful adoption. We survey the emerging landscape of privacy regulation for FL. We conclude with future directions for taking FL toward the cherished goal of generating accurate models while preserving the privacy of the data from its participants.

Submitted 6 May, 2024; originally announced May 2024.

Comments: Submitted to ACM Computing Surveys

ACM Class: I.2; H.4; I.5

arXiv:2403.18144

Leak and Learn: An Attacker's Cookbook to Train Using Leaked Data from Federated Learning

Authors: Joshua C. Zhao, Ahaan Dabholkar, Atul Sharma, Saurabh Bagchi

Abstract: Federated learning is a decentralized learning paradigm introduced to preserve privacy of client data. Despite this, prior work has shown that an attacker at the server can still reconstruct the private training data using only the client updates. These attacks are known as data reconstruction attacks and fall into two major categories: gradient inversion (GI) and linear layer leakage attacks (LLL). However, despite demonstrating the effectiveness of these attacks in breaching privacy, prior work has not investigated the usefulness of the reconstructed data for downstream tasks. In this work, we explore data reconstruction attacks through the lens of training and improving models with leaked data. We demonstrate the effectiveness of both GI and LLL attacks in maliciously training models using the leaked data more accurately than a benign federated learning strategy. Counter-intuitively, this bump in training quality can occur despite limited reconstruction quality or a small total number of leaked images. Finally, we show the limitations of these attacks for downstream training, individually for GI attacks and for LLL attacks.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2303.14868

The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning

Authors: Joshua C. Zhao, Ahmed Roushdy Elkordy, Atul Sharma, Yahya H. Ezzeldin, Salman Avestimehr, Saurabh Bagchi

Abstract: Secure aggregation promises a heightened level of privacy in federated learning, maintaining that a server only has access to a decrypted aggregate update. Within this setting, linear layer leakage methods are the only data reconstruction attacks able to scale and achieve a high leakage rate regardless of the number of clients or batch size. This is done through increasing the size of an injected fully-connected (FC) layer. However, this results in a resource overhead which grows larger with an increasing number of clients. We show that this resource overhead is caused by an incorrect perspective in all prior work that treats an attack on an aggregate update in the same way as an individual update with a larger batch size. Instead, by attacking the update from the perspective that aggregation is combining multiple individual updates, this allows the application of sparsity to alleviate resource overhead. We show that the use of sparsity can decrease the model size overhead by over 327$\times$ and the computation time by 3.34$\times$ compared to SOTA while maintaining equivalent total leakage rate, 77% even with $1000$ clients in aggregation.

Submitted 26 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR 2023

arXiv:2303.12233

LOKI: Large-scale Data Reconstruction Attack against Federated Learning through Model Manipulation

Authors: Joshua C. Zhao, Atul Sharma, Ahmed Roushdy Elkordy, Yahya H. Ezzeldin, Salman Avestimehr, Saurabh Bagchi

Abstract: Federated learning was introduced to enable machine learning over large decentralized datasets while promising privacy by eliminating the need for data sharing. Despite this, prior work has shown that shared gradients often contain private information and attackers can gain knowledge either through malicious modification of the architecture and parameters or by using optimization to approximate user data from the shared gradients. However, prior data reconstruction attacks have been limited in setting and scale, as most works target FedSGD and limit the attack to single-client gradients. Many of these attacks fail in the more practical setting of FedAVG or if updates are aggregated together using secure aggregation. Data reconstruction becomes significantly more difficult, resulting in limited attack scale and/or decreased reconstruction quality. When both FedAVG and secure aggregation are used, there is no current method that is able to attack multiple clients concurrently in a federated learning setting. In this work we introduce LOKI, an attack that overcomes previous limitations and also breaks the anonymity of aggregation as the leaked data is identifiable and directly tied back to the clients they come from. Our design sends clients customized convolutional parameters, and the weight gradients of data points between clients remain separate even through aggregation. With FedAVG and aggregation across 100 clients, prior work can leak less than 1% of images on MNIST, CIFAR-100, and Tiny ImageNet. Using only a single training round, LOKI is able to leak 76-86% of all data samples.

Submitted 25 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: To appear in the IEEE Symposium on Security & Privacy (S&P) 2024

arXiv:1911.07148

Finding Modular Functions for Ramanujan-Type Identities

Authors: William Y. C. Chen, Julia Q. D. Du, Jack C. D. Zhao

Abstract: This paper is concerned with a class of partition functions $a(n)$ introduced by Radu and defined in terms of eta-quotients. By utilizing the transformation laws of Newman, Schoeneberg and Robins, and Radu's algorithms, we present an algorithm to find Ramanujan-type identities for $a(mn+t)$. While this algorithm is not guaranteed to succeed, it applies to many cases. For example, we deduce a witness identity for $p(11n+6)$ with integer coefficients. Our algorithm also leads to Ramanujan-type identities for the overpartition functions $\overline{p}(5n+2)$ and $\overline{p}(5n+3)$ and Andrews--Paule's broken $2$-diamond partition functions $\triangle_{2}(25n+14)$ and $\triangle_{2}(25n+24)$. It can also be extended to derive Ramanujan-type identities on a more general class of partition functions. For example, it yields the Ramanujan-type identities on Andrews' singular overpartition functions $\overline{Q}_{3,1}(9n+3)$ and $\overline{Q}_{3,1}(9n+6)$ due to Shen, the $2$-dissection formulas of Ramanujan and the $8$-dissection formulas due to Hirschhorn.

Submitted 16 November, 2019; originally announced November 2019.

Comments: 45 pages, to appear in Annals of Combinatorics

MSC Class: 05A15; 11P83; 11P84; 05A17

arXiv:1802.01374

Congruences for the Coefficients of the Powers of the Euler Product

Authors: Julia Q. D. Du, Edward Y. S. Liu, Jack C. D. Zhao

Abstract: Let $p_k(n)$ be given by the $k$-th power of the Euler Product $\prod_{n=1}^{\infty}(1-q^n)^k=\sum_{n=0}^{\infty}p_k(n)q^{n}$. By investigating the properties of the modular equations of the second and the third order under the Atkin $U$-operator, we determine the generating functions of $p_{8k}(2^{2\alpha} n +\frac{k(2^{2\alpha}-1)}{3})$ $(1\leq k\leq 3)$ and $p_{3k} (3^{2\beta}n+\frac{k(3^{2\beta}-1)}{8})$ $(1\leq k\leq 8)$ in terms of some linear recurring sequences. Combining with a result of Engstrom about the periodicity of linear recurring sequences modulo $m$, we obtain infinite families of congruences for $p_k(n)$ modulo any $m\geq2$, where $1\leq k\leq 24$ and $3|k$ or $8|k$. Based on these congruences for $p_k(n)$, infinite families of congruences for many partition functions such as the overpartition function, $t$-core partition functions and $\ell$-regular partition functions are easily obtained.

Submitted 12 March, 2018; v1 submitted 5 February, 2018; originally announced February 2018.

Comments: 26 pages, replaced references, corrected typos

Showing 1–6 of 6 results for author: Zhao, J C