Communication Efficient Private Federated Learning Using Dithering

Authors: Burak Hasircioglu, Deniz Gunduz

Abstract: The task of preserving privacy while ensuring efficient communication is a fundamental challenge in federated learning. In this work, we tackle this challenge in the trusted aggregator model, and propose a solution that achieves both objectives simultaneously. We show that employing a quantization scheme based on subtractive dithering at the clients can effectively replicate the normal noise addition process at the aggregator. This implies that we can guarantee the same level of differential privacy against other clients while substantially reducing the amount of communication required, as opposed to transmitting full precision gradients and using central noise addition. We also experimentally demonstrate that the accuracy of our proposed approach matches that of the full precision gradient method.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2205.01556

Privacy Amplification via Random Participation in Federated Learning

Authors: Burak Hasircioglu, Deniz Gunduz

Abstract: Running a randomized algorithm on a subsampled dataset instead of the entire dataset amplifies differential privacy guarantees. In this work, in a federated setting, we consider random participation of the clients in addition to subsampling their local datasets. Since such random participation of the clients creates correlation among the samples of the same client in their subsampling, we analyze the corresponding privacy amplification via non-uniform subsampling. We show that when the size of the local datasets is small, the privacy guarantees via random participation are close to those of the centralized setting, in which the entire dataset is located in a single host and subsampled. On the other hand, when the local datasets are large, observing the output of the algorithm may disclose the identities of the sampled clients with high confidence. Our analysis reveals that, even in this case, privacy guarantees via random participation outperform those via only local subsampling.

Submitted 3 May, 2022; originally announced May 2022.

arXiv:2202.03129

Over-the-Air Ensemble Inference with Model Privacy

Authors: Selim F. Yilmaz, Burak Hasircioglu, Deniz Gunduz

Abstract: We consider distributed inference at the wireless edge, where multiple clients with an ensemble of models, each trained independently on a local dataset, are queried in parallel to make an accurate decision on a new sample. In addition to maximizing inference accuracy, we also want to maximize the privacy of local models. We exploit the superposition property of the air to implement bandwidth-efficient ensemble inference methods. We introduce different over-the-air ensemble methods and show that these schemes perform significantly better than their orthogonal counterparts, while using fewer resources and providing privacy guarantees. We also provide experimental results verifying the benefits of the proposed over-the-air inference approach, whose source code is shared publicly on GitHub.

Submitted 15 May, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: To appear in IEEE International Symposium on Information Theory (ISIT) 2022

arXiv:2106.07731

doi 10.1109/JSAC.2022.3142355

Bivariate Polynomial Codes for Secure Distributed Matrix Multiplication

Authors: Burak Hasircioglu, Jesus Gomez-Vilardebo, Deniz Gunduz

Abstract: We consider the problem of secure distributed matrix multiplication (SDMM). Coded computation has been shown to be an effective solution in distributed matrix multiplication, both providing privacy against workers and boosting the computation speed by efficiently mitigating stragglers. In this work, we present a non-direct secure extension of the recently introduced bivariate polynomial codes. Bivariate polynomial codes have been shown to be able to further speed up distributed matrix multiplication by exploiting the partial work done by the stragglers rather than completely ignoring them, while reducing the upload communication cost and/or the workers' storage capacity needs. We show that, especially for upload communication or storage constrained settings, the proposed approach reduces the average computation time of SDMM compared to its competitors in the literature.

Submitted 6 February, 2022; v1 submitted 14 June, 2021; originally announced June 2021.

Comments: To appear in IEEE Journal on Selected Areas in Communications. arXiv admin note: text overlap with arXiv:2102.08304

arXiv:2102.08304

Speeding Up Private Distributed Matrix Multiplication via Bivariate Polynomial Codes

Authors: Burak Hasircioglu, Jesus Gomez-Vilardebo, Deniz Gunduz

Abstract: We consider the problem of private distributed matrix multiplication under limited resources. Coded computation has been shown to be an effective solution in distributed matrix multiplication, both providing privacy against the workers and boosting the computation speed by efficiently mitigating stragglers. In this work, we propose the use of recently-introduced bivariate polynomial codes to further speed up private distributed matrix multiplication by exploiting the partial work done by the stragglers rather than completely ignoring them. We show that the proposed approach reduces the average computation time of private distributed matrix multiplication compared to its competitors in the literature while improving the upload communication cost and the workers' storage efficiency.

Submitted 13 July, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: To appear in IEEE International Symposium on Information Theory (ISIT) 2021

arXiv:2101.11693

Dopamine: Differentially Private Federated Learning on Medical Data

Authors: Mohammad Malekzadeh, Burak Hasircioglu, Nitish Mital, Kunal Katarya, Mehmet Emre Ozfatura, Deniz Gündüz

Abstract: While rich medical datasets are hosted in hospitals distributed across the world, concerns about patients' privacy are a barrier against using such data to train deep neural networks (DNNs) for medical diagnostics. We propose Dopamine, a system to train DNNs on distributed datasets, which employs federated learning (FL) with differentially-private stochastic gradient descent (DPSGD), and, in combination with secure aggregation, can establish a better trade-off between differential privacy (DP) guarantee and DNN accuracy than other approaches. Results on a diabetic retinopathy (DR) task show that Dopamine provides a DP guarantee close to the centralized training counterpart, while achieving a better classification accuracy than FL with parallel DP where DPSGD is applied without coordination. Code is available at https://github.com/ipc-lab/private-ml-for-health.

Submitted 29 January, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

Comments: The Second AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI-21)

arXiv:2011.08579

Private Wireless Federated Learning with Anonymous Over-the-Air Computation

Authors: Burak Hasircioglu, Deniz Gunduz

Abstract: In conventional federated learning (FL), differential privacy (DP) guarantees can be obtained by injecting additional noise into local model updates before transmitting them to the parameter server (PS). In the wireless FL scenario, we show that the privacy of the system can be boosted by exploiting over-the-air computation (OAC) and anonymizing the transmitting devices. In OAC, devices transmit their model updates simultaneously and in an uncoded fashion, resulting in a much more efficient use of the available spectrum. We further exploit OAC to provide anonymity for the transmitting devices. The proposed approach improves the performance of private wireless FL by reducing the amount of noise that must be injected.

Submitted 13 February, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

Comments: To appear in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021

arXiv:2001.07227

doi 10.1109/JSAIT.2021.3105365

Bivariate Polynomial Coding for Efficient Distributed Matrix Multiplication

Authors: Burak Hasircioglu, Jesus Gomez-Vilardebo, Deniz Gunduz

Abstract: Coded computing is an effective technique to mitigate "stragglers" in large-scale and distributed matrix multiplication. In particular, univariate polynomial codes have been shown to be effective in straggler mitigation by making the computation time depend only on the fastest workers. However, these schemes completely ignore the work done by the straggling workers, resulting in a waste of computational resources. To reduce the amount of work left unfinished at workers, one can further decompose the matrix multiplication task into smaller sub-tasks, and assign multiple sub-tasks to each worker, possibly heterogeneously, to better fit their particular storage and computation capacities. In this work, we propose a novel family of bivariate polynomial codes to efficiently exploit the work carried out by straggling workers. We show that bivariate polynomial codes bring significant advantages in terms of upload communication costs and storage efficiency, measured in terms of the number of sub-tasks that can be computed per worker. We propose two bivariate polynomial coding schemes. The first one exploits the fact that bivariate interpolation is always possible on a rectangular grid of evaluation points. We obtain such points at the cost of adding some redundant computations. For the second scheme, we relax the decoding constraints and require decodability for almost all choices of the evaluation points. We present interpolation sets satisfying such decodability conditions for certain storage configurations of workers. Our numerical results show that bivariate polynomial coding considerably reduces the average computation time of distributed matrix multiplication. We believe this work opens up a new class of previously unexplored coding schemes for efficient coded distributed computation.

Submitted 18 August, 2021; v1 submitted 20 January, 2020; originally announced January 2020.

Comments: To appear in "IEEE Journal on Selected Areas in Information Theory: Special Issue on Coded Computing"

Showing 1–8 of 8 results for author: Hasircioglu, B