Search | arXiv e-print repository

Partial Model Averaging in Federated Learning: Performance Guarantees and Benefits

Authors: Sunwoo Lee, Anit Kumar Sahu, Chaoyang He, Salman Avestimehr

Abstract: Local Stochastic Gradient Descent (SGD) with periodic model averaging (FedAvg) is a foundational algorithm in Federated Learning. The algorithm independently runs SGD on multiple workers and periodically averages the model across all the workers. When local SGD runs with many workers, however, the periodic averaging causes a significant model discrepancy across the workers making the global loss c… ▽ More Local Stochastic Gradient Descent (SGD) with periodic model averaging (FedAvg) is a foundational algorithm in Federated Learning. The algorithm independently runs SGD on multiple workers and periodically averages the model across all the workers. When local SGD runs with many workers, however, the periodic averaging causes a significant model discrepancy across the workers making the global loss converge slowly. While recent advanced optimization methods tackle the issue focused on non-IID settings, there still exists the model discrepancy issue due to the underlying periodic model averaging. We propose a partial model averaging framework that mitigates the model discrepancy issue in Federated Learning. The partial averaging encourages the local models to stay close to each other on parameter space, and it enables to more effectively minimize the global loss. Given a fixed number of iterations and a large number of workers (128), the partial averaging achieves up to 2.2% higher validation accuracy than the periodic full averaging. △ Less

Submitted 11 January, 2022; originally announced January 2022.

arXiv:2007.07210 [pdf, other]

Simple and Efficient Hard Label Black-box Adversarial Attacks in Low Query Budget Regimes

Authors: Satya Narayan Shukla, Anit Kumar Sahu, Devin Willmott, J. Zico Kolter

Abstract: We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples for deep learning models solely based on information limited to output label~(hard label) to a queried data input. We propose a simple and efficient Bayesian Optimization~(BO) based approach for develo** black-box adversarial attacks. Issues with BO's performance in high dimensions are avo… ▽ More We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples for deep learning models solely based on information limited to output label~(hard label) to a queried data input. We propose a simple and efficient Bayesian Optimization~(BO) based approach for develo** black-box adversarial attacks. Issues with BO's performance in high dimensions are avoided by searching for adversarial examples in a structured low-dimensional subspace. We demonstrate the efficacy of our proposed attack method by evaluating both $\ell_\infty$ and $\ell_2$ norm constrained untargeted and targeted hard label black-box attacks on three standard datasets - MNIST, CIFAR-10 and ImageNet. Our proposed approach consistently achieves 2x to 10x higher attack success rate while requiring 10x to 20x fewer queries compared to the current state-of-the-art black-box adversarial attacks. △ Less

Submitted 11 June, 2021; v1 submitted 13 July, 2020; originally announced July 2020.

Comments: Accepted at KDD 2021. arXiv admin note: substantial text overlap with arXiv:1909.13857

arXiv:2001.01920 [pdf, other]

FedDANE: A Federated Newton-Type Method

Authors: Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith

Abstract: Federated learning aims to jointly learn statistical models over massively distributed remote devices. In this work, we propose FedDANE, an optimization method that we adapt from DANE, a method for classical distributed optimization, to handle the practical constraints of federated learning. We provide convergence guarantees for this method when learning over both convex and non-convex functions.… ▽ More Federated learning aims to jointly learn statistical models over massively distributed remote devices. In this work, we propose FedDANE, an optimization method that we adapt from DANE, a method for classical distributed optimization, to handle the practical constraints of federated learning. We provide convergence guarantees for this method when learning over both convex and non-convex functions. Despite encouraging theoretical results, we find that the method has underwhelming performance empirically. In particular, through empirical simulations on both synthetic and real-world datasets, FedDANE consistently underperforms baselines of FedAvg and FedProx in realistic federated settings. We identify low device participation and statistical device heterogeneity as two underlying causes of this underwhelming performance, and conclude by suggesting several directions of future work. △ Less

Submitted 7 January, 2020; originally announced January 2020.

Comments: Asilomar Conference on Signals, Systems, and Computers 2019

arXiv:1909.13857 [pdf, other]

Black-box Adversarial Attacks with Bayesian Optimization

Authors: Satya Narayan Shukla, Anit Kumar Sahu, Devin Willmott, J. Zico Kolter

Abstract: We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples using information limited to loss function evaluations of input-output pairs. We use Bayesian optimization~(BO) to specifically cater to scenarios involving low query budgets to develop query efficient adversarial attacks. We alleviate the issues surrounding BO in regards to optimizing high… ▽ More We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples using information limited to loss function evaluations of input-output pairs. We use Bayesian optimization~(BO) to specifically cater to scenarios involving low query budgets to develop query efficient adversarial attacks. We alleviate the issues surrounding BO in regards to optimizing high dimensional deep learning models by effective dimension upsampling techniques. Our proposed approach achieves performance comparable to the state of the art black-box adversarial attacks albeit with a much lower average query count. In particular, in low query budget regimes, our proposed method reduces the query count up to $80\%$ with respect to the state of the art methods. △ Less

Submitted 30 September, 2019; originally announced September 2019.

arXiv:1909.12473 [pdf, other]

Noisy Batch Active Learning with Deterministic Annealing

Authors: Gaurav Gupta, Anit Kumar Sahu, Wan-Yi Lin

Abstract: We study the problem of training machine learning models incrementally with batches of samples annotated with noisy oracles. We select each batch of samples that are important and also diverse via clustering and importance sampling. More importantly, we incorporate model uncertainty into the sampling probability to compensate for poor estimation of the importance scores when the training data is t… ▽ More We study the problem of training machine learning models incrementally with batches of samples annotated with noisy oracles. We select each batch of samples that are important and also diverse via clustering and importance sampling. More importantly, we incorporate model uncertainty into the sampling probability to compensate for poor estimation of the importance scores when the training data is too small to build a meaningful model. Experiments on benchmark image classification datasets (MNIST, SVHN, CIFAR10, and EMNIST) show improvement over existing active learning strategies. We introduce an extra denoising layer to deep networks to make active learning robust to label noises and show significant improvements. △ Less

Submitted 28 October, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

arXiv:1908.07873 [pdf, other]

doi 10.1109/MSP.2020.2975749

Federated Learning: Challenges, Methods, and Future Directions

Authors: Tian Li, Anit Kumar Sahu, Ameet Talwalkar, Virginia Smith

Abstract: Federated learning involves training statistical models over remote devices or siloed data centers, such as mobile phones or hospitals, while kee** data localized. Training in heterogeneous and potentially massive networks introduces novel challenges that require a fundamental departure from standard approaches for large-scale machine learning, distributed optimization, and privacy-preserving da… ▽ More Federated learning involves training statistical models over remote devices or siloed data centers, such as mobile phones or hospitals, while kee** data localized. Training in heterogeneous and potentially massive networks introduces novel challenges that require a fundamental departure from standard approaches for large-scale machine learning, distributed optimization, and privacy-preserving data analysis. In this article, we discuss the unique characteristics and challenges of federated learning, provide a broad overview of current approaches, and outline several directions of future work that are relevant to a wide range of research communities. △ Less

Submitted 21 August, 2019; originally announced August 2019.

arXiv:1905.09435 [pdf, other]

MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling

Authors: Jianyu Wang, Anit Kumar Sahu, Zhouyi Yang, Gauri Joshi, Soummya Kar

Abstract: This paper studies the problem of error-runtime trade-off, typically encountered in decentralized training based on stochastic gradient descent (SGD) using a given network. While a denser (sparser) network topology results in faster (slower) error convergence in terms of iterations, it incurs more (less) communication time/delay per iteration. In this paper, we propose MATCHA, an algorithm that ca… ▽ More This paper studies the problem of error-runtime trade-off, typically encountered in decentralized training based on stochastic gradient descent (SGD) using a given network. While a denser (sparser) network topology results in faster (slower) error convergence in terms of iterations, it incurs more (less) communication time/delay per iteration. In this paper, we propose MATCHA, an algorithm that can achieve a win-win in this error-runtime trade-off for any arbitrary network topology. The main idea of MATCHA is to parallelize inter-node communication by decomposing the topology into matchings. To preserve fast error convergence speed, it identifies and communicates more frequently over critical links, and saves communication time by using other links less frequently. Experiments on a suite of datasets and deep neural networks validate the theoretical analyses and demonstrate that MATCHA takes up to $5\times$ less time than vanilla decentralized SGD to reach the same training loss. △ Less

Submitted 18 November, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

arXiv:1903.07266 [pdf, other]

Distributed stochastic optimization with gradient tracking over strongly-connected networks

Authors: Ran Xin, Anit Kumar Sahu, Usman A. Khan, Soummya Kar

Abstract: In this paper, we study distributed stochastic optimization to minimize a sum of smooth and strongly-convex local cost functions over a network of agents, communicating over a strongly-connected graph. Assuming that each agent has access to a stochastic first-order oracle ($\mathcal{SFO}$), we propose a novel distributed method, called $\mathcal{S}$-$\mathcal{AB}$, where each agent uses an auxilia… ▽ More In this paper, we study distributed stochastic optimization to minimize a sum of smooth and strongly-convex local cost functions over a network of agents, communicating over a strongly-connected graph. Assuming that each agent has access to a stochastic first-order oracle ($\mathcal{SFO}$), we propose a novel distributed method, called $\mathcal{S}$-$\mathcal{AB}$, where each agent uses an auxiliary variable to asymptotically track the gradient of the global cost in expectation. The $\mathcal{S}$-$\mathcal{AB}$ algorithm employs row- and column-stochastic weights simultaneously to ensure both consensus and optimality. Since doubly-stochastic weights are not used, $\mathcal{S}$-$\mathcal{AB}$ is applicable to arbitrary strongly-connected graphs. We show that under a sufficiently small constant step-size, $\mathcal{S}$-$\mathcal{AB}$ converges linearly (in expected mean-square sense) to a neighborhood of the global minimizer. We present numerical simulations based on real-world data sets to illustrate the theoretical results. △ Less

Submitted 9 April, 2019; v1 submitted 18 March, 2019; originally announced March 2019.

arXiv:1812.06127 [pdf, other]

Federated Optimization in Heterogeneous Networks

Authors: Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith

Abstract: Federated Learning is a distributed learning paradigm with two key challenges that differentiate it from traditional distributed optimization: (1) significant variability in terms of the systems characteristics on each device in the network (systems heterogeneity), and (2) non-identically distributed data across the network (statistical heterogeneity). In this work, we introduce a framework, FedPr… ▽ More Federated Learning is a distributed learning paradigm with two key challenges that differentiate it from traditional distributed optimization: (1) significant variability in terms of the systems characteristics on each device in the network (systems heterogeneity), and (2) non-identically distributed data across the network (statistical heterogeneity). In this work, we introduce a framework, FedProx, to tackle heterogeneity in federated networks. FedProx can be viewed as a generalization and re-parametrization of FedAvg, the current state-of-the-art method for federated learning. While this re-parameterization makes only minor modifications to the method itself, these modifications have important ramifications both in theory and in practice. Theoretically, we provide convergence guarantees for our framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work (systems heterogeneity). Practically, we demonstrate that FedProx allows for more robust convergence than FedAvg across a suite of realistic federated datasets. In particular, in highly heterogeneous settings, FedProx demonstrates significantly more stable and accurate convergence behavior relative to FedAvg---improving absolute test accuracy by 22% on average. △ Less

Submitted 21 April, 2020; v1 submitted 14 December, 2018; originally announced December 2018.

Comments: MLSys 2020

arXiv:1811.04475 [pdf, other]

Managing App Install Ad Campaigns in RTB: A Q-Learning Approach

Authors: Anit Kumar Sahu, Shaunak Mishra, Narayan Bhamidipati

Abstract: Real time bidding (RTB) enables demand side platforms (bidders) to scale ad campaigns across multiple publishers affiliated to an RTB ad exchange. While driving multiple campaigns for mobile app install ads via RTB, the bidder typically has to: (i) maintain each campaign's efficiency (i.e., meet advertiser's target cost-per-install), (ii) be sensitive to advertiser's budget, and (iii) make profit… ▽ More Real time bidding (RTB) enables demand side platforms (bidders) to scale ad campaigns across multiple publishers affiliated to an RTB ad exchange. While driving multiple campaigns for mobile app install ads via RTB, the bidder typically has to: (i) maintain each campaign's efficiency (i.e., meet advertiser's target cost-per-install), (ii) be sensitive to advertiser's budget, and (iii) make profit after payouts to the ad exchange. In this process, there is a sense of delayed rewards for the bidder's actions; the exchange charges the bidder right after the ad is shown, but the bidder gets to know about resultant installs after considerable delay. This makes it challenging for the bidder to decide beforehand the bid (and corresponding cost charged to advertiser) for each ad display opportunity. To jointly handle the objectives mentioned above, we propose a state space based policy which decides the exchange bid and advertiser cost for each opportunity. The state space captures the current efficiency, budget utilization and profit. The policy based on this state space is trained on past decisions and outcomes via a novel Q-learning algorithm which accounts for the delay in install notifications. In our experiments based on data from app install campaigns managed by Yahoo's Gemini advertising platform, the Q-learning based policy led to a significant increase in the profit and number of efficient campaigns. △ Less

Submitted 11 November, 2018; originally announced November 2018.

Comments: 6 pages

Showing 1–10 of 10 results for author: Sahu, A K