-
Secure Aggregation for Buffered Asynchronous Federated Learning
Authors:
**hyun So,
Ramy E. Ali,
Başak Güler,
A. Salman Avestimehr
Abstract:
Federated learning (FL) typically relies on synchronous training, which is slow due to stragglers. While asynchronous training handles stragglers efficiently, it does not ensure privacy due to the incompatibility with the secure aggregation protocols. A buffered asynchronous training protocol known as FedBuff has been proposed recently which bridges the gap between synchronous and asynchronous tra…
▽ More
Federated learning (FL) typically relies on synchronous training, which is slow due to stragglers. While asynchronous training handles stragglers efficiently, it does not ensure privacy due to the incompatibility with the secure aggregation protocols. A buffered asynchronous training protocol known as FedBuff has been proposed recently which bridges the gap between synchronous and asynchronous training to mitigate stragglers and to also ensure privacy simultaneously. FedBuff allows the users to send their updates asynchronously while ensuring privacy by storing the updates in a trusted execution environment (TEE) enabled private buffer. TEEs, however, have limited memory which limits the buffer size. Motivated by this limitation, we develop a buffered asynchronous secure aggregation (BASecAgg) protocol that does not rely on TEEs. The conventional secure aggregation protocols cannot be applied in the buffered asynchronous setting since the buffer may have local models corresponding to different rounds and hence the masks that the users use to protect their models may not cancel out. BASecAgg addresses this challenge by carefully designing the masks such that they cancel out even if they correspond to different rounds. Our convergence analysis and experiments show that BASecAgg almost has the same convergence guarantees as FedBuff without relying on TEEs.
△ Less
Submitted 5 October, 2021;
originally announced October 2021.
-
ApproxIFER: A Model-Agnostic Approach to Resilient and Robust Prediction Serving Systems
Authors:
Mahdi Soleymani,
Ramy E. Ali,
Hessam Mahdavifar,
A. Salman Avestimehr
Abstract:
Due to the surge of cloud-assisted AI services, the problem of designing resilient prediction serving systems that can effectively cope with stragglers/failures and minimize response delays has attracted much interest. The common approach for tackling this problem is replication which assigns the same prediction task to multiple workers. This approach, however, is very inefficient and incurs signi…
▽ More
Due to the surge of cloud-assisted AI services, the problem of designing resilient prediction serving systems that can effectively cope with stragglers/failures and minimize response delays has attracted much interest. The common approach for tackling this problem is replication which assigns the same prediction task to multiple workers. This approach, however, is very inefficient and incurs significant resource overheads. Hence, a learning-based approach known as parity model (ParM) has been recently proposed which learns models that can generate parities for a group of predictions in order to reconstruct the predictions of the slow/failed workers. While this learning-based approach is more resource-efficient than replication, it is tailored to the specific model hosted by the cloud and is particularly suitable for a small number of queries (typically less than four) and tolerating very few (mostly one) number of stragglers. Moreover, ParM does not handle Byzantine adversarial workers. We propose a different approach, named Approximate Coded Inference (ApproxIFER), that does not require training of any parity models, hence it is agnostic to the model hosted by the cloud and can be readily applied to different data domains and model architectures. Compared with earlier works, ApproxIFER can handle a general number of stragglers and scales significantly better with the number of queries. Furthermore, ApproxIFER is robust against Byzantine workers. Our extensive experiments on a large number of datasets and model architectures also show significant accuracy improvement by up to 58% over the parity model approaches.
△ Less
Submitted 20 September, 2021;
originally announced September 2021.
-
Basil: A Fast and Byzantine-Resilient Approach for Decentralized Training
Authors:
Ahmed Roushdy Elkordy,
Saurav Prakash,
A. Salman Avestimehr
Abstract:
Detection and mitigation of Byzantine behaviors in a decentralized learning setting is a daunting task, especially when the data distribution at the users is heterogeneous. As our main contribution, we propose Basil, a fast and computationally efficient Byzantine robust algorithm for decentralized training systems, which leverages a novel sequential, memory assisted and performance-based criteria…
▽ More
Detection and mitigation of Byzantine behaviors in a decentralized learning setting is a daunting task, especially when the data distribution at the users is heterogeneous. As our main contribution, we propose Basil, a fast and computationally efficient Byzantine robust algorithm for decentralized training systems, which leverages a novel sequential, memory assisted and performance-based criteria for training over a logical ring while filtering the Byzantine users. In the IID dataset distribution setting, we provide the theoretical convergence guarantees of Basil, demonstrating its linear convergence rate. Furthermore, for the IID setting, we experimentally demonstrate that Basil is robust to various Byzantine attacks, including the strong Hidden attack, while providing up to ${\sim}16 \%$ higher test accuracy over the state-of-the-art Byzantine-resilient decentralized learning approach. Additionally, we generalize Basil to the non-IID dataset distribution setting by proposing Anonymous Cyclic Data Sharing (ACDS), a technique that allows each node to anonymously share a random fraction of its local non-sensitive dataset (e.g., landmarks images) with all other nodes. We demonstrate that Basil alongside ACDS with only $5\%$ data sharing provides effective toleration of Byzantine nodes, unlike the state-of-the-art Byzantine robust algorithm that completely fails in the heterogeneous data setting. Finally, to reduce the overall latency of Basil resulting from its sequential implementation over the logical ring, we propose Basil+. In particular, Basil+ provides scalability by enabling Byzantine-robust parallel training across groups of logical rings, and at the same time, it retains the performance gains of Basil due to sequential training within each group. Furthermore, we experimentally demonstrate the scalability gains of Basil+ through different sets of experiments.
△ Less
Submitted 6 October, 2022; v1 submitted 16 September, 2021;
originally announced September 2021.
-
List-Decodable Coded Computing: Breaking the Adversarial Toleration Barrier
Authors:
Mahdi Soleymani,
Ramy E. Ali,
Hessam Mahdavifar,
A. Salman Avestimehr
Abstract:
We consider the problem of coded computing, where a computational task is performed in a distributed fashion in the presence of adversarial workers. We propose techniques to break the adversarial toleration threshold barrier previously known in coded computing. More specifically, we leverage list-decoding techniques for folded Reed-Solomon codes and propose novel algorithms to recover the correct…
▽ More
We consider the problem of coded computing, where a computational task is performed in a distributed fashion in the presence of adversarial workers. We propose techniques to break the adversarial toleration threshold barrier previously known in coded computing. More specifically, we leverage list-decoding techniques for folded Reed-Solomon codes and propose novel algorithms to recover the correct codeword using side information. In the coded computing setting, we show how the master node can perform certain carefully designed extra computations to obtain the side information. The workload of computing this side information is negligible compared to the computations done by each worker. This side information is then utilized to prune the output of the list decoder and uniquely recover the true outcome. We further propose folded Lagrange coded computing (FLCC) to incorporate the developed techniques into a specific coded computing setting. Our results show that FLCC outperforms LCC by breaking the barrier on the number of adversaries that can be tolerated. In particular, the corresponding threshold in FLCC is improved by a factor of two asymptotically compared to that of LCC.
△ Less
Submitted 19 August, 2021; v1 submitted 27 January, 2021;
originally announced January 2021.
-
On Polynomial Approximations for Privacy-Preserving and Verifiable ReLU Networks
Authors:
Ramy E. Ali,
**hyun So,
A. Salman Avestimehr
Abstract:
Outsourcing deep neural networks (DNNs) inference tasks to an untrusted cloud raises data privacy and integrity concerns. While there are many techniques to ensure privacy and integrity for polynomial-based computations, DNNs involve non-polynomial computations. To address these challenges, several privacy-preserving and verifiable inference techniques have been proposed based on replacing the non…
▽ More
Outsourcing deep neural networks (DNNs) inference tasks to an untrusted cloud raises data privacy and integrity concerns. While there are many techniques to ensure privacy and integrity for polynomial-based computations, DNNs involve non-polynomial computations. To address these challenges, several privacy-preserving and verifiable inference techniques have been proposed based on replacing the non-polynomial activation functions such as the rectified linear unit (ReLU) function with polynomial activation functions. Such techniques usually require polynomials with integer coefficients or polynomials over finite fields. Motivated by such requirements, several works proposed replacing the ReLU function with the square function. In this work, we empirically show that the square function is not the best degree-2 polynomial that can replace the ReLU function even when restricting the polynomials to have integer coefficients. We instead propose a degree-2 polynomial activation function with a first order term and empirically show that it can lead to much better models. Our experiments on the CIFAR and Tiny ImageNet datasets on various architectures such as VGG-16 show that our proposed function improves the test accuracy by up to 10.4% compared to the square function.
△ Less
Submitted 6 February, 2024; v1 submitted 10 November, 2020;
originally announced November 2020.
-
A Scalable Approach for Privacy-Preserving Collaborative Machine Learning
Authors:
**hyun So,
Basak Guler,
A. Salman Avestimehr
Abstract:
We consider a collaborative learning scenario in which multiple data-owners wish to jointly train a logistic regression model, while kee** their individual datasets private from the other parties. We propose COPML, a fully-decentralized training framework that achieves scalability and privacy-protection simultaneously. The key idea of COPML is to securely encode the individual datasets to distri…
▽ More
We consider a collaborative learning scenario in which multiple data-owners wish to jointly train a logistic regression model, while kee** their individual datasets private from the other parties. We propose COPML, a fully-decentralized training framework that achieves scalability and privacy-protection simultaneously. The key idea of COPML is to securely encode the individual datasets to distribute the computation load effectively across many parties and to perform the training computations as well as the model updates in a distributed manner on the securely encoded data. We provide the privacy analysis of COPML and prove its convergence. Furthermore, we experimentally demonstrate that COPML can achieve significant speedup in training over the benchmark protocols. Our protocol provides strong statistical privacy guarantees against colluding parties (adversaries) with unbounded computational power, while achieving up to $16\times$ speedup in the training time against the benchmark protocols.
△ Less
Submitted 3 November, 2020;
originally announced November 2020.
-
Secure Aggregation with Heterogeneous Quantization in Federated Learning
Authors:
Ahmed Roushdy Elkordy,
A. Salman Avestimehr
Abstract:
Secure model aggregation across many users is a key component of federated learning systems. The state-of-the-art protocols for secure model aggregation, which are based on additive masking, require all users to quantize their model updates to the same level of quantization. This severely degrades their performance due to lack of adaptation to available bandwidth at different users. We propose thr…
▽ More
Secure model aggregation across many users is a key component of federated learning systems. The state-of-the-art protocols for secure model aggregation, which are based on additive masking, require all users to quantize their model updates to the same level of quantization. This severely degrades their performance due to lack of adaptation to available bandwidth at different users. We propose three schemes that allow secure model aggregation while using heterogeneous quantization. This enables the users to adjust their quantization proportional to their available bandwidth, which can provide a substantially better trade-off between the accuracy of training and the communication time. The proposed schemes are based on a grou** strategy by partitioning the network into groups, and partitioning the local model updates of users into segments. Instead of applying aggregation protocol to the entire local model update vector, it is applied on segments with specific coordination between users. We theoretically evaluate the quantization error for our schemes, and also demonstrate how our schemes can be utilized to overcome Byzantine users.
△ Less
Submitted 15 November, 2021; v1 submitted 29 September, 2020;
originally announced September 2020.
-
Analog Lagrange Coded Computing
Authors:
Mahdi Soleymani,
Hessam Mahdavifar,
A. Salman Avestimehr
Abstract:
A distributed computing scenario is considered, where the computational power of a set of worker nodes is used to perform a certain computation task over a dataset that is dispersed among the workers. Lagrange coded computing (LCC), proposed by Yu et al., leverages the well-known Lagrange polynomial to perform polynomial evaluation of the dataset in such a scenario in an efficient parallel fashion…
▽ More
A distributed computing scenario is considered, where the computational power of a set of worker nodes is used to perform a certain computation task over a dataset that is dispersed among the workers. Lagrange coded computing (LCC), proposed by Yu et al., leverages the well-known Lagrange polynomial to perform polynomial evaluation of the dataset in such a scenario in an efficient parallel fashion while kee** the privacy of data amidst possible collusion of workers. This solution relies on quantizing the data into a finite field, so that Shamir's secret sharing, as one of its main building blocks, can be employed. Such a solution, however, is not properly scalable with the size of dataset, mainly due to computation overflows. To address such a critical issue, we propose a novel extension of LCC to the analog domain, referred to as analog LCC (ALCC). All the operations in the proposed ALCC protocol are done over the infinite fields of R/C but for practical implementations floating-point numbers are used. We characterize the privacy of data in ALCC, against any subset of colluding workers up to a certain size, in terms of the distinguishing security (DS) and the mutual information security (MIS) metrics. Also, the accuracy of outcome is characterized in a practical setting assuming operations are performed using floating-point numbers. Consequently, a fundamental trade-off between the accuracy of the outcome of ALCC and its privacy level is observed and is numerically evaluated. Moreover, we implement the proposed scheme to perform matrix-matrix multiplication over a batch of matrices. It is observed that ALCC is superior compared to the state-of-the-art LCC, implemented using fixed-point numbers, assuming both schemes use an equal number of bits to represent data symbols.
△ Less
Submitted 29 January, 2021; v1 submitted 19 August, 2020;
originally announced August 2020.
-
Byzantine-Resilient Secure Federated Learning
Authors:
**hyun So,
Basak Guler,
A. Salman Avestimehr
Abstract:
Secure federated learning is a privacy-preserving framework to improve machine learning models by training over large volumes of data collected by mobile users. This is achieved through an iterative process where, at each iteration, users update a global model using their local datasets. Each user then masks its local model via random keys, and the masked models are aggregated at a central server…
▽ More
Secure federated learning is a privacy-preserving framework to improve machine learning models by training over large volumes of data collected by mobile users. This is achieved through an iterative process where, at each iteration, users update a global model using their local datasets. Each user then masks its local model via random keys, and the masked models are aggregated at a central server to compute the global model for the next iteration. As the local models are protected by random masks, the server cannot observe their true values. This presents a major challenge for the resilience of the model against adversarial (Byzantine) users, who can manipulate the global model by modifying their local models or datasets. Towards addressing this challenge, this paper presents the first single-server Byzantine-resilient secure aggregation framework (BREA) for secure federated learning. BREA is based on an integrated stochastic quantization, verifiable outlier detection, and secure model aggregation approach to guarantee Byzantine-resilience, privacy, and convergence simultaneously. We provide theoretical convergence and privacy guarantees and characterize the fundamental trade-offs in terms of the network size, user dropouts, and privacy protection. Our experiments demonstrate convergence in the presence of Byzantine users, and comparable accuracy to conventional federated learning benchmarks.
△ Less
Submitted 20 February, 2021; v1 submitted 21 July, 2020;
originally announced July 2020.
-
Privacy-Preserving Distributed Learning in the Analog Domain
Authors:
Mahdi Soleymani,
Hessam Mahdavifar,
A. Salman Avestimehr
Abstract:
We consider the critical problem of distributed learning over data while kee** it private from the computational servers. The state-of-the-art approaches to this problem rely on quantizing the data into a finite field, so that the cryptographic approaches for secure multiparty computing can then be employed. These approaches, however, can result in substantial accuracy losses due to fixed-point…
▽ More
We consider the critical problem of distributed learning over data while kee** it private from the computational servers. The state-of-the-art approaches to this problem rely on quantizing the data into a finite field, so that the cryptographic approaches for secure multiparty computing can then be employed. These approaches, however, can result in substantial accuracy losses due to fixed-point representation of the data and computation overflows. To address these critical issues, we propose a novel algorithm to solve the problem when data is in the analog domain, e.g., the field of real/complex numbers. We characterize the privacy of the data from both information-theoretic and cryptographic perspectives, while establishing a connection between the two notions in the analog domain. More specifically, the well-known connection between the distinguishing security (DS) and the mutual information security (MIS) metrics is extended from the discrete domain to the continues domain. This is then utilized to bound the amount of information about the data leaked to the servers in our protocol, in terms of the DS metric, using well-known results on the capacity of single-input multiple-output (SIMO) channel with correlated noise. It is shown how the proposed framework can be adopted to do computation tasks when data is represented using floating-point numbers. We then show that this leads to a fundamental trade-off between the privacy level of data and accuracy of the result. As an application, we also show how to train a machine learning model while kee** the data as well as the trained model private. Then numerical results are shown for experiments on the MNIST dataset. Furthermore, experimental advantages are shown comparing to fixed-point implementations over finite fields.
△ Less
Submitted 17 July, 2020;
originally announced July 2020.
-
Coded Computing for Federated Learning at the Edge
Authors:
Saurav Prakash,
Sagar Dhakal,
Mustafa Akdeniz,
A. Salman Avestimehr,
Nageen Himayat
Abstract:
Federated Learning (FL) is an exciting new paradigm that enables training a global model from data generated locally at the client nodes, without moving client data to a centralized server. Performance of FL in a multi-access edge computing (MEC) network suffers from slow convergence due to heterogeneity and stochastic fluctuations in compute power and communication link qualities across clients.…
▽ More
Federated Learning (FL) is an exciting new paradigm that enables training a global model from data generated locally at the client nodes, without moving client data to a centralized server. Performance of FL in a multi-access edge computing (MEC) network suffers from slow convergence due to heterogeneity and stochastic fluctuations in compute power and communication link qualities across clients. A recent work, Coded Federated Learning (CFL), proposes to mitigate stragglers and speed up training for linear regression tasks by assigning redundant computations at the MEC server. Coding redundancy in CFL is computed by exploiting statistical properties of compute and communication delays. We develop CodedFedL that addresses the difficult task of extending CFL to distributed non-linear regression and classification problems with multioutput labels. The key innovation of our work is to exploit distributed kernel embedding using random Fourier features that transforms the training task into distributed linear regression. We provide an analytical solution for load allocation, and demonstrate significant performance gains for CodedFedL through experiments over benchmark datasets using practical network parameters.
△ Less
Submitted 9 May, 2021; v1 submitted 7 July, 2020;
originally announced July 2020.
-
Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks
Authors:
Seyed Mohammadreza Mousavi Kalan,
Zalan Fabian,
A. Salman Avestimehr,
Mahdi Soltanolkotabi
Abstract:
Transfer learning has emerged as a powerful technique for improving the performance of machine learning models on new domains where labeled training data may be scarce. In this approach a model trained for a source task, where plenty of labeled training data is available, is used as a starting point for training a model on a related target task with only few labeled training data. Despite recent e…
▽ More
Transfer learning has emerged as a powerful technique for improving the performance of machine learning models on new domains where labeled training data may be scarce. In this approach a model trained for a source task, where plenty of labeled training data is available, is used as a starting point for training a model on a related target task with only few labeled training data. Despite recent empirical success of transfer learning approaches, the benefits and fundamental limits of transfer learning are poorly understood. In this paper we develop a statistical minimax framework to characterize the fundamental limits of transfer learning in the context of regression with linear and one-hidden layer neural network models. Specifically, we derive a lower-bound for the target generalization error achievable by any algorithm as a function of the number of labeled source and target data as well as appropriate notions of similarity between the source and target tasks. Our lower bound provides new insights into the benefits and limitations of transfer learning. We further corroborate our theoretical finding with various experiments.
△ Less
Submitted 16 June, 2020;
originally announced June 2020.
-
Turbo-Aggregate: Breaking the Quadratic Aggregation Barrier in Secure Federated Learning
Authors:
**hyun So,
Basak Guler,
A. Salman Avestimehr
Abstract:
Federated learning is a distributed framework for training machine learning models over the data residing at mobile devices, while protecting the privacy of individual users. A major bottleneck in scaling federated learning to a large number of users is the overhead of secure model aggregation across many users. In particular, the overhead of the state-of-the-art protocols for secure model aggrega…
▽ More
Federated learning is a distributed framework for training machine learning models over the data residing at mobile devices, while protecting the privacy of individual users. A major bottleneck in scaling federated learning to a large number of users is the overhead of secure model aggregation across many users. In particular, the overhead of the state-of-the-art protocols for secure model aggregation grows quadratically with the number of users. In this paper, we propose the first secure aggregation framework, named Turbo-Aggregate, that in a network with $N$ users achieves a secure aggregation overhead of $O(N\log{N})$, as opposed to $O(N^2)$, while tolerating up to a user dropout rate of $50\%$. Turbo-Aggregate employs a multi-group circular strategy for efficient model aggregation, and leverages additive secret sharing and novel coding techniques for injecting aggregation redundancy in order to handle user dropouts while guaranteeing user privacy. We experimentally demonstrate that Turbo-Aggregate achieves a total running time that grows almost linear in the number of users, and provides up to $40\times$ speedup over the state-of-the-art protocols with up to $N=200$ users. Our experiments also demonstrate the impact of model size and bandwidth on the performance of Turbo-Aggregate.
△ Less
Submitted 20 February, 2021; v1 submitted 10 February, 2020;
originally announced February 2020.
-
Coded Computing for Secure Boolean Computations
Authors:
Chien-Sheng Yang,
A. Salman Avestimehr
Abstract:
The growing size of modern datasets necessitates splitting a large scale computation into smaller computations and operate in a distributed manner. Adversaries in a distributed system deliberately send erroneous data in order to affect the computation for their benefit. Boolean functions are the key components of many applications, e.g., verification functions in blockchain systems and design of c…
▽ More
The growing size of modern datasets necessitates splitting a large scale computation into smaller computations and operate in a distributed manner. Adversaries in a distributed system deliberately send erroneous data in order to affect the computation for their benefit. Boolean functions are the key components of many applications, e.g., verification functions in blockchain systems and design of cryptographic algorithms. We consider the problem of computing a Boolean function in a distributed computing system with particular focus on \emph{security against Byzantine workers}. Any Boolean function can be modeled as a multivariate polynomial with high degree in general. However, the security threshold (i.e., the maximum number of adversarial workers can be tolerated such that the correct results can be obtained) provided by the recent proposed Lagrange Coded Computing (LCC) can be extremely low if the degree of the polynomial is high. We propose three different schemes called \emph{coded Algebraic normal form (ANF)}, \emph{coded Disjunctive normal form (DNF)} and \emph{coded polynomial threshold function (PTF)}. The key idea of the proposed schemes is to model it as the concatenation of some low-degree polynomials and threshold functions. In terms of the security threshold, we show that the proposed coded ANF and coded DNF are optimal by providing a matching outer bound.
△ Less
Submitted 4 March, 2021; v1 submitted 23 January, 2020;
originally announced January 2020.
-
Entangled Polynomial Codes for Secure, Private, and Batch Distributed Matrix Multiplication: Breaking the "Cubic" Barrier
Authors:
Qian Yu,
A. Salman Avestimehr
Abstract:
In distributed matrix multiplication, a common scenario is to assign each worker a fraction of the multiplication task, by partitioning the input matrices into smaller submatrices. In particular, by dividing two input matrices into $m$-by-$p$ and $p$-by-$n$ subblocks, a single multiplication task can be viewed as computing linear combinations of $pmn$ submatrix products, which can be assigned to…
▽ More
In distributed matrix multiplication, a common scenario is to assign each worker a fraction of the multiplication task, by partitioning the input matrices into smaller submatrices. In particular, by dividing two input matrices into $m$-by-$p$ and $p$-by-$n$ subblocks, a single multiplication task can be viewed as computing linear combinations of $pmn$ submatrix products, which can be assigned to $pmn$ workers. Such block-partitioning based designs have been widely studied under the topics of secure, private, and batch computation, where the state of the arts all require computing at least "cubic" ($pmn$) number of submatrix multiplications. Entangled polynomial codes, first presented for straggler mitigation, provides a powerful method for breaking the cubic barrier. It achieves a subcubic recovery threshold, meaning that the final product can be recovered from \emph{any} subset of multiplication results with a size order-wise smaller than $pmn$. In this work, we show that entangled polynomial codes can be further extended to also include these three important settings, and provide a unified framework that order-wise reduces the total computational costs upon the state of the arts by achieving subcubic recovery thresholds.
△ Less
Submitted 13 April, 2020; v1 submitted 14 January, 2020;
originally announced January 2020.
-
Edge Computing in the Dark: Leveraging Contextual-Combinatorial Bandit and Coded Computing
Authors:
Chien-Sheng Yang,
Ramtin Pedarsani,
A. Salman Avestimehr
Abstract:
With recent advancements in edge computing capabilities, there has been a significant increase in utilizing the edge cloud for event-driven and time-sensitive computations. However, large-scale edge computing networks can suffer substantially from unpredictable and unreliable computing resources which can result in high variability of service quality. Thus, it is crucial to design efficient task s…
▽ More
With recent advancements in edge computing capabilities, there has been a significant increase in utilizing the edge cloud for event-driven and time-sensitive computations. However, large-scale edge computing networks can suffer substantially from unpredictable and unreliable computing resources which can result in high variability of service quality. Thus, it is crucial to design efficient task scheduling policies that guarantee quality of service and the timeliness of computation queries. In this paper, we study the problem of computation offloading over unknown edge cloud networks with a sequence of timely computation jobs. Motivated by the MapReduce computation paradigm, we assume each computation job can be partitioned to smaller Map functions that are processed at the edge, and the Reduce function is computed at the user after the Map results are collected from the edge nodes. We model the service quality (success probability of returning result back to the user within deadline) of each edge device as function of context (collection of factors that affect edge devices). The user decides the computations to offload to each device with the goal of receiving a recoverable set of computation results in the given deadline. Our goal is to design an efficient edge computing policy in the dark without the knowledge of the context or computation capabilities of each device. By leveraging the \emph{coded computing} framework in order to tackle failures or stragglers in computation, we formulate this problem using contextual-combinatorial multi-armed bandits (CC-MAB), and aim to maximize the cumulative expected reward. We propose an online learning policy called \emph{online coded edge computing policy}, which provably achieves asymptotically-optimal performance in terms of regret loss compared with the optimal offline policy for the proposed CC-MAB problem.
△ Less
Submitted 4 March, 2021; v1 submitted 19 December, 2019;
originally announced December 2019.
-
Harmonic Coding: An Optimal Linear Code for Privacy-Preserving Gradient-Type Computation
Authors:
Qian Yu,
A. Salman Avestimehr
Abstract:
We consider the problem of distributedly computing a general class of functions, referred to as gradient-type computation, while maintaining the privacy of the input dataset. Gradient-type computation evaluates the sum of some `partial gradients', defined as polynomials of subsets of the input. It underlies many algorithms in machine learning and data analytics. We propose Harmonic Coding, which u…
▽ More
We consider the problem of distributedly computing a general class of functions, referred to as gradient-type computation, while maintaining the privacy of the input dataset. Gradient-type computation evaluates the sum of some `partial gradients', defined as polynomials of subsets of the input. It underlies many algorithms in machine learning and data analytics. We propose Harmonic Coding, which universally computes any gradient-type function, while requiring the minimum possible number of workers. Harmonic Coding strictly improves computing schemes developed based on prior works, such as Shamir's secret sharing and Lagrange Coded Computing, by injecting coded redundancy using harmonic progression. It enables the computing results of the workers to be interpreted as the sum of partial gradients and some redundant results, which then allows the cancellation of non-gradient terms in the decoding process. By proving a matching converse, we demonstrate the optimality of Harmonic Coding, even compared to the schemes that are non-universal (i.e., can be designed based on a specific gradient-type function).
△ Less
Submitted 30 April, 2019;
originally announced April 2019.
-
Timely-Throughput Optimal Coded Computing over Cloud Networks
Authors:
Chien-Sheng Yang,
Ramtin Pedarsani,
A. Salman Avestimehr
Abstract:
In modern distributed computing systems, unpredictable and unreliable infrastructures result in high variability of computing resources. Meanwhile, there is significantly increasing demand for timely and event-driven services with deadline constraints. Motivated by measurements over Amazon EC2 clusters, we consider a two-state Markov model for variability of computing speed in cloud networks. In t…
▽ More
In modern distributed computing systems, unpredictable and unreliable infrastructures result in high variability of computing resources. Meanwhile, there is significantly increasing demand for timely and event-driven services with deadline constraints. Motivated by measurements over Amazon EC2 clusters, we consider a two-state Markov model for variability of computing speed in cloud networks. In this model, each worker can be either in a good state or a bad state in terms of the computation speed, and the transition between these states is modeled as a Markov chain which is unknown to the scheduler. We then consider a Coded Computing framework, in which the data is possibly encoded and stored at the worker nodes in order to provide robustness against nodes that may be in a bad state. With timely computation requests submitted to the system with computation deadlines, our goal is to design the optimal computation-load allocation scheme and the optimal data encoding scheme that maximize the timely computation throughput (i.e, the average number of computation tasks that are accomplished before their deadline). Our main result is the development of a dynamic computation strategy called Lagrange Estimate-and Allocate (LEA) strategy, which achieves the optimal timely computation throughput. It is shown that compared to the static allocation strategy, LEA increases the timely computation throughput by 1.4X - 17.5X in various scenarios via simulations and by 1.27X - 6.5X in experiments over Amazon EC2 clusters
△ Less
Submitted 11 April, 2019;
originally announced April 2019.
-
CodedReduce: A Fast and Robust Framework for Gradient Aggregation in Distributed Learning
Authors:
Amirhossein Reisizadeh,
Saurav Prakash,
Ramtin Pedarsani,
Amir Salman Avestimehr
Abstract:
We focus on the commonly used synchronous Gradient Descent paradigm for large-scale distributed learning, for which there has been a growing interest to develop efficient and robust gradient aggregation strategies that overcome two key system bottlenecks: communication bandwidth and stragglers' delays. In particular, Ring-AllReduce (RAR) design has been proposed to avoid bandwidth bottleneck at an…
▽ More
We focus on the commonly used synchronous Gradient Descent paradigm for large-scale distributed learning, for which there has been a growing interest to develop efficient and robust gradient aggregation strategies that overcome two key system bottlenecks: communication bandwidth and stragglers' delays. In particular, Ring-AllReduce (RAR) design has been proposed to avoid bandwidth bottleneck at any particular node by allowing each worker to only communicate with its neighbors that are arranged in a logical ring. On the other hand, Gradient Coding (GC) has been recently proposed to mitigate stragglers in a master-worker topology by allowing carefully designed redundant allocation of the data set to the workers. We propose a joint communication topology design and data set allocation strategy, named CodedReduce (CR), that combines the best of both RAR and GC. That is, it parallelizes the communications over a tree topology leading to efficient bandwidth utilization, and carefully designs a redundant data set allocation and coding strategy at the nodes to make the proposed gradient aggregation scheme robust to stragglers. In particular, we quantify the communication parallelization gain and resiliency of the proposed CR scheme, and prove its optimality when the communication topology is a regular tree. Moreover, we characterize the expected run-time of CR and show order-wise speedups compared to the benchmark schemes. Finally, we empirically evaluate the performance of our proposed CR design over Amazon EC2 and demonstrate that it achieves speedups of up to 27.2x and 7.0x, respectively over the benchmarks GC and RAR.
△ Less
Submitted 29 September, 2021; v1 submitted 5 February, 2019;
originally announced February 2019.
-
CodedPrivateML: A Fast and Privacy-Preserving Framework for Distributed Machine Learning
Authors:
**hyun So,
Basak Guler,
A. Salman Avestimehr
Abstract:
How to train a machine learning model while kee** the data private and secure? We present CodedPrivateML, a fast and scalable approach to this critical problem. CodedPrivateML keeps both the data and the model information-theoretically private, while allowing efficient parallelization of training across distributed workers. We characterize CodedPrivateML's privacy threshold and prove its converg…
▽ More
How to train a machine learning model while kee** the data private and secure? We present CodedPrivateML, a fast and scalable approach to this critical problem. CodedPrivateML keeps both the data and the model information-theoretically private, while allowing efficient parallelization of training across distributed workers. We characterize CodedPrivateML's privacy threshold and prove its convergence for logistic (and linear) regression. Furthermore, via extensive experiments on Amazon EC2, we demonstrate that CodedPrivateML provides significant speedup over cryptographic approaches based on multi-party computing (MPC).
△ Less
Submitted 20 February, 2021; v1 submitted 2 February, 2019;
originally announced February 2019.
-
Fitting ReLUs via SGD and Quantized SGD
Authors:
Seyed Mohammadreza Mousavi Kalan,
Mahdi Soltanolkotabi,
A. Salman Avestimehr
Abstract:
In this paper we focus on the problem of finding the optimal weights of the shallowest of neural networks consisting of a single Rectified Linear Unit (ReLU). These functions are of the form $\mathbf{x}\rightarrow \max(0,\langle\mathbf{w},\mathbf{x}\rangle)$ with $\mathbf{w}\in\mathbb{R}^d$ denoting the weight vector. We focus on a planted model where the inputs are chosen i.i.d. from a Gaussian d…
▽ More
In this paper we focus on the problem of finding the optimal weights of the shallowest of neural networks consisting of a single Rectified Linear Unit (ReLU). These functions are of the form $\mathbf{x}\rightarrow \max(0,\langle\mathbf{w},\mathbf{x}\rangle)$ with $\mathbf{w}\in\mathbb{R}^d$ denoting the weight vector. We focus on a planted model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to a planted weight vector. We first show that mini-batch stochastic gradient descent when suitably initialized, converges at a geometric rate to the planted model with a number of samples that is optimal up to numerical constants. Next we focus on a parallel implementation where in each iteration the mini-batch gradient is calculated in a distributed manner across multiple processors and then broadcast to a master or all other processors. To reduce the communication cost in this setting we utilize a Quanitzed Stochastic Gradient Scheme (QSGD) where the partial gradients are quantized. Perhaps unexpectedly, we show that QSGD maintains the fast convergence of SGD to a globally optimal model while significantly reducing the communication cost. We further corroborate our numerical findings via various experiments including distributed implementations over Amazon EC2.
△ Less
Submitted 1 April, 2019; v1 submitted 19 January, 2019;
originally announced January 2019.
-
INTERPOL: Information Theoretically Verifiable Polynomial Evaluation
Authors:
Saeid Sahraei,
A. Salman Avestimehr
Abstract:
We study the problem of verifiable polynomial evaluation in the user-server and multi-party setups. We propose {INTERPOL}, an information-theoretically verifiable algorithm that allows a user to delegate the evaluation of a polynomial to a server, and verify the correctness of the results with high probability and in sublinear complexity. Compared to the existing approaches which typically rely on…
▽ More
We study the problem of verifiable polynomial evaluation in the user-server and multi-party setups. We propose {INTERPOL}, an information-theoretically verifiable algorithm that allows a user to delegate the evaluation of a polynomial to a server, and verify the correctness of the results with high probability and in sublinear complexity. Compared to the existing approaches which typically rely on cryptographic assumptions, {INTERPOL} stands out in that it does not assume any computational limitation on the server. {INTERPOL} relies on decomposition of polynomial evaluation into two matrix multiplications, and injection of computation redundancy in the form of locally computed parities with secret coefficients for verification. We show that {INTERPOL} has several desirable properties such as adaptivity and public verifiability. Furthermore, by generalizing {INTERPOL} to a multi-party setting consisting of a network of $n$ untrusted nodes, where each node is interested in evaluating the same polynomial, we demonstrate that we can achieve an overall computational complexity comparable to a trusted setup, while guaranteeing information-theoretic verification at each node.
△ Less
Submitted 27 April, 2019; v1 submitted 10 January, 2019;
originally announced January 2019.
-
PolyShard: Coded Sharding Achieves Linearly Scaling Efficiency and Security Simultaneously
Authors:
Songze Li,
Mingchao Yu,
Chien-Sheng Yang,
A. Salman Avestimehr,
Sreeram Kannan,
Pramod Viswanath
Abstract:
Today's blockchain designs suffer from a trilemma claiming that no blockchain system can simultaneously achieve decentralization, security, and performance scalability. For current blockchain systems, as more nodes join the network, the efficiency of the system (computation, communication, and storage) stays constant at best. A leading idea for enabling blockchains to scale efficiency is the notio…
▽ More
Today's blockchain designs suffer from a trilemma claiming that no blockchain system can simultaneously achieve decentralization, security, and performance scalability. For current blockchain systems, as more nodes join the network, the efficiency of the system (computation, communication, and storage) stays constant at best. A leading idea for enabling blockchains to scale efficiency is the notion of sharding: different subsets of nodes handle different portions of the blockchain, thereby reducing the load for each individual node. However, existing sharding proposals achieve efficiency scaling by compromising on trust - corrupting the nodes in a given shard will lead to the permanent loss of the corresponding portion of data. In this paper, we settle the trilemma by demonstrating a new protocol for coded storage and computation in blockchains. In particular, we propose PolyShard: ``polynomially coded sharding'' scheme that achieves information-theoretic upper bounds on the efficiency of the storage, system throughput, as well as on trust, thus enabling a truly scalable system. We provide simulation results that numerically demonstrate the performance improvement over state of the arts, and the scalability of the PolyShard system. Finally, we discuss potential enhancements, and highlight practical considerations in building such a system.
△ Less
Submitted 24 January, 2020; v1 submitted 27 September, 2018;
originally announced September 2018.
-
Polynomially Coded Regression: Optimal Straggler Mitigation via Data Encoding
Authors:
Songze Li,
Seyed Mohammadreza Mousavi Kalan,
Qian Yu,
Mahdi Soltanolkotabi,
A. Salman Avestimehr
Abstract:
We consider the problem of training a least-squares regression model on a large dataset using gradient descent. The computation is carried out on a distributed system consisting of a master node and multiple worker nodes. Such distributed systems are significantly slowed down due to the presence of slow-running machines (stragglers) as well as various communication bottlenecks. We propose "polynom…
▽ More
We consider the problem of training a least-squares regression model on a large dataset using gradient descent. The computation is carried out on a distributed system consisting of a master node and multiple worker nodes. Such distributed systems are significantly slowed down due to the presence of slow-running machines (stragglers) as well as various communication bottlenecks. We propose "polynomially coded regression" (PCR) that substantially reduces the effect of stragglers and lessens the communication burden in such systems. The key idea of PCR is to encode the partial data stored at each worker, such that the computations at the workers can be viewed as evaluating a polynomial at distinct points. This allows the master to compute the final gradient by interpolating this polynomial. PCR significantly reduces the recovery threshold, defined as the number of workers the master has to wait for prior to computing the gradient. In particular, PCR requires a recovery threshold that scales inversely proportionally with the amount of computation/storage available at each worker. In comparison, state-of-the-art straggler-mitigation schemes require a much higher recovery threshold that only decreases linearly in the per worker computation/storage load. We prove that PCR's recovery threshold is near minimal and within a factor two of the best possible scheme. Our experiments over Amazon EC2 demonstrate that compared with state-of-the-art schemes, PCR improves the run-time by 1.50x ~ 2.36x with naturally occurring stragglers, and by as much as 2.58x ~ 4.29x with artificial stragglers.
△ Less
Submitted 24 May, 2018;
originally announced May 2018.
-
Compressed Coded Distributed Computing
Authors:
Songze Li,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
Communication overhead is one of the major performance bottlenecks in large-scale distributed computing systems, in particular for machine learning applications. Conventionally, compression techniques are used to reduce the load of communication by combining intermediate results of the same computation task as much as possible. Recently, via the development of coded distributed computing (CDC), it…
▽ More
Communication overhead is one of the major performance bottlenecks in large-scale distributed computing systems, in particular for machine learning applications. Conventionally, compression techniques are used to reduce the load of communication by combining intermediate results of the same computation task as much as possible. Recently, via the development of coded distributed computing (CDC), it has been shown that it is possible to enable coding opportunities across intermediate results of different computation tasks to further reduce the communication load. We propose a new scheme, named compressed coded distributed computing (in short, compressed CDC), which jointly exploits the above two techniques (i.e., combining the intermediate results of the same computation and coding across the intermediate results of different computations) to significantly reduce the communication load for computations with linear aggregation (reduction) of intermediate results in the final stage that are prevalent in machine learning (e.g., distributed training algorithms where partial gradients are computed distributedly and then averaged in the final stage). In particular, compressed CDC first compresses/combines several intermediate results for a single computation, and then utilizes multiple such combined packets to create a coded multicast packet that is simultaneously useful for multiple computations. We characterize the achievable communication load of compressed CDC and show that it substantially outperforms both combining methods and CDC scheme.
△ Less
Submitted 4 May, 2018;
originally announced May 2018.
-
Communication-Aware Scheduling of Serial Tasks for Dispersed Computing
Authors:
Chien-Sheng Yang,
Ramtin Pedarsani,
A. Salman Avestimehr
Abstract:
There is a growing interest in development of in-network dispersed computing paradigms that leverage the computing capabilities of heterogeneous resources dispersed across the network for processing massive amount of data is collected at the edge of the network. We consider the problem of task scheduling for such networks, in a dynamic setting in which arriving computation jobs are modeled as chai…
▽ More
There is a growing interest in development of in-network dispersed computing paradigms that leverage the computing capabilities of heterogeneous resources dispersed across the network for processing massive amount of data is collected at the edge of the network. We consider the problem of task scheduling for such networks, in a dynamic setting in which arriving computation jobs are modeled as chains, with nodes representing tasks, and edges representing precedence constraints among tasks. In our proposed model, motivated by significant communication costs in dispersed computing environments, the communication times are taken into account. More specifically, we consider a network where servers are capable of serving all task types, and sending the results of processed tasks from one server to another server results in some communication delay that makes the design of optimal scheduling policy significantly more challenging than classical queueing networks. As the main contributions of the paper, we first characterize the capacity region of the network, then propose a novel virtual queueing network encoding the state of the network. Finally, we propose a Max-Weight type scheduling policy, and considering the virtual queueing network in the fluid limit, we use a Lyapunov argument to show that the policy is throughput-optimal.
△ Less
Submitted 25 May, 2019; v1 submitted 17 April, 2018;
originally announced April 2018.
-
Fundamental Resource Trade-offs for Encoded Distributed Optimization
Authors:
A. Salman Avestimehr,
Seyed Mohammadreza Mousavi Kalan,
Mahdi Soltanolkotabi
Abstract:
Dealing with the shear size and complexity of today's massive data sets requires computational platforms that can analyze data in a parallelized and distributed fashion. A major bottleneck that arises in such modern distributed computing environments is that some of the worker nodes may run slow. These nodes a.k.a.~stragglers can significantly slow down computation as the slowest node may dictate…
▽ More
Dealing with the shear size and complexity of today's massive data sets requires computational platforms that can analyze data in a parallelized and distributed fashion. A major bottleneck that arises in such modern distributed computing environments is that some of the worker nodes may run slow. These nodes a.k.a.~stragglers can significantly slow down computation as the slowest node may dictate the overall computational time. A recent computational framework, called encoded optimization, creates redundancy in the data to mitigate the effect of stragglers. In this paper we develop novel mathematical understanding for this framework demonstrating its effectiveness in much broader settings than was previously understood. We also analyze the convergence behavior of iterative encoded optimization algorithms, allowing us to characterize fundamental trade-offs between convergence rate, size of data set, accuracy, computational load (or data redundancy), and straggler toleration in this framework.
△ Less
Submitted 1 April, 2019; v1 submitted 31 March, 2018;
originally announced April 2018.
-
Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding
Authors:
Qian Yu,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
We consider the problem of massive matrix multiplication, which underlies many data analytic applications, in a large-scale distributed system comprising a group of worker nodes. We target the stragglers' delay performance bottleneck, which is due to the unpredictable latency in waiting for slowest nodes (or stragglers) to finish their tasks. We propose a novel coding strategy, named \emph{entangl…
▽ More
We consider the problem of massive matrix multiplication, which underlies many data analytic applications, in a large-scale distributed system comprising a group of worker nodes. We target the stragglers' delay performance bottleneck, which is due to the unpredictable latency in waiting for slowest nodes (or stragglers) to finish their tasks. We propose a novel coding strategy, named \emph{entangled polynomial code}, for designing the intermediate computations at the worker nodes in order to minimize the recovery threshold (i.e., the number of workers that we need to wait for in order to compute the final output). We demonstrate the optimality of entangled polynomial code in several cases, and show that it provides orderwise improvement over the conventional schemes for straggler mitigation. Furthermore, we characterize the optimal recovery threshold among all linear coding strategies within a factor of $2$ using \emph{bilinear complexity}, by develo** an improved version of the entangled polynomial code. In particular, while evaluating bilinear complexity is a well-known challenging problem, we show that optimal recovery threshold for linear coding strategies can be approximated within a factor of $2$ of this fundamental quantity. On the other hand, the improved version of the entangled polynomial code enables further and orderwise reduction in the recovery threshold, compared to its basic version. Finally, we show that the techniques developed in this paper can also be extended to several other problems such as coded convolution and fault-tolerant computing, leading to tight characterizations.
△ Less
Submitted 9 April, 2020; v1 submitted 23 January, 2018;
originally announced January 2018.
-
Coded Computing for Distributed Graph Analytics
Authors:
Saurav Prakash,
Amirhossein Reisizadeh,
Ramtin Pedarsani,
Amir Salman Avestimehr
Abstract:
Performance of distributed graph processing systems significantly suffers from 'communication bottleneck' as a large number of messages are exchanged among servers at each step of the computation. Motivated by graph based MapReduce, we propose a coded computing framework that leverages computation redundancy to alleviate the communication bottleneck in distributed graph processing. We develop a no…
▽ More
Performance of distributed graph processing systems significantly suffers from 'communication bottleneck' as a large number of messages are exchanged among servers at each step of the computation. Motivated by graph based MapReduce, we propose a coded computing framework that leverages computation redundancy to alleviate the communication bottleneck in distributed graph processing. We develop a novel 'coding' scheme that systematically injects structured redundancy in computation phase to enable 'coded' multicasting opportunities during message exchange between servers, reducing communication load substantially in large-scale graph processing. For theoretical analysis, we consider random graph models, and prove that our proposed scheme enables an (asymptotically) inverse-linear trade-off between 'computation load' and 'average communication load' for two popular random graph models -- Erdos-Renyi model, and power law model. Particularly, for a given computation load r, (i.e. when each graph vertex is carefully stored at r servers), the proposed scheme slashes the average communication load by (nearly) a multiplicative factor of r. For the Erdos-Renyi model, our proposed scheme is optimal asymptotically as the graph size increases by providing an information-theoretic converse. To illustrate the benefits of our scheme in practice, we implement PageRank over Amazon EC2, using artificial as well as real-world datasets, demonstrating significant gains over conventional PageRank. We also specialize our scheme and extend our theoretical results to two other random graph models -- random bi-partite model, and stochastic block model. They asymptotically enable inverse-linear trade-offs between computation and communication loads in distributed graph processing for these popular random graph models as well. We complement the achievability results with converse bounds for both of these models.
△ Less
Submitted 9 June, 2020; v1 submitted 16 January, 2018;
originally announced January 2018.
-
An Approximation Algorithm for Optimal Clique Cover Delivery in Coded Caching
Authors:
Seyed Mohammad Asghari,
Yi Ouyang,
Ashutosh Nayyar,
A. Salman Avestimehr
Abstract:
Coded caching can significantly reduce the communication bandwidth requirement for satisfying users' demands by utilizing the multicasting gain among multiple users. Most existing works assume that the users follow the prescriptions for content placement made by the system. However, users may prefer to decide what files to cache. To address this issue, we consider a network consisting of a file se…
▽ More
Coded caching can significantly reduce the communication bandwidth requirement for satisfying users' demands by utilizing the multicasting gain among multiple users. Most existing works assume that the users follow the prescriptions for content placement made by the system. However, users may prefer to decide what files to cache. To address this issue, we consider a network consisting of a file server connected through a shared link to $K$ users, each equipped with a cache which has been already filled arbitrarily. Given an arbitrary content placement, the goal is to find a delivery strategy for the server that minimizes the load of the shared link. In this paper, we focus on a specific class of coded multicasting delivery schemes known as the "clique cover delivery scheme". We first formulate the optimal clique cover delivery problem as a combinatorial optimization problem. Using a connection with the weighted set cover problem, we propose an approximation algorithm and show that it provides an approximation ratio of $(1 + \log K)$, while the approximation ratio for the existing coded delivery schemes is linear in $K$. Numerical simulations show that our proposed algorithm provides a considerable bandwidth reduction over the existing coded delivery schemes for almost all content placement schemes.
△ Less
Submitted 28 March, 2019; v1 submitted 29 October, 2017;
originally announced October 2017.
-
Near-Optimal Straggler Mitigation for Distributed Gradient Methods
Authors:
Songze Li,
Seyed Mohammadreza Mousavi Kalan,
A. Salman Avestimehr,
Mahdi Soltanolkotabi
Abstract:
Modern learning algorithms use gradient descent updates to train inferential models that best explain data. Scaling these approaches to massive data sizes requires proper distributed gradient descent schemes where distributed worker nodes compute partial gradients based on their partial and local data sets, and send the results to a master node where all the computations are aggregated into a full…
▽ More
Modern learning algorithms use gradient descent updates to train inferential models that best explain data. Scaling these approaches to massive data sizes requires proper distributed gradient descent schemes where distributed worker nodes compute partial gradients based on their partial and local data sets, and send the results to a master node where all the computations are aggregated into a full gradient and the learning model is updated. However, a major performance bottleneck that arises is that some of the worker nodes may run slow. These nodes a.k.a. stragglers can significantly slow down computation as the slowest node may dictate the overall computational time. We propose a distributed computing scheme, called Batched Coupon's Collector (BCC) to alleviate the effect of stragglers in gradient methods. We prove that our BCC scheme is robust to a near optimal number of random stragglers. We also empirically demonstrate that our proposed BCC scheme reduces the run-time by up to 85.4% over Amazon EC2 clusters when compared with other straggler mitigation strategies. We also generalize the proposed BCC scheme to minimize the completion time when implementing gradient descent-based algorithms over heterogeneous worker nodes.
△ Less
Submitted 27 October, 2017;
originally announced October 2017.
-
Coded Fourier Transform
Authors:
Qian Yu,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
We consider the problem of computing the Fourier transform of high-dimensional vectors, distributedly over a cluster of machines consisting of a master node and multiple worker nodes, where the worker nodes can only store and process a fraction of the inputs. We show that by exploiting the algebraic structure of the Fourier transform operation and leveraging concepts from coding theory, one can ef…
▽ More
We consider the problem of computing the Fourier transform of high-dimensional vectors, distributedly over a cluster of machines consisting of a master node and multiple worker nodes, where the worker nodes can only store and process a fraction of the inputs. We show that by exploiting the algebraic structure of the Fourier transform operation and leveraging concepts from coding theory, one can efficiently deal with the straggler effects. In particular, we propose a computation strategy, named as coded FFT, which achieves the optimal recovery threshold, defined as the minimum number of workers that the master node needs to wait for in order to compute the output. This is the first code that achieves the optimum robustness in terms of tolerating stragglers or failures for computing Fourier transforms. Furthermore, the reconstruction process for coded FFT can be mapped to MDS decoding, which can be solved efficiently. Moreover, we extend coded FFT to settings including computing general $n$-dimensional Fourier transforms, and provide the optimal computing strategy for those settings.
△ Less
Submitted 17 October, 2017;
originally announced October 2017.
-
On Heterogeneous Coded Distributed Computing
Authors:
Mehrdad Kiamari,
Chenwei Wang,
A. Salman Avestimehr
Abstract:
We consider the recently proposed Coded Distributed Computing (CDC) framework that leverages carefully designed redundant computations to enable coding opportunities that substantially reduce the communication load of distributed computing. We generalize this framework to heterogeneous systems where different nodes in the computing cluster can have different storage (or processing) capabilities. W…
▽ More
We consider the recently proposed Coded Distributed Computing (CDC) framework that leverages carefully designed redundant computations to enable coding opportunities that substantially reduce the communication load of distributed computing. We generalize this framework to heterogeneous systems where different nodes in the computing cluster can have different storage (or processing) capabilities. We provide the information-theoretically optimal data set placement and coded data shuffling scheme that minimizes the communication load in a cluster with 3 nodes. For clusters with $K>3$ nodes, we provide an algorithm description to generalize our coding ideas to larger networks.
△ Less
Submitted 1 September, 2017;
originally announced September 2017.
-
SINR-Threshold Scheduling with Binary Power Control for D2D Networks
Authors:
Mehrdad Kiamari,
Chenwei Wang,
A. Salman Avestimehr,
Haralabos Papadopoulos
Abstract:
In this paper, we consider a device-to-device communication network in which $K$ transmitter-receiver pairs are sharing spectrum with each other. We propose a novel but simple binary scheduling scheme for this network to maximize the average sum rate of the pairs. According to the scheme, each receiver predicts its Signal-to-Interference-plus-Noise Ratio (SINR), assuming \emph{all} other user pair…
▽ More
In this paper, we consider a device-to-device communication network in which $K$ transmitter-receiver pairs are sharing spectrum with each other. We propose a novel but simple binary scheduling scheme for this network to maximize the average sum rate of the pairs. According to the scheme, each receiver predicts its Signal-to-Interference-plus-Noise Ratio (SINR), assuming \emph{all} other user pairs are active, and compares it to a preassigned threshold to decide whether its corresponding transmitter to be activated or not. For our proposed scheme, the optimal threshold that maximizes the expected sum rate is obtained analytically for the two user-pair case and empirically in the general $K$ user-pair case. Simulation results reveal that our proposed SINR-threshold scheduling scheme outperforms ITLinQ \cite{navid}, FlashLinQ \cite{flash} and the method presented in \cite{G} in terms of the expected sum rate (network throughput). In addition, the computational complexity of the proposed scheme is $O(K)$, outperforming both ITLinQ and FlashLinQ that have $O(K^2)$ complexity requirements. Moreover, we also discuss the application of our proposed new scheme into an operator-assisted cellular D2D heterogeneous network.
△ Less
Submitted 31 August, 2017;
originally announced August 2017.
-
Communication-Aware Computing for Edge Processing
Authors:
Songze Li,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
We consider a mobile edge computing problem, in which mobile users offload their computation tasks to computing nodes (e.g., base stations) at the network edge. The edge nodes compute the requested functions and communicate the computed results to the users via wireless links. For this problem, we propose a Universal Coded Edge Computing (UCEC) scheme for linear functions to simultaneously minimiz…
▽ More
We consider a mobile edge computing problem, in which mobile users offload their computation tasks to computing nodes (e.g., base stations) at the network edge. The edge nodes compute the requested functions and communicate the computed results to the users via wireless links. For this problem, we propose a Universal Coded Edge Computing (UCEC) scheme for linear functions to simultaneously minimize the load of computation at the edge nodes, and maximize the physical-layer communication efficiency towards the mobile users. In the proposed UCEC scheme, edge nodes create coded inputs of the users, from which they compute coded output results. Then, the edge nodes utilize the computed coded results to create communication messages that zero-force all the interference signals over the air at each user. Specifically, the proposed scheme is universal since the coded computations performed at the edge nodes are oblivious of the channel states during the communication process from the edge nodes to the users.
△ Less
Submitted 22 June, 2017;
originally announced June 2017.
-
Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication
Authors:
Qian Yu,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
We consider a large-scale matrix multiplication problem where the computation is carried out using a distributed system with a master node and multiple worker nodes, where each worker can store parts of the input matrices. We propose a computation strategy that leverages ideas from coding theory to design intermediate computations at the worker nodes, in order to efficiently deal with straggling w…
▽ More
We consider a large-scale matrix multiplication problem where the computation is carried out using a distributed system with a master node and multiple worker nodes, where each worker can store parts of the input matrices. We propose a computation strategy that leverages ideas from coding theory to design intermediate computations at the worker nodes, in order to efficiently deal with straggling workers. The proposed strategy, named as \emph{polynomial codes}, achieves the optimum recovery threshold, defined as the minimum number of workers that the master needs to wait for in order to compute the output. Furthermore, by leveraging the algebraic structure of polynomial codes, we can map the reconstruction problem of the final output to a polynomial interpolation problem, which can be solved efficiently. Polynomial codes provide order-wise improvement over the state of the art in terms of recovery threshold, and are also optimal in terms of several other metrics. Furthermore, we extend this code to distributed convolution and show its order-wise optimality.
△ Less
Submitted 24 January, 2018; v1 submitted 30 May, 2017;
originally announced May 2017.
-
Capacity Region of the Symmetric Injective K-User Deterministic Interference Channel
Authors:
Mehrdad Kiamari,
A. Salman Avestimehr
Abstract:
We characterize the capacity region of the symmetric injective K-user Deterministic Interference Channel (DIC) for all channel parameters. The achievable rate region is derived by first projecting the achievable rate region of Han-Kobayashi (HK) scheme, which is in terms of common and private rates for each user, along the direction of aggregate rates for each user (i.e., the sum of common and pri…
▽ More
We characterize the capacity region of the symmetric injective K-user Deterministic Interference Channel (DIC) for all channel parameters. The achievable rate region is derived by first projecting the achievable rate region of Han-Kobayashi (HK) scheme, which is in terms of common and private rates for each user, along the direction of aggregate rates for each user (i.e., the sum of common and private rates). We then show that the projected region is characterized by only the projection of those facets in the HK region for which the coefficient of common rate and private rate are the same for all users, hence simplifying the region. Furthermore, we derive a tight converse for each facet of the simplified achievable rate region.
△ Less
Submitted 30 April, 2017;
originally announced May 2017.
-
How to Optimally Allocate Resources for Coded Distributed Computing?
Authors:
Qian Yu,
Songze Li,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
Today's data centers have an abundance of computing resources, hosting server clusters consisting of as many as tens or hundreds of thousands of machines. To execute a complex computing task over a data center, it is natural to distribute computations across many nodes to take advantage of parallel processing. However, as we allocate more and more computing resources to a computation task and furt…
▽ More
Today's data centers have an abundance of computing resources, hosting server clusters consisting of as many as tens or hundreds of thousands of machines. To execute a complex computing task over a data center, it is natural to distribute computations across many nodes to take advantage of parallel processing. However, as we allocate more and more computing resources to a computation task and further distribute the computations, large amounts of (partially) computed data must be moved between consecutive stages of computation tasks among the nodes, hence the communication load can become the bottleneck. In this paper, we study the optimal allocation of computing resources in distributed computing, in order to minimize the total execution time in distributed computing accounting for both the duration of computation and communication phases. In particular, we consider a general MapReduce-type distributed computing framework, in which the computation is decomposed into three stages: \emph{Map}, \emph{Shuffle}, and \emph{Reduce}. We focus on a recently proposed \emph{Coded Distributed Computing} approach for MapReduce and study the optimal allocation of computing resources in this framework. For all values of problem parameters, we characterize the optimal number of servers that should be used for distributed processing, provide the optimal placements of the Map and Reduce tasks, and propose an optimal coded data shuffling scheme, in order to minimize the total execution time. To prove the optimality of the proposed scheme, we first derive a matching information-theoretic converse on the execution time, then we prove that among all possible resource allocation schemes that achieve the minimum execution time, our proposed scheme uses the exactly minimum possible number of servers.
△ Less
Submitted 23 February, 2017;
originally announced February 2017.
-
Coding for Distributed Fog Computing
Authors:
Songze Li,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
Redundancy is abundant in Fog networks (i.e., many computing and storage points) and grows linearly with network size. We demonstrate the transformational role of coding in Fog computing for leveraging such redundancy to substantially reduce the bandwidth consumption and latency of computing. In particular, we discuss two recently proposed coding concepts, namely Minimum Bandwidth Codes and Minimu…
▽ More
Redundancy is abundant in Fog networks (i.e., many computing and storage points) and grows linearly with network size. We demonstrate the transformational role of coding in Fog computing for leveraging such redundancy to substantially reduce the bandwidth consumption and latency of computing. In particular, we discuss two recently proposed coding concepts, namely Minimum Bandwidth Codes and Minimum Latency Codes, and illustrate their impacts in Fog computing. We also review a unified coding framework that includes the above two coding techniques as special cases, and enables a tradeoff between computation latency and communication load to optimize system performance. At the end, we will discuss several open problems and future research directions.
△ Less
Submitted 20 February, 2017;
originally announced February 2017.
-
Coded TeraSort
Authors:
Songze Li,
Sucha Supittayapornpong,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
We focus on sorting, which is the building block of many machine learning algorithms, and propose a novel distributed sorting algorithm, named Coded TeraSort, which substantially improves the execution time of the TeraSort benchmark in Hadoop MapReduce. The key idea of Coded TeraSort is to impose structured redundancy in data, in order to enable in-network coding opportunities that overcome the da…
▽ More
We focus on sorting, which is the building block of many machine learning algorithms, and propose a novel distributed sorting algorithm, named Coded TeraSort, which substantially improves the execution time of the TeraSort benchmark in Hadoop MapReduce. The key idea of Coded TeraSort is to impose structured redundancy in data, in order to enable in-network coding opportunities that overcome the data shuffling bottleneck of TeraSort. We empirically evaluate the performance of CodedTeraSort algorithm on Amazon EC2 clusters, and demonstrate that it achieves 1.97x - 3.39x speedup, compared with TeraSort, for typical settings of interest.
△ Less
Submitted 15 February, 2017;
originally announced February 2017.
-
Characterizing the Rate-Memory Tradeoff in Cache Networks within a Factor of 2
Authors:
Qian Yu,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
We consider a basic caching system, where a single server with a database of $N$ files (e.g. movies) is connected to a set of $K$ users through a shared bottleneck link. Each user has a local cache memory with a size of $M$ files. The system operates in two phases: a placement phase, where each cache memory is populated up to its size from the database, and a following delivery phase, where each u…
▽ More
We consider a basic caching system, where a single server with a database of $N$ files (e.g. movies) is connected to a set of $K$ users through a shared bottleneck link. Each user has a local cache memory with a size of $M$ files. The system operates in two phases: a placement phase, where each cache memory is populated up to its size from the database, and a following delivery phase, where each user requests a file from the database, and the server is responsible for delivering the requested contents. The objective is to design the two phases to minimize the load (peak or average) of the bottleneck link. We characterize the rate-memory tradeoff of the above caching system within a factor of $2.00884$ for both the peak rate and the average rate (under uniform file popularity), improving state of the arts that are within a factor of $4$ and $4.7$ respectively. Moreover, in a practically important case where the number of files ($N$) is large, we exactly characterize the tradeoff for systems with no more than $5$ users, and characterize the tradeoff within a factor of $2$ otherwise. To establish these results, we develop two new converse bounds that improve over the state of the art.
△ Less
Submitted 31 August, 2018; v1 submitted 15 February, 2017;
originally announced February 2017.
-
Coded Computation over Heterogeneous Clusters
Authors:
Amirhossein Reisizadeh,
Saurav Prakash,
Ramtin Pedarsani,
Amir Salman Avestimehr
Abstract:
In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major degradation of performance: bottlenecks due to limited communication bandwidth, latency due to straggler nodes, etc. On the other hand, these systems enjoy abundance of redundancy - a vast number of computing nodes and large storage capacity. There have been recent…
▽ More
In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major degradation of performance: bottlenecks due to limited communication bandwidth, latency due to straggler nodes, etc. On the other hand, these systems enjoy abundance of redundancy - a vast number of computing nodes and large storage capacity. There have been recent results that demonstrate the impact of coding for efficient utilization of computation and storage redundancy to alleviate the effect of stragglers and communication bottlenecks in homogeneous clusters. In this paper, we focus on general heterogeneous distributed computing clusters consisting of a variety of computing machines with different capabilities. We propose a coding framework for speeding up distributed computing in heterogeneous clusters by trading redundancy for reducing the latency of computation. In particular, we propose Heterogeneous Coded Matrix Multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters that is provably asymptotically optimal for a broad class of processing time distributions. Moreover, we show that HCMM is unboundedly faster than any uncoded scheme. To demonstrate practicality of HCMM, we carry out experiments over Amazon EC2 clusters where HCMM is found to be up to $61\%$, $46\%$ and $36\%$ respectively faster than three benchmark load allocation schemes - Uniform Uncoded, Load-balanced Uncoded, and Uniform Coded. Additionally, we provide a generalization to the problem of optimal load allocation in heterogeneous settings, where we take into account the monetary costs associated with the clusters. We argue that HCMM is asymptotically optimal for budget-constrained scenarios as well, and we develop a heuristic algorithm for (HCMM) load allocation for budget-limited computation tasks.
△ Less
Submitted 19 June, 2019; v1 submitted 20 January, 2017;
originally announced January 2017.
-
On the Optimality of Separation between Caching and Delivery in General Cache Networks
Authors:
Navid Naderializadeh,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
We consider a system, containing a library of multiple files and a general memoryless communication network through which a server is connected to multiple users, each equipped with a local isolated cache of certain size that can be used to store part of the library. Each user will ask for one of the files in the library, which needs to be delivered by the server through the intermediate communica…
▽ More
We consider a system, containing a library of multiple files and a general memoryless communication network through which a server is connected to multiple users, each equipped with a local isolated cache of certain size that can be used to store part of the library. Each user will ask for one of the files in the library, which needs to be delivered by the server through the intermediate communication network. The objective is to design the cache placement (without prior knowledge of users' future requests) and the delivery phase in order to minimize the (normalized) delivery delay. We assume that the delivery phase consists of two steps: (1) generation of a set of multicast messages at the server, one for each subset of users, and (2) delivery of the multicast messages to the users.
In this setting, we show that there exists a universal scheme for cache placement and multicast message generation, which is independent of the underlying communication network between the server and the users, and achieves the optimal delivery delay to within a constant factor for all memoryless networks. We prove this result, even though the capacity region of the underlying communication network is not known, even approximately. This result demonstrates that in the aforementioned setting, a separation between caching and multicast message generation on one hand, and delivering the multicast messages to the users on the other hand is approximately optimal. This result has the important practical implication that the prefetching can be done independent of network structure in the upcoming delivery phase.
△ Less
Submitted 5 May, 2018; v1 submitted 20 January, 2017;
originally announced January 2017.
-
The Exact Rate-Memory Tradeoff for Caching with Uncoded Prefetching
Authors:
Qian Yu,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
We consider a basic cache network, in which a single server is connected to multiple users via a shared bottleneck link. The server has a database of files (content). Each user has an isolated memory that can be used to cache content in a prefetching phase. In a following delivery phase, each user requests a file from the database, and the server needs to deliver users' demands as efficiently as p…
▽ More
We consider a basic cache network, in which a single server is connected to multiple users via a shared bottleneck link. The server has a database of files (content). Each user has an isolated memory that can be used to cache content in a prefetching phase. In a following delivery phase, each user requests a file from the database, and the server needs to deliver users' demands as efficiently as possible by taking into account their cache contents. We focus on an important and commonly used class of prefetching schemes, where the caches are filled with uncoded data. We provide the exact characterization of the rate-memory tradeoff for this problem, by deriving both the minimum average rate (for a uniform file popularity) and the minimum peak rate required on the bottleneck link for a given cache size available at each user. In particular, we propose a novel caching scheme, which strictly improves the state of the art by exploiting commonality among user demands. We then demonstrate the exact optimality of our proposed scheme through a matching converse, by dividing the set of all demands into types, and showing that the placement phase in the proposed caching scheme is universally optimal for all types. Using these techniques, we also fully characterize the rate-memory tradeoff for a decentralized setting, in which users fill out their cache content without any coordination.
△ Less
Submitted 18 February, 2019; v1 submitted 25 September, 2016;
originally announced September 2016.
-
A Unified Coding Framework for Distributed Computing with Straggling Servers
Authors:
Songze Li,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
We propose a unified coded framework for distributed computing with straggling servers, by introducing a tradeoff between "latency of computation" and "load of communication" for some linear computation tasks. We show that the coded scheme of [1]-[3] that repeats the intermediate computations to create coded multicasting opportunities to reduce communication load, and the coded scheme of [4], [5]…
▽ More
We propose a unified coded framework for distributed computing with straggling servers, by introducing a tradeoff between "latency of computation" and "load of communication" for some linear computation tasks. We show that the coded scheme of [1]-[3] that repeats the intermediate computations to create coded multicasting opportunities to reduce communication load, and the coded scheme of [4], [5] that generates redundant intermediate computations to combat against straggling servers can be viewed as special instances of the proposed framework, by considering two extremes of this tradeoff: minimizing either the load of communication or the latency of computation individually. Furthermore, the latency-load tradeoff achieved by the proposed coded framework allows to systematically operate at any point on that tradeoff to perform distributed computing tasks. We also prove an information-theoretic lower bound on the latency-load tradeoff, which is shown to be within a constant multiplicative gap from the achieved tradeoff at the two end points.
△ Less
Submitted 6 September, 2016;
originally announced September 2016.
-
A Scalable Framework for Wireless Distributed Computing
Authors:
Songze Li,
Qian Yu,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
We consider a wireless distributed computing system, in which multiple mobile users, connected wirelessly through an access point, collaborate to perform a computation task. In particular, users communicate with each other via the access point to exchange their locally computed intermediate computation results, which is known as data shuffling. We propose a scalable framework for this system, in w…
▽ More
We consider a wireless distributed computing system, in which multiple mobile users, connected wirelessly through an access point, collaborate to perform a computation task. In particular, users communicate with each other via the access point to exchange their locally computed intermediate computation results, which is known as data shuffling. We propose a scalable framework for this system, in which the required communication bandwidth for data shuffling does not increase with the number of users in the network. The key idea is to utilize a particular repetitive pattern of placing the dataset (thus a particular repetitive pattern of intermediate computations), in order to provide coding opportunities at both the users and the access point, which reduce the required uplink communication bandwidth from users to access point and the downlink communication bandwidth from access point to users by factors that grow linearly with the number of users. We also demonstrate that the proposed dataset placement and coded shuffling schemes are optimal (i.e., achieve the minimum required shuffling load) for both a centralized setting and a decentralized setting, by develo** tight information-theoretic lower bounds.
△ Less
Submitted 5 May, 2017; v1 submitted 19 August, 2016;
originally announced August 2016.
-
Active Learning On Weighted Graphs Using Adaptive And Non-adaptive Approaches
Authors:
Eyal En Gad,
Akshay Gadde,
A. Salman Avestimehr,
Antonio Ortega
Abstract:
This paper studies graph-based active learning, where the goal is to reconstruct a binary signal defined on the nodes of a weighted graph, by sampling it on a small subset of the nodes. A new sampling algorithm is proposed, which sequentially selects the graph nodes to be sampled, based on an aggressive search for the boundary of the signal over the graph. The algorithm generalizes a recent method…
▽ More
This paper studies graph-based active learning, where the goal is to reconstruct a binary signal defined on the nodes of a weighted graph, by sampling it on a small subset of the nodes. A new sampling algorithm is proposed, which sequentially selects the graph nodes to be sampled, based on an aggressive search for the boundary of the signal over the graph. The algorithm generalizes a recent method for sampling nodes in unweighted graphs. The generalization improves the sampling performance using the information gained from the available graph weights. An analysis of the number of samples required by the proposed algorithm is provided, and the gain over the unweighted method is further demonstrated in simulations. Additionally, the proposed method is compared with an alternative state of-the-art method, which is based on the graph's spectral properties. It is shown that the proposed method significantly outperforms the spectral sampling method, if the signal needs to be predicted with high accuracy. On the other hand, if a higher level of inaccuracy is tolerable, then the spectral method outperforms the proposed aggressive search method. Consequently, we propose a hybrid method, which is shown to combine the advantages of both approaches.
△ Less
Submitted 18 May, 2016;
originally announced May 2016.
-
Topological Interference Management with Reconfigurable Antennas
Authors:
Heecheol Yang,
Navid Naderializadeh,
Amir Salman Avestimehr,
Jungwoo Lee
Abstract:
We study the symmetric degrees-of-freedom (DoF) of partially connected interference networks under linear coding strategies at transmitters without channel state information beyond topology. We assume that the receivers are equipped with reconfigurable antennas that can switch among their preset modes. In such a network setting, we characterize the class of network topologies in which half linear…
▽ More
We study the symmetric degrees-of-freedom (DoF) of partially connected interference networks under linear coding strategies at transmitters without channel state information beyond topology. We assume that the receivers are equipped with reconfigurable antennas that can switch among their preset modes. In such a network setting, we characterize the class of network topologies in which half linear symmetric DoF is achievable. Moreover, we derive a general upper bound on the linear symmetric DoF for arbitrary network topologies. We also show that this upper bound is tight if the transmitters have at most two co-interferers.
△ Less
Submitted 4 May, 2016;
originally announced May 2016.
-
A Fundamental Tradeoff between Computation and Communication in Distributed Computing
Authors:
Songze Li,
Mohammad Ali Maddah-Ali,
Qian Yu,
A. Salman Avestimehr
Abstract:
How can we optimally trade extra computing power to reduce the communication load in distributed computing? We answer this question by characterizing a fundamental tradeoff between computation and communication in distributed computing, i.e., the two are inversely proportional to each other.
More specifically, a general distributed computing framework, motivated by commonly used structures like…
▽ More
How can we optimally trade extra computing power to reduce the communication load in distributed computing? We answer this question by characterizing a fundamental tradeoff between computation and communication in distributed computing, i.e., the two are inversely proportional to each other.
More specifically, a general distributed computing framework, motivated by commonly used structures like MapReduce, is considered, where the overall computation is decomposed into computing a set of "Map" and "Reduce" functions distributedly across multiple computing nodes. A coded scheme, named "Coded Distributed Computing" (CDC), is proposed to demonstrate that increasing the computation load of the Map functions by a factor of $r$ (i.e., evaluating each function at $r$ carefully chosen nodes) can create novel coding opportunities that reduce the communication load by the same factor.
An information-theoretic lower bound on the communication load is also provided, which matches the communication load achieved by the CDC scheme. As a result, the optimal computation-communication tradeoff in distributed computing is exactly characterized.
Finally, the coding techniques of CDC is applied to the Hadoop TeraSort benchmark to develop a novel CodedTeraSort algorithm, which is empirically demonstrated to speed up the overall job execution by $1.97\times$ - $3.39\times$, for typical settings of interest.
△ Less
Submitted 22 September, 2017; v1 submitted 24 April, 2016;
originally announced April 2016.
-
Fundamental Limits of Cache-Aided Interference Management
Authors:
Navid Naderializadeh,
Mohammad Ali Maddah-Ali,
A. Salman Avestimehr
Abstract:
We consider a system comprising a library of $N$ files (e.g., movies) and a wireless network with $K_T$ transmitters, each equipped with a local cache of size of $M_T$ files, and $K_R$ receivers, each equipped with a local cache of size of $M_R$ files. Each receiver will ask for one of the $N$ files in the library, which needs to be delivered. The objective is to design the cache placement (withou…
▽ More
We consider a system comprising a library of $N$ files (e.g., movies) and a wireless network with $K_T$ transmitters, each equipped with a local cache of size of $M_T$ files, and $K_R$ receivers, each equipped with a local cache of size of $M_R$ files. Each receiver will ask for one of the $N$ files in the library, which needs to be delivered. The objective is to design the cache placement (without prior knowledge of receivers' future requests) and the communication scheme to maximize the throughput of the delivery. In this setting, we show that the sum degrees-of-freedom (sum-DoF) of $\min\left\{\frac{K_T M_T+K_R M_R}{N},K_R\right\}$ is achievable, and this is within a factor of 2 of the optimum, under one-shot linear schemes. This result shows that (i) the one-shot sum-DoF scales linearly with the aggregate cache size in the network (i.e., the cumulative memory available at all nodes), (ii) the transmitters' and receivers' caches contribute equally in the one-shot sum-DoF, and (iii) caching can offer a throughput gain that scales linearly with the size of the network.
To prove the result, we propose an achievable scheme that exploits the redundancy of the content at transmitters' caches to cooperatively zero-force some outgoing interference and availability of the unintended content at receivers' caches to cancel (subtract) some of the incoming interference. We develop a particular pattern for cache placement that maximizes the overall gains of cache-aided transmit and receive interference cancellations. For the converse, we present an integer optimization problem which minimizes the number of communication blocks needed to deliver any set of requested files to the receivers. We then provide a lower bound on the value of this optimization problem, hence leading to an upper bound on the linear one-shot sum-DoF of the network, which is within a factor of 2 of the achievable sum-DoF.
△ Less
Submitted 20 April, 2016; v1 submitted 12 February, 2016;
originally announced February 2016.