Search | arXiv e-print repository

Coded Computing: A Learning-Theoretic Framework

Authors: Parsa Moradi, Behrooz Tahmasebi, Mohammad Ali Maddah-Ali

Abstract: Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a combination of the data, rather than the raw data itself. The final result is then decoded from the collective outputs of the worker nodes. However, there is a significant gap between current coded computing approaches and the broader landscape of general distributed computing, particularly when it comes to machine learning workloads. To bridge this gap, we propose a novel foundation for coded computing, integrating the principles of learning theory and developing a new framework that adapts seamlessly to machine learning applications. In this framework, the objective is to find the encoder and decoder functions that minimize the loss function, defined as the mean squared error between the estimated and true values. To facilitate the search for the optimal encoding and decoding functions, we show that the loss function can be upper-bounded by the sum of two terms: the generalization error of the decoding function and the training error of the encoding function. Focusing on the second-order Sobolev space, we then derive the optimal encoder and decoder. We show that in the proposed solution, the mean squared error of the estimation decays at the rate of $O(S^4 N^{-3})$ and $O(S^{\frac{8}{5}}N^{\frac{-3}{5}})$ in noiseless and noisy computation settings, respectively, where $N$ is the number of worker nodes with at most $S$ slow servers (stragglers). Finally, we evaluate the proposed scheme on inference tasks for various machine learning models and demonstrate that the proposed framework outperforms the state-of-the-art in terms of accuracy and rate of convergence.

Submitted 1 June, 2024; originally announced June 2024.

Comments: 28 pages, 4 figures
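
To get a feel for the two decay rates quoted in the abstract above, the following minimal sketch evaluates $S^4 N^{-3}$ and $S^{8/5} N^{-3/5}$ for a fixed straggler count. The hidden constants of the $O(\cdot)$ bounds are taken as 1, and the function names and sample values are assumptions made only for illustration, so the numbers show scaling, not actual error values.

```python
import math

def noiseless_rate(S: int, N: int) -> float:
    """S^4 * N^-3: the noiseless rate, with the hidden constant assumed to be 1."""
    return S**4 / N**3

def noisy_rate(S: int, N: int) -> float:
    """S^(8/5) * N^(-3/5): the noisy rate, with the hidden constant assumed to be 1."""
    return S**1.6 * N**-0.6

S = 4  # hypothetical number of stragglers
for N in (50, 100, 200):
    print(N, noiseless_rate(S, N), noisy_rate(S, N))

# Effect of doubling the number of workers at fixed S.
noiseless_gain = noiseless_rate(S, 200) / noiseless_rate(S, 100)  # equals 2**-3
noisy_gain = noisy_rate(S, 200) / noisy_rate(S, 100)              # equals 2**-0.6
print(noiseless_gain, noisy_gain)
```

Under these assumptions, doubling $N$ shrinks the noiseless term by a factor of 8, while the noisy term shrinks only by a factor of $2^{0.6}$.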

arXiv:2402.04377 [pdf, other]

NeRCC: Nested-Regression Coded Computing for Resilient Distributed Prediction Serving Systems

Authors: Parsa Moradi, Mohammad Ali Maddah-Ali

Abstract: Resilience against stragglers is a critical element of prediction serving systems, tasked with executing inferences on input data for a pre-trained machine-learning model. In this paper, we propose NeRCC, a general straggler-resistant framework for approximate coded computing. NeRCC includes three layers: (1) encoding regression and sampling, which generates coded data points as a combination of original data points, (2) computing, in which a cluster of workers runs inference on the coded data points, and (3) decoding regression and sampling, which approximately recovers the predictions of the original data points from the available predictions on the coded data points. We argue that the overall objective of the framework reveals an underlying interconnection between the two regression models in the encoding and decoding layers. We propose a solution to the nested regressions problem by summarizing their dependence on two regularization terms that are jointly optimized. Our extensive experiments on different datasets and various machine learning models, including LeNet5, RepVGG, and Vision Transformer (ViT), demonstrate that NeRCC accurately approximates the original predictions for a wide range of straggler counts, outperforming the state-of-the-art by up to 23%.

Submitted 8 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.
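
The three-layer data flow described in the NeRCC abstract above (encode, compute, decode) can be illustrated with a toy sketch. This is not NeRCC's method: the regression models are swapped for exact polynomial (Lagrange) interpolation, and the "model" is a hypothetical degree-2 polynomial, so recovery is exact rather than approximate. All names, evaluation points, and sizes are assumptions chosen for illustration.

```python
from fractions import Fraction

def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial passing through the points (xs[i], ys[i])."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

def model(x):
    """Hypothetical stand-in for the pre-trained model: a degree-2 polynomial."""
    return x * x + 1

data = [Fraction(3), Fraction(-1), Fraction(4), Fraction(2)]  # K = 4 original inputs
alphas = [0, 1, 2, 3]        # points attached to the original inputs (assumed)
betas = list(range(10, 19))  # N = 9 workers, one evaluation point each (assumed)

# Layer 1 (encoding): fit a curve through (alpha_k, data_k) and sample it at the betas.
coded = [lagrange_eval(alphas, data, b) for b in betas]

# Layer 2 (computing): each worker runs the model on its coded input.
outputs = [model(c) for c in coded]

# Two workers straggle, so their outputs never arrive.
stragglers = {2, 5}
available = [i for i in range(len(betas)) if i not in stragglers]

# Layer 3 (decoding): model(encoder(z)) has degree 2 * (K - 1) = 6, so the
# 7 available outputs determine it; evaluate the fitted curve at the alphas.
dec_x = [betas[i] for i in available]
dec_y = [outputs[i] for i in available]
recovered = [lagrange_eval(dec_x, dec_y, a) for a in alphas]

print(recovered == [model(x) for x in data])
```

Exact interpolation only works here because the stand-in model is a low-degree polynomial; for general models the abstract describes approximate recovery through regularized regression instead.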

arXiv:2401.17419 [pdf, other]

Few-Shot Channel-Agnostic Analog Coding: A Near-Optimal Scheme

Authors: Mohammad Ali Maddah-Ali, Soheil Mohajer

Abstract: In this paper, we investigate the problem of transmitting an analog source to a destination over $N$ uses of an additive-white-Gaussian-noise (AWGN) channel, where $N$ is very small (on the order of 10 or even less). The proposed coding scheme is based on representing the source symbol using a novel progressive expansion technique, partitioning the digits of the expansion into $N$ ordered sets, and finally mapping the symbols in each set to a real number by applying the reverse progressive expansion. In the last step, we introduce some gaps between the signal levels to prevent the carry-over of the additive noise from propagating to other levels. This shields the most significant levels of the signal from additive noise that hits the signal at a less significant level. The parameters of the progressive expansion and the shielding procedure are opportunistically independent of the $\mathrm{SNR}$ so that the proposed scheme achieves a distortion $D$, where $-\log(D)$ is within $O(\log\log(\mathrm{SNR}))$ of the optimal performance for all values of $\mathrm{SNR}$, leading to a channel-agnostic scheme.

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.16643 [pdf, other]

Game of Coding: Beyond Trusted Majorities

Authors: Hanzaleh Akbari Nodehi, Viveck R. Cadambe, Mohammad Ali Maddah-Ali

Abstract: Coding theory revolves around the incorporation of redundancy into transmitted symbols, computation tasks, and stored data to guard against adversarial manipulation. However, error correction in coding theory is contingent upon a strict trust assumption. In the context of computation and storage, it is required that honest nodes outnumber adversarial ones by a certain margin. However, in several emerging real-world cases, particularly in decentralized blockchain-oriented applications, such assumptions are often unrealistic. Consequently, despite the important role of coding in addressing significant challenges within decentralized systems, its applications become constrained. Still, in decentralized platforms, a distinctive characteristic emerges, offering new avenues for secure coding beyond the constraints of conventional methods. In these scenarios, the adversary benefits when the legitimate decoder recovers the data, and preferably with a high estimation error. This incentive motivates the adversary to act rationally, trying to maximize its gains. In this paper, we propose a game-theoretic formulation for coding, called the game of coding, that captures this unique dynamic, where the adversary and the data collector (decoder) each have a utility function to optimize. The utility functions reflect the fact that both the data collector and the adversary are interested in increasing the chance of data being recoverable by the data collector. Moreover, the utility functions express the interest of the data collector in estimating the input with lower estimation error, and the opposite interest of the adversary. As a first, still highly non-trivial step, we characterize the equilibrium of the game for the repetition code with a repetition factor of 2, for a wide class of utility functions with minimal assumptions.

Submitted 28 May, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2304.05691 [pdf, ps, other]

Vers: fully distributed Coded Computing System with Distributed Encoding

Authors: Nastaran Abadi Khooshemehr, Mohammad Ali Maddah-Ali

Abstract: Coded computing has proved to be useful in distributed computing. We have observed that almost all coded computing systems studied so far consider a setup of one master and some workers. However, recently emerging technologies such as blockchain, internet of things, and federated learning introduce new requirements for coded computing systems. In these systems, data is generated in a distributed manner, so central encoding/decoding by a master is neither feasible nor scalable. This paper presents a fully distributed coded computing system that consists of $k\in\mathbb{N}$ data owners and $N\in\mathbb{N}$ workers, where data owners employ workers to do some computations on their data, as specified by a target function $f$ of degree $d\in\mathbb{N}$. As there is no central encoder, workers perform encoding themselves, prior to the computation phase. The challenge in this system is the presence of adversarial data owners that do not know the data of honest data owners but cause discrepancies by sending different data to different workers, which is detrimental to local encodings in workers. There are at most $\beta\in\mathbb{N}$ adversarial data owners, and each sends at most $v\in\mathbb{N}$ different versions of data. Since the adversaries and their possibly colluding behavior are not known to workers and honest data owners, workers compute tags of their received data, in addition to their main computational task, and send them to data owners to help them in decoding. We introduce a tag function that allows data owners to partition workers into sets that previously had received the same data from all data owners. Then, we characterize the fundamental limit of the system, $t^*$, which is the minimum number of workers whose work can be used to correctly calculate the desired function of data of honest data owners. We show that $t^*=v^{\beta}d(K-1)+1$, and present converse and achievability proofs.

Submitted 12 April, 2023; originally announced April 2023.

arXiv:2302.09913 [pdf, other]

ByzSecAgg: A Byzantine-Resistant Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment

Authors: Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali, Giuseppe Caire

Abstract: In this paper, we propose ByzSecAgg, an efficient secure aggregation scheme for federated learning that is protected against Byzantine attacks and privacy leakage. Processing individual updates to manage adversarial behavior, while preserving privacy of data against colluding nodes, requires some sort of secure secret sharing. However, the communication load for secret sharing of long vectors of updates can be very high. ByzSecAgg solves this problem by partitioning local updates into smaller sub-vectors and sharing them using ramp secret sharing. However, this sharing method does not admit bi-linear computations, such as pairwise distance calculations, needed by outlier-detection algorithms. To overcome this issue, each user runs another round of ramp sharing, with a different embedding of data in the sharing polynomial. This technique, motivated by ideas from coded computing, enables secure computation of pairwise distance. In addition, to maintain the integrity and privacy of the local update, ByzSecAgg also uses a vector commitment method, in which the commitment size remains constant (i.e., it does not increase with the length of the local update), while simultaneously allowing verification of the secret sharing process. In terms of communication loads, ByzSecAgg significantly outperforms the state-of-the-art scheme, known as BREA.

Submitted 2 June, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

arXiv:2301.04753 [pdf, other]

Cache-Aided $K$-User Broadcast Channels with State Information at Receivers

Authors: Hadi Reisizadeh, Mohammad Ali Maddah-Ali, Soheil Mohajer

Abstract: We study a $K$-user coded-caching broadcast problem in a joint source-channel coding framework. The transmitter observes a database of files that are being generated at a certain rate per channel use, and each user has a cache, which can store a fixed fraction of the generated symbols. In the delivery phase, the transmitter broadcasts a message so that the users can decode their desired files using the received signal and their cache content. The communication between the transmitter and the receivers happens over a (deterministic) \textit{time-varying} erasure broadcast channel, and the channel state information is only available to the users. We characterize the maximum achievable source rate for the $2$-user and the degraded $K$-user problems. We provide an upper bound for any caching strategy's achievable source rates. Finally, we present a linear programming formulation to show that the upper bound is not a sharp characterization. Closing the gap between the achievable rate and the optimum rate remains open.

Submitted 11 October, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

arXiv:2209.11936 [pdf, other]

Online Admission Control and Rebalancing in Payment Channel Networks

Authors: Mahsa Bastankhah, Krishnendu Chatterjee, Mohammad Ali Maddah-Ali, Stefan Schmid, Jakub Svoboda, Michelle Yeo

Abstract: Payment channel networks (PCNs) are a promising technology to improve the scalability of cryptocurrencies. PCNs, however, face the challenge that the frequent usage of certain routes may deplete channels in one direction, and hence prevent further transactions. In order to reap the full potential of PCNs, recharging and rebalancing mechanisms are required to provision channels, as well as an admission control logic to decide which transactions to reject in case capacity is insufficient. This paper presents a formal model of this optimisation problem. In particular, we consider an online algorithms perspective, where transactions arrive over time in an unpredictable manner. Our main contributions are competitive online algorithms which come with provable guarantees over time. We empirically evaluate our algorithms on randomly generated transactions to compare the average performance of our algorithms to our theoretical bounds. We also show how this model and approach differ from related problems in classic communication networks.

Submitted 24 September, 2022; originally announced September 2022.

arXiv:2207.08392 [pdf, other]

Bitcoin-Enhanced Proof-of-Stake Security: Possibilities and Impossibilities

Authors: Ertem Nusret Tas, David Tse, Fangyu Gai, Sreeram Kannan, Mohammad Ali Maddah-Ali, Fisher Yu

Abstract: Bitcoin is the most secure blockchain in the world, supported by the immense hash power of its Proof-of-Work miners. Proof-of-Stake chains are energy-efficient and have fast finality, but face several security issues: susceptibility to non-slashable long-range safety attacks, low liveness resilience, and difficulty bootstrapping from low token valuation. We show that these security issues are inherent in any PoS chain without an external trusted source, and propose a new protocol, Babylon, where an off-the-shelf PoS protocol checkpoints onto Bitcoin to resolve these issues. An impossibility result justifies the optimality of Babylon. A use case of Babylon is to reduce the stake withdrawal delay: our experimental results show that this delay can be reduced from weeks in existing PoS chains to less than 5 hours using Babylon, at a transaction cost of less than 10K USD per annum for posting the checkpoints onto Bitcoin.

Submitted 12 May, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

Comments: Forthcoming in IEEE Symposium on Security and Privacy 2023

arXiv:2203.13060 [pdf, ps, other]

SwiftAgg+: Achieving Asymptotically Optimal Communication Loads in Secure Aggregation for Federated Learning

Authors: Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali, Songze Li, Giuseppe Caire

Abstract: We propose SwiftAgg+, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N \in \mathbb{N}$ distributed users, each of size $L \in \mathbb{N}$, trained on their local data, in a privacy-preserving manner. SwiftAgg+ can significantly reduce the communication overheads without any compromise on security, and achieve optimal communication loads within diminishing gaps. Specifically, in the presence of at most $D=o(N)$ dropout users, SwiftAgg+ achieves a per-user communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ symbols and a server communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ symbols, with a worst-case information-theoretic security guarantee, against any subset of up to $T=o(N)$ semi-honest users who may also collude with the curious server. Moreover, the proposed SwiftAgg+ allows for a flexible trade-off between communication loads and the number of active communication links. In particular, for $T<N-D$ and for any $K\in\mathbb{N}$, SwiftAgg+ can achieve a server communication load of $(1+\frac{T}{K})L$ symbols and a per-user communication load of up to $(1+\frac{T+D}{K})L$ symbols, where the number of pair-wise active connections in the network is $\frac{N}{2}(K+T+D+1)$.

Submitted 8 September, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2202.04169
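
The trade-off stated at the end of the SwiftAgg+ abstract above can be tabulated directly from its three formulas. The sketch below only evaluates those expressions; the function name, the argument check, and the sample parameter values are assumptions for illustration, not part of the protocol.

```python
from fractions import Fraction

def swiftagg_plus_tradeoff(N: int, T: int, D: int, K: int, L: int):
    """Evaluate the abstract's trade-off formulas for given system parameters.

    Returns (server load, maximum per-user load, number of active pairwise links),
    with loads measured in symbols.
    """
    if not T < N - D:
        raise ValueError("the abstract states the trade-off for T < N - D")
    server_load = (1 + Fraction(T, K)) * L
    per_user_load_max = (1 + Fraction(T + D, K)) * L
    active_links = Fraction(N, 2) * (K + T + D + 1)
    return server_load, per_user_load_max, active_links

# Hypothetical system: 100 users, 10 colluders, 5 dropouts, models of 1000 symbols.
for K in (1, 5, 15):
    server, per_user, links = swiftagg_plus_tradeoff(N=100, T=10, D=5, K=K, L=1000)
    print(K, float(server), float(per_user), float(links))
```

Increasing $K$ lowers both communication loads toward $L$ while raising the number of active links, which is the trade-off the abstract describes.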

arXiv:2202.04696 [pdf, other]

Distributed Attribute-based Private Access Control

Authors: Amir Masoud Jafarpisheh, Mahtab Mirmohseni, Mohammad Ali Maddah-Ali

Abstract: In attribute-based access control, users with certain verified attributes will gain access to some particular data. Concerned with the privacy of the users' attributes, we study the problem of distributed attribute-based private access control (DAPAC) with multiple authorities, where each authority will learn and verify only one of the attributes. To investigate its fundamental limits, we introduce an information-theoretic DAPAC framework, with $N \in \mathbb{N}$, $N\geq 2$, replicated non-colluding servers (authorities) and some users. Each user has an attribute vector $\mathbf{v^*}=(v_1^*, ..., v_N^*)$ of dimension $N$ and is eligible to retrieve a message $W^{\mathbf{v}^*}$, available in all servers. Each server $n\in [N]$ is able to observe and verify only the $n$-th attribute of a user. In response, it sends a function of its data to the user. The system must satisfy the following conditions: (1) Correctness: the user with attribute vector $\mathbf{v^*}$ is able to retrieve the intended message $W^{\mathbf{v}^*}$ from the servers' responses, (2) Data Secrecy: the user will not learn anything about the other messages, (3) Attribute Privacy: each server $n$ learns nothing beyond attribute $n$ of the user. The capacity of the DAPAC is defined as the ratio of the file size to the aggregated size of the responses, maximized over all feasible schemes. We obtain a lower bound on the capacity of this problem by proposing an achievable algorithm with rate $\frac{1}{2K}$, where $K$ is the size of the alphabet of each attribute.

Submitted 9 February, 2022; originally announced February 2022.

arXiv:2202.04169 [pdf, ps, other]

SwiftAgg: Communication-Efficient and Dropout-Resistant Secure Aggregation for Federated Learning with Worst-Case Security Guarantees

Authors: Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali, Songze Li, Giuseppe Caire

Abstract: We propose SwiftAgg, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N$ distributed users, each of size $L$, trained on their local data, in a privacy-preserving manner. Compared with state-of-the-art secure aggregation protocols, SwiftAgg significantly reduces the communication overheads without any compromise on security. Specifically, in the presence of at most $D$ dropout users, SwiftAgg achieves a users-to-server communication load of $(T+1)L$ and a users-to-users communication load of up to $(N-1)(T+D+1)L$, with a worst-case information-theoretic security guarantee, against any subset of up to $T$ semi-honest users who may also collude with the curious server. The key idea of SwiftAgg is to partition the users into groups of size $D+T+1$. In the first phase, secret sharing and aggregation of the individual models are performed within each group. In the second phase, model aggregation is performed on $D+T+1$ sequences of users across the groups. If a user in a sequence drops out in the second phase, the rest of the sequence remains silent. This design allows only a subset of users to communicate with each other, and only the users in a single group to directly communicate with the server, eliminating two requirements of other secure aggregation protocols: 1) an all-to-all communication network across users; and 2) all users communicating with the server. This helps to substantially reduce the communication costs of the system.

Submitted 29 April, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

arXiv:2103.01589 [pdf, other]

Optimal Communication-Computation Trade-Off in Heterogeneous Gradient Coding

Authors: Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali

Abstract: Gradient coding allows a master node to derive the aggregate of the partial gradients, calculated by some worker nodes over the local data sets, with minimum communication cost, and in the presence of stragglers. In this paper, for gradient coding with linear encoding, we characterize the optimum communication cost for heterogeneous distributed systems with \emph{arbitrary} data placement, with $s \in \mathbb{N}$ stragglers and $a \in \mathbb{N}$ adversarial nodes. In particular, we show that the optimum communication cost, normalized by the size of the gradient vectors, is equal to $(r-s-2a)^{-1}$, where $r \in \mathbb{N}$ is the minimum number of times that a data partition is replicated. In other words, the communication cost is determined by the data partition with the minimum replication, irrespective of the structure of the placement. The proposed achievable scheme also allows us to target the computation of a polynomial function of the aggregated gradient matrix. It also allows us to borrow some ideas from approximate computing and propose an approximate gradient coding scheme for the cases when the repetition in data placement is smaller than what is needed to meet the restriction imposed on communication cost, or when the number of stragglers appears to be more than the presumed value in the system design.

Submitted 2 March, 2021; originally announced March 2021.
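
As a small worked example of the formula in the gradient coding abstract above, the sketch below evaluates the normalized optimum communication cost $(r-s-2a)^{-1}$. The function name, the sample values, and the guard requiring $r > s + 2a$ are assumptions added here for illustration.

```python
from fractions import Fraction

def normalized_comm_cost(r: int, s: int, a: int) -> Fraction:
    """Optimum communication cost, normalized by gradient size: 1 / (r - s - 2a).

    r: minimum number of times a data partition is replicated
    s: number of stragglers
    a: number of adversarial nodes
    """
    margin = r - s - 2 * a
    if margin <= 0:
        # Assumed guard: the expression is only meaningful for r > s + 2a.
        raise ValueError("need r > s + 2a")
    return Fraction(1, margin)

# Hypothetical placement: every partition is stored on at least 6 workers.
print(normalized_comm_cost(r=6, s=0, a=0))
print(normalized_comm_cost(r=6, s=1, a=1))
```

Each adversarial node costs twice as much replication margin as a straggler, so tolerating one of each with $r=6$ leaves a margin of 3 and a normalized cost of $1/3$.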

arXiv:2103.01568 [pdf, other]

The Capacity Region of Distributed Multi-User Secret Sharing

Authors: Ali Khalesi, Mahtab Mirmohseni, Mohammad Ali Maddah-Ali

Abstract: In this paper, we study the problem of distributed multi-user secret sharing, including a trusted master node, $N\in \mathbb{N}$ storage nodes, and $K$ users, where each user has access to the contents of a subset of storage nodes. Each user has an independent secret message with a certain rate, defined as the size of the message normalized by the size of a storage node. Having access to the secret messages, the trusted master node places encoded shares in the storage nodes, such that (i) each user can recover its own message from the content of the storage nodes that it has access to, and (ii) each user cannot gain any information about the message of any other user. We characterize the capacity region of distributed multi-user secret sharing, defined as the set of all achievable rate tuples, subject to the correctness and privacy constraints. In the achievable scheme, for each user, the master node forms a polynomial with degree equal to the number of its accessible storage nodes minus one, where the values of this polynomial at certain points are stored as the encoded shares. The message of that user is embedded in some of the coefficients of the polynomial. The remaining coefficients are determined such that the content of each storage node serves as the encoded shares for all users that have access to that storage node.

Submitted 2 March, 2021; originally announced March 2021.

arXiv:2103.01344 [pdf, ps, other]

Multi-Party Proof Generation in QAP-based zk-SNARKs

Authors: Ali Rahimi, Mohammad Ali Maddah-Ali

Abstract: A zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK) allows a party, known as the prover, to convince another party, known as the verifier, that he knows a private value $v$, without revealing it, such that $F(u,v)=y$ for some function $F$ and public values $u$ and $y$. Among the various versions of zk-SNARK, the Quadratic Arithmetic Program (QAP)-based zk-SNARK has been widely used in practice, especially in blockchain technology. This is attributed to two desirable features: its fixed-size proof and the very light computation load of the verifier. However, the computation load of the prover in QAP-based zk-SNARKs is very heavy, even though it is designed to be very efficient. This load can be beyond the prover's computation power to handle, and has to be offloaded to some external servers. In the existing offloading solutions, either (i) the load of computation offloaded to each server is a fraction of the prover's primary computation (e.g., DZIK), but the servers need to be trusted, or (ii) the servers are not required to be trusted, but the computation complexity imposed on each one is the same as the prover's primary computation (e.g., Trinocchio). In this paper, we present a scheme that has the benefits of both solutions. In particular, we propose a secure multi-party proof generation algorithm where the prover can delegate its task to $N$ servers, where (i) even if a group of $T \in \mathbb{N}$ servers, $T\le N$, collude, they cannot gain any information about the secret value $v$, and (ii) the computation complexity of each server is less than $1/(N-T)$ of the prover's primary computation. The design is such that we do not lose the efficiency of the prover's algorithm in the process of delegating the tasks to external servers.

Submitted 1 March, 2021; originally announced March 2021.

Comments: 31 pages, 2 figures

arXiv:2102.02867 [pdf, ps, other]

The Discrepancy Attack on Polyshard-ed Blockchains

Authors: Nastaran Abadi Khooshemehr, Mohammad Ali Maddah-Ali

Abstract: Sharding, i.e., splitting the miners or validators to form and run several subchains in parallel, is known as one of the main solutions to the scalability problem of blockchains. The drawback is that as the number of miners expanding each subchain becomes small, it becomes vulnerable to security attacks. To solve this problem, a framework named \textit{Polyshard} has been proposed, in which each validator verifies a coded combination of the blocks introduced by different subchains, thus helping to protect the security of all subchains. In this paper, we introduce an attack on Polyshard, called \textit{the discrepancy} attack, which is the result of malicious nodes controlling a few subchains and dispersing different blocks to different nodes. We show that this attack undermines the security of Polyshard and is undetectable in its current setting.

Submitted 20 August, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

arXiv:2010.10083 [pdf, other]

Bias-Resistant Social News Aggregator Based on Blockchain

Authors: Amir Ziashahabi, Mohammad Ali Maddah-Ali, Abbas Heydarnoori

Abstract: In today's world, social networks have become one of the primary sources for creation and propagation of news. Social news aggregators are one of the actors in this area in which users post news items and use positive or negative votes to indicate their preference toward a news item. News items will be ordered and displayed according to their aggregated votes. This approach suffers from several problems ranging from being prone to the dominance of the majority to difficulty in discerning between correct and fake news, and lack of incentive for honest behaviors. In this paper, we propose a graph-based news aggregator in which instead of voting on the news items, users submit their votes on the relations between pairs of news items. More precisely, if a user believes two news items support each other, he will submit a positive vote on the link between the two items, and if he believes that two news items undermine each other, he will submit a negative vote on the corresponding link. This approach has mainly two desirable features: (1) mitigating the effect of personal preferences on voting, (2) connection of new items to endorsing and disputing evidence. This approach helps the newsreaders to understand different aspects of a news item better. We also introduce an incentive layer that uses blockchain as a distributed transparent manager to encourage users to behave honestly and abstain from adversarial behaviors. The incentive layer takes into account that users can have different viewpoints toward news, enabling users from a wide range of viewpoints to contribute to the network and benefit from its rewards. In addition, we introduce a protocol that enables us to prove fraud in computations of the incentive layer model on the blockchain. Ultimately, we will analyze the fraud proof protocol and examine our incentive layer on a wide range of synthesized datasets.

Submitted 20 October, 2020; originally announced October 2020.

Comments: 23 pages, 8 figures, Abstract abridged due to arXiv limits

arXiv:2009.08327 [pdf, other]

Berrut Approximated Coded Computing: Straggler Resistance Beyond Polynomial Computing

Authors: Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali

Abstract: One of the major challenges in using distributed learning to train complicated models with large data sets is to deal with the straggler effect. As a solution, coded computation has been recently proposed to efficiently add redundancy to the computation tasks. In this technique, coding is used across data sets, and computation is done over coded data, such that the results of an arbitrary subset of worker nodes with a certain size are enough to recover the final results. The major challenges with those approaches are (1) they are limited to polynomial function computations, (2) the size of the subset of servers that we need to wait for grows with the multiplication of the size of the data set and the model complexity (the degree of the polynomial), which can be prohibitively large, (3) they are not numerically stable for computation over real numbers. In this paper, we propose Berrut Approximated Coded Computing (BACC), as an alternative approach, which is not limited to polynomial function computation. In addition, the master node can approximately calculate the final results, using the outcomes of any arbitrary subset of available worker nodes. The approximation approach is proven to be numerically stable with low computational complexity. In addition, the accuracy of the approximation is established theoretically and verified by simulation results in different settings such as distributed learning problems. In particular, BACC is used to train a deep neural network on a cluster of servers, which outperforms repetitive computation (repetition coding) in terms of the rate of convergence.
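As a rough illustration of the kind of approximate decoding described above, the following is a minimal sketch of Berrut's first rational interpolant, which the scheme's name refers to. The abstract does not spell out the encoder, the choice of evaluation points, or the decoder details, so the function name `berrut_interpolate`, the node values, and the way stragglers are dropped are all assumptions made for illustration only.

```python
def berrut_interpolate(nodes, values, x):
    """Evaluate Berrut's first rational interpolant at x.

    nodes  : sorted, distinct real evaluation points (assumed worker points)
    values : results returned for those points
    The weights alternate in sign, w_i = (-1)^i, over the nodes that are present.
    """
    num = 0.0
    den = 0.0
    for i, (xi, fi) in enumerate(zip(nodes, values)):
        if x == xi:
            # The interpolant passes through the data points exactly.
            return fi
        w = (-1.0) ** i / (x - xi)
        num += w * fi
        den += w
    return num / den


# Hypothetical example: five workers, the third one is a straggler.
all_nodes = [0.0, 0.25, 0.5, 0.75, 1.0]
all_values = [xi ** 2 for xi in all_nodes]
available = [0, 1, 3, 4]  # indices of workers that responded
nodes = [all_nodes[i] for i in available]
values = [all_values[i] for i in available]
approx = berrut_interpolate(nodes, values, 0.5)  # approximate value at the missing point
```

Because the weights are re-indexed over whichever nodes are present, the same routine can be applied to any subset of responses, which is the property the abstract emphasizes; the quality of the approximation depends on the function and the nodes and is not claimed here.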

Submitted 1 November, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

arXiv:2007.13230 [pdf, other]

Energy Efficiency Through Joint Routing and Function Placement in Different Modes of SDN/NFV Networks

Authors: Reza Moosavi, Saeedeh Parsaeefard, Mohammad Ali Maddah-Ali, Vahid Shah-Mansouri, Babak Hossein Khalaj, Mehdi Bennis

Abstract: Network function virtualization (NFV) and software defined networking (SDN) are two promising technologies to enable 5G and 6G services and achieve cost reduction, network scalability, and deployment flexibility. However, migration to full SDN/NFV networks in order to serve these services is a time-consuming and costly process for mobile operators. This paper focuses on energy efficiency during the transition of mobile core networks (MCN) to full SDN/NFV networks, and explores how energy efficiency can be addressed during such migration. We propose a general system model containing a combination of legacy nodes and links, in addition to newly introduced NFV and SDN nodes. We refer to this system model as partial SDN and hybrid NFV MCN which can cover different modes of SDN and NFV implementations. Based on this framework, we formulate energy efficiency by considering joint routing and function placement in the network. Since this problem belongs to the class of non-linear integer programming problems, to solve it efficiently, we present a modified Viterbi algorithm (MVA) based on multi-stage graph modeling and a modified Dijkstra's algorithm. We simulate this algorithm for a number of network scenarios with different fractions of NFV and SDN nodes, and evaluate how much energy can be saved through such transition. Simulation results confirm the expected performance of the algorithm, which saves up to 70% energy compared to a network where all nodes are always on. Interestingly, the amount of energy saved by the proposed algorithm in the case of hybrid NFV and partial SDN networks can reach up to 60-90% of the saved energy in full NFV/SDN networks.

Submitted 17 October, 2020; v1 submitted 26 July, 2020; originally announced July 2020.

arXiv:2004.04985 [pdf, other]

Coded Secure Multi-Party Computation for Massive Matrices with Adversarial Nodes

Authors: Seyed Reza Hoseini Najarkolaei, Mohammad Ali Maddah-Ali, Mohammad Reza Aref

Abstract: In this work, we consider the problem of secure multi-party computation (MPC), consisting of $Γ$ sources, each with access to a large private matrix, $N$ processing nodes or workers, and one data collector or master. The master is interested in the result of a polynomial function of the input matrices. Each source sends a randomized function of its matrix, called its share, to each worker. The workers process their shares in interaction with each other, and send some results to the master such that it can derive the final result. There are several constraints: (1) each worker can store a function of each input matrix, with the size of $\frac{1}{m}$ fraction of that input matrix, (2) up to $t$ of the workers, for some integer $t$, are adversarial and may collude to gain information about the private inputs or can do malicious actions to make the final result incorrect. The objective is to design an MPC scheme with the minimum number of workers, called the recovery threshold, such that the final result is correct, workers learn no information about the input matrices, and the master learns nothing beyond the final result. In this paper, we propose an MPC scheme that achieves the recovery threshold of $3t+2m-1$ workers, which is order-wise less than the recovery threshold of the conventional methods. The challenge in dealing with this setup is that when nodes interact with each other, the malicious messages that adversarial nodes generate propagate through the system, and can mislead the honest nodes. To deal with this challenge, we design some subroutines that can detect erroneous messages, and correct or drop them.

Submitted 10 April, 2020; originally announced April 2020.

Comments: 41 Pages

arXiv:2004.00811 [pdf, ps, other]

Fundamental Limits of Distributed Encoding

Authors: Nastaran Abadi Khooshemehr, Mohammad Ali Maddah-Ali

Abstract: In general coding theory, we often assume that error is observed in transferring or storing encoded symbols, while the process of encoding itself is error-free. Motivated by recent applications of coding theory, in this paper, we consider the case where the process of encoding is distributed and prone to error. We introduce the problem of distributed encoding, comprising $K\in\mathbb{N}$ isolated source nodes and $N\in\mathbb{N}$ encoding nodes. Each source node has one symbol from a finite field and sends it to all encoding nodes. Each encoding node stores an encoded symbol, as a function of the received symbols. However, some of the source nodes are controlled by the adversary and may send different symbols to different encoding nodes. Depending on the number of adversarial nodes, denoted by $β\in\mathbb{N}$, and the number of symbols that each one generates, denoted by $v\in\mathbb{N}$, the process of decoding from the encoded symbols could be impossible. Assume that a decoder connects to an arbitrary subset of $t \in\mathbb{N}$ encoding nodes and wants to decode the symbols of the honest nodes correctly, without necessarily identifying the sets of honest and adversarial nodes. In this paper, we study $t^*\in\mathbb{N}$, the minimum of $t$, which is a function of $K$, $N$, $β$, and $v$. We show that when the encoding nodes use linear coding, $t^*_{\textrm{linear}}=K+2β(v-1)$, if $N\ge K+2β(v-1)$, and $t^*_{\textrm{linear}}=N$, if $N\le K+2β(v-1)$. In order to achieve $t^*_{\textrm{linear}}$, we use random linear coding and show that in any feasible solution that the decoder finds, the messages of the honest nodes are decoded correctly. For the converse of the fundamental limit, we show that when the adversary behaves in a particular way, it can always confuse the decoder between two feasible solutions that differ in the message of at least one honest node.

Submitted 27 February, 2021; v1 submitted 2 April, 2020; originally announced April 2020.

arXiv:2003.12423 [pdf, other]

A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to Balance Communication Overhead, Computational Complexity, and Convergence Rate

Authors: Naeimeh Omidvar, Mohammad Ali Maddah-Ali, Hamed Mahdavi

Abstract: In this paper, we propose a method of distributed stochastic gradient descent (SGD), with low communication load and computational complexity, and still fast convergence. To reduce the communication load, at each iteration of the algorithm, the worker nodes calculate and communicate some scalars, which are the directional derivatives of the sample functions in some \emph{pre-shared directions}. However, to maintain accuracy, after every specific number of iterations, they communicate the vectors of stochastic gradients. To reduce the computational complexity in each iteration, the worker nodes approximate the directional derivatives with zeroth-order stochastic gradient estimation, by performing just two function evaluations rather than computing a first-order gradient vector. The proposed method highly improves the convergence rate of the zeroth-order methods, guaranteeing order-wise faster convergence. Moreover, compared to the famous communication-efficient methods of model averaging (that perform local model updates and periodic communication of the gradients to synchronize the local models), we prove that for the general class of non-convex stochastic problems and with reasonable choice of parameters, the proposed method guarantees the same orders of communication load and convergence rate, while having order-wise less computational complexity. Experimental results on various learning problems in neural networks applications demonstrate the effectiveness of the proposed approach compared to various state-of-the-art distributed SGD methods.
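To make the "two function evaluations" idea concrete, here is a minimal sketch of a zeroth-order estimate of a directional derivative. The abstract does not say which finite-difference form the authors use, so the central difference, the smoothing parameter `mu`, and the function name `directional_derivative` are assumptions chosen for illustration.

```python
def directional_derivative(f, x, v, mu=1e-4):
    """Estimate the derivative of f at x along direction v.

    Uses exactly two evaluations of f (a central difference), so no
    gradient vector is ever formed. mu is an assumed step size.
    """
    x_plus = [xi + mu * vi for xi, vi in zip(x, v)]
    x_minus = [xi - mu * vi for xi, vi in zip(x, v)]
    return (f(x_plus) - f(x_minus)) / (2.0 * mu)


def sum_of_squares(x):
    # Toy sample function; its exact directional derivative is 2 * <x, v>.
    return sum(xi * xi for xi in x)


# A worker would send this single scalar instead of a full gradient vector.
scalar = directional_derivative(sum_of_squares, [1.0, 2.0, 3.0], [1.0, 0.0, 0.0])
```

If the direction `v` is shared in advance between the workers and the server, the scalar alone is enough to reconstruct the update `scalar * v`, which is where the communication saving described in the abstract would come from.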

Submitted 27 March, 2020; originally announced March 2020.

ACM Class: G.1.6

arXiv:2003.12052 [pdf, other]

Corella: A Private Multi Server Learning Approach based on Correlated Queries

Authors: Hamidreza Ehteram, Mohammad Ali Maddah-Ali, Mahtab Mirmohseni

Abstract: The emerging applications of machine learning algorithms on mobile devices motivate us to offload the computation tasks of training a model or deploying a trained one to the cloud or at the edge of the network. One of the major challenges in this setup is to guarantee the privacy of the client data. Various methods have been proposed to protect privacy in the literature. Those include (i) adding noise to the client data, which reduces the accuracy of the result, (ii) using secure multiparty computation (MPC), which requires significant communication among the computing nodes or with the client, (iii) relying on homomorphic encryption (HE) methods, which significantly increases computation load at the servers. In this paper, we propose $\textit{Corella}$ as an alternative approach to protect the privacy of data. The proposed scheme relies on a cluster of servers, where at most $T \in \mathbb{N}$ of them may collude, each running a learning model (e.g., a deep neural network). Each server is fed with the client data, added with $\textit{strong}$ noise, independent from user data. The variance of the noise is set to be large enough to make the information leakage to any subset of up to $T$ servers information-theoretically negligible. On the other hand, the added noises for different servers are $\textit{correlated}$. This correlation among the queries allows the parameters of the models running on different servers to be $\textit{trained}$ such that the client can mitigate the contribution of the noises by combining the outputs of the servers, and recover the final result with high accuracy and with a minor computational effort. Simulation results for various datasets demonstrate the accuracy of the proposed approach for the classification, using deep neural networks, and the autoencoder, as supervised and unsupervised learning tasks, respectively.

Submitted 27 July, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

Comments: 13 pages, 9 figures, 4 tables

arXiv:2003.11424 [pdf, other]

BlockMarkchain: A Secure Decentralized Data Market with a Constant Load on the Blockchain

Authors: Hamidreza Ehteram, Mohammad Taha Toghani, Mohammad Ali Maddah-Ali

Abstract: In this paper, we develop BlockMarkchain, as a secure data marketplace, where individual data sellers can exchange certified data with buyers, in a secure environment, without any mutual trust among the parties, and without trusting a third party as a mediator. To develop this platform, we rely on a smart contract, deployed on a secure public blockchain. The main challenges here are to verify the validity of data and to prevent malicious behavior of the parties, while preserving the privacy of the data and taking into account the limited computing and storage resources available on the blockchain. In BlockMarkchain, the buyer has the option to dispute the honesty of the seller and prove the invalidity of the data to the smart contract. The smart contract evaluates the buyer's claim and punishes the dishonest party by forfeiting his/her deposit in favor of the honest party. BlockMarkchain enjoys several salient features including (i) the certified data is never revealed on the public blockchain, (ii) the size of data posted on the blockchain, the load of computation on the blockchain, and the cost of communication with the blockchain are constant and negligible, and (iii) the computation cost of verifications on the parties is not expensive.

Submitted 25 March, 2020; originally announced March 2020.

Comments: 16 pages, 4 figures

arXiv:1908.04255 [pdf, other]

Secure Coded Multi-Party Computation for Massive Matrix Operations

Authors: Hanzaleh Akbari Nodehi, Mohammad Ali Maddah-Ali

Abstract: In this paper, we consider a secure multi-party computation problem (MPC), where the goal is to offload the computation of an arbitrary polynomial function of some massive private matrices (inputs) to a cluster of workers. The workers are not reliable. Some of them may collude to gain information about the input data (semi-honest workers). The system is initialized by sharing a (randomized) function of each input matrix to each server. Since the input matrices are massive, each share's size is assumed to be at most $1/k$ fraction of the input matrix, for some $k \in \mathbb{N}$. The objective is to minimize the number of workers needed to perform the computation task correctly, such that even if an arbitrary subset of $t-1$ workers, for some $t \in \mathbb{N}$, collude, they cannot gain any information about the input matrices. We propose a sharing scheme, called \emph{polynomial sharing}, and show that it admits basic operations such as addition and multiplication of matrices and transposing a matrix. By concatenating the procedures for basic operations, we show that any polynomial function of the input matrices can be calculated, subject to the problem constraints. We show that the proposed scheme can offer order-wise gain in terms of the number of workers needed, compared to the approaches formed by the concatenation of job splitting and conventional MPC approaches.

Submitted 15 September, 2020; v1 submitted 12 August, 2019; originally announced August 2019.

arXiv:1908.01204 [pdf, ps, other]

Private Sequential Function Computation

Authors: Behrooz Tahmasebi, Mohammad Ali Maddah-Ali

Abstract: Consider a system, including a user, $N$ servers, and $K$ basic functions which are known at all of the servers. Using the combination of those basic functions, it is possible to construct a wide class of functions. The user wishes to compute a particular combination of the basic functions, by offloading the computation to $N$ servers, while the servers should not obtain any information about which combination of the basic functions is to be computed. The objective is to minimize the total number of queries asked by the user from the servers to achieve the desired result. As a first step toward this problem, in this paper, we consider the case where the user is interested in a class of functions which are compositions of the basic functions, while each basic function appears in the composition exactly once. This means that in this case, to ensure privacy, we only need to hide the order of the basic functions in the desired composition of the user. We further assume that the basic functions are linear and can be represented by (possibly large-scale) matrices. We call this problem private sequential function computation. We study the capacity $C$, defined as the supremum of the number of desired computations, normalized by the number of computations done at the servers, subject to the privacy constraint. We prove that $(1-\frac{1}{N})/ (1-\frac{1}{\max(K,N)}) \le C \le 1$. For the achievability, we show that the user can retrieve the desired order of composition, by choosing a proper order of inquiries among different servers, while keeping the order of computations for each server fixed, irrespective of the desired order of composition. In the end, we develop an information-theoretic converse which results in an upper bound on the capacity.

Submitted 7 March, 2020; v1 submitted 3 August, 2019; originally announced August 2019.

Comments: 28 pages. This paper has been presented at IEEE ISIT 2019

arXiv:1907.04302 [pdf, ps, other]

Interactive Verifiable Polynomial Evaluation

Authors: Saeid Sahraei, Mohammad Ali Maddah-Ali, Salman Avestimehr

Abstract: Cloud computing platforms have created the possibility for computationally limited users to delegate demanding tasks to strong but untrusted servers. Verifiable computing algorithms help build trust in such interactions by enabling the server to provide a proof of correctness of his results which the user can check very efficiently. In this paper, we present a doubly-efficient interactive algorithm for verifiable polynomial evaluation. Unlike the mainstream literature on verifiable computing, the soundness of our algorithm is information-theoretic and cannot be broken by a computationally unbounded server. By relying on basic properties of error correcting codes, our algorithm forces a dishonest server to provide false results to problems which become progressively easier to verify. After roughly $\log d$ rounds, the user can verify the response of the server against a look-up table that has been pre-computed during an initialization phase. For a polynomial of degree $d$, we achieve a user complexity of $O(d^ε)$, a server complexity of $O(d^{1+ε})$, a round complexity of $O(\log d)$ and an initialization complexity of $O(d^{1+ε})$.

Submitted 9 July, 2019; originally announced July 2019.

arXiv:1904.00800 [pdf, ps, other]

Private Shotgun DNA Sequencing: A Structured Approach

Authors: Ali Gholami, Mohammad Ali Maddah-Ali, Seyed Abolfazl Motahari

Abstract: DNA sequencing has faced a huge demand since it was first introduced as a service to the public. This service is often offloaded to the sequencing companies who will have access to full knowledge of individuals' sequences, a major violation of privacy. To address this challenge, we propose a solution, which is based on separating the process of reading the fragments of sequences, which is done at a sequencing machine, and assembling the reads, which is done at a trusted local data collector. To confuse the sequencer, in a pooled sequencing scenario, in which multiple sequences are going to be sequenced simultaneously, for each target individual, we add fragments of one non-target individual, with a known DNA sequence at the data collector. Then the coverage depths of the individuals, defined as the number of DNA fragments per DNA site, are selected proportional to the powers of two. This layered structured solution allows us to ensure privacy, using only one sequencing machine, in contrast to our previous solution, where we relied on the existence of multiple non-colluding sequencing machines.

Submitted 2 April, 2019; v1 submitted 28 March, 2019; originally announced April 2019.

Comments: 10 pages, 3 figures. arXiv admin note: text overlap with arXiv:1811.10693

ACM Class: E.4; H.1.1

arXiv:1902.06319 [pdf, ps, other]

Private Inner Product Retrieval for Distributed Machine Learning

Authors: Mohammad Hossein Mousavi, Mohammad Ali Maddah-Ali, Mahtab Mirmohseni

Abstract: In this paper, we argue that in many basic algorithms for machine learning, including support vector machine (SVM) for classification, principal component analysis (PCA) for dimensionality reduction, and regression for dependency estimation, we need the inner products of the data samples, rather than the data samples themselves. Motivated by the above observation, we introduce the problem of private inner product retrieval for distributed machine learning, where we have a system including a database of some files, duplicated across some non-colluding servers. A user intends to retrieve a subset of specific size of the inner products of the data files with minimum communication load, without revealing any information about the identity of the requested subset. For achievability, we use the algorithms for multi-message private information retrieval. For converse, we establish that as the length of the files becomes large, the set of all inner products converges to independent random variables with uniform distribution, and derive the rate of convergence. To prove that, we construct special dependencies among sequences of the sets of all inner products with different lengths, which form a time-homogeneous irreducible Markov chain, without affecting the marginal distribution. We show that this Markov chain has a uniform distribution as its unique stationary distribution, with rate of convergence dominated by the second largest eigenvalue of the transition probability matrix. This allows us to develop a converse, which converges to a tight bound in some cases, as the size of the files becomes large. While this converse is based on the one in multi-message private information retrieval, due to the nature of retrieving inner products instead of data itself, some changes are made to reach the desired result.

Submitted 17 February, 2019; originally announced February 2019.

arXiv:1901.06698 [pdf, other]

Cloud-Aided Interference Management with Cache-Enabled Edge Nodes and Users

Authors: Seyed Pooya Shariatpanahi, Jingjing Zhang, Osvaldo Simeone, Babak Hossein Khalaj, Mohammad-Ali Maddah-Ali

Abstract: This paper considers a cloud-RAN architecture with cache-enabled multi-antenna Edge Nodes (ENs) that deliver content to cache-enabled end-users. The ENs are connected to a central server via limited-capacity fronthaul links, and, based on the information received from the central server and the cached contents, they transmit on the shared wireless medium to satisfy users' requests. By leveraging cooperative transmission as enabled by ENs' caches and fronthaul links, as well as multicasting opportunities provided by users' caches, a close-to-optimal caching and delivery scheme is proposed. As a result, the minimum Normalized Delivery Time (NDT), a high-SNR measure of delivery latency, is characterized to within a multiplicative constant gap of $3/2$ under the assumption of uncoded caching and fronthaul transmission, and of one-shot linear precoding. This result demonstrates the interplay among fronthaul links capacity, ENs' caches, and end-users' caches in minimizing the content delivery time.

Submitted 20 January, 2019; originally announced January 2019.

Comments: 9 pages, 3 figures, submitted

arXiv:1812.10460 [pdf, other]

CodedSketch: A Coding Scheme for Distributed Computation of Approximated Matrix Multiplication

Authors: Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali

Abstract: In this paper, we propose CodedSketch, as a distributed straggler-resistant scheme to compute an approximation of the multiplication of two massive matrices. The objective is to reduce the recovery threshold, defined as the total number of worker nodes that we need to wait for to be able to recover the final result. To exploit the fact that only an approximated result is required, in reducing the recovery threshold, some sorts of pre-compression are required. However, compression inherently involves some randomness that would lose the structure of the matrices. On the other hand, considering the structure of the matrices is crucial to reduce the recovery threshold. In CodedSketch, we use count--sketch, as a hash-based compression scheme, on the rows of the first and columns of the second matrix, and a structured polynomial code on the columns of the first and rows of the second matrix. This arrangement allows us to exploit the gain of both in reducing the recovery threshold. To increase the accuracy of computation, multiple independent count--sketches are needed. This independence allows us to theoretically characterize the accuracy of the result and establish the recovery threshold achieved by the proposed scheme. To guarantee the independence of the resulting count--sketches in the output, while keeping its cost on the recovery threshold minimal, we use another layer of structured codes.
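For readers unfamiliar with the hash-based compression mentioned above, the following is a minimal sketch of a single count-sketch applied to a plain vector. It does not reproduce the paper's scheme, which applies sketches to matrix rows and columns and combines them with a polynomial code; the helper names `make_count_sketch`, `sketch`, and `estimate`, and the use of seeded random tables in place of hash functions, are assumptions for illustration.

```python
import random


def make_count_sketch(n, buckets, seed=0):
    """Draw a bucket index and a random sign for each of the n coordinates."""
    rng = random.Random(seed)
    bucket_of = [rng.randrange(buckets) for _ in range(n)]
    sign_of = [rng.choice((-1, 1)) for _ in range(n)]
    return bucket_of, sign_of


def sketch(x, bucket_of, sign_of, buckets):
    """Compress x into `buckets` counters by adding signed entries per bucket."""
    out = [0.0] * buckets
    for i, xi in enumerate(x):
        out[bucket_of[i]] += sign_of[i] * xi
    return out


def estimate(sk, i, bucket_of, sign_of):
    """Estimate coordinate i from its bucket; exact when no other entry collides."""
    return sign_of[i] * sk[bucket_of[i]]


bucket_of, sign_of = make_count_sketch(n=8, buckets=4, seed=1)
x = [0.0] * 8
x[3] = 5.0
sk = sketch(x, bucket_of, sign_of, buckets=4)
recovered = estimate(sk, 3, bucket_of, sign_of)
```

Each estimate is unbiased over the random signs but noisy when coordinates collide, which is consistent with the abstract's remark that multiple independent count-sketches are needed to increase accuracy.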

Submitted 12 February, 2021; v1 submitted 26 December, 2018; originally announced December 2018.

arXiv:1808.03708 [pdf, other]

The Capacity of Associated Subsequence Retrieval

Authors: Behrooz Tahmasebi, Mohammad Ali Maddah-Ali, Seyed Abolfazl Motahari

Abstract: The objective of a genome-wide association study (GWAS) is to associate subsequences of individuals' genomes with observable characteristics called phenotypes (e.g., high blood pressure). Motivated by the GWAS problem, in this paper we introduce the information-theoretic problem of \emph{associated subsequence retrieval}, where a dataset of $N$ (possibly high-dimensional) sequences of length $G$ and their corresponding observable (binary) characteristics are given. The sequences are chosen independently and uniformly at random from $\mathcal{X}^G$, where $\mathcal{X}$ is a finite alphabet. The observable (binary) characteristic is only related to a specific unknown subsequence of length $L$ of the sequences, called the \textit{associated subsequence}. For each sequence, if its associated subsequence belongs to a universal finite set, then it is more likely to display the observable characteristic (i.e., it is more likely that the observable characteristic is one). The goal is to retrieve the associated subsequence using a dataset of $N$ sequences and their observable characteristics. We demonstrate that as the parameters $N$, $G$, and $L$ grow, a threshold effect appears in the curve of probability of error versus the rate, which is defined as ${Gh(L/G)}/{N}$, where $h(\cdot)$ is the binary entropy function. This effect allows us to define the capacity of associated subsequence retrieval. We develop an achievable scheme and a matching converse for this problem, and thus characterize its capacity in two scenarios: the zero-error-rate and the $ε$-error-rate.
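As a small worked illustration of the rate ${Gh(L/G)}/{N}$ defined in this abstract, the following sketch evaluates it numerically. The function names `binary_entropy` and `retrieval_rate` are hypothetical, and the entropy is assumed to be measured in bits (base-2 logarithm), which is the usual convention for the binary entropy function.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def retrieval_rate(n, g, l):
    """Rate G * h(L/G) / N for N sequences of length G and subsequence length L."""
    return g * binary_entropy(l / g) / n
```

For example, with `g=100`, `l=50`, and `n=200`, the entropy term equals one bit, so the rate is `100 / 200`.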

Submitted 14 October, 2020; v1 submitted 10 August, 2018; originally announced August 2018.

arXiv:1807.08646 [pdf, other]

$K$--User Interference Channel with Backhaul Cooperation: DoF vs. Backhaul Load Trade--Off

Authors: Borna Kananian, Mohammad Ali Maddah-Ali, Babak Hossein Khalaj

Abstract: In this paper, we consider multiple-antenna $K$-user interference channels with backhaul collaboration on one side (among the transmitters or among the receivers) and investigate the trade-off between the rate in the channel and the communication load in the backhaul. In this investigation, we focus on a first-order approximation result, where the rate of the wireless channel is measured by the degrees of freedom (DoF) per user, and the load of the backhaul is measured by the entropy of backhaul messages per user normalized by $\log$ of transmit power, in the high-power regime. This trade-off is fully characterized for the case of even values of $K$, and approximately characterized for the case of odd values of $K$, with vanishing approximation gap as $K$ grows. For full DoF, this result establishes the (approximate) optimality of the most straightforward scheme, called the Centralized Scheme, in which the messages are collected at one of the nodes, centrally processed, and forwarded back to each node. In addition, this result shows that the gain of schemes that rely on distributed processing through pairwise communication among the nodes (e.g., cooperative alignment) does not scale with the size of the network. For the converse, we develop a new outer bound on the trade-off based on splitting the set of collaborative nodes (transmitters or receivers) into two subsets, and assuming full cooperation within each group. We further investigate the trade-off for the cases where the backhaul or the wireless links (interference channel) are not fully connected.

Submitted 23 July, 2018; originally announced July 2018.

arXiv:1807.03337 [pdf, other]

Optimum Transmission Delay for Function Computation in NFV-based Networks: the role of Network Coding and Redundant Computing

Authors: Behrooz Tahmasebi, Mohammad Ali Maddah-Ali, Saeedeh Parsaeefard, Babak Hossein Khalaj

Abstract: In this paper, we study the problem of delay minimization in NFV-based networks. In such systems, the ultimate goal of any request is to compute a sequence of functions in the network, where each function can be computed at only a specific subset of network nodes. In conventional approaches, for each function, we choose one node from the corresponding subset of the nodes to compute that function. In contrast, in this work, we allow each function to be computed in more than one node, redundantly in parallel, to respond to a given request. We argue that such redundancy in computation not only improves the reliability of the network, but would also, perhaps surprisingly, reduce the overall transmission delay. In particular, we establish that by judiciously choosing the subset of nodes which compute each function, in conjunction with a linear network coding scheme to deliver the result of each computation, we can characterize and achieve the optimal end-to-end transmission delay. In addition, we show that using such a technique, we can significantly reduce the transmission delay as compared to the conventional approach. In some scenarios, such reduction can even scale with the size of the network. More precisely, by increasing the number of nodes that can compute the given function in parallel by a multiplicative factor, the end-to-end delay will also decrease by the same factor. Moreover, we show that while finding the subset of nodes for each computation is, in general, a complex integer program, approximation algorithms can be proposed to reduce the computational complexity. In fact, for the case where the number of computing nodes for a given function is upper-bounded by a constant, a dynamic programming scheme can be proposed to find the optimum subsets in polynomial time. Our numerical simulations confirm the achieved gain in performance in comparison with conventional approaches.

Submitted 11 September, 2018; v1 submitted 9 July, 2018; originally announced July 2018.

Comments: Revised Version

arXiv:1805.11892 [pdf, ps, other]

Multi-Message Private Information Retrieval with Private Side Information

Authors: Seyed Pooya Shariatpanahi, Mahdi Jafari Siavoshani, Mohammad Ali Maddah-Ali

Abstract: We consider the problem of private information retrieval (PIR) where a single user with private side information aims to retrieve multiple files from a library stored (uncoded) at a number of servers. We assume the side information at the user includes a subset of files stored privately (i.e., the server does not know the indices of these files). In addition, we require that the identity of the requests and side information at the user are not revealed to any of the servers. The problem involves finding the minimum load to be transmitted from the servers to the user such that the requested files can be decoded with the help of received and side information. By providing matching lower and upper bounds, for certain regimes, we characterize the minimum load imposed on all the servers (i.e., the capacity of this PIR problem). Our result shows that the capacity is the same as the capacity of a multi-message PIR problem without private side information, but with a library of reduced size. The effective size of the library is equal to the original library size minus the size of side information.

Submitted 30 May, 2018; originally announced May 2018.

arXiv:1805.01993 [pdf, other]

Compressed Coded Distributed Computing

Authors: Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: Communication overhead is one of the major performance bottlenecks in large-scale distributed computing systems, in particular for machine learning applications. Conventionally, compression techniques are used to reduce the load of communication by combining intermediate results of the same computation task as much as possible. Recently, via the development of coded distributed computing (CDC), it has been shown that it is possible to enable coding opportunities across intermediate results of different computation tasks to further reduce the communication load. We propose a new scheme, named compressed coded distributed computing (in short, compressed CDC), which jointly exploits the above two techniques (i.e., combining the intermediate results of the same computation and coding across the intermediate results of different computations) to significantly reduce the communication load for computations with linear aggregation (reduction) of intermediate results in the final stage that are prevalent in machine learning (e.g., distributed training algorithms where partial gradients are computed distributedly and then averaged in the final stage). In particular, compressed CDC first compresses/combines several intermediate results for a single computation, and then utilizes multiple such combined packets to create a coded multicast packet that is simultaneously useful for multiple computations. We characterize the achievable communication load of compressed CDC and show that it substantially outperforms both combining methods and the CDC scheme.

Submitted 4 May, 2018; originally announced May 2018.

Comments: A shorter version to appear in ISIT 2018

arXiv:1801.07487 [pdf, other]

doi 10.1109/TIT.2019.2963864

Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding

Authors: Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: We consider the problem of massive matrix multiplication, which underlies many data analytic applications, in a large-scale distributed system comprising a group of worker nodes. We target the stragglers' delay performance bottleneck, which is due to the unpredictable latency in waiting for the slowest nodes (or stragglers) to finish their tasks. We propose a novel coding strategy, named \emph{entangled polynomial code}, for designing the intermediate computations at the worker nodes in order to minimize the recovery threshold (i.e., the number of workers that we need to wait for in order to compute the final output). We demonstrate the optimality of the entangled polynomial code in several cases, and show that it provides orderwise improvement over the conventional schemes for straggler mitigation. Furthermore, we characterize the optimal recovery threshold among all linear coding strategies within a factor of $2$ using \emph{bilinear complexity}, by developing an improved version of the entangled polynomial code. In particular, while evaluating bilinear complexity is a well-known challenging problem, we show that the optimal recovery threshold for linear coding strategies can be approximated within a factor of $2$ of this fundamental quantity. On the other hand, the improved version of the entangled polynomial code enables further and orderwise reduction in the recovery threshold, compared to its basic version. Finally, we show that the techniques developed in this paper can also be extended to several other problems such as coded convolution and fault-tolerant computing, leading to tight characterizations.

Submitted 9 April, 2020; v1 submitted 23 January, 2018; originally announced January 2018.

Journal ref: Published in: IEEE Transactions on Information Theory (Jan. 2020)

arXiv:1711.04677 [pdf, ps, other]

Private Function Retrieval

Authors: Mahtab Mirmohseni, Mohammad Ali Maddah-Ali

Abstract: The widespread use of cloud computing services raises the question of how one can delegate processing tasks to untrusted distributed parties without breaching the privacy of its data and algorithms. Motivated by the algorithm privacy concerns in a distributed computing system, in this paper, we introduce the private function retrieval (PFR) problem, where a user wishes to efficiently retrieve a linear function of $K$ messages from $N$ non-communicating replicated servers while keeping the function hidden from each individual server. The goal is to find a scheme with minimum communication cost. To characterize the fundamental limits of the communication cost, we define the capacity of the PFR problem as the size of the message that can be privately retrieved (which is the size of one file) normalized to the required downloaded information bits. We first show that for the PFR problem with $K$ messages, $N=2$ servers and a linear function with binary coefficients the capacity is $C=\frac{1}{2}\Big(1-\frac{1}{2^K}\Big)^{-1}$. Interestingly, this is the capacity of retrieving one of $K$ messages from $N=2$ servers while keeping the index of the requested message hidden from each individual server, the problem known as private information retrieval (PIR). Then, we extend the proposed achievable scheme to the case of an arbitrary number of servers and coefficients in the field $GF(q)$ with arbitrary $q$ and obtain $R=\Big(1-\frac{1}{N}\Big)\Big(1+\frac{\frac{1}{N-1}}{(\frac{q^K-1}{q-1})^{N-1}}\Big)$.
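The two closed-form expressions in this abstract can be checked against each other with exact rational arithmetic. The sketch below is only a numerical evaluation of the formulas as printed; the function names `pfr_capacity_two_servers` and `pfr_achievable_rate` are hypothetical. Substituting $N=2$ and $q=2$ into the expression for $R$ gives $\frac{1}{2}\big(1+\frac{1}{2^K-1}\big)$, which simplifies to the stated capacity $C$, so the general rate is consistent with the binary two-server result.

```python
from fractions import Fraction


def pfr_capacity_two_servers(k):
    """C = (1/2) * (1 - 1/2^K)^(-1) for N = 2 servers, binary coefficients."""
    return Fraction(1, 2) / (1 - Fraction(1, 2 ** k))


def pfr_achievable_rate(n, k, q):
    """R = (1 - 1/N) * (1 + (1/(N-1)) / ((q^K - 1)/(q - 1))^(N-1)); requires n >= 2, q >= 2."""
    # (q^K - 1)/(q - 1) equals the geometric sum 1 + q + ... + q^(K-1).
    geometric_sum = Fraction(q ** k - 1, q - 1)
    return (1 - Fraction(1, n)) * (1 + Fraction(1, n - 1) / geometric_sum ** (n - 1))
```

For instance, `pfr_capacity_two_servers(2)` evaluates to `Fraction(2, 3)`, and `pfr_achievable_rate(2, 2, 2)` gives the same value.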

Submitted 15 November, 2017; v1 submitted 13 November, 2017; originally announced November 2017.

arXiv:1710.06471 [pdf, other]

Coded Fourier Transform

Authors: Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: We consider the problem of computing the Fourier transform of high-dimensional vectors, distributedly over a cluster of machines consisting of a master node and multiple worker nodes, where the worker nodes can only store and process a fraction of the inputs. We show that by exploiting the algebraic structure of the Fourier transform operation and leveraging concepts from coding theory, one can efficiently deal with the straggler effects. In particular, we propose a computation strategy, named as coded FFT, which achieves the optimal recovery threshold, defined as the minimum number of workers that the master node needs to wait for in order to compute the output. This is the first code that achieves the optimum robustness in terms of tolerating stragglers or failures for computing Fourier transforms. Furthermore, the reconstruction process for coded FFT can be mapped to MDS decoding, which can be solved efficiently. Moreover, we extend coded FFT to settings including computing general $n$-dimensional Fourier transforms, and provide the optimal computing strategy for those settings.

Submitted 17 October, 2017; originally announced October 2017.

arXiv:1706.07523 [pdf, other]

Communication-Aware Computing for Edge Processing

Authors: Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: We consider a mobile edge computing problem, in which mobile users offload their computation tasks to computing nodes (e.g., base stations) at the network edge. The edge nodes compute the requested functions and communicate the computed results to the users via wireless links. For this problem, we propose a Universal Coded Edge Computing (UCEC) scheme for linear functions to simultaneously minimize the load of computation at the edge nodes, and maximize the physical-layer communication efficiency towards the mobile users. In the proposed UCEC scheme, edge nodes create coded inputs of the users, from which they compute coded output results. Then, the edge nodes utilize the computed coded results to create communication messages that zero-force all the interference signals over the air at each user. Specifically, the proposed scheme is universal since the coded computations performed at the edge nodes are oblivious of the channel states during the communication process from the edge nodes to the users.

Submitted 22 June, 2017; originally announced June 2017.

Comments: To Appear in ISIT 2017

arXiv:1705.10464 [pdf, other]

Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication

Authors: Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: We consider a large-scale matrix multiplication problem where the computation is carried out using a distributed system with a master node and multiple worker nodes, where each worker can store parts of the input matrices. We propose a computation strategy that leverages ideas from coding theory to design intermediate computations at the worker nodes, in order to efficiently deal with straggling workers. The proposed strategy, named as \emph{polynomial codes}, achieves the optimum recovery threshold, defined as the minimum number of workers that the master needs to wait for in order to compute the output. Furthermore, by leveraging the algebraic structure of polynomial codes, we can map the reconstruction problem of the final output to a polynomial interpolation problem, which can be solved efficiently. Polynomial codes provide order-wise improvement over the state of the art in terms of recovery threshold, and are also optimal in terms of several other metrics. Furthermore, we extend this code to distributed convolution and show its order-wise optimality.
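The idea of mapping reconstruction to polynomial interpolation can be illustrated with a toy example. The sketch below is a simplified illustration under stated assumptions, not the paper's implementation: it assumes the goal is $C = A^{\top}B$, that $A$ and $B$ are split into $m$ and $n$ single-column blocks, that worker $i$ is assigned the evaluation point $x_i$, and that exact rational arithmetic is used instead of a finite field. All function names are hypothetical. Each worker returns one evaluation of a polynomial of degree $mn-1$ whose coefficients are the entries of $C$, so any $mn$ responses suffice to decode.

```python
from fractions import Fraction


def encode(cols, x, stride):
    """Return sum_j cols[j] * x**(j * stride) as one coded column vector."""
    length = len(cols[0])
    return [sum(col[i] * x ** (j * stride) for j, col in enumerate(cols))
            for i in range(length)]


def worker(a_cols, b_cols, x):
    """One worker's task: inner product of its coded A and coded B columns."""
    m = len(a_cols)
    a_tilde = encode(a_cols, x, 1)   # exponents 0, 1, ..., m-1
    b_tilde = encode(b_cols, x, m)   # exponents 0, m, ..., (n-1)m
    return sum(p * q for p, q in zip(a_tilde, b_tilde))


def interpolate(points, values):
    """Recover polynomial coefficients by solving the Vandermonde system exactly."""
    k = len(points)
    rows = [[Fraction(x) ** d for d in range(k)] + [Fraction(v)]
            for x, v in zip(points, values)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [e * inv for e in rows[col]]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [e - factor * p for e, p in zip(rows[r], rows[col])]
    return [rows[d][k] for d in range(k)]


def decode(points, values, m, n):
    """Entry (j, k) of A^T B is the coefficient of x**(j + k*m)."""
    coeffs = interpolate(points, values)
    return [[coeffs[j + k * m] for k in range(n)] for j in range(m)]


# Toy run: 8 workers, m*n = 6 responses needed, workers at x = 2 and x = 5 straggle.
a_cols = [[1, 2], [3, 4]]            # columns of A
b_cols = [[5, 6], [7, 8], [9, 10]]   # columns of B
fast = [x for x in range(1, 9) if x not in (2, 5)]
results = [worker(a_cols, b_cols, x) for x in fast]
product = decode(fast, results, m=2, n=3)
```

In this toy run, `product` is decoded from six of the eight workers, regardless of which two are missing, because the evaluation points are distinct.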

Submitted 24 January, 2018; v1 submitted 30 May, 2017; originally announced May 2017.

arXiv:1702.07297 [pdf, other]

How to Optimally Allocate Resources for Coded Distributed Computing?

Authors: Qian Yu, Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: Today's data centers have an abundance of computing resources, hosting server clusters consisting of as many as tens or hundreds of thousands of machines. To execute a complex computing task over a data center, it is natural to distribute computations across many nodes to take advantage of parallel processing. However, as we allocate more and more computing resources to a computation task and further distribute the computations, large amounts of (partially) computed data must be moved between consecutive stages of computation tasks among the nodes, hence the communication load can become the bottleneck. In this paper, we study the optimal allocation of computing resources in distributed computing, in order to minimize the total execution time, accounting for the duration of both the computation and communication phases. In particular, we consider a general MapReduce-type distributed computing framework, in which the computation is decomposed into three stages: \emph{Map}, \emph{Shuffle}, and \emph{Reduce}. We focus on a recently proposed \emph{Coded Distributed Computing} approach for MapReduce and study the optimal allocation of computing resources in this framework. For all values of problem parameters, we characterize the optimal number of servers that should be used for distributed processing, provide the optimal placements of the Map and Reduce tasks, and propose an optimal coded data shuffling scheme, in order to minimize the total execution time. To prove the optimality of the proposed scheme, we first derive a matching information-theoretic converse on the execution time, then we prove that among all possible resource allocation schemes that achieve the minimum execution time, our proposed scheme uses exactly the minimum possible number of servers.

Submitted 23 February, 2017; originally announced February 2017.

arXiv:1702.06082 [pdf, other]

Coding for Distributed Fog Computing

Authors: Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: Redundancy is abundant in Fog networks (i.e., many computing and storage points) and grows linearly with network size. We demonstrate the transformational role of coding in Fog computing for leveraging such redundancy to substantially reduce the bandwidth consumption and latency of computing. In particular, we discuss two recently proposed coding concepts, namely Minimum Bandwidth Codes and Minimum Latency Codes, and illustrate their impacts in Fog computing. We also review a unified coding framework that includes the above two coding techniques as special cases, and enables a tradeoff between computation latency and communication load to optimize system performance. At the end, we will discuss several open problems and future research directions.

Submitted 20 February, 2017; originally announced February 2017.

Comments: To appear in IEEE Communications Magazine, Issue on Fog Computing and Networking

arXiv:1702.04850 [pdf, other]

Coded TeraSort

Authors: Songze Li, Sucha Supittayapornpong, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: We focus on sorting, which is the building block of many machine learning algorithms, and propose a novel distributed sorting algorithm, named Coded TeraSort, which substantially improves the execution time of the TeraSort benchmark in Hadoop MapReduce. The key idea of Coded TeraSort is to impose structured redundancy in data, in order to enable in-network coding opportunities that overcome the data shuffling bottleneck of TeraSort. We empirically evaluate the performance of the Coded TeraSort algorithm on Amazon EC2 clusters, and demonstrate that it achieves 1.97x - 3.39x speedup, compared with TeraSort, for typical settings of interest.

Submitted 15 February, 2017; originally announced February 2017.

Comments: to appear in proceedings of 2017 International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics

arXiv:1702.04563 [pdf, other]

Characterizing the Rate-Memory Tradeoff in Cache Networks within a Factor of 2

Authors: Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: We consider a basic caching system, where a single server with a database of $N$ files (e.g. movies) is connected to a set of $K$ users through a shared bottleneck link. Each user has a local cache memory with a size of $M$ files. The system operates in two phases: a placement phase, where each cache memory is populated up to its size from the database, and a following delivery phase, where each user requests a file from the database, and the server is responsible for delivering the requested contents. The objective is to design the two phases to minimize the load (peak or average) of the bottleneck link. We characterize the rate-memory tradeoff of the above caching system within a factor of $2.00884$ for both the peak rate and the average rate (under uniform file popularity), improving on the state-of-the-art results, which are within a factor of $4$ and $4.7$ respectively. Moreover, in a practically important case where the number of files ($N$) is large, we exactly characterize the tradeoff for systems with no more than $5$ users, and characterize the tradeoff within a factor of $2$ otherwise. To establish these results, we develop two new converse bounds that improve over the state of the art.

Submitted 31 August, 2018; v1 submitted 15 February, 2017; originally announced February 2017.

arXiv:1701.05881

On the Optimality of Separation between Caching and Delivery in General Cache Networks

Authors: Navid Naderializadeh, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: We consider a system containing a library of multiple files and a general memoryless communication network through which a server is connected to multiple users, each equipped with a local isolated cache of certain size that can be used to store part of the library. Each user will ask for one of the files in the library, which needs to be delivered by the server through the intermediate communication network. The objective is to design the cache placement (without prior knowledge of users' future requests) and the delivery phase in order to minimize the (normalized) delivery delay. We assume that the delivery phase consists of two steps: (1) generation of a set of multicast messages at the server, one for each subset of users, and (2) delivery of the multicast messages to the users. In this setting, we show that there exists a universal scheme for cache placement and multicast message generation, which is independent of the underlying communication network between the server and the users, and achieves the optimal delivery delay to within a constant factor for all memoryless networks. We prove this result, even though the capacity region of the underlying communication network is not known, even approximately. This result demonstrates that in the aforementioned setting, a separation between caching and multicast message generation on one hand, and delivering the multicast messages to the users on the other hand is approximately optimal. This result has the important practical implication that the prefetching can be done independently of the network structure in the upcoming delivery phase.

Submitted 5 May, 2018; v1 submitted 20 January, 2017; originally announced January 2017.

Comments: Presented in part at the 2017 IEEE International Symposium on Information Theory (ISIT) -- withdrawn due to possible errors in the achievability proof in Section IV-A

arXiv:1609.07817 [pdf, other]

doi 10.1109/TIT.2017.2785237

The Exact Rate-Memory Tradeoff for Caching with Uncoded Prefetching

Authors: Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: We consider a basic cache network, in which a single server is connected to multiple users via a shared bottleneck link. The server has a database of files (content). Each user has an isolated memory that can be used to cache content in a prefetching phase. In a following delivery phase, each user requests a file from the database, and the server needs to deliver users' demands as efficiently as possible by taking into account their cache contents. We focus on an important and commonly used class of prefetching schemes, where the caches are filled with uncoded data. We provide the exact characterization of the rate-memory tradeoff for this problem, by deriving both the minimum average rate (for a uniform file popularity) and the minimum peak rate required on the bottleneck link for a given cache size available at each user. In particular, we propose a novel caching scheme, which strictly improves the state of the art by exploiting commonality among user demands. We then demonstrate the exact optimality of our proposed scheme through a matching converse, by dividing the set of all demands into types, and showing that the placement phase in the proposed caching scheme is universally optimal for all types. Using these techniques, we also fully characterize the rate-memory tradeoff for a decentralized setting, in which users fill out their cache content without any coordination.

Submitted 18 February, 2019; v1 submitted 25 September, 2016; originally announced September 2016.

Journal ref: Published in IEEE Transactions on Information Theory (Volume: 64, Issue: 2, Feb. 2018)

arXiv:1609.01690 [pdf, other]

A Unified Coding Framework for Distributed Computing with Straggling Servers

Authors: Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: We propose a unified coded framework for distributed computing with straggling servers, by introducing a tradeoff between "latency of computation" and "load of communication" for some linear computation tasks. We show that the coded scheme of [1]-[3], which repeats the intermediate computations to create coded multicasting opportunities that reduce the communication load, and the coded scheme of [4], [5], which generates redundant intermediate computations to combat straggling servers, can be viewed as special instances of the proposed framework, by considering two extremes of this tradeoff: minimizing either the load of communication or the latency of computation individually. Furthermore, the latency-load tradeoff achieved by the proposed coded framework allows one to systematically operate at any point on that tradeoff to perform distributed computing tasks. We also prove an information-theoretic lower bound on the latency-load tradeoff, which is shown to be within a constant multiplicative gap from the achieved tradeoff at the two end points.

Submitted 6 September, 2016; originally announced September 2016.

Comments: a shorter version to appear in NetCod 2016

arXiv:1608.05743 [pdf, ps, other]

A Scalable Framework for Wireless Distributed Computing

Authors: Songze Li, Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Abstract: We consider a wireless distributed computing system, in which multiple mobile users, connected wirelessly through an access point, collaborate to perform a computation task. In particular, users communicate with each other via the access point to exchange their locally computed intermediate computation results, which is known as data shuffling. We propose a scalable framework for this system, in which the required communication bandwidth for data shuffling does not increase with the number of users in the network. The key idea is to utilize a particular repetitive pattern of placing the dataset (thus a particular repetitive pattern of intermediate computations), in order to provide coding opportunities at both the users and the access point, which reduce the required uplink communication bandwidth from users to access point and the downlink communication bandwidth from access point to users by factors that grow linearly with the number of users. We also demonstrate that the proposed dataset placement and coded shuffling schemes are optimal (i.e., achieve the minimum required shuffling load) for both a centralized setting and a decentralized setting, by developing tight information-theoretic lower bounds.

Submitted 5 May, 2017; v1 submitted 19 August, 2016; originally announced August 2016.

Comments: To appear in IEEE/ACM Transactions on Networking

arXiv:1604.07086 [pdf, other]

A Fundamental Tradeoff between Computation and Communication in Distributed Computing

Authors: Songze Li, Mohammad Ali Maddah-Ali, Qian Yu, A. Salman Avestimehr

Abstract: How can we optimally trade extra computing power to reduce the communication load in distributed computing? We answer this question by characterizing a fundamental tradeoff between computation and communication in distributed computing, i.e., the two are inversely proportional to each other. More specifically, a general distributed computing framework, motivated by commonly used structures like MapReduce, is considered, where the overall computation is decomposed into computing a set of "Map" and "Reduce" functions distributedly across multiple computing nodes. A coded scheme, named "Coded Distributed Computing" (CDC), is proposed to demonstrate that increasing the computation load of the Map functions by a factor of $r$ (i.e., evaluating each function at $r$ carefully chosen nodes) can create novel coding opportunities that reduce the communication load by the same factor. An information-theoretic lower bound on the communication load is also provided, which matches the communication load achieved by the CDC scheme. As a result, the optimal computation-communication tradeoff in distributed computing is exactly characterized. Finally, the coding techniques of CDC are applied to the Hadoop TeraSort benchmark to develop a novel CodedTeraSort algorithm, which is empirically demonstrated to speed up the overall job execution by $1.97\times$ - $3.39\times$, for typical settings of interest.

Submitted 22 September, 2017; v1 submitted 24 April, 2016; originally announced April 2016.

Comments: To appear in IEEE Transactions on Information Theory

Showing 1–50 of 79 results for author: Maddah-Ali, M