Coded Computing for Distributed Graph Analytics

Prakash, Saurav; Reisizadeh, Amirhossein; Pedarsani, Ramtin; Avestimehr, Amir Salman

doi:10.1109/TIT.2020.2999675

arXiv:1801.05522 (cs)

[Submitted on 17 Jan 2018 (v1), last revised 9 Jun 2020 (this version, v4)]

Abstract: The performance of distributed graph processing systems suffers significantly from a 'communication bottleneck', as a large number of messages are exchanged among servers at each step of the computation. Motivated by graph-based MapReduce, we propose a coded computing framework that leverages computation redundancy to alleviate the communication bottleneck in distributed graph processing. We develop a novel 'coding' scheme that systematically injects structured redundancy in the computation phase to enable 'coded' multicasting opportunities during message exchange between servers, substantially reducing the communication load in large-scale graph processing. For theoretical analysis, we consider random graph models and prove that our proposed scheme enables an (asymptotically) inverse-linear trade-off between 'computation load' and 'average communication load' for two popular random graph models: the Erdős-Rényi model and the power law model. In particular, for a given computation load r (i.e., when each graph vertex is carefully stored at r servers), the proposed scheme slashes the average communication load by (nearly) a multiplicative factor of r. For the Erdős-Rényi model, we prove that our proposed scheme is asymptotically optimal as the graph size increases by providing an information-theoretic converse. To illustrate the benefits of our scheme in practice, we implement PageRank over Amazon EC2, using artificial as well as real-world datasets, and demonstrate significant gains over conventional PageRank. We also specialize our scheme and extend our theoretical results to two other random graph models: the random bipartite model and the stochastic block model. The specialized schemes asymptotically enable inverse-linear trade-offs between computation and communication loads in distributed graph processing for these popular random graph models as well. We complement the achievability results with converse bounds for both of these models.
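The coded multicasting idea in the abstract can be illustrated with a toy sketch. It is not the authors' scheme, only a minimal example of how redundant computation can let one XOR-coded packet serve two servers at once. The three-server setup, the byte values, and all names (`xor_bytes`, `value_for_s2`, `value_for_s3`) are assumptions made for illustration, with computation load r = 2.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical intermediate values (for example, per-edge contributions in
# one graph-processing iteration). With r = 2, each value is assumed to be
# computable at two of the three servers.
value_for_s2 = b"\x11\x22\x33\x44"  # needed by server 2; known to servers 1 and 3
value_for_s3 = b"\xaa\xbb\xcc\xdd"  # needed by server 3; known to servers 1 and 2

# Uncoded exchange: server 1 sends each value separately (two packets).
uncoded_transmissions = [value_for_s2, value_for_s3]

# Coded exchange: server 1 multicasts a single XOR of the two values.
coded_packet = xor_bytes(value_for_s2, value_for_s3)

# Each receiver cancels the value it already computed locally.
decoded_at_s2 = xor_bytes(coded_packet, value_for_s3)  # server 2 knows value_for_s3
decoded_at_s3 = xor_bytes(coded_packet, value_for_s2)  # server 3 knows value_for_s2

uncoded_load = sum(len(v) for v in uncoded_transmissions)
coded_load = len(coded_packet)

print("server 2 recovered its value:", decoded_at_s2 == value_for_s2)
print("server 3 recovered its value:", decoded_at_s3 == value_for_s3)
print("bytes sent uncoded vs coded:", uncoded_load, "vs", coded_load)
```

In this toy case the coded packet is half the size of the uncoded exchange, which mirrors the factor-of-r reduction the abstract describes for r = 2. The actual scheme in the paper must also handle graph structure and vertex placement, which this sketch does not attempt.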

Comments:	Accepted for publication in the IEEE Transactions on Information Theory
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)
Cite as:	arXiv:1801.05522 [cs.DC]
	(or arXiv:1801.05522v4 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1801.05522
Related DOI:	https://doi.org/10.1109/TIT.2020.2999675

Submission history

From: Saurav Prakash
[v1] Wed, 17 Jan 2018 01:43:10 UTC (887 KB)
[v2] Wed, 4 Jul 2018 12:29:48 UTC (795 KB)
[v3] Thu, 20 Jun 2019 03:39:16 UTC (722 KB)
[v4] Tue, 9 Jun 2020 14:43:15 UTC (1,877 KB)
