Computer Science > Distributed, Parallel, and Cluster Computing
[Submitted on 17 Jan 2018 (v1), last revised 9 Jun 2020 (this version, v4)]
Title:Coded Computing for Distributed Graph Analytics
View PDFAbstract:Performance of distributed graph processing systems significantly suffers from 'communication bottleneck' as a large number of messages are exchanged among servers at each step of the computation. Motivated by graph based MapReduce, we propose a coded computing framework that leverages computation redundancy to alleviate the communication bottleneck in distributed graph processing. We develop a novel 'coding' scheme that systematically injects structured redundancy in computation phase to enable 'coded' multicasting opportunities during message exchange between servers, reducing communication load substantially in large-scale graph processing. For theoretical analysis, we consider random graph models, and prove that our proposed scheme enables an (asymptotically) inverse-linear trade-off between 'computation load' and 'average communication load' for two popular random graph models -- Erdos-Renyi model, and power law model. Particularly, for a given computation load r, (i.e. when each graph vertex is carefully stored at r servers), the proposed scheme slashes the average communication load by (nearly) a multiplicative factor of r. For the Erdos-Renyi model, our proposed scheme is optimal asymptotically as the graph size increases by providing an information-theoretic converse. To illustrate the benefits of our scheme in practice, we implement PageRank over Amazon EC2, using artificial as well as real-world datasets, demonstrating significant gains over conventional PageRank. We also specialize our scheme and extend our theoretical results to two other random graph models -- random bi-partite model, and stochastic block model. They asymptotically enable inverse-linear trade-offs between computation and communication loads in distributed graph processing for these popular random graph models as well. We complement the achievability results with converse bounds for both of these models.
Submission history
From: Saurav Prakash [view email][v1] Wed, 17 Jan 2018 01:43:10 UTC (887 KB)
[v2] Wed, 4 Jul 2018 12:29:48 UTC (795 KB)
[v3] Thu, 20 Jun 2019 03:39:16 UTC (722 KB)
[v4] Tue, 9 Jun 2020 14:43:15 UTC (1,877 KB)
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.