Computer Science > Distributed, Parallel, and Cluster Computing
[Submitted on 29 May 2019]
Title: Evaluation of pilot jobs for Apache Spark applications on HPC clusters
Abstract: Big Data has become prominent throughout many scientific fields and, as a result, scientific communities have sought out Big Data frameworks to accelerate the processing of their increasingly data-intensive pipelines. However, while scientific communities typically rely on High-Performance Computing (HPC) clusters for the parallelization of their pipelines, many popular Big Data frameworks such as Hadoop and Apache Spark were primarily designed to be executed on dedicated commodity infrastructures. This paper evaluates the benefits of pilot jobs over traditional batch submission for Apache Spark on HPC clusters. Surprisingly, our results show that the speed-up provided by pilot jobs over batch scheduling is moderate to nonexistent (0.98 on average) despite the presence of long queuing times. In addition, pilot jobs provide an extra layer of scheduling that complicates debugging and deployment. We conclude that traditional batch scheduling should remain the default strategy to deploy Apache Spark applications on HPC clusters.
Submission history
From: Valérie Hayot-Sasson
[v1] Wed, 29 May 2019 20:55:50 UTC (330 KB)