Analysis of Workflow Schedulers in Simulated Distributed Environments
Authors:
Jakub Beránek,
Stanislav Böhm,
Vojtěch Cima
Abstract:
Task graphs provide a simple way to describe scientific workflows (sets of tasks with dependencies) that can be executed on both HPC clusters and in the cloud. An important aspect of executing such graphs is the used scheduling algorithm. Many scheduling heuristics have been proposed in existing works; nevertheless, they are often tested in oversimplified environments. We provide an extensible simulation environment designed for prototyping and benchmarking task schedulers, which contains implementations of various scheduling algorithms and is open-sourced, in order to be fully reproducible. We use this environment to perform a comprehensive analysis of workflow scheduling algorithms with a focus on quantifying the effect of scheduling challenges that have so far been mostly neglected, such as delays between scheduler invocations or partially unknown task durations. Our results indicate that network models used by many previous works might produce results that are off by an order of magnitude in comparison to a more realistic model. Additionally, we show that certain implementation details of scheduling algorithms which are often neglected can have a large effect on the scheduler's performance, and they should thus be described in great detail to enable proper evaluation.
Submitted 14 April, 2022;
originally announced April 2022.
EVEREST: A design environment for extreme-scale big data analytics on heterogeneous platforms
Authors:
Christian Pilato,
Stanislav Bohm,
Fabien Brocheton,
Jeronimo Castrillon,
Riccardo Cevasco,
Vojtech Cima,
Radim Cmar,
Dionysios Diamantopoulos,
Fabrizio Ferrandi,
Jan Martinovic,
Gianluca Palermo,
Michele Paolino,
Antonio Parodi,
Lorenzo Pittaluga,
Daniel Raho,
Francesco Regazzoni,
Katerina Slaninova,
Christoph Hagleitner
Abstract:
High-Performance Big Data Analytics (HPDA) applications are characterized by huge volumes of distributed and heterogeneous data that require efficient computation for knowledge extraction and decision making. Designers are moving towards a tight integration of computing systems combining HPC, Cloud, and IoT solutions with artificial intelligence (AI). Matching the application and data requirements with the characteristics of the underlying hardware is a key element to improve the predictions thanks to high performance and better use of resources.
We present EVEREST, a novel H2020 project started on October 1st, 2020 that aims at developing a holistic environment for the co-design of HPDA applications on heterogeneous, distributed, and secure platforms. EVEREST focuses on programmability issues through a data-driven design approach, the use of hardware-accelerated AI, and an efficient runtime monitoring with virtualization support. In the different stages, EVEREST combines state-of-the-art programming models, emerging communication standards, and novel domain-specific extensions. We describe the EVEREST approach and the use cases that drive our research.
Submitted 6 March, 2021;
originally announced March 2021.