-
The LDBC Graphalytics Benchmark
Authors:
Alexandru Iosup,
Ahmed Musaafir,
Alexandru Uta,
Arnau Prat Pérez,
Gábor Szárnyas,
Hassan Chafi,
Ilie Gabriel Tănase,
Lifeng Nai,
Michael Anderson,
Mihai Capotă,
Narayanan Sundaram,
Peter Boncz,
Siegfried Depner,
Stijn Heldens,
Thomas Manhardt,
Tim Hegeman,
Wing Lung Ngai,
Yinglong Xia
Abstract:
In this document, we describe LDBC Graphalytics, an industrial-grade benchmark for graph analysis platforms. The main goal of Graphalytics is to enable the fair and objective comparison of graph analysis platforms. Due to the diversity of bottlenecks and performance issues such platforms need to address, Graphalytics consists of a set of selected deterministic algorithms for full-graph analysis, standard graph datasets, synthetic dataset generators, and reference output for validation purposes. Its test harness produces deep metrics that quantify multiple kinds of system scalability (weak and strong) and robustness (such as failures and performance variability). The benchmark also balances comprehensiveness with the runtime necessary to obtain the deep metrics. The benchmark comes with open-source software for generating performance data, for validating algorithm results, for monitoring and sharing performance data, and for obtaining the final benchmark result as a standard performance report.
Submitted 6 April, 2023; v1 submitted 30 November, 2020;
originally announced November 2020.
-
Towards a property graph generator for benchmarking
Authors:
Arnau Prat-Pérez,
Joan Guisado-Gámez,
Xavier Fernández Salas,
Petr Koupy,
Siegfried Depner,
Davide Basilio Bartolini
Abstract:
The use of synthetic graph generators is a common practice among graph-oriented benchmark designers, as it allows obtaining graphs with the required scale and characteristics. However, finding a graph generator that accurately fits the needs of a given benchmark is very difficult, so practitioners end up creating ad-hoc ones. Such a task is usually time-consuming, and often leads to reinventing the wheel. In this paper, we introduce the conceptual design of DataSynth, a framework for property graph generation with customizable schemas and characteristics. The goal of DataSynth is to assist benchmark designers in generating graphs efficiently and at scale, saving them from implementing their own generators. Additionally, DataSynth introduces novel features barely explored so far, such as modeling the correlation between properties and the structure of the graph. This is achieved by a novel property-to-node matching algorithm, for which we present promising preliminary results.
Submitted 3 April, 2017;
originally announced April 2017.
-
A Load-Balanced Parallel and Distributed Sorting Algorithm Implemented with PGX.D
Authors:
Zahra Khatami,
Sungpack Hong,
Jinsoo Lee,
Siegfried Depner,
Hassan Chafi,
J. Ramanujam,
Hartmut Kaiser
Abstract:
Sorting is one of the most challenging and most studied problems in many scientific fields. Although many techniques and algorithms have been proposed for efficient parallel sorting, achieving the desired performance on different types of architectures with large numbers of processors is still a challenging issue. Maximizing the level of parallelism in applications can be achieved by minimizing overheads due to load imbalance and waiting time due to memory latencies. In this paper, we present a distributed sorting algorithm implemented in PGX.D, a fast distributed graph processing system, which outperforms Spark's distributed sorting implementation by around 2x-3x by hiding communication latencies and minimizing unnecessary overheads. Furthermore, we show that the proposed PGX.D sorting method efficiently handles datasets containing many duplicate entries and always keeps workloads balanced across different input data distribution types.
Submitted 14 January, 2017; v1 submitted 1 November, 2016;
originally announced November 2016.