-
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems
Authors:
Maciej Besta,
Raghavendra Kanakagiri,
Grzegorz Kwasniewski,
Rachata Ausavarungnirun,
Jakub Beránek,
Konstantinos Kanellopoulos,
Kacper Janda,
Zur Vonarburg-Shmaria,
Lukas Gianinazzi,
Ioana Stefan,
Juan Gómez Luna,
Marcin Copik,
Lukas Kapp-Schwoerer,
Salvatore Di Girolamo,
Marek Konieczny,
Nils Blach,
Onur Mutlu,
Torsten Hoefler
Abstract:
Simple graph algorithms such as PageRank have been the target of numerous hardware accelerators. Yet, there also exist much more complex graph mining algorithms for problems such as clustering or maximal clique listing. These algorithms are memory-bound and thus could be accelerated by hardware techniques such as Processing-in-Memory (PIM). However, they also come with nonstraightforward paralleli…
▽ More
Simple graph algorithms such as PageRank have been the target of numerous hardware accelerators. Yet, there also exist much more complex graph mining algorithms for problems such as clustering or maximal clique listing. These algorithms are memory-bound and thus could be accelerated by hardware techniques such as Processing-in-Memory (PIM). However, they also come with nonstraightforward parallelism and complicated memory access patterns. In this work, we address this problem with a simple yet surprisingly powerful observation: operations on sets of vertices, such as intersection or union, form a large part of many complex graph mining algorithms, and can offer rich and simple parallelism at multiple levels. This observation drives our cross-layer design, in which we (1) expose set operations using a novel programming paradigm, (2) express and execute these operations efficiently with carefully designed set-centric ISA extensions called SISA, and (3) use PIM to accelerate SISA instructions. The key design idea is to alleviate the bandwidth needs of SISA instructions by map** set operations to two types of PIM: in-DRAM bulk bitwise computing for bitvectors representing high-degree vertices, and near-memory logic layers for integer arrays representing low-degree vertices. Set-centric SISA-enhanced algorithms are efficient and outperform hand-tuned baselines, offering more than 10x speedup over the established Bron-Kerbosch algorithm for listing maximal cliques. We deliver more than 10 SISA set-centric algorithm formulations, illustrating SISA's wide applicability.
△ Less
Submitted 25 October, 2021; v1 submitted 15 April, 2021;
originally announced April 2021.
-
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra
Authors:
Maciej Besta,
Zur Vonarburg-Shmaria,
Yannick Schaffner,
Leonardo Schwarz,
Grzegorz Kwasniewski,
Lukas Gianinazzi,
Jakub Beranek,
Kacper Janda,
Tobias Holenstein,
Sebastian Leisinger,
Peter Tatkowski,
Esref Ozdemir,
Adrian Balla,
Marcin Copik,
Philipp Lindenberger,
Pavel Kalvoda,
Marek Konieczny,
Onur Mutlu,
Torsten Hoefler
Abstract:
We propose GraphMineSuite (GMS): the first benchmarking suite for graph mining that facilitates evaluating and constructing high-performance graph mining algorithms. First, GMS comes with a benchmark specification based on extensive literature review, prescribing representative problems, algorithms, and datasets. Second, GMS offers a carefully designed software platform for seamless testing of dif…
▽ More
We propose GraphMineSuite (GMS): the first benchmarking suite for graph mining that facilitates evaluating and constructing high-performance graph mining algorithms. First, GMS comes with a benchmark specification based on extensive literature review, prescribing representative problems, algorithms, and datasets. Second, GMS offers a carefully designed software platform for seamless testing of different fine-grained elements of graph mining algorithms, such as graph representations or algorithm subroutines. The platform includes parallel implementations of more than 40 considered baselines, and it facilitates develo** complex and fast mining algorithms. High modularity is possible by harnessing set algebra operations such as set intersection and difference, which enables breaking complex graph mining algorithms into simple building blocks that can be separately experimented with. GMS is supported with a broad concurrency analysis for portability in performance insights, and a novel performance metric to assess the throughput of graph mining algorithms, enabling more insightful evaluation. As use cases, we harness GMS to rapidly redesign and accelerate state-of-the-art baselines of core graph mining problems: degeneracy reordering (by up to >2x), maximal clique listing (by up to >9x), k-clique listing (by 1.1x), and subgraph isomorphism (by up to 2.5x), also obtaining better theoretical performance bounds.
△ Less
Submitted 5 March, 2021;
originally announced March 2021.
-
High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality
Authors:
Maciej Besta,
Armon Carigiet,
Zur Vonarburg-Shmaria,
Kacper Janda,
Lukas Gianinazzi,
Torsten Hoefler
Abstract:
We develop the first parallel graph coloring heuristics with strong theoretical guarantees on work and depth and coloring quality. The key idea is to design a relaxation of the vertex degeneracy order, a well-known graph theory concept, and to color vertices in the order dictated by this relaxation. This introduces a tunable amount of parallelism into the degeneracy ordering that is otherwise hard…
▽ More
We develop the first parallel graph coloring heuristics with strong theoretical guarantees on work and depth and coloring quality. The key idea is to design a relaxation of the vertex degeneracy order, a well-known graph theory concept, and to color vertices in the order dictated by this relaxation. This introduces a tunable amount of parallelism into the degeneracy ordering that is otherwise hard to parallelize. This simple idea enables significant benefits in several key aspects of graph coloring. For example, one of our algorithms ensures polylogarithmic depth and a bound on the number of used colors that is superior to all other parallelizable schemes, while maintaining work-efficiency. In addition to provable guarantees, the developed algorithms have competitive run-times for several real-world graphs, while almost always providing superior coloring quality. Our degeneracy ordering relaxation is of separate interest for algorithms outside the context of coloring.
△ Less
Submitted 11 November, 2020; v1 submitted 25 August, 2020;
originally announced August 2020.