Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication
Authors:
Lukas Gianinazzi,
Alexandros Nikolaos Ziogas,
Langwen Huang,
Piotr Luczynski,
Saleh Ashkboos,
Florian Scheidl,
Armon Carigiet,
Chio Ge,
Nabil Abubaker,
Maciej Besta,
Tal Ben-Nun,
Torsten Hoefler
Abstract:
We propose a novel approach to iterated sparse matrix dense matrix multiplication, a fundamental computational kernel in scientific computing and graph neural network training. In cases where matrix sizes exceed the memory of a single compute node, data transfer becomes a bottleneck. An approach based on dense matrix multiplication algorithms leads to suboptimal scalability and fails to exploit th…
▽ More
We propose a novel approach to iterated sparse matrix dense matrix multiplication, a fundamental computational kernel in scientific computing and graph neural network training. In cases where matrix sizes exceed the memory of a single compute node, data transfer becomes a bottleneck. An approach based on dense matrix multiplication algorithms leads to suboptimal scalability and fails to exploit the sparsity in the problem. To address these challenges, we propose decomposing the sparse matrix into a small number of highly structured matrices called arrow matrices, which are connected by permutations. Our approach enables communication-avoiding multiplications, achieving a polynomial reduction in communication volume per iteration for matrices corresponding to planar graphs and other minor-excluded families of graphs. Our evaluation demonstrates that our approach outperforms a state-of-the-art method for sparse matrix multiplication on matrices with hundreds of millions of rows, offering near-linear strong and weak scaling.
△ Less
Submitted 20 March, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality
Authors:
Maciej Besta,
Armon Carigiet,
Zur Vonarburg-Shmaria,
Kacper Janda,
Lukas Gianinazzi,
Torsten Hoefler
Abstract:
We develop the first parallel graph coloring heuristics with strong theoretical guarantees on work and depth and coloring quality. The key idea is to design a relaxation of the vertex degeneracy order, a well-known graph theory concept, and to color vertices in the order dictated by this relaxation. This introduces a tunable amount of parallelism into the degeneracy ordering that is otherwise hard…
▽ More
We develop the first parallel graph coloring heuristics with strong theoretical guarantees on work and depth and coloring quality. The key idea is to design a relaxation of the vertex degeneracy order, a well-known graph theory concept, and to color vertices in the order dictated by this relaxation. This introduces a tunable amount of parallelism into the degeneracy ordering that is otherwise hard to parallelize. This simple idea enables significant benefits in several key aspects of graph coloring. For example, one of our algorithms ensures polylogarithmic depth and a bound on the number of used colors that is superior to all other parallelizable schemes, while maintaining work-efficiency. In addition to provable guarantees, the developed algorithms have competitive run-times for several real-world graphs, while almost always providing superior coloring quality. Our degeneracy ordering relaxation is of separate interest for algorithms outside the context of coloring.
△ Less
Submitted 11 November, 2020; v1 submitted 25 August, 2020;
originally announced August 2020.