-
GraphMini: Accelerating Graph Pattern Matching Using Auxiliary Graphs
Authors:
Juelin Liu,
Sandeep Polisetty,
Hui Guan,
Marco Serafini
Abstract:
Graph pattern matching is a fundamental problem encountered by many common graph mining tasks and the basic building block of several graph mining systems.
This paper explores for the first time how to proactively prune graphs to speed up graph pattern matching by leveraging the structure of the query pattern and the input graph.
We propose building auxiliary graphs, which are different pruned versions of the graph, during query execution.
This requires careful balancing between the upfront cost of building and managing auxiliary graphs and the gains of faster set operations.
To this end, we propose GraphMini, a new system that uses query compilation and a new cost model to minimize the cost of building and maintaining auxiliary graphs and maximize gains.
Our evaluation shows that using GraphMini can achieve one order of magnitude speedup compared to state-of-the-art subgraph enumeration systems on commonly used benchmarks.
Submitted 1 March, 2024;
originally announced March 2024.
-
GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
Authors:
Sandeep Polisetty,
Juelin Liu,
Kobi Falus,
Yi Ren Fung,
Seung-Hwan Lim,
Hui Guan,
Marco Serafini
Abstract:
Graph neural networks (GNNs), an emerging class of machine learning models for graphs, have gained popularity for their superior performance in various graph analytical tasks. Mini-batch training is commonly used to train GNNs on large graphs, and data parallelism is the standard approach to scale mini-batch training across multiple GPUs. One of the major performance costs in GNN training is the loading of input features, which prevents GPUs from being fully utilized. In this paper, we argue that this problem is exacerbated by redundancies that are inherent to the data parallel approach. To address this issue, we introduce a hybrid parallel mini-batch training paradigm called split parallelism. Split parallelism avoids redundant data loads and splits the sampling and training of each mini-batch across multiple GPUs online, at each iteration, using a lightweight splitting algorithm. We implement split parallelism in GSplit and show that it outperforms state-of-the-art mini-batch training systems like DGL, Quiver, and $P^3$.
Submitted 27 June, 2024; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Accelerating Graph Sampling for Graph Machine Learning using GPUs
Authors:
Abhinav Jangda,
Sandeep Polisetty,
Arjun Guha,
Marco Serafini
Abstract:
Representation learning algorithms automatically learn the features of data. Several representation learning algorithms for graph data, such as DeepWalk, node2vec, and GraphSAGE, sample the graph to produce mini-batches that are suitable for training a DNN. However, sampling time can be a significant fraction of training time, and existing systems do not efficiently parallelize sampling.
Sampling is an embarrassingly parallel problem and may appear to lend itself to GPU acceleration, but the irregularity of graphs makes it hard to use GPU resources effectively. This paper presents NextDoor, a system designed to effectively perform graph sampling on GPUs. NextDoor employs a new approach to graph sampling that we call transit-parallelism, which allows load balancing and caching of edges. NextDoor provides end-users with a high-level abstraction for writing a variety of graph sampling algorithms. We implement several graph sampling applications, and show that NextDoor runs them orders of magnitude faster than existing systems.
Submitted 10 May, 2021; v1 submitted 14 September, 2020;
originally announced September 2020.
-
On Usefulness of the Deep-Learning-Based Bug Localization Models to Practitioners
Authors:
Sravya Polisetty,
Andriy Miranskyy,
Ayse Bener
Abstract:
Background: Developers spend a significant amount of time and effort localizing bugs. In the literature, many researchers have proposed state-of-the-art bug localization models to help developers localize bugs easily. Practitioners, on the other hand, expect a bug localization tool to meet certain criteria, such as trustworthiness, scalability, and efficiency. The current models are not capable of meeting these criteria, making it harder to adopt these models in practice. Recently, deep-learning-based bug localization models have been proposed in the literature, which show better performance than the state-of-the-art models.
Aim: In this research, we would like to investigate whether deep learning models meet the expectations of practitioners or not.
Method: We constructed a Convolutional Neural Network and a Simple Logistic model to examine their effectiveness in localizing bugs. We trained these models on five open source projects written in Java and compared their performance with that of other state-of-the-art models trained on these datasets.
Results: Our experiments show that although the deep learning models perform better than classic machine learning models, they meet the adoption criteria set by the practitioners only partially.
Conclusions: This work provides evidence that practitioners should be cautious when using the current state-of-the-art models for production-level use cases. It also highlights the need for standardization of performance benchmarks to ensure that bug localization models are assessed equitably and realistically.
Submitted 19 July, 2019;
originally announced July 2019.