-
SENSEi: Input-Sensitive Compilation for Accelerating GNNs
Authors:
Damitha Lenadora,
Vimarsh Sathia,
Gerasimos Gerogiannis,
Serif Yesil,
Josep Torrellas,
Charith Mendis
Abstract:
Over the years, many frameworks and optimization techniques have been proposed to accelerate graph neural networks (GNNs). Compared to the optimizations explored in these systems, we observe that different matrix re-associations of GNN computations lead to novel input-sensitive performance behavior. We leverage this observation to propose SENSEi, a system that exposes different sparse and dense matrix primitive compositions based on different matrix re-associations of GNN computations and selects the best among them based on input attributes. SENSEi executes in two stages: (1) an offline compilation stage that enumerates all valid re-associations leading to different sparse-dense matrix compositions and uses input-oblivious pruning techniques to prune away clearly unprofitable candidates and (2) an online runtime system that explores the remaining candidates and uses light-weight cost models to select the best re-association based on the input graph and the embedding sizes on a given hardware platform. On a wide range of configurations, SENSEi achieves speedups of up to $2.012\times$ and $1.85\times$ on graph convolutional networks and up to $6.294\times$ and $16.274\times$ on graph attention networks, on GPUs and CPUs respectively. We also show that its technique generalizes to GNN variants, including those that require sampling. Furthermore, we show that SENSEi's techniques are agnostic to the underlying GNN system, and can be used to yield synergistic improvements across a diverse set of implementations.
Submitted 8 March, 2024; v1 submitted 26 June, 2023;
originally announced June 2023.
-
Hybrid Cloud and HPC Approach to High-Performance Dataframes
Authors:
Kaiying Shan,
Niranda Perera,
Damitha Lenadora,
Tianle Zhong,
Arup Sarker,
Supun Kamburugamuve,
Thejaka Amila Kanewela,
Chathura Widanage,
Geoffrey Fox
Abstract:
Data pre-processing is a fundamental component in any data-driven application. With the increasing complexity of data processing operations and volume of data, Cylon, a distributed dataframe system, is developed to facilitate data processing both as a standalone application and as a library, especially for Python applications. While Cylon shows promising performance results, we experienced difficulties trying to integrate with frameworks incompatible with the traditional Message Passing Interface (MPI). While MPI implementations encompass scalable and efficient communication routines, their process launching mechanisms work well with mainstream HPC systems but are incompatible with some environments that adopt their own resource management systems. In this work, we alleviated this issue by directly integrating the Unified Communication X (UCX) framework, which supports a variety of classic HPC and non-HPC process-bootstrapping mechanisms as our communication framework. While we experimented with our methodology on Cylon, the same technique can be used to bring MPI communication to other applications that do not employ MPI's built-in process management approach.
Submitted 29 December, 2022; v1 submitted 28 December, 2022;
originally announced December 2022.
-
High Performance Dataframes from Parallel Processing Patterns
Authors:
Niranda Perera,
Supun Kamburugamuve,
Chathura Widanage,
Vibhatha Abeykoon,
Ahmet Uyar,
Kaiying Shan,
Hasara Maithree,
Damitha Lenadora,
Thejaka Amila Kanewala,
Geoffrey Fox
Abstract:
The data science community today has embraced the concept of Dataframes as the de facto standard for data representation and manipulation. Ease of use, massive operator coverage, and popularization of R and Python languages have heavily influenced this transformation. However, most widely used serial Dataframes today (R, pandas) experience performance limitations even when working on moderately large data sets. We believe that there is plenty of room for improvement by investigating the generic distributed patterns of dataframe operators. In this paper, we propose a framework that lays the foundation for building high performance distributed-memory parallel dataframe systems based on these parallel processing patterns. We also present Cylon, as a reference runtime implementation. We demonstrate how this framework has enabled Cylon to achieve scalable high performance. We also underline the flexibility of the proposed API and the extensibility of the framework on different hardware. To the best of our knowledge, Cylon is the first and only distributed-memory parallel dataframe system available today.
Submitted 13 September, 2022;
originally announced September 2022.
-
A Fast, Scalable, Universal Approach For Distributed Data Aggregations
Authors:
Niranda Perera,
Vibhatha Abeykoon,
Chathura Widanage,
Supun Kamburugamuve,
Thejaka Amila Kanewala,
Pulasthi Wickramasinghe,
Ahmet Uyar,
Hasara Maithree,
Damitha Lenadora,
Geoffrey Fox
Abstract:
In the current era of Big Data, data engineering has transformed into an essential field of study across many branches of science. Advancements in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new applications in both enterprise and research communities. Aggregations (also termed reduce in functional programming) are an integral functionality in these applications. They are traditionally aimed at generating meaningful information on large data-sets, and today, they are being used for engineering more effective features for complex AI models. Aggregations are usually carried out on top of data abstractions such as tables/arrays and are combined with other operations such as grouping of values. There are frameworks that excel in the said domains individually. However, we believe that there is an essential requirement for a data analytics tool that can universally integrate with existing frameworks, and thereby increase the productivity and efficiency of the entire data analytics pipeline. Cylon endeavors to fill this void. In this paper, we present Cylon's fast and scalable aggregation operations implemented on top of a distributed in-memory table structure that universally integrates with existing frameworks.
Submitted 14 December, 2020; v1 submitted 27 October, 2020;
originally announced October 2020.
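As a rough illustration of the grouped aggregation the abstract above describes, the sketch below computes a per-key mean over data split across partitions: each partition first reduces its own rows to partial (sum, count) pairs, and only those partials are merged. This is a generic pattern written in plain Python; the function names are hypothetical and it does not use or reflect Cylon's actual API or communication layer.

```python
from collections import defaultdict


def local_partial(rows):
    """Reduce one partition's (key, value) rows to per-key [sum, count]."""
    partial = defaultdict(lambda: [0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return dict(partial)


def combine(partials):
    """Merge per-partition partials and finalize the mean for each key."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}


def grouped_mean(partitions):
    """Group-by mean over a list of partitions (stand-ins for workers)."""
    return combine(local_partial(rows) for rows in partitions)


partitions = [
    [("a", 1), ("b", 4)],
    [("a", 3)],
    [("b", 6), ("a", 5)],
]
result = grouped_mean(partitions)
```

Carrying (sum, count) rather than a per-partition mean is what keeps the merge correct when partitions hold different numbers of rows for the same key. In a distributed setting the merge step would involve shuffling partials between workers, which is omitted here.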
-
Exploratory Analysis of a Social Media Network in Sri Lanka during the COVID-19 Virus Outbreak
Authors:
Damitha Lenadora,
Gihan Gamage,
Dilantha Haputhanthri,
Dulani Meedeniya,
Indika Perera
Abstract:
During the COVID-19 pandemic, multiple aspects of human life were subjected to unprecedented changes, globally. In Sri Lanka, a developing country located in South Asia, it was possible to observe a range of events that arose due to the influence of the COVID-19 virus outbreak. Thus, the people of Sri Lanka used Social Media to voice their opinions regarding such events and those involved in them, enabling the ideal avenue to explore the social perception. However, the outcome of such actions was at certain times detrimental. This study was conducted as an attempt to identify the reasons for such instances as well as to identify the behaviours of the Sri Lankan populace during such a crisis event. To support this study, observations, as well as data of related posts from a sample of 50 sources, were manually collected from the most popular social media platform in Sri Lanka, Facebook. The posts considered spanned until approximately a month after the initial major virus outbreak in the country and contained content that even vaguely related to the virus. Utilising such data, various forms of analyses such as topic significance and topic co-occurrences were conducted. The findings highlight that, while socially detrimental ideas can be shared, the majority of the posts point to constructive and positive thoughts, suggesting the successful influence of the cultural and social values that Sri Lankan society promotes.
Submitted 17 June, 2020; v1 submitted 14 June, 2020;
originally announced June 2020.