-
Homeostasis: Design and Implementation of a Self-Stabilizing Compiler
Authors:
Aman Nougrahiya,
V. Krishna Nandivada
Abstract:
Mainstream compilers perform a multitude of analyses and optimizations on the given input program. Each analysis (such as points-to analysis) may generate a program-abstraction (such as points-to graph). Each optimization is typically composed of multiple alternating phases of inspection of such program-abstractions and transformations of the program. Upon transformation of a program, the program-abstractions generated by various analyses may become inconsistent with the modified program. Consequently, the correctness of the downstream inspection (and consequent transformation) phases cannot be ensured until the relevant program-abstractions are stabilized; that is, the program-abstractions are either invalidated or made consistent with the modified program. In general, the existing compiler frameworks do not perform automated stabilization of the program-abstractions and instead leave it to the compiler pass writers to deal with the complex task of identifying the relevant program-abstractions to be stabilized, the points where the stabilization is to be performed, and the exact procedure of stabilization. In this paper, we address these challenges by providing the design and implementation of a novel compiler-design framework called Homeostasis. Homeostasis automatically captures all the program changes performed by each transformation phase, and later, triggers the required stabilization using the captured information, if needed. We also provide a formal description of Homeostasis and a correctness proof thereof. To assess the feasibility of using Homeostasis in compilers of parallel programs, we have implemented our proposed idea in IMOP, a compiler framework for OpenMP C programs. We present an evaluation which demonstrates that Homeostasis is efficient and easy to use.
Submitted 1 March, 2024; v1 submitted 3 June, 2021;
originally announced June 2021.
-
An Adaptive Load Balancer For Graph Analytical Applications on GPUs
Authors:
Vishwesh Jatala,
Loc Hoang,
Roshan Dathathri,
Gurbinder Gill,
V Krishna Nandivada,
Keshav Pingali
Abstract:
Load-balancing among the threads of a GPU for graph analytics workloads is difficult because of the irregular nature of graph applications and the high variability in vertex degrees, particularly in power-law graphs. We describe a novel load balancing scheme to address this problem. Our scheme is implemented in the IrGL compiler to allow users to generate efficient load balanced code for a GPU from high-level sequential programs. We evaluated several graph analytics applications on up to 16 distributed GPUs using IrGL to compile the code and the Gluon substrate for inter-GPU communication. Our experiments show that this scheme can achieve an average speed-up of 2.2x on inputs that suffer from severe load imbalance problems when previous state-of-the-art load-balancing schemes are used.
Submitted 27 February, 2020; v1 submitted 20 November, 2019;
originally announced November 2019.
-
DCAFE: Dynamic load-balanced loop Chunking & Aggressive Finish Elimination for Recursive Task Parallel Programs
Authors:
Suyash Gupta,
Rahul Shrivastava,
V. Krishna Nandivada
Abstract:
In this paper, we present two symbiotic optimizations to optimize recursive task parallel (RTP) programs by reducing the task creation and termination overheads. Our first optimization, Aggressive Finish-Elimination (AFE), helps reduce the redundant join operations to a large extent. The second optimization, Dynamic Load-Balanced loop Chunking (DLBC), extends the prior work on loop chunking to decide on the number of parallel tasks based on the number of available worker threads, at runtime. Further, we discuss the impact of exceptions on our optimizations and extend them to handle RTP programs that may throw exceptions. We implemented DCAFE (= DLBC+AFE) in the X10v2.3 compiler and tested it over a set of benchmark kernels on two different hardware platforms (a 16-core Intel system and a 64-core AMD system). With respect to the base X10 compiler extended with the loop chunking of Nandivada et al. (2013) (LC), DCAFE achieved a geometric mean speedup of 5.75x and 4.16x on the Intel and AMD systems, respectively. We also present an evaluation with respect to the energy consumption on the Intel system and show that on average, compared to the LC versions, the DCAFE versions consume 71.2% less energy.
Submitted 21 February, 2015;
originally announced February 2015.
-
IMSuite: A Benchmark Suite for Simulating Distributed Algorithms
Authors:
Suyash Gupta,
V. Krishna Nandivada
Abstract:
Considering the diverse nature of real-world distributed applications that makes it hard to identify a representative subset of distributed benchmarks, we focus on their underlying distributed algorithms. We present and characterize a new kernel benchmark suite (named IMSuite) that simulates some of the classical distributed algorithms in task parallel languages. We present multiple variations of our kernels, broadly categorized under two heads: (a) varying synchronization primitives (with and without fine grain synchronization primitives); and (b) varying forms of parallelization (data parallel and recursive task parallel). Our characterization covers interesting aspects of distributed applications such as distribution of remote communication requests, number of synchronization, task creation, task termination and atomic operations. We study the behavior (execution time) of our kernels by varying the problem size, the number of compute threads, and the input configurations. We also present an involved set of input generators and output validators.
Submitted 10 October, 2013;
originally announced October 2013.
-
Lexical State Analyzer
Authors:
Kartik Gupta,
V. Krishna Nandivada
Abstract:
Lexical states provide a powerful mechanism to scan regular expressions in a context-sensitive manner. At the same time, lexical states also make it hard to reason about the correctness of the grammar. We first categorize the related correctness issues into two classes, errors and warnings, and then present a context-sensitive and a context-insensitive analysis to identify errors and warnings in context-free grammars (CFGs). We also present a comparative study of these analyses. We have also implemented a standalone tool (LSA) that can identify errors and warnings in JavaCC grammars. The LSA tool outputs a graph that depicts the grammar and the error transitions. It can also generate counterexample strings that can be used to establish the errors. We have used LSA to analyze a host of open-source JavaCC grammar files to good effect.
Submitted 14 August, 2013;
originally announced August 2013.