
Showing 1–11 of 11 results for author: Jangda, A

Searching in archive cs.
  1. Fast Kronecker Matrix-Matrix Multiplication on GPUs

    Authors: Abhinav Jangda, Mohit Yadav

    Abstract: Kronecker Matrix-Matrix Multiplication (Kron-Matmul) is the multiplication of a matrix with the Kronecker Product of several smaller matrices. Kron-Matmul is a core operation for many scientific and machine learning computations. State-of-the-art Kron-Matmul implementations utilize existing tensor algebra operations, such as matrix multiplication, transpose, and tensor matrix multiplication. Howev…

    Submitted 27 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted at PPoPP 2024

  2. arXiv:2308.09895

    cs.PL cs.LG

    Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

    Authors: Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Anders Freeman, Carolyn Jane Anderson, Molly Q Feldman, Michael Greenberg, Abhinav Jangda, Arjun Guha

    Abstract: Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript)…

    Submitted 10 February, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

  3. arXiv:2305.13450

    cs.DC

    A Framework for Fine-Grained Synchronization of Dependent GPU Kernels

    Authors: Abhinav Jangda, Saeed Maleki, Maryam Mehri Dehnavi, Madan Musuvathi, Olli Saarikivi

    Abstract: Machine Learning (ML) models execute several parallel computations including Generalized Matrix Multiplication, Convolution, Dropout, etc. These computations are commonly executed on Graphics Processing Units (GPUs), by dividing the computation into independent processing blocks, known as tiles. Since the number of tiles is usually higher than the number of execution units of a GPU, tiles are executed on a…

    Submitted 14 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at CGO 2024

  4. arXiv:2208.08227

    cs.LG cs.PL

    MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation

    Authors: Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, Arjun Guha, Michael Greenberg, Abhinav Jangda

    Abstract: Large language models have demonstrated the ability to generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge from one language to another? Although contemporary code generation models can generate semantically correct Python code, little is known about their abilities wi…

    Submitted 19 December, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

  5. arXiv:2105.05720

    cs.DC cs.LG cs.PL

    Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads

    Authors: Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Saarikivi

    Abstract: The recent trend toward increasingly large machine learning models requires both training and inference tasks to be distributed. Considering the huge cost of training these models, it is imperative to unlock optimizations in computation and communication to obtain the best performance. However, the current logical separation between computation and communication kernels in deep learning frameworks misses the op…

    Submitted 26 March, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

  6. arXiv:2009.06693

    cs.DC cs.LG

    Accelerating Graph Sampling for Graph Machine Learning using GPUs

    Authors: Abhinav Jangda, Sandeep Polisetty, Arjun Guha, Marco Serafini

    Abstract: Representation learning algorithms automatically learn the features of data. Several representation learning algorithms for graph data, such as DeepWalk, node2vec, and GraphSAGE, sample the graph to produce mini-batches that are suitable for training a DNN. However, sampling time can be a significant fraction of training time, and existing systems do not efficiently parallelize sampling. Samplin…

    Submitted 10 May, 2021; v1 submitted 14 September, 2020; originally announced September 2020.

    Comments: Published in EuroSys 2021

  7. arXiv:1909.07190

    cs.PL cs.DC

    Model-Based Warp Overlapped Tiling for Image Processing Programs on GPUs

    Authors: Abhinav Jangda, Arjun Guha

    Abstract: Domain-specific languages that execute image processing pipelines on GPUs, such as Halide and Forma, operate by 1) dividing the image into overlapped tiles, and 2) fusing loops to improve memory locality. However, current approaches have limitations: 1) they require intra-thread-block synchronization, which has a non-trivial cost, 2) they must choose between small tiles that require more overlapped…

    Submitted 8 September, 2020; v1 submitted 16 September, 2019; originally announced September 2019.

  8. Formal Foundations of Serverless Computing

    Authors: Abhinav Jangda, Donald Pinckney, Yuriy Brun, Arjun Guha

    Abstract: Serverless computing (also known as functions as a service) is a new cloud computing abstraction that makes it easier to write robust, large-scale web services. In serverless computing, programmers write what are called serverless functions, and the cloud platform transparently manages the operating system, resource allocation, load-balancing, and fault tolerance. When demand for the service spike…

    Submitted 4 October, 2020; v1 submitted 15 February, 2019; originally announced February 2019.

    Journal ref: PACMPL, OOPSLA issue, vol. 3, October 2019, pp. 149:1-149:26

  9. Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code

    Authors: Abhinav Jangda, Bobby Powers, Emery Berger, Arjun Guha

    Abstract: All major web browsers now support WebAssembly, a low-level bytecode intended to serve as a compilation target for code written in languages like C and C++. A key goal of WebAssembly is performance parity with native code; previous work reports near parity, with many applications compiled to WebAssembly running on average 10% slower than native code. However, this evaluation was limited to a suite…

    Submitted 31 May, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

    Comments: Accepted (to appear) at USENIX Annual Technical Conference 2019

  10. arXiv:1901.05138

    cs.PL cs.AI

    Predicting Variable Types in Dynamically Typed Programming Languages

    Authors: Abhinav Jangda, Gaurav Anand

    Abstract: Dynamic programming languages are quite popular because they increase programmer productivity. However, the absence of types in the source code makes programs written in these languages difficult to understand, and virtual machines that execute these programs cannot produce optimized code. To overcome this challenge, we develop a technique to predict types of all identifiers including var…

    Submitted 16 January, 2019; originally announced January 2019.

  11. arXiv:1509.08068

    cs.PL

    Block-Level Parallelism in Parsing Block Structured Languages

    Authors: Abhinav Jangda

    Abstract: Software source code is becoming larger and more complex, and compiling a large code base is a time-consuming process. Parallel compilation can help reduce compilation time. Parsing is one of the compiler phases in which a significant amount of compilation time is spent. Techniques have already been developed to extract the parallelism available in parsers. Current LR(k) parallel par…

    Submitted 27 September, 2015; originally announced September 2015.
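The first result above defines Kron-Matmul as multiplying a matrix X by the Kronecker product of several smaller factor matrices, i.e. X · (F1 ⊗ F2 ⊗ … ⊗ Fn). As a rough illustration of that definition only, here is a naive pure-Python sketch. It is not the paper's algorithm or its GPU implementation, and the helper names `kron`, `matmul`, and `kron_matmul` are hypothetical. It materializes the full Kronecker product, whose size grows multiplicatively with every factor (n factors of size P×Q give a P^n × Q^n matrix). That growth is why this approach does not scale, and it is the cost that optimized implementations try to avoid.

```python
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of a (p x q) and b (r x s), giving a (p*r x q*s) matrix."""
    p, q = len(a), len(a[0])
    r, s = len(b), len(b[0])
    out = [[0.0] * (q * s) for _ in range(p * r)]
    for i in range(p):
        for j in range(q):
            # Block (i, j) of the result is a[i][j] * b.
            for k in range(r):
                for l in range(s):
                    out[i * r + k][j * s + l] = a[i][j] * b[k][l]
    return out


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain triple-loop matrix multiplication."""
    inner = len(y)
    assert len(x[0]) == inner, "inner dimensions must match"
    return [
        [sum(x[i][t] * y[t][j] for t in range(inner)) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def kron_matmul(x: Matrix, factors: List[Matrix]) -> Matrix:
    """Naive Kron-Matmul: X @ (F1 ⊗ F2 ⊗ ... ⊗ Fn), building the full product first."""
    k = factors[0]
    for f in factors[1:]:
        k = kron(k, f)
    return matmul(x, k)


if __name__ == "__main__":
    x = [[1, 2, 3, 4]]        # 1 x 4, matching (2x2) ⊗ (2x2) = 4x4
    identity = [[1, 0], [0, 1]]
    swap = [[0, 1], [1, 0]]
    print(kron_matmul(x, [identity, swap]))  # prints [[2, 1, 4, 3]]
```

In the usage example, I ⊗ swap is block-diagonal with two copies of the swap matrix. Multiplying by it therefore exchanges adjacent pairs of columns of X.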