Showing 1–5 of 5 results for author: Bambhaniya, A R

Searching in archive cs.
  1. arXiv:2403.07953

    cs.LG cs.AI cs.AR

    Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition

    Authors: Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna

    Abstract: Exploiting sparsity in deep neural networks (DNNs) has been a promising area to meet the growing computation need of modern DNNs. However, in practice, sparse DNN acceleration still faces a key challenge. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparse hardware support recently, which provides limited flexibility and requires extra model fine-tun…

    Submitted 31 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  2. arXiv:2402.04744

    cs.LG cs.AR

    Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers

    Authors: Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna

    Abstract: N:M structured sparsity has garnered significant interest as a result of relatively modest overhead and improved efficiency. Additionally, this form of sparsity holds considerable appeal for reducing the memory footprint owing to its modest representation overhead. Although there have been efforts to develop training recipes for N:M structured sparsity, they primarily focus on low-sparsity regions (…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 18 pages, 8 figures, 17 tables. Code is available at https://github.com/abhibambhaniya/progressive_gradient_flow_nm_sparsity

  3. arXiv:2306.17266

    cs.DC cs.LG

    Subgraph Stationary Hardware-Software Inference Co-Design

    Authors: Payman Behnam, Jianming Tong, Alind Khare, Yangyu Chen, Yue Pan, Pranav Gadikar, Abhimanyu Rajeshkumar Bambhaniya, Tushar Krishna, Alexey Tumanov

    Abstract: A growing number of applications depend on Machine Learning (ML) functionality and benefit from both higher quality ML predictions and better timeliness (latency) at the same time. A growing body of research in computer architecture, ML, and systems software literature focuses on reaching better latency-accuracy tradeoffs for ML models. Efforts include compression, quantization, pruning, early-ex…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: 16 pages; MLSYS 2023

  4. arXiv:2302.08687

    cs.AR cs.AI cs.LG

    VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs

    Authors: Geonhwa Jeong, Sana Damani, Abhimanyu Rajeshkumar Bambhaniya, Eric Qin, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim, Tushar Krishna

    Abstract: Deep Learning (DL) acceleration support in CPUs has recently gained a lot of traction, with several companies (Arm, Intel, IBM) announcing products with specialized matrix engines accessible via GEMM instructions. CPUs are pervasive and need to handle diverse requirements across DL workloads running in edge/HPC/cloud platforms. Therefore, as DL workloads embrace sparsity to reduce the computations…

    Submitted 23 February, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: This paper is accepted to HPCA 2023

  5. arXiv:2211.16648

    cs.DC cs.AI cs.LG

    COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training

    Authors: Divya Kiran Kadiyala, Saeed Rashidi, Taekyung Heo, Abhimanyu Rajeshkumar Bambhaniya, Tushar Krishna, Alexandros Daglis

    Abstract: Modern Deep Learning (DL) models have grown to sizes requiring massive clusters of specialized, high-end nodes to train. Designing such clusters to maximize both performance and utilization--to amortize their steep cost--is a challenging task requiring careful balance of compute, memory, and network resources. Moreover, a plethora of each model's tuning knobs drastically affect the performance, wi…

    Submitted 14 March, 2024; v1 submitted 29 November, 2022; originally announced November 2022.