Showing 1–14 of 14 results for author: Sadayappan, P

  1. arXiv:2406.11307

    cs.CL

    An Empirical Investigation of Matrix Factorization Methods for Pre-trained Transformers

    Authors: Ashim Gupta, Sina Mahdipour Saravani, P. Sadayappan, Vivek Srikumar

    Abstract: The increasing size of transformer-based models in NLP makes the question of compressing them important. In this work, we present a comprehensive analysis of factorization based model compression techniques. Specifically, we focus on comparing straightforward low-rank factorization against the recently introduced Monarch factorization, which exhibits impressive performance preservation on the GLUE…

    Submitted 17 June, 2024; originally announced June 2024.

  2. What Operations can be Performed Directly on Compressed Arrays, and with What Error?

    Authors: Tripti Agarwal, Harvey Dam, Dorra Ben Khalifa, Matthieu Martel, P. Sadayappan, Ganesh Gopalakrishnan

    Abstract: In response to the rapidly escalating costs of computing with large matrices and tensors caused by data movement, several lossy compression methods have been developed to significantly reduce data volumes. Unfortunately, all these methods require the data to be decompressed before further computations are done. In this work, we develop a lossy compressor that allows a dozen fairly fundamental oper…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: An extended but earlier version of paper in https://dl.acm.org/doi/10.1145/3624062.3625122 published at the DRBSD Workshop in 2023

  3. arXiv:2305.19400

    cs.CE

    Automating GPU Scalability for Complex Scientific Models: Phonon Boltzman Transport Equation

    Authors: Eric Heisler, Siddharth Saurav, Aadesh Deshmukh, Sandip Mazumder, Ponnuswamy Sadayappan, Hari Sundar

    Abstract: Heterogeneous computing environments combining CPU and GPU resources provide a great boost to large-scale scientific computing applications. Code generation utilities that partition the work into CPU and GPU tasks while considering data movement costs allow researchers to more quickly and easily develop high-performance solutions, and make these resources accessible to a larger user base. We pre…

    Submitted 30 May, 2023; originally announced May 2023.

  4. TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition

    Authors: Lizhi Xiang, Miao Yin, Chengming Zhang, Aravind Sukumaran-Rajam, P. Sadayappan, Bo Yuan, Dingwen Tao

    Abstract: Tucker decomposition is one of the SOTA CNN model compression techniques. However, unlike the FLOPs reduction, we observe very limited inference time reduction with Tucker-compressed models using existing GPU software such as cuDNN. To this end, we propose an efficient end-to-end framework that can generate highly accurate and compact CNN models via Tucker decomposition and optimized inference cod…

    Submitted 4 January, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: 14 pages, 9 figures, 3 tables, accepted by PPoPP '23

  5. Efficient distributed algorithms for Convolutional Neural Networks

    Authors: Rui Li, Yufan Xu, Aravind Sukumaran-Rajam, Atanas Rountev, P Sadayappan

    Abstract: Several efficient distributed algorithms have been developed for matrix-matrix multiplication: the 3D algorithm, the 2D SUMMA algorithm, and the 2.5D algorithm. Each of these algorithms was independently conceived, and they trade off memory needed per node and the inter-node data communication volume. The convolutional neural network (CNN) computation may be viewed as a generalization of matrix-m…

    Submitted 30 May, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

    Comments: Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '21), July 6--8, 2021, Virtual Event, USA

    Journal ref: Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '21), July 6--8, 2021, Virtual Event, USA

  6. Analytical Characterization and Design Space Exploration for Optimization of CNNs

    Authors: Rui Li, Yufan Xu, Aravind Sukumaran-Rajam, Atanas Rountev, P. Sadayappan

    Abstract: Moving data through the memory hierarchy is a fundamental bottleneck that can limit the performance of core algorithms of machine learning, such as convolutional neural networks (CNNs). Loop-level optimizations, including loop tiling and loop permutation, are fundamental transformations to reduce data movement. However, the search space for finding the best loop-level optimization configuration is…

    Submitted 5 March, 2021; v1 submitted 24 January, 2021; originally announced January 2021.

    Comments: In proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '21), April 19-23, 2021, Virtual, USA

    Journal ref: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

  7. arXiv:1911.06664

    cs.CC

    Automated Derivation of Parametric Data Movement Lower Bounds for Affine Programs

    Authors: Auguste Olivry, Julien Langou, Louis-Noël Pouchet, P. Sadayappan, Fabrice Rastello

    Abstract: For most relevant computation, the energy and time needed for data movement dominate that for performing arithmetic operations on all computing systems today. Hence it is of critical importance to understand the minimal total data movement achievable during the execution of an algorithm. The achieved total data movement for different schedules of an algorithm can vary widely depending on how effi…

    Submitted 15 November, 2019; originally announced November 2019.

  8. arXiv:1904.07935

    cs.LG cs.DC stat.ML

    PL-NMF: Parallel Locality-Optimized Non-negative Matrix Factorization

    Authors: Gordon E. Moon, Aravind Sukumaran-Rajam, Srinivasan Parthasarathy, P. Sadayappan

    Abstract: Non-negative Matrix Factorization (NMF) is a key kernel for unsupervised dimension reduction used in a wide range of applications, including topic modeling, recommender systems and bioinformatics. Due to the compute-intensive nature of applications that must perform repeated NMF, several parallel implementations have been developed in the past. However, existing parallel NMF algorithms have not ad…

    Submitted 16 April, 2019; originally announced April 2019.

    Comments: 11 pages, 5 tables, 9 figures

  9. arXiv:1904.03329

    cs.DC

    Load-Balanced Sparse MTTKRP on GPUs

    Authors: Israt Nisa, Jiajia Li, Aravind Sukumaran-Rajam, Richard Vuduc, P. Sadayappan

    Abstract: Sparse matricized tensor times Khatri-Rao product (MTTKRP) is one of the most computationally expensive kernels in sparse tensor computations. This work focuses on optimizing the MTTKRP operation on GPUs, addressing both performance and storage requirements. We begin by identifying the performance bottlenecks in directly extending the state-of-the-art CSF (compressed sparse fiber) format from CPUs…

    Submitted 5 April, 2019; originally announced April 2019.

  10. arXiv:1811.00839

    cs.AI cs.IR cs.LG cs.SI

    ATP: Directed Graph Embedding with Asymmetric Transitivity Preservation

    Authors: Jiankai Sun, Bortik Bandyopadhyay, Armin Bashizade, Jiongqian Liang, P. Sadayappan, Srinivasan Parthasarathy

    Abstract: Directed graphs have been widely used in Community Question Answering services (CQAs) to model asymmetric relationships among different types of nodes in CQA graphs, e.g., question, answer, user. Asymmetric transitivity is an essential property of directed graphs, since it can play an important role in downstream graph inference and analysis. Question difficulty and user expertise follow the chara…

    Submitted 6 November, 2018; v1 submitted 2 November, 2018; originally announced November 2018.

    Comments: has been accepted to the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019), acceptance rate: 1150/7095 = 16.2%

  11. On Characterizing the Data Access Complexity of Programs

    Authors: Venmugil Elango, Fabrice Rastello, Louis-Noel Pouchet, J. Ramanujam, P. Sadayappan

    Abstract: Technology trends will cause data movement to account for the majority of energy expenditure and execution time on emerging computers. Therefore, computational complexity will no longer be a sufficient metric for comparing algorithms, and a fundamental characterization of data access complexity will be increasingly important. The problem of developing lower bounds for data access complexity has be…

    Submitted 9 November, 2014; originally announced November 2014.

    ACM Class: F.2; D.2.8

  12. arXiv:1404.4767

    cs.DC cs.DS

    On Characterizing the Data Movement Complexity of Computational DAGs for Parallel Execution

    Authors: Venmugil Elango, Fabrice Rastello, Louis-Noël Pouchet, J. Ramanujam, P. Sadayappan

    Abstract: Technology trends are making the cost of data movement increasingly dominant, both in terms of energy and time, over the cost of performing arithmetic operations in computer systems. The fundamental ratio of aggregate data movement bandwidth to the total computational power (also referred to as the machine balance parameter) in parallel computer systems is decreasing. It is therefore of considerabl…

    Submitted 18 April, 2014; originally announced April 2014.

    Report number: RR-8522

  13. arXiv:1401.5024

    cs.OH

    Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential

    Authors: Naznin Fauzia, Venmugil Elango, Mahesh Ravishankar, J. Ramanujam, Fabrice Rastello, Atanas Rountev, Louis-Noël Pouchet, P. Sadayappan

    Abstract: Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak processing rate to memory bandwidth) as highlighted by recent studies on Exascale architectural trends. Further, flops are getting cheaper while the energy cost of data movement is increasingly dominant. The understanding and characterization of data locality properties of computations is critical in order…

    Submitted 21 December, 2013; originally announced January 2014.

    Comments: Transaction on Architecture and Code Optimization (2014)

  14. arXiv:1103.2405

    math.NA cs.MS

    Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining

    Authors: Xintian Yang, Srinivasan Parthasarathy, Ponnuswamy Sadayappan

    Abstract: Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPUs) has been at the heart of numerous studies in both academia and industry. In this article we present a novel non-parametric, self-tunable approach to data representation for computing this kernel, particularly targeting sparse matrices representing power-law graphs. Using real data, we show how our…

    Submitted 11 March, 2011; originally announced March 2011.

    Comments: VLDB2011

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 4, No. 4, pp. 231-242 (2011)