Showing 1–16 of 16 results for author: Santhi, N

  1. arXiv:2311.12883  [pdf, other]

    cs.SE cs.PF cs.PL

    LLVM Static Analysis for Program Characterization and Memory Reuse Profile Estimation

    Authors: Atanu Barai, Nandakishore Santhi, Abdur Razzak, Stephan Eidenbenz, Abdel-Hameed A. Badawy

    Abstract: Profiling various application characteristics, including the number of different arithmetic operations performed, memory footprint, etc., dynamically is time- and space-consuming. On the other hand, static analysis methods, although fast, can be less accurate. This paper presents an LLVM-based probabilistic static analysis method that accurately predicts different program characteristics and estim…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: This paper was accepted at the MEMSYS '23 conference, The International Symposium on Memory Systems, October 02, 2023 - October 05, 2023, Alexandria, VA

  2. arXiv:2208.11174  [pdf, other]

    cs.AR

    Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis

    Authors: Hamdy Abdelkhalik, Yehia Arafa, Nandakishore Santhi, Abdel-Hameed Badawy

    Abstract: Graphics processing units (GPUs) are now considered the leading hardware to accelerate general-purpose workloads such as AI, data analytics, and HPC. Over the last decade, researchers have focused on demystifying and evaluating the microarchitecture features of various GPU architectures beyond what vendors reveal. This line of work is necessary to understand the hardware better and build more effi…

    Submitted 23 August, 2022; originally announced August 2022.

  3. arXiv:2202.07798  [pdf, other]

    cs.LG cs.PF

    BB-ML: Basic Block Performance Prediction using Machine Learning Techniques

    Authors: Hamdy Abdelkhalik, Shamminuj Aktar, Yehia Arafa, Atanu Barai, Gopinath Chennupati, Nandakishore Santhi, Nishant Panda, Nirmal Prajapati, Nazmul Haque Turja, Stephan Eidenbenz, Abdel-Hameed Badawy

    Abstract: Recent years have seen the adoption of Machine Learning (ML) techniques to predict the performance of large-scale applications, mostly at a coarse level. In contrast, we propose to use ML techniques for performance prediction at a much finer granularity, namely at the Basic Block (BB) level, which are single entry, single exit code blocks that are used for analysis by the compilers to break down a…

    Submitted 11 November, 2023; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: Accepted at the 29th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2023)

  4. PPT-Multicore: Performance Prediction of OpenMP applications using Reuse Profiles and Analytical Modeling

    Authors: Atanu Barai, Yehia Arafa, Abdel-Hameed Badawy, Gopinath Chennupati, Nandakishore Santhi, Stephan Eidenbenz

    Abstract: We present PPT-Multicore, an analytical model embedded in the Performance Prediction Toolkit (PPT) to predict parallel application performance running on a multicore processor. PPT-Multicore builds upon our previous work towards a multicore cache model. We extract LLVM basic block labeled memory trace using an architecture-independent LLVM-based instrumentation tool only once in an application's l…

    Submitted 11 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: text overlap with arXiv:2103.10635. J Supercomput (2021)

    Report number: LA-UR-21-22749

  5. PPT-SASMM: Scalable Analytical Shared Memory Model: Predicting the Performance of Multicore Caches from a Single-Threaded Execution Trace

    Authors: Atanu Barai, Gopinath Chennupati, Nandakishore Santhi, Abdel-Hameed Badawy, Yehia Arafa, Stephan Eidenbenz

    Abstract: Performance modeling of parallel applications on multicore processors remains a challenge in computational co-design due to multicore processors' complex design. Multicores include complex private and shared memory hierarchies. We present a Scalable Analytical Shared Memory Model (SASMM). SASMM can predict the performance of parallel applications running on a multicore. SASMM uses a probabilistic…

    Submitted 19 March, 2021; originally announced March 2021.

    Comments: 11 pages, 5 figures. arXiv admin note: text overlap with arXiv:1907.12666

  6. arXiv:2010.04212  [pdf, other]

    cs.PF cs.AR

    Machine Learning Enabled Scalable Performance Prediction of Scientific Codes

    Authors: Gopinath Chennupati, Nandakishore Santhi, Phill Romero, Stephan Eidenbenz

    Abstract: We present the Analytical Memory Model with Pipelines (AMMP) of the Performance Prediction Toolkit (PPT). PPT-AMMP takes high-level source code and hardware architecture parameters as input, predicts runtime of that code on the target hardware platform, which is defined in the input parameters. PPT-AMMP transforms the code to an (architecture-independent) intermediate representation, then (i) anal…

    Submitted 12 November, 2020; v1 submitted 8 October, 2020; originally announced October 2020.

    Comments: Under review at ACM TOMACS 2020 for a special issue

  7. Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs

    Authors: Yehia Arafa, Ammar ElWazir, Abdelrahman ElKanishy, Youssef Aly, Ayatelrahman Elsayed, Abdel-Hameed Badawy, Gopinath Chennupati, Stephan Eidenbenz, Nandakishore Santhi

    Abstract: GPUs are prevalent in modern computing systems at all scales. They consume a significant fraction of the energy in these systems. However, vendors do not publish the actual cost of the power/energy overhead of their internal microarchitecture. In this paper, we accurately measure the energy consumption of various PTX instructions found in modern NVIDIA GPUs. We provide an exhaustive comparison of…

    Submitted 2 June, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

  8. arXiv:1907.12666  [pdf, other]

    cs.PF cs.SE

    Modeling Shared Cache Performance of OpenMP Programs using Reuse Distance

    Authors: Atanu Barai, Gopinath Chennupati, Nandakishore Santhi, Abdel-Hameed A. Badawy, Stephan Eidenbenz

    Abstract: Performance modeling of parallel applications on multicore computers remains a challenge in computational co-design due to the complex design of multicore processors including private and shared memory hierarchies. We present a Scalable Analytical Shared Memory Model to predict the performance of parallel applications that run on a multicore computer and share the same level of cache in the hier…

    Submitted 29 July, 2019; originally announced July 2019.

    Report number: LA-UR-19-27398

  9. arXiv:1905.08778  [pdf, other]

    cs.DC cs.PF

    Low Overhead Instruction Latency Characterization for NVIDIA GPGPUs

    Authors: Yehia Arafa, Abdel-Hameed Badawy, Gopinath Chennupati, Nandakishore Santhi, Stephan Eidenbenz

    Abstract: The last decade has seen a shift in the computer systems industry where heterogeneous computing has become prevalent. Graphics Processing Units (GPUs) are now present in supercomputers to mobile phones and tablets. GPUs are used for graphics operations as well as general-purpose computing (GPGPUs) to boost the performance of compute-intensive applications. However, the percentage of undisclosed ch…

    Submitted 1 September, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

    Comments: Several typos in addition to paper title are updated

  10. arXiv:1804.03719  [pdf, other]

    cs.ET quant-ph

    Quantum Algorithm Implementations for Beginners

    Authors: Abhijith J., Adetokunbo Adedoyin, John Ambrosiano, Petr Anisimov, William Casper, Gopinath Chennupati, Carleton Coffrin, Hristo Djidjev, David Gunter, Satish Karra, Nathan Lemons, Shizeng Lin, Alexander Malyzhenkov, David Mascarenas, Susan Mniszewski, Balu Nadiga, Daniel O'Malley, Diane Oyen, Scott Pakin, Lakshman Prasad, Randy Roberts, Phillip Romero, Nandakishore Santhi, Nikolai Sinitsyn, Pieter J. Swart, et al. (9 additional authors not shown)

    Abstract: As quantum computers become available to the general public, the need has arisen to train a cohort of quantum programmers, many of whom have been developing classical computer programs for most of their careers. While currently available quantum computers have less than 100 qubits, quantum computing hardware is widely expected to grow in terms of qubit count, quality, and connectivity. This review…

    Submitted 26 June, 2022; v1 submitted 10 April, 2018; originally announced April 2018.

    Comments: ACM Transactions on Quantum Computing

    Report number: LA-UR-20-22353

    Journal ref: ACM Transactions on Quantum Computing, Volume 3, Issue 4, 18 (2022)

  11. arXiv:1712.04892  [pdf, other]

    cs.AR

    Accelerator Codesign as Non-Linear Optimization

    Authors: Nirmal Prajapati, Sanjay Rajopadhye, Hristo Djidjev, Nandkishore Santhi, Tobias Grosser, Rumen Andonov

    Abstract: We propose an optimization approach for determining both hardware and software parameters for the efficient implementation of a (family of) applications called dense stencil computations on programmable GPGPUs. We first introduce a simple, analytical model for the silicon area usage of accelerator architectures and a workload characterization of stencil computations. We combine this characterizati…

    Submitted 13 December, 2017; originally announced December 2017.

    Comments: 10 pages, 4 figures, 2 tables

  12. arXiv:1011.4098  [pdf, other]

    cs.SI math.PR stat.AP

    Understanding Cascading Failures in Power Grids

    Authors: Sachin Kadloor, Nandakishore Santhi

    Abstract: In the past, we have observed several large blackouts, i.e. loss of power to large areas. It has been noted by several researchers that these large blackouts are a result of a cascade of failures of various components. As a power grid is made up of several thousands or even millions of components (relays, breakers, transformers, etc.), it is quite plausible that a few of these components do not pe…

    Submitted 17 November, 2010; originally announced November 2010.

    Comments: 12 pages; 9 figures; being submitted to IEEE Trans. on Smart Grids

    Report number: LA-UR 10-07070

  13. On Algebraic Decoding of $q$-ary Reed-Muller and Product-Reed-Solomon Codes

    Authors: Nandakishore Santhi

    Abstract: We consider a list decoding algorithm recently proposed by Pellikaan-Wu \cite{PW2005} for $q$-ary Reed-Muller codes $\mathcal{RM}_q(\ell, m, n)$ of length $n \leq q^m$ when $\ell \leq q$. A simple and easily accessible correctness proof is given which shows that this algorithm achieves a relative error-correction radius of $τ\leq (1 - \sqrt{\ell q^{m-1}/{n}})$. This is an improvement over the pr…

    Submitted 21 April, 2007; originally announced April 2007.

    Comments: 5 pages, 5 figures, to be presented at 2007 IEEE International Symposium on Information Theory, Nice, France (ISIT 2007)

    Report number: LA-UR-07-0469 ACM Class: E.4

  14. arXiv:cs/0608087  [pdf, ps, other]

    cs.IT cs.DM

    On an Improvement over Rényi's Equivocation Bound

    Authors: Nandakishore Santhi, Alexander Vardy

    Abstract: We consider the problem of estimating the probability of error in multi-hypothesis testing when MAP criterion is used. This probability, which is also known as the Bayes risk is an important measure in many communication and information theory problems. In general, the exact Bayes risk can be difficult to obtain. Many upper and lower bounds are known in literature. One such upper bound is the eq…

    Submitted 22 August, 2006; originally announced August 2006.

    Comments: 8 pages, 6 figures, To be presented at the 44-th Annual Allerton Conference on Communication, Control, and Computing, September 2006

    ACM Class: E.4

  15. arXiv:cs/0608086  [pdf, ps, other]

    cs.IT cs.DM

    Analog Codes on Graphs

    Authors: Nandakishore Santhi, Alexander Vardy

    Abstract: We consider the problem of transmission of a sequence of real data produced by a Nyquist sampled band-limited analog source over a band-limited analog channel, which introduces an additive white Gaussian noise. An analog coding scheme is described, which can achieve a mean-squared error distortion proportional to $(1+SNR)^{-B}$ for a bandwidth expansion factor of $B/R$, where $0 < R < 1$ is the…

    Submitted 22 August, 2006; originally announced August 2006.

    Comments: 18 pages, 15 figures, Portions appeared in Proceedings of the International Symposium on Information Theory (ISIT), Yokohama, Japan, July 2003

    ACM Class: E.4

  16. arXiv:cs/0608085   

    cs.CC cs.DM cs.IT

    A Quadratic Time-Space Tradeoff for Unrestricted Deterministic Decision Branching Programs

    Authors: Nandakishore Santhi, Alexander Vardy

    Abstract: For a decision problem from coding theory, we prove a quadratic expected time-space tradeoff of the form $\eT\eS=Ω(\tfrac{n^2}{q})$ for $q$-way deterministic decision branching programs, where $q\geq 2$. Here $\eT$ is the expected computation time and $\eS$ is the expected space, when all inputs are equally likely. This bound is to our knowledge, the first such to show an exponential size requirem…

    Submitted 17 November, 2010; v1 submitted 22 August, 2006; originally announced August 2006.

    Comments: Withdrawn

    ACM Class: F.2.3; E.4