
Showing 1–7 of 7 results for author: Arafa, Y

  1. arXiv:2301.03592

    cs.DC cs.ET

    Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics

    Authors: George Michelogiannakis, Yehia Arafa, Brandon Cook, Liang Yuan Dai, Abdel Hameed Badawy, Madeleine Glick, Yuyang Wang, Keren Bergman, John Shalf

    Abstract: The diversity of workload requirements and increasing hardware heterogeneity in emerging high performance computing (HPC) systems motivate resource disaggregation. Resource disaggregation allows compute and memory resources to be allocated individually as required to each workload. However, it is unclear how to efficiently realize this capability and cost-effectively meet the stringent bandwidth a…

    Submitted 17 July, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: 15 pages, 12 figures, 4 tables. Published in IEEE Cluster 2023

    ACM Class: C.2.1

  2. arXiv:2208.11174

    cs.AR

    Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis

    Authors: Hamdy Abdelkhalik, Yehia Arafa, Nandakishore Santhi, Abdel-Hameed Badawy

    Abstract: Graphics processing units (GPUs) are now considered the leading hardware to accelerate general-purpose workloads such as AI, data analytics, and HPC. Over the last decade, researchers have focused on demystifying and evaluating the microarchitecture features of various GPU architectures beyond what vendors reveal. This line of work is necessary to understand the hardware better and build more effi…

    Submitted 23 August, 2022; originally announced August 2022.

  3. arXiv:2202.07798

    cs.LG cs.PF

    BB-ML: Basic Block Performance Prediction using Machine Learning Techniques

    Authors: Hamdy Abdelkhalik, Shamminuj Aktar, Yehia Arafa, Atanu Barai, Gopinath Chennupati, Nandakishore Santhi, Nishant Panda, Nirmal Prajapati, Nazmul Haque Turja, Stephan Eidenbenz, Abdel-Hameed Badawy

    Abstract: Recent years have seen the adoption of Machine Learning (ML) techniques to predict the performance of large-scale applications, mostly at a coarse level. In contrast, we propose to use ML techniques for performance prediction at a much finer granularity, namely at the Basic Block (BB) level, which are single entry, single exit code blocks that are used for analysis by the compilers to break down a…

    Submitted 11 November, 2023; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: Accepted at the 29th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2023)

  4. PPT-Multicore: Performance Prediction of OpenMP applications using Reuse Profiles and Analytical Modeling

    Authors: Atanu Barai, Yehia Arafa, Abdel-Hameed Badawy, Gopinath Chennupati, Nandakishore Santhi, Stephan Eidenbenz

    Abstract: We present PPT-Multicore, an analytical model embedded in the Performance Prediction Toolkit (PPT) to predict parallel application performance running on a multicore processor. PPT-Multicore builds upon our previous work towards a multicore cache model. We extract an LLVM basic-block-labeled memory trace using an architecture-independent LLVM-based instrumentation tool only once in an application's l…

    Submitted 11 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: text overlap with arXiv:2103.10635. J Supercomput (2021)

    Report number: LA-UR-21-22749

  5. PPT-SASMM: Scalable Analytical Shared Memory Model: Predicting the Performance of Multicore Caches from a Single-Threaded Execution Trace

    Authors: Atanu Barai, Gopinath Chennupati, Nandakishore Santhi, Abdel-Hameed Badawy, Yehia Arafa, Stephan Eidenbenz

    Abstract: Performance modeling of parallel applications on multicore processors remains a challenge in computational co-design due to multicore processors' complex design. Multicores include complex private and shared memory hierarchies. We present a Scalable Analytical Shared Memory Model (SASMM). SASMM can predict the performance of parallel applications running on a multicore. SASMM uses a probabilistic…

    Submitted 19 March, 2021; originally announced March 2021.

    Comments: 11 pages, 5 figures. arXiv admin note: text overlap with arXiv:1907.12666

  6. Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs

    Authors: Yehia Arafa, Ammar ElWazir, Abdelrahman ElKanishy, Youssef Aly, Ayatelrahman Elsayed, Abdel-Hameed Badawy, Gopinath Chennupati, Stephan Eidenbenz, Nandakishore Santhi

    Abstract: GPUs are prevalent in modern computing systems at all scales. They consume a significant fraction of the energy in these systems. However, vendors do not publish the actual cost of the power/energy overhead of their internal microarchitecture. In this paper, we accurately measure the energy consumption of various PTX instructions found in modern NVIDIA GPUs. We provide an exhaustive comparison of…

    Submitted 2 June, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

  7. arXiv:1905.08778

    cs.DC cs.PF

    Low Overhead Instruction Latency Characterization for NVIDIA GPGPUs

    Authors: Yehia Arafa, Abdel-Hameed Badawy, Gopinath Chennupati, Nandakishore Santhi, Stephan Eidenbenz

    Abstract: The last decade has seen a shift in the computer systems industry where heterogeneous computing has become prevalent. Graphics Processing Units (GPUs) are now present in everything from supercomputers to mobile phones and tablets. GPUs are used for graphics operations as well as general-purpose computing (GPGPUs) to boost the performance of compute-intensive applications. However, the percentage of undisclosed ch…

    Submitted 1 September, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

    Comments: Several typos in addition to the paper title are updated