Showing 1–13 of 13 results for author: Hajj, I E

Searching in archive cs.
  1. arXiv:2310.01893

    cs.AR cs.DC cs.SE

    SimplePIM: A Software Framework for Productive and Efficient Processing-in-Memory

    Authors: **fan Chen, Juan Gómez-Luna, Izzat El Hajj, Yuxin Guo, Onur Mutlu

    Abstract: Data movement between memory and processors is a major bottleneck in modern computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this bottleneck by performing computation inside memory chips. Real PIM hardware (e.g., the UPMEM system) is now available and has demonstrated potential in many applications. However, programming such real PIM hardware remains a challenge for man…

    Submitted 3 October, 2023; originally announced October 2023.

  2. arXiv:2304.01676

    cs.DC

    Predicting the Performance-Cost Trade-off of Applications Across Multiple Systems

    Authors: Amir Nassereldine, Safaa Diab, Mohammed Baydoun, Kenneth Leach, Maxim Alt, Dejan Milojicic, Izzat El Hajj

    Abstract: In modern computing environments, users may have multiple systems accessible to them such as local clusters, private clouds, or public clouds. This abundance of choices makes it difficult for users to select the system and configuration for running an application that best meet their performance and cost objectives. To assist such users, we propose a prediction tool that predicts the full performa…

    Submitted 4 April, 2023; originally announced April 2023.

  3. arXiv:2302.13394

    cs.AR

    Asynchronous Persistence with ASAP

    Authors: Ahmed Abulila, Izzat El Hajj, Myoungsoo Jung, Nam Sung Kim

    Abstract: Supporting atomic durability of updates for persistent memories is typically achieved with Write-Ahead Logging (WAL). WAL flushes log entries to persistent memory before making the actual data persistent to ensure that a consistent state can be recovered if a crash occurs. Performing WAL in hardware is attractive because it makes most aspects of log management transparent to software, and it compl…

    Submitted 26 February, 2023; originally announced February 2023.

    Comments: 2 pages, 2 figures, 14th Annual Non-Volatile Memories Workshop

  4. arXiv:2212.01473

    cs.DC

    Parallelizing Maximal Clique Enumeration on GPUs

    Authors: Mohammad Almasri, Yen-Hsiang Chang, Izzat El Hajj, Rakesh Nagi, **jun Xiong, Wen-mei Hwu

    Abstract: We present a GPU solution for exact maximal clique enumeration (MCE) that performs a search tree traversal following the Bron-Kerbosch algorithm. Prior works on parallelizing MCE on GPUs perform a breadth-first traversal of the tree, which has limited scalability because of the explosion in the number of tree nodes at deep levels. We propose to parallelize MCE on GPUs by performing depth-first tra…

    Submitted 25 October, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

  5. arXiv:2208.01243

    cs.AR cs.DC

    A Framework for High-throughput Sequence Alignment using Real Processing-in-Memory Systems

    Authors: Safaa Diab, Amir Nassereldine, Mohammed Alser, Juan Gómez-Luna, Onur Mutlu, Izzat El Hajj

    Abstract: Sequence alignment is a memory bound computation whose performance in modern systems is limited by the memory bandwidth bottleneck. Processing-in-memory architectures alleviate this bottleneck by providing the memory with computing competencies. We propose Alignment-in-Memory (AIM), a framework for high-throughput sequence alignment using processing-in-memory, and evaluate it on UPMEM, the first p…

    Submitted 27 March, 2023; v1 submitted 2 August, 2022; originally announced August 2022.

  6. arXiv:2204.10402

    cs.DC

    Parallel Vertex Cover Algorithms on GPUs

    Authors: Peter Yamout, Karim Barada, Adnan Jaljuli, Amer E. Mouawad, Izzat El Hajj

    Abstract: Finding small vertex covers in a graph has applications in numerous domains. Two common formulations of the problem include: Minimum Vertex Cover, which finds the smallest vertex cover in a graph, and Parameterized Vertex Cover, which finds a vertex cover whose size is less than or equal to some parameter $k$. Algorithms for both formulations traverse a search tree, which grows exponentially with…

    Submitted 21 April, 2022; originally announced April 2022.

  7. arXiv:2204.02085

    cs.AR cs.DC

    High-throughput Pairwise Alignment with the Wavefront Algorithm using Processing-in-Memory

    Authors: Safaa Diab, Amir Nassereldine, Mohammed Alser, Juan Gómez Luna, Onur Mutlu, Izzat El Hajj

    Abstract: We show that the wavefront algorithm can achieve higher pairwise read alignment throughput on a UPMEM PIM system than on a server-grade multi-threaded CPU system.

    Submitted 23 April, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

  8. arXiv:2201.02789

    cs.DC cs.AR

    A Compiler Framework for Optimizing Dynamic Parallelism on GPUs

    Authors: Mhd Ghaith Olabi, Juan Gómez Luna, Onur Mutlu, Wen-mei Hwu, Izzat El Hajj

    Abstract: Dynamic parallelism on GPUs allows GPU threads to dynamically launch other GPU threads. It is useful in applications with nested parallelism, particularly where the amount of nested parallelism is irregular and cannot be predicted beforehand. However, prior works have shown that dynamic parallelism may impose a high performance penalty when a large number of small grids are launched. The large num…

    Submitted 8 January, 2022; originally announced January 2022.

  9. arXiv:2110.01709

    cs.AR cs.DC cs.PF

    Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware

    Authors: Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu

    Abstract: Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bound workloads…

    Submitted 3 April, 2023; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: Invited paper to appear at Workshop on Computing with Unconventional Technologies (CUT) 2021 https://sites.google.com/umn.edu/cut-2021/home. arXiv admin note: substantial text overlap with arXiv:2105.03814

  10. arXiv:2105.03814

    cs.AR cs.DC cs.PF

    Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture

    Authors: Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu

    Abstract: Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bo…

    Submitted 4 May, 2022; v1 submitted 8 May, 2021; originally announced May 2021.

    Comments: Our open source software is available at https://github.com/CMU-SAFARI/prim-benchmarks

  11. arXiv:2104.13209

    cs.DC cs.DS

    Parallel K-Clique Counting on GPUs

    Authors: Mohammad Almasri, Izzat El Hajj, Rakesh Nagi, **jun Xiong, Wen-mei Hwu

    Abstract: Counting k-cliques in a graph is an important problem in graph analysis with many applications such as community detection and graph partitioning. Counting k-cliques is typically done by traversing search trees starting at each vertex in the graph. Parallelizing k-clique counting has been well-studied on CPUs and many solutions exist. However, there are no performant solutions for k-clique countin…

    Submitted 6 June, 2022; v1 submitted 27 April, 2021; originally announced April 2021.

  12. arXiv:1912.11516

    cs.DC cs.AR cs.ET eess.SP

    PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-efficient ReRAM

    Authors: Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Sapan Agarwal, Matthew Marinella, Martin Foltin, John Paul Strachan, Dejan Milojicic, Wen-mei Hwu, Kaushik Roy

    Abstract: The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training: both digital and hybrid digital-analog using resistive RAM (ReRAM) crossbars. ReRAM-based accelerators have demonstrated the effectiveness of ReRAM crossbars a…

    Submitted 24 December, 2019; originally announced December 2019.

    Comments: 13 pages, 15 figures

  13. arXiv:1901.10351

    cs.ET cs.AR

    PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference

    Authors: Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Geoffrey Ndu, Martin Foltin, R. Stanley Williams, Paolo Faraboschi, Wen-mei Hwu, John Paul Strachan, Kaushik Roy, Dejan S Milojicic

    Abstract: Memristor crossbars are circuits capable of performing analog matrix-vector multiplications, overcoming the fundamental energy efficiency limitations of digital logic. They have been shown to be effective in special-purpose accelerators for a limited set of neural network applications. We present the Programmable Ultra-efficient Memristor-based Accelerator (PUMA) which enhances memristor crossba…

    Submitted 29 January, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: Accepted in ASPLOS 2019
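The ordering rule that result 3 (Asynchronous Persistence with ASAP) attributes to Write-Ahead Logging, persisting a log entry before the data it protects, can be illustrated with a toy undo log. This is a minimal software sketch, not ASAP's hardware mechanism: the class `UndoLogPM` and its methods are hypothetical, and plain Python containers stand in for persistent memory, so the cache-line flushes and fences a real system would need to enforce the ordering are omitted.

```python
# Hypothetical toy model: Python containers stand in for persistent memory.
_ABSENT = object()  # marks a location that did not exist before the write


class UndoLogPM:
    """Toy persistent-memory region protected by an undo (write-ahead) log."""

    def __init__(self, data):
        self.data = dict(data)  # stands in for durable data
        self.log = []           # stands in for the durable undo log

    def write(self, addr, value):
        # Write-ahead rule: record the old value in the log *before*
        # overwriting the data. A real system would flush and fence here.
        self.log.append((addr, self.data.get(addr, _ABSENT)))
        self.data[addr] = value

    def commit(self):
        # All updates are durable, so the undo entries can be discarded.
        self.log.clear()

    def recover(self):
        # After a crash, undo uncommitted writes, newest first.
        for addr, old in reversed(self.log):
            if old is _ABSENT:
                self.data.pop(addr, None)
            else:
                self.data[addr] = old
        self.log.clear()


pm = UndoLogPM({"a": 1, "b": 2})
pm.write("a", 10)
pm.write("b", 20)
pm.commit()
pm.write("a", 100)  # uncommitted
pm.write("c", 3)    # uncommitted
pm.recover()        # simulate recovery after a crash before commit
print(pm.data)      # {'a': 10, 'b': 20}
```

In this model the old value always reaches the log before the data is overwritten, so the log holds enough information to roll the data back to its last committed state.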
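Result 4 (Parallelizing Maximal Clique Enumeration on GPUs) builds on a depth-first search-tree traversal that follows the Bron-Kerbosch algorithm. Below is a minimal sequential sketch of that traversal. It assumes an adjacency-set representation and uses the common Tomita-style pivot rule. The pivot rule and the function name `maximal_cliques` are illustrative assumptions, not details taken from the paper.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph.

    adj maps each vertex to the set of its neighbours (no self-loops).
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique; p: candidates that can extend r;
        # x: vertices already explored, used to suppress non-maximal cliques.
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pivot: vertex in P ∪ X with the most neighbours in P (Tomita-style).
        pivot = max(p | x, key=lambda u: len(p & adj[u]))
        for v in list(p - adj[pivot]):
            # Depth-first: fully explore the subtree rooted at r ∪ {v}.
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return cliques


# Triangle 1-2-3 with a pendant edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(maximal_cliques(adj)))  # [[1, 2, 3], [3, 4]]
```

Because the recursion finishes one subtree before starting the next, only the sets along the current root-to-leaf path are live at once. The abstract contrasts this property with breadth-first traversal, whose frontier grows with the number of tree nodes at deep levels.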
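Result 6 (Parallel Vertex Cover Algorithms on GPUs) notes that both vertex cover formulations traverse a search tree. A classic sequential bounded search tree for the parameterized version branches on the two endpoints of an uncovered edge, because any cover must contain at least one of them. The minimum version can then be solved by trying budgets $k = 0, 1, 2, \dots$. The sketch below uses this textbook branching rule with hypothetical function names. It is not the paper's GPU algorithm.

```python
def cover_at_most_k(edges, k):
    """Return a vertex cover of size <= k, or None if none exists.

    edges is a list of (u, v) pairs of an undirected graph.
    """
    if not edges:
        return set()   # nothing left to cover
    if k == 0:
        return None    # edges remain but the budget is exhausted
    u, v = edges[0]
    # Any cover must contain u or v, so this tree node has two children.
    for w in (u, v):
        remaining = [e for e in edges if w not in e]
        cover = cover_at_most_k(remaining, k - 1)
        if cover is not None:
            return cover | {w}
    return None


def minimum_cover(edges):
    """Smallest cover, found by trying budgets k = 0, 1, 2, ..."""
    n = len({x for e in edges for x in e})
    for k in range(n + 1):
        cover = cover_at_most_k(edges, k)
        if cover is not None:
            return cover


triangle = [(1, 2), (2, 3), (1, 3)]
print(cover_at_most_k(triangle, 1))     # None
print(minimum_cover([(1, 2), (2, 3)]))  # {2}
```

Each level of recursion can spend at most one vertex of the budget, so the tree has depth at most $k$ and at most $2^k$ leaves. This is the exponential growth the abstract refers to.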
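Result 11 (Parallel K-Clique Counting on GPUs) describes k-clique counting as traversing search trees that start at each vertex. Here is a minimal sequential sketch under stated assumptions. Edges are oriented from lower to higher vertex id so that each clique is counted exactly once, from its smallest vertex. Many implementations use a degree or degeneracy ordering instead. The function name and the id-based ordering are illustrative choices, not the paper's.

```python
def count_k_cliques(adj, k):
    """Count k-cliques (k >= 1) of an undirected graph {vertex: set_of_neighbours}.

    Edges are oriented from lower to higher vertex id, so each clique is
    discovered exactly once, starting from its smallest vertex.
    """
    out = {u: {v for v in nbrs if v > u} for u, nbrs in adj.items()}

    def extend(candidates, size):
        # candidates: vertices adjacent to every vertex in the current partial clique
        if size == k:
            return 1
        return sum(extend(candidates & out[v], size + 1) for v in candidates)

    # One search tree per starting vertex; the trees do not share state.
    return sum(extend(out[u], 1) for u in adj)


k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print([count_k_cliques(k4, k) for k in range(1, 5)])  # [4, 6, 4, 1]
```

The per-vertex search trees share no state, which is why the outer sum is a natural place to split work across threads.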
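Result 13 (PUMA) starts from the fact that memristor crossbars perform analog matrix-vector multiplication. In an idealized crossbar, each input is applied as a row voltage $V_i$ and each weight is stored as a cell conductance $G_{ij}$. Ohm's law and Kirchhoff's current law then give each column current as $I_j = \sum_i V_i G_{ij}$. The following is a minimal numerical sketch of that ideal model, with voltages in volts, conductances in siemens and currents in amperes. The function names, the unit conductance `g_unit`, and the differential-pair mapping of signed weights onto two crossbars are illustrative assumptions, not PUMA's design.

```python
def crossbar_mvm(voltages, conductances):
    """Ideal crossbar read-out: column current I_j = sum_i V_i * G[i][j].

    Ignores wire resistance, device non-linearity, noise and ADC/DAC precision.
    """
    if len(voltages) != len(conductances):
        raise ValueError("one input voltage per crossbar row is required")
    currents = [0.0] * len(conductances[0])
    for v_i, row in zip(voltages, conductances):
        for j, g_ij in enumerate(row):
            # Ohm's law gives each cell's current; Kirchhoff's current law
            # sums the cell currents along column j.
            currents[j] += v_i * g_ij
    return currents


def signed_mvm(x, weights, g_unit=1e-6):
    """Hypothetical mapping of signed weights onto two non-negative crossbars."""
    g_pos = [[max(w, 0.0) * g_unit for w in row] for row in weights]
    g_neg = [[max(-w, 0.0) * g_unit for w in row] for row in weights]
    i_pos = crossbar_mvm(x, g_pos)
    i_neg = crossbar_mvm(x, g_neg)
    # Subtract the two column currents and rescale back to weight units.
    return [(p - n) / g_unit for p, n in zip(i_pos, i_neg)]


print(crossbar_mvm([1.0, 2.0], [[1.0, 0.0], [0.5, 2.0]]))  # [2.0, 4.0]
print(signed_mvm([1.0, 2.0], [[1.0, -1.0], [0.5, 2.0]]))   # approximately [2.0, 3.0]
```

Physical conductances cannot be negative, so the sketch stores the positive and negative parts of each weight on separate arrays and subtracts their column currents.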