
Showing 1–14 of 14 results for author: Sarbazi-Azad, H

  1. Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses

    Authors: Rakesh Nadig, Mohammad Sadrosadati, Haiyu Mao, Nika Mansouri Ghiasi, Arash Tavakkol, Jisung Park, Hamid Sarbazi-Azad, Juan Gómez Luna, Onur Mutlu

    Abstract: The performance and capacity of solid-state drives (SSDs) are continuously improving to meet the increasing demands of modern data-intensive applications. Unfortunately, communication between the SSD controller and memory chips (e.g., 2D/3D NAND flash chips) is a critical performance bottleneck for many applications. SSDs use a multi-channel shared bus architecture where multiple memory chips conn…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: To appear in Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA), 2023

  2. arXiv:2209.10914

    cs.AR

    Morpheus: Extending the Last Level Cache Capacity in GPU Systems Using Idle GPU Core Resources

    Authors: Sina Darabi, Mohammad Sadrosadati, Joël Lindegger, Negar Akbarzadeh, Mohammad Hosseini, Jisung Park, Juan Gómez-Luna, Hamid Sarbazi-Azad, Onur Mutlu

    Abstract: Graphics Processing Units (GPUs) are widely-used accelerators for data-parallel applications. In many GPU applications, GPU memory bandwidth bottlenecks performance, causing underutilization of GPU cores. Hence, disabling many cores does not affect the performance of memory-bound workloads. While simply power-gating unused GPU cores would save energy, prior works attempt to better utilize GPU core…

    Submitted 6 April, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

    Comments: To appear in 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

  3. arXiv:2201.04353

    cs.DL

    A simple model for citation curve

    Authors: Y. C. Tay, Mostafa Rezazad, Hamid Sarbazi-Azad

    Abstract: There is considerable interest in the citation count for an author's publications. This has led to many proposals for citation indices for characterizing citation distributions. However, there is so far no tractable model to facilitate the analysis of these distributions and the design of these indices. This paper presents a simple equation for such design and analysis. The equation has three para…

    Submitted 12 January, 2022; originally announced January 2022.

    Comments: 13 pages, 19 figures, 2 tables

  4. arXiv:2102.01764

    cs.AR

    MANA: Microarchitecting an Instruction Prefetcher

    Authors: Ali Ansari, Fatemeh Golshan, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

    Abstract: L1 instruction (L1-I) cache misses are a source of performance bottleneck. Sequential prefetchers are simple solutions to mitigate this problem; however, prior work has shown that these prefetchers leave considerable potential uncovered. This observation has motivated many researchers to come up with more advanced instruction prefetchers. In 2011, Proactive Instruction Fetch (PIF) showed that a h…

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: 24 pages with 15 figures

  5. arXiv:2101.00969

    cs.AR

    Understanding Power Consumption and Reliability of High-Bandwidth Memory with Voltage Underscaling

    Authors: Seyed Saber Nabavi Larimi, Behzad Salami, Osman S. Unsal, Adrian Cristal Kestelman, Hamid Sarbazi-Azad, Onur Mutlu

    Abstract: Modern computing devices employ High-Bandwidth Memory (HBM) to meet their memory bandwidth requirements. An HBM-enabled device consists of multiple DRAM layers stacked on top of one another next to a compute chip (e.g., CPU, GPU, or FPGA) in the same package. Although such HBM structures provide high bandwidth at a small form factor, the stacked memory layers consume a substantial portion of the p…

    Submitted 30 December, 2020; originally announced January 2021.

    Comments: To appear at DATE 2021 conference

  6. arXiv:2010.09330

    cs.AR

    Enabling High-Capacity, Latency-Tolerant, and Highly-Concurrent GPU Register Files via Software/Hardware Cooperation

    Authors: Mohammad Sadrosadati, Amirhossein Mirhosseini, Ali Hajiabadi, Seyed Borna Ehsani, Hajar Falahati, Hamid Sarbazi-Azad, Mario Drumond, Babak Falsafi, Rachata Ausavarungnirun, Onur Mutlu

    Abstract: Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes a hierarchical register file to reduce the register file power consumption by caching reg…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: To Appear in ACM Transactions on Computer Systems (TOCS)

  7. arXiv:2009.00715

    cs.AR

    A Survey on Recent Hardware Data Prefetching Approaches with An Emphasis on Servers

    Authors: Mohammad Bakhshalipour, Mehran Shakerinava, Fatemeh Golshan, Ali Ansari, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

    Abstract: Data prefetching, i.e., the act of predicting an application's future memory accesses and fetching those that are not in the on-chip caches, is a well-known and widely-used approach to hide the long latency of memory accesses. The fruitfulness of data prefetching is evident to both industry and academia: nowadays, almost every high-performance processor incorporates a few data prefetchers for capturin…

    Submitted 1 September, 2020; originally announced September 2020.

  8. arXiv:2005.03451

    cs.LG

    An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

    Authors: Behzad Salami, Erhan Baturay Onural, Ismail Emir Yuksel, Fahrettin Koc, Oguz Ergin, Adrian Cristal Kestelman, Osman S. Unsal, Hamid Sarbazi-Azad, Onur Mutlu

    Abstract: We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power tr…

    Submitted 30 December, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

    Comments: To appear at the DSN 2020 conference

  9. arXiv:1812.11473

    cs.LG stat.ML

    ORIGAMI: A Heterogeneous Split Architecture for In-Memory Acceleration of Learning

    Authors: Hajar Falahati, Pejman Lotfi-Kamran, Mohammad Sadrosadati, Hamid Sarbazi-Azad

    Abstract: The memory bandwidth bottleneck is a major challenge in processing machine learning (ML) algorithms. In-memory acceleration has the potential to address this problem; however, it needs to address two challenges. First, an in-memory accelerator should be general enough to support a large set of different ML algorithms. Second, it should be efficient enough to utilize bandwidth while meeting limited power and…

    Submitted 9 January, 2019; v1 submitted 30 December, 2018; originally announced December 2018.

    Comments: 11 pages, 9 figures

    MSC Class: 68M01

  10. arXiv:1809.08828

    cs.AR

    Die-Stacked DRAM: Memory, Cache, or MemCache?

    Authors: Mohammad Bakhshalipour, HamidReza Zare, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

    Abstract: Die-stacked DRAM is a promising solution for satisfying the ever-increasing memory bandwidth requirements of multi-core processors. Manufacturing technology has enabled stacking several gigabytes of DRAM modules on the active die, thereby providing orders of magnitude higher bandwidth as compared to the conventional DIMM-based DDR memories. Nevertheless, die-stacked DRAM, due to its limited capaci…

    Submitted 24 September, 2018; originally announced September 2018.

  11. arXiv:1808.05024

    cs.AR

    Making Belady-Inspired Replacement Policies More Effective Using Expected Hit Count

    Authors: Seyed Armin Vakil Ghahani, Sara Mahdizadeh Shahri, Mohammad Bakhshalipour, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

    Abstract: Memory-intensive workloads operate on massive amounts of data that cannot be captured by last-level caches (LLCs) of modern processors. Consequently, processors encounter frequent off-chip misses, and hence, lose a significant performance potential. One way to reduce the number of off-chip misses is through using a well-behaved replacement policy in the LLC. Existing processors employ a variation…

    Submitted 15 August, 2018; originally announced August 2018.

  12. arXiv:1808.04864

    cs.AR

    Scale-Out Processors & Energy Efficiency

    Authors: Pouya Esmaili-Dokht, Mohammad Bakhshalipour, Behnam Khodabandeloo, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

    Abstract: Scale-out workloads like media streaming or Web search serve millions of users and operate on a massive amount of data, and hence, require enormous computational power. As the number of users is increasing and the size of data is expanding, even more computational power is necessary for powering up such workloads. Data centers with thousands of servers are providing the computational power necessa…

    Submitted 14 August, 2018; originally announced August 2018.

  13. arXiv:1805.07269

    cs.DC

    Parallelizing Bisection Root-Finding: A Case for Accelerating Serial Algorithms in Multicore Substrates

    Authors: Mohammad Bakhshalipour, Hamid Sarbazi-Azad

    Abstract: Multicore architectures dominate today's processor market. Even though the number of cores and threads is high and continues to grow, inherently serial algorithms do not benefit from the abundance of cores and threads. In this paper, we propose Runahead Computing, a technique which uses idle threads in a multi-threaded architecture for accelerating the execution of serial algorithms.…

    Submitted 10 May, 2018; originally announced May 2018.

    Comments: 5 pages, 7 figures

  14. arXiv:0710.1924

    cs.NI cs.AI

    A Heuristic Routing Mechanism Using a New Addressing Scheme

    Authors: Mohsen Ravanbakhsh, Yasin Abbasi-Yadkori, Maghsoud Abbaspour, Hamid Sarbazi-Azad

    Abstract: Current methods of routing are based on network information in the form of routing tables, in which routing protocols determine how to update the tables according to the network changes. Despite the variability of data in routing tables, node addresses are constant. In this paper, we first introduce the new concept of variable addresses, which results in a novel framework to cope with routing pr…

    Submitted 10 October, 2007; originally announced October 2007.

    Comments: 8 pages, because of lack of space journal reference just contains the reference to the proceeding

    Journal ref: Proceedings of First International Conference on Bio Inspired models of Networks, Information and Computing Systems (BIONETICS), Cavalese, Italy, December 2006