Showing 1–17 of 17 results for author: Shahroodi, T

  1. arXiv:2401.17724

    cs.AR cs.ET

    High-Performance Data Mapping for BNNs on PCM-based Integrated Photonics

    Authors: Taha Shahroodi, Raphael Cardoso, Stephan Wong, Alberto Bosio, Ian O'Connor, Said Hamdioui

    Abstract: State-of-the-Art (SotA) hardware implementations of Deep Neural Networks (DNNs) incur high latencies and costs. Binary Neural Networks (BNNs) are potential alternative solutions to realize faster implementations without losing accuracy. In this paper, we first present a new data mapping, called TacitMap, suited for BNNs implemented based on a Computation-In-Memory (CIM) architecture. TacitMap maxi…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: To appear in Design Automation and Test in Europe (DATE), 2024

  2. arXiv:2401.16279

    cs.AR

    Rethinking the Producer-Consumer Relationship in Modern DRAM-Based Systems

    Authors: Minesh Patel, Taha Shahroodi, Aditya Manglik, Abdullah Giray Yağlıkçı, Ataberk Olgun, Haocong Luo, Onur Mutlu

    Abstract: Generational improvements to commodity DRAM throughout half a century have long solidified its prevalence as main memory across the computing industry. However, overcoming today's DRAM technology scaling challenges requires new solutions driven by both DRAM producers and consumers. In this paper, we observe that the separation of concerns between producers and consumers specified by industry-wide…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2204.10378

  3. arXiv:2310.15634

    cs.AR cs.ET q-bio.GN

    An In-Memory Architecture for High-Performance Long-Read Pre-Alignment Filtering

    Authors: Taha Shahroodi, Michael Miao, Joel Lindegger, Stephan Wong, Onur Mutlu, Said Hamdioui

    Abstract: With the recent move towards sequencing of accurate long reads, finding solutions that support efficient analysis of these reads becomes more necessary. The long execution time required for sequence alignment of long reads negatively affects genomic studies relying on sequence alignment. Although pre-alignment filtering as an extra step before alignment was recently introduced to mitigate sequence…

    Submitted 24 October, 2023; originally announced October 2023.

  4. arXiv:2310.04366

    cs.AR cs.ET q-bio.GN

    Swordfish: A Framework for Evaluating Deep Neural Network-based Basecalling using Computation-In-Memory with Non-Ideal Memristors

    Authors: Taha Shahroodi, Gagandeep Singh, Mahdi Zahedi, Haiyu Mao, Joel Lindegger, Can Firtina, Stephan Wong, Onur Mutlu, Said Hamdioui

    Abstract: Basecalling, an essential step in many genome analysis studies, relies on large Deep Neural Networks (DNNs) to achieve high accuracy. Unfortunately, these DNNs are computationally slow and inefficient, leading to considerable delays and resource constraints in the sequence analysis process. A Computation-In-Memory (CIM) architecture using memristors can significantly accelerate the performance of…

    Submitted 26 November, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: To appear in 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023

  5. arXiv:2211.06261

    cs.ET

    BCIM: Efficient Implementation of Binary Neural Network Based on Computation in Memory

    Authors: Mahdi Zahedi, Taha Shahroodi, Stephan Wong, Said Hamdioui

    Abstract: Applications of Binary Neural Networks (BNNs) are promising for embedded systems with hard constraints on computing power. Contrary to conventional neural networks with the floating-point datatype, BNNs use binarized weights and activations which additionally reduces memory requirements. Memristors, emerging non-volatile memory devices, show great potential as the target implementation platform fo…

    Submitted 11 November, 2022; originally announced November 2022.

  6. arXiv:2207.09765

    cs.AR cs.AI cs.LG q-bio.GN q-bio.QM

    ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis

    Authors: Can Firtina, Kamlesh Pillai, Gurpreet S. Kalsi, Bharathwaj Suresh, Damla Senol Cali, Jeremie Kim, Taha Shahroodi, Meryem Banu Cavlak, Joel Lindegger, Mohammed Alser, Juan Gómez Luna, Sreenivas Subramoney, Onur Mutlu

    Abstract: Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highl…

    Submitted 21 October, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to ACM TACO

  7. arXiv:2206.01932

    cs.AR q-bio.GN

    Demeter: A Fast and Energy-Efficient Food Profiler using Hyperdimensional Computing in Memory

    Authors: Taha Shahroodi, Mahdi Zahedi, Can Firtina, Mohammed Alser, Stephan Wong, Onur Mutlu, Said Hamdioui

    Abstract: Food profiling is an essential step in any food monitoring system needed to prevent health risks and potential frauds in the food industry. Significant improvements in sequencing technologies are pushing food profiling to become the main computational bottleneck. State-of-the-art profilers are unfortunately too costly for food profiling. Our goal is to design a food profiler that solves the main…

    Submitted 24 August, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

  8. arXiv:2204.10378

    cs.AR

    A Case for Transparent Reliability in DRAM Systems

    Authors: Minesh Patel, Taha Shahroodi, Aditya Manglik, A. Giray Yaglikci, Ataberk Olgun, Haocong Luo, Onur Mutlu

    Abstract: Today's systems have diverse needs that are difficult to address using one-size-fits-all commodity DRAM. Unfortunately, although system designers can theoretically adapt commodity DRAM chips to meet their particular design goals (e.g., by reducing access timings to improve performance, implementing system-level RowHammer mitigations), we observe that designers today lack sufficient insight into co…

    Submitted 21 April, 2022; originally announced April 2022.

  9. Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning

    Authors: Rahul Bera, Konstantinos Kanellopoulos, Anant V. Nori, Taha Shahroodi, Sreenivas Subramoney, Onur Mutlu

    Abstract: Past research has proposed numerous hardware prefetching techniques, most of which rely on exploiting one specific type of program context information (e.g., program counter, cacheline address) to predict future memory accesses. These techniques either completely neglect a prefetcher's undesirable effects (e.g., memory bandwidth usage) on the overall system, or incorporate system-level feedback as…

    Submitted 6 April, 2023; v1 submitted 24 September, 2021; originally announced September 2021.

    ACM Class: C.1.2

  10. pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables

    Authors: João Dinis Ferreira, Gabriel Falcao, Juan Gómez-Luna, Mohammed Alser, Lois Orosa, Mohammad Sadrosadati, Jeremie S. Kim, Geraldo F. Oliveira, Taha Shahroodi, Anant Nori, Onur Mutlu

    Abstract: Data movement between the main memory and the processor is a key contributor to execution time and energy consumption in memory-intensive applications. This data movement bottleneck can be alleviated using Processing-in-Memory (PiM). One category of PiM is Processing-using-Memory (PuM), in which computation takes place inside the memory array by exploiting intrinsic analog properties of the memory…

    Submitted 3 October, 2022; v1 submitted 15 April, 2021; originally announced April 2021.

    ACM Class: B.3.1; C.1.3

    Journal ref: IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022, 900-919

  11. arXiv:2104.05119

    cs.AR

    BurstLink: Techniques for Energy-Efficient Conventional and Virtual Reality Video Display

    Authors: Jawad Haj-Yahya, Jisung Park, Rahul Bera, Juan Gómez Luna, Efraim Rotem, Taha Shahroodi, Jeremie Kim, Onur Mutlu

    Abstract: Conventional planar video streaming is the most popular application in mobile systems and the rapid growth of 360 video content and virtual reality (VR) devices are accelerating the adoption of VR video streaming. Unfortunately, video streaming consumes significant system energy due to the high power consumption of the system components (e.g., DRAM, display interfaces, and display panel) involved…

    Submitted 1 November, 2021; v1 submitted 11 April, 2021; originally announced April 2021.

    Comments: The paper will be presented at MICRO 2021

  12. BlockHammer: Preventing RowHammer at Low Cost by Blacklisting Rapidly-Accessed DRAM Rows

    Authors: Abdullah Giray Yağlıkçı, Minesh Patel, Jeremie S. Kim, Roknoddin Azizi, Ataberk Olgun, Lois Orosa, Hasan Hassan, Jisung Park, Konstantinos Kanellopoulos, Taha Shahroodi, Saugata Ghose, Onur Mutlu

    Abstract: Aggressive memory density scaling causes modern DRAM devices to suffer from RowHammer, a phenomenon where rapidly activating a DRAM row can cause bit-flips in physically-nearby rows. Recent studies demonstrate that modern DRAM chips, including chips previously marketed as RowHammer-safe, are even more vulnerable to RowHammer than older chips. Many works show that attackers can exploit RowHammer bi…

    Submitted 29 July, 2022; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: A shorter version of this work is to appear at the 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA-27), 2021

  13. arXiv:2009.07985

    cs.AR

    Bit-Exact ECC Recovery (BEER): Determining DRAM On-Die ECC Functions by Exploiting DRAM Data Retention Characteristics

    Authors: Minesh Patel, Jeremie S. Kim, Taha Shahroodi, Hasan Hassan, Onur Mutlu

    Abstract: Increasing single-cell DRAM error rates have pushed DRAM manufacturers to adopt on-die error-correction coding (ECC), which operates entirely within a DRAM chip to improve factory yield. The on-die ECC function and its effects on DRAM reliability are considered trade secrets, so only the manufacturer knows precisely how on-die ECC alters the externally-visible reliability characteristics. Conseque…

    Submitted 16 September, 2020; originally announced September 2020.

    Comments: To appear in the MICRO 2020 conference proceedings

  14. arXiv:2005.12775

    cs.AR

    CLR-DRAM: A Low-Cost DRAM Architecture Enabling Dynamic Capacity-Latency Trade-Off

    Authors: Haocong Luo, Taha Shahroodi, Hasan Hassan, Minesh Patel, Abdullah Giray Yaglikci, Lois Orosa, Jisung Park, Onur Mutlu

    Abstract: DRAM is the prevalent main memory technology, but its long access latency can limit the performance of many workloads. Although prior works provide DRAM designs that reduce DRAM access latency, their reduced storage capacities hinder the performance of workloads that need large memory capacity. Because the capacity-latency trade-off is fixed at design time, previous works cannot achieve maximum pe…

    Submitted 26 May, 2020; originally announced May 2020.

    Comments: This work is to appear at ISCA 2020

  15. SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations

    Authors: Konstantinos Kanellopoulos, Nandita Vijaykumar, Christina Giannoula, Roknoddin Azizi, Skanda Koppula, Nika Mansouri Ghiasi, Taha Shahroodi, Juan Gomez Luna, Onur Mutlu

    Abstract: Important workloads, such as machine learning and graph analytics applications, heavily involve sparse linear algebra operations. These operations use sparse matrix compression as an effective means to avoid storing zeros and performing unnecessary computation on zero elements. However, compression techniques like Compressed Sparse Row (CSR) that are widely used today introduce significant instruc…

    Submitted 23 October, 2019; originally announced October 2019.

  16. arXiv:1910.09020

    q-bio.GN cs.AR cs.DC cs.DS

    SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs

    Authors: Mohammed Alser, Taha Shahroodi, Juan Gomez-Luna, Can Alkan, Onur Mutlu

    Abstract: Motivation: We introduce SneakySnake, a highly parallel and highly accurate pre-alignment filter that remarkably reduces the need for computationally costly sequence alignment. The key idea of SneakySnake is to reduce the approximate string matching (ASM) problem to the single net routing (SNR) problem in VLSI chip layout. In the SNR problem, we are interested in finding the optimal path that conn…

    Submitted 23 November, 2020; v1 submitted 20 October, 2019; originally announced October 2019.

    Comments: To appear in Bioinformatics

    Journal ref: Bioinformatics, Apr 1;36(22-23):5282-5290, 2021

  17. arXiv:1910.05340

    cs.DC cs.LG

    EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network Inference Using Approximate DRAM

    Authors: Skanda Koppula, Lois Orosa, Abdullah Giray Yağlıkçı, Roknoddin Azizi, Taha Shahroodi, Konstantinos Kanellopoulos, Onur Mutlu

    Abstract: The effectiveness of deep neural networks (DNN) in vision, speech, and language processing has prompted a tremendous demand for energy-efficient high-performance DNN inference systems. Due to the increasing memory intensity of most DNN workloads, main memory can dominate the system's energy consumption and stall time. One effective way to reduce the energy consumption and increase the performance…

    Submitted 11 October, 2019; originally announced October 2019.

    Comments: This work is to appear at MICRO 2019