Showing 1–14 of 14 results for author: Hadidi, R

  1. arXiv:2406.11674

    cs.CL

    Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference

    Authors: Donghyeon Joo, Ramyad Hadidi, Soheil Feizi, Bahar Asgari

    Abstract: The increasing size of large language models (LLMs) challenges their usage on resource-constrained platforms. For example, memory on modern GPUs is insufficient to hold LLMs that are hundreds of gigabytes in size. Offloading is a popular method to escape this constraint by storing weights of an LLM to host CPU memory and SSD, then loading each weight to GPU before every use. In our case stud…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 14 pages, 16 figures

  2. arXiv:2404.10689

    cs.LG eess.SP

    Network architecture search of X-ray based scientific applications

    Authors: Adarsha Balaji, Ramyad Hadidi, Gregory Kollmer, Mohammed E. Fouda, Prasanna Balaprakash

    Abstract: X-ray and electron diffraction-based microscopy use Bragg peak detection and ptychography to perform 3-D imaging at an atomic resolution. Typically, these techniques are implemented using computationally complex tasks such as a Pseudo-Voigt function or solving a complex inverse problem. Recently, the use of deep neural networks has improved the existing state-of-the-art approaches. However, the de…

    Submitted 16 April, 2024; originally announced April 2024.

  3. arXiv:2104.04563

    cs.RO

    Context-Aware Task Handling in Resource-Constrained Robots with Virtualization

    Authors: Ramyad Hadidi, Nima Shoghi Ghalehshahi, Bahar Asgari, Hyesoon Kim

    Abstract: Intelligent mobile robots are critical in several scenarios. However, as their computational resources are limited, mobile robots struggle to handle several tasks concurrently while still guaranteeing real-timeliness. To address this challenge and improve the real-timeliness of critical tasks under resource constraints, we propose a fast context-aware task handling technique. To effectively handle t…

    Submitted 9 April, 2021; originally announced April 2021.

  4. arXiv:2104.04447

    cs.DC

    Creating Robust Deep Neural Networks With Coded Distributed Computing for IoT Systems

    Authors: Ramyad Hadidi, Jiashen Cao, Hyesoon Kim

    Abstract: The increasing interest in serverless computation and ubiquitous wireless networks has led to numerous connected devices in our surroundings. Among such devices, IoT devices have access to an abundance of raw data, but their inadequate resources in computing limit their capabilities. Specifically, with the emergence of deep neural networks (DNNs), not only is the demand for the computing power of…

    Submitted 9 April, 2021; originally announced April 2021.

  5. arXiv:2102.08481

    cs.DB

    THIA: Accelerating Video Analytics using Early Inference and Fine-Grained Query Planning

    Authors: Jiashen Cao, Ramyad Hadidi, Joy Arulraj, Hyesoon Kim

    Abstract: To efficiently process visual data at scale, researchers have proposed two techniques for lowering the computational overhead associated with the underlying deep learning models. The first approach consists of leveraging a specialized, lightweight model to directly answer the query. The second approach focuses on filtering irrelevant frames using a lightweight model and processing the filtered fra…

    Submitted 16 February, 2021; originally announced February 2021.

  6. Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads

    Authors: Bahar Asgari, Ramyad Hadidi, Joshua Dierberger, Charlotte Steinichen, Amaan Marfatia, Hyesoon Kim

    Abstract: Sparse matrices are the key ingredients of several application domains, from scientific computation to machine learning. The primary challenge with sparse matrices has been efficiently storing and transferring data, for which many sparse formats have been proposed to significantly eliminate zero entries. Such formats, essentially designed to optimize memory footprint, may not be as successful in p…

    Submitted 18 October, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

    Comments: 11 pages, 14 figures, 2 tables

  7. arXiv:2011.08936

    cs.CR

    Secure Location-Aware Authentication and Communication for Intelligent Transportation Systems

    Authors: Nima Shoghi Ghalehshahi, Ramyad Hadidi, Lee Jaewon, Jun Chen, Arthur Siqueria, Rahul Rajan, Shaan Dhawan, Pooya Shoghi Ghalehshahi, Hyesoon Kim

    Abstract: Intelligent transportation systems (ITS) are expected to effectively create a stand-alone network for secure communication among autonomous agents. In such a dynamic and fast-changing network with high-speed agents, verifying the authenticity and integrity of messages while taking preventive action (e.g., applying brakes) within tens of milliseconds is one of the main challenges. In such a brief m…

    Submitted 17 November, 2020; originally announced November 2020.

  8. arXiv:2011.07092

    cs.CV

    Reducing Inference Latency with Concurrent Architectures for Image Recognition

    Authors: Ramyad Hadidi, Jiashen Cao, Michael S. Ryoo, Hyesoon Kim

    Abstract: Satisfying the high computation demand of modern deep learning architectures is challenging for achieving low inference latency. The current approaches in decreasing latency only increase parallelism within a layer. This is because architectures typically capture a single-chain dependency pattern that prevents efficient distribution with a higher concurrency (i.e., simultaneous execution of one in…

    Submitted 13 November, 2020; originally announced November 2020.

  9. arXiv:2003.06464

    eess.SP cs.LG

    LCP: A Low-Communication Parallelization Method for Fast Neural Network Inference in Image Recognition

    Authors: Ramyad Hadidi, Bahar Asgari, Jiashen Cao, Younmin Bae, Da Eun Shim, Hyojong Kim, Sung-Kyu Lim, Michael S. Ryoo, Hyesoon Kim

    Abstract: Deep neural networks (DNNs) have inspired new studies in myriad edge applications with robots, autonomous agents, and Internet-of-things (IoT) devices. However, performing inference of DNNs in the edge is still a severe challenge, mainly because of the contradiction between the intensive resource requirements of DNNs and the tight resource availability in several edge domains. Further, as communic…

    Submitted 17 November, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

  10. arXiv:1901.02537

    cs.CV

    Collaborative Execution of Deep Neural Networks on Internet of Things Devices

    Authors: Ramyad Hadidi, Jiashen Cao, Michael S. Ryoo, Hyesoon Kim

    Abstract: With recent advancements in deep neural networks (DNNs), we are able to solve traditionally challenging problems. Since DNNs are compute intensive, consumers, to deploy a service, need to rely on expensive and scarce compute resources in the cloud. This approach, in addition to its dependence on high-quality network infrastructure and data centers, raises new privacy concerns. These challenges…

    Submitted 8 January, 2019; originally announced January 2019.

    Comments: Updated version after sysML

  11. arXiv:1802.02138

    cs.CV cs.AR

    Musical Chair: Efficient Real-Time Recognition Using Collaborative IoT Devices

    Authors: Ramyad Hadidi, Jiashen Cao, Matthew Woodward, Michael S. Ryoo, Hyesoon Kim

    Abstract: The prevalence of Internet of things (IoT) devices and abundance of sensor data has created an increase in real-time data processing such as recognition of speech, image, and video. While currently such processes are offloaded to the computationally powerful cloud system, a localized and distributed approach is desirable because (i) it preserves the privacy of users and (ii) it omits the dependenc…

    Submitted 21 March, 2018; v1 submitted 5 February, 2018; originally announced February 2018.

  12. CODA: Enabling Co-location of Computation and Data for Near-Data Processing

    Authors: Hyojong Kim, Ramyad Hadidi, Lifeng Nai, Hyesoon Kim, Nuwan Jayasena, Yasuko Eckert, Onur Kayiran, Gabriel H. Loh

    Abstract: Recent studies have demonstrated that near-data processing (NDP) is an effective technique for improving performance and energy efficiency of data-intensive workloads. However, leveraging NDP in realistic systems with multiple memory modules introduces a new challenge. In today's systems, where no computation occurs in memory modules, the physical address space is interleaved at a fine granularity…

    Submitted 25 October, 2017; originally announced October 2017.

    Comments: 14 pages, 16 figures

    Journal ref: ACM Transactions on Architecture and Code Optimization (TACO) Volume 15 Issue 3, October 2018 Article No. 32

  13. Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube

    Authors: Ramyad Hadidi, Bahar Asgari, Jeffrey Young, Burhan Ahmad Mudassar, Kartikay Garg, Tushar Krishna, Hyesoon Kim

    Abstract: Memories that exploit three-dimensional (3D)-stacking technology, which integrate memory and logic dies in a single stack, are becoming popular. These memories, such as Hybrid Memory Cube (HMC), utilize a network-on-chip (NoC) design for connecting their internal structural organizations. This novel usage of NoC, in addition to aiding processing-in-memory capabilities, enables numerous benefits su…

    Submitted 13 February, 2018; v1 submitted 17 July, 2017; originally announced July 2017.

    Journal ref: 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

  14. Demystifying the Characteristics of 3D-Stacked Memories: A Case Study for Hybrid Memory Cube

    Authors: Ramyad Hadidi, Bahar Asgari, Burhan Ahmad Mudassar, Saibal Mukhopadhyay, Sudhakar Yalamanchili, Hyesoon Kim

    Abstract: Three-dimensional (3D)-stacking technology, which enables the integration of DRAM and logic dies, offers high bandwidth and low energy consumption. This technology also empowers new memory designs for executing tasks not traditionally associated with memories. A practical 3D-stacked memory is Hybrid Memory Cube (HMC), which provides significant access bandwidth and low power consumption in a small…

    Submitted 3 October, 2017; v1 submitted 8 June, 2017; originally announced June 2017.

    Comments: IEEE Catalog Number: CFP17236-USB ISBN 13: 978-1-5386-1232-3

    Journal ref: Proceedings of the 2017 IEEE International Symposium on Workload Characterization