
Showing 1–10 of 10 results for author: Asgari, B

  1. arXiv:2406.11674  [pdf, other]

    cs.CL

    Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference

    Authors: Donghyeon Joo, Ramyad Hadidi, Soheil Feizi, Bahar Asgari

    Abstract: The increasing size of large language models (LLMs) challenges their usage on resource-constrained platforms. For example, memory on modern GPUs is insufficient to hold LLMs that are hundreds of Gigabytes in size. Offloading is a popular method to escape this constraint by storing weights of an LLM model to host CPU memory and SSD, then loading each weight to GPU before every use. In our case stud…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 14 pages, 16 figures

  2. arXiv:2406.10166  [pdf, other]

    cs.LG

    Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication

    Authors: Sanjali Yadav, Bahar Asgari

    Abstract: Sparse matrix-matrix multiplication (SpGEMM) is a critical operation in numerous fields, including scientific computing, graph analytics, and deep learning. These applications exploit the sparsity of matrices to reduce storage and computational demands. However, the irregular structure of sparse matrices poses significant challenges for performance optimization. Traditional hardware accelerators a…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to ISCA 2024 MLArchSys workshop https://openreview.net/forum?id=A1V9FaZRbV

  3. arXiv:2104.04563  [pdf, other]

    cs.RO

    Context-Aware Task Handling in Resource-Constrained Robots with Virtualization

    Authors: Ramyad Hadidi, Nima Shoghi Ghalehshahi, Bahar Asgari, Hyesoon Kim

    Abstract: Intelligent mobile robots are critical in several scenarios. However, as their computational resources are limited, mobile robots struggle to handle several tasks concurrently while guaranteeing real-timeliness. To address this challenge and improve the real-timeliness of critical tasks under resource constraints, we propose a fast context-aware task handling technique. To effectively handle t…

    Submitted 9 April, 2021; originally announced April 2021.

  4. Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads

    Authors: Bahar Asgari, Ramyad Hadidi, Joshua Dierberger, Charlotte Steinichen, Amaan Marfatia, Hyesoon Kim

    Abstract: Sparse matrices are the key ingredients of several application domains, from scientific computation to machine learning. The primary challenge with sparse matrices has been efficiently storing and transferring data, for which many sparse formats have been proposed to significantly eliminate zero entries. Such formats, essentially designed to optimize memory footprint, may not be as successful in p…

    Submitted 18 October, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

    Comments: 11 pages, 14 figures, 2 tables

  5. arXiv:2003.06464  [pdf, other]

    eess.SP cs.LG

    LCP: A Low-Communication Parallelization Method for Fast Neural Network Inference in Image Recognition

    Authors: Ramyad Hadidi, Bahar Asgari, Jiashen Cao, Younmin Bae, Da Eun Shim, Hyojong Kim, Sung-Kyu Lim, Michael S. Ryoo, Hyesoon Kim

    Abstract: Deep neural networks (DNNs) have inspired new studies in myriad edge applications with robots, autonomous agents, and Internet-of-things (IoT) devices. However, performing inference of DNNs in the edge is still a severe challenge, mainly because of the contradiction between the intensive resource requirements of DNNs and the tight resource availability in several edge domains. Further, as communic…

    Submitted 17 November, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

  6. arXiv:1803.06068  [pdf, other]

    cs.AR cs.PF

    Memory Slices: A Modular Building Block for Scalable, Intelligent Memory Systems

    Authors: Bahar Asgari, Saibal Mukhopadhyay, Sudhakar Yalamanchili

    Abstract: While reduction in feature size makes computation cheaper in terms of latency, area, and power consumption, performance of emerging data-intensive applications is determined by data movement. These trends have introduced the concept of scalability as reaching a desirable performance per unit cost by using as few units as possible. Many proposals have moved compute closer to the memory. H…

    Submitted 15 March, 2018; originally announced March 2018.

  7. Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube

    Authors: Ramyad Hadidi, Bahar Asgari, Jeffrey Young, Burhan Ahmad Mudassar, Kartikay Garg, Tushar Krishna, Hyesoon Kim

    Abstract: Memories that exploit three-dimensional (3D)-stacking technology, which integrate memory and logic dies in a single stack, are becoming popular. These memories, such as Hybrid Memory Cube (HMC), utilize a network-on-chip (NoC) design for connecting their internal structural organizations. This novel usage of NoC, in addition to aiding processing-in-memory capabilities, enables numerous benefits su…

    Submitted 13 February, 2018; v1 submitted 17 July, 2017; originally announced July 2017.

    Journal ref: 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

  8. Demystifying the Characteristics of 3D-Stacked Memories: A Case Study for Hybrid Memory Cube

    Authors: Ramyad Hadidi, Bahar Asgari, Burhan Ahmad Mudassar, Saibal Mukhopadhyay, Sudhakar Yalamanchili, Hyesoon Kim

    Abstract: Three-dimensional (3D)-stacking technology, which enables the integration of DRAM and logic dies, offers high bandwidth and low energy consumption. This technology also empowers new memory designs for executing tasks not traditionally associated with memories. A practical 3D-stacked memory is Hybrid Memory Cube (HMC), which provides significant access bandwidth and low power consumption in a small…

    Submitted 3 October, 2017; v1 submitted 8 June, 2017; originally announced June 2017.

    Comments: IEEE Catalog Number: CFP17236-USB ISBN 13: 978-1-5386-1232-3

    Journal ref: Proceedings of the 2017 IEEE International Symposium on Workload Characterization

  9. arXiv:0805.0888  [pdf]

    cs.OH

    Geometrical Variation Analysis of an Electrothermally Driven Polysilicon Microactuator

    Authors: M. Shamshirsaz, M. Maroufi, M. B. Asgari

    Abstract: Analytical models that predict the thermal and mechanical responses of the microactuator have been developed. These models are based on electrothermal and thermomechanical analysis of the microbeam. Also, Finite Element Analysis (FEA) is used to evaluate microactuator tip deflection. Analytical and Finite Element results are compared with experimental results in literature and show good agreement…

    Submitted 7 May, 2008; originally announced May 2008.

    Comments: Submitted on behalf of EDA Publishing Association (http://irevues.inist.fr/handle/2042/16838)

    Journal ref: In Symposium on Design, Test, Integration and Packaging of MEMS/MOEMS - DTIP 2008, Nice, France (2008)

  10. arXiv:0802.3054  [pdf]

    cs.OH

    Analysis of polysilicon micro beams buckling with temperature-dependent properties

    Authors: M. Shamshirsaz, M. Bahrami, M. B. Asgari, M. Tayefeh

    Abstract: The suspended electrothermal polysilicon micro beams generate displacements and forces by thermal buckling effects. In the previous electro-thermal and thermo-elastic models of suspended polysilicon micro beams, the thermo-mechanical properties of polysilicon have been considered constant over a wide range of temperature (20–900 °C). In reality, the thermo-mechanical properties of polysil…

    Submitted 21 February, 2008; originally announced February 2008.

    Comments: Submitted on behalf of EDA Publishing Association (http://irevues.inist.fr/EDA-Publishing)

    Journal ref: In Symposium on Design, Test, Integration and Packaging of MEMS/MOEMS - DTIP 2007, Stresa, Lago Maggiore, Italy (2007)