Showing 1–5 of 5 results for author: Aly, M M S

  1. arXiv:2405.06038

    cs.LG cs.AI

    From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks

    Authors: Xue Geng, Zhe Wang, Chunyun Chen, Qing Xu, Kaixin Xu, Chao Jin, Manas Gupta, Xulei Yang, Zhenghua Chen, Mohamed M. Sabry Aly, Jie Lin, Min Wu, Xiaoli Li

    Abstract: Deep neural networks (DNNs) have been widely used in many artificial intelligence (AI) tasks. However, deploying them brings significant challenges due to the huge cost of memory, energy, and computation. To address these challenges, researchers have developed various model compression techniques such as model quantization and model pruning. Recently, there has been a surge in research of compress…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: This manuscript is the accepted version for TNNLS (IEEE Transactions on Neural Networks and Learning Systems)

  2. EXTENT: Enabling Approximation-Oriented Energy Efficient STT-RAM Write Circuit

    Authors: Saeed Seyedfaraji, Javad Talafy Daryani, Mohamed M. Sabry Aly, Semeen Rehman

    Abstract: Spin Transfer Torque Random Access Memory (STT-RAM) has garnered interest due to its various characteristics such as non-volatility, low leakage power, high density. Its magnetic properties have a vital role in STT switching operations through thermal effectiveness. A key challenge for STT-RAM in industrial adaptation is the high write energy and latency. In this paper, we overcome this challenge…

    Submitted 16 August, 2022; originally announced August 2022.

    Journal ref: IEEE Access, Volume 10, 2022, Pages 82144-82155

  3. arXiv:2205.11141

    cs.CV cs.LG

    OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization

    Authors: Peng Hu, Xi Peng, Hongyuan Zhu, Mohamed M. Sabry Aly, Jie Lin

    Abstract: As Deep Neural Networks (DNNs) usually are overparameterized and have millions of weight parameters, it is challenging to deploy these large DNN models on resource-constrained hardware platforms, e.g., smartphones. Numerous network compression methods such as pruning and quantization are proposed to reduce the model size significantly, of which the key is to find suitable compression allocation (e…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: Accepted at AAAI 2021; uploaded to obtain an arXiv DOI for the project record

  4. arXiv:2105.12990

    cs.CV

    PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery

    Authors: Tianyi Zhang, Jie Lin, Peng Hu, Bin Zhao, Mohamed M. Sabry Aly

    Abstract: Non-maximum Suppression (NMS) is an essential postprocessing step in modern convolutional neural networks for object detection. Unlike convolutions which are inherently parallel, the de-facto standard for NMS, namely GreedyNMS, cannot be easily parallelized and thus could be the performance bottleneck in convolutional object detection pipelines. MaxpoolNMS is introduced as a parallelizable alterna…

    Submitted 27 May, 2021; originally announced May 2021.

    Comments: Accepted by CVPR2021

  5. arXiv:1901.02064

    cs.LG stat.ML

    Dataflow-based Joint Quantization of Weights and Activations for Deep Neural Networks

    Authors: Xue Geng, Jie Fu, Bin Zhao, Jie Lin, Mohamed M. Sabry Aly, Christopher Pal, Vijay Chandrasekhar

    Abstract: This paper addresses a challenging problem - how to reduce energy consumption without incurring performance drop when deploying deep neural networks (DNNs) at the inference stage. In order to alleviate the computation and storage burdens, we propose a novel dataflow-based joint quantization approach with the hypothesis that a fewer number of quantization operations would incur less information los…

    Submitted 4 January, 2019; originally announced January 2019.

    Journal ref: Data Compression Conference 2019