Showing 1–4 of 4 results for author: Heo, J H

  1. arXiv:2309.15531  [pdf, other]

    cs.LG

    Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models

    Authors: Jung Hwan Heo, Jeonghoon Kim, Beomseok Kwon, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee

    Abstract: Large Language Models (LLMs) have recently demonstrated remarkable success across various tasks. However, efficiently serving LLMs has been a challenge due to the large memory bottleneck, specifically in small batch inference settings (e.g. mobile devices). Weight-only quantization can be a promising approach, but sub-4 bit quantization remains a challenge due to large-magnitude activation outlier…

    Submitted 24 March, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: ICLR 2024. 19 pages, 11 figures, 10 tables

  2. arXiv:2305.04526  [pdf, other]

    cs.CV

    CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation

    Authors: Jung Hwan Heo, Seyedarmin Azizi, Arash Fayyazi, Massoud Pedram

    Abstract: Transfer learning has become a popular task adaptation method in the era of foundation models. However, many foundation models require large storage and computing resources, which makes off-the-shelf deployment impractical. Post-training compression techniques such as pruning and quantization can help lower deployment costs. Unfortunately, the resulting performance degradation limits the usability…

    Submitted 8 July, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: Preprint

  3. arXiv:2303.02331  [pdf, other]

    cs.CV cs.AI cs.LG

    Training-Free Acceleration of ViTs with Delayed Spatial Merging

    Authors: Jung Hwan Heo, Seyedarmin Azizi, Arash Fayyazi, Massoud Pedram

    Abstract: Token merging has emerged as a new paradigm that can accelerate the inference of Vision Transformers (ViTs) without any retraining or fine-tuning. To push the frontier of training-free acceleration in ViTs, we improve token merging by adding the perspectives of 1) activation outliers and 2) hierarchical representations. Through a careful analysis of the attention behavior in ViTs, we characterize…

    Submitted 1 July, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

    Comments: ICML 2024 ES-FoMo Workshop

  4. Sparse Periodic Systolic Dataflow for Lowering Latency and Power Dissipation of Convolutional Neural Network Accelerators

    Authors: Jung Hwan Heo, Arash Fayyazi, Amirhossein Esmaili, Massoud Pedram

    Abstract: This paper introduces the sparse periodic systolic (SPS) dataflow, which advances the state-of-the-art hardware accelerator for supporting lightweight neural networks. Specifically, the SPS dataflow enables a novel hardware design approach unlocked by an emergent pruning scheme, periodic pattern-based sparsity (PPS). By exploiting the regularity of PPS, our sparsity-aware compiler optimally reorde…

    Submitted 30 June, 2022; originally announced July 2022.

    Comments: 6 pages, Published in ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED) 2022