Showing 1–1 of 1 results for author: Dangi, P

  1. arXiv:2405.17025  [pdf, other]

    cs.AR cs.AI

    SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs

    Authors: Zhenyu Bai, Pranav Dangi, Huize Li, Tulika Mitra

    Abstract: Efficiently supporting long context length is crucial for Transformer models. The quadratic complexity of the self-attention computation plagues traditional Transformers. Sliding window-based static sparse attention mitigates the problem by limiting the attention scope of the input tokens, reducing the theoretical complexity from quadratic to linear. Although the sparsity induced by window attenti…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted paper for DAC'22
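
The abstract's claim that sliding-window attention reduces the cost from quadratic to linear can be illustrated with a minimal pure-Python sketch. This is not the SWAT accelerator design or the authors' code: the function name `window_attention`, the symmetric window half-width `w`, and the single-head, unbatched layout are all assumptions made for illustration. Each token attends only to tokens within `w` positions of itself, so the number of scored query-key pairs grows roughly as n·(2w+1) rather than n².

```python
import math

def window_attention(q, k, v, w):
    """Illustrative single-head attention: token i attends only to j with |i - j| <= w.

    q, k, v are lists of equal length n holding per-token vectors (lists of floats).
    Returns (outputs, scored_pairs), where scored_pairs counts the query-key
    dot products actually computed.
    """
    n, d = len(q), len(q[0])
    outputs, scored_pairs = [], 0
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)   # window clipped at sequence edges
        scores = [
            sum(q[i][t] * k[j][t] for t in range(d)) / math.sqrt(d)
            for j in range(lo, hi)
        ]
        scored_pairs += len(scores)
        m = max(scores)                             # subtract the max for a stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        outputs.append([
            sum(e / z * v[lo + idx][t] for idx, e in enumerate(exps))
            for t in range(len(v[0]))
        ])
    return outputs, scored_pairs

# Usage: 8 tokens, window half-width 1, compared with a window wide enough to cover everything.
q = [[float(i), 1.0] for i in range(8)]
v = [[1.0] for _ in range(8)]
_, windowed = window_attention(q, q, v, 1)
_, full = window_attention(q, q, v, 7)
print(windowed, full)
```

With a fixed `w`, doubling the sequence length roughly doubles `scored_pairs`, whereas a window wide enough to cover the whole sequence reproduces the quadratic count of full attention.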