Search | arXiv e-print repository

Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers

Authors: Hongjie Wang, Bhishma Dedhia, Niraj K. Jha

Abstract: Deployment of Transformer models on edge devices is becoming increasingly challenging due to the exponentially growing inference cost that scales quadratically with the number of tokens in the input sequence. Token pruning is an emerging solution to address this challenge due to its ease of deployment on various Transformer backbones. However, most token pruning methods require computationally expensive fine-tuning, which is undesirable in many edge deployment cases. In this work, we propose Zero-TPrune, the first zero-shot method that considers both the importance and similarity of tokens in performing token pruning. It leverages the attention graph of pre-trained Transformer models to produce an importance distribution for tokens via our proposed Weighted Page Rank (WPR) algorithm. This distribution further guides token partitioning for efficient similarity-based pruning. Due to the elimination of the fine-tuning overhead, Zero-TPrune can prune large models at negligible computational cost, switch between different pruning configurations at no computational cost, and perform hyperparameter tuning efficiently. We evaluate the performance of Zero-TPrune on vision tasks by applying it to various vision Transformer backbones and testing them on ImageNet. Without any fine-tuning, Zero-TPrune reduces the FLOPs cost of DeiT-S by 34.7% and improves its throughput by 45.3% with only 0.4% accuracy loss. Compared with state-of-the-art pruning methods that require fine-tuning, Zero-TPrune not only eliminates the need for fine-tuning after pruning but also does so with only 0.1% accuracy loss. Compared with state-of-the-art fine-tuning-free pruning methods, Zero-TPrune reduces accuracy loss by up to 49% with similar FLOPs budgets. Project webpage: https://jha-lab.github.io/zerotprune.

Submitted 7 April, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2005.04036 [pdf, other]

On Minimizing Channel-Aware Age of Information in a Multi-Sensor Setting

Authors: Bhishma Dedhia, Sharayu Moharir

Abstract: We propose a variant of the Age of Information (AoI) metric called Channel-Aware Age of Information (CA-AoI). Unlike AoI, CA-AoI takes into account the channel conditions between the source and the intended destination to compute the "age" of the most recent update received by the destination. This new metric ensures that the resource allocation is not heavily tilted towards the sources with poor channel conditions. We design scheduling policies for multi-sensor systems in which sensors report their measurements to a central monitoring station via a shared unreliable communication channel, with the goal of minimizing the time-average of the weighted sum of CA-AoIs. We first derive universal lower bounds for the freshness objective. We show that the scheduling problem is indexable and derive low-complexity Whittle-index-based scheduling policies. We also design stationary randomized scheduling algorithms and give optimization procedures to find the optimal parameters of the policy. Via simulations, we show that our proposed policies surpass the greedy policy in several settings. Moreover, the Whittle-index-based scheduling policies outperform other policies in all the settings considered.

Submitted 11 January, 2021; v1 submitted 8 May, 2020; originally announced May 2020.

arXiv:1903.01380 [pdf]

Saliency Prediction for Omnidirectional Images Considering Optimization on Sphere Domain

Authors: Bhishma Dedhia, Jui-Chiu Chiang, Yi-Fan Char

Abstract: There are several formats for describing omnidirectional images. Among them, equirectangular projection (ERP), represented as a 2D image, is the most widely used. Many outstanding methods predict saliency maps well for conventional 2D images. However, these methods cannot be directly extended to predict the saliency map of an ERP image, since the content of an ERP image is not meant for direct display. Instead, the viewport image is generated on demand by converting the ERP image to the sphere domain and then applying rectilinear projection. In this paper, we propose a model to predict the saliency maps of ERP images using existing saliency predictors for 2D images. Pre-processing and post-processing steps are used to handle the problem described above. In particular, a smoothing-based optimization is performed on the sphere domain. A public dataset of omnidirectional images is used for all the experiments, and competitive results are achieved.

Submitted 4 March, 2019; originally announced March 2019.

Showing 1–3 of 3 results for author: Dedhia, B