-
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Authors:
Hongjie Wang,
Bhishma Dedhia,
Niraj K. Jha
Abstract:
Deployment of Transformer models on edge devices is becoming increasingly challenging due to the exponentially growing inference cost that scales quadratically with the number of tokens in the input sequence. Token pruning is an emerging solution to address this challenge due to its ease of deployment on various Transformer backbones. However, most token pruning methods require computationally expensive fine-tuning, which is undesirable in many edge deployment cases. In this work, we propose Zero-TPrune, the first zero-shot method that considers both the importance and similarity of tokens in performing token pruning. It leverages the attention graph of pre-trained Transformer models to produce an importance distribution for tokens via our proposed Weighted Page Rank (WPR) algorithm. This distribution further guides token partitioning for efficient similarity-based pruning. Due to the elimination of the fine-tuning overhead, Zero-TPrune can prune large models at negligible computational cost, switch between different pruning configurations at no computational cost, and perform hyperparameter tuning efficiently. We evaluate the performance of Zero-TPrune on vision tasks by applying it to various vision Transformer backbones and testing them on ImageNet. Without any fine-tuning, Zero-TPrune reduces the FLOPs cost of DeiT-S by 34.7% and improves its throughput by 45.3% with only 0.4% accuracy loss. Compared with state-of-the-art pruning methods that require fine-tuning, Zero-TPrune not only eliminates the need for fine-tuning after pruning but also does so with only 0.1% accuracy loss. Compared with state-of-the-art fine-tuning-free pruning methods, Zero-TPrune reduces accuracy loss by up to 49% with similar FLOPs budgets. Project webpage: https://jha-lab.github.io/zerotprune.
Submitted 7 April, 2024; v1 submitted 26 May, 2023;
originally announced May 2023.
-
On Minimizing Channel-Aware Age of Information in a Multi-Sensor Setting
Authors:
Bhishma Dedhia,
Sharayu Moharir
Abstract:
We propose a variant of the Age of Information (AoI) metric called Channel-Aware Age of Information (CA-AoI). Unlike AoI, CA-AoI takes into account the channel conditions between the source and the intended destination to compute the "age" of the most recent update received by the destination. This new metric ensures that the resource allocation is not heavily tilted towards the sources with poor channel conditions. We design scheduling policies for multi-sensor systems in which sensors report their measurements to a central monitoring station via a shared unreliable communication channel, with the goal of minimizing the time-average of the weighted sum of CA-AoIs. We first derive universal lower bounds for the freshness objective. We show that the scheduling problem is indexable and derive low-complexity Whittle-index-based scheduling policies. We also design stationary randomized scheduling algorithms and give optimization procedures to find the optimal parameters of the policy. Via simulations, we show that our proposed policies surpass the greedy policy in several settings. Moreover, the Whittle-index-based scheduling policies outperform other policies in all the settings considered.
Submitted 11 January, 2021; v1 submitted 8 May, 2020;
originally announced May 2020.
-
Saliency Prediction for Omnidirectional Images Considering Optimization on Sphere Domain
Authors:
Bhishma Dedhia,
Jui-Chiu Chiang,
Yi-Fan Char
Abstract:
There are several formats for describing omnidirectional images. Among them, equirectangular projection (ERP), represented as a 2D image, is the most widely used. Many outstanding methods can accurately predict saliency maps for conventional 2D images. However, these methods cannot be directly extended to predict the saliency map of an ERP image, since the content of the ERP image is not meant for direct display. Instead, the viewport image is generated on demand by converting the ERP image to the sphere domain and then applying rectilinear projection. In this paper, we propose a model that predicts the saliency maps of ERP images using existing saliency predictors for 2D images. Pre-processing and post-processing steps are used to address the problem mentioned above. In particular, a smoothing-based optimization is performed on the sphere domain. All experiments are performed on a public dataset of omnidirectional images, and competitive results are achieved.
Submitted 4 March, 2019;
originally announced March 2019.