Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning

Authors: Xuanli He, Iman Keivanloo, Yi Xu, Xiang He, Belinda Zeng, Santosh Rajagopalan, Trishul Chilimbi

Abstract: Pre-training and then fine-tuning large language models is commonly used to achieve state-of-the-art performance in natural language processing (NLP) tasks. However, most pre-trained models suffer from low inference speed. Deploying such large models to applications with latency constraints is challenging. In this work, we focus on accelerating the inference via conditional computations. To achieve this, we propose a novel idea, Magic Pyramid (MP), to reduce both width-wise and depth-wise computation via token pruning and early exiting for Transformer-based models, particularly BERT. The former saves computation by removing non-salient tokens, while the latter reduces computation by terminating inference before the final layer if the exiting condition is met. Our empirical studies demonstrate that, compared to previous state-of-the-art methods, MP not only achieves speed-adjustable inference but also surpasses token pruning and early exiting alone, reducing giga floating point operations (GFLOPs) by up to 70% with less than a 0.5% accuracy drop. Token pruning and early exiting each favor sequences of different lengths. However, MP is capable of achieving an average of 8.06x speedup on two popular text classification tasks, regardless of the sizes of the inputs.

Submitted 30 October, 2021; originally announced November 2021.

Comments: 8 pages
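
The two mechanisms named in the abstract above, token pruning and early exiting, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how salience is scored or what the exiting condition is, so the sketch assumes a precomputed per-token salience score and an entropy threshold on the per-layer classifier output. The function names `prune_tokens` and `run_with_early_exit` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def prune_tokens(tokens, salience, keep_ratio):
    """Width-wise reduction: keep the most salient tokens in original order.

    `salience` is an assumed, precomputed score per token.
    """
    keep = max(1, math.ceil(len(tokens) * keep_ratio))
    top = sorted(range(len(tokens)), key=lambda i: salience[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]


def run_with_early_exit(layer_logits, threshold):
    """Depth-wise reduction: stop at the first layer whose prediction is confident.

    `layer_logits` holds one list of class logits per layer. The entropy
    threshold is an assumed exiting condition. Returns (class, layers_used).
    """
    if not layer_logits:
        raise ValueError("need at least one layer")
    for depth, logits in enumerate(layer_logits, start=1):
        probs = softmax(logits)
        if entropy(probs) < threshold:
            break  # confident enough: skip the remaining layers
    return probs.index(max(probs)), depth
```

Raising the entropy threshold makes inputs exit earlier and lowering it makes them run deeper, which is one plausible way to obtain a speed-adjustable trade-off of the kind the abstract mentions.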

arXiv:2110.08919 [pdf, other]

Low-Precision Quantization for Efficient Nearest Neighbor Search

Authors: Anthony Ko, Iman Keivanloo, Vihan Lakshman, Eric Schkufza

Abstract: Fast k-Nearest Neighbor search over real-valued vector spaces (KNN) is an important algorithmic task for information retrieval and recommendation systems. We present a method for using reduced precision to represent vectors through quantized integer values, enabling both a reduction in the memory overhead of indexing these vectors and faster distance computations at query time. While most traditional quantization techniques focus on minimizing the reconstruction error between a point and its uncompressed counterpart, we focus instead on preserving the behavior of the underlying distance metric. Furthermore, our quantization approach is applied at the implementation level and can be combined with existing KNN algorithms. Our experiments on both open source and proprietary datasets across multiple popular KNN frameworks validate that quantized distance metrics can reduce memory by 60% and improve query throughput by 30%, while incurring only a 2% reduction in recall.

Submitted 17 October, 2021; originally announced October 2021.

Comments: 5 pages
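
As a rough illustration of the idea in the abstract above (representing vectors as quantized integers and computing distances on those integers), the following sketch uses uniform scalar quantization with a single shared scale. The paper's actual quantization scheme is not described in the abstract, so the scheme, the int8-style range, and the helper names `quantize`, `squared_l2`, and `nearest` are all assumptions made for illustration.

```python
def quantize(vec, scale):
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    return [max(-127, min(127, round(x / scale))) for x in vec]


def squared_l2(a, b):
    """Squared Euclidean distance; works on float or integer vectors alike."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query, index, k=1):
    """Brute-force k-nearest-neighbor search; returns indices into `index`."""
    order = sorted(range(len(index)), key=lambda i: squared_l2(query, index[i]))
    return order[:k]


# Toy data: three 2-D vectors and one query (illustrative values only).
vectors = [[0.9, 0.1], [-0.5, 0.4], [0.2, -0.8]]
query = [0.8, 0.2]

# One scale for the whole index, chosen so the largest magnitude maps to 127.
scale = max(abs(x) for v in vectors for x in v) / 127

quantized_index = [quantize(v, scale) for v in vectors]
quantized_query = quantize(query, scale)

# The goal is that the ranking under the integer distance matches the
# ranking under the original float distance, not that vectors reconstruct well.
float_result = nearest(query, vectors)
int_result = nearest(quantized_query, quantized_index)
```

On this toy data both searches pick the same neighbor, which is the property the abstract emphasizes: preserving the behavior of the distance metric rather than minimizing reconstruction error.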

arXiv:1807.02278 [pdf, other]

doi 10.1109/SCAM.2015.7335404

Recommending Insightful Comments for Source Code using Crowdsourced Knowledge

Authors: Mohammad Masudur Rahman, Chanchal K. Roy, Iman Keivanloo

Abstract: Recently, automatic code comment generation has been proposed to facilitate program comprehension. Existing code comment generation techniques focus on describing the functionality of the source code. However, there are other aspects, such as insights about quality or issues of the code, which are overlooked by earlier approaches. In this paper, we describe a mining approach that recommends insightful comments about the quality, deficiencies or scopes for further improvement of the source code. First, we conduct an exploratory study that motivates crowdsourced knowledge from Stack Overflow discussions as a potential resource for source code comment recommendation. Second, based on the findings from the exploratory study, we propose a heuristic-based technique for mining insightful comments from the Stack Overflow Q&A site for source code comment recommendation. Experiments with 292 Stack Overflow code segments and 5,039 discussion comments show that our approach has a promising recall of 85.42%. We also conducted a complementary user study which confirms the accuracy and usefulness of the recommended comments.

Submitted 6 July, 2018; originally announced July 2018.

Comments: The 15th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2015), pp. 81–90, Bremen, Germany, September 2015

Journal ref: Proc. SCAM 2015, pp. 81–90

Showing 1–3 of 3 results for author: Keivanloo, I