-
CEEBERT: Cross-Domain Inference in Early Exit BERT
Authors:
Divya Jyoti Bajpai,
Manjesh Kumar Hanawal
Abstract:
Pre-trained Language Models (PLMs), like BERT, with self-supervision objectives exhibit remarkable performance and generalization across various tasks. However, they suffer in inference latency due to their large size. To address this issue, side branches are attached at intermediate layers, enabling early inference of samples without requiring them to pass through all layers. However, the challen…
▽ More
Pre-trained Language Models (PLMs), like BERT, with self-supervision objectives exhibit remarkable performance and generalization across various tasks. However, they suffer in inference latency due to their large size. To address this issue, side branches are attached at intermediate layers, enabling early inference of samples without requiring them to pass through all layers. However, the challenge is to decide which layer to infer and exit each sample so that the accuracy and latency are balanced. Moreover, the distribution of the samples to be inferred may differ from that used for training necessitating cross-domain adaptation. We propose an online learning algorithm named Cross-Domain Inference in Early Exit BERT (CeeBERT) that dynamically determines early exits of samples based on the level of confidence at each exit point. CeeBERT learns optimal thresholds from domain-specific confidence observed at intermediate layers on the fly, eliminating the need for labeled data. Experimental results on five distinct datasets with BERT and ALBERT models demonstrate CeeBERT's ability to improve latency by reducing unnecessary computations with minimal drop in performance. By adapting to the threshold values, CeeBERT can speed up the BERT/ALBERT models by $2\times$ - $3.5\times$ with minimal drop in accuracy.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
FAIR: Filtering of Automatically Induced Rules
Authors:
Divya Jyoti Bajpai,
Ayush Maheshwari,
Manjesh Kumar Hanawal,
Ganesh Ramakrishnan
Abstract:
The availability of large annotated data can be a critical bottleneck in training machine learning algorithms successfully, especially when applied to diverse domains. Weak supervision offers a promising alternative by accelerating the creation of labeled training data using domain-specific rules. However, it requires users to write a diverse set of high-quality rules to assign labels to the unlab…
▽ More
The availability of large annotated data can be a critical bottleneck in training machine learning algorithms successfully, especially when applied to diverse domains. Weak supervision offers a promising alternative by accelerating the creation of labeled training data using domain-specific rules. However, it requires users to write a diverse set of high-quality rules to assign labels to the unlabeled data. Automatic Rule Induction (ARI) approaches circumvent this problem by automatically creating rules from features on a small labeled set and filtering a final set of rules from them. In the ARI approach, the crucial step is to filter out a set of a high-quality useful subset of rules from the large set of automatically created rules. In this paper, we propose an algorithm (Filtering of Automatically Induced Rules) to filter rules from a large number of automatically induced rules using submodular objective functions that account for the collective precision, coverage, and conflicts of the rule set. We experiment with three ARI approaches and five text classification datasets to validate the superior performance of our algorithm with respect to several semi-supervised label aggregation approaches. Further, we show that achieves statistically significant results in comparison to existing rule-filtering approaches.
△ Less
Submitted 4 July, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
I-SplitEE: Image classification in Split Computing DNNs with Early Exits
Authors:
Divya Jyoti Bajpai,
Aastha Jaiswal,
Manjesh Kumar Hanawal
Abstract:
The recent advances in Deep Neural Networks (DNNs) stem from their exceptional performance across various domains. However, their inherent large size hinders deploying these networks on resource-constrained devices like edge, mobile, and IoT platforms. Strategies have emerged, from partial cloud computation offloading (split computing) to integrating early exits within DNN layers. Our work present…
▽ More
The recent advances in Deep Neural Networks (DNNs) stem from their exceptional performance across various domains. However, their inherent large size hinders deploying these networks on resource-constrained devices like edge, mobile, and IoT platforms. Strategies have emerged, from partial cloud computation offloading (split computing) to integrating early exits within DNN layers. Our work presents an innovative unified approach merging early exits and split computing. We determine the 'splitting layer', the optimal depth in the DNN for edge device computations, and whether to infer on edge device or be offloaded to the cloud for inference considering accuracy, computational efficiency, and communication costs. Also, Image classification faces diverse environmental distortions, influenced by factors like time of day, lighting, and weather. To adapt to these distortions, we introduce I-SplitEE, an online unsupervised algorithm ideal for scenarios lacking ground truths and with sequential data. Experimental validation using Caltech-256 and Cifar-10 datasets subjected to varied distortions showcases I-SplitEE's ability to reduce costs by a minimum of 55% with marginal performance degradation of at most 5%.
△ Less
Submitted 19 January, 2024;
originally announced January 2024.
-
SplitEE: Early Exit in Deep Neural Networks with Split Computing
Authors:
Divya J. Bajpai,
Vivek K. Trivedi,
Sohan L. Yadav,
Manjesh K. Hanawal
Abstract:
Deep Neural Networks (DNNs) have drawn attention because of their outstanding performance on various tasks. However, deploying full-fledged DNNs in resource-constrained devices (edge, mobile, IoT) is difficult due to their large size. To overcome the issue, various approaches are considered, like offloading part of the computation to the cloud for final inference (split computing) or performing th…
▽ More
Deep Neural Networks (DNNs) have drawn attention because of their outstanding performance on various tasks. However, deploying full-fledged DNNs in resource-constrained devices (edge, mobile, IoT) is difficult due to their large size. To overcome the issue, various approaches are considered, like offloading part of the computation to the cloud for final inference (split computing) or performing the inference at an intermediary layer without passing through all layers (early exits). In this work, we propose combining both approaches by using early exits in split computing. In our approach, we decide up to what depth of DNNs computation to perform on the device (splitting layer) and whether a sample can exit from this layer or need to be offloaded. The decisions are based on a weighted combination of accuracy, computational, and communication costs. We develop an algorithm named SplitEE to learn an optimal policy. Since pre-trained DNNs are often deployed in new domains where the ground truths may be unavailable and samples arrive in a streaming fashion, SplitEE works in an online and unsupervised setup. We extensively perform experiments on five different datasets. SplitEE achieves a significant cost reduction ($>50\%$) with a slight drop in accuracy ($<2\%$) as compared to the case when all samples are inferred at the final layer. The anonymized source code is available at \url{https://anonymous.4open.science/r/SplitEE_M-B989/README.md}.
△ Less
Submitted 17 September, 2023;
originally announced September 2023.