XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation

Mukherjee, Subhabrata; Awadallah, Ahmed Hassan; Gao, Jianfeng

arXiv:2106.04563 (cs)

[Submitted on 8 Jun 2021 (v1), last revised 12 Jun 2021 (this version, v2)]

Abstract: While deep and large pre-trained models are the state of the art for various natural language processing tasks, their huge size poses significant challenges for practical use in resource-constrained settings. Recent works in knowledge distillation propose task-agnostic as well as task-specific methods to compress these models, with task-specific ones often yielding a higher compression rate. In this work, we develop a new task-agnostic distillation framework, XtremeDistilTransformers, that leverages the advantage of task-specific methods for learning a small universal model that can be applied to arbitrary tasks and languages. To this end, we study the transferability of several source tasks, augmentation resources and model architectures for distillation. We evaluate our model performance on multiple tasks, including the General Language Understanding Evaluation (GLUE) benchmark, the SQuAD question answering dataset and a massive multi-lingual NER dataset with 41 languages. We release three distilled task-agnostic checkpoints with 13MM, 22MM and 33MM parameters, obtaining SOTA performance in several tasks.

Comments:	Code and checkpoints released (links in draft)
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2106.04563 [cs.CL]
	(or arXiv:2106.04563v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2106.04563

Submission history

From: Subhabrata Mukherjee
[v1] Tue, 8 Jun 2021 17:49:33 UTC (5,244 KB)
[v2] Sat, 12 Jun 2021 03:59:31 UTC (5,244 KB)
