DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering

Cao, Qingqing; Trivedi, Harsh; Balasubramanian, Aruna; Balasubramanian, Niranjan

arXiv:2005.00697 (cs)

[Submitted on 2 May 2020]

Abstract: Transformer-based QA models use input-wide self-attention, i.e. attention across both the question and the input passage, at all layers, which makes them slow and memory-intensive. It turns out that input-wide self-attention is not needed at all layers, especially the lower ones. We introduce DeFormer, a decomposed transformer, which substitutes the full self-attention with question-wide and passage-wide self-attentions in the lower layers. This allows the passage to be processed independently of the question, which in turn makes it possible to pre-compute passage representations and drastically reduce runtime compute. Furthermore, because DeFormer is largely similar to the original model, we can initialize DeFormer with the pre-training weights of a standard transformer and directly fine-tune on the target QA dataset. We show that DeFormer versions of BERT and XLNet can speed up QA by over 4.3x and that, with simple distillation-based losses, they incur only a 1% drop in accuracy. We open source the code at this https URL.
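
The decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`self_attention`, `lower_layers`, `decomposed_encode`), the layer counts `k` and `n`, and the weight-free single-head attention used as a stand-in for a transformer layer are all assumptions made for illustration. It only shows the data flow: the lower `k` layers attend within the question and within the passage separately, so the passage output can be cached and reused across questions, while the upper `n - k` layers attend over the concatenated sequence.

```python
import numpy as np


def self_attention(x):
    """Toy single-head scaled dot-product self-attention (no learned weights)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x


def lower_layers(x, k):
    """Apply k layers that attend only within x (question-wide or passage-wide)."""
    for _ in range(k):
        x = self_attention(x)
    return x


def decomposed_encode(question, passage_lower, k, n):
    """Encode a question against a cached lower-layer passage representation.

    question:      (q_len, dim) raw question embeddings
    passage_lower: (p_len, dim) output of the lower k layers on the passage alone
    """
    question_lower = lower_layers(question, k)
    x = np.concatenate([question_lower, passage_lower], axis=0)
    for _ in range(n - k):  # upper layers use input-wide self-attention
        x = self_attention(x)
    return x


rng = np.random.default_rng(0)
passage = rng.normal(size=(6, 4))
passage_cache = lower_layers(passage, k=2)  # computed once, independent of any question

question_a = rng.normal(size=(3, 4))
question_b = rng.normal(size=(2, 4))
out_a = decomposed_encode(question_a, passage_cache, k=2, n=4)
out_b = decomposed_encode(question_b, passage_cache, k=2, n=4)
print(out_a.shape, out_b.shape)
```

In this sketch the same `passage_cache` serves both questions, so at query time only the question's lower layers and the shared upper layers are computed.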

Comments:	ACL 2020 camera ready
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2005.00697 [cs.CL]
	(or arXiv:2005.00697v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2005.00697

Submission history

From: Qingqing Cao
[v1] Sat, 2 May 2020 04:28:22 UTC (1,075 KB)
