Hard-Coded Gaussian Attention for Neural Machine Translation

You, Weiqiu; Sun, Simeng; Iyyer, Mohit

arXiv:2005.00742 (cs)

[Submitted on 2 May 2020]

Abstract: Recent work has questioned the importance of the Transformer's multi-headed attention for achieving high translation quality. We push further in this direction by developing a "hard-coded" attention variant without any learned parameters. Surprisingly, replacing all learned self-attention heads in the encoder and decoder with fixed, input-agnostic Gaussian distributions minimally impacts BLEU scores across four different language pairs. However, additionally hard-coding cross attention (which connects the decoder to the encoder) significantly lowers BLEU, suggesting that it is more important than self-attention. Much of this BLEU drop can be recovered by adding just a single learned cross attention head to an otherwise hard-coded Transformer. Taken as a whole, our results offer insight into which components of the Transformer are actually important, which we hope will guide future work into the development of simpler and more efficient attention-based models.
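
The abstract describes attention heads whose weights are fixed, input-agnostic Gaussian distributions rather than functions of learned queries and keys. The sketch below illustrates that general idea in NumPy. It is not the authors' implementation: the function names, the `offset` and `std` parameters, and the causal-masking choice are all assumptions made for illustration, since the abstract does not specify how the Gaussians are parameterized.

```python
import numpy as np

def gaussian_attention_weights(seq_len, offset=0, std=1.0, causal=False):
    """Return a (seq_len, seq_len) matrix of fixed attention weights.

    Row i is a discretized Gaussian centered at position i + offset and
    normalized to sum to 1. Nothing here depends on token content.
    `offset` and `std` are illustrative assumptions, not values from the paper.
    """
    positions = np.arange(seq_len)
    centers = positions[:, None] + offset                      # shape (seq_len, 1)
    logits = -((positions[None, :] - centers) ** 2) / (2.0 * std ** 2)
    if causal:
        # Assumed decoder-style mask: a position may not attend to later ones.
        future = positions[None, :] > positions[:, None]
        logits = np.where(future, -np.inf, logits)
    logits = logits - logits.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)

def hard_coded_head(values, offset=0, std=1.0, causal=False):
    """Apply one fixed Gaussian head to `values` of shape (seq_len, d)."""
    weights = gaussian_attention_weights(values.shape[0], offset, std, causal)
    return weights @ values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 4))          # six token vectors of width four
    left = hard_coded_head(x, offset=-1)   # head focused on the previous token
    right = hard_coded_head(x, offset=+1)  # head focused on the next token
    print(left.shape, right.shape)
```

Because the weight matrix depends only on sequence length and the chosen offset and width, it can be computed once and reused for every input, which is what makes such a head free of learned parameters.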

Comments:	ACL 2020 Camera Ready (12 pages)
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2005.00742 [cs.CL]
	(or arXiv:2005.00742v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2005.00742

Submission history

From: Weiqiu You
[v1] Sat, 2 May 2020 08:16:13 UTC (187 KB)
