Showing 1–1 of 1 results for author: Cruz, M T

Search v0.5.6 released 2020-02-24

arXiv:2104.00769

eess.AS cs.CL cs.LG cs.SD

doi 10.21437/Interspeech.2021-1286

Keyword Transformer: A Self-Attention Model for Keyword Spotting

Authors: Axel Berg, Mark O'Connor, Miguel Tairum Cruz

Abstract: The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12 and 35-command tasks respectively.

Submitted 15 June, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

Comments: Proceedings of INTERSPEECH

Journal ref: Proc. Interspeech 2021, 4249-4253
