-
Small-footprint slimmable networks for keyword spotting
Authors:
Zuhaib Akhtar,
Mohammad Omar Khursheed,
Dongsu Du,
Yuzong Liu
Abstract:
In this work, we present Slimmable Neural Networks applied to the problem of small-footprint keyword spotting. We show that slimmable neural networks allow us to create super-nets from Convolutioanl Neural Networks and Transformers, from which sub-networks of different sizes can be extracted. We demonstrate the usefulness of these models on in-house Alexa data and Google Speech Commands, and focus…
▽ More
In this work, we present Slimmable Neural Networks applied to the problem of small-footprint keyword spotting. We show that slimmable neural networks allow us to create super-nets from Convolutioanl Neural Networks and Transformers, from which sub-networks of different sizes can be extracted. We demonstrate the usefulness of these models on in-house Alexa data and Google Speech Commands, and focus our efforts on models for the on-device use case, limiting ourselves to less than 250k parameters. We show that slimmable models can match (and in some cases, outperform) models trained from scratch. Slimmable neural networks are therefore a class of models particularly useful when the same functionality is to be replicated at different memory and compute budgets, with different accuracy requirements.
△ Less
Submitted 21 April, 2023;
originally announced April 2023.
-
Latency Control for Keyword Spotting
Authors:
Christin Jose,
Joseph Wang,
Grant P. Strimel,
Mohammad Omar Khursheed,
Yuriy Mishchenko,
Brian Kulis
Abstract:
Conversational agents commonly utilize keyword spotting (KWS) to initiate voice interaction with the user. For user experience and privacy considerations, existing approaches to KWS largely focus on accuracy, which can often come at the expense of introduced latency. To address this tradeoff, we propose a novel approach to control KWS model latency and which generalizes to any loss function withou…
▽ More
Conversational agents commonly utilize keyword spotting (KWS) to initiate voice interaction with the user. For user experience and privacy considerations, existing approaches to KWS largely focus on accuracy, which can often come at the expense of introduced latency. To address this tradeoff, we propose a novel approach to control KWS model latency and which generalizes to any loss function without explicit knowledge of the keyword endpoint. Through a single, tunable hyperparameter, our approach enables one to balance detection latency and accuracy for the targeted application. Empirically, we show that our approach gives superior performance under latency constraints when compared to existing methods. Namely, we make a substantial 25\% relative false accepts improvement for a fixed latency target when compared to the baseline state-of-the-art. We also show that when our approach is used in conjunction with a max-pooling loss, we are able to improve relative false accepts by 25 % at a fixed latency when compared to cross entropy loss.
△ Less
Submitted 14 June, 2022;
originally announced June 2022.
-
Tiny-CRNN: Streaming Wakeword Detection In A Low Footprint Setting
Authors:
Mohammad Omar Khursheed,
Christin Jose,
Rajath Kumar,
Gengshen Fu,
Brian Kulis,
Santosh Kumar Cheekatmalla
Abstract:
In this work, we propose Tiny-CRNN (Tiny Convolutional Recurrent Neural Network) models applied to the problem of wakeword detection, and augment them with scaled dot product attention. We find that, compared to Convolutional Neural Network models, False Accepts in a 250k parameter budget can be reduced by 25% with a 10% reduction in parameter size by using models based on the Tiny-CRNN architectu…
▽ More
In this work, we propose Tiny-CRNN (Tiny Convolutional Recurrent Neural Network) models applied to the problem of wakeword detection, and augment them with scaled dot product attention. We find that, compared to Convolutional Neural Network models, False Accepts in a 250k parameter budget can be reduced by 25% with a 10% reduction in parameter size by using models based on the Tiny-CRNN architecture, and we can get up to 32% reduction in False Accepts at a 50k parameter budget with 75% reduction in parameter size compared to word-level Dense Neural Network models. We discuss solutions to the challenging problem of performing inference on streaming audio with this architecture, as well as differences in start-end index errors and latency in comparison to CNN, DNN, and DNN-HMM models.
△ Less
Submitted 29 September, 2021;
originally announced September 2021.
-
Small Footprint Convolutional Recurrent Networks for Streaming Wakeword Detection
Authors:
Mohammad Omar Khursheed,
Christin Jose,
Rajath Kumar,
Gengshen Fu,
Brian Kulis,
Santosh Kumar Cheekatmalla
Abstract:
In this work, we propose small footprint Convolutional Recurrent Neural Network models applied to the problem of wakeword detection and augment them with scaled dot product attention. We find that false accepts compared to Convolutional Neural Network models in a 250k parameter budget can be reduced by 25% with a 10% reduction in parameter size by using CRNNs, and we can get up to 32% improvement…
▽ More
In this work, we propose small footprint Convolutional Recurrent Neural Network models applied to the problem of wakeword detection and augment them with scaled dot product attention. We find that false accepts compared to Convolutional Neural Network models in a 250k parameter budget can be reduced by 25% with a 10% reduction in parameter size by using CRNNs, and we can get up to 32% improvement at a 50k parameter budget with 75% reduction in parameter size compared to word-level Dense Neural Network models. We discuss solutions to the challenging problem of performing inference on streaming audio with CRNNs, as well as differences in start-end index errors and latency in comparison to CNN, DNN, and DNN-HMM models.
△ Less
Submitted 25 November, 2020;
originally announced November 2020.