Boosting keyword spotting through on-device learnable user speech characteristics

Cioflan, Cristian; Cavigelli, Lukas; Benini, Luca

arXiv:2403.07802 (cs)

[Submitted on 12 Mar 2024]

Abstract: Keyword spotting systems for always-on, TinyML-constrained applications require on-site tuning to boost the accuracy of offline-trained classifiers when deployed in unseen inference conditions. Adapting to the speech peculiarities of target users requires many in-domain samples, which are often unavailable in real-world scenarios. Furthermore, current on-device learning techniques rely on computationally intensive and memory-hungry backbone update schemes, unfit for always-on, battery-powered devices. In this work, we propose a novel on-device learning architecture composed of a pretrained backbone and a user-aware embedding that learns the user's speech characteristics. The resulting features are fused and used to classify the input utterance. For domain shifts generated by unseen speakers, we measure relative error rate reductions of up to 19%, from 30.1% to 24.3%, on the 35-class problem of the Google Speech Commands dataset, through the inexpensive update of the user projections. We moreover demonstrate the few-shot learning capabilities of our proposed architecture in sample- and class-scarce learning conditions. With 23.7 kparameters and 1 MFLOP per epoch required for on-device training, our system is feasible for TinyML applications aimed at battery-powered microcontrollers.
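
The idea of keeping the backbone frozen and updating only a small user-specific vector can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the fusion operator, the layer sizes, or the optimizer, so the concatenation-based fusion, the random stand-in weights, all dimensions except the 35 classes, and the names `forward`, `adapt_user_embedding`, and `cross_entropy` are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 35   # 35-class Google Speech Commands problem (from the abstract)
IN_DIM = 40      # hypothetical input feature size
FEAT_DIM = 16    # hypothetical backbone output size
EMB_DIM = 4      # hypothetical user-embedding size

# Frozen stand-ins for the pretrained backbone and classifier (random here).
W_backbone = rng.normal(scale=0.1, size=(IN_DIM, FEAT_DIM))
W_cls = rng.normal(scale=0.1, size=(FEAT_DIM + EMB_DIM, N_CLASSES))


def forward(x, user_emb):
    """Return class probabilities for a batch x given one user embedding."""
    feats = np.tanh(x @ W_backbone)                        # frozen backbone
    emb = np.broadcast_to(user_emb, (x.shape[0], EMB_DIM))
    fused = np.concatenate([feats, emb], axis=1)           # assumed fusion
    logits = fused @ W_cls
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def cross_entropy(x, y, user_emb):
    """Mean cross-entropy of the labels y under the model."""
    probs = forward(x, user_emb)
    return float(-np.log(probs[np.arange(len(y)), y]).mean())


def adapt_user_embedding(x, y, user_emb, lr=0.5, epochs=20):
    """Gradient descent on the user embedding only; all weights stay fixed."""
    emb = user_emb.copy()
    for _ in range(epochs):
        grad_logits = forward(x, emb)
        grad_logits[np.arange(len(y)), y] -= 1.0           # softmax CE gradient
        grad_logits /= len(y)
        # The embedding feeds the last EMB_DIM rows of the classifier matrix.
        grad_emb = (grad_logits @ W_cls[FEAT_DIM:].T).sum(axis=0)
        emb -= lr * grad_emb
    return emb


# Few-shot adaptation on eight synthetic "utterances" from one user.
x_user = rng.normal(size=(8, IN_DIM))
y_user = rng.integers(0, N_CLASSES, size=8)
emb_before = np.zeros(EMB_DIM)
emb_after = adapt_user_embedding(x_user, y_user, emb_before)
```

Because only `EMB_DIM` values receive gradients, the per-epoch update cost in this sketch scales with the embedding size rather than with the backbone, which is the property the abstract highlights for battery-powered devices.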

Comments:	5 pages, 3 tables, 2 figures. Accepted as a full paper by the tinyML Research Symposium 2024
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2403.07802 [cs.SD]
	(or arXiv:2403.07802v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2403.07802

Submission history

From: Cristian Cioflan
[v1] Tue, 12 Mar 2024 16:41:31 UTC (206 KB)
