Showing 1–2 of 2 results for author: Bergsma, B

Search v0.5.6 released 2020-02-24

arXiv:2309.11922 [pdf, other]

eess.AS cs.SD

Cluster-based pruning techniques for audio data

Authors: Boris Bergsma, Marta Brzezinska, Oleg V. Yazyev, Milos Cernak

Abstract: Deep learning models have become widely adopted in various domains, but their performance heavily relies on a vast amount of data. Datasets often contain a large number of irrelevant or redundant samples, which can lead to computational inefficiencies during the training. In this work, we introduce, for the first time in the context of the audio domain, the k-means clustering as a method for efficient data pruning. K-means clustering provides a way to group similar samples together, allowing the reduction of the size of the dataset while preserving its representative characteristics. As an example, we perform clustering analysis on the keyword spotting (KWS) dataset. We discuss how k-means clustering can significantly reduce the size of audio datasets while maintaining the classification performance across neural networks (NNs) with different architectures. We further comment on the role of scaling analysis in identifying the optimal pruning strategies for a large number of samples. Our studies serve as a proof-of-principle, demonstrating the potential of data selection with distance-based clustering algorithms for the audio domain and highlighting promising research avenues.
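The abstract's idea of pruning a dataset by grouping similar samples can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which samples are kept from each cluster, so the rule used here (keep the fraction of each cluster closest to its centroid) is an assumption, as are the function names `kmeans` and `prune`, the deterministic farthest-first initialization, and the toy feature vectors that stand in for audio embeddings.

```python
import math


def _nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, iters=20):
    """Lloyd's k-means on a list of equal-length float tuples.

    Assumption: centroids are seeded farthest-first from points[0] so the
    sketch is deterministic; real pipelines usually use k-means++ or similar.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [_nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Final assignment against the final centroids.
    labels = [_nearest(p, centroids) for p in points]
    return centroids, labels


def prune(points, k, keep_fraction):
    """Return sorted indices of the samples kept after pruning.

    Assumed selection rule (hypothetical, not from the abstract): within each
    cluster, keep the ceil(keep_fraction * size) samples nearest the centroid.
    """
    centroids, labels = kmeans(points, k)
    kept = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: math.dist(points[i], centroids[c]))
        kept.extend(members[: math.ceil(keep_fraction * len(members))])
    return sorted(kept)


if __name__ == "__main__":
    # Toy 2-D "embeddings": two well-separated groups of three samples each.
    samples = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
               (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
    print(prune(samples, k=2, keep_fraction=0.3))  # → [1, 4]
```

In this toy run each group of three collapses to its central sample, so the pruned set keeps one representative per cluster. Other selection rules (for example keeping the samples farthest from the centroid, or sampling at random within clusters) fit the same skeleton by changing the sort key in `prune`.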

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2110.03715 [pdf, other]

eess.AS cs.SD eess.SP

PEAF: Learnable Power Efficient Analog Acoustic Features for Audio Recognition

Authors: Boris Bergsma, Minhao Yang, Milos Cernak

Abstract: At the end of Moore's law, new computing paradigms are required to prolong the battery life of wearable and IoT smart audio devices. Theoretical analysis and physical validation have shown that analog signal processing (ASP) can be more power-efficient than its digital counterpart in the realm of low-to-medium signal-to-noise ratio applications. In addition, ASP allows a direct interface with an analog microphone without a power-hungry analog-to-digital converter. Here, we present power-efficient analog acoustic features (PEAF) that are validated by fabricated CMOS chips for running audio recognition. Linear, non-linear, and learnable PEAF variants are evaluated on two speech processing tasks that are demanded in many battery-operated devices: wake word detection (WWD) and keyword spotting (KWS). Compared to digital acoustic features, higher power efficiency with competitive classification accuracy can be obtained. A novel theoretical framework based on information theory is established to analyze the information flow in each individual stage of the feature extraction pipeline. The analysis identifies the information bottleneck and helps improve the KWS accuracy by up to 7%. This work may pave the way to building more power-efficient smart audio devices with best-in-class inference performance.

Submitted 29 March, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: Submitted to Interspeech 2022
