doi 10.21437/Interspeech.2022-524

Extending GCC-PHAT using Shift Equivariant Neural Networks

Authors: Axel Berg, Mark O'Connor, Kalle Åström, Magnus Oskarsson

Abstract: Speaker localization using microphone arrays depends on accurate time delay estimation techniques. For decades, methods based on the generalized cross correlation with phase transform (GCC-PHAT) have been widely adopted for this purpose. Recently, the GCC-PHAT has also been used to provide input features to neural networks in order to remove the effects of noise and reverberation, but at the cost of losing theoretical guarantees in noise-free conditions. We propose a novel approach to extending the GCC-PHAT, where the received signals are filtered using a shift equivariant neural network that preserves the timing information contained in the signals. By extensive experiments we show that our model consistently reduces the error of the GCC-PHAT in adverse environments, with guarantees of exact time delay recovery in ideal conditions.

Submitted 9 August, 2022; originally announced August 2022.

Comments: Proceedings of INTERSPEECH

Journal ref: Proc. Interspeech 2022, 1791-1795
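For reference, the classical GCC-PHAT baseline that this paper extends can be sketched in a few lines of NumPy. This is the textbook estimator, not the authors' shift-equivariant neural network: the cross-power spectrum Y(ω)X*(ω) is divided by its own magnitude (the phase transform), inverse-transformed, and the location of the peak is taken as the delay. The function name `gcc_phat` and its parameters are illustrative assumptions.

```python
import numpy as np


def gcc_phat(x, y, max_shift=None, eps=1e-12):
    """Estimate the delay (in samples) of y relative to x with GCC-PHAT."""
    n = len(x) + len(y)                 # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)              # cross-power spectrum
    cross /= np.abs(cross) + eps        # phase transform: keep only the phase
    cc = np.fft.irfft(cross, n=n)       # generalized cross-correlation
    if max_shift is None:
        max_shift = n // 2
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag, cc


# Ideal, noise-free case: y is an exact 5-sample delay of x.
rng = np.random.default_rng(0)
sig = rng.standard_normal(256)
x = np.concatenate((sig, np.zeros(10)))
y = np.concatenate((np.zeros(5), sig, np.zeros(5)))
lag, _ = gcc_phat(x, y)
print(lag)  # 5
```

In this noise-free setting the whitened cross-spectrum is a pure phase ramp, so its inverse transform is a single spike at the true lag; this is the exact-recovery property the abstract says the proposed model preserves.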

arXiv:2104.00769

doi 10.21437/Interspeech.2021-1286

Keyword Transformer: A Self-Attention Model for Keyword Spotting

Authors: Axel Berg, Mark O'Connor, Miguel Tairum Cruz

Abstract: The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12 and 35-command tasks respectively.

Submitted 15 June, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

Comments: Proceedings of INTERSPEECH

Journal ref: Proc. Interspeech 2021, 4249-4253
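The abstract does not spell out the architecture. As an illustration only, the core operation a fully self-attentional model such as KWT is built from, single-head scaled dot-product self-attention, is sketched below over a sequence of spectrogram frames with a prepended ViT-style class token (an assumption here). The frame count, feature size, embedding width and random weights are hypothetical; positional embeddings, multiple heads, layer normalization, MLP blocks and training are all omitted.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (T, d) sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (T, T) pairwise similarities
    weights = softmax(scores, axis=-1)       # each row is a distribution over tokens
    return weights @ v, weights


rng = np.random.default_rng(0)
T, F, d = 98, 40, 64  # hypothetical: 98 frames x 40 features, width 64
spectrogram = rng.standard_normal((T, F))

# Hypothetical linear embedding of each time frame, plus a class token (zeros here).
w_embed = rng.standard_normal((F, d)) / np.sqrt(F)
cls_token = np.zeros((1, d))
tokens = np.vstack([cls_token, spectrogram @ w_embed])  # (T + 1, d)

w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, attn = self_attention(tokens, w_q, w_k, w_v)
cls_out = out[0]  # summary vector that a classifier head would map to keyword logits
```

Because every token attends to every other token, no convolutional or recurrent front end is needed to mix information across time, which is the design point the abstract emphasizes.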

arXiv:1912.08308

Distributed Network Privacy using Error Correcting Codes

Authors: Matt O'Connor, W. Bastiaan Kleijn

Abstract: Most current distributed processing research deals with improving the flexibility and convergence speed of algorithms for networks of finite size with no constraints on information sharing and no concept for expected levels of signal privacy. In this work we investigate the concept of data privacy in unbounded public networks, where linear codes are used to create hard limits on the number of nodes contributing to a distributed task. We accomplish this by wrapping local observations in a linear code and intentionally applying symbol errors prior to transmission. If many nodes join the distributed task, a proportional number of symbol errors are introduced into the code, leading to decoding failure if the code's predefined symbol error limit is exceeded.

Submitted 17 December, 2019; originally announced December 2019.
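The abstract leaves the choice of code unspecified. To make the mechanism concrete, the sketch below swaps in the simplest linear code, a (7, 1) repetition code over GF(257), with a bounded-distance decoder that tolerates at most three symbol errors. The field size, error-placement rule and all function names are hypothetical illustrations, not the construction used in the paper. Because the code is linear, summing the nodes' codewords yields the codeword of the total; each node adds one deliberate symbol error, so once more than three nodes contribute, decoding fails.

```python
import random
from collections import Counter

P = 257           # hypothetical field size: symbols live in GF(257)
N = 7             # code length of a (7, 1) repetition code
T = (N - 1) // 2  # symbol-error limit the decoder tolerates (3 here)


def encode(value):
    """Repetition code over GF(P); linear, so a sum of codewords is a codeword."""
    return [value % P] * N


def perturb(codeword, node_id, rng):
    """Deliberately corrupt one symbol before transmission.

    Hypothetical rule: node i corrupts position i mod N with a random nonzero offset.
    """
    noisy = list(codeword)
    pos = node_id % N
    noisy[pos] = (noisy[pos] + rng.randrange(1, P)) % P
    return noisy


def aggregate(codewords):
    """Symbol-wise sum mod P: the codeword of the total plus every node's error."""
    return [sum(column) % P for column in zip(*codewords)]


def decode(received):
    """Bounded-distance decoding: return the value only if at most T symbols disagree."""
    value, count = Counter(received).most_common(1)[0]
    return value if N - count <= T else None


def distributed_sum(values, seed=0):
    """Each node encodes and perturbs its value; the network sums and decodes."""
    rng = random.Random(seed)
    sent = [perturb(encode(v), i, rng) for i, v in enumerate(values)]
    return decode(aggregate(sent))


print(distributed_sum([10, 20, 30]))      # 3 errors <= T: recovers 60
print(distributed_sum([10, 20, 30, 40]))  # 4 errors > T: None unless all four offsets coincide
```

With more than N nodes this toy rule reuses positions, and offsets landing on the same position can cancel mod P, so a practical scheme would need a more careful error model than this sketch.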

Showing 1–3 of 3 results for author: O'Connor, M