-
Extending GCC-PHAT using Shift Equivariant Neural Networks
Authors:
Axel Berg,
Mark O'Connor,
Kalle Åström,
Magnus Oskarsson
Abstract:
Speaker localization using microphone arrays depends on accurate time delay estimation techniques. For decades, methods based on the generalized cross correlation with phase transform (GCC-PHAT) have been widely adopted for this purpose. Recently, the GCC-PHAT has also been used to provide input features to neural networks in order to remove the effects of noise and reverberation, but at the cost…
▽ More
Speaker localization using microphone arrays depends on accurate time delay estimation techniques. For decades, methods based on the generalized cross correlation with phase transform (GCC-PHAT) have been widely adopted for this purpose. Recently, the GCC-PHAT has also been used to provide input features to neural networks in order to remove the effects of noise and reverberation, but at the cost of losing theoretical guarantees in noise-free conditions. We propose a novel approach to extending the GCC-PHAT, where the received signals are filtered using a shift equivariant neural network that preserves the timing information contained in the signals. By extensive experiments we show that our model consistently reduces the error of the GCC-PHAT in adverse environments, with guarantees of exact time delay recovery in ideal conditions.
△ Less
Submitted 9 August, 2022;
originally announced August 2022.
-
Keyword Transformer: A Self-Attention Model for Keyword Spotting
Authors:
Axel Berg,
Mark O'Connor,
Miguel Tairum Cruz
Abstract:
The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully se…
▽ More
The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12 and 35-command tasks respectively.
△ Less
Submitted 15 June, 2021; v1 submitted 1 April, 2021;
originally announced April 2021.
-
Distributed Network Privacy using Error Correcting Codes
Authors:
Matt O'Connor,
W. Bastiaan Kleijn
Abstract:
Most current distributed processing research deals with improving the flexibility and convergence speed of algorithms for networks of finite size with no constraints on information sharing and no concept for expected levels of signal privacy. In this work we investigate the concept of data privacy in unbounded public networks, where linear codes are used to create hard limits on the number of node…
▽ More
Most current distributed processing research deals with improving the flexibility and convergence speed of algorithms for networks of finite size with no constraints on information sharing and no concept for expected levels of signal privacy. In this work we investigate the concept of data privacy in unbounded public networks, where linear codes are used to create hard limits on the number of nodes contributing to a distributed task. We accomplish this by wrap** local observations in a linear code and intentionally applying symbol errors prior to transmission. If many nodes join the distributed task, a proportional number of symbol errors are introduced into the code leading to decoding failure if the code's predefined symbol error limit is exceeded.
△ Less
Submitted 17 December, 2019;
originally announced December 2019.