-
Efficient High-Performance Bark-Scale Neural Network for Residual Echo and Noise Suppression
Authors:
Ernst Seidel,
Pejman Mowlaee,
Tim Fingscheidt
Abstract:
In recent years, the introduction of neural networks (NNs) into the field of speech enhancement has brought significant improvements. However, many of the proposed methods are quite demanding in terms of computational complexity and memory footprint. For applications in dedicated communication devices, such as speakerphones, hands-free car systems, or smartphones, efficiency plays a major role along with performance. In this context, we present an efficient, high-performance hybrid joint acoustic echo control and noise suppression system, whose main contribution is the postfilter NN, which performs both noise and residual echo suppression. The preservation of nearend speech is improved by a Bark-scale auditory filterbank for the NN postfilter. The proposed hybrid method is benchmarked against state-of-the-art methods, and its effectiveness is demonstrated on the ICASSP 2023 AEC Challenge blind test set. We demonstrate that it offers high-quality nearend speech preservation during both double-talk and nearend speech conditions. At the same time, it efficiently removes echo leaks, achieving performance comparable to already small state-of-the-art models such as the end-to-end DeepVQE-S while requiring only around 10% of its computational complexity. This makes it easy to implement in real time on a speakerphone device.
Submitted 8 April, 2024;
originally announced April 2024.
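Illustration (not from the paper): the abstract above mentions a Bark-scale auditory filterbank in front of the NN postfilter. As a rough sketch of the general idea only, the snippet below pools a linear-frequency power spectrum into Bark bands. It assumes the Zwicker and Terhardt approximation of the Bark scale and simple non-overlapping rectangular bands; the function names, the 257-bin spectrum, and the 16 kHz sampling rate are hypothetical choices, and the paper's actual filterbank design may differ.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (assumed here)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_band_indices(num_bins, sample_rate):
    """Assign each bin of a one-sided spectrum (0 Hz .. Nyquist) to an integer Bark band."""
    nyquist = sample_rate / 2.0
    return [int(hz_to_bark(k * nyquist / (num_bins - 1))) for k in range(num_bins)]


def pool_to_bark(power_spectrum, sample_rate):
    """Sum the power of all bins that fall into the same Bark band.

    Rectangular, non-overlapping bands are a simplification for illustration.
    """
    bands = bark_band_indices(len(power_spectrum), sample_rate)
    pooled = [0.0] * (bands[-1] + 1)
    for power, band in zip(power_spectrum, bands):
        pooled[band] += power
    return pooled


# Hypothetical example: a flat 257-bin power spectrum at 16 kHz sampling rate.
flat_spectrum = [1.0] * 257
bark_spectrum = pool_to_bark(flat_spectrum, 16000)
print(len(flat_spectrum), "bins ->", len(bark_spectrum), "Bark bands")
```

Pooling like this shrinks the feature dimension a network has to process, which is one plausible reason a Bark-scale representation helps efficiency; the resolution stays fine at low frequencies and becomes coarser at high frequencies.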
-
Efficient Acoustic Echo Suppression with Condition-Aware Training
Authors:
Ernst Seidel,
Pejman Mowlaee,
Tim Fingscheidt
Abstract:
The topic of deep acoustic echo control (DAEC) has seen many approaches with various model topologies in recent years. Convolutional recurrent networks (CRNs), consisting of a convolutional encoder and decoder encompassing a recurrent bottleneck, are repeatedly employed due to their ability to preserve nearend speech even in double-talk (DT) conditions. However, past architectures are either computationally complex or achieve smaller model sizes at the cost of performance. We propose an improved CRN topology which, compared to other realizations of this class of architectures, not only saves parameters and computational complexity, but also shows improved performance in DT, outperforming both baseline architectures, FCRN and CRUSE. Striving for condition-aware training, we also demonstrate the importance of a high proportion of double-talk, and the lack of benefit from nearend-only speech, in DAEC training data. Finally, we show how to control the trade-off between aggressive echo suppression and nearend speech preservation by fine-tuning with condition-aware component loss functions.
Submitted 28 July, 2023;
originally announced July 2023.
-
Bandwidth-Scalable Fully Mask-Based Deep FCRN Acoustic Echo Cancellation and Postfiltering
Authors:
Ernst Seidel,
Rasmus Kongsgaard Olsson,
Karim Haddad,
Zhengyang Li,
Pejman Mowlaee,
Tim Fingscheidt
Abstract:
Although today's speech communication systems support various bandwidths from narrowband to super-wideband and beyond, state-of-the-art DNN methods for acoustic echo cancellation (AEC) lack modularity and bandwidth scalability. Our proposed DNN model builds upon a fully convolutional recurrent network (FCRN) and introduces scalability over various bandwidths up to a fullband (FB) system (48 kHz sampling rate). This modular approach allows joint wideband (WB) pre-training of mask-based AEC and postfilter stages with dedicated losses, followed by separate training of each stage on FB data. A third, lightweight blind bandwidth extension stage is separately trained on FB data, flexibly allowing the WB postfilter output to be extended towards higher bandwidths until reaching FB. Thereby, higher-frequency noise and echo are reliably suppressed. On the ICASSP 2022 Acoustic Echo Cancellation Challenge blind test set, we report competitive performance, showing robustness even under highly delayed echo and dynamic echo path changes.
Submitted 7 November, 2022; v1 submitted 9 May, 2022;
originally announced May 2022.
-
Circular Statistics-based low complexity DOA estimation for hearing aid application
Authors:
Lars D. Mosgaard,
David Pelegrin-Garcia,
Thomas B. Elmedyb,
Michael J. Pihl,
Pejman Mowlaee
Abstract:
The proposed Circular statistics-based Inter-Microphone Phase difference estimation Localizer (CIMPL) method is tailored toward binaural hearing aid systems with microphone arrays in each unit. The method utilizes the circular statistics (circular mean and circular variance) of the inter-microphone phase difference (IPD) across different microphone pairs. These IPDs are first mapped to time delays through a variance-weighted linear fit, then mapped to azimuth direction-of-arrival (DoA), and lastly the information from the different microphone pairs is combined. The variance is carried through the different transformations and acts as a reliability index of the estimated angle. Both the resulting angle and variance are fed into a wrapped Kalman filter, which provides a smoothed estimate of the DoA. The proposed method improves the accuracy of the tracked angle of a single moving source compared with the benchmark method provided by the LOCATA challenge, and it runs approximately 75 times faster.
Submitted 17 December, 2018;
originally announced December 2018.
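Illustration (not from the paper): the CIMPL abstract above relies on the circular mean and circular variance of phase differences. The sketch below shows the standard textbook definitions of these two statistics, computed from the mean resultant vector. The function name and the sample values are hypothetical, and the snippet does not attempt the paper's variance-weighted fit, DoA mapping, or wrapped Kalman filter.

```python
import cmath
import math


def circular_mean_and_variance(angles_rad):
    """Return (circular mean in radians, circular variance in [0, 1]).

    Each angle is mapped to a unit vector exp(j*angle); the mean of those
    vectors is the mean resultant vector. Its phase is the circular mean and
    one minus its length is the circular variance.
    """
    if not angles_rad:
        raise ValueError("angles_rad must be non-empty")
    resultant = sum(cmath.exp(1j * a) for a in angles_rad) / len(angles_rad)
    return cmath.phase(resultant), 1.0 - abs(resultant)


# Hypothetical phase differences clustered around the +/- pi wrap-around point.
ipds = [math.pi - 0.1, -math.pi + 0.1]
mean, variance = circular_mean_and_variance(ipds)
print(f"circular mean = {mean:.3f} rad, circular variance = {variance:.4f}")
```

An ordinary arithmetic mean of these two sample values would be 0 rad, the opposite direction from where the samples actually lie, which is why circular statistics suit wrapped quantities such as phase differences. A low circular variance indicates tightly clustered phases, matching its role as a reliability index in the abstract.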