Showing 1–2 of 2 results for author: Masztalski, P

Search v0.5.6 released 2020-02-24

arXiv:2008.07244

eess.AS cs.LG cs.SD stat.ML

Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks

Authors: Michał Romaniuk, Piotr Masztalski, Karol Piaskowski, Mateusz Matuszewski

Abstract: We propose Mobile Audio Streaming Networks (MASnet) for efficient low-latency speech enhancement, which is particularly suitable for mobile devices and other applications where computational capacity is a limitation. MASnet processes linear-scale spectrograms, transforming successive noisy frames into complex-valued ratio masks which are then applied to the respective noisy frames. MASnet can operate in a low-latency incremental inference mode which matches the complexity of layer-by-layer batch mode. Compared to a similar fully-convolutional architecture, MASnet incorporates depthwise and pointwise convolutions for a large reduction in fused multiply-accumulate operations per second (FMA/s), at the cost of some reduction in SNR.

Submitted 17 August, 2020; originally announced August 2020.

Comments: Accepted for INTERSPEECH 2020

arXiv:2008.07231

eess.AS cs.LG cs.SD

StoRIR: Stochastic Room Impulse Response Generation for Audio Data Augmentation

Authors: Piotr Masztalski, Mateusz Matuszewski, Karol Piaskowski, Michał Romaniuk

Abstract: In this paper we introduce StoRIR, a stochastic room impulse response generation method dedicated to audio data augmentation in machine learning applications. This technique, in contrast to geometrical methods like image-source or ray tracing, does not require prior definition of room geometry, absorption coefficients or microphone and source placement and is dependent solely on the acoustic parameters of the room. The method is intuitive, easy to implement and allows generating RIRs of very complicated enclosures. We show that StoRIR, when used for audio data augmentation in a speech enhancement task, allows deep learning models to achieve better results on a wide range of metrics than when using the conventional image-source method, effectively improving many of them by more than 5%. We publish a Python implementation of StoRIR online.

Submitted 17 August, 2020; originally announced August 2020.

Comments: Accepted for INTERSPEECH 2020
