Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation

Xu, Yong; Zhang, Zhuohuang; Yu, Meng; Zhang, Shi-Xiong; Yu, Dong

arXiv:2101.01280 (cs)

[Submitted on 4 Jan 2021 (v1), last revised 3 Apr 2021 (this version, v5)]

Abstract: Although conventional mask-based minimum variance distortionless response (MVDR) beamforming can reduce non-linear distortion, the residual noise level of MVDR-separated speech remains high. In this paper, we propose a spatio-temporal recurrent neural network based beamformer (RNN-BF) for target speech separation. This beamforming framework learns the beamforming weights directly from the estimated speech and noise spatial covariance matrices. Leveraging the temporal modeling capability of RNNs, the RNN-BF automatically accumulates the statistics of the speech and noise covariance matrices to learn frame-level beamforming weights recursively. We propose an RNN-based generalized eigenvalue (RNN-GEV) beamformer and a more generalized RNN beamformer (GRNN-BF). We further improve the RNN-GEV and the GRNN-BF by replacing the commonly used mask normalization of the covariance matrices with layer normalization. The proposed GRNN-BF outperforms prior art in terms of speech quality (PESQ), speech-to-noise ratio (SNR), and word error rate (WER).
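
For context, the conventional mask-based MVDR baseline that the abstract contrasts against can be sketched roughly as below. This is not the paper's RNN-BF: it is the widely used closed-form, reference-channel MVDR solution, computed from mask-weighted speech and noise spatial covariance matrices. The function names, array shapes, reference-microphone choice, and diagonal loading value are all assumptions made for illustration. According to the abstract, the RNN-BF replaces the closed-form weight computation with a recurrent network that learns frame-level weights from the same kind of covariance matrices.

```python
import numpy as np

def masked_covariance(Y, mask, eps=1e-8):
    """Mask-normalized spatial covariance matrix per frequency bin.

    Y:    complex STFT, shape (F, T, M) = (freq bins, frames, mics)
    mask: real-valued time-frequency mask, shape (F, T)
    Returns an array of shape (F, M, M).
    """
    weighted = Y * mask[..., None]                        # (F, T, M)
    # phi[f] = sum_t mask(t, f) * y(t, f) y(t, f)^H
    phi = np.einsum("ftm,ftn->fmn", weighted, Y.conj())
    # Mask normalization: divide by the summed mask per frequency.
    return phi / (mask.sum(axis=1)[:, None, None] + eps)

def mvdr_weights(phi_ss, phi_nn, ref_mic=0, diag_load=1e-6):
    """w(f) = (Phi_nn^{-1} Phi_ss) u / trace(Phi_nn^{-1} Phi_ss).

    u is a one-hot vector selecting the (assumed) reference microphone.
    """
    M = phi_nn.shape[-1]
    # Diagonal loading (assumed value) keeps the solve well conditioned.
    phi_nn = phi_nn + diag_load * np.eye(M)
    numerator = np.linalg.solve(phi_nn, phi_ss)           # (F, M, M)
    trace = np.trace(numerator, axis1=-2, axis2=-1)       # (F,)
    return numerator[..., ref_mic] / trace[:, None]       # (F, M)

def apply_beamformer(w, Y):
    """s_hat(t, f) = w(f)^H y(t, f); returns shape (F, T)."""
    return np.einsum("fm,ftm->ft", w.conj(), Y)
```

In this sketch the weights are constant over time for each frequency bin, whereas the abstract describes the RNN-BF as producing frame-level weights.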

Comments:	Submitted to Interspeech 2021, Demo: this https URL
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2101.01280 [cs.SD]
	(or arXiv:2101.01280v5 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2101.01280

Submission history

From: Yong Xu Dr
[v1] Mon, 4 Jan 2021 23:31:41 UTC (888 KB)
[v2] Fri, 26 Mar 2021 08:57:40 UTC (940 KB)
[v3] Mon, 29 Mar 2021 20:01:55 UTC (969 KB)
[v4] Wed, 31 Mar 2021 19:06:12 UTC (969 KB)
[v5] Sat, 3 Apr 2021 08:16:54 UTC (969 KB)
