End-to-End Multi-Task Denoising for joint SDR and PESQ Optimization

Kim, Jaeyoung; El-Khamy, Mostafa; Lee, Jungwon

arXiv:1901.09146 (cs)

[Submitted on 26 Jan 2019 (v1), last revised 8 Mar 2023 (this version, v4)]

Abstract: Supervised learning based on deep neural networks has recently achieved substantial improvements in speech enhancement. Denoising networks learn a mapping from noisy speech either directly to clean speech or to a spectrum mask, which is the ratio between the clean and noisy spectra. In either case, the network is optimized by minimizing the mean square error (MSE) between the ground-truth labels and the time-domain or spectrum output. However, existing schemes suffer from either of two critical issues: spectrum mismatch and metric mismatch. Spectrum mismatch is the well-known issue that a spectrum modified after the short-time Fourier transform (STFT) cannot, in general, be fully recovered after the inverse short-time Fourier transform (ISTFT). Metric mismatch means that the conventional MSE metric is sub-optimal for maximizing the target metrics, signal-to-distortion ratio (SDR) and perceptual evaluation of speech quality (PESQ). This paper presents a new end-to-end denoising framework with the goal of joint SDR and PESQ optimization. First, network optimization is performed on the time-domain signals after ISTFT to avoid spectrum mismatch. Second, two loss functions with improved correlations with the SDR and PESQ metrics are proposed to minimize metric mismatch. The experimental results show that the proposed denoising scheme significantly improves both SDR and PESQ performance over existing methods.
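
The metric mismatch described in the abstract can be illustrated with a minimal sketch. It assumes the simplified definition SDR = 10 log10(||s||^2 / ||s - s_hat||^2), where s is the clean signal and s_hat the estimate; this is not necessarily the loss function proposed in the paper, and the signals and function names below are hypothetical.

```python
import math


def mse(clean, estimate):
    """Mean square error between two equal-length sample sequences."""
    return sum((c - e) ** 2 for c, e in zip(clean, estimate)) / len(clean)


def sdr_db(clean, estimate):
    """Simplified SDR in dB: clean-signal energy over residual energy.

    Assumption: the whole residual (clean - estimate) counts as distortion.
    """
    signal_energy = sum(c * c for c in clean)
    error_energy = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)


# Two hypothetical "utterances": the same waveform at two loudness levels.
n = 1000
loud = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
quiet = [0.1 * x for x in loud]

# The same additive estimation error is applied to both.
error = [0.05 * math.cos(2 * math.pi * 50 * t / n) for t in range(n)]
loud_est = [c + e for c, e in zip(loud, error)]
quiet_est = [c + e for c, e in zip(quiet, error)]

print("MSE loud :", mse(loud, loud_est))
print("MSE quiet:", mse(quiet, quiet_est))
print("SDR loud  (dB):", sdr_db(loud, loud_est))
print("SDR quiet (dB):", sdr_db(quiet, quiet_est))
```

Both estimates have essentially the same MSE because the error is identical, yet their SDR values differ by 20 dB because SDR normalizes the error by the clean-signal energy. Under this simplified definition, minimizing MSE alone therefore does not rank estimates the way SDR does.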

Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
Cite as:	arXiv:1901.09146 [cs.SD]
	(or arXiv:1901.09146v4 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.1901.09146

Submission history

From: Jaeyoung Kim
[v1] Sat, 26 Jan 2019 02:48:08 UTC (5,601 KB)
[v2] Wed, 30 Jan 2019 19:38:57 UTC (7,553 KB)
[v3] Sun, 5 Mar 2023 06:04:37 UTC (3,497 KB)
[v4] Wed, 8 Mar 2023 23:46:09 UTC (3,497 KB)
