A weighted-variance variational autoencoder model for speech enhancement

Golmakani, Ali; Sadeghi, Mostafa; Alameda-Pineda, Xavier; Serizel, Romain

Computer Science > Sound

arXiv:2211.00990 (cs)

[Submitted on 2 Nov 2022 (v1), last revised 26 Oct 2023 (this version, v2)]

Title: A weighted-variance variational autoencoder model for speech enhancement

Authors: Ali Golmakani (MULTISPEECH), Mostafa Sadeghi (MULTISPEECH), Xavier Alameda-Pineda (ROBOTLEARN), Romain Serizel (MULTISPEECH)

Abstract: We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. In contrast to this commonly used approach, we propose a weighted-variance generative model, where the contribution of each spectrogram time-frame in parameter learning is weighted. We impose a Gamma prior distribution on the weights, which effectively leads to a Student's t-distribution instead of a Gaussian for speech generative modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram auto-encoding and speech enhancement demonstrate the effectiveness and robustness of the proposed approach compared to the standard unweighted-variance model.
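
The link between a Gamma prior on the weights and a Student's t-distribution can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes a single TF coefficient s with s | w ~ N_c(0, var / w) and w ~ Gamma(shape, rate), which may differ from the authors' exact parameterization. All function names are hypothetical. Under these assumptions, integrating w out gives the closed form p(s) = shape / (pi * var * rate) * (1 + |s|^2 / (rate * var))^-(shape + 1), a heavy-tailed complex Student's t density.

```python
import math


def complex_gaussian_pdf(s_abs2, var):
    """Zero-mean circular complex Gaussian density, evaluated at |s|^2 = s_abs2."""
    return math.exp(-s_abs2 / var) / (math.pi * var)


def gamma_pdf(w, shape, rate):
    """Gamma density in the shape/rate parameterization (assumed here)."""
    log_p = (shape * math.log(rate) - math.lgamma(shape)
             + (shape - 1.0) * math.log(w) - rate * w)
    return math.exp(log_p)


def student_t_pdf(s_abs2, var, shape, rate):
    """Closed-form marginal obtained by integrating the weight out analytically."""
    return (shape / (math.pi * var * rate)
            * (1.0 + s_abs2 / (rate * var)) ** (-(shape + 1.0)))


def marginal_by_quadrature(s_abs2, var, shape, rate, w_max=60.0, n=20_000):
    """Midpoint-rule integral over w of N_c(s; 0, var / w) * Gamma(w; shape, rate)."""
    dw = w_max / n
    total = 0.0
    for i in range(n):
        w = (i + 0.5) * dw  # midpoint of the i-th cell, always > 0
        total += complex_gaussian_pdf(s_abs2, var / w) * gamma_pdf(w, shape, rate)
    return total * dw


if __name__ == "__main__":
    numeric = marginal_by_quadrature(0.7, 1.5, 2.0, 2.0)
    closed = student_t_pdf(0.7, 1.5, 2.0, 2.0)
    print("numeric marginal:", numeric)
    print("closed-form t   :", closed)
    # Far from the origin, the t density decays much more slowly than the Gaussian.
    print("tail ratio t / Gaussian at |s|^2 = 25:",
          student_t_pdf(25.0, 1.0, 2.0, 2.0) / complex_gaussian_pdf(25.0, 1.0))
```

The heavier tail is what makes such a model less sensitive to outlying frames than the plain Gaussian, which is consistent with the robustness claim in the abstract. The per-frame weighting described there would share one weight across all frequency bins of a frame, whereas this sketch treats a single coefficient for simplicity.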

Subjects:	Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2211.00990 [cs.SD]
	(or arXiv:2211.00990v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2211.00990

Submission history

From: Mostafa SADEGHI [via CCSD proxy]
[v1] Wed, 2 Nov 2022 09:51:15 UTC (21 KB)
[v2] Thu, 26 Oct 2023 11:47:25 UTC (35 KB)
