Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance

Luong, Hieu-Thi; Yamagishi, Junichi

arXiv:2106.13479 (cs)

[Submitted on 25 Jun 2021]

Abstract: Generally speaking, the main objective when training a neural speech synthesis system is to synthesize natural and expressive speech from the output layer of the neural network, with little attention given to the hidden layers. However, by learning a useful latent representation, the system can be used in many more practical scenarios. In this paper, we investigate the use of quantized vectors to model the latent linguistic embedding and compare it with its continuous counterpart. By enforcing different policies over the latent spaces during training, we are able to obtain a latent linguistic embedding that takes on different properties while having similar performance in terms of quality and speaker similarity. Our experiments show that the voice cloning system built with vector quantization has only a small degradation in perceptual evaluations, but has a discrete latent space that is useful for reducing the representation bit-rate, which is desirable for data transfer, or for limiting information leakage, which is important for speaker anonymization and similar tasks.
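As a rough illustration of why a quantized latent space lowers the representation bit-rate, the sketch below maps each continuous latent vector to its nearest codebook entry and compares the bits needed per frame. It is a generic nearest-neighbor vector quantizer, not the authors' model: the function names, the codebook size, the latent dimension, and the random data are all hypothetical values chosen for illustration.

```python
import math
import random


def quantize(frames, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    indices = []
    for frame in frames:
        # Squared Euclidean distance from this frame to every codebook vector.
        distances = [
            sum((x - c) ** 2 for x, c in zip(frame, code))
            for code in codebook
        ]
        indices.append(distances.index(min(distances)))
    return indices


def bits_per_frame(codebook_size):
    """Bits needed to transmit one codebook index."""
    return math.ceil(math.log2(codebook_size))


# Hypothetical sizes, assumed for this example only (not taken from the paper).
rng = random.Random(0)
DIM, CODEBOOK_SIZE, NUM_FRAMES = 4, 8, 5
codebook = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(CODEBOOK_SIZE)]
frames = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_FRAMES)]

indices = quantize(frames, codebook)          # discrete latent: one integer per frame
quantized = [codebook[i] for i in indices]    # vectors a decoder would consume

discrete_bits = bits_per_frame(CODEBOOK_SIZE)  # bits to send one index
continuous_bits = DIM * 32                     # bits to send one float32 vector
```

Under these assumed sizes, each frame is sent as a single small index instead of a full floating-point vector, provided the receiver holds the same codebook. In a trained system the codebook would be learned jointly with the encoder rather than sampled at random.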

Comments:	to be presented at SSW11
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2106.13479 [cs.SD]
	(or arXiv:2106.13479v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2106.13479

Submission history

From: Hieu-Thi Luong
[v1] Fri, 25 Jun 2021 07:51:35 UTC (472 KB)
