

arXiv:2108.07737

cs.CL cs.SD eess.AS

Combining speakers of multiple languages to improve quality of neural voices

Authors: Javier Latorre, Charlotte Bailleul, Tuuli Morrill, Alistair Conkie, Yannis Stylianou

Abstract: In this work, we explore multiple architectures and training procedures for developing a multi-speaker and multi-lingual neural TTS system with the goals of a) improving the quality when the available data in the target language is limited and b) enabling cross-lingual synthesis. We report results from a large experiment using 30 speakers in 8 different languages across 15 different locales. The system is trained on the same amount of data per speaker. Compared to a single-speaker model, when the suggested system is fine tuned to a speaker, it produces significantly better quality in most of the cases while it only uses less than $40\%$ of the speaker's data used to build the single-speaker model. In cross-lingual synthesis, on average, the generated quality is within $80\%$ of native single-speaker models, in terms of Mean Opinion Score.

Submitted 17 August, 2021; originally announced August 2021.

Comments: 6 pages. 3 figures. Accepted to 11th Speech Synthesis Workshop, SSW11 (https://ssw11.hte.hu/en/)
