Skip to main content

Showing 1–2 of 2 results for author: Wan, V

Searching in archive eess. Search in all archives.
.
  1. arXiv:2208.13183  [pdf, other

    cs.SD eess.AS

    Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks

    Authors: Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu, Rob Clark

    Abstract: Transfer tasks in text-to-speech (TTS) synthesis - where one or more aspects of the speech of one set of speakers is transferred to another set of speakers that do not feature these aspects originally - remains a challenging task. One of the challenges is that models that have high-quality transfer capabilities can have issues in stability, making them impractical for user-facing critical tasks. T… ▽ More

    Submitted 28 August, 2022; originally announced August 2022.

    Comments: To be published in Interspeech 2022

  2. arXiv:1905.07195  [pdf, other

    cs.CL cs.SD eess.AS

    CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network

    Authors: Vincent Wan, Chun-an Chan, Tom Kenter, Jakub Vit, Rob Clark

    Abstract: The prosodic aspects of speech signals produced by current text-to-speech systems are typically averaged over training material, and as such lack the variety and liveliness found in natural speech. To avoid monotony and averaged prosody contours, it is desirable to have a way of modeling the variation in the prosodic aspects of speech, so audio signals can be synthesized in multiple ways for a giv… ▽ More

    Submitted 4 June, 2019; v1 submitted 17 May, 2019; originally announced May 2019.