Showing 1–4 of 4 results for author: Foglianti, L

  1. arXiv:2106.08352

    eess.AS cs.LG cs.SD

    Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis

    Authors: Devang S Ram Mohan, Vivian Hu, Tian Huey Teh, Alexandra Torresquintero, Christopher G. R. Wallis, Marlene Staib, Lorenzo Foglianti, Jiameng Gao, Simon King

    Abstract: Text does not fully specify the spoken form, so text-to-speech models must be able to learn from speech data that vary in ways not explained by the corresponding text. One way to reduce the amount of unexplained variation in training data is to provide acoustic information as an additional learning signal. When generating speech, modifying this acoustic information enables multiple distinct rendit…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: To be published in Interspeech 2021. 5 pages, 4 figures

  2. arXiv:2106.08321

    eess.AS

    ADEPT: A Dataset for Evaluating Prosody Transfer

    Authors: Alexandra Torresquintero, Tian Huey Teh, Christopher G. R. Wallis, Marlene Staib, Devang S Ram Mohan, Vivian Hu, Lorenzo Foglianti, Jiameng Gao, Simon King

    Abstract: Text-to-speech is now able to achieve near-human naturalness and research focus has shifted to increasing expressivity. One popular method is to transfer the prosody from a reference speech sample. There have been considerable advances in using prosody transfer to generate more expressive speech, but the field lacks a clear definition of what successful prosody transfer means and a method for meas…

    Submitted 21 July, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: 5 pages, 1 figure, accepted to Interspeech 2021

  3. Phonological Features for 0-shot Multilingual Speech Synthesis

    Authors: Marlene Staib, Tian Huey Teh, Alexandra Torresquintero, Devang S Ram Mohan, Lorenzo Foglianti, Raphael Lenain, Jiameng Gao

    Abstract: Code-switching---the intra-utterance use of multiple languages---is prevalent across the world. Within text-to-speech (TTS), multilingual models have been found to enable code-switching. By modifying the linguistic input to sequence-to-sequence TTS, we show that code-switching is possible for languages unseen during training, even within monolingual models. We use a small set of phonological featu…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: 5 pages, to be presented at INTERSPEECH 2020

  4. arXiv:2008.03096

    eess.AS cs.LG cs.SD stat.ML

    Incremental Text to Speech for Neural Sequence-to-Sequence Models using Reinforcement Learning

    Authors: Devang S Ram Mohan, Raphael Lenain, Lorenzo Foglianti, Tian Huey Teh, Marlene Staib, Alexandra Torresquintero, Jiameng Gao

    Abstract: Modern approaches to text to speech require the entire input character sequence to be processed before any audio is synthesised. This latency limits the suitability of such models for time-sensitive tasks like simultaneous interpretation. Interleaving the action of reading a character with that of synthesising audio reduces this latency. However, the order of this sequence of interleaved actions v…

    Submitted 7 August, 2020; originally announced August 2020.

    Comments: To be published in Interspeech 2020. 5 pages, 4 figures