Skip to main content

Showing 1–2 of 2 results for author: Peinelt, N

Searching in archive eess. Search in all archives.
.
  1. arXiv:2309.01576  [pdf, other

    cs.CL cs.SD eess.AS

    A Comparative Analysis of Pretrained Language Models for Text-to-Speech

    Authors: Marcel Granero-Moya, Penny Karanasou, Sri Karlapati, Bastian Schnell, Nicole Peinelt, Alexis Moinet, Thomas Drugman

    Abstract: State-of-the-art text-to-speech (TTS) systems have utilized pretrained language models (PLMs) to enhance prosody and create more natural-sounding speech. However, while PLMs have been extensively researched for natural language understanding (NLU), their impact on TTS has been overlooked. In this study, we aim to address this gap by conducting a comparative analysis of different PLMs for two TTS t… ▽ More

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: Accepted for presentation at the 12th ISCA Speech Synthesis Workshop (SSW) in Grenoble, France, from 26th to 28th August 2023

  2. arXiv:2306.11327  [pdf, other

    eess.AS cs.SD

    eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer

    Authors: Ammar Abbas, Sri Karlapati, Bastian Schnell, Penny Karanasou, Marcel Granero Moya, Amith Nagaraj, Ayman Boustati, Nicole Peinelt, Alexis Moinet, Thomas Drugman

    Abstract: We present eCat, a novel end-to-end multispeaker model capable of: a) generating long-context speech with expressive and contextually appropriate prosody, and b) performing fine-grained prosody transfer between any pair of seen speakers. eCat is trained using a two-stage training approach. In Stage I, the model learns speaker-independent word-level prosody representations in an end-to-end fashion… ▽ More

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted to be published in the Proceedings of InterSpeech 2023