Showing 1–4 of 4 results for author: Latorre, J

  1. arXiv:2212.10075 [pdf, other]

    eess.AS

    Improving the quality of neural TTS using long-form content and multi-speaker multi-style modeling

    Authors: Tuomo Raitio, Javier Latorre, Andrea Davis, Tuuli Morrill, Ladan Golipour

    Abstract: Neural text-to-speech (TTS) can provide quality close to natural speech if an adequate amount of high-quality speech material is available for training. However, acquiring speech data for TTS training is costly and time-consuming, especially if the goal is to generate different speaking styles. In this work, we show that we can transfer speaking style across speakers and improve the quality of syn…

    Submitted 28 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted to 12th ISCA Speech Synthesis Workshop (SSW)

  2. arXiv:2108.07737 [pdf, other]

    cs.CL cs.SD eess.AS

    Combining speakers of multiple languages to improve quality of neural voices

    Authors: Javier Latorre, Charlotte Bailleul, Tuuli Morrill, Alistair Conkie, Yannis Stylianou

    Abstract: In this work, we explore multiple architectures and training procedures for developing a multi-speaker and multi-lingual neural TTS system with the goals of a) improving the quality when the available data in the target language is limited and b) enabling cross-lingual synthesis. We report results from a large experiment using 30 speakers in 8 different languages across 15 different locales. The s…

    Submitted 17 August, 2021; originally announced August 2021.

    Comments: 6 pages. 3 figures. Accepted to 11th Speech Synthesis Workshop, SSW11 (https://ssw11.hte.hu/en/)

  3. arXiv:1811.06315 [pdf, other]

    cs.CL eess.AS

    Effect of data reduction on sequence-to-sequence neural TTS

    Authors: Javier Latorre, Jakub Lachowicz, Jaime Lorenzo-Trueba, Thomas Merritt, Thomas Drugman, Srikanth Ronanki, Klimkov Viacheslav

    Abstract: Recent speech synthesis systems based on sampling from autoregressive neural network models can generate speech almost indistinguishable from human recordings. However, these models require large amounts of data. This paper shows that the lack of data from one speaker can be compensated with data from other speakers. The naturalness of Tacotron2-like models trained on a blend of 5k utterances fro…

    Submitted 23 November, 2018; v1 submitted 15 November, 2018; originally announced November 2018.

    Comments: 4 pages, 1 extra for references. Submitted to ICASSP 2019

  4. arXiv:1811.06292 [pdf, other]

    eess.AS cs.SD

    Towards achieving robust universal neural vocoding

    Authors: Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal

    Abstract: This paper explores the potential universality of neural vocoders. We train a WaveRNN-based vocoder on 74 speakers coming from 17 languages. This vocoder is shown to be capable of generating speech of consistently good quality (98% relative mean MUSHRA when compared to natural speech) regardless of whether the input spectrogram comes from a speaker or style seen during training or from an out-of-d…

    Submitted 4 July, 2019; v1 submitted 15 November, 2018; originally announced November 2018.

    Comments: 4 pages, 1 extra for references. Accepted at Interspeech 2019