Showing 1–5 of 5 results for author: Klabbers, E

  1. arXiv:2306.12040  [pdf, other]

    cs.CL eess.AS

    Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection

    Authors: Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers

    Abstract: We compare using a PHOIBLE-based phone mapping method and using phonological features input in transfer learning for TTS in low-resource languages. We use diverse source languages (English, Finnish, Hindi, Japanese, and Russian) and target languages (Bulgarian, Georgian, Kazakh, Swahili, Urdu, and Uzbek) to test the language-independence of the methods and enhance the findings' applicability. We u…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: Accepted at the Speech Synthesis Workshop 2023

  2. arXiv:2306.00535  [pdf, other]

    cs.CL eess.AS

    The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech

    Authors: Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers

    Abstract: We compare phone labels and articulatory features as input for cross-lingual transfer learning in text-to-speech (TTS) for low-resource languages (LRLs). Experiments with FastSpeech 2 and the LRL West Frisian show that using articulatory features outperformed using phone labels in both intelligibility and naturalness. For LRLs without pronunciation dictionaries, we propose two novel approaches: a)…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted at INTERSPEECH 2023

  3. arXiv:2305.19396  [pdf, other]

    eess.AS cs.CL

    Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages

    Authors: Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers

    Abstract: We train a MOS prediction model based on wav2vec 2.0 using the open-access data sets BVCC and SOMOS. Our test with neural TTS data in the low-resource language (LRL) West Frisian shows that pre-training on BVCC before fine-tuning on SOMOS leads to the best accuracy for both fine-tuned and zero-shot prediction. Further fine-tuning experiments show that using more than 30 percent of the total data d…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted at INTERSPEECH 2023

  4. arXiv:2204.00061  [pdf, other]

    cs.SD cs.CL eess.AS

    Data-augmented cross-lingual synthesis in a teacher-student framework

    Authors: Marcel de Korte, Jaebok Kim, Aki Kunikoshi, Adaeze Adigwe, Esther Klabbers

    Abstract: Cross-lingual synthesis can be defined as the task of letting a speaker generate fluent synthetic speech in another language. This is a challenging task, and resulting speech can suffer from reduced naturalness, accented speech, and/or loss of essential voice characteristics. Previous research shows that many models appear to have insufficient generalization capabilities to perform well on every o…

    Submitted 31 March, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

    ACM Class: I.2.7

  5. arXiv:2008.09659  [pdf, other]

    eess.AS cs.CL cs.SD

    Efficient neural speech synthesis for low-resource languages through multilingual modeling

    Authors: Marcel de Korte, Jaebok Kim, Esther Klabbers

    Abstract: Recent advances in neural TTS have led to models that can produce high-quality synthetic speech. However, these models typically require large amounts of training data, which can make it costly to produce a new voice with the desired quality. Although multi-speaker modeling can reduce the data requirements necessary for a new voice, this approach is usually not viable for many low-resource languag…

    Submitted 20 August, 2020; originally announced August 2020.