-
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Authors:
Cheng Gong,
Erica Cooper,
Xin Wang,
Chunyu Qiang,
Mengzhe Geng,
Dan Wells,
Longbiao Wang,
Jianwu Dang,
Marc Tessier,
Aidan Pine,
Korin Richmond,
Junichi Yamagishi
Abstract:
Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on…
▽ More
Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on 12 languages using limited data with various fine-tuning configurations. We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance. Additionally, we find that the fine-tuning dataset size and number of speakers influence adaptability. Surprisingly, we also observed that using paired data for fine-tuning is not always optimal compared to audio-only data. Beyond speech intelligibility, our analysis covers speaker similarity, language identification, and predicted MOS.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Authors:
Cheng Gong,
Xin Wang,
Erica Cooper,
Dan Wells,
Longbiao Wang,
Jianwu Dang,
Korin Richmond,
Junichi Yamagishi
Abstract:
Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker, single-language synthesis. Multilingual TTS systems are limited to resource-rich languages due to the lack of large paired text and studio-quality audio data. In most cases, TTS systems are built using a single speaker's voice. However, there is growing interest in develo** systems that can synthesize voices…
▽ More
Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker, single-language synthesis. Multilingual TTS systems are limited to resource-rich languages due to the lack of large paired text and studio-quality audio data. In most cases, TTS systems are built using a single speaker's voice. However, there is growing interest in develo** systems that can synthesize voices for new speakers using only a few seconds of their speech. This paper presents ZMM-TTS, a multilingual and multispeaker framework utilizing quantized latent speech representations from a large-scale, pre-trained, self-supervised model. Our paper is the first to incorporate the representations from text-based and speech-based self-supervised learning models into multilingual speech synthesis tasks. We conducted comprehensive subjective and objective evaluations through a series of experiments. Our model has been proven effective in terms of speech naturalness and similarity for both seen and unseen speakers in six high-resource languages. We also tested the efficiency of our method on two hypothetical low-resource languages. The results are promising, indicating that our proposed approach can synthesize audio that is intelligible and has a high degree of similarity to the target speaker's voice, even without any training data for the new, unseen language.
△ Less
Submitted 21 December, 2023;
originally announced December 2023.
-
Stretch with Stretch: Physical Therapy Exercise Games Led by a Mobile Manipulator
Authors:
Matthew Lamsey,
You Liang Tan,
Meredith D. Wells,
Madeline Beatty,
Zexuan Liu,
Arjun Majumdar,
Kendra Washington,
Jerry Feldman,
Naveen Kuppuswamy,
Elizabeth Nguyen,
Arielle Wallenstein,
Madeleine E. Hackney,
Charles C. Kemp
Abstract:
Physical therapy (PT) is a key component of many rehabilitation regimens, such as treatments for Parkinson's disease (PD). However, there are shortages of physical therapists and adherence to self-guided PT is low. Robots have the potential to support physical therapists and increase adherence to self-guided PT, but prior robotic systems have been large and immobile, which can be a barrier to use…
▽ More
Physical therapy (PT) is a key component of many rehabilitation regimens, such as treatments for Parkinson's disease (PD). However, there are shortages of physical therapists and adherence to self-guided PT is low. Robots have the potential to support physical therapists and increase adherence to self-guided PT, but prior robotic systems have been large and immobile, which can be a barrier to use in homes and clinics. We present Stretch with Stretch (SWS), a novel robotic system for leading stretching exercise games for older adults with PD. SWS consists of a compact and lightweight mobile manipulator (Hello Robot Stretch RE1) that visually and verbally guides users through PT exercises. The robot's soft end effector serves as a target that users repetitively reach towards and press with a hand, foot, or knee. For each exercise, target locations are customized for the individual via a visually estimated kinematic model, a haptically estimated range of motion, and the person's exercise performance. The system includes sound effects and verbal feedback from the robot to keep users engaged throughout a session and augment physical exercise with cognitive exercise. We conducted a user study for which people with PD (n=10) performed 6 exercises with the system. Participants perceived the SWS to be useful and easy to use. They also reported mild to moderate perceived exertion (RPE).
△ Less
Submitted 21 December, 2023; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Comparing Machine Learning and Interpolation Methods for Loop-Level Calculations
Authors:
Ibrahim Chahrour,
James D. Wells
Abstract:
The need to approximate functions is ubiquitous in science, either due to empirical constraints or high computational cost of accessing the function. In high-energy physics, the precise computation of the scattering cross-section of a process requires the evaluation of computationally intensive integrals. A wide variety of methods in machine learning have been used to tackle this problem, but ofte…
▽ More
The need to approximate functions is ubiquitous in science, either due to empirical constraints or high computational cost of accessing the function. In high-energy physics, the precise computation of the scattering cross-section of a process requires the evaluation of computationally intensive integrals. A wide variety of methods in machine learning have been used to tackle this problem, but often the motivation of using one method over another is lacking. Comparing these methods is typically highly dependent on the problem at hand, so we specify to the case where we can evaluate the function a large number of times, after which quick and accurate evaluation can take place. We consider four interpolation and three machine learning techniques and compare their performance on three toy functions, the four-point scalar Passarino-Veltman $D_0$ function, and the two-loop self-energy master integral $M$. We find that in low dimensions ($d = 3$), traditional interpolation techniques like the Radial Basis Function perform very well, but in higher dimensions ($d=5, 6, 9$) we find that multi-layer perceptrons (a.k.a neural networks) do not suffer as much from the curse of dimensionality and provide the fastest and most accurate predictions.
△ Less
Submitted 28 March, 2022; v1 submitted 29 November, 2021;
originally announced November 2021.
-
The deal.II finite element library: design, features, and insights
Authors:
Daniel Arndt,
Wolfgang Bangerth,
Denis Davydov,
Timo Heister,
Luca Heltai,
Martin Kronbichler,
Matthias Maier,
Jean-Paul Pelteret,
Bruno Turcksin,
David Wells
Abstract:
deal.II is a state-of-the-art finite element library focused on generality, dimension-independent programming, parallelism, and extensibility. Herein, we outline its primary design considerations and its sophisticated features such as distributed meshes, $hp$-adaptivity, support for complex geometries, and matrix-free algorithms. But deal.II is more than just a software library: It is also a diver…
▽ More
deal.II is a state-of-the-art finite element library focused on generality, dimension-independent programming, parallelism, and extensibility. Herein, we outline its primary design considerations and its sophisticated features such as distributed meshes, $hp$-adaptivity, support for complex geometries, and matrix-free algorithms. But deal.II is more than just a software library: It is also a diverse and worldwide community of developers and users, as well as an educational platform. We therefore also discuss some of the technical and social challenges and lessons learned in running a large community software project over the course of two decades.
△ Less
Submitted 17 February, 2020; v1 submitted 24 October, 2019;
originally announced October 2019.