Deep Speech Synthesis from Articulatory Representations
Authors:
Peter Wu,
Shinji Watanabe,
Louis Goldstein,
Alan W Black,
Gopala K. Anumanchipalli
Abstract:
In the articulatory synthesis task, speech is synthesized from input features containing information about the physical behavior of the human vocal tract. This task provides a promising direction for speech synthesis research, as the articulatory space is compact, smooth, and interpretable. Recent work has highlighted the potential for deep learning models to perform articulatory synthesis. However, it remains unclear whether these models can achieve the efficiency and fidelity of the human speech production system. To help bridge this gap, we propose a time-domain articulatory synthesis methodology and demonstrate its efficacy with both electromagnetic articulography (EMA) and synthetic articulatory feature inputs. Our model is computationally efficient and achieves a transcription word error rate (WER) of 18.5% for the EMA-to-speech task, yielding an improvement of 11.6% compared to prior work. Through interpolation experiments, we also highlight the generalizability and interpretability of our approach.
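The transcription WER reported above is the standard word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the transcript of the synthesized speech, divided by the number of reference words. The sketch below is a generic, stdlib-only implementation for illustration; it is not the authors' evaluation code, the function name `word_error_rate` is hypothetical, and it assumes transcripts are already normalized (casing, punctuation) and split on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes both strings are already text-normalized; words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("the" -> "a") out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Keeping only two rows of the dynamic-programming table makes memory linear in the hypothesis length; a full table would be needed only to recover the actual alignment.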
Submitted 13 September, 2022;
originally announced September 2022.
Derivation of Fitts' law from the Task Dynamics model of speech production
Authors:
Tanner Sorensen,
Adam Lammert,
Louis Goldstein,
Shrikanth Narayanan
Abstract:
Fitts' law is a linear equation relating movement time to an index of movement difficulty. The recent finding that Fitts' law applies to voluntary movement of the vocal tract raises the question of whether the theory of speech production implies Fitts' law. The present letter establishes a theoretical connection between Fitts' law and the Task Dynamics model of speech production. We derive a variant of Fitts' law where the intercept and slope are functions of the parameters of the Task Dynamics model and the index of difficulty is a product logarithm, or Lambert W function, rather than a logarithm.
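The abstract does not state the derived equation, so the sketch below does not attempt to reproduce it. It only illustrates, under our own assumptions, how a product logarithm can appear in a movement-time calculation. We assume a single articulator modeled as a unit-mass, critically damped point attractor with natural frequency `omega` (the kind of second-order gestural dynamics commonly associated with Task Dynamics), released at rest a distance D from its target. Its distance to the target is D(1 + ωt)e^(−ωt), and the time at which this falls to half a tolerance width W has a closed form through the lower real branch W₋₁ of the Lambert W function. The function names, the zero-initial-velocity assumption, and the value of `omega` are hypothetical; this is not the authors' derivation. The classic Fitts index of difficulty, log₂(2D/W), is computed alongside for comparison.

```python
import math


def lambert_w_minus1(z: float, iterations: int = 200) -> float:
    """Lower real branch W_{-1}(z) for -1/e <= z < 0: solves w * exp(w) = z with w <= -1.

    Uses bisection on u = -w, because u * exp(-u) decreases monotonically for u >= 1.
    """
    if not (-1.0 / math.e <= z < 0.0):
        raise ValueError("W_{-1} is real only for -1/e <= z < 0")
    target = -z  # solve u * exp(-u) = target with u >= 1
    lo, hi = 1.0, 2.0
    while hi * math.exp(-hi) > target:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-mid) > target:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
    return -0.5 * (lo + hi)


def settle_time(distance: float, width: float, omega: float) -> float:
    """Time for a critically damped attractor, released at rest `distance` from its
    target, to come within width/2 of the target (hypothetical illustration).

    Error trajectory: e(t) = D * (1 + omega*t) * exp(-omega*t).
    Setting e(t) = W/2 and substituting u = 1 + omega*t gives
    u * exp(-u) = r / e with r = W / (2D), hence u = -W_{-1}(-r / e).
    """
    r = width / (2.0 * distance)
    if not (0.0 < r < 1.0):
        raise ValueError("need 0 < width < 2 * distance")
    u = -lambert_w_minus1(-r / math.e)
    return (u - 1.0) / omega


def fitts_index_of_difficulty(distance: float, width: float) -> float:
    """Classic Fitts index of difficulty in bits: log2(2D / W)."""
    return math.log2(2.0 * distance / width)


if __name__ == "__main__":
    omega = 20.0  # hypothetical natural frequency in rad/s, not a value from the paper
    width = 0.5
    for distance in (1.0, 2.0, 4.0, 8.0, 16.0):
        id_bits = fitts_index_of_difficulty(distance, width)
        t_ms = 1000.0 * settle_time(distance, width, omega)
        print(f"ID = {id_bits:.2f} bits, settling time = {t_ms:.1f} ms")
```

Bisection keeps the sketch dependency-free; with SciPy available, `scipy.special.lambertw(z, k=-1)` returns the same branch (as a complex number whose imaginary part is zero on this interval). Printing settling time against log₂(2D/W) shows how close this toy model comes to a straight Fitts line as D/W grows.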
Submitted 17 March, 2020; v1 submitted 14 January, 2020;
originally announced January 2020.