A Spectral Energy Distance for Parallel Speech Synthesis
Authors:
Alexey A. Gritsenko,
Tim Salimans,
Rianne van den Berg,
Jasper Snoek,
Nal Kalchbrenner
Abstract:
Speech synthesis is an important practical generative modeling problem that has seen great progress over the last few years, with likelihood-based autoregressive neural models now outperforming traditional concatenative systems. A downside of such autoregressive models is that they require executing tens of thousands of sequential operations per second of generated audio, making them ill-suited for deployment on specialized deep learning hardware. Here, we propose a new learning method that allows us to train highly parallel models of speech, without requiring access to an analytical likelihood function. Our approach is based on a generalized energy distance between the distributions of the generated and real audio. This spectral energy distance is a proper scoring rule with respect to the distribution over magnitude-spectrograms of the generated waveform audio and offers statistical consistency guarantees. The distance can be calculated from minibatches without bias, and does not involve adversarial learning, yielding a stable and consistent method for training implicit generative models. Empirically, we achieve state-of-the-art generation quality among implicit generative models, as judged by the recently proposed cFDSD metric. When combining our method with adversarial techniques, we also improve upon the recently proposed GAN-TTS model in terms of Mean Opinion Score as judged by trained human evaluators.
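To make the idea of an unbiased minibatch energy distance concrete, here is a minimal sketch, not the authors' implementation. It estimates the generalized energy distance 2E[d(x, y)] - E[d(x, x')] - E[d(y, y')] between a batch of real and a batch of generated waveforms. The choice of distance is an assumption: a single-scale, Hann-windowed magnitude spectrogram compared with a plain Euclidean norm. The frame size, hop length, and the helper names `magnitude_spectrogram`, `spectral_distance`, and `energy_distance` are hypothetical; the abstract does not specify the spectrogram settings or the exact distance.

```python
import numpy as np

def magnitude_spectrogram(x, frame=64, hop=32):
    # Hypothetical settings: slice a 1-D waveform into overlapping frames,
    # apply a Hann window, and take the magnitude of the real FFT per frame.
    n_frames = 1 + (len(x) - frame) // hop
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(frame)
    return np.abs(np.fft.rfft(frames, axis=-1))

def spectral_distance(a, b):
    # Assumed distance: Euclidean norm between the two magnitude spectrograms.
    return np.linalg.norm(magnitude_spectrogram(a) - magnitude_spectrogram(b))

def energy_distance(xs, ys, dist=spectral_distance):
    # Unbiased estimate of 2E[d(x,y)] - E[d(x,x')] - E[d(y,y')].
    # Within-batch sums skip i == j pairs, which removes the bias.
    n, m = len(xs), len(ys)
    cross = sum(dist(x, y) for x in xs for y in ys) / (n * m)
    within_x = sum(dist(xs[i], xs[j])
                   for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    within_y = sum(dist(ys[i], ys[j])
                   for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    return 2 * cross - within_x - within_y
```

Because the within-batch terms exclude self-pairs, the estimate can be slightly negative when both batches come from the same distribution. This is the price of unbiasedness, and it is harmless when the quantity is used as a training loss.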
Submitted 23 October, 2020; v1 submitted 3 August, 2020; originally announced August 2020.
Automatic Identification of Twin Zygosity in Resting-State Functional MRI
Authors:
Andrey Gritsenko,
Martin A. Lindquist,
Gregory R. Kirk,
Moo K. Chung
Abstract:
A key strength of twin studies arises from the fact that there are two types of twins, monozygotic and dizygotic, that share differing amounts of genetic information. Accurate differentiation of twin types allows efficient inference on genetic influences in a population. However, identification of zygosity is often prone to errors without genotyping. In this study, we propose a novel pairwise feature representation to classify the zygosity of twin pairs from resting-state functional magnetic resonance images (rs-fMRI). For this, we project an fMRI signal onto a set of basis functions and use the projection coefficients as a compact and discriminative feature representation of noisy fMRI. We encode the relationship between twins as the correlation between the new feature representations across brain regions. We employ hill-climbing variable selection to identify the brain regions that are most genetically affected. The proposed framework was applied to 208 twin pairs and achieved 94.19% classification accuracy in automatically identifying the zygosity of paired images.
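The three steps named in this abstract (basis projection, per-region twin correlation, and hill-climbing region selection) can be sketched as follows. This is an illustrative sketch under several assumptions, not the authors' method. The abstract does not say which basis functions are used, so `cosine_basis` (DCT-II-style cosines) is a hypothetical stand-in. The scoring rule inside the hill climb, a nearest-centroid classifier evaluated on its own training data via `centroid_accuracy`, is also an assumed placeholder for whatever classifier and validation scheme the paper uses. All function names are hypothetical.

```python
import numpy as np

def cosine_basis(n_time, n_basis):
    # Hypothetical basis: the first n_basis DCT-II-style cosines, shape (n_time, n_basis).
    t = np.arange(n_time)
    return np.stack([np.cos(np.pi * k * (t + 0.5) / n_time)
                     for k in range(n_basis)], axis=1)

def project(signals, basis):
    # signals: (n_regions, n_time). Least-squares projection coefficients
    # for every region at once, returned with shape (n_regions, n_basis).
    coef, *_ = np.linalg.lstsq(basis, signals.T, rcond=None)
    return coef.T

def pair_features(twin_a, twin_b, basis):
    # One Pearson correlation per region between the twins' coefficient vectors.
    ca, cb = project(twin_a, basis), project(twin_b, basis)
    return np.array([np.corrcoef(ca[r], cb[r])[0, 1] for r in range(ca.shape[0])])

def centroid_accuracy(X, y):
    # Assumed scorer: training accuracy of a nearest-centroid classifier (0 = DZ, 1 = MZ).
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)).astype(int)
    return (pred == y).mean()

def hill_climb(X, y, score=centroid_accuracy):
    # Greedy forward selection: add the region that most improves the score,
    # and stop as soon as no remaining region improves it.
    selected, best = [], 0.0
    while True:
        candidates = [r for r in range(X.shape[1]) if r not in selected]
        if not candidates:
            break
        s, r = max((score(X[:, selected + [r]], y), r) for r in candidates)
        if s <= best:
            break
        selected.append(r)
        best = s
    return selected, best
```

In practice, scoring candidate regions on the same data that is used to report accuracy would overstate performance. A faithful version would score each candidate with held-out or cross-validated accuracy instead.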
Submitted 26 October, 2018; v1 submitted 30 June, 2018; originally announced July 2018.