
Showing 1–1 of 1 results for author: Mischler, G

  1. arXiv:2306.07691  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

    Authors: Yinghao Aaron Li, Cong Han, Vinay S. Raghavan, Gavin Mischler, Nima Mesgarani

    Abstract: In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style diffusion and adversarial training with large speech language models (SLMs) to achieve human-level TTS synthesis. StyleTTS 2 differs from its predecessor by modeling styles as a latent random variable through diffusion models to generate the most suitable style for the text without requiring reference speech, a…

    Submitted 19 November, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023