Showing 1–4 of 4 results for author: Cheon, S J

  1. arXiv:2104.01409  [pdf, other]

    eess.AS cs.AI cs.SD

    Diff-TTS: A Denoising Diffusion Model for Text-to-Speech

    Authors: Myeonghun Jeong, Hyeongju Kim, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim

    Abstract: Although neural text-to-speech (TTS) models have attracted a lot of attention and succeeded in generating human-like speech, there is still room for improvements to its naturalness and architectural efficiency. In this work, we propose a novel non-autoregressive TTS model, namely Diff-TTS, which achieves highly natural and efficient speech synthesis. Given the text, Diff-TTS exploits a denoising d…

    Submitted 3 April, 2021; originally announced April 2021.

    Comments: Submitted to INTERSPEECH 2021

  2. Expressive Text-to-Speech using Style Tag

    Authors: Minchan Kim, Sung Jun Cheon, Byoung Jin Choi, Jong Jin Kim, Nam Soo Kim

    Abstract: As recent text-to-speech (TTS) systems have been rapidly improved in speech quality and generation speed, many researchers now focus on a more challenging issue: expressive TTS. To control speaking styles, existing expressive TTS models use categorical style index or reference speech as style input. In this work, we propose StyleTagging-TTS (ST-TTS), a novel expressive TTS model that utilizes a st…

    Submitted 6 October, 2022; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: Accepted by Interspeech 2021

  3. Gated Recurrent Context: Softmax-free Attention for Online Encoder-Decoder Speech Recognition

    Authors: Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Hyeongju Kim, Nam Soo Kim

    Abstract: Recently, attention-based encoder-decoder (AED) models have shown state-of-the-art performance in automatic speech recognition (ASR). As the original AED models with global attentions are not capable of online inference, various online attention schemes have been developed to reduce ASR latency for better user experience. However, a common limitation of the conventional softmax-based online attent…

    Submitted 14 January, 2021; v1 submitted 10 July, 2020; originally announced July 2020.

  4. arXiv:2006.04598  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    WaveNODE: A Continuous Normalizing Flow for Speech Synthesis

    Authors: Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim

    Abstract: In recent years, various flow-based generative models have been proposed to generate high-fidelity waveforms in real-time. However, these models require either a well-trained teacher network or a number of flow steps making them memory-inefficient. In this paper, we propose a novel generative model called WaveNODE which exploits a continuous normalizing flow for speech synthesis. Unlike the conven…

    Submitted 2 July, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 8 pages, 4 figures, Second workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (ICML 2020)