Showing 1–4 of 4 results for author: Cheon, S J

  1. arXiv:2104.01409  [pdf, other]

    eess.AS cs.AI cs.SD

    Diff-TTS: A Denoising Diffusion Model for Text-to-Speech

    Authors: Myeonghun Jeong, Hyeongju Kim, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim

    Abstract: Although neural text-to-speech (TTS) models have attracted a lot of attention and succeeded in generating human-like speech, there is still room for improvements to its naturalness and architectural efficiency. In this work, we propose a novel non-autoregressive TTS model, namely Diff-TTS, which achieves highly natural and efficient speech synthesis. Given the text, Diff-TTS exploits a denoising d…

    Submitted 3 April, 2021; originally announced April 2021.

    Comments: Submitted to INTERSPEECH 2021

  2. Gated Recurrent Context: Softmax-free Attention for Online Encoder-Decoder Speech Recognition

    Authors: Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Hyeongju Kim, Nam Soo Kim

    Abstract: Recently, attention-based encoder-decoder (AED) models have shown state-of-the-art performance in automatic speech recognition (ASR). As the original AED models with global attentions are not capable of online inference, various online attention schemes have been developed to reduce ASR latency for better user experience. However, a common limitation of the conventional softmax-based online attent…

    Submitted 14 January, 2021; v1 submitted 10 July, 2020; originally announced July 2020.

  3. arXiv:2006.04598  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    WaveNODE: A Continuous Normalizing Flow for Speech Synthesis

    Authors: Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim

    Abstract: In recent years, various flow-based generative models have been proposed to generate high-fidelity waveforms in real-time. However, these models require either a well-trained teacher network or a number of flow steps making them memory-inefficient. In this paper, we propose a novel generative model called WaveNODE which exploits a continuous normalizing flow for speech synthesis. Unlike the conven…

    Submitted 2 July, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 8 pages, 4 figures, Second workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (ICML 2020)

  4. Giving Space to Your Message: Assistive Word Segmentation for the Electronic Typing of Digital Minorities

    Authors: Won Ik Cho, Sung Jun Cheon, Woo Hyun Kang, Ji Won Kim, Nam Soo Kim

    Abstract: For readability and disambiguation of the written text, appropriate word segmentation is recommended for documentation, and it also holds for the digitized texts. If the language is agglutinative while far from scriptio continua, for instance in the Korean language, the problem becomes more significant. However, some device users these days find it challenging to communicate via key stroking, not…

    Submitted 4 May, 2021; v1 submitted 31 October, 2018; originally announced October 2018.

    Comments: DIS 2021 Camera-ready