Showing 1–15 of 15 results for author: Choi, B

Searching in archive eess.
  1. arXiv:2406.05965  [pdf, other]

    eess.AS cs.AI

    MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance

    Authors: Semin Kim, Myeonghun Jeong, Hyeonseung Lee, Minchan Kim, Byoung ** Choi, Nam Soo Kim

    Abstract: In this paper, we propose MakeSinger, a semi-supervised training method for singing voice synthesis (SVS) via classifier-free diffusion guidance. The challenge in SVS lies in the costly process of gathering aligned sets of text, pitch, and audio data. MakeSinger enables the training of the diffusion-based SVS model from any speech and singing voice data regardless of its labeling, thereby enhancin…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  2. arXiv:2401.01498  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction

    Authors: Minchan Kim, Myeonghun Jeong, Byoung ** Choi, Semin Kim, Joun Yeop Lee, Nam Soo Kim

    Abstract: We propose a novel text-to-speech (TTS) framework centered around a neural transducer. Our approach divides the whole TTS pipeline into semantic-level sequence-to-sequence (seq2seq) modeling and fine-grained acoustic modeling stages, utilizing discrete semantic tokens obtained from wav2vec2.0 embeddings. For robust and efficient alignment modeling, we employ a neural transducer named token trans…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  3. arXiv:2311.02898  [pdf, other]

    eess.AS cs.LG

    Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction

    Authors: Minchan Kim, Myeonghun Jeong, Byoung ** Choi, Dongjune Lee, Nam Soo Kim

    Abstract: We introduce a text-to-speech (TTS) framework based on a neural transducer. We use discretized semantic tokens acquired from wav2vec2.0 embeddings, which makes it easy to adopt a neural transducer, with its monotonic alignment constraints, for the TTS framework. The proposed model first generates aligned semantic tokens using the neural transducer, then synthesizes a speech sample from the semant…

    Submitted 8 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted at ASRU 2023

  4. arXiv:2305.15526  [pdf, other]

    eess.SP

    Radiomap Inpainting for Restricted Areas based on Propagation Priority and Depth Map

    Authors: Songyang Zhang, Tianhang Yu, Brian Choi, Feng Ouyang, Zhi Ding

    Abstract: Providing rich and useful information regarding spectrum activities and propagation channels, radiomaps characterize the detailed distribution of power spectral density (PSD) and are important tools for network planning in modern wireless systems. Generally, radiomaps are constructed from radio strength measurements by deployed sensors and user devices. However, not all areas are accessible for ra…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Submitted to an IEEE journal for possible publication

  5. SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech

    Authors: Byoung ** Choi, Myeonghun Jeong, Joun Yeop Lee, Nam Soo Kim

    Abstract: Zero-shot multi-speaker text-to-speech (ZSM-TTS) models aim to generate a speech sample with the voice characteristic of an unseen speaker. The main challenge of ZSM-TTS is to increase the overall speaker similarity for unseen speakers. One of the most successful speaker conditioning methods for flow-based multi-speaker text-to-speech (TTS) models is to utilize the functions which predict the scal…

    Submitted 30 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE Signal Processing Letters

  6. arXiv:2210.05979  [pdf, other]

    eess.AS cs.SD

    Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech

    Authors: Byoung ** Choi, Myeonghun Jeong, Minchan Kim, Sung Hwan Mun, Nam Soo Kim

    Abstract: Several recently proposed text-to-speech (TTS) models have succeeded in generating speech samples with human-level quality in single-speaker and multi-speaker TTS scenarios with a set of pre-defined speakers. However, synthesizing a new speaker's voice from a single reference audio, commonly known as zero-shot multi-speaker text-to-speech (ZSM-TTS), is still a very challenging task. The main c…

    Submitted 22 November, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: APSIPA 2022

  7. arXiv:2209.04566  [pdf, other]

    eess.SP

    Exemplar-Based Radio Map Reconstruction of Missing Areas Using Propagation Priority

    Authors: Songyang Zhang, Tianhang Yu, Jonathan Tivald, Brian Choi, Feng Ouyang, Zhi Ding

    Abstract: A radio map describes network coverage and is a practically important tool for network planning in modern wireless systems. Generally, radio strength measurements are collected to construct fine-resolution radio maps for analysis. However, certain protected areas are not accessible for measurement due to physical constraints and security considerations, leading to blank spaces on a radio map. Non-…

    Submitted 9 September, 2022; originally announced September 2022.

    Comments: To appear in 2022 IEEE Global Communications Conference (Globecom)

  8. Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus

    Authors: Minchan Kim, Myeonghun Jeong, Byoung ** Choi, Sunghwan Ahn, Joun Yeop Lee, Nam Soo Kim

    Abstract: Training a text-to-speech (TTS) model requires a large-scale, text-labeled speech corpus, which is troublesome to collect. In this paper, we propose a transfer learning framework for TTS that utilizes a large unlabeled speech dataset for pre-training. By leveraging the wav2vec2.0 representation, unlabeled speech can substantially improve performance, especially when labeled speech is scarce. We also…

    Submitted 6 October, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  9. arXiv:2104.01409  [pdf, other]

    eess.AS cs.AI cs.SD

    Diff-TTS: A Denoising Diffusion Model for Text-to-Speech

    Authors: Myeonghun Jeong, Hyeongju Kim, Sung Jun Cheon, Byoung ** Choi, Nam Soo Kim

    Abstract: Although neural text-to-speech (TTS) models have attracted a lot of attention and succeeded in generating human-like speech, there is still room for improvement in their naturalness and architectural efficiency. In this work, we propose a novel non-autoregressive TTS model, namely Diff-TTS, which achieves highly natural and efficient speech synthesis. Given the text, Diff-TTS exploits a denoising d…

    Submitted 3 April, 2021; originally announced April 2021.

    Comments: Submitted to INTERSPEECH 2021

  10. Expressive Text-to-Speech using Style Tag

    Authors: Minchan Kim, Sung Jun Cheon, Byoung ** Choi, Jong ** Kim, Nam Soo Kim

    Abstract: As recent text-to-speech (TTS) systems have rapidly improved in speech quality and generation speed, many researchers now focus on a more challenging issue: expressive TTS. To control speaking styles, existing expressive TTS models use a categorical style index or reference speech as the style input. In this work, we propose StyleTagging-TTS (ST-TTS), a novel expressive TTS model that utilizes a st…

    Submitted 6 October, 2022; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: Accepted by Interspeech 2021

  11. arXiv:2012.00179  [pdf, other]

    cs.LG cs.CV eess.IV

    Crowd-Sourced Road Quality Mapping in the Developing World

    Authors: Benjamin Choi, John Kamalu

    Abstract: Road networks are among the most essential components of a country's infrastructure. By facilitating the movement and exchange of goods, people, and ideas, they support economic and cultural activity both within and across borders. Up-to-date mapping of the geographical distribution of roads and their quality is essential in high-impact applications ranging from land use planning to wilderness…

    Submitted 30 November, 2020; originally announced December 2020.

    Comments: Presented at NeurIPS 2020 Workshop on Machine Learning for the Developing World

  12. arXiv:2006.04598  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    WaveNODE: A Continuous Normalizing Flow for Speech Synthesis

    Authors: Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Byoung ** Choi, Nam Soo Kim

    Abstract: In recent years, various flow-based generative models have been proposed to generate high-fidelity waveforms in real time. However, these models require either a well-trained teacher network or a large number of flow steps, making them memory-inefficient. In this paper, we propose a novel generative model called WaveNODE which exploits a continuous normalizing flow for speech synthesis. Unlike the conven…

    Submitted 2 July, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 8 pages, 4 figures, Second workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (ICML 2020)

  13. arXiv:2005.10985  [pdf]

    eess.SP cs.CV cs.LG

    Apply VGGNet-based deep learning model of vibration data for prediction model of gravity acceleration equipment

    Authors: SeonWoo Lee, HyeonTak Yu, HoJun Yang, JaeHeung Yang, GangMin Lim, KyuSung Kim, ByeongKeun Choi, JangWoo Kwon

    Abstract: Hypergravity accelerators are a type of large machinery used for gravity training or medical research. A failure of such large equipment can be a serious problem in terms of safety or costs. This paper proposes a prediction model that can proactively prevent failures that may occur in a hypergravity accelerator. The method proposed in this paper was to convert vibration signals to spectrograms and…

    Submitted 18 August, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

    Comments: 15 pages, 10 figures. Associated publication: Journal of Mechanics in Medicine and Biology, https://www.worldscientific.com/worldscinet/jmmb

  14. arXiv:1908.05133  [pdf]

    eess.SP cs.HC

    Assessing Workers' Perceived Risk During Construction Task Using A Wristband-Type Biosensor

    Authors: Byungjoo Choi, Gaang Lee, Houtan Jebelli, SangHyun Lee

    Abstract: The construction industry has demonstrated a high frequency and severity of accidents. Construction accidents are the result of the interaction between unsafe work conditions and workers' unsafe behaviors. Given this relation, perceived risk is determined by an individual's response to a potential work hazard during the work. As such, risk perception is critical to understanding workers' unsafe behaviors…

    Submitted 14 August, 2019; originally announced August 2019.

    Journal ref: Proceedings of the Creative Construction Conference (CCC 2019)

  15. arXiv:1904.09302  [pdf, other]

    eess.SY

    Model Predictive Control Framework for Improving Vehicle Cornering Performance Using Handling Characteristics

    Authors: Kyoungseok Han, Giseo Park, Gokul S. Sankar, Kanghyun Nam, Seibum B. Choi

    Abstract: This paper proposes a new control strategy to improve vehicle cornering performance in a model predictive control framework. The most distinguishing feature of the proposed method is that the natural handling characteristics of the production vehicle are exploited to reduce the complexity of the conventional control methods. For safety's sake, most production vehicles are built to exhibit an unders…

    Submitted 14 November, 2019; v1 submitted 19 April, 2019; originally announced April 2019.