Showing 1–3 of 3 results for author: Huo, N

  1. arXiv:2205.04029  [pdf, other]

    cs.SD cs.MM eess.AS

    Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis

    Authors: Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Frank Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin

    Abstract: This paper introduces a new open-source platform named Muskits for end-to-end music processing, which mainly focuses on end-to-end singing voice synthesis (E2E-SVS). Muskits supports state-of-the-art SVS models, including RNN SVS, transformer SVS, and XiaoiceSing. The design of Muskits follows the style of widely-used speech processing toolkits, ESPnet and Kaldi, for data preprocessing, training,…

    Submitted 2 July, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: Accepted by Interspeech

  2. arXiv:2010.12024  [pdf, other]

    eess.AS cs.LG cs.SD

    Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss

    Authors: Jiatong Shi, Shuai Guo, Nan Huo, Yuekai Zhang, Qin Jin

    Abstract: Neural network (NN) based singing voice synthesis (SVS) systems require sufficient data to train well and are prone to over-fitting due to data scarcity. However, we often encounter a data limitation problem in building SVS systems because of high data acquisition and annotation costs. In this work, we propose a Perceptual Entropy (PE) loss derived from a psycho-acoustic hearing model to regular…

    Submitted 26 February, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: Accepted by ICASSP2021

  3. arXiv:2008.08647  [pdf, other]

    eess.AS cs.SD

    Context-aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training

    Authors: Jiatong Shi, Nan Huo, Qin Jin

    Abstract: Mispronunciation detection is an essential component of Computer-Assisted Pronunciation Training (CAPT) systems. State-of-the-art mispronunciation detection models use Deep Neural Networks (DNN) for acoustic modeling and a Goodness of Pronunciation (GOP) based algorithm for pronunciation scoring. However, GOP-based scoring models have two major limitations: (i) They depend on forced a…

    Submitted 19 August, 2020; originally announced August 2020.

    Comments: Accepted by Interspeech2020