
Showing 1–3 of 3 results for author: Minematsu, N

Searching in archive eess.
  1. arXiv:2306.08850 [pdf, other]

    cs.SD eess.AS

    Exploring Isolated Musical Notes as Pre-training Data for Predominant Instrument Recognition in Polyphonic Music

    Authors: Lifan Zhong, Erica Cooper, Junichi Yamagishi, Nobuaki Minematsu

    Abstract: With the growing amount of musical data available, automatic instrument recognition, one of the essential problems in Music Information Retrieval (MIR), is drawing more and more attention. While automatic recognition of single instruments has been well-studied, it remains challenging for polyphonic, multi-instrument musical recordings. This work presents our efforts toward building a robust end-to…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Submitted to APSIPA 2023

  2. arXiv:2204.03855 [pdf, other]

    eess.AS cs.CL

    Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition

    Authors: Qianying Liu, Zhuo Gong, Zhengdong Yang, Yuhang Yang, Sheng Li, Chenchen Ding, Nobuaki Minematsu, Hao Huang, Fei Cheng, Chenhui Chu, Sadao Kurohashi

    Abstract: Low-resource speech recognition has been long-suffering from insufficient training data. In this paper, we propose an approach that leverages neighboring languages to improve low-resource scenario performance, founded on the hypothesis that similar linguistic units in neighboring languages exhibit comparable term frequency distributions, which enables us to construct a Huffman tree for performing…

    Submitted 30 April, 2023; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: 7 pages, ICASSP 2023

  3. arXiv:1807.11679 [pdf, other]

    eess.AS cs.CL cs.SD stat.ML

    Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder

    Authors: Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu

    Abstract: Recent neural networks such as WaveNet and sampleRNN that learn directly from speech waveform samples have achieved very high-quality synthetic speech in terms of both naturalness and speaker similarity even in multi-speaker text-to-speech synthesis systems. Such neural networks are being used as an alternative to vocoders and hence they are often called neural vocoders. The neural vocoder uses ac…

    Submitted 31 July, 2018; originally announced July 2018.