Showing 1–6 of 6 results for author: Yoneyama, R

  1. arXiv:2310.05203  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD eess.SP

    A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023

    Authors: Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda

    Abstract: This paper presents our systems (denoted as T13) for the singing voice conversion challenge (SVCC) 2023. For both in-domain and cross-domain English singing voice conversion (SVC) tasks (Task 1 and Task 2), we adopt a recognition-synthesis approach with self-supervised learning-based representation. To achieve data-efficient SVC with a limited amount of target singer/speaker's data (150 to 160 utt…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to ASRU 2023

  2. arXiv:2210.15987  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit

    Authors: Ryuichi Yamamoto, Reo Yoneyama, Tomoki Toda

    Abstract: This paper describes the design of NNSVS, an open-source software for neural network-based singing voice synthesis research. NNSVS is inspired by Sinsy, an open-source pioneer in singing voice synthesis research, and provides many additional features such as multi-stream models, autoregressive fundamental frequency models, and neural vocoders. Furthermore, NNSVS provides extensive documentation an…

    Submitted 1 March, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted to ICASSP 2023

  3. arXiv:2210.15887  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Nonparallel High-Quality Audio Super Resolution with Domain Adaptation and Resampling CycleGANs

    Authors: Reo Yoneyama, Ryuichi Yamamoto, Kentaro Tachibana

    Abstract: Neural audio super-resolution models are typically trained on low- and high-resolution audio signal pairs. Although these methods achieve highly accurate super-resolution if the acoustic characteristics of the input data are similar to those of the training data, challenges remain: the models suffer from quality degradation for out-of-domain data, and paired data are required for training. To addr…

    Submitted 27 February, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted to ICASSP 2023

  4. arXiv:2210.15533  [pdf, other]

    cs.SD cs.LG eess.AS

    Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder

    Authors: Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda

    Abstract: Our previous work, the unified source-filter GAN (uSFGAN) vocoder, introduced a novel architecture based on the source-filter theory into the parallel waveform generative adversarial network to achieve high voice quality and pitch controllability. However, the high temporal resolution inputs result in high computation costs. Although the HiFi-GAN vocoder achieves fast high-fidelity voice generatio…

    Submitted 27 February, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted to ICASSP 2023

  5. arXiv:2205.06053  [pdf, other]

    cs.SD cs.LG eess.AS

    Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation

    Authors: Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda

    Abstract: This paper introduces a unified source-filter network with a harmonic-plus-noise source excitation generation mechanism. In our previous work, we proposed unified Source-Filter GAN (uSFGAN) for developing a high-fidelity neural vocoder with flexible voice controllability using a unified source-filter neural network architecture. However, the capability of uSFGAN to model the aperiodic source excit…

    Submitted 30 June, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: Accepted to INTERSPEECH 2022

  6. arXiv:2104.04668  [pdf, other]

    cs.SD cs.LG eess.AS

    Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN

    Authors: Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda

    Abstract: We propose a unified approach to data-driven source-filter modeling using a single neural network for developing a neural vocoder capable of generating high-quality synthetic speech waveforms while retaining flexibility of the source-filter model to control their voice characteristics. Our proposed network called unified source-filter generative adversarial networks (uSFGAN) is developed by factor…

    Submitted 27 June, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: Submitted to INTERSPEECH 2021