Showing 1–7 of 7 results for author: Oura, K

Searching in archive eess.
  1. arXiv:2211.11222  [pdf, other]

    eess.AS cs.CL cs.SD

    Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System

    Authors: Takenori Yoshimura, Shinji Takaki, Kazuhiro Nakamura, Keiichiro Oura, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

    Abstract: This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system towards end-to-end controllable speech synthesis. Since the mel-cepstral synthesis filter is explicitly embedded in neural waveform models in the proposed system, both voice characteristics and the pitch of synthesized speech are highly controlled via a frequency warping parameter and fundame…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  2. arXiv:2108.13985  [pdf, other]

    eess.AS

    Neural Sequence-to-Sequence Speech Synthesis Using a Hidden Semi-Markov Model Based Structured Attention Mechanism

    Authors: Yoshihiko Nankaku, Kenta Sumiya, Takenori Yoshimura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Keiichi Tokuda

    Abstract: This paper proposes a novel Sequence-to-Sequence (Seq2Seq) model integrating the structure of Hidden Semi-Markov Models (HSMMs) into its attention mechanism. In speech synthesis, it has been shown that methods based on Seq2Seq models using deep neural networks can synthesize high quality speech under the appropriate conditions. However, several essential problems still have remained, i.e., requiri…

    Submitted 31 August, 2021; originally announced August 2021.

    Comments: 5 pages, 3 figures

  3. arXiv:2108.02776  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System

    Authors: Yukiya Hono, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

    Abstract: This paper presents Sinsy, a deep neural network (DNN)-based singing voice synthesis (SVS) system. In recent years, DNNs have been utilized in statistical parametric SVS systems, and DNN-based SVS systems have demonstrated better performance than conventional hidden Markov model-based ones. SVS systems are required to synthesize a singing voice with pitch and timing that strictly follow a given mu…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: 14 pages, 11 figures, 3 tables, Accepted to IEEE/ACM Transactions on Audio, Speech and Language Processing

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 2803-2815, 2021

  4. arXiv:2102.07786  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components

    Authors: Yukiya Hono, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

    Abstract: We propose PeriodNet, a non-autoregressive (non-AR) waveform generation model with a new model structure for modeling periodic and aperiodic components in speech waveforms. The non-AR waveform generation models can generate speech waveforms parallelly and can be used as a speech vocoder by conditioning an acoustic feature. Since a speech waveform contains periodic and aperiodic components, both co…

    Submitted 15 February, 2021; originally announced February 2021.

    Comments: 5 pages, accepted to ICASSP 2021

  5. arXiv:2009.08474  [pdf, other]

    eess.AS cs.LG cs.SD

    Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis

    Authors: Yukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

    Abstract: This paper proposes a hierarchical generative model with a multi-grained latent variable to synthesize expressive speech. In recent years, fine-grained latent variables are introduced into the text-to-speech synthesis that enable the fine control of the prosody and speaking styles of synthesized speech. However, the naturalness of speech degrades when these latent variables are obtained by samplin…

    Submitted 26 December, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: 5 pages, accepted to INTERSPEECH 2020, demo page: https://www.rinna.jp/research/interspeech2020/

  6. arXiv:1910.11690  [pdf, other]

    eess.AS cs.LG cs.SD

    Fast and High-Quality Singing Voice Synthesis System based on Convolutional Neural Networks

    Authors: Kazuhiro Nakamura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

    Abstract: The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of synthesized singing voices. As singing voices represent a rich form of expression, a powerful technique to model them accurately is required. In the proposed techniqu…

    Submitted 21 April, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

    Comments: Accepted to ICASSP 2020. Singing voice samples (Japanese, English, Chinese): https://www.techno-speech.com/news-20181214a-en. arXiv admin note: substantial text overlap with arXiv:1904.06868

  7. arXiv:1904.06868  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Singing voice synthesis based on convolutional neural networks

    Authors: Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

    Abstract: The present paper describes a singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of synthesized singing voices. In these systems, the relationship between musical score feature sequences and acoustic feature sequences extracted from singing voices…

    Submitted 25 June, 2019; v1 submitted 15 April, 2019; originally announced April 2019.

    Comments: Singing voice samples (Japanese, English, Chinese): https://www.techno-speech.com/news-20181214a-en