Showing 1–4 of 4 results for author: Inoue, S

  1. arXiv:2406.05551  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Autoregressive Diffusion Transformer for Text-to-Speech Synthesis

    Authors: Zhijun Liu, Shuai Wang, Sho Inoue, Qibing Bai, Haizhou Li

    Abstract: Audio language models have recently emerged as a promising approach for various audio generation tasks, relying on audio tokenizers to encode waveforms into sequences of discrete symbols. Audio tokenization often poses a necessary compromise between code bitrate and reconstruction accuracy. When dealing with low-bitrate audio codes, language models are constrained to process only a subset of the i…

    Submitted 8 June, 2024; originally announced June 2024.

  2. Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis

    Authors: Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li

    Abstract: It remains a challenge to effectively control the emotion rendering in text-to-speech (TTS) synthesis. Prior studies have primarily focused on learning a global prosodic representation at the utterance level, which strongly correlates with linguistic prosody. Our goal is to construct a hierarchical emotion distribution (ED) that effectively encapsulates intensity variations of emotions at various…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: This is accepted to IEEE ICASSP 2024

  3. arXiv:2403.02002  [pdf, other]

    cs.SD eess.AS

    Fine-Grained Quantitative Emotion Editing for Speech Generation

    Authors: Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li

    Abstract: It remains a significant challenge how to quantitatively control the expressiveness of speech emotion in speech generation. In this work, we present a novel approach for manipulating the rendering of emotions for speech generation. We propose a hierarchical emotion distribution extractor, i.e. Hierarchical ED, that quantifies the intensity of emotions at different levels of granularity. Support ve…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: This paper is submitted to IEEE Signal Processing Letters

  4. arXiv:2010.03164  [pdf, other]

    cs.SD cs.LG eess.AS

    Adversarial attacks on audio source separation

    Authors: Naoya Takahashi, Shota Inoue, Yuki Mitsufuji

    Abstract: Despite the excellent performance of neural-network-based audio source separation methods and their wide range of applications, their robustness against intentional attacks has been largely neglected. In this work, we reformulate various adversarial attack methods for the audio source separation problem and intensively investigate them under different attack conditions and target models. We furthe…

    Submitted 14 February, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Accepted at ICASSP 2021