Showing 1–3 of 3 results for author: Nakagome, Y

  1. arXiv:2406.14890  [pdf, other]

    cs.CL eess.AS

    InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions

    Authors: Yu Nakagome, Michael Hentschel

    Abstract: Despite recent advances in end-to-end speech recognition methods, their output is biased to the training data's vocabulary, resulting in inaccurate recognition of unknown terms or proper nouns. To improve the recognition accuracy for a given set of such terms, we propose an adaptation parameter-free approach based on Self-conditioned CTC. Our method improves the recognition accuracy of misrecogniz…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  2. arXiv:2204.00174  [pdf, other]

    cs.CL cs.SD eess.AS

    InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

    Authors: Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida

    Abstract: This paper proposes InterAug: a novel training method for CTC-based ASR using augmented intermediate representations for conditioning. The proposed method exploits the conditioning framework of self-conditioned CTC to train robust models by conditioning with "noisy" intermediate predictions. During the training, intermediate predictions are changed to incorrect intermediate predictions, and fed in…

    Submitted 31 March, 2022; originally announced April 2022.

    Comments: This paper was submitted to INTERSPEECH2022

  3. arXiv:1911.04228  [pdf, ps, other]

    eess.AS cs.SD

    Unsupervised Training for Deep Speech Source Separation with Kullback-Leibler Divergence Based Probabilistic Loss Function

    Authors: Masahito Togami, Yoshiki Masuyama, Tatsuya Komatsu, Yu Nakagome

    Abstract: In this paper, we propose a multi-channel speech source separation with a deep neural network (DNN) which is trained under the condition that no clean signal is available. As an alternative to a clean signal, the proposed method adopts an estimated speech signal by an unsupervised speech source separation with a statistical model. As a statistical model of microphone input signal, we adopt a time…

    Submitted 11 November, 2019; originally announced November 2019.