Showing 1–3 of 3 results for author: Nozaki, J

  1. arXiv:2207.03169  [pdf, other]

    eess.AS cs.CL cs.SD

    End-to-end Speech-to-Punctuated-Text Recognition

    Authors: Jumon Nozaki, Tatsuya Kawahara, Kenkichi Ishizuka, Taiichi Hashimoto

    Abstract: Conventional automatic speech recognition systems do not produce punctuation marks which are important for the readability of the speech recognition results. They are also needed for subsequent natural language processing tasks such as machine translation. There have been a lot of works on punctuation prediction models that insert punctuation marks into speech recognition results as post-processin…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted to INTERSPEECH2022

  2. arXiv:2110.05249  [pdf, other]

    eess.AS cs.CL cs.SD

    A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

    Authors: Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe

    Abstract: Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces the inference speed at the cost of accuracy drop compared to autoregressive baselines. Showing great potential for real-time applications, an increasing number of NAR models have been explored in different fields to mitigate the performance gap against AR models. In this work, we con…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted to ASRU2021

  3. arXiv:2104.02724  [pdf, other]

    eess.AS cs.CL cs.SD

    Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions

    Authors: Jumon Nozaki, Tatsuya Komatsu

    Abstract: This paper proposes a method to relax the conditional independence assumption of connectionist temporal classification (CTC)-based automatic speech recognition (ASR) models. We train a CTC-based ASR model with auxiliary CTC losses in intermediate layers in addition to the original CTC loss in the last layer. During both training and inference, each generated prediction in the intermediate layers i…

    Submitted 8 October, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: Accepted to INTERSPEECH2021