Showing 1–4 of 4 results for author: Sudoh, K

  1. arXiv:2407.00826

    cs.CL cs.SD eess.AS

    NAIST Simultaneous Speech Translation System for IWSLT 2024

    Authors: Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Tomoya Yanagita, Kosuke Doi, Mana Makinae, Haotian Tan, Makoto Sakai, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: This paper describes NAIST's submission to the simultaneous track of the IWSLT 2024 Evaluation Campaign: English-to-{German, Japanese, Chinese} speech-to-text translation and English-to-Japanese speech-to-speech translation. We develop a multilingual end-to-end speech-to-text translation model combining two pre-trained language models, HuBERT and mBART. We trained this model with two decoding poli…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: IWSLT 2024 system paper

  2. arXiv:2306.08582

    cs.CL cs.SD eess.AS

    Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data

    Authors: Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Simultaneous speech translation (SimulST) translates partial speech inputs incrementally. Although the monotonic correspondence between input and output is preferable for smaller latency, it is not the case for distant language pairs such as English and Japanese. A prospective approach to this problem is to mimic simultaneous interpretation (SI) using SI data to train a SimulST model. However, the…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted to IWSLT 2023 as a scientific paper

  3. Improving Speech Translation Accuracy and Time Efficiency with Fine-tuned wav2vec 2.0-based Speech Segmentation

    Authors: Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Speech translation (ST) automatically converts utterances in a source language into text in another language. Splitting continuous speech into shorter segments, known as speech segmentation, plays an important role in ST. Recent segmentation methods trained to mimic the segmentation of ST corpora have surpassed traditional approaches. Tsiamas et al. proposed a segmentation frame classifier (SFC) b…

    Submitted 18 December, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  4. arXiv:2203.15479

    cs.CL cs.SD eess.AS

    Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation

    Authors: Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular VAD tools like WebRTC VAD have generally relied on pause-based segmentation. Unfortunately, pauses in speech do not necessarily match sentence boundaries, and sentences can be connected by a very short pause that is difficult to detect by VAD. In this study, we propose a speech segm…

    Submitted 13 July, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted to INTERSPEECH 2022