Skip to main content

Showing 1–4 of 4 results for author: Ueda, Y

Searching in archive eess. Search in all archives.
.
  1. arXiv:2211.05869  [pdf, other

    cs.CL cs.SD eess.AS

    A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding

    Authors: Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe

    Abstract: Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming. Recent studies achieved promising results by using pre-trained models in low-resource scenarios. Inspired by this, we aim to ask: which (if any) pre-training strategies can improve performance across SLU benchmarks? To answer this question, we employ four types of pre-trained models and thei… ▽ More

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: Accepted at SLT 2022

  2. arXiv:2203.17068  [pdf, other

    eess.AS cs.SD

    EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

    Authors: Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu

    Abstract: In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting. Our proposed framework integrates speaker diarization based on end-to-end neural diarization (EEND) models, speaker counting with encoder-decoder based attractors (EDA), and speech separation using Conv-TasNet. In addition, we propose a multiple 1x1 convoluti… ▽ More

    Submitted 15 December, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted in SLT 2022

  3. arXiv:2111.14706  [pdf, other

    cs.CL cs.SD eess.AS

    ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

    Authors: Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe

    Abstract: As Automatic Speech Processing (ASR) systems are getting better, there is an increasing interest of using the ASR output to do downstream Natural Language Processing (NLP) tasks. However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks. Hence, there is a need to build an open source standard that can b… ▽ More

    Submitted 3 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: Accepted at ICASSP 2022 (5 pages)

  4. arXiv:2102.04051  [pdf, other

    cs.HC cs.LG cs.SD eess.AS

    HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception

    Authors: Yota Ueda, Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, Hiroshi Saruwatari

    Abstract: We propose a conditional generative adversarial network (GAN) incorporating humans' perceptual evaluations. A deep neural network (DNN)-based generator of a GAN can represent a real-data distribution accurately but can never represent a human-acceptable distribution, which are ranges of data in which humans accept the naturalness regardless of whether the data are real or not. A HumanGAN was propo… ▽ More

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: 5 pages, 6 figures, to be published in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing