Showing 1–8 of 8 results for author: Fuh, C

Searching in archive eess.
  1. arXiv:2309.12766  [pdf, other]

    eess.AS cs.SD

    A Study on Incorporating Whisper for Robust Speech Assessment

    Authors: Ryandhimas E. Zezario, Yu-Wen Chen, Szu-Wei Fu, Yu Tsao, Hsin-Min Wang, Chiou-Shann Fuh

    Abstract: This research introduces an enhanced version of the multi-objective speech assessment model--MOSA-Net+, by leveraging the acoustic features from Whisper, a large-scaled weakly supervised model. We first investigate the effectiveness of Whisper in deploying a more robust speech assessment model. After that, we explore combining representations from Whisper and SSL models. The experimental results r…

    Submitted 29 April, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: Accepted to IEEE ICME 2024

  2. arXiv:2309.09548  [pdf, other]

    eess.AS cs.LG cs.SD

    Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata

    Authors: Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

    Abstract: Automated speech intelligibility assessment is pivotal for hearing aid (HA) development. In this paper, we present three novel methods to improve intelligibility prediction accuracy and introduce MBI-Net+, an enhanced version of MBI-Net, the top-performing system in the 1st Clarity Prediction Challenge. MBI-Net+ leverages Whisper's embeddings to create cross-domain acoustic features and includes m…

    Submitted 13 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted to Interspeech 2024

  3. arXiv:2308.09262  [pdf, other]

    eess.AS cs.LG cs.SD

    Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model

    Authors: Ryandhimas E. Zezario, Bo-Ren Brian Bai, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

    Abstract: This study proposes a multi-task pseudo-label learning (MPL)-based non-intrusive speech quality assessment model called MTQ-Net. MPL consists of two stages: obtaining pseudo-label scores from a pretrained model and performing multi-task learning. The 3QUEST metrics, namely Speech-MOS (S-MOS), Noise-MOS (N-MOS), and General-MOS (G-MOS), are the assessment targets. The pretrained MOSA-Net model is u…

    Submitted 13 March, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted to IEEE ICASSP 2024

  4. arXiv:2204.03310  [pdf, other]

    eess.AS cs.LG cs.SD

    MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

    Authors: Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

    Abstract: Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies elaborate deep learning models to estimate intelligibility sco…

    Submitted 30 August, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to Interspeech 2022

  5. arXiv:2204.03305  [pdf, other]

    eess.AS cs.LG cs.SD

    MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

    Authors: Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

    Abstract: Improving the user's hearing ability to understand speech in noisy environments is critical to the development of hearing aid (HA) devices. For this, it is important to derive a metric that can fairly predict speech intelligibility for HA users. A straightforward approach is to conduct a subjective listening test and use the test results as an evaluation metric. However, conducting large-scale lis…

    Submitted 30 August, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to Interspeech 2022

  6. arXiv:2111.02363  [pdf, other]

    eess.AS cs.LG cs.SD

    Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features

    Authors: Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

    Abstract: In this study, we propose a cross-domain multi-objective speech assessment model called MOSA-Net, which can estimate multiple speech assessment metrics simultaneously. Experimental results show that MOSA-Net can improve the linear correlation coefficient (LCC) by 0.026 (0.990 vs 0.964 in seen noise environments) and 0.012 (0.969 vs 0.957 in unseen noise environments) in PESQ prediction, compared t…

    Submitted 23 June, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

  7. arXiv:2012.09359  [pdf]

    eess.AS cs.LG cs.SD

    Speech Enhancement with Zero-Shot Model Selection

    Authors: Ryandhimas E. Zezario, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

    Abstract: Recent research on speech enhancement (SE) has seen the emergence of deep-learning-based methods. It is still a challenging task to determine the effective ways to increase the generalizability of SE under diverse test conditions. In this study, we combine zero-shot learning and ensemble learning to propose a zero-shot model selection (ZMOS) approach to increase the generalization of SE performanc…

    Submitted 31 August, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: Accepted in EUSIPCO 2021

  8. arXiv:2011.04292  [pdf]

    cs.SD cs.LG eess.AS

    STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

    Authors: Ryandhimas E. Zezario, Szu-Wei Fu, Chiou-Shann Fuh, Yu Tsao, Hsin-Min Wang

    Abstract: The calculation of most objective speech intelligibility assessment metrics requires clean speech as a reference. Such a requirement may limit the applicability of these metrics in real-world scenarios. To overcome this limitation, we propose a deep learning-based non-intrusive speech intelligibility assessment model, namely STOI-Net. The input and output of STOI-Net are speech spectral features a…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: Accepted in APSIPA 2020