Skip to main content

Showing 1–12 of 12 results for author: Tseng, W

Searching in archive eess. Search in all archives.
.
  1. arXiv:2404.09385  [pdf, other

    eess.AS cs.CL eess.SP

    A Large-Scale Evaluation of Speech Foundation Models

    Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

    Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work,… ▽ More

    Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The Arxiv version is preferred

  2. arXiv:2311.15582  [pdf, other

    cs.SD cs.LG eess.AS

    Lightly Weighted Automatic Audio Parameter Extraction for the Quality Assessment of Consensus Auditory-Perceptual Evaluation of Voice

    Authors: Yi-Heng Lin, Wen-Hsuan Tseng, Li-Chin Chen, Ching-Ting Tan, Yu Tsao

    Abstract: The Consensus Auditory-Perceptual Evaluation of Voice is a widely employed tool in clinical voice quality assessment that is significant for streaming communication among clinical professionals and benchmarking for the determination of further treatment. Currently, because the assessment relies on experienced clinicians, it tends to be inconsistent, and thus, difficult to standardize. To address t… ▽ More

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Published in IEEE 42th International Conference on Consumer Electronics (ICCE 2024)

  3. arXiv:2303.12861  [pdf, other

    eess.IV cs.LG eess.SP physics.bio-ph

    Parallel Diffusion Model-based Sparse-view Cone-beam Breast CT

    Authors: Wenjun Xia, Hsin Wu Tseng, Chuang Niu, Wenxiang Cong, Xiaohua Zhang, Shaohua Liu, Ruola Ning, Srinivasan Vedantham, Ge Wang

    Abstract: Breast cancer is the most prevalent cancer among women worldwide, and early detection is crucial for reducing its mortality rate and improving quality of life. Dedicated breast computed tomography (CT) scanners offer better image quality than mammography and tomosynthesis in general but at higher radiation dose. To enable breast CT for cancer screening, the challenge is to minimize the radiation d… ▽ More

    Submitted 28 January, 2024; v1 submitted 22 March, 2023; originally announced March 2023.

  4. arXiv:2303.00733  [pdf, other

    eess.AS cs.AI cs.CL cs.LG cs.SD

    SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks

    Authors: Kai-Wei Chang, Yu-Kai Wang, Hua Shen, Iu-thing Kang, Wei-Cheng Tseng, Shang-Wen Li, Hung-yi Lee

    Abstract: Prompt tuning is a technology that tunes a small set of parameters to steer a pre-trained language model (LM) to directly generate the output for downstream tasks. Recently, prompt tuning has demonstrated its storage and computation efficiency in both natural language processing (NLP) and speech processing fields. These advantages have also revealed prompt tuning as a candidate approach to serving… ▽ More

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: Project website: https://ga642381.github.io/SpeechPrompt

  5. arXiv:2302.12757  [pdf, other

    eess.AS cs.CL cs.SD

    Ensemble knowledge distillation of self-supervised speech models

    Authors: Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-yi Lee

    Abstract: Distilled self-supervised models have shown competitive performance and efficiency in recent years. However, there is a lack of experience in jointly distilling multiple self-supervised speech models. In our work, we performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM. We tried two different aggregation techniques, layerw… ▽ More

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  6. arXiv:2204.03219  [pdf, other

    eess.AS cs.LG cs.SD

    DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores

    Authors: Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee

    Abstract: Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis systems. Since collecting MOS is time-consuming, it would be desirable if there are accurate MOS prediction models for automatic evaluation. In this work, we propose DDOS, a novel MOS prediction model. DDOS utilizes domain adaptive pre-training to further pre-train self-supervised learning models on synthetic s… ▽ More

    Submitted 15 August, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to Interspeech 2022. Code will be available in the future

  7. arXiv:2203.16773  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks

    Authors: Kai-Wei Chang, Wei-Cheng Tseng, Shang-Wen Li, Hung-yi Lee

    Abstract: Speech representations learned from Self-supervised learning (SSL) models can benefit various speech processing tasks. However, utilizing SSL representations usually requires fine-tuning the pre-trained models or designing task-specific downstream models and loss functions, causing much memory usage and human labor. Recently, prompting in Natural Language Processing (NLP) has been found to be an e… ▽ More

    Submitted 10 July, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted to be published in the Proceedings of Interspeech 2022

  8. arXiv:2111.05113  [pdf, other

    cs.CR cs.LG cs.SD eess.AS

    Membership Inference Attacks Against Self-supervised Speech Models

    Authors: Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee

    Abstract: Recently, adapting the idea of self-supervised learning (SSL) on continuous speech has started gaining attention. SSL models pre-trained on a huge amount of unlabeled audio can generate general-purpose representations that benefit a wide variety of speech processing tasks. Despite their ubiquitous deployment, however, the potential privacy risks of these models have not been well investigated. In… ▽ More

    Submitted 15 August, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

    Comments: Accepted to Interspeech 2022. Code will be available in the future

  9. arXiv:2105.01051  [pdf, ps, other

    cs.CL cs.SD eess.AS

    SUPERB: Speech processing Universal PERformance Benchmark

    Authors: Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for various tasks with minimal adaptation. However, the speech processing community lacks a similar setup to systematically explore the paradigm. To bridge… ▽ More

    Submitted 15 October, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: To appear in Interspeech 2021

  10. arXiv:2104.03017  [pdf, other

    eess.AS cs.LG cs.SD

    Utilizing Self-supervised Representations for MOS Prediction

    Authors: Wei-Cheng Tseng, Chien-yu Huang, Wei-Tsung Kao, Yist Y. Lin, Hung-yi Lee

    Abstract: Speech quality assessment has been a critical issue in speech processing for decades. Existing automatic evaluations usually require clean references or parallel ground truth data, which is infeasible when the amount of data soars. Subjective tests, on the other hand, do not need any additional clean or parallel data and correlates better to human perception. However, such a test is expensive and… ▽ More

    Submitted 20 September, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: In Proceedings of Interspeech 2021. We acknowledge the support of AWS Machine Learning Research Awards program. Source code available at https://github.com/s3prl/s3prl/tree/master/s3prl/downstream/mos_prediction

  11. arXiv:2011.02882  [pdf

    cs.SD cs.CL eess.AS

    Query Expansion System for the VoxCeleb Speaker Recognition Challenge 2020

    Authors: Yu-Sen Cheng, Chun-Liang Shih, Tien-Hong Lo, Wen-Ting Tseng, Berlin Chen

    Abstract: In this report, we describe our submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. Two approaches are adopted. One is to apply query expansion on speaker verification, which shows significant progress compared to baseline in the study. Another is to use Kaldi extract x-vector and to combine its Probabilistic Linear Discriminant Analysis (PLDA) score with ResNet score.

    Submitted 4 November, 2020; originally announced November 2020.

  12. arXiv:2007.11784  [pdf

    eess.IV cs.CV cs.LG

    Deep Learning Based Segmentation of Various Brain Lesions for Radiosurgery

    Authors: Siang-Ruei Wu, Hao-Yun Chang, Florence T Su, Heng-Chun Liao, Wanju Tseng, Chun-Chih Liao, Feipei Lai, Feng-Ming Hsu, Furen Xiao

    Abstract: Semantic segmentation of medical images with deep learning models is rapidly developed. In this study, we benchmarked state-of-the-art deep learning segmentation algorithms on our clinical stereotactic radiosurgery dataset, demonstrating the strengths and weaknesses of these algorithms in a fairly practical scenario. In particular, we compared the model performances with respect to their sampling… ▽ More

    Submitted 22 July, 2020; originally announced July 2020.