Skip to main content

Showing 1–6 of 6 results for author: Tseng, L

Searching in archive eess. Search in all archives.
.
  1. arXiv:2405.12847  [pdf, other

    cs.IR cs.LG cs.MM cs.SD eess.AS

    A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability

    Authors: Li-Yang Tseng, Tzu-Ling Lin, Hong-Han Shuai, Jen-Wei Huang, Wen-Whei Chang

    Abstract: Nowadays, humans are constantly exposed to music, whether through voluntary streaming services or incidental encounters during commercial breaks. Despite the abundance of music, certain pieces remain more memorable and often gain greater popularity. Inspired by this phenomenon, we focus on measuring and predicting music memorability. To achieve this, we collect a new music piece dataset with relia… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Journal ref: Proceedings of the 24th International Society for Music Information Retrieval Conference, 174-181. Milan, Italy, November 5-9, 2023

  2. arXiv:2402.03988  [pdf, other

    eess.AS cs.CL cs.SD

    REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR

    Authors: Liang-Hsuan Tseng, En-Pei Hu, Cheng-Han Chiang, Yuan Tseng, Hung-yi Lee, Lin-shan Lee, Shao-Hua Sun

    Abstract: Unsupervised automatic speech recognition (ASR) aims to learn the map** between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word/phoneme in the speech signal is represented by a segment of speech signal with variable length and unknown boundary, and this segmental structure makes learning the map** between speech and text… ▽ More

    Submitted 28 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  3. arXiv:2305.07455  [pdf, other

    cs.CL cs.SD eess.AS

    Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation

    Authors: Yu-Kuan Fu, Liang-Hsuan Tseng, Jiatong Shi, Chen-An Li, Tsu-Yuan Hsu, Shinji Watanabe, Hung-yi Lee

    Abstract: Most of the speech translation models heavily rely on parallel data, which is hard to collect especially for low-resource languages. To tackle this issue, we propose to build a cascaded speech translation system without leveraging any kind of paired data. We use fully unpaired data to train our unsupervised systems and evaluate our results on CoVoST 2 and CVSS. The results show that our work is co… ▽ More

    Submitted 12 May, 2023; originally announced May 2023.

  4. arXiv:2211.08402  [pdf, other

    cs.CL cs.LG cs.SD eess.AS

    Introducing Semantics into Speech Encoders

    Authors: Derek Xu, Shuyan Dong, Changhan Wang, Suyoun Kim, Zhaojiang Lin, Akshat Shrivastava, Shang-Wen Li, Liang-Hsuan Tseng, Alexei Baevski, Guan-Ting Lin, Hung-yi Lee, Yizhou Sun, Wei Wang

    Abstract: Recent studies find existing self-supervised speech encoders contain primarily acoustic rather than semantic information. As a result, pipelined supervised automatic speech recognition (ASR) to large language model (LLM) systems achieve state-of-the-art results on semantic spoken language tasks by utilizing rich semantic representations from the LLM. These systems come at the cost of labeled audio… ▽ More

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: 11 pages, 3 figures

  5. arXiv:2210.07978  [pdf, other

    cs.SD cs.CL eess.AS

    Improving generalizability of distilled self-supervised speech processing models under distorted settings

    Authors: Kuan-Po Huang, Yu-Kuan Fu, Tsu-Yuan Hsu, Fabian Ritter Gutierrez, Fan-Lin Wang, Liang-Hsuan Tseng, Yu Zhang, Hung-yi Lee

    Abstract: Self-supervised learned (SSL) speech pre-trained models perform well across various speech processing tasks. Distilled versions of SSL models have been developed to match the needs of on-device speech applications. Though having similar performance as original SSL models, distilled counterparts suffer from performance degradation even more than their original versions in distorted environments. Th… ▽ More

    Submitted 20 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted by IEEE SLT2022

  6. arXiv:2110.03504  [pdf, other

    cs.CL eess.AS

    Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models

    Authors: Liang-Hsuan Tseng, Yu-Kuan Fu, Heng-Jui Chang, Hung-yi Lee

    Abstract: Code-switching (CS) is common in daily conversations where more than one language is used within a sentence. The difficulties of CS speech recognition lie in alternating languages and the lack of transcribed data. Therefore, this paper uses the recently successful self-supervised learning (SSL) methods to leverage many unlabeled speech data without CS. We show that hidden representations of SSL mo… ▽ More

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022