Skip to main content

Showing 1–9 of 9 results for author: Garcia, L P

Searching in archive eess. Search in all archives.
.
  1. arXiv:2403.05887  [pdf, other

    eess.AS

    Aligning Speech to Languages to Enhance Code-switching Speech Recognition

    Authors: Hexin Liu, Xiangyu Zhang, Leibny Paola Garcia, Andy W. H. Khong, Eng Siong Chng, Shinji Watanabe

    Abstract: Code-switching (CS) refers to the switching of languages within a speech signal and results in language confusion for automatic speech recognition (ASR). To address language confusion, we propose the language alignment loss that performs frame-level language identification using pseudo language labels learned from the ASR decoder. This eliminates the need for frame-level language annotations. To f… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Manuscript submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  2. arXiv:2402.10642  [pdf, other

    eess.AS cs.AI

    Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model

    Authors: Xiangyu Zhang, Daijiao Liu, Hexin Liu, Qiquan Zhang, Hanyu Meng, Leibny Paola Garcia, Eng Siong Chng, Lina Yao

    Abstract: Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks. However, in the field of speech synthesis, although DDPMs exhibit impressive performance, their long training duration and substantial inference costs hinder practical deployment. Existing approaches primarily focus on enhancing inference speed, while approaches… ▽ More

    Submitted 16 February, 2024; originally announced February 2024.

  3. arXiv:2311.15954  [pdf, other

    cs.CL eess.AS

    A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

    Authors: Shuyue Stella Li, Beining Xu, Xiangyu Zhang, Hexin Liu, Wenhan Chao, Leibny Paola Garcia

    Abstract: In this work, we study the features extracted by English self-supervised learning (SSL) models in cross-lingual contexts and propose a new metric to predict the quality of feature representations. Using automatic speech recognition (ASR) as a downstream task, we analyze the effect of model size, training objectives, and model architecture on the models' performance as a feature extractor for a set… ▽ More

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 12 pages, 5 figures, 4 tables

  4. arXiv:2309.16953  [pdf, other

    eess.AS cs.SD

    Enhancing Code-switching Speech Recognition with Interactive Language Biases

    Authors: Hexin Liu, Leibny Paola Garcia, Xiangyu Zhang, Andy W. H. Khong, Sanjeev Khudanpur

    Abstract: Languages usually switch within a multilingual speech signal, especially in a bilingual society. This phenomenon is referred to as code-switching (CS), making automatic speech recognition (ASR) challenging under a multilingual scenario. We propose to improve CS-ASR by biasing the hybrid CTC/attention ASR model with multi-level language information comprising frame- and token-level language posteri… ▽ More

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE ICASSP 2024

  5. arXiv:2306.01031  [pdf, other

    cs.CL cs.LG cs.SD eess.AS

    Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

    Authors: Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur

    Abstract: This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the performance of ASR models. To address this problem, we propose Bypass Temporal Classification (BTC) as an expansion of the Connectionist Temporal Classification (CTC) cr… ▽ More

    Submitted 1 June, 2023; originally announced June 2023.

  6. arXiv:2211.17196  [pdf, other

    cs.CL cs.SD eess.AS

    EURO: ESPnet Unsupervised ASR Open-source Toolkit

    Authors: Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by the Wav2vec-U, originally implemented at FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extend… ▽ More

    Submitted 20 May, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

  7. arXiv:2210.14567  [pdf, other

    eess.AS cs.SD

    Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization

    Authors: Hexin Liu, Haihua Xu, Leibny Paola Garcia, Andy W. H. Khong, Yi He, Sanjeev Khudanpur

    Abstract: Code-switching (CS) refers to the phenomenon that languages switch within a speech signal and leads to language confusion for automatic speech recognition (ASR). This paper aims to address language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information in the CS-ASR model by dynamically biasing the model with… ▽ More

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  8. arXiv:2210.11658  [pdf, other

    eess.SP

    A New Approach to Extract Fetal Electrocardiogram Using Affine Combination of Adaptive Filters

    Authors: Yu Xuan, Xiangyu Zhang, Shuyue Stella Li, Zihan Shen, Xin Xie, Leibny Paola Garcia, Roberto Togneri

    Abstract: The detection of abnormal fetal heartbeats during pregnancy is important for monitoring the health conditions of the fetus. While adult ECG has made several advances in modern medicine, noninvasive fetal electrocardiography (FECG) remains a great challenge. In this paper, we introduce a new method based on affine combinations of adaptive filters to extract FECG signals. The affine combination of m… ▽ More

    Submitted 26 February, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: 5 pages, 4 figures, 3 tables

  9. arXiv:2209.12702  [pdf, other

    eess.AS cs.SD

    End-to-End Lyrics Recognition with Self-supervised Learning

    Authors: Xiangyu Zhang, Shuyue Stella Li, Zhanhong He, Roberto Togneri, Leibny Paola Garcia

    Abstract: Lyrics recognition is an important task in music processing. Despite traditional algorithms such as the hybrid HMM- TDNN model achieving good performance, studies on applying end-to-end models and self-supervised learning (SSL) are limited. In this paper, we first establish an end-to-end baseline for lyrics recognition and then explore the performance of SSL models on lyrics recognition task. We e… ▽ More

    Submitted 26 October, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: 4 pages, 2 figures, 3 tables