Showing 1–6 of 6 results for author: Perera, L P G

  1. arXiv:2309.15796  [pdf, other]

    eess.AS cs.CL cs.LG

    Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition

    Authors: Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola Garcia Perera, Daniel Povey, Sanjeev Khudanpur

    Abstract: Training automatic speech recognition (ASR) systems requires large amounts of well-curated paired data. However, human annotators usually perform "non-verbatim" transcription, which can result in poorly trained models. In this paper, we propose Omni-temporal Classification (OTC), a novel training criterion that explicitly incorporates label uncertainties originating from such weak supervision. Thi…

    Submitted 26 September, 2023; originally announced September 2023.

  2. arXiv:2305.19493  [pdf]

    eess.AS

    MERLIon CCS Challenge Evaluation Plan

    Authors: Leibny Paola Garcia Perera, Y. H. Victoria Chua, Hexin Liu, Fei Ting Woon, Andy W. H. Khong, Justin Dauwels, Sanjeev Khudanpur, Suzy J. Styles

    Abstract: This paper introduces the inaugural Multilingual Everyday Recordings- Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge, focused on developing robust language identification and language diarization systems that are reliable for non-standard, accented, spontaneous code-switched, child-directed speech collected via Zoom. Aligning closely with Interspeech 2023 th…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Evaluation plan for Interspeech 2023 special session "MERLIon"

  3. arXiv:2305.18925  [pdf, other]

    eess.AS cs.CL cs.SD

    Investigating model performance in language identification: beyond simple error statistics

    Authors: Suzy J. Styles, Victoria Y. H. Chua, Fei Ting Woon, Hexin Liu, Leibny Paola Garcia Perera, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels

    Abstract: Language development experts need tools that can automatically identify languages from fluent, conversational speech, and provide reliable estimates of usage rates at the level of an individual recording. However, language identification systems are typically evaluated on metrics such as equal error rate and balanced accuracy, applied at the level of an entire speech corpus. These overview metrics…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023, 5 pages, 5 figures

  4. arXiv:2305.18881  [pdf, other]

    eess.AS

    MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization

    Authors: Victoria Y. H. Chua, Hexin Liu, Leibny Paola Garcia Perera, Fei Ting Woon, Jinyi Wong, Xiangyu Zhang, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles

    Abstract: To enhance the reliability and robustness of language identification (LID) and language diarization (LD) systems for heterogeneous populations and scenarios, there is a need for speech processing models to be trained on datasets that feature diverse language registers and speech patterns. We present the MERLIon CCS challenge, featuring a first-of-its-kind Zoom video call dataset of parent-child sh…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023, 5 pages, 2 figures, 3 tables

  5. arXiv:2203.12366  [pdf, other]

    eess.AS

    PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification

    Authors: Hexin Liu, Leibny Paola Garcia Perera, Andy W. H. Khong, Suzy J. Styles, Sanjeev Khudanpur

    Abstract: We propose a novel model to hierarchically incorporate phoneme and phonotactic information for language identification (LID) without requiring phoneme annotations for training. In this model, named PHO-LID, a self-supervised phoneme segmentation task and a LID task share a convolutional neural network (CNN) module, which encodes both language identity and sequential phonemic information in the inp…

    Submitted 31 March, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022, updated to the submitted version

  6. arXiv:2203.03218  [pdf, other]

    eess.AS cs.CL cs.SD

    Enhance Language Identification using Dual-mode Model with Knowledge Distillation

    Authors: Hexin Liu, Leibny Paola Garcia Perera, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles, Sanjeev Khudanpur

    Abstract: In this paper, we propose to employ a dual-mode framework on the x-vector self-attention (XSA-LID) model with knowledge distillation (KD) to enhance its language identification (LID) performance for both long and short utterances. The dual-mode XSA-LID model is trained by jointly optimizing both the full and short modes with their respective inputs being the full-length speech and its short clip e…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: Submitted to Odyssey 2022