Showing 1–8 of 8 results for author: Gales, M J F

Searching in archive eess.
  1. arXiv:2311.05550  [pdf, other]

    cs.CL cs.LG eess.AS

    Towards End-to-End Spoken Grammatical Error Correction

    Authors: Stefano Bannò, Rao Ma, Mengjie Qian, Kate M. Knill, Mark J. F. Gales

    Abstract: Grammatical feedback is crucial for L2 learners, teachers, and testers. Spoken grammatical error correction (GEC) aims to supply feedback to L2 learners on their use of grammar when speaking. This process usually relies on a cascaded pipeline comprising an ASR system, disfluency removal, and GEC, with the associated concern of propagating errors between these individual modules. In this paper, we…

    Submitted 9 November, 2023; originally announced November 2023.

  2. arXiv:2307.09378  [pdf, other]

    cs.CL cs.SD eess.AS

    Adapting an ASR Foundation Model for Spoken Language Assessment

    Authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill

    Abstract: A crucial part of an accurate and reliable spoken language assessment system is the underlying ASR model. Recently, large-scale pre-trained ASR foundation models such as Whisper have been made available. As the output of these models is designed to be human readable, punctuation is added, numbers are presented in Arabic numeric form and abbreviations are included. Additionally, these models have a…

    Submitted 10 October, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Proceedings of SLaTE

  3. Adapting an Unadaptable ASR System

    Authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill

    Abstract: As speech recognition model sizes and training data requirements grow, it is increasingly common for systems to only be available via APIs from online service providers rather than having direct access to models themselves. In this scenario it is challenging to adapt systems to a specific target domain. To address this problem we consider the recently released OpenAI Whisper ASR as an example of a…

    Submitted 10 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Proceedings of INTERSPEECH

  4. arXiv:2305.12498  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Multi-Head State Space Model for Speech Recognition

    Authors: Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales

    Abstract: State space models (SSMs) have recently shown promising results on small-scale sequence and language modelling tasks, rivalling and outperforming many attention-based approaches. In this paper, we propose a multi-head state space (MH-SSM) architecture equipped with special gating mechanisms, where parallel heads are taught to learn local and global temporal dynamics on sequence data. As a drop-in…

    Submitted 25 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  5. N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space

    Authors: Rao Ma, Mark J. F. Gales, Kate M. Knill, Mengjie Qian

    Abstract: Error correction models form an important part of Automatic Speech Recognition (ASR) post-processing to improve the readability and quality of transcriptions. Most prior works use the 1-best ASR hypothesis as input and therefore can only perform correction by leveraging the context within one sentence. In this work, we propose a novel N-best T5 model for this task, which is fine-tuned from a T5 mo…

    Submitted 10 October, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Proceedings of INTERSPEECH

  6. arXiv:2211.08849  [pdf, other]

    eess.AS cs.CL

    L2 proficiency assessment using self-supervised speech representations

    Authors: Stefano Bannò, Kate M. Knill, Marco Matassoni, Vyas Raina, Mark J. F. Gales

    Abstract: There has been a growing demand for automated spoken language assessment systems in recent years. A standard pipeline for this process is to start with a speech recognition system and derive features, either hand-crafted or based on deep-learning, that exploit the transcription and audio. Though these approaches can yield high performance systems, they require speech recognition systems that can b…

    Submitted 16 November, 2022; originally announced November 2022.

  7. arXiv:1909.13695  [pdf, other]

    eess.AS cs.CL cs.SD

    Non-native Speaker Verification for Spoken Language Assessment

    Authors: Linlin Wang, Yu Wang, Mark J. F. Gales

    Abstract: Automatic spoken language assessment systems are becoming more popular in order to handle increasing interest in second language learning. One challenge for these systems is to detect malpractice. Malpractice can take a range of forms; this paper focuses on detecting when a candidate attempts to impersonate another in a speaking test. This form of malpractice is closely related to speaker verific…

    Submitted 30 September, 2019; originally announced September 2019.

  8. arXiv:1909.12289  [pdf, other]

    cs.LG cs.CL eess.AS stat.ML

    Attention Forcing for Sequence-to-sequence Model Training

    Authors: Qingyun Dou, Yiting Lu, Joshua Efiong, Mark J. F. Gales

    Abstract: Auto-regressive sequence-to-sequence models with attention mechanism have achieved state-of-the-art performance in many tasks such as machine translation and speech synthesis. These models can be difficult to train. The standard approach, teacher forcing, guides a model with reference output history during training. The problem is that the model is unlikely to recover from its mistakes during infe…

    Submitted 2 October, 2019; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: 11 pages, 4 figures, conference

    ACM Class: I.2