Showing 1–7 of 7 results for author: Loweimi, E

  1. arXiv:2406.00898  [pdf, other]

    cs.SD cs.CL eess.AS

    Phonetic Error Analysis of Raw Waveform Acoustic Models with Parametric and Non-Parametric CNNs

    Authors: Erfan Loweimi, Andrea Carmantini, Peter Bell, Steve Renals, Zoran Cvetkovic

    Abstract: In this paper, we analyse the error patterns of the raw waveform acoustic models in TIMIT's phone recognition task. Our analysis goes beyond the conventional phone error rate (PER) metric. We categorise the phones into three groups: {affricate, diphthong, fricative, nasal, plosive, semi-vowel, vowel, silence}, {consonant, vowel+, silence}, and {voiced, unvoiced, silence} and, compute the PER for e…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 5 pages, 6 figures, 3 tables

  2. arXiv:2110.11144  [pdf, other]

    eess.AS cs.LG cs.SD

    RCT: Random Consistency Training for Semi-supervised Sound Event Detection

    Authors: Nian Shao, Erfan Loweimi, Xiaofei Li

    Abstract: Sound event detection (SED), as a core module of acoustic environmental analysis, suffers from the problem of data deficiency. The integration of semi-supervised learning (SSL) largely mitigates such problem while bringing no extra annotation budget. This paper researches on several core modules of SSL, and introduces a random consistency training (RCT) strategy. First, a self-consistency loss is…

    Submitted 27 March, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: Preprint for Interspeech 2022

  3. arXiv:2102.04697  [pdf, other]

    eess.AS cs.AI cs.SD

    Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers

    Authors: Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals

    Abstract: Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset. That is, in general, freezing the trained feature extractor (the lower layers) and retraining the classifier (the upper layers) on the same dataset leads to worse performance. In this paper, for the first time, we show that the frozen…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: Accepted by ICASSP 2021

  4. arXiv:2011.04906  [pdf, other]

    cs.CL cs.SD eess.AS

    On the Usefulness of Self-Attention for Automatic Speech Recognition with Transformers

    Authors: Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

    Abstract: Self-attention models such as Transformers, which can capture temporal relationships without being limited by the distance between events, have given competitive speech recognition results. However, we note the range of the learned context increases from the lower to upper self-attention layers, whilst acoustic events often happen within short time spans in a left-to-right order. This leads to a q…

    Submitted 8 November, 2020; originally announced November 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:2005.13895

  5. arXiv:2011.04004  [pdf, other]

    cs.CL cs.SD eess.AS

    Stochastic Attention Head Removal: A simple and effective method for improving Transformer Based ASR Models

    Authors: Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

    Abstract: Recently, Transformer based models have shown competitive automatic speech recognition (ASR) performance. One key factor in the success of these models is the multi-head attention mechanism. However, for trained models, we have previously observed that many attention matrices are close to diagonal, indicating the redundancy of the corresponding attention heads. We have also found that some archite…

    Submitted 6 April, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

  6. arXiv:2005.13895  [pdf, other]

    eess.AS cs.CL cs.SD

    When Can Self-Attention Be Replaced by Feed Forward Layers?

    Authors: Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

    Abstract: Recently, self-attention models such as Transformers have given competitive results compared to recurrent neural network systems in speech recognition. The key factor for the outstanding performance of self-attention models is their ability to capture temporal relationships without being limited by the distance between two related events. However, we note that the range of the learned context prog…

    Submitted 28 May, 2020; originally announced May 2020.

  7. arXiv:1909.13759  [pdf, other]

    eess.AS cs.CL cs.SD

    Acoustic Model Adaptation from Raw Waveforms with SincNet

    Authors: Joachim Fainberg, Ondřej Klejch, Erfan Loweimi, Peter Bell, Steve Renals

    Abstract: Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features. SincNet has been proposed to reduce the number of parameters required in raw-waveform modelling, by restricting the filter functions, rather than having to learn every tap of e…

    Submitted 30 September, 2019; originally announced September 2019.

    Comments: Accepted to IEEE ASRU 2019