Showing 1–24 of 24 results for author: McLoughlin, I

Searching in archive eess.
  1. arXiv:2305.12111

    eess.AS cs.SD

    Joint Generative-Contrastive Representation Learning for Anomalous Sound Detection

    Authors: Xiao-Min Zeng, Yan Song, Zhu Zhuo, Yu Zhou, Yu-Hong Li, Hui Xue, Li-Rong Dai, Ian McLoughlin

    Abstract: In this paper, we propose a joint generative and contrastive representation learning method (GeCo) for anomalous sound detection (ASD). GeCo exploits a Predictive AutoEncoder (PAE) equipped with self-attention as a generative model to perform frame-level prediction. The output of the PAE, together with original normal samples, is used for supervised contrastive representation learning in a multi-t…

    Submitted 20 May, 2023; originally announced May 2023.

    Comments: Accepted by ICASSP2023

  2. arXiv:2303.03689

    eess.AS cs.SD

    AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer

    Authors: Kang Li, Yan Song, Li-Rong Dai, Ian McLoughlin, Xin Fang, Lin Liu

    Abstract: In this paper, we propose an effective sound event detection (SED) method based on the audio spectrogram transformer (AST) model, pretrained on the large-scale AudioSet for the audio tagging (AT) task, termed AST-SED. Pretrained AST models have recently shown promise on DCASE2022 challenge task4 where they help mitigate a lack of sufficient real annotated data. However, mainly due to differences betwe…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: accepted to ICASSP 2023

  3. arXiv:2206.08317

    cs.SD cs.CL eess.AS

    Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

    Authors: Zhifu Gao, Shiliang Zhang, Ian McLoughlin, Zhijie Yan

    Abstract: Transformers have recently dominated the ASR field. Although able to yield good performance, they involve an autoregressive (AR) decoder to generate tokens one by one, which is computationally inefficient. To speed up inference, non-autoregressive (NAR) methods, e.g. single-step NAR, were designed to enable parallel generation. However, due to an independence assumption within the output tokens,…

    Submitted 30 March, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: 5 pages, 3 figures, accepted by INTERSPEECH 2022

  4. arXiv:2201.03054

    cs.SD eess.AS

    An Ensemble of Deep Learning Frameworks Applied For Predicting Respiratory Anomalies

    Authors: Lam Pham, Dat Ngo, Truong Hoang, Alexander Schindler, Ian McLoughlin

    Abstract: In this paper, we evaluate various deep learning frameworks for detecting respiratory anomalies from input audio recordings. To this end, we first transform audio respiratory cycles collected from patients into spectrograms where both temporal and spectral features are presented, referred to as the front-end feature extraction. We then feed the spectrograms into back-end deep learning networks f…

    Submitted 9 January, 2022; originally announced January 2022.

  5. arXiv:2104.05784

    cs.SD eess.AS

    Extremely Low Footprint End-to-End ASR System for Smart Device

    Authors: Zhifu Gao, Yiwu Yao, Shiliang Zhang, Jun Yang, Ming Lei, Ian McLoughlin

    Abstract: Recently, end-to-end (E2E) speech recognition has become popular, since it can integrate the acoustic, pronunciation and language models into a single neural network, which outperforms conventional models. Among E2E approaches, attention-based models, e.g. Transformer, have emerged as being superior. Such models have opened the door to deployment of ASR on smart devices; however, they still suffer…

    Submitted 6 July, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: 5 pages, 2 figures, accepted by INTERSPEECH 2021

  6. arXiv:2103.02420

    cs.SD cs.LG eess.AS

    Multi-view Audio and Music Classification

    Authors: Huy Phan, Huy Le Nguyen, Oliver Y. Chén, Lam Pham, Philipp Koch, Ian McLoughlin, Alfred Mertins

    Abstract: We propose in this work a multi-view learning approach for audio and music classification. Considering four typical low-level representations (i.e. different views) commonly used for audio and music recognition tasks, the proposed multi-view network consists of four subnetworks, each handling one input type. The learned embeddings in the subnetworks are then concatenated to form the multi-view emb…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: Accepted to ICASSP 2021

  7. arXiv:2012.13699

    cs.SD cs.LG eess.AS

    Inception-Based Network and Multi-Spectrogram Ensemble Applied For Predicting Respiratory Anomalies and Lung Diseases

    Authors: Lam Pham, Huy Phan, Ross King, Alfred Mertins, Ian McLoughlin

    Abstract: This paper presents an inception-based deep neural network for detecting lung diseases using respiratory sound input. Recordings of respiratory sound collected from patients are first transformed into spectrograms where both spectral and temporal information are well presented, referred to as front-end feature extraction. These spectrograms are then fed into the proposed network, referred to as…

    Submitted 26 December, 2020; originally announced December 2020.

  8. arXiv:2010.14099

    cs.SD eess.AS

    Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model

    Authors: Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin

    Abstract: Recently, online end-to-end ASR has gained increasing attention. However, the performance of online systems still lags far behind that of offline systems, with a large gap in quality of recognition. For specific scenarios, we can trade off between performance and latency, and can train multiple systems with different delays to match the performance and latency requirements of various application s…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: 5 pages, 2 figures, submitted to ICASSP 2021

  9. arXiv:2010.10584

    cs.HC cs.CV eess.SP

    Incandescent Bulb and LED Brake Lights: Novel Analysis of Reaction Times

    Authors: Ramaswamy Palaniappan, Surej Mouli, Evangelina Fringi, Howard Bowman, Ian McLoughlin

    Abstract: Rear-end collisions account for around 8% of all vehicle crashes in the UK, with the failure to notice or react to a brake light signal being a major contributory cause. Meanwhile, traditional incandescent brake light bulbs on vehicles are increasingly being replaced by a profusion of designs featuring LEDs. In this paper, we investigate the efficacy of brake light design using a novel approach to…

    Submitted 20 October, 2020; originally announced October 2020.

    Comments: 10 pages, 18 figures

    Journal ref: For a revised version and its published version refer to IEEE Access journal, 2021

  10. arXiv:2010.09132

    cs.SD cs.LG eess.AS

    Self-Attention Generative Adversarial Network for Speech Enhancement

    Authors: Huy Phan, Huy Le Nguyen, Oliver Y. Chén, Philipp Koch, Ngoc Q. K. Duong, Ian McLoughlin, Alfred Mertins

    Abstract: Existing generative adversarial networks (GANs) for speech enhancement rely solely on the convolution operation, which may obscure temporal dependencies across the sequence input. To remedy this issue, we propose a self-attention layer adapted from non-local attention, coupled with the convolutional and deconvolutional layers of a speech enhancement GAN (SEGAN) using raw signal input. Further, we…

    Submitted 6 February, 2021; v1 submitted 18 October, 2020; originally announced October 2020.

    Comments: 46th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021). Source code is available at http://github.com/pquochuy/sasegan

  11. arXiv:2009.05527

    eess.AS cs.LG

    On Multitask Loss Function for Audio Event Detection and Localization

    Authors: Huy Phan, Lam Pham, Philipp Koch, Ngoc Q. K. Duong, Ian McLoughlin, Alfred Mertins

    Abstract: Audio event localization and detection (SELD) have been commonly tackled using multitask models. Such a model usually consists of a multi-label event classification branch with sigmoid cross-entropy loss for event activity detection and a regression branch with mean squared error loss for direction-of-arrival estimation. In this work, we propose a multitask regression model, in which both (multi-l…

    Submitted 11 September, 2020; originally announced September 2020.

    Comments: Accepted for publication in DCASE 2020 Workshop

  12. arXiv:2006.01713

    cs.SD eess.AS

    SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition

    Authors: Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin

    Abstract: End-to-end speech recognition has become popular in recent years, since it can integrate the acoustic, pronunciation and language models into a single neural network. Among end-to-end approaches, attention-based methods have emerged as being superior. One example is Transformer, which adopts an encoder-decoder architecture. The key improvement introduced by Transformer is the utilization of self-att…

    Submitted 20 May, 2020; originally announced June 2020.

    Comments: submitted to INTERSPEECH2020

  13. arXiv:2004.04072

    eess.AS cs.LG cs.SD stat.ML

    CNN-MoE based framework for classification of respiratory anomalies and lung disease detection

    Authors: Lam Pham, Huy Phan, Ramaswamy Palaniappan, Alfred Mertins, Ian McLoughlin

    Abstract: This paper presents and explores a robust deep learning framework for auscultation analysis. This aims to classify anomalies in respiratory cycles and detect disease from respiratory sound recordings. The framework begins with front-end feature extraction that transforms input sound into a spectrogram representation. Then, a back-end deep learning network is used to classify the spectrogram featu…

    Submitted 2 June, 2020; v1 submitted 4 April, 2020; originally announced April 2020.

  14. arXiv:2002.04857

    eess.AS cs.SD

    Deep Feature Embedding and Hierarchical Classification for Audio Scene Classification

    Authors: Lam Pham, Ian McLoughlin, Huy Phan, Ramaswamy Palaniappan, Alfred Mertins

    Abstract: In this work, we propose an approach that features deep feature embedding learning and hierarchical classification with triplet loss function for Acoustic Scene Classification (ASC). On the one hand, a deep convolutional neural network is first trained to learn a feature embedding from scene audio signals. Via the trained convolutional neural network, the learned embedding embeds an input into t…

    Submitted 12 February, 2020; originally announced February 2020.

  15. arXiv:2002.04502

    cs.SD eess.AS

    Robust Acoustic Scene Classification using a Multi-Spectrogram Encoder-Decoder Framework

    Authors: Lam Pham, Huy Phan, Truc Nguyen, Ramaswamy Palaniappan, Alfred Mertins, Ian McLoughlin

    Abstract: This article proposes an encoder-decoder network model for Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. We make use of multiple low-level spectrogram features at the front-end, transformed into higher level features through a well-trained CNN-DNN front-end encoder. The high level features and their combination (via a trai…

    Submitted 11 February, 2020; originally announced February 2020.

  16. arXiv:2002.03894

    cs.SD cs.LG eess.AS

    Robust Deep Learning Framework For Predicting Respiratory Anomalies and Diseases

    Authors: Lam Pham, Ian McLoughlin, Huy Phan, Minh Tran, Truc Nguyen, Ramaswamy Palaniappan

    Abstract: This paper presents a robust deep learning framework developed to detect respiratory diseases from recordings of respiratory sounds. The complete detection process first involves front-end feature extraction where recordings are transformed into spectrograms that convey both spectral and temporal information. Then a back-end deep learning model classifies the features into classes of respiratory…

    Submitted 21 January, 2020; originally announced February 2020.

  17. arXiv:2001.05532

    cs.LG cs.SD eess.AS stat.ML

    Improving GANs for Speech Enhancement

    Authors: Huy Phan, Ian V. McLoughlin, Lam Pham, Oliver Y. Chén, Philipp Koch, Maarten De Vos, Alfred Mertins

    Abstract: Generative adversarial networks (GAN) have recently been shown to be efficient for speech enhancement. However, most, if not all, existing speech enhancement GANs (SEGAN) make use of a single generator to perform one-stage enhancement mapping. In this work, we propose to use multiple generators that are chained to perform multi-stage enhancement mapping, which gradually refines the noisy input sig…

    Submitted 12 September, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: This letter has been accepted for publication in IEEE Signal Processing Letters

  18. arXiv:1912.09003

    eess.AS cs.CL cs.SD

    LSTM-TDNN with convolutional front-end for Dialect Identification in the 2019 Multi-Genre Broadcast Challenge

    Authors: Xiaoxiao Miao, Ian McLoughlin

    Abstract: This paper presents a novel Dialect Identification (DID) system developed for the Fifth Edition of the Multi-Genre Broadcast challenge, the task of Fine-grained Arabic Dialect Identification (MGB-5 ADI Challenge). The system improves upon traditional DNN x-vector performance by employing a Convolutional and Long Short Term Memory-Recurrent (CLSTM) architecture to combine the benefits of a convolut…

    Submitted 18 December, 2019; originally announced December 2019.

  19. arXiv:1907.13177

    cs.LG eess.SP stat.ML

    Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning

    Authors: Huy Phan, Oliver Y. Chén, Philipp Koch, Zongqing Lu, Ian McLoughlin, Alfred Mertins, Maarten De Vos

    Abstract: Background: Despite recent significant progress in the development of automatic sleep staging methods, building a good model still remains a big challenge for sleep studies with a small cohort due to the data-variability and data-inefficiency issues. This work presents a deep transfer learning approach to overcome these issues and enable transferring knowledge from a large dataset to a small cohor…

    Submitted 27 August, 2020; v1 submitted 30 July, 2019; originally announced July 2019.

    Comments: This article has been published in IEEE Transactions on Biomedical Engineering

  20. arXiv:1904.03543

    cs.SD cs.LG eess.AS stat.ML

    Spatio-Temporal Attention Pooling for Audio Scene Classification

    Authors: Huy Phan, Oliver Y. Chén, Lam Pham, Philipp Koch, Maarten De Vos, Ian McLoughlin, Alfred Mertins

    Abstract: Acoustic scenes are rich and redundant in their content. In this work, we present a spatio-temporal attention pooling layer coupled with a convolutional recurrent neural network to learn from patterns that are discriminative while suppressing those that are irrelevant for acoustic scene classification. The convolutional layers in this network learn invariant features from time-frequency input. The…

    Submitted 28 June, 2019; v1 submitted 6 April, 2019; originally announced April 2019.

    Comments: To appear at the 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019)

  21. arXiv:1811.01095

    cs.SD cs.LG eess.AS

    Beyond Equal-Length Snippets: How Long is Sufficient to Recognize an Audio Scene?

    Authors: Huy Phan, Oliver Y. Chén, Philipp Koch, Lam Pham, Ian McLoughlin, Alfred Mertins, Maarten De Vos

    Abstract: Due to the variability in characteristics of audio scenes, some scenes can naturally be recognized earlier than others. In this work, rather than using equal-length snippets for all scene categories, as is common in the literature, we study to which temporal extent an audio scene can be reliably recognized given state-of-the-art models. Moreover, as model fusion with deep network ensemble is preva…

    Submitted 8 May, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

    Comments: Accepted to 2019 AES Conference on Audio Forensics

  22. arXiv:1811.01092

    cs.LG cs.SD eess.AS stat.ML

    Unifying Isolated and Overlapping Audio Event Detection with Multi-Label Multi-Task Convolutional Recurrent Neural Networks

    Authors: Huy Phan, Oliver Y. Chén, Philipp Koch, Lam Pham, Ian McLoughlin, Alfred Mertins, Maarten De Vos

    Abstract: We propose a multi-label multi-task framework based on a convolutional recurrent neural network to unify detection of isolated and overlapping audio events. The framework leverages the power of convolutional recurrent neural network architectures; convolutional layers learn effective features over which higher recurrent layers perform sequential modelling. Furthermore, the output layer is designed…

    Submitted 18 February, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

    Comments: Accepted for the 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019)

  23. On the Use of a Spectral Glottal Model for the Source-filter Separation of Speech

    Authors: Olivier Perrotin, Ian Vince McLoughlin

    Abstract: The estimation of glottal flow from a speech waveform is a key method for speech analysis and parameterization. Significant research effort has been made to dissociate the first vocal tract resonance from the glottal formant (the low-frequency resonance describing the open-phase of the vocal fold vibration). However, few methods cope with estimation of high-frequency spectral tilt to describe the r…

    Submitted 21 December, 2017; originally announced December 2017.

    Comments: 8 pages, 4 figures

  24. arXiv:1712.02116

    cs.SD cs.LG eess.AS

    Enabling Early Audio Event Detection with Neural Networks

    Authors: Huy Phan, Philipp Koch, Ian McLoughlin, Alfred Mertins

    Abstract: This paper presents a methodology for early detection of audio events from audio streams. Early detection is the ability to infer an ongoing event during its initial stage. The proposed system consists of a novel inference step coupled with dual parallel tailored-loss deep neural networks (DNNs). The DNNs share a similar architecture except for their loss functions, i.e. weighted loss and multitas…

    Submitted 6 April, 2019; v1 submitted 6 December, 2017; originally announced December 2017.

    Comments: Published version available at https://ieeexplore.ieee.org/document/8461859

    Journal ref: Published in Proceedings of 43rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 141-145, 2018