Showing 1–9 of 9 results for author: Rezagholizadeh, M

  1. arXiv:2403.08654  [pdf, other]

    eess.AS cs.SD

    An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning

    Authors: Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

    Abstract: Self-supervised speech representation learning enables the extraction of meaningful features from raw waveforms. These features can then be efficiently used across multiple downstream tasks. However, two significant issues arise when considering the deployment of such methods "in-the-wild": (i) Their large size, which can be prohibitive for edge applications; and (ii) their robustness to detrimen…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Under review at IEEE Transactions on Audio, Speech, and Language Processing (2024)

  2. arXiv:2309.14462  [pdf, ps, other]

    eess.AS cs.SD

    On the Impact of Quantization and Pruning of Self-Supervised Speech Models for Downstream Speech Recognition Tasks "In-the-Wild"

    Authors: Arthur Pimentel, Heitor Guimarães, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk

    Abstract: Recent advances with self-supervised learning have allowed speech recognition systems to achieve state-of-the-art (SOTA) word error rates (WER) while requiring only a fraction of the labeled training data needed by its predecessors. Notwithstanding, while such models achieve SOTA performance in matched train/test conditions, their performance degrades substantially when tested in unseen conditions…

    Submitted 25 September, 2023; originally announced September 2023.

  3. arXiv:2306.06819  [pdf, other]

    cs.CL cs.LG eess.AS

    Multimodal Audio-textual Architecture for Robust Spoken Language Understanding

    Authors: Anderson R. Avila, Mehdi Rezagholizadeh, Chao Xing

    Abstract: Recent voice assistants are usually based on the cascade spoken language understanding (SLU) solution, which consists of an automatic speech recognition (ASR) engine and a natural language understanding (NLU) system. Because such approach relies on the ASR output, it often suffers from the so-called ASR error propagation. In this work, we investigate impacts of this ASR error propagation on state-…

    Submitted 13 June, 2023; v1 submitted 11 June, 2023; originally announced June 2023.

  4. arXiv:2305.14546  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    On the Transferability of Whisper-based Representations for "In-the-Wild" Cross-Task Downstream Speech Applications

    Authors: Vamsikrishna Chemudupati, Marzieh Tahaei, Heitor Guimaraes, Arthur Pimentel, Anderson Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago Falk

    Abstract: Large self-supervised pre-trained speech models have achieved remarkable success across various speech-processing tasks. The self-supervised training of these models leads to universal speech representations that can be used for different downstream tasks, ranging from automatic speech recognition (ASR) to speaker identification. Recently, Whisper, a transformer-based model was proposed and traine…

    Submitted 23 May, 2023; originally announced May 2023.

  5. arXiv:2305.05443  [pdf, ps, other]

    eess.AS cs.SD

    An Exploration into the Performance of Unsupervised Cross-Task Speech Representations for "In the Wild" Edge Applications

    Authors: Heitor Guimarães, Arthur Pimentel, Anderson Avila, Mehdi Rezagholizadeh, Tiago H. Falk

    Abstract: Unsupervised speech models are becoming ubiquitous in the speech and machine learning communities. Upstream models are responsible for learning meaningful representations from raw audio. Later, these representations serve as input to downstream models to solve a number of tasks, such as keyword spotting or emotion recognition. As edge speech applications start to emerge, it is important to gauge h…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: Extended Abstract accepted in the Edge Intelligence Workshop (EIW) 2022

  6. arXiv:2302.09437  [pdf, other]

    eess.AS cs.SD

    RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness

    Authors: Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

    Abstract: Self-supervised speech pre-training enables deep neural network models to capture meaningful and disentangled factors from raw waveform signals. The learned universal speech representations can then be used across numerous downstream tasks. These representations, however, are sensitive to distribution shifts caused by environmental factors, such as noise and/or room reverberation. Their large size…

    Submitted 22 February, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  7. arXiv:2211.06562  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Improving the Robustness of DistilHuBERT to Unseen Noisy Conditions via Data Augmentation, Curriculum Learning, and Multi-Task Enhancement

    Authors: Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk

    Abstract: Self-supervised speech representation learning aims to extract meaningful factors from the speech signal that can later be used across different downstream tasks, such as speech and/or emotion recognition. Existing models, such as HuBERT, however, can be fairly large thus may not be suitable for edge speech applications. Moreover, realistic applications typically involve speech corrupted by noise…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: ENLSP-II NeurIPS Workshop 2022, 6 pages

  8. arXiv:2103.13329  [pdf, other]

    eess.AS cs.CL cs.SD

    Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks

    Authors: Md Akmal Haidar, Mehdi Rezagholizadeh

    Abstract: Adversarial training of end-to-end (E2E) ASR systems using generative adversarial networks (GAN) has recently been explored for low-resource ASR corpora. GANs help to learn the true data representation through a two-player min-max game. However, training an E2E ASR model using a large ASR corpus with a GAN framework has never been explored, because it might take excessively long time due to high-v…

    Submitted 10 March, 2021; originally announced March 2021.

    Comments: Accepted in ICASSP 2021 conference

  9. arXiv:1911.03604  [pdf, other]

    cs.CL cs.SD eess.AS

    A Simplified Fully Quantized Transformer for End-to-end Speech Recognition

    Authors: Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh

    Abstract: While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices. That being said, in this paper, we work on simplifying and compressing Transformer-based encoder-decoder architectures for the end-to-end ASR task.…

    Submitted 24 March, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

    Comments: Submitted to IEEE Signal Processing Letters. Minor changes in Section 3.