Showing 1–5 of 5 results for author: Ismail, M A

Searching in archive eess.
  1. arXiv:2402.00282  [pdf, other]

    eess.AS cs.SD

    PAM: Prompting Audio-Language Models for Audio Quality Assessment

    Authors: Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang

    Abstract: While audio quality is a key performance metric for various audio processing tasks, including generative modeling, its objective measurement remains a challenge. Audio-Language Models (ALMs) are pre-trained on audio-text pairs that may contain information about audio quality, the presence of artifacts, or noise. Given an audio input and a text prompt related to quality, an ALM can be used to calcu…

    Submitted 31 January, 2024; originally announced February 2024.

  2. arXiv:2308.06327  [pdf, other]

    eess.AS cs.CL cs.SD

    Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss

    Authors: Mohammad Soleymanpour, Mahmoud Al Ismail, Fahimeh Bahmaninezhad, Kshitiz Kumar, Jian Wu

    Abstract: We introduce a bilingual solution to support English as secondary locale for most primary locales in hybrid automatic speech recognition (ASR) settings. Our key developments constitute: (a) pronunciation lexicon with grapheme units instead of phone units, (b) a fully bilingual alignment model and subsequently bilingual streaming transformer model, (c) a parallel encoder structure with language ide…

    Submitted 11 August, 2023; originally announced August 2023.

  3. arXiv:2206.04769  [pdf, other]

    cs.SD eess.AS

    CLAP: Learning Audio Concepts From Natural Language Supervision

    Authors: Benjamin Elizalde, Soham Deshmukh, Mahmoud Al Ismail, Huaming Wang

    Abstract: Mainstream Audio Analytics models are trained to learn under the paradigm of one class label to many recordings focusing on one task. Learning under such restricted supervision limits the flexibility of models because they require labeled audio for training and can only predict the predefined categories. Instead, we propose to learn audio concepts from natural language supervision. We call our app…

    Submitted 9 June, 2022; originally announced June 2022.

  4. Interpreting glottal flow dynamics for detecting COVID-19 from voice

    Authors: Soham Deshmukh, Mahmoud Al Ismail, Rita Singh

    Abstract: In the pathogenesis of COVID-19, impairment of respiratory functions is often one of the key symptoms. Studies show that in these cases, voice production is also adversely affected -- vocal fold oscillations are asynchronous, asymmetrical and more restricted during phonation. This paper proposes a method that analyzes the differential dynamics of the glottal flow waveform (GFW) during voice produc…

    Submitted 29 October, 2020; originally announced October 2020.

  5. arXiv:2010.10707  [pdf, other]

    eess.AS cs.LG cs.SD

    Detection of COVID-19 through the analysis of vocal fold oscillations

    Authors: Mahmoud Al Ismail, Soham Deshmukh, Rita Singh

    Abstract: Phonation, or the vibration of the vocal folds, is the primary source of vocalization in the production of voiced sounds by humans. It is a complex bio-mechanical process that is highly sensitive to changes in the speaker's respiratory parameters. Since most symptomatic cases of COVID-19 present with moderate to severe impairment of respiratory functions, we hypothesize that signatures of COVID-19…

    Submitted 20 October, 2020; originally announced October 2020.

    Comments: 5 pages, 6 figures