Skip to main content

Showing 1–10 of 10 results for author: Dhamyal, H

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.10083  [pdf, other

    cs.CL cs.SD eess.AS

    On the Evaluation of Speech Foundation Models for Spoken Language Understanding

    Authors: Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe

    Abstract: The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking of complex spoken language understanding (SLU) tasks, including both classification and sequence generation tasks, on natural speech. The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFM) for th… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL Findings 2024

  2. arXiv:2310.02298  [pdf, other

    cs.SD cs.AI eess.AS

    Prompting Audios Using Acoustic Properties For Emotion Representation

    Authors: Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

    Abstract: Emotions lie on a continuum, but current models treat emotions as a finite valued discrete variable. This representation does not capture the diversity in the expression of emotion. To better represent emotions we propose the use of natural language descriptions (or prompts). In this work, we address the challenge of automatically generating these prompts and training a model to better learn emoti… ▽ More

    Submitted 6 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.07737

  3. arXiv:2310.00706  [pdf, other

    cs.CL cs.SD eess.AS

    Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

    Authors: Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

    Abstract: Modern speech synthesis systems have improved significantly, with synthetic speech being indistinguishable from real speech. However, efficient and holistic evaluation of synthetic speech still remains a significant challenge. Human evaluation using Mean Opinion Score (MOS) is ideal, but inefficient due to high costs. Therefore, researchers have developed auxiliary automatic metrics like Word Erro… ▽ More

    Submitted 1 October, 2023; originally announced October 2023.

  4. arXiv:2211.07737  [pdf, other

    cs.SD cs.LG eess.AS

    Describing emotions with acoustic property prompts for speech emotion recognition

    Authors: Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

    Abstract: Emotions lie on a broad continuum and treating emotions as a discrete number of classes limits the ability of a model to capture the nuances in the continuum. The challenge is how to describe the nuances of emotions and how to enable a model to learn the descriptions. In this work, we devise a method to automatically create a description (or prompt) for a given audio by computing acoustic properti… ▽ More

    Submitted 14 November, 2022; originally announced November 2022.

  5. arXiv:2210.16642  [pdf, other

    cs.SD cs.AI cs.CL eess.AS

    Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition

    Authors: Roshan Sharma, Hira Dhamyal, Bhiksha Raj, Rita Singh

    Abstract: Traditionally, in paralinguistic analysis for emotion detection from speech, emotions have been identified with discrete or dimensional (continuous-valued) labels. Accordingly, models that have been proposed for emotion detection use one or the other of these label types. However, psychologists like Russell and Plutchik have proposed theories and models that unite these views, maintaining that the… ▽ More

    Submitted 29 October, 2022; originally announced October 2022.

    Comments: Under Review at ICASSP 2023

  6. arXiv:2206.12568  [pdf, other

    cs.SD cs.AI eess.AS

    Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction

    Authors: Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

    Abstract: This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track. The method of choice utilized a combination of spectro-temporal modulation and self-supervised features, followed by an encoder-decoder network organized in a multitask paradigm. We evaluate… ▽ More

    Submitted 25 June, 2022; originally announced June 2022.

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

  7. arXiv:2204.04802  [pdf, other

    cs.SD cs.LG cs.MM eess.AS

    On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice

    Authors: Ankit Shah, Hira Dhamyal, Yang Gao, Daniel Arancibia, Mario Arancibia, Bhiksha Raj, Rita Singh

    Abstract: Lately, there has been a global effort by multiple research groups to detect COVID-19 from voice. Different researchers use different kinds of information from the voice signal to achieve this. Various types of phonated sounds and the sound of cough and breath have all been used with varying degree of success in automated voice-based COVID-19 detection apps. In this paper, we show that detecting C… ▽ More

    Submitted 25 October, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

    Comments: Submitted to ICASSP 2022

  8. arXiv:2110.04678  [pdf, other

    cs.SD cs.AI eess.AS

    An Overview of Techniques for Biomarker Discovery in Voice Signal

    Authors: Rita Singh, Ankit Shah, Hira Dhamyal

    Abstract: This paper reflects on the effect of several categories of medical conditions on human voice, focusing on those that may be hypothesized to have effects on voice, but for which the changes themselves may be subtle enough to have eluded observation in standard analytical examinations of the voice signal. It presents three categories of techniques that can potentially uncover such elusive biomarkers… ▽ More

    Submitted 9 October, 2021; originally announced October 2021.

    Comments: Last two authors contributed equally to the paper

  9. Masked Proxy Loss For Text-Independent Speaker Verification

    Authors: Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh

    Abstract: Open-set speaker recognition can be regarded as a metric learning problem, which is to maximize inter-class variance and minimize intra-class variance. Supervised metric learning can be categorized into entity-based learning and proxy-based learning. Most of the existing metric learning objectives like Contrastive, Triplet, Prototypical, GE2E, etc all belong to the former division, the performance… ▽ More

    Submitted 24 June, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: Accepted at Interspeech 2021

  10. arXiv:1911.05733  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    The phonetic bases of vocal expressed emotion: natural versus acted

    Authors: Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh

    Abstract: Can vocal emotions be emulated? This question has been a recurrent concern of the speech community, and has also been vigorously investigated. It has been fueled further by its link to the issue of validity of acted emotion databases. Much of the speech and vocal emotion research has relied on acted emotion databases as valid proxies for studying natural emotions. To create models that generalize… ▽ More

    Submitted 24 July, 2020; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: 5 pages, 6 figures