
Showing 1–15 of 15 results for author: Mitra, V

Searching in archive cs.

  1. arXiv:2403.01882  [pdf, other]

    cs.HC

    Using Virtual Reality for Detection and Intervention of Depression -- A Systematic Literature Review

    Authors: Mohammad Waqas, Y Pawankumar Gururaj, V D Shanmukha Mitra, Sai Anirudh Karri, Raghu Reddy, Syed Azeemuddin

    Abstract: The use of emerging technologies like Virtual Reality (VR) in therapeutic settings has increased in the past few years. By incorporating VR, a mental health condition like depression can be assessed effectively, while also providing personalized motivation and meaningful engagement for treatment purposes. The integration of external sensors further enhances the engagement of the subjects with the…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 8 pages, 2 figures, 3 tables, Conference full paper

  2. arXiv:2312.16180  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG

    Investigating salient representations and label Variance in Dimensional Speech Emotion Analysis

    Authors: Vikramjit Mitra, Jingping Nie, Erdrin Azemi

    Abstract: Representations derived from models such as BERT (Bidirectional Encoder Representations from Transformers) and HuBERT (Hidden units BERT), have helped to achieve state-of-the-art performance in dimensional speech emotion recognition. Despite their large dimensionality, and even though these representations are not tailored for emotion recognition tasks, they are frequently used to train large spee…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: 5 pages

    Journal ref: ICASSP 2024

  3. arXiv:2303.03177  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Pre-trained Model Representations and their Robustness against Noise for Speech Emotion Analysis

    Authors: Vikramjit Mitra, Vasudha Kowtha, Hsiang-Yun Sherry Chien, Erdrin Azemi, Carlos Avendano

    Abstract: Pre-trained model representations have demonstrated state-of-the-art performance in speech recognition, natural language processing, and other applications. Speech models, such as Bidirectional Encoder Representations from Transformers (BERT) and Hidden units BERT (HuBERT), have enabled generating lexical and acoustic representations to benefit speech recognition applications. We investigated the…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: 5 pages, conference

  4. arXiv:2207.03334  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation

    Authors: Vikramjit Mitra, Hsiang-Yun Sherry Chien, Vasudha Kowtha, Joseph Yitan Cheng, Erdrin Azemi

    Abstract: Estimating dimensional emotions, such as activation, valence and dominance, from acoustic speech signals has been widely explored over the past few years. While accurate estimation of activation and dominance from speech seems to be possible, the same for valence remains challenging. Previous research has shown that the use of lexical information can improve valence estimation performance. Lexical…

    Submitted 2 July, 2022; originally announced July 2022.

    Comments: 5 pages, 3 figures, Interspeech 2022

  5. arXiv:2107.14028  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Estimating Respiratory Rate From Breath Audio Obtained Through Wearable Microphones

    Authors: Agni Kumar, Vikramjit Mitra, Carolyn Oliver, Adeeti Ullal, Matt Biddulph, Irida Mance

    Abstract: Respiratory rate (RR) is a clinical metric used to assess overall health and physical fitness. An individual's RR can change from their baseline due to chronic illness symptoms (e.g., asthma, congestive heart failure), acute illness (e.g., breathlessness due to infection), and over the course of the day due to physical exhaustion during heightened exertion. Remote estimation of RR can offer a cost…

    Submitted 28 July, 2021; originally announced July 2021.

    Comments: International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2021

  6. arXiv:2106.11759  [pdf, other]

    eess.AS cs.AI cs.CL cs.CV cs.LG cs.SD

    Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

    Authors: Vikramjit Mitra, Zifang Huang, Colin Lea, Lauren Tooley, Sarah Wu, Darren Botten, Ashwini Palekar, Shrinath Thelapurath, Panayiotis Georgiou, Sachin Kajarekar, Jeffrey Bigham

    Abstract: Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice operated systems do not work. Current speech recognition systems are trained primarily with data from fluent speakers and as a consequence do not generalize well to speech with dysfluencies such as sound or word repetition…

    Submitted 18 June, 2021; originally announced June 2021.

    Comments: 5 pages, 1 page reference, 2 figures

  7. arXiv:2102.12394  [pdf, other]

    eess.AS cs.SD

    SEP-28k: A Dataset for Stuttering Event Detection From Podcasts With People Who Stutter

    Authors: Colin Lea, Vikramjit Mitra, Aparna Joshi, Sachin Kajarekar, Jeffrey P. Bigham

    Abstract: The ability to automatically detect stuttering events in speech could help speech pathologists track an individual's fluency over time or help improve speech recognition systems for people with atypical speech patterns. Despite increasing interest in this area, existing public datasets are too small to build generalizable dysfluency detection systems and lack sufficient annotations. In this work,…

    Submitted 24 February, 2021; originally announced February 2021.

    Comments: Accepted to ICASSP 2021

  8. arXiv:2002.01323  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Detecting Emotion Primitives from Speech and their use in discerning Categorical Emotions

    Authors: Vasudha Kowtha, Vikramjit Mitra, Chris Bartels, Erik Marchi, Sue Booker, William Caruso, Sachin Kajarekar, Devang Naik

    Abstract: Emotion plays an essential role in human-to-human communication, enabling us to convey feelings such as happiness, frustration, and sincerity. While modern speech technologies rely heavily on speech recognition and natural language understanding for speech content understanding, the investigation of vocal expression is increasingly gaining attention. Key considerations for building robust emotion…

    Submitted 30 January, 2020; originally announced February 2020.

    Comments: 5 pages

  9. arXiv:2001.01755  [pdf, ps, other]

    cs.LG cs.SD eess.AS stat.ML

    Investigation and Analysis of Hyper and Hypo neuron pruning to selectively update neurons during Unsupervised Adaptation

    Authors: Vikramjit Mitra, Horacio Franco

    Abstract: Unseen or out-of-domain data can seriously degrade the performance of a neural network model, indicating the model's failure to generalize to unseen data. Neural net pruning can not only help to reduce a model's size but can improve the model's generalization capacity as well. Pruning approaches look for low-salient neurons that are less contributive to a model's decision and hence can be removed…

    Submitted 6 January, 2020; originally announced January 2020.

    Comments: DSP, 29 pages, 8 figures

  10. arXiv:1907.00112  [pdf]

    cs.CL cs.LG cs.SD eess.AS

    Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice

    Authors: Vikramjit Mitra, Sue Booker, Erik Marchi, David Scott Farrar, Ute Dorothea Peitz, Bridget Cheng, Ermine Teves, Anuj Mehta, Devang Naik

    Abstract: Millions of people reach out to digital assistants such as Siri every day, asking for information, making phone calls, seeking assistance, and much more. The expectation is that such assistants should understand the intent of the user's query. Detecting the intent of a query from a short, isolated utterance is a difficult task. Intent cannot always be obtained from speech-recognized transcriptions.…

    Submitted 28 June, 2019; originally announced July 2019.

    Comments: 5 pages, 6 figures

  11. arXiv:1905.06533  [pdf, other]

    cs.CL cs.SD eess.AS

    Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

    Authors: Emre Yılmaz, Vikramjit Mitra, Ganesh Sivaraman, Horacio Franco

    Abstract: The rapid population aging has stimulated the development of assistive devices that provide personalized medical support to the needy suffering from various etiologies. One prominent clinical application is a computer-assisted speech training system which enables personalized speech therapy to patients impaired by communicative disorders in the patient's home environment. Such a system relies on…

    Submitted 20 May, 2019; v1 submitted 16 May, 2019; originally announced May 2019.

    Comments: to appear in Computer Speech & Language - https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial text overlap with arXiv:1807.10948

  12. arXiv:1807.10948  [pdf, other]

    cs.CL

    Articulatory Features for ASR of Pathological Speech

    Authors: Emre Yılmaz, Vikramjit Mitra, Chris Bartels, Horacio Franco

    Abstract: In this work, we investigate the joint use of articulatory and acoustic features for automatic speech recognition (ASR) of pathological speech. Despite long-lasting efforts to build speaker- and text-independent ASR systems for people with dysarthria, the performance of state-of-the-art systems is still considerably lower on this type of speech than on normal speech. The most prominent reason for…

    Submitted 28 July, 2018; originally announced July 2018.

    Comments: Accepted for publication at Interspeech 2018

  13. arXiv:1802.06861  [pdf]

    cs.CL cs.SD eess.AS

    Interpreting DNN output layer activations: A strategy to cope with unseen data in speech recognition

    Authors: Vikramjit Mitra, Horacio Franco

    Abstract: Unseen data can degrade performance of deep neural net acoustic models. To cope with unseen data, adaptation techniques are deployed. For unlabeled unseen data, one must generate some hypothesis given an existing model, which is used as the label for model adaptation. However, assessing the goodness of the hypothesis can be difficult, and an erroneous hypothesis can lead to poorly trained models.…

    Submitted 16 February, 2018; originally announced February 2018.

    Comments: 5 pages. arXiv admin note: substantial text overlap with arXiv:1708.09516

  14. arXiv:1802.05853  [pdf]

    cs.CL cs.SD eess.AS

    Articulatory information and Multiview Features for Large Vocabulary Continuous Speech Recognition

    Authors: Vikramjit Mitra, Wen Wang, Chris Bartels, Horacio Franco, Dimitra Vergyri

    Abstract: This paper explores the use of multi-view features and their discriminative transforms in a convolutional deep neural network (CNN) architecture for a continuous large vocabulary speech recognition task. Mel-filterbank energies and perceptually motivated forced damped oscillator coefficient (DOC) features are used after feature-space maximum-likelihood linear regression (fMLLR) transforms, which a…

    Submitted 16 February, 2018; originally announced February 2018.

    Comments: 5 pages

  15. arXiv:1708.09516  [pdf]

    cs.LG cs.CL stat.ML

    Leveraging Deep Neural Network Activation Entropy to cope with Unseen Data in Speech Recognition

    Authors: Vikramjit Mitra, Horacio Franco

    Abstract: Unseen data conditions can inflict serious performance degradation on systems relying on supervised machine learning algorithms. Because data can often be unseen, and because traditional machine learning algorithms are trained in a supervised manner, unsupervised adaptation techniques must be used to adapt the model to the unseen data conditions. However, unsupervised adaptation is often challengi…

    Submitted 30 August, 2017; originally announced August 2017.

    Comments: 7 pages, Index Terms: automatic speech recognition, robust speech recognition, unsupervised adaptation, neural network activations, confidence measures