Showing 1–18 of 18 results for author: Ghosh, P K

  1. arXiv:2312.00698  [pdf, other]

    eess.AS

    SPIRE-SIES: A Spontaneous Indian English Speech Corpus

    Authors: Abhayjeet Singh, Charu Shah, Rajashri Varadaraj, Sonakshi Chauhan, Prasanta Kumar Ghosh

    Abstract: In this paper, we present a 170.83 hour Indian English spontaneous speech dataset. Lack of Indian English speech data is one of the major hindrances in developing robust speech systems which are adapted to the Indian speech style. Moreover this scarcity is even more for spontaneous speech. This corpus is crowd sourced over varied Indian nativities, genders and age groups. Traditional spontaneous s…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 6 pages, 7 plots, 3 tables, Accepted at O-COCOSDA 2023

  2. arXiv:2310.08846  [pdf, other]

    eess.AS

    Speaking rate attention-based duration prediction for speed control TTS

    Authors: Jesuraj Bandekar, Sathvik Udupa, Abhayjeet Singh, Anjali Jayakumar, Deekshitha G, Sandhya Badiger, Saurabh Kumar, Pooja VH, Prasanta Kumar Ghosh

    Abstract: With the advent of high-quality speech synthesis, there is a lot of interest in controlling various prosodic attributes of speech. Speaking rate is an essential attribute towards modelling the expressivity of speech. In this work, we propose a novel approach to control the speaking rate for non-autoregressive TTS. We achieve this by conditioning the speaking rate inside the duration predictor, all…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  3. arXiv:2307.07948  [pdf, ps, other]

    eess.AS cs.CL

    Model Adaptation for ASR in low-resource Indian Languages

    Authors: Abhayjeet Singh, Arjun Singh Mehta, Ashish Khuraishi K S, Deekshitha G, Gauri Date, Jai Nanavati, Jesuraja Bandekar, Karnalius Basumatary, Karthika P, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Savitha, Prasanta Kumar Ghosh, Prashanthi V, Priyanka Pai, Raoul Nanavati, Rohan Saxena, Sai Praneeth Reddy Mora, Srinivasa Raghavan

    Abstract: Automatic speech recognition (ASR) performance has improved drastically in recent years, mainly enabled by self-supervised learning (SSL) based acoustic models such as wav2vec2 and large-scale multi-lingual training like Whisper. A huge challenge still exists for low-resource languages where the availability of both audio and text is limited. This is further complicated by the presence of multiple…

    Submitted 16 July, 2023; originally announced July 2023.

    Comments: ASRU Special session overview paper

  4. arXiv:2305.00242  [pdf, other]

    physics.med-ph eess.AS eess.SP

    Analysis of vocal breath sounds before and after administering Bronchodilator in Asthmatic patients

    Authors: Shivani Yadav, Dipanjan Gope, Uma Maheswari K., Prasanta Kumar Ghosh

    Abstract: Asthma is one of the chronic inflammatory diseases of the airways, which causes chest tightness, wheezing, breathlessness, and cough. Spirometry is an effort-dependent test used to monitor and diagnose lung conditions like Asthma. Vocal breath sound (VBS) based analysis can be an alternative to spirometry as VBS characteristics change depending on the lung condition. VBS test consumes less time, a…

    Submitted 29 April, 2023; originally announced May 2023.

  5. arXiv:2304.03758  [pdf, ps, other]

    eess.AS eess.SP

    An unsupervised segmentation of vocal breath sounds

    Authors: Shivani Yadav, Dipanjan Gope, Uma Maheswari K., Prasanta Kumar Ghosh

    Abstract: Breathing is an essential part of human survival, which carries information about a person's physiological and psychological state. Generally, breath boundaries are marked by experts before using for any task. An unsupervised algorithm for breath boundary detection has been proposed for breath sounds recorded at the mouth, also referred to as vocal breath sounds (VBS), in this work. Breath sounds recor…

    Submitted 7 April, 2023; originally announced April 2023.

  6. arXiv:2211.06371   

    eess.AS

    Vocal Breath Sound Based Gender Classification

    Authors: Mohammad Shaique Solanki, Ashutosh M Bharadwaj, Jeevan K, Prasanta Kumar Ghosh

    Abstract: Voiced speech signals such as continuous speech are known to have acoustic features such as pitch (F0) and formant frequencies (F1, F2, F3) which can be used for gender classification. However, gender classification studies using non-speech signals such as vocal breath sounds have not been explored as they lack typical gender-specific acoustic features. In this work, we explore whether vocal breath…

    Submitted 25 May, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: Some updates in the paper. Will post a new version after updates

  7. arXiv:2210.16881  [pdf, other]

    eess.AS cs.SD

    Real-Time MRI Video synthesis from time aligned phonemes with sequence-to-sequence networks

    Authors: Sathvik Udupa, Prasanta Kumar Ghosh

    Abstract: Real-Time Magnetic resonance imaging (rtMRI) of the midsagittal plane of the mouth is of interest for speech production research. In this work, we focus on estimating utterance level rtMRI video from the spoken phoneme sequence. We obtain time-aligned phonemes from forced alignment, to obtain frame-level phoneme sequences which are aligned with rtMRI frames. We propose a sequence-to-sequence learn…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: submitted to ICASSP 2023

  8. arXiv:2210.16871  [pdf, other]

    eess.AS cs.SD

    Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models

    Authors: Sathvik Udupa, Siddarth C, Prasanta Kumar Ghosh

    Abstract: In this work, we investigate the effectiveness of pretrained Self-Supervised Learning (SSL) features for learning the mapping for acoustic to articulatory inversion (AAI). Signal processing-based acoustic features such as MFCCs have been predominantly used for the AAI task with deep neural networks. With SSL features working well for various other speech tasks such as speech recognition, emotion c…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: submitted to ICASSP 2023

  9. arXiv:2203.06004  [pdf, other]

    cs.CV eess.AS

    An error correction scheme for improved air-tissue boundary in real-time MRI video for speech production

    Authors: Anwesha Roy, Varun Belagali, Prasanta Kumar Ghosh

    Abstract: The best performance in Air-tissue boundary (ATB) segmentation of real-time Magnetic Resonance Imaging (rtMRI) videos in speech production is known to be achieved by a 3-dimensional convolutional neural network (3D-CNN) model. However, the evaluation of this model, as well as other ATB segmentation techniques reported in the literature, is done using Dynamic Time Warping (DTW) distance between the…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: accepted for ICASSP 2022

  10. arXiv:2112.04151  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    A study on native American English speech recognition by Indian listeners with varying word familiarity level

    Authors: Abhayjeet Singh, Achuth Rao MV, Rakesh Vaideeswaran, Chiranjeevi Yarra, Prasanta Kumar Ghosh

    Abstract: In this study, listeners of varied Indian nativities are asked to listen and recognize TIMIT utterances spoken by American speakers. We have three kinds of responses from each listener while they recognize an utterance: 1. Sentence difficulty ratings, 2. Speaker difficulty ratings, and 3. Transcription of the utterance. From these transcriptions, word error rate (WER) is calculated and used as a m…

    Submitted 8 December, 2021; originally announced December 2021.

    Comments: 6 pages, 5 figures, COCOSDA 2021

  11. arXiv:2106.00639  [pdf, other]

    eess.AS cs.SD eess.SP

    Multi-modal Point-of-Care Diagnostics for COVID-19 Based On Acoustics and Symptoms

    Authors: Srikanth Raj Chetupalli, Prashant Krishnan, Neeraj Sharma, Ananya Muguli, Rohit Kumar, Viral Nanda, Lancelot Mark Pinto, Prasanta Kumar Ghosh, Sriram Ganapathy

    Abstract: The research direction of identifying acoustic bio-markers of respiratory diseases has received renewed interest following the onset of COVID-19 pandemic. In this paper, we design an approach to COVID-19 diagnostic using crowd-sourced multi-modal data. The data resource, consisting of acoustic signals like cough, breathing, and speech signals, along with the data of symptoms, are recorded using a…

    Submitted 5 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: The Manuscript is submitted to IEEE-EMBS Journal of Biomedical and Health Informatics on June 1, 2021

  12. arXiv:2104.05017  [pdf, other]

    eess.AS cs.SD

    Estimating articulatory movements in speech production with transformer networks

    Authors: Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh

    Abstract: We estimate articulatory movements in speech production from different modalities - acoustics and phonemes. Acoustic-to-articulatory inversion (AAI) is a sequence-to-sequence task. On the other hand, phoneme to articulatory (PTA) motion estimation faces a key challenge in reliably aligning the text and the articulatory movements. To address this challenge, we explore the use of a transformer archi…

    Submitted 12 June, 2021; v1 submitted 11 April, 2021; originally announced April 2021.

    Comments: accepted for oral presentation at INTERSPEECH 2021

  13. Multilingual and code-switching ASR challenges for low resource Indian languages

    Authors: Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram, Basil Abraham

    Abstract: Recently, there is increasing interest in multilingual automatic speech recognition (ASR) where a speech recognition system caters to multiple low resource languages by taking advantage of low amounts of labeled corpora in multiple languages. With multilingualism becoming common in today's world, there has been increasing interest in code-switching ASR as well. In code-switching, multiple language…

    Submitted 31 March, 2021; originally announced April 2021.

    Comments: 6 pages

  14. arXiv:2103.09148  [pdf, other]

    eess.AS cs.SD

    DiCOVA Challenge: Dataset, task, and baseline system for COVID-19 diagnosis using acoustics

    Authors: Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Sharma, Prashant Krishnan, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda

    Abstract: The DiCOVA challenge aims at accelerating research in diagnosing COVID-19 using acoustics (DiCOVA), a topic at the intersection of speech and audio processing, respiratory health diagnosis, and machine learning. This challenge is an open call for researchers to analyze a dataset of sound recordings collected from COVID-19 infected and non-COVID-19 individuals for a two-class classification. These…

    Submitted 17 June, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

    Comments: To appear in Proceedings of Interspeech, 2021

  15. arXiv:2006.11536  [pdf, other]

    eess.AS

    Speaker conditioned acoustic-to-articulatory inversion using x-vectors

    Authors: Aravind Illa, Prasanta Kumar Ghosh

    Abstract: Speech production involves the movement of various articulators, including tongue, jaw, and lips. Estimating the movement of the articulators from the acoustics of speech is known as acoustic-to-articulatory inversion (AAI). Recently, it has been shown that instead of training AAI in a speaker specific manner, pooling the acoustic-articulatory data from multiple speakers is beneficial. Further, ad…

    Submitted 20 June, 2020; originally announced June 2020.

  16. arXiv:2006.03107  [pdf, other]

    eess.AS cs.LG cs.SD

    Attention and Encoder-Decoder based models for transforming articulatory movements at different speaking rates

    Authors: Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh

    Abstract: While speaking at different rates, articulators (like tongue, lips) tend to move differently and the enunciations are also of different durations. In the past, affine transformation and DNN have been used to transform articulatory movements from neutral to fast (N2F) and neutral to slow (N2S) speaking rates [1]. In this work, we improve over the existing transformation techniques by modeling rate sp…

    Submitted 20 August, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

    Comments: 5 pages, 4 figures, InterSpeech 2020

  17. Coswara -- A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis

    Authors: Neeraj Sharma, Prashant Krishnan, Rohit Kumar, Shreyas Ramoji, Srikanth Raj Chetupalli, Nirmala R., Prasanta Kumar Ghosh, Sriram Ganapathy

    Abstract: The COVID-19 pandemic presents global challenges transcending boundaries of country, race, religion, and economy. The current gold standard method for COVID-19 detection is the reverse transcription polymerase chain reaction (RT-PCR) testing. However, this method is expensive, time-consuming, and violates social distancing. Also, as the pandemic is expected to stay for a while, there is a need for…

    Submitted 11 August, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

    Comments: A description of Coswara dataset to evaluate COVID-19 diagnosis using respiratory sounds

  18. arXiv:1910.14375  [pdf, other]

    eess.AS cs.LG

    A comparative study of estimating articulatory movements from phoneme sequences and acoustic features

    Authors: Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh

    Abstract: Unlike phoneme sequences, movements of speech articulators (lips, tongue, jaw, velum) and the resultant acoustic signal are known to encode not only the linguistic message but also carry para-linguistic information. While several works exist for estimating articulatory movement from acoustic signals, little is known to what extent articulatory movements can be predicted only from linguistic inform…

    Submitted 19 February, 2020; v1 submitted 31 October, 2019; originally announced October 2019.

    Comments: 5 pages, 5 figures, accepted in ICASSP 2020