Showing 1–14 of 14 results for author: Ghosh, P K

  1. Neural network based approach for solving problems in plane wave duct acoustics

    Authors: D. Veerababu, Prasanta K. Ghosh

    Abstract: Neural networks have emerged as a tool for solving differential equations in many branches of engineering and science. But their progress in frequency domain acoustics is limited by the vanishing gradient problem that occurs at higher frequencies. This paper discusses a formulation that can address this issue. The problem of solving the governing differential equation along with the boundary condi…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Published Journal Article

    ACM Class: G.1.6; I.6.4; J.2

    Journal ref: Journal of Sound and Vibration, 585, 2024:118476

  2. arXiv:2307.07948  [pdf, ps, other]

    eess.AS cs.CL

    Model Adaptation for ASR in low-resource Indian Languages

    Authors: Abhayjeet Singh, Arjun Singh Mehta, Ashish Khuraishi K S, Deekshitha G, Gauri Date, Jai Nanavati, Jesuraja Bandekar, Karnalius Basumatary, Karthika P, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Savitha, Prasanta Kumar Ghosh, Prashanthi V, Priyanka Pai, Raoul Nanavati, Rohan Saxena, Sai Praneeth Reddy Mora, Srinivasa Raghavan

    Abstract: Automatic speech recognition (ASR) performance has improved drastically in recent years, mainly enabled by self-supervised learning (SSL) based acoustic models such as wav2vec2 and large-scale multi-lingual training like Whisper. A huge challenge still exists for low-resource languages where the availability of both audio and text is limited. This is further complicated by the presence of multiple…

    Submitted 16 July, 2023; originally announced July 2023.

    Comments: ASRU Special session overview paper

  3. arXiv:2210.16881  [pdf, other]

    eess.AS cs.SD

    Real-Time MRI Video synthesis from time aligned phonemes with sequence-to-sequence networks

    Authors: Sathvik Udupa, Prasanta Kumar Ghosh

    Abstract: Real-Time Magnetic resonance imaging (rtMRI) of the midsagittal plane of the mouth is of interest for speech production research. In this work, we focus on estimating utterance level rtMRI video from the spoken phoneme sequence. We obtain time-aligned phonemes from forced alignment, to obtain frame-level phoneme sequences which are aligned with rtMRI frames. We propose a sequence-to-sequence learn…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: submitted to ICASSP 2023

  4. arXiv:2210.16871  [pdf, other]

    eess.AS cs.SD

    Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models

    Authors: Sathvik Udupa, Siddarth C, Prasanta Kumar Ghosh

    Abstract: In this work, we investigate the effectiveness of pretrained Self-Supervised Learning (SSL) features for learning the mapping for acoustic to articulatory inversion (AAI). Signal processing-based acoustic features such as MFCCs have been predominantly used for the AAI task with deep neural networks. With SSL features working well for various other speech tasks such as speech recognition, emotion c…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: submitted to ICASSP 2023

  5. arXiv:2203.06004  [pdf, other]

    cs.CV eess.AS

    An error correction scheme for improved air-tissue boundary in real-time MRI video for speech production

    Authors: Anwesha Roy, Varun Belagali, Prasanta Kumar Ghosh

    Abstract: The best performance in Air-tissue boundary (ATB) segmentation of real-time Magnetic Resonance Imaging (rtMRI) videos in speech production is known to be achieved by a 3-dimensional convolutional neural network (3D-CNN) model. However, the evaluation of this model, as well as other ATB segmentation techniques reported in the literature, is done using Dynamic Time Warping (DTW) distance between the…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: accepted for ICASSP 2022

  6. arXiv:2112.04151  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    A study on native American English speech recognition by Indian listeners with varying word familiarity level

    Authors: Abhayjeet Singh, Achuth Rao MV, Rakesh Vaideeswaran, Chiranjeevi Yarra, Prasanta Kumar Ghosh

    Abstract: In this study, listeners of varied Indian nativities are asked to listen and recognize TIMIT utterances spoken by American speakers. We have three kinds of responses from each listener while they recognize an utterance: 1. Sentence difficulty ratings, 2. Speaker difficulty ratings, and 3. Transcription of the utterance. From these transcriptions, word error rate (WER) is calculated and used as a m…

    Submitted 8 December, 2021; originally announced December 2021.

    Comments: 6 pages, 5 figures, COCOSDA 2021

  7. arXiv:2106.00639  [pdf, other]

    eess.AS cs.SD eess.SP

    Multi-modal Point-of-Care Diagnostics for COVID-19 Based On Acoustics and Symptoms

    Authors: Srikanth Raj Chetupalli, Prashant Krishnan, Neeraj Sharma, Ananya Muguli, Rohit Kumar, Viral Nanda, Lancelot Mark Pinto, Prasanta Kumar Ghosh, Sriram Ganapathy

    Abstract: The research direction of identifying acoustic bio-markers of respiratory diseases has received renewed interest following the onset of COVID-19 pandemic. In this paper, we design an approach to COVID-19 diagnostic using crowd-sourced multi-modal data. The data resource, consisting of acoustic signals like cough, breathing, and speech signals, along with the data of symptoms, are recorded using a…

    Submitted 5 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: The Manuscript is submitted to IEEE-EMBS Journal of Biomedical and Health Informatics on June 1, 2021

  8. arXiv:2104.05017  [pdf, other]

    eess.AS cs.SD

    Estimating articulatory movements in speech production with transformer networks

    Authors: Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh

    Abstract: We estimate articulatory movements in speech production from different modalities - acoustics and phonemes. Acoustic-to-articulatory inversion (AAI) is a sequence-to-sequence task. On the other hand, phoneme to articulatory (PTA) motion estimation faces a key challenge in reliably aligning the text and the articulatory movements. To address this challenge, we explore the use of a transformer archi…

    Submitted 12 June, 2021; v1 submitted 11 April, 2021; originally announced April 2021.

    Comments: accepted for oral presentation at INTERSPEECH 2021

  9. Multilingual and code-switching ASR challenges for low resource Indian languages

    Authors: Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram, Basil Abraham

    Abstract: Recently, there is increasing interest in multilingual automatic speech recognition (ASR) where a speech recognition system caters to multiple low resource languages by taking advantage of low amounts of labeled corpora in multiple languages. With multilingualism becoming common in today's world, there has been increasing interest in code-switching ASR as well. In code-switching, multiple language…

    Submitted 31 March, 2021; originally announced April 2021.

    Comments: 6 pages

  10. arXiv:2103.09148  [pdf, other]

    eess.AS cs.SD

    DiCOVA Challenge: Dataset, task, and baseline system for COVID-19 diagnosis using acoustics

    Authors: Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Sharma, Prashant Krishnan, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda

    Abstract: The DiCOVA challenge aims at accelerating research in diagnosing COVID-19 using acoustics (DiCOVA), a topic at the intersection of speech and audio processing, respiratory health diagnosis, and machine learning. This challenge is an open call for researchers to analyze a dataset of sound recordings collected from COVID-19 infected and non-COVID-19 individuals for a two-class classification. These…

    Submitted 17 June, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

    Comments: To appear in Proceedings of Interspeech, 2021

  11. arXiv:2006.03107  [pdf, other]

    eess.AS cs.LG cs.SD

    Attention and Encoder-Decoder based models for transforming articulatory movements at different speaking rates

    Authors: Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh

    Abstract: While speaking at different rates, articulators (like tongue, lips) tend to move differently and the enunciations are also of different durations. In the past, affine transformation and DNN have been used to transform articulatory movements from neutral to fast (N2F) and neutral to slow (N2S) speaking rates [1]. In this work, we improve over the existing transformation techniques by modeling rate sp…

    Submitted 20 August, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

    Comments: 5 pages, 4 figures, InterSpeech 2020

  12. Coswara -- A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis

    Authors: Neeraj Sharma, Prashant Krishnan, Rohit Kumar, Shreyas Ramoji, Srikanth Raj Chetupalli, Nirmala R., Prasanta Kumar Ghosh, Sriram Ganapathy

    Abstract: The COVID-19 pandemic presents global challenges transcending boundaries of country, race, religion, and economy. The current gold standard method for COVID-19 detection is the reverse transcription polymerase chain reaction (RT-PCR) testing. However, this method is expensive, time-consuming, and violates social distancing. Also, as the pandemic is expected to stay for a while, there is a need for…

    Submitted 11 August, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

    Comments: A description of Coswara dataset to evaluate COVID-19 diagnosis using respiratory sounds

  13. arXiv:1910.14375  [pdf, other]

    eess.AS cs.LG

    A comparative study of estimating articulatory movements from phoneme sequences and acoustic features

    Authors: Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh

    Abstract: Unlike phoneme sequences, movements of speech articulators (lips, tongue, jaw, velum) and the resultant acoustic signal are known to encode not only the linguistic message but also carry para-linguistic information. While several works exist for estimating articulatory movement from acoustic signals, little is known to what extent articulatory movements can be predicted only from linguistic inform…

    Submitted 19 February, 2020; v1 submitted 31 October, 2019; originally announced October 2019.

    Comments: 5 pages, 5 figures, accepted in ICASSP 2020

  14. arXiv:1004.5495  [pdf]

    cs.NI

    The Role of Boolean Function in Fractal Formation and its Application to CDMA Wireless Communication

    Authors: Somnath Mukherjee, Pabitra Kumar Ghosh

    Abstract: In this paper, a new transformation is generated from a three variable Boolean function 3, which is used to produce a self-similar fractal pattern of dimension 1.58. This very fractal pattern is used to reconstruct the whole structural position of resources in wireless CDMA network. This reconstruction minimizes the number of resources in the network and so naturally network consumption costs are…

    Submitted 19 May, 2010; v1 submitted 30 April, 2010; originally announced April 2010.

    Comments: 8 pages, 14 figures