Showing 1–19 of 19 results for author: Chetupalli, S R

  1. arXiv:2305.12741  [pdf, other]

    eess.AS cs.LG cs.SD q-bio.QM

    Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection

    Authors: Debarpan Bhattacharya, Neeraj Kumar Sharma, Debottam Dutta, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K K, Sadhana Gonuguntla, Murali Alagesan

    Abstract: This paper presents the Coswara dataset, a dataset containing a diverse set of respiratory sounds and rich meta-data, recorded between April-2020 and February-2022 from 2635 individuals (1819 SARS-CoV-2 negative, 674 positive, and 142 recovered subjects). The respiratory sounds contained nine sound categories associated with variants of breathing, cough and speech. The rich metadata contained demogr…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted for publication in Nature Scientific Data

  2. arXiv:2303.08702  [pdf, other]

    eess.AS cs.SD

    Beamformer-Guided Target Speaker Extraction

    Authors: Mohamed Elminshawi, Srikanth Raj Chetupalli, Emanuël A. P. Habets

    Abstract: We propose a Beamformer-guided Target Speaker Extraction (BG-TSE) method to extract a target speaker's voice from a multi-channel recording informed by the direction of arrival of the target. The proposed method employs a front-end beamformer steered towards the target speaker to provide an auxiliary signal to a single-channel TSE system. By allowing for time-varying embeddings in the single-chann…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: Submitted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

  3. arXiv:2303.07143  [pdf, other]

    eess.AS cs.LG cs.SD

    Multi-Microphone Speaker Separation by Spatial Regions

    Authors: Julian Wechsler, Srikanth Raj Chetupalli, Wolfgang Mack, Emanuël A. P. Habets

    Abstract: We consider the task of region-based source separation of reverberant multi-microphone recordings. We assume pre-defined spatial regions with a single active source per region. The objective is to estimate the signals from the individual spatial regions as captured by a reference microphone while retaining a correspondence between signals and spatial regions. We propose a data-driven approach usin…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Submitted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing

  4. arXiv:2206.12309  [pdf, other]

    eess.AS cs.LG eess.SP

    Analyzing the impact of SARS-CoV-2 variants on respiratory sound signals

    Authors: Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K K, Sadhana Gonuguntla, Murali Alagesan

    Abstract: The COVID-19 outbreak resulted in multiple waves of infections that have been associated with different SARS-CoV-2 variants. Studies have reported differential impact of the variants on respiratory health of patients. We explore whether acoustic signals, collected from COVID-19 subjects, show computationally distinguishable acoustic patterns suggesting a possibility to predict the underlying virus…

    Submitted 24 June, 2022; originally announced June 2022.

    Journal ref: Interspeech, 2022

  5. arXiv:2206.06184  [pdf, other]

    eess.AS eess.SP

    AmbiSep: Ambisonic-to-Ambisonic Reverberant Speech Separation Using Transformer Networks

    Authors: Adrian Herzog, Srikanth Raj Chetupalli, Emanuël A. P. Habets

    Abstract: Consider a multichannel Ambisonic recording containing a mixture of several reverberant speech signals. Retrieving the reverberant Ambisonic signals corresponding to the individual speech sources blindly from the mixture is a challenging task as it requires estimating multiple signal channels for each source. In this work, we propose AmbiSep, a deep neural network-based plane-wave domain masking…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: Preprint submitted to IWAENC 2022 (https://iwaenc2022.org)

  6. arXiv:2206.05053  [pdf, other]

    cs.HC cs.LG cs.SD eess.AS eess.SP

    Coswara: A website application enabling COVID-19 screening by analysing respiratory sound samples and health symptoms

    Authors: Debarpan Bhattacharya, Debottam Dutta, Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori, Suhail K K, Sadhana Gonuguntla, Murali Alagesan

    Abstract: The COVID-19 pandemic has accelerated research on the design of alternative, quick and effective COVID-19 diagnosis approaches. In this paper, we describe the Coswara tool, a website application designed to enable COVID-19 detection by analysing respiratory sound samples and health symptoms. A user using this service can log into a website using any device connected to the internet, provide their curr…

    Submitted 9 June, 2022; originally announced June 2022.

    Journal ref: Interspeech, 2022

  7. arXiv:2202.00733  [pdf, other]

    eess.AS cs.SD

    New Insights on Target Speaker Extraction

    Authors: Mohamed Elminshawi, Wolfgang Mack, Srikanth Raj Chetupalli, Soumitro Chakrabarty, Emanuël A. P. Habets

    Abstract: Speaker extraction (SE) aims to segregate the speech of a target speaker from a mixture of interfering speakers with the help of auxiliary information. Several forms of auxiliary information have been employed in single-channel SE, such as a speech snippet enrolled from the target speaker or visual information corresponding to the spoken utterance. The effectiveness of the auxiliary information in…

    Submitted 15 September, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

  8. arXiv:2110.01177  [pdf, other]

    eess.AS cs.SD q-bio.QM

    The Second DiCOVA Challenge: Dataset and performance analysis for COVID-19 diagnosis using acoustics

    Authors: Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Debarpan Bhattacharya, Debottam Dutta, Pravin Mote, Sriram Ganapathy

    Abstract: The Second Diagnosis of COVID-19 using Acoustics (DiCOVA) Challenge aimed at accelerating the research in acoustics based detection of COVID-19, a topic at the intersection of acoustics, signal processing, machine learning, and healthcare. This paper presents the details of the challenge, which was an open call for researchers to analyze a dataset of audio recordings consisting of breathing, cough…

    Submitted 11 October, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

  9. arXiv:2109.04544  [pdf, other]

    eess.AS cs.SD eess.SP

    Directional MCLP Analysis and Reconstruction for Spatial Speech Communication

    Authors: Srikanth Raj Chetupalli, Thippur V. Sreenivas

    Abstract: Spatial speech communication, i.e., the reconstruction of the spoken signal along with the relative speaker position in the enclosure (reverberation information), is considered in this paper. Directional, diffuse components and the source position information are estimated at the transmitter, and perceptually effective reproduction is considered at the receiver. We consider spatially distributed microp…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: The manuscript is submitted as a full paper to IEEE/ACM Transactions on Audio, Speech and Language Processing

  10. arXiv:2106.10997  [pdf, other]

    eess.AS cs.SD

    Towards sound based testing of COVID-19 -- Summary of the first Diagnostics of COVID-19 using Acoustics (DiCOVA) Challenge

    Authors: Neeraj Kumar Sharma, Ananya Muguli, Prashant Krishnan, Rohit Kumar, Srikanth Raj Chetupalli, Sriram Ganapathy

    Abstract: The technology development for point-of-care tests (POCTs) targeting respiratory diseases has witnessed a growing demand in the recent past. Investigating the presence of acoustic biomarkers in modalities such as cough, breathing and speech sounds, and using them for building POCTs can offer fast, contactless and inexpensive testing. In view of this, over the past year, we launched the "Coswara"…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: Manuscript in review in the Elsevier Computer Speech and Language journal

  11. arXiv:2106.00639  [pdf, other]

    eess.AS cs.SD eess.SP

    Multi-modal Point-of-Care Diagnostics for COVID-19 Based On Acoustics and Symptoms

    Authors: Srikanth Raj Chetupalli, Prashant Krishnan, Neeraj Sharma, Ananya Muguli, Rohit Kumar, Viral Nanda, Lancelot Mark Pinto, Prasanta Kumar Ghosh, Sriram Ganapathy

    Abstract: The research direction of identifying acoustic bio-markers of respiratory diseases has received renewed interest following the onset of the COVID-19 pandemic. In this paper, we design an approach to COVID-19 diagnostics using crowd-sourced multi-modal data. The data resource, consisting of acoustic signals like cough, breathing, and speech signals, along with the data of symptoms, are recorded using a…

    Submitted 5 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: The Manuscript is submitted to IEEE-EMBS Journal of Biomedical and Health Informatics on June 1, 2021

  12. arXiv:2104.02359  [pdf, other]

    eess.AS

    LEAP Submission for the Third DIHARD Diarization Challenge

    Authors: Prachi Singh, Rajat Varma, Venkat Krishnamohan, Srikanth Raj Chetupalli, Sriram Ganapathy

    Abstract: The LEAP submission for the DIHARD-III challenge is described in this paper. The proposed system is composed of a speech bandwidth classifier, and diarization systems fine-tuned for narrowband and wideband speech separately. We use an end-to-end speaker diarization system for the narrowband conversational telephone speech recordings. For the wideband multi-speaker recordings, we use a neural embedding…

    Submitted 14 June, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: Accepted in INTERSPEECH 2021

  13. arXiv:2104.01882  [pdf, other]

    eess.AS

    Speaker conditioned acoustic modeling for multi-speaker conversational ASR

    Authors: Srikanth Raj Chetupalli, Sriram Ganapathy

    Abstract: In this paper, we propose a novel approach for the transcription of speech conversations with natural speaker overlap, from single channel speech recordings. The proposed model is a combination of a speaker diarization system and a hybrid automatic speech recognition (ASR) system. The speaker conditioned acoustic model (SCAM) in the ASR system consists of a series of embedding layers which use the…

    Submitted 29 August, 2022; v1 submitted 5 April, 2021; originally announced April 2021.

    Comments: Manuscript accepted for presentation at Interspeech 2022

  14. arXiv:2103.09148  [pdf, other]

    eess.AS cs.SD

    DiCOVA Challenge: Dataset, task, and baseline system for COVID-19 diagnosis using acoustics

    Authors: Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Sharma, Prashant Krishnan, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda

    Abstract: The DiCOVA challenge aims at accelerating research in diagnosing COVID-19 using acoustics (DiCOVA), a topic at the intersection of speech and audio processing, respiratory health diagnosis, and machine learning. This challenge is an open call for researchers to analyze a dataset of sound recordings collected from COVID-19 infected and non-COVID-19 individuals for a two-class classification. These…

    Submitted 17 June, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

    Comments: To appear in Proceedings of Interspeech, 2021

  15. arXiv:2008.03517  [pdf, ps, other]

    eess.AS

    Context Dependent RNNLM for Automatic Transcription of Conversations

    Authors: Srikanth Raj Chetupalli, Sriram Ganapathy

    Abstract: Conversational speech, while being unstructured at an utterance level, typically has a macro topic which provides larger context spanning multiple utterances. The current language models in speech recognition systems using recurrent neural networks (RNNLM) rely mainly on the local context and exclude the larger context. In order to model the long term dependencies of words across multiple sentence…

    Submitted 8 August, 2020; originally announced August 2020.

    Comments: Manuscript accepted for publication at INTERSPEECH 2020, Oct 25-29, Shanghai, China

  16. Coswara -- A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis

    Authors: Neeraj Sharma, Prashant Krishnan, Rohit Kumar, Shreyas Ramoji, Srikanth Raj Chetupalli, Nirmala R., Prasanta Kumar Ghosh, Sriram Ganapathy

    Abstract: The COVID-19 pandemic presents global challenges transcending boundaries of country, race, religion, and economy. The current gold standard method for COVID-19 detection is the reverse transcription polymerase chain reaction (RT-PCR) testing. However, this method is expensive, time-consuming, and violates social distancing. Also, as the pandemic is expected to stay for a while, there is a need for…

    Submitted 11 August, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

    Comments: A description of Coswara dataset to evaluate COVID-19 diagnosis using respiratory sounds

  17. arXiv:1910.09782  [pdf, ps, other]

    eess.AS eess.SP

    Joint spatial filter and time-varying MCLP for dereverberation and interference suppression of a dynamic/static speech source

    Authors: Srikanth Raj Chetupalli, Thippur V. Sreenivas

    Abstract: Dereverberation of a moving speech source in the presence of other directional interferers is a harder problem than that of stationary source and interference cancellation. We explore joint multi channel linear prediction (MCLP) and relative transfer function (RTF) formulation in a stochastic framework and maximum likelihood estimation. We found that the combination of spatial filtering with dist…

    Submitted 22 October, 2019; originally announced October 2019.

    Comments: Manuscript submitted for review to IEEE/ACM Transactions on Audio, Speech, and Language Processing on 18 Jul 2019

  18. arXiv:1812.01346  [pdf, ps, other]

    eess.AS cs.SD

    LSTM based AE-DNN constraint for better late reverb suppression in multi-channel LP formulation

    Authors: Srikanth Raj Chetupalli, Thippur V. Sreenivas

    Abstract: Prediction of the late reverberation component using multi-channel linear prediction (MCLP) in the short-time Fourier transform (STFT) domain is an effective means to enhance reverberant speech. Traditionally, a speech power spectral density (PSD) weighted prediction error (WPE) minimization approach is used to estimate the prediction filters. The method is sensitive to the estimate of the desired signal…

    Submitted 4 December, 2018; originally announced December 2018.

  19. arXiv:1810.13109  [pdf, ps, other]

    eess.AS cs.SD

    Latent variable approach to diarization of audio recordings using ad-hoc randomly placed mobile devices

    Authors: Srikanth Raj Chetupalli, Anirban Bhowmick, Thippur V. Sreenivas

    Abstract: Diarization of audio recordings from ad-hoc mobile devices using spatial information is considered in this paper. A two-channel synchronous recording is assumed for each mobile device, which is used to compute directional statistics separately at each device in a frame-wise manner. The recordings across the mobile devices are asynchronous, but a coarse synchronization is performed by aligning the…

    Submitted 31 October, 2018; originally announced October 2018.

    Comments: Paper submitted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019, to be held in Brighton, UK, May 12-17, 2019