Showing 1–11 of 11 results for author: Prasanna, S R M

  1. arXiv:2406.09494  [pdf, other]

    eess.AS cs.LG

    The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments

    Authors: Shareef Babu Kalluri, Prachi Singh, Pratik Roy Chowdhuri, Apoorva Kulkarni, Shikha Baghel, Pradyoth Hegde, Swapnil Sontakke, Deepak K T, S. R. Mahadeva Prasanna, Deepu Vijayasenan, Sriram Ganapathy

    Abstract: The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE) 2024 challenge is the second in the series of DISPLACE challenges, which involves tasks of speaker diarization (SD) and language diarization (LD) on a challenging multilingual conversational speech dataset. In the DISPLACE 2024 challenge, we also introduced the task of automatic speech recognition (ASR) on this datas…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, Interspeech 2024

  2. arXiv:2308.10470  [pdf, other]

    eess.AS cs.CL cs.SD

    Implicit Self-supervised Language Representation for Spoken Language Diarization

    Authors: Jagabandhu Mishra, S. R. Mahadeva Prasanna

    Abstract: In a code-switched (CS) scenario, the use of spoken language diarization (LD) as a pre-processing system is essential. Further, the use of implicit frameworks is preferable over the explicit framework, as it can be easily adapted to deal with low/zero resource languages. Inspired by speaker diarization (SD) literature, three frameworks based on (1) fixed segmentation, (2) change point-based segmen…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Planning to Submit in IEEE-JSTSP

  3. arXiv:2306.12913  [pdf, other]

    eess.AS cs.CL cs.SD

    Implicit spoken language diarization

    Authors: Jagabandhu Mishra, Amartya Chowdhury, S. R. Mahadeva Prasanna

    Abstract: Spoken language diarization (LD) and related tasks are mostly explored using the phonotactic approach. Phonotactic approaches mostly use an explicit way of language modeling, hence requiring intermediate phoneme modeling and transcribed data. Alternatively, the ability of deep learning approaches to model temporal dynamics may help for the implicit modeling of language information through deep embedd…

    Submitted 22 June, 2023; originally announced June 2023.

  4. arXiv:2302.13209  [pdf, other]

    eess.AS cs.SD

    I-MSV 2022: Indic-Multilingual and Multi-sensor Speaker Verification Challenge

    Authors: Jagabandhu Mishra, Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna

    Abstract: Speaker Verification (SV) is a task to verify the claimed identity of the claimant using his/her voice sample. Though there exists an ample amount of research in SV technologies, the development concerning a multilingual conversation is limited. In a country like India, almost all the speakers are polyglot in nature. Consequently, the development of a Multilingual SV (MSV) system on the data colle…

    Submitted 25 February, 2023; originally announced February 2023.

  5. Spoken language change detection inspired by speaker change detection

    Authors: Jagabandhu Mishra, S. R. Mahadeva Prasanna

    Abstract: Spoken language change detection (LCD) refers to identifying the language transitions in a code-switched utterance. Similarly, identifying the speaker transitions in a multispeaker utterance is known as speaker change detection (SCD). Since task-wise both are similar, the architecture/framework developed for the SCD task may be suitable for the LCD task. Hence, the aim of the present work is to d…

    Submitted 10 February, 2023; originally announced February 2023.

  6. arXiv:2203.02680   

    eess.AS cs.SD eess.SP

    Language vs Speaker Change: A Comparative Study

    Authors: Jagabandhu Mishra, S. R. Mahadeva Prasanna

    Abstract: Spoken language change detection (LCD) refers to detecting language switching points in a multilingual speech signal. Speaker change detection (SCD) refers to locating the speaker change points in a multispeaker speech signal. The objective of this work is to understand the challenges in the LCD task by comparing it with the SCD task. Human subjective study for change detection is performed for LCD and SC…

    Submitted 6 October, 2023; v1 submitted 5 March, 2022; originally announced March 2022.

    Comments: The work is substantially modified. The new version of the same will be submitted soon

  7. arXiv:2110.00797  [pdf, other]

    eess.AS cs.SD

    Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition

    Authors: Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna

    Abstract: The automatic recognition of pathological speech, particularly from children with any articulatory impairment, is a challenging task due to various reasons. The lack of available domain specific data is one such obstacle that hinders its usage for different speech-based applications targeting pathological speakers. In line with the challenge, in this work, we investigate a few data augmentation te…

    Submitted 2 October, 2021; originally announced October 2021.

  8. arXiv:2110.00794  [pdf, other]

    cs.SD eess.AS q-bio.QM

    Processing Phoneme Specific Segments for Cleft Lip and Palate Speech Enhancement

    Authors: Protima Nomo Sudro, Rohit Sinha, S. R. Mahadeva Prasanna

    Abstract: The cleft lip and palate (CLP) speech intelligibility is distorted due to the deformation in their articulatory system. For addressing the same, a few previous works perform phoneme specific modification in CLP speech. In CLP speech, both the articulation error and the nasalization distort the intelligibility of a word. Consequently, modification of a specific phoneme may not always yield in enha…

    Submitted 2 October, 2021; originally announced October 2021.

  9. Sonority Measurement Using System, Source, and Suprasegmental Information

    Authors: Bidisha Sharma, S. R. Mahadeva Prasanna

    Abstract: Sonorant sounds are characterized by regions with prominent formant structure, high energy and high degree of periodicity. In this work, the vocal-tract system, excitation source and suprasegmental features derived from the speech signal are analyzed to measure the sonority information present in each of them. Vocal-tract system information is extracted from the Hilbert envelope of numerator of gr…

    Submitted 1 July, 2021; originally announced July 2021.

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing (Volume: 25, Issue: 3, March 2017)

  10. arXiv:2102.00270  [pdf, other]

    eess.AS

    Enhancing the Intelligibility of Cleft Lip and Palate Speech using Cycle-consistent Adversarial Networks

    Authors: Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S R Mahadeva Prasanna

    Abstract: Cleft lip and palate (CLP) refer to a congenital craniofacial condition that causes various speech-related disorders. As a result of structural and functional deformities, the affected subjects' speech intelligibility is significantly degraded, limiting the accessibility and usability of speech-controlled devices. Towards addressing this problem, it is desirable to improve the CLP speech intelligi…

    Submitted 30 January, 2021; originally announced February 2021.

    Comments: 8 pages, 4 figures, IEEE spoken language and technology workshop

  11. arXiv:1811.01222  [pdf, ps, other]

    eess.AS cs.SD

    Time-Frequency Audio Features for Speech-Music Classification

    Authors: Mrinmoy Bhattacharjee, S. R. M. Prasanna, Prithwijit Guha

    Abstract: Distinct striation patterns are observed in the spectrograms of speech and music. This motivated us to propose three novel time-frequency features for speech-music classification. These features are extracted in two stages. First, a preset number of prominent spectral peak locations are identified from the spectra of each frame. These important peak locations obtained from each frame are used to f…

    Submitted 3 November, 2018; originally announced November 2018.

    Comments: 4 pages, 16 figures