Showing 1–8 of 8 results for author: Sitaram, S

Searching in archive eess.
  1. arXiv:2310.05078  [pdf, other]

    eess.AS cs.SD

    Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting

    Authors: Hemant Yadav, Erica Cooper, Junichi Yamagishi, Sunayana Sitaram, Rajiv Ratn Shah

    Abstract: This paper introduces a novel objective function for quality mean opinion score (MOS) prediction of unseen speech synthesis systems. The proposed function measures the similarity of the relative positions of predicted MOS values within a mini-batch, rather than the actual MOS values. That is, the partial rank similarity (PRS) is measured rather than the individual MOS values, as with the L1 loss. Our exper…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to ASRU 2023

  2. arXiv:2303.06982  [pdf, other]

    cs.SD eess.AS

    Analysing the Masked predictive coding training criterion for pre-training a Speech Representation Model

    Authors: Hemant Yadav, Sunayana Sitaram, Rajiv Ratn Shah

    Abstract: Recent developments in pre-trained speech representation utilizing self-supervised learning (SSL) have yielded exceptional results on a variety of downstream tasks. One such technique, known as masked predictive coding (MPC), has been employed by some of the most high-performing models. In this study, we investigate the impact of MPC loss on the type of information learnt at various layers in the…

    Submitted 11 January, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

  3. arXiv:2211.16319  [pdf, other]

    eess.AS cs.CL cs.SD

    Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition

    Authors: Injy Hamed, Amir Hussein, Oumnia Chellah, Shammur Chowdhury, Hamdy Mubarak, Sunayana Sitaram, Nizar Habash, Ahmed Ali

    Abstract: Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition. In this paper, we focus on the question of robust and fair evaluation metrics. To that end, we develop a reference benchmark data set of code-switching speech recognition hypotheses with human judgments. We define clear guidelines for minimal editing of automatic hypotheses. We validate the…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: Accepted to SLT 2022

  4. arXiv:2202.12576  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    A Survey of Multilingual Models for Automatic Speech Recognition

    Authors: Hemant Yadav, Sunayana Sitaram

    Abstract: Although Automatic Speech Recognition (ASR) systems have achieved human-like performance for a few languages, the majority of the world's languages do not have usable systems due to the lack of large speech datasets to train these models. Cross-lingual transfer is an attractive solution to this problem, because low-resource languages can potentially benefit from higher-resource languages either th…

    Submitted 25 February, 2022; originally announced February 2022.

    Comments: 9 pages. Submitted to LREC 2022

  5. Multilingual and code-switching ASR challenges for low resource Indian languages

    Authors: Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram, Basil Abraham

    Abstract: Recently, there has been increasing interest in multilingual automatic speech recognition (ASR), where a speech recognition system caters to multiple low resource languages by taking advantage of low amounts of labeled corpora in multiple languages. With multilingualism becoming common in today's world, there has been increasing interest in code-switching ASR as well. In code-switching, multiple language…

    Submitted 31 March, 2021; originally announced April 2021.

    Comments: 6 pages

  6. arXiv:2006.05257  [pdf, other]

    eess.AS cs.CL cs.SD

    Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition

    Authors: Gurunath Reddy Madhumani, Sanket Shah, Basil Abraham, Vikas Joshi, Sunayana Sitaram

    Abstract: Recognizing code-switched speech is challenging for Automatic Speech Recognition (ASR) for a variety of reasons, including the lack of code-switched training data. Recently, we showed that monolingual ASR systems fine-tuned on code-switched data deteriorate in performance on monolingual speech recognition, which is not desirable as ASR systems deployed in multilingual scenarios should recognize bo…

    Submitted 9 June, 2020; originally announced June 2020.

    Comments: 5 pages (4 pages + 1 reference), 3 tables, 2 figures

  7. arXiv:2006.00782  [pdf, other]

    eess.AS cs.CL cs.SD

    Learning to Recognize Code-switched Speech Without Forgetting Monolingual Speech Recognition

    Authors: Sanket Shah, Basil Abraham, Gurunath Reddy M, Sunayana Sitaram, Vikas Joshi

    Abstract: Recently, there has been significant progress made in Automatic Speech Recognition (ASR) of code-switched speech, leading to gains in accuracy on code-switched datasets in many language pairs. Code-switched speech co-occurs with monolingual speech in one or both languages being mixed. In this work, we show that fine-tuning ASR models on code-switched speech harms performance on monolingual speech.…

    Submitted 1 June, 2020; originally announced June 2020.

    Comments: 5 pages (4 pages + 1 page references), 5 tables, 1 figure, 1 algorithm, 16 references

  8. arXiv:1906.09426  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    End-to-End ASR for Code-switched Hindi-English Speech

    Authors: Brij Mohan Lal Srivastava, Basil Abraham, Sunayana Sitaram, Rupesh Mehta, Preethi Jyothi

    Abstract: End-to-end (E2E) models have been explored for large speech corpora and have been found to match or outperform traditional pipeline-based systems in some languages. However, most prior work on end-to-end models uses speech corpora exceeding hundreds or thousands of hours. In this study, we explore end-to-end models for code-switched Hindi-English speech with less than 50 hours of data. We utilize…

    Submitted 22 June, 2019; originally announced June 2019.