Showing 1–7 of 7 results for author: Khapra, M M

Searching in archive eess.
  1. arXiv:2305.15760

    cs.CL cs.SD eess.AS

    Svarah: Evaluating English ASR Systems on Indian Accents

    Authors: Tahir Javed, Sakshi Joshi, Vignesh Nagarajan, Sai Sundaresan, Janki Nawale, Abhigyan Raman, Kaushal Bhogale, Pratyush Kumar, Mitesh M. Khapra

    Abstract: India is the second largest English-speaking country in the world with a speaker base of roughly 130 million. Thus, it is imperative that automatic speech recognition (ASR) systems for English should be evaluated on Indian accents. Unfortunately, Indian speakers find a very poor representation in existing English ASR benchmarks such as LibriSpeech, Switchboard, Speech Accent Archive, etc. In this…

    Submitted 25 May, 2023; originally announced May 2023.

  2. arXiv:2305.15386

    cs.CL cs.SD eess.AS

    Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR

    Authors: Kaushal Santosh Bhogale, Sai Sundaresan, Abhigyan Raman, Tahir Javed, Mitesh M. Khapra, Pratyush Kumar

    Abstract: Improving ASR systems is necessary to make new LLM-based use-cases accessible to people across the globe. In this paper, we focus on Indian languages, and make the case that diverse benchmarks are required to evaluate and improve ASR systems for Indian languages. To address this, we collate Vistaar as a set of 59 benchmarks across various language and domain combinations, on which we evaluate 3 pu…

    Submitted 2 August, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted in INTERSPEECH 2023

  3. arXiv:2211.09536

    cs.CL cs.LG cs.SD eess.AS

    Towards Building Text-To-Speech Systems for the Next Billion Users

    Authors: Gokul Karthik Kumar, Praveen S V, Pratyush Kumar, Mitesh M. Khapra, Karthik Nandakumar

    Abstract: Deep learning based text-to-speech (TTS) systems have been evolving rapidly with advances in model architectures, training methodologies, and generalization across speakers and languages. However, these advances have not been thoroughly investigated for Indian language speech synthesis. Such investigation is computationally expensive given the number and diversity of Indian languages, relatively l…

    Submitted 17 February, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted at ICASSP 2023. Gokul and Praveen contributed equally

  4. arXiv:2208.12666

    cs.CL cs.SD eess.AS

    Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages

    Authors: Kaushal Santosh Bhogale, Abhigyan Raman, Tahir Javed, Sumanth Doddapaneni, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

    Abstract: End-to-end (E2E) models have become the default choice for state-of-the-art speech recognition systems. Such models are trained on large amounts of labelled data, which are often not available for low-resource languages. Techniques such as self-supervised learning and transfer learning hold promise, but have not yet been effective in training accurate models. On the other hand, collecting labelled…

    Submitted 26 August, 2022; originally announced August 2022.

  5. arXiv:2208.11761

    cs.CL cs.SD eess.AS

    IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages

    Authors: Tahir Javed, Kaushal Santosh Bhogale, Abhigyan Raman, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

    Abstract: A cornerstone in AI research has been the creation and adoption of standardized training and test datasets to earmark the progress of state-of-the-art models. A particularly successful example is the GLUE dataset for training and evaluating Natural Language Understanding (NLU) models for English. The large body of research around self-supervised BERT-based language models revolved around performan…

    Submitted 15 December, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

  6. arXiv:2111.03945

    cs.CL cs.SD eess.AS

    Towards Building ASR Systems for the Next Billion Users

    Authors: Tahir Javed, Sumanth Doddapaneni, Abhigyan Raman, Kaushal Santosh Bhogale, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

    Abstract: Recent methods in speech and language technology pretrain very LARGE models which are fine-tuned for specific tasks. However, the benefits of such LARGE models are often limited to a few resource rich languages of the world. In this work, we make multiple contributions towards building ASR systems for low resource languages from the Indian subcontinent. First, we curate 17,000 hours of raw speech…

    Submitted 22 December, 2021; v1 submitted 6 November, 2021; originally announced November 2021.

  7. arXiv:2011.15045

    eess.IV cs.CV cs.LG stat.ML

    Unsupervised Deep Video Denoising

    Authors: Dev Yashpal Sheth, Sreyas Mohan, Joshua L. Vincent, Ramon Manzorro, Peter A. Crozier, Mitesh M. Khapra, Eero P. Simoncelli, Carlos Fernandez-Granda

    Abstract: Deep convolutional neural networks (CNNs) for video denoising are typically trained with supervision, assuming the availability of clean videos. However, in many applications, such as microscopy, noiseless videos are not available. To address this, we propose an Unsupervised Deep Video Denoiser (UDVD), a CNN architecture designed to be trained exclusively with noisy data. The performance of UDVD i…

    Submitted 19 August, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

    Comments: Dev and Sreyas contributed equally. To appear at 2021 IEEE/CVF International Conference on Computer Vision (ICCV). See https://sreyas-mohan.github.io/udvd/ for code and more results