
Showing 1–9 of 9 results for author: Bhogale, K

  1. arXiv:2403.01926

    cs.CL

    IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages

    Authors: Tahir Javed, Janki Atul Nawale, Eldho Ittan George, Sakshi Joshi, Kaushal Santosh Bhogale, Deovrat Mehendale, Ishvinder Virender Sethi, Aparna Ananthanarayanan, Hafsah Faquih, Pratiti Palit, Sneha Ravishankar, Saranya Sukumaran, Tripura Panchagnula, Sunjay Murali, Kunal Sharad Gandhi, Ambujavalli R, Manickam K M, C Venkata Vaijayanthi, Krishnan Srinivasa Raghavan Karunganni, Pratyush Kumar, Mitesh M Khapra

    Abstract: We present INDICVOICES, a dataset of natural and spontaneous speech containing a total of 7348 hours of read (9%), extempore (74%) and conversational (17%) audio from 16237 speakers covering 145 Indian districts and 22 languages. Of these 7348 hours, 1639 hours have already been transcribed, with a median of 73 hours per language. Through this paper, we share our journey of capturing the cultural,…

    Submitted 4 March, 2024; originally announced March 2024.

  2. arXiv:2305.15760

    cs.CL cs.SD eess.AS

    Svarah: Evaluating English ASR Systems on Indian Accents

    Authors: Tahir Javed, Sakshi Joshi, Vignesh Nagarajan, Sai Sundaresan, Janki Nawale, Abhigyan Raman, Kaushal Bhogale, Pratyush Kumar, Mitesh M. Khapra

    Abstract: India is the second largest English-speaking country in the world with a speaker base of roughly 130 million. Thus, it is imperative that automatic speech recognition (ASR) systems for English should be evaluated on Indian accents. Unfortunately, Indian speakers find a very poor representation in existing English ASR benchmarks such as LibriSpeech, Switchboard, Speech Accent Archive, etc. In this…

    Submitted 25 May, 2023; originally announced May 2023.

  3. arXiv:2305.15386

    cs.CL cs.SD eess.AS

    Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR

    Authors: Kaushal Santosh Bhogale, Sai Sundaresan, Abhigyan Raman, Tahir Javed, Mitesh M. Khapra, Pratyush Kumar

    Abstract: Improving ASR systems is necessary to make new LLM-based use-cases accessible to people across the globe. In this paper, we focus on Indian languages, and make the case that diverse benchmarks are required to evaluate and improve ASR systems for Indian languages. To address this, we collate Vistaar as a set of 59 benchmarks across various language and domain combinations, on which we evaluate 3 pu…

    Submitted 2 August, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted in INTERSPEECH 2023

  4. arXiv:2210.09866

    cs.CV cs.LG

    Towards Efficient and Effective Self-Supervised Learning of Visual Representations

    Authors: Sravanti Addepalli, Kaushal Bhogale, Priyam Dey, R. Venkatesh Babu

    Abstract: Self-supervision has emerged as a propitious method for visual representation learning after the recent paradigm shift from handcrafted pretext tasks to instance-similarity based approaches. Most state-of-the-art methods enforce similarity between various augmentations of a given image, while some methods additionally use contrastive approaches to explicitly ensure diverse representations. While t…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: ECCV 2022

  5. arXiv:2208.12666

    cs.CL cs.SD eess.AS

    Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages

    Authors: Kaushal Santosh Bhogale, Abhigyan Raman, Tahir Javed, Sumanth Doddapaneni, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

    Abstract: End-to-end (E2E) models have become the default choice for state-of-the-art speech recognition systems. Such models are trained on large amounts of labelled data, which are often not available for low-resource languages. Techniques such as self-supervised learning and transfer learning hold promise, but have not yet been effective in training accurate models. On the other hand, collecting labelled…

    Submitted 26 August, 2022; originally announced August 2022.

  6. arXiv:2208.11761

    cs.CL cs.SD eess.AS

    IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages

    Authors: Tahir Javed, Kaushal Santosh Bhogale, Abhigyan Raman, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

    Abstract: A cornerstone in AI research has been the creation and adoption of standardized training and test datasets to earmark the progress of state-of-the-art models. A particularly successful example is the GLUE dataset for training and evaluating Natural Language Understanding (NLU) models for English. The large body of research around self-supervised BERT-based language models revolved around performan…

    Submitted 15 December, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

  7. arXiv:2111.03945

    cs.CL cs.SD eess.AS

    Towards Building ASR Systems for the Next Billion Users

    Authors: Tahir Javed, Sumanth Doddapaneni, Abhigyan Raman, Kaushal Santosh Bhogale, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

    Abstract: Recent methods in speech and language technology pretrain very LARGE models which are fine-tuned for specific tasks. However, the benefits of such LARGE models are often limited to a few resource rich languages of the world. In this work, we make multiple contributions towards building ASR systems for low resource languages from the Indian subcontinent. First, we curate 17,000 hours of raw speech…

    Submitted 22 December, 2021; v1 submitted 6 November, 2021; originally announced November 2021.

  8. arXiv:2011.00809

    cs.CV eess.IV

    Data-free Knowledge Distillation for Segmentation using Data-Enriching GAN

    Authors: Kaushal Bhogale

    Abstract: Distilling knowledge from huge pre-trained networks to improve the performance of tiny networks has favored deep learning models to be used in many real-time and mobile applications. Several approaches that demonstrate success in this field have made use of the true training dataset to extract relevant knowledge. In absence of the True dataset, however, extracting knowledge from deep networks is s…

    Submitted 2 November, 2020; originally announced November 2020.

  9. arXiv:1811.06194

    cs.CV

    Face Verification and Forgery Detection for Ophthalmic Surgery Images

    Authors: Kaushal Bhogale, Nishant Shankar, Adheesh Juvekar, Asutosh Padhi

    Abstract: Although modern face verification systems are accessible and accurate, they are not always robust to pose variance and occlusions. Moreover, accurate models require a large amount of data to train. We structure our experiments to operate on small amounts of data obtained from an NGO that funds ophthalmic surgeries. We set up our face verification task as that of verifying pre-operation and post-op…

    Submitted 15 November, 2018; originally announced November 2018.