Showing 1–3 of 3 results for author: Mitra, A C

  1. arXiv:2403.00212  [pdf, other]

    cs.CL cs.CV cs.LG cs.SD eess.AS

    Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART

    Authors: Aniket Tathe, Anand Kamble, Suyash Kumbharkar, Atharva Bhandare, Anirban C. Mitra

    Abstract: This research addresses the challenge of training an ASR model for personalized voices with minimal data. Utilizing just 14 minutes of custom audio from a YouTube video, we employ Retrieval-Based Voice Conversion (RVC) to create a custom Common Voice 16.0 corpus. Subsequently, a Cross-lingual Self-supervised Representations (XLSR) Wav2Vec2 model is fine-tuned on this dataset. The developed web-bas…

    Submitted 29 February, 2024; originally announced March 2024.

  2. arXiv:2401.06183  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2

    Authors: Aniket Tathe, Anand Kamble, Suyash Kumbharkar, Atharva Bhandare, Anirban C. Mitra

    Abstract: Speech has long been a barrier to effective communication and connection, persisting as a challenge in our increasingly interconnected world. This research paper introduces a transformative solution to this persistent obstacle: an end-to-end speech conversion framework tailored for Hindi-to-English translation, culminating in the synthesis of English audio. By integrating cutting-edge technologies…

    Submitted 10 January, 2024; originally announced January 2024.

  3. arXiv:2311.14836  [pdf, other]

    cs.SD cs.CL eess.AS

    Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion

    Authors: Anand Kamble, Aniket Tathe, Suyash Kumbharkar, Atharva Bhandare, Anirban C. Mitra

    Abstract: This paper proposes two innovative methodologies to construct customized Common Voice datasets for low-resource languages like Hindi. The first methodology leverages Bark, a transformer-based text-to-audio model developed by Suno, and incorporates Meta's EnCodec and a pre-trained HuBERT model to enhance Bark's performance. The second methodology employs Retrieval-Based Voice Conversion (RVC) and u…

    Submitted 9 January, 2024; v1 submitted 24 November, 2023; originally announced November 2023.