Showing 1–12 of 12 results for author: Dhawan, K

  1. arXiv:2406.19674  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

    Authors: Krishna C. Puvvada, Piotr Żelasko, He Huang, Oleksii Hrinchuk, Nithin Rao Koluguri, Kunal Dhawan, Somshubra Majumdar, Elena Rastorgueva, Zhehuai Chen, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg

    Abstract: Recent advances in speech recognition and translation rely on hundreds of thousands of hours of Internet speech data. We argue that state-of-the-art accuracy can be reached without relying on web-scale data. Canary - multilingual ASR and speech translation model, outperforms current state-of-the-art models - Whisper, OWSM, and Seamless-M4T on English, French, Spanish, and German languages, while b…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech-2024

  2. arXiv:2310.12378  [pdf, other]

    eess.AS cs.SD

    The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

    Authors: Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

    Abstract: We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays. The system predominantly comprises of the following integral modules: the Spea…

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023

  3. arXiv:2310.12371  [pdf, other]

    eess.AS cs.SD

    Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

    Authors: Tae Jin Park, He Huang, Coleman Hooper, Nithin Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg

    Abstract: We introduce a sophisticated multi-speaker speech data simulator, specifically engineered to generate multi-speaker speech recordings. A notable feature of this simulator is its capacity to modulate the distribution of silence and overlap via the adjustment of statistical parameters. This capability offers a tailored training environment for developing neural models suited for speaker diarization…

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023

  4. arXiv:2309.10922  [pdf, other]

    eess.AS cs.SD

    Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

    Authors: Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg

    Abstract: Discrete audio representation, aka audio tokenization, has seen renewed interest driven by its potential to facilitate the application of text language modeling approaches in audio domain. To this end, various compression and representation-learning based tokenization schemes have been proposed. However, there is limited investigation into the performance of compression-based audio tokens compared…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Preprint. Submitted to ICASSP 2024

  5. arXiv:2309.08860  [pdf, other]

    cs.RO

    DenseTact-Mini: An Optical Tactile Sensor for Grasping Multi-Scale Objects From Flat Surfaces

    Authors: Won Kyung Do, Ankush Kundan Dhawan, Mathilda Kitzmann, Monroe Kennedy III

    Abstract: Dexterous manipulation, especially of small daily objects, continues to pose complex challenges in robotics. This paper introduces the DenseTact-Mini, an optical tactile sensor with a soft, rounded, smooth gel surface and compact design equipped with a synthetic fingernail. We propose three distinct grasping strategies: tap grasping using adhesion forces such as electrostatic and van der Waals, fi…

    Submitted 15 September, 2023; originally announced September 2023.

  6. arXiv:2309.05248  [pdf, other]

    eess.AS cs.SD

    Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

    Authors: Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam

    Abstract: Large language models (LLMs) have shown great promise for capturing contextual information in natural language processing tasks. We propose a novel approach to speaker diarization that incorporates the prowess of LLMs to exploit contextual cues in human dialogues. Our method builds upon an acoustic-based speaker diarization system by adding lexical information from an LLM in the inference stage. W…

    Submitted 13 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: 4 pages, 1 reference page, ICASSP format

  7. arXiv:2306.08753  [pdf, other]

    eess.AS cs.CL cs.SD

    Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer

    Authors: Kunal Dhawan, Dima Rekesh, Boris Ginsburg

    Abstract: Code-Switching (CS) multilingual Automatic Speech Recognition (ASR) models can transcribe speech containing two or more alternating languages during a conversation. This paper proposes (1) a new method for creating code-switching ASR datasets from purely monolingual data sources, and (2) a novel Concatenated Tokenizer that enables ASR models to generate language ID for each emitted text token whil…

    Submitted 16 September, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

  8. arXiv:2109.14796  [pdf, other]

    cs.CL

    Phonetic Word Embeddings

    Authors: Rahul Sharma, Kunal Dhawan, Balakrishna Pailla

    Abstract: This work presents a novel methodology for calculating the phonetic similarity between words taking motivation from the human perception of sounds. This metric is employed to learn a continuous vector embedding space that groups similar sounding words together and can be used for various downstream computational phonology tasks. The efficacy of the method is presented for two different languages (…

    Submitted 29 September, 2021; originally announced September 2021.

  9. arXiv:1907.08293  [pdf, other]

    eess.AS cs.CL cs.SD

    Investigating Target Set Reduction for End-to-End Speech Recognition of Hindi-English Code-Switching Data

    Authors: Kunal Dhawan, Ganji Sreeram, Kumar Priyadarshi, Rohit Sinha

    Abstract: End-to-end (E2E) systems are fast replacing the conventional systems in the domain of automatic speech recognition. As the target labels are learned directly from speech data, the E2E systems need a bigger corpus for effective training. In the context of code-switching task, the E2E systems face two challenges: (i) the expansion of the target set due to multiple languages involved, and (ii) the la…

    Submitted 15 July, 2019; originally announced July 2019.

  10. arXiv:1907.06859  [pdf, other]

    eess.AS cs.SD

    Towards Adapting NMF Dictionaries Using Total Variability Modeling for Noise-Robust Acoustic Features

    Authors: Kunal Dhawan, Colin Vaz, Ruchir Travadi, Shrikanth Narayanan

    Abstract: We propose an algorithm to extract noise-robust acoustic features from noisy speech. We use Total Variability Modeling in combination with Non-negative Matrix Factorization (NMF) to learn a total variability subspace and adapt NMF dictionaries for each utterance. Unlike several other approaches for extracting noise-robust features, our algorithm does not require a training corpus of parallel clean…

    Submitted 16 July, 2019; originally announced July 2019.

  11. arXiv:1907.06342  [pdf, other]

    cs.CL cs.SD eess.AS

    Joint Language Identification of Code-Switching Speech using Attention based E2E Network

    Authors: Sreeram Ganji, Kunal Dhawan, Kumar Priyadarshi, Rohit Sinha

    Abstract: Language identification (LID) has relevance in many speech processing applications. For the automatic recognition of code-switching speech, the conventional approaches often employ an LID system for detecting the languages present within an utterance. In the existing works, the LID on code-switching speech involves modelling of the underlying languages separately. In this work, we propose a joint…

    Submitted 15 July, 2019; originally announced July 2019.

  12. arXiv:1810.00662  [pdf, other]

    cs.CL

    Hindi-English Code-Switching Speech Corpus

    Authors: Ganji Sreeram, Kunal Dhawan, Rohit Sinha

    Abstract: Code-switching refers to the usage of two languages within a sentence or discourse. It is a global phenomenon among multilingual communities and has emerged as an independent area of research. With the increasing demand for the code-switching automatic speech recognition (ASR) systems, the development of a code-switching speech corpus has become highly desirable. However, for training such systems…

    Submitted 23 September, 2018; originally announced October 2018.