Showing 1–18 of 18 results for author: Dhawan, K

  1. arXiv:2406.19674  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

    Authors: Krishna C. Puvvada, Piotr Żelasko, He Huang, Oleksii Hrinchuk, Nithin Rao Koluguri, Kunal Dhawan, Somshubra Majumdar, Elena Rastorgueva, Zhehuai Chen, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg

    Abstract: Recent advances in speech recognition and translation rely on hundreds of thousands of hours of Internet speech data. We argue that state-of-the-art accuracy can be reached without relying on web-scale data. Canary, a multilingual ASR and speech translation model, outperforms current state-of-the-art models (Whisper, OWSM, and Seamless-M4T) on English, French, Spanish, and German languages, while b…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech-2024

  2. arXiv:2406.05298  [pdf, other]

    eess.AS

    Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis

    Authors: Ryan Langman, Ante Jukić, Kunal Dhawan, Nithin Rao Koluguri, Boris Ginsburg

    Abstract: Historically, most speech models in machine-learning have used the mel-spectrogram as a speech representation. Recently, discrete audio tokens produced by neural audio codecs have become a popular alternate speech representation for speech synthesis tasks such as text-to-speech (TTS). However, the data distribution produced by such codecs is too complex for some TTS models to predict, hence requir…

    Submitted 7 June, 2024; originally announced June 2024.

  3. arXiv:2310.12378  [pdf, other]

    eess.AS cs.SD

    The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

    Authors: Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

    Abstract: We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays. The system predominantly comprises the following integral modules: the Spea…

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023

  4. arXiv:2310.12371  [pdf, other]

    eess.AS cs.SD

    Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

    Authors: Tae Jin Park, He Huang, Coleman Hooper, Nithin Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg

    Abstract: We introduce a sophisticated multi-speaker speech data simulator, specifically engineered to generate multi-speaker speech recordings. A notable feature of this simulator is its capacity to modulate the distribution of silence and overlap via the adjustment of statistical parameters. This capability offers a tailored training environment for developing neural models suited for speaker diarization…

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023

  5. arXiv:2309.10922  [pdf, other]

    eess.AS cs.SD

    Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

    Authors: Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg

    Abstract: Discrete audio representation, aka audio tokenization, has seen renewed interest driven by its potential to facilitate the application of text language modeling approaches in the audio domain. To this end, various compression and representation-learning based tokenization schemes have been proposed. However, there is limited investigation into the performance of compression-based audio tokens compared…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Preprint. Submitted to ICASSP 2024

  6. arXiv:2309.08860  [pdf, other]

    cs.RO

    DenseTact-Mini: An Optical Tactile Sensor for Grasping Multi-Scale Objects From Flat Surfaces

    Authors: Won Kyung Do, Ankush Kundan Dhawan, Mathilda Kitzmann, Monroe Kennedy III

    Abstract: Dexterous manipulation, especially of small daily objects, continues to pose complex challenges in robotics. This paper introduces the DenseTact-Mini, an optical tactile sensor with a soft, rounded, smooth gel surface and compact design equipped with a synthetic fingernail. We propose three distinct grasping strategies: tap grasping using adhesion forces such as electrostatic and van der Waals, fi…

    Submitted 15 September, 2023; originally announced September 2023.

  7. arXiv:2309.05248  [pdf, other]

    eess.AS cs.SD

    Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

    Authors: Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam

    Abstract: Large language models (LLMs) have shown great promise for capturing contextual information in natural language processing tasks. We propose a novel approach to speaker diarization that incorporates the prowess of LLMs to exploit contextual cues in human dialogues. Our method builds upon an acoustic-based speaker diarization system by adding lexical information from an LLM in the inference stage. W…

    Submitted 13 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: 4 pages 1 reference page, ICASSP format

  8. arXiv:2306.08753  [pdf, other]

    eess.AS cs.CL cs.SD

    Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer

    Authors: Kunal Dhawan, Dima Rekesh, Boris Ginsburg

    Abstract: Code-Switching (CS) multilingual Automatic Speech Recognition (ASR) models can transcribe speech containing two or more alternating languages during a conversation. This paper proposes (1) a new method for creating code-switching ASR datasets from purely monolingual data sources, and (2) a novel Concatenated Tokenizer that enables ASR models to generate language ID for each emitted text token whil…

    Submitted 16 September, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

  9. arXiv:2109.14796  [pdf, other]

    cs.CL

    Phonetic Word Embeddings

    Authors: Rahul Sharma, Kunal Dhawan, Balakrishna Pailla

    Abstract: This work presents a novel methodology for calculating the phonetic similarity between words taking motivation from the human perception of sounds. This metric is employed to learn a continuous vector embedding space that groups similar sounding words together and can be used for various downstream computational phonology tasks. The efficacy of the method is presented for two different languages (…

    Submitted 29 September, 2021; originally announced September 2021.

  10. arXiv:1907.08293  [pdf, other]

    eess.AS cs.CL cs.SD

    Investigating Target Set Reduction for End-to-End Speech Recognition of Hindi-English Code-Switching Data

    Authors: Kunal Dhawan, Ganji Sreeram, Kumar Priyadarshi, Rohit Sinha

    Abstract: End-to-end (E2E) systems are fast replacing the conventional systems in the domain of automatic speech recognition. As the target labels are learned directly from speech data, the E2E systems need a bigger corpus for effective training. In the context of the code-switching task, the E2E systems face two challenges: (i) the expansion of the target set due to multiple languages involved, and (ii) the la…

    Submitted 15 July, 2019; originally announced July 2019.

  11. arXiv:1907.06859  [pdf, other]

    eess.AS cs.SD

    Towards Adapting NMF Dictionaries Using Total Variability Modeling for Noise-Robust Acoustic Features

    Authors: Kunal Dhawan, Colin Vaz, Ruchir Travadi, Shrikanth Narayanan

    Abstract: We propose an algorithm to extract noise-robust acoustic features from noisy speech. We use Total Variability Modeling in combination with Non-negative Matrix Factorization (NMF) to learn a total variability subspace and adapt NMF dictionaries for each utterance. Unlike several other approaches for extracting noise-robust features, our algorithm does not require a training corpus of parallel clean…

    Submitted 16 July, 2019; originally announced July 2019.

  12. arXiv:1907.06342  [pdf, other]

    cs.CL cs.SD eess.AS

    Joint Language Identification of Code-Switching Speech using Attention based E2E Network

    Authors: Sreeram Ganji, Kunal Dhawan, Kumar Priyadarshi, Rohit Sinha

    Abstract: Language identification (LID) has relevance in many speech processing applications. For the automatic recognition of code-switching speech, the conventional approaches often employ an LID system for detecting the languages present within an utterance. In the existing works, the LID on code-switching speech involves modelling of the underlying languages separately. In this work, we propose a joint…

    Submitted 15 July, 2019; originally announced July 2019.

  13. arXiv:1810.00662  [pdf, other]

    cs.CL

    Hindi-English Code-Switching Speech Corpus

    Authors: Ganji Sreeram, Kunal Dhawan, Rohit Sinha

    Abstract: Code-switching refers to the usage of two languages within a sentence or discourse. It is a global phenomenon among multilingual communities and has emerged as an independent area of research. With the increasing demand for code-switching automatic speech recognition (ASR) systems, the development of a code-switching speech corpus has become highly desirable. However, for training such systems…

    Submitted 23 September, 2018; originally announced October 2018.

  14. From fusion to total disassembly: global stopping in heavy-ion collisions

    Authors: Jatinder K. Dhawan, Narinder Dhiman, Aman D. Sood, Rajeev K. Puri

    Abstract: Using the quantum molecular dynamics model, we aim to investigate the emission of light complex particles, and degree of stopping reached in heavy-ion collisions. We took incident energies between 50 and 1000 MeV/nucleon. In addition, central and peripheral collisions and different masses are also considered. We observe that the light complex particles act in almost similar manner as anisotr…

    Submitted 12 July, 2010; originally announced July 2010.

    Journal ref: Phys.Rev.C74:057901,2006

  15. Study of fragmentation using clusterization algorithm with realistic binding energies

    Authors: Yogesh K. Vermani, Jatinder K. Dhawan, Supriya Goyal, Rajeev K. Puri, J. Aichelin

    Abstract: We here study fragmentation using the simulated annealing clusterization algorithm (SACA) with binding energy at a microscopic level. In an earlier version, a constant binding energy (4 MeV/nucleon) was used. We improve this binding energy criterion by calculating the binding energy of different clusters using modified Bethe-Weizsäcker mass (BWM) formula. We also compare our calculations with…

    Submitted 28 December, 2009; originally announced December 2009.

    Journal ref: J.Phys.G37:015105,2010

  16. An Improved Limit on the Muon Electric Dipole Moment

    Authors: G. W. Bennett, B. Bousquet, H. N. Brown, G. Bunce, R. M. Carey, P. Cushman, G. T. Danby, P. T. Debevec, M. Deile, H. Deng, W. Deninger, S. K. Dhawan, V. P. Druzhinin, L. Duong, E. Efstathiadis, F. J. M. Farley, G. V. Fedotovich, S. Giron, F. E. Gray, D. Grigoriev, M. Grosse-Perdekamp, A. Grossmann, M. F. Hare, D. W. Hertzog, X. Huang, et al. (51 additional authors not shown)

    Abstract: Three independent searches for an electric dipole moment (EDM) of the positive and negative muons have been performed, using spin precession data from the muon g-2 storage ring at Brookhaven National Laboratory. Details on the experimental apparatus and the three analyses are presented. Since the individual results on the positive and negative muon, as well as the combined result, d=-0.1(0.9)E-1…

    Submitted 26 July, 2009; v1 submitted 7 November, 2008; originally announced November 2008.

    Comments: 19 pages, 15 figures, 7 tables

  17. High-statistics measurement of the pion form factor in the rho-meson energy range with the CMD-2 detector

    Authors: The CMD-2 Collaboration: R. R. Akhmetshin, V. M. Aulchenko, V. Sh. Banzarov, L. M. Barkov, N. S. Bashtovoy, A. E. Bondar, D. V. Bondarev, A. V. Bragin, S. K. Dhawan, S. I. Eidelman, D. A. Epifanov, G. V. Fedotovich, N. I. Gabyshev, D. A. Gorbachev, A. A. Grebenuk, D. N. Grigoriev, V. W. Hughes, F. V. Ignatov, S. V. Karpov, V. F. Kazanin, B. I. Khazin, I. A. Koop, P. P. Krokovny, et al. (31 additional authors not shown)

    Abstract: We present a measurement of the pion form factor based on e+e- annihilation data from the CMD-2 detector in the energy range 0.6<sqrt(s)<1.0 GeV with a systematic uncertainty of 0.8%. The data sample is five times larger than that used in our previous measurement.

    Submitted 28 January, 2007; v1 submitted 9 October, 2006; originally announced October 2006.

    Comments: 18 pages, 10 figures. Added comparison with KLOE measurement, minor updates. Accepted by PLB

    Journal ref: Phys.Lett.B648:28-38,2007

  18. Measurement of the e+e- -> pi+pi- cross section with the CMD-2 detector in the 370-520 MeV c.m. energy range

    Authors: R. R. Akhmetshin, V. M. Aulchenko, V. Sh. Banzarov, L. M. Barkov, N. S. Bashtovoy, A. E. Bondar, D. V. Bondarev, A. V. Bragin, S. K. Dhawan, S. I. Eidelman, D. A. Epifanov, G. V. Fedotovich, N. I. Gabyshev, D. A. Gorbachev, A. A. Grebenuk, D. N. Grigoriev, V. W. Hughes, F. V. Ignatov, S. V. Karpov, V. F. Kazanin, B. I. Khazin, I. A. Koop, P. P. Krokovny, A. S. Kuzmin, I. B. Logashenko, et al. (28 additional authors not shown)

    Abstract: The cross section of the process e+e- -> pi+pi- has been measured at the CMD-2 detector in the 370-520 MeV center-of-mass (c.m.) energy range. The systematic uncertainty of the measurement is 0.7%. Using all CMD-2 data on the pion form factor, the pion electromagnetic radius was calculated. The cross section of muon pair production was also determined.

    Submitted 6 October, 2006; v1 submitted 6 October, 2006; originally announced October 2006.

    Comments: 11 pages, 4 figures

    Journal ref: JETPLett.84:413-417,2006; PismaZh.Eksp.Teor.Fiz.84:491-495,2006