Skip to main content

Showing 1–2 of 2 results for author: Kumar, V B

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.08641  [pdf, ps, other

    cs.SD cs.CL eess.AS

    ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

    Authors: Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, **chuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe

    Abstract: ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). This benchmark treats the models as feature extractors and uses a single shallow downstream model, which can be fine-tuned for a downstream task. However, real-world use cases may require different configurations. This paper presents ML-SUPERB~2.0, which is a ne… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  2. arXiv:2303.10160  [pdf, other

    eess.AS cs.LG cs.SD

    Visual Information Matters for ASR Error Correction

    Authors: Vanya Bannihatti Kumar, Shanbo Cheng, Ningxin Peng, Yuchen Zhang

    Abstract: Aiming to improve the Automatic Speech Recognition (ASR) outputs with a post-processing step, ASR error correction (EC) techniques have been widely developed due to their efficiency in using parallel text data. Previous works mainly focus on using text or/ and speech data, which hinders the performance gain when not only text and speech information, but other modalities, such as visual information… ▽ More

    Submitted 26 May, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted at ICASSP 2023