A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos

Rai, Anand Kumar; Jaiswal, Siddharth D; Mukherjee, Animesh

arXiv:2307.10587 (cs)

[Submitted on 20 Jul 2023]

Abstract: Automatic speech recognition (ASR) systems are designed to transcribe spoken language into written text and are used in a variety of applications, including voice assistants and transcription services. However, state-of-the-art ASR systems that deliver impressive benchmark results have been observed to struggle with speakers from certain regions or demographic groups because of variation in their speech properties. In this work, we describe the curation of a massive speech dataset of 8740 hours, consisting of $\sim9.8$K technical lectures in English, along with their transcripts, delivered by instructors representing various parts of the Indian demography. The dataset is sourced from the very popular NPTEL MOOC platform. We use the curated dataset to measure the existing disparity in the performance of YouTube Automatic Captions and the OpenAI Whisper model across the diverse demographic traits of speakers in India. While there is disparity based on the gender, native region, age, and speech rate of speakers, disparity based on caste is non-existent. We also observe statistically significant disparity across the disciplines of the lectures. These results indicate the need for more inclusive and robust ASR systems and for more representative datasets for evaluating disparity in them.
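
For readers unfamiliar with the metric in the title, the following is a minimal sketch of how word error rate (WER) is commonly computed and then averaged per speaker group. It uses the standard definition, word-level edit distance divided by the number of reference words. The function names, the lowercase-and-split normalization, and the sample tuples are illustrative assumptions; the abstract does not state the authors' exact text normalization or aggregation procedure.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: text is normalized only by lowercasing and whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def mean_wer_by_group(samples):
    """Average WER per group from (group, reference, hypothesis) tuples.

    The grouping key is hypothetical, e.g. a speaker attribute or a discipline.
    """
    scores = defaultdict(list)
    for group, reference, hypothesis in samples:
        scores[group].append(word_error_rate(reference, hypothesis))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}


if __name__ == "__main__":
    # Made-up sentences for illustration only; not data from the paper.
    demo = [
        ("group_a", "the cat sat on the mat", "the cat sit on mat"),
        ("group_b", "the cat sat on the mat", "the cat sat on the mat"),
    ]
    print(mean_wer_by_group(demo))
```

Comparing the per-group means returned by `mean_wer_by_group` is the simplest way to surface a disparity; deciding whether a gap is statistically significant would require a separate test, which this sketch does not attempt.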

Subjects:	Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2307.10587 [cs.CL]
	(or arXiv:2307.10587v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2307.10587

Submission history

From: Anand Kumar Rai
[v1] Thu, 20 Jul 2023 05:03:00 UTC (4,544 KB)
