Showing 1–1 of 1 results for author: Cerón, J F

  1. arXiv:2111.09344  [pdf, other]

    cs.LG stat.ML

    The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage

    Authors: Daniel Galvez, Greg Diamos, Juan Ciro, Juan Felipe Cerón, Keith Achorn, Anjali Gopi, David Kanter, Maximilian Lam, Mark Mazumder, Vijay Janapa Reddi

    Abstract: The People's Speech is a free-to-download 30,000-hour and growing supervised conversational English speech recognition dataset licensed for academic and commercial usage under CC-BY-SA (with a CC-BY subset). The data is collected via searching the Internet for appropriately licensed audio data with existing transcriptions. We describe our data collection methodology and release our data collection…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: Part of 2021 Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks