Showing 1–5 of 5 results for author: Zevallos, R

Searching in archive cs.
  1. arXiv:2310.03639  [pdf, ps, other]

    cs.CL eess.AS

    Evaluating Self-Supervised Speech Representations for Indigenous American Languages

    Authors: Chih-Chen Chen, William Chen, Rodolfo Zevallos, John E. Ortega

    Abstract: The application of self-supervision to speech representation learning has garnered significant interest in recent years, due to its scalability to large amounts of unlabeled data. However, much progress, both in terms of pre-training and downstream evaluation, has remained concentrated in monolingual models that only consider English. Few models consider other languages, and even fewer consider in…

    Submitted 8 October, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

  2. arXiv:2207.06872  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Data Augmentation for Low-Resource Quechua ASR Improvement

    Authors: Rodolfo Zevallos, Nuria Bel, Guillermo Cámbara, Mireia Farrús, Jordi Luque

    Abstract: Automatic Speech Recognition (ASR) is a key element in new services that helps users to interact with an automated system. Deep learning methods have made it possible to deploy systems with word error rates below 5% for ASR of English. However, the use of these methods is only available for languages with hundreds or thousands of hours of audio and their corresponding transcriptions. For the so-ca…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: Accepted to INTERSPEECH 2022. arXiv admin note: substantial text overlap with arXiv:2204.00291

  3. arXiv:2207.05498  [pdf, ps, other]

    cs.CL

    Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition

    Authors: Rodolfo Zevallos, Luis Camacho, Nelsi Melgarejo

    Abstract: The Huqariq corpus is a multilingual collection of speech from native Peruvian languages. The transcribed corpus is intended for the research and development of speech technologies to preserve endangered languages in Peru. Huqariq is primarily designed for the development of automatic speech recognition, language identification and text-to-speech tools. In order to achieve corpus collection sustai…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: Language Resources and Evaluation Conference (LREC 2022)

  4. arXiv:2205.15599  [pdf, other]

    cs.CL

    Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish

    Authors: Alp Öktem, Rodolfo Zevallos, Yasmin Moslem, Güneş Öztürk, Karen Şarhon

    Abstract: We develop machine translation and speech synthesis systems to complement the efforts of revitalizing Judeo-Spanish, the exiled language of Sephardic Jews, which survived for centuries, but now faces the threat of extinction in the digital age. Building on resources created by the Sephardic community of Turkey and elsewhere, we create corpora and tools that would help preserve this language for fu…

    Submitted 31 May, 2022; originally announced May 2022.

  5. arXiv:2204.00291  [pdf]

    cs.CL cs.SD eess.AS

    Text-To-Speech Data Augmentation for Low Resource Speech Recognition

    Authors: Rodolfo Zevallos

    Abstract: Nowadays, the main problem of deep learning techniques used in the development of automatic speech recognition (ASR) models is the lack of transcribed data. The goal of this research is to propose a new data augmentation method to improve ASR models for agglutinative and low-resource languages. This novel data augmentation method generates both synthetic text and synthetic audio. Some experiments…

    Submitted 1 April, 2022; originally announced April 2022.