data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup

Lodagala, Vasista Sai; Ghosh, Sreyan; Umesh, S.

arXiv:2211.01246 (eess)

[Submitted on 2 Nov 2022 (v1), last revised 13 May 2023 (this version, v2)]

Abstract: In this paper, we propose a new Self-Supervised Learning (SSL) algorithm called data2vec-aqc for speech representation learning from unlabeled speech data. Our goal is to improve SSL for speech in domains where both unlabeled and labeled data are limited. Building on the recently introduced data2vec, we add modules to the data2vec framework that leverage the benefits of data augmentations, quantized representations, and clustering. The interaction between these modules helps solve the cross-contrastive loss as an additional self-supervised objective. data2vec-aqc achieves up to 14.1% and 20.9% relative WER improvement over the existing state-of-the-art data2vec system on the test-clean and test-other sets of LibriSpeech, respectively, without the use of any language model (LM). Our proposed model also achieves up to 17.8% relative WER gains over the baseline data2vec when fine-tuned on a subset of the Switchboard dataset. Code: this https URL.
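
As a reading aid for the relative WER figures quoted in the abstract, the following is a minimal sketch of how a relative improvement is conventionally computed from a baseline WER and a new WER. The function name `relative_wer_improvement` and the example WER values are hypothetical illustrations; they are not taken from the paper and do not reproduce its results.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER improvement in percent.

    Conventional definition: 100 * (baseline - new) / baseline.
    A positive value means the new system has a lower WER than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values (in percent), chosen only to illustrate the formula.
baseline = 10.0
proposed = 8.0
print(f"relative improvement: {relative_wer_improvement(baseline, proposed):.1f}%")
```

Note that a relative improvement differs from an absolute one: in the hypothetical example above, the absolute reduction is 2 WER points while the relative improvement is measured against the baseline value.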

Comments:	Accepted to ICASSP 2023
Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2211.01246 [eess.AS]
	(or arXiv:2211.01246v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2211.01246

Submission history

From: Sreyan Ghosh
[v1] Wed, 2 Nov 2022 16:29:59 UTC (2,058 KB)
[v2] Sat, 13 May 2023 21:16:36 UTC (2,059 KB)
