MCR-Data2vec 2.0: Improving Self-supervised Speech Pre-training via Model-level Consistency Regularization

Yoon, Ji Won; Kim, Seok Min; Kim, Nam Soo

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2306.08463 (eess)

[Submitted on 14 Jun 2023]

Abstract: Self-supervised learning (SSL) has shown significant progress in speech processing tasks. However, despite the intrinsic randomness in the Transformer structure, such as dropout variants and layer-drop, improving model-level consistency remains under-explored in the speech SSL literature. To address this, we propose a new pre-training method that uses consistency regularization to improve Data2vec 2.0, a recent state-of-the-art (SOTA) SSL model. Specifically, the proposed method samples two different student sub-models within the Data2vec 2.0 framework, yielding two output variants from a single input without additional parameters. Subsequently, we regularize the outputs of the two student sub-models to be consistent with each other and require both to predict the representation of the teacher model. Our experimental results demonstrate that the proposed approach improves the SSL model's robustness and generalization ability, resulting in SOTA results on the SUPERB benchmark.
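
The idea of sampling two sub-models from one set of weights and regularizing them toward each other and toward a teacher can be illustrated with a toy sketch. It assumes a single linear layer with input dropout standing in for the Transformer student (so each forward pass samples a different sub-model from shared weights), a fixed vector standing in for the teacher representation, mean squared error for both the teacher-regression and consistency terms, and a hypothetical weight `lam` on the consistency term. The helper names (`student_forward`, `mcr_loss`) are illustrative only; this is not the paper's exact objective or implementation.

```python
import random
from typing import List, Optional

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def student_forward(weights: List[Vector], x: Vector, drop_p: float,
                    rng: random.Random) -> Vector:
    """Toy linear 'student' with inverted dropout applied to its input.

    Each call draws a fresh dropout mask, so two calls on the same input
    behave like two different sub-models that share the same weights.
    """
    if not 0.0 <= drop_p < 1.0:
        raise ValueError("drop_p must be in [0, 1)")
    keep = 1.0 - drop_p
    # Keep each input element with probability `keep`, rescaling survivors.
    dropped = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    return [sum(w * d for w, d in zip(row, dropped)) for row in weights]


def mcr_loss(weights: List[Vector], x: Vector, teacher_target: Vector,
             drop_p: float = 0.1, lam: float = 1.0,
             rng: Optional[random.Random] = None) -> float:
    """Hypothetical combined objective (assumed form, not the paper's).

    Both sampled sub-models regress the teacher target, and an extra term
    penalizes disagreement between the two sub-model outputs.
    """
    rng = rng or random.Random()
    out1 = student_forward(weights, x, drop_p, rng)  # sub-model 1
    out2 = student_forward(weights, x, drop_p, rng)  # sub-model 2 (new mask)
    regression = 0.5 * (mse(out1, teacher_target) + mse(out2, teacher_target))
    consistency = mse(out1, out2)
    return regression + lam * consistency
```

With `drop_p=0.0` both passes coincide and the consistency term vanishes; with a nonzero rate the two masks generally differ, and the consistency term penalizes their disagreement. Reusing one weight list for both passes mirrors the abstract's point that the two output variants come without additional parameters.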

Comments:	INTERSPEECH 2023
Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2306.08463 [eess.AS]
	(or arXiv:2306.08463v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2306.08463

Submission history

From: Ji Won Yoon
[v1] Wed, 14 Jun 2023 12:17:43 UTC (1,162 KB)