XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words

Algayres, Robin; Diego-Simon, Pablo; Sagot, Benoit; Dupoux, Emmanuel

arXiv:2310.05235 (cs)

[Submitted on 8 Oct 2023]

Abstract: Due to the absence of explicit word boundaries in the speech stream, the task of segmenting spoken sentences into word units without text supervision is particularly challenging. In this work, we leverage the most recent self-supervised speech models, which have proved to adapt quickly to new tasks through fine-tuning, even in low-resource conditions. Taking inspiration from semi-supervised learning, we fine-tune an XLS-R model to predict word boundaries that are themselves produced by top-tier speech segmentation systems: DPDP, VG-HuBERT, GradSeg and DP-Parse. Once XLS-R is fine-tuned, it is used to infer new word boundary labels, which are used in turn for another fine-tuning step. Our method consistently improves the performance of each system and sets a new state-of-the-art that is, on average, 130% higher than the previous one, as measured by the F1 score on correctly discovered word tokens on five corpora featuring different languages. Finally, our system can segment speech from languages unseen during fine-tuning in a zero-shot fashion.
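The abstract reports results as an F1 score on correctly discovered word tokens. As a rough illustration of that kind of metric, the sketch below counts a predicted token as correct only when both of its boundaries coincide with the boundaries of a gold token and no boundary falls in between. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names `boundaries_to_tokens` and `token_f1` are hypothetical, boundaries are assumed to be integer frame indices, and matching is exact, whereas real evaluation toolkits may apply a tolerance window.

```python
def boundaries_to_tokens(boundaries):
    """Turn boundary positions into a set of (start, end) word spans.

    Assumption: boundaries are integer frame indices and include the
    utterance start and end.
    """
    b = sorted(set(boundaries))
    return set(zip(b[:-1], b[1:]))


def token_f1(predicted, gold):
    """Token-level F1 with exact span matching (hypothetical helper)."""
    pred_tokens = boundaries_to_tokens(predicted)
    gold_tokens = boundaries_to_tokens(gold)
    # A span is a hit only if both of its edges match a gold span exactly.
    hits = len(pred_tokens & gold_tokens)
    if hits == 0:
        return 0.0
    precision = hits / len(pred_tokens)
    recall = hits / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [0, 30, 55, 90, 120]       # four gold words
    predicted = [0, 30, 70, 90, 120]  # one boundary misplaced
    # The misplaced boundary breaks the two tokens that share it,
    # so only two of the four predicted tokens are correct.
    print(token_f1(predicted, gold))
```

Note that a single misplaced boundary costs two tokens here, which is why token F1 is a stricter measure than boundary F1.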

Comments:	Findings at EMNLP 2023
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2310.05235 [cs.CL]
	(or arXiv:2310.05235v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.05235

Submission history

From: Robin Algayres
[v1] Sun, 8 Oct 2023 17:05:00 UTC (591 KB)
