Do self-supervised speech models develop human-like perception biases?

Millet, Juliette; Dunbar, Ewan

arXiv:2205.15819 (cs)

[Submitted on 31 May 2022]

Abstract: Self-supervised models for speech processing form representational spaces without using any external labels. Increasingly, they appear to be a feasible way of at least partially eliminating costly manual annotations, a problem of particular concern for low-resource languages. But what kind of representational spaces do these models construct? Human perception specializes to the sounds of listeners' native languages. Does the same thing happen in self-supervised models? We examine the representational spaces of three kinds of state-of-the-art self-supervised models: wav2vec 2.0, HuBERT and contrastive predictive coding (CPC), and compare them with the perceptual spaces of French-speaking and English-speaking human listeners, both globally and taking account of the behavioural differences between the two language groups. We find that the CPC model shows a small native language effect, but that wav2vec 2.0 and HuBERT seem to develop a universal speech perception space which is not language specific. A comparison against the predictions of supervised phone recognisers suggests that all three self-supervised models capture relatively fine-grained perceptual phenomena, while supervised models are better at capturing coarser, phone-level effects of listeners' native language on perception.

Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2205.15819 [cs.CL]
	(or arXiv:2205.15819v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2205.15819
Journal reference:	2022. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7591-7605, Dublin, Ireland. Association for Computational Linguistics

Submission history

From: Ewan Dunbar
[v1] Tue, 31 May 2022 14:21:40 UTC (1,398 KB)
