Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances

Zeng, Chang; Wang, Xin; Cooper, Erica; Miao, Xiaoxiao; Yamagishi, Junichi

doi:10.1109/ICASSP43922.2022.9746688

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2104.01541 (eess)

[Submitted on 4 Apr 2021 (v1), last revised 6 Oct 2021 (this version, v2)]

Abstract: Probabilistic linear discriminant analysis (PLDA) and cosine similarity have been widely used as back-end techniques in traditional speaker verification systems to measure pairwise similarity. To make better use of multiple enrollment utterances, we propose a novel attention back-end model that can be used for both text-independent (TI) and text-dependent (TD) speaker verification. It employs scaled dot-product self-attention and feed-forward self-attention networks as architectures that learn the intra-relationships among the enrollment utterances. To evaluate the proposed attention back-end, we conduct a series of experiments on the CNCeleb and VoxCeleb datasets, combining it with several state-of-the-art speaker encoders, including TDNN and ResNet. Experimental results using multiple enrollment utterances on CNCeleb show that the proposed attention back-end achieves lower equal error rate (EER) and minimum detection cost function (minDCF) values than its PLDA and cosine-similarity counterparts for every speaker encoder, and an experiment on VoxCeleb indicates that the model can be used even in the single-enrollment case.
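The following is a minimal, untrained sketch in plain Python of how an attention back-end can collapse several enrollment embeddings into a single verification score. It is not the authors' implementation. Several parts are assumptions made for illustration: the identity query/key/value projections, the simplified feed-forward attention score w · tanh(h_i) (no learned weight matrix or bias), the pooling vector `w`, and the final cosine comparison with the test embedding. A trained model would learn these parameters. The hypothetical helper `cosine_baseline_score`, which averages the enrollment embeddings before cosine scoring, is included for contrast with a conventional cosine back-end.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def mean(vectors: List[Vector]) -> Vector:
    k = len(vectors)
    return [sum(col) / k for col in zip(*vectors)]


def self_attention(enroll: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention over K enrollment embeddings.

    Assumption: identity query/key/value projections; a trained model
    would learn these projection matrices.
    """
    d = len(enroll[0])
    out = []
    for q in enroll:
        # Attention weights of this embedding over all enrollment embeddings.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in enroll])
        out.append([sum(w * v[j] for w, v in zip(weights, enroll)) for j in range(d)])
    return out


def attentive_pool(hidden: List[Vector], w: Vector) -> Vector:
    """Simplified feed-forward attention pooling: alpha_i = softmax(w . tanh(h_i)).

    Assumption: no learned weight matrix or bias inside tanh; `w` is hypothetical.
    """
    d = len(hidden[0])
    alphas = softmax([dot(w, [math.tanh(x) for x in h]) for h in hidden])
    return [sum(a * h[j] for a, h in zip(alphas, hidden)) for j in range(d)]


def attention_backend_score(enroll: List[Vector], test: Vector, w: Vector) -> float:
    # Self-attention across utterances, then pooling into one speaker vector.
    speaker_vec = attentive_pool(self_attention(enroll), w)
    # Assumption: cosine similarity as the final scoring function.
    return cosine(speaker_vec, test)


def cosine_baseline_score(enroll: List[Vector], test: Vector) -> float:
    # Conventional baseline: average the enrollment embeddings, then cosine score.
    return cosine(mean(enroll), test)


if __name__ == "__main__":
    enroll = [[1.0, 0.2, 0.0], [0.9, 0.1, 0.1], [0.2, 1.0, 0.0]]
    test = [1.0, 0.15, 0.05]
    w = [0.5, 0.5, 0.5]  # hypothetical pooling vector; would be learned
    print("attention:", attention_backend_score(enroll, test, w))
    print("baseline: ", cosine_baseline_score(enroll, test))
```

The design choice illustrated here is that each enrollment embedding is re-weighted by its similarity to the other enrollment embeddings before pooling, instead of being averaged uniformly as in the baseline. With identical enrollment embeddings, both scores coincide.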

Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2104.01541 [eess.AS]
	(or arXiv:2104.01541v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2104.01541
Related DOI:	https://doi.org/10.1109/ICASSP43922.2022.9746688

Submission history

From: Chang Zeng
[v1] Sun, 4 Apr 2021 05:42:56 UTC (4,088 KB)
[v2] Wed, 6 Oct 2021 01:46:16 UTC (2,620 KB)
