Deep Representation Decomposition for Rate-Invariant Speaker Verification

arXiv:2205.14294 (eess)

[Submitted on 28 May 2022]

Title: Deep Representation Decomposition for Rate-Invariant Speaker Verification

Authors: Fuchuan Tong, Siqi Zheng, Haodong Zhou, Xingjia Xie, Qingyang Hong, Lin Li

Abstract: Deep speaker embeddings have achieved promising performance in speaker verification, but their advantage diminishes under speaking-style variability. Speaking rate mismatch is often observed in practical speaker verification systems and can degrade system performance. To reduce the intra-class discrepancy caused by speaking rate, we propose a deep representation decomposition approach with adversarial learning to learn speaking rate-invariant speaker embeddings. Specifically, using an attention block, we decompose the original embedding into an identity-related component and a rate-related component through multi-task training. Additionally, to reduce the latent relationship between the two decomposed components, we propose a cosine mapping block that trains the parameters adversarially to minimize the cosine similarity between them. As a result, the identity-related features become robust to speaking rate and are then used for verification. Experiments on the VoxCeleb1 and HI-MIA datasets demonstrate the effectiveness of the proposed approach.
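
The decomposition and the cosine-similarity objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the attention block splits the embedding, so the soft mask below (identity part = mask * embedding, rate part = (1 - mask) * embedding) is an assumption, and the names `decompose`, `attention_logits`, and `cosine_similarity` are hypothetical. It only shows the quantity that adversarial training would push toward zero.

```python
import math


def sigmoid(x):
    """Logistic function, used here to turn attention logits into a soft mask."""
    return 1.0 / (1.0 + math.exp(-x))


def decompose(embedding, attention_logits):
    """Split an embedding into two complementary parts with a soft mask.

    ASSUMPTION: the complementary-mask split is illustrative only; the
    abstract does not describe the attention block at this level of detail.
    """
    mask = [sigmoid(z) for z in attention_logits]
    identity_part = [m * e for m, e in zip(mask, embedding)]
    rate_part = [(1.0 - m) * e for m, e in zip(mask, embedding)]
    return identity_part, rate_part


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy 4-dimensional embedding and hypothetical attention logits.
embedding = [0.5, -1.2, 0.8, 2.0]
attention_logits = [2.0, -2.0, 0.0, 3.0]

identity_part, rate_part = decompose(embedding, attention_logits)

# The value a training loss would penalize so that the two parts decorrelate.
similarity = cosine_similarity(identity_part, rate_part)
```

In a real system the logits would come from a learned attention block and the similarity would enter the training loss alongside the speaker and rate classification objectives; here the values are fixed by hand purely to make the arithmetic concrete.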

Comments:	Accepted by Odyssey 2022
Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2205.14294 [eess.AS]
	(or arXiv:2205.14294v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2205.14294

Submission history

From: Fuchuan Tong
[v1] Sat, 28 May 2022 01:27:06 UTC (1,676 KB)
