Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association

Wen, Peisong; Xu, Qianqian; Jiang, Yangbangyan; Yang, Zhiyong; He, Yuan; Huang, Qingming

arXiv:2103.07293 (cs)

[Submitted on 12 Mar 2021]

Abstract: Recent work has made early progress on automatically learning the association between voices and faces, prompting a new wave of studies in the computer vision community. However, most prior work along this line (a) uses only local information to perform modality alignment and (b) ignores how learning difficulty varies across subjects. In this paper, we propose a novel framework that addresses both issues jointly. To address (a), we propose a two-level modality alignment loss that considers both global and local information. Unlike existing methods, we introduce a global loss into the modality alignment process; this global component is driven by identity classification. Theoretically, we show that minimizing the loss can maximize the distance between embeddings of different identities while minimizing the distance between embeddings of the same identity, in a global sense (rather than within a mini-batch). To address (b), we propose a dynamic reweighting scheme that better exploits hard but valuable identities while filtering out unlearnable ones. Experiments show that the proposed method outperforms previous methods in multiple settings, including voice-face matching, verification, and retrieval.
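The abstract gives no formulas, so the following is only a rough sketch of how a two-level loss with per-identity weights might be put together; it is not the authors' formulation. Every function name, the contrastive form of the local term, the `margin`, `lam`, and `cutoff` parameters, and the reweighting rule are assumptions made for illustration. The only points taken from the abstract are that the global term comes from identity classification over all identities, that a local term works on batch-level information, and that identities are reweighted so hard ones count more and unlearnable ones are filtered out.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example, computed with the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_z - logits[label]

def global_loss(embedding, label, class_weights):
    """Identity classification loss (assumed form).

    class_weights holds one vector per identity in the whole training set,
    so the term is not limited to identities present in the mini-batch.
    """
    logits = [dot(embedding, w) for w in class_weights]
    return softmax_cross_entropy(logits, label)

def local_loss(voices, faces, labels, margin=1.0):
    """Batch-level contrastive term (assumed form) on squared distances.

    Matching voice-face pairs are pulled together; mismatched pairs are
    pushed until their squared distance is at least `margin`.
    """
    total, count = 0.0, 0
    for v, lv in zip(voices, labels):
        for f, lf in zip(faces, labels):
            d = sq_dist(v, f)
            total += d if lv == lf else max(0.0, margin - d)
            count += 1
    return total / count

def two_level_loss(voices, faces, labels, class_weights,
                   identity_weights, lam=1.0):
    """Per-identity weighted global term plus lam times the local term."""
    g = 0.0
    for v, f, y in zip(voices, faces, labels):
        per_sample = (global_loss(v, y, class_weights)
                      + global_loss(f, y, class_weights))
        g += identity_weights[y] * per_sample
    g /= len(labels)
    return g + lam * local_loss(voices, faces, labels)

def reweight(identity_losses, cutoff):
    """Hypothetical reweighting rule, not taken from the paper.

    Identities whose running loss exceeds `cutoff` are treated as
    unlearnable and get weight 0; the rest get a weight that grows
    linearly with their loss, so harder identities count more.
    """
    return [0.0 if loss > cutoff else 1.0 + loss / cutoff
            for loss in identity_losses]
```

In a training loop, one might call `reweight` periodically on per-identity average losses and pass the result to `two_level_loss` as `identity_weights`; how often to update, and how the real method sets its thresholds, is not stated in the abstract.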

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2103.07293 [cs.CV]
	(or arXiv:2103.07293v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2103.07293

Submission history

From: Peisong Wen
[v1] Fri, 12 Mar 2021 14:10:48 UTC (810 KB)
