Uncertainty-Aware Multi-View Visual Semantic Embedding

Wei, Wenzhang; Gui, Zhipeng; Wu, Changguang; Zhao, Anqi; Wang, Xingguang; Wu, Huayi

arXiv:2309.08154v1 (cs)

[Submitted on 15 Sep 2023 (this version), latest version 21 Dec 2023 (v2)]

Abstract: The key challenge in image-text retrieval is effectively leveraging semantic information to measure the similarity between vision and language data. However, using instance-level binary labels, where each image is paired with a single text, fails to capture multiple correspondences between different semantic units, leading to uncertainty in multi-modal semantic understanding. Although recent research has captured fine-grained information through more complex model structures or pre-training techniques, few studies have directly modeled the uncertainty of correspondence to fully exploit binary labels. To address this issue, we propose an Uncertainty-Aware Multi-View Visual Semantic Embedding (UAMVSE) framework that decomposes the overall image-text matching into multiple view-text matchings. Our framework introduces an uncertainty-aware loss function (UALoss) that computes the weighting of each view-text loss by adaptively modeling the uncertainty in each view-text correspondence. Different weightings guide the model to focus on different semantic information, enhancing the model's ability to comprehend the correspondence between images and texts. We also design an optimized image-text matching strategy that normalizes the similarity matrix to improve model performance. Experimental results on the Flickr30k and MS-COCO datasets demonstrate that UAMVSE outperforms state-of-the-art models.
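
The abstract does not give the form of UALoss or of the similarity-matrix normalization, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a learned log-variance weighting in the style commonly used for multi-task losses (each view loss L_k is scaled by exp(-s_k) and regularized by s_k) and a dual-softmax normalization of the image-text similarity matrix. The function names, the log-variance parameters, and both formulas are hypothetical choices made for illustration.

```python
import math


def uncertainty_weighted_loss(view_losses, log_variances):
    """Combine per-view losses as sum_k exp(-s_k) * L_k + s_k.

    ASSUMPTION: this log-variance weighting is a stand-in for UALoss,
    whose exact form is not stated in the abstract.
    """
    if len(view_losses) != len(log_variances):
        raise ValueError("need one log-variance per view loss")
    return sum(
        math.exp(-s) * loss + s
        for loss, s in zip(view_losses, log_variances)
    )


def _softmax(values):
    # Subtract the maximum for numerical stability.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def normalize_similarity(sim):
    """Dual-softmax normalization of an image-by-text similarity matrix.

    ASSUMPTION: the abstract only says the matrix is normalized; the
    row-softmax times column-softmax product is one possible choice.
    """
    rows = [_softmax(row) for row in sim]            # rows[i][j]
    cols = [_softmax(list(col)) for col in zip(*sim)]  # cols[j][i]
    return [
        [rows[i][j] * cols[j][i] for j in range(len(sim[0]))]
        for i in range(len(sim))
    ]
```

Under this sketch, a view whose log-variance s_k is large contributes less to the total loss, which is one way a model could down-weight view-text pairs whose correspondence is uncertain. In a real training setup the log-variances would be learnable parameters rather than fixed numbers.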

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Cite as:	arXiv:2309.08154 [cs.CV]
	(or arXiv:2309.08154v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.08154

Submission history

From: WenZhang Wei
[v1] Fri, 15 Sep 2023 04:39:11 UTC (13,422 KB)
[v2] Thu, 21 Dec 2023 03:53:38 UTC (12,273 KB)
