Image-text Retrieval via Preserving Main Semantics of Vision

Zhang, Xu; Niu, Xinzheng; Fournier-Viger, Philippe; Dai, Xudong

arXiv:2304.10254 (cs)

[Submitted on 20 Apr 2023 (v1), last revised 28 Apr 2023 (this version, v2)]

Abstract: Image-text retrieval is one of the major tasks of cross-modal retrieval. Several approaches for this task map images and texts into a common space to create correspondences between the two modalities. However, because an image is rich in content (semantics), redundant secondary information in it may cause false matches. To address this issue, this paper presents a semantic optimization approach, implemented as a Visual Semantic Loss (VSL), that helps the model focus on an image's main content. The approach is inspired by how people typically annotate an image: by describing its main content. We therefore leverage the annotated texts corresponding to an image to help the model capture the image's main content, reducing the negative impact of secondary content. Extensive experiments on two benchmark datasets (MSCOCO and Flickr30K) demonstrate the superior performance of our method. The code is available at: this https URL.

Comments:	6 pages, 3 figures, accepted by ICME 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2304.10254 [cs.CV]
	(or arXiv:2304.10254v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2304.10254

Submission history

From: Xu Zhang
[v1] Thu, 20 Apr 2023 12:23:29 UTC (965 KB)
[v2] Fri, 28 Apr 2023 08:09:54 UTC (965 KB)
