Compositional Learning of Image-Text Query for Image Retrieval

Anwaar, Muhammad Umer; Labintcev, Egor; Kleinsteuber, Martin

arXiv:2006.11149 (cs)

[Submitted on 19 Jun 2020 (v1), last revised 31 May 2021 (this version, v3)]

Abstract: In this paper, we investigate the problem of retrieving images from a database based on a multi-modal (image-text) query. Specifically, the query text prompts some modification to the query image, and the task is to retrieve images that exhibit the desired modifications. For instance, a user of an e-commerce platform wants to buy a dress that looks similar to her friend's dress but is white and has a ribbon sash. In this case, we would like the algorithm to retrieve dresses that reflect the desired modifications to the query dress. We propose an autoencoder-based model, ComposeAE, to learn the composition of the image and text query for retrieving images. We adopt a deep metric learning approach and learn a metric that pushes the composition of the source image and text query closer to the target images. We also propose a rotational symmetry constraint on the optimization problem. Our approach outperforms the state-of-the-art method TIRG \cite{TIRG} on three benchmark datasets, namely MIT-States, Fashion200k and Fashion IQ. To ensure a fair comparison, we introduce strong baselines by enhancing the TIRG method. To ensure reproducibility of the results, we publish our code here: \url{this https URL}.
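The composition, metric learning and rotational symmetry ideas mentioned in the abstract can be illustrated with a toy sketch. It is not the ComposeAE implementation: it omits the autoencoder's encoders and decoder entirely, and it assumes, for illustration only, that the text query is mapped to one rotation angle per dimension of a complex-valued image embedding, that retrieval ranks database items by squared Euclidean distance, and that the metric learning objective is a softmax cross-entropy over negative distances. The function names (compose, inverse_compose, metric_loss, rotational_symmetry_loss, retrieve) and the toy vectors are hypothetical.

```python
import cmath
import math
from typing import List

Vec = List[complex]

# ASSUMPTION (illustration only): the text query is mapped to one rotation
# angle per embedding dimension, and composition rotates complex-valued
# image features by those angles. This is not the authors' code.
def compose(image_vec: Vec, angles: List[float]) -> Vec:
    """Rotate each complex image feature by its text-derived angle."""
    return [z * cmath.exp(1j * a) for z, a in zip(image_vec, angles)]


def inverse_compose(target_vec: Vec, angles: List[float]) -> Vec:
    """Rotate by the conjugate angles, i.e. undo the text-driven modification."""
    return [z * cmath.exp(-1j * a) for z, a in zip(target_vec, angles)]


def sq_dist(u: Vec, v: Vec) -> float:
    """Squared Euclidean distance between two complex vectors."""
    return sum(abs(a - b) ** 2 for a, b in zip(u, v))


def metric_loss(query: Vec, candidates: List[Vec], pos_index: int) -> float:
    """Softmax cross-entropy over negative distances (a common deep metric
    learning loss, assumed here rather than taken from the paper). It is small
    when the composed query is much closer to the positive than to the rest."""
    logits = [-sq_dist(query, c) for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[pos_index]


def rotational_symmetry_loss(image_vec: Vec, target_vec: Vec, angles: List[float]) -> float:
    """Penalty when the target, rotated back by the conjugate angles, misses the query image."""
    return sq_dist(inverse_compose(target_vec, angles), image_vec)


def retrieve(query: Vec, database: List[Vec]) -> int:
    """Index of the database item nearest to the composed query."""
    return min(range(len(database)), key=lambda i: sq_dist(query, database[i]))


# Hypothetical toy data: a 3-dimensional complex "image embedding" and angles.
image = [1 + 0j, 0 + 1j, 0.5 + 0.5j]
angles = [math.pi / 2, 0.0, math.pi]
target = compose(image, angles)  # an idealized target matching the modification
database = [image, target, [0j, 0j, 0j]]

query = compose(image, angles)
print(retrieve(query, database))  # expected: 1
print(rotational_symmetry_loss(image, target, angles))  # close to 0
print(metric_loss(query, database, pos_index=1))
```

Under these assumptions, the symmetry constraint says that undoing the text-driven rotation on a target should land back on the query image, which gives a second training signal alongside the metric loss that pulls the composed query toward its target.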

Comments:	Published at IEEE WACV 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Cite as:	arXiv:2006.11149 [cs.CV]
	(or arXiv:2006.11149v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2006.11149

Submission history

From: Muhammad Umer Anwaar
[v1] Fri, 19 Jun 2020 14:21:41 UTC (1,235 KB)
[v2] Sun, 28 Jun 2020 06:06:12 UTC (1,235 KB)
[v3] Mon, 31 May 2021 21:35:55 UTC (1,234 KB)
