Zero-shot sketch-based remote sensing image retrieval based on multi-level and attention-guided tokenization

Yang, Bo; Wang, Chen; Ma, **; Liu, Zhuang; Sun, Fangde

doi:10.3390/rs16101653

arXiv:2402.02141 (cs)

[Submitted on 3 Feb 2024 (v1), last revised 16 May 2024 (this version, v3)]

Title: Zero-shot sketch-based remote sensing image retrieval based on multi-level and attention-guided tokenization

Authors: Bo Yang, Chen Wang, ** Song, Zhuang Liu, Fangde Sun

Abstract: Effectively and efficiently retrieving images from remote sensing databases is a critical challenge in the realm of remote sensing big data. Using hand-drawn sketches as retrieval inputs is intuitive and user-friendly, yet the potential of multi-level feature integration from sketches remains underexplored, leading to suboptimal retrieval performance. To address this gap, our study introduces a novel zero-shot, sketch-based retrieval method for remote sensing images, leveraging multi-level feature extraction, self-attention-guided tokenization and filtering, and cross-modality attention update. This approach uses only visual information and does not require semantic knowledge about the sketch or the image. It starts by employing multi-level, self-attention-guided feature extraction to tokenize the query sketches, as well as self-attention feature extraction to tokenize the candidate images. It then employs cross-attention mechanisms to establish token correspondence between these two modalities, facilitating the computation of sketch-to-image similarity. Our method significantly outperforms existing sketch-based remote sensing image retrieval techniques, as evidenced by tests on multiple datasets. Notably, it also exhibits robust zero-shot learning capabilities and strong generalizability in handling unseen categories and novel remote sensing data. The method's scalability can be further enhanced by pre-calculating the retrieval tokens for all candidate images in a database. This research underscores the significant potential of multi-level, attention-guided tokenization in cross-modal remote sensing image retrieval. For broader accessibility and research facilitation, we have made the code and dataset used in this study publicly available at this https URL.
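The cross-attention step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`cross_attention_similarity`, `rank_candidates`), the single-head attention without learned projections, and the mean-cosine scoring rule are all assumptions chosen for illustration. It only shows how sketch tokens might attend over pre-computed image tokens to produce one similarity score per candidate.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cross_attention_similarity(sketch_tokens, image_tokens):
    """Score one sketch against one image (hypothetical scoring rule).

    sketch_tokens: (m, d) array of query-sketch tokens.
    image_tokens:  (n, d) array of candidate-image tokens.
    """
    d = sketch_tokens.shape[-1]
    # Scaled dot-product scores between every sketch token and image token.
    scores = sketch_tokens @ image_tokens.T / np.sqrt(d)
    # Numerically stable softmax over the image tokens.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each sketch token receives an attention-weighted mix of image tokens.
    attended = weights @ image_tokens
    # Assumption: similarity is the mean cosine between each sketch token
    # and its attended image counterpart.
    cos = np.sum(l2_normalize(sketch_tokens) * l2_normalize(attended), axis=-1)
    return float(cos.mean())

def rank_candidates(sketch_tokens, token_bank):
    """Return candidate indices sorted from most to least similar.

    token_bank: list of (n_i, d) arrays, one per database image,
    assumed to have been computed once and cached offline.
    """
    sims = [cross_attention_similarity(sketch_tokens, t) for t in token_bank]
    return sorted(range(len(token_bank)), key=lambda i: sims[i], reverse=True)
```

Because the image tokens in `token_bank` do not depend on the query, they can be computed once per database image and reused for every sketch, which is the pre-calculation idea the abstract mentions.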

Comments:	44 pages, 6 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2402.02141 [cs.CV]
	(or arXiv:2402.02141v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2402.02141
Journal reference:	Remote Sens. 2024, 16, 1653
Related DOI:	https://doi.org/10.3390/rs16101653

Submission history

From: Bo Yang
[v1] Sat, 3 Feb 2024 13:11:14 UTC (1,484 KB)
[v2] Tue, 5 Mar 2024 12:15:57 UTC (1,076 KB)
[v3] Thu, 16 May 2024 03:00:22 UTC (1,156 KB)
