
Showing 1–1 of 1 results for author: Yankai, R

  1. arXiv:2405.19226  [pdf, other]

    cs.CV cs.MM

    ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions

    Authors: Honglin Lin, Siyu Li, Guoshun Nan, Chaoyue Tang, Xueting Wang, **gxin Xu, Rong Yankai, Zhili Zhou, Yutong Gao, Qimei Cui, Xiaofeng Tao

    Abstract: Image retrieval from contextual descriptions (IRCD) aims to identify an image within a set of minimally contrastive candidates based on linguistically complex text. Despite the success of VLMs, they still significantly lag behind human performance in IRCD. The main challenges lie in aligning key contextual cues in two modalities, where these subtle cues are concealed in tiny areas of multiple cont…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted in ACL 2024 Findings