Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing

Liu, Xihui; Wang, Zihao; Shao, Jing; Wang, Xiaogang; Li, Hongsheng

Computer Science > Computer Vision and Pattern Recognition

arXiv:1903.00839 (cs)

[Submitted on 3 Mar 2019 (v1), last revised 2 Apr 2019 (this version, v2)]

Abstract: Referring expression grounding aims to locate specific objects or persons in an image given a referring expression. The key challenge is to comprehend and align various types of information from the visual and textual domains, such as visual attributes, location, and interactions with surrounding regions. Although attention mechanisms have been successfully applied to cross-modal alignment, previous attention models focus only on the most dominant features of both modalities and neglect the fact that there can be multiple comprehensive textual-visual correspondences between images and referring expressions. To tackle this issue, we design a novel cross-modal attention-guided erasing approach, in which we discard the most dominant information from either the textual or the visual domain to generate difficult training samples online and drive the model to discover complementary textual-visual correspondences. Extensive experiments demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance on three referring expression grounding datasets.
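
The erasing idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that "most dominant" means the single highest attention weight, that an erased word is replaced by a placeholder token, and that an erased region has its feature vector set to zeros. The names `UNK`, `erase_dominant_word`, and `erase_dominant_region` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

UNK = "<unk>"  # hypothetical placeholder token (an assumption, not from the paper)


def _argmax(weights: Sequence[float]) -> int:
    """Index of the largest weight; the first index wins on ties."""
    return max(range(len(weights)), key=lambda i: weights[i])


def erase_dominant_word(words: Sequence[str], attention: Sequence[float]) -> List[str]:
    """Return a copy of `words` with the most attended word replaced by UNK."""
    if not words or len(words) != len(attention):
        raise ValueError("words and attention must be non-empty and of equal length")
    top = _argmax(attention)
    return [UNK if i == top else w for i, w in enumerate(words)]


def erase_dominant_region(
    features: Sequence[Sequence[float]], attention: Sequence[float]
) -> List[List[float]]:
    """Return a copy of `features` with the most attended region zeroed out."""
    if not features or len(features) != len(attention):
        raise ValueError("features and attention must be non-empty and of equal length")
    top = _argmax(attention)
    return [[0.0] * len(f) if i == top else list(f) for i, f in enumerate(features)]


# Textual erasing: the model can no longer rely on the word "red".
hard_text = erase_dominant_word(["man", "in", "red", "shirt"], [0.1, 0.05, 0.6, 0.25])

# Visual erasing: the second region's features are removed.
hard_regions = erase_dominant_region([[1.0, 2.0], [3.0, 4.0]], [0.2, 0.8])
```

In a training loop, erased samples of this kind would be generated online from the model's current attention weights and fed back as harder examples, so that the model has to find correspondences beyond the single most dominant cue.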

Comments:	Accepted by CVPR 2019
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:1903.00839 [cs.CV]
	(or arXiv:1903.00839v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1903.00839

Submission history

From: Xihui Liu
[v1] Sun, 3 Mar 2019 05:55:15 UTC (811 KB)
[v2] Tue, 2 Apr 2019 07:33:37 UTC (802 KB)
