Using Syntax to Ground Referring Expressions in Natural Images

Cirik, Volkan; Berg-Kirkpatrick, Taylor; Morency, Louis-Philippe

Computer Science > Computer Vision and Pattern Recognition

arXiv:1805.10547 (cs)

[Submitted on 26 May 2018]

Abstract: We introduce GroundNet, a neural network for referring expression recognition, the task of localizing (or grounding) in an image the object referred to by a natural language expression. Our approach to this task is the first to rely on a syntactic analysis of the input referring expression in order to inform the structure of the computation graph. Given a parse tree for an input expression, we explicitly map the syntactic constituents and relationships present in the tree to a composed graph of neural modules that defines our architecture for performing localization. This syntax-based approach aids localization of both the target object and auxiliary supporting objects mentioned in the expression. As a result, GroundNet is more interpretable than previous methods: we can (1) determine which phrase of the referring expression points to which object in the image and (2) track how the localization of the target object is determined by the network. We study this property empirically by introducing a new set of annotations on the GoogleRef dataset to evaluate localization of supporting objects. Our experiments show that GroundNet achieves state-of-the-art accuracy in identifying supporting objects, while maintaining comparable performance in the localization of target objects.
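To make the idea of composing modules along a parse tree concrete, here is a minimal toy sketch in Python. It is not GroundNet's implementation: the names `Box`, `locate`, `relate`, `intersect`, and `ground` are hypothetical, the parse tree is hand-written as nested tuples, and simple hand-coded rules stand in for the learned neural modules. It only illustrates how a tree can dictate the order of computation while keeping a per-phrase trace of which candidate box each phrase points to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A candidate region. `name` is a toy stand-in for visual features."""
    name: str
    x: float  # horizontal center, used by the toy spatial rule


def locate(phrase, boxes):
    """Score each box against a noun phrase (assumed: exact word match)."""
    words = phrase.lower().split()
    return [1.0 if b.name in words else 0.0 for b in boxes]


def relate(prep, support, boxes):
    """Score each box by its best-scoring supporting box under `prep`."""
    def holds(a, b):
        if prep == "left of":
            return a.x < b.x
        if prep == "right of":
            return a.x > b.x
        raise ValueError(f"unsupported preposition: {prep}")

    return [
        max((s for s, other in zip(support, boxes)
             if other is not box and holds(box, other)), default=0.0)
        for box in boxes
    ]


def intersect(a, b):
    """Combine two score lists elementwise."""
    return [p * q for p, q in zip(a, b)]


def ground(tree, boxes, trace):
    """Recursively turn a toy parse tree into a composition of modules.

    Leaf:      ("NP", "the cup")
    Modified:  ("NP", <NP tree>, ("PP", "left of", <NP tree>))
    `trace` records (phrase, scores) for every noun phrase visited.
    """
    if tree[0] != "NP":
        raise ValueError(f"unexpected node: {tree[0]}")
    if isinstance(tree[1], str):
        scores = locate(tree[1], boxes)
        trace.append((tree[1], scores))
        return scores
    _, head, (_, prep, obj) = tree
    head_scores = ground(head, boxes, trace)
    support = ground(obj, boxes, trace)
    return intersect(head_scores, relate(prep, support, boxes))


# Toy scene: two cups on either side of a laptop.
boxes = [Box("cup", 0.2), Box("laptop", 0.5), Box("cup", 0.8)]
tree = ("NP", ("NP", "the cup"), ("PP", "left of", ("NP", "the laptop")))

trace = []
scores = ground(tree, boxes, trace)
target = max(range(len(boxes)), key=scores.__getitem__)
```

In this toy scene, `target` selects the cup to the left of the laptop, and `trace` shows that the phrase "the laptop" was resolved to the middle box. That per-phrase record is the kind of intermediate output the abstract's interpretability claim refers to, although the actual model uses learned modules over image features rather than these rules.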

Comments:	AAAI 2018
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:1805.10547 [cs.CV]
	(or arXiv:1805.10547v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1805.10547

Submission history

From: Volkan Cirik
[v1] Sat, 26 May 2018 22:02:05 UTC (931 KB)
