Visuo-Linguistic Question Answering (VLQA) Challenge

Sampat, Shailaja Keyur; Yang, Yezhou; Baral, Chitta

arXiv:2005.00330 (cs)

[Submitted on 1 May 2020 (v1), last revised 18 Nov 2020 (this version, v3)]

Abstract: Understanding images and text together is an important aspect of cognition and of building advanced Artificial Intelligence (AI) systems. As a community, we have achieved good benchmarks over language and vision domains separately; however, joint reasoning is still a challenge for state-of-the-art computer vision and natural language processing (NLP) systems. We propose a novel task to derive joint inference about a given image-text modality and compile the Visuo-Linguistic Question Answering (VLQA) challenge corpus in a question answering setting. Each dataset item consists of an image and a reading passage, where questions are designed to combine both visual and textual information, i.e., ignoring either modality would make the question unanswerable. We first explore the best existing vision-language architectures to solve VLQA subsets and show that they are unable to reason well. We then develop a modular method with slightly better baseline performance, but it is still far behind human performance. We believe that VLQA will be a good benchmark for reasoning over a visuo-linguistic context. The dataset, code and leaderboard are available at this https URL.
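
The property stated in the abstract, that ignoring either modality makes a question unanswerable, can be probed with a simple modality ablation. The following is a minimal sketch, not the authors' code: the `Item` fields, the free-text `answer` format, and the `modality_ablation` helper are all hypothetical names chosen for illustration, and the released corpus may use a different schema.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional, Sequence


@dataclass(frozen=True)
class Item:
    # Hypothetical schema: one image, one reading passage, one question.
    image: Optional[str]    # image path or identifier; None when ablated
    passage: Optional[str]  # reading passage; None when ablated
    question: str
    answer: str             # gold answer (format is an assumption)


Predictor = Callable[[Item], str]


def accuracy(predict: Predictor, items: Sequence[Item]) -> float:
    """Fraction of items for which the predictor returns the gold answer."""
    if not items:
        return 0.0
    return sum(predict(it) == it.answer for it in items) / len(items)


def modality_ablation(predict: Predictor, items: Sequence[Item]) -> Dict[str, float]:
    """Score a predictor with both modalities, then with each one removed."""
    return {
        "both": accuracy(predict, items),
        "text_only": accuracy(predict, [replace(it, image=None) for it in items]),
        "image_only": accuracy(predict, [replace(it, passage=None) for it in items]),
    }
```

If a dataset satisfies the stated design goal, a predictor that scores well in the `both` setting should drop sharply in the `text_only` and `image_only` settings; a small gap would suggest that one modality alone carries the answer.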

Comments:	Findings of EMNLP 2020 (22 pages, 13 figures)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2005.00330 [cs.CV]
	(or arXiv:2005.00330v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2005.00330

Submission history

From: Shailaja Keyur Sampat
[v1] Fri, 1 May 2020 12:18:55 UTC (5,435 KB)
[v2] Thu, 8 Oct 2020 01:06:30 UTC (2,661 KB)
[v3] Wed, 18 Nov 2020 07:45:20 UTC (2,661 KB)
