Visual News: Benchmark and Challenges in News Image Captioning

Liu, Fuxiao; Wang, Yinghan; Wang, Tianlu; Ordonez, Vicente

arXiv:2010.03743 (cs)

[Submitted on 8 Oct 2020 (v1), last revised 13 Sep 2021 (this version, v3)]

Abstract: We propose Visual News Captioner, an entity-aware model for the task of news image captioning. We also introduce Visual News, a large-scale benchmark consisting of more than one million news images along with associated news articles, image captions, author information, and other metadata. Unlike the standard image captioning task, news images depict situations where people, locations, and events are of paramount importance. Our proposed method can effectively combine visual and textual features to generate captions with richer information such as events and entities. More specifically, built upon the Transformer architecture, our model is further equipped with novel multi-modal feature fusion techniques and attention mechanisms, which are designed to generate named entities more accurately. Our method uses far fewer parameters while achieving slightly better prediction results than competing methods. Our larger and more diverse Visual News dataset further highlights the remaining challenges in captioning news images.

Comments:	9 pages, 5 figures, accepted to EMNLP 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2010.03743 [cs.CV]
	(or arXiv:2010.03743v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2010.03743

Submission history

From: Fuxiao Liu
[v1] Thu, 8 Oct 2020 03:07:00 UTC (4,819 KB)
[v2] Tue, 13 Oct 2020 17:41:41 UTC (4,734 KB)
[v3] Mon, 13 Sep 2021 18:53:35 UTC (2,050 KB)
