Improving Visual Reasoning by Exploiting The Knowledge in Texts

Sharifzadeh, Sahand; Baharlou, Sina Moayed; Schmitt, Martin; Schütze, Hinrich; Tresp, Volker

arXiv:2102.04760v1 (cs)

[Submitted on 9 Feb 2021 (this version), latest version 8 Oct 2021 (v2)]

Abstract: This paper presents a new framework for training image-based classifiers from a combination of texts and images with very few labels. We consider a classification framework with three modules: a backbone, a relational reasoning component, and a classification component. While the backbone can be trained from unlabeled images by self-supervised learning, we can fine-tune the relational reasoning and the classification components from external sources of knowledge instead of annotated images. By proposing a transformer-based model that creates structured knowledge from textual input, we enable the utilization of the knowledge in texts. We show that, compared to the supervised baselines with 1% of the annotated images, we can achieve ~8x more accurate results in scene graph classification, ~3x in object classification, and ~1.5x in predicate classification.
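The three-module layout named in the abstract (backbone, relational reasoning, classification) can be pictured with a toy sketch. This is not the authors' implementation: the class `ThreeModulePipeline` and the functions `toy_backbone`, `mean_context`, and `argmax_head` are hypothetical stand-ins invented here, and the mean-based context step is only an assumed placeholder for whatever relational reasoning the paper uses. The sketch shows only how the stages compose and how context from other objects can change a per-object prediction.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class ThreeModulePipeline:
    """Hypothetical composition of the three modules named in the abstract."""
    backbone: Callable[[Vector], Vector]               # per-object feature extractor
    reasoning: Callable[[List[Vector]], List[Vector]]  # mixes information across objects
    head: Callable[[Vector], int]                      # maps a feature vector to a class index

    def predict(self, objects: List[Vector]) -> List[int]:
        feats = [self.backbone(o) for o in objects]
        contextual = self.reasoning(feats)
        return [self.head(v) for v in contextual]


def toy_backbone(x: Vector) -> Vector:
    # Assumption: identity features stand in for a self-supervised image encoder.
    return list(x)


def mean_context(feats: List[Vector]) -> List[Vector]:
    # Assumption: adding the mean feature of all objects is a stand-in
    # for relational reasoning over the objects in one image.
    dim = len(feats[0])
    mean = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return [[f[i] + mean[i] for i in range(dim)] for f in feats]


def argmax_head(v: Vector) -> int:
    # Picks the index of the largest entry as the predicted class.
    return max(range(len(v)), key=lambda i: v[i])


model = ThreeModulePipeline(toy_backbone, mean_context, argmax_head)
no_context = ThreeModulePipeline(toy_backbone, lambda fs: fs, argmax_head)

objects = [[1.0, 0.9], [0.0, 3.0]]
print(no_context.predict(objects))  # [0, 1]
print(model.predict(objects))       # [1, 1]
```

In this toy input, the first object is classified as class 0 in isolation but as class 1 once the context of the second object is mixed in. In the setting the abstract describes, the backbone would be trained from unlabeled images, while the other two stages would be fine-tuned from knowledge extracted from text.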

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2102.04760 [cs.CV]
	(or arXiv:2102.04760v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2102.04760

Submission history

From: Sahand Sharifzadeh
[v1] Tue, 9 Feb 2021 11:21:44 UTC (3,015 KB)
[v2] Fri, 8 Oct 2021 13:11:16 UTC (3,377 KB)
