GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods

Yin, Da; Gao, Feng; Thattai, Govind; Johnston, Michael; Chang, Kai-Wei

arXiv:2301.01893 (cs)

[Submitted on 5 Jan 2023]

Abstract: A key goal for the advancement of AI is to develop technologies that serve the needs not just of one group but of all communities regardless of their geographical region. In fact, a significant proportion of knowledge is locally shared by people from certain regions but may not apply equally in other regions because of cultural differences. If a model is unaware of regional characteristics, it may lead to performance disparity across regions and result in bias against underrepresented groups. We propose GIVL, a Geographically Inclusive Vision-and-Language Pre-trained model. There are two attributes of geo-diverse visual concepts which can help to learn geo-diverse knowledge: 1) concepts under similar categories have unique knowledge and visual characteristics, 2) concepts with similar visual features may fall in completely different categories. Motivated by the attributes, we design new pre-training objectives Image Knowledge Matching (IKM) and Image Edit Checking (IEC) to pre-train GIVL. Compared with similar-size models pre-trained with similar scale of data, GIVL achieves state-of-the-art (SOTA) and more balanced performance on geo-diverse V&L tasks.
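The abstract names the Image Knowledge Matching (IKM) objective but does not give its formulation. As a rough illustration only, the sketch below assumes IKM can be framed as binary image-knowledge matching: a model scores whether a knowledge text describes the concept in an image, and mismatched pairs are drawn preferentially from the same category, following attribute 1 above. The `Concept` record, the helpers `sample_hard_negative`, `bce_with_logits`, and `matching_loss`, the example concepts, and the placeholder logits are all hypothetical; this is not GIVL's implementation.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Concept:
    # Hypothetical record: a visual concept with a category label and a
    # short knowledge text describing it.
    name: str
    category: str
    knowledge: str


def sample_hard_negative(concept, pool, rng):
    """Return knowledge text from a different concept, preferring the same category.

    Same-category negatives mirror attribute 1 from the abstract: concepts in
    similar categories still carry distinct knowledge, so they are hard negatives.
    Falls back to any other concept when the category has no other members.
    """
    same_cat = [c for c in pool if c.category == concept.category and c.name != concept.name]
    candidates = same_cat or [c for c in pool if c.name != concept.name]
    return rng.choice(candidates).knowledge


def bce_with_logits(logit, label):
    # Numerically stable binary cross-entropy on a raw (pre-sigmoid) score.
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def matching_loss(logits, labels):
    # Mean BCE over (image, knowledge) pairs; label 1 = matched, 0 = mismatched.
    return sum(bce_with_logits(z, y) for z, y in zip(logits, labels)) / len(logits)


if __name__ == "__main__":
    pool = [
        Concept("sari", "clothing", "A draped garment worn in South Asia."),
        Concept("kimono", "clothing", "A traditional Japanese robe."),
        Concept("pagoda", "architecture", "A tiered tower common in East Asia."),
    ]
    rng = random.Random(0)
    negative = sample_hard_negative(pool[0], pool, rng)
    print("hard negative for sari:", negative)
    # Placeholder logits standing in for a model's scores on (positive, negative) pairs.
    print("loss:", matching_loss([2.0, -1.5], [1, 0]))
```

Drawing negatives from the same category forces the model to separate concepts by their specific knowledge rather than by coarse category cues, which is the intuition the abstract gives for attribute 1.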

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2301.01893 [cs.CV]
(or arXiv:2301.01893v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2301.01893

Submission history

From: Feng Gao
[v1] Thu, 5 Jan 2023 03:43:45 UTC (23,278 KB)