Improving generalization by mimicking the human visual diet

Madan, Spandan; Li, You; Zhang, Mengmi; Pfister, Hanspeter; Kreiman, Gabriel

arXiv:2206.07802 (cs)

[Submitted on 15 Jun 2022 (v1), last revised 10 Jan 2024 (this version, v2)]

Abstract: We present a new perspective on bridging the generalization gap between biological and computer vision -- mimicking the human visual diet. While computer vision models rely on internet-scraped datasets, humans learn from limited 3D scenes under diverse real-world transformations with objects in natural context. Our results demonstrate that incorporating variations and contextual cues ubiquitous in the human visual training data (visual diet) significantly improves generalization to real-world transformations such as lighting, viewpoint, and material changes. This improvement also extends to generalizing from synthetic to real-world data -- all models trained with a human-like visual diet outperform specialized architectures by large margins when tested on natural image data. These experiments are enabled by our two key contributions: a novel dataset capturing scene context and diverse real-world transformations to mimic the human visual diet, and a transformer model tailored to leverage these aspects of the human visual diet. All data and source code can be accessed at this https URL.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Cite as:	arXiv:2206.07802 [cs.CV]
	(or arXiv:2206.07802v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2206.07802

Submission history

From: Spandan Madan
[v1] Wed, 15 Jun 2022 20:32:24 UTC (49,116 KB)
[v2] Wed, 10 Jan 2024 15:48:39 UTC (40,659 KB)
