The More You See in 2D, the More You Perceive in 3D

Han, Xinyang; Gao, Zelin; Kanazawa, Angjoo; Goel, Shubham; Gandelsman, Yossi

arXiv:2404.03652 (cs)

[Submitted on 4 Apr 2024]

Abstract: Humans can infer 3D structure from 2D images of an object based on past experience and improve their 3D understanding as they see more images. Inspired by this behavior, we introduce SAP3D, a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images. Given a few unposed images of an object, we adapt a pre-trained view-conditioned diffusion model together with the camera poses of the images via test-time fine-tuning. The adapted diffusion model and the obtained camera poses are then utilized as instance-specific priors for 3D reconstruction and novel view synthesis. We show that as the number of input images increases, the performance of our approach improves, bridging the gap between optimization-based prior-less 3D reconstruction methods and single-image-to-3D diffusion-based methods. We demonstrate our system on real images as well as standard synthetic benchmarks. Our ablation studies confirm that this adaption behavior is key for more accurate 3D understanding.

Comments:	Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.03652 [cs.CV]
	(or arXiv:2404.03652v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.03652

Submission history

From: Yossi Gandelsman
[v1] Thu, 4 Apr 2024 17:59:40 UTC (12,193 KB)
