EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Wang, Tai; Mao, Xiaohan; Zhu, Chenming; Xu, Runsen; Lyu, Ruiyuan; Li, Peisen; Chen, Xiao; Zhang, Wenwei; Chen, Kai; Xue, Tianfan; Liu, Xihui; Lu, Cewu; Lin, Dahua; Pang, Jiangmiao

arXiv:2312.16170 (cs)

[Submitted on 26 Dec 2023]

Abstract: In the realm of computer vision and robotics, embodied agents are expected to explore their environment and carry out human instructions. This necessitates the ability to fully understand 3D scenes given their first-person observations and contextualize them into language for interaction. However, traditional research focuses more on scene-level input and output setups from a global view. To address the gap, we introduce EmbodiedScan, a multi-modal, ego-centric 3D perception dataset and benchmark for holistic 3D scene understanding. It encompasses over 5k scans encapsulating 1M ego-centric RGB-D views, 1M language prompts, 160k 3D-oriented boxes spanning over 760 categories, some of which partially align with LVIS, and dense semantic occupancy with 80 common categories. Building upon this database, we introduce a baseline framework named Embodied Perceptron. It is capable of processing an arbitrary number of multi-modal inputs and demonstrates remarkable 3D perception capabilities, both within the two series of benchmarks we set up, i.e., fundamental 3D perception tasks and language-grounded tasks, and in the wild. Codes, datasets, and benchmarks will be available at this https URL.

Comments:	A multi-modal, ego-centric 3D perception dataset and benchmark for holistic 3D scene understanding. Project page: this http URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Cite as:	arXiv:2312.16170 [cs.CV]
	(or arXiv:2312.16170v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.16170

Submission history

From: Tai Wang
[v1] Tue, 26 Dec 2023 18:59:11 UTC (3,614 KB)
