CAPE: Camera View Position Embedding for Multi-View 3D Object Detection

Xiong, Kaixin; Gong, Shi; Ye, Xiaoqing; Tan, Xiao; Wan, Ji; Ding, Errui; Wang, Jingdong; Bai, Xiang

arXiv:2303.10209 (cs)

[Submitted on 17 Mar 2023]

Abstract: In this paper, we address the problem of detecting 3D objects from multi-view images. Current query-based methods rely on global 3D position embeddings (PE) to learn the geometric correspondence between images and 3D space. We claim that directly interacting 2D image features with global 3D PE could increase the difficulty of learning the view transformation, because camera extrinsics vary from camera to camera. We therefore propose a novel method based on CAmera view Position Embedding, called CAPE. We form the 3D position embeddings in the local camera-view coordinate system instead of the global coordinate system, so that the 3D position embedding does not need to encode camera extrinsic parameters. Furthermore, we extend CAPE to temporal modeling by exploiting the object queries of previous frames and encoding the ego-motion to boost 3D object detection. CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on the nuScenes dataset. Code and models are available in Paddle3D and in a PyTorch implementation.
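
The distinction between camera-view and global 3D positions can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a pinhole camera model, and the function names, intrinsics, depth bins, and camera poses are hypothetical values chosen for illustration. It only shows the geometric point made in the abstract: points unprojected into the camera frame depend on the intrinsics alone, while the same points expressed in a global frame also depend on each camera's extrinsics.

```python
import numpy as np


def camera_frustum_points(K, image_size, depths, stride=16):
    """Unproject a strided pixel grid at several depths into the camera frame.

    K is a 3x3 pinhole intrinsic matrix (assumed), image_size is
    (width, height), depths is a 1-D array of depth values.
    Returns an array of shape (D, H', W', 3).
    """
    width, height = image_size
    us = np.arange(stride / 2, width, stride)   # pixel centres along x
    vs = np.arange(stride / 2, height, stride)  # pixel centres along y
    u, v = np.meshgrid(us, vs)                  # each (H', W')
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixels
    rays = pixels @ np.linalg.inv(K).T          # camera-frame rays with z = 1
    return depths[:, None, None, None] * rays[None]


def to_global(points_cam, cam_to_global):
    """Apply a 4x4 camera-to-global pose (extrinsics) to camera-frame points."""
    R, t = cam_to_global[:3, :3], cam_to_global[:3, 3]
    return points_cam @ R.T + t


def yaw_pose(angle_rad, translation):
    """Hypothetical camera pose: rotation about the z-axis plus a translation."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    pose = np.eye(4)
    pose[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    pose[:3, 3] = translation
    return pose


# Illustrative values only; not taken from the paper or from nuScenes.
K = np.array([[1260.0, 0.0, 800.0],
              [0.0, 1260.0, 450.0],
              [0.0, 0.0, 1.0]])
depths = np.array([1.0, 5.0, 20.0])

# Camera-view points: identical for every camera that shares these intrinsics.
local_points = camera_frustum_points(K, (1600, 900), depths)

# Global points: different for each camera, because the extrinsics differ.
front_pose = yaw_pose(0.0, [1.5, 0.0, 1.6])
back_pose = yaw_pose(np.pi, [-1.0, 0.0, 1.6])
global_front = to_global(local_points, front_pose)
global_back = to_global(local_points, back_pose)

print(local_points.shape)
print(np.allclose(global_front, global_back))
```

In this sketch a position embedding computed from `local_points` would be shared across cameras with the same intrinsics, whereas one computed from `global_front` or `global_back` would have to absorb the per-camera pose.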

Comments:	Accepted by CVPR 2023. Code is available.
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2303.10209 [cs.CV]
	(or arXiv:2303.10209v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2303.10209

Submission history

From: Xiaoqing Ye
[v1] Fri, 17 Mar 2023 18:59:54 UTC (1,257 KB)
