EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

Wang, Zhe; Fan, Siqi; Huo, Xiaoliang; Xu, Tongda; Wang, Yan; Liu, **g**g; Chen, Yilun; Zhang, Ya-Qin

Computer Science > Computer Vision and Pattern Recognition

arXiv:2402.15272v1 (cs)

[Submitted on 23 Feb 2024]

Title:EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

Authors:Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, **g**g Liu, Yilun Chen, Ya-Qin Zhang

View PDF HTML (experimental)

Abstract:In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint. Currently, two major challenges persist in vehicle-infrastructure cooperative 3D (VIC3D) object detection: $1)$ inherent pose errors when fusing multi-view images, caused by time asynchrony across cameras; $2)$ information loss in transmission process resulted from limited communication bandwidth. To address these issues, we propose a novel camera-based 3D detection framework for VIC3D task, Enhanced Multi-scale Image Feature Fusion (EMIFF). To fully exploit holistic perspectives from both vehicles and infrastructure, we propose Multi-scale Cross Attention (MCA) and Camera-aware Channel Masking (CCM) modules to enhance infrastructure and vehicle features at scale, spatial, and channel levels to correct the pose error introduced by camera asynchrony. We also introduce a Feature Compression (FC) module with channel and spatial compression blocks for transmission efficiency. Experiments show that EMIFF achieves SOTA on DAIR-V2X-C datasets, significantly outperforming previous early-fusion and late-fusion methods with comparable transmission costs.

Comments:	7 pages, 8 figures. Accepted by ICRA 2024. arXiv admin note: text overlap with arXiv:arXiv:2303.10975
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2402.15272 [cs.CV]
	(or arXiv:2402.15272v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2402.15272

Submission history

From: Wang Zhe [view email]
[v1] Fri, 23 Feb 2024 11:35:48 UTC (5,192 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators