Dense Interaction Learning for Video-based Person Re-identification

He, Tianyu; Jin, Xin; Shen, Xu; Huang, Jianqiang; Chen, Zhibo; Hua, Xian-Sheng

arXiv:2103.09013v1 (cs)

[Submitted on 16 Mar 2021 (this version), latest version 17 Aug 2021 (v3)]

Abstract: Video-based person re-identification (re-ID) aims at matching the same person across video clips. Efficiently exploiting multi-scale fine-grained features while building the structural interaction among them is pivotal for its success. In this paper, we propose a hybrid framework, Dense Interaction Learning (DenseIL), that combines the principal advantages of CNN-based and attention-based architectures to tackle the difficulties of video-based person re-ID. DenseIL contains a CNN Encoder and a Transformer Decoder. The CNN Encoder is responsible for efficiently extracting discriminative spatial features, while the Transformer Decoder is designed to explicitly model the inherent spatial-temporal interaction across frames. Unlike the vanilla Transformer, we additionally let the Transformer Decoder densely attend to intermediate fine-grained CNN features, which naturally yields a multi-scale spatial-temporal feature representation for each video clip. Moreover, we introduce Spatio-TEmporal Positional Embedding (STEP-Emb) into the Transformer Decoder to investigate the positional relation among the spatial-temporal inputs. In our experiments, DenseIL consistently and significantly outperforms all state-of-the-art methods on multiple standard video-based re-ID datasets.
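The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each CNN stage has already been reduced to a list of token vectors of one shared dimension (one token per frame and spatial part), that the positional embedding is a simple sum of a per-frame and a per-part vector, and that the decoder visits the stages one after another with a residual update. The names `attend`, `add_position` and `dense_decode` are hypothetical, and learned projections, multiple heads and normalization are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def add_position(tokens, frame_ids, part_ids, temporal_emb, spatial_emb):
    """Add a temporal and a spatial embedding to every token.

    Assumption: the embedding is the sum of one vector per frame index and
    one vector per spatial part index. The paper's STEP-Emb may differ.
    """
    return [
        [x + t + s for x, t, s in zip(tok, temporal_emb[f], spatial_emb[p])]
        for tok, f, p in zip(tokens, frame_ids, part_ids)
    ]


def dense_decode(query, multi_scale_tokens):
    """Let one clip-level query attend to every CNN stage in turn.

    Assumption: stages are visited sequentially and each attention result
    is added back to the running state (a residual update).
    """
    state = list(query)
    for tokens in multi_scale_tokens:
        context = attend(state, tokens, tokens)
        state = [s + c for s, c in zip(state, context)]
    return state


# Toy clip: 2 frames x 2 spatial parts, 2-dimensional tokens, 2 CNN stages.
frame_ids = [0, 0, 1, 1]
part_ids = [0, 1, 0, 1]
temporal_emb = [[0.0, 0.0], [0.5, 0.5]]    # hypothetical values
spatial_emb = [[0.0, 0.25], [0.25, 0.0]]   # hypothetical values

coarse = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
fine = [[0.2, 0.1], [0.3, 0.4], [0.1, 0.9], [0.7, 0.6]]
stages = [
    add_position(stage, frame_ids, part_ids, temporal_emb, spatial_emb)
    for stage in (coarse, fine)
]

clip_feature = dense_decode([1.0, 1.0], stages)
print(clip_feature)
```

Because every stage contributes tokens from all frames and all parts, the final `clip_feature` mixes information across both time and scale, which is the property the abstract attributes to dense attention.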

Comments:	Technical report, 12 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2103.09013 [cs.CV]
	(or arXiv:2103.09013v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2103.09013

Submission history

From: Tianyu He
[v1] Tue, 16 Mar 2021 12:22:08 UTC (766 KB)
[v2] Thu, 18 Mar 2021 07:03:30 UTC (614 KB)
[v3] Tue, 17 Aug 2021 03:19:16 UTC (615 KB)