Spatiotemporal Transformer for Video-based Person Re-identification

Zhang, Tianyu; Wei, Longhui; Xie, Lingxi; Zhuang, Zijie; Zhang, Yongfei; Li, Bo; Tian, Qi

arXiv:2103.16469 (cs)

[Submitted on 30 Mar 2021]

Abstract: Recently, the Transformer module has been transplanted from natural language processing to computer vision. This paper applies the Transformer to video-based person re-identification, where the key issue is to extract the discriminative information from a tracklet. We show that, despite the strong learning ability, the vanilla Transformer suffers from an increased risk of over-fitting, arguably due to a large number of attention parameters and insufficient training data. To solve this problem, we propose a novel pipeline where the model is pre-trained on a set of synthesized video data and then transferred to the downstream domains with the perception-constrained Spatiotemporal Transformer (STT) module and Global Transformer (GT) module. The derived algorithm achieves significant accuracy gain on three popular video-based person re-identification benchmarks, MARS, DukeMTMC-VideoReID, and LS-VID, especially when the training and testing data are from different domains. More importantly, our research sheds light on the application of the Transformer on highly-structured visual data.

Comments:	10 pages, 7 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2103.16469 [cs.CV]
	(or arXiv:2103.16469v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2103.16469

Submission history

From: Tianyu Zhang
[v1] Tue, 30 Mar 2021 16:19:27 UTC (784 KB)
