Contrast-reconstruction Representation Learning for Self-supervised Skeleton-based Action Recognition

Wang, Peng; Wen, Jun; Si, Chenyang; Qian, Yuntao; Wang, Liang

doi:10.1109/TIP.2022.3207577

arXiv:2111.11051 (cs)

[Submitted on 22 Nov 2021 (v1), last revised 11 Feb 2023 (this version, v2)]


Abstract: Skeleton-based action recognition is widely used in areas such as surveillance and human-machine interaction. Existing models are mainly trained in a supervised manner and therefore depend heavily on large-scale labeled data, which can be infeasible to obtain when labels are prohibitively expensive. In this paper, we propose a novel Contrast-Reconstruction Representation Learning network (CRRL) that simultaneously captures postures and motion dynamics for unsupervised skeleton-based action recognition. It consists of three main parts: a Sequence Reconstructor, a Contrastive Motion Learner, and an Information Fuser. The Sequence Reconstructor learns representations from the skeleton coordinate sequence via reconstruction; as a result, the learned representation tends to focus on trivial postural coordinates and is weak at capturing motion. To enhance motion learning, the Contrastive Motion Learner performs contrastive learning between the representations learned from the coordinate sequence and from an additional velocity sequence. Finally, in the Information Fuser, we explore several strategies for combining the Sequence Reconstructor and the Contrastive Motion Learner, and propose to capture postures and motions simultaneously via a knowledge-distillation-based fusion strategy that transfers motion learning from the Contrastive Motion Learner to the Sequence Reconstructor. Experimental results on several benchmarks, i.e., NTU RGB+D 60, NTU RGB+D 120, CMU mocap, and NW-UCLA, demonstrate the promise of the proposed CRRL method, which outperforms state-of-the-art approaches by a large margin.
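
The pairing of coordinate and velocity sequences described in the abstract can be illustrated with a minimal sketch. It assumes that the velocity sequence is the frame-to-frame difference of joint coordinates and that the contrastive objective is an InfoNCE-style loss; neither detail is stated in the abstract, and the names `velocity_sequence` and `info_nce` are hypothetical, not the authors' code.

```python
import math


def velocity_sequence(coords):
    """Frame-to-frame differences of a coordinate sequence.

    coords: list of T frames, each a flat list of joint coordinates.
    Returns T-1 frames. This definition of "velocity" is an assumption.
    """
    return [[b - a for a, b in zip(prev, nxt)]
            for prev, nxt in zip(coords, coords[1:])]


def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(coord_emb, vel_emb, temperature=0.1):
    """InfoNCE-style loss (assumed objective, not confirmed by the source).

    Row i of coord_emb and row i of vel_emb form a positive pair;
    all other rows of vel_emb act as negatives for row i.
    """
    a = [_normalize(v) for v in coord_emb]
    b = [_normalize(v) for v in vel_emb]
    loss = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of sample i against every velocity embedding.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(a)
```

In this sketch the loss is small when each coordinate embedding is most similar to the velocity embedding of the same sample, which is one way to encourage the coordinate branch to encode motion.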

Comments:	Published in IEEE TIP.
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2111.11051 [cs.CV]
	(or arXiv:2111.11051v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2111.11051
Related DOI:	https://doi.org/10.1109/TIP.2022.3207577

Submission history

From: Peng Wang
[v1] Mon, 22 Nov 2021 08:45:34 UTC (770 KB)
[v2] Sat, 11 Feb 2023 03:02:13 UTC (2,865 KB)
