Dual Contrastive Learning for Spatio-temporal Representation

Ding, Shuangrui; Qian, Rui; Xiong, Hongkai

doi:10.1145/3503161.3547783

Abstract: Contrastive learning has shown promising potential in self-supervised spatio-temporal representation learning. Most works naively sample different clips to construct positive and negative pairs. However, we observe that this formulation biases the model toward the background scene. The underlying reasons are twofold. First, the scene difference is usually more noticeable and easier to discriminate than the motion difference. Second, clips sampled from the same video often share similar backgrounds but have distinct motions. Simply regarding them as positive pairs draws the model to the static background rather than the motion pattern. To tackle this challenge, this paper presents a novel dual contrastive formulation. Concretely, we decouple the input RGB video sequence into two complementary modes, static scene and dynamic motion. Then, the original RGB features are pulled closer to the static features and the aligned dynamic features, respectively. In this way, the static scene and the dynamic motion are simultaneously encoded into the compact RGB representation. We further decouple the feature space via activation maps to distill static- and dynamic-related features. We term our method Dual Contrastive Learning for spatio-temporal Representation (DCLR). Extensive experiments demonstrate that DCLR learns effective spatio-temporal representations and obtains state-of-the-art or comparable performance on the UCF-101, HMDB-51, and Diving-48 datasets.
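
The dual formulation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard InfoNCE objective with in-batch negatives, a temperature of 0.1, and precomputed feature matrices. The function names `info_nce` and `dual_contrastive_loss` are hypothetical, and the steps that produce the static and aligned dynamic features (input decoupling, alignment, activation-map distillation) are outside its scope.

```python
import numpy as np

def info_nce(query, keys, temperature=0.1):
    """InfoNCE loss with in-batch negatives (an assumed, generic objective).

    query[i] and keys[i] form the positive pair; every other row of
    keys acts as a negative for query[i].
    """
    # L2-normalise so that dot products are cosine similarities.
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                       # shape (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal of the similarity matrix.
    return float(-np.mean(np.diag(log_prob)))

def dual_contrastive_loss(rgb, static, dynamic, temperature=0.1):
    """Sum of two contrastive terms: RGB vs. static and RGB vs. dynamic.

    All three inputs are (batch, dim) feature matrices; the equal
    weighting of the two terms is an assumption of this sketch.
    """
    return (info_nce(rgb, static, temperature)
            + info_nce(rgb, dynamic, temperature))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.normal(size=(8, 16))
    # Stand-ins for static and dynamic features: noisy copies of rgb.
    static = rgb + 0.1 * rng.normal(size=rgb.shape)
    dynamic = rgb + 0.1 * rng.normal(size=rgb.shape)
    print(dual_contrastive_loss(rgb, static, dynamic))
```

Minimising the summed loss pulls each RGB feature toward both its static and its dynamic counterpart, which is the sense in which the abstract says both cues are encoded into one representation.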

Comments: ACM MM 2022 camera ready
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2207.05340 [cs.CV] (or arXiv:2207.05340v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2207.05340
Related DOI: https://doi.org/10.1145/3503161.3547783
