Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy

Nguyen, Hoang-Quan; Truong, Thanh-Dat; Luu, Khoa

Abstract:Action recognition has become one of the popular research topics in computer vision. There are various methods based on Convolutional Networks and self-attention mechanisms as Transformers to solve both spatial and temporal dimensions problems of action recognition tasks that achieve competitive performances. However, these methods lack a guarantee of the correctness of the action subject that the models give attention to, i.e., how to ensure an action recognition model focuses on the proper action subject to make a reasonable action prediction. In this paper, we propose a multi-view attention consistency method that computes the similarity between two attentions from two different views of the action videos using Directed Gromov-Wasserstein Discrepancy. Furthermore, our approach applies the idea of Neural Radiance Field to implicitly render the features from novel views when training on single-view datasets. Therefore, the contributions in this work are three-fold. Firstly, we introduce the multi-view attention consistency to solve the problem of reasonable prediction in action recognition. Secondly, we define a new metric for multi-view consistent attention using Directed Gromov-Wasserstein Discrepancy. Thirdly, we built an action recognition model based on Video Transformers and Neural Radiance Fields. Compared to the recent action recognition methods, the proposed approach achieves state-of-the-art results on three large-scale datasets, i.e., Jester, Something-Something V2, and Kinetics-400.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.01337 [cs.CV]
	(or arXiv:2405.01337v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.01337

Computer Science > Computer Vision and Pattern Recognition

Title:Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators