Second-order Temporal Pooling for Action Recognition

Cherian, Anoop; Gould, Stephen


arXiv:1704.06925v2 (cs)

[Submitted on 23 Apr 2017 (v1), last revised 7 Aug 2018 (this version, v2)]

Abstract: Deep learning models for video-based action recognition usually generate features for short clips (consisting of a few frames); such clip-level features are aggregated into video-level representations by computing statistics on these features. Typically, zeroth-order (max) or first-order (average) statistics are used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel end-to-end learnable feature aggregation scheme, dubbed temporal correlation pooling, that generates an action descriptor for a video sequence by capturing the similarities between the temporal evolution of clip-level CNN features computed across the video. Such a descriptor, while being computationally cheap, also naturally encodes the co-activations of multiple CNN features, thereby providing a richer characterization of actions than its first-order counterparts. We also propose higher-order extensions of this scheme by computing correlations after embedding the CNN features in a reproducing kernel Hilbert space. We provide experiments on benchmark datasets such as HMDB-51 and UCF-101, fine-grained datasets such as MPII Cooking activities and JHMDB, as well as the recent Kinetics-600. Our results demonstrate the advantages of higher-order pooling schemes, which, when combined with hand-crafted features (as is standard practice), achieve state-of-the-art accuracy.
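To make the contrast between pooling orders concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each clip yields a small feature vector and that "similarity between temporal evolutions" is measured with a Pearson correlation between feature trajectories; the abstract does not specify the normalization, the feature layer, or the learnable components, so those choices and the function names `max_pool`, `average_pool`, and `correlation_pool` are illustrative assumptions.

```python
import math


def max_pool(clips):
    """Zeroth-order pooling: per-feature maximum over all clips."""
    return [max(column) for column in zip(*clips)]


def average_pool(clips):
    """First-order pooling: per-feature mean over all clips."""
    n = len(clips)
    return [sum(column) / n for column in zip(*clips)]


def correlation_pool(clips, eps=1e-12):
    """Second-order pooling (assumed form): a d x d matrix whose (i, j)
    entry is the Pearson correlation between the temporal trajectories
    of features i and j across the n clips of one video."""
    n = len(clips)
    # Transpose: one trajectory (length n) per feature dimension.
    trajectories = [list(column) for column in zip(*clips)]
    centered = []
    for trajectory in trajectories:
        mean = sum(trajectory) / n
        centered.append([value - mean for value in trajectory])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    d = len(centered)
    pooled = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            # eps guards against constant trajectories (zero norm).
            pooled[i][j] = dot / (norms[i] * norms[j] + eps)
    return pooled


# Toy video: 4 clips, 3 features per clip (made-up numbers).
# Feature 1 rises with feature 0; feature 2 falls as feature 0 rises.
clips = [
    [0.1, 0.2, 0.9],
    [0.2, 0.4, 0.7],
    [0.3, 0.6, 0.5],
    [0.4, 0.8, 0.3],
]
descriptor = correlation_pool(clips)
```

In this toy input the pooled matrix records that features 0 and 1 co-activate over time while feature 2 moves in the opposite direction, which is information that the max- and average-pooled vectors discard. The descriptor is symmetric, so in practice only its upper triangle would need to be kept.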

Comments:	Accepted in the International Journal of Computer Vision (IJCV)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1704.06925 [cs.CV]
	(or arXiv:1704.06925v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1704.06925

Submission history

From: Anoop Cherian
[v1] Sun, 23 Apr 2017 14:10:55 UTC (1,506 KB)
[v2] Tue, 7 Aug 2018 01:38:50 UTC (1,564 KB)
