Coarse-Fine Networks for Temporal Activity Detection in Videos

Kahatapitiya, Kumara; Ryoo, Michael S.

arXiv:2103.01302 (cs)

[Submitted on 1 Mar 2021 (v1), last revised 1 Apr 2021 (this version, v2)]

Abstract: In this paper, we introduce Coarse-Fine Networks, a two-stream architecture that benefits from different abstractions of temporal resolution to learn better video representations for long-term motion. Traditional video models process inputs at one (or a few) fixed temporal resolutions without any dynamic frame selection. However, we argue that processing multiple temporal resolutions of the input, and doing so dynamically by learning to estimate the importance of each frame, can largely improve video representations, especially in the domain of temporal activity localization. To this end, we propose (1) Grid Pool, a learned temporal downsampling layer to extract coarse features, and (2) Multi-stage Fusion, a spatio-temporal attention mechanism to fuse a fine-grained context with the coarse features. We show that our method outperforms the state of the art for action detection on public datasets, including Charades, with a significantly reduced compute and memory footprint. The code is available at this https URL
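
To make the idea of importance-driven temporal downsampling more concrete, the following is a minimal NumPy sketch, not the authors' Grid Pool layer. The abstract describes Grid Pool only as a learned temporal downsampling layer, so everything here is an assumption for illustration: the function name `importance_weighted_downsample` is hypothetical, the per-frame importance scores are passed in rather than learned, and the sampling rule (inverse-CDF placement of sample points followed by linear interpolation) is one plausible choice among many.

```python
import numpy as np


def importance_weighted_downsample(features, scores, num_out):
    """Resample T frame features down to num_out features (illustrative only).

    features: array of shape (T, C), one feature vector per frame.
    scores:   array of shape (T,), non-negative per-frame importance.
              In a trained model these would be predicted; here they are given.
    Returns (pooled, coords): pooled has shape (num_out, C), and coords holds
    the fractional frame indices that were sampled.
    """
    features = np.asarray(features, dtype=float)
    # A small epsilon keeps the cumulative sum strictly increasing.
    scores = np.asarray(scores, dtype=float) + 1e-8
    num_frames = features.shape[0]

    # Treat the scores as a probability mass over frames.
    probs = scores / scores.sum()
    # CDF evaluated at frame boundaries 0, 1, ..., T.
    cdf = np.concatenate([[0.0], np.cumsum(probs)])
    boundaries = np.arange(num_frames + 1, dtype=float)

    # Evenly spaced quantiles, mapped back to the time axis through the
    # inverse CDF: regions with high importance receive more sample points.
    targets = (np.arange(num_out) + 0.5) / num_out
    positions = np.interp(targets, cdf, boundaries)

    # Frame i is centred at i + 0.5, so shift to fractional frame indices.
    coords = np.clip(positions - 0.5, 0.0, num_frames - 1.0)

    # Linear interpolation between the two neighbouring frames.
    lower = np.floor(coords).astype(int)
    upper = np.minimum(lower + 1, num_frames - 1)
    weight = (coords - lower)[:, None]
    pooled = (1.0 - weight) * features[lower] + weight * features[upper]
    return pooled, coords


if __name__ == "__main__":
    frames = np.arange(8, dtype=float)[:, None]  # toy features: frame index
    uniform = np.ones(8)
    peaked = np.array([0, 0, 0, 0, 1, 1, 0, 0], dtype=float)
    print(importance_weighted_downsample(frames, uniform, 4)[1])
    print(importance_weighted_downsample(frames, peaked, 4)[1])
```

With uniform scores the sketch reduces to ordinary strided downsampling with averaging of neighbouring frames; with scores concentrated on a few frames, the sampled positions cluster around those frames. Because linear interpolation is differentiable with respect to the sample positions, a rule of this kind could in principle be trained end to end, which is the general motivation for learned downsampling.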

Comments:	To appear at CVPR 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2103.01302 [cs.CV]
	(or arXiv:2103.01302v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2103.01302

Submission history

From: Kumara Kahatapitiya
[v1] Mon, 1 Mar 2021 20:48:01 UTC (652 KB)
[v2] Thu, 1 Apr 2021 17:57:04 UTC (446 KB)
