Two Stream LSTM: A Deep Fusion Framework for Human Action Recognition

Gammulle, Harshala; Denman, Simon; Sridharan, Sridha; Fookes, Clinton

arXiv:1704.01194 (cs)

[Submitted on 4 Apr 2017]

Abstract: In this paper we address the problem of human action recognition from video sequences. Inspired by the exemplary results obtained via automatic feature learning and deep learning approaches in computer vision, we focus on learning salient spatial features via a convolutional neural network (CNN) and then mapping their temporal relationships with the aid of Long Short-Term Memory (LSTM) networks. Our contribution is a deep fusion framework that more effectively combines spatial features from CNNs with temporal features from LSTM models. We also extensively evaluate their strengths and weaknesses. We find that when both sets of features are combined, the fully connected features effectively act as an attention mechanism that directs the LSTM to interesting parts of the convolutional feature sequence. The significance of our fusion method is its simplicity and effectiveness compared to other state-of-the-art methods. The evaluation results demonstrate that this hierarchical multi-stream fusion method performs better than single-stream mapping methods, achieving high accuracy and outperforming current state-of-the-art methods on three widely used databases: UCF11, UCFSports, and jHMDB.
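The two-stream idea in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the abstract does not specify the fusion details, so the sketch assumes the simplest plausible variant, in which one LSTM reads a per-frame sequence of convolutional-layer features, a second LSTM reads a per-frame sequence of fully connected-layer features, and their final hidden states are concatenated before a softmax classifier. The class `LSTM`, the function `two_stream_predict`, all dimensions, and the random untrained weights are hypothetical and chosen only for illustration. The random vectors stand in for real CNN activations.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTM:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        # Biases are omitted to keep the sketch short.
        self.W = {
            gate: rand_matrix(hidden_size, input_size + hidden_size, rng)
            for gate in "ifog"
        }

    def run(self, sequence):
        """Process a sequence of feature vectors; return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            z = list(x) + h
            i = [sigmoid(a) for a in matvec(self.W["i"], z)]    # input gate
            f = [sigmoid(a) for a in matvec(self.W["f"], z)]    # forget gate
            o = [sigmoid(a) for a in matvec(self.W["o"], z)]    # output gate
            g = [math.tanh(a) for a in matvec(self.W["g"], z)]  # candidate cell
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_stream_predict(conv_seq, fc_seq, conv_lstm, fc_lstm, W_out):
    """Assumed fusion: concatenate the final hidden states of both streams."""
    fused = conv_lstm.run(conv_seq) + fc_lstm.run(fc_seq)  # list concatenation
    return softmax(matvec(W_out, fused))


# Hypothetical sizes: 5 frames, small feature vectors, 11 classes (as in UCF11).
rng = random.Random(0)
T, CONV_DIM, FC_DIM, HIDDEN, NUM_CLASSES = 5, 8, 6, 4, 11

# Random stand-ins for per-frame CNN activations.
conv_seq = [[rng.random() for _ in range(CONV_DIM)] for _ in range(T)]
fc_seq = [[rng.random() for _ in range(FC_DIM)] for _ in range(T)]

conv_lstm = LSTM(CONV_DIM, HIDDEN, rng)
fc_lstm = LSTM(FC_DIM, HIDDEN, rng)
W_out = rand_matrix(NUM_CLASSES, 2 * HIDDEN, rng)

probs = two_stream_predict(conv_seq, fc_seq, conv_lstm, fc_lstm, W_out)
print(len(probs), round(sum(probs), 6))
```

The sketch shows only the data flow of a two-stream design. It does not reproduce the attention-like behaviour or the hierarchical fusion described in the abstract, and a practical version would use trained weights in a deep learning framework.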

Comments:	Published as a conference paper at WACV 2017
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1704.01194 [cs.CV]
	(or arXiv:1704.01194v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1704.01194

Submission history

From: Harshala Gammulle
[v1] Tue, 4 Apr 2017 21:32:04 UTC (815 KB)
