Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity Recognition

Choi, Hyeongju; Beedu, Apoorva; Haresamudram, Harish; Essa, Irfan

arXiv:2211.04331 (cs)

[Submitted on 8 Nov 2022]

Abstract: To properly assist humans in their needs, human activity recognition (HAR) systems need the ability to fuse information from multiple modalities. Our hypothesis is that multimodal sensors, visual and non-visual, tend to provide complementary information, each addressing the limitations of the other modalities. In this work, we propose a multi-modal framework that learns to effectively combine features from RGB video and IMU sensors, and show its robustness on the MMAct and UTD-MHAD datasets. Our model is trained in two stages: in the first stage, each input encoder learns to effectively extract features, and in the second stage, the model learns to combine these individual features. We show significant improvements of 22% and 11% over the video-only and IMU-only setups, respectively, on the UTD-MHAD dataset, and of 20% and 12% on the MMAct dataset. Through extensive experimentation, we show the robustness of our model in a zero-shot setting and in a limited annotated data setting. We further compare with state-of-the-art methods that use more input modalities and show that our method significantly outperforms them on the more difficult MMAct dataset and performs comparably on the UTD-MHAD dataset.
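
The two-stage training described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `CentroidEncoder`, `train_fusion`, the synthetic one-dimensional "video" and "IMU" inputs, the choice to keep the encoders frozen during the second stage, and the perceptron used for fusion are all hypothetical stand-ins chosen to keep the example dependency-free. The synthetic data is built so that each modality separates only half of the class structure, which mirrors the complementary-information hypothesis.

```python
from typing import List, Sequence

Vector = List[float]


def argmax(scores: Sequence[float]) -> int:
    # Index of the largest score; ties resolve to the lowest index.
    return max(range(len(scores)), key=scores.__getitem__)


class CentroidEncoder:
    """Toy per-modality encoder (hypothetical stand-in for a neural encoder)."""

    def fit(self, xs: Sequence[Vector], ys: Sequence[int], n_classes: int) -> None:
        # Stage 1: learn one centroid per class from this modality alone.
        dim = len(xs[0])
        sums = [[0.0] * dim for _ in range(n_classes)]
        counts = [0] * n_classes
        for x, y in zip(xs, ys):
            counts[y] += 1
            for i, v in enumerate(x):
                sums[y][i] += v
        self.centroids = [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]

    def encode(self, x: Vector) -> Vector:
        # Feature vector: negative squared distance to each class centroid.
        return [-sum((a - b) ** 2 for a, b in zip(x, c)) for c in self.centroids]


def train_fusion(feats: Sequence[Vector], ys: Sequence[int],
                 n_classes: int, epochs: int = 200) -> List[Vector]:
    # Stage 2: multi-class perceptron over concatenated features.
    # The encoders are not updated here (an assumption of this sketch).
    w = [[0.0] * len(feats[0]) for _ in range(n_classes)]
    for _ in range(epochs):
        mistakes = 0
        for f, y in zip(feats, ys):
            pred = argmax([sum(wi * fi for wi, fi in zip(row, f)) for row in w])
            if pred != y:
                mistakes += 1
                for i, fi in enumerate(f):
                    w[y][i] += fi
                    w[pred][i] -= fi
        if mistakes == 0:  # every training sample is classified correctly
            break
    return w


def accuracy(preds: Sequence[int], ys: Sequence[int]) -> float:
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


# Synthetic data: "video" separates classes {0,1} from {2,3},
# while "IMU" separates {0,2} from {1,3}.
labels = [0, 1, 2, 3] * 2
video = [[0.0], [0.0], [1.0], [1.0]] * 2
imu = [[0.0], [1.0], [0.0], [1.0]] * 2
n_classes = 4

# Stage 1: fit each encoder on its own modality.
video_enc, imu_enc = CentroidEncoder(), CentroidEncoder()
video_enc.fit(video, labels, n_classes)
imu_enc.fit(imu, labels, n_classes)

# Stage 2: concatenate the per-modality features and fit the fusion layer.
fused = [video_enc.encode(v) + imu_enc.encode(m) for v, m in zip(video, imu)]
w = train_fusion(fused, labels, n_classes)

video_acc = accuracy([argmax(video_enc.encode(v)) for v in video], labels)
imu_acc = accuracy([argmax(imu_enc.encode(m)) for m in imu], labels)
fused_acc = accuracy(
    [argmax([sum(wi * fi for wi, fi in zip(row, f)) for row in w]) for f in fused],
    labels,
)
print(video_acc, imu_acc, fused_acc)
```

On this synthetic data each single-modality encoder reaches 0.5 training accuracy, while the fused classifier reaches 1.0. These figures describe only the toy setup above and say nothing about the results reported for MMAct or UTD-MHAD.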

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2211.04331 [cs.CV]
	(or arXiv:2211.04331v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2211.04331

Submission history

From: Hyeongju Choi
[v1] Tue, 8 Nov 2022 15:48:44 UTC (1,560 KB)
