Non-Volume Preserving-based Fusion to Group-Level Emotion Recognition on Crowd Videos

Quach, Kha Gia; Le, Ngan; Duong, Chi Nhan; Jalata, Ibsa; Roy, Kaushik; Luu, Khoa

arXiv:1811.11849 (cs)

[Submitted on 28 Nov 2018 (v1), last revised 23 Mar 2022 (this version, v4)]

Abstract: Group-level emotion recognition (ER) is a growing research area, driven by rising demand for assessing crowds of all sizes in both security and social media applications. This work extends earlier ER investigations, which focused on group-level ER either on single images or within a video, by fully investigating group-level expression recognition on crowd videos. In this paper, we propose an effective deep feature-level fusion mechanism to model the spatial-temporal information in crowd videos. In our approach, fusion is performed in the deep feature domain by a generative probabilistic model, Non-Volume Preserving Fusion (NVPF), that models spatial information relationships. Furthermore, we extend the spatial NVPF approach to a spatial-temporal NVPF approach (TNVPF) to learn the temporal information between frames. To demonstrate the robustness and effectiveness of each component of the proposed approach, three experiments were conducted: (i) evaluation on the AffectNet database to benchmark the proposed EmoNet for facial expression recognition; (ii) evaluation on EmotiW2018 to benchmark the proposed deep feature-level fusion mechanism, NVPF; and (iii) evaluation of the proposed TNVPF on an innovative Group-level Emotion on Crowd Videos (GECV) dataset composed of 627 videos collected from publicly available sources. The GECV dataset is a collection of videos containing crowds of people. Each video is labeled with emotion categories at three levels: individual faces, groups of people, and the entire video frame.
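
The abstract does not describe the internals of NVPF, so the following is only a generic illustration of what "non-volume preserving" means for an invertible generative transform, not the authors' model. It is a minimal sketch of an affine coupling step in plain Python: half of a feature vector passes through unchanged, and the other half is scaled and shifted by functions of the first half. The function names (`coupling_forward`, `coupling_inverse`), the weight matrices `w_s` and `w_t`, and the toy feature vector are all assumptions made for this example.

```python
import math

def dot(row, vec):
    """Inner product of two equal-length lists."""
    return sum(w * v for w, v in zip(row, vec))

def coupling_forward(x, w_s, w_t):
    """Affine coupling: keep the first half, scale and shift the second half."""
    d = len(x) // 2
    x1, x2 = x[:d], x[d:]
    # Log-scale s and shift t depend only on the untouched half x1.
    s = [math.tanh(dot(row, x1)) for row in w_s]
    t = [dot(row, x1) for row in w_t]
    y2 = [v * math.exp(si) + ti for v, si, ti in zip(x2, s, t)]
    # The Jacobian is triangular, so its log-determinant is the sum of s.
    # It is generally nonzero, i.e. the map does not preserve volume.
    log_det = sum(s)
    return x1 + y2, log_det

def coupling_inverse(y, w_s, w_t):
    """Exact inverse of coupling_forward."""
    d = len(y) // 2
    y1, y2 = y[:d], y[d:]
    s = [math.tanh(dot(row, y1)) for row in w_s]
    t = [dot(row, y1) for row in w_t]
    x2 = [(v - ti) * math.exp(-si) for v, si, ti in zip(y2, s, t)]
    return y1 + x2

# Hypothetical 4-dimensional feature vector and 2x2 weights (illustrative only).
x = [0.3, -1.2, 0.8, 2.0]
w_s = [[0.5, -0.2], [0.1, 0.3]]
w_t = [[1.0, 0.0], [0.0, -1.0]]

y, log_det = coupling_forward(x, w_s, w_t)
x_back = coupling_inverse(y, w_s, w_t)
```

Because the transform is exactly invertible and its log-determinant is cheap to compute, models built from such steps can be trained by maximizing likelihood. How the paper applies this idea to fuse per-face features is not stated in the abstract.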

Comments:	In press at the Pattern Recognition journal
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1811.11849 [cs.CV]
	(or arXiv:1811.11849v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1811.11849

Submission history

From: Kha Gia Quach
[v1] Wed, 28 Nov 2018 21:35:23 UTC (8,447 KB)
[v2] Mon, 6 Jul 2020 03:02:03 UTC (8,384 KB)
[v3] Mon, 7 Jun 2021 06:14:58 UTC (15,378 KB)
[v4] Wed, 23 Mar 2022 05:41:56 UTC (46,848 KB)
