Video-based Cross-modal Auxiliary Network for Multimodal Sentiment Analysis

Chen, Rongfei; Zhou, Wenju; Li, Yang; Zhou, Huiyu

doi:10.1109/TCSVT.2022.3197420

Computer Science > Computer Vision and Pattern Recognition

arXiv:2208.13954 (cs)

[Submitted on 30 Aug 2022]

Title: Video-based Cross-modal Auxiliary Network for Multimodal Sentiment Analysis

Authors: Rongfei Chen, Wenju Zhou, Yang Li, Huiyu Zhou

Abstract: Multimodal sentiment analysis has a wide range of applications because the modalities carry complementary information. Previous work has focused mainly on learning efficient joint representations and has rarely addressed insufficient unimodal feature extraction or the data redundancy introduced by multimodal fusion. This paper proposes a Video-based Cross-modal Auxiliary Network (VCAN), which comprises an audio features map module and a cross-modal selection module. The first module is designed to substantially increase feature diversity in audio feature extraction, with the aim of improving classification accuracy by providing more comprehensive acoustic representations. The second module is designed to handle redundant visual features by efficiently filtering out redundant visual frames when audiovisual data are integrated. In addition, a classifier group consisting of several image classification networks is introduced to predict sentiment polarities and emotion categories. Extensive experiments on the RAVDESS, CMU-MOSI, and CMU-MOSEI benchmarks indicate that VCAN significantly outperforms state-of-the-art methods in classification accuracy for multimodal sentiment analysis.
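
The abstract does not say how the cross-modal selection module decides that a visual frame is redundant. As a rough illustration of the general idea of redundant-frame filtering only, and not the authors' method, the sketch below keeps a frame when its feature vector differs enough from the last kept frame. The function names, the cosine-similarity criterion, and the 0.95 threshold are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_frames(frames: Sequence[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Return indices of frames to keep.

    Hypothetical rule (an assumption, not taken from the paper): a frame is
    treated as redundant when its cosine similarity to the most recently
    kept frame is at or above `threshold`.
    """
    kept: List[int] = []
    for index, frame in enumerate(frames):
        if not kept or cosine_similarity(frames[kept[-1]], frame) < threshold:
            kept.append(index)
    return kept


# Toy per-frame feature vectors; frames 1 and 3 nearly repeat their predecessors.
toy_frames = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
kept_indices = select_frames(toy_frames)
print(kept_indices)
```

In a real pipeline the vectors would come from a visual feature extractor, and the selection in VCAN is described as cross-modal, so it would presumably also depend on the audio stream; this sketch ignores that entirely.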

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2208.13954 [cs.CV]
	(or arXiv:2208.13954v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2208.13954
Related DOI:	https://doi.org/10.1109/TCSVT.2022.3197420

Submission history

From: Rongfei Chen
[v1] Tue, 30 Aug 2022 02:08:06 UTC (1,796 KB)