Enhancing multimodal cooperation via sample-level modality valuation

Wei, Yake; Feng, Ruoxuan; Wang, Zihe; Hu, Di

arXiv:2309.06255 (cs)

[Submitted on 12 Sep 2023 (v1), last revised 14 Jun 2024 (this version, v4)]

Abstract: A central goal of multimodal learning is to jointly incorporate heterogeneous information from different modalities. However, most models suffer from unsatisfactory multimodal cooperation and fail to utilize all modalities well. Existing methods identify and enhance the worse-learnt modality, but they rarely provide a fine-grained, theoretically supported observation of multimodal cooperation at the sample level. It is therefore essential to observe and improve fine-grained cooperation between modalities, especially in realistic scenarios where the modality discrepancy can vary across samples. To this end, we introduce a sample-level modality valuation metric that evaluates the contribution of each modality for each sample. Via modality valuation, we observe that modality discrepancy can indeed differ at the sample level, beyond the global contribution discrepancy at the dataset level. We further analyze this issue and improve cooperation between modalities at the sample level by enhancing the discriminative ability of low-contributing modalities in a targeted manner. Overall, our methods reasonably observe the fine-grained uni-modal contribution and achieve considerable improvement. The source code and dataset are available at this https URL.
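
To make the idea of a sample-level modality valuation concrete, here is a minimal sketch that assumes a Shapley-style attribution over modality subsets. The abstract itself does not specify the metric, so this choice, the 0/1 value function, the summed-logit fusion, and the names `subset_value` and `modality_valuation` are all illustrative assumptions rather than the authors' method.

```python
from itertools import combinations
from math import factorial


def subset_value(logits_by_modality, subset, label):
    """Hypothetical value function: 1.0 if fusing `subset` predicts `label`, else 0.0.

    Fusion is assumed to be a plain sum of per-modality logits.
    """
    if not subset:
        return 0.0  # no modality available, no correct prediction
    n_classes = len(next(iter(logits_by_modality.values())))
    fused = [sum(logits_by_modality[m][c] for m in subset) for c in range(n_classes)]
    pred = max(range(n_classes), key=fused.__getitem__)
    return 1.0 if pred == label else 0.0


def modality_valuation(logits_by_modality, label):
    """Shapley-style contribution of each modality for a single sample."""
    modalities = sorted(logits_by_modality)
    n = len(modalities)
    scores = {}
    for m in modalities:
        others = [x for x in modalities if x != m]
        total = 0.0
        for k in range(len(others) + 1):
            # Standard Shapley weight for a coalition of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                gain = (subset_value(logits_by_modality, subset + (m,), label)
                        - subset_value(logits_by_modality, subset, label))
                total += weight * gain
        scores[m] = total
    return scores


# Two toy samples (made-up logits) with the same label but opposite dominant modality.
sample_a = {"audio": [2.0, 0.1], "visual": [0.2, 0.5]}
sample_b = {"audio": [0.1, 0.3], "visual": [3.0, 0.2]}
scores_a = modality_valuation(sample_a, label=0)  # {'audio': 1.0, 'visual': 0.0}
scores_b = modality_valuation(sample_b, label=0)  # {'audio': 0.0, 'visual': 1.0}
```

In this toy setting the low-contributing modality is visual for the first sample and audio for the second, which illustrates how the discrepancy can differ per sample even when a dataset-level average would hide it.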

Comments:	Accepted by CVPR 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
Cite as:	arXiv:2309.06255 [cs.CV]
	(or arXiv:2309.06255v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.06255

Submission history

From: Yake Wei
[v1] Tue, 12 Sep 2023 14:16:34 UTC (1,547 KB)
[v2] Tue, 21 Nov 2023 11:11:57 UTC (1,316 KB)
[v3] Thu, 21 Mar 2024 03:21:24 UTC (1,581 KB)
[v4] Fri, 14 Jun 2024 03:37:46 UTC (1,581 KB)
