Mitigating Modality Collapse in Multimodal VAEs via Impartial Optimization

Javaloy, Adrián; Meghdadi, Maryam; Valera, Isabel


arXiv:2206.04496 (cs)

[Submitted on 9 Jun 2022]

Abstract: A number of variational autoencoders (VAEs) have recently emerged with the aim of modeling multimodal data, e.g., to jointly model images and their corresponding captions. Still, multimodal VAEs tend to focus solely on a subset of the modalities, e.g., by fitting the image while neglecting the caption. We refer to this limitation as modality collapse. In this work, we argue that this effect is a consequence of conflicting gradients during multimodal VAE training. We show how to detect the sub-graphs in the computational graphs where gradients conflict (impartiality blocks), as well as how to leverage existing gradient-conflict solutions from multitask learning to mitigate modality collapse, that is, to ensure impartial optimization across modalities. We apply our training framework to several multimodal VAE models, losses, and datasets from the literature, and empirically show that our framework significantly improves the reconstruction performance, conditional generation, and coherence of the latent space across modalities.
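
As a rough illustration of the gradient-conflict idea mentioned in the abstract, the following minimal sketch treats two gradients as conflicting when their inner product is negative, and resolves the conflict with a PCGrad-style projection, one well-known technique from the multitask learning literature. The abstract does not state which definition of conflict or which solutions the paper uses, so both choices are assumptions here. The gradient values and the function names (`gradients_conflict`, `project_out_conflict`) are hypothetical and serve only as an example.

```python
def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def gradients_conflict(u, v):
    """Assumed definition: two gradients conflict if their inner product is negative."""
    return dot(u, v) < 0


def project_out_conflict(g, other):
    """PCGrad-style step: if g conflicts with other, remove g's component along other."""
    d = dot(g, other)
    if d >= 0:
        return list(g)  # no conflict, keep g unchanged
    scale = d / dot(other, other)
    return [a - scale * b for a, b in zip(g, other)]


# Hypothetical gradients of two modality losses w.r.t. shared parameters.
g_image = [1.0, 0.0]
g_caption = [-1.0, 1.0]

# Naive training sums the gradients; here the sum is orthogonal to g_image,
# so to first order the image loss makes no progress.
naive = [a + b for a, b in zip(g_image, g_caption)]

# Project each gradient against the other (both computed from the originals),
# then sum the results.
g_image_fixed = project_out_conflict(g_image, g_caption)
g_caption_fixed = project_out_conflict(g_caption, g_image)
combined = [a + b for a, b in zip(g_image_fixed, g_caption_fixed)]

print("conflict:", gradients_conflict(g_image, g_caption))
print("naive sum:", naive)
print("after projection:", combined)
```

In this toy case the combined update has a positive inner product with both original gradients, whereas the naive sum favours only one of them.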

Comments:	Accepted as a Spotlight paper at ICML 2022. 27 pages, 10 figures
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2206.04496 [cs.LG]
	(or arXiv:2206.04496v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2206.04496

Submission history

From: Adrián Javaloy
[v1] Thu, 9 Jun 2022 13:29:25 UTC (6,703 KB)
