-
Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings
Abstract: One of the key challenges in learning joint embeddings of multiple modalities, e.g. of images and text, is to ensure coherent cross-modal semantics that generalize across datasets. We propose to address this through joint Gaussian regularization of the latent representations. Building on Wasserstein autoencoders (WAEs) to encode the input in each domain, we enforce the latent embeddings to be simi…
Submitted 14 September, 2019; originally announced September 2019.
Comments: Accepted at ICCV 2019 Workshop on Cross-Modal Learning in Real World
-
Multimodal Grounding for Language Processing
Abstract: This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding…
Submitted 3 July, 2019; v1 submitted 17 June, 2018; originally announced June 2018.
Comments: The paper has been published in the Proceedings of the 27th International Conference on Computational Linguistics. Please refer to this version for citations: https://www.aclweb.org/anthology/papers/C/C18/C18-1197/