Showing 1–2 of 2 results for author: Botschen, T

Search v0.5.6 released 2020-02-24

arXiv:1909.06635 [pdf, other]

cs.CV cs.CL cs.LG

Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings

Authors: Shweta Mahajan, Teresa Botschen, Iryna Gurevych, Stefan Roth

Abstract: One of the key challenges in learning joint embeddings of multiple modalities, e.g. of images and text, is to ensure coherent cross-modal semantics that generalize across datasets. We propose to address this through joint Gaussian regularization of the latent representations. Building on Wasserstein autoencoders (WAEs) to encode the input in each domain, we enforce the latent embeddings to be similar to a Gaussian prior that is shared across the two domains, ensuring compatible continuity of the encoded semantic representations of images and texts. Semantic alignment is achieved through supervision from matching image-text pairs. To show the benefits of our semi-supervised representation, we apply it to cross-modal retrieval and phrase localization. We not only achieve state-of-the-art accuracy, but significantly better generalization across datasets, owing to the semantic continuity of the latent space.

Submitted 14 September, 2019; originally announced September 2019.

Comments: Accepted at ICCV 2019 Workshop on Cross-Modal Learning in Real World

arXiv:1806.06371 [pdf, other]

cs.CL cs.AI

Multimodal Grounding for Language Processing

Authors: Lisa Beinborn, Teresa Botschen, Iryna Gurevych

Abstract: This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. We particularly focus on multimodal grounding of verbs which play a crucial role for the compositional power of language.

Submitted 3 July, 2019; v1 submitted 17 June, 2018; originally announced June 2018.

Comments: The paper has been published in the Proceedings of the 27th International Conference on Computational Linguistics. Please refer to this version for citations: https://www.aclweb.org/anthology/papers/C/C18/C18-1197/
