CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval

Pham, Hai X.; Guerrero, Ricardo; Li, Jiatong; Pavlovic, Vladimir

arXiv:2102.02547v1 (cs)

[Submitted on 4 Feb 2021]

Abstract: Despite the abundance of multi-modal data, such as image-text pairs, there has been little effort in understanding the individual entities and their different roles in the construction of these data instances. In this work, we endeavour to discover the entities and their corresponding importance in cooking recipes automatically as a visual-linguistic association problem. More specifically, we introduce a novel cross-modal learning framework to jointly model the latent representations of images and text in the food image-recipe association and retrieval tasks. This model allows one to discover complex functional and hierarchical relationships between images and text, and among textual parts of a recipe including title, ingredients and cooking instructions. Our experiments show that by making use of efficient tree-structured Long Short-Term Memory as the text encoder in our computational cross-modal retrieval framework, we are not only able to identify the main ingredients and cooking actions in the recipe descriptions without explicit supervision, but we can also learn more meaningful feature representations of food recipes, appropriate for challenging cross-modal retrieval and recipe adaption tasks.

Comments:	22 pages, accepted in AAAI 2021
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2102.02547 [cs.CV]
	(or arXiv:2102.02547v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2102.02547

Submission history

From: Hai Pham
[v1] Thu, 4 Feb 2021 11:24:34 UTC (5,577 KB)
