Image Captioning using Facial Expression and Attention

Nezami, Omid Mohamad; Dras, Mark; Wan, Stephen; Paris, Cecile

Computer Science > Computer Vision and Pattern Recognition

arXiv:1908.02923 (cs)

[Submitted on 8 Aug 2019 (v1), last revised 15 Apr 2020 (this version, v3)]

Abstract: Benefiting from advances in machine vision and natural language processing techniques, current image captioning systems are able to generate detailed visual descriptions. For the most part, these descriptions represent an objective characterisation of the image, although some models do incorporate subjective aspects related to the observer's view of the image, such as sentiment; current models, however, usually do not consider the emotional content of images during the caption generation process. This paper addresses this issue by proposing novel image captioning models which use facial expression features to generate image captions. The models generate image captions using long short-term memory networks applying facial features in addition to other visual features at different time steps. We compare a comprehensive collection of image captioning models with and without facial features using all standard evaluation metrics. The evaluation metrics indicate that applying facial features with an attention mechanism achieves the best performance, showing more expressive and more correlated image captions, on an image caption dataset extracted from the standard Flickr 30K dataset, consisting of around 11K images containing faces. An analysis of the generated captions finds that, perhaps unexpectedly, the improvement in caption quality appears to come not from the addition of adjectives linked to emotional aspects of the images, but from more variety in the actions described in the captions.
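The abstract describes LSTM caption generators that receive facial expression features alongside other visual features, with the best results coming from an attention mechanism. The following is a minimal, hypothetical illustration of that general idea, not the authors' implementation. It assumes a soft-attention decoder in the style of Show, Attend and Tell and, at every time step, concatenates the previous word embedding, the attended image context and a facial-expression vector before a standard LSTM update. All function names, parameter names, dimensions, and the 7-way expression vector are assumptions made for illustration, and random weights stand in for trained ones.

```python
# Hypothetical sketch: NOT the authors' code. Names, shapes and the
# 7-dimensional facial-expression vector are illustrative assumptions.
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_attention(feats, h_prev, W_a, W_h, w):
    """Soft attention over spatial image features.

    feats: (L, D) region features; h_prev: (H,) previous hidden state.
    Returns the context vector (D,) and attention weights (L,).
    """
    scores = np.tanh(feats @ W_a + h_prev @ W_h) @ w  # one score per region, (L,)
    alpha = softmax(scores)                           # weights sum to 1
    context = alpha @ feats                           # weighted sum of regions, (D,)
    return context, alpha

def decoder_step(word_emb, feats, face, h_prev, c_prev, params):
    """One LSTM step whose input is [word embedding; attended context; face vector]."""
    context, alpha = soft_attention(feats, h_prev,
                                    params["W_a"], params["W_h"], params["w"])
    x = np.concatenate([word_emb, context, face])     # facial features injected here
    gates = params["W_x"] @ x + params["U_h"] @ h_prev + params["b"]  # (4H,)
    i, f, o, g = np.split(gates, 4)                   # input, forget, output, candidate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c, alpha

# Usage with random (untrained) weights; all sizes are arbitrary.
rng = np.random.default_rng(0)
E, D, F, H, A, L = 8, 16, 7, 12, 10, 5  # embed, feature, face, hidden, attn, regions
params = {
    "W_a": rng.normal(size=(D, A)) * 0.1,
    "W_h": rng.normal(size=(H, A)) * 0.1,
    "w": rng.normal(size=A) * 0.1,
    "W_x": rng.normal(size=(4 * H, E + D + F)) * 0.1,
    "U_h": rng.normal(size=(4 * H, H)) * 0.1,
    "b": np.zeros(4 * H),
}
feats = rng.normal(size=(L, D))          # stand-in for CNN region features
face = softmax(rng.normal(size=F))       # hypothetical expression-class distribution
h, c = np.zeros(H), np.zeros(H)
for t in range(3):
    word_emb = rng.normal(size=E)        # stand-in for the embedded previous word
    h, c, alpha = decoder_step(word_emb, feats, face, h, c, params)
print(h.shape, round(alpha.sum(), 6))
```

Feeding the facial vector at every step is only one possible design; it could instead be used, for example, to initialise the LSTM state.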

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:1908.02923 [cs.CV]
	(or arXiv:1908.02923v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1908.02923

Submission history

From: Omid Mohamad Nezami
[v1] Thu, 8 Aug 2019 04:07:39 UTC (6,138 KB)
[v2] Thu, 9 Jan 2020 02:39:46 UTC (6,147 KB)
[v3] Wed, 15 Apr 2020 02:01:07 UTC (8,285 KB)
