Showing 1–1 of 1 results for author: Shamsfard, M

  1. Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning

    Authors: Mozhgan Pourkeshavarz, Shahabedin Nabavi, Mohsen Ebrahimi Moghaddam, Mehrnoush Shamsfard

    Abstract: Recently, the attention-enriched encoder-decoder framework has aroused great interest in image captioning due to its overwhelming progress. Many visual attention models directly leverage meaningful regions to generate image descriptions. However, seeking a direct transition from visual space to text is not enough to generate fine-grained captions. This paper exploits a feature-compounding approach…

    Submitted 8 February, 2023; originally announced February 2023.

    Journal ref: Multimedia Tools and Applications, Volume 83, pages 12209-12233, 2024