A Comprehensive Review of the Video-to-Text Problem
Authors:
Jesus Perez-Martin,
Benjamin Bustos,
Silvio Jamil F. Guimarães,
Ivan Sipiran,
Jorge Pérez,
Grethel Coello Said
Abstract:
Research in the Vision and Language area encompasses challenging topics that seek to connect visual and textual information. When the visual information is related to videos, this takes us into Video-Text Research, which includes several challenging tasks such as video question answering, video summarization with natural language, and video-to-text and text-to-video conversion. This paper reviews…
▽ More
Research in the Vision and Language area encompasses challenging topics that seek to connect visual and textual information. When the visual information is related to videos, this takes us into Video-Text Research, which includes several challenging tasks such as video question answering, video summarization with natural language, and video-to-text and text-to-video conversion. This paper reviews the video-to-text problem, in which the goal is to associate an input video with its textual description. This association can be mainly made by retrieving the most relevant descriptions from a corpus or generating a new one given a context video. These two ways represent essential tasks for Computer Vision and Natural Language Processing communities, called text retrieval from video task and video captioning/description task. These two tasks are substantially more complex than predicting or retrieving a single sentence from an image. The spatiotemporal information present in videos introduces diversity and complexity regarding the visual content and the structure of associated language descriptions. This review categorizes and describes the state-of-the-art techniques for the video-to-text problem. It covers the main video-to-text methods and the ways to evaluate their performance. We analyze twenty-six benchmark datasets, showing their drawbacks and strengths for the problem requirements. We also show the progress that researchers have made on each dataset, we cover the challenges in the field, and we discuss future research directions.
△ Less
Submitted 30 November, 2021; v1 submitted 26 March, 2021;
originally announced March 2021.
Semantic Search of Memes on Twitter
Authors:
Jesus Perez-Martin,
Benjamin Bustos,
Magdalena Saldana
Abstract:
Memes are becoming a useful source of data for analyzing behavior on social media. However, a problem to tackle is how to correctly identify a meme. As the number of memes published every day on social media is huge, there is a need for automatic methods for classifying and searching in large meme datasets. This paper proposes and compares several methods for automatically classifying images as me…
▽ More
Memes are becoming a useful source of data for analyzing behavior on social media. However, a problem to tackle is how to correctly identify a meme. As the number of memes published every day on social media is huge, there is a need for automatic methods for classifying and searching in large meme datasets. This paper proposes and compares several methods for automatically classifying images as memes. Also, we propose a method that allows us to implement a system for retrieving memes from a dataset using a textual query. We experimentally evaluate the methods using a large dataset of memes collected from Twitter users in Chile, which was annotated by a group of experts. Though some of the evaluated methods are effective, there is still room for improvement.
△ Less
Submitted 20 May, 2020; v1 submitted 4 February, 2020;
originally announced February 2020.