Showing 1–2 of 2 results for author: Ghermi, R

  1. arXiv:2406.10221  [pdf, other]

    cs.CV cs.AI cs.CL

    Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding

    Authors: Ridouane Ghermi, Xi Wang, Vicky Kalogeiton, Ivan Laptev

    Abstract: Recent advances in vision-language models have significantly propelled video understanding. Existing datasets and tasks, however, have notable limitations. Most datasets are confined to short videos with limited events and narrow narratives. For example, datasets with instructional and egocentric videos often document the activities of one person in a single scene. Although some movie datasets off…

    Submitted 14 June, 2024; originally announced June 2024.

  2. arXiv:1910.06017  [pdf, other]

    cs.CV eess.IV

    OmniTrack: Real-time detection and tracking of objects, text and logos in video

    Authors: Hannes Fassold, Ridouane Ghermi

    Abstract: The automatic detection and tracking of general objects (like persons, animals or cars), text and logos in a video is crucial for many video understanding tasks, and usually real-time processing is required. We propose OmniTrack, an efficient and robust algorithm which is able to automatically detect and track objects, text as well as brand logos in real-time. It combines a powerful deep learning…

    Submitted 14 October, 2019; originally announced October 2019.

    Comments: Accepted for IEEE ISM Conference, 2019