-
SketchQL Demonstration: Zero-shot Video Moment Querying with Sketches
Authors:
Renzhi Wu,
Pramod Chunduri,
Dristi J Shah,
Ashmitha Julius Aravind,
Ali Payani,
Xu Chu,
Joy Arulraj,
Kexin Rong
Abstract:
In this paper, we present SketchQL, a video database management system (VDBMS) for retrieving video moments with a sketch-based query interface. This novel interface allows users to specify object trajectory events with simple mouse drag-and-drop operations. Users can use trajectories of single objects as building blocks to compose complex events. Using a pre-trained model that encodes trajectory similarity, SketchQL achieves zero-shot video moment retrieval by performing similarity searches over the video to identify the clips most similar to the visual query. In this demonstration, we introduce the graphical user interface of SketchQL and detail its functionalities and interaction mechanisms. We also demonstrate the end-to-end usage of SketchQL, from query composition to video moment retrieval, using real-world scenarios.
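The retrieval step described above (sliding a sketched trajectory over the video and ranking clips by similarity) can be illustrated with a minimal sketch. It assumes each trajectory is a list of per-frame (x, y) object positions and, as a loudly labeled stand-in for SketchQL's pre-trained similarity model, uses a translation-invariant mean Euclidean distance. The names `normalize`, `distance`, and `search` are hypothetical and not part of SketchQL.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def normalize(traj: List[Point]) -> List[Point]:
    # Translate so the trajectory starts at the origin; the match then ignores absolute position.
    x0, y0 = traj[0]
    return [(x - x0, y - y0) for x, y in traj]


def distance(a: List[Point], b: List[Point]) -> float:
    # Stand-in for a learned trajectory similarity: mean distance between aligned points.
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def search(query: List[Point], track: List[Point], k: int = 3) -> List[Tuple[int, float]]:
    """Return the k best (start_frame, distance) windows of `track` matching the sketched `query`."""
    m = len(query)
    q = normalize(query)
    scored = []
    for start in range(len(track) - m + 1):
        window = normalize(track[start:start + m])
        scored.append((start, distance(q, window)))
    scored.sort(key=lambda s: s[1])  # stable sort: ties keep earlier clips first
    return scored[:k]


# A "move right" sketch matched against a track that first moves down, then right.
print(search([(0, 0), (1, 0), (2, 0)], [(5, 5), (5, 6), (5, 7), (6, 7), (7, 7), (8, 7)], k=2))
```

Composing a multi-object event could, under the same assumptions, sum the per-object distances over a shared window; a learned encoder would replace `distance` to make matching robust to speed and scale differences.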
Submitted 30 June, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Zeus: Efficiently Localizing Actions in Videos using Reinforcement Learning
Authors:
Pramod Chunduri,
Jaeho Bang,
Yao Lu,
Joy Arulraj
Abstract:
Detection and localization of actions in videos is an important problem in practice. State-of-the-art video analytics systems are unable to efficiently and effectively answer such action queries because actions often involve a complex interaction between objects and are spread across a sequence of frames; detecting and localizing them requires computationally expensive deep neural networks. It is also important to consider the entire sequence of frames to answer the query effectively.
In this paper, we present ZEUS, a video analytics system tailored for answering action queries. We present a novel technique for efficiently answering these queries using deep reinforcement learning. ZEUS trains a reinforcement learning agent that learns to adaptively modify the input video segments that are subsequently sent to an action classification network. The agent alters the input segments along three dimensions: sampling rate, segment length, and resolution. To meet the user-specified accuracy target, ZEUS's query optimizer trains the agent based on an accuracy-aware, aggregate reward function. Evaluation on three diverse video datasets shows that ZEUS outperforms state-of-the-art frame- and window-based filtering techniques by up to 22.1x and 4.7x, respectively. It also consistently meets the user-specified accuracy target across all queries.
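To make the agent's action space and the idea of an accuracy-aware, aggregate reward concrete, here is a minimal sketch. The knob values, the cost model, and the reward shape below are all hypothetical illustrations chosen for this example; the abstract does not specify ZEUS's actual values or reward formula. The sketch only shows how an episode-level reward can favor cheaper configurations while penalizing any shortfall from the accuracy target.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class Config:
    sampling_rate: int   # keep every n-th frame
    segment_length: int  # frames per segment sent to the action classifier
    resolution: int      # side length (pixels) of the square classifier input


# Hypothetical knob values; every combination is one discrete action for the agent.
ACTIONS: List[Config] = [
    Config(s, l, r) for s, l, r in product([1, 2, 4], [8, 16], [112, 224])
]


def relative_cost(cfg: Config) -> float:
    # Hypothetical per-frame cost relative to (every frame, 224x224);
    # segment_length is ignored here for simplicity.
    return (cfg.resolution ** 2) / (224 ** 2) / cfg.sampling_rate


def aggregate_reward(correct: List[bool], costs: List[float], target: float) -> float:
    """Hypothetical episode-level reward: savings count only if the accuracy target is met."""
    accuracy = sum(correct) / len(correct)
    if accuracy < target:
        # Penalize in proportion to how far accuracy falls short of the target.
        return -(target - accuracy) * 10.0
    # Otherwise reward lower average cost (0.0 means no savings at all).
    return 1.0 - sum(costs) / len(costs)


cheap = Config(sampling_rate=2, segment_length=16, resolution=112)
print(len(ACTIONS), relative_cost(cheap))
print(aggregate_reward([True, True, False, True], [relative_cost(cheap)] * 4, target=0.7))
```

Computing the reward over a whole episode, rather than per segment, lets the agent trade a few cheap mistakes for large savings as long as overall accuracy stays above the target.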
Submitted 27 September, 2022; v1 submitted 6 April, 2021;
originally announced April 2021.
-
EKO: Adaptive Sampling of Compressed Video Data
Authors:
Jaeho Bang,
Pramod Chunduri,
Joy Arulraj
Abstract:
Researchers have presented systems for efficiently analysing video data at scale using sampling algorithms. While these systems effectively leverage the temporal redundancy present in videos, they suffer from three limitations. First, they use traditional video storage formats that are tailored for human consumption. Second, they load and decode the entire compressed video in memory before applying the sampling algorithm. Third, the sampling algorithms often require labeled training data obtained using a specific deep learning model. These limitations lead to lower accuracy, higher query execution time, and larger memory footprint. In this paper, we present EKO, a storage engine for efficiently managing video data. EKO relies on two optimizations. First, it uses a novel unsupervised, adaptive sampling algorithm for identifying the key frames in a given video. Second, it stores the identified key frames in a compressed representation that is optimized for machine consumption. We show that EKO improves F1-score by up to 9% compared to the next best performing state-of-the-art unsupervised sampling algorithms by selecting more representative frames. It reduces query execution time by 3X and memory footprint by 10X in comparison to a widely-used, traditional video storage format.
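The idea of unsupervised, adaptive key-frame sampling can be illustrated with a minimal sketch. This is not EKO's algorithm, which the abstract does not describe in detail; it is a simple swapped-in technique that assumes each frame has already been summarized as a feature vector (for example, a color histogram) and groups consecutive frames until the content drifts beyond a hypothetical `threshold`. One representative per group is kept, so static scenes yield few key frames and busy scenes yield many, without any labeled training data.

```python
import math
from typing import List, Sequence


def select_key_frames(features: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return indices of representative frames, one per contiguous group of similar frames."""
    key_frames: List[int] = []
    start = 0
    for i in range(1, len(features) + 1):
        # Close the current group at the end of the video or when content drifts too far
        # from the group's first frame.
        if i == len(features) or math.dist(features[i], features[start]) > threshold:
            group = range(start, i)
            dims = len(features[start])
            centroid = [sum(features[j][d] for j in group) / len(group) for d in range(dims)]
            # Keep the frame nearest the group's centroid as its representative.
            key_frames.append(min(group, key=lambda j: math.dist(features[j], centroid)))
            start = i
    return key_frames


# One-dimensional toy features: three visually distinct "shots".
print(select_key_frames([[0.0], [0.1], [0.2], [5.0], [5.2], [5.3], [9.0]], threshold=1.0))
```

The threshold controls the accuracy/size trade-off: a smaller value keeps more frames and tracks content changes more closely, while a larger value yields a smaller sample.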
Submitted 4 April, 2021;
originally announced April 2021.