Search | arXiv e-print repository

OW-VISCap: Open-World Video Instance Segmentation and Captioning

Authors: Anwesa Choudhuri, Girish Chowdhary, Alexander G. Schwing

Abstract: Open-world video instance segmentation is an important video understanding task. Yet most methods either operate in a closed-world setting, require additional user input, or use classic region-based proposals to identify never-before-seen objects. Further, these methods only assign a one-word label to detected objects and don't generate rich object-centric descriptions. They also often suffer from highly overlapping predictions. To address these issues, we propose Open-World Video Instance Segmentation and Captioning (OW-VISCap), an approach to jointly segment, track, and caption previously seen or unseen objects in a video. For this, we introduce open-world object queries to discover never-before-seen objects without additional user input. We generate rich and descriptive object-centric captions for each detected object via a masked attention augmented LLM input. We introduce an inter-query contrastive loss to ensure that the object queries differ from one another. Our generalized approach matches or surpasses state-of-the-art on three tasks: open-world video instance segmentation on the BURST dataset, dense video object captioning on the VidSTG dataset, and closed-world video instance segmentation on the OVIS dataset.

Submitted 4 April, 2024; originally announced April 2024.

Comments: Project page: https://anwesachoudhuri.github.io/OpenWorldVISCap/
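The abstract states only the goal of the inter-query contrastive loss (keeping object queries different from one another), not its formula. The Python sketch below is a generic stand-in under that assumption, not the OW-VISCap loss: it averages a hinge on the pairwise cosine similarity of query embeddings, so the penalty is zero when all queries are mutually orthogonal (or less similar than a margin) and grows as queries collapse onto each other. The function names and the `margin` parameter are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("query embeddings must be non-zero")
    return dot / (na * nb)


def inter_query_repulsion_loss(
    queries: Sequence[Sequence[float]], margin: float = 0.0
) -> float:
    """Hypothetical stand-in for an inter-query contrastive term.

    Averages max(0, cos(q_i, q_j) - margin) over all unordered query pairs,
    so similar (overlapping) queries are penalized and distinct ones are not.
    This is NOT the formula from the OW-VISCap paper.
    """
    pairs = list(combinations(range(len(queries)), 2))
    if not pairs:
        return 0.0  # a single query has nothing to be pushed away from
    total = sum(
        max(0.0, cosine(queries[i], queries[j]) - margin) for i, j in pairs
    )
    return total / len(pairs)
```

For example, two identical queries give a loss of 1.0, while two orthogonal ones give 0.0; in a training loop such a term would be added to the segmentation and captioning losses with some weight.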

arXiv:2112.10764 [pdf, other]

Mask2Former for Video Instance Segmentation

Authors: Bowen Cheng, Anwesa Choudhuri, Ishan Misra, Alexander Kirillov, Rohit Girdhar, Alexander G. Schwing

Abstract: We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline. In this report, we show universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes. Specifically, Mask2Former sets a new state-of-the-art of 60.4 AP on YouTubeVIS-2019 and 52.6 AP on YouTubeVIS-2021. We believe Mask2Former is also capable of handling video semantic and panoptic segmentation, given its versatility in image segmentation. We hope this will make state-of-the-art video segmentation research more accessible and bring more attention to designing universal image and video segmentation architectures.

Submitted 20 December, 2021; originally announced December 2021.

Comments: Code and models: https://github.com/facebookresearch/Mask2Former
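The key claim of the report is that Mask2Former handles video by predicting a 3D (time × height × width) mask volume per query rather than separate per-frame masks. The pure-Python sketch below illustrates only that final step, assuming each query has already been decoded into an embedding vector and that a per-pixel embedding exists for every frame of the clip: the mask logit at each (t, h, w) is the dot product of the query embedding with that pixel's embedding, so one query yields one mask volume spanning all frames of the clip. It omits the backbone, pixel decoder, and transformer decoder, and the function names are hypothetical, not the repository's API.

```python
from typing import List

Vector = List[float]
Volume = List[List[List[float]]]  # indexed [t][h][w]


def mask_volume_logits(
    query: Vector, pixel_embeddings: List[List[List[Vector]]]
) -> Volume:
    """Hypothetical helper: one query embedding -> a T x H x W logit volume.

    pixel_embeddings[t][h][w] is a C-dimensional vector and query is
    C-dimensional (equal dimensions are assumed, not checked).
    """
    return [
        [[sum(q * p for q, p in zip(query, pixel)) for pixel in row] for row in frame]
        for frame in pixel_embeddings
    ]


def binarize(logits: Volume) -> List[List[List[int]]]:
    """sigmoid(x) > 0.5 exactly when x > 0, so threshold the logits at zero."""
    return [[[1 if x > 0 else 0 for x in row] for row in frame] for frame in logits]
```

With `query = [1.0, -1.0]` and a two-frame, 1 × 2 clip whose pixel embeddings are `[[[2.0, 1.0], [0.0, 3.0]]]` and `[[[1.0, 1.0], [4.0, 0.0]]]`, the logits are `[[[1.0, -3.0]], [[0.0, 4.0]]]` and the binarized volume is `[[[1, 0]], [[0, 1]]]`: the same query selects a different pixel in each frame, which is how a single query tracks an object through the clip.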

arXiv:1503.01314 [pdf]

An Incentivized Approach for Fair Participation in Wireless Ad hoc Networks

Authors: Arka Rai Choudhuri, Kalyanasundaram S, Shriyak Sridhar, Annappa B

Abstract: In Wireless Ad hoc networks (WANETs), nodes separated by considerable distance communicate with each other by relaying their messages through other nodes. However, it might not be in the best interests of a node to forward the message of another node due to power constraints. In addition, all nodes being rational, some nodes may be selfish, i.e. they might not relay data from other nodes so as to increase their lifetime. In this paper, we present a fair and incentivized approach for participation in Ad hoc networks. Given the power required for each transmission, we are able to determine the power saving contributed by each intermediate hop. We propose the FAir Share incenTivizEd Ad hoc paRticipation protocol (FASTER), which takes a selected route from a routing protocol as input, to calculate the worth of each node using the cooperative game theory concept of 'Shapley Value' applied on the power saved by each node. This value can be used for allocation of Virtual Currency to the nodes, which can be spent on subsequent message transmissions.

Submitted 4 March, 2015; originally announced March 2015.

Comments: 6 pages, 4 figures, published in the International Journal of Recent Development in Engineering and Technology

Journal ref: IJRDET 2, no. 3 (2014): 117-121
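The abstract names the Shapley value but does not spell out how FASTER defines the worth of a coalition of relay nodes. The sketch below computes exact Shapley values for any characteristic function `v`, passed in as a Python callable over frozensets of players, using the standard formula φ_i = Σ over S ⊆ N∖{i} of |S|!(n−|S|−1)!/n! · (v(S ∪ {i}) − v(S)). Treating `v` as the power saved by a set of nodes is an assumption about how the protocol would plug in, and the function name is hypothetical. Exact enumeration visits 2^(n−1) coalitions per player, which is practical only for short routes.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Hashable, Sequence


def shapley_values(
    players: Sequence[Hashable],
    v: Callable[[FrozenSet[Hashable]], float],
) -> Dict[Hashable, float]:
    """Exact Shapley value of each (distinct) player under characteristic function v.

    v maps a coalition to its worth and is conventionally 0 on the empty set.
    Using "power saved by the coalition" as v is an assumption, not FASTER's definition.
    """
    n = len(players)
    phi: Dict[Hashable, float] = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight of coalitions of size k: the chance that exactly these k
            # players precede i in a uniformly random ordering of all n players.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of player i when joining coalition s.
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi
```

Two toy games show the behavior: with an additive game such as v(S) = sum of each node's individual saving, every node's Shapley value equals its own saving; with three symmetric nodes where any two together save one unit, each node receives 1/3. In both cases the values sum to v(N), the efficiency property that makes the Shapley value a natural basis for splitting a shared reward such as virtual currency.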

Showing 1–3 of 3 results for author: Choudhuri, A