-
OW-VISCap: Open-World Video Instance Segmentation and Captioning
Authors:
Anwesa Choudhuri,
Girish Chowdhary,
Alexander G. Schwing
Abstract:
Open-world video instance segmentation is an important video understanding task. Yet most methods either operate in a closed-world setting, require an additional user-input, or use classic region-based proposals to identify never before seen objects. Further, these methods only assign a one-word label to detected objects, and don't generate rich object-centric descriptions. They also often suffer…
▽ More
Open-world video instance segmentation is an important video understanding task. Yet most methods either operate in a closed-world setting, require an additional user-input, or use classic region-based proposals to identify never before seen objects. Further, these methods only assign a one-word label to detected objects, and don't generate rich object-centric descriptions. They also often suffer from highly overlap** predictions. To address these issues, we propose Open-World Video Instance Segmentation and Captioning (OW-VISCap), an approach to jointly segment, track, and caption previously seen or unseen objects in a video. For this, we introduce open-world object queries to discover never before seen objects without additional user-input. We generate rich and descriptive object-centric captions for each detected object via a masked attention augmented LLM input. We introduce an inter-query contrastive loss to ensure that the object queries differ from one another. Our generalized approach matches or surpasses state-of-the-art on three tasks: open-world video instance segmentation on the BURST dataset, dense video object captioning on the VidSTG dataset, and closed-world video instance segmentation on the OVIS dataset.
△ Less
Submitted 4 April, 2024;
originally announced April 2024.
-
Mask2Former for Video Instance Segmentation
Authors:
Bowen Cheng,
Anwesa Choudhuri,
Ishan Misra,
Alexander Kirillov,
Rohit Girdhar,
Alexander G. Schwing
Abstract:
We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline. In this report, we show universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes. Specifically, Mask2Former sets a new state-of-the-art of 60.4 AP on YouT…
▽ More
We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline. In this report, we show universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes. Specifically, Mask2Former sets a new state-of-the-art of 60.4 AP on YouTubeVIS-2019 and 52.6 AP on YouTubeVIS-2021. We believe Mask2Former is also capable of handling video semantic and panoptic segmentation, given its versatility in image segmentation. We hope this will make state-of-the-art video segmentation research more accessible and bring more attention to designing universal image and video segmentation architectures.
△ Less
Submitted 20 December, 2021;
originally announced December 2021.
-
An Incentivized Approach for Fair Participation in Wireless Ad hoc Networks
Authors:
Arka Rai Choudhuri,
Kalyanasundaram S,
Shriyak Sridhar,
Annappa B
Abstract:
In Wireless Ad hoc networks (WANETs), nodes separated by considerable distance communicate with each other by relaying their messages through other nodes. However, it might not be in the best interests of a node to forward the message of another node due to power constraints. In addition, all nodes being rational, some nodes may be selfish, i.e. they might not relay data from other nodes so as to…
▽ More
In Wireless Ad hoc networks (WANETs), nodes separated by considerable distance communicate with each other by relaying their messages through other nodes. However, it might not be in the best interests of a node to forward the message of another node due to power constraints. In addition, all nodes being rational, some nodes may be selfish, i.e. they might not relay data from other nodes so as to increase their lifetime. In this paper, we present a fair and incentivized approach for participation in Ad hoc networks. Given the power required for each transmission, we are able to determine the power saving contributed by each intermediate hop. We propose the FAir Share incenTivizEd Ad hoc paRticipation protocol (FASTER), which takes a selected route from a routing protocol as input, to calculate the worth of each node using the cooperative game theory concept of 'Shapley Value' applied on the power saved by each node. This value can be used for allocation of Virtual Currency to the nodes, which can be spent on subsequent message transmissions.
△ Less
Submitted 4 March, 2015;
originally announced March 2015.