Showing 1–1 of 1 results for author: Warner, N

Search v0.5.6 released 2020-02-24

arXiv:2311.14822 [pdf, other]

cs.CV

Text and Click inputs for unambiguous open vocabulary instance segmentation

Authors: Nikolai Warner, Meera Hahn, Jonathan Huang, Irfan Essa, Vighnesh Birodkar

Abstract: Segmentation localizes objects in an image on a fine-grained per-pixel scale. Segmentation benefits by humans-in-the-loop to provide additional input of objects to segment using a combination of foreground or background clicks. Tasks include photoediting or novel dataset annotation, where human annotators leverage an existing segmentation model instead of drawing raw pixel level annotations. We pr… ▽ More Segmentation localizes objects in an image on a fine-grained per-pixel scale. Segmentation benefits by humans-in-the-loop to provide additional input of objects to segment using a combination of foreground or background clicks. Tasks include photoediting or novel dataset annotation, where human annotators leverage an existing segmentation model instead of drawing raw pixel level annotations. We propose a new segmentation process, Text + Click segmentation, where a model takes as input an image, a text phrase describing a class to segment, and a single foreground click specifying the instance to segment. Compared to previous approaches, we leverage open-vocabulary image-text models to support a wide-range of text prompts. Conditioning segmentations on text prompts improves the accuracy of segmentations on novel or unseen classes. We demonstrate that the combination of a single user-specified foreground click and a text prompt allows a model to better disambiguate overlap** or co-occurring semantic categories, such as "tie", "suit", and "person". We study these results across common segmentation datasets such as refCOCO, COCO, VOC, and OpenImages. Source code available here. △ Less

Submitted 24 November, 2023; originally announced November 2023.

Comments: 20 pages, 9 figures, 8 tables

Search v0.5.6 released 2020-02-24