Annotation Free Semantic Segmentation with Vision Foundation Models

Seifi, Soroush; Reino, Daniel Olmeda; Despinoy, Fabien; Aljundi, Rahaf

arXiv:2403.09307 (cs)

[Submitted on 14 Mar 2024 (v1), last revised 24 May 2024 (this version, v2)]

Abstract: Semantic segmentation is one of the most challenging vision tasks, usually requiring large amounts of training data with expensive pixel-level annotations. With the success of foundation models, and especially vision-language models, recent works attempt zero-shot semantic segmentation but still require either large-scale training or additional image- or pixel-level annotations. In this work, we generate free annotations for any semantic segmentation dataset using existing foundation models. We use CLIP to detect objects and SAM to generate high-quality object masks. Next, we build a lightweight module on top of a self-supervised vision encoder, DinoV2, to align the patch features with a pretrained text encoder for zero-shot semantic segmentation. Our approach can bring language-based semantics to any pretrained vision encoder with minimal training. Our module is lightweight, uses foundation models as the sole source of supervision, and shows impressive generalization capability from little training data with no annotation.
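
As a rough illustration of the alignment idea in the abstract, the sketch below labels each image patch with the class whose text embedding is most similar to the projected patch feature. It is not the authors' implementation: the single linear projection, the function names (`l2_normalize`, `project`, `segment_patches`), and the toy vectors standing in for DinoV2 patch features and text-encoder embeddings are all assumptions made for this example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def project(feature, weight):
    """Apply a linear map; weight has shape (out_dim, in_dim)."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]


def segment_patches(patch_features, weight, text_embeddings):
    """Return, for each patch, the index of the most similar class embedding."""
    text_unit = [l2_normalize(t) for t in text_embeddings]
    labels = []
    for feat in patch_features:
        # Map the patch feature into the text embedding space (assumed linear).
        z = l2_normalize(project(feat, weight))
        # Cosine similarity against every class text embedding.
        sims = [sum(a * b for a, b in zip(z, t)) for t in text_unit]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels


# Toy stand-ins (hypothetical values, not real encoder outputs).
patch_features = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 3.0],
]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # maps 3-d patches to 2-d text space
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]   # one embedding per class name

labels = segment_patches(patch_features, weight, text_embeddings)
print(labels)
```

In a real pipeline the projection weights would be trained, and per the abstract the supervision would come from CLIP detections and SAM masks rather than human labels; the sketch only shows the inference-time assignment under these assumptions.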

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.09307 [cs.CV]
	(or arXiv:2403.09307v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.09307

Submission history

From: Soroush Seifi
[v1] Thu, 14 Mar 2024 11:57:58 UTC (37,634 KB)
[v2] Fri, 24 May 2024 11:05:00 UTC (38,516 KB)
