A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task

Yang, Shiqi; Hashimoto, Atsushi; Ushiku, Yoshitaka

arXiv:2307.02862 (cs)

[Submitted on 6 Jul 2023 (v1), last revised 1 Aug 2023 (this version, v2)]

Abstract: In recent years, large models trained on huge amounts of cross-modality data, usually termed foundation models, have achieved conspicuous success in many fields, such as image recognition and generation. Although they succeed in their original application cases, it remains unclear whether these foundation models can be applied to other downstream tasks. In this paper, we conduct a short survey of current methods for discriminative dense recognition tasks that are built on pretrained foundation models. We also provide a preliminary experimental analysis of an existing open-vocabulary segmentation method based on Stable Diffusion, which indicates that the current way of deploying diffusion models for segmentation is not optimal. This work aims to provide insights for future research on adopting foundation models for downstream tasks.

Comments:	This is a short report on the current usage of foundation models (mainly pretrained diffusion models) for downstream dense recognition tasks (e.g., open-vocabulary segmentation). We hope this short report can offer insight for future research.
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2307.02862 [cs.CV]
	(or arXiv:2307.02862v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2307.02862

Submission history

From: Shiqi Yang
[v1] Thu, 6 Jul 2023 08:57:53 UTC (16,090 KB)
[v2] Tue, 1 Aug 2023 06:47:27 UTC (16,178 KB)
