Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

Shen, Dazhong; Song, Guanglu; Xue, Zeyue; Wang, Fu-Yun; Liu, Yu

arXiv:2404.05384v1 (cs)

[Submitted on 8 Apr 2024]

Abstract: Classifier-Free Guidance (CFG) has been widely used in text-to-image diffusion models, where the CFG scale is introduced to control the strength of text guidance over the whole image space. However, we argue that a global CFG scale results in spatial inconsistency across varying semantic strengths and in suboptimal image quality. To address this problem, we present a novel approach, Semantic-aware Classifier-Free Guidance (S-CFG), which customizes the guidance degree for each semantic unit in text-to-image diffusion models. Specifically, we first design a training-free semantic segmentation method to partition the latent image into relatively independent semantic regions at each denoising step. In particular, the cross-attention map in the denoising U-Net backbone is renormalized to assign each patch to its corresponding token, while the self-attention map is used to complete the semantic regions. Then, to balance the amplification of diverse semantic units, we adaptively adjust the CFG scales across different semantic regions to rescale the text guidance degrees to a uniform level. Finally, extensive experiments demonstrate the superiority of S-CFG over the original CFG strategy on various text-to-image diffusion models, without requiring any extra training cost. Our code is available at this https URL.
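
The contrast between one global CFG scale and region-wise scales can be sketched in NumPy. This is a minimal illustration and not the authors' implementation. The function names (`cfg`, `region_scale_map`, `spatial_cfg`), the caller-supplied boolean region masks, and the rescaling rule (matching each region's mean guidance norm to the image-wide mean) are assumptions made here for illustration. The abstract does not state the exact formula, and the attention-based segmentation step is not reproduced.

```python
import numpy as np


def cfg(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance with a single global scale."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


def region_scale_map(eps_uncond, eps_cond, masks, base_scale, tiny=1e-8):
    """Build a per-pixel scale map of shape (H, W).

    Assumed rule (hypothetical, not taken from the paper): each region's
    scale is chosen so that its mean guidance norm, after scaling, equals
    base_scale times the image-wide mean guidance norm.
    """
    delta = eps_cond - eps_uncond            # guidance direction, (C, H, W)
    norm = np.linalg.norm(delta, axis=0)     # per-pixel norm over channels, (H, W)
    target = norm.mean()                     # image-wide reference level
    scale_map = np.full(norm.shape, float(base_scale))
    for mask in masks:                       # each mask: boolean array, (H, W)
        if mask.any():
            region_mean = norm[mask].mean()
            scale_map[mask] = base_scale * target / (region_mean + tiny)
    return scale_map


def spatial_cfg(eps_uncond, eps_cond, scale_map):
    """Classifier-free guidance with a spatially varying scale."""
    # scale_map is (H, W); broadcast it across the channel axis.
    return eps_uncond + scale_map[None] * (eps_cond - eps_uncond)
```

In this sketch, pixels not covered by any mask keep `base_scale`, and if masks overlap, the later mask overwrites the earlier one. Passing a constant `scale_map` reduces `spatial_cfg` to ordinary CFG.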

Comments:	Accepted by CVPR 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.05384 [cs.CV]
	(or arXiv:2404.05384v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.05384

Submission history

From: Dazhong Shen
[v1] Mon, 8 Apr 2024 10:45:29 UTC (20,062 KB)
