Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

Liu, Zeyu; Liang, Weicong; Liang, Zhanhao; Luo, Chong; Li, Ji; Huang, Gao; Yuan, Yuhui

arXiv:2403.09622 (cs)

[Submitted on 14 Mar 2024]

Abstract: Visual text rendering poses a fundamental challenge for contemporary text-to-image generation models, with the core problem lying in text encoder deficiencies. To achieve accurate text rendering, we identify two crucial requirements for text encoders: character awareness and alignment with glyphs. Our solution involves crafting a series of customized text encoders, Glyph-ByT5, by fine-tuning the character-aware ByT5 encoder using a meticulously curated paired glyph-text dataset. We present an effective method for integrating Glyph-ByT5 with SDXL, resulting in the creation of the Glyph-SDXL model for design image generation. This significantly enhances text rendering accuracy, improving it from less than $20\%$ to nearly $90\%$ on our design image benchmark. Noteworthy is Glyph-SDXL's newfound ability for text paragraph rendering, achieving high spelling accuracy for tens to hundreds of characters with automated multi-line layouts. Finally, through fine-tuning Glyph-SDXL with a small set of high-quality, photorealistic images featuring visual text, we showcase a substantial improvement in scene text rendering capabilities in open-domain real images. These compelling outcomes aim to encourage further exploration in designing customized text encoders for diverse and challenging tasks.
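
The "character awareness" requirement named in the abstract can be illustrated with a minimal sketch of byte-level tokenization, the kind of input ByT5-style encoders consume. This is not the authors' code: the function names are hypothetical, and the offset of 3 assumes the ByT5 convention of reserving the first three ids for special tokens (pad, end-of-sequence, unknown).

```python
from typing import List

# Assumption: ids 0-2 are reserved for special tokens (pad, eos, unk),
# so every UTF-8 byte value is shifted up by 3.
BYTE_OFFSET = 3


def byte_level_ids(text: str) -> List[int]:
    """Map text to one id per UTF-8 byte (hypothetical helper)."""
    return [b + BYTE_OFFSET for b in text.encode("utf-8")]


def ids_to_text(ids: List[int]) -> str:
    """Invert byte_level_ids by undoing the offset and decoding UTF-8."""
    return bytes(i - BYTE_OFFSET for i in ids).decode("utf-8")


if __name__ == "__main__":
    ids = byte_level_ids("Glyph")
    # Every character of an ASCII word gets its own id, so the spelling
    # is directly visible to the encoder.
    print(ids)
    print(ids_to_text(ids))
```

Because each byte receives its own id, the encoder sees the exact spelling of the text to be rendered, whereas a subword vocabulary may merge several characters into a single opaque token. Non-ASCII characters occupy more than one id under this scheme.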

Comments:	technical report, 18 pages, 19 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.09622 [cs.CV]
	(or arXiv:2403.09622v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.09622

Submission history

From: Yuhui Yuan
[v1] Thu, 14 Mar 2024 17:55:33 UTC (45,392 KB)
