Unambiguous Recognition Should Not Rely Solely on Natural Language Training

Luo, Renqing; Xu, Yuhan

arXiv:2406.17148 (cs)

[Submitted on 24 Jun 2024]

Abstract: This paper identifies a "bias" problem in LaTeX text recognition with Transformer-based architectures. For instance, $e-t$ is frequently misrecognized as $e^{-t}$. This bias stems from the inherent characteristics of the dataset. To mitigate it, we propose a LaTeX printed text recognition model trained on a mixed dataset of pseudo-formulas and pseudo-text. The model uses a Swin Transformer as the encoder and a RoBERTa model as the decoder. Experimental results show that this approach reduces the bias and improves the accuracy and robustness of text recognition. For clear images, the model strictly adheres to the image content; for blurred images, it combines image and contextual information to produce reasonable recognition results.
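
The abstract does not say how the pseudo-formulas and pseudo-text are generated. As a rough illustration of the idea only, the sketch below draws random LaTeX-like strings and random letter sequences so that the labels carry no natural-language or mathematical prior (for example, $e - t$ and $e^{-t}$ become equally plausible). The vocabulary, probabilities, and function names are all hypothetical and are not taken from the paper.

```python
import random
import string

# Hypothetical token sets; the paper's actual vocabulary is not given in the abstract.
ATOMS = ["a", "b", "e", "t", "x", "y", "1", "2", "\\alpha", "\\beta"]
BINARY_OPS = ["+", "-", "=", "\\times"]


def random_pseudo_formula(rng, max_terms=4, script_prob=0.3):
    """Return a random LaTeX-like string with no meaningful mathematical structure."""
    n_terms = rng.randint(2, max_terms)
    parts = []
    for i in range(n_terms):
        term = rng.choice(ATOMS)
        # Occasionally attach a superscript or subscript, optionally negated,
        # so that strings such as e^{-t} appear about as often as e - t.
        if rng.random() < script_prob:
            kind = rng.choice(["^", "_"])
            sign = rng.choice(["", "-"])
            term += f"{kind}{{{sign}{rng.choice(ATOMS)}}}"
        parts.append(term)
        if i < n_terms - 1:
            parts.append(rng.choice(BINARY_OPS))
    return " ".join(parts)


def random_pseudo_text(rng, max_words=6, max_word_len=8):
    """Return random lowercase 'words' that follow no natural-language statistics."""
    n_words = rng.randint(2, max_words)
    words = []
    for _ in range(n_words):
        length = rng.randint(1, max_word_len)
        words.append("".join(rng.choice(string.ascii_lowercase) for _ in range(length)))
    return " ".join(words)


def make_mixed_dataset(n, formula_ratio=0.5, seed=0):
    """Mix pseudo-formulas and pseudo-text labels; formula_ratio is an assumed knob."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        if rng.random() < formula_ratio:
            samples.append(random_pseudo_formula(rng))
        else:
            samples.append(random_pseudo_text(rng))
    return samples


if __name__ == "__main__":
    for label in make_mixed_dataset(5, seed=42):
        print(label)
```

In a full pipeline, each generated label would presumably be rendered to an image and paired with its string as a training example; the rendering step is omitted here.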

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.17148 [cs.CV]
	(or arXiv:2406.17148v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.17148

Submission history

From: Renqing Luo
[v1] Mon, 24 Jun 2024 21:38:36 UTC (111 KB)
