Representing text as abstract images enables image classifiers to also simultaneously classify text

Petrie, Stephen M.; Julius, T'Mir D.

arXiv:1908.07846 (cs)

[Submitted on 19 Aug 2019 (v1), last revised 6 Feb 2020 (this version, v3)]

Abstract: We introduce a novel method for converting text data into abstract image representations, which allows image-based processing techniques (e.g., image classification networks) to be applied to text-based comparison problems. We apply the technique to entity disambiguation of inventor names in US patents. The method involves converting text from each pairwise comparison between two inventor name records into a 2D RGB (stacked) image representation. We then train an image classification neural network to discriminate between such pairwise comparison images, and use the trained network to label each pair of records as either matched (same inventor) or non-matched (different inventors), obtaining highly accurate results. Our new text-to-image representation method could also be used more broadly for other NLP comparison problems, such as disambiguation of academic publications, or for problems that require simultaneous classification of both text and image datasets.
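
To make the idea of a pairwise "2D RGB (stacked) image" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the encoding used in the paper: the abstract does not specify the pixel layout, so this sketch assumes a 26 x 26 grid of letter bigrams, with the first record in the red channel, the second in the blue channel, and bigrams shared by both in the green channel. The names `bigram_grid` and `pair_image` are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: only the letters a-z are encoded
SIZE = len(ALPHABET)


def bigram_grid(text: str) -> list[list[int]]:
    """Return a SIZE x SIZE grid of 0/1 flags marking the letter bigrams in text.

    Non-letter characters are dropped before pairing, so bigrams can span
    word boundaries ("john smith" yields the bigram "ns").
    """
    letters = [c for c in text.lower() if c in ALPHABET]
    grid = [[0] * SIZE for _ in range(SIZE)]
    for first, second in zip(letters, letters[1:]):
        grid[ALPHABET.index(first)][ALPHABET.index(second)] = 1
    return grid


def pair_image(record_a: str, record_b: str) -> list[list[tuple[int, int, int]]]:
    """Stack two bigram grids into one RGB image, as rows of (R, G, B) pixels.

    Hypothetical channel assignment: red = record_a, blue = record_b,
    green = bigrams present in both records.
    """
    grid_a, grid_b = bigram_grid(record_a), bigram_grid(record_b)
    return [
        [(255 * a, 255 * (a & b), 255 * b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(grid_a, grid_b)
    ]


if __name__ == "__main__":
    image = pair_image("John Smith", "Jon Smith")
    j, o, h = (ALPHABET.index(c) for c in "joh")
    print(image[j][o])  # bigram "jo" occurs in both records
    print(image[o][h])  # bigram "oh" occurs only in the first record
```

In this sketch, a pair of near-identical names produces mostly white pixels (shared bigrams), while differing names produce more pure red or pure blue pixels. An image classifier trained on labelled pairs could then learn to separate matched from non-matched records from such patterns.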

Comments:	Minor changes in order to submit paper to a different conference (e.g. made minor changes to writing in several places and added extra data to Table 3 in order to make it clearer)
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:1908.07846 [cs.CL]
	(or arXiv:1908.07846v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1908.07846

Submission history

From: Stephen Petrie Dr
[v1] Mon, 19 Aug 2019 17:28:29 UTC (254 KB)
[v2] Fri, 27 Sep 2019 08:39:41 UTC (254 KB)
[v3] Thu, 6 Feb 2020 07:28:03 UTC (346 KB)
