Spot the Bot: Distinguishing Human-Written and Bot-Generated Texts Using Clustering and Information Theory Techniques

Gromov, Vasilii; Dang, Quynh Nhu

doi:10.1007/978-3-031-45170-6_3

arXiv:2311.11441 (cs)

[Submitted on 19 Nov 2023]

Authors: Vasilii Gromov, Quynh Nhu Dang

Abstract: With the development of generative models like GPT-3, it is increasingly challenging to differentiate generated texts from human-written ones. A large number of studies have demonstrated good results in bot identification. However, the majority of such works depend on supervised learning methods that require labelled data and/or prior knowledge about the bot-model architecture. In this work, we propose a bot identification algorithm that is based on unsupervised learning techniques and does not depend on a large amount of labelled data. By combining findings in semantic analysis by clustering (crisp and fuzzy) with information-theoretic techniques, we construct a robust model that detects generated text for different types of bots. We find that the generated texts tend to be more chaotic while literary works are more complex. We also demonstrate that the clustering of human texts results in fuzzier clusters in comparison to the more compact and well-separated clusters of bot-generated texts.
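The contrast between "fuzzier" and "compact, well-separated" clusters can be made concrete with a standard fuzziness index. The sketch below is only an illustration and is not taken from the paper: it assumes texts have already been mapped to numeric vectors and that cluster centroids are given, computes fuzzy c-means style memberships, and summarises them with Bezdek's partition coefficient (1.0 for a crisp partition, 1/c for a maximally fuzzy one with c clusters). The function names, the toy 2-D points, and the fuzzifier value m=2 are all hypothetical choices.

```python
import math


def fuzzy_memberships(points, centroids, m=2.0):
    """Fuzzy c-means memberships of each point in each cluster (centroids fixed).

    Uses u_ij = d_ij^(-2/(m-1)) / sum_k d_ik^(-2/(m-1)), with m > 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on a centroid: give it full membership
            # there (shared equally if several centroids coincide).
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        inv = [d ** -exponent for d in dists]
        total = sum(inv)
        memberships.append([v / total for v in inv])
    return memberships


def partition_coefficient(memberships):
    """Bezdek's partition coefficient: mean over points of sum_j u_ij^2."""
    return sum(u * u for row in memberships for u in row) / len(memberships)


# Toy data (hypothetical): two centroids, one compact set and one diffuse set.
centroids = [(0.0, 0.0), (10.0, 10.0)]
compact = [(0.0, 0.1), (0.1, 0.0), (10.0, 9.9), (9.9, 10.0)]
diffuse = [(4.0, 4.0), (5.0, 5.0), (6.0, 6.0), (5.0, 4.0)]

pc_compact = partition_coefficient(fuzzy_memberships(compact, centroids))
pc_diffuse = partition_coefficient(fuzzy_memberships(diffuse, centroids))
print(f"compact: {pc_compact:.3f}  diffuse: {pc_diffuse:.3f}")
```

Under these assumptions, a higher coefficient would correspond to the compact, well-separated clusters the abstract attributes to bot-generated texts, and a lower one to the fuzzier clusters attributed to human texts. The paper's actual features, clustering procedure, and decision rule are not specified in the abstract.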

Comments:	Accepted in Pattern Recognition and Machine Intelligence 2023. 8 pages, 3 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2311.11441 [cs.CL]
	(or arXiv:2311.11441v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.11441
Related DOI:	https://doi.org/10.1007/978-3-031-45170-6_3

Submission history

From: Quynh Nhu Dang
[v1] Sun, 19 Nov 2023 22:29:15 UTC (235 KB)
