Interpretable Multi Labeled Bengali Toxic Comments Classification using Deep Learning

Belal, Tanveer Ahmed; Shahariar, G. M.; Kabir, Md. Hasanul

doi:10.1109/ECCE57851.2023.10101588

arXiv:2304.04087 (cs)

[Submitted on 8 Apr 2023]

Authors: Tanveer Ahmed Belal, G. M. Shahariar, Md. Hasanul Kabir

Abstract: This paper presents a deep learning-based pipeline for categorizing Bengali toxic comments. A binary classification model first determines whether a comment is toxic, and a multi-label classifier then determines which toxicity types the comment belongs to. For this purpose, we prepared a manually labeled dataset of 16,073 instances, of which 8,488 are toxic; any toxic comment may belong to one or more of six toxic categories simultaneously: vulgar, hate, religious, threat, troll, and insult. Long Short Term Memory (LSTM) with BERT embedding achieved 89.42% accuracy on the binary classification task, while for multi-label classification a combination of Convolutional Neural Network and Bi-directional Long Short Term Memory (CNN-BiLSTM) with an attention mechanism achieved 78.92% accuracy and a weighted F1-score of 0.86. To explain the predictions and interpret word feature importance during classification by the proposed models, we used the Local Interpretable Model-Agnostic Explanations (LIME) framework. We have made our dataset public; it can be accessed at this https URL
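
The two-stage routing described in the abstract (binary toxic/non-toxic decision first, multi-label categorization only for toxic comments) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function `classify_comment`, the 0.5 decision threshold, and the toy stand-in models are all assumptions made for the example, and the real models (LSTM with BERT embedding, CNN-BiLSTM with attention) are replaced by plain callables.

```python
from typing import Callable, Dict, List, Sequence, Union

# The six toxicity categories named in the abstract.
TOXIC_LABELS: List[str] = ["vulgar", "hate", "religious", "threat", "troll", "insult"]

# Assumed interfaces: any callable with these shapes can be plugged in.
BinaryModel = Callable[[str], float]                # returns P(comment is toxic)
MultiLabelModel = Callable[[str], Sequence[float]]  # returns one probability per label


def classify_comment(
    text: str,
    binary_model: BinaryModel,
    multilabel_model: MultiLabelModel,
    threshold: float = 0.5,  # assumed cut-off; the abstract does not state one
) -> Dict[str, Union[bool, List[str]]]:
    """Stage 1 decides toxic vs. non-toxic; stage 2 runs only on toxic comments."""
    if binary_model(text) < threshold:
        return {"toxic": False, "labels": []}

    # Multi-label step: each category is thresholded independently,
    # so a single comment can receive several labels at once.
    probs = multilabel_model(text)
    labels = [name for name, p in zip(TOXIC_LABELS, probs) if p >= threshold]
    return {"toxic": True, "labels": labels}


# Hypothetical stand-in models, used only to exercise the routing logic.
def toy_binary(text: str) -> float:
    return 0.9 if "BAD" in text else 0.1


def toy_multilabel(text: str) -> List[float]:
    threat_score = 0.7 if "THREAT" in text else 0.1
    return [0.8, 0.2, 0.1, threat_score, 0.3, 0.6]


if __name__ == "__main__":
    print(classify_comment("a harmless comment", toy_binary, toy_multilabel))
    print(classify_comment("BAD THREAT comment", toy_binary, toy_multilabel))
```

Keeping the two stages behind simple callables means the multi-label model is never invoked for comments the first stage considers non-toxic, which mirrors the order of operations the abstract describes.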

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2304.04087 [cs.CL]
	(or arXiv:2304.04087v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2304.04087
Journal reference:	2023 International Conference on Electrical, Computer and Communication Engineering (ECCE)
Related DOI:	https://doi.org/10.1109/ECCE57851.2023.10101588

Submission history

From: Tanveer Ahmed Belal
[v1] Sat, 8 Apr 2023 19:28:26 UTC (246 KB)
