-
Offense Detection in Dravidian Languages using Code-Mixing Index based Focal Loss
Authors:
Debapriya Tula,
Shreyas MS,
Viswanatha Reddy,
Pranjal Sahu,
Sumanth Doddapaneni,
Prathyush Potluri,
Rohan Sukumaran,
Parth Patwa
Abstract:
Over the past decade, we have seen exponential growth in online content fueled by social media platforms. Data generation at this scale comes with the caveat of an overwhelming amount of offensive content. The complexity of identifying offensive content is exacerbated by the use of multiple modalities (image, language, etc.), code-mixed language, and more. Moreover, even after careful sampling and annotation of offensive content, there will always be a significant class imbalance between offensive and non-offensive content. In this paper, we introduce a novel Code-Mixing Index (CMI) based focal loss which circumvents two challenges for Dravidian language offense detection: (1) code-mixing in languages and (2) class imbalance. We also replace the conventional dot product-based classifier with a cosine-based classifier, which results in a boost in performance. Further, we use multilingual models that help transfer characteristics learnt across languages to work effectively with low-resource languages. It is also important to note that our model handles instances of mixed script (say, usage of Latin and Dravidian-Tamil script) as well. To summarize, our model can handle offensive language detection in a low-resource, class-imbalanced, multilingual, and code-mixed setting.
Submitted 6 May, 2022; v1 submitted 12 November, 2021;
originally announced November 2021.
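The ingredients named in the abstract above can be sketched in isolation. This is a minimal illustration, not the authors' implementation: the abstract does not say how the CMI enters the loss, so `cmi_weighted_focal_loss` below is a purely hypothetical combination. The CMI follows the commonly used definition based on per-token language tags, the focal loss is the standard form, and all function names, the `"univ"` tag for language-independent tokens, and the default `gamma` and `scale` values are assumptions made for this example.

```python
import math


def code_mixing_index(lang_tags, neutral="univ"):
    """CMI of one utterance from per-token language tags.

    Tokens tagged `neutral` (an assumed tag name) are language independent
    and are excluded from the denominator.
    """
    counts = {}
    for tag in lang_tags:
        if tag != neutral:
            counts[tag] = counts.get(tag, 0) + 1
    if not counts:  # no language-specific tokens at all
        return 0.0
    return 100.0 * (1.0 - max(counts.values()) / sum(counts.values()))


def focal_loss(p_true, gamma=2.0):
    """Standard focal loss for the probability assigned to the true class."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def cosine_logit(features, class_weights, scale=10.0):
    """Scaled cosine similarity, used in place of a raw dot product."""
    dot = sum(a * b for a, b in zip(features, class_weights))
    norm = math.sqrt(sum(a * a for a in features)) * math.sqrt(
        sum(b * b for b in class_weights)
    )
    return scale * dot / norm


def cmi_weighted_focal_loss(p_true, cmi, gamma=2.0):
    """HYPOTHETICAL: scale the focal loss up for more heavily mixed inputs."""
    return (1.0 + cmi / 100.0) * focal_loss(p_true, gamma)
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy; larger values shrink the contribution of examples the model already classifies confidently, which is why it is a common choice under class imbalance.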
-
Two Stage Transformer Model for COVID-19 Fake News Detection and Fact Checking
Authors:
Rutvik Vijjali,
Prathyush Potluri,
Siddharth Kumar,
Sundeep Teki
Abstract:
The rapid advancement of technology in online communication via social media platforms has led to a prolific rise in the spread of misinformation and fake news. Fake news is especially rampant in the current COVID-19 pandemic, leading people to believe false and potentially harmful claims and stories. Detecting fake news quickly can alleviate the spread of panic, chaos, and potential health hazards. We developed a two-stage automated pipeline for COVID-19 fake news detection using state-of-the-art machine learning models for natural language processing. The first model leverages a novel fact-checking algorithm that retrieves the most relevant facts concerning users' COVID-19 claims. The second model verifies the level of truth in the claim by computing the textual entailment between the claim and the true facts retrieved from a manually curated COVID-19 dataset. The dataset is based on a publicly available knowledge source consisting of more than 5000 COVID-19 false claims and verified explanations, a subset of which was internally annotated and cross-validated to train and evaluate our models. We evaluate a series of models, ranging from those based on classical text-based features to more contextual Transformer-based models, and observe that a model pipeline based on BERT and ALBERT for the two stages respectively yields the best results.
Submitted 26 November, 2020;
originally announced November 2020.
-
Training of CC4 Neural Network with Spread Unary Coding
Authors:
Pushpa Sree Potluri
Abstract:
This paper adapts the corner classification algorithm (CC4) to train neural networks using spread unary inputs. This is an important problem, as spread unary appears to be at the basis of data representation in biological learning. The modified CC4 algorithm is tested using a pattern classification experiment and the results are found to be good. Specifically, we show that the number of misclassified points is not particularly sensitive to the chosen radius of generalization.
Submitted 3 September, 2015;
originally announced September 2015.
-
Error Correction Capacity of Unary Coding
Authors:
Pushpa Sree Potluri
Abstract:
Unary coding has found applications in data compression, neural network training, and in explaining the production mechanism of birdsong. Unary coding is redundant; therefore it should have inherent error correction capacity. An expression for the error correction capability of unary coding for the correction of single errors has been derived in this paper.
Submitted 26 November, 2014;
originally announced November 2014.
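The redundancy argument in the abstract above can be explored by brute force. The sketch below does not reproduce the paper's derived expression. It assumes a fixed-length unary convention (the value n is written as n ones right-aligned in a word of `length` bits) and nearest-codeword decoding by Hamming distance; both the convention and the function names are assumptions made for illustration. It counts how many single-bit flips of a codeword still decode uniquely to the original value.

```python
def encode(n, length):
    """Fixed-length unary: n ones right-aligned in a word of `length` bits."""
    if not 0 <= n <= length:
        raise ValueError("n must lie between 0 and length")
    return [0] * (length - n) + [1] * n


def nearest_values(word):
    """All values whose codewords are closest to `word` in Hamming distance."""
    length = len(word)
    dists = [
        sum(a != b for a, b in zip(word, encode(v, length)))
        for v in range(length + 1)
    ]
    best = min(dists)
    return [v for v, d in enumerate(dists) if d == best]


def corrected_single_errors(n, length):
    """Count single-bit flips of encode(n) that decode uniquely back to n."""
    count = 0
    for i in range(length):
        word = encode(n, length)
        word[i] ^= 1  # inject one bit error at position i
        if nearest_values(word) == [n]:
            count += 1
    return count
```

Under these assumptions, a flip next to the boundary between zeros and ones produces another valid codeword and goes undetected, while flips farther from the boundary tend to leave the original value as the unique nearest codeword. Calling `corrected_single_errors(n, length)` for each n gives an empirical count to set beside any closed-form expression.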