Search | arXiv e-print repository

Identifying and Categorizing Offensive Language in Social Media

Abstract: Offensive language is pervasive in social media. Individuals frequently take advantage of the perceived anonymity of computer-mediated communication, using this to engage in behavior that many of them would not consider in real life. The automatic identification of offensive content online is an important task that has gained more attention in recent years. This task can be modeled as a supervised classification problem in which systems are trained using a dataset containing posts that are annotated with respect to the presence of some form(s) of abusive or offensive content. The objective of this study is to provide a description of a classification system built for SemEval-2019 Task 6: OffensEval. This system classifies a tweet as either offensive or not offensive (Sub-task A) and further classifies offensive tweets into categories (Sub-tasks B & C). We trained machine learning and deep learning models along with data preprocessing and sampling techniques to come up with the best results. Models discussed include Naive Bayes, SVM, Logistic Regression, Random Forest and LSTM.

Submitted 10 April, 2021; originally announced April 2021.

arXiv:1911.01217 [pdf]

Detect Toxic Content to Improve Online Conversations

Authors: Deepshi Mediratta, Nikhil Oswal

Abstract: Social media is filled with toxic content. The aim of this paper is to build a model that can detect insincere questions. We use the 'Quora Insincere Questions Classification' dataset for our analysis. The dataset is composed of sincere and insincere questions, with sincere questions forming the majority. The dataset is processed and analyzed using Python and its libraries such as sklearn, numpy, pandas, keras, etc. The dataset is converted to vector form using word embeddings such as GloVe, Wiki-news and TF-IDF. The imbalance in the dataset is handled by resampling techniques. We train and compare various machine learning and deep learning models to come up with the best results. Models discussed include SVM, Naive Bayes, GRU and LSTM.

Submitted 28 October, 2019; originally announced November 2019.

arXiv:1910.13827 [pdf, other]

Predicting Rainfall using Machine Learning Techniques

Authors: Nikhil Oswal

Abstract: Rainfall prediction is one of the challenging and uncertain tasks which has a significant impact on human society. Timely and accurate predictions can help to proactively reduce human and financial loss. This study presents a set of experiments which involve the use of prevalent machine learning techniques to build models to predict whether it is going to rain tomorrow or not based on weather data for that particular day in major cities of Australia. This comparative study is conducted concentrating on three aspects: modeling inputs, modeling methods, and pre-processing techniques. The results provide a comparison of various evaluation metrics of these machine learning techniques and their reliability to predict the rainfall by analyzing the weather data.

Submitted 28 October, 2019; originally announced October 2019.

arXiv:1910.12816 [pdf, other]

Technical Debt: Identify, Measure and Monitor

Authors: Nikhil Oswal

Abstract: Technical Debt is a term coined by Ward Cunningham to signify the amount of adjustment required to put software into the state it ought to have had from the start. Organizations often need to support continuous and fast delivery of customer value from both a short-term and a long-term perspective, and later have to compromise on the quality and productivity of the software. A simple solution could be to repay the debts as and when they are encountered, to avoid maintainability costs and subsequent delays. It has therefore become necessary to devise techniques for deciding which TD items to repay, when, and how. This study aims to explore how to identify, measure and monitor technical debt using SonarQube and PMD.

Submitted 28 October, 2019; originally announced October 2019.

Showing 1–4 of 4 results for author: Oswal, N