-
Multi Class Depression Detection Through Tweets using Artificial Intelligence
Authors:
Muhammad Osama Nusrat,
Waseem Shahzad,
Saad Ahmed Jamal
Abstract:
Depression is a significant issue nowadays. As per the World Health Organization (WHO), in 2023, over 280 million individuals are grappling with depression. This is a huge number; if not taken seriously, it will increase rapidly. About 4.89 billion individuals are social media users. People express their feelings and emotions on platforms like Twitter, Facebook, Reddit, Instagram, etc. These platforms contain valuable information which can be used for research purposes. Considerable research has been conducted across various social media platforms. However, certain limitations persist in these endeavors. In particular, previous studies focused only on detecting depression and the intensity of depression in tweets. There were also inaccuracies in dataset labeling. In this research work, five types of depression (bipolar, major, psychotic, atypical, and postpartum) were predicted using tweets from the Twitter database based on lexicon labeling. Explainable AI was used to provide reasoning by highlighting the parts of tweets that represent the type of depression. Bidirectional Encoder Representations from Transformers (BERT) was used for feature extraction and training. Machine learning and deep learning methodologies were used to train the model. The BERT model presented the most promising results, achieving an overall accuracy of 0.96.
Submitted 19 April, 2024;
originally announced April 2024.
-
Hierarchical Text Classification of Urdu News using Deep Neural Network
Authors:
Taimoor Ahmed Javed,
Waseem Shahzad,
Umair Arshad
Abstract:
Digital text is increasing day by day on the internet. It is very challenging to classify a large and heterogeneous collection of data, which requires improved information processing methods to organize text. To classify a large corpus, one common approach is to use hierarchical text classification, which aims to classify textual data in a hierarchical structure. Several approaches have been proposed to tackle classification of text, but most of the research has been done on the English language. This paper proposes a deep learning model for hierarchical text classification of news in the Urdu language, using a dataset of 51,325 sentences from 8 online news websites belonging to the following genres: Sports, Technology, and Entertainment. The objectives of this paper are twofold: (1) to develop a large human-annotated dataset of news in the Urdu language for hierarchical text classification; and (2) to classify Urdu news hierarchically using our proposed LSTM-based model, named Hierarchical Multi-layer LSTMs (HMLSTM). Our model consists of two modules: a Text Representing Layer, which uses Word2vec embedding to transform words into vectors and obtain a text representation, and an Urdu Hierarchical LSTM Layer (UHLSTML), an end-to-end fully connected deep LSTM network that performs automatic feature learning. We train one LSTM layer for each level of the class hierarchy. We have performed extensive experiments on our self-created dataset, named Urdu News Dataset for Hierarchical Text Classification (UNDHTC). The results show that our proposed method is very effective for hierarchical text classification, significantly outperforms baseline methods, and also achieves good results compared to deep neural models.
Submitted 7 July, 2021;
originally announced July 2021.
-
Co-occurrences using Fasttext embeddings for word similarity tasks in Urdu
Authors:
Usama Khalid,
Aizaz Hussain,
Muhammad Umair Arshad,
Waseem Shahzad,
Mirza Omer Beg
Abstract:
Urdu is a widely spoken language in South Asia. Though a considerable amount of literature exists for the Urdu language, the data is still not enough to process the language with NLP techniques. Very efficient language models exist for the English language, a high-resource language, but Urdu and other under-resourced languages have been neglected for a long time. To create efficient language models for these languages, we must have good word embedding models. For Urdu, we can only find word embeddings trained and developed using the skip-gram model. In this paper, we have built a corpus for Urdu by scraping and integrating data from various sources and compiled a vocabulary for the Urdu language. We also modify fasttext embeddings and N-Grams models to enable training them on our built corpus. We have used these trained embeddings for a word similarity task and compared the results with existing techniques.
Submitted 22 February, 2021;
originally announced February 2021.
-
NUBOT: Embedded Knowledge Graph With RASA Framework for Generating Semantic Intents Responses in Roman Urdu
Authors:
Johar Shabbir,
Muhammad Umair Arshad,
Waseem Shahzad
Abstract:
The understanding of human language is quantified by identifying intents and entities. Even though classification methods that rely on labeled information are often used for language understanding, generating high propensity supervised datasets is an incredibly time-consuming and tedious process. In this paper, we present the generation of accurate intents for the corresponding Roman Urdu unstructured data and integrate this corpus in the RASA NLU module for intent classification. We embed a knowledge graph with the RASA Framework to maintain the dialog history for a semantic-based natural language mechanism for chatbot communication. We compare the results of our work with existing linguistic systems combined with semantic technologies. The minimum accuracy of intent generation is 64 percent confidence; in the response generation part, the minimum accuracy is 82.1 percent and the maximum accuracy gain is 96.7 percent. All the scores refer to log precision, recall, and F1 measure for each intent, summarized over all intents. Furthermore, a confusion matrix shows which intents are ambiguously recognized by the approach.
Submitted 20 February, 2021;
originally announced February 2021.
-
A Diverse Clustering Particle Swarm Optimizer for Dynamic Environment: To Locate and Track Multiple Optima
Authors:
Zahid Iqbal,
Waseem Shahzad
Abstract:
In real life, most problems are dynamic. Many algorithms have been proposed to handle static problems, but these algorithms handle dynamic environment problems poorly or not at all. Although many algorithms have been proposed to handle dynamic problems, every algorithm still has some limitations or drawbacks regarding diversity of particles and tracking of already found optima. To overcome these limitations, we have proposed a new efficient algorithm that handles the dynamic environment effectively by tracking and locating multiple optima and by improving the diversity and convergence speed of the algorithm. In this algorithm, a new method has been proposed which explores the undiscovered areas of the search space to increase the diversity of the algorithm. This algorithm also uses a method to effectively handle overlapped and overcrowded particles. Branke proposed the Moving Peak Benchmark (MPB), which is commonly used in the literature. We have also performed different experiments on the Moving Peak Benchmark. After comparing the experimental results with different state-of-the-art algorithms, it was seen that our algorithm performed more efficiently.
Submitted 19 May, 2020;
originally announced May 2020.
-
A comparative study of machine learning techniques used in non-clinical systems for continuous healthcare of independent livings
Authors:
Zahid Iqbal,
Rafia Ilyas,
Waseem Shahzad,
Irum Inayat
Abstract:
New technologies are being adopted to make progress in healthcare, especially for independent living. Medication at a distance is driving the integration of technology with medicine. Machine learning methods in collaboration with wearable sensor network technology are used to find hidden patterns in data, detect patient movements, observe patient habits, analyze patient clinical data, infer patient intentions, and make decisions on the basis of the gathered data. This research performs a comparative study of non-clinical systems in healthcare for independent living. In this study, these systems are sub-divided according to how they work into two types: single purpose systems and multi-purpose systems. Systems that are built for a single specific purpose (e.g. detecting a fall, detecting an emergent state of a chronic disease patient) and cannot support healthcare generically are known as single purpose systems, whereas multi-purpose systems are built to serve multiple problems (e.g. heart attack etc.) with a single system. This study analyzes the use of machine learning techniques in healthcare systems for independent living. Answer Set Programming (ASP), Artificial Neural Networks (ANN), Classification, Sampling, and Rule Based Reasoning are some state-of-the-art techniques used to determine emergent situations and observe changes in patient data. Among all methods, ASP logic is the most widely used, owing to its ability to deal with incomplete data. It is also observed that systems using ANN show better accuracy than other systems, and that most of the systems created are single purpose. In this work, 10 single purpose systems and 5 multi-purpose systems are studied. There is a need to create more generic systems that can be used for patients with multiple diseases. Also, most of the systems created are prototypes. There is a need to create systems that can deliver healthcare services in the real world.
Submitted 19 May, 2020;
originally announced May 2020.
-
Scaling up for high dimensional and high speed data streams: HSDStream
Authors:
Irshad Ahmed,
Irfan Ahmed,
Waseem Shahzad
Abstract:
This paper presents a novel high speed clustering scheme for high dimensional data streams. Data stream clustering has gained importance in different applications, for example, in network monitoring, intrusion detection, and real-time sensing. High dimensional stream data is inherently more complex when used for clustering because the evolving nature of the stream data and high dimensionality make it non-trivial. In order to tackle this problem, a projected subspace within the high dimensions and limited window-sized data per unit of time are used for clustering. We propose a High Speed and Dimensions data stream clustering scheme (HSDStream) which employs exponential moving averages to reduce the size of the memory and speed up the processing of the projected subspace data stream. The proposed algorithm has been tested against HDDStream for cluster purity, memory usage, and cluster sensitivity. Experimental results have been obtained for the corrected KDD intrusion detection dataset. These results show that HSDStream outperforms HDDStream in all performance metrics, especially memory usage and processing speed.
Submitted 12 October, 2015;
originally announced October 2015.