-
N-gram Statistical Stemmer for Bangla Corpus
Authors:
Rabeya Sadia,
Md Ataur Rahman,
Md Hanif Seddiqui
Abstract:
Stemming is the process of reducing inflected words to their stem or root form. It improves retrieval effectiveness, especially in text search, by resolving word-form mismatches. Previous research on Bangla stemming mostly relied on removing multiple suffixes from a single word through a recursive rule-based procedure to recover a progressively more applicable root. Our proposed system extends this line of work by implementing a stemming algorithm called N-gram stemming. Using an association measure called the Dice coefficient, related sets of words are clustered according to their character structure. The shortest word in a cluster may be considered the stem. We additionally analyzed Affinity Propagation clustering algorithms with coefficient similarity as well as with median similarity. Our results indicate that N-gram stemming techniques are effective in general, yielding around 87% accurate clusters.
Submitted 25 December, 2019;
originally announced December 2019.
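The clustering idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`char_ngrams`, `dice`, `cluster_words`, `pick_stems`), the use of character bigrams, the greedy grouping strategy, and the 0.6 threshold are all assumptions chosen for the example. Only the Dice coefficient itself, 2|A∩B| / (|A| + |B|) over the two words' n-gram sets, and the "shortest word in a cluster is the stem" rule come from the abstract.

```python
def char_ngrams(word, n=2):
    """Return the set of unique character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient between the n-gram sets of two words."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 0.0  # both words shorter than n: treat as unrelated
    return 2 * len(x & y) / (len(x) + len(y))


def cluster_words(words, threshold=0.6, n=2):
    """Greedy grouping (an assumption, not the paper's procedure):
    a word joins the first cluster whose members are all similar enough."""
    clusters = []
    for w in words:
        for c in clusters:
            if all(dice(w, m, n) >= threshold for m in c):
                c.append(w)
                break
        else:
            clusters.append([w])
    return clusters


def pick_stems(clusters):
    """Map every word to the shortest word of its cluster."""
    return {w: min(c, key=len) for c in clusters for w in c}


if __name__ == "__main__":
    sample = ["computer", "computing", "compute", "walk", "walking"]
    print(pick_stems(cluster_words(sample)))
```

English words are used here only to keep the example easy to check by hand; the same functions operate on Bangla strings, since Python slices strings by Unicode code point, although combining characters may call for a different tokenization.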
-
Comparison of Classical Machine Learning Approaches on Bangla Textual Emotion Analysis
Authors:
Md. Ataur Rahman,
Md. Hanif Seddiqui
Abstract:
Detecting emotions from text is an extension of simple sentiment polarity detection. Rather than being limited to positive or negative sentiment, emotions are conveyed in a more tangible manner; thus, they can be expressed as many shades of gray. This paper presents the results of our experiments on fine-grained emotion analysis of Bangla text. We gathered and annotated a text corpus consisting of user comments from several Facebook groups regarding socio-economic and political issues, and we attempted to extract the basic emotions (sadness, happiness, disgust, surprise, fear, anger) conveyed through these comments. Finally, we compared the results of the five most popular classical machine learning techniques, namely Naive Bayes, Decision Tree, k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and K-Means Clustering, with several combinations of features. Our best model (SVM with a non-linear radial-basis function (RBF) kernel) achieved an overall average accuracy score of 52.98% and an F1 score (macro) of 0.3324.
Submitted 17 July, 2019;
originally announced July 2019.
-
Review on Telemonitoring of Maternal Health care Targeting Medical Cyber-Physical Systems
Authors:
Mohammod Abul Kashem,
Md. Hanif Seddiqui,
Nejib Moalla,
Aicha Sekhari,
Yacine Ouzrout
Abstract:
We aim to review the available literature on the telemonitoring of maternal health care in order to gain a comprehensive understanding of the role of Medical Cyber-Physical Systems (MCPS) as a cutting-edge technology in maternal risk factor management, and to identify possible research gaps in the domain. To this end, we searched the Google Scholar and PubMed databases for published studies that focus on maternal telemonitoring systems using sensors, Cyber-Physical Systems (CPS), and information decision systems for addressing risk factors. We extracted 1340 articles relevant to maternal health care that address different risk factors as managerial issues. From this large set of relevant articles, we included 26 prospective studies relating to sensor-based or MCPS-based maternal telemonitoring. Of the 1340 primary articles, we short-listed 26 articles (12 articles for risk factor analysis, 9 for synthesis matrices, and 5 for finding essential elements). We extracted 17 vital symptoms as maternal risk factors during pregnancy. Moreover, we identified a number of cyber-frameworks that serve as the basis of information decision support systems for coping with the different maternal complexities. We found MCPS to be a promising technology that lets a care provider manage the vital risk factors quickly and efficiently from a distant place to reduce fatal risks. Despite communication issues, MCPS is a key enabling technology for advancing the telemonitoring paradigm in the maternal health care system.
Submitted 26 April, 2016;
originally announced July 2016.
-
An Efficient Metric of Automatic Weight Generation for Properties in Instance Matching Technique
Authors:
Md. Hanif Seddiqui,
Rudra Pratap Deb Nath,
Masaki Aono
Abstract:
The proliferation of heterogeneous data sources in semantic knowledge bases intensifies the need for an automatic instance matching technique. However, the efficiency of instance matching is often influenced by the weight of a property associated with instances. Automatic weight generation is a non-trivial but important task in instance matching, and identifying an appropriate metric for generating the weight of a property automatically remains a formidable task. In this paper, we investigate an approach to generating weights automatically based on two hypotheses: (1) the weight of a property is directly proportional to the ratio of the number of its distinct values to the number of instances that contain the property, and (2) the weight is also proportional to the ratio of the number of distinct values of a property to the number of instances in a training dataset. The basic intuition behind our approach is the classical theory of information content, which holds that infrequent words are more informative than frequent ones. Our mathematical model derives a metric for generating property weights automatically, which is applied in an instance matching system to produce reconciled instances efficiently. Our experiments and evaluations show the effectiveness of our proposed metric of automatic weight generation for properties in an instance matching technique.
Submitted 12 February, 2015;
originally announced February 2015.
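The two ratios named in hypotheses (1) and (2) of the abstract above can be computed as in the sketch below. The abstract does not state the final metric, so the way the ratios are combined here (a plain product in `property_weights`) is purely a hypothetical stand-in, as are the function names and the representation of instances as Python dictionaries.

```python
def property_ratios(instances):
    """For each property, return the two ratios from the hypotheses:
    (distinct values / instances containing the property,
     distinct values / all instances in the dataset).

    `instances` is a list of dicts mapping property name -> value
    (an assumed representation, chosen for illustration).
    """
    total = len(instances)
    distinct = {}    # property -> set of distinct values seen
    containing = {}  # property -> number of instances that have it
    for inst in instances:
        for prop, value in inst.items():
            distinct.setdefault(prop, set()).add(value)
            containing[prop] = containing.get(prop, 0) + 1
    return {
        p: (len(distinct[p]) / containing[p], len(distinct[p]) / total)
        for p in distinct
    }


def property_weights(instances):
    """Hypothetical combination: multiply the two ratios.
    The paper's actual metric may differ."""
    return {p: r1 * r2 for p, (r1, r2) in property_ratios(instances).items()}


if __name__ == "__main__":
    data = [
        {"name": "a", "country": "BD"},
        {"name": "b", "country": "BD"},
        {"name": "c"},
        {"name": "d", "country": "JP"},
    ]
    print(property_weights(data))
```

In this toy dataset, `name` takes a different value in every instance and so receives a higher weight than `country`, which repeats; this matches the information-content intuition that rarer values are more discriminative.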