-
Borhan: A Novel System for Prioritized Default Logic
Authors:
Alireza Shahbazi,
Mohammad Hossein Khojasteh,
Behrouz Minaei-Bidgoli
Abstract:
Prioritized Default Logic offers an effective framework for real-world problems characterized by incomplete information and the need to establish preferences among diverse scenarios. Although it has been highly successful on the theoretical side, its practical implementation has received less attention. In this article, we introduce Borhan, a system designed for prioritized default logic reasoning. To build an effective system, we refined existing default logic definitions, including the concept of an extension, and introduced novel concepts. Beyond its theoretical merits, Borhan demonstrates practical utility by efficiently solving a range of prioritized default logic problems. A further advantage of our system is its ability to store and report the explanation path for any inferred triple, enhancing transparency and interpretability. Borhan is open source, implemented in Python, and also offers a simplified Java version as a plugin for the Protégé ontology editor. Borhan thus represents a significant step toward bridging the gap between the theoretical foundations of default logic and its real-world applications.
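To make the idea of a prioritized extension concrete, here is a minimal Python sketch; it is not Borhan's implementation. It assumes normal defaults whose prerequisites and conclusions are single literals (a leading `~` marks negation), a total priority order given by list position, and a simple greedy construction in which the highest-priority applicable, unblocked default is applied first. The `Default` class and the function names are hypothetical. Each derived literal is stored with a one-step explanation, from which a full explanation path can be recovered by following prerequisites backwards.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Default:
    """Hypothetical normal default 'prerequisite : conclusion / conclusion' over literals."""
    name: str
    prerequisite: str
    conclusion: str


def negate(literal: str) -> str:
    """Complement of a literal; '~' marks negation."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def prioritized_extension(facts: Iterable[str], defaults: List[Default]) -> Dict[str, str]:
    """Greedy extension; `defaults` is ordered from highest to lowest priority.

    Returns every derived literal mapped to a one-step explanation.
    """
    derived: Dict[str, str] = {lit: "given fact" for lit in facts}
    applied: Set[str] = set()
    progress = True
    while progress:
        progress = False
        for d in defaults:  # scan in priority order
            if d.name in applied or d.prerequisite not in derived:
                continue
            if negate(d.conclusion) in derived:
                continue  # blocked: the conclusion contradicts what is already known
            derived.setdefault(d.conclusion, f"{d.name} applied to {d.prerequisite}")
            applied.add(d.name)
            progress = True
            break  # rescan from the top so higher-priority defaults act first
    return derived


facts = ["bird", "penguin"]
defaults = [
    Default("penguins_dont_fly", "penguin", "~flies"),  # higher priority
    Default("birds_fly", "bird", "flies"),
]
print(prioritized_extension(facts, defaults))
# {'bird': 'given fact', 'penguin': 'given fact', '~flies': 'penguins_dont_fly applied to penguin'}
```

In this Tweety-style example, giving `penguins_dont_fly` higher priority yields `~flies` and blocks `birds_fly`; reversing the list order yields `flies` instead.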
Submitted 27 October, 2023;
originally announced October 2023.
-
Emulating the Human Mind: A Neural-symbolic Link Prediction Model with Fast and Slow Reasoning and Filtered Rules
Authors:
Mohammad Hossein Khojasteh,
Najmeh Torabian,
Ali Farjami,
Saeid Hosseini,
Behrouz Minaei-Bidgoli
Abstract:
Link prediction is an important task in addressing the incompleteness of knowledge graphs (KGs). Previous link prediction models suffer from shortcomings in either performance or explanatory capability. Furthermore, models that can generate explanations often rely on erroneous paths or reasoning even when they reach the correct answer. To address these challenges, we introduce a novel neural-symbolic model named FaSt-FLiP (short for Fast and Slow Thinking with Filtered rules for Link Prediction), inspired by two distinct aspects of human cognition: "commonsense reasoning" and "thinking, fast and slow." Our objective is to combine a logical model and a neural model for improved link prediction. To tackle incorrect paths or rules generated by the logical model, we propose a semi-supervised method to convert rules into sentences. These sentences are then assessed by an NLI (Natural Language Inference) model, and incorrect rules are removed. To combine the two models, we first obtain answers from both the logical and the neural model. These answers are then unified by an Inference Engine module, which we realize both as an algorithm and as a novel neural architecture. To validate the efficacy of our model, we conducted a series of experiments. The results demonstrate the superior performance of our model on link prediction metrics and its generation of more reliable explanations.
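The rule-filtering step can be pictured as turning each Horn rule into an NLI premise/hypothesis pair and discarding rules whose body does not entail their head. The Python sketch below is an illustration under stated assumptions, not the FaSt-FLiP pipeline: it replaces the paper's semi-supervised rule-to-sentence conversion with a hypothetical fixed template, and `entailment_prob` is a placeholder for any NLI model that returns an entailment probability. The 0.5 threshold is likewise an illustrative choice.

```python
from typing import Callable, List, Tuple

Atom = Tuple[str, str, str]     # (relation, subject variable, object variable)
Rule = Tuple[List[Atom], Atom]  # (body atoms, head atom)


def verbalize(atom: Atom) -> str:
    """Hypothetical template: ('was_born_in', 'X', 'Y') -> 'X was born in Y.'"""
    relation, subj, obj = atom
    return f"{subj} {relation.replace('_', ' ')} {obj}."


def rule_to_nli_pair(rule: Rule) -> Tuple[str, str]:
    """Body atoms become the premise; the head atom becomes the hypothesis."""
    body, head = rule
    return " ".join(verbalize(a) for a in body), verbalize(head)


def filter_rules(rules: List[Rule],
                 entailment_prob: Callable[[str, str], float],
                 threshold: float = 0.5) -> List[Rule]:
    """Keep rules whose verbalized body entails the verbalized head.

    `entailment_prob(premise, hypothesis)` is a placeholder for an NLI model.
    """
    return [rule for rule in rules
            if entailment_prob(*rule_to_nli_pair(rule)) >= threshold]


rule = ([("was_born_in", "X", "Y"), ("is_located_in", "Y", "Z")],
        ("is_a_citizen_of", "X", "Z"))
print(rule_to_nli_pair(rule))
# ('X was born in Y. Y is located in Z.', 'X is a citizen of Z.')
```

Keeping the NLI model behind a plain callable makes it easy to swap in a real model or a stub for testing.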
Submitted 21 October, 2023;
originally announced October 2023.
-
Noor-Ghateh: A Benchmark Dataset for Evaluating Arabic Word Segmenters in Hadith Domain
Authors:
Huda AlShuhayeb,
Behrouz Minaei-Bidgoli,
Mohammad E. Shenassa,
Sayyed-Ali Hossayni
Abstract:
The Arabic language has many complex and rich morphological subtleties, which are very useful when analyzing traditional Arabic texts, especially in historical and religious contexts, and which help in understanding the meaning of those texts. Word segmentation means separating a word into its constituent parts, such as the root and affixes. In morphological datasets, both the variety of labels and the number of samples help in evaluating morphological methods. In this paper, we present a benchmark dataset for evaluating Arabic word segmentation methods, comprising about 223,690 words from the book Sharia alIslam, labeled by experts. In terms of the volume and variety of words, this dataset is superior to other existing datasets, and, as far as we know, no comparable dataset exists for Arabic Hadith-domain texts. To evaluate the dataset, we applied different methods, such as Farasa, Camel, Madamira, and ALP, and we report the annotation quality using four evaluation methods.
Submitted 22 June, 2023;
originally announced July 2023.
-
Farspredict: A benchmark dataset for link prediction
Authors:
Najmeh Torabian,
Behrouz Minaei-Bidgoli,
Mohsen Jahanshahi
Abstract:
Link prediction with knowledge graph embedding (KGE) is a popular method for knowledge graph completion. Furthermore, training KGEs on non-English knowledge graphs promotes knowledge extraction and knowledge graph reasoning in those languages. However, non-English knowledge graphs pose many challenges for learning low-dimensional representations of their entities and relations. This paper proposes "Farspredict," a Persian knowledge graph based on FarsBase (the most comprehensive knowledge graph in Persian). It also explains how the knowledge graph structure affects link prediction accuracy in KGE. To evaluate Farspredict, we implemented popular KGE models on it and compared the results with those obtained on Freebase. Based on this analysis, we carried out several optimizations on the knowledge graph to improve its suitability for KGE, resulting in a new Persian knowledge graph. In many cases, the KGE models achieve better results on Farspredict than on Freebase. Finally, we discuss which improvements could further enhance the quality of Farspredict and by how much.
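As a reference point for the KGE models mentioned above, the following Python sketch shows the TransE scoring function, score(h, r, t) = -||h + r - t||_p, together with raw (unfiltered) tail ranking and the MR, MRR, and Hits@k metrics commonly reported for link prediction. It is an illustrative sketch with hand-made two-dimensional embeddings, not the evaluation code used for Farspredict; real evaluations use learned embeddings and usually the filtered setting.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def transe_score(h: Vector, r: Vector, t: Vector, p: int = 1) -> float:
    """TransE plausibility -||h + r - t||_p; higher means more plausible."""
    return -sum(abs(hi + ri - ti) ** p for hi, ri, ti in zip(h, r, t)) ** (1 / p)


def tail_rank(h: Vector, r: Vector, true_tail: str, entities: Dict[str, Vector]) -> int:
    """Raw rank of the true tail: 1 + number of other entities scoring strictly higher."""
    true_score = transe_score(h, r, entities[true_tail])
    return 1 + sum(1 for name, e in entities.items()
                   if name != true_tail and transe_score(h, r, e) > true_score)


def link_prediction_metrics(ranks: List[int], k: int = 10) -> Tuple[float, float, float]:
    """Return (MR, MRR, Hits@k) for a list of ranks."""
    n = len(ranks)
    mr = sum(ranks) / n
    mrr = sum(1 / rank for rank in ranks) / n
    hits = sum(1 for rank in ranks if rank <= k) / n
    return mr, mrr, hits


# Toy embeddings: with h = (0, 0) and r = (1, 0), entity "a" = (1, 0) is a perfect tail.
entities = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 0.0]}
h, r = [0.0, 0.0], [1.0, 0.0]
ranks = [tail_rank(h, r, t, entities) for t in ("a", "c", "b")]
print(ranks)  # [1, 2, 3]
print(link_prediction_metrics(ranks, k=2))
```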
Submitted 26 March, 2023;
originally announced March 2023.
-
Evaluating Persian Tokenizers
Authors:
Danial Kamali,
Behrooz Janfada,
Mohammad Ebrahim Shenasa,
Behrouz Minaei-Bidgoli
Abstract:
Tokenization plays a significant role in lexical analysis. Tokens become the input for other natural language processing tasks, such as semantic parsing and language modeling. Natural language processing in Persian is challenging due to Persian-specific cases such as half-spaces (zero-width non-joiners). Thus, a precise tokenizer for Persian is crucial. This article introduces the most widely used tokenizers for Persian and compares and evaluates their performance on Persian texts, using a simple algorithm and a pre-tagged Persian dependency dataset. Evaluated by F1-score, a hybrid of Farsi Verb and Hazm with bound-morpheme fixing showed the best performance, with an F1 score of 98.97%.
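One simple way to score a tokenizer against gold tokens is to map both token sequences to character-offset spans in the raw text and compute F1 over exactly matching spans. The Python sketch below assumes that every token occurs verbatim, in order, in the source text; it is not necessarily the paper's "simple algorithm". The example uses a transliterated Persian word containing a half-space (U+200C, zero-width non-joiner) that a naive tokenizer splits.

```python
from typing import Sequence, Set, Tuple


def token_spans(text: str, tokens: Sequence[str]) -> Set[Tuple[int, int]]:
    """Locate each token left to right and return its (start, end) character span."""
    spans, pos = set(), 0
    for tok in tokens:
        start = text.find(tok, pos)
        if start < 0:
            raise ValueError(f"token {tok!r} not found after offset {pos}")
        end = start + len(tok)
        spans.add((start, end))
        pos = end
    return spans


def token_f1(text: str, gold: Sequence[str], predicted: Sequence[str]) -> float:
    """F1 over predicted spans that exactly match gold spans."""
    g, p = token_spans(text, gold), token_spans(text, predicted)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


text = "mi\u200cravam be khane"              # half-space inside the first word
gold = ["mi\u200cravam", "be", "khane"]
naive = ["mi", "ravam", "be", "khane"]       # splits at the half-space
print(round(token_f1(text, gold, naive), 4))  # 0.5714
```

Splitting the first word leaves 2 of the 4 predicted spans correct against 3 gold spans, so F1 = 2·2 / (3 + 4) = 4/7.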
Submitted 22 February, 2022;
originally announced February 2022.
-
KGRefiner: Knowledge Graph Refinement for Improving Accuracy of Translational Link Prediction Methods
Authors:
Mohammad Javad Saeedizade,
Najmeh Torabian,
Behrouz Minaei-Bidgoli
Abstract:
Link prediction is the task of predicting missing relations between entities of a knowledge graph. Recent work in link prediction has attempted to increase accuracy by using more layers in neural network architectures. In this paper, we propose a novel method of refining the knowledge graph so that link prediction can be performed more accurately using relatively fast translational models. Translational link prediction models, such as TransE, TransH, and TransD, have lower complexity than deep learning approaches. Our method uses the hierarchy of relationships and entities in the knowledge graph to add entity information as auxiliary nodes to the graph and connect them to the nodes that contain this information in their hierarchy. Our experiments show that our method can significantly improve the performance of translational link prediction methods in terms of Hits@10, MR, and MRR.
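The refinement idea, attaching hierarchy information to entities as auxiliary nodes, can be sketched as follows. This is a hypothetical Python illustration, not KGRefiner's exact procedure: it assumes the hierarchy is given by a caller-chosen set of relations (here `instance_of` and `subclass_of`), and it links every entity to an auxiliary node for each of its transitive ancestors through an assumed relation name, `has_aux_info`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def ancestors(entity: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All transitive parents of `entity` (cycle-safe depth-first walk)."""
    seen: Set[str] = set()
    stack = list(parents.get(entity, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def refine(triples: Iterable[Triple], hierarchy_relations: Set[str],
           aux_relation: str = "has_aux_info") -> List[Triple]:
    """Return the original triples plus one auxiliary-node triple per (entity, ancestor)."""
    triples = list(triples)
    parents: Dict[str, Set[str]] = defaultdict(set)
    entities: Set[str] = set()
    for head, rel, tail in triples:
        entities.update((head, tail))
        if rel in hierarchy_relations:
            parents[head].add(tail)
    added = [(e, aux_relation, f"aux:{a}")
             for e in sorted(entities) for a in sorted(ancestors(e, parents))]
    return triples + added


kg = [("tehran", "instance_of", "city"),
      ("city", "subclass_of", "place"),
      ("tehran", "capital_of", "iran")]
for triple in refine(kg, {"instance_of", "subclass_of"})[3:]:
    print(triple)
# ('city', 'has_aux_info', 'aux:place')
# ('tehran', 'has_aux_info', 'aux:city')
# ('tehran', 'has_aux_info', 'aux:place')
```

The refined triple list can then be fed unchanged to a translational model such as TransE; only the graph, not the model, is modified.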
Submitted 19 November, 2021; v1 submitted 27 June, 2021;
originally announced June 2021.
-
Interval Probabilistic Fuzzy WordNet
Authors:
Yousef Alizadeh-Q,
Behrouz Minaei-Bidgoli,
Sayyed-Ali Hossayni,
Mohammad-R Akbarzadeh-T,
Diego Reforgiato Recupero,
Mohammad-Reza Rajati,
Aldo Gangemi
Abstract:
The WordNet lexical database groups English words into sets of synonyms called "synsets." Synsets are used in several text-mining applications. However, they have also been criticized: in reality, not all members of a synset represent its meaning to the same degree, yet in practice all members are treated identically. Thus, fuzzy versions of synsets, called fuzzy synsets (or fuzzy word-sense classes), were proposed and studied. In this study, we discuss why (type-1) fuzzy synsets (T1 F-synsets) do not properly model membership uncertainty, and we propose an upgraded version of fuzzy synsets in which the membership degrees of word senses are represented by intervals, similar to Interval Type-2 Fuzzy Sets (IT2 FS). We then argue that the IT2 FS theoretical framework is insufficient for the analysis and design of such synsets and propose a new concept, called Interval Probabilistic Fuzzy (IPF) sets. Next, we present an algorithm for constructing IPF synsets in any language, given a corpus and a word-sense disambiguation system. Using our algorithm, the Open American National Corpus (OANC), and the UKB word-sense disambiguation system, we constructed and published the IPF synsets of WordNet for English.
Submitted 4 April, 2021;
originally announced April 2021.
-
A Sample-Based Training Method for Distantly Supervised Relation Extraction with Pre-Trained Transformers
Authors:
Mehrdad Nasser,
Mohamad Bagher Sajadi,
Behrouz Minaei-Bidgoli
Abstract:
Multiple instance learning (MIL) has become the standard learning paradigm for distantly supervised relation extraction (DSRE). However, because relation extraction is performed at the bag level, MIL has significant hardware requirements for training when coupled with large sentence encoders such as deep transformer neural networks. In this paper, we propose a novel sampling method for DSRE that relaxes these hardware requirements. In the proposed method, we limit the number of sentences in a batch by randomly sampling sentences from the bags in the batch. However, this comes at the cost of losing valid sentences from bags. To alleviate the issues caused by random sampling, we use an ensemble of trained models for prediction. We demonstrate the effectiveness of our approach by using the proposed learning setting to fine-tune BERT on the widely used NYT dataset. Our approach significantly outperforms previous state-of-the-art methods in terms of AUC and P@N metrics.
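The sampling idea can be sketched in a few lines of Python. Under our own assumption, which the abstract does not state, that the sentence budget is split evenly across bags, each bag contributes at most `max_sentences // len(bags)` randomly chosen sentences; `ensemble_average` shows one plausible way to combine the class probabilities of several trained models. The function names and the even-split policy are hypothetical.

```python
import random
from typing import List, Sequence


def sample_batch(bags: Sequence[Sequence[str]], max_sentences: int,
                 rng: random.Random) -> List[List[str]]:
    """Randomly keep at most max_sentences // len(bags) sentences from each bag."""
    if not bags:
        return []
    if max_sentences < len(bags):
        raise ValueError("the budget must allow at least one sentence per bag")
    per_bag = max_sentences // len(bags)
    return [rng.sample(list(bag), min(per_bag, len(bag))) for bag in bags]


def ensemble_average(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors predicted by several trained models."""
    n = len(prob_lists)
    return [sum(column) / n for column in zip(*prob_lists)]


bags = [[f"b0_s{i}" for i in range(5)], ["b1_s0", "b1_s1"], [f"b2_s{i}" for i in range(10)]]
batch = sample_batch(bags, max_sentences=6, rng=random.Random(0))
print([len(b) for b in batch])  # [2, 2, 2]
```

Using a dedicated `random.Random` instance keeps the sampling reproducible per training run, which matters when several runs are later combined into an ensemble.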
Submitted 15 April, 2021;
originally announced April 2021.
-
An Unsupervised Language-Independent Entity Disambiguation Method and its Evaluation on the English and Persian Languages
Authors:
Majid Asgari-Bidhendi,
Behrooz Janfada,
Amir Havangi,
Sayyed Ali Hossayni,
Behrouz Minaei-Bidgoli
Abstract:
Entity linking is one of the essential tasks of information extraction and natural language understanding. It mainly consists of two subtasks: recognition and disambiguation of named entities. Most studies address these two subtasks separately or focus on only one of them. Moreover, most state-of-the-art entity linking algorithms are either supervised, and therefore perform poorly in the absence of annotated corpora, or language-dependent, and therefore unsuitable for multilingual applications. In this paper, we introduce Unsupervised Language-Independent Entity Disambiguation (ULIED), which uses a novel approach to disambiguate and link named entities. Evaluation of ULIED on several English entity linking datasets, as well as on the only available Persian dataset, shows that in most cases ULIED outperforms state-of-the-art unsupervised multilingual approaches.
Submitted 31 January, 2021;
originally announced February 2021.
-
IUST at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text using Deep Neural Networks and Linear Baselines
Authors:
Soroush Javdan,
Taha Shangipour ataei,
Behrouz Minaei-Bidgoli
Abstract:
Sentiment analysis is a well-studied field of natural language processing. However, the rapid growth of social media and the noisy content it contains pose significant challenges for well-established methods and tools. One of these challenges is code-mixing, i.e., using more than one language to convey thoughts in social media texts. Our team, IUST (username: TAHA), participated in SemEval-2020 shared task 9 on Sentiment Analysis for Code-Mixed Social Media Text and developed a system to predict the sentiment of a given code-mixed tweet. We used various preprocessing techniques and explored methods ranging from NBSVM to more complex deep neural network models. Our best-performing method obtains an F1 score of 0.751 on the Spanish-English subtask and 0.706 on the Hindi-English subtask.
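NBSVM, the simplest model mentioned above, scales binarized bag-of-words features by a Naive Bayes log-count ratio before training a linear SVM. The Python sketch below computes that ratio for the binary case, r = log((p / ||p||_1) / (q / ||q||_1)) with add-alpha smoothing; the SVM training step is omitted, and a multi-class setting would need an extension such as one-vs-rest. It is a generic illustration of the technique, not the IUST system.

```python
import math
from typing import Dict, Sequence


def log_count_ratio(docs: Sequence[Sequence[str]], labels: Sequence[int],
                    alpha: float = 1.0) -> Dict[str, float]:
    """NB log-count ratio r = log((p/|p|_1) / (q/|q|_1)) over binarized word counts.

    labels: 1 for the positive class, anything else for the negative class.
    """
    vocab = sorted({w for doc in docs for w in doc})
    p = dict.fromkeys(vocab, alpha)  # smoothed positive-class counts
    q = dict.fromkeys(vocab, alpha)  # smoothed negative-class counts
    for doc, y in zip(docs, labels):
        counts = p if y == 1 else q
        for w in set(doc):  # binarize: count each word once per document
            counts[w] += 1
    p_total, q_total = sum(p.values()), sum(q.values())
    return {w: math.log((p[w] / p_total) / (q[w] / q_total)) for w in vocab}


def nb_features(doc: Sequence[str], r: Dict[str, float]) -> Dict[str, float]:
    """Sparse NBSVM input: binarized indicator of each known word scaled by r."""
    return {w: r[w] for w in set(doc) if w in r}


docs = [["good", "movie"], ["great", "film"], ["bad", "movie"]]
r = log_count_ratio(docs, [1, 1, 0])
print(nb_features(["good", "movie"], r))
```

Words that appear mostly in positive documents get r > 0 and words that appear mostly in negative ones get r < 0, so the scaled features already carry a Naive Bayes prior before the SVM sees them.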
Submitted 24 July, 2020;
originally announced July 2020.
-
PERLEX: A Bilingual Persian-English Gold Dataset for Relation Extraction
Authors:
Majid Asgari-Bidhendi,
Mehrdad Nasser,
Behrooz Janfada,
Behrouz Minaei-Bidgoli
Abstract:
Relation extraction is the task of extracting semantic relations between entities in a sentence. It is an essential part of several natural language processing tasks, such as information extraction, knowledge extraction, and knowledge base population. The main motivations of this research are the lack of a relation extraction dataset for Persian and the need to extract knowledge from the growing volume of Persian big data for different applications. In this paper, we present "PERLEX," the first Persian dataset for relation extraction, which is an expert-translated version of the "SemEval-2010-Task-8" dataset. Moreover, this paper addresses Persian relation extraction using state-of-the-art language-agnostic algorithms. We employ six different relation extraction models on the proposed bilingual dataset: a non-neural model (as the baseline), three neural models, and two deep learning models fed with multilingual-BERT contextual word representations. The best result, an F-score of 77.66% achieved by the BERTEM-MTB method, sets the state of the art for relation extraction in Persian.
Submitted 13 May, 2020;
originally announced May 2020.
-
FarsBase-KBP: A Knowledge Base Population System for the Persian Knowledge Graph
Authors:
Majid Asgari-Bidhendi,
Behrooz Janfada,
Behrouz Minaei-Bidgoli
Abstract:
While most knowledge bases already support English, there is only one knowledge base for Persian, known as FarsBase, which is created automatically from semi-structured web information. Unlike English knowledge bases such as Wikidata, which have tremendous community support, a knowledge base like FarsBase must rely on automatically extracted knowledge for its population. Knowledge base population lets FarsBase keep growing as the system continues to run. In this paper, we present a knowledge base population system for Persian that extracts knowledge from unlabeled raw text crawled from the Web. The proposed system consists of a set of state-of-the-art modules, such as an entity linking module and information and relation extraction modules designed for FarsBase. Moreover, a canonicalization system is introduced to link extracted relations to FarsBase properties. The system then uses knowledge fusion techniques, with minimal intervention from human experts, to integrate and filter the proper knowledge instances extracted by each module. To evaluate the presented system, we introduce the first gold dataset for benchmarking knowledge base population in Persian, consisting of 22,015 FarsBase triples verified by human experts. The evaluation results demonstrate the efficiency of the proposed system.
Submitted 4 May, 2020;
originally announced May 2020.
-
ParsEL 1.0: Unsupervised Entity Linking in Persian Social Media Texts
Authors:
Majid Asgari-Bidhendi,
Farzane Fakhrian,
Behrouz Minaei-Bidgoli
Abstract:
In recent years, social media data has grown exponentially and now constitutes one of the largest data repositories in the world. A large portion of this data is natural language text. However, natural language is highly ambiguous, because entity mentions are frequently polysemous words or phrases. Entity linking is the task of linking entity mentions in text to their corresponding entities in a knowledge base. Recently, FarsBase, a Persian knowledge graph containing almost half a million entities, has been introduced. In this paper, we propose an unsupervised Persian entity linking system, the first entity linking system specifically focused on Persian, which uses both context-dependent and context-independent features. For this purpose, we also publish the first Persian entity linking corpus, containing 67,595 words crawled from social media texts of popular channels on the Telegram messenger. The proposed method achieves an F-score of 86.94% on Persian, which is comparable with similar state-of-the-art methods for English.
Submitted 22 April, 2020;
originally announced April 2020.
-
Pars-ABSA: an Aspect-based Sentiment Analysis dataset for Persian
Authors:
Taha Shangipour Ataei,
Kamyar Darvishi,
Soroush Javdan,
Behrouz Minaei-Bidgoli,
Sauleh Eetemadi
Abstract:
Due to the increased availability of online reviews, sentiment analysis has attracted booming interest from researchers. Sentiment analysis is the computational treatment of sentiment, used to extract and understand the opinions of authors. While many systems predict the sentiment of a document or a sentence, others provide finer detail on the various aspects of an entity (i.e., aspect-based sentiment analysis). Most available data resources are tailored to English and other popular European languages. Although Persian has more than 110 million speakers, to the best of our knowledge, no public aspect-based sentiment analysis dataset exists for Persian. This paper provides a manually annotated Persian dataset, Pars-ABSA, verified by 3 native Persian speakers. The dataset consists of 5,114 positive, 3,061 negative, and 1,827 neutral data samples from 5,602 unique reviews. Moreover, as a baseline, this paper reports the performance of several state-of-the-art aspect-based sentiment analysis methods, focusing on deep learning, on Pars-ABSA. The obtained results compare favorably with similar state-of-the-art results for English.
Submitted 11 December, 2019; v1 submitted 26 July, 2019;
originally announced August 2019.
-
A Fast and Efficient algorithm for Many-To-Many Matching of Points with Demands in One Dimension
Authors:
Fatemeh Rajabi-Alni,
Alireza Bagheri,
Behrouz Minaei-Bidgoli
Abstract:
Given two point sets S and T, the minimum-cost many-to-many matching with demands (MMD) problem is to find a minimum-cost many-to-many matching between S and T such that each point of S (respectively T) is matched to at least a given number of points of T (respectively S). We propose the first O(n^2) time algorithm for computing a minimum-cost one-dimensional MMD (OMMD) between S and T, where |S|+|T|=n. In an OMMD problem, the input point sets S and T lie on the real line, and the cost of matching a point to another point equals the distance between the two points. We also study a generalization of the MMD problem, the many-to-many matching with demands and capacities (MMDC) problem, in which each point has a limited capacity in addition to a demand. We give the first O(n^2) time algorithm for the minimum-cost one-dimensional MMDC (OMMDC) problem.
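A brute-force solver is a handy reference for checking the MMD definition, and any faster implementation, on tiny inputs. The Python sketch below enumerates every subset of S × T pairs, keeps those that satisfy all demands, and returns the cheapest one, charging |s - t| per matched pair as in the one-dimensional setting. It runs in exponential time and is not the O(n^2) algorithm described above; the function name is hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[int, int]  # (index into S, index into T)


def brute_force_ommd(S: Sequence[float], T: Sequence[float],
                     demand_S: Sequence[int], demand_T: Sequence[int]
                     ) -> Tuple[float, Optional[List[Pair]]]:
    """Exhaustive minimum-cost many-to-many matching with demands on a line.

    Exponential in |S|*|T|; intended only as a reference on tiny inputs.
    Returns (inf, None) when no matching meets all demands.
    """
    pairs = [(i, j) for i in range(len(S)) for j in range(len(T))]
    best_cost, best = float("inf"), None
    for mask in product((False, True), repeat=len(pairs)):
        chosen = [pair for pair, keep in zip(pairs, mask) if keep]
        deg_S, deg_T = [0] * len(S), [0] * len(T)
        for i, j in chosen:
            deg_S[i] += 1
            deg_T[j] += 1
        if any(d < need for d, need in zip(deg_S, demand_S)):
            continue
        if any(d < need for d, need in zip(deg_T, demand_T)):
            continue
        cost = sum(abs(S[i] - T[j]) for i, j in chosen)  # distance on the real line
        if cost < best_cost:
            best_cost, best = cost, chosen
    return best_cost, best


S, T = [0.0, 10.0], [1.0, 9.0, 11.0]
print(brute_force_ommd(S, T, [1, 1], [1, 1, 1]))
```

For these points with unit demands the optimum costs 3 (0-1, 10-9, 10-11); raising the demand of the point at 0 to 2 raises the optimum to 11, because that point must additionally be matched to 9.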
Submitted 21 May, 2023; v1 submitted 9 April, 2019;
originally announced April 2019.
-
A faster algorithm for the limited-capacity many-to-many point matching in one dimension
Authors:
Fatemeh Rajabi-Alni,
Alireza Bagheri,
Behrouz Minaei-Bidgoli
Abstract:
Given two sets S and T, a limited-capacity many-to-many matching (LCMM) between S and T matches each element p in S (resp. T) to at least 1 and at most Cap(p) elements in T (resp. S), where the function $\mathrm{Cap}\colon S \cup T \to \mathbb{Z}_{>0}$ denotes the capacity of p. In this paper, we present the first linear time algorithm for finding a minimum-cost one-dimensional LCMM (OLCMM) between S and T, where S and T are points lying on a line and the cost of matching p in S to q in T equals the $\ell_2$ distance between p and q. Our algorithm improves on the previous best-known quadratic time algorithm.
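The LCMM constraints are easy to state as a validity check. This Python sketch (a hypothetical helper, not part of the paper) verifies that every element of S and T has between 1 and its capacity of matched partners, and returns the matching's cost using |p - q|, which is the l2 distance between two points on a line.

```python
from collections import Counter
from typing import Sequence, Tuple


def lcmm_cost(S: Sequence[float], T: Sequence[float],
              cap_S: Sequence[int], cap_T: Sequence[int],
              matching: Sequence[Tuple[int, int]]) -> float:
    """Return the cost of `matching` if it is a valid LCMM, else raise ValueError.

    `matching` holds (i, j) index pairs meaning S[i] is matched to T[j].
    """
    pairs = set(matching)
    if len(pairs) != len(matching):
        raise ValueError("a pair appears more than once")
    if not all(0 <= i < len(S) and 0 <= j < len(T) for i, j in pairs):
        raise ValueError("index out of range")
    deg_S = Counter(i for i, _ in pairs)
    deg_T = Counter(j for _, j in pairs)
    for name, points, caps, deg in (("S", S, cap_S, deg_S), ("T", T, cap_T, deg_T)):
        for k in range(len(points)):
            if not 1 <= deg[k] <= caps[k]:
                raise ValueError(f"{name}[{k}] has {deg[k]} partners; allowed 1..{caps[k]}")
    # On a line, the l2 distance between p and q is simply |p - q|.
    return sum(abs(S[i] - T[j]) for i, j in pairs)


S, T = [0.0, 10.0], [1.0, 9.0, 11.0]
print(lcmm_cost(S, T, cap_S=[1, 2], cap_T=[1, 1, 1], matching=[(0, 0), (1, 1), (1, 2)]))  # 3.0
```

Lowering the capacity of the point at 10 to 1 makes the same matching invalid, since that point has two partners.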
Submitted 9 July, 2023; v1 submitted 4 April, 2019;
originally announced April 2019.
-
A new selection strategy for selective cluster ensemble based on Diversity and Independency
Authors:
Muhammad Yousefnezhad,
Ali Reihanian,
Daoqiang Zhang,
Behrouz Minaei-Bidgoli
Abstract:
This research introduces a new strategy for cluster ensemble selection based on Independency and Diversity metrics. In recent years, Diversity and Quality, two metrics from the evaluation procedure, have been used to select basic clustering results in cluster ensemble selection. Although Quality can improve the final results of a cluster ensemble, it cannot control the procedures that generate the basic results, which leaves a gap in predicting the accuracy of the generated basic results. Instead of Quality, this paper introduces Independency as a supplementary measure to be used in conjunction with Diversity. To this end, the paper uses a heuristic metric, based on the procedure of converting code to a graph in software testing, to calculate the Independency of two basic clustering algorithms. Moreover, a new modeling language, which we call the "Clustering Algorithms Independency Language" (CAIL), is introduced to generate graphs that depict the Independency of algorithms. Also, Uniformity, a new similarity metric, is introduced for evaluating the diversity of basic results. Experimental results on various standard datasets show that the proposed framework improves the accuracy of the final results dramatically in comparison with other cluster ensemble methods.
Submitted 9 October, 2016;
originally announced October 2016.
-
Evaluating the effect of topic consideration in identifying communities of rating-based social networks
Authors:
Ali Reihanian,
Behrouz Minaei-Bidgoli,
Muhammad Yousefnezhad
Abstract:
Finding meaningful communities in social networks has attracted the attention of many researchers. The community structure of complex networks reveals both their organization and hidden relations among their constituents. Most community detection research focuses on the topological structure of the network without performing any content analysis. Real-world social networks, however, contain a vast range of information, including shared objects, comments, and following information. In recent years, a number of studies have proposed approaches that consider both the content exchanged in a network and its topological structure in order to find more meaningful communities. Through extensive experiments, this research demonstrates the effect of topic analysis on finding more meaningful communities in social networking sites where users express their feelings toward different objects (such as movies) by rating them.
Submitted 26 April, 2016;
originally announced April 2016.
-
Ball*-tree: Efficient spatial indexing for constrained nearest-neighbor search in metric spaces
Authors:
Mohamad Dolatshah,
Ali Hadian,
Behrouz Minaei-Bidgoli
Abstract:
Emerging location-based systems and data analysis frameworks require efficient management of spatial data for approximate and exact search. Exact similarity search can be performed using space-partitioning data structures such as Kd-tree, R*-tree, and Ball-tree. In this paper, we focus on Ball-tree, an efficient search tree designed for spatial queries that use Euclidean distance. Each node of a Ball-tree defines a ball, i.e., a hypersphere that contains a subset of the points to be searched.
In this paper, we propose Ball*-tree, an improved Ball-tree that is more efficient for spatial queries. Ball*-tree uses a modified space-partitioning algorithm that takes the distribution of the data points into account to find an efficient splitting hyperplane. We also propose a new Ball*-tree algorithm for KNN queries with a restricted range, which performs better than both plain KNN search and range search on such queries. Results show that Ball*-tree is 39%-57% faster than the original Ball-tree algorithm.
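A range-restricted KNN query over a ball tree can be sketched as follows. This is a minimal Python illustration, not the authors' Ball*-tree: the tree is built with the standard Ball-tree split (median along the widest coordinate) instead of the distribution-aware hyperplane the paper proposes, and the names `build`, `constrained_knn`, and `leaf_size` are hypothetical. What it shows is the pruning rule: a ball is skipped when its closest possible point is farther than both the range limit r and the current k-th best distance, so the search prunes at least as aggressively as either a plain KNN search or a plain range search would.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class BallNode:
    center: Point
    radius: float                                        # max distance from center to any point below
    points: List[Point] = field(default_factory=list)    # filled only at leaves
    left: Optional["BallNode"] = None
    right: Optional["BallNode"] = None

def build(points: List[Point], leaf_size: int = 2) -> BallNode:
    """Build a ball tree over a non-empty point list (leaf_size must be >= 1)."""
    dims = len(points[0])
    center = tuple(sum(p[d] for p in points) / len(points) for d in range(dims))
    radius = max(math.dist(center, p) for p in points)
    if len(points) <= leaf_size:
        return BallNode(center, radius, list(points))
    # Standard Ball-tree split: median along the widest coordinate.
    # (Hypothetical stand-in; Ball*-tree picks its hyperplane from the data distribution.)
    spread = [max(p[d] for p in points) - min(p[d] for p in points) for d in range(dims)]
    axis = spread.index(max(spread))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return BallNode(center, radius, [], build(ordered[:mid], leaf_size),
                    build(ordered[mid:], leaf_size))

def constrained_knn(root: BallNode, q: Point, k: int, r: float) -> List[Tuple[float, Point]]:
    """Return up to k nearest points of q whose distance is <= r, nearest first."""
    if k <= 0:
        return []
    best: List[Tuple[float, Point]] = []   # max-heap of (-distance, point)

    def bound() -> float:
        # No point farther than this can enter the answer.
        return r if len(best) < k else min(r, -best[0][0])

    def visit(node: BallNode) -> None:
        # Triangle inequality: no point inside the ball is closer than this.
        if math.dist(q, node.center) - node.radius > bound():
            return
        if node.left is None or node.right is None:
            for p in node.points:
                d = math.dist(q, p)
                if d > r:
                    continue                          # outside the restricted range
                if len(best) < k:
                    heapq.heappush(best, (-d, p))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, p))  # evict the current k-th best
            return
        # Visit the nearer child first so the bound tightens early.
        for child in sorted((node.left, node.right), key=lambda c: math.dist(q, c.center)):
            visit(child)

    visit(root)
    return sorted((-neg_d, p) for neg_d, p in best)
```

For example, `constrained_knn(build(points), q, k=5, r=0.2)` returns at most five points, all within distance 0.2 of `q`; if fewer than five points lie in that range, it returns only those.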
Submitted 2 November, 2015;
originally announced November 2015.
-
An O(n^3) time algorithm for the maximum weight b-matching problem on bipartite graphs
Authors:
Fatemeh Rajabi-Alni,
Alireza Bagheri,
Behrouz Minaei-Bidgoli
Abstract:
Given an undirected bipartite graph G = (A ∪ B, E), a b-matching of G matches each vertex v in A (respectively B) to at least 1 and at most b(v) vertices in B (respectively A), where b(v) denotes the capacity of v. In this paper, we present an O(n^3) time algorithm for finding a maximum weight b-matching of G, where |A| + |B| = O(n). Our algorithm improves on the previous best time complexity of O(n^3 log n) for this problem.
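To make the definition concrete, the following Python sketch checks the b-matching constraints as stated above (every vertex matched to at least 1 and at most b(v) vertices on the other side) and finds a maximum weight b-matching of a tiny graph by exhaustive search. The exhaustive search is exponential and only illustrates the objective being optimized; it is not the paper's O(n^3) algorithm. The function names and the edge representation (pairs (a, b) with a in A and b in B) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

Edge = Tuple[Hashable, Hashable]   # (a, b) with a in A and b in B

def is_b_matching(A: Set[Hashable], B: Set[Hashable], edges: Set[Edge],
                  b: Dict[Hashable, int], matching: Iterable[Edge]) -> bool:
    """Check the definition: each vertex has between 1 and b(v) matched partners."""
    chosen = set(matching)
    if not chosen <= edges:
        return False                      # only edges of G may be used
    degree: Counter = Counter()
    for u, v in chosen:
        if u not in A or v not in B:
            return False                  # edges must cross the bipartition
        degree[u] += 1
        degree[v] += 1
    return all(1 <= degree[x] <= b[x] for x in A | B)

def max_weight_b_matching_bruteforce(A: Set[Hashable], B: Set[Hashable],
                                     weights: Dict[Edge, float],
                                     b: Dict[Hashable, int]):
    """Try every edge subset; returns (best matching, weight), or (None, -inf) if none exists."""
    edge_list = list(weights)
    edges = set(edge_list)
    best: Optional[Set[Edge]] = None
    best_w = float("-inf")
    for size in range(len(edge_list) + 1):
        for subset in combinations(edge_list, size):
            if is_b_matching(A, B, edges, b, subset):
                w = sum(weights[e] for e in subset)
                if w > best_w:
                    best, best_w = set(subset), w
    return best, best_w
```

Because the definition requires every vertex to be matched at least once, some graphs have no feasible b-matching at all; the sketch reports that case as `(None, -inf)`.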
Submitted 14 July, 2019; v1 submitted 13 October, 2014;
originally announced October 2014.
-
Experiments on Data Preprocessing of Persian Blog Networks
Authors:
Zeinab Borhani-fard,
Leila Esmaeili,
Behrouz Minaei-Bidgoli,
Mehdi Nasiri
Abstract:
Analyzing and exploring social networks is important for researchers, sociologists, academics, and businesses because of the information they contain. The large volume, diversity, and growth rate of Web 2.0 data make analyzing it challenging. By definition, weblogs are a form of social network. So far, most studies on analyzing weblog networks and exploring their stored data have been based on international datasets. In this paper, we present a framework for preprocessing and analyzing data in weblog networks and report the results of applying it to a Persian weblog network as a case study.
Submitted 1 September, 2014;
originally announced September 2014.
-
Multi-View Learning for Web Spam Detection
Authors:
Ali Hadian,
Behrouz Minaei-Bidgoli
Abstract:
Spam pages are designed to appear maliciously among the top search results through excessive use of popular terms. Therefore, spam pages should be removed using an effective and efficient spam detection system. Previous methods for web spam classification used several features from various information sources (page contents, web graph, access logs, etc.) to detect web spam. In this paper, we follow a page-level classification approach to build fast and scalable spam filters. We show that each web page can be classified with satisfactory accuracy using only its own HTML content. To design a multi-view classification system, we used state-of-the-art spam classification methods with distinct feature sets (views) as the base classifiers. A fusion model is then learned to combine the outputs of the base classifiers and make the final prediction. Results show that multi-view learning significantly improves classification performance, increasing AUC by 22%, while providing linear speedup for parallel execution.
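The fusion step can be sketched as stacking: each base classifier (one per view) emits a spam score for a page, and a second-level model learns how to combine those scores. In this minimal Python sketch the base-classifier scores are assumed to be given, and the fusion model is a logistic regression trained by plain gradient descent; that choice is an assumption, since the abstract does not name the fusion model. The hypothetical helper `auc` computes AUC as the fraction of (spam, non-spam) pairs ranked correctly. In practice the fusion model should be trained on base-classifier outputs for held-out pages, so that it does not learn to trust views that overfit the training set.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fuse(w: Sequence[float], bias: float, s: Sequence[float]) -> float:
    """Final spam probability from one page's base-classifier scores."""
    return sigmoid(bias + sum(wi * si for wi, si in zip(w, s)))

def train_fusion(view_scores: List[Sequence[float]], labels: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit one weight per base classifier plus a bias by minimizing log-loss."""
    n_views = len(view_scores[0])
    w = [0.0] * n_views
    bias = 0.0
    m = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_views
        grad_b = 0.0
        for s, y in zip(view_scores, labels):
            err = fuse(w, bias, s) - y      # derivative of log-loss w.r.t. the logit
            for i in range(n_views):
                grad_w[i] += err * s[i]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        bias -= lr * grad_b / m
    return w, bias

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Share of (spam, non-spam) pairs where spam scores higher; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    hits = 0.0
    for p in pos:
        for n in neg:
            hits += 1.0 if p > n else (0.5 if p == n else 0.0)
    return hits / (len(pos) * len(neg))
```

With two views where only the first separates spam from non-spam, the learned weight on the first view dominates, and the fused scores rank pages better than the uninformative view alone.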
Submitted 24 July, 2013; v1 submitted 16 May, 2013;
originally announced May 2013.
-
A Comparison Between Data Mining Prediction Algorithms for Fault Detection (Case study: Ahanpishegan co.)
Authors:
Golriz Amooee,
Behrouz Minaei-Bidgoli,
Malihe Bagheri-Dehnavi
Abstract:
In today's competitive world, industrial companies seek to manufacture higher-quality products, which can be achieved by increasing reliability, maintainability, and thus the availability of products. At the same time, improving the product lifecycle is necessary for achieving high reliability. Typically, maintenance activities aim to reduce failures of industrial machinery and minimize the consequences of such failures. Industrial companies therefore try to improve their efficiency by using different fault detection techniques. One strategy is to process and analyze previously generated data to predict future failures. The purpose of this paper is to detect wasted parts using different data mining algorithms and to compare the accuracy of these algorithms. A combination of thermal and physical characteristics was used, and the algorithms were applied to Ahanpishegan's current data to estimate the availability of its produced parts.
Keywords: Data Mining, Fault Detection, Availability, Prediction Algorithms.
Submitted 29 January, 2012;
originally announced January 2012.