-
F-tree: an algorithm for clustering transactional data using frequency tree
Authors:
Mahmoud Mahdi,
Samir Abdelrahman,
Reem Bahgat,
Ismail Ismail
Abstract:
Clustering is an important data mining technique that groups similar data records; recently, categorical transaction clustering has received more attention. In this research, we study the problem of categorical data clustering for transactional data characterized by high dimensionality and large volume. We propose a novel algorithm for clustering transactional data, called F-Tree, which is based on the idea of the frequent pattern algorithm FP-tree, one of the fastest approaches to frequent itemset mining. The simple idea behind F-Tree is to generate small, highly pure clusters and then merge them, which makes it fast and dynamic when clustering large transactional datasets with high dimensionality. We also present a new solution to the overlap problem between clusters by defining a new criterion function based on the probability of overlap between weighted items. Our experimental evaluation on real datasets shows that: first, F-Tree is effective in finding interesting clusters; second, using the tree structure reduces the clustering time for large datasets with many attributes; third, the proposed evaluation metric, used to resolve the overlap of transaction items, efficiently produces high-quality clustering results. Finally, we conclude that merging small, pure clusters both increases the purity of the resulting clusters and reduces clustering time compared with generating clusters directly from the dataset and then refining them.
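The generate-then-merge idea can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: items in each transaction are ordered by descending global support, as in FP-tree construction; the initial small clusters come from cutting the prefix tree at a hypothetical `depth`; and clusters are merged greedily by Jaccard similarity of their item sets, a stand-in for the authors' overlap-probability criterion function, which is not reproduced here. All names (`build_tree`, `initial_clusters`, `merge_clusters`) are hypothetical.

```python
from collections import Counter


class Node:
    """Prefix-tree node: an item, how many transactions pass through it,
    its children keyed by item, and the ids of transactions ending here."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}
        self.tids = []


def build_tree(transactions):
    """Insert every transaction into an FP-tree-style prefix tree."""
    support = Counter(item for t in transactions for item in set(t))
    root = Node()
    for tid, t in enumerate(transactions):
        node = root
        # Descending support, ties broken by item name so the tree is deterministic.
        for item in sorted(set(t), key=lambda i: (-support[i], i)):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
        node.tids.append(tid)
    return root


def _subtree_tids(node):
    """Collect the ids of all transactions ending at or below `node`."""
    tids = list(node.tids)
    for child in node.children.values():
        tids.extend(_subtree_tids(child))
    return tids


def initial_clusters(root, depth):
    """Cut the tree at `depth` (hypothetical parameter): each node at that depth
    yields one small cluster; transactions ending above the cut form their own."""
    clusters = []

    def walk(node, level):
        if level == depth:
            clusters.append(_subtree_tids(node))
            return
        if node.tids:
            clusters.append(list(node.tids))
        for child in node.children.values():
            walk(child, level + 1)

    walk(root, 0)
    return clusters


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def merge_clusters(clusters, transactions, threshold):
    """Greedily merge the most similar pair while its similarity >= threshold.
    Jaccard similarity of item sets is an assumed stand-in for the paper's
    overlap-probability criterion."""
    clusters = [list(c) for c in clusters]
    items = [set().union(*(transactions[t] for t in c)) for c in clusters]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = jaccard(items[i], items[j])
                if s >= threshold and (best is None or s > best[0]):
                    best = (s, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
        items[i] |= items.pop(j)
```

On a toy dataset such as `[["a","b","c"], ["a","b"], ["a","c"], ["x","y"], ["x","y","z"]]` with `depth=2`, the branches a→b and a→c start as separate small clusters and are merged because their item sets overlap heavily, while the x→y branch stays apart.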
Submitted 1 May, 2017;
originally announced May 2017.
-
Secure Count Query on Encrypted Genomic Data
Authors:
Mohammad Zahidul Hasan,
Md Safiur Rahman Mahdi,
Noman Mohammed
Abstract:
Capturing the vast amount of meaningful information encoded in the human genome is a fascinating research problem. The outcomes of this research have significant influence in a number of health-related fields, such as personalized medicine, paternity testing, and disease susceptibility testing. Large-scale biomedical research projects of this kind often require disparate organizations to share the genomic and clinical data they have collected. In that case, it is of utmost importance to ensure that sharing, managing, and analyzing the data does not reveal the identity of the individuals who contribute their genomic samples. Storage and computation on the shared data can be delegated to third-party cloud infrastructures equipped with large storage and high-performance computing resources. However, outsourcing sensitive genomic data to third-party cloud storage raises the risk of loss, theft, or misuse of the data, because the server administrator cannot be completely trusted and there is no guarantee that the server's security will not be breached. In this paper, we provide a model for secure sharing and computation on genomic data on a semi-honest third-party cloud server. The security of the shared data is guaranteed through encryption, while the overall computation remains fast and scalable enough for real-life, large-scale biomedical applications. We evaluated the efficiency of the proposed model on a database of Single-Nucleotide Polymorphism (SNP) sequences; experimental results demonstrate that a query of 50 SNPs on a database of 50,000 records, each containing 300 SNPs, takes approximately 6 seconds.
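To make the outsourced count-query setting concrete, here is a minimal sketch of a count query that a cloud server could answer over keyed tokens instead of plaintext SNPs. This is a swapped-in technique (deterministic HMAC-SHA256 tokenization from Python's standard library), not the encryption scheme proposed in the paper; unlike stronger schemes, it leaks to the server which records share a value. The function names and the `position:allele` encoding are hypothetical.

```python
import hashlib
import hmac


def token(key: bytes, position: str, allele: str) -> bytes:
    """Keyed, deterministic token for one (SNP position, allele) pair.
    The 'position:allele' encoding is a hypothetical choice."""
    return hmac.new(key, f"{position}:{allele}".encode(), hashlib.sha256).digest()


def tokenize(key: bytes, snps: dict) -> frozenset:
    """Data owner (for records) or querier (for queries): replace every
    SNP value by its token with the shared secret key before sending it out."""
    return frozenset(token(key, pos, allele) for pos, allele in snps.items())


def server_count(encrypted_db, encrypted_query) -> int:
    """Cloud side: count records that contain every queried token.
    The server handles only opaque digests, never plaintext SNPs."""
    return sum(1 for record in encrypted_db if encrypted_query <= record)
```

The key never leaves the data owner and authorized queriers; the server only performs subset tests on digests, so its work is a linear scan over the tokenized records.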
Submitted 4 March, 2017;
originally announced March 2017.
-
Genetic Algorithms and its use with back-propagation network
Authors:
Ayman M. Bahaa-Eldin,
A. M. A. Wahdan,
H. M. K. Mahdi
Abstract:
Genetic algorithms are considered one of the most efficient search techniques. Although they do not guarantee an optimal solution, their ability to reach a suitable solution in a considerably short time gives them a respected role in many AI techniques. This work introduces genetic algorithms and describes their characteristics. It then proposes a novel method that uses a genetic algorithm to generate and select the best training set for a back-propagation network. This work also offers a new extension to the original genetic algorithms.
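The training-set selection idea can be sketched as a fixed-size subset search. Assumptions, none of them from the paper: each chromosome is a set of `k` sample indices drawn from a candidate pool; `fitness` is supplied by the caller (in the setting described above it would presumably train the back-propagation network on the chosen samples and return a validation score); and tournament selection, union-based crossover, single-index mutation, and one-individual elitism are generic choices, not the paper's operators or its proposed extension. The function name and default parameters are hypothetical.

```python
import random


def genetic_select(pool_size, k, fitness, pop_size=20, generations=40,
                   mutation_rate=0.2, seed=0):
    """Search for a k-sample training subset that maximizes `fitness`.
    Returns the best subset (sorted indices) and the best fitness per generation."""
    rng = random.Random(seed)

    def random_individual():
        return sorted(rng.sample(range(pool_size), k))

    def crossover(a, b):
        # The child draws k distinct indices from the union of both parents.
        return sorted(rng.sample(sorted(set(a) | set(b)), k))

    def mutate(ind):
        # With probability mutation_rate, swap one index for an unused one.
        if rng.random() < mutation_rate:
            unused = [i for i in range(pool_size) if i not in ind]
            if unused:
                ind = list(ind)
                ind[rng.randrange(k)] = rng.choice(unused)
                ind.sort()
        return ind

    def tournament(population):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [random_individual() for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(population, key=fitness)
        history.append(fitness(elite))
        # Elitism: the current best survives unchanged, so history never decreases.
        next_population = [elite]
        while len(next_population) < pop_size:
            child = crossover(tournament(population), tournament(population))
            next_population.append(mutate(child))
        population = next_population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

When trying the code without a network, a toy fitness such as the number of distinct class labels covered by the selected samples is a cheap stand-in for training and validation.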
Submitted 21 January, 2014;
originally announced January 2014.
-
Edge detection of binary images using the method of masks
Authors:
Ayman M Bahaa-Eldeen,
Abdel-Moneim A. Wahdan,
Hani M. K. Mahdi
Abstract:
In this work, the method of masks (creating and using inverted image masks), together with binary operations on image data, is used for edge detection in binary (monochrome) images, running about 300 times faster than ordinary methods. The method is divided into three stages: mask construction, fundamental edge detection, and edge construction. A comparison with an ordinary method and a fuzzy-based method is carried out.
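As an illustration of edge detection with whole-row binary operations, the sketch below packs each row of a 0/1 image into an integer and treats a foreground pixel as interior when its four neighbours are also foreground; the edge map is the image ANDed with the inverted interior mask. This is a generic morphological-boundary technique chosen for illustration, assuming 4-connectivity and background outside the image borders; it is not the paper's three-stage procedure, and the function names are hypothetical.

```python
def to_bits(image):
    """Pack each row of a 0/1 image into an int; bit j holds column j."""
    return [sum(bit << j for j, bit in enumerate(row)) for row in image]


def edges(image):
    """Return the edge map: foreground pixels that have at least one
    background 4-neighbour (pixels outside the image count as background)."""
    height, width = len(image), len(image[0])
    mask = (1 << width) - 1  # keeps results inside the image width
    rows = to_bits(image)
    out = []
    for r, row in enumerate(rows):
        up = rows[r - 1] if r > 0 else 0
        down = rows[r + 1] if r + 1 < height else 0
        # row << 1 aligns each pixel with its left neighbour, row >> 1 with its right.
        interior = row & (row << 1) & (row >> 1) & up & down & mask
        # Edge = foreground AND NOT interior (the inverted-mask step).
        edge = row & ~interior & mask
        out.append([(edge >> j) & 1 for j in range(width)])
    return out
```

Each row is processed with a handful of integer operations instead of a per-pixel neighbourhood loop, which is the kind of saving the binary-operation approach aims for. For a 3x3 block of ones inside a 5x5 image, only the centre pixel is interior, so the result is the block's outline.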
Submitted 21 January, 2014;
originally announced January 2014.