Search | arXiv e-print repository

RepBin: Constraint-based Graph Representation Learning for Metagenomic Binning

Authors: Hansheng Xue, Vijini Mallawaarachchi, Yujia Zhang, Vaibhav Rajan, Yu Lin

Abstract: Mixed communities of organisms are found in many environments (from the human gut to marine ecosystems) and can have profound impact on human health and the environment. Metagenomics studies the genomic material of such communities through high-throughput sequencing that yields DNA subsequences for subsequent analysis. A fundamental problem in the standard workflow, called binning, is to discover clusters, of genomic subsequences, associated with the unknown constituent organisms. Inherent noise in the subsequences, various biological constraints that need to be imposed on them and the skewed cluster size distribution exacerbate the difficulty of this unsupervised learning problem. In this paper, we present a new formulation using a graph where the nodes are subsequences and edges represent homophily information. In addition, we model biological constraints providing heterophilous signal about nodes that cannot be clustered together. We solve the binning problem by developing new algorithms for (i) graph representation learning that preserves both homophily relations and heterophily constraints (ii) constraint-based graph clustering method that addresses the problems of skewed cluster size distribution. Extensive experiments, on real and synthetic datasets, demonstrate that our approach, called RepBin, outperforms a wide variety of competing methods. Our constraint-based graph representation learning and clustering methods, that may be useful in other domains as well, advance the state-of-the-art in both metagenomics binning and graph representation learning.

Submitted 22 December, 2021; originally announced December 2021.

Comments: Accepted by AAAI-2022

arXiv:2101.11463 [pdf]

An Ultra-Specific Image Dataset for Automated Insect Identification

Authors: D. L. Abeywardhana, C. D. Dangalle, Anupiya Nugaliyadde, Yashas Mallawarachchi

Abstract: Automated identification of insects is a tough task where many challenges like data limitation, imbalanced data count, and background noise needs to be overcome for better performance. This paper describes such an image dataset which consists of a limited, imbalanced number of images regarding six genera of subfamily Cicindelinae (tiger beetles) of order Coleoptera. The diversity of image collection is at a high level as the images were taken from different sources, angles and on different scales. Thus, the salient regions of the images have a large variation. Therefore, one of the main intentions in this process was to get an idea about the image dataset while comparing different unique patterns and features in images. The dataset was evaluated on different classification algorithms including deep learning models based on different approaches to provide a benchmark. The dynamic nature of the dataset poses a challenge to the image classification algorithms. However transfer learning models using softmax classifier performed well on current dataset. The tiger beetle classification can be challenging even to a trained human eye, therefore, this dataset opens a new avenue for the classification algorithms to develop, to identify features which human eyes have not identified.

Submitted 27 January, 2021; originally announced January 2021.

Comments: This is a pre-print of the manuscript that is currently under review in Multimedia Tools and Applications Journal, Springer

arXiv:2011.08671 [pdf, other]

Long-Term Pipeline Failure Prediction Using Nonparametric Survival Analysis

Authors: Dilusha Weeraddana, Sudaraka MallawaArachchi, Tharindu Warnakula, Zhidong Li, Yang Wang

Abstract: Australian water infrastructure is more than a hundred years old, thus has begun to show its age through water main failures. Our work concerns approximately half a million pipelines across major Australian cities that deliver water to houses and businesses, serving over five million customers. Failures on these buried assets cause damage to properties and water supply disruptions. We applied Machine Learning techniques to find a cost-effective solution to the pipe failure problem in these Australian cities, where on average 1500 of water main failures occur each year. To achieve this objective, we construct a detailed picture and understanding of the behaviour of the water pipe network by developing a Machine Learning model to assess and predict the failure likelihood of water main breaking using historical failure records, descriptors of pipes and other environmental factors. Our results indicate that our system incorporating a nonparametric survival analysis technique called "Random Survival Forest" outperforms several popular algorithms and expert heuristics in long-term prediction. In addition, we construct a statistical inference technique to quantify the uncertainty associated with the long-term predictions.

Submitted 10 November, 2020; originally announced November 2020.

Comments: ECML-PKDD 2020

arXiv:1907.03202 [pdf]

Evolutionary Algorithm for Sinhala to English Translation

Authors: J. K. Joseph, W. M. T. Chathurika, A. Nugaliyadde, Y. Mallawarachchi

Abstract: Machine Translation (MT) is an area in natural language processing, which focus on translating from one language to another. Many approaches ranging from statistical methods to deep learning approaches are used in order to achieve MT. However, these methods either require a large number of data or a clear understanding about the language. Sinhala language has less digital text which could be used to train a deep neural network. Furthermore, Sinhala has complex rules therefore, it is harder to create statistical rules in order to apply statistical methods in MT. This research focuses on Sinhala to English translation using an Evolutionary Algorithm (EA). EA is used to identifying the correct meaning of Sinhala text and to translate it to English. The Sinhala text is passed to identify the meaning in order to get the correct meaning of the sentence. With the use of the EA the translation is carried out. The translated text is passed on to grammatically correct the sentence. This has shown to achieve accurate results.

Submitted 6 July, 2019; originally announced July 2019.

Comments: The paper was submitted to National Information Technology Conference (2019)

arXiv:1902.02162 [pdf]

Adaptive Artificial Intelligent Q&A Platform

Authors: M. R. Akram, C. P. Singhabahu, M. S. M. Saad, P. Deleepa, Anupiya Nugaliyadde, Yashas Mallawarachchi

Abstract: The paper presents an approach to build a question and answer system that is capable of processing the information in a large dataset and allows the user to gain knowledge from this dataset by asking questions in natural language form. Key content of this research covers four dimensions which are; Corpus Preprocessing, Question Preprocessing, Deep Neural Network for Answer Extraction and Answer Generation. The system is capable of understanding the question, responds to the user's query in natural language form as well. The goal is to make the user feel as if they were interacting with a person than a machine.

Submitted 19 January, 2019; originally announced February 2019.

arXiv:1901.02660 [pdf, other]

doi 10.1145/3369876

Change Detection and Notification of Webpages: A Survey

Authors: Vijini Mallawaarachchi, Lakmal Meegahapola, Roshan Alwis, Eranga Nimalarathna, Dulani Meedeniya, Sampath Jayarathna

Abstract: Majority of the currently available webpages are dynamic in nature and are changing frequently. New content gets added to webpages and existing content gets updated or deleted. Hence, people find it useful to be alert for changes in webpages which contain information valuable to them. In the current context, keeping track of these webpages and getting alerts about different changes have become significantly challenging. Change Detection and Notification (CDN) systems were introduced to automate this monitoring process and notify users when changes occur in webpages. This survey classifies and analyzes different aspects of CDN systems and different techniques used for each aspect. Furthermore, the survey highlights the current challenges and areas of improvement present within the field of research.

Submitted 16 February, 2020; v1 submitted 9 January, 2019; originally announced January 2019.

Comments: ACM Computing Surveys

Journal ref: Change Detection and Notification of Web Pages: A Survey. ACM Comput. Surv. 53, 1, Article 15 (February 2020), 35 pages

arXiv:1809.04557 [pdf]

Solving Sinhala Language Arithmetic Problems using Neural Networks

Authors: W. M. T Chathurika, K. C. E De Silva, A. M. Raddella, E. M. R. S. Ekanayake, A. Nugaliyadde, Y. Mallawarachchi

Abstract: A methodology is presented to solve Arithmetic problems in Sinhala Language using a Neural Network. The system comprises of (a) keyword identification, (b) question identification, (c) mathematical operation identification and is combined using a neural network. Naive Bayes Classification is used in order to identify keywords and Conditional Random Field to identify the question and the operation which should be performed on the identified keywords to achieve the expected result. "One vs. all Classification" is done using a neural network for sentences. All functions are combined through the neural network which builds an equation to solve the problem. The paper compares each methodology in ARIS and Mahoshadha to the method presented in the paper. Mahoshadha2 learns to solve arithmetic problems with the accuracy of 76%.

Submitted 11 September, 2018; originally announced September 2018.

Comments: 34th National Information Technology Conference (NITC 2016)

Showing 1–7 of 7 results for author: Mallawarachchi