-
Extreme-scale many-against-many protein similarity search
Authors:
Oguz Selvitopi,
Saliya Ekanayake,
Giulia Guidi,
Muaaz G. Awan,
Georgios A. Pavlopoulos,
Ariful Azad,
Nikos Kyrpides,
Leonid Oliker,
Katherine Yelick,
Aydın Buluç
Abstract:
Similarity search is one of the most fundamental computations that are regularly performed on ever-increasing protein datasets. Scalability is of paramount importance for uncovering novel phenomena that occur at very large scales. We unleash the power of over 20,000 GPUs on the Summit system to perform all-vs-all protein similarity search on one of the largest publicly available datasets with 405 million proteins, in less than 3.5 hours, cutting the time-to-solution for many use cases from weeks. The variability of protein sequence lengths, as well as the sparsity of the space of pairwise comparisons, make this a challenging problem in distributed memory. Due to the need to construct and maintain a data structure holding indices to all other sequences, this application has a huge memory footprint that makes it hard to scale the problem sizes. We overcome this memory limitation by innovative matrix-based blocking techniques, without introducing additional load imbalance.
Submitted 3 March, 2023;
originally announced March 2023.
-
Object Dimension Extraction for Environment Mapping with Low Cost Cameras Fused with Laser Ranging
Authors:
E. M. S. P. Ekanayake,
T. H. M. N. C. Thelasingha,
U. V. B. L. Udugama,
G. M. R. I. Godaliyadda,
M. P. B. Ekanayake,
B. G. L. T. Samaranayake,
J. V. Wijayakulasooriya
Abstract:
It is essential to have a method to map an unknown terrain for various applications. For places where human access is not possible, a method should be proposed to identify the environment. Exploration, disaster relief, transportation and many other purposes would be convenient if a map of the environment is available. Replicating the human vision system using stereo cameras would be an optimum solution. In this work, we have used a laser ranging based technique fused with stereo cameras to extract the dimensions of objects for mapping. The distortions were calibrated using a mathematical model of the camera. A disparity map was generated by means of Semi Global Block Matching [1], and its noise was reduced using a novel dilation-based noise reduction method. The data from the Laser Range Finder (LRF) and the noise-reduced vision data have been used to identify the object parameters.
Submitted 31 January, 2023;
originally announced February 2023.
-
Laser Ranging Based Intelligent System for Unknown Environment Mapping
Authors:
T. H. M. N. C. Thelasingha,
U. V. B. L. Udugama,
E. M. S. P. Ekanayake,
G. M. R. I. Godaliyadda,
M. P. B. Ekanayake,
B. G. L. T. Samaranayake,
J. V. Wijayakulasooriya
Abstract:
This work describes the implementation of a simple and computationally efficient Intelligent Navigation System (INS) for autonomous systems used in areas where human access is impossible. The system uses Laser Range Finder (LRF) readings as input, making it suitable for mobile platform implementation. The INS pre-processes the LRF readings to remove noise and determines an obstacle-free path for mapping. The system's localization method uses a similarity transform and particle filter. The system was tested in artificially generated environments and emulated in real-time with real-environment data. The system was then implemented in a Raspberry Pi3 on a 3WD Omni-directional mobile platform and tested in real environments. The system was able to generate an accurate 2D map of the area. The proposed methodology was shown to be efficient through a comparative analysis of execution time.
Submitted 31 January, 2023;
originally announced February 2023.
-
Developing and delivering a remote experiment based on the experiential learning framework during COVID-19 pandemic
Authors:
W. D. Kularatne,
Lasanthika H. Dissawa,
T. M. S. S. K. Ekanayake,
Janaka B. Ekanayake
Abstract:
Students following Engineering disciplines should not only acquire a conceptual understanding of the concepts but also the processes and attitudes. There are two recognizable learning environments for students, namely, the classroom environment and the laboratory environment. With the COVID-19 pandemic, both environments merged into online environments, impacting students' development of processes and characteristic attitudes. This paper introduces a theoretical framework based on experiential learning to plan and deliver processes through an online environment. A case study based on the power factor correction experiment was presented. The traditional experiment that runs for 3 hours was broken into smaller tasks such as a pre-lab activity, a simulation exercise, a PowerPoint presentation, a remote laboratory activity, and a final report based on the experiential learning approach. A questionnaire with closed and open-ended questions was administered to obtain students' reflections about developing the processes through an online-friendly experiential learning approach. The majority of the students liked the approach followed and praised it for providing them with an opportunity to perform the experiment in a novel way during the COVID-19 situation.
Submitted 6 July, 2021;
originally announced July 2021.
-
Parallel Algorithms for Densest Subgraph Discovery Using Shared Memory Model
Authors:
B. D. M. De Zoysa,
Y. A. M. M. A. Ali,
M. D. I. Maduranga,
Indika Perera,
Saliya Ekanayake,
Anil Vullikanti
Abstract:
The problem of finding dense components of a graph is a widely explored area in data analysis, with diverse applications in fields of study including community mining, spam detection, computer security and bioinformatics. This research project studies previously available algorithms to identify potential modifications that could result in an improved version with a considerable leap in performance and efficiency. Furthermore, efforts were also steered towards devising a novel algorithm for the problem of densest subgraph discovery. This paper presents an improved implementation of a widely used densest subgraph discovery algorithm and a novel parallel algorithm which produces better results than a 2-approximation.
Submitted 27 February, 2021;
originally announced March 2021.
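For readers unfamiliar with the "2-approximation" mentioned in the abstract above, the sketch below shows the classic greedy min-degree peeling heuristic, which is known to return a subgraph whose density (edges divided by vertices) is at least half the optimum. The abstract does not name the algorithm the authors improved, so treating peeling as the baseline is an assumption, and the function name `greedy_peel` is hypothetical. The code is sequential; it does not reproduce the paper's parallel method.

```python
import heapq
from collections import defaultdict


def greedy_peel(edges):
    """Greedy min-degree peeling for the densest subgraph problem.

    Returns (best_density, best_vertex_set), where density = |E| / |V|.
    Illustrative sketch only, not the paper's implementation.
    """
    # Build an undirected adjacency structure, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(ns) for v, ns in adj.items()}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)

    alive = set(adj)
    num_edges = sum(degree.values()) // 2
    best_density = num_edges / len(alive) if alive else 0.0
    removal_order = []
    best_prefix = 0  # number of removals that gave the best density

    while alive:
        d, v = heapq.heappop(heap)
        if v not in alive or d != degree[v]:
            continue  # stale heap entry, skip it
        # Remove the current minimum-degree vertex.
        alive.remove(v)
        removal_order.append(v)
        num_edges -= degree[v]
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
        # Track the densest intermediate subgraph seen so far.
        if alive:
            density = num_edges / len(alive)
            if density > best_density:
                best_density = density
                best_prefix = len(removal_order)

    best_set = set(adj) - set(removal_order[:best_prefix])
    return best_density, best_set


# Example: a 4-clique on vertices 0..3 with a pendant vertex 4 attached to 0.
density, nodes = greedy_peel(
    [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
)
```

On this small example the peeling drops the pendant vertex first and keeps the 4-clique as the densest intermediate subgraph.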
-
Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices
Authors:
Oguz Selvitopi,
Saliya Ekanayake,
Giulia Guidi,
Georgios Pavlopoulos,
Ariful Azad,
Aydin Buluc
Abstract:
Identifying similar protein sequences is a core step in many computational biology pipelines such as detection of homologous protein sequences, generation of similarity protein graphs for downstream analysis, functional annotation and gene location. Performance and scalability of protein similarity searches have proven to be a bottleneck in many bioinformatics pipelines due to increases in cheap and abundant sequencing data. This work presents a new distributed-memory software, PASTIS. PASTIS relies on sparse matrix computations for efficient identification of possibly similar proteins. We use distributed sparse matrices for scalability and show that the sparse matrix infrastructure is a great fit for protein similarity searches when coupled with a fully-distributed dictionary of sequences that allows remote sequence requests to be fulfilled. Our algorithm incorporates the unique bias in amino acid sequence substitution in searches without altering the basic sparse matrix model, and in turn, achieves ideal scaling up to millions of protein sequences.
Submitted 30 September, 2020;
originally announced September 2020.
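The abstract above says only that candidate pairs are found through sparse matrix computations. One common way to read this, assumed here rather than taken from the abstract, is a sparse sequences-by-k-mers matrix A whose product with its own transpose counts the k-mers shared by each pair of sequences. The sketch below computes the nonzero entries of that product on a single machine with plain dictionaries; the names `kmers` and `shared_kmer_counts` are hypothetical, and the paper's distributed matrices and substitution-aware matching are not modelled.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_counts(seqs, k):
    """Count shared distinct k-mers for every pair of sequences.

    Equivalent to the strictly upper-triangular nonzeros of A * A^T,
    where A[i][m] = 1 if sequence i contains k-mer m (assumed model).
    """
    # Column view of A: for each k-mer, the sequences that contain it.
    index = defaultdict(list)
    for i, s in enumerate(seqs):
        for km in kmers(s, k):
            index[km].append(i)

    # Each column contributes 1 to every pair of rows it touches.
    counts = defaultdict(int)
    for ids in index.values():
        for i, j in combinations(ids, 2):
            counts[(i, j)] += 1
    return dict(counts)


# Toy protein fragments; only pairs with a shared k-mer appear in the result.
pairs = shared_kmer_counts(["MKVLAA", "KVLAAG", "GGGGGG"], k=3)
```

Only pairs that share at least one k-mer appear in the result, which is why the space of pairwise comparisons stays sparse; such pairs would then be passed to an alignment step.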
-
The Parallelism Motifs of Genomic Data Analysis
Authors:
Katherine Yelick,
Aydin Buluc,
Muaaz Awan,
Ariful Azad,
Benjamin Brock,
Rob Egan,
Saliya Ekanayake,
Marquita Ellis,
Evangelos Georganas,
Giulia Guidi,
Steven Hofmeyr,
Oguz Selvitopi,
Cristina Teodoropol,
Leonid Oliker
Abstract:
Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.
Submitted 20 January, 2020;
originally announced January 2020.
-
Solving Sinhala Language Arithmetic Problems using Neural Networks
Authors:
W. M. T Chathurika,
K. C. E De Silva,
A. M. Raddella,
E. M. R. S. Ekanayake,
A. Nugaliyadde,
Y. Mallawarachchi
Abstract:
A methodology is presented to solve Arithmetic problems in Sinhala Language using a Neural Network. The system comprises (a) keyword identification, (b) question identification, and (c) mathematical operation identification, which are combined using a neural network. Naive Bayes Classification is used to identify keywords, and a Conditional Random Field is used to identify the question and the operation which should be performed on the identified keywords to achieve the expected result. "One vs. all Classification" is done using a neural network for sentences. All functions are combined through the neural network, which builds an equation to solve the problem. The paper compares each methodology in ARIS and Mahoshadha to the method presented in the paper. Mahoshadha2 learns to solve arithmetic problems with an accuracy of 76%.
Submitted 11 September, 2018;
originally announced September 2018.