-
Unified Spatio-Temporal Tri-Perspective View Representation for 3D Semantic Occupancy Prediction
Authors:
Sathira Silva,
Savindu Bhashitha Wannigama,
Gihan Jayatilaka,
Muhammad Haris Khan,
Roshan Ragel
Abstract:
Holistic understanding and reasoning in 3D scenes play a vital role in the success of autonomous driving systems. 3D semantic occupancy prediction, which has evolved into a pretraining task for autonomous driving and robotic downstream tasks, captures finer 3D details than methods such as 3D detection. Existing approaches predominantly focus on spatial cues such as tri-perspective view (TPV) embeddings, often overlooking temporal cues. This study introduces S2TPVFormer, a spatiotemporal transformer architecture for temporally coherent 3D semantic occupancy prediction. We enrich the prior process by including temporal cues through a novel temporal cross-view hybrid attention mechanism (TCVHA) and generate spatiotemporal TPV embeddings (i.e., S2TPV embeddings). Experimental evaluations on the nuScenes dataset demonstrate a substantial 4.1% improvement in mean Intersection over Union (mIoU) for 3D semantic occupancy prediction compared to TPVFormer, confirming the effectiveness of the proposed S2TPVFormer in enhancing 3D scene perception.
Submitted 4 April, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
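In TPV-based models such as TPVFormer, the feature of a 3D point is obtained by projecting the point onto the three orthogonal planes and summing the features sampled there. The Python sketch below illustrates only that aggregation step, under several simplifying assumptions. The planes are nested lists, sampling is nearest-neighbour rather than bilinear, and the names `point_feature` and `dense_volume` are hypothetical. It does not model the TCVHA temporal attention.

```python
from typing import List

Feature = List[float]
Plane = List[List[Feature]]  # plane[i][j] is a C-dimensional feature vector


def point_feature(hw: Plane, dh: Plane, wd: Plane, h: int, w: int, d: int) -> Feature:
    """Feature of grid point (h, w, d) as the sum of its three plane projections.

    hw is the top view indexed [h][w], dh is indexed [d][h], wd is indexed [w][d].
    Nearest-neighbour lookup stands in for bilinear sampling.
    """
    f_hw, f_dh, f_wd = hw[h][w], dh[d][h], wd[w][d]
    return [a + b + c for a, b, c in zip(f_hw, f_dh, f_wd)]


def dense_volume(hw: Plane, dh: Plane, wd: Plane) -> List[List[List[Feature]]]:
    """Lift the three planes to a dense H x W x D feature volume."""
    H, W, D = len(hw), len(hw[0]), len(dh)
    return [[[point_feature(hw, dh, wd, h, w, d) for d in range(D)]
             for w in range(W)]
            for h in range(H)]


# Toy 2 x 2 x 2 grid whose 2-channel features encode their own indices.
H = W = D = 2
hw = [[[float(h), float(w)] for w in range(W)] for h in range(H)]
dh = [[[float(d), float(h)] for h in range(H)] for d in range(D)]
wd = [[[float(w), float(d)] for d in range(D)] for w in range(W)]
volume = dense_volume(hw, dh, wd)
print(volume[1][0][1])  # [2.0, 2.0]
```

Summing keeps the volume cheap to query. A dense voxel grid is never stored during encoding, only the three planes.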
-
An Optical physics inspired CNN approach for intrinsic image decomposition
Authors:
Harshana Weligampola,
Gihan Jayatilaka,
Suren Sritharan,
Parakrama Ekanayake,
Roshan Ragel,
Vijitha Herath,
Roshan Godaliyadda
Abstract:
Intrinsic image decomposition (IID) is the open problem of recovering the constituent components of an image. Generating reflectance and shading from a single image is challenging, especially when no ground truth is available, and unsupervised learning approaches that decompose a single image into reflectance and shading are scarce. We propose a neural network architecture capable of this decomposition using physics-based parameters derived from the image. Through experimental results, we show that (a) the proposed methodology outperforms existing deep learning-based IID techniques and (b) the derived parameters significantly improve its efficacy. We conclude with a closer analysis of the results (numerical results and example images) that points to several avenues for improvement.
Submitted 20 December, 2021; v1 submitted 20 May, 2021;
originally announced May 2021.
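A common physical starting point for IID is the Lambertian image formation model, stated here as background rather than as this paper's formulation. In this model, each pixel's intensity is the product of reflectance R and shading S, so taking logarithms turns the decomposition into an additive one:

```latex
I(x, y) = R(x, y)\, S(x, y)
\quad\Longrightarrow\quad
\log I(x, y) = \log R(x, y) + \log S(x, y)
```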
-
A Retinex based GAN Pipeline to Utilize Paired and Unpaired Datasets for Enhancing Low Light Images
Authors:
Harshana Weligampola,
Gihan Jayatilaka,
Suren Sritharan,
Roshan Godaliyadda,
Parakrama Ekanayaka,
Roshan Ragel,
Vijitha Herath
Abstract:
Low-light image enhancement is an important challenge for the development of robust computer vision algorithms. Machine learning approaches to this problem have been either unsupervised, supervised on paired datasets, or supervised on unpaired datasets. This paper presents a novel deep learning pipeline that can learn from both paired and unpaired datasets. Convolutional Neural Networks (CNNs) optimized to minimize a standard loss and Generative Adversarial Networks (GANs) optimized to minimize an adversarial loss are used to perform different steps of the low-light image enhancement process. A cycle consistency loss and a patch-based discriminator are used to further improve performance. The paper also analyses the functionality and performance of the individual components, the hidden layers, and the entire pipeline.
Submitted 21 October, 2021; v1 submitted 27 June, 2020;
originally announced June 2020.
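For reference, the CycleGAN paper popularised the following cycle consistency loss. It uses a generator G: X → Y (for example, low-light to enhanced) and a reverse generator F: Y → X. The exact form used in this pipeline may differ:

```latex
\mathcal{L}_{\text{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_X}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_Y}\big[\lVert G(F(y)) - y \rVert_1\big]
```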
-
Non-contact Infant Sleep Apnea Detection
Authors:
Gihan Jayatilaka,
Harshana Weligampola,
Suren Sritharan,
Pankayraj Pathmanathan,
Roshan Ragel,
Isuru Nawinne
Abstract:
Sleep apnea is a breathing disorder in which a person repeatedly stops breathing during sleep. Early detection is crucial for infants because the condition may lead to long-term adverse effects. The existing accurate detection mechanism (pulse oximetry) requires skin contact, while the existing non-contact mechanisms (acoustics, video processing) are not accurate enough. This paper presents a novel video-processing algorithm for detecting sleep apnea. The solution is non-contact, accurate, and lightweight enough to run on a single-board computer. The paper discusses the accuracy of the algorithm on real data, its advantages and limitations, and suggests future improvements.
Submitted 10 October, 2019;
originally announced October 2019.
-
An optimized Parallel Failure-less Aho-Corasick algorithm for DNA sequence matching
Authors:
Vajira Thambawita,
Roshan G. Ragel,
Dhammike Elkaduwe
Abstract:
The Aho-Corasick algorithm is a multiple-pattern searching algorithm that runs sequentially in applications such as network intrusion detection and bioinformatics to find several input strings within a large input string. Its parallel version is called the Parallel Failure-less Aho-Corasick algorithm because, unlike the original Aho-Corasick algorithm, it does not need failure links. In this research, we implemented an application-specific Parallel Failure-less Aho-Corasick algorithm on a general-purpose graphics processing unit, applying several cache optimization techniques for matching DNA sequences. Owing to its simplicity and its optimized use of the GPU's cache memory, our parallel Aho-Corasick algorithm performs better than the available parallel Aho-Corasick algorithm library for matching DNA sequences.
Submitted 26 November, 2018;
originally announced November 2018.
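The failure-less idea can be sketched sequentially in Python. First, build the Aho-Corasick goto trie without failure links. Then start one independent walk at every text position (on a GPU, one thread per position), stopping as soon as no transition exists. This is a hypothetical illustration of the matching logic only. It includes none of the paper's GPU cache optimizations, and `build_trie`/`pfac_match` are names chosen here.

```python
from typing import Dict, List, Tuple


def build_trie(patterns: List[str]) -> Tuple[List[Dict[str, int]], Dict[int, str]]:
    """Aho-Corasick goto function only; no failure links are built."""
    goto: List[Dict[str, int]] = [{}]  # state 0 is the root
    output: Dict[int, str] = {}        # accepting state -> matched pattern
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state] = pattern
    return goto, output


def pfac_match(text: str, patterns: List[str]) -> List[Tuple[int, str]]:
    """Return (start, pattern) pairs, one independent trie walk per start position."""
    goto, output = build_trie(patterns)
    matches: List[Tuple[int, str]] = []
    for start in range(len(text)):  # on a GPU, each start would be one thread
        state, pos = 0, start
        # A missing transition simply ends this walk; no failure link is followed.
        while pos < len(text) and text[pos] in goto[state]:
            state = goto[state][text[pos]]
            if state in output:
                matches.append((start, output[state]))
            pos += 1
    return matches


print(pfac_match("ACGTACG", ["ACG", "CGT", "GTA"]))
# [(0, 'ACG'), (1, 'CGT'), (2, 'GTA'), (4, 'ACG')]
```

Each walk is short and independent, which makes the scheme map well onto many lightweight GPU threads. The trade-off is that overlapping start positions re-read the same characters, which is where cache-aware data placement matters.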
-
Near Real-Time Data Labeling Using a Depth Sensor for EMG Based Prosthetic Arms
Authors:
Geesara Prathap,
Titus Nanda Kumara,
Roshan Ragel
Abstract:
Automatically recognizing sEMG (surface electromyography) signals that belong to a particular action (e.g., a lateral arm raise) is challenging, because EMG signals vary considerably, even for the same action, due to several factors. To overcome this issue, the raw signals must be properly segmented so that similar patterns repeat for a particular action. However, a repetitive pattern is not always matched, because the same action can be performed over different durations. Thus, a depth sensor (Kinect) was used for pattern identification: three joint angles, which are clearly separable for a particular action, were recorded continuously alongside the sEMG signals. To segment repetitive patterns in the angle data, a Moving Dynamic Time Warping (MDTW) approach is introduced, which retrieves suspected motions of interest from the raw signals. MDTW is based on the DTW algorithm, but it moves through the whole dataset in a predefined manner, which allows it to pick up almost all suspected segments in a given dataset in an optimal way. Elevated bicep curl and lateral arm raise movements are taken as the motions of interest to show how the proposed technique can be used to achieve automatic identification and labelling. The full implementation is available at https://github.com/GPrathap/OpenBCIPython
Submitted 10 November, 2018;
originally announced November 2018.
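The abstract does not give MDTW's exact windowing rules, so the sketch below assumes one plausible reading. It computes the classic DTW distance between a template angle sequence and a window that slides over the recording, and reports the windows whose distance falls below a threshold. The `template`, `threshold` and `step` parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with an absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def moving_dtw(signal: Sequence[float], template: Sequence[float],
               threshold: float, step: int = 1) -> List[Tuple[int, float]]:
    """Slide a template-length window over the signal and keep close windows.

    Hypothetical reading of MDTW: returns (start index, DTW distance) pairs.
    """
    width = len(template)
    hits: List[Tuple[int, float]] = []
    for start in range(0, len(signal) - width + 1, step):
        dist = dtw_distance(signal[start:start + width], template)
        if dist <= threshold:
            hits.append((start, dist))
    return hits


angles = [5.0, 5.0, 0.0, 1.0, 2.0, 5.0, 5.0]  # hypothetical joint-angle trace
print(moving_dtw(angles, [0.0, 1.0, 2.0], threshold=0.5))  # [(2, 0.0)]
```

DTW tolerates the same motion being performed faster or slower. For example, `[0, 1, 1, 2]` is at distance 0 from `[0, 1, 2]`, which is why it suits actions whose duration varies.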
-
To Use or Not to Use: CPUs' Cache Optimization Techniques on GPGPUs
Authors:
Vajira Thambawita,
Roshan G. Ragel,
Dhammike Elkaduwe
Abstract:
General-Purpose Graphics Processing Units (GPGPUs) are widely used to achieve high performance or high throughput in parallel programming. This capability is now well known and is mostly used for scientific computing, which requires more processing power than ordinary personal computers provide. Consequently, programmers, researchers, and industry widely use this concept in their work. However, achieving high performance or high throughput with GPGPUs is not an easy task compared with conventional programming on the CPU side. In this research, CPU cache memory optimization techniques were applied to the GPGPU's cache memory to identify uncommon performance improvement techniques compared with GPGPU best practices. The cache optimization techniques of blocking, loop fusion, array merging, and array transpose were tested on GPGPUs to determine their suitability. We found that some CPU cache optimization techniques suit the cache memory system of the GPGPU and improve performance, while others have the opposite effect on GPGPUs compared with CPUs.
Submitted 9 October, 2018;
originally announced October 2018.
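Two of the CPU techniques tested, blocking and loop fusion, are shown below as plain Python loop transformations. This sketch only illustrates how the loops are restructured, since Python itself gains no cache benefit. The block size of 2 and the function names are arbitrary choices for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul_naive(A: Matrix, B: Matrix) -> Matrix:
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def matmul_blocked(A: Matrix, B: Matrix, block: int = 2) -> Matrix:
    """Blocking (tiling): iterate over block x block tiles so a tile's data can stay cached."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, block):
        for pp in range(0, k, block):
            for jj in range(0, m, block):
                # Inner loops touch only one tile of A, B and C at a time.
                for i in range(ii, min(ii + block, n)):
                    for p in range(pp, min(pp + block, k)):
                        a = A[i][p]
                        for j in range(jj, min(jj + block, m)):
                            C[i][j] += a * B[p][j]
    return C


def scale_and_offset_unfused(xs: List[float]) -> List[float]:
    scaled = [2.0 * x for x in xs]     # first pass over the data
    return [s + 1.0 for s in scaled]   # second pass over an intermediate array


def scale_and_offset_fused(xs: List[float]) -> List[float]:
    return [2.0 * x + 1.0 for x in xs]  # loop fusion: one pass, no intermediate array


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
B = [[9.0, 8.0, 7.0], [6.0, 5.0, 4.0], [3.0, 2.0, 1.0]]
print(matmul_blocked(A, B) == matmul_naive(A, B))  # True
```

Both transformations change only the order of memory accesses, not the result. Whether that order helps depends on the cache hierarchy, which is exactly what differs between CPUs and GPGPUs.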
-
SecureD: A Secure Dual Core Embedded Processor
Authors:
Roshan G. Ragel,
Jude A. Ambrose,
Sri Parameswaran
Abstract:
Security of embedded computing systems is becoming a paramount concern as these devices become more ubiquitous, contain personal information, and are increasingly used for financial transactions. Security attacks targeting embedded systems illegally gain access to the information in these devices or destroy it. The two most common types of attacks that embedded systems encounter are code-injection and power analysis attacks. In the past, a number of countermeasures, both hardware- and software-based, were proposed individually against these two types of attacks. However, no single system exists to counter both of these prominent attacks in a processor-based embedded system. Therefore, this paper, for the first time, proposes a hardware/software-based countermeasure against both code-injection attacks and power analysis based side-channel attacks in a dual-core embedded system. The proposed processor, named SecureD, has an area overhead of just 3.80% and an average runtime increase of 20.0% compared with a standard dual processing system. The overheads were measured using a set of industry-standard application benchmarks comprising two encryption programs and five other programs.
Submitted 5 November, 2015;
originally announced November 2015.
-
A Structured Hardware Software Architecture for Peptide Based Diagnosis - Sub-string Matching Problem with Limited Tolerance (ICIAfS14)
Authors:
S. M. Vidanagamachchi,
S. D. Dewasurendra,
R. G. Ragel,
M. Niranjan
Abstract:
The problem of inferring proteins from complex peptide samples in a shotgun proteomic workflow places extreme demands on computational resources. This is exacerbated by the fact that, in general, a given protein cannot be defined by a fixed sequence of amino acids due to the existence of splice variants and isoforms of that protein. Therefore, protein inference can be considered as the problem of identifying sequences of amino acids with some limited tolerance. Two problems arise from this: (a) due to these variations, the applicability of exact string matching methodologies is questionable, and (b) it is difficult to define a reference sequence for a particular set of proteins that are functionally indistinguishable but vary in some features. This paper presents a model-based inference approach that is developed and validated to solve the inference problem. Our approach starts by examining the known set of splice variants and isoforms of a target protein to identify the Greatest Common Stable Substring (GCSS) of amino acids and the Substrings Subject to Limited Variation (SSLV), together with their respective locations on the GCSS. We then define and solve the Sub-string Matching Problem with Limited Tolerance (SMPLT). This approach is validated on identified peptides in a labelled and clustered data set from UNIPROT. Identification of Baylisascaris procyonis infection was used as an application instance, achieving up to 70 times speedup compared with a software-only system. This workflow can be generalised to any inexact multiple pattern matching application by replacing the patterns in a clustered and distributed environment that permits a distance between member strings to account for permitted deviations such as substitutions, insertions, and deletions.
Submitted 25 December, 2014;
originally announced December 2014.
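The abstract does not detail the GCSS/SSLV-based SMPLT solution. As a stand-in that illustrates matching "with limited tolerance", the sketch below uses Sellers' textbook approximate string matching algorithm. It reports every text position where some substring ends within edit distance k of the pattern, covering substitutions, insertions and deletions. It is not the authors' method, and the amino-acid strings are hypothetical.

```python
from typing import List


def approx_find(text: str, pattern: str, k: int) -> List[int]:
    """Sellers' algorithm: end positions j such that some substring ending at j
    is within edit distance k of pattern (substitutions, insertions, deletions)."""
    m = len(pattern)
    col = list(range(m + 1))  # DP column for the empty text prefix
    ends: List[int] = []
    for j, ch in enumerate(text):
        diag = col[0]  # D[i-1][j-1] for i = 1
        col[0] = 0     # a match may start anywhere in the text
        for i in range(1, m + 1):
            left = col[i]  # D[i][j-1]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(left + 1, col[i - 1] + 1, diag + cost)
            diag = left
        if col[m] <= k:
            ends.append(j)
    return ends


sequence = "MKTAYIAK"  # hypothetical amino-acid sequence
print(approx_find(sequence, "TAY", 0))  # [4]
print(approx_find(sequence, "TAY", 1))  # [3, 4, 5]
```

With k = 0 this reduces to exact matching. Raising k admits the limited variation that splice variants and isoforms introduce, at the cost of more candidate matches.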
-
Students Behavioural Analysis in an Online Learning Environment Using Data Mining (ICIAfS)
Authors:
I. P. Ratnapala,
R. G. Ragel,
S. Deegalla
Abstract:
The focus of this research was to use Educational Data Mining (EDM) techniques to conduct a quantitative analysis of students' interaction with an e-learning system through instructor-led non-graded and graded courses. This exercise is useful for establishing guidelines for a series of online short courses for these students. The access behaviour of a group of 412 students in an e-learning system was analysed, and the students were grouped into clusters with the K-Means clustering method according to their course access log records. The results showed that more than 40% of the student group are passive online learners in both graded and non-graded learning environments. They also showed that differences between learning environments could change the online access behaviour of a student group. Clustering divided the student population into five access groups based on their course access behaviour. Among these groups, the least-access group (NG-41% and G-42%) and the highest-access group (NG-9% and G-5%) could be identified very clearly because their access patterns differ from those of the other groups.
Submitted 25 December, 2014;
originally announced December 2014.
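K-Means clustering, as applied here to course access logs, alternates between two steps. It assigns each student to the nearest centroid, then moves each centroid to the mean of its members. A minimal pure-Python sketch follows. The two-feature access vectors are hypothetical, k = 2 is used only to keep the example small (the study used five groups), and the naive first-k initialisation is a simplification. Practical implementations usually use k-means++ and several restarts.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm with naive initialisation (the first k points)."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels if labels is not None else []


# Hypothetical (logins, resource views) per student.
students = [(2, 3), (40, 55), (1, 0), (3, 2), (38, 60), (45, 50)]
centroids, labels = kmeans(students, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```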
-
A Structured Hardware Software Architecture for Peptide Based Diagnosis of Baylisascaris Procyonis Infection (ICIAfS14)
Authors:
S. M. Vidanagamachchi,
S. D. Dewasurendra,
R. G. Ragel,
M. Niranjan
Abstract:
The problem of inferring proteins from complex peptide cocktails (digestion products of biological samples) in a shotgun proteomic workflow places extreme demands on computational resources, because very high processing throughput, rapid processing rates, and reliable results are all required. This is exacerbated by the fact that, in general, a given protein cannot be defined by a fixed sequence of amino acids due to the existence of splice variants and isoforms of that protein. Therefore, protein inference can be considered as the problem of identifying sequences of amino acids with some limited tolerance. In this paper, a model-based hardware acceleration of a structured and practical inference approach is developed and validated on a mass spectrometry experiment of realistic size. We achieved a maximum speed-up of 10 times in the co-designed workflow compared with a similar software-only workflow run on the processor used for co-design.
Submitted 25 December, 2014;
originally announced December 2014.
-
To Use or Not to Use: Graphics Processing Units for Pattern Matching Algorithms
Authors:
Vajira Thambawita,
Roshan Ragel,
Dhammika Elkaduwe
Abstract:
String matching is an important part of today's computer applications, and the Aho-Corasick algorithm is one of the main string matching algorithms used for it. Using the Aho-Corasick algorithm as a benchmark, this paper discusses when GPUs should be used for string matching applications. The best processing unit on which to run a string matching algorithm must be identified according to the performance of the available devices and the application. Sometimes the CPU performs better than the GPU, and sometimes the GPU performs better than the CPU. Therefore, identifying this critical point is a significant task for researchers who use GPUs to improve the performance of their string matching applications.
Submitted 25 December, 2014;
originally announced December 2014.
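One simple way to reason about the CPU/GPU critical point is a linear cost model, assumed here for illustration rather than taken from the paper. CPU time grows as a·n for an input of length n. The GPU pays a fixed overhead t_0 (data transfer, kernel launch) plus a smaller per-character cost b < a. The GPU therefore wins only once the input is long enough to amortise that overhead:

```latex
T_{\text{CPU}}(n) = a\,n, \qquad T_{\text{GPU}}(n) = t_0 + b\,n, \qquad
T_{\text{GPU}}(n) < T_{\text{CPU}}(n) \iff n > n^{*} = \frac{t_0}{a - b} \quad (a > b)
```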
-
Plagiarism Detection on Electronic Text based Assignments using Vector Space Model (ICIAfS14)
Authors:
MAC Jiffriya,
MAC Akmal Jahan,
Roshan G. Ragel
Abstract:
Plagiarism is the illegal use of part or all of another person's work as one's own in any field, such as art, poetry, literature, cinema, research, and other creative forms of study. It is one of the important issues in academic and research fields and is of growing concern in academic systems. The situation is made worse by the ample resources available on the web. This paper focuses on an effective plagiarism detection tool for intra-corpal plagiarism detection in text-based assignments, identifying a suitable approach by comparing unigram, bigram, and trigram vector space models with the cosine similarity measure. A manually evaluated, labelled dataset was tested using unigram, bigram, and trigram vectors. Although the trigram vector takes comparatively more time, it gives better results on the labelled data. In addition, the selected trigram vector space model with the cosine similarity measure is compared with a trigram sequence matching technique with the Jaccard measure. In the results, the cosine similarity scores are slightly higher than the Jaccard scores, because the vector space model gives more weight to terms that occur infrequently in the dataset; hence, the cosine similarity measure with the trigram technique is preferable. We therefore present our new tool, which could be used to evaluate text-based electronic assignments effectively and minimize plagiarism among students.
Submitted 24 December, 2014;
originally announced December 2014.
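The comparison between a trigram vector space model with cosine similarity and trigram set matching with the Jaccard measure can be sketched as follows. This assumes word-level trigrams and raw term counts. The paper's extra weighting of infrequent terms (for example, TF-IDF) is not reproduced, and the sample sentences are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def word_ngrams(text: str, n: int = 3) -> List[Tuple[str, ...]]:
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def cosine_similarity(a: str, b: str, n: int = 3) -> float:
    """Cosine similarity of raw n-gram count vectors (vector space model)."""
    va, vb = Counter(word_ngrams(a, n)), Counter(word_ngrams(b, n))
    dot = sum(count * vb[gram] for gram, count in va.items())  # missing keys count as 0
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard index of the two n-gram sets."""
    sa, sb = set(word_ngrams(a, n)), set(word_ngrams(b, n))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


doc1 = "the cat sat on the mat"
doc2 = "the cat sat on a mat"
print(cosine_similarity(doc1, doc2), jaccard_similarity(doc1, doc2))  # 0.5 0.3333333333333333
```

The two sample sentences share two of their four trigrams each. Cosine divides the overlap by the geometric mean of the vector lengths (2/4), while Jaccard divides it by the size of the union (2/6), so cosine scores are higher here.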
-
A Feasibility Study on Programmer Specific Instruction Set Processors (PSISPs)
Authors:
T. M. R. L. B. Abeysinghe,
N. Hassan,
R. G. Ragel
Abstract:
Application Specific Instruction Set Processors (ASIPs) are designed to execute the instructions of a particular domain of applications. The design of ASIPs addresses major challenges faced by a system on chip, such as size, cost, performance, and energy consumption. The higher the number of similar instructions within the domain to be mapped, the lower the energy consumption, the smaller the size, and the higher the performance of the ASIP. Thus, designing processors for domains with more similar programs would overcome these issues. This paper investigates whether programmer-specific program domains are as significant as application-specific program domains and, thus, whether designing Programmer Specific Instruction Set Processors is a worthwhile approach. We performed the evaluation at the instruction level, using four different measures of the similarity of application-specific and programmer-specific programs: (1) the existence of each instruction, (2) the frequency of each instruction, (3) patterns of two consecutive instructions, and (4) patterns of three consecutive instructions. We found that although programmer-specific programs show some impact on the similarity measures, this impact is much smaller than that of application-specific programs and therefore insignificant.
Submitted 24 December, 2014;
originally announced December 2014.
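The four instruction-level similarity measures can be sketched as follows, treating each program as a list of opcode mnemonics. The concrete formulas are assumptions made here. Instruction existence uses Jaccard overlap, and frequencies and 2- and 3-instruction patterns use cosine similarity of counts. The mnemonics are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def pattern_counts(program: Sequence[str], n: int) -> Counter:
    """Counts of n consecutive instructions (n = 1 gives plain instruction frequencies)."""
    return Counter(tuple(program[i:i + n]) for i in range(len(program) - n + 1))


def cosine(c1: Counter, c2: Counter) -> float:
    dot = sum(v * c2[key] for key, v in c1.items())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def similarity(p: Sequence[str], q: Sequence[str]) -> Dict[str, float]:
    sp, sq = set(p), set(q)
    return {
        "existence": len(sp & sq) / len(sp | sq) if sp | sq else 0.0,      # measure (1)
        "frequency": cosine(pattern_counts(p, 1), pattern_counts(q, 1)),   # measure (2)
        "2-patterns": cosine(pattern_counts(p, 2), pattern_counts(q, 2)),  # measure (3)
        "3-patterns": cosine(pattern_counts(p, 3), pattern_counts(q, 3)),  # measure (4)
    }


prog_a = ["lw", "add", "sw", "lw", "add", "sw"]  # hypothetical opcode traces
prog_b = ["lw", "add", "sw", "beq"]
print(similarity(prog_a, prog_b))
```

Longer patterns are stricter. Here the two programs share 75% of their instructions, but their 3-instruction pattern vectors are noticeably less aligned than their plain frequency vectors.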
-
Locating Tables in Scanned Documents for Reconstructing and Republishing (ICIAfS14)
Authors:
Akmal Jahan Mac,
Roshan G Ragel
Abstract:
The pool of knowledge available to mankind depends on the sources of learning resources, which range from ancient printed documents to present-day electronic material. Rapidly converting the material in traditional libraries to digital form requires a significant amount of work if the electronic documents are to keep the same format and look as their printed counterparts. Most printed documents contain not only characters and their formatting but also associated non-text objects such as tables, charts, and graphical objects. It is challenging to detect these objects and preserve their format when reproducing the documents. To address this issue, we propose an algorithm that uses local thresholds for word space and line height to locate and extract all categories of tables from scanned document images. From experiments performed on 298 documents, we conclude that our algorithm has an overall accuracy of about 75% in detecting tables in scanned document images. Since the algorithm does not depend entirely on rule lines, it can detect all categories of tables in a range of scanned documents with different font types, styles, and sizes and extract their formatting features. Moreover, with small modifications to the layout analysis, the algorithm can locate tables in multi-column layouts. Treating tables with their existing formatting features will greatly help in reproducing printed documents for reprinting and updating.
Submitted 24 December, 2014;
originally announced December 2014.
-
Efficient Switch Architectures for Pre-configured Backup Protection with Sharing in Elastic Optical Networks (EON)
Authors:
Suthaharan Satkunarajah,
Krishanthmohan Ratnam,
Roshan G. Ragel
Abstract:
In this paper, we address the problem of providing survivability in elastic optical networks (EONs). Compared with conventional fixed-grid networks, EONs use fine-grained frequency slots (flexible grids) and therefore utilize the frequency spectrum efficiently. To provide survivability in EONs, we consider a recently proposed survivability method for conventional fixed-grid networks, known as pre-configured backup protection with sharing (PBPS), because of its benefits over traditional survivability approaches such as dedicated and shared protection. In PBPS, backup paths can be pre-configured and can share resources at the same time, so both short recovery time and efficient resource usage can be achieved. We find that existing switch architectures do not support both PBPS and EONs. Specifically, we identify and illustrate several key problems that might arise in certain scenarios if a switch architecture is not carefully designed. These problems include unnecessary resource consumption, inability to use existing free resources, and inability to share backup paths. They appear when PBPS is adopted in EONs and do not arise in fixed-grid networks. We propose new switch architectures that support both PBPS and EONs and illustrate that they avoid the specific problems mentioned above. Therefore, our switch architectures use resources more efficiently and reduce the blocking of requests.
Submitted 24 December, 2014;
originally announced December 2014.
-
Accelerating Correlation Power Analysis Using Graphics Processing Units
Authors:
Hasindu Gamaarachchi,
Roshan Ragel,
Darshana Jayasinghe
Abstract:
Correlation Power Analysis (CPA) is a type of power analysis based side-channel attack that can be used to derive the secret key of encryption algorithms, including DES (Data Encryption Standard) and AES (Advanced Encryption Standard). A typical CPA attack on unprotected AES analyses a few thousand power traces, which requires about an hour of computation on a general-purpose CPU. Because of the severity of this threat, many researchers work on countermeasures to such attacks. Verifying that a proposed countermeasure works well requires performing the CPA attack on about 1.5 million power traces. Such processing, even for a single verification attempt, would run for several days on commodity hardware, making the verification process infeasible. Modern Graphics Processing Units (GPUs) support thousands of lightweight threads, making them ideal for parallelizable algorithms like CPA. Although a GPU costs less than a high-performance multicore server, its performance for this algorithm is many times better. We present an algorithm and its GPU implementation for CPA on 128-bit AES that executes 1300x faster than on a single-threaded CPU and more than 60x faster than on a 32-threaded multicore server. We show that an attack that would take hours on the multicore server would take less than a minute on a much more cost-effective GPU.
Submitted 24 December, 2014;
originally announced December 2014.
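CPA correlates a leakage hypothesis for each key guess with the measured power and keeps the guess with the largest absolute Pearson correlation. The Python sketch below simulates single-sample traces with a Hamming-weight leakage model. To stay short, it attacks a toy 4-bit S-box (the PRESENT cipher's) instead of the AES S-box, and all names and parameters are hypothetical. Real traces have many time samples. Each (key guess, sample) correlation is independent of the others, which is what makes the computation a good fit for thousands of GPU threads.

```python
import math
import random
from typing import List, Tuple

# Toy 4-bit S-box (the PRESENT cipher's) standing in for the 8-bit AES S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(v: int) -> int:
    return bin(v).count("1")


def pearson(xs: List[float], ys: List[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def simulate_traces(key: int, n: int, noise: float, seed: int = 0) -> Tuple[List[int], List[float]]:
    """One leakage sample per trace: HW(S-box output) plus Gaussian noise."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(16) for _ in range(n)]
    traces = [hamming_weight(SBOX[p ^ key]) + rng.gauss(0.0, noise) for p in plaintexts]
    return plaintexts, traces


def cpa_recover_key(plaintexts: List[int], traces: List[float]) -> Tuple[int, List[float]]:
    """Correlate each key guess's Hamming-weight hypothesis with the traces."""
    scores: List[float] = []
    for guess in range(16):  # guesses are independent: natural units of GPU parallelism
        hypothesis = [float(hamming_weight(SBOX[p ^ guess])) for p in plaintexts]
        scores.append(abs(pearson(hypothesis, traces)))
    best = max(range(16), key=lambda g: scores[g])
    return best, scores


pts, trs = simulate_traces(key=0xA, n=2000, noise=0.5, seed=1)
print(hex(cpa_recover_key(pts, trs)[0]))  # 0xa
```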
-
A Fuzzy Based Model to Identify Printed Sinhala Characters (ICIAfS14)
Authors:
G. I. Gunarathna,
M. A. P. Chamikara,
R. G. Ragel
Abstract:
Character recognition techniques for printed documents are widely used for the English language. However, systems implemented to recognize Asian languages struggle to achieve high recognition accuracy. Among Asian languages (such as Arabic, Tamil, and Chinese), Sinhala characters are unique, mainly because they are round in shape. This unique feature makes it challenging to extend the prevailing techniques to Sinhala character recognition. Therefore, little attention has been given to improving the accuracy of Sinhala character recognition. A novel method that makes use of this unique feature could be advantageous over other methods. This paper describes the use of a fuzzy inference system to recognize Sinhala characters. Feature extraction focuses mainly on distance and intersection measurements in different directions from the center of the letter, making use of the round shape of the characters. The results showed an overall accuracy of 90.7% for the 140 letter instances tested, much better than similar systems.
Submitted 24 December, 2014;
originally announced December 2014.
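The direction-based features described in this abstract can be sketched as follows. This is an illustration only and is not the paper's feature set. It assumes a binary glyph stored as a list of strings, with '#' marking ink. From the image center it casts rays in eight directions. For each ray it records the distance to the first ink pixel and the number of strokes the ray crosses. A triangular membership function, a common building block of fuzzy inference systems, then fuzzifies a value. The breakpoints used with it are hypothetical.

```python
# Eight ray directions as (row step, column step), clockwise from "east".
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def ray_features(glyph):
    """For each direction from the center, return (distance_to_first_ink, crossings)."""
    rows, cols = len(glyph), len(glyph[0])
    cr, cc = rows // 2, cols // 2
    features = []
    for dr, dc in DIRECTIONS:
        r, c, step = cr, cc, 0
        first_hit, crossings, inside = None, 0, False
        while 0 <= r < rows and 0 <= c < cols:
            ink = glyph[r][c] == "#"
            if ink and first_hit is None:
                first_hit = step
            if ink and not inside:
                crossings += 1  # the ray enters a new stroke
            inside = ink
            r, c, step = r + dr, c + dc, step + 1
        features.append((first_hit, crossings))
    return features


def triangular(x, a, b, c):
    """Triangular fuzzy membership: 0 at or outside a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)
```

On a round glyph such as a ring, every direction gives a similar (distance, crossings) pair. Feeding such patterns through fuzzy membership functions is one way a rule base could exploit the round shape of Sinhala characters.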
-
LineCAPTCHA Mobile: A User Friendly Replacement for Unfriendly Reverse Turing Tests for Mobile Devices (ICIAfS14)
Authors:
C. B. Bulumulla,
R. G. Ragel
Abstract:
As smartphones and tablets become ubiquitous and take over as the primary choice for accessing the Internet worldwide, ensuring a secure gateway to the servers serving such devices becomes essential. CAPTCHAs play an important role in identifying human users on the Internet to prevent unauthorized bot attacks. Even though numerous CAPTCHA alternatives are available today, each has certain drawbacks, which makes it hard to find a general solution to the need for a CAPTCHA mechanism. With advancing technology and expertise in areas such as AI, cryptography and image processing, the makers and breakers of CAPTCHAs are now evenly matched. This has left humans with a hard time deciphering CAPTCHAs. In this paper, we adapt a novel CAPTCHA mechanism named LineCAPTCHA to mobile devices. LineCAPTCHA is a new reverse Turing test based on drawing on top of Bezier curves within noisy backgrounds. The major objective of this paper is to report the implementation and evaluation of LineCAPTCHA on a mobile platform. At the same time, we impose certain security standards and security aspects for establishing LineCAPTCHAs, which were obtained through extensive measures. Independence from factors such as fluency in English and age, together with its easily understandable nature, improves the usability of LineCAPTCHA. We believe that such independence will favour the main targets of LineCAPTCHA: user friendliness and usability.
Submitted 24 December, 2014;
originally announced December 2014.
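LineCAPTCHA asks the user to draw on top of Bezier curves. Below is a minimal sketch of how a server might check such a response. Several parts of it are assumptions made for illustration and are not the paper's verification rule: the cubic curve, the sampling density, the nearest-sample distance test and the tolerance. `trace_matches` is a hypothetical helper.

```python
import math


def cubic_bezier(p0, p1, p2, p3, t):
    """Point on a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def sample_curve(ctrl, n=200):
    """n evenly spaced parameter samples along the curve with control points ctrl."""
    return [cubic_bezier(*ctrl, i / (n - 1)) for i in range(n)]


def trace_matches(ctrl, user_points, tolerance=5.0):
    """Hypothetical check: every drawn point lies near the curve and both ends are covered."""
    if not user_points:
        return False
    samples = sample_curve(ctrl)

    def dist_to_curve(pt):
        return min(math.dist(pt, s) for s in samples)

    near = all(dist_to_curve(pt) <= tolerance for pt in user_points)
    covers_start = min(math.dist(ctrl[0], pt) for pt in user_points) <= tolerance
    covers_end = min(math.dist(ctrl[3], pt) for pt in user_points) <= tolerance
    return near and covers_start and covers_end
```

Checking coverage of both endpoints stops a user from drawing only a short fragment of the curve. A touch-screen implementation would probably need a looser tolerance than a mouse-based one.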
-
Register Spilling for Specific Application Domains in Application Specific Instruction-set Processors
Authors:
M. G. G. C. R. Salgado,
R. G. Ragel
Abstract:
An Application Specific Instruction-set Processor (ASIP) is an important component in designing embedded systems. One of the problems in designing an instruction set for such processors is determining the number of registers needed to optimize computational time and cost. The performance of a processor may fall short due to register spilling, which is caused by a lack of available registers. From a design perspective, avoiding register spilling by choosing an appropriate number of registers for an ASIP results in processors with high performance and low power consumption. However, it has not yet been clearly established how the number of registers needed varies across application domains. In this paper, we evaluated whether different application domains have a significant effect on register spilling, and therefore on processor performance. If they do, different numbers of registers could be used when building ASIPs for different application domains, rather than a constant set of registers. Such utilization of registers will result in processors with high performance, low cost and low power consumption.
Submitted 24 December, 2014;
originally announced December 2014.
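Whether a code region spills can be estimated from its register pressure. The sketch below is a simplification and not the paper's methodology. It assumes each value is given as a live interval (definition index, last-use index), both inclusive. It computes the maximum number of simultaneously live values (MaxLive). It then reports how many values would not fit in a register file of `num_registers` at the point of peak pressure under that simple model.

```python
def max_live(intervals):
    """Maximum number of simultaneously live values; intervals are inclusive (start, end)."""
    events = []
    for start, end in intervals:
        events.append((start, 1))     # value becomes live at its definition
        events.append((end + 1, -1))  # value dies right after its last use
    live = peak = 0
    # At equal positions, -1 sorts before +1, so a register freed at i can be reused at i.
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak


def spill_estimate(intervals, num_registers):
    """Values that cannot stay in registers at the point of peak pressure."""
    return max(0, max_live(intervals) - num_registers)
```

Collecting MaxLive per benchmark and grouping the results by application domain is one simple way to see whether register needs differ between domains.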
-
Heterogeneous processor pipeline for a product cipher application
Authors:
I. B. Nawinne,
M. S. Wickramasinghe,
R. G. Ragel,
S. Radhakrishnan
Abstract:
Processing data received as a stream is a task commonly performed by modern embedded devices in a wide range of applications, such as multimedia (encoding/decoding/playing media), networking (switching and routing), digital security and scientific data processing. Such processing tends to be computation intensive and therefore requires significant processing power. Hardware acceleration methods that increase the performance of such applications are therefore an important area of study. In this paper, we present an evaluation of one such method for processing streaming data, namely a multiprocessor pipeline architecture. The hardware is based on a Multiple-Processor System on Chip (MPSoC), with a data encryption algorithm as the case study. The algorithm is partitioned at a coarse-grained level and mapped onto an MPSoC with five processor cores in a pipeline, using specifically configured Xtensa LX3 cores. The system is then selectively optimized by strengthening and pruning the resources of each processor core. The optimized system is evaluated and compared against an optimal single-processor System on Chip (SoC) for the same application. For the data encryption algorithms used, the multiprocessor pipeline system provided significant speedups of up to 4.45 times that of the single-processor system. This is close to the ideal speedup of a five-stage pipeline.
Submitted 28 March, 2014;
originally announced March 2014.
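A simple pipeline throughput model explains why 4.45x counts as close to the ideal for five stages. The model assumes the pipeline processes `n_blocks` independent data blocks and that each stage takes a fixed time per block. It also assumes inter-core communication is free. The stage times used in the usage checks are hypothetical and are not measurements from the paper.

```python
def pipeline_speedup(stage_times, n_blocks):
    """Speedup of a k-stage pipeline over running all stages on one core."""
    total = sum(stage_times)           # time per block on a single core
    bottleneck = max(stage_times)      # the slowest stage sets the pipeline's rate
    sequential = n_blocks * total
    pipelined = total + (n_blocks - 1) * bottleneck  # fill once, then one block per bottleneck period
    return sequential / pipelined
```

With balanced stages the speedup approaches the number of stages, 5 here. When one stage is slower, the limit drops to sum(stage_times) / max(stage_times). Balancing the stages is therefore one plausible motivation for tuning each core's resources.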
-
Countermeasures against Bernstein's remote cache timing attack
Authors:
Janaka Alawatugoda,
Darshana Jayasinghe,
Roshan Ragel
Abstract:
A cache timing attack is a type of side-channel attack in which an attacker uses timing information leaked by the cache behaviour of a cryptosystem to break the system. The Advanced Encryption Standard (AES) was considered a secure encryption standard until 2005, when Daniel Bernstein claimed that the software implementation of AES is vulnerable to a cache timing attack. Bernstein demonstrated a remote cache timing attack on a software implementation of AES. The original AES implementation can be methodically altered to prevent the cache timing attack by hiding the natural cache-timing pattern during encryption while preserving its semantics. These alterations should prevent the attack without making the implementation much slower. In this paper, we report the outcomes of our experiments on designing and implementing a number of possible countermeasures.
Submitted 28 March, 2014;
originally announced March 2014.
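One family of countermeasures hides the cache-timing pattern by making every table lookup touch the same memory locations, whatever the secret index is. The sketch below shows only that access pattern. It reads every entry of a 256-entry byte table and selects the wanted entry with arithmetic masking instead of a secret-dependent index. It illustrates the idea and is not one of the paper's implementations. Python itself gives no constant-time guarantees, so a real countermeasure would be written in C or assembly.

```python
def masked_lookup(table, secret_index):
    """Return table[secret_index] while reading all 256 entries in the same order."""
    assert len(table) == 256, "mask trick below assumes a 256-entry table of bytes"
    result = 0
    for i, value in enumerate(table):
        diff = i ^ secret_index               # 0 only for the wanted entry
        mask = ((diff - 1) >> 8) & 0xFF       # 0xFF when diff == 0, else 0 (no branch on the secret)
        result |= value & mask
    return result
```

The cost is 256 reads per lookup instead of one. This slowdown is the kind of trade-off the abstract refers to when it says countermeasures should not make the implementation much slower.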
-
Tile optimization for area in FPGA based hardware acceleration of peptide identification
Authors:
S. M. Vidanagamachchi,
S. D. Dewasurendra,
R. G. Ragel,
M. Niranjan
Abstract:
Advances in the life sciences over the last few decades have led to the generation of a huge amount of biological data. Computing research has become a vital part of driving biological discovery wherever the analysis and categorization of biological data are involved. String matching algorithms can be applied to protein/gene sequence matching. With the phenomenal increase in the size of the string databases to be analyzed, software implementations of these algorithms seem to have hit a hard limit, and hardware acceleration is increasingly being sought. Several hardware platforms, such as Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs) and Chip Multi-Processors (CMPs), are being explored for this purpose. In this paper, we give a comprehensive overview of the literature on hardware acceleration of string matching algorithms. We then carry out an FPGA hardware exploration and shorten the design time with a design automation technique. Further, our design automation is optimized for better hardware utilization by optimizing the number of peptides that can be represented in an FPGA tile. The results, reported in this paper, indicate significant improvements in design time and hardware utilization.
Submitted 28 March, 2014;
originally announced March 2014.
-
Improving the throughput of the AES algorithm with multicore processors
Authors:
A. Barnes,
R. Fernando,
K. Mettananda,
R. G. Ragel
Abstract:
AES, the Advanced Encryption Standard, can be considered the most widely used modern symmetric key encryption standard. To encrypt or decrypt a file using the AES algorithm, the file must undergo a set of complex computational steps. Therefore, a software implementation of the AES algorithm would be slow and time consuming. The immense increase in both stored and transferred data in recent years has made this problem even more daunting when such data must be encrypted or decrypted. As a solution, in this paper we present an extensive study of enhancing the throughput of the AES encryption algorithm using state-of-the-art multicore architectures. We take a sequential program that implements the AES algorithm and convert it to run on multicore architectures with minimum effort. We implement two different parallel programs, one using the fork system call in Linux and the other using pthreads, the POSIX standard for threads. We then ran both versions of the parallel program on different multicore architectures, and compared and analysed their throughputs across implementations and architectures. The pthreads implementation performed better in all the experiments we conducted. The best throughput obtained was around 7 Gbps with the pthreads implementation on a 32-core processor, the largest number of cores we had.
Submitted 28 March, 2014;
originally announced March 2014.
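Both the fork and pthreads versions depend on splitting the input into independent pieces. The work split can be sketched as below. It assumes a mode in which 16-byte AES blocks can be processed independently, for example ECB or CTR; the abstract does not state the mode. `partition` is a hypothetical helper that gives each worker a contiguous, block-aligned byte range. The per-block cipher itself is left out.

```python
AES_BLOCK = 16  # AES block size in bytes


def partition(n_bytes, n_workers, block=AES_BLOCK):
    """Split n_bytes into at most n_workers contiguous, block-aligned (start, end) ranges."""
    n_blocks = -(-n_bytes // block)  # ceiling division: a final partial block still needs a worker
    base, extra = divmod(n_blocks, n_workers)
    ranges, start_block = [], 0
    for w in range(n_workers):
        count = base + (1 if w < extra else 0)  # spread the leftover blocks over the first workers
        if count == 0:
            continue
        start = start_block * block
        end = min((start_block + count) * block, n_bytes)
        ranges.append((start, end))
        start_block += count
    return ranges
```

Each (start, end) range could be handed to one forked process or one pthread. Because the boundaries fall on block edges, no AES block is split between two workers.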
-
Accelerating string matching for bio-computing applications on multi-core CPUs
Authors:
D. Herath,
C. Lakmali,
R. G. Ragel
Abstract:
Huge amounts of data in the form of strings are handled in bio-computing applications, and searching algorithms are used very frequently in them. Many methods using both software and hardware have been proposed to accelerate the processing of such data. Typical hardware-based acceleration techniques either require special hardware, such as general-purpose graphics processing units (GPGPUs), or require building new hardware, such as an FPGA-based design. On the other hand, software-based acceleration techniques are easier, since they only require some changes to the software code or the software architecture. Typical software-based techniques use computers connected over a network, also known as a network grid, to accelerate the processing. In this paper, we test the hypothesis that multi-core architectures should provide better performance for this kind of computation, although the gain would depend on the algorithm selected and on the programming model used. We present the acceleration of a string-searching algorithm on a multi-core CPU via a POSIX-thread-based implementation. Our implementation on an 8-core processor (supporting 16 threads) achieved a 9x throughput improvement compared to a single-threaded implementation.
Submitted 28 March, 2014;
originally announced March 2014.
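A common way to parallelize a single-pattern search is to give each thread its own range of start positions. Each thread may also read up to len(pattern) - 1 characters past the end of its range, so matches that straddle a boundary are not lost. The sketch below assumes that scheme; the abstract does not say which string-searching algorithm or partitioning the paper used. It uses Python threads only to show the structure. Because of CPython's global interpreter lock, it will not reproduce the paper's pthreads speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_range(text, pattern, lo, hi):
    """Start positions p of pattern with lo <= p < hi; the window extends len(pattern)-1 past hi."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    end = min(hi + len(pattern) - 1, len(text))
    hits = []
    pos = text.find(pattern, lo, end)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1, end)  # pos + 1 keeps overlapping matches
    return hits


def parallel_find(text, pattern, n_threads=4):
    """Split start positions into n_threads ranges and search them concurrently."""
    n = len(text)
    size = -(-n // n_threads)  # ceiling division
    bounds = [(i * size, min((i + 1) * size, n)) for i in range(n_threads) if i * size < n]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = pool.map(lambda b: find_in_range(text, pattern, *b), bounds)
        return sorted(p for part in parts for p in part)
```

In a C/pthreads version, the same ranges would be passed to `pthread_create` and each thread would write its hits to a private buffer. This avoids locking in the inner loop.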
-
Constant time encryption as a countermeasure against remote cache timing attacks
Authors:
D. Jayasinghe,
R. G. Ragel,
D. Elkaduwe
Abstract:
Rijndael was standardized in 2001 by the National Institute of Standards and Technology (NIST) as the Advanced Encryption Standard (AES). AES is still used to encrypt financial, military and even confidential government data. In 2005, Bernstein demonstrated a remote cache timing attack on AES using a client-server architecture, thereby showing that its software implementation has a side channel. Over the years, a number of hardware and software countermeasures against cache timing attacks have been proposed. Although software-based countermeasures are flexible and easy to deploy, most of them are vulnerable to statistical analysis. In this paper, we propose a novel software-based countermeasure against cache timing attacks, known as constant time encryption, which we believe is secure against statistical analysis. The proposed countermeasure reschedules instructions so that the encryption rounds consume constant time, independent of cache hits and misses. Through experiments, we demonstrate that our countermeasure is secure against Bernstein's cache timing attack.
Submitted 28 March, 2014;
originally announced March 2014.
-
Instruction-set Selection for Multi-application based ASIP Design: An Instruction-level Study
Authors:
R. G. Ragel,
Swarnalatha Radhakrishnan,
Angelo Ambrose
Abstract:
Efficiency in embedded systems is paramount to achieving high performance while consuming less area and power. Processors in embedded systems must be designed carefully to meet such design constraints. Application Specific Instruction-set Processors (ASIPs) exploit the nature of applications to design an optimal instruction set. Although they are not general enough to execute any application, ASIPs are highly preferred in the embedded systems industry, where devices are produced to serve a certain type of application domain(s) (either intra-domain or inter-domain). Typically, ASIPs are designed from a base processor, and functionality is added for the target applications. This paper studies multi-application ASIPs and their instruction sets, extensively analysing the instructions for inter-domain and intra-domain designs. The metrics analysed are the reusable instructions and the extra cost of adding a certain application. A wide range of applications from various benchmarks (MiBench, MediaBench and SPEC2006) and domains are analysed for two different architectures (ARM-Thumb and PISA). Our study shows that intra-domain applications contain a larger number of common instructions, whereas inter-domain applications have very few common instructions, regardless of the architecture (and therefore the ISA).
Submitted 28 March, 2014;
originally announced March 2014.
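The two metrics named in this abstract are reusable instructions and the extra cost of adding an application. Both can be expressed as set operations over the instructions each application uses. The sketch assumes that each application's instruction usage has already been extracted as a set of mnemonics, for example from a simulator trace. The mnemonics and application names in the checks are made up for illustration.

```python
def common_instructions(app_isas):
    """Instructions used by every application: the reusable core."""
    sets = list(app_isas.values())
    return set.intersection(*sets) if sets else set()


def incremental_cost(base_isa, new_app_isa):
    """Instructions that must be added to a base ISA to support a new application."""
    return new_app_isa - base_isa


def reuse_ratio(app_isas):
    """Fraction of all used instructions that every application shares."""
    union = set().union(*app_isas.values())
    return len(common_instructions(app_isas)) / len(union) if union else 0.0
```

Under this model, a high reuse ratio within one domain and a low one across domains would match the paper's reported finding.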
-
Software implementation level countermeasures against the cache timing attack on advanced encryption standard
Authors:
U. Herath,
J. Alawatugoda,
R. G. Ragel
Abstract:
The Advanced Encryption Standard (AES) is a symmetric key encryption algorithm that is extensively used in secure electronic data transmission. Although it was tested and declared secure when introduced, in 2005 a researcher named Bernstein claimed that it is vulnerable to side-channel attacks. The cache-based timing attack demonstrated by Bernstein is a side-channel attack that exploits the timing variation between cache hits and misses. Such attacks can be prevented by masking the actual timing information from the attacker. This masking can be performed by altering the original AES software implementation while preserving its semantics. This paper presents possible software-implementation-level countermeasures against Bernstein's cache timing attack. Two simple software-based countermeasures based on the concept of "constant-encryption-time" were demonstrated against the remote cache timing attack with positive outcomes, establishing a secure environment for AES encryption.
Submitted 5 March, 2014;
originally announced March 2014.
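The "constant-encryption-time" idea can be illustrated by padding every encryption to a fixed time budget. An observer measuring latency then sees the same duration whether table lookups hit or miss in cache. This sketch shows the concept only and makes several assumptions. `encrypt_block` is a hypothetical stand-in for a real AES routine and is not a cipher. The budget is an assumed value that must exceed the worst-case encryption time. A Python busy-wait is also far too coarse to defeat a real timing attack.

```python
import time


def constant_time_call(fn, budget_s, *args):
    """Run fn(*args), then busy-wait so the total elapsed time is at least budget_s."""
    start = time.perf_counter()
    result = fn(*args)
    if time.perf_counter() - start > budget_s:
        # Returning early or late here would itself leak timing information.
        raise RuntimeError("time budget too small for the worst-case encryption")
    while time.perf_counter() - start < budget_s:
        pass  # spin until the fixed deadline
    return result


def encrypt_block(block, key):
    """Hypothetical placeholder for an AES block encryption (NOT a real cipher)."""
    return bytes(b ^ k for b, k in zip(block, key))
```

Padding trades throughput for uniform latency. The paper's companion work above instead pursues constant time by rescheduling instructions within the encryption rounds.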
-
Hardware accelerated protein inference framework
Authors:
S. M. Vidanagamachchi,
S. D. Dewasurendra,
R. G. Ragel
Abstract:
Protein inference plays a vital role in proteomics. Two major approaches can be used to handle the problem of protein inference: top-down and bottom-up. This paper presents a hardware-accelerated framework for protein inference that handles the most important step in a bottom-up approach, viz. peptide identification during the assembling process. In our framework, identified peptides and their probabilities are used to predict the most suitable reference protein cluster for a given input amino acid sequence. The framework is developed on an FPGA, where hardware/software co-design techniques are used to accelerate the computationally intensive parts of the protein inference process. In this paper, we measure and report the time taken for the protein inference process in our framework and compare it against a pure software implementation.
Submitted 5 March, 2014;
originally announced March 2014.
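This abstract says identified peptides and their probabilities are used to pick the most suitable reference protein cluster. One simple way to combine them is a noisy-OR score. This scoring rule is an assumption and is not taken from the paper. Under it, a cluster is supported unless every one of its identified peptides is a false identification, treating identifications as independent. The cluster and peptide names are hypothetical.

```python
import math


def noisy_or(probs):
    """P(at least one identification is correct), assuming independent identifications."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def best_cluster(clusters, peptide_probs):
    """clusters: name -> set of peptides; peptide_probs: identified peptide -> probability."""
    scores = {
        name: noisy_or([peptide_probs[p] for p in peptides if p in peptide_probs])
        for name, peptides in clusters.items()
    }
    return max(scores, key=scores.get), scores
```

A cluster with no identified peptides scores 0. Several moderately confident peptides can outweigh a single confident one, which is the behaviour this combination rule was chosen to show.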
-
Hardware software co-design of the Aho-Corasick algorithm: Scalable for protein identification?
Authors:
S. M. Vidanagamachchi,
S. D. Dewasurendra,
R. G. Ragel
Abstract:
Pattern matching is commonly required in many application areas, and bioinformatics is a major area of interest that requires both exact and approximate pattern matching. Much work has been done in this area, yet there is still significant room for improvement in efficiency, flexibility and throughput. This paper presents a hardware/software co-design of the Aho-Corasick algorithm on the Nios II soft processor and a study of its scalability for a pattern matching application. A software-only implementation is used as the baseline for comparing the throughput and scalability of the hardware/software co-design approach. Based on our results, we conclude that the hardware/software co-design implementation achieves a maximum speedup of 10 times over the software-only implementation, for a pattern size of 1200 peptides. The results also show that the hardware/software co-design approach scales better with increasing data size than the software-only approach.
Submitted 5 March, 2014;
originally announced March 2014.
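The Aho-Corasick algorithm works in three steps. It builds a trie of the patterns and adds failure links computed breadth-first. It then scans the text once, reporting every pattern that ends at each position. The sketch below is a plain software version of that textbook algorithm, included for reference. It does not show how the paper divided the work between Nios II software and custom hardware. The peptide strings in the checks are made up.

```python
from collections import deque


def build_automaton(patterns):
    """Trie with failure links; returns (goto, fail, out) lists indexed by state."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(pat)
    queue = deque(goto[0].values())  # depth-1 states fail back to the root
    while queue:
        s = queue.popleft()
        for ch, nxt in goto[s].items():
            queue.append(nxt)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]  # inherit matches that are suffixes
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every occurrence, in order of match end."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

The scan loop performs a small, fixed amount of work per input character. This regular structure is what makes the matching stage a natural candidate for hardware acceleration, while building the automaton can stay in software.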
-
Authorship detection of SMS messages using unigrams
Authors:
R. G. Ragel,
P. Herath,
U. Senanayake
Abstract:
SMS messaging is a popular medium of communication. Because of its popularity and privacy, it can be used for many illegal purposes. Additionally, since SMS messages are part of day-to-day life, they can be used as evidence in many legal disputes. Since a cellular phone might be accessible to people close to its owner, it is important to establish that the sender of a message is indeed the owner of the phone. For this purpose, the straightforward solution seems to be the use of popular stylometric methods. However, compared with the data used for stylometry in the literature, SMS messages have unusual characteristics that make it hard or impossible to apply these methods in a conventional way. Our goal is to develop a method of authorship detection for SMS messages that still gives usable accuracy. We argue that, among the methods of author attribution, the best one to apply to SMS messages is an n-gram method. To support our argument, we evaluated two different methods of distribution comparison with varying amounts of training and testing data. In particular, we test our algorithms with little testing data and a large number of candidate authors, which we believe to be the real-world scenario. We compare these results against controlled tests with fewer authors and selected SMS messages containing many words. To counter the lack of information in a single SMS message, we propose stacking several SMS messages together.
Submitted 5 March, 2014;
originally announced March 2014.
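A unigram approach of the kind argued for in this abstract can be sketched in three steps. First, build a word-frequency profile for each candidate author from stacked training messages. Second, build the same profile for a stack of unknown messages. Third, attribute the stack to the most similar author. The abstract compares two distribution-comparison methods without naming them. Cosine similarity is an assumption chosen here for illustration, and the messages in the checks are invented.

```python
import math
from collections import Counter


def unigram_profile(messages):
    """Relative word frequencies of a stack of messages."""
    counts = Counter(w for m in messages for w in m.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(p[w] * q.get(w, 0.0) for w in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def attribute(unknown_messages, training):
    """training: author -> list of messages. Returns the author with the most similar profile."""
    target = unigram_profile(unknown_messages)
    return max(training, key=lambda a: cosine(target, unigram_profile(training[a])))
```

Passing several unknown messages to `attribute` at once is the stacking idea. A single SMS often shares too few words with any profile to give a reliable score.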
-
Accelerating motif finding in DNA sequences with multicore CPUs
Authors:
P. Perera,
R. G. Ragel
Abstract:
Motif discovery in DNA sequences is a challenging task in molecular biology. In computational motif discovery, planted (l, d) motif finding is a widely studied problem, and numerous algorithms are available to solve it. Both hardware and software accelerators have been introduced to accelerate motif finding algorithms. However, using hardware accelerators such as FPGAs requires hardware specialists to design such systems. Software-based acceleration methods, on the other hand, are easier to implement than hardware acceleration techniques. Grid computing is one such software-based acceleration technique that has been used to accelerate motif finding. However, drawbacks such as network communication delays and the need for fast interconnection between nodes in the grid can limit its usage and scalability. Using multicore CPUs to accelerate CPU-intensive tasks is becoming increasingly popular and common. Multicore CPUs can therefore be employed to accelerate motif finding, potentially faster than grid-based acceleration. In this paper, we explore the use of multicore CPUs to accelerate motif finding. We accelerated the Skip-Brute Force algorithm on multicore CPUs by parallelizing it with the POSIX thread library. Our method yielded an average speedup of 34x on a 32-core processor, compared to a speedup of 21x for a grid-based implementation with 32 nodes.
Submitted 5 March, 2014;
originally announced March 2014.
-
AntiPlag: Plagiarism Detection on Electronic Submissions of Text Based Assignments
Authors:
M. A. C. Jiffriya,
M. A. C. Akmal Jahan,
R. G. Ragel,
S. Deegalla
Abstract:
Plagiarism is a growing issue in academia and is always a concern in universities and other academic institutions. The situation is becoming even worse with the availability of ample resources on the web. This paper focuses on creating an effective and fast tool for plagiarism detection in text-based electronic assignments. Our plagiarism detection tool, named AntiPlag, is developed using the tri-gram sequence matching technique. Three sets of text-based assignments were tested with AntiPlag, and the results were compared against an existing commercial plagiarism detection tool. Because of the pre-processing steps AntiPlag performs, it produced fewer false positives than the commercial tool. In addition, to reduce detection latency, AntiPlag applies a data clustering technique, making it four times faster than the commercial tool considered. AntiPlag can be used to easily separate plagiarized text-based assignments from non-plagiarized ones. Therefore, we present AntiPlag, a fast and effective tool for plagiarism detection on text-based electronic assignments.
Submitted 5 March, 2014;
originally announced March 2014.
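Tri-gram sequence matching can be sketched as comparing the sets of consecutive word triples in two documents. Two parts of the sketch below are assumptions for illustration: the pre-processing (lower-casing and dropping punctuation) and the Jaccard overlap score. The abstract does not give AntiPlag's exact pre-processing steps, similarity measure or threshold.

```python
import re


def trigrams(text):
    """Set of consecutive word triples after lower-casing and dropping punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap of the two documents' word tri-gram sets, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

The normalization step means copied text is still matched after trivial edits to case or punctuation. This is one example of how pre-processing can change what a tri-gram comparison detects.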
-
String Matching with Multicore CPUs: Performing Better with the Aho-Corasick Algorithm
Authors:
S. Arudchutha,
T. Nishanthy,
R. G. Ragel
Abstract:
Multiple string matching is the task of locating all occurrences of a given set of patterns in an arbitrary string. It is used in bio-computing applications, where such algorithms are commonly used for information retrieval tasks such as sequence analysis and gene/protein identification. Extremely large amounts of data in the form of strings have to be processed in such bio-computing applications. Therefore, improving the performance of multiple string matching algorithms is always desirable. Multicore architectures can provide better performance by parallelizing multiple string matching algorithms. The Aho-Corasick algorithm is commonly used for exact multiple string matching. The focus of this paper is the acceleration of the Aho-Corasick algorithm through a multicore-CPU-based software implementation. Through our implementation and evaluation, we show that our method performs better than the state of the art.
Submitted 5 March, 2014;
originally announced March 2014.
-
User Friendly Line CAPTCHAs
Authors:
A. K. B. Karunathilake,
B. M. D. Balasuriya,
R. G. Ragel
Abstract:
CAPTCHAs, or reverse Turing tests, are real-time assessments used by programs (or computers) to tell humans and machines apart. This is achieved by assigning and assessing hard AI problems that can be solved easily by humans but not by machines. Applications of such assessments range from stopping spammers from automatically filling in online forms to preventing hackers from performing dictionary attacks. Today, the race between makers and breakers of CAPTCHAs is at a juncture where the proposed CAPTCHAs are not even answerable by humans. We consider such CAPTCHAs non-user-friendly. In this paper, we propose a novel reverse Turing test technique, which we call Line CAPTCHAs. It focuses mainly on user friendliness without compromising the security expected of such a system.
Submitted 4 February, 2014;
originally announced February 2014.
-
Loop Unrolling in Multi-pipeline ASIP Design
Authors:
Rajitha Navarathna,
Swarnalatha Radhakrishnan,
Roshan Ragel
Abstract:
The Application Specific Instruction-set Processor (ASIP) is a popular processor design technique for embedded systems that allows customizability in processor design without overly hindering design flexibility. Multi-pipeline ASIPs were proposed to improve the performance of such systems by trading off speed against processor area. One of the problems in multi-pipeline design is the limited instruction level parallelism (ILP) inherent in applications. The ILP of application programs can be improved via a compiler optimization technique known as loop unrolling. In this paper, we present how loop unrolling affects the performance of multi-pipeline ASIPs. The performance improvements average around 15% across a number of benchmark applications, with a maximum improvement of around 30%. In addition, we analyzed how performance varies with the loop unrolling factor, which is the amount of unrolling performed.
Submitted 4 February, 2014;
originally announced February 2014.
-
Axis2UNO: Web Services Enabled Openoffice.org
Authors:
B. A. N. M. Bambarasinghe,
H. M. S. Huruggamuwa,
R. G. Ragel,
S. Radhakrishnan
Abstract:
Openoffice.org is a popular, free and open source office product. It is used by millions of people and developed, maintained and extended by thousands of developers worldwide. Web services technology plays a dominant role on the web and serves millions of people every day. Axis2 is one of the most popular free and open source web service engines. The framework presented in this paper, Axis2UNO, combines these two technologies and is capable of ushering in a new era in the office environment. Two other attempts to add web services functionality to office products are Excel Web Services and UNO Web Service Proxy. Excel Web Services is combined with Microsoft SharePoint technology and offers information sharing from a different perspective within the proprietary Microsoft Office products. UNO Web Service Proxy is implemented with the Java Web Services Developer Pack and enables basic web services functionality in Openoffice.org. However, the work presented here is the first to combine Openoffice.org and Axis2, and we expect it to outperform the other efforts owing to the community involvement in, and feature richness of, these two products.
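Web services hosted by engines such as Axis2 are commonly invoked with SOAP messages. As a rough sketch of what an office-side client would send, the Python code below builds a SOAP 1.1 request envelope with the standard library's `xml.etree.ElementTree`. The operation name `getCellValue`, its parameters and the `http://example.org/sheet` namespace are hypothetical placeholders; this is not the Axis2UNO API, and transport (for example an HTTP POST to the service endpoint) is left out.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP11_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(operation, service_ns, params):
    """Return a SOAP 1.1 request envelope as UTF-8 encoded bytes.

    `operation` becomes the single child of <Body>; each entry in `params`
    becomes a child element of the operation, in the service namespace.
    """
    ET.register_namespace("soapenv", SOAP11_ENV)  # prefix used on serialization
    envelope = ET.Element(f"{{{SOAP11_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP11_ENV}}}Body")
    op = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(op, f"{{{service_ns}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


# Hypothetical call: ask a service for the value of one spreadsheet cell.
request = build_soap_request(
    "getCellValue", "http://example.org/sheet", {"sheet": "Sheet1", "cell": "A1"}
)
print(request.decode("utf-8"))
```

Building the envelope with an XML library rather than string concatenation keeps namespaces and escaping of parameter values correct.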
Submitted 4 February, 2014;
originally announced February 2014.
-
High Throughput Virtual Screening with Data Level Parallelism in Multi-core Processors
Authors:
Upul Senanayake,
Rahal Prabuddha,
Roshan Ragel
Abstract:
Improving the throughput of molecular docking, a computationally intensive phase of the virtual screening process, is a highly sought-after research goal, since docking carries significant weight in the drug design process. With such improvements, cures for currently incurable diseases such as HIV and cancer might be found sooner. The approach presented in this paper uses a multi-core environment to introduce Data Level Parallelism (DLP) into Autodock Vina, a widely used molecular docking program. Autodock Vina already exploits Instruction Level Parallelism (ILP) in multi-core environments and is therefore already optimized for such environments. Nevertheless, our results clearly show that our approach increases the throughput of this already optimized software by more than six times. Together with the current shift in processor technology from multi-core to many-core, this should dramatically reduce the time spent on the lead identification phase of drug design. We therefore believe that this work will make it possible to dock a larger number of small molecules against a drug target, improving the chances of designing drugs for incurable diseases.
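The data-level parallelism described above rests on the fact that docking one ligand does not depend on docking any other, so a ligand library can be split across cores. The Python sketch below shows that pattern with `concurrent.futures`. It assumes a hypothetical `dock_ligand` function that stands in for a single docking run and returns a dummy score; a real pipeline would launch the docking program for each ligand and parse its output. This is one plausible arrangement, not the partitioning scheme used in the paper.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def dock_ligand(ligand_id):
    """Hypothetical stand-in for one docking run against a fixed target.

    Returns (ligand_id, score). The score is a deterministic dummy value so
    the sketch runs without any docking software installed.
    """
    score = -float(sum(ord(c) for c in ligand_id) % 100) / 10.0
    return ligand_id, score


def screen(ligand_ids, workers=None):
    """Dock every ligand independently, spreading the ligands over workers.

    Because the runs share no data, the ligand list is the unit of
    parallelism: each worker simply takes the next ligand.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(dock_ligand, ligand_ids))
    # Most negative (best estimated binding) score first.
    return sorted(results, key=lambda r: r[1])


ligands = [f"ligand_{i:03d}" for i in range(20)]
ranked = screen(ligands)
for ligand, score in ranked[:5]:
    print(ligand, score)
```

A thread pool suffices when each worker mostly waits on an external docking process, since the heavy computation then happens outside the Python interpreter; for docking code running in pure Python, `ProcessPoolExecutor` would be the natural swap.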
Submitted 3 December, 2013;
originally announced December 2013.