-
Free-Space Optical Channel Turbulence Prediction: A Machine Learning Approach
Authors:
Md Zobaer Islam,
Ethan Abele,
Fahim Ferdous Hossain,
Arsalan Ahmad,
Sabit Ekin,
John F. O'Hara
Abstract:
Channel turbulence presents a formidable obstacle for free-space optical (FSO) communication. Anticipation of turbulence levels is highly important for mitigating disruptions. We study the application of machine learning (ML) to FSO data streams to rapidly predict channel turbulence levels with no additional sensing hardware. An optical bit stream was transmitted through a controlled channel in the lab under six distinct turbulence levels, and the efficacy of using ML to classify turbulence levels was examined. ML-based turbulence level classification was found to be >98% accurate with multiple ML training parameters, but highly dependent upon the timescale of changes between turbulence levels.
Submitted 26 May, 2024;
originally announced May 2024.
-
Multi-hop graph transformer network for 3D human pose estimation
Authors:
Zaedul Islam,
A. Ben Hamza
Abstract:
Accurate 3D human pose estimation is a challenging task due to occlusion and depth ambiguity. In this paper, we introduce a multi-hop graph transformer network designed for 2D-to-3D human pose estimation in videos by leveraging the strengths of multi-head self-attention and multi-hop graph convolutional networks with disentangled neighborhoods to capture spatio-temporal dependencies and handle long-range interactions. The proposed network architecture consists of a graph attention block composed of stacked layers of multi-head self-attention and graph convolution with learnable adjacency matrix, and a multi-hop graph convolutional block comprised of multi-hop convolutional and dilated convolutional layers. The combination of multi-head self-attention and multi-hop graph convolutional layers enables the model to capture both local and global dependencies, while the integration of dilated convolutional layers enhances the model's ability to handle spatial details required for accurate localization of the human body joints. Extensive experiments demonstrate the effectiveness and generalization ability of our model, achieving competitive performance on benchmark datasets.
Submitted 5 May, 2024;
originally announced May 2024.
-
TrialDura: Hierarchical Attention Transformer for Interpretable Clinical Trial Duration Prediction
Authors:
Ling Yue,
Jonathan Li,
Md Zabirul Islam,
Bolun Xia,
Tianfan Fu,
**tai Chen
Abstract:
The clinical trial process, also known as drug development, is an indispensable step toward the development of new treatments. The major objective of interventional clinical trials is to assess the safety and effectiveness of drug-based treatment in treating certain diseases in the human body. However, clinical trials are lengthy, labor-intensive, and costly. The duration of a clinical trial is a crucial factor that influences overall expenses. Therefore, effective management of the timeline of a clinical trial is essential for controlling the budget and maximizing the economic viability of the research. To address this issue, we propose TrialDura, a machine learning-based method that estimates the duration of clinical trials using multimodal data, including disease names, drug molecules, trial phases, and eligibility criteria. Then, we encode them into Bio-BERT embeddings specifically tuned for biomedical contexts to provide a deeper and more relevant semantic understanding of clinical trial data. Finally, the model's hierarchical attention mechanism connects all of the embeddings to capture their interactions and predict clinical trial duration. Our proposed model demonstrated superior performance with a mean absolute error (MAE) of 1.04 years and a root mean square error (RMSE) of 1.39 years compared to the other models, indicating more accurate clinical trial duration prediction. Publicly available code can be found at https://anonymous.4open.science/r/TrialDura-F196
Submitted 19 April, 2024;
originally announced April 2024.
-
Analyzing the Dynamics of COVID-19 Lockdown Success: Insights from Regional Data and Public Health Measures
Authors:
Md. Motaleb Hossen Manik,
Md. Ahsan Habib,
Md. Zabirul Islam,
Tanim Ahmed,
Fabliha Haque
Abstract:
The COVID-19 pandemic caused by the coronavirus had a significant effect on social, economic, and health systems globally. The virus emerged in Wuhan, China, and spread worldwide, resulting in severe disease, death, and social interference. Countries implemented lockdowns in various regions to limit the spread of the virus. Some of them were successful and some failed. Several factors played a vital role in their success, but most of these factors and their correlations remained unidentified. In this paper, we identify the factors that contributed to the success of lockdowns during the COVID-19 pandemic and explore the correlations among them. Moreover, this paper proposes several strategies to control any pandemic situation in the future. It explores the relationships among variables such as population density, the numbers of infected, deceased, and recovered patients, and the success or failure of the lockdown in different regions of the world. The findings suggest a strong correlation among these factors and indicate that the spread of similar kinds of viruses can be reduced in the future by implementing several safety measures.
Submitted 24 February, 2024;
originally announced February 2024.
-
Malicious Package Detection using Metadata Information
Authors:
S. Halder,
M. Bewong,
A. Mahboubi,
Y. Jiang,
R. Islam,
Z. Islam,
R. Ip,
E. Ahmed,
G. Ramachandran,
A. Babar
Abstract:
Protecting software supply chains from malicious packages is paramount in the evolving landscape of software development. Attacks on the software supply chain involve attackers injecting harmful software into commonly used packages or libraries in a software repository. For instance, JavaScript uses Node Package Manager (NPM), and Python uses Python Package Index (PyPI) as their respective package repositories. In the past, NPM has had vulnerabilities such as the event-stream incident, where a malicious package was introduced into a popular NPM package, potentially impacting a wide range of projects. As the integration of third-party packages becomes increasingly ubiquitous in modern software development, accelerating the creation and deployment of applications, the need for a robust detection mechanism has become critical. On the other hand, due to the sheer volume of new packages being released daily, the task of identifying malicious packages presents a significant challenge. To address this issue, in this paper, we introduce a metadata-based malicious package detection model, MeMPtec. This model extracts a set of features from package metadata information. These extracted features are classified as either easy-to-manipulate (ETM) or difficult-to-manipulate (DTM) features based on monotonicity and restricted control properties. By utilising these metadata features, we not only improve the effectiveness of detecting malicious packages, but also demonstrate the model's resistance to adversarial attacks in comparison with the existing state of the art. Our experiments indicate a significant reduction in both false positives (up to 97.56%) and false negatives (up to 91.86%).
Submitted 12 February, 2024;
originally announced February 2024.
-
Novel Representation Learning Technique using Graphs for Performance Analytics
Authors:
Tarek Ramadan,
Ankur Lahiry,
Tanzima Z. Islam
Abstract:
The performance analytics domain in High Performance Computing (HPC) uses tabular data to solve regression problems, such as predicting the execution time. Existing Machine Learning (ML) techniques leverage the correlations among features given tabular datasets, not leveraging the relationships between samples directly. Moreover, since high-quality embeddings from raw features improve the fidelity of the downstream predictive models, existing methods rely on extensive feature engineering and pre-processing steps, costing time and manual effort. To fill these two gaps, we propose a novel idea of transforming tabular performance data into graphs to leverage the advancement of Graph Neural Network-based (GNN) techniques in capturing complex relationships between features and samples. In contrast to other ML application domains, such as social networks, the graph is not given; instead, we need to build it. To address this gap, we propose graph-building methods where nodes represent samples, and the edges are automatically inferred iteratively based on the similarity between the features in the samples. We evaluate the effectiveness of the generated embeddings from GNNs based on how well they make even a simple feed-forward neural network perform for regression tasks compared to other state-of-the-art representation learning techniques. Our evaluation demonstrates that even with up to 25% random missing values for each dataset, our method outperforms commonly used graph and Deep Neural Network (DNN)-based approaches and achieves up to 61.67% & 78.56% improvement in MSE loss over the DNN baseline respectively for HPC dataset and Machine Learning Datasets.
Submitted 19 January, 2024;
originally announced January 2024.
-
Evolutionary Optimization of 1D-CNN for Non-contact Respiration Pattern Classification
Authors:
Md Zobaer Islam,
Sabit Ekin,
John F. O'Hara,
Gary Yen
Abstract:
In this study, we present a deep learning-based approach for time-series respiration data classification. The dataset contains regular breathing patterns as well as various forms of abnormal breathing, obtained through non-contact incoherent light-wave sensing (LWS) technology. Given the one-dimensional (1D) nature of the data, we employed a 1D convolutional neural network (1D-CNN) for classification purposes. A genetic algorithm was employed to optimize the 1D-CNN architecture to maximize classification accuracy. To address the computational complexity associated with training the 1D-CNN across multiple generations, we implemented transfer learning from a pre-trained model. This approach significantly reduced the computational time required for training, thereby enhancing the efficiency of the optimization process. This study contributes valuable insights into the potential applications of deep learning methodologies for enhancing respiratory anomaly detection through precise and efficient respiration classification.
Submitted 16 April, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Seizure detection from Electroencephalogram signals via Wavelets and Graph Theory metrics
Authors:
Paul Grant,
Md Zahidul Islam
Abstract:
Epilepsy is one of the most prevalent neurological conditions, where an epileptic seizure is a transient occurrence due to abnormal, excessive and synchronous activity in the brain. Electroencephalogram signals emanating from the brain may be captured and analysed, and then play a significant role in the detection and prediction of epileptic seizures. In this work we improve upon a previous approach that relied on the differing properties of the wavelet transform. Here we apply the Maximum Overlap Discrete Wavelet Transform both to reduce signal noise and to use the signal variance exhibited at differing inherent frequency levels to develop various metrics of connection between the electrodes placed upon the scalp. Using short duration epochs, to approximate close to real time monitoring, together with simple statistical parameters derived from the reconstructed noise reduced signals, we initiate seizure detection. To further improve performance we utilise graph theoretic indicators from derived electrode connectivity. From there we build the attribute space. We utilise open-source software and publicly available data to highlight the superior Recall/Sensitivity performance of our approach, when compared to existing published methods.
Submitted 27 November, 2023;
originally announced December 2023.
-
Darknet Traffic Analysis: A Systematic Literature Review
Authors:
Javeriah Saleem,
Rafiqul Islam,
Zahidul Islam
Abstract:
The primary objective of an anonymity tool is to protect the anonymity of its users through the implementation of strong encryption and obfuscation techniques. As a result, it becomes very difficult to monitor and identify users' activities on these networks. Moreover, such systems have strong defensive mechanisms to protect users against potential risks, including the extraction of traffic characteristics and website fingerprinting. However, the strong anonymity feature also functions as a refuge for those involved in illicit activities who aim to avoid being traced on the network. As a result, a substantial body of research has been undertaken to examine and classify encrypted traffic using machine learning techniques. This paper presents a comprehensive examination of the existing approaches utilized for the categorization of anonymous traffic as well as encrypted network traffic inside the darknet. It also presents a comprehensive analysis of machine learning-based methods for monitoring darknet traffic and identifying attacks inside the darknet.
Submitted 27 November, 2023;
originally announced November 2023.
-
Respiratory Anomaly Detection using Reflected Infrared Light-wave Signals
Authors:
Md Zobaer Islam,
Brenden Martin,
Carly Gotcher,
Tyler Martinez,
John F. O'Hara,
Sabit Ekin
Abstract:
In this study, we present a non-contact respiratory anomaly detection method using incoherent light-wave signals reflected from the chest of a mechanical robot that can breathe like human beings. In comparison to existing radar and camera-based sensing systems for vitals monitoring, this technology uses only a low-cost ubiquitous infrared light source and sensor. This light-wave sensing system recognizes different breathing anomalies from the variations of light intensity reflected from the chest of the robot within a 0.5m-1.5m range with an average classification accuracy of up to 96.6% using machine learning.
Submitted 22 April, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
-
Critical Role of Artificially Intelligent Conversational Chatbot
Authors:
Seraj A. M. Mostafa,
Md Z. Islam,
Mohammad Z. Islam,
Fairose Jeehan,
Saujanna Jafreen,
Raihan U. Islam
Abstract:
Artificially intelligent chatbots, such as ChatGPT, represent a recent and powerful advancement in the AI domain. Users prefer them for obtaining quick and precise answers, avoiding the usual hassle of clicking through multiple links in traditional searches. ChatGPT's conversational approach makes it comfortable and accessible for finding answers quickly and in an organized manner. However, it is important to note that these chatbots have limitations, especially in the accuracy of their answers, and that they raise ethical concerns. In this study, we explore various scenarios involving ChatGPT's ethical implications within academic contexts, its limitations, and the potential misuse by specific user groups. To address these challenges, we propose architectural solutions aimed at preventing inappropriate use and promoting responsible AI interactions.
Submitted 31 October, 2023;
originally announced October 2023.
-
Predicting Temperature of Major Cities Using Machine Learning and Deep Learning
Authors:
Wasiou Jaharabi,
MD Ibrahim Al Hossain,
Rownak Tahmid,
Md. Zuhayer Islam,
T. M. Saad Rayhan
Abstract:
Currently, the issue that most concerns world leaders is climate change, because of its effects on agriculture, the environment, and the economies of daily life. To combat this, accurate temperature prediction is vital. So far, the most effective and widely used approach to such forecasting is numerical weather prediction (NWP), a mathematical model that needs broad data from different applications to make predictions. This expensive, time- and labor-consuming work can be reduced by making such predictions with machine learning algorithms. Using the database compiled by the University of Dayton, which records temperature changes in major cities, we apply time series analysis, using LSTM to turn existing data into a tool for future prediction. LSTM takes the long-term data, as well as any short-term exceptions or anomalies that may have occurred, and calculates the trend, seasonality, and stationarity of the data. By using models such as ARIMA, SARIMA, and Prophet together with the concepts of RNN and LSTM, we can filter out abnormalities, preprocess the data, compare it with previous trends, and predict future trends. Seasonality and stationarity also help us analyze recurrence over a one-year period and remove the time dependence of the data, so that the general predicted changes can be seen. In this way, we were able to predict the temperature of different cities at any future time based on available data, and built a method for accurate prediction. This document describes our methodology for making such predictions.
Submitted 23 September, 2023;
originally announced September 2023.
-
TRIVEA: Transparent Ranking Interpretation using Visual Explanation of Black-Box Algorithmic Rankers
Authors:
Jun Yuan,
Kaustav Bhattacharjee,
Akm Zahirul Islam,
Aritra Dasgupta
Abstract:
Ranking schemes drive many real-world decisions, like, where to study, whom to hire, what to buy, etc. Many of these decisions often come with high consequences. For example, a university can be deemed less prestigious if not featured in a top-k list, and consumers might not even explore products that do not get recommended to buyers. At the heart of most of these decisions are opaque ranking schemes, which dictate the ordering of data entities, but their internal logic is inaccessible or proprietary. Drawing inferences about the ranking differences is like a guessing game to the stakeholders, like, the rankees (i.e., the entities who are ranked, like product companies) and the decision-makers (i.e., who use the rankings, like buyers). In this paper, we aim to enable transparency in ranking interpretation by using algorithmic rankers that learn from available data and by enabling human reasoning about the learned ranking differences using explainable AI (XAI) methods. To realize this aim, we leverage the exploration-explanation paradigm of human-data interaction to let human stakeholders explore subsets and groupings of complex multi-attribute ranking data using visual explanations of model fit and attribute influence on rankings. We realize this explanation paradigm for transparent ranking interpretation in TRIVEA, a visual analytic system that is fueled by: i) visualizations of model fit derived from algorithmic rankers that learn the associations between attributes and rankings from available data and ii) visual explanations derived from XAI methods that help abstract important patterns, like, the relative influence of attributes in different ranking ranges. Using TRIVEA, end users not trained in data science have the agency to transparently reason about the global and local behavior of the rankings without the need to open black-box ranking models and develop confidence in the resulting attribute-based inferences.
We demonstrate the efficacy of TRIVEA using multiple usage scenarios and subjective feedback from researchers with diverse domain expertise.
Keywords: Visual Analytics, Learning-to-Rank, Explainable ML, Ranking
Submitted 28 August, 2023;
originally announced August 2023.
-
A three in one bottom-up framework for simultaneous semantic segmentation, instance segmentation and classification of multi-organ nuclei in digital cancer histology
Authors:
Ibtihaj Ahmad,
Syed Muhammad Israr,
Zain Ul Islam
Abstract:
Simultaneous segmentation and classification of nuclei in digital histology play an essential role in computer-assisted cancer diagnosis; however, it remains challenging. The highest achieved binary and multi-class Panoptic Quality (PQ) remains as low as 0.68 bPQ and 0.49 mPQ, respectively. This is due to higher staining variability, variability across the tissue, rough clinical conditions, overlapping nuclei, and nuclear class imbalance. Generic deep-learning methods usually rely on end-to-end models, which fail to address these problems associated explicitly with digital histology. In our previous work, DAN-NucNet, we resolved these issues for semantic segmentation with an end-to-end model. This work extends our previous model to simultaneous instance segmentation and classification. We introduce additional decoder heads with independent weighted losses, which produce semantic segmentation, edge proposals, and classification maps. We use the outputs from the three-head model to apply post-processing to produce the final segmentation and classification. Our multi-stage approach utilizes edge proposals and semantic segmentations, in contrast to the direct segmentation and classification strategies followed by most state-of-the-art methods. Due to this, we demonstrate a significant performance improvement in producing high-quality instance segmentation and nuclei classification. We have achieved a 0.841 Dice score for semantic segmentation, a 0.713 bPQ score for instance segmentation, and 0.633 mPQ for nuclei classification. Our proposed framework is generalized across 19 types of tissues. Furthermore, the framework is less complex compared to the state-of-the-art.
Submitted 22 August, 2023;
originally announced August 2023.
-
Iterative Graph Filtering Network for 3D Human Pose Estimation
Authors:
Zaedul Islam,
A. Ben Hamza
Abstract:
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation. By naturally modeling the skeleton structure of the human body as a graph, GCNs are able to capture the spatial relationships between joints and learn an efficient representation of the underlying pose. However, most GCN-based methods use a shared weight matrix, making it challenging to accurately capture the different and complex relationships between joints. In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation, which aims to predict the 3D joint positions given a set of 2D joint locations in images. Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization via the Gauss-Seidel iterative method. Motivated by this iterative solution, we design a Gauss-Seidel network (GS-Net) architecture, which makes use of weight and adjacency modulation, skip connection, and a pure convolutional block with layer normalization. Adjacency modulation facilitates the learning of edges that go beyond the inherent connections of body joints, resulting in an adjusted graph structure that reflects the human skeleton, while skip connections help maintain crucial information from the input layer's initial features as the network depth increases. We evaluate our proposed model on two standard benchmark datasets, and compare it with a comprehensive set of strong baseline methods for 3D human pose estimation. Our experimental results demonstrate that our approach outperforms the baseline methods on both datasets, achieving state-of-the-art performance. Furthermore, we conduct ablation studies to analyze the contributions of different components of our model architecture and show that the skip connection and adjacency modulation help improve the model performance.
Submitted 7 August, 2023; v1 submitted 29 July, 2023;
originally announced July 2023.
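The abstract above refers to solving graph filtering with Laplacian regularization via the Gauss-Seidel iterative method. The following is a minimal sketch of that general idea only, not the authors' GS-Net: it assumes the standard formulation in which the filtered signal H solves (I + alpha * L) H = X, with L = D - A the unnormalized graph Laplacian. The function name, the `alpha` parameter, the toy three-joint chain, and the input values are all hypothetical and chosen for illustration.

```python
def gauss_seidel_graph_filter(adj, x, alpha=0.5, num_iters=50):
    """Approximately solve (I + alpha * L) h = x with Gauss-Seidel sweeps.

    adj: symmetric adjacency matrix (list of lists) with a zero diagonal
         (assumption: no self-loops).
    x:   input node signal, one row of features per node.
    L = D - A is the unnormalized graph Laplacian (an assumed formulation).
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    h = [list(row) for row in x]  # start from the input signal, leave x untouched
    for _ in range(num_iters):
        for i in range(n):
            diag = 1.0 + alpha * deg[i]  # diagonal entry of I + alpha * L
            for c in range(len(x[i])):
                # Gauss-Seidel uses the most recent neighbour values in h.
                neighbor_sum = sum(adj[i][j] * h[j][c] for j in range(n) if j != i)
                h[i][c] = (x[i][c] + alpha * neighbor_sum) / diag
    return h


# Hypothetical toy example: a 3-joint chain with 2D features per joint.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x = [[0.0, 0.0],
     [1.0, 2.0],
     [4.0, 0.0]]
h = gauss_seidel_graph_filter(adj, x, alpha=0.5)
print(h)
```

Because I + alpha * L is strictly diagonally dominant for any alpha > 0, the sweeps converge; a larger `alpha` smooths the signal more strongly across connected joints.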
-
Malware Resistant Data Protection in Hyper-connected Networks: A survey
Authors:
Jannatul Ferdous,
Rafiqul Islam,
Maumita Bhattacharya,
Md Zahidul Islam
Abstract:
Data protection is the process of securing sensitive information from being corrupted, compromised, or lost. A hyperconnected network, on the other hand, is a computer networking trend in which communication occurs over a network. However, what about malware? Malware is malicious software meant to penetrate private data, threaten a computer system, or gain unauthorised network access without the user's consent. Due to the increasing applications of computers and dependency on electronically saved private data, malware attacks on sensitive information have become a dangerous issue for individuals and organizations across the world. Hence, malware defense is critical for keeping our computer systems and data protected. Many recent survey articles have focused on either malware detection systems or single attacking strategies. To the best of our knowledge, no survey paper presents malware attack patterns and defense strategies in combination. This survey aims to address this issue by bringing together diverse malicious attack patterns and machine learning (ML) based detection models for modern and sophisticated malware. In doing so, we focus on the taxonomy of malware attack patterns based on four fundamental dimensions: the primary goal of the attack, the method of attack, the targeted exposure and execution process, and the types of malware that perform each attack. Malware analysis approaches are also examined in detail. In addition, existing malware detection techniques employing feature extraction and ML algorithms are discussed extensively. Finally, we discuss research difficulties and unsolved problems, including future research directions.
Submitted 24 July, 2023;
originally announced July 2023.
-
inTformer: A Time-Embedded Attention-Based Transformer for Crash Likelihood Prediction at Intersections Using Connected Vehicle Data
Authors:
B M Tazbiul Hassan Anik,
Zubayer Islam,
Mohamed Abdel-Aty
Abstract:
The real-time crash likelihood prediction model is an essential component of the proactive traffic safety management system. Over the years, numerous studies have attempted to construct a crash likelihood prediction model in order to enhance traffic safety, but mostly on freeways. In the majority of the existing studies, researchers have primarily employed a deep learning-based framework to identify crash potential. Lately, the Transformer has emerged as a promising deep neural network that fundamentally operates through attention-based mechanisms. The Transformer has several functional benefits over extant deep learning models such as LSTM and CNN. Firstly, it can readily handle long-term dependencies in a data sequence. Secondly, it can process all elements in a data sequence in parallel during training. Finally, it does not have the vanishing gradient issue. Recognizing the potential of Transformers, this paper proposes inTersection-Transformer (inTformer), a time-embedded attention-based Transformer model that can effectively predict intersection crash likelihood in real time. The proposed model was evaluated using connected vehicle data extracted from Signal Analytics Platform. Acknowledging the complex traffic operation mechanism at intersections, this study developed zone-specific models by dividing the intersection region into two distinct zones: the within-intersection zone and the approach zone. The best inTformer models in the 'within-intersection' and 'approach' zones achieved sensitivities of 73% and 70%, respectively. The zone-level models were also compared to earlier studies on crash likelihood prediction at intersections and with several established deep learning models trained on the same connected vehicle dataset.
Submitted 29 August, 2023; v1 submitted 7 July, 2023;
originally announced July 2023.
-
A Semi-Automated Hybrid Schema Matching Framework for Vegetation Data Integration
Authors:
Md Asif-Ur-Rahman,
Bayzid Ashik Hossain,
Michael Bewong,
Md Zahidul Islam,
Yanchang Zhao,
Jeremy Groves,
Rory Judith
Abstract:
Integrating disparate and distributed vegetation data is critical for consistent and informed national policy development and management. Australia's National Vegetation Information System (NVIS) under the Department of Climate Change, Energy, the Environment and Water (DCCEEW) is the only nationally consistent vegetation database and hierarchical typology of vegetation types in different locations. Currently, this database employs manual approaches for integrating disparate state and territory datasets, which is labour intensive and can be prone to human error. To cope with the ever-increasing need for up-to-date vegetation data derived from heterogeneous data sources, a Semi-Automated Hybrid Matcher (SAHM) is proposed in this paper. SAHM utilizes both schema-level and instance-level matching following a two-tier matching framework. A key novel technique in SAHM called Multivariate Statistical Matching is proposed for automated schema scoring, which takes advantage of domain knowledge and correlations between attributes to enhance the matching. To verify the effectiveness of the proposed framework, the performance of the individual as well as the combined components of SAHM has been evaluated. The empirical evaluation shows the effectiveness of the proposed framework, which outperforms existing state-of-the-art methods such as Cupid, Coma, Similarity Flooding, Jaccard Leven Matcher, Distribution Based Matcher, and EmbDI. In particular, SAHM achieves between 88% and 100% accuracy with significantly better F1 scores in comparison with state-of-the-art techniques. SAHM is also shown to be several orders of magnitude more efficient than existing techniques.
Submitted 10 May, 2023;
originally announced May 2023.
-
Emotional Expression Detection in Spoken Language Employing Machine Learning Algorithms
Authors:
Mehrab Hosain,
Most. Yeasmin Arafat,
Gazi Zahirul Islam,
Jia Uddin,
Md. Mobarak Hossain,
Fatema Alam
Abstract:
The human voice has a variety of features, which can be classified as pitch, timbre, loudness, and vocal tone. It has been observed in numerous cases that humans express their feelings using different vocal qualities when they are speaking. The primary objective of this research is to recognize different human emotions such as anger, sadness, fear, neutrality, disgust, pleasant surprise, and happiness by using several MATLAB functions, namely spectral descriptors, periodicity, and harmonicity. To accomplish the work, we analyze the CREMA-D (Crowd-sourced Emotional Multimodal Actors Data) and TESS (Toronto Emotional Speech Set) datasets of human speech. The audio files contain data with various characteristics (e.g., noisy, fast, slow), which significantly increases the efficiency of the ML (Machine Learning) models. EMD (Empirical Mode Decomposition) is utilized for signal decomposition. Then, the features are extracted using several techniques such as MFCC, GTCC, spectral centroid, roll-off point, entropy, spread, flux, harmonic ratio, energy, skewness, flatness, and audio delta. Several well-known ML models, namely Support Vector Machine, Neural Network, Ensemble, and KNN, are trained on the data. The algorithms show an accuracy of 67.7%, 63.3%, 61.6%, and 59.0%, respectively, on the test data and 77.7%, 76.1%, 99.1%, and 61.2% on the training data. We have conducted the experiments using MATLAB, and the results show that our model compares favorably with, and is more flexible than, existing similar works.
Submitted 20 April, 2023;
originally announced April 2023.
-
Enhancing Cluster Quality of Numerical Datasets with Domain Ontology
Authors:
Sudath Rohitha Heiyanthuduwage,
Md Anisur Rahman,
Md Zahidul Islam
Abstract:
Ontology-based clustering has gained attention in recent years due to the potential benefits of ontology. Current ontology-based clustering approaches have mainly been applied to reduce the dimensionality of attributes in text document clustering. Reducing the dimensionality of attributes using ontology helps to produce high-quality clusters for a dataset. However, ontology-based approaches to clustering numerical datasets have not gained enough attention. Moreover, some literature mentions that ontology-based clustering can produce either high-quality or low-quality clusters from a dataset. Therefore, in this paper we present a clustering approach that uses domain ontology to reduce the dimensionality of attributes in a numerical dataset and to produce high-quality clusters. For every dataset, we produce three datasets using domain ontology. We then cluster these datasets using a genetic algorithm-based clustering technique called GenClust++. The clusters of each dataset are evaluated in terms of Sum of Squared-Error (SSE). We use six numerical datasets to evaluate the performance of our ontology-based approach. The experimental results of our approach indicate that cluster quality gradually improves from the lower to the higher levels of a domain ontology.
Submitted 2 April, 2023;
originally announced April 2023.
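As a rough illustration of the Sum of Squared-Error (SSE) measure named in the abstract above, the following sketch computes SSE as the sum of squared Euclidean distances from each record to the centroid of its cluster. The function name `sum_squared_error` and the toy data are hypothetical, and GenClust++ itself is not reproduced here; the cluster labels are simply given as input.

```python
def sum_squared_error(points, labels):
    """SSE: squared Euclidean distance of every point to its cluster centroid, summed."""
    # Group the points by their (already assigned) cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dims = len(members[0])
        # The centroid is the per-dimension mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        for p in members:
            total += sum((p[d] - centroid[d]) ** 2 for d in range(dims))
    return total


# Hypothetical toy data: two tight groups on a line.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
good = sum_squared_error(points, [0, 0, 1, 1])  # neighbours grouped together
bad = sum_squared_error(points, [0, 1, 0, 1])   # distant points grouped together
```

A lower SSE indicates more compact clusters, so under this measure the first labelling is the better of the two.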
-
Improving the Performance of OFDMA-based Wi-Fi Network with Hybrid Medium Access Control Protocol Design
Authors:
Gazi Zahirul Islam
Abstract:
Nowadays, IEEE 802.11, i.e., Wi-Fi, has emerged as a prevailing technology for broadband wireless networking. To meet the tremendous rise in demand for future-generation wireless LANs, a robust and efficient MAC protocol is required for the Wi-Fi network. However, traditional MAC mechanisms are not suitable for next-generation communications due to some inherent constraints. In this regard, OFDMA technology could be adopted to design an efficient MAC protocol for the Wi-Fi network.
The purpose of this research is to provide a high-speed network for Wi-Fi users. The thesis presents three MAC protocols, namely HTFA (High Throughput and Fair Access), ERA (Efficient Resource Allocation), and PRS (Proportional Resource Scheduling), by employing OFDMA technology. The novel protocols improve Wi-Fi communication using the latest IEEE 802.11ax standard, i.e., Wi-Fi 6. In particular, the protocols improve several performance parameters of the MAC protocol, increasing the throughput, goodput, and fairness index while reducing packet retransmissions and collisions. Simulation results indicate that the new protocols outperform the existing protocols.
The protocols designed in this thesis are compliant with the latest IEEE 802.11ax standard, which promises to enhance the throughput per user at least four times and to support ten times as many users. Thus, the new protocols can ensure uninterrupted and smooth communication in highly dense environments. The thesis contains many resources that should be valuable to future researchers working on Wi-Fi networks: the state of the art of MAC protocols; analysis of contemporary protocols and their performance metrics; the architecture of the Wi-Fi system; OFDMA constraints and regulations; the framework of the protocols; analytical models; and relevant data, theory, and methods.
Submitted 1 April, 2023;
originally announced April 2023.
-
Using Connected Vehicle Trajectory Data to Evaluate the Effects of Speeding
Authors:
Jorge Ugan,
Mohamed Abdel-Aty,
Zubayer Islam
Abstract:
Speeding has been and continues to be a major contributing factor to traffic fatalities. Various transportation agencies have proposed speed management strategies to reduce the amount of speeding on arterials. While there have been various studies on the analysis of speeding proportions above the speed limit, few studies have considered the effect on the individual's journey. Many studies utilized speed data from detectors, which is limited in that there is no information about the route that the driver took. This study aims to explore the effects on speeding proportions of the various roadway features an individual experiences during a given journey. Connected vehicle trajectory data was utilized to identify the path that a driver took, along with the vehicle-related variables. The level of speeding proportion is predicted using multiple learning models. The model with the best performance, Extreme Gradient Boosting, achieved an accuracy of 0.756. The proposed model can be used to understand how the environment and the vehicle's path affect the driver's speeding behavior, as well as to predict the areas with high levels of speeding proportions. The results suggested that features related to an individual driver's trip, i.e., total travel time, make a significant contribution towards speeding. Features that are related to the environment of the individual driver's trip, i.e., proportion of residential area, also had a significant effect on reducing speeding proportions. It is expected that the findings could help inform transportation agencies about the factors related to speeding for an individual driver's trip.
Submitted 28 March, 2023;
originally announced March 2023.
-
Uncertainty Driven Bottleneck Attention U-net for Organ at Risk Segmentation
Authors:
Abdullah Nazib,
Riad Hassan,
Zahidul Islam,
Clinton Fookes
Abstract:
Organ at risk (OAR) segmentation in computed tomography (CT) imagery is a difficult task for automated segmentation methods and can be crucial for downstream radiation treatment planning. U-net has become a de facto standard for medical image segmentation and is frequently used as a common baseline in medical image segmentation tasks. In this paper, we propose a multiple-decoder U-net architecture and use the segmentation disagreement between the decoders as attention to the bottleneck of the network for segmentation refinement. While feature correlation is considered as attention in most cases, in our case it is the uncertainty from the network that is used as attention. For accurate segmentation, we also propose a CT intensity integrated regularization loss. The proposed regularization helps the model understand the intensity distribution of low-contrast tissues. We tested our model on two publicly available OAR challenge datasets. We also conducted ablation studies on each dataset with the proposed attention module and regularization loss. Experimental results demonstrate a clear accuracy improvement on both datasets.
Submitted 26 February, 2024; v1 submitted 19 March, 2023;
originally announced March 2023.
-
Combined Location Online Weather Data: Easy-to-use Targeted Weather Analysis for Agriculture
Authors:
Darren Yates,
Christopher Blanchard,
Allister Clarke,
Sabih-Ur Rehman,
Md Zahidul Islam,
Russell Ford,
Rob Walsh
Abstract:
The continuing effects of climate change require farmers and growers to have greater understanding of how these changes affect crop production. However, while climatic data is generally available to help provide much of that understanding, it can often be in a form not easy to digest. The proposed Combined Location Online Weather Data (CLOWD) framework is an easy-to-use online platform for analysing recent and historical weather data of any location within Australia at the click of a map. CLOWD requires no programming skills and operates in any HTML5 web browser on PC and mobile devices. It enables comparison between current and previous growing seasons over a range of environmental parameters, and can create a plain-English PDF report for offline use, using natural language generation (NLG). This paper details the platform, the design decisions taken and outlines how farmers and growers can use CLOWD to better understand current growing conditions. Prototypes of CLOWD are now online for PCs and smartphones.
Submitted 13 February, 2023;
originally announced February 2023.
-
Real-Time Traffic End-of-Queue Detection and Tracking in UAV Video
Authors:
Russ Messenger,
Md Zobaer Islam,
Matthew Whitlock,
Erik Spong,
Nate Morton,
Layne Claggett,
Chris Matthews,
Jordan Fox,
Leland Palmer,
Dane C. Johnson,
John F. O'Hara,
Christopher J. Crick,
Jamey D. Jacob,
Sabit Ekin
Abstract:
Highway work zones are susceptible to undue accumulation of motorized vehicles, which calls for dynamic work zone warning signs to prevent accidents. The work zone signs are placed according to the location of the end-of-queue of vehicles, which usually changes rapidly. The detection of moving objects in video captured by Unmanned Aerial Vehicles (UAVs) has been extensively researched and is used in a wide array of applications including traffic monitoring. Unlike fixed traffic cameras, UAVs can be used to monitor the traffic at work zones in real time and in a more cost-effective way. This study presents a proof-of-concept method for detecting the End-of-Queue (EOQ) of traffic by processing real-time video footage of a highway work zone captured by a UAV. The EOQ is detected in the video by image processing, which includes background subtraction and blob detection methods. This dynamic localization of the EOQ of vehicles will enable faster and more accurate relocation of work zone warning signs for drivers and thus will reduce work zone fatalities. The method can also be applied to detect the EOQ of vehicles and notify drivers on any other roads or at intersections where vehicles are rapidly accumulating due to special events, traffic jams, construction, or accidents.
Submitted 31 October, 2023; v1 submitted 9 January, 2023;
originally announced February 2023.
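The two image-processing steps named in the abstract above, background subtraction and blob detection, can be sketched generically as follows. This is a minimal pure-Python illustration on tiny grayscale grids, not the authors' pipeline: the function names `foreground_mask` and `find_blobs`, the threshold value, and the toy frames are all assumptions, and a real system would work on video frames with an image-processing library.

```python
def foreground_mask(frame, background, threshold=30):
    """Background subtraction: mark pixels that differ strongly from the background."""
    return [
        [abs(pixel - ref) > threshold for pixel, ref in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def find_blobs(mask):
    """Blob detection: group foreground pixels into 4-connected components."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood fill from this unvisited foreground pixel.
            stack, blob = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            blobs.append(blob)
    return blobs


# Hypothetical 4x6 scene: an empty road plus two bright "vehicles".
background = [[10] * 6 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][1] = frame[1][2] = 200  # first vehicle, two pixels wide
frame[3][5] = 180                # second vehicle, one pixel

blobs = find_blobs(foreground_mask(frame, background))
```

Each blob is a list of `(row, column)` pixels; how blob positions are turned into an end-of-queue location is left out, since the abstract does not specify it.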
-
A Brief Overview of Software-Defined Networking
Authors:
Alexander Nunez,
Joseph Ayoka,
Md Zahidul Islam,
Pablo Ruiz
Abstract:
The Internet is the driving force of the new digital world, which has created a revolution. With the concept of the Internet of Things (IoT), almost everything is being connected to the internet. However, with the traditional IP network system, it is computationally very complex and costly to manage and configure the network, where the data plane and the control plane are tightly coupled. In order to simplify the network management tasks, software-defined networking (SDN) has been proposed as a promising paradigm shift towards an externalized and logically centralized network control plane. SDN decouples the control plane and the data plane and provides programmability to configure the network. To address the overwhelming advancement of this new technology, a holistic overview of SDN is provided in this paper by describing different layers and their functionalities in SDN. The paper presents a simple but effective overview of SDN, which will pave the way for the readers to understand this new technology and contribute to this field.
Submitted 31 January, 2023;
originally announced February 2023.
-
Hand Gesture Recognition through Reflected Infrared Light Wave Signals
Authors:
Md Zobaer Islam,
Li Yu,
Hisham Abuella,
John F. O'Hara,
Christopher Crick,
Sabit Ekin
Abstract:
In this study, we present a wireless (non-contact) gesture recognition method using only incoherent light wave signals reflected from a human subject. In comparison to existing radar, light shadow, sound and camera-based sensing systems, this technology uses a low-cost ubiquitous light source (e.g., infrared LED) to send light towards the subject's hand performing gestures, and the reflected light is collected by a light sensor (e.g., photodetector). This light wave sensing system recognizes different gestures from the variations of the received light intensity within a 20-35 cm range. The hand gesture recognition results demonstrate up to 96% accuracy on average. The developed system can be utilized in numerous Human-Computer Interaction (HCI) applications as a low-cost and non-contact gesture recognition technology.
Submitted 13 June, 2023; v1 submitted 14 January, 2023;
originally announced January 2023.
-
Noncontact Respiratory Anomaly Detection Using Infrared Light-Wave Sensing
Authors:
Md Zobaer Islam,
Brenden Martin,
Carly Gotcher,
Tyler Martinez,
John F. O'Hara,
Sabit Ekin
Abstract:
Human respiratory rate and its pattern convey essential information about the physical and psychological states of the subject. Abnormal breathing can indicate fatal health issues leading to further diagnosis and treatment. Wireless light-wave sensing (LWS) using incoherent infrared light shows promise in safe, discreet, efficient, and non-invasive human breathing monitoring without raising privacy concerns. The respiration monitoring system needs to be trained on different types of breathing patterns to identify breathing anomalies. The system must also validate the collected data as a breathing waveform, discarding any faulty data caused by external interruption, user movement, or system malfunction. To address these needs, this study simulated normal and different types of abnormal respiration using a robot that mimics human breathing patterns. Then, time-series respiration data were collected using infrared light-wave sensing technology. Three machine learning algorithms, decision tree, random forest and XGBoost, were applied to detect breathing anomalies and faulty data. Model performances were evaluated through cross-validation, assessing classification accuracy, precision and recall scores. The random forest model achieved the highest classification accuracy of 96.75% with data collected at a 0.5 m distance. In general, ensemble models like random forest and XGBoost performed better than a single model in classifying the data collected at multiple distances from the light-wave sensing setup.
Submitted 16 April, 2024; v1 submitted 9 January, 2023;
originally announced January 2023.
-
Towards Next Generation of Pedestrian and Connected Vehicle In-the-loop Research: A Digital Twin Co-Simulation Framework
Authors:
Zijin Wang,
Ou Zheng,
Liangding Li,
Mohamed Abdel-Aty,
Carolina Cruz-Neira,
Zubayer Islam
Abstract:
Digital Twin is an emerging technology that replicates real-world entities into a digital space. It has attracted increasing attention in the transportation field, and many researchers are exploring its future applications in the development of Intelligent Transportation System (ITS) technologies. Connected vehicles (CVs) and pedestrians are among the major traffic participants in ITS. However, the usage of Digital Twin in research involving both CVs and pedestrians remains largely unexplored. In this study, a Digital Twin framework for CV and pedestrian in-the-loop simulation is proposed. The proposed framework consists of the physical world, the digital world, and data transmission in between. The features of the entities (CV and pedestrian) that need digital twinning are divided into external state and internal state, and the attributes in each state are described. We also demonstrate a sample architecture under the proposed Digital Twin framework, which is based on Carla-Sumo Co-simulation and a Cave automatic virtual environment (CAVE). A case study that investigates a Vehicle-Pedestrian (V2P) warning system is conducted to validate the effectiveness of the presented architecture. The proposed framework is expected to provide guidance for future Digital Twin research, and the architecture we build can serve as a testbed for further research and development of ITS applications on CVs and pedestrians.
Submitted 10 March, 2023; v1 submitted 8 December, 2022;
originally announced December 2022.
-
Deep Convolutional Neural Network for Roadway Incident Surveillance Using Audio Data
Authors:
Zubayer Islam,
Mohamed Abdel-Aty
Abstract:
Crash event identification and prediction play a vital role in understanding safety conditions for transportation systems. While existing systems use traffic parameters correlated with crash data to classify and train these models, we propose the use of a novel sensory unit that can also accurately identify crash events: the microphone. Audio events can be collected and analyzed to classify events such as crashes. In this paper, we demonstrate the use of a deep Convolutional Neural Network (CNN) for road event classification. Important audio parameters such as Mel Frequency Cepstral Coefficients (MFCC), log Mel-filterbank energy spectrum and Fourier spectrum were used as the feature set. Additionally, the dataset was augmented with more sample data through audio augmentation techniques such as time and pitch shifting. Together with the feature extraction, this data augmentation can achieve reasonable accuracy. Four events, namely crash, tire skid, horn and siren sounds, can be accurately identified, giving an indication of a road hazard that can be useful for traffic operators or paramedics. The proposed methodology can reach an accuracy of up to 94%. Such audio systems can be implemented as part of an Internet of Things (IoT) platform that can complement video-based sensors without complete coverage.
Submitted 9 March, 2022;
originally announced March 2022.
-
How Do Organizations Seek Cyber Assurance? Investigations on the Adoption of the Common Criteria and Beyond
Authors:
Nan Sun,
Chang-Tsun Li,
Hin Chan,
Md Zahidul Islam,
Md Rafiqul Islam,
Warren Armstrong
Abstract:
Cyber assurance, which is the ability to operate under the onslaught of cyber attacks and other unexpected events, is essential for organizations facing a flood of security threats on a daily basis. Organizations usually employ multiple strategies to conduct risk management to achieve cyber assurance. Utilizing cybersecurity standards and certifications can provide guidance for vendors to design and manufacture secure Information and Communication Technology (ICT) products as well as provide a level of assurance of the security functionality of the products for consumers. Hence, employing security standards and certifications is an effective strategy for risk management and cyber assurance. In this work, we begin by investigating the adoption of cybersecurity standards and certifications through a survey of 258 participants from organizations across various countries and sectors. Specifically, we identify adoption barriers of the Common Criteria through the designed questionnaire. Taking into account the seven identified adoption barriers, we present recommendations for promoting cybersecurity standards and certifications. Moreover, beyond cybersecurity standards and certifications, we shed light on other risk management strategies devised by our participants, which provides directions on cybersecurity approaches for enhancing cyber assurance in organizations.
Submitted 5 March, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
-
Real-time Emergency Vehicle Event Detection Using Audio Data
Authors:
Zubayer Islam,
Mohamed Abdel-Aty
Abstract:
In this work, we focus on detecting emergency vehicles using only audio data. Improved and quick detection can help in faster preemption of these vehicles at signalized intersections, thereby reducing overall response time in case of emergencies. Important audio features were extracted from raw data and passed into extreme learning machines (ELMs) for training. ELMs have been used in this work because of their simplicity and shorter run-time, which make them suitable for online learning. Recently, there have been many studies that focus on sound classification, but most of the methods used are complex to train and implement. The results from this paper show that an ELM can achieve similar performance with much shorter training times. The accuracy reported for the ELM is about 97% for emergency vehicle detection (EVD).
Submitted 2 February, 2022;
originally announced February 2022.
-
STRIDE-based Cyber Security Threat Modeling for IoT-enabled Precision Agriculture Systems
Authors:
Md. Rashid Al Asif,
Khondokar Fida Hasan,
Md Zahidul Islam,
Rahamatullah Khondoker
Abstract:
The concept of traditional farming is changing rapidly with the introduction of smart technologies like the Internet of Things (IoT). Under the concept of smart agriculture, precision agriculture is gaining popularity to enable Decision Support System (DSS)-based farming management that utilizes widespread IoT sensors and wireless connectivity to enable automated detection and optimization of resources. Undoubtedly, the success of such a system would affect crop productivity, and its failure would have a severe impact. Like many other cyber-physical systems, one of the growing challenges to avoid system adversity is to ensure the system's security, privacy, and trust. But what are the vulnerabilities, threats, and security issues we should consider while deploying precision agriculture? This paper conducts holistic threat modeling at the component level of precision agriculture's standard infrastructure using the popular threat intelligence tool STRIDE to identify common security issues. Our modeling identifies fifty-eight potential security threats to consider. We present them systematically and offer general mitigation suggestions to support cyber security in precision agriculture.
Submitted 30 January, 2022; v1 submitted 24 January, 2022;
originally announced January 2022.
-
Defining Security Requirements with the Common Criteria: Applications, Adoptions, and Challenges
Authors:
Nan Sun,
Chang-Tsun Li,
Hin Chan,
Ba Dung Le,
MD Zahidul Islam,
Leo Yu Zhang,
MD Rafiqul Islam,
Warren Armstrong
Abstract:
Advances of emerging Information and Communications Technology (ICT) technologies push the boundaries of what is possible and open up new markets for innovative ICT products and services. The adoption of ICT products and systems with security properties depends on consumers' confidence and markets' trust in the security functionalities and whether the assurance measures applied to these products meet the inherent security requirements. Such confidence and trust are primarily gained through the rigorous development of security requirements, validation criteria, evaluation, and certification. Common Criteria for Information Technology Security Evaluation (often referred to as Common Criteria or CC) is an international standard (ISO/IEC 15408) for cyber security certification. In this paper, we conduct a systematic review of the CC standards and its adoptions. Adoption barriers of the CC are also investigated based on the analysis of current trends in security evaluation. Specifically, we share the experiences and lessons gained through the recent Development of Australian Cyber Criteria Assessment (DACCA) project that promotes the CC among stakeholders in ICT security products related to specification, development, evaluation, certification and approval, procurement, and deployment. Best practices on developing Protection Profiles, recommendations, and future directions for trusted cybersecurity advancement are presented.
Submitted 2 April, 2022; v1 submitted 19 January, 2022;
originally announced January 2022.
-
BRACU Mongol Tori: Next Generation Mars Exploration Rover
Authors:
Niaz Sharif Shourov,
Masnur Rahman,
Mohammad Zahirul Islam,
Ali Ahsan,
Syed Md Kamruzzaman,
Saifur Rahman,
Md Sakiluzzaman,
Intisar Hasnain,
Ekhwan Islam,
Saiful Islam,
Md. Khalilur Rhaman
Abstract:
BRAC University (BRACU) has participated in the University Rover Challenge (URC), a robotics competition for university level students organized by the Mars Society to design and build a rover that would be of use to early explorers on Mars. BRACU has designed and developed a fully functional next-generation Mars rover, Mongol Tori, which can be operated in the extreme, hostile conditions expected on the planet Mars. Not only is Mongol Tori equipped with both autonomous and manually controlled features, it is also capable of conducting scientific tasks to identify the characteristics of soils and weathering in the Mars environment.
Submitted 2 November, 2021;
originally announced November 2021.
-
A Generalised Logical Layered Architecture for Blockchain Technology
Authors:
Jared Newell,
Quazi Mamun,
Sabih ur Rehman,
Md Zahidul Islam
Abstract:
Precision, validity, reliability, timeliness, availability, and granularity are the desired characteristics for data and information systems. However, due to the desired trait of data mutability, information systems have inherently lacked the ability to enforce data integrity without governance. A resolution to this challenge has emerged in the shape of blockchain architecture, which ensures immutability of stored information, whilst remaining in an online state. Blockchain technology achieves this through the serial attachment of set-sized parcels of data called blocks. Links (likened to a chain) between these blocks are implemented using a cryptographic seal created using mathematical functions on the data inside the blocks. Practical implementations of blockchain vary by different components, concepts, and terminologies. Researchers proposed various architectural models using different layers to implement blockchain technologies. In this paper, we investigated those layered architectures for different use cases. We identified essential layers and components for a generalised blockchain architecture. We present a novel three-tiered storage model for the purpose of logically defining and categorising blockchain as a storage technology. We envision that this generalised model will be used as a guide when referencing and building any blockchain storage solution.
Submitted 18 October, 2021;
originally announced October 2021.
-
Energy-cost aware off-grid base stations with IoT devices for developing a green heterogeneous network
Authors:
Khondoker Ziaul Islam,
MD. Sanwar Hossain,
B. M. Ruhul Amin,
Ferdous Sohel
Abstract:
Heterogeneous network (HetNet) is a specified cellular platform to tackle the rapidly growing anticipated data traffic. From communications perspective, data loads can be mapped to energy loads that are generally placed on the operator networks. Meanwhile, renewable energy aided networks offer to curtail fossil fuel consumption, so to reduce environmental pollution. This paper proposes a renewable energy based power supply architecture for off-grid HetNet using a novel energy sharing model. Solar photovoltaic (PV) along with sufficient energy storage devices are used for each macro, micro, pico, or femto base station (BS). Additionally, biomass generator (BG) is used for macro and micro BSs. The collocated macro and micro BSs are connected through end-to-end resistive lines. A novel weighted proportional-fair resource-scheduling algorithm with sleep mechanisms is proposed for non-real time (NRT) applications by trading-off the power consumption and communication delays. Furthermore, the proposed algorithm with extended discontinuous reception (eDRX) and power saving mode (PSM) for narrowband internet of things (IoT) applications extends battery lifetime for IoT devices. HOMER optimization software is used to perform optimal system architecture, economic, and carbon footprint analyses while Monte-Carlo simulation tool is used for evaluating the throughput and energy efficiency performances. The proposed algorithms are valid for the practical data of the rural areas. We demonstrate the proposed power supply architecture is energy-efficient, cost-effective, reliable, and eco-friendly.
Submitted 12 October, 2021;
originally announced October 2021.
-
EEG Signal Processing using Wavelets for Accurate Seizure Detection through Cost Sensitive Data Mining
Authors:
Paul Grant,
Md Zahidul Islam
Abstract:
Epilepsy is one of the most common and yet diverse set of chronic neurological disorders. This excessive or synchronous neuronal activity is termed seizure. Electroencephalogram signal processing plays a significant role in detection and prediction of epileptic seizures. In this paper we introduce an approach that relies upon the properties of wavelets for seizure detection. We utilise the Maximum Overlap Discrete Wavelet Transform which enables us to reduce signal noise. Then, from the variance exhibited in wavelet coefficients, we develop connectivity and communication efficiency between the electrodes, as these properties differ significantly during a seizure period in comparison to a non-seizure period. We use basic statistical parameters derived from the reconstructed noise reduced signal, electrode connectivity and the efficiency of information transfer to build the attribute space.
We have utilised data that are publicly available to test our method that is found to be significantly better than some existing approaches.
Submitted 21 September, 2021;
originally announced September 2021.
-
Signal Classification using Smooth Coefficients of Multiple wavelets
Authors:
Paul Grant,
Md Zahidul Islam
Abstract:
Classification of time series signals has become an important construct and has many practical applications. With existing classifiers we may be able to accurately classify signals, however that accuracy may decline if using a reduced number of attributes. Transforming the data then undertaking reduction in dimensionality may improve the quality of the data analysis, decrease time required for classification and simplify models. We propose an approach, which chooses suitable wavelets to transform the data, then combines the output from these transforms to construct a dataset to then apply ensemble classifiers to. We demonstrate this on different data sets, across different classifiers and use differing evaluation methods. Our experimental results demonstrate the effectiveness of the proposed technique, compared to the approaches that use either raw signal data or a single wavelet transform.
Submitted 21 September, 2021;
originally announced September 2021.
-
A Framework for Supervised Heterogeneous Transfer Learning using Dynamic Distribution Adaptation and Manifold Regularization
Authors:
Md Geaur Rahman,
Md Zahidul Islam
Abstract:
Transfer learning aims to learn classifiers for a target domain by transferring knowledge from a source domain. However, due to two main issues: feature discrepancy and distribution divergence, transfer learning can be a very difficult problem in practice. In this paper, we present a framework called TLF that builds a classifier for the target domain having only few labeled training records by transferring knowledge from the source domain having many labeled records. While existing methods often focus on one issue and leave the other one for the further work, TLF is capable of handling both issues simultaneously. In TLF, we alleviate feature discrepancy by identifying shared label distributions that act as the pivots to bridge the domains. We handle distribution divergence by simultaneously optimizing the structural risk functional, joint distributions between domains, and the manifold consistency underlying marginal distributions. Moreover, for the manifold consistency we exploit its intrinsic properties by identifying k nearest neighbors of a record, where the value of k is determined automatically in TLF. Furthermore, since negative transfer is not desired, we consider only the source records that are belonging to the source pivots during the knowledge transfer. We evaluate TLF on seven publicly available natural datasets and compare the performance of TLF against the performance of fourteen state-of-the-art techniques. We also evaluate the effectiveness of TLF in some challenging situations. Our experimental results, including statistical sign test and Nemenyi test analyses, indicate a clear superiority of the proposed framework over the state-of-the-art techniques.
Submitted 2 September, 2022; v1 submitted 27 August, 2021;
originally announced August 2021.
-
Intelligent Stretch Reduction in Information-Centric Networking towards 5G-Tactile Internet realization
Authors:
Hussain Ahmad,
Muhammad Zubair Islam,
Amir Haider,
Rashid Ali,
Hyung Seok Kim
Abstract:
In recent years, 5G is widely used in parallel with IoT networks to enable massive data connectivity and exchange with ultra-reliable and low latency communication (URLLC) services. The internet requirements from user's perspective have shifted from simple human to human interactions to different communication paradigms and information-centric networking (ICN). ICN distributes the content among the users based on their trending requests. ICN is responsible not only for the routing and caching but also for naming the network's content. ICN considers several parameters such as cache-hit ratio, content diversity, content redundancy, and stretch to route the content. ICN enables name-based caching of the required content according to the user's request based on the router's interest table. The stretch shows the path covered while retrieving the content from producer to consumer. Reduction in path length also leads to a reduction in end-to-end latency and better data rate availability. ICN routers must have the minimum stretch to obtain a better system efficiency. Reinforcement learning (RL) is widely used in networks environment to increase agent efficiency to make decisions. In ICN, RL can aid to increase caching and stretch efficiency. This paper investigates a stretch reduction strategy for ICN routers by formulating the stretch reduction problem as a Markov decision process. The evaluation of the proposed stretch reduction strategy's accuracy is done by employing Q-Learning, an RL technique. The simulation results indicate that by using the optimal parameters for the proposed stretch reduction strategy.
Submitted 16 March, 2021;
originally announced March 2021.
-
Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM
Authors:
Zahidul Islam,
Mohammad Rukonuzzaman,
Raiyan Ahmed,
Md. Hasanul Kabir,
Moshiur Farazi
Abstract:
Automatically detecting violence from surveillance footage is a subset of activity recognition that deserves special attention because of its wide applicability in unmanned security monitoring systems, internet video filtration, etc. In this work, we propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet where one stream takes in background suppressed frames as inputs and other stream processes difference of adjacent frames. We employed simple and fast input pre-processing techniques that highlight the moving objects in the frames by suppressing non-moving backgrounds and capture the motion in-between frames. As violent actions are mostly characterized by body movements these inputs help produce discriminative features. SepConvLSTM is constructed by replacing convolution operation at each gate of ConvLSTM with a depthwise separable convolution that enables producing robust long-range Spatio-temporal features while using substantially fewer parameters. We experimented with three fusion methods to combine the output feature maps of the two streams. Evaluation of the proposed methods was done on three standard public datasets. Our model outperforms the accuracy on the larger and more challenging RWF-2000 dataset by more than a 2% margin while matching state-of-the-art results on the smaller datasets. Our experiments lead us to conclude, the proposed models are superior in terms of both computational efficiency and detection accuracy.
Submitted 20 April, 2021; v1 submitted 21 February, 2021;
originally announced February 2021.
-
Comparative Code Structure Analysis using Deep Learning for Performance Prediction
Authors:
Nathan Pinnow,
Tarek Ramadan,
Tanzima Z. Islam,
Chase Phelps,
Jayaraman J. Thiagarajan
Abstract:
Performance analysis has always been an afterthought during the application development process, focusing on application correctness first. The learning curve of the existing static and dynamic analysis tools are steep, which requires understanding low-level details to interpret the findings for actionable optimizations. Additionally, application performance is a function of an infinite number of unknowns stemming from the application-, runtime-, and interactions between the OS and underlying hardware, making it difficult, if not impossible, to model using any deep learning technique, especially without a large labeled dataset. In this paper, we address both of these problems by presenting a large corpus of a labeled dataset for the community and take a comparative analysis approach to mitigate all unknowns except their source code differences between different correct implementations of the same problem. We put the power of deep learning to the test for automatically extracting information from the hierarchical structure of abstract syntax trees to represent source code. This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure. This research will enable performance-aware application development since every version of the application will continue to contribute to the corpora, which will enhance the performance of the model. Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source-code to discover latent representations and achieve up to 84% (individual problem) and 73% (combined dataset with multiple of problems) accuracy in predicting the change in performance.
Submitted 21 April, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Adaptive Decision Forest: An Incremental Machine Learning Framework
Authors:
Md Geaur Rahman,
Md Zahidul Islam
Abstract:
In this study, we present an incremental machine learning framework called Adaptive Decision Forest (ADF), which produces a decision forest to classify new records. Based on our two novel theorems, we introduce a new splitting strategy called iSAT, which allows ADF to classify new records even if they are associated with previously unseen classes. ADF is capable of identifying and handling concept drift; it, however, does not forget previously gained knowledge. Moreover, ADF is capable of handling big data if the data can be divided into batches. We evaluate ADF on five publicly available natural data sets and one synthetic data set, and compare the performance of ADF against the performance of eight state-of-the-art techniques. Our experimental results, including statistical sign test and Nemenyi test analyses, indicate a clear superiority of the proposed framework over the state-of-the-art techniques.
Submitted 28 January, 2021;
originally announced January 2021.
-
Detecting Autism Spectrum Disorder using Machine Learning
Authors:
Md Delowar Hossain,
Muhammad Ashad Kabir,
Adnan Anwar,
Md Zahidul Islam
Abstract:
Autism Spectrum Disorder (ASD), which is a neurodevelopmental disorder, is often accompanied by sensory issues such as over sensitivity or under sensitivity to sounds, smells, or touch. Although its main cause is genetic in nature, early detection and treatment can help to improve the conditions. In recent years, machine learning based intelligent diagnosis has evolved to complement the traditional clinical methods, which can be time consuming and expensive. The focus of this paper is to find out the most significant traits and automate the diagnosis process using available classification techniques for improved diagnosis purpose. We have analyzed ASD datasets of Toddler, Child, Adolescent and Adult. We determine the best performing classifier for these binary datasets using the evaluation metrics recall, precision, F-measures and classification errors. Our findings show that the Sequential minimal optimization (SMO) based Support Vector Machines (SVM) classifier outperforms all other benchmark machine learning algorithms in terms of accuracy during the detection of ASD cases and produces fewer classification errors compared to other algorithms. Also, we find that the Relief Attributes algorithm is the best to identify the most significant attributes in ASD datasets.
Submitted 30 September, 2020;
originally announced September 2020.
-
FastForest: Increasing Random Forest Processing Speed While Maintaining Accuracy
Authors:
Darren Yates,
Md Zahidul Islam
Abstract:
Random Forest remains one of Data Mining's most enduring ensemble algorithms, achieving well-documented levels of accuracy and processing speed, as well as regularly appearing in new research. However, with data mining now reaching the domain of hardware-constrained devices such as smartphones and Internet of Things (IoT) devices, there is continued need for further research into algorithm efficiency to deliver greater processing speed without sacrificing accuracy. Our proposed FastForest algorithm delivers an average 24% increase in processing speed compared with Random Forest whilst maintaining (and frequently exceeding) it on classification accuracy over tests involving 45 datasets. FastForest achieves this result through a combination of three optimising components - Subsample Aggregating ('Subbagging'), Logarithmic Split-Point Sampling and Dynamic Restricted Subspacing. Moreover, detailed testing of Subbagging sizes has found an optimal scalar delivering a positive mix of processing performance and accuracy.
Submitted 6 April, 2020;
originally announced April 2020.
-
A Novel Incremental Clustering Technique with Concept Drift Detection
Authors:
Mitchell D. Woodbright,
Md Anisur Rahman,
Md Zahidul Islam
Abstract:
Data are being collected from various aspects of life. These data can often arrive in chunks/batches. Traditional static clustering algorithms are not suitable for dynamic datasets, i.e., when data arrive in streams of chunks/batches. If we apply a conventional clustering technique over the combined dataset, then every time a new batch of data comes, the process can be slow and wasteful. Moreover, it can be challenging to store the combined dataset in memory due to its ever-increasing size. As a result, various incremental clustering techniques have been proposed. These techniques need to efficiently update the current clustering result whenever a new batch arrives, to adapt the current clustering result/solution with the latest data. These techniques also need the ability to detect concept drifts when the clustering pattern of a new batch is significantly different from older batches. Sometimes, clustering patterns may drift temporarily in a single batch while the next batches do not exhibit the drift. Therefore, incremental clustering techniques need the ability to detect a temporary drift and sustained drift. In this paper, we propose an efficient incremental clustering algorithm called UIClust. It is designed to cluster streams of data chunks, even when there are temporary or sustained concept drifts. We evaluate the performance of UIClust by comparing it with a recently published, high-quality incremental clustering algorithm. We use real and synthetic datasets. We compare the results by using well-known clustering evaluation criteria: entropy, sum of squared errors (SSE), and execution time. Our results show that UIClust outperforms the existing technique in all our experiments.
Submitted 30 March, 2020;
originally announced March 2020.
-
Tree Index: A New Cluster Evaluation Technique
Authors:
A. H. Beg,
Md Zahidul Islam,
Vladimir Estivill-Castro
Abstract:
We introduce a cluster evaluation technique called Tree Index. Our Tree Index algorithm aims at describing the structural information of the clustering rather than the quantitative format of cluster-quality indexes (where the representation power of clustering is some cumulative error similar to vector quantization). Our Tree Index is finding margins amongst clusters for easy learning without the complications of Minimum Description Length. Our Tree Index produces a decision tree from the clustered data set, using the cluster identifiers as labels. It combines the entropy of each leaf with their depth. Intuitively, a shorter tree with pure leaves generalizes the data well (the clusters are easy to learn because they are well separated). So, the labels are meaningful clusters. If the clustering algorithm does not separate well, trees learned from their results will be large and too detailed. We show that, on the clustering results (obtained by various techniques) on a brain dataset, Tree Index discriminates between reasonable and non-sensible clusters. We confirm the effectiveness of Tree Index through graphical visualizations. Tree Index evaluates the sensible solutions higher than the non-sensible solutions while existing cluster-quality indexes fail to do so.
Submitted 24 March, 2020;
originally announced March 2020.
-
Data Pre-Processing and Evaluating the Performance of Several Data Mining Methods for Predicting Irrigation Water Requirement
Authors:
Mahmood A. Khan,
Md Zahidul Islam,
Mohsin Hafeez
Abstract:
Recent drought and population growth are placing unprecedented demand on the limited water resources available. Irrigated agriculture is one of the major consumers of freshwater. A large amount of water in irrigated agriculture is wasted due to poor water management practices. To improve water management in irrigated areas, models for estimation of future water requirements are needed. Developing a model for forecasting irrigation water demand can improve water management practices and maximise water productivity. Data mining can be used effectively to build such models.
In this study, we prepare a dataset containing information on suitable attributes for forecasting irrigation water demand. The data is obtained from three different sources, namely meteorological data, remote sensing images and water delivery statements. To make the prepared dataset useful for demand forecasting and pattern extraction, we pre-process it using a novel approach based on a combination of irrigation and data mining knowledge. We then apply different data mining methods, namely decision tree (DT), artificial neural networks (ANNs), systematically developed forest (SysFor) for multiple trees, support vector machine (SVM), logistic regression, and the traditional Evapotranspiration (ETc) methods, and compare how well these models predict irrigation water demand. Our experimental results indicate the usefulness of data pre-processing and the effectiveness of different classifiers. Among the six methods we used, SysFor produces the best prediction with 97.5% accuracy, followed by a decision tree with 96% and ANN with 95%, closely matching the predictions with actual water usage. Therefore, we recommend using SysFor and DT models for irrigation water demand forecasting.
Submitted 1 March, 2020;
originally announced March 2020.
-
DataLearner: A Data Mining and Knowledge Discovery Tool for Android Smartphones and Tablets
Authors:
Darren Yates,
Md Zahidul Islam,
Junbin Gao
Abstract:
Smartphones have become the ultimate 'personal' computer, yet general-purpose data-mining and knowledge discovery tools for mobile devices are surprisingly rare. DataLearner is a new data-mining application designed specifically for Android devices that imports the Weka data-mining engine and augments it with algorithms developed by Charles Sturt University. Moreover, DataLearner can be expanded with additional algorithms. Combined, DataLearner delivers 40 classification, clustering and association rule mining algorithms for model training and evaluation without the need for cloud computing resources or network connectivity. It provides the same classification accuracy as PCs and laptops, while doing so with acceptable processing speed and consuming negligible battery life. With its ability to provide easy-to-use data-mining on a phone-size screen, DataLearner is a new portable, self-contained data-mining tool for remote, personalised and learning applications alike. DataLearner features four elements: this paper, the app available on Google Play, the GPL3-licensed source code on GitHub and a short video on YouTube.
Submitted 9 June, 2019;
originally announced June 2019.