-
Free-Space Optical Channel Turbulence Prediction: A Machine Learning Approach
Authors:
Md Zobaer Islam,
Ethan Abele,
Fahim Ferdous Hossain,
Arsalan Ahmad,
Sabit Ekin,
John F. O'Hara
Abstract:
Channel turbulence presents a formidable obstacle for free-space optical (FSO) communication. Anticipation of turbulence levels is highly important for mitigating disruptions. We study the application of machine learning (ML) to FSO data streams to rapidly predict channel turbulence levels with no additional sensing hardware. An optical bit stream was transmitted through a controlled channel in the lab under six distinct turbulence levels, and the efficacy of using ML to classify turbulence levels was examined. ML-based turbulence level classification was found to be >98% accurate with multiple ML training parameters, but highly dependent upon the timescale of changes between turbulence levels.
Submitted 26 May, 2024;
originally announced May 2024.
-
TrialDura: Hierarchical Attention Transformer for Interpretable Clinical Trial Duration Prediction
Authors:
Ling Yue,
Jonathan Li,
Md Zabirul Islam,
Bolun Xia,
Tianfan Fu,
Jintai Chen
Abstract:
The clinical trial process, also known as drug development, is an indispensable step toward the development of new treatments. The major objective of interventional clinical trials is to assess the safety and effectiveness of drug-based treatment in treating certain diseases in the human body. However, clinical trials are lengthy, labor-intensive, and costly. The duration of a clinical trial is a crucial factor that influences overall expenses. Therefore, effective management of the timeline of a clinical trial is essential for controlling the budget and maximizing the economic viability of the research. To address this issue, we propose TrialDura, a machine learning-based method that estimates the duration of clinical trials using multimodal data, including disease names, drug molecules, trial phases, and eligibility criteria. We encode these inputs into Bio-BERT embeddings specifically tuned for biomedical contexts to provide a deeper and more relevant semantic understanding of clinical trial data. Finally, the model's hierarchical attention mechanism connects all of the embeddings to capture their interactions and predict clinical trial duration. Our proposed model demonstrated superior performance with a mean absolute error (MAE) of 1.04 years and a root mean square error (RMSE) of 1.39 years compared to the other models, indicating more accurate clinical trial duration prediction. Publicly available code can be found at https://anonymous.4open.science/r/TrialDura-F196
Submitted 19 April, 2024;
originally announced April 2024.
-
Analyzing the Dynamics of COVID-19 Lockdown Success: Insights from Regional Data and Public Health Measures
Authors:
Md. Motaleb Hossen Manik,
Md. Ahsan Habib,
Md. Zabirul Islam,
Tanim Ahmed,
Fabliha Haque
Abstract:
The COVID-19 pandemic caused by the coronavirus had a significant effect on social, economic, and health systems globally. The virus emerged in Wuhan, China, and spread worldwide, resulting in severe disease, death, and social disruption. Countries implemented lockdowns in various regions to limit the spread of the virus. Some of them were successful and some failed. Several factors played a vital role in their success, but most of these factors and their correlations remained unidentified. In this paper, we identify the factors that contributed to the success of lockdowns during the COVID-19 pandemic and explore the correlations among them. Moreover, this paper proposes several strategies to control any pandemic situation in the future. It explores the relationships among variables such as population density, the numbers of infected, deceased, and recovered patients, and the success or failure of the lockdown in different regions of the world. The findings suggest a strong correlation among these factors and indicate that the spread of similar kinds of viruses can be reduced in the future by implementing several safety measures.
Submitted 24 February, 2024;
originally announced February 2024.
-
Evolutionary Optimization of 1D-CNN for Non-contact Respiration Pattern Classification
Authors:
Md Zobaer Islam,
Sabit Ekin,
John F. O'Hara,
Gary Yen
Abstract:
In this study, we present a deep learning-based approach for time-series respiration data classification. The dataset contains regular breathing patterns as well as various forms of abnormal breathing, obtained through non-contact incoherent light-wave sensing (LWS) technology. Given the one-dimensional (1D) nature of the data, we employed a 1D convolutional neural network (1D-CNN) for classification purposes. A genetic algorithm was employed to optimize the 1D-CNN architecture to maximize classification accuracy. To address the computational complexity associated with training the 1D-CNN across multiple generations, we implemented transfer learning from a pre-trained model. This approach significantly reduced the computational time required for training, thereby enhancing the efficiency of the optimization process. This study contributes valuable insights into the potential applications of deep learning methodologies for enhancing respiratory anomaly detection through precise and efficient respiration classification.
Submitted 16 April, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Seizure detection from Electroencephalogram signals via Wavelets and Graph Theory metrics
Authors:
Paul Grant,
Md Zahidul Islam
Abstract:
Epilepsy is one of the most prevalent neurological conditions, where an epileptic seizure is a transient occurrence due to abnormal, excessive and synchronous activity in the brain. Electroencephalogram signals emanating from the brain may be captured and analysed, and can then play a significant role in the detection and prediction of epileptic seizures. In this work we improve upon a previous approach that relied on the differing properties of the wavelet transform. Here we apply the Maximum Overlap Discrete Wavelet Transform both to reduce signal noise and to use the signal variance exhibited at differing inherent frequency levels to develop various metrics of connection between the electrodes placed upon the scalp. Using short duration epochs, to approximate close to real time monitoring, together with simple statistical parameters derived from the reconstructed noise reduced signals, we initiate seizure detection. To further improve performance we utilise graph theoretic indicators from derived electrode connectivity. From there we build the attribute space. We utilise open-source software and publicly available data to highlight the superior Recall/Sensitivity performance of our approach, when compared to existing published methods.
Submitted 27 November, 2023;
originally announced December 2023.
-
Respiratory Anomaly Detection using Reflected Infrared Light-wave Signals
Authors:
Md Zobaer Islam,
Brenden Martin,
Carly Gotcher,
Tyler Martinez,
John F. O'Hara,
Sabit Ekin
Abstract:
In this study, we present a non-contact respiratory anomaly detection method using incoherent light-wave signals reflected from the chest of a mechanical robot that can breathe like human beings. In comparison to existing radar and camera-based sensing systems for vitals monitoring, this technology uses only a low-cost ubiquitous infrared light source and sensor. This light-wave sensing system recognizes different breathing anomalies from the variations of light intensity reflected from the chest of the robot within a 0.5m-1.5m range with an average classification accuracy of up to 96.6% using machine learning.
Submitted 22 April, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
-
Critical Role of Artificially Intelligent Conversational Chatbot
Authors:
Seraj A. M. Mostafa,
Md Z. Islam,
Mohammad Z. Islam,
Fairose Jeehan,
Saujanna Jafreen,
Raihan U. Islam
Abstract:
Artificially intelligent chatbots, such as ChatGPT, represent a recent and powerful advancement in the AI domain. Users prefer them for obtaining quick and precise answers, avoiding the usual hassle of clicking through multiple links in traditional searches. ChatGPT's conversational approach makes it comfortable and accessible for finding answers quickly and in an organized manner. However, it is important to note that these chatbots have limitations, especially in providing accurate answers, and that they raise ethical concerns. In this study, we explore various scenarios involving ChatGPT's ethical implications within academic contexts, its limitations, and the potential misuse by specific user groups. To address these challenges, we propose architectural solutions aimed at preventing inappropriate use and promoting responsible AI interactions.
Submitted 31 October, 2023;
originally announced October 2023.
-
Predicting Temperature of Major Cities Using Machine Learning and Deep Learning
Authors:
Wasiou Jaharabi,
MD Ibrahim Al Hossain,
Rownak Tahmid,
Md. Zuhayer Islam,
T. M. Saad Rayhan
Abstract:
Climate change is currently the issue that most concerns world leaders because of its effects on agriculture, the environment, and the economies of daily life. Accurate temperature prediction is therefore vital. So far, the most effective and widely used approach to such forecasting is numerical weather prediction (NWP), a mathematical model that needs broad data from different applications to make predictions. This expensive, time-consuming, and labor-intensive work can be reduced by making such predictions with machine learning algorithms. Using the database compiled by the University of Dayton, which records temperature changes in major cities, we applied time series analysis, using LSTM to turn existing data into a tool for future prediction. LSTM takes the long-term data as well as any short-term exceptions or anomalies that may have occurred and calculates the trend, seasonality, and stationarity of the data. By using models such as ARIMA, SARIMA, and Prophet together with the concepts of RNN and LSTM, we can filter out abnormalities, preprocess the data, compare it with previous trends, and predict future trends. Seasonality and stationarity also help us analyze patterns that recur over a one-year period and remove the time dependence of the data, so that the general predicted changes can be seen. In this way, we predicted the temperature of different cities at any future time based on the available data and built a method for accurate prediction. This document describes our methodology for making such predictions.
Submitted 23 September, 2023;
originally announced September 2023.
-
Malware Resistant Data Protection in Hyper-connected Networks: A survey
Authors:
Jannatul Ferdous,
Rafiqul Islam,
Maumita Bhattacharya,
Md Zahidul Islam
Abstract:
Data protection is the process of securing sensitive information from being corrupted, compromised, or lost. A hyperconnected network, on the other hand, is a computer networking trend in which communication occurs over a network. However, what about malware? Malware is malicious software meant to penetrate private data, threaten a computer system, or gain unauthorised network access without the user's consent. Due to the increasing applications of computers and dependency on electronically saved private data, malware attacks on sensitive information have become a dangerous issue for individuals and organizations across the world. Hence, malware defense is critical for keeping our computer systems and data protected. Many recent survey articles have focused either on malware detection systems or on single attack strategies. To the best of our knowledge, no survey paper covers malware attack patterns and defense strategies together. This survey aims to address this gap by bringing together diverse malicious attack patterns and machine learning (ML) based detection models for modern and sophisticated malware. In doing so, we focus on the taxonomy of malware attack patterns based on four fundamental dimensions: the primary goal of the attack, the method of attack, the targeted exposure and execution process, and the types of malware that perform each attack. Malware analysis approaches are also examined in detail. In addition, existing malware detection techniques employing feature extraction and ML algorithms are discussed extensively. Finally, the paper discusses research difficulties and unsolved problems, including future research directions.
Submitted 24 July, 2023;
originally announced July 2023.
-
A Semi-Automated Hybrid Schema Matching Framework for Vegetation Data Integration
Authors:
Md Asif-Ur-Rahman,
Bayzid Ashik Hossain,
Michael Bewong,
Md Zahidul Islam,
Yanchang Zhao,
Jeremy Groves,
Rory Judith
Abstract:
Integrating disparate and distributed vegetation data is critical for consistent and informed national policy development and management. Australia's National Vegetation Information System (NVIS) under the Department of Climate Change, Energy, the Environment and Water (DCCEEW) is the only nationally consistent vegetation database and hierarchical typology of vegetation types in different locations. Currently, this database employs manual approaches for integrating disparate state and territory datasets, which is labour intensive and can be prone to human errors. To cope with the ever-increasing need for up-to-date vegetation data derived from heterogeneous data sources, a Semi-Automated Hybrid Matcher (SAHM) is proposed in this paper. SAHM utilizes both schema level and instance level matching following a two-tier matching framework. A key novel technique in SAHM called Multivariate Statistical Matching is proposed for automated schema scoring, which takes advantage of domain knowledge and correlations between attributes to enhance the matching. To verify the effectiveness of the proposed framework, the performance of the individual as well as the combined components of SAHM has been evaluated. The empirical evaluation shows the effectiveness of the proposed framework, which outperforms existing state-of-the-art methods like Cupid, Coma, Similarity Flooding, Jaccard Leven Matcher, Distribution Based Matcher, and EmbDI. In particular, SAHM achieves between 88% and 100% accuracy with significantly better F1 scores in comparison with state-of-the-art techniques. SAHM is also shown to be several orders of magnitude more efficient than existing techniques.
Submitted 10 May, 2023;
originally announced May 2023.
-
Enhancing Cluster Quality of Numerical Datasets with Domain Ontology
Authors:
Sudath Rohitha Heiyanthuduwage,
Md Anisur Rahman,
Md Zahidul Islam
Abstract:
Ontology-based clustering has gained attention in recent years due to the potential benefits of ontology. Current ontology-based clustering approaches have mainly been applied to reduce the dimensionality of attributes in text document clustering. Reduction in dimensionality of attributes using ontology helps to produce high quality clusters for a dataset. However, ontology-based approaches to clustering numerical datasets have not gained enough attention. Moreover, some literature mentions that ontology-based clustering can produce either high quality or low-quality clusters from a dataset. Therefore, in this paper we present a clustering approach that uses domain ontology to reduce the dimensionality of attributes in a numerical dataset and to produce high quality clusters. For every dataset, we produce three datasets using domain ontology. We then cluster these datasets using a genetic algorithm-based clustering technique called GenClust++. The clusters of each dataset are evaluated in terms of Sum of Squared-Error (SSE). We use six numerical datasets to evaluate the performance of our ontology-based approach. The experimental results of our approach indicate that cluster quality gradually improves from the lower to the higher levels of a domain ontology.
Submitted 2 April, 2023;
originally announced April 2023.
-
Combined Location Online Weather Data: Easy-to-use Targeted Weather Analysis for Agriculture
Authors:
Darren Yates,
Christopher Blanchard,
Allister Clarke,
Sabih-Ur Rehman,
Md Zahidul Islam,
Russell Ford,
Rob Walsh
Abstract:
The continuing effects of climate change require farmers and growers to have greater understanding of how these changes affect crop production. However, while climatic data is generally available to help provide much of that understanding, it can often be in a form not easy to digest. The proposed Combined Location Online Weather Data (CLOWD) framework is an easy-to-use online platform for analysing recent and historical weather data of any location within Australia at the click of a map. CLOWD requires no programming skills and operates in any HTML5 web browser on PC and mobile devices. It enables comparison between current and previous growing seasons over a range of environmental parameters, and can create a plain-English PDF report for offline use, using natural language generation (NLG). This paper details the platform, the design decisions taken and outlines how farmers and growers can use CLOWD to better understand current growing conditions. Prototypes of CLOWD are now online for PCs and smartphones.
Submitted 13 February, 2023;
originally announced February 2023.
-
Real-Time Traffic End-of-Queue Detection and Tracking in UAV Video
Authors:
Russ Messenger,
Md Zobaer Islam,
Matthew Whitlock,
Erik Spong,
Nate Morton,
Layne Claggett,
Chris Matthews,
Jordan Fox,
Leland Palmer,
Dane C. Johnson,
John F. O'Hara,
Christopher J. Crick,
Jamey D. Jacob,
Sabit Ekin
Abstract:
Highway work zones are susceptible to undue accumulation of motorized vehicles, which calls for dynamic work zone warning signs to prevent accidents. The work zone signs are placed according to the location of the end-of-queue of vehicles, which usually changes rapidly. The detection of moving objects in video captured by Unmanned Aerial Vehicles (UAV) has been extensively researched so far, and is used in a wide array of applications including traffic monitoring. Unlike fixed traffic cameras, UAVs can be used to monitor the traffic at work zones in real-time and also in a more cost-effective way. This study presents a proof-of-concept method for detecting End-of-Queue (EOQ) of traffic by processing the real-time video footage of a highway work zone captured by UAV. EOQ is detected in the video by image processing which includes background subtraction and blob detection methods. This dynamic localization of EOQ of vehicles will enable faster and more accurate relocation of work zone warning signs for drivers and thus will reduce work zone fatalities. The method can also be applied to detect EOQ of vehicles and notify drivers on any other roads or intersections where vehicles are rapidly accumulating due to special events, traffic jams, construction, or accidents.
Submitted 31 October, 2023; v1 submitted 9 January, 2023;
originally announced February 2023.
-
A Brief Overview of Software-Defined Networking
Authors:
Alexander Nunez,
Joseph Ayoka,
Md Zahidul Islam,
Pablo Ruiz
Abstract:
The Internet is the driving force of the new digital world, which has created a revolution. With the concept of the Internet of Things (IoT), almost everything is being connected to the internet. However, with the traditional IP network system, it is computationally very complex and costly to manage and configure the network, where the data plane and the control plane are tightly coupled. In order to simplify the network management tasks, software-defined networking (SDN) has been proposed as a promising paradigm shift towards an externalized and logically centralized network control plane. SDN decouples the control plane and the data plane and provides programmability to configure the network. To address the overwhelming advancement of this new technology, a holistic overview of SDN is provided in this paper by describing different layers and their functionalities in SDN. The paper presents a simple but effective overview of SDN, which will pave the way for the readers to understand this new technology and contribute to this field.
Submitted 31 January, 2023;
originally announced February 2023.
-
Hand Gesture Recognition through Reflected Infrared Light Wave Signals
Authors:
Md Zobaer Islam,
Li Yu,
Hisham Abuella,
John F. O'Hara,
Christopher Crick,
Sabit Ekin
Abstract:
In this study, we present a wireless (non-contact) gesture recognition method using only incoherent light wave signals reflected from a human subject. In comparison to existing radar, light shadow, sound and camera-based sensing systems, this technology uses a low-cost ubiquitous light source (e.g., infrared LED) to send light towards the subject's hand performing gestures and the reflected light is collected by a light sensor (e.g., photodetector). This light wave sensing system recognizes different gestures from the variations of the received light intensity within a 20-35cm range. The hand gesture recognition results demonstrate up to 96% accuracy on average. The developed system can be utilized in numerous Human-computer Interaction (HCI) applications as a low-cost and non-contact gesture recognition technology.
Submitted 13 June, 2023; v1 submitted 14 January, 2023;
originally announced January 2023.
-
Noncontact Respiratory Anomaly Detection Using Infrared Light-Wave Sensing
Authors:
Md Zobaer Islam,
Brenden Martin,
Carly Gotcher,
Tyler Martinez,
John F. O'Hara,
Sabit Ekin
Abstract:
Human respiratory rate and its pattern convey essential information about the physical and psychological states of the subject. Abnormal breathing can indicate fatal health issues leading to further diagnosis and treatment. Wireless light-wave sensing (LWS) using incoherent infrared light shows promise in safe, discreet, efficient, and non-invasive human breathing monitoring without raising privacy concerns. The respiration monitoring system needs to be trained on different types of breathing patterns to identify breathing anomalies. The system must also validate the collected data as a breathing waveform, discarding any faulty data caused by external interruption, user movement, or system malfunction. To address these needs, this study simulated normal and different types of abnormal respiration using a robot that mimics human breathing patterns. Then, time-series respiration data were collected using infrared light-wave sensing technology. Three machine learning algorithms, decision tree, random forest and XGBoost, were applied to detect breathing anomalies and faulty data. Model performances were evaluated through cross-validation, assessing classification accuracy, precision and recall scores. The random forest model achieved the highest classification accuracy of 96.75% with data collected at a 0.5m distance. In general, ensemble models like random forest and XGBoost performed better than a single model in classifying the data collected at multiple distances from the light-wave sensing setup.
Submitted 16 April, 2024; v1 submitted 9 January, 2023;
originally announced January 2023.
-
How Do Organizations Seek Cyber Assurance? Investigations on the Adoption of the Common Criteria and Beyond
Authors:
Nan Sun,
Chang-Tsun Li,
Hin Chan,
Md Zahidul Islam,
Md Rafiqul Islam,
Warren Armstrong
Abstract:
Cyber assurance, which is the ability to operate under the onslaught of cyber attacks and other unexpected events, is essential for organizations facing inundating security threats on a daily basis. Organizations usually employ multiple strategies to conduct risk management to achieve cyber assurance. Utilizing cybersecurity standards and certifications can provide guidance for vendors to design a…
▽ More
Cyber assurance, which is the ability to operate under the onslaught of cyber attacks and other unexpected events, is essential for organizations facing inundating security threats on a daily basis. Organizations usually employ multiple strategies to conduct risk management to achieve cyber assurance. Utilizing cybersecurity standards and certifications can provide guidance for vendors to design and manufacture secure Information and Communication Technology (ICT) products as well as provide a level of assurance of the security functionality of the products for consumers. Hence, employing security standards and certifications is an effective strategy for risk management and cyber assurance. In this work, we begin with investigating the adoption of cybersecurity standards and certifications by surveying 258 participants from organizations across various countries and sectors. Specifically, we identify adoption barriers of the Common Criteria through the designed questionnaire. Taking into account the seven identified adoption barriers, we show the recommendations for promoting cybersecurity standards and certifications. Moreover, beyond cybersecurity standards and certifications, we shed light on other risk management strategies devised by our participants, which provides directions on cybersecurity approaches for enhancing cyber assurance in organizations.
Submitted 5 March, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
-
STRIDE-based Cyber Security Threat Modeling for IoT-enabled Precision Agriculture Systems
Authors:
Md. Rashid Al Asif,
Khondokar Fida Hasan,
Md Zahidul Islam,
Rahamatullah Khondoker
Abstract:
The concept of traditional farming is changing rapidly with the introduction of smart technologies like the Internet of Things (IoT). Under the concept of smart agriculture, precision agriculture is gaining popularity to enable Decision Support System (DSS)-based farming management that utilizes widespread IoT sensors and wireless connectivity to enable automated detection and optimization of resources. Undoubtedly, the success of the system would have an impact on crop productivity, whereas a failure would impact it severely. Like many other cyber-physical systems, one of the growing challenges to avoid system adversity is to ensure the system's security, privacy, and trust. But what are the vulnerabilities, threats, and security issues we should consider while deploying precision agriculture? This paper conducts holistic threat modeling on the component levels of precision agriculture's standard infrastructure using the popular threat intelligence tool STRIDE to identify common security issues. Our modeling identifies fifty-eight potential security threats to consider. We present them systematically and offer general mitigation suggestions to support cyber security in precision agriculture.
Submitted 30 January, 2022; v1 submitted 24 January, 2022;
originally announced January 2022.
-
Defining Security Requirements with the Common Criteria: Applications, Adoptions, and Challenges
Authors:
Nan Sun,
Chang-Tsun Li,
Hin Chan,
Ba Dung Le,
MD Zahidul Islam,
Leo Yu Zhang,
MD Rafiqul Islam,
Warren Armstrong
Abstract:
Advances in emerging Information and Communications Technology (ICT) technologies push the boundaries of what is possible and open up new markets for innovative ICT products and services. The adoption of ICT products and systems with security properties depends on consumers' confidence and markets' trust in the security functionalities and whether the assurance measures applied to these products meet the inherent security requirements. Such confidence and trust are primarily gained through the rigorous development of security requirements, validation criteria, evaluation, and certification. Common Criteria for Information Technology Security Evaluation (often referred to as Common Criteria or CC) is an international standard (ISO/IEC 15408) for cyber security certification. In this paper, we conduct a systematic review of the CC standards and its adoptions. Adoption barriers of the CC are also investigated based on the analysis of current trends in security evaluation. Specifically, we share the experiences and lessons gained through the recent Development of Australian Cyber Criteria Assessment (DACCA) project that promotes the CC among stakeholders in ICT security products related to specification, development, evaluation, certification and approval, procurement, and deployment. Best practices on developing Protection Profiles, recommendations, and future directions for trusted cybersecurity advancement are presented.
Submitted 2 April, 2022; v1 submitted 19 January, 2022;
originally announced January 2022.
-
BRACU Mongol Tori: Next Generation Mars Exploration Rover
Authors:
Niaz Sharif Shourov,
Masnur Rahman,
Mohammad Zahirul Islam,
Ali Ahsan,
Syed Md Kamruzzaman,
Saifur Rahman,
Md Sakiluzzaman,
Intisar Hasnain,
Ekhwan Islam,
Saiful Islam,
Md. Khalilur Rhaman
Abstract:
BRAC University (BRACU) has participated in the University Rover Challenge (URC), a robotics competition for university-level students organized by the Mars Society to design and build a rover that would be of use to early explorers on Mars. BRACU has designed and developed a fully functional next-generation Mars rover, Mongol Tori, which can be operated in the extreme, hostile conditions expected on the planet Mars. Not only is Mongol Tori equipped with both autonomous and manually controlled features, it is also capable of conducting scientific tasks to identify the characteristics of soils and weathering in the Mars environment.
Submitted 2 November, 2021;
originally announced November 2021.
-
A Generalised Logical Layered Architecture for Blockchain Technology
Authors:
Jared Newell,
Quazi Mamun,
Sabih ur Rehman,
Md Zahidul Islam
Abstract:
Precision, validity, reliability, timeliness, availability, and granularity are the desired characteristics for data and information systems. However, due to the desired trait of data mutability, information systems have inherently lacked the ability to enforce data integrity without governance. A resolution to this challenge has emerged in the shape of blockchain architecture, which ensures immutability of stored information, whilst remaining in an online state. Blockchain technology achieves this through the serial attachment of set-sized parcels of data called blocks. Links (likened to a chain) between these blocks are implemented using a cryptographic seal created using mathematical functions on the data inside the blocks. Practical implementations of blockchain vary by different components, concepts, and terminologies. Researchers proposed various architectural models using different layers to implement blockchain technologies. In this paper, we investigated those layered architectures for different use cases. We identified essential layers and components for a generalised blockchain architecture. We present a novel three-tiered storage model for the purpose of logically defining and categorising blockchain as a storage technology. We envision that this generalised model will be used as a guide when referencing and building any blockchain storage solution.
Submitted 18 October, 2021;
originally announced October 2021.
-
EEG Signal Processing using Wavelets for Accurate Seizure Detection through Cost Sensitive Data Mining
Authors:
Paul Grant,
Md Zahidul Islam
Abstract:
Epilepsy is one of the most common and yet diverse set of chronic neurological disorders. This excessive or synchronous neuronal activity is termed seizure. Electroencephalogram signal processing plays a significant role in detection and prediction of epileptic seizures. In this paper we introduce an approach that relies upon the properties of wavelets for seizure detection. We utilise the Maximum Overlap Discrete Wavelet Transform, which enables us to reduce signal noise. Then, from the variance exhibited in wavelet coefficients, we develop connectivity and communication efficiency between the electrodes, as these properties differ significantly during a seizure period in comparison to a non-seizure period. We use basic statistical parameters derived from the reconstructed noise-reduced signal, electrode connectivity and the efficiency of information transfer to build the attribute space.
We have utilised publicly available data to test our method, which is found to be significantly better than some existing approaches.
Submitted 21 September, 2021;
originally announced September 2021.
-
Signal Classification using Smooth Coefficients of Multiple wavelets
Authors:
Paul Grant,
Md Zahidul Islam
Abstract:
Classification of time series signals has become an important construct and has many practical applications. With existing classifiers we may be able to accurately classify signals, however that accuracy may decline if using a reduced number of attributes. Transforming the data then undertaking reduction in dimensionality may improve the quality of the data analysis, decrease time required for classification and simplify models. We propose an approach, which chooses suitable wavelets to transform the data, then combines the output from these transforms to construct a dataset to then apply ensemble classifiers to. We demonstrate this on different data sets, across different classifiers and use differing evaluation methods. Our experimental results demonstrate the effectiveness of the proposed technique, compared to the approaches that use either raw signal data or a single wavelet transform.
Submitted 21 September, 2021;
originally announced September 2021.
-
A Framework for Supervised Heterogeneous Transfer Learning using Dynamic Distribution Adaptation and Manifold Regularization
Authors:
Md Geaur Rahman,
Md Zahidul Islam
Abstract:
Transfer learning aims to learn classifiers for a target domain by transferring knowledge from a source domain. However, due to two main issues: feature discrepancy and distribution divergence, transfer learning can be a very difficult problem in practice. In this paper, we present a framework called TLF that builds a classifier for the target domain having only few labeled training records by transferring knowledge from the source domain having many labeled records. While existing methods often focus on one issue and leave the other one for the further work, TLF is capable of handling both issues simultaneously. In TLF, we alleviate feature discrepancy by identifying shared label distributions that act as the pivots to bridge the domains. We handle distribution divergence by simultaneously optimizing the structural risk functional, joint distributions between domains, and the manifold consistency underlying marginal distributions. Moreover, for the manifold consistency we exploit its intrinsic properties by identifying k nearest neighbors of a record, where the value of k is determined automatically in TLF. Furthermore, since negative transfer is not desired, we consider only the source records that are belonging to the source pivots during the knowledge transfer. We evaluate TLF on seven publicly available natural datasets and compare the performance of TLF against the performance of fourteen state-of-the-art techniques. We also evaluate the effectiveness of TLF in some challenging situations. Our experimental results, including statistical sign test and Nemenyi test analyses, indicate a clear superiority of the proposed framework over the state-of-the-art techniques.
Submitted 2 September, 2022; v1 submitted 27 August, 2021;
originally announced August 2021.
-
Intelligent Stretch Reduction in Information-Centric Networking towards 5G-Tactile Internet realization
Authors:
Hussain Ahmad,
Muhammad Zubair Islam,
Amir Haider,
Rashid Ali,
Hyung Seok Kim
Abstract:
In recent years, 5G has been widely used in parallel with IoT networks to enable massive data connectivity and exchange with ultra-reliable and low latency communication (URLLC) services. The internet requirements from the user's perspective have shifted from simple human-to-human interactions to different communication paradigms and information-centric networking (ICN). ICN distributes the content among the users based on their trending requests. ICN is responsible not only for the routing and caching but also for naming the network's content. ICN considers several parameters such as cache-hit ratio, content diversity, content redundancy, and stretch to route the content. ICN enables name-based caching of the required content according to the user's request based on the router's interest table. The stretch shows the path covered while retrieving the content from producer to consumer. Reduction in path length also leads to a reduction in end-to-end latency and better data rate availability. ICN routers must have the minimum stretch to obtain a better system efficiency. Reinforcement learning (RL) is widely used in network environments to increase agent efficiency in making decisions. In ICN, RL can help increase caching and stretch efficiency. This paper investigates a stretch reduction strategy for ICN routers by formulating the stretch reduction problem as a Markov decision process. The evaluation of the proposed stretch reduction strategy's accuracy is done by employing Q-Learning, an RL technique. The simulation results indicate that by using the optimal parameters for the proposed stretch reduction strategy.
Submitted 16 March, 2021;
originally announced March 2021.
-
Adaptive Decision Forest: An Incremental Machine Learning Framework
Authors:
Md Geaur Rahman,
Md Zahidul Islam
Abstract:
In this study, we present an incremental machine learning framework called Adaptive Decision Forest (ADF), which produces a decision forest to classify new records. Based on our two novel theorems, we introduce a new splitting strategy called iSAT, which allows ADF to classify new records even if they are associated with previously unseen classes. ADF is capable of identifying and handling concept drift; it, however, does not forget previously gained knowledge. Moreover, ADF is capable of handling big data if the data can be divided into batches. We evaluate ADF on five publicly available natural data sets and one synthetic data set, and compare the performance of ADF against the performance of eight state-of-the-art techniques. Our experimental results, including statistical sign test and Nemenyi test analyses, indicate a clear superiority of the proposed framework over the state-of-the-art techniques.
Submitted 28 January, 2021;
originally announced January 2021.
-
Detecting Autism Spectrum Disorder using Machine Learning
Authors:
Md Delowar Hossain,
Muhammad Ashad Kabir,
Adnan Anwar,
Md Zahidul Islam
Abstract:
Autism Spectrum Disorder (ASD), which is a neurodevelopmental disorder, is often accompanied by sensory issues such as over-sensitivity or under-sensitivity to sounds, smells, or touch. Although its main cause is genetic in nature, early detection and treatment can help to improve the conditions. In recent years, machine learning based intelligent diagnosis has evolved to complement the traditional clinical methods, which can be time consuming and expensive. The focus of this paper is to find the most significant traits and automate the diagnosis process using available classification techniques for improved diagnosis. We have analyzed ASD datasets of Toddler, Child, Adolescent and Adult. We determine the best performing classifier for these binary datasets using the evaluation metrics recall, precision, F-measures and classification errors. Our findings show that the Sequential Minimal Optimization (SMO) based Support Vector Machine (SVM) classifier outperforms all other benchmark machine learning algorithms in terms of accuracy during the detection of ASD cases and produces fewer classification errors compared to other algorithms. Also, we find that the Relief Attributes algorithm is the best at identifying the most significant attributes in ASD datasets.
Submitted 30 September, 2020;
originally announced September 2020.
-
FastForest: Increasing Random Forest Processing Speed While Maintaining Accuracy
Authors:
Darren Yates,
Md Zahidul Islam
Abstract:
Random Forest remains one of Data Mining's most enduring ensemble algorithms, achieving well-documented levels of accuracy and processing speed, as well as regularly appearing in new research. However, with data mining now reaching the domain of hardware-constrained devices such as smartphones and Internet of Things (IoT) devices, there is continued need for further research into algorithm efficiency to deliver greater processing speed without sacrificing accuracy. Our proposed FastForest algorithm delivers an average 24% increase in processing speed compared with Random Forest whilst maintaining (and frequently exceeding) it on classification accuracy over tests involving 45 datasets. FastForest achieves this result through a combination of three optimising components - Subsample Aggregating ('Subbagging'), Logarithmic Split-Point Sampling and Dynamic Restricted Subspacing. Moreover, detailed testing of Subbagging sizes has found an optimal scalar delivering a positive mix of processing performance and accuracy.
Submitted 6 April, 2020;
originally announced April 2020.
-
A Novel Incremental Clustering Technique with Concept Drift Detection
Authors:
Mitchell D. Woodbright,
Md Anisur Rahman,
Md Zahidul Islam
Abstract:
Data are being collected from various aspects of life. These data can often arrive in chunks/batches. Traditional static clustering algorithms are not suitable for dynamic datasets, i.e., when data arrive in streams of chunks/batches. If we apply a conventional clustering technique over the combined dataset, then every time a new batch of data comes, the process can be slow and wasteful. Moreover, it can be challenging to store the combined dataset in memory due to its ever-increasing size. As a result, various incremental clustering techniques have been proposed. These techniques need to efficiently update the current clustering result whenever a new batch arrives, to adapt the current clustering result/solution with the latest data. These techniques also need the ability to detect concept drifts when the clustering pattern of a new batch is significantly different from older batches. Sometimes, clustering patterns may drift temporarily in a single batch while the next batches do not exhibit the drift. Therefore, incremental clustering techniques need the ability to detect a temporary drift and sustained drift. In this paper, we propose an efficient incremental clustering algorithm called UIClust. It is designed to cluster streams of data chunks, even when there are temporary or sustained concept drifts. We evaluate the performance of UIClust by comparing it with a recently published, high-quality incremental clustering algorithm. We use real and synthetic datasets. We compare the results by using well-known clustering evaluation criteria: entropy, sum of squared errors (SSE), and execution time. Our results show that UIClust outperforms the existing technique in all our experiments.
Submitted 30 March, 2020;
originally announced March 2020.
-
Tree Index: A New Cluster Evaluation Technique
Authors:
A. H. Beg,
Md Zahidul Islam,
Vladimir Estivill-Castro
Abstract:
We introduce a cluster evaluation technique called Tree Index. Our Tree Index algorithm aims at describing the structural information of the clustering rather than the quantitative format of cluster-quality indexes (where the representation power of clustering is some cumulative error similar to vector quantization). Our Tree Index is finding margins amongst clusters for easy learning without the complications of Minimum Description Length. Our Tree Index produces a decision tree from the clustered data set, using the cluster identifiers as labels. It combines the entropy of each leaf with their depth. Intuitively, a shorter tree with pure leaves generalizes the data well (the clusters are easy to learn because they are well separated). So, the labels are meaningful clusters. If the clustering algorithm does not separate well, trees learned from their results will be large and too detailed. We show that, on the clustering results (obtained by various techniques) on a brain dataset, Tree Index discriminates between reasonable and non-sensible clusters. We confirm the effectiveness of Tree Index through graphical visualizations. Tree Index evaluates the sensible solutions higher than the non-sensible solutions while existing cluster-quality indexes fail to do so.
Submitted 24 March, 2020;
originally announced March 2020.
-
Data Pre-Processing and Evaluating the Performance of Several Data Mining Methods for Predicting Irrigation Water Requirement
Authors:
Mahmood A. Khan,
Md Zahidul Islam,
Mohsin Hafeez
Abstract:
Recent drought and population growth are placing unprecedented demand on the use of available limited water resources. Irrigated agriculture is one of the major consumers of freshwater. A large amount of water in irrigated agriculture is wasted due to poor water management practices. To improve water management in irrigated areas, models for estimation of future water requirements are needed. Developing a model for forecasting irrigation water demand can improve water management practices and maximise water productivity. Data mining can be used effectively to build such models.
In this study, we prepare a dataset containing information on suitable attributes for forecasting irrigation water demand. The data is obtained from three different sources, namely meteorological data, remote sensing images and water delivery statements. In order to make the prepared dataset useful for demand forecasting and pattern extraction, we pre-process the dataset using a novel approach based on a combination of irrigation and data mining knowledge. We then apply and compare the effectiveness of different data mining methods, namely decision tree (DT), artificial neural networks (ANNs), systematically developed forest (SysFor) for multiple trees, support vector machine (SVM), logistic regression, and the traditional Evapotranspiration (ETc) methods, and evaluate the performance of these models to predict irrigation water demand. Our experimental results indicate the usefulness of data pre-processing and the effectiveness of different classifiers. Among the six methods we used, SysFor produces the best prediction with 97.5% accuracy, followed by a decision tree with 96% and ANN with 95% respectively, by closely matching the predictions with actual water usage. Therefore, we recommend using SysFor and DT models for irrigation water demand forecasting.
Submitted 1 March, 2020;
originally announced March 2020.
-
DataLearner: A Data Mining and Knowledge Discovery Tool for Android Smartphones and Tablets
Authors:
Darren Yates,
Md Zahidul Islam,
Junbin Gao
Abstract:
Smartphones have become the ultimate 'personal' computer, yet despite this, general-purpose data-mining and knowledge discovery tools for mobile devices are surprisingly rare. DataLearner is a new data-mining application designed specifically for Android devices that imports the Weka data-mining engine and augments it with algorithms developed by Charles Sturt University. Moreover, DataLearner can be expanded with additional algorithms. Combined, DataLearner delivers 40 classification, clustering and association rule mining algorithms for model training and evaluation without need for cloud computing resources or network connectivity. It provides the same classification accuracy as PCs and laptops, while doing so with acceptable processing speed and consuming negligible battery life. With its ability to provide easy-to-use data-mining on a phone-size screen, DataLearner is a new portable, self-contained data-mining tool for remote, personalised and learning applications alike. DataLearner features four elements - this paper, the app available on Google Play, the GPL3-licensed source code on GitHub and a short video on YouTube.
Submitted 9 June, 2019;
originally announced June 2019.
-
Decision Tree Classification with Differential Privacy: A Survey
Authors:
Sam Fletcher,
Md Zahidul Islam
Abstract:
Data mining information about people is becoming increasingly important in the data-driven society of the 21st century. Unfortunately, sometimes there are real-world considerations that conflict with the goals of data mining; sometimes the privacy of the people being data mined needs to be considered. This necessitates that the output of data mining algorithms be modified to preserve privacy while simultaneously not ruining the predictive power of the outputted model. Differential privacy is a strong, enforceable definition of privacy that can be used in data mining algorithms, guaranteeing that nothing will be learned about the people in the data that could not already be discovered without their participation. In this survey, we focus on one particular data mining algorithm -- decision trees -- and how differential privacy interacts with each of the components that constitute decision tree algorithms. We analyze both greedy and random decision trees, and the conflicts that arise when trying to balance privacy requirements with the accuracy of the model.
Submitted 23 May, 2019; v1 submitted 7 November, 2016;
originally announced November 2016.
-
Differentially Private Random Decision Forests using Smooth Sensitivity
Authors:
Sam Fletcher,
Md Zahidul Islam
Abstract:
We propose a new differentially-private decision forest algorithm that minimizes both the number of queries required, and the sensitivity of those queries. To do so, we build an ensemble of random decision trees that avoids querying the private data except to find the majority class label in the leaf nodes. Rather than using a count query to return the class counts like the current state-of-the-art, we use the Exponential Mechanism to only output the class label itself. This drastically reduces the sensitivity of the query -- often by several orders of magnitude -- which in turn reduces the amount of noise that must be added to preserve privacy. Our improved sensitivity is achieved by using "smooth sensitivity", which takes into account the specific data used in the query rather than assuming the worst-case scenario. We also extend work done on the optimal depth of random decision trees to handle continuous features, not just discrete features. This, along with several other improvements, allows us to create a differentially private decision forest with substantially higher predictive power than the current state-of-the-art.
Submitted 23 August, 2021; v1 submitted 11 June, 2016;
originally announced June 2016.
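As a rough illustration of the label-selection step described in the abstract above, the following Python sketch applies the standard Exponential Mechanism to the class counts of a single leaf and returns only a label. It is not the authors' implementation: the function name `exponential_mechanism_label`, the use of the raw class count as the utility score, and the fixed `sensitivity` parameter are assumptions made for this example. In particular, it does not compute the smooth sensitivity that the paper relies on.

```python
import math
import random


def exponential_mechanism_label(class_counts, epsilon, sensitivity=1.0, rng=random):
    """Pick one class label from a leaf using the Exponential Mechanism.

    class_counts: dict mapping label -> number of records in the leaf.
    epsilon:      privacy budget spent on this single query.
    sensitivity:  assumed sensitivity of the utility (the class count).
                  A fixed value is used here; the paper uses smooth sensitivity.
    rng:          object exposing choices(), e.g. the random module or
                  a random.Random instance.
    """
    labels = list(class_counts)
    # Subtract the maximum utility before exponentiating so that the
    # largest weight is exactly 1.0 and exp() cannot overflow.
    max_utility = max(class_counts.values())
    weights = [
        math.exp(epsilon * (class_counts[label] - max_utility) / (2.0 * sensitivity))
        for label in labels
    ]
    # Sample a label with probability proportional to its weight.
    return rng.choices(labels, weights=weights, k=1)[0]


if __name__ == "__main__":
    leaf = {"yes": 40, "no": 25, "maybe": 3}
    print(exponential_mechanism_label(leaf, epsilon=0.5))
```

Because only a label leaves the function, no noisy counts are released; a smaller `sensitivity` concentrates the sampling probability on the true majority class, which is the effect the abstract attributes to smooth sensitivity.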
-
Measuring pattern retention in anonymized data -- where one measure is not enough
Authors:
Sam Fletcher,
Md Zahidul Islam
Abstract:
In this paper, we explore how modifying data to preserve privacy affects the quality of the patterns discoverable in the data. For any analysis of modified data to be worth doing, the data must be as close to the original as possible. Therein lies a problem -- how does one make sure that modified data still contains the information it had before modification? This question is not the same as asking if an accurate classifier can be built from the modified data. Often in the literature, the prediction accuracy of a classifier made from modified (anonymized) data is used as evidence that the data is similar to the original. We demonstrate that this is not the case, and we propose a new methodology for measuring the retention of the patterns that existed in the original data. We then use our methodology to design three measures that can be easily implemented, each measuring aspects of the data that no pre-existing techniques can measure. These measures do not negate the usefulness of prediction accuracy or other measures -- they are complementary to them, and support our argument that one measure is almost never enough.
Submitted 24 December, 2015;
originally announced December 2015.
-
Analyzing the Low Power Wireless Links for Wireless Sensor Networks
Authors:
Md. Mainul Islam Mamun,
Tarek Hasan-Al-Mahmud,
Sumon Kumar,
Md. Zahidul Islam
Abstract:
There is now an increased understanding of the need for realistic link layer models in wireless sensor networks. In this paper, we use mathematical techniques from communication theory to model and analyze low power wireless links. Our work provides theoretical models for the link layer showing how Packet Reception Rate (PRR) varies with Signal to Noise Ratio and distance for different modulation schemes, and a comparison between MICA2 and TinyNode in terms of PRR.
Submitted 25 February, 2010;
originally announced February 2010.
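To make the PRR relationship mentioned in the abstract above concrete, here is a minimal Python sketch of one textbook link model. It is not taken from the paper: it assumes non-coherent FSK modulation, independent bit errors, no channel coding, and that the supplied value is already the per-bit ratio Eb/N0 in dB. The function names `ber_ncfsk` and `packet_reception_rate` are hypothetical.

```python
import math


def ber_ncfsk(ebn0_linear):
    """Bit error rate of non-coherent binary FSK: 0.5 * exp(-Eb/N0 / 2)."""
    return 0.5 * math.exp(-ebn0_linear / 2.0)


def packet_reception_rate(ebn0_db, frame_bytes):
    """Probability that every bit of a frame is received correctly.

    Assumes independent bit errors and no error correction, so
    PRR = (1 - BER) ** (8 * frame_bytes).
    """
    ebn0_linear = 10.0 ** (ebn0_db / 10.0)  # convert dB to a linear ratio
    ber = ber_ncfsk(ebn0_linear)
    return (1.0 - ber) ** (8 * frame_bytes)


if __name__ == "__main__":
    # Sweep Eb/N0 to see the sharp transition between a dead and a good link.
    for ebn0_db in range(0, 21, 2):
        print(ebn0_db, round(packet_reception_rate(ebn0_db, frame_bytes=50), 4))
```

Under these assumptions the PRR moves from near 0 to near 1 over a narrow range of Eb/N0, and longer frames shift that transition toward higher values.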