-
Unbiasing on the Fly: Explanation-Guided Human Oversight of Machine Learning System Decisions
Authors:
Hussaini Mamman,
Shuib Basri,
Abdullateef Balogun,
Abubakar Abdullahi Imam,
Ganesh Kumar,
Luiz Fernando Capretz
Abstract:
The widespread adoption of ML systems across critical domains like hiring, finance, and healthcare raises growing concerns about their potential for discriminatory decision-making based on protected attributes. While efforts to ensure fairness during development are crucial, they leave deployed ML systems vulnerable to potentially exhibiting discrimination during their operations. To address this gap, we propose a novel framework for on-the-fly tracking and correction of discrimination in deployed ML systems. Leveraging counterfactual explanations, the framework continuously monitors the predictions made by an ML system and flags discriminatory outcomes. When flagged, post-hoc explanations related to the original prediction and the counterfactual alternatives are presented to a human reviewer for real-time intervention. This human-in-the-loop approach empowers reviewers to accept or override the ML system decision, enabling fair and responsible ML operation under dynamic settings. While further work is needed for validation and refinement, this framework offers a promising avenue for mitigating discrimination and building trust in ML systems deployed in a wide range of domains.
Submitted 25 June, 2024;
originally announced June 2024.
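The monitoring step described in the abstract above can be illustrated with a minimal sketch. It assumes that a prediction counts as potentially discriminatory when changing only the protected attribute changes the outcome; the abstract does not spell out the framework's actual flagging rule. The names `flag_if_discriminatory` and `toy_model` and the toy record are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def flag_if_discriminatory(
    predict: Callable[[Record], str],
    record: Record,
    protected: str,
    values: List[object],
) -> List[Record]:
    """Return the counterfactual records whose prediction differs from the original.

    A counterfactual here is the same record with only the protected
    attribute changed (an assumed, simplified definition).
    """
    original = predict(record)
    flagged = []
    for value in values:
        if value == record[protected]:
            continue  # skip the factual value
        counterfactual = {**record, protected: value}
        if predict(counterfactual) != original:
            flagged.append(counterfactual)
    return flagged


def toy_model(r: Record) -> str:
    # Hypothetical, deliberately biased model used only for illustration.
    return "approve" if r["income"] >= 50 and r["gender"] == "male" else "reject"


applicant = {"income": 60, "gender": "female"}
counterfactuals = flag_if_discriminatory(
    toy_model, applicant, "gender", ["male", "female"]
)
# A non-empty list means the decision would be routed to a human reviewer.
print(counterfactuals)
```

In this sketch a non-empty result stands in for the "flagged" state; the human reviewer's accept-or-override step is outside its scope.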
-
Cascade Generalization-based Classifiers for Software Defect Prediction
Authors:
Aminat Bashir,
Abdullateef Balogun,
Matthew Adigun,
Sunday Ajagbe,
Luiz Fernando Capretz,
Joseph Awotunde,
Hammed Mojeed
Abstract:
The process of software defect prediction (SDP) involves predicting which software system modules or components pose the highest risk of being defective. The projections and discernments derived from SDP can then assist the software development team in effectively allocating its finite resources toward potentially susceptible defective modules. Because of this, SDP models need to be improved and refined continuously. Hence, this research proposes the deployment of a cascade generalization (CG) function to enhance the predictive performances of machine learning (ML)-based SDP models. The CG function extends the initial sample space by introducing new samples into the neighbourhood of the distribution function generated by the base classification algorithm, subsequently mitigating its bias. Experiments were conducted to investigate the effectiveness of CG-based Naïve Bayes (NB), Decision Tree (DT), and k-Nearest Neighbor (kNN) models on NASA software defect datasets. Based on the experimental results, the CG-based models (CG-NB, CG-DT, CG-kNN) were superior in prediction performance when compared with the baseline NB, DT, and kNN models respectively. Accordingly, the average accuracy value of CG-NB, CG-DT, and CG-kNN models increased by +11.06%, +3.91%, and +5.14%, respectively, over baseline NB, DT, and kNN models. A similar performance was observed for the area under the curve (AUC) value, with CG-NB, CG-DT, and CG-kNN recording average AUC improvements of +7.98%, +26%, and +24.9% over the baseline NB, DT, and kNN respectively. In addition, the suggested CG-based models outperformed the Bagging and Boosting ensemble variants of the NB, DT, and kNN models as well as existing computationally diverse SDP models.
Submitted 24 June, 2024;
originally announced June 2024.
-
A Novel Multidimensional Reference Model For Heterogeneous Textual Datasets Using Context, Semantic And Syntactic Clues
Authors:
Ganesh Kumar,
Shuib Basri,
Abdullahi Abubakar Imam,
Abdullateef Oluwaqbemiga Balogun,
Hussaini Mamman,
Luiz Fernando Capretz
Abstract:
With the advent of technology and the use of the latest devices, voluminous data are produced. Of these data, 80% are unstructured and the remaining 20% are structured or semi-structured. The data produced are in heterogeneous formats and follow no standards. Among heterogeneous (structured, semi-structured and unstructured) data, textual data are nowadays used by industries for prediction and visualization of future challenges. Extracting useful information from them is challenging for stakeholders because of lexical and semantic matching. A few studies have addressed this issue using ontologies and semantic tools, but the main limitation of the proposed work was its limited coverage of multidimensional terms. To solve this problem, this study aims to produce a novel multidimensional reference model (MRM) using linguistic categories for heterogeneous textual datasets. The model focuses on categories such as context, semantic and syntactic clues, along with their scores. The main contribution of MRM is that it checks each token against each term based on indexing of linguistic categories such as synonym, antonym, formal, lexical word order and co-occurrence. The experiments show that MRM scores better than the state-of-the-art single-dimension reference model in terms of coverage, linguistic categories and heterogeneous datasets.
Submitted 10 November, 2023;
originally announced November 2023.
-
Search-Based Fairness Testing: An Overview
Authors:
Hussaini Mamman,
Shuib Basri,
Abdullateef Oluwaqbemiga Balogun,
Abdullahi Abubakar Imam,
Ganesh Kumar,
Luiz Fernando Capretz
Abstract:
Artificial Intelligence (AI) has demonstrated remarkable capabilities in domains such as recruitment, finance, healthcare, and the judiciary. However, biases in AI systems raise ethical and societal concerns, emphasizing the need for effective fairness testing methods. This paper reviews current research on fairness testing, particularly its application through search-based testing. Our analysis highlights progress and identifies areas of improvement in addressing AI systems biases. Future research should focus on leveraging established search-based testing methodologies for fairness testing.
Submitted 10 November, 2023;
originally announced November 2023.
-
Capital Market Performance and Macroeconomic Dynamics in Nigeria
Authors:
Oladapo Fapetu,
Segun Michael Ojo,
Adekunle Alexander Balogun,
Adeoba Adepoju Asaolu
Abstract:
The study examined the relationship between capital market performance and the macroeconomic dynamics in Nigeria, and it utilized secondary data spanning 1993 to 2020. The data were analyzed using the vector error correction model (VECM) technique. The result revealed a significant long-run relationship between capital market performance and macroeconomic dynamics in Nigeria. We observed long-run causality running from the exchange rate, inflation, money supply, and unemployment rate to the capital market performance indicator in Nigeria. The result supports the Arbitrage Pricing Theory (APT) proposition in the Nigerian context. The theory stipulates that the linear relationship between an asset's expected returns and the macroeconomic factors whose dynamics affect the asset's risk can forecast the asset's returns. In other words, the result of this study supports the proposition that the dynamics in the exchange rate, inflation, money supply, and unemployment rate influence the capital market performance. The study validates the recommendations of Arbitrage Pricing Theory (APT) in Nigeria.
Submitted 2 July, 2022;
originally announced July 2022.
-
HABCSm: A Hamming Based t-way Strategy Based on Hybrid Artificial Bee Colony for Variable Strength Test Sets Generation
Authors:
Ammar K Alazzawi,
Helmi Md Rais,
Shuib Basri,
Yazan A Alsariera,
Luiz Fernando Capretz,
Abdullateef Oluwagbemiga Balogun,
Abdullahi Abubakar Imam
Abstract:
Search-based software engineering that involves the deployment of meta-heuristics in applicable software processes has been gaining wide attention. Recently, researchers have been advocating the adoption of meta-heuristic algorithms for t-way testing strategies (where t denotes the interaction strength among parameters). Although helpful, no single meta-heuristic based t-way strategy can claim dominance over its counterparts. For this reason, the hybridization of meta-heuristic algorithms can help to ascertain the search capabilities of each by compensating for the limitations of one algorithm with the strength of others. Consequently, a new meta-heuristic based t-way strategy called the Hybrid Artificial Bee Colony (HABCSm) strategy, based on merging the advantages of the Artificial Bee Colony (ABC) algorithm with the advantages of a Particle Swarm Optimization (PSO) algorithm, is proposed in this paper. HABCSm is the first t-way strategy to adopt the Hybrid Artificial Bee Colony (HABC) algorithm with Hamming distance as its core method for generating a final test set and the first to adopt the Hamming distance as the final selection criterion for enhancing the exploration of new solutions. The experimental results demonstrate that HABCSm provides superior competitive performance over its counterparts. Therefore, this finding contributes to the field of software testing by minimizing the number of test cases required for test execution.
Submitted 7 October, 2021;
originally announced October 2021.
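The abstract above names Hamming distance as a selection criterion without giving details. As a generic illustration only, and not HABCSm's actual procedure, the following sketch picks, from a pool of candidate test cases, the one whose minimum Hamming distance to the already selected tests is largest. The function names `hamming` and `pick_most_distant` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

TestCase = Tuple[int, ...]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length test cases differ."""
    if len(a) != len(b):
        raise ValueError("test cases must have the same number of parameters")
    return sum(x != y for x, y in zip(a, b))


def pick_most_distant(candidates: List[TestCase], selected: List[TestCase]) -> TestCase:
    """Return the candidate farthest (in minimum Hamming distance) from the selected set.

    Assumed selection rule for illustration; ties go to the earliest candidate.
    """
    return max(
        candidates,
        key=lambda c: min((hamming(c, s) for s in selected), default=0),
    )


selected = [(0, 0, 0)]
candidates = [(0, 0, 1), (1, 1, 0), (0, 1, 0)]
best = pick_most_distant(candidates, selected)
print(best)
```

Preferring candidates that differ in many parameter values from the existing tests is one common way to encourage exploration of new value combinations.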
-
Using an Expert Panel to Validate the Malaysian SMEs-Software Process Improvement Model (MSME-SPI)
Authors:
Malek Almomani,
Shuib Basri,
Omar Almomani,
Luiz Fernando Capretz,
Abdullateef Balogun,
Moath Husni,
Abdul Rehman Gilal
Abstract:
This paper presents the components of a newly developed Malaysian SMEs - Software Process Improvement model (MSME-SPI) that can assess the SME software development industry in managing and improving its software process capability. The MSME-SPI was developed in response to practitioner needs that were highlighted in an empirical study with the Malaysian SME software development industry. After the model development, there is a need for independent feedback to show that the model meets its objectives. Consequently, the validation phase is performed by involving a group of software process improvement experts in examining the MSME-SPI model components. In addition, the effectiveness of the MSME-SPI model is validated using an expert panel. Three criteria were used to evaluate the effectiveness of the model, namely usefulness, verifiability, and structure. The results show that the model is effective for use by SMEs with minor modifications. The validation phase contributes towards a better understanding and use of the MSME-SPI model by practitioners in the field.
Submitted 10 March, 2021;
originally announced March 2021.
-
Privacy Impacts of Data Encryption on the Efficiency of Digital Forensics Technology
Authors:
Adedayo M. Balogun,
Shao Ying Zhu
Abstract:
For a number of reasons, the deployment of encryption solutions is becoming ubiquitous at both organizational and individual levels. The most emphasized reason is the necessity to ensure confidentiality of privileged information. Unfortunately, encryption is also popular as cyber-criminals' escape route from the grasp of digital forensic investigations. The direct encryption of data or indirect encryption of storage devices, more often than not, prevents access to the information contained therein. This consequently leaves the forensics investigation team, and subsequently the prosecution, little or no evidence to work with, in sixty percent of such cases. However, it is unthinkable to jeopardize the successes brought by encryption technology to information security, in favour of digital forensics technology. This paper examines what data encryption contributes to information security, and then highlights its contributions to digital forensics of disk drives. The paper also discusses the available ways and tools, in digital forensics, to get around the problems constituted by encryption. Particular attention is paid to the Truecrypt encryption solution to illustrate the ideas being discussed. It then compares encryption's contributions in both realms, to justify the need for introduction of new technologies to forensically defeat data encryption as the only solution, whilst maintaining the privacy goal of users.
Submitted 11 December, 2013;
originally announced December 2013.