Search | arXiv e-print repository

Machine Learning Techniques with Fairness for Prediction of Completion of Drug and Alcohol Rehabilitation

Authors: Karen Roberts-Licklider, Theodore Trafalis

Abstract: The aim of this study is to look at predicting whether a person will complete a drug and alcohol rehabilitation program and the number of times a person attends. The study is based on demographic data obtained from Substance Abuse and Mental Health Services Administration (SAMHSA) from both admissions and discharge data from drug and alcohol rehabilitation centers in Oklahoma. Demographic data is… ▽ More The aim of this study is to look at predicting whether a person will complete a drug and alcohol rehabilitation program and the number of times a person attends. The study is based on demographic data obtained from Substance Abuse and Mental Health Services Administration (SAMHSA) from both admissions and discharge data from drug and alcohol rehabilitation centers in Oklahoma. Demographic data is highly categorical which led to binary encoding being used and various fairness measures being utilized to mitigate bias of nine demographic variables. Kernel methods such as linear, polynomial, sigmoid, and radial basis functions were compared using support vector machines at various parameter ranges to find the optimal values. These were then compared to methods such as decision trees, random forests, and neural networks. Synthetic Minority Oversampling Technique Nominal (SMOTEN) for categorical data was used to balance the data with imputation for missing data. The nine bias variables were then intersectionalized to mitigate bias and the dual and triple interactions were integrated to use the probabilities to look at worst case ratio fairness mitigation. Disparate Impact, Statistical Parity difference, Conditional Statistical Parity Ratio, Demographic Parity, Demographic Parity Ratio, Equalized Odds, Equalized Odds Ratio, Equal Opportunity, and Equalized Opportunity Ratio were all explored at both the binary and multiclass scenarios. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2311.10832 [pdf, other]

Exploring Machine Learning Models for Federated Learning: A Review of Approaches, Performance, and Limitations

Authors: Elaheh Jafarigol, Theodore Trafalis, Talayeh Razzaghi, Mona Zamankhani

Abstract: In the growing world of artificial intelligence, federated learning is a distributed learning framework enhanced to preserve the privacy of individuals' data. Federated learning lays the groundwork for collaborative research in areas where the data is sensitive. Federated learning has several implications for real-world problems. In times of crisis, when real-time decision-making is critical, fede… ▽ More In the growing world of artificial intelligence, federated learning is a distributed learning framework enhanced to preserve the privacy of individuals' data. Federated learning lays the groundwork for collaborative research in areas where the data is sensitive. Federated learning has several implications for real-world problems. In times of crisis, when real-time decision-making is critical, federated learning allows multiple entities to work collectively without sharing sensitive data. This distributed approach enables us to leverage information from multiple sources and gain more diverse insights. This paper is a systematic review of the literature on privacy-preserving machine learning in the last few years based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Specifically, we have presented an extensive review of supervised/unsupervised machine learning algorithms, ensemble methods, meta-heuristic approaches, blockchain technology, and reinforcement learning used in the framework of federated learning, in addition to an overview of federated learning applications. This paper reviews the literature on the components of federated learning and its applications in the last few years. The main purpose of this work is to provide researchers and practitioners with a comprehensive overview of federated learning from the machine learning point of view. A discussion of some open problems and future research directions in federated learning is also provided. △ Less

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.05790 [pdf, other]

The Paradox of Noise: An Empirical Study of Noise-Infusion Mechanisms to Improve Generalization, Stability, and Privacy in Federated Learning

Authors: Elaheh Jafarigol, Theodore Trafalis

Abstract: In a data-centric era, concerns regarding privacy and ethical data handling grow as machine learning relies more on personal information. This empirical study investigates the privacy, generalization, and stability of deep learning models in the presence of additive noise in federated learning frameworks. Our main objective is to provide strategies to measure the generalization, stability, and pri… ▽ More In a data-centric era, concerns regarding privacy and ethical data handling grow as machine learning relies more on personal information. This empirical study investigates the privacy, generalization, and stability of deep learning models in the presence of additive noise in federated learning frameworks. Our main objective is to provide strategies to measure the generalization, stability, and privacy-preserving capabilities of these models and further improve them. To this end, five noise infusion mechanisms at varying noise levels within centralized and federated learning settings are explored. As model complexity is a key component of the generalization and stability of deep learning models during training and evaluation, a comparative analysis of three Convolutional Neural Network (CNN) architectures is provided. The paper introduces Signal-to-Noise Ratio (SNR) as a quantitative measure of the trade-off between privacy and training accuracy of noise-infused models, aiming to find the noise level that yields optimal privacy and accuracy. Moreover, the Price of Stability and Price of Anarchy are defined in the context of privacy-preserving deep learning, contributing to the systematic investigation of the noise infusion strategies to enhance privacy without compromising performance. Our research sheds light on the delicate balance between these critical factors, fostering a deeper understanding of the implications of noise-based regularization in machine learning. By leveraging noise as a tool for regularization and privacy enhancement, we aim to contribute to the development of robust, privacy-aware algorithms, ensuring that AI-driven solutions prioritize both utility and privacy. △ Less

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.13161 [pdf]

A Distributed Approach to Meteorological Predictions: Addressing Data Imbalance in Precipitation Prediction Models through Federated Learning and GANs

Authors: Elaheh Jafarigol, Theodore Trafalis

Abstract: The classification of weather data involves categorizing meteorological phenomena into classes, thereby facilitating nuanced analyses and precise predictions for various sectors such as agriculture, aviation, and disaster management. This involves utilizing machine learning models to analyze large, multidimensional weather datasets for patterns and trends. These datasets may include variables such… ▽ More The classification of weather data involves categorizing meteorological phenomena into classes, thereby facilitating nuanced analyses and precise predictions for various sectors such as agriculture, aviation, and disaster management. This involves utilizing machine learning models to analyze large, multidimensional weather datasets for patterns and trends. These datasets may include variables such as temperature, humidity, wind speed, and pressure, contributing to meteorological conditions. Furthermore, it's imperative that classification algorithms proficiently navigate challenges such as data imbalances, where certain weather events (e.g., storms or extreme temperatures) might be underrepresented. This empirical study explores data augmentation methods to address imbalanced classes in tabular weather data in centralized and federated settings. Employing data augmentation techniques such as the Synthetic Minority Over-sampling Technique or Generative Adversarial Networks can improve the model's accuracy in classifying rare but critical weather events. Moreover, with advancements in federated learning, machine learning models can be trained across decentralized databases, ensuring privacy and data integrity while mitigating the need for centralized data storage and processing. Thus, the classification of weather data stands as a critical bridge, linking raw meteorological data to actionable insights, enhancing our capacity to anticipate and prepare for diverse weather conditions. △ Less

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.10874 [pdf]

doi 10.1007/s12115-023-00887-0

Religious Affiliation in the Twenty-First Century: A Machine Learning Perspective on the World Value Survey

Authors: Elaheh Jafarigol, William Keely, Tess Hartog, Tom Welborn, Peyman Hekmatpour, Theodore B. Trafalis

Abstract: This paper is a quantitative analysis of the data collected globally by the World Value Survey. The data is used to study the trajectories of change in individuals' religious beliefs, values, and behaviors in societies. Utilizing random forest, we aim to identify the key factors of religiosity and classify respondents of the survey as religious and non religious using country level data. We use re… ▽ More This paper is a quantitative analysis of the data collected globally by the World Value Survey. The data is used to study the trajectories of change in individuals' religious beliefs, values, and behaviors in societies. Utilizing random forest, we aim to identify the key factors of religiosity and classify respondents of the survey as religious and non religious using country level data. We use resampling techniques to balance the data and improve imbalanced learning performance metrics. The results of the variable importance analysis suggest that Age and Income are the most important variables in the majority of countries. The results are discussed with fundamental sociological theories regarding religion and human behavior. This study is an application of machine learning in identifying the underlying patterns in the data of 30 countries participating in the World Value Survey. The results from variable importance analysis and classification of imbalanced data provide valuable insights beneficial to theoreticians and researchers of social sciences. △ Less

Submitted 16 October, 2023; originally announced October 2023.

Journal ref: Society, pp.1-17 (2023)

arXiv:2310.07917 [pdf]

A Review of Machine Learning Techniques in Imbalanced Data and Future Trends

Authors: Elaheh Jafarigol, Theodore Trafalis

Abstract: For over two decades, detecting rare events has been a challenging task among researchers in the data mining and machine learning domain. Real-life problems inspire researchers to navigate and further improve data processing and algorithmic approaches to achieve effective and computationally efficient methods for imbalanced learning. In this paper, we have collected and reviewed 258 peer-reviewed… ▽ More For over two decades, detecting rare events has been a challenging task among researchers in the data mining and machine learning domain. Real-life problems inspire researchers to navigate and further improve data processing and algorithmic approaches to achieve effective and computationally efficient methods for imbalanced learning. In this paper, we have collected and reviewed 258 peer-reviewed papers from archival journals and conference papers in an attempt to provide an in-depth review of various approaches in imbalanced learning from technical and application perspectives. This work aims to provide a structured review of methods used to address the problem of imbalanced data in various domains and create a general guideline for researchers in academia or industry who want to dive into the broad field of machine learning using large-scale imbalanced data. △ Less

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2106.08767 [pdf, other]

doi 10.1007/s10472-023-09887-6

To Raise or Not To Raise: The Autonomous Learning Rate Question

Authors: Xiaomeng Dong, Tao Tan, Michael Potter, Yun-Chan Tsai, Gaurav Kumar, V. Ratna Saripalli, Theodore Trafalis

Abstract: There is a parameter ubiquitous throughout the deep learning world: learning rate. There is likewise a ubiquitous question: what should that learning rate be? The true answer to this question is often tedious and time consuming to obtain, and a great deal of arcane knowledge has accumulated in recent years over how to pick and modify learning rates to achieve optimal training performance. Moreover… ▽ More There is a parameter ubiquitous throughout the deep learning world: learning rate. There is likewise a ubiquitous question: what should that learning rate be? The true answer to this question is often tedious and time consuming to obtain, and a great deal of arcane knowledge has accumulated in recent years over how to pick and modify learning rates to achieve optimal training performance. Moreover, the long hours spent carefully crafting the perfect learning rate can come to nothing the moment your network architecture, optimizer, dataset, or initial conditions change ever so slightly. But it need not be this way. We propose a new answer to the great learning rate question: the Autonomous Learning Rate Controller. Find it at https://github.com/fastestimator/ARC/tree/v2.0 △ Less

Submitted 10 July, 2023; v1 submitted 16 June, 2021; originally announced June 2021.

arXiv:2106.08756 [pdf, other]

Optimizing Data Augmentation Policy Through Random Unidimensional Search

Authors: Xiaomeng Dong, Michael Potter, Gaurav Kumar, Yun-Chan Tsai, V. Ratna Saripalli, Theodore Trafalis

Abstract: It is no secret amongst deep learning researchers that finding the optimal data augmentation strategy during training can mean the difference between state-of-the-art performance and a run-of-the-mill result. To that end, the community has seen many efforts to automate the process of finding the perfect augmentation procedure for any task at hand. Unfortunately, even recent cutting-edge methods br… ▽ More It is no secret amongst deep learning researchers that finding the optimal data augmentation strategy during training can mean the difference between state-of-the-art performance and a run-of-the-mill result. To that end, the community has seen many efforts to automate the process of finding the perfect augmentation procedure for any task at hand. Unfortunately, even recent cutting-edge methods bring massive computational overhead, requiring as many as 100 full model trainings to settle on an ideal configuration. We show how to achieve equivalent performance using just 6 trainings with Random Unidimensional Augmentation. Source code is available at https://github.com/fastestimator/RUA/tree/v1.0 △ Less

Submitted 14 July, 2023; v1 submitted 16 June, 2021; originally announced June 2021.

Showing 1–8 of 8 results for author: Trafalis, T