Search | arXiv e-print repository

arXiv:2312.11969 [pdf, other]

doi 10.1007/978-3-031-33374-3_41

GroupMixNorm Layer for Learning Fair Models

Authors: Anubha Pandey, Aditi Rai, Maneet Singh, Deepak Bhatt, Tanmoy Bhowmik

Abstract: Recent research has identified discriminatory behavior of automated prediction algorithms towards groups identified on specific protected attributes (e.g., gender, ethnicity, age group, etc.). When deployed in real-world scenarios, such techniques may demonstrate biased predictions resulting in unfair outcomes. Recent literature has witnessed algorithms for mitigating such biased behavior mostly b… ▽ More Recent research has identified discriminatory behavior of automated prediction algorithms towards groups identified on specific protected attributes (e.g., gender, ethnicity, age group, etc.). When deployed in real-world scenarios, such techniques may demonstrate biased predictions resulting in unfair outcomes. Recent literature has witnessed algorithms for mitigating such biased behavior mostly by adding convex surrogates of fairness metrics such as demographic parity or equalized odds in the loss function, which are often not easy to estimate. This research proposes a novel in-processing based GroupMixNorm layer for mitigating bias from deep learning models. The GroupMixNorm layer probabilistically mixes group-level feature statistics of samples across different groups based on the protected attribute. The proposed method improves upon several fairness metrics with minimal impact on overall accuracy. Analysis on benchmark tabular and image datasets demonstrates the efficacy of the proposed method in achieving state-of-the-art performance. Further, the experimental analysis also suggests the robustness of the GroupMixNorm layer against new protected attributes during inference and its utility in eliminating bias from a pre-trained network. △ Less

Submitted 19 December, 2023; originally announced December 2023.

Comments: 12 pages, 6 figures, Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2023

arXiv:2303.02776 [pdf, other]

A Low-Cost Portable Apparatus to Analyze Oral Fluid Droplets and Quantify the Efficacy of Masks

Authors: Ava Tan Bhowmik

Abstract: Every year, about 4 million people die from upper respiratory infections. Mask-wearing is crucial in preventing the spread of pathogen-containing droplets, which is the primary cause of these illnesses. However, most techniques for mask efficacy evaluation are expensive to set up and complex to operate. In this work, a novel, low-cost, and quantitative metrology to visualize, track, and analyze or… ▽ More Every year, about 4 million people die from upper respiratory infections. Mask-wearing is crucial in preventing the spread of pathogen-containing droplets, which is the primary cause of these illnesses. However, most techniques for mask efficacy evaluation are expensive to set up and complex to operate. In this work, a novel, low-cost, and quantitative metrology to visualize, track, and analyze orally-generated fluid droplets is developed. The project has four stages: setup optimization, data collection, data analysis, and application development. The metrology was initially developed in a dark closet as a proof of concept using common household materials and was subsequently implemented into a portable apparatus. Tonic water and UV darklight tube lights are selected to visualize fluorescent droplet and aerosol propagation with automated analysis developed using open-source software. The dependencies of oral fluid droplet generation and propagation on various factors are studied in detail and established using this metrology. Additionally, the smallest detectable droplet size was mathematically correlated to height and airborne time. The efficacy of different types of masks is evaluated and associated with fabric microstructures. It is found that masks with smaller-sized pores and thicker material are more effective. This technique can easily be constructed at home using materials that total to a cost of below \$60, thereby enabling a low-cost and accurate metrology. △ Less

Submitted 5 March, 2023; originally announced March 2023.

Comments: 13 pages, 15 figures. arXiv admin note: substantial text overlap with arXiv:2201.03993

arXiv:2210.13182 [pdf, other]

Simultaneous Improvement of ML Model Fairness and Performance by Identifying Bias in Data

Authors: Bhushan Chaudhari, Akash Agarwal, Tanmoy Bhowmik

Abstract: Machine learning models built on datasets containing discriminative instances attributed to various underlying factors result in biased and unfair outcomes. It's a well founded and intuitive fact that existing bias mitigation strategies often sacrifice accuracy in order to ensure fairness. But when AI engine's prediction is used for decision making which reflects on revenue or operational efficien… ▽ More Machine learning models built on datasets containing discriminative instances attributed to various underlying factors result in biased and unfair outcomes. It's a well founded and intuitive fact that existing bias mitigation strategies often sacrifice accuracy in order to ensure fairness. But when AI engine's prediction is used for decision making which reflects on revenue or operational efficiency such as credit risk modelling, it would be desirable by the business if accuracy can be somehow reasonably preserved. This conflicting requirement of maintaining accuracy and fairness in AI motivates our research. In this paper, we propose a fresh approach for simultaneous improvement of fairness and accuracy of ML models within a realistic paradigm. The essence of our work is a data preprocessing technique that can detect instances ascribing a specific kind of bias that should be removed from the dataset before training and we further show that such instance removal will have no adverse impact on model accuracy. In particular, we claim that in the problem settings where instances exist with similar feature but different labels caused by variation in protected attributes , an inherent bias gets induced in the dataset, which can be identified and mitigated through our novel scheme. Our experimental evaluation on two open-source datasets demonstrates how the proposed method can mitigate bias along with improving rather than degrading accuracy, while offering certain set of control for end user. △ Less

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2210.13023 [pdf, other]

FairGen: Fair Synthetic Data Generation

Authors: Bhushan Chaudhari, Himanshu Chaudhary, Aakash Agarwal, Kamna Meena, Tanmoy Bhowmik

Abstract: With the rising adoption of Machine Learning across the domains like banking, pharmaceutical, ed-tech, etc, it has become utmost important to adopt responsible AI methods to ensure models are not unfairly discriminating against any group. Given the lack of clean training data, generative adversarial techniques are preferred to generate synthetic data with several state-of-the-art architectures rea… ▽ More With the rising adoption of Machine Learning across the domains like banking, pharmaceutical, ed-tech, etc, it has become utmost important to adopt responsible AI methods to ensure models are not unfairly discriminating against any group. Given the lack of clean training data, generative adversarial techniques are preferred to generate synthetic data with several state-of-the-art architectures readily available across various domains from unstructured data such as text, images to structured datasets modelling fraud detection and many more. These techniques overcome several challenges such as class imbalance, limited training data, restricted access to data due to privacy issues. Existing work focusing on generating fair data either works for a certain GAN architecture or is very difficult to tune across the GANs. In this paper, we propose a pipeline to generate fairer synthetic data independent of the GAN architecture. The proposed paper utilizes a pre-processing algorithm to identify and remove bias inducing samples. In particular, we claim that while generating synthetic data most GANs amplify bias present in the training data but by removing these bias inducing samples, GANs essentially focuses more on real informative samples. Our experimental evaluation on two open-source datasets demonstrates how the proposed pipeline is generating fair data along with improved performance in some cases. △ Less

Submitted 1 December, 2022; v1 submitted 24 October, 2022; originally announced October 2022.

arXiv:2208.09079 [pdf, other]

A Multi-Modal Wildfire Prediction and Personalized Early-Warning System Based on a Novel Machine Learning Framework

Authors: Rohan Tan Bhowmik

Abstract: Wildfires are increasingly impacting the environment, human health and safety. Among the top 20 California wildfires, those in 2020-2021 burned more acres than the last century combined. California's 2018 wildfire season caused damages of $148.5 billion. Among millions of impacted people, those living with disabilities (around 15% of the world population) are disproportionately impacted due to ina… ▽ More Wildfires are increasingly impacting the environment, human health and safety. Among the top 20 California wildfires, those in 2020-2021 burned more acres than the last century combined. California's 2018 wildfire season caused damages of $148.5 billion. Among millions of impacted people, those living with disabilities (around 15% of the world population) are disproportionately impacted due to inadequate means of alerts. In this project, a multi-modal wildfire prediction and personalized early warning system has been developed based on an advanced machine learning architecture. Sensor data from the Environmental Protection Agency and historical wildfire data from 2012 to 2018 have been compiled to establish a comprehensive wildfire database, the largest of its kind. Next, a novel U-Convolutional-LSTM (Long Short-Term Memory) neural network was designed with a special architecture for extracting key spatial and temporal features from contiguous environmental parameters indicative of impending wildfires. Environmental and meteorological factors were incorporated into the database and classified as leading indicators and trailing indicators, correlated to risks of wildfire conception and propagation respectively. Additionally, geological data was used to provide better wildfire risk assessment. This novel spatio-temporal neural network achieved >97% accuracy vs. around 76% using traditional convolutional neural networks, successfully predicting 2018's five most devastating wildfires 5-14 days in advance. Finally, a personalized early warning system, tailored to individuals with sensory disabilities or respiratory exacerbation conditions, was proposed. This technique would enable fire departments to anticipate and prevent wildfires before they strike and provide early warnings for at-risk individuals for better preparation, thereby saving lives and reducing economic damages. △ Less

Submitted 18 August, 2022; originally announced August 2022.

arXiv:2201.03993 [pdf, other]

A Novel Home-Built Metrology to Analyze Oral Fluid Droplets and Quantify the Efficacy of Masks

Authors: Ava Tan Bhowmik

Abstract: Wearing masks is crucial to preventing the spread of potentially pathogen-containing droplets, especially amidst the COVID-19 pandemic. However, not all face coverings are equally effective and most experiments evaluating mask efficacy are very expensive and complex to operate. In this work, a novel, home-built, low-cost, and accurate metrology to visualize orally-generated fluid droplets has been… ▽ More Wearing masks is crucial to preventing the spread of potentially pathogen-containing droplets, especially amidst the COVID-19 pandemic. However, not all face coverings are equally effective and most experiments evaluating mask efficacy are very expensive and complex to operate. In this work, a novel, home-built, low-cost, and accurate metrology to visualize orally-generated fluid droplets has been developed. The project includes setup optimization, data collection, data analysis, and applications. The final materials chosen were quinine-containing tonic water, 397-402 nm wavelength UV tube lights, an iPhone and tripod, string, and a spray bottle. The experiment took place in a dark closet with a dark background. During data collection, the test subject first wets their mouth with an ingestible fluorescent liquid (tonic water) and speaks, sneezes, or coughs under UV darklight. The fluorescence from the tonic water droplets generated can be visualized, recorded by an iPhone 8+ camera in slo-mo (240 fps), and analyzed. The software VLC is used for frame separation and Fiji/ImageJ is used for image processing and analysis. The dependencies of oral fluid droplet generation and propagation on different phonics, the loudness of speech, and the type of expiratory event were studied in detail and established using the metrology developed. The efficacy of different types of masks was evaluated and correlated with fabric microstructures. All masks blocked droplets to varying extent. Masks with smaller-sized pores and thicker material were found to block the most droplets. This low-cost technique can be easily constructed at home using materials that total to a cost of less than $50. Despite the minimal cost, the method is very accurate and the data is quantifiable. △ Less

Submitted 3 January, 2022; originally announced January 2022.

Comments: 9 pages, 12 figures

arXiv:2103.03086 [pdf, other]

A Multi-Modal Respiratory Disease Exacerbation Prediction Technique Based on a Spatio-Temporal Machine Learning Architecture

Authors: Rohan Tan Bhowmik

Abstract: Chronic respiratory diseases, such as chronic obstructive pulmonary disease and asthma, are a serious health crisis, affecting a large number of people globally and inflicting major costs on the economy. Current methods for assessing the progression of respiratory symptoms are either subjective and inaccurate, or complex and cumbersome, and do not incorporate environmental factors. Lacking predict… ▽ More Chronic respiratory diseases, such as chronic obstructive pulmonary disease and asthma, are a serious health crisis, affecting a large number of people globally and inflicting major costs on the economy. Current methods for assessing the progression of respiratory symptoms are either subjective and inaccurate, or complex and cumbersome, and do not incorporate environmental factors. Lacking predictive assessments and early intervention, unexpected exacerbations can lead to hospitalizations and high medical costs. This work presents a multi-modal solution for predicting the exacerbation risks of respiratory diseases, such as COPD, based on a novel spatio-temporal machine learning architecture for real-time and accurate respiratory events detection, and tracking of local environmental and meteorological data and trends. The proposed new machine learning architecture blends key attributes of both convolutional and recurrent neural networks, allowing extraction of both spatial and temporal features encoded in respiratory sounds, thereby leading to accurate classification and tracking of symptoms. Combined with the data from environmental and meteorological sensors, and a predictive model based on retrospective medical studies, this solution can assess and provide early warnings of respiratory disease exacerbations. This research will improve the quality of patients' lives through early medical intervention, thereby reducing hospitalization rates and medical costs. △ Less

Submitted 3 April, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

Comments: Updated Title, References, and Acknowledgements

Showing 1–7 of 7 results for author: Bhowmik, T