-
Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance
Authors:
Chiraag Kaushik,
Ran Liu,
Chi-Heng Lin,
Amrit Khera,
Matthew Y **,
Wenrui Ma,
Vidya Muthukumar,
Eva L Dyer
Abstract:
Classification models are expected to perform equally well for different classes, yet in practice, there are often large gaps in their performance. This issue of class bias is widely studied in cases of datasets with sample imbalance, but is relatively overlooked in balanced datasets. In this work, we introduce the concept of spectral imbalance in features as a potential source for class dispariti…
▽ More
Classification models are expected to perform equally well for different classes, yet in practice, there are often large gaps in their performance. This issue of class bias is widely studied in cases of datasets with sample imbalance, but is relatively overlooked in balanced datasets. In this work, we introduce the concept of spectral imbalance in features as a potential source for class disparities and study the connections between spectral imbalance and class bias in both theory and practice. To build the connection between spectral imbalance and class gap, we develop a theoretical framework for studying class disparities and derive exact expressions for the per-class error in a high-dimensional mixture model setting. We then study this phenomenon in 11 different state-of-the-art pretrained encoders and show how our proposed framework can be used to compare the quality of encoders, as well as evaluate and combine data augmentation strategies to mitigate the issue. Our work sheds light on the class-dependent effects of learning, and provides new insights into how state-of-the-art pretrained features may have unknown biases that can be diagnosed through their spectra.
△ Less
Submitted 3 June, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
RMExplorer: A Visual Analytics Approach to Explore the Performance and the Fairness of Disease Risk Models on Population Subgroups
Authors:
Bum Chul Kwon,
Uri Kartoun,
Shaan Khurshid,
Mikhail Yurochkin,
Subha Maity,
Deanna G Brockman,
Amit V Khera,
Patrick T Ellinor,
Steven A Lubitz,
Kenney Ng
Abstract:
Disease risk models can identify high-risk patients and help clinicians provide more personalized care. However, risk models developed on one dataset may not generalize across diverse subpopulations of patients in different datasets and may have unexpected performance. It is challenging for clinical researchers to inspect risk models across different subgroups without any tools. Therefore, we deve…
▽ More
Disease risk models can identify high-risk patients and help clinicians provide more personalized care. However, risk models developed on one dataset may not generalize across diverse subpopulations of patients in different datasets and may have unexpected performance. It is challenging for clinical researchers to inspect risk models across different subgroups without any tools. Therefore, we developed an interactive visualization system called RMExplorer (Risk Model Explorer) to enable interactive risk model assessment. Specifically, the system allows users to define subgroups of patients by selecting clinical, demographic, or other characteristics, to explore the performance and fairness of risk models on the subgroups, and to understand the feature contributions to risk scores. To demonstrate the usefulness of the tool, we conduct a case study, where we use RMExplorer to explore three atrial fibrillation risk models by applying them to the UK Biobank dataset of 445,329 individuals. RMExplorer can help researchers to evaluate the performance and biases of risk models on subpopulations of interest in their data.
△ Less
Submitted 13 September, 2022;
originally announced September 2022.
-
Bayesian Neural Networks
Authors:
Vikram Mullachery,
Aniruddh Khera,
Amir Husain
Abstract:
This paper describes and discusses Bayesian Neural Network (BNN). The paper showcases a few different applications of them for classification and regression problems. BNNs are comprised of a Probabilistic Model and a Neural Network. The intent of such a design is to combine the strengths of Neural Networks and Stochastic modeling. Neural Networks exhibit continuous function approximator capabiliti…
▽ More
This paper describes and discusses Bayesian Neural Network (BNN). The paper showcases a few different applications of them for classification and regression problems. BNNs are comprised of a Probabilistic Model and a Neural Network. The intent of such a design is to combine the strengths of Neural Networks and Stochastic modeling. Neural Networks exhibit continuous function approximator capabilities. Stochastic models allow direct specification of a model with known interaction between parameters to generate data. During the prediction phase, stochastic models generate a complete posterior distribution and produce probabilistic guarantees on the predictions. Thus BNNs are a unique combination of neural network and stochastic models with the stochastic model forming the core of this integration. BNNs can then produce probabilistic guarantees on it's predictions and also generate the distribution of parameters that it has learnt from the observations. That means, in the parameter space, one can deduce the nature and shape of the neural network's learnt parameters. These two characteristics makes them highly attractive to theoreticians as well as practitioners. Recently there has been a lot of activity in this area, with the advent of numerous probabilistic programming libraries such as: PyMC3, Edward, Stan etc. Further this area is rapidly gaining ground as a standard machine learning approach for numerous problems
△ Less
Submitted 30 January, 2018; v1 submitted 23 January, 2018;
originally announced January 2018.