
Algorithmic Bias in Machine Learning Based Delirium Prediction

Authors: Sandhya Tripathi, Bradley A Fritz, Michael S Avidan, Yixin Chen, Christopher R King

Abstract: Although prediction models for delirium, a commonly occurring condition during general hospitalization or post-surgery, have not gained huge popularity, their algorithmic bias evaluation is crucial due to the existing association between social determinants of health and delirium risk. In this context, using MIMIC-III and another academic hospital dataset, we present some initial experimental evidence showing how sociodemographic features such as sex and race can impact the model performance across subgroups. With this work, our intent is to initiate a discussion about the intersectionality effects of old age, race and socioeconomic factors on the early-stage detection and prevention of delirium using ML.

Submitted 26 November, 2022; v1 submitted 8 November, 2022; originally announced November 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 14 pages

arXiv:2207.03536 [pdf, other]

Deep Learning to Jointly Schema Match, Impute, and Transform Databases

Authors: Sandhya Tripathi, Bradley A. Fritz, Mohamed Abdelhack, Michael S. Avidan, Yixin Chen, Christopher R. King

Abstract: An applied problem facing all areas of data science is harmonizing data sources. Joining data from multiple origins with unmapped and only partially overlapping features is a prerequisite to developing and testing robust, generalizable algorithms, especially in health care. We approach this issue in the common but difficult case of numeric features such as nearly Gaussian and binary features, where unit changes and variable shift make simple matching of univariate summaries unsuccessful. We develop two novel procedures to address this problem. First, we demonstrate multiple methods of "fingerprinting" a feature based on its associations to other features. In the setting of even modest prior information, this allows most shared features to be accurately identified. Second, we demonstrate a deep learning algorithm for translation between databases. Unlike prior approaches, our algorithm takes advantage of discovered mappings while identifying surrogates for unshared features and learning transformations. In synthetic and real-world experiments using two electronic health record databases, our algorithms outperform existing baselines for matching variable sets, while jointly learning to impute unshared or transformed variables.

Submitted 22 June, 2022; originally announced July 2022.

arXiv:2107.08574 [pdf, other]

A Modulation Layer to Increase Neural Network Robustness Against Data Quality Issues

Authors: Mohamed Abdelhack, Jiaming Zhang, Sandhya Tripathi, Bradley A Fritz, Daniel Felsky, Michael S Avidan, Yixin Chen, Christopher R King

Abstract: Data missingness and quality are common problems in machine learning, especially for high-stakes applications such as healthcare. Developers often train machine learning models on carefully curated datasets using only high quality data; however, this reduces the utility of such models in production environments. We propose a novel neural network modification to mitigate the impacts of low quality and missing data which involves replacing the fixed weights of a fully-connected layer with a function of an additional input. This is inspired from neuromodulation in biological neural networks where the cortex can up- and down-regulate inputs based on their reliability and the presence of other data. In testing, with reliability scores as a modulating signal, models with modulating layers were found to be more robust against degradation of data quality, including additional missingness. These models are superior to imputation as they save on training time by completely skipping the imputation process and further allow the introduction of other data quality measures that imputation cannot handle. Our results suggest that explicitly accounting for reduced information quality with a modulating fully connected layer can enable the deployment of artificial intelligence systems in real-time applications.

Submitted 22 April, 2023; v1 submitted 18 July, 2021; originally announced July 2021.

Journal ref: Transactions on Machine Learning Research 2023

arXiv:2011.02036 [pdf, other]

(Un)fairness in Post-operative Complication Prediction Models

Authors: Sandhya Tripathi, Bradley A. Fritz, Mohamed Abdelhack, Michael S. Avidan, Yixin Chen, Christopher R. King

Abstract: With the current ongoing debate about fairness, explainability and transparency of machine learning models, their application in high-impact clinical decision-making systems must be scrutinized. We consider a real-life example of risk estimation before surgery and investigate the potential for bias or unfairness of a variety of algorithms. Our approach creates transparent documentation of potential bias so that the users can apply the model carefully. We augment a model-card like analysis using propensity scores with a decision-tree based guide for clinicians that would identify predictable shortcomings of the model. In addition to functioning as a guide for users, we propose that it can guide the algorithm development and informatics team to focus on data sources and structures that can address these shortcomings.

Submitted 3 November, 2020; originally announced November 2020.

arXiv:1907.12596 [pdf, other]

A Factored Generalized Additive Model for Clinical Decision Support in the Operating Room

Authors: Zhicheng Cui, Bradley A Fritz, Christopher R King, Michael S Avidan, Yixin Chen

Abstract: Logistic regression (LR) is widely used in clinical prediction because it is simple to deploy and easy to interpret. Nevertheless, being a linear model, LR has limited expressive capability and often has unsatisfactory performance. Generalized additive models (GAMs) extend the linear model with transformations of input features, though feature interaction is not allowed for all GAM variants. In this paper, we propose a factored generalized additive model (F-GAM) to preserve the model interpretability for targeted features while allowing a rich model for interaction with features fixed within the individual. We evaluate F-GAM on prediction of two targets, postoperative acute kidney injury and acute respiratory failure, from a single-center database. We find superior model performance of F-GAM in terms of AUPRC and AUROC compared to several other GAM implementations, random forests, support vector machine, and a deep neural network. We find that the model interpretability is good with results with high face validity.

Submitted 29 July, 2019; originally announced July 2019.

Comments: Accepted for publication in AMIA 2019 Annual Symposium

arXiv:1801.00162 [pdf, other]

doi 10.1103/PhysRevD.97.086006

Asymptotic safety of quantum gravity beyond Ricci scalars

Authors: Kevin G. Falls, Callum R. King, Daniel F. Litim, Kostas Nikolakopoulos, Christoph Rahmede

Abstract: We investigate the asymptotic safety conjecture for quantum gravity including curvature invariants beyond Ricci scalars. Our strategy is put to work for families of gravitational actions which depend on functions of the Ricci scalar, the Ricci tensor, and products thereof. Combining functional renormalisation with high order polynomial approximations and full numerical integration we derive the renormalisation group flow for all couplings and analyse their fixed points, scaling exponents, and the fixed point effective action as a function of the background Ricci curvature. The theory is characterised by three relevant couplings. Higher-dimensional couplings show near-Gaussian scaling with increasing canonical mass dimension. We find that Ricci tensor invariants stabilise the UV fixed point and lead to a rapid convergence of polynomial approximations. We apply our results to models for cosmology and establish that the gravitational fixed point admits inflationary solutions. We also compare findings with those from $f(R)$-type theories in the same approximation and pin-point the key new effects due to Ricci tensor interactions. Implications for the asymptotic safety conjecture of gravity are indicated.

Submitted 13 March, 2018; v1 submitted 30 December, 2017; originally announced January 2018.

Comments: 55 pages, 14 figures. v2: Added more discussion and new Sec.2E (Gravitational path integral). Version accepted for publication with PRD

Journal ref: Phys. Rev. D 97, 086006 (2018)

arXiv:1312.7714 [pdf, other]

Prediction and replication from case-control sequencing studies using custom genotyping and additional sequencing

Authors: C. Ryan King, Paul J. Rathouz, Dan L. Nicolae

Abstract: We present two results about using allele-count (AC) burdens of rare SNPs discovered in a case-control sequencing study for prediction or validation in an external prospective study. When genotyping only the SNPs polymorphic in the sequence data, the phenotype to AC correlation tends to be larger in the replication data than the primary study. Conversely, if the replication sample is sequenced, ACs of SNPs which are novel in the replication tend to have much smaller or opposite signed associations. We explain this by first deriving the AC-phenotype association implied by a model of diverse SNP effects, and second accounting for the shifted distribution of SNP effects when using a case-control study as a filter for SNP inclusion. In rare diseases, the case population is depleted of protective SNPs and enriched for deleterious SNPs, creating the above difference in AC associations. This phenomenon is most relevant in re-sequencing for risk prediction in rare diseases with heterogeneous rare mutations because it applies to SNPs with MAF near 1 out of the case-control sample size and is exaggerated when SNP log-odds ratios come from a heavy-tailed distribution. It also suggests a ``winner's curse'' in which most risk increasing SNPs at a particular MAF are quickly discovered and future sequencing finds more protective or irrelevant SNPs.

Submitted 15 October, 2015; v1 submitted 30 December, 2013; originally announced December 2013.

Comments: Essentially final draft

Showing 1–7 of 7 results for author: King, C R