-
Algorithmic Changes Are Not Enough: Evaluating the Removal of Race Adjustment from the eGFR Equation
Authors:
Marika M. Cusick,
Glenn M. Chertow,
Douglas K. Owens,
Michelle Y. Williams,
Sherri Rose
Abstract:
Changing clinical algorithms to remove race adjustment has been proposed and implemented for multiple health conditions. Removing race adjustment from estimated glomerular filtration rate (eGFR) equations may reduce disparities in chronic kidney disease (CKD), but has not been studied in clinical practice after implementation. Here, we assessed whether implementing an eGFR equation (CKD-EPI 2021) without adjustment for Black or African American race modified quarterly rates of nephrology referrals and visits within a single healthcare system, Stanford Health Care (SHC). Our cohort study analyzed 547,194 adult patients aged 21 and older who had at least one recorded serum creatinine or serum cystatin C between January 1, 2019 and September 1, 2023. During the study period, implementation of CKD-EPI 2021 did not modify rates of quarterly nephrology referrals in those documented as Black or African American or in the overall cohort. After adjusting for capacity at SHC nephrology clinics, estimated rates of nephrology referrals and visits with CKD-EPI 2021 were 34 (95% CI 29, 39) and 188 (175, 201) per 10,000 patients documented as Black or African American. If race adjustment had not been removed, estimated rates were nearly identical: 38 (95% CI: 28, 53) and 189 (165, 218) per 10,000 patients. Changes to the eGFR equation are likely insufficient to achieve health equity in CKD care decision-making as many other structural inequities remain.
Submitted 25 April, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
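For readers unfamiliar with the equation named in the abstract above, here is a minimal sketch of the published CKD-EPI 2021 creatinine equation, which has no race term. The function name and argument names are illustrative assumptions; the coefficients are those of the published equation as best understood here, and the sketch is not the study's code or a clinical tool.

```python
def egfr_ckd_epi_2021(serum_creatinine_mg_dl: float, age_years: float, female: bool) -> float:
    """Estimate GFR (mL/min/1.73 m^2) with the CKD-EPI 2021 creatinine equation.

    Illustrative sketch only; names are hypothetical and inputs are not
    validated beyond basic positivity checks.
    """
    if serum_creatinine_mg_dl <= 0 or age_years <= 0:
        raise ValueError("serum creatinine and age must be positive")

    # Sex-specific constants from the published equation.
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302

    ratio = serum_creatinine_mg_dl / kappa
    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha      # applies when creatinine is below kappa
        * max(ratio, 1.0) ** -1.200     # applies when creatinine is above kappa
        * 0.9938 ** age_years           # age decay term
    )
    if female:
        egfr *= 1.012
    return egfr
```

Because race does not enter the function, two patients with the same creatinine, age, and sex receive the same estimate, which is the algorithmic change the study evaluates.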
-
Capturing Actionable Dynamics with Structured Latent Ordinary Differential Equations
Authors:
Paidamoyo Chapfuwa,
Sherri Rose,
Lawrence Carin,
Edward Meeds,
Ricardo Henao
Abstract:
End-to-end learning of dynamical systems with black-box models, such as neural ordinary differential equations (ODEs), provides a flexible framework for learning dynamics from data without prescribing a mathematical model for the dynamics. Unfortunately, this flexibility comes at the cost of understanding the dynamical system, for which ODEs are used ubiquitously. Further, experimental data are collected under various conditions (inputs), such as treatments, or grouped in some way, such as part of sub-populations. Understanding the effects of these system inputs on system outputs is crucial to have any meaningful model of a dynamical system. To that end, we propose a structured latent ODE model that explicitly captures system input variations within its latent representation. Building on a static latent variable specification, our model learns (independent) stochastic factors of variation for each input to the system, thus separating the effects of the system inputs in the latent space. This approach provides actionable modeling through the controlled generation of time-series data for novel input combinations (or perturbations). Additionally, we propose a flexible approach for quantifying uncertainties, leveraging a quantile regression formulation. Results on challenging biological datasets show consistent improvements over competitive baselines in the controlled generation of observational data and inference of biologically meaningful system inputs.
Submitted 16 June, 2022; v1 submitted 25 February, 2022;
originally announced February 2022.
-
Why Machine Learning Cannot Ignore Maximum Likelihood Estimation
Authors:
Mark J. van der Laan,
Sherri Rose
Abstract:
The growth of machine learning as a field has been accelerating with increasing interest and publications across fields, including statistics, but predominantly in computer science. How can we parse this vast literature for developments that exemplify the necessary rigor? How many of these manuscripts incorporate foundational theory to allow for statistical inference? Which advances have the greatest potential for impact in practice? One could posit many answers to these queries. Here, we assert that one essential idea is for machine learning to integrate maximum likelihood for estimation of functional parameters, such as prediction functions and conditional densities.
Submitted 22 October, 2021;
originally announced October 2021.
-
Conditional Cross-Design Synthesis Estimators for Generalizability in Medicaid
Authors:
Irina Degtiar,
Tim Layton,
Jacob Wallace,
Sherri Rose
Abstract:
While much of the causal inference literature has focused on addressing internal validity biases, both internal and external validity are necessary for unbiased estimates in a target population of interest. However, few generalizability approaches exist for estimating causal quantities in a target population when the target population is not well-represented by a randomized study but is reflected when additionally incorporating observational data. To generalize to a target population represented by a union of these data, we propose a class of novel conditional cross-design synthesis estimators that combine randomized and observational data, while addressing their respective biases. The estimators include outcome regression, propensity weighting, and double robust approaches. All use the covariate overlap between the randomized and observational data to remove potential unmeasured confounding bias. We apply these methods to estimate the causal effect of managed care plans on health care spending among Medicaid beneficiaries in New York City.
Submitted 27 September, 2021;
originally announced September 2021.
-
Uncertainty in Lung Cancer Stage for Outcome Estimation via Set-Valued Classification
Authors:
Savannah Bergquist,
Gabriel Brooks,
Mary Beth Landrum,
Nancy Keating,
Sherri Rose
Abstract:
Difficulty in identifying cancer stage in health care claims data has limited oncology quality of care and health outcomes research. We fit prediction algorithms for classifying lung cancer stage into three classes (stages I/II, stage III, and stage IV) using claims data, and then demonstrate a method for incorporating the classification uncertainty in outcomes estimation. Leveraging set-valued classification and split conformal inference, we show how a fixed algorithm developed in one cohort of data may be deployed in another, while rigorously accounting for uncertainty from the initial classification step. We demonstrate this process using SEER cancer registry data linked with Medicare claims data.
Submitted 2 July, 2021;
originally announced July 2021.
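The abstract above mentions set-valued classification with split conformal inference. The following is a generic textbook-style sketch of that idea, assuming a fixed classifier that already outputs class probabilities; the function names, the score `1 - p(true class)`, and the three-class layout are assumptions for illustration, not the authors' implementation.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Return the split-conformal score threshold from a held-out calibration set.

    cal_probs: list of per-class probability lists from a fixed classifier.
    cal_labels: list of true class indices for the calibration rows.
    alpha: target miscoverage level (e.g. 0.1 for roughly 90% coverage).
    """
    # Nonconformity score: one minus the probability assigned to the true class.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank of the threshold among calibration scores.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every class is retained
    return scores[k - 1]

def prediction_set(probs, threshold):
    """Return all class indices whose score falls at or below the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]
```

A prediction set with more than one stage signals that the classifier cannot confidently separate those stages for that patient, and a downstream outcome analysis can carry that ambiguity forward rather than committing to a single label.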
-
Identifying Undercompensated Groups Defined By Multiple Attributes in Risk Adjustment
Authors:
Anna Zink,
Sherri Rose
Abstract:
Risk adjustment in health care aims to redistribute payments to insurers based on costs. However, risk adjustment formulas are known to underestimate costs for some groups of patients. This undercompensation makes these groups unprofitable to insurers and creates incentives for insurers to discriminate. We develop a machine learning method for "group importance" to identify unprofitable groups defined by multiple attributes, improving on the arbitrary nature of existing evaluations. This procedure was designed to evaluate the risk adjustment formulas used in the U.S. health insurance Marketplaces as well as Medicare. We find that a number of previously unidentified groups with multiple chronic conditions are undercompensated in the Marketplaces risk adjustment formula, while groups without chronic conditions tend to be overcompensated in the Marketplaces. The magnitude of undercompensation when defining groups with multiple attributes is larger than with single attributes. No complex groups were found to be consistently under- or overcompensated in the Medicare risk adjustment formula. Our work provides policy makers with new information on potential targets of discrimination in the health care system and a path towards more equitable health coverage.
Submitted 26 July, 2021; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Considerations Across Three Cultures: Parametric Regressions, Interpretable Algorithms, and Complex Algorithms
Authors:
Ani Eloyan,
Sherri Rose
Abstract:
We consider an extension of Leo Breiman's thesis from "Statistical Modeling: The Two Cultures" to include a bifurcation of algorithmic modeling, focusing on parametric regressions, interpretable algorithms, and complex (possibly explainable) algorithms.
Submitted 13 April, 2021;
originally announced April 2021.
-
A Review of Generalizability and Transportability
Authors:
Irina Degtiar,
Sherri Rose
Abstract:
When assessing causal effects, determining the target population to which the results are intended to generalize is a critical decision. Randomized and observational studies each have strengths and limitations for estimating causal effects in a target population. Estimates from randomized data may have internal validity but are often not representative of the target population. Observational data may better reflect the target population, and hence be more likely to have external validity, but are subject to potential bias due to unmeasured confounding. While much of the causal inference literature has focused on addressing internal validity bias, both internal and external validity are necessary for unbiased estimates in a target population. This paper presents a framework for addressing external validity bias, including a synthesis of approaches for generalizability and transportability, the assumptions they require, as well as tests for the heterogeneity of treatment effects and differences between study and target populations.
Submitted 23 February, 2021;
originally announced February 2021.
-
Fair Regression for Health Care Spending
Authors:
Anna Zink,
Sherri Rose
Abstract:
The distribution of health care payments to insurance plans has substantial consequences for social policy. Risk adjustment formulas predict spending in health insurance markets in order to provide fair benefits and health care coverage for all enrollees, regardless of their health status. Unfortunately, current risk adjustment formulas are known to underpredict spending for specific groups of enrollees leading to undercompensated payments to health insurers. This incentivizes insurers to design their plans such that individuals in undercompensated groups will be less likely to enroll, impacting access to health care for these groups. To improve risk adjustment formulas for undercompensated groups, we expand on concepts from the statistics, computer science, and health economics literature to develop new fair regression methods for continuous outcomes by building fairness considerations directly into the objective function. We additionally propose a novel measure of fairness while asserting that a suite of metrics is necessary in order to evaluate risk adjustment formulas more fully. Our data application using the IBM MarketScan Research Databases and simulation studies demonstrate that these new fair regression methods may lead to massive improvements in group fairness (e.g., 98%) with only small reductions in overall fit (e.g., 4%).
Submitted 13 July, 2019; v1 submitted 27 January, 2019;
originally announced January 2019.
-
Consistent Estimation of Propensity Score Functions with Oversampled Exposed Subjects
Authors:
Sherri Rose
Abstract:
Observational cohort studies with oversampled exposed subjects are typically implemented to understand the causal effect of a rare exposure. Because the distribution of exposed subjects in the sample differs from the source population, estimation of a propensity score function (i.e., probability of exposure given baseline covariates) targets a nonparametrically nonidentifiable parameter. Consistent estimation of propensity score functions is an important component of various causal inference estimators, including double robust machine learning and inverse probability weighted estimators. This paper develops the use of the probability of exposure from the source population in a flexible computational implementation that can be used with any algorithm that allows observation weighting to produce consistent estimators of propensity score functions. Simulation studies and a hypothetical health policy intervention data analysis demonstrate low empirical bias and variance for these propensity score function estimators with observation weights in finite samples.
Submitted 12 February, 2019; v1 submitted 19 May, 2018;
originally announced May 2018.
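The abstract above describes using the source-population probability of exposure as observation weights. A minimal sketch of one common reweighting scheme follows, assuming the population exposure probability `q0` is known; the function names and the exact weight formula are assumptions for illustration and may differ from the weights defined in the paper.

```python
def population_weights(exposed, q0):
    """Return per-observation weights that restore the population exposure rate.

    exposed: list of 0/1 exposure indicators in the (oversampled) study sample.
    q0: assumed-known probability of exposure in the source population.
    """
    if not 0.0 < q0 < 1.0:
        raise ValueError("q0 must lie strictly between 0 and 1")
    n = len(exposed)
    n_exposed = sum(exposed)
    n_unexposed = n - n_exposed
    if n_exposed == 0 or n_unexposed == 0:
        raise ValueError("sample must contain both exposed and unexposed subjects")

    # Ratio of population share to sample share within each exposure group.
    w_exposed = q0 / (n_exposed / n)
    w_unexposed = (1.0 - q0) / (n_unexposed / n)
    return [w_exposed if a else w_unexposed for a in exposed]

def weighted_exposure_rate(exposed, weights):
    """Weighted proportion exposed; equals q0 under the weights above."""
    return sum(w * a for w, a in zip(weights, exposed)) / sum(weights)
```

Weights of this kind can be passed to any learner that accepts observation weights when fitting the propensity score, so that the fitted probabilities refer to the source population rather than the oversampled study sample.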
-
Nonparametric Bayesian Instrumental Variable Analysis: Evaluating Heterogeneous Effects of Coronary Arterial Access Site Strategies
Authors:
Samrachana Adhikari,
Sherri Rose,
Sharon-Lise Normand
Abstract:
Percutaneous coronary interventions (PCIs) are nonsurgical procedures to open blocked blood vessels to the heart, frequently using a catheter to place a stent. The catheter can be inserted into the blood vessels using an artery in the groin or an artery in the wrist. Because clinical trials have indicated that access via the wrist may result in fewer post procedure complications, shortening the length of stay, and ultimately cost less than groin access, adoption of access via the wrist has been encouraged. However, patients treated in usual care are likely to differ from those participating in clinical trials, and there is reason to believe that the effectiveness of wrist access may differ between males and females. Moreover, the choice of artery access strategy is likely to be influenced by patient or physician unmeasured factors. To study the effectiveness of the two artery access site strategies on hospitalization charges, we use data from a state-mandated clinical registry including 7,963 patients undergoing PCI. A hierarchical Bayesian likelihood-based instrumental variable analysis under a latent index modeling framework is introduced to jointly model outcomes and treatment status. Our approach accounts for unobserved heterogeneity via a latent factor structure, and permits nonparametric error distributions with Dirichlet process mixture models. Our results demonstrate that artery access in the wrist reduces hospitalization charges compared to access in the groin, with higher mean reduction for male patients.
Submitted 3 November, 2019; v1 submitted 21 April, 2018;
originally announced April 2018.
-
A Convex Reconstruction Model for X-ray Tomographic Imaging with Uncertain Flat-fields
Authors:
Hari Om Aggrawal,
Martin Skovgaard Andersen,
Sean Rose,
Emil Y. Sidky
Abstract:
Classical methods for X-ray computed tomography are based on the assumption that the X-ray source intensity is known, but in practice, the intensity is measured and hence uncertain. Under normal operating conditions, when the exposure time is sufficiently high, this kind of uncertainty typically has a negligible effect on the reconstruction quality. However, in time- or dose-limited applications such as dynamic CT, this uncertainty may cause severe and systematic artifacts known as ring artifacts. By carefully modeling the measurement process and by taking uncertainties into account, we derive a new convex model that leads to improved reconstructions despite poor quality measurements. We demonstrate the effectiveness of the methodology based on simulated and real data sets.
Submitted 14 July, 2017;
originally announced July 2017.
-
Reduction of wind power variability through geographic diversity
Authors:
Mark Handschy,
Stephen Rose,
Jay Apt
Abstract:
The variability of wind-generated electricity can be reduced by aggregating the outputs of wind generation plants spread over a large geographic area. In this chapter we utilize Monte Carlo simulations to investigate upper bounds on the degree of achievable smoothing and clarify how the degree of smoothing depends on the number of plants and on the size of the geographic area over which they are spread.
Submitted 22 August, 2016;
originally announced August 2016.
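To illustrate why aggregation smooths output only up to a limit, the toy Monte Carlo sketch below assumes every pair of plants shares the same correlation `rho` and that each plant's output fluctuation is standard normal. Both are simplifying assumptions made here for illustration, and the function names are hypothetical; the chapter's own simulation design is not described in the abstract above.

```python
import math
import random
import statistics

def aggregate_std(n_plants, rho, n_trials=20000, seed=0):
    """Monte Carlo estimate of the std. dev. of the mean output of n_plants.

    Each plant's fluctuation is a shared factor plus independent noise, which
    gives unit variance per plant and pairwise correlation rho.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, 1.0)  # weather pattern common to all plants
        total = 0.0
        for _ in range(n_plants):
            local = rng.gauss(0.0, 1.0)  # plant-specific fluctuation
            total += math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * local
        means.append(total / n_plants)
    return statistics.pstdev(means)

def theoretical_std(n_plants, rho):
    """Closed-form std. dev. of the mean under the equicorrelation assumption."""
    return math.sqrt(rho + (1.0 - rho) / n_plants)
```

Under this toy model the standard deviation of the aggregate approaches `sqrt(rho)` as the number of plants grows, so adding plants helps less and less unless they are spread widely enough to lower the correlation itself.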