Search | arXiv e-print repository

Algorithmic Changes Are Not Enough: Evaluating the Removal of Race Adjustment from the eGFR Equation

Authors: Marika M. Cusick, Glenn M. Chertow, Douglas K. Owens, Michelle Y. Williams, Sherri Rose

Abstract: Changing clinical algorithms to remove race adjustment has been proposed and implemented for multiple health conditions. Removing race adjustment from estimated glomerular filtration rate (eGFR) equations may reduce disparities in chronic kidney disease (CKD), but has not been studied in clinical practice after implementation. Here, we assessed whether implementing an eGFR equation (CKD-EPI 2021) without adjustment for Black or African American race modified quarterly rates of nephrology referrals and visits within a single healthcare system, Stanford Health Care (SHC). Our cohort study analyzed 547,194 adult patients aged 21 and older who had at least one recorded serum creatinine or serum cystatin C between January 1, 2019 and September 1, 2023. During the study period, implementation of CKD-EPI 2021 did not modify rates of quarterly nephrology referrals in those documented as Black or African American or in the overall cohort. After adjusting for capacity at SHC nephrology clinics, estimated rates of nephrology referrals and visits with CKD-EPI 2021 were 34 (95% CI 29, 39) and 188 (175, 201) per 10,000 patients documented as Black or African American. If race adjustment had not been removed, estimated rates were nearly identical: 38 (95% CI: 28, 53) and 189 (165, 218) per 10,000 patients. Changes to the eGFR equation are likely insufficient to achieve health equity in CKD care decision-making as many other structural inequities remain.

Submitted 25 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

Comments: Accepted to Conference on Health, Inference, and Learning (CHIL) 2024

arXiv:2202.12932 [pdf, other]

Capturing Actionable Dynamics with Structured Latent Ordinary Differential Equations

Authors: Paidamoyo Chapfuwa, Sherri Rose, Lawrence Carin, Edward Meeds, Ricardo Henao

Abstract: End-to-end learning of dynamical systems with black-box models, such as neural ordinary differential equations (ODEs), provides a flexible framework for learning dynamics from data without prescribing a mathematical model for the dynamics. Unfortunately, this flexibility comes at the cost of understanding the dynamical system, for which ODEs are used ubiquitously. Further, experimental data are collected under various conditions (inputs), such as treatments, or grouped in some way, such as part of sub-populations. Understanding the effects of these system inputs on system outputs is crucial to have any meaningful model of a dynamical system. To that end, we propose a structured latent ODE model that explicitly captures system input variations within its latent representation. Building on a static latent variable specification, our model learns (independent) stochastic factors of variation for each input to the system, thus separating the effects of the system inputs in the latent space. This approach provides actionable modeling through the controlled generation of time-series data for novel input combinations (or perturbations). Additionally, we propose a flexible approach for quantifying uncertainties, leveraging a quantile regression formulation. Results on challenging biological datasets show consistent improvements over competitive baselines in the controlled generation of observational data and inference of biologically meaningful system inputs.

Submitted 16 June, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

Comments: Accepted for the 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022). Github code can be found at https://github.com/paidamoyo/structured_latent_ODEs

arXiv:2110.12112 [pdf, ps, other]

Why Machine Learning Cannot Ignore Maximum Likelihood Estimation

Authors: Mark J. van der Laan, Sherri Rose

Abstract: The growth of machine learning as a field has been accelerating with increasing interest and publications across fields, including statistics, but predominantly in computer science. How can we parse this vast literature for developments that exemplify the necessary rigor? How many of these manuscripts incorporate foundational theory to allow for statistical inference? Which advances have the greatest potential for impact in practice? One could posit many answers to these queries. Here, we assert that one essential idea is for machine learning to integrate maximum likelihood for estimation of functional parameters, such as prediction functions and conditional densities.

Submitted 22 October, 2021; originally announced October 2021.

Comments: 30 pages. Forthcoming as a chapter in the Handbook of Matching and Weighting in Causal Inference

arXiv:2109.13288 [pdf, other]

doi 10.1111/biom.13863

Conditional Cross-Design Synthesis Estimators for Generalizability in Medicaid

Authors: Irina Degtiar, Tim Layton, Jacob Wallace, Sherri Rose

Abstract: While much of the causal inference literature has focused on addressing internal validity biases, both internal and external validity are necessary for unbiased estimates in a target population of interest. However, few generalizability approaches exist for estimating causal quantities in a target population when the target population is not well-represented by a randomized study but is reflected when additionally incorporating observational data. To generalize to a target population represented by a union of these data, we propose a class of novel conditional cross-design synthesis estimators that combine randomized and observational data, while addressing their respective biases. The estimators include outcome regression, propensity weighting, and double robust approaches. All use the covariate overlap between the randomized and observational data to remove potential unmeasured confounding bias. We apply these methods to estimate the causal effect of managed care plans on health care spending among Medicaid beneficiaries in New York City.

Submitted 27 September, 2021; originally announced September 2021.

Comments: 25 pages, 4 figures; supplement of 31 pages, 12 figures, and 4 tables

MSC Class: 62G05 (Primary); 62P25 (Secondary)

Journal ref: Biometrics (2023)

arXiv:2107.01251 [pdf, other]

doi 10.1002/sim.9448

Uncertainty in Lung Cancer Stage for Outcome Estimation via Set-Valued Classification

Authors: Savannah Bergquist, Gabriel Brooks, Mary Beth Landrum, Nancy Keating, Sherri Rose

Abstract: Difficulty in identifying cancer stage in health care claims data has limited oncology quality of care and health outcomes research. We fit prediction algorithms for classifying lung cancer stage into three classes (stages I/II, stage III, and stage IV) using claims data, and then demonstrate a method for incorporating the classification uncertainty in outcomes estimation. Leveraging set-valued classification and split conformal inference, we show how a fixed algorithm developed in one cohort of data may be deployed in another, while rigorously accounting for uncertainty from the initial classification step. We demonstrate this process using SEER cancer registry data linked with Medicare claims data.

Submitted 2 July, 2021; originally announced July 2021.

Comments: Code available at: https://github.com/sl-bergquist/cancer_classification

Journal ref: Statistics in Medicine (2022)
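
The abstract above names set-valued classification with split conformal inference. The following is a minimal generic sketch of that technique, not the authors' implementation (their code is at the GitHub link in the entry). The function names `conformal_threshold` and `prediction_set`, the nonconformity score (one minus the probability of the true class), and the toy inputs are all assumptions made for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Compute a split conformal threshold from a held-out calibration set.

    cal_probs: per-class probability lists from an already-fitted classifier.
    cal_labels: true class index for each calibration observation.
    alpha: target miscoverage level (e.g. 0.1 for roughly 90% coverage).
    """
    # Nonconformity score (an assumed, common choice):
    # one minus the probability assigned to the true class.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: keep every class.
        return 1.0
    return scores[k - 1]


def prediction_set(probs, threshold):
    """Return every class whose nonconformity score is within the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]


# Toy calibration data for three classes (hypothetical numbers).
cal_probs = [
    [0.75, 0.125, 0.125],
    [0.25, 0.5, 0.25],
    [0.0625, 0.0625, 0.875],
    [0.625, 0.25, 0.125],
]
cal_labels = [0, 1, 2, 0]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
```

A new observation then receives a set of plausible classes rather than a single label, for example `prediction_set([0.5, 0.5, 0.0], threshold)`. Ambiguous cases yield larger sets, which is how the classification uncertainty can be carried into later analysis steps.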

arXiv:2105.08493 [pdf, other]

doi 10.1136/bmjhci-2021-100414

Identifying Undercompensated Groups Defined By Multiple Attributes in Risk Adjustment

Authors: Anna Zink, Sherri Rose

Abstract: Risk adjustment in health care aims to redistribute payments to insurers based on costs. However, risk adjustment formulas are known to underestimate costs for some groups of patients. This undercompensation makes these groups unprofitable to insurers and creates incentives for insurers to discriminate. We develop a machine learning method for "group importance" to identify unprofitable groups defined by multiple attributes, improving on the arbitrary nature of existing evaluations. This procedure was designed to evaluate the risk adjustment formulas used in the U.S. health insurance Marketplaces as well as Medicare. We find that a number of previously unidentified groups with multiple chronic conditions are undercompensated in the Marketplaces risk adjustment formula, while groups without chronic conditions tend to be overcompensated in the Marketplaces. The magnitude of undercompensation when defining groups with multiple attributes is larger than with single attributes. No complex groups were found to be consistently under- or overcompensated in the Medicare risk adjustment formula. Our work provides policy makers with new information on potential targets of discrimination in the health care system and a path towards more equitable health coverage.

Submitted 26 July, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

Journal ref: BMJ Health & Care Informatics (2021)

arXiv:2104.06571 [pdf, ps, other]

Considerations Across Three Cultures: Parametric Regressions, Interpretable Algorithms, and Complex Algorithms

Authors: Ani Eloyan, Sherri Rose

Abstract: We consider an extension of Leo Breiman's thesis from "Statistical Modeling: The Two Cultures" to include a bifurcation of algorithmic modeling, focusing on parametric regressions, interpretable algorithms, and complex (possibly explainable) algorithms.

Submitted 13 April, 2021; originally announced April 2021.

Comments: 7 pages, forthcoming in Observational Studies

Journal ref: Observational Studies (2021); 7(1):191-196. https://muse.jhu.edu/article/799734

arXiv:2102.11904 [pdf, other]

doi 10.1146/annurev-statistics-042522-103837

A Review of Generalizability and Transportability

Authors: Irina Degtiar, Sherri Rose

Abstract: When assessing causal effects, determining the target population to which the results are intended to generalize is a critical decision. Randomized and observational studies each have strengths and limitations for estimating causal effects in a target population. Estimates from randomized data may have internal validity but are often not representative of the target population. Observational data may better reflect the target population, and hence be more likely to have external validity, but are subject to potential bias due to unmeasured confounding. While much of the causal inference literature has focused on addressing internal validity bias, both internal and external validity are necessary for unbiased estimates in a target population. This paper presents a framework for addressing external validity bias, including a synthesis of approaches for generalizability and transportability, the assumptions they require, as well as tests for the heterogeneity of treatment effects and differences between study and target populations.

Submitted 23 February, 2021; originally announced February 2021.

Comments: 30 pages, 3 figures

MSC Class: 62-02

Journal ref: Annual Review of Statistics and Its Application (2023)

arXiv:1901.10566 [pdf, other]

doi 10.1111/biom.13206

Fair Regression for Health Care Spending

Authors: Anna Zink, Sherri Rose

Abstract: The distribution of health care payments to insurance plans has substantial consequences for social policy. Risk adjustment formulas predict spending in health insurance markets in order to provide fair benefits and health care coverage for all enrollees, regardless of their health status. Unfortunately, current risk adjustment formulas are known to underpredict spending for specific groups of enrollees leading to undercompensated payments to health insurers. This incentivizes insurers to design their plans such that individuals in undercompensated groups will be less likely to enroll, impacting access to health care for these groups. To improve risk adjustment formulas for undercompensated groups, we expand on concepts from the statistics, computer science, and health economics literature to develop new fair regression methods for continuous outcomes by building fairness considerations directly into the objective function. We additionally propose a novel measure of fairness while asserting that a suite of metrics is necessary in order to evaluate risk adjustment formulas more fully. Our data application using the IBM MarketScan Research Databases and simulation studies demonstrate that these new fair regression methods may lead to massive improvements in group fairness (e.g., 98%) with only small reductions in overall fit (e.g., 4%).

Submitted 13 July, 2019; v1 submitted 27 January, 2019; originally announced January 2019.

Comments: 30 pages, 3 figures

Journal ref: Biometrics (2020)

arXiv:1805.07684 [pdf, other]

Consistent Estimation of Propensity Score Functions with Oversampled Exposed Subjects

Authors: Sherri Rose

Abstract: Observational cohort studies with oversampled exposed subjects are typically implemented to understand the causal effect of a rare exposure. Because the distribution of exposed subjects in the sample differs from the source population, estimation of a propensity score function (i.e., probability of exposure given baseline covariates) targets a nonparametrically nonidentifiable parameter. Consistent estimation of propensity score functions is an important component of various causal inference estimators, including double robust machine learning and inverse probability weighted estimators. This paper develops the use of the probability of exposure from the source population in a flexible computational implementation that can be used with any algorithm that allows observation weighting to produce consistent estimators of propensity score functions. Simulation studies and a hypothetical health policy intervention data analysis demonstrate low empirical bias and variance for these propensity score function estimators with observation weights in finite samples.

Submitted 12 February, 2019; v1 submitted 19 May, 2018; originally announced May 2018.

Comments: 15 pages, 3 figures, 2 tables
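
The abstract above describes using the source-population probability of exposure as observation weights. The sketch below shows one plausible weighting of that kind and is not taken from the paper: it assumes a known source-population exposure probability `q0` and rescales exposed and unexposed subjects so the weighted sample matches it. The function names and the 50/50 toy sample are hypothetical.

```python
def source_population_weights(exposure, q0):
    """Observation weights that restore the source-population exposure rate.

    exposure: list of 0/1 exposure indicators from the oversampled cohort.
    q0: assumed known probability of exposure in the source population.
    """
    n = len(exposure)
    p_hat = sum(exposure) / n  # exposure proportion in the sample
    if not (0 < p_hat < 1) or not (0 < q0 < 1):
        raise ValueError("need both exposure groups and 0 < q0 < 1")
    w_exposed = q0 / p_hat                # down-weights oversampled exposed
    w_unexposed = (1 - q0) / (1 - p_hat)  # up-weights the unexposed
    return [w_exposed if a == 1 else w_unexposed for a in exposure]


def weighted_exposure_rate(exposure, weights):
    """Weighted proportion of exposed subjects."""
    return sum(a * w for a, w in zip(exposure, weights)) / sum(weights)


# Toy cohort: exposure is rare in the source population (10%)
# but makes up half of the sample by design.
exposure = [1] * 50 + [0] * 50
weights = source_population_weights(exposure, q0=0.1)
rate = weighted_exposure_rate(exposure, weights)
```

Weights of this form could then be passed to any learner that accepts observation weights, for instance through the `sample_weight` argument that many scikit-learn estimators take in `fit`, when estimating the propensity score function.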

arXiv:1804.08055 [pdf, other]

doi 10.1080/01621459.2019.1688663

Nonparametric Bayesian Instrumental Variable Analysis: Evaluating Heterogeneous Effects of Coronary Arterial Access Site Strategies

Authors: Samrachana Adhikari, Sherri Rose, Sharon-Lise Normand

Abstract: Percutaneous coronary interventions (PCIs) are nonsurgical procedures to open blocked blood vessels to the heart, frequently using a catheter to place a stent. The catheter can be inserted into the blood vessels using an artery in the groin or an artery in the wrist. Because clinical trials have indicated that access via the wrist may result in fewer post procedure complications, shortening the length of stay, and ultimately cost less than groin access, adoption of access via the wrist has been encouraged. However, patients treated in usual care are likely to differ from those participating in clinical trials, and there is reason to believe that the effectiveness of wrist access may differ between males and females. Moreover, the choice of artery access strategy is likely to be influenced by patient or physician unmeasured factors. To study the effectiveness of the two artery access site strategies on hospitalization charges, we use data from a state-mandated clinical registry including 7,963 patients undergoing PCI. A hierarchical Bayesian likelihood-based instrumental variable analysis under a latent index modeling framework is introduced to jointly model outcomes and treatment status. Our approach accounts for unobserved heterogeneity via a latent factor structure, and permits nonparametric error distributions with Dirichlet process mixture models. Our results demonstrate that artery access in the wrist reduces hospitalization charges compared to access in the groin, with higher mean reduction for male patients.

Submitted 3 November, 2019; v1 submitted 21 April, 2018; originally announced April 2018.

Comments: 11 tables, 5 figures

Journal ref: Journal of the American Statistical Association (2020)

arXiv:1707.04531 [pdf, other]

doi 10.1109/TCI.2017.2723246

A Convex Reconstruction Model for X-ray Tomographic Imaging with Uncertain Flat-fields

Authors: Hari Om Aggrawal, Martin Skovgaard Andersen, Sean Rose, Emil Y. Sidky

Abstract: Classical methods for X-ray computed tomography are based on the assumption that the X-ray source intensity is known, but in practice, the intensity is measured and hence uncertain. Under normal operating conditions, when the exposure time is sufficiently high, this kind of uncertainty typically has a negligible effect on the reconstruction quality. However, in time- or dose-limited applications such as dynamic CT, this uncertainty may cause severe and systematic artifacts known as ring artifacts. By carefully modeling the measurement process and by taking uncertainties into account, we derive a new convex model that leads to improved reconstructions despite poor quality measurements. We demonstrate the effectiveness of the methodology based on simulated and real data sets.

Submitted 14 July, 2017; originally announced July 2017.

Comments: Accepted at IEEE Transactions on Computational Imaging

arXiv:1608.06257 [pdf, other]

doi 10.4324/9781315848709

Reduction of wind power variability through geographic diversity

Authors: Mark Handschy, Stephen Rose, Jay Apt

Abstract: The variability of wind-generated electricity can be reduced by aggregating the outputs of wind generation plants spread over a large geographic area. In this chapter we utilize Monte Carlo simulations to investigate upper bounds on the degree of achievable smoothing and clarify how the degree of smoothing depends on the number of plants and on the size of the geographic area over which they are spread.

Submitted 22 August, 2016; originally announced August 2016.

Comments: 12 pages, 9 figures, Chapter 12 from Variable Renewable Energy and the Electricity Grid, by Jay Apt and Paulina Jaramillo, RFF/Routledge. 2014

Showing 1–13 of 13 results for author: Rose, S