Search | arXiv e-print repository

Empirical Evidence That There Is No Such Thing As A Validated Prediction Model

Authors: Florian D. van Leeuwen, Ewout W. Steyerberg, David van Klaveren, Ben Wessler, David M. Kent, Erik W. van Zwet

Abstract: Background: External validations are essential to assess clinical prediction models (CPMs) before deployment. Apart from model misspecification, differences in patient population and other factors influence a model's AUC (c-statistic). We aimed to quantify variation in AUCs across external validation studies and adjust expectations of a model's performance in a new setting. Methods: The Tufts-PACE CPM Registry contains CPMs for cardiovascular disease prognosis. We analyzed the AUCs of 469 CPMs with a total of 1,603 external validations. For each CPM, we performed a random effects meta-analysis to estimate the between-study standard deviation $τ$ among the AUCs. Since the majority of these meta-analyses has only a handful of validations, this leads to very poor estimates of $τ$. So, we estimated a log normal distribution of $τ$ across all CPMs and used this as an empirical prior. We compared this empirical Bayesian approach with frequentist meta-analyses using cross-validation. Results: The 469 CPMs had a median of 2 external validations (IQR: [1-3]). The estimated distribution of $τ$ had a mean of 0.055 and a standard deviation of 0.015. If $τ$ = 0.05, the 95% prediction interval for the AUC in a new setting is at least +/- 0.1, regardless of the number of validations. Frequentist methods underestimate the uncertainty about the AUC in a new setting. Accounting for $τ$ in a Bayesian approach achieved near nominal coverage. Conclusion: Due to large heterogeneity among the validated AUC values of a CPM, there is great irreducible uncertainty in predicting the AUC in a new setting. This uncertainty is underestimated by existing methods. The proposed empirical Bayes approach addresses this problem which merits wide application in judging the validity of prediction models.

Submitted 12 June, 2024; originally announced June 2024.
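
The "+/- 0.1" figure quoted in the abstract above can be reproduced with a short calculation. The sketch below is not taken from the paper; it assumes the standard normal-approximation prediction interval for a random effects meta-analysis, in which the half-width is z times the square root of τ² plus the squared standard error of the pooled mean. The function name `prediction_half_width` and its parameters are illustrative choices.

```python
import math


def prediction_half_width(tau: float, se_pooled: float = 0.0, z: float = 1.96) -> float:
    """Half-width of an approximate 95% prediction interval for the AUC
    in a new setting, under a normal random effects model (an assumption).

    tau       -- between-study standard deviation of the AUC
    se_pooled -- standard error of the pooled mean AUC; it shrinks toward 0
                 as the number of validations grows
    z         -- normal quantile (1.96 for a 95% interval)
    """
    return z * math.sqrt(tau ** 2 + se_pooled ** 2)


# Best case: infinitely many validations, so the pooled mean is known exactly.
floor = prediction_half_width(tau=0.05)

# With only a few validations the pooled mean is itself uncertain,
# so the interval can only get wider than the floor.
wider = prediction_half_width(tau=0.05, se_pooled=0.03)

print(f"floor half-width: {floor:.3f}")
print(f"with se_pooled=0.03: {wider:.3f}")
```

Because the τ² term does not depend on the number of validations, the half-width never drops below roughly 1.96 × 0.05, which is consistent with the abstract's statement that the interval is at least +/- 0.1 regardless of how many validations are available.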

arXiv:2311.18025

A Probabilistic Method to Predict Classifier Accuracy on Larger Datasets given Small Pilot Data

Authors: Ethan Harvey, Wansu Chen, David M. Kent, Michael C. Hughes

Abstract: Practitioners building classifiers often start with a smaller pilot dataset and plan to grow to larger data in the near future. Such projects need a toolkit for extrapolating how much classifier accuracy may improve from a 2x, 10x, or 50x increase in data size. While existing work has focused on finding a single "best-fit" curve using various functional forms like power laws, we argue that modeling and assessing the uncertainty of predictions is critical yet has seen less attention. In this paper, we propose a Gaussian process model to obtain probabilistic extrapolations of accuracy or similar performance metrics as dataset size increases. We evaluate our approach in terms of error, likelihood, and coverage across six datasets. Though we focus on medical tasks and image modalities, our open source approach generalizes to any kind of classifier.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2205.01717

Individualized treatment effect was predicted best by modeling baseline risk in interaction with treatment assignment

Authors: Alexandros Rekkas, Peter R. Rijnbeek, David M. Kent, Ewout W. Steyerberg, David van Klaveren

Abstract: Objective: To compare different risk-based methods for optimal prediction of treatment effects. Methods: We simulated RCT data using diverse assumptions for the average treatment effect, a baseline prognostic index of risk (PI), the shape of its interaction with treatment (none, linear, quadratic or non-monotonic), and the magnitude of treatment-related harms (none or constant independent of the PI). We predicted absolute benefit using: models with a constant relative treatment effect; stratification in quarters of the PI; models including a linear interaction of treatment with the PI; models including an interaction of treatment with a restricted cubic spline (RCS) transformation of the PI; an adaptive approach using Akaike's Information Criterion. We evaluated predictive performance using root mean squared error and measures of discrimination and calibration for benefit. Results: The linear-interaction model displayed optimal or close-to-optimal performance across many simulation scenarios with moderate sample size (N=4,250 patients; ~ 785 events). The RCS-model was optimal for strong non-linear deviations from a constant treatment effect, particularly when sample size was larger (N=17,000). The adaptive approach also required larger sample sizes. These findings were illustrated in the GUSTO-I trial. Conclusion: An interaction between baseline risk and treatment assignment should be considered to improve treatment effect predictions.

Submitted 4 July, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

arXiv:2105.00773

Approximate Bayesian Computation for an Explicit-Duration Hidden Markov Model of COVID-19 Hospital Trajectories

Authors: Gian Marco Visani, Alexandra Hope Lee, Cuong Nguyen, David M. Kent, John B. Wong, Joshua T. Cohen, Michael C. Hughes

Abstract: We address the problem of modeling constrained hospital resources in the midst of the COVID-19 pandemic in order to inform decision-makers of future demand and assess the societal value of possible interventions. For broad applicability, we focus on the common yet challenging scenario where patient-level data for a region of interest are not available. Instead, given daily admissions counts, we model aggregated counts of observed resource use, such as the number of patients in the general ward, in the intensive care unit, or on a ventilator. In order to explain how individual patient trajectories produce these counts, we propose an aggregate count explicit-duration hidden Markov model, nicknamed the ACED-HMM, with an interpretable, compact parameterization. We develop an Approximate Bayesian Computation approach that draws samples from the posterior distribution over the model's transition and duration parameters given aggregate counts from a specific location, thus adapting the model to a region or individual hospital site of interest. Samples from this posterior can then be used to produce future forecasts of any counts of interest. Using data from the United States and the United Kingdom, we show our mechanistic approach provides competitive probabilistic forecasts for the future even as the dynamics of the pandemic shift. Furthermore, we show how our model provides insight about recovery probabilities or length of stay distributions, and we suggest its potential to answer challenging what-if questions about the societal value of possible interventions.

Submitted 28 July, 2021; v1 submitted 28 April, 2021; originally announced May 2021.

Comments: To appear in the Proceedings of the Machine Learning for Healthcare (MLHC) conference, 2021. 20 pages, 7 figures and 1 table. 26 additional pages of supplementary material

arXiv:2010.06430

A standardized framework for risk-based assessment of treatment effect heterogeneity in observational healthcare databases

Authors: Alexandros Rekkas, David van Klaveren, Patrick B. Ryan, Ewout W. Steyerberg, David M. Kent, Peter R. Rijnbeek

Abstract: The Predictive Approaches to Treatment Effect Heterogeneity statement focused on baseline risk as a robust predictor of treatment effect and provided guidance on risk-based assessment of treatment effect heterogeneity in the RCT setting. The aim of this study was to extend this approach to the observational setting using a standardized scalable framework. The proposed framework consists of five steps: 1) definition of the research aim, i.e., the population, the treatment, the comparator and the outcome(s) of interest; 2) identification of relevant databases; 3) development of a prediction model for the outcome(s) of interest; 4) estimation of relative and absolute treatment effect within strata of predicted risk, after adjusting for observed confounding; 5) presentation of the results. We demonstrate our framework by evaluating heterogeneity of the effect of angiotensin-converting enzyme (ACE) inhibitors versus beta blockers on three efficacy and six safety outcomes across three observational databases. The proposed framework can supplement any comparative effectiveness study. We provide a publicly available R software package for applying this framework to any database mapped to the Observational Medical Outcomes Partnership Common Data Model. In our demonstration, patients at low risk of acute myocardial infarction received negligible absolute benefits for all three efficacy outcomes, though they were more pronounced in the highest risk quarter, especially for hospitalization with heart failure. However, failing diagnostics showed evidence of residual imbalances even after adjustment for observed confounding. Our framework allows for the evaluation of differential treatment effects across risk strata, which offers the opportunity to consider the benefit-harm trade-off between alternative treatments.

Submitted 1 July, 2022; v1 submitted 13 October, 2020; originally announced October 2020.

Showing 1–5 of 5 results for author: Kent, D M