-
How to develop, externally validate, and update multinomial prediction models
Authors:
Celina K Gehringer,
Glen P Martin,
Ben Van Calster,
Kimme L Hyrich,
Suzanne M M Verstappen,
Jamie C Sergeant
Abstract:
Multinomial prediction models (MPMs) have a range of potential applications across healthcare where the primary outcome of interest has multiple nominal or ordinal categories. However, the application of MPMs is scarce, which may be due to the added methodological complexities that they bring. This article provides a guide on how to develop, externally validate, and update MPMs. Using a previously developed and validated MPM for treatment outcomes in rheumatoid arthritis as an example, we outline guidance and recommendations for producing a clinical prediction model using multinomial logistic regression. This article is intended to supplement existing general guidance on prediction model research. This guide is split into three parts: 1) Outcome definition and variable selection, 2) Model development, and 3) Model evaluation (including performance assessment, internal and external validation, and model recalibration). We outline how to evaluate and interpret the predictive performance of MPMs. R code is provided. We recommend the application of MPMs in clinical settings where the prediction of a nominal polytomous outcome is of interest. Future methodological research could focus on MPM-specific considerations for variable selection and sample size criteria for external validation.
Submitted 20 December, 2023; v1 submitted 19 December, 2023;
originally announced December 2023.
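As a rough illustration of the multinomial logistic regression mentioned in the abstract above, the following minimal sketch turns a set of coefficients into predicted probabilities for each outcome category. The function name `multinomial_probs` and all coefficient values are hypothetical and are not taken from the article, which provides its own R code.

```python
import math


def multinomial_probs(x, coefs):
    """Return predicted probabilities for K outcome categories.

    x:     list of predictor values (no intercept term).
    coefs: list of K-1 coefficient lists [intercept, b1, ..., bp], one per
           non-reference category. The reference category has a linear
           predictor fixed at 0.
    """
    linear_predictors = [0.0]  # reference category
    for c in coefs:
        linear_predictors.append(c[0] + sum(b * v for b, v in zip(c[1:], x)))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(linear_predictors)
    exps = [math.exp(lp - m) for lp in linear_predictors]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical coefficients: 3 outcome categories, 2 predictors.
coefs = [[-0.5, 0.8, 0.1], [0.2, -0.4, 0.3]]
probs = multinomial_probs([1.0, 2.0], coefs)
print(probs)  # one probability per category; they sum to 1
```

The predicted probabilities always sum to one across categories, which is why calibration and discrimination for such models need to be assessed per category as well as overall.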
-
Calibration plots for multistate risk predictions models: an overview and simulation comparing novel approaches
Authors:
Alexander Pate,
Matthew Sperrin,
Richard D. Riley,
Niels Peek,
Tjeerd Van Staa,
Jamie C. Sergeant,
Mamas A. Mamas,
Gregory Y. H. Lip,
Martin O Flaherty,
Michael Barrowman,
Iain Buchan,
Glen P. Martin
Abstract:
Introduction. There is currently no guidance on how to assess the calibration of multistate models used for risk prediction. We introduce several techniques that can be used to produce calibration plots for the transition probabilities of a multistate model, before assessing their performance in the presence of non-informative and informative censoring through a simulation.
Methods. We studied pseudo-values based on the Aalen-Johansen estimator, binary logistic regression with inverse probability of censoring weights (BLR-IPCW), and multinomial logistic regression with inverse probability of censoring weights (MLR-IPCW). The MLR-IPCW approach results in a calibration scatter plot, providing extra insight about the calibration. We simulated data with varying levels of censoring and evaluated the ability of each method to estimate the calibration curve for a set of predicted transition probabilities. We also developed and evaluated the calibration of a model predicting the incidence of cardiovascular disease, type 2 diabetes and chronic kidney disease among a cohort of patients derived from linked primary and secondary healthcare records.
Results. The pseudo-value, BLR-IPCW and MLR-IPCW approaches give unbiased estimates of the calibration curves under non-informative censoring. These methods remained unbiased in the presence of informative censoring, unless the mechanism was strongly informative, with bias concentrated in the areas of predicted transition probabilities of low density.
Conclusions. We recommend implementing either the pseudo-value or BLR-IPCW approaches to produce a calibration curve, combined with the MLR-IPCW approach to produce a calibration scatter plot, which provides additional information over either of the other methods.
Submitted 25 August, 2023;
originally announced August 2023.
-
Minimum Sample Size for Developing a Multivariable Prediction Model using Multinomial Logistic Regression
Authors:
Alexander Pate,
Richard D Riley,
Gary S Collins,
Maarten van Smeden,
Ben Van Calster,
Joie Ensor,
Glen P Martin
Abstract:
Multinomial logistic regression models allow one to predict the risk of a categorical outcome with more than 2 categories. When developing such a model, researchers should ensure the number of participants (n) is appropriate relative to the number of events (E.k) and the number of predictor parameters (p.k) for each category k. We propose three criteria to determine the minimum n required in light of existing criteria developed for binary outcomes. The first criterion aims to minimise model overfitting. The second aims to minimise the difference between the observed and adjusted Nagelkerke R2. The third criterion aims to ensure the overall risk is estimated precisely. For criterion (i), we show the sample size must be based on the anticipated Cox-Snell R2 of distinct one-to-one logistic regression models corresponding to the sub-models of the multinomial logistic regression, rather than on the overall Cox-Snell R2 of the multinomial logistic regression. We tested the performance of the proposed criterion (i) through a simulation study, and found that it resulted in the desired level of overfitting. Criteria (ii) and (iii) are natural extensions of previously proposed criteria for binary outcomes. We illustrate how to implement the sample size criteria through a worked example considering the development of a multinomial risk prediction model for tumour type when presented with an ovarian mass. Code is provided for the simulation and worked example. We will embed our proposed criteria within the pmsampsize R library and Stata modules.
Submitted 26 July, 2022;
originally announced July 2022.
-
Imputation and Missing Indicators for handling missing data in the development and implementation of clinical prediction models: a simulation study
Authors:
Rose Sisk,
Matthew Sperrin,
Niels Peek,
Maarten van Smeden,
Glen P. Martin
Abstract:
Background: Existing guidelines for handling missing data are generally not consistent with the goals of prediction modelling, where missing data can occur at any stage of the model pipeline. Multiple imputation (MI), often heralded as the gold standard approach, can be challenging to apply in the clinic. Clearly, the outcome cannot be used to impute data at prediction time. Regression imputation (RI) may offer a pragmatic alternative in the prediction context that is simpler to apply in the clinic. Moreover, the use of missing indicators can handle informative missingness, but it is currently unknown how well they perform within CPMs. Methods: We performed a simulation study where data were generated under various missing data mechanisms to compare the predictive performance of CPMs developed using both imputation methods. We consider deployment scenarios where missing data are permitted/prohibited, and develop models that use/omit the outcome during imputation and include/omit missing indicators. Results: When complete data must be available at deployment, our findings were in line with widely used recommendations: that the outcome should be used to impute development data under MI, yet omitted under RI. When imputation is applied at deployment, omitting the outcome from the imputation at development was preferred. Missing indicators improved model performance in some specific cases, but can be harmful when missingness is dependent on the outcome. Conclusion: We provide evidence that commonly taught principles of handling missing data via MI may not apply to CPMs, particularly when data can be missing at deployment. In such settings, RI and missing indicator methods can (marginally) outperform MI. As shown, the performance of the missing data handling method must be evaluated on a study-by-study basis, and should be based on whether missing data are allowed at deployment.
Submitted 24 June, 2022;
originally announced June 2022.
-
A scoping review of causal methods enabling predictions under hypothetical interventions
Authors:
Lijing Lin,
Matthew Sperrin,
David A. Jenkins,
Glen P. Martin,
Niels Peek
Abstract:
Background and Aims: The methods with which prediction models are usually developed mean that neither the parameters nor the predictions should be interpreted causally. However, when prediction models are used to support decision making, there is often a need for predicting outcomes under hypothetical interventions. We aimed to identify published methods for developing and validating prediction models that enable risk estimation of outcomes under hypothetical interventions, utilizing causal inference, and to summarize their main methodological approaches, underlying assumptions, targeted estimands, potential pitfalls and challenges in use, and unresolved methodological challenges.
Methods: We systematically reviewed literature published by December 2019, considering papers in the health domain that used causal considerations to enable prediction models to be used for predictions under hypothetical interventions.
Results: We identified 4919 papers through database searches and a further 115 papers through manual searches, of which 13 were selected for inclusion, from both the statistical and the machine learning literature. Most of the identified methods for causal inference from observational data were based on marginal structural models and g-estimation.
Conclusions: There exist two broad methodological approaches for incorporating prediction under hypothetical interventions into clinical prediction models: 1) enriching prediction models derived from observational studies with estimated causal effects from clinical trials and meta-analyses; and 2) estimating prediction models and causal effects directly from observational data. These methods require extending to dynamic treatment regimes, and consideration of multiple interventions to operationalise a clinical decision support system. Techniques for validating 'causal prediction models' are still in their infancy.
Submitted 12 January, 2021; v1 submitted 19 November, 2020;
originally announced November 2020.
-
Clinical Prediction Models to Predict the Risk of Multiple Binary Outcomes: a comparison of approaches
Authors:
Glen P. Martin,
Matthew Sperrin,
Kym I. E. Snell,
Iain Buchan,
Richard D. Riley
Abstract:
Clinical prediction models (CPMs) are used to predict clinically relevant outcomes or events. Typically, prognostic CPMs are derived to predict the risk of a single future outcome. However, with rising emphasis on the prediction of multi-morbidity, there is growing need for CPMs to simultaneously predict risks for each of multiple future outcomes. A common approach to multi-outcome risk prediction is to derive a CPM for each outcome separately, then multiply the predicted risks. This approach is only valid if the outcomes are conditionally independent given the covariates, and it fails to exploit the potential relationships between the outcomes. This paper outlines several approaches that could be used to develop prognostic CPMs for multiple outcomes. We consider four methods, ranging in complexity and in their conditional independence assumptions: namely, probabilistic classifier chains, multinomial logistic regression, multivariate logistic regression, and a Bayesian probit model. These are compared with methods that rely on conditional independence: separate univariate CPMs and stacked regression. Employing a simulation study and real-world example via the MIMIC-III database, we illustrate that CPMs for joint risk prediction of multiple outcomes should only be derived using methods that model the residual correlation between outcomes. In such a situation, our results suggest that probabilistic classifier chains, multinomial logistic regression or the Bayesian probit model are all appropriate choices. We call into question the development of CPMs for each outcome in isolation when multiple correlated or structurally related outcomes are of interest and recommend more holistic risk prediction.
Submitted 21 January, 2020;
originally announced January 2020.
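The abstract above notes that multiplying separately predicted risks is only valid when the outcomes are conditionally independent given the covariates. A small numeric sketch makes the point. The joint probabilities below are invented for illustration, describe a single hypothetical covariate pattern, and are not taken from the paper.

```python
# Hypothetical joint distribution of two binary outcomes (Y1, Y2) for one
# covariate pattern. The outcomes are positively correlated by construction.
joint = {
    (1, 1): 0.20,
    (1, 0): 0.10,
    (0, 1): 0.10,
    (0, 0): 0.60,
}

# Marginal risks, as two separately developed CPMs would ideally estimate them.
p1 = sum(p for (y1, _), p in joint.items() if y1 == 1)
p2 = sum(p for (_, y2), p in joint.items() if y2 == 1)

# Naive joint risk obtained by multiplying the marginal risks.
naive_both = p1 * p2

# Actual probability that both outcomes occur.
true_both = joint[(1, 1)]

print(f"P(Y1=1) = {p1:.2f}, P(Y2=1) = {p2:.2f}")
print(f"product of marginals = {naive_both:.2f}, true joint risk = {true_both:.2f}")
```

In this constructed case the product of the marginals (about 0.09) understates the true joint risk (0.20), even though each marginal risk is exactly right.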
-
Examining the impact of data quality and completeness of electronic health records on predictions of patients risks of cardiovascular disease
Authors:
Yan Li,
Matthew Sperrin,
Glen P. Martin,
Darren M Ashcroft,
Tjeerd Pieter van Staa
Abstract:
The objective is to assess the extent of variation of data quality and completeness of electronic health records and its impact on the robustness of risk predictions of incident cardiovascular disease (CVD) using a risk prediction tool that is based on routinely collected data (QRISK3). The study design is a longitudinal cohort study with a setting of 392 general practices (including 3.6 million patients) linked to hospital admission data. Variation in data quality was assessed using Saez stability metrics quantifying outlyingness of each practice. Statistical frailty models evaluated whether accuracy of QRISK3 predictions on individual predictions and effects of overall risk factors (linear predictor) varied between practices. There was substantial heterogeneity between practices in CVD incidence unaccounted for by QRISK3. In the lowest quintile of statistical frailty, a QRISK3 predicted risk of 10% for a female patient was in a range between 7.1% and 9.0% when incorporating practice variability into the statistical frailty models; for the highest quintile, this was 10.9%-16.4%. Data quality (using Saez metrics) and completeness were comparable across different levels of statistical frailty. For example, recording of missing information on ethnicity was 55.7%, 62.7%, 57.8%, 64.8% and 62.1% for practices from lowest to highest quintiles of statistical frailty respectively. The effects of risk factors did not vary between practices with little statistical variation of beta coefficients. In conclusion, the considerable unmeasured heterogeneity in CVD incidence between practices was not explained by variations in data quality or effects of risk factors. QRISK3 risk prediction should be supplemented with clinical judgement and evidence of additional risk factors.
Submitted 19 November, 2019;
originally announced November 2019.