-
No winners: Performance of lung cancer prediction models depends on screening-detected, incidental, and biopsied pulmonary nodule use cases
Authors:
Thomas Z. Li,
Kaiwen Xu,
Aravind Krishnan,
Riqiang Gao,
Michael N. Kammer,
Sanja Antic,
David Xiao,
Michael Knight,
Yency Martinez,
Rafael Paez,
Robert J. Lentz,
Stephen Deppen,
Eric L. Grogan,
Thomas A. Lasko,
Kim L. Sandler,
Fabien Maldonado,
Bennett A. Landman
Abstract:
Statistical models for predicting lung cancer have the potential to facilitate earlier diagnosis of malignancy and avoid invasive workup of benign disease. Many models have been published, but comparative studies of their utility in different clinical settings in which patients would arguably most benefit are scarce. This study retrospectively evaluated promising predictive models for lung cancer prediction in three clinical settings: lung cancer screening with low-dose computed tomography, incidentally detected pulmonary nodules, and nodules deemed suspicious enough to warrant a biopsy. We leveraged 9 cohorts (n=898, 896, 882, 219, 364, 117, 131, 115, 373) from multiple institutions to assess the area under the receiver operating characteristic curve (AUC) of validated models including logistic regressions on clinical variables and radiologist nodule characterizations, artificial intelligence on chest CTs, longitudinal imaging AI, and multi-modal approaches. We implemented each model from its published literature, re-training the models if necessary, and curated each cohort from primary data sources. We observed that model performance varied greatly across clinical use cases. No single predictive model emerged as a clear winner across all cohorts, but certain models excelled in specific clinical contexts. Single timepoint chest CT AI performed well in lung screening, but struggled to generalize to other clinical settings. Longitudinal imaging and multimodal models demonstrated comparatively promising performance on incidentally-detected nodules. However, when applied to nodules that underwent biopsy, all models underperformed. These results underscore the strengths and limitations of 8 validated predictive models and highlight promising directions towards personalized, noninvasive lung cancer diagnosis.
Submitted 16 May, 2024;
originally announced May 2024.
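The abstract above compares models by AUC. As a reminder of what that metric measures, here is a minimal sketch, not the authors' evaluation code: AUC equals the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case, with ties counted as one half. The function name `auc` and the toy labels and scores are assumptions made for illustration only.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: iterable of 0 (benign) / 1 (malignant); scores: predicted risks.
    Hypothetical helper for illustration; quadratic in the number of cases.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up data): three of the four positive/negative pairs
# are ordered correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because AUC depends only on the ranking of cases within a cohort, the same model can score very differently on cohorts with different case mixes, which is consistent with the variation across use cases that the abstract reports.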
-
Prediction modelling with many correlated and zero-inflated predictors: assessing a nonnegative garrote approach
Authors:
Mariella Gregorich,
Michael Kammer,
Harald Mischak,
Georg Heinze
Abstract:
Building prediction models from mass-spectrometry data is challenging due to the abundance of correlated features with varying degrees of zero-inflation, leading to a common interest in reducing the features to a concise predictor set with good predictive performance. In this study, we formally established and examined regularized regression approaches, designed to address zero-inflated and correlated predictors. In particular, we describe a novel two-stage regularized regression approach (ridge-garrote) explicitly modelling zero-inflated predictors using two component variables, comprising a ridge estimator in the first stage and subsequently applying a nonnegative garrote estimator in the second stage. We contrasted ridge-garrote with one-stage methods (ridge, lasso) and other two-stage regularized regression approaches (lasso-ridge, ridge-lasso) for zero-inflated predictors. We assessed the predictive performance and predictor selection properties of these methods in a comparative simulation study and a real-data case study to predict kidney function using peptidomic features derived from mass-spectrometry. In the simulation study, the predictive performance of all assessed approaches was comparable, yet the ridge-garrote approach consistently selected more parsimonious models compared to its competitors in most scenarios. While lasso-ridge achieved higher predictive accuracy than its competitors, it exhibited high variability in the number of selected predictors. Ridge-lasso exhibited slightly superior predictive accuracy than ridge-garrote but at the expense of selecting more noise predictors. Overall, ridge emerged as a favourable option when variable selection is not a primary concern, while ridge-garrote demonstrated notable practical utility in selecting a parsimonious set of predictors, with only minimal compromise in predictive accuracy.
Submitted 3 February, 2024;
originally announced February 2024.
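The two-stage idea in the abstract above (a ridge fit followed by a nonnegative garrote) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it omits the two-component modelling of zero-inflated predictors, assumes predictors and outcome are already centered (no intercept), and fits both stages by plain cyclic coordinate descent. All names (`dot`, `coordinate_descent`, `ridge_garrote`) and penalty values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def coordinate_descent(cols, y, update, n_iter=200):
    """Cyclic coordinate descent over predictor columns.

    `update(rho, norm2)` returns the new coefficient for one column, where
    rho is the inner product of the column with the partial residual
    (residual with that column's contribution added back).
    """
    coef = [0.0] * len(cols)
    resid = list(y)  # y - sum_j coef_j * col_j, with all coef starting at 0
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            norm2 = dot(col, col)
            if norm2 == 0.0:
                continue  # an all-zero column keeps coefficient 0
            rho = dot(col, resid) + coef[j] * norm2
            delta = update(rho, norm2) - coef[j]
            if delta != 0.0:
                resid = [r - delta * x for r, x in zip(resid, col)]
                coef[j] += delta
    return coef


def ridge_garrote(cols, y, ridge_penalty, garrote_penalty):
    # Stage 1: ridge, minimizing 0.5*||y - X b||^2 + 0.5*ridge_penalty*||b||^2.
    beta = coordinate_descent(cols, y, lambda rho, n2: rho / (n2 + ridge_penalty))
    # Stage 2: nonnegative garrote on columns rescaled by the ridge estimates,
    # minimizing 0.5*||y - Z c||^2 + garrote_penalty*sum(c) subject to c >= 0.
    scaled = [[b * x for x in col] for b, col in zip(beta, cols)]
    shrink = coordinate_descent(
        scaled, y, lambda rho, n2: max(0.0, (rho - garrote_penalty) / n2)
    )
    # Final coefficients are the ridge estimates times the shrinkage factors;
    # a shrinkage factor of exactly 0 removes the predictor.
    return [c * b for c, b in zip(shrink, beta)], shrink


# Toy data (made up): two orthogonal centered predictors, outcome = 2 * first.
cols = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
coef, shrink = ridge_garrote(cols, y, ridge_penalty=0.0, garrote_penalty=0.0)
print(coef, shrink)
```

In this sketch the garrote penalty controls sparsity: as it grows, more shrinkage factors hit zero and the corresponding predictors drop out, which is the mechanism behind the parsimony the abstract attributes to ridge-garrote.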
-
Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification
Authors:
Thomas Z. Li,
John M. Still,
Kaiwen Xu,
Ho Hin Lee,
Leon Y. Cai,
Aravind R. Krishnan,
Riqiang Gao,
Mirza S. Khan,
Sanja Antic,
Michael Kammer,
Kim L. Sandler,
Fabien Maldonado,
Bennett A. Landman,
Thomas A. Lasko
Abstract:
The accuracy of predictive models for solitary pulmonary nodule (SPN) diagnosis can be greatly increased by incorporating repeat imaging and medical context, such as electronic health records (EHRs). However, clinically routine modalities such as imaging and diagnostic codes can be asynchronous and irregularly sampled over different time scales which are obstacles to longitudinal multimodal learning. In this work, we propose a transformer-based multimodal strategy to integrate repeat imaging with longitudinal clinical signatures from routinely collected EHRs for SPN classification. We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signatures expressions and chest computed tomography (CT) scans. Our classifier is pretrained on 2,668 scans from a public dataset and 1,149 subjects with longitudinal chest CTs, billing codes, medications, and laboratory tests from EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs revealed a significant AUC improvement over a longitudinal multimodal baseline (0.824 vs 0.752 AUC), as well as improvements over a single cross-section multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741 AUC). This work demonstrates significant advantages with a novel approach for co-learning longitudinal imaging and non-imaging phenotypes with transformers. Code available at https://github.com/MASILab/lmsignatures.
Submitted 29 June, 2023; v1 submitted 5 April, 2023;
originally announced April 2023.
-
Phases of methodological research in biostatistics - building the evidence base for new methods
Authors:
Georg Heinze,
Anne-Laure Boulesteix,
Michael Kammer,
Tim P. Morris,
Ian R. White
Abstract:
Although the biostatistical scientific literature publishes new methods at a very high rate, many of these developments are not trustworthy enough to be adopted by the scientific community. We propose a framework to think about how a piece of methodological work contributes to the evidence base for a method. Similarly to the well-known phases of clinical research in drug development, we define four phases of methodological research. These four phases cover (I) providing logical reasoning and proofs, (II) providing empirical evidence, first in a narrow target setting, then (III) in an extended range of settings and for various outcomes, accompanied by appropriate application examples, and (IV) investigations that establish a method as sufficiently well-understood to know when it is preferred over others and when it is not. We provide basic definitions of the four phases but acknowledge that more work is needed to facilitate unambiguous classification of studies into phases. Methodological developments that have undergone all four proposed phases are still rare, but we give two examples with references. Our concept rebalances the emphasis to studies in phase III and IV, i.e., carefully planned methods comparison studies and studies that explore the empirical properties of existing methods in a wider range of problems.
Submitted 27 September, 2022;
originally announced September 2022.
-
A Comparative Study of Confidence Calibration in Deep Learning: From Computer Vision to Medical Imaging
Authors:
Riqiang Gao,
Thomas Li,
Yucheng Tang,
Zhoubing Xu,
Michael Kammer,
Sanja L. Antic,
Kim Sandler,
Fabien Maldonado,
Thomas A. Lasko,
Bennett Landman
Abstract:
Although deep learning prediction models have been successful in the discrimination of different classes, they can often suffer from poor calibration across challenging domains including healthcare. Moreover, the long-tail distribution poses great challenges in deep learning classification problems including clinical disease prediction. There are approaches proposed recently to calibrate deep prediction in computer vision, but there are no studies found to demonstrate how the representative models work in different challenging contexts. In this paper, we bridge the confidence calibration from computer vision to medical imaging with a comparative study of four high-impact calibration models. Our studies are conducted in different contexts (natural image classification and lung cancer risk estimation) including in balanced vs. imbalanced training sets and in computer vision vs. medical imaging. Our results support key findings: (1) We achieve new conclusions which are not studied under different learning contexts, e.g., combining two calibration models that both mitigate the overconfident prediction can lead to under-confident prediction, and simpler calibration models from the computer vision domain tend to be more generalizable to medical imaging. (2) We highlight the gap between general computer vision tasks and medical imaging prediction, e.g., calibration methods ideal for general computer vision tasks may in fact damage the calibration of medical imaging prediction. (3) We also reinforce previous conclusions in natural image classification settings. We believe that this study has merits to guide readers to choose calibration models and understand gaps between general computer vision and medical imaging domains.
Submitted 17 June, 2022;
originally announced June 2022.
-
Deep Multi-path Network Integrating Incomplete Biomarker and Chest CT Data for Evaluating Lung Cancer Risk
Authors:
Riqiang Gao,
Yucheng Tang,
Kaiwen Xu,
Michael N. Kammer,
Sanja L. Antic,
Steve Deppen,
Kim L. Sandler,
Pierre P. Massion,
Yuankai Huo,
Bennett A. Landman
Abstract:
Clinical data elements (CDEs) (e.g., age, smoking history), blood markers and chest computed tomography (CT) structural features have been regarded as effective means for assessing lung cancer risk. These independent variables can provide complementary information and we hypothesize that combining them will improve the prediction accuracy. In practice, not all patients have all these variables available. In this paper, we propose a new network design, termed multi-path multi-modal missing network (M3Net), to integrate multi-modal data (i.e., CDEs, biomarkers and CT images) while accounting for missing modalities with a multi-path neural network. Each path learns discriminative features of one modality, and different modalities are fused in a second stage for an integrated prediction. The network can be trained end-to-end with both medical image features and CDEs/biomarkers, or make a prediction with a single modality. We evaluate M3Net with datasets including three sites from the Consortium for Molecular and Cellular Characterization of Screen-Detected Lesions (MCL) project. Our method is cross-validated within a cohort of 1291 subjects (383 subjects with complete CDEs/biomarkers and CT images), and externally validated with a cohort of 99 subjects (99 with complete CDEs/biomarkers and CT images). Both cross-validation and external-validation results show that combining multiple modalities significantly improves predictive performance over a single modality. The results suggest that including subjects who are missing either CDEs/biomarkers or CT imaging features can contribute to the discriminatory power of our model (p < 0.05, bootstrap two-tailed test). In summary, the proposed M3Net framework provides an effective way to integrate image and non-image data in the context of missing information.
Submitted 9 February, 2021; v1 submitted 19 October, 2020;
originally announced October 2020.
-
Great SCO2T! Rapid tool for carbon sequestration science, engineering, and economics
Authors:
Richard S. Middleton,
Jeffrey M. Bielicki,
Bailian Chen,
Andres F. Clarens,
Robert P. Currier,
Kevin M. Ellett,
Dylan R. Harp,
Brendan A. Hoover,
Ryan M. Kammer,
Dane N. McFarlane,
Jonathan D. Ogland-Hand,
Rajesh J. Pawar,
Philip H. Stauffer,
Hari S. Viswanathan,
Sean P. Yaw
Abstract:
CO2 capture and storage (CCS) technology is likely to be widely deployed in coming decades in response to major climate and economic drivers: CCS is part of every clean energy pathway that limits global warming to 2 °C or less and receives significant CO2 tax credits in the United States. These drivers are likely to stimulate capture, transport, and storage of hundreds of millions or billions of tonnes of CO2 annually. A key part of the CCS puzzle will be identifying and characterizing suitable storage sites for vast amounts of CO2. We introduce a new software tool called SCO2T (Sequestration of CO2 Tool, pronounced "Scott") to rapidly characterize saline storage reservoirs. The tool is designed to rapidly screen hundreds of thousands of reservoirs, perform sensitivity and uncertainty analyses, and link sequestration engineering (injection rates, reservoir capacities, plume dimensions) to sequestration economics (costs constructed from around 70 separate economic inputs). We describe the novel science developments supporting SCO2T including a new approach to estimating CO2 injection rates and CO2 plume dimensions as well as key advances linking sequestration engineering with economics. Next, we perform a sensitivity and uncertainty analysis of geology combinations (including formation depth, thickness, permeability, porosity, and temperature) to understand the impact on carbon sequestration. Through the sensitivity analysis we show that increasing depth and permeability can both lead to increased CO2 injection rates, increased storage potential, and reduced costs, while increasing porosity reduces costs without impacting the injection rate (CO2 is injected at a constant pressure in all cases) by increasing the reservoir capacity.
Submitted 27 May, 2020;
originally announced May 2020.
-
Evaluating methods for Lasso selective inference in biomedical research by a comparative simulation study
Authors:
Michael Kammer,
Daniela Dunkler,
Stefan Michiels,
Georg Heinze
Abstract:
Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical statistical frequentist theory which assumes a fixed set of covariates in the model. We review two interpretations of inference after selection: the full model view, in which the parameters of interest are those of the full model on all predictors, and then focus on the submodel view, in which the parameters of interest are those of the selected model only. In the context of L1-penalized regression we compare proposals for submodel inference (selective inference) via confidence intervals available to applied researchers via software packages using a simulation study inspired by real data commonly seen in biomedical studies. Furthermore, we present an exemplary application of these methods to a publicly available dataset to discuss their practical usability. Our findings indicate that the frequentist properties of selective confidence intervals are generally acceptable, but desired coverage levels are not guaranteed in all scenarios except for the most conservative methods. The choice of inference method potentially has a large impact on the resulting interval estimates, thereby necessitating that the user is acutely aware of the goal of inference in order to interpret and communicate the results. Currently available software packages are not yet very user friendly or robust which might affect their use in practice. In summary, we find submodel inference after selection useful for experienced statisticians to assess the importance of individual selected predictors in future applications.
Submitted 20 July, 2021; v1 submitted 15 May, 2020;
originally announced May 2020.