Search | arXiv e-print repository

Exploring the validity of the complete case analysis for regression models with a right-censored covariate

Authors: Marissa C. Ashner, Tanya P. Garcia

Abstract: Despite its drawbacks, the complete case analysis is commonly used in regression models with missing covariates. Understanding when implementing complete cases will lead to consistent parameter estimation is vital before use. Here, our aim is to demonstrate when a complete case analysis is appropriate for a nuanced type of missing covariate, the randomly right-censored covariate. Across the censored covariate literature, different assumptions are made to ensure a complete case analysis produces a consistent estimator, which leads to confusion in practice. We make several contributions to dispel this confusion. First, we summarize the language surrounding the assumptions that lead to a consistent complete case estimator. Then, we show a unidirectional hierarchical relationship between these assumptions, which leads us to one sufficient assumption to consider before using a complete case analysis. Lastly, we conduct a simulation study to illustrate the performance of a complete case analysis with a right-censored covariate under different censoring mechanism assumptions, and we demonstrate its use with a Huntington disease data example.

Submitted 28 March, 2023; originally announced March 2023.

arXiv:2303.01602

Mission Imputable: Correcting for Berkson Error When Imputing a Censored Covariate

Authors: Kyle F. Grosser, Sarah C. Lotspeich, Tanya P. Garcia

Abstract: To select outcomes for clinical trials testing experimental therapies for Huntington disease, a fatal neurodegenerative disorder, analysts model how potential outcomes change over time. Yet, subjects with Huntington disease are often observed at different levels of disease progression. To account for these differences, analysts include time to clinical diagnosis as a covariate when modeling potential outcomes, but this covariate is often censored. One popular solution is imputation, whereby we impute censored values using predictions from a model of the censored covariate given other data, then analyze the imputed dataset. However, when this imputation model is misspecified, our outcome model estimates can be biased. To address this problem, we developed a novel method, dubbed "ACE imputation." First, we model imputed values as error-prone versions of the true covariate values. Then, we correct for these errors using semiparametric theory. Specifically, we derive an outcome model estimator that is consistent, even when the censored covariate is imputed using a misspecified imputation model. Simulation results show that ACE imputation remains empirically unbiased even if the imputation model is misspecified, unlike multiple imputation which yields >100% bias. Applying our method to a Huntington disease study pinpoints outcomes for clinical trials aimed at slowing disease progression.

Submitted 2 March, 2023; originally announced March 2023.

Comments: The main text consists of 35 pages, including 1 figure and 3 tables. The supplement consists of 29 pages, including 1 figure and 3 tables

arXiv:2209.04716

Extrapolation before imputation reduces bias when imputing censored covariates

Authors: Sarah C. Lotspeich, Tanya P. Garcia

Abstract: Modeling symptom progression to identify informative subjects for a new Huntington's disease clinical trial is problematic since time to diagnosis, a key covariate, can be heavily censored. Imputation is an appealing strategy where censored covariates are replaced with their conditional means, but existing methods saw over 200% bias under heavy censoring. Calculating these conditional means well requires estimating and then integrating over the survival function of the censored covariate from the censored value to infinity. To estimate the survival function flexibly, existing methods use the semiparametric Cox model with Breslow's estimator, leaving the integrand for the conditional means (the estimated survival function) undefined beyond the observed data. The integral is then estimated up to the largest observed covariate value, and this approximation can cut off the tail of the survival function and lead to severe bias, particularly under heavy censoring. We propose a hybrid approach that splices together the semiparametric survival estimator with a parametric extension, making it possible to approximate the integral up to infinity. In simulation studies, our proposed approach of extrapolation then imputation substantially reduces the bias seen with existing imputation methods, even when the parametric extension was misspecified. We further demonstrate how imputing with corrected conditional means helps to prioritize patients for future clinical trials.

Submitted 29 November, 2023; v1 submitted 10 September, 2022; originally announced September 2022.

Comments: 16 pages main text (incl. 2 tables and 3 figures); Supplemental Materials, R code, and R package available on GitHub (linked in main text)

MSC Class: 62J05

arXiv:2109.11989

doi 10.1002/bimj.202100250

Correcting Conditional Mean Imputation for Censored Covariates and Improving Usability

Authors: Sarah C. Lotspeich, Kyle F. Grosser, Tanya P. Garcia

Abstract: Analysts are often confronted with censoring, wherein some variables are not observed at their true value, but rather at a value that is known to fall above or below that truth. While much attention has been given to the analysis of censored outcomes, contemporary focus has shifted to censored covariates, as well. Missing data is often overcome using multiple imputation, which leverages the entire dataset by replacing missing values with informed placeholders, and this method can be modified for censored data by also incorporating partial information from censored values. One such modification involves replacing censored covariates with their conditional means given other fully observed information, such as the censored value or additional covariates. So-called conditional mean imputation approaches were proposed for censored covariates in Atem et al. [2017], Atem et al. [2019a], and Atem et al. [2019b]. These methods are robust to additional parametric assumptions on the censored covariate and utilize all available data, which is appealing. As we worked to implement these methods, however, we discovered that these three manuscripts provide nonequivalent formulas and, in fact, none is the correct formula for the conditional mean. Herein, we derive the correct form of the conditional mean and demonstrate the impact of the incorrect formulas on the imputed values and statistical inference. Under several settings considered, using an incorrect formula is seen to seriously bias parameter estimation in simple linear regression. Lastly, we provide user-friendly R software, the imputeCensoRd package, to enable future researchers to tackle censored covariates in their data.

Submitted 24 September, 2021; originally announced September 2021.

Comments: 8 pages, 2 figures

Journal ref: Biometrical Journal, vol. 64, pp. 858-862, 2022

arXiv:2007.06076

svReg: Structural Varying-coefficient regression to differentiate how regional brain atrophy affects motor impairment for Huntington disease severity groups

Authors: Rakheon Kim, Samuel Mueller, Tanya P. Garcia

Abstract: For Huntington disease, identification of brain regions related to motor impairment can be useful for developing interventions to alleviate the motor symptom, the major symptom of the disease. However, the effects from the brain regions to motor impairment may vary for different groups of patients. Hence, our interest is not only to identify the brain regions but also to understand how their effects on motor impairment differ by patient groups. This can be cast as a model selection problem for a varying-coefficient regression. However, this is challenging when there is a pre-specified group structure among variables. We propose a novel variable selection method for a varying-coefficient regression with such structured variables. Our method is empirically shown to select relevant variables consistently. Also, our method screens irrelevant variables better than existing methods. Hence, our method leads to a model with higher sensitivity, lower false discovery rate and higher prediction accuracy than the existing methods. Finally, we found that the effects from the brain regions to motor impairment differ by disease severity of the patients. To the best of our knowledge, our study is the first to identify such interaction effects between the disease severity and brain regions, which indicates the need for customized intervention by disease severity.

Submitted 12 July, 2020; originally announced July 2020.

arXiv:1407.8412

doi 10.1214/14-AOAS730

Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint

Authors: Jing Qin, Tanya P. Garcia, Yanyuan Ma, Ming-Xin Tang, Karen Marder, Yuanjia Wang

Abstract: In certain genetic studies, clinicians and genetic counselors are interested in estimating the cumulative risk of a disease for individuals with and without a rare deleterious mutation. Estimating the cumulative risk is difficult, however, when the estimates are based on family history data. Often, the genetic mutation status in many family members is unknown; instead, only estimated probabilities of a patient having a certain mutation status are available. Also, ages of disease-onset are subject to right censoring. Existing methods to estimate the cumulative risk using such family-based data only provide estimation at individual time points, and are not guaranteed to be monotonic or nonnegative. In this paper, we develop a novel method that combines Expectation-Maximization and isotonic regression to estimate the cumulative risk across the entire support. Our estimator is monotonic, satisfies self-consistent estimating equations and has high power in detecting differences between the cumulative risks of different populations. Application of our estimator to a Parkinson's disease (PD) study provides the age-at-onset distribution of PD in PARK2 mutation carriers and noncarriers, and reveals a significant difference between the distribution in compound heterozygous carriers compared to noncarriers, but not between heterozygous carriers and noncarriers.

Submitted 31 July, 2014; originally announced July 2014.

Comments: Published at http://dx.doi.org/10.1214/14-AOAS730 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS730

Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 2, 1182-1208

Showing 1–6 of 6 results for author: Garcia, T P