-
Polytomous Explanatory Item Response Models for Item Discrimination: Assessing Negative-Framing Effects in Social-Emotional Learning Surveys
Authors:
Joshua B. Gilbert,
Li** Zhang,
Esther Ulitzsch,
Benjamin W. Domingue
Abstract:
Modeling item parameters as a function of item characteristics has a long history but has generally focused on models for item location. Explanatory item response models for item discrimination are available but rarely used. In this study, we extend existing approaches for modeling item discrimination from dichotomous to polytomous item responses. We illustrate our proposed approach with an application to four social-emotional learning surveys of preschool children to investigate how item discrimination depends on whether an item is positively or negatively framed. Negative framing predicts significantly lower item discrimination on two of the four surveys, and a plausibly causal estimate from a regression discontinuity analysis shows that negative framing reduces discrimination by about 30% on one survey. We conclude with a discussion of potential applications of explanatory models for item discrimination.
Submitted 7 June, 2024;
originally announced June 2024.
-
Estimating Heterogeneous Treatment Effects with Item-Level Outcome Data: Insights from Item Response Theory
Authors:
Joshua B. Gilbert,
Zachary Himmelsbach,
James Soland,
Mridul Joshi,
Benjamin W. Domingue
Abstract:
Analyses of heterogeneous treatment effects (HTE) are common in applied causal inference research. However, when outcomes are latent variables assessed via psychometric instruments such as educational tests, standard methods ignore the potential HTE that may exist among the individual items of the outcome measure. Failing to account for "item-level" HTE (IL-HTE) can lead to both estimated standard errors that are too small and identification challenges in the estimation of treatment-by-covariate interaction effects. We demonstrate how Item Response Theory (IRT) models that estimate a treatment effect for each assessment item can both address these challenges and provide new insights into HTE generally. This study articulates the theoretical rationale for the IL-HTE model and demonstrates its practical value using data from 20 randomized controlled trials containing 2.3 million item responses in economics, education, and health research. Our results show that the IL-HTE model reveals item-level variation masked by average treatment effects, provides more accurate statistical inference, allows for estimates of the generalizability of causal effects, resolves identification problems in the estimation of interaction effects, and provides estimates of standardized treatment effect sizes corrected for attenuation due to measurement error.
Submitted 20 May, 2024; v1 submitted 30 April, 2024;
originally announced May 2024.
-
Identification and estimation of mediational effects of longitudinal modified treatment policies
Authors:
Brian Gilbert,
Katherine L. Hoffman,
Nicholas Williams,
Kara E. Rudolph,
Edward J. Schenck,
Iván Díaz
Abstract:
We demonstrate a comprehensive semiparametric approach to causal mediation analysis, addressing the complexities inherent in settings with longitudinal and continuous treatments, confounders, and mediators. Our methodology utilizes a nonparametric structural equation model and a cross-fitted sequential regression technique based on doubly robust pseudo-outcomes, yielding an efficient, asymptotically normal estimator without relying on restrictive parametric modeling assumptions. We are motivated by a recent scientific controversy regarding the effects of invasive mechanical ventilation (IMV) on the survival of COVID-19 patients, considering acute kidney injury (AKI) as a mediating factor. We highlight the possibility of "inconsistent mediation," in which the direct and indirect effects of the exposure operate in opposite directions. We discuss the significance of mediation analysis for scientific understanding and its potential utility in treatment decisions.
Submitted 30 June, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Item-Level Heterogeneous Treatment Effects of Selective Serotonin Reuptake Inhibitors (SSRIs) on Depression: Implications for Inference, Generalizability, and Identification
Authors:
Joshua B. Gilbert,
Fredrik Hieronymus,
Elias Eriksson,
Benjamin W. Domingue
Abstract:
In analysis of randomized controlled trials (RCTs) with patient-reported outcome measures (PROMs), Item Response Theory (IRT) models that allow for heterogeneity in the treatment effect at the item level merit consideration. These models for "item-level heterogeneous treatment effects" (IL-HTE) can provide more accurate statistical inference, allow researchers to better generalize their results, and resolve critical identification problems in the estimation of interaction effects. In this study, we extend the IL-HTE model to polytomous data and apply the model to determine how the effect of selective serotonin reuptake inhibitors (SSRIs) on depression varies across the items on a depression rating scale. We first conduct a Monte Carlo simulation study to assess the performance of the polytomous IL-HTE model under a range of conditions. We then apply the IL-HTE model to item-level data from 28 RCTs measuring the effect of SSRIs on depression using the 17-item Hamilton Depression Rating Scale (HDRS-17) and estimate potential heterogeneity by subscale (HDRS-6). Our results show that the IL-HTE model provides more accurate statistical inference, allows for generalizability of results to out-of-sample items, and resolves identification problems in the estimation of interaction effects. Our empirical application shows that while the average effect of SSRIs on depression is beneficial (i.e., negative) and statistically significant, there is substantial IL-HTE, with estimates of the standard deviation of item-level effects nearly as large as the average effect. We show that this substantial IL-HTE is driven primarily by systematically larger effects on the HDRS-6 subscale items. The IL-HTE model has the potential to provide new insights for the inference, generalizability, and identification of treatment effects in clinical trials using patient-reported outcome measures.
Submitted 24 May, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Nonparametric variable importance for time-to-event outcomes with application to prediction of HIV infection
Authors:
Charles J. Wolock,
Peter B. Gilbert,
Noah Simon,
Marco Carone
Abstract:
In survival analysis, complex machine learning algorithms have been increasingly used for predictive modeling. Given a collection of features available for inclusion in a predictive model, it may be of interest to quantify the relative importance of a subset of features for the prediction task at hand. In particular, in HIV vaccine trials, participant baseline characteristics are used to predict the probability of infection over the intended follow-up period, and investigators may wish to understand how much certain types of predictors, such as behavioral factors, contribute toward overall predictiveness. Time-to-event outcomes such as time to infection are often subject to right censoring, and existing methods for assessing variable importance are typically not intended to be used in this setting. We describe a broad class of algorithm-agnostic variable importance measures for prediction in the context of survival data. We propose a nonparametric efficient estimation procedure that incorporates flexible learning of nuisance parameters, yields asymptotically valid inference, and enjoys double-robustness. We assess the performance of our proposed procedure via numerical simulations and analyze data from the HVTN 702 study to inform enrollment strategies for future HIV vaccine trials.
Submitted 11 December, 2023; v1 submitted 21 November, 2023;
originally announced November 2023.
-
Data fusion using weakly aligned sources
Authors:
Sijia Li,
Peter B. Gilbert,
Alex Luedtke
Abstract:
We introduce a new data fusion method that utilizes multiple data sources to estimate a smooth, finite-dimensional parameter. Most existing methods only make use of fully aligned data sources that share common conditional distributions of one or more variables of interest. However, in many settings, the scarcity of fully aligned sources can make existing methods require unduly large sample sizes to be useful. Our approach enables the incorporation of weakly aligned data sources that are not perfectly aligned, provided their degree of misalignment can be characterized by a prespecified density ratio model. We describe gains in efficiency and provide a general means to construct estimators achieving these gains. We illustrate our results by fusing data from two harmonized HIV monoclonal antibody prevention efficacy trials to study how a neutralizing antibody biomarker associates with HIV genotype.
Submitted 28 August, 2023;
originally announced August 2023.
-
Consistency of common spatial estimators under spatial confounding
Authors:
Brian Gilbert,
Elizabeth L. Ogburn,
Abhirup Datta
Abstract:
This paper addresses the asymptotic performance of popular spatial regression estimators on the task of estimating the linear effect of an exposure on an outcome under "spatial confounding" -- the presence of an unmeasured spatially-structured variable influencing both the exposure and the outcome. The existing literature on spatial confounding is informal and inconsistent; this paper is an attempt to bring clarity through rigorous results on the asymptotic bias and consistency of estimators from popular spatial regression models. We consider two data generation processes: one where the confounder is a fixed function of space and one where it is a random function (i.e., a stochastic process on the spatial domain). We first show that the estimators from ordinary least squares (OLS) and restricted spatial regression are asymptotically biased under spatial confounding. We then prove a novel main result on the consistency of the generalized least squares (GLS) estimator using a Gaussian process (GP) covariance matrix in the presence of spatial confounding under in-fill (fixed domain) asymptotics. The result holds under very general conditions -- for any exposure with some non-spatial variation (noise), for any spatially continuous confounder, using any choice of Matérn or squared exponential Gaussian process covariance used to construct the GLS estimator, and without requiring Gaussianity of errors. Finally, we prove that spatial estimators from GLS, GP regression, and spline models that are consistent under confounding by a fixed function will also be consistent under confounding by a random function. We conclude that, contrary to much of the literature on spatial confounding, traditional spatial estimators are capable of estimating linear exposure effects under spatial confounding in the presence of some noise in the exposure. We support our theoretical arguments with simulation studies.
Submitted 23 April, 2024; v1 submitted 23 August, 2023;
originally announced August 2023.
-
Learning sources of variability from high-dimensional observational studies
Authors:
Eric W. Bridgeford,
Jaewon Chung,
Brian Gilbert,
Sambit Panda,
Adam Li,
Cencheng Shen,
Alexandra Badea,
Brian Caffo,
Joshua T. Vogelstein
Abstract:
Causal inference studies whether the presence of a variable influences an observed outcome. As measured by quantities such as the "average treatment effect," this paradigm is employed across numerous biological fields, from vaccine and drug development to policy interventions. Unfortunately, most of these methods are limited to univariate outcomes. Our work generalizes causal estimands to outcomes with any number of dimensions or any measurable space, and formulates traditional causal estimands for nominal variables as causal discrepancy tests. We propose a simple technique for adjusting universally consistent conditional independence tests and prove that these tests are universally consistent causal discrepancy tests. Numerical experiments illustrate that our method, Causal CDcorr, leads to improvements in both finite sample validity and power when compared to existing strategies. Our methods are all open source and available at github.com/ebridge2/cdcorr.
Submitted 28 November, 2023; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Visibility graph-based covariance functions for scalable spatial analysis in non-convex domains
Authors:
Brian Gilbert,
Abhirup Datta
Abstract:
We present a new method for constructing valid covariance functions of Gaussian processes for spatial analysis in irregular, non-convex domains such as bodies of water. Standard covariance functions based on geodesic distances are not guaranteed to be positive definite on such domains, while existing non-Euclidean approaches fail to respect the partially Euclidean nature of these domains where the geodesic distance agrees with the Euclidean distances for some pairs of points. Using a visibility graph on the domain, we propose a class of covariance functions that preserve Euclidean-based covariances between points that are connected in the domain while incorporating the non-convex geometry of the domain via conditional independence relationships. We show that the proposed method preserves the partially Euclidean nature of the intrinsic geometry on the domain while maintaining validity (positive definiteness) and marginal stationarity of the covariance function over the entire parameter space, properties which are not always fulfilled by existing approaches to construct covariance functions on non-convex domains. We provide useful approximations to improve computational efficiency, resulting in a scalable algorithm. We compare the performance of our method with those of competing state-of-the-art methods using simulation studies on synthetic non-convex domains. The method is applied to data regarding acidity levels in the Chesapeake Bay, showing its potential for ecological monitoring in real-world spatial applications on irregular domains.
Submitted 29 May, 2024; v1 submitted 21 July, 2023;
originally announced July 2023.
-
Semiparametric inference for relative heterogeneous vaccine efficacy between strains in observational case-only studies
Authors:
Lars van der Laan,
Peter B. Gilbert
Abstract:
The aim of this manuscript is to explore semiparametric methods for inferring subgroup-specific relative vaccine efficacy in a partially vaccinated population against multiple strains of a virus. We consider methods for observational case-only studies with informative missingness in viral strain type due to vaccination status, pre-vaccination variables, and also post-vaccination factors such as viral load. We establish general causal conditions under which the relative conditional vaccine efficacy between strains can be identified nonparametrically from the observed data-generating distribution. Assuming that the relative strain-specific conditional vaccine efficacy has a known parametric form, we propose semiparametric asymptotically linear estimators of the parameters based on targeted (debiased) machine learning estimators for partially linear logistic regression models. Finally, we apply our methods to estimate the relative strain-specific conditional vaccine efficacy in the ENSEMBLE COVID-19 vaccine trial.
Submitted 20 March, 2023;
originally announced March 2023.
-
A framework for leveraging machine learning tools to estimate personalized survival curves
Authors:
Charles J. Wolock,
Peter B. Gilbert,
Noah Simon,
Marco Carone
Abstract:
The conditional survival function of a time-to-event outcome subject to censoring and truncation is a common target of estimation in survival analysis. This parameter may be of scientific interest and also often appears as a nuisance in nonparametric and semiparametric problems. In addition to classical parametric and semiparametric methods (e.g., based on the Cox proportional hazards model), flexible machine learning approaches have been developed to estimate the conditional survival function. However, many of these methods are either implicitly or explicitly targeted toward risk stratification rather than overall survival function estimation. Others apply only to discrete-time settings or require inverse probability of censoring weights, which can be as difficult to estimate as the outcome survival function itself. Here, we employ a decomposition of the conditional survival function in terms of observable regression models in which censoring and truncation play no role. This allows application of an array of flexible regression and classification methods rather than only approaches that explicitly handle the complexities inherent to survival data. We outline estimation procedures based on this decomposition, empirically assess their performance, and demonstrate their use on data from an HIV vaccine trial.
Submitted 31 October, 2023; v1 submitted 6 November, 2022;
originally announced November 2022.
-
Estimation and Hypothesis Testing of Strain-Specific Vaccine Efficacy with Missing Strain Types, with Applications to a COVID-19 Vaccine Trial
Authors:
Fei Heng,
Yanqing Sun,
Peter B. Gilbert
Abstract:
Statistical methods are developed for analysis of clinical and virus genetics data from phase 3 randomized, placebo-controlled trials of vaccines against novel coronavirus COVID-19. Vaccine efficacy (VE) of a vaccine to prevent COVID-19 caused by one of finitely many genetic strains of SARS-CoV-2 may vary by strain. The problem of assessing differential VE by viral genetics can be formulated under a competing risks model where the endpoint is virologically confirmed COVID-19 and the cause-of-failure is the infecting SARS-CoV-2 genotype. Strain-specific VE is defined as one minus the cause-specific hazard ratio (vaccine/placebo). For the COVID-19 VE trials, the time to COVID-19 is right-censored, and a substantial percentage of failure cases are missing the infecting virus genotype. We develop estimation and hypothesis testing procedures for strain-specific VE when the failure time is subject to right censoring and the cause-of-failure is subject to missingness, focusing on $J \ge 2$ discrete categorical unordered or ordered virus genotypes. The stratified Cox proportional hazards model is used to relate the cause-specific outcomes to explanatory variables. The inverse probability weighted complete-case (IPW) estimator and the augmented inverse probability weighted complete-case (AIPW) estimator are investigated. Hypothesis tests are developed to assess whether the vaccine provides at least a specified level of efficacy against some viral genotypes and whether VE varies across genotypes, adjusting for covariates. The finite-sample properties of the proposed tests are studied through simulations, and the tests are shown to perform well. In preparation for the real data analyses, the developed methods are applied to a pseudo dataset mimicking the Moderna COVE trial.
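Note (not part of the abstract): the verbal definition of strain-specific VE can be written out as a formula. The symbols $\lambda_j$ and $\beta_j$ are notation assumed here for illustration; the abstract does not give them. For strain $j = 1, \dots, J$,

$$\mathrm{VE}_j(t) = 1 - \frac{\lambda_j(t \mid \text{vaccine})}{\lambda_j(t \mid \text{placebo})},$$

where $\lambda_j(t \mid a)$ denotes the cause-specific hazard of COVID-19 with strain $j$ in trial arm $a$. If that hazard ratio is constant in time, as it would be under a proportional hazards specification with log hazard ratio $\beta_j$ for vaccine assignment, this reduces to $\mathrm{VE}_j = 1 - \exp(\beta_j)$.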
Submitted 21 January, 2022;
originally announced January 2022.
-
A causal inference framework for spatial confounding
Authors:
Brian Gilbert,
Abhirup Datta,
Joan A. Casey,
Elizabeth L. Ogburn
Abstract:
Recently, addressing spatial confounding has become a major topic in spatial statistics. However, the literature has provided conflicting definitions, and many proposed definitions do not address the issue of confounding as it is understood in causal inference. We define spatial confounding as the existence of an unmeasured causal confounder with a spatial structure. We present a causal inference framework for nonparametric identification of the causal effect of a continuous exposure on an outcome in the presence of spatial confounding. We propose double machine learning (DML), a procedure in which flexible models are used to regress both the exposure and outcome variables on confounders to arrive at a causal estimator with favorable robustness properties and convergence rates, and we prove that this approach is consistent and asymptotically normal under spatial dependence. As far as we are aware, this is the first approach to spatial confounding that does not rely on restrictive parametric assumptions (such as linearity, effect homogeneity, or Gaussianity) for both identification and estimation. We demonstrate the advantages of the DML approach analytically and in simulations. We apply our methods and reasoning to a study of the effect of fine particulate matter exposure during pregnancy on birthweight in California.
Submitted 17 June, 2024; v1 submitted 30 December, 2021;
originally announced December 2021.
-
Efficient nonparametric estimation of the covariate-adjusted threshold-response function, a support-restricted stochastic intervention
Authors:
Lars van der Laan,
Wenbo Zhang,
Peter B. Gilbert
Abstract:
Identifying a biomarker or treatment-dose threshold that marks a specified level of risk is an important problem, especially in clinical trials. This risk, viewed as a function of thresholds and possibly adjusted for covariates, we call the threshold-response function. Extending the work of Donovan, Hudgens and Gilbert (2019), we propose a nonparametric efficient estimator for the covariate-adjusted threshold-response function, which utilizes machine learning and Targeted Minimum-Loss Estimation (TMLE). We additionally propose a more general estimator, based on sequential regression, that also applies when there is outcome missingness. We show that the threshold-response for a given threshold may be viewed as the expected outcome under a stochastic intervention where all participants are given a treatment dose above the threshold. We prove the estimator is efficient and characterize its asymptotic distribution. A method to construct simultaneous 95% confidence bands for the threshold-response function and its inverse is given. Furthermore, we discuss how to adjust our estimator when the treatment or biomarker is missing-at-random, as is the case in clinical trials with biased sampling designs, using inverse-probability-weighting. The methods are assessed in a diverse set of simulation settings with rare outcomes and cumulative case-control sampling. The methods are employed to estimate neutralizing antibody thresholds for virologically confirmed dengue risk in the CYD14 and CYD15 dengue vaccine trials.
Submitted 2 March, 2023; v1 submitted 23 July, 2021;
originally announced July 2021.
-
Assessment of Immune Correlates of Protection via Controlled Vaccine Efficacy and Controlled Risk
Authors:
Peter B. Gilbert,
Youyi Fong,
Marco Carone
Abstract:
Immune correlates of protection (CoPs) are immunologic biomarkers accepted as a surrogate for an infectious disease clinical endpoint and thus can be used for traditional or provisional vaccine approval. To study CoPs in randomized, placebo-controlled trials, correlates of risk (CoRs) are first assessed in vaccine recipients. This analysis does not assess causation, as a CoR may fail to be a CoP. We propose a causal CoP analysis that estimates the controlled vaccine efficacy curve across biomarker levels $s$, $CVE(s)$, equal to one minus the ratio of the controlled-risk curve $r_C(s)$ at $s$ and placebo risk, where $r_C(s)$ is causal risk if all participants are assigned vaccine and the biomarker is set to $s$. The criterion for a useful CoP is wide variability of $CVE(s)$ in $s$. Moreover, estimation of $r_C(s)$ is of interest in itself, especially in studies without a placebo arm. For estimation of $r_C(s)$, measured confounders can be adjusted for by any regression method that accommodates missing biomarkers, to which we add sensitivity analysis to quantify robustness of CoP evidence to unmeasured confounding. Application to two harmonized phase 3 trials supports that 50% neutralizing antibody titer has value as a controlled vaccine efficacy CoP for virologically confirmed dengue (VCD): in CYD14 the point estimate (95% confidence interval) for $CVE(s)$ accounting for measured confounders and building in conservative margin for unmeasured confounding increases from 29.6% (95% CI 3.5 to 45.9) at titer 1:36 to 78.5% (95% CI 67.9 to 86.8) at titer 1:1200; these estimates are 17.4% (95% CI -14.4 to 36.5) and 84.5% (95% CI 79.6 to 89.1) for CYD15.
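Note (not part of the abstract): in symbols, writing $r_P$ for the placebo risk (a symbol assumed here, since the abstract defines the quantity only in words), the controlled vaccine efficacy curve is

$$CVE(s) = 1 - \frac{r_C(s)}{r_P}.$$

As a purely hypothetical illustration, not taken from the trials: if $r_C(s) = 0.01$ and $r_P = 0.05$, then $CVE(s) = 1 - 0.01/0.05 = 0.80$, that is, 80% controlled vaccine efficacy at biomarker level $s$.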
Submitted 12 July, 2021;
originally announced July 2021.
-
A general framework for inference on algorithm-agnostic variable importance
Authors:
Brian D. Williamson,
Peter B. Gilbert,
Noah R. Simon,
Marco Carone
Abstract:
In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response -- in other words, to gauge the variable importance of features. Most recent work on variable importance assessment has focused on describing the importance of features within the confines of a given prediction algorithm. However, such assessment does not necessarily characterize the prediction potential of features, and may provide a misleading reflection of the intrinsic value of these features. To address this limitation, we propose a general framework for nonparametric inference on interpretable algorithm-agnostic variable importance. We define variable importance as a population-level contrast between the oracle predictiveness of all available features versus all features except those under consideration. We propose a nonparametric efficient estimation procedure that allows the construction of valid confidence intervals, even when machine learning techniques are used. We also outline a valid strategy for testing the null importance hypothesis. Through simulations, we show that our proposal has good operating characteristics, and we illustrate its use with data from a study of an antibody against HIV-1 infection.
Submitted 13 September, 2021; v1 submitted 7 April, 2020;
originally announced April 2020.
-
Efficient nonparametric inference on the effects of stochastic interventions under two-phase sampling, with applications to vaccine efficacy trials
Authors:
Nima S. Hejazi,
Mark J. van der Laan,
Holly E. Janes,
Peter B. Gilbert,
David C. Benkeser
Abstract:
The advent and subsequent widespread availability of preventive vaccines has altered the course of public health over the past century. Despite this success, effective vaccines to prevent many high-burden diseases, including HIV, have been slow to develop. Vaccine development can be aided by the identification of immune response markers that serve as effective surrogates for clinically significant infection or disease endpoints. However, measuring immune response marker activity is often costly, which has motivated the usage of two-phase sampling for immune response evaluation in clinical trials of preventive vaccines. In such trials, the measurement of immunological markers is performed on a subset of trial participants, where enrollment in this second phase is potentially contingent on the observed study outcome and other participant-level information. We propose nonparametric methodology for efficiently estimating a counterfactual parameter that quantifies the impact of a given immune response marker on the subsequent probability of infection. Along the way, we fill in theoretical gaps pertaining to the asymptotic behavior of nonparametric efficient estimators in the context of two-phase sampling, including a multiple robustness property enjoyed by our estimators. Techniques for constructing confidence intervals and hypothesis tests are presented, and an open source software implementation of the methodology, the txshift R package, is introduced. We illustrate the proposed techniques using data from a recent preventive HIV vaccine efficacy trial.
Submitted 3 April, 2020; v1 submitted 30 March, 2020;
originally announced March 2020.
-
Ongoing Vaccine and Monoclonal Antibody HIV Prevention Efficacy Trials and Considerations for Sequel Efficacy Trial Designs
Authors:
Peter B. Gilbert
Abstract:
Four randomized placebo-controlled efficacy trials of a candidate vaccine or passively infused monoclonal antibody for prevention of HIV-1 infection are underway (HVTN 702 in South African men and women; HVTN 705 in sub-Saharan African women; HVTN 703/HPTN 081 in sub-Saharan African women; HVTN 704/HPTN 085 in U.S., Peruvian, Brazilian, and Swiss men or transgender persons who have sex with men). Several challenges are posed to the optimal design of the sequel efficacy trials, including: (1) how to account for the evolving mosaic of effective prevention interventions that may be part of the trial design or standard of prevention; (2) how to define viable and optimal sequel trial designs depending on the primary efficacy results and secondary 'correlates of protection' results of each of the ongoing trials; and (3) how to define the primary objective of sequel efficacy trials if HIV-1 incidence is expected to be very low in all study arms such that a standard trial design has a steep opportunity cost. After summarizing the ongoing trials, I discuss statistical science considerations for sequel efficacy trial designs, both generally and specifically to each trial listed above. One conclusion is that the results of 'correlates of protection' analyses, which ascertain how different host immunological markers and HIV-1 viral features impact HIV-1 risk and prevention efficacy, have an important influence on sequel trial design. This influence is especially relevant for the monoclonal antibody trials because of the focused pre-trial hypothesis that potency and coverage of serum neutralization constitutes a surrogate endpoint for HIV-1 infection... (see manuscript for the full abstract)
Submitted 19 June, 2019;
originally announced June 2019.
-
Post-randomization Biomarker Effect Modification in an HIV Vaccine Clinical Trial
Authors:
Peter B. Gilbert,
Bryan S. Blette,
Bryan E. Shepherd,
Michael G. Hudgens
Abstract:
While the HVTN 505 trial showed no overall efficacy of the tested vaccine to prevent HIV infection over placebo, previous studies, biological theories, and the finding that immune response markers strongly correlated with infection in vaccine recipients generated the hypothesis that a qualitative interaction occurred. This hypothesis can be assessed with statistical methods for studying treatment effect modification by an intermediate response variable (i.e., principal stratification effect modification (PSEM) methods). However, available PSEM methods make untestable structural risk assumptions, such that assumption-lean versions of PSEM methods are needed in order to surpass the high bar of evidence to demonstrate a qualitative interaction. Fortunately, the survivor average causal effect (SACE) literature is replete with assumption-lean methods that can be readily adapted to the PSEM application for the special case of a binary intermediate response variable. We map this adaptation, opening up a host of new PSEM methods for a binary intermediate variable measured via two-phase sampling, for a dichotomous or failure time final outcome and including or excluding the SACE monotonicity assumption. The new methods support that the vaccine partially protected vaccine recipients with a high polyfunctional CD8+ T cell response, an important new insight for the HIV vaccine field.
Submitted 9 November, 2018;
originally announced November 2018.
-
Pharmacokinetics Simulations for Studying Correlates of Prevention Efficacy of Passive HIV-1 Antibody Prophylaxis in the Antibody Mediated Prevention (AMP) Study
Authors:
Lily Zhang,
Peter B. Gilbert,
Edmund Capparelli,
Yunda Huang
Abstract:
A key objective in two phase 2b AMP clinical trials of VRC01 is to evaluate whether drug concentration over time, as estimated by non-linear mixed effects pharmacokinetics (PK) models, is associated with HIV infection rate. We conducted a simulation study of marker sampling designs, and evaluated the effect of study adherence and sub-cohort sample size on PK model estimates in multiple-dose studies. With m=120, even under low adherence (about half of study visits missing per participant), reasonably unbiased and consistent estimates of most fixed and random effect terms were obtained. Coarsened marker sampling schedules were also studied.
Submitted 25 January, 2018;
originally announced January 2018.
-
Generating survival times using Cox proportional hazards models with cyclic time-varying covariates, with application to a multiple-dose monoclonal antibody clinical trial
Authors:
Yunda Huang,
Yuanyuan Zhang,
Zong Zhang,
Peter B. Gilbert
Abstract:
In two harmonized efficacy studies to prevent HIV infection through multiple infusions of the monoclonal antibody VRC01, a key objective is to evaluate whether the serum concentration of VRC01, which changes cyclically over time along with the infusion schedule, is associated with the rate of HIV infection. Simulation studies are needed in the development of such survival models. In this paper, we consider simulating event time data with a continuous time-varying covariate whose values vary with time through multiple drug administration cycles, and whose effect on survival changes differently before and after a threshold within each cycle. The latter accommodates settings with a zero-protection biomarker threshold above which the drug provides a varying level of protection depending on the biomarker level, but below which the drug provides no protection. We propose two simulation approaches: one based on simulating survival data under a single-dose regimen first before data are aggregated over multiple doses, and another based on simulating survival data directly under a multiple-dose regimen. We generate time-to-event data following a Cox proportional hazards model based on inverting the cumulative hazard function and a log link function for relating the hazard function to the covariates. The method's validity is assessed in two sets of simulation experiments. The results indicate that the proposed procedures perform well in producing data that conform to their cyclic nature and assumptions of the Cox proportional hazards model.
Submitted 24 January, 2018;
originally announced January 2018.
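The inversion of the cumulative hazard mentioned in the abstract above can be sketched in its simplest form. This Python example assumes a constant baseline hazard and a single time-fixed covariate, which is a deliberate simplification: it does not implement the paper's cyclic time-varying covariate or its within-cycle threshold effect. The function name and parameter values are hypothetical.

```python
import math
import random


def simulate_cox_time(x: float, beta: float, baseline_hazard: float,
                      rng: random.Random) -> float:
    """Draw one event time from a Cox model with constant baseline hazard.

    Hazard:            h(t | x) = baseline_hazard * exp(beta * x)
    Cumulative hazard: H(t | x) = baseline_hazard * exp(beta * x) * t
    Since S(T | x) = exp(-H(T | x)) is uniform on (0, 1], setting
    H(T | x) = -log(U) and solving for T gives the draw below.
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return -math.log(u) / (baseline_hazard * math.exp(beta * x))


if __name__ == "__main__":
    rng = random.Random(2018)
    # Hypothetical settings: x = 1 raises the hazard by a factor exp(0.7).
    times_x0 = [simulate_cox_time(0.0, 0.7, 0.5, rng) for _ in range(5)]
    times_x1 = [simulate_cox_time(1.0, 0.7, 0.5, rng) for _ in range(5)]
    print(times_x0)
    print(times_x1)
```

With a time-varying covariate the cumulative hazard is no longer linear in $t$, so the inversion step generally has to be done piecewise or numerically rather than in closed form as here.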
-
Evaluation of Treatment Effect Modification by Biomarkers Measured Pre- and Post-randomization in the Presence of Non-monotone Missingness
Authors:
Yingying Zhuang,
Ying Huang,
Peter B. Gilbert
Abstract:
In vaccine studies, investigators are often interested in studying effect modifiers of clinical treatment efficacy by biomarker-based principal strata, which is useful for selecting biomarker study endpoints for evaluating treatments in new trials, exploring biological mechanisms of clinical treatment efficacy, and studying mediators of clinical treatment efficacy. However, in trials where participants may enter the study with prior exposure, and therefore with variable baseline biomarker values, clinical treatment efficacy may depend jointly on a biomarker measured at baseline and measured at a fixed time after vaccination. Therefore, it is of interest to conduct a bivariate effect modification analysis by biomarker-based principal strata and baseline biomarker values. Previous methods allow this assessment if participants who have the biomarker measured at the fixed time point post-randomization would also have the biomarker measured at baseline. However, additional complications in study design can arise in practice. For example, in the dengue correlates study, baseline biomarker values were only available from a fraction of participants who have biomarkers measured post-randomization. How to conduct the bivariate effect modification analysis in these studies remains an open research question. In this article, we propose an estimated likelihood method to utilize the sub-sampled baseline biomarker in the effect modification analysis and illustrate our method with datasets from two dengue phase 3 vaccine efficacy trials.
Submitted 26 October, 2017;
originally announced October 2017.
-
Partial Bridging of Vaccine Efficacy to New Populations
Authors:
Alexander R. Luedtke,
Peter B. Gilbert
Abstract:
Suppose one has data from one or more completed vaccine efficacy trials and wishes to estimate the efficacy in a new setting. Often logistical or ethical considerations make running another efficacy trial impossible. Fortunately, if there is a biomarker that is the primary modifier of efficacy, then the biomarker-conditional efficacy may be identical in the completed trials and the new setting, or at least informative enough to meaningfully bound this quantity. Given a sample of this biomarker from the new population, we might hope we can bridge the results of the completed trials to estimate the vaccine efficacy in this new population. Unfortunately, even knowing the true conditional efficacy in the new population fails to identify the marginal efficacy due to the unknown conditional unvaccinated risk. We define a curve that partially identifies (lower bounds) the marginal efficacy in the new population as a function of the population's marginal unvaccinated risk, under the assumption that one can identify bounds on the conditional unvaccinated risk in the new population. Interpreting the curve only requires identifying plausible regions of the marginal unvaccinated risk in the new population. We present a nonparametric estimator of this curve and develop valid lower confidence bounds that concentrate at a parametric rate. We use vaccine terminology throughout, but the results apply to general binary interventions and bounded outcomes.
Submitted 24 January, 2017;
originally announced January 2017.
-
Nonparametric Bounds and Sensitivity Analysis of Treatment Effects
Authors:
Amy Richardson,
Michael G. Hudgens,
Peter B. Gilbert,
Jason P. Fine
Abstract:
This paper considers conducting inference about the effect of a treatment (or exposure) on an outcome of interest. In the ideal setting where treatment is assigned randomly, under certain assumptions the treatment effect is identifiable from the observable data and inference is straightforward. However, in other settings such as observational studies or randomized trials with noncompliance, the treatment effect is no longer identifiable without relying on untestable assumptions. Nonetheless, the observable data often do provide some information about the effect of treatment, that is, the parameter of interest is partially identifiable. Two approaches are often employed in this setting: (i) bounds are derived for the treatment effect under minimal assumptions, or (ii) additional untestable assumptions are invoked that render the treatment effect identifiable and then sensitivity analysis is conducted to assess how inference about the treatment effect changes as the untestable assumptions are varied. Approaches (i) and (ii) are considered in various settings, including assessing principal strata effects, direct and indirect effects and effects of time-varying exposures. Methods for drawing formal inference about partially identified parameters are also discussed.
Submitted 5 March, 2015;
originally announced March 2015.
-
Assessing surrogate endpoints in vaccine trials with case-cohort sampling and the Cox model
Authors:
Li Qin,
Peter B. Gilbert,
Dean Follmann,
Dongfeng Li
Abstract:
Assessing immune responses to study vaccines as surrogates of protection plays a central role in vaccine clinical trials. Motivated by three ongoing or pending HIV vaccine efficacy trials, we consider such surrogate endpoint assessment in a randomized placebo-controlled trial with case-cohort sampling of immune responses and a time to event endpoint. Based on the principal surrogate definition under the principal stratification framework proposed by Frangakis and Rubin [Biometrics 58 (2002) 21--29] and adapted by Gilbert and Hudgens (2006), we introduce estimands that measure the value of an immune response as a surrogate of protection in the context of the Cox proportional hazards model. The estimands are not identified because the immune response to vaccine is not measured in placebo recipients. We formulate the problem as a Cox model with missing covariates, and employ novel trial designs for predicting the missing immune responses and thereby identifying the estimands. The first design utilizes information from baseline predictors of the immune response, and bridges their relationship in the vaccine recipients to the placebo recipients. The second design provides a validation set for the unmeasured immune responses of uninfected placebo recipients by immunizing them with the study vaccine after trial closeout. A maximum estimated likelihood approach is proposed for estimation of the parameters. Simulated data examples are given to evaluate the proposed designs and study their properties.
Submitted 27 March, 2008;
originally announced March 2008.