Search | arXiv e-print repository

Double Robust Variance Estimation

Authors: Bonnie E. Shook-Sa, Paul N. Zivich, Chanhwa Lee, Keyi Xue, Rachael K. Ross, Jessie K. Edwards, Jeffrey S. A. Stringer, Stephen R. Cole

Abstract: Doubly robust estimators have gained popularity in the field of causal inference due to their ability to provide consistent point estimates when either an outcome or exposure model is correctly specified. However, the influence function based variance estimator frequently used with doubly robust estimators is only consistent when both the outcome and exposure models are correctly specified. Here, use of M-estimation and the empirical sandwich variance estimator for doubly robust point and variance estimation is demonstrated. Simulation studies illustrate the properties of the influence function based and empirical sandwich variance estimators. Estimators are applied to data from the Improving Pregnancy Outcomes with Progesterone (IPOP) trial to estimate the effect of maternal anemia on birth weight among women with HIV. In the example, birth weights if all women had anemia were estimated to be lower than birth weights if no women had anemia, though estimates were imprecise. Variance estimates were more stable under varying model specifications for the empirical sandwich variance estimator than the influence function based variance estimator.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 19 pages, 5 figures, 6 tables
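The doubly robust (augmented inverse probability weighting) point estimate and the influence function based variance that this abstract contrasts with the sandwich estimator can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `aipw_ate` is hypothetical, and the fitted exposure probabilities and outcome-model predictions are assumed to come from models fit elsewhere.

```python
def aipw_ate(a, y, ps, m1, m0):
    """AIPW estimate of E[Y(1)] - E[Y(0)] and its influence function based variance.

    a  : exposure indicators (0/1)
    y  : observed outcomes
    ps : fitted exposure-model probabilities P(A=1 | L)  (assumed given)
    m1 : fitted outcome-model predictions E[Y | A=1, L]   (assumed given)
    m0 : fitted outcome-model predictions E[Y | A=0, L]   (assumed given)
    """
    n = len(y)
    # Per-person pseudo-outcome: outcome-model prediction plus an
    # inverse-probability-weighted residual, for each exposure level
    phi = [
        (m1_i + a_i * (y_i - m1_i) / ps_i)
        - (m0_i + (1 - a_i) * (y_i - m0_i) / (1 - ps_i))
        for a_i, y_i, ps_i, m1_i, m0_i in zip(a, y, ps, m1, m0)
    ]
    ate = sum(phi) / n
    # Influence function based variance: ignores the uncertainty from fitting
    # the nuisance models, which is why it relies on both models being correct
    var_if = sum((p - ate) ** 2 for p in phi) / n ** 2
    return ate, var_if


ate, var = aipw_ate([1, 1, 0, 0], [2, 4, 1, 3], [0.5] * 4, [3] * 4, [2] * 4)
```

The empirical sandwich alternative described in the abstract would instead stack the exposure-model and outcome-model estimating equations together with the AIPW equation, so that the uncertainty from fitting both working models enters the variance.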

arXiv:2403.18115

Assessing COVID-19 Vaccine Effectiveness in Observational Studies via Nested Trial Emulation

Authors: Justin B. DeMonte, Bonnie E. Shook-Sa, Michael G. Hudgens

Abstract: Observational data are often used to estimate real-world effectiveness and durability of coronavirus disease 2019 (COVID-19) vaccines. A sequence of nested trials can be emulated to draw inference from such data while minimizing selection bias, immortal time bias, and confounding. Typically, when nested trial emulation (NTE) is employed, effect estimates are pooled across trials to increase statistical efficiency. However, such pooled estimates may lack a clear interpretation when the treatment effect is heterogeneous across trials. In the context of COVID-19, vaccine effectiveness quite plausibly will vary over calendar time due to newly emerging variants of the virus. This manuscript considers an NTE inverse probability weighted estimator of vaccine effectiveness that may vary over calendar time, time since vaccination, or both. Statistical testing of the trial effect homogeneity assumption is considered. Simulation studies are presented examining the finite-sample performance of these methods under a variety of scenarios. The methods are used to estimate vaccine effectiveness against COVID-19 outcomes using observational data on over 120,000 residents of Abruzzo, Italy during 2021.

Submitted 26 March, 2024; originally announced March 2024.

Comments: 27 pages, 2 figures
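Reporting vaccine effectiveness per emulated trial instead of pooling, as this abstract motivates, amounts to computing one weighted estimate for each trial. The sketch below assumes the inverse probability weights have already been estimated, defines vaccine effectiveness as one minus the weighted risk ratio at the end of follow-up, and uses hypothetical function names and trial labels; it is not the manuscript's estimator and omits variance estimation and the homogeneity test.

```python
from collections import defaultdict


def weighted_risk(rows, arm):
    """Weighted proportion with the outcome among rows in the given arm."""
    num = sum(w * event for vacc, event, w in rows if vacc == arm)
    den = sum(w for vacc, event, w in rows if vacc == arm)
    return num / den


def trial_specific_ve(records):
    """records: iterable of (trial_id, vaccinated, event, ipw_weight) tuples.

    Returns {trial_id: VE}, where VE = 1 - weighted risk ratio (assumed definition),
    computed separately within each emulated trial rather than pooled.
    """
    by_trial = defaultdict(list)
    for trial, vacc, event, w in records:
        by_trial[trial].append((vacc, event, w))
    return {
        trial: 1 - weighted_risk(rows, 1) / weighted_risk(rows, 0)
        for trial, rows in by_trial.items()
    }


# Illustrative toy records only (not study data)
toy = [
    ("trial_A", 1, 0, 1.0), ("trial_A", 1, 1, 1.0), ("trial_A", 1, 0, 2.0),
    ("trial_A", 0, 1, 1.0), ("trial_A", 0, 0, 1.0),
    ("trial_B", 1, 1, 1.0), ("trial_B", 1, 0, 1.0),
    ("trial_B", 0, 1, 1.0), ("trial_B", 0, 1, 1.0), ("trial_B", 0, 0, 2.0),
]
ve = trial_specific_ve(toy)
```

In the toy records the two trials give different estimates (0.5 and 0), which is exactly the kind of heterogeneity a single pooled number would hide.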

arXiv:2311.09388

Synthesis estimators for positivity violations with a continuous covariate

Authors: Paul N Zivich, Jessie K Edwards, Bonnie E Shook-Sa, Eric T Lofgren, Justin Lessler, Stephen R Cole

Abstract: Studies intended to estimate the effect of a treatment, like randomized trials, may not be sampled from the desired target population. To correct for this discrepancy, estimates can be transported to the target population. Methods for transporting between populations are often premised on a positivity assumption, such that all relevant covariate patterns in one population are also present in the other. However, eligibility criteria, particularly in the case of trials, can result in violations of positivity when transporting to external populations. To address nonpositivity, a synthesis of statistical and mathematical models can be considered. This approach integrates multiple data sources (e.g. trials, observational, pharmacokinetic studies) to estimate treatment effects, leveraging mathematical models to handle positivity violations. This approach was previously demonstrated for positivity violations by a single binary covariate. Here, we extend the synthesis approach for positivity violations with a continuous covariate. For estimation, two novel augmented inverse probability weighting estimators are proposed. Both estimators are contrasted with other common approaches for addressing nonpositivity. Empirical performance is compared via Monte Carlo simulation. Finally, the competing approaches are illustrated with an example in the context of two-drug versus one-drug antiretroviral therapy on CD4 T cell counts among women with HIV.

Submitted 31 May, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2306.10976

Empirical sandwich variance estimator for iterated conditional expectation g-computation

Authors: Paul N Zivich, Rachael K Ross, Bonnie E Shook-Sa, Stephen R Cole, Jessie K Edwards

Abstract: Iterated conditional expectation (ICE) g-computation is an estimation approach for addressing time-varying confounding for both longitudinal and time-to-event data. Unlike other g-computation implementations, ICE avoids the need to specify models for each time-varying covariate. For variance estimation, previous work has suggested the bootstrap. However, bootstrapping can be computationally intense and sensitive to the number of resamples used. Here, we present ICE g-computation as a set of stacked estimating equations. Therefore, the variance for the ICE g-computation estimator can be consistently estimated using the empirical sandwich variance estimator. Performance of the variance estimator was evaluated empirically with a simulation study. The proposed approach is also demonstrated with an illustrative example on the effect of cigarette smoking on the prevalence of hypertension. In the simulation study, the empirical sandwich variance estimator appropriately estimated the variance. When comparing runtimes between the sandwich variance estimator and the bootstrap for the applied example, the sandwich estimator was substantially faster, even when bootstraps were run in parallel. The empirical sandwich variance estimator is a viable option for variance estimation with ICE g-computation.

Submitted 4 March, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: 18 pages, 1 figure, 6 tables
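The empirical sandwich variance estimator mentioned above has a generic "bread and meat" form: V = A^-1 B (A^-1)^T / n, where A is the negative average Jacobian of the stacked estimating functions and B is the average outer product of those functions, both evaluated at the estimate. The sketch below applies that form to a deliberately tiny stack (a mean and a variance) rather than to ICE g-computation, whose stack would hold one outcome-model equation per time point. The function names are hypothetical, the Jacobian is approximated by central finite differences, and the explicit 2x2 inverse limits the sketch to two parameters. Libraries such as delicatessen, listed further down this page, automate these steps for arbitrary stacks.

```python
def psi(theta, y_i):
    """Stacked estimating functions for theta = (mu, sigma2), one observation."""
    mu, sigma2 = theta
    return [y_i - mu, (y_i - mu) ** 2 - sigma2]


def solve_theta(y):
    """Roots of the stacked equations: the sample mean and the 1/n variance."""
    n = len(y)
    mu = sum(y) / n
    return [mu, sum((v - mu) ** 2 for v in y) / n]


def sandwich_variance(y, theta, h=1e-6):
    """Empirical sandwich covariance A^-1 B (A^-1)^T / n (two parameters only)."""
    n, k = len(y), len(theta)
    # Bread A = -(1/n) * sum_i d psi_i / d theta, by central finite differences
    A = [[0.0] * k for _ in range(k)]
    for j in range(k):
        up, dn = list(theta), list(theta)
        up[j] += h
        dn[j] -= h
        for v in y:
            f_up, f_dn = psi(up, v), psi(dn, v)
            for r in range(k):
                A[r][j] -= (f_up[r] - f_dn[r]) / (2 * h) / n
    # Meat B = (1/n) * sum_i psi_i psi_i^T, evaluated at the estimate
    B = [[0.0] * k for _ in range(k)]
    for v in y:
        f = psi(theta, v)
        for r in range(k):
            for c in range(k):
                B[r][c] += f[r] * f[c] / n
    # Explicit 2x2 inverse of the bread
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A_inv = [[A[1][1] / det, -A[0][1] / det],
             [-A[1][0] / det, A[0][0] / det]]
    # V = A^-1 B (A^-1)^T / n
    AB = [[sum(A_inv[r][m] * B[m][c] for m in range(k)) for c in range(k)]
          for r in range(k)]
    return [[sum(AB[r][m] * A_inv[c][m] for m in range(k)) / n for c in range(k)]
            for r in range(k)]


y = [1, 2, 3, 4, 5]
theta = solve_theta(y)          # [3.0, 2.0]
V = sandwich_variance(y, theta)
```

For this stack the bread is the identity, so the sandwich variance of the mean reduces to the 1/n sample variance divided by n (here 2/5 = 0.4), a convenient sanity check before swapping in a larger stack.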

arXiv:2305.00845

Fusing Trial Data for Treatment Comparisons: Single versus Multi-Span Bridging

Authors: Bonnie E. Shook-Sa, Paul N. Zivich, Samuel P. Rosin, Jessie K. Edwards, Adaora A. Adimora, Michael G. Hudgens, Stephen R. Cole

Abstract: While randomized controlled trials (RCTs) are critical for establishing the efficacy of new therapies, there are limitations regarding what comparisons can be made directly from trial data. RCTs are limited to a small number of comparator arms and often compare a new therapeutic to a standard of care which has already proven efficacious. It is sometimes of interest to estimate the efficacy of the new therapy relative to a treatment that was not evaluated in the same trial, such as a placebo or an alternative therapy that was evaluated in a different trial. Such multi-study comparisons are challenging because of potential differences between trial populations that can affect the outcome. In this paper, two bridging estimators are considered that allow for comparisons of treatments evaluated in different trials using data fusion methods to account for measured differences in trial populations. A "multi-span" estimator leverages a shared arm between two trials, while a "single-span" estimator does not require a shared arm. A diagnostic statistic that compares the outcome in the standardized shared arms is provided. The two estimators are compared in simulations, where both estimators demonstrate minimal empirical bias and nominal confidence interval coverage when the identification assumptions are met. The estimators are applied to data from the AIDS Clinical Trials Group 320 and 388 to compare the efficacy of two-drug versus four-drug antiretroviral therapy on CD4 cell counts among persons with advanced HIV.
The single-span approach requires fewer identification assumptions and was more efficient in simulations and the application.

Submitted 1 May, 2023; originally announced May 2023.

arXiv:2303.01572

doi 10.1097/EDE.0000000000001677

Transportability without positivity: a synthesis of statistical and simulation modeling

Authors: Paul N Zivich, Jessie K Edwards, Eric T Lofgren, Stephen R Cole, Bonnie E Shook-Sa, Justin Lessler

Abstract: When estimating an effect of an action with a randomized or observational study, that study is often not a random sample of the desired target population. Instead, estimates from that study can be transported to the target population. However, transportability methods generally rely on a positivity assumption, such that all relevant covariate patterns in the target population are also observed in the study sample. Strict eligibility criteria, particularly in the context of randomized trials, may lead to violations of this assumption. Two common approaches to address positivity violations are restricting the target population and restricting the relevant covariate set. As neither of these restrictions are ideal, we instead propose a synthesis of statistical and simulation models to address positivity violations. We propose corresponding g-computation and inverse probability weighting estimators. The restriction and synthesis approaches to addressing positivity violations are contrasted with a simulation experiment and an illustrative example in the context of sexually transmitted infection testing uptake. In both cases, the proposed synthesis approach accurately addressed the original research question when paired with a thoughtfully selected simulation model. Neither of the restriction approaches were able to accurately address the motivating question.
As public health decisions must often be made with imperfect target population information, model synthesis is a viable approach given a combination of empirical data and external information based on the best available knowledge.

Submitted 3 January, 2024; v1 submitted 2 March, 2023; originally announced March 2023.

Journal ref: Epidemiology, 35(1), 23-31 (2024)

arXiv:2206.04445

Bridged treatment comparisons: an illustrative application in HIV treatment

Authors: Paul N Zivich, Stephen R Cole, Jessie K Edwards, Bonnie E Shook-Sa, Alexander Breskin, Michael G Hudgens

Abstract: Comparisons of treatments, interventions, or exposures are of central interest in epidemiology, but direct comparisons are not always possible due to practical or ethical reasons. Here, we detail a fusion approach to compare treatments across studies. The motivating example entails comparing the risk of the composite outcome of death, AIDS, or greater than a 50% CD4 cell count decline in people with HIV when assigned triple versus mono antiretroviral therapy, using data from the AIDS Clinical Trial Group (ACTG) 175 (mono versus dual therapy) and ACTG 320 (dual versus triple therapy). We review a set of identification assumptions and estimate the risk difference using an inverse probability weighting estimator that leverages the shared trial arms (dual therapy). A fusion diagnostic based on comparing the shared arms is proposed that may indicate violation of the identification assumptions. Application of the data fusion estimator and diagnostic to the ACTG trials indicates triple therapy results in a reduction in risk compared to monotherapy in individuals with baseline CD4 counts between 50 and 300 cells/mm³. Bridged treatment comparisons address questions that none of the constituent data sources could address alone, but valid fusion-based inference requires careful consideration of the underlying assumptions.

Submitted 22 August, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 21 pages, 3 figures, 5 tables
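Once the arm-specific risks from both trials have been standardized to a common target population, the bridged contrast and the shared-arm diagnostic described above reduce to simple arithmetic. The sketch assumes that standardization (for example by inverse probability weighting) has already been done; the function name is hypothetical, the decomposition through the shared dual-therapy arm is an assumed reading of the abstract, and the numbers in the usage line are illustrative only, not results from ACTG 175 or 320.

```python
def bridged_risk_difference(r_triple_320, r_dual_320, r_dual_175, r_mono_175):
    """Bridge triple vs mono therapy through the shared dual-therapy arm.

    All inputs are arm-specific risks assumed to be already standardized
    to the same target population.
    """
    # Triple vs dual (from ACTG 320) plus dual vs mono (from ACTG 175)
    rd = (r_triple_320 - r_dual_320) + (r_dual_175 - r_mono_175)
    # Diagnostic: if the identification assumptions hold, the standardized
    # shared arms should estimate the same risk, so this should be near zero
    diagnostic = r_dual_320 - r_dual_175
    return rd, diagnostic


# Illustrative inputs only
rd, diag = bridged_risk_difference(0.10, 0.15, 0.16, 0.25)
```

A diagnostic far from zero suggests the shared arms are not exchangeable after standardization, in which case the bridged risk difference should not be trusted.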

arXiv:2203.11300

Delicatessen: M-Estimation in Python

Authors: Paul N Zivich, Mark Klose, Stephen R Cole, Jessie K Edwards, Bonnie E Shook-Sa

Abstract: M-estimation is a general statistical framework that simplifies estimation. Here, we introduce delicatessen, a Python library that automates the tedious calculations of M-estimation, and supports both built-in and user-specified estimating equations. To highlight the utility of delicatessen for quantitative data analysis, we provide several illustrations common to life science research: linear regression robust to outliers, estimation of a dose-response curve, and standardization of results.

Submitted 10 October, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

Comments: 1 figure

arXiv:2202.01650

Exposure Effects on Count Outcomes with Observational Data, with Application to Incarcerated Women

Authors: Bonnie E. Shook-Sa, Michael G. Hudgens, Andrea K. Knittel, Andrew Edmonds, Catalina Ramirez, Stephen R. Cole, Mardge Cohen, Adebola Adedimeji, Tonya Taylor, Katherine G. Michel, Andrea Kovacs, Jennifer Cohen, Jessica Donohue, Antonina Foster, Margaret A. Fischl, Dustin Long, Adaora A. Adimora

Abstract: Causal inference methods can be applied to estimate the effect of a point exposure or treatment on an outcome of interest using data from observational studies. For example, in the Women's Interagency HIV Study, it is of interest to understand the effects of incarceration on the number of sexual partners and the number of cigarettes smoked after incarceration. In settings like this where the outcome is a count, the estimand is often the causal mean ratio, i.e., the ratio of the counterfactual mean count under exposure to the counterfactual mean count under no exposure. This paper considers estimators of the causal mean ratio based on inverse probability of treatment weights, the parametric g-formula, and doubly robust estimation, each of which can account for overdispersion, zero-inflation, and heaping in the measured outcome. Methods are compared in simulations and are applied to data from the Women's Interagency HIV Study.

Submitted 6 November, 2023; v1 submitted 3 February, 2022; originally announced February 2022.
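The inverse probability of treatment weighted version of the causal mean ratio defined in this abstract can be sketched as a ratio of two weighted means. This minimal sketch uses normalized (Hajek-style) weights and assumes the propensity scores were estimated elsewhere; the function name is hypothetical, and the g-formula and doubly robust estimators, which are where the count-outcome model for overdispersion, zero-inflation, and heaping would enter, are not shown.

```python
def ipw_mean_ratio(a, y, ps):
    """IPW estimate of E[Y(1)] / E[Y(0)] for a count outcome.

    a  : exposure indicators (0/1)
    y  : observed counts
    ps : estimated P(A=1 | L), assumed fitted elsewhere
    """
    # Inverse probability of treatment weights for each exposure level
    w1 = [a_i / p for a_i, p in zip(a, ps)]
    w0 = [(1 - a_i) / (1 - p) for a_i, p in zip(a, ps)]
    # Normalized weighted means estimate each counterfactual mean count
    mu1 = sum(w * y_i for w, y_i in zip(w1, y)) / sum(w1)
    mu0 = sum(w * y_i for w, y_i in zip(w0, y)) / sum(w0)
    return mu1 / mu0


ratio = ipw_mean_ratio([1, 1, 0, 0], [4, 2, 1, 2], [0.5] * 4)
```

With a constant propensity score, as in the usage line, the weights cancel and the estimate is just the ratio of the observed group means (3 / 1.5 = 2).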

arXiv:2111.02910

Estimating SARS-CoV-2 Seroprevalence

Authors: Samuel P. Rosin, Bonnie E. Shook-Sa, Stephen R. Cole, Michael G. Hudgens

Abstract: Governments and public health authorities use seroprevalence studies to guide responses to the COVID-19 pandemic. Seroprevalence surveys estimate the proportion of individuals who have detectable SARS-CoV-2 antibodies. However, serologic assays are prone to misclassification error, and non-probability sampling may induce selection bias. In this paper, nonparametric and parametric seroprevalence estimators are considered that address both challenges by leveraging validation data and assuming equal probabilities of sample inclusion within covariate-defined strata. Both estimators are shown to be consistent and asymptotically normal, and consistent variance estimators are derived. Simulation studies are presented comparing the estimators over a range of scenarios. The methods are used to estimate SARS-CoV-2 seroprevalence in New York City, Belgium, and North Carolina.

Submitted 9 November, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

Comments: Main text: 23 pages, 5 figures, 3 tables. Appendix: 24 pages, 18 figures. Preprint
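One common way to address both challenges named in this abstract is to standardize the apparent positive proportion over covariate-defined strata and then apply the Rogan-Gladen misclassification correction, (p + Sp - 1) / (Se + Sp - 1). Whether this matches the paper's nonparametric estimator exactly is an assumption here; the function names and input layout are hypothetical, sensitivity and specificity are taken as already estimated from validation data, and no variance is computed.

```python
def rogan_gladen(p_pos, sensitivity, specificity):
    """Correct an apparent positive proportion for assay misclassification."""
    pi = (p_pos + specificity - 1) / (sensitivity + specificity - 1)
    # Truncate to [0, 1]; the raw correction can fall outside this range
    return min(max(pi, 0.0), 1.0)


def standardized_seroprevalence(strata, sensitivity, specificity):
    """strata: list of (population_share, n_tested, n_positive) tuples.

    Standardizes the apparent positive proportion to the target population's
    stratum shares, then applies the misclassification correction once.
    """
    total_share = sum(share for share, _, _ in strata)
    p_std = sum(share / total_share * pos / n for share, n, pos in strata)
    return rogan_gladen(p_std, sensitivity, specificity)


# Illustrative inputs only: two strata making up 60% and 40% of the population
est = standardized_seroprevalence([(0.6, 100, 10), (0.4, 50, 10)], 0.90, 0.98)
```

Standardizing first and correcting once gives the same answer as correcting within each stratum and then standardizing, except when truncation at 0 or 1 is triggered in some stratum, which is one reason to prefer this order.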

arXiv:2003.05979

Power and Sample Size for Marginal Structural Models

Authors: Bonnie E. Shook-Sa, Michael G. Hudgens

Abstract: Marginal structural models fit via inverse probability of treatment weighting are commonly used to control for confounding when estimating causal effects from observational data. When planning a study that will be analyzed with marginal structural modeling, determining the required sample size for a given level of statistical power is challenging because of the effect of weighting on the variance of the estimated causal means. This paper considers the utility of the design effect to quantify the effect of weighting on the precision of causal estimates. The design effect is defined as the ratio of the variance of the causal mean estimator divided by the variance of a naive estimator if, counter to fact, no confounding had been present and weights were not needed. A simple, closed-form approximation of the design effect is derived that is outcome invariant and can be estimated during the study design phase. Once the design effect is approximated for each treatment group, sample size calculations are conducted as for a randomized trial, but with variances inflated by the design effects to account for weighting. Simulations demonstrate the accuracy of the design effect approximation, and practical considerations are discussed.

Submitted 12 March, 2020; originally announced March 2020.
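The final step this abstract describes, a randomized-trial sample size calculation with each group's variance inflated by its design effect, can be sketched for a two-sided test of a difference in means with equal allocation. The design effects are assumed inputs here (the paper's closed-form approximation is not reproduced), and the function name and default error rates are illustrative choices.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd1, sd0, deff1, deff0, alpha=0.05, power=0.80):
    """Per-group sample size for detecting a mean difference `delta`.

    Each group's outcome variance is inflated by its design effect
    (deff1, deff0), which are assumed to have been approximated beforehand.
    """
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    inflated_var = sd1 ** 2 * deff1 + sd0 ** 2 * deff0
    return math.ceil(z_sum ** 2 * inflated_var / delta ** 2)


n_rct = n_per_group(0.5, 1.0, 1.0, 1.0, 1.0)       # no weighting: 63 per group
n_weighted = n_per_group(0.5, 1.0, 1.0, 1.5, 1.5)  # design effects of 1.5: 95 per group
```

Because the design effects enter only as variance multipliers, a design effect of 1 in both groups recovers the usual randomized-trial formula, and larger design effects translate directly into proportionally larger samples.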

Showing 1–11 of 11 results for author: Shook-Sa, B E