Search | arXiv e-print repository

Sparse two-stage Bayesian meta-analysis for individualized treatments

Authors: Junwei Shen, Erica E. M. Moodie, Shirin Golchi

Abstract: Individualized treatment rules tailor treatments to patients based on clinical, demographic, and other characteristics. Estimation of individualized treatment rules requires the identification of individuals who benefit most from the particular treatments and thus the detection of variability in treatment effects. To develop an effective individualized treatment rule, data from multisite studies may be required due to the low power provided by smaller datasets for detecting the often small treatment-covariate interactions. However, sharing of individual-level data is sometimes constrained. Furthermore, sparsity may arise in two senses: different data sites may recruit from different populations, making it infeasible to estimate identical models or all parameters of interest at all sites, and the number of non-zero parameters in the model for the treatment rule may be small. To address these issues, we adopt a two-stage Bayesian meta-analysis approach to estimate individualized treatment rules which optimize expected patient outcomes using multisite data without disclosing individual-level data beyond the sites. Simulation results demonstrate that our approach can provide consistent estimates of the parameters which fully characterize the optimal individualized treatment rule. We estimate the optimal Warfarin dose strategy using data from the International Warfarin Pharmacogenetics Consortium, where data sparsity and small treatment-covariate interaction effects pose additional statistical challenges.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.08180 [pdf, ps, other]

An adaptive enrichment design using Bayesian model averaging for selection and threshold-identification of tailoring variables

Authors: Lara Maleyeff, Shirin Golchi, Erica E. M. Moodie, Marie Hudson

Abstract: Precision medicine stands as a transformative approach in healthcare, offering tailored treatments that can enhance patient outcomes and reduce healthcare costs. As understanding of complex disease improves, clinical trials are being designed to detect subgroups of patients with enhanced treatment effects. Biomarker-driven adaptive enrichment designs, which enroll a general population initially and later restrict accrual to treatment-sensitive patients, are gaining popularity. Current practice often assumes either pre-trial knowledge of biomarkers defining treatment-sensitive subpopulations or a simple, linear relationship between continuous markers and treatment effectiveness. Motivated by a trial studying rheumatoid arthritis treatment, we propose a Bayesian adaptive enrichment design which identifies important tailoring variables out of a larger set of candidate biomarkers. Our proposed design is equipped with a flexible modelling framework where the effects of continuous biomarkers are introduced using free knot B-splines. The parameters of interest are then estimated by marginalizing over the space of all possible variable combinations using Bayesian model averaging. At interim analyses, we assess whether a biomarker-defined subgroup has enhanced or reduced treatment effects, allowing for early termination due to efficacy or futility and restricting future enrollment to treatment-sensitive patients.
We consider pre-categorized and continuous biomarkers, the latter of which may have complex, nonlinear relationships to the outcome and treatment effect. Using simulations, we derive the operating characteristics of our design and compare its performance to two existing approaches.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 34 pages

arXiv:2404.11323 [pdf, ps, other]

Bayesian Optimization for Identification of Optimal Biological Dose Combinations in Personalized Dose-Finding Trials

Authors: James Willard, Shirin Golchi, Erica EM Moodie

Abstract: Early phase, personalized dose-finding trials for combination therapies seek to identify patient-specific optimal biological dose (OBD) combinations, which are defined as safe dose combinations which maximize therapeutic benefit for a specific covariate pattern. Given the small sample sizes which are typical of these trials, it is challenging for traditional parametric approaches to identify OBD combinations across multiple dosing agents and covariate patterns. To address these challenges, we propose a Bayesian optimization approach to dose-finding which formally incorporates toxicity information into both the initial data collection process and the sequential search strategy. Independent Gaussian processes are used to model the efficacy and toxicity surfaces, and an acquisition function is utilized to define the dose-finding strategy and an early stopping rule. This work is motivated by a personalized dose-finding trial which considers a dual-agent therapy for obstructive sleep apnea, where OBD combinations are tailored to obstructive sleep apnea severity. To compare the performance of the personalized approach to a standard approach where covariate information is ignored, a simulation study is performed. We conclude that personalized dose-finding is essential in the presence of heterogeneity.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 3 Figures, 1 Table

arXiv:2404.07411 [pdf, ps, other]

Joint mixed-effects models for causal inference in clustered network-based observational studies

Authors: Vanessa McNealis, Erica E. M. Moodie, Nema Dean

Abstract: Causal inference on populations embedded in social networks poses technical challenges, since the typical no interference assumption frequently does not hold. Existing methods developed in the context of network interference rely upon the assumption of no unmeasured confounding. However, when faced with multilevel network data, there may be a latent factor influencing both the exposure and the outcome at the cluster level. We propose a Bayesian inference approach that combines a joint mixed-effects model for the outcome and the exposure with direct standardization to identify and estimate causal effects in the presence of network interference and unmeasured cluster confounding. In simulations, we compare our proposed method with linear mixed and fixed effects models and show that unbiased estimation is achieved using the joint model. Having derived valid tools for estimation, we examine the effect of maternal college education on adolescent school performance using data from the National Longitudinal Study of Adolescent Health.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2403.04564 [pdf, other]

Estimating hidden population size from a single respondent-driven sampling survey

Authors: Mamadou Yauck, Erica EM Moodie, Alain Fourmigue, Milada Dvorakova, Gilles Lambert, Daniel Grace, Joseph Cox

Abstract: This work is concerned with the estimation of hard-to-reach population sizes using a single respondent-driven sampling (RDS) survey, a variant of chain-referral sampling that leverages social relationships to reach members of a hidden population. The popularity of RDS as a standard approach for surveying hidden populations brings theoretical and methodological challenges regarding the estimation of population sizes, mainly for public health purposes. This paper proposes a frequentist, model-based framework for estimating the size of a hidden population using a network-based approach. An optimization algorithm is proposed for obtaining the identification region of the target parameter when model assumptions are violated. We characterize the asymptotic behavior of our proposed methodology and assess its finite sample performance under departures from model assumptions.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2311.01638 [pdf, other]

Inference on summaries of a model-agnostic longitudinal variable importance trajectory

Authors: Brian D. Williamson, Erica E. M. Moodie, Susan M. Shortreed

Abstract: In prediction settings where data are collected over time, it is often of interest to understand both the importance of variables for predicting the response at each time point and the importance summarized over the time series. Building on recent advances in estimation and inference for variable importance measures, we define summaries of variable importance trajectories. These measures can be estimated and the same approaches for inference can be applied regardless of the choice of the algorithm(s) used to estimate the prediction function. We propose a nonparametric efficient estimation and inference procedure as well as a null hypothesis testing procedure that are valid even when complex machine learning tools are used for prediction. Through simulations, we demonstrate that our proposed procedures have good operating characteristics, and we illustrate their use by investigating the longitudinal importance of risk factors for suicide attempt.

Submitted 2 November, 2023; originally announced November 2023.

Comments: 65 pages (29 main, 36 supplementary), 5 figures (3 main, 2 supplementary), 19 tables (2 main, 17 supplementary)

arXiv:2310.17334 [pdf, ps, other]

Bayesian Optimization for Personalized Dose-Finding Trials with Combination Therapies

Authors: James Willard, Shirin Golchi, Erica E. M. Moodie, Bruno Boulanger, Bradley P. Carlin

Abstract: Identification of optimal dose combinations in early phase dose-finding trials is challenging, due to the trade-off between precisely estimating the many parameters required to flexibly model the possibly non-monotonic dose-response surface, and the small sample sizes in early phase trials. This difficulty is even more pertinent in the context of personalized dose-finding, where patient characteristics are used to identify tailored optimal dose combinations. To overcome these challenges, we propose the use of Bayesian optimization for finding optimal dose combinations in standard ("one size fits all") and personalized multi-agent dose-finding trials. Bayesian optimization is a method for estimating the global optima of expensive-to-evaluate objective functions. The objective function is approximated by a surrogate model, commonly a Gaussian process, paired with a sequential design strategy to select the next point via an acquisition function. This work is motivated by an industry-sponsored problem, where focus is on optimizing a dual-agent therapy in a setting featuring minimal toxicity. To compare the performance of the standard and personalized methods under this setting, simulation studies are performed for a variety of scenarios. Our study concludes that taking a personalized approach is highly beneficial in the presence of heterogeneity.

Submitted 11 February, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: 28 pages, 4 figures, 1 table

arXiv:2304.12548 [pdf, other]

The impact of directly observed therapy on the efficacy of Tuberculosis treatment: A Bayesian multilevel approach

Authors: Widemberg S. Nobre, Alexandra M. Schmidt, Erica E. M. Moodie, David A. Stephens

Abstract: We propose and discuss a Bayesian procedure to estimate the average treatment effect (ATE) for multilevel observations in the presence of confounding. We focus on situations where the confounders may be latent (e.g., spatial latent effects). This work is motivated by an interest in determining the causal impact of directly observed therapy (DOT) on the successful treatment of Tuberculosis (TB); the available data correspond to individual-level information observed across different cities in a state in Brazil. We focus on propensity score regression and covariate adjustment to balance the treatment (DOT) allocation. We discuss the need to include latent local-level random effects in the propensity score model to reduce bias in the estimation of the ATE. A simulation study suggests that accounting for the multilevel nature of the data with latent structures in both the outcome and propensity score models has the potential to reduce bias in the estimation of causal effects.

Submitted 24 April, 2023; originally announced April 2023.

arXiv:2303.15281 [pdf, other]

Bayesian inference for optimal dynamic treatment regimes in practice

Authors: Daniel Rodriguez Duque, Erica E. M. Moodie, David A. Stephens

Abstract: In this work, we examine recently developed methods for Bayesian inference of optimal dynamic treatment regimes (DTRs). DTRs are a set of treatment decision rules aimed at tailoring patient care to patient-specific characteristics, thereby falling within the realm of precision medicine. In this field, researchers seek to tailor therapy with the intention of improving health outcomes; therefore, they are most interested in identifying optimal DTRs. Recent work has developed Bayesian methods for identifying optimal DTRs in a family indexed by $ψ$ via Bayesian dynamic marginal structural models (MSMs) (Rodriguez Duque et al., 2022a); we review the proposed estimation procedure and illustrate its use via the new BayesDTR R package. Although methods in (Rodriguez Duque et al., 2022a) can estimate optimal DTRs well, they may lead to biased estimators when the model for the expected outcome if everyone in a population were to follow a given treatment strategy, known as a value function, is misspecified or when a grid search for the optimum is employed. We describe recent work that uses a Gaussian process ($GP$) prior on the value function as a means to robustly identify optimal DTRs (Rodriguez Duque et al., 2022b). We demonstrate how a $GP$ approach may be implemented with the BayesDTR package and contrast it with other value-search approaches to identifying optimal DTRs.
We use data from an HIV therapeutic trial in order to illustrate a standard analysis with these methods, using both the original observed trial data and an additional simulated component to showcase a longitudinal (two-stage DTR) analysis.

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2302.00230 [pdf, ps, other]

Revisiting the Effects of Maternal Education on Adolescents' Academic Performance: Doubly Robust Estimation in a Network-Based Observational Study

Authors: Vanessa McNealis, Erica E. M. Moodie, Nema Dean

Abstract: In many contexts, particularly when study subjects are adolescents, peer effects can invalidate typical statistical requirements in the data. For instance, it is plausible that a student's academic performance is influenced both by their own mother's educational level as well as that of their peers. Since the underlying social network is measured, the Add Health study provides a unique opportunity to examine the impact of maternal college education on adolescent school performance, both direct and indirect. However, causal inference on populations embedded in social networks poses technical challenges, since the typical no interference assumption no longer holds. While inverse probability-of-treatment weighted (IPW) estimators have been developed for this setting, they are often highly unstable. Motivated by the question of maternal education, we propose doubly robust (DR) estimators combining models for treatment and outcome that are consistent and asymptotically normal if either model is correctly specified. We present empirical results that illustrate the DR property and the efficiency gain of DR over IPW estimators even when the treatment model is misspecified. Contrary to previous studies, our robust analysis does not provide evidence of an indirect effect of maternal education on academic performance within adolescents' social circles in Add Health.

Submitted 26 January, 2024; v1 submitted 31 January, 2023; originally announced February 2023.

Comments: 39 pages (including appendices), 13 figures

arXiv:2301.03710 [pdf, other]

A time-dependent Poisson-Gamma model for recruitment forecasting in multicenter studies

Authors: Armando Turchetta, Nicolas Savy, David A. Stephens, Erica E. M. Moodie, Marina B. Klein

Abstract: Forecasting recruitments is a key component of the monitoring phase of multicenter studies. One of the most popular techniques in this field is the Poisson-Gamma recruitment model, a Bayesian technique built on a doubly stochastic Poisson process. This approach is based on the modeling of enrollments as a Poisson process where the recruitment rates are assumed to be constant over time and to follow a common Gamma prior distribution. However, the constant-rate assumption is a restrictive limitation that is rarely appropriate for applications in real studies. In this paper, we illustrate a flexible generalization of this methodology which allows the enrollment rates to vary over time by modeling them through B-splines. We show the suitability of this approach for a wide range of recruitment behaviors in a simulation study and by estimating the recruitment progression of the Canadian Co-infection Cohort (CCC).

Submitted 9 January, 2023; originally announced January 2023.

arXiv:2212.08968 [pdf, ps, other]

Covariate Adjustment in Bayesian Adaptive Randomized Controlled Trials

Authors: James Willard, Shirin Golchi, Erica EM Moodie

Abstract: In conventional randomized controlled trials, adjustment for baseline values of covariates known to be at least moderately associated with the outcome increases the power of the trial. Recent work has shown particular benefit for more flexible frequentist designs, such as information adaptive and adaptive multi-arm designs. However, covariate adjustment has not been characterized within the more flexible Bayesian adaptive designs, despite their growing popularity. We focus on a subclass of these which allow for early stopping at an interim analysis given evidence of treatment superiority. We consider both collapsible and non-collapsible estimands, and show how to obtain posterior samples of marginal estimands from adjusted analyses. We describe several estimands for three common outcome types. We perform a simulation study to assess the impact of covariate adjustment using a variety of adjustment models in several different scenarios. This is followed by a real world application of the compared approaches to a COVID-19 trial with a binary endpoint. For all scenarios, it is shown that covariate adjustment increases power and the probability of stopping the trials early, and decreases the expected sample sizes as compared to unadjusted analyses.

Submitted 23 November, 2023; v1 submitted 17 December, 2022; originally announced December 2022.

Comments: 23 pages, 5 tables, 4 figures

arXiv:2210.13330 [pdf, other]

Dynamic Treatment Regimes using Bayesian Additive Regression Trees for Censored Outcomes

Authors: Xiao Li, Brent R Logan, S M Ferdous Hossain, Erica E M Moodie

Abstract: To achieve the goal of providing the best possible care to each patient, physicians need to customize treatments for patients with the same diagnosis, especially when treating diseases that can progress further and require additional treatments, such as cancer. Making decisions at multiple stages as a disease progresses can be formalized as a dynamic treatment regime (DTR). Most of the existing optimization approaches for estimating dynamic treatment regimes including the popular method of Q-learning were developed in a frequentist context. Recently, a general Bayesian machine learning framework that facilitates using Bayesian regression modeling to optimize DTRs has been proposed. In this article, we adapt this approach to censored outcomes using Bayesian additive regression trees (BART) for each stage under the accelerated failure time modeling framework, along with simulation studies and a real data example that compare the proposed approach with Q-learning. We also develop an R wrapper function that utilizes a standard BART survival model to optimize DTRs for censored outcomes. The wrapper function can easily be extended to accommodate any type of Bayesian machine learning model.

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2205.13609 [pdf, ps, other]

Variable Selection for Individualized Treatment Rules with Discrete Outcomes

Authors: Zeyu Bian, Erica EM Moodie, Susan M Shortreed, Sylvie D Lambert, Sahir Bhatnagar

Abstract: An individualized treatment rule (ITR) is a decision rule that aims to improve individual patients' health outcomes by recommending optimal treatments according to patient-specific information. In observational studies, collected data may contain many variables that are irrelevant for making treatment decisions. Including all available variables in the statistical model for the ITR could yield a loss of efficiency and an unnecessarily complicated treatment rule, which is difficult for physicians to interpret or implement. Thus, a data-driven approach to select important tailoring variables with the aim of improving the estimated decision rules is crucial. While there is a growing body of literature on selecting variables in ITRs with continuous outcomes, relatively few methods exist for discrete outcomes, which pose additional computational challenges even in the absence of variable selection. In this paper, we propose a variable selection method for ITRs with discrete outcomes. We show theoretically and empirically that our approach has the double robustness property, and that it compares favorably with other competing approaches. We illustrate the proposed method on data from a study of an adaptive web-based stress management tool to identify which variables are relevant for tailoring treatment.

Submitted 29 September, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

arXiv:2205.06370 [pdf]

Characterizing patterns in police stops by race in Minneapolis from 2016-2021

Authors: Tuviere Onookome-Okome, Jonah Gorondensky, Eric Rose, Jeffery Sauer, Kristian Lum, Erica EM Moodie

Abstract: The murder of George Floyd centered Minneapolis, Minnesota, in conversations on racial injustice in the US. We leverage open data from the Minneapolis Police Department to analyze individual, geographic, and temporal patterns in more than 170,000 police stops since 2016. We evaluate person and vehicle searches at the individual level by race using generalized estimating equations with neighborhood clustering, directly addressing neighborhood differences in police activity. Minneapolis exhibits clear patterns of disproportionate policing by race, wherein Black people are searched at higher rates compared to White people. Temporal visualizations indicate that police stops declined following the murder of George Floyd. This analysis provides contemporary evidence on the state of policing for a major metropolitan area in the United States.

Submitted 12 May, 2022; originally announced May 2022.

arXiv:2204.02231 [pdf, ps, other]

Causal inference: critical developments, past and future

Authors: Erica EM Moodie, David A Stephens

Abstract: Causality is a subject of philosophical debate and a central scientific issue with a long history. In the statistical domain, the study of cause and effect based on the notion of `fairness' in comparisons dates back several hundred years, and yet statistical concepts and developments that form the area of causal inference are only decades old. In this paper, we review core tenets and methods of causal inference and key developments in the history of the field. We highlight connections with traditional `associational' statistical methods, including estimating equations and semiparametric theory, and point to current topics of active research in this crucial area of our field.

Submitted 5 April, 2022; originally announced April 2022.

arXiv:2202.09611 [pdf, other]

Estimating Individualized Treatment Rules in Longitudinal Studies with Covariate-Driven Observation Times

Authors: Janie Coulombe, Erica E. M. Moodie, Susan M. Shortreed, Christel Renoux

Abstract: The sequential treatment decisions made by physicians to treat chronic diseases are formalized in the statistical literature as dynamic treatment regimes. To date, methods for dynamic treatment regimes have been developed under the assumption that observation times, i.e., treatment and outcome monitoring times, are determined by study investigators. That assumption is often not satisfied in electronic health records data in which the outcome, the observation times, and the treatment mechanism are associated with patients' characteristics. The treatment and observation processes can lead to spurious associations between the treatment of interest and the outcome to be optimized under the dynamic treatment regime if not adequately considered in the analysis. We address these associations by incorporating two inverse weights that are functions of a patient's covariates into dynamic weighted ordinary least squares to develop optimal single stage dynamic treatment regimes, known as individualized treatment rules. We show empirically that our methodology yields consistent, multiply robust estimators. In a cohort of new users of antidepressant drugs from the United Kingdom's Clinical Practice Research Datalink, the proposed method is used to develop an optimal treatment rule that chooses between two antidepressants to optimize a utility function related to the change in body mass index.

Submitted 19 February, 2022; originally announced February 2022.

arXiv:2202.09451 [pdf, ps, other]

Using Pilot Data to Size Observational Studies for the Estimation of Dynamic Treatment Regimes

Authors: Eric J. Rose, Erica E. M. Moodie, Susan Shortreed

Abstract: There has been significant attention given to developing data-driven methods for tailoring patient care based on individual patient characteristics. Dynamic treatment regimes formalize this through a sequence of decision rules that map patient information to a suggested treatment. The data for estimating and evaluating treatment regimes are ideally gathered through the use of Sequential Multiple Assignment Randomized Trials (SMARTs) though longitudinal observational studies are commonly used due to the potentially prohibitive costs of conducting a SMART. These studies are typically sized for simple comparisons of fixed treatment sequences or, in the case of observational studies, a priori sample size calculations are often not performed. We develop sample size procedures for the estimation of dynamic treatment regimes from observational studies. Our approach uses pilot data to ensure a study will have sufficient power for comparing the value of the optimal regime, i.e. the expected outcome if all patients in the population were treated by following the optimal regime, with a known comparison mean. Our approach also ensures the value of the estimated optimal treatment regime is within an a priori set range of the value of the true optimal regime with a high probability. We examine the performance of the proposed procedure with a simulation study and use it to size a study for reducing depressive symptoms using data from electronic health records.

Submitted 18 February, 2022; originally announced February 2022.

arXiv:2202.09448 [pdf, other]

Monte Carlo Sensitivity Analysis for Unmeasured Confounding in Dynamic Treatment Regimes

Authors: Eric J. Rose, Erica E. M. Moodie, Susan Shortreed

Abstract: Data-driven methods for personalizing treatment assignment have garnered much attention from clinicians and researchers. Dynamic treatment regimes formalize this through a sequence of decision rules that map individual patient characteristics to a recommended treatment. Observational studies are commonly used for estimating dynamic treatment regimes due to the potentially prohibitive costs of conducting sequential multiple assignment randomized trials. However, estimating a dynamic treatment regime from observational data can lead to bias in the estimated regime due to unmeasured confounding. Sensitivity analyses are useful for assessing how robust the conclusions of the study are to a potential unmeasured confounder. A Monte Carlo sensitivity analysis is a probabilistic approach that involves positing and sampling from distributions for the parameters governing the bias. We propose a method for performing a Monte Carlo sensitivity analysis of the bias due to unmeasured confounding in the estimation of dynamic treatment regimes. We demonstrate the performance of the proposed procedure with a simulation study and apply it to an observational study examining tailoring the use of antidepressants for reducing symptoms of depression using data from Kaiser Permanente Washington (KPWA).

Submitted 18 February, 2022; originally announced February 2022.

arXiv:2202.07003 [pdf, other]

Privacy-preserving estimation of an optimal individualized treatment rule: A case study in maximizing time to severe depression-related outcomes

Authors: Erica EM Moodie, Janie Coulombe, Coraline Danieli, Christel Renoux, Susan M Shortreed

Abstract: Estimating individualized treatment rules - particularly in the context of right-censored outcomes - is challenging because the treatment effect heterogeneity of interest is often small, thus difficult to detect. While this motivates the use of very large datasets such as those from multiple health systems or centres, data privacy may be of concern with participating data centres reluctant to share individual-level data. In this case study on the treatment of depression, we demonstrate an application of distributed regression for privacy protection used in combination with dynamic weighted survival modelling (DWSurv) to estimate an optimal individualized treatment rule whilst obscuring individual-level data. In simulations, we demonstrate the flexibility of this approach to address local treatment practices that may affect confounding, and show that DWSurv retains its double robustness even when performed through a (weighted) distributed regression approach. The work is motivated by, and illustrated with, an analysis of treatment for unipolar depression using the United Kingdom's Clinical Practice Research Datalink.

Submitted 14 February, 2022; originally announced February 2022.

arXiv:2201.12831 [pdf, ps, other]

Causal inference under mis-specification: adjustment based on the propensity score

Authors: David A. Stephens, Widemberg S. Nobre, Erica E. M. Moodie, Alexandra M. Schmidt

Abstract: We study Bayesian approaches to causal inference via propensity score regression. Much of the Bayesian literature on propensity score methods has relied on approaches that cannot be viewed as fully Bayesian in the context of conventional 'likelihood times prior' posterior inference; in addition, most methods rely on parametric and distributional assumptions, and presumed correct specification. We emphasize that causal inference is typically carried out in settings of mis-specification, and develop strategies for fully Bayesian inference that reflect this. We focus on methods based on decision-theoretic arguments, and show how inference based on loss-minimization can give valid and fully Bayesian inference. We propose a computational approach to inference based on the Bayesian bootstrap which has good Bayesian and frequentist properties.

Submitted 30 January, 2022; originally announced January 2022.

arXiv:2201.02301 [pdf, other]

New designs for Bayesian adaptive cluster-randomized trials

Authors: Junwei Shen, Shirin Golchi, Erica E. M. Moodie, David Benrimoh

Abstract: Adaptive approaches, allowing for more flexible trial design, have been proposed for individually randomized trials to save time or reduce sample size. However, adaptive designs for cluster-randomized trials in which groups of participants rather than individuals are randomized to treatment arms are less common. Motivated by a cluster-randomized trial designed to assess the effectiveness of a machine-learning based clinical decision support system for physicians treating patients with depression, two Bayesian adaptive designs for cluster-randomized trials are proposed to allow for early stopping for efficacy at pre-planned interim analyses. The difference between the two designs lies in the way that participants are sequentially recruited. Given a maximum number of clusters as well as maximum cluster size allowed in the trial, one design sequentially recruits clusters with the given maximum cluster size, while the other recruits all clusters at the beginning of the trial but sequentially enrolls individual participants until the trial is stopped early for efficacy or the final analysis has been reached. The design operating characteristics are explored via simulations for a variety of scenarios and two outcome types for the two designs. The simulation results show that for different outcomes the design choice may be different. We make recommendations for designs of Bayesian adaptive cluster-randomized trials based on the simulation results.

Submitted 6 January, 2022; originally announced January 2022.

arXiv:2112.11517 [pdf, other]

doi 10.1002/sim.9151

Estimating the Marginal Effect of a Continuous Exposure on an Ordinal Outcome using Data Subject to Covariate-Driven Treatment and Visit Processes

Authors: Janie Coulombe, Erica E M Moodie, Robert W Platt

Abstract: In the statistical literature, a number of methods have been proposed to ensure valid inference about marginal effects of variables on a longitudinal outcome in settings with irregular monitoring times. However, the potential biases due to covariate-driven monitoring times and confounding have rarely been considered simultaneously, and never in a setting with an ordinal outcome and a continuous exposure. In this work, we propose and demonstrate a methodology for causal inference in such a setting, relying on a proportional odds model to study the effect of the exposure on the outcome. Irregular observation times are considered via a proportional rate model, and a generalization of inverse probability of treatment weights is used to account for the continuous exposure. We motivate our methodology by the estimation of the marginal (causal) effect of the time spent on video or computer games on suicide attempts in the Add Health study, a longitudinal study in the United States. Although in the Add Health data, observation times are pre-specified, our proposed approach is applicable even in more general settings such as when analyzing data from electronic health records where observations are highly irregular. In simulation studies, we let observation times vary across individuals and demonstrate that not accounting for biasing imbalances due to the monitoring and the exposure schemes can bias the estimate for the marginal odds ratio of exposure.

Submitted 21 December, 2021; originally announced December 2021.

Comments: 42 pages, 10 figures. This manuscript is the submitted version, but the reference of the accepted version is: "Estimating the Marginal Effect of a Continuous Exposure on an Ordinal Outcome using Data Subject to Covariate-Driven Treatment and Visit Processes," Janie Coulombe, Erica E. M. Moodie, and Robert W. Platt. Statistics in Medicine; 40(26): 5746-5764, Copyright@2021, Wiley

Journal ref: Statistics in Medicine; 40(26): 5746-5764, Copyright@2021, Wiley

arXiv:2111.04844 [pdf, ps, other]

Double robust estimation of partially adaptive treatment strategies

Authors: Denis Talbot, Erica EM Moodie, Caroline Diorio

Abstract: Precision medicine aims to tailor treatment decisions according to patients' characteristics. G-estimation and dynamic weighted ordinary least squares (dWOLS) are double robust statistical methods that can be used to identify optimal adaptive treatment strategies. They require both a model for the outcome and a model for the treatment and are consistent if at least one of these models is correctly specified. It is underappreciated that these methods additionally require modeling all existing treatment-confounder interactions to yield consistent estimators. Identifying partially adaptive treatment strategies that tailor treatments according to only a few covariates, ignoring some interactions, may be preferable in practice. It has been proposed to combine inverse probability weighting and G-estimation to address this issue, but we argue that the resulting estimator is not expected to be double robust. Building on G-estimation and dWOLS, we propose alternative estimators of partially adaptive strategies and demonstrate their double robustness. We investigate and compare the empirical performance of six estimators in a simulation study. As expected, estimators combining inverse probability weighting with either G-estimation or dWOLS are biased when the treatment model is incorrectly specified. The other estimators are unbiased if either the treatment or the outcome model is correctly specified and have similar standard errors. Using data maintained by the Centre des Maladies du Sein, the methods are illustrated to estimate a partially adaptive treatment strategy for tailoring hormonal therapy use in breast cancer patients according to their estrogen receptor status and body mass index. R software implementing our estimators is provided.

Submitted 8 November, 2021; originally announced November 2021.

Comments: 22 pages and 8 tables

arXiv:2109.03757 [pdf, ps, other]

Causal Inference for Quantile Treatment Effects

Authors: Shuo Sun, Erica E. M. Moodie, Johanna G. Nešlehová

Abstract: Analyses of environmental phenomena often are concerned with understanding unlikely events such as floods, heatwaves, droughts or high concentrations of pollutants. Yet the majority of the causal inference literature has focused on modelling means, rather than (possibly high) quantiles. We define a general estimator of the population quantile treatment (or exposure) effects (QTE) -- the weighted QTE (WQTE) -- of which the population QTE is a special case, along with a general class of balancing weights incorporating the propensity score. Asymptotic properties of the proposed WQTE estimators are derived. We further propose and compare propensity score regression and two weighted methods based on these balancing weights to understand the causal effect of an exposure on quantiles, allowing for the exposure to be binary, discrete or continuous. Finite sample behavior of the three estimators is studied in simulation. The proposed methods are applied to data taken from the Bavarian Danube catchment area to estimate the 95% QTE of phosphorus on copper concentration in the river.

Submitted 8 September, 2021; originally announced September 2021.

Journal ref: Environmetrics 32.4 (2021): e2668

arXiv:2109.01218 [pdf, other]

Evaluating the Use of Generalized Dynamic Weighted Ordinary Least Squares for Individualized HIV Treatment Strategies

Authors: Larry Dong, Erica E. M. Moodie, Laura Villain, Rodolphe Thiébaut

Abstract: Dynamic treatment regimes (DTR) are a statistical paradigm in precision medicine which aim to optimize patient outcomes by individualizing treatments. At its simplest, a DTR may require only a single decision to be made; this special case is called an individualized treatment rule (ITR) and is often used to maximize short-term rewards. Generalized dynamic weighted ordinary least squares (G-dWOLS), a DTR estimation method that offers theoretical advantages such as double robustness of parameter estimators in the decision rules, has been recently extended to now accommodate categorical treatments. In this work, G-dWOLS is applied to longitudinal data to estimate an optimal ITR, which is demonstrated in simulations. This novel method is then applied to a population affected by HIV whereby an ITR for the administration of Interleukin 7 (IL-7) is devised to maximize the duration where the CD4 load is above a healthy threshold (500 cells/$μ$L) while preventing the administration of unnecessary injections.

Submitted 2 September, 2021; originally announced September 2021.

arXiv:2108.01041 [pdf, ps, other]

Bayesian Sample Size Calculations for SMART Studies

Authors: Armando Turchetta, Erica E. M. Moodie, David A. Stephens, Sylvie D. Lambert

Abstract: In the management of most chronic conditions characterized by the lack of universally effective treatments, adaptive treatment strategies (ATSs) have been growing in popularity as they offer a more individualized approach, and sequential multiple assignment randomized trials (SMARTs) have gained attention as the most suitable clinical trial design to formalize the study of these strategies. While the number of SMARTs has increased in recent years, their design has remained limited to the frequentist setting, which may not fully or appropriately account for uncertainty in design parameters and hence not yield appropriate sample size recommendations. Specifically, standard frequentist formulae rely on several assumptions that can be easily misspecified. The Bayesian framework offers a straightforward path to alleviate some of these concerns. In this paper, we provide calculations in a Bayesian setting to allow more realistic and robust estimates that account for uncertainty in inputs through the 'two priors' approach. Additionally, compared to the standard formulae, this methodology allows us to rely on fewer assumptions, integrate pre-trial knowledge, and switch the focus from the standardized effect size to the minimal detectable difference. The proposed methodology is evaluated in a thorough simulation study and is implemented to estimate the sample size for a full-scale SMART of an Internet-Based Adaptive Stress Management intervention based on a pilot SMART conducted on cardiovascular disease patients from two Canadian provinces.

Submitted 2 August, 2021; originally announced August 2021.

Comments: Main article 16 pages, 3 figures, 2 tables. Appendix 11 pages, 10 tables. Submitted to Biometrics

arXiv:2106.14364 [pdf, ps, other]

Estimation of the marginal effect of antidepressants on body mass index under confounding and endogenous covariate-driven monitoring times

Authors: Janie Coulombe, Erica E. M. Moodie, Robert W. Platt, Christel Renoux

Abstract: In studying the marginal effect of antidepressants on body mass index using electronic health records data, we face several challenges. Patients' characteristics can affect the exposure (confounding) as well as the timing of routine visits (measurement process), and those characteristics may be altered following a visit which can create dependencies between the monitoring and body mass index when viewed as a stochastic or random process in time. This may result in a form of selection bias that distorts the estimation of the marginal effect of the antidepressant. Inverse intensity of visit weights have been proposed to adjust for these imbalances; however, no approaches have addressed complex settings where the covariate and the monitoring processes affect each other in time so as to induce endogeneity, a situation likely to occur in electronic health records. We review how selection bias due to outcome-dependent follow-up times may arise and propose a new cumulated weight that models a complete monitoring path so as to address the above-mentioned challenges and produce a reliable estimate of the impact of antidepressants on body mass index. More specifically, we do so using data from the Clinical Practice Research Datalink in the United Kingdom, comparing the marginal effect of two commonly used antidepressants, citalopram and fluoxetine, on body mass index. The results are compared to those obtained with simpler methods that do not account for the extent of the dependence due to an endogenous covariate process.

Submitted 27 June, 2021; originally announced June 2021.

MSC Class: 62D20

arXiv:2105.12259 [pdf, other]

Estimation of Optimal Dynamic Treatment Regimes using Gaussian Process Emulation

Authors: Daniel Rodriguez Duque, David A. Stephens, Erica E. M. Moodie

Abstract: In precision medicine, identifying optimal sequences of decision rules, termed dynamic treatment regimes (DTRs), is an important undertaking. One approach investigators may take to infer about optimal DTRs is via Bayesian dynamic Marginal Structural Models (MSMs). These models represent the expected outcome under adherence to a DTR for DTRs in a family indexed by a parameter $\psi$; the function mapping regimes in the family to the expected outcome under adherence to a DTR is known as the value function. Models that allow for the straightforward identification of an optimal DTR may lead to biased estimates. If such a model is computationally tractable, common wisdom says that a grid-search for the optimal DTR may obviate this difficulty. In a Bayesian context, computational difficulties may be compounded if a posterior mean must be calculated at each grid point. We seek to alleviate these inferential challenges by implementing Gaussian Process ($\mathcal{GP}$) optimization methods for estimators for the causal effect of adherence to a specified DTR. We examine how to identify optimal DTRs in settings where the value function is multi-modal, which are often not addressed in the DTR literature. We conclude that a $\mathcal{GP}$ modeling approach that acknowledges noise in the estimated response surface leads to improved results. Additionally, we find that a grid-search may not always yield a robust solution and that it is often less efficient than a $\mathcal{GP}$ approach. We illustrate the use of the proposed methods by analyzing a clinical dataset with the aim of quantifying the effect of different patterns of HIV therapy.

Submitted 7 June, 2022; v1 submitted 25 May, 2021; originally announced May 2021.

arXiv:2101.07359 [pdf, other]

Variable Selection in Regression-based Estimation of Dynamic Treatment Regimes

Authors: Zeyu Bian, Erica EM Moodie, Susan M Shortreed, Sahir Bhatnagar

Abstract: Dynamic treatment regimes (DTRs) consist of a sequence of decision rules, one per stage of intervention, that find effective treatments for individual patients according to patient information history. DTRs can be estimated from models which include the interaction between treatment and a small number of covariates which are often chosen a priori. However, with increasingly large and complex data being collected, it is difficult to know which prognostic factors might be relevant in the treatment rule. Therefore, a more data-driven approach of selecting these covariates might improve the estimated decision rules and simplify models to make them easier to interpret. We propose a variable selection method for DTR estimation using penalized dynamic weighted least squares. Our method has the strong heredity property, that is, an interaction term can be included in the model only if the corresponding main terms have also been selected. Through simulations, we show our method has both the double robustness property and the oracle property, and the newly proposed methods compare favorably with other variable selection approaches.

Submitted 3 December, 2021; v1 submitted 18 January, 2021; originally announced January 2021.

arXiv:2012.00457 [pdf, other]

General Regression Methods for Respondent-Driven Sampling Data

Authors: Mamadou Yauck, Erica E. M. Moodie, Herak Apelian, Alain Fourmigue, Daniel Grace, Trevor Hart, Gilles Lambert, Joseph Cox

Abstract: Respondent-Driven Sampling (RDS) is a variant of link-tracing sampling techniques that aim to recruit hard-to-reach populations by leveraging individuals' social relationships. As such, an RDS sample has a graphical component which represents a partially observed network of unknown structure. Moreover, it is common to observe homophily, or the tendency to form connections with individuals who share similar traits. Currently, there is a lack of principled guidance on multivariate modeling strategies for RDS to address homophilic covariates and the dependence between observations within the network. In this work, we propose a methodology for general regression techniques using RDS data. This is used to study the socio-demographic predictors of HIV treatment optimism (about the value of antiretroviral therapy) among gay, bisexual and other men who have sex with men, recruited into an RDS study in Montreal, Canada.

Submitted 1 December, 2020; originally announced December 2020.

arXiv:2010.00165 [pdf, other]

Neighbourhood Bootstrap for Respondent-Driven Sampling

Authors: Mamadou Yauck, Erica E. M. Moodie, Herak Apelian, Alain Fourmigue, Daniel Grace, Trevor A. Hart, Gilles Lambert, Joseph Cox

Abstract: Respondent-Driven Sampling (RDS) is a form of link-tracing sampling, a sampling technique used for 'hard-to-reach' populations that aims to leverage individuals' social relationships to reach potential participants. While the methodological focus has been restricted to the estimation of population proportions, there is a growing interest in the estimation of uncertainty for RDS as recent findings suggest that most variance estimators underestimate variability. Recently, Baraff et al. (2016) proposed the tree bootstrap method based on resampling the RDS recruitment tree, and empirically showed that this method outperforms current bootstrap methods. However, some findings suggest that the tree bootstrap (severely) overestimates uncertainty. In this paper, we propose the neighbourhood bootstrap method for quantifying uncertainty in RDS. We prove the consistency of our method under some conditions and investigate its finite sample performance, through a simulation study, under realistic RDS sampling assumptions.

Submitted 4 February, 2021; v1 submitted 30 September, 2020; originally announced October 2020.

arXiv:2006.01799 [pdf, ps, other]

doi 10.1214/22-STS879

The role of exchangeability in causal inference

Authors: Olli Saarela, David A. Stephens, Erica E. M. Moodie

Abstract: Though the notion of exchangeability has been discussed in the causal inference literature under various guises, it has rarely taken its original meaning as a symmetry property of probability distributions. As this property is a standard component of Bayesian inference, we argue that in Bayesian causal inference it is natural to link the causal model, including the notion of confounding and definition of causal contrasts of interest, to the concept of exchangeability. Here we propose a probabilistic between-group exchangeability property as an identifying condition for causal effects, relate it to alternative conditions for unconfounded inferences (commonly stated using potential outcomes) and define causal contrasts in the presence of exchangeability in terms of posterior predictive expectations for further exchangeable units. While our main focus is on a point treatment setting, we also investigate how this reasoning carries over to longitudinal settings.

Submitted 15 December, 2022; v1 submitted 2 June, 2020; originally announced June 2020.

Journal ref: Statistical Science. 2023 Aug; 38(3): 369-385

arXiv:2002.05793 [pdf, other]

Sampling from Networks: Respondent-Driven Sampling

Authors: Mamadou Yauck, Erica E. M. Moodie, Herak Apelian, Marc-Messier Peet, Gilles Lambert, Daniel Grace, Nathan J. Lachowsky, Trevor Hart, Joseph Cox

Abstract: Respondent-Driven Sampling (RDS) is a variant of link-tracing, a sampling technique for surveying hard-to-reach communities that takes advantage of community members' social networks to reach potential participants. As a network-based sampling method, RDS is faced with the fundamental problem of sampling from population networks where features such as homophily (the tendency for individuals with similar traits to share social ties) and differential activity (the ratio of the average number of connections by attribute) are sensitive to the choice of a sampling method. Though not clearly described in the RDS literature, many simple methods exist to generate simulated RDS data, with specific levels of network features, where the focus is on estimating simple estimands. However, the accuracy of these methods in their abilities to consistently recover those targeted network features remains unclear. This is also motivated by recent findings that some population network parameters (e.g. homophily) cannot be consistently estimated from the RDS data alone \citep{Crawford17}. In this paper, we conduct a simulation study to assess the accuracy of existing RDS simulation methods, in terms of their abilities to generate RDS samples with the desired levels of two network parameters: homophily and differential activity. The results show that (1) homophily cannot be consistently estimated from simulated RDS samples and (2) differential activity estimates are more precise when groups, defined by traits, are equally active and equally represented in the population. We use this approach to mimic features of the Engage Study, an RDS sample of gay, bisexual and other men who have sex with men in Montreal.

Submitted 14 August, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

arXiv:1704.08229 [pdf, ps, other]

Generalized G-estimation and Model Selection

Authors: M. P. Wallace, E. E. M. Moodie, D. A. Stephens

Abstract: Dynamic treatment regimes (DTRs) aim to formalize personalized medicine by tailoring treatment decisions to individual patient characteristics. G-estimation for DTR identification targets the parameters of a structural nested mean model known as the blip function from which the optimal DTR is derived. Despite considerable work deriving such estimation methods, there has been little focus on extending G-estimation to the case of non-additive effects, non-continuous outcomes or on model selection. We demonstrate how G-estimation can be more widely applied through the use of iteratively-reweighted least squares procedures, and illustrate this for log-linear models. We then derive a quasi-likelihood function for G-estimation within the DTR framework, and show how it can be used to form an information criterion for blip model selection. These developments are demonstrated through application to a variety of simulation studies as well as data from the Sequenced Treatment Alternatives to Relieve Depression study.

Submitted 26 April, 2017; originally announced April 2017.

arXiv:1407.8371 [pdf, ps, other]

doi 10.1214/14-AOAS727

Effect of breastfeeding on gastrointestinal infection in infants: A targeted maximum likelihood approach for clustered longitudinal data

Authors: Mireille E. Schnitzer, Mark J. van der Laan, Erica E. M. Moodie, Robert W. Platt

Abstract: The PROmotion of Breastfeeding Intervention Trial (PROBIT) cluster-randomized a program encouraging breastfeeding to new mothers in hospital centers. The original studies indicated that this intervention successfully increased duration of breastfeeding and lowered rates of gastrointestinal tract infections in newborns. Additional scientific and popular interest lies in determining the causal effect of longer breastfeeding on gastrointestinal infection. In this study, we estimate the expected infection count under various lengths of breastfeeding in order to estimate the effect of breastfeeding duration on infection. Due to the presence of baseline and time-dependent confounding, specialized "causal" estimation methods are required. We demonstrate the double-robust method of Targeted Maximum Likelihood Estimation (TMLE) in the context of this application and review some related methods and the adjustments required to account for clustering. We compare TMLE (implemented both parametrically and using a data-adaptive algorithm) to other causal methods for this example. In addition, we conduct a simulation study to determine (1) the effectiveness of controlling for clustering indicators when cluster-specific confounders are unmeasured and (2) the importance of using data-adaptive TMLE.

Submitted 31 July, 2014; originally announced July 2014.

Comments: Published at http://dx.doi.org/10.1214/14-AOAS727 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS727

Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 2, 703-725

Showing 1–36 of 36 results for author: Moodie, E E