
Dirichlet process mixture models for the Analysis of Repeated Attempt Designs

Authors: Michael J. Daniels, Minji Lee, Wei Feng

Abstract: In longitudinal studies, it is not uncommon to make multiple attempts to collect a measurement after baseline. Recording whether these attempts are successful provides useful information for the purposes of assessing missing data assumptions. This is because measurements from subjects who provide the data after numerous failed attempts may differ from those who provide the measurement after fewer attempts. Previous models for these designs were parametric and/or did not allow sensitivity analysis. For the former, there are always concerns about model misspecification and for the latter, sensitivity analysis is essential when conducting inference in the presence of missing data. Here, we propose a new approach which minimizes issues with model misspecification by using Bayesian nonparametrics for the observed data distribution. We also introduce a novel approach for identification and sensitivity analysis. We re-analyze the repeated attempts data from a clinical trial involving patients with severe mental illness and conduct simulations to better understand the properties of our approach.

Submitted 8 May, 2023; originally announced May 2023.

Comments: 24 pages, additional 16 pages of supplementary material

arXiv:2305.05017 [pdf, ps, other]

A Bayesian Non-parametric Approach for Causal Mediation with a Post-treatment Confounder

Authors: Woojung Bae, Michael J. Daniels, Michael G. Perri

Abstract: We propose a new Bayesian non-parametric (BNP) method for estimating the causal effects of mediation in the presence of a post-treatment confounder. We specify an enriched Dirichlet process mixture (EDPM) to model the joint distribution of the observed data (outcome, mediator, post-treatment confounders, treatment, and baseline confounders). The proposed BNP model allows more confounder-based clusters than clusters for the outcome and mediator. For identifiability, we use the extended version of the standard sequential ignorability as introduced in \citet{hong2022posttreatment}. The observed data model and causal identification assumptions enable us to estimate and identify the causal effects of mediation, i.e., the natural direct effects (NDE) and indirect effects (NIE). We conduct simulation studies to assess the performance of our proposed method. Furthermore, we apply this approach to evaluate the causal mediation effect in the Rural LITE trial, demonstrating its practical utility in real-world scenarios. Keywords: Causal inference; Enriched Dirichlet process mixture model.

Submitted 8 May, 2023; originally announced May 2023.

arXiv:2305.01631 [pdf, other]

Truncation Approximation for Enriched Dirichlet Process Mixture Models

Authors: Natalie Burns, Michael J. Daniels

Abstract: Enriched Dirichlet process mixture (EDPM) models are Bayesian nonparametric models which can be used for nonparametric regression and conditional density estimation and which overcome a key disadvantage of jointly modeling the response and predictors as a Dirichlet process mixture (DPM) model: when there is a large number of predictors, the clusters induced by the DPM will be overwhelmingly determined by the predictors rather than the response. A truncation approximation to a DPM allows a blocked Gibbs sampling algorithm to be used rather than a Polya urn sampling algorithm. The blocked Gibbs sampler offers potential improvement in mixing. The truncation approximation also allows for implementation in standard software (rjags and rstan). In this paper we introduce an analogous truncation approximation for an EDPM. We show that with sufficiently large truncation values in the approximation of the EDP prior, a precise approximation to the EDP is available. We verify that the truncation approximation and blocked Gibbs sampler with minimum truncation values that obtain adequate error bounds achieve similar accuracy to the truncation approximation and blocked Gibbs sampler with large truncation values using a simulated example. Further, we use the simulated example to show that the blocked Gibbs sampler improves upon the mixing in the Polya urn sampler, especially as the number of covariates increases.

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2208.13382 [pdf, other]

A Bayesian nonparametric approach for causal inference with multiple mediators

Authors: Samrat Roy, Michael J. Daniels, Brendan J. Kelly, Jason Roy

Abstract: Mediation analysis with contemporaneously observed multiple mediators is an important area of causal inference. Recent approaches for multiple mediators are often based on parametric models and thus may suffer from model misspecification. Also, much of the existing literature either only allows estimation of the joint mediation effect or estimates the joint mediation effect as the sum of individual mediator effects, which often is not a reasonable assumption. In this paper, we propose a methodology which overcomes the two aforementioned drawbacks. Our method is based on a novel Bayesian nonparametric (BNP) approach, wherein the joint distribution of the observed data (outcome, mediators, treatment, and confounders) is modeled flexibly using an enriched Dirichlet process mixture with three levels: the first level characterizing the conditional distribution of the outcome given the mediators, treatment and the confounders, the second level corresponding to the conditional distribution of each of the mediators given the treatment and the confounders, and the third level corresponding to the distribution of the treatment and the confounders. We use standardization (g-computation) to compute causal mediation effects under three uncheckable assumptions that allow identification of the individual and joint mediation effects. The efficacy of our proposed method is demonstrated with simulations. We apply our proposed method to analyze data from a study of Ventilator-associated Pneumonia (VAP) co-infected patients, where the effect of the abundance of Pseudomonas on VAP infection is suspected to be mediated through antibiotics.

Submitted 29 August, 2022; originally announced August 2022.

ACM Class: G.3

arXiv:2208.09869 [pdf, other]

Flexible evaluation of surrogacy in Bayesian adaptive platform studies

Authors: Michael C Sachs, Erin E Gabriel, Alessio Crippa, Michael J Daniels

Abstract: Trial level surrogates are useful tools for improving the speed and cost effectiveness of trials, but surrogates that have not been properly evaluated can cause misleading results. The evaluation procedure is often contextual and depends on the type of trial setting. There have been many proposed methods for trial level surrogate evaluation, but none, to our knowledge, for the specific setting of Bayesian adaptive platform studies. As adaptive studies are becoming more popular, methods for surrogate evaluation using them are needed. These studies also offer a rich data resource for surrogate evaluation that would not normally be possible. However, they also offer a set of statistical issues including heterogeneity of the study population, treatments, implementation, and even potentially the quality of the surrogate. We propose the use of a hierarchical Bayesian semiparametric model for the evaluation of potential surrogates using nonparametric priors for the distribution of true effects based on Dirichlet process mixtures. The motivation for this approach is to flexibly model relationships between the treatment effect on the surrogate and the treatment effect on the outcome and also to identify potential clusters with differential surrogate value in a data-driven manner. In simulations, we find that our proposed method is superior to a simple, but fairly standard, hierarchical Bayesian method. We demonstrate how our method can be used in a simulated illustrative example (based on the ProBio trial), in which we are able to identify clusters where the surrogate is, and is not, useful. We plan to apply our method to the ProBio trial, once it is completed.

Submitted 21 August, 2022; originally announced August 2022.

Comments: 21 pages, 4 figures

arXiv:2201.03077 [pdf, other]

Information Borrowing in Regression Models

Authors: Amy Zhang, Le Bao, Michael J. Daniels

Abstract: Model development often takes data structure, subject matter considerations, model assumptions, and goodness of fit into consideration. To diagnose issues with any of these factors, it can be helpful to understand regression model estimates at a more granular level. We propose a new method for decomposing point estimates from a regression model via weights placed on data clusters. The weights are informed only by the model specification and data availability and thus can be used to explicitly link the effects of data imbalance and model assumptions to actual model estimates. The weight matrix has been understood in linear models as the hat matrix in the existing literature. We extend it to Bayesian hierarchical regression models that incorporate prior information and complicated dependence structures through the covariance among random effects. We show that the model weights, which we call borrowing factors, generalize shrinkage and information borrowing to all regression models. In contrast, the focus of the hat matrix has been mainly on the diagonal elements indicating the amount of leverage. We also provide metrics that summarize the borrowing factors and are practically useful. We present the theoretical properties of the borrowing factors and associated metrics and demonstrate their usage in two examples. By explicitly quantifying borrowing and shrinkage, researchers can better incorporate domain knowledge and evaluate model performance and the impacts of data properties such as data imbalance or influential points.

Submitted 9 January, 2022; originally announced January 2022.

arXiv:2112.13998 [pdf, other]

Variable Selection Using Bayesian Additive Regression Trees

Authors: Chuji Luo, Michael J. Daniels

Abstract: Variable selection is an important statistical problem. This problem becomes more challenging when the candidate predictors are of mixed type (e.g. continuous and binary) and impact the response variable in nonlinear and/or non-additive ways. In this paper, we review existing variable selection approaches for the Bayesian additive regression trees (BART) model, a nonparametric regression model, which is flexible enough to capture the interactions between predictors and nonlinear relationships with the response. An emphasis of this review is on the capability of identifying relevant predictors. We also propose two variable importance measures which can be used in a permutation-based variable selection approach, and a backward variable selection procedure for BART. We present simulations demonstrating that our approaches exhibit improved performance in terms of the ability to recover all the relevant predictors in a variety of data settings, compared to existing BART-based variable selection methods.

Submitted 28 December, 2021; originally announced December 2021.

Comments: 40 pages, 13 figures

arXiv:2106.14599 [pdf, other]

BNPqte: A Bayesian Nonparametric Approach to Causal Inference on Quantiles in R

Authors: Chuji Luo, Michael J. Daniels

Abstract: In this article, we introduce the BNPqte R package which implements the Bayesian nonparametric approach of Xu, Daniels and Winterstein (2018) for estimating quantile treatment effects in observational studies. This approach provides flexible modeling of the distributions of potential outcomes, so it is capable of capturing a variety of underlying relationships among the outcomes, treatments and confounders and estimating multiple quantile treatment effects simultaneously. Specifically, this approach uses a Bayesian additive regression trees (BART) model to estimate the propensity score and a Dirichlet process mixture (DPM) of multivariate normals model to estimate the conditional distribution of the potential outcome given the estimated propensity score. The BNPqte R package provides a fast implementation for this approach by designing efficient R functions for the DPM of multivariate normals model in joint and conditional density estimation. These R functions largely improve the efficiency of the DPM model in density estimation, compared to the popular DPpackage. BART-related R functions in the BNPqte R package are inherited from the BART R package with two modifications on variable importance and split probability. To maximize computational efficiency, the actual sampling and computation for each model are carried out in C++ code. The Armadillo C++ library is also used for fast linear algebra calculations.

Submitted 28 June, 2021; originally announced June 2021.

Comments: 44 pages, 13 figures

arXiv:2101.06823 [pdf, other]

Inference for BART with Multinomial Outcomes

Authors: Yizhen Xu, Joseph W. Hogan, Michael J. Daniels, Rami Kantor, Ann Mwangi

Abstract: The multinomial probit Bayesian additive regression trees (MPBART) framework was proposed by Kindo et al. (KD), approximating the latent utilities in the multinomial probit (MNP) model with BART (Chipman et al. 2010). Compared to multinomial logistic models, MNP does not assume independent alternatives and the correlation structure among alternatives can be specified through multivariate Gaussian distributed latent utilities. We introduce two new algorithms for fitting the MPBART and show that the theoretical mixing rates of our proposals are equal or superior to the existing algorithm in KD. Through simulations, we explore the robustness of the methods to the choice of reference level, imbalance in outcome frequencies, and the specifications of prior hyperparameters for the utility error term. The work is motivated by the application of generating posterior predictive distributions for mortality and engagement in care among HIV-positive patients based on electronic health records (EHRs) from the Academic Model Providing Access to Healthcare (AMPATH) in Kenya. In both the application and simulations, we observe better performance using our proposals as compared to KD in terms of MCMC convergence rate and posterior predictive accuracy.

Submitted 12 August, 2022; v1 submitted 17 January, 2021; originally announced January 2021.

Comments: 23 pages, 12 tables, 6 figures, with appendix, 49 pages total

arXiv:2011.14238 [pdf, other]

Approximate Cross-validated Mean Estimates for Bayesian Hierarchical Regression Models

Authors: Amy X. Zhang, Le Bao, Changcheng Li, Michael J. Daniels

Abstract: We introduce a novel procedure for obtaining cross-validated predictive estimates for Bayesian hierarchical regression models (BHRMs). Bayesian hierarchical models are popular for their ability to model complex dependence structures and provide probabilistic uncertainty estimates, but can be computationally expensive to run. Cross-validation (CV) is therefore not a common practice to evaluate the predictive performance of BHRMs. Our method circumvents the need to re-run computationally costly estimation methods for each cross-validation fold and makes CV more feasible for large BHRMs. By conditioning on the variance-covariance parameters, we shift the CV problem from probability-based sampling to a simple and familiar optimization problem. In many cases, this produces estimates which are equivalent to full CV. We provide theoretical results and demonstrate its efficacy on publicly available data and in simulations.

Submitted 17 January, 2024; v1 submitted 28 November, 2020; originally announced November 2020.

Comments: 26 pages, 2 figures

arXiv:2011.12345 [pdf, ps, other]

A Bayesian semi-parametric approach for inference on the population partly conditional mean from longitudinal data with dropout

Authors: Maria Josefsson, Michael J. Daniels, Sara Pudas

Abstract: Studies of memory trajectories using longitudinal data often result in highly non-representative samples due to selective study enrollment and attrition. An additional bias comes from practice effects that result in improved or maintained performance due to familiarity with test content or context. These challenges may bias study findings and severely distort the ability to generalize to the target population. In this study we propose an approach for estimating the finite population mean of a longitudinal outcome conditioning on being alive at a specific time point. We develop a flexible Bayesian semi-parametric predictive estimator for population inference when longitudinal auxiliary information is known for the target population. We evaluate sensitivity of the results to untestable assumptions and further compare our approach to other methods used for population inference in a simulation study. The proposed approach is motivated by 15-year longitudinal data from the Betula longitudinal cohort study. We apply our approach to estimate lifespan trajectories in episodic memory, with the aim to generalize findings to a target population.

Submitted 22 March, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

arXiv:2011.00404 [pdf, other]

Informed Pooled Testing with Quantitative Assays

Authors: Tao Liu, Joseph W Hogan, Wanning Su, Yizhen Xu, Michael J Daniels, Rami Kantor

Abstract: Pooled testing is widely used for screening for viral or bacterial infections with low prevalence when individual testing is not cost-efficient. Pooled testing with qualitative assays that give binary results has been well-studied. However, characteristics of pooling with quantitative assays were mostly demonstrated using simulations or empirical studies. We investigate properties of three pooling strategies with quantitative assays: traditional two-stage mini-pooling (MP) (Dorfman, 1943), mini-pooling with deconvolution algorithm (MPA) (May et al., 2010), and marker-assisted MPA (mMPA) (Liu et al., 2017). MPA and mMPA test individuals in a sequence after a positive pool and implement a deconvolution algorithm to determine when testing can cease to ascertain all individual statuses. mMPA uses information from other available markers to determine an optimal order for individual testings. We derive and compare the general statistical properties of the three pooling methods. We show that with a proper pool size, MP, MPA, and mMPA can be more cost-efficient than individual testing, and mMPA is superior to MPA and MP. For diagnostic accuracy, mMPA and MPA have higher specificity and positive predictive value but lower sensitivity and negative predictive value than MP and individual testing. Included in this paper are applications to various simulations and an application for HIV treatment monitoring.

Submitted 31 October, 2020; originally announced November 2020.

arXiv:1902.10787 [pdf, ps, other]

Bayesian semi-parametric G-computation for causal inference in a cohort study with MNAR dropout and death

Authors: Maria Josefsson, Michael J. Daniels

Abstract: Causal inference with observational longitudinal data and time-varying exposures is often complicated by time-dependent confounding and attrition. The G-computation formula is one approach for estimating a causal effect in this setting. The parametric modeling approach typically used in practice relies on strong modeling assumptions for valid inference, and moreover depends on an assumption of missing at random, which is not appropriate when the missingness is missing not at random (MNAR) or due to death. In this work we develop a flexible Bayesian semi-parametric G-computation approach for assessing the causal effect on the subpopulation that would survive irrespective of exposure, in a setting with MNAR dropout. The approach is to specify models for the observed data using Bayesian additive regression trees, and then use assumptions with embedded sensitivity parameters to identify and estimate the causal effect. The proposed approach is motivated by a longitudinal cohort study on cognition, health, and aging, and we apply our approach to study the effect of becoming a widow on memory. We also compare our approach to several standard methods.

Submitted 12 October, 2020; v1 submitted 27 February, 2019; originally announced February 2019.

arXiv:1901.00908 [pdf, other]

Bayesian Longitudinal Causal Inference in the Analysis of the Public Health Impact of Pollutant Emissions

Authors: Chanmin Kim, Corwin M Zigler, Michael J Daniels, Christine Choirat, Jason A Roy

Abstract: Pollutant emissions from coal-burning power plants have been deemed to adversely impact ambient air quality and public health conditions. Despite the noticeable reduction in emissions and the improvement of air quality since the Clean Air Act (CAA) became the law, the public-health benefits from changes in emissions have not been widely evaluated yet. In terms of the chain of accountability (HEI Accountability Working Group, 2003), the link between pollutant emissions from the power plants (SO2) and public health conditions (respiratory diseases) accounting for changes in ambient air quality (PM2.5) is unknown. We provide the first assessment of the longitudinal effect of specific pollutant emission (SO2) on public health outcomes that is mediated through changes in the ambient air quality. It is of particular interest to examine the extent to which the effect that is mediated through changes in local ambient air quality differs from year to year. In this paper, we propose a Bayesian approach to estimate novel causal estimands: time-varying mediation effects in the presence of mediators and responses measured every year. We replace the commonly invoked sequential ignorability assumption with a new set of assumptions which are sufficient to identify the distributions of the natural indirect and direct effects in this setting.

Submitted 3 January, 2019; originally announced January 2019.

arXiv:1812.06507 [pdf, ps, other]

doi 10.1002/sim.8082

Classification using Ensemble Learning under Weighted Misclassification Loss

Authors: Yizhen Xu, Tao Liu, Michael J. Daniels, Rami Kantor, Ann Mwangi, Joseph W. Hogan

Abstract: Binary classification rules based on covariates typically depend on simple loss functions such as zero-one misclassification. Some cases may require more complex loss functions. For example, individual-level monitoring of HIV-infected individuals on antiretroviral therapy (ART) requires periodic assessment of treatment failure, defined as having a viral load (VL) value above a certain threshold. In some resource limited settings, VL tests may be limited by cost or technology, and diagnoses are based on other clinical markers. Depending on scenario, higher premium may be placed on avoiding false-positives which brings greater cost and reduced treatment options. Here, the optimal rule is determined by minimizing a weighted misclassification loss/risk. We propose a method for finding and cross-validating optimal binary classification rules under weighted misclassification loss. We focus on rules comprising a prediction score and an associated threshold, where the score is derived using an ensemble learner. Simulations and examples show that our method, which derives the score and threshold jointly, more accurately estimates overall risk and has better operating characteristics compared with methods that derive the score first and the cutoff conditionally on the score especially for finite samples.

Submitted 10 May, 2019; v1 submitted 16 December, 2018; originally announced December 2018.

Comments: 23 pages, 4 tables, 4 figures

Journal ref: Statistics in Medicine 2019, Vol. 38, Issue 11, Pg. 2002-2012

arXiv:1805.07147 [pdf, other]

A Bayesian Parametric Approach to Handle Missing Longitudinal Outcome Data in Trial-Based Health Economic Evaluations

Authors: Andrea Gabrio, Michael J. Daniels, Gianluca Baio

Abstract: Trial-based economic evaluations are typically performed on cross-sectional variables, derived from the responses for only the completers in the study, using methods that ignore the complexities of utility and cost data (e.g. skewness and spikes). We present an alternative and more efficient Bayesian parametric approach to handle missing longitudinal outcomes in economic evaluations, while accounting for the complexities of the data. We specify a flexible parametric model for the observed data and partially identify the distribution of the missing data with partial identifying restrictions and sensitivity parameters. We explore alternative nonignorable scenarios through different priors for the sensitivity parameters, calibrated on the observed data. Our approach is motivated by, and applied to, data from a trial assessing the cost-effectiveness of a new treatment for intellectual disability and challenging behaviour.

Submitted 18 May, 2018; originally announced May 2018.

arXiv:1702.08496 [pdf, other]

Bayesian nonparametric generative models for causal inference with missing at random covariates

Authors: Jason Roy, Kirsten J Lum, Michael J. Daniels, Bret Zeldow, Jordan Dworkin, Vincent Lo Re III

Abstract: We propose a general Bayesian nonparametric (BNP) approach to causal inference in the point treatment setting. The joint distribution of the observed data (outcome, treatment, and confounders) is modeled using an enriched Dirichlet process. The combination of the observed data model and causal assumptions allows us to identify any type of causal effect - differences, ratios, or quantile effects, either marginally or for subpopulations of interest. The proposed BNP model is well-suited for causal inference problems, as it does not require parametric assumptions about the distribution of confounders and naturally leads to a computationally efficient Gibbs sampling algorithm. By flexibly modeling the joint distribution, we are also able to impute (via data augmentation) values for missing covariates within the algorithm under an assumption of ignorable missingness, obviating the need to create separate imputed data sets. This approach for imputing the missing covariates has the additional advantage of guaranteeing congeniality between the imputation model and the analysis model, and because we use a BNP approach, parametric models are avoided for imputation. The performance of the method is assessed using simulation studies. The method is applied to data from a cohort study of human immunodeficiency virus/hepatitis C virus co-infected patients.

Submitted 27 February, 2017; originally announced February 2017.

arXiv:1507.01825 [pdf, other]

Comparing Biomarkers as Trial Level General Surrogates

Authors: Erin E. Gabriel, Michael J. Daniels, M. Elizabeth Halloran

Abstract: An intermediate response measure that accurately predicts efficacy in a new setting can reduce trial cost and time to product licensure. In this paper, we define a trial level general surrogate as a trial level intermediate response that accurately predicts trial level clinical responses. Methods for evaluating trial level general surrogates have been developed previously. Many methods in the literature use trial level intermediate responses for prediction. However, all existing methods focus on surrogate evaluation and prediction in new settings, rather than comparison of candidate trial level surrogates, and few formalize the use of cross validation to quantify the expected prediction error. Our proposed method uses Bayesian non-parametric modeling and cross-validation to estimate the absolute prediction error for use in evaluating and comparing candidate trial level general surrogates. Simulations show that our method performs well across a variety of scenarios. We use our method to evaluate and to compare candidate trial level general surrogates in several multi-national trials of a pentavalent rotavirus vaccine. We identify two immune measures that have potential value as trial level general surrogates and use the measures to predict efficacy in a trial with no clinical outcomes measured.

Submitted 7 July, 2015; originally announced July 2015.

Showing 1–18 of 18 results for author: Daniels, M J