Search | arXiv e-print repository

Transportability of Principal Causal Effects

Authors: Justin M. Clark, Kollin W. Rott, James S. Hodges, Jared D. Huling

Abstract: Recent research in causal inference has made important progress in addressing challenges to the external validity of trial findings. Such methods weight trial participant data to more closely resemble the distribution of effect-modifying covariates in a well-defined target population. In the presence of participant non-adherence to study medication, these methods effectively transport an intention-to-treat effect that averages over heterogeneous compliance behaviors. In this paper, we develop a principal stratification framework to identify causal effects conditioning on both compliance behavior and membership in the target population. We also develop non-parametric efficiency theory for, and construct efficient estimators of, such "transported" principal causal effects, and we characterize their finite-sample performance in simulation experiments. While this work focuses on treatment non-adherence, the framework is applicable to a broad class of estimands that target effects in clinically relevant, possibly latent subsets of a target population.
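The weighting idea summarized in this abstract can be illustrated with a minimal sketch. It assumes a single discrete effect modifier, a randomized trial, and a sample of covariate values from the target population. It shows generic density-ratio weighting for an intention-to-treat effect, not the principal-stratification estimator the paper develops. The function name `transported_itt` and the toy data are hypothetical.

```python
from collections import Counter


def transported_itt(trial, target_x):
    """Estimate an intention-to-treat effect in a target population by
    reweighting trial participants to the target's covariate distribution.

    trial: list of (x, a, y) tuples, with x a discrete effect modifier,
           a in {0, 1} the randomized arm, and y the outcome.
    target_x: list of covariate values sampled from the target population.
    """
    trial_freq = Counter(x for x, _, _ in trial)
    target_freq = Counter(target_x)
    n_trial, n_target = len(trial), len(target_x)

    # Positivity: every covariate level seen in the target must occur in the trial.
    missing = set(target_freq) - set(trial_freq)
    if missing:
        raise ValueError(f"target levels absent from trial: {missing}")

    def weight(x):
        # Density ratio P(x | target) / P(x | trial).
        return (target_freq[x] / n_target) / (trial_freq[x] / n_trial)

    def arm_mean(arm):
        # Weighted (Hajek-style) mean outcome within one randomized arm.
        pairs = [(weight(x), y) for x, a, y in trial if a == arm]
        total_w = sum(w for w, _ in pairs)
        return sum(w * y for w, y in pairs) / total_w

    return arm_mean(1) - arm_mean(0)


if __name__ == "__main__":
    # Hypothetical toy data: the effect is 1 when x = 0 and 3 when x = 1.
    trial = [(0, 0, 0.0), (0, 1, 1.0), (1, 0, 0.0), (1, 1, 3.0)]
    target_x = [0, 1, 1, 1]  # the target population is 75% x = 1
    print(transported_itt(trial, target_x))  # 2.5, versus 2.0 unweighted
```

The trial's covariate mix is 50/50, so the unweighted contrast is 2.0. Reweighting toward a target that is 75% x = 1 moves the estimate to 0.25(1) + 0.75(3) = 2.5.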

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2302.07840

Causally-interpretable meta-analysis: clearly-defined causal effects and two case studies

Authors: Kollin W. Rott, Gert Bronfort, Haitao Chu, Jared D. Huling, Brent Leininger, Mohammad Hassan Murad, Zhen Wang, James S. Hodges

Abstract: Meta-analysis is commonly used to combine results from multiple clinical trials, but traditional meta-analysis methods do not refer explicitly to a population of individuals to whom the results apply, and it is not clear how to use their results to assess a treatment's effect for a population of interest. We describe recently introduced causally-interpretable meta-analysis methods and apply their treatment effect estimators to two individual-participant data sets. These estimators transport estimated treatment effects from studies in the meta-analysis to a specified target population using individuals' potentially effect-modifying covariates. We consider different regression and weighting methods within this approach and compare the results to traditional aggregated-data meta-analysis methods. In our applications, certain versions of the causally-interpretable methods performed somewhat better than the traditional methods, but the latter generally did well. The causally-interpretable methods offer the most promise when covariates modify treatment effects, and our results suggest that traditional methods work well when there is little effect heterogeneity. The causally-interpretable approach gives meta-analysis an appealing theoretical framework by relating an estimator directly to a specific population and lays a solid foundation for future developments.
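The regression-based version of this approach can be sketched as outcome-model standardization (the g-formula). It assumes individual-participant data pooled from several trials, a single discrete covariate, and a saturated "regression" given by cell means. It also assumes that conditional effects are shared across trials. This is not the specific estimator used in the paper, and the function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def standardized_effect(trials, target_x):
    """Outcome-regression (g-formula) estimate of a treatment effect in a
    target population, pooling individual-participant data across trials.

    trials: dict mapping a trial label to a list of (x, a, y) tuples.
    target_x: list of covariate values sampled from the target population.
    """
    # Saturated outcome model: mean outcome in each (covariate, arm) cell,
    # pooled across trials (assumes conditional effects are shared).
    cells = defaultdict(list)
    for rows in trials.values():
        for x, a, y in rows:
            cells[(x, a)].append(y)
    mu = {key: mean(ys) for key, ys in cells.items()}

    # Average the predicted conditional effect over the target's covariates;
    # a KeyError signals a covariate level that no trial observed.
    return mean(mu[(x, 1)] - mu[(x, 0)] for x in target_x)


if __name__ == "__main__":
    # Hypothetical data: trial A enrolls only x = 0, trial B only x = 1.
    trials = {
        "A": [(0, 0, 0.0), (0, 1, 1.0)],
        "B": [(1, 0, 0.0), (1, 1, 3.0), (1, 0, 1.0), (1, 1, 4.0)],
    }
    print(standardized_effect(trials, [0, 1, 1, 1]))  # 2.5
```

In this toy example the conditional effects are 1 (for x = 0) and 3 (for x = 1). Averaging them over a target that is 75% x = 1 gives 2.5. The weighting methods mentioned in the abstract target the same quantity by reweighting participants instead of modeling outcomes.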

Submitted 15 February, 2023; originally announced February 2023.

Comments: 31 pages, 2 figures. Submitted to Research Synthesis Methods

arXiv:2302.03544

Causally-Interpretable Random-Effects Meta-Analysis

Authors: Justin M. Clark, Kollin W. Rott, James S. Hodges, Jared D. Huling

Abstract: Recent work has made important contributions in the development of causally-interpretable meta-analysis. These methods transport treatment effects estimated in a collection of randomized trials to a target population of interest. Ideally, estimates targeted toward a specific population are more interpretable and relevant to policy-makers and clinicians. However, between-study heterogeneity not arising from differences in the distribution of treatment effect modifiers can raise difficulties in synthesizing estimates across trials. The existence of such heterogeneity, including variations in treatment modality, also complicates the interpretation of transported estimates as a generic effect in the target population. We propose a conceptual framework and estimation procedures that attempt to account for such heterogeneity, and develop inferential techniques that aim to capture the accompanying excess variability in causal estimates. This framework also seeks to clarify the kind of treatment effects that are amenable to the techniques of generalizability and transportability.
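In conventional aggregated-data meta-analysis, between-study heterogeneity is usually summarized by a random-effects variance τ². As a point of reference only, here is a minimal sketch of the classical DerSimonian–Laird moment estimator. It is the standard traditional method, not the framework proposed in this paper, and the function name is hypothetical.

```python
def dersimonian_laird(estimates, variances):
    """Random-effects meta-analysis via the DerSimonian-Laird moment estimator.

    estimates: per-study effect estimates y_i.
    variances: their within-study sampling variances v_i.
    Returns (pooled_effect, tau2), where tau2 is the between-study variance.
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q: weighted dispersion of the study estimates around y_fe.
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # negative moment estimates truncated at 0
    # Random-effects weights add tau2 to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, tau2
```

For example, `dersimonian_laird([0.0, 2.0], [1.0, 1.0])` gives Q = 2, so τ² = (2 − 1)/1 = 1, and the pooled effect is 1. A single τ² of this kind absorbs heterogeneity whatever its source, which is the interpretive difficulty the abstract raises for transported estimates.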

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2003.01946

doi 10.1177/1471082X211015452

Alleviating confounding in spatio-temporal areal models with an application on crimes against women in India

Authors: A. Adin, T. Goicoa, J. S. Hodges, P. Schnell, M. D. Ugarte

Abstract: Assessing associations between a response of interest and a set of covariates in spatial areal models is the leitmotiv of ecological regression. However, the presence of spatially correlated random effects can mask or even bias estimates of such associations due to confounding effects if they are not carefully handled. Though potentially harmful, confounding issues have often been ignored in practice, leading to wrong conclusions about the underlying associations between the response and the covariates. In spatio-temporal areal models, the temporal dimension may emerge as a new source of confounding, and the problem may be even worse. In this work, we propose two approaches to deal with confounding of fixed effects by spatial and temporal random effects, while obtaining good model predictions. In particular, restricted regression and an apparently equivalent (though in fact not equivalent) procedure using constraints are proposed within both fully Bayes and empirical Bayes approaches. The methods are compared in terms of fixed-effect estimates and model selection criteria. The techniques are used to assess the association between dowry deaths and certain socio-demographic covariates in the districts of Uttar Pradesh, India.
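The core idea of restricted regression can be written in a few lines. In its generic spatial form, the random effect is projected onto the orthogonal complement of the fixed-effect design matrix. The notation below is assumed for illustration: linear predictor η, design matrix X, spatial random effect ξ. The paper's spatio-temporal models add analogous temporal and interaction terms.

```latex
\eta = X\beta + \xi, \qquad
P = X(X^{\top}X)^{-1}X^{\top}, \qquad
\eta_{\mathrm{RR}} = X\beta + (I - P)\,\xi,
\qquad\text{with } X^{\top}(I - P) = 0 .
```

Because the restricted effect (I − P)ξ is orthogonal to every column of X, it cannot compete with the covariates for the variation they explain. That competition is the source of the confounding the abstract describes.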

Submitted 7 April, 2021; v1 submitted 4 March, 2020; originally announced March 2020.

Journal ref: Statistical Modelling 2021

arXiv:1905.08381

Statistical methods research done as science rather than mathematics

Authors: James S. Hodges

Abstract: This paper is about how we study statistical methods. As an example, it uses the random regressions model, in which the intercept and slope of cluster-specific regression lines are modeled as a bivariate random effect. Maximizing this model's restricted likelihood often gives a boundary value for the random effect correlation or variances. We argue that this is a problem; that it is a problem because our discipline has little understanding of how contemporary models and methods map data to inferential summaries; that we lack such understanding, even for models as simple as this, because of a near-exclusive reliance on mathematics as a means of understanding; and that math alone is no longer sufficient. We then argue that as a discipline, we can and should break open our black-box methods by mimicking the five steps that molecular biologists commonly use to break open Nature's black boxes: design a simple model system, formulate hypotheses using that system, test them in experiments on that system, iterate as needed to reformulate and test hypotheses, and finally test the results in an "in vivo" system. We demonstrate this by identifying conditions under which the random-regressions restricted likelihood is likely to be maximized at a boundary value. Resistance to this approach seems to arise from a view that it lacks the certainty or intellectual heft of mathematics, perhaps because simulation experiments in our literature rarely do more than measure a new method's operating characteristics in a small range of situations.
We argue that such work can make useful contributions including, as in molecular biology, the findings themselves and sometimes the designs used in the five steps; that these contributions have as much practical value as mathematical results; and that therefore they merit publication as much as the mathematical results our discipline esteems so highly.
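For concreteness, the random regressions model described in this abstract can be written as follows. The notation is assumed for illustration: cluster i, observation j within it, and covariate x_ij.

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,x_{ij} + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim N(0, \sigma_e^2), \\
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix}
  &\sim N_2\!\left( \mathbf{0},\;
  \begin{pmatrix} \sigma_0^2 & \rho\,\sigma_0\sigma_1 \\
                  \rho\,\sigma_0\sigma_1 & \sigma_1^2 \end{pmatrix} \right).
\end{aligned}
```

A boundary value in this model means the restricted-likelihood maximizer lands at |ρ| = 1, σ₀² = 0, or σ₁² = 0. At each of these points the random-effect covariance matrix is singular.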

Submitted 20 May, 2019; originally announced May 2019.

arXiv:1904.07672

Constraints in Random Effects Age-Period-Cohort Models

Authors: Liying Luo, James S. Hodges

Abstract: Random effects (RE) models have been widely used to study the contextual effects of structures such as neighborhood or school. The RE approach has recently been applied to age-period-cohort (APC) models that are unidentified because the predictors are exactly linearly dependent. However, it has not been fully understood how the RE specification identifies these otherwise unidentified APC models. We address this challenge by first making explicit that RE-APC models have greater, not less, rank deficiency than the traditional fixed-effects model, followed by two empirical examples. We then provide intuition and a mathematical proof to explain that for APC models with one RE, treating one effect as an RE is equivalent to constraining the estimates of that effect's linear component and the random intercept to be zero. For APC models with two REs, the effective constraints implied by the model depend on the true (i.e., in the data-generating mechanism) non-linear components of the effects that are modeled as REs, so that the estimated linear components of the REs are determined by the true non-linear components of those effects. In conclusion, RE-APC models impose arbitrary though highly obscure constraints and thus do not differ qualitatively from other constrained APC estimators.
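The exact linear dependence mentioned in the abstract can be shown in two lines. The notation is assumed for illustration: a, p, and c are age, period, and birth cohort, with c = p − a, and α, π, γ are the coefficients of their linear trends.

```latex
\begin{aligned}
\alpha a + \pi p + \gamma c
  &= \alpha a + \pi p + \gamma (p - a)
   = (\alpha - \gamma)\,a + (\pi + \gamma)\,p, \\
(\alpha + t)\,a + (\pi - t)\,p + (\gamma + t)\,c
  &= \alpha a + \pi p + \gamma c \quad \text{for every real } t .
\end{aligned}
```

The data therefore cannot distinguish (α, π, γ) from (α + t, π − t, γ + t). Any estimator, including an RE specification, must fix t through some constraint, whether stated explicitly or implied by the model.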

Submitted 16 April, 2019; originally announced April 2019.

Comments: Submitted to "Sociological Methodology"

arXiv:1805.01010

Toward a diagnostic toolkit for linear models with Gaussian-process distributed random effects

Authors: Maitreyee Bose, James S. Hodges, Sudipto Banerjee

Abstract: Gaussian processes (GPs) are widely used as distributions of random effects in linear mixed models, which are fit using the restricted likelihood or the closely-related Bayesian analysis. This article addresses two problems. First, we propose tools for understanding how data determine estimates in these models, using a spectral basis approximation to the GP under which the restricted likelihood is formally identical to the likelihood for a gamma-errors GLM with identity link. Second, to examine the data's support for a covariate and to understand how adding that covariate moves variation in the outcome y out of the GP and error parts of the fit, we apply a linear-model diagnostic, the added variable plot (AVP), both to the original observations and to projections of the data onto the spectral basis functions. The spectral- and observation-domain AVPs estimate the same coefficient for a covariate but emphasize low- and high-frequency data features respectively and thus highlight the covariate's effect on the GP and error parts of the fit respectively. The spectral approximation applies to data observed on a regular grid; for data observed at irregular locations, we propose smoothing the data to a grid before applying our methods. The methods are illustrated using the forest-biomass data of Finley et al. (2008).
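The added variable plot rests on the Frisch–Waugh–Lovell result. Regress y on the existing covariates and keep the residuals. Regress the new covariate z on the same covariates and keep those residuals too. The slope of the first set of residuals on the second equals z's coefficient in the full regression. Below is a minimal observation-domain sketch, assuming one existing covariate x plus an intercept. A spectral-domain AVP would apply the same steps to data projected onto spectral basis functions. The function names are hypothetical.

```python
def residualize(v, x):
    """Residuals from the least-squares fit of v on an intercept and x."""
    n = len(x)
    mx, mv = sum(x) / n, sum(v) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [vi - mv - slope * (xi - mx) for xi, vi in zip(x, v)]


def added_variable_plot(y, x, z):
    """Coordinates and slope of an added variable plot for a new covariate z.

    Returns (ez, ey, slope): the residuals of z and of y after regressing each
    on (1, x), and the least-squares slope of ey on ez, which equals z's
    coefficient in the full regression of y on (1, x, z).
    """
    ey, ez = residualize(y, x), residualize(z, x)
    # Both residual vectors have mean zero, so the slope needs no intercept.
    slope = sum(a * b for a, b in zip(ez, ey)) / sum(a * a for a in ez)
    return ez, ey, slope


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    z = [1.0, 0.0, 3.0, 1.0, 2.0]
    y = [1.0 + 2.0 * xi + 3.0 * zi for xi, zi in zip(x, z)]  # true z-coefficient 3
    print(added_variable_plot(y, x, z)[2])  # approximately 3.0
```

Plotting ey against ez gives the AVP. Points far out along the ez axis have the most leverage on the covariate's estimated coefficient.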

Submitted 2 May, 2018; originally announced May 2018.

arXiv:1704.07848

Spatial disease mapping using Directed Acyclic Graph Auto-Regressive (DAGAR) models

Authors: Abhirup Datta, Sudipto Banerjee, James S. Hodges

Abstract: Hierarchical models for regionally aggregated disease incidence data commonly involve region-specific latent random effects that are modeled jointly as having a multivariate Gaussian distribution. The covariance or precision matrix incorporates the spatial dependence between the regions. Common choices for the precision matrix include the widely used ICAR model, which is singular, and its nonsingular extension, which lacks interpretability. We propose a new parametric model for the precision matrix based on a directed acyclic graph (DAG) representation of the spatial dependence. Our model guarantees positive definiteness and, hence, in addition to being a valid prior for regional spatially correlated random effects, can also directly model the outcome from dependent data like images and networks. Theoretical results establish a link between the parameters in our model and the variance and covariances of the random effects. Substantive simulation studies demonstrate that the improved interpretability of our model reaps benefits in terms of accurately recovering the latent spatial random effects as well as for inference on the spatial covariance parameters. Under modest spatial correlation, our model far outperforms the CAR models, while the performances are similar when the spatial correlation is strong. We also assess sensitivity to the choice of the ordering in the DAG construction using theoretical and empirical results which testify to the robustness of our model.
We also present a large-scale public health application demonstrating the competitive performance of the model.
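The positive-definiteness guarantee comes from the DAG factorization. Below is a minimal sketch of the generic construction Q = (I − B)ᵀ F (I − B). Here B is strictly lower triangular, because each region depends only on neighbours earlier in the ordering, and F is diagonal with positive entries. The specific weights are an assumption modelled on a DAGAR-style choice and should be checked against the paper: b_ij = ρ/(1 + (k_i − 1)ρ²) and F_ii = (1 + (k_i − 1)ρ²)/(1 − ρ²), where k_i is the number of directed neighbours of region i. The function name is hypothetical.

```python
def dag_precision(neighbors, rho):
    """Build Q = (I - B)^T F (I - B) for a DAG-based areal spatial model.

    neighbors: dict mapping each region index 0..n-1 (already in the chosen
               ordering) to the list of its adjacent regions.
    rho: spatial dependence parameter, assumed to lie in [0, 1).
    """
    n = len(neighbors)
    # L holds I - B; B is strictly lower triangular because each region
    # depends only on neighbours that come earlier in the ordering.
    L = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    f = [0.0] * n
    for i in range(n):
        directed = [j for j in neighbors[i] if j < i]
        scale = 1.0 + (len(directed) - 1) * rho ** 2
        for j in directed:
            L[i][j] = -rho / scale  # assumed DAGAR-style weight b_ij
        # Conditional precision of region i; equals 1 when it has no
        # directed neighbours, since then scale = 1 - rho^2.
        f[i] = scale / (1.0 - rho ** 2)
    # Q[r][c] = sum_i L[i][r] * f[i] * L[i][c]; positive definite because
    # I - B is unit lower triangular (determinant 1) and every f[i] > 0.
    return [
        [sum(L[i][r] * f[i] * L[i][c] for i in range(n)) for c in range(n)]
        for r in range(n)
    ]


if __name__ == "__main__":
    # Path graph 0-1-2-3, ordered end to end.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    for row in dag_precision(path, 0.5):
        print([round(v, 4) for v in row])
```

This gives a quick sanity check. On a path graph ordered end to end, every region after the first has exactly one directed neighbour, and Q reduces to the familiar AR(1) precision matrix. Its ends have diagonal entries 1/(1 − ρ²), its interior diagonal is (1 + ρ²)/(1 − ρ²), and its neighbour entries are −ρ/(1 − ρ²). An ICAR precision matrix, by contrast, has rows summing to zero and is therefore singular.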

Submitted 28 April, 2019; v1 submitted 25 April, 2017; originally announced April 2017.

arXiv:1011.0646

doi 10.1214/09-AOAS267

Smoothed ANOVA with spatial effects as a competitor to MCAR in multivariate spatial smoothing

Authors: Yufen Zhang, James S. Hodges, Sudipto Banerjee

Abstract: Rapid developments in geographical information systems (GIS) continue to generate interest in analyzing complex spatial datasets. One area of activity is in creating smoothed disease maps to describe the geographic variation of disease and generate hypotheses for apparent differences in risk. With multiple diseases, a multivariate conditionally autoregressive (MCAR) model is often used to smooth across space while accounting for associations between the diseases. The MCAR, however, imposes complex covariance structures that are difficult to interpret and estimate. This article develops a much simpler alternative approach building upon the techniques of smoothed ANOVA (SANOVA). Instead of simply shrinking effects without any structure, here we use SANOVA to smooth spatial random effects by taking advantage of the spatial structure. We extend SANOVA to cases in which one factor is a spatial lattice, which is smoothed using a CAR model, and a second factor is, for example, type of cancer. Datasets routinely lack enough information to identify the additional structure of MCAR. SANOVA offers a simpler and more intelligible structure than the MCAR while performing as well. We demonstrate our approach with simulation studies designed to compare SANOVA with different design matrices versus MCAR with different priors. Subsequently, a cancer-surveillance dataset describing the incidence of three cancers in Minnesota's 87 counties is analyzed using both approaches, showing the competitiveness of the SANOVA approach.

Submitted 2 November, 2010; originally announced November 2010.

Comments: Published at http://dx.doi.org/10.1214/09-AOAS267 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS267

Journal ref: Annals of Applied Statistics 2009, Vol. 3, No. 4, 1805-1830

Showing 1–9 of 9 results for author: Hodges, J S