-
Mediation with External Summary Statistic Information (MESSI)
Authors:
Jonathan Boss,
Wei Hao,
Amber Cathey,
Barrett M. Welch,
Kelly K. Ferguson,
John D. Meeker,
Jian Kang,
Bhramar Mukherjee
Abstract:
Environmental health studies are increasingly measuring endogenous omics data ($\boldsymbol{M}$) to study intermediary biological pathways by which an exogenous exposure ($\boldsymbol{A}$) affects a health outcome ($\boldsymbol{Y}$), given confounders ($\boldsymbol{C}$). Mediation analysis is frequently carried out to understand such mechanisms. If intermediary pathways are of interest, then there is likely literature establishing statistical and biological significance of the total effect, defined as the effect of $\boldsymbol{A}$ on $\boldsymbol{Y}$ given $\boldsymbol{C}$. For mediation models with continuous outcomes and mediators, we show that leveraging external summary-level information on the total effect improves estimation efficiency of the natural direct and indirect effects. Moreover, the efficiency gain depends on the asymptotic partial $R^2$ between the outcome ($\boldsymbol{Y}\mid\boldsymbol{M},\boldsymbol{A},\boldsymbol{C}$) and total effect ($\boldsymbol{Y}\mid\boldsymbol{A},\boldsymbol{C}$) models, with smaller (larger) values benefiting direct (indirect) effect estimation. We robustify our estimation procedure to incongenial external information by assuming the total effect follows a random distribution. This framework allows shrinkage towards the external information if the total effects in the internal and external populations agree. We illustrate our methodology using data from the Puerto Rico Testsite for Exploring Contamination Threats, where cytochrome P450 metabolites are hypothesized to mediate the effect of phthalate exposure on gestational age at delivery. External information on the total effect comes from a recently published pooled analysis of 16 studies. The proposed framework blends mediation analysis with emerging data integration techniques.
Submitted 28 June, 2023;
originally announced June 2023.
-
Bayesian Hierarchical Models for High-Dimensional Mediation Analysis with Coordinated Selection of Correlated Mediators
Authors:
Yanyi Song,
Xiang Zhou,
Jian Kang,
Max T. Aung,
Min Zhang,
Wei Zhao,
Belinda L. Needham,
Sharon L. R. Kardia,
Yongmei Liu,
John D. Meeker,
Jennifer A. Smith,
Bhramar Mukherjee
Abstract:
We consider Bayesian high-dimensional mediation analysis to identify, among a large set of correlated potential mediators, the active ones that mediate the effect of an exposure variable on an outcome of interest. Correlations among mediators are commonly observed in modern data analysis; examples include the activated voxels within connected regions in brain image data, regulatory signals driven by gene networks in genome data, and correlated exposure data from the same source. When correlations are present among active mediators, mediation analysis that fails to account for such correlation can be sub-optimal and may lead to a loss of power in identifying active mediators. Building upon a recent high-dimensional mediation analysis framework, we propose two Bayesian hierarchical models, one with a Gaussian mixture prior that enables correlated mediator selection and the other with a Potts mixture prior that accounts for the correlation among active mediators in mediation analysis. We develop efficient sampling algorithms for both methods. Various simulations demonstrate that our methods enable effective identification of correlated active mediators, which could be missed by existing methods that assume prior independence among active mediators. The proposed methods are applied to the LIFECODES birth cohort and the Multi-Ethnic Study of Atherosclerosis (MESA), where they identify new active mediators with important biological implications.
Submitted 23 September, 2020;
originally announced September 2020.
-
Using Undersampling with Ensemble Learning to Identify Factors Contributing to Preterm Birth
Authors:
Shi Dong,
Zlatan Feric,
Guangyu Li,
Chieh Wu,
April Z. Gu,
Jennifer Dy,
John Meeker,
Ingrid Y. Padilla,
Jose Cordero,
Carmen Velez Vega,
Zaira Rosario,
Akram Alshawabkeh,
David Kaeli
Abstract:
In this paper, we propose Ensemble Learning models to identify factors contributing to preterm birth. Our work leverages a rich dataset collected by an NIEHS P42 Center that is trying to identify the dominant factors responsible for the high rate of premature births in northern Puerto Rico. We investigate analytical models addressing two major challenges present in the dataset: 1) the significant amount of incomplete data, and 2) class imbalance. First, we leverage and compare two types of missing data imputation methods: 1) mean-based and 2) similarity-based, increasing the completeness of this dataset. Second, we propose a feature selection and evaluation model based on using undersampling with Ensemble Learning to address the class imbalance. We leverage and compare multiple Ensemble Feature selection methods, including Complete Linear Aggregation (CLA), Weighted Mean Aggregation (WMA), Feature Occurrence Frequency (OFA), and Classification Accuracy Based Aggregation (CAA). To further address missing data present in each feature, we propose two novel methods: 1) Missing Data Rate and Accuracy Based Aggregation (MAA), and 2) Entropy and Accuracy Based Aggregation (EAA). Both proposed models balance the degree of data variance introduced by the missing data handling during the feature selection process while maintaining model performance. Our results show a 42\% improvement in sensitivity versus fallout over previous state-of-the-art methods.
Submitted 23 September, 2020;
originally announced September 2020.
-
Bayesian Sparse Mediation Analysis with Targeted Penalization of Natural Indirect Effects
Authors:
Yanyi Song,
Xiang Zhou,
Jian Kang,
Max T. Aung,
Min Zhang,
Wei Zhao,
Belinda L. Needham,
Sharon L. R. Kardia,
Yongmei Liu,
John D. Meeker,
Jennifer A. Smith,
Bhramar Mukherjee
Abstract:
Causal mediation analysis aims to characterize an exposure's effect on an outcome and quantify the indirect effect that acts through a given mediator or a group of mediators of interest. With the increasing availability of measurements on a large number of potential mediators, like the epigenome or the microbiome, new statistical methods are needed to simultaneously accommodate high-dimensional mediators while directly targeting penalization of the natural indirect effect (NIE) for active mediator identification. Here, we develop two novel prior models for identification of active mediators in high-dimensional mediation analysis through penalizing NIEs in a Bayesian paradigm. Both methods specify a joint prior distribution on the exposure-mediator effect and mediator-outcome effect with either (a) a four-component Gaussian mixture prior or (b) a product threshold Gaussian prior. By jointly modeling the two parameters that contribute to the NIE, the proposed methods enable penalization on their product in a targeted way. Resultant inference can take into account the four-component composite structure underlying the NIE. We show through simulations that the proposed methods improve both selection and estimation accuracy compared to other competing methods. We applied our methods for an in-depth analysis of two ongoing epidemiologic studies: the Multi-Ethnic Study of Atherosclerosis (MESA) and the LIFECODES birth cohort. The identified active mediators in both studies reveal important biological pathways for understanding disease mechanisms.
Submitted 14 August, 2020;
originally announced August 2020.
-
A Hierarchical Integrative Group LASSO (HiGLASSO) Framework for Analyzing Environmental Mixtures
Authors:
Jonathan Boss,
Alexander Rix,
Yin-Hsiu Chen,
Naveen N. Narisetty,
Zhenke Wu,
Kelly K. Ferguson,
Thomas F. McElrath,
John D. Meeker,
Bhramar Mukherjee
Abstract:
Environmental health studies are increasingly measuring multiple pollutants to characterize the joint health effects attributable to exposure mixtures. However, the underlying dose-response relationship between toxicants and health outcomes of interest may be highly nonlinear, with possible nonlinear interaction effects. Existing penalized regression methods that account for exposure interactions either cannot accommodate nonlinear interactions while maintaining strong heredity or are computationally unstable in applications with limited sample size. In this paper, we propose a general shrinkage and selection framework to identify noteworthy nonlinear main and interaction effects among a set of exposures. We design hierarchical integrative group LASSO (HiGLASSO) to (a) impose strong heredity constraints on two-way interaction effects (hierarchical), (b) incorporate adaptive weights without necessitating initial coefficient estimates (integrative), and (c) induce sparsity for variable selection while respecting group structure (group LASSO). We prove sparsistency of the proposed method and apply HiGLASSO to an environmental toxicants dataset from the LIFECODES birth cohort, where the investigators are interested in understanding the joint effects of 21 urinary toxicant biomarkers on urinary 8-isoprostane, a measure of oxidative stress. An implementation of HiGLASSO is available in the higlasso R package, accessible through the Comprehensive R Archive Network.
Submitted 28 March, 2020;
originally announced March 2020.