-
Hierarchical selection of genetic and gene by environment interaction effects in high-dimensional mixed models
Authors:
Julien St-Pierre,
Karim Oualkacha
Abstract:
Interactions between genes and environmental factors may play a key role in the etiology of many common disorders. Several regularized generalized linear models (GLMs) have been proposed for hierarchical selection of gene by environment interaction (GEI) effects, where a GEI effect is selected only if the corresponding genetic main effect is also selected in the model. However, none of these methods allows random effects to be included to account for population structure, subject relatedness and shared environmental exposure. In this paper, we develop a unified approach based on regularized penalized quasi-likelihood (PQL) estimation to perform hierarchical selection of GEI effects in sparse regularized mixed models. We compare the selection and prediction accuracy of our proposed model with existing methods through simulations in the presence of population structure and shared environmental exposure. We show that, in all simulation scenarios, our proposed method enforces sparsity by controlling the number of false positives in the model while achieving the best predictive performance among the penalized methods compared. Finally, we apply our method to data from the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study and find that it retrieves previously reported significant loci.
Submitted 20 December, 2023;
originally announced December 2023.
-
Efficient Penalized Generalized Linear Mixed Models for Variable Selection and Genetic Risk Prediction in High-Dimensional Data
Authors:
Julien St-Pierre,
Karim Oualkacha,
Sahir Rai Bhatnagar
Abstract:
Sparse regularized regression methods are now widely used in genome-wide association studies (GWAS) to address the multiple testing burden that limits discovery of potentially important predictors. Linear mixed models (LMMs) have become an attractive alternative to principal components (PC) adjustment to account for population structure and relatedness in high-dimensional penalized models. However, their use in binary trait GWAS relies on the invalid assumption that the residual variance does not depend on the estimated regression coefficients. Moreover, LMMs use a single spectral decomposition of the covariance matrix of the responses, which is no longer possible in generalized linear mixed models (GLMMs). We introduce a new method called pglmm, a penalized GLMM that simultaneously selects genetic markers and estimates their effects, accounting for between-individual correlations and the binary nature of the trait. We develop a computationally efficient algorithm based on PQL estimation that scales regularized mixed models to high-dimensional binary trait GWAS (~300,000 SNPs). We show through simulations that, compared to pglmm, penalized LMM and logistic regression with PC adjustment fail to correctly select important predictors and/or have lower prediction accuracy for a binary response when the dimensionality of the relatedness matrix is high. Further, we demonstrate through the analysis of two polygenic binary traits in the UK Biobank data that our method can achieve higher predictive performance, while also selecting fewer predictors than a sparse regularized logistic lasso with PC adjustment. Our method is available as a Julia package, PenalizedGLMM.jl.
Submitted 24 June, 2022;
originally announced June 2022.
-
A copula-based set-variant association test for bivariate continuous or mixed phenotypes
Authors:
Julien St-Pierre,
Karim Oualkacha
Abstract:
In genome-wide association studies (GWAS), researchers often deal with non-normally distributed traits or a mixture of discrete and continuous traits. However, most current region-based methods rely on multivariate linear mixed models (mvLMMs) and assume a multivariate normal distribution for the phenotypes of interest. Hence, these methods are not applicable to disease or non-normally distributed traits. There is therefore a need for unified and flexible methods to study the association between a set of (possibly rare) genetic variants and non-normal multivariate phenotypes. Copulas are multivariate distribution functions with uniform margins on the $[0, 1]$ interval, and they provide suitable models to deal with non-normality of errors in multivariate association studies. We propose a novel unified and flexible Copula-Based Multivariate Association Test (CBMAT) for discovering association between a genetic region and a bivariate continuous or mixed phenotype. We also derive a data-driven analytic p-value procedure for the proposed region-based score-type test. Through simulation studies, we demonstrate that CBMAT has well-controlled type I error rates and higher power to detect associations than existing methods for discrete and non-normally distributed traits. Finally, we apply CBMAT to detect the association between two genes located on chromosome 11 and several lipid levels measured on 1,477 subjects from the ALSPAC study.
Submitted 29 September, 2021;
originally announced September 2021.
-
High-throughput molecular imaging via deep learning enabled Raman spectroscopy
Authors:
Conor C. Horgan,
Magnus Jensen,
Anika Nagelkerke,
Jean-Philippe St-Pierre,
Tom Vercauteren,
Molly M. Stevens,
Mads S. Bergholt
Abstract:
Raman spectroscopy enables non-destructive, label-free imaging with unprecedented molecular contrast but is limited by slow data acquisition, largely preventing high-throughput imaging applications. Here, we present a comprehensive framework for higher-throughput molecular imaging via deep learning enabled Raman spectroscopy, termed DeepeR, trained on a large dataset of hyperspectral Raman images, with over 1.5 million spectra (400 hours of acquisition) in total. We first perform denoising and reconstruction of low signal-to-noise ratio Raman molecular signatures via deep learning, with a 9x improvement in mean squared error over state-of-the-art Raman filtering methods. Next, we develop a neural network for robust 2-4x super-resolution of hyperspectral Raman images that preserves molecular cellular information. Combining these approaches, we achieve Raman imaging speed-ups of up to 160x, enabling high-resolution, high signal-to-noise ratio cellular imaging in under one minute. Finally, transfer learning is applied to extend DeepeR from cell to tissue-scale imaging. DeepeR provides a foundation that will enable a host of higher-throughput Raman spectroscopy and molecular imaging applications across biomedicine.
Submitted 28 September, 2020;
originally announced September 2020.