Search | arXiv e-print repository

Multi-Block Sparse Functional Principal Components Analysis for Longitudinal Microbiome Multi-Omics Data

Authors: Ling**g Jiang, Chris Elrod, Jane J. Kim, Austin D. Swafford, Rob Knight, Wesley K. Thompson

Abstract: Microbiome researchers often need to model the temporal dynamics of multiple complex, nonlinear outcome trajectories simultaneously. This motivates our development of multivariate Sparse Functional Principal Components Analysis (mSFPCA), extending existing SFPCA methods to simultaneously characterize multiple temporal trajectories and their inter-relationships. As with existing SFPCA methods, the mSFPCA algorithm characterizes each trajectory as a smooth mean plus a weighted combination of the smooth major modes of variation about the mean, where the weights are given by the component scores for each subject. Unlike existing SFPCA methods, the mSFPCA algorithm allows estimation of multiple trajectories simultaneously, such that the component scores, which are constrained to be independent within a particular outcome for identifiability, may be arbitrarily correlated with component scores for other outcomes. A Cholesky decomposition is used to estimate the component score covariance matrix efficiently and to guarantee positive semi-definiteness under these constraints. Mutual information is used to assess the strength of marginal and conditional temporal associations across outcome trajectories. Importantly, we implement mSFPCA as a Bayesian algorithm using R and Stan, which makes it easy to use tools such as PSIS-LOO for model selection and graphical posterior predictive checks to assess the validity of mSFPCA models.
Although we focus on the application of mSFPCA to microbiome data in this paper, the mSFPCA model is of general utility and can be used in a wide range of real-world applications.

Submitted 5 February, 2021; v1 submitted 29 January, 2021; originally announced February 2021.
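The following NumPy sketch illustrates the generative structure described in the abstract, assuming two outcomes with two component scores each. It shows three things:

- each trajectory is built as a smooth mean plus a score-weighted sum of modes;
- parameterising the joint score covariance as Sigma = L L^T, with L lower-triangular, keeps it positive semi-definite;
- mutual information can be computed between the two outcomes' score blocks, under the added assumption that the scores are jointly Gaussian.

The mean curves, modes, and entries of L are hypothetical. L was hand-picked so that the within-outcome covariances are zero, which is not necessarily how mSFPCA enforces that constraint. The Bayesian fitting in Stan is not shown.

```python
import numpy as np

# Illustrative setup (all values hypothetical): 2 outcomes, 2 FPC scores each.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 25)
n_subjects = 200

# Smooth mean curve and two smooth modes of variation per outcome.
means = [np.sin(np.pi * t), 1.0 + t**2]
modes = [
    np.vstack([np.sqrt(2) * np.cos(np.pi * t), np.sqrt(2) * np.sin(2 * np.pi * t)]),
    np.vstack([np.sqrt(2) * np.sin(np.pi * t), np.sqrt(2) * np.cos(2 * np.pi * t)]),
]

# Lower-triangular Cholesky factor of the joint (4 x 4) score covariance.
# Columns 0-1 are outcome 1's scores, columns 2-3 are outcome 2's scores.
# Entries were hand-picked so the within-outcome covariances come out as zero.
L = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.0, 0.0],
    [0.7, 0.0, 0.7, 0.0],
    [0.0, 0.3, 0.0, 0.4],
])
Sigma = L @ L.T  # symmetric positive semi-definite by construction

# Correlated scores: z ~ N(0, I)  =>  z @ L.T ~ N(0, L L^T).
scores = rng.standard_normal((n_subjects, 4)) @ L.T

# Each trajectory = smooth mean + score-weighted sum of modes of variation.
traj_1 = means[0] + scores[:, 0:2] @ modes[0]
traj_2 = means[1] + scores[:, 2:4] @ modes[1]


def gaussian_mutual_information(cov, idx_a, idx_b):
    """I(A; B) in nats for jointly Gaussian variables with covariance `cov`."""
    idx_ab = list(idx_a) + list(idx_b)
    _, logdet_a = np.linalg.slogdet(cov[np.ix_(idx_a, idx_a)])
    _, logdet_b = np.linalg.slogdet(cov[np.ix_(idx_b, idx_b)])
    _, logdet_ab = np.linalg.slogdet(cov[np.ix_(idx_ab, idx_ab)])
    return 0.5 * (logdet_a + logdet_b - logdet_ab)


mi_12 = gaussian_mutual_information(Sigma, [0, 1], [2, 3])
print(f"MI between outcome-1 and outcome-2 scores: {mi_12:.4f} nats")
```

In this toy setup the cross-outcome score pairs are independent bivariate normals with correlations 1/sqrt(2) and 0.6. The mutual information should therefore equal 0.5*ln(3.125), about 0.570 nats.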

arXiv:2012.00579

BayesTime: Bayesian Functional Principal Components for Sparse Longitudinal Data

Authors: Ling**g Jiang, Yuan Zhong, Chris Elrod, Loki Natarajan, Rob Knight, Wesley K. Thompson

Abstract: Modeling non-linear temporal trajectories is of fundamental interest in many application areas, such as longitudinal microbiome analysis. Many existing methods focus on estimating mean trajectories, but it is also often of value to assess the temporal patterns of individual subjects. Sparse functional principal components analysis (SFPCA) serves as a useful tool for assessing individual variation in non-linear trajectories; however, its application to real data often requires careful model selection criteria and diagnostic tools. Here, we propose a Bayesian approach to SFPCA, which allows users to apply efficient leave-one-out cross-validation (LOO) with Pareto-smoothed importance sampling (PSIS) for model selection, and to use the estimated shape parameter from PSIS-LOO, together with posterior predictive checks, for graphical model diagnostics. This Bayesian implementation thus enables careful application of SFPCA to a wide range of longitudinal data applications.

Submitted 1 December, 2020; originally announced December 2020.
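The abstract relies on PSIS-LOO. The sketch below illustrates only the underlying idea: it computes the raw importance-sampling LOO estimate from a matrix of pointwise log-likelihoods over posterior draws. It deliberately omits two things that PSIS adds, the Pareto-smoothing step and the shape-parameter (k-hat) diagnostic; both are implemented in, for example, the loo R package. The toy normal model and its posterior draws are assumptions for demonstration.

```python
import numpy as np


def is_loo_elpd(log_lik):
    """Raw importance-sampling LOO (no Pareto smoothing).

    log_lik: array of shape (S posterior draws, N observations) holding
    log p(y_i | theta_s). Uses p(y_i | y_-i) ~= 1 / mean_s[1 / p(y_i | theta_s)].
    Returns (total elpd_loo, per-observation elpd_loo).
    """
    n_draws = log_lik.shape[0]
    neg = -log_lik
    # Stable log-sum-exp over draws of exp(-log_lik).
    m = neg.max(axis=0)
    log_mean_inv_lik = m + np.log(np.exp(neg - m).sum(axis=0)) - np.log(n_draws)
    elpd_i = -log_mean_inv_lik
    return elpd_i.sum(), elpd_i


def in_sample_lpd(log_lik):
    """Sum over observations of log(posterior-mean likelihood)."""
    n_draws = log_lik.shape[0]
    m = log_lik.max(axis=0)
    lpd_i = m + np.log(np.exp(log_lik - m).sum(axis=0)) - np.log(n_draws)
    return lpd_i.sum()


# Toy example (hypothetical): normal data with known unit variance and
# exact posterior draws for the mean under a flat prior.
rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=40)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=4000)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2

elpd_loo, _ = is_loo_elpd(log_lik)
lpd = in_sample_lpd(log_lik)
print(f"IS-LOO elpd: {elpd_loo:.2f}, in-sample lpd: {lpd:.2f}")
```

The LOO estimate is always at most the in-sample value, because a harmonic mean never exceeds an arithmetic mean. In raw importance sampling, a few very large weights can dominate this harmonic mean; PSIS smooths those weights and flags unreliable cases via k-hat.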

arXiv:2002.03419

The Alzheimer's Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 Year Follow-up

Authors: Razvan V. Marinescu, Neil P. Oxtoby, Alexandra L. Young, Esther E. Bron, Arthur W. Toga, Michael W. Weiner, Frederik Barkhof, Nick C. Fox, Arman Eshaghi, Tina Toni, Marcin Salaterski, Veronika Lunina, Manon Ansart, Stanley Durrleman, Pascal Lu, Samuel Iddi, Dan Li, Wesley K. Thompson, Michael C. Donohue, Aviv Nahon, Yarden Levy, Dan Halbersberg, Mariya Cohen, Huiling Liao, Tengfei Li, et al. (71 additional authors not shown)

Abstract: We present the findings of "The Alzheimer's Disease Prediction Of Longitudinal Evolution" (TADPOLE) Challenge, which compared the performance of 92 algorithms from 33 international teams at predicting the future trajectory of 219 individuals at risk of Alzheimer's disease. Challenge participants were required to make a prediction, for each month of a 5-year future time period, of three key outcomes: clinical diagnosis, Alzheimer's Disease Assessment Scale Cognitive Subdomain (ADAS-Cog13), and total volume of the ventricles. The methods used by challenge participants included multivariate linear regression, machine learning methods such as support vector machines and deep neural networks, as well as disease progression models. No single submission was best at predicting all three outcomes. For clinical diagnosis and ventricle volume prediction, the best algorithms strongly outperformed simple baselines in predictive ability. However, for ADAS-Cog13, no single submitted prediction method was significantly better than random guesswork. Two ensemble methods, based on taking the mean and the median over all predictions, obtained top scores on almost all tasks. Better-than-average performance at diagnosis prediction was generally associated with the additional inclusion of features from cerebrospinal fluid (CSF) samples and diffusion tensor imaging (DTI). On the other hand, better performance at ventricle volume prediction was associated with the inclusion of summary statistics, such as the slope or maxima/minima of biomarkers.
TADPOLE's unique results suggest that current prediction algorithms are accurate enough to exploit biomarkers related to clinical diagnosis and ventricle volume for cohort refinement in clinical trials for Alzheimer's disease. However, the results call into question the use of cognitive test scores for patient selection and as a primary endpoint in clinical trials.

Submitted 27 December, 2021; v1 submitted 9 February, 2020; originally announced February 2020.

Comments: Presents final results of the TADPOLE competition. 60 pages, 7 tables, 14 figures

Journal ref: Machine Learning for Biomedical Imaging (MELBA), Dec 2021
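The mean and median ensembles described in the abstract are simple to express. A minimal sketch follows, assuming each method outputs a point prediction for a continuous outcome (such as ventricle volume or ADAS-Cog13). It uses mean absolute error as a simple stand-in score. TADPOLE's actual evaluation metrics and its probabilistic diagnosis predictions are not reproduced, and all data are simulated.

```python
import numpy as np


def ensemble_mean_median(preds):
    """preds: (n_methods, n_subjects) point predictions of a continuous outcome."""
    return preds.mean(axis=0), np.median(preds, axis=0)


def mean_absolute_error(pred, truth):
    return float(np.mean(np.abs(pred - truth)))


# Hypothetical data: 10 methods predicting one continuous outcome for 200
# subjects, each with independent noise, plus one badly biased method.
rng = np.random.default_rng(3)
truth = rng.normal(0.0, 1.0, size=200)
preds = truth[None, :] + rng.normal(0.0, 1.0, size=(10, 200))
preds[0] += 5.0  # a single outlying submission

ens_mean, ens_median = ensemble_mean_median(preds)
individual = [mean_absolute_error(p, truth) for p in preds]
print("average individual MAE:", round(float(np.mean(individual)), 3))
print("mean-ensemble MAE:     ", round(mean_absolute_error(ens_mean, truth), 3))
print("median-ensemble MAE:   ", round(mean_absolute_error(ens_median, truth), 3))
```

By the triangle inequality, the mean ensemble can never have a larger MAE than the average of the individual MAEs. The median is additionally robust to the single outlying method.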

arXiv:1902.02026

doi: 10.1016/j.trci.2019.04.004

The relative efficiency of time-to-progression and continuous measures of cognition in pre-symptomatic Alzheimer's

Authors: Dan Li, Samuel Iddi, Paul S. Aisen, Wesley K. Thompson, Michael C. Donohue

Abstract: Pre-symptomatic (or preclinical) Alzheimer's disease is defined by biomarker evidence of fibrillar amyloid-beta pathology in the absence of clinical symptoms. Clinical trials in this early phase of disease are challenging due to the slow rate of disease progression, as measured by periodic cognitive performance tests or by transition to a diagnosis of Mild Cognitive Impairment. In a multisite study, experts provide diagnoses by central chart review without the benefit of in-person assessment. We use a simulation study to demonstrate that models of repeated cognitive assessments detect treatment effects more efficiently than models of time-to-progression to an endpoint such as a change in diagnosis. Multivariate continuous data are simulated from a Bayesian joint mixed effects model fit to data from the Alzheimer's Disease Neuroimaging Initiative. Simulated progression events are derived algorithmically from the continuous assessments using a random forest model fit to the same data. We find that power is approximately doubled with models of repeated continuous outcomes compared to the time-to-progression analysis. The simulations also demonstrate that a plausible informative missing-data pattern can induce a bias that inflates treatment effects, yet the 5% Type I error rate is maintained.

Submitted 6 February, 2019; originally announced February 2019.

Comments: 16 pages, 4 figures

Journal ref: Alzheimer's & Dementia: Translational Research & Clinical Interventions (2019)
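The efficiency argument in the abstract can be illustrated with a deliberately simplified simulation. This is a sketch under strong assumptions, not the authors' design:

- a single post-treatment normal outcome stands in for repeated cognitive assessments;
- dichotomising that outcome at a hypothetical threshold stands in for a progression endpoint.

The paper instead simulates from a Bayesian joint mixed effects model and derives progression events with a random forest. All parameter values below are illustrative.

```python
import math
import numpy as np


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_power(effect, n_per_arm=100, threshold=0.0, n_sims=2000, alpha=0.05, seed=1):
    """Rejection rates of a continuous vs. a dichotomised analysis.

    Hypothetical outcome: change in a cognitive score, where treatment shifts
    the mean by `effect` SD units and "progression" means falling below
    `threshold`.
    """
    rng = np.random.default_rng(seed)
    reject_cont = reject_bin = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_arm)
        treated = rng.normal(effect, 1.0, n_per_arm)

        # Continuous analysis: large-sample z-test on the difference in means.
        se = math.sqrt(control.var(ddof=1) / n_per_arm + treated.var(ddof=1) / n_per_arm)
        z_cont = (treated.mean() - control.mean()) / se
        reject_cont += two_sided_p(z_cont) < alpha

        # Dichotomised analysis: pooled two-proportion z-test on "progression".
        p_c = np.mean(control < threshold)
        p_t = np.mean(treated < threshold)
        p_pool = (p_c + p_t) / 2.0
        se_bin = math.sqrt(p_pool * (1.0 - p_pool) * 2.0 / n_per_arm)
        z_bin = (p_c - p_t) / se_bin if se_bin > 0 else 0.0
        reject_bin += two_sided_p(z_bin) < alpha
    return reject_cont / n_sims, reject_bin / n_sims


power_cont, power_bin = simulate_power(effect=0.4)
type1_cont, type1_bin = simulate_power(effect=0.0)
print(f"power: continuous {power_cont:.2f}, dichotomised {power_bin:.2f}")
print(f"type I error: continuous {type1_cont:.3f}, dichotomised {type1_bin:.3f}")
```

Dichotomising discards information about how far each subject is from the threshold. At the same sample size, the binary analysis therefore detects a true effect less often, while both analyses keep roughly nominal type I error.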

arXiv:1703.10266

doi: 10.1177/0962280217737566

Bayesian latent time joint mixed effect models for multicohort longitudinal data

Authors: Dan Li, Samuel Iddi, Wesley K. Thompson, Michael C. Donohue

Abstract: Characterization of long-term disease dynamics, from disease-free to end-stage, is integral to understanding the course of neurodegenerative diseases such as Parkinson's and Alzheimer's and, ultimately, to determining how best to intervene. Natural history studies typically recruit multiple cohorts at different stages of disease and follow them longitudinally for a relatively short period of time. We propose a latent time joint mixed effects model to characterize long-term disease dynamics using this short-term data. Markov chain Monte Carlo methods are proposed for estimation, model selection, and inference. We apply the model to detailed simulation studies and to data from the Alzheimer's Disease Neuroimaging Initiative.

Submitted 10 January, 2018; v1 submitted 29 March, 2017; originally announced March 2017.

Journal ref: Stat. Methods Med. Res. (2017)
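The core idea of the latent time model is to place each subject's short follow-up on a common long-term disease timeline through a subject-specific time shift. It can be sketched with a single outcome, under these simplifications:

- the mean curve is assumed known;
- random effects and multiple outcomes are omitted;
- shifts are found by grid-search least squares instead of the paper's Markov chain Monte Carlo.

The curve shape, noise level, and shift range are hypothetical.

```python
import numpy as np


def long_term_curve(x):
    """Hypothetical disease curve on the latent (disease-time) scale, in years."""
    return 1.0 / (1.0 + np.exp(-x / 4.0))


rng = np.random.default_rng(2)
n_subjects = 60
visits = np.array([0.0, 1.0, 2.0, 3.0])  # short follow-up: 3 years per subject
true_shift = rng.uniform(-10.0, 10.0, n_subjects)  # latent disease stage at entry
noise = rng.normal(0.0, 0.02, (n_subjects, visits.size))
y = long_term_curve(visits[None, :] + true_shift[:, None]) + noise

# Profile least squares over a grid of candidate shifts, treating the
# long-term curve as known (the real model estimates it jointly).
grid = np.arange(-12.0, 12.0 + 1e-9, 0.05)
fitted = long_term_curve(visits[None, :] + grid[:, None])      # (n_grid, n_visits)
sse = ((y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)  # (n_subjects, n_grid)
est_shift = grid[sse.argmin(axis=1)]

latent_times = visits[None, :] + est_shift[:, None]
print("follow-up span per subject (years):", visits.max() - visits.min())
print("reconstructed latent-time span (years):", round(float(latent_times.max() - latent_times.min()), 1))
print("corr(true, estimated shift):", round(float(np.corrcoef(true_shift, est_shift)[0, 1]), 3))
```

Each subject contributes only 3 years of data, yet the aligned latent times cover roughly two decades of the disease curve. This is the sense in which short-term, multicohort data can characterize long-term dynamics.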

Showing 1–5 of 5 results for author: Thompson, W K