-
Scanner Invariant Representations for Diffusion MRI Harmonization
Authors:
Daniel Moyer,
Greg Ver Steeg,
Chantal M. W. Tax,
Paul M. Thompson
Abstract:
Purpose: In the present work we describe the correction of diffusion-weighted MRI for site and scanner biases using a novel method based on invariant representation.
Theory and Methods: Pooled imaging data from multiple sources are subject to variation between the sources. Correcting for these biases has become very important as imaging studies increase in size and multi-site cases become more common. We propose learning an intermediate representation invariant to site/protocol variables, a technique adapted from information theory-based algorithmic fairness; by leveraging the data processing inequality, such a representation can then be used to create an image reconstruction that is uninformative of its original source, yet still faithful to underlying structures. To implement this, we use a deep learning method based on variational auto-encoders (VAE) to construct scanner invariant encodings of the imaging data.
Results: To evaluate our method, we use training data from the 2018 MICCAI Computational Diffusion MRI (CDMRI) Challenge Harmonization dataset. Our proposed method shows improvements on independent test data relative to a recently published baseline method on each subtask, mapping data from three different scanning contexts to and from one separate target scanning context.
Conclusion: As imaging studies continue to grow, the use of pooled multi-site imaging will similarly increase. Invariant representation presents a strong candidate for the harmonization of these data.
Submitted 31 January, 2020; v1 submitted 10 April, 2019;
originally announced April 2019.
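The role of the data processing inequality in the abstract above can be written out as a short worked statement. This is a sketch under assumed notation that does not appear in the abstract: s is the site/scanner variable, x the input image, z the learned encoding, and \hat{x} a reconstruction decoded from z for a fixed target site.

```latex
% Assumed notation: s = site, x = image, z = encoding of x, \hat{x} = reconstruction from z.
% For a fixed target site, s -> x -> z -> \hat{x} forms a Markov chain, so
\[
  I(\hat{x};\, s) \;\le\; I(z;\, s).
\]
% Driving I(z; s) toward zero therefore bounds how much the reconstruction
% can reveal about the original site.
```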
-
Federated Learning in Distributed Medical Databases: Meta-Analysis of Large-Scale Subcortical Brain Data
Authors:
Santiago Silva,
Boris Gutman,
Eduardo Romero,
Paul M Thompson,
Andre Altmann,
Marco Lorenzi
Abstract:
At this moment, databanks worldwide contain brain images in previously unimaginable numbers. Combined with developments in data science, these massive data provide the potential to better understand the genetic underpinnings of brain diseases. However, different datasets, which are stored at different institutions, cannot always be shared directly due to privacy and legal concerns, thus limiting the full exploitation of big data in the study of brain disorders. Here we propose a federated learning framework for securely accessing and meta-analyzing any biomedical data without sharing individual information. We illustrate our framework by investigating brain structural relationships across diseases and clinical cohorts. The framework is first tested on synthetic data and then applied to multi-centric, multi-database studies including ADNI, PPMI, MIRIAD and UK Biobank, showing the potential of the approach for further applications in distributed analysis of multi-centric cohorts.
Submitted 14 March, 2019; v1 submitted 19 October, 2018;
originally announced October 2018.
-
Measures of Tractography Convergence
Authors:
Daniel Moyer,
Paul M. Thompson,
Greg Ver Steeg
Abstract:
In the present work, we use information theory to understand the empirical convergence rate of tractography, a widely-used approach to reconstruct anatomical fiber pathways in the living brain. Based on diffusion MRI data, tractography is the starting point for many methods to study brain connectivity. Of the available methods to perform tractography, most reconstruct a finite set of streamlines, or 3D curves, representing probable connections between anatomical regions, yet relatively little is known about how the sampling of this set of streamlines affects downstream results, and how exhaustive the sampling should be. Here we provide a method to measure the information theoretic surprise (self-cross entropy) for tract sampling schema. We then empirically assess four streamline methods. We demonstrate that the relative information gain is very low after a moderate number of streamlines have been generated for each tested method. The results give rise to several guidelines for optimal sampling in brain connectivity analyses.
Submitted 12 June, 2018;
originally announced June 2018.
-
A Fast, Accurate Two-Step Linear Mixed Model for Genetic Analysis Applied to Repeat MRI Measurements
Authors:
Qifan Yang,
Gennady V. Roshchupkin,
Wiro J. Niessen,
Sarah E. Medland,
Alyssa H. Zhu,
Paul M. Thompson,
Neda Jahanshad
Abstract:
Large-scale biobanks are being collected around the world in efforts to better understand human health and risk factors for disease. They often survey hundreds of thousands of individuals, combining questionnaires with clinical, genetic, demographic, and imaging assessments; some of this data may be collected longitudinally. Genetic association analysis of such datasets requires methods to properly handle relatedness, population structure, and other types of biases introduced by confounders. The most popular and accurate approaches rely on linear mixed model (LMM) algorithms, which are iterative; the computational complexity of each iteration scales with the square of the sample size, slowing the pace of discoveries (up to several days for single trait analysis) and, furthermore, limiting the use of repeat phenotypic measurements. Here, we describe our new, non-iterative, much faster and accurate Two-Step Linear Mixed Model (Two-Step LMM) approach, which has a computational complexity that scales linearly with sample size. We show that the first step retains accurate estimates of the heritability (the proportion of the trait variance explained by additive genetic factors), even when increasingly complex genetic relationships between individuals are modeled. The second step provides a faster framework to obtain the effect sizes of covariates in the regression model. We applied Two-Step LMM to real data from the UK Biobank, which recently released genotyping information and processed MRI data from 9,725 individuals. We used the left and right hippocampus volume (HV) as repeated measures, and observed increased and more accurate heritability estimation, consistent with simulations.
Submitted 15 March, 2019; v1 submitted 29 October, 2017;
originally announced October 2017.
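The heritability definition quoted in the abstract above can be made concrete with the usual linear mixed model variance decomposition. This is a sketch under standard LMM assumptions rather than the paper's exact formulation; the symbols (K for a genetic relationship matrix, \sigma_g^2 and \sigma_e^2 for the variance components) are assumed notation.

```latex
% Assumed standard LMM: y = X\beta + g + e,
% with g ~ N(0, \sigma_g^2 K) and e ~ N(0, \sigma_e^2 I).
\[
  h^2 \;=\; \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}
\]
% h^2 is the proportion of trait variance explained by additive genetic factors.
```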
-
Simultaneous Matrix Diagonalization for Structural Brain Networks Classification
Authors:
Nikita Mokrov,
Maxim Panov,
Boris A. Gutman,
Joshua I. Faskowitz,
Neda Jahanshad,
Paul M. Thompson
Abstract:
This paper considers the problem of brain disease classification based on connectome data. A connectome is a network representation of a human brain. The typical connectome classification problem is very challenging because of the small sample size and high dimensionality of the data. We propose to use simultaneous approximate diagonalization of adjacency matrices in order to compute their eigenstructures in a more stable way. The obtained approximate eigenvalues are further used as features for classification. The proposed approach is demonstrated to be efficient for detection of Alzheimer's disease, outperforming simple baselines and competing with state-of-the-art approaches to brain disease classification.
Submitted 14 October, 2017;
originally announced October 2017.
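One common way to state simultaneous approximate diagonalization is as an optimization over a shared orthogonal basis. The abstract above does not give the objective, so the formulation below is an assumption: A_1, ..., A_K denote symmetric adjacency matrices, Q a shared orthogonal matrix, and off(.) the sum of squared off-diagonal entries.

```latex
% Assumed formulation (not confirmed by the abstract):
\[
  \hat{Q} \;=\; \arg\min_{Q^\top Q = I} \;\sum_{k=1}^{K}
  \operatorname{off}\!\left(Q^\top A_k Q\right),
  \qquad
  \operatorname{off}(M) \;=\; \sum_{i \neq j} M_{ij}^2 .
\]
% Approximate eigenvalues used as features for subject k:
\[
  \lambda^{(k)} \;=\; \operatorname{diag}\!\left(\hat{Q}^\top A_k \hat{Q}\right).
\]
```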
-
Classification of Major Depressive Disorder via Multi-Site Weighted LASSO Model
Authors:
Dajiang Zhu,
Brandalyn C. Riedel,
Neda Jahanshad,
Nynke A. Groenewold,
Dan J. Stein,
Ian H. Gotlib,
Matthew D. Sacchet,
Danai Dima,
James H. Cole,
Cynthia H. Y. Fu,
Henrik Walter,
Ilya M. Veer,
Thomas Frodl,
Lianne Schmaal,
Dick J. Veltman,
Paul M. Thompson
Abstract:
Large-scale collaborative analysis of brain imaging data, in psychiatry and neurology, offers a new source of statistical power to discover features that boost accuracy in disease classification, differential diagnosis, and outcome prediction. However, due to data privacy regulations or limited accessibility to large datasets across the world, it is challenging to efficiently integrate distributed information. Here we propose a novel classification framework through multi-site weighted LASSO: each site performs an iterative weighted LASSO for feature selection separately. Within each iteration, the classification result and the selected features are collected to update the weighting parameters for each feature. This new weight is used to guide the LASSO process at the next iteration. Only the features that help to improve the classification accuracy are preserved. In tests on data from five sites (299 patients with major depressive disorder (MDD) and 258 normal controls), our method boosted classification accuracy for MDD by 4.9% on average. This result shows the potential of the proposed new strategy as an effective and practical collaborative platform for machine learning on large scale distributed imaging and biobank data.
Submitted 3 June, 2017; v1 submitted 26 May, 2017;
originally announced May 2017.
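A minimal sketch of the per-site weighted LASSO solve described in the abstract above, assuming a squared-error loss in place of the paper's classifier and a plain coordinate-descent solver. The function names, the toy data, and the per-feature weights are hypothetical; the rule for updating the weights between iterations is not given in the abstract and is not shown.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, weights, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam * sum_j weights[j] * |b_j|.

    X is a list of rows, y a list of targets, weights one penalty weight
    per feature (larger weight = feature is penalized more heavily).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero feature carries no signal
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Hypothetical toy data: the target depends only on feature 0.
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.2], [4.0, -0.4]]
y = [2.0, 4.0, 6.0, 8.0]
# Feature 1 carries a much larger penalty weight, so it is driven to zero.
beta = weighted_lasso(X, y, weights=[0.1, 10.0], lam=0.1)
print(beta)
```

In a multi-site setting, each site would rerun such a solve with weights revised from the pooled classification results; only the solve itself is illustrated here.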
-
Large-scale Feature Selection of Risk Genetic Factors for Alzheimer's Disease via Distributed Group Lasso Regression
Authors:
Qingyang Li,
Dajiang Zhu,
Jie Zhang,
Derrek Paul Hibar,
Neda Jahanshad,
Yalin Wang,
Jieping Ye,
Paul M. Thompson,
Jie Wang
Abstract:
Genome-wide association studies (GWAS) have achieved great success in the genetic study of Alzheimer's disease (AD). Collaborative imaging genetics studies across different research institutions show the effectiveness of detecting genetic risk factors. However, the high dimensionality of GWAS data poses significant challenges in detecting risk SNPs for AD. Selecting relevant features is crucial in predicting the response variable. In this study, we propose a novel Distributed Feature Selection Framework (DFSF) to conduct the large-scale imaging genetics studies across multiple institutions. To speed up the learning process, we propose a family of distributed group Lasso screening rules to identify irrelevant features and remove them from the optimization. Then we select the relevant group features by performing the group Lasso feature selection process in a sequence of parameters. Finally, we employ the stability selection to rank the top risk SNPs that might help detect the early stage of AD. To the best of our knowledge, this is the first distributed feature selection model integrated with group Lasso feature selection as well as detecting the risk genetic factors across multiple research institutions system. Empirical studies are conducted on 809 subjects with 5.9 million SNPs which are distributed across several individual institutions, demonstrating the efficiency and effectiveness of the proposed method.
Submitted 26 April, 2017;
originally announced April 2017.
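For reference, the group Lasso problem that the screening rules in the abstract above target is usually written as follows. This is the standard textbook form, not necessarily the exact objective used in the paper; the group weights w_g (often the square root of the group size) are an assumption.

```latex
% Standard group Lasso objective; \beta_g is the coefficient block for group g.
\[
  \min_{\beta} \;\; \frac{1}{2}\,\lVert y - X\beta \rVert_2^2
  \;+\; \lambda \sum_{g=1}^{G} w_g \,\lVert \beta_g \rVert_2
\]
% A screening rule identifies groups with \beta_g = 0 at the optimum
% before solving, so those features can be dropped from the optimization.
```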
-
A Restaurant Process Mixture Model for Connectivity Based Parcellation of the Cortex
Authors:
Daniel Moyer,
Boris A Gutman,
Neda Jahanshad,
Paul M. Thompson
Abstract:
One of the primary objectives of human brain mapping is the division of the cortical surface into functionally distinct regions, i.e. parcellation. While it is generally agreed that at macro-scale different regions of the cortex have different functions, the exact number and configuration of these regions is not known. Methods for the discovery of these regions are thus important, particularly as the volume of available information grows. Towards this end, we present a parcellation method based on a Bayesian non-parametric mixture model of cortical connectivity.
Submitted 2 March, 2017;
originally announced March 2017.
-
A Continuous Model of Cortical Connectivity
Authors:
Daniel Moyer,
Boris A. Gutman,
Joshua Faskowitz,
Neda Jahanshad,
Paul M. Thompson
Abstract:
We present a continuous model for structural brain connectivity based on the Poisson point process. The model treats each streamline curve in a tractography as an observed event in connectome space, here a product space of cortical white matter boundaries. We approximate the model parameter via kernel density estimation. To deal with the heavy computational burden, we develop a fast parameter estimation method by pre-computing associated Legendre products of the data, leveraging properties of the spherical heat kernel. We show how our approach can be used to assess the quality of cortical parcellations with respect to connectivity. We further present empirical results that suggest the discrete connectomes derived from our model have substantially higher test-retest reliability compared to standard methods.
Submitted 5 November, 2018; v1 submitted 12 October, 2016;
originally announced October 2016.
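The link between the continuous model in the abstract above and a discrete connectome follows from the defining property of a Poisson point process. The notation is assumed for illustration: \lambda(x, y) is the intensity over pairs of cortical surface points, and E_1, E_2 are two parcellation regions.

```latex
% Streamline count between regions E_1 and E_2 under a Poisson point process
% with intensity \lambda (assumed notation):
\[
  N(E_1, E_2) \;\sim\; \operatorname{Poisson}\!\left(
    \int_{E_1}\!\int_{E_2} \lambda(x, y)\, dy\, dx \right)
\]
% Integrating an estimate of \lambda over each region pair yields the
% expected edge weights of a discrete connectome for that parcellation.
```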
-
Large-scale Collaborative Imaging Genetics Studies of Risk Genetic Factors for Alzheimer's Disease Across Multiple Institutions
Authors:
Qingyang Li,
Tao Yang,
Liang Zhan,
Derrek Paul Hibar,
Neda Jahanshad,
Yalin Wang,
Jieping Ye,
Paul M. Thompson,
Jie Wang
Abstract:
Genome-wide association studies (GWAS) offer new opportunities to identify genetic risk factors for Alzheimer's disease (AD). Recently, collaborative efforts across different institutions emerged that enhance the power of many existing techniques on individual institution data. However, a major barrier to collaborative studies of GWAS is that many institutions need to preserve individual data privacy. To address this challenge, we propose a novel distributed framework, termed Local Query Model (LQM) to detect risk SNPs for AD across multiple research institutions. To accelerate the learning process, we propose a Distributed Enhanced Dual Polytope Projection (D-EDPP) screening rule to identify irrelevant features and remove them from the optimization. To the best of our knowledge, this is the first successful run of the computationally intensive model selection procedure to learn a consistent model across different institutions without compromising their privacy while ranking the SNPs that may collectively affect AD. Empirical studies are conducted on 809 subjects with 5.9 million SNP features which are distributed across three individual institutions. D-EDPP achieved a 66-fold speed-up by effectively identifying irrelevant features.
Submitted 19 August, 2016;
originally announced August 2016.
-
Unifying inference on brain network variations in neurological diseases: The Alzheimer's case
Authors:
Daniele Durante,
Madelaine Daianu,
Neda Jahanshad,
Paul M. Thompson,
David B. Dunson
Abstract:
There is growing interest in understanding how the structural interconnections among brain regions change with the occurrence of neurological diseases. Diffusion-weighted MRI has allowed researchers to non-invasively estimate a network of structural cortical connections made by white matter tracts, but current statistical methods for relating such networks to the presence or absence of a disease cannot exploit this rich network information. Standard practice considers each edge independently or summarizes the network with a few simple features. We enable dramatic gains in biological insight via a novel unifying methodology for inference on brain network variations associated with the occurrence of neurological diseases. The key to this approach is to define a probabilistic generative mechanism directly on the space of network configurations via dependent mixtures of low-rank factorizations, which efficiently exploit network information and allow the probability mass function for the brain network-valued random variable to vary flexibly across the group of patients characterized by a specific neurological disease and the one comprising age-matched cognitively healthy individuals.
Submitted 19 October, 2015;
originally announced October 2015.
-
Identification of gene pathways implicated in Alzheimer's disease using longitudinal imaging phenotypes with sparse regression
Authors:
Matt Silver,
Eva Janousova,
Xue Hua,
Paul M. Thompson,
Giovanni Montana
Abstract:
We present a new method for the detection of gene pathways associated with a multivariate quantitative trait, and use it to identify causal pathways associated with an imaging endophenotype characteristic of longitudinal structural change in the brains of patients with Alzheimer's disease (AD). Our method, known as pathways sparse reduced-rank regression (PsRRR), uses group lasso penalised regression to jointly model the effects of genome-wide single nucleotide polymorphisms (SNPs), grouped into functional pathways using prior knowledge of gene-gene interactions. Pathways are ranked in order of importance using a resampling strategy that exploits finite sample variability. Our application study uses whole genome scans and MR images from 464 subjects in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. 66,182 SNPs are mapped to 185 gene pathways from the KEGG pathways database. Voxel-wise imaging signatures characteristic of AD are obtained by analysing 3D patterns of structural change at 6, 12 and 24 months relative to baseline. High-ranking, AD endophenotype-associated pathways in our study include those describing chemokine, Jak-stat and insulin signalling pathways, and tight junction interactions. All of these have been previously implicated in AD biology. In a secondary analysis, we investigate SNPs and genes that may be driving pathway selection, and identify a number of previously validated AD genes including CR1, APOE and TOMM40.
Submitted 9 April, 2012;
originally announced April 2012.