-
Synthesizing study-specific controls using generative models on open access datasets for harmonized multi-study analyses
Authors:
Shruti P. Gadewar,
Alyssa H. Zhu,
Iyad Ba Gari,
Sunanda Somu,
Sophia I. Thomopoulos,
Paul M. Thompson,
Talia M. Nir,
Neda Jahanshad
Abstract:
Neuroimaging consortia can enhance reliability and generalizability of findings by pooling data across studies to achieve larger sample sizes. To adjust for site and MRI protocol effects, imaging datasets are often harmonized based on healthy controls. When data from a control group were not collected, statistical harmonization options are limited as patient characteristics and acquisition-related variables may be confounded. Here, in a multi-study neuroimaging analysis of Alzheimer's patients and controls, we tested whether it is possible to generate synthetic control MRIs. For one case-control study, we used a generative adversarial model for style-based harmonization to generate site-specific controls. Downstream feature extraction, statistical harmonization and group-level multi-study case-control and case-only analyses were performed twice, using either true or synthetic controls. All effect sizes using synthetic controls overlapped with those based on true study controls. This line of work may facilitate wider inclusion of case-only studies in multi-study consortia.
Submitted 29 February, 2024;
originally announced March 2024.
-
DenseNet and Support Vector Machine classifications of major depressive disorder using vertex-wise cortical features
Authors:
Vladimir Belov,
Tracy Erwin-Grabner,
Ling-Li Zeng,
Christopher R. K. Ching,
Andre Aleman,
Alyssa R. Amod,
Zeynep Basgoze,
Francesco Benedetti,
Bianca Besteher,
Katharina Brosch,
Robin Bülow,
Romain Colle,
Colm G. Connolly,
Emmanuelle Corruble,
Baptiste Couvy-Duchesne,
Kathryn Cullen,
Udo Dannlowski,
Christopher G. Davey,
Annemiek Dols,
Jan Ernsting,
Jennifer W. Evans,
Lukas Fisch,
Paola Fuentes-Claramonte,
Ali Saffet Gonul,
Ian H. Gotlib
, et al. (63 additional authors not shown)
Abstract:
Major depressive disorder (MDD) is a complex psychiatric disorder that affects the lives of hundreds of millions of individuals around the globe. Even today, researchers debate if morphological alterations in the brain are linked to MDD, likely due to the heterogeneity of this disorder. The application of deep learning tools to neuroimaging data, capable of capturing complex non-linear patterns, has the potential to provide diagnostic and predictive biomarkers for MDD. However, previous attempts to demarcate MDD patients and healthy controls (HC) based on segmented cortical features via linear machine learning approaches have reported low accuracies. In this study, we used globally representative data from the ENIGMA-MDD working group containing an extensive sample of people with MDD (N=2,772) and HC (N=4,240), which allows a comprehensive analysis with generalizable results. Based on the hypothesis that integration of vertex-wise cortical features can improve classification performance, we evaluated the classification of a DenseNet and a Support Vector Machine (SVM), with the expectation that the former would outperform the latter. As we analyzed a multi-site sample, we additionally applied the ComBat harmonization tool to remove potential nuisance effects of site. We found that both classifiers exhibited close to chance performance (balanced accuracy DenseNet: 51%; SVM: 53%), when estimated on unseen sites. Slightly higher classification performance (balanced accuracy DenseNet: 58%; SVM: 55%) was found when the cross-validation folds contained subjects from all sites, indicating site effect. In conclusion, the integration of vertex-wise morphometric features and the use of the non-linear classifier did not lead to the differentiability between MDD and HC. Our results support the notion that MDD classification on this combination of features and classifiers is unfeasible.
Submitted 18 November, 2023;
originally announced November 2023.
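The balanced accuracy figures quoted in the abstract above can be read with the following minimal sketch in mind. It assumes the usual definition (the mean of per-class recall); the abstract does not give the authors' implementation, and the toy labels below are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; 0.5 is chance level for two classes."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Indices of the samples whose true label is class c
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 1 = MDD, 0 = HC (imbalanced, as in the study sample)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_always_hc = [0] * len(y_true)  # a trivial classifier that always predicts HC

plain_accuracy = sum(t == p for t, p in zip(y_true, y_always_hc)) / len(y_true)
score = balanced_accuracy(y_true, y_always_hc)
```

On imbalanced groups, a classifier that always predicts the majority class reaches a plain accuracy above 50% yet stays at chance level on balanced accuracy, which is why the metric suits case-control samples of unequal size.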
-
Tackling the dimensions in imaging genetics with CLUB-PLS
Authors:
Andre Altmann,
Ana C Lawry Aguila,
Neda Jahanshad,
Paul M Thompson,
Marco Lorenzi
Abstract:
A major challenge in imaging genetics and similar fields is to link high-dimensional data in one domain, e.g., genetic data, to high-dimensional data in a second domain, e.g., brain imaging data. The standard approach in the area is mass univariate analysis across genetic factors and imaging phenotypes, which entails executing one genome-wide association study (GWAS) for each pre-defined imaging measure. Although this approach has been tremendously successful, one shortcoming is that phenotypes must be pre-defined. Consequently, effects that are not confined to pre-selected regions of interest or that reflect larger brain-wide patterns can easily be missed. In this work we introduce a Partial Least Squares (PLS)-based framework, which we term Cluster-Bootstrap PLS (CLUB-PLS), that can work with large input dimensions in both domains as well as with large sample sizes. One key feature of the framework is its use of the cluster bootstrap to provide robust statistics for single input features in both domains. We applied CLUB-PLS to investigate the genetic basis of surface area and cortical thickness in a sample of 33,000 subjects from the UK Biobank. We found 107 genome-wide significant locus-phenotype pairs that are linked to 386 different genes. Most of these loci could be technically validated: using classic GWAS or Genome-Wide Inferred Statistics (GWIS), we found that 85 locus-phenotype pairs exceeded the genome-wide suggestive (P<1e-05) threshold.
Submitted 19 September, 2023; v1 submitted 13 September, 2023;
originally announced September 2023.
-
Video and Synthetic MRI Pre-training of 3D Vision Architectures for Neuroimage Analysis
Authors:
Nikhil J. Dhinagar,
Amit Singh,
Saket Ozarkar,
Ketaki Buwa,
Sophia I. Thomopoulos,
Conor Owens-Walton,
Emily Laltoo,
Yao-Liang Chen,
Philip Cook,
Corey McMillan,
Chih-Chien Tsai,
J-J Wang,
Yih-Ru Wu,
Paul M. Thompson
Abstract:
Transfer learning represents a recent paradigm shift in the way we build artificial intelligence (AI) systems. In contrast to training task-specific models, transfer learning involves pre-training deep learning models on a large corpus of data and minimally fine-tuning them for adaptation to specific tasks. Even so, for 3D medical imaging tasks, we do not know if it is best to pre-train models on natural images, medical images, or even synthetically generated MRI scans or video data. To evaluate these alternatives, here we benchmarked vision transformers (ViTs) and convolutional neural networks (CNNs), initialized with varied upstream pre-training approaches. These methods were then adapted to three unique downstream neuroimaging tasks with a range of difficulty: Alzheimer's disease (AD) classification, Parkinson's disease (PD) classification, and "brain age" prediction. Experimental tests led to the following key observations: (1) pre-training improved performance across all tasks, including a boost of 7.4% for AD classification and 4.6% for PD classification for the ViT, and a boost of 19.1% for PD classification and a reduction in brain age prediction error of 1.26 years for CNNs; (2) pre-training on large-scale video or synthetic MRI data boosted the performance of ViTs; (3) CNNs were robust in limited-data settings, and in-domain pre-training enhanced their performance; (4) pre-training improved generalization to out-of-distribution datasets and sites. Overall, we benchmarked different vision architectures, revealing the value of pre-training them with emerging datasets for model initialization. The resulting pre-trained models can be adapted to a range of downstream neuroimaging tasks, even when training data for the target task is limited.
Submitted 8 September, 2023;
originally announced September 2023.
-
Linking Symptom Inventories using Semantic Textual Similarity
Authors:
Eamonn Kennedy,
Shashank Vadlamani,
Hannah M Lindsey,
Kelly S Peterson,
Kristen Dams OConnor,
Kenton Murray,
Ronak Agarwal,
Houshang H Amiri,
Raeda K Andersen,
Talin Babikian,
David A Baron,
Erin D Bigler,
Karen Caeyenberghs,
Lisa Delano-Wood,
Seth G Disner,
Ekaterina Dobryakova,
Blessen C Eapen,
Rachel M Edelstein,
Carrie Esopenko,
Helen M Genova,
Elbert Geuze,
Naomi J Goodrich-Hunsaker,
Jordan Grafman,
Asta K Haberg,
Cooper B Hodges
, et al. (57 additional authors not shown)
Abstract:
An extensive library of symptom inventories has been developed over time to measure clinical symptoms, but this variety has led to several long-standing issues. Most notably, results drawn from different settings and studies are not comparable, which limits reproducibility. Here, we present an artificial intelligence (AI) approach using semantic textual similarity (STS) to link symptoms and scores across previously incongruous symptom inventories. We tested the ability of four pre-trained STS models to screen thousands of symptom description pairs for related content - a challenging task typically requiring expert panels. Models were tasked to predict symptom severity across four different inventories for 6,607 participants drawn from 16 international data sources. The STS approach achieved 74.8% accuracy across five tasks, outperforming other models tested. This work suggests that incorporating contextual, semantic information can assist expert decision-making processes, yielding gains for both general and disease-specific clinical assessment.
Submitted 8 September, 2023;
originally announced September 2023.
-
Incomplete Multimodal Learning for Complex Brain Disorders Prediction
Authors:
Reza Shirkavand,
Liang Zhan,
Heng Huang,
Li Shen,
Paul M. Thompson
Abstract:
Recent advancements in the acquisition of various brain data sources have created new opportunities for integrating multimodal brain data to assist in early detection of complex brain disorders. However, current data integration approaches typically need a complete set of biomedical data modalities, which may not always be feasible, as some modalities are only available in large-scale research cohorts and are prohibitive to collect in routine clinical practice. Especially in studies of brain diseases, research cohorts may include both neuroimaging data and genetic data, but for practical clinical diagnosis, we often need to make disease predictions only based on neuroimages. As a result, it is desired to design machine learning models which can use all available data (different data could provide complementary information) during training but conduct inference using only the most common data modality. We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks to effectively exploit auxiliary modalities available during training in order to improve the performance of a unimodal model at inference. We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Experimental results demonstrate that our approach outperforms the related machine learning and deep learning methods by a significant margin.
Submitted 25 May, 2023;
originally announced May 2023.
-
A Comprehensive Corpus Callosum Segmentation Tool for Detecting Callosal Abnormalities and Genetic Associations from Multi Contrast MRIs
Authors:
Shruti P. Gadewar,
Elnaz Nourollahimoghadam,
Ravi R. Bhatt,
Abhinaav Ramesh,
Shayan Javid,
Iyad Ba Gari,
Alyssa H. Zhu,
Sophia Thomopoulos,
Paul M. Thompson,
Neda Jahanshad
Abstract:
Structural alterations of the midsagittal corpus callosum (midCC) have been associated with a wide range of brain disorders. The midCC is visible on most MRI contrasts and in many acquisitions with a limited field-of-view. Here, we present an automated tool for segmenting and assessing the shape of the midCC from T1w, T2w, and FLAIR images. We train a UNet on images from multiple public datasets to obtain midCC segmentations. A quality control algorithm is also built-in, trained on the midCC shape features. We calculate intraclass correlations (ICC) and average Dice scores in a test-retest dataset to assess segmentation reliability. We test our segmentation on poor quality and partial brain scans. We highlight the biological significance of our extracted features using data from over 40,000 individuals from the UK Biobank; we classify clinically defined shape abnormalities and perform genetic analyses.
Submitted 1 May, 2023;
originally announced May 2023.
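The Dice score used above to assess test-retest segmentation reliability can be sketched as follows. This is a minimal illustration of the standard overlap formula on binary masks, not the authors' tool; the two small masks are hypothetical stand-ins for midCC segmentations of a scan and a rescan.

```python
def dice_score(mask_a, mask_b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    flat_a = [v for row in mask_a for v in row]
    flat_b = [v for row in mask_b for v in row]
    intersection = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    total = sum(1 for a in flat_a if a) + sum(1 for b in flat_b if b)
    # Two empty masks agree perfectly by convention
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 3x4 masks from a scan and its rescan (1 = structure, 0 = background)
scan = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
rescan = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
dice = dice_score(scan, rescan)
```

A value of 1 means the two segmentations coincide exactly and 0 means they share no voxels; averaging this score over scan-rescan pairs gives the kind of summary the abstract reports.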
-
A Surface-Based Federated Chow Test Model for Integrating APOE Status, Tau Deposition Measure, and Hippocampal Surface Morphometry
Authors:
Jianfeng Wu,
Yi Su,
Yanxi Chen,
Wenhui Zhu,
Eric M. Reiman,
Richard J. Caselli,
Kewei Chen,
Paul M. Thompson,
Junwen Wang,
Yalin Wang
Abstract:
Background: Alzheimer's Disease (AD) is the most common type of age-related dementia, affecting 6.2 million people aged 65 or older according to CDC data. It is commonly agreed that discovering an effective AD diagnosis biomarker could have enormous public health benefits, potentially preventing or delaying up to 40% of dementia cases. Tau neurofibrillary tangles are the primary driver of downstream neurodegeneration and subsequent cognitive impairment in AD, resulting in structural deformations such as hippocampal atrophy that can be observed in magnetic resonance imaging (MRI) scans. Objective: To build a surface-based model to 1) detect differences between APOE subgroups in patterns of tau deposition and hippocampal atrophy, and 2) use the extracted surface-based features to predict cognitive decline. Methods: Using data obtained from different institutions, we develop a surface-based federated Chow test model to study the synergistic effects of APOE, a previously reported significant risk factor of AD, and tau on hippocampal surface morphometry. Results: We illustrate that the APOE-specific morphometry features correlate with AD progression and better predict future AD conversion than other MRI biomarkers. For example, a strong association between atrophy and abnormal tau was identified in hippocampal subregion cornu ammonis 1 (CA1 subfield) and subiculum in e4 homozygote cohort. Conclusion: Our model allows for identifying MRI biomarkers for AD and cognitive decline prediction and may uncover a corner of the neural mechanism of the influence of APOE and tau deposition on hippocampal morphology.
Submitted 31 March, 2023;
originally announced April 2023.
-
Few-Shot Classification of Autism Spectrum Disorder using Site-Agnostic Meta-Learning and Brain MRI
Authors:
Nikhil J. Dhinagar,
Vignesh Santhalingam,
Katherine E. Lawrence,
Emily Laltoo,
Paul M. Thompson
Abstract:
For machine learning applications in medical imaging, the availability of training data is often limited, which hampers the design of radiological classifiers for subtle conditions such as autism spectrum disorder (ASD). Transfer learning is one method to counter this problem of low training data regimes. Here we explore the use of meta-learning for very low data regimes in the context of having prior data from multiple sites - an approach we term site-agnostic meta-learning. Inspired by the effectiveness of meta-learning for optimizing a model across multiple tasks, here we propose a framework to adapt it to learn across multiple sites. We tested our meta-learning model for classifying ASD versus typically developing controls in 2,201 T1-weighted (T1-w) MRI scans collected from 38 imaging sites as part of Autism Brain Imaging Data Exchange (ABIDE) [age: 5.2-64.0 years]. The method was trained to find a good initialization state for our model that can quickly adapt to data from new unseen sites by fine-tuning on the limited data that is available. The proposed method achieved an ROC-AUC of 0.857 on 370 scans from 7 unseen sites in ABIDE using a few-shot setting of 2-way 20-shot, i.e., 20 training samples per site. Our results outperformed a transfer learning baseline by generalizing across a wider range of sites as well as other related prior work. We also tested our model in a zero-shot setting on an independent test site without any additional fine-tuning. Our experiments show the promise of the proposed site-agnostic meta-learning framework for challenging neuroimaging tasks involving multi-site heterogeneity with limited availability of training data.
Submitted 14 March, 2023;
originally announced March 2023.
-
Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer's Disease Detection
Authors:
Nikhil J. Dhinagar,
Sophia I. Thomopoulos,
Emily Laltoo,
Paul M. Thompson
Abstract:
Neuroimaging of large populations is valuable to identify factors that promote or resist brain disease, and to assist diagnosis, subtyping, and prognosis. Data-driven models such as convolutional neural networks (CNNs) have increasingly been applied to brain images to perform diagnostic and prognostic tasks by learning robust features. Vision transformers (ViT) - a new class of deep learning architectures - have emerged in recent years as an alternative to CNNs for several computer vision applications. Here we tested variants of the ViT architecture for a range of desired neuroimaging downstream tasks based on difficulty, in this case for sex and Alzheimer's disease (AD) classification based on 3D brain MRI. In our experiments, two vision transformer architecture variants achieved an AUC of 0.987 for sex and 0.892 for AD classification, respectively. We independently evaluated our models on data from two benchmark AD datasets. We achieved a performance boost of 5% and 9-10% upon fine-tuning vision transformer models pre-trained on synthetic (generated by a latent diffusion model) and real MRI scans, respectively. Our main contributions include testing the effects of different ViT training strategies including pre-training, data augmentation and learning rate warm-ups followed by annealing, as pertaining to the neuroimaging domain. These techniques are essential for training ViT-like models for neuroimaging applications where training data is usually limited. We also analyzed the effect of the amount of training data utilized on the test-time performance of the ViT via data-model scaling curves.
Submitted 14 March, 2023;
originally announced March 2023.
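The "learning rate warm-ups followed by annealing" mentioned in the abstract above might look like the sketch below. The abstract does not state the shape of either phase, so the linear warm-up, the cosine decay, and every parameter value here are assumptions chosen for illustration only.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Learning rate for a given step: linear warm-up, then cosine annealing."""
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warm-up step
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the annealing phase completed, in [0, 1)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical settings: 100 steps, the first 10 of them warm-up
schedule = [
    lr_at_step(s, total_steps=100, warmup_steps=10, peak_lr=1e-3)
    for s in range(100)
]
```

The small initial rates keep early updates gentle while the transformer's statistics settle, and the decay afterwards lets training converge; both matter most when, as the abstract notes, training data is limited.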
-
Transferring Models Trained on Natural Images to 3D MRI via Position Encoded Slice Models
Authors:
Umang Gupta,
Tamoghna Chattopadhyay,
Nikhil Dhinagar,
Paul M. Thompson,
Greg Ver Steeg,
The Alzheimer's Disease Neuroimaging Initiative
Abstract:
Transfer learning has remarkably improved computer vision. These advances also promise improvements in neuroimaging, where training set sizes are often small. However, various difficulties arise in directly applying models pretrained on natural images to radiologic images, such as MRIs. In particular, a mismatch in the input space (2D images vs. 3D MRIs) restricts the direct transfer of models, often forcing us to consider only a few MRI slices as input. To this end, we leverage the 2D-Slice-CNN architecture of Gupta et al. (2021), which embeds all the MRI slices with 2D encoders (neural networks that take 2D image input) and combines them via permutation-invariant layers. With the insight that the pretrained model can serve as the 2D encoder, we initialize the 2D encoder with ImageNet pretrained weights that outperform those initialized and trained from scratch on two neuroimaging tasks -- brain age prediction on the UK Biobank dataset and Alzheimer's disease detection on the ADNI dataset. Further, we improve the modeling capabilities of 2D-Slice models by incorporating spatial information through position embeddings, which can improve the performance in some cases.
Submitted 2 March, 2023;
originally announced March 2023.
-
Curriculum Based Multi-Task Learning for Parkinson's Disease Detection
Authors:
Nikhil J. Dhinagar,
Conor Owens-Walton,
Emily Laltoo,
Christina P. Boyle,
Yao-Liang Chen,
Philip Cook,
Corey McMillan,
Chih-Chien Tsai,
J-J Wang,
Yih-Ru Wu,
Ysbrand van der Werf,
Paul M. Thompson
Abstract:
There is great interest in developing radiological classifiers for diagnosis, staging, and predictive modeling in progressive diseases such as Parkinson's disease (PD), a neurodegenerative disease that is difficult to detect in its early stages. Here we leverage severity-based meta-data on the stages of disease to define a curriculum for training a deep convolutional neural network (CNN). Typically, deep learning networks are trained by randomly selecting samples in each mini-batch. By contrast, curriculum learning is a training strategy that aims to boost classifier performance by starting with examples that are easier to classify. Here we define a curriculum to progressively increase the difficulty of the training data corresponding to the Hoehn and Yahr (H&Y) staging system for PD (total N=1,012; 653 PD patients, 359 controls; age range: 20.0-84.9 years). Even with our multi-task setting using pre-trained CNNs and transfer learning, PD classification based on T1-weighted (T1-w) MRI was challenging (ROC AUC: 0.59-0.65), but curriculum training boosted performance (by 3.9%) compared to our baseline model. Future work with multimodal imaging may further boost performance.
Submitted 27 February, 2023;
originally announced February 2023.
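One way to picture the severity-based curriculum described above is the sketch below. It assumes that patients at higher H&Y stages are the easier examples and that each training phase adds the next milder stage; the abstract does not spell out either detail, and the sample identifiers and cut-offs are hypothetical.

```python
def curriculum_phases(samples, stage_thresholds):
    """Yield growing training subsets, ordered from easier to harder examples.

    samples: list of (sample_id, hy_stage); hy_stage is None for controls.
    stage_thresholds: descending H&Y cut-offs, e.g. [3, 2, 1].
    """
    for threshold in stage_thresholds:
        # Controls are always included; a patient joins once the cut-off
        # drops to (or below) that patient's stage
        yield [sid for sid, stage in samples if stage is None or stage >= threshold]


# Hypothetical cohort: two controls and four patients at H&Y stages 1 to 4
samples = [("c1", None), ("c2", None), ("p1", 1), ("p2", 2), ("p3", 3), ("p4", 4)]
phases = list(curriculum_phases(samples, [3, 2, 1]))
```

Each phase would feed an ordinary mini-batch training loop; only the pool that batches are drawn from changes between phases.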
-
Improved Prediction of Beta-Amyloid and Tau Burden Using Hippocampal Surface Multivariate Morphometry Statistics and Sparse Coding
Authors:
Jianfeng Wu,
Yi Su,
Wenhui Zhu,
Negar Jalili Mallak,
Natasha Lepore,
Eric M. Reiman,
Richard J. Caselli,
Paul M. Thompson,
Kewei Chen,
Yalin Wang
Abstract:
Background: Beta-amyloid (A$β$) plaques and tau protein tangles in the brain are the defining 'A' and 'T' hallmarks of Alzheimer's disease (AD), and together with structural atrophy detectable on brain magnetic resonance imaging (MRI) scans as one of the neurodegenerative ('N') biomarkers comprise the ''ATN framework'' of AD. Current methods to detect A$β$/tau pathology include cerebrospinal fluid (CSF; invasive), positron emission tomography (PET; costly and not widely available), and blood-based biomarkers (BBBM; promising but mainly still in development).
Objective: To develop a non-invasive and widely available structural MRI-based framework to quantitatively predict the amyloid and tau measurements.
Methods: With MRI-based hippocampal multivariate morphometry statistics (MMS) features, we apply our Patch Analysis-based Surface Correntropy-induced Sparse coding and max-pooling (PASCS-MP) method combined with the ridge regression model to individual amyloid/tau measure prediction.
Results: We evaluate our framework on amyloid PET/MRI and tau PET/MRI datasets from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Each subject has one pair consisting of a PET image and MRI scan, collected at about the same time. Experimental results suggest that amyloid/tau measurements predicted with our PASCS-MP representations are closer to the real values than the measures derived from other approaches, such as hippocampal surface area, volume, and shape morphometry features based on spherical harmonics (SPHARM).
Conclusion: The MMS-based PASCS-MP is an efficient tool that can bridge hippocampal atrophy with amyloid and tau pathology and thus help assess disease burden, progression, and treatment effects.
Submitted 27 October, 2022;
originally announced November 2022.
-
Multi-site benchmark classification of major depressive disorder using machine learning on cortical and subcortical measures
Authors:
Vladimir Belov,
Tracy Erwin-Grabner,
Ali Saffet Gonul,
Alyssa R. Amod,
Amar Ojha,
Andre Aleman,
Annemiek Dols,
Anouk Scharntee,
Aslihan Uyar-Demir,
Ben J Harrison,
Benson M. Irungu,
Bianca Besteher,
Bonnie Klimes-Dougan,
Brenda W. J. H. Penninx,
Bryon A. Mueller,
Carlos Zarate,
Christopher G. Davey,
Christopher R. K. Ching,
Colm G. Connolly,
Cynthia H. Y. Fu,
Dan J. Stein,
Danai Dima,
David E. J. Linden,
David M. A. Mehler,
Edith Pomarol-Clotet
, et al. (41 additional authors not shown)
Abstract:
Machine learning (ML) techniques have gained popularity in the neuroimaging field due to their potential for classifying neuropsychiatric disorders. However, the diagnostic predictive power of the existing algorithms has been limited by small sample sizes, lack of representativeness, data leakage, and/or overfitting. Here, we overcome these limitations with the largest multi-site sample size to date (n=5,356) to provide a generalizable ML classification benchmark of major depressive disorder (MDD). Using brain measures from standardized ENIGMA analysis pipelines in FreeSurfer, we were able to classify MDD vs healthy controls (HC) with around 62% balanced accuracy, but when harmonizing the data using ComBat balanced accuracy dropped to approximately 52%. Similar results were observed in stratified groups according to age of onset, antidepressant use, number of episodes and sex. Future studies incorporating higher dimensional brain imaging/phenotype features, and/or using more advanced machine and deep learning methods may achieve more encouraging prospects.
Submitted 25 October, 2022; v1 submitted 16 June, 2022;
originally announced June 2022.
-
Secure & Private Federated Neuroimaging
Authors:
Dimitris Stripelis,
Umang Gupta,
Hamza Saleem,
Nikhil Dhinagar,
Tanmay Ghai,
Rafael Chrysovalantis Anastasiou,
Armaghan Asghar,
Greg Ver Steeg,
Srivatsan Ravi,
Muhammad Naveed,
Paul M. Thompson,
Jose Luis Ambite
Abstract:
The amount of biomedical data continues to grow rapidly. However, collecting data from multiple sites for joint analysis remains challenging due to security, privacy, and regulatory concerns. To overcome this challenge, we use Federated Learning, which enables distributed training of neural network models over multiple data sources without sharing data. Each site trains the neural network over its private data for some time, then shares the neural network parameters (i.e., weights, gradients) with a Federation Controller, which in turn aggregates the local models, sends the resulting community model back to each site, and the process repeats. Our Federated Learning architecture, MetisFL, provides strong security and privacy. First, sample data never leaves a site. Second, neural network parameters are encrypted before transmission and the global neural model is computed under fully-homomorphic encryption. Finally, we use information-theoretic methods to limit information leakage from the neural model to prevent a curious site from performing model inversion or membership attacks. We present a thorough evaluation of the performance of secure, private federated learning in neuroimaging tasks, including for predicting Alzheimer's disease and estimating BrainAGE from magnetic resonance imaging (MRI) studies, in challenging, heterogeneous federated environments where sites have different amounts of data and statistical distributions.
Submitted 28 August, 2023; v1 submitted 10 May, 2022;
originally announced May 2022.
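The training round described in this abstract (each site trains locally, shares parameters with the Federation Controller, and receives the aggregated community model) can be illustrated with a minimal sketch. It assumes a FedAvg-style average weighted by each site's sample count, which the abstract does not specify. The function name `aggregate` and the flat list-of-floats parameter format are hypothetical and are not MetisFL's API; encryption is omitted entirely.

```python
from typing import List, Sequence


def aggregate(site_params: Sequence[Sequence[float]],
              site_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of per-site parameter vectors.

    Assumption: FedAvg-style weighting. The abstract only states that the
    controller "aggregates the local models".
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need at least one site and one size per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(site_params[0])
    community = [0.0] * dim
    for params, n in zip(site_params, site_sizes):
        if len(params) != dim:
            raise ValueError("all sites must share one parameter shape")
        # Each site contributes in proportion to its share of the data.
        for i, value in enumerate(params):
            community[i] += value * n / total
    return community
```

In a full round, the returned community vector would be sent back to every site as the starting point for the next stretch of local training.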
-
Predicting Tau Accumulation in Cerebral Cortex with Multivariate MRI Morphometry Measurements, Sparse Coding, and Correntropy
Authors:
Jianfeng Wu,
Wenhui Zhu,
Yi Su,
Jie Gui,
Natasha Lepore,
Eric M. Reiman,
Richard J. Caselli,
Paul M. Thompson,
Kewei Chen,
Yalin Wang
Abstract:
Biomarker-assisted diagnosis and intervention in Alzheimer's disease (AD) may be the key to prevention breakthroughs. One of the hallmarks of AD is the accumulation of tau plaques in the human brain. However, current methods to detect tau pathology are either invasive (lumbar puncture) or quite costly and not widely available (Tau PET). In our previous work, structural MRI-based hippocampal multivariate morphometry statistics (MMS) showed superior performance as an effective neurodegenerative biomarker for preclinical AD and Patch Analysis-based Surface Correntropy-induced Sparse coding and max-pooling (PASCS-MP) has excellent ability to generate low-dimensional representations with strong statistical power for brain amyloid prediction. In this work, we apply this framework together with ridge regression models to predict Tau deposition in Braak12 and Braak34 brain regions separately. We evaluate our framework on 925 subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Each subject has one pair consisting of a PET image and MRI scan which were collected at about the same times. Experimental results suggest that the representations from our MMS and PASCS-MP have stronger predictive power and their predicted Braak12 and Braak34 are closer to the real values compared to the measures derived from other approaches such as hippocampal surface area and volume, and shape morphometry features based on spherical harmonics (SPHARM).
Submitted 20 October, 2021;
originally announced October 2021.
-
Secure Neuroimaging Analysis using Federated Learning with Homomorphic Encryption
Authors:
Dimitris Stripelis,
Hamza Saleem,
Tanmay Ghai,
Nikhil Dhinagar,
Umang Gupta,
Chrysovalantis Anastasiou,
Greg Ver Steeg,
Srivatsan Ravi,
Muhammad Naveed,
Paul M. Thompson,
Jose Luis Ambite
Abstract:
Federated learning (FL) enables distributed computation of machine learning models over various disparate, remote data sources, without requiring the transfer of any individual data to a centralized location. This results in improved generalizability of models and efficient scaling of computation as more sources and larger datasets are added to the federation. Nevertheless, recent membership attacks show that private or sensitive personal data can sometimes be leaked or inferred when model parameters or summary statistics are shared with a central site, requiring improved security solutions. In this work, we propose a framework for secure FL using fully-homomorphic encryption (FHE). Specifically, we use the CKKS construction, an approximate, floating point compatible scheme that benefits from ciphertext packing and rescaling. In our evaluation on large-scale brain MRI datasets, we use our proposed secure FL framework to train a deep learning model to predict a person's age from distributed MRI scans, a common benchmarking task, and demonstrate that there is no degradation in the learning performance between the encrypted and non-encrypted federated models.
Submitted 9 November, 2021; v1 submitted 7 August, 2021;
originally announced August 2021.
-
Membership Inference Attacks on Deep Regression Models for Neuroimaging
Authors:
Umang Gupta,
Dimitris Stripelis,
Pradeep K. Lam,
Paul M. Thompson,
José Luis Ambite,
Greg Ver Steeg
Abstract:
Ensuring the privacy of research participants is vital, even more so in healthcare environments. Deep learning approaches to neuroimaging require large datasets, and this often necessitates sharing data between multiple sites, which is antithetical to the privacy objectives. Federated learning is a commonly proposed solution to this problem. It circumvents the need for data sharing by sharing parameters during the training process. However, we demonstrate that allowing access to parameters may leak private information even if data is never directly shared. In particular, we show that it is possible to infer if a sample was used to train the model given only access to the model prediction (black-box) or access to the model itself (white-box) and some leaked samples from the training data distribution. Such attacks are commonly referred to as Membership Inference attacks. We show realistic Membership Inference attacks on deep learning models trained for 3D neuroimaging tasks in a centralized as well as decentralized setup. We demonstrate feasible attacks on brain age prediction models (deep learning models that predict a person's age from their brain MRI scan). We correctly identified whether an MRI scan was used in model training with a 60% to over 80% success rate depending on model complexity and security assumptions.
Submitted 3 June, 2021; v1 submitted 6 May, 2021;
originally announced May 2021.
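The black-box setting in this abstract (deciding whether a scan was in the training set given only the model's prediction) can be illustrated with a common baseline attack: guess "member" when the prediction error is small, on the rationale that models tend to fit their training samples more closely. This is a sketch of that generic error-threshold baseline and is not claimed to be the attack used in the paper. The names `infer_membership`, `predict`, and `threshold` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def infer_membership(predict: Callable[[Sequence[float]], float],
                     samples: Sequence[Tuple[Sequence[float], float]],
                     threshold: float) -> List[bool]:
    """Guess 'member' when the absolute prediction error is below threshold.

    Assumption: the attacker knows each sample's true age and can query the
    model, but sees nothing else (black-box access).
    """
    guesses = []
    for features, true_age in samples:
        error = abs(predict(features) - true_age)
        guesses.append(error < threshold)
    return guesses
```

In practice the threshold would have to be calibrated, for example on samples whose membership status is already known to the attacker.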
-
Predicting Future Cognitive Decline with Hyperbolic Stochastic Coding
Authors:
J. Zhang,
Q. Dong,
J. Shi,
Q. Li,
C. M. Stonnington,
B. A. Gutman,
K. Chen,
E. M. Reiman,
R. J. Caselli,
P. M. Thompson,
J. Ye,
Y. Wang
Abstract:
Hyperbolic geometry has been successfully applied in modeling brain cortical and subcortical surfaces with general topological structures. However, such approaches, like other surface-based brain morphology analysis methods, usually generate high-dimensional features. This limits their statistical power in cognitive decline prediction research, especially in datasets with a limited number of subjects. To address this limitation, we propose a novel framework termed hyperbolic stochastic coding (HSC). Our preliminary experimental results show that our algorithm achieves superior results on various classification tasks. Our work may enrich surface-based brain imaging research tools and potentially result in a diagnostic and prognostic indicator useful in individualized treatment strategies.
Submitted 20 February, 2021;
originally announced February 2021.
-
Improved Brain Age Estimation with Slice-based Set Networks
Authors:
Umang Gupta,
Pradeep K. Lam,
Greg Ver Steeg,
Paul M. Thompson
Abstract:
Deep Learning for neuroimaging data is a promising but challenging direction. The high dimensionality of 3D MRI scans makes this endeavor compute and data-intensive. Most conventional 3D neuroimaging methods use 3D-CNN-based architectures with a large number of parameters and require more time and data to train. Recently, 2D-slice-based models have received increasing attention as they have fewer parameters and may require fewer samples to achieve comparable performance. In this paper, we propose a new architecture for BrainAGE prediction. The proposed architecture works by encoding each 2D slice in an MRI with a deep 2D-CNN model. Next, it combines the information from these 2D-slice encodings using set networks or permutation invariant layers. Experiments on the BrainAGE prediction problem, using the UK Biobank dataset, showed that the model with the permutation invariant layers trains faster and provides better predictions compared to other state-of-the-art approaches.
Submitted 9 February, 2021; v1 submitted 8 February, 2021;
originally announced February 2021.
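The two-stage design in this abstract (encode each 2D slice, then combine the encodings with a permutation-invariant layer) can be sketched as follows. Mean pooling is used here as one simple permutation-invariant operation; the abstract does not say which set operation the authors use, so this choice is an assumption. `mean_pool`, `encode_volume`, and the `encode_slice` callable (a stand-in for the deep 2D-CNN) are hypothetical names.

```python
from typing import Callable, List, Sequence

Slice = Sequence[Sequence[float]]  # one 2D slice as rows of intensities


def mean_pool(encodings: Sequence[Sequence[float]]) -> List[float]:
    """Average slice encodings element-wise; the result ignores slice order."""
    count = len(encodings)
    if count == 0:
        raise ValueError("need at least one slice encoding")
    dim = len(encodings[0])
    return [sum(enc[i] for enc in encodings) / count for i in range(dim)]


def encode_volume(slices: Sequence[Slice],
                  encode_slice: Callable[[Slice], Sequence[float]]) -> List[float]:
    """Encode every slice independently, then pool into one volume vector."""
    return mean_pool([encode_slice(s) for s in slices])
```

Because the pooled vector has a fixed length regardless of how many slices are supplied, a small regression head could then map it to a predicted age.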
-
3D Grid-Attention Networks for Interpretable Age and Alzheimer's Disease Prediction from Structural MRI
Authors:
Pradeep Lam,
Alyssa H. Zhu,
Iyad Ba Gari,
Neda Jahanshad,
Paul M. Thompson
Abstract:
We propose an interpretable 3D Grid-Attention deep neural network that can accurately predict a person's age and whether they have Alzheimer's disease (AD) from a structural brain MRI scan. Building on a 3D convolutional neural network, we added two attention modules at different layers of abstraction, so that features learned are spatially related to the global features for the task. The attention layers allow the network to focus on brain regions relevant to the task, while masking out irrelevant or noisy regions. In evaluations based on 4,561 3-Tesla T1-weighted MRI scans from 4 phases of the Alzheimer's Disease Neuroimaging Initiative (ADNI), salience maps for age and AD prediction partially overlapped, but lower-level features overlapped more than higher-level features. The brain age prediction network also distinguished AD and healthy control groups better than another state-of-the-art method. The resulting visual analyses can distinguish interpretable feature patterns that are important for predicting clinical diagnosis. Future work is needed to test performance across scanners and populations.
Submitted 18 November, 2020;
originally announced November 2020.
-
Overview of Scanner Invariant Representations
Authors:
Daniel Moyer,
Greg Ver Steeg,
Paul M. Thompson
Abstract:
Pooled imaging data from multiple sources is subject to bias from each source. Studies that do not correct for these scanner/site biases at best lose statistical power, and at worst leave spurious correlations in their data. Estimation of the bias effects is non-trivial due to the paucity of data with correspondence across sites, so-called "traveling phantom" data, which is expensive to collect. Nevertheless, numerous solutions leveraging direct correspondence have been proposed. In contrast, Moyer et al. (2019) propose an unsupervised solution using invariant representations, one which does not require correspondence and thus does not require paired images. By leveraging the data processing inequality, an invariant representation can then be used to create an image reconstruction that is uninformative of its original source, yet still faithful to the underlying structure. In the present abstract we provide an overview of this method.
Submitted 29 May, 2020;
originally announced June 2020.
-
Scanner Invariant Representations for Diffusion MRI Harmonization
Authors:
Daniel Moyer,
Greg Ver Steeg,
Chantal M. W. Tax,
Paul M. Thompson
Abstract:
Purpose: In the present work we describe the correction of diffusion-weighted MRI for site and scanner biases using a novel method based on invariant representation.
Theory and Methods: Pooled imaging data from multiple sources are subject to variation between the sources. Correcting for these biases has become very important as imaging studies increase in size and multi-site cases become more common. We propose learning an intermediate representation invariant to site/protocol variables, a technique adapted from information theory-based algorithmic fairness; by leveraging the data processing inequality, such a representation can then be used to create an image reconstruction that is uninformative of its original source, yet still faithful to underlying structures. To implement this, we use a deep learning method based on variational auto-encoders (VAE) to construct scanner invariant encodings of the imaging data.
Results: To evaluate our method, we use training data from the 2018 MICCAI Computational Diffusion MRI (CDMRI) Challenge Harmonization dataset. Our proposed method shows improvements on independent test data relative to a recently published baseline method on each subtask, mapping data from three different scanning contexts to and from one separate target scanning context.
Conclusion: As imaging studies continue to grow, the use of pooled multi-site imaging will similarly increase. Invariant representation presents a strong candidate for the harmonization of these data.
Submitted 31 January, 2020; v1 submitted 10 April, 2019;
originally announced April 2019.
-
Federated Learning in Distributed Medical Databases: Meta-Analysis of Large-Scale Subcortical Brain Data
Authors:
Santiago Silva,
Boris Gutman,
Eduardo Romero,
Paul M Thompson,
Andre Altmann,
Marco Lorenzi
Abstract:
At this moment, databanks worldwide contain brain images in previously unimaginable numbers. Combined with developments in data science, these massive data provide the potential to better understand the genetic underpinnings of brain diseases. However, different datasets, which are stored at different institutions, cannot always be shared directly due to privacy and legal concerns, thus limiting the full exploitation of big data in the study of brain disorders. Here we propose a federated learning framework for securely accessing and meta-analyzing any biomedical data without sharing individual information. We illustrate our framework by investigating brain structural relationships across diseases and clinical cohorts. The framework is first tested on synthetic data and then applied to multi-centric, multi-database studies including ADNI, PPMI, MIRIAD and UK Biobank, showing the potential of the approach for further applications in distributed analysis of multi-centric cohorts.
Submitted 14 March, 2019; v1 submitted 19 October, 2018;
originally announced October 2018.
-
Deep Learning for Quality Control of Subcortical Brain 3D Shape Models
Authors:
Dmitry Petrov,
Boris A. Gutman,
Egor Kuznetsov,
Theo G. M. van Erp,
Jessica A. Turner,
Lianne Schmaal,
Dick Veltman,
Lei Wang,
Kathryn Alpert,
Dmitry Isaev,
Artemis Zavaliangos-Petropulu,
Christopher R. K. Ching,
Vince Calhoun,
David Glahn,
Theodore D. Satterthwaite,
Ole Andreas Andreassen,
Stefan Borgwardt,
Fleur Howells,
Nynke Groenewold,
Aristotle Voineskos,
Joaquim Radua,
Steven G. Potkin,
Benedicto Crespo-Facorro,
Diana Tordesillas-Gutierrez,
Li Shen,
Irina Lebedeva
, et al. (48 additional authors not shown)
Abstract:
We present several deep learning models for assessing the morphometric fidelity of deep grey matter region models extracted from brain MRI. We test three different convolutional neural net architectures (VGGNet, ResNet and Inception) over 2D maps of geometric features. Further, we present a novel geometry feature augmentation technique based on a parametric spherical mapping. Finally, we present an approach for model decision visualization, allowing human raters to see the areas of subcortical shapes most likely to be deemed of failing quality by the machine. Our training data is comprised of 5200 subjects from the ENIGMA Schizophrenia MRI cohorts, and our test dataset contains 1500 subjects from the ENIGMA Major Depressive Disorder cohorts. Our final models reduce human rater time by 46-70%. ResNet outperforms VGGNet and Inception for all of our predictive tasks.
Submitted 4 September, 2018; v1 submitted 30 August, 2018;
originally announced August 2018.
-
Measures of Tractography Convergence
Authors:
Daniel Moyer,
Paul M. Thompson,
Greg Ver Steeg
Abstract:
In the present work, we use information theory to understand the empirical convergence rate of tractography, a widely-used approach to reconstruct anatomical fiber pathways in the living brain. Based on diffusion MRI data, tractography is the starting point for many methods to study brain connectivity. Of the available methods to perform tractography, most reconstruct a finite set of streamlines, or 3D curves, representing probable connections between anatomical regions, yet relatively little is known about how the sampling of this set of streamlines affects downstream results, and how exhaustive the sampling should be. Here we provide a method to measure the information theoretic surprise (self-cross entropy) for tract sampling schema. We then empirically assess four streamline methods. We demonstrate that the relative information gain is very low after a moderate number of streamlines have been generated for each tested method. The results give rise to several guidelines for optimal sampling in brain connectivity analyses.
Submitted 12 June, 2018;
originally announced June 2018.
-
A Fast, Accurate Two-Step Linear Mixed Model for Genetic Analysis Applied to Repeat MRI Measurements
Authors:
Qifan Yang,
Gennady V. Roshchupkin,
Wiro J. Niessen,
Sarah E. Medland,
Alyssa H. Zhu,
Paul M. Thompson,
Neda Jahanshad
Abstract:
Large-scale biobanks are being collected around the world in efforts to better understand human health and risk factors for disease. They often survey hundreds of thousands of individuals, combining questionnaires with clinical, genetic, demographic, and imaging assessments; some of this data may be collected longitudinally. Genetic association analysis of such datasets requires methods to properly handle relatedness, population structure and other types of biases introduced by confounders. Most popular and accurate approaches rely on linear mixed model (LMM) algorithms, which are iterative, and the computational complexity of each iteration scales with the square of the sample size, slowing the pace of discoveries (up to several days for single trait analysis) and, furthermore, limiting the use of repeat phenotypic measurements. Here, we describe our new, non-iterative, much faster and accurate Two-Step Linear Mixed Model (Two-Step LMM) approach, which has a computational complexity that scales linearly with sample size. We show that the first step retains accurate estimates of the heritability (the proportion of the trait variance explained by additive genetic factors), even when increasingly complex genetic relationships between individuals are modeled. The second step provides a faster framework to obtain the effect sizes of covariates in the regression model. We applied Two-Step LMM to real data from the UK Biobank, which recently released genotyping information and processed MRI data from 9,725 individuals. We used the left and right hippocampus volume (HV) as repeated measures, and observed increased and more accurate heritability estimation, consistent with simulations.
Submitted 15 March, 2019; v1 submitted 29 October, 2017;
originally announced October 2017.
-
Simultaneous Matrix Diagonalization for Structural Brain Networks Classification
Authors:
Nikita Mokrov,
Maxim Panov,
Boris A. Gutman,
Joshua I. Faskowitz,
Neda Jahanshad,
Paul M. Thompson
Abstract:
This paper considers the problem of brain disease classification based on connectome data. A connectome is a network representation of a human brain. The typical connectome classification problem is very challenging because of the small sample size and high dimensionality of the data. We propose to use simultaneous approximate diagonalization of adjacency matrices in order to compute their eigenstructures in a more stable way. The obtained approximate eigenvalues are further used as features for classification. The proposed approach is demonstrated to be efficient for detection of Alzheimer's disease, outperforming simple baselines and competing with state-of-the-art approaches to brain disease classification.
Submitted 14 October, 2017;
originally announced October 2017.
-
Heritability estimates on resting state fMRI data using the ENIGMA analysis pipeline
Authors:
Bhim M. Adhikari,
Neda Jahanshad,
Dinesh Shukla,
Richard C. Reynolds,
Robert W. Cox,
Els Fieremans,
Jelle Veraart,
Dmitry S. Novikov,
L. Elliot Hong,
Paul M. Thompson,
Peter Kochunov
Abstract:
Big data initiatives such as the Enhancing NeuroImaging Genetics through Meta-Analysis consortium (ENIGMA) combine data collected by independent studies worldwide to achieve more accurate estimates of effect sizes and more reliable and reproducible outcomes. Such efforts require harmonized analysis protocols to consistently extract phenotypes. Even so, challenges include wide variability of fMRI protocols and scanner platforms; this leads to site-to-site variance in quality, resolution and temporal signal-to-noise ratio (tSNR). An effective harmonization should provide optimal measures for data of different qualities. We developed a multi-site rsfMRI analysis pipeline to allow research groups around the world to process rsfMRI scans in a harmonized way, to extract consistent and quantitative measurements of connectivity and to perform coordinated statistical tests. We used the single-modality ENIGMA rsfMRI pipeline based on model-free Marchenko-Pastur PCA based denoising to verify and replicate findings of significant heritability of measures from resting state networks. We analyzed two independent cohorts, GOBS (Genetics of Brain Structure) and HCP (the Human Connectome Project), which collected data using conventional and connectomics-oriented fMRI protocols. We used seed-based connectivity and dual-regression approaches to show that rsfMRI signal is consistently heritable across twenty major functional network measures. Heritability values of 20-40% were observed across both cohorts.
Submitted 13 September, 2017;
originally announced September 2017.
-
Machine Learning for Large-Scale Quality Control of 3D Shape Models in Neuroimaging
Authors:
Dmitry Petrov,
Boris A. Gutman,
Shih-Hua Yu,
Theo G. M. van Erp,
Jessica A. Turner,
Lianne Schmaal,
Dick Veltman,
Lei Wang,
Kathryn Alpert,
Dmitry Isaev,
Artemis Zavaliangos-Petropulu,
Christopher R. K. Ching,
Vince Calhoun,
David Glahn,
Theodore D. Satterthwaite,
Ole Andreas Andreassen,
Stefan Borgwardt,
Fleur Howells,
Nynke Groenewold,
Aristotle Voineskos,
Joaquim Radua,
Steven G. Potkin,
Benedicto Crespo-Facorro,
Diana Tordesillas-Gutierrez
, et al. (50 additional authors not shown)
Abstract:
As very large studies of complex neuroimaging phenotypes become more common, human quality assessment of MRI-derived data remains one of the last major bottlenecks. Few attempts have so far been made to address this issue with machine learning. In this work, we optimize predictive models of quality for meshes representing deep brain structure shapes. We use standard vertex-wise and global shape features computed homologously across 19 cohorts and over 7500 human-rated subjects, training kernelized Support Vector Machine and Gradient Boosted Decision Trees classifiers to detect meshes of failing quality. Our models generalize across datasets and diseases, reducing human workload by 30-70%, or equivalently hundreds of human rater hours for datasets of comparable size, with recall rates approaching inter-rater reliability.
Submitted 7 August, 2017; v1 submitted 19 July, 2017;
originally announced July 2017.
-
Classification of Major Depressive Disorder via Multi-Site Weighted LASSO Model
Authors:
Dajiang Zhu,
Brandalyn C. Riedel,
Neda Jahanshad,
Nynke A. Groenewold,
Dan J. Stein,
Ian H. Gotlib,
Matthew D. Sacchet,
Danai Dima,
James H. Cole,
Cynthia H. Y. Fu,
Henrik Walter,
Ilya M. Veer,
Thomas Frodl,
Lianne Schmaal,
Dick J. Veltman,
Paul M. Thompson
Abstract:
Large-scale collaborative analysis of brain imaging data, in psychiatry and neurology, offers a new source of statistical power to discover features that boost accuracy in disease classification, differential diagnosis, and outcome prediction. However, due to data privacy regulations or limited accessibility to large datasets across the world, it is challenging to efficiently integrate distributed information. Here we propose a novel classification framework through multi-site weighted LASSO: each site performs an iterative weighted LASSO for feature selection separately. Within each iteration, the classification result and the selected features are collected to update the weighting parameters for each feature. This new weight is used to guide the LASSO process at the next iteration. Only the features that help to improve the classification accuracy are preserved. In tests on data from five sites (299 patients with major depressive disorder (MDD) and 258 normal controls), our method boosted classification accuracy for MDD by 4.9% on average. This result shows the potential of the proposed new strategy as an effective and practical collaborative platform for machine learning on large scale distributed imaging and biobank data.
Submitted 3 June, 2017; v1 submitted 26 May, 2017;
originally announced May 2017.
-
Large-scale Feature Selection of Risk Genetic Factors for Alzheimer's Disease via Distributed Group Lasso Regression
Authors:
Qingyang Li,
Dajiang Zhu,
Jie Zhang,
Derrek Paul Hibar,
Neda Jahanshad,
Yalin Wang,
Jieping Ye,
Paul M. Thompson,
Jie Wang
Abstract:
Genome-wide association studies (GWAS) have achieved great success in the genetic study of Alzheimer's disease (AD). Collaborative imaging genetics studies across different research institutions show the effectiveness of detecting genetic risk factors. However, the high dimensionality of GWAS data poses significant challenges in detecting risk SNPs for AD. Selecting relevant features is crucial in predicting the response variable. In this study, we propose a novel Distributed Feature Selection Framework (DFSF) to conduct large-scale imaging genetics studies across multiple institutions. To speed up the learning process, we propose a family of distributed group Lasso screening rules to identify irrelevant features and remove them from the optimization. Then we select the relevant group features by performing the group Lasso feature selection process over a sequence of parameters. Finally, we employ stability selection to rank the top risk SNPs that might help detect the early stage of AD. To the best of our knowledge, this is the first distributed feature selection model that integrates group Lasso feature selection and detects genetic risk factors across multiple research institutions. Empirical studies are conducted on 809 subjects with 5.9 million SNPs which are distributed across several individual institutions, demonstrating the efficiency and effectiveness of the proposed method.
Submitted 26 April, 2017;
originally announced April 2017.
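As a rough illustration of why a group Lasso penalty selects or discards whole groups of features at once, the sketch below implements the standard block soft-thresholding (proximal) step for the penalty lam * ||group||_2. It is not the authors' DFSF or their distributed screening rules; the function name, the toy coefficient groups, and the value of `lam` are assumptions made for this example.

```python
import math


def group_soft_threshold(group, lam):
    """Proximal step for the group Lasso penalty lam * ||group||_2."""
    norm = math.sqrt(sum(x * x for x in group))
    if norm <= lam:
        # A weak group is zeroed out as a whole, which is how
        # group-level feature selection arises.
        return [0.0 for _ in group]
    # A strong group is kept and shrunk uniformly toward zero.
    shrink = 1.0 - lam / norm
    return [shrink * x for x in group]


# Hypothetical coefficient groups (e.g., SNPs grouped together).
strong = group_soft_threshold([3.0, 4.0], lam=1.0)  # norm 5 > lam, kept
weak = group_soft_threshold([0.3, 0.4], lam=1.0)    # norm 0.5 <= lam, dropped
```

Sweeping `lam` over a decreasing sequence would let more groups survive at each step, which is one common way to read "a sequence of parameters" in this setting.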
-
A Restaurant Process Mixture Model for Connectivity Based Parcellation of the Cortex
Authors:
Daniel Moyer,
Boris A Gutman,
Neda Jahanshad,
Paul M. Thompson
Abstract:
One of the primary objectives of human brain mapping is the division of the cortical surface into functionally distinct regions, i.e. parcellation. While it is generally agreed that at macro-scale different regions of the cortex have different functions, the exact number and configuration of these regions is not known. Methods for the discovery of these regions are thus important, particularly as the volume of available information grows. Towards this end, we present a parcellation method based on a Bayesian non-parametric mixture model of cortical connectivity.
Submitted 2 March, 2017;
originally announced March 2017.
-
An Empirical Study of Continuous Connectivity Degree Sequence Equivalents
Authors:
Daniel Moyer,
Boris A. Gutman,
Joshua Faskowitz,
Neda Jahanshad,
Paul M. Thompson
Abstract:
In the present work we demonstrate the use of a parcellation-free connectivity model based on Poisson point processes. This model produces for each subject a continuous bivariate intensity function that represents, for every possible pair of points, the relative rate at which we observe tracts terminating at those points. We fit this model to explore degree sequence equivalents for spatial continuum graphs, and to investigate the local differences between estimated intensity functions for two different tractography methods. This is a companion paper to Moyer et al. (2016), where the model was originally defined.
Submitted 18 November, 2016;
originally announced November 2016.
-
A Continuous Model of Cortical Connectivity
Authors:
Daniel Moyer,
Boris A. Gutman,
Joshua Faskowitz,
Neda Jahanshad,
Paul M. Thompson
Abstract:
We present a continuous model for structural brain connectivity based on the Poisson point process. The model treats each streamline curve in a tractography as an observed event in connectome space, here a product space of cortical white matter boundaries. We approximate the model parameter via kernel density estimation. To deal with the heavy computational burden, we develop a fast parameter estimation method by pre-computing associated Legendre products of the data, leveraging properties of the spherical heat kernel. We show how our approach can be used to assess the quality of cortical parcellations with respect to connectivity. We further present empirical results that suggest the discrete connectomes derived from our model have substantially higher test-retest reliability compared to standard methods.
Submitted 5 November, 2018; v1 submitted 12 October, 2016;
originally announced October 2016.
-
The Importance of Being Negative: A serious treatment of non-trivial edges in brain functional connectome
Authors:
Liang Zhan,
Lisanne M. Jenkins,
Ouri E. Wolfson,
Johnson J. GadElkarim,
Kevin Nocito,
Paul M. Thompson,
Olusola A. Ajilore,
Moo K. Chung,
Alex D. Leow
Abstract:
Understanding the modularity of fMRI-derived brain networks or connectomes can inform the study of brain function organization. However, fMRI connectomes additionally involve negative edges, which are not rigorously accounted for by existing approaches to modularity that either ignore or arbitrarily weight these connections. Furthermore, most Q maximization-based modularity algorithms yield variable results with suboptimal reproducibility. Here we present an alternative, reproducible approach that exploits how frequently the BOLD-signal correlation between two nodes is negative. We validated this novel probability-based modularity approach on two independent publicly-available resting-state connectome datasets (the Human Connectome Project and the 1000 Functional Connectomes) and demonstrated that negative correlations alone are sufficient in understanding resting-state modularity. In fact, this approach a) permits a dual formulation, leading to equivalent solutions regardless of whether one considers positive or negative edges; b) is theoretically linked to the Ising model defined on the connectome, thus yielding a modularity result that maximizes data likelihood. We additionally were able to detect sex differences in modularity that the most widely utilized methods did not. Results confirmed the superiority of our approach in that: a) correlations with the highest probability of being negative are consistently placed between modules, b) due to the equivalent dual forms, no arbitrary weighting factor is required to balance the influence between negative and positive correlations, unlike existing Q maximization-based modularity approaches. As datasets like HCP become widely available for analysis by the neuroscience community at large, appropriate computational tools to understand the neurobiological information of negative edges in fMRI connectomes are increasingly important.
Submitted 5 June, 2017; v1 submitted 6 September, 2016;
originally announced September 2016.
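The quantity this abstract builds on, how often the correlation between two nodes is negative, can be sketched as a simple frequency over a collection of correlation matrices. The abstract does not say what the frequency is taken over; the sketch below assumes one matrix per subject, and the function name and toy matrices are hypothetical. It does not implement the authors' modularity algorithm, only this empirical frequency.

```python
def negative_edge_frequency(corr_matrices):
    """Fraction of matrices in which each off-diagonal edge is negative."""
    n_mats = len(corr_matrices)
    n_nodes = len(corr_matrices[0])
    counts = [[0] * n_nodes for _ in range(n_nodes)]
    for corr in corr_matrices:
        for i in range(n_nodes):
            for j in range(n_nodes):
                # Diagonal entries (self-correlation) are skipped.
                if i != j and corr[i][j] < 0:
                    counts[i][j] += 1
    return [[c / n_mats for c in row] for row in counts]


# Two hypothetical 3-node correlation matrices (assumed: one per subject).
toy = [
    [[1.0, -0.2, 0.5], [-0.2, 1.0, -0.1], [0.5, -0.1, 1.0]],
    [[1.0, -0.4, 0.3], [-0.4, 1.0, 0.2], [0.3, 0.2, 1.0]],
]
freq = negative_edge_frequency(toy)
```

In this toy input the edge between nodes 0 and 1 is negative in both matrices, so under the abstract's description it would be a strong candidate for placement between modules.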
-
Large-scale Collaborative Imaging Genetics Studies of Risk Genetic Factors for Alzheimer's Disease Across Multiple Institutions
Authors:
Qingyang Li,
Tao Yang,
Liang Zhan,
Derrek Paul Hibar,
Neda Jahanshad,
Yalin Wang,
Jieping Ye,
Paul M. Thompson,
Jie Wang
Abstract:
Genome-wide association studies (GWAS) offer new opportunities to identify genetic risk factors for Alzheimer's disease (AD). Recently, collaborative efforts across different institutions emerged that enhance the power of many existing techniques on individual institution data. However, a major barrier to collaborative studies of GWAS is that many institutions need to preserve individual data privacy. To address this challenge, we propose a novel distributed framework, termed Local Query Model (LQM) to detect risk SNPs for AD across multiple research institutions. To accelerate the learning process, we propose a Distributed Enhanced Dual Polytope Projection (D-EDPP) screening rule to identify irrelevant features and remove them from the optimization. To the best of our knowledge, this is the first successful run of the computationally intensive model selection procedure to learn a consistent model across different institutions without compromising their privacy while ranking the SNPs that may collectively affect AD. Empirical studies are conducted on 809 subjects with 5.9 million SNP features which are distributed across three individual institutions. D-EDPP achieved a 66-fold speed-up by effectively identifying irrelevant features.
Submitted 19 August, 2016;
originally announced August 2016.
-
Unifying inference on brain network variations in neurological diseases: The Alzheimer's case
Authors:
Daniele Durante,
Madelaine Daianu,
Neda Jahanshad,
Paul M. Thompson,
David B. Dunson
Abstract:
There is growing interest in understanding how the structural interconnections among brain regions change with the occurrence of neurological diseases. Diffusion weighted MRI imaging has allowed researchers to non-invasively estimate a network of structural cortical connections made by white matter tracts, but current statistical methods for relating such networks to the presence or absence of a disease cannot exploit this rich network information. Standard practice considers each edge independently or summarizes the network with a few simple features. We enable dramatic gains in biological insight via a novel unifying methodology for inference on brain network variations associated to the occurrence of neurological diseases. The key of this approach is to define a probabilistic generative mechanism directly on the space of network configurations via dependent mixtures of low-rank factorizations, which efficiently exploit network information and allow the probability mass function for the brain network-valued random variable to vary flexibly across the group of patients characterized by a specific neurological disease and the one comprising age-matched cognitively healthy individuals.
Submitted 19 October, 2015;
originally announced October 2015.
-
SketchBio: A Scientist's 3D Interface for Molecular Modeling and Animation
Authors:
Shawn M. Waldon,
Peter M. Thompson,
Patrick J. Hahn,
Russell M. Taylor II
Abstract:
Background: Because of the difficulties involved in learning and using 3D modeling and rendering software, many scientists hire programmers or animators to create models and animations. This both slows the discovery process and provides opportunities for miscommunication. Working with multiple collaborators, we developed a set of design goals for a tool that would enable them to directly construct models and animations. Results: We present SketchBio, a tool that incorporates state-of-the-art bimanual interaction and drop shadows to enable rapid construction of molecular structures and animations. It includes three novel features (crystal by example, pose-mode physics, and spring-based layout) that accelerate operations common in the formation of molecular models. We present design decisions and their consequences, including cases where iterative design was required to produce effective approaches. Conclusions: The design decisions, novel features, and inclusion of state-of-the-art techniques enabled SketchBio to meet all of its design goals. These features and decisions can be incorporated into existing and new tools to improve their effectiveness.
Submitted 11 July, 2014;
originally announced July 2014.
-
Identification of gene pathways implicated in Alzheimer's disease using longitudinal imaging phenotypes with sparse regression
Authors:
Matt Silver,
Eva Janousova,
Xue Hua,
Paul M. Thompson,
Giovanni Montana
Abstract:
We present a new method for the detection of gene pathways associated with a multivariate quantitative trait, and use it to identify causal pathways associated with an imaging endophenotype characteristic of longitudinal structural change in the brains of patients with Alzheimer's disease (AD). Our method, known as pathways sparse reduced-rank regression (PsRRR), uses group lasso penalised regression to jointly model the effects of genome-wide single nucleotide polymorphisms (SNPs), grouped into functional pathways using prior knowledge of gene-gene interactions. Pathways are ranked in order of importance using a resampling strategy that exploits finite sample variability. Our application study uses whole genome scans and MR images from 464 subjects in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. 66,182 SNPs are mapped to 185 gene pathways from the KEGG pathways database. Voxel-wise imaging signatures characteristic of AD are obtained by analysing 3D patterns of structural change at 6, 12 and 24 months relative to baseline. High-ranking, AD endophenotype-associated pathways in our study include those describing chemokine, Jak-stat and insulin signalling pathways, and tight junction interactions. All of these have been previously implicated in AD biology. In a secondary analysis, we investigate SNPs and genes that may be driving pathway selection, and identify a number of previously validated AD genes including CR1, APOE and TOMM40.
Submitted 9 April, 2012;
originally announced April 2012.
-
Optimization of Surface Registrations using Beltrami Holomorphic Flow
Authors:
L. M. Lui,
T. W. Wong,
W. Zeng,
X. F. Gu,
P. M. Thompson,
T. F. Chan,
S. T. Yau
Abstract:
In shape analysis, finding an optimal 1-1 correspondence between surfaces within a large class of admissible bijective mappings is of great importance. Such a process is called surface registration. The difficulty lies in the fact that the space of all surface diffeomorphisms is a complicated functional space, making exhaustive search for the best mapping challenging. To tackle this problem, we propose a simple representation of bijective surface maps using Beltrami coefficients (BCs), which are complex-valued functions defined on surfaces with supremum norm less than 1. Fixing any 3 points on a pair of surfaces, there is a 1-1 correspondence between the set of surface diffeomorphisms between them and the set of BCs. Hence, every bijective surface map can be represented by a unique BC. Conversely, given a BC, we can reconstruct the unique surface map associated to it using the Beltrami Holomorphic flow (BHF) method. Using BCs to represent surface maps is advantageous because it is a much simpler functional space, which captures many essential features of a surface map. By adjusting BCs, we equivalently adjust surface diffeomorphisms to obtain the optimal map with desired properties. More specifically, BHF gives us the variation of the associated map under the variation of BC. Using this, a variational problem over the space of surface diffeomorphisms can be easily reformulated into a variational problem over the space of BCs. This makes the minimization procedure much easier. More importantly, the diffeomorphic property is always preserved. We test our method on synthetic examples and real medical applications. Experimental results demonstrate the effectiveness of our proposed algorithm for surface registration.
Submitted 14 May, 2010;
originally announced May 2010.
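To make the Beltrami coefficient concrete, the sketch below estimates it numerically for a map of the complex plane, using the standard definition mu = f_zbar / f_z with the Wirtinger derivatives f_z = (f_x - i f_y) / 2 and f_zbar = (f_x + i f_y) / 2. This is a planar toy under stated assumptions, not the surface setting or the Beltrami Holomorphic flow method of the paper; the function names, the step size `h`, and the example map are chosen for illustration only.

```python
def beltrami_coefficient(f, z, h=1e-6):
    """Estimate mu(z) = f_zbar / f_z by central finite differences."""
    fx = (f(z + h) - f(z - h)) / (2 * h)            # derivative along x
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # derivative along y
    f_z = 0.5 * (fx - 1j * fy)
    f_zbar = 0.5 * (fx + 1j * fy)
    return f_zbar / f_z


def affine_map(z):
    # Hypothetical test map f(z) = a*z + b*conj(z) with a = 2, b = 0.5,
    # whose Beltrami coefficient is the constant b / a = 0.25.
    return 2 * z + 0.5 * z.conjugate()


mu = beltrami_coefficient(affine_map, 0.3 + 0.1j)
```

Here |mu| < 1, matching the norm condition stated in the abstract; a conformal map such as f(z) = z would give mu = 0, so the size of mu measures how far a map is from being conformal.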