-
A principled framework to assess information theoretical fitness of brain functional sub-circuits
Authors:
Duy Duong-Tran,
Nghi Nguyen,
Shizhuo Mu,
Jiong Chen,
Jingxuan Bao,
Frederick Xu,
Sumita Garai,
Jose Cadena-Pico,
Alan David Kaplan,
Tianlong Chen,
Yize Zhao,
Li Shen,
Joaquín Goñi
Abstract:
In systems and network neuroscience, many common practices in brain connectomic analysis are often not properly scrutinized. One such practice is mapping a predetermined set of sub-circuits, like functional networks (FNs), onto subjects' functional connectomes (FCs) without adequately assessing the information-theoretic appropriateness of the partition. Another practice that goes unchallenged is thresholding weighted FCs to remove spurious connections without justifying the chosen threshold. This paper leverages recent theoretical advances in Stochastic Block Models (SBMs) to formally define and quantify the information-theoretic fitness (e.g., prominence) of a predetermined set of FNs when mapped to individual FCs under different fMRI task conditions. Our framework allows for evaluating any combination of FC granularity, FN partition, and thresholding strategy, thereby optimizing these choices to preserve important topological features of the human brain connectomes. Our results pave the way for the proper use of predetermined FNs and thresholding methods and provide insights for future research in individualized parcellations.
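To make the thresholding practice mentioned in this abstract concrete, here is a minimal sketch of what "thresholding a weighted FC" can look like. It is not the authors' framework: the function names, the toy matrix, the use of absolute values, and the candidate thresholds are all illustrative assumptions. It only shows how strongly the retained edge set depends on the chosen cutoff.

```python
def threshold_fc(fc, tau):
    """Zero out entries of a weighted FC whose absolute value is below tau.

    Assumption: thresholding on |weight|; other conventions exist.
    """
    return [[w if abs(w) >= tau else 0.0 for w in row] for row in fc]


def edge_density(fc):
    """Fraction of off-diagonal entries that remain nonzero."""
    n = len(fc)
    kept = sum(1 for i in range(n) for j in range(n)
               if i != j and fc[i][j] != 0.0)
    return kept / (n * (n - 1))


# Hypothetical 3-region connectome (symmetric, unit diagonal).
toy_fc = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, -0.4],
    [0.1, -0.4, 1.0],
]

# The surviving edge density changes with every candidate threshold.
for tau in (0.0, 0.3, 0.5):
    print(tau, edge_density(threshold_fc(toy_fc, tau)))
```

Because each cutoff yields a different graph, any downstream topological measure inherits that arbitrary choice, which is the concern the abstract raises.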
Submitted 26 June, 2024;
originally announced June 2024.
-
Knot data analysis using multiscale Gauss link integral
Authors:
Li Shen,
Hongsong Feng,
Fengling Li,
Fengchun Lei,
Jie Wu,
Guo-Wei Wei
Abstract:
In the past decade, topological data analysis (TDA) has emerged as a powerful approach in data science. The main technique in TDA is persistent homology, which tracks topological invariants over the filtration of point cloud data using algebraic topology. Although knot theory and related subjects are a focus of study in mathematics, their success in practical applications is quite limited due to the lack of localization and quantization. We address these challenges by introducing knot data analysis (KDA), a new paradigm that incorporates curve segmentation and multiscale analysis into the Gauss link integral. The resulting multiscale Gauss link integral (mGLI) recovers the global topological properties of knots and links at an appropriate scale but offers multiscale feature vectors to capture the local structures and connectivities of each curve segment at various scales. The proposed mGLI significantly outperforms other state-of-the-art methods in benchmark protein flexibility analysis, including earlier persistent homology-based methods. Our approach enables the integration of artificial intelligence (AI) and KDA for general curve-like objects and data.
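For readers unfamiliar with the quantity this abstract builds on, the classical Gauss linking integral of two closed curves, Lk = (1/4π) ∮∮ (r1 − r2) · (dr1 × dr2) / |r1 − r2|³, can be approximated by a midpoint rule over pairs of polyline segments. The sketch below is not the authors' mGLI: it has no curve segmentation and no multiscale cutoffs. The function names and the test curves (two unit circles forming a Hopf link) are illustrative assumptions.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _mid(a, b):
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, (a[2] + b[2]) / 2)


def gauss_linking_number(curve1, curve2):
    """Midpoint-rule approximation of the Gauss linking integral.

    Each curve is a list of 3D points describing a closed polyline
    (the last point connects back to the first).
    """
    total = 0.0
    n1, n2 = len(curve1), len(curve2)
    for i in range(n1):
        p0, p1 = curve1[i], curve1[(i + 1) % n1]
        d1, m1 = _sub(p1, p0), _mid(p0, p1)
        for j in range(n2):
            q0, q1 = curve2[j], curve2[(j + 1) % n2]
            d2, m2 = _sub(q1, q0), _mid(q0, q1)
            r = _sub(m1, m2)
            dist = math.sqrt(_dot(r, r))
            total += _dot(r, _cross(d1, d2)) / dist ** 3
    return total / (4 * math.pi)


def circle(center, u, v, n=100):
    """Sample n points on the circle center + cos(t) * u + sin(t) * v."""
    pts = []
    for k in range(n):
        t = 2 * math.pi * k / n
        c, s = math.cos(t), math.sin(t)
        pts.append(tuple(center[i] + c * u[i] + s * v[i] for i in range(3)))
    return pts


# Hopf link: a unit circle in the xy-plane and one in the xz-plane
# that passes through the first circle's centre.
ring_a = circle((0, 0, 0), (1, 0, 0), (0, 1, 0))
ring_b = circle((1, 0, 0), (1, 0, 0), (0, 0, 1))
# Same second ring moved far away, so the two rings are unlinked.
ring_far = circle((5, 0, 0), (1, 0, 0), (0, 0, 1))

print(gauss_linking_number(ring_a, ring_b))
print(gauss_linking_number(ring_a, ring_far))
```

The linked pair should give a value close to ±1 (the sign depends on orientation) and the separated pair a value close to 0. According to the abstract, mGLI recovers this global invariant at an appropriate scale and adds local, segment-level features at other scales.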
Submitted 2 October, 2023;
originally announced November 2023.
-
Modeling Path Importance for Effective Alzheimer's Disease Drug Repurposing
Authors:
Shunian Xiang,
Patrick J. Lawrence,
Bo Peng,
ChienWei Chiang,
Dokyoon Kim,
Li Shen,
Xia Ning
Abstract:
Recently, drug repurposing has emerged as an effective and resource-efficient paradigm for AD drug discovery. Among various methods for drug repurposing, network-based methods have shown promising results as they are capable of leveraging complex networks that integrate multiple interaction types, such as protein-protein interactions, to more effectively identify candidate drugs. However, existing approaches typically assume paths of the same length in the network have equal importance in identifying the therapeutic effect of drugs. Other domains have found that same length paths do not necessarily have the same importance. Thus, relying on this assumption may be deleterious to drug repurposing attempts. In this work, we propose MPI (Modeling Path Importance), a novel network-based method for AD drug repurposing. MPI is unique in that it prioritizes important paths via learned node embeddings, which can effectively capture a network's rich structural information. Thus, leveraging learned embeddings allows MPI to effectively differentiate the importance among paths. We evaluate MPI against a commonly used baseline method that identifies anti-AD drug candidates primarily based on the shortest paths between drugs and AD in the network. We observe that among the top-50 ranked drugs, MPI prioritizes 20.0% more drugs with anti-AD evidence compared to the baseline. Finally, Cox proportional-hazard models produced from insurance claims data aid us in identifying the use of etodolac, nicotine, and BBB-crossing ACE-INHs as having a reduced risk of AD, suggesting such drugs may be viable candidates for repurposing and should be explored further in future studies.
Submitted 27 October, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Incomplete Multimodal Learning for Complex Brain Disorders Prediction
Authors:
Reza Shirkavand,
Liang Zhan,
Heng Huang,
Li Shen,
Paul M. Thompson
Abstract:
Recent advancements in the acquisition of various brain data sources have created new opportunities for integrating multimodal brain data to assist in early detection of complex brain disorders. However, current data integration approaches typically need a complete set of biomedical data modalities, which may not always be feasible, as some modalities are only available in large-scale research cohorts and are prohibitive to collect in routine clinical practice. Especially in studies of brain diseases, research cohorts may include both neuroimaging data and genetic data, but for practical clinical diagnosis, we often need to make disease predictions only based on neuroimages. As a result, it is desired to design machine learning models which can use all available data (different data could provide complementary information) during training but conduct inference using only the most common data modality. We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks to effectively exploit auxiliary modalities available during training in order to improve the performance of a unimodal model at inference. We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Experimental results demonstrate that our approach outperforms the related machine learning and deep learning methods by a significant margin.
Submitted 25 May, 2023;
originally announced May 2023.
-
Establishing group-level brain structural connectivity incorporating anatomical knowledge under latent space modeling
Authors:
Selena Wang,
Yiting Wang,
Frederick H. Xu,
Li Shen,
Yize Zhao
Abstract:
Brain structural connectivity, capturing the white matter fiber tracts among brain regions inferred by diffusion MRI (dMRI), provides a unique characterization of brain anatomical organization. One fundamental question to address with structural connectivity is how to properly summarize and perform statistical inference for a group-level connectivity architecture, for instance, under different sex groups, or disease cohorts. Existing analyses commonly summarize group-level brain connectivity by a simple entry-wise sample mean or median across individual brain connectivity matrices. However, such a heuristic approach fully ignores the associations among structural connections and the topological properties of brain networks. In this project, we propose a latent space-based generative network model to estimate group-level brain connectivity. We name our method the attributes-informed brain connectivity (ABC) model, which compared with existing group-level connectivity estimations, (1) offers an interpretable latent space representation of the group-level connectivity, (2) incorporates the anatomical knowledge of nodes and tests its co-varying relationship with connectivity and (3) quantifies the uncertainty and evaluates the likelihood of the estimated group-level effects against chance. We devise a novel Bayesian MCMC algorithm to estimate the model. By applying the ABC model to study brain structural connectivity stratified by sex among Alzheimer's Disease (AD) subjects and healthy controls incorporating the anatomical attributes (volume, thickness and area) on nodes, our method shows superior predictive power on out-of-sample structural connectivity and identifies meaningful sex-specific network neuromarkers for AD.
Submitted 21 February, 2023;
originally announced April 2023.
-
Gene-SGAN: a method for discovering disease subtypes with imaging and genetic signatures via multi-view weakly-supervised deep clustering
Authors:
Zhijian Yang,
Junhao Wen,
Ahmed Abdulkadir,
Yuhan Cui,
Guray Erus,
Elizabeth Mamourian,
Randa Melhem,
Dhivya Srinivasan,
Sindhuja T. Govindarajan,
Jiong Chen,
Mohamad Habes,
Colin L. Masters,
Paul Maruff,
Jurgen Fripp,
Luigi Ferrucci,
Marilyn S. Albert,
Sterling C. Johnson,
John C. Morris,
Pamela LaMontagne,
Daniel S. Marcus,
Tammie L. S. Benzinger,
David A. Wolk,
Li Shen,
Jingxuan Bao,
Susan M. Resnick
, et al. (3 additional authors not shown)
Abstract:
Disease heterogeneity has been a critical challenge for precision diagnosis and treatment, especially in neurologic and neuropsychiatric diseases. Many diseases can display multiple distinct brain phenotypes across individuals, potentially reflecting disease subtypes that can be captured using MRI and machine learning methods. However, biological interpretability and treatment relevance are limited if the derived subtypes are not associated with genetic drivers or susceptibility factors. Herein, we describe Gene-SGAN - a multi-view, weakly-supervised deep clustering method - which dissects disease heterogeneity by jointly considering phenotypic and genetic data, thereby conferring genetic correlations to the disease subtypes and associated endophenotypic signatures. We first validate the generalizability, interpretability, and robustness of Gene-SGAN in semi-synthetic experiments. We then demonstrate its application to real multi-site datasets from 28,858 individuals, deriving subtypes of Alzheimer's disease and brain endophenotypes associated with hypertension, from MRI and SNP data. Derived brain phenotypes displayed significant differences in neuroanatomical patterns, genetic determinants, biological and clinical biomarkers, indicating potentially distinct underlying neuropathologic processes, genetic drivers, and susceptibility factors. Overall, Gene-SGAN is broadly applicable to disease subtyping and endophenotype discovery, and is herein tested on disease-related, genetically-driven neuroimaging phenotypes.
Submitted 25 January, 2023;
originally announced January 2023.
-
Protein Co-Enrichment Analysis of Extracellular Vesicles
Authors:
Molly L. Shen,
Zijie Jin,
Rosalie Martel,
Andreas Wallucks,
Lucile Alexandre,
Philippe DeCorwin-Martin,
Lorenna Oliveira Fernandes de Araujo,
Andy Ng,
David Juncker
Abstract:
Extracellular Vesicles (EVs) carry cell-derived proteins that confer functionality and selective cell uptake. However, whether proteins are packaged stochastically or co-enriched within individual EVs, and whether co-enrichment fluctuates under homeostasis and disease, has not been measured. EV abundance and protein global relative expression have been qualified by bulk analysis. Meanwhile, co-enrichment is not directly accessible via bulk measurement and has not been reported for single EV analysis. Here, we introduce the normalized index of co-enrichment (NICE) to measure protein co-enrichment. NICE was derived by (i) capturing EVs based on the expression of a membrane-bound protein, (ii) probing for the co-expression of a second protein at the population level - EV integrity underwrites the detection of single EV co-expression without the need to resolve single EVs - and (iii) normalizing measured values using two universal normalization probes. Axiomatically, NICE = 1 for stochastic inclusion or no overall co-enrichment, while for positive and negative co-enrichment NICE > 1 or < 1, respectively. We quantified the NICE of tetraspanins, growth factor receptors and integrins in EVs of eight breast cancer cell lines of varying metastatic potential and organotropism, combinatorially mapping up to 104 protein pairs. Our analysis revealed protein enrichment and co-expression patterns consistent with previous findings. For the organotropic cell lines, most protein pairs were co-enriched on EVs, with the majority of NICE values between 0.2 and 11.5, and extending from 0.037 to 80.4. Median NICE values were either negative, neutral or positive depending on the cells. NICE analysis is easily multiplexed and is compatible with microarrays, bead-based and single EV assays. Additional studies are needed to deepen our understanding of the potential and significance of NICE for research and clinical uses.
Submitted 17 May, 2023; v1 submitted 10 January, 2023;
originally announced January 2023.
-
SVSBI: Sequence-based virtual screening of biomolecular interactions
Authors:
Li Shen,
Hongsong Feng,
Yuchi Qiu,
Guo-Wei Wei
Abstract:
Virtual screening (VS) is an essential technique for understanding biomolecular interactions, particularly in drug design and discovery. The best-performing VS models depend vitally on three-dimensional (3D) structures, which are not available in general but can be obtained from molecular docking. However, current docking accuracy is relatively low, rendering unreliable VS models. We introduce sequence-based virtual screening (SVS) as a new generation of VS models for modeling biomolecular interactions. The SVS model utilizes advanced natural language processing (NLP) algorithms and optimizes deep $K$-embedding strategies to encode biomolecular interactions without invoking 3D structure-based docking. We demonstrate the state-of-the-art performance of SVS for four regression datasets involving protein-ligand binding, protein-protein, protein-nucleic acid binding, and ligand inhibition of protein-protein interactions and five classification datasets for the protein-protein interactions in five biological species. SVS has the potential to dramatically change the current practice in drug discovery and protein engineering.
Submitted 27 December, 2022;
originally announced December 2022.
-
Coordinating Cross-modal Distillation for Molecular Property Prediction
Authors:
Hao Zhang,
Nan Zhang,
Ruixin Zhang,
Lei Shen,
Yingyi Zhang,
Meng Liu
Abstract:
In recent years, molecular graph representation learning (GRL) has drawn much more attention in molecular property prediction (MPP) problems. The existing graph methods have demonstrated that 3D geometric information is significant for better performance in MPP. However, accurate 3D structures are often costly and time-consuming to obtain, limiting the large-scale application of GRL. It is an intuitive solution to train with 3D to 2D knowledge distillation and predict with only 2D inputs. But some challenging problems remain open for 3D to 2D distillation. One is that the 3D view is quite distinct from the 2D view, and the other is that the gradient magnitudes of atoms in distillation are discrepant and unstable due to the variable molecular size. To address these challenging problems, we exclusively propose a distillation framework that contains global molecular distillation and local atom distillation. We also provide a theoretical insight to justify how to coordinate atom and molecular information, which tackles the drawback of variable molecular size for atom information distillation. Experimental results on two popular molecular datasets demonstrate that our proposed model achieves superior performance over other methods. Specifically, on the largest MPP dataset PCQM4Mv2 served as an "ImageNet Large Scale Visual Recognition Challenge" in the field of graph ML, the proposed method achieved a 6.9% improvement compared with the best works. And we obtained fourth place with the MAE of 0.0734 on the test-challenge set for OGB-LSC 2022 Graph Regression Task. We will release the code soon.
Submitted 29 November, 2022;
originally announced November 2022.
-
Multidimensional representations in late-life depression: convergence in neuroimaging, cognition, clinical symptomatology and genetics
Authors:
Junhao Wen,
Cynthia H. Y. Fu,
Duygu Tosun,
Yogasudha Veturi,
Zhijian Yang,
Ahmed Abdulkadir,
Elizabeth Mamourian,
Dhivya Srinivasan,
Jingxuan Bao,
Guray Erus,
Haochang Shou,
Mohamad Habes,
Jimit Doshi,
Erdem Varol,
Scott R Mackin,
Aristeidis Sotiras,
Yong Fan,
Andrew J. Saykin,
Yvette I. Sheline,
Li Shen,
Marylyn D. Ritchie,
David A. Wolk,
Marilyn Albert,
Susan M. Resnick,
Christos Davatzikos
Abstract:
Late-life depression (LLD) is characterized by considerable heterogeneity in clinical manifestation. Unraveling such heterogeneity would aid in elucidating etiological mechanisms and pave the road to precision and individualized medicine. We sought to delineate, cross-sectionally and longitudinally, disease-related heterogeneity in LLD linked to neuroanatomy, cognitive functioning, clinical symptomatology, and genetic profiles. Multimodal data from a multicentre sample (N=996) were analyzed. A semi-supervised clustering method (HYDRA) was applied to regional grey matter (GM) brain volumes to derive dimensional representations. Two dimensions were identified, which accounted for the LLD-related heterogeneity in voxel-wise GM maps, white matter (WM) fractional anisotropy (FA), neurocognitive functioning, clinical phenotype, and genetics. Dimension one (Dim1) demonstrated relatively preserved brain anatomy without WM disruptions relative to healthy controls. In contrast, dimension two (Dim2) showed widespread brain atrophy and WM integrity disruptions, along with cognitive impairment and higher depression severity. Moreover, one de novo independent genetic variant (rs13120336) was significantly associated with Dim1 but not with Dim2. Notably, the two dimensions demonstrated significant SNP-based heritability of 18-27% within the general population (N=12,518 in UKBB). Lastly, in a subset of individuals having longitudinal measurements, Dim2 demonstrated a more rapid longitudinal decrease in GM and brain age, and was more likely to progress to Alzheimer's disease, compared to Dim1 (N=1,413 participants and 7,225 scans from ADNI, BLSA, and BIOCARD datasets).
Submitted 25 October, 2021; v1 submitted 20 October, 2021;
originally announced October 2021.
-
Functional Connectome Fingerprint Gradients in Young Adults
Authors:
Uttara Tipnis,
Kausar Abbas,
Elizabeth Tran,
Enrico Amico,
Li Shen,
Alan D. Kaplan,
Joaquín Goñi
Abstract:
The assessment of brain fingerprints has emerged in recent years as an important tool to study individual differences and to infer quality of neuroimaging datasets. Studies so far have mainly focused on connectivity fingerprints between different brain scans of the same individual. Here, we extend the concept of brain connectivity fingerprints beyond test/retest and assess fingerprint gradients in young adults by developing an extension of the differential identifiability framework. To do so, we look at the similarity between not only the multiple scans of an individual (subject fingerprint), but also between the scans of monozygotic and dizygotic twins (twin fingerprint). We have carried out this analysis on the 8 fMRI conditions present in the Human Connectome Project -- Young Adult dataset, which we processed into functional connectomes (FCs) and timeseries parcellated according to the Schaefer Atlas scheme, which has multiple levels of resolution. Our differential identifiability results show that the fingerprint gradients based on genetic and environmental similarities are indeed present when comparing FCs for all parcellations and fMRI conditions. Importantly, only when assessing optimally reconstructed FCs, we fully uncover fingerprints present in higher resolution atlases. We also study the effect of scanning length on subject fingerprint of resting-state FCs to analyze the effect of scanning length and parcellation. In the pursuit of open science, we have also made available the processed and parcellated FCs and timeseries for all conditions for ~1200 subjects part of the HCP-YA dataset to the scientific community.
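As background for the "differential identifiability" this abstract extends, the basic test/retest score is commonly computed as the mean self-similarity minus the mean similarity to others, scaled by 100. The sketch below assumes that basic definition with Pearson correlation between vectorized FCs. It does not implement the twin-fingerprint extension described in the abstract, and the function names and toy vectors are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def differential_identifiability(test_fcs, retest_fcs):
    """Return 100 * (mean self-similarity - mean similarity to others).

    test_fcs[i] and retest_fcs[i] are vectorized FCs (e.g. upper-triangle
    entries) of the same subject i from two different scans.
    """
    n = len(test_fcs)
    ident = [[pearson(t, r) for r in retest_fcs] for t in test_fcs]
    i_self = sum(ident[i][i] for i in range(n)) / n
    i_others = sum(ident[i][j] for i in range(n) for j in range(n)
                   if i != j) / (n * (n - 1))
    return 100.0 * (i_self - i_others)


# Hypothetical vectorized FCs for three subjects, two scans each.
test_scans = [[0.9, 0.1, 0.4, 0.7], [0.2, 0.8, 0.6, 0.1], [0.5, 0.5, 0.9, 0.2]]
retest_scans = [[0.8, 0.2, 0.5, 0.7], [0.3, 0.7, 0.6, 0.2], [0.4, 0.6, 0.8, 0.1]]
print(differential_identifiability(test_scans, retest_scans))
```

A larger positive score means scans of the same subject resemble each other more than scans of different subjects. Under this assumed setup, a twin fingerprint would replace the "self" pairs with pairs of twins.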
Submitted 11 January, 2021; v1 submitted 10 November, 2020;
originally announced November 2020.
-
Network reinforcement driven drug repurposing for COVID-19 by exploiting disease-gene-drug associations
Authors:
Yonghyun Nam,
Jae-Seung Yun,
Seung Mi Lee,
Ji Won Park,
Ziqi Chen,
Brian Lee,
Anurag Verma,
Xia Ning,
Li Shen,
Dokyoon Kim
Abstract:
Currently, the number of patients with COVID-19 has significantly increased. Thus, there is an urgent need for developing treatments for COVID-19. Drug repurposing, which is the process of reusing already-approved drugs for new medical conditions, can be a good way to solve this problem quickly and broadly. Many clinical trials for COVID-19 patients using treatments for other diseases have already been in place or will be performed at clinical sites in the near future. Additionally, patients with comorbidities such as diabetes mellitus, obesity, liver cirrhosis, kidney diseases, hypertension, and asthma are at higher risk for severe illness from COVID-19. Thus, the relationship of comorbidity disease with COVID-19 may help to find repurposable drugs. To reduce trial and error in finding treatments for COVID-19, we propose building a network-based drug repurposing framework to prioritize repurposable drugs. First, we utilized knowledge of COVID-19 to construct a disease-gene-drug network (DGDr-Net) representing a COVID-19-centric interactome with components for diseases, genes, and drugs. DGDr-Net consisted of 592 diseases, 26,681 human genes and 2,173 drugs, and medical information for 18 common comorbidities. The DGDr-Net recommended candidate repurposable drugs for COVID-19 through network reinforcement driven scoring algorithms. The scoring algorithms determined the priority of recommendations by utilizing graph-based semi-supervised learning. From the predicted scores, we recommended 30 drugs, including dexamethasone, resveratrol, methotrexate, indomethacin, quercetin, etc., as repurposable drugs for COVID-19, and the results were verified with drugs that have been under clinical trials. The list of drugs via a data-driven computational approach could help reduce trial-and-error in finding treatment for COVID-19.
Submitted 12 August, 2020;
originally announced August 2020.
-
Cognitive Biomarker Prioritization in Alzheimer's Disease using Brain Morphometric Data
Authors:
Bo Peng,
Xiaohui Yao,
Shannon L. Risacher,
Andrew J. Saykin,
Li Shen,
Xia Ning
Abstract:
Background: Cognitive assessments represent the most common clinical routine for the diagnosis of Alzheimer's Disease (AD). Given a large number of cognitive assessment tools and time-limited office visits, it is important to determine a proper set of cognitive tests for different subjects. Most current studies create guidelines of cognitive test selection for a targeted population, but they are not customized for each individual subject. In this manuscript, we develop a machine learning paradigm enabling personalized cognitive assessments prioritization. Method: We adapt a newly developed learning-to-rank approach PLTR to implement our paradigm. This method learns the latent scoring function that pushes the most effective cognitive assessments onto the top of the prioritization list. We also extend PLTR to better separate the most effective cognitive assessments and the less effective ones. Results: Our empirical study on the ADNI data shows that the proposed paradigm outperforms the state-of-the-art baselines on identifying and prioritizing individual-specific cognitive biomarkers. We conduct experiments in cross validation and level-out validation settings. In the two settings, our paradigm significantly outperforms the best baselines with improvement as much as 22.1% and 19.7%, respectively, on prioritizing cognitive features. Conclusions: The proposed paradigm achieves superior performance on prioritizing cognitive biomarkers. The cognitive biomarkers prioritized on top have great potential to facilitate personalized diagnosis, disease subtyping, and ultimately precision medicine in AD.
Submitted 12 November, 2020; v1 submitted 18 February, 2020;
originally announced February 2020.
-
Deep Learning for Quality Control of Subcortical Brain 3D Shape Models
Authors:
Dmitry Petrov,
Boris A. Gutman,
Egor Kuznetsov,
Theo G. M. van Erp,
Jessica A. Turner,
Lianne Schmaal,
Dick Veltman,
Lei Wang,
Kathryn Alpert,
Dmitry Isaev,
Artemis Zavaliangos-Petropulu,
Christopher R. K. Ching,
Vince Calhoun,
David Glahn,
Theodore D. Satterthwaite,
Ole Andreas Andreassen,
Stefan Borgwardt,
Fleur Howells,
Nynke Groenewold,
Aristotle Voineskos,
Joaquim Radua,
Steven G. Potkin,
Benedicto Crespo-Facorro,
Diana Tordesillas-Gutierrez,
Li Shen,
Irina Lebedeva
, et al. (48 additional authors not shown)
Abstract:
We present several deep learning models for assessing the morphometric fidelity of deep grey matter region models extracted from brain MRI. We test three different convolutional neural net architectures (VGGNet, ResNet and Inception) over 2D maps of geometric features. Further, we present a novel geometry feature augmentation technique based on a parametric spherical mapping. Finally, we present an approach for model decision visualization, allowing human raters to see the areas of subcortical shapes most likely to be deemed of failing quality by the machine. Our training data is comprised of 5200 subjects from the ENIGMA Schizophrenia MRI cohorts, and our test dataset contains 1500 subjects from the ENIGMA Major Depressive Disorder cohorts. Our final models reduce human rater time by 46-70%. ResNet outperforms VGGNet and Inception for all of our predictive tasks.
Submitted 4 September, 2018; v1 submitted 30 August, 2018;
originally announced August 2018.
-
Machine Learning for Large-Scale Quality Control of 3D Shape Models in Neuroimaging
Authors:
Dmitry Petrov,
Boris A. Gutman,
Shih-Hua Yu,
Theo G. M. van Erp,
Jessica A. Turner,
Lianne Schmaal,
Dick Veltman,
Lei Wang,
Kathryn Alpert,
Dmitry Isaev,
Artemis Zavaliangos-Petropulu,
Christopher R. K. Ching,
Vince Calhoun,
David Glahn,
Theodore D. Satterthwaite,
Ole Andreas Andreassen,
Stefan Borgwardt,
Fleur Howells,
Nynke Groenewold,
Aristotle Voineskos,
Joaquim Radua,
Steven G. Potkin,
Benedicto Crespo-Facorro,
Diana Tordesillas-Gutierrez
, et al. (50 additional authors not shown)
Abstract:
As very large studies of complex neuroimaging phenotypes become more common, human quality assessment of MRI-derived data remains one of the last major bottlenecks. Few attempts have so far been made to address this issue with machine learning. In this work, we optimize predictive models of quality for meshes representing deep brain structure shapes. We use standard vertex-wise and global shape features computed homologously across 19 cohorts and over 7500 human-rated subjects, training kernelized Support Vector Machine and Gradient Boosted Decision Trees classifiers to detect meshes of failing quality. Our models generalize across datasets and diseases, reducing human workload by 30-70%, or equivalently hundreds of human rater hours for datasets of comparable size, with recall rates approaching inter-rater reliability.
Submitted 7 August, 2017; v1 submitted 19 July, 2017;
originally announced July 2017.
-
A Unified Model for Differential Expression Analysis of RNA-seq Data via L1-Penalized Linear Regression
Authors:
Kefei Liu,
Jieping Ye,
Yang Yang,
Li Shen,
Hui Jiang
Abstract:
RNA-sequencing (RNA-seq) is becoming increasingly popular for quantifying gene expression levels. Since the RNA-seq measurements are relative in nature, between-sample normalization of counts is an essential step in differential expression (DE) analysis. The normalization of existing DE detection algorithms is ad hoc and performed once for all prior to DE detection, which may be suboptimal since ideally normalization should be based on non-DE genes only and thus coupled with DE detection. We propose a unified statistical model for joint normalization and DE detection of log-transformed RNA-seq data. Sample-specific normalization factors are modeled as unknown parameters in the gene-wise linear models and jointly estimated with the regression coefficients. By imposing sparsity-inducing L1 penalty (or mixed L1/L2-norm for multiple treatment conditions) on the regression coefficients, we formulate the problem as a penalized least-squares regression problem and apply the augmented Lagrangian method to solve it. Simulation studies show that the proposed model and algorithms outperform existing methods in terms of detection power and false-positive rate when more than half of the genes are differentially expressed and/or when the up- and down-regulated genes among DE genes are unbalanced in amount.
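To illustrate why an L1 penalty is "sparsity-inducing", the sketch below shows the soft-thresholding operator, the proximal map of the L1 norm, which appears as a step in many solvers for L1-penalized least squares. Whether the authors' augmented Lagrangian solver takes exactly this form is not stated in the abstract; the function name, the penalty value, and the toy coefficients are assumptions.

```python
def soft_threshold(value, lam):
    """Proximal operator of lam * |x|: shrink toward zero, clip at zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


# Hypothetical per-gene effect estimates before penalization.
raw_effects = [2.5, -0.3, 0.1, -1.8, 0.05]
penalty = 0.5  # illustrative value, not taken from the paper

shrunk = [soft_threshold(b, penalty) for b in raw_effects]
de_calls = [b != 0.0 for b in shrunk]
print(shrunk)
print(de_calls)
```

Small estimates are set exactly to zero, so those genes are treated as not differentially expressed, while large estimates survive with reduced magnitude. This is the mechanism that lets a penalized model select DE genes and fit the remaining parameters in one problem.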
Submitted 11 October, 2016;
originally announced October 2016.