-
A principled framework to assess information theoretical fitness of brain functional sub-circuits
Authors:
Duy Duong-Tran,
Nghi Nguyen,
Shizhuo Mu,
Jiong Chen,
**gxuan Bao,
Frederick Xu,
Sumita Garai,
Jose Cadena-Pico,
Alan David Kaplan,
Tianlong Chen,
Yize Zhao,
Li Shen,
Joaquín Goñi
Abstract:
In systems and network neuroscience, many common practices in brain connectomic analysis are often not properly scrutinized. One such practice is mapping a predetermined set of sub-circuits, like functional networks (FNs), onto subjects' functional connectomes (FCs) without adequately assessing the information-theoretic appropriateness of the partition. Another practice that goes unchallenged is thresholding weighted FCs to remove spurious connections without justifying the chosen threshold. This paper leverages recent theoretical advances in Stochastic Block Models (SBMs) to formally define and quantify the information-theoretic fitness (e.g., prominence) of a predetermined set of FNs when mapped to individual FCs under different fMRI task conditions. Our framework allows for evaluating any combination of FC granularity, FN partition, and thresholding strategy, thereby optimizing these choices to preserve important topological features of the human brain connectomes. Our results pave the way for the proper use of predetermined FNs and thresholding methods and provide insights for future research in individualized parcellations.
Submitted 26 June, 2024;
originally announced June 2024.
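The abstract does not spell out its fitness measure. As a rough illustration of the kind of SBM result it builds on, the sketch below (NumPy; all function names are hypothetical) binarizes a weighted FC at a chosen threshold, estimates within- and between-block edge densities for a given two-block partition, and evaluates the known exact-recovery condition for the symmetric two-community SBM: with p = a log(n)/n and q = b log(n)/n, exact recovery is information-theoretically possible when (sqrt(a) - sqrt(b))^2 / 2 > 1. It assumes two equal-sized blocks and is not the paper's multi-FN measure.

```python
import math

import numpy as np


def binarize(fc, threshold):
    """Keep edges whose weight exceeds `threshold`; drop self-loops."""
    adj = (fc > threshold).astype(int)
    np.fill_diagonal(adj, 0)
    return adj


def two_block_recovery_margin(adj, labels):
    """(sqrt(a) - sqrt(b))**2 / 2 for a symmetric two-block SBM fit to `adj`.

    p = a*log(n)/n is the within-block edge density and q = b*log(n)/n the
    between-block density. For two equal-sized communities, exact recovery
    is information-theoretically possible when this value exceeds 1.
    """
    n = adj.shape[0]
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(n, dtype=bool)
    p = adj[same & off_diag].mean()  # within-block density (self-loops excluded)
    q = adj[~same].mean()  # between-block density
    a, b = p * n / math.log(n), q * n / math.log(n)
    return (math.sqrt(a) - math.sqrt(b)) ** 2 / 2


# Toy weighted FC with two equal-sized sub-circuits: strong within, weak between.
labels = np.repeat([0, 1], 10)
fc = np.where(labels[:, None] == labels[None, :], 0.8, 0.1)
adj = binarize(fc, threshold=0.5)
print("margin at 0.5:", two_block_recovery_margin(adj, labels))

# The same partition can look prominent or invisible depending on the threshold.
for thr in (0.05, 0.5, 0.9):
    print(thr, two_block_recovery_margin(binarize(fc, thr), labels))
```

In this toy case, thresholds of 0.05 or 0.9 erase the block structure (margin 0), while 0.5 preserves it, which mirrors the abstract's point that the threshold should be justified rather than fixed by convention.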
-
Sequential Inference of Hospitalization Electronic Health Records Using Probabilistic Models
Authors:
Alan D. Kaplan,
Priyadip Ray,
John D. Greene,
Vincent X. Liu
Abstract:
In the dynamic hospital setting, decision support can be a valuable tool for improving patient outcomes. Data-driven inference of future outcomes is challenging in this dynamic setting, where long sequences such as laboratory tests and medications are updated frequently. This is due in part to the heterogeneity of data types and the mixed-sequence types contained in variable-length sequences. In this work, we design a probabilistic unsupervised model for multiple arbitrary-length sequences contained in hospitalization Electronic Health Record (EHR) data. The model uses a latent variable structure and captures complex relationships between medications, diagnoses, laboratory tests, and neurological assessments. It can be trained on original data, without requiring any lossy transformations or time binning. Inference algorithms are derived that use partial data to infer properties of the complete sequences, including their length and the presence of specific values. We train this model on data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The results are evaluated against held-out data for predicting the length of sequences and the presence of Intensive Care Unit (ICU) in hospitalization bed sequences. Our method outperforms a baseline approach, showing that in these experiments the trained model captures information in the sequences that is informative of their future values.
Submitted 24 April, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Tangent functional connectomes uncover more unique phenotypic traits
Authors:
Kausar Abbas,
Mintao Liu,
Michael Wang,
Duy Duong-Tran,
Uttara Tipnis,
Enrico Amico,
Alan D. Kaplan,
Mario Dzemidzic,
David Kareken,
Beau M. Ances,
Jaroslaw Harezlak,
Joaquín Goñi
Abstract:
Functional connectomes (FCs) contain pairwise estimates of functional coupling based on the activity of pairs of brain regions. FCs are commonly represented as correlation matrices that are symmetric positive definite (SPD), lying on or inside the SPD manifold. Since the geometry of the SPD manifold is non-Euclidean, the inter-related entries of FCs undermine the use of Euclidean-based distances. By projecting FCs into a tangent space, we can obtain tangent functional connectomes (tangent-FCs). Tangent-FCs have shown higher predictive power for behavior and cognition, but no studies have evaluated the effect of such projections with respect to fingerprinting. We hypothesize that tangent-FCs have a higher fingerprint than regular FCs. Fingerprinting was measured by identification rates (ID rates) on test-retest FCs as well as on monozygotic and dizygotic twins. Our results showed that ID rates are systematically higher when using tangent-FCs. Specifically, we found: (i) Riemann and log-Euclidean matrix references systematically led to higher ID rates. (ii) In tangent-FCs, main-diagonal regularization prior to tangent space projection was critical for ID rates when using Euclidean distance, whereas it barely affected ID rates when using correlation distance. (iii) ID rates were dependent on condition and fMRI scan length. (iv) Parcellation granularity was key for ID rates in FCs, as well as in tangent-FCs with fixed regularization, whereas optimal regularization of tangent-FCs mostly removed this effect. (v) Correlation distance in tangent-FCs outperformed any other configuration of distance on FCs or on tangent-FCs across the fingerprint gradient (here sampled by assessing test-retest, monozygotic, and dizygotic twins). (vi) ID rates tended to be higher in task scans than in resting-state scans when accounting for fMRI scan length.
Submitted 9 June, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.
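A minimal sketch of the tangent-FC pipeline described above, assuming NumPy, a log-Euclidean reference matrix, main-diagonal regularization by a fixed tau, and a one-directional ID rate (each test scan is matched to the retest scan with the smallest correlation distance). Function names, the simulated timeseries, and tau = 0.1 are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def eig_apply(mat, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T


def regularize(fc, tau):
    """Main-diagonal regularization: add tau to every diagonal entry."""
    return fc + tau * np.eye(fc.shape[0])


def log_euclidean_mean(fcs):
    """Reference matrix: matrix exponential of the average matrix logarithm."""
    return eig_apply(np.mean([eig_apply(c, np.log) for c in fcs], axis=0), np.exp)


def tangent_fc(fc, ref):
    """Project an SPD matrix to the tangent space at `ref`: logm(ref^-1/2 fc ref^-1/2)."""
    ref_isqrt = eig_apply(ref, lambda v: 1.0 / np.sqrt(v))
    whitened = ref_isqrt @ fc @ ref_isqrt
    return eig_apply((whitened + whitened.T) / 2, np.log)  # symmetrize round-off first


def upper(mat):
    """Vectorize the upper triangle, diagonal included."""
    i, j = np.triu_indices(mat.shape[0])
    return mat[i, j]


def id_rate(test_vecs, retest_vecs):
    """Share of subjects whose test vector is closest, by correlation distance
    (1 - Pearson r), to their own retest vector."""
    n = len(test_vecs)
    corr = np.corrcoef(np.vstack([test_vecs, retest_vecs]))[:n, n:]
    return float(np.mean(np.argmax(corr, axis=1) == np.arange(n)))


# Toy data: each subject has its own mixing matrix; test and retest differ only in noise.
rng = np.random.default_rng(0)
n_subj, n_reg, n_frames, tau = 5, 8, 1000, 0.1
mixers = [rng.standard_normal((n_reg, n_reg)) for _ in range(n_subj)]


def simulate_fc(mixer):
    ts = rng.standard_normal((n_frames, n_reg)) @ mixer
    return np.corrcoef(ts.T)


test = [regularize(simulate_fc(m), tau) for m in mixers]
retest = [regularize(simulate_fc(m), tau) for m in mixers]
ref = log_euclidean_mean(test + retest)
tan_test = np.array([upper(tangent_fc(c, ref)) for c in test])
tan_retest = np.array([upper(tangent_fc(c, ref)) for c in retest])
print("tangent-FC ID rate:", id_rate(tan_test, tan_retest))
```

Swapping `log_euclidean_mean` for a Riemannian mean, or `id_rate` for a Euclidean-distance version, would reproduce the other configurations the abstract compares.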
-
Unsupervised Probabilistic Models for Sequential Electronic Health Records
Authors:
Alan D. Kaplan,
John D. Greene,
Vincent X. Liu,
Priyadip Ray
Abstract:
We develop an unsupervised probabilistic model for heterogeneous Electronic Health Record (EHR) data. Utilizing a mixture model formulation, our approach directly models sequences of arbitrary length, such as medications and laboratory results. This allows for subgrouping and incorporation of the dynamics underlying heterogeneous data types. The model consists of a layered set of latent variables that encode underlying structure in the data. These variables represent subject subgroups at the top layer, and unobserved states for sequences in the second layer. We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The resulting properties of the trained model generate novel insight from these complex and multifaceted data. In addition, we show how the model can be used to analyze sequences that contribute to assessment of mortality likelihood.
Submitted 31 August, 2022; v1 submitted 14 April, 2022;
originally announced April 2022.
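One way to picture the layered latent structure described above is a mixture of hidden Markov models: a top-level subgroup variable selects an HMM whose hidden states generate a discrete sequence of any length. The sketch below assumes exactly that (discrete emissions, one sequence per subject, NumPy), which is a simplification rather than the paper's model; it computes log p(sequence) with the forward algorithm and the posterior over subgroups, which can also be evaluated on a partial sequence. All names and parameters are hypothetical.

```python
import numpy as np


def logsumexp(x, axis=None):
    """Numerically stable log(sum(exp(x))) along `axis` (all entries if None)."""
    x = np.asarray(x, dtype=float)
    m = np.max(x, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return out.item() if axis is None else np.squeeze(out, axis=axis)


def hmm_loglik(seq, log_pi, log_A, log_B):
    """Forward algorithm in log space for a discrete-emission HMM.

    log_pi: (K,) initial state log-probabilities
    log_A:  (K, K) transitions, A[i, j] = P(next state j | state i)
    log_B:  (K, V) emissions, B[k, v] = P(symbol v | state k)
    Works for sequences of any length >= 1.
    """
    alpha = log_pi + log_B[:, seq[0]]
    for obs in seq[1:]:
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[:, obs]
    return logsumexp(alpha)


def subgroup_posterior(seq, weights, hmms):
    """Top layer: p(subgroup | seq) and log p(seq) for a mixture of HMMs."""
    joint = np.array([np.log(w) + hmm_loglik(seq, *h) for w, h in zip(weights, hmms)])
    total = logsumexp(joint)
    return np.exp(joint - total), total


def make_hmm(emissions):
    """Two hidden states with sticky transitions; `emissions` is a (2, V) table."""
    pi = np.array([0.5, 0.5])
    A = np.array([[0.9, 0.1], [0.1, 0.9]])
    return np.log(pi), np.log(A), np.log(np.asarray(emissions))


hmm_low = make_hmm([[0.9, 0.1], [0.8, 0.2]])   # subgroup mostly emitting symbol 0
hmm_high = make_hmm([[0.1, 0.9], [0.2, 0.8]])  # subgroup mostly emitting symbol 1
post, loglik = subgroup_posterior([0, 0, 0, 1], [0.5, 0.5], [hmm_low, hmm_high])
print("p(subgroup | partial sequence):", post)
```

Because the forward recursion never bins time or pads sequences, variable-length records are handled without transformation; this is the property the abstract emphasizes, though the paper's own inference may differ.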
-
Continuous-Time Probabilistic Models for Longitudinal Electronic Health Records
Authors:
Alan D. Kaplan,
Uttara Tipnis,
Jean C. Beckham,
Nathan A. Kimbrel,
David W. Oslin,
Benjamin H. McMahon
Abstract:
Analysis of longitudinal Electronic Health Record (EHR) data is an important goal for precision medicine. Difficulty in applying Machine Learning (ML) methods, either predictive or unsupervised, stems in part from the heterogeneity and irregular sampling of EHR data. We present an unsupervised probabilistic model that captures nonlinear relationships between variables in continuous time. This method works with arbitrary sampling patterns and captures the joint probability distribution between variable measurements and the time intervals between them. Inference algorithms are derived that can be used to evaluate the likelihood of future using under a trained model. As an example, we consider data from the United States Veterans Health Administration (VHA) in the areas of diabetes and depression. Likelihood ratio maps are produced showing the likelihood of risk for moderate-severe vs. minimal depression as measured by the Patient Health Questionnaire-9 (PHQ-9).
Submitted 14 April, 2022; v1 submitted 10 January, 2022;
originally announced January 2022.
-
Mixture Model Framework for Traumatic Brain Injury Prognosis Using Heterogeneous Clinical and Outcome Data
Authors:
Alan D. Kaplan,
Qi Cheng,
K. Aditya Mohan,
Lindsay D. Nelson,
Sonia Jain,
Harvey Levin,
Abel Torres-Espin,
Austin Chou,
J. Russell Huie,
Adam R. Ferguson,
Michael McCrea,
Joseph Giacino,
Shivshankar Sundaram,
Amy J. Markowitz,
Geoffrey T. Manley
Abstract:
Prognoses of Traumatic Brain Injury (TBI) outcomes are neither easily nor accurately determined from clinical indicators. This is due in part to the heterogeneity of damage inflicted to the brain, ultimately resulting in diverse and complex outcomes. Using a data-driven approach on many distinct data elements may be necessary to describe this large set of outcomes and thereby robustly depict the nuanced differences among TBI patients' recovery. In this work, we develop a method for modeling large heterogeneous data types relevant to TBI. Our approach is geared toward the probabilistic representation of mixed continuous and discrete variables with missing values. The model is trained on a dataset encompassing a variety of data types, including demographics, blood-based biomarkers, and imaging findings. In addition, it includes a set of clinical outcome assessments at 3, 6, and 12 months post-injury. The model is used to stratify patients into distinct groups in an unsupervised learning setting. We use the model to infer outcomes using input data, and show that the collection of input data reduces uncertainty of outcomes over a baseline approach. In addition, we quantify the performance of a likelihood scoring technique that can be used to self-evaluate the extrapolation risk of prognosis on unseen patients.
Submitted 20 July, 2021; v1 submitted 22 December, 2020;
originally announced December 2020.
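A rough sketch of two ideas from this abstract, missing-value handling and likelihood scoring, assuming a mixture of diagonal Gaussians over continuous features only (the paper also models discrete variables). With diagonal covariances, marginalizing a missing feature amounts to dropping its factor; the resulting log-likelihood can flag patients unlike the training data, and the component posterior gives an unsupervised stratification. The function name and all parameters are hypothetical.

```python
import numpy as np


def observed_loglik(x, weights, means, variances):
    """Score one record under a diagonal-Gaussian mixture, skipping NaN entries.

    With diagonal covariances, integrating out a missing feature simply drops
    its factor, so only observed features contribute.
    Returns (log p(x_observed), posterior over mixture components).
    """
    obs = ~np.isnan(x)
    xo, mu, var = x[obs], means[:, obs], variances[:, obs]
    comp = np.log(weights) - 0.5 * np.sum(
        np.log(2 * np.pi * var) + (xo - mu) ** 2 / var, axis=1
    )
    top = comp.max()
    total = top + np.log(np.sum(np.exp(comp - top)))  # log-sum-exp over components
    return float(total), np.exp(comp - total)


# Hypothetical two-group model over two continuous features.
weights = np.array([0.6, 0.4])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))

typical, groups = observed_loglik(np.array([0.2, np.nan]), weights, means, variances)
unusual, _ = observed_loglik(np.array([12.0, np.nan]), weights, means, variances)
print("group posterior:", groups)
print("lower score flags extrapolation:", unusual < typical)
```

A much lower score for a new patient than for typical training records suggests the prognosis would be an extrapolation; how the paper calibrates such a cutoff is not stated in the abstract.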
-
Functional Connectome Fingerprint Gradients in Young Adults
Authors:
Uttara Tipnis,
Kausar Abbas,
Elizabeth Tran,
Enrico Amico,
Li Shen,
Alan D. Kaplan,
Joaquín Goñi
Abstract:
The assessment of brain fingerprints has emerged in recent years as an important tool to study individual differences and to infer the quality of neuroimaging datasets. Studies so far have mainly focused on connectivity fingerprints between different brain scans of the same individual. Here, we extend the concept of brain connectivity fingerprints beyond test/retest and assess fingerprint gradients in young adults by developing an extension of the differential identifiability framework. To do so, we look at the similarity not only between the multiple scans of an individual (subject fingerprint), but also between the scans of monozygotic and dizygotic twins (twin fingerprint). We carried out this analysis on the 8 fMRI conditions present in the Human Connectome Project -- Young Adult (HCP-YA) dataset, which we processed into functional connectomes (FCs) and timeseries parcellated according to the Schaefer Atlas scheme, which has multiple levels of resolution. Our differential identifiability results show that fingerprint gradients based on genetic and environmental similarities are indeed present when comparing FCs for all parcellations and fMRI conditions. Importantly, fingerprints present in higher-resolution atlases are fully uncovered only when assessing optimally reconstructed FCs. We also study the effect of scanning length and parcellation on the subject fingerprint of resting-state FCs. In the pursuit of open science, we have also made the processed and parcellated FCs and timeseries for all conditions of the ~1200 HCP-YA subjects available to the scientific community.
Submitted 11 January, 2021; v1 submitted 10 November, 2020;
originally announced November 2020.
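The differential identifiability framework extended in this work scores an identifiability matrix A, where A[s, t] is the Pearson correlation between subject s's test FC and subject t's retest FC (upper-triangular entries), as Idiff = (Iself - Iothers) * 100, with Iself the mean of the diagonal and Iothers the mean of the off-diagonal entries. Below is a minimal NumPy sketch of that standard subject-level score; the twin-level extension is not reproduced, and the function names and simulated data are assumptions.

```python
import numpy as np


def upper_vec(fc):
    """Upper-triangular entries (diagonal excluded) of an FC matrix."""
    i, j = np.triu_indices(fc.shape[0], k=1)
    return fc[i, j]


def identifiability_matrix(test_fcs, retest_fcs):
    """A[s, t] = Pearson r between subject s's test FC and subject t's retest FC."""
    n = len(test_fcs)
    vecs = np.array([upper_vec(c) for c in list(test_fcs) + list(retest_fcs)])
    return np.corrcoef(vecs)[:n, n:]


def differential_identifiability(ident):
    """Idiff = (Iself - Iothers) * 100 from a square identifiability matrix."""
    n = ident.shape[0]
    i_self = np.mean(np.diag(ident))
    i_others = np.mean(ident[~np.eye(n, dtype=bool)])
    return 100.0 * (i_self - i_others)


# Toy data: a subject-specific mixing matrix plus scan-specific noise.
rng = np.random.default_rng(0)
n_subj, n_reg, n_frames = 6, 10, 600
mixers = [rng.standard_normal((n_reg, n_reg)) for _ in range(n_subj)]


def scan(mixer):
    return np.corrcoef((rng.standard_normal((n_frames, n_reg)) @ mixer).T)


ident = identifiability_matrix([scan(m) for m in mixers], [scan(m) for m in mixers])
print("Idiff:", differential_identifiability(ident))
```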
-
Geodesic distance on optimally regularized functional connectomes uncovers individual fingerprints
Authors:
Kausar Abbas,
Mintao Liu,
Manasij Venkatesh,
Enrico Amico,
Alan David Kaplan,
Mario Ventresca,
Luiz Pessoa,
Jaroslaw Harezlak,
Joaquín Goñi
Abstract:
Background: Functional connectomes (FCs) have been shown to provide a reproducible individual fingerprint, which has opened the possibility of personalized medicine for neuro/psychiatric disorders. Thus, developing accurate ways to compare FCs is essential to establish associations with behavior and/or cognition at the individual level.
Methods: Canonically, FCs are compared using Pearson's correlation coefficient of the entire functional connectivity profiles. Recently, it has been proposed that the use of geodesic distance is a more accurate way of comparing functional connectomes, one which reflects the underlying non-Euclidean geometry of the data. Computing geodesic distance requires FCs to be positive-definite and hence invertible matrices. As this requirement depends on the fMRI scanning length and the parcellation used, it is not always attainable and sometimes a regularization procedure is required.
Results: In the present work, we show that regularization is not only an algebraic operation for making FCs invertible, but also that an optimal magnitude of regularization leads to systematically higher fingerprints. We also show evidence that optimal regularization is dataset-dependent, and varies as a function of condition, parcellation, scanning length, and the number of frames used to compute the FCs.
Discussion: We demonstrate that a universally fixed regularization does not fully uncover the potential of geodesic distance on individual fingerprinting, and indeed could severely diminish it. Thus, an optimal regularization must be estimated on each dataset to uncover the most differentiable across-subject and reproducible within-subject geodesic distances between FCs. The resulting pairwise geodesic distances at the optimal regularization level constitute a very reliable quantification of differences between subjects.
Submitted 31 March, 2021; v1 submitted 11 March, 2020;
originally announced March 2020.
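A minimal sketch of the regularized geodesic distance discussed above, assuming NumPy and the affine-invariant metric d(A, B) = sqrt(sum_i log(lambda_i)^2), where lambda_i are the eigenvalues of A^-1/2 B A^-1/2 and tau * I is added to both FCs first. It shows that an FC computed from fewer frames than regions is not positive definite, while any tau > 0 makes the distance computable. Function names and tau values are illustrative; per the abstract, tau should be tuned on each dataset (e.g., by maximizing a fingerprint score), which this snippet does not do.

```python
import numpy as np


def eig_apply(mat, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T


def geodesic_distance(a, b, tau=0.0):
    """Affine-invariant geodesic distance between a + tau*I and b + tau*I.

    d = sqrt(sum(log(lambda_i)**2)), lambda_i = eigenvalues of A^-1/2 B A^-1/2.
    Both regularized matrices must be positive definite.
    """
    eye = np.eye(a.shape[0])
    a_reg, b_reg = a + tau * eye, b + tau * eye
    if min(np.linalg.eigvalsh(a_reg).min(), np.linalg.eigvalsh(b_reg).min()) <= 0:
        raise ValueError("matrices are not positive definite; increase tau")
    a_isqrt = eig_apply(a_reg, lambda v: 1.0 / np.sqrt(v))
    m = a_isqrt @ b_reg @ a_isqrt
    lam = np.linalg.eigvalsh((m + m.T) / 2)  # symmetrize round-off before eigvalsh
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


# Fewer frames (5) than regions (10): the FC is rank-deficient, not invertible.
rng = np.random.default_rng(1)
fc_a = np.corrcoef(rng.standard_normal((5, 10)).T)
fc_b = np.corrcoef(rng.standard_normal((5, 10)).T)
print("smallest eigenvalue:", np.linalg.eigvalsh(fc_a).min())
for tau in (0.01, 0.1, 1.0):
    print(f"tau={tau}: d={geodesic_distance(fc_a, fc_b, tau):.3f}")
```

The distance changes with tau, which is why a single fixed value can mask or distort between-subject differences.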