-
Altered Topological Structure of the Brain White Matter in Maltreated Children through Topological Data Analysis
Authors:
Moo K. Chung,
Tahmineh Azizi,
Jamie L. Hanson,
Andrew L. Alexander,
Richard J. Davidson,
Seth D. Pollak
Abstract:
Childhood maltreatment may adversely affect brain development and consequently influence behavioral, emotional, and psychological patterns during adulthood. In this study, we propose an analytical pipeline for modeling the altered topological structure of brain white matter in maltreated and typically developing children. We perform topological data analysis (TDA) to assess the alteration in the global topology of the brain white-matter structural covariance network among children. We use persistent homology, an algebraic technique in TDA, to analyze topological features in the brain covariance networks constructed from structural magnetic resonance imaging (MRI) and diffusion tensor imaging (DTI). We develop a novel framework for statistical inference based on the Wasserstein distance to assess the significance of the observed topological differences. Using these methods in comparing maltreated children to a typically developing control group, we find that maltreatment may increase homogeneity in white matter structures and thus induce higher correlations in the structural covariance; this is reflected in the topological profile. Our findings strongly suggest that TDA can be a valuable framework to model altered topological structures of the brain. The MATLAB codes and processed data used in this study can be found at https://github.com/laplcebeltrami/maltreated.
Submitted 14 November, 2023; v1 submitted 12 April, 2023;
originally announced April 2023.
-
Proteome-scale Deployment of Protein Structure Prediction Workflows on the Summit Supercomputer
Authors:
Mu Gao,
Mark Coletti,
Russell B. Davidson,
Ryan Prout,
Subil Abraham,
Benjamin Hernandez,
Ada Sedova
Abstract:
Deep learning has contributed to major advances in the prediction of protein structure from sequence, a fundamental problem in structural bioinformatics. With predictions now approaching the accuracy of crystallographic resolution in some cases, and with accelerators like GPUs and TPUs making inference using large models rapid, fast genome-level structure prediction becomes an obvious aim. Leadership-class computing resources can be used to perform genome-scale protein structure prediction using state-of-the-art deep learning models, providing a wealth of new data for systems biology applications. Here we describe our efforts to efficiently deploy the AlphaFold2 program, for full-proteome structure prediction, at scale on the Oak Ridge Leadership Computing Facility's resources, including the Summit supercomputer. We performed inference to produce the predicted structures for 35,634 protein sequences, corresponding to three prokaryotic proteomes and one plant proteome, using under 4,000 total Summit node hours, equivalent to using the majority of the supercomputer for one hour. We also designed an optimized structure refinement that reduced the time for the relaxation stage of the AlphaFold pipeline by over 10X for longer sequences. We demonstrate the types of analyses that can be performed on proteome-scale collections of sequences, including a search for novel quaternary structures and implications for functional annotation.
Submitted 24 January, 2022;
originally announced January 2022.
-
Exact Combinatorial Inference for Brain Images
Authors:
Moo K. Chung,
Zhan Luo,
Alex D. Leow,
Andrew L. Alexander,
Richard J. Davidson,
H. Hill Goldsmith
Abstract:
The permutation test is known as the exact test procedure in statistics. However, it is often not exact in practice and only an approximate method, since only a small fraction of all possible permutations is generated. Even for a small sample size, it often requires generating tens of thousands of permutations, which can be a serious computational bottleneck. In this paper, we propose a novel combinatorial inference procedure that enumerates all possible permutations combinatorially without any resampling. The proposed method is validated against the standard permutation test in simulation studies with the ground truth. The method is further applied to a twin DTI study to determine the genetic contribution of the minimum spanning tree of the structural brain connectivity.
Submitted 8 July, 2018;
originally announced July 2018.
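The distinction the abstract above draws between an exact permutation test and its resampled approximation can be illustrated with a brute-force sketch. This is not the authors' combinatorial procedure; it is the textbook exhaustive enumeration for a two-sample difference in means, which is only feasible for very small samples. The function name `exact_permutation_pvalue` and the choice of test statistic are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def exact_permutation_pvalue(x, y):
    """Two-sided exact permutation p-value for a difference in group means.

    Hypothetical helper: enumerates every way of relabeling the pooled
    observations into groups of the original sizes, so it scales as
    C(len(x) + len(y), len(x)) and is only practical for tiny samples.
    """
    pooled = list(x) + list(y)
    m = len(x)
    n_total = len(pooled)
    total_sum = sum(pooled)

    def statistic(sum_first):
        # Absolute difference between the two group means.
        mean_first = sum_first / m
        mean_second = (total_sum - sum_first) / (n_total - m)
        return abs(mean_first - mean_second)

    observed = statistic(sum(x))
    count = 0
    for idx in combinations(range(n_total), m):
        relabeled_sum = sum(pooled[i] for i in idx)
        # Small tolerance guards against floating-point ties.
        if statistic(relabeled_sum) >= observed - 1e-12:
            count += 1
    return count / comb(n_total, m)


if __name__ == "__main__":
    print(exact_permutation_pvalue([1, 2, 3], [4, 5, 6]))
```

With two groups of three observations there are only 20 relabelings, but the count grows combinatorially with sample size, which is the bottleneck that motivates either random resampling or a method that avoids explicit enumeration.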
-
Dimensions of Group-based Phylogenetic Mixtures
Authors:
Hector Baños,
Nathaniel Bushek,
Ruth Davidson,
Elizabeth Gross,
Pamela E. Harris,
Robert Krone,
Colby Long,
Allen Stewart,
Robert Walker
Abstract:
In this paper we study group-based Markov models of evolution and their mixtures. In the algebro-geometric setting, group-based phylogenetic tree models correspond to toric varieties, while their mixtures correspond to secant and join varieties. Determining properties of these secant and join varieties can aid both in model selection and establishing parameter identifiability. Here we explore the first natural geometric property of these varieties: their dimension. The expected projective dimension of the join variety of a set of varieties is one more than the sum of their dimensions. A join variety that realizes the expected dimension is nondefective. Nondefectiveness is not only interesting from a geometric point of view, but has been used to establish combinatorial identifiability for several classes of phylogenetic mixture models. In this paper, we focus on group-based models where the equivalence classes of identified parameters are orbits of a subgroup of the automorphism group of the group defining the model. In particular, we show that, for these group-based models, the variety corresponding to the mixture of $r$ trees with $n$ leaves is nondefective when $n \geq 2r+5$. We also give improved bounds for claw trees and give computational evidence that 2-tree and 3-tree mixtures are nondefective for small $n$.
Submitted 23 November, 2017;
originally announced November 2017.
-
A combinatorial method for connecting BHV spaces representing different numbers of taxa
Authors:
Yingying Ren,
Sihan Zha,
Jingwen Bi,
José A. Sanchez,
Cara Monical,
Michelle Delcourt,
Rosemary K. Guzman,
Ruth Davidson
Abstract:
The phylogenetic tree space introduced by Billera, Holmes, and Vogtmann (BHV tree space) is a CAT(0) continuous space that represents trees with edge weights with an intrinsic geodesic distance measure. The geodesic distance measure unique to BHV tree space is well known to be computable in polynomial time, which makes it a potentially powerful tool for optimization problems in phylogenetics and phylogenomics. Specifically, there is significant interest in comparing and combining phylogenetic trees. For example, BHV tree space has been shown to be potentially useful in tree summary and consensus methods, which require combining trees with different numbers of leaves. Yet an open problem is to transition between BHV tree spaces of different maximal dimension, where each maximal dimension corresponds to the complete set of edge-weighted trees with a fixed number of leaves. We show a combinatorial method to transition between copies of BHV tree spaces in which trees with different numbers of taxa can be studied, derived from its topological structure and geometric properties. This method removes obstacles for embedding problems such as supertree and consensus methods in the BHV tree space framework.
Submitted 3 December, 2017; v1 submitted 8 August, 2017;
originally announced August 2017.
-
Phylogenetic trees
Authors:
Hector Baños,
Nathaniel Bushek,
Ruth Davidson,
Elizabeth Gross,
Pamela E. Harris,
Robert Krone,
Colby Long,
Allen Stewart,
Robert Walker
Abstract:
We introduce the package PhylogeneticTrees for Macaulay2 which allows users to compute phylogenetic invariants for group-based tree models. We provide some background information on phylogenetic algebraic geometry and show how the package PhylogeneticTrees can be used to calculate a generating set for a phylogenetic ideal as well as a lower bound for its dimension. Finally, we show how methods within the package can be used to compute a generating set for the join of any two ideals.
Submitted 17 November, 2016;
originally announced November 2016.
-
Modeling the distribution of distance data in Euclidean space
Authors:
Ruth Davidson,
Joseph Rusinko,
Zoe Vernon,
Jing Xi
Abstract:
Phylogenetic inference, the derivation of a hypothesis for the common evolutionary history of a group of species, is an active area of research at the intersection of biology, computer science, mathematics, and statistics. One assumes the data contains a phylogenetic signal that will be recovered with varying accuracy, depending on the quality of the method used and the quality of the data.
The input for distance-based inference methods is an element of a Euclidean space with coordinates indexed by the pairs of organisms. For several algorithms there exists a subdivision of this space into polyhedral cones such that inputs in the same cone return the same tree topology. The geometry of these cones has been used to analyze the inference algorithms. In this chapter, we model how input data points drawn from DNA sequences are distributed throughout Euclidean space in relation to the space of tree metrics, which in turn can also be described as a collection of polyhedral cones.
Submitted 20 June, 2016;
originally announced June 2016.
-
Evaluating consistency of deterministic streamline tractography in non-linearly warped DTI data
Authors:
Nagesh Adluru,
Daniel J. Destiche,
Do P. M. Tromp,
Richard J. Davidson,
Hui Zhang,
Andrew L. Alexander
Abstract:
Tractography is typically performed for each subject using the diffusion tensor imaging (DTI) data in its native subject space rather than in some space common to the entire study cohort. Despite performing tractography on a population average in a normalized space, the latter is considered less favorably at the individual subject level because it requires spatial transformations of DTI data that involve non-linear warping and reorientation of the tensors. Although the commonly used reorientation strategies such as finite strain and preservation of principal direction are expected to result in adequate accuracy for voxel-based analyses of DTI measures such as fractional anisotropy (FA) and mean diffusivity (MD), the reorientations are not always exact except in the case of rigid transformations. Small imperfections in reorientation at the individual voxel level accumulate and could potentially affect the tractography results adversely. This study aims to evaluate and compare deterministic white matter fiber tracking in non-linearly warped DTI against that in native DTI. The data present promising evidence that tractography in non-linearly warped DTI data could indeed be a viable and valid option for various statistical analyses of DTI data in a spatially normalized space.
Submitted 5 February, 2016;
originally announced February 2016.
-
Efficient Quartet Representations of Trees and Applications to Supertree and Summary Methods
Authors:
Ruth Davidson,
MaLyn Lawhorn,
Joseph Rusinko,
Noah Weber
Abstract:
Quartet trees displayed by larger phylogenetic trees have long been used as inputs for species tree and supertree reconstruction. Computational constraints prevent the use of all displayed quartets in many practical problems due to the number of taxa. We introduce the notion of an Efficient Quartet System (EQS) to represent a phylogenetic tree with a subset of the quartets displayed by the tree. We show mathematically that the set of quartets obtained from a tree via an EQS contains all of the combinatorial information of the tree itself. Using performance tests on simulated datasets, we also demonstrate that using an EQS to reduce the number of quartets in pipelines for summary methods of species tree inference and supertree inference results in only small reductions in accuracy.
Submitted 5 December, 2016; v1 submitted 16 December, 2015;
originally announced December 2015.
-
Distance-based phylogenetic methods around a polytomy
Authors:
Ruth Davidson,
Seth Sullivant
Abstract:
Distance-based phylogenetic algorithms attempt to solve the NP-hard least squares phylogeny problem by mapping an arbitrary dissimilarity map representing biological data to a tree metric. The set of all dissimilarity maps is a Euclidean space properly containing the space of all tree metrics as a polyhedral fan. Outputs of distance-based tree reconstruction algorithms such as UPGMA and Neighbor-Joining are points in the maximal cones in the fan. Tree metrics with polytomies lie at the intersections of maximal cones.
A phylogenetic algorithm divides the space of all dissimilarity maps into regions based upon which combinatorial tree is reconstructed by the algorithm. Comparison of phylogenetic methods can be done by comparing the geometry of these regions. We use polyhedral geometry to compare the local nature of the subdivisions induced by least squares phylogeny, UPGMA, and Neighbor-Joining. Our results suggest that in some circumstances, UPGMA and Neighbor-Joining poorly match least squares phylogeny when the true tree has a polytomy.
Submitted 22 July, 2013;
originally announced July 2013.
-
Polyhedral Combinatorics of UPGMA Cones
Authors:
Ruth Davidson,
Seth Sullivant
Abstract:
Distance-based methods such as UPGMA (Unweighted Pair Group Method with Arithmetic Mean) continue to play a significant role in phylogenetic research. We use polyhedral combinatorics to analyze the natural subdivision of the positive orthant induced by classifying the input vectors according to tree topologies returned by the algorithm. The partition lattice informs the study of UPGMA trees. We give a closed form for the extreme rays of UPGMA cones on n taxa, and compute the normalized volumes of the UPGMA cones for small n.
Keywords: phylogenetic trees, polyhedral combinatorics, partition lattice
Submitted 7 June, 2012;
originally announced June 2012.
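The map from an input distance vector to the tree topology returned by UPGMA, which the abstract above uses to subdivide the positive orthant, can be sketched in a few lines. This is a minimal illustration of the standard UPGMA clustering rule, not code from the paper; the function name `upgma`, the nested-tuple encoding of the topology, and the tie-breaking by dictionary order are assumptions made here.

```python
def upgma(labels, d):
    """Return the rooted tree topology UPGMA builds, as nested tuples.

    labels: list of taxon names.
    d: symmetric matrix (list of lists) of pairwise dissimilarities.
    Hypothetical helper; ties are broken by dictionary insertion order,
    which is an arbitrary choice for this sketch.
    """
    n = len(labels)
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (labels[i], 1) for i in range(n)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        (a, b), _ = min(dist.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        new_dist = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            # Size-weighted average, i.e. the mean over all leaf pairs.
            new_dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every distance involving the two merged clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]
    matrix = [
        [0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0],
    ]
    print(upgma(taxa, matrix))
```

Every input matrix for which the same sequence of merges occurs yields the same nested tuple, so the set of inputs returning a given topology is cut out by the inequalities between the averaged distances compared at each step.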