Search | arXiv e-print repository

Computing Affine Combinations, Distances, and Correlations for Recursive Partition Functions

Abstract: Recursive partitioning is the core of several statistical methods including CART, random forest, and boosted trees. Despite the popularity of tree based methods, to date, there did not exist methods for combining multiple trees into a single tree, or methods for systematically quantifying the discrepancy between two trees. Taking advantage of the recursive structure in trees we formulated fast algorithms for computing affine combinations, distances and correlations in a vector subspace of recursive partition functions.

Submitted 17 March, 2016; v1 submitted 11 December, 2015; originally announced December 2015.

arXiv:1512.03115 [pdf, ps, other]

Dynamic Geodesics in Treespace via Parametric Maximum Flow

Authors: Sean Skwerer, Scott Provan

Abstract: Shortest paths in treespace, which represent minimal deformations between trees, are unique and can be computed in polynomial time. The ability to quickly compute shortest paths has enabled new approaches for statistical analysis of populations of trees and phylogenetic inference. This paper gives a new algorithm for updating geodesic paths when the end points are dynamic. Such algorithms will be especially useful when optimizing for objectives that are functions of distances from a search point to other points e.g. for finding a tree which has the minimum average distance to a collection of trees. Our method for updating treespace shortest paths is based on parametric sensitivity analysis of the maximum flow subproblems that are optimized when solving for a treespace geodesic.

Submitted 9 December, 2015; originally announced December 2015.

arXiv:1411.6652 [pdf, other]

Persistent homology analysis of brain artery trees

Authors: Paul Bendich, J. S. Marron, Ezra Miller, Alex Pieloch, Sean Skwerer

Abstract: New representations of tree-structured data objects, using ideas from topological data analysis, enable improved statistical analyses of a population of brain artery trees. A number of representations of each data tree arise from persistence diagrams that quantify branching and looping of vessels at multiple scales. Novel approaches to the statistical analysis, through various summaries of the persistence diagrams, lead to heightened correlations with covariates such as age and sex, relative to earlier analyses of this data set. The correlation with age continues to be significant even after controlling for correlations from earlier significant summaries.

Submitted 24 November, 2014; originally announced November 2014.

arXiv:1411.2923 [pdf, other]

Relative Optimality Conditions and Algorithms for Treespace Fréchet Means

Authors: Sean Skwerer, Scott Provan, J. S. Marron

Abstract: Recent interest in treespaces as well-founded mathematical domains for phylogenetic inference and statistical analysis for populations of anatomical trees has motivated research into efficient and rigorous methods for optimization problems on treespaces. A central problem in this area is computing an average of phylogenetic trees, which is equivalently characterized as the minimizer of the Fréchet function. The Fréchet mean can be used for statistical inference and exploratory data analysis: for example it can be leveraged as a test statistic to compare groups via permutation tests, or to find trends in data over time via kernel smoothing. By analyzing the differential properties of the Fréchet function along geodesics in treespace we obtained a theorem describing a decomposition of the derivative along a geodesic. This decomposition theorem is used to formulate optimality conditions which are used as a logical basis for an algorithm to verify relative optimality at points where the Fréchet function gradient does not exist.

Submitted 16 August, 2017; v1 submitted 16 October, 2014; originally announced November 2014.

MSC Class: 90C48; 90C90

arXiv:1409.5501 [pdf, other]

Tree Oriented Data Analysis

Authors: Sean Skwerer

Abstract: Complex data objects arise in many areas of modern science including evolutionary biology, neuroscience, dynamics of gene expression and medical imaging. Object oriented data analysis (OODA) is the statistical analysis of datasets of complex objects. Data analysis of tree data objects is an exciting research area with interesting questions and challenging problems. This thesis focuses on tree oriented statistical methodologies, and algorithms for solving related mathematical optimization problems. This research is motivated by the goal of analyzing a data set of images of human brain arteries. The approach we take here is to use a novel representation of brain artery systems as points in phylogenetic treespace. The treespace property of unique global geodesics leads to a notion of geometric center called a Fréchet mean. For a sample of data points, the Fréchet function is the sum of squared distances from a point to the data points, and the Fréchet mean is the minimizer of the Fréchet function. In this thesis we use properties of the Fréchet function to develop an algorithmic system for computing Fréchet means. Properties of the Fréchet function are also used to show a sticky law of large numbers which describes a surprising stability of the topological tree structure of sample Fréchet means at that of the population Fréchet mean. We also introduce non-parametric regression of brain artery tree structure as a response variable to age based on weighted Fréchet means.

Submitted 22 September, 2014; v1 submitted 18 September, 2014; originally announced September 2014.

Comments: PhD thesis, University of North Carolina, 2014

arXiv:1202.4267 [pdf, ps, other]

doi 10.1214/12-AAP899

Sticky central limit theorems on open books

Authors: Thomas Hotz, Sean Skwerer, Stephan Huckemann, Huiling Le, J. S. Marron, Jonathan C. Mattingly, Ezra Miller, James Nolen, Megan Owen, Vic Patrangenaru

Abstract: Given a probability distribution on an open book (a metric space obtained by gluing a disjoint union of copies of a half-space along their boundary hyperplanes), we define a precise concept of when the Fréchet mean (barycenter) is sticky. This nonclassical phenomenon is quantified by a law of large numbers (LLN) stating that the empirical mean eventually almost surely lies on the (codimension $1$ and hence measure $0$) spine that is the glued hyperplane, and a central limit theorem (CLT) stating that the limiting distribution is Gaussian and supported on the spine. We also state versions of the LLN and CLT for the cases where the mean is nonsticky (i.e., not lying on the spine) and partly sticky (i.e., on the spine but not sticky).

Submitted 3 December, 2013; v1 submitted 20 February, 2012; originally announced February 2012.

Comments: Published at http://dx.doi.org/10.1214/12-AAP899 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP899

Journal ref: Annals of Applied Probability 2013, Vol. 23, No. 6, 2238-2258

Showing 1–6 of 6 results for author: Skwerer, S