-
A Topology Scavenger Hunt to Introduce Topological Data Analysis
Authors:
Lori Ziegelmeier
Abstract:
Topology at the undergraduate level is often a theoretical mathematics course, introducing concepts from point-set topology or possibly algebraic topology. However, the last two decades have seen an explosion of growth in applied topology and topological data analysis, which are topics that can be presented in an accessible way to undergraduate students and can encourage exciting projects. For the past several years, the Topology course at Macalester College has included content from point-set and algebraic topology, as well as applied topology, culminating in a project chosen by the students. In the course, students work through a topology scavenger hunt as an activity to introduce the ideas and software behind some of the primary tools in topological data analysis, namely, persistent homology and mapper. This scavenger hunt includes a variety of point clouds of varying dimensions, such as an annulus in 2D, a bouquet of loops in 3D, a sphere in 4D, and a torus in 400D. The students' goal is to analyze each point cloud with a variety of software to infer the topological structure. After completing this activity, students are able to extend the ideas learned in the scavenger hunt to an open-ended capstone project. Examples of past projects include: using persistence to explore the relationship between country development and geography, to analyze congressional voting patterns, and to classify genres of a large corpus of texts by combining with tools from natural language processing and machine learning.
Submitted 21 June, 2024;
originally announced June 2024.
-
U-match factorization: sparse homological algebra, lazy cycle representatives, and dualities in persistent (co)homology
Authors:
Haibin Hang,
Chad Giusti,
Lori Ziegelmeier,
Gregory Henselman-Petrusek
Abstract:
Persistent homology is a leading tool in topological data analysis (TDA). Many problems in TDA can be solved via homological -- and indeed, linear -- algebra. However, matrices in this domain are typically large, with rows and columns numbered in billions. Low-rank approximation of such arrays typically destroys essential information; thus, new mathematical and computational paradigms are needed for very large, sparse matrices.
We present the U-match matrix factorization scheme to address this challenge. U-match has two desirable features. First, it admits a compressed storage format that reduces the number of nonzero entries held in computer memory by one or more orders of magnitude over other common factorizations. Second, it permits direct solution of diverse problems in linear and homological algebra, without decompressing matrices stored in memory. These problems include look-up and retrieval of rows and columns; evaluation of birth/death times, and extraction of generators in persistent (co)homology; and, calculation of bases for boundary and cycle subspaces of filtered chain complexes. Such bases are key to unlocking a range of other topological techniques for use in TDA, and U-match factorization is designed to make such calculations broadly accessible to practitioners.
As an application, we show that individual cycle representatives in persistent homology can be retrieved at time and memory costs orders of magnitude below current state of the art, via global duality. Moreover, the algebraic machinery needed to achieve this computation already exists in many modern solvers.
Submitted 20 August, 2021; v1 submitted 19 August, 2021;
originally announced August 2021.
-
Minimal Cycle Representatives in Persistent Homology using Linear Programming: an Empirical Study with User's Guide
Authors:
Lu Li,
Connor Thompson,
Gregory Henselman-Petrusek,
Chad Giusti,
Lori Ziegelmeier
Abstract:
Cycle representatives of persistent homology classes can be used to provide descriptions of topological features in data. However, the non-uniqueness of these representatives creates ambiguity and can lead to many different interpretations of the same set of classes. One approach to solving this problem is to optimize the choice of representative against some measure that is meaningful in the context of the data. In this work, we provide a study of the effectiveness and computational cost of several $\ell_1$-minimization optimization procedures for constructing homological cycle bases for persistent homology with rational coefficients in dimension one, including uniform-weighted and length-weighted edge-loss algorithms as well as uniform-weighted and area-weighted triangle-loss algorithms. We conduct these optimizations via standard linear programming methods, applying general-purpose solvers to optimize over column bases of simplicial boundary matrices.
Our key findings are: (i) optimization is effective in reducing the size of cycle representatives, (ii) the computational cost of optimizing a basis of cycle representatives exceeds the cost of computing such a basis in most data sets we consider, (iii) the choice of linear solvers matters a lot to the computation time of optimizing cycles, (iv) the computation time of solving an integer program is not significantly longer than the computation time of solving a linear program for most of the cycle representatives, using the Gurobi linear solver, (v) strikingly, whether requiring integer solutions or not, we almost always obtain a solution with the same cost and almost all solutions found have entries in {-1, 0, 1} and therefore, are also solutions to a restricted $\ell_0$ optimization problem, and (vi) we obtain qualitatively different results for generators in Erdős-Rényi random clique complexes.
Submitted 17 October, 2021; v1 submitted 14 May, 2021;
originally announced May 2021.
-
Capturing Dynamics of Time-Varying Data via Topology
Authors:
Lu Xian,
Henry Adams,
Chad M. Topaz,
Lori Ziegelmeier
Abstract:
One approach to understanding complex data is to study its shape through the lens of algebraic topology. While the early development of topological data analysis focused primarily on static data, in recent years, theoretical and applied studies have turned to data that varies in time. A time-varying collection of metric spaces as formed, for example, by a moving school of fish or flock of birds, can contain a vast amount of information. There is often a need to simplify or summarize the dynamic behavior. We provide an introduction to topological summaries of time-varying metric spaces including vineyards [19], crocker plots [56], and multiparameter rank functions [37]. We then introduce a new tool to summarize time-varying metric spaces: a crocker stack. Crocker stacks are convenient for visualization, amenable to machine learning, and satisfy a desirable continuity property which we prove. We demonstrate the utility of crocker stacks for a parameter identification task involving an influential model of biological aggregations [58]. Altogether, we aim to bring the broader applied mathematics community up-to-date on topological summaries of time-varying metric spaces.
Submitted 28 June, 2021; v1 submitted 7 October, 2020;
originally announced October 2020.
-
Analyzing Collective Motion with Machine Learning and Topology
Authors:
Dhananjay Bhaskar,
Angelika Manhart,
Jesse Milzman,
John T. Nardini,
Kathleen Storey,
Chad M. Topaz,
Lori Ziegelmeier
Abstract:
We use topological data analysis and machine learning to study a seminal model of collective motion in biology [D'Orsogna et al., Phys. Rev. Lett. 96 (2006)]. This model describes agents interacting nonlinearly via attractive-repulsive social forces and gives rise to collective behaviors such as flocking and milling. To classify the emergent collective motion in a large library of numerical simulations and to recover model parameters from the simulation data, we apply machine learning techniques to two different types of input. First, we input time series of order parameters traditionally used in studies of collective motion. Second, we input measures based in topology that summarize the time-varying persistent homology of simulation data over multiple scales. This topological approach does not require prior knowledge of the expected patterns. For both unsupervised and supervised machine learning methods, the topological approach outperforms the one that is based on traditional order parameters.
Submitted 3 February, 2020; v1 submitted 23 August, 2019;
originally announced August 2019.
-
Local Versus Global Distances for Zigzag Persistence Modules
Authors:
Ellen Gasparovic,
Maria Gommel,
Emilie Purvine,
Radmila Sazdanovic,
Bei Wang,
Yusu Wang,
Lori Ziegelmeier
Abstract:
This short note establishes explicit and broadly applicable relationships between persistence-based distances computed locally and globally. In particular, we show that the bottleneck distance between two zigzag persistence modules restricted to an interval is always bounded above by the distance between the unrestricted versions. While this result is not surprising, it could have different practical implications. We give two related applications for metric graph distances, as well as an extension for the matching distance between multiparameter persistence modules.
Submitted 19 March, 2019;
originally announced March 2019.
-
The Relationship Between the Intrinsic Cech and Persistence Distortion Distances for Metric Graphs
Authors:
Ellen Gasparovic,
Maria Gommel,
Emilie Purvine,
Radmila Sazdanovic,
Bei Wang,
Yusu Wang,
Lori Ziegelmeier
Abstract:
Metric graphs are meaningful objects for modeling complex structures that arise in many real-world applications, such as road networks, river systems, earthquake faults, blood vessels, and filamentary structures in galaxies. To study metric graphs in the context of comparison, we are interested in determining the relative discriminative capabilities of two topology-based distances between a pair of arbitrary finite metric graphs: the persistence distortion distance and the intrinsic Cech distance. We explicitly show how to compute the intrinsic Cech distance between two metric graphs based solely on knowledge of the shortest systems of loops for the graphs. Our main theorem establishes an inequality between the intrinsic Cech and persistence distortion distances in the case when one of the graphs is a bouquet graph and the other is arbitrary. The relationship also holds when both graphs are constructed via wedge sums of cycles and edges.
Submitted 13 December, 2018;
originally announced December 2018.
-
Assessing biological models using topological data analysis
Authors:
M. Ulmer,
Lori Ziegelmeier,
Chad M. Topaz
Abstract:
We use topological data analysis as a tool to analyze the fit of mathematical models to experimental data. This study is built on data obtained from motion tracking groups of aphids in [Nilsen et al., PLOS One, 2013] and two random walk models that were proposed to describe the data. One model incorporates social interactions between the insects, and the second model is a control model that excludes these interactions. We compare data from each model to data from experiment by performing statistical tests based on three different sets of measures. First, we use time series of order parameters commonly used in collective motion studies. These order parameters measure the overall polarization and angular momentum of the group, and do not rely on a priori knowledge of the models that produced the data. Second, we use order parameter time series that do rely on a priori knowledge, namely average distance to nearest neighbor and percentage of aphids moving. Third, we use computational persistent homology to calculate topological signatures of the data. Analysis of the a priori order parameters indicates that the interactive model better describes the experimental data than the control model does. The topological approach performs as well as these a priori order parameters and better than the other order parameters, suggesting the utility of the topological approach in the absence of specific knowledge of mechanisms underlying the data.
Submitted 12 November, 2018;
originally announced November 2018.
-
On Homotopy Types of Vietoris-Rips Complexes of Metric Gluings
Authors:
Michal Adamaszek,
Henry Adams,
Ellen Gasparovic,
Maria Gommel,
Emilie Purvine,
Radmila Sazdanovic,
Bei Wang,
Yusu Wang,
Lori Ziegelmeier
Abstract:
We study Vietoris-Rips complexes of metric wedge sums and metric gluings. We show that the Vietoris-Rips complex of a wedge sum, equipped with a natural metric, is homotopy equivalent to the wedge sum of the Vietoris-Rips complexes. We also provide generalizations for when two metric spaces are glued together along a common isometric subset. As our main example, we deduce the homotopy type of the Vietoris-Rips complex of two metric graphs glued together along a sufficiently short path (compared to lengths of certain loops in the input graphs). As a result, we can describe the persistent homology, in all homological dimensions, of the Vietoris-Rips complexes of a wide class of metric graphs.
Submitted 12 August, 2019; v1 submitted 17 December, 2017;
originally announced December 2017.
-
Mind the Gap: A Study in Global Development through Persistent Homology
Authors:
Andrew Banman,
Lori Ziegelmeier
Abstract:
The Gapminder project set out to use statistics to dispel simplistic notions about global development. In the same spirit, we use persistent homology, a technique from computational algebraic topology, to explore the relationship between country development and geography. For each country, four indicators (gross domestic product per capita, average life expectancy, infant mortality, and gross national income per capita) were used to quantify development. Two analyses were performed. The first considers clusters of the countries based on these indicators, and the second uncovers cycles in the data when combined with geographic border structure. Our analysis is a multi-scale approach that reveals similarities and connections among countries at a variety of levels. We discover localized development patterns that are invisible in standard statistical methods.
Submitted 10 January, 2018; v1 submitted 27 February, 2017;
originally announced February 2017.
-
A Complete Characterization of the 1-Dimensional Intrinsic Cech Persistence Diagrams for Metric Graphs
Authors:
Ellen Gasparovic,
Maria Gommel,
Emilie Purvine,
Radmila Sazdanovic,
Bei Wang,
Yusu Wang,
Lori Ziegelmeier
Abstract:
Metric graphs are special types of metric spaces used to model and represent simple, ubiquitous, geometric relations in data such as biological networks, social networks, and road networks. We are interested in giving a qualitative description of metric graphs using topological summaries. In particular, we provide a complete characterization of the 1-dimensional intrinsic Cech persistence diagrams for metric graphs using persistent homology. Together with complementary results by Adamaszek et al., which imply results on intrinsic Cech persistence diagrams in all dimensions for a single cycle, our results constitute important steps toward characterizing intrinsic Cech persistence diagrams for arbitrary metric graphs across all dimensions.
Submitted 7 July, 2017; v1 submitted 23 February, 2017;
originally announced February 2017.
-
Stratifying High Dimensional Data Based on Proximity to the Convex Hull Boundary
Authors:
Lori Ziegelmeier,
Michael Kirby,
Chris Peterson
Abstract:
The convex hull of a set of points, $C$, serves to expose extremal properties of $C$ and can help identify elements in $C$ of high interest. For many problems, particularly in the presence of noise, the true vertex set (and facets) may be difficult to determine. One solution is to expand the list of high interest candidates to points lying near the boundary of the convex hull. We propose a quadratic program for the purpose of stratifying points in a data cloud based on proximity to the boundary of the convex hull. For each data point, a quadratic program is solved to determine an associated weight vector. We show that the weight vector encodes geometric information concerning the point's relationship to the boundary of the convex hull. The computation of the weight vectors can be carried out in parallel, and for a fixed number of points and fixed neighborhood size, the overall computational complexity of the algorithm grows linearly with dimension. As a consequence, meaningful computations can be completed on reasonably large, high dimensional data sets.
Submitted 4 November, 2016;
originally announced November 2016.
-
Persistent Homology on Grassmann Manifolds for Analysis of Hyperspectral Movies
Authors:
Sofya Chepushtanova,
Michael Kirby,
Chris Peterson,
Lori Ziegelmeier
Abstract:
The existence of characteristic structure, or shape, in complex data sets has been recognized as increasingly important for mathematical data analysis. This realization has motivated the development of new tools such as persistent homology for exploring topological invariants, or features, in large data sets. In this paper we apply persistent homology to the characterization of gas plumes in time dependent sequences of hyperspectral cubes, i.e. the analysis of 4-way arrays. We investigate hyperspectral movies of Long-Wavelength Infrared data monitoring an experimental release of chemical simulant into the air. Our approach models regions of interest within the hyperspectral data cubes as points on the real Grassmann manifold $G(k, n)$ (whose points parameterize the $k$-dimensional subspaces of $\mathbb{R}^n$), contrasting our approach with the more standard framework in Euclidean space. An advantage of this approach is that it allows a sequence of time slices in a hyperspectral movie to be collapsed to a sequence of points in such a way that some of the key structure within and between the slices is encoded by the points on the Grassmann manifold. This motivates the search for topological features, associated with the evolution of the frames of a hyperspectral movie, within the corresponding points on the Grassmann manifold. The proposed mathematical model affords the processing of large data sets while retaining valuable discriminatory information. In this paper, we discuss how embedding our data in the Grassmann manifold, together with topological data analysis, captures dynamical events that occur as the chemical plume is released and evolves.
Submitted 11 July, 2016; v1 submitted 7 July, 2016;
originally announced July 2016.
-
Persistence Images: A Stable Vector Representation of Persistent Homology
Authors:
Henry Adams,
Sofya Chepushtanova,
Tegan Emerson,
Eric Hanson,
Michael Kirby,
Francis Motta,
Rachel Neville,
Chris Peterson,
Patrick Shipman,
Lori Ziegelmeier
Abstract:
Many datasets can be viewed as a noisy sampling of an underlying space, and tools from topological data analysis can characterize this structure for the purpose of knowledge discovery. One such tool is persistent homology, which provides a multiscale description of the homological features within a dataset. A useful representation of this homological information is a persistence diagram (PD). Efforts have been made to map PDs into spaces with additional structure valuable to machine learning tasks. We convert a PD to a finite-dimensional vector representation which we call a persistence image (PI), and prove the stability of this transformation with respect to small perturbations in the inputs. The discriminatory power of PIs is compared against existing methods, showing significant performance gains. We explore the use of PIs with vector-based machine learning tools, such as linear sparse support vector machines, which identify features containing discriminating topological information. Finally, high accuracy inference of parameter values from the dynamic output of a discrete dynamical system (the linked twist map) and a partial differential equation (the anisotropic Kuramoto-Sivashinsky equation) provide a novel application of the discriminatory power of PIs.
Submitted 11 July, 2016; v1 submitted 22 July, 2015;
originally announced July 2015.
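The conversion from a persistence diagram to a persistence image, as summarized in the abstract above, can be sketched roughly as follows. This is a simplified illustration rather than the authors' implementation: it evaluates each Gaussian at pixel centers instead of integrating over pixels, uses a linear persistence weight, and the function name `persistence_image` and the parameters `resolution`, `sigma`, and `bounds` are assumptions made for this example.

```python
import numpy as np

def persistence_image(diagram, resolution=20, sigma=0.1, bounds=(0.0, 1.0)):
    """Approximate persistence image on a resolution x resolution grid.

    Simplifying assumptions (not taken from the paper's text): Gaussians are
    sampled at pixel centers, and each point is weighted by its persistence.
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births = diagram[:, 0]
    persistences = diagram[:, 1] - diagram[:, 0]  # (birth, death) -> persistence

    lo, hi = bounds
    edges = np.linspace(lo, hi, resolution + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Columns index the birth axis, rows index the persistence axis.
    birth_grid, pers_grid = np.meshgrid(centers, centers)

    image = np.zeros((resolution, resolution))
    for b, p in zip(births, persistences):
        sq_dist = (birth_grid - b) ** 2 + (pers_grid - p) ** 2
        gauss = np.exp(-sq_dist / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
        image += p * gauss  # linear weight vanishes as persistence goes to 0
    return image

# Hypothetical diagram: one long-lived feature and two short-lived ones.
diagram = [(0.1, 0.9), (0.2, 0.25), (0.4, 0.5)]
image = persistence_image(diagram)
feature_vector = image.ravel()  # fixed-length vector usable by an SVM or similar
```

The flattened image has the same length for every diagram, which is what makes it usable with vector-based machine learning tools.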
-
Topological Data Analysis of Biological Aggregation Models
Authors:
Chad M. Topaz,
Lori Ziegelmeier,
Tom Halverson
Abstract:
We apply tools from topological data analysis to two mathematical models inspired by biological aggregations such as bird flocks, fish schools, and insect swarms. Our data consists of numerical simulation output from the models of Vicsek and D'Orsogna. These models are dynamical systems describing the movement of agents who interact via alignment, attraction, and/or repulsion. Each simulation time frame is a point cloud in position-velocity space. We analyze the topological structure of these point clouds, interpreting the persistent homology by calculating the first few Betti numbers. These Betti numbers count connected components, topological circles, and trapped volumes present in the data. To interpret our results, we introduce a visualization that displays Betti numbers over simulation time and topological persistence scale. We compare our topological results to order parameters typically used to quantify the global behavior of aggregations, such as polarization and angular momentum. The topological calculations reveal events and structure not captured by the order parameters.
Submitted 11 March, 2015; v1 submitted 19 December, 2014;
originally announced December 2014.
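The visualization mentioned in the abstract above, Betti numbers displayed over simulation time and persistence scale, can be illustrated for the zeroth Betti number alone. The sketch below is an assumption-laden toy, not the authors' code: it counts connected components of the graph joining points within distance `eps` (which equals the zeroth Betti number of the Vietoris-Rips complex at that scale), and the names `betti0`, `betti0_matrix`, and the toy frames are hypothetical.

```python
import numpy as np

def betti0(points, eps):
    """Count connected components of the graph joining points at distance <= eps."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if dists[i, j] <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components

def betti0_matrix(frames, scales):
    """Rows index scale, columns index time frame."""
    return np.array([[betti0(frame, eps) for frame in frames] for eps in scales])

# Toy data (hypothetical): two tight pairs of points that drift together.
frames = [
    np.array([[0.0, 0.0], [0.1, 0.0], [gap, 0.0], [gap + 0.1, 0.0]])
    for gap in (3.0, 2.0, 1.0)
]
scales = [0.05, 0.5, 1.5]
matrix = betti0_matrix(frames, scales)
print(matrix)
# [[4 4 4]
#  [2 2 2]
#  [2 2 1]]
```

Contouring such a matrix over the time and scale axes gives a picture in the spirit of the one described; higher Betti numbers would require a persistent homology library.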
-
Locally Linear Embedding Clustering Algorithm for Natural Imagery
Authors:
Lori Ziegelmeier,
Michael Kirby,
Chris Peterson
Abstract:
The ability to characterize the color content of natural imagery is an important application of image processing. The pixel by pixel coloring of images may be viewed naturally as points in color space, and the inherent structure and distribution of these points affords a quantization, through clustering, of the color information in the image. In this paper, we present a novel topologically driven clustering algorithm that permits segmentation of the color features in a digital image. The algorithm blends Locally Linear Embedding (LLE) and vector quantization by mapping color information to a lower dimensional space, identifying distinct color regions, and classifying pixels together based on both a proximity measure and color content. It is observed that these techniques permit a significant reduction in color resolution while maintaining the visually important features of images.
Submitted 20 February, 2012;
originally announced February 2012.