-
Euler Characteristic Curves and Profiles: a stable shape invariant for big data problems
Authors:
Paweł Dłotko,
Davide Gurnari
Abstract:
Tools of Topological Data Analysis provide stable summaries encapsulating the shape of the considered data. Persistent homology, the most standard and well-studied data summary, suffers from a number of limitations: its computations are hard to distribute, it is hard to generalize to multifiltrations, and it is computationally prohibitive for big datasets. In this paper we study the concept of Euler Characteristic Curves, for one-parameter filtrations, and Euler Characteristic Profiles, for multi-parameter filtrations. While a weaker invariant in one dimension, we show that Euler Characteristic-based approaches avoid some of the handicaps of persistent homology: we present efficient algorithms to compute them in a distributed way, their generalization to multifiltrations, and their practical applicability to big data problems. In addition, we show that Euler Curves and Profiles enjoy a certain type of stability, which makes them a robust tool in data analysis. Lastly, we consider multiple use cases to demonstrate their practical applicability.
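A minimal sketch of a one-parameter Euler Characteristic Curve, assuming the filtration is given as an explicit list of (simplex, filtration value) pairs. The function names are hypothetical and not taken from the authors' software. It uses only the identity chi(K_t) = sum of (-1)^dim(sigma) over simplices sigma with value <= t, so the contributions of any partition of the simplices into chunks can be computed independently and then summed, which is what makes a distributed computation straightforward.

```python
def local_jumps(filtered_simplices):
    """Net change of the Euler characteristic at each filtration value.

    filtered_simplices: iterable of (simplex, value) pairs, where a simplex is
    a tuple of vertex ids; a k-simplex contributes (-1)**k at its value.
    """
    jumps = {}
    for simplex, value in filtered_simplices:
        dim = len(simplex) - 1
        jumps[value] = jumps.get(value, 0) + (-1) ** dim
    return jumps


def merge_jumps(*parts):
    """Combine jump tables computed independently on disjoint chunks."""
    total = {}
    for part in parts:
        for value, change in part.items():
            total[value] = total.get(value, 0) + change
    return total


def euler_curve(jumps, thresholds):
    """chi(K_t) for each t: the sum of all jumps at filtration values <= t."""
    return [sum(c for v, c in jumps.items() if v <= t) for t in thresholds]
```

Each simplex must belong to exactly one chunk; counting a simplex twice would shift the curve. For an Euler Characteristic Profile the same idea applies with vector-valued filtration values and a componentwise <= test, which this sketch does not show.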
Submitted 11 August, 2023; v1 submitted 3 December, 2022;
originally announced December 2022.
-
Mapper-type algorithms for complex data and relations
Authors:
Paweł Dłotko,
Davide Gurnari,
Radmila Sazdanovic
Abstract:
Mapper and Ball Mapper are Topological Data Analysis tools used for exploring high-dimensional point clouds and visualizing scalar-valued functions on those point clouds. Inspired by open questions in knot theory, we add new features to Ball Mapper that enable encoding of the structure, internal relations and symmetries of the point cloud. Moreover, we combine the strengths of the Mapper and Ball Mapper constructions to create a tool for comparing high-dimensional data descriptors of a single dataset. This new hybrid algorithm, Mapper on Ball Mapper, is applicable to high-dimensional lens functions. As a proof of concept, we include applications to knot theory and game theory, as well as materials science and cancer research.
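A rough sketch of a basic Ball Mapper graph, assuming Euclidean points and a greedy epsilon-net for choosing ball centres (a common way to describe the construction, not necessarily the authors' implementation). Every centre spans a ball of radius epsilon, two balls are joined by an edge when they share a data point, and each ball is coloured by the mean of a scalar function. The new features described above (encoding symmetries, Mapper on Ball Mapper) are not attempted, and `ball_mapper` is a hypothetical name.

```python
import math


def ball_mapper(points, epsilon, values):
    """Basic Ball Mapper graph (illustrative sketch, hypothetical helper).

    points: list of coordinate tuples; epsilon: ball radius;
    values: one scalar per point, averaged over each ball for colouring.
    """
    # Greedy epsilon-net: a point becomes a centre if no centre covers it yet.
    centres = []
    for i, p in enumerate(points):
        if all(math.dist(p, points[c]) > epsilon for c in centres):
            centres.append(i)
    # Each ball holds the indices of all points within epsilon of its centre.
    balls = [
        {i for i, p in enumerate(points) if math.dist(p, points[c]) <= epsilon}
        for c in centres
    ]
    # Two balls are connected when they share at least one data point.
    edges = {
        (a, b)
        for a in range(len(balls))
        for b in range(a + 1, len(balls))
        if balls[a] & balls[b]
    }
    colours = [sum(values[i] for i in ball) / len(ball) for ball in balls]
    return centres, balls, edges, colours
```

The greedy net depends on the order of the points; other centre-selection strategies would give a different, equally valid cover.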
Submitted 28 March, 2023; v1 submitted 2 September, 2021;
originally announced September 2021.
-
Bottleneck Profiles and Discrete Prokhorov Metrics for Persistence Diagrams
Authors:
Paweł Dłotko,
Niklas Hellmer
Abstract:
In topological data analysis (TDA), persistence diagrams have been a successful tool. To compare them, Wasserstein and Bottleneck distances are commonly used. We address the shortcomings of these metrics and show how to investigate them systematically by introducing bottleneck profiles. This leads to a notion of discrete Prokhorov metrics for persistence diagrams as a generalization of the Bottleneck distance. These metrics satisfy a stability result and bounds with respect to Wasserstein metrics. We provide algorithms to compute the newly introduced quantities and end with a discussion of experiments.
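For reference, a sketch of the standard Bottleneck distance that the profiles generalize, not the paper's bottleneck-profile algorithm. It assumes finite diagrams given as (birth, death) pairs, the L-infinity ground metric, and the usual convention that any point may instead be matched to the diagonal at cost (death - birth) / 2. The answer is found by binary search over candidate costs, testing for a perfect matching with Kuhn's augmenting-path method. The function name is hypothetical.

```python
def bottleneck_distance(diag_a, diag_b):
    """Bottleneck distance between two finite persistence diagrams (sketch)."""
    m, n = len(diag_a), len(diag_b)
    size = m + n
    if size == 0:
        return 0.0

    def cost(i, j):
        # Row i < m is a point of A, otherwise a diagonal slot;
        # column j < n is a point of B, otherwise a diagonal slot.
        if i < m and j < n:
            (b1, d1), (b2, d2) = diag_a[i], diag_b[j]
            return max(abs(b1 - b2), abs(d1 - d2))
        if i < m:  # point of A matched to the diagonal
            return (diag_a[i][1] - diag_a[i][0]) / 2
        if j < n:  # point of B matched to the diagonal
            return (diag_b[j][1] - diag_b[j][0]) / 2
        return 0.0  # diagonal slot matched to diagonal slot

    def has_perfect_matching(threshold):
        # Kuhn's augmenting-path algorithm on edges with cost <= threshold.
        owner = [-1] * size

        def augment(i, seen):
            for j in range(size):
                if not seen[j] and cost(i, j) <= threshold:
                    seen[j] = True
                    if owner[j] == -1 or augment(owner[j], seen):
                        owner[j] = i
                        return True
            return False

        return all(augment(i, [False] * size) for i in range(size))

    # The distance equals one of the pairwise costs: find the smallest feasible one.
    candidates = sorted({cost(i, j) for i in range(size) for j in range(size)})
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if has_perfect_matching(candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]
```

This is only meant for small diagrams; the recursion and the quadratic number of candidate costs make it unsuitable for large inputs.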
Submitted 12 September, 2022; v1 submitted 4 June, 2021;
originally announced June 2021.
-
Hotspot identification for Mapper graphs
Authors:
Ciara Frances Loughrey,
Nick Orr,
Anna Jurek-Loughrey,
Paweł Dłotko
Abstract:
The Mapper algorithm can be used to build graph-based representations of high-dimensional data that capture structurally interesting features such as loops, flares or clusters. The graph can be further annotated by colouring its vertices, which helps locate regions of special interest. For instance, in many applications, such as precision medicine, Mapper graphs have been used to identify previously unknown, compactly localized subareas of the dataset that exhibit unique or unusual behaviours. This task, so far performed manually by a researcher, can be automated using hotspot analysis. In this work we propose a new algorithm for detecting hotspots in Mapper graphs, which automates the hotspot detection process. We demonstrate the performance of the algorithm on a number of artificial and real-world datasets. We further demonstrate how our algorithm can be used for the automatic selection of Mapper lens functions.
Submitted 3 December, 2020;
originally announced December 2020.
-
Visualising the Evolution of English Covid-19 Cases with Topological Data Analysis Ball Mapper
Authors:
Pawel Dlotko,
Simon Rudkin
Abstract:
Understanding disease spread through data visualisation has concentrated on trends and maps. Whilst these are helpful, they neglect important multi-dimensional interactions between the characteristics of communities. Using the Topological Data Analysis Ball Mapper algorithm, we construct an abstract representation of NUTS3-level economic data and overlay onto it the confirmed cases of Covid-19 in England. In so doing we may understand how the disease spreads along different socio-economic dimensions. We observe that some areas of the characteristic space have quickly raced to the highest levels of infection, while others close by in the characteristic space do not show large infection growth. Likewise, we see patterns emerging in very different areas that warrant closer monitoring. This signposts a strong contribution from Topological Data Analysis, and the Ball Mapper algorithm in particular, to understanding dynamic epidemic data.
Submitted 18 April, 2020; v1 submitted 7 April, 2020;
originally announced April 2020.
-
Cyclicality, Periodicity and the Topology of Time Series
Authors:
Paweł Dłotko,
Wanling Qiu,
Simon Rudkin
Abstract:
Periodic and semi-periodic patterns are very common in nature. In this paper we introduce a topological toolbox aimed at detecting and quantifying periodicity. The presented technique is general and may be employed wherever cyclic behaviour is suspected in a time series with no trend. The approach is tested on a number of real-world examples, consistently demonstrating an ability to recognise periodic behaviour where conventional techniques fail to do so. Quicker to react to changes in time series behaviour, and highly robust to noise, the toolbox offers a powerful route to a deeper understanding of time series dynamics.
Submitted 28 May, 2019;
originally announced May 2019.
-
Persistence Bag-of-Words for Topological Data Analysis
Authors:
Bartosz Zieliński,
Michał Lipiński,
Mateusz Juda,
Matthias Zeppelzauer,
Paweł Dłotko
Abstract:
Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs). However, PDs have a complex structure and are difficult to integrate into today's machine learning workflows. This paper introduces persistence bag-of-words: a novel and stable vectorized representation of PDs that enables seamless integration with machine learning. Comprehensive experiments show that the new representation matches or exceeds state-of-the-art performance in much less time than alternative approaches.
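A simplified sketch of the bag-of-words idea for a single diagram, assuming a codebook of cluster centres has already been learned (for example by k-means on points pooled from training diagrams) and that points are expressed in (birth, persistence) coordinates. Hard assignment to the nearest codeword and L1 normalisation are choices made for this illustration; the paper's exact weighting and sampling scheme may differ, and `persistence_bow` is a hypothetical name.

```python
import math


def persistence_bow(diagram, codebook):
    """Normalised bag-of-words histogram of a persistence diagram (sketch).

    diagram: list of (birth, death) pairs; codebook: list of (birth, persistence)
    centres, assumed to be learned beforehand (e.g. by k-means).
    """
    counts = [0] * len(codebook)
    for birth, death in diagram:
        point = (birth, death - birth)  # move to birth-persistence coordinates
        nearest = min(range(len(codebook)), key=lambda k: math.dist(point, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    # L1-normalise so diagrams with different numbers of points are comparable.
    return [c / total for c in counts] if total else [0.0] * len(codebook)
```

The output has a fixed length equal to the codebook size, regardless of how many points the diagram contains, which is what lets standard classifiers consume it.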
Submitted 4 June, 2019; v1 submitted 21 December, 2018;
originally announced December 2018.
-
Computational and applied topology, tutorial
Authors:
Paweł Dłotko
Abstract:
This is a tutorial in applied and computational topology and topological data analysis. It is illustrated with numerous computational examples that use the Gudhi library. It is under constant development, so please do not consider this version final.
Submitted 23 August, 2018; v1 submitted 11 July, 2018;
originally announced July 2018.
-
Persistence Codebooks for Topological Data Analysis
Authors:
Bartosz Zielinski,
Michal Lipinski,
Mateusz Juda,
Matthias Zeppelzauer,
Pawel Dlotko
Abstract:
Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs), which are 2D multisets of points. However, their variable size makes them difficult to combine with typical machine learning workflows. In this paper we introduce persistence codebooks, a novel, expressive and discriminative fixed-size vectorized representation of PDs. To this end, we adapt bag-of-words (BoW), vectors of locally aggregated descriptors (VLAD) and Fisher vectors (FV) for the quantization of PDs. Persistence codebooks represent PDs in a way that is convenient for machine learning and statistical analysis, and they have a number of favorable practical and theoretical properties, including 1-Wasserstein stability. We evaluate the presented representations on several heterogeneous datasets and show their high discriminative power. Our approach matches or exceeds state-of-the-art performance in much less time than alternative approaches.
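Of the three encodings named above, VLAD is the easiest to show in a few lines. A minimal sketch, assuming a pre-learned codebook in (birth, persistence) coordinates and hard nearest-codeword assignment: each codeword accumulates the residuals (point minus codeword) of its assigned points, and the concatenated vector is L2-normalised. This is the generic VLAD recipe, not necessarily the paper's exact variant; `persistence_vlad` is a hypothetical name.

```python
import math


def persistence_vlad(diagram, codebook):
    """VLAD-style encoding of a persistence diagram (sketch).

    For each codeword, sums the residuals (point - codeword) of the diagram
    points assigned to it, then concatenates and L2-normalises the result.
    """
    residuals = [[0.0, 0.0] for _ in codebook]
    for birth, death in diagram:
        point = (birth, death - birth)  # birth-persistence coordinates
        k = min(range(len(codebook)), key=lambda c: math.dist(point, codebook[c]))
        residuals[k][0] += point[0] - codebook[k][0]
        residuals[k][1] += point[1] - codebook[k][1]
    flat = [value for pair in residuals for value in pair]
    norm = math.sqrt(sum(v * v for v in flat))
    # A zero vector (e.g. residuals that cancel out) is returned unnormalised.
    return [v / norm for v in flat] if norm > 0 else flat
```

Unlike a plain histogram, VLAD records where points sit relative to their codeword, so two diagrams with the same counts per codeword can still be told apart.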
Submitted 13 June, 2019; v1 submitted 13 February, 2018;
originally announced February 2018.
-
Quantifying topological invariants of neuronal morphologies
Authors:
Lida Kanari,
Paweł Dłotko,
Martina Scolamiero,
Ran Levi,
Julian Shillcock,
Kathryn Hess,
Henry Markram
Abstract:
Nervous systems are characterized by neurons displaying a diversity of morphological shapes. Traditionally, different shapes have been qualitatively described based on visual inspection and quantitatively described based on morphometric parameters. Neither process provides a solid foundation for categorizing the various morphologies, a problem that is important in many fields. We propose a stable topological measure as a standardized descriptor for any tree-like morphology, which encodes its skeletal branching anatomy. More specifically, it is a barcode of the branching tree as determined by a spherical filtration centered at the root or neuronal soma. This Topological Morphology Descriptor (TMD) allows for the discrimination of groups of random and neuronal trees at linear computational cost.
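A sketch of the barcode construction as we read it from the abstract: distances are measured radially from the root (soma), each leaf starts a branch at its radial distance, and at every branch point all incoming branches except the one with the largest value end there (an elder-rule pairing); the surviving branch ends at the root. Each node is visited once, consistent with the linear cost mentioned above, apart from sorting at branch points. The tree encoding (`children` and `coords` dicts) is a hypothetical input format, not the authors' file format.

```python
import math


def tmd_barcode(children, coords, root):
    """Barcode of a rooted tree under a radial-distance filtration (sketch).

    children: node -> list of child nodes; coords: node -> (x, y, z).
    Returns (start, end) pairs: start = radial distance of the leaf that
    started the branch, end = radial distance of the node where it merged.
    """
    def radial(node):
        return math.dist(coords[node], coords[root])

    bars = []

    def longest(node):
        # Returns the largest leaf distance in the subtree and records a bar
        # for every other branch that terminates at this branch point.
        kids = children.get(node, [])
        if not kids:
            return radial(node)
        values = sorted((longest(k) for k in kids), reverse=True)
        for v in values[1:]:
            bars.append((v, radial(node)))
        return values[0]

    top = longest(root)
    bars.append((top, radial(root)))  # the surviving branch ends at the root
    return bars
```

Recursion depth grows with tree height, so very deep reconstructions would need an iterative traversal instead.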
Submitted 28 March, 2016;
originally announced March 2016.
-
A persistence landscapes toolbox for topological statistics
Authors:
Peter Bubenik,
Pawel Dlotko
Abstract:
Topological data analysis provides a multiscale description of the geometry and topology of quantitative data. The persistence landscape is a topological summary that can be easily combined with tools from statistics and machine learning. We give efficient algorithms for calculating persistence landscapes, their averages, and distances between such averages. We discuss an implementation of these algorithms and some related procedures. These are intended to facilitate the combination of statistics and machine learning with topological data analysis. We present an experiment showing that the low-dimensional persistence landscapes of points sampled from spheres (and boxes) of varying dimensions differ.
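The persistence landscape has a short closed form: lambda_k(t) is the k-th largest value of max(0, min(t - b, d - t)) over the diagram's points (b, d). A small sketch that evaluates landscapes pointwise on a grid, averages them over several diagrams, and compares averages with a sup-norm on that grid. Grid sampling is an assumption of this sketch; the toolbox's own algorithms are not reproduced here, and the function names are hypothetical.

```python
def landscape_value(diagram, k, t):
    """lambda_k(t): k-th largest tent value max(0, min(t - b, d - t)); k >= 1."""
    tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0


def average_landscape(diagrams, k, grid):
    """Pointwise mean of lambda_k over several diagrams, sampled on a grid."""
    return [sum(landscape_value(D, k, t) for D in diagrams) / len(diagrams) for t in grid]


def sup_distance(f, g):
    """Sup-norm distance between two landscapes sampled on the same grid."""
    return max(abs(a - b) for a, b in zip(f, g))
```

Because averaging happens pointwise on functions rather than on diagrams, the mean landscape is always well defined, which is the property that makes landscapes easy to combine with statistics.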
Submitted 28 August, 2015; v1 submitted 31 December, 2014;
originally announced January 2015.
-
Topology preserving thinning for cell complexes
Authors:
Paweł Dłotko,
Ruben Specogna
Abstract:
A topology-preserving skeleton is a synthetic representation of an object that retains its topology and many of its significant morphological properties. The process of obtaining the skeleton, referred to as skeletonization or thinning, is a very active research area. It plays a central role in reducing the amount of information to be processed in image analysis and visualization, computer-aided diagnosis, and pattern recognition algorithms.
This paper introduces a novel topology-preserving thinning algorithm which removes simple cells (a generalization of simple points) of a given cell complex. The test for simple cells is based on acyclicity tables automatically produced in advance with homology computations. Using acyclicity tables renders the implementation of thinning algorithms straightforward. Moreover, the fact that the tables are automatically filled for all possible configurations allows us to rigorously prove the generality of the algorithm and to obtain fool-proof implementations. To our knowledge, the novel approach makes it possible, for the first time, to thin a general unstructured simplicial complex. Acyclicity tables for cubical and simplicial complexes and an open source implementation of the thinning algorithm are provided as additional material to allow their immediate use in the vast number of practical applications arising in medical imaging and beyond.
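The paper's simple-cell test relies on precomputed acyclicity tables. As a simpler, standard illustration of topology-preserving reduction, here is a sketch of elementary (free-face) collapses on a simplicial complex; this is a swapped-in technique, not the paper's algorithm. A face sigma is free when exactly one other simplex tau contains it and dim tau = dim sigma + 1; removing the pair preserves the homotopy type. The helper names are hypothetical.

```python
from itertools import combinations


def closure(top_simplices):
    """All non-empty faces of the given simplices, as frozensets of vertices."""
    faces = set()
    for s in top_simplices:
        for r in range(1, len(s) + 1):
            faces.update(frozenset(c) for c in combinations(s, r))
    return faces


def collapse(complex_faces):
    """Repeatedly remove free pairs (sigma, tau) until none remain."""
    cx = set(complex_faces)
    changed = True
    while changed:
        changed = False
        for sigma in sorted(cx, key=len):
            cofaces = [tau for tau in cx if sigma < tau]
            if len(cofaces) == 1 and len(cofaces[0]) == len(sigma) + 1:
                cx.discard(sigma)
                cx.discard(cofaces[0])
                changed = True
                break  # restart the scan on the smaller complex
    return cx
```

A filled triangle collapses to a single vertex, while a hollow triangle has no free faces and is left intact, so its loop survives. The quadratic scan is only suitable for tiny complexes.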
Submitted 25 February, 2014; v1 submitted 6 September, 2013;
originally announced September 2013.
-
Physics inspired algorithms for (co)homology computation
Authors:
Paweł Dłotko,
Ruben Specogna
Abstract:
The issue of computing (co)homology generators of a cell complex is gaining a pivotal role in various branches of science. While this issue can be rigorously solved in polynomial time, it is still overly demanding for large-scale problems. Drawing inspiration from low-frequency electrodynamics, this paper presents a physics-inspired algorithm for first cohomology group computations on three-dimensional complexes. The algorithm is general and exhibits a speed-up of orders of magnitude with respect to competing ones, making it possible to handle problems that could not be addressed before. In particular, when generators are employed in the physical modeling of magneto-quasistatic problems, this algorithm solves one of the longest-standing problems in low-frequency computational electromagnetics. In this case, the effectiveness of the algorithm and its ease of implementation can be improved even further by introducing the novel concept of lazy cohomology generators.
Submitted 6 December, 2012;
originally announced December 2012.
-
Computing homology and persistent homology using iterated Morse decomposition
Authors:
Paweł Dłotko,
Hubert Wagner
Abstract:
In this paper we present a new approach to computing homology (with field coefficients) and persistent homology. We use concepts from discrete Morse theory to provide an algorithm that can be expressed solely in terms of simple graph-theoretical operations. We use iterated Morse decomposition, which allows us to sidestep many problems related to the standard discrete Morse theory. In particular, this approach is provably correct in any dimension.
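For comparison, the textbook baseline that such methods aim to speed up: Betti numbers with Z/2 coefficients from ranks of boundary matrices, beta_d = dim C_d - rank(boundary_d) - rank(boundary_{d+1}). This is not the iterated Morse decomposition algorithm, only a small reference sketch (hypothetical names) that can sanity-check results on tiny complexes. It assumes the input lists every face of every simplex.

```python
from itertools import combinations


def gf2_rank(rows):
    """Rank over Z/2 of row vectors encoded as Python int bitmasks."""
    pivots = {}  # leading bit -> stored row with that leading bit
    rank = 0
    for r in rows:
        while r:
            top = r.bit_length() - 1
            if top in pivots:
                r ^= pivots[top]  # eliminate the leading bit
            else:
                pivots[top] = r
                rank += 1
                break
    return rank


def betti_numbers_z2(simplices):
    """Betti numbers over Z/2 of a simplicial complex closed under faces."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    max_dim = max(by_dim)
    index = {d: {s: i for i, s in enumerate(by_dim[d])} for d in by_dim}
    ranks = {0: 0, max_dim + 1: 0}
    for d in range(1, max_dim + 1):
        rows = []
        for s in by_dim[d]:
            mask = 0
            for face in combinations(s, d):  # the d+1 codimension-1 faces
                mask |= 1 << index[d - 1][face]
            rows.append(mask)
        ranks[d] = gf2_rank(rows)
    return [len(by_dim[d]) - ranks[d] - ranks[d + 1] for d in range(max_dim + 1)]
```

Gaussian elimination like this is cubic in the number of simplices in the worst case, which is the cost that Morse-type reductions try to avoid by shrinking the complex first.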
Submitted 25 October, 2012; v1 submitted 4 October, 2012;
originally announced October 2012.