Search | arXiv e-print repository

doi 10.1093/gigascience/giad094

Euler Characteristic Curves and Profiles: a stable shape invariant for big data problems

Abstract: Tools of Topological Data Analysis provide stable summaries encapsulating the shape of the considered data. Persistent homology, the most standard and well studied data summary, suffers from a number of limitations; its computations are hard to distribute, it is hard to generalize to multifiltrations and is computationally prohibitive for big data-sets. In this paper we study the concept of Euler Characteristic Curves, for one parameter filtrations and Euler Characteristic Profiles, for multi-parameter filtrations. While being a weaker invariant in one dimension, we show that Euler Characteristic based approaches do not possess some handicaps of persistent homology; we show efficient algorithms to compute them in a distributed way, their generalization to multifiltrations and practical applicability for big data problems. In addition we show that the Euler Curves and Profiles enjoy a certain type of stability, which makes them a robust tool in data analysis. Lastly, to show their practical applicability, multiple use-cases are considered.

Submitted 11 August, 2023; v1 submitted 3 December, 2022; originally announced December 2022.

Comments: 32 pages, 19 figures. Added remark on multicritical filtrations in section 4, typos corrected

Journal ref: GigaScience, Volume 12, 2023, giad094
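To make the notion of an Euler Characteristic Curve in the abstract above concrete, here is a minimal sketch. It is not the authors' distributed algorithm; it only uses the standard definition, in which the curve at threshold t is the alternating sum of the simplices that have entered the filtration by t. The function name and the `(dimension, filtration_value)` input format are assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def euler_characteristic_curve(simplices):
    """Return a function chi(t) for a filtered complex.

    `simplices` is an iterable of (dimension, filtration_value) pairs
    (a hypothetical input format chosen for this sketch).
    """
    # Each simplex contributes (-1)^dimension once its value is reached.
    events = sorted((value, (-1) ** dim) for dim, value in simplices)
    values = [value for value, _ in events]
    # Prefix sums give the Euler characteristic after each event.
    running = list(accumulate(sign for _, sign in events))

    def chi(t):
        # Count the simplices with filtration value <= t.
        k = bisect_right(values, t)
        return running[k - 1] if k else 0

    return chi


# A filled triangle: vertices enter at 0, edges at 1, the 2-cell at 2.
triangle = [(0, 0.0)] * 3 + [(1, 1.0)] * 3 + [(2, 2.0)]
chi = euler_characteristic_curve(triangle)
print([chi(t) for t in (-1.0, 0.0, 1.0, 2.0)])
```

Because every simplex contributes independently to the sum, the contributions can be computed on separate chunks of the complex and added afterwards, which is one way to read the abstract's remark about distributed computation.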

arXiv:2109.00831 [pdf, other]

Mapper-type algorithms for complex data and relations

Authors: Paweł Dłotko, Davide Gurnari, Radmila Sazdanovic

Abstract: Mapper and Ball Mapper are Topological Data Analysis tools used for exploring high dimensional point clouds and visualizing scalar-valued functions on those point clouds. Inspired by open questions in knot theory, new features are added to Ball Mapper that enable encoding of the structure, internal relations and symmetries of the point cloud. Moreover, the strengths of Mapper and Ball Mapper constructions are combined to create a tool for comparing high dimensional data descriptors of a single dataset. This new hybrid algorithm, Mapper on Ball Mapper, is applicable to high dimensional lens functions. As a proof of concept we include applications to knot and game theory, as well as material science and cancer research.

Submitted 28 March, 2023; v1 submitted 2 September, 2021; originally announced September 2021.

Comments: Changed title. Rewrote introduction and section 3. Added new examples and applications. 20 pages, 16 figures

MSC Class: 55N31; 55U10; 57N25; 57M27; 57M15

arXiv:2106.02538 [pdf, other]

Bottleneck Profiles and Discrete Prokhorov Metrics for Persistence Diagrams

Authors: Paweł Dłotko, Niklas Hellmer

Abstract: In topological data analysis (TDA), persistence diagrams have been a successful tool. To compare them, Wasserstein and Bottleneck distances are commonly used. We address the shortcomings of these metrics and show a way to investigate them in a systematic way by introducing bottleneck profiles. This leads to a notion of discrete Prokhorov metrics for persistence diagrams as a generalization of the Bottleneck distance. They satisfy a stability result and bounds with respect to Wasserstein metrics. We provide algorithms to compute the newly introduced quantities and end with a discussion of experiments.

Submitted 12 September, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: 34 pages, 12 figures; improved exposition. To appear in Discrete & Computational Geometry

arXiv:2012.01868 [pdf, other]

Hotspot identification for Mapper graphs

Authors: Ciara Frances Loughrey, Nick Orr, Anna Jurek-Loughrey, Paweł Dłotko

Abstract: The Mapper algorithm can be used to build graph-based representations of high-dimensional data capturing structurally interesting features such as loops, flares or clusters. The graph can be further annotated with additional colouring of vertices allowing location of regions of special interest. For instance, in many applications, such as precision medicine, the Mapper graph has been used to identify unknown compactly localized subareas within the dataset demonstrating unique or unusual behaviours. This task, performed so far by a researcher, can be automatized using hotspot analysis. In this work we propose a new algorithm for detecting hotspots in Mapper graphs. It allows automatizing of the hotspot detection process. We demonstrate the performance of the algorithm on a number of artificial and real world datasets. We further demonstrate how our algorithm can be used for the automatic selection of the Mapper lens functions.

Submitted 3 December, 2020; originally announced December 2020.

Comments: Topological Data Analysis and Beyond Workshop at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

arXiv:2004.03282 [pdf, other]

Visualising the Evolution of English Covid-19 Cases with Topological Data Analysis Ball Mapper

Authors: Pawel Dlotko, Simon Rudkin

Abstract: Understanding disease spread through data visualisation has concentrated on trends and maps. Whilst these are helpful, they neglect important multi-dimensional interactions between characteristics of communities. Using the Topological Data Analysis Ball Mapper algorithm we construct an abstract representation of NUTS3 level economic data, overlaying onto it the confirmed cases of Covid-19 in England. In so doing we may understand how the disease spreads on different socio-economical dimensions. It is observed that some areas of the characteristic space have quickly raced to the highest levels of infection, while others close by in the characteristic space, do not show large infection growth. Likewise, we see patterns emerging in very different areas that command more monitoring. A strong contribution for Topological Data Analysis, and the Ball Mapper algorithm especially, in comprehending dynamic epidemic data is signposted.

Submitted 18 April, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

Comments: Updated to include April 17 2020

arXiv:1905.12118 [pdf, other]

Cyclicality, Periodicity and the Topology of Time Series

Authors: Paweł Dłotko, Wanling Qiu, Simon Rudkin

Abstract: Periodic and semi-periodic patterns are very common in nature. In this paper we introduce a topological toolbox aimed at detecting and quantifying periodicity. The presented technique is of a general nature and may be employed wherever there is suspected cyclic behaviour in a time series with no trend. The approach is tested on a number of real-world examples enabling us to consistently demonstrate an ability to recognise periodic behaviour where conventional techniques fail to do so. Quicker to react to changes in time series behaviour, and with a high robustness to noise, the toolbox offers a powerful way to a deeper understanding of time series dynamics.

Submitted 28 May, 2019; originally announced May 2019.

arXiv:1812.09245 [pdf, other]

Persistence Bag-of-Words for Topological Data Analysis

Authors: Bartosz Zieliński, Michał Lipiński, Mateusz Juda, Matthias Zeppelzauer, Paweł Dłotko

Abstract: Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs). PDs exhibit, however, complex structure and are difficult to integrate in today's machine learning workflows. This paper introduces persistence bag-of-words: a novel and stable vectorized representation of PDs that enables the seamless integration with machine learning. Comprehensive experiments show that the new representation achieves state-of-the-art performance and beyond in much less time than alternative approaches.

Submitted 4 June, 2019; v1 submitted 21 December, 2018; originally announced December 2018.

Comments: Accepted for the Twenty-Eight International Joint Conference on Artificial Intelligence (IJCAI-19). arXiv admin note: substantial text overlap with arXiv:1802.04852

arXiv:1807.08607 [pdf, other]

Computational and applied topology, tutorial

Authors: Paweł Dłotko

Abstract: This is a tutorial in applied and computational topology and topological data analysis. It is illustrated with numerous computational examples that utilize the Gudhi library. It is under constant development, so please do not consider this version as final.

Submitted 23 August, 2018; v1 submitted 11 July, 2018; originally announced July 2018.

arXiv:1802.04852 [pdf, other]

Persistence Codebooks for Topological Data Analysis

Authors: Bartosz Zielinski, Michal Lipinski, Mateusz Juda, Matthias Zeppelzauer, Pawel Dlotko

Abstract: Persistent homology (PH) is a rigorous mathematical theory that provides a robust descriptor of data in the form of persistence diagrams (PDs) which are 2D multisets of points. Their variable size makes them, however, difficult to combine with typical machine learning workflows. In this paper we introduce persistence codebooks, a novel expressive and discriminative fixed-size vectorized representation of PDs. To this end, we adapt bag-of-words (BoW), vectors of locally aggregated descriptors (VLAD) and Fisher vectors (FV) for the quantization of PDs. Persistence codebooks represent PDs in a convenient way for machine learning and statistical analysis and have a number of favorable practical and theoretical properties including 1-Wasserstein stability. We evaluate the presented representations on several heterogeneous datasets and show their (high) discriminative power. Our approach achieves state-of-the-art performance and beyond in much less time than alternative approaches.

Submitted 13 June, 2019; v1 submitted 13 February, 2018; originally announced February 2018.

Comments: minor update, remove heading

arXiv:1603.08432 [pdf, other]

Quantifying topological invariants of neuronal morphologies

Authors: Lida Kanari, Paweł Dłotko, Martina Scolamiero, Ran Levi, Julian Shillcock, Kathryn Hess, Henry Markram

Abstract: Nervous systems are characterized by neurons displaying a diversity of morphological shapes. Traditionally, different shapes have been qualitatively described based on visual inspection and quantitatively described based on morphometric parameters. Neither process provides a solid foundation for categorizing the various morphologies, a problem that is important in many fields. We propose a stable topological measure as a standardized descriptor for any tree-like morphology, which encodes its skeletal branching anatomy. More specifically it is a barcode of the branching tree as determined by a spherical filtration centered at the root or neuronal soma. This Topological Morphology Descriptor (TMD) allows for the discrimination of groups of random and neuronal trees at linear computational cost.

Submitted 28 March, 2016; originally announced March 2016.

Comments: 10 pages, 5 figures, conference or other essential info

arXiv:1501.00179 [pdf, other]

doi 10.1016/j.jsc.2016.03.009

A persistence landscapes toolbox for topological statistics

Authors: Peter Bubenik, Pawel Dlotko

Abstract: Topological data analysis provides a multiscale description of the geometry and topology of quantitative data. The persistence landscape is a topological summary that can be easily combined with tools from statistics and machine learning. We give efficient algorithms for calculating persistence landscapes, their averages, and distances between such averages. We discuss an implementation of these algorithms and some related procedures. These are intended to facilitate the combination of statistics and machine learning with topological data analysis. We present an experiment showing that the low-dimensional persistence landscapes of points sampled from spheres (and boxes) of varying dimensions differ.

Submitted 28 August, 2015; v1 submitted 31 December, 2014; originally announced January 2015.

Comments: 24 pages

Journal ref: Journal of Symbolic Computation, Volume 78, January-February 2017, Pages 91-114
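The persistence landscape and its average, as named in the abstract above, can be sketched directly from the standard definition: the k-th landscape function at t is the k-th largest value of max(0, min(t - b, d - t)) over the (birth, death) pairs of a diagram. The code below is a naive pointwise evaluation under that definition, not the efficient algorithms of the paper or the toolbox's API; the function names and the list-of-pairs diagram format are assumptions for illustration.

```python
def landscape_value(diagram, k, t):
    """k-th persistence landscape function (k >= 1) evaluated at t.

    `diagram` is a list of (birth, death) pairs with birth < death
    (a hypothetical input format chosen for this sketch).
    """
    # Each pair contributes a "tent" function peaking at its midpoint.
    tents = sorted(
        (max(0.0, min(t - birth, death - t)) for birth, death in diagram),
        reverse=True,
    )
    # The k-th largest tent value, or 0 if there are fewer than k pairs.
    return tents[k - 1] if k <= len(tents) else 0.0


def average_landscape(diagrams, k, grid):
    """Pointwise mean of the k-th landscapes of several diagrams on a grid."""
    return [
        sum(landscape_value(dg, k, t) for dg in diagrams) / len(diagrams)
        for t in grid
    ]


diagram = [(0.0, 4.0), (1.0, 3.0)]
print(landscape_value(diagram, 1, 2.0), landscape_value(diagram, 2, 2.0))
print(average_landscape([[(0.0, 4.0)], [(0.0, 2.0)]], 1, [1.0, 2.0]))
```

Averaging is meaningful here because landscapes are ordinary real-valued functions, which is what makes them easy to combine with statistical tools.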

arXiv:1309.1628 [pdf, other]

doi 10.1109/TIP.2014.2348799

Topology preserving thinning for cell complexes

Authors: Paweł Dłotko, Ruben Specogna

Abstract: A topology preserving skeleton is a synthetic representation of an object that retains its topology and many of its significant morphological properties. The process of obtaining the skeleton, referred to as skeletonization or thinning, is a very active research area. It plays a central role in reducing the amount of information to be processed during image analysis and visualization, computer-aided diagnosis or by pattern recognition algorithms. This paper introduces a novel topology preserving thinning algorithm which removes simple cells (a generalization of simple points) of a given cell complex. The test for simple cells is based on acyclicity tables automatically produced in advance with homology computations. Using acyclicity tables renders the implementation of thinning algorithms straightforward. Moreover, the fact that tables are automatically filled for all possible configurations makes it possible to rigorously prove the generality of the algorithm and to obtain fool-proof implementations. To our knowledge, the novel approach makes it possible for the first time to thin a general unstructured simplicial complex. Acyclicity tables for cubical and simplicial complexes and an open source implementation of the thinning algorithm are provided as additional material to allow their immediate use in the vast number of practical applications arising in medical imaging and beyond.

Submitted 25 February, 2014; v1 submitted 6 September, 2013; originally announced September 2013.

arXiv:1212.1360 [pdf, other]

Physics inspired algorithms for (co)homology computation

Authors: Paweł Dłotko, Ruben Specogna

Abstract: The issue of computing (co)homology generators of a cell complex is gaining a pivotal role in various branches of science. While this issue can be rigorously solved in polynomial time, it is still overly demanding for large scale problems. Drawing inspiration from low-frequency electrodynamics, this paper presents a physics inspired algorithm for first cohomology group computations on three-dimensional complexes. The algorithm is general and exhibits orders of magnitude speed up with respect to competing ones, making it possible to handle problems not addressable before. In particular, when generators are employed in the physical modeling of magneto-quasistatic problems, this algorithm solves one of the most long-lasting problems in low-frequency computational electromagnetics. In this case, the effectiveness of the algorithm and its ease of implementation may be improved even further by introducing the novel concept of lazy cohomology generators.

Submitted 6 December, 2012; originally announced December 2012.

arXiv:1210.1429 [pdf, other]

Computing homology and persistent homology using iterated Morse decomposition

Authors: Paweł Dłotko, Hubert Wagner

Abstract: In this paper we present a new approach to computing homology (with field coefficients) and persistent homology. We use concepts from discrete Morse theory to provide an algorithm which can be expressed solely in terms of simple graph theoretical operations. We use iterated Morse decomposition, which allows us to sidestep many problems related to the standard discrete Morse theory. In particular, this approach is provably correct in any dimension.

Submitted 25 October, 2012; v1 submitted 4 October, 2012; originally announced October 2012.

Showing 1–14 of 14 results for author: Dlotko, P