-
Few Shot Learning for the Classification of Confocal Laser Endomicroscopy Images of Head and Neck Tumors
Authors:
Marc Aubreville,
Zhaoya Pan,
Matti Sievert,
Jonas Ammeling,
Jonathan Ganz,
Nicolai Oetter,
Florian Stelzle,
Ann-Kathrin Frenken,
Katharina Breininger,
Miguel Goncalves
Abstract:
The surgical removal of head and neck tumors requires safe margins, which are usually confirmed intraoperatively by means of frozen sections. This method is, in itself, an oversampling procedure, which has a relatively low sensitivity compared to the definitive tissue analysis on paraffin-embedded sections. Confocal laser endomicroscopy (CLE) is an in-vivo imaging technique that has shown its pote…
▽ More
The surgical removal of head and neck tumors requires safe margins, which are usually confirmed intraoperatively by means of frozen sections. This method is, in itself, an oversampling procedure, which has a relatively low sensitivity compared to the definitive tissue analysis on paraffin-embedded sections. Confocal laser endomicroscopy (CLE) is an in-vivo imaging technique that has shown its potential in the live optical biopsy of tissue. An automated analysis of this notoriously difficult to interpret modality would help surgeons. However, the images of CLE show a wide variability of patterns, caused both by individual factors but also, and most strongly, by the anatomical structures of the imaged tissue, making it a challenging pattern recognition task. In this work, we evaluate four popular few shot learning (FSL) methods towards their capability of generalizing to unseen anatomical domains in CLE images. We evaluate this on images of sinunasal tumors (SNT) from five patients and on images of the vocal folds (VF) from 11 patients using a cross-validation scheme. The best respective approach reached a median accuracy of 79.6% on the rather homogeneous VF dataset, but only of 61.6% for the highly diverse SNT dataset. Our results indicate that FSL on CLE images is viable, but strongly affected by the number of patients, as well as the diversity of anatomical patterns.
△ Less
Submitted 13 November, 2023;
originally announced November 2023.
-
Pull-back Geometry of Persistent Homology Encodings
Authors:
Shuang Liang,
Renata Turkeš,
Jiayi Li,
Nina Otter,
Guido Montúfar
Abstract:
Persistent homology (PH) is a method for generating topology-inspired representations of data. Empirical studies that investigate the properties of PH, such as its sensitivity to perturbations or ability to detect a feature of interest, commonly rely on training and testing an additional model on the basis of the PH representation. To gain more intrinsic insights about PH, independently of the cho…
▽ More
Persistent homology (PH) is a method for generating topology-inspired representations of data. Empirical studies that investigate the properties of PH, such as its sensitivity to perturbations or ability to detect a feature of interest, commonly rely on training and testing an additional model on the basis of the PH representation. To gain more intrinsic insights about PH, independently of the choice of such a model, we propose a novel methodology based on the pull-back geometry that a PH encoding induces on the data manifold. The spectrum and eigenvectors of the induced metric help to identify the most and least significant information captured by PH. Furthermore, the pull-back norm of tangent vectors provides insights about the sensitivity of PH to a given perturbation, or its potential to detect a given feature of interest, and in turn its ability to solve a given classification or regression problem. Experimentally, the insights gained through our methodology align well with the existing knowledge about PH. Moreover, we show that the pull-back norm correlates with the performance on downstream tasks, and can therefore guide the choice of a suitable PH encoding.
△ Less
Submitted 3 March, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
On the effectiveness of persistent homology
Authors:
Renata Turkeš,
Guido Montúfar,
Nina Otter
Abstract:
Persistent homology (PH) is one of the most popular methods in Topological Data Analysis. Even though PH has been used in many different types of applications, the reasons behind its success remain elusive; in particular, it is not known for which classes of problems it is most effective, or to what extent it can detect geometric or topological features. The goal of this work is to identify some t…
▽ More
Persistent homology (PH) is one of the most popular methods in Topological Data Analysis. Even though PH has been used in many different types of applications, the reasons behind its success remain elusive; in particular, it is not known for which classes of problems it is most effective, or to what extent it can detect geometric or topological features. The goal of this work is to identify some types of problems where PH performs well or even better than other methods in data analysis. We consider three fundamental shape analysis tasks: the detection of the number of holes, curvature and convexity from 2D and 3D point clouds sampled from shapes. Experiments demonstrate that PH is successful in these tasks, outperforming several baselines, including PointNet, an architecture inspired precisely by the properties of point clouds. In addition, we observe that PH remains effective for limited computational resources and limited training data, as well as out-of-distribution test data, including various data transformations and noise. For convexity detection, we provide a theoretical guarantee that PH is effective for this task in $\mathbb{R}^d$, and demonstrate the detection of a convexity measure on the FLAVIA data set of plant leaf images. Due to the crucial role of shape classification in understanding mathematical and physical structures and objects, and in many applications, the findings of this work will provide some knowledge about the types of problems that are appropriate for PH, so that it can - to borrow the words from Wigner 1960 - ``remain valid in future research, and extend, to our pleasure", but to our lesser bafflement, to a variety of applications.
△ Less
Submitted 16 January, 2023; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Alpha magnitude
Authors:
Miguel O'Malley,
Sara Kalisnik,
Nina Otter
Abstract:
Magnitude is an isometric invariant for metric spaces that was introduced by Leinster around 2010, and is currently the object of intense research, since it has been shown to encode many known invariants of metric spaces. In recent work, Govc and Hepworth introduced persistent magnitude, a numerical invariant of a filtered simplicial complex associated to a metric space. Inspired by Govc and Hepwo…
▽ More
Magnitude is an isometric invariant for metric spaces that was introduced by Leinster around 2010, and is currently the object of intense research, since it has been shown to encode many known invariants of metric spaces. In recent work, Govc and Hepworth introduced persistent magnitude, a numerical invariant of a filtered simplicial complex associated to a metric space. Inspired by Govc and Hepworth's definition, we introduce alpha magnitude and investigate some of its key properties. Heuristic observations lead us to conjecture a relationship with the Minkowski dimension of compact subspaces of Euclidean space. Finally, alpha magnitude presents computational advantages over both magnitude as well as Rips magnitude, and we thus propose it as a new measure for the estimation of fractal dimensions of real-world data sets that is easily computable.
△ Less
Submitted 19 May, 2022;
originally announced May 2022.
-
Amplitudes in persistence theory
Authors:
Barbara Giunti,
John S. Nolan,
Nina Otter,
Lukas Waas
Abstract:
The use of persistent homology in applications is justified by the validity of certain stability results. At the core of such results is a notion of distance between the invariants that one associates with data sets. Here we introduce a general framework to compare distances and invariants in multiparameter persistence, where there is no natural choice of invariants and distances between them. We…
▽ More
The use of persistent homology in applications is justified by the validity of certain stability results. At the core of such results is a notion of distance between the invariants that one associates with data sets. Here we introduce a general framework to compare distances and invariants in multiparameter persistence, where there is no natural choice of invariants and distances between them. We define amplitudes, monotone, and subadditive invariants that arise from assigning a non-negative real number to objects of an abelian category. We then present different ways to associate distances to such invariants, and we provide a classification of classes of amplitudes relevant to topological data analysis. In addition, we study the the relationships as well as the discriminitative power of such amplitude distances arising in topological data analysis scenarios.
△ Less
Submitted 12 July, 2024; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Weisfeiler and Lehman Go Cellular: CW Networks
Authors:
Cristian Bodnar,
Fabrizio Frasca,
Nina Otter,
Yu Guang Wang,
Pietro Liò,
Guido Montúfar,
Michael Bronstein
Abstract:
Graph Neural Networks (GNNs) are limited in their expressive power, struggle with long-range interactions and lack a principled way to model higher-order structures. These problems can be attributed to the strong coupling between the computational graph and the input graph structure. The recently proposed Message Passing Simplicial Networks naturally decouple these elements by performing message p…
▽ More
Graph Neural Networks (GNNs) are limited in their expressive power, struggle with long-range interactions and lack a principled way to model higher-order structures. These problems can be attributed to the strong coupling between the computational graph and the input graph structure. The recently proposed Message Passing Simplicial Networks naturally decouple these elements by performing message passing on the clique complex of the graph. Nevertheless, these models can be severely constrained by the rigid combinatorial structure of Simplicial Complexes (SCs). In this work, we extend recent theoretical results on SCs to regular Cell Complexes, topological objects that flexibly subsume SCs and graphs. We show that this generalisation provides a powerful set of graph "lifting" transformations, each leading to a unique hierarchical message passing procedure. The resulting methods, which we collectively call CW Networks (CWNs), are strictly more powerful than the WL test and not less powerful than the 3-WL test. In particular, we demonstrate the effectiveness of one such scheme, based on rings, when applied to molecular graph problems. The proposed architecture benefits from provably larger expressivity than commonly used GNNs, principled modelling of higher-order signals and from compressing the distances between nodes. We demonstrate that our model achieves state-of-the-art results on a variety of molecular datasets.
△ Less
Submitted 31 January, 2022; v1 submitted 23 June, 2021;
originally announced June 2021.
-
A topological perspective on weather regimes
Authors:
Kristian Strommen,
Matthew Chantry,
Joshua Dorrington,
Nina Otter
Abstract:
It has long been suggested that the mid-latitude atmospheric circulation possesses what has come to be known as `weather regimes', loosely categorised as regions of phase space with above-average density and/or extended persistence. Their existence and behaviour has been extensively studied in meteorology and climate science, due to their potential for drastically simplifying the complex and chaot…
▽ More
It has long been suggested that the mid-latitude atmospheric circulation possesses what has come to be known as `weather regimes', loosely categorised as regions of phase space with above-average density and/or extended persistence. Their existence and behaviour has been extensively studied in meteorology and climate science, due to their potential for drastically simplifying the complex and chaotic mid-latitude dynamics. Several well-known, simple non-linear dynamical systems have been used as toy-models of the atmosphere in order to understand and exemplify such regime behaviour. Nevertheless, no agreed-upon and clear-cut definition of a `regime' exists in the literature. We argue here for an approach which equates the existence of regimes in a dynamical system with the existence of non-trivial topological structure of the system's attractor. We show using persistent homology, an algorithmic tool in topological data analysis, that this approach is computationally tractable, practically informative, and identifies the relevant regime structure across a range of examples.
△ Less
Submitted 6 September, 2021; v1 submitted 7 April, 2021;
originally announced April 2021.
-
Weisfeiler and Lehman Go Topological: Message Passing Simplicial Networks
Authors:
Cristian Bodnar,
Fabrizio Frasca,
Yu Guang Wang,
Nina Otter,
Guido Montúfar,
Pietro Liò,
Michael Bronstein
Abstract:
The pairwise interaction paradigm of graph machine learning has predominantly governed the modelling of relational systems. However, graphs alone cannot capture the multi-level interactions present in many complex systems and the expressive power of such schemes was proven to be limited. To overcome these limitations, we propose Message Passing Simplicial Networks (MPSNs), a class of models that p…
▽ More
The pairwise interaction paradigm of graph machine learning has predominantly governed the modelling of relational systems. However, graphs alone cannot capture the multi-level interactions present in many complex systems and the expressive power of such schemes was proven to be limited. To overcome these limitations, we propose Message Passing Simplicial Networks (MPSNs), a class of models that perform message passing on simplicial complexes (SCs). To theoretically analyse the expressivity of our model we introduce a Simplicial Weisfeiler-Lehman (SWL) colouring procedure for distinguishing non-isomorphic SCs. We relate the power of SWL to the problem of distinguishing non-isomorphic graphs and show that SWL and MPSNs are strictly more powerful than the WL test and not less powerful than the 3-WL test. We deepen the analysis by comparing our model with traditional graph neural networks (GNNs) with ReLU activations in terms of the number of linear regions of the functions they can represent. We empirically support our theoretical claims by showing that MPSNs can distinguish challenging strongly regular graphs for which GNNs fail and, when equipped with orientation equivariant layers, they can improve classification accuracy in oriented SCs compared to a GNN baseline.
△ Less
Submitted 14 June, 2021; v1 submitted 4 March, 2021;
originally announced March 2021.
-
Can neural networks learn persistent homology features?
Authors:
Guido Montúfar,
Nina Otter,
Yuguang Wang
Abstract:
Topological data analysis uses tools from topology -- the mathematical area that studies shapes -- to create representations of data. In particular, in persistent homology, one studies one-parameter families of spaces associated with data, and persistence diagrams describe the lifetime of topological invariants, such as connected components or holes, across the one-parameter family. In many applic…
▽ More
Topological data analysis uses tools from topology -- the mathematical area that studies shapes -- to create representations of data. In particular, in persistent homology, one studies one-parameter families of spaces associated with data, and persistence diagrams describe the lifetime of topological invariants, such as connected components or holes, across the one-parameter family. In many applications, one is interested in working with features associated with persistence diagrams rather than the diagrams themselves. In our work, we explore the possibility of learning several types of features extracted from persistence diagrams using neural networks.
△ Less
Submitted 30 November, 2020;
originally announced November 2020.
-
A unified framework for equivalences in social networks
Authors:
Nina Otter,
Mason A. Porter
Abstract:
A key concern in network analysis is the study of social positions and roles of actors in a network. The notion of "position" refers to an equivalence class of nodes that have similar ties to other nodes, whereas a "role" is an equivalence class of compound relations that connect the same pairs of nodes. An open question in network science is whether it is possible to simultaneously perform role a…
▽ More
A key concern in network analysis is the study of social positions and roles of actors in a network. The notion of "position" refers to an equivalence class of nodes that have similar ties to other nodes, whereas a "role" is an equivalence class of compound relations that connect the same pairs of nodes. An open question in network science is whether it is possible to simultaneously perform role and positional analysis. Motivated by the principle of functoriality in category theory we propose a new method that allows to tie role and positional analysis together. We illustrate our methods on two well-studied data sets in network science.
△ Less
Submitted 18 June, 2020;
originally announced June 2020.
-
Transferability of Deep Learning Algorithms for Malignancy Detection in Confocal Laser Endomicroscopy Images from Different Anatomical Locations of the Upper Gastrointestinal Tract
Authors:
Marc Aubreville,
Miguel Goncalves,
Christian Knipfer,
Nicolai Oetter,
Helmut Neumann,
Florian Stelzle,
Christopher Bohr,
Andreas Maier
Abstract:
Squamous Cell Carcinoma (SCC) is the most common cancer type of the epithelium and is often detected at a late stage. Besides invasive diagnosis of SCC by means of biopsy and histo-pathologic assessment, Confocal Laser Endomicroscopy (CLE) has emerged as noninvasive method that was successfully used to diagnose SCC in vivo. For interpretation of CLE images, however, extensive training is required,…
▽ More
Squamous Cell Carcinoma (SCC) is the most common cancer type of the epithelium and is often detected at a late stage. Besides invasive diagnosis of SCC by means of biopsy and histo-pathologic assessment, Confocal Laser Endomicroscopy (CLE) has emerged as noninvasive method that was successfully used to diagnose SCC in vivo. For interpretation of CLE images, however, extensive training is required, which limits its applicability and use in clinical practice of the method. To aid diagnosis of SCC in a broader scope, automatic detection methods have been proposed. This work compares two methods with regard to their applicability in a transfer learning sense, i.e. training on one tissue type (from one clinical team) and applying the learnt classification system to another entity (different anatomy, different clinical team). Besides a previously proposed, patch-based method based on convolutional neural networks, a novel classification method on image level (based on a pre-trained Inception V.3 network with dedicated preprocessing and interpretation of class activation maps) is proposed and evaluated. The newly presented approach improves recognition performance, yielding accuracies of 91.63% on the first data set (oral cavity) and 92.63% on a joint data set. The generalization from oral cavity to the second data set (vocal folds) lead to similar area-under-the-ROC curve values than a direct training on the vocal folds data set, indicating good generalization.
△ Less
Submitted 3 January, 2020; v1 submitted 24 February, 2019;
originally announced February 2019.
-
Magnitude meets persistence. Homology theories for filtered simplicial sets
Authors:
Nina Otter
Abstract:
The Euler characteristic is an invariant of a topological space that in a precise sense captures its canonical notion of size, akin to the cardinality of a set. The Euler characteristic is closely related to the homology of a space, as it can be expressed as the alternating sum of its Betti numbers, whenever the sum is well-defined. Thus, one says that homology categorifies the Euler characteristi…
▽ More
The Euler characteristic is an invariant of a topological space that in a precise sense captures its canonical notion of size, akin to the cardinality of a set. The Euler characteristic is closely related to the homology of a space, as it can be expressed as the alternating sum of its Betti numbers, whenever the sum is well-defined. Thus, one says that homology categorifies the Euler characteristic. In his work on the generalisation of cardinality-like invariants, Leinster introduced the magnitude of a metric space, a real number that counts the "effective number of points" of the space and has been shown to encode many invariants of metric spaces from integral geometry and geometric measure theory. In 2015, Hepworth and Willerton introduced a homology theory for metric graphs, called magnitude homology, which categorifies the magnitude of a finite metric graph. This work was subsequently generalised to enriched categories by Leinster and Shulman, and the homology theory that they introduced categorifies magnitude for arbitrary finite metric spaces. When studying a metric space, one is often only interested in the metric space up to a rescaling of the distance of the points by a non-negative real number. The magnitude function describes how the effective number of points changes as one scales the distance, and is completely encoded by magnitude homology. When studying a finite metric space in topological data analysis using persistent homology, one approximates the space through a nested sequence of simplicial complexes so as to recover topological information about the space by studying the homology of this sequence. Here we relate magnitude homology and persistent homology as two different ways of computing homology of filtered simplicial sets.
△ Less
Submitted 24 August, 2021; v1 submitted 4 July, 2018;
originally announced July 2018.
-
Motion Artifact Detection in Confocal Laser Endomicroscopy Images
Authors:
Maike P. Stoeve,
Marc Aubreville,
Nicolai Oetter,
Christian Knipfer,
Helmut Neumann,
Florian Stelzle,
Andreas Maier
Abstract:
Confocal Laser Endomicroscopy (CLE), an optical imaging technique allowing non-invasive examination of the mucosa on a (sub)cellular level, has proven to be a valuable diagnostic tool in gastroenterology and shows promising results in various anatomical regions including the oral cavity. Recently, the feasibility of automatic carcinoma detection for CLE images of sufficient quality was shown. Howe…
▽ More
Confocal Laser Endomicroscopy (CLE), an optical imaging technique allowing non-invasive examination of the mucosa on a (sub)cellular level, has proven to be a valuable diagnostic tool in gastroenterology and shows promising results in various anatomical regions including the oral cavity. Recently, the feasibility of automatic carcinoma detection for CLE images of sufficient quality was shown. However, in real world data sets a high amount of CLE images is corrupted by artifacts. Amongst the most prevalent artifact types are motion-induced image deteriorations. In the scope of this work, algorithmic approaches for the automatic detection of motion artifact-tainted image regions were developed. Hence, this work provides an important step towards clinical applicability of automatic carcinoma detection. Both, conventional machine learning and novel, deep learning-based approaches were assessed. The deep learning-based approach outperforms the conventional approaches, attaining an AUC of 0.90.
△ Less
Submitted 4 May, 2018; v1 submitted 3 November, 2017;
originally announced November 2017.
-
Stratifying multiparameter persistent homology
Authors:
Heather A. Harrington,
Nina Otter,
Hal Schenck,
Ulrike Tillmann
Abstract:
A fundamental tool in topological data analysis is persistent homology, which allows extraction of information from complex datasets in a robust way. Persistent homology assigns a module over a principal ideal domain to a one-parameter family of spaces obtained from the data. In applications data often depend on several parameters, and in this case one is interested in studying the persistent homo…
▽ More
A fundamental tool in topological data analysis is persistent homology, which allows extraction of information from complex datasets in a robust way. Persistent homology assigns a module over a principal ideal domain to a one-parameter family of spaces obtained from the data. In applications data often depend on several parameters, and in this case one is interested in studying the persistent homology of a multiparameter family of spaces associated to the data. While the theory of persistent homology for one-parameter families is well-understood, the situation for multiparameter families is more delicate. Following Carlsson and Zomorodian we recast the problem in the setting of multigraded algebra, and we propose multigraded Hilbert series, multigraded associated primes and local cohomology as invariants for studying multiparameter persistent homology. Multigraded associated primes provide a stratification of the region where a multigraded module does not vanish, while multigraded Hilbert series and local cohomology give a measure of the size of components of the module supported on different strata. These invariants generalize in a suitable sense the invariant for the one-parameter case.
△ Less
Submitted 18 June, 2019; v1 submitted 24 August, 2017;
originally announced August 2017.
-
Patch-based Carcinoma Detection on Confocal Laser Endomicroscopy Images -- A Cross-Site Robustness Assessment
Authors:
Marc Aubreville,
Miguel Goncalves,
Christian Knipfer,
Nicolai Oetter,
Tobias Wuerfl,
Helmut Neumann,
Florian Stelzle,
Christopher Bohr,
Andreas Maier
Abstract:
Deep learning technologies such as convolutional neural networks (CNN) provide powerful methods for image recognition and have recently been employed in the field of automated carcinoma detection in confocal laser endomicroscopy (CLE) images. CLE is a (sub-)surface microscopic imaging technique that reaches magnifications of up to 1000x and is thus suitable for in vivo structural tissue analysis.…
▽ More
Deep learning technologies such as convolutional neural networks (CNN) provide powerful methods for image recognition and have recently been employed in the field of automated carcinoma detection in confocal laser endomicroscopy (CLE) images. CLE is a (sub-)surface microscopic imaging technique that reaches magnifications of up to 1000x and is thus suitable for in vivo structural tissue analysis. In this work, we aim to evaluate the prospects of a priorly developed deep learning-based algorithm targeted at the identification of oral squamous cell carcinoma with regard to its generalization to further anatomic locations of squamous cell carcinomas in the area of head and neck. We applied the algorithm on images acquired from the vocal fold area of five patients with histologically verified squamous cell carcinoma and presumably healthy control images of the clinically normal contra-lateral vocal cord. We find that the network trained on the oral cavity data reaches an accuracy of 89.45% and an area-under-the-curve (AUC) value of 0.955, when applied on the vocal cords data. Compared to the state of the art, we achieve very similar results, yet with an algorithm that was trained on a completely disjunct data set. Concatenating both data sets yielded further improvements in cross-validation with an accuracy of 90.81% and AUC of 0.970. In this study, for the first time to our knowledge, a deep learning mechanism for the identification of oral carcinomas using CLE Images could be applied to other disciplines in the area of head and neck. This study shows the prospect of the algorithmic approach to generalize well on other malignant entities of the head and neck, regardless of the anatomical location and furthermore in an examiner-independent manner.
△ Less
Submitted 3 January, 2020; v1 submitted 25 July, 2017;
originally announced July 2017.
-
Automatic Classification of Cancerous Tissue in Laserendomicroscopy Images of the Oral Cavity using Deep Learning
Authors:
Marc Aubreville,
Christian Knipfer,
Nicolai Oetter,
Christian Jaremenko,
Erik Rodner,
Joachim Denzler,
Christopher Bohr,
Helmut Neumann,
Florian Stelzle,
Andreas Maier
Abstract:
Oral Squamous Cell Carcinoma (OSCC) is a common type of cancer of the oral epithelium. Despite their high impact on mortality, sufficient screening methods for early diagnosis of OSCC often lack accuracy and thus OSCCs are mostly diagnosed at a late stage. Early detection and accurate outline estimation of OSCCs would lead to a better curative outcome and an reduction in recurrence rates after sur…
▽ More
Oral Squamous Cell Carcinoma (OSCC) is a common type of cancer of the oral epithelium. Despite their high impact on mortality, sufficient screening methods for early diagnosis of OSCC often lack accuracy and thus OSCCs are mostly diagnosed at a late stage. Early detection and accurate outline estimation of OSCCs would lead to a better curative outcome and an reduction in recurrence rates after surgical treatment.
Confocal Laser Endomicroscopy (CLE) records sub-surface micro-anatomical images for in vivo cell structure analysis. Recent CLE studies showed great prospects for a reliable, real-time ultrastructural imaging of OSCC in situ.
We present and evaluate a novel automatic approach for a highly accurate OSCC diagnosis using deep learning technologies on CLE images. The method is compared against textural feature-based machine learning approaches that represent the current state of the art.
For this work, CLE image sequences (7894 images) from patients diagnosed with OSCC were obtained from 4 specific locations in the oral cavity, including the OSCC lesion. The present approach is found to outperform the state of the art in CLE image recognition with an area under the curve (AUC) of 0.96 and a mean accuracy of 88.3% (sensitivity 86.6%, specificity 90%).
△ Less
Submitted 10 March, 2017; v1 submitted 5 March, 2017;
originally announced March 2017.
-
Operads and Phylogenetic Trees
Authors:
John C. Baez,
Nina Otter
Abstract:
We construct an operad $\mathrm{Phyl}$ whose operations are the edge-labelled trees used in phylogenetics. This operad is the coproduct of $\mathrm{Com}$, the operad for commutative semigroups, and $[0,\infty)$, the operad with unary operations corresponding to nonnegative real numbers, where composition is addition. We show that there is a homeomorphism between the space of $n$-ary operations of…
▽ More
We construct an operad $\mathrm{Phyl}$ whose operations are the edge-labelled trees used in phylogenetics. This operad is the coproduct of $\mathrm{Com}$, the operad for commutative semigroups, and $[0,\infty)$, the operad with unary operations corresponding to nonnegative real numbers, where composition is addition. We show that there is a homeomorphism between the space of $n$-ary operations of $\mathrm{Phyl}$ and $\mathcal{T}_n\times [0,\infty)^{n+1}$, where $\mathcal{T}_n$ is the space of metric $n$-trees introduced by Billera, Holmes and Vogtmann. Furthermore, we show that the Markov models used to reconstruct phylogenetic trees from genome data give coalgebras of $\mathrm{Phyl}$. These always extend to coalgebras of the larger operad $\mathrm{Com} + [0,\infty]$, since Markov processes on finite sets converge to an equilibrium as time approaches infinity. We show that for any operad $O$, its coproduct with $[0,\infty]$ contains the operad $W(O)$ constucted by Boardman and Vogt. To prove these results, we explicitly describe the coproduct of operads in terms of labelled trees.
△ Less
Submitted 3 August, 2017; v1 submitted 10 December, 2015;
originally announced December 2015.
-
A roadmap for the computation of persistent homology
Authors:
Nina Otter,
Mason A. Porter,
Ulrike Tillmann,
Peter Grindrod,
Heather A. Harrington
Abstract:
Persistent homology (PH) is a method used in topological data analysis (TDA) to study qualitative features of data that persist across multiple scales. It is robust to perturbations of input data, independent of dimensions and coordinates, and provides a compact representation of the qualitative features of the input. The computation of PH is an open area with numerous important and fascinating ch…
▽ More
Persistent homology (PH) is a method used in topological data analysis (TDA) to study qualitative features of data that persist across multiple scales. It is robust to perturbations of input data, independent of dimensions and coordinates, and provides a compact representation of the qualitative features of the input. The computation of PH is an open area with numerous important and fascinating challenges. The field of PH computation is evolving rapidly, and new algorithms and software implementations are being updated and released at a rapid pace. The purposes of our article are to (1) introduce theory and computational methods for PH to a broad range of computational scientists and (2) provide benchmarks of state-of-the-art implementations for the computation of PH. We give a friendly introduction to PH, navigate the pipeline for the computation of PH with an eye towards applications, and use a range of synthetic and real-world data sets to evaluate currently available open-source implementations for the computation of PH. Based on our benchmarking, we indicate which algorithms and implementations are best suited to different types of data sets. In an accompanying tutorial, we provide guidelines for the computation of PH. We make publicly available all scripts that we wrote for the tutorial, and we make available the processed version of the data sets used in the benchmarking.
△ Less
Submitted 12 September, 2017; v1 submitted 29 June, 2015;
originally announced June 2015.