-
Differentiability and Optimization of Multiparameter Persistent Homology
Authors:
Luis Scoccola,
Siddharth Setlur,
David Loiseaux,
Mathieu Carrière,
Steve Oudot
Abstract:
Real-valued functions on geometric data -- such as node attributes on a graph -- can be optimized using descriptors from persistent homology, allowing the user to incorporate topological terms in the loss function. When optimizing a single real-valued function (the one-parameter setting), there is a canonical choice of descriptor for persistent homology: the barcode. The operation mapping a real-valued function to its barcode is differentiable almost everywhere, and the convergence of gradient descent for losses using barcodes is relatively well understood. When optimizing a vector-valued function (the multiparameter setting), there is no unique choice of descriptor for multiparameter persistent homology, and many distinct descriptors have been proposed. This calls for the development of a general framework for differentiability and optimization that applies to a wide range of multiparameter homological descriptors. In this article, we develop such a framework and show that it encompasses well-known descriptors of different flavors, such as signed barcodes and the multiparameter persistence landscape. We complement the theory with numerical experiments supporting the idea that optimizing multiparameter homological descriptors can lead to improved performance compared to optimizing one-parameter descriptors, even when using the simplest and most efficiently computable multiparameter descriptors.
Submitted 11 June, 2024;
originally announced June 2024.
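As a concrete illustration of the one-parameter barcode mentioned in the abstract above, here is a minimal sketch of its simplest case: the 0-dimensional barcode of the sublevel-set filtration of a function on the vertices of a graph, where an edge enters at the larger of its two endpoint values. It uses union-find with the elder rule (when two components merge, the younger one dies). The function name `zero_dim_barcode` is hypothetical, and this is a textbook construction rather than code from the paper.

```python
def zero_dim_barcode(values, edges):
    """0-dimensional barcode of the sublevel-set filtration of a vertex function.

    values: list of floats; values[v] is the function value at vertex v.
    edges: list of (u, v) pairs; an edge enters at max(values[u], values[v]).
    Returns the sorted list of (birth, death) pairs with positive length;
    components that never merge get death float("inf").
    """
    parent = list(range(len(values)))

    def find(x):
        # Union-find with path halving; each root is the oldest vertex
        # (smallest value) of its component.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    # Process edges in the order in which they enter the filtration.
    for u, v in sorted(edges, key=lambda e: max(values[e[0]], values[e[1]])):
        t = max(values[u], values[v])
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a cycle: no change in 0-dim homology
        # Elder rule: the younger component (larger birth value) dies at t.
        if values[ru] < values[rv]:
            ru, rv = rv, ru
        if t > values[ru]:
            bars.append((values[ru], t))  # skip zero-length bars
        parent[ru] = rv
    # Every component still alive at the end gives an infinite bar.
    roots = {find(v) for v in range(len(values))}
    bars.extend((values[r], float("inf")) for r in roots)
    return sorted(bars)
```

For example, on the path graph 0-1-2 with values [0.0, 2.0, 1.0], the result is [(0.0, inf), (1.0, 2.0)]. Every finite endpoint is the value of the function at some vertex; for generic inputs (no ties), which vertex attains each endpoint does not change under small perturbations, which is one informal way to see why the map from functions to barcodes is differentiable almost everywhere.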
-
Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures
Authors:
David Loiseaux,
Luis Scoccola,
Mathieu Carrière,
Magnus Bakke Botnan,
Steve Oudot
Abstract:
Persistent homology (PH) provides topological descriptors for geometric data, such as weighted graphs, which are interpretable, stable to perturbations, and invariant under, e.g., relabeling. Most applications of PH focus on the one-parameter case -- where the descriptors summarize the changes in topology of data as it is filtered by a single quantity of interest -- and there is now a wide array of methods enabling the use of one-parameter PH descriptors in data science, which rely on the stable vectorization of these descriptors as elements of a Hilbert space. Although the multiparameter PH (MPH) of data that is filtered by several quantities of interest encodes much richer information than its one-parameter counterpart, the scarcity of stability results for MPH descriptors has so far limited the available options for the stable vectorization of MPH. In this paper, we aim to bring together the best of both worlds by showing how the interpretation of signed barcodes -- a recent family of MPH descriptors -- as signed measures leads to natural extensions of vectorization strategies from one parameter to multiple parameters. The resulting feature vectors are easy to define and to compute, and provably stable. While, as a proof of concept, we focus on simple choices of signed barcodes and vectorizations, we already see notable performance improvements when comparing our feature vectors to state-of-the-art topology-based methods on various types of data.
Submitted 7 February, 2024; v1 submitted 6 June, 2023;
originally announced June 2023.
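The abstract above describes treating signed barcodes as signed measures so that one-parameter vectorization strategies carry over. Below is a minimal sketch of one such strategy, convolution with a Gaussian kernel evaluated on a fixed grid, assuming the signed barcode has already been encoded as a finite list of points with signed weights (for example +1 and -1); how those points are obtained from a given signed barcode is not shown. The function name, the unnormalized kernel, and the `bandwidth` parameter are illustrative choices, not the paper's exact definitions.

```python
import math


def convolve_signed_measure(points, weights, grid, bandwidth=1.0):
    """Evaluate a Gaussian kernel convolved with a signed point measure.

    points: list of coordinate tuples (the support of the signed measure).
    weights: list of signed weights, e.g. +1 / -1 for a signed barcode.
    grid: list of coordinate tuples at which to evaluate the convolution.
    Returns one value per grid point, i.e. a fixed-length feature vector.
    """
    scale = 1.0 / (2.0 * bandwidth ** 2)
    features = []
    for g in grid:
        total = 0.0
        for p, w in zip(points, weights):
            sq_dist = sum((gi - pi) ** 2 for gi, pi in zip(g, p))
            # Each point contributes its signed weight times an unnormalized
            # Gaussian bump centered at that point.
            total += w * math.exp(-sq_dist * scale)
        features.append(total)
    return features
```

Because the output is linear in the weights, a positive and a negative point at the same location cancel exactly, which is part of what makes the signed-measure view convenient for vectorization.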
-
Toroidal Coordinates: Decorrelating Circular Coordinates With Lattice Reduction
Authors:
Luis Scoccola,
Hitesh Gakhar,
Johnathan Bush,
Nikolas Schonsheck,
Tatum Rask,
Ling Zhou,
Jose A. Perea
Abstract:
The circular coordinates algorithm of de Silva, Morozov, and Vejdemo-Johansson takes as input a dataset together with a cohomology class representing a $1$-dimensional hole in the data; the output is a map from the data into the circle that captures this hole, and that is of minimum energy in a suitable sense. However, when applied to several cohomology classes, the output circle-valued maps can be "geometrically correlated" even if the chosen cohomology classes are linearly independent. It is shown in the original work that less correlated maps can be obtained with suitable integer linear combinations of the cohomology classes, with the linear combinations being chosen by inspection. In this paper, we identify a formal notion of geometric correlation between circle-valued maps which, in the Riemannian manifold case, corresponds to the Dirichlet form, a bilinear form derived from the Dirichlet energy. We describe a systematic procedure for constructing low-energy torus-valued maps on data, starting from a set of linearly independent cohomology classes. We showcase our procedure with computational examples. Our main algorithm is based on the Lenstra--Lenstra--Lovász algorithm from computational number theory.
Submitted 15 March, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
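The abstract above states that the main algorithm is based on Lenstra--Lenstra--Lovász (LLL) lattice reduction. Below is a minimal sketch of textbook LLL in which the inner product is a parameter, so it could be run on integer coefficient vectors of cohomology classes together with a positive-definite Gram matrix, for instance one derived from the Dirichlet form; computing that Gram matrix from data is not shown. The names `lll_reduce`, `gram_inner`, and `euclidean_inner` are hypothetical, Gram-Schmidt is recomputed naively for readability, and floating-point arithmetic is used, so this is not meant for large or ill-conditioned inputs.

```python
def euclidean_inner(u, v):
    """Standard dot product."""
    return sum(a * b for a, b in zip(u, v))


def gram_inner(gram):
    """Inner product <u, v> = u^T G v for a positive-definite Gram matrix G."""
    def inner(u, v):
        return sum(u[i] * gram[i][j] * v[j]
                   for i in range(len(u)) for j in range(len(v)))
    return inner


def lll_reduce(basis, inner, delta=0.75):
    """Textbook LLL reduction of an integer basis under a given inner product.

    basis: list of integer vectors (lists), assumed linearly independent.
    inner: symmetric positive-definite bilinear form on vectors.
    delta: Lovász parameter, 1/4 < delta < 1.
    Returns a reduced basis of the same integer lattice.
    """
    b = [list(v) for v in basis]
    n = len(b)

    def gram_schmidt():
        # Orthogonalize b with respect to `inner`; mu[i][j] are the
        # Gram-Schmidt coefficients. Recomputed from scratch for clarity.
        ortho, mu = [], [[0.0] * n for _ in range(n)]
        for i in range(n):
            w = [float(x) for x in b[i]]
            for j in range(i):
                mu[i][j] = inner(b[i], ortho[j]) / inner(ortho[j], ortho[j])
                w = [wi - mu[i][j] * oj for wi, oj in zip(w, ortho[j])]
            ortho.append(w)
        return ortho, mu

    ortho, mu = gram_schmidt()
    k = 1
    while k < n:
        # Size reduction: make |mu[k][j]| <= 1/2 by integer subtractions.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                ortho, mu = gram_schmidt()
        # Lovász condition: advance if it holds, otherwise swap and step back.
        lhs = inner(ortho[k], ortho[k])
        rhs = (delta - mu[k][k - 1] ** 2) * inner(ortho[k - 1], ortho[k - 1])
        if lhs >= rhs:
            k += 1
        else:
            b[k], b[k - 1] = b[k - 1], b[k]
            ortho, mu = gram_schmidt()
            k = max(k - 1, 1)
    return b
```

The change of basis is unimodular, so the reduced vectors span the same integer lattice, while being shorter and closer to orthogonal with respect to the chosen form. With a Gram matrix of the Dirichlet form, that is the sense in which integer combinations of cohomology classes become less correlated.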
-
On the bottleneck stability of rank decompositions of multi-parameter persistence modules
Authors:
Magnus Bakke Botnan,
Steffen Oppermann,
Steve Oudot,
Luis Scoccola
Abstract:
A significant part of modern topological data analysis is concerned with the design and study of algebraic invariants of poset representations -- often referred to as multi-parameter persistence modules. One such invariant is the minimal rank decomposition, which encodes the ranks of all the structure morphisms of the persistence module by a single ordered pair of rectangle-decomposable modules, interpreted as a signed barcode. This signed barcode generalizes the concept of persistence barcode from one-parameter persistence to any number of parameters, raising the question of its bottleneck stability. We show in this paper that the minimal rank decomposition is not stable under the natural notion of signed bottleneck matching between signed barcodes. We remedy this by turning our focus to the rank exact decomposition, a related signed barcode induced by the minimal projective resolution of the module relative to the so-called rank exact structure, which we prove to be bottleneck stable under signed matchings. As part of our proof, we obtain two intermediate results of independent interest: we compute the global dimension of the rank exact structure on the category of finitely presentable multi-parameter persistence modules, and we prove a bottleneck stability result for hook-decomposable modules. We also give a bound for the size of the rank exact decomposition that is polynomial in the size of the usual minimal projective resolution, we prove a universality result for the dissimilarity function induced by the notion of signed matching, and we compute, in the two-parameter case, the global dimension of a different exact structure related to the upsets of the indexing poset. This set of results combines concepts from topological data analysis and from the representation theory of posets, and we believe it is relevant to both areas.
Submitted 5 March, 2024; v1 submitted 30 July, 2022;
originally announced August 2022.
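In a rank decomposition, the rank invariant of the module equals the rank invariant of the positive rectangle-decomposable module minus that of the negative one. Below is a minimal sketch that evaluates this signed count for a single pair s <= t in the product order, assuming rectangles are given by their lower and upper corners and are closed, so that a rectangle module has rank 1 on s <= t exactly when both s and t lie in the rectangle. The closed-rectangle convention and the function names are assumptions for illustration.

```python
def leq(p, q):
    """Product order: p <= q coordinatewise."""
    return all(a <= b for a, b in zip(p, q))


def rank_from_signed_rectangles(positive, negative, s, t):
    """Rank of M(s) -> M(t) recovered from a signed rectangle barcode.

    positive, negative: lists of rectangles (lower, upper), each a pair of
    coordinate tuples with lower <= upper, assumed closed.
    A closed rectangle module has rank 1 on s <= t exactly when
    lower <= s and t <= upper, and rank 0 otherwise.
    """
    if not leq(s, t):
        raise ValueError("the rank is only defined for s <= t")

    def count(rects):
        return sum(1 for lower, upper in rects if leq(lower, s) and leq(t, upper))

    return count(positive) - count(negative)
```

Only the signed total is meaningful: individual rectangles may over- or under-count, and the negative part corrects for this.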
-
FibeRed: Fiberwise Dimensionality Reduction of Topologically Complex Data with Vector Bundles
Authors:
Luis Scoccola,
Jose A. Perea
Abstract:
Datasets with non-trivial large scale topology can be hard to embed in low-dimensional Euclidean space with existing dimensionality reduction algorithms. We propose to model topologically complex datasets using vector bundles, in such a way that the base space accounts for the large scale topology, while the fibers account for the local geometry. This allows one to reduce the dimensionality of the fibers, while preserving the large scale topology. We formalize this point of view, and, as an application, we describe an algorithm which takes as input a dataset together with an initial representation of it in Euclidean space, assumed to recover part of its large scale topology, and outputs a new representation that integrates local representations, obtained through local linear dimensionality reduction, along the initial global representation. We demonstrate this algorithm on examples coming from dynamical systems and chemistry. In these examples, our algorithm is able to learn topologically faithful embeddings of the data in lower target dimension than various well-known metric-based dimensionality reduction algorithms.
Submitted 15 March, 2023; v1 submitted 13 June, 2022;
originally announced June 2022.
-
On the stability of multigraded Betti numbers and Hilbert functions
Authors:
Steve Oudot,
Luis Scoccola
Abstract:
Multigraded Betti numbers are one of the simplest invariants of multiparameter persistence modules. This invariant is useful in theory -- it completely determines the Hilbert function of the module and the isomorphism type of the free modules in its minimal free resolution -- as well as in practice -- it is easy to visualize and it is one of the main outputs of current multiparameter persistent homology software, such as RIVET. However, to the best of our knowledge, no bottleneck stability result with respect to the interleaving distance has been established for this invariant so far, and this potential lack of stability limits its practical applications. We prove a stability result for multigraded Betti numbers, using an efficiently computable bottleneck-type dissimilarity function we introduce. Our notion of matching is inspired by recent work on signed barcodes, and allows matching bars of the same module in homological degrees of different parity, in addition to matching bars of different modules in homological degrees of the same parity. Our stability result is a combination of Hilbert's syzygy theorem, Bjerkevik's bottleneck stability for free modules, and a novel stability result for projective resolutions. We also prove, in the $2$-parameter case, a $1$-Wasserstein stability result for Hilbert functions with respect to the $1$-presentation distance of Bjerkevik and Lesnick.
Submitted 7 February, 2024; v1 submitted 22 December, 2021;
originally announced December 2021.
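The abstract above notes that multigraded Betti numbers determine the Hilbert function. Below is a minimal sketch of that relationship, using the alternating sum over a finite minimal free resolution, dim M(x) = sum_i (-1)^i sum_{a <= x} beta_i(a), where the inner sum counts the free generators in homological degree i whose grade lies below x. The dictionary input format and the function name are hypothetical.

```python
def hilbert_function(betti, x):
    """Evaluate dim M(x) from multigraded Betti numbers.

    betti: dict mapping a homological degree i to a list of (grade, count)
    pairs, where grade is a coordinate tuple and count = beta_i(grade).
    Uses dim M(x) = sum_i (-1)^i * sum_{a <= x} beta_i(a), which holds because
    the minimal free resolution is finite and exact, and a free module with
    generators at grades a has dimension #{a <= x} at x.
    """
    total = 0
    for i, entries in betti.items():
        sign = -1 if i % 2 else 1
        below = sum(c for a, c in entries if all(ai <= xi for ai, xi in zip(a, x)))
        total += sign * below
    return total
```

For instance, a module with generators at (0, 1) and (1, 0) and a single relation at (1, 1) has dimension 1 at (1, 1) and 0 at (0, 0). Even homological degrees contribute positively and odd ones negatively, which is the parity that the signed matchings described above can pair across.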
-
Approximate and discrete Euclidean vector bundles
Authors:
Luis Scoccola,
Jose A. Perea
Abstract:
We introduce $\varepsilon$-approximate versions of the notion of Euclidean vector bundle for $\varepsilon \geq 0$, which recover the classical notion of Euclidean vector bundle when $\varepsilon = 0$. In particular, we study Čech cochains with coefficients in the orthogonal group that satisfy an approximate cocycle condition. We show that $\varepsilon$-approximate vector bundles can be used to represent classical vector bundles when $\varepsilon > 0$ is sufficiently small. We also introduce distances between approximate vector bundles and use them to prove that sufficiently similar approximate vector bundles represent the same classical vector bundle. This gives a way of specifying vector bundles over finite simplicial complexes using a finite amount of data, and also allows for some tolerance to noise when working with vector bundles in an applied setting. As an example, we prove a reconstruction theorem for vector bundles from finite samples. We give algorithms for the effective computation of low-dimensional characteristic classes of vector bundles directly from discrete and approximate representations and illustrate the usage of these algorithms with computational examples.
Submitted 7 February, 2024; v1 submitted 15 April, 2021;
originally announced April 2021.
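To make the approximate cocycle condition in the abstract above concrete, here is a minimal sketch that measures how far a Čech cochain of orthogonal matrices is from satisfying Omega_ij Omega_jk = Omega_ik on triple overlaps. The Frobenius norm is an assumption of this sketch (the abstract does not state which norm the paper uses), as are the dictionary layout and the function names; `rotation` is only there to build example inputs.

```python
import math


def rotation(theta):
    """2x2 rotation matrix, used here only to build example inputs."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def frobenius_dist(a, b):
    """Frobenius norm of a - b."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def cocycle_defect(omega, triples):
    """Largest violation of Omega_ij Omega_jk = Omega_ik over triple overlaps.

    omega: dict mapping ordered pairs (i, j) to d x d orthogonal matrices,
    one per nonempty overlap of cover elements U_i and U_j.
    triples: iterable of (i, j, k) with nonempty triple overlap.
    Under this sketch's Frobenius-norm convention, omega is an
    epsilon-approximate cocycle exactly when the result is <= epsilon.
    """
    return max((frobenius_dist(matmul(omega[i, j], omega[j, k]), omega[i, k])
                for i, j, k in triples), default=0.0)
```

With rotations by 0.1 and 0.2 on the two pairwise overlaps, setting the third transition to a rotation by 0.3 gives a defect of essentially zero, while a rotation by 0.35 gives a defect of about 0.07, so that cochain is an approximate cocycle only for epsilon at least that large.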
-
The Integers as a Higher Inductive Type
Authors:
Thorsten Altenkirch,
Luis Scoccola
Abstract:
We consider the problem of defining the integers in Homotopy Type Theory (HoTT). We can define the type of integers as signed natural numbers (i.e., using a coproduct), but its induction principle is very inconvenient to work with, since it leads to an explosion of cases. An alternative is to use set-quotients, but here we need to use set-truncation to avoid non-trivial higher equalities. This results in a recursion principle that only allows us to define functions into sets (types satisfying UIP). In this paper we consider higher inductive types using either a small universe or bi-invertible maps. These types represent the integers without explicit set-truncation and are equivalent to the usual coproduct representation. This is an interesting example since it shows how some coherence problems can be handled in HoTT. We discuss some open questions triggered by this work. The proofs have been formally verified using cubical Agda.
Submitted 30 June, 2020;
originally announced July 2020.
-
Stable and consistent density-based clustering via multiparameter persistence
Authors:
Alexander Rolle,
Luis Scoccola
Abstract:
We consider the degree-Rips construction from topological data analysis, which provides a density-sensitive, multiparameter hierarchical clustering algorithm. We analyze its stability to perturbations of the input data using the correspondence-interleaving distance, a metric for hierarchical clusterings that we introduce. Taking certain one-parameter slices of degree-Rips recovers well-known methods for density-based clustering, but we show that these methods are unstable. However, we prove that degree-Rips, as a multiparameter object, is stable, and we propose an alternative approach for taking slices of degree-Rips, which yields a one-parameter hierarchical clustering algorithm with better stability properties. We prove that this algorithm is consistent, using the correspondence-interleaving distance. We provide an algorithm for extracting a single clustering from one-parameter hierarchical clusterings, which is stable with respect to the correspondence-interleaving distance. Finally, we integrate these methods into a pipeline for density-based clustering, which we call Persistable. Adapting tools from multiparameter persistent homology, we propose visualization tools that guide the selection of all parameters of the pipeline. We demonstrate Persistable on benchmark datasets, showing that it identifies multi-scale cluster structure in data.
Submitted 3 August, 2023; v1 submitted 18 May, 2020;
originally announced May 2020.
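Below is a minimal sketch of evaluating a degree-Rips construction, as in the abstract above, at a single parameter pair (r, k) and reading off the resulting clustering as connected components. The convention used here (an edge between points at distance at most r, keep the vertices of degree at least k, take the induced subgraph) is an assumption and may differ in details from the paper, and the function name is hypothetical; the stable slicing and the Persistable pipeline itself are not reproduced.

```python
import math


def degree_rips_clusters(points, r, k):
    """Connected components of one (r, k) slice of a degree-Rips construction.

    points: list of coordinate tuples. r: distance scale. k: degree threshold.
    Assumed convention: connect points at distance <= r, keep the vertices of
    degree >= k, and return the connected components of the induced subgraph
    as a sorted list of sorted lists of point indices.
    """
    n = len(points)
    nbrs = [[j for j in range(n) if j != i and math.dist(points[i], points[j]) <= r]
            for i in range(n)]
    kept = [i for i in range(n) if len(nbrs[i]) >= k]
    kept_set = set(kept)
    parent = {i: i for i in kept}

    def find(x):
        # Union-find with path halving over the kept vertices only.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge along edges whose endpoints both survive the degree threshold.
    for i in kept:
        for j in nbrs[i]:
            if j in kept_set:
                parent[find(i)] = find(j)
    comps = {}
    for i in kept:
        comps.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in comps.values())
```

Raising k removes low-density points first: on a small example with a triangle of nearby points, a separate pair, and an isolated point, k = 0 returns three clusters, k = 1 drops the isolated point, and k = 2 keeps only the triangle.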
-
Visualization tools for parameter selection in cluster analysis
Authors:
Alexander Rolle,
Luis Scoccola
Abstract:
We propose an algorithm, HPREF (Hierarchical Partitioning by Repeated Features), that produces a hierarchical partition of a set of clusterings of a fixed dataset, such as sets of clusterings produced by running a clustering algorithm with a range of parameters. This gives geometric structure to such sets of clusterings, and can be used to visualize the set of results one obtains by running a clustering algorithm with a range of parameters.
Submitted 28 September, 2019; v1 submitted 4 February, 2019;
originally announced February 2019.