Search | arXiv e-print repository

Differentiability and Optimization of Multiparameter Persistent Homology

Authors: Luis Scoccola, Siddharth Setlur, David Loiseaux, Mathieu Carrière, Steve Oudot

Abstract: Real-valued functions on geometric data -- such as node attributes on a graph -- can be optimized using descriptors from persistent homology, allowing the user to incorporate topological terms in the loss function. When optimizing a single real-valued function (the one-parameter setting), there is a canonical choice of descriptor for persistent homology: the barcode. The operation mapping a real-valued function to its barcode is differentiable almost everywhere, and the convergence of gradient descent for losses using barcodes is relatively well understood. When optimizing a vector-valued function (the multiparameter setting), there is no unique choice of descriptor for multiparameter persistent homology, and many distinct descriptors have been proposed. This calls for the development of a general framework for differentiability and optimization that applies to a wide range of multiparameter homological descriptors. In this article, we develop such a framework and show that it encompasses well-known descriptors of different flavors, such as signed barcodes and the multiparameter persistence landscape. We complement the theory with numerical experiments supporting the idea that optimizing multiparameter homological descriptors can lead to improved performances compared to optimizing one-parameter descriptors, even when using the simplest and most efficiently computable multiparameter descriptors.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 13 pages + 13 page appendix, 8 figures. International Conference on Machine Learning, ICML 2024

arXiv:2306.03801 [pdf, ps, other]

Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures

Authors: David Loiseaux, Luis Scoccola, Mathieu Carrière, Magnus Bakke Botnan, Steve Oudot

Abstract: Persistent homology (PH) provides topological descriptors for geometric data, such as weighted graphs, which are interpretable, stable to perturbations, and invariant under, e.g., relabeling. Most applications of PH focus on the one-parameter case -- where the descriptors summarize the changes in topology of data as it is filtered by a single quantity of interest -- and there is now a wide array of methods enabling the use of one-parameter PH descriptors in data science, which rely on the stable vectorization of these descriptors as elements of a Hilbert space. Although the multiparameter PH (MPH) of data that is filtered by several quantities of interest encodes much richer information than its one-parameter counterpart, the scarceness of stability results for MPH descriptors has so far limited the available options for the stable vectorization of MPH. In this paper, we aim to bring together the best of both worlds by showing how the interpretation of signed barcodes -- a recent family of MPH descriptors -- as signed measures leads to natural extensions of vectorization strategies from one parameter to multiple parameters. The resulting feature vectors are easy to define and to compute, and provably stable. While, as a proof of concept, we focus on simple choices of signed barcodes and vectorizations, we already see notable performance improvements when comparing our feature vectors to state-of-the-art topology-based methods on various types of data.

Submitted 7 February, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: 26 pages, 4 figures, 9 tables; v2: final version in NeurIPS 2023

arXiv:2212.07201 [pdf, other]

Toroidal Coordinates: Decorrelating Circular Coordinates With Lattice Reduction

Authors: Luis Scoccola, Hitesh Gakhar, Johnathan Bush, Nikolas Schonsheck, Tatum Rask, Ling Zhou, Jose A. Perea

Abstract: The circular coordinates algorithm of de Silva, Morozov, and Vejdemo-Johansson takes as input a dataset together with a cohomology class representing a $1$-dimensional hole in the data; the output is a map from the data into the circle that captures this hole, and that is of minimum energy in a suitable sense. However, when applied to several cohomology classes, the output circle-valued maps can be "geometrically correlated" even if the chosen cohomology classes are linearly independent. It is shown in the original work that less correlated maps can be obtained with suitable integer linear combinations of the cohomology classes, with the linear combinations being chosen by inspection. In this paper, we identify a formal notion of geometric correlation between circle-valued maps which, in the Riemannian manifold case, corresponds to the Dirichlet form, a bilinear form derived from the Dirichlet energy. We describe a systematic procedure for constructing low energy torus-valued maps on data, starting from a set of linearly independent cohomology classes. We showcase our procedure with computational examples. Our main algorithm is based on the Lenstra--Lenstra--Lovász algorithm from computational number theory.

Submitted 15 March, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

Comments: 24 pages, 12 figures. To appear in proceedings of 39th International Symposium on Computational Geometry

arXiv:2208.00300 [pdf, ps, other]

On the bottleneck stability of rank decompositions of multi-parameter persistence modules

Authors: Magnus Bakke Botnan, Steffen Oppermann, Steve Oudot, Luis Scoccola

Abstract: A significant part of modern topological data analysis is concerned with the design and study of algebraic invariants of poset representations -- often referred to as multi-parameter persistence modules. One such invariant is the minimal rank decomposition, which encodes the ranks of all the structure morphisms of the persistence module by a single ordered pair of rectangle-decomposable modules, interpreted as a signed barcode. This signed barcode generalizes the concept of persistence barcode from one-parameter persistence to any number of parameters, raising the question of its bottleneck stability. We show in this paper that the minimal rank decomposition is not stable under the natural notion of signed bottleneck matching between signed barcodes. We remedy this by turning our focus to the rank exact decomposition, a related signed barcode induced by the minimal projective resolution of the module relative to the so-called rank exact structure, which we prove to be bottleneck stable under signed matchings. As part of our proof, we obtain two intermediate results of independent interest: we compute the global dimension of the rank exact structure on the category of finitely presentable multi-parameter persistence modules, and we prove a bottleneck stability result for hook-decomposable modules.
We also give a bound for the size of the rank exact decomposition that is polynomial in the size of the usual minimal projective resolution, we prove a universality result for the dissimilarity function induced by the notion of signed matching, and we compute, in the two-parameter case, the global dimension of a different exact structure related to the upsets of the indexing poset. This set of results combines concepts from topological data analysis and from the representation theory of posets, and we believe is relevant to both areas.

Submitted 5 March, 2024; v1 submitted 30 July, 2022; originally announced August 2022.

Comments: 32 pages, 4 figures; v2: add details, fix typos and minor issues, improve exposition, add conjecture 5.30

arXiv:2206.06513 [pdf, other]

FibeRed: Fiberwise Dimensionality Reduction of Topologically Complex Data with Vector Bundles

Authors: Luis Scoccola, Jose A. Perea

Abstract: Datasets with non-trivial large scale topology can be hard to embed in low-dimensional Euclidean space with existing dimensionality reduction algorithms. We propose to model topologically complex datasets using vector bundles, in such a way that the base space accounts for the large scale topology, while the fibers account for the local geometry. This allows one to reduce the dimensionality of the fibers, while preserving the large scale topology. We formalize this point of view, and, as an application, we describe an algorithm which takes as input a dataset together with an initial representation of it in Euclidean space, assumed to recover part of its large scale topology, and outputs a new representation that integrates local representations, obtained through local linear dimensionality reduction, along the initial global representation. We demonstrate this algorithm on examples coming from dynamical systems and chemistry. In these examples, our algorithm is able to learn topologically faithful embeddings of the data in lower target dimension than various well known metric-based dimensionality reduction algorithms.

Submitted 15 March, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: 15 pages + 10 page appendix, 15 figures + 1 table. To appear in proceedings of 39th International Symposium on Computational Geometry

arXiv:2112.11901 [pdf, other]

doi 10.1137/22M1489150

On the stability of multigraded Betti numbers and Hilbert functions

Authors: Steve Oudot, Luis Scoccola

Abstract: Multigraded Betti numbers are one of the simplest invariants of multiparameter persistence modules. This invariant is useful in theory -- it completely determines the Hilbert function of the module and the isomorphism type of the free modules in its minimal free resolution -- as well as in practice -- it is easy to visualize and it is one of the main outputs of current multiparameter persistent homology software, such as RIVET. However, to the best of our knowledge, no bottleneck stability result with respect to the interleaving distance has been established for this invariant so far, and this potential lack of stability limits its practical applications. We prove a stability result for multigraded Betti numbers, using an efficiently computable bottleneck-type dissimilarity function we introduce. Our notion of matching is inspired by recent work on signed barcodes, and allows matching bars of the same module in homological degrees of different parity, in addition to matching bars of different modules in homological degrees of the same parity. Our stability result is a combination of Hilbert's syzygy theorem, Bjerkevik's bottleneck stability for free modules, and a novel stability result for projective resolutions. We also prove, in the $2$-parameter case, a $1$-Wasserstein stability result for Hilbert functions with respect to the $1$-presentation distance of Bjerkevik and Lesnick.
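
The remark that multigraded Betti numbers determine the Hilbert function can be illustrated with a small sketch. It assumes a module graded over a finite integer grid with a finite minimal free resolution, and uses the standard alternating-sum relation dim M_x = sum_i (-1)^i sum_{y <= x} beta_i(y). The function name `hilbert_function` and the dict-based encoding of Betti numbers are illustrative assumptions, not taken from the paper or from RIVET.

```python
from itertools import product


def hilbert_function(betti, grid_shape):
    """Recover the Hilbert function from multigraded Betti numbers.

    betti[i] maps a grade (tuple of ints) to the i-th Betti number there.
    Returns a dict mapping each grade x of the grid to dim M_x.
    (Hypothetical helper: the encoding is an assumption of this sketch.)
    """
    hil = {}
    for x in product(*(range(n) for n in grid_shape)):
        total = 0
        for i, betti_i in enumerate(betti):
            sign = -1 if i % 2 else 1
            # A free generator at grade y contributes to every grade x >= y
            # in the product (coordinatewise) order.
            total += sign * sum(
                mult
                for y, mult in betti_i.items()
                if all(a <= b for a, b in zip(y, x))
            )
        hil[x] = total
    return hil


# Toy example: one generator at (0, 0), relations at (1, 0) and (0, 1),
# and one second syzygy at (1, 1). The module is one-dimensional at (0, 0)
# and zero everywhere else on the 2 x 2 grid.
example_betti = [{(0, 0): 1}, {(1, 0): 1, (0, 1): 1}, {(1, 1): 1}]
example_hilbert = hilbert_function(example_betti, (2, 2))
```

The converse fails in general: different Betti numbers can give the same Hilbert function, which is one reason the two invariants are studied separately.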

Submitted 7 February, 2024; v1 submitted 22 December, 2021; originally announced December 2021.

Comments: 24 pages, 4 figures; v2: adds section on efficient computability of lower bounds, section on consequences of main results, no-go result (Prop. 4), generalization of Thm. 1 (Thm. 26), and improves exposition; v3: adds several details and improves exposition; v4: minor clarifications and use numbering as in published version

MSC Class: 55N31; 62R40

Journal ref: SIAM Journal on Applied Algebra and Geometry. Vol. 8, Iss. 1 (2024)

arXiv:2104.07563 [pdf, other]

doi 10.1017/fms.2023.16

Approximate and discrete Euclidean vector bundles

Authors: Luis Scoccola, Jose A. Perea

Abstract: We introduce $\varepsilon$-approximate versions of the notion of Euclidean vector bundle for $\varepsilon \geq 0$, which recover the classical notion of Euclidean vector bundle when $\varepsilon = 0$. In particular, we study Čech cochains with coefficients in the orthogonal group that satisfy an approximate cocycle condition. We show that $\varepsilon$-approximate vector bundles can be used to represent classical vector bundles when $\varepsilon > 0$ is sufficiently small. We also introduce distances between approximate vector bundles and use them to prove that sufficiently similar approximate vector bundles represent the same classical vector bundle. This gives a way of specifying vector bundles over finite simplicial complexes using a finite amount of data, and also allows for some tolerance to noise when working with vector bundles in an applied setting. As an example, we prove a reconstruction theorem for vector bundles from finite samples. We give algorithms for the effective computation of low-dimensional characteristic classes of vector bundles directly from discrete and approximate representations and illustrate the usage of these algorithms with computational examples.

Submitted 7 February, 2024; v1 submitted 15 April, 2021; originally announced April 2021.

Comments: 56 pages, 9 figures; v2: improvements to exposition; v3: improvements to exposition, final version

MSC Class: Primary 55R99; 55N31; 68W05; Secondary 55U99

Journal ref: Forum of Mathematics, Sigma, Volume 11, 2023, e20

arXiv:2007.00167 [pdf, ps, other]

The Integers as a Higher Inductive Type

Authors: Thorsten Altenkirch, Luis Scoccola

Abstract: We consider the problem of defining the integers in Homotopy Type Theory (HoTT). We can define the type of integers as signed natural numbers (i.e., using a coproduct), but its induction principle is very inconvenient to work with, since it leads to an explosion of cases. An alternative is to use set-quotients, but here we need to use set-truncation to avoid non-trivial higher equalities. This results in a recursion principle that only allows us to define functions into sets (types satisfying UIP). In this paper we consider higher inductive types using either a small universe or bi-invertible maps. These types represent integers without explicit set-truncation that are equivalent to the usual coproduct representation. This is an interesting example since it shows how some coherence problems can be handled in HoTT. We discuss some open questions triggered by this work. The proofs have been formally verified using cubical Agda.

Submitted 30 June, 2020; originally announced July 2020.

Comments: 11 pages

Journal ref: Proceedings of the 35th Annual ACM/IEEE Symposium on Logic in Computer Science (pp. 67-73), 2020

arXiv:2005.09048 [pdf, other]

Stable and consistent density-based clustering via multiparameter persistence

Authors: Alexander Rolle, Luis Scoccola

Abstract: We consider the degree-Rips construction from topological data analysis, which provides a density-sensitive, multiparameter hierarchical clustering algorithm. We analyze its stability to perturbations of the input data using the correspondence-interleaving distance, a metric for hierarchical clusterings that we introduce. Taking certain one-parameter slices of degree-Rips recovers well-known methods for density-based clustering, but we show that these methods are unstable. However, we prove that degree-Rips, as a multiparameter object, is stable, and we propose an alternative approach for taking slices of degree-Rips, which yields a one-parameter hierarchical clustering algorithm with better stability properties. We prove that this algorithm is consistent, using the correspondence-interleaving distance. We provide an algorithm for extracting a single clustering from one-parameter hierarchical clusterings, which is stable with respect to the correspondence-interleaving distance. And, we integrate these methods into a pipeline for density-based clustering, which we call Persistable. Adapting tools from multiparameter persistent homology, we propose visualization tools that guide the selection of all parameters of the pipeline. We demonstrate Persistable on benchmark datasets, showing that it identifies multi-scale cluster structure in data.

Submitted 3 August, 2023; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: 68 pages, 16 figures. v3: major changes to exposition, significant additions to content, some mathematical reformulations

MSC Class: 62H30 (Primary) 62R40 (Secondary)

arXiv:1902.01436 [pdf, other]

Visualization tools for parameter selection in cluster analysis

Authors: Alexander Rolle, Luis Scoccola

Abstract: We propose an algorithm, HPREF (Hierarchical Partitioning by Repeated Features), that produces a hierarchical partition of a set of clusterings of a fixed dataset, such as sets of clusterings produced by running a clustering algorithm with a range of parameters. This gives geometric structure to such sets of clusterings, and can be used to visualize the set of results one obtains by running a clustering algorithm with a range of parameters.

Submitted 28 September, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

Comments: 6 pages

MSC Class: 62H30

Showing 1–10 of 10 results for author: Scoccola, L