Quiver Laplacians and Feature Selection

Authors: Otto Sumray, Heather A. Harrington, Vidit Nanda

Abstract: The challenge of selecting the most relevant features of a given dataset arises ubiquitously in data analysis and dimensionality reduction. However, features found to be of high importance for the entire dataset may not be relevant to subsets of interest, and vice versa. Given a feature selector and a fixed decomposition of the data into subsets, we describe a method for identifying selected features which are compatible with the decomposition into subsets. We achieve this by re-framing the problem of finding compatible features to one of finding sections of a suitable quiver representation. In order to approximate such sections, we then introduce a Laplacian operator for quiver representations valued in Hilbert spaces. We provide explicit bounds on how the spectrum of a quiver Laplacian changes when the representation and the underlying quiver are modified in certain natural ways. Finally, we apply this machinery to the study of peak-calling algorithms which measure chromatin accessibility in single-cell data. We demonstrate that eigenvectors of the associated quiver Laplacian yield locally and globally compatible features.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 40 pages, 7 figures

MSC Class: 16G20; 05C50; 62P05; 62H25

arXiv:2212.06505

Multiscale topology classifies and quantifies cell types in subcellular spatial transcriptomics

Authors: Katherine Benjamin, Aneesha Bhandari, Zhouchun Shang, Yanan Xing, Yanru An, Nannan Zhang, Yong Hou, Ulrike Tillmann, Katherine R. Bull, Heather A. Harrington

Abstract: Spatial transcriptomics has the potential to transform our understanding of RNA expression in tissues. Classical array-based technologies produce multiple-cell-scale measurements requiring deconvolution to recover single cell information. However, rapid advances in subcellular measurement of RNA expression at whole-transcriptome depth necessitate a fundamentally different approach. To integrate single-cell RNA-seq data with nanoscale spatial transcriptomics, we present a topological method for automatic cell type identification (TopACT). Unlike popular decomposition approaches to multicellular resolution data, TopACT is able to pinpoint the spatial locations of individual sparsely dispersed cells without prior knowledge of cell boundaries. Pairing TopACT with multiparameter persistent homology landscapes predicts immune cells forming a peripheral ring structure within kidney glomeruli in a murine model of lupus nephritis, which we experimentally validate with immunofluorescent imaging. The proposed topological data analysis unifies multiple biological scales, from subcellular gene expression to multicellular tissue organization.

Submitted 13 December, 2022; originally announced December 2022.

Comments: Main text: 8 pages, 4 figures. Supplement: 12 pages, 5 figures

MSC Class: 92-08; 55N31; 62R40; 68T09

arXiv:2206.07760

doi 10.3390/e24081116

Multiscale methods for signal selection in single-cell data

Authors: Renee S. Hoekzema, Lewis Marsh, Otto Sumray, Thomas M. Carroll, Xin Lu, Helen M. Byrne, Heather A. Harrington

Abstract: Analysis of single-cell transcriptomics often relies on clustering cells and then performing differential gene expression (DGE) to identify genes that vary between these clusters. These discrete analyses successfully determine cell types and markers; however, continuous variation within and between cell types may not be detected. We propose three topologically motivated mathematical methods for unsupervised feature selection that consider discrete and continuous transcriptional patterns on an equal footing across multiple scales simultaneously. Eigenscores ($\text{eig}_i$) rank signals or genes based on their correspondence to low-frequency intrinsic patterning in the data using the spectral decomposition of the Laplacian graph. The multiscale Laplacian score (MLS) is an unsupervised method for locating relevant scales in data and selecting the genes that are coherently expressed at these respective scales. The persistent Rayleigh quotient (PRQ) takes data equipped with a filtration, allowing the separation of genes with different roles in a bifurcation process (e.g., pseudo-time). We demonstrate the utility of these techniques by applying them to published single-cell transcriptomics data sets. The methods validate previously identified genes and detect additional biologically meaningful genes with coherent expression patterns. By studying the interaction between gene signals and the geometry of the underlying space, the three methods give multidimensional rankings of the genes and visualisation of relationships between them.

Submitted 6 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: 32 pages, 15 figures, 1 table. Revised and published in Entropy, special issue Applications of Topological Data Analysis in the Life Sciences

Journal ref: Entropy 2022, 24(8), 1116

arXiv:2112.00688

Algebra, Geometry and Topology of ERK Kinetics

Authors: Lewis Marsh, Emilie Dufresne, Helen M. Byrne, Heather A. Harrington

Abstract: The MEK/ERK signalling pathway is involved in cell division, cell specialisation, survival and cell death. Here we study a polynomial dynamical system describing the dynamics of MEK/ERK proposed by Yeung et al. with their experimental setup, data and known biological information. The experimental dataset is a time-course of ERK measurements in different phosphorylation states following activation of either wild-type MEK or MEK mutations associated with cancer or developmental defects. We demonstrate how methods from computational algebraic geometry, differential algebra, Bayesian statistics and computational algebraic topology can inform the model reduction, identification and parameter inference of MEK variants, respectively. Throughout, we show how this algebraic viewpoint offers a rigorous and systematic analysis of such models.

Submitted 1 December, 2021; originally announced December 2021.

arXiv:2108.11640

Topological Approximate Bayesian Computation for Parameter Inference of an Angiogenesis Model

Authors: Thomas Thorne, Paul D. W. Kirk, Heather A. Harrington

Abstract: Inferring the parameters of models describing biological systems is an important problem in the reverse engineering of the mechanisms underlying these systems. Much work has focused on parameter inference of stochastic and ordinary differential equation models using Approximate Bayesian Computation (ABC). While there is some recent work on inference in spatial models, this remains an open problem. Simultaneously, advances in topological data analysis (TDA), a field of computational mathematics, have enabled spatial patterns in data to be characterised. Here we focus on recent work using topological data analysis to study different regimes of parameter space for a well-studied model of angiogenesis. We propose a method for combining TDA with ABC to infer parameters in the Anderson-Chaplain model of angiogenesis. We demonstrate that this topological approach outperforms ABC approaches that use simpler statistics based on spatial features of the data. This is a first step towards a general framework of spatial parameter inference for biological systems, for which there may be a variety of filtrations, vectorisations, and summary statistics to be considered. All code used to produce our results is available as a Snakemake workflow.

Submitted 8 November, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

Comments: 7 pages, 2 figures. For associated code see: https://github.com/tt104/tabc_angio

arXiv:1612.08116

doi 10.1098/rsif.2018.0661

Tensor clustering with algebraic constraints gives interpretable groups of crosstalk mechanisms in breast cancer

Authors: Anna Seigal, Mariano Beguerisse-Díaz, Birgit Schoeberl, Mario Niepel, Heather A. Harrington

Abstract: We introduce a tensor-based clustering method to extract sparse, low-dimensional structure from high-dimensional, multi-indexed datasets. This framework is designed to enable detection of clusters of data in the presence of structural requirements which we encode as algebraic constraints in a linear program. Our clustering method is general and can be tailored to a variety of applications in science and industry. We illustrate our method on a collection of experiments measuring the response of genetically diverse breast cancer cell lines to an array of ligands. Each experiment consists of a cell line-ligand combination, and contains time-course measurements of the early-signalling kinases MAPK and AKT at two different ligand dose levels. By imposing appropriate structural constraints and respecting the multi-indexed structure of the data, the analysis of clusters can be optimized for biological interpretation and therapeutic understanding. We then perform a systematic, large-scale exploration of mechanistic models of MAPK-AKT crosstalk for each cluster. This analysis allows us to quantify the heterogeneity of breast cancer cell subtypes, and leads to hypotheses about the signalling mechanisms that mediate the response of the cell lines to ligands.

Submitted 8 February, 2019; v1 submitted 23 December, 2016; originally announced December 2016.

Comments: 22 pages, 12 figures, 4 tables

Journal ref: Journal of The Royal Society Interface, volume 16 (2019) issue 151, 20180661

arXiv:1603.09730

Differential Algebra for Model Comparison

Authors: Heather A. Harrington, Kenneth L. Ho, Nicolette Meshkat

Abstract: We present a method for rejecting competing models from noisy time-course data that does not rely on parameter inference. First we characterize ordinary differential equation models in only measurable variables using differential algebra elimination. Next we extract additional information from the given data using Gaussian Process Regression (GPR) and then transform the differential invariants. We develop a test using linear algebra and statistics to reject transformed models with the given data in a parameter-free manner. This algorithm exploits the information about transients that is encoded in the model's structure. We demonstrate the power of this approach by discriminating between different models from mathematical biology.

Submitted 31 March, 2016; originally announced March 2016.

Comments: 17 pages

arXiv:1507.04331

Numerical algebraic geometry for model selection and its application to the life sciences

Authors: Elizabeth Gross, Brent Davis, Kenneth L. Ho, Daniel J. Bates, Heather A. Harrington

Abstract: Researchers working with mathematical models are often confronted by the related problems of parameter estimation, model validation, and model selection. These are all optimization problems, well-known to be challenging due to non-linearity, non-convexity and multiple local optima. Furthermore, the challenges are compounded when only partial data is available. Here, we consider polynomial models (e.g., mass-action chemical reaction networks at steady state) and describe a framework for their analysis based on optimization using numerical algebraic geometry. Specifically, we use probability-one polynomial homotopy continuation methods to compute all critical points of the objective function, then filter to recover the global optima. Our approach exploits the geometric structures relating models and data, and we demonstrate its utility on examples from cell signaling, synthetic biology, and epidemiology.

Submitted 1 April, 2016; v1 submitted 15 July, 2015; originally announced July 2015.

Comments: References added, additional clarifications

Showing 1–8 of 8 results for author: Harrington, H A