Hypergraph Topological Features for Autoencoder-Based Intrusion Detection for Cybersecurity Data

Authors: Bill Kay, Sinan G. Aksoy, Molly Baird, Daniel M. Best, Helen Jenne, Cliff Joslyn, Christopher Potvin, Gregory Henselman-Petrusek, Garret Seppala, Stephen J. Young, Emilie Purvine

Abstract: In this position paper, we argue that when hypergraphs are used to capture multi-way local relations of data, their resulting topological features describe global behaviour. Consequently, these features capture complex correlations that can then serve as high fidelity inputs to autoencoder-driven anomaly detection pipelines. We propose two such potential pipelines for cybersecurity data, one that uses an autoencoder directly to determine network intrusions, and one that de-noises input data for a persistent homology system, PHANTOM. We provide heuristic justification for the use of the methods described therein for an intrusion detection pipeline for cyber data. We conclude by showing a small example over synthetic cyber attack data.

Submitted 9 November, 2023; originally announced December 2023.

MSC Class: 55N31

arXiv:2311.16154 [pdf]

Stepping out of Flatland: Discovering Behavior Patterns as Topological Structures in Cyber Hypergraphs

Authors: Helen Jenne, Sinan G. Aksoy, Daniel Best, Alyson Bittner, Gregory Henselman-Petrusek, Cliff Joslyn, Bill Kay, Audun Myers, Garret Seppala, Jackson Warley, Stephen J. Young, Emilie Purvine

Abstract: Data breaches and ransomware attacks occur so often that they have become part of our daily news cycle. This is due to a myriad of factors, including the increasing number of internet-of-things devices, shift to remote work during the pandemic, and advancement in adversarial techniques, which all contribute to the increase in both the complexity of data captured and the challenge of protecting our networks. At the same time, cyber research has made strides, leveraging advances in machine learning and natural language processing to focus on identifying sophisticated attacks that are known to evade conventional measures. While successful, the shortcomings of these methods, particularly the lack of interpretability, are inherent and difficult to overcome. Consequently, there is an ever-increasing need to develop new tools for analyzing cyber data to enable more effective attack detection. In this paper, we present a novel framework based in the theory of hypergraphs and topology to understand data from cyber networks through topological signatures, which are both flexible and can be traced back to the log data. While our approach's mathematical grounding requires some technical development, this pays off in interpretability, which we will demonstrate with concrete examples in a large-scale cyber network dataset. These examples are an introduction to the broader possibilities that lie ahead; our goal is to demonstrate the value of applying methods from the burgeoning fields of hypernetwork science and applied topology to understand relationships among behaviors in cyber data.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 18 pages, 11 figures. This paper is written for a general audience

MSC Class: 55N31

arXiv:2310.11626 [pdf, other]

HyperNetX: A Python package for modeling complex network data as hypergraphs

Authors: Brenda Praggastis, Sinan Aksoy, Dustin Arendt, Mark Bonicillo, Cliff Joslyn, Emilie Purvine, Madelyn Shapiro, Ji Young Yun

Abstract: HyperNetX (HNX) is an open source Python library for the analysis and visualization of complex network data modeled as hypergraphs. Initially released in 2019, HNX facilitates exploratory data analysis of complex networks using algebraic topology, combinatorics, and generalized hypergraph and graph theoretical methods on structured data inputs. With its 2023 release, the library supports attaching metadata, numerical and categorical, to nodes (vertices) and hyperedges, as well as to node-hyperedge pairings (incidences). HNX has a customizable Matplotlib-based visualization module as well as HypernetX-Widget, its JavaScript addon for interactive exploration and visualization of hypergraphs within Jupyter Notebooks. Both packages are available on GitHub and PyPI. With a growing community of users and collaborators, HNX has become a preeminent tool for hypergraph analysis.

Submitted 17 October, 2023; originally announced October 2023.

Comments: 3 pages, 2 figures

arXiv:2309.08010 [pdf, other]

Malicious Cyber Activity Detection Using Zigzag Persistence

Authors: Audun Myers, Alyson Bittner, Sinan Aksoy, Daniel M. Best, Gregory Henselman-Petrusek, Helen Jenne, Cliff Joslyn, Bill Kay, Garret Seppala, Stephen J. Young, Emilie Purvine

Abstract: In this study we synthesize zigzag persistence from topological data analysis with autoencoder-based approaches to detect malicious cyber activity and derive analytic insights. Cybersecurity aims to safeguard computers, networks, and servers from various forms of malicious attacks, including network damage, data theft, and activity monitoring. Here we focus on the detection of malicious activity using log data. To do this we consider the dynamics of the data by exploring the changing topology of a hypergraph representation gaining insights into the underlying activity. Hypergraphs provide a natural representation of cyber log data by capturing complex interactions between processes. To study the changing topology we use zigzag persistence which captures how topological features persist at multiple dimensions over time. We observe that the resulting barcodes represent malicious activity differently than benign activity. To automate this detection we implement an autoencoder trained on a vectorization of the resulting zigzag persistence barcodes. Our experimental results demonstrate the effectiveness of the autoencoder in detecting malicious activity in comparison to standard summary statistics. Overall, this study highlights the potential of zigzag persistence and its combination with temporal hypergraphs for analyzing cybersecurity log data and detecting malicious behavior.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2309.06634 [pdf, other]

$G$-Mapper: Learning a Cover in the Mapper Construction

Authors: Enrique Alvarado, Robin Belton, Emily Fischer, Kang-Ju Lee, Sourabh Palande, Sarah Percival, Emilie Purvine

Abstract: The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. However, the Mapper algorithm requires tuning several parameters in order to generate a "nice" Mapper graph. This paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality. Our algorithm is based on $G$-means clustering which searches for the optimal number of clusters in $k$-means by iteratively applying the Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model to carefully choose the cover according to the distribution of the given data. Experiments for synthetic and real-world datasets demonstrate that our algorithm generates covers so that the Mapper graphs retain the essence of the datasets, while also running significantly fast.

Submitted 4 March, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

arXiv:2302.02857 [pdf, other]

Topological Analysis of Temporal Hypergraphs

Authors: Audun Myers, Cliff Joslyn, Bill Kay, Emilie Purvine, Gregory Roek, Madelyn Shapiro

Abstract: In this work we study the topological properties of temporal hypergraphs. Hypergraphs provide a higher dimensional generalization of a graph that is capable of capturing multi-way connections. As such, they have become an integral part of network science. A common use of hypergraphs is to model events as hyperedges in which the event can involve many elements as nodes. This provides a more complete picture of the event, which is not limited by the standard dyadic connections of a graph. However, a common attribution to events is temporal information as an interval for when the event occurred. Consequently, a temporal hypergraph is born, which accurately captures both the temporal information of events and their multi-way connections. Common tools for studying these temporal hypergraphs typically capture changes in the underlying dynamics with summary statistics of snapshots sampled in a sliding window procedure. However, these tools do not characterize the evolution of hypergraph structure over time, nor do they provide insight on persistent components which are influential to the underlying system. To alleviate this need, we leverage zigzag persistence from the field of Topological Data Analysis (TDA) to study the change in topological structure of time-evolving hypergraphs. We apply our pipeline to both a cyber security and social network dataset and show how the topological structure of their temporal hypergraphs change and can be used to understand the underlying dynamics.

Submitted 6 February, 2023; originally announced February 2023.

arXiv:2212.00222 [pdf, other]

Experimental Observations of the Topology of Convolutional Neural Network Activations

Authors: Emilie Purvine, Davis Brown, Brett Jefferson, Cliff Joslyn, Brenda Praggastis, Archit Rathore, Madelyn Shapiro, Bei Wang, Youjia Zhou

Abstract: Topological data analysis (TDA) is a branch of computational mathematics, bridging algebraic topology and data science, that provides compact, noise-robust representations of complex structures. Deep neural networks (DNNs) learn millions of parameters associated with a series of transformations defined by the model architecture, resulting in high-dimensional, difficult-to-interpret internal representations of input data. As DNNs become more ubiquitous across multiple sectors of our society, there is increasing recognition that mathematical methods are needed to aid analysts, researchers, and practitioners in understanding and interpreting how these models' internal representations relate to the final classification. In this paper, we apply cutting edge techniques from TDA with the goal of gaining insight into the interpretability of convolutional neural networks used for image classification. We use two common TDA approaches to explore several methods for modeling hidden-layer activations as high-dimensional point clouds, and provide experimental evidence that these point clouds capture valuable structural information about the model's process. First, we demonstrate that a distance metric based on persistent homology can be used to quantify meaningful differences between layers, and we discuss these distances in the broader context of existing representational similarity metrics for neural network interpretability. Second, we show that a mapper graph can provide semantic insight into how these models organize hierarchical class knowledge at each layer. These observations demonstrate that TDA is a useful tool to help deep learning practitioners unlock the hidden structures of their models.

Submitted 30 November, 2022; originally announced December 2022.

Comments: Accepted at AAAI 2023. This version includes supplementary material

arXiv:2208.06894 [pdf, other]

The SVD of Convolutional Weights: A CNN Interpretability Framework

Authors: Brenda Praggastis, Davis Brown, Carlos Ortiz Marrero, Emilie Purvine, Madelyn Shapiro, Bei Wang

Abstract: Deep neural networks used for image classification often use convolutional filters to extract distinguishing features before passing them to a linear classifier. Most interpretability literature focuses on providing semantic meaning to convolutional filters to explain a model's reasoning process and confirm its use of relevant information from the input domain. Fully connected layers can be studied by decomposing their weight matrices using a singular value decomposition, in effect studying the correlations between the rows in each matrix to discover the dynamics of the map. In this work we define a singular value decomposition for the weight tensor of a convolutional layer, which provides an analogous understanding of the correlations between filters, exposing the dynamics of the convolutional map. We validate our definition using recent results in random matrix theory. By applying the decomposition across the linear layers of an image classification network we suggest a framework against which interpretability methods might be applied using hypergraphs to model class separation. Rather than looking to the activations to explain the network, we use the singular vectors with the greatest corresponding singular values for each linear layer to identify those features most important to the network. We illustrate our approach with examples and introduce the DeepDataProfiler library, the analysis tool used for this study.

Submitted 14 August, 2022; originally announced August 2022.

MSC Class: 68T07; 68T01; 05C65

arXiv:2204.01142

Proceedings of TDA: Applications of Topological Data Analysis to Data Science, Artificial Intelligence, and Machine Learning Workshop at SDM 2022

Authors: R. W. R. Darling, John A. Emanuello, Emilie Purvine, Ahmad Ridley

Abstract: Topological Data Analysis (TDA) is a rigorous framework that borrows techniques from geometric and algebraic topology, category theory, and combinatorics in order to study the "shape" of such complex high-dimensional data. Research in this area has grown significantly over the last several years bringing a deeply rooted theory to bear on practical applications in areas such as genomics, natural language processing, medicine, cybersecurity, energy, and climate change. Within some of these areas, TDA has also been used to augment AI and ML techniques. We believe there is further utility to be gained in this space that can be facilitated by a workshop bringing together experts (both theorists and practitioners) and non-experts. Currently there is an active community of pure mathematicians with research interests in developing and exploring the theoretical and computational aspects of TDA. Applied mathematicians and other practitioners are also present in community but do not represent a majority. This speaks to the primary aim of this workshop which is to grow a wider community of interest in TDA. By fostering meaningful exchanges between these groups, from across the government, academia, and industry, we hope to create new synergies that can only come through building a mutual comprehensive awareness of the problem and solution spaces.

Submitted 14 April, 2022; v1 submitted 3 April, 2022; originally announced April 2022.

arXiv:2105.10414 [pdf, other]

Sheaves as a Framework for Understanding and Interpreting Model Fit

Authors: Henry Kvinge, Brett Jefferson, Cliff Joslyn, Emilie Purvine

Abstract: As data grows in size and complexity, finding frameworks which aid in interpretation and analysis has become critical. This is particularly true when data comes from complex systems where extensive structure is available, but must be drawn from peripheral sources. In this paper we argue that in such situations, sheaves can provide a natural framework to analyze how well a statistical model fits at the local level (that is, on subsets of related datapoints) vs the global level (on all the data). The sheaf-based approach that we propose is suitably general enough to be useful in a range of applications, from analyzing sensor networks to understanding the feature space of a deep learning model.

Submitted 21 May, 2021; originally announced May 2021.

Comments: 12 pages

arXiv:2104.11214 [pdf, other]

Topological Simplifications of Hypergraphs

Authors: Youjia Zhou, Archit Rathore, Emilie Purvine, Bei Wang

Abstract: We study hypergraph visualization via its topological simplification. We explore both vertex simplification and hyperedge simplification of hypergraphs using tools from topological data analysis. In particular, we transform a hypergraph to its graph representations known as the line graph and clique expansion. A topological simplification of such a graph representation induces a simplification of the hypergraph. In simplifying a hypergraph, we allow vertices to be combined if they belong to almost the same set of hyperedges, and hyperedges to be merged if they share almost the same set of vertices. Our proposed approaches are general, mathematically justifiable, and they put vertex simplification and hyperedge simplification in a unifying framework.

Submitted 22 April, 2021; originally announced April 2021.

arXiv:2011.08952 [pdf, other]

Argumentative Topology: Finding Loop(holes) in Logic

Authors: Sarah Tymochko, Zachary New, Lucius Bynum, Emilie Purvine, Timothy Doster, Julien Chaput, Tegan Emerson

Abstract: Advances in natural language processing have resulted in increased capabilities with respect to multiple tasks. One of the possible causes of the observed performance gains is the introduction of increasingly sophisticated text representations. While many of the new word embedding techniques can be shown to capture particular notions of sentiment or associative structures, we explore the ability of two different word embeddings to uncover or capture the notion of logical shape in text. To this end we present a novel framework that we call Topological Word Embeddings which leverages mathematical techniques in dynamical system analysis and data driven shape extraction (i.e. topological data analysis). In this preliminary work we show that using a topological delay embedding we are able to capture and extract a different, shape-based notion of logic aimed at answering the question "Can we find a circle in a circular argument?"

Submitted 17 November, 2020; originally announced November 2020.

arXiv:2010.03068 [pdf, other]

Hypergraph Models of Biological Networks to Identify Genes Critical to Pathogenic Viral Response

Authors: Song Feng, Emily Heath, Brett Jefferson, Cliff Joslyn, Henry Kvinge, Hugh D. Mitchell, Brenda Praggastis, Amie J. Eisfeld, Amy C. Sims, Larissa B. Thackray, Shufang Fan, Kevin B. Walters, Peter J. Halfmann, Danielle Westhoff-Smith, Qing Tan, Vineet D. Menachery, Timothy P. Sheahan, Adam S. Cockrell, Jacob F. Kocher, Kelly G. Stratton, Natalie C. Heller, Lisa M. Bramer, Michael S. Diamond, Ralph S. Baric, Katrina M. Waters, et al. (3 additional authors not shown)

Abstract: Background: Representing biological networks as graphs is a powerful approach to reveal underlying patterns, signatures, and critical components from high-throughput biomolecular data. However, graphs do not natively capture the multi-way relationships present among genes and proteins in biological systems. Hypergraphs are generalizations of graphs that naturally model multi-way relationships and have shown promise in modeling systems such as protein complexes and metabolic reactions. In this paper we seek to understand how hypergraphs can more faithfully identify, and potentially predict, important genes based on complex relationships inferred from genomic expression data sets. Results: We compiled a novel data set of transcriptional host response to pathogenic viral infections and formulated relationships between genes as a hypergraph where hyperedges represent significantly perturbed genes, and vertices represent individual biological samples with specific experimental conditions. We find that hypergraph betweenness centrality is a superior method for identification of genes important to viral response when compared with graph centrality. Conclusions: Our results demonstrate the utility of using hypergraphs to represent complex biological systems and highlight central important responses in common to a variety of highly pathogenic viruses.

Submitted 6 October, 2020; originally announced October 2020.

MSC Class: 92C42; 92-08; 05C65

arXiv:2008.04357 [pdf, other]

Directional Laplacian Centrality for Cyber Situational Awareness

Authors: Sinan G. Aksoy, Emilie Purvine, Stephen J. Young

Abstract: Cyber operations is drowning in diverse, high-volume, multi-source data. In order to get a full picture of current operations and identify malicious events and actors analysts must see through data generated by a mix of human activity and benign automated processes. Although many monitoring and alert systems exist, they typically use signature-based detection methods. We introduce a general method rooted in spectral graph theory to discover patterns and anomalies without a priori knowledge of signatures. We derive and propose a new graph-theoretic centrality measure based on the derivative of the graph Laplacian matrix in the direction of a vertex. To build intuition about our measure we show how it identifies the most central vertices in standard network data sets and compare to other graph centrality measures. Finally, we focus our attention on studying its effectiveness in identifying important IP addresses in network flow data. Using both real and synthetic network flow data, we conduct several experiments to test our measure's sensitivity to two types of injected attack profiles, and show that vertices participating in injected attack profiles exhibit noticeable changes in our centrality measures, even when the injected anomalies are relatively small, and in the presence of simulated network dynamics.

Submitted 23 March, 2021; v1 submitted 10 August, 2020; originally announced August 2020.

Comments: 25 pages, 15 figures

arXiv:2003.11782 [pdf, other]

Hypernetwork Science: From Multidimensional Networks to Computational Topology

Authors: Cliff A. Joslyn, Sinan Aksoy, Tiffany J. Callahan, Lawrence E. Hunter, Brett Jefferson, Brenda Praggastis, Emilie A. H. Purvine, Ignacio J. Tripodi

Abstract: As data structures and mathematical objects used for complex systems modeling, hypergraphs sit nicely poised between on the one hand the world of network models, and on the other that of higher-order mathematical abstractions from algebra, lattice theory, and topology. They are able to represent complex systems interactions more faithfully than graphs and networks, while also being some of the simplest classes of systems representing topological structures as collections of multidimensional objects connected in a particular pattern. In this paper we discuss the role of (undirected) hypergraphs in the science of complex networks, and provide a mathematical overview of the core concepts needed for hypernetwork modeling, including duality and the relationship to bicolored graphs, quantitative adjacency and incidence, the nature of walks in hypergraphs, and available topological relationships and properties. We close with a brief discussion of two example applications: biomedical databases for disease analysis, and domain-name system (DNS) analysis of cyber data.

Submitted 26 March, 2020; originally announced March 2020.

Report number: PNNL-SA-152208

MSC Class: 05C65; ACM Class: G.2.2

arXiv:1912.05487 [pdf, other]

A Sheaf Theoretical Approach to Uncertainty Quantification of Heterogeneous Geolocation Information

Authors: Cliff Joslyn, Lauren Charles, Chris DePerno, Nicholas Gould, Kathleen Nowak, Brenda Praggastis, Emilie Purvine, Michael Robinson, Jennifer Strules, Paul Whitney

Abstract: Integration of heterogeneous sensors is a challenging problem across a range of applications. Prominent among these are multi-target tracking, where one must combine observations from different sensor types in a meaningful way to track multiple targets. Because sensors have differing error models, we seek a theoretically-justified quantification of the agreement among ensembles of sensors, both overall for a sensor collection, and also at a fine-grained level specifying pairwise and multi-way interactions among sensors. We demonstrate that the theory of mathematical sheaves provides a unified answer to this need, supporting both quantitative and qualitative data. The theory provides algorithms to globalize data across the network of deployed sensors, and to diagnose issues when the data do not globalize cleanly. We demonstrate the utility of sheaf-based tracking models based on experimental data of a wild population of black bears in Asheville, North Carolina. A measurement model involving four sensors deployed among the bears and the team of scientists charged with tracking their location is deployed. This provides a sheaf-based integration model which is small enough to fully interpret, but of sufficient complexity to demonstrate the sheaf's ability to recover a holistic picture of the locations and behaviors of both individual bears and the bear-human tracking system. A statistical approach was developed for comparison, a dynamic linear model which was estimated using a Kalman filter. This approach also recovered bear and human locations and sensor accuracies. When the observations are normalized into a common coordinate system, the structure of the dynamic linear observation model recapitulates the structure of the sheaf model, demonstrating the canonicity of the sheaf-based approach. But when the observations are not so normalized, the sheaf model still remains valid.

Submitted 11 December, 2019; originally announced December 2019.

Comments: Submitted

arXiv:1906.11295 [pdf, other]

Hypernetwork Science via High-Order Hypergraph Walks

Authors: Sinan G. Aksoy, Cliff Joslyn, Carlos Ortiz Marrero, Brenda Praggastis, Emilie Purvine

Abstract: We propose high-order hypergraph walks as a framework to generalize graph-based network science techniques to hypergraphs. Edge incidence in hypergraphs is quantitative, yielding hypergraph walks with both length and width. Graph methods which then generalize to hypergraphs include connected component analyses, graph distance-based metrics such as closeness centrality, and motif-based measures such as clustering coefficients. We apply high-order analogs of these methods to real world hypernetworks, and show they reveal nuanced and interpretable structure that cannot be detected by graph-based methods. Lastly, we apply three generative models to the data and find that basic hypergraph properties, such as density and degree distributions, do not necessarily control these new structural measurements. Our work demonstrates how analyses of hypergraph-structured data are richer when utilizing tools tailored to capture hypergraph-native phenomena, and suggests one possible avenue towards that end.

Submitted 8 June, 2020; v1 submitted 26 June, 2019; originally announced June 2019.

Comments: Updated to address referee comments, to appear in EPJ Data Science

arXiv:1906.04936 [pdf, other]

Relative Hausdorff Distance for Network Analysis

Authors: Sinan G. Aksoy, Kathleen E. Nowak, Emilie Purvine, Stephen J. Young

Abstract: Similarity measures are used extensively in machine learning and data science algorithms. The newly proposed graph Relative Hausdorff (RH) distance is a lightweight yet nuanced similarity measure for quantifying the closeness of two graphs. In this work we study the effectiveness of RH distance as a tool for detecting anomalies in time-evolving graph sequences. We apply RH to cyber data with given red team events, as well as to synthetically generated sequences of graphs with planted attacks. In our experiments, the performance of RH distance is at times comparable, and sometimes superior, to graph edit distance in detecting anomalous phenomena. Our results suggest that in appropriate contexts, RH distance has advantages over more computationally intensive similarity measures.

Submitted 12 June, 2019; originally announced June 2019.

Comments: 20 pages

arXiv:1903.08298 [pdf, other]

Local Versus Global Distances for Zigzag Persistence Modules

Authors: Ellen Gasparovic, Maria Gommel, Emilie Purvine, Radmila Sazdanovic, Bei Wang, Yusu Wang, Lori Ziegelmeier

Abstract: This short note establishes explicit and broadly applicable relationships between persistence-based distances computed locally and globally. In particular, we show that the bottleneck distance between two zigzag persistence modules restricted to an interval is always bounded above by the distance between the unrestricted versions. While this result is not surprising, it could have different practical implications. We give two related applications for metric graph distances, as well as an extension for the matching distance between multiparameter persistence modules.

Submitted 19 March, 2019; originally announced March 2019.

Comments: 9 pages, 1 figure

arXiv:1812.05282 [pdf, other]

The Relationship Between the Intrinsic Cech and Persistence Distortion Distances for Metric Graphs

Authors: Ellen Gasparovic, Maria Gommel, Emilie Purvine, Radmila Sazdanovic, Bei Wang, Yusu Wang, Lori Ziegelmeier

Abstract: Metric graphs are meaningful objects for modeling complex structures that arise in many real-world applications, such as road networks, river systems, earthquake faults, blood vessels, and filamentary structures in galaxies. To study metric graphs in the context of comparison, we are interested in determining the relative discriminative capabilities of two topology-based distances between a pair of arbitrary finite metric graphs: the persistence distortion distance and the intrinsic Cech distance. We explicitly show how to compute the intrinsic Cech distance between two metric graphs based solely on knowledge of the shortest systems of loops for the graphs. Our main theorem establishes an inequality between the intrinsic Cech and persistence distortion distances in the case when one of the graphs is a bouquet graph and the other is arbitrary. The relationship also holds when both graphs are constructed via wedge sums of cycles and edges.

Submitted 13 December, 2018; originally announced December 2018.

Comments: 18 pages, 6 figures

MSC Class: 57M15

arXiv:1805.11547 [pdf, other]

Local homology of abstract simplicial complexes

Authors: Michael Robinson, Chris Capraro, Cliff Joslyn, Emilie Purvine, Brenda Praggastis, Stephen Ranshous, Arun Sathanur

Abstract: This survey describes some useful properties of the local homology of abstract simplicial complexes. Although the existing literature on local homology is somewhat dispersed, it is largely dedicated to the study of manifolds, submanifolds, or samplings thereof. While this is a vital perspective, the focus of this survey is squarely on the local homology of abstract simplicial complexes. Our motivation comes from the needs of the analysis of hypergraphs and graphs. In addition to presenting many classical facts in a unified way, this survey presents a few new results about how local homology generalizes useful tools from graph theory. The survey ends with a statistical comparison of graph invariants with local homology.

Submitted 29 May, 2018; originally announced May 2018.

Comments: 38 pages

MSC Class: 55N25

arXiv:1712.06224 [pdf, other]

doi 10.1007/s41468-020-00054-y

On Homotopy Types of Vietoris-Rips Complexes of Metric Gluings

Authors: Michal Adamaszek, Henry Adams, Ellen Gasparovic, Maria Gommel, Emilie Purvine, Radmila Sazdanovic, Bei Wang, Yusu Wang, Lori Ziegelmeier

Abstract: We study Vietoris-Rips complexes of metric wedge sums and metric gluings. We show that the Vietoris-Rips complex of a wedge sum, equipped with a natural metric, is homotopy equivalent to the wedge sum of the Vietoris-Rips complexes. We also provide generalizations for when two metric spaces are glued together along a common isometric subset. As our main example, we deduce the homotopy type of the Vietoris-Rips complex of two metric graphs glued together along a sufficiently short path (compared to lengths of certain loops in the input graphs). As a result, we can describe the persistent homology, in all homological dimensions, of the Vietoris-Rips complexes of a wide class of metric graphs.

Submitted 12 August, 2019; v1 submitted 17 December, 2017; originally announced December 2017.

MSC Class: 05E45; ACM Class: F.2.2

arXiv:1711.11098 [pdf, ps, other]

doi 10.1093/comnet/cny016

A generative graph model for electrical infrastructure networks

Authors: Sinan G. Aksoy, Emilie Purvine, Eduardo Cotilla-Sanchez, Mahantesh Halappanavar

Abstract: We propose a generative graph model for electrical infrastructure networks that accounts for heterogeneity in both node and edge type. To inform the model design, we analyze the properties of power grid graphs derived from the U.S. Eastern Interconnection, Texas Interconnection, and Poland transmission system power grids. Across these datasets, we find subgraphs induced by nodes of the same voltage level exhibit shared structural properties atypical to small-world networks, including low local clustering, large diameter, and large average distance. On the other hand, we find subgraphs induced by transformer edges linking nodes of different voltage types contain a more limited structure, consisting mainly of small, disjoint star graphs. The goal of our proposed model is to match both these inter and intra-network properties by proceeding in two phases: the first phase adapts the Chung-Lu random graph model, taking desired vertex degrees and desired diameter as inputs, while the second phase of the model is based on a simpler random star graph generation process. We test the model's performance by comparing its output across many runs to the aforementioned real data. In nearly all categories tested, we find our model is more accurate in reproducing the unusual mixture of properties apparent in the data than the Chung-Lu model. We also include graph visualization comparisons, a brief analysis of edge-deletion resiliency, and guidelines for artificially generating the model inputs in the absence of real data.

Submitted 19 July, 2018; v1 submitted 29 November, 2017; originally announced November 2017.

arXiv:1702.07379 [pdf, other]

A Complete Characterization of the 1-Dimensional Intrinsic Cech Persistence Diagrams for Metric Graphs

Authors: Ellen Gasparovic, Maria Gommel, Emilie Purvine, Radmila Sazdanovic, Bei Wang, Yusu Wang, Lori Ziegelmeier

Abstract: Metric graphs are special types of metric spaces used to model and represent simple, ubiquitous, geometric relations in data such as biological networks, social networks, and road networks. We are interested in giving a qualitative description of metric graphs using topological summaries. In particular, we provide a complete characterization of the 1-dimensional intrinsic Cech persistence diagrams for metric graphs using persistent homology. Together with complementary results by Adamaszek et al., which imply results on intrinsic Cech persistence diagrams in all dimensions for a single cycle, our results constitute important steps toward characterizing intrinsic Cech persistence diagrams for arbitrary metric graphs across all dimensions.

Submitted 7 July, 2017; v1 submitted 23 February, 2017; originally announced February 2017.

Comments: 24 pages, 10 figures

MSC Class: 57M15

arXiv:1609.02883 [pdf, other]

A Category Theoretical Investigation of the Type Hierarchy for Heterogeneous Sensor Integration

Authors: Emilie Purvine, Cliff Joslyn, Michael Robinson

Abstract: Consider the case of many sensors, each returning very different types of data (e.g., a camera returning images, a thermometer returning probability distributions, a newspaper returning articles, a traffic counter returning numbers). Additionally we have a set of questions, or variables, that we wish to use these sensors to inform (e.g., temperature, location, crowd size, topic). Rather than using one sensor to inform each variable we wish to integrate these sources of data to get more robust and complete information. The problem, of course, is how to inform a variable, e.g., crowd size, using a number, a newspaper article, and an image. How do we integrate these very different types of information? Michael Robinson proposes that sheaf theory is the canonical answer. Moreover, one of the axioms in Robinson's paper which makes sheaf theory work for data integration is that all data sources have the structure of a vector space. Therefore, the motivating question for everything in this report is "How do we interpret arbitrary sensor output as a vector space with the intent to integrate?"

Submitted 9 September, 2016; originally announced September 2016.

Report number: PNNL-25784

arXiv:1507.07021 [pdf, other]

doi 10.1021/acs.jpcb.6b02059

Energy Minimization of Discrete Protein Titration State Models Using Graph Theory

Authors: Emilie Purvine, Kyle Monson, Elizabeth Jurrus, Keith Star, Nathan A. Baker

Abstract: There are several applications in computational biophysics which require the optimization of discrete interacting states; e.g., amino acid titration states, ligand oxidation states, or discrete rotamer angles. Such optimization can be very time-consuming as it scales exponentially in the number of sites to be optimized. In this paper, we describe a new polynomial-time algorithm for optimization of discrete states in macromolecular systems. This algorithm was adapted from image processing and uses techniques from discrete mathematics and graph theory to restate the optimization problem in terms of "maximum flow-minimum cut" graph analysis. The interaction energy graph, a graph in which vertices (amino acids) and edges (interactions) are weighted with their respective energies, is transformed into a flow network in which the value of the minimum cut in the network equals the minimum free energy of the protein, and the cut itself encodes the state that achieves the minimum free energy. Because of its deterministic nature and polynomial-time performance, this algorithm has the potential to allow for the ionization state of larger proteins to be discovered.

Submitted 16 April, 2016; v1 submitted 24 July, 2015; originally announced July 2015.

arXiv:1501.00943 [pdf, other]

Comparative Study of Clustering Techniques for Real-Time Dynamic Model Reduction

Authors: Emilie Purvine, Eduardo Cotilla-Sanchez, Mahantesh Halappanavar, Zhenyu Huang, Guang Lin, Shuai Lu, Shaobu Wang

Abstract: Dynamic model reduction in power systems is necessary for improving computational efficiency. Traditional model reduction using linearized models or online analysis is not adequate to capture dynamic behaviors of the power system, especially with the new mix of intermittent generation and intelligent consumption making the power system more dynamic and non-linear. Real-time dynamic model reduction has emerged to fill this important need. This paper explores using clustering techniques to analyze real-time phasor measurements to identify groups of generators with similar behavior, as well as a representative generator from each group for dynamic model reduction. Two clustering techniques -- graph clustering and k-means -- are considered. These techniques are compared with a previously developed dynamic model reduction approach using Singular Value Decomposition. Two sample power grid data sets are used to test these different model reduction techniques. Based on the algorithms' relative performance, recommendations are provided for practical use.

Submitted 18 July, 2017; v1 submitted 5 January, 2015; originally announced January 2015.

Comments: Statistical Analysis and Data Mining, in press, 2017

Showing 1–27 of 27 results for author: Purvine, E