-
Survey and Analysis of IoT Operating Systems: A Comparative Study on the Effectiveness and Acquisition Time of Open Source Digital Forensics Tools
Authors:
Jeffrey Fairbanks,
Md Mashrur Arifin,
Sadia Afreen,
Alex Curtis
Abstract:
The main goal of this research project is to evaluate the effectiveness and speed of open-source forensic tools for collecting digital evidence from various Internet-of-Things (IoT) devices. The project will create and configure many IoT environments across popular IoT operating systems and run common forensics tasks to accomplish this goal. To validate these forensic analysis operations, a variety of open-source forensic tools covering four standard digital forensics tasks will be used. These tasks will be run on each sample IoT operating system, and the time spent on each will be carefully recorded and examined, allowing for a thorough evaluation of the effectiveness and speed of performing forensics on each type of IoT device. The research also aims to offer recommendations to IoT security experts and digital forensic practitioners about the most efficient open-source tools for forensic investigations with IoT devices while maintaining the integrity of gathered evidence and identifying challenges that exist with these new device types. The results will be shared widely and well documented in order to provide significant contributions to IoT device makers and the field of digital forensics.
Submitted 1 July, 2024;
originally announced July 2024.
-
POST: Email Archival, Processing and Flagging Stack for Incident Responders
Authors:
Jeffrey Fairbanks
Abstract:
Phishing is one of the main points of compromise, with email security and awareness estimated at \$50-100B in 2022. There is a great need for email forensics capabilities to quickly search for malicious content. A novel solution, POST, is proposed. POST is an API-driven serverless email archival, processing, and flagging workflow for both large and small organizations that collects and parses all email, flags emails using state-of-the-art Natural Language Processing and Machine Learning, allows full email searching on every aspect of an email, and provides cost savings of up to 68.6%.
Submitted 1 July, 2024;
originally announced July 2024.
-
GATlab: Modeling and Programming with Generalized Algebraic Theories
Authors:
Owen Lynch,
Kris Brown,
James Fairbanks,
Evan Patterson
Abstract:
Categories and categorical structures are increasingly recognized as useful abstractions for modeling in science and engineering. To uniformly implement category-theoretic mathematical models in software, we introduce GATlab, a domain-specific language for algebraic specification embedded in a technical programming language. GATlab is based on generalized algebraic theories (GATs), a logical system extending algebraic theories with dependent types so as to encompass category theory. Using GATlab, the programmer can specify generalized algebraic theories and their models, including both free models, based on symbolic expressions, and computational models, defined by arbitrary code in the host language. Moreover, the programmer can define maps between theories and use them to declaratively migrate models of one theory to models of another. In short, GATlab aims to provide a unified environment for both computer algebra and software interface design with generalized algebraic theories. In this paper, we describe the design, implementation, and applications of GATlab.
Submitted 8 June, 2024; v1 submitted 7 April, 2024;
originally announced April 2024.
-
Generalized Gradient Descent is a Hypergraph Functor
Authors:
Tyler Hanks,
Matthew Klawonn,
James Fairbanks
Abstract:
Cartesian reverse derivative categories (CRDCs) provide an axiomatic generalization of the reverse derivative, which allows generalized analogues of classic optimization algorithms such as gradient descent to be applied to a broad class of problems. In this paper, we show that generalized gradient descent with respect to a given CRDC induces a hypergraph functor from a hypergraph category of optimization problems to a hypergraph category of dynamical systems. The domain of this functor consists of objective functions that are 1) general in the sense that they are defined with respect to an arbitrary CRDC, and 2) open in that they are decorated spans that can be composed with other such objective functions via variable sharing. The codomain is specified analogously as a category of general and open dynamical systems for the underlying CRDC. We describe how the hypergraph functor induces a distributed optimization algorithm for arbitrary composite problems specified in the domain. To illustrate the kinds of problems our framework can model, we show that parameter sharing models in multitask learning, a prevalent machine learning paradigm, yield a composite optimization problem for a given choice of CRDC. We then apply the gradient descent functor to this composite problem and describe the resulting distributed gradient descent algorithm for training parameter sharing models.
Submitted 28 March, 2024;
originally announced March 2024.
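As a rough illustration of the idea in the abstract above, the sketch below works only in the familiar Euclidean special case, not in an arbitrary CRDC. It treats gradient descent as a discrete dynamical system and composes two objectives by sharing one variable. The objectives `f1` and `f2`, the step size `gamma`, and all function names are assumptions made up for this example; they do not come from the paper.

```python
# Minimal sketch (assumed example, Euclidean case only):
# two "open" objectives that share the variable w.
#   f1(w, a) = (w - a)^2 + (a - 1)^2
#   f2(w, b) = (w - b)^2 + (b + 1)^2
# The composite objective is f(w, a, b) = f1(w, a) + f2(w, b).


def grad_f1(w, a):
    """Gradient of f1 with respect to (w, a)."""
    return 2 * (w - a), -2 * (w - a) + 2 * (a - 1)


def grad_f2(w, b):
    """Gradient of f2 with respect to (w, b)."""
    return 2 * (w - b), -2 * (w - b) + 2 * (b + 1)


def step(state, gamma=0.1):
    """One step of the composite dynamical system.

    Each subproblem computes its own gradient locally; the
    contributions to the shared variable w are summed.
    """
    w, a, b = state
    dw1, da = grad_f1(w, a)
    dw2, db = grad_f2(w, b)
    return (w - gamma * (dw1 + dw2), a - gamma * da, b - gamma * db)


def run(state, steps=500):
    """Iterate the dynamical system from an initial state."""
    for _ in range(steps):
        state = step(state)
    return state


w, a, b = run((3.0, -2.0, 4.0))
```

Setting the gradient of the summed objective to zero by hand gives the minimizer w = 0, a = 0.5, b = -0.5, which the iteration approaches for this step size.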
-
Towards a Unified Theory of Time-Varying Data
Authors:
Benjamin Merlin Bumpus,
James Fairbanks,
Martti Karvonen,
Wilmer Leal,
Frédéric Simard
Abstract:
What is a time-varying graph, or a time-varying topological space, and more generally what does it mean for a mathematical structure to vary over time? Here we introduce categories of narratives: powerful tools for studying temporal graphs and other time-varying data structures. Narratives are sheaves on posets of intervals of time which specify snapshots of a temporal object as well as relationships between snapshots over the course of any given interval of time. This approach offers two significant advantages. First, when restricted to the base category of graphs, the theory is consistent with the well-established theory of temporal graphs, enabling the reproduction of results in this field. Second, the theory is general enough to extend results to a wide range of categories used in data analysis, such as groups, topological spaces, databases, Petri nets, simplicial complexes and many more. The approach overcomes the challenge of relating narratives of different types to each other and preserves the structure over time in a compositional sense. Furthermore, our approach allows for the systematic relation of different kinds of narratives. In summary, this theory provides a consistent and general framework for analyzing dynamic systems, offering an essential tool for mathematicians and data scientists alike.
Submitted 27 February, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
A Categorical Representation Language and Computational System for Knowledge-Based Planning
Authors:
Angeline Aguinaldo,
Evan Patterson,
James Fairbanks,
William Regli,
Jaime Ruiz
Abstract:
Classical planning representation languages based on first-order logic have preliminarily been used to model and solve robotic task planning problems. Wider adoption of these representation languages, however, is hindered by the limitations present when managing implicit world changes with concise action models. To address this problem, we propose an alternative approach to representing and managing updates to world states during planning. Based on the category-theoretic concepts of $\mathsf{C}$-sets and double-pushout rewriting (DPO), our proposed representation can effectively handle structured knowledge about world states that support domain abstractions at all levels. It formalizes the semantics of predicates according to a user-provided ontology and preserves the semantics when transitioning between world states. This method provides a formal semantics for using knowledge graphs and relational databases to model world states and updates in planning. In this paper, we conceptually compare our category-theoretic representation with the classical planning representation. We show that our proposed representation has advantages over the classical representation in terms of handling implicit preconditions and effects, and provides a more structured framework in which to model and solve planning problems.
Submitted 14 November, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Analyzing the Effects of CI/CD on Open Source Repositories in GitHub and GitLab
Authors:
Jeffrey Fairbanks,
Akshharaa Tharigonda,
Nasir U. Eisty
Abstract:
Numerous articles emphasize the benefits of implementing Continuous Integration and Delivery (CI/CD) pipelines in software development. These pipelines are expected to improve the reputation of a project and decrease the number of commits and issues in the repository. Although CI/CD adoption may be slow initially, it is believed to accelerate service delivery and deployment in the long run. This study aims to investigate the impact of CI/CD on commit velocity and issue counts in open-source repositories hosted on two platforms, GitLab and GitHub. By analyzing more than 12,000 repositories and recording every commit and issue, it was discovered that CI/CD increases commit velocity by 141.19 percent, but also increases the number of issues by 321.21 percent.
Submitted 28 March, 2023;
originally announced March 2023.
-
Compositional Algorithms on Compositional Data: Deciding Sheaves on Presheaves
Authors:
Ernst Althaus,
Benjamin Merlin Bumpus,
James Fairbanks,
Daniel Rosiak
Abstract:
Algorithmicists are well aware that fast dynamic programming algorithms are very often the correct choice when computing on compositional (or even recursive) graphs. Here we initiate the study of how to generalize this folklore intuition to mathematical structures writ large. We achieve this horizontal generality by adopting a categorial perspective which allows us to show that: (1) structured decompositions (a recent, abstract generalization of many graph decompositions) define Grothendieck topologies on categories of data (adhesive categories) and that (2) any computational problem which can be represented as a sheaf with respect to these topologies can be decided in linear time on classes of inputs which admit decompositions of bounded width and whose decomposition shapes have bounded feedback vertex number. This immediately leads to algorithms on objects of any C-set category; these include, to name but a few examples, structures such as: symmetric graphs, directed graphs, directed multigraphs, hypergraphs, directed hypergraphs, databases, simplicial complexes, circular port graphs and half-edge graphs.
Thus we initiate the bridging of tools from sheaf theory, structural graph theory and parameterized complexity theory; we believe this to be a very fruitful approach for a general, algebraic theory of dynamic programming algorithms. Finally we pair our theoretical results with concrete implementations of our main algorithmic contribution in the AlgebraicJulia ecosystem.
Submitted 3 October, 2023; v1 submitted 10 February, 2023;
originally announced February 2023.
-
Compositional Exploration of Combinatorial Scientific Models
Authors:
Kristopher Brown,
Tyler Hanks,
James Fairbanks
Abstract:
We implement a novel representation of model search spaces as diagrams over a category of models, where we have restricted attention to a broad class of models whose structure is presented by $\mathsf{C}$-sets. (Co)limits in these diagram categories allow the creation of composite model spaces from more primitive spaces. We present a novel implementation of the computer algebra of finitely presented categories and diagram categories (including their limits and colimits), which formalizes a notion of model space exploration. This is coupled with strategies to facilitate the selection of desired models from these model spaces. We demonstrate our framework by generating a tool which fits experimental data, searching an epidemiology-relevant subspace of mass-action kinetic models.
Submitted 7 June, 2022;
originally announced June 2022.
-
An Algebraic Framework for Structured Epidemic Modeling
Authors:
Sophie Libkind,
Andrew Baas,
Micah Halter,
Evan Patterson,
James Fairbanks
Abstract:
Pandemic management requires that scientists rapidly formulate and analyze epidemiological models in order to forecast the spread of disease and the effects of mitigation strategies. Scientists must modify existing models and create novel ones in light of new biological data and policy changes such as social distancing and vaccination. Traditional scientific modeling workflows detach the structure of a model -- its submodels and their interactions -- from its implementation in software. Consequently, incorporating local changes to model components may require global edits to the code-base through a manual, time-intensive, and error-prone process. We propose a compositional modeling framework that uses high-level algebraic structures to capture domain-specific scientific knowledge and bridge the gap between how scientists think about models and the code that implements them. These algebraic structures, grounded in applied category theory, simplify and expedite modeling tasks such as model specification, stratification, analysis, and calibration. With their structure made explicit, models also become easier to communicate, criticize, and refine in light of stakeholder feedback.
Submitted 7 May, 2022; v1 submitted 28 February, 2022;
originally announced March 2022.
-
Computational category-theoretic rewriting
Authors:
Kristopher Brown,
Evan Patterson,
Tyler Hanks,
James Fairbanks
Abstract:
We demonstrate how category theory provides specifications that can efficiently be implemented via imperative algorithms and apply this to the field of graph rewriting. By examples, we show how this paradigm of software development makes it easy to quickly write correct and performant code. We provide a modern implementation of graph rewriting techniques at the level of abstraction of finitely-presented C-sets and clarify the connections between C-sets and the typed graphs supported in existing rewriting software. We emphasize that our open-source library is extensible: by taking new categorical constructions (such as slice categories, structured cospans, and distributed graphs) and relating their limits and colimits to those of their underlying categories, users inherit efficient algorithms for pushout complements and (final) pullback complements. This allows one to perform double-, single-, and sesqui-pushout rewriting over a broad class of data structures.
Submitted 31 March, 2023; v1 submitted 2 November, 2021;
originally announced November 2021.
-
Categorical Data Structures for Technical Computing
Authors:
Evan Patterson,
Owen Lynch,
James Fairbanks
Abstract:
Many mathematical objects can be represented as functors from finitely-presented categories $\mathsf{C}$ to $\mathsf{Set}$. For instance, graphs are functors to $\mathsf{Set}$ from the category with two parallel arrows. Such functors are known informally as $\mathsf{C}$-sets. In this paper, we describe and implement an extension of $\mathsf{C}$-sets having data attributes with fixed types, such as graphs with labeled vertices or real-valued edge weights. We call such structures "acsets," short for "attributed $\mathsf{C}$-sets." Derived from previous work on algebraic databases, acsets are a joint generalization of graphs and data frames. They also encompass more elaborate graph-like objects such as wiring diagrams and Petri nets with rate constants. We develop the mathematical theory of acsets and then describe a generic implementation in the Julia programming language, which uses advanced language features to achieve performance comparable with specialized data structures.
Submitted 19 July, 2022; v1 submitted 8 June, 2021;
originally announced June 2021.
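The abstract's remark that graphs are functors to $\mathsf{Set}$ from the category with two parallel arrows can be made concrete with a small sketch. The Python class below is a hypothetical illustration, not the paper's Julia implementation: the two objects of the category become a set of edges and a set of vertices, and the two parallel arrows become the functions `src` and `tgt`. The class name, field names, and the `out_neighbors` helper are all assumptions chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    """A graph as a functor from the category E => V to Set.

    Objects E and V map to the sets `edges` and `vertices`;
    the two parallel arrows map to the functions `src` and `tgt`,
    stored here as dictionaries from edges to vertices.
    """

    vertices: frozenset
    edges: frozenset
    src: dict
    tgt: dict

    def __post_init__(self):
        # Check that src and tgt are total functions edges -> vertices.
        for name, f in (("src", self.src), ("tgt", self.tgt)):
            if set(f) != set(self.edges):
                raise ValueError(f"{name} must be defined on exactly the edges")
            if not set(f.values()) <= set(self.vertices):
                raise ValueError(f"{name} must land in the vertices")

    def out_neighbors(self, v):
        """Targets of all edges whose source is v, in sorted order."""
        return sorted(self.tgt[e] for e in self.edges if self.src[e] == v)


# A directed graph with edges a: 1 -> 2, b: 1 -> 3, c: 2 -> 3.
g = Graph(
    vertices=frozenset({1, 2, 3}),
    edges=frozenset({"a", "b", "c"}),
    src={"a": 1, "b": 1, "c": 2},
    tgt={"a": 2, "b": 3, "c": 3},
)
```

An attributed variant in the spirit of acsets would add further dictionaries from edges or vertices to ordinary data types, such as a hypothetical `weight` map from edges to floats.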
-
Compositional Scientific Computing with Catlab and SemanticModels
Authors:
Micah Halter,
Evan Patterson,
Andrew Baas,
James Fairbanks
Abstract:
Scientific computing is currently performed by writing domain-specific modeling frameworks for solving special classes of mathematical problems. Since applied category theory provides abstract reasoning machinery for describing and analyzing diverse areas of math, it is a natural platform for building generic and reusable software components for scientific computing. We present Catlab.jl, which provides the category-theoretic infrastructure for this project, together with SemanticModels.jl, which leverages this infrastructure for particular modeling tasks. This approach enhances and automates scientific computing workflows by applying recent advances in mathematical modeling of interconnected systems as cospan algebras.
Submitted 29 June, 2020; v1 submitted 10 May, 2020;
originally announced May 2020.
-
Unsupervised Construction of Knowledge Graphs From Text and Code
Authors:
Kun Cao,
James Fairbanks
Abstract:
The scientific literature is a rich source of information for data mining with conceptual knowledge graphs; the open science movement has enriched this literature with complementary source code that implements scientific models. To exploit this new resource, we construct a knowledge graph using unsupervised learning methods to identify conceptual entities. We associate source code entities with these natural language concepts using word embedding and clustering techniques. Practical naming conventions for methods and functions tend to reflect the concept(s) they implement. We take advantage of this specificity by presenting a novel process for jointly clustering text concepts that combines word embeddings, nonlinear dimensionality reduction, and clustering techniques to assist in understanding, organizing, and comparing software in the open science ecosystem. With our pipeline, we aim to assist scientists in building on existing models in their discipline when making novel models for new phenomena. By combining source code and conceptual information, our knowledge graph enhances corpus-wide understanding of scientific literature.
Submitted 25 August, 2019;
originally announced August 2019.
-
A Compositional Framework for Scientific Model Augmentation
Authors:
Micah Halter,
Christine Herlihy,
James Fairbanks
Abstract:
Scientists construct and analyze computational models to understand the world. That understanding comes from efforts to augment, combine, and compare models of related phenomena. We propose SemanticModels.jl, a system that leverages techniques from static and dynamic program analysis to process executable versions of scientific models to perform such metamodeling tasks. By framing these metamodeling tasks as metaprogramming problems, SemanticModels.jl enables writing programs that generate and expand models. To this end, we present a category theory-based framework for defining metamodeling tasks and extracting semantic information from model implementations, and show how this framework can be used to enhance scientific workflows in a working case study.
Submitted 14 September, 2020; v1 submitted 1 July, 2019;
originally announced July 2019.