-
Flurry: a Fast Framework for Reproducible Multi-layered Provenance Graph Representation Learning
Authors:
Maya Kapoor,
Joshua Melton,
Michael Ridenhour,
Mahalavanya Sriram,
Thomas Moyer,
Siddharth Krishnan
Abstract:
Complex heterogeneous dynamic networks like knowledge graphs are powerful constructs that can be used in modeling data provenance from computer systems. From a security perspective, these attributed graphs enable causality analysis and tracing for analyzing a myriad of cyberattacks. However, there is a paucity in systematic development of pipelines that transform system executions and provenance i…
▽ More
Complex heterogeneous dynamic networks like knowledge graphs are powerful constructs that can be used in modeling data provenance from computer systems. From a security perspective, these attributed graphs enable causality analysis and tracing for analyzing a myriad of cyberattacks. However, there is a paucity in systematic development of pipelines that transform system executions and provenance into usable graph representations for machine learning tasks. This lack of instrumentation severely inhibits scientific advancement in provenance graph machine learning by hindering reproducibility and limiting the availability of data that are critical for techniques like graph neural networks. To fulfill this need, we present Flurry, an end-to-end data pipeline which simulates cyberattacks, captures provenance data from these attacks at multiple system and application layers, converts audit logs from these attacks into data provenance graphs, and incorporates this data with a framework for training deep neural models that supports preconfigured or custom-designed models for analysis in real-world resilient systems. We showcase this pipeline by processing data from multiple system attacks and performing anomaly detection via graph classification using current benchmark graph representational learning frameworks. Flurry provides a fast, customizable, extensible, and transparent solution for providing this much needed data to cybersecurity professionals.
△ Less
Submitted 5 March, 2022;
originally announced March 2022.
-
Detecting Safety and Security Faults in PLC Systems with Data Provenance
Authors:
Abdullah Al Farooq,
Jessica Marquard,
Kripa George,
Thomas Moyer
Abstract:
Programmable Logic Controllers are an integral component for managing many different industrial processes (e.g., smart building management, power generation, water and wastewater management, and traffic control systems), and manufacturing and control industries (e.g., oil and natural gas, chemical, pharmaceutical, pulp and paper, food and beverage, automotive, and aerospace). Despite being used wi…
▽ More
Programmable Logic Controllers are an integral component for managing many different industrial processes (e.g., smart building management, power generation, water and wastewater management, and traffic control systems), and manufacturing and control industries (e.g., oil and natural gas, chemical, pharmaceutical, pulp and paper, food and beverage, automotive, and aerospace). Despite being used widely in many critical infrastructures, PLCs use protocols which make these control systems vulnerable to many common attacks, including man-in-the-middle attacks, denial of service attacks, and memory corruption attacks (e.g., array, stack, and heap overflows, integer overflows, and pointer corruption). In this paper, we propose PLC-PROV, a system for tracking the inputs and outputs of the control system to detect violations in the safety and security policies of the system. We consider a smart building as an example of a PLC-based system and show how PLC-PROV can be applied to ensure that the inputs and outputs are consistent with the intended safety and security policies.
△ Less
Submitted 14 November, 2019;
originally announced November 2019.
-
IoTC2: A Formal Method Approach for Detecting Conflicts in Large Scale IoT Systems
Authors:
Abdullah Al Farooq,
Ehab Al-Shaer,
Thomas Moyer,
Krishna Kant
Abstract:
Internet of Things (IoT) has become a common paradigm for different domains such as health care, transportation infrastructure, smart home, smart shop**, and e-commerce. With its interoperable functionality, it is now possible to connect all domains of IoT together for providing competent services to the users. Because numerous IoT devices can connect and communicate at the same time, there can…
▽ More
Internet of Things (IoT) has become a common paradigm for different domains such as health care, transportation infrastructure, smart home, smart shop**, and e-commerce. With its interoperable functionality, it is now possible to connect all domains of IoT together for providing competent services to the users. Because numerous IoT devices can connect and communicate at the same time, there can be events that trigger conflicting actions to an actuator or an environmental feature. However, there have been very few research efforts made to detect conflicting situation in IoT system using formal method. This paper provides a formal method approach, IoT Confict Checker (IoTC2), to ensure safety of controller and actuators' behavior with respect to conflicts. Any policy violation results in detection of the conflicts. We defined the safety policies for controller, actions, and triggering events and implemented the those with Prolog to prove the logical completeness and soundness. In addition to that, we have implemented the detection policies in Matlab Simulink Environment with its built-in Model Verification blocks. We created smart home environment in Simulink and showed how the conflicts affect actions and corresponding features. We have also experimented the scalability, efficiency, and accuracy of our method in the simulated environment.
△ Less
Submitted 10 December, 2018;
originally announced December 2018.
-
Runtime Analysis of Whole-System Provenance
Authors:
Thomas Pasquier,
Xueyuan Han,
Thomas Moyer,
Adam Bates,
Olivier Hermant,
David Eyers,
Jean Bacon,
Margo Seltzer
Abstract:
Identifying the root cause and impact of a system intrusion remains a foundational challenge in computer security. Digital provenance provides a detailed history of the flow of information within a computing system, connecting suspicious events to their root causes. Although existing provenance-based auditing techniques provide value in forensic analysis, they assume that such analysis takes place…
▽ More
Identifying the root cause and impact of a system intrusion remains a foundational challenge in computer security. Digital provenance provides a detailed history of the flow of information within a computing system, connecting suspicious events to their root causes. Although existing provenance-based auditing techniques provide value in forensic analysis, they assume that such analysis takes place only retrospectively. Such post-hoc analysis is insufficient for realtime security applications, moreover, even for forensic tasks, prior provenance collection systems exhibited poor performance and scalability, jeopardizing the timeliness of query responses.
We present CamQuery, which provides inline, realtime provenance analysis, making it suitable for implementing security applications. CamQuery is a Linux Security Module that offers support for both userspace and in-kernel execution of analysis applications. We demonstrate the applicability of CamQuery to a variety of runtime security applications including data loss prevention, intrusion detection, and regulatory compliance. In evaluation, we demonstrate that CamQuery reduces the latency of realtime query mechanisms, while imposing minimal overheads on system execution. CamQuery thus enables the further deployment of provenance-based technologies to address central challenges in computer security.
△ Less
Submitted 25 August, 2018; v1 submitted 18 August, 2018;
originally announced August 2018.
-
Curator: Provenance Management for Modern Distributed Systems
Authors:
Warren Smith,
Thomas Moyer,
Charles Munson
Abstract:
Data provenance is a valuable tool for protecting and troubleshooting distributed systems. Careful design of the provenance components reduces the impact on the design, implementation, and operation of the distributed system. In this paper, we present Curator, a provenance management toolkit that can be easily integrated with microservice-based systems and other modern distributed systems. This pa…
▽ More
Data provenance is a valuable tool for protecting and troubleshooting distributed systems. Careful design of the provenance components reduces the impact on the design, implementation, and operation of the distributed system. In this paper, we present Curator, a provenance management toolkit that can be easily integrated with microservice-based systems and other modern distributed systems. This paper describes the design of Curator and discusses how we have used Curator to add provenance to distributed systems. We find that our approach results in no changes to the design of these distributed systems and minimal additional code and dependencies to manage. In addition, Curator uses the same scalable infrastructure as the distributed system and can therefore scale with the distributed system.
△ Less
Submitted 6 June, 2018;
originally announced June 2018.
-
Practical Whole-System Provenance Capture
Authors:
Thomas Pasquier,
Xueyuan Han,
Mark Goldstein,
Thomas Moyer,
David Eyers,
Margo Seltzer,
Jean Bacon
Abstract:
Data provenance describes how data came to be in its present form. It includes data sources and the transformations that have been applied to them. Data provenance has many uses, from forensics and security to aiding the reproducibility of scientific experiments. We present CamFlow, a whole-system provenance capture mechanism that integrates easily into a PaaS offering. While there have been sever…
▽ More
Data provenance describes how data came to be in its present form. It includes data sources and the transformations that have been applied to them. Data provenance has many uses, from forensics and security to aiding the reproducibility of scientific experiments. We present CamFlow, a whole-system provenance capture mechanism that integrates easily into a PaaS offering. While there have been several prior whole-system provenance systems that captured a comprehensive, systemic and ubiquitous record of a system's behavior, none have been widely adopted. They either A) impose too much overhead, B) are designed for long-outdated kernel releases and are hard to port to current systems, C) generate too much data, or D) are designed for a single system. CamFlow addresses these shortcoming by: 1) leveraging the latest kernel design advances to achieve efficiency; 2) using a self-contained, easily maintainable implementation relying on a Linux Security Module, NetFilter, and other existing kernel facilities; 3) providing a mechanism to tailor the captured provenance data to the needs of the application; and 4) making it easy to integrate provenance across distributed systems. The provenance we capture is streamed and consumed by tenant-built auditor applications. We illustrate the usability of our implementation by describing three such applications: demonstrating compliance with data regulations; performing fault/intrusion detection; and implementing data loss prevention. We also show how CamFlow can be leveraged to capture meaningful provenance without modifying existing applications.
△ Less
Submitted 14 November, 2017;
originally announced November 2017.
-
Retrofitting Applications with Provenance-Based Security Monitoring
Authors:
Adam Bates,
Kevin Butler,
Alin Dobra,
Brad Reaves,
Patrick Cable,
Thomas Moyer,
Nabil Schear
Abstract:
Data provenance is a valuable tool for detecting and preventing cyber attack, providing insight into the nature of suspicious events. For example, an administrator can use provenance to identify the perpetrator of a data leak, track an attacker's actions following an intrusion, or even control the flow of outbound data within an organization. Unfortunately, providing relevant data provenance for c…
▽ More
Data provenance is a valuable tool for detecting and preventing cyber attack, providing insight into the nature of suspicious events. For example, an administrator can use provenance to identify the perpetrator of a data leak, track an attacker's actions following an intrusion, or even control the flow of outbound data within an organization. Unfortunately, providing relevant data provenance for complex, heterogenous software deployments is challenging, requiring both the tedious instrumentation of many application components as well as a unified architecture for aggregating information between components.
In this work, we present a composition of techniques for bringing affordable and holistic provenance capabilities to complex application workflows, with particular consideration for the exemplar domain of web services. We present DAP, a transparent architecture for capturing detailed data provenance for web service components. Our approach leverages a key insight that minimal knowledge of open protocols can be leveraged to extract precise and efficient provenance information by interposing on application components' communications, granting DAP compatibility with existing web services without requiring instrumentation or developer cooperation. We show how our system can be used in real time to monitor system intrusions or detect data exfiltration attacks while imposing less than 5.1 ms end-to-end overhead on web requests. Through the introduction of a garbage collection optimization, DAP is able to monitor system activity without suffering from excessive storage overhead. DAP thus serves not only as a provenance-aware web framework, but as a case study in the non-invasive deployment of provenance capabilities for complex applications workflows.
△ Less
Submitted 1 September, 2016;
originally announced September 2016.
-
High-throughput Ingest of Provenance Records into Accumulo
Authors:
Thomas Moyer,
Vijay Gadepally
Abstract:
Whole-system data provenance provides deep insight into the processing of data on a system, including detecting data integrity attacks. The downside to systems that collect whole-system data provenance is the sheer volume of data that is generated under many heavy workloads. In order to make provenance metadata useful, it must be stored somewhere where it can be queried. This problem becomes even…
▽ More
Whole-system data provenance provides deep insight into the processing of data on a system, including detecting data integrity attacks. The downside to systems that collect whole-system data provenance is the sheer volume of data that is generated under many heavy workloads. In order to make provenance metadata useful, it must be stored somewhere where it can be queried. This problem becomes even more challenging when considering a network of provenance-aware machines all collecting this metadata. In this paper, we investigate the use of D4M and Accumulo to support high-throughput data ingest of whole-system provenance data. We find that we are able to ingest 3,970 graph components per second. Centrally storing the provenance metadata allows us to build systems that can detect and respond to data integrity attacks that are captured by the provenance system.
△ Less
Submitted 12 August, 2016;
originally announced August 2016.