-
Zipr: A High-Impact, Robust, Open-source, Multi-platform, Static Binary Rewriter
Authors:
Jason D. Hiser,
Anh Nguyen-Tuong,
Jack W. Davidson
Abstract:
Zipr is a tool for static binary rewriting, first published in 2016. Zipr was engineered to support arbitrary program modification with an emphasis on low overhead, robustness, and flexibility to perform security enhancements and instrumentation. Originally targeted to Linux x86-32 binaries, Zipr now supports 32- and 64-bit binaries for X86, ARM, and MIPS architectures, as well as preliminary supp…
▽ More
Zipr is a tool for static binary rewriting, first published in 2016. Zipr was engineered to support arbitrary program modification with an emphasis on low overhead, robustness, and flexibility to perform security enhancements and instrumentation. Originally targeted to Linux x86-32 binaries, Zipr now supports 32- and 64-bit binaries for X86, ARM, and MIPS architectures, as well as preliminary support for Windows programs.
These features have helped Zipr make a dramatic impact on research. It was first used in the DARPA Cyber Grand Challenge to take second place overall, with the best security score of any participant, Zipr has now been used in a variety of research areas by both the original authors as well as third parties. Zipr has also led to publications in artificial diversity, program instrumentation, program repair, fuzzing, autonomous vehicle security, research computing security, as well as directly contributing to two student dissertations. The open-source repository has accepted accepted patches from several external authors, demonstrating the impact of Zipr beyond the original authors.
△ Less
Submitted 1 December, 2023;
originally announced December 2023.
-
Helix++: A platform for efficiently securing software
Authors:
Jack W. Davidson,
Jason D. Hiser,
Anh Nguyen-Tuong
Abstract:
The open-source Helix++ project improves the security posture of computing platforms by applying cutting-edge cybersecurity techniques to diversify and harden software automatically. A distinguishing feature of Helix++ is that it does not require source code or build artifacts; it operates directly on software in binary form--even stripped executables and libraries. This feature is key as rebuildi…
▽ More
The open-source Helix++ project improves the security posture of computing platforms by applying cutting-edge cybersecurity techniques to diversify and harden software automatically. A distinguishing feature of Helix++ is that it does not require source code or build artifacts; it operates directly on software in binary form--even stripped executables and libraries. This feature is key as rebuilding applications from source is a time-consuming and often frustrating process. Diversification breaks the software monoculture and makes attacks harder to execute as information needed for a successful attack will have changed unpredictably. Diversification also forces attackers to customize an attack for each target instead of attackers crafting an exploit that works reliably on all similarly configured targets. Hardening directly targets key attack classes. The combination of diversity and hardening provides defense-in-depth, as well as a moving target defense, to secure the Nation's cyber infrastructure.
△ Less
Submitted 10 April, 2023;
originally announced April 2023.
-
Same Coverage, Less Bloat: Accelerating Binary-only Fuzzing with Coverage-preserving Coverage-guided Tracing
Authors:
Stefan Nagy,
Anh Nguyen-Tuong,
Jason D. Hiser,
Jack W. Davidson,
Matthew Hicks
Abstract:
Coverage-guided fuzzing's aggressive, high-volume testing has helped reveal tens of thousands of software security flaws. While executing billions of test cases mandates fast code coverage tracing, the nature of binary-only targets leads to reduced tracing performance. A recent advancement in binary fuzzing performance is Coverage-guided Tracing (CGT), which brings orders-of-magnitude gains in thr…
▽ More
Coverage-guided fuzzing's aggressive, high-volume testing has helped reveal tens of thousands of software security flaws. While executing billions of test cases mandates fast code coverage tracing, the nature of binary-only targets leads to reduced tracing performance. A recent advancement in binary fuzzing performance is Coverage-guided Tracing (CGT), which brings orders-of-magnitude gains in throughput by restricting the expense of coverage tracing to only when new coverage is guaranteed. Unfortunately, CGT suits only a basic block coverage granularity -- yet most fuzzers require finer-grain coverage metrics: edge coverage and hit counts. It is this limitation which prohibits nearly all of today's state-of-the-art fuzzers from attaining the performance benefits of CGT.
This paper tackles the challenges of adapting CGT to fuzzing's most ubiquitous coverage metrics. We introduce and implement a suite of enhancements that expand CGT's introspection to fuzzing's most common code coverage metrics, while maintaining its orders-of-magnitude speedup over conventional always-on coverage tracing. We evaluate their trade-offs with respect to fuzzing performance and effectiveness across 12 diverse real-world binaries (8 open- and 4 closed-source). On average, our coverage-preserving CGT attains near-identical speed to the present block-coverage-only CGT, UnTracer; and outperforms leading binary- and source-level coverage tracers QEMU, Dyninst, RetroWrite, and AFL-Clang by 2-24x, finding more bugs in less time.
△ Less
Submitted 7 September, 2022;
originally announced September 2022.
-
On Generating and Labeling Network Traffic with Realistic, Self-Propagating Malware
Authors:
Molly Buchanan,
Jeffrey W. Collyer,
Jack W. Davidson,
Saikat Dey,
Mark Gardner,
Jason D. Hiser,
Jeffry Lang,
Alastair Nottingham,
Alina Oprea
Abstract:
Research and development of techniques which detect or remediate malicious network activity require access to diverse, realistic, contemporary data sets containing labeled malicious connections. In the absence of such data, said techniques cannot be meaningfully trained, tested, and evaluated. Synthetically produced data containing fabricated or merged network traffic is of limited value as it is…
▽ More
Research and development of techniques which detect or remediate malicious network activity require access to diverse, realistic, contemporary data sets containing labeled malicious connections. In the absence of such data, said techniques cannot be meaningfully trained, tested, and evaluated. Synthetically produced data containing fabricated or merged network traffic is of limited value as it is easily distinguishable from real traffic by even simple machine learning (ML) algorithms. Real network data is preferable, but while ubiquitous is broadly both sensitive and lacking in ground truth labels, limiting its utility for ML research.
This paper presents a multi-faceted approach to generating a data set of labeled malicious connections embedded within anonymized network traffic collected from large production networks. Real-world malware is defanged and introduced to simulated, secured nodes within those networks to generate realistic traffic while maintaining sufficient isolation to protect real data and infrastructure. Network sensor data, including this embedded malware traffic, is collected at a network edge and anonymized for research use.
Network traffic was collected and produced in accordance with the aforementioned methods at two major educational institutions. The result is a highly realistic, long term, multi-institution data set with embedded data labels spanning over 1.5 trillion connections and over a petabyte of sensor log data. The usability of this data set is demonstrated by its utility to our artificial intelligence and machine learning (AI/ML) research program.
△ Less
Submitted 27 May, 2022; v1 submitted 20 April, 2021;
originally announced April 2021.