Search | arXiv e-print repository

Zipr: A High-Impact, Robust, Open-source, Multi-platform, Static Binary Rewriter

Authors: Jason D. Hiser, Anh Nguyen-Tuong, Jack W. Davidson

Abstract: Zipr is a tool for static binary rewriting, first published in 2016. Zipr was engineered to support arbitrary program modification with an emphasis on low overhead, robustness, and flexibility to perform security enhancements and instrumentation. Originally targeted to Linux x86-32 binaries, Zipr now supports 32- and 64-bit binaries for X86, ARM, and MIPS architectures, as well as preliminary supp… ▽ More Zipr is a tool for static binary rewriting, first published in 2016. Zipr was engineered to support arbitrary program modification with an emphasis on low overhead, robustness, and flexibility to perform security enhancements and instrumentation. Originally targeted to Linux x86-32 binaries, Zipr now supports 32- and 64-bit binaries for X86, ARM, and MIPS architectures, as well as preliminary support for Windows programs. These features have helped Zipr make a dramatic impact on research. It was first used in the DARPA Cyber Grand Challenge to take second place overall, with the best security score of any participant, Zipr has now been used in a variety of research areas by both the original authors as well as third parties. Zipr has also led to publications in artificial diversity, program instrumentation, program repair, fuzzing, autonomous vehicle security, research computing security, as well as directly contributing to two student dissertations. The open-source repository has accepted accepted patches from several external authors, demonstrating the impact of Zipr beyond the original authors. △ Less

Submitted 1 December, 2023; originally announced December 2023.

Comments: 5 pages

arXiv:2304.04846 [pdf, other]

Helix++: A platform for efficiently securing software

Authors: Jack W. Davidson, Jason D. Hiser, Anh Nguyen-Tuong

Abstract: The open-source Helix++ project improves the security posture of computing platforms by applying cutting-edge cybersecurity techniques to diversify and harden software automatically. A distinguishing feature of Helix++ is that it does not require source code or build artifacts; it operates directly on software in binary form--even stripped executables and libraries. This feature is key as rebuildi… ▽ More The open-source Helix++ project improves the security posture of computing platforms by applying cutting-edge cybersecurity techniques to diversify and harden software automatically. A distinguishing feature of Helix++ is that it does not require source code or build artifacts; it operates directly on software in binary form--even stripped executables and libraries. This feature is key as rebuilding applications from source is a time-consuming and often frustrating process. Diversification breaks the software monoculture and makes attacks harder to execute as information needed for a successful attack will have changed unpredictably. Diversification also forces attackers to customize an attack for each target instead of attackers crafting an exploit that works reliably on all similarly configured targets. Hardening directly targets key attack classes. The combination of diversity and hardening provides defense-in-depth, as well as a moving target defense, to secure the Nation's cyber infrastructure. △ Less

Submitted 10 April, 2023; originally announced April 2023.

Comments: 4 pages, 1 figure, white paper

ACM Class: D.2.m

arXiv:2209.03441 [pdf, other]

doi 10.1145/3460120.3484787

Same Coverage, Less Bloat: Accelerating Binary-only Fuzzing with Coverage-preserving Coverage-guided Tracing

Authors: Stefan Nagy, Anh Nguyen-Tuong, Jason D. Hiser, Jack W. Davidson, Matthew Hicks

Abstract: Coverage-guided fuzzing's aggressive, high-volume testing has helped reveal tens of thousands of software security flaws. While executing billions of test cases mandates fast code coverage tracing, the nature of binary-only targets leads to reduced tracing performance. A recent advancement in binary fuzzing performance is Coverage-guided Tracing (CGT), which brings orders-of-magnitude gains in thr… ▽ More Coverage-guided fuzzing's aggressive, high-volume testing has helped reveal tens of thousands of software security flaws. While executing billions of test cases mandates fast code coverage tracing, the nature of binary-only targets leads to reduced tracing performance. A recent advancement in binary fuzzing performance is Coverage-guided Tracing (CGT), which brings orders-of-magnitude gains in throughput by restricting the expense of coverage tracing to only when new coverage is guaranteed. Unfortunately, CGT suits only a basic block coverage granularity -- yet most fuzzers require finer-grain coverage metrics: edge coverage and hit counts. It is this limitation which prohibits nearly all of today's state-of-the-art fuzzers from attaining the performance benefits of CGT. This paper tackles the challenges of adapting CGT to fuzzing's most ubiquitous coverage metrics. We introduce and implement a suite of enhancements that expand CGT's introspection to fuzzing's most common code coverage metrics, while maintaining its orders-of-magnitude speedup over conventional always-on coverage tracing. We evaluate their trade-offs with respect to fuzzing performance and effectiveness across 12 diverse real-world binaries (8 open- and 4 closed-source). On average, our coverage-preserving CGT attains near-identical speed to the present block-coverage-only CGT, UnTracer; and outperforms leading binary- and source-level coverage tracers QEMU, Dyninst, RetroWrite, and AFL-Clang by 2-24x, finding more bugs in less time. △ Less

Submitted 7 September, 2022; originally announced September 2022.

Comments: CCS '21: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security

arXiv:2104.10034 [pdf, other]

On Generating and Labeling Network Traffic with Realistic, Self-Propagating Malware

Authors: Molly Buchanan, Jeffrey W. Collyer, Jack W. Davidson, Saikat Dey, Mark Gardner, Jason D. Hiser, Jeffry Lang, Alastair Nottingham, Alina Oprea

Abstract: Research and development of techniques which detect or remediate malicious network activity require access to diverse, realistic, contemporary data sets containing labeled malicious connections. In the absence of such data, said techniques cannot be meaningfully trained, tested, and evaluated. Synthetically produced data containing fabricated or merged network traffic is of limited value as it is… ▽ More Research and development of techniques which detect or remediate malicious network activity require access to diverse, realistic, contemporary data sets containing labeled malicious connections. In the absence of such data, said techniques cannot be meaningfully trained, tested, and evaluated. Synthetically produced data containing fabricated or merged network traffic is of limited value as it is easily distinguishable from real traffic by even simple machine learning (ML) algorithms. Real network data is preferable, but while ubiquitous is broadly both sensitive and lacking in ground truth labels, limiting its utility for ML research. This paper presents a multi-faceted approach to generating a data set of labeled malicious connections embedded within anonymized network traffic collected from large production networks. Real-world malware is defanged and introduced to simulated, secured nodes within those networks to generate realistic traffic while maintaining sufficient isolation to protect real data and infrastructure. Network sensor data, including this embedded malware traffic, is collected at a network edge and anonymized for research use. Network traffic was collected and produced in accordance with the aforementioned methods at two major educational institutions. The result is a highly realistic, long term, multi-institution data set with embedded data labels spanning over 1.5 trillion connections and over a petabyte of sensor log data. The usability of this data set is demonstrated by its utility to our artificial intelligence and machine learning (AI/ML) research program. △ Less

Submitted 27 May, 2022; v1 submitted 20 April, 2021; originally announced April 2021.

Comments: 4+2 pages, 3 figures, 1 table, for AI4CS-SDM21

Showing 1–4 of 4 results for author: Hiser, J D