
Showing 1–9 of 9 results for author: Anderson, H S

Searching in archive cs.
  1. arXiv:2302.06716  [pdf, ps, other]

    cs.LG cs.CL cs.CR

    Machine Learning Model Attribution Challenge

    Authors: Elizabeth Merkhofer, Deepesh Chaudhari, Hyrum S. Anderson, Keith Manville, Lily Wong, João Gante

    Abstract: We present the findings of the Machine Learning Model Attribution Challenge. Fine-tuned machine learning models may derive from other trained models without obvious attribution characteristics. In this challenge, participants identify the publicly-available base models that underlie a set of anonymous, fine-tuned large language models (LLMs) using only textual output of the models. Contestants aim…

    Submitted 17 February, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  2. arXiv:2212.14315  [pdf, other]

    cs.CR cs.LG

    "Real Attackers Don't Compute Gradients": Bridging the Gap Between Adversarial ML Research and Practice

    Authors: Giovanni Apruzzese, Hyrum S. Anderson, Savino Dambra, David Freeman, Fabio Pierazzi, Kevin A. Roundy

    Abstract: Recent years have seen a proliferation of research on adversarial machine learning. Numerous papers demonstrate powerful algorithmic attacks against a wide variety of machine learning (ML) models, and numerous other papers propose defenses that can withstand most attacks. However, abundant real-world evidence suggests that actual attackers use simple tactics to subvert ML-driven systems, and as a…

    Submitted 29 December, 2022; originally announced December 2022.

  3. arXiv:2012.09390  [pdf, other]

    stat.ML cs.AI cs.LG

    Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection

    Authors: Edward Raff, William Fleshman, Richard Zak, Hyrum S. Anderson, Bobby Filar, Mark McLean

    Abstract: Recent works within machine learning have been tackling inputs of ever-increasing size, with cybersecurity presenting sequence classification problems of particularly extreme lengths. In the case of Windows executable malware detection, inputs may exceed $100$ MB, which corresponds to a time series with $T=100,000,000$ steps. To date, the closest approach to handling such a task is MalConv, a conv…

    Submitted 16 December, 2020; originally announced December 2020.

    Comments: To appear in AAAI 2021

  4. arXiv:2009.03779  [pdf, other]

    cs.CR cs.IR cs.LG stat.ML

    Automatic Yara Rule Generation Using Biclustering

    Authors: Edward Raff, Richard Zak, Gary Lopez Munoz, William Fleming, Hyrum S. Anderson, Bobby Filar, Charles Nicholas, James Holt

    Abstract: Yara rules are a ubiquitous tool among cybersecurity practitioners and analysts. Developing high-quality Yara rules to detect a malware family of interest can be labor- and time-intensive, even for expert users. Few tools exist and relatively little work has been done on how to automate the generation of Yara rules for specific families. In this paper, we leverage large n-grams ($n \geq 8$) combin…

    Submitted 5 September, 2020; originally announced September 2020.

    Comments: To be published in the 13th ACM Workshop on Artificial Intelligence and Security (AISec)

  5. arXiv:1805.09738  [pdf, other]

    cs.CR

    Detecting Homoglyph Attacks with a Siamese Neural Network

    Authors: Jonathan Woodbridge, Hyrum S. Anderson, Anjum Ahuja, Daniel Grant

    Abstract: A homoglyph (name spoofing) attack is a common technique used by adversaries to obfuscate file and domain names. This technique creates process or domain names that are visually similar to legitimate and recognized names. For instance, an attacker may create malware with the name svch0st.exe so that in a visual inspection of running processes or a directory listing, the process or file name might…

    Submitted 24 May, 2018; originally announced May 2018.

  6. arXiv:1804.04637  [pdf, other]

    cs.CR

    EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models

    Authors: Hyrum S. Anderson, Phil Roth

    Abstract: This paper describes EMBER: a labeled benchmark dataset for training machine learning models to statically detect malicious Windows portable executable files. The dataset includes features extracted from 1.1M binary files: 900K training samples (300K malicious, 300K benign, 300K unlabeled) and 200K test samples (100K malicious, 100K benign). To accompany the dataset, we also release open source co…

    Submitted 16 April, 2018; v1 submitted 12 April, 2018; originally announced April 2018.

  7. arXiv:1801.08917  [pdf, other]

    cs.CR

    Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning

    Authors: Hyrum S. Anderson, Anant Kharkar, Bobby Filar, David Evans, Phil Roth

    Abstract: Machine learning is a popular approach to signatureless malware detection because it can generalize to never-before-seen malware families and polymorphic strains. This has resulted in its practical use for either primary detection engines or for supplementary heuristic detection by anti-malware vendors. Recent work in adversarial machine learning has shown that deep learning models are susceptible…

    Submitted 30 January, 2018; v1 submitted 26 January, 2018; originally announced January 2018.

  8. arXiv:1611.00791  [pdf, other]

    cs.CR cs.AI

    Predicting Domain Generation Algorithms with Long Short-Term Memory Networks

    Authors: Jonathan Woodbridge, Hyrum S. Anderson, Anjum Ahuja, Daniel Grant

    Abstract: Various families of malware use domain generation algorithms (DGAs) to generate a large number of pseudo-random domain names to connect to a command and control (C&C) server. In order to block DGA C&C traffic, security organizations must first discover the algorithm by reverse engineering malware samples, then generating a list of domains for a given seed. The domains are then either preregistered…

    Submitted 2 November, 2016; originally announced November 2016.

  9. arXiv:1610.01969  [pdf, other]

    cs.CR cs.AI

    DeepDGA: Adversarially-Tuned Domain Generation and Detection

    Authors: Hyrum S. Anderson, Jonathan Woodbridge, Bobby Filar

    Abstract: Many malware families utilize domain generation algorithms (DGAs) to establish command and control (C&C) connections. While there are many methods to pseudorandomly generate domains, we focus in this paper on detecting (and generating) domains on a per-domain basis which provides a simple and flexible means to detect known DGA families. Recent machine learning approaches to DGA detection have been…

    Submitted 6 October, 2016; originally announced October 2016.