Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection

Raff, Edward; Fleshman, William; Zak, Richard; Anderson, Hyrum S.; Filar, Bobby; McLean, Mark

arXiv:2012.09390 (stat)

[Submitted on 17 Dec 2020]

Abstract: Recent works within machine learning have been tackling inputs of ever-increasing size, with cybersecurity presenting sequence classification problems of particularly extreme lengths. In the case of Windows executable malware detection, inputs may exceed $100$ MB, which corresponds to a time series with $T=100,000,000$ steps. To date, the closest approach to handling such a task is MalConv, a convolutional neural network capable of processing up to $T=2,000,000$ steps. The $\mathcal{O}(T)$ memory of CNNs has prevented further application of CNNs to malware. In this work, we develop a new approach to temporal max pooling that makes the required memory invariant to the sequence length $T$. This makes MalConv $116\times$ more memory efficient, and up to $25.8\times$ faster to train on its original dataset, while removing the input length restrictions to MalConv. We re-invest these gains into improving the MalConv architecture by developing a new Global Channel Gating design, giving us an attention mechanism capable of learning feature interactions across 100 million time steps in an efficient manner, a capability lacked by the original MalConv CNN. Our implementation can be found at this https URL
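The abstract does not spell out how the pooling is made independent of $T$, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that per-step channel features can be produced one chunk at a time, and that global max pooling only needs a running maximum and its position for each channel. The names `chunked`, `streaming_max_pool`, and `toy_features` are hypothetical and exist only for this illustration.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def chunked(steps: Iterable[Sequence[float]], chunk_size: int) -> Iterator[List[Sequence[float]]]:
    """Yield consecutive chunks of at most chunk_size time steps."""
    chunk: List[Sequence[float]] = []
    for step in steps:
        chunk.append(step)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def streaming_max_pool(
    steps: Iterable[Sequence[float]], num_channels: int, chunk_size: int = 4
) -> Tuple[List[float], List[int]]:
    """Global max over time per channel, holding one chunk in memory at a time.

    Returns the per-channel maximum and the time index where it first occurs.
    State is O(num_channels + chunk_size), independent of the sequence length.
    """
    best_val = [float("-inf")] * num_channels
    best_idx = [-1] * num_channels
    offset = 0  # global time index of the first step in the current chunk
    for chunk in chunked(steps, chunk_size):
        for t, features in enumerate(chunk):
            for c in range(num_channels):
                if features[c] > best_val[c]:
                    best_val[c] = features[c]
                    best_idx[c] = offset + t
        offset += len(chunk)
    return best_val, best_idx


def toy_features(T: int) -> Iterator[Tuple[int, int]]:
    """Hypothetical two-channel feature stream; the full sequence is never stored."""
    for t in range(T):
        yield ((t * 7) % 11, -abs(t - 5))


values, indices = streaming_max_pool(toy_features(20), num_channels=2, chunk_size=4)
print(values, indices)
```

Because the gradient of a max flows only through the winning position, recording the indices is what would let a training loop revisit just those positions later. That second step is an assumption about how such a scheme could be used and is not shown here.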

Comments:	To appear in AAAI 2021
Subjects:	Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2012.09390 [stat.ML]
	(or arXiv:2012.09390v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2012.09390

Submission history

From: Edward Raff [view email]
[v1] Thu, 17 Dec 2020 04:45:33 UTC (840 KB)
