
doi 10.1145/3605098.3636046

Understanding the Process of Data Labeling in Cybersecurity

Authors: Tobias Braun, Irdin Pekaric, Giovanni Apruzzese

Abstract: Many domains now leverage the benefits of Machine Learning (ML), which promises solutions that can autonomously learn to solve complex tasks by training over some data. Unfortunately, in cyberthreat detection, high-quality data is hard to come by. Moreover, for some specific applications of ML, such data must be labeled by human operators. Many works "assume" that labeling is tough/challenging/costly in cyberthreat detection, thereby proposing solutions to address such a hurdle. Yet, we found no work that specifically addresses the process of labeling 'from the viewpoint of ML security practitioners'. This is a problem: to this date, it is still mostly unknown how labeling is done in practice -- thereby preventing one from pinpointing "what is needed" in the real world. In this paper, we take the first step to build a bridge between academic research and security practice in the context of data labeling. First, we reach out to five subject matter experts and carry out open interviews to identify pain points in their labeling routines. Then, by using our findings as a scaffold, we conduct a user study with 13 practitioners from large security companies, and ask detailed questions on subjects such as active learning, costs of labeling, and revision of labels. Finally, we perform proof-of-concept experiments addressing labeling-related aspects in cyberthreat detection that are sometimes overlooked in research. Altogether, our contributions and recommendations serve as a stepping stone to future endeavors aimed at improving the quality and robustness of ML-driven security systems. We release our resources.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2310.00654 [pdf, other]

Streamlining Attack Tree Generation: A Fragment-Based Approach

Authors: Irdin Pekaric, Markus Frick, Jubril Gbolahan Adigun, Raffaela Groner, Thomas Witte, Alexander Raschke, Michael Felderer, Matthias Tichy

Abstract: Attack graphs are a tool for analyzing security vulnerabilities that capture different and prospective attacks on a system. As a threat modeling tool, they show possible paths that an attacker can exploit to achieve a particular goal. However, due to the large number of vulnerabilities that are published on a daily basis, they have the potential to rapidly expand in size. Consequently, generating attack graphs requires a significant amount of resources. In addition, generating composite attack models for complex systems, such as self-adaptive or AI systems, is very difficult because these systems change continuously. In this paper, we present a novel fragment-based attack graph generation approach that utilizes information from publicly available information security databases. Furthermore, we also propose a domain-specific language for attack modeling, which we employ in the proposed attack graph generation approach. Finally, we present a demonstrator example showcasing the attack generator's capability to replicate a verified attack chain, as previously confirmed by security experts.

Submitted 1 October, 2023; originally announced October 2023.

Comments: To appear at the 57th Hawaii International Conference on System Sciences (HICSS-57), Honolulu, Hawaii, 2024

arXiv:2309.09941 [pdf, other]

doi 10.1007/978-3-031-40923-3_9

Model-Based Generation of Attack-Fault Trees

Authors: Raffaela Groner, Thomas Witte, Alexander Raschke, Sophie Hirn, Irdin Pekaric, Markus Frick, Matthias Tichy, Michael Felderer

Abstract: Joint safety and security analysis of cyber-physical systems is a necessary step to correctly capture inter-dependencies between these properties. Attack-Fault Trees (AFTs) represent a combination of dynamic Fault Trees and Attack Trees and can be used to model and model-check a holistic view on both safety and security. Manually creating a complete AFT for the whole system is, however, a daunting task. It needs to span multiple abstraction layers, e.g., abstract application architecture and data flow as well as system and library dependencies that are affected by various vulnerabilities. We present an AFT generation tool-chain that facilitates this task using partial Fault and Attack Trees that are either manually created or mined from vulnerability databases. We semi-automatically create two system models that provide the necessary information to automatically combine these partial Fault and Attack Trees into complete AFTs using graph transformation rules.