Showing 1–2 of 2 results for author: Luxemburk, J

Search v0.5.6 released 2020-02-24

arXiv:2310.19568 [pdf, other]

cs.LG cs.NI

doi 10.1145/3630050.3630176

DataZoo: Streamlining Traffic Classification Experiments

Authors: Jan Luxemburk, Karel Hynek

Abstract: The machine learning communities, such as those around computer vision or natural language processing, have developed numerous supportive tools and benchmark datasets to accelerate the development. In contrast, the network traffic classification field lacks standard benchmark datasets for most tasks, and the available supportive software is rather limited in scope. This paper aims to address the g… ▽ More The machine learning communities, such as those around computer vision or natural language processing, have developed numerous supportive tools and benchmark datasets to accelerate the development. In contrast, the network traffic classification field lacks standard benchmark datasets for most tasks, and the available supportive software is rather limited in scope. This paper aims to address the gap and introduces DataZoo, a toolset designed to streamline dataset management in network traffic classification and to reduce the space for potential mistakes in the evaluation setup. DataZoo provides a standardized API for accessing three extensive datasets -- CESNET-QUIC22, CESNET-TLS22, and CESNET-TLS-Year22. Moreover, it includes methods for feature scaling and realistic dataset partitioning, taking into consideration temporal and service-related factors. The DataZoo toolset simplifies the creation of realistic evaluation scenarios, making it easier to cross-compare classification methods and reproduce results. △ Less

Submitted 30 October, 2023; originally announced October 2023.

Journal ref: SAFE '23: Proceedings of the 2023 on Explainable and Safety Bounded, Fidelitous, Machine Learning for Networking
arXiv:2202.11984 [pdf, other]

cs.LG cs.NI

doi 10.1016/j.comnet.2022.109467

Fine-grained TLS services classification with reject option

Authors: Jan Luxemburk, Tomáš Čejka

Abstract: The recent success and proliferation of machine learning and deep learning have provided powerful tools, which are also utilized for encrypted traffic analysis, classification, and threat detection in computer networks. These methods, neural networks in particular, are often complex and require a huge corpus of training data. Therefore, this paper focuses on collecting a large up-to-date dataset w… ▽ More The recent success and proliferation of machine learning and deep learning have provided powerful tools, which are also utilized for encrypted traffic analysis, classification, and threat detection in computer networks. These methods, neural networks in particular, are often complex and require a huge corpus of training data. Therefore, this paper focuses on collecting a large up-to-date dataset with almost 200 fine-grained service labels and 140 million network flows extended with packet-level metadata. The number of flows is three orders of magnitude higher than in other existing public labeled datasets of encrypted traffic. The number of service labels, which is important to make the problem hard and realistic, is four times higher than in the public dataset with the most class labels. The published dataset is intended as a benchmark for identifying services in encrypted traffic. Service identification can be further extended with the task of "rejecting" unknown services, i.e., the traffic not seen during the training phase. Neural networks offer superior performance for tackling this more challenging problem. To showcase the dataset's usefulness, we implemented a neural network with a multi-modal architecture, which is the state-of-the-art approach, and achieved 97.04% classification accuracy and detected 91.94% of unknown services with 5% false positive rate. △ Less

Submitted 29 November, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

Journal ref: Computer Networks, vol. 220, p. 109467, Jan. 2023

Search v0.5.6 released 2020-02-24