Search | arXiv e-print repository

arXiv:2310.05530 [pdf, other]

NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification

Authors: Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka

Abstract: Network traffic monitoring based on IP Flows is a standard monitoring approach that can be deployed to various network infrastructures, even the large IPS-based networks connecting millions of people. Since flow records traditionally contain only limited information (addresses, transport ports, and amount of exchanged data), they are also commonly extended for additional features that enable netwo… ▽ More Network traffic monitoring based on IP Flows is a standard monitoring approach that can be deployed to various network infrastructures, even the large IPS-based networks connecting millions of people. Since flow records traditionally contain only limited information (addresses, transport ports, and amount of exchanged data), they are also commonly extended for additional features that enable network traffic analysis with high accuracy. Nevertheless, the flow extensions are often too large or hard to compute, which limits their deployment only to smaller-sized networks. This paper proposes a novel extended IP flow called NetTiSA (Network Time Series Analysed), which is based on the analysis of the time series of packet sizes. By thoroughly testing 25 different network classification tasks, we show the broad applicability and high usability of NetTiSA, which often outperforms the best-performing related works. For practical deployment, we also consider the sizes of flows extended for NetTiSA and evaluate the performance impacts of its computation in the flow exporter. The novel feature set proved universal and deployable to high-speed ISP networks with 100\,Gbps lines; thus, it enables accurate and widespread network security protection. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: Submitted to The International Journal of Computer and Telecommunications Networking

arXiv:2307.13434 [pdf, other]

Network Traffic Classification based on Single Flow Time Series Analysis

Authors: Josef Koumar, Karel Hynek, Tomáš Čejka

Abstract: Network traffic monitoring using IP flows is used to handle the current challenge of analyzing encrypted network communication. Nevertheless, the packet aggregation into flow records naturally causes information loss; therefore, this paper proposes a novel flow extension for traffic features based on the time series analysis of the Single Flow Time series, i.e., a time series created by the number… ▽ More Network traffic monitoring using IP flows is used to handle the current challenge of analyzing encrypted network communication. Nevertheless, the packet aggregation into flow records naturally causes information loss; therefore, this paper proposes a novel flow extension for traffic features based on the time series analysis of the Single Flow Time series, i.e., a time series created by the number of bytes in each packet and its timestamp. We propose 69 universal features based on the statistical analysis of data points, time domain analysis, packet distribution within the flow timespan, time series behavior, and frequency domain analysis. We have demonstrated the usability and universality of the proposed feature vector for various network traffic classification tasks using 15 well-known publicly available datasets. Our evaluation shows that the novel feature vector achieves classification performance similar or better than related works on both binary and multiclass classification tasks. In more than half of the evaluated tasks, the classification performance increased by up to 5\%. △ Less

Submitted 25 July, 2023; originally announced July 2023.

Comments: Submitted to The 19th International Conference on Network and Service Management (CNSM) 2023

arXiv:2211.08399 [pdf, other]

Active Learning Framework to Automate NetworkTraffic Classification

Authors: Jaroslav Pešek, Dominik Soukup, Tomáš Čejka

Abstract: Recent network traffic classification methods benefitfrom machine learning (ML) technology. However, there aremany challenges due to use of ML, such as: lack of high-qualityannotated datasets, data-drifts and other effects causing aging ofdatasets and ML models, high volumes of network traffic etc. Thispaper argues that it is necessary to augment traditional workflowsof ML training&deployment and… ▽ More Recent network traffic classification methods benefitfrom machine learning (ML) technology. However, there aremany challenges due to use of ML, such as: lack of high-qualityannotated datasets, data-drifts and other effects causing aging ofdatasets and ML models, high volumes of network traffic etc. Thispaper argues that it is necessary to augment traditional workflowsof ML training&deployment and adapt Active Learning concepton network traffic analysis. The paper presents a novel ActiveLearning Framework (ALF) to address this topic. ALF providesprepared software components that can be used to deploy an activelearning loop and maintain an ALF instance that continuouslyevolves a dataset and ML model automatically. The resultingsolution is deployable for IP flow-based analysis of high-speed(100 Gb/s) networks, and also supports research experiments ondifferent strategies and methods for annotation, evaluation, datasetoptimization, etc. Finally, the paper lists some research challengesthat emerge from the first experiments with ALF in practice. △ Less

Submitted 26 October, 2022; originally announced November 2022.

arXiv:2202.11984 [pdf, other]

doi 10.1016/j.comnet.2022.109467

Fine-grained TLS services classification with reject option

Authors: Jan Luxemburk, Tomáš Čejka

Abstract: The recent success and proliferation of machine learning and deep learning have provided powerful tools, which are also utilized for encrypted traffic analysis, classification, and threat detection in computer networks. These methods, neural networks in particular, are often complex and require a huge corpus of training data. Therefore, this paper focuses on collecting a large up-to-date dataset w… ▽ More The recent success and proliferation of machine learning and deep learning have provided powerful tools, which are also utilized for encrypted traffic analysis, classification, and threat detection in computer networks. These methods, neural networks in particular, are often complex and require a huge corpus of training data. Therefore, this paper focuses on collecting a large up-to-date dataset with almost 200 fine-grained service labels and 140 million network flows extended with packet-level metadata. The number of flows is three orders of magnitude higher than in other existing public labeled datasets of encrypted traffic. The number of service labels, which is important to make the problem hard and realistic, is four times higher than in the public dataset with the most class labels. The published dataset is intended as a benchmark for identifying services in encrypted traffic. Service identification can be further extended with the task of "rejecting" unknown services, i.e., the traffic not seen during the training phase. Neural networks offer superior performance for tackling this more challenging problem. To showcase the dataset's usefulness, we implemented a neural network with a multi-modal architecture, which is the state-of-the-art approach, and achieved 97.04% classification accuracy and detected 91.94% of unknown services with 5% false positive rate. △ Less

Submitted 29 November, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

Journal ref: Computer Networks, vol. 220, p. 109467, Jan. 2023

arXiv:2107.04436 [pdf, other]

Large Scale Measurement on the Adoption of Encrypted DNS

Authors: Sebastián García, Karel Hynek, Dmtrii Vekshin, Tomáš Čejka, Armin Wasicek

Abstract: Several encryption proposals for DNS have been presented since 2016, but their adoption was not comprehensively studied yet. This research measured the current adoption of DoH (DNS over HTTPS), DoT (DNS over TLS), and DoQ (DNS over QUIC) for five months at the beginning of 2021 by three different organizations with global coverage. By comparing the total values, amount of requests per user, and th… ▽ More Several encryption proposals for DNS have been presented since 2016, but their adoption was not comprehensively studied yet. This research measured the current adoption of DoH (DNS over HTTPS), DoT (DNS over TLS), and DoQ (DNS over QUIC) for five months at the beginning of 2021 by three different organizations with global coverage. By comparing the total values, amount of requests per user, and the seasonality of the traffic, it was possible to obtain the current adoption trends. Moreover, we actively scanned the Internet for still-unknown working DoH servers and we compared them with a novel curated list of well-known DoH servers. We conclude that despite growing in 2020, during the first five months of 2021 there was statistically significant evidence that the average amount of Internet traffic for DoH, DoT and DoQ remained stationary. However, we found that the amount of, still unknown and ready to use, DoH servers grew 4 times. These measurements suggest that even though the amount of encrypted DNS is currently not growing, there may probably be more connections soon to those unknown DoH servers for benign and malicious purposes. △ Less

Submitted 9 July, 2021; originally announced July 2021.

Comments: 16 pages, 10 figures

ACM Class: C.2.0; C.2.2

Showing 1–5 of 5 results for author: Čejka, T