-
DataZoo: Streamlining Traffic Classification Experiments
Authors:
Jan Luxemburk,
Karel Hynek
Abstract:
The machine learning communities, such as those around computer vision or natural language processing, have developed numerous supportive tools and benchmark datasets to accelerate the development. In contrast, the network traffic classification field lacks standard benchmark datasets for most tasks, and the available supportive software is rather limited in scope. This paper aims to address the g…
▽ More
The machine learning communities, such as those around computer vision or natural language processing, have developed numerous supportive tools and benchmark datasets to accelerate the development. In contrast, the network traffic classification field lacks standard benchmark datasets for most tasks, and the available supportive software is rather limited in scope. This paper aims to address the gap and introduces DataZoo, a toolset designed to streamline dataset management in network traffic classification and to reduce the space for potential mistakes in the evaluation setup. DataZoo provides a standardized API for accessing three extensive datasets -- CESNET-QUIC22, CESNET-TLS22, and CESNET-TLS-Year22. Moreover, it includes methods for feature scaling and realistic dataset partitioning, taking into consideration temporal and service-related factors. The DataZoo toolset simplifies the creation of realistic evaluation scenarios, making it easier to cross-compare classification methods and reproduce results.
△ Less
Submitted 30 October, 2023;
originally announced October 2023.
-
NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification
Authors:
Josef Koumar,
Karel Hynek,
Jaroslav Pešek,
Tomáš Čejka
Abstract:
Network traffic monitoring based on IP Flows is a standard monitoring approach that can be deployed to various network infrastructures, even the large IPS-based networks connecting millions of people. Since flow records traditionally contain only limited information (addresses, transport ports, and amount of exchanged data), they are also commonly extended for additional features that enable netwo…
▽ More
Network traffic monitoring based on IP Flows is a standard monitoring approach that can be deployed to various network infrastructures, even the large IPS-based networks connecting millions of people. Since flow records traditionally contain only limited information (addresses, transport ports, and amount of exchanged data), they are also commonly extended for additional features that enable network traffic analysis with high accuracy. Nevertheless, the flow extensions are often too large or hard to compute, which limits their deployment only to smaller-sized networks. This paper proposes a novel extended IP flow called NetTiSA (Network Time Series Analysed), which is based on the analysis of the time series of packet sizes. By thoroughly testing 25 different network classification tasks, we show the broad applicability and high usability of NetTiSA, which often outperforms the best-performing related works. For practical deployment, we also consider the sizes of flows extended for NetTiSA and evaluate the performance impacts of its computation in the flow exporter. The novel feature set proved universal and deployable to high-speed ISP networks with 100\,Gbps lines; thus, it enables accurate and widespread network security protection.
△ Less
Submitted 9 October, 2023;
originally announced October 2023.
-
Network Traffic Classification based on Single Flow Time Series Analysis
Authors:
Josef Koumar,
Karel Hynek,
Tomáš Čejka
Abstract:
Network traffic monitoring using IP flows is used to handle the current challenge of analyzing encrypted network communication. Nevertheless, the packet aggregation into flow records naturally causes information loss; therefore, this paper proposes a novel flow extension for traffic features based on the time series analysis of the Single Flow Time series, i.e., a time series created by the number…
▽ More
Network traffic monitoring using IP flows is used to handle the current challenge of analyzing encrypted network communication. Nevertheless, the packet aggregation into flow records naturally causes information loss; therefore, this paper proposes a novel flow extension for traffic features based on the time series analysis of the Single Flow Time series, i.e., a time series created by the number of bytes in each packet and its timestamp. We propose 69 universal features based on the statistical analysis of data points, time domain analysis, packet distribution within the flow timespan, time series behavior, and frequency domain analysis. We have demonstrated the usability and universality of the proposed feature vector for various network traffic classification tasks using 15 well-known publicly available datasets. Our evaluation shows that the novel feature vector achieves classification performance similar or better than related works on both binary and multiclass classification tasks. In more than half of the evaluated tasks, the classification performance increased by up to 5\%.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Large Scale Measurement on the Adoption of Encrypted DNS
Authors:
Sebastián García,
Karel Hynek,
Dmtrii Vekshin,
Tomáš Čejka,
Armin Wasicek
Abstract:
Several encryption proposals for DNS have been presented since 2016, but their adoption was not comprehensively studied yet. This research measured the current adoption of DoH (DNS over HTTPS), DoT (DNS over TLS), and DoQ (DNS over QUIC) for five months at the beginning of 2021 by three different organizations with global coverage. By comparing the total values, amount of requests per user, and th…
▽ More
Several encryption proposals for DNS have been presented since 2016, but their adoption was not comprehensively studied yet. This research measured the current adoption of DoH (DNS over HTTPS), DoT (DNS over TLS), and DoQ (DNS over QUIC) for five months at the beginning of 2021 by three different organizations with global coverage. By comparing the total values, amount of requests per user, and the seasonality of the traffic, it was possible to obtain the current adoption trends. Moreover, we actively scanned the Internet for still-unknown working DoH servers and we compared them with a novel curated list of well-known DoH servers. We conclude that despite growing in 2020, during the first five months of 2021 there was statistically significant evidence that the average amount of Internet traffic for DoH, DoT and DoQ remained stationary. However, we found that the amount of, still unknown and ready to use, DoH servers grew 4 times. These measurements suggest that even though the amount of encrypted DNS is currently not growing, there may probably be more connections soon to those unknown DoH servers for benign and malicious purposes.
△ Less
Submitted 9 July, 2021;
originally announced July 2021.