-
A Novel Open Set Energy-based Flow Classifier for Network Intrusion Detection
Authors:
Manuela M. C. Souza,
Camila Pontes,
Joao Gondim,
Luis P. F. Garcia,
Luiz DaSilva,
Marcelo A. Marotta
Abstract:
Network intrusion detection systems (NIDS) are one of many solutions that make up a computer security system. Several machine learning-based NIDS have been proposed in recent years, but most of them were developed and evaluated under the assumption that the training context is similar to the test context. In real networks, this assumption is false, given the emergence of new attacks and variants o…
▽ More
Network intrusion detection systems (NIDS) are one of many solutions that make up a computer security system. Several machine learning-based NIDS have been proposed in recent years, but most of them were developed and evaluated under the assumption that the training context is similar to the test context. In real networks, this assumption is false, given the emergence of new attacks and variants of known attacks. To deal with this reality, the open set recognition field, which is the most general task of recognizing classes not seen during training in any domain, began to gain importance in NIDS research. Yet, existing solutions are often bounded to high temporal complexities and performance bottlenecks. In this work, we propose an algorithm to be used in NIDS that performs open set recognition. Our proposal is an adaptation of the single-class Energy-based Flow Classifier (EFC), which proved to be an algorithm with strong generalization capability and low computational cost. The new version of EFC correctly classifies not only known attacks, but also unknown ones, and differs from other proposals from the literature by presenting a single layer with low temporal complexity. Our proposal was evaluated against well-established multi-class algorithms and as an open set classifier. It proved to be an accurate classifier in both evaluations, similar to the state of the art. As a conclusion of our work, we consider EFC a promising algorithm to be used in NIDS for its high performance and applicability in real networks.
△ Less
Submitted 26 April, 2022; v1 submitted 23 September, 2021;
originally announced September 2021.
-
Characterizing classification datasets: a study of meta-features for meta-learning
Authors:
Adriano Rivolli,
Luís P. F. Garcia,
Carlos Soares,
Joaquin Vanschoren,
André C. P. L. F. de Carvalho
Abstract:
Meta-learning is increasingly used to support the recommendation of machine learning algorithms and their configurations. Such recommendations are made based on meta-data, consisting of performance evaluations of algorithms on prior datasets, as well as characterizations of these datasets. These characterizations, also called meta-features, describe properties of the data which are predictive for…
▽ More
Meta-learning is increasingly used to support the recommendation of machine learning algorithms and their configurations. Such recommendations are made based on meta-data, consisting of performance evaluations of algorithms on prior datasets, as well as characterizations of these datasets. These characterizations, also called meta-features, describe properties of the data which are predictive for the performance of machine learning algorithms trained on them. Unfortunately, despite being used in a large number of studies, meta-features are not uniformly described, organized and computed, making many empirical studies irreproducible and hard to compare. This paper aims to deal with this by systematizing and standardizing data characterization measures for classification datasets used in meta-learning. Moreover, it presents MFE, a new tool for extracting meta-features from datasets and identifying more subtle reproducibility issues in the literature, proposing guidelines for data characterization that strengthen reproducible empirical research in meta-learning.
△ Less
Submitted 26 August, 2019; v1 submitted 30 August, 2018;
originally announced August 2018.
-
How Complex is your classification problem? A survey on measuring classification complexity
Authors:
Ana C. Lorena,
Luís P. F. Garcia,
Jens Lehmann,
Marcilio C. P. Souto,
Tin K. Ho
Abstract:
Characteristics extracted from the training datasets of classification problems have proven to be effective predictors in a number of meta-analyses. Among them, measures of classification complexity can be used to estimate the difficulty in separating the data points into their expected classes. Descriptors of the spatial distribution of the data and estimates of the shape and size of the decision…
▽ More
Characteristics extracted from the training datasets of classification problems have proven to be effective predictors in a number of meta-analyses. Among them, measures of classification complexity can be used to estimate the difficulty in separating the data points into their expected classes. Descriptors of the spatial distribution of the data and estimates of the shape and size of the decision boundary are among the known measures for this characterization. This information can support the formulation of new data-driven pre-processing and pattern recognition techniques, which can in turn be focused on challenges highlighted by such characteristics of the problems. This paper surveys and analyzes measures which can be extracted from the training datasets in order to characterize the complexity of the respective classification problems. Their use in recent literature is also reviewed and discussed, allowing to prospect opportunities for future work in the area. Finally, descriptions are given on an R package named Extended Complexity Library (ECoL) that implements a set of complexity measures and is made publicly available.
△ Less
Submitted 30 December, 2020; v1 submitted 10 August, 2018;
originally announced August 2018.