-
AD-NEv++ : The multi-architecture neuroevolution-based multivariate anomaly detection framework
Authors:
Marcin Pietroń,
Dominik Żurek,
Kamil Faber,
Roberto Corizzo
Abstract:
Anomaly detection tools and methods enable key analytical capabilities in modern cyberphysical and sensor-based systems. Despite the fast-paced development in deep learning architectures for anomaly detection, model optimization for a given dataset is a cumbersome and time-consuming process. Neuroevolution could be an effective and efficient solution to this problem, as a fully automated search me…
▽ More
Anomaly detection tools and methods enable key analytical capabilities in modern cyberphysical and sensor-based systems. Despite the fast-paced development in deep learning architectures for anomaly detection, model optimization for a given dataset is a cumbersome and time-consuming process. Neuroevolution could be an effective and efficient solution to this problem, as a fully automated search method for learning optimal neural networks, supporting both gradient and non-gradient fine tuning. However, existing frameworks incorporating neuroevolution lack of support for new layers and architectures and are typically limited to convolutional and LSTM layers. In this paper we propose AD-NEv++, a three-stage neuroevolution-based method that synergically combines subspace evolution, model evolution, and fine-tuning. Our method overcomes the limitations of existing approaches by optimizing the mutation operator in the neuroevolution process, while supporting a wide spectrum of neural layers, including attention, dense, and graph convolutional layers. Our extensive experimental evaluation was conducted with widely adopted multivariate anomaly detection benchmark datasets, and showed that the models generated by AD-NEv++ outperform well-known deep learning architectures and neuroevolution-based approaches for anomaly detection. Moreover, results show that AD-NEv++ can improve and outperform the state-of-the-art GNN (Graph Neural Networks) model architecture in all anomaly detection benchmarks.
△ Less
Submitted 25 March, 2024;
originally announced April 2024.
-
Towards efficient deep autoencoders for multivariate time series anomaly detection
Authors:
Marcin Pietroń,
Dominik Żurek,
Kamil Faber,
Roberto Corizzo
Abstract:
Multivariate time series anomaly detection is a crucial problem in many industrial and research applications. Timely detection of anomalies allows, for instance, to prevent defects in manufacturing processes and failures in cyberphysical systems. Deep learning methods are preferred among others for their accuracy and robustness for the analysis of complex multivariate data. However, a key aspect i…
▽ More
Multivariate time series anomaly detection is a crucial problem in many industrial and research applications. Timely detection of anomalies allows, for instance, to prevent defects in manufacturing processes and failures in cyberphysical systems. Deep learning methods are preferred among others for their accuracy and robustness for the analysis of complex multivariate data. However, a key aspect is being able to extract predictions in a timely manner, to accommodate real-time requirements in different applications. In the case of deep learning models, model reduction is extremely important to achieve optimal results in real-time systems with limited time and memory constraints. In this paper, we address this issue by proposing a novel compression method for deep autoencoders that involves three key factors. First, pruning reduces the number of weights, while preventing catastrophic drops in accuracy by means of a fast search process that identifies high sparsity levels. Second, linear and non-linear quantization reduces model complexity by reducing the number of bits for every single weight. The combined contribution of these three aspects allow the model size to be reduced, by removing a subset of the weights (pruning), and decreasing their bit-width (quantization). As a result, the compressed model is faster and easier to adopt in highly constrained hardware environments. Experiments performed on popular multivariate anomaly detection benchmarks, show that our method is capable of achieving significant model compression ratio (between 80% and 95%) without a significant reduction in the anomaly detection performance.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
Using LLMs to discover emerging coded antisemitic hate-speech in extremist social media
Authors:
Dhanush Kikkisetti,
Raza Ul Mustafa,
Wendy Melillo,
Roberto Corizzo,
Zois Boukouvalas,
Jeff Gill,
Nathalie Japkowicz
Abstract:
Online hate speech proliferation has created a difficult problem for social media platforms. A particular challenge relates to the use of coded language by groups interested in both creating a sense of belonging for its users and evading detection. Coded language evolves quickly and its use varies over time. This paper proposes a methodology for detecting emerging coded hate-laden terminology. The…
▽ More
Online hate speech proliferation has created a difficult problem for social media platforms. A particular challenge relates to the use of coded language by groups interested in both creating a sense of belonging for its users and evading detection. Coded language evolves quickly and its use varies over time. This paper proposes a methodology for detecting emerging coded hate-laden terminology. The methodology is tested in the context of online antisemitic discourse. The approach considers posts scraped from social media platforms, often used by extremist users. The posts are scraped using seed expressions related to previously known discourse of hatred towards Jews. The method begins by identifying the expressions most representative of each post and calculating their frequency in the whole corpus. It filters out grammatically incoherent expressions as well as previously encountered ones so as to focus on emergent well-formed terminology. This is followed by an assessment of semantic similarity to known antisemitic terminology using a fine-tuned large language model, and subsequent filtering out of the expressions that are too distant from known expressions of hatred. Emergent antisemitic expressions containing terms clearly relating to Jewish topics are then removed to return only coded expressions of hatred.
△ Less
Submitted 23 January, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
Ada-QPacknet -- adaptive pruning with bit width reduction as an efficient continual learning method without forgetting
Authors:
Marcin Pietroń,
Dominik Żurek,
Kamil Faber,
Roberto Corizzo
Abstract:
Continual Learning (CL) is a process in which there is still huge gap between human and deep learning model efficiency. Recently, many CL algorithms were designed. Most of them have many problems with learning in dynamic and complex environments. In this work new architecture based approach Ada-QPacknet is described. It incorporates the pruning for extracting the sub-network for each task. The cru…
▽ More
Continual Learning (CL) is a process in which there is still huge gap between human and deep learning model efficiency. Recently, many CL algorithms were designed. Most of them have many problems with learning in dynamic and complex environments. In this work new architecture based approach Ada-QPacknet is described. It incorporates the pruning for extracting the sub-network for each task. The crucial aspect in architecture based CL methods is theirs capacity. In presented method the size of the model is reduced by efficient linear and nonlinear quantisation approach. The method reduces the bit-width of the weights format. The presented results shows that low bit quantisation achieves similar accuracy as floating-point sub-network on a well-know CL scenarios. To our knowledge it is the first CL strategy which incorporates both compression techniques pruning and quantisation for generating task sub-networks. The presented algorithm was tested on well-known episode combinations and compared with most popular algorithms. Results show that proposed approach outperforms most of the CL strategies in task and class incremental scenarios.
△ Less
Submitted 1 October, 2023; v1 submitted 14 August, 2023;
originally announced August 2023.
-
AD-NEV: A Scalable Multi-level Neuroevolution Framework for Multivariate Anomaly Detection
Authors:
Marcin Pietron,
Dominik Zurek,
Kamil Faber,
Roberto Corizzo
Abstract:
Anomaly detection tools and methods present a key capability in modern cyberphysical and failure prediction systems. Despite the fast-paced development in deep learning architectures for anomaly detection, model optimization for a given dataset is a cumbersome and time consuming process. Neuroevolution could be an effective and efficient solution to this problem, as a fully automated search method…
▽ More
Anomaly detection tools and methods present a key capability in modern cyberphysical and failure prediction systems. Despite the fast-paced development in deep learning architectures for anomaly detection, model optimization for a given dataset is a cumbersome and time consuming process. Neuroevolution could be an effective and efficient solution to this problem, as a fully automated search method for learning optimal neural networks, supporting both gradient and non-gradient fine tuning. However, existing methods mostly focus on optimizing model architectures without taking into account feature subspaces and model weights. In this work, we propose Anomaly Detection Neuroevolution (AD-NEv) - a scalable multi-level optimized neuroevolution framework for multivariate time series anomaly detection. The method represents a novel approach to synergically: i) optimize feature subspaces for an ensemble model based on the bagging technique; ii) optimize the model architecture of single anomaly detection models; iii) perform non-gradient fine-tuning of network weights. An extensive experimental evaluation on widely adopted multivariate anomaly detection benchmark datasets shows that the models extracted by AD-NEv outperform well-known deep learning architectures for anomaly detection. Moreover, results show that AD-NEv can perform the whole process efficiently, presenting high scalability when multiple GPUs are available.
△ Less
Submitted 25 May, 2023;
originally announced May 2023.
-
From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning
Authors:
Kamil Faber,
Dominik Zurek,
Marcin Pietron,
Nathalie Japkowicz,
Antonio Vergari,
Roberto Corizzo
Abstract:
Continual learning (CL) is one of the most promising trends in recent machine learning research. Its goal is to go beyond classical assumptions in machine learning and develop models and learning strategies that present high robustness in dynamic environments. The landscape of CL research is fragmented into several learning evaluation protocols, comprising different learning tasks, datasets, and e…
▽ More
Continual learning (CL) is one of the most promising trends in recent machine learning research. Its goal is to go beyond classical assumptions in machine learning and develop models and learning strategies that present high robustness in dynamic environments. The landscape of CL research is fragmented into several learning evaluation protocols, comprising different learning tasks, datasets, and evaluation metrics. Additionally, the benchmarks adopted so far are still distant from the complexity of real-world scenarios, and are usually tailored to highlight capabilities specific to certain strategies. In such a landscape, it is hard to objectively assess strategies. In this work, we fill this gap for CL on image data by introducing two novel CL benchmarks that involve multiple heterogeneous tasks from six image datasets, with varying levels of complexity and quality. Our aim is to fairly evaluate current state-of-the-art CL strategies on a common ground that is closer to complex real-world scenarios. We additionally structure our benchmarks so that tasks are presented in increasing and decreasing order of complexity -- according to a curriculum -- in order to evaluate if current CL models are able to exploit structure across tasks. We devote particular emphasis to providing the CL community with a rigorous and reproducible evaluation protocol for measuring the ability of a model to generalize and not to forget while learning. Furthermore, we provide an extensive experimental evaluation showing that popular CL strategies, when challenged with our benchmarks, yield sub-par performance, high levels of forgetting, and present a limited ability to effectively leverage curriculum task ordering. We believe that these results highlight the need for rigorous comparisons in future CL works as well as pave the way to design new CL strategies that are able to deal with more complex scenarios.
△ Less
Submitted 16 March, 2023;
originally announced March 2023.
-
Lifelong Continual Learning for Anomaly Detection: New Challenges, Perspectives, and Insights
Authors:
Kamil Faber,
Roberto Corizzo,
Bartlomiej Sniezynski,
Nathalie Japkowicz
Abstract:
Anomaly detection is of paramount importance in many real-world domains, characterized by evolving behavior. Lifelong learning represents an emerging trend, answering the need for machine learning models that continuously adapt to new challenges in dynamic environments while retaining past knowledge. However, limited efforts are dedicated to building foundations for lifelong anomaly detection, whi…
▽ More
Anomaly detection is of paramount importance in many real-world domains, characterized by evolving behavior. Lifelong learning represents an emerging trend, answering the need for machine learning models that continuously adapt to new challenges in dynamic environments while retaining past knowledge. However, limited efforts are dedicated to building foundations for lifelong anomaly detection, which provides intrinsically different challenges compared to the more widely explored classification setting. In this paper, we face this issue by exploring, motivating, and discussing lifelong anomaly detection, trying to build foundations for its wider adoption. First, we explain why lifelong anomaly detection is relevant, defining challenges and opportunities to design anomaly detection methods that deal with lifelong learning complexities. Second, we characterize learning settings and a scenario generation procedure that enables researchers to experiment with lifelong anomaly detection using existing datasets. Third, we perform experiments with popular anomaly detection methods on proposed lifelong scenarios, emphasizing the gap in performance that could be gained with the adoption of lifelong learning. Overall, we conclude that the adoption of lifelong anomaly detection is important to design more robust models that provide a comprehensive view of the environment, as well as simultaneous adaptation and knowledge retention.
△ Less
Submitted 2 April, 2024; v1 submitted 13 March, 2023;
originally announced March 2023.
-
System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games
Authors:
Indranil Sur,
Zachary Daniels,
Abrar Rahman,
Kamil Faber,
Gianmarco J. Gallardo,
Tyler L. Hayes,
Cameron E. Taylor,
Mustafa Burak Gurbuz,
James Smith,
Sahana Joshi,
Nathalie Japkowicz,
Michael Baron,
Zsolt Kira,
Christopher Kanan,
Roberto Corizzo,
Ajay Divakaran,
Michael Piacentino,
Jesse Hostetler,
Aswin Raghavan
Abstract:
As Artificial and Robotic Systems are increasingly deployed and relied upon for real-world applications, it is important that they exhibit the ability to continually learn and adapt in dynamically-changing environments, becoming Lifelong Learning Machines. Continual/lifelong learning (LL) involves minimizing catastrophic forgetting of old tasks while maximizing a model's capability to learn new ta…
▽ More
As Artificial and Robotic Systems are increasingly deployed and relied upon for real-world applications, it is important that they exhibit the ability to continually learn and adapt in dynamically-changing environments, becoming Lifelong Learning Machines. Continual/lifelong learning (LL) involves minimizing catastrophic forgetting of old tasks while maximizing a model's capability to learn new tasks. This paper addresses the challenging lifelong reinforcement learning (L2RL) setting. Pushing the state-of-the-art forward in L2RL and making L2RL useful for practical applications requires more than develo** individual L2RL algorithms; it requires making progress at the systems-level, especially research into the non-trivial problem of how to integrate multiple L2RL algorithms into a common framework. In this paper, we introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF), which standardizes L2RL systems and assimilates different continual learning components (each addressing different aspects of the lifelong learning problem) into a unified system. As an instantiation of L2RLCF, we develop a standard API allowing easy integration of novel lifelong learning components. We describe a case study that demonstrates how multiple independently-developed LL components can be integrated into a single realized system. We also introduce an evaluation environment in order to measure the effect of combining various system components. Our evaluation environment employs different LL scenarios (sequences of tasks) consisting of Starcraft-2 minigames and allows for the fair, comprehensive, and quantitative comparison of different combinations of components within a challenging common evaluation environment.
△ Less
Submitted 8 December, 2022;
originally announced December 2022.
-
WATCH: Wasserstein Change Point Detection for High-Dimensional Time Series Data
Authors:
Kamil Faber,
Roberto Corizzo,
Bartlomiej Sniezynski,
Michael Baron,
Nathalie Japkowicz
Abstract:
Detecting relevant changes in dynamic time series data in a timely manner is crucially important for many data analysis tasks in real-world settings. Change point detection methods have the ability to discover changes in an unsupervised fashion, which represents a desirable property in the analysis of unbounded and unlabeled data streams. However, one limitation of most of the existing approaches…
▽ More
Detecting relevant changes in dynamic time series data in a timely manner is crucially important for many data analysis tasks in real-world settings. Change point detection methods have the ability to discover changes in an unsupervised fashion, which represents a desirable property in the analysis of unbounded and unlabeled data streams. However, one limitation of most of the existing approaches is represented by their limited ability to handle multivariate and high-dimensional data, which is frequently observed in modern applications such as traffic flow prediction, human activity recognition, and smart grids monitoring. In this paper, we attempt to fill this gap by proposing WATCH, a novel Wasserstein distance-based change point detection approach that models an initial distribution and monitors its behavior while processing new data points, providing accurate and robust detection of change points in dynamic high-dimensional data. An extensive experimental evaluation involving a large number of benchmark datasets shows that WATCH is capable of accurately identifying change points and outperforming state-of-the-art methods.
△ Less
Submitted 18 January, 2022;
originally announced January 2022.
-
Fast and scalable neuroevolution deep learning architecture search for multivariate anomaly detection
Authors:
M. Pietroń,
D. Żurek,
K. Faber,
R. Corizzo
Abstract:
Neuroevolution is one of the methodologies that can be used for learning optimal architecture during training. It uses evolutionary algorithms to generate the topology of artificial neural networks and its parameters. The main benefits are that it is scalable and can be fully or partially non gradient method. In this work, a modified neuroevolution technique is presented which incorporates multi-l…
▽ More
Neuroevolution is one of the methodologies that can be used for learning optimal architecture during training. It uses evolutionary algorithms to generate the topology of artificial neural networks and its parameters. The main benefits are that it is scalable and can be fully or partially non gradient method. In this work, a modified neuroevolution technique is presented which incorporates multi-level optimisation. The presented approach adapts evolution strategies for evolving an ensemble model based on the bagging technique, using genetic operators for optimising single anomaly detection models, reducing the training dataset to speedup the search process and perform non-gradient fine tuning. Multivariate anomaly detection as an unsupervised learning task is the case study upon which the presented approach is tested. Single model optimisation is based on mutation and crossover operators and is focused on finding optimal window sizes, the number of layers, layer depths, hyperparameters etc. to boost the anomaly detection scores of new and already known models. The proposed framework and its protocol shows that it is possible to find architecture within a reasonable time frame which can boost all well known multivariate anomaly detection deep learning architectures. The work concentrates on improvements to the multi-level neuroevolution approach for anomaly detection. The main modifications are in the methods of mixing groups and single model evolution, non-gradient fine tuning and a voting mechanism. The presented framework can be used as an efficient learning network architecture method for any different unsupervised task where autoencoder architectures can be used. The tests were run on SWAT, WADI, MSL and SMAP datasets and the presented approach evolved the architectures that achieved the best scores among other deep learning models.
△ Less
Submitted 27 August, 2022; v1 submitted 10 December, 2021;
originally announced December 2021.
-
On the combined effect of class imbalance and concept complexity in deep learning
Authors:
Kushankur Ghosh,
Colin Bellinger,
Roberto Corizzo,
Bartosz Krawczyk,
Nathalie Japkowicz
Abstract:
Structural concept complexity, class overlap, and data scarcity are some of the most important factors influencing the performance of classifiers under class imbalance conditions. When these effects were uncovered in the early 2000s, understandably, the classifiers on which they were demonstrated belonged to the classical rather than Deep Learning categories of approaches. As Deep Learning is gain…
▽ More
Structural concept complexity, class overlap, and data scarcity are some of the most important factors influencing the performance of classifiers under class imbalance conditions. When these effects were uncovered in the early 2000s, understandably, the classifiers on which they were demonstrated belonged to the classical rather than Deep Learning categories of approaches. As Deep Learning is gaining ground over classical machine learning and is beginning to be used in critical applied settings, it is important to assess systematically how well they respond to the kind of challenges their classical counterparts have struggled with in the past two decades. The purpose of this paper is to study the behavior of deep learning systems in settings that have previously been deemed challenging to classical machine learning systems to find out whether the depth of the systems is an asset in such settings. The results in both artificial and real-world image datasets (MNIST Fashion, CIFAR-10) show that these settings remain mostly challenging for Deep Learning systems and that deeper architectures seem to help with structural concept complexity but not with overlap challenges in simple artificial domains. Data scarcity is not overcome by deeper layers, either. In the real-world image domains, where overfitting is a greater concern than in the artificial domains, the advantage of deeper architectures is less obvious: while it is observed in certain cases, it is quickly cancelled as models get deeper and perform worse than their shallower counterparts.
△ Less
Submitted 29 July, 2021;
originally announced July 2021.
-
ReMix: Calibrated Resampling for Class Imbalance in Deep learning
Authors:
Colin Bellinger,
Roberto Corizzo,
Nathalie Japkowicz
Abstract:
Class imbalance is a problem of significant importance in applied deep learning where trained models are exploited for decision support and automated decisions in critical areas such as health and medicine, transportation, and finance. The challenge of learning deep models from imbalanced training data remains high, and the state-of-the-art solutions are typically data dependent and primarily focu…
▽ More
Class imbalance is a problem of significant importance in applied deep learning where trained models are exploited for decision support and automated decisions in critical areas such as health and medicine, transportation, and finance. The challenge of learning deep models from imbalanced training data remains high, and the state-of-the-art solutions are typically data dependent and primarily focused on image data. Real-world imbalanced classification problems, however, are much more diverse thus necessitating a general solution that can be applied to tabular, image and text data. In this paper, we propose ReMix, a training technique that leverages batch resampling, instance mixing and soft-labels to enable the induction of robust deep models for imbalanced learning. Our results show that dense nets and CNNs trained with ReMix generally outperform the alternatives according to the g-mean and are better calibrated according to the balanced Brier score.
△ Less
Submitted 3 December, 2020;
originally announced December 2020.