-
Cost-Driven Data Replication with Predictions
Authors:
Tianyu Zuo,
Xueyan Tang,
Bu Sung Lee
Abstract:
This paper studies an online replication problem for distributed data access. The goal is to dynamically create and delete data copies in a multi-server system as time passes to minimize the total storage and network cost of serving access requests. We study the problem in the emerging learning-augmented setting, assuming simple binary predictions about inter-request times at individual servers. We develop an online algorithm and prove that it is ($\frac{5+α}{3}$)-consistent (competitiveness under perfect predictions) and ($1 + \frac{1}{α}$)-robust (competitiveness under terrible predictions), where $α \in (0, 1]$ is a hyper-parameter representing the level of distrust in the predictions. We also study the impact of mispredictions on the competitive ratio of the proposed algorithm and adapt it to achieve a bounded robustness while retaining its consistency. We further establish a lower bound of $\frac{3}{2}$ on the consistency of any deterministic learning-augmented algorithm. We evaluate our algorithms experimentally using real data access traces.
Submitted 25 April, 2024;
originally announced April 2024.
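The consistency and robustness bounds quoted in the abstract above pull in opposite directions as the distrust parameter α changes. The following is a minimal sketch that only evaluates the two stated formulas, $\frac{5+α}{3}$ and $1 + \frac{1}{α}$; the function names are hypothetical and nothing here reproduces the paper's algorithm.

```python
from fractions import Fraction


def consistency(alpha):
    """Consistency bound (5 + alpha) / 3 as stated in the abstract."""
    _check(alpha)
    return (5 + alpha) / 3


def robustness(alpha):
    """Robustness bound 1 + 1/alpha as stated in the abstract."""
    _check(alpha)
    return 1 + 1 / alpha


def _check(alpha):
    # The abstract restricts alpha to the half-open interval (0, 1].
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")


if __name__ == "__main__":
    # Exact rational arithmetic avoids floating-point noise in the table.
    for alpha in (Fraction(1, 4), Fraction(1, 2), Fraction(1)):
        print(f"alpha={alpha}: consistency={consistency(alpha)}, "
              f"robustness={robustness(alpha)}")
```

Smaller α (more trust in predictions) moves the consistency bound toward 5/3 while the robustness bound grows without limit; at α = 1 both bounds equal 2.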
-
TransNAS-TSAD: Harnessing Transformers for Multi-Objective Neural Architecture Search in Time Series Anomaly Detection
Authors:
Ijaz Ul Haq,
Byung Suk Lee,
Donna M. Rizzo
Abstract:
The surge in real-time data collection across various industries has underscored the need for advanced anomaly detection in both univariate and multivariate time series data. This paper introduces TransNAS-TSAD, a framework that synergizes the transformer architecture with neural architecture search (NAS), enhanced through NSGA-II algorithm optimization. This approach effectively tackles the complexities of time series data, balancing computational efficiency with detection accuracy. Our evaluation reveals that TransNAS-TSAD surpasses conventional anomaly detection models due to its tailored architectural adaptability and the efficient exploration of complex search spaces, leading to marked improvements in diverse data scenarios. We also introduce the Efficiency-Accuracy-Complexity Score (EACS) as a new metric for assessing model performance, emphasizing the balance between accuracy and computational resources. TransNAS-TSAD sets a new benchmark in time series anomaly detection, offering a versatile, efficient solution for complex real-world applications. This research highlights the potential of TransNAS-TSAD across a wide range of industry applications and paves the way for future developments in the field.
Submitted 4 March, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
-
An Automated Machine Learning Approach for Detecting Anomalous Peak Patterns in Time Series Data from a Research Watershed in the Northeastern United States Critical Zone
Authors:
Ijaz Ul Haq,
Byung Suk Lee,
Donna M. Rizzo,
Julia N Perdrial
Abstract:
This paper presents an automated machine learning framework designed to assist hydrologists in detecting anomalies in time series data generated by sensors in a research watershed in the northeastern United States critical zone. The framework specifically focuses on identifying peak-pattern anomalies, which may arise from sensor malfunctions or natural phenomena. However, the use of classification methods for anomaly detection poses challenges, such as the requirement for labeled data as ground truth and the selection of the most suitable deep learning model for the given task and dataset. To address these challenges, our framework generates labeled datasets by injecting synthetic peak patterns into synthetically generated time series data and incorporates an automated hyperparameter optimization mechanism. This mechanism generates an optimized model instance with the best architectural and training parameters from a pool of five selected models, namely Temporal Convolutional Network (TCN), InceptionTime, MiniRocket, Residual Networks (ResNet), and Long Short-Term Memory (LSTM). The selection is based on the user's preferences regarding anomaly detection accuracy and computational cost. The framework employs Time-series Generative Adversarial Networks (TimeGAN) as the synthetic dataset generator. The generated model instances are evaluated using a combination of accuracy and computational cost metrics, including training time and memory, during the anomaly detection process. Performance evaluation of the framework was conducted using a dataset from a watershed, demonstrating consistent selection of the most fitting model instance that satisfies the user's preferences.
Submitted 5 December, 2023; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Label-based Graph Augmentation with Metapath for Graph Anomaly Detection
Authors:
Hwan Kim,
Junghoon Kim,
Byung Suk Lee,
Sungsu Lim
Abstract:
Graph anomaly detection has attracted considerable attention in recent years from various domains, ranging from network security to finance. Because labeling is very costly, existing methods are predominantly developed in an unsupervised manner. However, the detected anomalies may turn out to be uninteresting instances due to the absence of prior knowledge about the anomalies being sought. This issue may be solved by using a few labeled anomalies as prior knowledge. In real-world scenarios, we can easily obtain a few labeled anomalies. Efficiently leveraging labeled anomalies as prior knowledge is crucial for graph anomaly detection; however, this process remains challenging due to the inherently limited number of anomalies available. To address the problem, we propose a novel approach that leverages metapaths to embed actual connectivity patterns between anomalous and normal nodes. To further exploit context information from the metapath-based anomaly subgraph, we present a new framework, Metapath-based Graph Anomaly Detection (MGAD), incorporating GCN layers in both the dual encoders and decoders to efficiently propagate context information between abnormal and normal nodes. Specifically, MGAD employs a GNN-based graph autoencoder as its backbone network. Moreover, the dual encoders capture the complex interactions and metapath-based context information between labeled and unlabeled nodes both globally and locally. Through a comprehensive set of experiments conducted on seven real-world networks, this paper demonstrates the superiority of the MGAD method compared to state-of-the-art techniques. The code is available at https://github.com/missinghwan/MGAD.
Submitted 11 April, 2024; v1 submitted 21 August, 2023;
originally announced August 2023.
-
Fact-Checking Generative AI: Ontology-Driven Biological Graphs for Disease-Gene Link Verification
Authors:
Ahmed Abdeen Hamed,
Byung Suk Lee,
Alessandro Crimi,
Magdalena M. Misiak
Abstract:
Since the launch of various generative AI tools, scientists have been striving to evaluate their capabilities and contents, in the hope of establishing trust in their generative abilities. Regulations and guidelines are emerging to verify generated contents and identify novel uses. We aspire to demonstrate how ChatGPT claims are checked computationally using the rigor of network models. We aim to achieve fact-checking of the knowledge embedded in biological graphs that were contrived from ChatGPT contents at the aggregate level. We adopted a biological networks approach that enables the systematic interrogation of ChatGPT's linked entities. We designed an ontology-driven fact-checking algorithm that compares biological graphs constructed from approximately 200,000 PubMed abstracts with counterparts constructed from a dataset generated using the ChatGPT-3.5 Turbo model. In 10 samples of 250 randomly selected records from a ChatGPT dataset of 1000 "simulated" articles, the fact-checking link accuracy ranged from 70% to 86%. This study demonstrated high accuracy of aggregate disease-gene link relationships found in ChatGPT-generated texts.
Submitted 8 April, 2024; v1 submitted 7 August, 2023;
originally announced August 2023.
-
Graph Anomaly Detection with Graph Neural Networks: Current Status and Challenges
Authors:
Hwan Kim,
Byung Suk Lee,
Won-Yong Shin,
Sungsu Lim
Abstract:
Graphs are used widely to model complex systems, and detecting anomalies in a graph is an important task in the analysis of complex systems. Graph anomalies are patterns in a graph that do not conform to normal patterns expected of the attributes and/or structures of the graph. In recent years, graph neural networks (GNNs) have been studied extensively and have successfully performed difficult machine learning tasks in node classification, link prediction, and graph classification thanks to the highly expressive capability via message passing in effectively learning graph representations. To solve the graph anomaly detection problem, GNN-based methods leverage information about the graph attributes (or features) and/or structures to learn to score anomalies appropriately. In this survey, we review the recent advances made in detecting graph anomalies using GNN models. Specifically, we summarize GNN-based methods according to the graph type (i.e., static and dynamic), the anomaly type (i.e., node, edge, subgraph, and whole graph), and the network architecture (e.g., graph autoencoder, graph convolutional network). To the best of our knowledge, this survey is the first comprehensive review of graph anomaly detection methods based on GNNs.
Submitted 4 October, 2022; v1 submitted 29 September, 2022;
originally announced September 2022.
-
Adaptive Model Pooling for Online Deep Anomaly Detection from a Complex Evolving Data Stream
Authors:
Susik Yoon,
Youngjun Lee,
Jae-Gil Lee,
Byung Suk Lee
Abstract:
Online anomaly detection from a data stream is critical for the safety and security of many applications but is facing severe challenges due to complex and evolving data streams from IoT devices and cloud-based infrastructures. Unfortunately, existing approaches fall short of these challenges; online anomaly detection methods bear the burden of handling the complexity while offline deep anomaly detection methods suffer from the evolving data distribution. This paper presents a framework for online deep anomaly detection, ARCUS, which can be instantiated with any autoencoder-based deep anomaly detection method. It handles the complex and evolving data streams using an adaptive model pooling approach with two novel techniques: concept-driven inference and drift-aware model pool update; the former detects anomalies with a combination of models most appropriate for the complexity, and the latter adapts the model pool dynamically to fit the evolving data streams. In comprehensive experiments with ten data sets which are both high-dimensional and concept-drifted, ARCUS improved the anomaly detection accuracy of the streaming variants of state-of-the-art autoencoder-based methods and that of the state-of-the-art streaming anomaly detection methods by up to 22% and 37%, respectively.
Submitted 9 June, 2022;
originally announced June 2022.
-
SOMTimeS: Self Organizing Maps for Time Series Clustering and its Application to Serious Illness Conversations
Authors:
Ali Javed,
Donna M. Rizzo,
Byung Suk Lee,
Robert Gramling
Abstract:
There is an increasing demand for scalable algorithms capable of clustering and analyzing large time series datasets. The Kohonen self-organizing map (SOM) is a type of unsupervised artificial neural network for visualizing and clustering complex data, reducing the dimensionality of data, and selecting influential features. Like all clustering methods, the SOM requires a measure of similarity between input data (in this work time series). Dynamic time warping (DTW) is one such measure, and a top performer given that it accommodates the distortions when aligning time series. Despite its use in clustering, DTW is limited in practice because it is quadratic in runtime complexity with the length of the time series data. To address this, we present a new DTW-based clustering method, called SOMTimeS (a Self-Organizing Map for TIME Series), that scales better and runs faster than other DTW-based clustering algorithms, and has similar performance accuracy. The computational performance of SOMTimeS stems from its ability to prune unnecessary DTW computations during the SOM's training phase. We also implemented a similar pruning strategy for K-means for comparison with one of the top performing clustering algorithms. We evaluated the pruning effectiveness, accuracy, execution time and scalability on 112 benchmark time series datasets from the University of California, Riverside classification archive. We showed that for similar accuracy, the speed-up achieved for SOMTimeS and K-means was 1.8x on average; however, rates varied between 1x and 18x depending on the dataset. SOMTimeS and K-means pruned 43% and 50% of the total DTW computations, respectively. We applied SOMTimeS to natural language conversation data collected as part of a large healthcare cohort study of patient-clinician serious illness conversations to demonstrate the algorithm's utility with complex, temporally sequenced phenomena.
Submitted 25 August, 2021;
originally announced August 2021.
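The abstract above notes that DTW is quadratic in the length of the time series, which is the cost SOMTimeS tries to avoid by pruning. A minimal sketch of the textbook dynamic-programming form of DTW makes that cost visible. This is the standard unconstrained recurrence with an absolute-difference local cost, an assumption chosen for illustration; it is not the SOMTimeS pruning strategy, and the function name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences.

    Runs in O(len(a) * len(b)) time: every pair (i, j) is visited once,
    which is the quadratic cost mentioned in the abstract.
    """
    n, m = len(a), len(b)
    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


if __name__ == "__main__":
    # A repeated sample is absorbed by warping, so the distance stays zero.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Keeping only two rows of the cost matrix reduces memory to O(m) but leaves the running time quadratic, which is why skipping whole DTW calls pays off in clustering.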
-
A Benchmark Study on Time Series Clustering
Authors:
Ali Javed,
Byung Suk Lee,
Dona M. Rizzo
Abstract:
This paper presents the first time series clustering benchmark utilizing all time series datasets currently available in the University of California Riverside (UCR) archive -- the state-of-the-art repository of time series data. Specifically, the benchmark examines eight popular clustering methods representing three categories of clustering algorithms (partitional, hierarchical and density-based) and three types of distance measures (Euclidean, dynamic time warping, and shape-based). We lay out six restrictions with special attention to making the benchmark as unbiased as possible. A phased evaluation approach was then designed for summarizing dataset-level assessment metrics and discussing the results. The benchmark study presented can be a useful reference for the research community on its own; and the dataset-level assessment metrics reported may be used for designing evaluation frameworks to answer different research questions.
Submitted 26 April, 2020; v1 submitted 20 April, 2020;
originally announced April 2020.
-
Analysis of Hydrological and Suspended Sediment Events from Mad River Watershed using Multivariate Time Series Clustering
Authors:
Ali Javed,
Scott D. Hamshaw,
Donna M. Rizzo,
Byung Suk Lee
Abstract:
Hydrological storm events are a primary driver for transporting water quality constituents such as turbidity, suspended sediments and nutrients. Analyzing the concentration (C) of these water quality constituents in response to increased streamflow discharge (Q), particularly when monitored at high temporal resolution during a hydrological event, helps to characterize the dynamics and flux of such constituents. A conventional approach to storm event analysis is to reduce the C-Q time series to two-dimensional (2-D) hysteresis loops and analyze these 2-D patterns. While effective and informative to some extent, this hysteresis loop approach has limitations because projecting the C-Q time series onto a 2-D plane obscures detail (e.g., temporal variation) associated with the C-Q relationships. In this paper, we address this issue using a multivariate time series clustering approach. Clustering is applied to sequences of river discharge and suspended sediment data (acquired through turbidity-based monitoring) from six watersheds located in the Lake Champlain Basin in the northeastern United States. While clusters of the hydrological storm events using the multivariate time series approach were found to be correlated to 2-D hysteresis loop classifications and watershed locations, the clusters differed from the 2-D hysteresis classifications. Additionally, using available meteorological data associated with storm events, we examine the characteristics of computational clusters of storm events in the study watersheds and identify the features driving the clustering approach.
Submitted 20 March, 2020; v1 submitted 27 November, 2019;
originally announced November 2019.
-
Real-time Top-K Predictive Query Processing over Event Streams
Authors:
Saurav Acharya,
Byung Suk Lee,
Paul Hines
Abstract:
This paper addresses the problem of predicting the k events that are most likely to occur next, over historical real-time event streams. Existing approaches to causal prediction queries have a number of limitations. First, they exhaustively search over an acyclic causal network to find the most likely k effect events; however, data from real event streams frequently reflect cyclic causality. Second, they contain conservative assumptions intended to exclude all possible non-causal links in the causal network; this leads to the omission of many less-frequent but important causal links. We overcome these limitations by proposing a novel event precedence model and a run-time causal inference mechanism. The event precedence model constructs a first-order absorbing Markov chain incrementally over event streams, where an edge between two events signifies a temporal precedence relationship between them, which is a necessary condition for causality. Then, the run-time causal inference mechanism learns causal relationships dynamically during query processing. This is done by removing some of the temporal precedence relationships that do not exhibit causality in the presence of other events in the event precedence model. This paper presents two query processing algorithms -- one performs exhaustive search on the model and the other performs a more efficient reduced search with early termination. Experiments using two real datasets (cascading blackouts in power systems and web page views) verify the effectiveness of the probabilistic top-k prediction queries and the efficiency of the algorithms. Specifically, the reduced search algorithm reduced runtime, relative to exhaustive search, by 25-80% (depending on the application) with only a small reduction in accuracy.
Submitted 26 August, 2015;
originally announced August 2015.
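The idea of building a first-order precedence model incrementally over a stream and then asking for the k most likely next events can be sketched in a few lines. This is a deliberately simplified illustration: it counts only immediate successor pairs, and it omits the absorbing state, the run-time causal inference step, and both search algorithms described in the abstract. The class name `PrecedenceModel` and its methods are hypothetical.

```python
from collections import Counter, defaultdict


class PrecedenceModel:
    """First-order transition counts, updated one event at a time."""

    def __init__(self):
        # counts[x][y] = number of times y was observed right after x.
        self.counts = defaultdict(Counter)
        self.last = None

    def observe(self, event):
        """Incrementally record that `event` followed the previous event."""
        if self.last is not None:
            self.counts[self.last][event] += 1
        self.last = event

    def top_k_next(self, event, k):
        """Return up to k (successor, probability) pairs, most likely first."""
        successors = self.counts.get(event)
        if not successors:
            return []
        total = sum(successors.values())
        return [(e, c / total) for e, c in successors.most_common(k)]


if __name__ == "__main__":
    model = PrecedenceModel()
    for ev in ["a", "b", "a", "c", "a", "b"]:
        model.observe(ev)
    print(model.top_k_next("a", 2))
```

Temporal precedence is only a necessary condition for causality, as the abstract points out, so the probabilities here overstate causal strength; that gap is what the paper's causal inference mechanism is meant to close.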
-
A Self-Organization Framework for Wireless Ad Hoc Networks as Small Worlds
Authors:
Abhik Banerjee,
Rachit Agarwal,
Vincent Gauthier,
Chai Kiat Yeo,
Hossam Afifi,
Bu Sung Lee
Abstract:
Motivated by the benefits of small world networks, we propose a self-organization framework for wireless ad hoc networks. We investigate the use of directional beamforming for creating long-range short cuts between nodes. Using simulation results for randomized beamforming as a guideline, we identify crucial design issues for algorithm design. Our results show that, while significant path length reduction is achievable, this is accompanied by the problem of asymmetric paths between nodes. Subsequently, we propose a distributed algorithm for small world creation that achieves path length reduction while maintaining connectivity. We define a new centrality measure that estimates the structural importance of nodes based on traffic flow in the network, which is used to identify the optimum nodes for beamforming. We show, using simulations, that this leads to significant reduction in path length while maintaining connectivity.
Submitted 6 March, 2012;
originally announced March 2012.
-
Achieving Small World Properties using Bio-Inspired Techniques in Wireless Networks
Authors:
Rachit Agarwal,
Abhik Banerjee,
Vincent Gauthier,
Monique Becker,
Chai Kiat Yeo,
Bu Sung Lee
Abstract:
It is highly desirable and challenging for a wireless ad hoc network to have self-organization properties in order to achieve network-wide characteristics. Studies have shown that Small World properties, primarily low average path length and high clustering coefficient, are desired properties for networks in general. However, due to the spatial nature of wireless networks, achieving small world properties remains highly challenging. Studies also show that wireless ad hoc networks with small world properties show a degree distribution that lies between geometric and power law. In this paper, we show that in a wireless ad hoc network with non-uniform node density with only local information, we can significantly reduce the average path length and retain the clustering coefficient. To achieve our goal, our algorithm first identifies logical regions using the Lateral Inhibition technique, then identifies the nodes that beamform and finally the beam properties using Flocking. We use Lateral Inhibition and Flocking because they enable us to use local state information as opposed to other techniques. We support our work with simulation results and analysis, which show that a reduction of up to 40% can be achieved for a high-density network. We also show the effect of hopcount used to create regions on average path length, clustering coefficient and connectivity.
Submitted 3 March, 2012; v1 submitted 21 November, 2011;
originally announced November 2011.
-
Self-organization of Nodes using Bio-Inspired Techniques for Achieving Small World Properties
Authors:
Rachit Agarwal,
Abhik Banerjee,
Vincent Gauthier,
Monique Becker,
Chai Kiat Yeo,
Bu Sung Lee
Abstract:
In an autonomous wireless sensor network, self-organization of the nodes is essential to achieve network-wide characteristics. We believe that connectivity in wireless autonomous networks can be increased and overall average path length can be reduced by using beamforming and bio-inspired algorithms. Recent works on the use of beamforming in wireless networks mostly assume the knowledge of the network in aggregation to either heterogeneous or hybrid deployment. We propose that without global knowledge or the introduction of any special feature, the average path length can be reduced with the help of inspiration from nature and simple interactions between neighboring nodes. Our algorithm also reduces the number of disconnected components within the network. Our results show that reduction in the average path length and the number of disconnected components can be achieved using very simple local rules and without full network knowledge.
Submitted 27 September, 2011;
originally announced September 2011.
-
Self-Organization of Wireless Ad Hoc Networks as Small Worlds Using Long Range Directional Beams
Authors:
Abhik Banerjee,
Rachit Agarwal,
Vincent Gauthier,
Chai Kiat Yeo,
Hossam Afifi,
Bu Sung Lee
Abstract:
We study how long range directional beams can be used for self-organization of a wireless network to exhibit small world properties. Using simulation results for randomized beamforming as a guideline, we identify crucial design issues for algorithm design. Subsequently, we propose an algorithm for deterministic creation of small worlds. We define a new centrality measure that estimates the structural importance of nodes based on traffic flow in the network, which is used to identify the optimum nodes for beamforming. This results in significant reduction in path length while maintaining connectivity.
Submitted 25 September, 2011;
originally announced September 2011.
-
From Linked Data to Relevant Data -- Time is the Essence
Authors:
Markus Kirchberg,
Ryan K L Ko,
Bu Sung Lee
Abstract:
The Semantic Web initiative puts emphasis not primarily on putting data on the Web, but rather on creating links in a way that both humans and machines can explore the Web of data. When such users access the Web, they leave a trail as Web servers maintain a history of requests. Web usage mining approaches have been studied since the beginning of the Web given the log's huge potential for purposes such as resource annotation, personalization, forecasting etc. However, the impact of any such efforts has not really gone beyond generating statistics detailing who, when, and how Web pages maintained by a Web server were visited.
Submitted 25 March, 2011;
originally announced March 2011.