Search | arXiv e-print repository

Predicting the structure of dynamic graphs

Abstract: Dynamic graph embeddings, inductive and incremental learning facilitate predictive tasks such as node classification and link prediction. However, predicting the structure of a graph at a future time step from a time series of graphs, allowing for new nodes, has not gained much attention. In this paper, we present such an approach. We use time series methods to predict the node degree at future time points and combine it with flux balance analysis -- a linear programming method used in biochemistry -- to obtain the structure of future graphs. Furthermore, we explore the predictive graph distribution for different parameter values. We evaluate this method using synthetic and real datasets and demonstrate its utility and applicability.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2310.04059 [pdf, other]

DEFT: A new distance-based feature set for keystroke dynamics

Authors: Nuwan Kaluarachchi, Sevvandi Kandanaarachchi, Kristen Moore, Arathi Arakala

Abstract: Keystroke dynamics is a behavioural biometric utilised for user identification and authentication. We propose a new set of features based on the distance between keys on the keyboard, a concept that has not been considered before in keystroke dynamics. We combine flight times, a popular metric, with the distance between keys on the keyboard and call them Distance Enhanced Flight Time (DEFT) features. This novel approach provides comprehensive insights into a person's typing behaviour, surpassing typing velocity alone. We build a DEFT model by combining DEFT features with other previously used keystroke dynamic features. The DEFT model is designed to be device-agnostic, allowing us to evaluate its effectiveness across three commonly used devices: desktop, mobile, and tablet. The DEFT model outperforms the existing state-of-the-art methods when we evaluate its effectiveness across two datasets. We obtain accuracy rates exceeding 99% and equal error rates below 10% on all three devices.

Submitted 6 October, 2023; originally announced October 2023.

Comments: 12 pages, 5 figures, 3 tables, conference paper

arXiv:2307.15850 [pdf, other]

doi 10.13140/RG.2.2.11363.09760

Comprehensive Algorithm Portfolio Evaluation using Item Response Theory

Authors: Sevvandi Kandanaarachchi, Kate Smith-Miles

Abstract: Item Response Theory (IRT) has been proposed within the field of Educational Psychometrics to assess student ability as well as test question difficulty and discrimination power. More recently, IRT has been applied to evaluate machine learning algorithm performance on a single classification dataset, where the student is now an algorithm, and the test question is an observation to be classified by the algorithm. In this paper we present a modified IRT-based framework for evaluating a portfolio of algorithms across a repository of datasets, while simultaneously eliciting a richer suite of characteristics - such as algorithm consistency and anomalousness - that describe important aspects of algorithm performance. These characteristics arise from a novel inversion and reinterpretation of the traditional IRT model without requiring additional dataset feature computations. We test this framework on algorithm portfolios for a wide range of applications, demonstrating the broad applicability of this method as an insightful algorithm evaluation tool. Furthermore, the explainable nature of IRT parameters yields an increased understanding of algorithm portfolios.

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2304.13941 [pdf, other]

Detecting inner-LAN anomalies using hierarchical forecasting

Authors: Sevvandi Kandanaarachchi, Mahdi Abolghasemi, Hideya Ochiai, Asha Rao

Abstract: Increasing activity and the number of devices online are leading to increasing and more diverse cyber attacks. This continuously evolving attack activity makes signature-based detection methods ineffective. Once malware has infiltrated a LAN, bypassing an external gateway or entering via an unsecured mobile device, it can potentially infect all nodes in the LAN as well as carry out nefarious activities such as stealing valuable data, leading to financial damage and loss of reputation. Such infiltration could be viewed as an insider attack, increasing the need for LAN monitoring and security. In this paper we aim to detect such inner-LAN activity by studying the variations in Address Resolution Protocol (ARP) calls within the LAN. We find anomalous nodes by modelling inner-LAN traffic using hierarchical forecasting methods. We substantially reduce the false positives ever present in anomaly detection by using a method based on extreme value theory. We use a dataset from a real inner-LAN monitoring project, containing over 10M ARP calls from 362 nodes. Furthermore, the small number of false positives generated using our methods is a potential solution to the "alert fatigue" commonly reported by security experts.

Submitted 26 April, 2023; originally announced April 2023.

arXiv:2210.07407 [pdf, other]

Anomaly detection in dynamic networks

Authors: Sevvandi Kandanaarachchi, Rob J Hyndman

Abstract: Detecting anomalies from a series of temporal networks has many applications, including road accidents in transport networks and suspicious events in social networks. While there are many methods for network anomaly detection, statistical methods are underutilised in this space even though they have a long history and proven capability in handling temporal dependencies. In this paper, we introduce oddnet, a feature-based network anomaly detection method that uses time series methods to model temporal dependencies. We demonstrate the effectiveness of oddnet on synthetic and real-world datasets. The R package oddnet implements this algorithm.

Submitted 13 October, 2022; originally announced October 2022.

arXiv:2210.05821 [pdf, other]

Short-term prediction of stream turbidity using surrogate data and a meta-model approach

Authors: Bhargav Rele, Caleb Hogan, Sevvandi Kandanaarachchi, Catherine Leigh

Abstract: Many water-quality monitoring programs aim to measure turbidity to help guide effective management of waterways and catchments, yet distributing turbidity sensors throughout networks is typically cost prohibitive. To this end, we built and compared the ability of dynamic regression (ARIMA), long short-term memory neural nets (LSTM), and generalized additive models (GAM) to forecast stream turbidity one step ahead, using surrogate data from relatively low-cost in-situ sensors and publicly available databases. We iteratively trialled combinations of four surrogate covariates (rainfall, water level, air temperature and total global solar exposure) selecting a final model for each type that minimised the corrected Akaike Information Criterion. Cross-validation using a rolling time-window indicated that ARIMA, which included the rainfall and water-level covariates only, produced the most accurate predictions, followed closely by GAM, which included all four covariates. We constructed a meta-model, trained on time-series features of turbidity, to take advantage of the strengths of each model over different time points and predict the best model (that with the lowest forecast error one-step prior) for each time step. The meta-model outperformed all other models, indicating that this methodology can yield high accuracy and may be a viable alternative to using measurements sourced directly from turbidity-sensors where costs prohibit their deployment and maintenance, and when predicting turbidity across the short term.
Our findings also indicated that temperature and light-associated variables, for example underwater illuminance, may hold promise as cost-effective, high-frequency surrogates of turbidity, especially when combined with other covariates, like rainfall, that are typically measured at coarse levels of spatial resolution.

Submitted 11 October, 2022; originally announced October 2022.

arXiv:2106.06243 [pdf, other]

Unsupervised Anomaly Detection Ensembles using Item Response Theory

Authors: Sevvandi Kandanaarachchi

Abstract: Constructing an ensemble from a heterogeneous set of unsupervised anomaly detection methods is challenging because the class labels or the ground truth is unknown. Thus, traditional ensemble techniques that use the response variable or the class labels cannot be used to construct an ensemble for unsupervised anomaly detection. We use Item Response Theory (IRT) -- a class of models used in educational psychometrics to assess student and test question characteristics -- to construct an unsupervised anomaly detection ensemble. IRT's latent trait computation lends itself to anomaly detection because the latent trait can be used to uncover the hidden ground truth. Using a novel IRT mapping to the anomaly detection problem, we construct an ensemble that can downplay noisy, non-discriminatory methods and accentuate sharper methods. We demonstrate the effectiveness of the IRT ensemble on an extensive data repository, by comparing its performance to other ensemble techniques.

Submitted 11 June, 2021; originally announced June 2021.

Comments: 25 pages

arXiv:2105.02526 [pdf, other]

Honeyboost: Boosting honeypot performance with data fusion and anomaly detection

Authors: Sevvandi Kandanaarachchi, Hideya Ochiai, Asha Rao

Abstract: With cyber incidents and data breaches becoming increasingly common, being able to predict a cyberattack has never been more crucial. The ability of Network Anomaly Detection Systems (NADS) to identify unusual behavior makes them useful in predicting such attacks. However, NADS often suffer from high false positive rates. In this paper, we introduce a novel framework called Honeyboost that enhances the performance of honeypot-aided NADS. Using data from the LAN Security Monitoring Project, Honeyboost identifies most anomalous nodes before they access the honeypot, aiding early detection and prediction. Furthermore, using extreme value theory, we achieve the highly desirable low false positive rates. Honeyboost is an unsupervised method comprising two approaches: horizontal and vertical. The horizontal approach constructs a time series from the communications of each node, with node-level features encapsulating their behavior over time. The vertical approach finds anomalies in each protocol space. Using a window-based model, which is typically used in online scenarios, the horizontal and vertical approaches are combined to identify anomalies and gain useful insights. Experimental results indicate the efficacy of our framework in identifying suspicious activities of nodes.

Submitted 7 September, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

Comments: 26 pages

Showing 1–8 of 8 results for author: Kandanaarachchi, S