Search | arXiv e-print repository

Counterfactual Data Augmentation with Denoising Diffusion for Graph Anomaly Detection

Authors: Chun**g Xiao, Shikang Pang, Xovee Xu, Xuan Li, Goce Trajcevski, Fan Zhou

Abstract: A critical aspect of Graph Neural Networks (GNNs) is to enhance the node representations by aggregating node neighborhood information. However, when detecting anomalies, the representations of abnormal nodes are prone to be averaged by normal neighbors, making the learned anomaly representations less distinguishable. To tackle this issue, we propose CAGAD -- an unsupervised Counterfactual data Aug… ▽ More A critical aspect of Graph Neural Networks (GNNs) is to enhance the node representations by aggregating node neighborhood information. However, when detecting anomalies, the representations of abnormal nodes are prone to be averaged by normal neighbors, making the learned anomaly representations less distinguishable. To tackle this issue, we propose CAGAD -- an unsupervised Counterfactual data Augmentation method for Graph Anomaly Detection -- which introduces a graph pointer neural network as the heterophilic node detector to identify potential anomalies whose neighborhoods are normal-node-dominant. For each identified potential anomaly, we design a graph-specific diffusion model to translate a part of its neighbors, which are probably normal, into anomalous ones. At last, we involve these translated neighbors in GNN neighborhood aggregation to produce counterfactual representations of anomalies. Through aggregating the translated anomalous neighbors, counterfactual representations become more distinguishable and further advocate detection performance. The experimental results on four datasets demonstrate that CAGAD significantly outperforms strong baselines, with an average improvement of 2.35% on F1, 2.53% on AUC-ROC, and 2.79% on AUC-PR. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted by IEEE Transactions on Computational Social Systems(TCSS). DOI: https://doi.org/10.1109/TCSS.2024.3403503

arXiv:2310.08213 [pdf, other]

A Universal Scheme for Dynamic Partitioned Shortest Path Index

Authors: Mengxuan Zhang, Xinjie Zhou, Lei Li, Ziyi Liu, Goce Trajcevski, Yan Huang, Xiaofang Zhou

Abstract: Shortest path (SP) computation is the fundamental operation in various networks such as urban networks, logistic networks, communication networks, social networks, etc. With the development of technology and societal expansions, those networks tend to be massive. This, in turn, causes deteriorated performance of SP computation, and graph partitioning is commonly leveraged to scale up the SP algori… ▽ More Shortest path (SP) computation is the fundamental operation in various networks such as urban networks, logistic networks, communication networks, social networks, etc. With the development of technology and societal expansions, those networks tend to be massive. This, in turn, causes deteriorated performance of SP computation, and graph partitioning is commonly leveraged to scale up the SP algorithms. However, the partitioned shortest path (PSP) index has never been systematically investigated and theoretically analyzed, and there is a lack of experimental comparison among different PSP indexes. Moreover, few studies have explored PSP index maintenance in dynamic networks. Therefore, in this paper, we systematically analyze the dynamic PSP index by proposing a universal scheme for it. Specifically, we first propose two novel partitioned shortest path strategies (No-boundary and Post-boundary strategies) to improve the performance of PSP indexes and design the corresponding index maintenance approaches to deal with dynamic scenarios. Then we categorize the partition methods from the perspective of partition structure to facilitate the selection of partition methods in the PSP index. Furthermore, we propose a universal scheme for designing the PSP index by coupling its three dimensions (i.e. PSP strategy, partition structure, and SP algorithm). Based on this scheme, we propose five new PSP indexes with prominent performance in either query or update efficiency. Lastly, extensive experiments are implemented to demonstrate the effectiveness of the proposed PSP scheme, with valuable guidance provided on the PSP index design. △ Less

Submitted 1 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2307.05717 [pdf, other]

Towards Mobility Data Science (Vision Paper)

Authors: Mohamed Mokbel, Mahmoud Sakr, Li Xiong, Andreas Züfle, Jussara Almeida, Taylor Anderson, Walid Aref, Gennady Andrienko, Natalia Andrienko, Yang Cao, Sanjay Chawla, Reynold Cheng, Panos Chrysanthis, Xiqi Fei, Gabriel Ghinita, Anita Graser, Dimitrios Gunopulos, Christian Jensen, Joon-Seok Kim, Kyoung-Sook Kim, Peer Kröger, John Krumm, Johannes Lauer, Amr Magdy, Mario Nascimento , et al. (23 additional authors not shown)

Abstract: Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences… ▽ More Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art and describe open challenges for the research community in the coming years. △ Less

Submitted 7 March, 2024; v1 submitted 21 June, 2023; originally announced July 2023.

Comments: Updated to reflect the major revision for ACM Transactions on Spatial Algorithms and Systems (TSAS). This version reflects the final version accepted by ACM TSAS

arXiv:2211.09625 [pdf, other]

Predicting Human Mobility via Self-supervised Disentanglement Learning

Authors: Qiang Gao, **yu Hong, Xovee Xu, ** Kuang, Fan Zhou, Goce Trajcevski

Abstract: Deep neural networks have recently achieved considerable improvements in learning human behavioral patterns and individual preferences from massive spatial-temporal trajectories data. However, most of the existing research concentrates on fusing different semantics underlying sequential trajectories for mobility pattern learning which, in turn, yields a narrow perspective on comprehending human in… ▽ More Deep neural networks have recently achieved considerable improvements in learning human behavioral patterns and individual preferences from massive spatial-temporal trajectories data. However, most of the existing research concentrates on fusing different semantics underlying sequential trajectories for mobility pattern learning which, in turn, yields a narrow perspective on comprehending human intrinsic motions. In addition, the inherent sparsity and under-explored heterogeneous collaborative items pertaining to human check-ins hinder the potential exploitation of human diverse periodic regularities as well as common interests. Motivated by recent advances in disentanglement learning, in this study we propose a novel disentangled solution called SSDL for tackling the next POI prediction problem. SSDL primarily seeks to disentangle the potential time-invariant and time-varying factors into different latent spaces from massive trajectories data, providing an interpretable view to understand the intricate semantics underlying human diverse mobility representations. To address the data sparsity issue, we present two realistic trajectory augmentation approaches to enhance the understanding of both the human intrinsic periodicity and constantly-changing intents. In addition, we devise a POI-centric graph structure to explore heterogeneous collaborative signals underlying historical check-ins. Extensive experiments conducted on four real-world datasets demonstrate that our proposed SSDL significantly outperforms the state-of-the-art approaches -- for example, it yields up to 8.57% improvements on ACC@1. △ Less

Submitted 17 November, 2022; originally announced November 2022.

Comments: 15 pages, 9 figures

arXiv:2005.11041 [pdf, other]

doi 10.1145/3433000

A Survey of Information Cascade Analysis: Models, Predictions, and Recent Advances

Authors: Fan Zhou, Xovee Xu, Goce Trajcevski, Kunpeng Zhang

Abstract: The deluge of digital information in our daily life -- from user-generated content, such as microblogs and scientific papers, to online business, such as viral marketing and advertising -- offers unprecedented opportunities to explore and exploit the trajectories and structures of the evolution of information cascades. Abundant research efforts, both academic and industrial, have aimed to reach a… ▽ More The deluge of digital information in our daily life -- from user-generated content, such as microblogs and scientific papers, to online business, such as viral marketing and advertising -- offers unprecedented opportunities to explore and exploit the trajectories and structures of the evolution of information cascades. Abundant research efforts, both academic and industrial, have aimed to reach a better understanding of the mechanisms driving the spread of information and quantifying the outcome of information diffusion. This article presents a comprehensive review and categorization of information popularity prediction methods, from feature engineering and stochastic processes, through graph representation, to deep learning-based approaches. Specifically, we first formally define different types of information cascades and summarize the perspectives of existing studies. We then present a taxonomy that categorizes existing works into the aforementioned three main groups as well as the main subclasses in each group, and we systematically review cutting-edge research work. Finally, we summarize the pros and cons of existing research efforts and outline the open challenges and opportunities in this field. △ Less

Submitted 23 March, 2021; v1 submitted 22 May, 2020; originally announced May 2020.

Comments: Author version, with 43 pages, 9 figures, and 11 tables

Journal ref: ACM Computing Surveys (CSUR), 54(2), Article 27, Mar 2021, 36 pages

arXiv:2003.12042 [pdf, other]

A Heterogeneous Dynamical Graph Neural Networks Approach to Quantify Scientific Impact

Authors: Fan Zhou, Xovee Xu, Ce Li, Goce Trajcevski, Ting Zhong, Kunpeng Zhang

Abstract: Quantifying and predicting the long-term impact of scientific writings or individual scholars has important implications for many policy decisions, such as funding proposal evaluation and identifying emerging research fields. In this work, we propose an approach based on Heterogeneous Dynamical Graph Neural Network (HDGNN) to explicitly model and predict the cumulative impact of papers and authors… ▽ More Quantifying and predicting the long-term impact of scientific writings or individual scholars has important implications for many policy decisions, such as funding proposal evaluation and identifying emerging research fields. In this work, we propose an approach based on Heterogeneous Dynamical Graph Neural Network (HDGNN) to explicitly model and predict the cumulative impact of papers and authors. HDGNN extends heterogeneous GNNs by incorporating temporally evolving characteristics and capturing both structural properties of attributed graph and the growing sequence of citation behavior. HDGNN is significantly different from previous models in its capability of modeling the node impact in a dynamic manner while taking into account the complex relations among nodes. Experiments conducted on a real citation dataset demonstrate its superior performance of predicting the impact of both papers and authors. △ Less

Submitted 26 March, 2020; originally announced March 2020.

arXiv:2001.01829 [pdf, other]

doi 10.1109/ICMLA.2019.00094

Frosting Weights for Better Continual Training

Authors: Xiaofeng Zhu, Feng Liu, Goce Trajcevski, Dingding Wang

Abstract: Training a neural network model can be a lifelong learning process and is a computationally intensive one. A severe adverse effect that may occur in deep neural network models is that they can suffer from catastrophic forgetting during retraining on new data. To avoid such disruptions in the continuous learning, one appealing property is the additive nature of ensemble models. In this paper, we pr… ▽ More Training a neural network model can be a lifelong learning process and is a computationally intensive one. A severe adverse effect that may occur in deep neural network models is that they can suffer from catastrophic forgetting during retraining on new data. To avoid such disruptions in the continuous learning, one appealing property is the additive nature of ensemble models. In this paper, we propose two generic ensemble approaches, gradient boosting and meta-learning, to solve the catastrophic forgetting problem in tuning pre-trained neural network models. △ Less

Submitted 6 January, 2020; originally announced January 2020.

Journal ref: ICMLA 2019

arXiv:1905.09718 [pdf, ps, other]

Meta-GNN: On Few-shot Node Classification in Graph Meta-learning

Authors: Fan Zhou, Chengtai Cao, Kunpeng Zhang, Goce Trajcevski, Ting Zhong, Ji Geng

Abstract: Meta-learning has received a tremendous recent attention as a possible approach for mimicking human intelligence, i.e., acquiring new knowledge and skills with little or even no demonstration. Most of the existing meta-learning methods are proposed to tackle few-shot learning problems such as image and text, in rather Euclidean domain. However, there are very few works applying meta-learning to no… ▽ More Meta-learning has received a tremendous recent attention as a possible approach for mimicking human intelligence, i.e., acquiring new knowledge and skills with little or even no demonstration. Most of the existing meta-learning methods are proposed to tackle few-shot learning problems such as image and text, in rather Euclidean domain. However, there are very few works applying meta-learning to non-Euclidean domains, and the recently proposed graph neural networks (GNNs) models do not perform effectively on graph few-shot learning problems. Towards this, we propose a novel graph meta-learning framework -- Meta-GNN -- to tackle the few-shot node classification problem in graph meta-learning settings. It obtains the prior knowledge of classifiers by training on many similar few-shot learning tasks and then classifies the nodes from new classes with only few labeled samples. Additionally, Meta-GNN is a general model that can be straightforwardly incorporated into any existing state-of-the-art GNN. Our experiments conducted on three benchmark datasets demonstrate that our proposed approach not only improves the node classification performance by a large margin on few-shot learning problems in meta-learning paradigm, but also learns a more general and flexible model for task adaption. △ Less

Submitted 23 May, 2019; originally announced May 2019.

arXiv:1012.2789 [pdf, ps, other]

Experimental Comparison of Representation Methods and Distance Measures for Time Series Data

Authors: Xiaoyue Wang, Hui Ding, Goce Trajcevski, Peter Scheuermann, Eamonn Keogh

Abstract: The previous decade has brought a remarkable increase of the interest in applications that deal with querying and mining of time series data. Many of the research efforts in this context have focused on introducing new representation methods for dimensionality reduction or novel similarity measures for the underlying data. In the vast majority of cases, each individual work introducing a particula… ▽ More The previous decade has brought a remarkable increase of the interest in applications that deal with querying and mining of time series data. Many of the research efforts in this context have focused on introducing new representation methods for dimensionality reduction or novel similarity measures for the underlying data. In the vast majority of cases, each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive experimental study re-implementing eight different time series representations and nine similarity measures and their variants, and testing their effectiveness on thirty-eight time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. In addition to providing a unified validation of some of the existing achievements, our experiments also indicate that, in some cases, certain claims in the literature may be unduly optimistic. △ Less

Submitted 9 December, 2010; originally announced December 2010.

Showing 1–9 of 9 results for author: Trajcevski, G