-
Spatio-Temporal Graph Representation Learning for Fraudster Group Detection
Authors:
Saeedreza Shehnepoor,
Roberto Togneri,
Wei Liu,
Mohammed Bennamoun
Abstract:
Motivated by potential financial gain, companies may hire fraudster groups to write fake reviews to either demote competitors or promote their own businesses. Such groups are considerably more successful in misleading customers, as people are more likely to be influenced by the opinion of a large group. To detect such groups, a common model is to represent fraudster groups' static networks, conseq…
▽ More
Motivated by potential financial gain, companies may hire fraudster groups to write fake reviews to either demote competitors or promote their own businesses. Such groups are considerably more successful in misleading customers, as people are more likely to be influenced by the opinion of a large group. To detect such groups, a common model is to represent fraudster groups' static networks, consequently overlooking the longitudinal behavior of a reviewer thus the dynamics of co-review relations among reviewers in a group. Hence, these approaches are incapable of excluding outlier reviewers, which are fraudsters intentionally camouflaging themselves in a group and genuine reviewers happen to co-review in fraudster groups. To address this issue, in this work, we propose to first capitalize on the effectiveness of the HIN-RNN in both reviewers' representation learning while capturing the collaboration between reviewers, we first utilize the HIN-RNN to model the co-review relations of reviewers in a group in a fixed time window of 28 days. We refer to this as spatial relation learning representation to signify the generalisability of this work to other networked scenarios. Then we use an RNN on the spatial relations to predict the spatio-temporal relations of reviewers in the group. In the third step, a Graph Convolution Network (GCN) refines the reviewers' vector representations using these predicted relations. These refined representations are then used to remove outlier reviewers. The average of the remaining reviewers' representation is then fed to a simple fully connected layer to predict if the group is a fraudster group or not. Exhaustive experiments of the proposed approach showed a 5% (4%), 12% (5%), 12% (5%) improvement over three of the most recent approaches on precision, recall, and F1-value over the Yelp (Amazon) dataset, respectively.
△ Less
Submitted 7 January, 2022;
originally announced January 2022.
-
Social Fraud Detection Review: Methods, Challenges and Analysis
Authors:
Saeedreza Shehnepoor,
Roberto Togneri,
Wei Liu,
Mohammed Bennamoun
Abstract:
Social reviews have dominated the web and become a plausible source of product information. People and businesses use such information for decision-making. Businesses also make use of social information to spread fake information using a single user, groups of users, or a bot trained to generate fraudulent content. Many studies proposed approaches based on user behaviors and review text to address…
▽ More
Social reviews have dominated the web and become a plausible source of product information. People and businesses use such information for decision-making. Businesses also make use of social information to spread fake information using a single user, groups of users, or a bot trained to generate fraudulent content. Many studies proposed approaches based on user behaviors and review text to address the challenges of fraud detection. To provide an exhaustive literature review, social fraud detection is reviewed using a framework that considers three key components: the review itself, the user who carries out the review, and the item being reviewed. As features are extracted for the component representation, a feature-wise review is provided based on behavioral, text-based features and their combination. With this framework, a comprehensive overview of approaches is presented including supervised, semi-supervised, and unsupervised learning. The supervised approaches for fraud detection are introduced and categorized into two sub-categories; classical, and deep learning. The lack of labeled datasets is explained and potential solutions are suggested. To help new researchers in the area develop a better understanding, a topic analysis and an overview of future directions is provided in each step of the proposed systematic framework.
△ Less
Submitted 4 January, 2023; v1 submitted 10 November, 2021;
originally announced November 2021.
-
HIN-RNN: A Graph Representation Learning Neural Network for Fraudster Group Detection With No Handcrafted Features
Authors:
Saeedreza Shehnepoor,
Roberto Togneri,
Wei Liu,
Mohammed Bennamoun
Abstract:
Social reviews are indispensable resources for modern consumers' decision making. For financial gain, companies pay fraudsters preferably in groups to demote or promote products and services since consumers are more likely to be misled by a large number of similar reviews from groups. Recent approaches on fraudster group detection employed handcrafted features of group behaviors without considerin…
▽ More
Social reviews are indispensable resources for modern consumers' decision making. For financial gain, companies pay fraudsters preferably in groups to demote or promote products and services since consumers are more likely to be misled by a large number of similar reviews from groups. Recent approaches on fraudster group detection employed handcrafted features of group behaviors without considering the semantic relation between reviews from the reviewers in a group. In this paper, we propose the first neural approach, HIN-RNN, a Heterogeneous Information Network (HIN) Compatible RNN for fraudster group detection that requires no handcrafted features. HIN-RNN provides a unifying architecture for representation learning of each reviewer, with the initial vector as the sum of word embeddings of all review text written by the same reviewer, concatenated by the ratio of negative reviews. Given a co-review network representing reviewers who have reviewed the same items with the same ratings and the reviewers' vector representation, a collaboration matrix is acquired through HIN-RNN training. The proposed approach is confirmed to be effective with marked improvement over state-of-the-art approaches on both the Yelp (22% and 12% in terms of recall and F1-value, respectively) and Amazon (4% and 2% in terms of recall and F1-value, respectively) datasets.
△ Less
Submitted 24 May, 2021;
originally announced May 2021.
-
ScoreGAN: A Fraud Review Detector based on Multi Task Learning of Regulated GAN with Data Augmentation
Authors:
Saeedreza Shehnepoor,
Roberto Togneri,
Wei Liu,
Mohammed Bennamoun
Abstract:
The promising performance of Deep Neural Networks (DNNs) in text classification, has attracted researchers to use them for fraud review detection. However, the lack of trusted labeled data has limited the performance of the current solutions in detecting fraud reviews. The Generative Adversarial Network (GAN) as a semi-supervised method has demonstrated to be effective for data augmentation purpos…
▽ More
The promising performance of Deep Neural Networks (DNNs) in text classification, has attracted researchers to use them for fraud review detection. However, the lack of trusted labeled data has limited the performance of the current solutions in detecting fraud reviews. The Generative Adversarial Network (GAN) as a semi-supervised method has demonstrated to be effective for data augmentation purposes. The state-of-the-art solutions utilize GANs to overcome the data scarcity problem. However, they fail to incorporate the behavioral clues in fraud generation. Additionally, state-of-the-art approaches overlook the possible bot-generated reviews in the dataset. Finally, they also suffer from a common limitation in scalability and stability of the GAN, slowing down the training procedure. In this work, we propose ScoreGAN for fraud review detection that makes use of both review text and review rating scores in the generation and detection process. Scores are incorporated through Information Gain Maximization (IGM) into the loss function for three reasons. One is to generate score-correlated reviews based on the scores given to the generator. Second, the generated reviews are employed to train the discriminator, so the discriminator can correctly label the possible bot-generated reviews through joint representations learned from the concatenation of GLobal Vector for Word representation (GLoVe) extracted from the text and the score. Finally, it can be used to improve the stability and scalability of the GAN. Results show that the proposed framework outperformed the existing state-of-the-art framework, namely FakeGAN, in terms of AP by 7\%, and 5\% on the Yelp and TripAdvisor datasets, respectively.
△ Less
Submitted 17 March, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
DFraud3- Multi-Component Fraud Detection freeof Cold-start
Authors:
Saeedreza Shehnepoor,
Roberto Togneri,
Wei Liu,
Mohammed Bennamoun
Abstract:
Fraud review detection is a hot research topic inrecent years. The Cold-start is a particularly new but significant problem referring to the failure of a detection system to recognize the authenticity of a new user. State-of-the-art solutions employ a translational knowledge graph embedding approach (TransE) to model the interaction of the components of a review system. However, these approaches s…
▽ More
Fraud review detection is a hot research topic inrecent years. The Cold-start is a particularly new but significant problem referring to the failure of a detection system to recognize the authenticity of a new user. State-of-the-art solutions employ a translational knowledge graph embedding approach (TransE) to model the interaction of the components of a review system. However, these approaches suffer from the limitation of TransEin handling N-1 relations and the narrow scope of a single classification task, i.e., detecting fraudsters only. In this paper, we model a review system as a Heterogeneous InformationNetwork (HIN) which enables a unique representation to every component and performs graph inductive learning on the review data through aggregating features of nearby nodes. HIN with graph induction helps to address the camouflage issue (fraudsterswith genuine reviews) which has shown to be more severe when it is coupled with cold-start, i.e., new fraudsters with genuine first reviews. In this research, instead of focusing only on one component, detecting either fraud reviews or fraud users (fraudsters), vector representations are learnt for each component, enabling multi-component classification. In other words, we are able to detect fraud reviews, fraudsters, and fraud-targeted items, thus the name of our approach DFraud3. DFraud3 demonstrates a significant accuracy increase of 13% over the state of the art on Yelp.
△ Less
Submitted 11 June, 2020; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Replay Spoofing Countermeasure Using Autoencoder and Siamese Network on ASVspoof 2019 Challenge
Authors:
Mohammad Adiban,
Hossein Sameti,
Saeedreza Shehnepoor
Abstract:
Automatic Speaker Verification (ASV) is the process of identifying a person based on the voice presented to a system. Different synthetic approaches allow spoofing to deceive ASV systems (ASVs), whether using techniques to imitate a voice or recunstruct the features. Attackers try to beat up the ASVs using four general techniques; impersonation, speech synthesis, voice conversion, and replay. The…
▽ More
Automatic Speaker Verification (ASV) is the process of identifying a person based on the voice presented to a system. Different synthetic approaches allow spoofing to deceive ASV systems (ASVs), whether using techniques to imitate a voice or recunstruct the features. Attackers try to beat up the ASVs using four general techniques; impersonation, speech synthesis, voice conversion, and replay. The last technique is considered as a common and high potential tool for spoofing purposes since replay attacks are more accessible and require no technical knowledge from adversaries. In this study, we introduce a novel replay spoofing countermeasure for ASVs. Accordingly, we used the Constant Q Cepstral Coefficient (CQCC) features fed into an autoencoder to attain more informative features and to consider the noise information of spoofed utterances for discrimination purpose. Finally, different configurations of the Siamese network were used for the first time in this context for classification. The experiments performed on ASVspoof challenge 2019 dataset using Equal Error Rate (EER) and Tandem Detection Cost Function (t-DCF) as evaluation metrics show that the proposed system improved the results over the baseline by 10.73% and 0.2344 in terms of EER and t-DCF, respectively.
△ Less
Submitted 29 October, 2019;
originally announced October 2019.
-
Statistical feature embedding for heart sound classification
Authors:
Mohammad Adiban,
Bagher BabaAli,
Saeedreza Shehnepoor
Abstract:
Cardiovascular Disease (CVD) is considered as one of the principal causes of death in the world. Over recent years, this field of study has attracted researchers' attention to investigate heart sounds' patterns for disease diagnostics. In this study, an approach is proposed for normal/abnormal heart sound classification on the Physionet challenge 2016 dataset. For the first time, a fixed-length fe…
▽ More
Cardiovascular Disease (CVD) is considered as one of the principal causes of death in the world. Over recent years, this field of study has attracted researchers' attention to investigate heart sounds' patterns for disease diagnostics. In this study, an approach is proposed for normal/abnormal heart sound classification on the Physionet challenge 2016 dataset. For the first time, a fixed-length feature vector; called i-vector; is extracted from each heart sound using Mel Frequency Cepstral Coefficient (MFCC) features. Afterwards, Principal Component Analysis (PCA) transform and Variational Autoencoder (VAE) are applied on the i-vector to achieve dimension reduction. Eventually, the reduced size vector is fed to Gaussian Mixture Models (GMMs) and Support Vector Machine (SVM) for classification purpose. Experimental results demonstrate the proposed method could achieve a performance improvement of 16% based on Modified Accuracy (MAcc) compared with the baseline system on the Physoinet dataset.
△ Less
Submitted 9 November, 2020; v1 submitted 26 April, 2019;
originally announced April 2019.
-
NetSpam: a Network-based Spam Detection Framework for Reviews in Online Social Media
Authors:
Saeedreza Shehnepoor,
Mostafa Salehi,
Reza Farahbakhsh,
Noel Crespi
Abstract:
Nowadays, a big part of people rely on available content in social media in their decisions (e.g. reviews and feedback on a topic or product). The possibility that anybody can leave a review provide a golden opportunity for spammers to write spam reviews about products and services for different interests. Identifying these spammers and the spam content is a hot topic of research and although a co…
▽ More
Nowadays, a big part of people rely on available content in social media in their decisions (e.g. reviews and feedback on a topic or product). The possibility that anybody can leave a review provide a golden opportunity for spammers to write spam reviews about products and services for different interests. Identifying these spammers and the spam content is a hot topic of research and although a considerable number of studies have been done recently toward this end, but so far the methodologies put forth still barely detect spam reviews, and none of them show the importance of each extracted feature type. In this study, we propose a novel framework, named NetSpam, which utilizes spam features for modeling review datasets as heterogeneous information networks to map spam detection procedure into a classification problem in such networks. Using the importance of spam features help us to obtain better results in terms of different metrics experimented on real-world review datasets from Yelp and Amazon websites. The results show that NetSpam outperforms the existing methods and among four categories of features; including review-behavioral, user-behavioral, reviewlinguistic, user-linguistic, the first type of features performs better than the other categories.
△ Less
Submitted 10 March, 2017;
originally announced March 2017.