Search | arXiv e-print repository

A Survey of Relevant Text Mining Technology

Authors: Claudia Peersman, Matthew Edwards, Emma Williams, Awais Rashid

Abstract: Recent advances in text mining and natural language processing technology have enabled researchers to detect an author's identity or demographic characteristics, such as age and gender, in several text genres by automatically analysing the variation of linguistic characteristics. However, applying such techniques in the wild, i.e., in both cybercriminal and regular online social media, differs from more general applications in that its defining characteristics are both domain and process dependent. This gives rise to a number of challenges of which contemporary research has only scratched the surface. More specifically, a text mining approach applied on social media communications typically has no control over the dataset size, as the number of available communications will vary across users. Hence, the system has to be robust towards limited data availability. Additionally, the quality of the data cannot be guaranteed. As a result, the approach needs to be tolerant to a certain degree of linguistic noise (for example, abbreviations, non-standard language use, spelling variations and errors). Finally, in the context of cybercriminal fora, it has to be robust towards deceptive or adversarial behaviour, i.e., offenders who attempt to hide their criminal intentions (obfuscation) or who assume a false digital persona (imitation), potentially using coded language. In this work we present a comprehensive survey that discusses the problems that have already been addressed in current literature and reviews potential solutions. Additionally, we highlight which areas need to be given more attention.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2203.13179 [pdf, other]

Automatic User Profiling in Darknet Markets: a Scalability Study

Authors: Claudia Peersman, Matthew Edwards, Emma Williams, Awais Rashid

Abstract: In this study, we investigate the scalability of state-of-the-art user profiling technologies across different online domains. More specifically, this work aims to understand the reliability and limitations of current computational stylometry approaches when these are applied to underground fora in which user populations potentially differ from other online platforms (predominantly male, younger age and greater computer use) and cyber offenders who attempt to hide their identity. Because no ground truth is available and no validated criminal data from historic investigations is available for validation purposes, we have collected new data from clearweb forums that do include user demographics and could be more closely related to underground fora in terms of user population (e.g., tech communities) than commonly used social media benchmark datasets showing a more balanced user population.

Submitted 24 March, 2022; originally announced March 2022.

arXiv:2203.08642 [pdf, other]

Understanding motivations and characteristics of financially-motivated cybercriminals

Authors: Claudia Peersman, Emma Williams, Matthew Edwards, Awais Rashid

Abstract: Background: Cyber offences, such as hacking, malware creation and distribution, and online fraud, present a substantial threat to organizations attempting to safeguard their data and information. By understanding the evolving characteristics and motivations of individuals involved in these activities, and the threats that they may pose, cyber security practitioners will be better placed to understand and assess current threats to their systems and the range of socio-technical mitigations that may best reduce these. Aim: The reported work-in-progress aims to explore the extent to which findings from prior academic literature regarding the characteristics and motivations of offenders engaging in financially-motivated, cyber-dependent crime are supported by the contemporary experiences and perspectives of practitioners currently working in the cyber crime field. Method: A targeted, online survey was developed consisting of both closed and open-ended questions relating to current cyber threats and the characteristics and motivations of offenders engaged in these activities. Sixteen practitioners working in law enforcement-related domains in the cyber crime field completed the survey, providing a combination of qualitative and quantitative data for analysis.

Submitted 28 March, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

arXiv:2203.08557 [pdf, ps, other]

How darknet market users learned to worry more and love PGP: Analysis of security advice on darknet marketplaces

Authors: Andrew C. Dwyer, Joseph Hallett, Claudia Peersman, Matthew Edwards, Brittany I. Davidson, Awais Rashid

Abstract: Darknet marketplaces, accessible through Tor, are where users can buy illicit goods and learn to hide from law enforcement. We surveyed the advice on these markets and found valid security advice mixed up with paranoid threat models and a reliance on privacy tools dismissed as unusable by the mainstream.

Submitted 16 March, 2022; originally announced March 2022.

arXiv:2202.07419 [pdf, other]

Characterising Cybercriminals: A Review

Authors: Matthew Edwards, Emma Williams, Claudia Peersman, Awais Rashid

Abstract: This review provides an overview of current research on the known characteristics and motivations of offenders engaging in cyber-dependent crimes. Due to the shifting dynamics of cybercriminal behaviour, and the availability of prior reviews in 2013, this review focuses on original research conducted from 2012 onwards, although some older studies that were not included in prior reviews are also considered. As a basis for interpretation of results, a limited quality assessment was also carried out on included studies through examination of key indicators.

Submitted 15 February, 2022; originally announced February 2022.

arXiv:1905.12593 [pdf, other]

Automatically Dismantling Online Dating Fraud

Authors: Guillermo Suarez-Tangil, Matthew Edwards, Claudia Peersman, Gianluca Stringhini, Awais Rashid, Monica Whitty

Abstract: Online romance scams are a prevalent form of mass-marketing fraud in the West, and yet few studies have addressed the technical or data-driven responses to this problem. In this type of scam, fraudsters craft fake profiles and manually interact with their victims. Because of the characteristics of this type of fraud and of how dating sites operate, traditional detection methods (e.g., those used in spam filtering) are ineffective. In this paper, we present the results of a multi-pronged investigation into the archetype of online dating profiles used in this form of fraud, including their use of demographics, profile descriptions, and images, shedding light on both the strategies deployed by scammers to appeal to victims and the traits of victims themselves. Further, in response to the severe financial and psychological harm caused by dating fraud, we develop a system to detect romance scammers on online dating platforms. Our work presents the first system for automatically detecting this fraud. Our aim is to provide an early detection system to stop romance scammers as they create fraudulent profiles or before they engage with potential victims. Previous research has indicated that the victims of romance scams score highly on scales for idealized romantic beliefs. We combine a range of structured, unstructured, and deep-learned features that capture these beliefs. No prior work has fully analyzed whether these notions of romance introduce traits that could be leveraged to build a detection system. Our ensemble machine-learning approach is robust to the omission of profile details and performs at high accuracy (97%). The system enables development of automated tools for dating site providers and individual users.

Submitted 30 May, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

arXiv:1601.02431 [pdf]

The Effects of Age, Gender and Region on Non-standard Linguistic Variation in Online Social Networks

Authors: Claudia Peersman, Walter Daelemans, Reinhild Vandekerckhove, Bram Vandekerckhove, Leona Van Vaerenbergh

Abstract: We present a corpus-based analysis of the effects of age, gender and region of origin on the production of both "netspeak" or "chatspeak" features and regional speech features in Flemish Dutch posts that were collected from a Belgian online social network platform. The present study shows that combining quantitative and qualitative approaches is essential for understanding non-standard linguistic variation in a CMC corpus. It also presents a methodology that enables the systematic study of this variation by including all non-standard words in the corpus. The analyses resulted in a convincing illustration of the Adolescent Peak Principle. In addition, our approach revealed an intriguing correlation between the use of regional speech features and chatspeak features.

Submitted 11 January, 2016; originally announced January 2016.

Showing 1–7 of 7 results for author: Peersman, C