
Hamming Distributions of Popular Perceptual Hashing Techniques

Authors: Sean McKeown, William J Buchanan

Abstract: Content-based file matching has been widely deployed for decades, largely for the detection of sources of copyright infringement, extremist materials, and abusive sexual media. Perceptual hashes, such as Microsoft's PhotoDNA, are one automated mechanism for facilitating detection, allowing machines to approximately match visual features of an image or video in a robust manner. However, there does not appear to be much public evaluation of such approaches, particularly when it comes to how effective they are against content-preserving modifications to media files. In this paper, we present a million-image scale evaluation of several perceptual hashing archetypes for popular algorithms (including Facebook's PDQ, Apple's NeuralHash, and the popular pHash library) against seven image variants. The focal point is the distribution of Hamming distance scores between both unrelated images and image variants to better understand the problems faced by each approach.

Submitted 15 December, 2022; originally announced December 2022.

Journal ref: DFRWS (Digital Forensics Research Conference) EU 2023, 21-24 March 2023, Bonn, Germany
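
Note: the Hamming distance mentioned in the abstract above counts the bit positions at which two equal-length hashes differ. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes hashes are held as Python integers; the 64-bit default length and the function names are illustrative assumptions, and the hash lengths of the algorithms named in the abstract are not stated there.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions where two hashes (stored as ints) differ."""
    # XOR leaves a 1 in every position where the two inputs disagree.
    return bin(hash_a ^ hash_b).count("1")


def normalised_distance(hash_a: int, hash_b: int, bits: int = 64) -> float:
    """Scale the distance to [0, 1]; `bits` is an assumed hash length."""
    return hamming_distance(hash_a, hash_b) / bits


# Hypothetical hash values, chosen only to illustrate the arithmetic.
original = 0xF0F0F0F0F0F0F0F0
variant = 0xF0F0F0F0F0F0F0F1   # differs from `original` in one bit
print(hamming_distance(original, variant))
print(normalised_distance(original, variant))
```

In a matching system, a small distance suggests a variant of the same image and a distance near half the hash length suggests unrelated images; where to place the threshold depends on the score distributions that the paper studies.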

arXiv:2007.13410 [pdf]

doi 10.1109/CyberSecurity49315.2020.9138887

Testing And Hardening IoT Devices Against the Mirai Botnet

Authors: Christopher Kelly, Nikolaos Pitropakis, Sean McKeown, Costas Lambrinoudakis

Abstract: A large majority of cheap Internet of Things (IoT) devices that arrive brand new, and are configured with out-of-the-box settings, are not being properly secured by the manufacturers, and are vulnerable to existing malware lurking on the Internet. Among them is the Mirai botnet which has had its source code leaked to the world, allowing any malicious actor to configure and unleash it. A combination of software assets not being utilised safely and effectively is exposing consumers to a full compromise. We configured and attacked four different IoT devices using the Mirai libraries. Our experiments concluded that three out of the four devices were vulnerable to the Mirai malware and became infected when deployed using their default configuration. This demonstrates that the original security configurations are not sufficient to provide acceptable levels of protection for consumers, leaving their devices exposed and vulnerable. By analysing the Mirai libraries and its attack vectors, we were able to determine appropriate device configuration countermeasures to harden the devices against this botnet, which were successfully validated through experimentation.

Submitted 27 July, 2020; originally announced July 2020.

Comments: 8 pages, conference paper

arXiv:2006.08749 [pdf]

doi 10.1109/CyberSecurity49315.2020.9138849

Using Amazon Alexa APIs as a Source of Digital Evidence

Authors: Clemens Krueger, Sean McKeown

Abstract: With the release of Amazon Alexa and the first Amazon Echo device, the company revolutionised the smart home. It allowed their users to communicate with, and control, their smart home ecosystem purely using voice commands. However, this also means that Amazon processes and stores a large amount of personal data about their users, as these devices are always present and always listening in people's private homes. That makes this data a valuable source of evidence for investigators performing digital forensics. The Alexa Voice Service uses a series of APIs for communication between clients and the Amazon cloud. These APIs return a wide range of data related to the functionality of the device used. The first goal of this research was to clarify exactly what kind of information about the user is stored and accessible through these APIs. To do this, a combination of literature review and exploratory analysis was used to establish a list of all relevant APIs. Then, possible artefacts and conclusions to be drawn from their responses were identified and presented. Lastly, the perspective of the users was taken, and options for improving their privacy were reviewed. Specifically, the history of interaction between the user and Alexa is available through multiple APIs, and there are several options to delete it. It was determined that these options have different behaviours and that most of them do not remove all data related to user interaction.

Submitted 27 July, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

arXiv:2006.08060 [pdf]

doi 10.1109/CyberSecurity49315.2020.9138890

Forensic Considerations for the High Efficiency Image File Format (HEIF)

Authors: Sean McKeown, Gordon Russell

Abstract: The High Efficiency Image File Format (HEIF) was adopted by Apple in 2017 as their favoured means of capturing images from their camera application, with Android devices such as the Galaxy S10 providing support more recently. The format is positioned to replace JPEG as the de facto image compression file type, touting many modern features and better compression ratios over the aging standard. However, while millions of devices across the world are already able to produce HEIF files, digital forensics research has not given the format much attention. As HEIF is a complex container format, much different from traditional still picture formats, this leaves forensics practitioners exposed to risks of potentially mishandling evidence. This paper describes the forensically relevant features of the HEIF format, including those which could be used to hide data, or cause issues in an investigation, while also providing commentary on the state of software support for the format. Finally, suggestions for current best-practice are provided, before discussing the requirements of a forensically robust HEIF analysis tool.

Submitted 27 July, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: 8 pages, conference paper pre-print

arXiv:2006.01849 [pdf, other]

doi 10.1109/CyberSecurity49315.2020.9138859

Towards Identifying Human Actions, Intent, and Severity of APT Attacks Applying Deception Techniques -- An Experiment

Authors: Joel Chacon, Sean McKeown, Richard Macfarlane

Abstract: Attacks by Advanced Persistent Threats (APTs) have been shown to be difficult to detect using traditional signature- and anomaly-based intrusion detection approaches. Deception techniques such as decoy objects, often called honey items, may be deployed for intrusion detection and attack analysis, providing an alternative to detect APT behaviours. This work explores the use of honey items to classify intrusion interactions, differentiating automated attacks from those which need some human reasoning and interaction towards APT detection. Multiple decoy items are deployed on honeypots in a virtual honey network, some as breadcrumbs to detect indications of a structured manual attack. Monitoring functionality was created around Elastic Stack with a Kibana dashboard created to display interactions with various honey items. APT-type manual intrusions are simulated by an experienced pentesting practitioner carrying out simulated attacks. Interactions with honey items are evaluated in order to determine their suitability for discriminating between automated tools and direct human intervention. The results show that it is possible to differentiate automatic attacks from manual structured attacks from the nature of the interactions with the honey items. The use of honey items found in the honeypot, such as in later parts of a structured attack, has been shown to be successful in classification of manual attacks, as well as towards providing an indication of severity of the attacks.

Submitted 2 June, 2020; originally announced June 2020.

arXiv:2005.06599 [pdf, other]

doi 10.5220/0008902202890298

Phishing URL Detection Through Top-level Domain Analysis: A Descriptive Approach

Authors: Orestis Christou, Nikolaos Pitropakis, Pavlos Papadopoulos, Sean McKeown, William J. Buchanan

Abstract: Phishing is considered to be one of the most prevalent cyber-attacks because of its immense flexibility and alarmingly high success rate. Even with adequate training and high situational awareness, it can still be hard for users to continually be aware of the URL of the website they are visiting. Traditional detection methods rely on blocklists and content analysis, both of which require time-consuming human verification. Thus, there have been attempts focusing on the predictive filtering of such URLs. This study aims to develop a machine-learning model to detect fraudulent URLs which can be used within the Splunk platform. Inspired by similar approaches in the literature, we trained the SVM and Random Forests algorithms using malicious and benign datasets found in the literature and one dataset that we created. We evaluated the algorithms' performance with precision and recall, reaching up to 85% precision and 87% recall in the case of Random Forests while SVM achieved up to 90% precision and 88% recall using only descriptive features.

Submitted 13 May, 2020; originally announced May 2020.

Comments: In Proceedings of the 6th ICISSP

MSC Class: 68-06

Journal ref: ICISSP, Volume 1, pages 289-298 (2020)
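
Note: the precision and recall figures quoted in the abstract above follow the standard definitions for a binary classifier. The following is a minimal sketch of those definitions, not the authors' evaluation code. The function name and the confusion-matrix counts in the example are hypothetical and are not taken from the paper.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true/false positive and false negative counts."""
    # Precision: of the URLs flagged as phishing, the fraction that really were.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the URLs that really were phishing, the fraction that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, chosen only to illustrate the arithmetic.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f}")
```

Reporting both values matters for phishing filters: high precision with low recall means few false alarms but many missed phishing URLs, and the reverse means most phishing URLs are caught at the cost of flagging benign ones.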

Showing 1–6 of 6 results for author: McKeown, S