-
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Authors:
Akila Wickramasekara,
Frank Breitinger,
Mark Scanlon
Abstract:
The growing number of cases that require digital forensic analysis raises concerns about the ability of law enforcement to conduct investigations promptly. Consequently, this paper delves into the potential and effectiveness of integrating Large Language Models (LLMs) into digital forensic investigation to address these challenges. A comprehensive literature review is carried out, encompassing exi…
▽ More
The growing number of cases that require digital forensic analysis raises concerns about the ability of law enforcement to conduct investigations promptly. Consequently, this paper delves into the potential and effectiveness of integrating Large Language Models (LLMs) into digital forensic investigation to address these challenges. A comprehensive literature review is carried out, encompassing existing digital forensic models, tools, LLMs, deep learning techniques, and the use of LLMs in investigations. The review identifies current challenges within existing digital forensic processes and explores both the obstacles and possibilities of incorporating LLMs. In conclusion, the study asserts that the adoption of LLMs in digital forensics, with appropriate constraints, has the potential to improve investigation efficiency, improve traceability, and alleviate technical and judicial barriers faced by law enforcement entities.
△ Less
Submitted 11 June, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Computer Vision for Multimedia Geolocation in Human Trafficking Investigation: A Systematic Literature Review
Authors:
Opeyemi Bamigbade,
John Sheppard,
Mark Scanlon
Abstract:
The task of multimedia geolocation is becoming an increasingly essential component of the digital forensics toolkit to effectively combat human trafficking, child sexual exploitation, and other illegal acts. Typically, metadata-based geolocation information is stripped when multimedia content is shared via instant messaging and social media. The intricacy of geolocating, geotagging, or finding geo…
▽ More
The task of multimedia geolocation is becoming an increasingly essential component of the digital forensics toolkit to effectively combat human trafficking, child sexual exploitation, and other illegal acts. Typically, metadata-based geolocation information is stripped when multimedia content is shared via instant messaging and social media. The intricacy of geolocating, geotagging, or finding geographical clues in this content is often overly burdensome for investigators. Recent research has shown that contemporary advancements in artificial intelligence, specifically computer vision and deep learning, show significant promise towards expediting the multimedia geolocation task. This systematic literature review thoroughly examines the state-of-the-art leveraging computer vision techniques for multimedia geolocation and assesses their potential to expedite human trafficking investigation. This includes a comprehensive overview of the application of computer vision-based approaches to multimedia geolocation, identifies their applicability in combating human trafficking, and highlights the potential implications of enhanced multimedia geolocation for prosecuting human trafficking. 123 articles inform this systematic literature review. The findings suggest numerous potential paths for future impactful research on the subject.
△ Less
Submitted 23 February, 2024;
originally announced February 2024.
-
Systematic Literature Review of EM-SCA Attacks on Encryption
Authors:
Muhammad Rusyaidi Zunaidi,
Asanka Sayakkara,
Mark Scanlon
Abstract:
Cryptography is vital for data security, but cryptographic algorithms can still be vulnerable to side-channel attacks (SCAs), physical assaults exploiting power consumption and EM radiation. SCAs pose a significant threat to cryptographic integrity, compromising device keys. While literature on SCAs focuses on real-world devices, the rise of sophisticated devices necessitates fresh approaches. Ele…
▽ More
Cryptography is vital for data security, but cryptographic algorithms can still be vulnerable to side-channel attacks (SCAs), physical assaults exploiting power consumption and EM radiation. SCAs pose a significant threat to cryptographic integrity, compromising device keys. While literature on SCAs focuses on real-world devices, the rise of sophisticated devices necessitates fresh approaches. Electromagnetic side-channel analysis (EM-SCA) gathers information by monitoring EM radiation, capable of retrieving encryption keys and detecting malicious activity. This study evaluates EM-SCA's impact on encryption across scenarios and explores its role in digital forensics and law enforcement. Addressing encryption susceptibility to EM-SCA can empower forensic investigators in overcoming encryption challenges, maintaining their crucial role in law enforcement. Additionally, the paper defines EM-SCA's current state in attacking encryption, highlighting vulnerable and resistant encryption algorithms and devices, and promising EM-SCA approaches. This study offers a comprehensive analysis of EM-SCA in law enforcement and digital forensics, suggesting avenues for further research.
△ Less
Submitted 15 February, 2024;
originally announced February 2024.
-
Benchmarks for Retrospective Automated Driving System Crash Rate Analysis Using Police-Reported Crash Data
Authors:
John M. Scanlon,
Kristofer D. Kusano,
Laura A. Fraade-Blanar,
Timothy L. McMurry,
Yin-Hsiu Chen,
Trent Victor
Abstract:
With fully automated driving systems (ADS; SAE level 4) ride-hailing services expanding in the US, we are now approaching an inflection point, where the process of retrospectively evaluating ADS safety impact can start to yield statistically credible conclusions. An ADS safety impact measurement requires a comparison to a "benchmark" crash rate. This study aims to address, update, and extend the e…
▽ More
With fully automated driving systems (ADS; SAE level 4) ride-hailing services expanding in the US, we are now approaching an inflection point, where the process of retrospectively evaluating ADS safety impact can start to yield statistically credible conclusions. An ADS safety impact measurement requires a comparison to a "benchmark" crash rate. This study aims to address, update, and extend the existing literature by leveraging police-reported crashes to generate human crash rates for multiple geographic areas with current ADS deployments. All of the data leveraged is publicly accessible, and the benchmark determination methodology is intended to be repeatable and transparent. Generating a benchmark that is comparable to ADS crash data is associated with certain challenges, including data selection, handling underreporting and reporting thresholds, identifying the population of drivers and vehicles to compare against, choosing an appropriate severity level to assess, and matching crash and mileage exposure data. Consequently, we identify essential steps when generating benchmarks, and present our analyses amongst a backdrop of existing ADS benchmark literature. One analysis presented is the usage of established underreporting correction methodology to publicly available human driver police-reported data to improve comparability to publicly available ADS crash data. We also identify important dependencies in controlling for geographic region, road type, and vehicle type, and show how failing to control for these features can bias results. This body of work aims to contribute to the ability of the community - researchers, regulators, industry, and experts - to reach consensus on how to estimate accurate benchmarks.
△ Less
Submitted 20 December, 2023;
originally announced December 2023.
-
Comparison of Waymo Rider-Only Crash Data to Human Benchmarks at 7.1 Million Miles
Authors:
Kristofer D. Kusano,
John M. Scanlon,
Yin-Hsiu Chen,
Timothy L. McMurry,
Ruoshu Chen,
Tilia Gode,
Trent Victor
Abstract:
This paper examines the safety performance of the Waymo Driver, an SAE level 4 automated driving system (ADS) used in a rider-only (RO) ride-hailing application without a human driver, either in the vehicle or remotely. ADS crash data was derived from NHTSA's Standing General Order (SGO) reporting over 7.14 million RO miles through the end of October 2023 in Phoenix, AZ, San Francisco, CA, and Los…
▽ More
This paper examines the safety performance of the Waymo Driver, an SAE level 4 automated driving system (ADS) used in a rider-only (RO) ride-hailing application without a human driver, either in the vehicle or remotely. ADS crash data was derived from NHTSA's Standing General Order (SGO) reporting over 7.14 million RO miles through the end of October 2023 in Phoenix, AZ, San Francisco, CA, and Los Angeles, CA. This study is one of the first to compare overall crashed vehicle rates using only RO data (as opposed to ADS testing with a human behind the wheel) to a human benchmark that also corrects for biases caused by underreporting and unequal reporting thresholds reported in the literature. When considering all locations together, the any-injury-reported crashed vehicle rate was 0.41 incidents per million miles (IPMM) for the ADS vs 2.78 IPMM for the human benchmark, an 85% reduction or a 6.8 times lower rate. Police-reported crashed vehicle rates for all locations together were 2.1 IPMM for the ADS vs. 4.85 IPMM for the human benchmark, a 57% reduction or 2.3 times lower rate. Police-reported and any-injury-reported crashed vehicle rate reductions for the ADS were statistically significant when compared in San Francisco and Phoenix as well as combined across all locations. The comparison in Los Angeles, which to date has low mileage and no reported events, was not statistically significant. In general, the Waymo ADS had a lower any property damage or injury rate than the human benchmarks. Given imprecision in the benchmark estimate and multiple potential sources of underreporting biasing the benchmarks, caution should be taken when interpreting the results of the any property damage or injury comparison. Together, these crash-rate results should be interpreted as a directional and continuous confidence growth indicator, together with other methodologies, in a safety case approach.
△ Less
Submitted 19 December, 2023;
originally announced December 2023.
-
Ensuring Cross-Device Portability of Electromagnetic Side-Channel Analysis
Authors:
Lojenaa Navanesana,
Nhien-An Le-Khac,
Mark Scanlon,
Kasun De Zoysa,
Asanka P. Sayakkara
Abstract:
Investigation on smart devices has become an essential subdomain in digital forensics. The inherent diversity and complexity of smart devices pose a challenge to the extraction of evidence without physically tampering with it, which is often a strict requirement in law enforcement and legal proceedings. Recently, this has led to the application of non-intrusive Electromagnetic Side-Channel Analysi…
▽ More
Investigation on smart devices has become an essential subdomain in digital forensics. The inherent diversity and complexity of smart devices pose a challenge to the extraction of evidence without physically tampering with it, which is often a strict requirement in law enforcement and legal proceedings. Recently, this has led to the application of non-intrusive Electromagnetic Side-Channel Analysis (EM-SCA) as an emerging approach to extract forensic insights from smart devices. EM-SCA for digital forensics is still in its infancy, and has only been tested on a small number of devices so far. Most importantly, the question still remains whether Machine Learning (ML) models in EM-SCA are portable across multiple devices to be useful in digital forensics, i.e., cross-device portability. This study experimentally explores this aspect of EM-SCA using a wide set of smart devices. The experiments using various iPhones and Nordic Semiconductor nRF52-DK devices indicate that the direct application of pre-trained ML models across multiple identical devices does not yield optimal outcomes (under 20% accuracy in most cases). Subsequent experiments included collecting distinct samples of EM traces from all the devices to train new ML models with mixed device data; this also fell short of expectations (still below 20% accuracy). This prompted the adoption of transfer learning techniques, which showed promise for cross-model implementations. In particular, for the iPhone 13 and nRF52-DK devices, applying transfer learning techniques resulted in achieving the highest accuracy, with accuracy scores of 98% and 96%, respectively. This result makes a significant advancement in the application of EM-SCA to digital forensics by enabling the use of pre-trained models across identical or similar devices.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.
-
DFRWS EU 10-Year Review and Future Directions in Digital Forensic Research
Authors:
Frank Breitinger,
Jan-Niclas Hilgert,
Christopher Hargreaves,
John Sheppard,
Rebekah Overdorf,
Mark Scanlon
Abstract:
Conducting a systematic literature review and comprehensive analysis, this paper surveys all 135 peer-reviewed articles published at the Digital Forensics Research Conference Europe (DFRWS EU) spanning the decade since its inaugural running (2014-2023). This comprehensive study of DFRWS EU articles encompasses sub-disciplines such as digital forensic science, device forensics, techniques and funda…
▽ More
Conducting a systematic literature review and comprehensive analysis, this paper surveys all 135 peer-reviewed articles published at the Digital Forensics Research Conference Europe (DFRWS EU) spanning the decade since its inaugural running (2014-2023). This comprehensive study of DFRWS EU articles encompasses sub-disciplines such as digital forensic science, device forensics, techniques and fundamentals, artefact forensics, multimedia forensics, memory forensics, and network forensics. Quantitative analysis of the articles' co-authorships, geographical spread and citation metrics are outlined. The analysis presented offers insights into the evolution of digital forensic research efforts over these ten years and informs some identified future research directions.
△ Less
Submitted 15 March, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
ChatGPT for Digital Forensic Investigation: The Good, The Bad, and The Unknown
Authors:
Mark Scanlon,
Frank Breitinger,
Christopher Hargreaves,
Jan-Niclas Hilgert,
John Sheppard
Abstract:
The disruptive application of ChatGPT (GPT-3.5, GPT-4) to a variety of domains has become a topic of much discussion in the scientific community and society at large. Large Language Models (LLMs), e.g., BERT, Bard, Generative Pre-trained Transformers (GPTs), LLaMA, etc., have the ability to take instructions, or prompts, from users and generate answers and solutions based on very large volumes of…
▽ More
The disruptive application of ChatGPT (GPT-3.5, GPT-4) to a variety of domains has become a topic of much discussion in the scientific community and society at large. Large Language Models (LLMs), e.g., BERT, Bard, Generative Pre-trained Transformers (GPTs), LLaMA, etc., have the ability to take instructions, or prompts, from users and generate answers and solutions based on very large volumes of text-based training data. This paper assesses the impact and potential impact of ChatGPT on the field of digital forensics, specifically looking at its latest pre-trained LLM, GPT-4. A series of experiments are conducted to assess its capability across several digital forensic use cases including artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education. Across these topics, its strengths and risks are outlined and a number of general conclusions are drawn. Overall this paper concludes that while there are some potential low-risk applications of ChatGPT within digital forensics, many are either unsuitable at present, since the evidence would need to be uploaded to the service, or they require sufficient knowledge of the topic being asked of the tool to identify incorrect assumptions, inaccuracies, and mistakes. However, to an appropriately knowledgeable user, it could act as a useful supporting tool in some circumstances.
△ Less
Submitted 10 July, 2023;
originally announced July 2023.
-
Increase Investment in Accessible Physics Labs: A Call to Action for the Physics Education Community
Authors:
Dimitri R. Dounas-Frazer,
Daniel Gillen,
Catherine M. Herne,
Erin Howard,
Rebecca S. Lindell,
G I. McGrew,
J. Reid Mumford,
Newton H. Nguyen,
L. C. Osadchuk,
Jamie Principato Crane,
Tyler M. Pugeda,
Kevauna Reeves,
Erin M. Scanlon,
David Spiecker,
Sheila Z. Xu
Abstract:
The American Association of Physics Teachers (AAPT) Committee on Laboratories assembled a task force whose charge was to write an open letter to the physics education community calling for increased investment in accessible lab courses. Contributors to this paper include students, staff, and faculty with and without disabilities who expressed interest in the open letter. In this document, we recog…
▽ More
The American Association of Physics Teachers (AAPT) Committee on Laboratories assembled a task force whose charge was to write an open letter to the physics education community calling for increased investment in accessible lab courses. Contributors to this paper include students, staff, and faculty with and without disabilities who expressed interest in the open letter. In this document, we recognize the need for making physics laboratories more accessible in all spaces (e.g., high school courses, graduate level courses, research labs). We focus on the experiences of students with disabilities in physics lab courses at the undergraduate level because that is the context for which the writing team had the most collective experience. The intended audiences for this document consist of undergraduate physics students, staff, and faculty, especially those who have direct stake in laboratory courses; physics departments; and member societies, including AAPT.
We begin by presenting our motivation for the document and the importance of accessibility and diversity in education and the workforce. We start with the broader context of accessibility, narrowing our focus to physics education and the current state of affairs and availability of accessible resources. Accessibility is then discussed in the specific context of physics laboratory courses, focusing on how barriers are created and can be lowered. In exploring ideas and strategies for improving accessibility, we recognize that the development of multiple pathways for laboratory investigation creates opportunities to expand learning opportunities for more students in physics lab programs.
△ Less
Submitted 1 February, 2022;
originally announced February 2022.
-
SoK: Exploring the State of the Art and the Future Potential of Artificial Intelligence in Digital Forensic Investigation
Authors:
Xiaoyu Du,
Chris Hargreaves,
John Sheppard,
Felix Anda,
Asanka Sayakkara,
Nhien-An Le-Khac,
Mark Scanlon
Abstract:
Multi-year digital forensic backlogs have become commonplace in law enforcement agencies throughout the globe. Digital forensic investigators are overloaded with the volume of cases requiring their expertise compounded by the volume of data to be processed. Artificial intelligence is often seen as the solution to many big data problems. This paper summarises existing artificial intelligence based…
▽ More
Multi-year digital forensic backlogs have become commonplace in law enforcement agencies throughout the globe. Digital forensic investigators are overloaded with the volume of cases requiring their expertise compounded by the volume of data to be processed. Artificial intelligence is often seen as the solution to many big data problems. This paper summarises existing artificial intelligence based tools and approaches in digital forensics. Automated evidence processing leveraging artificial intelligence based techniques shows great promise in expediting the digital forensic analysis process while increasing case processing capacities. For each application of artificial intelligence highlighted, a number of current challenges and future potential impact is discussed.
△ Less
Submitted 2 December, 2020;
originally announced December 2020.
-
Automated Artefact Relevancy Determination from Artefact Metadata and Associated Timeline Events
Authors:
Xiaoyu Du,
Quan Le,
Mark Scanlon
Abstract:
Case-hindering, multi-year digital forensic evidence backlogs have become commonplace in law enforcement agencies throughout the world. This is due to an ever-growing number of cases requiring digital forensic investigation coupled with the growing volume of data to be processed per case. Leveraging previously processed digital forensic cases and their component artefact relevancy classifications…
▽ More
Case-hindering, multi-year digital forensic evidence backlogs have become commonplace in law enforcement agencies throughout the world. This is due to an ever-growing number of cases requiring digital forensic investigation coupled with the growing volume of data to be processed per case. Leveraging previously processed digital forensic cases and their component artefact relevancy classifications can facilitate an opportunity for training automated artificial intelligence based evidence processing systems. These can significantly aid investigators in the discovery and prioritisation of evidence. This paper presents one approach for file artefact relevancy determination building on the growing trend towards a centralised, Digital Forensics as a Service (DFaaS) paradigm. This approach enables the use of previously encountered pertinent files to classify newly discovered files in an investigation. Trained models can aid in the detection of these files during the acquisition stage, i.e., during their upload to a DFaaS system. The technique generates a relevancy score for file similarity using each artefact's filesystem metadata and associated timeline events. The approach presented is validated against three experimental usage scenarios.
△ Less
Submitted 2 December, 2020;
originally announced December 2020.
-
Assessing the Influencing Factors on the Accuracy of Underage Facial Age Estimation
Authors:
Felix Anda,
Brett A. Becker,
David Lillis,
Nhien-An Le-Khac,
Mark Scanlon
Abstract:
Swift response to the detection of endangered minors is an ongoing concern for law enforcement. Many child-focused investigations hinge on digital evidence discovery and analysis. Automated age estimation techniques are needed to aid in these investigations to expedite this evidence discovery process, and decrease investigator exposure to traumatic material. Automated techniques also show promise…
▽ More
Swift response to the detection of endangered minors is an ongoing concern for law enforcement. Many child-focused investigations hinge on digital evidence discovery and analysis. Automated age estimation techniques are needed to aid in these investigations to expedite this evidence discovery process, and decrease investigator exposure to traumatic material. Automated techniques also show promise in decreasing the overflowing backlog of evidence obtained from increasing numbers of devices and online services. A lack of sufficient training data combined with natural human variance has been long hindering accurate automated age estimation -- especially for underage subjects. This paper presented a comprehensive evaluation of the performance of two cloud age estimation services (Amazon Web Service's Rekognition service and Microsoft Azure's Face API) against a dataset of over 21,800 underage subjects. The objective of this work is to evaluate the influence that certain human biometric factors, facial expressions, and image quality (i.e. blur, noise, exposure and resolution) have on the outcome of automated age estimation services. A thorough evaluation allows us to identify the most influential factors to be overcome in future age estimation systems.
△ Less
Submitted 2 December, 2020;
originally announced December 2020.
-
Smarter Password Guessing Techniques Leveraging Contextual Information and OSINT
Authors:
Aikaterini Kanta,
Iwen Coisel,
Mark Scanlon
Abstract:
In recent decades, criminals have increasingly used the web to research, assist and perpetrate criminal behaviour. One of the most important ways in which law enforcement can battle this growing trend is through accessing pertinent information about suspects in a timely manner. A significant hindrance to this is the difficulty of accessing any system a suspect uses that requires authentication via…
▽ More
In recent decades, criminals have increasingly used the web to research, assist and perpetrate criminal behaviour. One of the most important ways in which law enforcement can battle this growing trend is through accessing pertinent information about suspects in a timely manner. A significant hindrance to this is the difficulty of accessing any system a suspect uses that requires authentication via password. Password guessing techniques generally consider common user behaviour while generating their passwords, as well as the password policy in place. Such techniques can offer a modest success rate considering a large/average population. However, they tend to fail when focusing on a single target -- especially when the latter is an educated user taking precautions as a savvy criminal would be expected to do. Open Source Intelligence is being increasingly leveraged by Law Enforcement in order to gain useful information about a suspect, but very little is currently being done to integrate this knowledge in an automated way within password cracking. The purpose of this research is to delve into the techniques that enable the gathering of the necessary context about a suspect and find ways to leverage this information within password guessing techniques.
△ Less
Submitted 2 December, 2020;
originally announced December 2020.
-
Retracing the Flow of the Stream: Investigating Kodi Streaming Services
Authors:
Samuel Todd Bromley,
John Sheppard,
Mark Scanlon,
Nhien-An Le-Khac
Abstract:
Kodi is of one of the world's largest open-source streaming platforms for viewing video content. Easily installed Kodi add-ons facilitate access to online pirated videos and streaming content by facilitating the user to search and view copyrighted videos with a basic level of technical knowledge. In some countries, there have been paid child sexual abuse organizations publishing/streaming child ab…
▽ More
Kodi is of one of the world's largest open-source streaming platforms for viewing video content. Easily installed Kodi add-ons facilitate access to online pirated videos and streaming content by facilitating the user to search and view copyrighted videos with a basic level of technical knowledge. In some countries, there have been paid child sexual abuse organizations publishing/streaming child abuse material to an international paying clientele. Open source software used for viewing videos from the Internet, such as Kodi, is being exploited by criminals to conduct their activities. In this paper, we describe a new method to quickly locate Kodi artifacts and gather information for a successful prosecution. We also evaluate our approach on different platforms; Windows, Android and Linux. Our experiments show the file location, artifacts and a history of viewed content including their locations from the Internet. Our approach will serve as a resource to forensic investigators to examine Kodi or similar streaming platforms.
△ Less
Submitted 2 December, 2020;
originally announced December 2020.
-
Improving Borderline Adulthood Facial Age Estimation through Ensemble Learning
Authors:
Felix Anda,
David Lillis,
Aikaterini Kanta,
Brett A. Becker,
Elias Bou-Harb,
Nhien-An Le-Khac,
Mark Scanlon
Abstract:
Achieving high performance for facial age estimation with subjects in the borderline between adulthood and non-adulthood has always been a challenge. Several studies have used different approaches from the age of a baby to an elder adult and different datasets have been employed to measure the mean absolute error (MAE) ranging between 1.47 to 8 years. The weakness of the algorithms specifically in…
▽ More
Achieving high performance for facial age estimation with subjects in the borderline between adulthood and non-adulthood has always been a challenge. Several studies have used different approaches from the age of a baby to an elder adult and different datasets have been employed to measure the mean absolute error (MAE) ranging between 1.47 to 8 years. The weakness of the algorithms specifically in the borderline has been a motivation for this paper. In our approach, we have developed an ensemble technique that improves the accuracy of underage estimation in conjunction with our deep learning model (DS13K) that has been fine-tuned on the Deep Expectation (DEX) model. We have achieved an accuracy of 68% for the age group 16 to 17 years old, which is 4 times better than the DEX accuracy for such age range. We also present an evaluation of existing cloud-based and offline facial age prediction services, such as Amazon Rekognition, Microsoft Azure Cognitive Services, How-Old.net and DEX.
△ Less
Submitted 2 July, 2019;
originally announced July 2019.
-
Methodology for the Automated Metadata-Based Classification of Incriminating Digital Forensic Artefacts
Authors:
Xiaoyu Du,
Mark Scanlon
Abstract:
The ever increasing volume of data in digital forensic investigation is one of the most discussed challenges in the field. Usually, most of the file artefacts on seized devices are not pertinent to the investigation. Manually retrieving suspicious files relevant to the investigation is akin to finding a needle in a haystack. In this paper, a methodology for the automatic prioritisation of suspicio…
▽ More
The ever increasing volume of data in digital forensic investigation is one of the most discussed challenges in the field. Usually, most of the file artefacts on seized devices are not pertinent to the investigation. Manually retrieving suspicious files relevant to the investigation is akin to finding a needle in a haystack. In this paper, a methodology for the automatic prioritisation of suspicious file artefacts (i.e., file artefacts that are pertinent to the investigation) is proposed to reduce the manual analysis effort required. This methodology is designed to work in a human-in-the-loop fashion. In other words, it predicts/recommends that an artefact is likely to be suspicious rather than giving the final analysis result. A supervised machine learning approach is employed, which leverages the recorded results of previously processed cases. The process of features extraction, dataset generation, training and evaluation are presented in this paper. In addition, a toolkit for data extraction from disk images is outlined, which enables this method to be integrated with the conventional investigation process and work in an automated fashion.
△ Less
Submitted 2 July, 2019;
originally announced July 2019.
-
Leveraging Electromagnetic Side-Channel Analysis for the Investigation of IoT Devices
Authors:
Asanka Sayakkara,
Nhien-An Le-Khac,
Mark Scanlon
Abstract:
Internet of Things (IoT) devices have expanded the horizon of digital forensic investigations by providing a rich set of new evidence sources. IoT devices includes health implants, sports wearables, smart burglary alarms, smart thermostats, smart electrical appliances, and many more. Digital evidence from these IoT devices is often extracted from third party sources, e.g., paired smartphone applic…
▽ More
Internet of Things (IoT) devices have expanded the horizon of digital forensic investigations by providing a rich set of new evidence sources. IoT devices includes health implants, sports wearables, smart burglary alarms, smart thermostats, smart electrical appliances, and many more. Digital evidence from these IoT devices is often extracted from third party sources, e.g., paired smartphone applications or the devices' back-end cloud services. However vital digital evidence can still reside solely on the IoT device itself. The specifics of the IoT device's hardware is a black-box in many cases due to the lack of proven, established techniques to inspect IoT devices. This paper presents a novel methodology to inspect the internal software activities of IoT devices through their electromagnetic radiation emissions during live device investigation. When a running IoT device is identified at a crime scene, forensically important software activities can be revealed through an electromagnetic side-channel analysis (EM-SCA) attack. By using two representative IoT hardware platforms, this work demonstrates that cryptographic algorithms running on high-end IoT devices can be detected with over 82% accuracy, while minor software code differences in low-end IoT devices could be detected over 90% accuracy using a neural network-based classifier. Furthermore, it was experimentally demonstrated that malicious modification of the stock firmware of an IoT device can be detected through machine learning-assisted EM-SCA techniques. These techniques provide a new investigative vector for digital forensic investigators to inspect IoT devices.
△ Less
Submitted 3 April, 2019;
originally announced April 2019.
-
A Survey of Electromagnetic Side-Channel Attacks and Discussion on their Case-Progressing Potential for Digital Forensics
Authors:
Asanka Sayakkara,
Nhien-An Le-Khac,
Mark Scanlon
Abstract:
The increasing prevalence of Internet of Things (IoT) devices has made it inevitable that their pertinence to digital forensic investigations will increase into the foreseeable future. These devices produced by various vendors often posses limited standard interfaces for communication, such as USB ports or WiFi/Bluetooth wireless interfaces. Meanwhile, with an increasing mainstream focus on the se…
▽ More
The increasing prevalence of Internet of Things (IoT) devices has made it inevitable that their pertinence to digital forensic investigations will increase into the foreseeable future. These devices produced by various vendors often posses limited standard interfaces for communication, such as USB ports or WiFi/Bluetooth wireless interfaces. Meanwhile, with an increasing mainstream focus on the security and privacy of user data, built-in encryption is becoming commonplace in consumer-level computing devices, and IoT devices are no exception. Under these circumstances, a significant challenge is presented to digital forensic investigations where data from IoT devices needs to be analysed. This work explores the electromagnetic (EM) side-channel analysis literature for the purpose of assisting digital forensic investigations on IoT devices. EM side-channel analysis is a technique where unintentional electromagnetic emissions are used for eavesdrop** on the operations and data handling of computing devices. The non-intrusive nature of EM side-channel approaches makes it a viable option to assist digital forensic investigations as these attacks require, and must result in, no modification to the target device. The literature on various EM side-channel analysis attack techniques are discussed - selected on the basis of their applicability in IoT device investigation scenarios. The insight gained from the background study is used to identify promising future applications of the technique for digital forensic analysis on IoT devices - potentially progressing a wide variety of currently hindered digital investigations.
△ Less
Submitted 18 March, 2019;
originally announced March 2019.
-
Shining a light on Spotlight: Leveraging Apple's desktop search utility to recover deleted file metadata on macOS
Authors:
Tajvinder Singh Atwal,
Mark Scanlon,
Nhien-An Le-Khac
Abstract:
Spotlight is a proprietary desktop search technology released by Apple in 2004 for its Macintosh operating system Mac OS X 10.4 (Tiger) and remains as a feature in current releases of macOS. Spotlight allows users to search for files or information by querying databases populated with filesystem attributes, metadata, and indexed textual content. Existing forensic research into Spotlight has provid…
▽ More
Spotlight is a proprietary desktop search technology released by Apple in 2004 for its Macintosh operating system Mac OS X 10.4 (Tiger) and remains as a feature in current releases of macOS. Spotlight allows users to search for files or information by querying databases populated with filesystem attributes, metadata, and indexed textual content. Existing forensic research into Spotlight has provided an understanding of the metadata attributes stored within the metadata store database. Current approaches in the literature have also enabled the extraction of metadata records for extant files, but not for deleted files. The objective of this paper is to research the persistence of records for deleted files within Spotlight's metadata store, identify if deleted database pages are recoverable from unallocated space on the volume, and to present a strategy for the processing of discovered records. In this paper, the structure of the metadata store database is outlined, and experimentation reveals that records persist for a period of time within the database but once deleted, are no longer recoverable. The experimentation also demonstrates that deleted pages from the database (containing metadata records) are recoverable from unused space on the filesystem.
△ Less
Submitted 17 March, 2019;
originally announced March 2019.
-
Deep learning at the shallow end: Malware classification for non-domain experts
Authors:
Quan Le,
OisÃn Boydell,
Brian Mac Namee,
Mark Scanlon
Abstract:
Current malware detection and classification approaches generally rely on time consuming and knowledge intensive processes to extract patterns (signatures) and behaviors from malware, which are then used for identification. Moreover, these signatures are often limited to local, contiguous sequences within the data whilst ignoring their context in relation to each other and throughout the malware f…
▽ More
Current malware detection and classification approaches generally rely on time consuming and knowledge intensive processes to extract patterns (signatures) and behaviors from malware, which are then used for identification. Moreover, these signatures are often limited to local, contiguous sequences within the data whilst ignoring their context in relation to each other and throughout the malware file as a whole. We present a Deep Learning based malware classification approach that requires no expert domain knowledge and is based on a purely data driven approach for complex pattern and feature identification.
△ Less
Submitted 22 July, 2018;
originally announced July 2018.
-
Digital forensic investigation of two-way radio communication equipment and services
Authors:
Arie Kouwen,
Mark Scanlon,
Kim-Kwang Raymond Choo,
Nhien-An Le-Khac
Abstract:
Historically, radio-equipment has solely been used as a two-way analogue communication device. Today, the use of radio communication equipment is increasing by numerous organisations and businesses. The functionality of these traditionally short-range devices have expanded to include private call, address book, call-logs, text messages, lone worker, telemetry, data communication, and GPS. Many of…
▽ More
Historically, radio-equipment has solely been used as a two-way analogue communication device. Today, the use of radio communication equipment is increasing by numerous organisations and businesses. The functionality of these traditionally short-range devices have expanded to include private call, address book, call-logs, text messages, lone worker, telemetry, data communication, and GPS. Many of these devices also integrate with smartphones, which delivers Push-To-Talk services that make it possible to setup connections between users using a two-way radio and a smartphone. In fact, these devices can be used to connect users only using smartphones. To date, there is little research on the digital traces in modern radio communication equipment. In fact, increasing the knowledge base about these radio communication devices and services can be valuable to law enforcement in a police investigation. In this paper, we investigate what kind of radio communication equipment and services law enforcement digital investigators can encounter at a crime scene or in an investigation. Subsequent to seizure of this radio communication equipment we explore the traces, which may have a forensic interest and how these traces can be acquired. Finally, we test our approach on sample radio communication equipment and services.
△ Less
Submitted 22 July, 2018;
originally announced July 2018.
-
Network Intell: Enabling the Non-Expert Analysis of Large Volumes of Intercepted Network Traffic
Authors:
Erwin van de Wiel,
Mark Scanlon,
Nhien-An Le-Khac
Abstract:
In criminal investigations, telecommunication wiretaps have become a common technique used by law enforcement. While phone-based wiretap** is well documented and the procedure for their execution are well known, the same cannot be said for Internet taps. Lawfully intercepted network traffic often contains a lot of encrypted traffic making it increasingly difficult to find useful information insi…
▽ More
In criminal investigations, telecommunication wiretaps have become a common technique used by law enforcement. While phone-based wiretap** is well documented and the procedure for their execution are well known, the same cannot be said for Internet taps. Lawfully intercepted network traffic often contains a lot of encrypted traffic making it increasingly difficult to find useful information inside the traffic captured. The advent of Internet-of-Things further complicates the process for non-technical investigators. The current level of complexity of intercepted network traffic is close to a point where data cannot be analysed without supervision of a digital investigator with advanced network knowledge. Current investigations focus on analysing all traffic in a chronological manner and are predominately conducted on the data contents of the intercepted traffic. This approach often becomes overly arduous when the amount of data to be analysed becomes very large. In this paper, we propose a novel approach to analyse large amounts of intercepted network traffic based on network metadata. Our approach significantly reduces the duration of the analysis and also produces an insight view of analysing results for the non-technical investigator. We also test our approach with a large sample of network traffic data.
△ Less
Submitted 27 January, 2018; v1 submitted 15 December, 2017;
originally announced December 2017.
-
Hierarchical Bloom Filter Trees for Approximate Matching
Authors:
David Lillis,
Frank Breitinger,
Mark Scanlon
Abstract:
Bytewise approximate matching algorithms have in recent years shown significant promise in de- tecting files that are similar at the byte level. This is very useful for digital forensic investigators, who are regularly faced with the problem of searching through a seized device for pertinent data. A common scenario is where an investigator is in possession of a collection of "known-illegal" files…
▽ More
Bytewise approximate matching algorithms have in recent years shown significant promise in de- tecting files that are similar at the byte level. This is very useful for digital forensic investigators, who are regularly faced with the problem of searching through a seized device for pertinent data. A common scenario is where an investigator is in possession of a collection of "known-illegal" files (e.g. a collection of child abuse material) and wishes to find whether copies of these are stored on the seized device. Approximate matching addresses shortcomings in traditional hashing, which can only find identical files, by also being able to deal with cases of merged files, embedded files, partial files, or if a file has been changed in any way.
Most approximate matching algorithms work by comparing pairs of files, which is not a scalable approach when faced with large corpora. This paper demonstrates the effectiveness of using a "Hierarchical Bloom Filter Tree" (HBFT) data structure to reduce the running time of collection-against-collection matching, with a specific focus on the MRSH-v2 algorithm. Three experiments are discussed, which explore the effects of different configurations of HBFTs. The proposed approach dramatically reduces the number of pairwise comparisons required, and demonstrates substantial speed gains, while maintaining effectiveness.
△ Less
Submitted 12 December, 2017;
originally announced December 2017.
-
Study of Peer-to-Peer Network Based Cybercrime Investigation: Application on Botnet Technologies
Authors:
Mark Scanlon
Abstract:
The scalable, low overhead attributes of Peer-to-Peer (P2P) Internet protocols and networks lend themselves well to being exploited by criminals to execute a large range of cybercrimes. The types of crimes aided by P2P technology include copyright infringement, sharing of illicit images of children, fraud, hacking/cracking, denial of service attacks and virus/malware propagation through the use of…
▽ More
The scalable, low overhead attributes of Peer-to-Peer (P2P) Internet protocols and networks lend themselves well to being exploited by criminals to execute a large range of cybercrimes. The types of crimes aided by P2P technology include copyright infringement, sharing of illicit images of children, fraud, hacking/cracking, denial of service attacks and virus/malware propagation through the use of a variety of worms, botnets, malware, viruses and P2P file sharing. This project is focused on study of active P2P nodes along with the analysis of the undocumented communication methods employed in many of these large unstructured networks. This is achieved through the design and implementation of an efficient P2P monitoring and crawling toolset. The requirement for investigating P2P based systems is not limited to the more obvious cybercrimes listed above, as many legitimate P2P based applications may also be pertinent to a digital forensic investigation, e.g, voice over IP, instant messaging, etc. Investigating these networks has become increasingly difficult due to the broad range of network topologies and the ever increasing and evolving range of P2P based applications. In this work we introduce the Universal P2P Network Investigation Framework (UP2PNIF), a framework which enables significantly faster and less labour intensive investigation of newly discovered P2P networks through the exploitation of the commonalities in P2P network functionality. In combination with a reference database of known network characteristics, it is envisioned that any known P2P network can be instantly investigated using the framework, which can intelligently determine the best investigation methodology and greatly expedite the evidence gathering process. A proof of concept tool was developed for conducting investigations on the BitTorrent network.
△ Less
Submitted 9 December, 2017;
originally announced December 2017.
-
Enabling the Remote Acquisition of Digital Forensic Evidence through Secure Data Transmission and Verification
Authors:
Mark Scanlon
Abstract:
Providing the ability to any law enforcement officer to remotely transfer an image from any suspect computer directly to a forensic laboratory for analysis, can only help to greatly reduce the time wasted by forensic investigators in conducting on-site collection of computer equipment. RAFT (Remote Acquisition Forensic Tool) is a system designed to facilitate forensic investigators by remotely gat…
▽ More
Providing the ability to any law enforcement officer to remotely transfer an image from any suspect computer directly to a forensic laboratory for analysis, can only help to greatly reduce the time wasted by forensic investigators in conducting on-site collection of computer equipment. RAFT (Remote Acquisition Forensic Tool) is a system designed to facilitate forensic investigators by remotely gathering digital evidence. This is achieved through the implementation of a secure, verifiable client/server imaging architecture. The RAFT system is designed to be relatively easy to use, requiring minimal technical knowledge on behalf of the user. One of the key focuses of RAFT is to ensure that the evidence it gathers remotely is court admissible. This is achieved by ensuring that the image taken using RAFT is verified to be identical to the original evidence on a suspect computer.
△ Less
Submitted 7 December, 2017;
originally announced December 2017.
-
Increasing digital investigator availability through efficient workflow management and automation
Authors:
Ronald In de Braekt,
Nhien-An Le-Khac,
Jason Farina,
Mark Scanlon,
M-Tahar Kechadi
Abstract:
The growth of digital storage capacities and diversity devices has had a significant time impact on digital forensic laboratories in law enforcement. Backlogs have become commonplace and increasingly more time is spent in the acquisition and preparation steps of an investigation as opposed to detailed evidence analysis and reporting. There is generally little room for increasing digital investigat…
▽ More
The growth of digital storage capacities and diversity devices has had a significant time impact on digital forensic laboratories in law enforcement. Backlogs have become commonplace and increasingly more time is spent in the acquisition and preparation steps of an investigation as opposed to detailed evidence analysis and reporting. There is generally little room for increasing digital investigation capacity in law enforcement digital forensic units and the allocated budgets for these units are often decreasing. In the context of develo** an efficient investigation process, one of the key challenges amounts to how to achieve more with less. This paper proposes a workflow management automation framework for handling common digital forensic tools. The objective is to streamline the digital investigation workflow - enabling more efficient use of limited hardware and software. The proposed automation framework reduces the time digital forensic experts waste conducting time-consuming, though necessary, tasks. The evidence processing time is decreased through server-side automation resulting in 24/7 evidence preparation. The proposed framework increases efficiency of use of forensic software and hardware, reduces the infrastructure costs and license fees, and simplifies the preparation steps for the digital investigator. The proposed approach is evaluated in a real-world scenario to evaluate its robustness and highlight its benefits.
△ Less
Submitted 29 August, 2017;
originally announced August 2017.
-
Private Web Browser Forensics: A Case Study of the Epic Privacy Browser
Authors:
Alan Reed,
Mark Scanlon,
Nhien-An Le-Khac
Abstract:
Organised crime, as well as individual criminals, is benefiting from the protection of private browsers provide to those who would carry out illegal activity, such as money laundering, drug trafficking, the online exchange of child-abuse material, etc. The protection afforded to users of the Epic Privacy Browser illustrates these benefits. This browser is currently in use in approximately 180 coun…
▽ More
Organised crime, as well as individual criminals, is benefiting from the protection of private browsers provide to those who would carry out illegal activity, such as money laundering, drug trafficking, the online exchange of child-abuse material, etc. The protection afforded to users of the Epic Privacy Browser illustrates these benefits. This browser is currently in use in approximately 180 countries worldwide. This paper outlines the location and type of evidence available through live and post-mortem state analyses of the Epic Privacy Browser. This study identifies the manner in which the browser functions during use, where evidence can be recovered after use, as well as the tools and effective presentation of the recovered material.
△ Less
Submitted 4 January, 2018; v1 submitted 5 August, 2017;
originally announced August 2017.
-
Integration of Ether Unpacker into Ragpicker for plugin-based Malware Analysis and Identification
Authors:
Erik Schaefer,
Nhien-An Le-Khac,
Mark Scanlon
Abstract:
Malware is a pervasive problem in both personal computing devices and distributed computing systems. Identification of malware variants and their families others a great benefit in early detection resulting in a reduction of the analyses time needed. In order to classify malware, most of the current approaches are based on the analysis of the unpacked and unencrypted binaries. However, most of the…
▽ More
Malware is a pervasive problem in both personal computing devices and distributed computing systems. Identification of malware variants and their families others a great benefit in early detection resulting in a reduction of the analyses time needed. In order to classify malware, most of the current approaches are based on the analysis of the unpacked and unencrypted binaries. However, most of the unpacking solutions in the literature have a low unpacking rate. This results in a low contribution towards the identification of transferred code and re-used code. To develop a new malware analysis solution based on clusters of binary code sections, it is required to focus on increasing of the unpacking rate of malware samples to extend the underlying code database. In this paper, we present a new approach of analysing malware by integrating Ether Unpacker into the plugin-based malware analysis tool, Ragpicker. We also evaluate our approach against real-world malware patterns.
△ Less
Submitted 5 August, 2017;
originally announced August 2017.
-
Evaluation of Digital Forensic Process Models with Respect to Digital Forensics as a Service
Authors:
Xiaoyu Du,
Nhien-An Le-Khac,
Mark Scanlon
Abstract:
Digital forensic science is very much still in its infancy, but is becoming increasingly invaluable to investigators. A popular area for research is seeking a standard methodology to make the digital forensic process accurate, robust, and efficient. The first digital forensic process model proposed contains four steps: Acquisition, Identification, Evaluation and Admission. Since then, numerous pro…
▽ More
Digital forensic science is very much still in its infancy, but is becoming increasingly invaluable to investigators. A popular area for research is seeking a standard methodology to make the digital forensic process accurate, robust, and efficient. The first digital forensic process model proposed contains four steps: Acquisition, Identification, Evaluation and Admission. Since then, numerous process models have been proposed to explain the steps of identifying, acquiring, analysing, storage, and reporting on the evidence obtained from various digital devices. In recent years, an increasing number of more sophisticated process models have been proposed. These models attempt to speed up the entire investigative process or solve various of problems commonly encountered in the forensic investigation. In the last decade, cloud computing has emerged as a disruptive technological concept, and most leading enterprises such as IBM, Amazon, Google, and Microsoft have set up their own cloud-based services. In the field of digital forensic investigation, moving to a cloud-based evidence processing model would be extremely beneficial and preliminary attempts have been made in its implementation. Moving towards a Digital Forensics as a Service model would not only expedite the investigative process, but can also result in significant cost savings - freeing up digital forensic experts and law enforcement personnel to progress their caseload. This paper aims to evaluate the applicability of existing digital forensic process models and analyse how each of these might apply to a cloud-based evidence processing paradigm.
△ Less
Submitted 5 August, 2017;
originally announced August 2017.
-
Privileged Data within Digital Evidence
Authors:
Dominique Fleurbaaij,
Mark Scanlon,
Nhien-An Le-Khac
Abstract:
In recent years the use of digital communication has increased. This also increased the chance to find privileged data in the digital evidence. Privileged data is protected by law from viewing by anyone other than the client. It is up to the digital investigator to handle this privileged data properly without being able to view the contents. Procedures on handling this information are available, b…
▽ More
In recent years the use of digital communication has increased. This also increased the chance to find privileged data in the digital evidence. Privileged data is protected by law from viewing by anyone other than the client. It is up to the digital investigator to handle this privileged data properly without being able to view the contents. Procedures on handling this information are available, but do not provide any practical information nor is it known how effective filtering is. The objective of this paper is to describe the handling of privileged data in the current digital forensic tools and the creation of a script within the digital forensic tool Nuix. The script automates the handling of privileged data to minimize the exposure of the contents to the digital investigator. The script also utilizes technology within Nuix that extends the automated search of identical privileged document to relate files based on their contents. A comparison of the 'traditional' ways of filtering within the digital forensic tools and the script written in Nuix showed that digital forensic tools are still limited when used on privileged data. The script manages to increase the effectiveness as direct result of the use of relations based on file content.
△ Less
Submitted 5 August, 2017;
originally announced August 2017.
-
EviPlant: An efficient digital forensic challenge creation, manipulation and distribution solution
Authors:
Mark Scanlon,
Xiaoyu Du,
David Lillis
Abstract:
Education and training in digital forensics requires a variety of suitable challenge corpora containing realistic features including regular wear-and-tear, background noise, and the actual digital traces to be discovered during investigation. Typically, the creation of these challenges requires overly arduous effort on the part of the educator to ensure their viability. Once created, the challenge…
▽ More
Education and training in digital forensics requires a variety of suitable challenge corpora containing realistic features including regular wear-and-tear, background noise, and the actual digital traces to be discovered during investigation. Typically, the creation of these challenges requires overly arduous effort on the part of the educator to ensure their viability. Once created, the challenge image needs to be stored and distributed to a class for practical training. This storage and distribution step requires significant time and resources and may not even be possible in an online/distance learning scenario due to the data sizes involved. As part of this paper, we introduce a more capable methodology and system as an alternative to current approaches. EviPlant is a system designed for the efficient creation, manipulation, storage and distribution of challenges for digital forensics education and training. The system relies on the initial distribution of base disk images, i.e., images containing solely base operating systems. In order to create challenges for students, educators can boot the base system, emulate the desired activity and perform a "diffing" of resultant image and the base image. This diffing process extracts the modified artefacts and associated metadata and stores them in an "evidence package". Evidence packages can be created for different personae, different wear-and-tear, different emulated crimes, etc., and multiple evidence packages can be distributed to students and integrated into the base images. A number of additional applications in digital forensic challenge creation for tool testing and validation, proficiency testing, and malware analysis are also discussed as a result of using EviPlant.
△ Less
Submitted 28 April, 2017;
originally announced April 2017.
-
Towards the Leveraging of Data Deduplication to Break the Disk Acquisition Speed Limit
Authors:
Hannah Wolahan,
Claudio Chico Lorenzo,
Elias Bou-Harb,
Mark Scanlon
Abstract:
Digital forensic evidence acquisition speed is traditionally limited by two main factors: the read speed of the storage device being investigated, i.e., the read speed of the disk, memory, remote storage, mobile device, etc.), and the write speed of the system used for storing the acquired data. Digital forensic investigators can somewhat mitigate the latter issue through the use of high-speed sto…
▽ More
Digital forensic evidence acquisition speed is traditionally limited by two main factors: the read speed of the storage device being investigated, i.e., the read speed of the disk, memory, remote storage, mobile device, etc.), and the write speed of the system used for storing the acquired data. Digital forensic investigators can somewhat mitigate the latter issue through the use of high-speed storage options, such as networked RAID storage, in the controlled environment of the forensic laboratory. However, traditionally, little can be done to improve the acquisition speed past its physical read speed from the target device itself. The protracted time taken for data acquisition wastes digital forensic experts' time, contributes to digital forensic investigation backlogs worldwide, and delays pertinent information from potentially influencing the direction of an investigation. In a remote acquisition scenario, a third contributing factor can also become a detriment to the overall acquisition time - typically the Internet upload speed of the acquisition system. This paper explores an alternative to the traditional evidence acquisition model through the leveraging of a forensic data deduplication system. The advantages that a deduplicated approach can provide over the current digital forensic evidence acquisition process are outlined and some preliminary results of a prototype implementation are discussed.
△ Less
Submitted 20 October, 2016; v1 submitted 18 October, 2016;
originally announced October 2016.
-
Battling the Digital Forensic Backlog through Data Deduplication
Authors:
Mark Scanlon
Abstract:
In everyday life. Technological advancement can be found in many facets of life, including personal computers, mobile devices, wearables, cloud services, video gaming, web-powered messaging, social media, Internet-connected devices, etc. This technological influence has resulted in these technologies being employed by criminals to conduct a range of crimes -- both online and offline. Both the numb…
▽ More
In everyday life. Technological advancement can be found in many facets of life, including personal computers, mobile devices, wearables, cloud services, video gaming, web-powered messaging, social media, Internet-connected devices, etc. This technological influence has resulted in these technologies being employed by criminals to conduct a range of crimes -- both online and offline. Both the number of cases requiring digital forensic analysis and the sheer volume of information to be processed in each case has increased rapidly in recent years. As a result, the requirement for digital forensic investigation has ballooned, and law enforcement agencies throughout the world are scrambling to address this demand. While more and more members of law enforcement are being trained to perform the required investigations, the supply is not kee** up with the demand. Current digital forensic techniques are arduously time-consuming and require a significant amount of man power to execute. This paper discusses a novel solution to combat the digital forensic backlog. This solution leverages a deduplication-based paradigm to eliminate the reacquisition, redundant storage, and reanalysis of previously processed data.
△ Less
Submitted 2 October, 2016;
originally announced October 2016.
-
Current Challenges and Future Research Areas for Digital Forensic Investigation
Authors:
David Lillis,
Brett Becker,
Tadhg O'Sullivan,
Mark Scanlon
Abstract:
Given the ever-increasing prevalence of technology in modern life, there is a corresponding increase in the likelihood of digital devices being pertinent to a criminal investigation or civil litigation. As a direct consequence, the number of investigations requiring digital forensic expertise is resulting in huge digital evidence backlogs being encountered by law enforcement agencies throughout th…
▽ More
Given the ever-increasing prevalence of technology in modern life, there is a corresponding increase in the likelihood of digital devices being pertinent to a criminal investigation or civil litigation. As a direct consequence, the number of investigations requiring digital forensic expertise is resulting in huge digital evidence backlogs being encountered by law enforcement agencies throughout the world. It can be anticipated that the number of cases requiring digital forensic analysis will greatly increase in the future. It is also likely that each case will require the analysis of an increasing number of devices including computers, smartphones, tablets, cloud-based services, Internet of Things devices, wearables, etc. The variety of new digital evidence sources pose new and challenging problems for the digital investigator from an identification, acquisition, storage and analysis perspective. This paper explores the current challenges contributing to the backlog in digital forensics from a technical standpoint and outlines a number of future research topics that could greatly contribute to a more efficient digital forensic process.
△ Less
Submitted 13 April, 2016;
originally announced April 2016.
-
Tiered Forensic Methodology Model for Digital Field Triage by Non-Digital Evidence Specialists
Authors:
Ben Hitchcock,
Nhien-An Le-Khac,
Mark Scanlon
Abstract:
Due to budgetary constraints and the high level of training required, digital forensic analysts are in short supply in police forces the world over. This inevitably leads to a prolonged time taken between an investigator sending the digital evidence for analysis and receiving the analytical report back. In an attempt to expedite this procedure, various process models have been created to place the…
▽ More
Due to budgetary constraints and the high level of training required, digital forensic analysts are in short supply in police forces the world over. This inevitably leads to a prolonged time taken between an investigator sending the digital evidence for analysis and receiving the analytical report back. In an attempt to expedite this procedure, various process models have been created to place the forensic analyst in the field conducting a triage of the digital evidence. By conducting triage in the field, an investigator is able to act upon pertinent information quicker, while waiting on the full report. The work presented as part of this paper focuses on the training of front-line personnel in the field triage process, without the need of a forensic analyst attending the scene. The premise has been successfully implemented within regular/non-digital forensics, i.e., crime scene investigation. In that field, front-line members have been trained in specific tasks to supplement the trained specialists. The concept of front-line members conducting triage of digital evidence in the field is achieved through the development of a new process model providing guidance to these members. To prove the model's viability, an implementation of this new process model is presented and evaluated. The results outlined demonstrate how a tiered response involving digital evidence specialists and non-specialists can better deal with the increasing number of investigations involving digital evidence.
△ Less
Submitted 13 April, 2016;
originally announced April 2016.
-
Towards the Forensic Identification and Investigation of Cloud Hosted Servers through Noninvasive Wiretaps
Authors:
Hessel Schut,
Mark Scanlon,
Jason Farina,
Nhien-An Le-Khac
Abstract:
When conducting modern cybercrime investigations, evidence has often to be gathered from computer systems located at cloud-based data centres of hosting providers. In cases where the investigation cannot rely on the cooperation of the hosting provider, or where documentation is not available, investigators can often find the identification of which distinct server among many is of interest difficu…
▽ More
When conducting modern cybercrime investigations, evidence has often to be gathered from computer systems located at cloud-based data centres of hosting providers. In cases where the investigation cannot rely on the cooperation of the hosting provider, or where documentation is not available, investigators can often find the identification of which distinct server among many is of interest difficult and extremely time consuming. To address the problem of identifying these servers, in this paper a new approach to rapidly and reliably identify these cloud hosting computer systems is presented. In the outlined approach, a handheld device composed of an embedded computer combined with a method of undetectable interception of Ethernet based communications is presented. This device is tested and evaluated, and a discussion is provided on its usefulness in identifying of server of interest to an investigation.
△ Less
Submitted 2 October, 2015;
originally announced October 2015.
-
HTML5 Zero Configuration Covert Channels: Security Risks and Challenges
Authors:
Jason Farina,
Mark Scanlon,
Stephen Kohlmann,
Nhien-An Le Khac,
M-Tahar Kechadi
Abstract:
In recent months there has been an increase in the popularity and public awareness of secure, cloudless file transfer systems. The aim of these services is to facilitate the secure transfer of files in a peer-to- peer (P2P) fashion over the Internet without the need for centralised authentication or storage. These services can take the form of client installed applications or entirely web browser…
▽ More
In recent months there has been an increase in the popularity and public awareness of secure, cloudless file transfer systems. The aim of these services is to facilitate the secure transfer of files in a peer-to- peer (P2P) fashion over the Internet without the need for centralised authentication or storage. These services can take the form of client installed applications or entirely web browser based interfaces. Due to their P2P nature, there is generally no limit to the file sizes involved or to the volume of data transmitted - and where these limitations do exist they will be purely reliant on the capacities of the systems at either end of the transfer. By default, many of these services provide seamless, end-to-end encryption to their users. The cybersecurity and cyberforensic consequences of the potential criminal use of such services are significant. The ability to easily transfer encrypted data over the Internet opens up a range of opportunities for illegal use to cybercriminals requiring minimal technical know-how. This paper explores a number of these services and provides an analysis of the risks they pose to corporate and governmental security. A number of methods for the forensic investigation of such transfers are discussed.
△ Less
Submitted 2 October, 2015;
originally announced October 2015.
-
Project Maelstrom: Forensic Analysis of the BitTorrent-Powered Browser
Authors:
Jason Farina,
M-Tahar Kechadi,
Mark Scanlon
Abstract:
In April 2015, BitTorrent Inc. released their distributed peer-to-peer powered browser, Project Maelstrom, into public beta. The browser facilitates a new alternative website distribution paradigm to the traditional HTTP-based, client-server model. This decentralised web is powered by each of the visitors accessing each Maelstrom hosted website. Each user shares their copy of the website's source…
▽ More
In April 2015, BitTorrent Inc. released their distributed peer-to-peer powered browser, Project Maelstrom, into public beta. The browser facilitates a new alternative website distribution paradigm to the traditional HTTP-based, client-server model. This decentralised web is powered by each of the visitors accessing each Maelstrom hosted website. Each user shares their copy of the website's source code and multimedia content with new visitors. As a result, a Maelstrom hosted website cannot be taken offline by law enforcement or any other parties. Due to this open distribution model, a number of interesting censorship, security and privacy considerations are raised. This paper explores the application, its protocol, sharing Maelstrom content and its new visitor powered "web-hosting" paradigm.
△ Less
Submitted 2 October, 2015;
originally announced October 2015.
-
Network investigation methodology for BitTorrent Sync: A Peer-to-Peer based file synchronisation service
Authors:
Mark Scanlon,
Jason Farina,
M-Tahar Kechadi
Abstract:
High availability is no longer just a business continuity concern. Users are increasingly dependant on devices that consume and produce data in ever increasing volumes. A popular solution is to have a central repository which each device accesses after centrally managed authentication. This model of use is facilitated by cloud based file synchronisation services such as Dropbox, OneDrive, Google D…
▽ More
High availability is no longer just a business continuity concern. Users are increasingly dependant on devices that consume and produce data in ever increasing volumes. A popular solution is to have a central repository which each device accesses after centrally managed authentication. This model of use is facilitated by cloud based file synchronisation services such as Dropbox, OneDrive, Google Drive and Apple iCloud. Cloud architecture allows the provisioning of storage space with "always-on" access. Recent concerns over unauthorised access to third party systems and large scale exposure of private data have made an alternative solution desirable. These events have caused users to assess their own security practices and the level of trust placed in third party storage services. One option is BitTorrent Sync, a cloudless synchronisation utility provides data availability and redundancy. This utility replicates files stored in shares to remote peers with access controlled by keys and permissions. While lacking the economies brought about by scale, complete control over data access has made this a popular solution. The ability to replicate data without oversight introduces risk of abuse by users as well as difficulties for forensic investigators. This paper suggests a methodology for investigation and analysis of the protocol to assist in the control of data flow across security perimeters.
△ Less
Submitted 3 June, 2015;
originally announced June 2015.
-
Digital Evidence Bag Selection for P2P Network Investigation
Authors:
Mark Scanlon,
M-Tahar Kechadi
Abstract:
The collection and handling of court admissible evidence is a fundamental component of any digital forensic investigation. While the procedures for handling digital evidence take much of their influence from the established policies for the collection of physical evidence, due to the obvious differences in dealing with non-physical evidence, a number of extra policies and procedures are required.…
▽ More
The collection and handling of court admissible evidence is a fundamental component of any digital forensic investigation. While the procedures for handling digital evidence take much of their influence from the established policies for the collection of physical evidence, due to the obvious differences in dealing with non-physical evidence, a number of extra policies and procedures are required. This paper compares and contrasts some of the existing digital evidence formats or "bags" and analyses them for their compatibility with evidence gathered from a network source. A new digital extended evidence bag is proposed to specifically deal with evidence gathered from P2P networks, incorporating the network byte stream and on-the-fly metadata generation to aid in expedited identification and analysis.
△ Less
Submitted 30 September, 2014;
originally announced September 2014.
-
The Case for a Collaborative Universal Peer-to-Peer Botnet Investigation Framework
Authors:
Mark Scanlon,
M-Tahar Kechadi
Abstract:
Peer-to-Peer (P2P) botnets are becoming widely used as a low-overhead, efficient, self-maintaining, distributed alternative to the traditional client/server model across a broad range of cyberattacks. These cyberattacks can take the form of distributed denial of service attacks, authentication cracking, spamming, cyberwarfare or malware distribution targeting on financial systems. These attacks ca…
▽ More
Peer-to-Peer (P2P) botnets are becoming widely used as a low-overhead, efficient, self-maintaining, distributed alternative to the traditional client/server model across a broad range of cyberattacks. These cyberattacks can take the form of distributed denial of service attacks, authentication cracking, spamming, cyberwarfare or malware distribution targeting on financial systems. These attacks can also cross over into the physical world attacking critical infrastructure causing its disruption or destruction (power, communications, water, etc.). P2P technology lends itself well to being exploited for such malicious purposes due to the minimal setup, running and maintenance costs involved in executing a globally orchestrated attack, alongside the perceived additional layer of anonymity. In the ever-evolving space of botnet technology, reducing the time lag between discovering a newly developed or updated botnet system and gaining the ability to mitigate against it is paramount. Often, numerous investigative bodies duplicate their efforts in creating bespoke tools to combat particular threats. This paper outlines a framework capable of fast tracking the investigative process through collaboration between key stakeholders.
△ Less
Submitted 30 September, 2014;
originally announced September 2014.
-
BitTorrent Sync: Network Investigation Methodology
Authors:
Mark Scanlon,
Jason Farina,
M-Tahar Kechadi
Abstract:
The volume of personal information and data most Internet users find themselves amassing is ever increasing and the fast pace of the modern world results in most requiring instant access to their files. Millions of these users turn to cloud based file synchronisation services, such as Dropbox, Microsoft Skydrive, Apple iCloud and Google Drive, to enable "always-on" access to their most up-to-date…
▽ More
The volume of personal information and data most Internet users find themselves amassing is ever increasing and the fast pace of the modern world results in most requiring instant access to their files. Millions of these users turn to cloud based file synchronisation services, such as Dropbox, Microsoft Skydrive, Apple iCloud and Google Drive, to enable "always-on" access to their most up-to-date data from any computer or mobile device with an Internet connection. The prevalence of recent articles covering various invasion of privacy issues and data protection breaches in the media has caused many to review their online security practices with their personal information. To provide an alternative to cloud based file backup and synchronisation, BitTorrent Inc. released an alternative cloudless file backup and synchronisation service, named BitTorrent Sync to alpha testers in April 2013. BitTorrent Sync's popularity rose dramatically throughout 2013, reaching over two million active users by the end of the year. This paper outlines a number of scenarios where the network investigation of the service may prove invaluable as part of a digital forensic investigation. An investigation methodology is proposed outlining the required steps involved in retrieving digital evidence from the network and the results from a proof of concept investigation are presented.
△ Less
Submitted 30 September, 2014;
originally announced September 2014.
-
Leveraging Decentralization to Extend the Digital Evidence Acquisition Window: Case Study on BitTorrent Sync
Authors:
Mark Scanlon,
Jason Farina,
Nhien An Le Khac,
Tahar Kechadi
Abstract:
File synchronization services such as Dropbox, Google Drive, Microsoft OneDrive, Apple iCloud, etc., are becoming increasingly popular in today's always-connected world. A popular alternative to the aforementioned services is BitTorrent Sync. This is a decentralized/cloudless file synchronization service and is gaining significant popularity among Internet users with privacy concerns over where th…
▽ More
File synchronization services such as Dropbox, Google Drive, Microsoft OneDrive, Apple iCloud, etc., are becoming increasingly popular in today's always-connected world. A popular alternative to the aforementioned services is BitTorrent Sync. This is a decentralized/cloudless file synchronization service and is gaining significant popularity among Internet users with privacy concerns over where their data is stored and who has the ability to access it. The focus of this paper is the remote recovery of digital evidence pertaining to files identified as being accessed or stored on a suspect's computer or mobile device. A methodology for the identification, investigation, recovery and verification of such remote digital evidence is outlined. Finally, a proof-of-concept remote evidence recovery from BitTorrent Sync shared folder highlighting a number of potential scenarios for the recovery and verification of such evidence.
△ Less
Submitted 30 September, 2014;
originally announced September 2014.
-
BitTorrent Sync: First Impressions and Digital Forensic Implications
Authors:
Jason Farina,
Mark Scanlon,
M-Tahar Kechadi
Abstract:
With professional and home Internet users becoming increasingly concerned with data protection and privacy, the privacy afforded by popular cloud file synchronisation services, such as Dropbox, OneDrive and Google Drive, is coming under scrutiny in the press. A number of these services have recently been reported as sharing information with governmental security agencies without warrants. BitTorre…
▽ More
With professional and home Internet users becoming increasingly concerned with data protection and privacy, the privacy afforded by popular cloud file synchronisation services, such as Dropbox, OneDrive and Google Drive, is coming under scrutiny in the press. A number of these services have recently been reported as sharing information with governmental security agencies without warrants. BitTorrent Sync is seen as an alternative by many and has gathered over two million users by December 2013 (doubling since the previous month). The service is completely decentralised, offers much of the same synchronisation functionality of cloud powered services and utilises encryption for data transmission (and optionally for remote storage). The importance of understanding BitTorrent Sync and its resulting digital investigative implications for law enforcement and forensic investigators will be paramount to future investigations. This paper outlines the client application, its detected network traffic and identifies artefacts that may be of value as evidence for future digital investigations.
△ Less
Submitted 29 September, 2014;
originally announced September 2014.
-
An Analysis of BitTorrent Cross-Swarm Peer Participation and Geolocational Distribution
Authors:
Mark Scanlon,
Huijie Shen
Abstract:
Peer-to-Peer (P2P) file-sharing is becoming increasingly popular in recent years. In 2012, it was reported that P2P traffic consumed over 5,374 petabytes per month, which accounted for approximately 20.5% of consumer internet traffic. TV is the popular content type on The Pirate Bay (the world's largest BitTorrent indexing website). In this paper, an analysis of the swarms of the most popular pira…
▽ More
Peer-to-Peer (P2P) file-sharing is becoming increasingly popular in recent years. In 2012, it was reported that P2P traffic consumed over 5,374 petabytes per month, which accounted for approximately 20.5% of consumer internet traffic. TV is the popular content type on The Pirate Bay (the world's largest BitTorrent indexing website). In this paper, an analysis of the swarms of the most popular pirated TV shows is conducted. The purpose of this data gathering exercise is to enumerate the peer distribution at different geolocational levels, to measure the temporal trend of the swarm and to discover the amount of cross-swarm peer participation. Snapshots containing peer related information involved in the unauthorised distribution of this content were collected at a high frequency resulting in a more accurate landscape of the total involvement. The volume of data collected throughout the monitoring of the network exceeded 2 terabytes. The presented analysis and the results presented can aid in network usage prediction, bandwidth provisioning and future network design.
△ Less
Submitted 29 September, 2014;
originally announced September 2014.
-
Understanding Student Computational Thinking with Computational Modeling
Authors:
John M. Aiken,
Marcos D. Caballero,
Scott S. Douglas,
John B. Burk,
Erin M. Scanlon,
Brian D. Thoms,
Michael F. Schatz
Abstract:
Recently, the National Research Council's framework for next generation science standards highlighted "computational thinking" as one of its "fundamental practices". 9th Grade students taking a physics course that employed the Modeling Instruction curriculum were taught to construct computational models of physical systems. Student computational thinking was assessed using a proctored programming…
▽ More
Recently, the National Research Council's framework for next generation science standards highlighted "computational thinking" as one of its "fundamental practices". 9th Grade students taking a physics course that employed the Modeling Instruction curriculum were taught to construct computational models of physical systems. Student computational thinking was assessed using a proctored programming assignment, written essay, and a series of think-aloud interviews, where the students produced and discussed a computational model of a baseball in motion via a high-level programming environment (VPython). Roughly a third of the students in the study were successful in completing the programming assignment. Student success on this assessment was tied to how students synthesized their knowledge of physics and computation. On the essay and interview assessments, students displayed unique views of the relationship between force and motion; those who spoke of this relationship in causal (rather than observational) terms tended to have more success in the programming exercise.
△ Less
Submitted 7 September, 2012; v1 submitted 7 July, 2012;
originally announced July 2012.
-
Integrating Numerical Computation into the Modeling Instruction Curriculum
Authors:
Marcos D. Caballero,
John B. Burk,
John M. Aiken,
Scott S. Douglas,
Erin M. Scanlon,
Brian Thoms,
Michael F. Schatz
Abstract:
We describe a way to introduce physics high school students with no background in programming to computational problem-solving experiences. Our approach builds on the great strides made by the Modeling Instruction reform curriculum. This approach emphasizes the practices of "Develo** and using models" and "Computational thinking" highlighted by the NRC K-12 science standards framework. We taught…
▽ More
We describe a way to introduce physics high school students with no background in programming to computational problem-solving experiences. Our approach builds on the great strides made by the Modeling Instruction reform curriculum. This approach emphasizes the practices of "Develo** and using models" and "Computational thinking" highlighted by the NRC K-12 science standards framework. We taught 9th-grade students in a Modeling-Instruction-based physics course to construct computational models using the VPython programming environment. Numerical computation within the Modeling Instruction curriculum provides coherence among the curriculum's different force and motion models, links the various representations which the curriculum employs, and extends the curriculum to include real-world problems that are inaccessible to a purely analytic approach.
△ Less
Submitted 17 November, 2013; v1 submitted 3 July, 2012;
originally announced July 2012.
-
Influence of positional correlations on the propagation of waves in a complex medium with polydisperse resonant scatterers
Authors:
V. Leroy,
A. L. Strybulevych,
J. H. Page,
M. G. Scanlon
Abstract:
We present experimental results on a model system for studying wave propagation in a complex medium exhibiting low frequency resonances. These experiments enable us to investigate a fundamental question that is relevant for many materials, such as metamaterials, where low-frequency scattering resonances strongly influence the effective medium properties. This question concerns the effect of correl…
▽ More
We present experimental results on a model system for studying wave propagation in a complex medium exhibiting low frequency resonances. These experiments enable us to investigate a fundamental question that is relevant for many materials, such as metamaterials, where low-frequency scattering resonances strongly influence the effective medium properties. This question concerns the effect of correlations in the positions of the scatterers on the coupling between their resonances, and hence on wave transport through the medium. To examine this question experimentally, we measure the effective medium wave number of acoustic waves in a sample made of bubbles embedded in an elastic matrix over a frequency range that includes the resonance frequency of the bubbles. The effective medium is highly dispersive, showing peaks in the attenuation and the phase velocity as functions of the frequency, which cannot be accurately described using the Independent Scattering Approximation (ISA). This discrepancy may be explained by the effects of the positional correlations of the scatterers, which we show to be dependent on the size of the scatterers. We propose a self-consistent approach for taking this "polydisperse correlation" into account and show that our model better describes the experimental results than the ISA.
△ Less
Submitted 26 April, 2011; v1 submitted 29 October, 2010;
originally announced October 2010.