-
Investigating child sexual abuse material availability, searches, and users on the anonymous Tor network for a public health intervention strategy
Authors:
Juha Nurmi,
Arttu Paju,
Billy Bob Brumley,
Tegan Insoll,
Anna K. Ovaska,
Valeriia Soloveva,
Nina Vaaranen-Valkonen,
Mikko Aaltonen,
David Arroyo
Abstract:
Tor is widely used for staying anonymous online and accessing onion websites; unfortunately, Tor is also popular for distributing and viewing illicit child sexual abuse material (CSAM). We analyse 176,683 onion domains observed from 2018 to 2023 and find that one-fifth share CSAM. We find that CSAM is easily available using 21 out of the 26 most-used Tor search engines. We analyse 110,133,715 search sessions from the Ahmia.fi search engine and discover that 11.1% seek CSAM. When searching CSAM by age, 40.5% search for 11-year-olds and younger; 11.0% for 12-year-olds; 8.2% for 13-year-olds; 11.6% for 14-year-olds; 10.9% for 15-year-olds; and 12.7% for 16-year-olds. We demonstrate accurate filtering for search engines, introduce an intervention, present a questionnaire for CSAM users, and analyse 11,470 responses. 65.3% of CSAM users first saw the material when they were children themselves, and half of the respondents first saw the material accidentally, demonstrating the availability of CSAM. 48.1% want to stop using CSAM. Some seek help through Tor, and self-help websites are popular. Our survey finds commonalities between CSAM use and addiction. Help-seeking correlates with increasing viewing duration and frequency, depression, anxiety, self-harming thoughts, guilt, and shame. Yet 73.9% of help seekers have not been able to receive help.
Submitted 22 April, 2024;
originally announced April 2024.
-
Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access
Authors:
Juha Nurmi,
Mikko Niemelä,
Billy Bob Brumley
Abstract:
We investigate the criminal market dynamics of infostealer malware and publish three evidence datasets on malware infections and trade. We use the datasets to justify the value chain between illicit enterprises, compare prices and added value, and identify the most effective countermeasures from the value chain.
We begin by examining infostealer malware victim logs shared by actors on hacking forums, extracting victim information and masking sensitive data to protect privacy. We find access to these same victims for sale at Genesis Market. This technically sophisticated marketplace provides its own browser for accessing victims' online accounts. We collect a second dataset and discover that 91% of prices fall between 1 and 20 US dollars, with a median of 5 US dollars.
Database Market sells access to compromised online accounts. We produce a third dataset, finding that 91% of prices fall between 1 and 30 US dollars, with a median of 7 US dollars.
Submitted 17 July, 2023; v1 submitted 27 June, 2023;
originally announced June 2023.
-
SoK: A Systematic Review of TEE Usage for Developing Trusted Applications
Authors:
Arttu Paju,
Muhammad Owais Javed,
Juha Nurmi,
Juha Savimäki,
Brian McGillion,
Billy Bob Brumley
Abstract:
Trusted Execution Environments (TEEs) are a feature of modern central processing units (CPUs) that aim to provide a high-assurance, isolated environment in which to run workloads that demand both confidentiality and integrity. Hardware and software components in the CPU isolate workloads, commonly referred to as Trusted Applications (TAs), from the main operating system (OS). This article aims to analyse the TEE ecosystem, determine its usability, and suggest improvements where necessary to make adoption easier. To better understand TEE usage, we gathered academic and practical examples from a total of 223 references. We summarise the literature and provide a publication timeline, along with insights into the evolution of TEE research and deployment. We categorise TAs into major groups and analyse the tools available to developers. Lastly, we evaluate trusted container projects, test their performance, and identify the requirements for migrating applications inside them.
Submitted 15 August, 2023; v1 submitted 26 June, 2023;
originally announced June 2023.
-
A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services
Authors:
Dewant Katare,
Diego Perino,
Jari Nurmi,
Martijn Warnier,
Marijn Janssen,
Aaron Yi Ding
Abstract:
Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice for processing the sensed data is to use a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle. The vehicular data generated from average hours of driving can reach up to 20 terabytes, depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve overall energy and environmental efficiency, especially with the trend towards vehicular electrification (e.g., battery-powered vehicles). Although these areas have seen significant advancements in sensor technologies, wireless communications, computing, and AI/ML algorithms, the challenge remains how to apply and integrate these technological innovations to achieve energy efficiency. This survey reviews and compares connected vehicular applications, vehicular communications, and approximation and Edge AI techniques. The focus is on energy efficiency, covering newly proposed approximation and enabling frameworks. To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can benefit the development of collaborative driving services on low-power and memory-constrained systems, as well as the energy optimization of autonomous vehicles.
Submitted 13 April, 2023;
originally announced April 2023.
-
SURIMI: Supervised Radio Map Augmentation with Deep Learning and a Generative Adversarial Network for Fingerprint-based Indoor Positioning
Authors:
Darwin Quezada-Gaibor,
Joaquín Torres-Sospedra,
Jari Nurmi,
Yevgeni Koucheryavy,
Joaquín Huerta
Abstract:
Indoor positioning based on machine learning has drawn increasing attention in both academia and industry, because meaningful information can be extracted from the reference data. Many researchers use supervised, semi-supervised, and unsupervised machine learning models to reduce the positioning error and offer reliable solutions to end-users. In this article, we propose a new architecture that combines a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and a Generative Adversarial Network (GAN) to enlarge the training data and thus improve positioning accuracy. The proposed combination of supervised and unsupervised models was tested on 17 public datasets, providing an extensive analysis of its performance. As a result, the positioning error was reduced in more than 70% of them.
Submitted 13 July, 2022;
originally announced July 2022.
-
Fault-Tolerant Collaborative Inference through the Edge-PRUNE Framework
Authors:
Jani Boutellier,
Bo Tan,
Jari Nurmi
Abstract:
Collaborative inference has received significant research interest in machine learning as a vehicle for distributing computation load, reducing latency, and addressing privacy preservation in communications. Recent collaborative inference frameworks have adopted dynamic inference methodologies such as early exit and run-time partitioning of neural networks. However, as machine learning frameworks scale in the number of inference inputs, e.g., in surveillance applications, fault tolerance against device failure needs to be considered. This paper presents the Edge-PRUNE distributed computing framework, built on a formally defined model of computation, which provides a flexible infrastructure for fault-tolerant collaborative inference. The experimental section of this work shows the inference time savings achievable by collaborative inference, presents fault-tolerant system topologies, and analyzes their cost in terms of execution time overhead.
Submitted 16 June, 2022;
originally announced June 2022.
-
Data Cleansing for Indoor Positioning Wi-Fi Fingerprinting Datasets
Authors:
Darwin Quezada-Gaibor,
Lucie Klus,
Joaquín Torres-Sospedra,
Elena Simona Lohan,
Jari Nurmi,
Carlos Granell,
Joaquín Huerta
Abstract:
Wearable and IoT devices requiring positioning and localisation services grow exponentially in number every year. This rapid growth also produces millions of data entries that must be pre-processed before being used in any indoor positioning system, to ensure data quality and provide a high Quality of Service (QoS) to the end-user. In this paper, we offer a novel and straightforward data cleansing algorithm for WLAN fingerprinting radio maps. The algorithm is based on the correlation among fingerprints, using the Received Signal Strength (RSS) values and the Access Points' (APs') identifiers. We use these to compute the correlation among all samples in the dataset and remove fingerprints with a low level of correlation. We evaluated the proposed method on 14 independent, publicly available datasets. On average, 14% of fingerprints were removed from the datasets. The 2D positioning error was reduced by 2.7% and the 3D positioning error by 5.3%, with a slight average increase of 1.2% in the floor hit rate. Consequently, the average speed of position prediction also increased by 14%.
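The correlation-based filtering described in this abstract can be illustrated with a short sketch. It is not the authors' algorithm. It assumes each fingerprint is a row of RSS values whose columns are indexed by AP identifier, and that undetected APs have already been replaced by a floor value such as -100 dBm. The keep/remove rule is a hypothetical one: a fingerprint is kept only if its strongest Pearson correlation with any other fingerprint reaches a hypothetical `threshold`.

```python
import numpy as np


def cleanse_fingerprints(rss, threshold=0.5):
    """Return a boolean mask: True for fingerprints to keep.

    rss: (n_fingerprints, n_aps) matrix; column j holds the RSS (dBm) of AP j.
         Assumes undetected APs were already set to a floor value (e.g. -100).
    threshold: hypothetical minimum correlation a fingerprint must reach
               with at least one other fingerprint.
    Requires at least two fingerprints.
    """
    rss = np.asarray(rss, dtype=float)
    corr = np.corrcoef(rss)           # Pearson correlation between every pair of rows
    np.fill_diagonal(corr, -np.inf)   # ignore each fingerprint's self-correlation
    best = np.nanmax(corr, axis=1)    # strongest match; NaNs (constant rows) are skipped
    return best >= threshold


rss = [
    [-40, -60, -80, -100],
    [-42, -61, -79, -100],
    [-100, -80, -60, -40],  # pattern unlike any other fingerprint
]
mask = cleanse_fingerprints(rss)
print(mask.tolist())  # [True, True, False]
```

Using the maximum correlation means a fingerprint survives if it agrees with at least one neighbour. A per-location mean or median would be an equally plausible reading of "low level of correlation".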
Submitted 4 May, 2022;
originally announced May 2022.
-
Edge-PRUNE: Flexible Distributed Deep Learning Inference
Authors:
Jani Boutellier,
Bo Tan,
Jari Nurmi
Abstract:
Collaborative deep learning inference between low-resource endpoint devices and edge servers has received significant research interest in the last few years. Such computation partitioning can help reduce endpoint device energy consumption and improve latency. Equally important, it helps preserve the privacy of sensitive data. This paper describes Edge-PRUNE, a flexible but lightweight computation framework for distributing machine learning inference between edge servers and one or more client devices. Unlike previous approaches, Edge-PRUNE is based on a formal dataflow computing model and is agnostic towards machine learning training frameworks, while offering wide support for leveraging deep learning accelerators such as embedded GPUs. The experimental section of the paper demonstrates the use and performance of Edge-PRUNE with image classification and object tracking applications on two heterogeneous endpoint devices and an edge server, over wireless and physical connections. For example, endpoint device inference time for SSD-Mobilenet-based object tracking is accelerated 5.8x by collaborative inference.
Submitted 27 April, 2022;
originally announced April 2022.
-
Towards Accelerated Localization Performance Across Indoor Positioning Datasets
Authors:
Lucie Klus,
Darwin Quezada-Gaibor,
Joaquín Torres-Sospedra,
Elena Simona Lohan,
Carlos Granell,
Jari Nurmi
Abstract:
The localization speed and accuracy in indoor scenarios can greatly impact the user's Quality of Experience. While many individual machine learning models can achieve comparable positioning performance, their prediction mechanisms impose different levels of complexity on the system. In this work, we propose a fingerprinting positioning method for multi-building and multi-floor deployments, composed of a cascade of three models for building classification, floor classification, and 2D localization regression. We conduct an exhaustive search for the best-performing model at each step of the cascade, validating on 14 different openly available datasets. As a result, we bring forward the best-performing combination of models in terms of overall positioning accuracy and processing speed, and evaluate it on independent sets of samples. We reduce the mean prediction time by 71% while achieving comparable positioning performance across all considered datasets. Moreover, for voluminous training datasets, the prediction time is reduced to 1% of the benchmark's.
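The three-stage cascade can be pictured with an illustrative sketch. It is not the paper's pipeline: the paper searches for the best-performing model at each stage, whereas here every stage uses a hypothetical 1-nearest-neighbour stand-in (`NearestNeighbour`). The class names are also assumptions, as is the choice to train one floor model per building and one regressor per (building, floor) pair.

```python
import numpy as np


class NearestNeighbour:
    """Hypothetical 1-NN stand-in for any classifier or regressor in the cascade."""

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every query to every training fingerprint.
        d = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2)
        return self.y[d.argmin(axis=1)]


class CascadePositioner:
    """Building classifier -> per-building floor classifier -> per-floor 2D regressor."""

    def __init__(self, make_model=NearestNeighbour):
        self.make_model = make_model

    def fit(self, X, building, floor, xy):
        X, building, floor, xy = map(np.asarray, (X, building, floor, xy))
        self.building_model = self.make_model().fit(X, building)
        self.floor_models, self.xy_models = {}, {}
        for b in np.unique(building):
            in_b = building == b
            self.floor_models[b] = self.make_model().fit(X[in_b], floor[in_b])
            for f in np.unique(floor[in_b]):
                in_bf = in_b & (floor == f)
                self.xy_models[(b, f)] = self.make_model().fit(X[in_bf], xy[in_bf])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        results = []
        for x, b in zip(X, self.building_model.predict(X)):
            # Each stage only consults models trained for the previous stage's prediction.
            f = self.floor_models[b].predict(x[None, :])[0]
            px, py = self.xy_models[(b, f)].predict(x[None, :])[0]
            results.append((b, f, px, py))
        return results


model = CascadePositioner().fit(
    [[-40, -90], [-45, -85], [-90, -40], [-85, -45]],  # RSS from two APs
    building=[0, 0, 1, 1],
    floor=[0, 1, 0, 0],
    xy=[[1.0, 1.0], [2.0, 2.0], [10.0, 10.0], [11.0, 11.0]],
)
```

Because `make_model` is a factory, any model with `fit`/`predict` methods can replace the 1-NN stand-in at each stage. That is the kind of swap an exhaustive per-stage search would explore.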
Submitted 22 April, 2022;
originally announced April 2022.
-
Lightweight Hybrid CNN-ELM Model for Multi-building and Multi-floor Classification
Authors:
Darwin Quezada-Gaibor,
Joaquín Torres-Sospedra,
Jari Nurmi,
Yevgeni Koucheryavy,
Joaquín Huerta
Abstract:
Machine learning models have become an essential tool in current indoor positioning solutions, given their strong ability to extract meaningful information from the environment. Convolutional neural networks (CNNs) are among the most widely used neural networks (NNs) because they can learn complex patterns from the input data. Another model used in indoor positioning solutions is the Extreme Learning Machine (ELM), which provides acceptable generalization performance as well as fast learning. In this paper, we offer a lightweight combination of CNN and ELM, which provides quick and accurate classification of building and floor, suitable for power- and resource-constrained devices. As a result, the proposed model is 58% faster than the benchmark, with a slight improvement in classification accuracy (by less than 1%).
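The ELM half of the hybrid can be shown with a minimal sketch. It assumes plain RSS vectors (or CNN-extracted features) as input and omits the CNN front end. The hidden-layer size `n_hidden`, the tanh activation, and the standardisation step are illustrative choices, not the paper's configuration. The defining ELM property is that the hidden-layer weights are random and never trained. Only the output weights are solved in closed form with the Moore-Penrose pseudo-inverse, which is why learning is fast.

```python
import numpy as np


class ExtremeLearningMachine:
    """Single-hidden-layer ELM classifier (e.g. for joint building/floor labels)."""

    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        Z = (X - self.mu) / self.sigma          # standardise features (e.g. RSS in dBm)
        return np.tanh(Z @ self.W + self.b)     # random nonlinear projection

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes, y_idx = np.unique(y, return_inverse=True)
        T = np.eye(len(self.classes))[y_idx]    # one-hot target matrix
        self.mu = X.mean(axis=0)
        self.sigma = X.std(axis=0) + 1e-9       # avoid division by zero for constant APs
        # Input weights and biases are drawn once at random and never updated.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T       # least-squares output weights
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes[scores.argmax(axis=1)]
```

In a CNN-ELM hybrid, the CNN would supply the feature vectors passed to `fit` and `predict`. This sketch feeds raw RSS values directly.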
Submitted 21 April, 2022;
originally announced April 2022.
-
Towards Ubiquitous Indoor Positioning: Comparing Systems across Heterogeneous Datasets
Authors:
Joaquín Torres-Sospedra,
Ivo Silva,
Lucie Klus,
Darwin Quezada-Gaibor,
Antonino Crivello,
Paolo Barsocchi,
Cristiano Pendão,
Elena Simona Lohan,
Jari Nurmi,
Adriano Moreira
Abstract:
The evaluation of Indoor Positioning Systems (IPS) mostly relies on local deployments in the researchers' or partners' facilities. The complexity of preparing comprehensive experiments, collecting data, and considering multiple scenarios usually limits the evaluation area and, therefore, the assessment of the proposed systems. The requirements and features of controlled experiments cannot be generalized, since the use of the same sensors or the same anchor density cannot be guaranteed. The advent of public datasets is bringing IPS evaluation closer to the practice in machine learning, where new proposals are evaluated over many heterogeneous datasets. This paper proposes a way to evaluate IPSs in multiple scenarios, which is validated with three use cases. The results show that the proposed aggregation of the evaluation metric values is a useful tool for high-level comparison of IPSs.
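Aggregating one evaluation metric across heterogeneous datasets can be illustrated with a small helper. This shows one plausible scheme, not necessarily the paper's exact definition. It assumes each system's metric (for example, mean positioning error) is divided by a baseline system's metric on the same dataset, and that the per-dataset ratios are then averaged. The function name and the use of an arithmetic mean are assumptions.

```python
from statistics import fmean


def aggregate_relative_metric(system_errors, baseline_errors):
    """Average, over shared datasets, of system_error / baseline_error.

    Both arguments map a dataset name to an error metric (lower is better).
    A result below 1.0 means the system beats the baseline on average.
    """
    shared = sorted(system_errors.keys() & baseline_errors.keys())
    if not shared:
        raise ValueError("no dataset was evaluated for both systems")
    return fmean(system_errors[d] / baseline_errors[d] for d in shared)


print(aggregate_relative_metric({"A": 2.0, "B": 3.0}, {"A": 4.0, "B": 3.0}))  # 0.75
```

Normalising per dataset keeps a dataset with large absolute errors from dominating the high-level comparison.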
Submitted 20 September, 2021;
originally announced September 2021.