Search | arXiv e-print repository

doi 10.1038/s41598-024-58346-7

Investigating child sexual abuse material availability, searches, and users on the anonymous Tor network for a public health intervention strategy

Authors: Juha Nurmi, Arttu Paju, Billy Bob Brumley, Tegan Insoll, Anna K. Ovaska, Valeriia Soloveva, Nina Vaaranen-Valkonen, Mikko Aaltonen, David Arroyo

Abstract: Tor is widely used for staying anonymous online and accessing onion websites; unfortunately, Tor is popular for distributing and viewing illicit child sexual abuse material (CSAM). From 2018 to 2023, we analyse 176,683 onion domains and find that one-fifth share CSAM. We find that CSAM is easily available using 21 out of the 26 most-used Tor search engines. We analyse 110,133,715 search sessions from the Ahmia.fi search engine and discover that 11.1% seek CSAM. When searching CSAM by age, 40.5% search for 11-year-olds and younger; 11.0% for 12-year-olds; 8.2% for 13-year-olds; 11.6% for 14-year-olds; 10.9% for 15-year-olds; and 12.7% for 16-year-olds. We demonstrate accurate filtering for search engines, introduce intervention, show a questionnaire for CSAM users, and analyse 11,470 responses. 65.3% of CSAM users first saw the material when they were children themselves, and half of the respondents first saw the material accidentally, demonstrating the availability of CSAM. 48.1% want to stop using CSAM. Some seek help through Tor, and self-help websites are popular. Our survey finds commonalities between CSAM use and addiction. Help-seeking correlates with increasing viewing duration and frequency, depression, anxiety, self-harming thoughts, guilt, and shame. Yet, 73.9% of help seekers have not been able to receive it.

Submitted 22 April, 2024; originally announced April 2024.

Comments: Published in the Scientific Reports: https://www.nature.com/articles/s41598-024-58346-7

Journal ref: Scientific Reports 14, 7849 (2024)

arXiv:2306.15726 [pdf, other]

doi 10.1145/3600160.3605047

Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access

Authors: Juha Nurmi, Mikko Niemelä, Billy Bob Brumley

Abstract: We investigate the criminal market dynamics of infostealer malware and publish three evidence datasets on malware infections and trade. We justify the value chain between illicit enterprises using the datasets, compare the prices and added value, and use the value chain to identify the most effective countermeasures. We begin by examining infostealer malware victim logs shared by actors on hacking forums, and extract victim information and mask sensitive data to protect privacy. We find access to these same victims for sale at Genesis Market. This technically sophisticated marketplace provides its own browser to access victim's online accounts. We collect a second dataset and discover that 91% of prices fall between 1–20 US dollars, with a median of 5 US dollars. Database Market sells access to compromised online accounts. We produce yet another dataset, finding 91% of prices fall between 1–30 US dollars, with a median of 7 US dollars.

Submitted 17 July, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: In The 18th International Conference on Availability, Reliability and Security (ARES 2023), August 29 – September 1, 2023, Benevento, Italy. 12 pages

arXiv:2306.15025 [pdf, other]

doi 10.1145/3600160.3600169

SoK: A Systematic Review of TEE Usage for Developing Trusted Applications

Authors: Arttu Paju, Muhammad Owais Javed, Juha Nurmi, Juha Savimäki, Brian McGillion, Billy Bob Brumley

Abstract: Trusted Execution Environments (TEEs) are a feature of modern central processing units (CPUs) that aim to provide a high assurance, isolated environment in which to run workloads that demand both confidentiality and integrity. Hardware and software components in the CPU isolate workloads, commonly referred to as Trusted Applications (TAs), from the main operating system (OS). This article aims to analyse the TEE ecosystem, determine its usability, and suggest improvements where necessary to make adoption easier. To better understand TEE usage, we gathered academic and practical examples from a total of 223 references. We summarise the literature and provide a publication timeline, along with insights into the evolution of TEE research and deployment. We categorise TAs into major groups and analyse the tools available to developers. Lastly, we evaluate trusted container projects, test performance, and identify the requirements for migrating applications inside them.

Submitted 15 August, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: In The 18th International Conference on Availability, Reliability and Security (ARES 2023), August 29 – September 1, 2023, Benevento, Italy. 15 pages

arXiv:2304.14271 [pdf, other]

A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services

Authors: Dewant Katare, Diego Perino, Jari Nurmi, Martijn Warnier, Marijn Janssen, Aaron Yi Ding

Abstract: Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice of processing the sensed data is using a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle. The vehicular data generated from average hours of driving can be up to 20 Terabytes depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve the overall energy and environmental efficiency, especially in the trend towards vehicular electrification (e.g., battery-powered). Although the areas have seen significant advancements in sensor technologies, wireless communications, computing and AI/ML algorithms, the challenge still exists in how to apply and integrate those technology innovations to achieve energy efficiency. This survey reviews and compares the connected vehicular applications, vehicular communications, approximation and Edge AI techniques. The focus is on energy efficiency by covering newly proposed approximation and enabling frameworks. To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can be beneficial for the collaborative driving service development on low-power and memory-constrained systems and also for the energy optimization of autonomous vehicles.

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2207.06120 [pdf, other]

doi 10.1109/IPIN54987.2022.9918146

SURIMI: Supervised Radio Map Augmentation with Deep Learning and a Generative Adversarial Network for Fingerprint-based Indoor Positioning

Authors: Darwin Quezada-Gaibor, Joaquín Torres-Sospedra, Jari Nurmi, Yevgeni Koucheryavy, Joaquín Huerta

Abstract: Indoor Positioning based on Machine Learning has drawn increasing attention both in the academy and the industry as meaningful information from the reference data can be extracted. Many researchers are using supervised, semi-supervised, and unsupervised Machine Learning models to reduce the positioning error and offer reliable solutions to the end-users. In this article, we propose a new architecture by combining Convolutional Neural Network (CNN), Long short-term memory (LSTM) and Generative Adversarial Network (GAN) in order to increase the training data and thus improve the position accuracy. The proposed combination of supervised and unsupervised models was tested in 17 public datasets, providing an extensive analysis of its performance. As a result, the positioning error has been reduced in more than 70% of them.

Submitted 13 July, 2022; originally announced July 2022.

Comments: To appear at 2022 International Conference on Indoor Positioning and Indoor Navigation (IPIN), 5 - 7 Sep. 2022, Beijing, China

arXiv:2206.08152 [pdf, other]

Fault-Tolerant Collaborative Inference through the Edge-PRUNE Framework

Authors: Jani Boutellier, Bo Tan, Jari Nurmi

Abstract: Collaborative inference has received significant research interest in machine learning as a vehicle for distributing computation load, reducing latency, as well as addressing privacy preservation in communications. Recent collaborative inference frameworks have adopted dynamic inference methodologies such as early-exit and run-time partitioning of neural networks. However, as machine learning frameworks scale in the number of inference inputs, e.g., in surveillance applications, fault tolerance related to device failure needs to be considered. This paper presents the Edge-PRUNE distributed computing framework, built on a formally defined model of computation, which provides a flexible infrastructure for fault tolerant collaborative inference. The experimental section of this work shows results on achievable inference time savings by collaborative inference, presents fault tolerant system topologies and analyzes their cost in terms of execution time overhead.

Submitted 16 June, 2022; originally announced June 2022.

Comments: Accepted to ICML 2022 Workshop on Dynamic Neural Networks (DyNN)

arXiv:2205.02096 [pdf, other]

doi 10.1109/MDM55031.2022.00079

Data Cleansing for Indoor Positioning Wi-Fi Fingerprinting Datasets

Authors: Darwin Quezada-Gaibor, Lucie Klus, Joaquín Torres-Sospedra, Elena Simona Lohan, Jari Nurmi, Carlos Granell, Joaquín Huerta

Abstract: Wearable and IoT devices requiring positioning and localisation services grow in number exponentially every year. This rapid growth also produces millions of data entries that need to be pre-processed prior to being used in any indoor positioning system to ensure the data quality and provide a high Quality of Service (QoS) to the end-user. In this paper, we offer a novel and straightforward data cleansing algorithm for WLAN fingerprinting radio maps. This algorithm is based on the correlation among fingerprints using the Received Signal Strength (RSS) values and the Access Points (APs)'s identifier. We use those to compute the correlation among all samples in the dataset and remove fingerprints with low level of correlation from the dataset. We evaluated the proposed method on 14 independent publicly-available datasets. As a result, an average of 14% of fingerprints were removed from the datasets. The 2D positioning error was reduced by 2.7% and 3D positioning error by 5.3% with a slight increase in the floor hit rate by 1.2% on average. Consequently, the average speed of position prediction was also increased by 14%.

Submitted 4 May, 2022; originally announced May 2022.

Comments: Submitted to ALIAS2022/MDM2022

arXiv:2204.12947 [pdf, other]

Edge-PRUNE: Flexible Distributed Deep Learning Inference

Authors: Jani Boutellier, Bo Tan, Jari Nurmi

Abstract: Collaborative deep learning inference between low-resource endpoint devices and edge servers has received significant research interest in the last few years. Such computation partitioning can help reducing endpoint device energy consumption and improve latency, but equally importantly also contributes to privacy-preserving of sensitive data. This paper describes Edge-PRUNE, a flexible but light-weight computation framework for distributing machine learning inference between edge servers and one or more client devices. Compared to previous approaches, Edge-PRUNE is based on a formal dataflow computing model, and is agnostic towards machine learning training frameworks, offering at the same time wide support for leveraging deep learning accelerators such as embedded GPUs. The experimental section of the paper demonstrates the use and performance of Edge-PRUNE by image classification and object tracking applications on two heterogeneous endpoint devices and an edge server, over wireless and physical connections. Endpoint device inference time for SSD-Mobilenet based object tracking, for example, is accelerated 5.8x by collaborative inference.

Submitted 27 April, 2022; originally announced April 2022.

arXiv:2204.10788 [pdf, other]

doi 10.1109/ICL-GNSS54081.2022.9797035

Towards Accelerated Localization Performance Across Indoor Positioning Datasets

Authors: Lucie Klus, Darwin Quezada-Gaibor, Joaquín Torres-Sospedra, Elena Simona Lohan, Carlos Granell, Jari Nurmi

Abstract: The localization speed and accuracy in the indoor scenario can greatly impact the Quality of Experience of the user. While many individual machine learning models can achieve comparable positioning performance, their prediction mechanisms offer different complexity to the system. In this work, we propose a fingerprinting positioning method for multi-building and multi-floor deployments, composed of a cascade of three models for building classification, floor classification, and 2D localization regression. We conduct an exhaustive search for the optimally performing one in each step of the cascade while validating on 14 different openly available datasets. As a result, we bring forward the best-performing combination of models in terms of overall positioning accuracy and processing speed and evaluate on independent sets of samples. We reduce the mean prediction time by 71% while achieving comparable positioning performance across all considered datasets. Moreover, in case of voluminous training dataset, the prediction time is reduced down to 1% of the benchmark's.

Submitted 22 April, 2022; originally announced April 2022.

Comments: To appear in 2022 International Conference on Localization and GNSS (ICL-GNSS), 7-9 June 2022, Tampere, Finland

arXiv:2204.10418 [pdf, other]

doi 10.1109/ICL-GNSS54081.2022.9797021

Lightweight Hybrid CNN-ELM Model for Multi-building and Multi-floor Classification

Authors: Darwin Quezada-Gaibor, Joaquín Torres-Sospedra, Jari Nurmi, Yevgeni Koucheryavy, Joaquín Huerta

Abstract: Machine learning models have become an essential tool in current indoor positioning solutions, given their high capabilities to extract meaningful information from the environment. Convolutional neural networks (CNNs) are one of the most used neural networks (NNs) due to that they are capable of learning complex patterns from the input data. Another model used in indoor positioning solutions is the Extreme Learning Machine (ELM), which provides an acceptable generalization performance as well as a fast speed of learning. In this paper, we offer a lightweight combination of CNN and ELM, which provides a quick and accurate classification of building and floor, suitable for power and resource-constrained devices. As a result, the proposed model is 58% faster than the benchmark, with a slight improvement in the classification accuracy (by less than 1%).

Submitted 21 April, 2022; originally announced April 2022.

Comments: to appear in 2022 International Conference on Localization and GNSS (ICL-GNSS), 7-9 June 2022, Tampere, Finland

arXiv:2109.09436 [pdf, other]

doi 10.1109/IPIN51156.2021.9662560

Towards Ubiquitous Indoor Positioning: Comparing Systems across Heterogeneous Datasets

Authors: Joaquín Torres-Sospedra, Ivo Silva, Lucie Klus, Darwin Quezada-Gaibor, Antonino Crivello, Paolo Barsocchi, Cristiano Pendão, Elena Simona Lohan, Jari Nurmi, Adriano Moreira

Abstract: The evaluation of Indoor Positioning Systems (IPS) mostly relies on local deployments in the researchers' or partners' facilities. The complexity of preparing comprehensive experiments, collecting data, and considering multiple scenarios usually limits the evaluation area and, therefore, the assessment of the proposed systems. The requirements and features of controlled experiments cannot be generalized since the use of the same sensors or anchors density cannot be guaranteed. The dawn of datasets is pushing IPS evaluation to a similar level as machine-learning models, where new proposals are evaluated over many heterogeneous datasets. This paper proposes a way to evaluate IPSs in multiple scenarios, that is validated with three use cases. The results prove that the proposed aggregation of the evaluation metric values is a useful tool for high-level comparison of IPSs.

Submitted 20 September, 2021; originally announced September 2021.

Comments: to appear in 2021 International Conference on Indoor Positioning and Indoor Navigation (IPIN), 29 Nov. - 2 Dec. 2021, Lloret de Mar, Spain

Showing 1–11 of 11 results for author: Nurmi, J