Search | arXiv e-print repository

DART: A Solution for Decentralized Federated Learning Model Robustness Analysis

Authors: Chao Feng, Alberto Huertas Celdrán, Jan von der Assen, Enrique Tomás Martínez Beltrán, Gérôme Bovet, Burkhard Stiller

Abstract: Federated Learning (FL) has emerged as a promising approach to address privacy concerns inherent in Machine Learning (ML) practices. However, conventional FL methods, particularly those following the Centralized FL (CFL) paradigm, utilize a central server for global aggregation, which exhibits limitations such as bottleneck and single point of failure. To address these issues, the Decentralized FL… ▽ More Federated Learning (FL) has emerged as a promising approach to address privacy concerns inherent in Machine Learning (ML) practices. However, conventional FL methods, particularly those following the Centralized FL (CFL) paradigm, utilize a central server for global aggregation, which exhibits limitations such as bottleneck and single point of failure. To address these issues, the Decentralized FL (DFL) paradigm has been proposed, which removes the client-server boundary and enables all participants to engage in model training and aggregation tasks. Nevertheless, as CFL, DFL remains vulnerable to adversarial attacks, notably poisoning attacks that undermine model performance. While existing research on model robustness has predominantly focused on CFL, there is a noteworthy gap in understanding the model robustness of the DFL paradigm. In this paper, a thorough review of poisoning attacks targeting the model robustness in DFL systems, as well as their corresponding countermeasures, are presented. Additionally, a solution called DART is proposed to evaluate the robustness of DFL models, which is implemented and integrated into a DFL platform. Through extensive experiments, this paper compares the behavior of CFL and DFL under diverse poisoning attacks, pinpointing key factors affecting attack spread and effectiveness within the DFL. It also evaluates the performance of different defense mechanisms and investigates whether defense mechanisms designed for CFL are compatible with DFL. The empirical results provide insights into research challenges and suggest ways to improve the robustness of DFL models for future research. △ Less

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2405.09318 [pdf, other]

Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls

Authors: Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, Gérôme Bovet, Gregorio Martínez Pérez

Abstract: In the current cybersecurity landscape, protecting military devices such as communication and battlefield management systems against sophisticated cyber attacks is crucial. Malware exploits vulnerabilities through stealth methods, often evading traditional detection mechanisms such as software signatures. The application of ML/DL in vulnerability detection has been extensively explored in the lite… ▽ More In the current cybersecurity landscape, protecting military devices such as communication and battlefield management systems against sophisticated cyber attacks is crucial. Malware exploits vulnerabilities through stealth methods, often evading traditional detection mechanisms such as software signatures. The application of ML/DL in vulnerability detection has been extensively explored in the literature. However, current ML/DL vulnerability detection methods struggle with understanding the context and intent behind complex attacks. Integrating large language models (LLMs) with system call analysis offers a promising approach to enhance malware detection. This work presents a novel framework leveraging LLMs to classify malware based on system call data. The framework uses transfer learning to adapt pre-trained LLMs for malware detection. By retraining LLMs on a dataset of benign and malicious system calls, the models are refined to detect signs of malware activity. Experiments with a dataset of over 1TB of system calls demonstrate that models with larger context sizes, such as BigBird and Longformer, achieve superior accuracy and F1-Score of approximately 0.86. The results highlight the importance of context size in improving detection rates and underscore the trade-offs between computational complexity and performance. This approach shows significant potential for real-time detection in high-stakes environments, offering a robust solution to evolving cyber threats. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: Submitted to IEEE MILCOM 2024

arXiv:2401.17917 [pdf, ps, other]

GuardFS: a File System for Integrated Detection and Mitigation of Linux-based Ransomware

Authors: Jan von der Assen, Chao Feng, Alberto Huertas Celdrán, Róbert Oleš, Gérôme Bovet, Burkhard Stiller

Abstract: Although ransomware has received broad attention in media and research, this evolving threat vector still poses a systematic threat. Related literature has explored their detection using various approaches leveraging Machine and Deep Learning. While these approaches are effective in detecting malware, they do not answer how to use this intelligence to protect against threats, raising concerns abou… ▽ More Although ransomware has received broad attention in media and research, this evolving threat vector still poses a systematic threat. Related literature has explored their detection using various approaches leveraging Machine and Deep Learning. While these approaches are effective in detecting malware, they do not answer how to use this intelligence to protect against threats, raising concerns about their applicability in a hostile environment. Solutions that focus on mitigation rarely explore how to prevent and not just alert or halt its execution, especially when considering Linux-based samples. This paper presents GuardFS, a file system-based approach to investigate the integration of detection and mitigation of ransomware. Using a bespoke overlay file system, data is extracted before files are accessed. Models trained on this data are used by three novel defense configurations that obfuscate, delay, or track access to the file system. The experiments on GuardFS test the configurations in a reactive setting. The results demonstrate that although data loss cannot be completely prevented, it can be significantly reduced. Usability and performance analysis demonstrate that the defense effectiveness of the configurations relates to their impact on resource consumption and usability. △ Less

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2311.05270 [pdf, other]

Evaluation of Data Processing and Machine Learning Techniques in P300-based Authentication using Brain-Computer Interfaces

Authors: Eduardo López Bernal, Sergio López Bernal, Gregorio Martínez Pérez, Alberto Huertas Celdrán

Abstract: Brain-Computer Interfaces (BCIs) are used in various application scenarios allowing direct communication between the brain and computers. Specifically, electroencephalography (EEG) is one of the most common techniques for obtaining evoked potentials resulting from external stimuli, as the P300 potential is elicited from known images. The combination of Machine Learning (ML) and P300 potentials is… ▽ More Brain-Computer Interfaces (BCIs) are used in various application scenarios allowing direct communication between the brain and computers. Specifically, electroencephalography (EEG) is one of the most common techniques for obtaining evoked potentials resulting from external stimuli, as the P300 potential is elicited from known images. The combination of Machine Learning (ML) and P300 potentials is promising for authenticating subjects since the brain waves generated by each person when facing a particular stimulus are unique. However, existing authentication solutions do not extensively explore P300 potentials and fail when analyzing the most suitable processing and ML-based classification techniques. Thus, this work proposes i) a framework for authenticating BCI users using the P300 potential; ii) the validation of the framework on ten subjects creating an experimental scenario employing a non-invasive EEG-based BCI; and iii) the evaluation of the framework performance defining two experiments (binary and multiclass ML classification) and three testing configurations incrementally analyzing the performance of different processing techniques and the differences between classifying with epochs or statistical values. This framework achieved a performance close to 100\% f1-score in both experiments for the best classifier, highlighting its effectiveness in accurately authenticating users and demonstrating the feasibility of performing EEG-based authentication using P300 potentials. △ Less

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.20435 [pdf, other]

Assessing the Sustainability and Trustworthiness of Federated Learning Models

Authors: Alberto Huertas Celdran, Chao Feng, Pedro Miguel Sanchez Sanchez, Lynn Zumtaugwald, Gerome Bovet, Burkhard Stiller

Abstract: Artificial intelligence (AI) plays a pivotal role in various sectors, influencing critical decision-making processes in our daily lives. Within the AI landscape, novel AI paradigms, such as Federated Learning (FL), focus on preserving data privacy while collaboratively training AI models. In such a context, a group of experts from the European Commission (AI-HLEG) has identified sustainable AI as… ▽ More Artificial intelligence (AI) plays a pivotal role in various sectors, influencing critical decision-making processes in our daily lives. Within the AI landscape, novel AI paradigms, such as Federated Learning (FL), focus on preserving data privacy while collaboratively training AI models. In such a context, a group of experts from the European Commission (AI-HLEG) has identified sustainable AI as one of the key elements that must be considered to provide trustworthy AI. While existing literature offers several taxonomies and solutions for assessing the trustworthiness of FL models, a significant gap exists in considering sustainability and the carbon footprint associated with FL. Thus, this work introduces the sustainability pillar to the most recent and comprehensive trustworthy FL taxonomy, making this work the first to address all AI-HLEG requirements. The sustainability pillar assesses the FL system environmental impact, incorporating notions and metrics for hardware efficiency, federation complexity, and energy grid carbon intensity. Then, this work designs and implements an algorithm for evaluating the trustworthiness of FL models by incorporating the sustainability pillar. Extensive evaluations with the FederatedScope framework and various scenarios varying federation participants, complexities, hardware, and energy grids demonstrate the usefulness of the proposed solution. △ Less

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.08739 [pdf, other]

Voyager: MTD-Based Aggregation Protocol for Mitigating Poisoning Attacks on DFL

Authors: Chao Feng, Alberto Huertas Celdran, Michael Vuong, Gerome Bovet, Burkhard Stiller

Abstract: The growing concern over malicious attacks targeting the robustness of both Centralized and Decentralized Federated Learning (FL) necessitates novel defensive strategies. In contrast to the centralized approach, Decentralized FL (DFL) has the advantage of utilizing network topology and local dataset information, enabling the exploration of Moving Target Defense (MTD) based approaches. This work… ▽ More The growing concern over malicious attacks targeting the robustness of both Centralized and Decentralized Federated Learning (FL) necessitates novel defensive strategies. In contrast to the centralized approach, Decentralized FL (DFL) has the advantage of utilizing network topology and local dataset information, enabling the exploration of Moving Target Defense (MTD) based approaches. This work presents a theoretical analysis of the influence of network topology on the robustness of DFL models. Drawing inspiration from these findings, a three-stage MTD-based aggregation protocol, called Voyager, is proposed to improve the robustness of DFL models against poisoning attacks by manipulating network topology connectivity. Voyager has three main components: an anomaly detector, a network topology explorer, and a connection deployer. When an abnormal model is detected in the network, the topology explorer responds strategically by forming connections with more trustworthy participants to secure the model. Experimental evaluations show that Voyager effectively mitigates various poisoning attacks without imposing significant resource and computational burdens on participants. These findings highlight the proposed reactive MTD as a potent defense mechanism in the context of DFL. △ Less

Submitted 14 February, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.08097 [pdf, other]

Sentinel: An Aggregation Function to Secure Decentralized Federated Learning

Authors: Chao Feng, Alberto Huertas Celdran, Janosch Baltensperger, Enrique Tomas Martinez Beltran, Gerome Bovet, Burkhard Stiller

Abstract: The rapid integration of Federated Learning (FL) into networking encompasses various aspects such as network management, quality of service, and cybersecurity while preserving data privacy. In this context, Decentralized Federated Learning (DFL) emerges as an innovative paradigm to train collaborative models, addressing the single point of failure limitation. However, the security and trustworthin… ▽ More The rapid integration of Federated Learning (FL) into networking encompasses various aspects such as network management, quality of service, and cybersecurity while preserving data privacy. In this context, Decentralized Federated Learning (DFL) emerges as an innovative paradigm to train collaborative models, addressing the single point of failure limitation. However, the security and trustworthiness of FL and DFL are compromised by poisoning attacks, negatively impacting its performance. Existing defense mechanisms have been designed for centralized FL and they do not adequately exploit the particularities of DFL. Thus, this work introduces Sentinel, a defense strategy to counteract poisoning attacks in DFL. Sentinel leverages the accessibility of local data and defines a three-step aggregation protocol consisting of similarity filtering, bootstrap validation, and normalization to safeguard against malicious model updates. Sentinel has been evaluated with diverse datasets and various poisoning attack types and threat levels, improving the state-of-the-art performance against both untargeted and targeted poisoning attacks. △ Less

Submitted 14 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2308.05978 [pdf, other]

CyberForce: A Federated Reinforcement Learning Framework for Malware Mitigation

Authors: Chao Feng, Alberto Huertas Celdran, Pedro Miguel Sanchez Sanchez, Jan Kreischer, Jan von der Assen, Gerome Bovet, Gregorio Martinez Perez, Burkhard Stiller

Abstract: Recent research has shown that the integration of Reinforcement Learning (RL) with Moving Target Defense (MTD) can enhance cybersecurity in Internet-of-Things (IoT) devices. Nevertheless, the practicality of existing work is hindered by data privacy concerns associated with centralized data processing in RL, and the unsatisfactory time needed to learn right MTD techniques that are effective agains… ▽ More Recent research has shown that the integration of Reinforcement Learning (RL) with Moving Target Defense (MTD) can enhance cybersecurity in Internet-of-Things (IoT) devices. Nevertheless, the practicality of existing work is hindered by data privacy concerns associated with centralized data processing in RL, and the unsatisfactory time needed to learn right MTD techniques that are effective against a rising number of heterogeneous zero-day attacks. Thus, this work presents CyberForce, a framework that combines Federated and Reinforcement Learning (FRL) to collaboratively and privately learn suitable MTD techniques for mitigating zero-day attacks. CyberForce integrates device fingerprinting and anomaly detection to reward or penalize MTD mechanisms chosen by an FRL-based agent. The framework has been deployed and evaluated in a scenario consisting of ten physical devices of a real IoT platform affected by heterogeneous malware samples. A pool of experiments has demonstrated that CyberForce learns the MTD technique mitigating each attack faster than existing RL-based centralized approaches. In addition, when various devices are exposed to different attacks, CyberForce benefits from knowledge transfer, leading to enhanced performance and reduced learning time in comparison to recent works. Finally, different aggregation algorithms used during the agent learning process provide CyberForce with notable robustness to malicious attacks. △ Less

Submitted 8 September, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

Comments: 11 pages, 8 figures

arXiv:2308.03554 [pdf, other]

TemporalFED: Detecting Cyberattacks in Industrial Time-Series Data Using Decentralized Federated Learning

Authors: Ángel Luis Perales Gómez, Enrique Tomás Martínez Beltrán, Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán

Abstract: Industry 4.0 has brought numerous advantages, such as increasing productivity through automation. However, it also presents major cybersecurity issues such as cyberattacks affecting industrial processes. Federated Learning (FL) combined with time-series analysis is a promising cyberattack detection mechanism proposed in the literature. However, the fact of having a single point of failure and netw… ▽ More Industry 4.0 has brought numerous advantages, such as increasing productivity through automation. However, it also presents major cybersecurity issues such as cyberattacks affecting industrial processes. Federated Learning (FL) combined with time-series analysis is a promising cyberattack detection mechanism proposed in the literature. However, the fact of having a single point of failure and network bottleneck are critical challenges that need to be tackled. Thus, this article explores the benefits of the Decentralized Federated Learning (DFL) in terms of cyberattack detection and resource consumption. The work presents TemporalFED, a software module for detecting anomalies in industrial environments using FL paradigms and time series. TemporalFED incorporates three components: Time Series Conversion, Feature Engineering, and Time Series Stationary Conversion. To evaluate TemporalFED, it was deployed on Fedstellar, a DFL framework. Then, a pool of experiments measured the detection performance and resource consumption in a chemical gas industrial environment with different time-series configurations, FL paradigms, and topologies. The results showcase the superiority of the configuration utilizing DFL and Semi-Decentralized Federated Learning (SDFL) paradigms, along with a fully connected topology, which achieved the best performance in anomaly detection. Regarding resource consumption, the configuration without feature engineering employed less bandwidth, CPU, and RAM than other configurations. △ Less

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2307.11730 [pdf, other]

Mitigating Communications Threats in Decentralized Federated Learning through Moving Target Defense

Authors: Enrique Tomás Martínez Beltrán, Pedro Miguel Sánchez Sánchez, Sergio López Bernal, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez, Alberto Huertas Celdrán

Abstract: The rise of Decentralized Federated Learning (DFL) has enabled the training of machine learning models across federated participants, fostering decentralized model aggregation and reducing dependence on a server. However, this approach introduces unique communication security challenges that have yet to be thoroughly addressed in the literature. These challenges primarily originate from the decent… ▽ More The rise of Decentralized Federated Learning (DFL) has enabled the training of machine learning models across federated participants, fostering decentralized model aggregation and reducing dependence on a server. However, this approach introduces unique communication security challenges that have yet to be thoroughly addressed in the literature. These challenges primarily originate from the decentralized nature of the aggregation process, the varied roles and responsibilities of the participants, and the absence of a central authority to oversee and mitigate threats. Addressing these challenges, this paper first delineates a comprehensive threat model focused on DFL communications. In response to these identified risks, this work introduces a security module to counter communication-based attacks for DFL platforms. The module combines security techniques such as symmetric and asymmetric encryption with Moving Target Defense (MTD) techniques, including random neighbor selection and IP/port switching. The security module is implemented in a DFL platform, Fedstellar, allowing the deployment and monitoring of the federation. A DFL scenario with physical and virtual deployments have been executed, encompassing three security configurations: (i) a baseline without security, (ii) an encrypted configuration, and (iii) a configuration integrating both encryption and MTD techniques. The effectiveness of the security module is validated through experiments with the MNIST dataset and eclipse attacks. The results showed an average F1 score of 95%, with the most secure configuration resulting in CPU usage peaking at 68% (+-9%) in virtual deployments and network traffic reaching 480.8 MB (+-18 MB), effectively mitigating risks associated with eavesdrop** or eclipse attacks. △ Less

Submitted 9 December, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

arXiv:2306.15566 [pdf, other]

MTFS: a Moving Target Defense-Enabled File System for Malware Mitigation

Authors: Jan von der Assen, Alberto Huertas Celdrán, Rinor Sefa, Gérôme Bovet, Burkhard Stiller

Abstract: Ransomware has remained one of the most notorious threats in the cybersecurity field. Moving Target Defense (MTD) has been proposed as a novel paradigm for proactive defense. Although various approaches leverage MTD, few of them rely on the operating system and, specifically, the file system, thereby making them dependent on other computing devices. Furthermore, existing ransomware defense techniq… ▽ More Ransomware has remained one of the most notorious threats in the cybersecurity field. Moving Target Defense (MTD) has been proposed as a novel paradigm for proactive defense. Although various approaches leverage MTD, few of them rely on the operating system and, specifically, the file system, thereby making them dependent on other computing devices. Furthermore, existing ransomware defense techniques merely replicate or detect attacks, without preventing them. Thus, this paper introduces the MTFS overlay file system and the design and implementation of three novel MTD techniques implemented on top of it. One delaying attackers, one trap** recursive directory traversal, and another one hiding file types. The effectiveness of the techniques are shown in two experiments. First, it is shown that the techniques can delay and mitigate ransomware on real IoT devices. Secondly, in a broader scope, the solution was confronted with 14 ransomware samples, highlighting that it can save 97% of the files. △ Less

Submitted 16 November, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.15559 [pdf, other]

RansomAI: AI-powered Ransomware for Stealthy Encryption

Authors: Jan von der Assen, Alberto Huertas Celdrán, Janik Luechinger, Pedro Miguel Sánchez Sánchez, Gérôme Bovet, Gregorio Martínez Pérez, Burkhard Stiller

Abstract: Cybersecurity solutions have shown promising performance when detecting ransomware samples that use fixed algorithms and encryption rates. However, due to the current explosion of Artificial Intelligence (AI), sooner than later, ransomware (and malware in general) will incorporate AI techniques to intelligently and dynamically adapt its encryption behavior to be undetected. It might result in inef… ▽ More Cybersecurity solutions have shown promising performance when detecting ransomware samples that use fixed algorithms and encryption rates. However, due to the current explosion of Artificial Intelligence (AI), sooner than later, ransomware (and malware in general) will incorporate AI techniques to intelligently and dynamically adapt its encryption behavior to be undetected. It might result in ineffective and obsolete cybersecurity solutions, but the literature lacks AI-powered ransomware to verify it. Thus, this work proposes RansomAI, a Reinforcement Learning-based framework that can be integrated into existing ransomware samples to adapt their encryption behavior and stay stealthy while encrypting files. RansomAI presents an agent that learns the best encryption algorithm, rate, and duration that minimizes its detection (using a reward mechanism and a fingerprinting intelligent detection system) while maximizing its damage function. The proposed framework was validated in a ransomware, Ransomware-PoC, that infected a Raspberry Pi 4, acting as a crowdsensor. A pool of experiments with Deep Q-Learning and Isolation Forest (deployed on the agent and detection system, respectively) has demonstrated that RansomAI evades the detection of Ransomware-PoC affecting the Raspberry Pi 4 in a few minutes with >90% accuracy. △ Less

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.09750 [pdf, other]

doi 10.1016/j.eswa.2023.122861

Fedstellar: A Platform for Decentralized Federated Learning

Authors: Enrique Tomás Martínez Beltrán, Ángel Luis Perales Gómez, Chao Feng, Pedro Miguel Sánchez Sánchez, Sergio López Bernal, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez, Alberto Huertas Celdrán

Abstract: In 2016, Google proposed Federated Learning (FL) as a novel paradigm to train Machine Learning (ML) models across the participants of a federation while preserving data privacy. Since its birth, Centralized FL (CFL) has been the most used approach, where a central entity aggregates participants' models to create a global one. However, CFL presents limitations such as communication bottlenecks, sin… ▽ More In 2016, Google proposed Federated Learning (FL) as a novel paradigm to train Machine Learning (ML) models across the participants of a federation while preserving data privacy. Since its birth, Centralized FL (CFL) has been the most used approach, where a central entity aggregates participants' models to create a global one. However, CFL presents limitations such as communication bottlenecks, single point of failure, and reliance on a central server. Decentralized Federated Learning (DFL) addresses these issues by enabling decentralized model aggregation and minimizing dependency on a central entity. Despite these advances, current platforms training DFL models struggle with key issues such as managing heterogeneous federation network topologies. To overcome these challenges, this paper presents Fedstellar, a platform extended from p2pfl library and designed to train FL models in a decentralized, semi-decentralized, and centralized fashion across diverse federations of physical or virtualized devices. The Fedstellar implementation encompasses a web application with an interactive graphical interface, a controller for deploying federations of nodes using physical or virtual devices, and a core deployed on each device which provides the logic needed to train, aggregate, and communicate in the network. The effectiveness of the platform has been demonstrated in two scenarios: a physical deployment involving single-board devices such as Raspberry Pis for detecting cyberattacks, and a virtualized deployment comparing various FL approaches in a controlled environment using MNIST and CIFAR-10 datasets. In both scenarios, Fedstellar demonstrated consistent performance and adaptability, achieving F1 scores of 91%, 98%, and 91.2% using DFL for detecting cyberattacks and classifying MNIST and CIFAR-10, respectively, reducing training time by 32% compared to centralized approaches. △ Less

Submitted 8 April, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

arXiv:2306.08495 [pdf, other]

Single-board Device Individual Authentication based on Hardware Performance and Autoencoder Transformer Models

Authors: Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, Gérôme Bovet, Gregorio Martínez Pérez

Abstract: The proliferation of the Internet of Things (IoT) has led to the emergence of crowdsensing applications, where a multitude of interconnected devices collaboratively collect and analyze data. Ensuring the authenticity and integrity of the data collected by these devices is crucial for reliable decision-making and maintaining trust in the system. Traditional authentication methods are often vulnerab… ▽ More The proliferation of the Internet of Things (IoT) has led to the emergence of crowdsensing applications, where a multitude of interconnected devices collaboratively collect and analyze data. Ensuring the authenticity and integrity of the data collected by these devices is crucial for reliable decision-making and maintaining trust in the system. Traditional authentication methods are often vulnerable to attacks or can be easily duplicated, posing challenges to securing crowdsensing applications. Besides, current solutions leveraging device behavior are mostly focused on device identification, which is a simpler task than authentication. To address these issues, an individual IoT device authentication framework based on hardware behavior fingerprinting and Transformer autoencoders is proposed in this work. This solution leverages the inherent imperfections and variations in IoT device hardware to differentiate between devices with identical specifications. By monitoring and analyzing the behavior of key hardware components, such as the CPU, GPU, RAM, and Storage on devices, unique fingerprints for each device are created. The performance samples are considered as time series data and used to train outlier detection transformer models, one per device and aiming to model its normal data distribution. Then, the framework is validated within a spectrum crowdsensing system leveraging Raspberry Pi devices. After a pool of experiments, the model from each device is able to individually authenticate it between the 45 devices employed for validation. An average True Positive Rate (TPR) of 0.74+-0.13 and an average maximum False Positive Rate (FPR) of 0.06+-0.09 demonstrate the effectiveness of this approach in enhancing authentication, security, and trust in crowdsensing applications. △ Less

Submitted 11 November, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

arXiv:2302.09844 [pdf, other]

FederatedTrust: A Solution for Trustworthy Federated Learning

Authors: Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, Ning Xie, Gérôme Bovet, Gregorio Martínez Pérez, Burkhard Stiller

Abstract: The rapid expansion of the Internet of Things (IoT) and Edge Computing has presented challenges for centralized Machine and Deep Learning (ML/DL) methods due to the presence of distributed data silos that hold sensitive information. To address concerns regarding data privacy, collaborative and privacy-preserving ML/DL techniques like Federated Learning (FL) have emerged. However, ensuring data pri… ▽ More The rapid expansion of the Internet of Things (IoT) and Edge Computing has presented challenges for centralized Machine and Deep Learning (ML/DL) methods due to the presence of distributed data silos that hold sensitive information. To address concerns regarding data privacy, collaborative and privacy-preserving ML/DL techniques like Federated Learning (FL) have emerged. However, ensuring data privacy and performance alone is insufficient since there is a growing need to establish trust in model predictions. Existing literature has proposed various approaches on trustworthy ML/DL (excluding data privacy), identifying robustness, fairness, explainability, and accountability as important pillars. Nevertheless, further research is required to identify trustworthiness pillars and evaluation metrics specifically relevant to FL models, as well as to develop solutions that can compute the trustworthiness level of FL models. This work examines the existing requirements for evaluating trustworthiness in FL and introduces a comprehensive taxonomy consisting of six pillars (privacy, robustness, fairness, explainability, accountability, and federation), along with over 30 metrics for computing the trustworthiness of FL models. Subsequently, an algorithm named FederatedTrust is designed based on the pillars and metrics identified in the taxonomy to compute the trustworthiness score of FL models. A prototype of FederatedTrust is implemented and integrated into the learning process of FederatedScope, a well-established FL framework. Finally, five experiments are conducted using different configurations of FederatedScope to demonstrate the utility of FederatedTrust in computing the trustworthiness of FL models. Three experiments employ the FEMNIST dataset, and two utilize the N-BaIoT dataset considering a real-world IoT security use case. △ Less

Submitted 6 July, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

arXiv:2212.14677 [pdf, other]

Adversarial attacks and defenses on ML- and hardware-based IoT device fingerprinting and identification

Authors: Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, Gérôme Bovet, Gregorio Martínez Pérez

Abstract: In the last years, the number of IoT devices deployed has suffered an undoubted explosion, reaching the scale of billions. However, some new cybersecurity issues have appeared together with this development. Some of these issues are the deployment of unauthorized devices, malicious code modification, malware deployment, or vulnerability exploitation. This fact has motivated the requirement for new… ▽ More In the last years, the number of IoT devices deployed has suffered an undoubted explosion, reaching the scale of billions. However, some new cybersecurity issues have appeared together with this development. Some of these issues are the deployment of unauthorized devices, malicious code modification, malware deployment, or vulnerability exploitation. This fact has motivated the requirement for new device identification mechanisms based on behavior monitoring. Besides, these solutions have recently leveraged Machine and Deep Learning techniques due to the advances in this field and the increase in processing capabilities. In contrast, attackers do not stay stalled and have developed adversarial attacks focused on context modification and ML/DL evaluation evasion applied to IoT device identification solutions. This work explores the performance of hardware behavior-based individual device identification, how it is affected by possible context- and ML/DL-focused attacks, and how its resilience can be improved using defense techniques. In this sense, it proposes an LSTM-CNN architecture based on hardware performance behavior for individual device identification. Then, previous techniques have been compared with the proposed architecture using a hardware performance dataset collected from 45 Raspberry Pi devices running identical software. The LSTM-CNN improves previous solutions achieving a +0.96 average F1-Score and 0.8 minimum TPR for all devices. Afterward, context- and ML/DL-focused adversarial attacks were applied against the previous model to test its robustness. A temperature-based context attack was not able to disrupt the identification. However, some ML/DL state-of-the-art evasion attacks were successful. Finally, adversarial training and model distillation defense techniques are selected to improve the model resilience to evasion attacks, without degrading its performance. △ Less

Submitted 30 December, 2022; originally announced December 2022.

arXiv:2212.14647 [pdf, other]

doi 10.1109/TIFS.2024.3402055

RL and Fingerprinting to Select Moving Target Defense Mechanisms for Zero-day Attacks in IoT

Authors: Alberto Huertas Celdrán, Pedro Miguel Sánchez Sánchez, Jan von der Assen, Timo Schenk, Gérôme Bovet, Gregorio Martínez Pérez, Burkhard Stiller

Abstract: Cybercriminals are moving towards zero-day attacks affecting resource-constrained devices such as single-board computers (SBC). Assuming that perfect security is unrealistic, Moving Target Defense (MTD) is a promising approach to mitigate attacks by dynamically altering target attack surfaces. Still, selecting suitable MTD techniques for zero-day attacks is an open challenge. Reinforcement Learnin… ▽ More Cybercriminals are moving towards zero-day attacks affecting resource-constrained devices such as single-board computers (SBC). Assuming that perfect security is unrealistic, Moving Target Defense (MTD) is a promising approach to mitigate attacks by dynamically altering target attack surfaces. Still, selecting suitable MTD techniques for zero-day attacks is an open challenge. Reinforcement Learning (RL) could be an effective approach to optimize the MTD selection through trial and error, but the literature fails when i) evaluating the performance of RL and MTD solutions in real-world scenarios, ii) studying whether behavioral fingerprinting is suitable for representing SBC's states, and iii) calculating the consumption of resources in SBC. To improve these limitations, the work at hand proposes an online RL-based framework to learn the correct MTD mechanisms mitigating heterogeneous zero-day attacks in SBC. The framework considers behavioral fingerprinting to represent SBCs' states and RL to learn MTD techniques that mitigate each malicious state. It has been deployed on a real IoT crowdsensing scenario with a Raspberry Pi acting as a spectrum sensor. More in detail, the Raspberry Pi has been infected with different samples of command and control malware, rootkits, and ransomware to later select between four existing MTD techniques. A set of experiments demonstrated the suitability of the framework to learn proper MTD techniques mitigating all attacks (except a harmfulness rootkit) while consuming <1 MB of storage and utilizing <55% CPU and <80% RAM. △ Less

Submitted 30 December, 2022; originally announced December 2022.

arXiv:2212.03169 [pdf, other]

When Brain-Computer Interfaces Meet the Metaverse: Landscape, Demonstrator, Trends, Challenges, and Concerns

Authors: Sergio López Bernal, Mario Quiles Pérez, Enrique Tomás Martínez Beltrán, Gregorio Martínez Pérez, Alberto Huertas Celdrán

Abstract: The metaverse has gained tremendous popularity in recent years, allowing the interconnection of users worldwide. However, current systems in metaverse scenarios, such as virtual reality glasses, offer a partial immersive experience. In this context, Brain-Computer Interfaces (BCIs) can introduce a revolution in the metaverse, although a study of the applicability and implications of BCIs in these… ▽ More The metaverse has gained tremendous popularity in recent years, allowing the interconnection of users worldwide. However, current systems in metaverse scenarios, such as virtual reality glasses, offer a partial immersive experience. In this context, Brain-Computer Interfaces (BCIs) can introduce a revolution in the metaverse, although a study of the applicability and implications of BCIs in these virtual scenarios is required. Based on the absence of literature, this work reviews, for the first time, the applicability of BCIs in the metaverse, analyzing the current status of this integration based on different categories related to virtual worlds and the evolution of BCIs in these scenarios in the medium and long term. This work also proposes the design and implementation of a general framework that integrates BCIs with different data sources from sensors and actuators (e.g., VR glasses) based on a modular design to be easily extended. This manuscript also validates the framework in a demonstrator consisting of driving a car within a metaverse, using a BCI for neural data acquisition, a VR headset to provide realism, and a steering wheel and pedals. Four use cases (UCs) are selected, focusing on cognitive and emotional assessment of the driver, detection of drowsiness, and driver authentication while using the vehicle. Moreover, this manuscript offers an analysis of BCI trends in the metaverse, also identifying future challenges that the intersection of these technologies will face. Finally, it reviews the concerns that using BCIs in virtual world applications could generate according to different categories: accessibility, user inclusion, privacy, cybersecurity, physical safety, and ethics. △ Less

Submitted 16 November, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

arXiv:2211.08413 [pdf, other]

doi 10.1109/COMST.2023.3315746

Decentralized Federated Learning: Fundamentals, State of the Art, Frameworks, Trends, and Challenges

Authors: Enrique Tomás Martínez Beltrán, Mario Quiles Pérez, Pedro Miguel Sánchez Sánchez, Sergio López Bernal, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez, Alberto Huertas Celdrán

Abstract: In recent years, Federated Learning (FL) has gained relevance in training collaborative models without sharing sensitive data. Since its birth, Centralized FL (CFL) has been the most common approach in the literature, where a central entity creates a global model. However, a centralized approach leads to increased latency due to bottlenecks, heightened vulnerability to system failures, and trustwo… ▽ More In recent years, Federated Learning (FL) has gained relevance in training collaborative models without sharing sensitive data. Since its birth, Centralized FL (CFL) has been the most common approach in the literature, where a central entity creates a global model. However, a centralized approach leads to increased latency due to bottlenecks, heightened vulnerability to system failures, and trustworthiness concerns affecting the entity responsible for the global model creation. Decentralized Federated Learning (DFL) emerged to address these concerns by promoting decentralized model aggregation and minimizing reliance on centralized architectures. However, despite the work done in DFL, the literature has not (i) studied the main aspects differentiating DFL and CFL; (ii) analyzed DFL frameworks to create and evaluate new solutions; and (iii) reviewed application scenarios using DFL. Thus, this article identifies and analyzes the main fundamentals of DFL in terms of federation architectures, topologies, communication mechanisms, security approaches, and key performance indicators. Additionally, the paper at hand explores existing mechanisms to optimize critical DFL fundamentals. Then, the most relevant features of the current DFL frameworks are reviewed and compared. After that, it analyzes the most used DFL application scenarios, identifying solutions based on the fundamentals and frameworks previously defined. Finally, the evolution of existing DFL solutions is studied to provide a list of trends, lessons learned, and open challenges. △ Less

Submitted 13 September, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

arXiv:2210.11501 [pdf, other]

Trust-as-a-Service: A reputation-enabled trust framework for 5G networks

Authors: José María Jorquera Valero, Pedro Miguel Sánchez Sánchez, Manuel Gil Pérez, Alberto Huertas Celdrán, Gregorio Martínez Pérez

Abstract: Trust, security, and privacy are three of the major pillars to assemble the fifth generation network and beyond. Despite such pillars are principally interconnected, they arise a multitude of challenges to be addressed separately. 5G ought to offer flexible and pervasive computing capabilities across multiple domains according to user demands and assuring trustworthy network providers. Distributed… ▽ More Trust, security, and privacy are three of the major pillars to assemble the fifth generation network and beyond. Despite such pillars are principally interconnected, they arise a multitude of challenges to be addressed separately. 5G ought to offer flexible and pervasive computing capabilities across multiple domains according to user demands and assuring trustworthy network providers. Distributed marketplaces expect to boost the trading of heterogeneous resources so as to enable the establishment of pervasive service chains between cross-domains. Nevertheless, the need for reliable parties as ``marketplace operators'' plays a pivotal role to achieving a trustworthy ecosystem. One of the principal blockages in managing foreseeable networks is the need of adapting previous trust models to accomplish the new network and business requirements. In this regard, this article is centered on trust management of 5G multi-party networks. The design of a reputation-based trust framework is proposed as a Trust-as-a-Service (TaaS) solution for any distributed multi-stakeholder environment where zero trust and zero-touch principles should be met. Besides, a literature review is also conducted to recognize the network and business requirements currently envisaged. Finally, the validation of the proposed trust framework is performed in a real research environment, the 5GBarcelona testbed, leveraging 12% of a 2.1GHz CPU with 20 cores and 2% of the 30GiB memory. In this regard, these outcomes reveal the feasibility of the TaaS solution in the context of determining reliable network operators. △ Less

Submitted 20 October, 2022; originally announced October 2022.

arXiv:2210.11061 [pdf, other]

Analyzing the Robustness of Decentralized Horizontal and Vertical Federated Learning Architectures in a Non-IID Scenario

Authors: Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, Enrique Tomás Martínez Beltrán, Daniel Demeter, Gérôme Bovet, Gregorio Martínez Pérez, Burkhard Stiller

Abstract: Federated learning (FL) allows participants to collaboratively train machine and deep learning models while protecting data privacy. However, the FL paradigm still presents drawbacks affecting its trustworthiness since malicious participants could launch adversarial attacks against the training process. Related work has studied the robustness of horizontal FL scenarios under different attacks. How… ▽ More Federated learning (FL) allows participants to collaboratively train machine and deep learning models while protecting data privacy. However, the FL paradigm still presents drawbacks affecting its trustworthiness since malicious participants could launch adversarial attacks against the training process. Related work has studied the robustness of horizontal FL scenarios under different attacks. However, there is a lack of work evaluating the robustness of decentralized vertical FL and comparing it with horizontal FL architectures affected by adversarial attacks. Thus, this work proposes three decentralized FL architectures, one for horizontal and two for vertical scenarios, namely HoriChain, VertiChain, and VertiComb. These architectures present different neural networks and training protocols suitable for horizontal and vertical scenarios. Then, a decentralized, privacy-preserving, and federated use case with non-IID data to classify handwritten digits is deployed to evaluate the performance of the three architectures. Finally, a set of experiments computes and compares the robustness of the proposed architectures when they are affected by different data poisoning based on image watermarks and gradient poisoning adversarial attacks. The experiments show that even though particular configurations of both attacks can destroy the classification performance of the architectures, HoriChain is the most robust one. △ Less

Submitted 20 October, 2022; originally announced October 2022.

arXiv:2210.07719 [pdf, other]

doi 10.1109/ICC45041.2023.10278951

A Lightweight Moving Target Defense Framework for Multi-purpose Malware Affecting IoT Devices

Authors: Jan von der Assen, Alberto Huertas Celdrán, Pedro Miguel Sánchez Sánchez, Jordan Cedeño, Gérôme Bovet, Gregorio Martínez Pérez, Burkhard Stiller

Abstract: Malware affecting Internet of Things (IoT) devices is rapidly growing due to the relevance of this paradigm in real-world scenarios. Specialized literature has also detected a trend towards multi-purpose malware able to execute different malicious actions such as remote control, data leakage, encryption, or code hiding, among others. Protecting IoT devices against this kind of malware is challengi… ▽ More Malware affecting Internet of Things (IoT) devices is rapidly growing due to the relevance of this paradigm in real-world scenarios. Specialized literature has also detected a trend towards multi-purpose malware able to execute different malicious actions such as remote control, data leakage, encryption, or code hiding, among others. Protecting IoT devices against this kind of malware is challenging due to their well-known vulnerabilities and limitation in terms of CPU, memory, and storage. To improve it, the moving target defense (MTD) paradigm was proposed a decade ago and has shown promising results, but there is a lack of IoT MTD solutions dealing with multi-purpose malware. Thus, this work proposes four MTD mechanisms changing IoT devices' network, data, and runtime environment to mitigate multi-purpose malware. Furthermore, it presents a lightweight and IoT-oriented MTD framework to decide what, when, and how the MTD mechanisms are deployed. Finally, the efficiency and effectiveness of the framework and MTD mechanisms are evaluated in a real-world scenario with one IoT spectrum sensor affected by multi-purpose malware. △ Less

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2209.04048 [pdf, other]

Studying Drowsiness Detection Performance while Driving through Scalable Machine Learning Models using Electroencephalography

Authors: José Manuel Hidalgo Rogel, Enrique Tomás Martínez Beltrán, Mario Quiles Pérez, Sergio López Bernal, Gregorio Martínez Pérez, Alberto Huertas Celdrán

Abstract: - Background / Introduction: Driver drowsiness is a significant concern and one of the leading causes of traffic accidents. Advances in cognitive neuroscience and computer science have enabled the detection of drivers' drowsiness using Brain-Computer Interfaces (BCIs) and Machine Learning (ML). However, the literature lacks a comprehensive evaluation of drowsiness detection performance using a het… ▽ More - Background / Introduction: Driver drowsiness is a significant concern and one of the leading causes of traffic accidents. Advances in cognitive neuroscience and computer science have enabled the detection of drivers' drowsiness using Brain-Computer Interfaces (BCIs) and Machine Learning (ML). However, the literature lacks a comprehensive evaluation of drowsiness detection performance using a heterogeneous set of ML algorithms, and it is necessary to study the performance of scalable ML models suitable for groups of subjects. - Methods: To address these limitations, this work presents an intelligent framework employing BCIs and features based on electroencephalography for detecting drowsiness in driving scenarios. The SEED-VIG dataset is used to evaluate the best-performing models for individual subjects and groups. - Results: Results show that Random Forest (RF) outperformed other models used in the literature, such as Support Vector Machine (SVM), with a 78% f1-score for individual models. Regarding scalable models, RF reached a 79% f1-score, demonstrating the effectiveness of these approaches. This publication highlights the relevance of exploring a diverse set of ML algorithms and scalable approaches suitable for groups of subjects to improve drowsiness detection systems and ultimately reduce the number of accidents caused by driver fatigue. - Conclusions: The lessons learned from this study show that not only SVM but also other models not sufficiently explored in the literature are relevant for drowsiness detection. Additionally, scalable approaches are effective in detecting drowsiness, even when new subjects are evaluated. Thus, the proposed framework presents a novel approach for detecting drowsiness in driving scenarios using BCIs and ML. △ Less

Submitted 30 October, 2023; v1 submitted 8 September, 2022; originally announced September 2022.

arXiv:2209.00993 [pdf, other]

Data Fusion in Neuromarketing: Multimodal Analysis of Biosignals, Lifecycle Stages, Current Advances, Datasets, Trends, and Challenges

Authors: Mario Quiles Pérez, Enrique Tomás Martínez Beltrán, Sergio López Bernal, Eduardo Horna Prat, Luis Montesano Del Campo, Lorenzo Fernández Maimó, Alberto Huertas Celdrán

Abstract: The primary goal of any company is to increase its profits by improving both the quality of its products and how they are advertised. In this context, neuromarketing seeks to enhance the promotion of products and generate a greater acceptance on potential buyers. Traditionally, neuromarketing studies have relied on a single biosignal to obtain feedback from presented stimuli. However, thanks to ne… ▽ More The primary goal of any company is to increase its profits by improving both the quality of its products and how they are advertised. In this context, neuromarketing seeks to enhance the promotion of products and generate a greater acceptance on potential buyers. Traditionally, neuromarketing studies have relied on a single biosignal to obtain feedback from presented stimuli. However, thanks to new devices and technological advances studying this area of knowledge, recent trends indicate a shift towards the fusion of diverse biosignals. An example is the usage of electroencephalography for understanding the impact of an advertisement at the neural level and visual tracking to identify the stimuli that induce such impacts. This emerging pattern determines which biosignals to employ for achieving specific neuromarketing objectives. Furthermore, the fusion of data from multiple sources demands advanced processing methodologies. Despite these complexities, there is a lack of literature that adequately collates and organizes the various data sources and the applied processing techniques for the research objectives pursued. To address these challenges, the current paper conducts a comprehensive analysis of the objectives, biosignals, and data processing techniques employed in neuromarketing research. This study provides both the technical definition and a graphical distribution of the elements under revision. Additionally, it presents a categorization based on research objectives and provides an overview of the combinatory methodologies employed. After this, the paper examines primary public datasets designed for neuromarketing research together with others whose main purpose is not neuromarketing, but can be used for this matter. Ultimately, this work provides a historical perspective on the evolution of techniques across various phases over recent years and enumerates key lessons learned. △ Less

Submitted 21 August, 2023; v1 submitted 30 August, 2022; originally announced September 2022.

Comments: 26 pages, 15 figures

arXiv:2204.08516 [pdf, other]

doi 10.1016/j.iot.2023.100764

LwHBench: A low-level hardware component benchmark and dataset for Single Board Computers

Authors: Pedro Miguel Sánchez Sánchez, José María Jorquera Valero, Alberto Huertas Celdrán, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez

Abstract: In today's computing environment, where Artificial Intelligence (AI) and data processing are moving toward the Internet of Things (IoT) and Edge computing paradigms, benchmarking resource-constrained devices is a critical task to evaluate their suitability and performance. Between the employed devices, Single-Board Computers arise as multi-purpose and affordable systems. The literature has explore… ▽ More In today's computing environment, where Artificial Intelligence (AI) and data processing are moving toward the Internet of Things (IoT) and Edge computing paradigms, benchmarking resource-constrained devices is a critical task to evaluate their suitability and performance. Between the employed devices, Single-Board Computers arise as multi-purpose and affordable systems. The literature has explored Single-Board Computers performance when running high-level benchmarks specialized in particular application scenarios, such as AI or medical applications. However, lower-level benchmarking applications and datasets are needed to enable new Edge-based AI solutions for network, system and service management based on device and component performance, such as individual device identification. Thus, this paper presents LwHBench, a low-level hardware benchmarking application for Single-Board Computers that measures the performance of CPU, GPU, Memory and Storage taking into account the component constraints in these types of devices. LwHBench has been implemented for Raspberry Pi devices and run for 100 days on a set of 45 devices to generate an extensive dataset that allows the usage of AI techniques in scenarios where performance data can help in the device management process. Besides, to demonstrate the inter-scenario capability of the dataset, a series of AI-enabled use cases about device identification and context impact on performance are presented as exploration of the published data. Finally, the benchmark application has been adapted and applied to an agriculture-focused scenario where three RockPro64 devices are present. △ Less

Submitted 24 October, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

arXiv:2202.00137 [pdf, other]

doi 10.1109/TDSC.2022.3204535

Studying the Robustness of Anti-adversarial Federated Learning Models Detecting Cyberattacks in IoT Spectrum Sensors

Authors: Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, Timo Schenk, Adrian Lars Benjamin Iten, Gérôme Bovet, Gregorio Martínez Pérez, Burkhard Stiller

Abstract: Device fingerprinting combined with Machine and Deep Learning (ML/DL) report promising performance when detecting cyberattacks targeting data managed by resource-constrained spectrum sensors. However, the amount of data needed to train models and the privacy concerns of such scenarios limit the applicability of centralized ML/DL-based approaches. Federated learning (FL) addresses these limitations… ▽ More Device fingerprinting combined with Machine and Deep Learning (ML/DL) report promising performance when detecting cyberattacks targeting data managed by resource-constrained spectrum sensors. However, the amount of data needed to train models and the privacy concerns of such scenarios limit the applicability of centralized ML/DL-based approaches. Federated learning (FL) addresses these limitations by creating federated and privacy-preserving models. However, FL is vulnerable to malicious participants, and the impact of adversarial attacks on federated models detecting spectrum sensing data falsification (SSDF) attacks on spectrum sensors has not been studied. To address this challenge, the first contribution of this work is the creation of a novel dataset suitable for FL and modeling the behavior (usage of CPU, memory, or file system, among others) of resource-constrained spectrum sensors affected by different SSDF attacks. The second contribution is a pool of experiments analyzing and comparing the robustness of federated models according to i) three families of spectrum sensors, ii) eight SSDF attacks, iii) four scenarios dealing with unsupervised (anomaly detection) and supervised (binary classification) federated models, iv) up to 33% of malicious participants implementing data and model poisoning attacks, and v) four aggregation functions acting as anti-adversarial mechanisms to increase the models robustness. △ Less

Submitted 31 January, 2022; originally announced February 2022.

arXiv:2201.05410 [pdf, other]

doi 10.1109/TDSC.2023.3252918

CyberSpec: Intelligent Behavioral Fingerprinting to Detect Attacks on Crowdsensing Spectrum Sensors

Authors: Alberto Huertas Celdrán, Pedro Miguel Sánchez Sánchez, Gérôme Bovet, Gregorio Martínez Pérez, Burkhard Stiller

Abstract: Integrated sensing and communication (ISAC) is a novel paradigm using crowdsensing spectrum sensors to help with the management of spectrum scarcity. However, well-known vulnerabilities of resource-constrained spectrum sensors and the possibility of being manipulated by users with physical access complicate their protection against spectrum sensing data falsification (SSDF) attacks. Most recent li… ▽ More Integrated sensing and communication (ISAC) is a novel paradigm using crowdsensing spectrum sensors to help with the management of spectrum scarcity. However, well-known vulnerabilities of resource-constrained spectrum sensors and the possibility of being manipulated by users with physical access complicate their protection against spectrum sensing data falsification (SSDF) attacks. Most recent literature suggests using behavioral fingerprinting and Machine/Deep Learning (ML/DL) for improving similar cybersecurity issues. Nevertheless, the applicability of these techniques in resource-constrained devices, the impact of attacks affecting spectrum data integrity, and the performance and scalability of models suitable for heterogeneous sensors types are still open challenges. To improve limitations, this work presents seven SSDF attacks affecting spectrum sensors and introduces CyberSpec, an ML/DL-oriented framework using device behavioral fingerprinting to detect anomalies produced by SSDF attacks affecting resource-constrained spectrum sensors. CyberSpec has been implemented and validated in ElectroSense, a real crowdsensing RF monitoring platform where several configurations of the proposed SSDF attacks have been executed in different sensors. A pool of experiments with different unsupervised ML/DL-based models has demonstrated the suitability of CyberSpec detecting the previous attacks within an acceptable timeframe. △ Less

Submitted 14 January, 2022; originally announced January 2022.

arXiv:2111.14434 [pdf, other]

doi 10.1007/s10586-022-03949-w

Robust Federated Learning for execution time-based device model identification under label-flip** attack

Authors: Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, José Rafael Buendía Rubio, Gérôme Bovet, Gregorio Martínez Pérez

Abstract: The computing device deployment explosion experienced in recent years, motivated by the advances of technologies such as Internet-of-Things (IoT) and 5G, has led to a global scenario with increasing cybersecurity risks and threats. Among them, device spoofing and impersonation cyberattacks stand out due to their impact and, usually, low complexity required to be launched. To solve this issue, seve… ▽ More The computing device deployment explosion experienced in recent years, motivated by the advances of technologies such as Internet-of-Things (IoT) and 5G, has led to a global scenario with increasing cybersecurity risks and threats. Among them, device spoofing and impersonation cyberattacks stand out due to their impact and, usually, low complexity required to be launched. To solve this issue, several solutions have emerged to identify device models and types based on the combination of behavioral fingerprinting and Machine/Deep Learning (ML/DL) techniques. However, these solutions are not appropriated for scenarios where data privacy and protection is a must, as they require data centralization for processing. In this context, newer approaches such as Federated Learning (FL) have not been fully explored yet, especially when malicious clients are present in the scenario setup. The present work analyzes and compares the device model identification performance of a centralized DL model with an FL one while using execution time-based events. For experimental purposes, a dataset containing execution-time features of 55 Raspberry Pis belonging to four different models has been collected and published. Using this dataset, the proposed solution achieved 0.9999 accuracy in both setups, centralized and federated, showing no performance decrease while preserving data privacy. Later, the impact of a label-flip** attack during the federated model training is evaluated, using several aggregation mechanisms as countermeasure. Zeno and coordinate-wise median aggregation show the best performance, although their performance greatly degrades when the percentage of fully malicious clients (all training samples poisoned) grows over 50%. △ Less

Submitted 29 November, 2021; originally announced November 2021.

arXiv:2106.08209 [pdf, other]

doi 10.1016/j.jnca.2022.103579

A methodology to identify identical single-board computers based on hardware behavior fingerprinting

Authors: Pedro Miguel Sánchez Sánchez, José María Jorquera Valero, Alberto Huertas Celdrán, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez

Abstract: The connectivity and resource-constrained nature of single-board devices open the door to cybersecurity concerns affecting Internet of Things (IoT) scenarios. One of the most important issues is the presence of unauthorized IoT devices that want to impersonate legitimate ones by using identical hardware and software specifications. This situation can provoke sensitive information leakages, data po… ▽ More The connectivity and resource-constrained nature of single-board devices open the door to cybersecurity concerns affecting Internet of Things (IoT) scenarios. One of the most important issues is the presence of unauthorized IoT devices that want to impersonate legitimate ones by using identical hardware and software specifications. This situation can provoke sensitive information leakages, data poisoning, or privilege escalation in IoT scenarios. Combining behavioral fingerprinting and Machine/Deep Learning (ML/DL) techniques is a promising approach to identify these malicious spoofing devices by detecting minor performance differences generated by imperfections in manufacturing. However, existing solutions are not suitable for single-board devices since they do not consider their hardware and software limitations, underestimate critical aspects such as fingerprint stability or context changes, and do not explore the potential of ML/DL techniques. To improve it, this work first identifies the essential properties for single-board device identification: uniqueness, stability, diversity, scalability, efficiency, robustness, and security. Then, a novel methodology relies on behavioral fingerprinting to identify identical single-board devices and meet the previous properties. The methodology leverages the different built-in components of the system and ML/DL techniques, comparing the device internal behavior with each other to detect manufacturing variations. The methodology validation has been performed in a real environment composed of 15 identical Raspberry Pi 4 B and 10 Raspberry Pi 3 B+ devices, obtaining a 91.9% average TPR and identifying all devices by setting a 50% threshold in the evaluation process. Finally, a discussion compares the proposed solution with related work, highlighting the fingerprint properties not met, and provides important lessons learned and limitations. △ Less

Submitted 22 June, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

arXiv:2106.05434 [pdf, other]

doi 10.1007/978-3-030-91424-0_1

FedDICE: A ransomware spread detection in a distributed integrated clinical environment using federated learning and SDN based mitigation

Authors: Chandra Thapa, Kallol Krishna Karmakar, Alberto Huertas Celdran, Seyit Camtepe, Vijay Varadharajan, Surya Nepal

Abstract: An integrated clinical environment (ICE) enables the connection and coordination of the internet of medical things around the care of patients in hospitals. However, ransomware attacks and their spread on hospital infrastructures, including ICE, are rising. Often the adversaries are targeting multiple hospitals with the same ransomware attacks. These attacks are detected by using machine learning… ▽ More An integrated clinical environment (ICE) enables the connection and coordination of the internet of medical things around the care of patients in hospitals. However, ransomware attacks and their spread on hospital infrastructures, including ICE, are rising. Often the adversaries are targeting multiple hospitals with the same ransomware attacks. These attacks are detected by using machine learning algorithms. But the challenge is devising the anti-ransomware learning mechanisms and services under the following conditions: (1) provide immunity to other hospitals if one of them got the attack, (2) hospitals are usually distributed over geographical locations, and (3) direct data sharing is avoided due to privacy concerns. In this regard, this paper presents a federated distributed integrated clinical environment, aka. FedDICE. FedDICE integrates federated learning (FL), which is privacy-preserving learning, to SDN-oriented security architecture to enable collaborative learning, detection, and mitigation of ransomware attacks. We demonstrate the importance of FedDICE in a collaborative environment with up to four hospitals and four popular ransomware families, namely WannaCry, Petya, BadRabbit, and PowerGhost. Our results find that in both IID and non-IID data setups, FedDICE achieves the centralized baseline performance that needs direct data sharing for detection. However, as a trade-off to data privacy, FedDICE observes overhead in the anti-ransomware model training, e.g., 28x for the logistic regression model. Besides, FedDICE utilizes SDN's dynamic network programmability feature to remove the infected devices in ICE. △ Less

Submitted 9 June, 2021; originally announced June 2021.

Journal ref: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 402), 2021

arXiv:2106.04968 [pdf, other]

Eight Reasons Why Cybersecurity on Novel Generations of Brain-Computer Interfaces Must Be Prioritized

Authors: Sergio López Bernal, Alberto Huertas Celdrán, Gregorio Martínez Pérez

Abstract: This article presents eight neural cyberattacks affecting spontaneous neural activity, inspired by well-known cyberattacks from the computer science domain: Neural Flooding, Neural Jamming, Neural Scanning, Neural Selective Forwarding, Neural Spoofing, Neural Sybil, Neural Sinkhole and Neural Nonce. These cyberattacks are based on the exploitation of vulnerabilities existing in the new generation… ▽ More This article presents eight neural cyberattacks affecting spontaneous neural activity, inspired by well-known cyberattacks from the computer science domain: Neural Flooding, Neural Jamming, Neural Scanning, Neural Selective Forwarding, Neural Spoofing, Neural Sybil, Neural Sinkhole and Neural Nonce. These cyberattacks are based on the exploitation of vulnerabilities existing in the new generation of Brain-Computer Interfaces. After presenting their formal definitions, the cyberattacks have been implemented over a neuronal simulation. To evaluate the impact of each cyberattack, they have been implemented in a Convolutional Neural Network (CNN) simulating a portion of a mouse's visual cortex. This implementation is based on existing literature indicating the similarities that CNNs have with neuronal structures from the visual cortex. Some conclusions are also provided, indicating that Neural Nonce and Neural Jamming are the most impactful cyberattacks for short-term effects, while Neural Scanning and Neural Nonce are the most damaging for long-term effects. △ Less

Submitted 9 June, 2021; originally announced June 2021.

arXiv:2105.10997 [pdf, other]

Neuronal Jamming Cyberattack over Invasive BCI Affecting the Resolution of Tasks Requiring Visual Capabilities

Authors: Sergio López Bernal, Alberto Huertas Celdrán, Gregorio Martínez Pérez

Abstract: Invasive Brain-Computer Interfaces (BCI) are extensively used in medical application scenarios to record, stimulate, or inhibit neural activity with different purposes. An example is the stimulation of some brain areas to reduce the effects generated by Parkinson's disease. Despite the advances in recent years, cybersecurity on BCI is an open challenge since attackers can exploit the vulnerabiliti… ▽ More Invasive Brain-Computer Interfaces (BCI) are extensively used in medical application scenarios to record, stimulate, or inhibit neural activity with different purposes. An example is the stimulation of some brain areas to reduce the effects generated by Parkinson's disease. Despite the advances in recent years, cybersecurity on BCI is an open challenge since attackers can exploit the vulnerabilities of invasive BCIs to induce malicious stimulation or treatment disruption, affecting neuronal activity. In this work, we design and implement a novel neuronal cyberattack, called Neuronal Jamming (JAM), which prevents neurons from producing spikes. To implement and measure the JAM impact, and due to the lack of realistic neuronal topologies in mammalians, we have defined a use case with a Convolutional Neural Network (CNN) trained to allow a mouse to exit a particular maze. The resulting model has been translated to a neural topology, simulating a portion of a mouse's visual cortex. The impact of JAM on both biological and artificial networks is measured, analyzing how the attacks can both disrupt the spontaneous neural signaling and the mouse's capacity to exit the maze. Besides, we compare the impacts of both JAM and FLO (an existing neural cyberattack) demonstrating that JAM generates a higher impact in terms of neuronal spike rate. Finally, we discuss on whether and how JAM and FLO attacks could induce the effects of neurodegenerative diseases if the implanted BCI had a comprehensive electrode coverage of the targeted brain regions. △ Less

Submitted 23 May, 2021; originally announced May 2021.

arXiv:2104.09994 [pdf, other]

doi 10.1016/j.comnet.2021.108693

Federated Learning for Malware Detection in IoT Devices

Authors: Valerian Rey, Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, Gérôme Bovet, Martin Jaggi

Abstract: This work investigates the possibilities enabled by federated learning concerning IoT malware detection and studies security issues inherent to this new learning paradigm. In this context, a framework that uses federated learning to detect malware affecting IoT devices is presented. N-BaIoT, a dataset modeling network traffic of several real IoT devices while affected by malware, has been used to… ▽ More This work investigates the possibilities enabled by federated learning concerning IoT malware detection and studies security issues inherent to this new learning paradigm. In this context, a framework that uses federated learning to detect malware affecting IoT devices is presented. N-BaIoT, a dataset modeling network traffic of several real IoT devices while affected by malware, has been used to evaluate the proposed framework. Both supervised and unsupervised federated models (multi-layer perceptron and autoencoder) able to detect malware affecting seen and unseen IoT devices of N-BaIoT have been trained and evaluated. Furthermore, their performance has been compared to two traditional approaches. The first one lets each participant locally train a model using only its own data, while the second consists of making the participants share their data with a central entity in charge of training a global model. This comparison has shown that the use of more diverse and large data, as done in the federated and centralized methods, has a considerable positive impact on the model performance. Besides, the federated models, while preserving the participant's privacy, show similar results as the centralized ones. As an additional contribution and to measure the robustness of the federated approach, an adversarial setup with several malicious participants poisoning the federated model has been considered. The baseline model aggregation averaging step used in most federated learning algorithms appears highly vulnerable to different attacks, even with a single adversary. The performance of other model aggregation functions acting as countermeasures is thus evaluated under the same attack scenarios. These functions provide a significant improvement against malicious participants, but more efforts are still needed to make federated approaches robust. △ Less

Submitted 19 November, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

Journal ref: Rey, V., Sánchez, P. M. S., Celdrán, A. H., & Bovet, G. (2022). Federated learning for malware detection in iot devices. Computer Networks, 108693

arXiv:2008.03343 [pdf, other]

doi 10.1109/COMST.2021.3064259

A Survey on Device Behavior Fingerprinting: Data Sources, Techniques, Application Scenarios, and Datasets

Authors: Pedro Miguel Sánchez Sánchez, Jose María Jorquera Valero, Alberto Huertas Celdrán, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez

Abstract: In the current network-based computing world, where the number of interconnected devices grows exponentially, their diversity, malfunctions, and cybersecurity threats are increasing at the same rate. To guarantee the correct functioning and performance of novel environments such as Smart Cities, Industry 4.0, or crowdsensing, it is crucial to identify the capabilities of their devices (e.g., senso… ▽ More In the current network-based computing world, where the number of interconnected devices grows exponentially, their diversity, malfunctions, and cybersecurity threats are increasing at the same rate. To guarantee the correct functioning and performance of novel environments such as Smart Cities, Industry 4.0, or crowdsensing, it is crucial to identify the capabilities of their devices (e.g., sensors, actuators) and detect potential misbehavior that may arise due to cyberattacks, system faults, or misconfigurations. With this goal in mind, a promising research field emerged focusing on creating and managing fingerprints that model the behavior of both the device actions and its components. The article at hand studies the recent growth of the device behavior fingerprinting field in terms of application scenarios, behavioral sources, and processing and evaluation techniques. First, it performs a comprehensive review of the device types, behavioral data, and processing and evaluation techniques used by the most recent and representative research works dealing with two major scenarios: device identification and device misbehavior detection. After that, each work is deeply analyzed and compared, emphasizing its characteristics, advantages, and limitations. This article also provides researchers with a review of the most relevant characteristics of existing datasets as most of the novel processing techniques are based on machine learning and deep learning. Finally, it studies the evolution of these two scenarios in recent years, providing lessons learned, current trends, and future research challenges to guide new solutions in the area. △ Less

Submitted 3 March, 2021; v1 submitted 7 August, 2020; originally announced August 2020.

arXiv:2007.09466 [pdf, other]

Cyberattacks on Miniature Brain Implants to Disrupt Spontaneous Neural Signaling

Authors: Sergio López Bernal, Alberto Huertas Celdrán, Lorenzo Fernández Maimó, Michael Taynnan Barros, Sasitharan Balasubramaniam, Gregorio Martínez Pérez

Abstract: Brain-Computer Interfaces (BCI) arose as systems that merge computing systems with the human brain to facilitate recording, stimulation, and inhibition of neural activity. Over the years, the development of BCI technologies has shifted towards miniaturization of devices that can be seamlessly embedded into the brain and can target single neuron or small population sensing and control. We present a… ▽ More Brain-Computer Interfaces (BCI) arose as systems that merge computing systems with the human brain to facilitate recording, stimulation, and inhibition of neural activity. Over the years, the development of BCI technologies has shifted towards miniaturization of devices that can be seamlessly embedded into the brain and can target single neuron or small population sensing and control. We present a motivating example highlighting vulnerabilities of two promising micron-scale BCI technologies, demonstrating the lack of security and privacy principles in existing solutions. This situation opens the door to a novel family of cyberattacks, called neuronal cyberattacks, affecting neuronal signaling. This paper defines the first two neural cyberattacks, Neuronal Flooding (FLO) and Neuronal Scanning (SCA), where each threat can affect the natural activity of neurons. This work implements these attacks in a neuronal simulator to determine their impact over the spontaneous neuronal behavior, defining three metrics: number of spikes, percentage of shifts, and dispersion of spikes. Several experiments demonstrate that both cyberattacks produce a reduction of spikes compared to spontaneous behavior, generating a rise in temporal shifts and a dispersion increase. Mainly, SCA presents a higher impact than FLO in the metrics focused on the number of spikes and dispersion, where FLO is slightly more damaging, considering the percentage of shifts. Nevertheless, the intrinsic behavior of each attack generates a differentiation on how they alter neuronal signaling. FLO is adequate to generate an immediate impact on the neuronal activity, whereas SCA presents higher effectiveness for damages to the neural signaling in the long-term. △ Less

Submitted 10 September, 2020; v1 submitted 18 July, 2020; originally announced July 2020.

arXiv:2004.07877 [pdf, other]

doi 10.1016/j.cose.2020.102168

AuthCODE: A Privacy-preserving and Multi-device Continuous Authentication Architecture based on Machine and Deep Learning

Authors: Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, Lorenzo Fernández Maimó, Gregorio Martínez Pérez

Abstract: The authentication field is evolving towards mechanisms able to keep users continuously authenticated without the necessity of remembering or possessing authentication credentials. While existing continuous authentication systems have demonstrated their suitability for single-device scenarios, the Internet of Things and next generation of mobile networks (5G) are enabling novel multi-device scenar… ▽ More The authentication field is evolving towards mechanisms able to keep users continuously authenticated without the necessity of remembering or possessing authentication credentials. While existing continuous authentication systems have demonstrated their suitability for single-device scenarios, the Internet of Things and next generation of mobile networks (5G) are enabling novel multi-device scenarios -- such as Smart Offices -- where continuous authentication is still an open challenge. The paper at hand, proposes an AI-based, privacy-preserving and multi-device continuous authentication architecture called AuthCODE. A realistic Smart Office scenario with several users, interacting with their mobile devices and personal computer, has been used to create a set of single- and multi-device behavioural datasets and validate AuthCODE. A pool of experiments with machine and deep learning classifiers measured the impact of time in authentication accuracy and improved the results of single-device approaches by considering multi-device behaviour profiles. The f1-score average reached for XGBoost on multi-device profiles based on 1-minute windows was 99.33%, while the best performance achieved for single devices was lower than 97.39%. The inclusion of temporal information in the form of vector sequences classified by a Long-Short Term Memory Network, allowed the identification of additional complex behaviour patterns associated to each user, resulting in an average f1-score of 99.02% on identification of long-term behaviours. △ Less

Submitted 30 November, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

arXiv:2004.00931 [pdf, other]

doi 10.1109/TNSM.2020.3031573

Spotting political social bots in Twitter: A use case of the 2019 Spanish general election

Authors: Javier Pastor-Galindo, Mattia Zago, Pantaleone Nespoli, Sergio López Bernal, Alberto Huertas Celdrán, Manuel Gil Pérez, José A. Ruipérez-Valiente, Gregorio Martínez Pérez, Félix Gómez Mármol

Abstract: While social media has been proved as an exceptionally useful tool to interact with other people and massively and quickly spread helpful information, its great potential has been ill-intentionally leveraged as well to distort political elections and manipulate constituents. In the paper at hand, we analyzed the presence and behavior of social bots on Twitter in the context of the November 2019 Sp… ▽ More While social media has been proved as an exceptionally useful tool to interact with other people and massively and quickly spread helpful information, its great potential has been ill-intentionally leveraged as well to distort political elections and manipulate constituents. In the paper at hand, we analyzed the presence and behavior of social bots on Twitter in the context of the November 2019 Spanish general election. Throughout our study, we classified involved users as social bots or humans, and examined their interactions from a quantitative (i.e., amount of traffic generated and existing relations) and qualitative (i.e., user's political affinity and sentiment towards the most important parties) perspectives. Results demonstrated that a non-negligible amount of those bots actively participated in the election, supporting each of the five principal political parties. △ Less

Submitted 12 October, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

arXiv:1908.03536 [pdf, other]

doi 10.1145/3427376

Security in Brain-Computer Interfaces: State-of-the-art, opportunities, and future challenges

Authors: Sergio López Bernal, Alberto Huertas Celdrán, Gregorio Martínez Pérez, Michael Taynnan Barros, Sasitharan Balasubramaniam

Abstract: BCIs have significantly improved the patients' quality of life by restoring damaged hearing, sight, and movement capabilities. After evolving their application scenarios, the current trend of BCI is to enable new innovative brain-to-brain and brain-to-the-Internet communication paradigms. This technological advancement generates opportunities for attackers since users' personal information and phy… ▽ More BCIs have significantly improved the patients' quality of life by restoring damaged hearing, sight, and movement capabilities. After evolving their application scenarios, the current trend of BCI is to enable new innovative brain-to-brain and brain-to-the-Internet communication paradigms. This technological advancement generates opportunities for attackers since users' personal information and physical integrity could be under tremendous risk. This work presents the existing versions of the BCI life-cycle and homogenizes them in a new approach that overcomes current limitations. After that, we offer a qualitative characterization of the security attacks affecting each phase of the BCI cycle to analyze their impacts and countermeasures documented in the literature. Finally, we reflect on lessons learned, highlighting research trends and future challenges concerning security on BCIs. △ Less

Submitted 2 October, 2020; v1 submitted 9 August, 2019; originally announced August 2019.

Showing 1–38 of 38 results for author: Celdrán, A H