-
Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls
Authors:
Pedro Miguel Sánchez Sánchez,
Alberto Huertas Celdrán,
Gérôme Bovet,
Gregorio Martínez Pérez
Abstract:
In the current cybersecurity landscape, protecting military devices such as communication and battlefield management systems against sophisticated cyber attacks is crucial. Malware exploits vulnerabilities through stealth methods, often evading traditional detection mechanisms such as software signatures. The application of ML/DL in vulnerability detection has been extensively explored in the lite…
▽ More
In the current cybersecurity landscape, protecting military devices such as communication and battlefield management systems against sophisticated cyber attacks is crucial. Malware exploits vulnerabilities through stealth methods, often evading traditional detection mechanisms such as software signatures. The application of ML/DL in vulnerability detection has been extensively explored in the literature. However, current ML/DL vulnerability detection methods struggle with understanding the context and intent behind complex attacks. Integrating large language models (LLMs) with system call analysis offers a promising approach to enhance malware detection. This work presents a novel framework leveraging LLMs to classify malware based on system call data. The framework uses transfer learning to adapt pre-trained LLMs for malware detection. By retraining LLMs on a dataset of benign and malicious system calls, the models are refined to detect signs of malware activity. Experiments with a dataset of over 1TB of system calls demonstrate that models with larger context sizes, such as BigBird and Longformer, achieve superior accuracy and F1-Score of approximately 0.86. The results highlight the importance of context size in improving detection rates and underscore the trade-offs between computational complexity and performance. This approach shows significant potential for real-time detection in high-stakes environments, offering a robust solution to evolving cyber threats.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
Aperiodic two-layer energy management system for community microgrids based on blockchain strategy
Authors:
Miguel Gayo Abeleira,
Carlos Santos Pérez,
Francisco Javier Rodríguez Sánchez,
Pedro Martín Sánchez,
Enrique Santiso Gómez,
José Antonio Jiménez Calvo
Abstract:
This work proposes a geographically-based split of the community microgrids into clusters of members that tend to have similar consumption and generation profiles. Assuming a community microgrid divided into clusters, a two-layer architecture is developed to facilitate the greater penetration of distributed energy resources in an efficient way. The first layer, referred as the market layer, is res…
▽ More
This work proposes a geographically-based split of the community microgrids into clusters of members that tend to have similar consumption and generation profiles. Assuming a community microgrid divided into clusters, a two-layer architecture is developed to facilitate the greater penetration of distributed energy resources in an efficient way. The first layer, referred as the market layer, is responsible for creating local energy markets with the aim of maximising the economic benefits for community microgrid members. The second layer is responsible for the network reconfiguration, which is based on the energy balance within each cluster. This layer complies with the IEC 61850 communication standard, in order to control commercial sectionalizing and tie switches. This allows the community microgrid network to be reconfigured to minimise energy exchanges with the main grid. To implement this two-layer energy management strategy, an aperiodic market approach based on Blockchain technology, and the additional functionality offered by Smart Contracts is adopted. This embraces the concept of energy communities since it decentralizes the control and eliminates intermediaries. The use of aperiodic control techniques helps to overcome the challenges of using Blockchain technology in terms of storage, computational requirements and member privacy. The scalability and modularity of the Smart Contract-based system allow each cluster of members to be designed by tailoring the system to their specific needs. The implementation of this strategy is based on low-cost off-the-shelf devices, such as Raspberry Pi 4 Model B boards, which operate as Blockchain nodes of community microgrid members. Finally, the strategy has been validated by emulating two use cases based on the IEEE 123-node system network model highlighting the benefits of the proposal.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
Assessing the Sustainability and Trustworthiness of Federated Learning Models
Authors:
Alberto Huertas Celdran,
Chao Feng,
Pedro Miguel Sanchez Sanchez,
Lynn Zumtaugwald,
Gerome Bovet,
Burkhard Stiller
Abstract:
Artificial intelligence (AI) plays a pivotal role in various sectors, influencing critical decision-making processes in our daily lives. Within the AI landscape, novel AI paradigms, such as Federated Learning (FL), focus on preserving data privacy while collaboratively training AI models. In such a context, a group of experts from the European Commission (AI-HLEG) has identified sustainable AI as…
▽ More
Artificial intelligence (AI) plays a pivotal role in various sectors, influencing critical decision-making processes in our daily lives. Within the AI landscape, novel AI paradigms, such as Federated Learning (FL), focus on preserving data privacy while collaboratively training AI models. In such a context, a group of experts from the European Commission (AI-HLEG) has identified sustainable AI as one of the key elements that must be considered to provide trustworthy AI. While existing literature offers several taxonomies and solutions for assessing the trustworthiness of FL models, a significant gap exists in considering sustainability and the carbon footprint associated with FL. Thus, this work introduces the sustainability pillar to the most recent and comprehensive trustworthy FL taxonomy, making this work the first to address all AI-HLEG requirements. The sustainability pillar assesses the FL system environmental impact, incorporating notions and metrics for hardware efficiency, federation complexity, and energy grid carbon intensity. Then, this work designs and implements an algorithm for evaluating the trustworthiness of FL models by incorporating the sustainability pillar. Extensive evaluations with the FederatedScope framework and various scenarios varying federation participants, complexities, hardware, and energy grids demonstrate the usefulness of the proposed solution.
△ Less
Submitted 31 October, 2023;
originally announced October 2023.
-
The stellar occultation by (319) Leona on 13 September 2023 in preparation for the occultation of Betelgeuse
Authors:
J. L. Ortiz,
M. Kretlow,
C. Schnabel,
N. Morales,
J. Flores-Martín,
M. Sánchez González,
F. Casarramona,
A. Selva,
C. Perelló,
A. Román-Reche,
S. Alonso,
J. L. Rizos,
R. Gonçalves,
A. Castillo,
J. M. Madiedo,
P. Martínez Sánchez,
J. M. Fernández andújar,
J. L. Maestre,
E. Smith,
M. Gil,
V. Pelenjow,
S. Moral Soriano,
J. Martí,
P. L. Luque-Escamilla,
R. Casas
, et al. (14 additional authors not shown)
Abstract:
On 12 December 2023, the star $α$ Orionis (Betelgeuse) will be occulted by the asteroid (319) Leona. This represents an extraordinary and unique opportunity to analyze the diameter and brightness distribution of Betelgeuse's photosphere with extreme angular resolution by studying the light curve as the asteroid occults the star from different points on Earth and at different wavelengths. Here we p…
▽ More
On 12 December 2023, the star $α$ Orionis (Betelgeuse) will be occulted by the asteroid (319) Leona. This represents an extraordinary and unique opportunity to analyze the diameter and brightness distribution of Betelgeuse's photosphere with extreme angular resolution by studying the light curve as the asteroid occults the star from different points on Earth and at different wavelengths. Here we present observations of another occultation by Leona on 13 September 2023 to determine its projected shape and size in preparation for the December 12th event. The occultation observation campaign was highly successful. The effective diameter in projected area derived from the positive detections at 17 sites turned out to be 66 km $\pm$ 2 km using an elliptical fit to the instantaneous limb. The body is highly elongated, with dimensions of 79.6 $\pm$ 2.2 km x 54.8 $\pm$ 1.3 km in its long and short axis, respectively, at the occultation time. Also, an accurate position coming from the occultation, to improve the orbit determination of Leona for December 12 is provided.
△ Less
Submitted 21 September, 2023;
originally announced September 2023.
-
CyberForce: A Federated Reinforcement Learning Framework for Malware Mitigation
Authors:
Chao Feng,
Alberto Huertas Celdran,
Pedro Miguel Sanchez Sanchez,
Jan Kreischer,
Jan von der Assen,
Gerome Bovet,
Gregorio Martinez Perez,
Burkhard Stiller
Abstract:
Recent research has shown that the integration of Reinforcement Learning (RL) with Moving Target Defense (MTD) can enhance cybersecurity in Internet-of-Things (IoT) devices. Nevertheless, the practicality of existing work is hindered by data privacy concerns associated with centralized data processing in RL, and the unsatisfactory time needed to learn right MTD techniques that are effective agains…
▽ More
Recent research has shown that the integration of Reinforcement Learning (RL) with Moving Target Defense (MTD) can enhance cybersecurity in Internet-of-Things (IoT) devices. Nevertheless, the practicality of existing work is hindered by data privacy concerns associated with centralized data processing in RL, and the unsatisfactory time needed to learn right MTD techniques that are effective against a rising number of heterogeneous zero-day attacks. Thus, this work presents CyberForce, a framework that combines Federated and Reinforcement Learning (FRL) to collaboratively and privately learn suitable MTD techniques for mitigating zero-day attacks. CyberForce integrates device fingerprinting and anomaly detection to reward or penalize MTD mechanisms chosen by an FRL-based agent. The framework has been deployed and evaluated in a scenario consisting of ten physical devices of a real IoT platform affected by heterogeneous malware samples. A pool of experiments has demonstrated that CyberForce learns the MTD technique mitigating each attack faster than existing RL-based centralized approaches. In addition, when various devices are exposed to different attacks, CyberForce benefits from knowledge transfer, leading to enhanced performance and reduced learning time in comparison to recent works. Finally, different aggregation algorithms used during the agent learning process provide CyberForce with notable robustness to malicious attacks.
△ Less
Submitted 8 September, 2023; v1 submitted 11 August, 2023;
originally announced August 2023.
-
TemporalFED: Detecting Cyberattacks in Industrial Time-Series Data Using Decentralized Federated Learning
Authors:
Ángel Luis Perales Gómez,
Enrique Tomás Martínez Beltrán,
Pedro Miguel Sánchez Sánchez,
Alberto Huertas Celdrán
Abstract:
Industry 4.0 has brought numerous advantages, such as increasing productivity through automation. However, it also presents major cybersecurity issues such as cyberattacks affecting industrial processes. Federated Learning (FL) combined with time-series analysis is a promising cyberattack detection mechanism proposed in the literature. However, the fact of having a single point of failure and netw…
▽ More
Industry 4.0 has brought numerous advantages, such as increasing productivity through automation. However, it also presents major cybersecurity issues such as cyberattacks affecting industrial processes. Federated Learning (FL) combined with time-series analysis is a promising cyberattack detection mechanism proposed in the literature. However, the fact of having a single point of failure and network bottleneck are critical challenges that need to be tackled. Thus, this article explores the benefits of the Decentralized Federated Learning (DFL) in terms of cyberattack detection and resource consumption. The work presents TemporalFED, a software module for detecting anomalies in industrial environments using FL paradigms and time series. TemporalFED incorporates three components: Time Series Conversion, Feature Engineering, and Time Series Stationary Conversion. To evaluate TemporalFED, it was deployed on Fedstellar, a DFL framework. Then, a pool of experiments measured the detection performance and resource consumption in a chemical gas industrial environment with different time-series configurations, FL paradigms, and topologies. The results showcase the superiority of the configuration utilizing DFL and Semi-Decentralized Federated Learning (SDFL) paradigms, along with a fully connected topology, which achieved the best performance in anomaly detection. Regarding resource consumption, the configuration without feature engineering employed less bandwidth, CPU, and RAM than other configurations.
△ Less
Submitted 7 August, 2023;
originally announced August 2023.
-
Mitigating Communications Threats in Decentralized Federated Learning through Moving Target Defense
Authors:
Enrique Tomás Martínez Beltrán,
Pedro Miguel Sánchez Sánchez,
Sergio López Bernal,
Gérôme Bovet,
Manuel Gil Pérez,
Gregorio Martínez Pérez,
Alberto Huertas Celdrán
Abstract:
The rise of Decentralized Federated Learning (DFL) has enabled the training of machine learning models across federated participants, fostering decentralized model aggregation and reducing dependence on a server. However, this approach introduces unique communication security challenges that have yet to be thoroughly addressed in the literature. These challenges primarily originate from the decent…
▽ More
The rise of Decentralized Federated Learning (DFL) has enabled the training of machine learning models across federated participants, fostering decentralized model aggregation and reducing dependence on a server. However, this approach introduces unique communication security challenges that have yet to be thoroughly addressed in the literature. These challenges primarily originate from the decentralized nature of the aggregation process, the varied roles and responsibilities of the participants, and the absence of a central authority to oversee and mitigate threats. Addressing these challenges, this paper first delineates a comprehensive threat model focused on DFL communications. In response to these identified risks, this work introduces a security module to counter communication-based attacks for DFL platforms. The module combines security techniques such as symmetric and asymmetric encryption with Moving Target Defense (MTD) techniques, including random neighbor selection and IP/port switching. The security module is implemented in a DFL platform, Fedstellar, allowing the deployment and monitoring of the federation. A DFL scenario with physical and virtual deployments have been executed, encompassing three security configurations: (i) a baseline without security, (ii) an encrypted configuration, and (iii) a configuration integrating both encryption and MTD techniques. The effectiveness of the security module is validated through experiments with the MNIST dataset and eclipse attacks. The results showed an average F1 score of 95%, with the most secure configuration resulting in CPU usage peaking at 68% (+-9%) in virtual deployments and network traffic reaching 480.8 MB (+-18 MB), effectively mitigating risks associated with eavesdrop** or eclipse attacks.
△ Less
Submitted 9 December, 2023; v1 submitted 21 July, 2023;
originally announced July 2023.
-
RansomAI: AI-powered Ransomware for Stealthy Encryption
Authors:
Jan von der Assen,
Alberto Huertas Celdrán,
Janik Luechinger,
Pedro Miguel Sánchez Sánchez,
Gérôme Bovet,
Gregorio Martínez Pérez,
Burkhard Stiller
Abstract:
Cybersecurity solutions have shown promising performance when detecting ransomware samples that use fixed algorithms and encryption rates. However, due to the current explosion of Artificial Intelligence (AI), sooner than later, ransomware (and malware in general) will incorporate AI techniques to intelligently and dynamically adapt its encryption behavior to be undetected. It might result in inef…
▽ More
Cybersecurity solutions have shown promising performance when detecting ransomware samples that use fixed algorithms and encryption rates. However, due to the current explosion of Artificial Intelligence (AI), sooner than later, ransomware (and malware in general) will incorporate AI techniques to intelligently and dynamically adapt its encryption behavior to be undetected. It might result in ineffective and obsolete cybersecurity solutions, but the literature lacks AI-powered ransomware to verify it. Thus, this work proposes RansomAI, a Reinforcement Learning-based framework that can be integrated into existing ransomware samples to adapt their encryption behavior and stay stealthy while encrypting files. RansomAI presents an agent that learns the best encryption algorithm, rate, and duration that minimizes its detection (using a reward mechanism and a fingerprinting intelligent detection system) while maximizing its damage function. The proposed framework was validated in a ransomware, Ransomware-PoC, that infected a Raspberry Pi 4, acting as a crowdsensor. A pool of experiments with Deep Q-Learning and Isolation Forest (deployed on the agent and detection system, respectively) has demonstrated that RansomAI evades the detection of Ransomware-PoC affecting the Raspberry Pi 4 in a few minutes with >90% accuracy.
△ Less
Submitted 27 June, 2023;
originally announced June 2023.
-
Fedstellar: A Platform for Decentralized Federated Learning
Authors:
Enrique Tomás Martínez Beltrán,
Ángel Luis Perales Gómez,
Chao Feng,
Pedro Miguel Sánchez Sánchez,
Sergio López Bernal,
Gérôme Bovet,
Manuel Gil Pérez,
Gregorio Martínez Pérez,
Alberto Huertas Celdrán
Abstract:
In 2016, Google proposed Federated Learning (FL) as a novel paradigm to train Machine Learning (ML) models across the participants of a federation while preserving data privacy. Since its birth, Centralized FL (CFL) has been the most used approach, where a central entity aggregates participants' models to create a global one. However, CFL presents limitations such as communication bottlenecks, sin…
▽ More
In 2016, Google proposed Federated Learning (FL) as a novel paradigm to train Machine Learning (ML) models across the participants of a federation while preserving data privacy. Since its birth, Centralized FL (CFL) has been the most used approach, where a central entity aggregates participants' models to create a global one. However, CFL presents limitations such as communication bottlenecks, single point of failure, and reliance on a central server. Decentralized Federated Learning (DFL) addresses these issues by enabling decentralized model aggregation and minimizing dependency on a central entity. Despite these advances, current platforms training DFL models struggle with key issues such as managing heterogeneous federation network topologies. To overcome these challenges, this paper presents Fedstellar, a platform extended from p2pfl library and designed to train FL models in a decentralized, semi-decentralized, and centralized fashion across diverse federations of physical or virtualized devices. The Fedstellar implementation encompasses a web application with an interactive graphical interface, a controller for deploying federations of nodes using physical or virtual devices, and a core deployed on each device which provides the logic needed to train, aggregate, and communicate in the network. The effectiveness of the platform has been demonstrated in two scenarios: a physical deployment involving single-board devices such as Raspberry Pis for detecting cyberattacks, and a virtualized deployment comparing various FL approaches in a controlled environment using MNIST and CIFAR-10 datasets. In both scenarios, Fedstellar demonstrated consistent performance and adaptability, achieving F1 scores of 91%, 98%, and 91.2% using DFL for detecting cyberattacks and classifying MNIST and CIFAR-10, respectively, reducing training time by 32% compared to centralized approaches.
△ Less
Submitted 8 April, 2024; v1 submitted 16 June, 2023;
originally announced June 2023.
-
Single-board Device Individual Authentication based on Hardware Performance and Autoencoder Transformer Models
Authors:
Pedro Miguel Sánchez Sánchez,
Alberto Huertas Celdrán,
Gérôme Bovet,
Gregorio Martínez Pérez
Abstract:
The proliferation of the Internet of Things (IoT) has led to the emergence of crowdsensing applications, where a multitude of interconnected devices collaboratively collect and analyze data. Ensuring the authenticity and integrity of the data collected by these devices is crucial for reliable decision-making and maintaining trust in the system. Traditional authentication methods are often vulnerab…
▽ More
The proliferation of the Internet of Things (IoT) has led to the emergence of crowdsensing applications, where a multitude of interconnected devices collaboratively collect and analyze data. Ensuring the authenticity and integrity of the data collected by these devices is crucial for reliable decision-making and maintaining trust in the system. Traditional authentication methods are often vulnerable to attacks or can be easily duplicated, posing challenges to securing crowdsensing applications. Besides, current solutions leveraging device behavior are mostly focused on device identification, which is a simpler task than authentication. To address these issues, an individual IoT device authentication framework based on hardware behavior fingerprinting and Transformer autoencoders is proposed in this work. This solution leverages the inherent imperfections and variations in IoT device hardware to differentiate between devices with identical specifications. By monitoring and analyzing the behavior of key hardware components, such as the CPU, GPU, RAM, and Storage on devices, unique fingerprints for each device are created. The performance samples are considered as time series data and used to train outlier detection transformer models, one per device and aiming to model its normal data distribution. Then, the framework is validated within a spectrum crowdsensing system leveraging Raspberry Pi devices. After a pool of experiments, the model from each device is able to individually authenticate it between the 45 devices employed for validation. An average True Positive Rate (TPR) of 0.74+-0.13 and an average maximum False Positive Rate (FPR) of 0.06+-0.09 demonstrate the effectiveness of this approach in enhancing authentication, security, and trust in crowdsensing applications.
△ Less
Submitted 11 November, 2023; v1 submitted 14 June, 2023;
originally announced June 2023.
-
FederatedTrust: A Solution for Trustworthy Federated Learning
Authors:
Pedro Miguel Sánchez Sánchez,
Alberto Huertas Celdrán,
Ning Xie,
Gérôme Bovet,
Gregorio Martínez Pérez,
Burkhard Stiller
Abstract:
The rapid expansion of the Internet of Things (IoT) and Edge Computing has presented challenges for centralized Machine and Deep Learning (ML/DL) methods due to the presence of distributed data silos that hold sensitive information. To address concerns regarding data privacy, collaborative and privacy-preserving ML/DL techniques like Federated Learning (FL) have emerged. However, ensuring data pri…
▽ More
The rapid expansion of the Internet of Things (IoT) and Edge Computing has presented challenges for centralized Machine and Deep Learning (ML/DL) methods due to the presence of distributed data silos that hold sensitive information. To address concerns regarding data privacy, collaborative and privacy-preserving ML/DL techniques like Federated Learning (FL) have emerged. However, ensuring data privacy and performance alone is insufficient since there is a growing need to establish trust in model predictions. Existing literature has proposed various approaches on trustworthy ML/DL (excluding data privacy), identifying robustness, fairness, explainability, and accountability as important pillars. Nevertheless, further research is required to identify trustworthiness pillars and evaluation metrics specifically relevant to FL models, as well as to develop solutions that can compute the trustworthiness level of FL models. This work examines the existing requirements for evaluating trustworthiness in FL and introduces a comprehensive taxonomy consisting of six pillars (privacy, robustness, fairness, explainability, accountability, and federation), along with over 30 metrics for computing the trustworthiness of FL models. Subsequently, an algorithm named FederatedTrust is designed based on the pillars and metrics identified in the taxonomy to compute the trustworthiness score of FL models. A prototype of FederatedTrust is implemented and integrated into the learning process of FederatedScope, a well-established FL framework. Finally, five experiments are conducted using different configurations of FederatedScope to demonstrate the utility of FederatedTrust in computing the trustworthiness of FL models. Three experiments employ the FEMNIST dataset, and two utilize the N-BaIoT dataset considering a real-world IoT security use case.
△ Less
Submitted 6 July, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
Adversarial attacks and defenses on ML- and hardware-based IoT device fingerprinting and identification
Authors:
Pedro Miguel Sánchez Sánchez,
Alberto Huertas Celdrán,
Gérôme Bovet,
Gregorio Martínez Pérez
Abstract:
In the last years, the number of IoT devices deployed has suffered an undoubted explosion, reaching the scale of billions. However, some new cybersecurity issues have appeared together with this development. Some of these issues are the deployment of unauthorized devices, malicious code modification, malware deployment, or vulnerability exploitation. This fact has motivated the requirement for new…
▽ More
In the last years, the number of IoT devices deployed has suffered an undoubted explosion, reaching the scale of billions. However, some new cybersecurity issues have appeared together with this development. Some of these issues are the deployment of unauthorized devices, malicious code modification, malware deployment, or vulnerability exploitation. This fact has motivated the requirement for new device identification mechanisms based on behavior monitoring. Besides, these solutions have recently leveraged Machine and Deep Learning techniques due to the advances in this field and the increase in processing capabilities. In contrast, attackers do not stay stalled and have developed adversarial attacks focused on context modification and ML/DL evaluation evasion applied to IoT device identification solutions. This work explores the performance of hardware behavior-based individual device identification, how it is affected by possible context- and ML/DL-focused attacks, and how its resilience can be improved using defense techniques. In this sense, it proposes an LSTM-CNN architecture based on hardware performance behavior for individual device identification. Then, previous techniques have been compared with the proposed architecture using a hardware performance dataset collected from 45 Raspberry Pi devices running identical software. The LSTM-CNN improves previous solutions achieving a +0.96 average F1-Score and 0.8 minimum TPR for all devices. Afterward, context- and ML/DL-focused adversarial attacks were applied against the previous model to test its robustness. A temperature-based context attack was not able to disrupt the identification. However, some ML/DL state-of-the-art evasion attacks were successful. Finally, adversarial training and model distillation defense techniques are selected to improve the model resilience to evasion attacks, without degrading its performance.
△ Less
Submitted 30 December, 2022;
originally announced December 2022.
-
RL and Fingerprinting to Select Moving Target Defense Mechanisms for Zero-day Attacks in IoT
Authors:
Alberto Huertas Celdrán,
Pedro Miguel Sánchez Sánchez,
Jan von der Assen,
Timo Schenk,
Gérôme Bovet,
Gregorio Martínez Pérez,
Burkhard Stiller
Abstract:
Cybercriminals are moving towards zero-day attacks affecting resource-constrained devices such as single-board computers (SBC). Assuming that perfect security is unrealistic, Moving Target Defense (MTD) is a promising approach to mitigate attacks by dynamically altering target attack surfaces. Still, selecting suitable MTD techniques for zero-day attacks is an open challenge. Reinforcement Learnin…
▽ More
Cybercriminals are moving towards zero-day attacks affecting resource-constrained devices such as single-board computers (SBC). Assuming that perfect security is unrealistic, Moving Target Defense (MTD) is a promising approach to mitigate attacks by dynamically altering target attack surfaces. Still, selecting suitable MTD techniques for zero-day attacks is an open challenge. Reinforcement Learning (RL) could be an effective approach to optimize the MTD selection through trial and error, but the literature fails when i) evaluating the performance of RL and MTD solutions in real-world scenarios, ii) studying whether behavioral fingerprinting is suitable for representing SBC's states, and iii) calculating the consumption of resources in SBC. To improve these limitations, the work at hand proposes an online RL-based framework to learn the correct MTD mechanisms mitigating heterogeneous zero-day attacks in SBC. The framework considers behavioral fingerprinting to represent SBCs' states and RL to learn MTD techniques that mitigate each malicious state. It has been deployed on a real IoT crowdsensing scenario with a Raspberry Pi acting as a spectrum sensor. More in detail, the Raspberry Pi has been infected with different samples of command and control malware, rootkits, and ransomware to later select between four existing MTD techniques. A set of experiments demonstrated the suitability of the framework to learn proper MTD techniques mitigating all attacks (except a harmfulness rootkit) while consuming <1 MB of storage and utilizing <55% CPU and <80% RAM.
△ Less
Submitted 30 December, 2022;
originally announced December 2022.
-
Decentralized Federated Learning: Fundamentals, State of the Art, Frameworks, Trends, and Challenges
Authors:
Enrique Tomás Martínez Beltrán,
Mario Quiles Pérez,
Pedro Miguel Sánchez Sánchez,
Sergio López Bernal,
Gérôme Bovet,
Manuel Gil Pérez,
Gregorio Martínez Pérez,
Alberto Huertas Celdrán
Abstract:
In recent years, Federated Learning (FL) has gained relevance in training collaborative models without sharing sensitive data. Since its birth, Centralized FL (CFL) has been the most common approach in the literature, where a central entity creates a global model. However, a centralized approach leads to increased latency due to bottlenecks, heightened vulnerability to system failures, and trustwo…
▽ More
In recent years, Federated Learning (FL) has gained relevance in training collaborative models without sharing sensitive data. Since its birth, Centralized FL (CFL) has been the most common approach in the literature, where a central entity creates a global model. However, a centralized approach leads to increased latency due to bottlenecks, heightened vulnerability to system failures, and trustworthiness concerns affecting the entity responsible for the global model creation. Decentralized Federated Learning (DFL) emerged to address these concerns by promoting decentralized model aggregation and minimizing reliance on centralized architectures. However, despite the work done in DFL, the literature has not (i) studied the main aspects differentiating DFL and CFL; (ii) analyzed DFL frameworks to create and evaluate new solutions; and (iii) reviewed application scenarios using DFL. Thus, this article identifies and analyzes the main fundamentals of DFL in terms of federation architectures, topologies, communication mechanisms, security approaches, and key performance indicators. Additionally, the paper at hand explores existing mechanisms to optimize critical DFL fundamentals. Then, the most relevant features of the current DFL frameworks are reviewed and compared. After that, it analyzes the most used DFL application scenarios, identifying solutions based on the fundamentals and frameworks previously defined. Finally, the evolution of existing DFL solutions is studied to provide a list of trends, lessons learned, and open challenges.
△ Less
Submitted 13 September, 2023; v1 submitted 15 November, 2022;
originally announced November 2022.
-
Trust-as-a-Service: A reputation-enabled trust framework for 5G networks
Authors:
José María Jorquera Valero,
Pedro Miguel Sánchez Sánchez,
Manuel Gil Pérez,
Alberto Huertas Celdrán,
Gregorio Martínez Pérez
Abstract:
Trust, security, and privacy are three of the major pillars to assemble the fifth generation network and beyond. Despite such pillars are principally interconnected, they arise a multitude of challenges to be addressed separately. 5G ought to offer flexible and pervasive computing capabilities across multiple domains according to user demands and assuring trustworthy network providers. Distributed…
▽ More
Trust, security, and privacy are three of the major pillars to assemble the fifth generation network and beyond. Despite such pillars are principally interconnected, they arise a multitude of challenges to be addressed separately. 5G ought to offer flexible and pervasive computing capabilities across multiple domains according to user demands and assuring trustworthy network providers. Distributed marketplaces expect to boost the trading of heterogeneous resources so as to enable the establishment of pervasive service chains between cross-domains. Nevertheless, the need for reliable parties as ``marketplace operators'' plays a pivotal role to achieving a trustworthy ecosystem. One of the principal blockages in managing foreseeable networks is the need of adapting previous trust models to accomplish the new network and business requirements. In this regard, this article is centered on trust management of 5G multi-party networks. The design of a reputation-based trust framework is proposed as a Trust-as-a-Service (TaaS) solution for any distributed multi-stakeholder environment where zero trust and zero-touch principles should be met. Besides, a literature review is also conducted to recognize the network and business requirements currently envisaged. Finally, the validation of the proposed trust framework is performed in a real research environment, the 5GBarcelona testbed, leveraging 12% of a 2.1GHz CPU with 20 cores and 2% of the 30GiB memory. In this regard, these outcomes reveal the feasibility of the TaaS solution in the context of determining reliable network operators.
△ Less
Submitted 20 October, 2022;
originally announced October 2022.
-
Analyzing the Robustness of Decentralized Horizontal and Vertical Federated Learning Architectures in a Non-IID Scenario
Authors:
Pedro Miguel Sánchez Sánchez,
Alberto Huertas Celdrán,
Enrique Tomás Martínez Beltrán,
Daniel Demeter,
Gérôme Bovet,
Gregorio Martínez Pérez,
Burkhard Stiller
Abstract:
Federated learning (FL) allows participants to collaboratively train machine and deep learning models while protecting data privacy. However, the FL paradigm still presents drawbacks affecting its trustworthiness since malicious participants could launch adversarial attacks against the training process. Related work has studied the robustness of horizontal FL scenarios under different attacks. How…
▽ More
Federated learning (FL) allows participants to collaboratively train machine and deep learning models while protecting data privacy. However, the FL paradigm still presents drawbacks affecting its trustworthiness since malicious participants could launch adversarial attacks against the training process. Related work has studied the robustness of horizontal FL scenarios under different attacks. However, there is a lack of work evaluating the robustness of decentralized vertical FL and comparing it with horizontal FL architectures affected by adversarial attacks. Thus, this work proposes three decentralized FL architectures, one for horizontal and two for vertical scenarios, namely HoriChain, VertiChain, and VertiComb. These architectures present different neural networks and training protocols suitable for horizontal and vertical scenarios. Then, a decentralized, privacy-preserving, and federated use case with non-IID data to classify handwritten digits is deployed to evaluate the performance of the three architectures. Finally, a set of experiments computes and compares the robustness of the proposed architectures when they are affected by different data poisoning based on image watermarks and gradient poisoning adversarial attacks. The experiments show that even though particular configurations of both attacks can destroy the classification performance of the architectures, HoriChain is the most robust one.
△ Less
Submitted 20 October, 2022;
originally announced October 2022.
-
A Lightweight Moving Target Defense Framework for Multi-purpose Malware Affecting IoT Devices
Authors:
Jan von der Assen,
Alberto Huertas Celdrán,
Pedro Miguel Sánchez Sánchez,
Jordan Cedeño,
Gérôme Bovet,
Gregorio Martínez Pérez,
Burkhard Stiller
Abstract:
Malware affecting Internet of Things (IoT) devices is rapidly growing due to the relevance of this paradigm in real-world scenarios. Specialized literature has also detected a trend towards multi-purpose malware able to execute different malicious actions such as remote control, data leakage, encryption, or code hiding, among others. Protecting IoT devices against this kind of malware is challengi…
▽ More
Malware affecting Internet of Things (IoT) devices is rapidly growing due to the relevance of this paradigm in real-world scenarios. Specialized literature has also detected a trend towards multi-purpose malware able to execute different malicious actions such as remote control, data leakage, encryption, or code hiding, among others. Protecting IoT devices against this kind of malware is challenging due to their well-known vulnerabilities and limitation in terms of CPU, memory, and storage. To improve it, the moving target defense (MTD) paradigm was proposed a decade ago and has shown promising results, but there is a lack of IoT MTD solutions dealing with multi-purpose malware. Thus, this work proposes four MTD mechanisms changing IoT devices' network, data, and runtime environment to mitigate multi-purpose malware. Furthermore, it presents a lightweight and IoT-oriented MTD framework to decide what, when, and how the MTD mechanisms are deployed. Finally, the efficiency and effectiveness of the framework and MTD mechanisms are evaluated in a real-world scenario with one IoT spectrum sensor affected by multi-purpose malware.
△ Less
Submitted 14 October, 2022;
originally announced October 2022.
-
LwHBench: A low-level hardware component benchmark and dataset for Single Board Computers
Authors:
Pedro Miguel Sánchez Sánchez,
José María Jorquera Valero,
Alberto Huertas Celdrán,
Gérôme Bovet,
Manuel Gil Pérez,
Gregorio Martínez Pérez
Abstract:
In today's computing environment, where Artificial Intelligence (AI) and data processing are moving toward the Internet of Things (IoT) and Edge computing paradigms, benchmarking resource-constrained devices is a critical task to evaluate their suitability and performance. Between the employed devices, Single-Board Computers arise as multi-purpose and affordable systems. The literature has explore…
▽ More
In today's computing environment, where Artificial Intelligence (AI) and data processing are moving toward the Internet of Things (IoT) and Edge computing paradigms, benchmarking resource-constrained devices is a critical task to evaluate their suitability and performance. Between the employed devices, Single-Board Computers arise as multi-purpose and affordable systems. The literature has explored Single-Board Computers performance when running high-level benchmarks specialized in particular application scenarios, such as AI or medical applications. However, lower-level benchmarking applications and datasets are needed to enable new Edge-based AI solutions for network, system and service management based on device and component performance, such as individual device identification. Thus, this paper presents LwHBench, a low-level hardware benchmarking application for Single-Board Computers that measures the performance of CPU, GPU, Memory and Storage taking into account the component constraints in these types of devices. LwHBench has been implemented for Raspberry Pi devices and run for 100 days on a set of 45 devices to generate an extensive dataset that allows the usage of AI techniques in scenarios where performance data can help in the device management process. Besides, to demonstrate the inter-scenario capability of the dataset, a series of AI-enabled use cases about device identification and context impact on performance are presented as exploration of the published data. Finally, the benchmark application has been adapted and applied to an agriculture-focused scenario where three RockPro64 devices are present.
△ Less
Submitted 24 October, 2022; v1 submitted 18 April, 2022;
originally announced April 2022.
-
Studying the Robustness of Anti-adversarial Federated Learning Models Detecting Cyberattacks in IoT Spectrum Sensors
Authors:
Pedro Miguel Sánchez Sánchez,
Alberto Huertas Celdrán,
Timo Schenk,
Adrian Lars Benjamin Iten,
Gérôme Bovet,
Gregorio Martínez Pérez,
Burkhard Stiller
Abstract:
Device fingerprinting combined with Machine and Deep Learning (ML/DL) report promising performance when detecting cyberattacks targeting data managed by resource-constrained spectrum sensors. However, the amount of data needed to train models and the privacy concerns of such scenarios limit the applicability of centralized ML/DL-based approaches. Federated learning (FL) addresses these limitations…
▽ More
Device fingerprinting combined with Machine and Deep Learning (ML/DL) report promising performance when detecting cyberattacks targeting data managed by resource-constrained spectrum sensors. However, the amount of data needed to train models and the privacy concerns of such scenarios limit the applicability of centralized ML/DL-based approaches. Federated learning (FL) addresses these limitations by creating federated and privacy-preserving models. However, FL is vulnerable to malicious participants, and the impact of adversarial attacks on federated models detecting spectrum sensing data falsification (SSDF) attacks on spectrum sensors has not been studied. To address this challenge, the first contribution of this work is the creation of a novel dataset suitable for FL and modeling the behavior (usage of CPU, memory, or file system, among others) of resource-constrained spectrum sensors affected by different SSDF attacks. The second contribution is a pool of experiments analyzing and comparing the robustness of federated models according to i) three families of spectrum sensors, ii) eight SSDF attacks, iii) four scenarios dealing with unsupervised (anomaly detection) and supervised (binary classification) federated models, iv) up to 33% of malicious participants implementing data and model poisoning attacks, and v) four aggregation functions acting as anti-adversarial mechanisms to increase the models robustness.
△ Less
Submitted 31 January, 2022;
originally announced February 2022.
-
CyberSpec: Intelligent Behavioral Fingerprinting to Detect Attacks on Crowdsensing Spectrum Sensors
Authors:
Alberto Huertas Celdrán,
Pedro Miguel Sánchez Sánchez,
Gérôme Bovet,
Gregorio Martínez Pérez,
Burkhard Stiller
Abstract:
Integrated sensing and communication (ISAC) is a novel paradigm using crowdsensing spectrum sensors to help with the management of spectrum scarcity. However, well-known vulnerabilities of resource-constrained spectrum sensors and the possibility of being manipulated by users with physical access complicate their protection against spectrum sensing data falsification (SSDF) attacks. Most recent li…
▽ More
Integrated sensing and communication (ISAC) is a novel paradigm using crowdsensing spectrum sensors to help with the management of spectrum scarcity. However, well-known vulnerabilities of resource-constrained spectrum sensors and the possibility of being manipulated by users with physical access complicate their protection against spectrum sensing data falsification (SSDF) attacks. Most recent literature suggests using behavioral fingerprinting and Machine/Deep Learning (ML/DL) for improving similar cybersecurity issues. Nevertheless, the applicability of these techniques in resource-constrained devices, the impact of attacks affecting spectrum data integrity, and the performance and scalability of models suitable for heterogeneous sensors types are still open challenges. To improve limitations, this work presents seven SSDF attacks affecting spectrum sensors and introduces CyberSpec, an ML/DL-oriented framework using device behavioral fingerprinting to detect anomalies produced by SSDF attacks affecting resource-constrained spectrum sensors. CyberSpec has been implemented and validated in ElectroSense, a real crowdsensing RF monitoring platform where several configurations of the proposed SSDF attacks have been executed in different sensors. A pool of experiments with different unsupervised ML/DL-based models has demonstrated the suitability of CyberSpec detecting the previous attacks within an acceptable timeframe.
△ Less
Submitted 14 January, 2022;
originally announced January 2022.
-
Robust Federated Learning for execution time-based device model identification under label-flip** attack
Authors:
Pedro Miguel Sánchez Sánchez,
Alberto Huertas Celdrán,
José Rafael Buendía Rubio,
Gérôme Bovet,
Gregorio Martínez Pérez
Abstract:
The computing device deployment explosion experienced in recent years, motivated by the advances of technologies such as Internet-of-Things (IoT) and 5G, has led to a global scenario with increasing cybersecurity risks and threats. Among them, device spoofing and impersonation cyberattacks stand out due to their impact and, usually, low complexity required to be launched. To solve this issue, seve…
▽ More
The computing device deployment explosion experienced in recent years, motivated by the advances of technologies such as Internet-of-Things (IoT) and 5G, has led to a global scenario with increasing cybersecurity risks and threats. Among them, device spoofing and impersonation cyberattacks stand out due to their impact and, usually, low complexity required to be launched. To solve this issue, several solutions have emerged to identify device models and types based on the combination of behavioral fingerprinting and Machine/Deep Learning (ML/DL) techniques. However, these solutions are not appropriated for scenarios where data privacy and protection is a must, as they require data centralization for processing. In this context, newer approaches such as Federated Learning (FL) have not been fully explored yet, especially when malicious clients are present in the scenario setup. The present work analyzes and compares the device model identification performance of a centralized DL model with an FL one while using execution time-based events. For experimental purposes, a dataset containing execution-time features of 55 Raspberry Pis belonging to four different models has been collected and published. Using this dataset, the proposed solution achieved 0.9999 accuracy in both setups, centralized and federated, showing no performance decrease while preserving data privacy. Later, the impact of a label-flip** attack during the federated model training is evaluated, using several aggregation mechanisms as countermeasure. Zeno and coordinate-wise median aggregation show the best performance, although their performance greatly degrades when the percentage of fully malicious clients (all training samples poisoned) grows over 50%.
△ Less
Submitted 29 November, 2021;
originally announced November 2021.
-
A methodology to identify identical single-board computers based on hardware behavior fingerprinting
Authors:
Pedro Miguel Sánchez Sánchez,
José María Jorquera Valero,
Alberto Huertas Celdrán,
Gérôme Bovet,
Manuel Gil Pérez,
Gregorio Martínez Pérez
Abstract:
The connectivity and resource-constrained nature of single-board devices open the door to cybersecurity concerns affecting Internet of Things (IoT) scenarios. One of the most important issues is the presence of unauthorized IoT devices that want to impersonate legitimate ones by using identical hardware and software specifications. This situation can provoke sensitive information leakages, data po…
▽ More
The connectivity and resource-constrained nature of single-board devices open the door to cybersecurity concerns affecting Internet of Things (IoT) scenarios. One of the most important issues is the presence of unauthorized IoT devices that want to impersonate legitimate ones by using identical hardware and software specifications. This situation can provoke sensitive information leakages, data poisoning, or privilege escalation in IoT scenarios. Combining behavioral fingerprinting and Machine/Deep Learning (ML/DL) techniques is a promising approach to identify these malicious spoofing devices by detecting minor performance differences generated by imperfections in manufacturing. However, existing solutions are not suitable for single-board devices since they do not consider their hardware and software limitations, underestimate critical aspects such as fingerprint stability or context changes, and do not explore the potential of ML/DL techniques. To improve it, this work first identifies the essential properties for single-board device identification: uniqueness, stability, diversity, scalability, efficiency, robustness, and security. Then, a novel methodology relies on behavioral fingerprinting to identify identical single-board devices and meet the previous properties. The methodology leverages the different built-in components of the system and ML/DL techniques, comparing the device internal behavior with each other to detect manufacturing variations. The methodology validation has been performed in a real environment composed of 15 identical Raspberry Pi 4 B and 10 Raspberry Pi 3 B+ devices, obtaining a 91.9% average TPR and identifying all devices by setting a 50% threshold in the evaluation process. Finally, a discussion compares the proposed solution with related work, highlighting the fingerprint properties not met, and provides important lessons learned and limitations.
△ Less
Submitted 22 June, 2022; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Federated Learning for Malware Detection in IoT Devices
Authors:
Valerian Rey,
Pedro Miguel Sánchez Sánchez,
Alberto Huertas Celdrán,
Gérôme Bovet,
Martin Jaggi
Abstract:
This work investigates the possibilities enabled by federated learning concerning IoT malware detection and studies security issues inherent to this new learning paradigm. In this context, a framework that uses federated learning to detect malware affecting IoT devices is presented. N-BaIoT, a dataset modeling network traffic of several real IoT devices while affected by malware, has been used to…
▽ More
This work investigates the possibilities enabled by federated learning concerning IoT malware detection and studies security issues inherent to this new learning paradigm. In this context, a framework that uses federated learning to detect malware affecting IoT devices is presented. N-BaIoT, a dataset modeling network traffic of several real IoT devices while affected by malware, has been used to evaluate the proposed framework. Both supervised and unsupervised federated models (multi-layer perceptron and autoencoder) able to detect malware affecting seen and unseen IoT devices of N-BaIoT have been trained and evaluated. Furthermore, their performance has been compared to two traditional approaches. The first one lets each participant locally train a model using only its own data, while the second consists of making the participants share their data with a central entity in charge of training a global model. This comparison has shown that the use of more diverse and large data, as done in the federated and centralized methods, has a considerable positive impact on the model performance. Besides, the federated models, while preserving the participant's privacy, show similar results as the centralized ones. As an additional contribution and to measure the robustness of the federated approach, an adversarial setup with several malicious participants poisoning the federated model has been considered. The baseline model aggregation averaging step used in most federated learning algorithms appears highly vulnerable to different attacks, even with a single adversary. The performance of other model aggregation functions acting as countermeasures is thus evaluated under the same attack scenarios. These functions provide a significant improvement against malicious participants, but more efforts are still needed to make federated approaches robust.
△ Less
Submitted 19 November, 2021; v1 submitted 15 April, 2021;
originally announced April 2021.
-
A Survey on Device Behavior Fingerprinting: Data Sources, Techniques, Application Scenarios, and Datasets
Authors:
Pedro Miguel Sánchez Sánchez,
Jose María Jorquera Valero,
Alberto Huertas Celdrán,
Gérôme Bovet,
Manuel Gil Pérez,
Gregorio Martínez Pérez
Abstract:
In the current network-based computing world, where the number of interconnected devices grows exponentially, their diversity, malfunctions, and cybersecurity threats are increasing at the same rate. To guarantee the correct functioning and performance of novel environments such as Smart Cities, Industry 4.0, or crowdsensing, it is crucial to identify the capabilities of their devices (e.g., senso…
▽ More
In the current network-based computing world, where the number of interconnected devices grows exponentially, their diversity, malfunctions, and cybersecurity threats are increasing at the same rate. To guarantee the correct functioning and performance of novel environments such as Smart Cities, Industry 4.0, or crowdsensing, it is crucial to identify the capabilities of their devices (e.g., sensors, actuators) and detect potential misbehavior that may arise due to cyberattacks, system faults, or misconfigurations. With this goal in mind, a promising research field emerged focusing on creating and managing fingerprints that model the behavior of both the device actions and its components. The article at hand studies the recent growth of the device behavior fingerprinting field in terms of application scenarios, behavioral sources, and processing and evaluation techniques. First, it performs a comprehensive review of the device types, behavioral data, and processing and evaluation techniques used by the most recent and representative research works dealing with two major scenarios: device identification and device misbehavior detection. After that, each work is deeply analyzed and compared, emphasizing its characteristics, advantages, and limitations. This article also provides researchers with a review of the most relevant characteristics of existing datasets as most of the novel processing techniques are based on machine learning and deep learning. Finally, it studies the evolution of these two scenarios in recent years, providing lessons learned, current trends, and future research challenges to guide new solutions in the area.
△ Less
Submitted 3 March, 2021; v1 submitted 7 August, 2020;
originally announced August 2020.
-
AuthCODE: A Privacy-preserving and Multi-device Continuous Authentication Architecture based on Machine and Deep Learning
Authors:
Pedro Miguel Sánchez Sánchez,
Alberto Huertas Celdrán,
Lorenzo Fernández Maimó,
Gregorio Martínez Pérez
Abstract:
The authentication field is evolving towards mechanisms able to keep users continuously authenticated without the necessity of remembering or possessing authentication credentials. While existing continuous authentication systems have demonstrated their suitability for single-device scenarios, the Internet of Things and next generation of mobile networks (5G) are enabling novel multi-device scenar…
▽ More
The authentication field is evolving towards mechanisms able to keep users continuously authenticated without the necessity of remembering or possessing authentication credentials. While existing continuous authentication systems have demonstrated their suitability for single-device scenarios, the Internet of Things and next generation of mobile networks (5G) are enabling novel multi-device scenarios -- such as Smart Offices -- where continuous authentication is still an open challenge. The paper at hand, proposes an AI-based, privacy-preserving and multi-device continuous authentication architecture called AuthCODE. A realistic Smart Office scenario with several users, interacting with their mobile devices and personal computer, has been used to create a set of single- and multi-device behavioural datasets and validate AuthCODE. A pool of experiments with machine and deep learning classifiers measured the impact of time in authentication accuracy and improved the results of single-device approaches by considering multi-device behaviour profiles. The f1-score average reached for XGBoost on multi-device profiles based on 1-minute windows was 99.33%, while the best performance achieved for single devices was lower than 97.39%. The inclusion of temporal information in the form of vector sequences classified by a Long-Short Term Memory Network, allowed the identification of additional complex behaviour patterns associated to each user, resulting in an average f1-score of 99.02% on identification of long-term behaviours.
△ Less
Submitted 30 November, 2020; v1 submitted 16 April, 2020;
originally announced April 2020.