Search | arXiv e-print repository

Cyber Deception Reactive: TCP Stealth Redirection to On-Demand Honeypots

Authors: Pedro Beltran Lopez, Pantaleone Nespoli, Manuel Gil Perez

Abstract: Cybersecurity is develo** rapidly, and new methods of defence against attackers are appearing, such as Cyber Deception (CYDEC). CYDEC consists of deceiving the enemy who performs actions without realising that he/she is being deceived. This article proposes designing, implementing, and evaluating a deception mechanism based on the stealthy redirection of TCP communications to an on-demand honey… ▽ More Cybersecurity is develo** rapidly, and new methods of defence against attackers are appearing, such as Cyber Deception (CYDEC). CYDEC consists of deceiving the enemy who performs actions without realising that he/she is being deceived. This article proposes designing, implementing, and evaluating a deception mechanism based on the stealthy redirection of TCP communications to an on-demand honey server with the same characteristics as the victim asset, i.e., it is a clone. Such a mechanism ensures that the defender fools the attacker, thanks to stealth redirection. In this situation, the attacker will focus on attacking the honey server while enabling the recollection of relevant information to generate threat intelligence. The experiments in different scenarios show how the proposed solution can effectively redirect an attacker to a copied asset on demand, thus protecting the real asset. Finally, the results obtained by evaluating the latency times ensure that the redirection is undetectable by humans and very difficult to detect by a machine. △ Less

Submitted 20 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2307.11730 [pdf, other]

Mitigating Communications Threats in Decentralized Federated Learning through Moving Target Defense

Authors: Enrique Tomás Martínez Beltrán, Pedro Miguel Sánchez Sánchez, Sergio López Bernal, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez, Alberto Huertas Celdrán

Abstract: The rise of Decentralized Federated Learning (DFL) has enabled the training of machine learning models across federated participants, fostering decentralized model aggregation and reducing dependence on a server. However, this approach introduces unique communication security challenges that have yet to be thoroughly addressed in the literature. These challenges primarily originate from the decent… ▽ More The rise of Decentralized Federated Learning (DFL) has enabled the training of machine learning models across federated participants, fostering decentralized model aggregation and reducing dependence on a server. However, this approach introduces unique communication security challenges that have yet to be thoroughly addressed in the literature. These challenges primarily originate from the decentralized nature of the aggregation process, the varied roles and responsibilities of the participants, and the absence of a central authority to oversee and mitigate threats. Addressing these challenges, this paper first delineates a comprehensive threat model focused on DFL communications. In response to these identified risks, this work introduces a security module to counter communication-based attacks for DFL platforms. The module combines security techniques such as symmetric and asymmetric encryption with Moving Target Defense (MTD) techniques, including random neighbor selection and IP/port switching. The security module is implemented in a DFL platform, Fedstellar, allowing the deployment and monitoring of the federation. A DFL scenario with physical and virtual deployments have been executed, encompassing three security configurations: (i) a baseline without security, (ii) an encrypted configuration, and (iii) a configuration integrating both encryption and MTD techniques. The effectiveness of the security module is validated through experiments with the MNIST dataset and eclipse attacks. The results showed an average F1 score of 95%, with the most secure configuration resulting in CPU usage peaking at 68% (+-9%) in virtual deployments and network traffic reaching 480.8 MB (+-18 MB), effectively mitigating risks associated with eavesdrop** or eclipse attacks. △ Less

Submitted 9 December, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

arXiv:2306.09750 [pdf, other]

doi 10.1016/j.eswa.2023.122861

Fedstellar: A Platform for Decentralized Federated Learning

Authors: Enrique Tomás Martínez Beltrán, Ángel Luis Perales Gómez, Chao Feng, Pedro Miguel Sánchez Sánchez, Sergio López Bernal, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez, Alberto Huertas Celdrán

Abstract: In 2016, Google proposed Federated Learning (FL) as a novel paradigm to train Machine Learning (ML) models across the participants of a federation while preserving data privacy. Since its birth, Centralized FL (CFL) has been the most used approach, where a central entity aggregates participants' models to create a global one. However, CFL presents limitations such as communication bottlenecks, sin… ▽ More In 2016, Google proposed Federated Learning (FL) as a novel paradigm to train Machine Learning (ML) models across the participants of a federation while preserving data privacy. Since its birth, Centralized FL (CFL) has been the most used approach, where a central entity aggregates participants' models to create a global one. However, CFL presents limitations such as communication bottlenecks, single point of failure, and reliance on a central server. Decentralized Federated Learning (DFL) addresses these issues by enabling decentralized model aggregation and minimizing dependency on a central entity. Despite these advances, current platforms training DFL models struggle with key issues such as managing heterogeneous federation network topologies. To overcome these challenges, this paper presents Fedstellar, a platform extended from p2pfl library and designed to train FL models in a decentralized, semi-decentralized, and centralized fashion across diverse federations of physical or virtualized devices. The Fedstellar implementation encompasses a web application with an interactive graphical interface, a controller for deploying federations of nodes using physical or virtual devices, and a core deployed on each device which provides the logic needed to train, aggregate, and communicate in the network. The effectiveness of the platform has been demonstrated in two scenarios: a physical deployment involving single-board devices such as Raspberry Pis for detecting cyberattacks, and a virtualized deployment comparing various FL approaches in a controlled environment using MNIST and CIFAR-10 datasets. In both scenarios, Fedstellar demonstrated consistent performance and adaptability, achieving F1 scores of 91%, 98%, and 91.2% using DFL for detecting cyberattacks and classifying MNIST and CIFAR-10, respectively, reducing training time by 32% compared to centralized approaches. △ Less

Submitted 8 April, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

arXiv:2211.08413 [pdf, other]

doi 10.1109/COMST.2023.3315746

Decentralized Federated Learning: Fundamentals, State of the Art, Frameworks, Trends, and Challenges

Authors: Enrique Tomás Martínez Beltrán, Mario Quiles Pérez, Pedro Miguel Sánchez Sánchez, Sergio López Bernal, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez, Alberto Huertas Celdrán

Abstract: In recent years, Federated Learning (FL) has gained relevance in training collaborative models without sharing sensitive data. Since its birth, Centralized FL (CFL) has been the most common approach in the literature, where a central entity creates a global model. However, a centralized approach leads to increased latency due to bottlenecks, heightened vulnerability to system failures, and trustwo… ▽ More In recent years, Federated Learning (FL) has gained relevance in training collaborative models without sharing sensitive data. Since its birth, Centralized FL (CFL) has been the most common approach in the literature, where a central entity creates a global model. However, a centralized approach leads to increased latency due to bottlenecks, heightened vulnerability to system failures, and trustworthiness concerns affecting the entity responsible for the global model creation. Decentralized Federated Learning (DFL) emerged to address these concerns by promoting decentralized model aggregation and minimizing reliance on centralized architectures. However, despite the work done in DFL, the literature has not (i) studied the main aspects differentiating DFL and CFL; (ii) analyzed DFL frameworks to create and evaluate new solutions; and (iii) reviewed application scenarios using DFL. Thus, this article identifies and analyzes the main fundamentals of DFL in terms of federation architectures, topologies, communication mechanisms, security approaches, and key performance indicators. Additionally, the paper at hand explores existing mechanisms to optimize critical DFL fundamentals. Then, the most relevant features of the current DFL frameworks are reviewed and compared. After that, it analyzes the most used DFL application scenarios, identifying solutions based on the fundamentals and frameworks previously defined. Finally, the evolution of existing DFL solutions is studied to provide a list of trends, lessons learned, and open challenges. △ Less

Submitted 13 September, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

arXiv:2210.11517 [pdf, other]

A Security and Trust Framework for Decentralized 5G Marketplaces

Authors: José María Jorquera Valero, Manuel Gil Pérez, Gregorio Martínez Pérez

Abstract: 5G networks intend to cover user demands through multi-party collaborations in a secure and trustworthy manner. To this end, marketplaces play a pivotal role as enablers for network service consumers and infrastructure providers to offer, negotiate, and purchase 5G resources and services. Nevertheless, marketplaces often do not ensure trustworthy networking by analyzing the security and trust of t… ▽ More 5G networks intend to cover user demands through multi-party collaborations in a secure and trustworthy manner. To this end, marketplaces play a pivotal role as enablers for network service consumers and infrastructure providers to offer, negotiate, and purchase 5G resources and services. Nevertheless, marketplaces often do not ensure trustworthy networking by analyzing the security and trust of their members and offers. This paper presents a security and trust framework to enable the selection of reliable third-party providers based on their history and reputation. In addition, it also introduces a reward and punishment mechanism to continuously update trust scores according to security events. Finally, we showcase a real use case in which the security and trust framework is being applied. △ Less

Submitted 20 October, 2022; originally announced October 2022.

Journal ref: Proceedings of the VII Jornadas Nacionales de Investigación en Ciberseguridad, pp. 237-240, Bilbao, Spain (2022)

arXiv:2210.11501 [pdf, other]

Trust-as-a-Service: A reputation-enabled trust framework for 5G networks

Authors: José María Jorquera Valero, Pedro Miguel Sánchez Sánchez, Manuel Gil Pérez, Alberto Huertas Celdrán, Gregorio Martínez Pérez

Abstract: Trust, security, and privacy are three of the major pillars to assemble the fifth generation network and beyond. Despite such pillars are principally interconnected, they arise a multitude of challenges to be addressed separately. 5G ought to offer flexible and pervasive computing capabilities across multiple domains according to user demands and assuring trustworthy network providers. Distributed… ▽ More Trust, security, and privacy are three of the major pillars to assemble the fifth generation network and beyond. Despite such pillars are principally interconnected, they arise a multitude of challenges to be addressed separately. 5G ought to offer flexible and pervasive computing capabilities across multiple domains according to user demands and assuring trustworthy network providers. Distributed marketplaces expect to boost the trading of heterogeneous resources so as to enable the establishment of pervasive service chains between cross-domains. Nevertheless, the need for reliable parties as ``marketplace operators'' plays a pivotal role to achieving a trustworthy ecosystem. One of the principal blockages in managing foreseeable networks is the need of adapting previous trust models to accomplish the new network and business requirements. In this regard, this article is centered on trust management of 5G multi-party networks. The design of a reputation-based trust framework is proposed as a Trust-as-a-Service (TaaS) solution for any distributed multi-stakeholder environment where zero trust and zero-touch principles should be met. Besides, a literature review is also conducted to recognize the network and business requirements currently envisaged. Finally, the validation of the proposed trust framework is performed in a real research environment, the 5GBarcelona testbed, leveraging 12% of a 2.1GHz CPU with 20 cores and 2% of the 30GiB memory. In this regard, these outcomes reveal the feasibility of the TaaS solution in the context of determining reliable network operators. △ Less

Submitted 20 October, 2022; originally announced October 2022.

arXiv:2204.08516 [pdf, other]

doi 10.1016/j.iot.2023.100764

LwHBench: A low-level hardware component benchmark and dataset for Single Board Computers

Authors: Pedro Miguel Sánchez Sánchez, José María Jorquera Valero, Alberto Huertas Celdrán, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez

Abstract: In today's computing environment, where Artificial Intelligence (AI) and data processing are moving toward the Internet of Things (IoT) and Edge computing paradigms, benchmarking resource-constrained devices is a critical task to evaluate their suitability and performance. Between the employed devices, Single-Board Computers arise as multi-purpose and affordable systems. The literature has explore… ▽ More In today's computing environment, where Artificial Intelligence (AI) and data processing are moving toward the Internet of Things (IoT) and Edge computing paradigms, benchmarking resource-constrained devices is a critical task to evaluate their suitability and performance. Between the employed devices, Single-Board Computers arise as multi-purpose and affordable systems. The literature has explored Single-Board Computers performance when running high-level benchmarks specialized in particular application scenarios, such as AI or medical applications. However, lower-level benchmarking applications and datasets are needed to enable new Edge-based AI solutions for network, system and service management based on device and component performance, such as individual device identification. Thus, this paper presents LwHBench, a low-level hardware benchmarking application for Single-Board Computers that measures the performance of CPU, GPU, Memory and Storage taking into account the component constraints in these types of devices. LwHBench has been implemented for Raspberry Pi devices and run for 100 days on a set of 45 devices to generate an extensive dataset that allows the usage of AI techniques in scenarios where performance data can help in the device management process. Besides, to demonstrate the inter-scenario capability of the dataset, a series of AI-enabled use cases about device identification and context impact on performance are presented as exploration of the published data. Finally, the benchmark application has been adapted and applied to an agriculture-focused scenario where three RockPro64 devices are present. △ Less

Submitted 24 October, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

arXiv:2106.08209 [pdf, other]

doi 10.1016/j.jnca.2022.103579

A methodology to identify identical single-board computers based on hardware behavior fingerprinting

Authors: Pedro Miguel Sánchez Sánchez, José María Jorquera Valero, Alberto Huertas Celdrán, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez

Abstract: The connectivity and resource-constrained nature of single-board devices open the door to cybersecurity concerns affecting Internet of Things (IoT) scenarios. One of the most important issues is the presence of unauthorized IoT devices that want to impersonate legitimate ones by using identical hardware and software specifications. This situation can provoke sensitive information leakages, data po… ▽ More The connectivity and resource-constrained nature of single-board devices open the door to cybersecurity concerns affecting Internet of Things (IoT) scenarios. One of the most important issues is the presence of unauthorized IoT devices that want to impersonate legitimate ones by using identical hardware and software specifications. This situation can provoke sensitive information leakages, data poisoning, or privilege escalation in IoT scenarios. Combining behavioral fingerprinting and Machine/Deep Learning (ML/DL) techniques is a promising approach to identify these malicious spoofing devices by detecting minor performance differences generated by imperfections in manufacturing. However, existing solutions are not suitable for single-board devices since they do not consider their hardware and software limitations, underestimate critical aspects such as fingerprint stability or context changes, and do not explore the potential of ML/DL techniques. To improve it, this work first identifies the essential properties for single-board device identification: uniqueness, stability, diversity, scalability, efficiency, robustness, and security. Then, a novel methodology relies on behavioral fingerprinting to identify identical single-board devices and meet the previous properties. The methodology leverages the different built-in components of the system and ML/DL techniques, comparing the device internal behavior with each other to detect manufacturing variations. The methodology validation has been performed in a real environment composed of 15 identical Raspberry Pi 4 B and 10 Raspberry Pi 3 B+ devices, obtaining a 91.9% average TPR and identifying all devices by setting a 50% threshold in the evaluation process. Finally, a discussion compares the proposed solution with related work, highlighting the fingerprint properties not met, and provides important lessons learned and limitations. △ Less

Submitted 22 June, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

arXiv:2008.03343 [pdf, other]

doi 10.1109/COMST.2021.3064259

A Survey on Device Behavior Fingerprinting: Data Sources, Techniques, Application Scenarios, and Datasets

Authors: Pedro Miguel Sánchez Sánchez, Jose María Jorquera Valero, Alberto Huertas Celdrán, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez

Abstract: In the current network-based computing world, where the number of interconnected devices grows exponentially, their diversity, malfunctions, and cybersecurity threats are increasing at the same rate. To guarantee the correct functioning and performance of novel environments such as Smart Cities, Industry 4.0, or crowdsensing, it is crucial to identify the capabilities of their devices (e.g., senso… ▽ More In the current network-based computing world, where the number of interconnected devices grows exponentially, their diversity, malfunctions, and cybersecurity threats are increasing at the same rate. To guarantee the correct functioning and performance of novel environments such as Smart Cities, Industry 4.0, or crowdsensing, it is crucial to identify the capabilities of their devices (e.g., sensors, actuators) and detect potential misbehavior that may arise due to cyberattacks, system faults, or misconfigurations. With this goal in mind, a promising research field emerged focusing on creating and managing fingerprints that model the behavior of both the device actions and its components. The article at hand studies the recent growth of the device behavior fingerprinting field in terms of application scenarios, behavioral sources, and processing and evaluation techniques. First, it performs a comprehensive review of the device types, behavioral data, and processing and evaluation techniques used by the most recent and representative research works dealing with two major scenarios: device identification and device misbehavior detection. After that, each work is deeply analyzed and compared, emphasizing its characteristics, advantages, and limitations. This article also provides researchers with a review of the most relevant characteristics of existing datasets as most of the novel processing techniques are based on machine learning and deep learning. Finally, it studies the evolution of these two scenarios in recent years, providing lessons learned, current trends, and future research challenges to guide new solutions in the area. △ Less

Submitted 3 March, 2021; v1 submitted 7 August, 2020; originally announced August 2020.

arXiv:2004.00931 [pdf, other]

doi 10.1109/TNSM.2020.3031573

Spotting political social bots in Twitter: A use case of the 2019 Spanish general election

Authors: Javier Pastor-Galindo, Mattia Zago, Pantaleone Nespoli, Sergio López Bernal, Alberto Huertas Celdrán, Manuel Gil Pérez, José A. Ruipérez-Valiente, Gregorio Martínez Pérez, Félix Gómez Mármol

Abstract: While social media has been proved as an exceptionally useful tool to interact with other people and massively and quickly spread helpful information, its great potential has been ill-intentionally leveraged as well to distort political elections and manipulate constituents. In the paper at hand, we analyzed the presence and behavior of social bots on Twitter in the context of the November 2019 Sp… ▽ More While social media has been proved as an exceptionally useful tool to interact with other people and massively and quickly spread helpful information, its great potential has been ill-intentionally leveraged as well to distort political elections and manipulate constituents. In the paper at hand, we analyzed the presence and behavior of social bots on Twitter in the context of the November 2019 Spanish general election. Throughout our study, we classified involved users as social bots or humans, and examined their interactions from a quantitative (i.e., amount of traffic generated and existing relations) and qualitative (i.e., user's political affinity and sentiment towards the most important parties) perspectives. Results demonstrated that a non-negligible amount of those bots actively participated in the election, supporting each of the five principal political parties. △ Less

Submitted 12 October, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

arXiv:2003.13833 [pdf]

The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe

Authors: Georg Rehm, Katrin Marheinecke, Stefanie Hegele, Stelios Piperidis, Kalina Bontcheva, Jan Hajič, Khalid Choukri, Andrejs Vasiļjevs, Gerhard Backfried, Christoph Prinz, José Manuel Gómez Pérez, Luc Meertens, Paul Lukowicz, Josef van Genabith, Andrea Lösch, Philipp Slusallek, Morten Irgens, Patrick Gatellier, Joachim Köhler, Laure Le Bars, Dimitra Anastasiou, Albina Auksoriūtė, Núria Bel, António Branco, Gerhard Budin , et al. (22 additional authors not shown)

Abstract: Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitu… ▽ More Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe's specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, including many opportunities, synergies but also misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions. △ Less

Submitted 30 March, 2020; originally announced March 2020.

Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

arXiv:2003.13551 [pdf]

European Language Grid: An Overview

Authors: Georg Rehm, Maria Berger, Ela Elsholz, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Stelios Piperidis, Miltos Deligiannis, Dimitris Galanis, Katerina Gkirtzou, Penny Labropoulou, Kalina Bontcheva, David Jones, Ian Roberts, Jan Hajic, Jana Hamrlová, Lukáš Kačena, Khalid Choukri, Victoria Arranz, Andrejs Vasiļjevs, Orians Anvari, Andis Lagzdiņš, Jūlija Meļņika, Gerhard Backfried, Erinç Dikici , et al. (11 additional authors not shown)

Abstract: With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented, by nation states, lang… ▽ More With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented, by nation states, languages, verticals and sectors, significantly holding back its impact. The European Language Grid (ELG) project addresses this fragmentation by establishing the ELG as the primary platform for LT in Europe. The ELG is a scalable cloud platform, providing, in an easy-to-integrate way, access to hundreds of commercial and non-commercial LTs for all European languages, including running tools and services as well as data sets and resources. Once fully operational, it will enable the commercial and non-commercial European LT community to deposit and upload their technologies and data sets into the ELG, to deploy them through the grid, and to connect with other resources. The ELG will boost the Multilingual Digital Single Market towards a thriving European LT community, creating new jobs and opportunities. Furthermore, the ELG project organises two open calls for up to 20 pilot projects. It also sets up 32 National Competence Centres (NCCs) and the European LT Council (LTC) for outreach and coordination purposes. △ Less

Submitted 30 March, 2020; originally announced March 2020.

Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

arXiv:2003.13236 [pdf, other]

Making Metadata Fit for Next Generation Language Technology Platforms: The Metadata Schema of the European Language Grid

Authors: Penny Labropoulou, Katerina Gkirtzou, Maria Gavriilidou, Miltos Deligiannis, Dimitrios Galanis, Stelios Piperidis, Georg Rehm, Maria Berger, Valérie Mapelli, Mickaël Rigault, Victoria Arranz, Khalid Choukri, Gerhard Backfried, José Manuel Gómez Pérez, Andres Garcia Silva

Abstract: The current scientific and technological landscape is characterised by the increasing availability of data resources and processing tools and services. In this setting, metadata have emerged as a key factor facilitating management, sharing and usage of such digital assets. In this paper we present ELG-SHARE, a rich metadata schema catering for the description of Language Resources and Technologies… ▽ More The current scientific and technological landscape is characterised by the increasing availability of data resources and processing tools and services. In this setting, metadata have emerged as a key factor facilitating management, sharing and usage of such digital assets. In this paper we present ELG-SHARE, a rich metadata schema catering for the description of Language Resources and Technologies (processing and generation services and tools, models, corpora, term lists, etc.), as well as related entities (e.g., organizations, projects, supporting documents, etc.). The schema powers the European Language Grid platform that aims to be the primary hub and marketplace for industry-relevant Language Technology in Europe. ELG-SHARE has been based on various metadata schemas, vocabularies, and ontologies, as well as related recommendations and guidelines. △ Less

Submitted 30 March, 2020; originally announced March 2020.

Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

Showing 1–13 of 13 results for author: Pérez, M G