Search | arXiv e-print repository

Generative AI and Large Language Models for Cyber Security: All Insights You Need

Authors: Mohamed Amine Ferrag, Fatima Alwahedi, Ammar Battah, Bilel Cherif, Abdechakour Mechri, Norbert Tihanyi

Abstract: This paper provides a comprehensive review of the future of cybersecurity through Generative AI and Large Language Models (LLMs). We explore LLM applications across various domains, including hardware design security, intrusion detection, software engineering, design verification, cyber threat intelligence, malware detection, and phishing detection. We present an overview of LLM evolution and its… ▽ More This paper provides a comprehensive review of the future of cybersecurity through Generative AI and Large Language Models (LLMs). We explore LLM applications across various domains, including hardware design security, intrusion detection, software engineering, design verification, cyber threat intelligence, malware detection, and phishing detection. We present an overview of LLM evolution and its current state, focusing on advancements in models such as GPT-4, GPT-3.5, Mixtral-8x7B, BERT, Falcon2, and LLaMA. Our analysis extends to LLM vulnerabilities, such as prompt injection, insecure output handling, data poisoning, DDoS attacks, and adversarial instructions. We delve into mitigation strategies to protect these models, providing a comprehensive look at potential attack scenarios and prevention techniques. Furthermore, we evaluate the performance of 42 LLM models in cybersecurity knowledge and hardware security, highlighting their strengths and weaknesses. We thoroughly evaluate cybersecurity datasets for LLM training and testing, covering the lifecycle from data creation to usage and identifying gaps for future research. In addition, we review new strategies for leveraging LLMs, including techniques like Half-Quadratic Quantization (HQQ), Reinforcement Learning with Human Feedback (RLHF), Direct Preference Optimization (DPO), Quantized Low-Rank Adapters (QLoRA), and Retrieval-Augmented Generation (RAG). These insights aim to enhance real-time cybersecurity defenses and improve the sophistication of LLM applications in threat detection and response. Our paper provides a foundational understanding and strategic direction for integrating LLMs into future cybersecurity frameworks, emphasizing innovation and robust model deployment to safeguard against evolving cyber threats. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 50 pages, 8 figures

arXiv:2405.04874 [pdf, other]

Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities

Authors: Yagmur Yigit, Mohamed Amine Ferrag, Iqbal H. Sarker, Leandros A. Maglaras, Christos Chrysoulas, Naghmeh Moradpoor, Helge Janicke

Abstract: Critical National Infrastructure (CNI) encompasses a nation's essential assets that are fundamental to the operation of society and the economy, ensuring the provision of vital utilities such as energy, water, transportation, and communication. Nevertheless, growing cybersecurity threats targeting these infrastructures can potentially interfere with operations and seriously risk national security… ▽ More Critical National Infrastructure (CNI) encompasses a nation's essential assets that are fundamental to the operation of society and the economy, ensuring the provision of vital utilities such as energy, water, transportation, and communication. Nevertheless, growing cybersecurity threats targeting these infrastructures can potentially interfere with operations and seriously risk national security and public safety. In this paper, we examine the intricate issues raised by cybersecurity risks to vital infrastructure, highlighting these systems' vulnerability to different types of cyberattacks. We analyse the significance of trust, privacy, and resilience for Critical Infrastructure Protection (CIP), examining the diverse standards and regulations to manage these domains. We also scrutinise the co-analysis of safety and security, offering innovative approaches for their integration and emphasising the interdependence between these fields. Furthermore, we introduce a comprehensive method for CIP leveraging Generative AI and Large Language Models (LLMs), giving a tailored lifecycle and discussing specific applications across different critical infrastructure sectors. Lastly, we discuss potential future directions that promise to enhance the security and resilience of critical infrastructures. This paper proposes innovative strategies for CIP from evolving attacks and enhances comprehension of cybersecurity concerns related to critical infrastructure. △ Less

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2404.18353 [pdf, other]

Do Neutral Prompts Produce Insecure Code? FormAI-v2 Dataset: Labelling Vulnerabilities in Code Generated by Large Language Models

Authors: Norbert Tihanyi, Tamas Bisztray, Mohamed Amine Ferrag, Ridhi Jain, Lucas C. Cordeiro

Abstract: This study provides a comparative analysis of state-of-the-art large language models (LLMs), analyzing how likely they generate vulnerabilities when writing simple C programs using a neutral zero-shot prompt. We address a significant gap in the literature concerning the security properties of code produced by these models without specific directives. N. Tihanyi et al. introduced the FormAI dataset… ▽ More This study provides a comparative analysis of state-of-the-art large language models (LLMs), analyzing how likely they generate vulnerabilities when writing simple C programs using a neutral zero-shot prompt. We address a significant gap in the literature concerning the security properties of code produced by these models without specific directives. N. Tihanyi et al. introduced the FormAI dataset at PROMISE '23, containing 112,000 GPT-3.5-generated C programs, with over 51.24% identified as vulnerable. We expand that work by introducing the FormAI-v2 dataset comprising 265,000 compilable C programs generated using various LLMs, including robust models such as Google's GEMINI-pro, OpenAI's GPT-4, and TII's 180 billion-parameter Falcon, to Meta's specialized 13 billion-parameter CodeLLama2 and various other compact models. Each program in the dataset is labelled based on the vulnerabilities detected in its source code through formal verification using the Efficient SMT-based Context-Bounded Model Checker (ESBMC). This technique eliminates false positives by delivering a counterexample and ensures the exclusion of false negatives by completing the verification process. Our study reveals that at least 63.47% of the generated programs are vulnerable. The differences between the models are minor, as they all display similar coding errors with slight variations. Our research highlights that while LLMs offer promising capabilities for code generation, deploying their output in a production environment requires risk assessment and validation. △ Less

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2402.07688 [pdf, other]

CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge

Authors: Norbert Tihanyi, Mohamed Amine Ferrag, Ridhi Jain, Tamas Bisztray, Merouane Debbah

Abstract: Large Language Models (LLMs) are increasingly used across various domains, from software development to cyber threat intelligence. Understanding all the different fields of cybersecurity, which includes topics such as cryptography, reverse engineering, and risk assessment, poses a challenge even for human experts. To accurately test the general knowledge of LLMs in cybersecurity, the research comm… ▽ More Large Language Models (LLMs) are increasingly used across various domains, from software development to cyber threat intelligence. Understanding all the different fields of cybersecurity, which includes topics such as cryptography, reverse engineering, and risk assessment, poses a challenge even for human experts. To accurately test the general knowledge of LLMs in cybersecurity, the research community needs a diverse, accurate, and up-to-date dataset. To address this gap, we present CyberMetric-80, CyberMetric-500, CyberMetric-2000, and CyberMetric-10000, which are multiple-choice Q&A benchmark datasets comprising 80, 500, 2000, and 10,000 questions respectively. By utilizing GPT-3.5 and Retrieval-Augmented Generation (RAG), we collected documents, including NIST standards, research papers, publicly accessible books, RFCs, and other publications in the cybersecurity domain, to generate questions, each with four possible answers. The results underwent several rounds of error checking and refinement. Human experts invested over 200 hours validating the questions and solutions to ensure their accuracy and relevance, and to filter out any questions unrelated to cybersecurity. We have evaluated and compared 25 state-of-the-art LLM models on the CyberMetric datasets. In addition to our primary goal of evaluating LLMs, we involved 30 human participants to solve CyberMetric-80 in a closed-book scenario. The results can serve as a reference for comparing the general cybersecurity knowledge of humans and LLMs. The findings revealed that GPT-4o, GPT-4-turbo, Mixtral-8x7B-Instruct, Falcon-180B-Chat, and GEMINI-pro 1.0 were the best-performing LLMs. Additionally, the top LLMs were more accurate than humans on CyberMetric-80, although highly experienced human experts still outperformed small models such as Llama-3-8B, Phi-2 or Gemma-7b. △ Less

Submitted 3 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.14780 [pdf, other]

Adversarial Attacks and Defenses in 6G Network-Assisted IoT Systems

Authors: Bui Duc Son, Nguyen Tien Hoa, Trinh Van Chien, Waqas Khalid, Mohamed Amine Ferrag, Wan Choi, Merouane Debbah

Abstract: The Internet of Things (IoT) and massive IoT systems are key to sixth-generation (6G) networks due to dense connectivity, ultra-reliability, low latency, and high throughput. Artificial intelligence, including deep learning and machine learning, offers solutions for optimizing and deploying cutting-edge technologies for future radio communications. However, these techniques are vulnerable to adver… ▽ More The Internet of Things (IoT) and massive IoT systems are key to sixth-generation (6G) networks due to dense connectivity, ultra-reliability, low latency, and high throughput. Artificial intelligence, including deep learning and machine learning, offers solutions for optimizing and deploying cutting-edge technologies for future radio communications. However, these techniques are vulnerable to adversarial attacks, leading to degraded performance and erroneous predictions, outcomes unacceptable for ubiquitous networks. This survey extensively addresses adversarial attacks and defense methods in 6G network-assisted IoT systems. The theoretical background and up-to-date research on adversarial attacks and defenses are discussed. Furthermore, we provide Monte Carlo simulations to validate the effectiveness of adversarial attacks compared to jamming attacks. Additionally, we examine the vulnerability of 6G IoT systems by demonstrating attack strategies applicable to key technologies, including reconfigurable intelligent surfaces, massive multiple-input multiple-output (MIMO)/cell-free massive MIMO, satellites, the metaverse, and semantic communications. Finally, we outline the challenges and future developments associated with adversarial attacks and defenses in 6G IoT systems. △ Less

Submitted 28 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: 17 pages, 5 figures, and 4 tables. Submitted for publications

arXiv:2311.14308 [pdf, other]

Distance-Only Task Orchestration Algorithm for Energy Efficiency in Satellite-Based Mist Computing

Authors: Messaoud Babaghayou, Noureddine Chaib, Leandros Maglaras, Yagmur Yigit, Mohamed Amine Ferrag

Abstract: This paper addresses the challenge of efficiently offloading heavy computing tasks from ground mobile devices to the satellite-based mist computing environment. With ground-based edge and cloud servers often being inaccessible, the exploitation of satellite mist computing becomes imperative. Existing offloading algorithms have shown limitations in adapting to the unique characteristics of heavy co… ▽ More This paper addresses the challenge of efficiently offloading heavy computing tasks from ground mobile devices to the satellite-based mist computing environment. With ground-based edge and cloud servers often being inaccessible, the exploitation of satellite mist computing becomes imperative. Existing offloading algorithms have shown limitations in adapting to the unique characteristics of heavy computing tasks. Thus, we propose a heavy computing task offloading algorithm that prioritizes satellite proximity. This approach not only reduces energy consumption during telecommunications but also ensures tasks are executed within the specified timing constraints, which are typically non-time-critical. Our proposed algorithm outperforms other offloading schemes in terms of satellites energy consumption, average end-to-end delay, and tasks success rates. Although it exhibits a higher average VM CPU usage, this increase does not pose critical challenges. This distance-based approach offers a promising solution to enhance energy efficiency in satellite-based mist computing, making it well-suited for heavy computing tasks demands. △ Less

Submitted 24 November, 2023; originally announced November 2023.

Comments: 8 pages

arXiv:2311.12849 [pdf, other]

Reliability Analysis of Fault Tolerant Memory Systems

Authors: Yagmur Yigit, Leandros Maglaras, Mohamed Amine Ferrag, Naghmeh Moradpoor, Georgios Lambropoulos

Abstract: This paper delves into a comprehensive analysis of fault-tolerant memory systems, focusing on recovery techniques modeled using Markov chains to address transient errors. The study revolves around the application of scrubbing methods in conjunction with Single Error Correction and Double Error Detection (SEC-DED) codes. It explores three primary models: 1) Exponentially distributed scrubbing, invo… ▽ More This paper delves into a comprehensive analysis of fault-tolerant memory systems, focusing on recovery techniques modeled using Markov chains to address transient errors. The study revolves around the application of scrubbing methods in conjunction with Single Error Correction and Double Error Detection (SEC-DED) codes. It explores three primary models: 1) Exponentially distributed scrubbing, involving periodic checks of memory words within exponentially distributed time intervals; 2) Deterministic scrubbing, featuring regular, periodic word checks; and 3) Mixed scrubbing, which combines both probabilistic and deterministic scrubbing approaches. The research encompasses the estimation of reliability and Mean Time to Failure (MTTF) values for each model. Notably, the findings highlight the superior performance of mixed scrubbing over simpler scrubbing methods in terms of reliability and MTTF. △ Less

Submitted 23 November, 2023; v1 submitted 16 October, 2023; originally announced November 2023.

Comments: 6 pages

arXiv:2307.10967 [pdf]

doi 10.1109/ACCESS.2023.3332834

ESASCF: Expertise Extraction, Generalization and Reply Framework for an Optimized Automation of Network Security Compliance

Authors: Mohamed C. Ghanem, Thomas M. Chen, Mohamed A. Ferrag, Mohyi E. Kettouche

Abstract: The Cyber threats exposure has created worldwide pressure on organizations to comply with cyber security standards and policies for protecting their digital assets. Vulnerability assessment (VA) and Penetration Testing (PT) are widely adopted Security Compliance (SC) methods to identify security gaps and anticipate security breaches. In the computer networks context and despite the use of autonomo… ▽ More The Cyber threats exposure has created worldwide pressure on organizations to comply with cyber security standards and policies for protecting their digital assets. Vulnerability assessment (VA) and Penetration Testing (PT) are widely adopted Security Compliance (SC) methods to identify security gaps and anticipate security breaches. In the computer networks context and despite the use of autonomous tools and systems, security compliance remains highly repetitive and resources consuming. In this paper, we proposed a novel method to tackle the ever-growing problem of efficiency and effectiveness in network infrastructures security auditing by formally introducing, designing, and develo** an Expert-System Automated Security Compliance Framework (ESASCF) that enables industrial and open-source VA and PT tools and systems to extract, process, store and re-use the expertise in a human-expert way to allow direct application in similar scenarios or during the periodic re-testing. The implemented model was then integrated within the ESASCF and tested on different size networks and proved efficient in terms of time-efficiency and testing effectiveness allowing ESASCF to take over autonomously the SC in Re-testing and offloading Expert by automating repeated segments SC and thus enabling Experts to prioritize important tasks in Ad-Hoc compliance tests. The obtained results validate the performance enhancement notably by cutting the time required for an expert to 50% in the context of typical corporate networks first SC and 20% in re-testing, representing a significant cost-cutting. In addition, the framework allows a long-term impact illustrated in the knowledge extraction, generalization, and re-utilization, which enables better SC confidence independent of the human expert skills, coverage, and wrong decisions resulting in impactful false negatives. △ Less

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.06616 [pdf, other]

SecureFalcon: Are We There Yet in Automated Software Vulnerability Detection with LLMs?

Authors: Mohamed Amine Ferrag, Ammar Battah, Norbert Tihanyi, Ridhi Jain, Diana Maimut, Fatima Alwahedi, Thierry Lestable, Narinderjit Singh Thandi, Abdechakour Mechri, Merouane Debbah, Lucas C. Cordeiro

Abstract: Software vulnerabilities can cause numerous problems, including crashes, data loss, and security breaches. These issues greatly compromise quality and can negatively impact the market adoption of software applications and systems. Traditional bug-fixing methods, such as static analysis, often produce false positives. While bounded model checking, a form of Formal Verification (FV), can provide mor… ▽ More Software vulnerabilities can cause numerous problems, including crashes, data loss, and security breaches. These issues greatly compromise quality and can negatively impact the market adoption of software applications and systems. Traditional bug-fixing methods, such as static analysis, often produce false positives. While bounded model checking, a form of Formal Verification (FV), can provide more accurate outcomes compared to static analyzers, it demands substantial resources and significantly hinders developer productivity. Can Machine Learning (ML) achieve accuracy comparable to FV methods and be used in popular instant code completion frameworks in near real-time? In this paper, we introduce SecureFalcon, an innovative model architecture with only 121 million parameters derived from the Falcon-40B model and explicitly tailored for classifying software vulnerabilities. To achieve the best performance, we trained our model using two datasets, namely the FormAI dataset and the FalconVulnDB. The FalconVulnDB is a combination of recent public datasets, namely the SySeVR framework, Draper VDISC, Bigvul, Diversevul, SARD Juliet, and ReVeal datasets. These datasets contain the top 25 most dangerous software weaknesses, such as CWE-119, CWE-120, CWE-476, CWE-122, CWE-190, CWE-121, CWE-78, CWE-787, CWE-20, and CWE-762. SecureFalcon achieves 94% accuracy in binary classification and up to 92% in multiclassification, with instant CPU inference times. It outperforms existing models such as BERT, RoBERTa, CodeBERT, and traditional ML algorithms, promising to push the boundaries of software vulnerability detection and instant code completion frameworks. △ Less

Submitted 29 May, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.02192 [pdf, other]

doi 10.1145/3617555.3617874

The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification

Authors: Norbert Tihanyi, Tamas Bisztray, Ridhi Jain, Mohamed Amine Ferrag, Lucas C. Cordeiro, Vasileios Mavroeidis

Abstract: This paper presents the FormAI dataset, a large collection of 112, 000 AI-generated compilable and independent C programs with vulnerability classification. We introduce a dynamic zero-shot prompting technique constructed to spawn diverse programs utilizing Large Language Models (LLMs). The dataset is generated by GPT-3.5-turbo and comprises programs with varying levels of complexity. Some program… ▽ More This paper presents the FormAI dataset, a large collection of 112, 000 AI-generated compilable and independent C programs with vulnerability classification. We introduce a dynamic zero-shot prompting technique constructed to spawn diverse programs utilizing Large Language Models (LLMs). The dataset is generated by GPT-3.5-turbo and comprises programs with varying levels of complexity. Some programs handle complicated tasks like network management, table games, or encryption, while others deal with simpler tasks like string manipulation. Every program is labeled with the vulnerabilities found within the source code, indicating the type, line number, and vulnerable function name. This is accomplished by employing a formal verification method using the Efficient SMT-based Bounded Model Checker (ESBMC), which uses model checking, abstract interpretation, constraint programming, and satisfiability modulo theories to reason over safety/security properties in programs. This approach definitively detects vulnerabilities and offers a formal model known as a counterexample, thus eliminating the possibility of generating false positive reports. We have associated the identified vulnerabilities with Common Weakness Enumeration (CWE) numbers. We make the source code available for the 112, 000 programs, accompanied by a separate file containing the vulnerabilities detected in each program, making the dataset ideal for training LLMs and machine learning algorithms. Our study unveiled that according to ESBMC, 51.24% of the programs generated by GPT-3.5 contained vulnerabilities, thereby presenting considerable risks to software safety and security. △ Less

Submitted 28 March, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: https://github.com/FormAI-Dataset PLEASE USE PUBLISHED VERSION FOR CITATION: https://doi.org/10.1145/3617555.3617874

Journal ref: PROMISE 2023: Proceedings of the 19th International Conference on Predictive Models and Data Analytics in Software Engineering December 2023 Pages 33 to 43

arXiv:2306.14263 [pdf, other]

Revolutionizing Cyber Threat Detection with Large Language Models: A privacy-preserving BERT-based Lightweight Model for IoT/IIoT Devices

Authors: Mohamed Amine Ferrag, Mthandazo Ndhlovu, Norbert Tihanyi, Lucas C. Cordeiro, Merouane Debbah, Thierry Lestable, Narinderjit Singh Thandi

Abstract: The field of Natural Language Processing (NLP) is currently undergoing a revolutionary transformation driven by the power of pre-trained Large Language Models (LLMs) based on groundbreaking Transformer architectures. As the frequency and diversity of cybersecurity attacks continue to rise, the importance of incident detection has significantly increased. IoT devices are expanding rapidly, resultin… ▽ More The field of Natural Language Processing (NLP) is currently undergoing a revolutionary transformation driven by the power of pre-trained Large Language Models (LLMs) based on groundbreaking Transformer architectures. As the frequency and diversity of cybersecurity attacks continue to rise, the importance of incident detection has significantly increased. IoT devices are expanding rapidly, resulting in a growing need for efficient techniques to autonomously identify network-based attacks in IoT networks with both high precision and minimal computational requirements. This paper presents SecurityBERT, a novel architecture that leverages the Bidirectional Encoder Representations from Transformers (BERT) model for cyber threat detection in IoT networks. During the training of SecurityBERT, we incorporated a novel privacy-preserving encoding technique called Privacy-Preserving Fixed-Length Encoding (PPFLE). We effectively represented network traffic data in a structured format by combining PPFLE with the Byte-level Byte-Pair Encoder (BBPE) Tokenizer. Our research demonstrates that SecurityBERT outperforms traditional Machine Learning (ML) and Deep Learning (DL) methods, such as Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs), in cyber threat detection. Employing the Edge-IIoTset cybersecurity dataset, our experimental analysis shows that SecurityBERT achieved an impressive 98.2% overall accuracy in identifying fourteen distinct attack types, surpassing previous records set by hybrid solutions such as GAN-Transformer-based architectures and CNN-LSTM models. With an inference time of less than 0.15 seconds on an average CPU and a compact model size of just 16.7MB, SecurityBERT is ideally suited for real-life traffic analysis and a suitable choice for deployment on resource-constrained IoT devices. △ Less

Submitted 8 February, 2024; v1 submitted 25 June, 2023; originally announced June 2023.

Comments: This paper has been accepted for publication in IEEE Access: http://dx.doi.org/10.1109/ACCESS.2024.3363469

arXiv:2306.10309 [pdf, other]

Edge Learning for 6G-enabled Internet of Things: A Comprehensive Survey of Vulnerabilities, Datasets, and Defenses

Authors: Mohamed Amine Ferrag, Othmane Friha, Burak Kantarci, Norbert Tihanyi, Lucas Cordeiro, Merouane Debbah, Djallel Hamouda, Muna Al-Hawawreh, Kim-Kwang Raymond Choo

Abstract: The ongoing deployment of the fifth generation (5G) wireless networks constantly reveals limitations concerning its original concept as a key driver of Internet of Everything (IoE) applications. These 5G challenges are behind worldwide efforts to enable future networks, such as sixth generation (6G) networks, to efficiently support sophisticated applications ranging from autonomous driving capabil… ▽ More The ongoing deployment of the fifth generation (5G) wireless networks constantly reveals limitations concerning its original concept as a key driver of Internet of Everything (IoE) applications. These 5G challenges are behind worldwide efforts to enable future networks, such as sixth generation (6G) networks, to efficiently support sophisticated applications ranging from autonomous driving capabilities to the Metaverse. Edge learning is a new and powerful approach to training models across distributed clients while protecting the privacy of their data. This approach is expected to be embedded within future network infrastructures, including 6G, to solve challenging problems such as resource management and behavior prediction. This survey article provides a holistic review of the most recent research focused on edge learning vulnerabilities and defenses for 6G-enabled IoT. We summarize the existing surveys on machine learning for 6G IoT security and machine learning-associated threats in three different learning modes: centralized, federated, and distributed. Then, we provide an overview of enabling emerging technologies for 6G IoT intelligence. Moreover, we provide a holistic survey of existing research on attacks against machine learning and classify threat models into eight categories, including backdoor attacks, adversarial examples, combined attacks, poisoning attacks, Sybil attacks, byzantine attacks, inference attacks, and drop** attacks. In addition, we provide a comprehensive and detailed taxonomy and a side-by-side comparison of the state-of-the-art defense methods against edge learning vulnerabilities. Finally, as new attacks and defense technologies are realized, new research and future overall prospects for 6G-enabled IoT are discussed. △ Less

Submitted 8 February, 2024; v1 submitted 17 June, 2023; originally announced June 2023.

Comments: This paper has been accepted for publication in IEEE Communications Surveys \& Tutorials

arXiv:2305.14752 [pdf, other]

A New Era in Software Security: Towards Self-Healing Software via Large Language Models and Formal Verification

Authors: Norbert Tihanyi, Ridhi Jain, Yiannis Charalambous, Mohamed Amine Ferrag, Youcheng Sun, Lucas C. Cordeiro

Abstract: This paper introduces an innovative approach that combines Large Language Models (LLMs) with Formal Verification strategies for automatic software vulnerability repair. Initially, we employ Bounded Model Checking (BMC) to identify vulnerabilities and extract counterexamples. These counterexamples are supported by mathematical proofs and the stack trace of the vulnerabilities. Using a specially des… ▽ More This paper introduces an innovative approach that combines Large Language Models (LLMs) with Formal Verification strategies for automatic software vulnerability repair. Initially, we employ Bounded Model Checking (BMC) to identify vulnerabilities and extract counterexamples. These counterexamples are supported by mathematical proofs and the stack trace of the vulnerabilities. Using a specially designed prompt, we combine the original source code with the identified vulnerability, including its stack trace and counterexample that specifies the line number and error type. This combined information is then fed into an LLM, which is instructed to attempt to fix the code. The new code is subsequently verified again using BMC to ensure the fix succeeded. We present the ESBMC-AI framework as a proof of concept, leveraging the well-recognized and industry-adopted Efficient SMT-based Context-Bounded Model Checker (ESBMC) and a pre-trained transformer model to detect and fix errors in C programs, particularly in critical software components. We evaluated our approach on 50,000 C programs randomly selected from the FormAI dataset with their respective vulnerability classifications. Our results demonstrate ESBMC-AI's capability to automate the detection and repair of issues such as buffer overflow, arithmetic overflow, and pointer dereference failures with high accuracy. ESBMC-AI is a pioneering initiative, integrating LLMs with BMC techniques, offering potential integration into the continuous integration and deployment (CI/CD) process within the software development lifecycle. △ Less

Submitted 27 June, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

arXiv:2304.05644 [pdf, other]

Generative Adversarial Networks-Driven Cyber Threat Intelligence Detection Framework for Securing Internet of Things

Authors: Mohamed Amine Ferrag, Djallel Hamouda, Merouane Debbah, Leandros Maglaras, Abderrahmane Lakas

Abstract: While the benefits of 6G-enabled Internet of Things (IoT) are numerous, providing high-speed, low-latency communication that brings new opportunities for innovation and forms the foundation for continued growth in the IoT industry, it is also important to consider the security challenges and risks associated with the technology. In this paper, we propose a two-stage intrusion detection framework f… ▽ More While the benefits of 6G-enabled Internet of Things (IoT) are numerous, providing high-speed, low-latency communication that brings new opportunities for innovation and forms the foundation for continued growth in the IoT industry, it is also important to consider the security challenges and risks associated with the technology. In this paper, we propose a two-stage intrusion detection framework for securing IoTs, which is based on two detectors. In the first stage, we propose an adversarial training approach using generative adversarial networks (GAN) to help the first detector train on robust features by supplying it with adversarial examples as validation sets. Consequently, the classifier would perform very well against adversarial attacks. Then, we propose a deep learning (DL) model for the second detector to identify intrusions. We evaluated the proposed approach's efficiency in terms of detection accuracy and robustness against adversarial attacks. Experiment results with a new cyber security dataset demonstrate the effectiveness of the proposed methodology in detecting both intrusions and persistent adversarial examples with a weighted avg of 96%, 95%, 95%, and 95% for precision, recall, f1-score, and accuracy, respectively. △ Less

Submitted 12 April, 2023; originally announced April 2023.

Comments: The paper is accepted and will be published in the IEEE DCOSS-IoT 2023 Conference Proceedings

arXiv:2303.11751 [pdf, other]

Generative AI for Cyber Threat-Hunting in 6G-enabled IoT Networks

Authors: Mohamed Amine Ferrag, Merouane Debbah, Muna Al-Hawawreh

Abstract: The next generation of cellular technology, 6G, is being developed to enable a wide range of new applications and services for the Internet of Things (IoT). One of 6G's main advantages for IoT applications is its ability to support much higher data rates and bandwidth as well as to support ultra-low latency. However, with this increased connectivity will come to an increased risk of cyber threats,… ▽ More The next generation of cellular technology, 6G, is being developed to enable a wide range of new applications and services for the Internet of Things (IoT). One of 6G's main advantages for IoT applications is its ability to support much higher data rates and bandwidth as well as to support ultra-low latency. However, with this increased connectivity will come to an increased risk of cyber threats, as attackers will be able to exploit the large network of connected devices. Generative Artificial Intelligence (AI) can be used to detect and prevent cyber attacks by continuously learning and adapting to new threats and vulnerabilities. In this paper, we discuss the use of generative AI for cyber threat-hunting (CTH) in 6G-enabled IoT networks. Then, we propose a new generative adversarial network (GAN) and Transformer-based model for CTH in 6G-enabled IoT Networks. The experimental analysis results with a new cyber security dataset demonstrate that the Transformer-based security model for CTH can detect IoT attacks with a high overall accuracy of 95%. We examine the challenges and opportunities and conclude by highlighting the potential of generative AI in enhancing the security of 6G-enabled IoT networks and call for further research to be conducted in this area. △ Less

Submitted 21 March, 2023; originally announced March 2023.

Comments: The paper is accepted and will be published in the IEEE/ACM CCGrid 2023 Conference Proceedings

arXiv:2303.11745 [pdf, other]

Poisoning Attacks in Federated Edge Learning for Digital Twin 6G-enabled IoTs: An Anticipatory Study

Authors: Mohamed Amine Ferrag, Burak Kantarci, Lucas C. Cordeiro, Merouane Debbah, Kim-Kwang Raymond Choo

Abstract: Federated edge learning can be essential in supporting privacy-preserving, artificial intelligence (AI)-enabled activities in digital twin 6G-enabled Internet of Things (IoT) environments. However, we need to also consider the potential of attacks targeting the underlying AI systems (e.g., adversaries seek to corrupt data on the IoT devices during local updates or corrupt the model updates); hence… ▽ More Federated edge learning can be essential in supporting privacy-preserving, artificial intelligence (AI)-enabled activities in digital twin 6G-enabled Internet of Things (IoT) environments. However, we need to also consider the potential of attacks targeting the underlying AI systems (e.g., adversaries seek to corrupt data on the IoT devices during local updates or corrupt the model updates); hence, in this article, we propose an anticipatory study for poisoning attacks in federated edge learning for digital twin 6G-enabled IoT environments. Specifically, we study the influence of adversaries on the training and development of federated learning models in digital twin 6G-enabled IoT environments. We demonstrate that attackers can carry out poisoning attacks in two different learning settings, namely: centralized learning and federated learning, and successful attacks can severely reduce the model's accuracy. We comprehensively evaluate the attacks on a new cyber security dataset designed for IoT applications with three deep neural networks under the non-independent and identically distributed (Non-IID) data and the independent and identically distributed (IID) data. The poisoning attacks, on an attack classification problem, can lead to a decrease in accuracy from 94.93% to 85.98% with IID data and from 94.18% to 30.04% with Non-IID. △ Less

Submitted 21 March, 2023; originally announced March 2023.

Comments: The paper is accepted and will be published in the IEEE ICC 2023 Conference Proceedings

arXiv:2112.08431 [pdf, other]

Cybersecurity Revisited: Honeytokens meet Google Authenticator

Authors: Vasilis Papaspirou, Maria Papathanasaki, Leandros Maglaras, Ioanna Kantzavelou, Christos Douligeris, Mohamed Amine Ferrag, Helge Janicke

Abstract: Although sufficient authentication mechanisms were enhanced by the use of two or more factors that resulted in new multi factor authentication schemes, more sophisticated and targeted attacks have shown they are also vulnerable. This research work proposes a novel two factor authentication system that incorporates honeytokens into the two factor authentication process. The current implementation c… ▽ More Although sufficient authentication mechanisms were enhanced by the use of two or more factors that resulted in new multi factor authentication schemes, more sophisticated and targeted attacks have shown they are also vulnerable. This research work proposes a novel two factor authentication system that incorporates honeytokens into the two factor authentication process. The current implementation collaborates with Google authenticator. The novelty and simplicity of the presented approach aims at providing additional layers of security and protection into a system and thus making it more secure through a stronger and more efficient authentication mechanism. △ Less

Submitted 15 December, 2021; originally announced December 2021.

Comments: 6 pages, 1 figure

arXiv:2012.08782 [pdf, other]

A novel Two-Factor HoneyToken Authentication Mechanism

Authors: Vassilis Papaspirou, Leandros Maglaras, Mohamed Amine Ferrag, Ioanna Kantzavelou, Helge Janicke, Christos Douligeris

Abstract: The majority of systems rely on user authentication on passwords, but passwords have so many weaknesses and widespread use that easily raise significant security concerns, regardless of their encrypted form. Users hold the same password for different accounts, administrators never check password files for flaws that might lead to a successful cracking, and the lack of a tight security policy regar… ▽ More The majority of systems rely on user authentication on passwords, but passwords have so many weaknesses and widespread use that easily raise significant security concerns, regardless of their encrypted form. Users hold the same password for different accounts, administrators never check password files for flaws that might lead to a successful cracking, and the lack of a tight security policy regarding regular password replacement are a few problems that need to be addressed. The proposed research work aims at enhancing this security mechanism, prevent penetrations, password theft, and attempted break-ins towards securing computing systems. The selected solution approach is two-folded; it implements a two-factor authentication scheme to prevent unauthorized access, accompanied by Honeyword principles to detect corrupted or stolen tokens. Both can be integrated into any platform or web application with the use of QR codes and a mobile phone. △ Less

Submitted 20 January, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

Comments: 7 pages, 6 figures

arXiv:1901.09374 [pdf, other]

Authentication and Authorization for Mobile IoT Devices using Bio-features: Recent Advances and Future Trends

Authors: Mohamed Amine Ferrag, Leandros Maglaras, Abdelouahid Derhab

Abstract: Bio-features are fast becoming a key tool to authenticate the IoT devices; in this sense, the purpose of this investigation is to summaries the factors that hinder biometrics models' development and deployment on a large scale, including human physiological (e.g., face, eyes, fingerprints-palm, or electrocardiogram) and behavioral features (e.g., signature, voice, gait, or keystroke). The differen… ▽ More Bio-features are fast becoming a key tool to authenticate the IoT devices; in this sense, the purpose of this investigation is to summaries the factors that hinder biometrics models' development and deployment on a large scale, including human physiological (e.g., face, eyes, fingerprints-palm, or electrocardiogram) and behavioral features (e.g., signature, voice, gait, or keystroke). The different machine learning and data mining methods used by authentication and authorization schemes for mobile IoT devices are provided. Threat models and countermeasures used by biometrics-based authentication schemes for mobile IoT devices are also presented. More specifically, We analyze the state of the art of the existing biometric-based authentication schemes for IoT devices. Based on the current taxonomy, We conclude our paper with different types of challenges for future research efforts in biometrics-based authentication schemes for IoT devices. △ Less

Submitted 27 January, 2019; originally announced January 2019.

Comments: 13 pages, figure 6, tables 5

arXiv:1901.03899 [pdf, other]

Threats, Protection and Attribution of Cyber Attacks on Critical Infrastructures

Authors: Leandros Maglaras, Mohamed Amine Ferrag, Abdelouahid Derhab, Mithun Mukherjee, Helge Janicke, Stylianos Rallis

Abstract: As Critical National Infrastructures are becoming more vulnerable to cyber attacks, their protection becomes a significant issue for any organization as well as a nation. Moreover, the ability to attribute is a vital element of avoiding impunity in cyberspace. In this article, we present main threats to critical infrastructures along with protective measures that one nation can take, and which are… ▽ More As Critical National Infrastructures are becoming more vulnerable to cyber attacks, their protection becomes a significant issue for any organization as well as a nation. Moreover, the ability to attribute is a vital element of avoiding impunity in cyberspace. In this article, we present main threats to critical infrastructures along with protective measures that one nation can take, and which are classified according to legal, technical, organizational, capacity building, and cooperation aspects. Finally we provide an overview of current methods and practices regarding cyber attribution and cyber peace kee** △ Less

Submitted 12 January, 2019; originally announced January 2019.

arXiv:1812.09059 [pdf, other]

A Novel Hierarchical Intrusion Detection System based on Decision Tree and Rules-based Models

Authors: Ahmed Ahmim, Leandros Maglaras, Mohamed Amine Ferrag, Makhlouf Derdour, Helge Janicke

Abstract: This paper proposes a novel intrusion detection system (IDS) that combines different classifier approaches which are based on decision tree and rules-based concepts, namely, REP Tree, JRip algorithm and Forest PA. Specifically, the first and second method take as inputs features of the data set, and classify the network traffic as Attack/Benign. The third classifier uses features of the initial da… ▽ More This paper proposes a novel intrusion detection system (IDS) that combines different classifier approaches which are based on decision tree and rules-based concepts, namely, REP Tree, JRip algorithm and Forest PA. Specifically, the first and second method take as inputs features of the data set, and classify the network traffic as Attack/Benign. The third classifier uses features of the initial data set in addition to the outputs of the first and the second classifier as inputs. The experimental results obtained by analyzing the proposed IDS using the CICIDS2017 dataset, attest their superiority in terms of accuracy, detection rate, false alarm rate and time overhead as compared to state of the art existing schemes. △ Less

Submitted 21 December, 2018; originally announced December 2018.

Comments: 6 pages, 1 figure

arXiv:1806.09099 [pdf, other]

doi 10.1109/JIOT.2018.2882794

Blockchain Technologies for the Internet of Things: Research Issues and Challenges

Authors: Mohamed Amine Ferrag, Makhlouf Derdour, Mithun Mukherjee, Abdelouahid Derhab, Leandros Maglaras, Helge Janicke

Abstract: This paper presents a comprehensive survey of the existing blockchain protocols for the Internet of Things (IoT) networks. We start by describing the blockchains and summarizing the existing surveys that deal with blockchain technologies. Then, we provide an overview of the application domains of blockchain technologies in IoT, e.g, Internet of Vehicles, Internet of Energy, Internet of Cloud, Fog… ▽ More This paper presents a comprehensive survey of the existing blockchain protocols for the Internet of Things (IoT) networks. We start by describing the blockchains and summarizing the existing surveys that deal with blockchain technologies. Then, we provide an overview of the application domains of blockchain technologies in IoT, e.g, Internet of Vehicles, Internet of Energy, Internet of Cloud, Fog computing, etc. Moreover, we provide a classification of threat models, which are considered by blockchain protocols in IoT networks, into five main categories, namely, identity-based attacks, manipulation-based attacks, cryptanalytic attacks, reputation-based attacks, and service-based attacks. In addition, we provide a taxonomy and a side-by-side comparison of the state-of-the-art methods towards secure and privacy-preserving blockchain technologies with respect to the blockchain model, specific security goals, performance, limitations, computation complexity, and communication overhead. Based on the current survey, we highlight open research challenges and discuss possible future research directions in the blockchain technologies for IoT. △ Less

Submitted 24 June, 2018; originally announced June 2018.

Comments: 14 pages, 5 figures

arXiv:1803.10281 [pdf, other]

Authentication schemes for Smart Mobile Devices: Threat Models, Countermeasures, and Open Research Issues

Authors: Mohamed Amine Ferrag, Leandros Maglaras, Abdelouahid Derhab, Helge Janicke

Abstract: This paper presents a comprehensive investigation of authentication schemes for smart mobile devices. We start by providing an overview of existing survey articles published in the recent years that deal with security for mobile devices. Then, we describe and give a classification of threat models in smart mobile devices in five categories, including, identity-based attacks, eavesdrop**-based at… ▽ More This paper presents a comprehensive investigation of authentication schemes for smart mobile devices. We start by providing an overview of existing survey articles published in the recent years that deal with security for mobile devices. Then, we describe and give a classification of threat models in smart mobile devices in five categories, including, identity-based attacks, eavesdrop**-based attacks, combined eavesdrop** and identity-based attacks, manipulation-based attacks, and service-based attacks. We also provide a classification of countermeasures into four types of categories, including, cryptographic functions, personal identification, classification algorithms, and channel characteristics. According to these, we categorize authentication schemes for smart mobile devices in four categories, namely, 1) biometric-based authentication schemes, 2) channel-based authentication schemes, 3) factor-based authentication schemes, and 4) ID-based authentication schemes. In addition, we provide a taxonomy and comparison of authentication schemes for smart mobile devices in the form of tables. Finally, we identify open challenges and future research directions. △ Less

Submitted 8 March, 2019; v1 submitted 27 March, 2018; originally announced March 2018.

Comments: 22 pages, 9 figures

arXiv:1711.00525 [pdf, other]

Internet of Cloud: Security and Privacy issues

Authors: Allan Cook, Michael Robinson, Mohamed Amine Ferrag, Leandros A. Maglaras, Ying He, Kevin Jones, Helge Janicke

Abstract: The synergy between the cloud and the IoT has emerged largely due to the cloud having attributes which directly benefit the IoT and enable its continued growth. IoT adopting Cloud services has brought new security challenges. In this book chapter, we pursue two main goals: 1) to analyse the different components of Cloud computing and the IoT and 2) to present security and privacy problems that the… ▽ More The synergy between the cloud and the IoT has emerged largely due to the cloud having attributes which directly benefit the IoT and enable its continued growth. IoT adopting Cloud services has brought new security challenges. In this book chapter, we pursue two main goals: 1) to analyse the different components of Cloud computing and the IoT and 2) to present security and privacy problems that these systems face. We thoroughly investigate current security and privacy preservation solutions that exist in this area, with an eye on the Industrial Internet of Things, discuss open issues and propose future directions △ Less

Submitted 1 November, 2017; originally announced November 2017.

Comments: 27 pages, 4 figures

arXiv:1708.04027 [pdf, other]

Security for 4G and 5G Cellular Networks: A Survey of Existing Authentication and Privacy-preserving Schemes

Authors: Mohamed Amine Ferrag, Leandros Maglaras, Antonios Argyriou, Dimitrios Kosmanos, Helge Janicke

Abstract: This paper presents a comprehensive survey of existing authentication and privacy-preserving schemes for 4G and 5G cellular networks. We start by providing an overview of existing surveys that deal with 4G and 5G communications, applications, standardization, and security. Then, we give a classification of threat models in 4G and 5G cellular networks in four categories, including, attacks against… ▽ More This paper presents a comprehensive survey of existing authentication and privacy-preserving schemes for 4G and 5G cellular networks. We start by providing an overview of existing surveys that deal with 4G and 5G communications, applications, standardization, and security. Then, we give a classification of threat models in 4G and 5G cellular networks in four categories, including, attacks against privacy, attacks against integrity, attacks against availability, and attacks against authentication. We also provide a classification of countermeasures into three types of categories, including, cryptography methods, humans factors, and intrusion detection methods. The countermeasures and informal and formal security analysis techniques used by the authentication and privacy preserving schemes are summarized in form of tables. Based on the categorization of the authentication and privacy models, we classify these schemes in seven types, including, handover authentication with privacy, mutual authentication with privacy, RFID authentication with privacy, deniable authentication with privacy, authentication with mutual anonymity, authentication and key agreement with privacy, and three-factor authentication with privacy. In addition, we provide a taxonomy and comparison of authentication and privacy-preserving schemes for 4G and 5G cellular networks in form of tables. Based on the current survey, several recommendations for further research are discussed at the end of this paper. △ Less

Submitted 14 August, 2017; originally announced August 2017.

Comments: 24 pages, 14 figures

arXiv:1612.07206 [pdf, other]

Authentication Protocols for Internet of Things: A Comprehensive Survey

Authors: Mohamed Amine Ferrag, Leandros A. Maglaras, Helge Janicke, Jianmin Jiang

Abstract: In this paper, we present a comprehensive survey of authentication protocols for Internet of Things (IoT). Specifically, we select and in-detail examine more than forty authentication protocols developed for or applied in the context of the IoT under four environments, including: (1) Machine to machine communications (M2M), (2) Internet of Vehicles (IoV), (3) Internet of Energy (IoE), and (4) Inte… ▽ More In this paper, we present a comprehensive survey of authentication protocols for Internet of Things (IoT). Specifically, we select and in-detail examine more than forty authentication protocols developed for or applied in the context of the IoT under four environments, including: (1) Machine to machine communications (M2M), (2) Internet of Vehicles (IoV), (3) Internet of Energy (IoE), and (4) Internet of Sensors (IoS). We start by reviewing all survey articles published in the recent years that focusing on different aspects of the IoT idea. Then, we review threat models, countermeasures, and formal security verification techniques used in authentication protocols for the IoT. In addition, we provide a taxonomy and comparison of authentication protocols for the IoT in form of tables in five terms, namely, network model, goals, main processes, computation complexity, and communication overhead. Based on the current survey, we identify open issues and suggest hints for future research. △ Less

Submitted 21 December, 2016; originally announced December 2016.

Comments: 62 pages, 8 figures

arXiv:1611.07722 [pdf, other]

A Survey on Privacy-preserving Schemes for Smart Grid Communications

Authors: Mohamed Amine Ferrag, Leandros A. Maglaras, Helge Janicke, Jianmin Jiang

Abstract: In this paper, we present a comprehensive survey of privacy-preserving schemes for Smart Grid communications. Specifically, we select and in-detail examine thirty privacy preserving schemes developed for or applied in the context of Smart Grids. Based on the communication and system models, we classify these schemes that are published between 2013 and 2016, in five categories, including, 1) Smart… ▽ More In this paper, we present a comprehensive survey of privacy-preserving schemes for Smart Grid communications. Specifically, we select and in-detail examine thirty privacy preserving schemes developed for or applied in the context of Smart Grids. Based on the communication and system models, we classify these schemes that are published between 2013 and 2016, in five categories, including, 1) Smart grid with the advanced metering infrastructure, 2) Data aggregation communications, 3) Smart grid marketing architecture, 4) Smart community of home gateways, and 5) Vehicle-to grid architecture. For each scheme, we survey the attacks of leaking privacy, countermeasures, and game theoretic approaches. In addition, we review the survey articles published in the recent years that deal with Smart Grids communications, applications, standardization, and security. Based on the current survey, several recommendations for further research are discussed at the end of this paper. △ Less

Submitted 23 November, 2016; originally announced November 2016.

Comments: 30 pages, 13 figures, 15 tables

arXiv:1610.06095 [pdf, other]

Privacy-preserving schemes for Ad Hoc Social Networks: A survey

Authors: Mohamed Amine Ferrag, Leandros Maglaras, Ahmed Ahmim

Abstract: In this paper, we review the state of the art of privacy-preserving schemes for ad hoc social networks, including, mobile social networks (MSNs) and vehicular social networks (VSNs). Specifically, we select and in-detail examine thirty-three privacy preserving schemes developed for or applied in the context of ad hoc social networks. These schemes are published between 2008 and 2016. Based on this… ▽ More In this paper, we review the state of the art of privacy-preserving schemes for ad hoc social networks, including, mobile social networks (MSNs) and vehicular social networks (VSNs). Specifically, we select and in-detail examine thirty-three privacy preserving schemes developed for or applied in the context of ad hoc social networks. These schemes are published between 2008 and 2016. Based on this existing privacy preservation schemes, we survey privacy preservation models, including location privacy, identity privacy, anonymity, traceability, interest privacy, backward privacy, and content oriented privacy. The recent important attacks of leaking privacy, countermeasures, and game theoretic approaches in VSNs and MSNs are summarized in form of tables. In addition, an overview of recommendations for further research is also provided. With this survey, readers can have a more thorough understanding of research trends in privacy-preserving schemes for ad hoc social networks △ Less

Submitted 19 October, 2016; originally announced October 2016.

Comments: 27 pages, 7 figures

Showing 1–28 of 28 results for author: Ferrag, M A