Search | arXiv e-print repository

Few-Shot API Attack Detection: Overcoming Data Scarcity with GAN-Inspired Learning

Authors: Udi Aharon, Revital Marbel, Ran Dubin, Amit Dvir, Chen Hajaj

Abstract: Web applications and APIs face constant threats from malicious actors seeking to exploit vulnerabilities for illicit gains. These threats necessitate robust anomaly detection systems capable of identifying malicious API traffic efficiently despite limited and diverse datasets. This paper proposes a novel few-shot detection approach motivated by Natural Language Processing (NLP) and advanced Genera… ▽ More Web applications and APIs face constant threats from malicious actors seeking to exploit vulnerabilities for illicit gains. These threats necessitate robust anomaly detection systems capable of identifying malicious API traffic efficiently despite limited and diverse datasets. This paper proposes a novel few-shot detection approach motivated by Natural Language Processing (NLP) and advanced Generative Adversarial Network (GAN)-inspired techniques. Leveraging state-of-the-art Transformer architectures, particularly RoBERTa, our method enhances the contextual understanding of API requests, leading to improved anomaly detection compared to traditional methods. We showcase the technique's versatility by demonstrating its effectiveness with both Out-of-Distribution (OOD) and Transformer-based binary classification methods on two distinct datasets: CSIC 2010 and ATRDF 2023. Our evaluations reveal consistently enhanced or, at worst, equivalent detection rates across various metrics in most vectors, highlighting the promise of our approach for improving API security. △ Less

Submitted 18 May, 2024; originally announced May 2024.

Comments: 8 pages, 2 figures, 7 tables

arXiv:2405.11247 [pdf, other]

Few-Shot API Attack Anomaly Detection in a Classification-by-Retrieval Framework

Authors: Udi Aharon, Ran Dubin, Amit Dvir, Chen Hajaj

Abstract: Application Programming Interface (API) attacks refer to the unauthorized or malicious use of APIs, which are often exploited to gain access to sensitive data or manipulate online systems for illicit purposes. Identifying actors that deceitfully utilize an API poses a demanding problem. Although there have been notable advancements and contributions in the field of API security, there still remain… ▽ More Application Programming Interface (API) attacks refer to the unauthorized or malicious use of APIs, which are often exploited to gain access to sensitive data or manipulate online systems for illicit purposes. Identifying actors that deceitfully utilize an API poses a demanding problem. Although there have been notable advancements and contributions in the field of API security, there still remains a significant challenge when dealing with attackers who use novel approaches that don't match the well-known payloads commonly seen in attacks. Also, attackers may exploit standard functionalities in unconventional manners and with objectives surpassing their intended boundaries. This means API security needs to be more sophisticated and dynamic than ever, with advanced computational intelligence methods, such as machine learning models that can quickly identify and respond to anomalous behavior. In response to these challenges, we propose a novel few-shot anomaly detection framework, named FT-ANN. This framework is composed of two parts: First, we train a dedicated generic language model for API based on FastText embedding. Next, we use Approximate Nearest Neighbor search in a classification-by-retrieval approach. Our framework enables the development of a lightweight model that can be trained with minimal examples per class or even a model capable of classifying multiple classes. The results show that our framework effectively improves API attack detection accuracy compared to various baselines. △ Less

Submitted 18 May, 2024; originally announced May 2024.

Comments: 13 pages, 8 figures, 3 tables

arXiv:2403.11206 [pdf, other]

CBR -- Boosting Adaptive Classification By Retrieval of Encrypted Network Traffic with Out-of-distribution

Authors: Amir Lukach, Ran Dubin, Amit Dvir, Chen Hajaj

Abstract: Encrypted network traffic Classification tackles the problem from different approaches and with different goals. One of the common approaches is using Machine learning or Deep Learning-based solutions on a fixed number of classes, leading to misclassification when an unknown class is given as input. One of the solutions for handling unknown classes is to retrain the model, however, retraining mode… ▽ More Encrypted network traffic Classification tackles the problem from different approaches and with different goals. One of the common approaches is using Machine learning or Deep Learning-based solutions on a fixed number of classes, leading to misclassification when an unknown class is given as input. One of the solutions for handling unknown classes is to retrain the model, however, retraining models every time they become obsolete is both resource and time-consuming. Therefore, there is a growing need to allow classification models to detect and adapt to new classes dynamically, without retraining, but instead able to detect new classes using few shots learning [1]. In this paper, we introduce Adaptive Classification By Retrieval CBR, a novel approach for encrypted network traffic classification. Our new approach is based on an ANN-based method, which allows us to effectively identify new and existing classes without retraining the model. The novel approach is simple, yet effective and achieved similar results to RF with up to 5% difference (usually less than that) in the classification tasks while having a slight decrease in the case of new samples (from new classes) without retraining. To summarize, the new method is a real-time classification, which can classify new classes without retraining. Furthermore, our solution can be used as a complementary solution alongside RF or any other machine/deep learning classification method, as an aggregated solution. △ Less

Submitted 17 March, 2024; originally announced March 2024.

MSC Class: ACM-class: F.2.2; I.2.7 ACM-class: F.2.2; I.2.7 ACM-class: F.2.2; I.2.7 ACM-class: F.2.2; I.2.7 ACM-class: I.2.6

arXiv:2310.01969 [pdf, other]

doi 10.1109/TIFS.2024.3383770

Steganalysis of AI Models LSB Attacks

Authors: Daniel Gilkarov, Ran Dubin

Abstract: Artificial intelligence has made significant progress in the last decade, leading to a rise in the popularity of model sharing. The model zoo ecosystem, a repository of pre-trained AI models, has advanced the AI open-source community and opened new avenues for cyber risks. Malicious attackers can exploit shared models to launch cyber-attacks. This work focuses on the steganalysis of injected malic… ▽ More Artificial intelligence has made significant progress in the last decade, leading to a rise in the popularity of model sharing. The model zoo ecosystem, a repository of pre-trained AI models, has advanced the AI open-source community and opened new avenues for cyber risks. Malicious attackers can exploit shared models to launch cyber-attacks. This work focuses on the steganalysis of injected malicious Least Significant Bit (LSB) steganography into AI models, and it is the first work focusing on AI model attacks. In response to this threat, this paper presents a steganalysis method specifically tailored to detect and mitigate malicious LSB steganography attacks based on supervised and unsupervised AI detection steganalysis methods. Our proposed technique aims to preserve the integrity of shared models, protect user trust, and maintain the momentum of open collaboration within the AI community. In this work, we propose 3 steganalysis methods and open source our code. We found that the success of the steganalysis depends on the LSB attack location. If the attacker decides to exploit the least significant bits in the LSB, the ability to detect the attacks is low. However, if the attack is in the most significant LSB bits, the attack can be detected with almost perfect accuracy. △ Less

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2309.03071 [pdf, other]

Disarming Steganography Attacks Inside Neural Network Models

Authors: Ran Dubin

Abstract: Similar to the revolution of open source code sharing, Artificial Intelligence (AI) model sharing is gaining increased popularity. However, the fast adaptation in the industry, lack of awareness, and ability to exploit the models make them significant attack vectors. By embedding malware in neurons, the malware can be delivered covertly, with minor or no impact on the neural network's performance.… ▽ More Similar to the revolution of open source code sharing, Artificial Intelligence (AI) model sharing is gaining increased popularity. However, the fast adaptation in the industry, lack of awareness, and ability to exploit the models make them significant attack vectors. By embedding malware in neurons, the malware can be delivered covertly, with minor or no impact on the neural network's performance. The covert attack will use the Least Significant Bits (LSB) weight attack since LSB has a minimal effect on the model accuracy, and as a result, the user will not notice it. Since there are endless ways to hide the attacks, we focus on a zero-trust prevention strategy based on AI model attack disarm and reconstruction. We proposed three types of model steganography weight disarm defense mechanisms. The first two are based on random bit substitution noise, and the other on model weight quantization. We demonstrate a 100\% prevention rate while the methods introduce a minimal decrease in model accuracy based on Qint8 and K-LRBP methods, which is an essential factor for improving AI security. △ Less

Submitted 26 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

arXiv:2307.14057 [pdf, other]

Open Image Content Disarm And Reconstruction

Authors: Eli Belkind, Ran Dubin, Amit Dvir

Abstract: With the advance in malware technology, attackers create new ways to hide their malicious code from antivirus services. One way to obfuscate an attack is to use common files as cover to hide the malicious scripts, so the malware will look like a legitimate file. Although cutting-edge Artificial Intelligence and content signature exist, evasive malware successfully bypasses next-generation malware… ▽ More With the advance in malware technology, attackers create new ways to hide their malicious code from antivirus services. One way to obfuscate an attack is to use common files as cover to hide the malicious scripts, so the malware will look like a legitimate file. Although cutting-edge Artificial Intelligence and content signature exist, evasive malware successfully bypasses next-generation malware detection using advanced methods like steganography. Some of the files commonly used to hide malware are image files (e.g., JPEG). In addition, some malware use steganography to hide malicious scripts or sensitive data in images. Steganography in images is difficult to detect even with specialized tools. Image-based attacks try to attack the user's device using malicious payloads or utilize image steganography to hide sensitive data inside legitimate images and leak it outside the user's device. Therefore in this paper, we present a novel Image Content Disarm and Reconstruction (ICDR). Our ICDR system removes potential malware, with a zero trust approach, while maintaining high image quality and file usability. By extracting the image data, removing it from the rest of the file, and manipulating the image pixels, it is possible to disable or remove the hidden malware inside the file. △ Less

Submitted 26 July, 2023; originally announced July 2023.

Comments: 14 pages

arXiv:2206.10144 [pdf, other]

Open-Source Framework for Encrypted Internet and Malicious Traffic Classification

Authors: Ofek Bader, Adi Lichy, Amit Dvir, Ran Dubin, Chen Hajaj

Abstract: Internet traffic classification plays a key role in network visibility, Quality of Services (QoS), intrusion detection, Quality of Experience (QoE) and traffic-trend analyses. In order to improve privacy, integrity, confidentiality, and protocol obfuscation, the current traffic is based on encryption protocols, e.g., SSL/TLS. With the increased use of Machine-Learning (ML) and Deep-Learning (DL) m… ▽ More Internet traffic classification plays a key role in network visibility, Quality of Services (QoS), intrusion detection, Quality of Experience (QoE) and traffic-trend analyses. In order to improve privacy, integrity, confidentiality, and protocol obfuscation, the current traffic is based on encryption protocols, e.g., SSL/TLS. With the increased use of Machine-Learning (ML) and Deep-Learning (DL) models in the literature, comparison between different models and methods has become cumbersome and difficult due to a lack of a standardized framework. In this paper, we propose an open-source framework, named OSF-EIMTC, which can provide the full pipeline of the learning process. From the well-known datasets to extracting new and well-known features, it provides implementations of well-known ML and DL models (from the traffic classification literature) as well as evaluations. Such a framework can facilitate research in traffic classification domains, so that it will be more repeatable, reproducible, easier to execute, and will allow a more accurate comparison of well-known and novel features and models. As part of our framework evaluation, we demonstrate a variety of cases where the framework can be of use, utilizing multiple datasets, models, and feature sets. We show analyses of publicly available datasets and invite the community to participate in our open challenges using the OSF-EIMTC. △ Less

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.08004 [pdf, other]

When a RF Beats a CNN and GRU, Together -- A Comparison of Deep Learning and Classical Machine Learning Approaches for Encrypted Malware Traffic Classification

Authors: Adi Lichy, Ofek Bader, Ran Dubin, Amit Dvir, Chen Hajaj

Abstract: Internet traffic classification is widely used to facilitate network management. It plays a crucial role in Quality of Services (QoS), Quality of Experience (QoE), network visibility, intrusion detection, and traffic trend analyses. While there is no theoretical guarantee that deep learning (DL)-based solutions perform better than classic machine learning (ML)-based ones, DL-based models have beco… ▽ More Internet traffic classification is widely used to facilitate network management. It plays a crucial role in Quality of Services (QoS), Quality of Experience (QoE), network visibility, intrusion detection, and traffic trend analyses. While there is no theoretical guarantee that deep learning (DL)-based solutions perform better than classic machine learning (ML)-based ones, DL-based models have become the common default. This paper compares well-known DL-based and ML-based models and shows that in the case of malicious traffic classification, state-of-the-art DL-based solutions do not necessarily outperform the classical ML-based ones. We exemplify this finding using two well-known datasets for a varied set of tasks, such as: malware detection, malware family classification, detection of zero-day attacks, and classification of an iteratively growing dataset. Note that, it is not feasible to evaluate all possible models to make a concrete statement, thus, the above finding is not a recommendation to avoid DL-based models, but rather empirical proof that in some cases, there are more simplistic solutions, that may perform even better. △ Less

Submitted 16 June, 2022; originally announced June 2022.

arXiv:1603.04865 [pdf, ps, other]

Robust Machine Learning for Encrypted Traffic Classification

Authors: Amit Dvir, Yehonatan Zion, Jonathan Muehlstein, Ofir Pele, Chen Hajaj, Ran Dubin

Abstract: Desktops and laptops can be maliciously exploited to violate privacy. In this paper, we consider the daily battle between the passive attacker who is targeting a specific user against a user that may be adversarial opponent. In this scenario, while the attacker tries to choose the best vector attack by surreptitiously monitoring the victims encrypted network traffic in order to identify users para… ▽ More Desktops and laptops can be maliciously exploited to violate privacy. In this paper, we consider the daily battle between the passive attacker who is targeting a specific user against a user that may be adversarial opponent. In this scenario, while the attacker tries to choose the best vector attack by surreptitiously monitoring the victims encrypted network traffic in order to identify users parameters such as the Operating System (OS), browser and apps. The user may use tools such as a Virtual Private Network (VPN) or even change protocols parameters to protect his/her privacy. We provide a large dataset of more than 20,000 examples for this task. We run a comprehensive set of experiments, that achieves high (above 85) classification accuracy, robustness and resilience to changes of features as a function of different network conditions at test time. We also show the effect of a small training set on the accuracy. △ Less

Submitted 20 July, 2020; v1 submitted 15 March, 2016; originally announced March 2016.

arXiv:1602.02030 [pdf, other]

doi 10.1007/s00530-016-0525-6

Adaptation Logic for HTTP Dynamic Adaptive Streaming using Geo-Predictive Crowdsourcing

Authors: Ran Dubin, Amit Dvir, Ofir Pele, Ofer Hadar, Itay Katz, Ori Mashiach

Abstract: The increasing demand for video streaming services with high Quality of Experience (QoE) has prompted a lot of research on client-side adaptation logic approaches. However, most algorithms use the client's previous download experience and do not use a crowd knowledge database generated by users of a professional service. We propose a new crowd algorithm that maximizes the QoE. Additionally, we sho… ▽ More The increasing demand for video streaming services with high Quality of Experience (QoE) has prompted a lot of research on client-side adaptation logic approaches. However, most algorithms use the client's previous download experience and do not use a crowd knowledge database generated by users of a professional service. We propose a new crowd algorithm that maximizes the QoE. Additionally, we show how crowd information can be integrated into existing algorithms and illustrate this with two state-of-the-art algorithms. We evaluate our algorithm and state-of-the-art algorithms (including our modified algorithms) on a large, real-life crowdsourcing dataset that contains 336,551 samples on network performance. The dataset was provided by WeFi LTD. Our new algorithm outperforms all other methods in terms of QoS (eMOS). △ Less

Submitted 5 February, 2016; originally announced February 2016.

Comments: 10 pages

arXiv:1602.00490 [pdf, other]

doi 10.1109/TIFS.2017.2730819

I Know What You Saw Last Minute - Encrypted HTTP Adaptive Video Streaming Title Classification

Authors: Ran Dubin, Amit Dvir, Ofir Pele, Ofer Hadar

Abstract: Desktops and laptops can be maliciously exploited to violate privacy. There are two main types of attack scenarios: active and passive. In this paper, we consider the passive scenario where the adversary does not interact actively with the device, but he is able to eavesdrop on the network traffic of the device from the network side. Most of the Internet traffic is encrypted and thus passive attac… ▽ More Desktops and laptops can be maliciously exploited to violate privacy. There are two main types of attack scenarios: active and passive. In this paper, we consider the passive scenario where the adversary does not interact actively with the device, but he is able to eavesdrop on the network traffic of the device from the network side. Most of the Internet traffic is encrypted and thus passive attacks are challenging. Previous research has shown that information can be extracted from encrypted multimedia streams. This includes video title classification of non HTTP adaptive streams (non-HAS). This paper presents an algorithm for encrypted HTTP adaptive video streaming title classification. We show that an external attacker can identify the video title from video HTTP adaptive streams (HAS) sites such as YouTube. To the best of our knowledge, this is the first work that shows this. We provide a large data set of 10000 YouTube video streams of 100 popular video titles (each title downloaded 100 times) as examples for this task. The dataset was collected under real-world network conditions. We present several machine algorithms for the task and run a through set of experiments, which shows that our classification accuracy is more than 95%. We also show that our algorithms are able to classify video titles that are not in the training set as unknown and some of the algorithms are also able to eliminate false prediction of video titles and instead report unknown. Finally, we evaluate our algorithms robustness to delays and packet losses at test time and show that a solution that uses SVM is the most robust against these changes given enough training data. We provide the dataset and the crawler for future research. △ Less

Submitted 21 July, 2016; v1 submitted 1 February, 2016; originally announced February 2016.

Comments: 9 pages. arXiv admin note: text overlap with arXiv:1602.00489

Journal ref: IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 12, DECEMBER 2017

arXiv:1602.00489 [pdf, ps, other]

Real Time Video Quality Representation Classification of Encrypted HTTP Adaptive Video Streaming - the Case of Safari

Authors: Ran Dubin, Amit Dvir, Ofir Pele, Ofer Hadar, Itay Richman, Ofir Trabelsi

Abstract: The increasing popularity of HTTP adaptive video streaming services has dramatically increased bandwidth requirements on operator networks, which attempt to shape their traffic through Deep Packet Inspection (DPI). However, Google and certain content providers have started to encrypt their video services. As a result, operators often encounter difficulties in sha** their encrypted video traffic… ▽ More The increasing popularity of HTTP adaptive video streaming services has dramatically increased bandwidth requirements on operator networks, which attempt to shape their traffic through Deep Packet Inspection (DPI). However, Google and certain content providers have started to encrypt their video services. As a result, operators often encounter difficulties in sha** their encrypted video traffic via DPI. This highlights the need for new traffic classification methods for encrypted HTTP adaptive video streaming to enable smart traffic sha**. These new methods will have to effectively estimate the quality representation layer and playout buffer. We present a new method and show for the first time that video quality representation classification for (YouTube) encrypted HTTP adaptive streaming is possible. We analyze the performance of this classification method with Safari over HTTPS. Based on a large number of offline and online traffic classification experiments, we demonstrate that it can independently classify, in real time, every video segment into one of the quality representation layers with 97.18% average accuracy. △ Less

Submitted 19 February, 2016; v1 submitted 1 February, 2016; originally announced February 2016.

Comments: 9 pages

Showing 1–12 of 12 results for author: Dubin, R