Search | arXiv e-print repository

Few-Shot API Attack Detection: Overcoming Data Scarcity with GAN-Inspired Learning

Authors: Udi Aharon, Revital Marbel, Ran Dubin, Amit Dvir, Chen Hajaj

Abstract: Web applications and APIs face constant threats from malicious actors seeking to exploit vulnerabilities for illicit gains. These threats necessitate robust anomaly detection systems capable of identifying malicious API traffic efficiently despite limited and diverse datasets. This paper proposes a novel few-shot detection approach motivated by Natural Language Processing (NLP) and advanced Genera… ▽ More Web applications and APIs face constant threats from malicious actors seeking to exploit vulnerabilities for illicit gains. These threats necessitate robust anomaly detection systems capable of identifying malicious API traffic efficiently despite limited and diverse datasets. This paper proposes a novel few-shot detection approach motivated by Natural Language Processing (NLP) and advanced Generative Adversarial Network (GAN)-inspired techniques. Leveraging state-of-the-art Transformer architectures, particularly RoBERTa, our method enhances the contextual understanding of API requests, leading to improved anomaly detection compared to traditional methods. We showcase the technique's versatility by demonstrating its effectiveness with both Out-of-Distribution (OOD) and Transformer-based binary classification methods on two distinct datasets: CSIC 2010 and ATRDF 2023. Our evaluations reveal consistently enhanced or, at worst, equivalent detection rates across various metrics in most vectors, highlighting the promise of our approach for improving API security. △ Less

Submitted 18 May, 2024; originally announced May 2024.

Comments: 8 pages, 2 figures, 7 tables

arXiv:2405.11247 [pdf, other]

Few-Shot API Attack Anomaly Detection in a Classification-by-Retrieval Framework

Authors: Udi Aharon, Ran Dubin, Amit Dvir, Chen Hajaj

Abstract: Application Programming Interface (API) attacks refer to the unauthorized or malicious use of APIs, which are often exploited to gain access to sensitive data or manipulate online systems for illicit purposes. Identifying actors that deceitfully utilize an API poses a demanding problem. Although there have been notable advancements and contributions in the field of API security, there still remain… ▽ More Application Programming Interface (API) attacks refer to the unauthorized or malicious use of APIs, which are often exploited to gain access to sensitive data or manipulate online systems for illicit purposes. Identifying actors that deceitfully utilize an API poses a demanding problem. Although there have been notable advancements and contributions in the field of API security, there still remains a significant challenge when dealing with attackers who use novel approaches that don't match the well-known payloads commonly seen in attacks. Also, attackers may exploit standard functionalities in unconventional manners and with objectives surpassing their intended boundaries. This means API security needs to be more sophisticated and dynamic than ever, with advanced computational intelligence methods, such as machine learning models that can quickly identify and respond to anomalous behavior. In response to these challenges, we propose a novel few-shot anomaly detection framework, named FT-ANN. This framework is composed of two parts: First, we train a dedicated generic language model for API based on FastText embedding. Next, we use Approximate Nearest Neighbor search in a classification-by-retrieval approach. Our framework enables the development of a lightweight model that can be trained with minimal examples per class or even a model capable of classifying multiple classes. The results show that our framework effectively improves API attack detection accuracy compared to various baselines. △ Less

Submitted 18 May, 2024; originally announced May 2024.

Comments: 13 pages, 8 figures, 3 tables

arXiv:2403.11206 [pdf, other]

CBR -- Boosting Adaptive Classification By Retrieval of Encrypted Network Traffic with Out-of-distribution

Authors: Amir Lukach, Ran Dubin, Amit Dvir, Chen Hajaj

Abstract: Encrypted network traffic Classification tackles the problem from different approaches and with different goals. One of the common approaches is using Machine learning or Deep Learning-based solutions on a fixed number of classes, leading to misclassification when an unknown class is given as input. One of the solutions for handling unknown classes is to retrain the model, however, retraining mode… ▽ More Encrypted network traffic Classification tackles the problem from different approaches and with different goals. One of the common approaches is using Machine learning or Deep Learning-based solutions on a fixed number of classes, leading to misclassification when an unknown class is given as input. One of the solutions for handling unknown classes is to retrain the model, however, retraining models every time they become obsolete is both resource and time-consuming. Therefore, there is a growing need to allow classification models to detect and adapt to new classes dynamically, without retraining, but instead able to detect new classes using few shots learning [1]. In this paper, we introduce Adaptive Classification By Retrieval CBR, a novel approach for encrypted network traffic classification. Our new approach is based on an ANN-based method, which allows us to effectively identify new and existing classes without retraining the model. The novel approach is simple, yet effective and achieved similar results to RF with up to 5% difference (usually less than that) in the classification tasks while having a slight decrease in the case of new samples (from new classes) without retraining. To summarize, the new method is a real-time classification, which can classify new classes without retraining. Furthermore, our solution can be used as a complementary solution alongside RF or any other machine/deep learning classification method, as an aggregated solution. △ Less

Submitted 17 March, 2024; originally announced March 2024.

MSC Class: ACM-class: F.2.2; I.2.7 ACM-class: F.2.2; I.2.7 ACM-class: F.2.2; I.2.7 ACM-class: F.2.2; I.2.7 ACM-class: I.2.6

arXiv:2307.14057 [pdf, other]

Open Image Content Disarm And Reconstruction

Authors: Eli Belkind, Ran Dubin, Amit Dvir

Abstract: With the advance in malware technology, attackers create new ways to hide their malicious code from antivirus services. One way to obfuscate an attack is to use common files as cover to hide the malicious scripts, so the malware will look like a legitimate file. Although cutting-edge Artificial Intelligence and content signature exist, evasive malware successfully bypasses next-generation malware… ▽ More With the advance in malware technology, attackers create new ways to hide their malicious code from antivirus services. One way to obfuscate an attack is to use common files as cover to hide the malicious scripts, so the malware will look like a legitimate file. Although cutting-edge Artificial Intelligence and content signature exist, evasive malware successfully bypasses next-generation malware detection using advanced methods like steganography. Some of the files commonly used to hide malware are image files (e.g., JPEG). In addition, some malware use steganography to hide malicious scripts or sensitive data in images. Steganography in images is difficult to detect even with specialized tools. Image-based attacks try to attack the user's device using malicious payloads or utilize image steganography to hide sensitive data inside legitimate images and leak it outside the user's device. Therefore in this paper, we present a novel Image Content Disarm and Reconstruction (ICDR). Our ICDR system removes potential malware, with a zero trust approach, while maintaining high image quality and file usability. By extracting the image data, removing it from the rest of the file, and manipulating the image pixels, it is possible to disable or remove the hidden malware inside the file. △ Less

Submitted 26 July, 2023; originally announced July 2023.

Comments: 14 pages

arXiv:2206.10144 [pdf, other]

Open-Source Framework for Encrypted Internet and Malicious Traffic Classification

Authors: Ofek Bader, Adi Lichy, Amit Dvir, Ran Dubin, Chen Hajaj

Abstract: Internet traffic classification plays a key role in network visibility, Quality of Services (QoS), intrusion detection, Quality of Experience (QoE) and traffic-trend analyses. In order to improve privacy, integrity, confidentiality, and protocol obfuscation, the current traffic is based on encryption protocols, e.g., SSL/TLS. With the increased use of Machine-Learning (ML) and Deep-Learning (DL) m… ▽ More Internet traffic classification plays a key role in network visibility, Quality of Services (QoS), intrusion detection, Quality of Experience (QoE) and traffic-trend analyses. In order to improve privacy, integrity, confidentiality, and protocol obfuscation, the current traffic is based on encryption protocols, e.g., SSL/TLS. With the increased use of Machine-Learning (ML) and Deep-Learning (DL) models in the literature, comparison between different models and methods has become cumbersome and difficult due to a lack of a standardized framework. In this paper, we propose an open-source framework, named OSF-EIMTC, which can provide the full pipeline of the learning process. From the well-known datasets to extracting new and well-known features, it provides implementations of well-known ML and DL models (from the traffic classification literature) as well as evaluations. Such a framework can facilitate research in traffic classification domains, so that it will be more repeatable, reproducible, easier to execute, and will allow a more accurate comparison of well-known and novel features and models. As part of our framework evaluation, we demonstrate a variety of cases where the framework can be of use, utilizing multiple datasets, models, and feature sets. We show analyses of publicly available datasets and invite the community to participate in our open challenges using the OSF-EIMTC. △ Less

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.08004 [pdf, other]

When a RF Beats a CNN and GRU, Together -- A Comparison of Deep Learning and Classical Machine Learning Approaches for Encrypted Malware Traffic Classification

Authors: Adi Lichy, Ofek Bader, Ran Dubin, Amit Dvir, Chen Hajaj

Abstract: Internet traffic classification is widely used to facilitate network management. It plays a crucial role in Quality of Services (QoS), Quality of Experience (QoE), network visibility, intrusion detection, and traffic trend analyses. While there is no theoretical guarantee that deep learning (DL)-based solutions perform better than classic machine learning (ML)-based ones, DL-based models have beco… ▽ More Internet traffic classification is widely used to facilitate network management. It plays a crucial role in Quality of Services (QoS), Quality of Experience (QoE), network visibility, intrusion detection, and traffic trend analyses. While there is no theoretical guarantee that deep learning (DL)-based solutions perform better than classic machine learning (ML)-based ones, DL-based models have become the common default. This paper compares well-known DL-based and ML-based models and shows that in the case of malicious traffic classification, state-of-the-art DL-based solutions do not necessarily outperform the classical ML-based ones. We exemplify this finding using two well-known datasets for a varied set of tasks, such as: malware detection, malware family classification, detection of zero-day attacks, and classification of an iteratively growing dataset. Note that, it is not feasible to evaluate all possible models to make a concrete statement, thus, the above finding is not a recommendation to avoid DL-based models, but rather empirical proof that in some cases, there are more simplistic solutions, that may perform even better. △ Less

Submitted 16 June, 2022; originally announced June 2022.

arXiv:2205.14576 [pdf, other]

Problem-Space Evasion Attacks in the Android OS: a Survey

Authors: Harel Berger, Chen Hajaj, Amit Dvir

Abstract: Android is the most popular OS worldwide. Therefore, it is a target for various kinds of malware. As a countermeasure, the security community works day and night to develop appropriate Android malware detection systems, with ML-based or DL-based systems considered as some of the most common types. Against these detection systems, intelligent adversaries develop a wide set of evasion attacks, in wh… ▽ More Android is the most popular OS worldwide. Therefore, it is a target for various kinds of malware. As a countermeasure, the security community works day and night to develop appropriate Android malware detection systems, with ML-based or DL-based systems considered as some of the most common types. Against these detection systems, intelligent adversaries develop a wide set of evasion attacks, in which an attacker slightly modifies a malware sample to evade its target detection system. In this survey, we address problem-space evasion attacks in the Android OS, where attackers manipulate actual APKs, rather than their extracted feature vector. We aim to explore this kind of attacks, frequently overlooked by the research community due to a lack of knowledge of the Android domain, or due to focusing on general mathematical evasion attacks - i.e., feature-space evasion attacks. We discuss the different aspects of problem-space evasion attacks, using a new taxonomy, which focuses on key ingredients of each problem-space attack, such as the attacker model, the attacker's mode of operation, and the functional assessment of post-attack applications. △ Less

Submitted 21 June, 2022; v1 submitted 29 May, 2022; originally announced May 2022.

arXiv:2205.04293 [pdf, other]

Do You Think You Can Hold Me? The Real Challenge of Problem-Space Evasion Attacks

Authors: Harel Berger, Amit Dvir, Chen Hajaj, Rony Ronen

Abstract: Android malware is a spreading disease in the virtual world. Anti-virus and detection systems continuously undergo patches and updates to defend against these threats. Most of the latest approaches in malware detection use Machine Learning (ML). Against the robustifying effort of detection systems, raise the \emph{evasion attacks}, where an adversary changes its targeted samples so that they are m… ▽ More Android malware is a spreading disease in the virtual world. Anti-virus and detection systems continuously undergo patches and updates to defend against these threats. Most of the latest approaches in malware detection use Machine Learning (ML). Against the robustifying effort of detection systems, raise the \emph{evasion attacks}, where an adversary changes its targeted samples so that they are misclassified as benign. This paper considers two kinds of evasion attacks: feature-space and problem-space. \emph{Feature-space} attacks consider an adversary who manipulates ML features to evade the correct classification while minimizing or constraining the total manipulations. \textit{Problem-space} attacks refer to evasion attacks that change the actual sample. Specifically, this paper analyzes the gap between these two types in the Android malware domain. The gap between the two types of evasion attacks is examined via the retraining process of classifiers using each one of the evasion attack types. The experiments show that the gap between these two types of retrained classifiers is dramatic and may increase to 96\%. Retrained classifiers of feature-space evasion attacks have been found to be either less effective or completely ineffective against problem-space evasion attacks. Additionally, exploration of different problem-space evasion attacks shows that retraining of one problem-space evasion attack may be effective against other problem-space evasion attacks. △ Less

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2202.13922 [pdf, ps, other]

MaMaDroid2.0 -- The Holes of Control Flow Graphs

Authors: Harel Berger, Chen Hajaj, Enrico Mariconti, Amit Dvir

Abstract: Android malware is a continuously expanding threat to billions of mobile users around the globe. Detection systems are updated constantly to address these threats. However, a backlash takes the form of evasion attacks, in which an adversary changes malicious samples such that those samples will be misclassified as benign. This paper fully inspects a well-known Android malware detection system, MaM… ▽ More Android malware is a continuously expanding threat to billions of mobile users around the globe. Detection systems are updated constantly to address these threats. However, a backlash takes the form of evasion attacks, in which an adversary changes malicious samples such that those samples will be misclassified as benign. This paper fully inspects a well-known Android malware detection system, MaMaDroid, which analyzes the control flow graph of the application. Changes to the portion of benign samples in the train set and models are considered to see their effect on the classifier. The changes in the ratio between benign and malicious samples have a clear effect on each one of the models, resulting in a decrease of more than 40% in their detection rate. Moreover, adopted ML models are implemented as well, including 5-NN, Decision Tree, and Adaboost. Exploration of the six models reveals a typical behavior in different cases, of tree-based models and distance-based models. Moreover, three novel attacks that manipulate the CFG and their detection rates are described for each one of the targeted models. The attacks decrease the detection rate of most of the models to 0%, with regards to different ratios of benign to malicious apps. As a result, a new version of MaMaDroid is engineered. This model fuses the CFG of the app and static analysis of features of the app. This improved model is proved to be robust against evasion attacks targeting both CFG-based models and static analysis models, achieving a detection rate of more than 90% against each one of the attacks. △ Less

Submitted 28 February, 2022; originally announced February 2022.

arXiv:2006.01449 [pdf, other]

Less is More: Robust and Novel Features for Malicious Domain Detection

Authors: Chen Hajaj, Nitay Hason, Nissim Harel, Amit Dvir

Abstract: Malicious domains are increasingly common and pose a severe cybersecurity threat. Specifically, many types of current cyber attacks use URLs for attack communications (e.g., C\&C, phishing, and spear-phishing). Despite the continuous progress in detecting these attacks, many alarming problems remain open, such as the weak spots of the defense mechanisms. Since machine learning has become one of th… ▽ More Malicious domains are increasingly common and pose a severe cybersecurity threat. Specifically, many types of current cyber attacks use URLs for attack communications (e.g., C\&C, phishing, and spear-phishing). Despite the continuous progress in detecting these attacks, many alarming problems remain open, such as the weak spots of the defense mechanisms. Since machine learning has become one of the most prominent methods of malware detection, A robust feature selection mechanism is proposed that results in malicious domain detection models that are resistant to evasion attacks. This mechanism exhibits high performance based on empirical data. This paper makes two main contributions: First, it provides an analysis of robust feature selection based on widely used features in the literature. Note that even though the feature set dimensional space is reduced by half (from nine to four features), the performance of the classifier is still improved (an increase in the model's F1-score from 92.92\% to 95.81\%). Second, it introduces novel features that are robust to the adversary's manipulation. Based on an extensive evaluation of the different feature sets and commonly used classification models, this paper shows that models that are based on robust features are resistant to malicious perturbations, and at the same time useful for classifying non-manipulated data. △ Less

Submitted 2 June, 2020; originally announced June 2020.

Comments: 30 pages, 7 figures, 10 tables

arXiv:2006.01206 [pdf, other]

A Thousand Words are Worth More Than One Recording: NLP Based Speaker Change Point Detection

Authors: O. H. Anidjar, C. Hajaj, A. Dvir, I. Gilad

Abstract: Speaker Diarization (SD) consists of splitting or segmenting an input audio burst according to speaker identities. In this paper, we focus on the crucial task of the SD problem which is the audio segmenting process and suggest a solution for the Change Point Detection (CPD) problem. We empirically demonstrate the negative correlation between an increase in the number of speakers and the Recall and… ▽ More Speaker Diarization (SD) consists of splitting or segmenting an input audio burst according to speaker identities. In this paper, we focus on the crucial task of the SD problem which is the audio segmenting process and suggest a solution for the Change Point Detection (CPD) problem. We empirically demonstrate the negative correlation between an increase in the number of speakers and the Recall and F1-Score measurements. This negative correlation is shown to be the outcome of a massive experimental evaluation process, which accounts its superiority to recently developed voice based solutions. In order to overcome the number of speakers issue, we suggest a robust solution based on a novel Natural Language Processing (NLP) technique, as well as a metadata features extraction process, rather than a vocal based alone. To the best of our knowledge, we are the first to propose an intelligent NLP based solution that (I) tackles the CPD problem with a dataset in Hebrew, and (II) solves the CPD variant of the SD problem. We empirically show, based on two distinct datasets, that our method is abled to accurately identify the CPDs in an audio burst with 82.12% and 89.02% of success in the Recall and F1-score measurements. △ Less

Submitted 18 May, 2020; originally announced June 2020.

Comments: 30 pages, 8 Figures, and 6 Tables

arXiv:2003.14123 [pdf, other]

When the Guard failed the Droid: A case study of Android malware

Authors: Harel Berger, Chen Hajaj, Amit Dvir

Abstract: Android malware is a persistent threat to billions of users around the world. As a countermeasure, Android malware detection systems are occasionally implemented. However, these systems are often vulnerable to \emph{evasion attacks}, in which an adversary manipulates malicious instances so that they are misidentified as benign. In this paper, we launch various innovative evasion attacks against se… ▽ More Android malware is a persistent threat to billions of users around the world. As a countermeasure, Android malware detection systems are occasionally implemented. However, these systems are often vulnerable to \emph{evasion attacks}, in which an adversary manipulates malicious instances so that they are misidentified as benign. In this paper, we launch various innovative evasion attacks against several Android malware detection systems. The vulnerability inherent to all of these systems is that they are part of Androguard~\cite{desnos2011androguard}, a popular open source library used in Android malware detection systems. Some of the detection systems decrease to a 0\% detection rate after the attack. Therefore, the use of open source libraries in malware detection systems calls for caution. In addition, we present a novel evaluation scheme for evasion attack generation that exploits the weak spots of known Android malware detection systems. In so doing, we evaluate the functionality and maliciousness of the manipulated instances created by our evasion attacks. We found variations in both the maliciousness and functionality tests of our manipulated apps. We show that non-functional apps, while considered malicious, do not threaten users and are thus useless from an attacker's point of view. We conclude that evasion attacks must be assessed for both functionality and maliciousness to evaluate their impact, a step which is far from commonplace today. △ Less

Submitted 31 March, 2020; originally announced March 2020.

arXiv:1906.10928 [pdf, other]

A wrinkle in time: A case study in DNS poisoning

Authors: Harel Berger, Amit Z. Dvir, Moti Geva

Abstract: The Domain Name System (DNS) provides a translation between readable domain names and IP addresses. The DNS is a key infrastructure component of the Internet and a prime target for a variety of attacks. One of the most significant threat to the DNS's wellbeing is a DNS poisoning attack, in which the DNS responses are maliciously replaced, or poisoned, by an attacker. To identify this kind of attac… ▽ More The Domain Name System (DNS) provides a translation between readable domain names and IP addresses. The DNS is a key infrastructure component of the Internet and a prime target for a variety of attacks. One of the most significant threat to the DNS's wellbeing is a DNS poisoning attack, in which the DNS responses are maliciously replaced, or poisoned, by an attacker. To identify this kind of attack, we start by an analysis of different kinds of response times. We present an analysis of typical and atypical response times, while differentiating between the different levels of DNS servers' response times, from root servers down to internal caching servers. We successfully identify empirical DNS poisoning attacks based on a novel method for DNS response timing analysis. We then present a system we developed to validate our technique that does not require any changes to the DNS protocol or any existing network equipment. Our validation system tested data from different architectures including LAN and cloud environments and real data from an Internet Service Provider (ISP). Our method and system differ from most other DNS poisoning detection methods and achieved high detection rates exceeding 99%. These findings suggest that when used in conjunction with other methods, they can considerably enhance the accuracy of these methods. △ Less

Submitted 26 June, 2019; originally announced June 2019.

arXiv:1603.04865 [pdf, ps, other]

Robust Machine Learning for Encrypted Traffic Classification

Authors: Amit Dvir, Yehonatan Zion, Jonathan Muehlstein, Ofir Pele, Chen Hajaj, Ran Dubin

Abstract: Desktops and laptops can be maliciously exploited to violate privacy. In this paper, we consider the daily battle between the passive attacker who is targeting a specific user against a user that may be adversarial opponent. In this scenario, while the attacker tries to choose the best vector attack by surreptitiously monitoring the victims encrypted network traffic in order to identify users para… ▽ More Desktops and laptops can be maliciously exploited to violate privacy. In this paper, we consider the daily battle between the passive attacker who is targeting a specific user against a user that may be adversarial opponent. In this scenario, while the attacker tries to choose the best vector attack by surreptitiously monitoring the victims encrypted network traffic in order to identify users parameters such as the Operating System (OS), browser and apps. The user may use tools such as a Virtual Private Network (VPN) or even change protocols parameters to protect his/her privacy. We provide a large dataset of more than 20,000 examples for this task. We run a comprehensive set of experiments, that achieves high (above 85) classification accuracy, robustness and resilience to changes of features as a function of different network conditions at test time. We also show the effect of a small training set on the accuracy. △ Less

Submitted 20 July, 2020; v1 submitted 15 March, 2016; originally announced March 2016.

arXiv:1602.02030 [pdf, other]

doi 10.1007/s00530-016-0525-6

Adaptation Logic for HTTP Dynamic Adaptive Streaming using Geo-Predictive Crowdsourcing

Authors: Ran Dubin, Amit Dvir, Ofir Pele, Ofer Hadar, Itay Katz, Ori Mashiach

Abstract: The increasing demand for video streaming services with high Quality of Experience (QoE) has prompted a lot of research on client-side adaptation logic approaches. However, most algorithms use the client's previous download experience and do not use a crowd knowledge database generated by users of a professional service. We propose a new crowd algorithm that maximizes the QoE. Additionally, we sho… ▽ More The increasing demand for video streaming services with high Quality of Experience (QoE) has prompted a lot of research on client-side adaptation logic approaches. However, most algorithms use the client's previous download experience and do not use a crowd knowledge database generated by users of a professional service. We propose a new crowd algorithm that maximizes the QoE. Additionally, we show how crowd information can be integrated into existing algorithms and illustrate this with two state-of-the-art algorithms. We evaluate our algorithm and state-of-the-art algorithms (including our modified algorithms) on a large, real-life crowdsourcing dataset that contains 336,551 samples on network performance. The dataset was provided by WeFi LTD. Our new algorithm outperforms all other methods in terms of QoS (eMOS). △ Less

Submitted 5 February, 2016; originally announced February 2016.

Comments: 10 pages

arXiv:1602.00490 [pdf, other]

doi 10.1109/TIFS.2017.2730819

I Know What You Saw Last Minute - Encrypted HTTP Adaptive Video Streaming Title Classification

Authors: Ran Dubin, Amit Dvir, Ofir Pele, Ofer Hadar

Abstract: Desktops and laptops can be maliciously exploited to violate privacy. There are two main types of attack scenarios: active and passive. In this paper, we consider the passive scenario where the adversary does not interact actively with the device, but he is able to eavesdrop on the network traffic of the device from the network side. Most of the Internet traffic is encrypted and thus passive attac… ▽ More Desktops and laptops can be maliciously exploited to violate privacy. There are two main types of attack scenarios: active and passive. In this paper, we consider the passive scenario where the adversary does not interact actively with the device, but he is able to eavesdrop on the network traffic of the device from the network side. Most of the Internet traffic is encrypted and thus passive attacks are challenging. Previous research has shown that information can be extracted from encrypted multimedia streams. This includes video title classification of non HTTP adaptive streams (non-HAS). This paper presents an algorithm for encrypted HTTP adaptive video streaming title classification. We show that an external attacker can identify the video title from video HTTP adaptive streams (HAS) sites such as YouTube. To the best of our knowledge, this is the first work that shows this. We provide a large data set of 10000 YouTube video streams of 100 popular video titles (each title downloaded 100 times) as examples for this task. The dataset was collected under real-world network conditions. We present several machine algorithms for the task and run a through set of experiments, which shows that our classification accuracy is more than 95%. We also show that our algorithms are able to classify video titles that are not in the training set as unknown and some of the algorithms are also able to eliminate false prediction of video titles and instead report unknown. Finally, we evaluate our algorithms robustness to delays and packet losses at test time and show that a solution that uses SVM is the most robust against these changes given enough training data. We provide the dataset and the crawler for future research. △ Less

Submitted 21 July, 2016; v1 submitted 1 February, 2016; originally announced February 2016.

Comments: 9 pages. arXiv admin note: text overlap with arXiv:1602.00489

Journal ref: IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 12, DECEMBER 2017

arXiv:1602.00489 [pdf, ps, other]

Real Time Video Quality Representation Classification of Encrypted HTTP Adaptive Video Streaming - the Case of Safari

Authors: Ran Dubin, Amit Dvir, Ofir Pele, Ofer Hadar, Itay Richman, Ofir Trabelsi

Abstract: The increasing popularity of HTTP adaptive video streaming services has dramatically increased bandwidth requirements on operator networks, which attempt to shape their traffic through Deep Packet Inspection (DPI). However, Google and certain content providers have started to encrypt their video services. As a result, operators often encounter difficulties in sha** their encrypted video traffic… ▽ More The increasing popularity of HTTP adaptive video streaming services has dramatically increased bandwidth requirements on operator networks, which attempt to shape their traffic through Deep Packet Inspection (DPI). However, Google and certain content providers have started to encrypt their video services. As a result, operators often encounter difficulties in sha** their encrypted video traffic via DPI. This highlights the need for new traffic classification methods for encrypted HTTP adaptive video streaming to enable smart traffic sha**. These new methods will have to effectively estimate the quality representation layer and playout buffer. We present a new method and show for the first time that video quality representation classification for (YouTube) encrypted HTTP adaptive streaming is possible. We analyze the performance of this classification method with Safari over HTTPS. Based on a large number of offline and online traffic classification experiments, we demonstrate that it can independently classify, in real time, every video segment into one of the quality representation layers with 97.18% average accuracy. △ Less

Submitted 19 February, 2016; v1 submitted 1 February, 2016; originally announced February 2016.

Comments: 9 pages

Showing 1–17 of 17 results for author: Dvir, A