-
Non-uniformity is All You Need: Efficient and Timely Encrypted Traffic Classification With ECHO
Authors:
Shilo Daum,
Tal Shapira,
Anat Bremler-Barr,
David Hay
Abstract:
With 95% of Internet traffic now encrypted, an effective approach to classifying this traffic is crucial for network security and management. This paper introduces ECHO -- a novel optimization process for ML/DL-based encrypted traffic classification. ECHO targets both classification time and memory utilization and incorporates two innovative techniques.
The first component, HO (Hyperparameter Optimization of binnings), aims at creating efficient traffic representations. While previous research often uses representations that map packet sizes and packet arrival times to fixed-sized bins, we show that non-uniform binnings are significantly more efficient. These non-uniform binnings are derived by employing a hyperparameter optimization algorithm in the training stage. HO significantly improves accuracy given a required representation size, or, equivalently, achieves comparable accuracy using smaller representations.
Then, we introduce EC (Early Classification of traffic), which enables faster classification using a cascade of classifiers adapted for different exit times, where classification is based on the level of confidence. EC reduces the average classification latency by up to 90%. Remarkably, this method not only maintains classification accuracy but also, in certain cases, improves it.
Using three publicly available datasets, we demonstrate that the combined method, Early Classification with Hyperparameter Optimization (ECHO), leads to a significant improvement in classification efficiency.
Submitted 5 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
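The two ideas in the ECHO abstract above, non-uniform binning and confidence-based early exit, can be illustrated with a minimal sketch. This is not the authors' implementation: the bin edges, the confidence threshold, and the helper names `histogram` and `early_classify` are hypothetical, and each cascade stage is modeled as an already-computed (exit time, label, confidence) triple rather than a trained classifier.

```python
from bisect import bisect_right


def histogram(values, edges):
    """Count values per bin; `edges` are the sorted interior bin boundaries."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


def early_classify(stages, threshold):
    """Return (exit_time, label) of the first stage whose confidence reaches
    `threshold`; fall back to the last stage if none does.

    `stages` is a list of (exit_time, label, confidence), ordered by exit_time.
    """
    result = None
    for exit_time, label, confidence in stages:
        result = (exit_time, label)
        if confidence >= threshold:
            break
    return result


# Packet sizes in bytes for one flow (made-up values).
packet_sizes = [40, 52, 60, 64, 1400, 1500]

# Same representation size (3 bins), different edges.
uniform = histogram(packet_sizes, [500, 1000])   # equal-width bins
non_uniform = histogram(packet_sizes, [50, 62])  # hypothetical tuned edges

# Hypothetical per-stage outputs of a cascade with exits at 1 s, 5 s, 15 s.
stages = [(1.0, "video", 0.55), (5.0, "voip", 0.93), (15.0, "voip", 0.99)]
decision = early_classify(stages, threshold=0.9)
```

With equal-width bins the four small packets collapse into one bin, while the hypothetical non-uniform edges separate them using the same number of bins. In the abstract the edges are found by hyperparameter optimization during training; here they are fixed by hand.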
-
IoT Device Labeling Using Large Language Models
Authors:
Bar Meyuhas,
Anat Bremler-Barr,
Tal Shapira
Abstract:
The IoT market is diverse and characterized by a multitude of vendors that support different device functions (e.g., speaker, camera, vacuum cleaner, etc.). Within this market, IoT security and observability systems use real-time identification techniques to manage these devices effectively. Most existing IoT identification solutions employ machine learning techniques that assume the IoT device, labeled by both its vendor and function, was observed during their training phase. We tackle a key challenge in IoT labeling: how can an AI solution label an IoT device that has never been seen before and whose label is unknown?
Our solution extracts textual features such as domain names and hostnames from network traffic, and then enriches these features using Google search data alongside catalogs of vendors and device functions. The solution also integrates an auto-update mechanism that uses Large Language Models (LLMs) to update these catalogs with emerging device types. Based on the information gathered, the device's vendor is identified through string matching with the enriched features. The function is then deduced by LLMs and zero-shot classification from a predefined catalog of IoT functions.
In an evaluation of our solution on 97 unique IoT devices, our function labeling approach achieved HIT1 and HIT2 scores of 0.7 and 0.77, respectively. As far as we know, this is the first research to tackle AI-automated IoT labeling.
Submitted 3 March, 2024;
originally announced March 2024.
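The HIT1 and HIT2 scores quoted in the abstract above are not defined in this listing. Assuming they are the usual top-k hit rates (the fraction of devices whose true function appears among the k highest-ranked predictions), the computation can be sketched as follows. The function name `hit_at_k` and the sample labels are illustrative only.

```python
def hit_at_k(ranked_predictions, true_labels, k):
    """Fraction of items whose true label is among the first k predictions."""
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, true_labels)
        if truth in preds[:k]
    )
    return hits / len(true_labels)


# Made-up ranked function predictions for four devices.
ranked = [
    ["camera", "doorbell"],
    ["speaker", "hub"],
    ["plug", "light"],
    ["vacuum", "camera"],
]
truth = ["camera", "hub", "switch", "vacuum"]

hit1 = hit_at_k(ranked, truth, k=1)
hit2 = hit_at_k(ranked, truth, k=2)
```

By construction the score for k=2 can never be lower than the score for k=1, which matches the ordering of the reported values (0.7 and 0.77).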
-
It Is Not Where You Are, It Is Where You Are Registered: IoT Location Impact
Authors:
Bar Meyuhas,
Anat Bremler-Barr,
David Hay,
Shoham Danino
Abstract:
This paper investigates how and with whom IoT devices communicate and how their location affects their communication patterns. Specifically, the endpoints an IoT device communicates with can be defined as a small set of domains. To study how the location of the device affects its domain set, we distinguish between the location based on its IP address and the location defined by the user when registering the device. We show, contrary to common wisdom, that IP-based location has little to no effect on the set of domains, while the user-defined location changes the set significantly. Unlike common approaches to resolving domains to IP addresses at close-by geo-locations (such as anycast), we present a distinctive way to use the ECS field of EDNS to achieve the same differentiation between user-defined locations. Our solution streamlines the network design of IoT manufacturers and makes it easier for security appliances to monitor IoT traffic. Finally, we show that with one domain for all locations, one can achieve succinct descriptions of the traffic of the IoT device across the globe. We discuss the implications of such a description for security appliances, specifically those using the Manufacturer Usage Description (MUD) framework.
Submitted 15 December, 2022; v1 submitted 3 December, 2022;
originally announced December 2022.
-
Dynamic-Deep: Tune ECG Task Performance and Optimize Compression in IoT Architectures
Authors:
Eli Brosh,
Elad Wasserstein,
Anat Bremler-Barr
Abstract:
Monitoring medical data, e.g., Electrocardiogram (ECG) signals, is a common application of Internet of Things (IoT) devices. Compression methods are often applied to the massive amounts of sensor data generated before it is sent to the Cloud, to reduce storage and delivery costs. Lossy compression provides high compression gain (CG), but may reduce the performance of an ECG application (downstream task) due to information loss. Previous works on ECG monitoring focus on optimizing either the signal reconstruction or the task's performance. Instead, we advocate a self-adapting lossy compression solution that enables configuring a desired performance level on the downstream tasks while maintaining an optimized CG that reduces Cloud costs. We propose Dynamic-Deep, a task-aware compression geared for IoT-Cloud architectures. Our compressor is trained to optimize the CG while maintaining the performance requirement of the downstream tasks chosen out of a wide range. In deployment, the IoT edge device adapts the compression and sends an optimized representation for each data segment, accounting for the downstream task's desired performance without relying on feedback from the Cloud. We conduct an extensive evaluation of our approach on common ECG datasets using two popular ECG applications, including heart rate (HR) arrhythmia classification. We demonstrate that Dynamic-Deep can be configured to improve the HR classification F1-score across a wide range of requirements. One such configuration improves the F1-score by 3 and increases CG by up to 83% compared to the previous state-of-the-art (autoencoder-based) compressor. Analyzing Dynamic-Deep on the Google Cloud Platform, we observe a 97% reduction in cloud costs compared to a no-compression solution.
Submitted 2 April, 2022; v1 submitted 30 May, 2021;
originally announced June 2021.
-
NXNSAttack: Recursive DNS Inefficiencies and Vulnerabilities
Authors:
Yehuda Afek,
Anat Bremler-Barr,
Lior Shafir
Abstract:
This paper exposes a new vulnerability and introduces a corresponding attack, the NoneXistent Name Server Attack (NXNSAttack), that disrupts and may paralyze the DNS system, making it difficult or impossible for Internet users to access websites, web e-mail, online video chats, or any other online resource. The NXNSAttack generates a storm of packets between DNS resolvers and DNS authoritative name servers. The storm is produced by the response of resolvers to unrestricted referral response messages of authoritative name servers. The attack is significantly more destructive than NXDomain attacks (e.g., the Mirai attack): i) It reaches an amplification factor of more than 1620x on the number of packets exchanged by the recursive resolver. ii) In addition to the negative cache, the attack also saturates the 'NS' section of the resolver caches. To mitigate the attack impact, we propose an enhancement to the recursive resolver algorithm, MaxFetch(k), that prevents unnecessary proactive fetches. We implemented the MaxFetch(1) mitigation enhancement on a BIND resolver and tested it on real-world DNS query datasets. Our results show that MaxFetch(1) degrades neither the recursive resolver throughput nor its latency. Following the discovery of the attack, a responsible disclosure procedure was carried out, and several DNS vendors and public providers have issued a CVE and patched their systems.
Submitted 29 September, 2020; v1 submitted 18 May, 2020;
originally announced May 2020.
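The NXNSAttack abstract above describes MaxFetch(k) only as an enhancement that prevents unnecessary proactive fetches. One plausible reading, sketched below as an assumption rather than as the BIND patch, is that a resolver handling a referral looks up at most k of the name-server names it has no address for, instead of all of them. The helper `names_to_fetch` and the `.example` host names are hypothetical.

```python
def names_to_fetch(referral_ns_names, cached_addresses, k=None):
    """Return the name-server names a resolver would look up for one referral.

    Names that already have a cached address need no lookup. With k=None the
    resolver fetches every remaining name (the unrestricted behaviour the
    attack exploits); with an integer k it fetches at most k of them.
    """
    unresolved = [n for n in referral_ns_names if n not in cached_addresses]
    return unresolved if k is None else unresolved[:k]


# A malicious referral delegating to many non-existent name servers
# under the victim's domain (hypothetical names).
referral = [f"fake{i}.victim.example" for i in range(100)]
cache = {}  # no glue records, nothing cached

unrestricted = names_to_fetch(referral, cache)
max_fetch_1 = names_to_fetch(referral, cache, k=1)
```

In this toy model a single referral triggers 100 lookups toward the victim without the cap and one lookup with k = 1. The real amplification factor reported in the abstract depends on resolver behaviour that this sketch does not model.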
-
NFV-based IoT Security for Home Networks using MUD
Authors:
Yehuda Afek,
Anat Bremler-Barr,
David Hay,
Ran Goldschmidt,
Lior Shafir,
Gafnit Abraham,
Avraham Shalev
Abstract:
A new scalable ISP level system architecture to secure and protect all IoT devices in a large number of homes is presented. The system is based on whitelisting, as in the Manufacturer Usage Description (MUD) framework, implemented as a VNF. Unlike common MUD suggestions that place the whitelist application at the home/enterprise network, our approach is to place the enforcement upstream at the provider network, combining an NFV (Network Function Virtualization) with router/switching filtering capabilities, e.g., ACLs. The VNF monitors many home networks simultaneously, and therefore, is a highly-scalable managed service solution that provides both the end customers and the ISP with excellent visibility and security of the IoT devices at the customer premises.
The system includes a mechanism to distinguish between flows of different devices at the ISP level despite the fact that most home networks (and their IoT devices) are behind a NAT and all the flows from the same home come out with the same source IP address. Moreover, the NFV system needs to receive only the first packet of each connection at the VNF, and rules space is proportional to the number of unique types of IoT devices rather than the number of IoT devices. The monitoring part of the solution is off the critical path and can also uniquely protect from incoming DDoS attacks.
To cope with internal traffic, which is not visible outside the customer premises and often consists of P2P communication, we suggest a hybrid approach in which we deploy a lightweight component at the CPE whose sole purpose is to monitor P2P communication. As the current MUD solution does not secure P2P communication, we also extend the MUD protocol to handle peer-to-peer communicating devices. A PoC with a large national-level ISP demonstrates that our technology works as expected.
Submitted 1 November, 2019;
originally announced November 2019.
-
IoT or NoT: Identifying IoT Devices in a Short Time Scale
Authors:
Anat Bremler-Barr,
Haim Levy,
Zohar Yakhini
Abstract:
In recent years the number of IoT devices in home networks has increased dramatically. Whenever a new device connects to the network, it must be quickly managed and secured using the relevant security mechanism or QoS policy. Thus a key challenge is to distinguish between IoT and NoT devices in a matter of minutes. Unfortunately, there is no clear indication of whether a device in a network is an IoT. In this paper, we propose different classifiers that identify a device as IoT or non-IoT, in a short time scale, and with high accuracy.
Our classifiers were constructed using machine learning techniques on a seen (training) dataset and were tested on an unseen (test) dataset. They successfully classified devices that were not in the seen dataset with accuracy above 95%. The first classifier is a logistic regression classifier based on traffic features. The second classifier is based on features we retrieve from DHCP packets. Finally, we present a unified classifier that leverages the advantages of the other two classifiers. We focus on the home-network environment, but our classifiers are also applicable to enterprise networks.
Submitted 12 October, 2019;
originally announced October 2019.
-
Eradicating Attacks on the Internal Network with Internal Network Policy
Authors:
Yehuda Afek,
Anat Bremler-Barr,
Alon Noy
Abstract:
In this paper we present three attacks on private internal networks behind a NAT and a corresponding new protection mechanism, Internal Network Policy, to mitigate a wide range of attacks that penetrate internal networks behind a NAT. In the attack scenario, a victim is tricked into visiting the attacker's website, which contains a malicious script that lets the attacker access the victim's internal network in different ways, including opening a port in the NAT or sending a sophisticated request to local devices. The first attack utilizes DNS Rebinding in a particular way, while the other two demonstrate different methods of attacking the network, based on application security vulnerabilities. Following the attacks, we provide a new browser security policy, Internal Network Policy (INP), which protects against these types of vulnerabilities and attacks. This policy is implemented in the browser just like the Same Origin Policy (SOP) and prevents malicious access to internal resources by external entities.
Submitted 3 October, 2019; v1 submitted 2 October, 2019;
originally announced October 2019.
-
Detecting Heavy Flows in the SDN Match and Action Model
Authors:
Yehuda Afek,
Anat Bremler-Barr,
Shir Landau Feibish,
Liron Schiff
Abstract:
Efficient algorithms and techniques to detect and identify large flows in a high-throughput traffic stream in the SDN match-and-action model are presented. This is in contrast to previous work that either deviated from the match-and-action model by requiring additional switch-level capabilities or did not exploit the SDN data plane. Our construction has two parts: (a) how to sample in the SDN match-and-action model, and (b) how to detect large flows efficiently and scalably in the SDN model.
Our large flow detection methods provide high accuracy and present a good and practical tradeoff between switch-controller traffic and the number of entries required in the switch flow table. Based on different parameters, we differentiate between heavy flows, elephant flows and bulky flows and present efficient algorithms to detect flows of the different types.
Additionally, as part of our heavy flow detection scheme, we present sampling methods to sample packets with arbitrary probability $p$ per packet or per byte that traverses an SDN switch.
Finally, we show how our algorithms can be adapted to a distributed monitoring SDN setting with multiple switches, and easily scale with the number of monitoring switches.
Submitted 26 February, 2017;
originally announced February 2017.
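The abstract above mentions sampling packets with probability $p$ per packet or per byte. The following is a plain software sketch of what those two sampling notions could mean, not the match-and-action construction from the paper. It assumes that per-byte sampling selects a packet when at least one of its bytes is independently selected, so a packet of size $s$ is chosen with probability $1 - (1 - p)^s$; the function names are hypothetical.

```python
import random


def sample_per_packet(packet_sizes, p, rng):
    """Select each packet independently with probability p."""
    return [i for i, _ in enumerate(packet_sizes) if rng.random() < p]


def sample_per_byte(packet_sizes, p, rng):
    """Select a packet if any of its bytes would be sampled with probability p.

    A packet of `size` bytes is therefore selected with probability
    1 - (1 - p) ** size, so large packets are favoured.
    """
    return [
        i
        for i, size in enumerate(packet_sizes)
        if rng.random() < 1.0 - (1.0 - p) ** size
    ]


rng = random.Random(7)  # seeded so repeated runs give the same selection
sizes = [64, 64, 1500, 1500, 9000]

by_packet = sample_per_packet(sizes, 0.5, rng)
by_byte = sample_per_byte(sizes, 0.001, rng)
```

Per-packet sampling treats a 64-byte and a 9000-byte packet alike, whereas per-byte sampling makes the selection probability grow with packet size, which is the relevant notion when flows are ranked by volume rather than by packet count.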
-
Efficient Distinct Heavy Hitters for DNS DDoS Attack Detection
Authors:
Yehuda Afek,
Anat Bremler-Barr,
Edith Cohen,
Shir Landau Feibish,
Michal Shagam
Abstract:
Motivated by a recent new type of randomized Distributed Denial of Service (DDoS) attacks on the Domain Name Service (DNS), we develop novel and efficient distinct heavy hitters algorithms and build an attack identification system that uses our algorithms. Heavy hitter detection in streams is a fundamental problem with many applications, including detecting certain DDoS attacks and anomalies. A (classic) heavy hitter (HH) in a stream of elements is a key (e.g., the domain of a query) which appears in many elements (e.g., requests). When stream elements consist of <key; subkey> pairs (e.g., <domain; subdomain>), a distinct heavy hitter (dHH) is a key that is paired with a large number of different subkeys. Our dHH algorithms are considerably more practical than previous algorithms. Specifically, the new fixed-size algorithms are simple to code and have asymptotically optimal space-accuracy tradeoffs. In addition, we introduce a new measure, a combined heavy hitter (cHH), which is a key with a large combination of distinct and classic weights. Efficient algorithms are also presented for cHH detection. Finally, we perform extensive experimental evaluation on real DNS attack traces, demonstrating the effectiveness of both our algorithms and our DNS malicious queries identification system.
Submitted 8 December, 2016;
originally announced December 2016.
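The difference between a classic heavy hitter and a distinct heavy hitter, as defined in the abstract above, can be shown with an exact baseline. This is deliberately not the paper's fixed-size algorithm: it stores every subkey and so uses unbounded memory, which is precisely the cost the dHH algorithms are designed to avoid. The function names, thresholds, and `.example` domains are illustrative.

```python
from collections import Counter, defaultdict


def classic_heavy_hitters(stream, threshold):
    """Keys that appear in at least `threshold` stream elements."""
    counts = Counter(key for key, _ in stream)
    return {key: n for key, n in counts.items() if n >= threshold}


def distinct_heavy_hitters(stream, threshold):
    """Keys paired with at least `threshold` different subkeys (exact count)."""
    subkeys = defaultdict(set)
    for key, subkey in stream:
        subkeys[key].add(subkey)
    return {key: len(s) for key, s in subkeys.items() if len(s) >= threshold}


# <domain; subdomain> pairs: one domain queried with random subdomains,
# another queried repeatedly with the same subdomain.
stream = [
    ("victim.example", "a1"),
    ("victim.example", "b2"),
    ("victim.example", "c3"),
    ("popular.example", "www"),
    ("popular.example", "www"),
    ("popular.example", "www"),
    ("popular.example", "www"),
]

hh = classic_heavy_hitters(stream, threshold=4)
dhh = distinct_heavy_hitters(stream, threshold=3)
```

The repeatedly queried domain is a classic heavy hitter but not a distinct one, while the domain hit with many different subdomains is the reverse, which is the pattern a randomized-subdomain DNS attack produces.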
-
On the Dynamics of IP Address Allocation and Availability of End-Hosts
Authors:
Oded Argon,
Anat Bremler-Barr,
Osnat Mokryn,
Dvir Schirman,
Yuval Shavitt,
Udi Weinsberg
Abstract:
The availability of end-hosts and their assigned routable IP addresses has an impact on the ability to fight spammers and attackers, and on peer-to-peer application performance. Previous works study the availability of hosts mostly by using either active pinging or by studying access to a mail service; both approaches suffer from inherent inaccuracies. We take a different approach by measuring the IP addresses periodically reported by a uniquely identified group of the hosts running the DIMES agent. This fresh approach provides a chance to measure the true availability of end-hosts and the dynamics of their assigned routable IP addresses. Using a two-month study of 1804 hosts, we find that over 60% of the hosts have a fixed IP address and 90% median availability, while some of the remaining hosts have more than 30 different IPs. For those that have periodically changing IP addresses, we find that the median average period per AS is roughly 24 hours, with a strong relation between the offline time and the probability of altering IP address.
Submitted 10 November, 2010;
originally announced November 2010.