Search | arXiv e-print repository

Characterizing the VPN Ecosystem in the Wild

Authors: Aniss Maghsoudlou, Lukas Vermeulen, Ingmar Poese, Oliver Gasser

Abstract: With the shift to working remotely after the COVID-19 pandemic, the use of Virtual Private Networks (VPNs) around the world has nearly doubled. Therefore, measuring the traffic and security aspects of the VPN ecosystem is more important now than ever. It is, however, challenging to detect and characterize VPN traffic, since some VPN protocols use the same port number as web traffic, so port-based traffic classification will not help. VPN users are also concerned about the vulnerabilities of their VPN connections due to privacy issues. In this paper, we aim to detect and characterize VPN servers in the wild, which facilitates detecting VPN traffic. To this end, we perform Internet-wide active measurements to find VPN servers in the wild, and characterize them based on their vulnerabilities, certificates, locations, and fingerprinting. We find 9.8M VPN servers distributed around the world using OpenVPN, SSTP, PPTP, and IPsec, and analyze their vulnerability. We find SSTP to be the most vulnerable protocol, with more than 90% of detected servers being vulnerable to TLS downgrade attacks. Of all the servers that respond to our VPN probes, 2% also respond to HTTP probes and are therefore classified as Web servers. We apply our list of VPN servers to the traffic from a large European ISP and observe that 2.6% of all traffic is related to these VPN servers.

Submitted 13 February, 2023; originally announced February 2023.

Comments: Code and data available at https://vpnecosystem.github.io/

Journal ref: Proceedings of the Passive and Active Measurement Conference 2023 (PAM '23)
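The VPN paper above reports applying its list of VPN servers to ISP traffic and finding that 2.6% of all traffic is related to those servers. A minimal sketch of that kind of list-based attribution follows, assuming a simplified flow record with source and destination addresses and a byte count. The `Flow` type and `vpn_traffic_share` function are hypothetical illustrations, not the authors' released code or a real NetFlow/IPFIX schema.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Iterable


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified flow record (not a real NetFlow/IPFIX schema)."""
    src: str
    dst: str
    num_bytes: int


def vpn_traffic_share(flows: Iterable[Flow], vpn_servers: Iterable[str]) -> float:
    """Return the fraction of bytes in flows that have a known VPN server
    as either source or destination (0.0 if there is no traffic)."""
    servers = {ip_address(s) for s in vpn_servers}
    total = matched = 0
    for f in flows:
        total += f.num_bytes
        # A flow counts as VPN-related if either endpoint is on the list.
        if ip_address(f.src) in servers or ip_address(f.dst) in servers:
            matched += f.num_bytes
    return matched / total if total else 0.0
```

For example, with one 300-byte flow to a listed server and one 700-byte flow elsewhere, the function returns a share of about 0.3. Matching on exact addresses is the simplest possible choice; it ignores, for instance, servers whose addresses change over time.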

arXiv:2211.05682

doi 10.1145/3555050.3569135

FlowDNS: Correlating Netflow and DNS Streams at Scale

Authors: Aniss Maghsoudlou, Oliver Gasser, Ingmar Poese, Anja Feldmann

Abstract: Knowing customers' interests, e.g., which Video-On-Demand (VoD) or Social Network services they are using, helps telecommunication companies plan their networks better, enhancing performance exactly where customers' interests lie, and offer customers relevant commercial packages. However, with the increasing deployment of CDNs by different services, identifying and attributing traffic based on network-layer information alone becomes a challenge: if multiple services use the same CDN provider, they cannot be easily distinguished based on IP prefixes alone. Therefore, it is crucial to go beyond pure network-layer information for traffic attribution. In this work, we leverage real-time DNS responses gathered by the clients' default DNS resolvers. By correlating these DNS responses with network-layer headers, we are able to translate CDN-hosted domains to the actual services they belong to. We design a correlation system for this purpose and deploy it at a large European ISP. With our system, we can correlate an average of 81.7% of the traffic with the corresponding services, without any loss on our live data streams. Our correlation results also show that 0.5% of the daily traffic contains malformed, spamming, or phishing domain names. Moreover, ISPs can correlate the results with their BGP information to find more details about the origin and destination of the traffic. We plan to publish our correlation software for other researchers or network operators to use.

Submitted 10 November, 2022; originally announced November 2022.
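FlowDNS attributes CDN-hosted traffic by correlating DNS responses with network-layer headers. The sketch below illustrates the basic idea under simplifying assumptions: an in-memory table maps each answered IP address to the most recently resolved name until its TTL expires, and each flow is looked up by its source address (downstream traffic) and then by its destination. The `DnsAnswer`, `FlowRecord`, and `DnsFlowCorrelator` names are hypothetical; this is not the FlowDNS implementation, and it ignores shared CDN addresses, stream rates, and loss-free processing.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DnsAnswer:
    ts: float   # time the response was observed (seconds)
    name: str   # queried domain name
    ip: str     # address from an A/AAAA record
    ttl: int    # record TTL in seconds


@dataclass(frozen=True)
class FlowRecord:
    ts: float
    src_ip: str
    dst_ip: str
    num_bytes: int


class DnsFlowCorrelator:
    """Hypothetical in-memory correlator: IP -> (domain, expiry time)."""

    def __init__(self) -> None:
        self._table: Dict[str, Tuple[str, float]] = {}

    def add_answer(self, ans: DnsAnswer) -> None:
        # Most recent answer wins; a simplification, since CDN IPs are shared.
        self._table[ans.ip] = (ans.name, ans.ts + ans.ttl)

    def lookup(self, ip: str, ts: float) -> Optional[str]:
        entry = self._table.get(ip)
        if entry is None:
            return None
        name, expiry = entry
        # Treat the mapping as valid only until its TTL has elapsed.
        return name if ts <= expiry else None

    def correlate(self, flow: FlowRecord) -> Optional[str]:
        # Downstream traffic has the server as source; fall back to destination.
        return self.lookup(flow.src_ip, flow.ts) or self.lookup(flow.dst_ip, flow.ts)
```

After `add_answer(DnsAnswer(0.0, "video.example", "192.0.2.10", 300))`, a flow from `192.0.2.10` at t=120 correlates to `"video.example"`, while the same flow at t=400 returns `None` because the TTL has expired. Keeping only the latest name per IP is the main weakness of this toy; one plausible refinement, which is our assumption rather than the paper's design, would be to scope entries per client.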

arXiv:2010.13120

doi 10.1109/TNSM.2020.3034278

Exploring Network-Wide Flow Data with Flowyager

Authors: Said Jawad Saidi, Aniss Maghsoudlou, Damien Foucard, Georgios Smaragdakis, Ingmar Poese, Anja Feldmann

Abstract: Many network operations, ranging from attack investigation and mitigation to traffic management, require answering network-wide flow queries in seconds. Although flow records are collected at each router using available traffic capture utilities, querying the resulting datasets from hundreds of routers across sites and over time remains a significant challenge due to the sheer traffic volume and the distributed nature of flow records. In this paper, we investigate how to improve the response time for a priori unknown network-wide queries. We present Flowyager, a system that is built on top of existing traffic capture utilities. Flowyager generates and analyzes tree data structures called Flowtrees, which are succinct summaries of the raw flow data available from capture utilities. Flowtrees are self-adjusting data structures that drastically reduce space and transfer requirements, by 75% to 95%, compared to raw flow records. Flowyager manages the storage and transfer of Flowtrees, supports Flowtree operators, and provides a structured query language for answering flow queries across sites and time periods. By deploying a Flowyager prototype at both a large Internet Exchange Point and a Tier-1 Internet Service Provider, we showcase its capabilities for networks with hundreds of router interfaces. Our results show that the query response time can be reduced by an order of magnitude when compared with alternative data analytics platforms.
Thus, Flowyager enables interactive network-wide queries and offers unprecedented drill-down capabilities to, e.g., identify DDoS culprits, pinpoint the involved sites, and determine the length of the attack.

Submitted 27 October, 2020; v1 submitted 25 October, 2020; originally announced October 2020.

Comments: Accepted at IEEE TNSM journal; DOI added

arXiv:2010.01380

Predicting traffic overflows on private peering

Authors: Elad Rapaport, Ingmar Poese, Polina Zilberman, Oliver Holschke, Rami Puzis

Abstract: Large content providers and content distribution network operators usually connect with large Internet service providers (eyeball networks) through dedicated private peering. The capacity of these private network interconnects is provisioned to match the volume of the real content demand by the users. Unfortunately, in case of a surge in traffic demand, for example, due to content trending in a certain country, the capacity of the private interconnect may deplete and the content provider/distributor would have to reroute the excess traffic through transit providers. Although such overflow events are rare, they have significant negative impacts on content providers, Internet service providers, and end-users. These include unexpected delays and disruptions that reduce the quality of user experience, as well as direct costs paid by the Internet service provider to the transit providers. If traffic overflow events could be predicted, Internet service providers would be able to influence the routes chosen for the excess traffic to reduce costs and increase user experience quality. In this article, we propose a method based on an ensemble of deep learning models to predict overflow events over a short-term horizon of 2-6 hours and to predict the specific interconnections that will ingress the overflow traffic. The method was evaluated with 2.5 years of traffic measurement data from a large European Internet service provider, resulting in a true-positive rate of 0.8 while maintaining a false-positive rate of 0.05.
The lockdown imposed due to the COVID-19 pandemic reduced the overflow prediction accuracy. Nevertheless, starting from the end of April 2020 with the gradual lockdown release, the old models trained before the pandemic perform equally well.

Submitted 3 October, 2020; originally announced October 2020.

arXiv:2008.10959

doi 10.1145/3419394.3423658

The Lockdown Effect: Implications of the COVID-19 Pandemic on Internet Traffic

Authors: Anja Feldmann, Oliver Gasser, Franziska Lichtblau, Enric Pujol, Ingmar Poese, Christoph Dietzel, Daniel Wagner, Matthias Wichtlhuber, Juan Tapiador, Narseo Vallina-Rodriguez, Oliver Hohlfeld, Georgios Smaragdakis

Abstract: Due to the COVID-19 pandemic, many governments imposed lockdowns that forced hundreds of millions of citizens to stay at home. The implementation of confinement measures increased Internet traffic demands of residential users, in particular for remote working, entertainment, commerce, and education, which, as a result, caused traffic shifts in the Internet core. In this paper, using data from a diverse set of vantage points (one ISP, three IXPs, and one metropolitan educational network), we examine the effect of these lockdowns on traffic shifts. We find that the traffic volume increased by 15-20% almost within a week; while still modest overall, this constitutes a large increase within this short time period. However, despite this surge, we observe that the Internet infrastructure is able to handle the new volume, as most traffic shifts occur outside of traditional peak hours. When looking directly at the traffic sources, it turns out that, while hypergiants still contribute a significant fraction of traffic, we see (1) a higher increase in traffic of non-hypergiants, and (2) traffic increases in applications that people use when at home, such as Web conferencing, VPN, and gaming. While many networks see increased traffic demands, in particular those providing services to residential users, academic networks experience major overall decreases. Yet, in these networks, we can observe substantial increases when considering applications associated with remote working and lecturing.

Submitted 5 October, 2020; v1 submitted 25 August, 2020; originally announced August 2020.

Journal ref: Proceedings of the 2020 Internet Measurement Conference (IMC '20)

arXiv:2008.07370

doi 10.1145/3405837.3411378

Corona-Warn-App: Tracing the Start of the Official COVID-19 Exposure Notification App for Germany

Authors: Jens Helge Reelfs, Oliver Hohlfeld, Ingmar Poese

Abstract: On June 16, 2020, Germany launched an open-source smartphone contact tracing app ("Corona-Warn-App") to help trace SARS-CoV-2 (coronavirus) infection chains. It uses a decentralized, privacy-preserving design based on the Exposure Notification APIs, in which a centralized server is only used to distribute a list of keys of SARS-CoV-2-infected users that is fetched by the app once per day. Its success, however, depends on its adoption. In this poster, we characterize the early adoption of the app using Netflow traces captured directly at its hosting infrastructure. We show that the app generated traffic from all over Germany, already on the first day. We further observe that local COVID-19 outbreaks do not result in noticeable traffic increases.

Submitted 25 July, 2020; originally announced August 2020.

Comments: To appear at ACM SIGCOMM 2020 Posters

arXiv:1909.07455

doi 10.1145/3355369.3355590

DDoS Hide & Seek: On the Effectiveness of a Booter Services Takedown

Authors: Daniel Kopp, Matthias Wichtlhuber, Ingmar Poese, Jair Santanna, Oliver Hohlfeld, Christoph Dietzel

Abstract: Booter services continue to provide popular DDoS-as-a-service platforms and enable anyone, irrespective of their technical ability, to execute DDoS attacks with devastating impact. Since booters are a serious threat to Internet operations and can cause significant financial and reputational damage, they also draw the attention of law enforcement agencies and related counter activities. In this paper, we investigate booter-based DDoS attacks in the wild and the impact of an FBI takedown targeting 15 booter websites in December 2018 from the perspective of a major IXP and two ISPs. We study and compare attack properties of multiple booter services by launching Gbps-level attacks against our own infrastructure. To understand spatial and temporal trends of the DDoS traffic originating from booters, we scrutinize 5 months' worth of inter-domain traffic. We observe that the takedown only leads to a temporary reduction in attack traffic. Additionally, one booter was found to quickly continue operation by using a new domain for its website.

Submitted 16 September, 2019; originally announced September 2019.

arXiv:1810.02978

doi 10.1145/3278532.3278567

Dissecting Apple's Meta-CDN during an iOS Update

Authors: Jeremias Blendin, Fabrice Bendfeldt, Ingmar Poese, Boris Koldehofe, Oliver Hohlfeld

Abstract: Content delivery networks (CDNs) contribute more than 50% of today's Internet traffic. Meta-CDNs, an evolution of centrally controlled CDNs, promise increased flexibility by multihoming content. So far, efforts to understand the characteristics of Meta-CDNs focus mainly on third-party Meta-CDN services. A common, but unexplored, use case for Meta-CDNs is to use the CDN's mapping infrastructure to form self-operated Meta-CDNs integrating third-party CDNs. These CDNs assist in the build-up phase of a CDN's infrastructure or mitigate capacity shortages by offloading traffic. This paper investigates the Apple CDN as a prominent example of self-operated Meta-CDNs. We describe the involved CDNs and the request-mapping mechanism, and show the cache locations of the Apple CDN using measurements of more than 800 RIPE Atlas probes worldwide. We further measure its load-sharing behavior by observing a major iOS update in Sep. 2017, a significant event potentially reaching up to an estimated 1 billion iOS devices. Furthermore, by analyzing data from a European Eyeball ISP, we quantify third-party traffic offloading effects and find that third-party CDNs increase their traffic by 438% while saturating seemingly unrelated links.

Submitted 6 October, 2018; originally announced October 2018.

Comments: 2018 ACM Internet Measurement Conference (IMC '18). 6 pages

arXiv:1801.05168

doi 10.1007/978-3-319-76481-8_19

A First Look at QUIC in the Wild

Authors: Jan Rüth, Ingmar Poese, Christoph Dietzel, Oliver Hohlfeld

Abstract: For the first time since the establishment of TCP and UDP, the Internet transport layer is subject to a major change by the introduction of QUIC. Initiated by Google in 2012, QUIC provides a reliable, connection-oriented, low-latency, and fully encrypted transport. In this paper, we provide the first broad assessment of QUIC usage in the wild. We monitor the entire IPv4 address space since August 2016 and about 46% of the DNS namespace to detect QUIC-capable infrastructures. Our scans show that the number of QUIC-capable IPs has more than tripled since then to over 617.59K. We find around 161K domains hosted on QUIC-enabled infrastructure, but only 15K of them present valid certificates over QUIC. Second, we analyze one year of traffic traces provided by MAWI, and one day of traffic from a major European tier-1 ISP and from a large IXP, to understand the dominance of QUIC in the Internet traffic mix. We find QUIC to account for 2.6% to 9.1% of the current Internet traffic, depending on the vantage point. This share is dominated by Google, which pushes up to 42.1% of its traffic via QUIC.

Submitted 24 February, 2019; v1 submitted 16 January, 2018; originally announced January 2018.

Journal ref: Passive Active Measurements Conference (PAM), 2018

arXiv:1202.1464

Content-aware Traffic Engineering

Authors: Benjamin Frank, Ingmar Poese, Georgios Smaragdakis, Steve Uhlig, Anja Feldmann

Abstract: Today, a large fraction of Internet traffic originates from Content Providers (CPs) such as content distribution networks and hyper-giants. To cope with the increasing demand for content, CPs deploy massively distributed infrastructures. This poses new challenges for CPs, as they have to dynamically map end-users to appropriate servers without being fully aware of network conditions within an ISP or of the end-users' network locations. Furthermore, ISPs struggle to cope with rapid traffic shifts caused by the dynamic server selection process of CPs. In this paper, we argue that the challenges that CPs and ISPs face separately today can be turned into an opportunity. We show how they can jointly take advantage of the deployed distributed infrastructures to improve their operation and end-user performance. We propose Content-aware Traffic Engineering (CaTE), which dynamically adapts the traffic demand for content hosted on CPs by utilizing ISP network information and end-user location during the server selection process. As a result, CPs enhance their end-user-to-server mapping and improve end-user experience, thanks to the ability of network-informed server selection to circumvent network bottlenecks. In addition, ISPs gain the ability to partially influence the traffic demands in their networks.
Our results with operational data show improvements in path length and delay between end-user and the assigned CP server, network-wide traffic reduction of up to 15%, and a decrease in ISP link utilization of up to 40% when applying CaTE to traffic delivered by a small number of major CPs.

Submitted 7 February, 2012; originally announced February 2012.

Comments: Also appears as TU-Berlin technical report 2012-3, ISSN: 1436-9915
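CaTE's core idea, as described in the abstract, is network-informed server selection: the ISP contributes network information so that end-users are mapped to servers along better paths. A minimal sketch, assuming the ISP exposes a cost per pair of points of presence (PoPs), for example hop count or delay, and that each candidate server attaches at a known PoP. The function and its inputs are hypothetical and do not reproduce the paper's algorithm.

```python
import math
from typing import Mapping, Optional, Tuple


def select_server(
    user_pop: str,
    candidates: Mapping[str, str],
    path_cost: Mapping[Tuple[str, str], float],
) -> Optional[str]:
    """Return the candidate server with the lowest ISP path cost from user_pop.

    candidates maps a server id to the PoP where it attaches (hypothetical);
    path_cost maps (from_pop, to_pop) to a cost such as hops or delay.
    Servers with no known path are treated as unreachable (infinite cost),
    so None is returned if no candidate is reachable.
    """
    best, best_cost = None, math.inf
    for server, pop in candidates.items():
        cost = path_cost.get((user_pop, pop), math.inf)
        if cost < best_cost:  # strict comparison: ties keep the first candidate
            best, best_cost = server, cost
    return best
```

For example, with `candidates = {"s1": "FRA", "s2": "BER"}` and `path_cost = {("BER", "FRA"): 5, ("BER", "BER"): 1}`, a user at `"BER"` is mapped to `"s2"`. A single static cost per PoP pair is a deliberate simplification; the abstract describes CaTE as adapting dynamically to network conditions, which this sketch does not model.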

Showing 1–10 of 10 results for author: Poese, I