-
Feasibility of State Space Models for Network Traffic Generation
Authors:
Andrew Chu,
Xi Jiang,
Shinan Liu,
Arjun Bhagoji,
Francesco Bronzino,
Paul Schmitt,
Nick Feamster
Abstract:
Many problems in computer networking rely on parsing collections of network traces (e.g., traffic prioritization, intrusion detection). Unfortunately, the availability and utility of these collections is limited due to privacy concerns, data staleness, and low representativeness. While methods for generating data to augment collections exist, they often fall short in replicating the quality of rea…
▽ More
Many problems in computer networking rely on parsing collections of network traces (e.g., traffic prioritization, intrusion detection). Unfortunately, the availability and utility of these collections is limited due to privacy concerns, data staleness, and low representativeness. While methods for generating data to augment collections exist, they often fall short in replicating the quality of real-world traffic In this paper, we i) survey the evolution of traffic simulators/generators and ii) propose the use of state-space models, specifically Mamba, for packet-level, synthetic network trace generation by modeling it as an unsupervised sequence generation problem. Early evaluation shows that state-space models can generate synthetic network traffic with higher statistical similarity to real traffic than the state-of-the-art. Our approach thus has the potential to reliably generate realistic, informative synthetic network traces for downstream tasks.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Spatial Models for Crowdsourced Internet Access Network Performance Measurements
Authors:
Taveesh Sharma,
Paul Schmitt,
Francesco Bronzino,
Nick Feamster,
Nicole Marwell
Abstract:
Despite significant investments in access network infrastructure, universal access to high-quality Internet connectivity remains a challenge. Policymakers often rely on large-scale, crowdsourced measurement datasets to assess the distribution of access network performance across geographic areas. These decisions typically rest on the assumption that Internet performance is uniformly distributed wi…
▽ More
Despite significant investments in access network infrastructure, universal access to high-quality Internet connectivity remains a challenge. Policymakers often rely on large-scale, crowdsourced measurement datasets to assess the distribution of access network performance across geographic areas. These decisions typically rest on the assumption that Internet performance is uniformly distributed within predefined social boundaries, such as zip codes, census tracts, or community areas. However, this assumption may not be valid for two reasons: (1) crowdsourced measurements often exhibit non-uniform sampling densities within geographic areas; and (2) predefined social boundaries may not align with the actual boundaries of Internet infrastructure.
In this paper, we model Internet performance as a spatial process. We apply and evaluate a series of statistical techniques to: (1) aggregate Internet performance over a geographic region; (2) overlay interpolated maps with various sampling boundary choices; and (3) spatially cluster boundary units to identify areas with similar performance characteristics. We evaluated the effectiveness of these using a 17-month-long crowdsourced dataset from Ookla Speedtest. We evaluate several leading interpolation methods at varying spatial scales. Further, we examine the similarity between the resulting boundaries for smaller realizations of the dataset. Our findings suggest that our combination of techniques achieves a 56% gain in similarity score over traditional methods that rely on aggregates over raw measurement values for performance summarization. Our work highlights an urgent need for more sophisticated strategies in understanding and addressing Internet access disparities.
△ Less
Submitted 21 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
Optimal Flow Admission Control in Edge Computing via Safe Reinforcement Learning
Authors:
A. Fox,
F. De Pellegrini,
F. Faticanti,
E. Altman,
F. Bronzino
Abstract:
With the uptake of intelligent data-driven applications, edge computing infrastructures necessitate a new generation of admission control algorithms to maximize system performance under limited and highly heterogeneous resources. In this paper, we study how to optimally select information flows which belong to different classes and dispatch them to multiple edge servers where deployed applications…
▽ More
With the uptake of intelligent data-driven applications, edge computing infrastructures necessitate a new generation of admission control algorithms to maximize system performance under limited and highly heterogeneous resources. In this paper, we study how to optimally select information flows which belong to different classes and dispatch them to multiple edge servers where deployed applications perform flow analytic tasks. The optimal policy is obtained via constrained Markov decision process (CMDP) theory accounting for the demand of each edge application for specific classes of flows, the constraints on computing capacity of edge servers and of the access network.
We develop DR-CPO, a specialized primal-dual Safe Reinforcement Learning (SRL) method which solves the resulting optimal admission control problem by reward decomposition. DR-CPO operates optimal decentralized control and mitigates effectively state-space explosion while preserving optimality. Compared to existing Deep Reinforcement Learning (DRL) solutions, extensive results show that DR-CPO achieves 15\% higher reward on a wide variety of environments, while requiring on average only 50\% of the amount of learning episodes to converge. Finally, we show how to match DR-CPO and load-balancing to dispatch optimally information streams to available edge servers and further improve system performance.
△ Less
Submitted 28 June, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
CATO: End-to-End Optimization of ML-Based Traffic Analysis Pipelines
Authors:
Gerry Wan,
Shinan Liu,
Francesco Bronzino,
Nick Feamster,
Zakir Durumeric
Abstract:
Machine learning has shown tremendous potential for improving the capabilities of network traffic analysis applications, often outperforming simpler rule-based heuristics. However, ML-based solutions remain difficult to deploy in practice. Many existing approaches only optimize the predictive performance of their models, overlooking the practical challenges of running them against network traffic…
▽ More
Machine learning has shown tremendous potential for improving the capabilities of network traffic analysis applications, often outperforming simpler rule-based heuristics. However, ML-based solutions remain difficult to deploy in practice. Many existing approaches only optimize the predictive performance of their models, overlooking the practical challenges of running them against network traffic in real time. This is especially problematic in the domain of traffic analysis, where the efficiency of the serving pipeline is a critical factor in determining the usability of a model. In this work, we introduce CATO, a framework that addresses this problem by jointly optimizing the predictive performance and the associated systems costs of the serving pipeline. CATO leverages recent advances in multi-objective Bayesian optimization to efficiently identify Pareto-optimal configurations, and automatically compiles end-to-end optimized serving pipelines that can be deployed in real networks. Our evaluations show that compared to popular feature optimization techniques, CATO can provide up to 3600x lower inference latency and 3.7x higher zero-loss throughput while simultaneously achieving better model performance.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
NetDiffusion: Network Data Augmentation Through Protocol-Constrained Traffic Generation
Authors:
Xi Jiang,
Shinan Liu,
Aaron Gember-Jacobson,
Arjun Nitin Bhagoji,
Paul Schmitt,
Francesco Bronzino,
Nick Feamster
Abstract:
Datasets of labeled network traces are essential for a multitude of machine learning (ML) tasks in networking, yet their availability is hindered by privacy and maintenance concerns, such as data staleness. To overcome this limitation, synthetic network traces can often augment existing datasets. Unfortunately, current synthetic trace generation methods, which typically produce only aggregated flo…
▽ More
Datasets of labeled network traces are essential for a multitude of machine learning (ML) tasks in networking, yet their availability is hindered by privacy and maintenance concerns, such as data staleness. To overcome this limitation, synthetic network traces can often augment existing datasets. Unfortunately, current synthetic trace generation methods, which typically produce only aggregated flow statistics or a few selected packet attributes, do not always suffice, especially when model training relies on having features that are only available from packet traces. This shortfall manifests in both insufficient statistical resemblance to real traces and suboptimal performance on ML tasks when employed for data augmentation. In this paper, we apply diffusion models to generate high-resolution synthetic network traffic traces. We present NetDiffusion, a tool that uses a finely-tuned, controlled variant of a Stable Diffusion model to generate synthetic network traffic that is high fidelity and conforms to protocol specifications. Our evaluation demonstrates that packet captures generated from NetDiffusion can achieve higher statistical similarity to real data and improved ML model performance than current state-of-the-art approaches (e.g., GAN-based approaches). Furthermore, our synthetic traces are compatible with common network analysis tools and support a myriad of network tasks, suggesting that NetDiffusion can serve a broader spectrum of network analysis and testing tasks, extending beyond ML-centric applications.
△ Less
Submitted 12 October, 2023;
originally announced October 2023.
-
Internet Localization of Multi-Party Relay Users: Inherent Friction Between Internet Services and User Privacy
Authors:
Sean Flynn,
Francesco Bronzino,
Paul Schmitt
Abstract:
Internet privacy is increasingly important on the modern Internet. Users are looking to control the trail of data that they leave behind on the systems that they interact with. Multi-Party Relay (MPR) architectures lower the traditional barriers to adoption of privacy enhancing technologies on the Internet. MPRs are unique from legacy architectures in that they are able to offer privacy guarantees…
▽ More
Internet privacy is increasingly important on the modern Internet. Users are looking to control the trail of data that they leave behind on the systems that they interact with. Multi-Party Relay (MPR) architectures lower the traditional barriers to adoption of privacy enhancing technologies on the Internet. MPRs are unique from legacy architectures in that they are able to offer privacy guarantees without paying significant performance penalties. Apple's iCloud Private Relay is a recently deployed MPR service, creating the potential for widespread consumer adoption of the architecture. However, many current Internet-scale systems are designed based on assumptions that may no longer hold for users of privacy enhancing systems like Private Relay. There are inherent tensions between systems that rely on data about users -- estimated location of a user based on their IP address, for example -- and the trend towards a more private Internet.
This work studies a core function that is widely used to control network and application behavior, IP geolocation, in the context of iCloud Private Relay usage. We study the location accuracy of popular IP geolocation services compared against the published location dataset that Apple publicly releases to explicitly aid in geolocating PR users. We characterize geolocation service performance across a number of dimensions, including different countries, IP version, infrastructure provider, and time. Our findings lead us to conclude that existing approaches to IP geolocation (e.g., frequently updated databases) perform inadequately for users of the MPR architecture. For example, we find median location errors >1,000 miles in some countries for IPv4 addresses using IP2Location. Our findings lead us to conclude that new, privacy-focused, techniques for inferring user location may be required as privacy becomes a default user expectation on the Internet.
△ Less
Submitted 8 July, 2023;
originally announced July 2023.
-
AC-DC: Adaptive Ensemble Classification for Network Traffic Identification
Authors:
Xi Jiang,
Shinan Liu,
Saloua Naama,
Francesco Bronzino,
Paul Schmitt,
Nick Feamster
Abstract:
Accurate and efficient network traffic classification is important for many network management tasks, from traffic prioritization to anomaly detection. Although classifiers using pre-computed flow statistics (e.g., packet sizes, inter-arrival times) can be efficient, they may experience lower accuracy than techniques based on raw traffic, including packet captures. Past work on representation lear…
▽ More
Accurate and efficient network traffic classification is important for many network management tasks, from traffic prioritization to anomaly detection. Although classifiers using pre-computed flow statistics (e.g., packet sizes, inter-arrival times) can be efficient, they may experience lower accuracy than techniques based on raw traffic, including packet captures. Past work on representation learning-based classifiers applied to network traffic captures has shown to be more accurate, but slower and requiring considerable additional memory resources, due to the substantial costs in feature preprocessing. In this paper, we explore this trade-off and develop the Adaptive Constraint-Driven Classification (AC-DC) framework to efficiently curate a pool of classifiers with different target requirements, aiming to provide comparable classification performance to complex packet-capture classifiers while adapting to varying network traffic load.
AC-DC uses an adaptive scheduler that tracks current system memory availability and incoming traffic rates to determine the optimal classifier and batch size to maximize classification performance given memory and processing constraints. Our evaluation shows that AC-DC improves classification performance by more than 100% compared to classifiers that rely on flow statistics alone; compared to the state-of-the-art packet-capture classifiers, AC-DC achieves comparable performance (less than 12.3% lower in F1-Score), but processes traffic over 150x faster.
△ Less
Submitted 22 February, 2023;
originally announced February 2023.
-
LEAF: Navigating Concept Drift in Cellular Networks
Authors:
Shinan Liu,
Francesco Bronzino,
Paul Schmitt,
Arjun Nitin Bhagoji,
Nick Feamster,
Hector Garcia Crespo,
Timothy Coyle,
Brian Ward
Abstract:
Operational networks commonly rely on machine learning models for many tasks, including detecting anomalies, inferring application performance, and forecasting demand. Yet, model accuracy can degrade due to concept drift, whereby the relationship between the features and the target to be predicted changes. Mitigating concept drift is an essential part of operationalizing machine learning models in…
▽ More
Operational networks commonly rely on machine learning models for many tasks, including detecting anomalies, inferring application performance, and forecasting demand. Yet, model accuracy can degrade due to concept drift, whereby the relationship between the features and the target to be predicted changes. Mitigating concept drift is an essential part of operationalizing machine learning models in general, but is of particular importance in networking's highly dynamic deployment environments. In this paper, we first characterize concept drift in a large cellular network for a major metropolitan area in the United States. We find that concept drift occurs across many important key performance indicators (KPIs), independently of the model, training set size, and time interval -- thus necessitating practical approaches to detect, explain, and mitigate it. We then show that frequent model retraining with newly available data is not sufficient to mitigate concept drift, and can even degrade model accuracy further. Finally, we develop a new methodology for concept drift mitigation, Local Error Approximation of Features (LEAF). LEAF works by detecting drift; explaining the features and time intervals that contribute the most to drift; and mitigates it using forgetting and over-sampling. We evaluate LEAF against industry-standard mitigation approaches (notably, periodic retraining) with more than four years of cellular KPI data. Our initial tests with a major cellular provider in the US show that LEAF consistently outperforms periodic and triggered retraining on complex, real-world data while reducing costly retraining operations.
△ Less
Submitted 2 February, 2023; v1 submitted 7 September, 2021;
originally announced September 2021.
-
Characterizing Service Provider Response to the COVID-19 Pandemic in the United States
Authors:
Shinan Liu,
Paul Schmitt,
Francesco Bronzino,
Nick Feamster
Abstract:
The COVID-19 pandemic has resulted in dramatic changes to the daily habits of billions of people. Users increasingly have to rely on home broadband Internet access for work, education, and other activities. These changes have resulted in corresponding changes to Internet traffic patterns. This paper aims to characterize the effects of these changes with respect to Internet service providers in the…
▽ More
The COVID-19 pandemic has resulted in dramatic changes to the daily habits of billions of people. Users increasingly have to rely on home broadband Internet access for work, education, and other activities. These changes have resulted in corresponding changes to Internet traffic patterns. This paper aims to characterize the effects of these changes with respect to Internet service providers in the United States. We study three questions: (1)How did traffic demands change in the United States as a result of the COVID-19 pandemic?; (2)What effects have these changes had on Internet performance?; (3)How did service providers respond to these changes? We study these questions using data from a diverse collection of sources. Our analysis of interconnection data for two large ISPs in the United States shows a 30-60% increase in peak traffic rates in the first quarter of 2020. In particular, we observe traffic downstream peak volumes for a major ISP increase of 13-20% while upstream peaks increased by more than 30%. Further, we observe significant variation in performance across ISPs in conjunction with the traffic volume shifts, with evident latency increases after stay-at-home orders were issued, followed by a stabilization of traffic after April. Finally, we observe that in response to changes in usage, ISPs have aggressively augmented capacity at interconnects, at more than twice the rate of normal capacity augmentation. Similarly, video conferencing applications have increased their network footprint, more than doubling their advertised IP address space.
△ Less
Submitted 1 November, 2020;
originally announced November 2020.
-
On the Deployability of Augmented Reality Using Embedded Edge Devices
Authors:
Ayoub Ben-Ameur,
Andrea Araldo,
Francesco Bronzino
Abstract:
Edge Computing exploits computational capabilities deployed at the very edge of the network to support applications with low latency requirements. Such capabilities can reside in small embedded devices that integrate dedicated hardware -- e.g., a GPU -- in a low cost package. But these devices have limited computing capabilities compared to standard server grade equipment. When deploying an Edge C…
▽ More
Edge Computing exploits computational capabilities deployed at the very edge of the network to support applications with low latency requirements. Such capabilities can reside in small embedded devices that integrate dedicated hardware -- e.g., a GPU -- in a low cost package. But these devices have limited computing capabilities compared to standard server grade equipment. When deploying an Edge Computing based application, understanding whether the available hardware can meet target requirements is key in meeting the expected performance. In this paper, we study the feasibility of deploying Augmented Reality applications using Embedded Edge Devices (EEDs). We compare such deployment approach to one exploiting a standard dedicated server grade machine. Starting from an empirical evaluation of the capabilities of these devices, we propose a simple theoretical model to compare the performance of the two approaches. We then validate such model with NS-3 simulations and study their feasibility. Our results show that there is no one-fits-all solution. If we need to deploy high responsiveness applications, we need a centralized server grade architecture and we can in any case only support very few users. The centralized architecture fails to serve a larger number of users, even when low to mid responsiveness is required. In this case, we need to resort instead to a distributed deployment based on EEDs.
△ Less
Submitted 7 January, 2021; v1 submitted 28 October, 2020;
originally announced October 2020.
-
Traffic Refinery: Cost-Aware Data Representation for Machine Learning on Network Traffic
Authors:
Francesco Bronzino,
Paul Schmitt,
Sara Ayoubi,
Hyojoon Kim,
Renata Teixeira,
Nick Feamster
Abstract:
Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, and the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus…
▽ More
Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, and the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus, the design and evaluation of these models ultimately requires understanding not only model accuracy but also the systems costs associated with deploying the model in an operational network. Towards this goal, this paper develops a new framework and system that enables a joint evaluation of both the conventional notions of machine learning performance (e.g., model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for two practical network management tasks, video streaming quality inference and malware detection, to demonstrate the importance of exploring different representations to find the appropriate operating point. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept implementation that both monitors network traffic at 10 Gbps and transforms traffic in real time to produce a variety of feature representations for machine learning. Traffic Refinery both highlights this design space and makes it possible to explore different representations for learning, balancing systems costs related to feature extraction and model training against model accuracy.
△ Less
Submitted 7 June, 2021; v1 submitted 27 October, 2020;
originally announced October 2020.
-
Inferring Streaming Video Quality from Encrypted Traffic: Practical Models and Deployment Experience
Authors:
Paul Schmitt,
Francesco Bronzino,
Sara Ayoubi,
Guilherme Martins,
Renata Teixeira,
Nick Feamster
Abstract:
Inferring the quality of streaming video applications is important for Internet service providers, but the fact that most video streams are encrypted makes it difficult to do so. We develop models that infer quality metrics (\ie, startup delay and resolution) for encrypted streaming video services. Our paper builds on previous work, but extends it in several ways. First, the model works in deploym…
▽ More
Inferring the quality of streaming video applications is important for Internet service providers, but the fact that most video streams are encrypted makes it difficult to do so. We develop models that infer quality metrics (\ie, startup delay and resolution) for encrypted streaming video services. Our paper builds on previous work, but extends it in several ways. First, the model works in deployment settings where the video sessions and segments must be identified from a mix of traffic and the time precision of the collected traffic statistics is more coarse (\eg, due to aggregation). Second, we develop a single composite model that works for a range of different services (i.e., Netflix, YouTube, Amazon, and Twitch), as opposed to just a single service. Third, unlike many previous models, the model performs predictions at finer granularity (\eg, the precise startup delay instead of just detecting short versus long delays) allowing to draw better conclusions on the ongoing streaming quality. Fourth, we demonstrate the model is practical through a 16-month deployment in 66 homes and provide new insights about the relationships between Internet "speed" and the quality of the corresponding video streams, for a variety of services; we find that higher speeds provide only minimal improvements to startup delay and resolution.
△ Less
Submitted 14 August, 2019; v1 submitted 17 January, 2019;
originally announced January 2019.
-
Exploiting Network Awareness to Enhance DASH Over Wireless
Authors:
Francesco Bronzino,
Dragoslav Stojadinovic,
Cedric Westphal,
Dipankar Raychaudhuri
Abstract:
The introduction of Dynamic Adaptive Streaming over HTTP (DASH) helped reduce the consumption of resource in video delivery, but its client-based rate adaptation is unable to optimally use the available end-to-end network bandwidth. We consider the problem of optimizing the delivery of video content to mobile clients while meeting the constraints imposed by the available network resources. Observi…
▽ More
The introduction of Dynamic Adaptive Streaming over HTTP (DASH) helped reduce the consumption of resource in video delivery, but its client-based rate adaptation is unable to optimally use the available end-to-end network bandwidth. We consider the problem of optimizing the delivery of video content to mobile clients while meeting the constraints imposed by the available network resources. Observing the bandwidth available in the network's two main components, core network, transferring the video from the servers to edge nodes close to the client, and the edge network, which is in charge of transferring the content to the user, via wireless links, we aim to find an optimal solution by exploiting the predictability of future user requests of sequential video segments, as well as the knowledge of available infrastructural resources at the core and edge wireless networks in a given future time window. Instead of regarding the bottleneck of the end-to-end connection as our throughput, we distribute the traffic load over time and use intermediate nodes between the server and the client for buffering video content to achieve higher throughput, and ultimately significantly improve the Quality of Experience for the end user in comparison with current solutions.
△ Less
Submitted 18 January, 2015;
originally announced January 2015.