-
Network Function Capacity Reconnaissance by Remote Adversaries
Authors:
Aqsa Kashaf,
Aidan Walsh,
Maria Apostolaki,
Vyas Sekar,
Yuvraj Agarwal
Abstract:
There is anecdotal evidence that attackers use reconnaissance to learn the capacity of their victims before DDoS attacks to maximize their impact. The first step to mitigate capacity reconnaissance attacks is to understand their feasibility. However, the feasibility of capacity reconnaissance in network functions (NFs) (e.g., firewalls, NATs) is unknown. To this end, we formulate the problem of network function capacity reconnaissance (NFCR) and explore the feasibility of inferring the processing capacity of an NF while avoiding detection. We identify key factors that make NFCR challenging and analyze how these factors affect accuracy (measured as a divergence from ground truth) and stealthiness (measured in packets sent). We propose a flexible tool, NFTY, that performs NFCR and we evaluate two practical NFTY configurations to showcase the stealthiness vs. accuracy tradeoffs. We evaluate these strategies in controlled, Internet and/or cloud settings with commercial NFs. NFTY can accurately estimate the capacity of different NF deployments within 10% error in the controlled experiments and the Internet, and within 7% error for a commercial NF deployed in the cloud (AWS). Moreover, NFTY outperforms link-bandwidth estimation baselines by up to 30x.
Submitted 15 May, 2024;
originally announced May 2024.
-
Summary Statistic Privacy in Data Sharing
Authors:
Zinan Lin,
Shuaiqi Wang,
Vyas Sekar,
Giulia Fanti
Abstract:
We study a setting where a data holder wishes to share data with a receiver, without revealing certain summary statistics of the data distribution (e.g., mean, standard deviation). It achieves this by passing the data through a randomization mechanism. We propose summary statistic privacy, a metric for quantifying the privacy risk of such a mechanism based on the worst-case probability of an adversary guessing the distributional secret within some threshold. Defining distortion as a worst-case Wasserstein-1 distance between the real and released data, we prove lower bounds on the tradeoff between privacy and distortion. We then propose a class of quantization mechanisms that can be adapted to different data distributions. We show that the quantization mechanism's privacy-distortion tradeoff matches our lower bounds under certain regimes, up to small constant factors. Finally, we demonstrate on real-world datasets that the proposed quantization mechanisms achieve better privacy-distortion tradeoffs than alternative privacy mechanisms.
Submitted 27 October, 2023; v1 submitted 3 March, 2023;
originally announced March 2023.
-
CANE: A Cascade-Control Approach for Network-Assisted Video QoE Management
Authors:
Mehdi Hosseinzadeh,
Karthick Shankar,
Maria Apostolaki,
Jay Ramachandran,
Steven Adams,
Vyas Sekar,
Bruno Sinopoli
Abstract:
Prior efforts have shown that network-assisted schemes can improve the Quality-of-Experience (QoE) and QoE fairness when multiple video players compete for bandwidth. However, realizing network-assisted schemes in practice is challenging, as: i) the network has limited visibility into the client players' internal state and actions; ii) players' actions may nullify or negate the network's actions; and iii) the players' objectives might be conflicting. To address these challenges, we formulate network-assisted QoE optimization through a cascade control abstraction. This informs the design of CANE, a practical network-assisted QoE framework. CANE uses machine learning techniques to approximate each player's behavior as a black-box model and model predictive control to achieve a near-optimal solution. We evaluate CANE through realistic simulations and show that CANE improves multiplayer QoE fairness by ~50% compared to pure client-side adaptive bitrate algorithms and by ~20% compared to uniform traffic shaping.
Submitted 13 January, 2023;
originally announced January 2023.
-
Rethinking Data-driven Networking with Foundation Models: Challenges and Opportunities
Authors:
Franck Le,
Mudhakar Srivatsa,
Raghu Ganti,
Vyas Sekar
Abstract:
Foundational models have caused a paradigm shift in the way artificial intelligence (AI) systems are built. They have had a major impact in natural language processing (NLP), and several other domains, not only reducing the amount of required labeled data or even eliminating the need for it, but also significantly improving performance on a wide range of tasks. We argue foundation models can have a similar profound impact on network traffic analysis, and management. More specifically, we show that network data shares several of the properties that are behind the success of foundational models in linguistics. For example, network data contains rich semantic content, and several of the networking tasks (e.g., traffic classification, generation of protocol implementations from specification text, anomaly detection) can find similar counterparts in NLP (e.g., sentiment analysis, translation from natural language to code, out-of-distribution). However, network settings also present unique characteristics and challenges that must be overcome. Our contribution is in highlighting the opportunities and challenges at the intersection of foundation models and networking.
Submitted 11 November, 2022;
originally announced November 2022.
-
SPIDER: A Practical Fuzzing Framework to Uncover Stateful Performance Issues in SDN Controllers
Authors:
Ao Li,
Rohan Padhye,
Vyas Sekar
Abstract:
Performance issues in software-defined network (SDN) controllers can have serious impacts on the performance and availability of networks. We specifically consider stateful performance issues, where a sequence of initial input messages drives an SDN controller into a state such that its performance degrades pathologically when processing subsequent messages. We identify key challenges in applying canonical program analysis techniques: large input space of messages (e.g., stateful OpenFlow protocol), complex code base and software architecture (e.g., OSGi framework with dynamic launch), and the semantic dependencies between the internal state and external inputs. We design SPIDER, a practical fuzzing workflow that tackles these challenges and automatically uncovers such issues in SDN controllers. SPIDER's design entails a careful synthesis and extension of semantic fuzzing, performance fuzzing, and static analysis, taken together with domain-specific insights to tackle these challenges. We show that our design workflow is robust across two controllers -- ONOS and OpenDaylight -- with very different internal implementations. Using SPIDER, we were able to identify and confirm multiple stateful performance issues.
Submitted 8 September, 2022;
originally announced September 2022.
-
Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams
Authors:
Antonis Manousis,
Zhuo Cheng,
Ran Ben Basat,
Zaoxing Liu,
Vyas Sekar
Abstract:
Today's large-scale services (e.g., video streaming platforms, data centers, sensor grids) need diverse real-time summary statistics across multiple subpopulations of multidimensional datasets. However, state-of-the-art frameworks do not offer general and accurate analytics in real time at reasonable costs. The root cause is the combinatorial explosion of data subpopulations and the diversity of summary statistics we need to monitor simultaneously. We present Hydra, an efficient framework for multidimensional analytics that presents a novel combination of using a ``sketch of sketches'' to avoid the overhead of monitoring exponentially-many subpopulations and universal sketching to ensure accurate estimates for multiple statistics. We build Hydra as an Apache Spark plugin and address practical system challenges to minimize overheads at scale. Across multiple real-world and synthetic multidimensional datasets, we show that Hydra can achieve robust error bounds and is an order of magnitude more efficient in terms of operational cost and memory footprint than existing frameworks (e.g., Spark, Druid) while ensuring interactive estimation times.
Submitted 9 August, 2022;
originally announced August 2022.
-
On the Privacy Properties of GAN-generated Samples
Authors:
Zinan Lin,
Vyas Sekar,
Giulia Fanti
Abstract:
The privacy implications of generative adversarial networks (GANs) are a topic of great interest, leading to several recent algorithms for training GANs with privacy guarantees. By drawing connections to the generalization properties of GANs, we prove that under some assumptions, GAN-generated samples inherently satisfy some (weak) privacy guarantees. First, we show that if a GAN is trained on m samples and used to generate n samples, the generated samples are (epsilon, delta)-differentially-private for (epsilon, delta) pairs where delta scales as O(n/m). We show that under some special conditions, this upper bound is tight. Next, we study the robustness of GAN-generated samples to membership inference attacks. We model membership inference as a hypothesis test in which the adversary must determine whether a given sample was drawn from the training dataset or from the underlying data distribution. We show that this adversary can achieve an area under the ROC curve that scales no better than O(m^{-1/4}).
Submitted 2 June, 2022;
originally announced June 2022.
-
Initial non-invasive in vivo sensing of the lung using time domain diffuse optics
Authors:
Antonio Pifferi,
Massimo Miniati,
Andrea Farina,
Sanathana Konugolu Venkata Sekar,
Pranav Lanka,
Alberto Dalla Mora,
Paola Taroni
Abstract:
Non-invasive in vivo sensing of the lung with light would help diagnose and monitor pulmonary disorders (caused by e.g. COVID-19, emphysema, immature lung tissue in infants). We investigated the possibility to probe the lung with time domain diffuse optics, taking advantage of the increased depth (few cm) reached by photons detected after a long (few ns) propagation time. An initial study on 5 healthy volunteers included time-resolved broadband diffuse optical spectroscopy measurements at 3 cm source-detector distance over the 600-1100 nm range, and long-distance (6-9 cm) measurements at 820 nm performed during a breathing protocol. The interpretation of the in vivo data with a simplified homogeneous model yielded a maximum probing depth of 2.6-3.9 cm, suitable to reach the lung. Also, signal changes related to the inspiration act were observed, especially at high photon propagation times. Yet, intra- and inter-subject variability and inconsistencies, possibly alluding to competing scattering and absorption effects, prevented a simple interpretation. Aspects to be further investigated to gain a deeper insight are discussed.
Submitted 17 May, 2022;
originally announced May 2022.
-
Accurate near wall steady flow field prediction using Physics Informed Neural Network (PINN)
Authors:
Vinothkumar Sekar,
Qinghua Jiang,
Chang Shu,
Boo Cheong Khoo
Abstract:
In this paper, Physics Informed Neural Network (PINN) is explored in order to obtain flow predictions near the wall region accurately with measurements (or sampling points) away from the wall. Often, in fluid mechanics experiments, it is difficult to perform velocity measurements near the wall accurately. Therefore, the present study reveals a new and elegant approach to recover the flow solutions near the wall. Laminar boundary layer flow over a flat plate case is considered for this study in order to explore the ability of PINN to accurately predict the flow field. All the required sampling data for this study is obtained from CFD simulations. A wide range of Reynolds number cases from Re=500 to 100000 has been investigated. First, using PINN, the boundary layer solution is obtained with three different types of boundary conditions. Further, the influence of the location of the sampling points on the accuracy is analysed. From the velocity profiles and the skin friction coefficient distribution, it is clear that PINN results are reasonably accurate near the wall with only a few sampling points away from the wall. This approach has potential application in experiments to obtain the near wall solutions accurately with measurements away from the wall.
Submitted 7 April, 2022;
originally announced April 2022.
-
RareGAN: Generating Samples for Rare Classes
Authors:
Zinan Lin,
Hao Liang,
Giulia Fanti,
Vyas Sekar
Abstract:
We study the problem of learning generative adversarial networks (GANs) for a rare class of an unlabeled dataset subject to a labeling budget. This problem is motivated from practical applications in domains including security (e.g., synthesizing packets for DNS amplification attacks), systems and networking (e.g., synthesizing workloads that trigger high resource usage), and machine learning (e.g., generating images from a rare class). Existing approaches are unsuitable, either requiring fully-labeled datasets or sacrificing the fidelity of the rare class for that of the common classes. We propose RareGAN, a novel synthesis of three key ideas: (1) extending conditional GANs to use labelled and unlabelled data for better generalization; (2) an active learning approach that requests the most useful labels; and (3) a weighted loss function to favor learning the rare class. We show that RareGAN achieves a better fidelity-diversity tradeoff on the rare class than prior work across different applications, budgets, rare class fractions, GAN losses, and architectures.
Submitted 20 March, 2022;
originally announced March 2022.
-
A Roadmap for Enabling a Future-Proof In-Network Computing Data Plane Ecosystem
Authors:
Daehyeok Kim,
Nikita Lazarev,
Tommy Tracy,
Farzana Siddique,
Hun Namkung,
James C. Hoe,
Vyas Sekar,
Kevin Skadron,
Zhiru Zhang,
Srinivasan Seshan
Abstract:
As the vision of in-network computing becomes more mature, we see two parallel evolutionary trends. First, we see the evolution of richer, more demanding applications that require capabilities beyond programmable switching ASICs. Second, we see the evolution of diverse data plane technologies with many other future capabilities on the horizon. While some point solutions exist to tackle the intersection of these trends, we see several ecosystem-level disconnects today; e.g., the need to refactor applications for new data planes, lack of systematic guidelines to inform the development of future data plane capabilities, and lack of holistic runtime frameworks for network operators. In this paper, we use a simple-yet-instructive emerging application-data plane combination to highlight these disconnects. Drawing on these lessons, we sketch a high-level roadmap and guidelines for the community to tackle these to create a more thriving "future-proof" data plane ecosystem.
Submitted 8 November, 2021;
originally announced November 2021.
-
Pareto GAN: Extending the Representational Power of GANs to Heavy-Tailed Distributions
Authors:
Todd Huster,
Jeremy E. J. Cohen,
Zinan Lin,
Kevin Chan,
Charles Kamhoua,
Nandi Leslie,
Cho-Yu Jason Chiang,
Vyas Sekar
Abstract:
Generative adversarial networks (GANs) are often billed as "universal distribution learners", but precisely what distributions they can represent and learn is still an open question. Heavy-tailed distributions are prevalent in many different domains such as financial risk-assessment, physics, and epidemiology. We observe that existing GAN architectures do a poor job of matching the asymptotic behavior of heavy-tailed distributions, a problem that we show stems from their construction. Additionally, when faced with the infinite moments and large distances between outlier points that are characteristic of heavy-tailed distributions, common loss functions produce unstable or near-zero gradients. We address these problems with the Pareto GAN. A Pareto GAN leverages extreme value theory and the functional properties of neural networks to learn a distribution that matches the asymptotic behavior of the marginal distributions of the features. We identify issues with standard loss functions and propose the use of alternative metric spaces that enable stable and efficient learning. Finally, we evaluate our proposed approach on a variety of heavy-tailed datasets.
Submitted 22 January, 2021;
originally announced January 2021.
-
Sketchy With a Chance of Adoption: Can Sketch-Based Telemetry Be Ready for Prime Time?
Authors:
Zaoxing Liu,
Hun Namkung,
Anup Agarwal,
Antonis Manousis,
Peter Steenkiste,
Srinivasan Seshan,
Vyas Sekar
Abstract:
Sketching algorithms or sketches have emerged as a promising alternative to the traditional packet sampling-based network telemetry solutions. At a high level, they are attractive because of their high resource efficiency and accuracy guarantees. While there have been significant recent advances in various aspects of sketching for networking tasks, many fundamental challenges remain unsolved that are likely stumbling blocks for adoption. Our contribution in this paper is in identifying and formulating these research challenges across the ecosystem encompassing network operators, platform vendors/developers, and algorithm designers. We hope that these serve as a necessary fillip for the community to enable the broader adoption of sketch-based telemetry.
Submitted 10 December, 2020;
originally announced December 2020.
-
Why Spectral Normalization Stabilizes GANs: Analysis and Improvements
Authors:
Zinan Lin,
Vyas Sekar,
Giulia Fanti
Abstract:
Spectral normalization (SN) is a widely-used technique for improving the stability and sample quality of Generative Adversarial Networks (GANs). However, there is currently limited understanding of why SN is effective. In this work, we show that SN controls two important failure modes of GAN training: exploding and vanishing gradients. Our proofs illustrate a (perhaps unintentional) connection with the successful LeCun initialization. This connection helps to explain why the most popular implementation of SN for GANs requires no hyper-parameter tuning, whereas stricter implementations of SN have poor empirical performance out-of-the-box. Unlike LeCun initialization which only controls gradient vanishing at the beginning of training, SN preserves this property throughout training. Building on this theoretical understanding, we propose a new spectral normalization technique: Bidirectional Scaled Spectral Normalization (BSSN), which incorporates insights from later improvements to LeCun initialization: Xavier initialization and Kaiming initialization. Theoretically, we show that BSSN gives better gradient control than SN. Empirically, we demonstrate that it outperforms SN in sample quality and training stability on several benchmark datasets.
Submitted 7 April, 2021; v1 submitted 6 September, 2020;
originally announced September 2020.
-
Unleashing In-network Computing on Scientific Workloads
Authors:
Daehyeok Kim,
Ankush Jain,
Zaoxing Liu,
George Amvrosiadis,
Damian Hazen,
Bradley Settlemyer,
Vyas Sekar
Abstract:
Many recent efforts have shown that in-network computing can benefit various datacenter applications. In this paper, we explore a relatively less-explored domain which we argue can benefit from in-network computing: scientific workloads in high-performance computing. By analyzing canonical examples of HPC applications, we observe unique opportunities and challenges for exploiting in-network computing to accelerate scientific workloads. In particular, we find that the dynamic and demanding nature of scientific workloads is the major obstacle to the adoption of in-network approaches which are mostly open-loop and lack runtime feedback. In this paper, we present NSinC (Network-accelerated ScIeNtific Computing), an architecture for fully unleashing the potential benefits of in-network computing for scientific workloads by providing closed-loop runtime feedback to in-network acceleration services. We outline key challenges in realizing this vision and a preliminary design to enable acceleration for scientific applications.
Submitted 5 September, 2020;
originally announced September 2020.
-
Fighting Fire with Light: A Case for Defending DDoS Attacks Using the Optical Layer
Authors:
Matthew Hall,
Ramakrishnan Durairajan,
Vyas Sekar
Abstract:
The DDoS attack landscape is growing at an unprecedented pace. Inspired by the recent advances in optical networking, we make a case for optical layer-aware DDoS defense (O-LAD) in this paper. Our approach leverages the optical layer to isolate attack traffic rapidly via dynamic reconfiguration of (backup) wavelengths using ROADMs---bridging the gap between (a) evolution of the DDoS attack landscape and (b) innovations in the optical layer (e.g., reconfigurable optics). We show that the physical separation of traffic profiles allows finer-grained handling of suspicious flows and offers better performance for benign traffic in the face of an attack. We present preliminary results modeling throughput and latency for legitimate flows while scaling the strength of attacks. We also identify a number of open problems for the security, optical, and systems communities: modeling diverse DDoS attacks (e.g., fixed vs. variable rate, detectable vs. undetectable), building a full-fledged defense system with optical advancements (e.g., OpenConfig), and optical layer-aware defenses for a broader class of attacks (e.g., network reconnaissance).
Submitted 23 February, 2020;
originally announced February 2020.
-
Enhancing the Privacy of Federated Learning with Sketching
Authors:
Zaoxing Liu,
Tian Li,
Virginia Smith,
Vyas Sekar
Abstract:
In response to growing concerns about user privacy, federated learning has emerged as a promising tool to train statistical models over networks of devices while keeping data localized. Federated learning methods run training tasks directly on user devices and do not share the raw user data with third parties. However, current methods still share model updates, which may contain private information (e.g., one's weight and height), during the training process. Existing efforts that aim to improve the privacy of federated learning make compromises in one or more of the following key areas: performance (particularly communication cost), accuracy, or privacy. To better optimize these trade-offs, we propose that sketching algorithms have a unique advantage in that they can provide both privacy and performance benefits while maintaining accuracy. We evaluate the feasibility of sketching-based federated learning with a prototype on three representative learning models. Our initial findings show that it is possible to provide strong privacy guarantees for federated learning without sacrificing performance or accuracy. Our work highlights that there exists a fundamental connection between privacy and communication in distributed settings, and suggests important open problems surrounding the theoretical understanding, methodology, and system design of practical, private federated learning.
Submitted 5 November, 2019;
originally announced November 2019.
-
Privacy for Free: Communication-Efficient Learning with Differential Privacy Using Sketches
Authors:
Tian Li,
Zaoxing Liu,
Vyas Sekar,
Virginia Smith
Abstract:
Communication and privacy are two critical concerns in distributed learning. Many existing works treat these concerns separately. In this work, we argue that a natural connection exists between methods for communication reduction and privacy preservation in the context of distributed machine learning. In particular, we prove that Count Sketch, a simple method for data stream summarization, has inherent differential privacy properties. Using these derived privacy guarantees, we propose a novel sketch-based framework (DiffSketch) for distributed learning, where we compress the transmitted messages via sketches to simultaneously achieve communication efficiency and provable privacy benefits. Our evaluation demonstrates that DiffSketch can provide strong differential privacy guarantees (e.g., $\varepsilon$= 1) and reduce communication by 20-50x with only marginal decreases in accuracy. Compared to baselines that treat privacy and communication separately, DiffSketch improves absolute test accuracy by 5%-50% while offering the same privacy guarantees and communication compression.
Submitted 6 December, 2019; v1 submitted 3 November, 2019;
originally announced November 2019.
-
Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Authors:
Zinan Lin,
Alankar Jain,
Chen Wang,
Giulia Fanti,
Vyas Sekar
Abstract:
Limited data access is a longstanding barrier to data-driven research and development in the networked systems community. In this work, we explore if and how generative adversarial networks (GANs) can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge. As a specific target, our focus in this paper is on time series datasets with metadata (e.g., packet loss rate measurements with corresponding ISPs). We identify key challenges of existing GAN approaches for such workloads with respect to fidelity (e.g., long-term dependencies, complex multidimensional relationships, mode collapse) and privacy (i.e., existing guarantees are poorly understood and can sacrifice fidelity). To improve fidelity, we design a custom workflow called DoppelGANger (DG) and demonstrate that across diverse real-world datasets (e.g., bandwidth measurements, cluster requests, web sessions) and use cases (e.g., structural characterization, predictive modeling, algorithm comparison), DG achieves up to 43% better fidelity than baseline models. Although we do not resolve the privacy problem in this work, we identify fundamental challenges with both classical notions of privacy and recent advances to improve the privacy properties of GANs, and suggest a potential roadmap for addressing these challenges. By shedding light on the promise and challenges, we hope our work can rekindle the conversation on workflows for data sharing.
Submitted 16 January, 2021; v1 submitted 29 September, 2019;
originally announced September 2019.
-
affinity: A System for Latent User Similarity Comparison on Texting Data
Authors:
Tobias Eichinger,
Felix Beierle,
Sumsam Ullah Khan,
Robin Middelanis,
Veeraraghavan Sekar,
Sam Tabibzadeh
Abstract:
In the field of social networking services, finding similar users based on profile data is common practice. Smartphones harbor sensor and personal context data that can be used for user profiling. Yet one vast source of personal data, namely text messaging data, has hardly been studied for user profiling. We see three reasons for this: First, private text messaging data is not shared due to its intimate character. Second, the definition of an appropriate privacy-preserving similarity measure is non-trivial. Third, assessing the quality of a similarity measure on text messaging data representing a potentially infinite set of topics is non-trivial. In order to overcome these obstacles, we propose affinity, a system that assesses the similarity between text messaging histories of users reliably and efficiently in a privacy-preserving manner. Private texting data stays on user devices, and data for comparison is compared in a latent format that allows reconstruction of neither the comparison words nor any original private plain text. We evaluate our approach by calculating similarities between Twitter histories of 60 US senators. The resulting similarity network reaches an average 85.0% accuracy on a political party classification task.
Submitted 3 April, 2019;
originally announced April 2019.
-
Practical Verifiable In-network Filtering for DDoS defense
Authors:
Deli Gong,
Muoi Tran,
Shweta Shinde,
Hao **,
Vyas Sekar,
Prateek Saxena,
Min Suk Kang
Abstract:
In light of the ever-increasing scale and sophistication of modern DDoS attacks, it is time to revisit in-network filtering, or the idea of empowering DDoS victims to install in-network traffic filters in the upstream transit networks. Recent proposals show that filtering DDoS traffic at a handful of large transit networks can handle volumetric DDoS attacks effectively. However, the in-network filtering primitive can also be misused. Transit networks can use the in-network filtering service as an excuse for any arbitrary packet drops made for their own benefit. For example, transit networks may intentionally execute filtering services poorly or unfairly to discriminate against their competing neighbor ASes while claiming that they drop packets for the sake of DDoS defense. We argue that this is due to the lack of verifiable filtering, i.e., no one can check if a transit network executes the filter rules correctly as requested by the DDoS victims. To make in-network filtering a more robust defense primitive, in this paper we propose verifiable in-network filtering, called VIF, that exploits emerging hardware-based trusted execution environments (TEEs) and offers filtering verifiability to DDoS victims and neighbor ASes. Our proof of concept demonstrates that a VIF filter implementation on commodity servers with TEE support can handle traffic at line rate (e.g., 10 Gb/s) and execute up to 3,000 filter rules. We show that VIF can easily scale to handle larger traffic volume (e.g., 500 Gb/s) and more complex filtering operations (e.g., 150,000 filter rules) by parallelizing the TEE-based filters. As a practical deployment model, we suggest that Internet exchange points (IXPs) are the ideal candidates for the early adopters of our verifiable filters due to their central locations and flexible software-defined architecture.
Submitted 14 January, 2019; v1 submitted 3 January, 2019;
originally announced January 2019.
-
Oh, What a Fragile Web We Weave: Third-party Service Dependencies In Modern Webservices and Implications
Authors:
Aqsa Kashaf,
Carolina Zarate,
Hanrou Wang,
Yuvraj Agarwal,
Vyas Sekar
Abstract:
The recent October 2016 DDoS attack on Dyn served as a wakeup call to the security community as many popular and independent webservices (e.g., Twitter, Spotify) were impacted. This incident raises a larger question on the fragility of modern webservices due to their dependence on third-party services. In this paper, we characterize the dependencies of popular webservices on third-party services and how these can lead to DoS, RoQ attacks, and reduction in security posture. In particular, we focus on three critical infrastructure services: DNS, CDNs, and certificate authorities (CAs). We analyze both direct relationships (e.g., Twitter uses Dyn) and indirect dependencies (e.g., Netflix uses Symantec as OCSP and Symantec, in turn, uses Verisign for DNS). Our key findings are: (1) 73.14% of the top 100,000 popular services are vulnerable to reduction in availability due to potential attacks on third-party DNS, CDN, CA services that they exclusively rely on; (2) the use of third-party services is concentrated, so that if the top-10 providers of CDN, DNS and OCSP services go down, they can potentially impact 25%-46% of the top 100K most popular web services; (3) transitive dependencies significantly increase the set of webservices that exclusively depend on popular CDN and DNS service providers, in some cases by ten times; and (4) targeting even less popular webservices can potentially cause significant collateral damage, affecting up to 20% of the top-100K webservices due to their shared dependencies. Based on our findings, we present a number of key implications and guidelines to guard against such Internet-scale incidents in the future.
Submitted 21 June, 2018;
originally announced June 2018.
-
Vulnerabilities of Electric Vehicle Battery Packs to Cyberattacks
Authors:
Shashank Sripad,
Sekar Kulandaivel,
Vikram Pande,
Vyas Sekar,
Venkatasubramanian Viswanathan
Abstract:
Electric Vehicles (EVs), like all modern vehicles, are entirely controlled by electronic devices embedded within networks that are exposed to the threat of cyberattacks. Cyber vulnerabilities are magnified with EVs due to unique risks associated with EV battery packs. Current batteries have well-known issues with specific energy, cost and fire-related safety risks. In this study, we develop a systematic framework to assess the impact of cyberattacks on EVs. While the current focus of automotive cyberattacks is on short-term physical safety, it is crucial to consider long-term cyberattacks that aim to cause financial losses through accrued impact, especially in the context of EVs. Faulty components of battery management systems such as a compromised voltage regulator could lead to cyberattacks that can overdischarge or overcharge the battery. Overdischarge could lead to failures such as internal shorts in the timescale of minutes through cyberattacks that compromise energy-intensive EV subsystems like auxiliary components. Attacks that overcharge the pack could shorten the lifetime of a new battery pack to less than a year. Further, such attacks also pose physical safety risks via the triggering of thermal (fire) events. Attacks on auxiliary components lead to battery drain, which could be up to 20% of the state-of-charge per hour. Lastly, we develop a heuristic for the stealthiness of a cyberattack to augment traditional threat models. The methodology presented here will help in building the foundational principles of electric vehicle cybersecurity: a nascent but critical topic in the coming years.
Submitted 8 September, 2019; v1 submitted 1 November, 2017;
originally announced November 2017.
-
Shedding Light on the Adoption of Let's Encrypt
Authors:
Antonis Manousis,
Roy Ragsdale,
Ben Draffin,
Adwiteeya Agrawal,
Vyas Sekar
Abstract:
Let's Encrypt is a new entrant in the Certificate Authority ecosystem that offers free and automated certificate signing. It is visionary in its commitment to Certificate Transparency. In this paper, we shed light on the adoption patterns of Let's Encrypt "in the wild" and inform the future design and deployment of this exciting development in the security landscape. We analyze acquisition patterns of certificates as well as their usage and deployment trends in the real world. To this end, we analyze data from Certificate Transparency Logs containing records of more than 18 million certificates. We also leverage other sources like Censys, Alexa's historic records, Geolocation databases, and VirusTotal. We also perform active HTTPS measurements on the domains owning Let's Encrypt certificates. Our analysis of certificate acquisition shows that (1) the impact of Let's Encrypt is particularly visible in Western Europe; (2) Let's Encrypt has the potential to democratize HTTPS adoption in countries that are recent entrants to Internet adoption; (3) there is anecdotal evidence of popular domains quitting their previously untrustworthy or expensive CAs in order to transition to Let's Encrypt; and (4) there is a "heavy tailed" behavior where a small number of domains acquire a large number of certificates. With respect to usage, we find that: (1) only 54% of domains actually use the Let's Encrypt certificates they have procured; (2) there are many non-trivial incidents of server misconfigurations; and (3) there is early evidence of use of Let's Encrypt certificates for typosquatting and for malware-laden sites.
Submitted 2 November, 2016;
originally announced November 2016.
-
On the Efficiency and Fairness of Multiplayer HTTP-based Adaptive Video Streaming
Authors:
Xiaoqi Yin,
Mihovil Bartulović,
Vyas Sekar,
Bruno Sinopoli
Abstract:
User-perceived quality-of-experience (QoE) is critical in internet video delivery systems. Extensive prior work has studied the design of client-side bitrate adaptation algorithms to maximize single-player QoE. However, multiplayer QoE fairness becomes critical as the growth of video traffic makes it more likely that multiple players share a bottleneck in the network. Despite several recent proposals, there is still a series of open questions. In this paper, we bring the problem space to light from a control theory perspective by formalizing the multiplayer QoE fairness problem and addressing two key questions in the broader problem space. First, we derive the sufficient conditions of convergence to steady-state QoE fairness under a TCP-based bandwidth sharing scheme. Based on the insight from this analysis that in-network active bandwidth allocation is needed, we propose a non-linear MPC-based, router-assisted bandwidth allocation algorithm that regards each player as a closed-loop system. We use trace-driven simulation to show the improvement over existing approaches. We identify several research directions enabled by the control-theoretic modeling and envision that control theory can play an important role in guiding real system design in adaptive video streaming.
Submitted 29 August, 2016;
originally announced August 2016.
-
NetMemex: Providing Full-Fidelity Traffic Archival
Authors:
Hyeontaek Lim,
Vyas Sekar,
Yoshihisa Abe,
David G. Andersen
Abstract:
NetMemex explores efficient network traffic archival without any loss of information. Unlike NetFlow-like aggregation, NetMemex allows retrieving the entire packet data including full payload, which makes it useful in forensic analysis, networked and distributed system research, and network administration. Different from packet trace dumps, NetMemex performs sophisticated data compression for small storage space use and optimizes the data layout for fast query processing. NetMemex takes advantage of high-speed random access of flash drives and inexpensive storage space of hard disk drives. These efforts lead to a cost-effective yet high-performance full traffic archival system. We demonstrate that NetMemex can record full-fidelity traffic at near-Gbps rates using a single commodity machine, handling common queries at up to 90.1 K queries/second, at a low storage cost comparable to conventional hard disk-only traffic archival solutions.
Submitted 14 March, 2016;
originally announced March 2016.
-
A New Approach to DDoS Defense using SDN and NFV
Authors:
Seyed K. Fayaz,
Yoshiaki Tobioka,
Vyas Sekar,
Michael Bailey
Abstract:
Networks today rely on expensive and proprietary hardware appliances, which are deployed at fixed locations, for DDoS defense. This introduces key limitations with respect to flexibility (e.g., complex routing to get traffic to these "chokepoints") and elasticity in handling changing attack patterns. We observe an opportunity to address these limitations using new networking paradigms such as software-defined networking (SDN) and network functions virtualization (NFV). Based on this observation, we design and implement Bohatei, an elastic and flexible DDoS defense system. In designing Bohatei, we address key challenges of scalability, responsiveness, and adversary-resilience. We have implemented defenses for several well-known DDoS attacks in Bohatei. Our evaluations show that Bohatei is scalable (handling 500 Gbps attacks), responsive (mitigating attacks within one minute), and resilient to dynamic adversaries.
Submitted 5 August, 2015; v1 submitted 29 June, 2015;
originally announced June 2015.
-
Analyzing TCP Throughput Stability and Predictability with Implications for Adaptive Video Streaming
Authors:
Yi Sun,
Xiaoqi Yin,
Nanshu Wang,
Junchen Jiang,
Vyas Sekar,
Yun **,
Bruno Sinopoli
Abstract:
Recent work suggests that TCP throughput stability and predictability within a video viewing session can inform the design of better video bitrate adaptation algorithms. Despite a rich tradition of Internet measurement, however, our understanding of throughput stability and predictability is quite limited. To bridge this gap, we present a measurement study of throughput stability using a large-scale dataset from a video service provider. Drawing on this analysis, we propose a simple-but-effective prediction mechanism based on a hidden Markov model and demonstrate that it outperforms other approaches. We also show the practical implications in improving the user experience of adaptive video streaming.
Submitted 17 June, 2015;
originally announced June 2015.
-
Scalable Testing of Context-Dependent Policies over Stateful Data Planes with Armstrong
Authors:
Seyed K. Fayaz,
Yoshiaki Tobioka,
Sagar Chaki,
Vyas Sekar
Abstract:
Network operators today spend significant manual effort in ensuring and checking that the network meets their intended policies. While recent work in network verification has made giant strides to reduce this effort, it focuses on simple reachability properties and cannot handle context-dependent policies (e.g., how many connections has a host spawned) that operators realize using stateful network functions (NFs). Together, these introduce new expressiveness and scalability challenges that fall outside the scope of existing network verification mechanisms. To address these challenges, we present Armstrong, a system that enables operators to test whether a network with stateful data plane elements correctly implements a given context-dependent policy. Our design makes three key contributions to address expressiveness and scalability: (1) an abstract I/O unit for modeling network I/O that encodes policy-relevant context information; (2) a practical representation of complex NFs via an ensemble of finite state machines abstraction; and (3) a scalable application of symbolic execution to tackle state space explosion. We demonstrate that Armstrong is several orders of magnitude faster than existing mechanisms.
Submitted 8 June, 2015; v1 submitted 13 May, 2015;
originally announced May 2015.
-
DDA: Cross-Session Throughput Prediction with Applications to Video Bitrate Selection
Authors:
Junchen Jiang,
Vyas Sekar,
Yi Sun
Abstract:
User experience of video streaming could be greatly improved by selecting a high-yet-sustainable initial video bitrate, and it is therefore critical to accurately predict throughput before a video session starts. Inspired by previous studies that show similarity among throughput of similar sessions (e.g., those sharing the same bottleneck link), we argue for a cross-session prediction approach, where throughput measured on other sessions is used to predict the throughput of a new session. In this paper, we study the challenges of cross-session throughput prediction, develop an accurate throughput predictor called DDA, and evaluate the performance of the predictor with real-world datasets. We show that DDA can predict throughput more accurately than simple predictors and conventional machine learning algorithms; e.g., DDA's 80th-percentile prediction error is more than 50% lower than that of other algorithms. We also show that this improved accuracy enables video players to select a higher sustainable initial bitrate; e.g., compared to initial bitrate without prediction, DDA leads to 4x higher average bitrate.
Submitted 8 May, 2015;
originally announced May 2015.
-
Accelerating the Development of Software-Defined Network Optimization Applications Using SOL
Authors:
Victor Heorhiadi,
Michael K. Reiter,
Vyas Sekar
Abstract:
Software-defined networking (SDN) can enable diverse network management applications such as traffic engineering, service chaining, network function outsourcing, and topology reconfiguration. Realizing the benefits of SDN for these applications, however, entails addressing complex network optimizations that are central to these problems. Unfortunately, such optimization problems require significant manual effort and expertise to express and non-trivial computation and/or carefully crafted heuristics to solve. Our vision is to simplify the deployment of SDN applications using general high-level abstractions for capturing optimization requirements from which we can efficiently generate optimal solutions. To this end, we present SOL, a framework that demonstrates that it is indeed possible to simultaneously achieve generality and efficiency. The insight underlying SOL is that SDN applications can be recast within a unifying path-based optimization abstraction, from which it efficiently generates near-optimal solutions, and device configurations to implement those solutions. We illustrate the generality of SOL by prototyping diverse and new applications. We show that SOL simplifies the development of SDN-based network optimization applications and provides comparable or better scalability than custom optimization solutions.
Submitted 28 April, 2015;
originally announced April 2015.
-
A Framework to Quantify the Benefits of Network Functions Virtualization in Cellular Networks
Authors:
Zafar Ayyub Qazi,
Vyas Sekar,
Samir Das
Abstract:
Network functions virtualization (NFV) is an appealing vision that promises to dramatically reduce capital and operating expenses for cellular providers. However, existing efforts in this space leave open broad issues about how NFV deployments should be instantiated or how they should be provisioned. In this paper, we present an initial attempt at a framework that will help network operators systematically evaluate the potential benefits that different points in the NFV design space can offer.
Submitted 21 June, 2014;
originally announced June 2014.
-
Stratos: A Network-Aware Orchestration Layer for Virtual Middleboxes in Clouds
Authors:
Aaron Gember,
Anand Krishnamurthy,
Saul St. John,
Robert Grandl,
Xiaoyang Gao,
Ashok Anand,
Theophilus Benson,
Vyas Sekar,
Aditya Akella
Abstract:
Enterprises want their in-cloud services to leverage the performance and security benefits that middleboxes offer in traditional deployments. Such virtualized deployments create new opportunities (e.g., flexible scaling) as well as new challenges (e.g., dynamics, multiplexing) for middlebox management tasks such as service composition and provisioning. Unfortunately, enterprises lack systematic tools to efficiently compose and provision in-the-cloud middleboxes and thus fall short of achieving the benefits that cloud-based deployments can offer. To this end, we present the design and implementation of Stratos, an orchestration layer for virtual middleboxes. Stratos provides efficient and correct composition in the presence of dynamic scaling via software-defined networking mechanisms. It ensures efficient and scalable provisioning by combining middlebox-specific traffic engineering, placement, and horizontal scaling strategies. We demonstrate the effectiveness of Stratos using an experimental prototype testbed and large-scale simulations.
Submitted 11 March, 2014; v1 submitted 1 May, 2013;
originally announced May 2013.
-
Evolution of Social-Attribute Networks: Measurements, Modeling, and Implications using Google+
Authors:
Neil Zhenqiang Gong,
Wenchang Xu,
Ling Huang,
Prateek Mittal,
Emil Stefanov,
Vyas Sekar,
Dawn Song
Abstract:
Understanding social network structure and evolution has important implications for many aspects of network and system design including provisioning, bootstrapping trust and reputation systems via social networks, and defenses against Sybil attacks. Several recent results suggest that augmenting the social network structure with user attributes (e.g., location, employer, communities of interest) can provide a more fine-grained understanding of social networks. However, there have been few studies to provide a systematic understanding of these effects at scale. We bridge this gap using a unique dataset collected as the Google+ social network grew over time since its release in late June 2011. We observe novel phenomena with respect to both standard social network metrics and new attribute-related metrics (that we define). We also observe interesting evolutionary patterns as Google+ went from a bootstrap phase to a steady invitation-only stage before a public release. Based on our empirical observations, we develop a new generative model to jointly reproduce the social structure and the node attributes. Using theoretical analysis and empirical evaluations, we show that our model can accurately reproduce the social and attribute structure of real social networks. We also demonstrate that our model provides more accurate predictions for practical application contexts.
Submitted 18 September, 2012; v1 submitted 4 September, 2012;
originally announced September 2012.
-
CARE: Content Aware Redundancy Elimination for Disaster Communications on Damaged Networks
Authors:
Udi Weinsberg,
Athula Balachandran,
Nina Taft,
Gianluca Iannaccone,
Vyas Sekar,
Srinivasan Seshan
Abstract:
During a disaster scenario, situational awareness information, such as location, physical status and images of the surrounding area, is essential for minimizing loss of life, injury, and property damage. Today's handhelds make it easy for people to gather data from within the disaster area in many formats, including text, images and video. Studies show that the extreme anxiety induced by disasters causes humans to create a substantial amount of repetitive and redundant content. Transporting this content outside the disaster zone can be problematic when the network infrastructure is disrupted by the disaster.
This paper presents the design of a novel architecture called CARE (Content-Aware Redundancy Elimination) for better utilizing network resources in disaster-affected regions. Motivated by measurement-driven insights on redundancy patterns found in real-world disaster area photos, we demonstrate that CARE can detect the semantic similarity between photos in the networking layer, thus reducing redundant transfers and improving buffer utilization. Using DTN simulations, we explore the boundaries of the usefulness of deploying CARE on a damaged network, and show that CARE can reduce packet delivery times and drops, and enables 20-40% more unique information to reach the rescue teams outside the disaster area than when CARE is not deployed.
Submitted 8 June, 2012;
originally announced June 2012.