
Bias in Internet Measurement Platforms

Authors: Pavlos Sermpezis, Lars Prehn, Sofia Kostoglou, Marcel Flores, Athena Vakali, Emile Aben

Abstract: Network operators and researchers frequently use Internet measurement platforms (IMPs), such as RIPE Atlas, RIPE RIS, or RouteViews for, e.g., monitoring network performance, detecting routing events, topology discovery, or route optimization. To interpret the results of their measurements and avoid pitfalls or wrong generalizations, users must understand a platform's limitations. To this end, this paper studies an important limitation of IMPs, the \textit{bias}, which exists due to the non-uniform deployment of the vantage points. Specifically, we introduce a generic framework to systematically and comprehensively quantify the multi-dimensional (e.g., across location, topology, network types, etc.) biases of IMPs. Using the framework and open datasets, we perform a detailed analysis of biases in IMPs that confirms well-known (to the domain experts) biases and sheds light on less-known or unexplored biases. To facilitate IMP users to obtain awareness of and explore bias in their measurements, as well as further research and analyses (e.g., methods for mitigating bias), we publicly share our code and data, and provide online tools (API, Web app, etc.) that calculate and visualize the bias in measurement setups.

Submitted 24 July, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

arXiv:2307.09819 [pdf]

Analyzing large scale political discussions on Twitter: the use case of the Greek wiretapping scandal (#ypoklopes)

Authors: Ilias Dimitriadis, Dimitrios P. Giakatos, Stelios Karamanidis, Pavlos Sermpezis, Kelly Kiki, Athena Vakali

Abstract: In this paper, we study the Greek wiretappings scandal, which has been revealed in 2022 and attracted a lot of attention by press and citizens. Specifically, we propose a methodology for collecting data and analyzing patterns of online public discussions on Twitter. We apply our methodology to the Greek wiretappings use case, and present findings related to the evolution of the discussion over time, its polarization, and the role of the media. The methodology can be of wider use and replicated to other topics. Finally, we provide publicly an open dataset, and online resources with the results.

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2303.15592 [pdf, other]

doi 10.1145/3610914

Uncovering Bias in Personal Informatics

Authors: Sofia Yfantidou, Pavlos Sermpezis, Athena Vakali, Ricardo Baeza-Yates

Abstract: Personal informatics (PI) systems, powered by smartphones and wearables, enable people to lead healthier lifestyles by providing meaningful and actionable insights that break down barriers between users and their health information. Today, such systems are used by billions of users for monitoring not only physical activity and sleep but also vital signs and women's and heart health, among others. Despite their widespread usage, the processing of sensitive PI data may suffer from biases, which may entail practical and ethical implications. In this work, we present the first comprehensive empirical and analytical study of bias in PI systems, including biases in raw data and in the entire machine learning life cycle. We use the most detailed framework to date for exploring the different sources of bias and find that biases exist both in the data generation and the model learning and implementation streams. According to our results, the most affected minority groups are users with health issues, such as diabetes, joint issues, and hypertension, and female users, whose data biases are propagated or even amplified by learning models, while intersectional biases can also be observed.

Submitted 19 July, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

Report number: Volume: 7 Number: 3, Article: 139

Journal ref: IMWUT 2023

arXiv:2303.06478 [pdf, other]

PyPoll: A python library automating mining of networks, discussions and polarization on Twitter

Authors: Dimitrios Panteleimon Giakatos, Pavlos Sermpezis, Athena Vakali

Abstract: Today online social networks have a high impact in our society as more and more people use them for communicating with each other, expressing their opinions, participating in public discussions, etc. In particular, Twitter is one of the most popular social network platforms people mainly use for political discussions. This attracted the interest of many research studies that analyzed social phenomena on Twitter, by collecting data, analysing communication patterns, and exploring the structure of user networks. While previous works share many common methodologies for data collection and analysis, these are mainly re-implemented every time by researchers in a custom way. In this paper, we introduce PyPoll, an open-source Python library that operationalizes common analysis tasks for Twitter discussions. With PyPoll users can perform Twitter graph mining, calculate the polarization index and generate interactive visualizations without needing third-party tools. We believe that PyPoll can help researchers automate their tasks by giving them methods that are easy to use. Also, we demonstrate the use of the library by presenting two use cases: the PyPoll visualization app, an online application for graph visualizing and sharing, and the Political Lighthouse, a Web portal for displaying the polarization in various political topics on Twitter.

Submitted 11 March, 2023; originally announced March 2023.

arXiv:2210.14189 [pdf, other]

Benchmarking Graph Neural Networks for Internet Routing Data

Authors: Dimitrios Panteleimon Giakatos, Sofia Kostoglou, Pavlos Sermpezis, Athena Vakali

Abstract: The Internet is composed of networks, called Autonomous Systems (or, ASes), interconnected to each other, thus forming a large graph. While the AS-graph is known and there is a multitude of data available for the ASes (i.e., node attributes), research on applying graph machine learning (ML) methods on Internet data has not attracted a lot of attention. In this work, we provide a benchmarking framework aiming to facilitate research on Internet data using graph-ML and graph neural network (GNN) methods. Specifically, we compile a dataset with heterogeneous node/AS attributes by collecting data from multiple online sources, and preprocessing them so that they can be easily used as input in GNN architectures. Then, we create a framework/pipeline for applying GNNs on the compiled data. For a set of tasks, we perform a benchmarking of different GNN models (as well as non-GNN ML models) to test their efficiency; our results can serve as a common baseline for future research and provide initial insights for the application of GNNs on Internet data.

Submitted 25 October, 2022; originally announced October 2022.

arXiv:2206.01421 [pdf, other]

doi 10.1145/3511047.3538029

12 Years of Self-tracking for Promoting Physical Activity from a User Diversity Perspective: Taking Stock and Thinking Ahead

Authors: Sofia Yfantidou, Pavlos Sermpezis, Athena Vakali

Abstract: Despite the indisputable personal and societal benefits of regular physical activity, a large portion of the population does not follow the recommended guidelines, harming their health and wellness. The World Health Organization has called upon governments, practitioners, and researchers to accelerate action to address the global prevalence of physical inactivity. To this end, an emerging wave of research in ubiquitous computing has been exploring the potential of interactive self-tracking technology in encouraging positive health behavior change. Numerous findings indicate the benefits of personalization and inclusive design regarding increasing the motivational appeal and overall effectiveness of behavior change systems, with the ultimate goal of empowering and facilitating people to achieve their goals. However, most interventions still adopt a "one-size-fits-all" approach to their design, assuming equal effectiveness for all system features in spite of individual and collective user differences. To this end, we analyze a corpus of 12 years of research in self-tracking technology for health behavior change, focusing on physical activity, to identify those design elements that have proven most effective in inciting desirable behavior across diverse population segments. We then provide actionable recommendations for designing and evaluating behavior change self-tracking technology based on age, gender, occupation, fitness, and health condition. Finally, we engage in a critical commentary on the diversity of the domain and discuss ethical concerns surrounding tailored interventions and directions for moving forward.

Submitted 3 June, 2022; originally announced June 2022.

arXiv:2110.00772 [pdf, ps, other]

Network Friendly Recommendations: Optimizing for Long Viewing Sessions

Authors: Theodoros Giannakas, Pavlos Sermpezis, Thrasyvoulos Spyropoulos

Abstract: Caching algorithms try to predict content popularity, and place the content closer to the users. Additionally, nowadays requests are increasingly driven by recommendation systems (RS). These important trends point to the following: \emph{make RSs favor locally cached content}; this way, operators reduce network costs, and users get better streaming rates. Nevertheless, this process should preserve the quality of the recommendations (QoR). In this work, we propose a Markov Chain model for a stochastic, recommendation-driven \emph{sequence} of requests, and formulate the problem of selecting high quality recommendations that minimize the network cost \emph{in the long run}. While the original optimization problem is non-convex, it can be convexified through a series of transformations. Moreover, we extend our framework for users who show preference in some positions of the recommendations' list. To the best of our knowledge, this is the first work to provide an optimal polynomial-time algorithm for these problems. Finally, testing our algorithms on real datasets suggests significant potential, e.g., $2\times$ improvement compared to baseline recommendations, and 80\% compared to a greedy network-friendly-RS (which optimizes the cost for I.I.D. requests), while preserving at least 90\% of the original QoR. Finally, we show that taking position preference into account leads to additional performance gains.

Submitted 2 October, 2021; originally announced October 2021.

Journal ref: IEEE Transactions on Mobile Computing, 2021

arXiv:2109.02358 [pdf, other]

Pointspectrum: Equivariance Meets Laplacian Filtering for Graph Representation Learning

Authors: Marinos Poiitis, Pavlos Sermpezis, Athena Vakali

Abstract: Graph Representation Learning (GRL) has become essential for modern graph data mining and learning tasks. GRL aims to capture the graph's structural information and exploit it in combination with node and edge attributes to compute low-dimensional representations. While Graph Neural Networks (GNNs) have been used in state-of-the-art GRL architectures, they have been shown to suffer from over smoothing when many GNN layers need to be stacked. In a different GRL approach, spectral methods based on graph filtering have emerged addressing over smoothing; however, up to now, they employ traditional neural networks that cannot efficiently exploit the structure of graph data. Motivated by this, we propose PointSpectrum, a spectral method that incorporates a set equivariant network to account for a graph's structure. PointSpectrum enhances the efficiency and expressiveness of spectral methods, while it outperforms or competes with state-of-the-art GRL methods. Overall, PointSpectrum addresses over smoothing by employing a graph filter and captures a graph's structure through set equivariance, lying on the intersection of GNNs and spectral methods. Our findings are promising for the benefits and applicability of this architectural shift for spectral methods and GRL.

Submitted 7 September, 2021; v1 submitted 6 September, 2021; originally announced September 2021.

Comments: 13 pages, 8 figures, 6 tables

arXiv:2105.02346 [pdf, ps, other]

Estimating the Impact of BGP Prefix Hijacking

Authors: Pavlos Sermpezis, Vasileios Kotronis, Konstantinos Arakadakis, Athena Vakali

Abstract: BGP prefix hijacking is a critical threat to the resilience and security of communications in the Internet. While several mechanisms have been proposed to prevent, detect or mitigate hijacking events, it has not been studied how to accurately quantify the impact of an ongoing hijack. When detecting a hijack, existing methods do not estimate how many networks in the Internet are affected (before and/or after its mitigation). In this paper, we study fundamental and practical aspects of the problem of estimating the impact of an ongoing hijack through network measurements. We derive analytical results for the involved trade-offs and limits, and investigate the performance of different measurement approaches (control/data-plane measurements) and use of public measurement infrastructure. Our findings provide useful insights for the design of accurate hijack impact estimation methodologies. Based on these insights, we design (i) a lightweight and practical estimation methodology that employs ping measurements, and (ii) an estimator that employs public infrastructure measurements and eliminates correlations between them to improve the accuracy. We validate the proposed methodologies and findings against results from hijacking experiments we conduct in the real Internet.

Submitted 5 May, 2021; originally announced May 2021.

Comments: IFIP Networking conference 2021

arXiv:2104.11483 [pdf, ps, other]

doi 10.1145/3592621

14 Years of Self-Tracking Technology for mHealth -- Literature Review: Lessons Learnt and the PAST SELF Framework

Authors: Sofia Yfantidou, Pavlos Sermpezis, Athena Vakali

Abstract: In today's connected society, many people rely on mHealth and self-tracking (ST) technology to help them adopt healthier habits with a focus on breaking their sedentary lifestyle and staying fit. However, there is scarce evidence of such technological interventions' effectiveness, and there are no standardized methods to evaluate their impact on people's physical activity (PA) and health. This work aims to help ST practitioners and researchers by empowering them with systematic guidelines and a framework for designing and evaluating technological interventions to facilitate health behavior change (HBC) and user engagement (UE), focusing on increasing PA and decreasing sedentariness. To this end, we conduct a literature review of 129 papers between 2008 and 2022, which identifies the core ST HCI design methods and their efficacy, as well as the most comprehensive list to date of UE evaluation metrics for ST. Based on the review's findings, we propose PAST SELF, a framework to guide the design and evaluation of ST technology that has potential applications in industrial and scientific settings. Finally, to facilitate researchers and practitioners, we complement this paper with an open corpus and an online, adaptive exploration tool for the PAST SELF data.

Submitted 29 April, 2022; v1 submitted 23 April, 2021; originally announced April 2021.

Comments: 40 pages, 10 figures

arXiv:2104.00959 [pdf, ps, other]

doi 10.1109/WoWMoM51794.2021.00020

Fairness in Network-Friendly Recommendations

Authors: Theodoros Giannakas, Pavlos Sermpezis, Anastasios Giovanidis, Thrasyvoulos Spyropoulos, George Arvanitakis

Abstract: As mobile traffic is dominated by content services (e.g., video), which typically use recommendation systems, the paradigm of network-friendly recommendations (NFR) has been proposed recently to boost the network performance by promoting content that can be efficiently delivered (e.g., cached at the edge). NFR increase the network performance, however, at the cost of being unfair towards certain contents when compared to the standard recommendations. This unfairness is a side effect of NFR that has not been studied in literature. Nevertheless, retaining fairness among contents is a key operational requirement for content providers. This paper is the first to study the fairness in NFR, and design fair-NFR. Specifically, we use a set of metrics that capture different notions of fairness, and study the unfairness created by existing NFR schemes. Our analysis reveals that NFR can be significantly unfair. We identify an inherent trade-off between the network gains achieved by NFR and the resulting unfairness, and derive bounds for this trade-off. We show that existing NFR schemes frequently operate far from the bounds, i.e., there is room for improvement. To this end, we formulate the design of Fair-NFR (i.e., NFR with fairness guarantees compared to the baseline recommendations) as a linear optimization problem. Our results show that the Fair-NFR can achieve high network gains (similar to non-fair-NFR) with little unfairness.

Submitted 2 April, 2021; originally announced April 2021.

Comments: IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), 2021

Journal ref: IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), Jun 2021, Pisa (virtual), Italy

arXiv:2010.03183 [pdf, ps, other]

Network-aware Recommendations in the Wild: Methodology, Realistic Evaluations, Experiments

Authors: Savvas Kastanakis, Pavlos Sermpezis, Vasileios Kotronis, Daniel Menasché, Thrasyvoulos Spyropoulos

Abstract: Joint caching and recommendation has been recently proposed as a new paradigm for increasing the efficiency of mobile edge caching. Early findings demonstrate significant gains for the network performance. However, previous works evaluated the proposed schemes exclusively on simulation environments. Hence, it still remains uncertain whether the claimed benefits would change in real settings. In this paper, we propose a methodology that enables the evaluation of joint network and recommendation schemes in real content services by only using publicly available information. We apply our methodology to the YouTube service, and conduct extensive measurements to investigate the potential performance gains. Our results show that significant gains can be achieved in practice; e.g., 8 to 10 times increase in the cache hit ratio from cache-aware recommendations. Finally, we build an experimental testbed and conduct experiments with real users; we make available our code and datasets to facilitate further research. To the best of our knowledge, this is the first realistic evaluation (over a real service, with real measurements and user experiments) of the joint caching and recommendations paradigm. Our findings provide experimental evidence for the feasibility and benefits of this paradigm, validate assumptions of previous works, and provide insights that can drive future research.

Submitted 6 October, 2020; originally announced October 2020.

Comments: arXiv admin note: text overlap with arXiv:1806.02704

arXiv:2009.10802 [pdf]

My tweets bring all the traits to the yard: Predicting personality and relational traits in Online Social Networks

Authors: Dimitra Karanatsiou, Pavlos Sermpezis, Jon Gruda, Konstantinos Kafetsios, Ilias Dimitriadis, Athena Vakali

Abstract: Users in Online Social Networks (OSN) leave traces that reflect their personality characteristics. The study of these traces is important for a number of fields, such as social science, psychology, OSN, marketing, and others. Despite a marked increase in research on personality prediction based on online behavior, the focus has been heavily on individual personality traits, largely neglecting relational facets of personality. This study aims to address this gap by providing a prediction model for a holistic personality profiling in OSNs that included socio-relational traits (attachment orientations) in combination with standard personality traits. Specifically, we first designed a feature engineering methodology that extracts a wide range of features (accounting for behavior, language, and emotions) from OSN accounts of users. Then, we designed a machine learning model that predicts scores for the psychological traits of the users based on the extracted features. The proposed model architecture is inspired by characteristics embedded in psychological theory, i.e., utilizing interrelations among personality facets, and leads to increased accuracy in comparison with the state of the art approaches. To demonstrate the usefulness of this approach, we applied our model to two datasets, one of random OSN users and one of organizational leaders, and compared their psychological profiles. Our findings demonstrate that the two groups can be clearly separated by only using their psychological profiles, which opens a promising direction for future research on OSN user characterization and classification.

Submitted 22 September, 2020; originally announced September 2020.

arXiv:1911.04924 [pdf, other]

doi 10.1145/3278532.3278556

O Peer, Where Art Thou? Uncovering Remote Peering Interconnections at IXPs

Authors: George Nomikos, Vasileios Kotronis, Pavlos Sermpezis, Petros Gigis, Lefteris Manassakis, Christoph Dietzel, Stavros Konstantaras, Xenofontas Dimitropoulos, Vasileios Giotsas

Abstract: Internet eXchange Points (IXPs) are Internet hubs that mainly provide the switching infrastructure to interconnect networks and exchange traffic. While the initial goal of IXPs was to bring together networks residing in the same city or country, and thus keep local traffic local, this model is gradually shifting. Many networks connect to IXPs without having physical presence at their switching infrastructure. This practice, called Remote Peering, is changing the Internet topology and economy, and has become the subject of a contentious debate within the network operators' community. However, despite the increasing attention it attracts, the understanding of the characteristics and impact of remote peering is limited. In this work, we introduce and validate a heuristic methodology for discovering remote peers at IXPs. We (i) identify critical remote peering inference challenges, (ii) infer remote peers with high accuracy (>95%) and coverage (93%) per IXP, and (iii) characterize different aspects of the remote peering ecosystem by applying our methodology to 30 large IXPs. We observe that remote peering is a significantly common practice in all the studied IXPs; for the largest IXPs, remote peers account for 40% of their member base. We also show that today, IXP growth is mainly driven by remote peering, which contributes two times more than local peering.

Submitted 12 November, 2019; originally announced November 2019.

arXiv:1907.06392 [pdf, ps, other]

Towards QoS-Aware Recommendations

Authors: Pavlos Sermpezis, Savvas Kastanakis, João Ismael Pinheiro, Felipe Assis, Mateus Nogueira, Daniel Menasché, Thrasyvoulos Spyropoulos

Abstract: In this paper we propose that recommendation systems (RSs) for multimedia services should be "QoS-aware", i.e., take into account the expected QoS with which a content can be delivered, to increase the user satisfaction. Network-aware recommendations have been very recently proposed as a promising solution to improve network performance. However, the idea of QoS-aware RSs has been studied from the network perspective. Its feasibility and performance advantages for the content-provider or user perspective have only been speculated. Hence, in this paper we aim to provide initial answers for the feasibility of the concept of QoS-aware RS, by investigating its impact on real user experience. To this end, we conduct experiments with real users on a testbed, and present initial experimental results. Our analysis demonstrates the potential of the idea: QoS-aware RSs could be beneficial for both the users (better experience) and content providers (higher user engagement). Moreover, based on the collected dataset, we build statistical models to (i) predict the user experience as a function of QoS, relevance of recommendations (QoR) and user interest, and (ii) provide useful insights for the design of QoS-aware RSs. We believe that our study is an important first step towards QoS-aware recommendations, by providing experimental evidence for their feasibility and benefits, and can help open a future research direction.

Submitted 1 October, 2020; v1 submitted 15 July, 2019; originally announced July 2019.

Journal ref: ACM RecSys 2020 workshops (CARS: workshop on Context-Aware Recommender Systems)

arXiv:1905.04947 [pdf, ps, other]

The Order of Things: Position-Aware Network-friendly Recommendations in Long Viewing Sessions

Authors: Theodoros Giannakas, Thrasyvoulos Spyropoulos, Pavlos Sermpezis

Abstract: Caching has recently attracted a lot of attention in the wireless communications community, as a means to cope with the increasing number of users consuming web content from mobile devices. Caching offers an opportunity for a win-win scenario: nearby content can improve the video streaming experience for the user, and free up valuable network resources for the operator. At the same time, recent works have shown that recommendations of popular content apps are responsible for a significant percentage of user requests. As a result, some very recent works have considered how to nudge recommendations to facilitate the network (e.g., increase cache hit rates). In this paper, we follow up on this line of work, and consider the problem of designing cache friendly recommendations for long viewing sessions; specifically, we attempt to answer two open questions in this context: (i) given that recommendation position affects user click rates, what is the impact on the performance of such network-friendly recommender solutions? (ii) can the resulting optimization problems be solved efficiently, when considering both sequences of dependent accesses (e.g., YouTube) and position preference? To this end, we propose a stochastic model that incorporates position-aware recommendations into a Markovian traversal model of the content catalog, and derive the average cost of a user session using absorbing Markov chain theory. We then formulate the optimization problem, and after a careful sequence of equivalent transformations show that it has a linear program equivalent and thus can be solved efficiently. Finally, we use a range of real datasets we collected to investigate the impact of position preference in recommendations on the proposed optimal algorithm. Our results suggest more than 30\% improvement with respect to state-of-the-art methods.

Submitted 13 May, 2019; originally announced May 2019.

Journal ref: IEEE/IFIP WiOpt 2019

arXiv:1905.04150 [pdf, ps, other]

doi 10.1145/3326145

Inferring Catchment in Internet Routing

Authors: Pavlos Sermpezis, Vasileios Kotronis

Abstract: BGP is the de-facto Internet routing protocol for exchanging prefix reachability information between Autonomous Systems (AS). It is a dynamic, distributed, path-vector protocol that enables rich expressions of network policies (typically treated as secrets). In this regime, where complexity is interwoven with information hiding, answering questions such as "what is the expected catchment of the anycast sites of a content provider on the AS-level, if new sites are deployed?", or "how will load-balancing behave if an ISP changes its routing policy for a prefix?", is a hard challenge. In this work, we present a formal model and methodology that takes into account policy-based routing and topological properties of the Internet graph, to predict the routing behavior of networks. We design algorithms that provide new capabilities for informative route inference (e.g., isolating the effect of randomness that is present in prior simulation-based approaches). We analyze the properties of these inference algorithms, and evaluate them using publicly available routing datasets and real-world experiments. The proposed framework can be useful in a number of applications: measurements, traffic engineering, network planning, Internet routing models, etc. As a use case, we study the problem of selecting a set of measurement vantage points to maximize route inference. Our methodology is general and can capture standard valley-free routing, as well as more complex topological and routing setups appearing in practice.

Submitted 10 May, 2019; originally announced May 2019.

Comments: ACM Sigmetrics 2019

Journal ref: Proceedings of the ACM on the Measurement and Analysis of Computing Systems (POMACS), Vol. 3, No. 2, Article 30. Publication date: June 2019

arXiv:1806.02704 [pdf, ps, other]

CABaRet: Leveraging Recommendation Systems for Mobile Edge Caching

Authors: Savvas Kastanakis, Pavlos Sermpezis, Vasileios Kotronis, Xenofontas Dimitropoulos

Abstract: Joint caching and recommendation has been recently proposed for increasing the efficiency of mobile edge caching. While previous works assume collaboration between mobile network operators and content providers (who control the recommendation systems), this might be challenging in today's economic ecosystem, with existing protocols and architectures. In this paper, we propose an approach that enables cache-aware recommendations without requiring a network and content provider collaboration. We leverage information provided publicly by the recommendation system, and build a system that provides cache-friendly and high-quality recommendations. We apply our approach to the YouTube service, and conduct measurements on YouTube video recommendations and experiments with video requests, to evaluate the potential gains in the cache hit ratio. Finally, we analytically study the problem of caching optimization under our approach. Our results show that significant caching gains can be achieved in practice; 8 to 10 times increase in the cache hit ratio from cache-aware recommendations, and an extra 2 times increase from caching optimization.

Submitted 7 June, 2018; originally announced June 2018.

Comments: ACM SIGCOMM 2018 workshops: Workshop on Mobile Edge Communications (MECOMM'18), August 20, 2018, Budapest, Hungary

arXiv:1805.06670 [pdf, ps, other]

Show me the Cache: Optimizing Cache-Friendly Recommendations for Sequential Content Access

Authors: Theodoros Giannakas, Pavlos Sermpezis, Thrasyvoulos Spyropoulos

Abstract: Caching has been successfully applied in wired networks, in the context of Content Distribution Networks (CDNs), and is quickly gaining ground for wireless systems. Storing popular content at the edge of the network (e.g. at small cells) is seen as a 'win-win' for both the user (reduced access latency) and the operator (reduced load on the transport network and core servers). Nevertheless, the much smaller size of such edge caches, and the volatility of user preferences suggest that standard caching methods do not suffice in this context. What is more, simple popularity-based models commonly used (e.g. IRM) are becoming outdated, as users often consume multiple contents in sequence (e.g. YouTube, Spotify), and this consumption is driven by recommendation systems. The latter presents a great opportunity to bias the recommender to minimize content access cost (e.g. maximizing cache hit rates). To this end, in this paper we first propose a Markovian model for recommendation-driven user requests. We then formulate the problem of biasing the recommendation algorithm to minimize access cost, while maintaining acceptable recommendation quality. We show that the problem is non-convex, and propose an iterative ADMM-based algorithm that outperforms existing schemes, and shows significant potential for performance improvement on real content datasets.

Submitted 17 May, 2018; originally announced May 2018.

arXiv:1801.02918 [pdf, ps, other]

A Survey among Network Operators on BGP Prefix Hijacking

Authors: Pavlos Sermpezis, Vasileios Kotronis, Alberto Dainotti, Xenofontas Dimitropoulos

Abstract: BGP prefix hijacking is a threat to Internet operators and users. Several mechanisms or modifications to BGP that protect the Internet against it have been proposed. However, the reality is that most operators have not deployed them and are reluctant to do so in the near future. Instead, they rely on basic - and often inefficient - proactive defenses to reduce the impact of hijacking events, or on detection based on third party services and reactive approaches that might take up to several hours. In this work, we present the results of a survey we conducted among 75 network operators to study: (a) the operators' awareness of BGP prefix hijacking attacks, (b) presently used defenses (if any) against BGP prefix hijacking, (c) the willingness to adopt new defense mechanisms, and (d) reasons that may hinder the deployment of BGP prefix hijacking defenses. We expect the findings of this survey to increase the understanding of existing BGP hijacking defenses and the needs of network operators, as well as contribute towards designing new defense mechanisms that satisfy the requirements of the operators.

Submitted 9 January, 2018; originally announced January 2018.

arXiv:1801.01085 [pdf, ps, other]

ARTEMIS: Neutralizing BGP Hijacking within a Minute

Authors: Pavlos Sermpezis, Vasileios Kotronis, Petros Gigis, Xenofontas Dimitropoulos, Danilo Cicalese, Alistair King, Alberto Dainotti

Abstract: BGP prefix hijacking is a critical threat to Internet organizations and users. Despite the availability of several defense approaches (ranging from RPKI to popular third-party services), none of them solves the problem adequately in practice. In fact, they suffer from: (i) lack of detection comprehensiveness, allowing sophisticated attackers to evade detection, (ii) limited accuracy, especially in the case of third-party detection, (iii) delayed verification and mitigation of incidents, reaching up to days, and (iv) lack of privacy and of flexibility in post-hijack counteractions, on the side of network operators. In this work, we propose ARTEMIS (Automatic and Real-Time dEtection and MItigation System), a defense approach (a) based on accurate and fast detection operated by the AS itself, leveraging the pervasiveness of publicly available BGP monitoring services and their recent shift towards real-time streaming, thus (b) enabling flexible and fast mitigation of hijacking events. Compared to previous work, our approach combines characteristics desirable to network operators such as comprehensiveness, accuracy, speed, privacy, and flexibility. Finally, we show through real-world experiments that, with the ARTEMIS approach, prefix hijacking can be neutralized within a minute.

Submitted 27 June, 2018; v1 submitted 3 January, 2018; originally announced January 2018.

arXiv:1706.07323 [pdf, ps, other]

Re-mapping the Internet: Bring the IXPs into Play

Authors: Pavlos Sermpezis, George Nomikos, Xenofontas Dimitropoulos

Abstract: The Internet topology is of high importance in designing networks and architectures, evaluating performance, and economics. Interconnections between domains (ASes), routers, and points of presence (PoPs), have been measured, analyzed, and modeled. However, existing models have some serious shortcomings, related to ease, accuracy and completeness of measurements, and limited applicability to emerging research areas. To this end, in this paper, we propose a novel approach towards capturing the inter-domain Internet topology. Motivated by the recent interest in the Internet eXchange Points (IXPs), we introduce a network graph model based on IXPs and their AS memberships. The proposed model aims to complement previous modeling efforts, shed light on unexplored characteristics of the Internet topology, and support new research directions. We also collect and make available Internet connectivity data, analyze main topological properties, and discuss application-related issues.

Submitted 22 June, 2017; originally announced June 2017.

Comments: www.inspire.edu.gr/ixp-map

arXiv:1702.05349 [pdf, ps, other]

doi 10.1145/2934872.2959078

ARTEMIS: Real-Time Detection and Automatic Mitigation for BGP Prefix Hijacking

Authors: Gavriil Chaviaras, Petros Gigis, Pavlos Sermpezis, Xenofontas Dimitropoulos

Abstract: Prefix hijacking is a common phenomenon in the Internet that often causes routing problems and economic losses. In this demo, we propose ARTEMIS, a tool that enables network administrators to detect and mitigate prefix hijacking incidents, against their own prefixes. ARTEMIS is based on the real-time monitoring of BGP data in the Internet, and software-defined networking (SDN) principles, and can completely mitigate a prefix hijacking within a few minutes (e.g., 5-6 mins in our experiments) after it has been launched.

Submitted 17 February, 2017; originally announced February 2017.

Journal ref: Proceedings of the ACM SIGCOMM 2016 Conference (SIGCOMM '16), 625-626

arXiv:1702.04943 [pdf, ps, other]

Femto-Caching with Soft Cache Hits: Improving Performance through Recommendation and Delivery of Related Content

Authors: Pavlos Sermpezis, Thrasyvoulos Spyropoulos, Luigi Vigneri, Theodoros Giannakas

Abstract: Pushing popular content to cheap "helper" nodes (e.g., small cells) during off-peak hours has recently been proposed to cope with the increase in mobile data traffic. User requests can be served locally from these helper nodes, if the requested content is available in at least one of the nearby helpers. Nevertheless, the collective storage of a few nearby helper nodes does not usually suffice to achieve a high enough hit rate in practice. We propose to depart from the assumption of hard cache hits, common in existing works, and consider "soft" cache hits, where if the original content is not available, some related contents that are locally cached can be recommended instead. Given that Internet content consumption is entertainment-oriented, we argue that there exist scenarios where a user might accept an alternative content (e.g., better download rate for alternative content, low rate plans, etc.), thus avoiding access to expensive/congested links. We formulate the problem of optimal edge caching with soft cache hits in a relatively generic setup, propose efficient algorithms, and analyze the expected gains. We then show using synthetic and real datasets of related video contents that promising caching gains could be achieved in practice.

Submitted 16 February, 2017; originally announced February 2017.

arXiv:1702.00188 [pdf, ps, other]

Can SDN Accelerate BGP Convergence? A Performance Analysis of Inter-domain Routing Centralization

Authors: Pavlos Sermpezis, Xenofontas Dimitropoulos

Abstract: The Internet is composed of Autonomous Systems (ASes) or domains, i.e., networks belonging to different administrative entities. Routing between domains/ASes is realised in a distributed way, over the Border Gateway Protocol (BGP). Despite its global adoption, BGP has several shortcomings, like slow convergence after routing changes, which can cause packet losses and interrupt communication even for several minutes. To accelerate convergence, inter-domain routing centralization approaches, based on Software Defined Networking (SDN), have been recently proposed. Initial studies show that these approaches can significantly improve performance and routing control over BGP. In this paper, we complement existing system-oriented works, by analytically studying the gains of inter-domain SDN. We propose a probabilistic framework to analyse the effects of centralization on the inter-domain routing performance. We derive bounds for the time needed to establish data plane connectivity between ASes after a routing change, as well as predictions for the control-plane convergence time. Our results provide useful insights (e.g., related to the penetration of SDN in the Internet) that can facilitate future research. We discuss applications of our results, and demonstrate the gains through simulations on the Internet AS-topology.

Submitted 1 February, 2017; originally announced February 2017.

arXiv:1609.09682 [pdf, ps, other]

Soft Cache Hits and the Impact of Alternative Content Recommendations on Mobile Edge Caching

Authors: Thrasyvoulos Spyropoulos, Pavlos Sermpezis

Abstract: Caching popular content at the edge of future mobile networks has been widely considered in order to alleviate the impact of the data tsunami on both the access and backhaul networks. A number of interesting techniques have been proposed, including femto-caching and "delayed" or opportunistic cache access. Nevertheless, the majority of these approaches suffer from the rather limited storage capacity of the edge caches, compared to the tremendous and rapidly increasing size of the Internet content catalog. We propose to depart from the assumption of hard cache misses, common in most existing works, and consider "soft" cache misses, where if the original content is not available, an alternative content that is locally cached can be recommended. Given that Internet content consumption is increasingly entertainment-oriented, we believe that a related content could often lead to complete or at least partial user satisfaction, without the need to retrieve the original content over expensive links. In this paper, we formulate the problem of optimal edge caching with soft cache hits, in the context of delayed access, and analyze the expected gains. We then show using synthetic and real datasets of related video contents that promising caching gains could be achieved in practice.

Submitted 30 September, 2016; originally announced September 2016.

arXiv:1609.05702 [pdf, ps, other]

doi 10.1145/2934872.2959078

Monitor, Detect, Mitigate: Combating BGP Prefix Hijacking in Real-Time with ARTEMIS

Authors: Pavlos Sermpezis, Gavriil Chaviaras, Petros Gigis, Xenofontas Dimitropoulos

Abstract: The Border Gateway Protocol (BGP) is globally used by Autonomous Systems (ASes) to establish route paths for IP prefixes in the Internet. Due to the lack of authentication in BGP, an AS can hijack IP prefixes owned by other ASes (i.e., announce illegitimate route paths), thus impacting the Internet routing system and economy. To this end, a number of hijacking detection systems have been proposed. However, existing systems are usually third-party services that inherently introduce a significant delay between the hijacking detection (by the service) and its mitigation (by the network administrators). To overcome this shortcoming, in this paper, we propose ARTEMIS, a tool that enables an AS to detect hijacks on its own prefixes in a timely manner, and automatically proceed to mitigation actions. To evaluate the performance of ARTEMIS, we conduct real hijacking experiments. To the best of our knowledge, it is the first time that a hijacking detection/mitigation system is evaluated through extensive experiments in the real Internet. Our results (a) show that ARTEMIS can detect (mitigate) a hijack within a few seconds (minutes) after it has been launched, and (b) demonstrate the efficiency of the different control-plane sources used by ARTEMIS, towards monitoring routing changes.

Submitted 19 September, 2016; originally announced September 2016.

Journal ref: In Proceedings of the ACM SIGCOMM 2016 Conference (SIGCOMM '16), 625-626

arXiv:1605.08864 [pdf, ps, other]

doi 10.1145/3003977.3003988

Analysing the Effects of Routing Centralization on BGP Convergence Time

Authors: Pavlos Sermpezis, Xenofontas Dimitropoulos

Abstract: Software-defined networking (SDN) has improved the routing functionality in networks like data centers or WANs. Recently, several studies proposed to apply the SDN principles in the Internet's inter-domain routing as well. This could offer new routing opportunities and improve the performance of BGP, which can take minutes to converge after routing changes. Previous works have demonstrated that centralization can benefit the functionality of BGP, and improve its slow convergence that causes severe packet losses and performance degradation. However, due to (a) the fact that previous works mainly focus on system design aspects, and (b) the lack of real deployments, it is not clearly understood yet to what extent inter-domain SDN can improve performance. To this end, in this work, we make the first effort towards analytically studying the effects of routing centralization on the performance of inter-domain routing, and, in particular, the convergence time of BGP. Specifically, we propose a Markovian model for inter-domain networks, where a subset of nodes (domains) coordinate to centralize their inter-domain routing. We then derive analytic results that quantify the BGP convergence time under various network settings (such as SDN penetration, topology, BGP configuration, etc.). Our analysis and results facilitate the performance evaluation of inter-domain SDN networks, which have been studied (till now) only through simulations/emulations that are known to suffer from high time/resource requirements and limited scalability.

Submitted 28 May, 2016; originally announced May 2016.

Journal ref: ACM SIGMETRICS Performance Evaluation Review 44, 2 (September 2016), 30-32

arXiv:1601.05266 [pdf, ps, other]

Effects of Content Popularity on the Performance of Content-Centric Opportunistic Networking: An Analytical Approach and Applications

Authors: Pavlos Sermpezis, Thrasyvoulos Spyropoulos

Abstract: Mobile users are envisioned to exploit direct communication opportunities between their portable devices, in order to enrich the set of services they can access through cellular or WiFi networks. Sharing contents of common interest or providing access to resources or services between peers can enhance a mobile node's capabilities, offload the cellular network, and disseminate information to nodes without Internet access. Interest patterns, i.e. how many nodes are interested in each content or service (popularity), as well as how many users can provide a content or service (availability) impact the performance and feasibility of envisioned applications. In this paper, we establish an analytical framework to study the effects of these factors on the delay and success probability of a content/service access request through opportunistic communication. We also apply our framework to the mobile data offloading problem and provide insights for the optimization of its performance. We validate our model and results through realistic simulations, using datasets of real opportunistic networks.

Submitted 20 January, 2016; originally announced January 2016.

arXiv:1503.00648 [pdf, ps, other]

Offloading on the Edge: Analysis and Optimization of Local Data Storage and Offloading in HetNets

Authors: Pavlos Sermpezis, Luigi Vigneri, Thrasyvoulos Spyropoulos

Abstract: The rapid increase in data traffic demand has overloaded existing cellular networks. Planned upgrades in the communication architecture (e.g. LTE), while helpful, are not expected to suffice to keep up with demand. As a result, extensive densification through small cells, caching content closer to or even at the device, and device-to-device (D2D) communications are seen as necessary components for future heterogeneous cellular networks to withstand the data crunch. Nevertheless, these options imply new CAPEX and OPEX costs, extensive backhaul support, contract plan incentives for D2D, and a number of interesting tradeoffs arise for the operator. In this paper, we propose an analytical model to explore how much local storage and communication through "edge" nodes could help offload traffic in various heterogeneous network (HetNet) setups and levels of user tolerance to delays. We then use this model to optimize the storage allocation and access mode of different contents as a tradeoff between user satisfaction and cost to the operator. Finally, we validate our findings through realistic simulations and show that considerable amounts of traffic can be offloaded even under moderate densification levels.

Submitted 2 March, 2015; originally announced March 2015.

Showing 1–30 of 30 results for author: Sermpezis, P