Joint UPF and Edge Applications Placement and Routing in 5G & Beyond

Authors: Endri Goshi, Hasanin Harkous, Shohreh Ahvar, Rastin Pries, Fidan Mehmeti, Wolfgang Kellerer

Abstract: The development of 5G networks has enabled support for a vast number of applications with stringent traffic requirements, both in terms of communication and computation. Furthermore, the proximity of the entities, such as edge servers and User Plane Functions (UPFs) that provide these resources is of paramount importance. However, with the ever-increasing demand from these applications, operators often find their resources insufficient to accommodate all requests. Some of these demands can be forwarded to external entities, not owned by the operator. This introduces a cost, reducing the operator's profit. Hence, to maximize operator's profit, it is important to place the demands optimally in internal or external edge nodes. To this end, we formulate a constrained optimization problem that captures this objective and the inter-play between different parameters, which turns out to be NP-hard. Therefore, we resort to proposing a heuristic algorithm which ranks the demands according to their value to the operator and amount of resources they need. Results show that our approach outperforms the benchmark algorithms, deviating from the optimal solution by only ~3% on average.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2403.02292

A Decade of Privacy-Relevant Android App Reviews: Large Scale Trends

Authors: Omer Akgul, Sai Teja Peddinti, Nina Taft, Michelle L. Mazurek, Hamza Harkous, Animesh Srivastava, Benoit Seguin

Abstract: We present an analysis of 12 million instances of privacy-relevant reviews publicly visible on the Google Play Store that span a 10 year period. By leveraging state of the art NLP techniques, we examine what users have been writing about privacy along multiple dimensions: time, countries, app types, diverse privacy topics, and even across a spectrum of emotions. We find consistent growth of privacy-relevant reviews, and explore topics that are trending (such as Data Deletion and Data Theft), as well as those on the decline (such as privacy-relevant reviews on sensitive permissions). We find that although privacy reviews come from more than 200 countries, 33 countries provide 90% of privacy reviews. We conduct a comparison across countries by examining the distribution of privacy topics a country's users write about, and find that geographic proximity is not a reliable indicator that nearby countries have similar privacy perspectives. We uncover some countries with unique patterns and explore those herein. Surprisingly, we uncover that it is not uncommon for reviews that discuss privacy to be positive (32%); many users express pleasure about privacy features within apps or privacy-focused apps. We also uncover some unexpected behaviors, such as the use of reviews to deliver privacy disclaimers to developers. Finally, we demonstrate the value of analyzing app reviews with our approach as a complement to existing methods for understanding users' perspectives about privacy.

Submitted 15 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: This is the extended version of the paper accepted to USENIX Security 2024

arXiv:2204.04221

CookieEnforcer: Automated Cookie Notice Analysis and Enforcement

Authors: Rishabh Khandelwal, Asmit Nayak, Hamza Harkous, Kassem Fawaz

Abstract: Online websites use cookie notices to elicit consent from the users, as required by recent privacy regulations like the GDPR and the CCPA. Prior work has shown that these notices use dark patterns to manipulate users into making website-friendly choices which put users' privacy at risk. In this work, we develop CookieEnforcer, a new system for automatically discovering cookie notices and deciding on the options that result in disabling all non-essential cookies. In order to achieve this, we first build an automatic cookie notice detector that utilizes the rendering pattern of the HTML elements to identify the cookie notices. Next, CookieEnforcer analyzes the cookie notices and predicts the set of actions required to disable all unnecessary cookies. This is done by modeling the problem as a sequence-to-sequence task, where the input is a machine-readable cookie notice and the output is the set of clicks to make. We demonstrate the efficacy of CookieEnforcer via an end-to-end accuracy evaluation, showing that it can generate the required steps in 91% of the cases. Via a user study, we show that CookieEnforcer can significantly reduce the user effort. Finally, we use our system to perform several measurements on the top 5k websites from the Tranco list (as accessed from the US and the UK), drawing comparisons and observations at scale.

Submitted 14 April, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

arXiv:2004.06577

Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity

Authors: Hamza Harkous, Isabel Groves, Amir Saffari

Abstract: End-to-end neural data-to-text (D2T) generation has recently emerged as an alternative to pipeline-based architectures. However, it has faced challenges in generalizing to new domains and generating semantically consistent text. In this work, we present DataTuner, a neural, end-to-end data-to-text generation system that makes minimal assumptions about the data representation and the target domain. We take a two-stage generation-reranking approach, combining a fine-tuned language model with a semantic fidelity classifier. Each of our components is learnt end-to-end without the need for dataset-specific heuristics, entity delexicalization, or post-processing. We show that DataTuner achieves state of the art results on the automated metrics across four major D2T datasets (LDC2017T10, WebNLG, ViGGO, and Cleaned E2E), with a fluency assessed by human annotators nearing or exceeding the human-written reference texts. We further demonstrate that the model-based semantic fidelity scorer in DataTuner is a better assessment tool compared to traditional, heuristic-based measures. Our generated text has a significantly better semantic fidelity than the state of the art across all four datasets.

Submitted 10 November, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

Comments: 28th International Conference on Computational Linguistics (COLING 2020), Online, December 8-13, 2020

arXiv:1809.08396

The Privacy Policy Landscape After the GDPR

Authors: Thomas Linden, Rishabh Khandelwal, Hamza Harkous, Kassem Fawaz

Abstract: The EU General Data Protection Regulation (GDPR) is one of the most demanding and comprehensive privacy regulations of all time. A year after it went into effect, we study its impact on the landscape of privacy policies online. We conduct the first longitudinal, in-depth, and at-scale assessment of privacy policies before and after the GDPR. We gauge the complete consumption cycle of these policies, from the first user impressions until the compliance assessment. We create a diverse corpus of two sets of 6,278 unique English-language privacy policies from inside and outside the EU, covering their pre-GDPR and the post-GDPR versions. The results of our tests and analyses suggest that the GDPR has been a catalyst for a major overhaul of the privacy policies inside and outside the EU. This overhaul of the policies, manifesting in extensive textual changes, especially for the EU-based websites, comes at mixed benefits to the users. While the privacy policies have become considerably longer, our user study with 470 participants on Amazon MTurk indicates a significant improvement in the visual representation of privacy policies from the users' perspective for the EU websites. We further develop a new workflow for the automated assessment of requirements in privacy policies. Using this workflow, we show that privacy policies cover more data practices and are more consistent with seven compliance requirements post the GDPR. We also assess how transparent the organizations are with their privacy practices by performing specificity analysis. In this analysis, we find evidence for positive changes triggered by the GDPR, with the specificity level improving on average. Still, we find the landscape of privacy policies to be in a transitional phase; many policies still do not meet several key GDPR requirements or their improved coverage comes with reduced specificity.

Submitted 24 June, 2019; v1 submitted 22 September, 2018; originally announced September 2018.

arXiv:1802.02561

Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning

Authors: Hamza Harkous, Kassem Fawaz, Rémi Lebret, Florian Schaub, Kang G. Shin, Karl Aberer

Abstract: Privacy policies are the primary channel through which companies inform users about their data collection and sharing practices. These policies are often long and difficult to comprehend. Short notices based on information extracted from privacy policies have been shown to be useful but face a significant scalability hurdle, given the number of policies and their evolution over time. Companies, users, researchers, and regulators still lack usable and scalable tools to cope with the breadth and depth of privacy policies. To address these hurdles, we propose an automated framework for privacy policy analysis (Polisis). It enables scalable, dynamic, and multi-dimensional queries on natural language privacy policies. At the core of Polisis is a privacy-centric language model, built with 130K privacy policies, and a novel hierarchy of neural-network classifiers that accounts for both high-level aspects and fine-grained details of privacy practices. We demonstrate Polisis' modularity and utility with two applications supporting structured and free-form querying. The structured querying application is the automated assignment of privacy icons from privacy policies. With Polisis, we can achieve an accuracy of 88.4% on this task. The second application, PriBot, is the first freeform question-answering system for privacy policies. We show that PriBot can produce a correct answer among its top-3 results for 82% of the test questions. Using an MTurk user study with 700 participants, we show that at least one of PriBot's top-3 answers is relevant to users for 89% of the test questions.

Submitted 29 June, 2018; v1 submitted 7 February, 2018; originally announced February 2018.

Comments: Published at USENIX Security 2018; associated website: https://pribot.org

arXiv:1704.07626

Taxonomy Induction using Hypernym Subsequences

Authors: Amit Gupta, Rémi Lebret, Hamza Harkous, Karl Aberer

Abstract: We propose a novel, semi-supervised approach towards domain taxonomy induction from an input vocabulary of seed terms. Unlike all previous approaches, which typically extract direct hypernym edges for terms, our approach utilizes a novel probabilistic framework to extract hypernym subsequences. Taxonomy induction from extracted subsequences is cast as an instance of the minimum-cost flow problem on a carefully designed directed graph. Through experiments, we demonstrate that our approach outperforms state-of-the-art taxonomy induction approaches across four languages. Importantly, we also show that our approach is robust to the presence of noise in the input vocabulary. To the best of our knowledge, no previous approaches have been empirically proven to manifest noise-robustness in the input vocabulary.

Submitted 14 September, 2017; v1 submitted 25 April, 2017; originally announced April 2017.

arXiv:1704.07624

280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification

Authors: Amit Gupta, Rémi Lebret, Hamza Harkous, Karl Aberer

Abstract: We propose a simple, yet effective, approach towards inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach leverages the interlanguage links of Wikipedia followed by character-level classifiers to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the state-of-the-art, heuristics-heavy approaches for six languages. As a consequence of our work, we release presumably the largest and the most accurate multilingual taxonomic resource spanning over 280 languages.

Submitted 12 September, 2017; v1 submitted 25 April, 2017; originally announced April 2017.

arXiv:1702.08234

doi 10.1145/3029806.3029837

"If You Can't Beat them, Join them": A Usability Approach to Interdependent Privacy in Cloud Apps

Authors: Hamza Harkous, Karl Aberer

Abstract: Cloud storage services, like Dropbox and Google Drive, have growing ecosystems of 3rd party apps that are designed to work with users' cloud files. Such apps often request full access to users' files, including files shared with collaborators. Hence, whenever a user grants access to a new vendor, she is inflicting a privacy loss on herself and on her collaborators too. Based on analyzing a real dataset of 183 Google Drive users and 131 third party apps, we discover that collaborators inflict a privacy loss which is at least 39% higher than what users themselves cause. We take a step toward minimizing this loss by introducing the concept of History-based decisions. Simply put, users are informed at decision time about the vendors which have been previously granted access to their data. Thus, they can reduce their privacy loss by not installing apps from new vendors whenever possible. Next, we realize this concept by introducing a new privacy indicator, which can be integrated within the cloud apps' authorization interface. Via a web experiment with 141 participants recruited from CrowdFlower, we show that our privacy indicator can significantly increase the user's likelihood of choosing the app that minimizes her privacy loss. Finally, we explore the network effect of History-based decisions via a simulation on top of large collaboration networks. We demonstrate that adopting such a decision-making process is capable of reducing the growth of users' privacy loss by 70% in a Google Drive-based network and by 40% in an author collaboration network. This is despite the fact that we neither assume that users cooperate nor that they exhibit altruistic behavior. To our knowledge, our work is the first to provide quantifiable evidence of the privacy risk that collaborators pose in cloud apps. We are also the first to mitigate this problem via a usable privacy approach.

Submitted 24 March, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

Comments: Authors' extended version of the paper published at CODASPY 2017

arXiv:1608.05661

doi 10.1515/popets-2016-0032

The Curious Case of the PDF Converter that Likes Mozart: Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps

Authors: Hamza Harkous, Rameez Rahman, Bojan Karlas, Karl Aberer

Abstract: Third party apps that work on top of personal cloud services such as Google Drive and Dropbox, require access to the user's data in order to provide some functionality. Through detailed analysis of a hundred popular Google Drive apps from Google's Chrome store, we discover that the existing permission model is quite often misused: around two thirds of analyzed apps are over-privileged, i.e., they access more data than is needed for them to function. In this work, we analyze three different permission models that aim to discourage users from installing over-privileged apps. In experiments with 210 real users, we discover that the most successful permission model is our novel ensemble method that we call Far-reaching Insights. Far-reaching Insights inform the users about the data-driven insights that apps can make about them (e.g., their topics of interest, collaboration and activity patterns etc.). Thus, they seek to bridge the gap between what third parties can actually know about users and users' perception of their privacy leakage. The efficacy of Far-reaching Insights in bridging this gap is demonstrated by our results, as Far-reaching Insights prove to be, on average, twice as effective as the current model in discouraging users from installing over-privileged apps. In an effort for promoting general privacy awareness, we deploy a publicly available privacy oriented app store that uses Far-reaching Insights. Based on the knowledge extracted from data of the store's users (over 115 gigabytes of Google Drive data from 1440 users with 662 installed apps), we also delineate the ecosystem for third-party cloud apps from the standpoint of developers and cloud providers. Finally, we present several general recommendations that can guide other future works in the area of privacy for the cloud.

Submitted 18 August, 2016; originally announced August 2016.

Journal ref: Proceedings on Privacy Enhancing Technologies. Volume 2016, Issue 4, Pages 123-143, ISSN (Online) 2299-0984

arXiv:1107.5419

Scalable and Secure Aggregation in Distributed Networks

Authors: Sebastien Gambs, Rachid Guerraoui, Hamza Harkous, Florian Huc, Anne-Marie Kermarrec

Abstract: We consider the problem of computing an aggregation function in a \emph{secure} and \emph{scalable} way. Whereas previous distributed solutions with similar security guarantees have a communication cost of $O(n^3)$, we present a distributed protocol that requires only a communication complexity of $O(n\log^3 n)$, which we prove is near-optimal. Our protocol ensures perfect security against a computationally-bounded adversary, tolerates $(1/2-\epsilon)n$ malicious nodes for any constant $1/2 > \epsilon > 0$ (not depending on $n$), and outputs the exact value of the aggregated function with high probability.

Submitted 23 November, 2011; v1 submitted 27 July, 2011; originally announced July 2011.

Showing 1–11 of 11 results for author: Harkous, H