Search | arXiv e-print repository

WASEF: Web Acceleration Solutions Evaluation Framework

Authors: Moumena Chaqfeh, Rashid Tahir, Ayaz Rehman, Jesutofunmi Kupoluyi, Saad Ullah, Russell Coke, Muhammad Junaid, Muhammad Arham, Marc Wiggerman, Abijith Radhakrishnan, Ivano Malavolta, Fareed Zaffar, Yasir Zaki

Abstract: The World Wide Web has become increasingly complex in recent years. This complexity severely affects users in the developing regions due to slow cellular data connectivity and usage of low-end smartphone devices. Existing solutions to simplify the Web are generally evaluated using several different metrics and settings, which hinders the comparison of these solutions against each other. Hence, it is difficult to select the appropriate solution for a specific context and use case. This paper presents Wasef, a framework that uses a comprehensive set of timing, saving, and quality metrics to evaluate and compare different web complexity solutions in a reproducible manner and under realistic settings. The framework integrates a set of existing state-of-the-art solutions and facilitates the addition of newer solutions down the line. Wasef first creates a cache of web pages by crawling both landing and internal ones. Each page in the cache is then passed through a web complexity solution to generate an optimized version of the page. Finally, each optimized version is evaluated in a consistent manner using a uniform environment and metrics. We demonstrate how the framework can be used to compare and contrast the performance characteristics of different web complexity solutions under realistic conditions. We also show that the accessibility to pages in developing regions can be significantly improved, by evaluating the top 100 global pages in the developed world against the top 100 pages in the lowest 50 developing countries. Results show a significant difference in terms of complexity and a potential benefit for our framework in improving web accessibility in these countries.

Submitted 19 April, 2023; originally announced April 2023.

Comments: 15 pages, 4 figures

arXiv:2111.11019 [pdf, other]

doi 10.1609/icwsm.v16i1.19290

Are Proactive Interventions for Reddit Communities Feasible?

Authors: Hussam Habib, Maaz Bin Musa, Fareed Zaffar, Rishab Nithyanand

Abstract: Reddit has found its communities playing a prominent role in originating and propagating problematic socio-political discourse. Reddit administrators have generally struggled to prevent or contain such discourse for several reasons including: (1) the inability for a handful of human administrators to track and react to millions of posts and comments per day and (2) fear of backlash as a consequence of administrative decisions to ban or quarantine hateful communities. Consequently, administrative actions (community bans and quarantines) are often taken only when problematic discourse within a community spills over into the real world with serious consequences. In this paper, we investigate the feasibility of deploying tools to proactively identify problematic communities on Reddit. Proactive identification strategies show promise for three reasons: (1) they have potential to reduce the manual efforts required to track communities for problematic content, (2) they give administrators a scientific rationale to back their decisions and interventions, and (3) they facilitate early and more nuanced interventions (than banning or quarantining) to mitigate problematic discourse.

Submitted 22 November, 2021; originally announced November 2021.

Comments: arXiv admin note: text overlap with arXiv:1906.11932

arXiv:2109.07028 [pdf, other]

Avengers Ensemble! Improving Transferability of Authorship Obfuscation

Authors: Muhammad Haroon, Fareed Zaffar, Padmini Srinivasan, Zubair Shafiq

Abstract: Stylometric approaches have been shown to be quite effective for real-world authorship attribution. To mitigate the privacy threat posed by authorship attribution, researchers have proposed automated authorship obfuscation approaches that aim to conceal the stylometric artefacts that give away the identity of an anonymous document's author. Recent work has focused on authorship obfuscation approaches that rely on black-box access to an attribution classifier to evade attribution while preserving semantics. However, to be useful under a realistic threat model, it is important that these obfuscation approaches work well even when the adversary's attribution classifier is different from the one used internally by the obfuscator. Unfortunately, existing authorship obfuscation approaches do not transfer well to unseen attribution classifiers. In this paper, we propose an ensemble-based approach for transferable authorship obfuscation. Our experiments show that if an obfuscator can evade an ensemble attribution classifier, which is based on multiple base attribution classifiers, it is more likely to transfer to different attribution classifiers. Our analysis shows that ensemble-based authorship obfuscation achieves better transferability because it combines the knowledge from each of the base attribution classifiers by essentially averaging their decision boundaries.

Submitted 8 October, 2021; v1 submitted 14 September, 2021; originally announced September 2021.

Comments: Submitted to PETS 2021

arXiv:2108.13923 [pdf, other]

TrackerSift: Untangling Mixed Tracking and Functional Web Resources

Authors: Abdul Haddi Amjad, Danial Saleem, Fareed Zaffar, Muhammad Ali Gulzar, Zubair Shafiq

Abstract: Trackers have recently started to mix tracking and functional resources to circumvent privacy-enhancing content blocking tools. Such mixed web resources put content blockers in a bind: risk breaking legitimate functionality if they act and risk missing privacy-invasive advertising and tracking if they do not. In this paper, we propose TrackerSift to progressively classify and untangle mixed web resources (that combine tracking and legitimate functionality) at multiple granularities of analysis (domain, hostname, script, and method). Using TrackerSift, we conduct a large-scale measurement study of such mixed resources on 100K websites. We find that more than 17% domains, 48% hostnames, 6% scripts, and 9% methods observed in our crawls combine tracking and legitimate functionality. While mixed web resources are prevalent across all granularities, TrackerSift is able to attribute 98% of the script-initiated network requests to either tracking or functional resources at the finest method-level granularity. Our analysis shows that mixed resources at different granularities are typically served from CDNs or as inlined and bundled scripts, and that blocking them indeed results in breakage of legitimate functionality. Our results highlight opportunities for finer-grained content blocking to remove mixed resources without breaking legitimate functionality.

Submitted 29 September, 2021; v1 submitted 28 August, 2021; originally announced August 2021.

arXiv:2106.13764 [pdf]

To Block or Not to Block: Accelerating Mobile Web Pages On-The-Fly Through JavaScript Classification

Authors: Moumena Chaqfeh, Muhammad Haseeb, Waleed Hashmi, Patrick Inshuti, Manesha Ramesh, Matteo Varvello, Fareed Zaffar, Lakshmi Subramanian, Yasir Zaki

Abstract: The increasing complexity of JavaScript in modern mobile web pages has become a critical performance bottleneck for low-end mobile phone users, especially in developing regions. In this paper, we propose SlimWeb, a novel approach that automatically derives lightweight versions of mobile web pages on-the-fly by eliminating the use of unnecessary JavaScript. SlimWeb consists of a JavaScript classification service powered by a supervised Machine Learning (ML) model that provides insights into each JavaScript element embedded in a web page. SlimWeb aims to improve the web browsing experience by predicting the class of each element, such that essential elements are preserved and non-essential elements are blocked by the browsers using the service. We motivate the core design of SlimWeb using a user preference survey of 306 users and perform a detailed evaluation of SlimWeb across 500 popular web pages in a developing region on real 3G and 4G cellular networks, along with a user experience study with 20 real-world users and a usage willingness survey of 588 users. Evaluation results show that SlimWeb achieves a 50% reduction in the page load time compared to the original pages, and more than 30% reduction compared to competing solutions, while achieving high similarity scores to the original pages measured via a qualitative evaluation study of 62 users. SlimWeb improves the overall user experience by more than 60% compared to the original pages, while maintaining 90%-100% of the visual and functional components of most pages. Finally, the SlimWeb classifier achieves a median accuracy of 90% in predicting the JavaScript category.

Submitted 20 June, 2021; originally announced June 2021.

Comments: 11 pages, 11 figures

arXiv:2006.15794 [pdf, other]

CanaryTrap: Detecting Data Misuse by Third-Party Apps on Online Social Networks

Authors: Shehroze Farooqi, Maaz Musa, Zubair Shafiq, Fareed Zaffar

Abstract: Online social networks support a vibrant ecosystem of third-party apps that get access to personal information of a large number of users. Despite several recent high-profile incidents, methods to systematically detect data misuse by third-party apps on online social networks are lacking. We propose CanaryTrap to detect misuse of data shared with third-party apps. CanaryTrap associates a honeytoken to a user account and then monitors its unrecognized use via different channels after sharing it with the third-party app. We design and implement CanaryTrap to investigate misuse of data shared with third-party apps on Facebook. Specifically, we share the email address associated with a Facebook account as a honeytoken by installing a third-party app. We then monitor the received emails and use Facebook's ad transparency tool to detect any unrecognized use of the shared honeytoken. Our deployment of CanaryTrap to monitor 1,024 Facebook apps has uncovered multiple cases of misuse of data shared with third-party apps on Facebook including ransomware, spam, and targeted advertising.

Submitted 28 June, 2020; originally announced June 2020.

arXiv:2005.08379 [pdf, other]

Towards Characterizing COVID-19 Awareness on Twitter

Authors: Muhammad Saad, Muhammad Hassan, Fareed Zaffar

Abstract: The coronavirus (COVID-19) pandemic has significantly altered our lifestyles as we resort to minimize the spread through preventive measures such as social distancing and quarantine. An increasingly worrying aspect is the gap between the exponential disease spread and the delay in adopting preventive measures. This gap is attributed to the lack of awareness about the disease and its preventive measures. Nowadays, social media platforms (i.e., Twitter) are frequently used to create awareness about major events, including COVID-19. In this paper, we use Twitter to characterize public awareness regarding COVID-19 by analyzing the information flow in the most affected countries. Towards that, we collect more than 46K trends and 622 Million tweets from the top twenty most affected countries to examine 1) the temporal evolution of COVID-19 related trends, 2) the volume of tweets and recurring topics in those trends, and 3) the user sentiment towards preventive measures. Our results show that countries with a lower pandemic spread generated a higher volume of trends and tweets to expedite the information flow and contribute to public awareness. We also observed that in those countries, the COVID-19 related trends were generated before the sharp increase in the number of cases, indicating a preemptive attempt to notify users about the potential threat. Finally, we noticed that in countries with a lower spread, users had a positive sentiment towards COVID-19 preventive measures. Our measurements and analysis show that effective social media usage can influence public behavior, which can be leveraged to better combat future pandemics.

Submitted 20 May, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

Comments: Figure 1 is incorrect. Will be updated in the revision

arXiv:1906.11932 [pdf, other]

To Act or React: Investigating Proactive Strategies For Online Community Moderation

Authors: Hussam Habib, Maaz Bin Musa, Fareed Zaffar, Rishab Nithyanand

Abstract: Reddit administrators have generally struggled to prevent or contain such discourse for several reasons including: (1) the inability for a handful of human administrators to track and react to millions of posts and comments per day and (2) fear of backlash as a consequence of administrative decisions to ban or quarantine hateful communities. Consequently, as shown in our background research, administrative actions (community bans and quarantines) are often taken in reaction to media pressure following offensive discourse within a community spilling into the real world with serious consequences. In this paper, we investigate the feasibility of proactive moderation on Reddit -- i.e., proactively identifying communities at risk of committing offenses that previously resulted in bans for other communities. Proactive moderation strategies show promise for two reasons: (1) they have potential to narrow down the communities that administrators need to monitor for hateful content and (2) they give administrators a scientific rationale to back their administrative decisions and interventions. Our work shows that communities are constantly evolving in their user base and topics of discourse and that evolution into hateful or dangerous (i.e., considered bannable by Reddit administrators) communities can often be predicted months ahead of time. This makes proactive moderation feasible. Further, we leverage explainable machine learning to help identify the strongest predictors of evolution into dangerous communities. This provides administrators with insights into the characteristics of communities at risk of becoming dangerous or hateful. Finally, we investigate, at scale, the impact of participation in hateful and dangerous subreddits and the effectiveness of community bans and quarantines on the behavior of members of these communities.

Submitted 27 June, 2019; originally announced June 2019.

arXiv:1505.01637 [pdf, other]

Characterizing Key Stakeholders in an Online Black-Hat Marketplace

Authors: Shehroze Farooqi, Muhammad Ikram, Emiliano De Cristofaro, Arik Friedman, Guillaume Jourjon, Mohamed Ali Kaafar, M. Zubair Shafiq, Fareed Zaffar

Abstract: Over the past few years, many black-hat marketplaces have emerged that facilitate access to reputation manipulation services such as fake Facebook likes, fraudulent search engine optimization (SEO), or bogus Amazon reviews. In order to deploy effective technical and legal countermeasures, it is important to understand how these black-hat marketplaces operate, shedding light on the services they offer, who is selling, who is buying, what are they buying, who is more successful, why are they successful, etc. Toward this goal, in this paper, we present a detailed micro-economic analysis of a popular online black-hat marketplace, namely, SEOClerks.com. As the site provides non-anonymized transaction information, we set to analyze selling and buying behavior of individual users, propose a strategy to identify key users, and study their tactics as compared to other (non-key) users. We find that key users: (1) are mostly located in Asian countries, (2) are focused more on selling black-hat SEO services, (3) tend to list more lower priced services, and (4) sometimes buy services from other sellers and then sell at higher prices. Finally, we discuss the implications of our analysis with respect to devising effective economic and legal intervention strategies against marketplace operators and key users.

Submitted 4 April, 2017; v1 submitted 7 May, 2015; originally announced May 2015.

Comments: 12th IEEE/APWG Symposium on Electronic Crime Research (eCrime 2017)

Showing 1–9 of 9 results for author: Zaffar, F