-
Turning the Tide on Dark Pools? Towards Multi-Stakeholder Vulnerability Notifications in the Ad-Tech Supply Chain
Authors:
Yash Vekaria,
Rishab Nithyanand,
Zubair Shafiq
Abstract:
Online advertising relies on a complex and opaque supply chain that involves multiple stakeholders, including advertisers, publishers, and ad-networks, each with distinct and sometimes conflicting incentives. Recent research has demonstrated the existence of ad-tech supply chain vulnerabilities such as dark pooling, where low-quality publishers bundle their ad inventory with higher-quality ones to mislead advertisers. We investigate the effectiveness of vulnerability notification campaigns aimed at mitigating dark pooling. Prior research on vulnerability notifications has primarily focused on single-stakeholder scenarios, and it is unclear whether vulnerability notifications can be effective in the multi-stakeholder ad-tech supply chain. We implement an automated vulnerability notification pipeline to systematically evaluate the responsiveness of various stakeholders, including publishers, ad-networks, and advertisers, to vulnerability notifications by academics and activists. Our nine-month-long multi-stakeholder notification study shows that notifications are an effective method for reducing dark pooling vulnerabilities in the online advertising ecosystem, especially when targeted towards ad-networks. Further, the sender reputation does not impact responses to notifications from activists and academics in a statistically different way. In addition to being the first notification study targeting the online advertising ecosystem, we are also the first to study multi-stakeholder context in vulnerability notifications.
Submitted 11 June, 2024;
originally announced June 2024.
-
Algorithmic amplification of biases on Google Search
Authors:
Hussam Habib,
Ryan Stoldt,
Andrew High,
Brian Ekdale,
Ashley Peterson,
Katy Biddle,
Javie Ssozi,
Rishab Nithyanand
Abstract:
The evolution of information-seeking processes, driven by search engines like Google, has transformed the access to information people have. This paper investigates how individuals' preexisting attitudes influence the modern information-seeking process, specifically the results presented by Google Search. Through a comprehensive study involving surveys and information-seeking tasks focusing on the topic of abortion, the paper provides four crucial insights: 1) Individuals with opposing attitudes on abortion receive different search results. 2) Individuals express their beliefs in their choice of vocabulary used in formulating the search queries, shaping the outcome of the search. 3) Additionally, the user's search history contributes to divergent results among those with opposing attitudes. 4) Google Search engine reinforces preexisting beliefs in search results. Overall, this study provides insights into the interplay between human biases and algorithmic processes, highlighting the potential for information polarization in modern information-seeking processes.
Submitted 17 January, 2024;
originally announced January 2024.
-
Making Sense of Constellations: Methodologies for Understanding Starlink's Scheduling Algorithms
Authors:
Hammas Bin Tanveer,
Mike Puchol,
Rachee Singh,
Antonio Bianchi,
Rishab Nithyanand
Abstract:
Starlink constellations are currently the largest LEO WAN and have seen considerable interest from the research community. In this paper, we use high-frequency and high-fidelity measurements to uncover evidence of hierarchical traffic controllers in Starlink -- a global controller which allocates satellites to terminals and an on-satellite controller that schedules transmission of user flows. We then devise a novel approach for identifying how satellites are allocated to user terminals. Using data gathered with this approach, we measure the characteristics of the global controller and identify the factors that influence the allocation of satellites to terminals. Finally, we use this data to build a model which approximates Starlink's global scheduler. Our model is able to predict the characteristics of the satellite allocated to a terminal at a specific location and time with reasonably high accuracy and at a rate significantly higher than baseline.
Submitted 1 July, 2023;
originally announced July 2023.
-
How Auditing Methodologies Can Impact Our Understanding of YouTube's Recommendation Systems
Authors:
Sarmad Chandio,
Daniyal Pirwani Dar,
Rishab Nithyanand
Abstract:
Data generated by audits of social media websites have formed the basis of our understanding of the biases presented in algorithmic content recommendation systems. As legislators around the world are beginning to consider regulating the algorithmic systems that drive online platforms, it is critical to ensure the correctness of these inferred biases. However, as we will show in this paper, doing so is a challenging task for a variety of reasons related to the complexity of configuration parameters associated with the audits that gather data from a specific platform.
Focusing specifically on YouTube, we show that conducting audits to make inferences about YouTube's recommendation systems is more methodologically challenging than one might expect. There are many methodological decisions that need to be considered in order to obtain scientifically valid results, and each of these decisions incurs costs. For example, should an auditor use (expensive to obtain) logged-in YouTube accounts while gathering recommendations from the algorithm to obtain more accurate inferences? We explore the impact of this and many other decisions and make some startling discoveries about the methodological choices that impact YouTube's recommendations. Taken all together, our research suggests auditing configuration compromises that YouTube auditors and researchers can use to reduce audit overhead, both economically and computationally, without sacrificing accuracy of their inferences. Similarly, we also identify several configuration parameters that have a significant impact on the accuracy of measured inferences and should be carefully considered.
Submitted 6 March, 2023;
originally announced March 2023.
-
The Inventory is Dark and Full of Misinformation: Understanding the Abuse of Ad Inventory Pooling in the Ad-Tech Supply Chain
Authors:
Yash Vekaria,
Rishab Nithyanand,
Zubair Shafiq
Abstract:
Ad-tech enables publishers to programmatically sell their ad inventory to millions of demand partners through a complex supply chain. Bogus or low quality publishers can exploit the opaque nature of the ad-tech to deceptively monetize their ad inventory. In this paper, we investigate for the first time how misinformation sites subvert the ad-tech transparency standards and pool their ad inventory with unrelated sites to circumvent brand safety protections. We find that a few major ad exchanges are disproportionately responsible for the dark pools that are exploited by misinformation websites. We further find evidence that dark pooling allows misinformation sites to deceptively sell their ad inventory to reputable brands. We conclude with a discussion of potential countermeasures such as better vetting of ad exchange partners, adoption of new ad-tech transparency standards that enable end-to-end validation of the ad-tech supply chain, as well as widespread deployment of independent audits like ours.
Submitted 14 October, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Glowing in the Dark: Uncovering IPv6 Address Discovery and Scanning Strategies in the Wild
Authors:
Hammas Bin Tanveer,
Rachee Singh,
Paul Pearce,
Rishab Nithyanand
Abstract:
In this work we identify scanning strategies of IPv6 scanners on the Internet. We offer a unique perspective on the behavior of IPv6 scanners by conducting controlled experiments leveraging a large and unused /56 IPv6 subnet. We selectively make parts of the subnet visible to scanners by hosting applications that make direct or indirect contact with IPv6-capable servers on the Internet. By careful experiment design, we mitigate the effects of hidden variables on scans sent to our /56 subnet and establish causal relationships between IPv6 host activity types and the scanner attention they evoke. We show that IPv6 host activities, e.g., Web browsing, membership in the NTP pool and Tor network, cause scanners to send a magnitude higher number of unsolicited IP scans and reverse DNS queries to our subnet than before. DNS scanners focus their scans in narrow regions of the address space where our applications are hosted whereas IP scanners broadly scan the entire subnet. Even after the host activity from our subnet subsides, we observe persistent residual scanning to portions of the address space that previously hosted applications.
Submitted 5 October, 2022;
originally announced October 2022.
-
The Morbid Realities of Social Media: An Investigation into the Misinformation Shared by the Deceased Victims of COVID-19
Authors:
Hussam Habib,
Rishab Nithyanand
Abstract:
Social media platforms have had considerable impact on the real world, especially during the Covid-19 pandemic. Misinformation related to Covid-19 might have caused significant impact on the population specifically due to its association with dangerous beliefs such as anti-vaccination and Covid denial. In this work, we study a unique dataset of Facebook posts by users who shared and believed in Covid-19 misinformation before succumbing to Covid-19, often resulting in death. We aim to characterize the dominant themes and sources present in the victims' posts along with identifying the role of the platform in handling deadly narratives. Our analysis reveals the overwhelming politicization of Covid-19 through the prevalence of anti-government themes propagated by the right-wing political and media ecosystem. Furthermore, we highlight the failures of Facebook's implementation and completeness of soft moderation actions intended to warn users of misinformation. Results from this study bring insights into the responsibility of political elites in shaping public discourse and the platform's role in dampening the reach of harmful misinformation.
Submitted 20 September, 2022;
originally announced September 2022.
-
ATOM: A Generalizable Technique for Inferring Tracker-Advertiser Data Sharing in the Online Behavioral Advertising Ecosystem
Authors:
Maaz Bin Musa,
Rishab Nithyanand
Abstract:
Data sharing between online trackers and advertisers is a key component in online behavioral advertising. This sharing can be facilitated through a variety of processes, including those not observable to the user's browser. The unobservability of these processes limits the ability of researchers and auditors seeking to verify compliance with regulations which require complete disclosure of data sharing partners. Unfortunately, the applicability of existing techniques to make inferences about unobservable data sharing relationships is limited due to their dependence on protocol- or case-specific artifacts of the online behavioral advertising ecosystem (e.g., they work only when client-side header bidding is used for ad delivery or when advertisers perform ad retargeting). As behavioral advertising technologies continue to evolve rapidly, the availability of these artifacts and the effectiveness of transparency solutions dependent on them remain ephemeral. In this paper, we propose a generalizable technique, called ATOM, to infer data sharing relationships between online trackers and advertisers. ATOM is different from prior work in that it is universally applicable -- i.e., independent of ad delivery protocols or availability of artifacts. ATOM leverages the insight that by the very nature of behavioral advertising, ad creatives themselves can be used to infer data sharing between trackers and advertisers -- after all, the topics and brands showcased in an ad are dependent on the data available to the advertiser. Therefore, by selectively blocking trackers and monitoring changes in the characteristics of ads delivered by advertisers, ATOM is able to identify data sharing relationships between trackers and advertisers. The relationships discovered by our implementation of ATOM include those not found using prior approaches and are validated by external sources.
Submitted 8 July, 2022;
originally announced July 2022.
-
Making a Radical Misogynist: How online social engagement with the Manosphere influences traits of radicalization
Authors:
Hussam Habib,
Padmini Srinivasan,
Rishab Nithyanand
Abstract:
The algorithms and the interactions facilitated by online platforms have been used by radical groups to recruit vulnerable individuals to their cause. This has resulted in the sharp growth of violent events and deteriorating online discourse. The Manosphere, a collection of radical anti-feminist communities, is one such group which has attracted attention due to their rapid growth and increasingly violent real world outbursts. In this paper, we examine the social engagements between Reddit users who have participated in feminist discourse and the Manosphere communities on Reddit to understand the process of development of traits associated with the adoption of extremist ideologies. By using existing research on the psychology of radicalization we track how specific types of social engagement with the Manosphere influence the development of traits associated with radicalization. Our findings show that: (1) participation, even by the simple act of joining the Manosphere, has a significant influence on the language and outlook traits of a user, (2) Manosphere elites are extremely effective propagators of radical traits and cause their increase even outside the Manosphere, and (3) community perception can heavily influence a user's behavior. Finally, we examine how our findings can help draft community and platform moderation policies to help mitigate the problem of online radicalization.
Submitted 17 February, 2022;
originally announced February 2022.
-
Are Proactive Interventions for Reddit Communities Feasible?
Authors:
Hussam Habib,
Maaz Bin Musa,
Fareed Zaffar,
Rishab Nithyanand
Abstract:
Reddit has found its communities playing a prominent role in originating and propagating problematic socio-political discourse. Reddit administrators have generally struggled to prevent or contain such discourse for several reasons including: (1) the inability for a handful of human administrators to track and react to millions of posts and comments per day and (2) fear of backlash as a consequence of administrative decisions to ban or quarantine hateful communities. Consequently, administrative actions (community bans and quarantines) are often taken only when problematic discourse within a community spills over into the real world with serious consequences. In this paper, we investigate the feasibility of deploying tools to proactively identify problematic communities on Reddit. Proactive identification strategies show promise for three reasons: (1) they have potential to reduce the manual efforts required to track communities for problematic content, (2) they give administrators a scientific rationale to back their decisions and interventions, and (3) they facilitate early and more nuanced interventions (than banning or quarantining) to mitigate problematic discourse.
Submitted 22 November, 2021;
originally announced November 2021.
-
Reddit and the Fourth Estate: Exploring the magnitude and effects of media influence on community level moderation on Reddit
Authors:
Hussam Habib,
Rishab Nithyanand
Abstract:
Most platforms, including Reddit, face a dilemma when applying interventions such as subreddit bans to toxic communities -- do they risk angering their user base by proactively enforcing stricter controls on discourse or do they defer interventions at the risk of eventually triggering negative media reactions which might impact their advertising revenue? In this paper, we analyze Reddit's previous administrative interventions to understand one aspect of this dilemma: the relationship between the media and administrative interventions. More specifically, we make two primary contributions. First, using a mediation analysis framework, we find evidence that Reddit's interventions for violating their content policy for toxic content occur because of media pressure. Second, using interrupted time series analysis, we show that media attention on communities with toxic content only increases the problematic behavior associated with that community (both within the community itself and across the platform). However, we find no significant difference in the impact of administrative interventions on subreddits with and without media pressure. Taken all together, this study provides evidence of a media-driven moderation strategy at Reddit and also suggests that such a strategy may not have a significantly different impact than a more proactive strategy.
Submitted 29 October, 2021;
originally announced November 2021.
-
Inferring Tracker-Advertiser Relationships in the Online Advertising Ecosystem using Header Bidding
Authors:
John Cook,
Rishab Nithyanand,
Zubair Shafiq
Abstract:
Online advertising relies on trackers and data brokers to show targeted ads to users. To improve targeting, different entities in the intricately interwoven online advertising and tracking ecosystems are incentivized to share information with each other through client-side or server-side mechanisms. Inferring data sharing between entities, especially when it happens at the server-side, is an important and challenging research problem. In this paper, we introduce KASHF: a novel method to infer data sharing relationships between advertisers and trackers by studying how an advertiser's bidding behavior changes as we manipulate the presence of trackers. We operationalize this insight by training an interpretable machine learning model that uses the presence of trackers as features to predict the bidding behavior of an advertiser. By analyzing the machine learning model, we are able to infer relationships between advertisers and trackers irrespective of whether data sharing occurs at the client-side or the server-side. We are also able to identify several server-side data sharing relationships that are validated externally but are not detected by client-side cookie syncing.
Submitted 20 September, 2019; v1 submitted 16 July, 2019;
originally announced July 2019.
-
To Act or React: Investigating Proactive Strategies For Online Community Moderation
Authors:
Hussam Habib,
Maaz Bin Musa,
Fareed Zaffar,
Rishab Nithyanand
Abstract:
Reddit administrators have generally struggled to prevent or contain such discourse for several reasons including: (1) the inability for a handful of human administrators to track and react to millions of posts and comments per day and (2) fear of backlash as a consequence of administrative decisions to ban or quarantine hateful communities. Consequently, as shown in our background research, administrative actions (community bans and quarantines) are often taken in reaction to media pressure following offensive discourse within a community spilling into the real world with serious consequences. In this paper, we investigate the feasibility of proactive moderation on Reddit -- i.e., proactively identifying communities at risk of committing offenses that previously resulted in bans for other communities. Proactive moderation strategies show promise for two reasons: (1) they have potential to narrow down the communities that administrators need to monitor for hateful content and (2) they give administrators a scientific rationale to back their administrative decisions and interventions. Our work shows that communities are constantly evolving in their user base and topics of discourse and that evolution into hateful or dangerous (i.e., considered bannable by Reddit administrators) communities can often be predicted months ahead of time. This makes proactive moderation feasible. Further, we leverage explainable machine learning to help identify the strongest predictors of evolution into dangerous communities. This provides administrators with insights into the characteristics of communities at risk of becoming dangerous or hateful. Finally, we investigate, at scale, the impact of participation in hateful and dangerous subreddits and the effectiveness of community bans and quarantines on the behavior of members of these communities.
Submitted 27 June, 2019;
originally announced June 2019.
-
Online Political Discourse in the Trump Era
Authors:
Rishab Nithyanand,
Brian Schaffner,
Phillipa Gill
Abstract:
We identify general trends in the (in)civility and complexity of political discussions occurring on Reddit between January 2007 and May 2017 -- a period spanning both terms of Barack Obama's presidency and the first 100 days of Donald Trump's presidency. We then investigate four factors that are frequently hypothesized as having contributed to the declining quality of American political discourse -- (1) the rising popularity of Donald Trump, (2) increasing polarization and negative partisanship, (3) the democratization of news media and the rise of fake news, and (4) merging of fringe groups into mainstream political discussions.
Submitted 14 November, 2017;
originally announced November 2017.
-
A Churn for the Better: Localizing Censorship using Network-level Path Churn and Network Tomography
Authors:
Shinyoung Cho,
Rishab Nithyanand,
Abbas Razaghpanah,
Phillipa Gill
Abstract:
Recent years have seen the Internet become a key vehicle for citizens around the globe to express political opinions and organize protests. This fact has not gone unnoticed, with countries around the world repurposing network management tools (e.g., URL filtering products) and protocols (e.g., BGP, DNS) for censorship. However, repurposing these products can have unintended international impact, which we refer to as "censorship leakage". While there have been anecdotal reports of censorship leakage, there has yet to be a systematic study of censorship leakage at a global scale. In this paper, we combine a global censorship measurement platform (ICLab) with a general-purpose technique -- boolean network tomography -- to identify which AS on a network path is performing censorship. At a high level, our approach exploits BGP churn to narrow down the set of potential censoring ASes by over 95%. We exactly identify 65 censoring ASes and find that the anomalies introduced by 24 of the 65 censoring ASes have an impact on users located in regions outside the jurisdiction of the censoring AS, resulting in the leaking of regional censorship policies.
Submitted 23 June, 2017;
originally announced June 2017.
-
Measuring Offensive Speech in Online Political Discourse
Authors:
Rishab Nithyanand,
Brian Schaffner,
Phillipa Gill
Abstract:
The Internet and online forums such as Reddit have become an increasingly popular medium for citizens to engage in political conversations. However, the online disinhibition effect resulting from the ability to use pseudonymous identities may manifest in the form of offensive speech, consequently making political discussions more aggressive and polarizing than they already are. Such environments may result in harassment and self-censorship from its targets. In this paper, we present preliminary results from a large-scale temporal measurement aimed at quantifying offensiveness in online political discussions.
To enable our measurements, we develop and evaluate an offensive speech classifier. We then use this classifier to quantify and compare offensiveness in the political and general contexts. We perform our study using a database of over 168M Reddit comments made by over 7M pseudonyms between January 2015 and January 2017 -- a period covering several divisive political events including the 2016 US presidential elections.
Submitted 19 July, 2017; v1 submitted 6 June, 2017;
originally announced June 2017.
-
Tracking the Trackers: Towards Understanding the Mobile Advertising and Tracking Ecosystem
Authors:
Narseo Vallina-Rodriguez,
Srikanth Sundaresan,
Abbas Razaghpanah,
Rishab Nithyanand,
Mark Allman,
Christian Kreibich,
Phillipa Gill
Abstract:
Third-party services form an integral part of the mobile ecosystem: they allow app developers to add features such as performance analytics and social network integration, and to monetize their apps by enabling user tracking and targeted ad delivery. At present users, researchers, and regulators all have at best limited understanding of this third-party ecosystem. In this paper we seek to shrink this gap. Using data from users of our ICSI Haystack app we gain a rich view of the mobile ecosystem: we identify and characterize domains associated with mobile advertising and user tracking, thereby taking an important step towards greater transparency. We furthermore outline our steps towards a public catalog and census of analytics services, their behavior, their personal data collection processes, and their use across mobile apps.
Submitted 26 October, 2016; v1 submitted 22 September, 2016;
originally announced September 2016.
-
Exploring the Design Space of Longitudinal Censorship Measurement Platforms
Authors:
Abbas Razaghpanah,
Anke Li,
Arturo Filastò,
Rishab Nithyanand,
Vasilis Ververis,
Will Scott,
Phillipa Gill
Abstract:
Despite the high perceived value and increasing severity of online information controls, a data-driven understanding of the phenomenon has remained elusive. In this paper, we consider two design points in the space of Internet censorship measurement with particular emphasis on how they address the challenges of locating vantage points, choosing content to test, and analyzing results. We discuss the trade-offs of decisions made by each platform and show how the resulting data provides complementary views of global censorship. Finally, we discuss lessons learned and open challenges discovered through our experiences.
Submitted 29 October, 2016; v1 submitted 6 June, 2016;
originally announced June 2016.
-
Ad-Blocking and Counter Blocking: A Slice of the Arms Race
Authors:
Rishab Nithyanand,
Sheharbano Khattak,
Mobin Javed,
Narseo Vallina-Rodriguez,
Marjan Falahrastegar,
Julia E. Powles,
Emiliano De Cristofaro,
Hamed Haddadi,
Steven J. Murdoch
Abstract:
Adblocking tools like Adblock Plus continue to rise in popularity, potentially threatening the dynamics of advertising revenue streams. In response, a number of publishers have ramped up efforts to develop and deploy mechanisms for detecting and/or counter-blocking adblockers (which we refer to as anti-adblockers), effectively escalating the online advertising arms race. In this paper, we develop a scalable approach for identifying third-party services shared across multiple websites and use it to provide a first characterization of anti-adblocking across the Alexa Top-5K websites. We map websites that perform anti-adblocking as well as the entities that provide anti-adblocking scripts. We study the modus operandi of these scripts and their impact on popular adblockers. We find that at least 6.7% of websites in the Alexa Top-5K use anti-adblocking scripts, acquired from 12 distinct entities -- some of which have a direct interest in nourishing the online advertising industry.
Submitted 20 July, 2016; v1 submitted 17 May, 2016;
originally announced May 2016.
-
Holding all the ASes: Identifying and Circumventing the Pitfalls of AS-aware Tor Client Design
Authors:
Rishab Nithyanand,
Rachee Singh,
Shinyoung Cho,
Phillipa Gill
Abstract:
Traffic correlation attacks to de-anonymize Tor users are possible when an adversary is in a position to observe traffic entering and exiting the Tor network. Recent work has brought attention to the threat of these attacks by network-level adversaries (e.g., Autonomous Systems). We perform a historical analysis to understand how the threat from AS-level traffic correlation attacks has evolved over the past five years. We find that despite a large number of new relays added to the Tor network, the threat has grown. This points to the importance of increasing AS-level diversity in addition to capacity of the Tor network.
We identify and elaborate on common pitfalls of AS-aware Tor client design and construction. We find that succumbing to these pitfalls can negatively impact three major aspects of an AS-aware Tor client -- (1) security against AS-level adversaries, (2) security against relay-level adversaries, and (3) performance. Finally, we propose and evaluate a Tor client -- Cipollino -- which avoids these pitfalls using the state of the art in network measurement. Our evaluation shows that Cipollino is able to achieve better security against network-level adversaries while maintaining security against relay-level adversaries and
Submitted 11 May, 2016;
originally announced May 2016.
-
Measuring and mitigating AS-level adversaries against Tor
Authors:
Rishab Nithyanand,
Oleksii Starov,
Adva Zair,
Phillipa Gill,
Michael Schapira
Abstract:
The popularity of Tor as an anonymity system has made it a popular target for a variety of attacks. We focus on traffic correlation attacks, which are no longer solely in the realm of academic research with recent revelations about the NSA and GCHQ actively working to implement them in practice.
Our first contribution is an empirical study that allows us to gain a high fidelity snapshot of the threat of traffic correlation attacks in the wild. We find that up to 40% of all circuits created by Tor are vulnerable to attacks by traffic correlation from Autonomous System (AS)-level adversaries, 42% from colluding AS-level adversaries, and 85% from state-level adversaries. In addition, we find that in some regions (notably, China and Iran) there exist many cases where over 95% of all possible circuits are vulnerable to correlation attacks, emphasizing the need for AS-aware relay-selection.
To mitigate the threat of such attacks, we build Astoria--an AS-aware Tor client. Astoria leverages recent developments in network measurement to perform path-prediction and intelligent relay selection. Astoria reduces the number of vulnerable circuits to 2% against AS-level adversaries, under 5% against colluding AS-level adversaries, and 25% against state-level adversaries. In addition, Astoria load balances across the Tor network so as to not overload any set of relays.
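The vulnerability condition behind these percentages can be illustrated with a minimal sketch. It assumes a circuit counts as vulnerable when at least one AS appears both on the client-to-entry-relay path and on the exit-relay-to-destination path. The function names and the AS labels are hypothetical, and the sketch does not reproduce Astoria's path prediction or relay selection.

```python
def circuit_is_vulnerable(entry_side_ases, exit_side_ases):
    """Return True if some AS can observe both ends of the circuit.

    entry_side_ases: ASes on the (assumed known) client-to-entry path.
    exit_side_ases:  ASes on the (assumed known) exit-to-destination path.
    """
    return bool(set(entry_side_ases) & set(exit_side_ases))


def vulnerable_fraction(circuits):
    """Fraction of circuits with at least one AS common to both sides.

    circuits: list of (entry_side_ases, exit_side_ases) pairs.
    """
    if not circuits:
        return 0.0
    vulnerable = sum(
        1 for entry_side, exit_side in circuits
        if circuit_is_vulnerable(entry_side, exit_side)
    )
    return vulnerable / len(circuits)


# Hypothetical example: the second circuit has AS1 on both sides.
example_circuits = [
    (["AS1", "AS2"], ["AS3"]),
    (["AS1"], ["AS1", "AS4"]),
]
fraction = vulnerable_fraction(example_circuits)
```

An AS-aware client in this simplified model would prefer relay pairs for which `circuit_is_vulnerable` is false; how the AS paths are predicted is outside the scope of the sketch.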
Submitted 26 December, 2015; v1 submitted 19 May, 2015;
originally announced May 2015.
-
Games Without Frontiers: Investigating Video Games as a Covert Channel
Authors:
Bridger Hahn,
Rishab Nithyanand,
Phillipa Gill,
Rob Johnson
Abstract:
The Internet has become a critical communication infrastructure for citizens to organize protests and express dissatisfaction with their governments. This fact has not gone unnoticed, with governments clamping down on this medium via censorship, and circumvention researchers working to stay one step ahead.
In this paper, we explore a promising new avenue for covert channels: real-time strategy-video games. Video games have two key features that make them attractive cover protocols for censorship circumvention. First, due to the popularity of gaming platforms such as Steam, there are a lot of different video games, each with their own protocols and server infrastructure. Users of video-game-based censorship-circumvention tools can therefore diversify across many games, making it difficult for the censor to respond by simply blocking a single cover protocol. Second, games in the same genre have many common features and concepts. As a result, the same covert channel framework can be easily adapted to work with many different games. This means that circumvention tool developers can stay ahead of the censor by creating a diverse set of tools and by quickly adapting to blockades created by the censor.
We demonstrate the feasibility of this approach by implementing our coding scheme over two real-time strategy-games (including a very popular closed-source game). We evaluate the security of our system prototype -- Castle -- by quantifying its resilience to a censor-adversary, its similarity to real game traffic, and its ability to avoid common pitfalls in covert channel design. We use our prototype to demonstrate that our approach can provide throughput which is amenable to transfer of textual data, such as e-mail, SMS messages, and tweets, which are commonly used to organize political actions.
Submitted 19 May, 2015; v1 submitted 19 March, 2015;
originally announced March 2015.
-
How Best to Handle a Dicey Situation
Authors:
Rishab Nithyanand,
Jonathan Toohill,
Rob Johnson
Abstract:
We introduce the Destructive Object Handling (DOH) problem, which models aspects of many real-world allocation problems, such as shipping explosive munitions, scheduling processes in a cluster with fragile nodes, re-using passwords across multiple websites, and quarantining patients during a disease outbreak. In these problems, objects must be assigned to handlers, but each object has a probability of destroying itself and all the other objects allocated to the same handler. The goal is to maximize the expected value of the objects handled successfully.
We show that finding the optimal allocation is $\mathsf{NP}$-$\mathsf{complete}$, even if all the handlers are identical. We present an FPTAS when the number of handlers is constant. We note in passing that the same technique also yields a first FPTAS for the weapons-target allocation problem \cite{manne_wta} with a constant number of targets. We study the structure of DOH problems and find that they have a sort of phase transition -- in some instances it is better to spread risk evenly among the handlers, in others, one handler should be used as a "sacrificial lamb". We show that the problem is solvable in polynomial time if the destruction probabilities depend only on the handler to which an object is assigned; if all the handlers are identical and the objects all have the same value; or if each handler can be assigned at most one object.
Finally, we empirically evaluate several heuristics based on a combination of greedy and genetic algorithms. The proposed heuristics return fairly high quality solutions to very large problem instances (up to 250 objects and 100 handlers) in tens of seconds.
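The DOH objective can be made concrete with a minimal sketch. It assumes identical handlers, independent destruction events, and a destruction probability that depends only on the object; under those assumptions a handler's objects all survive with probability equal to the product of (1 - p_i) over its objects. The function names and the exhaustive search are illustrative only, and are not the FPTAS or the heuristics evaluated in the paper.

```python
from itertools import product
from math import prod


def expected_value(allocation, values, probs, num_handlers):
    """Expected total value of objects handled successfully.

    allocation[i] is the handler index assigned to object i.
    Assumption: a handler's objects survive only if none of them
    self-destructs, and destruction events are independent.
    """
    total = 0.0
    for handler in range(num_handlers):
        members = [i for i, h in enumerate(allocation) if h == handler]
        survive = prod(1.0 - probs[i] for i in members)
        total += survive * sum(values[i] for i in members)
    return total


def best_allocation(values, probs, num_handlers):
    """Exhaustive search over all allocations (tiny instances only)."""
    candidates = product(range(num_handlers), repeat=len(values))
    best = max(
        candidates,
        key=lambda alloc: expected_value(alloc, values, probs, num_handlers),
    )
    return best, expected_value(best, values, probs, num_handlers)


# Hypothetical instance: one risky object and one safe object.
values = [10, 10]
probs = [0.5, 0.0]
allocation, value = best_allocation(values, probs, num_handlers=2)
```

The search enumerates every assignment of objects to handlers, so its cost grows exponentially with the number of objects. It is useful only for checking intuition on small instances, such as comparing an even spread of risk against isolating the riskiest objects on one handler.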
Submitted 28 January, 2014;
originally announced January 2014.
-
New Approaches to Website Fingerprinting Defenses
Authors:
Xiang Cai,
Rishab Nithyanand,
Rob Johnson
Abstract:
Website fingerprinting attacks enable an adversary to infer which website a victim is visiting, even if the victim uses an encrypting proxy, such as Tor. Previous work has shown that all proposed defenses against website fingerprinting attacks are ineffective.
This paper advances the study of website fingerprinting attacks and defenses in two ways. First, we develop bounds on the trade-off between security and bandwidth overhead that any fingerprinting defense scheme can achieve. This enables us to compare schemes with different security/overhead trade-offs by comparing how close they are to the lower bound. We then refine, implement, and evaluate the Congestion Sensitive BuFLO scheme outlined by Cai et al. CS-BuFLO, which is based on the provably secure BuFLO defense proposed by Dyer et al., was not fully specified by Cai et al., but has nonetheless attracted the attention of the Tor developers. Our experiments find that CS-BuFLO has high overhead (around 2.3-2.8x) but can get 6x closer to the bandwidth/security trade-off lower bound than Tor or plain SSH.
Submitted 23 January, 2014;
originally announced January 2014.