
PANDORA: Deep graph learning based COVID-19 infection risk level forecasting

Authors: Shuo Yu, Feng Xia, Yueru Wang, Shihao Li, Falih Febrinanto, Madhu Chetty

Abstract: COVID-19 as a global pandemic causes a massive disruption to social stability that threatens human life and the economy. Policymakers and all elements of society must deliver measurable actions based on the pandemic's severity to minimize the detrimental impact of COVID-19. A proper forecasting system is arguably important to provide an early signal of the risk of COVID-19 infection so that the authorities are ready to protect the people from the worst. However, making a good forecasting model for infection risks in different cities or regions is not an easy task, because it has a lot of influential factors that are difficult to be identified manually. To address the current limitations, we propose a deep graph learning model, called PANDORA, to predict the infection risks of COVID-19, by considering all essential factors and integrating them into a geographical network. The framework uses geographical position relations and transportation frequency as higher-order structural properties formulated by higher-order network structures (i.e., network motifs). Moreover, four significant node attributes (i.e., multiple features of a particular area, including climate, medical condition, economy, and human mobility) are also considered. We propose three different aggregators to better aggregate node attributes and structural features, namely, Hadamard, Summation, and Connection. Experimental results over real data show that PANDORA outperforms the baseline method with higher accuracy and faster convergence speed, no matter which aggregator is chosen. We believe that PANDORA using deep graph learning provides a promising approach to get superior performance in infection risk level forecasting and help humans battle the COVID-19 crisis.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.05225 [pdf, other]

doi 10.1145/3613904.3642333

"Community Guidelines Make this the Best Party on the Internet": An In-Depth Study of Online Platforms' Content Moderation Policies

Authors: Brennan Schaffner, Arjun Nitin Bhagoji, Siyuan Cheng, Jacqueline Mei, Jay L. Shen, Grace Wang, Marshini Chetty, Nick Feamster, Genevieve Lakier, Chenhao Tan

Abstract: Moderating user-generated content on online platforms is crucial for balancing user safety and freedom of speech. Particularly in the United States, platforms are not subject to legal constraints prescribing permissible content. Each platform has thus developed bespoke content moderation policies, but there is little work towards a comparative understanding of these policies across platforms and topics. This paper presents the first systematic study of these policies from the 43 largest online platforms hosting user-generated content, focusing on policies around copyright infringement, harmful speech, and misleading content. We build a custom web-scraper to obtain policy text and develop a unified annotation scheme to analyze the text for the presence of critical components. We find significant structural and compositional variation in policies across topics and platforms, with some variation attributable to disparate legal groundings. We lay the groundwork for future studies of ever-evolving content moderation policies and their impact on users.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2403.17225 [pdf]

doi 10.1145/3613904.3642597

Measuring Compliance with the California Consumer Privacy Act Over Space and Time

Authors: Van Tran, Aarushi Mehrotra, Marshini Chetty, Nick Feamster, Jens Frankenreiter, Lior Strahilevitz

Abstract: The widespread sharing of consumers' personal information with third parties raises significant privacy concerns. The California Consumer Privacy Act (CCPA) mandates that online businesses offer consumers the option to opt out of the sale and sharing of personal information. Our study automatically tracks the presence of the opt-out link longitudinally across multiple states after the California Privacy Rights Act (CPRA) went into effect. We categorize websites based on whether they are subject to CCPA and investigate cases of potential non-compliance. We find a number of websites that implement the opt-out link early and across all examined states but also find a significant number of CCPA-subject websites that fail to offer any opt-out methods even when CCPA is in effect. Our findings can shed light on how websites are reacting to the CCPA and identify potential gaps in compliance and opt-out method designs that hinder consumers from exercising CCPA opt-out rights.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2401.15221 [pdf, other]

Designing and Testing a Mobile Application for Collecting WhatsApp Chat Data While Preserving Privacy

Authors: Brennan Schaffner, Archie Brohn, Jason Chee, K. J. Feng, Marshini Chetty

Abstract: It is common practice for researchers to join public WhatsApp chats and scrape their contents for analysis. However, research shows collecting data this way contradicts user expectations and preferences, even if the data is effectively public. To overcome these issues, we outline design considerations for collecting WhatsApp chat data with improved user privacy by heightening user control and oversight of data collection and taking care to minimize the data researchers collect and process off a user's device. We refer to these design principles as User-Centered Data Sharing (UCDS). To evaluate our UCDS principles, we implemented a mobile application representing one possible instance of these improved data collection techniques and evaluated the viability of using the app to collect WhatsApp chat data. Second, we surveyed WhatsApp users to gather user perceptions on common existing WhatsApp data collection methods as well as UCDS methods. Our results show that we were able to glean similar informative insights into WhatsApp chats using UCDS principles in our prototype app to common, less privacy-preserving methods. Our survey showed that methods following the UCDS principles are preferred by users because they offered users more control over the data collection process. Future user studies could further expand upon UCDS principles to overcome complications of researcher-to-group communication in research on WhatsApp chats and evaluate these principles in other data sharing contexts.

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2311.06698 [pdf, other]

VidPlat: A Tool for Fast Crowdsourcing of Quality-of-Experience Measurements

Authors: Xu Zhang, Hanchen Li, Paul Schmitt, Marshini Chetty, Nick Feamster, Junchen Jiang

Abstract: For video or web services, it is crucial to measure user-perceived quality of experience (QoE) at scale under various video quality or page loading delays. However, fast QoE measurements remain challenging as they must elicit subjective assessment from human users. Previous work either (1) automates QoE measurements by letting crowdsourcing raters watch and rate QoE test videos or (2) dynamically prunes redundant QoE tests based on previously collected QoE measurements. Unfortunately, it is hard to combine both ideas because traditional crowdsourcing requires QoE test videos to be pre-determined before a crowdsourcing campaign begins. Thus, if researchers want to dynamically prune redundant test videos based on other test videos' QoE, they are forced to launch multiple crowdsourcing campaigns, causing extra overheads to re-calibrate or train raters every time. This paper presents VidPlat, the first open-source tool for fast and automated QoE measurements, by allowing dynamic pruning of QoE test videos within a single crowdsourcing task. VidPlat creates an indirect shim layer between researchers and the crowdsourcing platforms. It allows researchers to define a logic that dynamically determines which new test videos need more QoE ratings based on the latest QoE measurements, and it then redirects crowdsourcing raters to watch QoE test videos dynamically selected by this logic. Other than having fewer crowdsourcing campaigns, VidPlat also reduces the total number of QoE ratings by dynamically deciding when enough ratings are gathered for each test video. It is an open-source platform that future researchers can reuse and customize. We have used VidPlat in three projects (web loading, on-demand video, and online gaming). We show that VidPlat can reduce crowdsourcing cost by 31.8% - 46.0% and latency by 50.9% - 68.8%.

Submitted 11 November, 2023; originally announced November 2023.

arXiv:2211.15959 [pdf, other]

Enabling Personalized Video Quality Optimization with VidHoc

Authors: Xu Zhang, Paul Schmitt, Marshini Chetty, Nick Feamster, Junchen Jiang

Abstract: The emerging video applications greatly increase the demand in network bandwidth that is not easy to scale. To provide higher quality of experience (QoE) under limited bandwidth, a recent trend is to leverage the heterogeneity of quality preferences across individual users. Although these efforts have suggested the great potential benefits, service providers still have not deployed them to realize the promised QoE improvement. The missing piece is an automation of online per-user QoE modeling and optimization scheme for new users. Previous efforts either optimize QoE by known per-user QoE models or learn a user's QoE model by offline approaches, such as analysis of video viewing history and in-lab user study. Relying on such offline modeling is problematic, because QoE optimization will start late for collecting enough data to train an unbiased QoE model. In this paper, we propose VidHoc, the first automatic system that jointly personalizes QoE model and optimizes QoE in an online manner for each new user. VidHoc can build per-user QoE models within a small number of video sessions as well as maintain good QoE. We evaluate VidHoc in a pilot deployment to fifteen users for four months with the care of statistical validity. Compared with other baselines, the results show that VidHoc can save 17.3% bandwidth while maintaining the same QoE or improve QoE by 13.9% with the same bandwidth.

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2110.15345 [pdf, other]

Measuring the Consolidation of DNS and Web Hosting Providers

Authors: Synthia Wang, Kyle MacMillan, Brennan Schaffner, Nick Feamster, Marshini Chetty

Abstract: Despite the Internet's continued growth, it increasingly depends on a small set of service providers to support Domain Name System (DNS) and web content hosting. This trend poses many potential threats including susceptibility to outages, failures, and potential censorship by providers. This paper aims to quantify consolidation in terms of popular domains' reliance on a small set of organizations for both DNS and web hosting. We highlight the extent to which a set of relatively few platforms host the authoritative name servers and web content for the top million websites. Our results show that both DNS and web hosting are concentrated, with Cloudflare and Amazon hosting over 30% of the domains for both services. With the addition of Akamai, Fastly, and Google, these five organizations host 60% of index pages in the Tranco top 10K, as well as the majority of external page resources. These trends are consistent across six different global vantage points, indicating that consolidation is happening globally and popular organizations can influence users' online experience across the world.

Submitted 30 January, 2024; v1 submitted 28 October, 2021; originally announced October 2021.

arXiv:2002.11834 [pdf, other]

Understanding How and Why University Students Use Virtual Private Networks

Authors: Agnieszka Dutkowska-Zuk, Austin Hounsel, Andre Xiong, Molly Roberts, Brandon Stewart, Marshini Chetty, Nick Feamster

Abstract: We study how and why university students choose and use VPNs, and whether they are aware of the security and privacy risks that VPNs pose. To answer these questions, we conducted 32 in-person interviews and a survey with 349 respondents, all university students in the United States. We find students are mostly concerned with access to content and privacy concerns were often secondary. They made tradeoffs to achieve a particular goal, such as using a free commercial VPN that may collect their online activities to access an online service in a geographic area. Many users expected that their VPNs were collecting data about them, although they did not understand how VPNs work. We conclude with a discussion of ways to help users make choices about VPNs.

Submitted 22 February, 2021; v1 submitted 26 February, 2020; originally announced February 2020.

Comments: Interview guide, interview summary codebook, survey questions, and additional survey figures included in the appendix document

arXiv:2001.10608 [pdf, other]

doi 10.1145/3539737

You, Me, and IoT: How Internet-Connected Consumer Devices Affect Interpersonal Relationships

Authors: Noah Apthorpe, Pardis Emami-Naeini, Arunesh Mathur, Marshini Chetty, Nick Feamster

Abstract: Internet-connected consumer devices have rapidly increased in popularity; however, relatively little is known about how these technologies are affecting interpersonal relationships in multi-occupant households. In this study, we conduct 13 semi-structured interviews and survey 508 individuals from a variety of backgrounds to discover and categorize how consumer IoT devices are affecting interpersonal relationships in the United States. We highlight several themes, providing exploratory data about the pervasiveness of interpersonal costs and benefits of consumer IoT devices. These results inform follow-up studies and design priorities for future IoT technologies to amplify positive and reduce negative interpersonal effects.

Submitted 1 June, 2022; v1 submitted 28 January, 2020; originally announced January 2020.

Comments: 28 pages, 5 figures, 5 tables, 1 supplemental PDF. Camera-ready version for journal publication. Original title: "You, Me, and IoT: How Internet-Connected Home Devices Affect Interpersonal Relationships"

Journal ref: ACM Transactions on Internet of Things, Volume 3, Issue 4, 2022, Article 25, pp 1-29

arXiv:1910.14112 [pdf, other]

Alexa, Who Am I Speaking To? Understanding Users' Ability to Identify Third-Party Apps on Amazon Alexa

Authors: David J. Major, Danny Yuxing Huang, Marshini Chetty, Nick Feamster

Abstract: Many Internet of Things (IoT) devices have voice user interfaces (VUIs). One of the most popular VUIs is Amazon's Alexa, which supports more than 47,000 third-party applications ("skills"). We study how Alexa's integration of these skills may confuse users. Our survey of 237 participants found that users do not understand that skills are often operated by third parties, that they often confuse third-party skills with native Alexa functions, and that they are unaware of the functions that the native Alexa system supports. Surprisingly, users who interact with Alexa more frequently are more likely to conclude that a third-party skill is native Alexa functionality. The potential for misunderstanding creates new security and privacy risks: attackers can develop third-party skills that operate without users' knowledge or masquerade as native Alexa functions. To mitigate this threat, we make design recommendations to help users distinguish native and third-party skills.

Submitted 30 October, 2019; originally announced October 2019.

arXiv:1907.07032 [pdf, other]

doi 10.1145/3359183

Dark Patterns at Scale: Findings from a Crawl of 11K Shopping Websites

Authors: Arunesh Mathur, Gunes Acar, Michael J. Friedman, Elena Lucherini, Jonathan Mayer, Marshini Chetty, Arvind Narayanan

Abstract: Dark patterns are user interface design choices that benefit an online service by coercing, steering, or deceiving users into making unintended and potentially harmful decisions. We present automated techniques that enable experts to identify dark patterns on a large set of websites. Using these techniques, we study shopping websites, which often use dark patterns to influence users into making more purchases or disclosing more information than they would otherwise. Analyzing ~53K product pages from ~11K shopping websites, we discover 1,818 dark pattern instances, together representing 15 types and 7 broader categories. We examine these dark patterns for deceptive practices, and find 183 websites that engage in such practices. We also uncover 22 third-party entities that offer dark patterns as a turnkey solution. Finally, we develop a taxonomy of dark pattern characteristics that describes the underlying influence of the dark patterns and their potential harm on user decision-making. Based on our findings, we make recommendations for stakeholders including researchers and regulators to study, mitigate, and minimize the use of these patterns.

Submitted 20 September, 2019; v1 submitted 16 July, 2019; originally announced July 2019.

Comments: 32 pages, 11 figures, ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2019)

Journal ref: Proceedings of the ACM Human-Computer Interaction, Vol. 3, CSCW, Article 81 (November 2019)

arXiv:1809.00620 [pdf, other]

doi 10.1145/3274388

Endorsements on Social Media: An Empirical Study of Affiliate Marketing Disclosures on YouTube and Pinterest

Authors: Arunesh Mathur, Arvind Narayanan, Marshini Chetty

Abstract: Online advertisements that masquerade as non-advertising content pose numerous risks to users. Such hidden advertisements appear on social media platforms when content creators or "influencers" endorse products and brands in their content. While the Federal Trade Commission (FTC) requires content creators to disclose their endorsements in order to prevent deception and harm to users, we do not know whether and how content creators comply with the FTC's guidelines. In this paper, we studied disclosures within affiliate marketing, an endorsement-based advertising strategy used by social media content creators. We examined whether content creators follow the FTC's disclosure guidelines, how they word the disclosures, and whether these disclosures help users identify affiliate marketing content as advertisements. To do so, we first measured the prevalence of and identified the types of disclosures in over 500,000 YouTube videos and 2.1 million Pinterest pins. We then conducted a user study with 1,791 participants to test the efficacy of these disclosures. Our findings reveal that only about 10% of affiliate marketing content on both platforms contains any disclosures at all. Further, users fail to understand shorter, non-explanatory disclosures. Based on our findings, we make various design and policy suggestions to help improve advertising disclosure practices on social media platforms.

Submitted 6 October, 2018; v1 submitted 3 September, 2018; originally announced September 2018.

Comments: 26 pages, 6 figures, ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2018)

Journal ref: Proceedings of the ACM on Human-Computer Interaction, Vol. 2, CSCW, Article 119 (November 2018)

arXiv:1806.11278 [pdf, other]

How Do Tor Users Interact With Onion Services?

Authors: Philipp Winter, Anne Edmundson, Laura M. Roberts, Agnieszka Dutkowska-Zuk, Marshini Chetty, Nick Feamster

Abstract: Onion services are anonymous network services that are exposed over the Tor network. In contrast to conventional Internet services, onion services are private, generally not indexed by search engines, and use self-certifying domain names that are long and difficult for humans to read. In this paper, we study how people perceive, understand, and use onion services based on data from 17 semi-structured interviews and an online survey of 517 users. We find that users have an incomplete mental model of onion services, use these services for anonymity and have varying trust in onion services in general. Users also have difficulty discovering and tracking onion sites and authenticating them. Finally, users want technical improvements to onion services and better information on how to use them. Our findings suggest various improvements for the security and usability of Tor onion services, including ways to automatically detect phishing of onion services, more clear security indicators, and ways to manage onion domain names that are difficult to remember.

Submitted 29 June, 2018; originally announced June 2018.

Comments: Appeared in USENIX Security Symposium 2018

Journal ref: USENIX Security Symposium, Baltimore, Maryland, August 2018

arXiv:1803.08488 [pdf, other]

An Empirical Study of Affiliate Marketing Disclosures on YouTube and Pinterest

Authors: Arunesh Mathur, Arvind Narayanan, Marshini Chetty

Abstract: While disclosures relating to various forms of Internet advertising are well established and follow specific formats, endorsement marketing disclosures are often open-ended in nature and written by individual publishers. Because such marketing often appears as part of publishers' actual content, ensuring that it is adequately disclosed is critical so that end-users can identify it as such. In this paper, we characterize disclosures relating to affiliate marketing (a type of endorsement-based marketing) on two popular social media platforms: YouTube and Pinterest. We find that only roughly one-tenth of affiliate content on both platforms contains disclosures. Based on our findings, we make policy recommendations geared towards various stakeholders in the affiliate marketing industry, highlighting how both social media platforms and affiliate companies can enable better disclosure practices.

Submitted 25 March, 2018; v1 submitted 22 March, 2018; originally announced March 2018.

arXiv:1802.08182 [pdf, ps, other]

doi 10.1145/3274469

User Perceptions of Smart Home IoT Privacy

Authors: Serena Zheng, Noah Apthorpe, Marshini Chetty, Nick Feamster

Abstract: Smart home Internet of Things (IoT) devices are rapidly increasing in popularity, with more households including Internet-connected devices that continuously monitor user activities. In this study, we conduct eleven semi-structured interviews with smart home owners, investigating their reasons for purchasing IoT devices, perceptions of smart home privacy risks, and actions taken to protect their privacy from those external to the home who create, manage, track, or regulate IoT devices and/or their data. We note several recurring themes. First, users' desires for convenience and connectedness dictate their privacy-related behaviors for dealing with external entities, such as device manufacturers, Internet Service Providers, governments, and advertisers. Second, user opinions about external entities collecting smart home data depend on perceived benefit from these entities. Third, users trust IoT device manufacturers to protect their privacy but do not verify that these protections are in place. Fourth, users are unaware of privacy risks from inference algorithms operating on data from non-audio/visual devices. These findings motivate several recommendations for device designers, researchers, and industry standards to better match device privacy features to the expectations and preferences of smart home owners.

Submitted 16 October, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

Comments: 20 pages, 1 table

Journal ref: Proceedings of the ACM on Human-Computer Interaction, ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW), Volume 2, Article 200. November 2018

Showing 1–15 of 15 results for author: Chetty, M