doi 10.1145/3643544

Matcha: An IDE Plugin for Creating Accurate Privacy Nutrition Labels

Authors: Tianshi Li, Lorrie Faith Cranor, Yuvraj Agarwal, Jason I. Hong

Abstract: Apple and Google introduced their versions of privacy nutrition labels to the mobile app stores to better inform users of the apps' data practices. However, these labels are self-reported by developers and have been found to contain many inaccuracies due to misunderstandings of the label taxonomy. In this work, we present Matcha, an IDE plugin that uses automated code analysis to help developers create accurate Google Play data safety labels. Developers can benefit from Matcha's ability to detect user data accesses and transmissions while staying in control of the generated label by adding custom Java annotations and modifying an auto-generated XML specification. Our evaluation with 12 developers showed that Matcha helped our participants improve the accuracy of a label they created with Google's official tool for a real-world app they developed. We found that participants preferred Matcha for its accuracy benefits. Drawing on Matcha, we discuss general design recommendations for developer tools used to create accurate standardized privacy notices.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 38 pages

arXiv:2303.09743 [pdf, other]

doi 10.1145/3544548.3580882

Understanding Frontline Workers' and Unhoused Individuals' Perspectives on AI Used in Homeless Services

Authors: Tzu-Sheng Kuo, Hong Shen, Jisoo Geum, Nev Jones, Jason I. Hong, Haiyi Zhu, Kenneth Holstein

Abstract: Recent years have seen growing adoption of AI-based decision-support systems (ADS) in homeless services, yet we know little about stakeholder desires and concerns surrounding their use. In this work, we aim to understand impacted stakeholders' perspectives on a deployed ADS that prioritizes scarce housing resources. We employed AI lifecycle comicboarding, an adapted version of the comicboarding method, to elicit stakeholder feedback and design ideas across various components of an AI system's design. We elicited feedback from county workers who operate the ADS daily, service providers whose work is directly impacted by the ADS, and unhoused individuals in the region. Our participants shared concerns and design suggestions around the AI system's overall objective, specific model design choices, dataset selection, and use in deployment. Our findings demonstrate that stakeholders, even without AI knowledge, can provide specific and critical feedback on an AI system's design and deployment, if empowered to do so.

Submitted 16 March, 2023; originally announced March 2023.

Journal ref: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23)

arXiv:2302.04732 [pdf, other]

doi 10.1145/3544548.3581268

Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning

Authors: Ángel Alexander Cabrera, Erica Fu, Donald Bertucci, Kenneth Holstein, Ameet Talwalkar, Jason I. Hong, Adam Perer

Abstract: Machine learning models with high accuracy on test data can still produce systematic failures, such as harmful biases and safety issues, when deployed in the real world. To detect and mitigate such failures, practitioners run behavioral evaluation of their models, checking model outputs for specific types of inputs. Behavioral evaluation is important but challenging, requiring that practitioners discover real-world patterns and validate systematic failures. We conducted 18 semi-structured interviews with ML practitioners to better understand the challenges of behavioral evaluation and found that it is a collaborative, use-case-first process that is not adequately supported by existing task- and domain-specific tools. Using these findings, we designed Zeno, a general-purpose framework for visualizing and testing AI systems across diverse use cases. In four case studies with participants using Zeno on real-world models, we found that practitioners were able to reproduce previous manual analyses and discover new systematic failures.

Submitted 9 February, 2023; originally announced February 2023.

arXiv:2301.06937 [pdf, other]

doi 10.1145/3579612

Improving Human-AI Collaboration With Descriptions of AI Behavior

Authors: Ángel Alexander Cabrera, Adam Perer, Jason I. Hong

Abstract: People work with AI systems to improve their decision making, but often under- or over-rely on AI predictions and perform worse than they would have unassisted. To help people appropriately rely on AI aids, we propose showing them behavior descriptions, details of how AI systems perform on subgroups of instances. We tested the efficacy of behavior descriptions through user studies with 225 participants in three distinct domains: fake review detection, satellite image classification, and bird classification. We found that behavior descriptions can increase human-AI accuracy through two mechanisms: helping people identify AI failures and increasing people's reliance on the AI when it is more accurate. These findings highlight the importance of people's mental models in human-AI collaboration and show that informing people of high-level AI behaviors can significantly improve AI-assisted decision making.

Submitted 5 January, 2023; originally announced January 2023.

Comments: 21 pages

Journal ref: Proc. ACM Hum.-Comput. Interact. 7, CSCW1, Article 136 (April 2023)

arXiv:2205.06937 [pdf]

Experimental Evidence for Using a TTM Stages of Change Model in Boosting Progress Toward 2FA Adoption

Authors: Cori Faklaris, Laura Dabbish, Jason I. Hong

Abstract: Behavior change ideas from health psychology can also help boost end user compliance with security recommendations, such as adopting two-factor authentication (2FA). Our research adapts the Transtheoretical Model Stages of Change from health and wellness research to a cybersecurity context. We first create and validate an assessment to identify workers on Amazon Mechanical Turk who have not enabled 2FA for their accounts as being in Stage 1 (no intention to adopt 2FA) or Stages 2-3 (some intention to adopt 2FA). We randomly assigned participants to receive an informational intervention with varied content (highlighting process, norms, or both) or not. After three days, we again surveyed workers for Stage of Amazon 2FA adoption. We found that those in the intervention group showed more progress toward action/maintenance (Stages 4-5) than those in the control group, and those who received content highlighting the process of enabling 2FA were significantly more likely to progress toward 2FA adoption. Our work contributes support for applying a Stages of Change Model in usable security.

Submitted 13 May, 2022; originally announced May 2022.

Comments: 41 pages, including the stage algorithm programmed on Mturk, the survey flow and specific items used, and a link to download the five informational handouts used for the control condition and the 2FA intervention conditions

ACM Class: H.1.2; H.5.2; K.6.5

arXiv:2204.04540 [pdf, other]

Peekaboo: A Hub-Based Approach to Enable Transparency in Data Processing within Smart Homes (Extended Technical Report)

Authors: Haojian Jin, Gram Liu, David Hwang, Swarun Kumar, Yuvraj Agarwal, Jason I. Hong

Abstract: We present Peekaboo, a new privacy-sensitive architecture for smart homes that leverages an in-home hub to pre-process and minimize outgoing data in a structured and enforceable manner before sending it to external cloud servers. Peekaboo's key innovations are (1) abstracting common data pre-processing functionality into a small and fixed set of chainable operators, and (2) requiring that developers explicitly declare desired data collection behaviors (e.g., data granularity, destinations, conditions) in an application manifest, which also specifies how the operators are chained together. Given a manifest, Peekaboo assembles and executes a pre-processing pipeline using operators pre-loaded on the hub. In doing so, developers can collect smart home data on a need-to-know basis; third-party auditors can verify data collection behaviors; and the hub itself can offer a number of centralized privacy features to users across apps and devices, without additional effort from app developers. We present the design and implementation of Peekaboo, along with an evaluation of its coverage of smart home scenarios, system performance, data minimization, and example built-in privacy features.

Submitted 18 May, 2022; v1 submitted 9 April, 2022; originally announced April 2022.

Comments: 19 pages

arXiv:2204.03114 [pdf]

Do They Accept or Resist Cybersecurity Measures? Development and Validation of the 13-Item Security Attitude Inventory (SA-13)

Authors: Cori Faklaris, Laura Dabbish, Jason I. Hong

Abstract: We present SA-13, the 13-item Security Attitude inventory. We develop and validate this assessment of cybersecurity attitudes by conducting an exploratory factor analysis, confirmatory factor analysis, and other tests with data from a U.S. Census-weighted Qualtrics panel (N=209). Beyond a core six indicators of Engagement with Security Measures (SA-Engagement, three items) and Attentiveness to Security Measures (SA-Attentiveness, three items), our SA-13 inventory adds indicators of Resistance to Security Measures (SA-Resistance, four items) and Concernedness with Improving Compliance (SA-Concernedness, three items). SA-13 and the subscales exhibit desirable psychometric qualities; and higher scores on SA-13 and on the SA-Engagement and SA-Attentiveness subscales are associated with higher scores for security behavior intention and for self-reported recent security behaviors. SA-13 and the subscales are useful for researchers and security awareness teams who need a lightweight survey measure of user security attitudes. The composite score of the 13 indicators provides a compact measurement of cybersecurity decisional balance.

Submitted 6 April, 2022; originally announced April 2022.

Comments: Includes the directions for administering the scales in an appendix

ACM Class: H.1.2; I.3.6; J.4

arXiv:2112.14205 [pdf]

Analysis of Longitudinal Changes in Privacy Behavior of Android Applications

Authors: Alexander Yu, Yuvraj Agarwal, Jason I. Hong

Abstract: Privacy concerns have long been expressed around smart devices, and the concerns around Android apps have been studied by many past works. Over the past 10 years, we have crawled and scraped data for almost 1.9 million apps, and also stored the APKs for 135,536 of them. In this paper, we examine the trends in how Android apps have changed over time with respect to privacy and look at it from two perspectives: (1) how privacy behavior in apps has changed as they are updated over time, (2) how these changes can be accounted for when comparing third-party libraries and the app's own internals. To study this, we examine the adoption of HTTPS, whether apps scan the device for other installed apps, the use of permissions for privacy-sensitive data, and the use of unique identifiers. We find that privacy-related behavior has improved with time as apps continue to receive updates, and that the third-party libraries used by apps are responsible for more issues with privacy. However, we observe that in the current state of Android apps, there has not been enough of an improvement in terms of privacy and many issues still need to be addressed.

Submitted 28 December, 2021; originally announced December 2021.

arXiv:2112.12009 [pdf]

Travel Guides for Creative Tourists, Powered by Geotagged Social Media

Authors: Dan Tasse, Jason I. Hong

Abstract: Many modern tourists want to know about everyday life and spend time like a local in a new city. Current tools and guides typically provide them with lists of sights to see, which do not meet their needs. Manually building new tools for them would not scale. However, public geotagged social media data, like tweets and photos, have the potential to fill this gap, showing users an interesting and unique side of a place. Through three studies surrounding the design and construction of a social-media-powered Neighborhood Guides website, we show recommendations for building such a site. Our findings highlight an important aspect of social media: while it lacks the user base and consistency to directly reflect users' lives, it does reveal the idealized everyday life that so many visitors want to know about.

Submitted 22 December, 2021; originally announced December 2021.

ACM Class: H.5.m

arXiv:2112.02775 [pdf, ps, other]

Sensor as a Company: On Self-Sustaining IoT Commons

Authors: Haojian Jin, Swarun Kumar, Jason I. Hong

Abstract: Beyond the "smart home" and "smart enterprise", the Internet of Things (IoT) revolution is creating "smart communities", where shared IoT devices collectively benefit a large number of residents, for transportation, healthcare, safety, and more. However, large-scale deployments of IoT-powered neighborhoods face two key socio-technical challenges: the significant upfront investment and the lack of information on local IoT needs. In this paper, we present SensorInc, a new IoT deployment paradigm that incentivizes residents to design and manage sensor deployment through sensor liquefaction. By turning shared sensors into liquid (i.e. tradeable) assets akin to company stock or bond, users can design and invest in promising IoT deployments and receive monetary rewards afterward. We present the detailed design of SensorInc and conduct two case studies (parking occupancy sensors and air pollution sensors) to study the self-sustainability and deployment challenges of such a paradigm.

Submitted 5 December, 2021; originally announced December 2021.

arXiv:2111.12182 [pdf]

Identifying Terms and Conditions Important to Consumers using Crowdsourcing

Authors: Xingyu Liu, Annabel Sun, Jason I. Hong

Abstract: Terms and conditions (T&Cs) are pervasive on the web and often contain important information for consumers, but are rarely read. Previous research has explored methods to surface alarming privacy policies using manual labelers, natural language processing, and deep learning techniques. However, this prior work used pre-determined categories for annotations, and did not investigate what consumers really deem as important from their perspective. In this paper, we instead combine crowdsourcing with an open definition of "what is important" in T&Cs. We present a workflow consisting of pairwise comparisons, agreement validation, and Bradley-Terry rank modeling, to effectively establish rankings of T&C statements from non-expert crowdworkers on this open definition, and further analyzed consumers' preferences. We applied this workflow to 1,551 T&C statements from 27 e-commerce websites, contributed by 3,462 unique crowd workers doing 203,068 pairwise comparisons, and conducted thematic and readability analysis on the statements considered as important/unimportant. We found that consumers especially cared about policies related to after-sales and money, and tended to regard harder-to-understand statements as more important. We also present machine learning models to identify T&C clauses that consumers considered important, achieving at best a 92.7% balanced accuracy, 91.6% recall, and 89.2% precision. We foresee using our workflow and model to efficiently and reliably highlight important T&Cs on websites at a large scale, improving consumers' awareness.

Submitted 30 November, 2021; v1 submitted 23 November, 2021; originally announced November 2021.

arXiv:2109.11690 [pdf, other]

doi 10.1145/3479569

Discovering and Validating AI Errors With Crowdsourced Failure Reports

Authors: Ángel Alexander Cabrera, Abraham J. Druck, Jason I. Hong, Adam Perer

Abstract: AI systems can fail to learn important behaviors, leading to real-world issues like safety concerns and biases. Discovering these systematic failures often requires significant developer attention, from hypothesizing potential edge cases to collecting evidence and validating patterns. To scale and streamline this process, we introduce crowdsourced failure reports, end-user descriptions of how or why a model failed, and show how developers can use them to detect AI errors. We also design and implement Deblinder, a visual analytics system for synthesizing failure reports that developers can use to discover and validate systematic failures. In semi-structured interviews and think-aloud studies with 10 AI practitioners, we explore the affordances of the Deblinder system and the applicability of failure reports in real-world settings. Lastly, we show how collecting additional data from the groups identified by developers can improve model performance.

Submitted 23 September, 2021; originally announced September 2021.

arXiv:2104.12032 [pdf]

The Design of the User Interfaces for Privacy Enhancements for Android

Authors: Jason I. Hong, Yuvraj Agarwal, Matt Fredrikson, Mike Czapik, Shawn Hanna, Swarup Sahoo, Judy Chun, Won-Woo Chung, Aniruddh Iyer, Ally Liu, Shen Lu, Rituparna Roychoudhury, Qian Wang, Shan Wang, Siqi Wang, Vida Zhang, Jessica Zhao, Yuan Jiang, Haojian Jin, Sam Kim, Evelyn Kuo, Tianshi Li, **** Liu, Yile Liu, Robert Zhang

Abstract: We present the design and design rationale for the user interfaces for Privacy Enhancements for Android (PE for Android). These UIs are built around two core ideas, namely that developers should explicitly declare the purpose of why sensitive data is being used, and these permission-purpose pairs should be split by first party and third party uses. We also present a taxonomy of purposes and ways of how these ideas can be deployed in the existing Android ecosystem.

Submitted 24 April, 2021; originally announced April 2021.

Comments: 58 pages, 21 figures, 3 tables

arXiv:2012.12415 [pdf, other]

What Makes People Install a COVID-19 Contact-Tracing App? Understanding the Influence of App Design and Individual Difference on Contact-Tracing App Adoption Intention

Authors: Tianshi Li, Camille Cobb, Jackie Yang, Sagar Baviskar, Yuvraj Agarwal, Beibei Li, Lujo Bauer, Jason I. Hong

Abstract: Smartphone-based contact-tracing apps are a promising solution to help scale up the conventional contact-tracing process. However, low adoption rates have become a major issue that prevents these apps from achieving their full potential. In this paper, we present a national-scale survey experiment ($N = 1963$) in the U.S. to investigate the effects of app design choices and individual differences on COVID-19 contact-tracing app adoption intentions. We found that individual differences such as prosocialness, COVID-19 risk perceptions, general privacy concerns, technology readiness, and demographic factors played a more important role than app design choices such as decentralized design vs. centralized design, location use, app providers, and the presentation of security risks. Certain app designs could exacerbate the different preferences in different sub-populations which may lead to an inequality of acceptance to certain app design choices (e.g., developed by state health authorities vs. a large tech company) among different groups of people (e.g., people living in rural areas vs. people living in urban areas). Our mediation analysis showed that one's perception of the public health benefits offered by the app and the adoption willingness of other people had a larger effect in explaining the observed effects of app design choices and individual differences than one's perception of the app's security and privacy risks. With these findings, we discuss practical implications on the design, marketing, and deployment of COVID-19 contact-tracing apps in the U.S.

Submitted 10 May, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

Comments: 44 pages, 7 figures, 7 tables

arXiv:2005.11957 [pdf, other]

Decentralized is not risk-free: Understanding public perceptions of privacy-utility trade-offs in COVID-19 contact-tracing apps

Authors: Tianshi Li, Jackie Yang, Cori Faklaris, Jennifer King, Yuvraj Agarwal, Laura Dabbish, Jason I. Hong

Abstract: Contact-tracing apps have potential benefits in helping health authorities to act swiftly to halt the spread of COVID-19. However, their effectiveness is heavily dependent on their installation rate, which may be influenced by people's perceptions of the utility of these apps and any potential privacy risks due to the collection and releasing of sensitive user data (e.g., user identity and location). In this paper, we present a survey study that examined people's willingness to install six different contact-tracing apps after informing them of the risks and benefits of each design option (with a U.S.-only sample on Amazon Mechanical Turk, $N=208$). The six app designs covered two major design dimensions (centralized vs. decentralized, basic contact tracing vs. also providing hotspot information), grounded in our analysis of existing contact-tracing app proposals. Contrary to assumptions of some prior work, we found that the majority of people in our sample preferred to install apps that use a centralized server for contact tracing, as they are more willing to allow a centralized authority to access the identity of app users rather than allowing tech-savvy users to infer the identity of diagnosed users. We also found that the majority of our sample preferred to install apps that share diagnosed users' recent locations in public places to show hotspots of infection. Our results suggest that apps using a centralized architecture with strong security protection to do basic contact tracing and providing users with other useful information such as hotspots of infection in public places may achieve a high adoption rate in the U.S.

Submitted 25 May, 2020; originally announced May 2020.

Comments: 21 pages, 8 figures

ACM Class: K.4.1; H.5.m

Showing 1–15 of 15 results for author: Hong, J I