-
Diverse Perspectives, Divergent Models: Cross-Cultural Evaluation of Depression Detection on Twitter
Authors:
Nuredin Ali,
Charles Chuankai Zhang,
Ned Mayo,
Stevie Chancellor
Abstract:
Social media data has been used for detecting users with mental disorders, such as depression. Despite the global significance of cross-cultural representation and its potential impact on model performance, publicly available datasets often lack crucial metadata related to this aspect. In this work, we evaluate the generalization of benchmark datasets to build AI models on cross-cultural Twitter data. We gather a custom geo-located Twitter dataset of depressed users from seven countries as a test dataset. Our results show that depression detection models do not generalize globally. The models perform worse on Global South users compared to Global North. Pre-trained language models achieve the best generalization compared to Logistic Regression, though still show significant gaps in performance on depressed and non-Western users. We quantify our findings and provide several actionable suggestions to mitigate this issue.
Submitted 31 March, 2024;
originally announced June 2024.
-
Advancing a Consent-Forward Paradigm for Digital Mental Health Data
Authors:
Sachin R. Pendse,
Logan Stapleton,
Neha Kumar,
Munmun De Choudhury,
Stevie Chancellor
Abstract:
The field of digital mental health is advancing at a rapid pace. Passively collected data from user engagements with digital tools and services continue to contribute new insights into mental health and illness. As the field of digital mental health grows, a concerning norm has been established -- digital service users are given little say over how their data is collected, shared, or used to generate revenue for private companies. Given a long history of service user exclusion from data collection practices, we propose an alternative approach that is attentive to this history: the consent-forward paradigm. This paradigm embeds principles of affirmative consent in the design of digital mental health tools and services, strengthening trust through designing around individual choices and needs, and proactively protecting users from unexpected harm. In this perspective, we outline practical steps to implement this paradigm, toward ensuring that people searching for care have the safest experiences possible.
Submitted 22 April, 2024;
originally announced April 2024.
-
The Dimensions of Data Labor: A Road Map for Researchers, Activists, and Policymakers to Empower Data Producers
Authors:
Hanlin Li,
Nicholas Vincent,
Stevie Chancellor,
Brent Hecht
Abstract:
Many recent technological advances (e.g. ChatGPT and search engines) are possible only because of massive amounts of user-generated data produced through user interactions with computing systems or scraped from the web (e.g. behavior logs, user-generated content, and artwork). However, data producers have little say in what data is captured, how it is used, or who it benefits. Organizations with the ability to access and process this data, e.g. OpenAI and Google, possess immense power in shaping the technology landscape. By synthesizing related literature that reconceptualizes the production of data for computing as "data labor", we outline opportunities for researchers, policymakers, and activists to empower data producers in their relationship with tech companies, e.g. advocating for transparency about data reuse, creating feedback channels between data producers and companies, and potentially developing mechanisms to share data's revenue more broadly. In doing so, we characterize data labor with six important dimensions - legibility, end-use awareness, collaboration requirement, openness, replaceability, and livelihood overlap - based on the parallels between data labor and various other types of labor in the computing literature.
Submitted 22 May, 2023;
originally announced May 2023.
-
"All of the White People Went First": How Video Conferencing Consolidates Control and Exacerbates Workplace Bias
Authors:
Mo Houtti,
Moyan Zhou,
Loren Terveen,
Stevie Chancellor
Abstract:
Workplace bias creates negative psychological outcomes for employees, permeating the larger organization. Workplace meetings are frequent, making them a key context where bias may occur. Video conferencing (VC) is an increasingly common medium for workplace meetings; we therefore investigated how VC tools contribute to increasing or reducing bias in meetings. Through a semi-structured interview study with 22 professionals, we found that VC features push meeting leaders to exercise control over various meeting parameters, giving leaders an outsized role in affecting bias. We demonstrate this with respect to four core VC features -- user tiles, raise hand, text-based chat, and meeting recording -- and recommend employing at least one of two mechanisms for mitigating bias in VC meetings -- 1) transferring control from meeting leaders to technical systems or other attendees and 2) helping meeting leaders better exercise the control they do wield.
Submitted 30 January, 2023; v1 submitted 1 December, 2022;
originally announced December 2022.
-
The Users Aren't Alright: Dangerous Mental Illness Behaviors and Recommendations
Authors:
Ashlee Milton,
Stevie Chancellor
Abstract:
In this paper, we argue that recommendation systems are in a unique position to propagate dangerous and cruel behaviors to people with mental illnesses.
Submitted 8 September, 2022;
originally announced September 2022.
-
All That's Happening behind the Scenes: Putting the Spotlight on Volunteer Moderator Labor in Reddit
Authors:
Hanlin Li,
Brent Hecht,
Stevie Chancellor
Abstract:
Online volunteers are an uncompensated yet valuable labor force for many social platforms. For example, volunteer content moderators perform a vast amount of labor to maintain online communities. However, as social platforms like Reddit favor revenue generation and user engagement, moderators are under-supported to manage the expansion of online communities. To preserve these online communities, developers and researchers of social platforms must account for and support as much of this labor as possible. In this paper, we quantitatively characterize the publicly visible and invisible actions taken by moderators on Reddit, using a unique dataset of private moderator logs for 126 subreddits and over 900 moderators. Our analysis of this dataset reveals the heterogeneity of moderation work across both communities and moderators. Moreover, we find that analyzing only visible work - the dominant way that moderation work has been studied thus far - drastically underestimates the amount of human moderation labor on a subreddit. We discuss the implications of our results on content moderation research and social platforms.
Submitted 5 June, 2022; v1 submitted 28 May, 2022;
originally announced May 2022.
-
Measuring the Monetary Value of Online Volunteer Work
Authors:
Hanlin Li,
Brent Hecht,
Stevie Chancellor
Abstract:
Online volunteers are a crucial labor force that keeps many for-profit systems afloat (e.g. social media platforms and online review sites). Despite their substantial role in upholding highly valuable technological systems, online volunteers have no way of knowing the value of their work. This paper uses content moderation as a case study and measures its monetary value to make apparent volunteer labor's value. Using a novel dataset of private logs generated by moderators, we use linear mixed-effect regression and estimate that Reddit moderators worked a minimum of 466 hours per day in 2020. These hours amount to 3.4 million USD a year based on the median hourly wage for comparable content moderation services in the U.S. We discuss how this information may inform pathways to alleviate the one-sided relationship between technology companies and online volunteers.
Submitted 5 June, 2022; v1 submitted 28 May, 2022;
originally announced May 2022.
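As a back-of-the-envelope check on the two figures quoted in this abstract (466 hours per day and 3.4 million USD a year), the sketch below derives the hourly rate they imply. The rate is not stated in the abstract; it is computed here from rounded numbers, on the assumption that the daily figure applies to every day of 2020, and should not be read as the wage the authors used. The function name `implied_hourly_rate` is hypothetical.

```python
# Figures taken from the abstract above.
HOURS_PER_DAY = 466            # minimum daily moderator hours in 2020
ANNUAL_VALUE_USD = 3_400_000   # estimated annual value of that labor

# Assumption: the daily figure holds for all of 2020, a leap year (366 days).
DAYS_IN_2020 = 366


def implied_hourly_rate(hours_per_day: float, annual_value_usd: float, days: int) -> float:
    """Return the hourly rate implied by an annual value and a daily hour count."""
    annual_hours = hours_per_day * days
    return annual_value_usd / annual_hours


annual_hours = HOURS_PER_DAY * DAYS_IN_2020
rate = implied_hourly_rate(HOURS_PER_DAY, ANNUAL_VALUE_USD, DAYS_IN_2020)

print(f"Annual hours: {annual_hours:,}")
print(f"Implied hourly rate: about {rate:.2f} USD")
```

Because both inputs are rounded, and 466 hours is described as a minimum, the result is only an order-of-magnitude consistency check.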
-
Towards Practices for Human-Centered Machine Learning
Authors:
Stevie Chancellor
Abstract:
"Human-centered machine learning" (HCML) is a term that describes machine learning that applies to human-focused problems. Although this idea is noteworthy and generates scholarly excitement, scholars and practitioners have struggled to clearly define and implement HCML in computer science. This article proposes practices for human-centered machine learning, an area where studying and designing for social, cultural, and ethical implications are just as important as technical advances in ML. These practices bridge between interdisciplinary perspectives of HCI, AI, and sociotechnical fields, as well as ongoing discourse on this new area. The five practices include ensuring HCML is the appropriate solution space for a problem; conceptualizing problem statements as position statements; moving beyond interaction models to define the human; legitimizing domain contributions; and anticipating sociotechnical failure. I conclude by suggesting how these practices might be implemented in research and practice.
Submitted 1 March, 2022;
originally announced March 2022.
-
Data Leverage: A Framework for Empowering the Public in its Relationship with Technology Companies
Authors:
Nicholas Vincent,
Hanlin Li,
Nicole Tilly,
Stevie Chancellor,
Brent Hecht
Abstract:
Many powerful computing technologies rely on implicit and explicit data contributions from the public. This dependency suggests a potential source of leverage for the public in its relationship with technology companies: by reducing, stopping, redirecting, or otherwise manipulating data contributions, the public can reduce the effectiveness of many lucrative technologies. In this paper, we synthesize emerging research that seeks to better understand and help people action this data leverage. Drawing on prior work in areas including machine learning, human-computer interaction, and fairness and accountability in computing, we present a framework for understanding data leverage that highlights new opportunities to change technology company behavior related to privacy, economic inequality, content moderation and other areas of societal concern. Our framework also points towards ways that policymakers can bolster data leverage as a means of changing the balance of power between the public and tech companies.
Submitted 17 February, 2021; v1 submitted 17 December, 2020;
originally announced December 2020.
-
#anorexia, #anarexia, #anarexyia: Characterizing Online Community Practices with Orthographic Variation
Authors:
Ian Stewart,
Stevie Chancellor,
Munmun De Choudhury,
Jacob Eisenstein
Abstract:
Distinctive linguistic practices help communities build solidarity and differentiate themselves from outsiders. In an online community, one such practice is variation in orthography, which includes spelling, punctuation, and capitalization. Using a dataset of over two million Instagram posts, we investigate orthographic variation in a community that shares pro-eating disorder (pro-ED) content. We find that not only does orthographic variation grow more frequent over time, it also becomes more profound or deep, with variants becoming increasingly distant from the original: as, for example, #anarexyia is more distant than #anarexia from the original spelling #anorexia. These changes are driven by newcomers, who adopt the most extreme linguistic practices as they enter the community. Moreover, this behavior correlates with engagement: the newcomers who adopt deeper orthographic variants tend to remain active for longer in the community, and the posts that contain deeper variation receive more positive feedback in the form of "likes." Previous work has linked community membership change with language change, and our work casts this connection in a new light, with newcomers driving an evolving practice, rather than adapting to it. We also demonstrate the utility of orthographic variation as a new lens to study sociolinguistic change in online communities, particularly when the change results from an exogenous force such as a content ban.
Submitted 4 December, 2017;
originally announced December 2017.
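The notion of a variant being "more distant" from the original spelling can be illustrated with a string edit distance. The sketch below uses Levenshtein distance as a stand-in. The abstract does not say which distance measure the authors used, so this choice, along with the helper names `levenshtein` and `variant_depths`, is an assumption made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep a matching character
            ))
        prev = curr
    return prev[-1]


def variant_depths(original: str, variants: list[str]) -> dict[str, int]:
    """Map each variant to its edit distance from the original spelling."""
    return {v: levenshtein(original, v) for v in variants}


# Hashtags quoted in the abstract above.
depths = variant_depths("#anorexia", ["#anarexia", "#anarexyia"])
for tag, depth in depths.items():
    print(tag, depth)
```

Under this assumed measure, #anarexia is one edit from #anorexia (one substitution) and #anarexyia is two (a substitution plus an insertion), which matches the ordering described in the abstract.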