Search | arXiv e-print repository

Diverse Perspectives, Divergent Models: Cross-Cultural Evaluation of Depression Detection on Twitter

Authors: Nuredin Ali, Charles Chuankai Zhang, Ned Mayo, Stevie Chancellor

Abstract: Social media data has been used for detecting users with mental disorders, such as depression. Despite the global significance of cross-cultural representation and its potential impact on model performance, publicly available datasets often lack crucial metadata related to this aspect. In this work, we evaluate the generalization of benchmark datasets to build AI models on cross-cultural Twitter data. We gather a custom geo-located Twitter dataset of depressed users from seven countries as a test dataset. Our results show that depression detection models do not generalize globally. The models perform worse on Global South users compared to Global North. Pre-trained language models achieve the best generalization compared to Logistic Regression, though still show significant gaps in performance on depressed and non-Western users. We quantify our findings and provide several actionable suggestions to mitigate this issue.

Submitted 31 March, 2024; originally announced June 2024.

Comments: 6 pages, 2 figures, NAACL 2024 Main Conference

Journal ref: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)

arXiv:2404.14548

Advancing a Consent-Forward Paradigm for Digital Mental Health Data

Authors: Sachin R. Pendse, Logan Stapleton, Neha Kumar, Munmun De Choudhury, Stevie Chancellor

Abstract: The field of digital mental health is advancing at a rapid pace. Passively collected data from user engagements with digital tools and services continue to contribute new insights into mental health and illness. As the field of digital mental health grows, a concerning norm has been established -- digital service users are given little say over how their data is collected, shared, or used to generate revenue for private companies. Given a long history of service user exclusion from data collection practices, we propose an alternative approach that is attentive to this history: the consent-forward paradigm. This paradigm embeds principles of affirmative consent in the design of digital mental health tools and services, strengthening trust through designing around individual choices and needs, and proactively protecting users from unexpected harm. In this perspective, we outline practical steps to implement this paradigm, toward ensuring that people searching for care have the safest experiences possible.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 15 pages with 2 tables

arXiv:2305.13238

doi 10.1145/3593013.3594070

The Dimensions of Data Labor: A Road Map for Researchers, Activists, and Policymakers to Empower Data Producers

Authors: Hanlin Li, Nicholas Vincent, Stevie Chancellor, Brent Hecht

Abstract: Many recent technological advances (e.g. ChatGPT and search engines) are possible only because of massive amounts of user-generated data produced through user interactions with computing systems or scraped from the web (e.g. behavior logs, user-generated content, and artwork). However, data producers have little say in what data is captured, how it is used, or who it benefits. Organizations with the ability to access and process this data, e.g. OpenAI and Google, possess immense power in shaping the technology landscape. By synthesizing related literature that reconceptualizes the production of data for computing as "data labor", we outline opportunities for researchers, policymakers, and activists to empower data producers in their relationship with tech companies, e.g. advocating for transparency about data reuse, creating feedback channels between data producers and companies, and potentially developing mechanisms to share data's revenue more broadly. In doing so, we characterize data labor with six important dimensions - legibility, end-use awareness, collaboration requirement, openness, replaceability, and livelihood overlap - based on the parallels between data labor and various other types of labor in the computing literature.

Submitted 22 May, 2023; originally announced May 2023.

Comments: To appear at the 2023 ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT)

arXiv:2212.00849

doi 10.1145/3579597

"All of the White People Went First": How Video Conferencing Consolidates Control and Exacerbates Workplace Bias

Authors: Mo Houtti, Moyan Zhou, Loren Terveen, Stevie Chancellor

Abstract: Workplace bias creates negative psychological outcomes for employees, permeating the larger organization. Workplace meetings are frequent, making them a key context where bias may occur. Video conferencing (VC) is an increasingly common medium for workplace meetings; we therefore investigated how VC tools contribute to increasing or reducing bias in meetings. Through a semi-structured interview study with 22 professionals, we found that VC features push meeting leaders to exercise control over various meeting parameters, giving leaders an outsized role in affecting bias. We demonstrate this with respect to four core VC features -- user tiles, raise hand, text-based chat, and meeting recording -- and recommend employing at least one of two mechanisms for mitigating bias in VC meetings -- 1) transferring control from meeting leaders to technical systems or other attendees and 2) helping meeting leaders better exercise the control they do wield.

Submitted 30 January, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: To appear at the 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2023)

arXiv:2209.03941

The Users Aren't Alright: Dangerous Mental Illness Behaviors and Recommendations

Authors: Ashlee Milton, Stevie Chancellor

Abstract: In this paper, we argue that recommendation systems are in a unique position to propagate dangerous and cruel behaviors to people with mental illnesses.

Submitted 8 September, 2022; originally announced September 2022.

Comments: Accepted to the 5th FAccTRec Workshop: Responsible Recommendation (https://facctrec.github.io/facctrec2022/) -- Workshop co-located with the 16th ACM Conference on Recommender Systems

arXiv:2205.14529

All That's Happening behind the Scenes: Putting the Spotlight on Volunteer Moderator Labor in Reddit

Authors: Hanlin Li, Brent Hecht, Stevie Chancellor

Abstract: Online volunteers are an uncompensated yet valuable labor force for many social platforms. For example, volunteer content moderators perform a vast amount of labor to maintain online communities. However, as social platforms like Reddit favor revenue generation and user engagement, moderators are under-supported to manage the expansion of online communities. To preserve these online communities, developers and researchers of social platforms must account for and support as much of this labor as possible. In this paper, we quantitatively characterize the publicly visible and invisible actions taken by moderators on Reddit, using a unique dataset of private moderator logs for 126 subreddits and over 900 moderators. Our analysis of this dataset reveals the heterogeneity of moderation work across both communities and moderators. Moreover, we find that analyzing only visible work - the dominant way that moderation work has been studied thus far - drastically underestimates the amount of human moderation labor on a subreddit. We discuss the implications of our results on content moderation research and social platforms.

Submitted 5 June, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

Comments: This is a preprint. The paper will be presented at the 2022 International Conference on Web and Social Media (ICWSM'22)

arXiv:2205.14528

Measuring the Monetary Value of Online Volunteer Work

Authors: Hanlin Li, Brent Hecht, Stevie Chancellor

Abstract: Online volunteers are a crucial labor force that keeps many for-profit systems afloat (e.g. social media platforms and online review sites). Despite their substantial role in upholding highly valuable technological systems, online volunteers have no way of knowing the value of their work. This paper uses content moderation as a case study and measures its monetary value to make apparent volunteer labor's value. Using a novel dataset of private logs generated by moderators, we use linear mixed-effect regression and estimate that Reddit moderators worked a minimum of 466 hours per day in 2020. These hours amount to 3.4 million USD a year based on the median hourly wage for comparable content moderation services in the U.S. We discuss how this information may inform pathways to alleviate the one-sided relationship between technology companies and online volunteers.

Submitted 5 June, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

Comments: This is a preprint. The paper will be presented at the 2022 International Conference on Web and Social Media (ICWSM'22)

arXiv:2203.00432

Towards Practices for Human-Centered Machine Learning

Authors: Stevie Chancellor

Abstract: "Human-centered machine learning" (HCML) is a term that describes machine learning that applies to human-focused problems. Although this idea is noteworthy and generates scholarly excitement, scholars and practitioners have struggled to clearly define and implement HCML in computer science. This article proposes practices for human-centered machine learning, an area where studying and designing for social, cultural, and ethical implications are just as important as technical advances in ML. These practices bridge between interdisciplinary perspectives of HCI, AI, and sociotechnical fields, as well as ongoing discourse on this new area. The five practices include ensuring HCML is the appropriate solution space for a problem; conceptualizing problem statements as position statements; moving beyond interaction models to define the human; legitimizing domain contributions; and anticipating sociotechnical failure. I conclude by suggesting how these practices might be implemented in research and practice.

Submitted 1 March, 2022; originally announced March 2022.

Comments: 9 pages plus references

arXiv:2012.09995

Data Leverage: A Framework for Empowering the Public in its Relationship with Technology Companies

Authors: Nicholas Vincent, Hanlin Li, Nicole Tilly, Stevie Chancellor, Brent Hecht

Abstract: Many powerful computing technologies rely on implicit and explicit data contributions from the public. This dependency suggests a potential source of leverage for the public in its relationship with technology companies: by reducing, stopping, redirecting, or otherwise manipulating data contributions, the public can reduce the effectiveness of many lucrative technologies. In this paper, we synthesize emerging research that seeks to better understand and help people action this data leverage. Drawing on prior work in areas including machine learning, human-computer interaction, and fairness and accountability in computing, we present a framework for understanding data leverage that highlights new opportunities to change technology company behavior related to privacy, economic inequality, content moderation and other areas of societal concern. Our framework also points towards ways that policymakers can bolster data leverage as a means of changing the balance of power between the public and tech companies.

Submitted 17 February, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

Comments: This is a preprint. The paper will be presented at the 2021 Conference on Fairness, Accountability, and Transparency (FAccT 2021)

arXiv:1712.01411

#anorexia, #anarexia, #anarexyia: Characterizing Online Community Practices with Orthographic Variation

Authors: Ian Stewart, Stevie Chancellor, Munmun De Choudhury, Jacob Eisenstein

Abstract: Distinctive linguistic practices help communities build solidarity and differentiate themselves from outsiders. In an online community, one such practice is variation in orthography, which includes spelling, punctuation, and capitalization. Using a dataset of over two million Instagram posts, we investigate orthographic variation in a community that shares pro-eating disorder (pro-ED) content. We find that not only does orthographic variation grow more frequent over time, it also becomes more profound or deep, with variants becoming increasingly distant from the original: as, for example, #anarexyia is more distant than #anarexia from the original spelling #anorexia. These changes are driven by newcomers, who adopt the most extreme linguistic practices as they enter the community. Moreover, this behavior correlates with engagement: the newcomers who adopt deeper orthographic variants tend to remain active for longer in the community, and the posts that contain deeper variation receive more positive feedback in the form of "likes." Previous work has linked community membership change with language change, and our work casts this connection in a new light, with newcomers driving an evolving practice, rather than adapting to it. We also demonstrate the utility of orthographic variation as a new lens to study sociolinguistic change in online communities, particularly when the change results from an exogenous force such as a content ban.

Submitted 4 December, 2017; originally announced December 2017.

Showing 1–10 of 10 results for author: Chancellor, S