
The Effects of Group Sanctions on Participation and Toxicity: Quasi-experimental Evidence from the Fediverse

Authors: Carl Colglazier, Nathan TeBlunthuis, Aaron Shaw

Abstract: Online communities often overlap and coexist, despite incongruent norms and approaches to content moderation. When communities diverge, decentralized and federated communities may pursue group-level sanctions, including defederation (disconnection) to block communication between members of specific communities. We investigate the effects of defederation in the context of the Fediverse, a set of decentralized, interconnected social networks with independent governance. Mastodon and Pleroma, the most popular software powering the Fediverse, allow administrators on one server to defederate from another. We use a difference-in-differences approach and matched controls to estimate the effects of defederation events on participation and message toxicity among affected members of the blocked and blocking servers. We find that defederation causes a drop in activity for accounts on the blocked servers, but not on the blocking servers. Also, we find no evidence of an effect of defederation on message toxicity.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 13 pages, 8 figures; Accepted to the 18th International AAAI Conference on Web and Social Media

arXiv:2307.06483 [pdf, other]

Misclassification in Automated Content Analysis Causes Bias in Regression. Can We Fix It? Yes We Can!

Authors: Nathan TeBlunthuis, Valerie Hase, Chung-Hong Chan

Abstract: Automated classifiers (ACs), often built via supervised machine learning (SML), can categorize large, statistically powerful samples of data ranging from text to images and video, and have become widely popular measurement devices in communication science and related fields. Despite this popularity, even highly accurate classifiers make errors that cause misclassification bias and misleading results in downstream analyses, unless such analyses account for these errors. As we show in a systematic literature review of SML applications, communication scholars largely ignore misclassification bias. In principle, existing statistical methods can use "gold standard" validation data, such as that created by human annotators, to correct misclassification bias and produce consistent estimates. We introduce and test such methods, including a new method we design and implement in the R package misclassificationmodels, via Monte Carlo simulations designed to reveal each method's limitations, which we also release. Based on our results, we recommend our new error correction method as it is versatile and efficient. In sum, automated classifiers, even those below common accuracy standards or making systematic misclassifications, can be useful for measurement with careful study design and appropriate error correction methods.

Submitted 10 December, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: 66 pages, 43 figures. Accepted for publication in Communication Methods & Measures. Top Paper Award from the 2023 Annual Meeting of The International Communication Association Computational Methods Division

ACM Class: G.3; K.4.0; I.2.6

arXiv:2201.04271 [pdf, ps, other]

No Community Can Do Everything: Why People Participate in Similar Online Communities

Authors: Nathan TeBlunthuis, Charles Kiene, Isabella Brown, Laura Alia Levi, Nicole McGinnis, Benjamin Mako Hill

Abstract: Large-scale quantitative analyses have shown that individuals frequently talk to each other about similar things in different online spaces. Why do these overlapping communities exist? We provide an answer grounded in the analysis of 20 interviews with active participants in clusters of highly related subreddits. Within a broad topical area, there are a diversity of benefits an online community can confer. These include (a) specific information and discussion, (b) socialization with similar others, and (c) attention from the largest possible audience. A single community cannot meet all three needs. Our findings suggest that topical areas within an online community platform tend to become populated by groups of specialized communities with diverse sizes, topical boundaries, and rules. Compared with any single community, such systems of overlapping communities are able to provide a greater range of benefits.

Submitted 10 February, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

Comments: Accepted to CSCW 2022

ACM Class: K.4.0

arXiv:2108.10684 [pdf, other]

doi 10.1145/3479986.3479991

Measuring Wikipedia Article Quality in One Dimension by Extending ORES with Ordinal Regression

Authors: Nathan TeBlunthuis

Abstract: Organizing complex peer production projects and advancing scientific knowledge of open collaboration each depend on the ability to measure quality. Article quality ratings on English language Wikipedia have been widely used by both Wikipedia community members and academic researchers for purposes like tracking knowledge gaps and studying how political polarization shapes collaboration. Even so, measuring quality presents many methodological challenges. The most widely used systems use labels on discrete ordinal scales when assessing quality, but such labels can be inconvenient for statistics and machine learning. Prior work handles this by assuming that different levels of quality are "evenly spaced" from one another. This assumption runs counter to intuitions about the relative degrees of effort needed to raise Wikipedia encyclopedia articles to different quality levels. Furthermore, models from prior work are fit to datasets that oversample high-quality articles. This limits their accuracy for representative samples of articles or revisions. I describe a technique extending the Wikimedia Foundation's ORES article quality model to address these limitations. My method uses weighted ordinal regression models to construct one-dimensional continuous measures of quality. While scores from my technique and from prior approaches are correlated, my approach improves accuracy for research datasets and provides evidence that the "evenly spaced" assumption is unfounded in practice on English Wikipedia. I conclude with recommendations for using quality scores in future research and include the full code, data, and models.

Submitted 31 August, 2021; v1 submitted 15 August, 2021; originally announced August 2021.

Comments: 15 pages, 4 figures, Accepted to OpenSym 2021

ACM Class: H.0; J.4; K.4; I.2

arXiv:2107.06970 [pdf, other]

Identifying Competition and Mutualism Between Online Groups

Authors: Nathan TeBlunthuis, Benjamin Mako Hill

Abstract: Platforms often host multiple online groups with overlapping topics and members. How can researchers and designers understand how related groups affect each other? Inspired by population ecology, prior research in social computing and human-computer interaction has studied related groups by correlating group size with degrees of overlap in content and membership, but has produced puzzling results: overlap is associated with competition in some contexts but with mutualism in others. We suggest that this inconsistency results from aggregating intergroup relationships into an overall environmental effect that obscures the diversity of competition and mutualism among related groups. Drawing on the framework of community ecology, we introduce a time-series method for inferring competition and mutualism. We then use this framework to inform a large-scale analysis of clusters of subreddits that all have high user overlap. We find that mutualism is more common than competition.

Submitted 18 January, 2022; v1 submitted 14 July, 2021; originally announced July 2021.

Comments: 10 pages, 6 figures

arXiv:2006.03121 [pdf, other]

doi 10.1145/3449130

Effects of algorithmic flagging on fairness: quasi-experimental evidence from Wikipedia

Authors: Nathan TeBlunthuis, Benjamin Mako Hill, Aaron Halfaker

Abstract: Online community moderators often rely on social signals such as whether or not a user has an account or a profile page as clues that users may cause problems. Reliance on these clues can lead to overprofiling bias when moderators focus on these signals but overlook the misbehavior of others. We propose that algorithmic flagging systems deployed to improve the efficiency of moderation work can also make moderation actions more fair to these users by reducing reliance on social signals and making norm violations by everyone else more visible. We analyze moderator behavior in Wikipedia as mediated by RCFilters, a system which displays social signals and algorithmic flags, and estimate the causal effect of being flagged on moderator actions. We show that algorithmically flagged edits are reverted more often, especially those by established editors with positive social signals, and that flagging decreases the likelihood that moderation actions will be undone. Our results suggest that algorithmic flagging systems can lead to increased fairness in some contexts but that the relationship is complex and contingent.

Submitted 5 April, 2021; v1 submitted 4 June, 2020; originally announced June 2020.

Comments: 27 pages, 11 figures, ACM CSCW

ACM Class: K.4.3

Journal ref: Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 56 (April 2021), 27 pages

arXiv:2006.03119 [pdf, other]

How individual behaviors drive inequality in online community sizes: an agent-based simulation

Authors: Jeremy Foote, Nathan TeBlunthuis, Benjamin Mako Hill, Aaron Shaw

Abstract: Why are online community sizes so extremely unequal? Most answers to this question have pointed to general mathematical processes drawn from physics like cumulative advantage. These explanations provide little insight into specific social dynamics or decisions that individuals make when joining and leaving communities. In addition, explanations in terms of cumulative advantage do not draw from the enormous body of social computing research that studies individual behavior. Our work bridges this divide by testing whether two influential social mechanisms used to explain community joining can also explain the distribution of community sizes. Using agent-based simulations, we evaluate how well individual-level processes of social exposure and decisions based on individual expected benefits reproduce empirical community size data from Reddit. Our simulations contribute to social computing theory by providing evidence that both processes together, but neither alone, generate realistic distributions of community sizes. Our results also illustrate the potential value of agent-based simulation to online community researchers to both evaluate and bridge individual and group-level theories.

Submitted 4 June, 2020; originally announced June 2020.

ACM Class: K.4.3

Showing 1–7 of 7 results for author: TeBlunthuis, N