Exploring Polarization of Users Behavior on Twitter During the 2019 South American Protests
Authors:
Ramon Villa-Cox,
Helen,
Zeng,
Ashiqur R. KhudaBukhsh,
Kathleen M. Carley
Abstract:
Research across different disciplines has documented the expanding polarization in social media. However, much of it has focused on the US political system or its culturally controversial topics. In this work, we explore polarization on Twitter in a different context, namely the protests that paralyzed several countries in the South American region in 2019. By leveraging users' endorsement of politicians' tweets and hashtag campaigns with defined stances towards the protests (for or against), we construct a weakly labeled stance dataset with millions of users. We explore polarization in two related dimensions: language and news consumption patterns. In terms of linguistic polarization, we apply recent insights that leveraged machine translation methods, showing that the two communities speak consistently "different" languages, mainly along ideological lines (e.g., fascist translates to communist). Our results indicate that this recently proposed methodology is also informative in languages and contexts other than those to which it was originally applied. In terms of news consumption patterns, we cluster news agencies based on the homogeneity of their user bases and quantify the observed polarization in news consumption. We find empirical evidence of the "filter bubble" phenomenon during the event: we show not only that the user bases are homogeneous in terms of stance, but also that the probability that a user transitions between media of different clusters is low.
Submitted 5 April, 2021;
originally announced April 2021.
Stance in Replies and Quotes (SRQ): A New Dataset For Learning Stance in Twitter Conversations
Authors:
Ramon Villa-Cox,
Sumeet Kumar,
Matthew Babcock,
Kathleen M. Carley
Abstract:
Automated ways to extract stance (denying vs. supporting opinions) from conversations on social media are essential to advance opinion mining research. Recently, there has been renewed excitement in the field as new models attempt to improve on the state of the art. However, the datasets used for training and evaluating these models are often small. Additionally, these small datasets have uneven class distributions, i.e., only a tiny fraction of the examples in the dataset have favoring or denying stances, and most other examples have no clear stance. Moreover, the existing datasets do not distinguish between the different types of conversations on social media (e.g., replying vs. quoting on Twitter). Because of this, models trained on one event do not generalize to other events.
In the presented work, we create a new dataset by labeling stance in responses to posts on Twitter (both replies and quotes) on controversial issues. To the best of our knowledge, this is currently the largest human-labeled stance dataset for Twitter conversations, with over 5200 stance labels. More importantly, we designed a tweet collection methodology that favors the selection of denial-type responses. This class is expected to be more useful for identifying rumors and determining antagonistic relationships between users. Moreover, we include many baseline models for learning stance in conversations and compare their performance. We show that combining data from replies and quotes decreases the accuracy of the models, indicating that the two modalities behave differently when it comes to stance learning.
Submitted 27 June, 2020; v1 submitted 31 May, 2020;
originally announced June 2020.